Intelligent Data Engineering and Analytics: Proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and ... Innovation, Systems and Technologies, 266) 9811666237, 9789811666230

This book presents the proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021).

English Pages 585 [556] Year 2022

Table of contents:
Organization
Preface
Contents
About the Editors
1 Automated Flower Species Identification by Using Deep Convolution Neural Network
1.1 Introduction
1.2 Related Work
1.2.1 Feature Extraction
1.2.2 Local Feature Descriptor
1.2.3 Global Descriptor
1.3 Global Feature Extraction
1.3.1 Training Images
1.3.2 Deep Learning Using CNN
1.4 Proposed System
1.5 Results
1.6 Conclusion
References
2 Information Retrieval for Cloud Forensics
2.1 Introduction
2.1.1 Comparison to Other Cloud Forensics Datasets
2.2 Dataset
2.2.1 Monitoring Database
2.2.2 Monitoring Database
2.3 Training and Testing
2.4 Conclusion
References
3 Machine Translation System Combination with Enhanced Alignments Using Word Embeddings
3.1 Introduction
3.2 Methodology
3.2.1 Confusion Network Generation
3.2.2 Alignment of Hypotheses Using Word Embeddings
3.3 Related Literature
3.4 Set up of the Experiments
3.4.1 Data
3.4.2 Data Pre-processing
3.4.3 Training of MT Systems
3.5 Experiments and Results
3.6 Conclusion
References
4 Geometry-Based Machining Feature Retrieval with Inductive Transfer Learning
4.1 Introduction
4.2 Literature Review
4.3 Materials and Methods
4.3.1 3D CAD Models
4.3.2 Methodology
4.4 Results and Discussion
4.5 Conclusion and Future Work
References
5 Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture
5.1 Introduction
5.2 Literature Review
5.3 Malayalam Phonemes
5.4 Dataset Description
5.5 Experiments and Results
5.5.1 Experimental Setup—The Grapheme to Phoneme Model
5.5.2 Hyperparameters
5.6 Results and Discussion
5.7 Conclusion
References
6 Usage of Blockchain Technology in e-Voting System Using Private Blockchain
6.1 Introduction
6.1.1 Blockchain Technology
6.1.2 Smart Contract
6.1.3 Private Versus Public Blockchain
6.1.4 Practical Byzantine Fault Tolerance (PBFT)
6.2 Literature Review
6.3 Proposed e-Voting Application
6.3.1 Voter/Candidate Registration Phase
6.3.2 Voter Authentication and Login Phase
6.3.3 Vote Casting and Validation Phase
6.3.4 Voter Counting and Announcement of Result Phase
6.4 Conclusion
References
7 Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning
7.1 Introduction
7.2 Related Work
7.3 Dataset Preparation
7.3.1 Training Set Preparation
7.3.2 Test Set Preparation
7.4 Sample Applications of BVG
7.4.1 Text-Only Translation
7.4.2 Bengali Caption Generation
7.5 Conclusion and Future Work
References
8 Deep Learning-Based Mosquito Species Detection Using Wingbeat Frequencies
8.1 Introduction
8.2 Related Works
8.3 Materials and Methods
8.3.1 Audio Data Capture
8.3.2 Data Preprocessing
8.3.3 Machine Learning-Based Model Development and Training
8.3.4 Inference Generation
8.4 Results
8.4.1 Experimental Setup
8.4.2 Model Training
8.4.3 Performance Comparison
8.5 Discussion and Conclusion
References
9 Developments in Capsule Network Architecture: A Review
9.1 Introduction
9.1.1 Convolutional Neural Networks (CNN)
9.1.2 Limitations of Convolutional Neural Networks
9.2 Capsule Networks (CapsNet)
9.2.1 Transforming Auto-encoders
9.2.2 Dynamic Routing Between Capsules
9.2.3 Matrix Capsule with EM Routing
9.3 Major Capsule Networks Structures and Implementations
9.3.1 CapsNet Performance Analysis
9.3.2 Other Modifications on Baseline Implementation
9.3.3 Capsule Network Applications
9.3.4 Datasets
9.3.5 Performance Evaluation Methods
9.3.6 Discussion
9.4 Conclusion
References
10 Computer-Aided Segmentation of Polyps Using Mask R-CNN and Approach to Reduce False Positives
10.1 Introduction
10.1.1 Related Work
10.1.2 Contributions
10.2 Methodology
10.2.1 Dataset
10.2.2 Mask R-CNN Implementation
10.2.3 Reducing False Positives
10.2.4 Training Details
10.3 Results
10.3.1 Evaluation Metrics
10.3.2 Evaluation of Polyp Frames
10.4 Conclusion
References
11 Image GPT with Super Resolution
11.1 Introduction
11.2 Background and Related Work
11.2.1 IGPT Algorithm
11.2.2 ESPCNN Algorithm
11.3 Proposed Algorithm
11.4 Our Results on IGPT-S and ESPCNN Super Resolution
11.5 Conclusion
References
12 Boosting Accuracy of Machine Learning Classifiers for Heart Disease Forecasting
12.1 Introduction
12.2 Related Works
12.3 Methodology
12.3.1 Dataset Description
12.3.2 Architecture
12.3.3 Pre-processing
12.3.4 Learning Classifiers
12.3.5 Ensemble Methods
12.4 Implementation and Results
12.4.1 Technology
12.4.2 Evaluation Metrics
12.4.3 Results
12.5 Conclusion
References
13 Rapid Detection of Fragile X Syndrome: A Gateway Towards Modern Algorithmic Approach
13.1 Introduction
13.2 Our Approach
13.3 Results
13.4 Conclusion and Future Prospect
References
14 Summarizing Bengali Text: An Extractive Approach
14.1 Introduction
14.2 Literature Review
14.3 Methodology
14.3.1 BERT-SUM Fine Tune for Summarization
14.3.2 BERT Extractive Summarization
14.4 Result
14.5 Conclusions
References
15 Dynamic Hand Gesture Recognition of the Days of a Week in Indian Sign Language Using Low-Cost Depth Device
15.1 Introduction
15.1.1 Related Work
15.1.2 Our Contributions
15.2 Data Collection
15.3 Pre-processing
15.3.1 Key Frames Extraction
15.3.2 Background Subtraction
15.4 Features Extraction
15.5 Experiments and Analysis
15.6 Conclusion
References
16 Sentiment Analysis on Telugu–English Code-Mixed Data
16.1 Introduction
16.2 Literature Survey
16.3 Proposed Methodologies
16.3.1 Dataset
16.3.2 Lexicon-Based Approach
16.3.3 Machine Learning Approach
16.4 Results
16.5 Conclusion and Future Work
References
17 Fuzziness on Interconnection Networks Under Ratio Labelling
17.1 Introduction
17.2 Basic Concepts
17.3 Context that Enhances the Study
17.4 Main Results
17.4.1 Definition
17.4.2 Theorem
17.4.3 Theorem
17.4.4 Theorem
17.4.5 Theorem
17.4.6 Theorem
17.5 Conclusion
References
18 CoviNet: Role of Convolution Neural Networks (CNN) for an Efficient Diagnosis of COVID-19
18.1 Introduction
18.2 Literature Survey
18.2.1 Limitations
18.3 Proposed Methodology
18.3.1 XGBOOST Classifier
18.3.2 Algorithm for CT-Scan Image Classification
18.3.3 Database
18.4 Screen Shots
18.5 Results and Discussion
18.6 Conclusion and Future Work
References
19 Deep Learning for Real-Time Diagnosis of Pest and Diseases on Crops
19.1 Introduction
19.2 Methods
19.2.1 Data Collection
19.2.2 Android Application
19.2.3 Website
19.2.4 Convolutional Neural Networks (CNN)
19.3 Results and Discussion
19.4 Conclusion
References
20 Sentiment-Based Abstractive Text Summarization Using Attention Oriented LSTM Model
20.1 Introduction
20.2 System Description
20.2.1 Data Processing
20.2.2 System Training
20.2.3 System Testing
20.3 Experimental Design
20.3.1 Dataset Description and Evaluating Methods
20.3.2 Experimental Setup
20.4 Result Analysis
20.5 Conclusion
References
21 A Fuzzy-Based Multiobjective Cat Swarm Optimization Algorithm: A Case Study on Single-Cell Data
21.1 Introduction
21.2 Cat Swarm Optimization
21.2.1 Seeking Mode
21.2.2 Tracing Mode
21.3 Proposed Multiobjective Fuzzy-Based Cat Swarm Optimization Algorithm for Clustering
21.3.1 Solution Representation and Population Initialization
21.3.2 Assignment of Gene Point to Different Cluster
21.3.3 Execution of FCM Step
21.3.4 Objective Function
21.3.5 Selection and Ranking of New Non-dominated Solution for the Next Generation
21.3.6 Clustering Optimization with CSO Algorithm
21.3.7 Terminating Criteria
21.4 Experimental Result
21.4.1 Dataset Description
21.4.2 Discussion of Result
21.4.3 Case Study: Human Urinary Bladder
21.5 Conclusion and Future Work
References
22 KTM-POP: Transliteration of K-POP Lyrics to Marathi
Abstract
22.1 Introduction
22.1.1 Types of Transliteration
22.1.2 Phonetic Representation
22.1.3 Script Specification
22.2 Literature Survey
22.3 Methodology
22.4 Result and Discussion
22.5 Conclusion
References
23 A Comparative Study on the Use of Augmented Reality in Indoor Positioning Systems and Navigation
23.1 Introduction
23.2 Indoor Positioning System
23.3 Augmented Reality
23.3.1 Applications of AR in Navigation
23.4 Augmented Reality-Based Indoor Positioning Systems
23.5 Inference
23.6 Conclusion and Future Scope
References
24 Classification of Chest X-Ray Images to Diagnose COVID-19 Disease Through Transfer Learning
24.1 Introduction
24.2 Related Works
24.3 Proposed Model
24.4 Experimentation and Results
24.4.1 Dataset
24.4.2 Experimental Setup
24.4.3 Results and Analysis
24.5 Comparative Analysis
24.5.1 Within Model Comparison
24.5.2 Comparison Against State-of-the-Art Models
24.6 Conclusion
References
25 Remodeling Rainfall Prediction Using Artificial Neural Network and Machine Learning Algorithms
25.1 Introduction
25.2 Related Work
25.3 Methodology
25.4 Experimental Analysis
25.5 Result and Discussion
25.6 Conclusion and Future Work
References
26 Gender-Based Emotion Recognition: A Machine Learning Technique
26.1 Introduction
26.2 Technique and Algorithm
26.3 Feature Extraction
26.4 Gaussian Mixture Model
26.5 Simulation Modeling
26.6 Conclusion
References
27 An Efficient Exploratory Demographic Data Analytics Using Preprocessed Autoregressive Integrated Moving Average
27.1 Introduction
27.2 Methodology
27.2.1 Data Collection
27.2.2 Data Preprocessing
27.3 Data Visualization and Data Insights
27.4 Time Series Forecasting
27.5 Future Work and Conclusion
References
28 Classification of VASA Dataset Using J48, Random Forest, and Naive Bayes
28.1 Introduction
28.2 Related Work
28.3 Methodology
28.3.1 J48 Algorithm
28.3.2 Random Forest
28.3.3 Naive Bayes
28.4 Results and Discussion
28.5 Conclusion
References
29 Efficient Fault-Tolerant Cluster-Based Approach for Wireless Sensor Networks
29.1 Introduction
29.2 Related Work
29.3 System Model and Assumptions
29.4 Proposed Methodology
29.5 Simulation Results
29.6 Conclusion
References
30 From Chalk Boards to Smart Boards: An Integration of IoT into Educational Environment During Covid-19 Pandemic
30.1 Introduction
30.2 Review of Literature
30.3 Objectives
30.4 A Fusion of IoT into Educational Sector for Learning Purpose During Covid-19 Pandemic
30.4.1 Unveiling a World of Opportunities
30.4.2 Challenges in Integrating IoT into Education
30.5 Conclusion
References
31 Design and Development of an IoT-Based Smart Hexa-Copter for Multidisciplinary Applications
31.1 Introduction
31.2 Literature Review
31.3 The Proposed System Architecture and Its Components
31.3.1 Microscopic Description of the Application
31.3.2 Required Components with a Specific Application
31.3.3 Working Functionalities
31.3.4 Circuit Diagram
31.4 Various Calculation and Results
31.4.1 System Testing and Error Analysis
31.5 Conclusion
References
32 Deep Learning-Based Violence Detection from Videos
32.1 Introduction
32.2 Related Work
32.2.1 Approaches Based on Convolutional Neural Networks
32.2.2 Motion Detection-Based Approaches
32.2.3 Approaches Utilizing Other Machine Learning Algorithms
32.3 Proposed Work
32.4 Datasets
32.5 Implementation
32.5.1 Setup
32.5.2 Parameters
32.5.3 Result
32.6 Conclusion
32.7 Future Work
References
33 Stationary Wavelet-Based Fusion Approach for Enhancement of Microscopy Images
33.1 Introduction
33.2 Proposed Fusion-Based Enhancement Approach
33.2.1 Guided Image Filter (GIF)
33.2.2 Morphological Filter (MF)
33.2.3 Unsharp Masking (UM)
33.2.4 Fusion-Based Enhancement Filter Using SWT
33.3 Results and Discussions
33.3.1 Image Quality Assessment
33.3.2 Experimental Results
33.3.3 Discussions
33.4 Conclusion
References
34 A Novel Optimization for Synthesis of Concentric Circular Array Antenna
34.1 Introduction
34.2 Concentric Circular Antenna Array
34.3 Design Equations
34.4 Biogeography-Based Optimization-BBO
34.5 Results
34.5.1 Results for Concentric Circular Array Antenna Using GA
34.5.2 Results for Concentric Circular Array Antenna Using PSO
34.5.3 Results for Concentric Circular Array Antenna Using BBO
34.6 Conclusions
References
35 PAPR Analysis of FBMC and UFMC for 5G Cellular Communications
35.1 Introduction
35.2 FBMC and UFMC
35.2.1 FBMC
35.2.2 UFMC
35.3 PAPR
35.4 Simulation Results
35.4.1 PAPR Analysis
35.5 Conclusion and Future Scope
References
36 Touchless Doorbell with Sanitizer Dispenser: A Precautionary Measure of COVID-19
36.1 Introduction
36.2 Implementation Diagram
36.3 Design Approach of Project
36.4 Results and Discussions
36.5 Conclusion
References
37 Optic Disc Segmentation Based on Active Contour Model for Detection and Evaluation of Glaucoma on a Real-Time Challenging Dataset
37.1 Introduction
37.2 Method and Materials
37.2.1 Method
37.2.2 Materials
37.3 Results and Discussions
37.4 Conclusion
References
38 Array Thinning Using Social Modified Social Group Optimization Algorithm
38.1 Introduction
38.2 Problem Formulation
38.3 The Social Group Optimization Algorithm
38.4 Results and Discussions
38.5 Conclusion
References
39 Impact of Flash Flood on Landuse and Landcover Dynamics and Erosional Pattern of Jiadhal River Basin, Assam, India
39.1 Introduction
39.2 Study Area
39.3 Methodology
39.4 Results and Discussion
39.4.1 Land Use Land Cover and Erosion Dynamics
39.4.2 Impact of Flash Flood on Soil Loss
39.5 Conclusion
References
40 Landslide Risk Dynamics Modeling Using AHP-TOPSIS Model, Computational Intelligence Methods, and Geospatial Analytics: A Case Study of Aizawl City, Mizoram—India
40.1 Introduction
40.2 Materials and Methodology
40.2.1 Location of the Study Area
40.2.2 Data Used
40.2.3 Methodology
40.3 Results and Discussion
40.3.1 Results Obtained from Assessment of Landslide Hazard Area
40.3.2 Map Validation of Landslide Hazard Accuracy Using Historical Landslide Data
40.3.3 Results Obtained from the Identification of Vulnerable Zones
40.3.4 Results of the Assessment of Risk Analysis for Landslide Hazard in Aizawl City
40.4 Conclusions
References
41 Delineation and Assessment of Groundwater Potential Zones Using Geospatial Technology and Fuzzy Analytical Hierarchy Process Model
41.1 Introduction
41.2 Study Area
41.3 Tools and Techniques Used in This Study
41.4 Research Methodology
41.5 Identification of Influencing Factors
41.6 Result Obtained from Knowledge-Based GIS Assessment
41.7 Results Obtained from AHP-Based GIS Assessment
41.8 Results Obtained from Fuzzy AHP-Based GIS Assessment
41.9 Results Obtained from Proximity-Based GIS Assessment
41.10 Results Obtained by Adding the Result Maps of All Techniques (Knowledge Based, AHP, Fuzzy AHP)
41.11 Summary, Conclusion and Recommendations
References
42 COVID-19: Geospatial Analysis of the Pandemic—A Case Study of Bihar State, India, Using Data Derived from Remote Sensing Satellites and COVID-19 National Geoportal
42.1 Introduction
42.2 Study Area and Methodology
42.2.1 Study Area
42.2.2 Data Used
42.3 Methodology
42.4 Results and Discussions
References
43 Assessment of the Relationship Between Rainfall Trend and Flood Impact: A Case Study of Tinsukia District, Assam
43.1 Introduction
43.2 Research Methodology
43.2.1 Collection of Data and Pre-processing
43.2.2 Flood Impact Indicator Variables
43.3 Results and Discussion
43.4 Conclusions
References
44 Learning Deep Features and Classification for Fresh or off Vegetables to Prevent Food Wastage Using Machine Learning Algorithms
44.1 Introduction
44.1.1 YOLOv4
44.1.2 YOLOv3
44.2 Related Work
44.3 Proposed Method
44.3.1 Data Accumulation and Annotation
44.3.2 Methodology
44.4 Results and Analysis
44.4.1 YOLOv3 Results
44.4.2 YOLOv4 Results
44.4.3 Analysis
44.5 Conclusion
References
45 Secure Trust Level Routing in Delay-Tolerant Network with Node Categorization Technique
45.1 Introduction
45.2 Related Works
45.3 Proposed Methodology
45.4 Results and Discussion
45.5 Conclusion
References
46 Predicting the Trends of COVID-19 Cases Using LSTM, GRU and RNN in India
46.1 Introduction
46.2 Literature Review
46.3 Proposed Methodology
46.4 Experimental Analysis
46.4.1 Dataset Description
46.4.2 Gated Recurrent Units
46.4.3 Long Short-Term Memory
46.4.4 Recurrent Neural Network
46.5 Experimental Analysis
46.6 Conclusion and Future Scope
46.7 Declaration
References
47 Biogeography-Based Optimization
47.1 Introduction
47.2 Proposed Method
47.3 Simulation Results
47.4 Conclusion
References
48 Novel Sentiment Analysis Model with Modern Bio-NLP Techniques Over Chronic Diseases
48.1 Introduction
48.1.1 Origin of the Problem
48.1.2 Problem Statement
48.1.3 Applications
48.2 Review of Literature
48.2.1 Existing Solutions
48.2.2 Summary of Literature Study
48.3 Proposed Method
48.3.1 Design Methodology
48.3.2 System Architecture Diagram
48.3.3 Description of Dataset
48.4 Results and Observations
48.4.1 Step-Wise Description of Results
48.4.2 Testcase Results
48.4.3 Observations from the Work
48.5 Results and Observations
48.5.1 Conclusion
48.5.2 Future Work
References
49 Medical Diagnosis for Incomplete and Imbalanced Data
49.1 Introduction
49.2 Literature Survey
49.3 Proposed Architecture and Methodology
49.3.1 Proposed Architecture
49.3.2 Proposed Methodology
49.4 Experimental Investigations
49.4.1 Dataset
49.4.2 Discussion on Results
49.5 Conclusion
References
50 SafeXAI: Explainable AI to Detect Adversarial Attacks in Electronic Medical Records
50.1 Introduction
50.2 Literature Survey
50.3 Proposed Methodology
50.3.1 Dataset Collection
50.3.2 Model Building
50.3.3 Adversarial Attacks
50.3.4 Working of CW Attack
50.4 Handling Adversarial Attacks
50.4.1 Local Interpretable Model-Agnostic Explanations (LIME)
50.5 Conclusion and Future Work
References
51 Exploring Historical Stock Price Movement from News Articles Using Knowledge Graphs and Unsupervised Learning
51.1 Introduction
51.2 Related Work
51.3 Proposed Methodology
51.3.1 Research Architecture
51.3.2 Data Information
51.3.3 Data Preprocessing
51.4 Implementation
51.5 Results
51.6 Conclusion and Future Work
References
52 Comparative Study of Classical and Quantum Cryptographic Techniques Using QKD Simulator
52.1 Cryptography
52.2 Quantum Cryptography
52.2.1 Quantum Mechanics and Its Properties
52.2.2 Quantum Cryptographic Constructions
52.3 Motivation and Background Study
52.4 Proposed System
52.5 Results
52.6 Conclusion
References
53 A Novel Approach to Encrypt the Data Using DWT and Histogram Feature
53.1 Introduction
53.2 Literature Survey
53.3 Methodology
53.3.1 Pre-processing
53.3.2 DWT (Discrete Wavelet Transform)
53.3.3 Image Histogram
53.3.4 Histogram Shifting and Data Hiding
53.3.5 Extraction Steps
53.4 Experiment
53.5 Result
53.6 Conclusion
References
54 Auto-generation of Smart Contracts from a Domain-Specific XML-Based Language
54.1 Introduction
54.2 Motivating Example and Preliminaries
54.2.1 Running Case
54.2.2 Preliminaries
54.3 SLCML: A Contract-Specification Language
54.4 Patterns and Transformation Rules
54.5 Feasibility Evaluation
54.6 Related Work
54.7 Conclusion
References
Author Index

Smart Innovation, Systems and Technologies 266

Suresh Chandra Satapathy · Peter Peer · Jinshan Tang · Vikrant Bhateja · Anumoy Ghosh
Editors

Intelligent Data Engineering and Analytics: Proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021)

Smart Innovation, Systems and Technologies Volume 266

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of real-world problems in industry, the environment and the community. It also focuses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High-quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of review and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/8767

Editors
Suresh Chandra Satapathy, School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Peter Peer, Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Jinshan Tang, College of Computing, Michigan Technological University, Michigan, MI, USA
Vikrant Bhateja, Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Lucknow, India; Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
Anumoy Ghosh, Department of Electronics and Communication Engineering, National Institute of Technology (NIT) Mizoram, Aizawl, Mizoram, India

ISSN 2190-3018
ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-981-16-6623-0
ISBN 978-981-16-6624-7 (eBook)
https://doi.org/10.1007/978-981-16-6624-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Organization

Chief Patrons
Prof. Rajat Gupta, Director, NIT Mizoram

Patrons
Prof. Saibal Chatterjee, Dean (Academics), NIT Mizoram
Dr. Alok Shukla, Dean (Students’ Welfare), NIT Mizoram
Dr. P. Ajmal Koya, Dean (Research and Consultancy), NIT Mizoram
Dr. K. Gyanendra Singh, Dean (Faculty Welfare), NIT Mizoram

General Chair
Dr. Jinshan Tang, College of Computing, Michigan Technological University, Michigan, USA

Publication Chairs
Dr. Yu-Dong Zhang, Department of Informatics, University of Leicester, Leicester, UK
Dr. Peter Peer, Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Dr. Suresh Chandra Satapathy, KIIT, Bhubaneshwar

Conveners
Dr. Ranjita Das, Head, Department of CSE, NIT Mizoram
Dr. Anumoy Ghosh, Head, Department of ECE, NIT Mizoram

Organizing Chairs
Dr. Ranjita Das, Head, Department of CSE, NIT Mizoram
Dr. Anumoy Ghosh, Head, Department of ECE, NIT Mizoram
Dr. Rudra Sankar Dhar, Assistant Professor, Department of ECE, NIT Mizoram
Dr. Chaitali Koley, Assistant Professor, Department of ECE, NIT Mizoram
Mr. Sandeep Kumar Dash, Assistant Professor, Department of CSE, NIT Mizoram

Publicity Chairs
Dr. Chaitali Koley, Assistant Professor, Department of ECE, NIT Mizoram
Mr. Sushanta Bordoloi, Trainee Teacher, Department of ECE, NIT Mizoram
Mr. Sandeep Kumar Dash, Assistant Professor, Department of ECE, NIT Mizoram
Mr. Lenin Laitonjam, Trainee Teacher, Department of CSE, NIT Mizoram

Advisory Committee
Aimé Lay-Ekuakille, University of Salento, Lecce, Italy
Annappa Basava, Department of CSE, NIT Karnataka
Amira Ashour, Tanta University, Egypt
Aynur Unal, Stanford University, USA
Bansidhar Majhi, IIIT Kancheepuram, Tamil Nadu, India
Dariusz Jacek Jakobczak, Koszalin University of Technology, Koszalin, Poland
Dilip Kumar Sharma, IEEE U.P. Section
Ganpati Panda, IIT Bhubaneswar, Odisha, India
Jagdish Chand Bansal, South Asian University, New Delhi, India
João Manuel R. S. Tavares, Universidade do Porto (FEUP), Porto, Portugal
Jyotsana Kumar Mandal, University of Kalyani, West Bengal, India
K. C. Santosh, University of South Dakota, USA
Le Hoang Son, Vietnam National University, Hanoi, Vietnam
Naeem Hanoon, Multimedia University, Cyberjaya, Malaysia
Nilanjan Dey, TIET, Kolkata, India
Noor Zaman, Universiti Teknologi Petronas, Malaysia
Pradip Kumar Das, Professor, Department of CSE, IIT Guwahati
Roman Senkerik, Tomas Bata University in Zlin, Czech Republic
Sriparna Saha, Associate Professor, Department of CSE, IIT Patna
Sukumar Nandi, Department of CSE, IIT Guwahati
Swagatam Das, Indian Statistical Institute, Kolkata, India
Siba K. Udgata, University of Hyderabad, Telangana, India
Tai Kang, Nanyang Technological University, Singapore
Ujjawl Maulic, Department of CSE, Jadavpur University
Valentina Balas, Aurel Vlaicu University of Arad, Romania
Yu-Dong Zhang, University of Leicester, UK

Technical Program Committee Chairs
Dr. Steven L. Fernandes, Creighton University, USA
Dr. Vikrant Bhateja, Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Lucknow, India

Technical Program Committee
A. K. Chaturvedi, Department of Electrical Engineering, IIT Kanpur, India
Abdul Rajak A. R., Department of Electronics and Communication Engineering, Birla Institute of
Dr. Nitika Vats Doohan, Indore, India
Ahmad Al-Khasawneh, The Hashemite University, Jordan
Alexander christea, University of Warwick, London, UK
Amioy Kumar, Biometrics Research Lab, Department of Electrical Engineering, IIT Delhi, India
Anand Paul, The School of Computer Science and Engineering, South Korea
Anish Saha, NIT Silchar
Apurva A. Desai, Veer Narmad South Gujarat University, Surat, India
Avdesh Sharma, Jodhpur, India
Bharat Singh Deora, JRNRV University, India
Bhavesh Joshi, Advent College, Udaipur, India
Brent Waters, University of Texas, Austin, Texas, USA
Chhaya Dalela, Associate Professor, JSSATE, Noida, Uttar Pradesh, India
Dan Boneh, Computer Science Department, Stanford University, California, USA
Dipankar Das, Jadavpur University
Feng Jiang, Harbin Institute of Technology, China
Gengshen Zhong, Jinan, Shandong, China
Harshal Arolkar, Immd. Past Chairman, CSI Ahmedabad Chapter, India
H. R. Vishwakarma, Professor, VIT, Vellore, India
Jayanti Dansana, KIIT University, Bhubaneswar, Odisha, India
Jean Michel Bruel, Departement Informatique IUT de Blagnac, Blagnac, France
Jeril Kuriakose, Manipal University, Jaipur, India
Jitender Kumar Chhabra, NIT, Kurukshetra, Haryana, India
Junali Jasmine Jena, KIIT DU, Bhubaneswar, India
Jyoti Prakash Singh, NIT Patna
K. C. Roy, Principal, Kautaliya, Jaipur, India
Kalpana Jain, CTAE, Udaipur, India
Komal Bhatia, YMCA University, Faridabad, Haryana, India
Krishnamachar Prasad, Department of Electrical and Electronic Engineering, Auckland, New Zealand
Lipika Mohanty, KIIT DU, Bhubaneswar, India
Lorne Olfman, Claremont, California, USA
Martin Everett, University of Manchester, England
Meenakhi Rout, KIIT DU, Bhubaneswar, India
Meenakshi Tripathi, MNIT, Jaipur, India
Mrinal Kanti Debbarma, NIT Agartala
M. Ramakrishna, ANITS, Vizag, India
Mukesh Shrimali, Pacific University, Udaipur, India
Murali Bhaskaran, Dhirajlal Gandhi College of Technology, Salem, Tamil Nadu, India
Ngai-Man Cheung, Assistant Professor, University of Technology and Design, Singapore
Neelamadhav Padhi, GIET University, Odisha, India
Nilay Mathur, Director, NIIT Udaipur, India
Philip Yang, PricewaterhouseCoopers, Beijing, China
Pradeep Chouksey, Principal, TIT College, Bhopal, MP, India
Prasun Sinha, Ohio State University Columbus, Columbus, OH, USA
R. K. Bayal, Rajasthan Technical University, Kota, Rajasthan, India
Rajendra Kumar Bharti, Assistant Professor, Kumaon Engineering College, Dwarahat, Uttarakhand, India
S. R. Biradar, Department of Information Science and Engineering, SDM College of Engineering and Technology, Dharwad, Karnataka, India
Sami Mnasri, IRIT Laboratory Toulouse, France
Savita Gandhi, Professor, Gujarat University, Ahmedabad, India
Soura Dasgupta, Department of TCE, SRM University, Chennai, India
Sushil Kumar, School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Ting-Peng Liang, National Chengchi University, Taipei, Taiwan
V. Rajnikanth, EIE Department, St. Joseph’s College of Engineering, Chennai, India
Veena Anand, NIT, Raipur
Xiaoyi Yu, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yun-Bae Kim, Sung Kyun Kwan University, South Korea

Preface

This book is a collection of high-quality peer-reviewed research papers presented at the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021), held at the National Institute of Technology Mizoram, Aizawl, India, during June 25–26, 2021. The idea for this conference series was conceived by a few eminent professors and researchers from premier institutions in India. The first three editions of this conference, FICTA 2012, 2013, and 2014, were organized by Bhubaneswar Engineering College (BEC), Bhubaneswar, Odisha, India. The fourth edition, FICTA 2015, was held at NIT Durgapur, West Bengal, India. The fifth and sixth editions, FICTA 2016 and FICTA 2017, were consecutively organized by KIIT University, Bhubaneswar, Odisha, India. The seventh edition, FICTA 2018, was hosted by Duy Tan University, Da Nang City, Vietnam. The eighth edition, FICTA 2020, was held at NIT Karnataka, Surathkal, India. The proceedings of all eight past editions were published in the Springer AISC series. FICTA 2021, the ninth edition of this conference series, aims to bring together researchers, scientists, engineers, and practitioners to exchange and share their theories, methodologies, new ideas, experiences, and applications in all areas of intelligent computing theory and its applications to engineering disciplines such as computer science, electronics, electrical, mechanical, and biomedical engineering. FICTA 2021 received a large number of submissions in areas relating to computational intelligence, intelligent data engineering, data analytics, decision sciences, and associated applications of intelligent computing. These papers underwent a rigorous peer-review process with the help of our technical program committee members from India and abroad.
The review process was thorough, with a minimum of two reviews for each paper and in many cases three to five, along with due checks on similarity and content overlap. The conference received more than 400 submissions across the main track and the special sessions. The conference featured five special sessions on cutting-edge technologies of specialized focus, which were organized and chaired by eminent professors. The submissions came from across India as well as from ten overseas countries. Out of this pool, only 108 papers were accepted and divided into two volumes for publication as the proceedings. This volume consists of 54 papers from diverse areas of intelligent data engineering and analytics. The conference featured many distinguished keynote addresses in different spheres of intelligent computing by eminent speakers such as Dr. Jinshan Tang (Professor in the College of Computing at Michigan Technological University) and Prof. Sukumar Nandi (Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India). Dr. Jinshan Tang's keynote lecture on "Automatic Segmentation of COVID-19 Infections from Medical Images with Deep Convolutional Neural Network" gave an overview of recent research trends in segmenting COVID-19 infections in CT slices. The technique requires only scribble supervision and uses uncertainty-aware self-ensembling and transformation-consistent techniques. Prof. Sukumar Nandi's talk on the use and challenges of federated learning also received ample applause from the large audience of delegates, budding researchers, faculty, and students. We thank the advisory chairs and steering committees for their mentoring support to the conference. We extend our sincere gratitude to Dr. Ranjita Das (Head, Department of CSE, NIT Mizoram, Aizawl, India) and Dr. Anumoy Ghosh (Head, Department of ECE, NIT Mizoram, Aizawl, India) for providing valuable guidance and being an inspiration throughout the process of organizing this conference. We would also like to thank the Department of Computer Science and Engineering and the Department of Electronics and Communication Engineering, NIT Mizoram, Aizawl, India. They jointly came forward and supported the organization of the ninth edition of this conference series. We take this opportunity to thank the authors of all submitted papers for their hard work, adherence to the deadlines, and patience with the review process.
The quality of a refereed volume depends mainly on the expertise and dedication of the reviewers. We are indebted to the technical program committee members, who not only produced excellent reviews but also did so within short time frames. We would also like to thank the delegates who participated in the conference despite all hardships.

Bhubaneswar, India
Ljubljana, Slovenia
Houghton, MI, USA
Lucknow, India
Aizawl, India

Dr. Suresh Chandra Satapathy Dr. Peter Peer Dr. Jinshan Tang Dr. Vikrant Bhateja Dr. Anumoy Ghosh

Contents

1 Automated Flower Species Identification by Using Deep Convolution Neural Network
Shweta Bondre and Uma Yadav

2 Information Retrieval for Cloud Forensics
Prasad Purnaye and Vrushali Kulkarni

3 Machine Translation System Combination with Enhanced Alignments Using Word Embeddings
Ch Ram Anirudh and Kavi Narayana Murthy

4 Geometry-Based Machining Feature Retrieval with Inductive Transfer Learning
N. S. Kamal, H. B. Barathi Ganesh, V. V. Sajith Variyar, V. Sowmya, and K. P. Soman

5 Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture
R. Priyamvada, D. Govind, Vijay Krishna Menon, B. Premjith, and K. P. Soman

6 Usage of Blockchain Technology in e-Voting System Using Private Blockchain
Suman Majumder and Sangram Ray

7 Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning
Arghyadeep Sen, Shantipriya Parida, Ketan Kotwal, Subhadarshi Panda, Ondřej Bojar, and Satya Ranjan Dash

8 Deep Learning-Based Mosquito Species Detection Using Wingbeat Frequencies
Ayush Jhaveri, K. S. Sangwan, Vinod Maan, and Dhiraj

9 Developments in Capsule Network Architecture: A Review
Sudarshan Kapadnis, Namita Tiwari, and Meenu Chawla

10 Computer-Aided Segmentation of Polyps Using Mask R-CNN and Approach to Reduce False Positives
Saurabh Jha, Balaji Jagtap, Srijan Mazumdar, and Saugata Sinha

11 Image GPT with Super Resolution
Bhumika Shah, Ankita Sinha, and Prashant Saxena

12 Boosting Accuracy of Machine Learning Classifiers for Heart Disease Forecasting
Divya Lalita Sri Jalligampala, R. V. S. Lalitha, M. Anil Kumar, Nalla Akhila, Sujana Challapalli, and P. N. S. Lakshmi

13 Rapid Detection of Fragile X Syndrome: A Gateway Towards Modern Algorithmic Approach
Soumya Biswas, Oindrila Das, Divyajyoti Panda, and Satya Ranjan Dash

14 Summarizing Bengali Text: An Extractive Approach
Satya Ranjan Dash, Pubali Guha, Debasish Kumar Mallick, and Shantipriya Parida

15 Dynamic Hand Gesture Recognition of the Days of a Week in Indian Sign Language Using Low-Cost Depth Device
Soumi Paul, Madhuram Jajoo, Abhijeet Raj, Ayatullah Faruk Mollah, Mita Nasipuri, and Subhadip Basu

16 Sentiment Analysis on Telugu–English Code-Mixed Data
K. S. B. S. Saikrishna and C. N. Subalalitha

17 Fuzziness on Interconnection Networks Under Ratio Labelling
A. Amutha and R. Mathu Pritha

18 CoviNet: Role of Convolution Neural Networks (CNN) for an Efficient Diagnosis of COVID-19
D. N. V. S. L. S. Indira and R. Abinaya

19 Deep Learning for Real-Time Diagnosis of Pest and Diseases on Crops
Jinendra Gambhir, Naveen Patel, Shrinivas Patil, Prathamesh Takale, Archana Chougule, Chandra Shekhar Prabhakar, Kalmesh Managanvi, A. Srinivasa Raghavan, and R. K. Sohane

20 Sentiment-Based Abstractive Text Summarization Using Attention Oriented LSTM Model
Dipanwita Debnath, Ranjita Das, and Shaik Rafi

21 A Fuzzy-Based Multiobjective Cat Swarm Optimization Algorithm: A Case Study on Single-Cell Data
Amika Achom, Ranjita Das, Pratibha Gond, and Partha Pakray

22 KTM-POP: Transliteration of K-POP Lyrics to Marathi
Manisha Satish Divate

23 A Comparative Study on the Use of Augmented Reality in Indoor Positioning Systems and Navigation
Aashka Dave and Rutvik Dumre

24 Classification of Chest X-Ray Images to Diagnose COVID-19 Disease Through Transfer Learning
Sameer Manubansh and N. Vinay Kumar

25 Remodeling Rainfall Prediction Using Artificial Neural Network and Machine Learning Algorithms
Aakanksha Sharaff, Kshitij Ukey, Rajkumar Choure, Vinay Ujee, and Gyananjaya Tripathy

26 Gender-Based Emotion Recognition: A Machine Learning Technique
Biswajit Nayak, Bhubaneswari Bisoyi, Prasant Kumar Pattnaik, and Biswajit Das

27 An Efficient Exploratory Demographic Data Analytics Using Preprocessed Autoregressive Integrated Moving Average
Siddhesh Nandakumar Menon, Shubham Tyagi, and Venkatesh Gauri Shankar

28 Classification of VASA Dataset Using J48, Random Forest, and Naive Bayes
S. Anitha and M. Vanitha

29 Efficient Fault-Tolerant Cluster-Based Approach for Wireless Sensor Networks
Kavita Jaiswal and Veena Anand

30 From Chalk Boards to Smart Boards: An Integration of IoT into Educational Environment During Covid-19 Pandemic
Shraddha Dhal, Swati Samantaray, and Suresh Chandra Satapathy

31 Design and Development of an IoT-Based Smart Hexa-Copter for Multidisciplinary Applications
Goutam Majumder, Gouri Shankar Chakraborty, Shakhaowat Hossain, Yogesh Kumar, Amit Kumar Ojha, and Md. Foysal Majumdar

32 Deep Learning-Based Violence Detection from Videos
Neha Singh, Onkareshwar Prasad, and T. Sujithra
33 Stationary Wavelet-Based Fusion Approach for Enhancement of Microscopy Images
Disha Singh, Vikrant Bhateja, and Ankit Yadav

34 A Novel Optimization for Synthesis of Concentric Circular Array Antenna
G. Challa Ram, D. Girish Kumar, and M. Venkata Subbarao

35 PAPR Analysis of FBMC and UFMC for 5G Cellular Communications
T. Sairam Vamsi, Sudheer Kumar Terlapu, and M. Vamshi Krishna

36 Touchless Doorbell with Sanitizer Dispenser: A Precautionary Measure of COVID-19
G. R. L. V. N. Srinivasa Raju, T. Sairam Vamsi, and Sanjay Dubey

37 Optic Disc Segmentation Based on Active Contour Model for Detection and Evaluation of Glaucoma on a Real-Time Challenging Dataset
Sonali Dash, P. Satish Rama Chowdary, C. V. Gopala Raju, Y. Umamaheshwar, and K. J. N. Siva Charan

38 Array Thinning Using Social Modified Social Group Optimization Algorithm
E. V. S. D. S. N. S. L. K. Srikala, M. Murali, M. Vamshi Krishna, and G. S. N. Raju

39 Impact of Flash Flood on Landuse and Landcover Dynamics and Erosional Pattern of Jiadhal River Basin, Assam, India
Amar Kumar Kathwas, Rakesh Saur, and V. S. Rathore

40 Landslide Risk Dynamics Modeling Using AHP-TOPSIS Model, Computational Intelligence Methods, and Geospatial Analytics: A Case Study of Aizawl City, Mizoram—India
Gospel Rohmingthangi, F. C. Kypacharili, Alok Bhushan Mukherjee, and Bijay Singh Mipun

41 Delineation and Assessment of Groundwater Potential Zones Using Geospatial Technology and Fuzzy Analytical Hierarchy Process Model
Hundashisha Thabah and Bijay Singh Mipun

42 COVID-19: Geospatial Analysis of the Pandemic—A Case Study of Bihar State, India, Using Data Derived from Remote Sensing Satellites and COVID-19 National Geoportal
Pallavi Kumari, Richa Sharma, and Virendra Singh Rathore

43 Assessment of the Relationship Between Rainfall Trend and Flood Impact: A Case Study of Tinsukia District, Assam
Govind Sharma

44 Learning Deep Features and Classification for Fresh or off Vegetables to Prevent Food Wastage Using Machine Learning Algorithms
Prateek Sanghi, Sandeep Kumar Panda, Chinmayee Pati, and Pradosh Kumar Gantayat

45 Secure Trust Level Routing in Delay-Tolerant Network with Node Categorization Technique
Pradosh Kumar Gantayat, Sadhna Mohapatra, and Sandeep Kumar Panda

46 Predicting the Trends of COVID-19 Cases Using LSTM, GRU and RNN in India
Sweeti Sah, Akash Kamerkar, B. Surendiran, and R. Dhanalakshmi

47 Biogeography-Based Optimization
Suraj Sharma and D. Chandrasekhar Rao

48 Novel Sentiment Analysis Model with Modern Bio-NLP Techniques Over Chronic Diseases
Palacharla Sri Varun, Gonugunta Leela Manohar, Tamatamala Santhosh Kumar, and C. S. Pavan Kumar

49 Medical Diagnosis for Incomplete and Imbalanced Data
Sravani Sribhashyam, Satya Koganti, Muvvala Vasavi Vineela, and G. Kalyani

50 SafeXAI: Explainable AI to Detect Adversarial Attacks in Electronic Medical Records
Shymalagowri Selvaganapathy, Sudha Sadasivam, and Naveen Raj

51 Exploring Historical Stock Price Movement from News Articles Using Knowledge Graphs and Unsupervised Learning
Amol Jain, Binayak Chakrabarti, Yashaswi Upmon, and Jitendra Kumar Rout

52 Comparative Study of Classical and Quantum Cryptographic Techniques Using QKD Simulator
Cherry Mangla and Shalli Rani

53 A Novel Approach to Encrypt the Data Using DWT and Histogram Feature
Sandeep Kumar Srivastava, Sandhya Katiyar, and Sanjay Kumar

54 Auto-generation of Smart Contracts from a Domain-Specific XML-Based Language
Vimal Dwivedi and Alex Norta

Author Index

About the Editors

Suresh Chandra Satapathy holds a Ph.D. in Computer Science and is currently working as Professor at KIIT (Deemed to be University), Bhubaneswar, Odisha, India. He held the position of National Chairman Div-V (Educational and Research) of the Computer Society of India and is a Senior Member of IEEE. He has been instrumental in organizing more than 20 international conferences in India as Organizing Chair. As Corresponding Editor, he has edited more than 30 book volumes in the Springer LNCS, AISC, LNEE, and SIST series. He is active in research in the areas of swarm intelligence, machine learning, and data mining. He developed a new optimization algorithm known as Social Group Optimization (SGO), published in a Springer journal. He has delivered a number of keynote addresses and tutorials in his areas of expertise at various events in India. He has more than 100 publications in reputed journals and conference proceedings. Dr. Suresh is on the editorial boards of IGI Global, Inderscience, and Growing Science journals. He is also Guest Editor for the Arabian Journal for Science and Engineering, published by Springer.

Peter Peer is Full Professor of computer science at the University of Ljubljana, Slovenia. There he heads the Computer Vision Laboratory, coordinates the double degree study program with Kyungpook National University, South Korea, and serves as Vice-Dean for economic affairs. He received his doctoral degree in computer science from the University of Ljubljana in 2003. During his postdoctoral work, he was an invited researcher at CEIT, San Sebastian, Spain. His research interests focus on biometrics and computer vision. He has participated in several national and EU-funded R&D projects. He has published more than 100 research papers in leading international peer-reviewed journals and conferences. He is co-organizer of the Unconstrained Ear Recognition Challenge and the Sclera Segmentation Benchmarking Competition. He serves as Associate Editor of IEEE Access and IET Biometrics. He is a member of the EAB, IAPR, and IEEE.

Dr. Jinshan Tang is currently Professor in the College of Computing at Michigan Technological University. He received his Ph.D. degree from Beijing University of Posts and Telecommunications. He completed postdoctoral training at Harvard Medical School and the National Institutes of Health. Dr. Tang's research covers a wide range of areas related to image processing and imaging technologies. His specific research interests include machine learning, biomedical image analysis and biomedical imaging, biometrics, computer vision, and image understanding. He has obtained more than three million US dollars in grants as PI or Co-PI. He has published more than 110 refereed journal and conference papers. He has also served as Committee Member at various international conferences. He is a Senior Member of IEEE. He is also Co-Chair of the Technical Committee on Information Assurance and Intelligent Multimedia-Mobile Communications, IEEE SMC Society. Dr. Tang has served as Editor or Guest Editor of more than 10 journals.

Dr. Vikrant Bhateja is Associate Professor, Department of ECE, SRMCEM, Lucknow. His areas of research include digital image and video processing, computer vision, medical imaging, machine learning, pattern analysis, and recognition. He has around 160 quality publications in various international journals and conference proceedings. He is Associate Editor of IJSE and IJACI. He has edited more than 30 volumes of conference proceedings with Springer Nature. He is presently Editor-in-Chief of the IGI Global journal IJNCR.

Dr. Anumoy Ghosh is currently serving as Head and Assistant Professor, Department of Electronics and Communication Engineering, National Institute of Technology Mizoram. He received his Ph.D. from IIEST Shibpur, India. His research has been in the areas of antennas, electromagnetic periodic structures, RF energy scavenging, and microwave passive circuits. He has published 11 journal and international conference papers. These appeared in journals with SCI impact factors and SCOPUS indexing, and in conference proceedings of Springer, IEEE, etc. Four research scholars are currently working under his supervision.

Chapter 1

Automated Flower Species Identification by Using Deep Convolution Neural Network
Shweta Bondre and Uma Yadav

Abstract In machine learning, image classification plays a very important role in interpreting images. In the past, recognition of flower species has been based on the geometry, texture, and form of different flowers. Nowadays, flower identification is widely used to recognize medicinal plant species. There are about 400,000 flowering plant species. Modern search engines have mechanisms to search for and identify images containing a flower, but they lack robustness because of the enormous number of flower species worldwide. In this proposed research work, a machine learning method based on a CNN is used to classify flower species. The model is trained on labeled data. When it is given an unseen pattern, the predictive model predicts the flower species from what it has learned from the training data. Images of flowers are acquired with the built-in camera of a mobile phone. Complex features are then extracted from these images using a pretrained network.

1.1 Introduction
Machine learning is a technology that learns from past data to produce better output. Machine learning works with two kinds of values: discrete and continuous. Applications of machine learning include computer vision, pattern recognition, weather forecasting, biometric attendance, spam detection, human disease detection, sentiment analysis, etc. There are three types of machine learning: supervised, unsupervised, and reinforcement learning. Flowers are a very attractive feature of plants. Because they exist in such a wide variety, it is not possible for a layperson to recognize the name of a flower species. Two important features of a flower, its shape and color, can be used to train the model.

S. Bondre · U. Yadav
G H Raisoni College of Engineering, Nagpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_1


S. Bondre and U. Yadav

Our mission is, therefore, to support ordinary people by providing them with reliable and precise results. Because of substantial advances in machine learning, image-based recognition has become an easy way to identify flower species. In the proposed framework, we developed a flower image classification model using a CNN to produce efficient results. Previously collected, labeled data of various flowers are used to train the model. The model takes a flower image as input and predicts the name and family of the flower.

1.2 Related Work
Recognizing a flower without the help of an expert botanist is a tedious task because flowers exist in a wide variety of colors and shapes. A way to identify flower species without a botanist would therefore be of great help to the pharmaceutical and cosmetic industries.

The color of a flower is one of the important features used in [1]. That work uses GIST and color features to classify the input images, and its model, developed using an SVM, gave 85.93% accuracy. Tiay et al. [2] developed a flower recognition model based on image processing, using Hu's seven moments and the K-nearest neighbor algorithm. Edge characteristics were extracted with Hu's seven-moment algorithm, and the images were then classified with K-nearest neighbor. A comparative analysis of different models for image classification, such as random forest, SVM, KNN, and multilayer perceptron, was carried out in [3].

The work in [4] describes a model for iris flower species using a dataset preloaded in MATLAB. The dataset is clustered into three species using MATLAB's neural network clustering tool and the k-means algorithm. Neural network clustering is used to cluster large datasets without any supervision. It is also used for feature extraction, pattern recognition, image segmentation, vector quantization, data mining, and function approximation.

Guru et al. [5] explored flower image classification using a K-nearest neighbor classifier, in which a threshold-based segmentation algorithm is used [6]. Texture features are then extracted using the gray-level co-occurrence matrix (GLCM), Gabor filters, or a combination of both, and classification is performed with K-nearest neighbor. The work in [2] also uses the edge characteristics and color of flower images for classification. It derives flower characteristics using Hu's seven-moment algorithm and a histogram, and classifies them with K-nearest neighbor. However, in that paper, accuracy is lower for colored flowers with similar shapes.

Feature extraction methods fall into two categories: global and local descriptors. Global descriptors are divided into shape, texture, color, and others, while local descriptors include ORB, BRIEF, SIFT, and SURF, as shown in Fig. 1.1.
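Several of the surveyed systems [2, 5] classify hand-crafted feature vectors with K-nearest neighbor. The following minimal NumPy sketch shows that classification step and assumes the feature vectors have already been extracted. The function name `knn_predict`, the choice k = 3, and the toy two-dimensional vectors are illustrative assumptions, not details from the cited papers.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training feature vector.
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return votes.most_common(1)[0][0]

# Toy "feature vectors" for two species (hypothetical values).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["daisy", "daisy", "rose", "rose"]
print(knn_predict(train_x, train_y, [0.15, 0.15]))  # daisy
```

In a real system, each row of `train_x` would be a concatenated descriptor such as Hu moments plus a color histogram. Features on very different scales are usually normalized first, because the Euclidean distance would otherwise be dominated by the largest-valued components.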

1 Automated Flower Species Identification by Using Deep …


Fig. 1.1 Feature extraction and their types [7]

Fig. 1.2 Feature extraction stages of clear image [7]

1.2.1 Feature Extraction
Computer vision offers a wide range of detection algorithms. Design, structure, and color are the primary characteristics that can distinguish different plant groups and their flowers. These global characteristics are a suitable choice for quantifying and describing a whole image. However, many flower species share similar attributes, so a single feature vector will not produce accurate results. As shown in Fig. 1.2, the image can be described more accurately by combining measurements from different kinds of features.

1.2.2 Local Feature Descriptor
The local feature descriptor plays an essential role in computer vision and pattern recognition [8]. In image matching based on local feature descriptors, feature detection is the first major step. Figure 1.3 shows the categorization of commonly used local feature descriptors.

Fig. 1.3 Commonly used local descriptor [7]
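Binary local descriptors such as ORB and BRIEF are usually matched by Hamming distance, that is, the number of differing bits. The NumPy sketch below shows brute-force matching of packed 256-bit descriptors (32 bytes, the size ORB produces). The nearest/second-nearest check is Lowe's ratio test, added here as an assumption rather than something this chapter describes. The descriptors are random stand-ins, and the function names are hypothetical.

```python
import numpy as np

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between two sets of packed binary descriptors (uint8 rows)."""
    # XOR every descriptor in A with every descriptor in B, then count the differing bits.
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force matching with a ratio test; returns (index_in_a, index_in_b) pairs."""
    dist = hamming_matrix(desc_a, desc_b)
    matches = []
    for i, row in enumerate(dist):
        first, second = np.argsort(row)[:2]
        # Accept the best match only if it is clearly closer than the runner-up.
        if row[first] < ratio * row[second]:
            matches.append((i, int(first)))
    return matches

rng = np.random.default_rng(0)
desc_b = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)  # 50 random 256-bit descriptors
desc_a = desc_b[[3, 7, 11]].copy()                             # three of them seen again...
desc_a[0, 0] ^= 1                                              # ...one with a single flipped bit
print(match_descriptors(desc_a, desc_b))  # [(0, 3), (1, 7), (2, 11)]
```

Libraries such as OpenCV provide optimized brute-force and approximate matchers for the same task. The sketch only makes the distance computation explicit.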

1.2.3 Global Descriptor
Two different approaches are used to build feature vectors. With global feature vectors, the individual features of each flower are combined into a single global feature. The bag of visual words (BOVW) approach draws on both: it starts from local feature vectors and aggregates them into a global feature vector for the whole image [9].
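The BOVW idea can be sketched in NumPy under the following assumptions. A small k-means (Lloyd's algorithm with farthest-point seeding) builds a codebook of visual words from local descriptors. Each image is then encoded as a normalized histogram counting how many of its descriptors fall nearest to each word. The function names, the codebook size `k`, and the toy 4-D descriptors are hypothetical. Real systems cluster SIFT- or SURF-like descriptors into far more words.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors into k visual words with k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = [descriptors[rng.integers(len(descriptors))]]
    for _ in range(1, k):
        # Farthest-point seeding: the next center is the descriptor farthest from those chosen.
        d2 = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(descriptors[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign every descriptor to its nearest center (squared Euclidean distance).
        labels = np.argmin(((descriptors[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members):  # an empty cluster keeps its previous center
                centers[c] = members.mean(axis=0)
    return centers

def bovw_histogram(image_descriptors, codebook):
    """Encode one image's local descriptors as a normalized histogram over the visual words."""
    d2 = ((image_descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    words = np.argmin(d2, axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy local descriptors: two tight groups, standing in for descriptors pooled from training images.
rng = np.random.default_rng(1)
descriptors = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(10.0, 0.1, (20, 4))])
codebook = build_codebook(descriptors, k=2)
print(bovw_histogram(descriptors[:5], codebook))  # all five fall into the same visual word
```

The resulting fixed-length histogram can be fed to any classifier, regardless of how many local descriptors each image produced.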

1.3 Global Feature Extraction
The global descriptors used are as follows:
- Hu moments quantify flower shape.
- A color histogram quantifies flower color.
- Haralick texture features quantify the surface texture of the flower.

Images are treated as matrices, and the extracted feature vectors need an efficient storage format. After the required libraries are imported, each image is processed one at a time. It is resized to a fixed dimension of (500, 500), and its three global features are extracted and combined into a single feature vector. The vector is then saved together with the image's label in HDF5 file format.
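The per-image pipeline can be sketched in NumPy as follows. This is not the authors' code, and several parts are assumptions:
- A nearest-neighbor resize stands in for a library call such as OpenCV's `cv2.resize`.
- Hu moments are computed directly from their standard definition.
- The color histogram uses 8 bins per RGB channel, an assumed setting.
- Only three Haralick statistics (contrast, energy, homogeneity) from a single horizontal co-occurrence matrix are computed, not the full Haralick set.

Writing the vectors to HDF5 (for example with the h5py library) is left out.

```python
import numpy as np

def resize_nearest(img, size=(500, 500)):
    """Nearest-neighbor resize to a fixed (height, width); stands in for e.g. cv2.resize."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def hu_moments(gray):
    """Seven Hu moment invariants of a 2-D intensity image (shape descriptor)."""
    gray = gray.astype(float)
    y, x = np.mgrid[:gray.shape[0], :gray.shape[1]]
    m00 = gray.sum()
    xc, yc = (x * gray).sum() / m00, (y * gray).sum() / m00

    def eta(p, q):
        # Normalized central moment: invariant to translation and scale.
        mu = (((x - xc) ** p) * ((y - yc) ** q) * gray).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])

def color_histogram(rgb, bins=8):
    """Normalized 3-D RGB histogram with bins**3 values (color descriptor)."""
    hist, _ = np.histogramdd(rgb.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def glcm_texture(gray, levels=8):
    """Contrast, energy and homogeneity of a horizontal gray-level co-occurrence
    matrix: a small subset of the Haralick texture statistics."""
    q = gray.astype(int) * levels // 256               # quantize 0..255 into 0..levels-1
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()  # horizontally adjacent pixel pairs
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return np.array([
        (glcm * (i - j) ** 2).sum(),         # contrast
        (glcm ** 2).sum(),                   # energy (angular second moment)
        (glcm / (1 + np.abs(i - j))).sum(),  # homogeneity
    ])

def global_feature(rgb):
    """Resize one image, extract the three global descriptors and concatenate them."""
    rgb = resize_nearest(rgb, (500, 500))
    gray = rgb.mean(axis=2)
    return np.concatenate([hu_moments(gray), color_histogram(rgb), glcm_texture(gray)])

rng = np.random.default_rng(0)
flower = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)  # stand-in for a photo
print(global_feature(flower).shape)  # (522,)
```

The vector length is 7 Hu moments + 8³ histogram bins + 3 texture statistics = 522. A practical pipeline would usually rescale these components to comparable ranges before training a classifier, since the histogram entries are much smaller than the texture contrast.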

1.3.1 Training Images
Thousands of images were taken from the ImageNet website. The FLOWERS17 dataset from the Visual Geometry Group at the University of Oxford was also used.


Fig. 1.4 Dataset analysis

This research paper focuses specifically on classifying flowers into their groups. There are five types of flowers, and each type contains hundreds of photographs showing flowers from various sides and in various colors [10]. There are 3670 flower images in total, divided among the five types. The daisy folder contains 633 images, the dandelion folder 898, the roses folder 641, the sunflowers folder 699, and the tulips folder 799. Figure 1.4 shows that the dataset has a variety of images of every type for analysis.
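The per-folder counts can be checked with a few lines of Python. The sketch also computes each class's share and inverse-frequency class weights, n_samples / (n_classes × n_c). A training loop could use these weights to offset the mild class imbalance. The weighting is an assumption added for illustration; the chapter does not say that class weights were used.

```python
# Image counts per class folder, as reported for the five-class dataset.
counts = {"daisy": 633, "dandelion": 898, "roses": 641, "sunflowers": 699, "tulips": 799}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}
# Inverse-frequency ("balanced") weights: n_samples / (n_classes * n_c).
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

print("total images:", total)  # total images: 3670
for name, n in counts.items():
    print(f"{name:<11}{n:>5}  share={shares[name]:.1%}  weight={weights[name]:.3f}")
```

Dandelion, the largest class at about a quarter of the images, receives the smallest weight, and daisy, the smallest class, receives the largest.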

1.3.2 Deep Learning Using CNN
The dataset consists of five types of flowers, and the image classifier is developed using TensorFlow. The collected images are taken as input, and a deep neural network is trained on them. The process ends once each flower has been assigned to the correct category. In this analysis, the flow begins by supplying a set of flower images as input. There are five flower classes: daisy, roses, dandelion, tulips, and sunflowers. The DNN has to be trained on all 3670 of these images before the system can recognize them [11]. FLOWERS17 and FLOWERS102, challenging flower datasets available on the Internet, are from the Visual Geometry Group at the University of Oxford.

An artificial neural network (ANN, Fig. 1.6) with several layers between the input and output layers is known as a deep neural network (DNN, Fig. 1.5). Whether the relationship between input and output is linear or nonlinear, the DNN learns the mathematical transformation that converts the input into the output. The input passes through the layers, and the network computes the probability of each output.

A convolutional neural network (CNN) is a form of neural network that employs both convolutional and pooling layers. To extract features, the convolution layer compresses an area, or a collection of elements in the input data, into a smaller area. Within a given window, the pooling layer selects the element with the highest value. Together, these layers extract the key attributes of the input that are used for classification. The CNN shown in Fig. 1.7 is used in this research work with transfer learning to recognize flower species. In a CNN, only the last layer is fully connected. In an ANN, by contrast, each neuron is connected to every neuron of the next layer. This is not suitable for images, because the number of weights then grows with the number of pixels.

Flower species recognition combines object recognition and image classification, as the system must identify a flower in the image and recognize which species it belongs to. An intelligent system must be trained with a large set of images to recognize flower species, so that it can predict the species from its learned patterns [13]. This approach is referred to as "supervised learning," in which learning is guided by labels supplied from outside the system.

Fig. 1.5 Deep neural network [12]

Fig. 1.6 Artificial neural network [1]
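The convolution and pooling operations described above can be illustrated with a small NumPy sketch. Like most deep learning frameworks, it computes cross-correlation, meaning the kernel is not flipped. The 6 × 6 image, the two-tap edge kernel, and the 2 × 2 pooling window are illustrative choices, not settings from this chapter.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolution layer computes."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each output value summarizes one kh x kw patch of the input.
            out[r, c] = (image[r:r + kh, c:c + kw] * kernel).sum()
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                    # dark left half, bright right half
kernel = np.array([[-1.0, 1.0]])      # responds to left-to-right brightness increases
feature_map = conv2d(image, kernel)   # shape (6, 5); column 2 holds the edge response
print(max_pool(feature_map))
```

The pooled map is a 3 × 2 array whose second column is all ones. The vertical edge found by the convolution therefore survives pooling at a lower resolution. In a trained CNN the kernel values are learned rather than hand-chosen.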

Fig. 1.7 Convolution neural network [11]
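Counting parameters makes concrete why full connectivity is unsuitable for images. The sketch below assumes a 500 × 500 RGB input (the size used in Sect. 1.3), a hypothetical fully connected layer of 100 units, and a hypothetical convolution layer of 32 filters of size 3 × 3.

```python
def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected layer: every input feeds every unit."""
    return n_inputs * units + units

def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a convolution layer; the count does not depend on image size."""
    return kernel_h * kernel_w * in_channels * filters + filters

n_pixels = 500 * 500 * 3              # flattened 500 x 500 RGB image
print(dense_params(n_pixels, 100))    # 75000100
print(conv_params(3, 3, 3, 32))       # 896
```

The convolution layer reuses the same small filters at every image position, so its size stays fixed. The dense layer, in contrast, grows linearly with the image resolution.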


Fig. 1.8 Proposed system

1.4 Proposed System
A flower image is captured using a smartphone and then sent to a cloud storage platform. On the server side, the CNN system trained on the FLOWERS17 dataset [9] receives the latest flower image and converts it into a standard matrix form for processing. The transformed image is passed to the CNN, which predicts its class label [14]. After prediction, the label name is sent back to the same user with the same picture ID. The smartphone thus automatically receives the flower name from the cloud storage platform. The proposed system diagram is shown in Fig. 1.8.
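The server-side step can be sketched under stated assumptions. The chapter does not specify the interface between the cloud platform and the CNN, so the following are all hypothetical choices:
- the handler name `handle_upload`;
- the `predict_fn` callback, standing in for the trained CNN;
- the reply format;
- the 299 × 299 input size, the default input size of the Keras Inception-v3 and Xception models.

The class order follows the five dataset folders.

```python
import numpy as np

CLASS_NAMES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

def to_standard_matrix(rgb, size=(299, 299)):
    """Resize (nearest neighbor), scale to [0, 1] and add a batch axis: (1, H, W, 3) float32."""
    h, w = rgb.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = rgb[rows][:, cols]
    return (resized.astype(np.float32) / 255.0)[np.newaxis]

def handle_upload(user, picture_id, rgb, predict_fn):
    """Hypothetical server-side handler: preprocess, classify, and reply with the same IDs."""
    batch = to_standard_matrix(rgb)
    probs = np.asarray(predict_fn(batch))[0]  # predict_fn is assumed to return (1, n_classes)
    best = int(np.argmax(probs))
    return {"user": user, "picture_id": picture_id,
            "label": CLASS_NAMES[best], "confidence": float(probs[best])}

def fake_model(batch):
    """Dummy stand-in for the trained CNN: always returns the same class probabilities."""
    return np.array([[0.05, 0.10, 0.70, 0.10, 0.05]])

photo = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an uploaded smartphone photo
print(handle_upload("user42", "IMG_001", photo, fake_model))
```

Echoing `user` and `picture_id` back in the reply lets the phone pair each answer with the photo it sent, as the system description requires.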

1.5 Results
The CNN system is built in the Python programming language. First, features are extracted from the images in the training dataset using the CNN; the model was trained with a batch size of 32 for 150 epochs. Second, different machine learning classifiers are trained on these features [15]: Gaussian Naïve Bayes, random forests, decision trees, K-nearest neighbor, logistic regression, and bagging trees. Finally, random test images are given to the network for label prediction to evaluate the accuracy of the system. The prediction output is shown in Fig. 1.9.

Fig. 1.9 Output of flower prediction

Fig. 1.10 Accuracy of various machine learning classifiers on CNN extracted features

The accuracy of the various classifiers on the CNN-extracted features is shown in Fig. 1.10. The CNN combined with transfer learning as a feature extractor outperforms color channel statistics, local binary patterns (LBP), color histograms, and Hu moments. Instead of training a CNN from scratch, transfer learning is used: Inception-v3 and Xception networks pretrained on a very large dataset serve as feature extractors. The various global feature descriptors are extracted from the training dataset and evaluated with the random forests classifier, which reliably outperforms the other classifiers on these descriptors. The proposed scheme and the global feature descriptors are compared on the FLOWERS17 dataset in Fig. 1.11. In that figure, CH is color histogram, LBP is local binary pattern, CS is color statistics, and HM is Hu moments.

Fig. 1.11 Comparison of proposed work (CNN) with global feature extractor
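The second stage of the results pipeline trains several classical classifiers on the same CNN feature vectors and compares their accuracy. It can be sketched with scikit-learn under the following assumptions:
- Synthetic vectors from `make_classification` stand in for the real features. Their size, 2048, matches the globally pooled output of Inception-v3 and Xception.
- The 80/20 split and the hyperparameters are illustrative.

The printed accuracies therefore say nothing about the chapter's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(features, labels, seed=0):
    """Fit several classical classifiers on the same feature vectors; return test accuracy per model."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    models = {
        "Gaussian Naive Bayes": GaussianNB(),
        "Random forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Decision tree": DecisionTreeClassifier(random_state=seed),
        "K-nearest neighbor": KNeighborsClassifier(n_neighbors=5),
        "Logistic regression": LogisticRegression(max_iter=2000),
        "Bagging trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=seed),
    }
    # fit() returns the estimator itself; score() is mean accuracy on the held-out split.
    return {name: model.fit(x_tr, y_tr).score(x_te, y_te) for name, model in models.items()}

# Synthetic stand-in for CNN features: one 2048-D vector per image, five classes.
x, y = make_classification(n_samples=400, n_features=2048, n_informative=40,
                           n_classes=5, random_state=0)
results = compare_classifiers(x, y)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{acc:.3f}")
```

Keeping the feature extractor fixed and swapping only the classifier isolates the classifier's contribution. This is the comparison summarized in Fig. 1.10.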

1.6 Conclusion
The easiest way to identify a plant is by its flower, which is its most beautiful part. This research addresses image categorization by applying a machine learning model built with the TensorFlow framework. The CNN approach was investigated in depth, covering dataset assembly, model training, and classification of images into categories. Recognizing the flower, in turn, helps in learning more about the particular plant. The proposed work uses transfer learning rather than training from scratch. This makes it a quicker way to train the CNN, requiring a smaller dataset and minimal computation. There are millions of flower species around the world, and the approach could easily be extended to classify more of them by collecting more pictures. The proposed system takes a picture of a flower as input and shows the common name of the flower. The model is a convolutional neural network, which has proven to be one of the most effective image classification methods. Various trials have been carried out, and the collected results demonstrate the CNN's efficacy in flower recognition. In the results, logistic regression, bagging trees, and random forest achieved good accuracies of 93.9%, 88.5%, and 90.5%, respectively.

S. Bondre and U. Yadav

Chapter 2

Information Retrieval for Cloud Forensics
Prasad Purnaye and Vrushali Kulkarni

Abstract The use of cloud computing and cloud-based services has increased sharply in the past decade. Despite its many advantages, the extensive use of the cloud has also created a large attack surface that cybercriminals frequently exploit. This calls for real-time, automated detection and capable forensic tools for investigations. Evidence identification so far has been limited to performance studies on datasets that were created long ago and are not specific to cloud environments. In this paper, we introduce a novel dataset for cloud-specific evidence detection. The dataset has two categories: the monitoring database and the evidence database. The monitoring database has 43 features and 9610 records. The evidence database has 360 memory dump files, around 280 GB in total, containing memory dumps of benign virtual machines and of a hostile virtual machine. The dataset will be an important resource for evidence identification research in the cloud environment.

2.1 Introduction
We introduce a novel dataset for cloud forensics aimed at identifying evidence by profiling virtual machines. The virtual machines are monitored at the hypervisor and profiled along three dimensions: disk, memory, and network. Our dataset has two categories and three parts. The two categories are the monitoring database and the evidence database, and the monitoring database has two parts. The first part of the dataset is the monitoring data gathered from the OpenNebula Application Programming Interface (API). OpenNebula is an open-source management tool that helps oversee private, public, and hybrid clouds. It combines existing virtualization technologies and provides advanced-level features, which we used to gather the first monitoring dataset. The second part of the dataset is obtained from the Kernel-based Virtual Machine (KVM). It is similar to the first part, but the monitoring information gathered from KVM is more detailed. This second part is generated using libvirt, an open-source API for managing platform virtualization that provides an interface to manage KVM. The third and final part of the dataset consists of raw memory dumps of the virtual machines, which can serve as evidence in an investigation. Like the second part, it is collected from KVM through libvirt. The memory dumps are acquired using the libvirt API, which does not require any agent inside the virtual machine.

P. Purnaye (B) · V. Kulkarni
School of Computer Engineering and Technology, MIT World Peace University, Pune, Maharashtra 411038, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_2

2.1.1 Comparison to Other Cloud Forensics Datasets
Several other models have been proposed for cloud evidence acquisition, including OCF [1], FROST [2], SNAPS [3], LINEA [4], FORENVISOR [7], kumodocs [6], SecLaaS [8], and AlmaNebula [9]. However, none of these models has published a dataset for evidence detection. Table 2.1 compares the properties of these existing frameworks with our proposed dataset. Unlike them, our dataset consists of features that can be used to profile virtual machine activities related to disk, network, and memory. In addition, it preserves volatile data in the form of raw memory dumps of the deployed virtual machines.

2.2 Dataset
To generate the dataset, a private cloud was set up on a machine with an Intel® Core™ i5-4590 processor, 12 GB of RAM, and a 1 TB HDD. The private cloud used a KVM type-1 hypervisor, with OpenNebula (version 5.12) as the cloud management platform. To simulate a real-time cloud environment, a script generating a synthetic workload was deployed on the virtual machines of the cloud.

2.2.1 Monitoring Database
We have deployed a private cloud using OpenNebula. Virtualized resources can be monitored in two ways: through the abstract APIs provided by OpenNebula, or through the libvirt API for KVM-level monitoring. The first part of the dataset is generated using the OpenNebula Monitoring API, and the second part using the libvirt API. The OpenNebula dataset has fewer parameters than the libvirt dataset: the OpenNebula API provides 13 parameters, whereas KVM provides 39. These parameters are associated with the disk, network, and memory activities of the virtual machine. A detailed description of the dataset is given in Tables 2.2 and 2.3, and the dataset is available on IEEE Dataport [10, 11].

Table 2.1 Comparison of existing datasets with the proposed dataset

| Sr no | Framework | Description | Dataset published | Evidence detection | Volatile data acquisition |
|---|---|---|---|---|---|
| 1 | OCF [1] | Open cloud forensics framework to generate and maintain electronically stored information (ESI) that can be used in a court of law | No | No | No |
| 2 | FROST [2] | Forensic toolkit for the OpenStack cloud platform to gather forensic evidence without the CSP's intervention | No | No | No |
| 3 | SNAPS [3] | A system that stores multiple snapshots of a target virtual machine along with their provenance | No | No | Yes |
| 4 | LINEA [4] | Snapshot tool for virtual network infrastructures in the cloud | No | No | Yes |
| 5 | FORENVISOR [7] | A tool for live forensic analysis in the form of a dynamic hypervisor | No | No | Yes |
| 6 | kumodocs [6] | Analysis tool for Google Docs based on DraftBack, a browser extension that replays the complete history of documents residing in the document folder | No | No | No |
| 7 | Kumodd [5] | Cloud storage forensic tool for cloud drive acquisition, including snapshots of cloud-native artifacts in formats such as PDF | No | No | No |
| 8 | Kumofs [6] | Forensic tool for acquisition/analysis of file metadata residing in the cloud | No | No | No |
| 9 | SecLaaS [8] | Framework that stores virtual machines' logs and provides access to forensic investigators while ensuring the confidentiality of cloud users | No | No | No |
| 10 | AlmaNebula [9] | Forensics-as-a-service model | No | No | No |
| 11 | Our proposed model | Profiles virtual machine activities and acquires volatile evidence | Yes | Yes | Yes |
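Several monitoring features are cumulative counters. Table 2.2, for example, describes DISKRDBYTES and DISKWRBYTES as totals since the last VM start. For profiling, such counters are often more useful as per-second rates between consecutive polls. The following is a minimal sketch of that conversion. It assumes each record is a dict keyed by the Table 2.2 column names, with LAST_POLL as an epoch timestamp in seconds. Three things are assumptions rather than documented dataset behavior: treating NETRX/NETTX as cumulative, treating a counter decrease as a VM restart, and the `_PER_S` column suffix.

```python
from typing import Dict, List

# Cumulative counter columns from Table 2.2. DISK* counters are documented as
# totals since the last VM start; NETRX/NETTX are assumed to be cumulative too.
COUNTERS = ["NETRX", "NETTX", "DISKRDBYTES", "DISKWRBYTES", "DISKRDIOPS", "DISKWRIOPS"]


def counter_rates(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Turn consecutive monitoring records of ONE VM into per-second rates."""
    ordered = sorted(records, key=lambda r: r["LAST_POLL"])
    rates: List[Dict[str, float]] = []
    for prev, curr in zip(ordered, ordered[1:]):
        dt = curr["LAST_POLL"] - prev["LAST_POLL"]  # epoch seconds between polls
        if dt <= 0:
            continue  # duplicate poll: no meaningful rate
        row: Dict[str, float] = {"VMID": curr["VMID"], "LAST_POLL": curr["LAST_POLL"]}
        for name in COUNTERS:
            delta = curr[name] - prev[name]
            if delta < 0:
                # Counter went backwards: assume the VM restarted, so the new
                # value is what has accumulated since the restart.
                delta = curr[name]
            row[name + "_PER_S"] = delta / dt
        rates.append(row)
    return rates
```

Records from different VMs should be grouped by VMID before calling the function, since rates are only meaningful between polls of the same machine.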

2.2.2 Evidence Database
The second category of the dataset is the evidence database [10], which is gathered with the KVM extension using the libvirt API. A volatile data dump module that we implemented takes raw memory dumps of the virtual machines at regular intervals. In total, the database stores 360 memory dumps of five virtual machines deployed for over 30 h, and the average size of one memory dump is around 800 MB. The evidence database contains files belonging to both benign and malicious virtual machines. These data files are compressed into a 79.5 GB memory evidence dataset. Of this preserved memory dump dataset, 79 files, 17.3 GB in size, were generated during the attack. This means that 21.76% of the data (by size) is potential evidence, which can reduce storage and processing time during a forensic investigation. However, these statistics only reflect the current simulation scenario and will change with the attack duration.

Table 2.2 Category and description of features of OpenNebula dataset

| Sr no | Category | Parameter | Description |
|---|---|---|---|
| 1 | Meta-data | VMID | The ID of the VM |
| 2 | | LAST_POLL | EPOCH timestamp |
| 3 | | STATE | State of the VM (explained below) |
| 4 | | MAC | The physical (MAC) address of the VM |
| 5 | | IP | IPv4 addresses associated with the VM at the LAST_POLL time instant |
| 6 | Memory | CPU | Percentage of CPU consumed |
| 7 | | MEMORY | Memory consumption in kilobytes |
| 8 | Network | NETRX | Received bytes from the network |
| 9 | | NETTX | Sent bytes to the network |
| 10 | Disk | DISKRDBYTES | Read bytes from all disks since last VM start |
| 11 | | DISKWRBYTES | Written bytes to all disks since last VM start |
| 12 | | DISKRDIOPS | Read IO operations from all disks since last VM start |
| 13 | | DISKWRIOPS | Write IO operations to all disks since last VM start |
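The size figures quoted for the evidence database are mutually consistent, as a quick check shows. The uncompressed total assumes every dump has exactly the 800 MB average size, so it is only an estimate. It is nevertheless in line with the roughly 280 GB stated in the abstract:

```latex
\frac{17.3\,\text{GB}}{79.5\,\text{GB}} \approx 0.2176 = 21.76\%,
\qquad
\frac{79}{360} \approx 0.219 \;(\approx 21.9\% \text{ of the files}),
\qquad
360 \times 800\,\text{MB} = 288{,}000\,\text{MB} \approx 288\,\text{GB}
```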

2.3 Training and Testing
Because of the nature of the data, the dataset exhibits a class imbalance between the attack and normal classes. Even without any explicit control over the class balance ratio, the Naïve Bayes classifier provides better overall classification accuracy on both the OpenNebula and KVM datasets. Because of the class imbalance, we use sensitivity and specificity, rather than accuracy alone, to measure performance in the evaluation. The training and testing splits are fixed and available on the dataset website [10, 11].
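Plain accuracy can look good on imbalanced data even when attacks are missed. Sensitivity and specificity expose this. Sensitivity is TP / (TP + FN) on the attack class, and specificity is TN / (TN + FP) on the normal class. The following is a minimal sketch that computes both. It assumes labels are encoded as integers with the attack class as the positive label (1 by default). That encoding is our illustrative choice, not the dataset's documented format.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating `positive` as the attack class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # attack correctly flagged
            else:
                fn += 1  # attack missed
        else:
            if p == positive:
                fp += 1  # normal record flagged as attack
            else:
                tn += 1  # normal record correctly passed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# A classifier that always predicts "normal" on 95 normal and 5 attack records
# reaches 95% accuracy, yet its sensitivity is 0.
print(sensitivity_specificity([0] * 95 + [1] * 5, [0] * 100))
```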

2.4 Conclusion
In this paper, we introduce a novel dataset for cloud forensics aimed at identifying evidence by profiling virtual machines. The dataset is divided into a monitoring database and an evidence database. The monitoring database is generated by monitoring activities on the virtualized disk, network, and memory resources, and these activities can be used to identify data generated during an attack. The evidence database is the volatile database of virtual machine memory dumps acquired through the libvirt API for KVM. Our upcoming work focuses on how the monitoring and evidence databases can be used together for effective cloud forensics.

Table 2.3 Category and description of features of KVM dataset

| Sr No | Category | Parameter | Description |
|---|---|---|---|
| 1 | Meta-data | LAST_POLL | Epoch timestamp |
| 2 | | VMID | The ID of the VM |
| 3 | | UUID | Unique identifier of the domain |
| 4 | | Dom | Domain name |
| 5 | Network | rxbytes | Received bytes from the network |
| 6 | | rxpackets | Received packets from the network |
| 7 | | rxerrors | Number of receive errors from the network |
| 8 | | rxdrops | Number of received packets dropped from the network |
| 9 | | txbytes | Transmitted bytes to the network |
| 10 | | txpackets | Transmitted packets to the network |
| 11 | | txerrors | Number of transmission errors to the network |
| 12 | | txdrops | Number of transmitted packets dropped |
| 13 | Memory | timecpu | Time spent by vCPU threads executing guest code |
| 14 | | timesys | Time spent in kernel space |
| 15 | | timeusr | Time spent in user space |
| 16 | | state | Running state |
| 17 | | memmax | Maximum memory in kilobytes |
| 18 | | mem | Memory used in kilobytes |
| 19 | | cpus | Number of virtual CPUs |
| 20 | | cputime | CPU time used in nanoseconds |
| 21 | | memactual | Current balloon value (in KiB) |
| 22 | | memswap_in | The amount of data read from swap space (in KiB) |
| 23 | | memswap_out | The amount of memory written out to swap space (in KiB) |
| 24 | | memmajor_fault | The number of page faults where disk IO was required |
| 25 | | memminor_fault | The number of other page faults |
| 26 | | memunused | The amount of memory left unused by the system (in KiB) |
| 27 | | memavailable | The amount of usable memory as seen by the domain (in KiB) |
| 28 | | memusable | The amount of memory that can be reclaimed by balloon without causing host swapping (in KiB) |
| 29 | | memlast_update | The timestamp of the last update of statistics (in seconds) |
| 30 | | memdisk_cache | The amount of memory that can be reclaimed without additional I/O, typically disk caches (in KiB) |
| 31 | | memhugetlb_pgalloc | The number of successful huge page allocations initiated from within the domain |
| 32 | | memhugetlb_pgfail | The number of failed huge page allocations initiated from within the domain |
| 33 | | memrss | Resident Set Size of the running domain's process (in KiB) |
| 34 | Disk | vdard_req | Number of read requests on the vda block device |
| 35 | | vdard_bytes | Number of read bytes on the vda block device |
| 36 | | vdawr_reqs | Number of write requests on the vda block device |
| 37 | | vdawr_bytes | Number of written bytes on the vda block device |
| 38 | | vdaerror | Number of errors on the vda block device |
| 39 | | hdard_req | Number of read requests on the hda block device |

References
1. Zawoad, S., Hasan, R., Skjellum, A.: OCF: an open cloud forensics model for reliable digital forensics. In: 2015 IEEE 8th International Conference on Cloud Computing, pp. 437–444 (2015). https://doi.org/10.1109/CLOUD.2015.65
2. Dykstra, J., Sherman, A.T.: Design and implementation of FROST: digital forensic tools for the OpenStack cloud computing platform. Digit. Investig. 10, S87–S95 (2013)
3. Raju, B.K.S.P.K., Geethakumari, G.: SNAPS: towards building snapshot-based provenance system for virtual machines in the cloud environment. Comput. Secur. 86, 92–111 (2019)
4. Castiglione, A., et al.: A novel methodology to acquire live big data evidence from the cloud. IEEE Trans. Big Data 5(4), 425–438 (2017)
5. Roussev, V., McCulley, S.: Forensic analysis of cloud-native artifacts. Digit. Investig. 16, S104–S113 (2016)
6. Roussev, V., et al.: Cloud forensics–tool development studies and future outlook. Digit. Investig. 18, 79–95 (2016)
7. Qi, Z., et al.: ForenVisor: a tool for acquiring and preserving reliable data in cloud live forensics. IEEE Trans. Cloud Comput. 5(3), 443–456 (2016)
8. Zawoad, S., Dutta, A.K., Hasan, R.: SecLaaS: secure logging-as-a-service for cloud forensics. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security (2013)
9. Federici, C.: AlmaNebula: a computer forensics framework for the cloud. Proc. Comput. Sci. 19, 139–146 (2013)
10. Purnaye, P., Kulkarni, V.: Memory dumps of virtual machines for cloud forensics. IEEE Dataport, December 15 (2020). https://doi.org/10.21227/ft6c-2915
11. Purnaye, P., Kulkarni, V.: OpenNebula virtual machine profiling for intrusion detection system. IEEE Dataport (2020). https://doi.org/10.21227/24mb-vt61

Chapter 3

Machine Translation System Combination with Enhanced Alignments Using Word Embeddings
Ch Ram Anirudh and Kavi Narayana Murthy

Abstract Machine Translation (MT) is a challenging problem, and the various techniques proposed for MT have their own strengths and weaknesses. Combining MT systems has shown promising results, and confusion network decoding is one such approach. In this work, we propose using word embeddings to align words from different hypotheses during confusion network generation. Our experiments on the English-Hindi language pair show statistically significant improvements in BLEU scores compared with the baseline system combination. Four data-driven MT systems are combined: phrase-based MT, hierarchical phrase-based MT, bidirectional recurrent neural network MT, and transformer-based MT. All of them are trained on the IIT Bombay English-Hindi parallel corpus.

3.1 Introduction
Machine Translation (MT) is a challenging problem, and the various techniques proposed for MT have their own strengths and weaknesses. For example, neural MT systems are good at fluency, but they suffer from problems with unknown words, the amount of training data, sentence length, word alignment, and domain mismatch [17]. Combining various MT systems can capitalize on these differences to obtain improved translation quality, and system combination has been shown to improve performance [3, 8, 10, 28, 30]. Systems may be combined in two ways: (1) by intervening at the level of their architectures, or (2) by combining only the outputs of the various MT systems. The current work is of the latter kind: confusion network decoding [3, 8, 28]. In this method, the outputs (hypotheses) from various MT systems are combined in a directed acyclic graph structure called a confusion network.

C. R. Anirudh (B) · K. N. Murthy School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_3


System combination using a confusion network involves three steps: confusion network generation, scoring, and decoding. Confusion network generation chooses one of the MT hypotheses as the primary hypothesis and aligns semantically similar words from the remaining hypotheses to it. Scoring assigns scores using features such as a majority voting score (the number of hypotheses a word occurs in), language model probability, and a word penalty. Decoding performs a beam search through the hypothesis space generated by traversing the network. Aligning words from different hypotheses is a crucial step in confusion network generation. Various alignment methods have been proposed: Levenshtein distance [3], Translation Edit Rate (TER) [28], IBM model alignments [22], and Meteor alignments [8]. Jane [8], an open-source statistical MT system, is used for system combination in our work. Jane uses the Meteor [2] matcher for alignments. Meteor aligns words based on exact string matching, stem matching, and synonyms. It is flexible, and further modules can be added subject to the availability of resources. In this work, we add a fourth module to Meteor: a word embeddings matcher. Word embeddings represent words as vectors in an n-dimensional space, capturing a significant amount of semantic information. word2vec [23], fastText [5], and GloVe [26] are well-known word embedding schemes. Semantically similar words tend to form clusters in this space [23]. The distance between word embeddings can therefore be used to align semantically similar words from different MT systems. To the best of our knowledge, this is the first attempt to use word embeddings for alignment in confusion network decoding.
In this work, the outputs of four English-to-Hindi MT systems are combined: phrase-based statistical MT (PBMT) [18], hierarchical phrase-based MT (HPBMT) [7], bidirectional recurrent neural network based neural MT (BRNN) [21], and transformer-based neural MT (TMT) [33]. A system combination baseline is built with the default Meteor modules (exact, stem, and synonymy). Four further system combinations are built using four different word embeddings for alignment: word2vec [23] in two variants (skip-gram and continuous bag of words, cbow), fastText [5], and GloVe [26]. We compare our proposed approach with the individual MT systems and the baseline system combination. Statistically significant improvements in BLEU [25] scores are observed in three cases: skip-gram, cbow, and fastText. The IIT Bombay English-Hindi parallel corpus [20] is used to train the individual MT systems. The PBMT and HPBMT systems are trained using Moses [16], and the BRNN and TMT systems using OpenNMT [14].

3.2 Methodology
Combining the outputs of various MT systems resembles ensemble voting in classification. There, various classifiers are trained and the class label assigned by a majority of them (majority voting) is chosen as the final label. Unlike in a classification task, however, the output of an MT system is a sequence of words. Majority voting over whole outputs may not work, since each MT system may generate a different sequence. To handle this, the hypotheses are arranged in a graph data structure called a confusion network. A confusion network is a directed acyclic graph with the following property: any path from the start node to the end node passes through all the nodes. Each arc label consists of a word and a confidence score. Epsilon (NULL) arcs are allowed and are scored 1. Ideally, the arcs from node i to node i + 1 should carry words that are semantically related. In a naive model, the confidence score of a word could be the number of systems that generated that word. An example (Hindi) of various MT hypotheses and the corresponding confusion network is given in Fig. 3.1.

Fig. 3.1 An example of confusion network (primary hypothesis is in red color). Recoverable text of the figure (transliterated Hindi): "unhone kaha ki yah manoranjan par kendrit hoga"; "usne kaha ki nivasiyon ke manoranjan par kendrit kiya jayega"; "unhone kaha ki paryatakon ke manoranjan par fokas hoga"; "unhone kaha ki avasiyon ke manoranjan par dhyan kendrit hoga".
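The naive scoring rule above (a word's confidence is the number of systems that produced it) can be sketched as majority-vote decoding over a confusion network. In this minimal sketch, the network is a list of slots, and each slot lists one arc label per system, with None standing for an epsilon arc. The slot contents in the usage line are a hypothetical, simplified alignment loosely based on Fig. 3.1, not the network the authors' system produced. Jane's actual decoding uses a weighted log-linear search rather than this per-slot vote.

```python
from collections import Counter
from typing import List, Optional

EPSILON: Optional[str] = None  # label of an empty (NULL) arc


def naive_consensus(network: List[List[Optional[str]]]) -> List[str]:
    """Majority-vote decoding of a confusion network.

    network[i] lists the arc labels between node i and node i + 1, one per
    system; EPSILON means that system contributes no word at this position.
    A label's naive confidence is the number of systems proposing it.
    """
    output: List[str] = []
    for slot in network:
        # most_common(1) picks the highest count; ties go to the label seen first.
        best, _votes = Counter(slot).most_common(1)[0]
        if best is not EPSILON:
            output.append(best)
    return output


network = [
    ["unhone", "usne", "unhone", "unhone"],
    ["kaha", "kaha", "kaha", "kaha"],
    ["ki", "ki", "ki", "ki"],
    ["manoranjan", "manoranjan", "manoranjan", "manoranjan"],
    ["par", "par", "par", "par"],
    ["kendrit", "kendrit", "fokas", EPSILON],
    ["hoga", EPSILON, "hoga", "hoga"],
]
print(" ".join(naive_consensus(network)))
```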

3.2.1 Confusion Network Generation
The method used in Jane [8] to generate a confusion network is briefly described here. Given the outputs of m MT systems, a primary hypothesis is chosen, and the remaining hypotheses are aligned word-to-word with it using the Meteor matcher. These alignments are used to generate the confusion network. Jane builds the network systematically so as to accommodate words from secondary hypotheses that are not aligned to the primary hypothesis but could be potential matches for words from the other hypotheses. First, a confusion network skeleton is created with the words of the primary hypothesis as arcs. The secondary hypotheses are ranked by a language model score built on the hypotheses. Words from the best-ranked hypothesis that are aligned with words in the primary hypothesis are added to the skeleton by inserting a new arc between the corresponding nodes. Words that are not aligned are inserted by adding a new node to the previously inserted arc. The new node is joined with the next node via an epsilon arc and an arc carrying the unaligned word. This procedure continues until all secondary hypotheses have been added to the confusion network.
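The skeleton-and-insert procedure can be approximated by the simplified sketch below. It is not Jane's actual algorithm. Alignments are supplied by hand instead of by Meteor and are assumed to be one-to-one, the secondary hypotheses are assumed to be given already in rank order, and an unaligned word becomes a new slot right after the previously placed word, with epsilon arcs for every earlier hypothesis. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

EPSILON: Optional[str] = None  # empty (NULL) arc label


def build_confusion_network(
    primary: Sequence[str],
    secondaries: Sequence[Tuple[Sequence[str], Dict[int, int]]],
) -> List[List[Optional[str]]]:
    """Simplified confusion-network construction around a primary hypothesis.

    `secondaries` holds (words, alignment) pairs in rank order; `alignment`
    maps a secondary word index to the index of the primary word it is aligned
    with (assumed one-to-one). Returns one list of arc labels per slot, with
    one label per hypothesis (primary first).
    """
    # Skeleton: one slot per primary word; "anchor" remembers the primary index.
    slots = [{"anchor": i, "labels": [w]} for i, w in enumerate(primary)]
    n_hyps = 1
    for words, alignment in secondaries:
        for slot in slots:
            slot["labels"].append(EPSILON)  # default: no word from this hypothesis
        pos = -1  # slot index of the most recently placed word
        for j, word in enumerate(words):
            if j in alignment:
                # Aligned word: new arc parallel to the primary word's arc.
                pos = next(i for i, s in enumerate(slots) if s["anchor"] == alignment[j])
                slots[pos]["labels"][n_hyps] = word
            else:
                # Unaligned word: new slot after the previous word; all earlier
                # hypotheses get an epsilon arc there.
                pos += 1
                slots.insert(pos, {"anchor": None, "labels": [EPSILON] * n_hyps + [word]})
        n_hyps += 1
    return [s["labels"] for s in slots]


primary = ["unhone", "kaha", "ki", "manoranjan", "par", "kendrit", "hoga"]
secondary = ["usne", "kaha", "ki", "nivasiyon", "ke", "manoranjan", "par", "kendrit", "kiya", "jayega"]
alignment = {0: 0, 1: 1, 2: 2, 5: 3, 6: 4, 7: 5, 9: 6}  # hypothetical hand alignment
for slot in build_confusion_network(primary, [(secondary, alignment)]):
    print(slot)
```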

m confusion networks are built, each with one of the m hypotheses as the primary hypothesis. These m networks are combined into a lattice by linking the start nodes of all networks to a single start node and their end nodes to a single end node. The final output is generated by a beam search over all possible hypotheses obtained by traversing this lattice. The objective function for the search is a weighted log-linear combination of features: a weight for each member system, a language model trained on the input hypotheses, a word penalty, and a score for epsilon arcs. The feature weights are obtained by optimizing on a held-out set using Minimum Error Rate Training (MERT) [24].
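The log-linear objective can be illustrated without the search itself. Each candidate path is scored as the weighted sum of its feature values, and the highest-scoring path wins. This sketch scores a small list of already enumerated complete paths instead of running a beam search over the lattice. The feature names, weight values, and feature values are all hypothetical. In the setup described above, the weights would come from MERT tuning.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical feature names; real system-combination features and weights
# come from the toolkit configuration and MERT tuning.
Features = Dict[str, float]


def log_linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (features are already in the log domain)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_path(candidates: Sequence[Tuple[str, Features]], weights: Dict[str, float]) -> str:
    """Return the candidate translation with the highest log-linear score."""
    if not candidates:
        raise ValueError("no candidate paths")
    words, _ = max(candidates, key=lambda c: log_linear_score(c[1], weights))
    return words


# A negative weight on the word count acts as a word penalty.
weights = {"system_votes": 1.0, "lm_logprob": 0.5, "word_count": -0.2, "epsilon_arcs": -0.1}
candidates = [
    ("unhone kaha ki manoranjan par kendrit hoga",
     {"system_votes": 2.5, "lm_logprob": -12.0, "word_count": 7, "epsilon_arcs": 1}),
    ("usne kaha ki manoranjan par kendrit kiya jayega",
     {"system_votes": 2.0, "lm_logprob": -15.0, "word_count": 8, "epsilon_arcs": 0}),
]
print(best_path(candidates, weights))
```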

3.2.2 Alignment of Hypotheses Using Word Embeddings
The major contribution of our work is the alignment of hypotheses in the confusion network using semantic similarity based on word embeddings. For this, we modify the implementation of Indic Meteor provided by IIT Bombay,¹ which uses Indo-WordNet [31] for the synonymy module. Words that are left unmatched after exact, stem, and synonymy matching are matched using the cosine similarity of their word embeddings. Words whose cosine similarity score is above a threshold α are considered semantically similar. α is set by maximizing the correlation of the Meteor scores with a dataset of human evaluations of post-editability [1]. The evaluators score each MT output from 1 to 4 based on the effort required to post-edit it:

4 means no post-editing is required;
3 means minimal post-editing;
2 means post-editing is required but is better than translating from scratch;
1 means it is better to translate from scratch.

The dataset consists of 100 sentences from 3 MT systems (PBMT, RNN-based NMT, and Google Translate), evaluated by professional translators. The value of α for each word embedding is shown in Table 3.1. Pre-trained word embeddings for Indian languages provided by Kumar et al. [19] are used for all experiments, with an embedding vector size of 50. Adding word embeddings to the Meteor matcher increases the number of word matches (Table 3.2), as tested on the transformer MT output against the reference translations.²

Table 3.1 Thresholds of semantic similarity, obtained by maximizing the correlation with a human evaluated data-set

| Word embedding | Semantic similarity threshold (α) |
|---|---|
| Skip-gram | 0.82 |
| CBOW | 0.73 |
| fastText | 0.80 |
| GloVe | 0.94 |

¹ https://www.cfilt.iitb.ac.in/~moses/download/meteor_indic/register.html
² Details of the data-set used are given in Sect. 3.4.
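The embedding stage of the matcher can be sketched as follows: among words not yet matched by the exact, stem, and synonym modules, pair up words whose cosine similarity exceeds α. This minimal sketch resolves competing candidates greedily, highest similarity first, with a one-to-one constraint. That is a simplification of how Meteor actually selects among alignments. The toy 2-d vectors in the usage line are invented for illustration and are not the 50-dimensional embeddings of Kumar et al. [19].

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_matches(
    hyp: Sequence[str],
    ref: Sequence[str],
    emb: Dict[str, Vector],
    alpha: float,
    matched_hyp: FrozenSet[int] = frozenset(),
    matched_ref: FrozenSet[int] = frozenset(),
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of leftover words whose cosine similarity exceeds alpha.

    matched_hyp / matched_ref hold word positions already aligned by the
    exact, stem and synonym modules; those positions are skipped here.
    """
    candidates = []
    for i, h in enumerate(hyp):
        if i in matched_hyp or h not in emb:
            continue
        for j, r in enumerate(ref):
            if j in matched_ref or r not in emb:
                continue
            score = cosine(emb[h], emb[r])
            if score > alpha:
                candidates.append((score, i, j))
    # Most similar pairs first; positions break ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_h, used_r = set(matched_hyp), set(matched_ref)
    matches = []
    for _, i, j in candidates:
        if i not in used_h and j not in used_r:
            matches.append((i, j))
            used_h.add(i)
            used_r.add(j)
    return sorted(matches)


emb = {"fokas": [1.0, 0.1], "kendrit": [0.95, 0.15], "dhyan": [0.9, 0.2], "ghar": [0.0, 1.0]}
print(embedding_matches(["fokas", "ghar"], ["kendrit", "dhyan"], emb, alpha=0.82))
```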


Table 3.2 Number of word matches by each module in Meteor when test data is aligned with reference data

| Module | Matches |
|---|---|
| Exact | 25,876 |
| Stem | 1,278 |
| Synonym | 1,602 |
| Embeddings | 5,240 |
| Total | 33,996 |

There are 48,622 tokens in the test data and 51,019 tokens in the reference data.

Table 3.3 Examples of words matched using word embeddings in Meteor, that are missed out by other modules

The words in Table 3.3 illustrate potential word alignments that the exact, stem, and synonym modules fail to match but the word embeddings module does. These examples are taken from Meteor (word2vec) matches between the TMT output and the reference translations. They include Hindi equivalents of loan words or transliterated words, synonyms, inflected forms of the same root, and spelling variants.

3.3 Related Literature
Ensembling using a confusion network for machine translation was first proposed by Bangalore et al. [3], who used a Multiple String Alignment (MSA) algorithm based on the Levenshtein distance between a pair of strings. Matusov et al. [22] proposed using the IBM alignment models of SMT [6]. The set of hypotheses generated by the different MT systems serves as the training data for learning the alignments. If the hypothesis set has m sentences and there are n MT systems, there are m · n(n − 1)/2 pairs of strings for learning the alignments. The authors reported some improvements in BLEU scores for the Chinese-English, Spanish-English, and Japanese-English language pairs. Sim et al. [30] and Rosti et al. [27, 28] used a relatively simpler alignment method based on the edit operations from Translation Edit Rate (TER) [32] computation and obtained improved results. Rosti et al. [28] further improved consensus decoding by adding features such as language model scores, the number of epsilon arcs, and the number of words in the hypothesis to a log-linear model. System combination using Jane [8] outperformed the best single systems as well as the best entries in the WMT 2011 system combination task. Jayaraman and Lavie [12] aligned words across hypotheses by explicit matching. The aligned hypotheses are used to generate a new set of synthetic hypotheses, which are ranked using confidence scores. This method of system combination does not use a confusion network. Heafield et al. [10] enhanced this system further by introducing an alignment-sensitive method for synchronizing the available hypothesis extensions across the systems. They also packed similar partial hypotheses, allowing greater diversity in beam search. Banik et al. [4] follow a similar approach and score the hypotheses using various features, such as language model score, BLEU score, word mover's distance, and cosine similarity between hypotheses using word2vec. Note that word2vec is used there for scoring, not for alignment. To the best of our knowledge, there have been no attempts in the literature to align the hypotheses in confusion networks using word embeddings.
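The number of string pairs in the Matusov et al. setting grows quadratically with the number of systems, because every unordered pair of systems contributes one pair per sentence. As a purely illustrative calculation, consider the four systems and the 2507-sentence test set used in this chapter (Sect. 3.4.1). These values only illustrate the formula and are not a setup reported by Matusov et al.:

```latex
m \cdot \frac{n(n-1)}{2} = 2507 \cdot \frac{4 \cdot 3}{2} = 2507 \cdot 6 = 15{,}042 \ \text{string pairs}
```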

3.4 Set up of the Experiments
3.4.1 Data
The English-Hindi parallel corpus developed and provided by the Center for Indian Language Technologies (CFILT) at IIT Bombay³ is used to train all MT systems. Version 3.0 [20] consists of 1,609,682 sentence pairs from various domains. The development set consists of 520 sentence pairs, and the test set consists of 2507 sentence pairs. The Hindi monolingual corpus, also shared by the same group, consists of approximately 45 million sentences and 844 million tokens.

3.4.2 Data Pre-processing
We use the Moses [16] toolkit for tokenization, cleaning, and true-casing of the English data. The Hindi data is tokenized using the Indic NLP library.⁴ Sentence length is limited to 80 tokens. For the NMT systems, we use byte pair encoding (BPE) [29] word segmentation with 32K merge operations. This segments tokens into sub-words and greatly reduces the vocabulary size. BPE has been found to alleviate the out-of-vocabulary problem, which was a major bottleneck in NMT.

³ http://www.cfilt.iitb.ac.in/iitb_parallel/
⁴ https://anoopkunchukuttan.github.io/indicnlplibrary/
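The BPE learning step can be sketched in a few lines. The procedure repeatedly counts adjacent symbol pairs across a frequency-weighted vocabulary and merges the most frequent pair into a new symbol. The following is a minimal sketch in the style of the original BPE description, using a toy vocabulary and only a handful of merges instead of 32K. The section does not say which BPE implementation the authors used, so these function names are illustrative.

```python
import collections
import re
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pair_counts(vocab: Dict[str, int]) -> Dict[Pair, int]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Dict[Pair, int] = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[a, b] += freq
    return counts


def merge_pair(pair: Pair, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of 'a b' with the merged symbol 'ab'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to num_merges merge operations from a word-frequency table."""
    # Each word becomes space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # most frequent pair (first one on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=4))
```

The learned merges are then applied in the same order to segment training and test text. Frequent words end up as single symbols, and rare words are split into sub-words.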

3.4.3 Training of MT Systems
We use Moses [16] to build the PBMT and HPBMT systems. Word alignments are trained using mgiza and then symmetrized with the grow-diag-final-and heuristic. The reordering model is built with the msd-bidirectional option. A 5-gram language model with Kneser-Ney smoothing is built with lmplz (KenLM), which comes with Moses. Tuning is performed using MERT. HPBMT is built with the default options in Moses. We use OpenNMT [14] to build the BRNN and TMT systems. The BRNN system is trained using LSTM-based bidirectional RNNs with global attention, with 4 encoder and 4 decoder layers. The TMT system is trained using transformers with 6 encoder layers, 6 decoder layers, and 8 attention heads. The Adam [13] optimizer is used in both NMT systems. The choice of hyper-parameters is based on the configurations of various state-of-the-art NMT systems from the Workshop on Asian Translation (WAT) [9] and on our own experimental observations. Both NMT systems are trained on NVIDIA GeForce RTX 2080 GPUs with 8 GB memory.

3.5 Experiments and Results
We train four MT systems, namely PBMT, HPBMT, BRNN and TMT. Five system combinations are performed in total: a baseline with default Meteor alignments (baseline), and alignment with word2vec skip-gram (sg), word2vec CBOW (cbow), fastText (fastText) and GloVe (GloVe). BLEU and RIBES [11] scores are reported in Table 3.4, where an asterisk marks BLEU scores higher than that of the best individual system (TMT). System combination using word embeddings improves BLEU in all cases except GloVe, while the baseline combination performs marginally worse than the best individual system (TMT). To check whether the differences between BLEU scores are statistically significant, we perform Student's t-test by bootstrap resampling [15] for each pair of MT systems. The null hypothesis is that the outputs are from the same system, and we reject it at p < 0.05. The p-values are reported in Table 3.5. Table 3.5 shows that three of the four embedding-based combinations (sg, cbow and fastText) yield a significant improvement in BLEU over both the best individual system (TMT) and the baseline combination. GloVe improves marginally over the baseline, but the difference is not statistically significant; its decrease relative to TMT is not statistically significant either. The differences between GloVe and the other embedding-based combinations are statistically significant. There is no statistically significant difference between the following pairs: TMT-baseline, sg-cbow, sg-fastText, cbow-fastText and HPBMT-BRNN.

C. R. Anirudh and K. N. Murthy

Table 3.4 Performance of individual MT systems and their combinations (* BLEU higher than that of TMT)

MT system   BLEU     RIBES
PBMT        12.06    0.652
HPBMT       13.28    0.655
BRNN        13.31    0.715
TMT         18.66    0.735
baseline    18.52    0.724
sg          18.96*   0.731
cbow        19.00*   0.735
fastText    18.99*   0.731
GloVe       18.53    0.729

Table 3.5 p-values for the difference between BLEU scores for each pair of MT systems (p < 0.05: null hypothesis rejected)

          PBMT     HPBMT    BRNN     TMT      baseline sg       cbow     fastText GloVe
PBMT      1.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
HPBMT     -        1.000    0.361    0.000    0.000    0.000    0.000    0.000    0.000
BRNN      -        -        1.000    0.000    0.000    0.000    0.000    0.000    0.000
TMT       -        -        -        1.000    0.183    0.013    0.006    0.010    0.155
baseline  -        -        -        -        1.000    0.000    0.000    0.000    0.419
sg        -        -        -        -        -        1.000    0.237    0.287    0.000
cbow      -        -        -        -        -        -        1.000    0.332    0.000
fastText  -        -        -        -        -        -        -        1.000    0.000
GloVe     -        -        -        -        -        -        -        -        1.000
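The resampling idea behind the significance test can be sketched as a paired bootstrap over test sentences, in the spirit of [15]: resample the test set with replacement, recompute corpus BLEU for both systems on the same resampled sentences, and report the share of samples in which system A fails to beat system B as an approximate p-value. This is an illustrative re-implementation under simplifying assumptions (pre-tokenized text, a single reference, no smoothing); it is not the script the authors used and does not reproduce the t-test variant described above.

```python
import math
import random
from collections import Counter

def bleu_stats(hyp, ref, max_n=4):
    """Per-sentence sufficient statistics: [hyp_len, ref_len, m1, c1, ..., m4, c4]."""
    stats = [len(hyp), len(ref)]
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        stats.append(sum((hyp_ngrams & ref_ngrams).values()))  # clipped n-gram matches
        stats.append(max(len(hyp) - n + 1, 0))                 # hypothesis n-gram count
    return stats

def corpus_bleu(stats_list, max_n=4):
    """Corpus BLEU (0-100) from summed statistics; single reference, no smoothing."""
    totals = [sum(column) for column in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    log_precision = 0.0
    for n in range(max_n):
        matches, count = totals[2 + 2 * n], totals[3 + 2 * n]
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / count) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

def paired_bootstrap(stats_a, stats_b, samples=1000, seed=0):
    """Approximate p-value: share of resampled test sets where system A does not beat B."""
    rng = random.Random(seed)
    n = len(stats_a)
    not_better = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]  # same sentences for both systems
        if corpus_bleu([stats_a[i] for i in idx]) <= corpus_bleu([stats_b[i] for i in idx]):
            not_better += 1
    return not_better / samples
```

In use, `bleu_stats` would be computed once per test sentence for each system (e.g. hypothetical lists `sg_stats` and `tmt_stats`), and `paired_bootstrap(sg_stats, tmt_stats)` would then give a p-value comparable in spirit to the entries of Table 3.5.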

3.6 Conclusion
In this work, we have used word embeddings for aligning hypotheses in confusion-network-based system combination. Our experiments use Jane [8], an open-source statistical MT toolkit that uses the Meteor matcher for confusion network generation. Meteor is extended with a module for matching words using word embeddings, in which the cosine similarity between word vectors is used to align semantically similar words. Four English-Hindi MT systems, PBMT, HPBMT, BRNN and TMT, are trained on the IIT Bombay English-Hindi parallel corpus. The outputs of these four systems are combined in four different settings using word embeddings: word2vec skip-gram (sg), word2vec CBOW (cbow), fastText and GloVe. A system combination baseline is built with the default Meteor setting, without word embeddings. Three of the four embedding-based combination systems (sg, cbow and fastText) show a statistically significant improvement in BLEU over both the baseline and the best performing individual system (TMT). GloVe shows a marginal improvement in BLEU over the baseline, but it is not statistically significant. Thus, using word embeddings for alignment in confusion network decoding holds promise.
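The embedding-based matching step can be illustrated with a small sketch: words from two hypotheses are paired one-to-one, greedily by descending cosine similarity, keeping only pairs above a threshold. The threshold value, the greedy strategy and the toy two-dimensional vectors are assumptions for illustration; this is not the Meteor or Jane API, and the module's exact matching rule is not given in this section.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def embedding_matches(hyp_a, hyp_b, vectors, threshold=0.7):
    """Greedy one-to-one word matching between two hypotheses (hypothetical threshold)."""
    candidates = []
    for i, word_a in enumerate(hyp_a):
        for j, word_b in enumerate(hyp_b):
            if word_a in vectors and word_b in vectors:  # skip out-of-vocabulary words
                sim = cosine(vectors[word_a], vectors[word_b])
                if sim >= threshold:
                    candidates.append((sim, i, j))
    # Most similar pairs first; each word may be aligned at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_a, used_b, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return sorted(matches)

# Toy vectors standing in for word2vec/fastText/GloVe embeddings.
vectors = {"big": [1.0, 0.1], "large": [0.9, 0.2], "house": [0.0, 1.0],
           "home": [0.1, 0.95], "red": [-1.0, 0.0]}
print(embedding_matches(["big", "house"], ["large", "home", "red"], vectors))
```

Exact string matching would align none of these words; the embedding-based matcher pairs "big" with "large" and "house" with "home", which is the kind of synonym alignment the extended Meteor module is meant to add.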

References
1. Anirudh, C.R., Murthy, K.N.: On post-editability of machine translated texts. Transl. Today (2021). (in press)
2. Banerjee, S., Lavie, A.: Meteor: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, vol. 29, pp. 65–72. Association for Computational Linguistics, University of Michigan, Ann Arbor (29 June 2005)
3. Bangalore, S., Bordel, G., Riccardi, G.: Computing consensus translation from multiple machine translation systems. In: IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU'01, pp. 351–354. IEEE (2001)
4. Banik, D., Ekbal, A., Bhattacharyya, P., Bhattacharyya, S., Platos, J.: Statistical-based system combination approach to gain advantages over different machine translation systems. Heliyon 5(9), e02504 (2019). https://doi.org/10.1016/j.heliyon.2019.e02504
5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016)
6. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
7. Chiang, D.: Hierarchical phrase-based translation. Comput. Linguist. 33(2), 201–228 (2007). https://doi.org/10.1162/coli.2007.33.2.201
8. Freitag, M., Huck, M., Ney, H.: Jane: Open source machine translation system combination. In: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 29–32. Association for Computational Linguistics, Gothenburg, Sweden (Apr 2014). https://doi.org/10.3115/v1/E14-2008
9. Goyal, V., Sharma, D.M.: LTRC-MT simple & effective Hindi-English neural machine translation systems at WAT 2019. In: Proceedings of the 6th Workshop on Asian Translation, pp. 137–140. Association for Computational Linguistics, Hong Kong, China (Nov 2019). https://doi.org/10.18653/v1/D19-5216
10. Heafield, K., Hanneman, G., Lavie, A.: Machine translation system combination with flexible word ordering. In: Proceedings of the Fourth Workshop on Statistical Machine Translation, pp. 56–60 (2009)
11. Isozaki, H., Hirao, T., Duh, K., Sudoh, K., Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 944–952 (2010)
12. Jayaraman, S., Lavie, A.: Multi-engine machine translation guided by explicit word matching. In: Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 101–104 (2005)
13. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015)
14. Klein, G., Kim, Y., Deng, Y., Nguyen, V., Senellart, J., Rush, A.: OpenNMT: Neural machine translation toolkit. In: Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pp. 177–184. Association for Machine Translation in the Americas, Boston, MA (Mar 2018)
15. Koehn, P.: Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 388–395 (2004)
16. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 177–180. Association for Computational Linguistics, Prague, Czech Republic (June 2007)
17. Koehn, P., Knowles, R.: Six challenges for neural machine translation. In: Proceedings of the First Workshop on Neural Machine Translation, pp. 28–39. Association for Computational Linguistics, Vancouver (Aug 2017). https://doi.org/10.18653/v1/W17-3204
18. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 48–54. Association for Computational Linguistics, Edmonton (2003)
19. Kumar, S., Kumar, S., Kanojia, D., Bhattacharyya, P.: "A passage to India": Pre-trained word embeddings for Indian languages. In: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pp. 352–357. European Language Resources Association, Marseille, France (May 2020)
20. Kunchukuttan, A., Mehta, P., Bhattacharyya, P.: The IIT Bombay English-Hindi parallel corpus. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), pp. 3473–3476. European Language Resources Association (ELRA), Miyazaki, Japan (May 2018)
21. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). https://doi.org/10.18653/v1/D15-1166
22. Matusov, E., Ueffing, N., Ney, H.: Computing consensus translation for multiple machine translation systems using enhanced hypothesis alignment. In: 11th Conference of the European Chapter of the Association for Computational Linguistics (2006)
23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119. Curran Associates, Inc. (2013)
24. Och, F.J.: Minimum error rate training in statistical machine translation. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 160–167 (2003)
25. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (Jul 2002). https://doi.org/10.3115/1073083.1073135
26. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
27. Rosti, A.V., Ayan, N.F., Xiang, B., Matsoukas, S., Schwartz, R., Dorr, B.: Combining outputs from multiple machine translation systems. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp. 228–235 (2007)
28. Rosti, A.V., Matsoukas, S., Schwartz, R.: Improved word-level system combination for machine translation. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 312–319 (2007)
29. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1009
30. Sim, K.C., Byrne, W.J., Gales, M.J., Sahbi, H., Woodland, P.C.: Consensus network decoding for statistical machine translation system combination. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'07, vol. 4, pp. IV–105. IEEE (2007)
31. Sinha, M., Reddy, M., Bhattacharyya, P.: An approach towards construction and application of multilingual Indo-WordNet. In: 3rd Global Wordnet Conference (GWC 06), Jeju Island, Korea (2006)
32. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pp. 223–231. The Association for Machine Translation in the Americas, Cambridge (2006)
33. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: NIPS, pp. 6000–6010 (2017)

Chapter 4

Geometry-Based Machining Feature Retrieval with Inductive Transfer Learning
N. S. Kamal, H. B. Barathi Ganesh, V. V. Sajith Variyar, V. Sowmya, and K. P. Soman

Abstract Manufacturing industries have widely adopted the reuse of machine parts as a way to reduce costs and as a sustainable manufacturing practice. Identifying reusable features in the design of parts and finding similar features in a database is an important part of this process. In this work, we use fully convolutional geometric features with inductive transfer learning to extract and learn high-level semantic features from CAD models. The extracted features are compared with those of other CAD models in the database using the Frobenius norm, and the most similar features are retrieved. We then pass the extracted features to a deep convolutional neural network with a spatial pyramid pooling layer, which significantly improves retrieval performance. The results show that the model can effectively capture the geometric elements of machining features.

4.1 Introduction
Computer-aided process planning (CAPP) helps determine the processing steps required to manufacture a product or its parts. It serves as the link between computer-aided design (CAD) and computer-aided manufacturing (CAM). Automated machining feature recognition is a critical component in extracting manufacturing information from CAD models. For machine parts, features are semantically higher-level geometric elements such as holes, passages and slots. Feature recognition is the process of identifying these features from an image or a 3D model of a part.

Approaching machining feature recognition (MFR) as a supervised problem is not feasible for real-time industrial applications, because different industries maintain their own data lakes of 3D CAD models, with feature combinations specific to their requirements. Preparing a labeled corpus and training models to predict a large number of classes is complex and tedious. Motivated by this, we propose an approach that uses inductive transfer learning for geometric feature extraction [1]. CAD models of machining features [2] were converted to point cloud data, and the Minkowski Engine [3] was used to handle sparse convolutions. The Frobenius norm was used to measure similarity between the extracted geometric features, and a spatial pyramid pooling layer was later added to map feature matrices of varying size to a representation of fixed size. Related work on machining feature retrieval is reviewed in Sect. 4.2, the proposed approach is described in Sect. 4.3, and the experiments and observed results are discussed in Sect. 4.4.

N. S. Kamal · V. V. Sajith Variyar (corresponding author) · V. Sowmya · K. P. Soman
Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
H. B. Barathi Ganesh
Federated AI Services, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_4

N. S. Kamal et al.

4.2 Literature Review
Attempts to integrate computer-aided process planning (CAPP) with CAD began decades ago [4], and several approaches to feature recognition followed [5, 6]. The major approaches are graph-based methods [7, 8], hint-based methods [9], convex decomposition, volume decomposition [10] and free-form feature methods [11]. A common problem with all these approaches was high computational complexity. The introduction of neural networks transformed feature recognition, as neural networks are good at recognizing patterns and tolerate noise in input data. Influential recent works on 3D data representation and recognition include 3D ShapeNets [12], VoxNet [13] and PointNet [14]. The most relevant recent work on machining feature recognition, however, is FeatureNet [2]. Its authors used deep 3D convolutional neural networks to recognize manufacturing features from low-level geometric data such as voxels with very high accuracy. To train FeatureNet, they constructed a large-scale dataset of 3D CAD models with labeled machining features, and achieved a recognition accuracy of 96.7% for single features. A recent advance in geometric feature extraction is Fully Convolutional Geometric Features (FCGF) by Choy et al. [1], which computes geometric features in a single pass of a 3D fully convolutional network. The predictions have a one-to-one correspondence with the input in the spatial dimensions. The authors used a UNet structure with skip connections and residual blocks to extract sparse fully convolutional features, and the Minkowski Engine to handle sparse convolutions.

Most existing work on structuring 3D models of machining features relies on low-level geometric characteristics, which makes the resulting models localized and domain specific. Extracting high-level features is expected to yield a more general model that can be applied across domains. Fully convolutional feature extraction has also been shown to be significantly faster than conventional methods [1].

4 Geometry-Based Machining Feature Retrieval …

4.3 Materials and Methods
For benchmarking, we use the synthetic dataset of 24 machining features from [2]. Geometric features were extracted from the CAD models by inductive transfer learning, using a model pre-trained with fully convolutional geometric features for point cloud registration [1]; point cloud registration is thus the source task of the transfer. The number of geometric feature vectors extracted in this way varied from one CAD model to another. To obtain a representation of fixed size, an SPP layer was then introduced as the target task of the inductive transfer learning process. The Frobenius norm was computed to measure the similarity between CAD models, and 3D models were assigned to their respective classes based on the resulting similarity score. The overall proposed approach is shown in Fig. 4.1.

Fig. 4.1 Proposed method for feature extraction and retrieval


4.3.1 3D CAD Models
The dataset used for feature retrieval is adopted from the synthetic dataset generated for FeatureNet [2]. It consists of 24 machining features commonly found in the manufacturing industry (e.g., O-ring, rectangular passage). Each of the 24 classes has 1000 3D CAD models of varying dimensions and orientations. The features are present in the CAD models as portions removed from a solid cube of dimensions 10 cm × 10 cm × 10 cm (Fig. 4.1). The 3D models are available in stereolithography (STL) format, which is widely used in computer-aided manufacturing. STL files describe only the surface geometry of a three-dimensional object, without any representation of color or texture.
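Because an STL file stores only a triangle mesh, converting it to a point cloud amounts to sampling points on the surface. The sketch below shows one common way to do this, area-weighted uniform sampling with random barycentric coordinates. The chapter uses Open3D for the conversion; this plain-Python version is only an assumed stand-in for what such a routine does, not Open3D's implementation, and the mesh is passed in as a list of vertex triples rather than read from an STL file.

```python
import math
import random

def triangle_area(a, b, c):
    """Half the magnitude of the cross product of two edge vectors."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def sample_surface(triangles, num_points, seed=0):
    """Sample points uniformly over a mesh given as a list of (a, b, c) vertex triples."""
    rng = random.Random(seed)
    areas = [triangle_area(*t) for t in triangles]
    points = []
    # Larger triangles are picked proportionally more often.
    for a, b, c in rng.choices(triangles, weights=areas, k=num_points):
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:  # reflect back into the triangle so the density stays uniform
            r1, r2 = 1.0 - r1, 1.0 - r2
        points.append(tuple(a[k] + r1 * (b[k] - a[k]) + r2 * (c[k] - a[k]) for k in range(3)))
    return points

# One 10 cm x 10 cm face of the stock cube, split into two triangles (toy example).
face = [((0, 0, 0), (10, 0, 0), (10, 10, 0)), ((0, 0, 0), (10, 10, 0), (0, 10, 0))]
cloud = sample_surface(face, 500)
print(len(cloud), cloud[0])
```

Area weighting matters for CAD data: a machined pocket produces many small triangles next to a few large cube faces, and sampling per triangle instead of per unit area would over-represent the feature geometry.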

4.3.2 Methodology

Feature Extraction with Inductive Transfer Learning
The 3D files in STL format were converted into point cloud data with the Open3D library [15]. The point clouds were given as input to a neural network with a ResUNet architecture, pre-trained on the 3DMatch dataset, which extracts high-level geometric features [1]. The extracted feature vectors were 32-dimensional. The number of extracted feature vectors varied, even between 3D models of the same family. For visualization, the features were projected to three dimensions with t-distributed stochastic neighbor embedding (t-SNE) [16]. Figure 4.2 shows a sample 3D model of a through hole with its extracted features laid over it.

Fig. 4.2 a A sample 3D model from class 'through hole,' b extracted feature vectors projected to 3D space and laid over the model

Frobenius Norm
The extracted feature vectors should, in principle, capture the geometric elements of the machining features. To test this, the extracted features had to be compared using a similarity measure. For this purpose, each of the 24,000 models was paired with random samples from each class, and the Frobenius norms of their feature matrices were computed. The Frobenius norm of a matrix A of dimension m × n is defined as the square root of the sum of the squares of its elements:

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} \qquad (4.1)$$
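Equation (4.1) and a norm-based ranking of candidates can be sketched as below. Each model's feature matrix (one 32-dimensional row per extracted feature, with a varying number of rows) is reduced to its Frobenius norm, and candidates are ranked by how close their norms are to the query's. Ranking by ascending difference (closest first), the helper names and the toy matrices are assumptions for illustration. The example also exposes the weakness discussed in the text: quite different matrices can have exactly the same norm.

```python
import math

def frobenius_norm(matrix):
    """Eq. (4.1): square root of the sum of squared entries of an m x n matrix."""
    return math.sqrt(sum(abs(x) ** 2 for row in matrix for x in row))

def rank_by_norm(query, candidates):
    """Rank (label, matrix) candidates by | ||query||_F - ||candidate||_F |, closest first."""
    q = frobenius_norm(query)
    ranked = sorted(candidates, key=lambda item: abs(q - frobenius_norm(item[1])))
    return [label for label, _ in ranked]

def top_k_hit(query, true_label, candidates, k=5):
    """True if the query's class appears among the k closest candidates (top-k accuracy)."""
    return true_label in rank_by_norm(query, candidates)[:k]

# Hypothetical toy matrices; real FCGF matrices have 32 columns.
candidates = [("a", [[1.0]]), ("b", [[5.0, 0.0]]), ("c", [[0.0, 0.0], [3.0, 4.0]])]
print(rank_by_norm([[3.0, 0.0], [0.0, 4.0]], candidates))
```

Here "b" and "c" tie with the query at a norm of 5.0 even though their entries differ, which illustrates why a single scalar "energy" cannot separate machining features that differ in size and orientation.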

The differences in Frobenius norm obtained for each pair were sorted in descending order along with their corresponding labels, and the position of the true label in this list was used to compute the accuracy. The accuracy and top-5 accuracy obtained were 39% and 86%, respectively. The top-5 accuracy may be adequate for retrieval applications, since a set of similar features is returned. The result shows that FCGF captures some geometric information from the 3D models, but the accuracy is not sufficient for real-world applications. The low performance was later traced not to the extracted geometric features but to the Frobenius norm being a poor similarity measure in this case. Machining features vary significantly in size and orientation even within the same family, and the Frobenius norm captures only the overall energy of a feature matrix, which is not enough for comparison.

Spatial Pyramid Pooling Layer
The next goal was to improve the results with a better way of measuring similarity between feature matrices of different shapes. A deep neural network with a spatial pyramid pooling (SPP) layer was used for this purpose [17, 18]. SPP preserves spatial information by pooling within local spatial bins whose number is fixed (their size scales with the input), which removes the fixed-size input constraint of the network: the output has a fixed dimension irrespective of the input size. The dataset of 24,000 models belonging to 24 classes was split into training, validation and test subsets of 70%, 15% and 15%, respectively. A CNN with 97,272 learnable parameters was used to learn from the extracted features, with spatial pyramid pooling introduced in the pre-final layer (Fig. 4.3). Since the Frobenius norm is the Euclidean norm generalized from vectors to matrices, the Euclidean distance between the output vectors of the SPP layer was taken as the similarity measure between features. The model was trained for only 30 epochs with the Adam optimizer and categorical cross-entropy loss; the loss and accuracy curves are shown in Fig. 4.4.
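The fixed-length property of SPP can be illustrated with a one-dimensional variant applied along the variable "number of feature vectors" axis: at each pyramid level L the rows are split into L bins and max-pooled per feature dimension, so the output length depends only on the pyramid levels and the feature width. The pyramid levels (1, 2, 4), the max pooling and the plain-Python lists are assumptions; the chapter's CNN with 97,272 parameters is not specified in enough detail to reproduce here.

```python
import math

def spp_1d(features, levels=(1, 2, 4)):
    """Max-pool an N x D list of feature vectors into a fixed sum(levels) * D vector."""
    n, d = len(features), len(features[0])
    pooled = []
    for level in levels:
        for k in range(level):
            start = (k * n) // level              # floor of the bin start
            end = -(-((k + 1) * n) // level)      # ceil of the bin end, so no bin is empty
            for j in range(d):
                pooled.append(max(row[j] for row in features[start:end]))
    return pooled

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two hypothetical models with 3 and 7 feature vectors of width 2.
model_a = [[1, 0], [2, 5], [0, 3]]
model_b = [[1, 1]] * 7
print(len(spp_1d(model_a)), len(spp_1d(model_b)), euclidean(spp_1d(model_a), spp_1d(model_b)))
```

Both models map to vectors of length (1 + 2 + 4) × 2 = 14, so a Euclidean distance between them is well defined even though their feature matrices have different numbers of rows, which is exactly what the raw Frobenius-norm comparison could not exploit.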

Fig. 4.3 Architecture with SPP layer used for learning extracted features

Fig. 4.4 a Training and validation loss curves for 30 epochs, b training and validation accuracy for 30 epochs

4.4 Results and Discussion
The inclusion of the SPP layer improved feature retrieval performance. After 30 epochs, the model achieved a test accuracy of 86% and a top-5 accuracy of 95%. Pyramid pooling is robust to object deformations and is suitable for data of higher dimensions. Hence, it was able to handle machining features of varying physical dimensions, and it proved to be a better method than taking the matrix norm of the extracted features directly. Euclidean distance becomes less useful as the dimensionality of the data increases; this may be one reason why it did not work directly on the feature matrices but worked well on the lower-dimensional output of the SPP layer.

One major real-world application of this work is retrieving CAD models from a database that are similar to an input CAD model given by the user. To test the feasibility of this application, CAD models from families of machining features with different geometric properties were taken from the test set, and the most similar features were retrieved. The search first returned other features from the same family and then features from geometrically similar families. Table 4.1 lists the categories of the top features returned for sample files from four families with distinct geometric properties. Family 1 is the family most similar to that of the test feature, and Family 2 is the second most similar.

Table 4.1 Sample files from four families chosen for testing and the top 2 families retrieved from the database

Family of test file    Family 1               Family 2
O-ring                 Circular end pocket    Blind hole
Rectangular passage    Rectangular pocket     Rectangular blind step
Triangular pocket      Triangular passage     Rectangular blind step
6 Sides passage        6 Sides pocket         Circular blind step

The retrieved models were rendered and inspected for geometric similarity. Searching with circle-based features returned CAD models from other circle-based families (e.g., O-ring returned circular end pocket and blind hole). The same held for rectangular, triangular and six-sided features. These outputs indicate that the extracted feature vectors capture the geometric properties of the machining features, and that geometric similarity takes precedence over similarity in size or orientation. Each CAD model in the dataset was assigned a numerical ID for convenience. Table 4.2 shows the IDs of the top retrieved features, with the corresponding Euclidean distances, for a test file from the family 'circular end blind slot'. In this case, all of the top-5 features belong to the same family as the test file. The test file and the top retrieved CAD model are rendered in Fig. 4.5. The two features differ in size, yet the network still captures the similarity between them.
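The remark that Euclidean distance becomes less informative in high dimensions can be checked numerically. The sketch draws random points uniformly from a unit hypercube and measures the relative contrast (max distance - min distance) / min distance from a query point; as the dimension grows, the nearest and farthest neighbours end up almost equally far away. The uniform data, the number of points and the chosen dimensions are arbitrary assumptions for illustration, not measurements on the CAD features.

```python
import math
import random

def relative_contrast(dim, num_points=200, seed=0):
    """(max - min) / min Euclidean distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)

# Contrast shrinks sharply as dimensionality grows.
for dim in (2, 32, 1024):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast approaches zero, ranking neighbours by distance becomes fragile, which is consistent with the better behaviour observed on the compact SPP outputs than on the raw feature matrices.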

Table 4.2 Top-5 models retrieved for a sample 3D model of the family circular end blind slot (ID: 1990)

No.  Model ID  Family                    Euclidean distance
1    1214      Circular end blind slot   8.17
2    1561      Circular end blind slot   8.28
3    1326      Circular end blind slot   8.49
4    1853      Circular end blind slot   8.89
5    1726      Circular end blind slot   8.94

Fig. 4.5 a Test model - ID: 1990, Family: circular end blind slot, b Result no. 1 - ID: 1214, Family: circular end blind slot

4.5 Conclusion and Future Work
The results indicate that the inductive transfer learning model based on fully convolutional neural networks can capture and learn the geometric similarity between 3D models of machining features. Even though the network was pre-trained on real-world 3D scan data, it performed well on 3D CAD model data. Spatial pyramid pooling proved effective in handling feature matrices of varying sizes. The model performed feature retrieval with 95% top-5 accuracy, and retrieval of CAD files from the database proved successful enough for practical applications. This work could be extended to the recognition and retrieval of multiple features present in a single CAD model, and to a generalized shape search application.

References
1. Choy, C., Park, J., Koltun, V.: Fully convolutional geometric features. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), pp. 8957–8965 (2019). https://doi.org/10.1109/ICCV.2019.00905
2. Zhang, Z., Jaiswal, P., Rai, R.: FeatureNet: machining feature recognition based on 3D convolution neural network. Comput.-Aided Des. 101, 12–22 (2018)
3. Choy, C., Gwak, J.Y., Savarese, S.: 4D spatio-temporal ConvNets: Minkowski convolutional neural networks (2019)
4. Grayer, A.R.: A computer link between design and manufacture. Ph.D. Dissertation, University of Cambridge (1976)
5. Shah, J.J., Anderson, D., Kim, Y.S., Joshi, S.: A discourse on geometric feature recognition from CAD models. ASME J. Comput. Inf. Sci. Eng. 1(1), 41–51 (2000)
6. Verma, A.K., Rajotia, S.: A review of machining feature recognition methodologies. Int. J. Comput. Integr. Manuf. 23(4), 353–368 (2010)
7. Malyshev, A., Slyadnev, S., Turlapov, V.: CAD model simplification using graph-based feature recognition (2017)
8. Gao, S., Shah, J.J.: Automatic recognition of interacting machining features based on minimal condition subgraph. Comput.-Aided Des. 30(9), 727–739 (1998)
9. Verma, A.K., Rajotia, S.: A hint-based machining feature recognition system for 2.5D parts. Int. J. Prod. Res. 46(6), 1515–1537 (2008)
10. Kailash, S.B., Zhang, Y.F., Fuh, J.Y.H.: A volume decomposition approach to machining feature extraction of casting and forging components. Comput.-Aided Des. 33(8), 605–617 (2001)
11. Sundararajan, V., Wright, P.K.: Volumetric feature recognition for machining components with freeform surfaces. Comput.-Aided Des. 36(1), 11–25 (2004)
12. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: a deep representation for volumetric shapes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015). https://doi.org/10.1109/cvpr.2015.7298801
13. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, pp. 922–928 (2015). https://doi.org/10.1109/IROS.2015.7353481
14. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation, pp. 77–85 (2017). https://doi.org/10.1109/CVPR.2017.16
15. Zhou, Q.-Y., Park, J., Koltun, V.: Open3D: a modern library for 3D data processing (2018)
16. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
17. Sriram, S., Vinayakumar, R., Sowmya, V., Alazab, M., Soman, K.P.: Multi-scale learning based malware variant detection using spatial pyramid pooling network. In: IEEE INFOCOM 2020, IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, pp. 740–745 (2020). https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9162661
18. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). https://doi.org/10.1109/TPAMI.2015.2389824

Chapter 5

Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture
R. Priyamvada, D. Govind, Vijay Krishna Menon, B. Premjith, and K. P. Soman

Abstract The two key components of Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems are language modeling and acoustic modeling. The language model generates a lexicon, which is a pronunciation dictionary. A lexicon can be created using a variety of approaches. For low-resource languages, rule-based methods are typically employed to build the lexicon. However, because the corpus is often small, this approach does not account for all possible pronunciation variants. Low-resource languages such as Malayalam therefore require a method for developing a comprehensive lexicon as the corpus grows. In this work, we explored deep learning based encoder-decoder models for grapheme-to-phoneme (G2P) conversion in Malayalam. Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) encoders with varying embedding dimensions were used. The performance of the G2P models was measured using the Word Error Rate (WER) and the Phoneme Error Rate (PER). With 1024 embedding dimensions, the BiLSTM encoder achieved the highest phoneme-level accuracy of 98.04% with the lowest PER of 2.57%, and the highest word-level accuracy of 90.58% with the lowest WER of 9.42%.
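The two evaluation metrics can be sketched as follows, under common G2P conventions that are assumptions here: PER as the total Levenshtein edit distance (substitutions, insertions, deletions) between predicted and reference phoneme sequences divided by the number of reference phonemes, and WER as the fraction of words whose predicted phoneme sequence is not exactly correct. The word-level figures in the abstract (90.58% accuracy, 9.42% WER) sum to 100%, which is consistent with that WER definition; the chapter's exact scoring script is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost substitution/insertion/deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """Total phoneme edits divided by the total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

def word_error_rate(refs, hyps):
    """Fraction of words whose predicted phoneme sequence differs from the reference."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)

# Hypothetical reference and predicted phoneme sequences for two words.
refs = [["m", "a", "l", "a"], ["k", "a", "t", "t", "a"]]
hyps = [["m", "a", "l", "a"], ["k", "a", "t", "a"]]
print(phoneme_error_rate(refs, hyps), word_error_rate(refs, hyps))
```

A single dropped phoneme costs 1/9 at the phoneme level but makes the whole second word wrong, which is why word-level error rates are much higher than PER for the same model.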

5.1 Introduction

Speech is one of the most significant ways in which humans engage with one another. Several research methods have been used to develop reliable and deployable systems that can recognize and synthesize speech. Researchers have built very efficient ASR and TTS systems using emerging techniques in artificial intelligence and data science.

R. Priyamvada · D. Govind · V. K. Menon · B. Premjith (B) · K. P. Soman
Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_5


R. Priyamvada et al.

Fig. 5.1 Automatic speech recognition system using Kaldi

These systems are already used extensively for widely spoken languages such as English, Hindi, French, and Spanish. Home automation, communication, health care, finance, marketing, and defense are only a few of the applications of ASR and TTS. However, these systems are often limited to high-resource languages, and comparable progress has not been made for low-resource languages. Automatic speech recognition systems include two models: a language model and an acoustic model. The language model finds the relation between graphemes and their phonemes, whereas the acoustic model finds the correlation between features extracted from the audio and the pronunciation dictionary [1]. The efficiency of a Speech Recognition (SR) system depends strongly on the resources and models used. For acoustic modeling, the HTK toolkit [2], Kaldi [3], Deep Speech [4], etc., are used. For language modeling, which generates the lexicon, rule-based models [5], conditional and joint models [6], Hidden Markov Models [7], Long Short-Term Memory Recurrent Neural Network models [8], Convolutional Neural Networks [9], Transformer-based models [10], etc., are used. Figure 5.1 shows one such ASR model built using Kaldi.

Language modeling, the step that precedes acoustic modeling, is critical for improving the accuracy of ASR systems. A lexicon created from transcriptions includes only the words present in those transcriptions. If the dataset is small, the lexicon may not cover different dialects, which affects the model once it is deployed. Rule-based techniques typically require manual work to expand the lexicon, which takes considerable time and effort. This motivates the use of deep learning techniques to overcome this challenge. In this paper, we investigated the efficacy of the encoder-decoder model [11] with BiLSTM and LSTM [12] networks for expanding the lexicon by generating phoneme sequences from Malayalam words.

5.2 Literature Review

Malayalam is a less widely spoken language, and hence it belongs to the class of low-resource languages. The amount of research done on such low-resource languages using the above-mentioned speech recognition techniques is smaller than for

5 Grapheme to Phoneme Conversion …


other major languages. Recently, however, a substantial amount of research has been carried out towards developing efficient datasets for speech recognition and speech synthesis in low-resource languages like Malayalam. Krishnan and Anto [13] proposed a method for Malayalam speech recognition using wavelet packet decomposition and the discrete wavelet transform. Babu et al. [1] proposed an SR model using Kaldi trained on 2.7 h of data, reporting a WER of 50% for the monophone model and 43.87% for the triphone model. These WERs are higher than those reported for highly resourced languages like English. By comparison, Hannun et al. [4] proposed the Deep Speech model, which obtained WERs of 6% and 19% on clean and noisy data, respectively; it was trained on 5000 h of data, which shows how significant resources are for speech recognition. Lekshmi et al. [14] created a large Malayalam corpus to encourage academic research. It contains 200 h of narrational speech and 50 h of interview speech, and its lexicon has 4925 unique words. A large corpus of this kind can help improve the efficiency of ASR systems, and the collection is still expanding, which will increase the size of the lexicon. The audio was transcribed, with the aid of transcribers, into Malayalam script and into phonetic text in English script. Doing this manually for such an amount of data is tedious, which is why a model is needed that can generate the phonetic text automatically from the transcriptions. This also indicates that an inefficient lexicon will lower the quality of a speech recognition system. As mentioned earlier, there are different language models for lexicon generation. The most commonly used is the rule-based method, in which the rules are handcrafted for the language to improve accuracy. Nair et al. [15] proposed a rule-based approach for Malayalam.
However, this requires a person who is proficient in the language. Moreover, the corpus used for rule-based methods is static and small, so it often fails to include all variations in pronunciation [5]. Kumar et al. [5] also proposed a decision tree-based model for grapheme-to-phoneme conversion in Hindi and compared it with a rule-based model; the decision tree performed better. Baby et al. [16] implemented a unified parser for multiple Indian languages, such as Malayalam, Tamil, Hindi, and Telugu, for their TTS systems, but achieved accuracies of only 80–95%. More recently, LSTM and BiLSTM based models have also been used to build lexicons. Rao et al. [8] proposed a model using an LSTM recurrent neural network (RNN). They experimented with unidirectional and bidirectional LSTMs with different embedding dimensions, and the BiLSTM model outperformed the other models. The CMU pronunciation dictionary, which contains more than 100,000 English words, was used as the dataset, and the lowest WER achieved was 21.3%. Such models have proved more effective for G2P conversion and can also adapt to an expanding lexicon.


5.3 Malayalam Phonemes

Malayalam is an Indian language of the Dravidian family. It has 56 letters: 36 consonants, 15 vowels, and 5 chillaksharas, and each of these orthographic forms has a corresponding phoneme. In addition, there are conjunct consonants, formed by combining consonants, which also have separate phonemes. For example, the phoneme for ‘ ’ is ‘n’, whereas for ‘ ’ it is ‘n n’ and for ‘ ’ it is ‘nn’. The pronunciation of the word ‘ ’ also changes across dialects. Therefore, a person proficient in Malayalam is required to write these rules for the rule-based method, and even then it is difficult to capture all such variations with a limited corpus.

5.4 Dataset Description

The lexicon was built from two word lists. The first word list was taken from the Indic TTS Malayalam dataset, which contains audio files in WAV format and their transcriptions in text files. The transcription files contain 11,300 sentences in total, which were tokenized to obtain a word list. The second word list was compiled from various online sources, such as Wiktionary and Olam. The two word lists were then combined into a larger word corpus containing 33,206 unique words, and the lexicon was generated using the unified parser from the Indic TTS library. Table 5.1 shows a sample of graphemes and their corresponding phonemes used in the unified parser, and Table 5.2 shows a preview of the lexicon, which serves as the dataset for this work. The first column contains the words written in Malayalam, and the second column contains the corresponding phonemes. The dataset was split in a 90:10 ratio into training and test sets.

Table 5.1 Sample graphemes and their phonemes used in the unified parser


Table 5.2 Preview of dataset
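The corpus construction and the 90:10 split described above can be sketched in a few lines of Python, together with the character enumeration and one-hot encoding used for the encoder input (Sect. 5.5.1). This is a minimal sketch under our own assumptions, not the authors' pipeline. It assumes whitespace tokenization with simple punctuation stripping, Unicode NFC normalization, and a fixed random seed for shuffling, and it treats each Unicode code point as one input symbol. The function names are hypothetical, and the unified parser that produces the phoneme column is not reproduced.

```python
import random
import unicodedata

PUNCTUATION = ".,;:!?\"'()"  # assumption: minimal cleanup of tokens


def build_word_list(sentences, extra_words):
    """Tokenize transcript sentences on whitespace, merge them with an
    external word list, and return the sorted unique words."""
    words = set()
    for token in (t for s in sentences for t in s.split()):
        token = token.strip(PUNCTUATION)
        if token:
            words.add(unicodedata.normalize("NFC", token))
    for word in extra_words:
        word = word.strip()
        if word:
            words.add(unicodedata.normalize("NFC", word))
    return sorted(words)


def split_lexicon(entries, train_ratio=0.9, seed=42):
    """Shuffle (word, phonemes) pairs reproducibly and split them into
    training and test sets (90:10 by default)."""
    entries = list(entries)
    random.Random(seed).shuffle(entries)
    cut = round(len(entries) * train_ratio)
    return entries[:cut], entries[cut:]


def build_vocab(words):
    """Enumerate every character (Unicode code point); index 0 is padding."""
    symbols = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(symbols)}


def one_hot_encode(word, vocab, max_len):
    """Encode a word as max_len one-hot rows of width len(vocab) + 1.
    Shorter words are padded with index 0; longer words are truncated."""
    width = len(vocab) + 1
    rows = []
    for i in range(max_len):
        row = [0] * width
        row[vocab[word[i]] if i < len(word) else 0] = 1
        rows.append(row)
    return rows
```

For example, `one_hot_encode("ab", build_vocab(["ab", "c"]), 3)` yields three rows of width 4, and the last row marks padding. Under the code-point assumption, a word such as രഘു (raghu) becomes three symbols (ര, ഘ and the vowel sign ു). The paper does not state which grapheme unit it used, so this is only one possible choice.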

5.5 Experiments and Results

This section describes the methods used to conduct the experiments and discusses the results obtained for G2P conversion of Malayalam data.

5.5.1 Experimental Setup—The Grapheme to Phoneme Model

This work used the encoder-decoder architecture for grapheme-to-phoneme conversion in Malayalam. The encoder reads the input word, learns its context, and generates internal states. Each input is a variable-length Malayalam word whose characters are enumerated (mapped to integer indices) and then one-hot encoded, and this representation is the input to the encoder. The decoder uses the encoder's internal states to generate the corresponding phoneme sequence. Encoders are generally built with Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), or Bidirectional Long Short-Term Memory (BiLSTM) networks, whereas decoders are designed using unidirectional variants of LSTM, RNN, or GRU. Two models were evaluated here: the first uses an LSTM in both the encoder and the decoder, and the second uses a BiLSTM encoder with an LSTM decoder. Fig. 5.2 illustrates the architecture of the best-performing model, the BiLSTM model. The encoder used 1024-dimensional vectors to represent the graphemes, whereas the decoder used 2048-dimensional vectors to represent the phonemes. As shown in Fig. 5.2, when the input ‘രഘു’ (raghu) is given to the encoder, it learns the context and outputs the internal states from which the decoder predicts the phonemes ‘r a gh u’.

Fig. 5.2 The best performing G2P encoder-decoder model with BiLSTM

The performance of the BiLSTM and LSTM models was assessed using PER and WER, which are the standard metrics for G2P conversion and speech recognition. They are calculated as follows:

$$\mathrm{WER} = \frac{N_W}{N_T} \times 100 \tag{5.1}$$

$$\mathrm{PER} = \frac{S + I + D}{N} \times 100 \tag{5.2}$$

where $N_T$ is the total number of words, $N_W$ is the number of wrongly predicted words, $S$ is the number of substitutions, $I$ is the number of insertions, $D$ is the number of deletions, and $N$ is the number of phonemes in the reference word.
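Equations (5.1) and (5.2) can be computed as in the following Python sketch, where S + I + D is obtained as the Levenshtein edit distance between the reference and predicted phoneme sequences. Two details are our assumptions and are not stated in the paper. First, a word counts as wrongly predicted (for $N_W$) when its predicted phoneme sequence differs from the reference anywhere. Second, PER is pooled over the test set by summing the errors and the reference phonemes over all words. The function names are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: the minimum S + I + D aligning hyp to ref."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(refs, hyps) -> float:
    """Eq. (5.1): percentage of words whose phoneme sequence is not exactly right."""
    wrong = sum(1 for ref, hyp in zip(refs, hyps) if ref != hyp)
    return 100.0 * wrong / len(refs)


def phoneme_error_rate(refs, hyps) -> float:
    """Eq. (5.2), pooled over all words: 100 * sum(S + I + D) / sum(N)."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in zip(refs, hyps))
    return 100.0 * errors / sum(len(ref) for ref in refs)
```

Phoneme strings in the lexicon format, such as ‘r a gh u’, can be passed as `"r a gh u".split()`. Predicting ‘r a g u’ for ‘r a gh u’ is one substitution, so that word alone has a PER of 25% and counts as one wrong word for the WER.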

5.5.2 Hyperparameters

Hyperparameter tuning is important for finding the optimal model for a particular application. Table 5.3 lists the candidate values tried for the hyperparameters of the encoder-decoder model used for G2P conversion in Malayalam, along with the optimal values. The optimal values were fixed by observing the WER and loss on the test dataset while changing one hyperparameter at a time and keeping the others constant. This procedure was followed for both the BiLSTM and the LSTM models. The Adam optimizer and categorical cross-entropy were used as the optimizer and loss function, respectively, and a softmax layer was used at the decoder to generate the phoneme sequence.

Table 5.3 Optimal hyperparameters used for building the encoder-decoder architecture using BiLSTM

Parameter              List of values     Optimal value
Batch size             32, 48, 64, 100    100
Epochs                 10, 15, 20, 25     15
Dropout (at encoder)   0.05, 0.1          0.05
Dropout (at decoder)   0.05, 0.1          0.01
Learning rate          0.01, 0.001        0.001
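The one-hyperparameter-at-a-time procedure described above can be sketched as follows, using the candidate values in Table 5.3. This is an illustrative sketch only. The tuning order and the choice to carry each selected value forward to the next parameter are our assumptions. The dictionary keys are hypothetical names, and `evaluate` stands for a hypothetical function that trains the encoder-decoder model with the given settings and returns its WER.

```python
# Candidate values from Table 5.3 (the key names are hypothetical).
GRID = {
    "batch_size": [32, 48, 64, 100],
    "epochs": [10, 15, 20, 25],
    "encoder_dropout": [0.05, 0.1],
    "decoder_dropout": [0.05, 0.1],
    "learning_rate": [0.01, 0.001],
}


def one_at_a_time_search(grid, defaults, evaluate):
    """Tune one hyperparameter at a time: for each parameter, try every
    candidate while holding the others fixed, and keep the value with the
    lowest score (e.g. test WER). The chosen value is carried forward."""
    best = dict(defaults)
    for name, candidates in grid.items():
        scores = {value: evaluate({**best, name: value}) for value in candidates}
        best[name] = min(scores, key=scores.get)
    return best
```

Starting from `defaults = {name: values[0] for name, values in GRID.items()}`, this search needs only 4 + 4 + 2 + 2 + 2 = 14 training runs, compared with 4 × 4 × 2 × 2 × 2 = 128 for a full grid search. The trade-off is that interactions between hyperparameters are ignored.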


5.6 Results and Discussion

Table 5.4 compares the word-level and phoneme-level error rates and accuracies of the LSTM model for embedding dimensions of 256, 512, 768, and 1024. For the LSTM model, the best results were obtained with the smallest embedding dimension, 256: a WER of 14.51% (word-level accuracy of 85.49%) and a PER of 5.5% (phoneme-level accuracy of 95.81%). At larger embedding dimensions, accuracy was lower, although it did not decrease monotonically (word-level accuracy was 67.54% at 512, 75.79% at 768, and 68.41% at 1024). Table 5.5 shows the error rates and accuracies of the BiLSTM model. The best-performing BiLSTM model, with an embedding dimension of 1024, attained accuracies of 90.58% at the word level and 98.04% at the phoneme level, with a corresponding WER of 9.42% and PER of 2.57%. Unlike the LSTM model, the BiLSTM model improves with increasing embedding dimension at both the word and phoneme levels. Tables 5.4 and 5.5 show that the best BiLSTM encoder outperforms the best LSTM encoder and that the BiLSTM encoder is better at every embedding dimension except 256. This can be attributed to the BiLSTM learning the context better than the LSTM, because it captures future information along with past information. Along with the embedding dimension, other hyperparameters also contributed to the higher efficacy of the BiLSTM model.

Table 5.4 Word-level and phoneme-level error rates and accuracies (%) of the LSTM model

                      Word level                 Phoneme level
Embedding dimension   WER      Accuracy          PER      Accuracy
256                   14.51    85.49             5.5      95.81
512                   32.46    67.54             13.68    89.93
768                   24.2     75.79             9.74     93.48
1024                  31.58    68.41             8.75     93.49

Table 5.5 Word-level and phoneme-level error rates and accuracies (%) of the BiLSTM model

                      Word level                 Phoneme level
Embedding dimension   WER      Accuracy          PER      Accuracy
256                   19.36    80.64             6.07     95.25
512                   13.39    86.6              3.37     96.73
768                   11.71    88.29             3.21     97.38
1024                  9.42     90.58             2.57     98.04


5.7 Conclusion

With a BiLSTM encoder, the accuracy of the encoder-decoder architecture in generating phoneme sequences increased with the embedding dimension. With an LSTM encoder, however, the best performance was obtained at the smallest embedding dimension, and performance was generally lower at larger dimensions. This suggests that an encoder that learns more contextual information, such as a BiLSTM, is well suited to G2P conversion of Malayalam data. The encoder-decoder model with a BiLSTM encoder achieved the highest accuracies, 90.58% at the word level and 98.04% at the phoneme level, which are better than those of the encoder with a unidirectional LSTM network. It also achieved a WER of 9.42% and a PER of 2.57%, both lower than those of the LSTM encoder. In this work, the BiLSTM encoder outperformed the LSTM encoder by capturing both forward and backward contextual information. However, the model fails to predict the phoneme sequences of long words; this may be resolved with an attention mechanism or a Transformer network.

References

1. Babu, L.B., George, A., Sreelakshmi, K.R., Mary, L.: Continuous speech recognition system for Malayalam language using Kaldi. In: 2018 International Conference on Emerging Trends and Innovations In Engineering And Technological Research. ICETIETR, pp. 1–4. IEEE, Ernakulam (2018). https://doi.org/10.1109/ICETIETR.2018.8529045
2. Young, S.: The HTK hidden Markov model toolkit: design and philosophy. University of Cambridge, Department of Engineering, Cambridge, England (1994)
3. Sri, K.V.L., Srinivasan, M., Nair, R.R., Priya, K.J., Gupta, D.: Kaldi recipe in Hindi for word level recognition and phoneme level transcription. In: Procedia Computer Science, pp. 2476–2485 (2020)
4. Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., Ng, A.Y.: Deep speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)
5. Kumar, C.S., Govind, D., Nijil, C., Manish, N.: Grapheme to phone conversion for Hindi. In: Oriental COCOSDA. Malaysia (2006)
6. Chen, S.F.: Conditional and joint models for grapheme-to-phoneme conversion. In: EUROSPEECH-2003, pp. 2033–2036. Geneva (2003)
7. Taylor, P.: Hidden Markov models for grapheme to phoneme conversion. In: INTERSPEECH-2005, pp. 1973–1976. Lisbon (2005)
8. Rao, K., Peng, F., Sak, H., Beaufays, F.: Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP, pp. 4225–4229. IEEE, South Brisbane (2015). https://doi.org/10.1109/ICASSP.2015.7178767
9. Yolchuyeva, S., Németh, G., Gyires-Tóth, B.: Grapheme-to-phoneme conversion with convolutional neural networks. Appl. Sci. 9(6), 1143 (2019)
10. Yolchuyeva, S., Németh, G., Gyires-Tóth, B.: Transformer based grapheme-to-phoneme conversion. In: INTERSPEECH-2019, ISCA (2019). https://doi.org/10.21437/Interspeech.2019-1954


11. Premjith, B., Kumar, M.A., Soman, K.P.: Neural machine translation system for English to Indian language translation using MTIL parallel corpus. J. Intell. Syst. 28(3), 387–398 (2019)
12. Premjith, B., Soman, K.P., Poornachandran, P.: A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features. In: Proceedings of the 10th Annual Meeting of the Forum for Information Retrieval Evaluation, pp. 56–60. Association for Computing Machinery, New York, USA (2018)
13. Krishnan, V.V., Anto, P.B.: Features of wavelet packet decomposition and discrete wavelet transform for Malayalam speech recognition. Int. J. Recent Trends Eng. 1(2), 93–96 (2009)
14. Lekshmi, K.R., Jithesh, V.S., Sherly, E.: Malayalam speech corpus: design and development for Dravidian language. In: Proceedings of the WILDRE5–5th Workshop on Indian Language Data: Resources and Evaluation, pp. 25–28. European Language Resources Association (ELRA), Marseille (2020)
15. Nair, S.S., Rechitha, C.R., Kumar, C.S.: Rule-based grapheme to phoneme converter for Malayalam. Int. J. Comput. Linguist. Nat. Lang. Process. 2(7), 417–420 (2013)
16. Baby, A., Nishanthi, N.L., Thomas, A.L., Murthy, H.A.: A unified parser for developing Indian language text to speech synthesizers. In: International Conference on Text, Speech, and Dialogue. TSD 2016, LNCS, vol. 9924, pp. 514–521. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45510-5_59

Chapter 6

Usage of Blockchain Technology in e-Voting System Using Private Blockchain

Suman Majumder and Sangram Ray

Abstract In India, the conventional voting system consists of paper polling, electronic ballot systems, and associated mechanical devices. To improve on the conventional system in terms of scalability and accessibility from anywhere, a digital voting system can be built using electronic polling approaches such as e-voting, mobile voting, and IoT-based centralized voting. However, the conventional system faces several limitations: fake voters, high cost, slow outcomes, the need for constant Internet connectivity, the need for a secure user interface, snooping, and DoS attacks. To remove these barriers, blockchain has been introduced into e-voting applications. It provides stability and voter anonymity through Merkle trees and hashed confidential data, and any change to the data is detected because it changes the hash value, which is then reported immediately. In this paper, we propose an e-voting application that uses a private blockchain, the practical Byzantine fault-tolerant (PBFT) non-competitive consensus algorithm, and an ECC-based session generated by the server.

S. Majumder (B) · S. Ray
Department of Computer Science and Engineering, National Institute of Technology Sikkim, Ravangla, Sikkim 737139, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_6

6.1 Introduction

Blockchain technology was first introduced by Satoshi Nakamoto in 2008 as the basis of Bitcoin, a peer-to-peer payment system that lets users perform transactions over the Internet in a decentralized structure without fear of losing money [1]. Voting has since shifted from conventional paper polling to digital technology such as electronic ballot systems. The aims are to prevent illegal behavior by voters, avoid the hazards of maintaining voting queues, improve the scalability, reliability, and transparency of the voting process, reduce unnecessary manpower, and provide confirmation of cast votes. The final counting must then be performed properly so that it produces the correct result [2]. The Internet of Things (IoT) can be used for the voting and counting process. However, such a system faces difficulties regarding network and bandwidth requirements,


S. Majumder and S. Ray

high computation and counting costs, space limitations, testing, and a centralized control mechanism [3]. Blockchain technology can therefore be used for this purpose. However, double voting and 51% attacks are still possible in public blockchain applications. Private blockchains, which restrict access to voters who have permission to use the application, are therefore generally used in e-voting systems [4–7]. A private blockchain combined with a smart contract and an efficient consensus algorithm can overcome these limitations of digital voting and make e-voting more secure, efficient, and scalable.

6.1.1 Blockchain Technology

A blockchain is a decentralized and distributed (public blockchain) or centralized (private blockchain) application, ledger, or database [4–7] whose data are shared among participant nodes called peers. Peer nodes exchange data in the form of transactions. A blockchain is a chain of blocks, where the connected blocks hold information about events or transactions, and each transaction in the blockchain is verified by a consensus mechanism supported by the majority of the peer nodes [6]. The main challenge of the distributed architecture is ensuring that all nodes work on a common agreement that every peer node follows, and that any change made by a single node is conveyed to all other nodes in the blockchain network. When a peer node wants to add a block to the chain, the peer nodes are responsible for validating the block. The block is added to the blockchain only if, after successful validation, a majority of the nodes agree to add it; otherwise, it is discarded. Thus the peer nodes, rather than a centralized third party, take full responsibility for any decision to alter the blockchain [6]. This common agreement mechanism is called consensus. Depending on the agreement made by the peers, blockchains use different consensus algorithms, such as proof of work (PoW), proof of stake (PoS), Raft, Paxos, and practical Byzantine fault tolerance (PBFT), to validate transactions [4–7, 10–14].
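The block-chaining and majority-validation mechanism described above can be illustrated with a small in-memory Python sketch. It is a simplified model, not a real blockchain client. Each block commits to its transactions through a SHA-256 Merkle root and to its predecessor through the previous block's hash, and a block is appended only when a strict majority of hypothetical peer approvals is given. The function names, the JSON header encoding, and the all-zero predecessor hash of the first block are assumptions made for the example.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumption: placeholder predecessor of the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Hash each transaction, then hash adjacent pairs level by level
    (duplicating the last hash on odd-sized levels) down to one root."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode("utf-8"))
                 for i in range(0, len(level), 2)]
    return level[0]


def header_hash(header: dict) -> str:
    return sha256_hex(json.dumps(header, sort_keys=True).encode("utf-8"))


def append_if_approved(chain, transactions, peer_approvals):
    """Build a block on top of the chain and append it only if a strict
    majority of peers approved it; otherwise the block is discarded."""
    if sum(peer_approvals) * 2 <= len(peer_approvals):
        return False
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    header = {"prev_hash": prev_hash, "merkle_root": merkle_root(transactions)}
    chain.append({"header": header, "transactions": list(transactions),
                  "hash": header_hash(header)})
    return True


def chain_is_valid(chain) -> bool:
    """Recompute every Merkle root and block hash; any edit breaks a link."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        header = block["header"]
        if (header["prev_hash"] != prev_hash
                or header["merkle_root"] != merkle_root(block["transactions"])
                or block["hash"] != header_hash(header)):
            return False
        prev_hash = block["hash"]
    return True
```

Changing any stored transaction changes its block's Merkle root, so `chain_is_valid` returns `False`. This is the tamper-detection property the abstract attributes to Merkle trees and hashed data.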

6.1.2 Smart Contract

A smart contract is executable code that runs on top of the blockchain network and acts as an agreement between the peer nodes of the blockchain [6]. In a blockchain-based voting system, a vote is cast as a transaction that is hashed with the cryptographic hash function SHA-256, which produces a 256-bit digest [7].
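A toy Python stand-in for a voting smart contract shows how such an agreement can be expressed as executable rules. Only eligible voters may vote, each voter may vote once, and each accepted vote becomes a transaction identified by its SHA-256 digest (256 bits, or 64 hexadecimal characters). The class and its rules are hypothetical; they are not Solidity code and are not tied to any real contract platform. In particular, hashing the voter identifier together with the candidate does not provide voter anonymity, because the small set of possible inputs could be enumerated.

```python
import hashlib
from collections import Counter


class VotingContract:
    """Toy, in-memory stand-in for a voting smart contract (hypothetical)."""

    def __init__(self, eligible_voters, candidates):
        self.eligible = set(eligible_voters)
        self.candidates = set(candidates)
        self.has_voted = set()
        self.ledger = []  # list of (transaction hash, candidate)

    def cast_vote(self, voter_id: str, candidate: str) -> str:
        """Apply the contract rules, then record the vote as a transaction
        identified by its SHA-256 hex digest."""
        if voter_id not in self.eligible:
            raise PermissionError("voter is not eligible")
        if voter_id in self.has_voted:
            raise ValueError("voter has already voted")
        if candidate not in self.candidates:
            raise ValueError("unknown candidate")
        payload = f"{len(self.ledger)}|{voter_id}|{candidate}".encode("utf-8")
        tx_hash = hashlib.sha256(payload).hexdigest()
        self.has_voted.add(voter_id)
        self.ledger.append((tx_hash, candidate))
        return tx_hash

    def tally(self) -> Counter:
        """Count the recorded votes per candidate."""
        return Counter(candidate for _, candidate in self.ledger)
```

A second `cast_vote` call for the same voter raises `ValueError`, which is how this sketch rejects double voting.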

6 Usage of Blockchain Technology in e-Voting …


6.1.3 Private Versus Public Blockchain

A public blockchain is decentralized, and its nodes must stay properly synchronized for it to operate correctly. Such chains are generally very large, so every operation takes more time and energy. In a private blockchain, nodes need permission to join the network and to perform any further operation. A private blockchain therefore uses a centralized structure and, according to [10], is much faster, more efficient, and more secure than a public blockchain.

6.1.4 Practical Byzantine Fault Tolerance (PBFT)

PBFT is a non-competitive consensus algorithm that is mainly used in private blockchain networks [10]. It can address uncertainty about any branch of the Merkle tree as well as network performance issues, and it can prevent double-spending and 51% attacks in blockchain.

Contribution and Organization of the Paper: From the above discussion, we conclude that the conventional digital voting system faces limitations such as fake voters, increased cost, slow outcomes, the need for stable Internet connectivity, the design of a secure user interface, snooping, and DoS attacks. To remove these barriers, we propose an e-voting application using a smart contract, a private blockchain, the practical Byzantine fault-tolerant (PBFT) non-competitive consensus algorithm, and an ECC-based session generated by the server. The remainder of this paper is organized as follows. Section 6.2 reviews different schemes and their drawbacks. Section 6.3 discusses the detailed working procedure of the e-voting system and sample counting results. Finally, Sect. 6.4 concludes the paper.
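The fault-tolerance bound behind PBFT can be made concrete with a short calculation. In PBFT, n = 3f + 1 replicas tolerate f Byzantine (arbitrarily faulty) replicas, and a replica acts on a quorum of 2f + 1 matching messages. Any two such quorums overlap in at least f + 1 replicas, so at least one honest replica belongs to both. The Python sketch below checks this arithmetic and a simplified commit test. It is an illustration rather than a PBFT implementation, and the function names are hypothetical.

```python
def pbft_configuration(f: int) -> dict:
    """Replica count and quorum size needed to tolerate f Byzantine replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    n = 3 * f + 1              # minimum number of replicas
    quorum = 2 * f + 1         # matching messages needed to proceed
    overlap = 2 * quorum - n   # minimum replicas shared by any two quorums
    return {
        "replicas": n,
        "quorum": quorum,
        "min_overlap": overlap,
        # worst case: all f faulty replicas sit inside the overlap
        "honest_in_overlap": overlap - f,
    }


def has_commit_quorum(commit_senders, f: int) -> bool:
    """Simplified check: at least 2f + 1 distinct replicas sent matching
    commit messages (duplicates from one replica count once)."""
    return len(set(commit_senders)) >= 2 * f + 1
```

For example, `pbft_configuration(1)` gives 4 replicas and a quorum of 3. Tolerating two faulty replicas already requires 7 replicas and quorums of 5.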

6.2 Literature Review

In 2018, Casado-Vara and Corchado proposed a blockchain scheme to overcome the limitations of the conventional voting system, which mainly consists of paper polling, electronic ballot systems such as EVMs, and associated mechanical devices [2]. However, a digital voting system using electronic polling devices is needed, mainly in two forms: e-voting and I-voting. In e-voting, users cast their votes at a voting center using digital technology, whereas in I-voting, a software interface is used to cast votes. The main criteria for digital voting are to prevent illegal behavior by voters, avoid the hazards of maintaining voting queues, and ensure the scalability, reliability, and transparency of the voting process, reducing the investment in unnecessary


manpower; finally, the cast votes must be confirmed and the final counting performed properly so that it produces the correct result [2]. In 2019, Krishnamurthy et al. discussed digital voting systems and their limitations regarding fake voting, expenses, lack of quick outcomes, counting, and storage. They proposed that the Internet of Things (IoT) can be used for the voting and counting process. However, such a system faces difficulties regarding network and bandwidth requirements, high computation and counting costs, space limitations, testing, and a centralized control mechanism [3]. In 2020, Dhulavvagol et al. [4] argued that Ethereum is a well-accepted and renowned blockchain platform owing to its unlimited blockchain platform. A private blockchain network is formed in which the client nodes share their data within the private network using digital wallets, and each block is identified by a hash key generated with the SHA-256 hashing algorithm. The main focus of that paper is a comparative study of two Ethereum clients, Geth and Parity, which found the Parity client to be 91% faster than Geth in terms of time, consistency, and scalability. In addition, an e-voting application is developed using a private network and the smart contract of an Ethereum client, where the smart contract relies on the proof-of-work (PoW) mechanism. With the PoW consensus mechanism, the block structure can be adjusted if any one of the blocks gets corrupted. In this scheme, a democracy contract is developed that consists of an administrator (owner) and all the other members. The owner can initiate a proposal for voting in the form of an Ethereum transaction, with a time limit and a margin of votes. The owner can also add, remove, or transfer members, and the members can cast their votes for or against the proposal.
Based on the number of votes, the system executes the proposal. The voting application can be built using the Remix IDE, the Ethereum Virtual Machine (EVM), Web3.js, and the Solidity language. In 2020, Alvi et al. [5] proposed a digital e-voting system built from a decentralized blockchain application, smart contracts (SC), Merkle trees, and a fingerprint-based hashing technique to provide data integrity, accuracy, scalability, authentication, privacy, and security for voters. The voting process is mainly divided into three steps: the data management system, voter/candidate registration, and voter validation and counting of the submitted votes by the smart contract using techniques such as the Merkle tree structure. Generally, three types of data management modules are used: the election commission database, blockchain storage, and a cloud server. The voter-related hashed confidential information, together with the public key, is stored in the genesis block of the blockchain as a voter list. After the user registers at a local registration office or through the registration process, a private key and a public key are generated using a key generation algorithm. The private key is sent to the voter's registered mobile phone, so that the voter can participate and cast a vote through an Internet-connected device using the national identification number (NID), fingerprint, etc. The smart contract then validates the voter information against the voter list in the genesis block and, upon successful validation, provides the candidate list for casting the vote. The voter selects a candidate from the list, signs the selection with a digital signature, and sends it to the


contract. The SC generates the vote identity (VID) and voter number and then creates a block in the form of a transaction. In 2020, Sadia et al. [6] proposed a decentralized e-voting system based on blockchain technology using smart contracts. The scheme is divided into three phases: pre-voting, voting, and post-voting. The protocol is initiated by the organizer, who collects the list of candidates, the list of eligible voters, and the start and end times of voting, all decided beforehand by the election commission (EC). In the pre-voting phase, the organizer collects the eligible voters and participating candidates based on the conditions provided by the EC, such as nationality. The organizer also provides information such as the national identification number (NID), the binary values of the fingerprint, the candidate list, the public key, and the start and end times for storage in the genesis block, with a flag value set to true for the voting period. The eligible voters are then grouped randomly into non-overlapping groups, each given a specific time slot for voting, and are notified via e-mail and text messages. In the voting phase, the public key, the group, and the flag value are verified, and upon successful validation the voter provides his or her fingerprint. After successful validation against the genesis block, a ballot listing the candidates with their respective logos is provided to the voter for candidate selection. Each candidate's logo carries a binary value, so the user effectively selects the binary value of the chosen logo. Each ballot also carries a ballot number, consisting of a unique ballot string for the specific voter and the voter's fingerprint, and the ballot string is used to communicate with the smart contract for that voter.
After the voting process is completed, all the blocks are broadcast consecutively, and the peer nodes count the result. In 2020, Shah et al. [7] proposed an online e-voting system using an Android application in which authentication and authorization are handled separately, using a unique key/one-time password and a fingerprint, respectively. In this system, the voter first registers by providing the necessary details in the registration module of the Android application, and the information is updated on the server. After successful registration, the user can log in to the e-voting application by providing a one-time password that is valid for a limited time. After successful login, the user has to provide a fingerprint for further validation, and upon successful validation, a token is issued to the voter for casting the vote. Vote casting is done as a transaction that transfers the token from the user's account to the candidate's wallet. In the same year, Suganya et al. [8] proposed an e-voting system built with IoT hardware (NodeMCU and an Arduino board) and a blockchain application that follows a distributed architecture. The system mainly uses a fingerprint sensor module, NodeMCU, an Arduino board, an LCD display, a vote-monitoring component, a voting machine with voting buttons, a web server, etc. Initially, the user logs in to the system with a fingerprint that is verified by the local authority, and upon successful validation, the user can cast a vote through the voting user interface. The cast vote is counted and updated on the server, which records it in a separate blockchain. A separate block is assigned to every voter, and the voting information is updated in the


S. Majumder and S. Ray

block. Finally, the block is appended to the blockchain, and the same data are also stored in several databases. All of the blockchain data are stored on a main server, from which the voting outcome is monitored. In 2020, Abuidris et al. [9] proposed a hybrid consensus model (PSC-Bchain) that jointly uses proof of stake (PoS) for the public blockchain (upper layer) and proof of credibility (PoC) for the private blockchain (lower layer), together with a sharding mechanism that segregates the data into different parts and distributes the transaction load across the layers of the blockchain. The architecture of the system is divided into several components: manage servers (MS), the blockchain network (BN), voters, and the smart contract (SC). MS handles node authentication and user credentials for login and is responsible for issuing node certificates and publishing the node information of the lower blockchain layer to the upper layer. The hybrid consensus model contains multiple blockchain networks for parallel execution, which improves the scalability and overall performance of the system. Each voter holds an identity for logging into the system to cast a vote and receives a digital token from the smart contract that permits the voter to cast the vote and to access the wallets. The smart contract authorizes the voting transactions. In 2020, Roh and Lee [10] proposed an e-voting system using a private permissioned blockchain and the practical Byzantine fault-tolerant (PBFT) algorithm, a non-competitive consensus algorithm. This scheme provides more security, confidentiality, and secure access because nodes must obtain permission to join the blockchain. Recently, in 2021, Jain et al.
[12] proposed a blockchain-based electronic voting system called “MATDAAN” that uses Ethereum technology; the authors claim that the open-source application is secure, fast, reliable, cheap, and anonymous. The voting process consists of three phases: registration, polling, and result. In the registration phase, the user enters their name as per the Aadhaar card and selects their regional district and address as per the Aadhaar card; after registration, a QR code is generated for login. During login, the QR code has to be uploaded to receive an OTP on the registered mobile number, and after successful login, a voting menu appears that contains the voter's name, Aadhaar number, candidate list, constituency name, and voting status, i.e., whether the user has already cast a vote or is a new voter. Based on the voting status, the user is given permission to cast a vote. After the voting process is complete, a result tab appears that shows the result as a real-time graph that changes with successive updates, so no separate counting is needed. However, this application faces some limitations, such as device incompatibility, the need for a stable Internet connection at all times, and security-related issues such as the 51% attack in blockchain [12–14].

6 Usage of Blockchain Technology in e-Voting …


6.3 Proposed e-Voting Application

Existing electronic voting systems do not provide sufficient security because of data tampering. The Bitcoin blockchain is used in some e-voting applications through a lottery protocol, but it still offers limited voter privacy, and ring signatures have to be used separately to provide privacy. To address several issues of Bitcoin, the Ethereum blockchain is used in e-voting systems, providing privacy through a blind signature method. On the other hand, a PBFT-based consensus algorithm prevents the generation of duplicate branches (forks) and double spending, which counters double voting and the 51% attack. This motivated us to propose an e-voting system using a private blockchain and a PBFT-based consensus mechanism. The entities involved in this scheme are voters/candidates, the voting and counting server (VCS), and the management/verification server (MS), and the scheme consists of four phases: voter/candidate registration, voter authentication and login, vote casting and validation, and vote counting and announcement of results. The basic architecture of the proposed system is given in Fig. 6.1, and

Fig. 6.1 Basic architecture of proposed e-voting system


the detailed communication protocol is depicted in Fig. 6.2 and described below; the notations used are summarized in Table 6.1.

Fig. 6.2 Proposed communication protocol of e-voting system using blockchain system

Table 6.1 Notations used in the proposed scheme

| Parameter | Description |
|---|---|
| CL | Candidate list |
| Cl | Candidate information |
| VL | Voter list |
| SKvi | Session key for the specific voter |
| PKvi | Public key of the user |
| MS | Management/verification server |
| VCS | Voting and counting server |
| VIDi | Identity of the ith voter |
| FP | Fingerprint of the user |
| BLKvi | Block created for the ith voter |
| V_Time1 | Voting time |
| T | Small time difference threshold (3 s) |

6.3.1 Voter/Candidate Registration Phase

In the registration phase, candidates and voters send their registration details to the management/verification server (MS). Based on the data provided, MS creates the candidate list and the voter list, respectively, and sends them to the voting and counting server (VCS).

Candidate Registration: A candidate sends a registration request to MS, which requests the Cl details: name, address, age, affiliation, mobile number, e-mail, and participating symbol. The candidate sends these details to MS, and based on the Cl details, MS generates the candidate list CL for the voting process.

Voter Registration and Creation of the Genesis Block: A voter sends a registration request to MS, and MS requests the voter's identity details. The voter uploads an identity document such as a voter card, Aadhaar card, or passport for validation. MS then asks for voter details such as name, address, age, affiliation, fingerprint, mobile number, and e-mail address, which are validated against the identity document. Upon successful voter registration, MS generates the voter list (VL) and a public key PKvi for the specific voter. After all the registration formalities are complete, MS sends a message containing CL, VL, and PKvi to VCS. VCS then generates a voter identity (VIDi) for the specific voter and sends a message containing VIDi and PKvi to the voter's registered mobile number for later use. Finally, CL, VL, VIDi, and PKvi are stored in the genesis block of the blockchain, which is created and maintained by VCS.
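The registration phase ends with VCS writing CL, VL, VIDi, and PKvi into the genesis block. The Python sketch below shows one way such a block could be built and later checked for tampering. The field names, the canonical JSON encoding, and the use of SHA-256 are assumptions for illustration, not details given by the scheme, and the keys are placeholder strings rather than real ECC public keys.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_genesis_block(candidate_list, voter_records):
    """Build a genesis block as VCS might hold it (hypothetical layout).

    candidate_list: list of candidate names (CL)
    voter_records:  dict mapping VIDi -> PKvi (placeholder strings); its keys form VL
    """
    body = {
        "index": 0,
        "timestamp": time.time(),
        "prev_hash": "0" * 64,  # the genesis block has no predecessor
        "CL": list(candidate_list),
        "VL": sorted(voter_records),
        "keys": dict(voter_records),
    }
    body["hash"] = block_hash(body)  # computed before the "hash" field exists
    return body


def verify_block(block: dict) -> bool:
    """Recompute the hash without the stored 'hash' field and compare."""
    body = {k: v for k, v in block.items() if k != "hash"}
    return block_hash(body) == block["hash"]


if __name__ == "__main__":
    genesis = make_genesis_block(
        ["Candidate A", "Candidate B", "Candidate C"],
        {"VID001": "pk-placeholder-1", "VID002": "pk-placeholder-2"},
    )
    print(verify_block(genesis))  # True
    genesis["CL"].append("Candidate X")  # tamper with the stored candidate list
    print(verify_block(genesis))  # False
```

Because every later block would carry the hash of its predecessor, any change to the genesis contents after the fact becomes detectable by recomputing the hash.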


6.3.2 Voter Authentication and Login Phase

In this phase, the voter sends a login request to VCS, together with a biometric identity such as a fingerprint (FP) and VIDi, as a message encrypted with the public key PKvi. VCS validates the login information against the information stored in the genesis block and sends the validation result for VIDi to MS. Upon successful validation, MS generates a session key SKvi for VIDi in an elliptic curve cryptography (ECC) setting, to be used for further communication between the voter, VCS, and MS, and transmits the encrypted message PKvi(SKvi || VIDi) to VCS. VCS then decrypts the message, concatenates the candidate list (CL) from the genesis block, and forwards the message PKvi(SKvi || CL || VIDi) to the voter.
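The `||` operator in PKvi(SKvi || VIDi) and PKvi(SKvi || CL || VIDi) only works if the receiver can split the fields apart again, which plain byte concatenation does not guarantee when one field is a random key. A minimal sketch follows, assuming a length-prefixed encoding (the scheme does not specify one) and a 32-byte random session key drawn with Python's `secrets` module in place of the ECC-derived key. The public-key encryption step is deliberately left out because it requires a cryptographic library.

```python
import secrets
import struct


def concat(*fields: bytes) -> bytes:
    """Unambiguous '||': each field is prefixed with its 4-byte big-endian length."""
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def split(message: bytes) -> list:
    """Inverse of concat(): recover the original fields in order."""
    fields, offset = [], 0
    while offset < len(message):
        (length,) = struct.unpack_from(">I", message, offset)
        offset += 4
        fields.append(message[offset:offset + length])
        offset += length
    return fields


def new_session_key(num_bytes: int = 32) -> bytes:
    """Random stand-in for SKvi, drawn from the OS CSPRNG (assumption, not ECC)."""
    return secrets.token_bytes(num_bytes)


# MS side: build SKvi || VIDi (encryption under PKvi would wrap this payload).
sk = new_session_key()
vid = b"VID001"
payload_ms = concat(sk, vid)

# VCS side, after decryption: add CL to obtain SKvi || CL || VIDi for the voter.
sk_rx, vid_rx = split(payload_ms)
cl = "Candidate A;Candidate B;Candidate C".encode("utf-8")
payload_voter = concat(sk_rx, cl, vid_rx)
print(split(payload_voter)[2])  # b'VID001'
```

Length prefixes cost four bytes per field but make parsing independent of the field contents, so a session key that happens to contain a separator byte cannot corrupt the message.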

6.3.3 Vote Casting and Validation Phase

In this phase, the voter casts a vote by selecting a suitable candidate CI from the candidate list CL and records the current voting time V_Time1. The voter then concatenates CI with V_Time1, inserts the concatenated data into a transaction TS, encrypts the transaction with the session key SKvi, and dispatches the message SKvi(TS(CI || V_Time1)) to VCS. VCS decrypts the message, records the current time V_Time2, and checks whether |V_Time2 − V_Time1| ≤ T in order to counter denial of service (DoS) attacks, double voting, and the 51% attack. VCS also validates the transaction TS and, upon successful validation, forwards the message SKvi(TS(CI || V_Time1)) to MS. MS collects all the transactions TS from the voters, creates the voter-specific information in a separate block BLKvi, and transmits the block, encrypted with the session key, to VCS. Finally, VCS decrypts the message and appends the block BLKvi to the blockchain.
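The checks in this phase (the time window |V_Time2 − V_Time1| ≤ T, one block per voter, and hash-linking BLKvi to the previous block) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `VoteLedger` is a hypothetical class, encryption is omitted, and block creation (done by MS in the scheme) and appending (done by VCS) are merged into one object for brevity.

```python
import hashlib
import json

T_SECONDS = 3.0  # threshold T from Table 6.1


def sha256_json(obj) -> str:
    """SHA-256 of a canonical JSON encoding (assumed block serialization)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class VoteLedger:
    """Hypothetical VCS-side ledger: one hash-chained block per voter."""

    def __init__(self, genesis_hash: str):
        self.chain = [{"index": 0, "hash": genesis_hash}]
        self.voted = set()  # VIDi values that already own a block

    def validate(self, vid, candidate, v_time1, v_time2, candidates) -> bool:
        """Checks made before a transaction TS is accepted."""
        if abs(v_time2 - v_time1) > T_SECONDS:
            return False  # arrived outside the allowed time window
        if vid in self.voted:
            return False  # this voter already has a block: double vote
        return candidate in candidates

    def append_block(self, vid, candidate, v_time1) -> dict:
        """Create BLKvi for the transaction (CI || V_Time1) and link it to the chain."""
        prev = self.chain[-1]
        body = {
            "index": prev["index"] + 1,
            "prev_hash": prev["hash"],
            "vid": vid,
            "tx": {"CI": candidate, "V_Time1": v_time1},
        }
        block = dict(body, hash=sha256_json(body))
        self.chain.append(block)
        self.voted.add(vid)
        return block


if __name__ == "__main__":
    ledger = VoteLedger("0" * 64)
    cands = {"Candidate A", "Candidate B"}
    if ledger.validate("VID001", "Candidate A", 100.0, 101.5, cands):
        ledger.append_block("VID001", "Candidate A", 100.0)
    print(len(ledger.chain))  # 2
    print(ledger.validate("VID001", "Candidate B", 200.0, 200.5, cands))  # False (double vote)
    print(ledger.validate("VID002", "Candidate B", 200.0, 205.0, cands))  # False (|dt| > T)
```

Note that the time-window check on its own only limits replay and delay; it is the per-voter record (here the `voted` set) that actually blocks a second vote.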

6.3.4 Vote Counting and Announcement of Result Phase

VCS counts all candidate-wise votes from the blockchain and then announces the candidate-wise (CI) voting results in a tabulated format. Table 6.2 shows a sample of this format; the vote counts in it are arbitrary.
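Counting amounts to grouping the recorded votes by region and candidate and taking the maximum per region. A minimal Python sketch, assuming each vote block can be reduced to a (region, candidate) pair (the region field is a hypothetical addition; the scheme does not say where it is stored), is given below and applied to the sample counts of Table 6.2.

```python
from collections import Counter


def tally(ballots):
    """Count votes per region from (region, candidate) pairs read off the vote blocks."""
    per_region = {}
    for region, candidate in ballots:
        per_region.setdefault(region, Counter())[candidate] += 1
    return per_region


def winners(per_region):
    """Candidate with the most votes in each region; ties go to the first candidate seen."""
    return {region: counts.most_common(1)[0][0] for region, counts in per_region.items()}


# Sample counts from Table 6.2, expanded into one (region, candidate) pair per vote.
TABLE_6_2 = {
    "R1": {"Candidate A": 3000, "Candidate B": 2000, "Candidate C": 1500},
    "R2": {"Candidate A": 3500, "Candidate B": 4000, "Candidate C": 2700},
    "R3": {"Candidate A": 4000, "Candidate B": 5100, "Candidate C": 1200},
    "R4": {"Candidate A": 1500, "Candidate B": 5300, "Candidate C": 1200},
    "R5": {"Candidate A": 2700, "Candidate B": 5100, "Candidate C": 1200},
}
ballots = [(r, c) for r, row in TABLE_6_2.items() for c, n in row.items() for _ in range(n)]

if __name__ == "__main__":
    for region, winner in winners(tally(ballots)).items():
        print(region, winner)
```

`Counter.most_common` orders equal counts by first insertion, so a tie-breaking rule is implicit here; a real election would need an explicit, published rule.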

6.4 Conclusion

In this paper, we have proposed a basic architecture and communication protocol for an e-voting application using a private blockchain and the PBFT consensus mechanism


Table 6.2 Sample publication of voting results

| Region of votes | Candidate A | Candidate B | Candidate C | Winning candidate |
|---|---|---|---|---|
| R1 | 3000 | 2000 | 1500 | Candidate A |
| R2 | 3500 | 4000 | 2700 | Candidate B |
| R3 | 4000 | 5100 | 1200 | Candidate B |
| R4 | 1500 | 5300 | 1200 | Candidate B |
| R5 | 2700 | 5100 | 1200 | Candidate B |

that can be used in e-voting applications to solve various problems of digital voting. The application can resist double voting and the 51% attacks that generally occur in blockchain-based voting schemes.

References

1. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008). bitcoin.org. https://bitcoin.org/bitcoin.pdf. Accessed 16 Feb 2021
2. Casado-Vara, R., Corchado, J.M.: Blockchain for democratic voting: how blockchain could cast off voter fraud. Oriental J. Comput. Sci. Technol. 11(3) (2018)
3. Krishnamurthy, R., Rathee, G., Jaglan, N.: An enhanced security mechanism through blockchain for E-polling/counting process using IoT devices. Wirel. Netw. (2019)
4. Dhulavvagol, P.M., Bhajantri, V.H., Totad, S.G.: Blockchain Ethereum clients performance analysis considering e-voting application. Procedia Comput. Sci. 167, 2506–2515 (2020)
5. Alvi, S.T., Uddin, M.N., Islam, L.: Digital voting: a blockchain-based e-voting system using biohash and smart contract. In: Third International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 228–233. IEEE (2020)
6. Sadia, K., Masuduzzaman, M., Paul, R.K., Islam, A.: Blockchain-based secure e-voting with the assistance of smart contract. In: IC-BCT 2019, pp. 161–176. Springer, Singapore (2020)
7. Shah, A., Sodhia, N., Saha, S., Banerjee, S., Chavan, M.: Blockchain enabled online-voting system. In: ITM Web of Conferences, vol. 32, p. 03018. EDP Sciences (2020)
8. Suganya, R., Sureshkumar, A., Alaguvathana, P., Priyadharshini, M.S., Jeevanantham, K.: Blockchain based secure voting system using IoT. Int. J. Future Gener. Commun. Netw. 13(3), 2134–2142 (2020)
9. Abuidris, Y., Kumar, R., Yang, T., Onginjo, J.: Secure large-scale e-voting system based on blockchain contract using a hybrid consensus model combined with sharding. ETRI J. (2020)
10. Roh, C.H., Lee, I.Y.: A study on electronic voting system using private blockchain. J. Inform. Process. Syst. 16(2), 421–434 (2020)
11. Priya, K.L.S., Rupa, C.: Blockchain technology based electoral franchise. In: 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 1–5. IEEE (2020)
12. Jain, N., Upadhyay, P., Arora, P., Chaurasia, P.: MATDAAN: a secure voting system using blockchain. Int. Res. J. Modern. Eng. Technol. Sci. 3 (2021)
13. Baudier, P., Kondrateva, G., Ammi, C., Seulliet, E.: Peace engineering: the contribution of blockchain systems to the e-voting process. Technol. Forecasting Soc. Change 162, 120397 (2021)
14. Wang, Z., Jin, H., Dai, W., Choo, K.K.R., Zou, D.: Ethereum smart contract security research: survey and future research opportunities. Frontiers Comput. Sci. 15(2), 1–18 (2021)

Chapter 7

Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning

Arghyadeep Sen, Shantipriya Parida, Ketan Kotwal, Subhadarshi Panda, Ondřej Bojar, and Satya Ranjan Dash

Abstract Multimodal machine translation (MMT) refers to the extraction of information from more than one modality, aiming to improve performance by utilizing information from modalities other than pure text. The availability of multimodal datasets, particularly for Indian regional languages, is still limited, so there is a need to build such datasets for regional languages to advance MMT research. In this work, we describe the process of creating the Bengali Visual Genome (BVG) dataset. BVG is the first multimodal dataset consisting of text and images suitable for English-to-Bengali multimodal machine translation and multimodal research. We also demonstrate sample use cases of machine translation and region-specific image captioning using the new BVG dataset. These results can be considered baselines for subsequent research.

A. Sen · S. R. Dash (corresponding author): KIIT University, Bhubaneswar, India
S. Parida · K. Kotwal: Idiap Research Institute, Martigny, Switzerland
S. Panda: Graduate Center, City University of New York, New York, USA
O. Bojar: Charles University, MFF, ÚFAL, Prague, Czech Republic
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_7


A. Sen et al.

7.1 Introduction

In the broad area of machine learning and deep learning, multimodal processing refers to training models on combined information sources such as image, audio, text, or video. Multimodal data facilitates learning features from various subsets of information sources (based on data modality) to improve prediction accuracy. Multimodal machine translation includes information from more than one modality in the hope that the additional modalities contain useful alternative views of the input data [1]. Although machine translation performance has reached near-human level for several language pairs (see, e.g., [2]), it remains challenging to translate low-resource languages or to effectively utilize other modalities (e.g., images [3]). The objective of this paper is twofold:

1. To describe the process of building a multimodal dataset for the Bengali language suitable for English-to-Bengali machine translation and multimodal research.
2. To demonstrate some sample use cases of the newly created multimodal dataset, the Bengali Visual Genome (BVG).

7.2 Related Work

For the Bengali language, very limited work has been done in multimodal research, including image captioning, and to the best of our knowledge none in multimodal machine translation, owing to the lack of a multimodal bilingual corpus. Khan et al. [4] proposed an end-to-end image captioning framework for generating Bengali captions using the BanglaLekhaImageCaptions dataset. In their work, image features are extracted using a pretrained ResNet-50, and text (sentence) features using a one-dimensional CNN. Rahman et al. [5] proposed Chittron, an automatic image captioning system that uses VGG16 to generate image features and a stacked LSTM for caption generation. Another Bengali image captioning dataset, consisting of 500 images of lifestyle and festivals along with their associated captions, is available for research [6].

7.3 Dataset Preparation

To avoid any bias, we did not use any machine translation system and relied on human volunteers. The dataset statistics are shown in Table 7.1.

7 Bengali Visual Genome …


Table 7.1 Brief details of the Bengali Visual Genome (BVG) dataset

| Dataset | Number of items |
|---|---|
| Training dataset | 28,930 |
| Development test set (D-test) | 998 |
| Evaluation test set (E-test) | 1595 |
| Challenge test set (C-test) | 1400 |

English text: The sharp bird talon. Bengali text: [shown in Fig. 7.1]

Fig. 7.1 Sample image with a specific region and its description in English and Bengali for text-only, multimodal, and caption generation tasks

7.3.1 Training Set Preparation

We follow the same selection of short English segments (captions) and associated images from Visual Genome as Hindi Visual Genome (HVG) 1.1¹. For BVG, volunteers manually translated these captions from English to Bengali, taking the associated images and their regions into account, as shown in Fig. 7.1. The translation was performed by human volunteers (native Bengali speakers) without using any machine translation system.

7.3.2 Test Set Preparation

The development test (D-Test), evaluation test (E-Test), and challenge test (C-Test) sets were prepared in the same fashion as the training set. The C-Test was created for the WAT2019 multimodal task² [7] by searching for (particularly) ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity. However, the surrounding words in the sentence often also include sufficient cues to identify the correct meaning of the ambiguous word. The ambiguous words used in the challenge test set are: Stand, Court, Players, Cross, Second, Block, Fast, Date, Characters, Stamp, English, Fair, Fine, Press, Forms, Springs, Models, Forces, and Penalty [8]. Without referring to the associated image, the NMT system fails to translate sentences containing one or more of these ambiguous words correctly, as shown in Fig. 7.2.

¹ https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3267

(a) Four men on … (b) Judge on side of platform.

Fig. 7.2 Sample item from the BVG challenge test set. Without the associated image, the machine translation system fails to translate the ambiguous word court (e.g., tennis or judicial) in the English source text correctly into Bengali. The English text in image (a) contains the ambiguous word court. The English text in image (b) is even more ambiguous: it contains the words judge and court, which refer to a ‘tennis court’ rather than a ‘judicial court’.

7.4 Sample Applications of BVG

7.4.1 Text-Only Translation

For text-only translation, we first trained SentencePiece subword units [9], setting the maximum vocabulary size to 8k. The vocabulary was learned jointly on the source and target sentences. We set the number of encoder and decoder layers to 3 each and the number of attention heads to 8. The hidden size was set to 128, with a dropout value of 0.1. We initialized the model parameters using Xavier initialization [10] and used the Adam optimizer [11] with a learning rate of 5 × 10⁻⁴ to optimize the model parameters. Gradient clipping was used to clip gradients greater than 1. Training was stopped when the development loss did not improve for 5 consecutive epochs. The training, dev, test, and challenge test sizes for the neural machine translation (NMT) experiment are shown in Table 7.2, and the text-to-text NMT results in Table 7.3.

² https://ufal.mff.cuni.cz/hindi-visual-genome/wat-2019-multimodal-task

Table 7.2 Details of the processed BVG for NMT experiments. The number of tokens for English (EN) and Bengali (BN) for each set is reported

| Dataset | #Sentences | #Tokens EN | #Tokens BN |
|---|---|---|---|
| Train | 28,930 | 143,156 | 113,993 |
| D-test | 998 | 4922 | 3936 |
| E-test | 1595 | 7853 | 6408 |
| C-test | 1400 | 8186 | 6657 |

Table 7.3 Results of text-to-text translation on the BVG dataset

| D-test BLEU | E-test BLEU | C-test BLEU |
|---|---|---|
| 42.8 | 35.6 | 17.2 |
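Two training-loop details above, clipping gradients at 1 and stopping after 5 epochs without development-loss improvement, are easy to get subtly wrong. The framework-free Python sketch below assumes that "clip gradients greater than 1" means clipping the global L2 norm (as PyTorch's `torch.nn.utils.clip_grad_norm_` does) and that "improve" means a strictly lower loss; the paper does not state either detail, and the loss values in the usage example are made up.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradient values so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


class EarlyStopping:
    """Signal a stop when dev loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, dev_loss):
        """Record one epoch's dev loss; return True when training should stop."""
        if dev_loss < self.best:
            self.best = dev_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=5)
    dev_losses = [2.0, 1.5, 1.4, 1.41, 1.45, 1.42, 1.43, 1.44]  # illustrative values
    for epoch, loss in enumerate(dev_losses, start=1):
        if stopper.step(loss):
            print("stop at epoch", epoch)  # stop at epoch 8
            break
```

Clipping by global norm preserves the direction of the update and only shrinks its length, which is why it is usually preferred over clipping each value independently.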

7.4.2 Bengali Caption Generation

In this section, we demonstrate the use case of region-specific image caption generation. We provide a baseline method for generating Bengali captions for the area enclosed by the bounding box provided in the BVG dataset. Vinyals et al. [12] proposed an end-to-end deep neural network for generating captions for the entire image. Their network consists of a vision CNN (used as a feature extractor for images) followed by a language-generating RNN (to obtain the caption as a sequence). Taking the model from [12] as the reference model, we incorporate the following modifications for region-specific caption generation. The overall architecture of the modified network is shown in Fig. 7.3.

Fig. 7.3 Architecture of the region-specific image caption generator


The reference model uses features from the last convolutional layer of the vision CNN as input to the subsequent RNN. As our interest lies in obtaining a caption focused on a specific region, we need to consider the features of that region in addition to the whole-image features. We compute the scaling factor between the input image size and the size of the final convolutional layer of the vision CNN. Using this factor, we identify the coordinates of the region (bounding box) in the final convolutional layer and obtain the features for the corresponding region through Region of Interest (RoI) pooling [13]. We generate the final feature vector by concatenating the region features and the whole-image features. In the present use case, we use ResNet-50 as the backbone of the reference model. In this approach, the encoder module is not trainable and only extracts the image features, whereas the LSTM decoder is trainable. We used the LSTM decoder with the

Fig. 7.4 Sample output of text-only translation and Bengali captioning


Table 7.4 Results of the region-specific image captioning on the BVG dataset

| D-test BLEU | E-test BLEU | C-test BLEU |
|---|---|---|
| 2.5 | 1.3 | 0.4 |

image features for caption generation with a greedy search approach [14]. We used the cross-entropy loss to train the decoder [15]. The results are shown in Table 7.4. Fig. 7.4 shows a sample output of text-only translation and the generated Bengali captions. We obtained better results for both text-only translation and image captioning while referring to the reference.
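The region-feature step of the captioning model (scale the bounding box by the input-to-feature-map ratio, RoI-pool the region, and concatenate it with whole-image features) can be sketched in plain Python as below. This is not the authors' code: the 2x2 output grid, global average pooling for the whole-image features, and floor/ceil rounding of the box are assumptions (Fast R-CNN itself pools to a fixed grid such as 7x7). A tensor implementation would typically use `torchvision.ops.roi_pool`, whose `spatial_scale` argument plays the role of the inverse scaling factor.

```python
import math


def scale_box(box, image_size, feature_size):
    """Map an (x1, y1, x2, y2) pixel box onto [r0, r1) x [c0, c1) feature-map cells."""
    factor = image_size / feature_size  # e.g. 224 / 7 = 32 for a ResNet-50 backbone
    x1, y1, x2, y2 = box
    c0, r0 = int(x1 // factor), int(y1 // factor)
    c1 = min(feature_size, max(c0 + 1, math.ceil(x2 / factor)))
    r1 = min(feature_size, max(r0 + 1, math.ceil(y2 / factor)))
    return r0, r1, c0, c1


def roi_max_pool(channel, r0, r1, c0, c1, out=2):
    """Max-pool one channel's RoI window into an out x out grid (row-major list)."""
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out):
        rs = r0 + (i * h) // out
        re = r0 + max(((i + 1) * h) // out, (i * h) // out + 1)  # at least one row
        for j in range(out):
            cs = c0 + (j * w) // out
            ce = c0 + max(((j + 1) * w) // out, (j * w) // out + 1)  # at least one column
            pooled.append(max(channel[r][c] for r in range(rs, re) for c in range(cs, ce)))
    return pooled


def region_plus_global(feature_map, box, image_size, out=2):
    """Concatenate RoI-pooled region features with globally averaged image features.

    feature_map: [channels][rows][cols] with rows == cols (last conv layer output)
    """
    size = len(feature_map[0])
    r0, r1, c0, c1 = scale_box(box, image_size, size)
    region = [v for ch in feature_map for v in roi_max_pool(ch, r0, r1, c0, c1, out)]
    whole = [sum(map(sum, ch)) / (size * size) for ch in feature_map]
    return region + whole


if __name__ == "__main__":
    # Toy 2-channel 7x7 "feature map" for a 224x224 input.
    fmap = [[[7 * r + c for c in range(7)] for r in range(7)],
            [[0] * 7 for _ in range(7)]]
    print(scale_box((64, 64, 160, 128), 224, 7))  # (2, 4, 2, 5)
    print(region_plus_global(fmap, (64, 64, 160, 128), 224))
```

The resulting vector has C*out*out region values followed by C whole-image values, which is what would be fed to the LSTM decoder in place of the whole-image features alone.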

7.5 Conclusion and Future Work

We have presented the first multimodal English–Bengali dataset suitable for multimodal research applications such as (a) multimodal translation, (b) Bengali caption generation, including e-commerce product catalog labeling, and (c) product development for visually impaired persons. To encourage use of BVG by the research community, we plan to include this dataset in multimodal shared tasks for Bengali image captioning as well as tasks related to English–Bengali multimodal machine translation. We also plan to extend the BVG multimodal dataset for visual question answering. Our Bengali Visual Genome is available for research and non-commercial use under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License³ at http://hdl.handle.net/11234/1-3722.

Acknowledgements The author Ondřej Bojar would like to acknowledge the support of grant 19-26934X (NEUREM3) of the Czech Science Foundation.

³ https://creativecommons.org/licenses/by-nc-sa/4.0/

References

1. Sulubacak, U., Caglayan, O., Grönroos, S.A., Rouhe, A., Elliott, D., Specia, L., Tiedemann, J.: Multimodal machine translation through visuals and speech. Mach. Transl. 34(2), 97–147 (2020)
2. Popel, M., Tomkova, M., Tomek, J., Kaiser, Ł., Uszkoreit, J., Bojar, O., Žabokrtský, Z.: Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. Nat. Commun. 11(1), 1–15 (2020)
3. Parida, S., Motlicek, P., Dash, A.R., Dash, S.R., Mallick, D.K., Biswal, S.P., Pattnaik, P., Nayak, B.N., Bojar, O.: ODIANLP's participation in WAT2020. In: Proceedings of the 7th Workshop on Asian Translation, pp. 103–108 (2020)
4. Khan, M.F., Sadiq-Ur-Rahman, S., Islam, M.S.: Improved Bengali image captioning via deep convolutional neural network based encoder-decoder model. In: Proceedings of International Joint Conference on Advances in Computational Intelligence, pp. 217–229. Springer (2021)
5. Rahman, M., Mohammed, N., Mansoor, N., Momen, S.: Chittron: an automatic Bangla image captioning system. Procedia Comput. Sci. 154, 636–642 (2019)
6. Kamruzzaman, T.: Dataset for image captioning system (in Bangla) (2021)
7. Nakazawa, T., Doi, N., Higashiyama, S., Ding, C., Dabre, R., Mino, H., Goto, I., Pa, W.P., Kunchukuttan, A., Oda, Y., Parida, S., Bojar, O., Kurohashi, S.: Overview of the 6th workshop on Asian translation. In: Proceedings of the 6th Workshop on Asian Translation, pp. 1–35. Association for Computational Linguistics, Hong Kong, China (Nov 2019). https://doi.org/10.18653/v1/D19-5201, https://www.aclweb.org/anthology/D19-5201
8. Parida, S., Bojar, O., Dash, S.R.: Hindi visual genome: a dataset for multi-modal English to Hindi machine translation. Comput. Sist. 23(4) (2019)
9. Kudo, T., Richardson, J.: SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 66–71. Association for Computational Linguistics, Brussels, Belgium (Nov 2018). https://doi.org/10.18653/v1/D18-2012, https://www.aclweb.org/anthology/D18-2012
10. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 9, pp. 249–256. PMLR, Chia Laguna Resort, Sardinia, Italy (13–15 May 2010). http://proceedings.mlr.press/v9/glorot10a.html
11. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). http://arxiv.org/abs/1412.6980. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015
12. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3156–3164 (2015). https://doi.org/10.1109/CVPR.2015.7298935
13. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
14. Soh, M.: Learning CNN-LSTM architectures for image caption generation
15. Yu, J., Li, J., Yu, Z., Huang, Q.: Multimodal transformer with multi-view visual representation for image captioning. IEEE Trans. Circuits Syst. Video Technol. 30(12), 4467–4480 (2019)

Chapter 8

Deep Learning-Based Mosquito Species Detection Using Wingbeat Frequencies

Ayush Jhaveri, K. S. Sangwan, Vinod Maan, and Dhiraj

Abstract The outbreak of mosquito-borne diseases such as malaria, dengue, chikungunya, Zika, yellow fever, and lymphatic filariasis has become a major threat to human existence. Hence, the elimination of harmful mosquito species has become a worldwide necessity. The techniques to reduce and eliminate these mosquito species require monitoring of their populations in regions across the globe. This monitoring can be performed by automatic detection from the sounds of their wingbeats, which can be recorded in mosquito suction traps. In this paper, we explore the detection of the six most harmful mosquito species from the sounds emitted by their wingbeats. Starting from the 279,566 wingbeat recordings in the Kaggle wingbeat dataset, we balance the data across the six mosquito species using data augmentation techniques. Using state-of-the-art machine learning models, we achieve detection accuracies of up to 97%. These models can then be integrated with mosquito suction traps to form an efficient mosquito species detection system.

A. Jhaveri · K. S. Sangwan: Birla Institute of Technology and Science (BITS Pilani), Pilani Campus, Pilani, Rajasthan, India
V. Maan: Mody University, Lakshmangarh, Rajasthan, India
Dhiraj (corresponding author): CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI), Pilani, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_8

8.1 Introduction

Mosquitoes are small, yet they are among the most life-threatening creatures to humankind. They carry some of the deadliest illnesses, including malaria, dengue, chikungunya, Zika, yellow fever, and lymphatic filariasis [1]. Their ability to spread quickly and infect humans kills over 1 million people per year [2] and has made the war against mosquitoes a global priority. The United Nations, along with the World Health


A. Jhaveri et al.

Organization’s tropical diseases program [3], has set guidelines for affected countries to prevent mosquito-borne diseases. Mosquitoes of the Aedes, Anopheles, and Culex genera are vectors of the harmful illnesses mentioned above. The most effective measures against these mosquitoes have been insecticides and the sterile insect technique (SIT) [4], in which sterile males are released in affected areas to reduce the population. However, a vital prerequisite for these prevention techniques is the detection and identification of mosquito populations. The Biogents Counter [5] is a device that differentiates between mosquitoes and dust particles and transmits mosquito counts to the cloud. Further analysis can be done at this stage, and mosquito species populations can be identified using the frequencies of their wingbeats. We therefore present state-of-the-art deep learning solutions to effectively identify these harmful mosquito species from their wingbeat frequencies.

8.2 Related Works

Several methods have been developed to identify harmful mosquito species. The most accurate identification of these species is described in [6], where DNA-based methods are explored. In [7–10], vector mosquitoes are automatically identified from their images: using GoogLeNet [7], simple convolutional neural networks [8], artificial neural networks [9], and deep convolutional neural networks [10], these works identify the harmful mosquitoes with accuracies of 76.2%, 93%, 96%, and 97%, respectively. Although these methods would be fast and accurate, acquiring images of the mosquitoes poses a hurdle. Different mosquito species produce unique harmonic frequencies from their wingbeats, and mosquito species classification can be performed automatically from the sounds of their wingbeats, as presented in [11, 12]. These works extracted slightly different audio features and applied state-of-the-art deep learning models to the wingbeat dataset [13], achieving best results of 86% [11] and 96% [12] with spectrograms as audio features.

8.3 Materials and Methods

Our solution follows the series of steps shown in Fig. 8.1. The first step is audio data capture, where we obtain WAV files of mosquito wingbeats from the Kaggle wingbeat database [13]. These data are extended by applying data augmentation to the existing WAV files to balance the data across the mosquito species. We generate mel spectrograms from these audio files and use them as input to train machine learning models.

8 Deep Learning-Based Mosquito Species Detection …


Fig. 8.1 Block diagram showing the process of our study

Table 8.1 Kaggle wingbeat dataset description [13]

| Species | Number of recordings |
|---|---|
| Ae. aegypti | 85,553 |
| Ae. albopictus | 20,231 |
| An. arabiensis | 19,279 |
| An. gambiae | 49,471 |
| Cu. pipiens | 30,415 |
| Cu. quinquefasciatus | 74,599 |
| Total | 279,566 |

8.3.1 Audio Data Capture

We use the Kaggle Wingbeats database [13], which consists of 279,566 labeled wingbeat recordings of the 6 mosquito species listed in Table 8.1. The data were recorded with an optical sensor at the Biogents premises in Regensburg, Germany.

8.3.2 Data Preprocessing

To balance the number of recordings across the six mosquito species, we apply data augmentation to the species with fewer WAV files. Augmented recordings are obtained by combining audio recordings into a longer continuous recording and extracting random audio segments with a length of 5000 samples. The distribution of audio recordings after this augmentation step is given in Table 8.2. We split the dataset 80:20 into training and test sets. Mel spectrograms [14] are time–frequency representations of audio signals, with the mel frequency on the vertical axis, time on the horizontal axis, and color representing the intensity of the audio signal. Different mosquito species exhibit different harmonic frequencies, as shown by the mel spectrograms of three mosquito species in Fig. 8.2. The WAV files are converted into mel spectrograms using the Python library Librosa [15]. Based on experimentation, we found that the best results are obtained when mel spectrograms are generated with a fast Fourier transform (FFT) size of 256 and a hop length of 42.
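The augmentation step above can be sketched as follows. This is a minimal pure-Python illustration, assuming each recording is a list of samples and that the original recordings are kept alongside the new segments; the function names, the seed, and the uniform choice of window start are our assumptions rather than the authors' code. A random window may straddle the boundary between two concatenated recordings, which is what cutting segments from one continuous recording implies.

```python
import random

SEGMENT_LEN = 5000  # samples per recording, as stated in Sect. 8.3.2

def augment_species(recordings, target_count, seed=0):
    """Hypothetical sketch: pad one species' recordings up to target_count by
    cutting random 5000-sample windows out of their concatenation."""
    rng = random.Random(seed)
    stream = [sample for rec in recordings for sample in rec]  # one long continuous recording
    if len(stream) < SEGMENT_LEN:
        raise ValueError("need at least one full segment of audio")
    augmented = [list(rec) for rec in recordings]  # keep the original recordings
    while len(augmented) < target_count:
        start = rng.randrange(len(stream) - SEGMENT_LEN + 1)
        augmented.append(stream[start:start + SEGMENT_LEN])
    return augmented

def split_80_20(items, seed=0):
    """Shuffle, then split into 80% training and 20% test items."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids floating-point rounding
    return shuffled[:cut], shuffled[cut:]
```

Applied per species, with `target_count` set near the size of the largest class, this yields roughly balanced class counts like those in Table 8.2.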

A. Jhaveri et al.

Table 8.2 Final augmented dataset description

Species                           Number of recordings
Ae. aegypti                       85,553
Ae. albopictus                    85,377
An. arabiensis                    85,485
An. gambiae                       85,092
Cu. pipiens                       85,485
Cu. quinquefasciatus              85,043
Total (80% train : 20% test)      512,035 (408,736 : 103,299)

Fig. 8.2 Mel spectrograms of Ae. aegypti, An. gambiae, and Cu. quinquefasciatus, respectively
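The spectrogram settings reported in Sect. 8.3.2 (FFT size 256, hop length 42) fix the time and frequency resolution of each input. The sketch below works this out, assuming the recordings are loaded at an 8 kHz native sampling rate (for example with `librosa.load(path, sr=None)`, since librosa otherwise resamples to 22,050 Hz) and that the spectrogram is computed with `librosa.feature.melspectrogram(y=y, sr=sr, n_fft=256, hop_length=42)`, whose STFT is centered by default. The HTK mel formula is shown only for intuition; librosa's default mel scale (Slaney) differs slightly.

```python
import math

SR = 8000          # assumed native sampling rate of the Wingbeats recordings (Hz)
N_FFT = 256        # FFT size reported in Sect. 8.3.2
HOP_LENGTH = 42    # hop length reported in Sect. 8.3.2
N_SAMPLES = 5000   # samples per recording

def hz_to_mel(f_hz):
    """HTK mel scale; librosa's default (Slaney) variant differs slightly."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def stft_frames(n_samples, n_fft=N_FFT, hop_length=HOP_LENGTH, center=True):
    """Number of STFT frames; center=True pads n_fft // 2 samples on both sides."""
    if center:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length

bin_width_hz = SR / N_FFT        # spacing between FFT bins
n_fft_bins = N_FFT // 2 + 1      # bins in the one-sided spectrum
frames = stft_frames(N_SAMPLES)  # time steps in one spectrogram
```

Under these assumptions, each 5000-sample recording yields 120 time frames with an FFT bin spacing of 31.25 Hz, so a spectrogram is roughly n_mels × 120 (librosa uses n_mels = 128 by default) before it is resized to 256 × 256 × 3 (Sect. 8.4.1).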

8.3.3 Machine Learning-Based Model Development and Training

With mel spectrograms of the mosquito wingbeats as input, we train two deep learning models: MobileNet [16], a lightweight model, and DenseNet-121 [17], a deeper model. We chose these models because of their significant difference in model complexity, in order to compare their performance in predicting mosquito species.

MobileNet. MobileNets [16] are lightweight, efficient models that are well suited to embedded vision applications such as this one. The architecture of MobileNet is shown in Fig. 8.3. The model has a total of 5,866,182 parameters.

DenseNet-121. DenseNets [17], or densely connected convolutional networks, make it possible to increase the depth of deep convolutional networks. Within a dense block, every layer is connected to every other layer in a feed-forward fashion, as shown for Dense Block 1 in Fig. 8.4, which maximizes the information flow. The network exploits feature reuse, and hence needs fewer parameters than traditional convolutional neural networks. We use DenseNet-121, which has 121 layers arranged in four dense blocks of interconnected layers, with a total of 8,625,222 parameters.
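The efficiency arguments above can be made concrete by counting weights. The sketch below compares a standard convolution with the depthwise separable convolution on which MobileNet is built [16], and lists the channel growth inside a DenseNet dense block [17]. The layer sizes (a 3 × 3 kernel with 512 input and output channels) are illustrative assumptions; the dense-block figures assume DenseNet-121's growth rate of 32 and its first block of 6 layers fed by 64 channels.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """MobileNet-style block: one k x k depthwise filter per input channel,
    followed by a 1 x 1 pointwise convolution (biases ignored)."""
    return k * k * c_in + c_in * c_out

def dense_block_channels(k0, growth_rate, n_layers):
    """Input channels seen by each layer of a dense block, and the block's output width:
    every layer receives the concatenation of all earlier feature maps."""
    inputs = [k0 + i * growth_rate for i in range(n_layers)]
    return inputs, k0 + n_layers * growth_rate
```

For the 3 × 3, 512-channel case, the separable block needs about 11% of the weights of a standard convolution (the ratio is 1/c_out + 1/k²), and the first dense block of DenseNet-121 ends with 256 channels even though each layer adds only 32 new feature maps.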


Fig. 8.3 MobileNet general architecture [16]

Fig. 8.4 DenseNet with three blocks [17]

8.3.4 Inference Generation

As described in Sect. 8.3.2, we performed an 80:20 train–test split. Since there are six mosquito species to identify, six output class labels (0–5) are used. Several evaluation metrics, including accuracy, loss, precision, and recall, are used to evaluate and compare our models.
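The precision, recall, and F1 values in Tables 8.3 and 8.4 are reported as macro and weighted averages. A minimal pure-Python sketch of how these are obtained from a confusion matrix is shown below (in practice, scikit-learn's `classification_report` produces the same summaries). The two-class matrix in the check is a made-up example; per-class recall corresponds to the per-species "prediction accuracy" quoted in Sect. 8.4.2 if the confusion matrices are normalized by true class, which we assume.

```python
def per_class_scores(cm):
    """cm[i][j] counts samples of true class i predicted as class j."""
    n = len(cm)
    support = [sum(row) for row in cm]                               # true samples per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predictions per class
    precision, recall, f1 = [], [], []
    for c in range(n):
        tp = cm[c][c]
        p = tp / predicted[c] if predicted[c] else 0.0
        r = tp / support[c] if support[c] else 0.0  # per-class accuracy of a row-normalized matrix
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return precision, recall, f1, support

def summarize(cm):
    """Macro and support-weighted averages, as reported in Tables 8.3 and 8.4."""
    precision, recall, f1, support = per_class_scores(cm)
    total = sum(support)

    def macro(xs):
        return sum(xs) / len(xs)

    def weighted(xs):
        return sum(x * s for x, s in zip(xs, support)) / total

    accuracy = sum(cm[c][c] for c in range(len(cm))) / total
    return {
        "macro": (macro(precision), macro(recall), macro(f1)),
        "weighted": (weighted(precision), weighted(recall), weighted(f1)),
        "accuracy": accuracy,
    }
```

One consequence is worth noting: the weighted-average recall always equals the accuracy, because weighting each class's recall TP_c / s_c by its support s_c gives Σ TP_c / N. This is why the weighted recall and the accuracy coincide in Tables 8.3 and 8.4.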

8.4 Results

In this section, we describe the experimental setup, present the model results, and compare their performance with previous works.


8.4.1 Experimental Setup

Once the model architectures are defined, the generated spectrograms are resized to 256 × 256 × 3 and used as input data for the models. The models are constructed and trained with TensorFlow as the backend and Keras for defining the building blocks. We performed five-fold cross-validation, in which five different 80:20 splits of the data are used for model training and testing; we analyze and present the results of one of these five folds.
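Five-fold cross-validation with 80:20 splits can be sketched as below: shuffling once and holding out a different fifth of the data in each fold gives five test sets that together cover every sample exactly once. The shuffling seed and the index-based interface are assumptions; scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=0)` (or `StratifiedKFold`, to keep class proportions equal in every fold) implements the same scheme.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; with k=5 each test fold holds ~20% of the data,
    giving five different 80:20 splits whose test folds do not overlap."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread any remainder over the first folds
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```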

8.4.2 Model Training

Both models, MobileNet [16] and DenseNet-121 [17], are trained on the mel spectrograms with a batch size of 32 for 200 epochs, using the Adam optimizer with a categorical cross-entropy loss function and an initial learning rate of 5 × 10⁻⁴ with polynomial decay over time. Graphs are plotted using Matplotlib, and the confusion matrices are constructed using scikit-learn.

MobileNet. We use transfer learning from ImageNet weights to initialize the model weights effectively, with the first 20 layers set as untrainable, which reduces the computation needed for training. To avoid overfitting, dropout regularization with a factor of 0.7 is applied, together with batch normalization, to the last four layers before the final softmax layer. L1 and L2 regularization are also employed, with regularization parameters of 1 × 10⁻⁵ and 1 × 10⁻⁴, respectively. The training graph and confusion matrix in Fig. 8.5 show the performance of MobileNet. The confusion matrix indicates that Ae. albopictus achieves the highest prediction accuracy (98.6%), whereas An. gambiae has the lowest (94.6%). The results are shown in Table 8.3: MobileNet achieves an accuracy of 96.69%.
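The learning-rate schedule can be sketched as follows. The step counts follow from Table 8.2 and the stated batch size and epoch count (408,736 / 32 = 12,773 steps per epoch), but the decay horizon (all 200 epochs), the final learning rate of 0, and the linear power of 1 are our assumptions, since the chapter states only an initial rate of 5 × 10⁻⁴ with polynomial decay. In Keras, such a schedule would be built with `tf.keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=5e-4, decay_steps=..., end_learning_rate=..., power=...)` and passed to `tf.keras.optimizers.Adam(learning_rate=...)`.

```python
import math

TRAIN_SAMPLES = 408_736   # training recordings (Table 8.2)
BATCH_SIZE = 32
EPOCHS = 200
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 12,773
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                   # 2,554,600

def polynomial_decay(step, initial_lr=5e-4, decay_steps=TOTAL_STEPS, end_lr=0.0, power=1.0):
    """Non-cycling polynomial decay: lr falls from initial_lr to end_lr over decay_steps,
    then stays at end_lr. decay_steps, end_lr and power are assumed values."""
    step = min(step, decay_steps)
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr
```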

Fig. 8.5 Training graph and confusion matrix for MobileNet

Table 8.3 Performance table for MobileNet

Metric       Macro average    Weighted average
Precision    0.9671           0.9671
Recall       0.9669           0.9669
F1-score     0.9669           0.9669
Accuracy     0.9669

DenseNet-121. We use transfer learning for DenseNet-121 with ImageNet weights to reduce the number of model parameters to be trained, and hence the time and computational power needed for training. Setting the first 40 layers as untrainable gave the best results. To avoid overfitting, we apply dropout regularization with a factor of 0.65 to the last three layers before the final softmax layer. The performance of DenseNet-121 can be judged from the training graph and confusion matrix in Fig. 8.6; the light diagonal in the confusion matrix indicates good predictions. The per-species accuracies vary little, from a minimum of 95.4% (An. gambiae) to a maximum of 98.9% (Cu. quinquefasciatus). The performance metrics of DenseNet-121 are provided in Table 8.4: it obtains an accuracy of 97.03%.

Fig. 8.6 Confusion matrix and training graph for DenseNet-121

Table 8.4 Performance table for DenseNet-121

Metric       Macro average    Weighted average
Precision    0.9705           0.9706
Recall       0.9703           0.9703
F1-score     0.9703           0.9703
Accuracy     0.9703

Table 8.5 Performance comparison between MobileNet and DenseNet-121

Metric                 MobileNet    DenseNet-121
Accuracy               0.9669       0.9703
F1-score               0.9669       0.9703
Model parameters       5,866,182    8,625,222
Total training time    22 h 30 m    55 h 55 m
Inference time         0.0113 s     0.0389 s

8.4.3 Performance Comparison

As Table 8.5 shows, the MobileNet model achieves an accuracy of 96.69%, whereas DenseNet-121 achieves a higher accuracy of 97.03%. It can also be inferred from Table 8.5 that MobileNet has about 2.76 million fewer parameters, takes less than half the time to train, and makes predictions more than three times faster than DenseNet-121. Hence, DenseNet-121 performs better in terms of accuracy, whereas MobileNet is better suited to mosquito detection applications with limited computational power and demands for quick predictions. Table 8.6 compares the performance of our models with previous works. We achieve better accuracies than similar models from previous works on the same dataset. Our MobileNet model (96.69%) achieves a higher accuracy than the MobileNet of 2018 (95.62%) [12], and our DenseNet-121 model performs better than the DenseNet-121 models of previous works from 2018 (96%) [12] and 2019 (80%) [11]. Both of our models also outperform the multilayer CNN (86%) [11], which was also trained on wingbeats.

Table 8.6 Performance comparison of our models with previous works

Model                     Mosquito feature used            Accuracy (%)
DenseNet-121 2019 [11]    Mel spectrograms of wingbeats    80.00
Multilayer CNN [11]       Mel spectrograms of wingbeats    86.00
DenseNet-121 2018 [12]    Mel spectrograms of wingbeats    96.00
MobileNet 2018 [12]       Mel spectrograms of wingbeats    95.62
GoogLeNet [7]             Images of mosquitoes             76.20
Simple CNN [8]            Images of mosquitoes             93.00
ANN [9]                   Images of mosquitoes             96.00
MobileNet (ours)          Mel spectrograms of wingbeats    96.69
DenseNet-121 (ours)       Mel spectrograms of wingbeats    97.03
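The comparisons drawn in Sect. 8.4.3 can be checked directly against Table 8.5 with a few lines of arithmetic (the values below are copied from the table):

```python
# Values copied from Table 8.5
params = {"MobileNet": 5_866_182, "DenseNet-121": 8_625_222}
train_minutes = {"MobileNet": 22 * 60 + 30, "DenseNet-121": 55 * 60 + 55}
inference_s = {"MobileNet": 0.0113, "DenseNet-121": 0.0389}

param_gap = params["DenseNet-121"] - params["MobileNet"]
train_time_ratio = train_minutes["MobileNet"] / train_minutes["DenseNet-121"]
inference_speedup = inference_s["DenseNet-121"] / inference_s["MobileNet"]

print(f"{param_gap:,} fewer parameters")                          # 2,759,040 fewer parameters
print(f"{train_time_ratio:.2f} of DenseNet-121's training time")  # 0.40 of ...
print(f"{inference_speedup:.1f}x faster inference")               # 3.4x faster inference
```

The gap is about 2.76 million parameters, MobileNet needs about 40% of DenseNet-121's training time, and its inference is roughly 3.4 times faster.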


8.5 Discussion and Conclusion

In this paper, we demonstrate and analyze the use of deep learning models on mel spectrograms of mosquito wingbeats to identify harmful mosquito species. Several other approaches using mel-frequency cepstral coefficients (MFCCs) as input were also tried: the MFCCs were fed into simple LSTM networks, stacked LSTM networks, combinations of LSTM networks, and ANNs, but all test accuracies were below 83%. This paper focuses on the successful experiments with DenseNet-121 and MobileNet, which achieve accuracies of up to 97.03%. Mosquito species can also be detected automatically from photographs; research in this area has achieved up to 97% accuracy [10], which is very similar to the performance of our work. The advantage of the wingbeat recordings we use over image capture is that it is very difficult to photograph a stationary mosquito with its small wings in a suitable position. One drawback of the dataset was its unbalanced nature. In addition, the wingbeat recordings lack background noise, which makes them somewhat unrepresentative of field conditions. This work could be extended to handle noisy inputs and to undergo realistic field testing. These mosquito detection models could be used with optoelectronic sensors in mosquito traps to form a reliable, automated, real-time mosquito species monitoring system with numerous use cases.

References

1. Caraballo, H., King, K.: Emergency department management of mosquito-borne illness: Malaria, Dengue, and West Nile Virus. Emerg. Med. Prac. 16(5), 1–23 (2014)
2. WHO Vector-borne diseases. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases. Last accessed 2020/03/31
3. UN mosquito sterilization technology set for global testing, in battle against malaria, dengue. https://news.un.org/en/story/2019/11/1051361. Last accessed 2020/03/31
4. IAEA Sterile insect technique. https://www.iaea.org/topics/sterile-insect-technique. Last accessed 2020/03/31
5. BG-Counter 2: high tech mosquito monitoring. https://www.bg-counter.com/. Last accessed 2020/03/31
6. Walton, C., Sharpe, R.G., Pritchard, S.J., Thelwell, N.J., Butlin, R.K.: Molecular identification of mosquito species. Biol. J. Lin. Soc. 68(1–2), 241–256 (1999)
7. Motta, D., Santos, A.A.B., Winkler, I., Machado, B.A.S., Pereira, D.A.D.I., et al.: Application of convolutional neural networks for classification of adult mosquitoes in the field. PLOS One 14(1), e0210289 (2019)
8. Akhter, M., Hossain, M.S., Ahmed, T.U., Anderson, K.: Mosquito classification using convolutional neural network with data augmentation. In: Intelligent Computing and Optimization, ICO. Advances in Intelligent Systems and Computing, vol. 1324 (2020)
9. Banerjee, A.K., Kiran, K., Murty, U.S.N., Venkateswarlu, C.: Classification and identification of mosquito species using artificial neural networks. Comput. Biol. Chem. 32(6), 442–447 (2008)
10. Park, J., Kim, D.I., Choi, B.: Classification and morphological analysis of vector mosquitoes using deep convolutional neural networks. Sci. Rep. 10, 1012 (2020)
11. Mulchandani, P., Sidiqui, M., Kanani, K.: Real-time mosquito species identification using deep learning techniques. Int. J. Eng. Adv. Technol. 9(2), 2249–8958 (2019)
12. Fanioudakis, E., Geismar, M., Potamitis, I.: Mosquito wingbeat analysis and classification using deep learning. In: European Signal Processing Conference (EUSIPCO), vol. 26, pp. 2410–2414 (2018)
13. Kaggle Wingbeats. https://www.kaggle.com/potamitis/wingbeats. Last accessed 2021/03/31
14. Understanding the Mel Spectrogram. https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53. Last accessed 2021/03/31
15. Librosa feature spectrogram. https://librosa.org/doc/main/generated/librosa.feature.melspectrogram.html. Last accessed 2021/03/31
16. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
17. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

Chapter 9

Developments in Capsule Network Architecture: A Review

Sudarshan Kapadnis, Namita Tiwari, and Meenu Chawla

Abstract

Computer vision problems such as image recognition, object detection, and image segmentation need effective solutions. Traditionally, these problems have been solved with deep learning: convolutional neural network (CNN) and recurrent neural network models are used for computer vision tasks. However, CNNs have some drawbacks. They often fail to recognize an object when it is viewed from a different viewpoint or when it is deformed. Besides, CNNs require an immense amount of training data. Capsule networks are viewed as a new solution for computer vision problems: they are capable of solving the above-mentioned problems better than CNNs and have shown better accuracy in many computer vision applications. In this paper, we review the methodologies and architectures of existing implementations of capsule networks.

9.1 Introduction

Computer vision tasks range from finding faulty products during quality control in a manufacturing unit to analyzing security video footage in real time. Beyond a certain scale, such tasks, which require real-time analysis, are impossible for human beings. The large volumes of available data make them suitable for deep neural networks such as the convolutional neural network (CNN). CNNs have been used in many areas of computer vision and have performed well on tasks such as facial recognition, plant disease detection, and image processing. CNNs can use data in different forms and extract features by themselves. Capsule networks [1] were introduced in 2017 and outperformed CNNs on the MNIST dataset. CapsNet uses an algorithm called “routing by agreement” in place of the pooling layers of CNNs, and it replaces the scalar outputs of CNN neurons with vectors. The magnitude of an output vector represents the probability that the feature represented by the capsule exists in the image, and the direction of the vector gives the instantiation parameter values. CapsNet recognizes the spatial relationships between low-level features, which a CNN simply discards during pooling. CNNs also need large volumes of training data, and labeling such data can be exhausting.

S. Kapadnis (✉) · N. Tiwari · M. Chawla, Maulana Azad National Institute of Technology, Bhopal 462003, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_9

9.1.1 Convolutional Neural Networks (CNN)

Any neural network broadly performs three operations:
• Weighting the inputs.
• Summing the weighted inputs.
• Applying a non-linearity.

A convolutional network is made up of convolutional layers, pooling layers, and a fully connected layer. Convolutional layers apply a filter (kernel) to the input image: the kernel is placed over the image and moved across it with a given stride. Since large stride values result in the loss of features, the stride value is kept below 2. This process generates feature maps, and several different kernels are used to generate several feature maps in order to preserve as many features as possible. The ReLU activation function applies the non-linearity while keeping the model's computational complexity low. Pooling is the next step. Pooling, also known as downsampling, keeps the most useful parts of the feature maps and discards the rest, reducing computational complexity while maintaining accuracy. Max-pooling, min-pooling, and average pooling are three different types of pooling. Most models favor max-pooling because it retains the strongest response in each region of a feature map. The pooled feature maps are then classified by a conventional artificial neural network (ANN) with fully connected layers. During backpropagation, the error is propagated back through the network, and both the ANN weights and the kernels are corrected.
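The three operations above (weighting, summing, and a non-linearity), together with strided convolution and max-pooling, can be illustrated on a toy example. The pure-Python sketch below uses a hypothetical 5 × 5 image containing a vertical edge and a hand-picked 3 × 3 edge kernel; real CNNs learn many kernels and operate on multi-channel tensors with optimized libraries.

```python
def conv2d(image, kernel, stride=1):
    """Unpadded ('valid') 2D convolution as used in CNNs (cross-correlation):
    each output is a weighted sum of the pixels under the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Element-wise non-linearity: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the strongest response in each window."""
    return [
        [max(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(fmap[0]) - size + 1, size)]
        for r in range(0, len(fmap) - size + 1, size)
    ]
```

On an image whose rows are all `[0, 0, 1, 1, 1]`, the kernel with rows `[-1, 0, 1]` produces `[3, 3, 0]` in every row at stride 1, and 2 × 2 max-pooling keeps only the strongest response, 3; with a stride of 2, the output shrinks to 2 × 2.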

9.1.2 Limitations of Convolutional Neural Networks

In CNNs, the pooling process loses essential image information, since it ignores the pose, texture, and deformation of an image [1], as well as portions of the image. As a result, CNNs do not take the pose of an object into account. Because the pooling operation loses several features, a CNN needs more training data than a CapsNet. CNNs are also vulnerable to adversarial attacks, such as pixel perturbations, which lead to incorrect classifications [2].

9 Review of CapsNet Arch


9.2 Capsule Networks (CapsNet)

“A capsule is a set of neurons whose activity vector represents the instantiation parameters of a particular entity, such as an object or an object component” [1]. Capsules are equivariant. Their vector output is the main attribute that separates CapsNet from CNN. Each capsule can be considered a distinct entity that looks for a specific feature in the overall picture. If the feature is found, the capsule outputs a vector with a large magnitude and a direction that is equivariant to the feature's pose, texture, and deformations; otherwise, the output vector's magnitude is small. Within one capsule, each neuron represents a different property of the same feature. The output of the convolutional layers is fed into the capsules, and the length of a capsule's output reflects the probability of finding the feature. The following are three progressive implementations of CapsNet:
1. transforming auto-encoders [3]
2. vector capsules based on dynamic routing [1]
3. matrix capsules based on expectation-maximization routing [4].

9.2.1 Transforming Auto-encoders

At this stage, the capsule network was just an auto-encoder, and its object detection capabilities were not tested. It was created to assess the capsule network's ability to recognize pose as a parameter. The auto-encoder was given an image together with pose parameters and was intended to produce the same image transformed according to those pose parameters. Lower-level (l) capsules are referred to as primary capsules, whereas higher-level (l + 1) capsules are referred to as secondary capsules. Because lower-level capsules derive pose parameters from pixel intensities, they can establish a part–whole hierarchy. Capsule networks benefit from this part–whole hierarchy because it allows them to understand a whole entity by recognizing its components. To achieve correct classification, lower-level capsules must preserve proper spatial relationships when predicting which higher-level capsule will be activated.

9.2.2 Dynamic Routing Between Capsules

Capsules are groups of neurons whose instantiation parameters are expressed by activity vectors, and the length of a vector represents the probability that the feature is present. This architecture is an improvement on the transforming auto-encoder: its key advantage is that pose parameters are extracted by the network rather than being supplied as input data. The capsule network architecture consists of a couple of convolutional layers, a primary capsule layer, and a class capsule layer. The primary capsule layer is followed by one or more capsule layers, and the last layer is called the class capsule layer because it gives the classification output. The convolutional layer extracts features from the image, and its output is supplied to the PrimaryCaps layer. Spatial information is encoded as instantiation parameters in an output vector $v_i$ for each capsule i (1 ≤ i ≤ N) in layer a, and the output vector $v_i$ of the ith lower-level capsule is fed into the next layer (a + 1). The jth capsule at layer (a + 1) multiplies $v_i$ by the weight matrix $W_{ij}$ to obtain the prediction vector $\hat{v}_{j|i} = W_{ij} v_i$, which is the transformation, by capsule i at level a, of the entity represented by capsule j at level (a + 1); for a primary capsule, it shows the contribution of capsule i to class capsule j. The contribution of a single primary capsule to class capsule j is obtained by multiplying its prediction vector by a coupling coefficient $c_{ij}$ that reflects the agreement between the two capsules. Two capsules are considered relevant to each other if the agreement between them is high; the coupling coefficient is then increased, and otherwise it is decreased. The input $S_j$ to the squashing function, whose output is $U_j$, is the weighted sum of all these individual primary-capsule predictions for class capsule j:

$$ S_j = \sum_i c_{ij}\,\hat{v}_{j|i} \tag{9.1} $$

$$ U_j = \frac{\lVert S_j \rVert^2}{1 + \lVert S_j \rVert^2}\,\frac{S_j}{\lVert S_j \rVert} \tag{9.2} $$

The squashing function keeps the direction of $S_j$ but shrinks its length into the range [0, 1), so that, like a probability, the length of the capsule's output lies between 0 and 1. The output $U_j$ of one capsule layer is passed to the following capsule layer and processed in the same way. The coupling coefficient $c_{ij}$ determines how strongly the prediction of capsule i in layer a contributes to capsule j in layer a + 1. The coefficients of capsule i are a softmax over routing logits $b_{ij}$, so they sum to 1 over j, and at each routing iteration $b_{ij}$ is increased by the agreement $\hat{v}_{j|i} \cdot U_j$, the dot product of the prediction vector and the current output. Each capsule thus has a vector linked with it that carries two kinds of information: its length is the probability that the feature the capsule is responsible for is present, and its orientation encodes the instantiation parameters of that feature. The agreement of a lower-level capsule with a higher-level capsule creates a part–whole relationship, and this relationship indicates the relevance of the routing path. This algorithm is called the “dynamic routing algorithm” [1] (Figs. 9.1 and 9.2).
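Equations (9.1) and (9.2), together with the coupling-coefficient update described above, can be summarized in a short pure-Python sketch of dynamic routing. The prediction vectors $\hat{v}_{j|i}$ are taken as given (the $W_{ij}$ multiplication is omitted), the logits start at zero, and the default of three iterations is an assumption of this sketch; it is an illustration, not the implementation of [1].

```python
import math

def squash(s):
    """Eq. (9.2): keep the direction of s and map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector of lower capsule i for higher capsule j.
    Returns the outputs U_j and the coupling coefficients c_ij of the last iteration."""
    if iterations < 1:
        raise ValueError("need at least one routing iteration")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij, start at zero
    for _ in range(iterations):
        c = [softmax(row) for row in b]        # c_ij sums to 1 over j for each i
        outputs = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            outputs.append(squash(s_j))        # Eq. (9.1), then Eq. (9.2)
        for i in range(n_in):
            for j in range(n_out):
                # agreement = dot product of the prediction and the current output
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], outputs[j]))
    return outputs, c
```

In a toy case where two lower capsules agree on higher capsule 0 but contradict each other on capsule 1, routing shifts their coupling toward capsule 0, while the output of capsule 1 stays at zero length.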

Fig. 9.1 Capsule network architecture [1]


Fig. 9.2 Capsule network decoder [1]

9.2.3 Matrix Capsule with EM Routing

Instead of vectors, the inputs and outputs of a capsule can be represented as matrices [4]. For a pose matrix with n entries (e.g., a 4 × 4 matrix with n = 16), the transformation matrix then has only n elements, whereas transforming an n-dimensional pose vector requires a matrix with n² elements. The expectation-maximization (EM) algorithm was also used to replace dynamic routing as the routing-by-agreement procedure. In addition, a separate activation parameter a, rather than the vector length, was used to represent the probability that the feature depicted by a capsule exists, which avoids the squashing function. The EM routing algorithm operates between successive layers of capsules. Let $\Omega_L$ denote the set of capsules in layer L. Each capsule i has a pose matrix $K_i$ and an activation probability $a_i$, and $W_{ij}$ is the weight matrix between capsule i in layer L and capsule j in layer L + 1. In EM routing, the pose matrix of a capsule in layer L is transformed so that it can cast a vote for the pose matrix of a capsule in layer L + 1. The transformation is the product of the pose matrix and the weight matrix:

$$ V_{ij} = K_i W_{ij} \tag{9.3} $$

Routing by agreement in the EM algorithm treats each vote from a layer-L capsule as a data point and models each capsule in layer L + 1 as a Gaussian distribution over the votes it receives.
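The vote of Eq. (9.3) and the Gaussian statistics used by EM routing can be sketched in pure Python as below. The 4 × 4 pose size follows [4]; the helper names are hypothetical, and `m_step_stats` shows only the per-entry weighted mean and variance computed in one M-step, assuming the assignment weights R_ij already include the lower-capsule activations, not the full EM routing procedure with its activation update.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]

def votes(poses, weights):
    """V_ij = K_i W_ij (Eq. 9.3) for every lower capsule i and higher capsule j.
    poses[i] is a 4 x 4 pose matrix; weights[i][j] is the 4 x 4 matrix W_ij."""
    return [[matmul(k_i, w_ij) for w_ij in w_row] for k_i, w_row in zip(poses, weights)]

def m_step_stats(vote_list, r):
    """Weighted mean and variance of each vote entry for one higher capsule j.
    vote_list[i] is V_ij flattened; r[i] is the assignment weight R_ij (assumed to
    already include the lower capsule's activation a_i)."""
    total = sum(r)
    dims = len(vote_list[0])
    mean = [sum(ri * v[h] for ri, v in zip(r, vote_list)) / total for h in range(dims)]
    var = [sum(ri * (v[h] - mean[h]) ** 2 for ri, v in zip(r, vote_list)) / total
           for h in range(dims)]
    return mean, var

POSE_SIDE = 4
N_POSE = POSE_SIDE * POSE_SIDE                    # 16 pose values per capsule
matrix_transform_params = POSE_SIDE * POSE_SIDE   # a 4 x 4 W_ij: n = 16 weights
vector_transform_params = N_POSE * N_POSE         # 16 x 16 matrix for 16-D vectors: n^2 = 256
```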

9.3 Major Capsule Network Structures and Implementations

Sabour et al. [1] proposed the first successful capsule network architecture, which consists of two convolutional layers and one fully connected layer. Conv1 takes a 28 × 28 × 1 MNIST image as input and has 256 channels, each computed with a 9 × 9 filter at a stride of 1; its activation function is ReLU. Conv1 is followed by a convolutional capsule layer, PrimaryCaps, which receives the scalar outputs of Conv1. PrimaryCaps has 32 channels, each containing a 6 × 6 grid of primary capsules (6 × 6 × 32 capsules in total). Every primary capsule consists of 8 convolutional units with a 9 × 9 kernel and a stride of 2, so each capsule outputs an 8D vector, to which the squashing non-linearity is applied. The third layer, DigitCaps, is the classification layer: it is fully connected to PrimaryCaps, takes these 8D vectors as input, and has one 16-dimensional capsule for each of the 10 classes. The length of each DigitCaps vector is used as the probability that the corresponding entity is present. The decoder consists of fully connected layers that try to reconstruct the input image.
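The layer sizes quoted above follow from the convolution arithmetic, assuming unpadded (valid) convolutions, which the stated sizes imply:

```python
def conv_output_size(size, kernel, stride):
    """Side length after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

conv1_side = conv_output_size(28, 9, 1)            # Conv1: 256 maps of 20 x 20
primary_side = conv_output_size(conv1_side, 9, 2)  # PrimaryCaps grid: 6 x 6
num_primary = primary_side * primary_side * 32     # 32 channels of capsules
primary_dim, digit_dim, num_classes = 8, 16, 10
# DigitCaps: one W_ij mapping each 8-D primary capsule to each 16-D digit capsule
digitcaps_weights = num_primary * num_classes * primary_dim * digit_dim
```

This gives a 20 × 20 Conv1 output, 6 × 6 × 32 = 1152 primary capsules, and 1,474,560 transformation weights between PrimaryCaps and DigitCaps.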

9.3.1 CapsNet Performance Analysis

The performance of an algorithm depends on the properties of the dataset. Unlike more complex datasets with varying color, scale, multiple digits in a single sample, natural scene backgrounds, affine transformations, noise, and so on, the MNIST dataset has only one channel. The capsule network performs better than a CNN on less complex datasets such as MNIST as well as on complex datasets such as CIFAR10 and SVHN. However, on CIFAR10 and SVHN, the capsule network's performance is slightly below the state of the art [5–7], though better than that of a CNN [7]. CapsNet has attained state-of-the-art performance on MNIST [8] and Fashion-MNIST [9]. Tuning hyperparameters such as momentum, learning rate, learning rate decay, and dropout rate does not influence performance remarkably, whereas suitable routing operations and the number of routing iterations influence the performance of the capsule network significantly. For matrix capsules with EM routing, higher values of hyperparameters such as the number of capsules in the primary layer, the number of channels in the first convolutional layer, and the number of capsules in the subsequent convolutional layer were found to result in faster convergence and training [10].

9.3.2 Other Modifications on Baseline Implementation

Quick-CapsNet (QCN): One downside of the capsule network is slow training and testing. The Quick Capsule Network (QCN) [11] improves on the original CapsNet so that it needs one-fifth of the inference time on datasets such as MNIST, F-MNIST, SVHN, and CIFAR10, which makes CapsNet feasible for real-time applications. In QCN, the second convolutional layer of CapsNet is replaced by a fully connected layer, and all other layers are kept the same. In the decoder, the fully connected layers are replaced with deconvolution layers, which results in fewer decoder parameters; the QCN decoder is also class independent. In terms of training time, QCN was 10 times faster on CIFAR10 and SVHN and 7 times faster on MNIST and F-MNIST. In terms of accuracy, it does not fall far behind the original capsule network on MNIST and F-MNIST, but it shows considerable degradation on CIFAR10 and SVHN.

The Multi-Lane Capsule Network: The Multi-Lane Capsule Network (MLCN) [12] uses parallel processing to make efficient use of resources. Each lane in MLCN works independently to construct one dimension of the output, and these data-independent lanes, the main feature of the architecture, allow parallel processing. MLCN modifies the original architecture by dividing the primary capsules so that they can work independently, each group producing one of the dimensions of the digit capsules. MLCN thus parallelizes the execution of CapsNet while also making the network easier to understand (explainability). MLCN gave 2× faster training and inference, but its authors do not report anything about robustness to affine transformations.

Faster Multiscale Capsule Network: “Multiscale Capsule Network enhance the computational efficiency and representation capacity of capsule network” [13]. It operates in two stages. In the first stage, multi-scale feature extraction is used to obtain structural and semantic information; in the second stage, a hierarchy of features is encoded into multi-dimensional primary capsules. This architecture also employs a more effective dropout process: the mask encoding approach is replaced by an improved dropout algorithm for capsules, which treats each capsule as a whole so that the direction of each capsule does not change. Capsules are then dropped at random according to a Bernoulli distribution. The improved dropout algorithm is best suited to vector neurons because it preserves their direction.
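The capsule-level dropout described above can be sketched in a few lines: a single Bernoulli draw decides whether a whole capsule vector is zeroed, so every surviving capsule keeps its direction, whereas ordinary element-wise dropout would zero individual components and rotate the vector. The drop probability, the absence of rescaling for kept capsules, and the function name are assumptions; [13] may differ in these details.

```python
import random

def capsule_dropout(capsules, drop_prob=0.2, rng=None):
    """Drop whole capsules with one Bernoulli draw per capsule.
    Kept capsules are returned unchanged, so their direction is preserved."""
    rng = rng or random.Random(0)
    return [
        [0.0] * len(vec) if rng.random() < drop_prob else list(vec)
        for vec in capsules
    ]
```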

9.3.3 Capsule Network Applications

Capsule networks have shown encouraging results in real-life applications such as machine translation, emotion and mood detection [14], intent detection [15], handwriting and text recognition [16, 17], and many more. Knowledge graph (KG) completion is an important task in NLP, and the capsule network performed better than a CNN on this task too [18]. In classifying protein family structure, CapsNet has shown slightly better performance than CNN because it takes into account the hierarchical relationships of protein structure [19, 20]. Another application of computer vision is in autonomous cars. For autonomous cars to function properly, the collected sensor data need to be processed at high speed. Ultrasonic sensors are used to keep costs low, but they are not optimized for high performance, and CapsNets have given excellent results for classification on ultrasonic data [21]. (Environmental) sound detection is another difficult problem addressed by CapsNets [22, 23]. Unlike speech or music recognition, this problem has no domain-specific information known beforehand. Although CNNs have had some success in this area, they have not proved robust against overfitting.

88

S. Kapadnis et al.

9.3.4 Datasets

The performance of capsule networks presently varies with the dataset used to test them. Audio/video, natural language processing, speech processing, and image processing datasets are the most common types used for CapsNets. The benefit of such standard datasets is that less time is spent pre-processing data. MNIST is the most widely used dataset for CapsNet training, partly because it serves as a standard benchmark, having been used for the experimental assessment of the original CapsNet [1]. It consists of handwritten digits, with 60,000 training images and 10,000 test images. Fashion-MNIST [9] consists of 70,000 28 × 28 grayscale images of fashion items. The CIFAR10 dataset consists of color images with a lot of background noise and inter-class variability; as a consequence, the original CapsNet [1] does not work well on CIFAR10. Some researchers also tested CapsNet on SVHN, which contains 600,000 images of street view house numbers. As required, some researchers even created their own datasets [21], while others used local datasets [24].

9.3.5 Performance Evaluation Methods

Accuracy is the most widely used performance metric for classifiers. Others include the area under the receiver operating characteristic curve, the kappa statistic, root mean square error (RMSE), the F-measure, mean absolute error (MAE), and the area under the precision-recall curve. Whether the classes are balanced matters when picking an evaluation criterion for a model: if the classes are balanced, evaluation can be done using more than one metric, whereas if they are not balanced, one metric is sufficient in most cases [25]. Other metrics, such as the kappa statistic and the F-measure, are threshold tests: they do not take into account whether the expected value is true or not, but only gauge the proximity of the expected value to a predetermined threshold. Reconstruction loss is a common metric for assessing the efficiency of a standard autoencoder. It penalizes the network for generating outputs that differ from the original inputs, and the loss function is simply the mean squared error or the cross entropy. This penalization principle makes the network adapt to its inputs while, when combined with a regularizer, still preventing overfitting.
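The reconstruction losses mentioned above can be written as follows, assuming pixel intensities flattened into lists and scaled to [0, 1]. For reference, the original CapsNet [1] used the sum of squared differences as its reconstruction loss and scaled it down by 0.0005 so that it would not dominate the margin loss; the mean-based version below differs from that sum only by a constant factor.

```python
import math

def mse(reconstruction, target):
    """Mean squared error between reconstructed and original pixel values."""
    return sum((r - t) ** 2 for r, t in zip(reconstruction, target)) / len(target)

def binary_cross_entropy(reconstruction, target, eps=1e-7):
    """Pixel-wise cross entropy for intensities in [0, 1]."""
    total = 0.0
    for r, t in zip(reconstruction, target):
        r = min(max(r, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(r) + (1 - t) * math.log(1 - r)
    return total / len(target)
```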

9.3.6 Discussion
Capsule networks are an exciting addition to deep learning and have been reported to outperform CNNs and standard neural networks on several tasks. “A CapsNet is activated by comparing several incoming pose vectors, while Neural Nets are activated by comparing a trained weight vector to a single-incoming activity vector” [4]. This makes it more likely that a CapsNet correctly preserves the object’s pose. CNN classifiers are not especially robust to adversarial attacks, whereas CapsNets have been shown to be relatively resistant to such attacks. CapsNets have also proved robust to affine transformations of the input [1], and have shown positive results in reducing training time [26]. Because capsule connections operate between groups of neurons rather than single neurons, CapsNets have relatively few parameters compared with CNNs. These findings suggest that training CapsNets online should be feasible, although this important area has yet to be thoroughly investigated. CNNs must be trained on large datasets, whereas CapsNets can also generalize from smaller datasets, which makes them better suited to a wide range of applications.
Although CapsNets have been shown to surpass CNNs in some settings, they fall short in several areas, and researchers have focused on these gaps because they need further study before capsules can be strengthened further. Capsule performance is not consistent across datasets, and CapsNets still struggle on datasets such as CIFAR10 and ImageNet. Parallelizing the dynamic routing algorithm is also a largely unexplored area. One attempt uses parallel lanes, each contributing one dimension of the output and trained like the baseline CapsNet; it gave better accuracy with reduced training and inference time [12]. The full potential of capsule networks is yet to be realized. Since the core idea of CapsNets is well founded, improving their performance on complex datasets is the area that most demands attention.

9.4 Conclusion
Models such as CNNs have achieved excellent results on many computer vision tasks, but they require enormous amounts of data and computation. Capsules were introduced to address the limitations of CNNs, and they have performed commendably so far. Because CapsNets are relatively new, they require further exploration. This paper reviewed models in this field that have proved impressive and surveyed the latest CapsNet architectures and implementations. CapsNets are powerful, but much remains to be explored and improved.

References
1. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems, vol. 2017-December, pp. 3857–3867. Neural Information Processing Systems Foundation (2017)
2. Su, J., Vargas, D.V., Sakurai, K.: Attacking convolutional neural network using differential evolution. IPSJ Trans. Comput. Vis. Appl. 11, 1–16 (2019). https://doi.org/10.1186/s41074-019-0053-3
3. Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) Artificial Neural Networks and Machine Learning—ICANN 2011. Lecture Notes in Computer Science, vol. 6791. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21735-7_6
4. Hinton, G., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: ICLR, pp. 1–15 (2018). https://doi.org/10.2514/6.2003-4412
5. Xi, E., Bing, S., Jin, Y.: Capsule network performance on complex data (2017). arXiv:1712.03480v1 [stat.ML], 1–7
6. Yang, Z., Wang, X.: Reducing the dilution: an analysis of the information sensitiveness of capsule network with a practical solution (2019). arXiv:1903.10588v2 [cs.LG]
7. Mukhometzianov, R., Carrillo, J.: CapsNet comparative performance evaluation for image classification, pp. 1–14 (2018). arXiv:1805.11195
8. LeCun, Y., Cortes, C., Burges, C.J.C.: MNIST [WWW Document] (1998). https://yann.lecun.com/exdb/mnist/ (Accessed 6.15.19)
9. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, pp. 1–6 (2017). arXiv:1708.07747v2 [cs.LG]
10. Chauhan, A., Babu, M., Kandru, N., Lokegaonkar, S.: Empirical study on convergence of capsule networks with various hyperparameters (2018). http://people.cs.vt.edu/bhuang/courses/opt18/projects/capsule.pdf
11. Shiri, P., Sharifi, R., Baniasadi, A.: Quick-CapsNet (QCN): a fast alternative to capsule networks. In: 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), Antalya, Turkey, pp. 1–7 (2020). https://doi.org/10.1109/AICCSA50499.2020.9316525
12. Rosario, V.M.D., Borin, E., Breternitz, M.: The multi-lane capsule network. IEEE Sig. Proc. Lett. 26(7), 1006–1010 (2019). https://doi.org/10.1109/LSP.2019.2915661
13. Xiang, C., Zhang, L., Tang, Y., Zou, W., Xu, C.: MS-CapsNet: a novel multi-scale capsule network. IEEE Sig. Proc. Lett. 25(12), 1850–1854 (2018). https://doi.org/10.1109/LSP.2018.2873892
14. Wang, Y., Sun, A., Han, J., Liu, Y., Zhu, X.: Sentiment analysis by capsules. In: International World Wide Web Conference Committee, pp. 1165–1174 (2018c)
15. Xia, C., Zhang, C., Yan, X., Chang, Y., Yu, P.S.: Zero-shot user intent detection via capsule neural networks (2018). arXiv:1809.00385v1 [cs.CL]
16. Mandal, B., Dubey, S., Ghosh, S., Sarkhel, R., Das, N.: Handwritten Indic character recognition using capsule networks. arXiv:1901.00166v1 [cs.CV]
17. Zhao, W., Ye, J., Yang, M., Lei, Z., Zhang, S., Zhao, Z.: Investigating capsule networks with dynamic routing for text classification (2018a). arXiv:1804.00538v4 [cs.CL]
18. Nguyen, D.Q., Vu, T., Nguyen, T.D., Nguyen, D.Q., Phung, D.: A capsule network-based embedding model for knowledge graph completion and search personalization (2019). arXiv:1808.04122v3 [cs.CL]
19. Mallea, M.D.G., Meltzer, P., Bentley, P.J.: Capsule neural networks for graph classification using explicit tensorial graph representations (2019). arXiv:1902.08399v1 [cs.LG]
20. Verma, S., Zhang, Z.: Graph capsule convolutional neural networks. In: Joint ICML and IJCAI Workshop on Computational Biology, Stockholm, Sweden (2018)
21. Popperl, M., Gulagundi, R., Yogamani, S., Milz, S.: Capsule neural network based height classification using low-cost automotive ultrasonic sensors (2019). arXiv:1902.09839v1 [cs.CV]
22. Iqbal, T., Xu, Y., Kong, Q., Wang, W.: Capsule routing for sound event detection (2018). arXiv:1806.04699v1 [cs.SD]
23. Vesperini, F., Gabrielli, L., Principi, E., Squartini, S.: Polyphonic sound event detection by using capsule neural networks. J. Sel. Top. Signal Process. X, 1–13 (2018)
24. Wang, Q., Qiu, J., Zhou, Y., Ruan, T., Gao, D., Gao, J.: Automatic severity classification of coronary artery disease via recurrent capsule network (2018b). arXiv:1807.06718v2 [cs.CL]
25. Liu, Y., Zhou, Y., Wen, S., Tang, C.: A strategy on selecting performance metrics for classifier evaluation. Int. J. Mob. Comput. Multimed. Commun. 6, 20–35 (2014). https://doi.org/10.4018/IJMCMC.2014100102
26. Phong, N.H., Ribeiro, B.: Advanced capsule networks via context awareness, pp. 1–12 (2019). arXiv:1903.07497v2 [cs.LG]

Chapter 10

Computer-Aided Segmentation of Polyps Using Mask R-CNN and Approach to Reduce False Positives
Saurabh Jha, Balaji Jagtap, Srijan Mazumdar, and Saugata Sinha

Abstract Computer-aided detection and segmentation of polyps inside the colon is challenging because polyps vary widely in shape, texture, size and color, and because various polyp-like structures appear during colonoscopy. In this paper, we apply a mask region-based convolutional neural network (Mask R-CNN) to detect and segment polyps in images obtained from colonoscopy videos. We propose an efficient method to reduce false positives in the computer-aided detection system: we additionally train our model on selected non-polyp regions of the images that have a high probability of being detected as polyps. Using two colonoscopic frame datasets, one without and one with these selected regions, we show experimentally that adding the selected regions significantly reduces the number of false positives in our computer-aided polyp segmentation system.

10.1 Introduction
According to Global Cancer Statistics 2020, colorectal cancer is the third most commonly diagnosed cancer and the second most common cause of cancer death worldwide [1]. Most colorectal cancers begin as an abnormal growth of tissue, known as a polyp, on the inner lining of the colon or rectum; if left untreated, these polyps may become cancerous over time. The 5-year survival rate of colorectal cancer is 90% if it is diagnosed at a localized stage, falls to 71% if the cancer has spread to surrounding tissues, and falls to 14% if it has spread to distant organs [2]. Early diagnosis therefore plays an important role in increasing the 5-year survival rate. In medical practice, colonoscopy is the gold standard for detecting polyps inside the colon. However, colonoscopic screening depends heavily on the skill of the operator, and the polyp miss rate can be as high as 25% [3]. Missed polyps can later develop into colorectal cancer. Real-time computer-aided polyp detection and segmentation in colonoscopy videos are therefore highly desirable.

S. Jha (corresponding author) · B. Jagtap · S. Sinha
Department of Electronics and Communication Engineering, Visvesvaraya National Institute of Technology, Nagpur, India
S. Mazumdar
Gastro Oncology Department, Indian Institute of Liver and Digestive Science, Kolkata, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_10

10.1.1 Related Work
In recent years, researchers have proposed many methods for polyp detection. Based on their approach, these methods can generally be classified into two main categories: traditional methods based on handcrafted features, and convolutional neural network (CNN) methods. Traditional approaches include methods based on variants of wavelets, local binary patterns, gray-level co-occurrence properties, and edge-shape and context information, by Alexandre et al. [4], Bernal et al. [5], Iakovidis et al. [6] and Tajbakhsh et al. [7]. Of these, the work by Tajbakhsh et al. [7] performed best. It uses a hybrid context-shape approach, in which context information removes non-polyp structures and shape information localizes polyps. However, the performance of these traditional approaches is degraded by factors such as camera position, light reflections, lumen structures and lighting conditions. Among CNN-based methods, Hasan et al. [8] proposed fusing the contourlet transform with a fine-tuned, pre-trained VGG19 model applied to enhanced endoscopic image patches. Shin et al. [9] proposed a region-based deep CNN model that uses Inception-ResNet for transfer learning and applies data augmentation during training; they also proposed two post-learning approaches, automatic false positive learning and offline learning. Yu et al. [10] proposed an offline and online three-dimensional (3D) deep learning framework that exploits temporal information from colonoscopy videos with 3D fully convolutional networks (3D FCN), which reduced the number of false positives. Liu et al. [11], focusing primarily on endoscopic videos, used single-shot detectors (SSD) to detect polyps in colonoscopy videos.

10.1.2 Contributions
Our main objective in this study is to reduce the number of false positives in polyp segmentation. We evaluate a strategy of selecting regions that are highly likely to produce false positives and using those regions to improve our Mask R-CNN model. The remainder of this paper is organized as follows. In Sect. 10.2, we introduce the materials and the methods for reducing false positives, and discuss the Mask R-CNN implementation in detail. The test results are reported in Sect. 10.3, and conclusions are drawn in Sect. 10.4.

10.2 Methodology
10.2.1 Dataset
In our study, we used three independent datasets. Two are publicly available polyp frame datasets, CVC-CLINIC [12] and ETIS-LARIB [13]; the third is a dataset of 231 colonoscopy frames (hereafter referred to as IILDS-COLON) provided by an expert endoscopist. The ground truths for the colonoscopy frames were prepared by skilled endoscopists associated with the corresponding clinical institutions. CVC-CLINIC contains 612 polyp frames of original size 384 × 288, and ETIS-LARIB contains 196 polyp frames of original size 1225 × 966. IILDS-COLON has 69 non-polyp frames and 162 polyp frames, all of original size 640 × 480. Combining these three datasets gives a total of 1039 (612 + 196 + 231) colonoscopic frames with their corresponding ground truths.

10.2.2 Mask R-CNN Implementation
Mask R-CNN [14] builds on the earlier object detection model Faster R-CNN [15] by Ren et al. Figure 10.1 shows the end-to-end architecture of Mask R-CNN. Input images pass through a feature extractor that generates feature maps; in our study, we use ResNet-101 [16] as the backbone network of the feature extractor. Anchor boxes, as described in [15], are the candidate regions of interest (ROIs) that are examined for the presence of objects. The original implementation of Mask R-CNN uses anchor boxes of 5 scales, with areas 32², 64², 128², 256² and 512², and three aspect ratios, 1:1, 2:1 and 1:2.

Fig. 10.1 End-to-end architecture of Mask R-CNN

Because small polyps may be present in our polyp segmentation application, we instead use anchor boxes of 5 scales with areas 8², 16², 64², 128² and 256², and the same three aspect ratios 1:1, 2:1 and 1:2. Each anchor is fed into the region proposal network (RPN), which assigns it an objectness score and predicts bounding box regression offsets (i.e., the changes Δx, Δy to the anchor box needed for the box to cover the object entirely). We select the top 6000 ROIs by objectness score for non-max suppression (NMS). After NMS with an IoU threshold of 0.7, we keep the top N = 2000 proposals during training and the top N = 1000 proposals at inference. The N ROIs, which have different sizes, pass through the ROI Align [14] module, which generates fixed-size feature maps for the subsequent class label, bounding box and mask prediction. Mask R-CNN adds a parallel branch after ROI Align for mask prediction. Hence, Mask R-CNN does not base its classification on the mask prediction; classification and mask prediction take place in parallel.
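The anchor configuration and proposal filtering described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, aspect ratio is taken as height/width, boxes are (x1, y1, x2, y2), and greedy NMS is the standard variant. The defaults mirror the training setting (6000 pre-NMS ROIs, IoU threshold 0.7, 2000 kept); at inference `post_nms_top=1000` would be used instead.

```python
import numpy as np

def make_anchors(cx, cy, areas=(8**2, 16**2, 64**2, 128**2, 256**2), ratios=(1.0, 2.0, 0.5)):
    """Anchors (x1, y1, x2, y2) centred at (cx, cy); ratio = height / width, area preserved."""
    boxes = []
    for area in areas:
        for ratio in ratios:
            w = np.sqrt(area / ratio)
            h = w * ratio
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

def iou_one_to_many(box, boxes):
    """IoU between one box and an (m, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def select_proposals(boxes, scores, pre_nms_top=6000, nms_thresh=0.7, post_nms_top=2000):
    """Keep the top-scoring boxes, apply greedy NMS, return at most post_nms_top indices."""
    order = np.argsort(-scores)[:pre_nms_top]
    keep = []
    while order.size > 0 and len(keep) < post_nms_top:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Drop remaining boxes that overlap the kept box by more than the threshold.
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= nms_thresh]
    return np.array(keep, dtype=int)

anchors = make_anchors(100.0, 100.0)
boxes = np.array([[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(anchors.shape, select_proposals(boxes, scores))
```

Here the second box overlaps the first with IoU of about 0.82, so NMS suppresses it and keeps indices 0 and 2; 5 scales times 3 ratios gives 15 anchors per location.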

10.2.3 Reducing False Positives
The dataset of 1039 colonoscopic frames is randomly divided into a training set (66.2%) and a test set (33.8%), giving 688 colonoscopy frames in the training set (hereafter referred to as Dataset-I) and 351 colonoscopy frames in the test set. We train Mask R-CNN [14] on the original training set and analyze its output on the test set. This analysis showed that most of the false positives were caused by fecal matter (Fig. 10.2a), specular reflections (Fig. 10.2b) and the black hole in the luminal region (Fig. 10.2c). These regions have a high probability of being detected as polyps because they share certain characteristics with polyps, such as shape or sudden changes in pixel intensity around them. We propose a novel approach that selects these regions and uses them as new input images.

Fig. 10.2 In each sub-figure, the image on the right shows the region selected from the original frame on the left: a fecal matter, b specular reflection, c black hole in the luminal region

The selected regions are non-polyp parts of the image. Figure 10.2 shows cases where this method is applied, with the input frame and the selected region. Patients undergo a pre-colonoscopy preparation procedure that gives medical experts a clear view of the colon, yet some intestinal contents remain, as shown in Fig. 10.2a. Another case we consider is specular reflection from non-polyp areas of the colon, as shown in Fig. 10.2b. As shown in Fig. 10.2c, the black hole seen in the luminal region resembles a polyp. Because of these resemblances, such regions have a high chance of being detected as polyps. From the original dataset of 688 images, we manually select 187 such unique instances and create a new dataset of 875 (688 + 187) colonoscopy frames (hereafter referred to as Dataset-II). The ground truth for these selected regions contains no polyp instance, since all of them are non-polyp areas.
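The dataset bookkeeping above can be sketched as follows. This is a hypothetical illustration: the paper selected the non-polyp regions manually, so the crop box here is an arbitrary example, and the helper names, the seed and the use of Python's `random` module are assumptions. The sketch shows that a 66.2% split of 1039 frames yields 688/351 frames, and that each added region is paired with an all-zero mask, i.e. a ground truth with no polyp instance.

```python
import random
import numpy as np

def split_frames(frame_ids, train_fraction=0.662, seed=0):
    """Randomly split frame identifiers into training and test lists."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

def make_negative_sample(frame, box):
    """Crop a non-polyp region (x1, y1, x2, y2) and pair it with an empty mask."""
    x1, y1, x2, y2 = box
    crop = frame[y1:y2, x1:x2].copy()
    empty_mask = np.zeros(crop.shape[:2], dtype=bool)  # ground truth: no polyp instance
    return crop, empty_mask

train_ids, test_ids = split_frames(range(1039))
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a 640 x 480 frame
crop, mask = make_negative_sample(frame, (100, 50, 300, 250))
print(len(train_ids), len(test_ids), len(train_ids) + 187, crop.shape)
```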

10.2.4 Training Details
Training a deep CNN such as Mask R-CNN requires a large amount of data. Because polyp data are scarce, we use transfer learning to train our Mask R-CNN architecture. We use a ResNet-101 [16] feature extractor pre-trained on the Microsoft Common Objects in Context (COCO) dataset [17]. We freeze the backbone network (i.e., the ResNet-101 layers) and train only the RPN, classifier and mask heads of the network on our polyp dataset. For training, we use SGD with momentum 0.9, batch size 2 and learning rate 0.001 for 30 epochs on both of our datasets (i.e., Dataset-I and Dataset-II).
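The effect of freezing the backbone during SGD with momentum can be sketched framework-free in NumPy. This is an illustrative assumption, not the authors' training code: the parameter names and the `backbone.` prefix are hypothetical, and the momentum update uses the form v = momentum · v + g, p = p − lr · v (the convention PyTorch's SGD follows). Only the hyperparameter values come from the text.

```python
import numpy as np

CONFIG = {"learning_rate": 0.001, "momentum": 0.9, "batch_size": 2, "epochs": 30}
FROZEN_PREFIXES = ("backbone.",)  # hypothetical naming: ResNet-101 weights stay fixed

def is_trainable(name):
    return not name.startswith(FROZEN_PREFIXES)

def sgd_momentum_step(params, grads, velocity, lr, momentum):
    """One in-place update: v = momentum * v + g, then p = p - lr * v, skipping frozen params."""
    for name, p in params.items():
        if not is_trainable(name):
            continue  # frozen backbone: no update, no velocity state
        v = velocity.setdefault(name, np.zeros_like(p))
        v *= momentum
        v += grads[name]
        p -= lr * v

# Hypothetical parameter groups: backbone, RPN head and mask head.
params = {name: np.ones(2) for name in ("backbone.conv1", "rpn.cls", "mask_head.conv")}
grads = {name: np.ones(2) for name in params}
velocity = {}
for _ in range(2):
    sgd_momentum_step(params, grads, velocity, CONFIG["learning_rate"], CONFIG["momentum"])
print(params)
```

After two steps with a constant unit gradient, the head parameters move from 1.0 to 0.9971 (0.001 on the first step, 0.0019 on the second), while the backbone parameters are unchanged.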

10.3 Results
10.3.1 Evaluation Metrics
In the context of our study, we use the term “polyp segmentation” for the ability of the model to place a colored mask over the polyp present in a given image. Since the output of our model is a colored mask together with a bounding box, we define the following parameters:
True Positive (TP): a detection whose bounding box has IoU ≥ 0.5 with the ground truth bounding box.
False Positive (FP): a detection whose bounding box has IoU < 0.5 with the ground truth.
False Negative (FN): a polyp that is present in the ground truth but not detected.
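These definitions can be turned into counts with an IoU-based matcher. The sketch below is an assumption about details the text leaves open: detections are processed in descending score order, and each ground-truth box can be matched at most once, so a duplicate detection of an already-matched polyp counts as a false positive. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(detections, ground_truths, iou_thresh=0.5):
    """detections: list of (box, score). Each ground-truth box is matched at most once."""
    matched = set()
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truths):
            if j not in matched:
                iou = box_iou(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truths) - len(matched)
    return tp, fp, fn

gts = [(0, 0, 10, 10)]
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
print(count_tp_fp_fn(dets, gts))
```

Here the first detection matches the polyp, the overlapping duplicate and the distant box both count as false positives, and no polyp is missed.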


Table 10.1 Comparison of polyp frame detection results for the two training datasets

Training dataset   TP    FP    FN   Precision (%)   Recall (%)   mAP (%)
Dataset-I          309   450   36   40.7            89.5         77.4
Dataset-II         299   330   46   47.5            86.6         78.6

Based on the above parameters, the two common performance metrics, precision and recall, are defined as:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$
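As a check on these formulas, the sketch below applies them to the TP, FP and FN counts in Table 10.1. Expressing the results as percentages truncated (not rounded) to one decimal place reproduces all four precision and recall entries in the table. The helper names are illustrative.

```python
import math

TABLE_10_1 = {"Dataset-I": (309, 450, 36), "Dataset-II": (299, 330, 46)}  # (TP, FP, FN)

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

def truncate_percent(x):
    """Express a fraction as a percentage truncated (not rounded) to one decimal place."""
    return math.floor(x * 1000) / 10

for name, (tp, fp, fn) in TABLE_10_1.items():
    p, r = precision_recall(tp, fp, fn)
    print(name, truncate_percent(p), truncate_percent(r))
```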

mAP: the mean average precision is obtained by averaging the average precision (AP) over all classes in the dataset. For a single class, AP summarizes the precision-recall curve: detections are ranked by confidence score, each is marked as a TP or FP using the IoU criterion above, and precision is averaged over the recall levels reached. In our study, there is only a single class, the polyp class, so mAP equals the AP of that class.
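A minimal sketch of per-class AP follows. The interpolation rule is an assumption, since the paper does not state which convention it uses: this version uses all-point interpolation, where the precision at each rank is replaced by the highest precision at that or any later rank before summing over recall steps. With a single polyp class, mAP is simply this AP.

```python
def average_precision(scores, is_tp, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: each precision becomes the maximum precision at this or any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(per_class_ap):
    return sum(per_class_ap) / len(per_class_ap)

ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
print(ap, mean_average_precision([ap]))  # single polyp class: mAP equals AP
```

In this toy ranking, the first detection reaches recall 0.5 at precision 1, and the third reaches recall 1.0 at precision 2/3, so AP = 0.5 · 1 + 0.5 · 2/3 = 5/6.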

10.3.2 Evaluation of Polyp Frames
We report the performance of our polyp segmentation model trained on two datasets: Dataset-I, which contains the original 688 colonoscopy frames, and Dataset-II, which has 875 frames and was created by adding 187 images of non-polyp regions that have a high probability of being detected as polyps in Dataset-I. Table 10.1 compares the evaluation metrics for the two training datasets. We observed a significant reduction of 120 false positives (450 − 330) between the two outputs, which raised precision from 40.7% to 47.5%, while the number of true positives fell from 309 to 299. Figure 10.3 shows, row by row, cases in which false positives were eliminated, together with the input image and ground truth. The input image in row 1 of Fig. 10.3 includes specular reflections that are detected as polyps in Output-I, but these false positives are absent in Output-II. Similar improvements for specular reflections can be seen in rows 3 and 4. Another improvement concerns the black hole in the luminal region, shown in rows 2 and 3 of Fig. 10.3, which was detected in Output-I but not in Output-II. The input image in row 4 of Fig. 10.3 contains polyp-like inner colon linings, which were detected as polyps in Output-I but not in Output-II.

[Fig. 10.3 panel columns: Input image | Ground truth | Output-I | Output-II]

Fig. 10.3 Images in the 2nd column are the ground truth images for the corresponding polyp frames in the 1st column. The images in the 3rd column are the outputs of the Mask R-CNN model trained on Dataset-I. The images in the 4th column are the outputs of the Mask R-CNN model trained on Dataset-II.

10.4 Conclusion
In this study, we presented a deep learning-based computer-aided polyp detection and segmentation system. The system adopts a Mask R-CNN model, which builds on the Faster R-CNN method. The proposed system is superior in detection performance, in terms of precision and recall, to traditional approaches based on handcrafted features. We proposed an efficient method to reduce false positives in the detection system by additionally training our Mask R-CNN model on selected non-polyp regions of colonoscopic frames that have a high probability of being detected as polyps. With this method, we achieved a significant reduction in the number of false positives (from 450 to 330 on our test set), which is advantageous to medical experts during colonoscopy.


References
1. Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J. Clinicians (2021). https://doi.org/10.3322/caac.21660, https://acsjournals.onlinelibrary.wiley.com/doi/abs/10.3322/caac.21660
2. Cancer.net webpage. https://cancer.net/cancer-types/colorectal-cancer/stages. Last accessed 15 Mar 2021
3. Leufkens, A., van Oijen, M.G.H., Vleggaar, F., Siersema, P.: Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy 44(5), 470–475 (2012). https://doi.org/10.1055/s-0031-1291666
4. Alexandre, L.A., Nobre, N., Casteleiro, J.: Color and position versus texture features for endoscopic polyp detection. In: 2008 International Conference on BioMedical Engineering and Informatics, vol. 2, pp. 38–42 (2008). https://doi.org/10.1109/BMEI.2008.246
5. Bernal, J., Tajkbaksh, N., Sánchez, F.J., Matuszewski, B.J., Chen, H., Yu, L., Angermann, Q., Romain, O., Rustad, B., Balasingham, I., Pogorelov, K., Choi, S., Debard, Q., Maier-Hein, L., Speidel, S., Stoyanov, D., Brandao, P., Córdova, H., Sánchez-Montes, C., Gurudu, S.R., Fernández-Esparrach, G., Dray, X., Liang, J., Histace, A.: Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 endoscopic vision challenge. IEEE Trans. Med. Imaging 36(6), 1231–1249 (2017). https://doi.org/10.1109/TMI.2017.2664042
6. Iakovidis, D.K., Maroulis, D.E., Karkanis, S.A.: An intelligent system for automatic detection of gastrointestinal adenomas in video endoscopy. Comput. Biol. Med. 36(10), 1084–1103 (2006). https://doi.org/10.1016/j.compbiomed.2005.09.008, https://www.sciencedirect.com/science/article/pii/S0010482505000983, Intelligent Technologies in Medicine and Bioinformatics
7. Tajbakhsh, N., Gurudu, S.R., Liang, J.: Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans. Med. Imaging 35(2), 630–644 (2016). https://doi.org/10.1109/TMI.2015.2487997
8. Hasan, M.M., Islam, N., Rahman, M.M.: Gastrointestinal polyp detection through a fusion of contourlet transform and neural features. J. King Saud Univ.-Comput. Information Sci. (2020)
9. Shin, Y., Qadir, H.A., Aabakken, L., Bergsland, J., Balasingham, I.: Automatic colon polyp detection using region based deep CNN and post learning approaches. CoRR abs/1906.11463 (2019). https://doi.org/10.1109/ACCESS.2018.2856402, http://arxiv.org/abs/1906.11463
10. Yu, L., Chen, H., Dou, Q., Qin, J., Heng, P.A.: Integrating online and offline 3D deep learning for automated polyp detection in colonoscopy videos. IEEE J. Biomed. Health Informatics PP, 1–1 (12 2016). https://doi.org/10.1109/JBHI.2016.2637004
11. Liu, M., Jiang, J., Wang, Z.: Colonic polyp detection in endoscopic videos with single shot detection based deep convolutional neural network. IEEE Access 7, 75058–75066 (2019). https://doi.org/10.1109/ACCESS.2019.2921027
12. CVC-ClinicDB. https://polyp.grand-challenge.org/CVCClinicDB/. Last accessed 15 Jan 2021
13. ETIS-Larib DB. //polyp.grand-challenge.org/EtisLarib/. Last accessed 15 Jan 2021
14. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. CoRR abs/1703.06870 (2017). http://arxiv.org/abs/1703.06870
15. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015). http://arxiv.org/abs/1506.01497
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015). http://arxiv.org/abs/1512.03385
17. Lin, T.Y., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: ECCV (2014)

Chapter 11

Image GPT with Super Resolution
Bhumika Shah, Ankita Sinha, and Prashant Saxena

Abstract A Generative Pre-trained Transformer (GPT) model, which generates text by predicting it from the preceding text, has been trained to generate image pixels sequentially, and a correlation was found between the quality of the generated images and image classification accuracy. The resulting Image Generative Pre-trained Transformer (IGPT) works on low-resolution images and therefore produces low-resolution output. In this paper, we attempt to remove this limitation by enhancing the resolution of the output image produced by IGPT. Several deep neural network models have successfully upscaled image quality with high accuracy, achieving single-image super resolution; the primary focus of this work is to compare such models and choose the simplest one for improving the quality of the generated image. The low-resolution output image is upscaled to the high-resolution space using a single filter and bicubic interpolation. We use the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index to assess the quality of the images produced by the algorithm. The proposed approach has been evaluated using images from publicly available datasets. We use leaky ReLU instead of ReLU as the activation function, which yields better PSNR and SSIM values and improves the overall result. By combining the efficient sub-pixel convolutional neural network (ESPCNN) algorithm with IGPT, we obtain better output than with IGPT alone.
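The two quality measures named above can be sketched in NumPy. PSNR follows its standard definition, 10 · log10(MAX² / MSE). The SSIM function is a deliberate simplification and an assumption on our part: it evaluates the SSIM formula once over the whole image, whereas the standard index averages the same formula over local Gaussian windows, so its values will differ from library implementations.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

def global_ssim(x, y, max_value=255.0):
    """SSIM formula evaluated over the whole image as a single window (a simplification)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2  # stabilizing constants from the usual SSIM definition
    c2 = (0.03 * max_value) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.zeros((4, 4), dtype=np.uint8)
b = np.ones((4, 4), dtype=np.uint8)
print(psnr(a, b), global_ssim(a, a))
```

Usage: an image compared with a copy that is off by one grey level everywhere has MSE 1, so its PSNR is 20 · log10(255), about 48.1 dB; identical images give infinite PSNR and an SSIM of 1.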

11.1 Introduction
The OpenAI team trained GPT-2 [1] on images by unrolling them into long sequences of pixels, and called the result IGPT [2]. OpenAI found that the model appears to recognize characteristics of 2-D images, such as object appearance and category [2]. This is demonstrated by the range of coherent image samples it generates without the help of labeled data. However, the images generated by the OpenAI model have low resolution, and our aim was to upscale the images it produces using a CNN-based model [3, 7] named the efficient sub-pixel convolutional neural network (ESPCNN). Recovering a high-resolution image from its low-resolution counterpart is a topic of great interest in digital image processing. Many popular methods assume that multiple low-resolution images of a scene, taken from different perspectives, are available; these are classified as multi-image super-resolution methods. They exploit the redundancy among the multiple images, using the additional information to constrain the inversion of the downsampling process. However, these methods usually require computationally demanding image registration and fusion stages, whose accuracy directly affects the quality of the result. Another family of methods is single image super-resolution (SISR). These techniques seek to exploit the implicit redundancy present in natural data to recover missing high-resolution information from a single low-resolution instance, typically through local spatial correlations in images. In this research, we also experimented with different activation functions, such as tanh, ReLU and leaky ReLU, to improve the final results.

B. Shah (corresponding author) · A. Sinha · P. Saxena
Department of Computer Science, Gujarat University, Ahmedabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_11

11.2 Background and Related Work
11.2.1 IGPT Algorithm
Introduction. OpenAI trained IGPT-S [2] (small), IGPT-M [2] (medium) and IGPT-L [2] (large) transformers containing 76 M, 455 M and 1.4 B parameters, respectively, on ImageNet [2]. They also trained IGPT-XL [2] (extra large), a transformer with 6.8 billion parameters, on a mixture of ImageNet and images from the web. Because modeling long sequences with dense attention layers is computationally expensive, the models were trained at low resolutions of 32 × 32, 48 × 48 and 64 × 64 pixels. Working at even lower resolutions would reduce the computational cost further, but prior work has shown that human performance on image classification drops rapidly below these sizes. Instead, motivated by early color display palettes, OpenAI created their own 9-bit color palette [2] to represent pixels, which yields an input sequence three times shorter than with the standard (R, G, B) palette.
Architecture. The transformer decoder [8] takes an input sequence of discrete values and produces a d-dimensional embedding for each position. Sequence elements interact only within the attention operation, so, to ensure proper conditioning when training the auto-regressive (AR) model [8], the standard upper triangular mask is applied to the n × n matrix of attention logits. When the BERT objective [9] is used, no attention logit masking is required; instead, after the content embeddings are applied to the input sequence, the positions in the BERT mask M are zeroed out.
Pre-Training. Given an unlabeled dataset X consisting of data x = (x_1, x_2, …, x_n), a permutation π of the set [1, n] is chosen and the density p(x) is modeled auto-regressively [2]:

$$p(x) = \prod_{i=1}^{n} p\left(x_{\pi_i} \mid x_{\pi_1}, \ldots, x_{\pi_{i-1}}, \theta\right) \tag{11.1}$$
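The 9-bit palette can be illustrated with a small k-means sketch. iGPT built its palette by clustering (R, G, B) pixel values with k-means using k = 512 = 2⁹; the sketch below uses k = 16 on a random 32 × 32 image only to stay fast, and the helper names, random image and seeds are assumptions. It shows why the sequence becomes three times shorter: each pixel becomes one palette token instead of three color values.

```python
import numpy as np

def nearest_center(pixels, centers):
    """Index of the closest palette colour for each pixel (squared Euclidean distance)."""
    d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans_palette(pixels, k, iters=10, seed=0):
    """Cluster (R, G, B) pixel values into k palette colours with plain k-means."""
    rng = np.random.default_rng(seed)
    pixels = pixels.astype(np.float64)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest_center(pixels, centers)
        for c in range(k):
            members = pixels[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers

def image_to_sequence(image, centers):
    """Flatten an (H, W, 3) image in raster order into a sequence of palette indices."""
    return nearest_center(image.reshape(-1, 3).astype(np.float64), centers)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(32, 32, 3))
palette = kmeans_palette(image.reshape(-1, 3), k=16)  # iGPT used k = 512 (9 bits)
sequence = image_to_sequence(image, palette)
print(image.size, sequence.size)  # 3072 colour values vs 1024 palette tokens
```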

For images, the identity permutation $\pi_i = i$ for $1 \le i \le n$ (raster order) is chosen. The model is trained by minimizing the negative log-likelihood of the data [2]:

$$L_{AR} = \mathbb{E}_{x \sim X}\left[-\log p(x)\right] \tag{11.2}$$
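The upper-triangular mask described under Architecture is what enforces the conditioning in Eqs. (11.1) and (11.2). A minimal sketch, assuming the raw attention logits are given as an n × n list of lists (the helper name is ours, not IGPT's): entries above the diagonal are set to −∞ before the row-wise softmax, so position i gives zero weight to every later position.

```python
import math


def causal_attention_weights(logits):
    """Row-wise softmax of an n x n logit matrix after applying the upper-triangular mask."""
    n = len(logits)
    weights = []
    for i in range(n):
        # Upper-triangular mask: position i may not attend to any later position j > i.
        masked = [logits[i][j] if j <= i else -math.inf for j in range(n)]
        m = max(masked)  # finite, because the diagonal entry is never masked
        exps = [math.exp(v - m) for v in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


w = causal_attention_weights([[0.0] * 4 for _ in range(4)])
# With equal logits, row i spreads its weight uniformly over positions 0..i.
print([round(v, 3) for v in w[2]])  # [0.333, 0.333, 0.333, 0.0]
```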

The BERT objective [9] instead samples a sub-sequence $M \subset [1, n]$, the BERT mask, such that each index i independently has probability 0.15 of appearing in M. The aim is to minimize the negative log-likelihood of the masked elements $x_M$ conditioned on the unmasked ones $x_{[1,n] \setminus M}$ [2]:

$$L_{BERT} = \mathbb{E}_{x \sim X}\, \mathbb{E}_{M} \sum_{i \in M} \left[-\log p\left(x_i \mid x_{[1,n] \setminus M}\right)\right] \tag{11.3}$$
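The BERT-style alternative can be sketched in the same way. The two helpers below are hypothetical illustrations, not IGPT code: one samples the mask M with inclusion probability 0.15 per index (0-based here, whereas the text indexes [1, n]), and the other zeroes the content embeddings at the masked positions, which is why no attention-logit mask is needed. The loss in Eq. (11.3) would then be summed only over the indices in M.

```python
import random


def sample_bert_mask(n, p=0.15, rng=None):
    """Sample M: each position 0..n-1 is included independently with probability p."""
    rng = rng or random.Random()
    return {i for i in range(n) if rng.random() < p}


def zero_masked_positions(embeddings, mask):
    """Replace the content embedding of every masked position with a zero vector."""
    return [[0.0] * len(e) if i in mask else list(e) for i, e in enumerate(embeddings)]


mask = sample_bert_mask(10_000, rng=random.Random(0))
print(len(mask) / 10_000)  # close to 0.15

emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(zero_masked_positions(emb, {1}))  # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]
```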

During pre-training, either $L_{AR}$ [8] or $L_{BERT}$ [9] is chosen and minimized over the pre-training dataset.

Fine-Tuning. During fine-tuning, IGPT average-pools the final-layer outputs $n^L$ across the sequence dimension to obtain one d-dimensional feature vector per example [2]:

$$f^L = \left\langle n_i^L \right\rangle_i = \frac{1}{n} \sum_{i=1}^{n} n_i^L \tag{11.4}$$

A projection from $f^L$ to class logits is learned and used to minimize a cross-entropy loss $L_{CLF}$. Fine-tuning on $L_{CLF}$ alone attains acceptable performance, but empirically the joint objective below works even better [10], where $L_{GEN} \in \{L_{AR}, L_{BERT}\}$:

$$L_{GEN} + L_{CLF} \tag{11.5}$$
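Eqs. (11.4) and (11.5) can be traced on toy numbers. In this sketch the hidden states, the projection weights, and the value used for $L_{GEN}$ are made-up placeholders; only the operations (average pooling, a linear projection to class logits, and a numerically stable cross-entropy) follow the description above.

```python
import math


def average_pool(hidden):
    """f^L: mean of the n final-layer vectors n_i^L over the sequence dimension (Eq. 11.4)."""
    n, d = len(hidden), len(hidden[0])
    return [sum(h[k] for h in hidden) / n for k in range(d)]


def project_to_logits(f, W, b):
    """Linear projection from f^L to class logits: one weight row and one bias per class."""
    return [sum(w_k * f_k for w_k, f_k in zip(row, f)) + b_c for row, b_c in zip(W, b)]


def cross_entropy(z, label):
    """L_CLF for one example: -log softmax(z)[label], computed via log-sum-exp."""
    m = max(z)
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum - z[label]


hidden = [[1.0, 0.0], [3.0, 2.0]]                       # n = 2 positions, d = 2 (toy values)
f = average_pool(hidden)                                # [2.0, 1.0]
z = project_to_logits(f, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])  # [2.0, 1.0]
l_clf = cross_entropy(z, label=0)
l_gen = 0.7                                             # hypothetical stand-in for L_AR or L_BERT
joint = l_gen + l_clf                                   # Eq. (11.5)
print(f, z, l_clf, joint)
```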

Limitation of IGPT. One limitation of Image GPT is that it works only on low-resolution images; many other algorithms work at high resolution, but they are based on supervised learning. Because Image GPT works on low-resolution input images, its final output is also low resolution, so enlarging the pixelated output with standard interpolation produces a low-quality image.

B. Shah et al.

11.2.2 ESPCNN Algorithm

ESPCNN Structure. Consider an SR network with L layers. Each of the first L−1 layers applies an $f_l \times f_l$ convolution that turns $n_{l-1}$ feature maps into $n_l$ feature maps, operating on the low-resolution input image. At the final layer, an efficient sub-pixel convolution produces the HR image as the output. The network is shallow, with L = 3 layers and the parameters $(f_1, n_1) = (5, 64)$, $(f_2, n_2) = (3, 32)$, and $f_3 = 3$:
• Layer 1: 64 filters of size 5 × 5.
• Layer 2: 32 filters of size 3 × 3.
• Layer 3: 3 × 3 filters producing $r^2$ sub-pixel channels (for upscaling factor r), which are rearranged into the single-channel HR image.
For a YUV image [3], only the Y (luminance) channel is processed, because human eyes are more sensitive to brightness than to color (chrominance) (Fig. 11.1).

Leaky ReLU as Activation Function. ReLUs are not without drawbacks: they are not zero-centered, and they are non-differentiable at zero (though differentiable elsewhere). Another problem is the dying ReLU problem, in which some neurons output zero for every input and therefore receive no gradient. If a large network contains a sizable number of such dead neurons, its performance suffers. Leaky ReLU, f(x) = max(αx, x) with a small slope α (for example, 0.01), corrects this by modifying the slope to the left of x = 0; this leak extends the range of ReLU and keeps gradients flowing.

Fig. 11.1 The efficient sub-pixel convolutional neural network (ESPCNN) has two convolution layers for feature-map extraction and a single sub-pixel convolution layer that aggregates the feature maps from low-resolution space and builds the super-resolution image in a single step [3]
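The rearrangement performed by the sub-pixel layer can be sketched in a few lines. Assuming the last convolution has already produced $r^2$ low-resolution feature maps for the Y channel (the learned part, omitted here), the periodic shuffle below interleaves them into one image r times larger in each dimension; the input values are arbitrary.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r low-resolution maps (each H x W) into one (r*H) x (r*W) image.

    Channel c = dy * r + dx supplies the pixel at offset (dy, dx) inside every r x r block.
    """
    if len(channels) != r * r:
        raise ValueError("expected r * r feature maps for a single-channel (Y) output")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, fmap in enumerate(channels):
        dy, dx = divmod(c, r)  # sub-pixel offset supplied by channel c
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = fmap[y][x]
    return out


# Upscaling factor r = 2: four 1 x 2 low-resolution maps become one 2 x 4 image.
maps = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
for row in pixel_shuffle(maps, r=2):
    print(row)
# [1, 3, 2, 4]
# [5, 7, 6, 8]
```

The shuffle itself has no parameters; what makes ESPCNN's upscaling learnable is that the preceding convolution learns what to put in each of the $r^2$ channels.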


PSNR (Peak Signal-to-Noise Ratio) Score [11]. A model has a high PSNR score when the mean squared error (MSE) between its output and the reference image is low. This approach works well, but even images with high PSNR scores might not look good to the human eye, which is a major downside: human perception of image quality does not correlate perfectly with the PSNR score. Minimizing the MSE produces images that are numerically close to the original but do not always look pleasing. With $MAX_I$ the maximum possible pixel value (255 for 8-bit images),

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{\mathrm{MSE}}\right) \tag{11.6}$$
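Eq. (11.6) is straightforward to compute. This sketch assumes 8-bit grayscale images given as lists of rows (so $MAX_I$ = 255) and returns infinity for identical images, where the MSE is zero; the pixel values are made up.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_i=255.0):
    """PSNR in dB following Eq. (11.6); max_i = 255 assumes 8-bit pixel values."""
    e = mse(a, b)
    return math.inf if e == 0 else 10.0 * math.log10(max_i ** 2 / e)


ref = [[52, 55], [61, 59]]
test = [[54, 55], [61, 57]]
print(mse(ref, test), psnr(ref, test))
```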

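SSIM (Structural SIMilarity) Index [12]. The SSIM index measures the similarity of two images. It can be regarded as a quality measure of the image being compared, under the assumption that the reference image is of perfect quality. With $\mu_x, \mu_y$ the means, $\sigma_x^2, \sigma_y^2$ the variances, and $\sigma_{xy}$ the covariance of images x and y, and $c_1, c_2$ small constants that stabilize the division,

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{11.7}$$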
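The idea behind Fig. 11.2, equal MSE but different SSIM, can be reproduced with Eq. (11.7). This sketch evaluates the formula once over the whole image, with the commonly used constants $c_1 = (0.01 \cdot 255)^2$ and $c_2 = (0.03 \cdot 255)^2$; standard SSIM implementations instead average the index over local windows, so the numbers are only illustrative. A uniform brightness shift and an alternating ±4 error give the same MSE of 16, but the shift preserves structure and scores higher.

```python
def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def ssim_global(a, b, data_range=255.0, k1=0.01, k2=0.03):
    """Eq. (11.7) evaluated over the whole image as a single window (a simplification)."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((u - mu_x) * (v - mu_y) for u, v in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


ref = [[50, 100], [150, 200]]
shifted = [[54, 104], [154, 204]]  # every pixel +4
noisy = [[54, 96], [154, 196]]     # alternating +4 / -4
print(mse(ref, shifted), mse(ref, noisy))  # 16.0 16.0
print(ssim_global(ref, shifted), ssim_global(ref, noisy))
```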

Distorted images can have approximately the same MSE with respect to the original image and still differ in perceived quality; SSIM gives a better indication of image quality [12] (Fig. 11.2).

Fig. 11.2 Comparison between images with the same MSE but different SSIM values [12]. Panels: original image (MSE = 0, SSIM = 1); distorted image (MSE = 144, SSIM = 0.988); distorted image (MSE = 144, SSIM = 0.913)

Comparison of ESPCNN with other models. ESPCNN uses plain convolution layers followed by a learnable sub-pixel upsampling layer [3]; residual blocks, which make it practical to stack more layers, were popularized by other SR architectures. Every SR architecture involves an upsampling step. If bicubic interpolation is used inside the network to upsample, it can be placed only at the beginning or at the end, not in the middle, because it is a fixed operation that cannot be learned. ESPCNN gets around this by upscaling with sub-pixel convolutions at the end of the network. This method of upscaling is learnable, and it improves the results, yielding a better PSNR score [3] (Fig. 11.3).

Fig. 11.3 Plot of PSNR score versus run time in seconds for various models at an upscaling factor of 3. The results show the average PSNR and run time on one CPU core clocked at 2.0 GHz over the images of Set14 [3]

Limitation of other Super-Resolution Models. Convolutional neural network (CNN) approaches such as the super-resolution convolutional neural network (SRCNN), the fast super-resolution convolutional neural network (FSRCNN), and very deep super-resolution (VDSR) typically involve the following steps:
• Upscaling (upsampling) the low-resolution image.
• Performing convolutions to generate the high-resolution image.
• Because the low-resolution image is upsampled at the initial stage, the convolutions operate on the upsampled image, which increases the number of computations.
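The cost argument above can be made concrete by counting multiply-accumulate operations (MACs). The sketch below is back-of-the-envelope arithmetic that ignores biases and activations: it applies the ESPCNN layer sizes from Sect. 11.2.2 on a 100 × 100 low-resolution grid and compares them with a hypothetical network that uses the same layers after first upsampling the input by r = 3. This is not SRCNN's actual configuration, only an illustration of why convolving in HR space costs close to $r^2$ times more.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a k x k convolution producing an h x w output with c_out maps."""
    return h * w * k * k * c_in * c_out


def espcnn_macs(lr_h, lr_w, r):
    """ESPCNN layers (5, 64), (3, 32), (3, r^2), all evaluated on the low-resolution grid."""
    return (conv_macs(lr_h, lr_w, 5, 1, 64)
            + conv_macs(lr_h, lr_w, 3, 64, 32)
            + conv_macs(lr_h, lr_w, 3, 32, r * r))


def hr_space_macs(lr_h, lr_w, r):
    """Hypothetical comparison: same layer sizes, but run on the input upsampled to (r*H) x (r*W),
    with a single output channel because no sub-pixel shuffle is needed."""
    h, w = lr_h * r, lr_w * r
    return (conv_macs(h, w, 5, 1, 64)
            + conv_macs(h, w, 3, 64, 32)
            + conv_macs(h, w, 3, 32, 1))


lr = espcnn_macs(100, 100, 3)
hr = hr_space_macs(100, 100, 3)
print(lr, hr, round(hr / lr, 2))  # 226240000 1828800000 8.08
```

The ratio falls slightly short of $r^2 = 9$ only because the last layer of the HR-space variant has one output channel instead of nine.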

11.3 Proposed Algorithm

As discussed in Sect. 11.2.1 (Limitation of IGPT), we focus on generating better output from Image GPT. To do so, we generated outputs using IGPT-S on datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and then passed them to ESPCNN, which improved the quality of the previously generated output. We validated the results using the PSNR score and the SSIM index.

Fig. 11.4 Prediction plots at different epochs: (a) average PSNR score versus epochs; (b) validation loss versus epochs

11.4 Our Results on IGPT-S and ESPCNN Super Resolution

Fig. 11.4 shows that the average PSNR score increases steadily and the validation loss decreases steadily, so the results improve as the number of epochs increases. For this experiment, we trained for 140 epochs and obtained good results with the ESPCNN algorithm. Table 11.1a shows the distorted output obtained with a learning rate of 0.006, and Table 11.1c shows the better image generated when the learning rate is reduced to 0.003.

11.5 Conclusion

An auto-regressive model can perform significantly well at image generation as well as classification. The Generative Pre-trained Transformer (GPT) is known for NLP, but this research shows that the same technique can be used to generate and classify images. Better generative models learn better representations. Accuracy also improves as the model size increases, which implies that model quality continues to depend on scale even with such a large amount of training data. We found that a larger learning rate (0.006) made the model generate distorted output, whereas a smaller learning rate (0.003) produced a better image. Our proposed model uses a large number of parameters and an enormous dataset compared with other models. Because of its simplicity and generality, the sequence transformer used in IGPT can work with features from multiple domains, such as text and images. We have demonstrated that non-adaptive upscaling at the first layer gives worse results than adaptive upscaling for single-image super-resolution (SISR) and requires more computation.


Table 11.1 Final results generated by IGPT and ESPCNN (comparison between the low-resolution and predicted images)

Panel | MSE | SSIM | PSNR (dB)
(a) | 872.26 | 0.48 | 23.4956
(b) | 477.84 | 0.41 | 26.1092
(c) | 415.55 | 0.24 | 26.7158

To address this problem, we have proposed performing the feature extraction stages in the LR space rather than the HR space. To meet this requirement, we implemented a sub-pixel convolution layer that resolves the LR data into HR space at a small, almost negligible computational cost compared with a deconvolutional layer during training. Evaluation on an extended benchmark dataset with an upscaling factor of 3 shows a speed-up of approximately ten times and a performance boost compared with previous CNN approaches that use more parameters.

References
1. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. Accessed: 30 December 2020. [Online]. Available https://github.com/codelucas/newspaper
2. Chen, M., et al.: Image GPT. ICML (2020)
3. Shi, W., et al.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-Decem, pp. 1874–1883 (2016). https://doi.org/10.1109/CVPR.2016.207
4. Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial Feature Learning, pp. 1–18 (2016). [Online]. Available http://arxiv.org/abs/1605.09782
5. Huang, Y., et al.: GPipe: efficient training of giant neural networks using pipeline parallelism
6. Isola, P.: Contrastive multiview coding
7. Kolesnikov, A., Zhai, X., Beyer, L.: Revisiting self-supervised visual representation learning. arXiv (2019)
8. Vaswani, A., et al.: Attention is all you need
9. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. [Online]. Available https://github.com/tensorflow/tensor2tensor
10. Kornblith, S., Shlens, J., Le, Q.V.: Do better ImageNet models transfer better? pp. 2661–2671
11. SR with OpenCV Homepage. https://bleedai.com/super-resolution-with-opencv/
12. NYU Center for Neural Science Homepage. https://arxiv.org/abs/2006.13846

Chapter 12

Boosting Accuracy of Machine Learning Classifiers for Heart Disease Forecasting

Divya Lalita Sri Jalligampala, R. V. S. Lalitha, M. Anil Kumar, Nalla Akhila, Sujana Challapalli, and P. N. S. Lakshmi

Abstract Heart disease is one of the major diseases causing a huge number of deaths all over the world. Even medical specialists face difficulties in diagnosing heart disease properly, which raises the need for a new classification scheme. This has become a crucial task for healthcare providers because the volume of medical data grows rapidly every day. To address it, several machine learning algorithms are discussed in this paper, and their performance is measured with metrics such as accuracy, precision, recall, and F1-score. However, these algorithms alone are not accurate enough for prediction and diagnosis. Since accuracy is the main criterion for measuring the performance of any machine learning algorithm, different ensemble methods were used to further improve the accuracy of the classifiers. In the new methodology, the feature importance method is used as a pre-processing technique to keep a minimum number of attributes rather than all the attributes in the dataset, since the choice of attributes affects the accuracy of the classifiers. The pre-processed data is then used to train classifiers such as logistic regression, SVM, naïve Bayes, and decision tree, and finally, three ensemble methods, bagging, AdaBoosting, and gradient boosting, are used to boost their performance. In our observations, the bagging ensemble algorithm achieved the highest accuracy, 80.21%, exceeding that of the other classifiers.

D. L. S. Jalligampala (B) · R. V. S. Lalitha · M. Anil Kumar · S. Challapalli · P. N. S. Lakshmi
Department of C.S.E, Aditya College of Engineering & Technology, Surampalem, East-Godavari, India
N. Akhila
Department of I.T, Pragati Engineering College(A), Surampalem, East-Godavari, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_12

D. L. S. Jalligampala et al.

12.1 Introduction

The heart is the most vital organ in human anatomy. It circulates blood throughout the body to deliver oxygen and nutrients to the tissues, and it also pumps blood to itself through the coronary arteries. It also helps eliminate unnecessary substances from the body. Malfunctioning and abnormal conditions of the heart are termed heart disease. Today, heart disease is the most prominent public health problem, causing a large number of deaths around the world irrespective of gender and place. According to 2019 statistics from the World Heart Federation, more than 23 million deaths are expected by 2030. Several risk factors put heart disease in this top position, such as diabetes, high blood pressure, tobacco use, physical inactivity, poor diet, high cholesterol, obesity, and alcohol use. The major challenge is therefore to reduce this count, and an automated system must be developed for this purpose. However, the healthcare industry generates a bulk amount of data every day, which makes providing such an environment a big challenge for healthcare professionals. To tackle this problem, several data mining techniques were introduced and used. Later, machine learning algorithms came into the picture and gained popularity because of their features. Machine learning is one of the fastest-growing fields; it is a subfield of data science that addresses the important data mining tasks of analysis and prediction. There are three classes of machine learning approaches: supervised learning, unsupervised learning, and reinforcement learning. These algorithms produce efficient results that medical officials can use to analyze patients and make better-informed diagnostic decisions. In this paper, different supervised machine learning algorithms, such as support vector machine, logistic regression, decision tree, and naïve Bayes classifier, were applied to the input training dataset, producing different accuracies. For any machine learning classifier, accuracy is an important criterion for comparing performance across classifiers, and it can be improved further by using ensemble methods.
Ensemble methods are techniques developed to boost the accuracy of predictions by creating multiple models and then combining them. They fall into three types: bagging, boosting, and stacking. The rest of the paper is organized as follows: Sect. 12.2 discusses related work, Sect. 12.3 presents the proposed methodology, Sect. 12.4 discusses the implementation and results, and Sect. 12.5 concludes the work.

12.2 Related Works

Anbuselvan [1] examines different machine learning techniques, including logistic regression, naïve Bayes, support vector machine, K-nearest neighbour, decision tree, random forest, and the ensemble technique XGBoost, on the given dataset; the experiments show that random forest produces the highest accuracy among these classifiers. In [2], the author compares ensemble methods such as bagging, boosting, stacking, and blending for stock market prediction in terms of accuracy and other measures; stacking and blending achieve higher accuracy than bagging and boosting. Yekkala et al. [3] examine ensemble algorithms such as bagged tree, random forest, and AdaBoost for predicting heart disease, using the feature subset selection method particle swarm optimization (PSO) as a pre-processing step to improve prediction; bagged tree with PSO achieves the highest accuracy.

12 Boosting Accuracy of Machine Learning Classifiers …

Lalitha et al. [4] examine different machine learning approaches for air pollution analysis. Raza [5] proposes a method that ensembles various machine learning classifiers using majority voting to predict heart disease; compared with other techniques, this method achieves the highest accuracy, 88.88%. Rajendran and Vincent [6] aim to design a robust machine learning algorithm for heart disease prediction by ensembling algorithms such as LDA, classification and regression trees, SVM, K-nearest neighbours, and naïve Bayes; the ensemble yields higher accuracy than the individual algorithms. Jan et al. [7] propose an approach that ensembles multiple classifiers, such as SVM, artificial neural networks, naïve Bayes, regression analysis, and random forest, to predict and diagnose heart disease; the results demonstrate better prediction accuracy and reliability. The authors also present a smart heart disease diagnosis system with a user-friendly interface for easy prediction in healthcare settings. Kamalapurkar and GH [8] propose an online portal for heart disease prediction using different ensemble methods and compare these methods by accuracy. Bulut [9] predicts the chances of getting heart disease with the bagging ensemble algorithm, which shows high performance, so healthcare professionals can use these results as a basis for taking precautions before a heart attack. Lalitha et al. [10] propose a method to predict and analyse COVID-19 using Cubist and One R.
In [11], the authors address improved machine learning methods for predicting heart disease more effectively. The proposed method first divides the dataset into smaller partitions using a mean-based splitting method; classification and regression trees (CART) are then used to build models, and a weighted ageing classifier ensemble is applied to the different CART models. This produces the best performance, i.e., a more effective prediction of heart disease than other machine learning algorithms. Omotosho et al. [12] compare ensemble methods such as bagging and boosting with naïve Bayes and neural network classifiers, measuring performance in terms of accuracy, kappa statistic, ROC, precision, and MCC; bagging of naïve Bayes produces the highest accuracy, 83%. Yuan et al. [13] propose a hybrid of gradient boosting decision trees and logistic regression to improve the accuracy of heart disease prediction. Habib et al. [14] use different ensemble techniques to predict heart disease.

12.3 Methodology

The following steps implement the new methodology.

Step 1: Take the dataset and apply a data pre-processing technique (the feature selection method feature importance).
Step 2: Train different classifiers on the pre-processed dataset and evaluate their performance.
Step 3: Apply different ensemble methods to the dataset.
Step 4: Compare the results in terms of performance using evaluation metrics.

12.3.1 Dataset Description

Heart disease data is available from various databases, such as UCI, Hungary, Long Beach, and Cleveland. The dataset used here, from the UCI databases, contains health information about heart patients for attributes such as resting blood pressure, chest pain type, and serum cholesterol. It has 303 records and 14 attributes: 13 attributes represent risk factors for heart disease, and the last attribute is the outcome, i.e., the class label. Table 12.1 describes the attributes in the dataset.

12.3.2 Architecture

See Fig. 12.1.

12.3.3 Pre-processing

The dataset may contain attributes that are not relevant to the current task, and using them would hurt the performance of the models; pre-processing is performed to avoid this. Data pre-processing is essential in any machine learning process because it prepares the data before any algorithm is applied. In this paper, feature selection is used to keep only the relevant attributes of the dataset. Three common feature selection techniques are univariate selection, correlation matrix with heat map, and feature importance. We used the feature importance technique on the dataset because it is fast and its scores are easy to retrieve. It assigns each feature a score reflecting its importance; a higher score means the feature is more important than the others. Only the selected features are used in further processing, and the irrelevant features are dropped. Feature importance is built into Scikit-learn: it comes from tree-based classifiers such as the extra trees classifier (ExtraTreesClassifier), whose feature_importances_ attribute ranks the attributes according to their importance.

Table 12.1 Specification of the dataset

Id | Feature name | Type | Specification
F1 | Age | Numerical | Age of a patient in years
F2 | Sex | Binary | Gender of a patient (Male/Female)
F3 | CP | Nominal | Chest pain type: 1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic
F4 | Trestbps | Numerical | Resting blood pressure in mm Hg
F5 | Serum cholesterol | Numerical | Serum cholesterol in mg/dl
F6 | Fasting blood sugar | Binary | FBS > 120 mg/dl; True: 1, False: 0
F7 | Resting electrocardiographic result (ECG) | Nominal | Resting ECG results
F8 | Thalach | Numerical | Maximum heart rate
F9 | Exercise-induced angina (Exang) | Binary | Exercise-induced angina result
F10 | Old peak | Numerical | ST depression induced by exercise relative to rest
F11 | Slope | Nominal | Slope of the peak exercise ST segment
F12 | CA | Nominal | Number of major vessels coloured by fluoroscopy
F13 | Thal | Nominal | Defect types: normal, fixed defect, and reversible defect
F14 | Concept class | Binary | Target result; 1: Yes, 0: No

Fig. 12.1 Proposed methodology: Data → Pre-processing (feature importance) → Classifiers (decision tree, SVM, logistic regression, naïve Bayes) → Performance evaluation → Ensemble → Performance → Output
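As a dependency-free illustration of how importance scores rank attributes, the sketch below scores each feature by the largest Gini-impurity decrease that a single threshold split on that feature achieves, then normalizes the scores to sum to 1. This is a simplified stand-in for the impurity-based feature_importances_ of a fitted Scikit-learn ExtraTreesClassifier, which averages impurity decreases over all splits of many randomized trees; the four toy records and the "noise" feature are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest impurity decrease from one threshold split (v <= t vs. v > t) on a feature."""
    parent, n, best = gini(labels), len(labels), 0.0
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def feature_scores(rows, labels, names):
    """Normalized importance score per feature name."""
    raw = {name: best_split_gain([r[j] for r in rows], labels) for j, name in enumerate(names)}
    total = sum(raw.values()) or 1.0
    return {name: s / total for name, s in raw.items()}


def select_top_k(scores, k):
    """Keep the k highest-scoring features and drop the rest."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]]


names = ["cp", "chol", "noise"]
rows = [(0, 200, 1), (0, 240, 0), (1, 210, 1), (1, 250, 0)]
labels = [0, 0, 1, 1]
scores = feature_scores(rows, labels, names)
print(select_top_k(scores, 2))  # ['cp', 'chol']
```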

12.3.4 Learning Classifiers

Initially, the dataset is split into two parts, a training dataset and a test dataset. The training data is then used to train several classification algorithms: decision tree, support vector machine, logistic regression, and naïve Bayes.

Decision Tree. A widely used supervised learning algorithm for classification problems that works on both categorical and continuous data. It is a flowchart-like structure made up of nodes linked by edges called branches. Every tree begins at a specially designated root node, and each node is divided into further nodes based on the outcome of an attribute test condition; these are called internal nodes. Leaf nodes represent the final output class labels. Decision trees are easy for anyone to understand because of their tree-like structure.

Support Vector Machine. The support vector machine (SVM) is a supervised learning model that analyses data for classification and regression. SVM searches the n-dimensional feature space for the hyperplane that best separates the data points, i.e., the decision boundary with the maximum margin to the closest points of each class. These extreme points are called support vectors. The following formulae are used:

Line equation:

$$x = cz + b \tag{12.1}$$

where c and b are constants.

Hyperplane equation:

$$W^T X = 0 \tag{12.2}$$
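A maximum-margin hyperplane of the form in Eq. (12.2) can be learned with a few lines of sub-gradient descent on the regularized hinge loss. The sketch below follows the Pegasos update with a deterministic pass over the data, appends a constant 1 to every input so the bias is folded into W (which also regularizes the bias, a simplification), and uses made-up two-feature points; it is a linear-kernel illustration, not the SVM implementation used in the experiments.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style sub-gradient descent on (lam / 2) * ||w||^2 + mean hinge loss."""
    Xa = [list(x) + [1.0] for x in X]  # append 1 so that W^T X = 0 includes the bias
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(Xa, y):  # fixed order; Pegasos normally samples points at random
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the regularizer
            if margin < 1.0:  # hinge loss is active: pull the hyperplane toward this point
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Class +1 or -1 depending on which side of W^T X = 0 the point lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # [1, 1, -1, -1]
```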

Logistic Regression. Logistic regression is a supervised machine learning technique taken from the field of statistics. It builds on linear regression, which models the relationship between two variables, one independent and one dependent, by fitting a linear equation to observed data; logistic regression then passes this linear combination through the sigmoid function $\sigma(z) = 1/(1 + e^{-z})$ to obtain the probability that a record belongs to the positive class. In the linear model, the relationship between the input variable (P) and one output variable (Q) is represented so that Q can be computed as a linear combination of the input variables:

$$Q = CP + D \tag{12.3}$$

where C is the coefficient and D is the constant (intercept). If a single input variable is used, the model is called simple linear regression; if more than one input variable is used, it is called multiple linear regression. The simple linear regression model can be rewritten as:

$$y = a_0 + a_1 x \tag{12.4}$$

where $a_0$ and $a_1$ are the regression coefficients; a learning technique is used to find a good set of coefficient values.

Naive Bayes. Naive Bayes is a supervised machine learning algorithm. It is a classification technique that is well suited to large datasets and is used when the output variable is discrete. Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. The classifier is probabilistic: it is based on Bayes' theorem and predicts from the probabilities of the features. Bayes' theorem states:

$$P(R \mid Q) = \frac{P(Q \mid R)\, P(R)}{P(Q)} \tag{12.5}$$

where P(R) is the prior probability of the hypothesis (the class), P(Q) is the prior probability of the evidence, P(R|Q) is the posterior probability of the hypothesis given the evidence, and P(Q|R) is the likelihood, i.e., the probability of the evidence given that the hypothesis is true. For a record d, the equation can be rewritten as:

$$P(C_i \mid d) = \frac{P(d \mid C_i)\, P(C_i)}{P(d)} \tag{12.6}$$

where $C_i$ is the i-th class label. Because P(d) is the same for every class, it can be ignored when comparing classes, and $P(d \mid C_i)$ is calculated by applying the independence assumption, i.e., as the product of the per-feature probabilities.
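Both probabilistic classifiers above can be sketched without external libraries. The first half fits Eq. (12.4) by batch gradient descent on the log-loss, passing $a_0 + a_1 x$ through the sigmoid; the second half implements Eq. (12.6) for categorical features, dropping P(d) and adding Laplace smoothing (alpha = 1), a common choice that the text does not specify. All data values are hypothetical toy examples.

```python
import math
from collections import Counter, defaultdict


# --- Logistic regression: P(y = 1 | x) = sigmoid(a0 + a1 * x) ---
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss for a single input variable."""
    a0 = a1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [sigmoid(a0 + a1 * x) - y for x, y in zip(xs, ys)]  # d(loss)/dz per example
        a0 -= lr * sum(errs) / n
        a1 -= lr * sum(e * x for e, x in zip(errs, xs)) / n
    return a0, a1


def logistic_predict(a0, a1, x):
    return 1 if sigmoid(a0 + a1 * x) >= 0.5 else 0


# --- Categorical naive Bayes: argmax over C_i of P(C_i) * prod_j P(d_j | C_i) ---
def fit_naive_bayes(rows, labels, alpha=1.0):
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[c][j][v]
    values = defaultdict(set)                           # distinct values of feature j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha, len(labels)


def naive_bayes_predict(model, row):
    class_counts, counts, values, alpha, n = model
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # log prior P(C_i); P(d) is omitted
        for j, v in enumerate(row):
            # Laplace-smoothed log P(d_j = v | C_i), added under the independence assumption
            score += math.log((counts[c][j][v] + alpha) / (n_c + alpha * len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


xs, ys = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0], [0, 0, 0, 1, 1, 1]
a0, a1 = fit_logistic(xs, ys)
print([logistic_predict(a0, a1, x) for x in xs])

rows = [("asym", "yes"), ("asym", "yes"), ("asym", "no"),
        ("typ", "no"), ("typ", "no"), ("typ", "yes")]  # hypothetical (chest pain, exang) pairs
labels = [1, 1, 1, 0, 0, 0]
nb = fit_naive_bayes(rows, labels)
print(naive_bayes_predict(nb, ("asym", "yes")), naive_bayes_predict(nb, ("typ", "no")))  # 1 0
```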

12.3.5 Ensemble Methods

Accuracy is the vital criterion for representing the performance of any machine learning algorithm. Ensemble methods were developed to improve the accuracy of a model (or classifier) by reducing its variance and bias. Instead of relying on a single learner, they train multiple learners and aggregate their results to provide accurate predictions. There are three kinds of ensemble methods, bagging, boosting, and stacking; bagging and boosting are applied in this work.

12.3.5.1 Bagging

Bagging is a powerful ensemble method used mainly to reduce the variance of a classifier. It combines bootstrapping and aggregation, making several learners work together to generate more accurate predictions than a single learner would.

Algorithm:
Step 1: Let D be the dataset and S = {s1, s2, …, sn} the sample set.
Step 2: for iteration = 1 to n:
Step 3: create a bootstrap sample from D by sampling with replacement;
Step 4: train a base learner on the sample.
Step 5: end for.
Step 6: the final model is the ensemble of all base learners, whose predictions are aggregated (e.g., by majority vote) to produce the output.
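The algorithm above can be sketched with one-dimensional decision stumps as base learners. The stump learner, the 25 bootstrap rounds, the fixed random seed, and the toy data are illustrative assumptions; Step 6 is implemented as a majority vote.

```python
import random
from collections import Counter


def stump_predict(stump, x):
    """A stump (t, polarity) predicts 1 if polarity * (x - t) > 0, else 0."""
    t, polarity = stump
    return 1 if polarity * (x - t) > 0 else 0


def fit_stump(xs, ys):
    """Base learner: the 1-D threshold/polarity pair with the fewest training errors."""
    best = None
    for t in [float("-inf")] + sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum(stump_predict((t, polarity), x) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, (t, polarity))
    return best[1]


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):                   # Step 2: one iteration per base learner
        idx = [rng.randrange(n) for _ in range(n)]  # Step 3: bootstrap sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))  # Step 4
    return models


def bagging_predict(models, x):
    votes = Counter(stump_predict(m, x) for m in models)  # Step 6: majority vote
    return votes.most_common(1)[0][0]


xs, ys = [1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 0.0), bagging_predict(models, 10.0))
```

An odd number of base learners avoids ties in the binary vote.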

12.3.5.2 Boosting

Boosting is an ensemble method that improves the predictions of a learner by training a sequence of weak learners and combining them into a strong learner. Three widely used boosting algorithms are AdaBoost, gradient boosting, and XGBoost.

AdaBoost. AdaBoost (adaptive boosting) is a powerful and effective boosting technique developed for classification and regression problems, with the goal of boosting the accuracy of machine learning classifiers. Working process:
Step 1: Take the dataset and assign equal weights to all data points.
Step 2: Train a weak learner on the weighted data and identify the wrongly classified data points.
Step 3: Increase the weights of the data points that were predicted incorrectly, so that the classifier focuses on classifying them correctly in the next iteration.
Step 4: If the required accuracy is achieved, stop; otherwise, return to Step 2.

Gradient Boosting. Gradient boosting is a robust algorithm for classification and regression problems. At each stage it adds a new model that reduces the loss of the predictions made by the previous learners. The algorithm has three components: a weak learner that makes predictions, a loss function to be minimized, and an additive model that adds new weak learners to the existing ones.
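The four AdaBoost steps can be sketched with weighted decision stumps on one-dimensional data. This is the standard discrete AdaBoost update, with learner weight alpha = 0.5 · ln((1 − ε)/ε) for weighted error ε; a fixed number of rounds replaces the accuracy test of Step 4, and the data (labels +1/−1) are a made-up example that no single stump can classify perfectly.

```python
import math


def stump_output(t, polarity, x):
    """Weak learner h(x): polarity if x > t, otherwise -polarity."""
    return polarity if x > t else -polarity


def fit_weighted_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump under the current weights."""
    best = None
    for t in [float("-inf")] + sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_output(t, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost_fit(xs, ys, rounds=50):
    n = len(xs)
    weights = [1.0 / n] * n                   # Step 1: equal weights
    model = []
    for _ in range(rounds):                   # fixed budget instead of an accuracy test
        err, t, polarity = fit_weighted_stump(xs, ys, weights)  # Step 2
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero and log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, polarity))
        # Step 3: misclassified points (y * h(x) = -1) are scaled by e^alpha, others by e^-alpha
        weights = [w * math.exp(-alpha * y * stump_output(t, polarity, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, x):
    score = sum(alpha * stump_output(t, polarity, x) for alpha, t, polarity in model)
    return 1 if score > 0 else -1


xs, ys = [1, 2, 3, 4, 5, 6], [1, 1, -1, -1, 1, 1]
model = adaboost_fit(xs, ys)
print([adaboost_predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```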


12.4 Implementation and Results

12.4.1 Technology

Anaconda is a distribution that comes with many predefined packages, a large number of compilers, and various tools used by data scientists and IT professionals. It provides packages and modules for Python, one of the most powerful languages in the world, which makes it attractive. It is open source, supports both R and Python, and includes IDEs and Jupyter Notebook. Jupyter Notebook provides an environment in which developers can create, edit, and present documents with good visualization tools and share them with others.

12.4.2 Evaluation Metrics

Classifier performance is evaluated using a confusion matrix, a table whose rows represent the actual class labels and whose columns represent the predicted class labels. It shows how many records a classifier labels correctly and incorrectly (Table 12.2). The following evaluation metrics are used to measure the performance of the classifiers.

Accuracy. Accuracy is the percentage of correctly classified records:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12.9}$$

Precision. Precision is the percentage of correct positive predictions among all positive predictions:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{12.10}$$

Table 12.2 Depiction of the confusion matrix (rows: actual class label; columns: predicted class label)

Actual \ Predicted | Positive (YES) | Negative (NO)
Positive (YES) | True positives | False negatives
Negative (NO) | False positives | True negatives

Where TP (true positives) is the number of tuples correctly classified as C1; FP (false positives) is the number of tuples with actual class label C2 incorrectly classified as C1; FN (false negatives) is the number of tuples with actual class label C1 incorrectly classified as C2; and TN (true negatives) is the number of tuples correctly classified as C2 (C1 denotes the positive class and C2 the negative class)


D. L. S. Jalligampala et al.

Fig. 12.2 Sample input and graphical depiction of feature importance

Recall (sensitivity). Recall is the number of correctly classified positive records divided by the total number of actual positive records:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{12.11}$$

F1 score. The F1 score is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12.12}$$
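The metrics in Eqs. (12.9)–(12.12) depend only on the four confusion-matrix counts. The following Python sketch (not the authors' code, which the chapter does not give) tallies TP, FP, FN, and TN for a binary problem, assuming the positive class C1 is encoded as `1`, and returns the metrics as fractions; multiply by 100 to obtain percentages as in Table 12.3.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # actual C1, predicted C1
    fp: int  # actual C2, predicted C1
    fn: int  # actual C1, predicted C2
    tn: int  # actual C2, predicted C2


def confusion_counts(actual, predicted, positive=1):
    """Tally the four confusion-matrix cells; `positive` is the label of class C1."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def metrics(c):
    """Accuracy, precision, recall and F1 as fractions in [0, 1] (Eqs. 12.9-12.12)."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0], [1, 1, 1, 0])
    print(counts, metrics(counts))
```

For example, `confusion_counts([1, 1, 0, 0], [1, 1, 1, 0])` gives TP = 2, FP = 1, FN = 0, TN = 1, hence accuracy 0.75, precision 2/3, recall 1.0, and F1 = 0.8.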

Error rate. The error rate is the fraction of misclassified records:

$$\text{Error rate} = 1 - \text{Accuracy} \tag{12.13}$$

12.4.3 Results

12.4.3.1 Sample Input and Feature Importance (Pre-processing)

See Fig. 12.2.

12.4.3.2 Classifiers' Accuracy

See Table 12.3.

12.4.3.3 Ensemble Methods' Accuracy

See Table 12.4.


Table 12.3 Accuracy for all classifiers (values in %)

S. No.   Classifier            Accuracy   Precision   Recall   F1-score
1        Decision tree         75.9       69.3        82.9     75
2        Logistic regression   79.1       72          79.1     87.8
3        Naïve Bayes           78.02      70.5        79.1     87.8
4        SVM                   78.02      71.42       87.8     78.2

Table 12.4 Accuracy of ensemble methods with graphical representation

S. No.   Ensemble method     Accuracy (%)
1        Bagging             80.21
2        Ada boosting        79.12
3        Gradient boosting   78.02

[Bar chart: accuracy (%) of bagging, AdaBoosting, and gradient boosting]
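Bagging gave the highest accuracy in Table 12.4. It trains each base classifier on a bootstrap resample (drawn with replacement) of the training set and combines them by majority vote. The chapter does not state its base learner or implementation, so the pure-Python sketch below is only an illustration: it assumes binary 0/1 labels and uses one-level decision stumps, a choice made here for brevity, as the base learner.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Return the (feature, threshold, polarity) split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                errors = sum(
                    predict_stump((j, thr, polarity), row) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, polarity)
    return best[1:]


def predict_stump(stump, row):
    """Predict 1 if the row lies on the stump's positive side of the threshold, else 0."""
    j, thr, polarity = stump
    return 1 if polarity * (row[j] - thr) >= 0 else 0


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Fit one stump per bootstrap sample of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over all stumps (an odd n_estimators avoids ties)."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1], [2], [3], [4], [6], [7], [8], [9]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model = fit_bagging(X, y)
    print([predict_bagging(model, [v]) for v in (0, 5, 10)])
```

Because each stump sees a different resample, individual stumps disagree near the class boundary, and the vote smooths out that variance; this variance reduction is the usual motivation for bagging.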

12.5 Conclusion

Accuracy is a very important metric for measuring the performance of machine learning algorithms. To boost the performance of classifiers, a new methodology was proposed in this paper. First, a pre-processing technique, feature importance, is used to select only the important attributes from the dataset. Next, a variety of classifiers are applied to the pre-processed dataset, yielding different accuracies. To further improve the accuracy of these classifiers, three ensemble methods are used: bagging, AdaBoosting, and gradient boosting. In our observations, the bagging method yields the best accuracy, 80.21%, higher than that of the other classifiers. In future, prediction accuracy could be improved by combining different classifiers, applying feature engineering, or tuning hyperparameters on the dataset, and the approach could be extended to image-based analysis.


Chapter 13

Rapid Detection of Fragile X Syndrome: A Gateway Towards Modern Algorithmic Approach

Soumya Biswas, Oindrila Das, Divyajyoti Panda, and Satya Ranjan Dash

Abstract Human health care is one of the most important concerns for society; it aims to provide patients with accurate, reliable, and timely disease detection and appropriate care. With an estimated birth prevalence of 0.12 per 1000 females and 0.25 per 1000 males, Fragile X Syndrome is the second most frequent cause of serious intellectual disability. It is a trinucleotide repeat disorder: the (CGG)n repeat located within the 5′ untranslated region of the Fragile X Mental Retardation 1 (FMR1) gene expands to more than 200 repeats (full mutation) and becomes hypermethylated. These events are linked to the loss of FMR1 protein (FMRP) that occurs in the fragile X condition. Because wet lab techniques are not precise enough and are time-consuming, dry lab methods from statistics, bioinformatics, and computer science are becoming a fundamental gateway to disease diagnosis and to new models of modern treatment. Here, we propose 'FXSDetect', a gene alignment and nucleotide base matching algorithm that determines Fragile X syndrome status in a very short time, paving the way towards rapid diagnosis.

S. Biswas · O. Das
School of Biotechnology, KIIT University, Bhubaneswar, Odisha, India
D. Panda
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Odisha, India
S. R. Dash (B)
School of Computer Applications, KIIT University, Bhubaneswar, Odisha, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_13


S. Biswas et al.

13.1 Introduction

Fragile X syndrome (FXS) is the most common known genetic cause of intellectual disability after Down syndrome, affecting about 1 in every 4000 people [1]. This X-linked disease is caused by the expansion of a CGG repeat in the 5′ untranslated region (UTR) of the FMR1 gene, which arises from the meiotic instability of certain alleles of this repetitive sequence. FXS can affect both males and females and has a wide range of clinical manifestations [2], from learning difficulties with a normal IQ to severe intellectual disability and autistic behaviours. The FMR1 gene is involved in three distinct disorders: fragile X-associated tremor/ataxia syndrome (FXTAS), fragile X syndrome (FXS), and primary ovarian insufficiency (POI) [3]. Expansion to more than 200 CGG repeats in FMR1 causes fragile X syndrome by making both the FMR1 mRNA and its translated protein (FMRP) deficient. At the synapse, FMRP is thought to act as a regulator of mRNA translation and transport. FXTAS and POI are linked to elevated levels of FMR1 mRNA in individuals carrying an expanded repeat of between 50 and 200 CGGs, suggesting that FXTAS may reflect a deleterious RNA gain-of-function effect (Fig. 13.1).

Fig. 13.1 Pictorial representation of Fragile X Syndrome. The first category shows the normal condition, with 6–50 CGG repeats; the second shows the premutation condition, with 55–200 CGG repeats; and the third shows the full mutation condition, with more than 200 CGG repeats. Adapted and modified from [4]

13 Rapid Detection of Fragile X Syndrome: A Gateway Towards …


Fig. 13.2 Phylogenetic tree analysis of the different isoforms of the FMR1 gene. Of the 12 isoforms, transcript variants for isoforms 1, 4, 6, 7, 9, 10, and 12 are available from NCBI; of these seven, isoforms 4 and 10 are non-coding variants. A bootstrap value of 100 indicates that the node is strongly supported: it appeared in all bootstrap replicates

Restriction analysis and exon-exon PCR showed that the FMR1 gene has 17 exons spanning 38 kb on chromosome Xq27.3. Overall, the splice donor and acceptor sites at the 5′ end agree more closely with the consensus than those at the 3′ end, which may explain the alternative splicing observed in the FMR1 gene [5]. Prenatal testing of the FMR1 gene in pregnant women is beneficial because it can help prevent the birth of a child with Fragile X Syndrome [6]. Carrier status was determined from the number of CGG repeats in the FMR1 gene, which is located on the long arm of the X chromosome. Until 2000, the size cut-off for offering invasive prenatal diagnosis was 52 repeats; this was raised to 55 following a decision of the Israeli Society of Medical Genetics, and a premutation (PM) was defined as 55–199 repeats. As a result, women in the study with 52–54 repeats were no longer considered carriers [7]. In addition, at least four or five isoforms have been detected in various human and mouse tissues, although some of these may reflect post-transcriptional changes. The structure of isoform 1 of the FMR1 gene, which includes all exons, was obtained by RT–PCR on RNA from human lymphoblasts [8]. Figure 13.2 shows the phylogenetic tree of the different isoforms of the FMR1 gene, constructed by the neighbour-joining method with 1000 bootstrap replicates in MEGAX v11.0.

Clinical data frequently comprise a large set of heterogeneous variables gathered from various sources, such as allergies, disease history, medication, demographics, biomarkers, medical images, or genetic markers, each of which offers a different partial view of a patient's condition. When physicians and practitioners analyse such data, they face two problems: the curse of dimensionality, and the heterogeneity and differing statistical characteristics of the feature sources [9].


These variables cause delays and inaccuracies in disease detection, preventing patients from receiving adequate therapy. Consequently, there is a clear need for an effective and robust approach that enables early disease detection and can be used by physicians as decision support [10]. Researchers in computational, clinical, and statistical disciplines are therefore testing novel methods for disease prediction and diagnosis, because traditional paradigms fail to handle such data. This need is closely tied to advances in fields such as Artificial Intelligence (AI), Data Mining (DM), and Big Data (BD) [11]. Previous work detected tandem repeats with an MGWT-based algorithm that characterizes repeat patterns by period and location; this approach detects both perfect and imperfect tandem repeats [12]. Another study used the RepeatHMM program to estimate microsatellite repeat counts from long-read sequencing data; tested on both real and simulated data, it estimated repeat counts successfully [13]. The Knuth–Morris–Pratt algorithm has also been applied, using the 'pbdMPI' R package for high-performance computing, to identify repeat patterns in nucleotide sequences [14]. In this work, we present an algorithm, 'FXSDetect', that combines sequence alignment and nucleotide base matching. Furthermore, we discuss the benefits and drawbacks of each procedure described, to help establish in future which procedure is most suitable for each real-world situation [15].
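Reference [14] applies the Knuth–Morris–Pratt (KMP) algorithm to find repeat patterns in nucleotide sequences. As a rough illustration only (a single-threaded Python sketch, not the R/pbdMPI implementation of [14]), KMP can report every position of a motif such as CGG in time linear in the sequence length by precomputing, for each prefix of the pattern, the longest proper prefix that is also a suffix:

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    pi = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # fall back without re-reading text characters
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]  # continue so that overlapping matches are found
    return hits


if __name__ == "__main__":
    print(kmp_find_all("ATCGGCGGCGGTA", "CGG"))
```

In this example the three CGG triplets start at indices 2, 5, and 8. For a three-letter motif the gain over a naive scan is small; KMP pays off for longer patterns with repeated internal structure.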

13.2 Our Approach

To find the CGG repeats in the promoter region, we extracted the nucleotide sequences of chromosome X and the FMR1 gene from the NCBI [16] gene library. In total, seven mRNA isoforms were present: five protein-coding and two non-coding. The exon regions of the FMR1 gene and the promoter sequence (−1500 bp upstream) were retrieved from the Ensembl genome browser [17]. The UCSC BLAT tool was used to validate the promoter sequence by aligning it with the X chromosome. BLAT on DNA is designed to quickly find alignments of 95% or greater similarity over 25 bases or more; BLAT is not BLAST. DNA BLAT works by keeping an index of the whole genome in memory. For the algorithmic approach, we found Biopython to be efficient. The Biopython project is a freely available, non-commercial collection of Python tools for bioinformatics and computational biology. First, the StringIO module was imported, which lets in-memory text be read, written, and parsed as a file object in Python. The BLASTN (v2.11.0) tool was used to align the FMR1 gene against the X chromosome sequence. Because the ATG start codon lies downstream of the transcription start site (TSS) and the promoter region, code was written to locate the promoter region upstream of the start codon and count the CGG repeats present in it. The output was then converted to a human-readable format based on the following logic: if the number of CGG repeats is n, then n = 55–200 is reported as the premutation phase, and n > 200 is reported as full mutation, i.e. the person is affected with Fragile X Syndrome. The flow diagram is given in Fig. 13.3.

Fig. 13.3 Methodology through the flow diagram


Pseudocode of "FXSDetect":

Algorithm FXSDetect(ChromXFile)
// Input: ChromXFile - sequence of the X chromosome
// Output: 2 - Positive, 1 - At risk, 0 - Negative, -1 - Gene not found
// Description: Detects whether a patient has fragile X syndrome.
{
    FMRGene := FMR1 standard gene;
    BufferLen := 1500;
    BlastNResList := BlastN(FMRGene, ChromXFile);
    // BlastN runs BLASTN with the first parameter as query and the second as
    // database, and returns a list of BLASTN results.
    if (BlastNResList is empty), then return -1;
    GeneCandRes := BlastNResList[0];
    if (GeneCandRes.evalue > 10^(-50)), then return -1;
    [sStart, sEnd] := GeneCandRes.SubjectLocation;
    [gStart, gEnd] := GeneCandRes.QueryLocation;
    startIdx := max(0, sStart - gStart);
    endIdx := min(length(ChromXFile), sEnd + FMRGene.GeneLength - gEnd);
    for i from startIdx to (endIdx - 3), do: {
        if (ChromXFile[i : i + 3] = "ATG"), do: {
            CGGcount := 0;
            for idx from max(0, i - BufferLen) to (i - 3), do: {
                if (ChromXFile[idx : idx + 3] = "CGG"), then CGGcount := CGGcount + 1;
            }
            if (CGGcount > 200), then return 2;
            else if (CGGcount >= 55), then return 1;
            else return 0;
        }
    }
    return -1;
}
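The pseudocode above can be sketched in Python as follows. To keep the example self-contained it does not run BLASTN: the caller passes in the hits, represented by a hypothetical `BlastHit` record with 0-based, end-exclusive coordinates (in the authors' pipeline these come from NCBI BLASTN output parsed with Biopython). As in the pseudocode, the first ATG found in the aligned gene region is taken as the start codon, and every CGG triplet in the 1500-bp window upstream of it is counted, whether or not the triplets are contiguous; the thresholds follow the 55–200 / >200 rule stated in the text.

```python
from dataclasses import dataclass

POSITIVE, AT_RISK, NEGATIVE, NOT_FOUND = 2, 1, 0, -1


@dataclass
class BlastHit:
    """Hypothetical stand-in for one BLASTN hit (0-based, end-exclusive coordinates)."""
    evalue: float
    q_start: int  # start of the aligned part within the FMR1 query
    q_end: int
    s_start: int  # start of the aligned part within the X-chromosome subject
    s_end: int


def count_cgg(seq, start, stop):
    """Count CGG triplets beginning at any index in [start, stop)."""
    return sum(1 for j in range(start, stop) if seq[j:j + 3] == "CGG")


def fxs_detect(chrom_x, gene_length, hits, buffer_len=1500, evalue_cutoff=1e-50):
    """Return 2 (positive), 1 (at risk), 0 (negative) or -1 (gene not found)."""
    if not hits or hits[0].evalue > evalue_cutoff:
        return NOT_FOUND
    hit = hits[0]  # the first hit is assumed to be the best one
    # Project the full gene onto the chromosome, clamped to the sequence bounds.
    start_idx = max(0, hit.s_start - hit.q_start)
    end_idx = min(len(chrom_x), hit.s_end + gene_length - hit.q_end)
    for i in range(start_idx, end_idx - 2):
        if chrom_x[i:i + 3] == "ATG":
            # Triplets that start in the upstream window and end at or before the ATG.
            n = count_cgg(chrom_x, max(0, i - buffer_len), i - 2)
            if n > 200:
                return POSITIVE  # full mutation
            if n >= 55:
                return AT_RISK  # premutation
            return NEGATIVE
    return NOT_FOUND


if __name__ == "__main__":
    chrom = "T" * 10 + "CGG" * 60 + "ATG" + "T" * 20
    hit = BlastHit(evalue=0.0, q_start=0, q_end=23, s_start=190, s_end=213)
    print(fxs_detect(chrom, gene_length=23, hits=[hit]))
```

Note that a 1500-bp window can hold at most 500 CGG triplets, so the window length bounds the largest repeat count the procedure can report.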

Processor: Intel(R) Core(TM) i5-4200; RAM: 8 GB; Language: Python 3.8.8; Packages: Biopython, StringIO, NCBI XML, NCBI BLASTN command line.

13.3 Results

Given the chromosome X sequence and the FMR1 gene sequence as input, the algorithm ran correctly and produced the expected output (Fig. 13.4). We tested it with promoter lengths of 250, 500, 1000, 1500, 2000, and 2500 bp, for which it reported 34, 43, 47, 49, 50, and 51 CGG repeats, respectively, for a normal X chromosome. The average running time over all promoter lengths was 36.572 s. To validate the accuracy of the algorithm, the gene location it reported on the X chromosome was compared with the UCSC genome browser, which places the FMR1 gene at


Fig. 13.4 Output of the proposed algorithm for finding CGG repeats in the detection of Fragile X syndrome. Finding the CGG repeats took 35.036 s for a promoter length of 1500 bp upstream of the ATG codon

Fig. 13.5 Location of the FMR1 gene in the human X chromosome. The band shown here is Xq27.3, spanning 147,911,919–147,951,125 bp

147,911,919–147,951,125 bp on the X chromosome, a span of 39,207 bp (Fig. 13.5); this matches the start and end locations of the output alignment (Fig. 13.4). Because the X chromosome of a premutation carrier or an affected individual carries 55–200 or more than 200 CGG repeats, respectively, the algorithm accepts a flexible upstream promoter range (500–2500 bp). Once Sanger or NGS sequencing data [18] of an affected X chromosome are available, the algorithm can be used for rapid detection of mutations in the promoter region, so Fragile X syndrome could be detected quickly and at very low cost.
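The 39,207 bp span quoted above is consistent with treating the UCSC start and end coordinates as inclusive, so that both end positions belong to the gene:

$$
147{,}951{,}125 - 147{,}911{,}919 + 1 = 39{,}206 + 1 = 39{,}207\ \text{bp}
$$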

13.4 Conclusion and Future Prospect

FMR1 is a gene that codes for the protein FMRP, which is found mostly in the brain and is essential for intellectual and psychological development and for normal female reproductive function. Mutations in this gene can lead to fragile X syndrome, intellectual disability, premature ovarian failure, and autism. In unaffected individuals, the CGG nucleotide sequence in this gene is repeated roughly 5–44 times. A repeat count of about 55–200 is referred to as a premutation and more than 200 repeats as a full mutation of the FMR1 gene, leading to disruption of the three-dimensional structure of the FMRP protein and hindering its production, which results in fragile X syndrome. Wet lab methods such as polymerase chain reaction (PCR) and Southern blotting have been used extensively for prenatal detection and diagnosis of this severe disease from


samples collected from chorionic villi and amniotic fluid of pregnant women (HAFC) and from umbilical cord blood [19]. Although these methods give largely accurate results, they are time-consuming and quite expensive for an ordinary patient. Dry lab techniques, on the other hand, use sequence alignment and nucleotide base matching tools to identify the CGG repeats in the FMR1 gene rapidly (about 36 s on average in our tests) and also locate the exact position of the FMR1 gene on the X chromosome. At present, we were unable to obtain the nucleotide sequence of an affected X chromosome, so the algorithm could not yet be properly validated. Because it works correctly on normal sequences of the same length, we expect it to perform adequately on affected sequences as well. In conclusion, we have developed a technique that is rapid, reliable, and cost-effective, and that avoids environmental factors such as temperature, which can cause faulty results in wet lab techniques; it could open a gateway to personalized treatment in the near future. It should be kept in mind, however, that training people to generate such databases takes time, and storing these databases requires large storage space. Detecting the disease at very early stages would be valuable: amniotic fluid cells represent a heterogeneous population that includes cell types derived from the foetal membranes and from the foetus itself [20]. X chromosome sequences can therefore be obtained from amniotic fluid, chorionic villus samples, or lymphocytes, and FXS can then be detected by applying the algorithm. Forward and reverse primers could also be designed to retrieve only the FMR1 gene and promoter region sequences through Sanger sequencing, after which the CGG repeats can easily be counted with our algorithm.

A key benefit of ML is that physicians do not have to decide in advance which potentially predictive variables to consider and in which combinations [21]; this could be a revolutionary approach to disease diagnosis and human health care.

Declaration
We have taken permission from competent authorities to use the images/data as given in the paper. In case of any dispute in the future, we shall be wholly responsible.

References
1. Berkenstadt, M., et al.: Preconceptional and prenatal screening for fragile X syndrome: experience with 40 000 tests. Prenat. Diagn. 27(11), 991–994 (2007)
2. Tassone, F., Hagerman, P.J.: Expression of the FMR1 gene. Cytogenet. Genome Res. 100(1–4), 124–128 (2003)
3. Fink, D.A., et al.: Fragile X associated primary ovarian insufficiency (FXPOI): case report and literature review. Front. Genet. 9, 529 (2018)
4. Hagerman, R.J., Hagerman, P.J.: The fragile X premutation: into the phenotypic fold. Curr. Opin. Genet. Dev. 12(3), 278–283 (2002)
5. Eichler, E.E., et al.: Fine structure of the human FMR1 gene. Hum. Mol. Genet. 2(8), 1147–1153 (1993)
6. Ryynänen, M., et al.: Feasibility and acceptance of screening for fragile X mutations in low-risk pregnancies. Eur. J. Hum. Genet. 7(2), 212–216 (1999)
7. Pesso, R., et al.: Screening for fragile X syndrome in women of reproductive age. Prenat. Diagn. 20(8), 611–614 (2000)
8. Sittler, A., et al.: Alternative splicing of exon 14 determines nuclear or cytoplasmic localisation of fmr1 protein isoforms. Hum. Mol. Genet. 5(1), 95–102 (1996)
9. Pölsterl, S., et al.: Survival analysis for high-dimensional, heterogeneous medical data: exploring feature extraction as an alternative to feature selection. Artif. Intell. Med. 72, 1–11 (2016)
10. Dick, R.S., Steen, E.B., Detmer, D.E. (eds.): The Computer-Based Patient Record: An Essential Technology for Health Care. National Academies Press (1997)
11. Huang, M.-J., Chen, M.-Y., Lee, S.-C.: Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Expert Syst. Appl. 32(3), 856–867 (2007)
12. Garg, P., Sharma, S.: MGWT based algorithm for tandem repeats detection in DNA sequences. In: 2019 5th International Conference on Signal Processing, Computing and Control (ISPCC), pp. 196–199 (2019). https://doi.org/10.1109/ISPCC48220.2019.8988475
13. Liu, Q., et al.: Interrogating the "unsequenceable" genomic trinucleotide repeat disorders by long-read sequencing. Genome Med. 9(1), 1–16 (2017)
14. Riza, L.S., et al.: Genomic repeat detection using the Knuth-Morris-Pratt algorithm on R high-performance-computing package. Int. J. Advance Soft Compu. Appl. 11(1), 94–111 (2019)
15. Caballé, N.C., et al.: Machine learning applied to diagnosis of human diseases: a systematic review. 1–27 (2020)
16. National Center for Biotechnology Information (NCBI) [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988]–[cited 2017 Apr 06]. Available from: https://www.ncbi.nlm.nih.gov/
17. Hubbard, T., et al.: The Ensembl genome database project. Nucleic Acids Res. 30(1), 38–41 (2002). https://doi.org/10.1093/nar/30.1.38
18. Annear, D.J., et al.: Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci. Rep. 11(1), 1–11 (2021)
19. Shapiro, L.R., et al.: Experience with multiple approaches to the prenatal diagnosis of the fragile X syndrome: amniotic fluid, chorionic villi, fetal blood and molecular methods. Am. J. Med. Genet. 30(1–2), 347–354 (1988)
20. Simoni, G., Colognato, R.: The amniotic fluid-derived cells: the biomedical challenge for the third millennium. J. Prenat. Med. 3(3), 34–36 (2009)
21. Kelly, C.J., et al.: Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17(1), 1–9 (2019)

Chapter 14

Summarizing Bengali Text: An Extractive Approach

Satya Ranjan Dash, Pubali Guha, Debasish Kumar Mallick, and Shantipriya Parida

Abstract Text summarization is a challenging task in Natural Language Processing, and building an automatic text summarization system is especially difficult for a low-resource language such as Bengali. This paper addresses extractive summarization of Bengali text. Text summarization removes the less useful information from a piece of text or a paragraph and condenses it into a shorter text, which helps readers find the required information more quickly and effectively. Many algorithms are used for summarizing text; in this paper, we use TF-IDF and BERT-SUM for Bengali extractive text summarization.

14.1 Introduction

People express their thoughts and moods through text, so understanding the meaning of a text is very important. Understanding a text can sometimes be hard, and it is also time-consuming. Machines offer a way to address this problem. Text summarization is a large field of research in natural language processing, and building automatic text summarizers is a central focus of this research. A text summarizer produces the gist of a large document in a short time. Automatic text summarizers have been built for many other languages, but far fewer for Bengali. Expanding the tools and technology available for the Bengali language is the main goal of this research.

S. R. Dash (B) · P. Guha · D. K. Mallick
School of Computer Applications, KIIT University, Bhubaneswar, India
S. Parida
Idiap Research Institute, Martigny, Switzerland
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_14

S. R. Dash et al.

In this work, we build an automatic text summarizer for Bengali. Although working with Bengali was very challenging, the result provides a base for automatic Bengali text summarization. Abstractive methods select words based on semantic understanding, including words that do not appear in the source document, and aim to present the important material in a new way: they interpret and examine the text using advanced natural language techniques to generate a new, shorter text that conveys the most critical information from the original. Extractive methods summarize articles by selecting a subset of the original sentences that retains the most important points. This approach weights the important sentences and uses them to form the summary; different algorithms and techniques are used to assign weights to the sentences and rank them by importance and by their similarity to each other. Natural Language Processing (NLP) is broadly defined as the automatic manipulation of text; it is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language.
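This chapter scores sentences for extraction with TF-IDF. Its exact weighting is not spelled out here, so the following Python sketch assumes one common variant: each sentence is treated as a "document", a sentence's score is the sum over its distinct words of (count in the sentence ÷ sentence length) × log(number of sentences ÷ number of sentences containing the word), and the top-k sentences are returned in their original order. Sentences are split after the Bengali danda (।) as well as after ".", "?", and "!", and words on whitespace; no stemming or stop-word removal is attempted.

```python
import math
import re
from collections import Counter

PUNCT = "।.,;:?!\"'()"


def split_sentences(text):
    """Split after a danda (।), '.', '?' or '!' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[।.?!])\s+", text.strip()) if s]


def tokenize(sentence):
    """Whitespace tokenization with surrounding punctuation stripped (no stemming)."""
    return [t for t in (w.strip(PUNCT).lower() for w in sentence.split()) if t]


def tfidf_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    docs = [tokenize(s) for s in sentences]
    n = len(sentences)
    df = Counter(w for doc in docs for w in set(doc))  # sentences containing each word
    scores = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1
        scores.append(sum((c / length) * math.log(n / df[w]) for w, c in tf.items()))
    # sorted() is stable even with reverse=True, so ties keep their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = "আমি ভাত খাই। আমি ভাত খাই। তুমি কি বই পড়ো?"
    print(tfidf_summary(sample, k=1))
```

Words that occur in every sentence get an IDF of log(1) = 0, so a repeated sentence scores lower than one with distinctive vocabulary; in the sample, the question sentence is selected.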

14.2 Literature Review

While surveying research papers, we noticed that there has been very little research on Bengali text summarization. We also found that Bangla character recognition is a difficult task for machines. We therefore chose this language and decided to carry out intensive research using text summarization algorithms such as TF-IDF and BERT-SUM [1]. We found a few relevant papers on extractive text summarization. In "An Approach to Summarizing Bengali News Documents", Kamal Sarkar developed a system that produces extractive summaries of Bengali news documents. The extraction is done in four steps: first, the documents are pre-processed; second, candidate summary sentences are extracted from the pre-processed documents; third, these sentences are ranked; and finally, the summary is generated. The approach is based entirely on TF-IDF weighting, and it provides an effective way to improve summarization performance [2]. A keyword serves as a key to a larger phrase, sentence, or document. In "Automatic Keyword Extraction for Text Summarization: A Survey", Santosh Kumar Bharti, Korra Sathya Babu, and Sanjay Kumar Jena extract meaningful data (keywords) from e-magazines, newspapers, journals, and similar sources, which contain many irrelevant columns and take a long time to read in full. They introduce an automatic keyword extractor that identifies the important, relevant topics in a source such as a newspaper or journal and produces a

14 Summarizing Bengali Text: An Extractive Approach


summary that is as short and to the point as possible [3]. Automatic keyword extraction is the process of selecting the words and phrases from a text document that best project its core sentiment, without human intervention, depending on the model. A comprehensive survey of extractive text summarization techniques discusses different datasets, such as DUC, TAC and MEDLINE, along with different evaluation metrics, such as precision, recall, and the ROUGE series [4].

In "A Heuristic Approach of Text Summarization for Bengali Documentation", Sheikh Abujar, Mahmudul Hasan, M.S.I. Shahin, and Syed Akhter Hossain performed basic extractive text summarization using a new model and a set of Bangla text analysis rules derived from heuristics. They applied sentence clustering to every Bangla sentence, and the words of the original text are further analyzed [5].

For Odia, OCR-based text summarization has been carried out. A modified py-tesseract model was used to extract text from documents more accurately. TF-IDF was then used to summarize the extracted text, and the results were evaluated by humans [6].

In the fine-tuning of the BERT-SUM model for extractive text summarization, a pre-trained BERT model is used to summarize the data. The sentences are tokenized and special tokens are added to segment them. Token embeddings, interval segment embeddings and position embeddings are then combined, and finally Transformer layers are used to summarize the data [7].

14.3 Methodology

BERT stands for Bidirectional Encoder Representations from Transformers. The "encoder representations" part refers to an encoder, i.e., a program or algorithm used to learn a representation from a set of data. In BERT's case, this data is vast, drawing on both Wikipedia (2,500 million words) and the BooksCorpus (800 million words). The large number of words used in the pre-training phase gives BERT an intricate model of how language works, which makes it a highly useful tool in NLP. The overall workflow is shown in Fig. 14.1, and the human evaluation results are given in Tables 14.1 and 14.2.

14.3.1 BERT-SUM Fine Tune for Summarization

In BERT-SUM, we insert a [CLS] token at the start of each sentence. We also use interval segment embeddings to distinguish between multiple sentences. For sentence $S_i$, we assign the segment embedding $E_A$ or $E_B$ depending on whether i is odd or even. For example, for the document [S1, S2, S3, S4, S5], we assign the embeddings [EA, EB, EA, EB, EA]. In this way, the document is represented hierarchically: lower Transformer layers represent adjacent sentences, while upper Transformer layers, through self-attention, represent combinations of multiple sentences.
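The input construction described above can be sketched as follows. This is an illustrative sketch, not the chapter's code. It takes already-tokenized sentences and returns the token sequence, the interval segment ids, and the positions of the [CLS] tokens whose top-layer vectors later represent each sentence. Segment id 0 stands for $E_A$ and 1 for $E_B$. Appending a [SEP] token after each sentence follows the original BERT-SUM paper [7]; that step is an assumption with respect to this chapter. The function name `build_bertsum_input` is hypothetical.

```python
def build_bertsum_input(sentences):
    """Build BERT-SUM style inputs from a list of tokenized sentences.

    Returns (tokens, segment_ids, cls_positions). Segment id 0 stands for E_A
    (odd-numbered sentences S1, S3, ...) and 1 for E_B (S2, S4, ...).
    """
    tokens, segment_ids, cls_positions = [], [], []
    for i, sentence in enumerate(sentences, start=1):
        seg = 0 if i % 2 == 1 else 1            # interval segment: EA for odd i, EB for even i
        cls_positions.append(len(tokens))       # the [CLS] vector will represent S_i
        piece = ["[CLS]"] + list(sentence) + ["[SEP]"]   # [SEP] assumed, as in [7]
        tokens.extend(piece)
        segment_ids.extend([seg] * len(piece))
    return tokens, segment_ids, cls_positions


tokens, segs, cls = build_bertsum_input([["a"], ["b", "c"], ["d"], ["e"], ["f"]])
print([segs[p] for p in cls])   # [0, 1, 0, 1, 0]  i.e. EA, EB, EA, EB, EA
```

In a full system, these tokens would be mapped to vocabulary ids, and the segment ids would select between the two learned segment embeddings before entering BERT.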


S. R. Dash et al.

Table 14.1 Human evaluation rating table of BERT-SUM
Evaluator    Topic name                           1 (%)  2 (%)  3 (%)  4 (%)  5
Evaluator 1  Amader-Jatiyoposhu-Bagh              100    80     55     60     Good (73%)
Evaluator 1  Amar-posa-kukur                      100    85     67     65     Good (79%)
Evaluator 1  Amar-priyo-boithakurmar-jhuli        100    83     77     68     Good (82%)
Evaluator 1  Banglar-utsob                        100    75     71     70     Good (79%)
Evaluator 1  Biswo-usnayonkaron-osomadhaner-upay  100    86     73     56     Good (79%)
Evaluator 1  Boi-manuserbondhu                    100    78     59     55     Good (73%)
Evaluator 1  Borsar-kolkata                       100    76     89     71     Good (84%)
Evaluator 1  Chhatro-jibonekhela-dhular-gurutto   100    75     87     66     Good (82%)
Evaluator 1  Ekti-bimandurghotona                 100    87     77     88     Good (88%)
Evaluator 2  Amader-Jatiyoposhu-Bagh              100    83     58     77     Good (79%)
Evaluator 2  Amar-posa-kukur                      100    88     55     81     Good (81%)
Evaluator 2  Amar-priyo-boithakurmar-jhuli        100    90     60     75     Good (81%)
Evaluator 2  Banglar-utsob                        100    85     68     65     Good (79%)
Evaluator 2  Biswo-usnayonkaron-osomadhaner-upay  100    86     73     56     Good (79%)
Evaluator 2  Boi-manuserbondhu                    100    80     75     85     Good (85%)
Evaluator 2  Borsar-kolkata                       100    69     70     75     Good (78%)
Evaluator 2  Chhatro-jibonekhela-dhular-gurutto   100    76     80     90     Good (86%)
Evaluator 2  Ekti-bimandurghotona                 100    75     80     60     Good (79%)

The original BERT model supports at most 512 position embeddings, so we added more position embeddings. These additional position embeddings are randomly initialized and fine-tuned together with the other parameters of the encoder.

Table 14.2 Human evaluation rating table of TF-IDF
Evaluator    Topic name                           1 (%)  2 (%)  3 (%)  4 (%)  5
Evaluator 1  Amader-Jatiyoposhu-Bagh              100    80     50     50     Good (70%)
Evaluator 1  Amar-posa-kukur                      100    88     55     60     Good (76%)
Evaluator 1  Amar-priyo-boithakurmar-jhuli        100    80     60     79     Good (80%)
Evaluator 1  Banglar-utsob                        100    85     75     65     Good (75%)
Evaluator 1  Biswo-usnayonkaron-osomadhaner-upay  100    80     65     64     Good (77%)
Evaluator 1  Boi-manuserbondhu                    100    90     60     88     Good (84%)
Evaluator 1  Borsar-kolkata                       100    90     55     59     Good (76%)
Evaluator 1  Chhatro-jibonekhela-dhular-gurutto   100    75     63     69     Good (77%)
Evaluator 1  Ekti-bimandurghotona                 100    80     55     77     Good (78%)
Evaluator 2  Amader-Jatiyoposhu-Bagh              100    80     55     77     Good (78%)
Evaluator 2  Amar-posa-kukur                      100    75     76     74     Good (81%)
Evaluator 2  Amar-priyo-boithakurmar-jhuli        100    88     58     59     Good (76%)
Evaluator 2  Banglar-utsob                        100    80     76     65     Good (80%)
Evaluator 2  Biswo-usnayonkaron-osomadhaner-upay  100    75     78     57     Good (77%)
Evaluator 2  Boi-manuserbondhu                    100    86     77     55     Good (79%)
Evaluator 2  Borsar-kolkata                       100    90     87     75     Good (88%)
Evaluator 2  Chhatro-jibonekhela-dhular-gurutto   100    75     69     56     Good (75%)
Evaluator 2  Ekti-bimandurghotona                 100    80     84     70     Good (83%)

14.3.2 BERT Extractive Summarization

Let $d$ denote a document containing the sentences $[S_1, S_2, S_3, \ldots, S_m]$, where $S_i$ is the i-th sentence of the document. Each sentence $S_i$ is assigned a label $y_i \in \{0, 1\}$ indicating whether it is important enough to be included in the summary. The vector $T_i$, i.e., the vector of the i-th [CLS] symbol from the top layer of BERT-SUM, is used as the representation of $S_i$. Several inter-sentence Transformer layers are then stacked on top of the BERT outputs.


Table 14.3 Source and the summarized text using BERT-SUM and TF-IDF

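$$\tilde{h}^{l} = \mathrm{LN}\left(h^{l-1} + \mathrm{MHAtt}\left(h^{l-1}\right)\right)$$
$$h^{l} = \mathrm{LN}\left(\tilde{h}^{l} + \mathrm{FFN}\left(\tilde{h}^{l}\right)\right)$$

The symbols are defined as follows:
- $h^{0} = \mathrm{PosEmb}(T)$;
- $T$ denotes the sentence vectors output by BERT-SUM;
- PosEmb adds sinusoidal positional embeddings to $T$, indicating the position of each sentence;
- LN is layer normalization, MHAtt is multi-head attention and FFN is a feed-forward network.

The final output layer is a sigmoid classifier:

$$\hat{y}_i = \sigma\left(W_o h_i^{L} + b_o\right)$$

where $h_i^{L}$ is the vector for $S_i$ from the top (L-th) layer of the Transformer. We tried L = 1, 2, 3, and L = 2 gave the best results.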
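To make the equations above concrete, here is a NumPy sketch of the inter-sentence layers and the sigmoid classifier. It runs a forward pass with random, untrained weights only to show the data flow from $T$ to $\hat{y}_i$. It deliberately simplifies several things:
- single-head attention stands in for MHAtt;
- layer normalization has no learned gain or bias;
- the FFN uses ReLU;
- all sizes (5 sentences, d = 16, d_ff = 32) and helper names are hypothetical;
- the random matrix `T` stands in for BERT's [CLS] outputs.

```python
import numpy as np


def pos_emb(T):
    """h^0 = PosEmb(T): add sinusoidal positional embeddings to sentence vectors T (m x d)."""
    m, d = T.shape
    pos = np.arange(m)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return T + np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def layer_norm(x, eps=1e-6):
    # LN without learned gain/bias (simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention(h, p):
    # Single-head scaled dot-product attention, a simplified stand-in for MHAtt.
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v


def ffn(h, p):
    return np.maximum(0.0, h @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]


def random_layer(d, d_ff, rng):
    """Randomly initialised (untrained) parameters for one inter-sentence layer."""
    return {
        "Wq": rng.normal(0, d ** -0.5, (d, d)),
        "Wk": rng.normal(0, d ** -0.5, (d, d)),
        "Wv": rng.normal(0, d ** -0.5, (d, d)),
        "W1": rng.normal(0, d ** -0.5, (d, d_ff)),
        "b1": np.zeros(d_ff),
        "W2": rng.normal(0, d_ff ** -0.5, (d_ff, d)),
        "b2": np.zeros(d),
    }


def sentence_scores(T, layers, W_o, b_o):
    h = pos_emb(T)
    for p in layers:
        h_tilde = layer_norm(h + attention(h, p))       # \tilde{h}^l
        h = layer_norm(h_tilde + ffn(h_tilde, p))       # h^l
    return 1.0 / (1.0 + np.exp(-(h @ W_o + b_o)))      # \hat{y}_i = sigma(W_o h_i^L + b_o)


rng = np.random.default_rng(0)
T = rng.normal(size=(5, 16))                             # stand-ins for 5 sentence vectors T_i
layers = [random_layer(16, 32, rng) for _ in range(2)]  # L = 2
y_hat = sentence_scores(T, layers, rng.normal(0, 0.1, 16), 0.0)
print(y_hat.shape)
```

In training, $\hat{y}_i$ would be compared with the labels $y_i$ using binary cross-entropy. At inference, the highest-scoring sentences would form the summary.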


Fig. 14.1 Text summarization workflow diagram

Table 14.4 Manual evaluation parameters for the generated summaries
Parameter     Description
Parameter 1   Is the summary related to the given topic?
Parameter 2   Is the name of the main character verified by looking at the summary?
Parameter 3   Does the bag of words present in the summary give a relatable meaning?
Parameter 4   Is the total number of lines in the summary understandable and meaningful?
Parameter 5   Overall quality of the output

14.4 Result

The technique was applied to nine Bengali topics. The result for one topic is shown in Table 14.3, and the summaries obtained were as desired. We summarized the Bengali text using two natural language processing techniques, BERT-SUM and Term Frequency-Inverse Document Frequency (TF-IDF), and evaluated the results manually. Of the two, BERT-SUM produced more refined and to-the-point summaries than TF-IDF, as shown in Table 14.3.

We defined parameters for human evaluation because there is no established system for evaluating Bengali summaries automatically. We therefore chose human evaluators who can read, write, and understand Bengali properly, and asked them to score the summaries against these parameters. The evaluation tables for BERT-SUM and TF-IDF are given in Tables 14.1 and 14.2, respectively. The manual evaluation parameters for the generated summaries are given in Table 14.4.
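A simple way to quantify the comparison is to average the "overall quality" (parameter 5) percentages over both evaluators and all nine topics. The snippet below does this arithmetic on the values transcribed from Tables 14.1 and 14.2. The resulting means are our own aggregation of the reported numbers, not figures stated in the chapter. Because a mean does not depend on row order, the result is the same however the rows are assigned to evaluators and topics.

```python
# Parameter-5 ("overall quality") percentages transcribed from Tables 14.1 and 14.2,
# both evaluators and all nine topics (18 ratings per method).
BERT_SUM = [73, 79, 82, 79, 79, 73, 84, 82, 88, 79, 81, 81, 79, 79, 85, 78, 86, 79]
TF_IDF = [70, 76, 80, 75, 77, 84, 76, 77, 78, 78, 81, 76, 80, 77, 79, 88, 75, 83]


def mean(values):
    return sum(values) / len(values)


print(f"BERT-SUM mean overall score: {mean(BERT_SUM):.2f}%")   # 80.33%
print(f"TF-IDF mean overall score:   {mean(TF_IDF):.2f}%")     # 78.33%
```

The roughly two-point gap is consistent with the qualitative observation that BERT-SUM summaries are more refined. With only 18 human ratings per method, however, this difference should not be read as statistically established.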


14.5 Conclusions

In this chapter, we summarized each of the given Bengali paragraphs into a shorter text. Bengali is known for its rich culture and literature and is the second most spoken language in India after Hindi-Urdu, but it lacks the computational resources needed for machines to apply many NLP technologies and algorithms. This research shows how the TF-IDF and BERT-SUM algorithms can be used for Bengali text summarization. In future work, we would like to consider abstractive techniques for summarizing Bengali text by building summarization datasets consisting of Bengali paragraphs or texts and their corresponding summaries.

References
1. Christian, H., Agus, M.P., Suhartono, D.: Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech Comput. Math. Eng. Appl. 7(4), 285–294 (2016)
2. Sarkar, K.: An approach to summarizing Bengali news documents. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics, pp. 857–862 (2012)
3. Bharti, S.K., Babu, K.S.: Automatic keyword extraction for text summarization: a survey. arXiv preprint arXiv:1704.03242 (2017)
4. Asa, A.S., Akter, S., Uddin, M.P., Hossain, M.D., Roy, S.K., Afjal, M.I.: A comprehensive survey on extractive text summarization techniques. Am. J. Eng. Res. 6(1), 226–239 (2017)
5. Abujar, S., Hasan, M., Shahin, M., Hossain, S.A.: A heuristic approach of text summarization for Bengali documentation. In: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–8. IEEE (2017)
6. Pattnaik, P., Mallick, D.K., Parida, S., Dash, S.R.: Extractive Odia text summarization system: an OCR based approach. In: International Conference on Biologically Inspired Techniques in Many-Criteria Decision Making, pp. 136–143. Springer (2019)
7. Liu, Y.: Fine-tune BERT for extractive summarization. arXiv preprint arXiv:1903.10318 (2019)

Chapter 15

Dynamic Hand Gesture Recognition of the Days of a Week in Indian Sign Language Using Low-Cost Depth Device
Soumi Paul, Madhuram Jajoo, Abhijeet Raj, Ayatullah Faruk Mollah, Mita Nasipuri, and Subhadip Basu

Abstract We develop a dynamic hand gesture recognition system for the gestures of the seven days of the week in Indian Sign Language. We use the Kinect V2 sensor, a low-cost RGB-D camera, to collect videos of the gestures from two subjects. We then perform key frame extraction, cropping of static regions, background subtraction, feature extraction and classification. On the resulting dataset, ten-fold cross-validation with a random forest classifier gives an accuracy of around 74.29%.

15.1 Introduction

Human body gestures can be viewed as an information source that gives insight into different psychological traits of the people involved, such as intentions, emotions, interests, thoughts, feelings and ideas. In simple terms, gestures are actions or motions of human body parts performed with the goal of communication. Such actions are intrinsic to humans, irrespective of the cultural background, gender and age of the individuals. In fact, small children perform certain gestures long before they start to speak, and they use more gestures as they grow up. For human–computer and human–robot interaction, automatic recognition of digitized gestures is an important research area within computer vision.

S. Paul · M. Jajoo · A. Raj · M. Nasipuri · S. Basu (B)
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
A. F. Mollah
Department of Computer Science and Engineering, Aliah University, Kolkata 700160, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_15


S. Paul et al.

Depending on their format, gestures can be classified into two types: static (consisting of still images of fixed gestures) and dynamic (consisting of video frames corresponding to the real-time movement of body parts). Alternatively, based on the body part involved in making them, gestures can be classified into categories such as hand, arm, head, face and body gestures.

15.1.1 Related Work

In this section, we discuss a few relevant works on dynamic and real-time hand gesture recognition.
- Dardas and Georganas [1] worked on detecting and tracking a bare hand against a cluttered background. They used face subtraction, skin detection and contour-based feature extraction with a multi-class SVM. However, their system is limited to a few video game gestures.
- Wang and Yang [2] developed a multi-class hand posture recognition system using an ensemble of real-time deformable detectors. However, their method works only for recognizing alphabets.
- Ghotkar and Kharate [3] developed a new algorithm for continuous word recognition and sentence interpretation in Indian Sign Language, using rule-based and DTW-based methods.
- Van den Bergh and Van Gool [4] used two cameras, one RGB and one time-of-flight (ToF), together with an adaptive detection method for skin colour and depth.
- Liang et al. [5] applied 3D-CNNs to extract spatial and temporal features from video streams.
- Raghuveera et al. [6] used histograms of oriented gradients and local binary patterns to build three SVM classifiers, and combined them to raise the average recognition accuracy to 71.85%.

Further related works include neural network-based recognition of hand shapes [7], palm area removal [8] and an SVM-based method on depth data [9].

15.1.2 Our Contributions

To the best of our knowledge, there is no existing work on identifying the gestures for the days of the week in Indian Sign Language (ISL). We explore a new gesture recognition approach with the following steps:
1. capture depth videos to create our own dataset;
2. generate key frames;
3. crop off static regions;
4. subtract the background;
5. calculate statistical features and combine the features of all frames;
6. run a random forest classifier.

15 Dynamic Hand Gesture Recognition of the Days of a Week in Indian …


15.2 Data Collection

We use the Kinect V2 sensor, a low-cost motion-sensing device produced by Microsoft. It has a built-in RGB camera with a resolution of 1920 × 1080 pixels and a depth camera with a resolution of 512 × 424 pixels, both operating at 30 frames per second. The Kinect Software Development Kit (SDK) provides an API for reading data from the sensor. The device supports skeletal tracking of six people at a time. Each skeleton consists of 25 points denoting the arm and leg joints as well as the positions of the head, hands and spine.

For real-time data collection, we built an application that reads the live stream from the Kinect V2 camera and records RGB-D videos. We collected depth videos of the Indian Sign Language gestures for the seven days of the week, from Monday to Sunday. The data were collected from two persons, with each gesture recorded in five videos per person. Altogether, 7 × 2 × 5 = 70 videos were collected for our dynamic dataset, which we refer to as JU_V2_DYN_DAYS [10]. Sample depth images of the seven days are shown in Fig. 15.1.

15.3 Pre-processing

The collected data is pre-processed for gesture recognition in the following steps:
1. All the frames are extracted from the videos.
2. The key frames are extracted with our proposed algorithm. Key frames are chosen carefully, because they capture the main movement of the gesture.
3. The static parts and the background are removed, leaving only the gesture.
4. The features are extracted and classified for gesture recognition.

15.3.1 Key Frames Extraction

Key frame extraction is based on the difference between frame matrices: a frame is selected when its difference from the previously selected key frame is greater than a threshold. We fix a threshold value so that a frame whose difference from the last key frame is below it is ignored. In this way, we avoid extracting repeated frames, since in most cases the difference between two consecutive frames is far below the threshold. The algorithm for key frame extraction from a video is shown in Algorithm 15.1.


Fig. 15.1 Sample depth images of Monday to Sunday (from top to bottom)


Algorithm 15.1: Algorithm for key frame extraction
Input: Video path
Output: A set of key frames
1  Start capturing video and extract all the frames;
2  Fix a threshold th = (width × height)/1.5;
3  Save the first frame as a key frame and call it s;
4  After s, scan the next frames f_i one by one;
5  Find the absolute difference d_i of f_i with s;
6  if d_i > th then
7      Save f_i as the next key frame;
8      Set s = f_i and go to Step 4;
9  else
10     Ignore f_i and continue scanning;
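Algorithm 15.1 can be sketched in Python as follows. It is a sketch under stated assumptions:
- the frames have already been decoded into equally sized 2-D NumPy arrays (video decoding, e.g. with OpenCV, is left out);
- $d_i$ is taken to be the sum of absolute pixel differences between $f_i$ and the current key frame $s$;
- the function name `extract_key_frames` is hypothetical.

```python
import numpy as np


def extract_key_frames(frames):
    """Sketch of Algorithm 15.1 for a list of equally sized 2-D frames."""
    if not frames:
        return []
    height, width = frames[0].shape
    th = (width * height) / 1.5                      # Step 2
    s = frames[0]                                    # Step 3: first frame is a key frame
    key_frames = [s]
    for f in frames[1:]:                             # Step 4: scan the following frames
        # Step 5 (assumption): d_i = sum of absolute pixel differences from s.
        # Cast to a signed type so uint8 subtraction cannot wrap around.
        d = np.abs(f.astype(np.int64) - s.astype(np.int64)).sum()
        if d > th:                                   # Steps 6-8: keep f_i and make it the new s
            key_frames.append(f)
            s = f
        # Step 10: otherwise f_i is ignored and scanning continues
    return key_frames
```

Comparing against the last key frame rather than the immediately preceding frame means that slow, gradual motion still accumulates until it crosses the threshold.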

Another improvement is to crop the parts of the extracted frames that appear static throughout the video. For each extracted frame, we check whether each of the following regions is static, and crop the frame accordingly, following Algorithm 15.2:
- the top 10 rows;
- the bottom 10 rows;
- the leftmost 15 columns;
- the rightmost 15 columns.

The effect of cropping is shown in Fig. 15.2. In this example, the bottom-right region of the extracted frame contains a blackish static part, which is the other, non-gesturing hand. If not removed, such parts can interfere with hand segmentation, because this hand does not signify any gesture.

Algorithm 15.2: Algorithm for cropping the static parts
Input: Set of all key frames for a specific gesture
Output: A set of cropped frames
1  Call the top and the bottom 10 rows and the left and the right 15 columns of each frame the cropping regions;
2  Set a threshold for each cropping region as 10% of its area;
3  Find the absolute differences of the cropping regions between two consecutive frames, say f1 and f2;
4  if the difference in a cropping region is more than its threshold then
5      mark that region in f1 as dynamic;
6  else
7      mark that region in f1 as static;
8  Crop all regions in f1 that are marked as static and replace f1 by the new cropped frame f1new;
9  Rename the frame f2 as f1 and continue from Step 3 with the next two consecutive frames;
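A possible NumPy reading of Algorithm 15.2 is sketched below. The algorithm does not say how the "difference" of a region is measured against a threshold of 10% of its area. Here it is assumed to be the number of pixels that changed between f1 and f2, which makes the two quantities comparable. Algorithm 15.2 also never crops the last frame, since it has no successor, so this sketch returns it unchanged. The helper names are hypothetical.

```python
import numpy as np

ROWS, COLS = 10, 15   # cropping bounds from the text (rows, columns)


def region_slices(shape):
    h, w = shape
    return {
        "top": (slice(0, ROWS), slice(None)),
        "bottom": (slice(h - ROWS, h), slice(None)),
        "left": (slice(None), slice(0, COLS)),
        "right": (slice(None), slice(w - COLS, w)),
    }


def crop_static_regions(frames):
    """Sketch of Algorithm 15.2 on a list of equally sized 2-D key frames."""
    cropped = []
    for f1, f2 in zip(frames, frames[1:]):
        static = {}
        for name, sl in region_slices(f1.shape).items():
            region1, region2 = f1[sl], f2[sl]
            changed = np.count_nonzero(region1 != region2)   # assumption: #changed pixels
            static[name] = changed <= 0.10 * region1.size    # threshold: 10% of region area
        r0 = ROWS if static["top"] else 0
        r1 = f1.shape[0] - ROWS if static["bottom"] else f1.shape[0]
        c0 = COLS if static["left"] else 0
        c1 = f1.shape[1] - COLS if static["right"] else f1.shape[1]
        cropped.append(f1[r0:r1, c0:c1])                     # f1new
    if frames:
        cropped.append(frames[-1])   # last frame has no successor; kept uncropped here
    return cropped
```

Because each region is judged separately, a gesture that reaches only one border keeps that border while the remaining static borders are trimmed.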


Fig. 15.2 Left: One of the extracted frames from a gesture video; Right: the frame received after cropping. The cropping bound is 10 and 15 in row and column units, respectively

15.3.2 Background Subtraction

Hand segmentation is performed on the extracted key frames. The algorithm exploits the fact that the gesturing hand is nearer to the camera (Kinect) than the rest of the body and the environment. The algorithm for background subtraction is shown in Algorithm 15.3. The resulting image still contains noise, and its scattered holes need to be filled, so we apply dilation to these images, as shown in Fig. 15.3.

Algorithm 15.3: Algorithm for background subtraction
Input: A key frame of depth video with background
Output: Image with background removed (only hand parts)
1  Set a = 50, b = 80, area A = a × b;
2  Set thresholds th1 = 25% of A and th2 = 50% of A;
3  Set maxdepth = maximum depth from the camera;
4  Set mindepth = minimum depth from the camera;
5  Set xdepth := 30% of (maxdepth − mindepth);
6  Create a depth matrix from the intensity matrix as follows:
7      depth[i][j] = (maxdepth − mindepth) × intensity[i][j]/255 + mindepth;
8  Take a binary matrix f initialized to all zeros;
9  if depth[i][j] ≤ mindepth + xdepth then
10     Set f[i][j] = 1;
11 for every a × b submatrix of f ending at position (i, j) do
12     Count the number of ones in it as N;
13     if th1 < N < th2 then
14         Mark this submatrix as a candidate;
15 By appending the first 200 candidates together, make a rectangular region as the hand segment;
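Below is a NumPy sketch of Algorithm 15.3. It counts the ones in every a × b window with an integral image instead of explicit loops. Windows are enumerated by their top-left corner in row-major order, which visits the same windows in the same order as enumerating them by the position where they end.

Two details are interpretations, not statements from the chapter:
- "appending the first 200 candidates together" is read as taking the bounding rectangle of those windows;
- the window is taken as a rows by b columns.

The depth range used in the usage example (500 to 4500) is a hypothetical value.

```python
import numpy as np


def segment_hand(intensity, mindepth, maxdepth, a=50, b=80, n_candidates=200):
    """Sketch of Algorithm 15.3. Returns (box, segment) or (None, None).

    box = (r0, c0, r1, c1) with exclusive end indices.
    """
    A = a * b
    th1, th2 = 0.25 * A, 0.50 * A                                   # Steps 1-2
    xdepth = 0.30 * (maxdepth - mindepth)                           # Step 5
    depth = (maxdepth - mindepth) * intensity.astype(np.float64) / 255.0 + mindepth
    f = (depth <= mindepth + xdepth).astype(np.int64)               # Steps 8-10: near pixels
    # Integral image: S[i, j] = number of ones in f[:i, :j].
    S = np.zeros((f.shape[0] + 1, f.shape[1] + 1), dtype=np.int64)
    S[1:, 1:] = f.cumsum(axis=0).cumsum(axis=1)
    # counts[r, c] = ones in the window with top-left corner (r, c)  (Steps 11-12)
    counts = S[a:, b:] - S[:-a, b:] - S[a:, :-b] + S[:-a, :-b]
    cand = np.argwhere((counts > th1) & (counts < th2))[:n_candidates]   # Steps 13-14
    if cand.size == 0:
        return None, None
    r0, c0 = cand.min(axis=0)                                       # Step 15: bounding rectangle
    r1, c1 = cand.max(axis=0) + np.array([a, b])
    box = (int(r0), int(c0), int(r1), int(c1))
    return box, intensity[r0:r1, c0:c1]


img = np.full((200, 300), 255, dtype=np.uint8)   # far background
img[80:120, 100:160] = 0                         # a near "hand" blob
box, seg = segment_hand(img, mindepth=500.0, maxdepth=4500.0)
print(box)
```

The lower bound th1 rejects windows that catch only a few near pixels, which are likely noise. The upper bound th2 rejects windows that are almost entirely near, such as the forearm or body.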


Fig. 15.3 Dilated images obtained after removing noises and filling holes
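The dilation step can be illustrated with a small binary dilation written in plain NumPy. The chapter does not state the structuring element or the number of iterations, so a 3 × 3 square applied once is assumed here. In practice, library routines such as OpenCV's `cv2.dilate` or SciPy's `scipy.ndimage.binary_dilation` would typically be used instead.

```python
import numpy as np


def dilate(mask, iterations=1):
    """Binary dilation with an assumed 3x3 square structuring element."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


holey = np.ones((5, 5), dtype=bool)
holey[2, 2] = False          # a one-pixel hole inside the hand region
print(dilate(holey)[2, 2])   # True: the hole is filled
```

Dilation fills small holes but also slightly enlarges the hand boundary, which is acceptable here because the subsequent features depend on centres of gravity rather than exact edges.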

Fig. 15.4 Direction vector between successive background subtracted key frames

15.4 Features Extraction

We perform density-based feature extraction by recursively dividing each collected frame into zones. The centre of gravity (CG) of a frame is the point where all of the weight of the frame appears to be concentrated; it is the average position of the individual black points of the frame. The procedure is as follows:
- Level 0: we extract the CG of the whole frame and divide the image into four parts about this CG.
- Level 1: the CGs of these four sub-images are calculated, and each sub-image is divided further into four parts.
- Level 2: the CGs of all 16 sub-images are calculated.

Next, the 16 CGs of two successive frames are considered, and corresponding CGs are paired. For each pair, we calculate the length and the direction (shown in Fig. 15.4) of the line segment connecting them, yielding two features. For 16 pairs of CGs, we therefore get 2 × 16 = 32 features from two successive frames. If there are n frames in the video, there are n − 1 successive frame pairs, giving a total of F = 32(n − 1) features. In this work, we take n = 14, and hence F = 32 × 13 = 416. The resulting feature matrix has 70 rows (one per video) and 416 columns.
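The feature computation can be sketched as follows. It is a sketch under stated assumptions:
- foreground ("black") points are the non-zero pixels of a segmented frame;
- each region is split at its rounded CG;
- an empty sub-image falls back to its geometric centre;
- direction is measured with `atan2` in radians;
- the function names are hypothetical.

With n = 14 frames, the sketch yields 32 × 13 = 416 features, matching the text.

```python
import math
import numpy as np


def centre_of_gravity(img, region):
    """Mean (row, col) of the foreground (non-zero) pixels inside region, in frame coordinates."""
    r0, r1, c0, c1 = region
    ys, xs = np.nonzero(img[r0:r1, c0:c1])
    if ys.size == 0:
        # Assumption: an empty sub-image falls back to its geometric centre.
        return (r0 + r1) / 2.0, (c0 + c1) / 2.0
    return r0 + ys.mean(), c0 + xs.mean()


def split_at_cg(img, region):
    """Divide a region into four sub-regions about its CG."""
    r0, r1, c0, c1 = region
    cy, cx = centre_of_gravity(img, region)
    ry = min(max(int(round(cy)), r0), r1)
    rx = min(max(int(round(cx)), c0), c1)
    return [(r0, ry, c0, rx), (r0, ry, rx, c1), (ry, r1, c0, rx), (ry, r1, rx, c1)]


def level2_cgs(img):
    """Level-0 CG -> 4 sub-images (level 1) -> 16 sub-images (level 2); return the 16 CGs."""
    h, w = img.shape
    level1 = split_at_cg(img, (0, h, 0, w))
    level2 = [sub for region in level1 for sub in split_at_cg(img, region)]
    return [centre_of_gravity(img, region) for region in level2]


def video_features(frames):
    """Length and direction for each of the 16 CG pairs of every successive frame pair."""
    cgs = [level2_cgs(f) for f in frames]
    features = []
    for prev, curr in zip(cgs, cgs[1:]):
        for (y0, x0), (y1, x1) in zip(prev, curr):
            features.append(math.hypot(x1 - x0, y1 - y0))   # length of the connecting segment
            features.append(math.atan2(y1 - y0, x1 - x0))   # direction in radians (assumption)
    return features


rng = np.random.default_rng(1)
frames = [(rng.random((60, 80)) > 0.7).astype(np.uint8) for _ in range(14)]
feats = video_features(frames)
print(len(feats))   # 416
```

Stacking one such vector per video gives the 70 × 416 feature matrix described above.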


15.5 Experiments and Analysis

We ran a random forest classifier on the dataset with ten-fold cross-validation. A random forest builds a set of randomized decision trees and combines their decisions to classify each sample. Because an ensemble of weakly correlated trees usually generalizes better than a single tree, we selected this learning algorithm. A summary of the performance is presented in Table 15.1. The detailed accuracy by class is given in Table 15.2, and the confusion matrix is given in Table 15.3.
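The summary statistics in Tables 15.1 and 15.2 can be recomputed from the confusion matrix in Table 15.3. This is a useful consistency check on the reported numbers. The snippet below reads rows as actual classes and columns as predicted classes. It derives the number of correctly classified instances, the accuracy, Cohen's kappa, and per-class precision and recall.

```python
CLASSES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Rows: actual class; columns: classified as (transcribed from Table 15.3).
CONFUSION = [
    [5, 1, 0, 1, 1, 0, 2],
    [0, 7, 0, 1, 2, 0, 0],
    [0, 1, 8, 1, 0, 0, 0],
    [2, 0, 1, 6, 1, 0, 0],
    [0, 0, 0, 2, 7, 1, 0],
    [0, 0, 0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0, 0, 10],
]


def summary(cm):
    k = len(cm)
    n = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(k))
    row = [sum(cm[i]) for i in range(k)]                          # actual counts per class
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)]     # predicted counts per class
    p_o = correct / n                                             # observed agreement
    p_e = sum(row[i] * col[i] for i in range(k)) / (n * n)        # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    precision = [cm[i][i] / col[i] for i in range(k)]
    recall = [cm[i][i] / row[i] for i in range(k)]
    return correct, p_o, kappa, precision, recall


correct, accuracy, kappa, precision, recall = summary(CONFUSION)
print(correct, round(accuracy * 100, 4), round(kappa, 3))   # 52 74.2857 0.7
print([round(p, 3) for p in precision])
```

The recomputed values agree with Tables 15.1 and 15.2: 52 correct instances, 74.2857% accuracy and a kappa of 0.7. The lowest precision, about 0.545, belongs to Thursday, whose gestures are most often confused with Monday and Friday.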

15.6 Conclusion

In this work, we created an application to capture RGB-D videos of the Indian Sign Language hand gestures for the seven days of the week using Kinect V2. We then extracted the key frames from those videos and performed background subtraction with noise removal to isolate the gesture in each frame. Finally, we extracted features from each gesture and ran a random forest classifier to measure the classification accuracy.

Table 15.1 Summary of ten-fold cross-validation
Correctly classified instances     52      74.2857%
Incorrectly classified instances   18      25.7143%
Kappa statistic                    0.7
Mean absolute error                0.188
Root mean squared error            0.2812

Table 15.2 Detailed accuracy by class
Class    TP rate  FP rate  Precision  Recall  F-measure  MCC    ROC area  PRC area
Mon      0.500    0.033    0.714      0.500   0.588      0.544  0.784     0.580
Tue      0.700    0.033    0.778      0.700   0.737      0.697  0.933     0.830
Wed      0.800    0.017    0.889      0.800   0.842      0.819  0.960     0.928
Thu      0.600    0.083    0.545      0.600   0.571      0.497  0.848     0.668
Fri      0.700    0.083    0.583      0.700   0.636      0.573  0.914     0.745
Sat      0.900    0.017    0.900      0.900   0.900      0.883  0.985     0.953
Sun      1.000    0.033    0.833      1.000   0.909      0.898  0.998     0.991
(Avg.)   0.743    0.043    0.749      0.743   0.741      0.701  0.917     0.813

Table 15.3 Confusion matrix (rows: actual class)
       Mon  Tue  Wed  Thu  Fri  Sat  Sun   ← Classified As
Mon    5    1    0    1    1    0    2
Tue    0    7    0    1    2    0    0
Wed    0    1    8    1    0    0    0
Thu    2    0    1    6    1    0    0
Fri    0    0    0    2    7    1    0
Sat    0    0    0    0    1    9    0
Sun    0    0    0    0    0    0    10

Our features are based on the distance and direction between the centres of gravity of different sub-parts of each consecutive key frame pair. Future work could augment these with additional features based on other spatially interesting points. We may also apply similar techniques to other types of dynamic hand gestures beyond the days of the week.

References
1. Dardas, N.H., Georganas, N.D.: Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans. Instrum. Meas. 60(11), 3592–3607 (2011)
2. Wang, Y., Yang, R.: Real-time hand posture recognition based on hand dominant line using Kinect. In: 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1–4. IEEE (2013)
3. Ghotkar, A.S., Kharate, G.K.: Dynamic hand gesture recognition and novel sentence interpretation algorithm for Indian sign language using Microsoft Kinect sensor. J. Pattern Recogn. Res. 1, 24–38 (2015)
4. Van den Bergh, M., Van Gool, L.: Combining RGB and ToF cameras for real-time 3D hand gesture interaction. In: 2011 IEEE Workshop on Applications of Computer Vision (WACV), pp. 66–72. IEEE (2011)
5. Liang, Z.J., Liao, S.B., Hu, B.Z.: 3D convolutional neural networks for dynamic sign language recognition. Comput. J. 61(11), 1724–1736 (2018)
6. Raghuveera, T., Deepthi, R., Mangalashri, R., Akshaya, R.: A depth-based Indian sign language recognition using Microsoft Kinect. Sādhanā 45(1), 1–13 (2020)
7. Nolker, C., Ritter, H.: Visual recognition of continuous hand postures. IEEE Trans. Neural Netw. 13(4), 983–994 (2002)
8. Fang, Y., Wang, K., Cheng, J., Lu, H.: A real-time hand gesture recognition method. In: 2007 IEEE International Conference on Multimedia and Expo, pp. 995–998. IEEE (2007)
9. Suryanarayan, P., Subramanian, A., Mandalapu, D.: Dynamic hand pose recognition using depth data. In: 2010 20th International Conference on Pattern Recognition, pp. 3105–3108. IEEE (2010)
10. Paul, S.: JU_V2_DYN_DAYS, the dynamic hand gesture dataset (2021). https://doi.org/10.6084/m9.figshare.14813355

Chapter 16

Sentiment Analysis on Telugu–English Code-Mixed Data
K. S. B. S. Saikrishna and C. N. Subalalitha

Abstract With the increase of communication over social media in a multilingual country like India, people tend to use more than one language in day-to-day communication and on social media platforms, in order to showcase their linguistic proficiency. For instance, combining Telugu and English, or Tamil and English, in the same sentence is commonly observed; this is called code-mixing. Code-mixed text demands a different level of processing, and in this paper, we attempt to extract sentiment from Telugu–English code-mixed sentences. We classify the polarity of the code-mixed sentences into positive and negative sentiment classes using a lexicon-based approach and also using machine learning approaches, namely naïve Bayes and support vector machine classifiers. We achieved an accuracy of 82% and 85%, respectively.

K. S. B. S. Saikrishna (B) · C. N. Subalalitha
SRM Institute of Science and Technology, Kattankulathur 602303, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_16

16.1 Introduction

The emergence of social media has had a great impact on current marketing and trading trends. Companies have shown great interest in observing people's views on social media in order to align the promotion of their products. This has resulted in many automated tools for analyzing people's sentiment about a particular product, movie, educational interest, etc., and sentiment analysis is one of the solutions for this kind of analysis.

At the same time, the way people express their views on social media has also changed tremendously, mainly in terms of the language they use. As India is a land of many languages and dialects, people tend to mix more than one language on social media. The current trend is to write one's preferred language in English script, since English is one of the official languages of India. This is called code-mixing and is commonly observed in Indic languages. Usually, a code refers to a language or a dialect. This paper puts forth a technique that analyzes the sentiment of Telugu code-mixed data.

A related concept, "code-switching", is different from code-mixing. In code-switching, each text fragment, such as a phrase, clause or sentence, is written entirely in one language, while another language is used in a different fragment [1]. In Example 1, the first clause is written in English and the second in Telugu using English script. In Example 2, by contrast, English and Telugu words are mixed within the same sentence.

Example 1 Rama is a good boy, bhaga chaduvutadu kuuda.
Example 2 Rama is manchi boy, bhaga chaduvutadu too.

Much research has been done on code-mixed text so far, in areas such as language identification, named entity recognition, word embeddings, parts-of-speech tagging and text normalization. This work focuses on the Telugu–English language pair, as enough research has already been done on other language pairs. Like other Dravidian languages, Telugu is an agglutinative language. Many people whose mother tongue is Telugu tend to use English in their daily conversations and on social media, because much of their education is in English rather than in their native language. Such code-mixed text differs from conventional monolingual text and needs to be processed differently for sentiment analysis. Sentiment analysis refers to assigning a positive or negative score to a text at different levels, such as the document, sentence or sub-sentence level [2]. In this work, we extract the sentiment of sentences using a lexicon-based approach and supervised machine learning approaches. For lexicon-based sentiment analysis, we followed a dictionary-based approach.
For the supervised machine learning approach, we used a linear classifier, the support vector machine (SVM), and a probabilistic classifier, naïve Bayes. Examples:

1. #Federer chaalaa mamchi player, amdarikamte baagaa aadataadu that’s why anni trophies atanu gelichaadu (Positive)
2. Movie worst vundhi hero natana daridramu (Negative)
3. Naaku cricket vachu batting bhaga chestaanu (Positive)

The rest of the paper is organized as follows. Section 2 describes some of the previous work on code-mixed text. Section 3 describes the implementation of our proposed system. Section 4 presents the datasets we used and the results obtained with the proposed methodology. The final section presents our conclusions and possible extensions of this work.

16 Sentiment Analysis on Telugu–English Code-Mixed Data


16.2 Literature Survey

Sharma et al. (2015) proposed strategies for extracting sentiment from normalized text, using the FIRE 2013 and 2014 Hindi–English datasets [3]. They divided the project into two phases: in the first phase, they identified the languages present, and in the second, they examined the sentiment of the text using SentiWordNet. In their preprocessing phase, they also handled spelling correction, slang words and word play, and transliterated all Hindi words written in English script into Hindi script. With this approach, they achieved an accuracy of 85%.

Pravalika et al. (2017) introduced a framework for extracting sentiment from English–Hindi code-mixed text collected through the Facebook Graph API, using two approaches:
- Dictionary-based: the text is classified based on the counts of positive and negative words. This approach achieved an accuracy of 86% [4].
- Machine learning: they classified text using the SVM, naïve Bayes, decision tree, random tree and multi-layer perceptron models available in the WEKA tool. Unigram words and lists of positive and negative words served as features. This approach achieved an accuracy of 72%, lower than that of the dictionary-based approach.

Malgaonkar et al. (2017) predicted the sentiment of Hindi–English code-mixed text extracted from Twitter. In the first phase, they identified the language of every word in the code-mixed data and tagged it with its part of speech. They used both linear search and dictionary search algorithms to look up words in the lexicons, and found that dictionary-based search gave better results [5]. They classified the sentiment as positive, negative or neutral and achieved an accuracy of 92.68%.
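The dictionary-based approach described above, which classifies a text by counting positive and negative words, can be sketched in a few lines. The tiny lexicon below is hypothetical and chosen only to cover the example sentences in Sect. 16.1; a real system would use a curated Telugu–English sentiment dictionary. Returning 0 on a tie is likewise an assumption, since the dataset in this chapter only uses the labels 1 and −1.

```python
import re

# Hypothetical toy lexicon (illustration only, not a real resource).
POSITIVE = {"mamchi", "manchi", "baagaa", "bhaga", "good", "best", "gelichaadu"}
NEGATIVE = {"worst", "daridramu", "bad", "waste"}


def lexicon_polarity(sentence):
    """Return 1 (positive), -1 (negative) or 0 (tie) by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return 1
    if neg > pos:
        return -1
    return 0   # assumption: no decision when the counts are equal


print(lexicon_polarity("Movie worst vundhi hero natana daridramu"))   # -1
```

Because code-mixed Telugu is written in English script with inconsistent spellings (e.g. "mamchi" and "manchi"), a practical lexicon would need spelling variants or normalization.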
Bhargava (2017) proposed a methodology by which we can determine the sentiment for sentences in which English is combined with four major Indian languages (Telugu, Tamil, Hindi, and Bengali) [6]. Firstly, they identified the language of the words which were in the sentence and then find the sentiment of every token in the sentence. Finally, the overall sentiment was calculated based on the count of positive and negative words. They achieved an accuracy of 86%. Das and Das (2017) have worked on extracting opinion from English–Bengali and English–Hindi code-mixed data which was extracted from the Facebook posts using supervised machine learning approaches. In the preprocessing phase, they removed noisy data, punctuations, numbers, and expansion of slang words in the data. After that, they used machine learning algorithms to train the classifiers with SentiWordNet, opinion lexicon, English sentiment words, Bengali sentiment words, density of revile words, parts of speech, number of every capitalized word, density of exclamation marks, density of question marks, frequency of code-switches, and number of smiley coordinates as features utilizing WEKA software [7]. They achieved 68.5% accuracy. Padmaja et al. have proposed two approaches for sentiment extraction of English– Telugu code-mixed data: one is lexicon-based approach and the other is machine

154

K. S. B. S. Saikrishna and C. N. Subalalitha

learning approach. In lexicon-based approach, the polarity of each word was counted to find the overall sentiment. With this approach, they got an accuracy of 66.82%. In supervised machine learning approach, they extracted unigram, bigram, and skip gram features and trained SVM classifier to classify text [8]. By this approach, they achieved an accuracy of 76.33% which is better than lexicon-based approach.

16.3 Proposed Methodologies

16.3.1 Dataset

This work focuses on extracting sentiment from Telugu–English code-mixed text collected from Twitter using the Twitter API. We obtained 744 tweets, and each tweet was annotated with a polarity value of 1 or −1, indicating positive or negative sentiment, respectively.

16.3.2 Lexicon-Based Approach

The lexicon-based approach consists of the following steps:

Step-1. Text Pre-processing and Normalization
Step-2. Language Identification
Step-3. Back Transliteration
Step-4. Sentiment Extraction

Figure 16.1 shows the architecture of the proposed approach.

16.3.2.1

Text Preprocessing and Normalization

In the first phase of the lexicon-based approach, the tweets extracted from Twitter are normalized. User mentions, URLs, hashtags, and special symbols were removed, numbers were replaced with words, and abbreviations were expanded. All letters were converted to lowercase, and words were lemmatized. Because social media users often misspell words, such words have to be corrected; for this, we used a spell checker built on the TextBlob and Spello Python libraries, which suggests the correct spelling of a word in the code-mixed text. Since the data came from social media, it also contained many slang words, which we replaced with their full forms, for example, Tysm (Thank you so much) and Asm (awesome).

Fig. 16.1 Architecture of the proposed approach

Example

Before normalization: @sdfsderwe eroju nenu one interesting movie chusa;hero acting bhaga chyesadu.asm assala aa movie. pakka hit avutundi movie.

After normalization: eroju nenu one interesting movie chusa hero acting bhaga chyesadu awesome assala aa movie pakka hit avutundi movie.
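The normalization step can be sketched with Python's standard `re` module. This is a minimal illustration under stated assumptions, not the authors' code: `SLANG` is a hypothetical two-entry dictionary, spell correction (TextBlob/Spello) and lemmatization are omitted, and, unlike the pipeline described above, the sketch simply drops digits instead of spelling numbers out as words.

```python
import re

# Hypothetical slang dictionary; the paper's full list is not given.
SLANG = {"tysm": "thank you so much", "asm": "awesome"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, symbols (after lowercasing)


def normalize(tweet: str) -> str:
    """Lowercase a tweet, strip URLs, mentions, hashtags and symbols, expand slang."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # simplification: digits are dropped here
    tokens = [SLANG.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize("@sdfsderwe eroju nenu one interesting movie chusa;hero acting "
                "bhaga chyesadu.asm assala aa movie. pakka hit avutundi movie."))
```

Run on the example tweet, this reproduces the "after normalization" text shown above.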

16.3.2.2

Language Identification

Once text preprocessing and normalization are complete, each word in the text is looked up in the language dictionaries. If a word matches an entry in one of the dictionaries, it is tagged with the corresponding language: a word found in the English dictionary is tagged -en, and a word found in the Telugu dictionary is tagged -te. A word not found in any dictionary is tagged -un (unknown).

Example

Before tagging: eroju nenu one interesting movie chusa. hero acting bhaga chyesadu awesome assala aa movie pakka hit avutundi movie.

After tagging: eroju-te nenu-te one-en interesting-en movie-en chusa-te. hero-en acting-en bhaga-te chyesadu-te awesome-en assala-te aa-te movie-en. pakka-un hit-en avutundi-te movie-en.
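Dictionary-based language tagging can be sketched as follows. `ENGLISH_WORDS` and `TELUGU_WORDS` are hypothetical toy dictionaries (the paper builds its dictionaries from the Leipzig corpora), and checking English before Telugu is an assumption: it means a word spelled identically in both languages is always tagged -en, a limitation that the conclusion also acknowledges.

```python
# Toy dictionaries for illustration only; the real ones come from the Leipzig corpora.
ENGLISH_WORDS = {"one", "interesting", "movie", "hero", "acting", "awesome", "hit"}
TELUGU_WORDS = {"eroju", "nenu", "chusa", "bhaga", "chyesadu", "assala", "aa", "avutundi"}


def tag_languages(text, english=ENGLISH_WORDS, telugu=TELUGU_WORDS):
    """Return a list of (word, tag) pairs with tag in {"en", "te", "un"}."""
    tagged = []
    for word in text.split():
        if word in english:      # checked first, so words in both sets become "en"
            tag = "en"
        elif word in telugu:
            tag = "te"
        else:
            tag = "un"           # found in neither dictionary
        tagged.append((word, tag))
    return tagged


def format_tags(tagged):
    """Render the tags in the word-tag notation used in the example above."""
    return " ".join(f"{word}-{tag}" for word, tag in tagged)


print(format_tags(tag_languages("eroju nenu one interesting movie pakka hit")))
```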

16.3.2.3

Back Transliteration

Once each word has been tagged with a language, all words tagged as Telugu (-te) are replaced with their Telugu-script forms using Indic transliteration, while words tagged -en or -un remain in Roman script. In the running example, the Telugu words (eroju, nenu, chusa, bhaga, chyesadu, assala, aa, avutundi) are rendered in Telugu script, while one, interesting, movie, hero, acting, awesome, pakka, and hit stay unchanged.

16.3.2.4

Sentiment Extraction

After back transliteration, each word in the code-mixed text is categorized as positive or negative using the lexicon-based approach: the word is looked up in the lexicon of its language, and the corresponding sentiment score is attached to it. To obtain the overall sentiment of a code-mixed sentence, the numbers of positive and negative words are counted. If the sentence contains more positive words than negative words, it is assigned a positive label; otherwise, it is assigned a negative label. The running example is classified as Positive.
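The counting rule above can be sketched in a few lines. `LEXICONS` is a hypothetical stand-in for the paper's English and Telugu sentiment lexicons; in the described system, Telugu words are looked up after back transliteration into Telugu script, whereas this sketch keeps romanized keys so that it stays self-contained.

```python
# Hypothetical polarity lexicons (+1 positive, -1 negative), keyed by language tag.
LEXICONS = {
    "en": {"interesting": 1, "awesome": 1, "hit": 1, "worst": -1, "boring": -1},
    "te": {"bhaga": 1, "daridramu": -1},
}


def sentence_polarity(tagged_words, lexicons=LEXICONS):
    """Return 1 (positive) or -1 (negative) from (word, language_tag) pairs."""
    positive = negative = 0
    for word, lang in tagged_words:
        score = lexicons.get(lang, {}).get(word, 0)  # "-un" words have no lexicon
        if score > 0:
            positive += 1
        elif score < 0:
            negative += 1
    # More positive words gives a positive label; otherwise (ties included) negative.
    return 1 if positive > negative else -1


example = [("hero", "en"), ("acting", "en"), ("bhaga", "te"),
           ("awesome", "en"), ("pakka", "un"), ("hit", "en")]
print(sentence_polarity(example))
```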


Fig. 16.2 Code-mixing sentiment classification using machine learning algorithms

16.3.3 Machine Learning Approach

For comparison with the lexicon-based approach, we also used machine learning approaches, which involve the following procedures:

1. Text Preprocessing and Normalization
2. Vectorization and Feature Extraction
3. Training and Classification
4. Evaluation

Figure 16.2 shows code-mixing sentiment classification using machine learning algorithms.

16.3.3.1

Text Preprocessing and Normalization

The preprocessing phase of the machine learning approach is the same as that of the lexicon-based approach.


16.3.3.2

Vectorization and Feature Extraction

After preprocessing, the text has to be transformed into numerical feature vectors; we used a TF–IDF (term frequency–inverse document frequency) vectorizer for this. Term frequency measures how often a word occurs in a given document, whereas inverse document frequency measures how rare the word is across the documents of the corpus, so that words occurring in many documents receive low weights.

$$\mathrm{TF} = \frac{\text{number of times word } i \text{ appears in a document}}{\text{total number of terms in the document}} \tag{16.1}$$

$$L = \frac{\text{total number of documents in the corpus}}{\text{number of documents containing word } i} \tag{16.2}$$

$$\mathrm{IDF} = \log(L) \tag{16.3}$$

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{16.4}$$
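Equations (16.1) to (16.4) can be computed directly with the standard library. This is a sketch under assumptions: the natural logarithm is used because the text does not state the base, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, following Eqs. (16.1)-(16.4).

    documents: list of non-empty token lists.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))                    # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total_terms = len(doc)
        weights.append({
            term: (count / total_terms)              # TF, Eq. (16.1)
            * math.log(n_docs / doc_freq[term])      # IDF = log(L), Eqs. (16.2)-(16.3)
            for term, count in counts.items()
        })
    return weights


print(tf_idf([["movie", "super"], ["movie", "worst", "worst"]]))
```

A word that appears in every document, such as "movie" here, gets weight 0 because L = 1. Library vectorizers may differ: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF and L2-normalizes each row by default.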

After feature extraction, the next step is to classify the text, which was done using the naïve Bayes and SVM algorithms.

16.3.3.3

Training and Classification

Naïve Bayes Classifier The naïve Bayes classifier is one of the most widely used classifiers for text classification because it is easy to build, train, and apply, and it is effective on large datasets. It is a probabilistic classifier based on Bayes' theorem that assumes the occurrence of each feature is independent of the other features [9]. Bayes' theorem states:

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)} \tag{16.5}$$

where $P(h \mid d)$ is the probability of hypothesis $h$ given the data $d$, called the posterior probability; $P(d \mid h)$ is the probability of the data $d$ given that hypothesis $h$ is true; $P(h)$ is the probability of hypothesis $h$ being true regardless of the data, called the prior probability of $h$; and $P(d)$ is the probability of the data regardless of the hypothesis. For example:

$$P(\text{Negative} \mid \text{Movie worst vundhi hero natana daridramu}) = \frac{P(\text{Movie worst vundhi hero natana daridramu} \mid \text{Negative})\,P(\text{Negative})}{P(\text{Movie worst vundhi hero natana daridramu})} \tag{16.6}$$
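Applying Eq. (16.5) to text can be sketched as a multinomial naïve Bayes classifier. The paper does not specify which variant it used; the multinomial model with add-one (Laplace) smoothing over raw token counts, the log-space scoring, and the four training tweets below are assumptions made for illustration. Because $P(d)$ is the same for every class, it is dropped when comparing classes.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.vocab = {word for doc in documents for word in doc}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc)
        label_counts = Counter(labels)
        # log P(h): prior probability of each class
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """log P(h) + sum of log P(word | h); the constant P(d) is omitted."""
        denominator = sum(self.word_counts[c].values()) + len(self.vocab)
        score = self.log_prior[c]
        for word in doc:
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][word] + 1) / denominator)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training data (tokenized tweets with their labels).
train_docs = [["movie", "super", "undi"], ["hero", "acting", "bhaga"],
              ["movie", "worst", "vundhi"], ["natana", "daridramu"]]
train_labels = ["Positive", "Positive", "Negative", "Negative"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("movie worst vundhi hero natana daridramu".split()))
```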

Preprocessed data and extracted features were provided as input to train the statistical naïve Bayes classifier, which then classifies text based on this training.

SVM Classifier The SVM classifier is another widely used supervised machine learning classifier, often applied when the data have a large number of features. It often performs better than many other classifiers and is well suited to sentiment analysis. The main principle of SVM is to find the decision boundary (a separating hyperplane) that maximizes the margin between the classes. More than one hyperplane may separate the classes, but SVM chooses the one with the maximum margin of separation; the training points that lie closest to it are called support vectors. SVMs can also be applied when there are more than two classes, and kernel functions allow nonlinear decision boundaries: the input space is implicitly transformed into a higher-dimensional feature space in which the data become linearly separable and suitable for the linear SVM formulation. This makes it possible to determine a nonlinear decision boundary, which is linear in the higher-dimensional feature space, without explicitly computing the mapping into that space. Although this classifier often outperforms the others, it can be slow, because training involves solving a quadratic programming problem, which has high algorithmic complexity and demands a large amount of memory [10]. Preprocessed data and extracted features were provided as input to train the SVM classifier, which then predicts the sentiment of unseen text.
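A linear SVM can be illustrated without a quadratic-programming solver by using a swapped-in technique: Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This is a sketch, not the paper's implementation. The regularization strength `lam`, the number of epochs, the constant bias feature, and the 2-D toy data standing in for TF–IDF rows are all assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of feature vectors (e.g. TF-IDF rows); y: labels in {+1, -1}.
    A constant feature 1.0 is appended so the last weight acts as a bias
    (a simplification that also regularizes the bias).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                                # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]               # gradient step on (lam/2)||w||^2
            if margin < 1:                                       # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 1], [1, 3], [-2, -2], [-3, -1], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in ([2.5, 2.5], [-2.5, -2.5])])
```

Only the points that violate the margin (margin < 1) contribute updates, which mirrors the role of support vectors. Nonlinear kernels are not covered by this sketch.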

16.3.3.4

Evaluation Metrics

After classification, we validated the results and measured the classifiers' performance using accuracy, precision, recall, F1-score, and the confusion matrix.
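The metrics listed above can be computed from predicted and true labels as follows. This is a minimal sketch for binary labels. Treating `positive` as the class of interest, and the row/column layout of the confusion matrix, are choices made here, not details taken from the paper. Per-class scores can be obtained by calling the function once with each label as `positive`.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for `positive`, plus a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows: actual (other, positive); columns: predicted (other, positive).
    confusion = [[tn, fp], [fn, tp]]
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "confusion": confusion}


print(evaluate([1, 1, 1, -1, -1], [1, -1, 1, -1, 1]))
```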

16.4 Results

The following results were obtained by performing sentiment analysis on Telugu–English code-mixed text collected by scraping tweets with the Twitter API. We manually annotated each tweet with a positive or negative tag, and the proposed approaches were evaluated using precision, recall, F-measure, and accuracy. For language identification in the code-mixed text, we used the Leipzig corpora for both English and Telugu. The lexicon-based approach achieved an accuracy of 79%. With the machine learning approach, we achieved an accuracy of 82.9% with the naïve Bayes classifier and 90.24% with the SVM classifier. Table 16.1 shows the precision, recall, and F-measure obtained with the naïve Bayes and SVM classifiers. Fig. 16.3 shows the confusion matrix of the naïve Bayes classifier, and Fig. 16.4 shows the confusion matrix of the SVM classifier.

16.5 Conclusion and Future Work

In this paper, we proposed two methodologies for extracting sentiment from Telugu–English code-mixed text by classifying sentences as positive or negative: a lexicon-based approach and a machine learning-based approach. This work extracts sentiment only from sentence-level code-mixed data; it needs to be extended to word-level code-mixed data, where a single word is itself a mixture of two languages. At this stage, the work also cannot handle words that have the same spelling in both languages; we are working on these enhancements. The work can be further extended to sentences that mix more languages, and to detecting sarcasm, ambiguity, and indirect meaning in code-mixed sentences.

Table 16.1 Precision, recall, F-measure, and accuracy obtained for naïve Bayes, SVM classifiers, and lexicon-based approach

Fig. 16.3 Confusion matrix of naïve Bayes classifier

Fig. 16.4 Confusion matrix of SVM classifier

References

1. Waris, A.M.: Code switching and mixing (communication in learning language). J. Dakwah Tabligh 13(1), 123–135 (2012)
2. Liu, B.: Sentiment analysis and subjectivity. In: Handbook of Natural Language Processing, vol. 2, pp. 627–666 (2010)
3. Sharma, S., Srinivas, P., Balabantaray, R.C.: Sentiment analysis of code-mix script. In: 2015 International Conference on Computing and Network Communications (CoCoNet), Trivandrum, India, pp. 530–534 (2015). https://doi.org/10.1109/CoCoNet.2015.7411238
4. Pravalika, A., Oza, V., Meghana, N.P., Kamath, S.S.: Domain-specific sentiment analysis approaches for code-mixed social network data. In: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Delhi, India, pp. 1–6 (2017). https://doi.org/10.1109/ICCCNT.2017.8204074
5. Malgaonkar, A., Khan, S., Vichare, A.: Mixed bilingual social media analytics: case study: live Twitter data. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, pp. 1407–1412 (2017). https://doi.org/10.1109/ICACCI.2017.8126037
6. Bhargava, R., Sharma, Y., Sharma, S.: Sentiment analysis for mixed script Indic sentences. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, pp. 524–529 (2016). https://doi.org/10.1109/ICACCI.2016.7732099
7. Patra, B.G., Das, D., Das, A.: Sentiment analysis of code-mixed Indian languages: an overview of SAIL code-mixed shared task @ICON-2017. arXiv preprint arXiv:1803.06745 (2018)
8. Padmaja, S., Fatima, S., Bandu, S., Nikitha, M., Prathyusha, K.: Sentiment extraction from bilingual code mixed social media text. In: Raju, K., Senkerik, R., Lanka, S., Rajagopal, V. (eds.) Data Engineering and Communication Technology. Advances in Intelligent Systems and Computing, vol. 1079. Springer, Singapore (2020)
9. Goel, A., Jyoti G., Kumar, S.: Real time sentiment analysis of tweets using Naive Bayes. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT). IEEE (2016)
10. Liu, Z., Lv, X., Liu, K., Shi, S.: Study on SVM compared with the other text classification methods. In: 2010 Second International Workshop on Education Technology and Computer Science, vol. 1, pp. 219–222. IEEE (2010)

Chapter 17

Fuzziness on Interconnection Networks Under Ratio Labelling

A. Amutha and R. Mathu Pritha

Abstract This paper investigates whether interconnection networks such as circulant networks, hypercubes Q(n), cube-connected cycles CCC(n), butterfly networks BF(n), and Benes networks admit fuzziness under ratio labelling. Binary trees and star graphs do not admit fuzziness under ratio labelling. Classifying these interconnection networks as Cayley graphs leads to the conclusion that not all Cayley graphs are fuzzy graphs under ratio labelling.

17.1 Introduction

Graph theory plays a role in solving real-life problems. Many real-world problems involve situations that do not ensure certainty, yet the problem of uncertainty went unnoticed until 1965, when Zadeh introduced fuzzy sets, which address uncertainty and paved the way for the concept of fuzzy graphs [1]. Kaufmann gave the definition of fuzzy graphs [2], and Rosenfeld introduced the idea of fuzzy graphs and related graph-theoretic concepts [3]. In 1987, fuzzy relations were introduced by Zadeh. Later, operations on fuzzy graphs were introduced by Mordeson and Peng [4]. Fuzzy labelling of graphs was introduced by Nagoor Gani and Rajalakshmi in 2012 [5], and its novel properties were discussed by Nagoor Gani et al. in 2014 [6]. Various investigations and results have since been published on labelling fuzzy graphs. Fathalian, Borzooei, and Hamidi studied fuzzy magic labelling of simple finite connected graphs in 2018 [7], and Sujatha et al. examined graceful and magic labelling in special fuzzy graphs in 2019 [8].

A. Amutha (B) · R. M. Pritha Department of Mathematics, The American College, Tamil Nadu, Madurai 625 002, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_17


A. Amutha and R. M. Pritha

17.2 Basic Concepts

A graph $G(V, E)$ consists of a nonempty set $V \neq \emptyset$, whose elements are called nodes or vertices, and a set $E$ of unordered pairs $(u, v)$ of vertices of $V$, called lines or edges. In a fuzzy set $A$, every element $u$ is assigned a membership degree $\sigma(u) \in [0, 1]$. A fuzzy graph is a graph $G$ with two functions $\sigma: V \to [0, 1]$ and $\mu: V \times V \to [0, 1]$ such that $\mu(u, v) \le \sigma(u) \wedge \sigma(v)$ for all $u, v \in V$, where $\wedge$ stands for minimum; its supports are $\sigma^* = \operatorname{supp}(\sigma) = \{u \in V : \sigma(u) > 0\}$ and $\mu^* = \operatorname{supp}(\mu) = \{(u, v) \in E : \mu(u, v) > 0\}$. The order of a fuzzy graph $G$ is $p = \sum_{x \in V} \sigma(x)$. Two nodes $u$ and $v$ are called neighbours if $\mu(u, v)$ is positive, that is, if $u$ and $v$ lie on the same edge. The set of all neighbours of $v$ is the neighbourhood of $v$, denoted $N(v)$. The degree of a vertex $v$ is $d(v) = \sum_{w \neq v} \mu(v, w)$. A regular fuzzy graph is one in which every vertex has the same degree. An edge $(u, v) \in \mu^*$ is an effective edge if $\mu(u, v) = \sigma(u) \wedge \sigma(v)$. A fuzzy graph $G$ is a strong fuzzy graph if every edge in $\mu^*$ is effective, and it is complete if $\mu(u, v) = \sigma(u) \wedge \sigma(v)$ for all $u, v \in \sigma^*$.

A circulant undirected network $G(n; \pm S)$, where $S \subseteq \{1, 2, \ldots, \lfloor n/2 \rfloor\}$ and $n > 2$, is the undirected graph with vertex set $V = \{0, 1, 2, \ldots, n-1\}$ and edge set $E = \{(i, j) : j - i \equiv \pm s \pmod{n} \text{ for some } s \in S\}$. The $n$-dimensional hypercube $Q_n$ is defined by $Q_n = Q_{n-1} \times Q_1 = K_2 \times K_2 \times \cdots \times K_2$ for $n \ge 2$, where $Q_1 = K_2$. The $n$-dimensional cube-connected cycle CCC$(n)$ has vertex set $V = \{(u; i) : u \in V(Q_n),\ 1 \le i \le n\}$, and the vertices $(u; i)$ and $(v; j)$ are joined by an edge if either

(i) $u = v$ and $j \equiv i \pm 1 \pmod{n}$, or
(ii) $i = j$ and $u$ differs from $v$ in the $i$th bit.

Type (i) edges are called cycle edges, and type (ii) edges are called hypercube edges. The $n$-dimensional butterfly network BF$(n)$ has vertex set $V = \{(u; i) : u \in V(Q_n),\ 0 \le i \le n\}$. Two vertices $(u; i)$ and $(v; j)$ in BF$(n)$ are joined by an edge if and only if $j = i + 1$ and either

(i) $u = v$, or
(ii) $u$ differs from $v$ in the $j$th bit.

17 Fuzziness on Interconnection Networks Under Ratio Labelling


The back-to-back butterfly is called the $n$-dimensional Benes network BB$(n)$. BB$(n)$ has $2n + 1$ levels, each with $2^n$ vertices. The first $n + 1$ levels and the last $n + 1$ levels of the Benes network form two butterfly networks, which share the $(n + 1)$th level.

17.3 Context that Enhances the Study

Our research concerns labelling a crisp graph by defining functions $\sigma$ and $\mu$ on it. Since examining crisp graphs for fuzziness is both interesting and challenging, we introduced ratio labelling, which examines and identifies fuzziness in crisp graphs. The behaviour of simple graphs such as cycles and paths under ratio labelling motivated extending the research to complete graphs $K_n$. Because all complete graphs become fuzzy graphs under ratio labelling, the question of regular graphs arose, and ratio labelling revealed the fuzziness property in all regular graphs. Degree-bounded network graphs behave favourably under ratio labelling, which motivated examining interconnection network graphs. Along these lines, this paper examines some Cayley graphs, namely circulant networks, hypercubes Q(n), cube-connected cycles CCC(n), and butterfly networks BF(n), for fuzziness.

17.4 Main Results

In communication networks, a strong bond between processing points and communication lines leads to a good communication system. Ratio labelling guarantees a strong relationship between the vertices and edges of a crisp graph, so a crisp graph that admits fuzziness under ratio labelling reveals strong connectivity in the communication system. Cayley graphs are used to represent interconnection networks: their vertices correspond to processing elements and their edges to communication lines. Here, we investigate symmetric interconnection networks such as circulant networks, hypercubes $Q_n$, cube-connected cycles CCC$(n)$, butterfly networks BF$(n)$, and Benes networks, which are Cayley graphs that admit fuzziness under ratio labelling.

17.4.1 Definition Let $G(V, E)$ be a crisp graph. The vertices and edges of $G$ are labelled using the functions $\sigma: V \to [0, 1]$ and $\mu: E \to [0, 1]$, respectively, defined as

$$\sigma(v) = \frac{|N(v)|}{|E|} \tag{17.1}$$

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)}, \quad (u, v) \in E \tag{17.2}$$

This labelling is called the ratio labelling of $G$.

Fig. 17.1 Circulant Graph

Example 1 See Fig. 17.1. $G$ is the circulant network $G(7; \pm\{1, 2, 3\})$ with $V = \{0, 1, 2, 3, 4, 5, 6\}$. Each vertex has 6 neighbours and $|E| = 21$, so the function $\sigma: V \to [0, 1]$ is given by

$$\sigma(0) = \frac{|N(0)|}{|E|} = \frac{6}{21} = \frac{2}{7} = \sigma(1) = \sigma(2) = \sigma(3) = \sigma(4) = \sigma(5) = \sigma(6),$$

and $\mu: E \to [0, 1]$ is given, for all $(i, j) \in E$, by

$$\mu(i, j) = \frac{\max[\sigma(i), \sigma(j)]}{\sum_{v \in V} \sigma(v)} = \frac{\max\left[\frac{2}{7}, \frac{2}{7}\right]}{2} = \frac{1}{7}.$$

Here, $\mu(i, j) = \frac{1}{7} < \frac{2}{7} = \sigma(i) \wedge \sigma(j)$ for all $(i, j) \in E$. Hence $G(7; \pm\{1, 2, 3\})$ is a fuzzy graph under ratio labelling.

Example 2 See Fig. 17.2. $G$ is a binary tree with vertex set $V = \{A, B, C, D, E, F, G\}$. The function $\sigma: V \to [0, 1]$ is given by

$$\sigma(A) = \frac{|N(A)|}{|E|} = \frac{1}{3}, \quad \sigma(B) = \sigma(C) = \frac{1}{3}, \quad \sigma(E) = \sigma(F) = \sigma(G) = \frac{1}{6}, \quad \sigma(D) = \frac{1}{2},$$

and, for the edge $(D, F)$,

$$\mu(D, F) = \frac{\max[\sigma(D), \sigma(F)]}{\sum_{v \in V} \sigma(v)} = \frac{\max\left[\frac{1}{2}, \frac{1}{6}\right]}{2} = \frac{1}{4}.$$

Here, $\mu(D, F) = \frac{1}{4} > \frac{1}{6} = \sigma(D) \wedge \sigma(F)$. Hence the given binary tree is not a fuzzy graph under ratio labelling.

Remark Binary trees need not be fuzzy graphs under ratio labelling.

Fig. 17.2 Counter Example: Binary tree
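Ratio labelling can be checked mechanically, which makes the two examples easy to reproduce. The sketch below implements Eqs. (17.1) and (17.2) with exact fractions. Two facts explain its behaviour. First, by the handshake lemma, $\sum_v |N(v)| = 2|E|$, so $\sum_v \sigma(v) = 2$ for any graph with at least one edge. Second, it follows that $\mu(u, v) = \max[\sigma(u), \sigma(v)]/2$, so the fuzzy-graph condition holds exactly when, on every edge, the larger degree is at most twice the smaller one. The function names and the tree's edge list are hypothetical. The edge list is chosen only to match the $\sigma$ values of Example 2, since the edges of Fig. 17.2 are not given in the text.

```python
from fractions import Fraction


def ratio_labelling(edges):
    """Ratio labelling of a simple undirected graph given as an edge list.

    sigma(v) = |N(v)| / |E|                          (Eq. 17.1)
    mu(u, v) = max(sigma(u), sigma(v)) / sum(sigma)  (Eq. 17.2)
    Only vertices with at least one edge are labelled.
    """
    edge_set = {frozenset(e) for e in edges}       # ignore repeated or reversed pairs
    neighbours = {}
    for e in edge_set:
        u, v = tuple(e)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    m = len(edge_set)
    sigma = {v: Fraction(len(nb), m) for v, nb in neighbours.items()}
    total = sum(sigma.values())                    # equals 2 by the handshake lemma
    mu = {e: max(sigma[x] for x in e) / total for e in edge_set}
    return sigma, mu


def is_fuzzy_graph(edges):
    """True if mu(u, v) <= min(sigma(u), sigma(v)) holds on every edge."""
    sigma, mu = ratio_labelling(edges)
    return all(mu[e] <= min(sigma[x] for x in e) for e in mu)


def circulant(n, s_set):
    """Edge list of the circulant network G(n; +/-S)."""
    return [(i, (i + s) % n) for i in range(n) for s in s_set]


# Example 1: G(7; +/-{1, 2, 3})
sigma, mu = ratio_labelling(circulant(7, {1, 2, 3}))
print(sigma[0], set(mu.values()))   # 2/7 {Fraction(1, 7)}

# Example 2: a tree with the sigma values of Fig. 17.2 (hypothetical edge list)
tree = [("D", "F"), ("D", "G"), ("D", "B"), ("B", "A"), ("A", "C"), ("C", "E")]
print(is_fuzzy_graph(tree))         # False: mu(D, F) = 1/4 > 1/6
```

The same check confirms the conclusion's remark on stars: $K_{1,3}$ fails because its centre has three times the degree of each leaf, while $K_{1,2}$ passes.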

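17.4.2 Theorem For all $n \ge 1$, the hypercube $Q_n$ is a fuzzy graph under ratio labelling.

Proof In $Q_n$, the number of vertices is $|V| = 2^n$, the number of edges is $|E| = n2^{n-1}$, and $|N(v)| = n$ for all $v$. Now,

$$\sigma(v) = \frac{|N(v)|}{|E|} = \frac{n}{n2^{n-1}} = \frac{1}{2^{n-1}} \quad \text{for all } v,$$

$$\sum_{v \in V} \sigma(v) = 2^n \times \frac{1}{2^{n-1}} = 2. \tag{17.3}$$

For all $(u, v) \in E$,

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)} = \frac{1}{2^n}. \tag{17.4}$$

$$\mu(u, v) = \frac{1}{2^n} < \frac{1}{2^{n-1}} = \sigma(u) \wedge \sigma(v), \quad \text{for all } (u, v) \in E. \tag{17.5}$$

This completes the proof.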
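The values in Eqs. (17.3) to (17.5) can be checked numerically for small $n$. In this sketch, $Q_n$ is built with vertices as $n$-bit integers, two vertices being adjacent when they differ in exactly one bit (an equivalent description of the product $K_2 \times \cdots \times K_2$). The helper names are hypothetical, and this is a sanity check for small cases, not a substitute for the proof.

```python
from collections import Counter
from fractions import Fraction


def hypercube_edges(n):
    """Edges of Q_n: n-bit integers joined when they differ in one bit."""
    return [(u, u ^ (1 << b)) for u in range(2 ** n) for b in range(n) if u < u ^ (1 << b)]


def ratio_labelling(edges):
    """(sigma, mu) of Eqs. (17.1)-(17.2) for an edge list without duplicates."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    sigma = {v: Fraction(d, m) for v, d in degree.items()}
    total = sum(sigma.values())
    mu = {(u, v): max(sigma[u], sigma[v]) / total for u, v in edges}
    return sigma, mu


def is_fuzzy(edges):
    sigma, mu = ratio_labelling(edges)
    return all(mu[(u, v)] <= min(sigma[u], sigma[v]) for u, v in edges)


for n in range(1, 5):
    sigma, mu = ratio_labelling(hypercube_edges(n))
    print(n, set(sigma.values()), set(mu.values()), is_fuzzy(hypercube_edges(n)))
```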


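17.4.3 Theorem The butterfly network BF$(n)$ of dimension $n$ is a fuzzy graph under ratio labelling.

Proof For $n = 1$, BF$(1)$ is the 4-cycle, a regular graph; assume $n \ge 2$. In BF$(n)$, the number of vertices is $|V| = (n + 1)2^n$, the number of edges is $|E| = n2^{n+1}$, and

$$|N(v)| = \begin{cases} 2, & \text{for every vertex } v \in V \text{ in levels } 0 \text{ and } n, \\ 4, & \text{for every vertex } v \in V \text{ in level } i,\ 1 \le i \le n - 1. \end{cases}$$

Hence

$$\sigma(v) = \frac{|N(v)|}{|E|} = \begin{cases} \dfrac{2}{n2^{n+1}} = \dfrac{1}{n2^n}, & \text{for } v \in V \text{ in levels } 0 \text{ and } n, \\[2ex] \dfrac{4}{n2^{n+1}} = \dfrac{1}{n2^{n-1}}, & \text{for } v \in V \text{ in level } i,\ 1 \le i \le n - 1, \end{cases} \tag{17.6}$$

$$\sum_{v \in V} \sigma(v) = 2 \times 2^n \times \frac{1}{n2^n} + (n - 1) \times 2^n \times \frac{1}{n2^{n-1}} = \frac{2}{n} + \frac{2(n - 1)}{n} = 2. \tag{17.7}$$

Every edge joins two consecutive levels, so for $n \ge 2$ it has at least one end in a level $i$ with $1 \le i \le n - 1$. Hence, for all $(u, v) \in E$,

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)} = \frac{1}{n2^n}, \tag{17.8}$$

$$\mu(u, v) = \frac{1}{n2^n} \le \sigma(u) \wedge \sigma(v), \quad \text{for all } (u, v) \in E, \tag{17.9}$$

with equality on the edges incident to levels 0 and $n$.

This completes the proof.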
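The butterfly case can be checked numerically in the same way. This sketch labels vertices $(w, i)$ by an $n$-bit row index $w$ and a level $0 \le i \le n$. Counting the "$j$th bit" from 1 at the least significant end is an assumed convention, and the function names are hypothetical. The check confirms Eq. (17.8) for $n \ge 2$, along with the edge count and fuzziness.

```python
from collections import Counter
from fractions import Fraction


def butterfly_edges(n):
    """Edges of BF(n): (w, i) -> (w, i + 1) (straight) and -> (w with bit i + 1 flipped, i + 1) (cross)."""
    edges = []
    for i in range(n):
        for w in range(2 ** n):
            edges.append(((w, i), (w, i + 1)))              # straight edge
            edges.append(((w, i), (w ^ (1 << i), i + 1)))   # cross edge, bit j = i + 1 (1-indexed)
    return edges


def ratio_labelling(edges):
    """(sigma, mu) of Eqs. (17.1)-(17.2) for an edge list without duplicates."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    sigma = {v: Fraction(d, m) for v, d in degree.items()}
    total = sum(sigma.values())
    mu = {(u, v): max(sigma[u], sigma[v]) / total for u, v in edges}
    return sigma, mu


def is_fuzzy(edges):
    sigma, mu = ratio_labelling(edges)
    return all(mu[(u, v)] <= min(sigma[u], sigma[v]) for u, v in edges)


print(set(ratio_labelling(butterfly_edges(3))[1].values()), is_fuzzy(butterfly_edges(3)))
```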

17.4.4 Theorem The $n$-dimensional Benes network BB$(n)$ is a fuzzy graph under ratio labelling.

Proof In BB$(n)$, the number of vertices is $|V| = (2n + 1)2^n$, the number of edges is $|E| = n2^{n+2}$, and

$$|N(v)| = \begin{cases} 2, & \text{for every vertex } v \in V \text{ in levels } 0 \text{ and } 2n, \\ 4, & \text{for every vertex } v \in V \text{ in level } i,\ 1 \le i \le 2n - 1. \end{cases}$$

Hence

$$\sigma(v) = \frac{|N(v)|}{|E|} = \begin{cases} \dfrac{2}{n2^{n+2}} = \dfrac{1}{n2^{n+1}}, & \text{for } v \in V \text{ in levels } 0 \text{ and } 2n, \\[2ex] \dfrac{4}{n2^{n+2}} = \dfrac{1}{n2^n}, & \text{for } v \in V \text{ in level } i,\ 1 \le i \le 2n - 1. \end{cases} \tag{17.10}$$

Now,

$$\sum_{v \in V} \sigma(v) = 2 \times 2^n \times \frac{1}{n2^{n+1}} + (2n - 1) \times 2^n \times \frac{1}{n2^n} = \frac{1}{n} + \frac{2n - 1}{n} = 2. \tag{17.11}$$

Every edge has at least one end in a level $i$ with $1 \le i \le 2n - 1$, so for all $(u, v) \in E$,

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)} = \frac{1}{n2^{n+1}}. \tag{17.12}$$

Hence $\mu(u, v) = \frac{1}{n2^{n+1}} \le \sigma(u) \wedge \sigma(v)$ for all $(u, v) \in E$, with equality on the edges incident to levels 0 and $2n$.

This completes the proof.

Theorem The $n$-dimensional cube-connected cycle CCC$(n)$, $n \ge 3$, is a fuzzy graph under ratio labelling.

Proof In CCC$(n)$, $|V| = n2^n$ and $|E| = 3n2^{n-1}$, and since CCC$(n)$ is 3-regular,

$$|N(v)| = 3. \tag{17.13}$$

Now,

$$\sigma(v) = \frac{|N(v)|}{|E|} = \frac{1}{n2^{n-1}}, \quad \text{for all } v \in V, \tag{17.14}$$

$$\sum_{v \in V} \sigma(v) = n2^n \times \frac{1}{n2^{n-1}} = 2, \tag{17.15}$$

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)} = \frac{1}{n2^n}, \quad \text{for all } (u, v) \in E, \tag{17.16}$$

$$\mu(u, v) = \frac{1}{n2^n} < \frac{1}{n2^{n-1}} = \sigma(u) \wedge \sigma(v), \quad \text{for all } (u, v) \in E. \tag{17.17}$$

This completes the proof.
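Both results above can be checked numerically for small $n$. In this sketch, the Benes network is built as two mirrored butterflies that share the middle level: between levels $k$ and $k + 1$, the cross edge flips bit $k$ for $k < n$ and bit $2n - 1 - k$ otherwise (0-indexed bits, an assumed convention). CCC$(n)$ uses 0-indexed cycle positions. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def benes_edges(n):
    """Edges of BB(n): levels 0..2n, 2**n rows, straight and cross edges between levels."""
    edges = []
    for k in range(2 * n):
        bit = k if k < n else 2 * n - 1 - k   # mirrored bit order in the second half
        for w in range(2 ** n):
            edges.append(((w, k), (w, k + 1)))
            edges.append(((w, k), (w ^ (1 << bit), k + 1)))
    return edges


def ccc_edges(n):
    """Edges of CCC(n), n >= 3: vertex (w, i) with cycle position i in 0..n-1."""
    edges = []
    for w in range(2 ** n):
        for i in range(n):
            edges.append(((w, i), (w, (i + 1) % n)))     # cycle edge
            partner = w ^ (1 << i)
            if w < partner:
                edges.append(((w, i), (partner, i)))      # hypercube edge
    return edges


def ratio_labelling(edges):
    """(sigma, mu) of Eqs. (17.1)-(17.2) for an edge list without duplicates."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    sigma = {v: Fraction(d, m) for v, d in degree.items()}
    total = sum(sigma.values())
    mu = {(u, v): max(sigma[u], sigma[v]) / total for u, v in edges}
    return sigma, mu


def is_fuzzy(edges):
    sigma, mu = ratio_labelling(edges)
    return all(mu[(u, v)] <= min(sigma[u], sigma[v]) for u, v in edges)


print(is_fuzzy(benes_edges(3)), is_fuzzy(ccc_edges(3)))
```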


17.4.5 Theorem Every regular graph is a fuzzy graph under ratio labelling.

Proof Let $G(V, E)$ be a $k$-regular graph with $n$ vertices, $k \ge 1$. In $G$, $|V| = n$, $|E| = \frac{nk}{2}$, and $|N(v)| = k$. Now,

$$\sigma(v) = \frac{|N(v)|}{|E|} = \frac{k}{nk/2} = \frac{2}{n}, \quad \text{for all } v \in V,$$

$$\sum_{v \in V} \sigma(v) = n \times \frac{2}{n} = 2,$$

$$\mu(u, v) = \frac{\max[\sigma(u), \sigma(v)]}{\sum_{v \in V} \sigma(v)} = \frac{1}{n}, \quad \text{for all } (u, v) \in E, \tag{17.18}$$

$$\mu(u, v) = \frac{1}{n} < \frac{2}{n} = \sigma(u) \wedge \sigma(v), \quad \text{for all } (u, v) \in E. \tag{17.19}$$

This completes the proof.

Observations. The order of a graph $G$ that admits fuzziness under ratio labelling is two: $p = \sum_{x \in V} \sigma(x) = 2$.
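Eqs. (17.18) and (17.19) can be spot-checked on a few familiar regular graphs. The cycle $C_5$, the complete graph $K_6$, and the 3-regular Petersen graph are chosen here purely as illustrations. The sketch confirms $\sigma(v) = 2/n$ and $\mu(u, v) = 1/n$ in each case.

```python
from collections import Counter
from fractions import Fraction


def ratio_labelling(edges):
    """(sigma, mu) of Eqs. (17.1)-(17.2) for an edge list without duplicates."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    sigma = {v: Fraction(d, m) for v, d in degree.items()}
    total = sum(sigma.values())
    mu = {(u, v): max(sigma[u], sigma[v]) / total for u, v in edges}
    return sigma, mu


def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def petersen():
    outer = [(i, (i + 1) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    return outer + spokes + inner


for name, edges, n in [("C5", cycle(5), 5), ("K6", complete(6), 6), ("Petersen", petersen(), 10)]:
    sigma, mu = ratio_labelling(edges)
    print(name, set(sigma.values()) == {Fraction(2, n)}, set(mu.values()) == {Fraction(1, n)})
```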

17.4.6 Theorem Every connected circulant network is a fuzzy graph under ratio labelling.

Proof Every connected circulant network is a regular graph. By Theorem 17.4.5, a circulant network is therefore a fuzzy graph under ratio labelling. This completes the proof.

17.5 Conclusion

Not all Cayley graphs are fuzzy graphs under ratio labelling. Star graphs do not admit fuzziness under ratio labelling for n > 2. Further investigation of non-Cayley interconnection network graphs is proposed, and finding the common parameters that support fuzziness in both Cayley and non-Cayley graphs is the next goal. Moreover, a study is to be carried out to find the reason why the degree sum p is two for every graph that satisfies ratio labelling.

References

1. Zadeh, L.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
2. Kaufmann, A.: Introduction à la Théorie des Sous-ensembles Flous, vol. 1. Masson et Cie (1973)
3. Rosenfeld, A.: Fuzzy graphs. In: Fuzzy Sets and their Applications to Cognitive and Decision Processes, pp. 77–95. Academic Press, New York (1975)
4. Mordeson, J.N., Peng, C.S.: Operations on fuzzy graphs. Inf. Sci. 180, 519–531 (2010)
5. Nagoor Gani, A., Rajalaxmi (a) Subahashini, D.: Properties of fuzzy labeling graph. Appl. Math. Sci. 6(69–72), 3461–3466 (2012)
6. Nagoor Gani, A., Akram, M., Rajalaxmi (a) Subahashini, D.: Novel properties of fuzzy labelling graph. J. Math. 2014, Article ID 375135, 6 pp. (2014)
7. Fathalian, M., Borzooei, R.A., Hamidi, M.: Fuzzy magic labelling of simple graphs. J. Appl. Math. Comput. 60, 369–385 (2019)
8. Sujatha, N., Dharuman, C., Thirusangu, K.: Graceful and magic labeling in special fuzzy graphs. Int. J. Recent Technol. Eng. 8(3) (2019). ISSN: 2277–3878

Chapter 18

CoviNet: Role of Convolution Neural Networks (CNN) for an Efficient Diagnosis of COVID-19

D. N. V. S. L. S. Indira and R. Abinaya

Abstract Coronaviruses are a large family of viruses that can make human beings critically ill, and COVID-19 affects different people in different ways. Since COVID-19 began its rapid spread, isolating infected individuals has been the best way to deal with it, which can be achieved by monitoring individuals through repeated COVID tests. Computed tomography (CT) scans have shown good results in evaluating patients with possible COVID-19 infection. Patients with COVID will heal with the help of antibiotic therapy and vitamin C supplements. Patients with these symptoms need a faster response through non-clinical methods, such as machine learning and deep neural networks, to manage and contain the further spread of COVID-19 worldwide. In this paper, we diagnose COVID-19 patients from CT-scan images by applying an XGBoost classifier. We developed a web application that accepts a patient's CT scan and classifies it as COVID positive or negative. Patients in the negative class who have symptoms are then given a risk rate based on their age group, health-related issues, and the area they belong to. Three machine learning algorithms, decision tree, random forest, and k-nearest neighbors, were used for this. The results showed that the model built with the decision tree algorithm was the most effective at predicting the risk rate, with an overall accuracy of 93.31%.

18.1 Introduction

Algorithms for data mining play an important role in the study and forecasting of epidemics. Data mining methods help recognize disease trends in large epidemic datasets, so that early interventions can be planned to avoid the spread of the virus. Image classification, a primary application of CNN-based deep neural networks, has attracted enormous demand and participation from many researchers. Several measures to contain the COVID-19 virus [1–3] are also being discussed, and data scientists have contributed in their own ways through many competitive works. As the quantity, availability, and value of images in our daily lives have increased, image-based applications have quickly become established, so we use CT-scan images for detection in this scheme. We propose a web application that accepts patients' CT scans and detects whether they are COVID-19 [4, 5] positive or not. The ultimate objective of this project is to create an application for detecting COVID-19 from CT-scan findings. We can then visualize the risk rate of individuals according to their age groups and symptoms. The application also provides health suggestions and a list of COVID designated centers, and it helps assess the information and anticipate a person's COVID-19 risk status.

D. N. V. S. L. S. Indira (B) Department of Information Technology, Gudlavalleru Engineering College, Gudlavalleru, AP 521356, India R. Abinaya Department of Computer Science and Engineering, Gudlavalleru Engineering College, Gudlavalleru, AP 521356, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_18

D. N. V. S. L. S. Indira and R. Abinaya

18.2 Literature Survey Eisenhofer et al. [4] reported that evolving sequencing technology and accessible datasets are growing significantly, that these methods are inefficient for analytical use in multiple sequences, and that various alignment methods for evaluating the relationship between organisms are used. A. For Bernheim’s (ct) results, SARS-CoV-2 was used, et al. [5], He et al. [6] CT shows basic radiological CAD, transfer learning systems used to diagnose COVID-19; network models used in the pretraining process. A framework for consolidating and grouping COVID-19 profound highlights and preprepared organization models was utilized by Ozkaya et al. [7]. Pre-prepared organizations are furnished with a precision of 95% in their philosophy. Randhawa et al. [8] have adopted a profound learning strategy in order to detect a number of genomic human examples of viral successions from non-viral groups. Al-Rousan et al. [9] a few analysts utilized the conventional Susceptible-Exposed-Infectious-Removed (SEIR) model to mimic the spread of the plague and applied faculty relocation information to change the model. Zhu et al. [10] explored the city-scale elements of the pandemic utilizing cell phone city information. Patron et al. [11] thinking about that infections have a brooding period, we ignored the ramifications of POI conclusion. There is no proper platform or software with regard to COVID-19 for the detection of viruses using technology. Moreover, the traditional physical examination procedure, such as RT-PCR, and anti-speed controls, has been used if we see what happens now with COVID-19 screening! The first to be generated and widely implemented when COVID-19 pandemics began (Corman et al., January 2020) were reverse transcriptase PCR (RT-PCR) tests [3], which remain the most significant approach to diagnostics. While antigen tests are currently used and enhanced, they are still not widely available and currently have variable reliability. 
18 CoviNet: Role of Convolution Neural Networks …

RT-PCR testing is the key diagnostic for COVID-19 because of its sensitivity, precision, and viability compared with viral culture. The timing of PCR testing should be chosen with regard to symptoms, the test's limit of detection, and the collection site. New developments in RT-PCR focus largely on ease of processing, faster production times, and reduced material use.

18.2.1 Limitations
• Tests are available, but not in sufficient numbers for the country's large population.
• It is difficult to gather the information required for tracking.
• Time is wasted because of the testing process and the limited number of testing laboratories.
• Money is spent on tests at commercial hospitals, which may or may not be cost-effective.

18.3 Proposed Methodology "COVID-19 detection using XGBoost" is a website to which a CT-scan report can be submitted to obtain a prediction. The website presents its information in simple language that everyone can understand, and a user can use it to predict COVID-19 infection from a patient's CT-scan reports. COVID-19 is currently one of the most common threats to humans and has caused huge damage to every sector. Over time, the number of cases and the severity of COVID-19 have been increasing, and its effects are worsening. Testing everyone is a very challenging task in countries like India: a COVID-19 test takes time and, in private hospitals, money. In some rare cases, doctors are unable to determine whether a person has COVID-19. This motivates our project: easing the process of COVID detection. Our main goals are to reduce time, cost, and people's fear, and to act as a second opinion for doctors. Globally, the coronavirus (COVID-19) epidemic had resulted in 114,735,032 cases and over 2,544,226 deaths at the time of writing. COVID-19 diagnosis suffers from inefficiency and a lack of diagnostic tests. Computed tomography (CT) has shown promise in evaluating patients with probable COVID-19 infection [12]. However, CT scans are difficult to interpret and require skilled experts to avoid diagnostic errors, and computer-aided diagnosis (CAD) frameworks have been developed to address the challenges CT experts face. This paper presents a diagnostic process that extracts COVID-19 features from CT scans with a CNN and evaluates the extracted features with XGBoost. The dataset consists of 800 CT scans: 400 COVID-19 and 400 non-COVID-19. Ninety-three percent of the results are correct. The results suggest that specialists can use the proposed technique as a diagnostic tool.


D. N. V. S. L. S. Indira and R. Abinaya

In this analysis, data mining techniques are applied to real-time information to track routine activity and to assess COVID-19 risk across the region. Decision tree, random forest, and k-nearest neighbor algorithms were applied to the dataset to build the model, which calculates the probability of recovery for COVID-19 patients and thus separates those likely to recover from those at greater risk. The results show that the decision tree gives a fair accuracy of 93.31%. The workflow of this paper is presented in Fig. 18.1 and is divided into two steps. In the first step, CT-scan data are loaded and an XGBoost classifier is trained to detect whether an input image is COVID-positive or COVID-negative. In the second step, structured data are loaded and used to train three ML algorithms: decision tree, random forest, and k-nearest neighbor. After many trials with these algorithms, we identified the decision tree (DT) as the best ML algorithm for this work; it gives nearly 95% accuracy on COVID-19 structured data for Andhra Pradesh State collected from Kaggle. In use, the web page first accepts an input CT-scan image and tests it. If the case is positive, the patient is directed to strictly follow isolation and treatment. If the test is negative, the second step begins by accepting all of the patient's symptoms, and the decision tree algorithm produces a danger rate. Danger rates are classified as low (10% to 40%), moderate (41% to 80%), and high (> 80%). People with a danger rate above 80% must be treated in an ICU.
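The danger-rate bands and the two-step flow described above can be written as a small lookup. This is a minimal sketch: `danger_level`, `needs_icu`, and `triage` are hypothetical names, the returned messages are placeholders, and treating rates below 10% as "low" (with band boundaries taken as at most 40 and at most 80) is an assumption, since the text defines only the 10 to 40, 41 to 80, and above-80 bands.

```python
from typing import Optional


def danger_level(rate: float) -> str:
    """Map a danger rate in percent to the chapter's three bands.

    Bands from the text: low 10-40 %, moderate 41-80 %, high > 80 %.
    Assumption (not in the source): rates below 10 % are also "low", and the
    boundaries are taken as <= 40 and <= 80 so fractional rates are covered.
    """
    if not 0 <= rate <= 100:
        raise ValueError("danger rate must be a percentage between 0 and 100")
    if rate <= 40:
        return "low"
    if rate <= 80:
        return "moderate"
    return "high"


def needs_icu(rate: float) -> bool:
    """Per the text, patients with a danger rate above 80 % need ICU treatment."""
    return danger_level(rate) == "high"


def triage(ct_positive: bool, danger_rate: Optional[float] = None) -> str:
    """Hypothetical two-step flow: CT result first, then the symptom-based danger rate."""
    if ct_positive:
        return "positive: follow isolation and treatment"
    if danger_rate is None:
        return "negative: no symptom assessment yet"
    return f"negative: {danger_level(danger_rate)} danger"


print(triage(False, 55.0))  # negative: moderate danger
```

Keeping the thresholds in one function means the web front end and any reporting code would classify a given rate the same way.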

Fig. 18.1 Flow diagram of proposed work


18.3.1 XGBoost Classifier XGBoost is a machine learning classification method built on decision trees within a gradient boosting framework [1]. Artificial neural networks (ANNs) are typically used for prediction problems involving unstructured data such as images, audio, and text; for small-to-medium structured data, however, decision-tree-based algorithms are considered the right choice among ML algorithms. XGBoost can be used for regression, classification, ranking, and user-defined prediction problems. Fig. 18.2 shows the evolution of the decision tree algorithm (decision tree–bagging–random forest–boosting–gradient boosting–XGBoost). For XGBoost regression, the gain of a split is

$$\text{Gain} = \text{Similarity}_{\text{left}} + \text{Similarity}_{\text{right}} - \text{Similarity}_{\text{root}} \qquad (18.1)$$

and the new prediction is

$$\hat{y}_{\text{new}} = \hat{y}_{\text{initial}} + \eta \times \text{output value} \qquad (18.2)$$

where $\eta$ is the learning rate (eta).
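To make Eqs. (18.1) and (18.2) concrete, the sketch below computes them for a single candidate split. It assumes squared-error loss, for which a node's similarity score is (sum of residuals)² / (number of residuals + λ) and its output value is (sum of residuals) / (number of residuals + λ); the function names, λ = 1, and the residuals are illustrative choices, not values from this chapter.

```python
from typing import Sequence


def similarity(residuals: Sequence[float], lam: float = 1.0) -> float:
    """Similarity score of a node (squared-error loss): (sum r)^2 / (n + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def split_gain(left: Sequence[float], right: Sequence[float], lam: float = 1.0) -> float:
    """Eq. (18.1): left similarity + right similarity - root similarity."""
    root = list(left) + list(right)
    return similarity(left, lam) + similarity(right, lam) - similarity(root, lam)


def output_value(residuals: Sequence[float], lam: float = 1.0) -> float:
    """Leaf output value (squared-error loss): sum r / (n + lambda)."""
    return sum(residuals) / (len(residuals) + lam)


def new_prediction(initial: float, eta: float, leaf_output: float) -> float:
    """Eq. (18.2): initial prediction + learning rate * leaf output value."""
    return initial + eta * leaf_output


# Made-up residuals: the candidate split separates negative from positive residuals.
left, right = [-2.0, -4.0], [3.0, 3.0]
print(split_gain(left, right))        # 24.0
print(output_value(right))            # 2.0
print(new_prediction(0.5, 0.5, 2.0))  # 1.5
```

A split whose children group residuals of the same sign yields a large gain, because the root's residuals cancel while each child's do not; larger λ shrinks both the similarity scores and the leaf outputs.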

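XGBoost builds trees by minimizing the regularized objective

$$\mathcal{L}(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \text{where } \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert \omega \rVert^2 \qquad (18.3)$$

Here $l$ is the loss function, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, and $\omega$ is the vector of its leaf output values.

Fig. 18.2 Evolution of XGBoost algorithm from decision trees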


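The aim is to find the leaf output values that minimize this objective. Because trees are added one at a time, the prediction in the first term is replaced by the prediction from the previous round, $\hat{y}_i^{(t-1)}$, plus the new tree $f_t$:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(X_i)\big) + \Omega(f_t) \qquad (18.4)$$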
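The optimal leaf output mentioned above follows from a second-order Taylor expansion of Eq. (18.4), as in Chen and Guestrin [1]. The derivation below sketches this step and shows that, for squared-error loss, it recovers the output value used in Eq. (18.2) and the similarity score used in Eq. (18.1); the full split gain in [1] additionally carries a factor of 1/2 and subtracts γ.

```latex
% Second-order approximation of Eq. (18.4), following Chen and Guestrin [1]
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Big[\, l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(X_i) + \tfrac{1}{2} h_i f_t^2(X_i) \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \hat{y}_i^{(t-1)}}, \quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}

% For leaf j with instance set I_j and output \omega_j, let G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i.
% Minimizing G_j \omega_j + \tfrac{1}{2}(H_j + \lambda)\,\omega_j^2 over \omega_j gives
\omega_j^{*} = -\frac{G_j}{H_j + \lambda}

% Squared error l = \tfrac{1}{2}(y_i - \hat{y}_i)^2 with residual r_i = y_i - \hat{y}_i^{(t-1)}: g_i = -r_i, h_i = 1, hence
\omega_j^{*} = \frac{\sum_{i \in I_j} r_i}{|I_j| + \lambda},
\qquad
\text{Similarity}_j = \frac{\big(\sum_{i \in I_j} r_i\big)^2}{|I_j| + \lambda}
```

Substituting $\omega_j^{*}$ back gives a per-leaf objective of $-\tfrac{1}{2}G_j^2/(H_j+\lambda) + \gamma$, which is why a split's benefit is measured by comparing the children's similarity scores with the parent's.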

18.3.2 Algorithm for CT-Scan Image Classification
Input: CT-scan image.
Output: COVID positive or negative message.
1. Load the dataset.
2. Visualize the dataset.
3. Prepare the data, i.e., smooth the image data.
4. Split the dataset into training and test data.
5. Set up the model:
   a. Sequential => a linear stack of NN layers => feed-forward CNN.
   b. Layers => used in nearly every NN.
   c. CNN layers: MaxPooling2D, Conv2D.
6. Use model.summary() to print the model description.
7. Apply data augmentation, i.e., random transformations on each batch, to improve model generalization.
8. The network learns increasingly sophisticated features from the input image:
   a. The first hidden layer learns only local edge patterns (low-level features).
   b. Each subsequent layer learns more complex representations (filters).
   c. The extracted features are classified with XGBoost.
9. Use the ReLU and softmax functions in the deep CNN:
   a. ReLU: $f(x) = \mathbb{1}(x < 0)\,\alpha x + \mathbb{1}(x \ge 0)\,x$ (with $\alpha = 0$ this is the standard ReLU)
   b. Softmax: $f(x_i) = \dfrac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)}, \quad i = 1, \dots, K$
10. Check a few sample images.
11. Save the model.
12. Create a frontend for image acquisition.
13. Recommend the class based on the accuracy measures.
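Step 9 of the algorithm can be illustrated with plain-Python versions of the two activation functions. This is a sketch for clarity, not the Keras layers the model would actually use: `relu` follows the formula in step 9a (α = 0 gives the standard ReLU, α > 0 a leaky ReLU), and `softmax` subtracts the maximum input before exponentiating, a standard trick that avoids overflow without changing the result.

```python
import math
from typing import List, Sequence


def relu(xs: Sequence[float], alpha: float = 0.0) -> List[float]:
    """Step 9a: alpha * x for x < 0, x for x >= 0 (alpha = 0 is the standard ReLU)."""
    return [alpha * x if x < 0 else x for x in xs]


def softmax(xs: Sequence[float]) -> List[float]:
    """Step 9b: exp(x_i) / sum_j exp(x_j), computed on max-shifted inputs for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.0, 3.0], alpha=0.1))  # negative input scaled by alpha
print(softmax([2.0, 1.0, 0.1]))           # class probabilities that sum to 1
```

In the classifier, ReLU would follow the convolution layers and softmax would turn the final layer's scores into class probabilities.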


Fig. 18.3 Sample CT-scan images of COVID-19 positive patients

Fig. 18.4 Sample CT-scan images of COVID-19 negative patients

18.3.3 Database This paper uses two datasets. The first is a CT-scan image dataset with two categories, positive and negative. The second is a structured dataset of symptoms with a categorical danger-rate value. Both datasets were obtained from Kaggle. Samples are shown in Figs. 18.3, 18.4, and 18.5.

18.4 Screenshots Figures 18.6, 18.7, 18.8, 18.9, 18.10, 18.11, 18.12, and 18.13 show screenshots of the proposed work. Figures 18.6, 18.7, and 18.8 cover the classification of a CT-scan image with the XGBoost classifier, and Figures 18.9, 18.10, 18.11, 18.12, and 18.13 show the outputs of the structured data analysis, including the infection rate by age group.


Fig. 18.5 Structured dataset of COVID-19 symptoms

Fig. 18.6 Home page of the website

We observed that people aged between 20 and 70 were infected more often than others. Finally, the danger rate is predicted from the patient's symptoms.

18.5 Results and Discussion This work uses two datasets, as shown in the flow diagram (Fig. 18.1). First, CT-scan images are used to determine whether a patient is affected by COVID. A five-layer CNN extracts features from the images, and an XGBoost classifier then classifies them. This model was trained on a Kaggle CT-scan image dataset and reached 95.23% accuracy on the test data. If the CT result is negative but the patient shows COVID symptoms, the symptoms are passed to the danger-rate model. For this, we trained three machine learning algorithms, decision tree, KNN, and random forest, on structured data collected from Kaggle with a danger-rate class attribute. The decision tree was the best classifier, with an accuracy of 93.31% (Figs. 18.14, 18.15, and 18.16).

Fig. 18.7 Uploading an input CT-scan image to the application

Fig. 18.8 Classifying the input CT-scan image as positive or negative with XGBoost

Fig. 18.9 Home page for checking the status of COVID-19 patients

Fig. 18.10 Prediction of danger rate of patient using decision tree classifier


Fig. 18.11 Count of patients affected with COVID by age group

Fig. 18.12 Patient count by age versus city

18.6 Conclusion and Future Work The application makes patients' lives easier by reducing the time taken for testing, and it serves as a second opinion for doctors in rare cases of ambiguity. It can be genuinely helpful in the fight against the COVID-19 pandemic. Patients with ambiguous results can be tested easily, safely, and within a very short time, so the application can be a boon when doctors are uncertain during the COVID-19 detection process. It can be enhanced further by improving its accuracy and by adding a module that sends alert messages to local administrators about COVID-infected persons. It can also be developed into a mobile application with all of these features to improve its reach and usage.

Fig. 18.13 Heat map of various symptoms in structured data

Fig. 18.14 Structured data trained using KNN, RF, and DT (chart "Model Trained on ML algorithms": accuracy, precision, recall, and F1-score for KNN, random forest, and decision tree)

Fig. 18.15 CT-scan data tested by two algorithms (accuracy of gradient boosting and the XGBoost classifier)

Fig. 18.16 True positive (TP) and true negative (TN) values from XGBoost classifier

References
1. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: KDD '16, August 13–17, 2016, San Francisco, CA, USA (2016)
2. Gorbalenya, A.E., Baker, S.C., Baric, R.S., de Groot, R.J., Drosten, C., Gulyaeva, A.A., et al.: Severe acute respiratory syndrome-related coronavirus: the species and its viruses—a statement of the coronavirus study group (2020)
3. Corman, V.M., Landt, O., Kaiser, M., Molenkamp, R., Meijer, A., Chu, D.K., Bleicker, T., Brünink, S., Schneider, J., Schmidt, M.L., Mulders, D.G.J.C., Haagmans, B.L., van der Veer, B., van den Brink, S., Wijsman, L., Goderski, G., Romette, J.-L., Ellis, J., Zambon, M., Peiris, M., Goossens, H., Reusken, C., Koopmans, M.P.G., Drosten, C.: Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Euro. J. Infect. Dis. Surveill. Epidemiol. Prev. Control 25(3) (2020)
4. Eisenhofer, R., Weyrich, L.S.: Assessing alignment-based taxonomic classification of ancient microbial DNA. PeerJ 7 (2019)
5. Bernheim, A., Mei, X., Huang, M., Yang, Y., Fayad, Z.A., Zhang, N., et al.: Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection. Radiology 0, 200463 (2020)
6. He, X., Yang, X., Zhang, S., Zhao, J., Zhang, Y., Xing, E., et al.: Sample-efficient deep learning for COVID-19 diagnosis based on CT scans (2020)
7. Ozkaya, U., Ozturk, S., Barstugan, M.: Coronavirus (COVID-19) classification using deep features fusion and ranking technique (2020)
8. Randhawa, G.S., Soltysiak, M.P.M., El Roz, H., de Souza, C.P.E., Hill, K.A., Kari, L.: Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study. PLoS ONE 15(4) (2020)
9. Al-Rousan, N., Al-Najjar, H.: Nowcasting and forecasting the spreading of novel coronavirus 2019-nCoV and its association with weather variables in 30 Chinese provinces: a case study (2020)
10. Zhu, X., Zhang, A., Xu, S., Jia, P., Tan, X., Tian, J., et al.: Spatially explicit modeling of 2019-nCoV epidemic trend based on mobile phone data in Mainland China. medRxiv (2020)
11. Backer, J.A., Klinkenberg, D., Wallinga, J.: Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China, 20–28 January 2020. Eurosurveillance 25(5), 10–15 (2020)
12. Gozes, O., Frid-Adar, M., Greenspan, H., Browning, P.D., Zhang, H., Ji, W., et al.: Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection and patient monitoring using deep learning CT image analysis (2020)

Chapter 19

Deep Learning for Real-Time Diagnosis of Pest and Diseases on Crops
Jinendra Gambhir, Naveen Patel, Shrinivas Patil, Prathamesh Takale, Archana Chougule, Chandra Shekhar Prabhakar, Kalmesh Managanvi, A. Srinivasa Raghavan, and R. K. Sohane

Abstract Agriculture is the mainstay of India, contributing significantly to the economy by providing employment to more than half of the country's workforce. The major problem farmers face during crop production is pest and disease attack. Lack of the right technical advice at the right time leads to improper farm decisions and economic losses. Farmers therefore need an interface to identify the pest and disease problems they face during agricultural processes and to obtain solutions from agricultural experts. We have developed an Android application for uploading images and a Web interface that displays the uploaded images with their disease or pest type. The uploaded images are used to train a CNN model that helps identify pests and diseases of the crops. The system is currently being developed for four major crops: rice, maize, chickpea, and lentil.

J. Gambhir · N. Patel · S. Patil · P. Takale · A. Chougule (B)
Sanjay Ghodawat University, Kolhapur 416118, India
C. S. Prabhakar · K. Managanvi · A. S. Raghavan · R. K. Sohane
Bihar Agricultural University, Sabour-813210, Bhagalpur, Bihar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_19

J. Gambhir et al.

19.1 Introduction Agriculture is the backbone of the Indian economy: it provides employment to more than half of the workforce and contributes up to 17% of Gross Domestic Product [1]. Agricultural production worldwide is affected by several biotic and abiotic constraints. Pests, diseases, and weed infestations are estimated to cause losses of approximately 40% of agricultural production [2]. Without the right diagnosis at the right time, pest and disease control falls short of expectations, leading to monetary loss as well as environmental pollution from improper and excessive use of agrochemicals. There is a need to develop deployable real-time pest and disease diagnostic modules using emerging technologies. Although several molecular and serological techniques are available, deploying them in the field is not cost-effective and requires specialized personnel. To date, the most common method of detecting pests and diseases in the field is visual observation, which requires expertise and continuous monitoring. Of late, several image processing techniques have been used successfully for disease and pest detection, and some of them have been applied in agricultural research and development. Rangarajan et al. [3] used two pre-trained deep learning models, AlexNet and VGG16 net, to classify six different diseases in tomato plants; applying pre-trained deep learning models to classify new objects in this way is known as transfer learning. These models were more accurate than a support vector machine classifier, yielding accuracies of 97.29% for VGG16 net and 97.49% for AlexNet. With optimized datasets, AlexNet achieved a lower execution time than VGG16 net with good accuracy. Jiao et al. [4] note that traditional agricultural pest detection methods have low efficiency and accuracy. To overcome this, they introduced an anchor-free region convolutional neural network (AF-RCNN) for precise recognition of 24 classes of pests: a fusion module first gathers information, especially on small pests, and the AF-RCNN then identifies the different pests, with optimizations during training that improve pest localization accuracy. Malicdem and Fernandez [5] studied rice blast disease in tropical regions of the world. They obtained data from government agencies and adapted it for building predictive models, also taking into account the weather conditions that affect the disease.
Using all these features, they applied artificial neural network (ANN) and support vector machine (SVM) classifiers to predict the occurrence of rice blast disease. Both models gave accurate results, but the SVM model's predictions were more accurate. Hill et al. [6] demonstrated the usefulness of machine learning algorithms for predicting leafroller infestation on kiwifruit. Five ML algorithms (decision tree, naïve Bayes, random forest, AdaBoost, and support vector machine) were used to predict insecticide application against leafrollers, i.e., whether insecticide was sprayed or not. Orchard management attributes were important for the models' forecasting accuracy.

19 Deep Learning for Real-Time Diagnosis …

19.2 Methods 19.2.1 Data Collection Data collection is a crucial part of our research. Each disease and pest has distinguishing characteristics, so images of pests and diseases are required. The primary goal of gathering these images is to use them to train machine learning models; the more photographs we collect, the more accurate the model is expected to become. We are dealing with a significant amount of data covering many types of crop diseases and pests. We used some datasets available on Kaggle [7], but they were of limited usefulness, so additional images need to be collected directly from farms to strengthen the dataset. We are starting with rice images and will later move on to maize, chickpea, and lentil crop images.

19.2.2 Android Application The application acts as a handy tool for further data collection and is distributed among local farmers. It has a simple, understandable UI. The farmer must grant the app certain permissions for it to function. Farmers register with their mobile number, which lets us track how many images each farmer has uploaded. Location access provides information about the area in which the crop is cultivated, and gallery access allows farmers to upload images already in their gallery. Farmers cannot upload images other than crops or plants, because the app includes a simple object detection model that identifies crops. Uploaded images go directly to a cloud server, where they are stored in different directories according to crop, together with their metadata [8]. Fig. 19.1 displays the application.

Fig. 19.1 a App listed on play store, b splash screen, c app home screen

Fig. 19.2 Website home page
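The storage scheme described above (one directory per crop, with metadata kept alongside each image) can be illustrated with a local-filesystem stand-in for the cloud server. This is a hypothetical sketch: the function names, the JSON sidecar format, and the `phone` metadata field are assumptions, not the app's actual backend.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def store_upload(root: Path, crop: str, image_name: str,
                 image_bytes: bytes, metadata: dict) -> Path:
    """Save an image under root/<crop>/ and its metadata in a <image>.json sidecar file."""
    crop_dir = root / crop.lower()
    crop_dir.mkdir(parents=True, exist_ok=True)
    image_path = crop_dir / image_name
    image_path.write_bytes(image_bytes)
    (crop_dir / (image_name + ".json")).write_text(json.dumps(metadata))
    return image_path


def uploads_per_farmer(root: Path) -> Counter:
    """Count stored images per registered mobile number by reading the sidecar files."""
    counts: Counter = Counter()
    for sidecar in root.rglob("*.json"):
        meta = json.loads(sidecar.read_text())
        counts[meta.get("phone")] += 1
    return counts


# Usage with made-up uploads in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store_upload(root, "Rice", "leaf_001.jpg", b"<jpeg bytes>", {"phone": "9000000001"})
    store_upload(root, "Rice", "leaf_002.jpg", b"<jpeg bytes>", {"phone": "9000000001"})
    store_upload(root, "Maize", "leaf_003.jpg", b"<jpeg bytes>", {"phone": "9000000002"})
    print(uploads_per_farmer(root))  # Counter({'9000000001': 2, '9000000002': 1})
```

Keeping metadata next to each image means the admin website can list a crop directory and show every image with its metadata without a separate database lookup.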

19.2.3 Website The images coming in from local farmers need to be monitored. Some images may be distorted or blurry, and including them in the learning algorithms would lower the accuracy of the model. The goal here is to sort the images properly and remove unwanted ones. The website is for admin use only: each admin user can sort the images and view all data related to them. The website is connected to the cloud server where the images are stored; it retrieves the images and their metadata and displays them, and it also provides real-time statistics on the images in the cloud [9]. Figs. 19.2 and 19.3 show website snapshots.

19.2.4 Convolutional Neural Networks (CNN) A convolutional neural network (CNN) is an artificial neural network with a special structure. CNNs use features of the visual cortex to achieve the best results in visual perception tasks, and they are composed of straightforward building blocks: convolutional layers and pooling layers [10]. We developed a CNN model using transfer learning that achieved an accuracy greater than 89% for rice plant disease types. We used InceptionResNetV2 for transfer learning; Fig. 19.4 depicts its general architecture. Once the CNN model was downloaded, we erased all previous weights. Images are read at an input size of 299 × 299, the default input size for this transfer learning model, with 3 channels, since the images are RGB. The images are then extracted and rescaled to 256 pixels, and augmentation operations such as shearing, zooming, and horizontal flipping are applied. We removed the last layer of the InceptionResNetV2 model and added two final layers: a Flatten layer and a Dense layer. According to the model summary, the model has 54,729,956 parameters in total, of which only 393,220 are trainable on our dataset. The Dense layer has four classes: three rice diseases ('Brown Spot,' 'Hispa,' 'Leaf Blast') and healthy rice leaf. It uses the 'softmax' activation function, which is generally used for multiclass classification. The code is implemented in Python using the TensorFlow and Keras libraries. Training and validation runs were carried out on Kaggle notebooks hosted on Google Cloud servers with Nvidia GPUs. To analyze the working model, we picked rice, which has three disease types plus healthy leaves, as shown in Fig. 19.5. We trained the CNN model with the 'adam' optimizer and accuracy as the metric. We used the 'Categorical Cross-Entropy' loss function, which works on one-hot encoded labels; the input image size is (299, 299, 3). Images were trained with a batch size of 32. A training checkpoint monitors the validation loss with mode set to min, so it saves the weights with the minimum validation loss and the corresponding validation accuracy. The initial learning rate is 0.0001, and a learning-rate reduction function monitors the validation loss: if it has not improved over the previous three epochs, the learning rate is reduced by a factor of 0.2. The model introduced here is, in principle, applicable to any plant disease with detectable symptoms; to test the performance of the CNN model, we selected rice with four classes. All images for training and classification (3355 in total) were sourced from a Kaggle dataset that contains separate train and test folders. The training data contain four classes: brown spot (523), healthy (1488), Hispa (565), and leaf blast (779); the validation data contain the same four classes with 123 images each. The model was trained in a Kaggle notebook for 70 epochs, each consisting of 100 training steps with a batch size of 32. In one of the 70 epochs, the validation loss reached 0.3987 and the validation accuracy reached 0.898; this model was saved in an .h5 file.

Fig. 19.3 Viewing the uploaded images

Fig. 19.4 General architecture of InceptionResNetV2

Fig. 19.5 a Brown Spot of rice. b Healthy rice leaf. c Rice Hispa damage on leaf. d Leaf Blast of rice
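The parameter counts reported above can be checked with a little arithmetic. This sketch assumes (it is not stated in the text) that the InceptionResNetV2 base, with its top removed and a 299 × 299 × 3 input, outputs an 8 × 8 × 1536 feature map and is frozen, so only the new Dense layer is trainable; the constant names are illustrative.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Parameters of a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


FEATURE_MAP = (8, 8, 1536)    # assumed base output shape (height, width, channels)
NUM_CLASSES = 4               # Brown Spot, Hispa, Leaf Blast, healthy
REPORTED_TOTAL = 54_729_956   # total parameters reported in the text

flatten_size = FEATURE_MAP[0] * FEATURE_MAP[1] * FEATURE_MAP[2]  # Flatten adds no parameters
trainable = dense_params(flatten_size, NUM_CLASSES)
frozen = REPORTED_TOTAL - trainable

print(f"flattened features: {flatten_size:,}")  # 98,304
print(f"trainable (Dense):  {trainable:,}")     # 393,220
print(f"frozen (base):      {frozen:,}")        # 54,336,736
```

The computed trainable count matches the 393,220 reported in the text, which is consistent with only the Dense head being trained on the frozen feature extractor.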

19.3 Results and Discussion Figure 19.6 shows the training and validation accuracy curves. Both fluctuate during the first 40 epochs; after 45 epochs, the validation accuracy no longer fluctuates through epoch 70, while the training accuracy fluctuates slightly until the last epoch (Fig. 19.7).


Fig. 19.6 Accuracy graph

Fig. 19.7 Loss graph: training loss versus validation loss

In the initial epochs, the loss was above 7.00. After some epochs, when the learning rate was reduced by a factor of 0.2, the loss decreased exponentially, and after 20 epochs it stayed in the range 0 to 1. At the last epoch, the validation loss reached 0.398. Table 19.1 shows the different transfer learning models applied to our dataset, with the results for each CNN model after 70 epochs.

Table 19.1 Accuracy and loss of each CNN model
CNN model           Input size     Optimizer  Metrics   Validation accuracy  Time (h)
VGG                 (299, 299, 3)  Adam       Accuracy  0.6079               14
ResNet50            (299, 299, 3)  Adam       Accuracy  0.7014               14
ResNet152V2         (299, 299, 3)  Adam       Accuracy  0.7243               13
InceptionV3         (299, 299, 3)  Adam       Accuracy  0.7856               12
Xception            (299, 299, 3)  Adam       Accuracy  0.7726               12
InceptionResNetV2   (299, 299, 3)  Adam       Accuracy  0.8937               12

The table also shows the average time in hours required to train on the dataset. InceptionResNetV2 gave the best accuracy, so we decided to use it. A checkpoint in the code saves the best validation accuracy and validation loss in an .h5 file; it is implemented with the keras.callbacks library and monitors the validation loss. The best accuracy and loss values, 0.8923 and 0.3852, respectively, were obtained at the 68th epoch. The average time required to train the model was 12 h (Fig. 19.8). The model was trained on a machine with an Intel(R) Xeon(R) CPU @ 2.00 GHz, a Tesla P100 GPU accelerator, 16 GB of RAM, and a 74 GB HDD.

Fig. 19.8 Snapshot of the implemented CNN model undergoing training
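The checkpoint and learning-rate rules described in this section can be replayed on a sequence of validation losses. This is a simplified pure-Python sketch, not Keras itself: in Keras the same behaviour is usually configured with `tf.keras.callbacks.ModelCheckpoint(filepath, monitor="val_loss", mode="min", save_best_only=True)` and `tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3)`, which additionally support `min_delta`, `cooldown`, and `min_lr` settings that this sketch ignores. The function name and the loss values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def simulate_callbacks(val_losses: Sequence[float], initial_lr: float = 1e-4,
                       factor: float = 0.2, patience: int = 3) -> Tuple[List[float], int]:
    """Replay per-epoch validation losses through a min-mode checkpoint and a
    reduce-on-plateau rule. Returns the learning rate used in each epoch and the
    index of the epoch whose weights the checkpoint would have kept."""
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    lrs: List[float] = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)                       # learning rate used during this epoch
        if loss < best_loss:                 # checkpoint: new minimum, weights would be saved
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                lr *= factor                 # plateau: shrink the rate for the next epoch
                epochs_without_improvement = 0
    return lrs, best_epoch


losses = [1.0, 0.8, 0.9, 0.85, 0.82, 0.7]   # made-up validation losses
lrs, best = simulate_callbacks(losses)
print(best)  # 5
print(lrs)   # 1e-4 for five epochs, then roughly 2e-5
```

Here three epochs without improvement (epochs 2 to 4) trigger one reduction, and the checkpoint keeps epoch 5 because it has the lowest validation loss.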

19.4 Conclusion Deep learning models are widely used for pest and disease detection in plants. However, such models often suffer from low accuracy when given real-world images. In this paper, we tested different models on the same dataset and compared their results. Transfer learning makes CNNs easy to apply in such scenarios: the base model is already trained, so we can focus on the important part, i.e., the identification process. Among the CNN models tried, InceptionResNetV2 provided the best accuracy, 0.8937, with minimal loss. We expect the accuracy to improve as more images are acquired for the dataset. The model works together with the application and website ecosystem.


References
1. Wagh, R., Dongre, A.P.: Agricultural sector: status, challenges and its role in Indian economy. J. Commer. Manag. Thought 7–2 (2016)
2. Carvajal-Yepes, M., Cardwell, K., Nelson, A., Garrett, K.A., Giovani, B., Saunders, D.G.O., Kamoun, S., Legg, J.P., Verdier, V., Lessel, J., Neher, R.A., Day, R., Pardey, P., Gullino, M.L., Records, A.R., Bextine, B., Leach, J.E., Staiger, S., Tohme, J.: A global surveillance system for crop diseases. Science 364(6447), 1237–1239 (2019)
3. Rangarajan, A.K., Purushothaman, R., Ramesh, A.: Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput. Sci. 133, 1040–1047 (2018)
4. Jiao, L., Dong, S., Zhang, S., Xie, C., Wang, H.: AF-RCNN: an anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 174, 105522 (2020)
5. Malicdem, A.R., Fernandez, P.L.: Rice blast disease forecasting for Northern Philippines. WSEAS Trans. Inf. Sci. 2015 (2020)
6. Hill, M.G., Connolly, P.G., Reutemann, P., Fletcher, D.: The use of data mining to assist crop protection decisions on kiwifruit in New Zealand. Comput. Electron. Agric. 108, 250–257 (2014)
7. https://www.kaggle.com/minhhuy2810/rice-diseases-image-dataset
8. https://play.google.com/store/apps/details?id=com.scriptbuild.agrikanti
9. https://www.agrikanti.in
10. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148

Chapter 20

Sentiment-Based Abstractive Text Summarization Using Attention Oriented LSTM Model
Dipanwita Debnath, Ranjita Das, and Shaik Rafi

Abstract Product reviews are essential because they help customers make purchase decisions. However, these reviews are often too numerous, lengthy, and descriptive. An abstractive text summarization (ATS) system can help by using natural language processing to build an internal semantic representation of the text and create sentiment-based summaries of all the reviews. This work proposes an ATS system that treats summarization as a many-to-many sequence problem and incorporates attention-based long short-term memory (LSTM) to generate summaries of product reviews. The proposed system works in two stages. In the first stage, a data processing module creates a structured representation of the text and removes noise and other irrelevant data. In the second stage, the attention-based LSTM model is trained, validated, and tested. The proposed approach is found to deal better with rare words and to generate readable, short, and informative summaries. An experimental analysis was carried out on the Amazon product review dataset, and the ROUGE measure was used to validate the results. The obtained results reflect the efficacy of the proposed method over other state-of-the-art methods.

D. Debnath · R. Das (B) · S. Rafi
Department of Computer Science and Engineering, National Institute of Technology Mizoram, Aizawl, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_20

20.1 Introduction
The rapid expansion of the Internet has led to large-scale growth of online services such as shopping, social networking and blogging. Along with this growth, users' ability to express their opinions about online services in the form of reviews has also increased. These opinions and their associated sentiments play an essential role in decision-making. Each online product receives many reviews. Analyzing all of them before making a decision is an arduous task, but referring to only some of them leads to a biased decision. Automatic sentiment-based summarization, which generates automated, readable and relevant summaries that express the overall sentiment, has therefore become a necessity [1, 2]. Sentiment analysis identifies the sentiment expressed in an opinion, for example whether it is positive, neutral or negative. Summarization presents a large amount of information in an informative and meaningful manner using few words [3].

Broadly, text summarization techniques are divided into two groups, extractive and abstractive [4]. Extractive methods generate summaries by selecting relevant portions (sentences) of the text using machine learning approaches. Abstractive methods, in contrast, build an internal semantic representation of the given text and generate summaries with a natural language generation technique. Reviews, however, are mostly short (one or a few lines) and unstructured, so sentence-based methods are not suitable for them. Sentiment-based summaries are also expected to be short and to contain only the expressed sentiment and subjective knowledge, which requires natural language generation. Abstractive models are therefore the most suitable for generating such summaries, because most extractive models are sentence-based.

Rule-based and neural network-based abstractive summarization approaches are characterized by imposing a set of rules learned from the corpus [5–8]. The quality of the result depends on three factors: a large, diversified corpus of parallel review–summary pairs, the model's ability to dynamically extract good rules, and its ability to present the output in a meaningful manner. For example, LSTM-based recurrent neural networks (RNNs) were used for machine translation in [9, 10]. Klein et al. [11] proposed a similar two-layer LSTM-based machine translation system that encodes and decodes variable-length sentences. Several other neural network-based machine translation and summarization approaches have also been proposed [12, 13]. However, these methods suffer from issues such as the inability to handle rare words and improper sentence formation, which lead to low readability [14, 15]. The capability of neural networks motivated us to use them for product review summarization in a way that addresses these shortcomings.

This paper proposes an abstractive text summarization approach for generating summaries of the Amazon Fine Food Reviews dataset. The dataset contains parallel review–summary pairs, and most of the summaries reflect the sentiment and subjective knowledge of the corresponding reviews. We used Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [16] for automatic evaluation of the summaries. ROUGE includes measures that automatically determine the quality of system-generated summaries by comparing them with the corresponding gold (reference) summaries. The rest of the paper is organized as follows. Sect. 20.2 presents the detailed system architecture, and Sect. 20.3 discusses the corpora and experimental setups. Sect. 20.4 discusses the obtained results along with a comprehensive analysis of the system's performance. Finally, Sect. 20.5 concludes the paper with some insights into future work.

20 Sentiment-Based Summarization …


20.2 System Description
The proposed system has three fundamental phases: data processing, system training and system inference/testing. Each phase is elaborated in the following subsections.

20.2.1 Data Processing
The following steps are performed to prepare the dataset:
• Text normalization: Product reviews containing abbreviations, misspellings, emoticons, contractions, etc., need to be normalized. For this purpose, the following steps are performed. (i) A lexicon-based approach is applied to handle acronyms and emoticons, most of which are collected from NetLingo¹ and Febcons². (ii) Contraction mapping, including some spelling correction (e.g., mapping "gud" and "gooood" to "good"), is performed to handle abbreviations, misspellings and contractions. (iii) Finally, sentences are lowercased, and noise such as HTML tags, hyperlinks, punctuation, special characters and stop-words is removed from the reviews.
• Dropping duplicate or null-valued review–summary pairs: If a review or its corresponding summary is a duplicate or null, the review–summary pair is removed from the dataset. START and END tokens are then added to each summary. The lengths of all reviews and summaries are computed to understand the length distribution, and the maximum permitted lengths are determined.
• Splitting the dataset: The review–summary pairs are divided into training, validation and test sets.
• Tokenizer preparation: Tokenization is performed on the training and validation data, converting each word sequence into an integer sequence. Out-of-vocabulary (OOV) words, also called rare words, are then identified and extracted. A word is treated as OOV if it occurs three times or fewer in the training and validation data. After the OOV words are removed, any review–summary pair whose review or summary has zero length (excluding the START and END tokens) is deleted. Tokenization is then performed again with the remaining common words. Each sequence is zero-padded up to the maximum permitted length, so that all reviews share one fixed length and all summaries share another. Finally, the review and summary vocabulary sizes are each calculated as the number of tokens plus one. The extra index accounts for the zero used as padding.
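The data processing steps above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code. The normalization map and stop-word list are tiny stand-ins for the NetLingo/Febcons lexicons, and the `<start>`/`<end>` spellings are assumptions. Deduplication is assumed to be on the review text, and over-long pairs are assumed to be skipped rather than truncated. Only three details come from the text: the "three occurrences or fewer" rare-word rule, the zero padding, and the vocabulary-size-plus-one rule.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the paper's lexicons (NetLingo/Febcons,
# contraction and spelling maps), which are not reproduced here.
NORMALIZE_MAP = {"don't": "do not", "gud": "good", "gooood": "good"}
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of", "to", "i"}
START, END = "<start>", "<end>"  # cannot collide with cleaned words

def normalize(text, drop_stop_words=True):
    """Lower-case, drop HTML tags and links, map slang, strip symbols."""
    text = re.sub(r"<[^>]+>|https?://\S+", " ", text.lower())
    words = [NORMALIZE_MAP.get(w, w) for w in text.split()]
    words = re.sub(r"[^a-z\s]", " ", " ".join(words)).split()
    return [w for w in words if not (drop_stop_words and w in STOP_WORDS)]

def build_vocab(token_lists, min_count=4, reserved=()):
    """Keep words seen at least min_count times; ids start at 1 (0 = padding)."""
    counts = Counter(w for toks in token_lists for w in toks)
    kept = sorted(w for w, c in counts.items() if c >= min_count and w not in reserved)
    return {w: i + 1 for i, w in enumerate(list(reserved) + kept)}

def pad(ids, max_len):
    """Append zeros up to max_len."""
    return ids + [0] * (max_len - len(ids))

def prepare(pairs, max_rev_len, max_sum_len, min_count=4):
    seen, reviews, summaries = set(), [], []
    for review, summary in pairs:
        if not review or not summary or review in seen:
            continue  # drop null-valued or duplicate review-summary pairs
        seen.add(review)
        reviews.append(normalize(review))
        summaries.append(normalize(summary, drop_stop_words=False))
    rev_vocab = build_vocab(reviews, min_count)
    sum_vocab = build_vocab(summaries, min_count, reserved=(START, END))
    X, Y = [], []
    for rev, summ in zip(reviews, summaries):
        r = [rev_vocab[w] for w in rev if w in rev_vocab]   # remove rare (OOV) words
        s = [sum_vocab[w] for w in summ if w in sum_vocab]
        if not r or not s:
            continue  # pair became empty after OOV removal
        if len(r) > max_rev_len or len(s) + 2 > max_sum_len:
            continue  # assumption: over-long pairs are skipped
        X.append(pad(r, max_rev_len))
        Y.append(pad([sum_vocab[START]] + s + [sum_vocab[END]], max_sum_len))
    # Vocabulary size = number of tokens + 1 for the padding index 0.
    return X, Y, len(rev_vocab) + 1, len(sum_vocab) + 1
```

With the default `min_count=4`, any word seen three times or fewer is dropped, which matches the rare-word rule described above.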

¹ http://netlingo.com.
² https://fbicons.net/.


20.2.2 System Training
An encoder and a decoder are set up to train the system. Figure 20.1 shows the proposed encoder–decoder architecture.

Encoder
In sequence-to-sequence models, the encoder reads an input sequence and generates a corresponding fixed-length context vector. For this purpose, the encoder uses a series of LSTM units that read one token at a time. Generating a fixed-length vector from a sequence of arbitrary length is difficult, so several LSTM layers can be stacked on top of each other. The proposed system therefore uses a three-layer stacked LSTM, in which LSTM layers are stacked vertically and each layer consists of parallel running LSTM units. At each time step, one word is fed to the encoder to capture the contextual information present in the input sequence. The hidden state and cell state at the last time step of the encoder's final LSTM layer serve as the context vector that initializes the decoder. It has been observed empirically that deep stacked LSTMs learn the description more accurately than shallower ones. In particular, Sutskever et al. [10] reported a well-performing four-layer deep neural network architecture for machine translation.

Decoder
In the training phase, the decoder comprises a single LSTM layer, an attention layer, a concat attention layer and a dense layer. The decoder is trained to predict the next word in the sequence given the previous word. For this purpose, the decoder's initial state is taken from the final state of the encoder, and the decoder reads the target summary sequences as input. Every summary sequence is delimited by <START> and <END> tokens.
This is a supervised learning process in which the teacher forcing technique is used to train the model. Meanwhile, the attention mechanism learns the relation between context and sentiment words to optimize the performance of sentiment-based text summarization, loosely analogous to how a human reader focuses on salient words. The attention mechanism [17], back-propagation and the sparse categorical cross-entropy loss are incorporated to enhance learning. Initially, attention is distributed relatively uniformly throughout the sentence, as shown in Fig. 20.2a. As training progresses, however, the attention mechanism gradually focuses on the words that contribute to the sentiment, as shown in Fig. 20.2b. In the final result, the attention mechanism mainly highlights four words (i.e., “but,” “adaptation,” “fails” and “miserably”). Finally, the sentiment attention mechanism is combined with the conventional word embedding, which serves as the deep neural network's input layer to extract text structure information [18]. This combination improves the quality of the word embedding by avoiding blind learning. To overcome memory issues, sparse categorical cross-entropy is used as the loss function. It takes the integer-encoded targets directly, so they never need to be converted into one-hot vectors. An epoch refers to one forward and one backward pass over all the training instances. The proposed system is trained for a fixed number of epochs. However, early stopping, which monitors the loss on the validation data, is used to stop training at the right time.

Fig. 20.1 Training phase: the proposed encoder–decoder architecture

Fig. 20.2 Comparison of initial versus final attention
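The attention step can be illustrated with Bahdanau-style additive attention, the mechanism cited as [17]. This is a dependency-free sketch, not the authors' implementation. In practice the weight matrices `W1`, `W2` and vector `v` are learned; here they are supplied by hand, and the decoder state `s` and encoder states `H` are plain lists. The function returns the attention weights over the encoder time steps and the context vector. The concat attention layer would presumably join that context vector with the decoder output before the dense layer.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def additive_attention(s, H, W1, W2, v):
    """score_j = v . tanh(W1 h_j + W2 s); weights = softmax(scores)."""
    q = matvec(W2, s)
    scores = [sum(vi * math.tanh(a + b) for vi, a, b in zip(v, matvec(W1, h), q))
              for h in H]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]    # softmax over encoder time steps
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(a * h[d] for a, h in zip(weights, H)) for d in range(len(H[0]))]
    return weights, context

# With all-zero weights every score is equal, so attention is uniform.
Z = [[0.0, 0.0], [0.0, 0.0]]
weights, context = additive_attention([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], Z, Z, [1.0, 1.0])
decoder_output = [0.3, -0.1]               # hypothetical decoder LSTM output
concat = decoder_output + context          # input to the dense softmax layer
```

The usage lines show that equal scores give uniform weights (here `[0.5, 0.5]`). This loosely mirrors the near-uniform initial attention in Fig. 20.2a. Learned weights make the scores unequal, so the softmax concentrates on a few time steps.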

20.2.3 System Testing
After training, the model is tested on new review sequences whose target summaries are unknown. For this purpose, a testing (inference) architecture is set up. Its encoder reuses the trained model to encode the review sequences. Its decoder is initialized with the encoder's output and generates the output summary. The following steps are performed in the testing phase to decode a target sequence:
(1) Encode the input source sequence and feed the encoder's final hidden-state and cell-state information to the decoder.
(2) Pass the <start> token as an input to the decoder.
(3) Run the decoder for one time step with the internal state information and the trained model.
(4) Calculate the probability of every possible token and select the token with the maximum probability as the output.
(5) Update the internal state information, and pass the last output token as the input to the decoder for the next time step.
(6) Repeat steps 3–5 until the <end> token is generated or the maximum permitted length of the target sequence is reached.
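Steps (1) to (6) amount to greedy decoding. The sketch below assumes two hypothetical callables that stand in for the trained inference models, and neither name comes from the paper. `encode(review_ids)` returns the encoder outputs and the initial state. `step(token_id, encoder_outputs, state)` returns a probability list over the summary vocabulary plus the updated state.

```python
def greedy_decode(encode, step, review_ids, start_id, end_id, max_len):
    """Greedy decoding loop following testing steps (1)-(6)."""
    enc_out, state = encode(review_ids)             # (1) encode the review
    token, summary = start_id, []                   # (2) feed <start>
    for _ in range(max_len):                        # (6) length limit
        probs, state = step(token, enc_out, state)  # (3) one decoder step
        token = max(range(len(probs)), key=probs.__getitem__)  # (4) argmax
        if token == end_id:                         # (6) stop at <end>
            break
        summary.append(token)                       # (5) token is fed back next step
    return summary

# Toy stand-ins: the "model" simply emits a scripted token sequence.
script = [5, 7, 2]

def encode(ids):
    return ids, 0

def step(token, enc_out, t):
    probs = [0.0] * 8
    probs[script[t]] = 1.0
    return probs, t + 1

print(greedy_decode(encode, step, [11, 12], start_id=1, end_id=2, max_len=10))  # [5, 7]
```

Picking the argmax at each step is the simplest strategy. Beam search is a common alternative, but the steps above describe greedy selection.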

20.3 Experimental Design
This section contains a detailed description of the dataset and experimental setup used.


20.3.1 Dataset Description and Evaluation Methods
The proposed system has been trained using the Amazon Fine Food Reviews dataset of the Stanford Network Analysis Project (SNAP).³ The dataset spans more than ten years and includes all reviews up to October 2012, more than 500,000 in total. Each review includes product and user information, a rating, the plain-text review and its corresponding summary. We used ROUGE-1 and ROUGE-2 recall scores to evaluate our system. Recall measures how many of the words (ROUGE-1) or word bigrams (ROUGE-2) in the actual summary are recovered by the obtained summary. We also used human evaluators.
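For reference, ROUGE-N recall [16] is the number of overlapping n-grams (clipped by their counts) divided by the number of n-grams in the actual summary. The sketch below is simplified and assumes lowercased whitespace tokenization with no stemming. The official ROUGE package offers more preprocessing options, so its scores may differ slightly. Table 20.1 appears to report these recall values as percentages.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total

print(rouge_n_recall("great taste", "great taste and price"))       # 0.5
print(rouge_n_recall("great taste", "great taste and price", n=2))  # 0.333...
```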

20.3.2 Experimental Setup
We used the following experimental setups to train, test and analyze the system's performance from different perspectives.
1. Initially, we trained the system using a single-layer LSTM in the encoder and decoder without an attention mechanism. This model (system-1) used 100,000 randomly selected preprocessed review–summary pairs, split in a ratio of 8:1:1 for training, validation and testing.
2. We then retrained the system using stacked LSTMs in the encoder and decoder with an attention mechanism. This model (system-2) was trained and tested on the same data as the initial system. We also saved the trained model after each epoch. Each of these models was tested on the same test reviews, and each obtained summary was evaluated with ROUGE against the corresponding actual summary. This setup shows how the system's behavior changes as the number of epochs increases. It also shows the advantages of using stacked LSTMs and the attention mechanism.
3. Taking the second system as the baseline, we performed three more runs. In the first run (system-2a), we trained on 10,000 and validated on 1000 randomly selected review–summary pairs, and tested on 1000 review–summary pairs that were not selected for training. In the second (system-2b) and third (system-2c) runs, we trained, validated and tested on 52,000 randomly selected review–summary pairs split in a ratio of 10:2:1. In the third run, however, only positive reviews were used for training and validation.
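The helper below makes the split ratios concrete. It is a hypothetical sketch, since the paper does not say how its random split was implemented. It shuffles the pairs with a fixed seed and cuts them by an integer ratio. With 100,000 pairs at 8:1:1 it yields 80,000/10,000/10,000 pairs, and with 52,000 pairs at 10:2:1 it yields 40,000/8,000/4,000.

```python
import random

def split_by_ratio(pairs, ratio=(8, 1, 1), seed=0):
    """Shuffle, then split into train/validation/test by an integer ratio."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    total = sum(ratio)
    n_train = len(pairs) * ratio[0] // total
    n_val = len(pairs) * ratio[1] // total
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]

for n, ratio in [(100_000, (8, 1, 1)), (52_000, (10, 2, 1))]:
    print(n, ratio, [len(part) for part in split_by_ratio(range(n), ratio)])
```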

³ https://www.kaggle.com/snap/amazon-fine-food-reviews.


20.4 Result Analysis
This section analyzes the performance of each system described in Sect. 20.3.2 and discusses some of the generated summaries. To analyze performance, we evaluated the systems while varying different parameters, the dataset size, etc. The obtained ROUGE-1 and ROUGE-2 recall scores are presented in Table 20.1. The table shows that the obtained summaries are similar to the actual summaries to a large extent. However, the ROUGE score does not indicate how accurately the system identifies the expressed sentiments, so we also evaluated the system with human evaluators. After analyzing the ROUGE recall scores and the human evaluators' responses, we made the following observations:
1. The ROUGE scores in the table show that the attention mechanism helps improve the results. A large dataset with diversified content plays a vital role, along with core system components such as the attention mechanism, the rare-word handler and the stacked LSTM. Some of the results are shown in Fig. 20.3. However, the human evaluators reported that the proposed system (system-2) produced better summaries than the other obtained summaries.

Table 20.1 Experimental results of our system
System used   ROUGE-1 recall score   ROUGE-2 recall score
System-1      39.89                  8.56
System-2      65.57                  11.93
System-2a     63.89                  11.25
System-2b     45.03                  9.13
System-2c     57.25                  10.33

Fig. 20.3 Obtained summaries


Fig. 20.4 Obtained poor summaries when only positive opinions are fed to the system

Fig. 20.5 Validation loss example

2. Because of the rare-word handling mechanism, the systems do not produce any unknown token “<UNK>,” but they sometimes produce wrong sentiments. The human evaluators identified this problem. On analysis, we found that the system learns the sentiments from the dataset itself, because this is a supervised approach. The system therefore needs to be fed data containing all sentiments (positive, negative and neutral). System-2c was fed almost exclusively positive opinions, with only two or three negative ones. It therefore treated negative words as rare words, and these rare words were discarded during training. Consequently, when negative opinions were fed in the testing/inference phase, the system treated the negative words as rare words. Because of the rare-word handling mechanism, it decoded the most probable words in their place.


Some of these outputs are shown in Fig. 20.4. Such issues can be solved by training the system on a large dataset containing all types of opinions.
3. We also observed that the early stopping mechanism helped reduce training time. Training was configured for 100 epochs with a batch size of 128. However, the validation-based early stopping mechanism stopped training once the validation loss stopped improving. Figure 20.5 shows an example.
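The early stopping rule can be sketched as follows. The patience (how many non-improving epochs are tolerated) and the minimum improvement `min_delta` are assumptions, since the paper does not report them. If the system was built with Keras, which the paper does not state, the equivalent is the `tf.keras.callbacks.EarlyStopping` callback with `monitor="val_loss"`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0   # validation loss improved: reset counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch       # stop before the configured epoch budget
    return len(val_losses)         # ran the full budget (e.g. 100 epochs)

losses = [0.90, 0.70, 0.62, 0.64, 0.66, 0.65, 0.60]
print(early_stopping_epoch(losses, patience=3))  # 6
```

In this example the loss at epoch 7 would have improved, but training has already stopped. This illustrates why patience is usually tuned rather than set to 1.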

20.5 Conclusion
Sentiment-based summarization is essential, since it significantly affects the decision-making of individuals and organizations. An abstractive text summarization approach using an attention-oriented LSTM model serves this purpose. Different experimental setups were designed to analyze its performance, and our systems were trained, validated and tested on Amazon review–summary pairs. A close analysis of the models and their outputs led us to conclude that the attention-oriented LSTM model can produce relevant summaries and can deal with rare words to a large extent. However, the system was tested only on a dataset of short English review–summary pairs, whereas in reality reviews are often multilingual. Cross-lingual summaries could therefore serve this purpose better, and they will be the direction of our future work.

References
1. Liu, M., Fang, Y., Choulos, A.G., Park, D.H., Hu, X.: Product review summarization through question retrieval and diversification. Inf. Retrieval J. 20(6), 575–605 (2017)
2. Yu, N., Huang, M., Shi, Y., Zhu, X.: Product review summarization by exploiting phrase properties. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1113–1124 (2016)
3. Amarouche, K., Benbrahim, H., Kassou, I.: Customer product review summarization over time for competitive intelligence. J. Autom. Mobi. Robot. Intell. Syst. 12 (2018)
4. Shuai, W., Xiang, Z., Bo, L., Bin, G., Daquan, T.: Integrating extractive and abstractive models for long text summarization, pp. 305–312. IEEE (2017)
5. Hong, M., Wang, H.: Research on customer opinion summarization using topic mining and deep neural network. Math. Comput. Simul. 185, 88–114 (2021)
6. Jana, E., Uma, V.: Opinion mining and product review summarization in e-commerce. In: Trends and Applications of Text Summarization Techniques, pp. 216–243. IGI Global (2020)
7. Khan, A., Gul, M.A., Zareei, M., Biswal, R., Zeb, A., Naeem, M., Saeed, Y., Salim, N.: Movie review summarization using supervised learning and graph-based ranking algorithm. Comput. Intell. Neurosci. 2020 (2020)
8. Ramadhan, M.R., Endah, S.N., Mantau, A.B.J.: Implementation of TextRank algorithm in product review summarization. In: 2020 4th International Conference on Informatics and Computational Sciences (ICICoS), pp. 1–5. IEEE (2020)
9. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
10. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks, pp. 3104–3112. Curran Associates, Inc. (2014)
11. Klein, G., Kim, Y., Deng, Y., Senellart, J., Rush, A.M.: OpenNMT: open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810 (2017)
12. Rajesh, P., U.V., K., Jayashree, P.: Machine learning in evolving connectionist text summarizer. In: Anti-Counterfeiting, Security, and Identification in Communication, 2009, pp. 539–543 (2009)
13. Sainath, T.N., Vinyals, O., Senior, A., Sak, H.: Convolutional, long short-term memory, fully connected deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580–4584 (2015)
14. Boorugu, R., Ramesh, G.: A survey on NLP based text summarization for summarizing product reviews. In: 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 352–356. IEEE (2020)
15. Wang, Y., Zhang, J.: Keyword extraction from online product reviews based on bi-directional LSTM recurrent neural network. In: 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp. 2241–2245. IEEE (2017)
16. Lin, C.-Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, vol. 8. Barcelona, Spain (2004)
17. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
18. Tian, Y., Yu, J., Jiang, J.: Aspect and opinion aware abstractive review summarization with reinforced hard typed decoder. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2061–2064 (2019)

Chapter 21

A Fuzzy-Based Multiobjective Cat Swarm Optimization Algorithm: A Case Study on Single-Cell Data

Amika Achom, Ranjita Das, Pratibha Gond, and Partha Pakray

Abstract
MicroRNA and single-cell expression data embed both relevant and irrelevant information about genes. Finding hidden gene expression patterns helps us understand the functionality of a gene at the molecular level. In particular, gene expression patterns vary under the effect of external stimuli. In this situation, a soft clustering technique helps because it can detect multiple patterns of a gene at once. In this work, a soft clustering approach that combines fuzzy clustering with the cat swarm optimization (CSO) algorithm is presented to find optimal cluster centers. The clustering task is formulated as the simultaneous optimization of multiple cluster validity indexes. The proposed clustering algorithm (multiobjective fuzzy-based CSO) is compared with several state-of-the-art clustering techniques on different gene expression datasets. A case study is conducted to identify the transcriptome or marker genes in each cell type of human urinary bladder single-cell RNA sequencing data. Experimental analysis shows the effectiveness of the multiobjective fuzzy-based CSO clustering algorithm over other state-of-the-art clustering algorithms such as AL, K-means, GA-FCM, DE-FCM, PSO-FCM and GWO-FCM.

A. Achom · R. Das (B) · P. Gond · P. Pakray
National Institute of Technology, Mizoram, Aizawl, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_21

21.1 Introduction
Clustering is an unsupervised machine learning technique that organizes gene points exhibiting similar expression patterns into the same cluster and dissimilar gene points into different clusters. It is widely applied in data mining, image segmentation, pattern recognition, machine learning and bioinformatics. However, several questions arise in the clustering task. How can the number of clusters in a given dataset be detected? How can an optimal partitioning of the data be determined? Which cluster validity indexes are appropriate as objective functions? How can the effectiveness of a clustering solution be measured?

In recent years, several clustering approaches have been developed that exploit the search capability of nature-inspired algorithms. These include the genetic algorithm (GA) [1], the differential evolution (DE) algorithm [2], particle swarm optimization (PSO) [2], ant colony optimization (ACO) [3] and artificial bee colony (ABC) [4]. However, all of these approaches tend to converge prematurely to local optima and to converge very slowly. Das and Saha clustered miRNA gene expression data using fuzzy clustering with the PSO [2] and DE [2] algorithms. Recently, different variants of the CSO algorithm have been adopted for clustering. Santosa et al. [5] and Razzaq et al. [6] used CSO to cluster real-life datasets by minimizing a single objective criterion. The problem with [5, 6] is that, although accurate, they take considerable time to cluster large datasets. In this work, therefore, a soft (fuzzy) clustering technique is adopted, which can work with fuzzy boundaries between the clusters. Sadeghi et al. applied ACO to find the optimal solution (i.e., cluster centers) of the clustering problem [3]. Karaboga et al. successfully used ABC to solve the clustering problem [4]. Kumar et al. [7] applied the grey wolf optimizer (GWO) to cluster real-life datasets such as iris, wine, CMC, Bupa, Haberman and glass. All of these clustering frameworks minimize a single objective criterion, the sum of squared errors (SSE). The problem with this approach is that it requires a priori knowledge of the data to be classified. Moreover, the clustering algorithms proposed in [3, 4, 7] and the CSO variants used in [5, 6] adopt hard clustering, which assumes well-defined boundaries between the clusters. Soft clustering, in contrast, does not require this assumption to describe the data.
To solve the above problems, this article develops novel fuzzy-based clustering approaches that use the search capability of the CSO algorithm. The proposed fuzzy-based CSO algorithm can efficiently search for local and global solutions because of its adaptive bidirectional search in the seeking and tracing modes. The CSO algorithm improves the exploratory search capability of each cat (candidate solution) and prevents the cats from getting trapped around local minima. Moreover, it iteratively improves and refines the current candidate solutions around local and global minima. The proposed clustering algorithm minimizes two cluster validity indexes simultaneously, instead of the single objective function used in [5–7]. In the first step, fuzzy clustering generates an optimal partitioning of the data. In the second step, the cats refine the cluster centers in the local and global search space simultaneously to generate optimal cluster centers for clustering. The multiobjective fuzzy-based CSO algorithm is then compared on different datasets with several state-of-the-art clustering approaches:
• average linkage (AL) [2]
• FCM [1]
• genetic algorithm-based fuzzy clustering (GA-FCM)
• differential evolution-based fuzzy clustering (DE-FCM) [1]
• PSO-based fuzzy clustering (PSO-FCM) [2]
• multiobjective genetic algorithm fuzzy clustering (MOGAFC) [1]
• multiobjective differential evolution fuzzy clustering [1]

The rest of the paper is organized as follows. Sect. 21.2 describes the CSO algorithm, Sect. 21.3 describes the proposed clustering algorithm, Sect. 21.4 presents the experiments, and Sect. 21.5 concludes the paper.

21.2 Cat Swarm Optimization
Cat swarm optimization (CSO) [8, 9] is a swarm intelligence-based optimization algorithm developed by Chu et al. in 2006 for solving single-objective problems. In this paper, the algorithm is adapted to solve the real-valued multiobjective clustering problem.

21.2.1 Seeking Mode
Seeking mode (SM) models the behavior of cats during their resting and alert periods, in which a cat rests and observes its surroundings before catching prey. The important parameters of seeking mode are the seeking memory pool (SMP), the count of dimensions to change (CDC), the seeking range of the selected dimension (SRD) and self-position considering (SPC). Seeking mode proceeds as follows:
1. SMP copies of the current cat position cat_k are made.
2. For each copy, CDC dimensions are selected at random, and the current position is mutated in those dimensions by adding or subtracting a random fraction of SRD:
$$X_{j,D}^{\mathrm{next}} = (1 \pm \mathrm{rand} \cdot \mathrm{SRD})\, X_{j,D}^{\mathrm{current}} \tag{21.1}$$
where $X_{j,D}^{\mathrm{current}}$ and $X_{j,D}^{\mathrm{next}}$ denote the current and next positions of a cat, D denotes the dimension, and j is the total number of cats in seeking mode.
3. The fitness value is calculated for all candidate cat positions.
4. One of the candidate points is selected as the next position according to its selection probability:
$$P_i = \frac{|FS_i - FS_b|}{FS_{\max} - FS_{\min}}, \quad 0 < i < j$$
If the fitness values of all candidate positions are equal, the probability of selecting each candidate point is 1.
5. If the objective is minimization, $FS_b = FS_{\max}$; otherwise, $FS_b = FS_{\min}$.
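A single seeking-mode move for one objective can be sketched as below, following steps 1 to 5. This is an illustrative sketch under four assumptions. A cat is a flat list of floats. The ± sign in Eq. (21.1) is chosen at random per dimension. The probabilities $P_i$ are used as roulette-wheel weights. SPC (keeping the current position as one candidate) is omitted. In the proposed multiobjective algorithm, the candidates are instead ranked by non-dominated sorting (Sect. 21.3.5).

```python
import random

def seeking_mode(cat, fitness, smp=5, cdc=2, srd=0.2, minimize=True, rng=random):
    """Return the next position of one cat after a seeking-mode move."""
    # Step 1: make SMP copies of the current position.
    copies = [list(cat) for _ in range(smp)]
    for copy in copies:
        # Step 2: mutate CDC randomly chosen dimensions as in Eq. (21.1).
        for d in rng.sample(range(len(cat)), cdc):
            copy[d] *= 1 + rng.choice((-1, 1)) * rng.random() * srd
    # Step 3: evaluate every candidate position.
    fs = [fitness(c) for c in copies]
    fs_max, fs_min = max(fs), min(fs)
    if fs_max == fs_min:
        probs = [1.0] * smp  # all fitness values equal: probability 1 each
    else:
        # Step 5: FS_b = FS_max when minimizing, FS_min when maximizing.
        fs_b = fs_max if minimize else fs_min
        probs = [abs(f - fs_b) / (fs_max - fs_min) for f in fs]
    # Step 4: pick the next position with probability proportional to P_i.
    return rng.choices(copies, weights=probs, k=1)[0]

def sphere(x):
    """Toy objective to minimize."""
    return sum(v * v for v in x)

print(seeking_mode([1.0, 2.0, 3.0], sphere, rng=random.Random(0)))
```

When minimizing, the worst candidate gets weight 0 and the best gets weight 1, so better candidates are favored but not chosen deterministically.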


21.2.2 Tracing Mode
Tracing mode (TM) represents the cat's quick movement, at its own velocity, while chasing prey. Initially, random velocity values are assigned to all dimensions of each cat's position, and these velocities are updated at every iteration of the algorithm. Tracing mode proceeds as follows:
1. The velocity of each cat k in dimension d is updated as $V_{k,d} = V_{k,d} + r_1 c_1 (X_{\mathrm{best},d} - X_{k,d})$, where $X_{\mathrm{best},d}$ is the position of the best cat, $r_1$ is a random number in [0, 1] and $c_1$ is a constant.
2. If a velocity value exceeds the maximum velocity, it is reset to the maximum velocity.
3. The position of each cat is updated as $X_{k,d} = X_{k,d} + V_{k,d}$.
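The tracing-mode update for one cat can be sketched as below. The default c1 = 2.0 and the symmetric clamp to [−v_max, v_max] are assumptions for illustration. The text only says that an out-of-range velocity is reset to the maximum.

```python
import random

def tracing_mode(position, velocity, best, c1=2.0, v_max=1.0, rng=random):
    """Move one cat toward the best cat, updating both lists in place."""
    for d in range(len(position)):
        r1 = rng.random()                                   # r1 in [0, 1)
        velocity[d] += r1 * c1 * (best[d] - position[d])    # step 1
        velocity[d] = max(-v_max, min(v_max, velocity[d]))  # step 2: clamp
        position[d] += velocity[d]                          # step 3
    return position, velocity

pos, vel = tracing_mode([0.0, 0.0], [0.0, 0.0], best=[1000.0, -1000.0], rng=random.Random(1))
print(pos, vel)
```

Because the best cat is far away in this example, both velocity components hit the clamp, and each coordinate moves by exactly v_max toward it.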

21.3 Proposed Multiobjective Fuzzy-Based Cat Swarm Optimization Algorithm for Clustering
The main steps of the proposed clustering approach are explained as follows.

21.3.1 Solution Representation and Population Initialization
Each cat is initialized with randomly selected data points from the dataset. These data points represent the coordinates of the cluster centers and together form a candidate solution vector. If every solution vector (cat) encodes C cluster centers in D-dimensional space, the length of each cat is D × C.
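A minimal sketch of this encoding, assuming that data points are lists of D floats. The function names are hypothetical. A cat is simply the concatenation of its C centers, and decoding cuts it back into pieces of length D.

```python
import random

def init_cat(data, C, rng=random):
    """Encode C randomly chosen data points as one flat cat of length D * C."""
    return [x for center in rng.sample(data, C) for x in center]

def decode_cat(cat, D):
    """Split a flat cat back into its C cluster centers of length D."""
    return [cat[k:k + D] for k in range(0, len(cat), D)]

# The three 4-D centers of Example 1 (Sect. 21.3.2) as one cat of length 12.
cat = [0.1, 0.2, 0.5, 0.3, 0.2, 0.5, 0.7, 0.1, 0.9, 0.8, 0.7, 0.3]
print(decode_cat(cat, D=4))
```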

21.3.2 Assignment of Gene Points to Different Clusters
To assign a data point $d_i$, $i = 1, 2, \ldots, N$, to a cluster, the following steps are executed:
1. The cluster centers encoded in the solution vector are extracted, for example (Example 1) CC1 = (0.1, 0.2, 0.5, 0.3), CC2 = (0.2, 0.5, 0.7, 0.1) and CC3 = (0.9, 0.8, 0.7, 0.3).
2. For each data point, the Euclidean distance to each cluster center is calculated.
3. The data point $d_i$ is assigned to the nearest cluster center $CC_k$, i.e., the one with the minimum Euclidean distance: $k = \arg\min_{j=1,\ldots,C} d_{\mathrm{EucDist}}(d_i, CC_j)$. Steps 2–3 are repeated for all N data points.
4. The centers of each solution vector are updated according to the FCM algorithm, as given in Eq. (21.3).
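Steps 2 and 3 can be sketched with the centers of Example 1 as follows. `math.dist` requires Python 3.8 or later, and the sample points are made up for illustration. The function returns 0-based cluster indices, so 0 corresponds to CC1.

```python
import math

CENTERS = [[0.1, 0.2, 0.5, 0.3],   # CC1 from Example 1
           [0.2, 0.5, 0.7, 0.1],   # CC2
           [0.9, 0.8, 0.7, 0.3]]   # CC3

def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

print(assign([[0.15, 0.25, 0.5, 0.3], [0.85, 0.8, 0.7, 0.3]], CENTERS))  # [0, 2]
```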

21.3.3 Execution of FCM Step
The fuzzy C-means (FCM) algorithm [10] is a soft clustering technique that associates a data point with more than one cluster through membership values. It partitions N data points into C clusters so as to minimize the within-cluster variance. Initially, it chooses C cluster centers. In every iteration, it updates the membership values of each data point using Eq. (21.2):
$$U_{k,i} = \frac{\left(1 / d_{\mathrm{EucDist}}(CC_k, x_i)\right)^{1/(m-1)}}{\sum_{j=1}^{C} \left(1 / d_{\mathrm{EucDist}}(CC_j, x_i)\right)^{1/(m-1)}}, \quad 1 \le k \le C,\ 1 \le i \le N \tag{21.2}$$
where $U_{k,i}$ is the membership value of the ith data point in the kth cluster and m > 1 is the fuzzifier. Subsequently, it updates the cluster centers using Eq. (21.3):
$$CC_k = \frac{\sum_{i=1}^{N} U_{k,i}^{m}\, x_i}{\sum_{i=1}^{N} U_{k,i}^{m}} \tag{21.3}$$
The computation of membership values and cluster centers is repeated for a fixed number of iterations or until the cluster centers no longer change between iterations.
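The FCM iteration of Eqs. (21.2) and (21.3) can be sketched as follows. The sketch follows Eq. (21.2) as printed, with exponent 1/(m − 1) on the plain Euclidean distance. Bezdek's standard FCM update derived from Eq. (21.4) uses 2/(m − 1) instead, which is equivalent to applying 1/(m − 1) to squared distances. The small `eps` guard against a zero distance and the stopping tolerance are assumptions.

```python
import math

def fcm_step(points, centers, m=2.0, eps=1e-12):
    """One FCM iteration: memberships by Eq. (21.2), centers by Eq. (21.3)."""
    C, N, D = len(centers), len(points), len(points[0])
    U = [[0.0] * N for _ in range(C)]
    for i, x in enumerate(points):
        d = [max(math.dist(c, x), eps) for c in centers]   # guard d = 0
        w = [(1.0 / dk) ** (1.0 / (m - 1.0)) for dk in d]
        total = sum(w)
        for k in range(C):
            U[k][i] = w[k] / total                          # Eq. (21.2)
    new_centers = []
    for k in range(C):
        um = [U[k][i] ** m for i in range(N)]
        new_centers.append([sum(um[i] * points[i][t] for i in range(N)) / sum(um)
                            for t in range(D)])             # Eq. (21.3)
    return U, new_centers

def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Repeat FCM steps until the centers move less than tol."""
    for _ in range(max_iter):
        U, new_centers = fcm_step(points, centers, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return U, centers

points = [[0.0], [1.0], [9.0], [10.0]]
U, centers = fcm(points, [[0.5], [9.5]])
```

Each column of U sums to 1, so every point distributes its membership across all clusters rather than belonging to only one.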

21.3.4 Objective Function
The objective functions used in this paper are described below. The first is
$$J_m = \sum_{k=1}^{C} \sum_{i=1}^{N} U_{k,i}^{m}\, d_{\mathrm{EucDist}}^{2}(CC_k, x_i) \tag{21.4}$$
where $d_{\mathrm{EucDist}}(CC_k, x_i)$ is the Euclidean distance of the ith data point from the kth cluster center $CC_k$. $J_m$ measures the within-cluster variance, and a lower $J_m$ value results in more compact clusters.
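Eq. (21.4) translates directly into code. This minimal sketch assumes U is stored as a C × N list of lists (`U[k][i]`) and uses m = 2 as an assumed default, since the paper does not state its value of m here.

```python
import math

def j_m(points, centers, U, m=2.0):
    """Fuzzy within-cluster variance J_m of Eq. (21.4)."""
    return sum(U[k][i] ** m * math.dist(c, x) ** 2
               for k, c in enumerate(centers)
               for i, x in enumerate(points))

# Two points at distance 1 from a single center, full membership: J_m = 2.
print(j_m([[0.0], [2.0]], [[1.0]], [[1.0, 1.0]]))  # 2.0
```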


The Xie-Beni (XB) index [1] is defined as the ratio of the compactness of the fuzzy partition to the separation between clusters. The compactness of the fuzzy partition (π) and the separation between the clusters are defined as:
$$\pi = \sum_{k=1}^{C} \sum_{i=1}^{N} U_{k,i}^{2}\, d_{\mathrm{EucDist}}^{2}(CC_k, x_i) \tag{21.5}$$
$$\mathrm{Separation} = \min_{i,j = 1,\ldots,C;\ i \neq j} d_{\mathrm{EucDist}}^{2}(CC_i, CC_j) \tag{21.6}$$
Then
$$XB = \frac{\pi}{N \cdot \mathrm{Separation}} \tag{21.7}$$
To obtain compact and well-separated clusters, π should be small and the separation should be large. Thus, the aim is to minimize the XB index.
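The XB index can be sketched in the same style. The sketch assumes that N in Eq. (21.7) is the number of data points and that there are at least two cluster centers. The example uses a crisp (0/1) membership matrix to keep the arithmetic easy to check: π = 4 × 0.5² = 1, Separation = 9² = 81, so XB = 1/(4 × 81) = 1/324.

```python
import math
from itertools import combinations

def xie_beni(points, centers, U):
    """XB index of Eqs. (21.5)-(21.7); smaller is better."""
    pi = sum(U[k][i] ** 2 * math.dist(c, x) ** 2                 # Eq. (21.5)
             for k, c in enumerate(centers) for i, x in enumerate(points))
    separation = min(math.dist(a, b) ** 2                        # Eq. (21.6)
                     for a, b in combinations(centers, 2))
    return pi / (len(points) * separation)                       # Eq. (21.7)

points = [[0.0], [1.0], [9.0], [10.0]]
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(xie_beni(points, [[0.5], [9.5]], crisp))  # 1 / 324
```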

21.3.5 Selection and Ranking of New Non-dominated Solutions for the Next Generation In a multiobjective optimization (MOO) [11] problem, two or more conflicting objective functions are optimized simultaneously. In this setting, no single solution can be considered better than all others with respect to every objective. To satisfy all the objective criteria, a set of optimal solutions is needed. These solutions are known as non-dominated or Pareto-optimal solutions [11]. Two or more non-dominated solutions can lie on the same Pareto front, i.e., have equal rank. In that case, the crowding distance module of the NSGA-II algorithm is executed to select the non-dominated solutions located in less crowded regions [11]. To select the best cat in SMP, the non-dominated sorting algorithm is applied to the population consisting of the parent solutions and the offspring solutions, i.e., the copies of the cats. Each non-dominated solution is assigned a rank, and the crowding distance algorithm is applied to select the least crowded solutions when several solutions share the same rank.
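The ranking and crowding logic borrowed from NSGA-II [11] can be sketched as below. This is an illustrative pure-Python version under stated assumptions: each candidate is scored by a tuple of objectives that are all minimized (here, J_m and XB), and `select` (a hypothetical helper) fills k slots front by front, breaking the last front by largest crowding distance.

```python
import math
from typing import Dict, List, Sequence

Objectives = Sequence[float]

def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objs: Sequence[Objectives]) -> List[List[int]]:
    """Fast non-dominated sorting: fronts of solution indices, best rank first."""
    n = len(objs)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                                     # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        nxt = []
        for p in fronts[-1]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front

def crowding_distance(objs: Sequence[Objectives], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each member of one front; boundary solutions get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        span = objs[ordered[-1]][m] - objs[ordered[0]][m]
        if span == 0:
            continue
        for j in range(1, len(ordered) - 1):
            dist[ordered[j]] += (objs[ordered[j + 1]][m] - objs[ordered[j - 1]][m]) / span
    return dist

def select(objs: Sequence[Objectives], k: int) -> List[int]:
    """Take whole fronts by rank, then the least crowded members of the front that overflows."""
    chosen: List[int] = []
    for front in non_dominated_fronts(objs):
        if len(chosen) + len(front) <= k:
            chosen.extend(front)
        else:
            d = crowding_distance(objs, front)
            chosen.extend(sorted(front, key=lambda i: d[i], reverse=True)[: k - len(chosen)])
            break
    return chosen
```

With objective pairs `[(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]`, the fronts are `[[0, 1, 2], [3], [4]]`, and `select(..., 2)` keeps the two boundary solutions 0 and 2.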

21.3.6 Clustering Optimization with CSO Algorithm The following steps are executed to find a globally optimal solution using the CSO algorithm:
1. |NP| cats are randomly generated and distributed in the D-dimensional space.

21 A Fuzzy-Based Multiobjective Cat Swarm Optimization …


2. According to the mixture ratio (MR), the cats are divided into seeking mode and tracing mode.
3. The objective functions are calculated using Eqs. (21.4) and (21.7) for each cat in the population.
4. The non-dominated sorting and crowding distance algorithms of NSGA-II [11] are executed to select SMP number of cats.
5. Each cat then moves according to its mode, seeking or tracing, as described in Sect. 21.2.
6. After the cats have gone through the seeking or tracing mode, they are randomly redistributed between the modes based on MR for the next iteration.
7. Steps 3–6 are repeated until the termination criterion is satisfied.
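Steps 2 and 6 only need a random split of the population by MR. The sketch below assumes (our reading of the original CSO formulation [8]) that MR is the fraction of cats placed in tracing mode; the function name and the use of Python's `random.Random` are our choices, not the paper's.

```python
import random
from typing import List

def assign_modes(num_cats: int, mr: float, rng: random.Random) -> List[str]:
    """Randomly put round(mr * num_cats) cats in tracing mode and the rest in seeking mode.
    Assumption: MR is read as the fraction of tracing-mode cats."""
    if not 0.0 <= mr <= 1.0:
        raise ValueError("mr must lie in [0, 1]")
    n_tracing = round(mr * num_cats)
    tracing = set(rng.sample(range(num_cats), n_tracing))
    return ["tracing" if i in tracing else "seeking" for i in range(num_cats)]

rng = random.Random(42)
modes = assign_modes(20, 0.1, rng)  # 2 cats in tracing mode, 18 in seeking mode
```

Calling `assign_modes` again at the start of every iteration gives the random redistribution of Step 6.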

21.3.7 Terminating Criteria When the termination criterion is met, the final cat position is returned as the optimal set of cluster centers.

21.4 Experimental Result 21.4.1 Dataset Description The proposed clustering algorithm is evaluated on several gene expression datasets: Arabidopsis thaliana [12], human fibroblasts serum [12], rat CNS [12], yeast cell cycle [12], and yeast sporulation [12]. Some publicly available datasets from the UCI Machine Learning Repository are also considered. As an application, we consider single-cell RNA-sequencing data of the human urinary bladder.

21.4.2 Discussion of Result The comparative performance of multiobjective fuzzy-based CSO and different state-of-the-art clustering algorithms is given in Tables 21.1 and 21.2. Cluster validity indices such as the silhouette index (SI), adjusted Rand index (ARI) and Minkowski score (MS) are used to validate the clustering results. From Table 21.1, it is observed that the amount of misclassification from multiobjective fuzzy-based CSO is significantly lower than that of PSO-FCM, GAFCM, and GWO-FCM. The proposed clustering algorithm also gives good ARI values on both the iris and breast cancer data (shown in Table 21.1). The multiobjective fuzzy-based CSO algorithm also gives a good partition of gene expression datasets such as yeast cell cycle (SI ≈ 0.51119), Arabidopsis thaliana (SI ≈ 0.43859), human fibroblast serum (SI ≈ 0.42231), and rat CNS (SI ≈ 0.6083).


Table 21.1 Average MS and ARI of different clustering algorithms on real-life datasets
Algorithm | Iris MS | Iris ARI | Wine MS | Wine ARI | Cancer MS | Cancer ARI | Glass MS | Glass ARI
AL [1] | 0.596 | 0.694 | 0.9116 | 0.403 | 0.445 | 0.756 | 0.745 | 0.520
FCM [1] | 0.4603 | 0.7832 | 0.7329 | 0.5321 | 0.3214 | 0.868 | 0.596 | 0.737
MOGAFC [1] | 0.270 | 0.920 | 0.353 | 0.858 | 0.090 | 0.970 | 0.319 | 0.881
MODEFC [1] | 0.276 | 0.910 | 0.382 | 0.847 | 0.096 | 0.967 | 0.290 | 0.910
Multiobjective fuzzy-based CSO | 0.707 | 0.568 | 0.743 | 0.860 | 0.321 | 0.877 | 0.917 | 0.021

Table 21.2 Average SI of different clustering algorithms on gene expression data
Algorithm | Yeast spore | Yeast cell | Arabidopsis | Serum | RatCNS
AL [2] | 0.5023 | 0.4378 | 0.3162 | 0.3576 | 0.414
K-means [2] | 0.6127 | 0.3785 | 0.3678 | 0.3628 | 0.3564
FCM [1] | 0.4696 | 0.3856 | 0.3665 | 0.3125 | 0.4132
MOGA [1] | 0.5754 | 0.4232 | 0.4023 | 0.3874 | 0.4993
DE [2] | 0.7116 | 0.4793 | 0.4405 | 0.4223 | 0.5271
PSO [2] | 0.6976 | 0.4420 | 0.4380 | 0.4130 | 0.5089
Multiobjective fuzzy-based CSO | 0.67845 | 0.51119 | 0.43859 | 0.42231 | 0.6083

Table 21.3 Different cell types and marker genes for human urinary bladder
Cluster ID | Cell types | Cluster distribution (%) | Marker genes or transcription factors
Cluster ID 1 | Endothelial cell | 1.8660 | Cd302
Cluster ID 2 | Luminal cell | 0.32930 | Lamp1
Cluster ID 3 | Fibroblast cell | 0.5488 | Mxra8, Cd34, Cd81, Pi16
Cluster ID 4 | Epithelial cell | 1.4270 | Cd55, Cyb5, Epcam
Cluster ID 5 | Basal cell | 17.3435 | Mrc2, Mxra7
Cluster ID 6 | Stromal fibroblast/myofibroblast/pericyte cell | 5.0493 | Moxd1
Cluster ID 7 | Stromal cell | 3.7321 | Cox5a, Cox5b, Cox6c
Cluster ID 8 | Urothelial cell | 65.971 | Enpp4, Epha1
Cluster ID 9 | Umbrella cell | 3.732 | Upk2, Upk3a, Hmox1

The CSO algorithm is effective in clustering microarray gene expression data. The SI values obtained on the different gene expression datasets are given in Table 21.2.


21.4.3 Case Study: Human Urinary Bladder A case study is conducted to identify transcription factors or marker genes from single-cell RNA-sequencing (scRNA-Seq) data of the human urinary bladder. Nine different clusters are obtained from this scRNA-Seq data. The clusters are then annotated with known cell types using existing work and the gene ontology database. The clustering result is given in Table 21.3, which lists all the annotated cell types and their corresponding marker genes or transcription factors.

21.5 Conclusion and Future Work This article presents a form of soft clustering using the multiobjective CSO algorithm. It is observed that the CSO-based clustering algorithm does not give good results on some high-dimensional data. In such cases, the dataset is transformed to a lower-dimensional representation using the PCA algorithm. Unexpectedly, this produces good results on some gene datasets. The performance of the proposed clustering technique has been compared with state-of-the-art clustering techniques reported in the literature. In the future, the proposed clustering module could be extended to investigate the genes that promote tumors in each cell cluster.
Acknowledgements The Principal Investigator (PI) acknowledges the Sunrise Project (Ref: NECBH/2019-20/178) under the North East Centre for Biological Sciences and Healthcare Engineering (NECBH) Twinning Outreach Programme hosted by the Indian Institute of Technology Guwahati (IITG), Guwahati, Assam, funded by the Department of Biotechnology (DBT), Ministry of Science and Technology, Govt. of India (number BT/COE/34/SP28408/2018), for providing the necessary financial support.

References
1. Saha, I., Maulik, U., Plewczynski, D.: A new multi-objective technique for differential fuzzy clustering. Appl. Soft Comput. 11(2), 2765–2776 (2011)
2. Das, R., Saha, S.: Gene expression classification using a fuzzy point symmetry based PSO clustering technique. In: 2015 2nd International Conference on Soft Computing and Machine Intelligence (ISCMI), pp. 69–73. IEEE (2015)
3. Dorigo, M., Birattari, M., Stutzle, T.: Ant colony optimization. IEEE Comput. Intell. Mag. 1(4), 28–39 (2006)
4. Karaboga, D., Basturk, B.: On the performance of artificial bee colony (ABC) algorithm. Appl. Soft Comput. 8(1), 687–697 (2008)
5. Santosa, B., Ningrum, M.K.: Cat swarm optimization for clustering. In: 2009 International Conference of Soft Computing and Pattern Recognition, pp. 54–59. IEEE (2009)
6. Razzaq, S., Maqbool, F., Hussain, A.: Modified cat swarm optimization for clustering. In: International Conference on Brain Inspired Cognitive Systems, pp. 161–170. Springer (2016)
7. Kumar, V., Chhabra, J.K., Kumar, D.: Grey wolf algorithm-based clustering technique. J. Intell. Syst. 26(1), 153–168 (2017)


8. Chu, S.C., Tsai, P.W., Pan, J.S.: Cat swarm optimization. In: Pacific Rim International Conference on Artificial Intelligence, pp. 854–858. Springer (2006)
9. Chu, S.C., Tsai, P.W., et al.: Computational intelligence based on the behavior of cats. Int. J. Innov. Comput. Inf. Control 3(1), 163–173 (2007)
10. Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
11. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
12. Maulik, U., Mukhopadhyay, A., Bandyopadhyay, S.: Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes. BMC Bioinform. 10(1), 1–16 (2009)

Chapter 22

KTM-POP: Transliteration of K-POP Lyrics to Marathi

Manisha Satish Divate

Abstract Considering the popularity of K-pop music in India, this paper proposes automatic transliteration of K-POP (Korean pop song) lyrics into Marathi (KTM-POP: Korean to Marathi POP). Native Marathi speakers in Maharashtra cannot easily pronounce Korean lyrics. Although Romanization of the Korean script is available, pronouncing Romanized text is difficult because the same Romanized letters can stand for multiple sounds. To support correct pronunciation of K-POP lyrics, this paper proposes transliterating the Korean lyrics into Marathi script. A bilingual script dictionary and hand-crafted rules are used for the transliteration. A pilot study shows successful transliteration of K-POP lyrics into Marathi, with a word accuracy of 82.5%.

22.1 Introduction

The popularity of Korean pop (K-pop) music and Korean drama (K-drama) has grown noticeably, both before the pandemic and since it began [1]. For vocal music, pronunciation of the lyrics is very important: voice modulation is easy to remember, but when singing in a foreign language, correct pronunciation of the lyrics matters most [2]. Here, the attempt is to transliterate K-pop lyrics into Marathi script so that Maharashtrians can enjoy singing along by reading the transliterated lyrics. Transliteration is an important natural language processing (NLP) task that converts text from a source script to a target script without changing its pronunciation, e.g., Korean: 안녕하세요 (translation: Hello); Romanization: annyeonghaseyo; Marathi transliteration: आन्नयोन्गंहाासेयो

M. S. Divate (B) Usha Pravin Gandhi College of Arts Science and Commerce, Vile Parle, Mumbai 400056, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_22


M. S. Divate

22.1.1 Types of Transliteration Transliteration approaches are classified as (1) phoneme-based, (2) spelling-based or grapheme-based, and (3) hybrid [3–5]. In the phoneme-based approach, a common phonetic representation is used to map the source word to the target word; the challenge here is that a source grapheme may have multiple phonetic representations. The grapheme-based (spelling-based) approach is an orthographic process that follows spelling rules: each source-language grapheme, the smallest unit of a writing system (e.g., a consonant or vowel), is mapped to a target-language grapheme. The hybrid approach combines the two.

22.1.2 Phonetic Representation In Marathi, letters such as प, फ, ब, त, थ and श have the same phonetic value irrespective of their position in a word, whereas in the Korean alphabet the phonetic value of some letters changes with their position in the syllable block. Example 1: if ㄱ appears in the first place in the block, its phoneme is [k, क]; otherwise, it is [g, ग].

22.1.3 Script Specification The Korean and Marathi scripts are entirely different, with no overlapping graphemes, and each language is written in a single script of its own. The Marathi script is richer than the Korean script, particularly in consonants, as shown in Table 22.1. In Korean, the phoneme represented by a consonant changes with its position. Vowels in the Korean script are either monophthongs (a single sound) or diphthongs (two vowel sounds). The phonology-based mapping of Korean to Marathi syllables used for transliteration is listed in Table 22.2. Both scripts are read and written from left to right (LTR). The syllable structure of both Marathi and Korean is C*V*, where C stands for a consonant and V for a vowel. The Marathi writing system is sequential, whereas Korean is written in syllable blocks, as shown in Fig. 22.1.

Table 22.1 Basic script of Korean and Marathi
Script | Consonants | Vowels
Korean | 14 | 21
Marathi | 38 | 14

22

KTM-POP: Transliteration of K-POP Lyrics to Marathi


Table 22.2 Korean-Marathi phoneme-based mapping (columns: Korean, Marathi, Remark)
Korean: ᅡ ᆫ ᅧ ᄒ ᅡ ᄉ
Marathi: nil न योन् ह आ स यो
Remark: No sound, Vowel, Consonant, Vowel, Consonant, Vowel, Consonant, Vowel, Vowel

Fig. 22.1 (a–e) Syllable block structure in Korean language (positions within each block labelled C-1, V-2, C-3 and C-4)

Example 2: The three letters ‘ᄏ’, ‘ᅩ’ and ‘ᆼ’ form the syllable block 콩, as represented in Fig. 22.1d. Its Marathi transliteration is कौंग, which in sequential form is क + ौ + ं + ग (CVCV). The research idea here is to help Maharashtrian vocalists, singers and music lovers read the lyrics in Devanagari. The rest of the paper is organized as follows. Section 2 surveys transliteration work. The methodology of the transliteration and the results are presented in Sects. 3 and 4, followed by the conclusion.

22.2 Literature Survey

In 1996, Andersen et al. [6] compared two tree-structured approaches to grapheme-to-phoneme conversion. Knight and Graehl [7] worked on transliteration and back-transliteration between English and Japanese (English→Japanese→English). The study conducted two experiments: one with a full language model and one with a personal-name language model. An accuracy of 64% was recorded for the back-transliteration of personal names, and a further 12% of names were phonetically correct but misspelled by the machine.


To support cross-lingual information retrieval over Korean technical documents that contain English words in transliterated form, Lee and Choi [8] proposed a statistical transliteration model. In the pivot-based method, an English word is first converted into its English pronunciation and then into a Korean word, whereas in the direct method, the English word is transformed directly into a Korean word. Kang and Choi [9] presented a decision-tree-based supervised learning model for transliteration that automatically aligns English and Korean characters; an accuracy of 99% was recorded. Oh et al. [10] designed a correspondence-based model built on grapheme–phoneme correspondences and reported a 15–41% performance improvement in English-to-Korean transliteration. A classification model for English-to-Tamil transliteration of person names was trained on 30,000 entries, with binary features such as centered, left- and right-side n-grams of the source word and start- and end-of-word symbols; the WEKA J48 classifier achieved an accuracy of 84.82% [11]. In the case of Hindi-to-English transliteration of proper nouns [12], the direct mapping model achieved an accuracy of 97%, using bi-gram to six-gram character mappings to train the system. The popular statistical machine transliteration method was used in [8, 12]. Along similar lines [13], a character-to-character mapping approach has been used for Punjabi-to-Hindi transliteration of nouns. The literature shows that the major contributions concern Romanization of East Asian, Indian and European languages. The method proposed here transliterates an East Asian language (Korean) into an Indian language (Marathi) using a grapheme–phoneme–phoneme–grapheme approach. Section 3 describes the methodology used for the transliteration.

22.3 Methodology

The transliteration process is expressed as a source language → target language (here, Korean → Marathi) mapping that does not change the phonetic sound of the source language. The Korean alphabet and its sounds are mapped onto the Marathi alphabet. As explained in Sect. 1, Marathi is rich in letters and phonetics. Each grapheme, along with its phonetic sounds, is mapped with the help of experts. The main contribution of this research is the creation of a bilingual corpus for 74 syllabaries. Figure 22.2 shows the block diagram of the transliteration process in detail.
Step 1: Korean sentence tokenization. Like Marathi, Korean has clear sentence and word boundaries, so sentences and words are tokenized with a simple split function.


Fig. 22.2 Block diagram of transliteration (Source sentence → Korean sentence tokenization → Read each syllable block → Phoneme-based syllabary alignment → Use phoneme-based mapping rules for ending syllable → Construct the target script → Framing a target sentence → Target sentence)

Step 2: Read syllable block. As discussed in Sect. 1, Korean words are written using syllable blocks. Each syllable block is read, and the consonants and vowels in it are identified.
Step 3: Syllabary alignment. Every consonant and vowel is aligned with the Marathi script on the basis of its phoneme, as shown in Table 22.2.
Step 4: Mapping rules for ending syllable. Some graphemes have two phonemes, chosen according to the grapheme's position in the syllable block; e.g., the grapheme ㅅ sounds [s, स] at the beginning of the block and [t, त] at the end, as shown in Table 22.3. In total, 20 hand-crafted rules are designed to map the phonemes of the source script to the target script. However, many graphemes are unused in transliteration.
Step 5: Target script. The target script is a sequential arrangement of the mapped graphemes.
Step 6: Sentence framing. The final sentence is constructed by combining the words in the same sequence.
The proposed method uses the direct method of transliteration, in which each Korean grapheme with its phonetic sound is mapped to the phonetically similar grapheme of the Marathi script. Figure 22.3 shows the outcome of applying the methodology. The next section presents the results and the challenges faced in transliteration.
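Steps 1–6 can be illustrated with a short Python sketch. It relies on Unicode canonical decomposition (NFD), which splits each precomposed Hangul syllable block into its leading consonant, vowel and optional trailing consonant, so Step 2 needs no hand-written parsing. The mapping tables here are hypothetical and deliberately tiny (only ㅇ, ㅅ, ㅎ, ㅏ, ㅣ); they are not the paper's Table 22.2, and trailing consonants and the Step 4 rules are left out.

```python
import unicodedata
from typing import Dict, Tuple

# Hypothetical sample entries, keyed by Unicode conjoining jamo (not the full paper table)
INITIALS: Dict[str, str] = {
    "\u110b": "",        # initial ㅇ: silent placeholder
    "\u1109": "\u0938",  # initial ㅅ -> स
    "\u1112": "\u0939",  # initial ㅎ -> ह
}
# Each vowel maps to (independent letter, dependent vowel sign)
VOWELS: Dict[str, Tuple[str, str]] = {
    "\u1161": ("\u0906", "\u093e"),  # ㅏ -> आ / ा
    "\u1175": ("\u0907", "\u093f"),  # ㅣ -> इ / ि
}

def is_hangul_syllable(ch: str) -> bool:
    # Precomposed Hangul syllable blocks occupy U+AC00..U+D7A3
    return "\uac00" <= ch <= "\ud7a3"

def transliterate_block(block: str) -> str:
    """Steps 2-5: decompose one syllable block (NFD), map each jamo, join sequentially."""
    out, after_consonant = [], False
    for jamo in unicodedata.normalize("NFD", block):
        if jamo in INITIALS:
            out.append(INITIALS[jamo])
            after_consonant = INITIALS[jamo] != ""
        elif jamo in VOWELS:
            independent, sign = VOWELS[jamo]
            # a vowel after a sounded consonant is written as a dependent sign (matra)
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            # trailing consonants and position rules are outside this sketch
            raise KeyError(f"no mapping for jamo U+{ord(jamo):04X}")
    return "".join(out)

def transliterate_sentence(sentence: str) -> str:
    """Step 1 (whitespace tokenization) and Step 6 (re-join words in order).
    Non-Hangul characters, e.g. English words, are copied unchanged."""
    words = []
    for word in sentence.split():
        words.append("".join(transliterate_block(ch) if is_hangul_syllable(ch) else ch
                             for ch in word))
    return " ".join(words)

print(transliterate_sentence("아하 하시 OK"))  # आहा हासि OK
```

The silent initial ㅇ yields an independent vowel, vowels after a sounded consonant become dependent signs, and the English token passes through unchanged.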

Table 22.3 Phoneme mapping rule based on position of syllable
Grapheme | Mapping rule
ㄹ → [र, ल] | If its position in the block is not last, then ‘र’, else ‘ल’
ᄉ → [त, स] | If it appears at the first place in the block, then ‘स’, otherwise ‘त’
ᆫ → [न, अन्] | If it appears at the first place in the block, then ‘न’; if it appears at the last position in the block, then ‘अन्’
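Because NFD decomposition gives initial and final consonants distinct code points (e.g., initial ᄉ is U+1109, final ᆺ is U+11BA), the position rules of Table 22.3 reduce to a lookup. A minimal sketch; the dictionary and function names are hypothetical, and `None` marks jamo the table does not cover.

```python
import unicodedata
from typing import Dict, List, Optional

# Table 22.3 as a lookup keyed by Unicode conjoining jamo.
# Initial consonants (U+1100-U+1112) and final consonants (U+11A8-U+11C2)
# have separate code points, so "first place" vs "last place" is explicit.
POSITION_RULES: Dict[str, str] = {
    "\u1105": "\u0930",              # initial ㄹ -> र
    "\u11af": "\u0932",              # final ㄹ -> ल
    "\u1109": "\u0938",              # initial ㅅ -> स
    "\u11ba": "\u0924",              # final ㅅ -> त
    "\u1102": "\u0928",              # initial ㄴ -> न
    "\u11ab": "\u0905\u0928\u094d",  # final ㄴ -> अन्
}

def apply_position_rules(syllable: str) -> List[Optional[str]]:
    """Decompose one syllable block and return the Table 22.3 mapping for each jamo,
    or None for jamo the table does not cover (e.g. vowels)."""
    return [POSITION_RULES.get(j) for j in unicodedata.normalize("NFD", syllable)]

print(apply_position_rules("살"))  # ['स', None, 'ल']
```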


Fig. 22.3 Transliteration of K-pop to Marathi

22.4 Result and Discussion

The accuracy of the model is measured by the number of correctly transliterated words. The pilot study considered 26 sentences from a K-pop song containing both Korean and English script. Removing the English text and duplicate sentences or words would reduce computation time; however, to maintain the rhythm, English words are not transliterated and duplicate sentences/words are still transliterated, which reduces post-production work. The precision of the model is defined as the ratio of correctly transliterated sentences to the total number of input sentences. Of the 126 words, 80 were unique. The transliterated words, their Romanization, and the Korean soundtrack with Romanized subtitles were given to a native Marathi speaker with a basic understanding of Korean. The word accuracy of the model is computed as

$$\text{Word Accuracy}\ (P) = \frac{\text{No. of correctly transliterated words}}{\text{Total transliterated words}}$$
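The word accuracy above is a simple ratio. A sketch that compares system output with reference transliterations word by word, assuming exact string match as the correctness criterion (the paper used a human judge instead); the helper name is hypothetical:

```python
from typing import Sequence

def word_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Word accuracy (%) = correctly transliterated words / total transliterated words * 100.
    A word counts as correct only if it matches its reference exactly."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty word lists")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)

print(word_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 75.0
```

If the 80 unique words were the denominator, 82.5% would correspond to 66 correct words; the text does not state which count was used.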

It was observed that 82.5% of the words were correctly transliterated. The performance of KTM-POP is compared with existing transliteration models in Table 22.4, which shows that the direct mapping method (phoneme–grapheme, character-to-character mapping) gives better results than statistical or machine learning methods. The remaining 17.5% of the words transliterated by KTM-POP could be improved by modifying the grammar rules; these errors stem from grammatical differences between the two languages. In Marathi, for example,


Table 22.4 Accuracy of transliteration
Systems | Dataset | Methodology | Word accuracy (%)
[12] Hindi to English | Proper nouns | Hybrid approach | 97
[14] Punjabi-to-Hindi transliteration of proper nouns | Proper nouns | Character-to-character mapping | 94
[13] Punjabi to Hindi | Proper nouns | Statistical machine transliteration | 87.72
KTM-POP | K-POP lyrics | Direct mapping | 82.5
[15] English-Chinese, Arabic-English | Proper nouns | Statistical methods of transliteration using MOSES | 37.8, 43
[16] English-to-Korean transliteration of documents | English technical words | Statistical machine transliteration and machine learning techniques | Statistical trans.: 40.7; Neural network: 35.1; Decision tree: 37.6

an explicit halant (्) is used to mark a half consonant (one lacking the inherent vowel), e.g., क (ka) + halant = क् (k, an incomplete sound). Judged against the Romanization, the transliteration in Example 1 below is correct. In the actual phonemes (voice modulation), however, [श] and [ग] are incomplete sounds (without a vowel).
Example 1: 거짓말과 [geojismalgwa] → गोजीशमालगवा
While singing, if [श, ग, वा] are pronounced as full phonemes (with their vowel sounds), the lyrics will sound different. Hence, the modified rule is: if two consonants [ग] and [व] are followed by a vowel, put a halant after the first consonant.
과 → [gwa] → ग वा → ग् + वा → ग्वा
That is, identify the vowel type and insert a halant after the consonant to frame the compound grapheme (Table 22.5, Sr. no. 1).
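The modified rule can be expressed as a small helper that joins a block's initial consonant with its mapped vowel. This sketch assumes (our assumption, not the paper's stated implementation) that a diphthong such as ㅘ [wa] is mapped to a glide plus vowel (वा), so the mapped vowel begins with a consonant letter and needs a halant before it; the function names are hypothetical.

```python
HALANT = "\u094d"  # Devanagari virama (halant)

def is_devanagari_consonant(ch: str) -> bool:
    # Basic consonant letters क (U+0915) through ह (U+0939); nukta forms are ignored here
    return "\u0915" <= ch <= "\u0939"

def frame_block(consonant: str, mapped_vowel: str) -> str:
    """Join a block's initial consonant with its mapped vowel. If the vowel was mapped
    to a glide plus vowel (e.g. ㅘ -> वा), insert a halant so the two consonants form
    a conjunct (ग + वा -> ग्वा) instead of ग keeping its inherent vowel."""
    if consonant and mapped_vowel and is_devanagari_consonant(mapped_vowel[0]):
        return consonant + HALANT + mapped_vowel
    return consonant + mapped_vowel

print(frame_block("ग", "वा"))  # ग्वा
```

Deciding on the mapped vowel, rather than on any pair of adjacent consonants in the word, keeps the halant inside one syllable block, which matches the expected output गोजीस्मालग्वा (no halant between ल and ग).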

Table 22.5 Challenges in transliteration
Sr No | Original Korean word | System transliterated | Phoneme sound of word (as per sound track) | Remark
1 | 거짓말과 (geojismalgwa) | गोजीशमालगवा | गोजीस्मालग्वा | Need to add more rules
2 | 첫 (cheot) | छोश | छोत |
3 | 무엇이 | मूओशी | मूओशी | Correct as per the sound track
4 | 이 사랑이 | | इ सारांगी | If it appears as the first letter in the word, then इ, otherwise


Table 22.6 Korean vowels represented as digraphs in Marathi
Korean | Marathi | Remark
ᅧ [yeo] | येओ | Digraph
ᅨ [ye] | ये | Digraph
ᅭ [yo] | यो | Digraph
ᅣ [ya] | या | Digraph

Example 2: Similarly, the phoneme of ㅅ is [t] when it appears at the end of a block, but this rule does not apply to the example in Table 22.5, Sr. no. 2. Marathi also has separate vowels for long and short pronunciations, and mapping those phonemes is difficult. As a general rule, only short vowels are used, to produce pronunciations of equal length. Korean vowels are represented as digraphs in Devanagari, as shown in Table 22.6; in the present study, the digraph matching the phoneme is used for each Korean vowel.

22.5 Conclusion

This paper presented a study of the Devanagari and Korean scripts for the purpose of transliteration. Transliteration is a challenging task, as every language is rich in vocabulary. Although a phonetic syllable mapping exists between the two languages, an understanding of the language is very important for phoneme representation. The results are encouraging, as phonological alignment of the phonemes improves grapheme-to-phoneme accuracy. Results should improve further by adding the micro-rules for target word construction discussed in Sect. 4. The current system is based on a bilingual dictionary of grapheme-to-phoneme alignment of syllables. An automatic machine-learning-based transliteration model with an evaluation system will be the future extension of the proposed study.

References
1. Bhatt, S.: How K-pop and Korean drama had their biggest breakthrough in India amid the pandemic (2020)
2. Ginsburg, J.: How to learn a song in a language you don’t know (2021). https://www.fluentin3months.com/learn-a-song/
3. Karimi, S., Scholer, F., Turpin, A.: Machine transliteration survey. ACM Comput. Surv. 43(3) (2011). https://doi.org/10.1145/1922649.1922654
4. Antony, P., Soman, K.: Machine transliteration for Indian languages: a literature survey. Int. J. Sci. Eng. 2(12), 1–8 (2011). [Online]. Available: http://scholar.google.co.in/scholar?q=machine+transliteration+for+indian+languages:+a+literature+survey&btnG=&hl=en&as_sdt=0,5#0
5. Oh, J.H., Choi, K.S., Isahara, H.: A comparison of different machine transliteration models. J. Artif. Intell. Res. 27, 119–151 (2006). https://doi.org/10.1613/jair.1999
6. Andersen, O., Kuhn, R., Lazarides, A., Dalsgaard, P., Haas, J., Noeth, E.: Comparison of two tree-structured approaches for grapheme-to-phoneme conversion. In: Proceeding of Fourth International Conference on Spoken Language Processing, vol. 3, pp. 1700–1703 (1996). https://doi.org/10.1109/icslp.1996.607954
7. Knight, K., Graehl, J.: Machine transliteration. Assoc. Comput. Linguist. 24(4), 599–612 (1997). [Online]. Available: https://www.aclweb.org/anthology/J98-4003.pdf
8. Lee, J.S., Choi, K.S.: English to Korean statistical transliteration for information retrieval. Comput. Process. Orient. Lang. 12(1), 17–37 (1998)
9. Kang, B.J., Choi, K.S.: Automatic transliteration and back-transliteration by decision tree learning. In: 2nd International Conference on Language Resources and Evaluation (2000)
10. Oh, J.H., Choi, K.S., Isahara, H.: A machine transliteration model based on correspondence between graphemes and phonemes. ACM Trans. Asian Lang. Inf. Process. 5(3), 185–208 (2006). https://doi.org/10.1145/1194936.1194938
11. Vijaya, M.S., Ajith, V.P., Shivapratap, G., Soman, K.P.: English to Tamil transliteration using WEKA. Int. J. Recent Trends Eng. 1(1), 1–3 (2009)
12. Kaur, V., Kaur Sarao, A., Singh, J.: Hybrid approach for Hindi to English transliteration system for Proper Nouns. Int. J. Comput. Sci. Inf. Technol. 5(1), 1–6 (2014). https://doi.org/10.5120/3356-4629
13. Josan, G., Kaur, J.: Punjabi to Hindi statistical machine transliteration. Int. J. Inf. Technol. 4(2), 459–463 (2011). [Online]. Available: http://www.csjournals.com/IJITKM/PDF4-2/Article_26.pdf
14. Malhan, E.S., Mann, E.J.: Punjabi to Hindi transliteration system for Proper Nouns using hybrid approach. 5(11), 6–10 (2015)
15. Matthews, D.: Machine transliteration of Proper names (2007)
16. Kang, I.-H., Kim, G.: English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks. 418–424 (2000). https://doi.org/10.3115/990820.990881

Chapter 23

A Comparative Study on the Use of Augmented Reality in Indoor Positioning Systems and Navigation

Aashka Dave and Rutvik Dumre

Abstract In today’s world, where smartphones are readily available, printed maps are outdated and have been replaced by mobile apps. These apps rely on technologies like GPS and Wi-Fi. Unfortunately, in enclosed spaces like shopping malls and colleges, these apps do not provide the best results. To tackle this problem of indoor navigation, multiple solutions using augmented reality (AR) have emerged over the last decade. In this paper, we discuss augmented reality and its application in indoor positioning and navigation. Furthermore, we compare these methods on attributes such as compatibility, requirements, availability, and usability. The comparison clearly states the advantages and limitations of each system.

23.1 Introduction An indoor positioning system (IPS) is a technology that helps locate people and objects within buildings. Location-based services and navigation can play an essential role in shopping malls, airports, schools and colleges, hospitals, and other indoor spaces, which may explain why indoor positioning systems are becoming popular in such fields. Indoor positioning systems need to deal with comparatively smaller areas and the presence of different objects, so they require higher precision and accuracy than outdoor positioning applications. In indoor spaces, the Global Positioning System (GPS) becomes unreliable because there is no line of sight to the GPS satellites. Thus, IPSs rely on other technologies, which fall into categories such as proximity-based systems, Wi-Fi-based systems, ultra-wideband (UWB) systems, and infrared (IR) systems (AlAmmar et al. [1]). Each of these systems, however, has its limitations. Most of these technologies still rely on the availability of a connection in the building; for example, a wireless technology would fail to work if there were no signal. Thus, there is a need for a system that does not rely entirely on such technologies.
A. Dave · R. Dumre (B) NMIMS’ Mukesh Patel School of Technology Management and Engineering, Mumbai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_23


A. Dave and R. Dumre

With the evolution of technology, many advancements are being made to improve the user experience. Augmented reality (AR) is one such field that has gained popularity in the last decade. AR is the real-time integration of real-world objects with digital information such as graphics, text, and other virtual objects (Carmigniani et al. [2], Azuma et al. [3]). The real world is enhanced by overlaying these digital elements to increase interactivity. The primary intent of AR is to enhance certain real-world features and improve the user's understanding of them, which in turn helps in real-life applications. Various domains, such as medical training, retail, design, and classroom education, currently use augmented reality to enhance and aid the experience. Most navigation systems in indoor environments still rely on signage and floor maps, which are not interactive. The focus of the current study is on augmented reality-based indoor positioning systems. Such systems use augmented reality to display location information to users, thus providing a more interactive environment. The main intent of this paper is to examine AR-based indoor positioning systems to determine their efficiency. We also explore various applications of augmented reality in navigation in both indoor and outdoor environments. These applications are built using different commercially available software development kits (SDKs). Through our study, we analyze these systems to determine their suitability for various applications. The rest of the paper is organized as follows: Sect. 23.2 briefly discusses indoor positioning systems and their current limitations; Sect. 23.3, augmented reality; Sect. 23.4, AR-based indoor positioning systems; Sect. 23.5, the inferences drawn; and Sect. 23.6, the conclusion and future scope.

23.2 Indoor Positioning System Indoor positioning systems (IPSs) help locate objects or people in an indoor environment. Compared to the outdoor environment, indoor environments are more complex because of the presence of various objects such as people, walls, and pieces of equipment, which can scatter the signal. Thus, an IPS requires much higher precision and accuracy than outdoor positioning systems to deal with smaller areas and various objects.
Limitations of Existing Technologies
1. Infrared
– Cannot penetrate walls and other solid objects and, hence, can only be used in smaller spaces
– Direct sunlight or fluorescent light can interfere with the signal, so it cannot be used in such locations
2. WLAN
– Many restrictions on localization

23 Use of Augmented Reality in Indoor Positioning …


– Accuracy varies depending on the device's sensors
– Changes in the environment and/or signal strength can degrade performance
3. Bluetooth
– Has a relatively short range
– Requires additional hardware and has higher installation costs
– Other wireless signals such as Wi-Fi can cause significant interference
4. RFID
– RFID readers need to be positioned at regular intervals, which increases the installation cost
– Accuracy is heavily dependent on the positioning of the tags
– Signal strength may drop between the antenna and the tag

These technologies thus require a constant connection within the indoor environment and, at the same time, have additional hardware requirements. This creates the need for a more sophisticated system that not only eliminates these concerns but also improves the overall experience.

23.3 Augmented Reality

Augmented reality refers to any technology that augments the user's visual (and sometimes auditory) perception of their environment. It renders digital images or data onto real-world objects in real time, thereby enhancing the user's experience. It combines computer graphics, multimedia technology, digital image processing, and other areas. Unlike virtual reality (VR), which replaces the user's real-world experience with a simulated one, AR augments the real-world scene, as seen in games like Pokémon Go or on retail websites that give users an immersive digital experience and engage them in a significant way. Milgram et al. [4] illustrated this difference through their reality–virtuality continuum, shown in Fig. 23.1. At the two ends of the continuum lie the real world and the completely virtual experience; the middle region is called mixed reality. Augmented reality lies toward the real-world end of the spectrum, where the real world is augmented by computer-generated data [4]. There are two types of AR systems: marker-based and markerless. Marker-based AR systems use computer vision to detect pre-defined markers. These markers are specific image patterns present in the environment that have distinctive points, such as logos, posters, or even quick response (QR) codes. Markerless AR, on the other hand, does not require specific markers. Instead, it uses technologies like GPS, Wi-Fi, or Bluetooth to determine the user's location in the environment, and it places virtual objects based on the actual features present in the environment instead of special identifying markers. The biggest drawback of such systems is that they require additional hardware, which increases the cost.

A. Dave and R. Dumre

Fig. 23.1 Reality–Virtuality continuum

Many applications of AR have emerged in recent years, one of which is navigation. The benefit of AR-based navigation is an enhanced user experience. Most signage and floor plans currently in use are either hard to read or not visually appealing. AR overcomes this problem by giving users a more immersive and interactive experience: navigational instructions are provided through different kinds of graphics and textual information. Unlike 2D maps or floor plans, AR places the user within the indoor environment, where the instructions can be experienced in real time simply by looking at the surroundings.

23.3.1 Applications of AR in Navigation

Yuanmingyuan is a relic park containing some of the few relics that survived historical burning and looting. To make a visit to these relics more engaging, a game-based guidance application was proposed in which a time-travel game is implemented using AR [5]. Another study presented the results of a user study comparing classic digital maps, handheld AR browser navigation, and a combination of AR and maps [6], and reported its results, observations, and issues. Augmented reality is also helping to revolutionize health care. Chen et al. [7] proposed a local navigation system based on augmented reality with enhanced arthroscopic information. They localized anatomical structures during operations using arthroscopic images and arthroscope calibration, and developed a system that presents anatomical data of the operated area without requiring any glasses. Another study addressed corneal disease, currently one of the primary causes of blindness worldwide, and deep anterior lamellar keratoplasty (DALK), a widely applied technique for corneal transplantation. The positions of the stitch points strongly affect the success of such surgery, which therefore requires precise control over the surgical instruments [8]. Quantitative and qualitative user studies and experimental evaluations indicated accurate real-time tracking and detection of the corneal contour, even in the presence of disturbances.


Narzt et al. [9] proposed treating the car itself as an AR system, using the windshield to render the outside world and superimpose virtual information. A camera detects the driver's eye position, and information is displayed relative to it, so the windshield serves as a see-through AR display. The method requires no additional device: the user does not need to carry any electronic device to view AR information, and the driver can see junctions that may be hidden by other vehicles. Most importantly, the driver does not have to switch between checking the navigation and driving. Budhi and Adipranata [10] proposed a system that searches for public facilities using augmented reality. It makes use of the mobile phone's camera, GPS, and other sensors. The user can point the phone in a particular direction to find a desired public facility, such as a gas station or restaurant, and see detailed information such as the location of and distance to the selected facility. The application can also show the route from the user's position to the facility.

23.4 Augmented Reality-Based Indoor Positioning Systems

Augmented reality-based indoor positioning systems use vision-based techniques to locate the user. These systems either construct models of the locations from the floor plans and layout or store images of them. The image sequence received from the device is then matched against these models or stored images to determine the position. Figure 23.2 shows the overall block diagram of such systems. The system first captures an image sequence through the device's camera. These images are then processed to find markers. Markers, as described earlier, are specific patterns with distinctive points; they act as features of objects present in the environment, such as logos or posters. These objects are also referred to as target images. Once detected, the markers are matched against the stored image sequences, which can be image sets or 3D models. Every system has a location model that maps the images to particular locations and stores location-related information. Once the image sequence is matched, the model recognizes the location and fetches the related information, and the system overlays it on the existing scene using AR. This information ranges from simple text and 2D graphics to 3D objects showing navigational instructions. Augmented reality SDKs such as ARKit, ARToolkit, and ARCore, to name a few, automate the marker detection process: target images can be stored in a database and used to match the markers for image sequence mapping. Choosing appropriate target images helps improve the overall accuracy of such systems. Target images with a sufficient number of marker points and features can be distinguished from similar ones, which increases accuracy and reduces the chance of misclassification.
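The matching step described above (detect features in a camera frame, match them against stored target images, then look the winner up in a location model) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how ARKit, ARCore, or ARToolkit implement it: each target image is assumed to be already reduced to a list of binary feature descriptors (toy 16-bit integers here, in the spirit of ORB-style binary descriptors), descriptors are compared by Hamming distance with a ratio test, and `LOCATION_MODEL`, `match_count`, `locate`, and all thresholds are hypothetical names and values chosen for the example.

```python
from typing import Dict, List, Optional

# Hypothetical location model: target-image ID -> location-related information.
LOCATION_MODEL: Dict[str, str] = {
    "cafeteria_logo": "Ground floor, cafeteria entrance",
    "library_poster": "First floor, library corridor",
}


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_count(frame: List[int], target: List[int],
                max_dist: int = 3, ratio: float = 0.8) -> int:
    """Count frame descriptors that match the target unambiguously.

    A frame descriptor counts as a match when its nearest target descriptor
    is within max_dist bits AND clearly closer than the second-nearest one
    (a Lowe-style ratio test), so ambiguous, repetitive features are rejected.
    """
    good = 0
    for d in frame:
        dists = sorted(hamming(d, t) for t in target)
        best = dists[0]
        second = dists[1] if len(dists) > 1 else float("inf")
        if best <= max_dist and best < ratio * second:
            good += 1
    return good


def locate(frame: List[int], targets: Dict[str, List[int]],
           min_matches: int = 3) -> Optional[str]:
    """Return location info for the best-matching target image, or None."""
    scores = {tid: match_count(frame, desc) for tid, desc in targets.items()}
    best_id = max(scores, key=scores.get)
    if scores[best_id] < min_matches:
        return None  # nothing recognized with enough confidence
    return LOCATION_MODEL.get(best_id)


if __name__ == "__main__":
    # Toy 16-bit descriptors; real ORB descriptors are 256-bit.
    targets = {
        "cafeteria_logo": [0x0000, 0x00FF, 0xFF00, 0xFFFF],
        "library_poster": [0x0F0F, 0xF0F0, 0x3C3C, 0xC3C3],
    }
    frame = [0x0001, 0x00FE, 0xFF01, 0xFFFE]  # cafeteria features, 1 bit of noise each
    print(locate(frame, targets))  # prints: Ground floor, cafeteria entrance
```

The ratio test also illustrates why distinctive target images matter: when a descriptor is roughly equidistant from several stored features, it is discarded rather than counted, so targets with many repetitive or similar features produce fewer confident matches.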


Table 23.1 Merits and limitations of existing systems

| Research work | Tools and software used | Merits | Limitations |
|---|---|---|---|
| [11] | Camera, tablet PC, head-mounted display (HMD), desktop PCs with a wireless LAN interface. Software used: ARToolkit, ARTag | Does not rely on GPS or any other sensor and, hence, does not require any additional devices. It also uses image processing techniques to improve the marker detection process | Relies on the images and videos captured by the camera. In case of motion, the images get blurred, which reduces the accuracy of marker detection |
| [12] | PC, webcam. Software used: ARToolkit, OpenGL API, OpenAL (audio API) | Uses an audio module to give the user directions; provides the shortest path for navigation | Does not take the user's input into consideration during navigation and does not show different paths to the same destination. It also relies on floor maps, so it would not function in their absence |
| [13] | Smartphones, compass, accelerometer, gyroscope. Software used: Vuforia SDK, OpenGL ES 2.0, iSpeech, and MapKit | Allows searching for locations; provides both indoor and outdoor maps; allows users to share their location with friends; supports voice commands; provides the shortest path | No support for multiple floors. Location detection failed in the presence of multiple identical markers. No user-specific information or services |
| [14] | Smartphone camera. Software used: Vuforia SDK | Instead of the distinctive markers usually used as target images for superimposing information with AR, it uses features already available in the environment as targets | It is difficult to find an environment suitable for testing such a system. The app also had no voice commands, gave directions using text rather than graphics, and did not provide a map of the building |
| [15] | Camera, QR code scanner, Internet connection | Location information is hosted locally, so making changes to it is inexpensive. Location detection is made easier by making the details of a location available locally through scanning QR codes | Information is downloaded and stored on the local storage of mobile devices. Efficiency depends highly on the resolution and quality of the target images, and high-quality images take up more space |
| [16] | Google Glass. Software used: Glass Development Kit, Metaio SDK | The Google Glass head-mounted display comes with sensors such as an accelerometer, gyroscope, light sensor, and proximity sensor, which improve tracking ability substantially | No quantitative comparison with technologies such as Bluetooth and Wi-Fi is provided. Google Glass is expensive and not readily available |
| [17] | Smartphone/web camera. Software used: ARToolkit | Enables users to interact with virtual objects. Built using ARToolkit, which is open-source software | Performance depends on the lighting of the environment |
| [18] | Mobile phone camera. Software used: Unity | Uses deep learning and achieves higher accuracy. Image recognition also avoids signal interference. Allows users to customize their location | Because it uses deep learning, training time is relatively long and many images are needed for good accuracy. Considerable effort and time would also be needed if floor plans are not available |


Fig. 23.2 Block diagram of the system

Table 23.2 Comparison between different systems

| Research work | Multiple floors | Dependencies | SDK used | Compatibility with OS | Navigation type | Voice commands | Markers |
|---|---|---|---|---|---|---|---|
| [11] | Yes | None | ARToolkit, ARTag | Android | Textual, 3D graphics | No | Special |
| [12] | Yes | | Unity, ARToolkit | Android | 3D graphics | Yes | Special |
| [13] | No | Internet/network | Vuforia | Android and iOS | Textual, map | Yes | Natural |
| [14] | Yes | None | Vuforia | Android | Textual | No | Natural |
| [15] | Yes | Internet connection | – | | | No | Natural |
| [16] | Yes | Google Glass | Glass Development Kit, Metaio SDK | Android | Textual, 3D graphics | Yes | Natural |
| [17] | Yes | None | ARToolkit | Android | 3D graphics | No | Special |
| [18] | Yes | | Unity | Android | Textual, map | No | Natural |


Many systems that use augmented reality-based indoor positioning for indoor navigation have been proposed in the last decade. Some of these systems are discussed in Table 23.1. In Table 23.2, we compare the systems on the following attributes:
• Multiple floors: Whether the system works in buildings with multiple floors. AR systems that rely on floor maps have difficulty supporting multiple floors when those floors have similar layouts.
• Dependencies: Whether the system depends on factors such as GPS, Wi-Fi, an Internet connection, or special hardware. 'None' in the table means that the system has no dependencies other than a local database of target images/3D objects.
• SDK used: Software development kits (SDKs) are the tools used to build these systems. AR SDKs help locate the targets and build and place 3D objects virtually in the environment.
• Compatibility with OS: Systems built using different SDKs are compatible only with certain operating systems, e.g., Android or iOS.
• Navigation type: Since the systems are designed to improve the user's navigation experience, it is important to study the different types of experiences they offer.
• Voice commands: One feature that can greatly improve the navigational experience is the availability of voice commands.
• Markers: Location detection depends on the markers the system uses. These markers need to be unique to differentiate among places. Some systems rely on special markers placed on the walls; others use objects already available in the environment (natural markers), such as door numbers or shop logos.
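One practical way to use this comparison is to encode Table 23.2 as records and filter it by deployment requirements. The sketch below is illustrative only: it copies four of the columns from Table 23.2 for the rows whose Dependencies cell is filled in (rows [12] and [18] are omitted because that cell is blank), and the `SystemEntry` type and `shortlist` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemEntry:
    ref: str
    multiple_floors: bool
    dependencies: str     # "None" = only a local database of targets
    voice_commands: bool
    markers: str          # "Special" or "Natural"


# Subset of Table 23.2 (rows with a Dependencies value).
TABLE_23_2: List[SystemEntry] = [
    SystemEntry("[11]", True, "None", False, "Special"),
    SystemEntry("[13]", False, "Internet/network", True, "Natural"),
    SystemEntry("[14]", True, "None", False, "Natural"),
    SystemEntry("[15]", True, "Internet connection", False, "Natural"),
    SystemEntry("[16]", True, "Google Glass", True, "Natural"),
    SystemEntry("[17]", True, "None", False, "Special"),
]


def shortlist(entries: List[SystemEntry], multi_floor: bool = True,
              no_dependencies: bool = True, needs_voice: bool = False,
              markers: Optional[str] = None) -> List[str]:
    """Return references of systems satisfying every requested attribute."""
    result = []
    for e in entries:
        if multi_floor and not e.multiple_floors:
            continue
        if no_dependencies and e.dependencies != "None":
            continue
        if needs_voice and not e.voice_commands:
            continue
        if markers is not None and e.markers != markers:
            continue
        result.append(e.ref)
    return result


if __name__ == "__main__":
    # Multi-floor, dependency-free systems that use natural markers.
    print(shortlist(TABLE_23_2, markers="Natural"))  # prints: ['[14]']
```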

23.5 Inference

Compared to techniques using Wi-Fi or Bluetooth, vision-based methods built on computer vision can be implemented without additional hardware requirements, making them more feasible in indoor environments. The various AR-based systems show the different possibilities for implementing this technology. Various software tools offer integration with different mobile operating systems and types of devices. They also allow developers to configure the marker detection process to use both natural and specially designed markers. Specially designed markers make the technology easy to implement but restrict it, because the markers must be physically placed throughout the environment. Natural markers (objects already present in the environment), on the other hand, are harder to implement but overcome that disadvantage.


23.6 Conclusion and Future Scope

In this paper, we have discussed augmented reality-based indoor positioning systems for navigation. These systems eliminate the need for external utilities such as GPS, Wi-Fi, and RFID, which are not feasible in all indoor environments. They provide users with an interactive environment using augmented reality, improving the navigation experience. The comparison helps users understand and determine the compatibility, performance, and usability of AR systems for indoor positioning. Algorithms such as A* and Dijkstra's algorithm have been used to calculate the shortest path for navigation. Although they give accurate results, most of these systems lack one feature: user preference. In complexes such as shopping malls or colleges, some routes may be more favorable to users because they pass through certain areas, yet current systems do not take the user's choice into account. In addition, although most systems have interactive graphics, burdening the user with too much information may prove counterproductive. The system should therefore offer the user a choice between textual and 3D graphics. Systems could also add voice commands, which would help visually impaired people. Using augmented reality and vision-based techniques increases accuracy. However, some AR systems depend on hardware and markers that are not readily available. Additionally, when stored locally, target images/3D objects take up a lot of memory on mobile devices. These factors should be kept in mind when developing new applications.
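To make the user-preference point concrete, the sketch below runs Dijkstra's algorithm over a small, hypothetical corridor graph and shows one simple way to fold in a preference: the length of every edge passing through a preferred area is multiplied by a discount factor before the search. The graph, node names, and the `preferred`/`discount` parameters are assumptions for illustration; none of the surveyed systems is claimed to work this way. Because the discount must stay positive, all edge weights remain non-negative and Dijkstra's algorithm stays correct.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# node -> list of (neighbor, length in metres, area label)
Graph = Dict[str, List[Tuple[str, float, str]]]


def add_corridor(graph: Graph, a: str, b: str, metres: float, area: str) -> None:
    """Add an undirected walkable edge between two points."""
    graph.setdefault(a, []).append((b, metres, area))
    graph.setdefault(b, []).append((a, metres, area))


def shortest_path(graph: Graph, start: str, goal: str,
                  preferred: Optional[Set[str]] = None,
                  discount: float = 1.0) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; edges in a preferred area cost length * discount.

    Returns (weighted cost, path). With no preference the cost is in metres.
    """
    if discount <= 0:
        raise ValueError("discount must be positive to keep weights non-negative")
    preferred = preferred or set()
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    visited: Set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nbr, length, area in graph.get(node, []):
            cost = length * (discount if area in preferred else 1.0)
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


if __name__ == "__main__":
    g: Graph = {}
    add_corridor(g, "Entrance", "Lobby", 10, "corridor")
    add_corridor(g, "Lobby", "Lab", 30, "corridor")
    add_corridor(g, "Entrance", "Atrium", 20, "atrium")
    add_corridor(g, "Atrium", "Lab", 25, "atrium")
    print(shortest_path(g, "Entrance", "Lab"))  # shortest by distance
    print(shortest_path(g, "Entrance", "Lab", preferred={"atrium"}, discount=0.8))
```

Without a preference the route goes through the lobby (40 m); with the atrium preferred, the slightly longer atrium route (45 m) wins because its weighted cost is lower. The same idea would apply to A* as long as the heuristic never overestimates the discounted cost.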

References
1. Al-Ammar, M.A., et al.: Comparative survey of indoor positioning technologies, techniques, and algorithms. In: Proceedings–2014 International Conference on Cyberworlds, CW 2014, pp. 245–252 (2014). https://doi.org/10.1109/CW.2014.41
2. Carmigniani, J., Furht, B., Anisetti, M., Ceravolo, P., Damiani, E., Ivkovic, M.: Augmented reality technologies, systems and applications. Multimed. Tools Appl. 51(1), 341–377 (2011). https://doi.org/10.1007/s11042-010-0660-6
3. Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B.: Recent advances in augmented reality. IEEE Comput. Graph. Appl. 21(6), 34–37 (2001). https://doi.org/10.3390/biom11020211
4. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: a class of displays on the reality-virtuality continuum. In: Proceedings of SPIE Conference on Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292 (1994)
5. Wei, X., Weng, D., Liu, Y., Wang, Y.: A tour guiding system of historical relics based on augmented reality. In: Proceedings–IEEE Virtual Reality, vol. 2016-July, pp. 307–308. https://doi.org/10.1109/VR.2016.7504776
6. Duenser, A., Billinghurst, M., Wen, J., Lehtinen, V., Nurminen, A.: Exploring the use of handheld AR for outdoor navigation. Comput. Graph. 36, 1084–1095 (2012). https://doi.org/10.1016/j.cag.2012.10.001
7. Chen, F., Cui, X., Han, B., Liu, J., Zhang, X., Liao, H.: Augmented reality navigation for minimally invasive knee surgery using enhanced arthroscopy. Comput. Methods Programs Biomed. 201 (2021). https://doi.org/10.1016/j.cmpb.2021.105952
8. Pan, J., et al.: Real-time segmentation and tracking of excised corneal contour by deep neural networks for DALK surgical navigation. Comput. Methods Programs Biomed. 197 (2020). https://doi.org/10.1016/j.cmpb.2020.105679
9. Narzt, W., et al.: Augmented reality navigation systems. Univers. Access Inf. Soc. 4(3), 177–187 (2006). https://doi.org/10.1007/s10209-005-0017-5
10. Budhi, G.S., Adipranata, R.: Public facilities location search with augmented reality technology in Android. In: Proceedings of 2014 International Conference on Information, Communication Technology and System, ICTS 2014, pp. 195–198 (2014). https://doi.org/10.1109/ICTS.2014.7010582
11. Kim, J., Jun, H.: Vision-based location positioning using augmented reality for indoor navigation. IEEE Trans. Consum. Electron. 54(3), 954–962 (2008). https://doi.org/10.1109/TCE.2008.4637573
12. Huey, L.C., Sebastian, P., Drieberg, M.: Augmented reality based indoor positioning navigation tool. In: 2011 IEEE Conference on Open Systems, ICOS 2011, pp. 256–260. https://doi.org/10.1109/ICOS.2011.6079276
13. Al Delail, B., Weruaga, L., Zemerly, M.J.: CAViAR: context aware visual indoor augmented reality for a university campus. In: Proceedings of the 2012 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops, WI-IAT 2012, pp. 286–290 (2012). https://doi.org/10.1109/WI-IAT.2012.99
14. Kasprzak, S., Komninos, A., Barrie, P.: Feature-based indoor navigation using augmented reality. In: Proceedings–9th International Conference on Intelligent Environments, IE 2013, pp. 100–107 (2013). https://doi.org/10.1109/IE.2013.51
15. Leeladevi, B., Rahul, R.C.P., Tolety, S.: Indoor location identification with better outlook through Augmented Reality. In: Proceedings of 2014 IEEE International Conference on Advanced Communication, Control and Computing Technologies, ICACCCT 2014, vol. 978, pp. 398–401 (2015). https://doi.org/10.1109/ICACCCT.2014.7019471
16. Rehman, U., Cao, S.: Augmented reality-based indoor navigation using google glass as a wearable head-mounted display. In: Proceedings–2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015, pp. 1452–1457 (2016). https://doi.org/10.1109/SMC.2015.257
17. Yadav, R., Chugh, H., Jain, V., Baneriee, P.: Indoor navigation system using visual positioning system with augmented reality. In: 2018 International Conference on Automation and Computational Engineering, ICACE 2018, pp. 52–56 (2018). https://doi.org/10.1109/ICACE.2018.8687111
18. Wu, J.H., Huang, C.T., Huang, Z.R., Chen, Y.B., Chen, S.C.: A rapid deployment indoor positioning architecture based on image recognition. In: 2020 IEEE 7th International Conference on Industrial Engineering and Applications, ICIEA 2020, pp. 784–789. https://doi.org/10.1109/ICIEA49774.2020.9102083

Chapter 24

Classification of Chest X-Ray Images to Diagnose COVID-19 Disease Through Transfer Learning

Sameer Manubansh and N. Vinay Kumar

Abstract The world has been trapped in a pandemic caused by the deadly SARS-CoV-2 virus, which is highly contagious and is spreading rapidly across the world. It affects the human respiratory organs, causing breathing problems, cough, loss of smell, fever, and other symptoms. With rapidly growing numbers of coronavirus cases, the demand for testing has increased drastically; however, few labs are capable of performing the diagnosis. Thus, many patients cannot get diagnosed because no testing facilities are available nearby. The purpose of this paper is to aid the coronavirus screening process with artificial intelligence, providing a faster and more cost-efficient screening mechanism. Coronavirus disease (COVID-19) is a severe respiratory illness with sudden onset. Because of its multiplicative rate of spread, a large population has been infected, and the number is increasing exponentially every day. Testing this large population to identify COVID-19 positive patients has become the most significant challenge because of limited resources. In this project, a tool based on deep learning algorithms is proposed that will be capable of diagnosing COVID-19 from chest x-rays. The training data contain images of COVID-19 and pneumonia cases, which are used to train CNN-based models. Six training approaches will be performed, including VGG16, VGG19, InceptionV3, ResNet50, and DenseNet201, and the best-performing model will be used for prediction. Such a model will help reduce diagnosis time and cost, and it will also increase the capacity to test a larger number of people daily.

S. Manubansh
Capgemini Technology Services India Ltd, Bangalore, KA, India
LJMU University, Liverpool, UK

N. Vinay Kumar (corresponding author)
Freelance Researcher, Bangalore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_24


S. Manubansh and N. Vinay Kumar

24.1 Introduction

The world has been trapped in a pandemic caused by the deadly SARS-CoV-2 virus, which is highly contagious and is spreading rapidly across the world. It is a novel virus that had not previously been found in humans. The virus has spread from Wuhan, China, to the entire world; the first case was diagnosed in Hubei province, China, in December 2019. The virus is said to have originated in the Huanan seafood market in Wuhan, where various animals such as bats, frogs, snakes, marmots, birds, and rabbits were sold, and because of its highly transmissible nature it infected more than fifty people there. It affects the human respiratory organs, causing breathing problems, cough, loss of smell, fever, and other symptoms. The sudden spread of the coronavirus and its severe implications have been among the biggest challenges to humankind's survival. Because the virus is highly contagious, its rate of spread is very high. Its survival in the environment depends on various factors, including temperature, humidity, and surface type (Coronavirus disease (COVID-19): Infection prevention and control for health care workers, 2020). Medical professionals play a significant role in treating patients, putting their own lives at risk because of the contagious nature of the virus. Public health officers also play a crucial role in tracing the virus and curbing its spread through social distancing measures, frequent hand sanitization, and the wearing of face masks. However, diagnosing the massive number of patients with the limited resources available has become a significant challenge. The current medical infrastructure was not developed to provide intensive care to such a large number of patients, and only a limited number of ICU beds with ventilators are available to support patients with breathing problems. In addition, the labs capable of performing coronavirus diagnosis are limited to certain cities, so testing has become a challenge.

In this study, the publicly available "COVID-19 Radiography Database" from Kaggle, which was created to make such images accessible to researchers, has been used to train the models [1]. Fig. 24.1 shows sample chest x-ray images from the dataset for patients who are COVID-19 positive, normal, and with viral pneumonia, respectively.

Fig. 24.1 Sample chest x-ray images from the dataset [1]: (a) COVID-19 positive, (b) normal, and (c) viral pneumonia

24 Classification of Chest X-Ray Images to Diagnose COVID-19 …


This research proposes an AI-based deep learning tool for diagnosing COVID-19 cases from chest x-ray images. The tool estimates how probable it is that a person is infected with the coronavirus, and it can be used for faster early-stage screening before a medical test. The tool is based on a multilayer neural network, whose units can loosely be compared to the neurons in the human brain. Since provisions for diagnosing COVID-19 are limited, it is crucial to have an AI-based tool that can perform faster diagnosis from x-ray images.

24.2 Related Works

Because the number of COVID-19 positive patients is increasing rapidly, an AI-based tool that diagnoses COVID-19 from chest x-ray images is needed to cope with the very high number of patients requiring diagnosis. The primary tool currently used to identify COVID-19 is reverse transcription-polymerase chain reaction (RT-PCR) [3]. However, RT-PCR is costly, not very sensitive, and requires specialized medical personnel [3]. The higher demand for diagnosis has also put much pressure on existing labs, making it challenging to maintain diagnostic precision. Such equipped labs are also limited in number and not easily reachable by every patient. Since x-ray imaging is readily available, it can be a practical alternative for COVID-19 diagnosis, enabling existing labs to perform the diagnosis with their existing equipment at higher speed and lower cost and thus improving the overall diagnostic capacity of the health infrastructure. Researchers have classified medical images (x-rays and CT scans) using both classical machine learning and deep learning algorithms. In the classical machine learning approach, classifiers such as KNN and SVM have been used for image classification [4]. In recent years, deep learning algorithms such as CNNs have been widely used because of their high performance in image classification, and the transfer learning approach has saved researchers much time in training these deep learning models. Diagnosing COVID-19 from CT scans and chest x-rays is currently very popular in the research community; however, chest x-ray images are more cost-effective and more readily available [3].
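The classical pipeline mentioned above (extract a feature vector from each image, then classify it with a nearest-neighbour rule) can be sketched as follows. This is a generic k-nearest-neighbours classifier in plain Python, not the FrMEMs-based method of [4]; the assumption is that some feature extractor has already turned each chest x-ray into a fixed-length numeric vector, and the two-dimensional vectors and labels used below are synthetic.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labelled training example: (feature vector, class label).
Example = Tuple[Sequence[float], str]


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training vectors.

    Distance is Euclidean (math.dist, Python 3.8+). Features are assumed to be
    on comparable scales, e.g. normalized values extracted from each image.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic 2-D feature vectors, for illustration only.
    train: List[Example] = [
        ((0.90, 0.80), "COVID-19"), ((0.80, 0.90), "COVID-19"),
        ((0.85, 0.75), "COVID-19"), ((0.10, 0.20), "Normal"),
        ((0.20, 0.10), "Normal"), ((0.15, 0.25), "Normal"),
    ]
    print(knn_predict(train, (0.8, 0.7)))  # prints: COVID-19
```

KNN has no training phase beyond storing the vectors, which is one reason it suits the small datasets and modest hardware discussed below; the cost moves to prediction time, where every query is compared against all stored examples.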
Classical machine learning algorithms have long been used for automation. They are preferred especially for smaller datasets, do not require high-end computers, and can run on a decent CPU. In classical ML, feature engineering can also be performed directly, and the algorithms are easier to interpret. Hyperparameter tuning and changes to the model design are also straightforward, since the data and the underlying algorithms are better understood. Along these lines, [4] proposed diagnosing COVID-19 from chest x-ray images with an ML-based KNN model that categorizes the images as COVID-19 positive or normal. Features were extracted from the chest x-ray images using new Fractional Multichannel Exponent Moments (FrMEMs), and an advanced computational machine was used to speed up the processing [4]. To select the most significant features, a modified Manta-Ray Foraging Optimization based on differential evolution was used [4]. The classifier was trained on images of two classes, COVID-19 and normal cases. The parallel FrMEMs algorithm was executed on multi-core CPUs to extract the image features, an optimization algorithm was then used to select features, and finally a KNN classifier was trained and evaluated [4]. Similarly, [5] explored the associations between variables, clustered COVID-19 patients into subtypes, and built a computer-based classification model to distinguish COVID-19 positive patients from patients infected with influenza. The study observed several new relations between variables, including an association between male sex and higher serum levels of lymphocytes and neutrophils. It also found that COVID-19 positive patients could be grouped into subcategories based on serum levels of immune cells, the patient's gender, and the observed symptoms. An XGBoost model was trained to distinguish COVID-19 patients from influenza patients, and the study concluded that computer-based methods trained on large datasets can produce better COVID-19 classification models that overcome the drawbacks of testing [5]. Deep learning techniques are currently preferred by most researchers for diagnosing COVID-19, although this is still a relatively new area of research.
Few papers have been published that have used deep learning based models’ image recognition technology to detect Coronavirus from the images of chest-rays and CT scans. To perform the diagnosis of Coronavirus, the paper of [3] has proposed a tool to understand the importance of AI in the detection of Coronavirus from the images of chest x-ray with lesser time and greater accuracy. In this paper, he has proposed a technique for diagnosing coronavirus pneumonia automatically from the images of chest x-ray by applying the learnings of previously trained DL models, while also increasing the precision of the Coronavirus classification. “Transfer learning technique was used to handle model building with limited data and lower processing time. Image augmentation has been used to train the models and validate the performance of several previously trained deep CNNs. The trained models were used to classify the images of chest x-ray on both the augmented as well non-augmented images” [3]. Additionally, [7] have published his paper for “the diagnosis of Coronavirus using ensembles of deep learning. Ensemble learning is a technique in which multiple models are trained individually. Later, their predictions are added for better performance where other models’ classification ability fulfills the weakness of an individual model. Also, the classification ability from the combined models has shown better performance than individual models” [7]. CNN and ImageNet pre-trained models are trained and evaluated using the images of a chest x-ray. The strategies of transfer learning, pruning iteratively and ensemble, have been used to lower the complexity,

24 Classification of Chest X-Ray Images to Diagnose COVID-19 …


improve the robustness and inference of the deep learning models. The hyperparameters of the pre-trained models, namely the L2 regularization and the initial learning rate of the Stochastic Gradient Descent optimizer, were optimized using grid search [7]. Using transfer learning, the pre-trained models were retrained on RSNA chest x-ray images to learn chest x-ray specific features and to classify chest x-rays into normal and abnormal categories. This data set includes normal chest x-ray images and images showing pneumonia-related opacities. Later, the top three performing "CNN models were instantiated and truncated at their deepest convolutional layer, and the layers of zero-padding, a strided separable convolutional layer with five × five filters and 1024 feature maps, GAP layer, Dropout layer, and finally a dense layer with Softmax activation was added to modify the model and further used to classify the chest x-rays as normal or coronavirus pneumonia or viral pneumonia" [7]. The fine-tuned models were then pruned iteratively to find the optimal number of neurons in the convolutional layers. This reduced model complexity without any loss in performance. Ensembles of the pruned deep learning models were then used to diagnose Coronavirus from chest x-ray images [7]. The work of [6] also proposed an AI-based classification tool to diagnose coronavirus infection and other lung illnesses. The four conditions evaluated in that study are COVID-19 pneumonia, non-COVID-19 pneumonia, pneumonia, and normal lungs. The deep learning tool works in two phases. Phase 1 classifies chest x-ray images as pneumonia or non-pneumonia.
Phase 2 receives an x-ray from phase 1 only if it belongs to the pneumonic class, and further classifies it as Coronavirus positive or Coronavirus negative. The x-ray images were pre-processed using image augmentation to handle low-quality images, which would otherwise reduce the trained model's accuracy. A 2D CNN based model named CovAI-Net was then trained on the processed x-ray images to diagnose coronavirus infection. Sethi and Mehrotra [9] also used a deep learning based approach to diagnose Coronavirus, motivated by the efforts of healthcare infrastructure around the world to expand diagnosis facilities for Coronavirus. Their tool is likewise built on chest x-ray images. Four deep learning CNN architectures were created and studied for diagnosing Coronavirus from chest x-ray images. These models were pre-trained on the ImageNet database, and the pre-trained weights helped overcome the dependence on a large dataset. A comparison was also performed among seven pre-trained deep learning models for diagnosing Coronavirus on a small dataset of only 50 x-ray images (25 from coronavirus positive patients and 25 from coronavirus negative patients). Four deep CNN models (Inception V3, ResNet50, MobileNet, and Xception), pre-trained on ImageNet data, were trained using 6249 chest x-ray images (320 from coronavirus positive patients and the remainder from coronavirus negative patients). In this study, MobileNet was the best performing model on the given data for diagnosing Coronavirus.
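The two-phase design of [6] described above is a classifier cascade. The following minimal sketch shows only the control flow. It assumes that each phase is a model returning a probability and that the threshold is 0.5. The stub lambdas stand in for trained CNNs such as CovAI-Net and are hypothetical.

```python
from typing import Any, Callable

# A "stage" is any model that maps an image to a probability in [0, 1].
Stage = Callable[[Any], float]

def two_phase_diagnosis(image: Any, pneumonia_model: Stage,
                        covid_model: Stage, threshold: float = 0.5) -> str:
    """Phase 1 screens for pneumonia; only pneumonic x-rays reach phase 2."""
    if pneumonia_model(image) < threshold:
        return "no pneumonia"
    # Phase 2 runs only on images that phase 1 flagged as pneumonic.
    if covid_model(image) >= threshold:
        return "pneumonia, COVID-19 positive"
    return "pneumonia, COVID-19 negative"

# Stub models standing in for trained CNNs (hypothetical scores).
print(two_phase_diagnosis("xray.png", lambda img: 0.9, lambda img: 0.8))
# pneumonia, COVID-19 positive
```

One consequence of this design is that a pneumonia case missed in phase 1 never reaches the COVID-19 classifier. The recall of the first phase therefore caps the overall COVID-19 recall.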


S. Manubansh and N. Vinay Kumar

Artificial Intelligence has played a crucial role in advancing diagnosis mechanisms for such viruses and diseases in the past. In particular, deep learning techniques have shown very good results and have proven successful and effective in medical image classification. Multiple studies have relied on deep learning CNN based models to diagnose pneumonia and other diseases using radiography [9]. The literature review above covers the existing AI techniques used to diagnose Coronavirus. In this study, deep convolutional neural networks are used to build a tool that classifies coronavirus positive patients using chest x-ray images. The rest of the paper is organized as follows. Section 24.3 presents the proposed methodology for detecting Coronavirus using chest x-ray images. Experimentation and results are provided in Sect. 24.4. A comparative analysis is given in Sect. 24.5. Finally, the conclusion is presented in Sect. 24.6.

24.3 Proposed Model
The Coronavirus has impacted the entire world population and has severely affected the global economy. Diagnosing Coronavirus at an early stage has been observed to give higher chances of cure. However, due to limited testing kits, early diagnosis has not been possible for everyone. An AI-based tool that can diagnose Coronavirus using chest x-ray images could therefore be of great help. It could also help diagnose other viruses affecting the respiratory system. Transfer learning is expected to give an edge when working with a limited dataset, time, and computational resources. Image augmentation can mitigate the limitation of a small dataset when training the deep learning models. The trained models will be evaluated on the validation and test data, and the performance of different models with and without image augmentation will be compared. The best performing model, with the highest recall and classification accuracy on unseen data, will be recommended for diagnosing Coronavirus from chest x-ray images. Multiple deep learning models (VGG16, VGG19, InceptionV3, ResNet50 and DenseNet 201) will be trained on the dataset using transfer learning, and their accuracy will be monitored during the training phase. Using the transfer learning approach, other deep learning models can also be built, explored, and executed toward the same goal. The performances will be compared and analyzed using different evaluation metrics. Our dataset contains more COVID-19 images than the datasets of previous studies; hence, it is expected to give better results. In this study, five deep learning models have been trained using transfer learning due to the limited dataset: VGG16, VGG19, Inception V3, ResNet 50, and DenseNet 201.
These models were initialized with the weights obtained by pre-training them on a large dataset, ImageNet. Each model was then compiled and trained on our dataset of chest x-ray images from three classes (COVID-19, Normal, Viral pneumonia) to develop a prediction model for coronavirus classification. The models' training and validation


Fig. 24.2 General architecture of the proposed model

accuracy values are recorded and analyzed for each epoch using "Loss & Accuracy" graphs. Finally, the saved model is reloaded and used to make predictions on the unseen test dataset to check the efficacy of the trained models for Coronavirus classification. The process flow of the above models is illustrated in Fig. 24.2.

24.4 Experimentation and Results
24.4.1 Dataset
The dataset for this study has been taken from the Kaggle "COVID-19 Radiography Database," a publicly available dataset created to make such images accessible to researchers [1]. It was created by researchers from Qatar University and the University of Dhaka, together with collaborators from Pakistan and Malaysia and in partnership with medical doctors [1]. It consists of chest x-ray images of Corona positive, normal, and viral pneumonia cases: 1200 images of positive Corona cases, 1341 normal images, and 1345 viral pneumonia images [1]. The images are in ".PNG" format, with a resolution of 256*256 for COVID-19 images and 1024*1024 for normal and viral pneumonia images. At present, the number of infected people worldwide is very high, but few chest x-ray images are publicly available online, and those that exist are not well organized. The authors of the database have made a great contribution by providing a comparatively larger dataset and regularly updating it with more Coronavirus positive, normal, and viral pneumonia images [3]. Table 24.1 gives an overview of the Coronavirus positive, normal, and viral pneumonia classes in the dataset.


Table 24.1 Description of the classes of image in the dataset

No | Class name | Description
1 | COVID-19 | Chest x-ray image of a patient diagnosed as Coronavirus positive
2 | Normal | Chest x-ray image of a patient diagnosed as Coronavirus negative
3 | Viral Pneumonia | Chest x-ray image of a patient diagnosed as viral pneumonia positive and Coronavirus negative

24.4.2 Experimental Setup
The chest x-ray images in our dataset are limited in number and have different dimensions. They therefore need pre-processing to make their dimensions consistent, and augmentation to train and analyze the deep learning models. This pre-processing step is performed using ImageDataGenerator, the image augmentation utility of Keras. To train any deep learning model, irrespective of the type of dataset used, the dataset must be split into training, validation, and test data. The dataset contains 1200 images of positive Corona cases, 1341 normal images, and 1345 viral pneumonia images, and it has been divided manually in the ratio 70:15:15. The major portion of the dataset is kept for training so that the model can achieve higher accuracy and lower loss. The training and validation datasets are used during the training phase of the deep learning models, and the test dataset is used to evaluate the performance of the trained models. Table 24.2 shows the number of images of each class (COVID-19, normal, and viral pneumonia) in each split (training, validation, and test). The images of the three classes do not share the same dimensions, so resizing all input images to a consistent dimension is essential. That dimension must match the input requirement of the deep learning architecture. Images of inconsistent resolution can be troublesome during the training phase, so normalizing the dimensions contributes toward proper training and testing of the models. The dataset's images have a resolution of 256*256 for COVID-19 images and 1024*1024 for normal and viral pneumonia images. ImageDataGenerator from Keras has been used to resize all the input images to 224*224.
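The per-class 70:15:15 split described above can be sketched in a few lines. The chapter says the split was done manually, so this is only an illustration under stated assumptions: a seeded shuffle, round-half-up sizes for the training and validation parts, and the remainder assigned to the test part. With these assumptions it happens to reproduce the counts in Table 24.2. The file names are hypothetical.

```python
import random

def split_70_15_15(items, seed=42):
    """Shuffle one class's items and split them 70:15:15 (train, val, test).

    Train and validation sizes are rounded half-up with integer arithmetic;
    the test set takes whatever remains.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = (n * 70 + 50) // 100
    n_val = (n * 15 + 50) // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

class_sizes = {"COVID-19": 1200, "Normal": 1341, "Viral Pneumonia": 1345}
for name, size in class_sizes.items():
    files = [f"{name}_{i}.png" for i in range(size)]  # hypothetical file names
    train, val, test = split_70_15_15(files)
    print(name, len(train), len(val), len(test))
# COVID-19 840 180 180
# Normal 939 201 201
# Viral Pneumonia 942 202 201
```

The split is done per class (stratified) so that each of the three parts keeps roughly the same class proportions as the full dataset.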
Having limited data is a major barrier when applying deep CNN models. A class imbalance among the data points also affects the model's learning and results in biased predictions. In such cases, image augmentation is one of the best techniques for handling these hindrances. It increases the effective size of the training dataset for a convolutional neural network without the need for new images. It does this by creating modified copies of the original images according to the arguments provided (rescale, rotation_range, shear_range, zoom_range and horizontal_flip). In Keras, this feature is provided by ImageDataGenerator, which makes image augmentation simple and convenient [2] (Table 24.2).
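The resizing and augmentation steps can be illustrated in plain NumPy. This is a simplified sketch, not Keras's ImageDataGenerator implementation. It mimics only rescale (1/255), a zoom-in implemented as a random crop, and horizontal_flip, on single-channel images. Rotation and shear are omitted. Nearest-neighbour resizing to 224 × 224 and the function names are assumptions for illustration.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int = 224, out_w: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D (grayscale) image."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]

def augment(img: np.ndarray, rng: np.random.Generator,
            zoom_range: float = 0.2, flip_prob: float = 0.5) -> np.ndarray:
    """One random augmented copy: zoom-in crop, optional horizontal flip,
    resize to 224 x 224 and rescale pixel values to [0, 1]."""
    h, w = img.shape
    scale = 1.0 - rng.uniform(0.0, zoom_range)  # fraction of the image kept
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    x = img[top:top + ch, left:left + cw]
    if rng.random() < flip_prob:
        x = x[:, ::-1]                          # horizontal flip
    x = resize_nearest(x)
    return x.astype(np.float32) / 255.0         # rescale = 1/255

rng = np.random.default_rng(0)
# Random stand-in for a 1024 x 1024 normal or pneumonia x-ray.
xray = rng.integers(0, 256, size=(1024, 1024)).astype(np.uint8)
batch = np.stack([augment(xray, rng) for _ in range(4)])
print(batch.shape)  # (4, 224, 224)
```

Each call produces a different copy of the same x-ray. This is how augmentation enlarges the effective training set without new images.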

Table 24.2 Description of dataset split (number of images per class)

Datatype (%) | COVID-19 | Normal | Viral Pneumonia
Training (70) | 840 | 939 | 942
Validation (15) | 180 | 201 | 202
Test (15) | 180 | 201 | 201

24.4.3 Results and Analysis
The performance of all deep learning models (VGG16, VGG19, Inception V3, ResNet 50, DenseNet 201) has been compared based on their ability to distinguish between COVID-19, normal, and viral pneumonia patients. The model with the best overall performance has been chosen to diagnose Coronavirus using chest x-ray images. The models were evaluated using several metrics, including precision, recall, F1-score, ROC-AUC score, and the confusion matrix. Because the dataset comes from the healthcare domain, such a model must be assessed on the measures that matter most in that setting: accuracy, true positive rate (sensitivity/recall), false positive rate, precision, and ROC-AUC score. Figure 24.3 presents the confusion matrices obtained from all the trained deep learning models when evaluated on the validation data. True labels correspond to the original class of each image, and predicted labels correspond to the class predicted by the model. The confusion matrices show that the ResNet 50 classifier correctly classifies 175 of the 180 COVID-19 images, while 3 are misclassified as normal and 2 as viral pneumonia. The model thus correctly classifies approximately 97% of COVID-19 images (175/180 ≈ 97.2%), the highest COVID-19 classification accuracy among the models. Figure 24.4 shows that the trained ResNet 50 model has a precision of 97%, a recall of 97%, and an F1-score of 0.97. A recall of 97% means that 97% of the patients who actually have COVID-19 are classified correctly. The ROC-AUC score of 0.91 also indicates that the ResNet 50 model performs well in classifying COVID-19 images. Figure 24.5 illustrates the true positives (TP) and false positives (FP) obtained from the ResNet 50 model.
The True Positive images are COVID-19 images correctly classified as COVID-19, and the False Positive images are normal and viral pneumonia images incorrectly classified as COVID-19.
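The per-class metrics used above can be computed directly from a multi-class confusion matrix by treating one class as positive and the rest as negative. In the following sketch, the function name is hypothetical and the matrix layout (rows = true class, columns = predicted class) is an assumption. The only real numbers used are the ResNet 50 COVID-19 row reported in the text (175, 3, 2). That row is enough to compute recall, but precision would also need the COVID-19 column, which is not given here.

```python
def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k of a square confusion matrix.

    cm[i][j] counts images whose true class is i and predicted class is j.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class-k images predicted as something else
    fp = sum(row[k] for row in cm) - tp  # other classes predicted as class k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Recall needs only the true-class row: ResNet 50, COVID-19 images (validation),
# columns ordered COVID-19, normal, viral pneumonia, as reported in the text.
covid_row = [175, 3, 2]
print(round(covid_row[0] / sum(covid_row), 4))  # 0.9722
```

In this one-vs-rest view, the 3 + 2 misclassified COVID-19 images are false negatives. Normal or viral pneumonia images predicted as COVID-19 would be the false positives shown in Fig. 24.5.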


Fig. 24.3 Confusion matrices of different deep learning architectures with transfer learning

Fig. 24.4 Precision, recall and F1-score of the respective deep learning models

24.5 Comparative Analysis
The comparative analysis is conducted in two ways: a within-model comparison and a comparison against state-of-the-art models. The former presents the efficacy of the proposed deep learning models, and the latter compares the best performing proposed model against the state-of-the-art models. Both comparisons are presented in the following subsections.

24.5.1 Within Model Comparison
The comparison graph in Fig. 24.6 presents the precision, recall, F1-score, ROC-AUC score, and classification accuracy on test data of the trained models,


Fig. 24.5 TP and FP of the best scoring deep learning model (ResNet 50)

including VGG16, VGG19, ResNet 50, Inception V3, and DenseNet 201. This study relates to the healthcare industry and directly impacts human lives, so the model's accuracy in classifying COVID-19 images is very important. We therefore aim to diagnose coronavirus from chest x-ray images using a model with high recall (true positive rate) and high accuracy in classifying COVID-19 images. The graph shows that the ResNet 50 model achieves the highest recall of 97% and the highest classification accuracy of 97% on test data, i.e., it correctly classifies 97% of the COVID-19 images (Fig. 24.6).

24.5.2 Comparison Against State-of-the-Art Models
The performance of the trained models has been compared with existing research work that has a similar objective. The results have been analyzed using evaluation metrics including precision, recall/sensitivity, and F1-score. Table 24.3 presents the results of the best performing models of the existing works and of the proposed deep learning architecture for diagnosing Coronavirus using COVID-19 chest x-ray images. All of these studies used transfer learning, as the size of the available datasets is limited. Table 24.3 shows that MobileNet is the best performing of the models trained in the work of [9]. In that work, the model was trained using 320 COVID-19 chest x-ray images and 5928 non-COVID-19 chest


Fig. 24.6 Comparative analysis of various models measured in terms of precision, recall, f1-score and roc-auc score

Table 24.3 Proposed model results along with existing research results

Corpus | Methods used | COVID-19 X-ray sample size | Precision (%) | Recall/sensitivity (%) | F1 score (%)
Chest X-ray images | MobileNet [9] | 320 | 87.8 | 87.8 | 87.8
Chest X-ray images | DenseNet 121 [8] | 225 | 94.14 | 93.92 | 96.1
Chest X-ray images | Proposed method (ResNet 50) | 1200 | 97 | 97 | 97

x-ray images. This model has a recall of 87.7, which means it correctly classifies 87.7% of COVID-19 images as COVID-19 [9]. In the work of [8], DenseNet 121 gave the best result among all the trained models for binary classification of normal and COVID-19 images. It used 225 COVID-19 images, 1583 normal images, and 4292 pneumonia images to train the models. With a recall of 93.92, it correctly classifies 93.92% of COVID-19 chest x-ray images as COVID-19 [8]. In the proposed architecture, the best performing model is ResNet 50, with a recall of 97, meaning that it correctly classifies 97% of COVID-19 images as COVID-19. Moreover, it has been trained on a comparatively larger sample of 1200 COVID-19 x-ray images, so the trained model is likely to generalize better.


24.6 Conclusion
In this paper, an extensive study identified the best AI-based deep learning model (ResNet 50) for predicting coronavirus patients from chest x-ray images. This AI-based approach can enable faster, cost-efficient, and statistically reliable coronavirus diagnosis compared to existing diagnosis methods. The classifier can be considered reliable because its performance has been tested on important statistical evaluation metrics, including recall, ROC-AUC score, and classification accuracy on the test dataset. The best performing model, ResNet 50, has been proposed for implementation.

References
1. Anon.: COVID-19 Radiography Database | Kaggle. [online] Available at: https://www.kaggle.com/tawsifurrahman/covid19-radiography-database (2020). Accessed 30 Sept 2020
2. Anon.: Image Augmentation for Convolutional Neural Networks | by ODSC - Open Data Science | Medium. [online] Available at: https://medium.com/@ODSC/image-augmentation-for-convolutional-neural-networks-18319e1291c (2021). Accessed 30 Jan 2021
3. Chowdhury, M.E.H., Rahman, T., Khandakar, A., Mazhar, R., Kadir, M.A., Mahbub, Z.B., Islam, K.R., Khan, M.S., Iqbal, A., Emadi, N.A., Reaz, M.B.I., Islam, M.T.: Can AI help in screening viral and COVID-19 Pneumonia? IEEE Access 8, 132665–132676 (2020)
4. Elaziz, M.A., Hosny, K.M., Salah, A., Darwish, M.M., Lu, S., Sahlol, A.T.: New machine learning method for image-based diagnosis of COVID-19. [online] (2020). Available at: https://doi.org/10.1371/journal.pone.0235187
5. Li, W.T., Ma, J., Shende, N., Castaneda, G., Chakladar, J., Tsai, J., Apostol, L., Honda, C., Xu, J., Wong, L., Zhang, T., Lee, A., Gnanasekar, A., Honda, T., Kuo, S., Yu, M.A., Chang, E., Rajasekaran, M., Ongkeko, W.: Using machine learning of clinical data to diagnose COVID-19. medRxiv [online] p. 2020.06.24.20138859 (2020). Available at: https://doi.org/10.1101/2020.06.24.20138859. Accessed 6 Dec 2020
6. Mishra, M., Parashar, V., Shimpi, R.: Development and evaluation of an AI System for early detection of Covid-19 pneumonia using X-ray (Student Consortium), pp. 292–296 (2020)
7. Rajaraman, S., Siegelman, J., Alderson, P.O., Folio, L.S., Folio, L.R., Antani, S.K.: Iteratively Pruned Deep Learning Ensembles for COVID-19 Detection in Chest X-Rays. IEEE Access 8, 115041–115050 (2020)
8. Sekeroglu, B., Ozsahin, I.: Detection of COVID-19 from Chest X-Ray Images Using Convolutional Neural Networks. SLAS Technology 25(6), 553–565 (2020)
9. Sethi, R., Mehrotra, M.: Deep learning based diagnosis recommendation for COVID-19 using chest X-Rays images, pp. 18–21 (2020)

Chapter 25

Remodeling Rainfall Prediction Using Artificial Neural Network and Machine Learning Algorithms
Aakanksha Sharaff, Kshitij Ukey, Rajkumar Choure, Vinay Ujee, and Gyananjaya Tripathy
Abstract Weather forecasting has lately attracted the attention of numerous scientists because of its effect on human life. Timely warnings of weather disasters could save many lives every year. The significance of weather forecasting can also be seen in agriculture, for example in the proper planning of farm tasks and in the transportation and storage of food grains. Information on cyclones, tornados, and heavy rains is essential for livelihoods, and if such events are identified in advance, many losses of life can be avoided. Adverse weather conditions and events can have direct and indirect consequences for different transport and operational sectors, for instance transportation routing and time costs, as well as accident risks. This paper focuses on extreme rainfall events because of their relevance to minimizing the impact on the population. Statistical methods have proved inadequate for predicting rainfall accurately. This paper studies machine learning classification techniques, namely Artificial Neural Network, Decision Tree, and Gaussian Naive Bayes, for predicting rainfall. The experimental analysis shows that Decision Tree outperforms the others.

A. Sharaff (corresponding author) · K. Ukey · R. Choure · V. Ujee · G. Tripathy
National Institute of Technology, Raipur, Chhattisgarh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_25

25.1 Introduction
Natural disasters continue to occur all over the world. Tsunamis, earthquakes, volcanic eruptions, hurricanes, tornados, thunderstorms, and excessive rainfall, to mention only a few, are dangerous events for civilization that cause destruction and loss of life. Finding a way to anticipate the effects of these natural phenomena is critically important for issuing warnings, minimizing or averting disasters, and saving lives. During heavy rainfall, traffic disruptions


A. Sharaff et al.

increase financial losses. The climate has recently attracted the attention of many researchers because it affects the lives of human beings as well as animals, directly or indirectly. Weather disturbances such as cyclones and tornados can regularly damage the health and wealth of living beings, which is why weather forecasting is used. The significance of weather forecasting can also be seen in agriculture, e.g., for appropriate planning of farm operations and for the transportation and storage of food grains, among others. Rainfall prediction is very beneficial for managing water resources and avoiding floods in India, where climate conditions vary significantly. It also helps farmers maintain their crops regularly. It also supports the growth and development of the Indian economy, which depends on agriculture to a great extent. The dynamic nature of rain and weather makes it difficult to predict rainfall precisely with statistical methods. Rainfall prediction uses various technologies, such as radar systems and satellites, that gather varying weather data from different locations. This process is tedious and provides limited accuracy. A better method is needed to predict rainfall correctly and relevantly. Machine learning techniques are a convenient option for better rainfall prediction in India, as they can provide higher accuracy and precision. Here, rainfall is predicted using classification techniques. The objective is to study, analyze, and predict rainfall using 1 year of data from a Kerala weather station.
The dataset has many features, such as minimum temperature, maximum temperature, wind speed, pressure, rainfall, and humidity at 9 am and 3 pm on each day. Artificial Neural Network, Naive Bayes, and Decision Tree techniques are used to predict rainfall from this dataset.

25.2 Related Work
Rainfall prediction models can be classified into two categories: physical models and data-driven models. Physical models are based on physical principles, whereas data-driven models rely on historical records to make future decisions. Gómez et al. [1] presented a regional forecasting system based on the Regional Atmospheric Modeling System (RAMS), which is run at the CEAM Foundation. The model was tested using a series of automatic meteorological stations from the CEAM network. Accurate rainfall prediction can be used in irrigation scheduling. For example, alternate wetting and drying irrigation has been reported as a viable water-saving technique for paddy rice that does not significantly reduce yield, as discussed by Cao et al. [2]. Another application of rainfall prediction is building energy control. Lee et al. [3] proposed middleware software that analyzes forecast weather data and actual data, which were compared to validate the forecast and predictive models. Using this technique reduced electricity charges by 10%. Torres et al. [4] investigated economic variables affected by the timing of water availability and rainfall, used to

25 Remodeling Rainfall Prediction Using Artificial Neural …


predict the future availability of resources; their hydro-economic model can be applied on a yearly, monthly, or seasonal basis. Velasco et al. [5] applied a Multilayer Perceptron Neural Network to week-ahead rainfall forecasting. Their goal was to give organizations and individuals lead time for the strategic and tactical planning of activities and policies related to rainfall. Fowdur et al. [6] proposed an IoT-based weather forecasting system that gives short-term weather forecasts for a university campus at intervals ranging from 20 to 60 min. They also tested several adaptive forecasting algorithms based on variants of the MLR method as well as K-Nearest Neighbors (K-NN). Extreme Rainfall Events (EREs) have affected many areas of India in the past few years, and forecasting them is a challenging and important task. General Circulation Models (GCMs) are widely used for such forecasting tasks. In [7], a variable resolution general circulation model (VRGCM) is set up and evaluated for the seasonal prediction of the number of EREs over mainland India during the monsoon (JJA) season over a long period. The EREs are counted in three distinct categories that depend on the threshold of daily precipitation. Divya and Shetty [8] evaluated Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), a satellite-based rainfall dataset, over Kerala state for water resource applications. Sharaff and Roy [9] compared regression models with a back propagation neural network for predicting temperature in weather forecasting.

25.3 Methodology
The methodology covers the construction and design of the proposed model. It takes the weather dataset as input and then performs the following steps:
• Data collection (weather dataset)
• Feature selection and data preprocessing
• Splitting into training and testing data
• Classification with Artificial Neural Network, Decision Tree Classifier, and Naive Bayes Gaussian Classifier
• Output of the classification report of each classifier
Data Collection. Any machine learning project starts by collecting data, often in unstructured form; this step is called data collection. Here, the data set consists of 1 year of data from a Kerala weather station. The dataset has many features, such as minimum temperature, maximum temperature, wind speed, pressure, rainfall, and humidity at 9 am and 3 pm on each day.


Feature selection and Data preprocessing. After data collection, different methods are used to make the data relevant to the project. A larger number of attributes makes the project more complex, so irrelevant attributes need to be filtered out. Feature selection filters the data by removing irrelevant columns, while feature extraction combines existing columns into new ones.
Splitting into testing and training data. Next, the dataset is divided into two parts, training data and testing data. The model is trained on the training data, and its accuracy is then evaluated by making predictions on the test data.
Artificial Neural Network, Decision Tree Classifier and Naive Bayes Gaussian Classifier. Three classification techniques, Artificial Neural Network, Naive Bayes, and Decision Tree, have been analyzed for predicting rainfall on the test data.
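The preprocessing and splitting steps can be sketched in plain Python. The chapter does not state which columns were kept, how missing values and the categorical wind-direction columns were handled, or what train/test ratio was used. This sketch therefore fixes all of these as assumptions. The FEATURES list is a hypothetical subset of the Table 25.1 attributes (written without spaces), "NA" or an empty string marks a missing reading, and 25% of the records are held out for testing.

```python
import random

# Hypothetical feature subset: the chapter does not list which columns were kept.
FEATURES = ["MinTemp", "MaxTemp", "Humidity9am", "Pressure9am", "RainToday"]
TARGET = "RainTomorrow"
MISSING = (None, "", "NA")  # assumed missing-value markers

def encode(value):
    """Map 'Yes'/'No' to 1.0/0.0; parse any other value as a float."""
    if value == "Yes":
        return 1.0
    if value == "No":
        return 0.0
    return float(value)

def preprocess(rows):
    """Keep the selected columns, skip incomplete records, return (X, y)."""
    X, y = [], []
    for row in rows:
        values = [row.get(col) for col in FEATURES + [TARGET]]
        if any(v in MISSING for v in values):
            continue  # drop records with a missing reading
        X.append([encode(v) for v in values[:-1]])
        y.append(int(encode(values[-1])))  # 1 = rain tomorrow (positive class)
    return X, y

def split_records(X, y, test_fraction=0.25, seed=0):
    """Shuffle once and hold out `test_fraction` of the records for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

For a CSV export of the dataset, `preprocess(csv.DictReader(f))` produces numeric rows that any of the three classifiers can consume. Columns not listed in FEATURES, such as the wind directions, are simply ignored here. Including them would require an explicit categorical encoding.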

25.4 Experimental Analysis
The proposed framework is implemented in Python, which frees the developer from many low-level technical details. Python is among the most widely used languages for AI around the globe and has simple syntax and semantics. The Kerala weather station dataset is analyzed; it contains 365 tuples, each comprising 22 columns. Table 25.1 describes the attributes of the dataset.

25.5 Result and Discussion
Rainfall prediction is a challenging task for a country like India, and it plays an important role in day-to-day life and in agriculture. This problem is addressed here with ML classification methods. All records are classified into a Positive (1) or Negative (0) class. A confusion matrix summarizes the performance of a model on test records whose actual class is known (Tables 25.2 and 25.3). Accuracy, computed with accuracy_score, is defined as:

\[
\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{25.1}
\]
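Eq. (25.1) can be checked with a few lines of plain Python that count the four cells of Table 25.2 for binary labels (1 = rain, the positive class). The function names and the toy label lists are illustrative only. For binary labels, the result equals the fraction of correct predictions, which is also what scikit-learn's accuracy_score returns.

```python
def confusion_counts(y_true, y_pred):
    """Count tp, tn, fp, fn for binary labels where 1 = rain (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Eq. (25.1): (tp + tn) / (tp + tn + fp + fn)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

# Toy labels: 2 tp, 1 tn, 1 fp, 1 fn, so 3 of 5 predictions are correct.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # 0.6
```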

Table 25.1 Kerala weather station dataset

Attribute no | Attribute | Attribute type
A1 | Min Temp | float
A2 | Max Temp | float
A3 | Rainfall | float
A4 | Evaporation | float
A5 | Sunshine | float
A6 | WindGustDir | object
A7 | WindGustSpeed | float
A8 | WindDir9am | object
A9 | WindDir3pm | object
A10 | WindSpeed9am | float
A11 | WindSpeed3pm | float
A12 | Humidity9am | int
A13 | Humidity | int
A14 | Pressure9am | float
A15 | Pressure3pm | float
A16 | Cloud9am | int
A17 | Cloud3pm | int
A18 | Temp9am | float
A19 | Temp3pm | float
A20 | RainToday | object
A21 | RISK_MM | (not given)
A22 | RainTomorrow | object

Table 25.2 Confusion matrix

Actual value \ Predicted value | Negative | Positive
Negative | tn | fp
Positive | fn | tp

tp: true positive; tn: true negative; fp: false positive; fn: false negative.

Table 25.3 Performance comparison with accuracy parameter

Algorithms used | Accuracy (%)
ANN | 83.4
Decision tree | 84.4
Naïve Bayes | 78.89

258

A. Sharaff et al.

As Table 25.3 shows, the artificial neural network (83.4%) and the decision tree (84.4%) are more accurate than the Gaussian Naive Bayes classifier (78.89%).

25.6 Conclusion and Future Work Weather prediction has become very important in many areas: it can save human lives and help avoid economic losses. Predicting the weather accurately for a particular area is difficult, but learning from past data can give good accuracy. In this work, rainfall prediction with the decision tree slightly outperformed the other machine learning classifiers for the Kerala region. Numerical weather forecasting is also an integral part of our daily lives. This work can be extended to improve accuracy for high-investment deals and projects that depend wholly or partly on weather conditions. Industries such as agriculture, airlines and transport companies can avoid economic problems by forecasting future climate changes and likely conditions. In many countries, most agricultural industries depend on natural weather conditions for their productivity, and they can avoid many losses by not investing resources in crops that depend entirely on natural weather. Transportation companies, which are responsible for the safe delivery of goods, can likewise use forecasts to schedule shipments of goods that are easily affected by changing weather.

Acknowledgements The authors would like to thank the National Institute of Technology Raipur, India for providing infrastructure and facilities to carry out this research work.


Chapter 26

Gender-Based Emotion Recognition: A Machine Learning Technique
Biswajit Nayak, Bhubaneswari Bisoyi, Prasant Kumar Pattnaik, and Biswajit Das

Abstract Speech emotion recognition is a mechanism for interaction between humans and machines. Speech is one of the most attractive and effective ways of expressing emotion as well as attitude. This paper studies the impact of gender on the recognition of different basic emotions in speech. The analysis uses the Hindi emotional speech database developed by the Indian Institute of Technology Kharagpur, Mel-frequency cepstral coefficients for feature extraction and a Gaussian Mixture Model for classification. Simulations are carried out on both text-dependent and text-independent data. They clearly show that the machine recognizes emotion in female speech more accurately than in male speech, irrespective of the method and of whether the data are text dependent or text independent.

B. Nayak (B) · B. Bisoyi
Faculty of Management Studies, Sri Sri University, Cuttack, Odisha, India
P. K. Pattnaik
School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
B. Das
School of Management, KIIT University, Bhubaneswar, Odisha, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_26

26.1 Introduction Humans normally express their thoughts through verbal communication, but feelings are conveyed more readily through emotion. Speech is the natural way of exchanging information between individuals and, similarly, an effective way of interfacing between a person and a machine. Speech carries the message being communicated, the identity of the speaker and the speaker's emotional state during the interaction. Building an emotion recognition system requires an automatic recognizer that categorizes the essential emotions, which is a difficult task. To address this problem, researchers use the idea of a "palette": every emotion can be derived from a combination of basic emotions such as "anger," "fear," "happy," "neutral," "sarcastic" and "surprise." Implementing a speech emotion recognition (SER) system is difficult because it is hard to determine how different speech features contribute to distinguishing among emotions. Acoustic variability introduced by different sentences, speakers, speaking styles and speaking rates adds further obstacles. Because these factors directly affect commonly extracted speech features such as energy contour and pitch, more than one emotion may be present in the same utterance, with each emotion corresponding to a different portion of it [1, 2].

26.2 Technique and Algorithm Machine learning is an in-demand technology today. The machine learning process begins by feeding the machine the required input. Using these inputs, the machine is trained to detect hidden insights and patterns. These patterns are then used, by means of an algorithm, to build a machine learning model that can solve problems or match patterns. The technique may use supervised or unsupervised learning [3, 4]. Machine learning involves the following terms:
(a) Algorithm: a technique used to learn patterns from data.
(b) Model: the mapping from the input to the required output, learned from the data by the machine learning algorithm.
(c) Predictor variable: a feature of the data that can be used to predict the output.
(d) Response variable: the feature or output variable that needs to be predicted using the predictor variables.
(e) Training data: data used to build the machine learning model. The training set is usually larger than the testing set.
(f) Testing data: data used only to validate and evaluate the performance of the model.

26 Gender-Based Emotion Recognition: A Machine Learning Technique


26.3 Feature Extraction MFCCs are based on the known variation of the human ear's critical bandwidth with frequency. MFCCs represent the short-term power spectrum of a speech frame, obtained by a linear cosine transform of the log power spectrum on a non-linear Mel-frequency scale. As Fig. 26.1 shows, the speech signal is the input, and the Mel-frequency cepstral coefficients are obtained through several steps: preprocessing, framing, windowing, Fast Fourier Transform, Mel filter bank and frequency warping, logarithm and Discrete Cosine Transform [5–7]. The MFCC calculation follows this procedure:
(1) The speech signal is pre-emphasized.
(2) Frames are generated from the speech signal.
(3) A Fourier transform is applied to obtain the magnitude spectrum.
(4) A Mel filter bank is used to compute the Mel spectrum.
(5) MFCCs are derived by applying the Discrete Cosine Transform to the log Mel spectrum.

The normal frequency must be converted to the Mel scale, which is done with:

$$m = 2595 \log_{10}\left(\frac{f}{700} + 1\right)$$

where f is the normal frequency (in Hz) and m is the Mel frequency.

Fig. 26.1 Mel-frequency cepstral coefficient evaluation process (emotional speech → preprocessing → framing → windowing → FFT → Mel filter bank and frequency warping → log → DCT → MFCC)
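The five MFCC steps and the Mel-scale equation can be illustrated with a small pure-Python sketch. It is not the authors' MATLAB implementation: the pre-emphasis factor 0.97, the Hamming window, 20 triangular filters, 13 coefficients and the frame sizes are common choices assumed here, and a naive DFT stands in for the FFT so that the example needs no third-party libraries.

```python
import math


def hz_to_mel(f):
    """Mel scale from the chapter: m = 2595 * log10(f / 700 + 1)."""
    return 2595.0 * math.log10(f / 700.0 + 1.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel, used to place filter edges back on the Hz axis."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def pre_emphasis(signal, alpha=0.97):
    """Step (1): y[t] = x[t] - alpha * x[t-1] boosts high frequencies."""
    return [signal[0]] + [signal[t] - alpha * signal[t - 1] for t in range(1, len(signal))]


def make_frames(signal, frame_len, hop):
    """Step (2): split the signal into overlapping frames."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Steps (3)-(5) for one frame: window, power spectrum, Mel filter bank, log, DCT-II."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    win = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t in range(n)]
    # Power spectrum from a naive DFT over bins 0..n/2 (an FFT is used in practice).
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(win[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(win[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    # Triangular filters with edges equally spaced on the Mel scale up to Nyquist.
    top_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n for k in range(n_bins)]
    log_energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for f, p in zip(bin_hz, power):
            if lo < f <= mid:
                energy += p * (f - lo) / (mid - lo)  # rising slope
            elif mid < f < hi:
                energy += p * (hi - f) / (hi - mid)  # falling slope
        log_energies.append(math.log(energy + 1e-10))  # epsilon avoids log(0)
    # DCT-II of the log filter-bank energies; keep the first n_coeffs coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Usage: a 400 Hz tone sampled at 8 kHz, 32 ms frames with a 16 ms hop.
sr = 8000
tone = [math.sin(2 * math.pi * 400 * t / sr) for t in range(1024)]
features = [mfcc_frame(fr, sr) for fr in make_frames(pre_emphasis(tone), 256, 128)]
print(len(features), len(features[0]))  # 7 13
```

Each frame yields one 13-dimensional MFCC vector; the sequence of such vectors for an utterance is what the Gaussian Mixture Models in the next section are trained on and scored against.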


Fig. 26.2 Process of training phase

26.4 Gaussian Mixture Model Here, the Gaussian Mixture Model (GMM) is used to build the classification model. A GMM is a probabilistic model that can be used not only for clustering but also for density estimation, and it can model multimodal distributions. For the simulations in this paper, three GMM variants are used: (a) 8-centered, (b) 16-centered and (c) 32-centered, i.e., with 8, 16 and 32 mixture components, so that three models are created for each emotion [8–10]. In the training phase (Fig. 26.2), the input speech is first identified, and features are extracted with the Mel-frequency cepstral coefficient (MFCC) method. A model is then trained for each emotion: "anger," "fear," "happy," "natural," "sarcastic" and "surprise" [11, 12]. In the testing phase (Fig. 26.3), the input speech is processed in the same way as in training, i.e., features are extracted with MFCC. The features are then compared against each of the emotion models ("anger," "fear," "happy," "natural," "sarcastic," "surprise") trained for the speech emotion recognition system [13, 14].
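The testing phase described above (score the MFCC frames against every emotion model and keep the best match) can be sketched as follows. The sketch assumes diagonal covariance matrices, which the chapter does not specify, and omits training by expectation-maximization; in Python, `sklearn.mixture.GaussianMixture(n_components=8, covariance_type="diag")` could fit one such model per emotion, and its `score` method returns the average per-frame log-likelihood. The toy model parameters are made up.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log p(x) under a GMM given as (weights, means, variances)."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def classify(frames, models):
    """Testing phase: pick the emotion whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda emotion: gmm_log_likelihood(frames, models[emotion]))


# Hypothetical 2-component models on 2-D features (real models would use
# 8, 16 or 32 components on MFCC vectors).
models = {
    "anger": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "fear": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
print(classify([[0.4, 0.6], [0.9, 0.2]], models))  # anger
```

Summing log-likelihoods over all frames of an utterance treats the frames as independent, which is the usual simplification in GMM-based speech classification.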

26.5 Simulation Modeling Simulation is carried out using eight different speakers with six different emotions from each. The Hindi emotional speech database developed by the Indian Institute of Technology Kharagpur contains ten speakers, each uttering 15 text prompts in eight different emotions; six of these emotions are used here. The data are divided into two parts: one part is used to train the machine and the other to test it. The training and testing data were 70% and 30%, respectively. The system is modeled with Gaussian Mixture Models with three different numbers of components: 8-centered, 16-centered and 32-centered. Simulation is carried out in the Matlab-11 simulation tool. Tables 26.1, 26.2 and 26.3 show the confusion matrices for speaker 7 for the 8-centered, 16-centered and 32-centered models, respectively. Simulations were carried out with both text-dependent and text-independent data for training and testing, to evaluate the system performance in each case. The system performed better with text-dependent data input than with text-independent data input. Table 26.4 shows the performance of the system for the three GMM variants when simulated with text-dependent data, whereas Table 26.5 shows the performance when simulated with text-independent data (Fig. 26.4).

Fig. 26.3 Process of testing phase

Table 26.1 Confusion matrix for data using 8-centered Gaussian Mixture Model

Emotion | Anger | Fear | Happy | Natural | Sarcastic | Surprise
Anger | 18 | 0 | 14 | 8 | 4 | 1
Fear | 0 | 40 | 1 | 4 | 0 | 0
Happy | 11 | 0 | 34 | 0 | 0 | 0
Natural | 11 | 0 | 34 | 0 | 0 | 0
Sarcastic | 1 | 0 | 1 | 0 | 43 | 0
Surprise | 12 | 0 | 0 | 4 | 0 | 29

Accuracy % = 71.48


Table 26.2 Confusion matrix for data using 16-centered Gaussian Mixture Model

Emotion | Anger | Fear | Happy | Natural | Sarcastic | Surprise
Anger | 22 | 0 | 14 | 5 | 3 | 1
Fear | 0 | 39 | 1 | 4 | 0 | 1
Happy | 10 | 0 | 32 | 3 | 0 | 0
Natural | 14 | 0 | 6 | 25 | 0 | 0
Sarcastic | 0 | 0 | 1 | 0 | 44 | 0
Surprise | 3 | 0 | 0 | 1 | 0 | 41

Accuracy % = 75.19
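The accuracy figure under each confusion matrix is the share of test utterances on the main diagonal. The short check below reproduces the value of Table 26.2 from its cells: each row sums to 45 utterances, giving 270 test utterances, of which 203 are correctly classified. The helper name is illustrative.

```python
def accuracy_percent(matrix):
    """Correctly classified utterances (main diagonal) as a percentage of all test utterances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Table 26.2 (speaker 7, 16-centered GMM); rows and columns in the order
# Anger, Fear, Happy, Natural, Sarcastic, Surprise.
TABLE_26_2 = [
    [22, 0, 14, 5, 3, 1],
    [0, 39, 1, 4, 0, 1],
    [10, 0, 32, 3, 0, 0],
    [14, 0, 6, 25, 0, 0],
    [0, 0, 1, 0, 44, 0],
    [3, 0, 0, 1, 0, 41],
]
print(round(accuracy_percent(TABLE_26_2), 2))  # 75.19
```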

Table 26.3 Confusion matrix for data using 32-centered Gaussian Mixture Model

Emotion | Anger | Fear | Happy | Natural | Sarcastic | Surprise
Anger | 27 | 0 | 13 | 2 | 3 | 0
Fear | 0 | 43 | 1 | 1 | 0 | 0
Happy | 10 | 0 | 35 | 0 | 0 | 0
Natural | 11 | 0 | 4 | 30 | 0 | 0
Sarcastic | 0 | 0 | 1 | 0 | 44 | 0
Surprise | 5 | 0 | 1 | 1 | 0 | 38

Accuracy % = 80.37

Table 26.4 System performance with text dependent data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker1 | 60.74 | 65.19 | 65.93
Speaker4 | 58.52 | 63.71 | 70.74
Speaker5 | 78.52 | 81.85 | 82.22
Speaker9 | 74.44 | 77.04 | 82.96
Speaker3 | 71.48 | 80.00 | 83.70
Speaker6 | 61.85 | 67.78 | 73.33
Speaker7 | 71.48 | 75.19 | 80.37
Speaker10 | 58.74 | 61.48 | 62.22
Average Accuracy % | 66.98 | 71.53 | 75.18

The simulation clearly shows that text-dependent data are easier for the machine to recognize, whereas the average accuracy is lower for text-independent data. (A) Text Dependent Gender Specific Data. Simulation was also carried out for gender-specific data inputs, and gender was found to play a major role in speech processing technology. Tables 26.6 and 26.7 show the

Table 26.5 System performance with text independent data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker1 | 65.33 | 70.67 | 68.67
Speaker4 | 61.13 | 62.13 | 66.78
Speaker5 | 71.00 | 79.67 | 82.33
Speaker9 | 74.00 | 79.00 | 79.00
Speaker3 | 67.67 | 72.00 | 75.00
Speaker6 | 60.00 | 64.00 | 71.33
Speaker7 | 71.06 | 77.67 | 83.67
Speaker10 | 59.67 | 58.33 | 62.67
Average Accuracy % | 65.96 | 70.43 | 73.68

Fig. 26.4 Accuracy percentage of text dependent data set and text independent data set

Table 26.6 System performance with text dependent gender specific (female) data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker1 | 60.74 | 65.19 | 65.93
Speaker4 | 58.52 | 63.71 | 70.74
Speaker5 | 78.52 | 81.85 | 82.22
Speaker9 | 74.44 | 77.04 | 82.96
Average Accuracy % | 68.06 | 71.95 | 75.46

Table 26.7 System performance with text dependent gender specific (male) data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker3 | 71.48 | 80.00 | 83.70
Speaker6 | 61.85 | 67.78 | 73.33
Speaker7 | 71.48 | 75.19 | 80.37
Speaker10 | 58.74 | 61.48 | 62.22
Average Accuracy % | 65.89 | 71.11 | 74.90

Table 26.8 System performance with text independent gender specific (female) data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker1 | 65.33 | 70.67 | 68.67
Speaker4 | 61.13 | 62.13 | 66.78
Speaker5 | 71.00 | 79.67 | 82.33
Speaker9 | 74.00 | 79.00 | 79.00
Average Accuracy % | 67.86 | 72.87 | 74.19

Table 26.9 System performance with text independent gender specific (male) data inputs (accuracy %, by number of GMM components)

Speaker | 8 | 16 | 32
Speaker3 | 67.67 | 72.00 | 75.00
Speaker6 | 60.00 | 64.00 | 71.33
Speaker7 | 71.06 | 77.67 | 83.67
Speaker10 | 59.67 | 58.33 | 62.67
Average Accuracy % | 64.06 | 68.00 | 73.17

simulation carried out on text dependent female and male data, respectively. Female data inputs give better performance than male data inputs (Fig. 26.5). (B) Text Independent Gender Specific Data. As with text-dependent data, the simulation is also carried out for text-independent gender-specific data inputs, and gender is again found to play a major role in speech processing technology. Tables 26.8 and 26.9 show the simulation carried out on text-independent female and male data, respectively. Again, female data inputs give better performance than male data inputs. Just as Fig. 26.5 does for text-dependent data, Fig. 26.6 compares the average accuracy for text-independent data inputs, and it clearly shows that the recognition accuracy for female speech is higher than for male speech.

Fig. 26.5 Accuracy percentage of male and female in text dependent data set


Fig. 26.6 Accuracy percentage of male and female in text independent data set

26.6 Conclusion The analysis clearly shows the influence of gender on speech, and these variations in speech are exploited for speech emotion recognition. The Gaussian Mixture Model is used to classify the basic emotions, in three variants: 8-centered, 16-centered and 32-centered. In every simulated condition, the machine recognized emotion in female speech better than in male speech. For male speech with text-dependent data, the accuracy is 74.90% for 32-centered, 71.11% for 16-centered and 65.89% for 8-centered, whereas for female speech it is 75.46% for 32-centered, 71.95% for 16-centered and 68.06% for 8-centered. Similar results are observed for the text-independent data set. The gender effect is therefore independent of the text: the machine recognizes emotion in female speech more accurately than in male speech irrespective of whether the system is designed as text dependent or text independent.

References
1. Koolagudi, S., Rao, K.S.: Emotion recognition from speech using source, system, and prosodic features. In: International Journal of Speech Technology, vol. 15, pp. 265–289 (2012)
2. Nayak, B., Pradhan, M.K.: Text-dependent versus text-independent speech emotion recognition. In: 2nd International Conference on Computer and Communication Technologies, Advances in Intelligent Systems and Computing, vol. 379, pp. 153–161 (2015). https://doi.org/10.1007/978-81-322-2517-1_16
3. Ververidis, D., Kotropoulos, C.: Emotional speech recognition: resources, features, and methods. SPC 48, 1162–1181 (2006)
4. Rabiner, L.R., Juang, B.H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, New Jersey (1993)
5. Nayak, B., Madhusmita, M., Sahu, D.K.: Speech emotion recognition using different centered GMM. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(9), 646–649 (2013)
6. Koolagudi, S.G., Maity, S., Kumar, V.A., Chakrabarti, S., Rao, K.S.: IITKGP-SESC: speech database for emotion analysis, vol. 40, pp. 485–492. Springer, Berlin, Heidelberg (2009)
7. Koolagudi, S., Reddy, R., Yadav, J., Rao, K.S.: IITKGP-SEHSC: Hindi speech corpus for emotion analysis. In: International Conference on Devices and Communications (ICDeCom), pp. 1–5 (2011)
8. Moataz, M.H., El Ayadi, M., Kamel, M.S., Karray, F.: Survey of speech emotion recognition: feature, classification, schemes and databases. Pattern Recognit. Elsevier 44(3), 572–587 (2011)
9. Cheng, X., Duan, Q.: Speech emotion recognition using Gaussian Mixture Model. In: The 2nd International Conference on Computer Application and System Modeling (2012)
10. Thapliyal, N., Amoli, G.: Speech based emotion recognition with Gaussian Mixture Model. Int. J. Adv. Res. Comput. Eng. Technol. (2012)
11. Reynolds, D.: Gaussian Mixture Models: MIT Lincoln Laboratory, 244 St Wood, emotion recognition using support vector regression. In: 10th International Society for Music Information Retrieval Conference (ISMIR 2009) (2009)
12. Wankhade, S.B., Tijare, P., Chavhan, Y.: Speech emotion recognition system using SVM and LIBSVM. Int. J. Comput. Sci. Appl. 4(2) (2011). ISSN: 0974-1003
13. Khanna, M.P., Kumar, S., Toscano-Medina, K., Nakano, M., Meana, H.P.: Application of vector quantization in emotion recognition from human speech. In: Information Intelligence, Systems, Technology and Management, Communications in Computer and Information Science, vol. 141, pp. 118–125 (2011)
14. Olivares-Mercado, J., Aguilar, G., Toscano-Medina, K., Nakano, M., Meana, H.P.: GMM vs. SVM for face recognition and face verification. In: Corcoran, P. (ed.) Reviews, Refinements and New Ideas in Face Recognition (2011). ISBN: 978-953-307-368-2

Chapter 27

An Efficient Exploratory Demographic Data Analytics Using Preprocessed Autoregressive Integrated Moving Average
Siddhesh Nandakumar Menon, Shubham Tyagi, and Venkatesh Gauri Shankar

Abstract The demographic dividend is an essential measure of a country's growth and development. It refers to the growth of the economy that results from a shift in the age structure of the country's population. In India, around 90% of the population is under the age of 60, in stark contrast to the world as a whole, where more than 20% of the population is above 60. Such a young population ensures that the working-age group will remain vibrant in the coming years, adding to the country's overall productivity. Today, COVID has caused much damage to an already vibrant economy: millions of people have lost their jobs and have had to migrate back to their hometowns. To recover from this severe damage and take stock of the existing and incoming workforce, it is necessary to identify and analyze the part of the current population that lies in suitable age ranges and to understand how to use it optimally. Analyzing the demographic dividend to identify the country's workforce is therefore an essential task.

S. N. Menon · S. Tyagi
Data Science, Manipal University Jaipur, Jaipur, Rajasthan, India
V. G. Shankar (B)
Department of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_27

27.1 Introduction Are there enough people both to pay taxes and to provide retiree health benefits? Can everyone find a job in the future? Do all states in India have sufficient medical facilities? These prominent issues are all related to the increasing population. Government planning, risk management strategies and development instruments depend heavily on population predictions. Projections are critical for envisioning the future and for planning for worst-case situations. They therefore inform the government's literacy, employment and many other policies in both the short and the long term. Infrastructure for public education, healthcare and spiritual sites can also be planned with greater control, and governments can provide public services with significant long-term advantages. Basic job search and job placement abilities, together with a fundamental understanding of labor practices, financial investments, health research and medical resource development, are essential for long-term workforce planning. Environmental, military, geopolitical and other dangers may emerge in the long run without effective long-term plans; governments need such plans to avoid these events and to act immediately to minimize their effects [1]. The COVID-19 pandemic has affected employment, the economy, education and healthcare. Businesses making long-term investments, for example in the pharmaceutical industry, also need population scenarios [2]. India's demography has been in the limelight in recent years because the country has shown an unprecedented change in its working population, as described by the Population Division of the Department of Economic and Social Affairs of the UN Secretariat (UNPD) [3]. For India, the Census of India provides population data and projections. Statutory bodies such as the Sample Registration System under the Ministry of Home Affairs perform annual sample surveys for population insights. While most of the world is getting older, India is becoming younger. The term "demographic dividend" was coined by the United Nations Population Fund (UNFPA). The demographic dividend means that economic productivity increases when the workforce grows relative to the number of dependents. UNFPA states that "a country with an increasing youth population and a declining fertility rate can benefit from demographic dividends" [4]. With 28 states and eight union territories, India has a workforce advantage over other countries, including China, which had the largest population, with an 18.47% share of the world total in 2021 [5].
The demographic dividend is a time-specific process for evaluating the potential of the labor force that will contribute to the nation's GDP [6]. It occurs when a shift in the age composition boosts the working-age population, as the total fertility rate (TFR, the number of births per woman) declines after the growth in life expectancy stabilizes. Calculating demographic dividends is a tedious task: the state-wise breakdown generates many rows and columns for every state, and the data also need to be scraped from legitimate sources. This is where modern tools such as Python and Microsoft Excel 365 come in. Microsoft Excel 365 offers various data analysis tools, so that everything from data scraping to forecasting, including time series analysis, can be done without writing code, and it handles such bulky data efficiently. Python libraries help organize the data more efficiently and then process it for visualization, including population visualizations and segmentation into rural, urban and total populations [7, 8]. Plotly, a data visualization library, makes it easier to gain insights from the data, and it lets users fine-tune parameters for better, interactive visualizations. Data visualization plays a prominent role in identifying trends, which can be examined through line graphs, multiple line graphs, bar graphs, scatter plots and more. These trends are then processed further, with time as the axis, for time series analysis and forecasting. Visualizing the age-group structure gives a clear insight into the infant, child, teenage, adult and elderly populations, with male and female distinctions. A structure that visualizes the broad age groups by male and female is termed a population pyramid.

27.2 Methodology This paper analyzes the state-wise demographic dividend of India. The data were extracted from various sources; they were incomplete and had to be preprocessed before visualization. The visualization was done with Python and several of its libraries. These steps are elaborated below.

27.2.1 Data Collection To analyze the demographic dividend of an area, the first and foremost requirement is to gather the relevant data. The data need to be accurate, since incorrect or erroneous data may lead to a flawed analysis. The most recommended sources for this analysis are therefore official government census records, which contain yearly records of employment, literacy and mortality rates. In this paper, we collect state-wise data from official government and census records, as published by the Ministry of Home Affairs, India, and use them for preprocessing. The sources are as follows:

27.2.1.1 SRS Statistical Records

SRS stands for Sample Registration System. Birth and death registration is an important demographic source for socioeconomic growth and population control in developing countries, and data on population growth, births and deaths form the basis of demographic predictions. To obtain reliable data, the Office of the Registrar General of India set up a sample registration scheme for births and deaths, commonly known as the Sample Registration System (SRS), which was started on a pilot basis in 1964–65 and on a full scale in 1969–70 to provide continuous data on these indicators [9]. Since then, SRS has provided data on a routine basis, and various methods have been tried for data collection. India's SRS is based on a dual recording system: the field investigation involves the ongoing registration of births and deaths in randomly selected town/city blocks by local auditors and inspectors. The data collected from these two sources are compared for discrepancies, and unmatched records are re-checked to create an error-free data source with no duplication. The SRS data used in this paper come from the Report of the Technical Group on Population Projections, published by the Population Committee of the Department of Health and Family Welfare in November 2019. This report contains demographic projections for India for 2011–2036 [10]. It also gives urban and rural state-wise populations, separated by gender, projected up to 2036, as well as projections of sex ratios, net migration rates, expectation of life at birth and total fertility rates for India, all the states and all the union territories.

27.2.1.2 2011 and 2016 Population Composition Reports

Population composition reports are derived from SRS itself. These reports, developed from the half-yearly SRS surveys, generally describe the population by age group, sex and marital status. One of their essential components is age composition. The age composition by residence and by broad age group, namely 0–14, 15–59 and 60+, is essential for analyzing demographic dividends. The broad age group of 15–59 comprises the working or labor class [5], and most operations and data extractions are performed on this group to derive meaningful analytical models.
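Given the broad age groups above, two simple indicators related to the demographic dividend are the working-age share and the dependency ratio. The sketch below assumes the 15–59 working-age band used in this chapter (many international sources use 15–64 instead); the chapter does not give these exact formulas, and the input values are made up.

```python
def age_structure_indicators(pop_0_14, pop_15_59, pop_60_plus):
    """Working-age share and dependency ratio from the three broad SRS age groups."""
    total = pop_0_14 + pop_15_59 + pop_60_plus
    return {
        # fraction of the population in the 15-59 working-age band
        "working_age_share": pop_15_59 / total,
        # dependants (0-14 and 60+) per person of working age
        "dependency_ratio": (pop_0_14 + pop_60_plus) / pop_15_59,
    }


# Made-up percentages, for illustration only.
print(age_structure_indicators(25.0, 65.0, 10.0))
```

Tracking how the dependency ratio changes across projection years, state by state, is one way to see when and where the working-age population grows faster than the dependent population.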

27.2.1.3 CMIE Unemployment Reports

CMIE (Centre for Monitoring Indian Economy Pvt. Ltd.) provides its clients with professional analysis tools and with financial and business databases for decision-making and analysis, in which data are examined to determine economic patterns. CMIE has created a database of the financial results of Indian companies. It conducts the most comprehensive survey of household income, expenses and savings, specifically monitors new investment projects, and maintains the most comprehensive integrated database of the Indian economy [11]. CMIE is currently developing a real-time unemployment database, using R to analyze the country's current trends and unemployment rates. It provides the unemployment rates of India's urban and rural populations, the 30-day moving average of unemployment and various other forecasts. It also provides state-wise unemployment records in weekly or 30-day format, including the unemployment rates and numbers for each state, from 2016 onward.
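A 30-day moving average like the one CMIE reports smooths out day-to-day noise in a rate series. The sketch assumes a simple trailing mean over daily observations; CMIE's exact method is not described in this chapter. With pandas, `series.rolling(window=30).mean()` gives the same result, with NaN instead of None for the first 29 days.

```python
def trailing_moving_average(values, window=30):
    """Trailing mean over the last `window` observations; None until enough data exist."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the observation leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Tiny example with a 2-day window.
print(trailing_moving_average([2, 4, 6], window=2))  # [None, 3.0, 5.0]
```

Keeping a running sum makes each update constant-time, so the whole series is smoothed in a single pass.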

27.2.2 Data Preprocessing The data source should be authentic in any research work. For this research, we scraped the data from the Report of the Technical Group on Population Projections, November 2019, from the Census of India. The dataset was unstructured, so we preprocessed it and converted the tables in the report into a structured form.

27 An Efficient Exploratory Demographic Data Analytics …

We used Microsoft Power Query to structure the tables, which let us scrape the data more efficiently without compromising its authenticity. This is the most efficient approach because it reduces the risk of human error. To build an effective data model, we need to train it on historical data, in other words, to create a knowledge base for tracking population trends. To reduce ambiguity in the projected data, we omitted a few states and union territories, because the data come from the 2011 Population Census Report, when the state of Telangana and the union territories of Jammu and Kashmir and Ladakh had not yet been formed. In the 2011 Census Report, the data for the northeastern states (the Seven Sisters) were reported cumulatively because of their small populations and the difficulty of surveying remote areas. Merging data from different sources was particularly time-consuming: we merged data from the Centre for Monitoring Indian Economy Pvt. Ltd (CMIE), the SRS sample reports, the Census Report, and state demographic records into a single sheet, and maintaining the consistency and authenticity of the data was a challenging task. We used the same total fertility rate (TFR) parameters for every state as input for calculating the total population. Table 27.1, from the 2011 Population Census Report, gives the TFR (SRS, 2017) and the lower asymptote for every state.
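The merge was done in Microsoft Power Query. As an illustrative pandas alternative (not the authors' workflow), the sketch below joins per-state tables from several sources on a normalised state name and lists the states that are missing from at least one source, such as states formed after 2011. The source names, column names, and function name are hypothetical.

```python
import pandas as pd


def merge_state_sources(sources: dict[str, pd.DataFrame]) -> tuple[pd.DataFrame, list[str]]:
    """Outer-join per-state tables on a normalised 'state' column.

    Returns the merged sheet and the list of states that have a missing
    value in at least one source (candidates for manual review).
    """
    merged = None
    for name, df in sources.items():
        df = df.copy()
        # Normalise the join key: trim, collapse inner whitespace, lower-case.
        df["state"] = (df["state"].str.strip()
                       .str.replace(r"\s+", " ", regex=True)
                       .str.lower())
        if df["state"].duplicated().any():
            raise ValueError(f"source {name!r} has more than one row per state")
        # Prefix columns with the source name so provenance stays visible.
        df = df.set_index("state").add_prefix(f"{name}_")
        merged = df if merged is None else merged.join(df, how="outer")
    incomplete = merged.index[merged.isna().any(axis=1)].tolist()
    return merged.reset_index(), incomplete
```

Normalising names before joining matters because small spelling differences between sources would otherwise produce silent non-matches rather than errors.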

27.3 Data Visualization and Data Insights Data visualization is an essential technique for viewing and understanding large quantities of statistical data. It enables users to comprehend trends in the data easily and to adjust and tweak the data as required [12]. As mentioned above, we used Python and its libraries to visualize the extracted and processed data. Python's libraries allow us to analyze data and derive insights from it. We use the data analysis libraries NumPy and pandas. NumPy is Python's core library for numerical computing; it provides multidimensional arrays together with mathematical and basic statistical functions. pandas can import data from various sources, such as CSV, JSON, SQL databases, and Microsoft Excel files, so the Excel sheets that contain the preprocessed data can be imported with it. The visualization library used is Plotly, which can display extensive statistical data in various kinds of graphical plots [13, 14].

A population pyramid, also known as an "age-gender pyramid," is a graphical depiction of the distribution of a population (generally that of a nation or a region of the world) by age group and gender. It typically takes the shape of a pyramid when the population is growing. A population pyramid is drawn as a horizontal bar chart made of two back-to-back histograms: population size is shown on the x-axis, and age groups are shown on the vertical y-axis. The length of each bar may represent either a raw count or a percentage of the total population. By convention, males are shown on the left and females on the right.
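The pyramid convention described above can be expressed as a Plotly figure specification. The chapter does not show its plotting code, so this is a sketch under assumptions: it builds a plain dict (a form Plotly accepts in place of a `Figure` object) rather than importing Plotly, and the function and argument names are hypothetical. Male values are negated so that their bars extend to the left of zero.

```python
def population_pyramid_spec(age_groups, males, females, title="Population pyramid"):
    """Return a Plotly figure specification (a plain dict) for a population pyramid.

    Male values are negated so their bars extend left of zero and female
    bars extend right, with age groups on the y-axis and population on the x-axis.
    """
    if not len(age_groups) == len(males) == len(females):
        raise ValueError("age_groups, males and females must have the same length")
    ages = list(age_groups)
    return {
        "data": [
            {
                "type": "bar",
                "orientation": "h",
                "name": "Male",
                "y": ages,
                "x": [-m for m in males],
                # Hover shows the positive count instead of the negated value.
                "customdata": list(males),
                "hovertemplate": "%{y}: %{customdata}<extra>Male</extra>",
            },
            {
                "type": "bar",
                "orientation": "h",
                "name": "Female",
                "y": ages,
                "x": list(females),
                "hovertemplate": "%{y}: %{x}<extra>Female</extra>",
            },
        ],
        "layout": {
            "title": {"text": title},
            "barmode": "relative",  # both bars share each age row, split at zero
            "bargap": 0.05,
            "xaxis": {"title": {"text": "Population"}},
            "yaxis": {"title": {"text": "Age group"}},
        },
    }
```

To render it, pass the dict to `plotly.graph_objects.Figure(...)` and call `.show()`. The male side of the x-axis will show negative tick values; the hover labels use `customdata` to display the absolute count.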

S. N. Menon et al.

Table 27.1 2011 population census report depicting TFR and lower asymptote for every state as per 2017

S No. | State | TFR (SRS, 2017) | Lower asymptote
1 | India | 2.1 | 1.667
2 | Andhra Pradesh | 1.6 | 1.5
3 | Assam | 2.3 | 1.8
4 | Bihar | 3.2 | 1.8
5 | Chhattisgarh | 2.4 | 1.8
6 | NCT of Delhi | 1.5 | 1.5
7 | Gujarat | 2.2 | 1.7
8 | Haryana | 2.2 | 1.7
9 | Himachal Pradesh | 1.6 | 1.5
10 | Jammu & Kashmir | 1.6 | 1.5
11 | Jharkhand | 2.5 | 1.8
12 | Karnataka | 1.7 | 1.5
13 | Kerala | 1.7 | 1.5
14 | Madhya Pradesh | 2.7 | 1.8
15 | Maharashtra | 1.7 | 1.5
16 | Odisha | 1.9 | 1.7
17 | Punjab | 1.6 | 1.5
18 | Rajasthan | 2.6 | 1.8
19 | Tamil Nadu | 1.6 | 1.5
20 | Telangana | 1.7 | 1.5
21 | Uttar Pradesh | 3.0 | 1.8
22 | Uttarakhand | 1.9 | 1.6
23 | West Bengal | 1.6 | 1.5

Population pyramids are often regarded as an effective way to depict the age and sex distribution of a population, in part because of the clear image they present [15] (Fig. 27.1). Fig. 27.1 shows the population pyramids for 2011 and 2021. Over these 10 years, the number of people in the 0–10 age bucket decreased sharply, while the number of people aged 15–30 increased [10]. The SRS report and the line plot show this trend continuing in the following years, suggesting a growing young workforce in the coming years as well as an increase in the population aged 60+. As Fig. 27.2 shows, between 2011 and 2016 most states saw a steep increase in unemployment, with Uttar Pradesh showing the largest rise. Unemployment then fell in most states through 2021, except in states such as Rajasthan and Jharkhand. This plot can deliver insights such as which states need to allocate more resources to boost the economy and create new jobs for the workforce to reduce unemployment [16].

Fig. 27.1 Population pyramid for the years 2011 and 2021

Fig. 27.2 Unemployment in the states from 2011 to 2021
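The state-wise comparison behind statements such as "Uttar Pradesh saw the highest rise" can be reproduced with a pivot. This is a sketch only: it assumes a long-format table with the hypothetical columns `state`, `year`, and `unemployed`, and it does not use the chapter's data.

```python
import pandas as pd


def unemployment_change(df: pd.DataFrame, start: int, end: int) -> pd.Series:
    """Change in the number of unemployed per state between two years.

    `df` is a long-format table with the hypothetical columns
    'state', 'year' and 'unemployed'. The result is sorted so that the
    state with the largest rise comes first.
    """
    # One row per state, one column per year.
    wide = df.pivot(index="state", columns="year", values="unemployed")
    change = wide[end] - wide[start]
    return change.sort_values(ascending=False)
```

Calling `unemployment_change(df, 2011, 2016)` and then `unemployment_change(df, 2016, 2021)` separates the rise in the first period from the later decline described above.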

27.4 Time Series Forecasting Machine learning and deep learning techniques are widely applied to predictive modeling problems. Makridakis et al. [17] compared different time series forecasting methods, assessing eight classical techniques and ten machine learning methods on one-step and multi-step predictions over a large number of time series. The eight traditional approaches, often referred to as "classical techniques," were simple exponential smoothing (SES), Holt, damped exponential smoothing, the theta method, ARIMA, ETS, Naïve 2, and a combination (average) of exponential smoothing methods. The machine learning methods included multilayer perceptron (MLP), Bayesian neural network (BNN), radial basis function (RBF), kernel regression, K-nearest neighbors (KNN), CART regression trees (CART), Gaussian process, long short-term memory (LSTM), and recurrent neural network (RNN). To replicate and compare with previous research, the machine learning models were applied, and the data processed, following the approach of an earlier study [18, 19]. For one-step forecasting, MLP and BNN performed best among the machine learning methods, while RNN and LSTM gave the least satisfactory results. For multi-step forecasting, theta, ARIMA, and the combination of exponential smoothing methods performed best. Overall, ETS and ARIMA were the most accurate, and the study therefore recommends trying these two methods before more elaborate ones are explored. Based on these results, we applied an ARIMA model to our demographic dataset [20–22]. ARIMA (autoregressive integrated moving average) is a class of models that explains a time series in terms of its own past values, so that the model can be used to forecast future values.
It is a statistical model that predicts specific future values of a series from its previous values. ARIMA is one of the most widely used time series forecasting models in demographic analysis because it is easy to apply and often accurate. In this paper, we implemented a basic statistical ARIMA model to forecast the population of the state of Andhra Pradesh. Fig. 27.3 shows the state's population for 2016–2021, to which the ARIMA model was applied [23], together with the autocorrelation plot, which shows the degree of randomness in the data. We used the ARIMA order (p, d, q) = (2, 1, 0), that is, two autoregressive terms, first-order differencing, and no moving-average terms. Fitting the ARIMA model produces a SARIMAX results summary, which reports the number of observations, the dependent variable, the covariance type, and, for each coefficient, its estimate, standard error, and p-value. We then generated predictions from the model; Fig. 27.4 shows the predicted and expected values. The forecasts give the predicted population of Andhra Pradesh for the coming years. However, the forecast returned a high RMSE value, which means that the data need further tweaking and the ARIMA model must be optimized further to improve accuracy.

Fig. 27.3 Population line graph for the population for the state of Andhra Pradesh and the autocorrelation plot for the line graph

Fig. 27.4 Expected and predicted values from the ARIMA model
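The (2, 1, 0) order can be made concrete with a small NumPy sketch. With no moving-average terms, ARIMA(2, 1, 0) reduces to an AR(2) regression on the first differences, $w_t = c + \phi_1 w_{t-1} + \phi_2 w_{t-2} + \varepsilon_t$ with $w_t = y_t - y_{t-1}$, which can be estimated by conditional least squares. This is a swapped-in, simplified estimator: the statsmodels `ARIMA` class (`statsmodels.tsa.arima.model.ARIMA`), whose `summary()` prints a "SARIMAX Results" table like the one described above, fits the model by exact maximum likelihood, and by default it omits the constant (drift) term when d = 1, so its coefficients will differ. The chapter does not name its library, and the function names here are hypothetical.

```python
import numpy as np


def fit_arima_210(y, drift=True):
    """Conditional least-squares fit of ARIMA(2, 1, 0).

    d = 1: model the first differences w_t = y_t - y_{t-1}.
    p = 2: regress w_t on w_{t-1} and w_{t-2} (plus a constant if drift=True).
    q = 0: no moving-average terms, so ordinary least squares applies.
    Returns (c, phi1, phi2); c is 0.0 when drift=False.
    """
    y = np.asarray(y, dtype=float)
    n_rows, n_params = len(y) - 3, 3 if drift else 2
    if n_rows < n_params:
        raise ValueError("series too short to estimate ARIMA(2, 1, 0)")
    w = np.diff(y)
    cols = [w[1:-1], w[:-2]]                # lag-1 and lag-2 differences
    if drift:
        cols.insert(0, np.ones(n_rows))     # intercept column
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, w[2:], rcond=None)
    c = float(coef[0]) if drift else 0.0
    return c, float(coef[-2]), float(coef[-1])


def forecast_arima_210(y, coef, steps):
    """Iterate the fitted difference equation and undo the differencing."""
    y = np.asarray(y, dtype=float)
    if len(y) < 3:
        raise ValueError("need at least three observations to forecast")
    c, phi1, phi2 = coef
    w1, w2 = y[-1] - y[-2], y[-2] - y[-3]   # two most recent differences
    level, out = y[-1], []
    for _ in range(steps):
        w_next = c + phi1 * w1 + phi2 * w2
        level += w_next                     # integrate: y_t = y_{t-1} + w_t
        out.append(level)
        w1, w2 = w_next, w1
    return np.array(out)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape or actual.size == 0:
        raise ValueError("inputs must be non-empty and of equal shape")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

To evaluate a forecast, one would fit on the earlier years, forecast the held-out years, and pass both to `rmse`. Because RMSE is expressed in persons, whether it is "high" is best judged against the population level. In this sketch, six annual observations (2016 to 2021) give the drift model exactly three equations for three parameters, so the fit has no residual degrees of freedom, which is a reminder of how little data such a series provides.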


27.5 Future Work and Conclusion Our prime target is to make the results more accurate by reducing the RMSE of the ARIMA model. Better accuracy would allow the model to be applied to all states to forecast various quotients and predict population numbers. The demographic dividend analysis needs to be fine-tuned to consider the healthy population. There is also a need to consider the literacy rate along with health and unemployment. The unemployment rate will be further classified into the marginal-worker rate and the total unemployment rate. This paper can open the door to various other projects. The model can be used to strategize state-level government policies for India's youth. State-level age-group projections from the model can also benefit businesses whose agendas focus primarily on particular age groups. State-wise demographic dashboards with time series forecasting would help in modeling policies based on each state's population and in comparing it with other states' populations or with the national total. The approach also makes it possible to segment the demographic dividend into rural and urban components, giving a clear picture of the condition of each state's workforce. In conclusion, the paper deals with India's giant population and its capacity to become an optimal workforce for developing the country. Identifying this workforce and equipping it with the tools required to work toward development is one of the primary and most important tasks the government needs to focus on. Literacy and employment are two areas that need to be worked on to make the country's young workforce capable and are, therefore, the need of the hour.

References
1. Chauhan, S., Arokiasamy, P.: India's demographic dividend: state-wise perspective. J. Soc. Eco. Dev. 20(1), 1–23 (2018). https://doi.org/10.1007/s40847-018-0061-7
2. Devi, B., Shankar, V.G., Srivastava, S., Srivastava, D.K.: AnaBus: a proposed sampling retrieval model for business and historical data analytics. In: Sharma, N., Chakrabarti, A., Balas, V. (eds.) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol. 1016. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9364-8_14
3. Population Division: United Nations. www.un.org/development/desa/pd. Accessed 20 Apr 2021
4. Wikipedia contributors: Demographic Dividend. Wikipedia. en.wikipedia.org/wiki/Demographic_dividend#:%7E:text=In%20other%20words%2C%20it%20is,to%20reap%20a%20demographic%20dividend (2021)
5. Demographics of India: Wikipedia. en.wikipedia.org/wiki/Demographics_of_India (2021)
6. Vollset, S.E., et al.: Fertility, mortality, migration, and population scenarios for 195 countries and territories from 2017 to 2100: a forecasting analysis for the Global Burden of Disease Study. The Lancet 396(10258), 1285–1306 (2020). https://doi.org/10.1016/S0140-6736(20)30677-2
7. Shankar, V.G., Sisodia, D.S., Chandrakar, P.: A novel discriminant feature selection–based mutual information extraction from MR brain images for Alzheimer's stages detection and prediction. Int. J. Imaging Syst. Technol. 1–20 (2021). https://doi.org/10.1002/ima.22685
8. Shankar, V.G., Sisodia, D.S., Chandrakar, P.: DataAutism: an early detection framework of autism in infants using data science. In: Sharma, N., Chakrabarti, A., Balas, V. (eds.) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol. 1016. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9364-8_13
9. Mozumder, K.A., Koenig, M.A., Phillips, J.F., Murad, S.: The sample registration system: an innovative system for monitoring demographic dynamics. Asia-Pac. Popul. J. 5(3), 63–72 (1990)
10. Census of India: Sample Registration. Censusindia. censusindia.gov.in/vital_statistics/SRS/Sample_Registration_System.aspx. Accessed 20 Apr 2021
11. CMIE: Centre for Monitoring Indian Economy Pvt. Ltd. www.cmie.com
12. Aparicio, M., Costa, C.J.: Data visualization. Commun. Des. Q. Rev. 3(1), 7–11 (2015). https://doi.org/10.1145/2721882.2721883
13. Shankar, V.G., Devi, B., Srivastava, S.: DataSpeak: data extraction, aggregation, and classification using big data novel algorithm. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds.) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol. 810. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-1513-8_16
14. Shankar, V.G., Devi, B., Bhatnagar, A., Sharma, A.K., Srivastava, D.K.: Indian air quality health index analysis using exploratory data analysis. In: Sharma, D.K., Son, L.H., Sharma, R., Cengiz, K. (eds.) Micro-Electronics and Telecommunication Engineering. Lecture Notes in Networks and Systems, vol. 179. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4687-1_51
15. Future of India: Trends Projections Age-Cohort Analysis. Proximityone. proximityone.com/future_of_india.htm. Accessed 20 Apr 2021
16. Crespo Cuaresma, J., Lutz, W., Sanderson, W.: Is the demographic dividend an education dividend? Demography 51, 299–315 (2014). https://doi.org/10.1007/s13524-013-0245-x
17. Makridakis, S., Spiliotis, E., Assimakopoulos, V.: Statistical and machine learning forecasting methods: concerns and ways forward. PLoS ONE 13(3), e0194889 (2018). https://doi.org/10.1371/journal.pone.0194889
18. Ahmed, N.K., Atiya, A.F., El Gayar, N., El-Shishiny, H.: An empirical comparison of machine learning models for time series forecasting. Economet. Rev. 29(5–6), 594–621 (2010)
19. Devi, B., Shankar, V.G., Srivastava, S., Nigam, K., Narang, L.: Racist tweets-based sentiment analysis using individual and ensemble classifiers. In: Sharma, D.K., Son, L.H., Sharma, R., Cengiz, K. (eds.) Micro-Electronics and Telecommunication Engineering. Lecture Notes in Networks and Systems, vol. 179. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4687-1_52
20. Tyagi, S.: Something about Time Series Analysis. Medium. shubhamtyagi.medium.com/something-about-time-series-analysis-e94ae49cf9f1 (2021)
21. Population Pyramids of the World from 1950 to 2100. PopulationPyramid.net. www.populationpyramid.net/india/2020. Accessed 20 Apr 2021
22. Zakria, M., Muhammad, F.: Forecasting the population of Pakistan using the ARIMA model. Pak. J. Agri. Sci. 46(3), 214–223 (2009)
23. Brownlee, J.: How to create an ARIMA model for time series forecasting in Python. In: Machine Learning Mastery (2020). machinelearningmastery.com/arima-for-time-series-forecasting-with-python

Chapter 28

Classification of VASA Dataset Using J48, Random Forest, and Naive Bayes
S. Anitha and M. Vanitha

Abstract Nowadays, increasing work pressure is causing diseases among employees in many organizations and companies. Predicting diseases using data mining techniques plays an important role in the medical industry. The synthetic dataset named VASA contains data collected from employees affected by work pressure. In this paper, we analyze the capability of different classifiers, namely the J48, random forest, and Naive Bayes classifiers, to predict diseases from the VASA dataset. The output of each classifier is compared in terms of accuracy, TPR, TNR, precision, and error rate to find the best classifier, that is, the one with the highest accuracy and the lowest error rate.

28.1 Introduction The real dataset was collected from various employees working under pressure. Data mining is generally applied to extract hidden information from large databases, and it can be used to predict diseases [1]. Data mining is also an interdisciplinary field, because it draws on methods from machine learning, deep learning, artificial intelligence, etc. [2]. Data mining techniques include J48, random forest, and Naive Bayes. These methods can be applied to a dataset containing information about employees under work pressure in order to predict diseases. The dataset contains 523 records, each with 41 attributes (Table 28.1). Each technique is applied to the dataset to produce a confusion matrix, and the results derived from the confusion matrices are compared to find the best technique for predicting diseases [3–5]. This paper is organized into five sections. Section 28.2 reviews related work on data mining techniques. Section 28.3 presents the methodology for predicting diseases of employees. Section 28.4 presents the results and discussion. Finally, Sect. 28.5 concludes the paper and outlines future enhancements of the work.

S. Anitha (B) · M. Vanitha
Alagappa University, Karaikudi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_28


S. Anitha and M. Vanitha

Table 28.1 VASA dataset
Attribute | Values
Age | 1. 20–30, 2. 31–40, 3. 41–50, 4. Above 50
Sex | 1. Male, 2. Female
Marital status | 1. Single, 2. Married
Highest qualification | 1. Below SSLC, 2. Graduate, 3. Post Graduate, 4. Technical Qualification
Working environment | 1. City, 2. Village
Nature of works | 1. Physical, 2. Mental, 3. Both
Working sector | 1. Govt., 2. Private, 3. Business
Occupation | 1. Teacher/Professor, 2. Doctor, 3. Engineer, 4. Bank Employees, 5. Business, 6. Labors, 7. Others
Working hours | 1. 6 h, 2. 8 h, 3. > 8 h
Overtime hours | 1. 2 h, 2. 5 h, 3. > 5 h
Job satisfaction | 1. Yes, 2. No
Working experience | 1. 2–5 years, 2. 6–10 years, 3. > 10 years
Sufficient income | 1. Yes, 2. No
Work pressure | 1. Yes, 2. No
Addiction | 1. Smoking, 2. Alcohol, 3. Tobacco
Food | 1. Veg., 2. NV, 3. Both
Sleeping hours | 1. < 6 h, 2. 6–8 h, 3. > 8 h
Physical activities | 1. Yes, 2. No
Type of roles | 1. Major, 2. Minor
Leisure time | 1. Reading books, 2. Listening Music, 3. Watching TV, 4. Playing chess
Diabetes | 1. Yes, 2. No
Blood pressure | 1. Yes, 2. No
Headache | 1. Yes, 2. No
Mental illness (depression) | 1. Yes, 2. No
Heart disease | 1. Yes, 2. No
Gastritis (Ulcer) | 1. Yes, 2. No
Stroke | 1. Yes, 2. No
Exercise regularly | 1. Yes, 2. No
Any continuous medication | 1. Yes, 2. No
How long suffer | 1. 6 months, 2. 1 year, 3. > 1 year
Body weight | 1. < 40 kg, 2. 41–60 kg, 3. > 60 kg
Feel tired or depressed | 1. Yes, 2. No
Working conditions | 1. Satisfactory, 2. Dissatisfactory, 3. Cannot say
Better on the job if conditions changed | 1. Yes, 2. No
(continued)

28 Classification of VASA Dataset …


Table 28.1 (continued)
Attribute | Values
Job affects your family | 1. Yes, 2. No
Control over life | 1. Yes, 2. No
Any argument | 1. Yes, 2. No
Activity for stress relief in organization | 1. Yes, 2. No
Are underpaid | 1. Yes, 2. No
Are undervalued | 1. Yes, 2. No
Appreciation for good work | 1. Yes, 2. No

28.2 Related Work Bharati et al. [6] used different classification algorithms, namely Naive Bayes, random forest, logistic regression, multilayer perceptron, and K-nearest neighbor (KNN), to predict breast cancer on a UCI dataset. They compared the Kappa statistic, the TP and FP rates, and precision, and found that KNN gave the best accuracy for predicting breast cancer. Maliha et al. [7] predicted cancer using the Naive Bayes, K-nearest neighbor, and J48 classification algorithms. The original dataset was split into training and testing data, each classifier's output was evaluated through its confusion matrix, and KNN achieved the highest accuracy. Mir et al. [8] used machine learning algorithms to predict diabetes on the Pima Indian Diabetes Dataset. They compared four classifiers, CART, Naive Bayes, random forest, and SVM, in terms of training time, testing time, and accuracy, and concluded that SVM was the best classifier. Deepa et al. [9] detected autism spectrum disorder (ASD) in children using a UCI dataset and three data mining techniques: decision table, SVM, and Naïve Bayes. All three gave reliable results, with a Kappa statistic of 100%. Bhargava et al. [10] used the CART algorithm to predict heart attack from heart disease patient data collected from medical practitioners. The data have eight attributes, including the disease attribute, and 90 instances in total, and the Weka tool was used to predict whether a patient had suffered a heart attack. Alam et al. [11] used feature-ranking algorithms and then applied random forest to the highest-ranked features. The method was evaluated for disease prediction on ten benchmark datasets from UCI, including the breast cancer, diabetes, Bupa, hepatitis, SPECTF, heart, planning relax, Parkinson's, and HCC datasets.


William et al. [12] used an enhanced fuzzy c-means algorithm to classify cervical cancer from Pap smears and reported better accuracy. Patil et al. [13] used the C4.5 algorithm to predict the survivability of burn patients from data collected at a hospital. The dataset of 180 samples was divided into 104 training and 76 testing instances, and the Weka tool was used to produce a pruned decision tree for predicting survival; this technique gave the best sensitivity and specificity.

28.3 Methodology This paper discusses the J48, random forest, and Naive Bayes classification algorithms. The system architecture is as follows:

Dataset → Preprocessing → Classifier (J48 / Random Forest / Naive Bayes) → Performance Analysis

First, the dataset is preprocessed; then the three classifier algorithms are executed on it; finally, their outputs are compared to identify the best result.


28.3.1 J48 Algorithm The synthetic dataset includes independent and dependent variables. The J48 algorithm is trained on this dataset and then used to predict the class of new records. It handles discrete and continuous data as well as missing values [14], and after the tree is created it offers an option to prune it. The classifier builds the tree greedily, which reduces the error, and the resulting tree can be expressed as classification rules.
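J48 is Weka's Java implementation of Quinlan's C4.5, which chooses each split by the gain ratio: the information gain of an attribute divided by its split information. The sketch below computes only that criterion for categorical attributes such as VASA's coded values; it omits continuous thresholds, missing-value handling, recursive tree growth, pruning, and C4.5's further refinements (for example, restricting the choice to attributes with at least average gain). The function names and example attributes are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of a categorical attribute.

    gain       = H(labels) - sum_v |S_v|/|S| * H(labels in S_v)
    split_info = -sum_v |S_v|/|S| * log2(|S_v|/|S|)
    """
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    weights = [len(p) / n for p in partitions.values()]
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Attribute with the highest gain ratio (the root split in this sketch)."""
    return max(attributes, key=lambda a: gain_ratio([r[a] for r in rows], labels))
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.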

28.3.2 Random Forest Random forest is a classification algorithm in the category of supervised learning. It can be used for both classification and regression, but it is mainly used for classification problems. A random forest consists of many decision trees, and its output class is the mode of the classes output by the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. One important property of random forest is that its out-of-bag error, computed on the training records that each tree did not see, is an approximately unbiased estimate of the cross-validated error rate. The random forest procedure is as follows.
Step 1: Select random samples from the given dataset.
Step 2: Construct a decision tree for every sample and obtain a prediction from each tree.
Step 3: Perform a vote over the predicted results.
Step 4: Select the prediction result with the most votes as the final prediction.
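The four steps can be sketched in plain Python. To keep the example short, each "tree" is a depth-1 stump over categorical attributes rather than the fully grown tree a real random forest uses, and each stump considers a random subset of about √(number of attributes) features. The stump simplification and all names are assumptions; the chapter does not describe its implementation.

```python
import math
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Depth-1 tree: for each candidate feature, predict the majority label of
    every feature value, and keep the feature that fits the sample best."""
    best = None
    for f in features:
        by_value = {}
        for row, y in zip(rows, labels):
            by_value.setdefault(row[f], []).append(y)
        rule = {v: majority(ys) for v, ys in by_value.items()}
        correct = sum(rule[row[f]] == y for row, y in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    _, feature, rule = best
    return feature, rule, majority(labels)  # last item: default for unseen values


def fit_forest(rows, labels, n_trees=25, max_features=None, seed=0):
    rng = random.Random(seed)
    features = list(rows[0])
    m = max_features or max(1, int(math.sqrt(len(features))))
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (draw len(rows) records with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of candidate features for this tree.
        candidates = rng.sample(features, m)
        # Step 2: build one (shallow) tree per sample.
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], candidates))
    return forest


def predict(forest, row):
    # Step 3: every tree votes.
    votes = [rule.get(row[feature], default) for feature, rule, default in forest]
    # Step 4: the class with the most votes wins.
    return majority(votes)
```

Bootstrapping and random feature subsets make the individual trees differ from one another, which is what lets the majority vote reduce the variance of a single tree.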

28.3.3 Naive Bayes The Naive Bayes classifier is a data mining classification algorithm based on Bayes' theorem. It assumes that, within a given class, the effect of an attribute value is independent of the values of the other attributes. This assumption is called class-conditional independence, and the resulting model is a simple probabilistic classifier. Naive Bayes combines Bayes' probability with a decision rule: the posterior probability of class $C_i$ given a record $X$ is

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$

and the record is assigned to the class with the highest posterior probability. Under class-conditional independence, $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$ for a record $X = (x_1, \ldots, x_n)$.
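A minimal categorical Naive Bayes sketch that applies Bayes' theorem and the class-conditional independence assumption directly. It works in log space and adds Laplace (add-alpha) smoothing so that an attribute value never seen with a class does not force the whole product to zero. The smoothing and all names are assumptions, since the chapter does not say how the probabilities were estimated.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [class][attr][value]
    domain = defaultdict(set)                                  # attr -> distinct values
    for row, c in zip(rows, labels):
        for attr, v in row.items():
            value_counts[c][attr][v] += 1
            domain[attr].add(v)
    return {"classes": class_counts, "values": value_counts,
            "domain": domain, "alpha": alpha, "n": len(labels)}


def posterior(model, row):
    """Return P(C_i | X) for every class C_i."""
    alpha = model["alpha"]
    log_scores = {}
    for c, n_c in model["classes"].items():
        score = math.log(n_c / model["n"])                  # log P(C_i)
        for attr, v in row.items():
            k = len(model["domain"][attr]) or 1             # number of distinct values
            count = model["values"][c][attr][v]
            # Laplace-smoothed estimate of P(x_k | C_i)
            score += math.log((count + alpha) / (n_c + alpha * k))
        log_scores[c] = score
    # Dividing by P(X), the sum over all classes, normalises the scores.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```

The predicted class is the one with the largest posterior, for example `max(p, key=p.get)` where `p = posterior(model, row)`. Subtracting the largest log score before exponentiating keeps the normalisation numerically stable when many attributes are multiplied together.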


28.4 Results and Discussion Here, we predict diseases among employees affected by work pressure using three classification algorithms and determine which algorithm is the most accurate. The J48, random forest, and Naive Bayes algorithms are applied to the VASA dataset, and each produces a confusion matrix when executed on it. Four diseases that mainly affect employees under work pressure are predicted. The confusion matrix contains the numbers of true positives, false positives, true negatives, and false negatives, from which the accuracy, specificity, sensitivity, precision, and error rate are calculated with the following formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$

$$\text{True Positive Rate (TPR) or Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{True Negative Rate (TNR) or Specificity} = \frac{TN}{TN + FP}$$

$$\text{Error Rate} = \frac{FP + FN}{TP + FN + FP + TN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Each classification algorithm is executed on the VASA dataset in turn, and its confusion matrix gives the TP, TN, FP, and FN values for the dataset. From these values, the accuracy, true positive rate, true negative rate, error rate, and precision of the three classification algorithms are calculated using the formulas above (Tables 28.2, 28.3 and 28.4). The averages of accuracy, TNR, TPR, precision, and error rate for the three classification algorithms are compared in Table 28.5.

Table 28.2 Output of J48 algorithm
Class | Accuracy (%) | TNR (%) | TPR (%) | Precision (%) | Error rate (%)
Blood pressure | 86 | 91 | 75 | 79 | 14
Mental illness | 88 | 92 | 80 | 81 | 12
Headache | 79 | 70 | 85 | 82 | 21
Gastritis | 71 | 73 | 70 | 67 | 29


Table 28.3 Output of random forest
Class | Accuracy (%) | TNR (%) | TPR (%) | Precision (%) | Error rate (%)
Blood pressure | 85 | 91 | 72 | 79 | 15
Mental illness | 71 | 76 | 65 | 67 | 29
Headache | 85 | 65 | 97 | 82 | 15
Gastritis | 81 | 80 | 82 | 77 | 19

Table 28.4 Output of Naive Bayes
Class | Accuracy (%) | TNR (%) | TPR (%) | Precision (%) | Error rate (%)
Blood pressure | 74 | 82 | 58 | 60 | 26
Mental illness | 64 | 64 | 64 | 57 | 36
Headache | 64 | 36 | 82 | 67 | 36
Gastritis | 59 | 51 | 69 | 53 | 41

Table 28.5 Comparison of J48, random forest, and Naïve Bayes
Classifier | Accuracy (%) | TNR (%) | TPR (%) | Precision (%) | Error rate (%)
J48 | 81 | 81 | 77 | 77 | 19
Random forest | 80 | 78 | 79 | 76 | 20
Naïve Bayes | 65 | 58 | 68 | 59 | 35

From Table 28.5, the accuracies of J48, random forest, and Naïve Bayes are 81, 80, and 65%, respectively, and their error rates are 19, 20, and 35%. J48 therefore achieves the highest accuracy of the three classifiers as well as the lowest error rate; ranked by error rate, J48 < random forest < Naïve Bayes. In Figs. 28.1 and 28.2, blue represents the J48 classifier, orange the random forest classifier, and ash the Naive Bayes classifier.

Fig. 28.1 Graph of accuracy

Fig. 28.2 Graph of error rate
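The formulas in Sect. 28.4 and the averaging behind Table 28.5 can be checked with a short script. The metric function takes raw confusion-matrix counts (the chapter does not publish its confusion matrices, so any counts passed to it are the reader's own); specificity (TNR) is computed with the standard TN / (TN + FP). The per-class accuracies below are copied from Tables 28.2–28.4; the function and variable names are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Metrics from one class's confusion-matrix counts (as fractions)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),            # sensitivity
        "tnr": tn / (tn + fp),            # specificity
        "precision": tp / (tp + fp),
        "error_rate": (fp + fn) / total,  # equals 1 - accuracy
    }


# Per-class accuracy (%) as printed in Tables 28.2-28.4.
PER_CLASS_ACCURACY = {
    "J48": {"Blood pressure": 86, "Mental illness": 88, "Headache": 79, "Gastritis": 71},
    "Random forest": {"Blood pressure": 85, "Mental illness": 71, "Headache": 85, "Gastritis": 81},
    "Naive Bayes": {"Blood pressure": 74, "Mental illness": 64, "Headache": 64, "Gastritis": 59},
}


def macro_average(per_class):
    """Unweighted mean over classes."""
    return sum(per_class.values()) / len(per_class)


averages = {name: macro_average(v) for name, v in PER_CLASS_ACCURACY.items()}
ranking = sorted(averages, key=averages.get, reverse=True)
```

Here `averages` evaluates to 81.0, 80.5, and 65.25, consistent with the 81, 80, and 65% in Table 28.5 up to rounding, and `ranking` lists J48, random forest, and Naive Bayes in that order. Because the averages are unweighted, each disease class counts equally regardless of how many records it contains.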

28.5 Conclusion In this article, the real dataset of employees under work pressure is used for classification and disease prediction. Three classification algorithms, J48, random forest, and Naïve Bayes, are applied to this dataset. The dataset contains four classes; the values derived from each algorithm's confusion matrix are reported in the tables, and the comparison table identifies the better algorithm. All three algorithms performed well, but the J48 classifier produced the best accuracy and the lowest error rate of the three. In future, other classification techniques will be implemented to obtain still better accuracy.

Acknowledgements This research work has been supported by RUSA PHASE 2.0, Alagappa University, Karaikudi.

Declaration We have taken permission from competent authorities to use the images/data as given in the paper. In case of any dispute in the future, we shall be wholly responsible.

References
1. Rayka, S.S., Shet, V.N.: Cognitive analysis of data mining tools application in health care services. Published in IEEE (2020)
2. Beulah Christalin Latha, C., Carolin Jeeva, S.: Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inf. Med. Unlocked (2019)
3. Anitha, S., Vanitha, M.: The predicting diseases of employees with VASA dataset using entropy. Int. J. Adv. Sci. Technol. (IJAST) 29(4) (2020). ISSN: 2205-4238
4. Anitha, S., Vanitha, M.: Imputation methods for missing data for a proposed VASA dataset. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 9(1) (2019). ISSN: 2278-3075
5. Anitha, S., Vanitha, M.: Selecting features related to work-pressure and assessing possible diseases in humans. Int. J. Sci. Technol. Res. 8(12) (2019). ISSN: 2277-8616
6. Bharati, S., Rahman, M.A., Podder, P.: Breast cancer prediction applying different classification algorithm with comparative analysis using Weka. Published in IEEE (2018)
7. Maliha, S.K., Ema, R.R., Ghosh, S.K.: Cancer disease prediction using Naive Bayes, K-nearest neighbor and J48 algorithm. Published in IEEE (2019)
8. Mir, A., Dhage, S.N.: Diabetes disease prediction using machine learning on big data of healthcare. Published in IEEE (2018)
9. Deepa, B., Marseline, J.: Exploration of autism spectrum disorder using classification algorithms. Int. Conf. Recent Trends Adv. Comput. (2019)
10. Bhargava, N., Dayma, S., Kumar, A., Singh, P.: An approach for classification using simple CART algorithm in Weka. Published in IEEE (2017) 978-1-5090-2717
11. Alam, M.Z., Saifur Rahman, M., Sohel Rahman, M.: A random forest based predictor for medical data classification using feature ranking. Inf. Med. Unlocked, Published by Elsevier Ltd (2019)
12. William, W., Ware, A., Habinka, A., Ejiri, B., Obungoloch, J.: Cervical cancer classification from Pap-smears using an enhanced fuzzy C-means algorithm. Published by Elsevier Ltd (2019)
13. Patil, B.M., Toshniwal, D., Joshi, R.C.: Predicting burn patient survivability using decision tree in WEKA environment. Published in IEEE (2009)
14. Mienye, I.D., Sun, Y., Wang, Z.: Prediction performance of improved decision tree-based algorithms: a review. In: ScienceDirect, 2nd International Conference on Sustainable Materials Processing and Manufacturing. SMPM (2019)

Chapter 29

Efficient Fault-Tolerant Cluster-Based Approach for Wireless Sensor Networks
Kavita Jaiswal and Veena Anand

Abstract Wireless sensor networks (WSNs) can be used for a wide range of purposes and have almost unlimited future potential. Energy depletion, hardware failure, communication link errors, malicious attacks, and other factors can cause nodes in WSNs to fail. As a result, fault tolerance is one of the essential aspects of WSNs. In this paper, we present a fault-tolerant technique for a clustered WSN. A node is chosen as an alternate cluster head (ACH) to detect the failure of cluster head (CH) and cluster member (CM) nodes. The ACH monitors the CH's performance and copies its results. The malfunctioning of a CM is detected by the CH, and the data from the failed CM are transferred to the CH via the ACH. The performance of the proposed methodology is compared with that of some existing related algorithms in terms of metrics such as energy efficiency, number of inactive nodes, number of dead cluster heads, and packet loss.

29.1 Introduction

Modern technology in the form of wireless sensor networks (WSNs) is widely used in many applications to carry out day-to-day real-world tasks. A WSN comprises numerous sensor nodes that are widely dispersed in a hostile environment and gather data [1]. The locations of these nodes need not be pre-specified or fixed. WSNs are becoming more popular every day; the services they provide, such as preserving the integrity of information and correcting and transmitting data in a timely manner, have increasingly attracted the attention of researchers and developers. Nevertheless, nodes in WSNs are prone to failure because of power exhaustion, hardware damage, communication link errors, malware attacks, and so on. Therefore, handling faults and failures in WSNs poses a major challenge to the research community.

K. Jaiswal (B) · V. Anand
National Institute of Technology Raipur, Chhattisgarh, India
e-mail: [email protected]
V. Anand
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_29


Fault tolerance is a system's ability to provide a required level of operability in the presence of errors [2]. A failure can halt the network's operation, shorten the network's available lifetime, and degrade its overall performance. In particular, the failure of a CH interrupts communication not only with its member sensor nodes but also with other cluster heads, since CHs actively relay aggregated data through one another [3]. Therefore, to keep a WSN working normally, the clustering and routing algorithms should work in conjunction with fault-tolerance mechanisms, particularly for CH failures. In cluster-based approaches, sensors do not need to communicate directly with the base station (BS); instead, CHs organize the CMs, collect their data, and forward it to the BS. This significantly reduces the amount of data transmitted in the network and, as a result, substantially reduces communication overhead and energy usage. Several fault-tolerant mechanisms have been proposed and reviewed; however, detecting and recovering from failures either consumes a lot of extra energy or requires additional hardware and software resources [4]. This paper designs a fault detection and recovery approach for lifetime enhancement in cluster-based WSNs. The rest of the paper is organized as follows: related work is reviewed in Sect. 29.2; the system model and assumptions are explained in Sect. 29.3; the proposed methodology is discussed in Sect. 29.4; Sect. 29.5 describes the simulation and evaluates its results; and Sect. 29.6 concludes the paper.

29.2 Related Work

Fault detection and recovery are active research topics in wireless sensor networks. In this section, we briefly review current fault detection approaches for WSNs. A survey of fault management frameworks in WSNs can be found in [5]; it identifies problems such as energy consumption, delay, overhead, and scalability, depending on the fault control method. The low-energy adaptive clustering hierarchy (LEACH) protocol [6] is one of the best-known clustering protocols for WSNs. It systematically rotates the CH workload among the sensor nodes, which is important for load balancing. Its major drawback is that a node with extremely low energy can be chosen as CH and may then die very quickly. Cheraghlou et al. [7] propose an improved LEACH protocol, FT-Leach, which distinguishes between live and faulty nodes to increase the network's fault-tolerance capacity. Jadav et al. [8] use a fuzzy inference system (FIS) to identify hardware faults such as transmitter circuit condition, receiver circuit condition, and battery condition, considering a smaller monitoring area and fewer nodes. Jia et al. [9] propose a low-power distributed fault detection (LEFD) scheme for WSNs that uses the data stream gathered by each sensor node to detect particular types of fault. Similarly, Gupta et al. [10] propose a nature-inspired fault detection technique for WSNs called the improved fault detection crow search algorithm (IFDCSA). Mohapatra et al. [11] propose two algorithms: uniform energy clustering for fault tolerance and population-based clustering for fault prevention. Likewise, to extend the lifetime of networks with heterogeneous nodes, Khriji et al. [12] propose a fault-tolerant unequal clustering algorithm in which cluster heads are selected by a fuzzy logic system based on each node's residual energy, density, and centrality. Nevertheless, most of the algorithms mentioned above concentrate only on detecting or recovering from a fault in the cluster head. In this paper, besides detecting failures of cluster members, we also detect and recover from cluster-head failures.
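The LEACH drawback mentioned above follows directly from its election rule. In the original LEACH protocol [6], an eligible node becomes CH in round r if a uniform random number falls below the threshold T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of CHs; nodes that served as CH within the last 1/p rounds are ineligible. The sketch below implements that rule (function and variable names are illustrative). Residual energy never enters the decision, which is why a nearly depleted node can be elected.

```python
import random


def leach_threshold(p: float, r: int) -> float:
    """LEACH threshold T(n) for a node that is still eligible in round r.

    p is the desired fraction of cluster heads per round; an epoch lasts
    1/p rounds, and T(n) rises until it reaches 1 in the epoch's last round.
    """
    epoch = round(1 / p)
    return p / (1 - p * (r % epoch))


def elect_cluster_heads(node_ids, p: float, r: int,
                        last_ch_round: dict, rng: random.Random) -> list:
    """One LEACH election round. Residual energy is never consulted, so a
    nearly depleted node is as likely to be elected as a fully charged one."""
    epoch = round(1 / p)
    heads = []
    for n in node_ids:
        # Nodes that served as CH in the last 1/p rounds are ineligible.
        last = last_ch_round.get(n)
        eligible = last is None or r - last >= epoch
        if eligible and rng.random() < leach_threshold(p, r):
            heads.append(n)
            last_ch_round[n] = r
    return heads


# Usage: with p = 0.25, every node serves as CH exactly once per 4-round epoch.
rng = random.Random(42)
history: dict = {}
elected = [elect_cluster_heads(range(20), 0.25, r, history, rng) for r in range(4)]
```

Because T(n) reaches 1 in the last round of each epoch, every node that has not yet served is forced to become CH then, which is how LEACH balances the CH load without looking at energy.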

29.3 System Model and Assumptions

This paper considers a wireless sensor network with one base station (BS) and a set of sensor nodes. A multi-hop communication strategy is used between the sensor nodes and the BS. As shown in Fig. 29.1, the network is divided into several clusters. Each cluster has a head, the cluster head (CH), which is responsible for collecting and aggregating data. The remaining sensor nodes in each cluster are cluster members (CMs). The CH collects data from its cluster members and can send the collected data to other CHs. In the proposed model, each CH has a backup node, the ACH, which saves a copy of the CH's data. The BS receives from every CH the data collected from that CH's cluster members, either directly or through other CHs. The main focus of this paper is on hardware failures that occur at the cluster head. Battery depletion, hardware breakdown, transmitter failure, malicious human activity, and other factors may trigger hardware faults. This type of failure has a significant effect: when a cluster head fails, its children are cut off from the cluster tree and lose access to the rest of the network, which significantly reduces the sensor network's availability. The radio energy model suggested by [13, 14] is used to evaluate energy consumption.

Fig. 29.1 Network model

Assumptions:
– N sensor nodes are randomly distributed.
– There is only one base station in the M × M m² sensing area.
– The sensor and sink nodes remain static after deployment.
– All sensor nodes are homogeneous.
– After deployment, the sensor node coordinates are available to the base station.
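The chapter evaluates energy with the radio energy model of [13, 14] but does not restate it. A common choice in clustered-WSN studies is the first-order radio model: transmission costs electronics energy per bit plus an amplifier term that grows with d² below a crossover distance d0 and with d⁴ above it. The sketch below assumes that model and typical literature constants (E_elec = 50 nJ/bit, ε_fs = 10 pJ/bit/m², ε_mp = 0.0013 pJ/bit/m⁴); whether [13, 14] use exactly these values is an assumption, not something stated here.

```python
import math

# Assumed first-order radio model constants: typical literature values,
# not values reported in this chapter.
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy (J) to transmit `bits` over `distance_m` metres."""
    if distance_m < D0:
        amp = EPS_FS * distance_m ** 2   # free-space (d^2) loss
    else:
        amp = EPS_MP * distance_m ** 4   # multipath (d^4) loss
    return bits * (E_ELEC + amp)


def rx_energy(bits: int) -> float:
    """Energy (J) to receive `bits`."""
    return bits * E_ELEC


# Packet sizes from the simulation set-up: 1000-byte data, 10-byte beacon.
data_cost = tx_energy(1000 * 8, 50.0)
beacon_cost = tx_energy(10 * 8, 50.0)
```

With these assumed constants, sending a 1000-byte packet over 50 m costs 6 × 10⁻⁴ J and a 10-byte beacon costs 6 × 10⁻⁶ J, so each beacon costs about 1% of a data packet.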


29.4 Proposed Methodology

A technique is needed to detect a failed CH or CM in a cluster and replace it promptly so that the cluster keeps functioning smoothly. When a CH or CM fails because of hardware failure or energy exhaustion, network latency and message overhead increase. In this paper, we present a fault identification and recovery methodology for a clustered WSN. First, an alternate cluster head (ACH) is chosen based on two key parameters: distance from the CH and residual energy. Failures of the CH, ACH, and CMs are then detected. If a fault is found at the CH, the ACH takes over the failed CH's role; likewise, if the ACH fails, another CM is elected as ACH. In our previous work [15], we defined the CH selection process, a grey wolf-based optimized clustering approach; a more comprehensive discussion can be found there. We use the same approach to select the ACH, with the following formula:

$$ACH_j = \frac{1}{C}\sum_{i=1}^{C} \frac{E_i^{CM}}{E_{CH}} \times Dist(CH, S_i) \qquad (29.1)$$

where ACH_j, C, E_CH, E_i^CM, and Dist(CH, S_i) denote the ACH, the number of CMs, the cluster head energy, the cluster average energy, and the distance between node S_i and the CH, respectively. The node nearest to the CH with the highest residual energy is the best candidate for ACH. After choosing the ACH, the CH notifies the BS by sending a message. During each round of data transfer from the CMs to the CH, the ACH verifies that the CH is alive based on the beacon messages it receives from the CH. If it receives no response from the CH within a certain time, it declares the CH failed and notifies the CMs to send their data to the ACH. The pseudo-code of the proposed methodology is given in Algorithm 1. In Algorithm 1, ACH_i checks the status of CH_i at every time interval, and vice versa. If ACH_i discovers that CH_i has failed because of hardware failure or energy exhaustion, it takes over the function of CH_i and sends an ownership message CH_o to its members (CM_i). After the ACH has taken over the CH role, a new ACH is chosen. If ACH_i fails, CH_i selects an alternative ACH_i and sends the confirmation message Confirm_ACH to its members. Every CM_i keeps track of the fault status of its CH_i in its routing table. If a CM_i detects the failure of CH_i, it gathers the status from the other CM_j by overhearing. If the majority of the CM_j report a failure of CH_i, ACH_i is chosen as CH_i, and this information is sent to the relevant CM_i. Thus, to resolve a CH fault, the ACH becomes the cluster's new CH, and Eq. 29.1 is used to choose a replacement node among the CMs. The data from the cluster members are then sent to the new CH. In addition, whenever the CH transfers data to the BS, the ACH saves a copy before the BS receives it, which avoids re-collecting the data if the CH fails. The data are removed from the ACH's memory once the destination confirms receipt. If a problem causes data loss or damage, the data remain accessible via the ACH, which reduces energy consumption and network latency. Similarly, the CH obtains information on the CMs' energy depletion and hardware status through the communication link between CH and CM. Once the CH identifies a fault in a CM, it asks the ACH to verify it. The ACH then checks the CM's failure and removes the CM from the routing table. To restore the network, the ACH deploys a node in place of the failed CM.
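The detection and recovery steps described above can be sketched as follows. This is an illustrative sketch, not the authors' Algorithm 1: the ACH score used here (residual energy divided by distance to the CH) is a hypothetical rule that follows the stated criterion (nearest node with the highest residual energy) and is not Eq. 29.1; the beacon timeout value, the class names, and the sequence-numbered packets are likewise assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float  # residual energy (J)


def distance(a: Node, b: Node) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def select_ach(ch: Node, members: list[Node]) -> Node:
    """Hypothetical ACH score: residual energy divided by distance to the CH.

    Favors members that are close to the CH and have high residual
    energy; this illustrates the stated criterion and is NOT Eq. 29.1.
    """
    return max(members, key=lambda m: m.energy / max(distance(ch, m), 1e-9))


class HeartbeatMonitor:
    """ACH-side watchdog: the CH is presumed failed if no beacon has
    arrived within `timeout` time units (timeout value is an assumption)."""

    def __init__(self, timeout: float, start: float = 0.0):
        self.timeout = timeout
        self.last_beacon = start

    def on_beacon(self, now: float) -> None:
        self.last_beacon = now

    def ch_failed(self, now: float) -> bool:
        return now - self.last_beacon > self.timeout


def majority_reports_failure(reports: dict[int, bool]) -> bool:
    """True if more than half of the overheard CM reports say the CH failed."""
    return sum(reports.values()) > len(reports) / 2


@dataclass
class BackupBuffer:
    """Copies of CH-to-BS packets kept on the ACH until the BS acknowledges."""
    pending: dict[int, bytes] = field(default_factory=dict)

    def store(self, seq: int, payload: bytes) -> None:
        self.pending[seq] = payload

    def acknowledge(self, seq: int) -> None:
        self.pending.pop(seq, None)  # delete the copy once receipt is confirmed

    def unacknowledged(self) -> list[int]:
        return sorted(self.pending)
```

In this sketch, the ACH would assume the CH role only after its own watchdog fires or `majority_reports_failure` returns True for the overheard CM reports; the packets still listed by `unacknowledged()` are what the new CH would forward instead of re-collecting data from the members.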

29.5 Simulation Results

This section presents the simulation results in detail. The proposed technique is compared with two existing similar algorithms, EAFTR [12] and LEACH [6], in terms of energy efficiency, number of inactive nodes, number of dead CHs, and packet loss. The simulation runs for 1000 rounds with 500 nodes in the network, 50 clusters, 1000-byte data packets, 10-byte beacon messages (covering both request and reply messages), and an initial battery energy of 5.0 J per node. As shown in Fig. 29.2a, the number of dead CHs in EAFTR and LEACH is far higher than in the proposed algorithm, because the objective function used for clustering slows down the death of CHs. Fig. 29.2b shows that, as the number of rounds increases, the proposed algorithm has fewer inactive nodes than the existing algorithms. This is because the number of alive CHs in the proposed technique is much higher, and the member sensor nodes of any cluster whose CH fails are also recovered. The energy consumed by the network nodes in each round is depicted in Fig. 29.2c. Because the proposed fault-tolerance mechanism avoids re-clustering and rerouting in each round, it saves time and energy; as a result, the network's energy consumption is lower than that of EAFTR and LEACH. The data loss is shown in Fig. 29.2d. Data loss can be caused by collisions, unreliable communication, and so on. Our proposal does not need to retransmit data when a CH becomes faulty or fails because of depleted energy, since a backup copy of the data is available on the alternate CH. Therefore, EAFTR and LEACH have higher data loss rates than the proposed algorithm.
Overall, the simulation results show that the proposed approach performs better than the existing methods and improves overall network performance by 10% and 15% relative to EAFTR and LEACH, respectively.
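The simulation set-up can be captured as a small configuration object, which makes the derived quantities explicit. This is a sketch: the class and field names are hypothetical, the values are those stated in the text, and the derived properties are simple arithmetic on them rather than results reported in the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    rounds: int = 1000
    nodes: int = 500
    clusters: int = 50
    data_packet_bytes: int = 1000
    beacon_bytes: int = 10          # covers both request and reply beacons
    initial_energy_j: float = 5.0   # per node

    @property
    def nodes_per_cluster(self) -> float:
        return self.nodes / self.clusters

    @property
    def total_initial_energy_j(self) -> float:
        return self.nodes * self.initial_energy_j

    @property
    def beacon_to_data_ratio(self) -> float:
        """Beacon size relative to one data packet."""
        return self.beacon_bytes / self.data_packet_bytes


cfg = SimConfig()
```

With these values, each cluster has 10 nodes on average, the network starts with 2500 J in total, and a beacon carries 1% of the bits of a data packet.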

Fig. 29.2 Simulation results: (a) comparison in terms of number of dead CHs versus number of rounds; (b) comparison in terms of number of inactive sensor nodes versus number of rounds; (c) comparison in terms of total energy consumption (J) versus number of rounds; (d) comparison in terms of average data (packet) loss versus number of nodes. Each panel compares Proposed, EAFTR, and LEACH.

29.6 Conclusion

To provide better network quality of service, a WSN must be able to detect faults or equipment malfunctions in the network, particularly at nodes and cluster heads, and take appropriate action. The main contribution of this paper is a cluster-based fault-tolerant approach. We apply the grey wolf optimizer (GWO) algorithm to cluster the nodes and select CHs. Because CHs are critical, an ACH is selected to improve tolerance to faults in them. The alternate cluster head monitors the CH's performance and copies its results. The malfunctioning of a CM is detected by the CH, and the data from the failed CM are transferred to the CH via the ACH. The proposed method recovers from faults or failures without re-clustering or retransmission, both of which cause delay and consume time. To show the effectiveness of the proposed methodology, we compare the simulation results with two related algorithms, EAFTR and LEACH. The proposed technique outperforms EAFTR and LEACH in terms of energy efficiency, number of inactive nodes, number of dead CHs, and packet loss. In the future, we would like to extend the proposed algorithm to mobile WSN scenarios that must cope with rapid topology changes.

References
1. Moridi, E., Haghparast, M., Hosseinzadeh, M., Jassbi, S.J.: Novel fault-tolerant clustering-based multipath algorithm (FTCM) for wireless sensor networks. Telecommun. Syst. 74(4), 411–424 (2020)
2. Demirbas, M.: Scalable design of fault-tolerance for wireless sensor networks. Ph.D. dissertation, The Ohio State University (2004)
3. Azharuddin, M., Jana, P.K.: A distributed algorithm for energy efficient and fault tolerant routing in wireless sensor networks. Wireless Netw. 21(1), 251–267 (2015)
4. Qiu, M., Liu, J., Li, J., Fei, Z., Ming, Z., Edwin, H.: A novel energy-aware fault tolerance mechanism for wireless sensor networks. In: 2011 IEEE/ACM International Conference on Green Computing and Communications. IEEE, pp. 56–61 (2011)
5. Moridi, E., Haghparast, M., Hosseinzadeh, M., Jassbi, S.J.: Fault management frameworks in wireless sensor networks: a survey. Comput. Commun. 155, 205–226 (2020)
6. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application-specific protocol architecture for wireless microsensor networks. IEEE Trans. Wireless Commun. 1(4), 660–670 (2002)
7. Cheraghlou, M.N., Haghparast, M.: A novel fault-tolerant LEACH clustering protocol for wireless sensor networks. J. Circuits Syst. Comput. 23(03), 1450041 (2014)
8. Jadav, P., Babu, V.K.: Fuzzy logic based faulty node detection in wireless sensor network. In: 2017 International Conference on Communication and Signal Processing (ICCSP). IEEE, pp. 0390–0394 (2017)
9. Jia, S., Ma, L., Qin, D.: Fault detection modelling and analysis in a wireless sensor network. J. Sensors (2018)
10. Gupta, D., Sundaram, S., Rodrigues, J.J., Khanna, A.: An improved fault detection crow search algorithm for wireless sensor network. Int. J. Commun. Syst. e4136 (2019)
11. Mohapatra, H., Rath, A.K.: Fault-tolerant mechanism for wireless sensor network. IET Wireless Sensor Syst. 10(1), 23–30 (2019)
12. Khriji, S., Ammar, M.B., Fakhfakh, A., Kammoun, I., Kanoun, O.: Energy Aware Fault Tolerant Re-clustering Algorithm for Wireless Sensor Networks
13. Jaiswal, K., Anand, V.: EOMR: an energy-efficient optimal multi-path routing protocol to improve QoS in wireless sensor network for IoT applications. In: Wireless Personal Communications, pp. 1–23 (2019)
14. Jaiswal, K., Anand, V.: An optimal QoS-aware multipath routing protocol for IoT based wireless sensor networks. In: 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, pp. 857–860 (2019)
15. Jaiswal, K., Anand, V.: A grey-wolf based optimized clustering approach to improve QoS in wireless sensor networks for IoT applications. In: Peer-to-Peer Networking and Applications, pp. 1–20 (2021)
16. Moridi, E., Haghparast, M., Hosseinzadeh, M., Jassbi, S.J.: Novel fault-tolerant clustering-based multipath algorithm (FTCM) for wireless sensor networks. Telecommun. Syst. 74(4), 411–424 (2020)

Chapter 30

From Chalk Boards to Smart Boards: An Integration of IoT into Educational Environment During Covid-19 Pandemic
Shraddha Dhal, Swati Samantaray, and Suresh Chandra Satapathy

Abstract Internet of Things (IoT) is a unique paradigm shift in the domain of Information Technology. It turns real-life things into intelligent virtual devices so that information can be transmitted machine-to-machine. As its technological magnitude grows, it plays an imperative role in almost all spheres of life, and global education markets have reaped considerable benefits. The present paper gives an insight into the radical evolution and integration of IoT trends in the education sector, namely the transition from conventional chalk boards to modern smart boards, especially at the outset of the Covid-19 pandemic, when the exigencies of global education demanded it the most. The paper also lists opportunities, obstacles, the scalability of tools and technology, and services allied to IoT expertise, and shows how IoT bridges the gap between educational and technical applications amid the Covid-19 pandemic.

30.1 Introduction

In today's world, we are witnessing a rapid erasure of boundaries. This process is not restricted to politics or business but extends to almost all sections of life. The recent high-tech paradigm, the Internet of Things (IoT), which builds on technological enhancements, has proven to be a boon for contemporary life, especially for the education sector, because of its cutting-edge transition. This recent phase of the internet revolution refers to a network of numerous devices equipped with assorted software, electronics, and network connectivity of distinct orientations, with the aim of exchanging and compiling information. IoT introduces the next generation of the internet, in which physical objects can be identified and accessed through the internet, connecting the outlying world by linking crucial information together.

S. Dhal · S. Samantaray · S. C. Satapathy (B)
KIIT Deemed to be University, Bhubaneswar, Odisha, India
e-mail: [email protected]
S. Dhal
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_30

IoT plays a vital role in leveraging the education sector. It opens up access to global education by moving our daily learning activities onto a smarter platform, from acquiring and communicating knowledge through learning materials to tracking progress over time. The importance and implications of IoT for understanding daily web-based learning activities are substantial. It helps bridge the physical distance that is often a hindrance at the macro level, thereby extending knowledge and information to scholars and stakeholders. It also helps improve the quality of pedagogy and andragogy and promotes a collaborative future of education through an advanced level of customized learning. A computational Internet of Things that uses smart gadgets and web tools such as digital books, slides, graphic images, audio, video, smart boards, and other electronic reference tools can make a class better structured, easier to comprehend, and smoother than a class without such mechanisms. Therefore, implementing IoT applications in the education industry can lead to more effective educational outcomes. IoT offers an answer to many of the uncertainties and reservations associated with e-learning, and a complete implementation of IoT in academics is the next big development to be witnessed. The term ‘Internet of Things’ dates back to the late twentieth century and is attributed to the founders of the MIT Auto-ID Center, the British technology pioneer Kevin Ashton and David Brock. Ashton, who wanted to draw the attention of his colleagues during his tenure at Procter & Gamble, believed that “The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so” [1].
A decade ago, Sundmaeker, in the work Vision and Challenges for Realizing the Internet of Things, had already predicted that by 2020 the “Internet of PCs” would be replaced by the “Internet of Things,” with 50–100 billion appliances linked to the Internet [1]. The evolution toward the first Internet of Things involved a “metamorphosis of objects”: from simple, easy-to-use hand-made objects to “complex gauged machines,” then to “mass manufactured artifacts,” and finally to “gizmos” (programmed and controlled by the user), which eventually enabled anytime, anyplace connectivity. Before this period, the concept was widely known as “embedded internet” or “pervasive computing.” It took almost a decade to gain a healthy market share. However, the way it has swept across the planet and secured a strong place in the emerging marketplace of the twenty-first century reflects its popularity among subsequent generations.

30.2 Review of Literature

The paradigm shift that IoT brings to Information Technology is noteworthy. Sources that stress different aspects of IoT offer perceptive observations and perspectives on bringing almost all aspects of life into one global computing network. Numerous manuscripts have documented the contributions and scope of IoT, stressing its importance and implications in almost all


segments of life, ranging from politics, economics, history, and architecture to science and technology, health services, education, and many more. The following provides a short review of the main thrust areas of IoT; it is not exhaustive. The text Getting Started with Internet of Things (2011) by Cuno Pfister gives a concise introduction to developing embedded, internet-enabled applications using Netduino [2]. The book is a gateway for novices to understand the basics of the language and programming, while serving as a useful tool for machine control. In addition, it provides hands-on practice demonstrating the principles that underpin sensor networks. Maciej Kranz, who has mastered technologies such as fog computing, artificial intelligence, and blockchain, has skilfully compiled concrete guidelines for using IoT to achieve innovative ways of production in his book Building the Internet of Things (2016) [3]. This pioneering work shows how “Generation IoT” provides a primer for manufacturers regardless of age, gender, and professional background. Kranz gives his readers a platform to become familiar with his shared experience, market field studies, case studies, and much other valuable guidance, so that they can devise and achieve their own Internet of Things strategies. The book IoT Inc: How Your Company Can Use the Internet of Things to Win in the Outcome Economy (2017) by Bruce Sinclair focuses on the tangible value created for trade and industry [4]. It also highlights key business shifts that would disrupt industries manufacturing physical products. The author draws on his IoT experience and business practice, stresses the importance of security and privacy, and empowers his readers to run a competent operation.
The book Analytics for the Internet of Things (IoT): Intelligent analytics for your intelligent devices (2017) by Andrew Minteer offers valuable insight through its exploration of statistical analysis, big data, cloud architecture, and other aspects of software design. The book uncovers issues unique to IoT infrastructure and devices and helps readers understand and implement actionable skills to realize business prospects [5]. Manuscripts addressing global concerns about IoT security and privacy have also been published. In Internet of Things (IoT): Legal Issues, Policy, and Practical Strategies (2019), Cynthia H. Cwik et al. draw out the legal concerns and policy issues as addressed by leading authorities and experts across different professions [6]. The text is multidisciplinary in its approach and covers a wide range of IoT-related themes. Books on implementing IoT technology in academia have also been published to promote the digitalization of education. However, the facilities that IoT can provide during a pandemic outbreak, and actionable guidance for the secure functioning of non-human connections, have not yet been addressed. Keeping this in view, the current manuscript examines the radical evolution of IoT technology in the educational sector during the outbreak of the Covid-19 pandemic.


30.3 Objectives

Various sources, including government policies and announcements, media reports, texts, and research articles, have been taken into consideration in compiling the information in this paper. The present work provides insight into how IoT is being integrated into the educational model, covering opportunities, obstacles, and the scalability of technology and services at both the physical and virtual levels. It shows how IoT serves as a gateway to bridge the gap between educational and technical applications amid the Covid-19 pandemic.

30.4 A Fusion of IoT into Educational Sector for Learning Purpose During Covid-19 Pandemic

The Covid-19 pandemic has not only disrupted the socio-economic fabric of countries but has also significantly affected the education sector. As reported by UNESCO, to prioritize students' health during this stress test, 195 countries across the world temporarily closed numerous educational institutions, as a result of which more than 1.5 billion students (94% of the world's student population) suffered acutely [7]. Educationalists and students across the globe have struggled to exchange academic information. In the wake of this harrowing occurrence, IoT has come to the global rescue as teachers, trainers, students, scholars, and educationalists adopt digital platforms for collaborative teaching and learning, which helps them make a swift shift from traditional chalk boards to contemporary smart boards. In the past, people referred to IoT as “embedded internet”; today, IoT technology is embedded in many aspects of our lives [8]. Figure 30.1 shows the exponential growth in the number of IoT devices worldwide. Developed countries, and some parts of developing countries, have been bridging the educational gap created by remote distance through high internet penetration and technical access. While almost the entire year has been dedicated to serving and saving lives rather than to anything else, global education has survived only through the complete implementation of IoT, which would not have been possible otherwise.

Fig. 30.1 Use of IoT devices

With the new normal of social distancing, IoT supports hybrid approaches that strengthen teacher-student connectivity through digital platforms. Advanced software, technical applications, digital learning platforms, and web tools (such as Kialo Edu, Edmodo, Kahoot, Glogster, Haiku Deck, Padlet, Coursera, Nearpod, Google Duo, and Zoom) are widely used to share information among educational communities. While the future amid the Covid-19 pandemic remains uncertain and chaotic, by 2021 nearly 23.8 million young learners (from pre-primary to tertiary) might give up their education because they lack access to school owing to the pandemic's economic impact alone [8]. In such dire straits, a rapid integration of IoT into the educational sector, by quickly developing and deploying remote high-tech learning solutions through alternative channels, could keep the education of countless learners from being disrupted. With the advancement of internet connectivity worldwide, the delivery of school and university education has reached a revolutionary height. Cybersecurity experts describe IoT as a rapidly growing industry and estimate that there are 50 billion connected devices in 2020 alone. They also report that the global IoT in education market is expected to grow to $11.3B in 2023, and that it is predicted to grow exponentially over the forecast period, driven by the introduction of IoT-based solutions in schools, colleges, and private educational institutes to provide better education to students (Fig. 30.2). Statistics on the e-learning market size in 2020 indicate that mobile learning remains one of the fastest-growing markets in the sector.
According to political scientist Bobby Chernev, the corporate e-learning market could grow by $38.09 billion between 2020 and 2024. The worldwide e-learning market is projected to be worth $325 billion in 2025. In 2015, the mobile learning market was worth just $7.98 billion; by the end of 2019, that figure had risen to $27.32 billion. Owing to the Covid-19 pandemic and the ever-growing number of mobile users worldwide, experts predict that the mobile e-learning market will reach $37.6 billion by the end of 2020 [10]. The escalating use of connected devices in educational establishments, the swift adoption of e-learning, and the availability of cloud-based solutions all contribute to the growth of the IoT-in-education market.

Fig. 30.2 Global IoT in education market [9]
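As a rough illustration, the mobile learning figures quoted above can be turned into an implied compound annual growth rate (CAGR). The sketch below is our own arithmetic using the standard formula (end / start)^(1/years) - 1; the resulting rates are not reported in [10] and simply restate the quoted values in another form.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Mobile learning market sizes quoted in the text, in billions of USD.
growth_2015_2019 = cagr(7.98, 27.32, 4)  # 2015 -> 2019 (4 years)
growth_2019_2020 = cagr(27.32, 37.6, 1)  # 2019 -> 2020 (predicted value)

print(f"2015-2019: {growth_2015_2019:.1%} per year")
print(f"2019-2020: {growth_2019_2020:.1%}")
```

With these inputs the script reports roughly 36.0% per year for 2015 to 2019 and 37.6% for the predicted 2019 to 2020 step, so the 2020 prediction is broadly in line with the earlier trend.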

30.4.1 Unveiling a World of Opportunities

Technology is regarded as a superb equalizer and a means of connecting with people in ways that may never have been possible before [11]. Because replacing physical platforms with IoT applications, and integrating the two, proved immensely beneficial to stakeholders within the first 8 months of deployment during the Covid-19 pandemic, the world can certainly consider promoting and leveraging digital teaching platforms even in the post-pandemic era. The key advantages of implementing IoT technology in educational institutions are:

• Physical distance no longer remains a constraint. Knowledge and information can be accessed remotely while maintaining social distancing, so learning goes beyond boundaries.
• It lays the foundation for digital education even after Covid-19.
• Video conferencing applications with screen-casting features let users share smart boards and screens, enabling stakeholders to work collaboratively and manage classes in real time.
• Numerous e-books are made accessible and compiled on websites that incorporate visual aids, animations, assessments, and other materials to facilitate learning.
• An IoT-based attendance system keeps a record of the students attending each lecture, saving educators time that would otherwise be consumed.
• Technologies such as 3D positioning allow students to be monitored inside the virtual class and their presence to be reported at any point in time.
• It helps reach individual students even in a large class, which is difficult in a physical classroom.
• Issues that students may not describe verbally can be addressed through instant alerts.
• It is easily accessible during self-quarantine and social distancing.
• Everything from curriculum and course delivery to analysis and examination can be controlled through a single portal.
• It is accessible from any digital device: desktop, laptop, mobile, tablet, and wearable.
• IoT combined with advanced technology provides learning tools for the differently-abled.
• Devices such as smart student ID cards monitor the physical whereabouts of students.
• E-questionnaires, surveys, quizzes, and assignments can be conducted on the same virtual platform while progress is tracked; push notifications, e-mails, or text messages can also be sent as reminders.
• International students facing language difficulties can repeatedly watch recorded lectures to overcome the language barrier.
• A webcam-enabled examination system monitors students' activities during an examination, helping to prevent malpractice.
• Geolocation helps track arrivals and avoid foot-traffic issues.
• Facial recognition helps identify individuals, allowing them to enter and exit the institution's premises.
• Sensor-based technologies may also help track how often a student goes to the washroom or washes their hands, to ensure that Covid-19 safety guidelines are strictly followed.
• Virtual consultation or counseling promotes mental health amid the pandemic.
• It ensures a safe and secure environment against cyber attacks and data loss.
• Individual feedback can be documented for each session.
• It is a useful tool for students with various communication challenges.
• Educational establishments may install IoT-enabled gadgets such as smart bulbs and thermostats to save energy.

30.4.2 Challenges in Integrating IoT into Education

IoT is the new normal, spreading rapidly around the globe and helping to fashion a unipolar learning environment. IoT aims to address several challenges and gaps in the education sector. Although the involvement of IoT in all spheres of life improves the quality of living, whether it can be completely integrated to improve educational attainment remains an open question. Research and industry experience point to several barriers to the adoption of IoT technology. Remote learning is not attainable without an internet connection, and places with poor network bandwidth suffer connectivity issues, as consistently reported by the majority of the student community. Covering numerous institutions also requires a huge financial investment, so the technology is not physically or financially accessible to all. Students may find it difficult to obtain the electronic devices needed to access e-education. Most educational institutions in underdeveloped countries, and some in developed countries, are not fully equipped to become 100% digitalized. Remote learning is often difficult to implement flawlessly because trainers in rural areas lack the required knowledge, exposure, and technical skills. A total learn-from-home model reduces physical activity, thereby increasing the likelihood of mental isolation. Risks of cybercrime and liability are involved too. The absence of physical contact or face-to-face interaction means a loss of group experience. Discipline always remains an issue, and self-direction is another vital prerequisite. Virtual tutoring systems may not always diagnose students' specific weaknesses. Owing to a lack of proper structure, students tend to drop out of virtual learning platforms more often than out of physical learning systems. Some people also fear a future in which conventional "brick and mortar" institutions are replaced by these virtual platforms.

30.5 Conclusion

Education is an indispensable part of our lives. The right to education is recognized under Article 26 of the Universal Declaration of Human Rights (UDHR) [12]. It is also enshrined in Articles 13 and 14 of the International Covenant on Economic, Social and Cultural Rights (ICESCR) [13]. Realizing the right to education requires four essential components: (i) availability, (ii) accessibility, (iii) acceptability, and (iv) adaptability [14]. Although the advantages of IoT in the education industry seem apparent, it will still take a while for such advancements to become ubiquitous in global educational markets because these four components are lacking. Even though the complete integration of IoT into the global education system remains an open question, educationists breathed a collective sigh of relief at its adoption during the Covid-19 pandemic, which ensured that learning went beyond boundaries and borders and was not impeded. Thus, the transition from chalk boards to smart boards through the incorporation of IoT in the wake of this pandemic at least keeps pre-existing teaching disparities from worsening. However, new technologies are expected to be developed, piloted, introduced, and sustained in the near future to resolve technical disparities and unlock opportunities while preparing the next generations for future outbreaks.

References
1. Sundmaeker, H., et al. (eds.): Vision and Challenges for Realising the Internet of Things. European Union, Belgium (2010)
2. Pfister, C.: Getting Started with Internet of Things. O'Reilly Media, Sebastopol (2011)
3. Kranz, M.: Building the Internet of Things. Wiley, United States (2016)
4. Sinclair, B.: IoT Inc: How Your Company Can Use the Internet of Things to Win in the Outcome Economy. McGraw-Hill, New York (2017)
5. Minteer, A.: Analytics for the Internet of Things (IoT): Intelligent Analytics for Your Intelligent Devices. Packt, England (2017)
6. Cwik, H.C., Suarez, A.C., Thompson, L.L. (eds.): Internet of Things (IoT): Legal Issues, Policy, and Practical Strategies. American Bar Association, Chicago (2019)
7. How IoT Can Help Bridge Education Gaps For Now And The Future? https://inc42.com/resources/iot-can-help-bridge-education-gaps-for-now-and-the-future (2020)
8. Maayan, G.D.: The IoT Rundown For 2020: Stats, Risks, and Solutions. Jan 13, 2020. https://securitytoday.com/articles/2020/01/13/the-iot-rundown-for-2020.aspx
9. Data Bridge Market Research: Global IoT in Education Market is Expected to Register a Healthy CAGR in the Forecast Period of 2019 to 2026. https://databridgemarketresearch.com/reports/global-iot-in-education-market/
10. Chernev, B.: 27 Astonishing E-learning Statistics for 2020. https://techjury.net/blog/elearningstatistics/#gref (2020)
11. Mohanty, J.R., Samantaray.: Cyber feminism: unleashing women power through technology. Rupkatha J. Interdiscipl. Stud. Human. IX(2), 328–336 (2017)
12. United Nations: Policy Brief: Education during Covid-19 and beyond. https://www.un.org/development/desa/dspd/wp-content/uploads/sites/22/2020/08/sg_policy_brief_covid-19_and_education_august_2020.pdf (2020)
13. United Nations: Universal Declaration of Human Rights. https://www.un.org/en/universal-declaration-human-rights/
14. United Nations Human Rights: International Covenant on Economic, Social and Cultural Rights. https://www.ohchr.org/EN/ProfessionalInterest/Pages/CESCR.aspx

Chapter 31

Design and Development of an IoT-Based Smart Hexa-Copter for Multidisciplinary Applications

Goutam Majumder, Gouri Shankar Chakraborty, Shakhaowat Hossain, Yogesh Kumar, Amit Kumar Ojha, and Md. Foysal Majumdar

Abstract UAV (Unmanned Aerial Vehicle) has become a ubiquitous term today because of the extensive use of UAVs in different fields. Advanced technologies such as robotics, machine learning, the Internet of Things (IoT), and computer vision make UAVs ever more advanced and powerful. However, most UAVs available at present are used only for a specific purpose. In some cases, UAVs are designed with advanced features but cannot be used as versatile machines because of their inflexible design. For example, agricultural drones are intended for agricultural purposes only and are not suitable for other fields. This work aims to remove this task-specific limitation by providing an IoT-based smart Hexa-copter that can perform multiple tasks rather than a single domain-specific one. A single machine would perform several dedicated functions in different fields, including agriculture, medical service, fire service, and surveillance, ingeniously and efficiently, with futuristic features such as remote monitoring and control, an autopilot mode, a talkback system, and intelligent decision-making. As the need for UAV applications increases in the coming days, the proposed idea can provide an epoch-making solution.

G. Majumder (✉) · G. S. Chakraborty · S. Hossain · Y. Kumar · A. K. Ojha · Md. F. Majumdar
Lovely Professional University, Jalandhar, Punjab, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_31

31.1 Introduction

Since the birth of innovation, new things have been invented to make our lives better and more straightforward. Across many fields, these innovations aim to make work easier, and we always want to get things done quickly. For example, 100 years ago, people used postage stamps, pigeons, and carriers to communicate from one place to another, whereas intelligent communication systems exist today. At present, drones are used to complete work within a short time, such as transporting things from one place to another. During the fourth industrial revolution, people have been using advanced technologies such as robotics, Artificial Intelligence (AI), and the Internet of Things (IoT). Machine learning (ML) is key to making life easier and more efficient, revolutionizing industry. The invention of fixed-wing aircraft in 1903 spurred the idea of making drones. In 1917, during World War I, the United States of America (USA) developed drone weapons but did not deploy these innovative weapons until 1918 [1]. Drones emerged in the modern era in World War II as surveillance craft [2]. In the post-World War II years, the Cold War increased the use of Unmanned Aerial Vehicles (UAVs). The USA and the Union of Soviet Socialist Republics (USSR) used drones extensively for spying and surveillance. The Israeli Air Force used drones in its stunning defeat of the Syrian Army in 1982 [1]. Integrating weapons and modern war tools with drones shaped the current battlefield scenario. According to a Wall Street Journal report, commercial drone use began in earnest in 2006, when drones started to be used in non-military applications such as disaster relief, wildfire fighting, and spraying pesticides on farms [3]. Later, their use extended to recreational purposes. In 2013, Amazon's CEO publicly announced plans to use drones as a delivery method [4]. Although drones were initially developed for military purposes, they are now being used by individual entrepreneurs and organizations [7]. In this direction, we have developed a Hexa-copter with an innovative design that can perform multiple dedicated tasks, integrated with different advanced features and technologies. The drone is based on the Internet of Things (IoT), so devices can interact remotely and send or receive data by connecting to the Internet. Intelligent landing facilities, advanced security, and remote data transmission make the system more reliable. Different modules make the device versatile across multiple fields, and it is specially designed for emergency medical services.
The operator can interact with the drone continuously from the ground by monitoring the flight status through telemetry. The transmitter allows the drone to be controlled over a wide range, and the dual GPS provides precise location tracking together with balanced, stable flight modes such as attitude, altitude hold, and return-to-launch (RTL) [5]. The rest of the paper is organized as follows: Sect. 31.2 reviews the literature, and Sect. 31.3 discusses the design and development of the proposed Hexa-copter architecture and its functionalities. Section 31.4 provides the data-gathering information, the required parameters, and various error analyses. Section 31.5 concludes the paper.

31.2 Literature Review

Xiang et al. [1] designed a life-ring delivery drone system that delivers life-rings to victims faster than a lifeguard. Notably, the system reduces the mean time to reach a victim. Choi et al. [2] proposed a fully automatic wireless charging station; applying it may eliminate the need for manual battery charging of quadrotor UAVs. Given the dramatic growth of the drone market in recent years, the number of research efforts dedicated to drones has increased correspondingly [6]. Kornatowski et al. [3] conducted 150 deliveries over one month on a university campus in Switzerland to assess user acceptance. They described the results of these tests by analyzing flight statistics, environmental conditions, and user reactions, and also described technical problems that occurred during the flight tests along with solutions that could prevent them [10]. Dung and Rohacs [4] highlighted that drones could be used in everything from smart cities to medicine, transportation, and agriculture, and discussed multipurpose drones for managing smart-city applications. They also described the quality of drone-based security operations in different parts of a town [4]. Koh and Wich [5] noted that deforestation is an important driver of biodiversity loss and greenhouse gas emissions. Remote sensing technology is increasingly used to monitor changes in forest cover, species distributions, and carbon stocks, but satellite and airborne sensors can be prohibitively costly and inaccessible for researchers in developing countries [10]. Butcher et al. [6] proposed that drone technology can be a new medium for managing and researching wildlife, and that research quality can be further enhanced by accurately collecting information on the life cycles of these animals. Cañas et al. [8] describe how teachers and students use Python and other new technologies to program drones. Taj [9] highlighted the use of advanced drones in war zones, showing how enemy attacks can be prevented using drones, and also discussed the future of drones operating with satellites. Rivas et al. [10] presented an artificial intelligence-based multi-rotor system, especially for surveillance and monitoring because of its 360° photo and video capability. The authors also addressed wildlife monitoring using digital image transmission systems.

31.3 The Proposed System Architecture and Its Components

This section is divided into several subsections, each covering its related content. The first subsection, a microscopic description, describes the application of the device thoroughly. The subsection on required components presents the components used in the device along with their specific applications. The working functionalities subsection shows the system's functional flow with flow diagrams and detailed descriptions. The circuit diagram subsection presents the connectivity and internal integration of the different modules.


31.3.1 Microscopic Description of the Application

This device is designed for multipurpose use in different fields, especially medical service, fire service, agriculture, and surveillance. The primary application area is the medical field, where the system has features that make the drone an aerial ambulance for emergency medical service. An ambulance sometimes cannot reach its destination in time because of a traffic jam or road limitations. In such circumstances, an ambulance drone is an epoch-making solution: the aerial vehicle can deliver the necessary lightweight equipment, medicines, or blood to the destination in time. We have added devices to the drone by which doctors can monitor a patient's health condition from a long distance and give primary treatment until the ambulance arrives, so that lives can be saved. A talkback feature also enables two-way voice communication, through which doctors can give initial treatment and instructions to the patient. In agriculture, the drone can spray insecticide automatically. The drone can also be used for fire service and surveillance tasks, to detect or observe particular situations such as traffic conditions, fire and temperature status, and weather information. This information is sent to the base station so that firefighters can plan their rescue approach. An intelligent security camera helps detect objects, capture photos, and record video. The camera can be operated from any smartphone over the Internet, regardless of distance, and also has talkback and night vision features.

31.3.2 Required Components with a Specific Application

The components required to build the aerial vehicle fall into three categories: (i) the transmission receiver system, (ii) the flight controller, and (iii) sensors, actuators, and related accessories. Reviewing similar research works [5, 7] gave us ideas about the components required for the different device modules and their functionalities. These are summarized below.

(a) Transmitter–receiver: The transmitter sends the signal from the user side, and the receiver receives it and forwards it to the flight controller for processing. We used a 12-channel transmitter whose first four channels are by default used for Aileron (Roll), Elevator (Pitch), Throttle, and Rudder (Yaw).
(b) Flight controller: This device is considered the brain of an autonomous aerial vehicle. We used two flight controllers for testing and the final demonstration.
(c) GPS (dual GNSS): It is used for position mapping and tracking. The GPS in our device is an M8N dual-GNSS module characterized by high safety, strong interference resistance, and precise positioning.
(d) Brushless motor: Brushless DC motors, also known as BLDC motors, are generally used in aerial systems for their higher RPM.
(e) ESC (30A one-shot): The ESC (electronic speed controller) controls the speed of a brushless DC motor. Its low output resistance greatly enhances power stability.
(f) Rechargeable Li-po battery: Li-po batteries are popular in radio-controlled planes and cars because of their efficiency and light weight.
(g) Radio telemetry: Radio telemetry uses radio signals to exchange flight data, such as location and flight status, between the vehicle and the ground station.
(h) S550 Hexa-frame: This frame is made for the S550 model, with carbon-fiber landing gear and an integrated power distribution board.
(i) Microcontroller: A microcontroller is considered the heart of any embedded system, in which different sensors and actuators perform various dedicated tasks.

The required components for the ground station are summarized below:

(a) Arduino Uno: Arduino Uno is a programmable platform built around a microcontroller board based on the ATmega328P.
(b) Servo motor: A servo motor is a self-contained electrical device that rotates parts of a machine with high efficiency and remarkable precision.
(c) RFID module: A Radio Frequency Identification (RFID) system consists of two main components: a transponder (tag) attached to the object to be identified, and a transceiver, also known as an interrogator or reader.
(d) Fingerprint sensor: A fingerprint scanner identifies and authenticates an individual's fingerprints to grant or deny access to a computer system or a physical facility.
(e) IR sensor: An infrared sensor is an electronic device that emits and/or detects infrared radiation to sense aspects of its surroundings. A sensor that only measures infrared radiation rather than emitting it is called a passive IR sensor.
(f) Ultrasonic sensor: An ultrasonic sensor measures the distance to a target object by emitting ultrasonic sound waves and converting the reflected sound into an electrical signal.
(g) 16 × 2 LCD module (I2C integrated): 16 × 2 denotes a display of 16 characters per line on two lines.

Two Arduino Uno boards control the ground station. The RFID and fingerprint modules receive verified access input; servos lock and unlock the actuation; IR and ultrasonic sensors detect drone movement during landing; and the display, speaker, and LEDs present the output actions.
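The time-of-flight principle behind the ultrasonic sensors used at the ground station can be sketched as follows. This is an illustrative Python sketch, not the authors' firmware: it assumes the sensor reports the round-trip echo time in microseconds and uses the common linear approximation for the speed of sound in dry air, v ≈ 331.3 + 0.606·T m/s (T in °C). The function names are hypothetical.

```python
def speed_of_sound_m_s(temp_c: float = 20.0) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def echo_to_distance_cm(echo_us: float, temp_c: float = 20.0) -> float:
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    if echo_us < 0:
        raise ValueError("echo time must be non-negative")
    round_trip_m = speed_of_sound_m_s(temp_c) * echo_us * 1e-6
    # The pulse travels to the target and back, so halve the path length.
    return round_trip_m / 2 * 100


print(round(echo_to_distance_cm(1000), 2))
```

For example, a 1000 µs echo at 20 °C corresponds to about 17.2 cm. Passing the measured air temperature in, rather than fixing the speed of sound, keeps the estimate usable outdoors, where temperature varies.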

31.3.3 Working Functionalities

The functionalities of the aerial vehicle and of the ground station are represented by the flow diagrams in Fig. 31.1a and b, respectively.


Fig. 31.1 a Flow diagram of the Aerial vehicle functionalities. b Flow diagram of the ground station functionalities

According to the flow diagram in Fig. 31.1a, the user arms the vehicle by holding the throttle stick in the bottom-right position for 3–5 s. An LED and a buzzer indicate whether arming succeeded, and the propellers start to rotate. Once armed, the user provides input actions through the throttle, and the flight controller also receives signals from its sensors. The detailed functionalities of the aerial vehicle are as follows:

(a) Modes: There are different predefined modes, each with its own flight parameters. Modes can be set through the transmitter according to user requirements. We selected the Stabilized, Altitude hold, Attitude, Land, Sport, and RTL modes for the vehicle. In attitude mode, the vehicle can hold a particular position until the user wants to move, and it flies with higher stability.
(b) Failsafe: Several failsafes are preset on the flight controller to help avoid unwanted situations such as a crash and to land the aircraft safely. Radio failsafe and battery failsafe are notable among them.
(c) Autonomous: Flight data can be accessed through telemetry, which also allows the flight status and component parameters to be observed. After a mission is uploaded, the copter starts to fly, follows the assigned path on the mission map, and flies accordingly without a radio transmitter.
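The arming gesture and the two failsafes described above can be illustrated with a small Python sketch. It is not the flight controller's actual firmware logic: it assumes that the bottom-right stick position means throttle at minimum with yaw fully right, normalizes stick positions to 0..1, and uses hypothetical thresholds (a 3 s hold taken from the lower end of the 3–5 s range in the text, a 1.5 s radio timeout, and a 14.0 V battery floor) chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ARM_HOLD_S = 3.0    # lower end of the 3-5 s hold described in the text
STICK_LOW = 0.05    # hypothetical: throttle counts as "bottom" below this
STICK_RIGHT = 0.95  # hypothetical: yaw counts as "fully right" above this


@dataclass
class StickSample:
    t: float         # timestamp in seconds
    throttle: float  # 0.0 = bottom, 1.0 = top
    yaw: float       # 0.0 = full left, 1.0 = full right


def should_arm(samples: Sequence[StickSample]) -> bool:
    """True once the stick has been held bottom-right for ARM_HOLD_S seconds."""
    held_since: Optional[float] = None
    for s in samples:
        if s.throttle <= STICK_LOW and s.yaw >= STICK_RIGHT:
            if held_since is None:
                held_since = s.t
            if s.t - held_since >= ARM_HOLD_S:
                return True
        else:
            held_since = None  # stick released: restart the hold timer
    return False


def failsafe_action(radio_lost_s: float, battery_v: float,
                    radio_timeout_s: float = 1.5,
                    battery_min_v: float = 14.0) -> str:
    """Choose a reaction for the battery and radio failsafes (illustrative thresholds)."""
    if battery_v < battery_min_v:
        return "LAND"      # battery failsafe: land at the current position
    if radio_lost_s > radio_timeout_s:
        return "RTL"       # radio failsafe: return to launch
    return "CONTINUE"
```

The battery check is deliberately evaluated before the radio check in this sketch, on the reasoning that a nearly empty battery may not leave enough energy for a return flight.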

According to Fig. 31.1b, the user must scan a valid RFID card to access the ground station. If the card is valid, the system asks for a fingerprint. If the system recognizes the user's identity, it unlocks the drone for flight; otherwise, the drone remains locked and the system generates an alert message. An RGB LED ring indicates the status of each action at the ground station.
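The two-step check of Fig. 31.1b can be sketched in Python as below. This is a hypothetical illustration rather than the ground station's Arduino code: it assumes each enrolled RFID card is bound to one fingerprint template ID, and the card UIDs, template IDs, and LED colours are invented for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical enrolment table: RFID card UID -> fingerprint template ID.
AUTHORIZED: Dict[str, int] = {"04A1B2C3": 7, "04D4E5F6": 12}

# Hypothetical colours for the RGB LED ring, keyed by outcome.
LED_COLOUR = {"WAIT": "blue", "UNLOCKED": "green", "ALERT": "red"}


def access_decision(card_uid: str, fingerprint_id: Optional[int],
                    authorized: Dict[str, int]) -> Tuple[bool, str]:
    """Return (unlock_drone, status) following card-then-fingerprint verification."""
    if card_uid not in authorized:
        return False, "ALERT"    # invalid card: stay locked and raise an alert
    if fingerprint_id is None:
        return False, "WAIT"     # valid card: now ask for a fingerprint
    if fingerprint_id != authorized[card_uid]:
        return False, "ALERT"    # fingerprint does not match the card holder
    return True, "UNLOCKED"      # both checks passed: release the drone lock


unlock, status = access_decision("04A1B2C3", 7, AUTHORIZED)
print(unlock, status, LED_COLOUR[status])
```

Here a matching card and fingerprint print `True UNLOCKED green`. Binding each card to a single template is one possible design; a system could instead accept any enrolled fingerprint after any valid card, which is weaker because a stolen card and a different enrolled user's finger would suffice.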


31.3.4 Circuit Diagram

In Fig. 31.2, two GPS modules (Neo V2 and Neo V2 Pro) are connected to the V5+ flight controller through its GPS and CAN1 ports. The PW-Link telemetry air module is connected to the Telemetry1 port, and the receiver is connected to the RC-in (PPM input) port through a PPM encoder.

Fig. 31.2 Connection diagram of GPS, RC receiver, wi-fi telemetry, motors, and power module with Autopilot (CUAV V5+)

As shown in the circuit diagram in Fig. 31.3a, the obstacle-avoiding sensor module, consisting of four ultrasonic sensors with an Arduino, is connected to the Pixhawk flight controller through the I2C communication protocol. Other sensors and actuators are connected to a NodeMCU, and a portable router provides an ad hoc network for the module. A smart wi-fi camera is connected to the Internet through the router and is remotely accessible. Within a reasonable distance, the device can be operated through a radio transmitter or a radio telemetry device.

Fig. 31.3 a Connection diagram of different sensors and actuators with microcontroller (NodeMCU) and flight controller. b Connection diagram of microcontrollers, sensors, and actuators of the ground module

In Fig. 31.3b, the drone landing pad has four LDR modules attached at its four corners and an ultrasonic sensor at its center; these, together with a servo motor and a NeoPixel LED ring, are connected to one Arduino. Four LEDs show the landing status, and RFID sensors are connected over the SPI communication protocol. A fingerprint sensor, a second servo motor, an RTC module, and a mini speaker are connected to the other Arduino board. Two LCDs are connected to the two Arduinos, and an anemometer is connected to measure wind speed.
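The decision side of the obstacle-avoiding module (four ultrasonic sensors read by an Arduino) might look like the Python sketch below. It is an illustration under several assumptions: it ignores the I2C transfer to the flight controller, assumes one sensor per direction (front, right, back, left), treats a non-positive reading as "no echo within range", and uses a hypothetical 150 cm clearance threshold.

```python
from typing import Dict, List, Optional

SAFE_DISTANCE_CM = 150.0  # hypothetical minimum clearance


def blocked_directions(readings_cm: Dict[str, float],
                       safe_cm: float = SAFE_DISTANCE_CM) -> List[str]:
    """Directions whose nearest obstacle is closer than safe_cm (sorted by name)."""
    return sorted(d for d, dist in readings_cm.items() if 0 < dist < safe_cm)


def escape_direction(readings_cm: Dict[str, float],
                     safe_cm: float = SAFE_DISTANCE_CM) -> Optional[str]:
    """Direction with the most clearance if anything is too close, else None."""
    if not blocked_directions(readings_cm, safe_cm):
        return None  # nothing too close: keep following the pilot's command
    # A non-positive reading means no echo, i.e. effectively unlimited clearance.
    clearance = {d: float("inf") if dist <= 0 else dist
                 for d, dist in readings_cm.items()}
    return max(clearance, key=clearance.get)


readings = {"front": 80.0, "right": 400.0, "back": 300.0, "left": 120.0}
print(blocked_directions(readings), escape_direction(readings))
```

With the sample readings, the front and left sensors are inside the threshold and the sketch suggests moving right, where clearance is greatest.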

31.4 Various Calculation and Results

This section details the weight measurements, including the weight distribution across components; the thrust calculation, which relates generated thrust to throttle value; and the performance testing and error analysis.

(a) Weight measurement: The vehicle's total weight is approximately 2234 g; the detailed weight breakdown is listed in Table 31.1.
(b) Thrust calculation: Six Emax MT2216 motors are used in the drone, and the manufacturer claims a maximum thrust of 920 g per motor. In our hands-on experiment, the maximum measured thrust was 915 g at 100% throttle, which is very close to the claimed value. A single motor can therefore lift around 915 g at full power, so the total thrust of the drone is
Total thrust = 915 g × 6 = 5490 g.
The optimum thrust required to control the drone smoothly is twice the total weight:
Required thrust = 2 × 2234 g = 4468 g.
Assuming thrust scales linearly with throttle, the throttle percentage needed to produce 4468 g is
Throttle = (4468 / 5490) × 100 ≈ 81.38%.
So, at around 81% throttle, the drone can be operated smoothly with its full weight.

Table 31.1 Weight status of the device with individual components

Accessories                      Weight (g)
Frame                            630
Flight controller                90
Flight controller accessories    112
Battery                          491
ESC                              216
Motor                            486
Propeller                        78
NodeMCU (NMCU)                   21
Other fixed accessories          110
Total                            2234
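The weight and thrust arithmetic above can be reproduced with a short Python sketch using the values from Table 31.1 and the measured 915 g per-motor thrust. Like the calculation in the text, it assumes that thrust scales linearly with throttle percentage and uses a 2:1 thrust-to-weight margin; real propeller thrust generally does not scale linearly with throttle, so the result is only an estimate. The function name is hypothetical.

```python
# Component weights from Table 31.1, in grams.
WEIGHTS_G = {
    "Frame": 630,
    "Flight controller": 90,
    "Flight controller accessories": 112,
    "Battery": 491,
    "ESC": 216,
    "Motor": 486,
    "Propeller": 78,
    "NMCU": 21,
    "Other fixed accessories": 110,
}


def required_throttle_pct(total_weight_g: float, thrust_per_motor_g: float,
                          n_motors: int, margin: float = 2.0) -> float:
    """Throttle (%) needed for `margin` x weight, assuming thrust is linear in throttle."""
    total_thrust_g = thrust_per_motor_g * n_motors  # e.g. 915 g x 6 = 5490 g
    required_g = total_weight_g * margin            # e.g. 2 x 2234 g = 4468 g
    if required_g > total_thrust_g:
        raise ValueError("motors cannot supply the required thrust")
    return required_g / total_thrust_g * 100


total_weight = sum(WEIGHTS_G.values())
throttle = required_throttle_pct(total_weight, 915, 6)
print(total_weight, round(throttle, 2))
```

This prints `2234 81.38`, matching the table total and the throttle figure in the text. The explicit check raises an error when the motors cannot reach the requested margin at all, which a plain ratio would silently report as a throttle above 100%.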

31.4.1 System Testing and Error Analysis

Before evaluating performance, we gathered test data from nine test flights, sorted into three test sections: (i) beginning test, (ii) intermediate test, and (iii) final test. Data from the first four flights were used for the beginning test, data from the next three for the intermediate test, and data from the last two for the final test. After the beginning test, issues such as instability and inaccurate or inadequate responses arose; these were resolved, making the device ready for the next test. After the intermediate test, we discovered further lags caused by poor signal transmission and hardware malfunctions in the system, which were also fixed. The final test was then carried out with a higher success rate than the previous ones. The testing criteria were divided into four categories, each scored on a scale from 0 to 10, as shown in Fig. 31.4a. Each category includes several testing factors so that performance is checked in an integrated manner. The first category covers power distribution, the accuracy of propeller rotation, primary stability, and ESC performance. The second covers flight accuracy, heating optimization, software quality, battery draining, sensor performance, and transmitter range. The third covers flight stability, quick responses, communication status, flight duration, lifting capacity, vibration status, and accuracy. The final category covers GPS accuracy, flight mode performance, autopilot efficiency, drone-to-station communication, premium stability, speed, reliability with a compass, and the accuracy of other sensors.

Fig. 31.4 a Performance status with graph. b Error analysis status

Fig. 31.4a compares the performance of the flight tests across all categories. In the graph, the beginning test (orange line) indicates poor performance in every category, the intermediate test (blue line) shows above-average performance in each category, and the final test (violet line) indicates excellent performance in every category. The errors that occurred in the system fall mainly into three types: physical implementation errors, network communication errors, and logical errors. Fig. 31.4b shows that the error was high during the beginning test, at nearly 4.2. During the intermediate test, the error fell below 2.5, decreasing in all categories. In the final test, the error dropped drastically: most errors of all types were removed, bringing the rate well below that of the previous test.

31.5 Conclusion Tough competition has existed since the technical revolution began, with new devices continually being developed to make our lives more comfortable. UAV (Unmanned Aerial Vehicle) is a ubiquitous term today because of its easy and extensive use in different fields. Interest in UAVs is growing, and drones are being used not only for professional purposes but also in personal applications such as agriculture, aerial transportation, cinematography, and surveillance. However, most devices are built for a single task: agricultural drones are used only for agricultural purposes, cinematic drones only for video shooting, surveillance drones only for surveillance, and so on, and agricultural drones are not used in cinematography or surveillance. This inflexibility is a limitation because multiple tasks cannot be done with a single drone. To address this problem, we have proposed a solution in which a single copter can perform multiple tasks in several fields while maintaining proper accuracy and efficiency, so that there is no need to use more than one drone for different tasks. Our proposed device is designed to serve agriculture, medical transportation, fire service, and surveillance applications with one device. We believe the proposed device is an innovative step toward a single drone that can serve dynamic tasks and meet these expectations.

31 Design and Development of an IoT-Based …


References
1. Koh, L.P., Wich, S.A.: Dawn of drone ecology: low-cost autonomous aerial vehicles for conservation. Trop. Conserv. Sci. 5, 121–132 (2012)
2. Cañas, J.M., Martín-Martín, D., Arias, P., Vega, J., Roldán-Álvarez, D., García-Pérez, L., Fernández-Conde, J.: Open-source drone programming course for distance engineering education. Electronics 9, 2163 (2020)
3. Rivas, P.C., González-Briones, A., Corchado, J.M.: Detection of cattle using drones and convolutional neural networks. Sensors 18, 2048 (2018)
4. Choi, C.H., Jang, H.J., Lim, S.G., Lim, H.C., Cho, S.H., Gaponov, I.: Automatic wireless drone charging station creating the essential environment for continuous drone operation. In: 2016 International Conference on Control, Automation and Information Sciences (ICCAIS) (2016)
5. Bergen, P., Tiedemann, K.: The year of the drone. New Am. Found. 24 (2010)
6. Jacques, S., Bissey, S., Martin, A.: Multidisciplinary project-based learning within a collaborative framework: a case study on urban drone conception. iJET 11, 36–44 (2016)
7. Kornatowski, P.M., Bhaskaran, A., Heitz, G.M., Mintchev, S., Floreano, D.: Last-centimeter personal drone delivery: field deployment and user interaction. IEEE Robot. Autom. Lett. 3, 3813–3820 (2018)
8. Xiang, G., Hardy, A., Rajeh, M., Venuthurupalli, L.: Design of the life-ring drone delivery system for current rip rescue. In: 2016 IEEE Systems and Information Engineering Design Symposium (SIEDS) (2016)
9. Butcher, P.A., Colefax, A.P., Gorkin, R.A., Kajiura, S.M., López, N.A., Mourier, J., Purcell, C.R., Skomal, G.B., Tucker, J.P., Walsh, A.J., et al.: The drone revolution of shark science: a review. Drones 5, 8 (2021)
10. Dung, N.D., Rohacs, J.: The drone-following models in smart cities. In: 2018 IEEE 59th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON) (2018)

Chapter 32

Deep Learning-Based Violence Detection from Videos
Neha Singh, Onkareshwar Prasad, and T. Sujithra

Abstract Violence has been one of the major concerns in human interactions. Violent activities become even worse in public places such as parks, halls, and stadiums. Efficient detection algorithms are the need of the hour, as unusual events such as fights have been studied comparatively less. The existing system requires manual monitoring of videos from surveillance cameras, so recognition of violent interactions is important for developing automated video monitoring systems. Approaches based on deep learning promise better results in the recognition of images and human actions. Our proposed deep learning-based model uses CNN and LSTM layers, with DarkNet19 as the pretrained model. The proposed model achieves 95% and 100% accuracy on the benchmark Hockey and Peliculas datasets, respectively. Unlike most previous works, which use single-frame models (models that do not include temporal features), our model learns temporal features as well as spatial features. The two datasets, namely the Hockey Fight and Movie datasets, contain sudden camera motion and are challenging to work on. Existing works have considered different variations of optical flow; however, they did not utilize an efficient pretrained model for the task of violence detection. The proposed approach puts forward a pretrained model architecture that is less studied but more effective than traditional methods.

N. Singh · O. Prasad (B) · T. Sujithra SRM Institute of Science and Technology, Kattankulathur, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_32


N. Singh et al.

32.1 Introduction Violence has been occurring at an increasing rate in both public and remote places. In the majority of scenarios, these unusual events escalate into worse situations with unavoidable consequences. There needs to be a mechanism by which violent incidents can be monitored and reported to higher authorities, such as the police in that area, so that strict action can be taken to prevent such events from happening in the future. The conventional system of video monitoring requires a person to keep a record of events in the given area. However, this methodology is both inefficient and unreliable because its effectiveness depends primarily on the individual's capacity to manage the responsibility. Another drawback of the traditional system is that the person in charge may not be attentive all the time. Hence, an automated system that continuously detects and distinguishes fights and alerts people accordingly has become a critical need. In this paper, we also review existing strategies to address this issue.

32.2 Related Work 32.2.1 Approaches Based on Convolutional Neural Networks Recognition of human actions in abnormal or violent scenarios has been used to differentiate between fight and non-fight sequences. A hybrid learned feature framework was introduced in Ismael et al. [1], in which image frames were extracted from video sequences considering both motion and appearance. To obtain a spatial–temporal weighting model, a Hough forest classifier is used along with a 2D CNN, in which the kernel slides over the 2D input data, multiplies it element-wise, and sums the result into an output pixel. They experimented on three benchmark datasets, namely Hockey, Movie, and Behave. The Behave dataset includes 200,000 frames. The Hockey dataset includes 1000 clips grouped into two classes: 500 violent and 500 non-violent. The Movie Fight dataset contains 200 clips from action movies, 100 violent and the rest non-violent. In Wei et al. [2], the authors used a 3D ConvNet with a novel sampling method to detect anomalies. Features such as grayscale values and the number of key frames were considered, and the network used pooling and convolution layers. Public datasets such as Hockey Fights, Movies, and Crowd Violence were considered. They used the random sampling method as a performance metric. Key frames were extracted from the videos, and the videos were divided based on them. The gray-centroid algorithm is used to evaluate the similarity between adjacent frames; for this, RGB frames are converted into grayscale. A CCTV-Fights dataset of real-world fights was created in Mauricio et al. [3]; it contains 8 h of CCTV video footage. The authors worked on

32 Deep Learning-Based Violence Detection from Videos


feature extraction methodologies such as 2D-CNN and local interest points. Meaningful features were extracted from the frames using RGB information. Then, the chosen classifier determines whether each frame is violent (a positive case) or non-violent (a negative case). The segment generation step makes predictions based on temporal segments to recognize fight scenes. The performance metrics used were mAP and F-measure. The main advantage of this paper is the creation of a new dataset; however, the LSTM was not able to strengthen the sequential information. A multi-stream convolutional network [4] was used to detect fights in video clips. The authors adopted the VGG16 architecture to learn features and used descriptors such as optical flow, RGB frames, and depth estimation. For the classification step, approaches such as thresholding and SVM were used. The method was evaluated on sensitivity, specificity, and accuracy. The multi-stream network helped the authors learn high-level features, and the work also showed the effectiveness of using pretrained models. The authors of Qichao et al. [5] proposed a localization-guided framework to identify fight actions in videos, using detailed descriptions of movement. The entire frame was used for recognizing the action, and human detectors and K-means clustering were used to aggregate groups of humans. It achieved 93.3% accuracy on the newly introduced FADS dataset, which contains 1520 video clips.
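The kernel operation described for the 2D CNN of Ismael et al. [1] (a kernel slides over the 2D input, the overlapping values are multiplied, and the products are summed into one output pixel) can be sketched in NumPy as below. This is a minimal illustration, not the authors' code, and it assumes a single-channel input, stride 1, and no padding. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The function name conv2d_valid is hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current window and the kernel, summed into one pixel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
result = conv2d_valid(x, np.ones((2, 2)))  # 3 x 3 output; result[0, 0] == 10.0 (0 + 1 + 4 + 5)
```

A 4 × 4 input and a 2 × 2 kernel give a 3 × 3 output, which is why stacked convolutions without padding shrink the feature maps.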

32.2.2 Motion Detection-Based Approaches Ersin et al. [6] proposed a method based on a novel motion feature called the motion co-occurrence feature (MCF). Block matching algorithms are used to extract motion vectors, and a magnitude- and direction-based MCF is calculated from both current and past motion vectors. A KNN classifier is used to classify clips as violent or non-violent. The MCF descriptor achieved 93.5% accuracy; it did not perform as well on the Violent Flows dataset but showed promising results on the Hockey dataset. Jia et al. [7] focused on detecting abnormal behavior among pedestrians. They used optical flow to obtain the motion between adjacent frames, and key nodes and limb connections such as legs and arms were taken into consideration. The authors identified sparse points in images by comparing reference maps with the corresponding points in the images. The node points obtained from the response graph were connected to obtain the limb details of each pedestrian. The positions of pedestrians and their torso directions were used to evaluate the model. It achieved 82.4% on the public COCO dataset.
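Block matching, which Ersin et al. [6] use to obtain motion vectors, can be illustrated with a minimal exhaustive-search sketch. The sum of absolute differences (SAD) as the matching cost, the 8 × 8 block size, and the ±4-pixel search range are assumptions for illustration, not parameters reported in [6], and the co-occurrence (MCF) step built on these vectors is not shown. The function name block_matching is hypothetical.

```python
import numpy as np

def block_matching(prev: np.ndarray, curr: np.ndarray, block: int = 8, search: int = 4):
    """Exhaustive-search block matching using the sum of absolute differences (SAD).

    Returns a dict mapping each block's top-left corner (y, x) in `curr` to the
    motion vector (dy, dx) of that block's content from `prev` to `curr`.
    """
    prev = prev.astype(float)  # avoid unsigned-integer wrap-around in the differences
    curr = curr.astype(float)
    h, w = curr.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block]
            best_offset, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block would fall outside the previous frame
                    sad = np.abs(prev[y:y + block, x:x + block] - target).sum()
                    if sad < best_sad:
                        best_sad, best_offset = sad, (dy, dx)
            # The best match sits at (by + dy, bx + dx) in `prev`, so the content moved by (-dy, -dx).
            vectors[(by, bx)] = (-best_offset[0], -best_offset[1])
    return vectors
```

For example, if `curr` is `prev` shifted two pixels to the right (such as `np.roll(prev, 2, axis=1)`), interior blocks are assigned the motion vector (0, 2); magnitudes and directions of such vectors are what a descriptor like MCF would then aggregate.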

32.2.3 Approaches Utilizing Other Machine Learning Algorithms In Aqib et al. [8], the authors focused on implementing a transfer learning approach (Fig. 32.1) using a CNN on the Hockey and Movies datasets. The GoogLeNet architecture was


Fig. 32.1 Transfer learning approach using ImageNet dataset

used to learn the features; the model was pretrained on the ImageNet dataset with 1000 different classes. A tenfold cross-validation scheme was used as the training scheme. The last dense classification layer was replaced to distinguish between fight and non-fight scenes. This transfer learning approach showed the potential of obtaining good models with better accuracy. In Pawar and Attar [9], the authors used an auto-encoder based on convolutional pipeline sequences: a custom LSTM auto-encoder combined with a convolutional structure was proposed, forming a one-class classification structure to identify anomalies and abnormal behavior in videos. The method was evaluated using AUC and error rate. The authors of Mabrouk and Zagrouba [10] introduced a custom distribution-based CNN descriptor. They focused on using optical flow along with a CNN to detect violence; to model the motion structure, the orientation and magnitude of the optical flow were used, and an SVM was used for training. Laisong et al. [11] introduced anomaly behavior detection in persons based on a MIL approach, considering integrated pipeline spaces in smart cities. A CenterNet method is used to extract features, and a score function based on a convolutional auto-encoder learns from the extracted features. Peaks in the heat maps are taken as center points to set the dimensions of the bounding boxes. A new dataset containing 110 videos of violent and normal behavior was introduced, on which they achieved an AUC of 84.4. The authors of Konstantinos et al. [12] suggested the use of the ResNet50 architecture for violence detection. Their main focus was identifying violence in crowd fights, and hence they used the Crowd Violence dataset to evaluate the method. The method was based on a 3D ResNet50 with three convolutional layers with different filters.
Their method showed a mean accuracy of 94% on the Crowd Violence dataset, with a lag of 1 s in processing streams.
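Several of the works above summarize optical flow through its orientation and magnitude (for example, Mabrouk and Zagrouba [10]). One common way to do this is a magnitude-weighted orientation histogram, sketched below. It assumes the flow field (u, v) has already been computed by some optical-flow method, and the 8-bin layout is an arbitrary choice; it is not claimed to be the exact descriptor of [10]. The function name is hypothetical.

```python
import numpy as np

def flow_orientation_histogram(u: np.ndarray, v: np.ndarray, bins: int = 8) -> np.ndarray:
    """Magnitude-weighted histogram of optical-flow orientations, normalized to sum to 1.

    u, v: horizontal and vertical flow components with the same shape.
    """
    magnitude = np.hypot(u, v)
    orientation = np.mod(np.arctan2(v, u), 2 * np.pi)  # angles in [0, 2*pi)
    bin_index = np.minimum((orientation / (2 * np.pi) * bins).astype(int), bins - 1)
    # Each pixel votes for its orientation bin with a weight equal to its flow magnitude.
    hist = np.bincount(bin_index.ravel(), weights=magnitude.ravel(), minlength=bins)
    total = hist.sum()
    return hist / total if total > 0 else hist  # a static frame yields an all-zero histogram
```

Weighting by magnitude means that small, noisy flow vectors contribute little, while large coherent motions such as strikes dominate the descriptor; a classifier such as an SVM could then be trained on these histograms.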


The literature review shows that various approaches have been proposed for detecting fights and other abnormal activities in videos, including analysis of the key nodes of pedestrians, sampling methods [2], and the use of grayscale images. Several papers also introduced their own datasets for evaluating performance, such as FADS [5] and CCTV-Fights [3].

32.3 Proposed Work The deep learning paradigm has been very effective in the field of object detection. In this work, we use deep learning to detect violence in videos. The first step is frame extraction from the video clips, for which OpenCV is used. OpenCV provides the VideoCapture class, which reads the frames of a video; the frames are saved as a list that can be iterated and traversed as required (Fig. 32.2). The second step is preprocessing, which focuses on cleaning the data to make it ready for further processing. Gaussian blur, which is very effective in smoothing images, is used as an augmenter: the image is convolved with a Gaussian filter, a type of low-pass filter that reduces high-frequency components. Then, we divide the datasets into training, validation, and test sets, and we reshape the input frames to fit the input of our network. Unlike images, videos contain temporal information along with spatial information: a video can be described as a sequence of images placed in a specific order and played at a given frame rate. Optical flow addresses this by focusing on the motion between frames along with the spatial features in each frame; in other words, it examines how the intensity of each pixel changes across frames to identify motion. We use two CNNs to extract low-level features and local motion features. First, two video frames are taken as input and processed by the pretrained CNN model. The outputs of the bottom layer of the pretrained model for the two frames are concatenated along the last (channel) dimension and fed into an additional CNN. This CNN learns local motion

Fig. 32.2 Block diagram of the proposed system


features as well as invariant appearance features by analyzing the feature maps of the two frames. From the top layer of the pretrained network, the outputs for the two frames are concatenated and fed into another additional CNN to compare the high-level features between them. The outputs of the two CNNs are then concatenated and passed to a fully connected layer, followed by an LSTM cell that learns global temporal features. These features help in recognizing the behavior of people and objects in videos; they include the edges or lines of objects and persons, and body movements. They also include appearance-invariant features, which account for changes in illumination, weather, and other changes in the environment or background. In this network architecture, the acceleration field present in videos and the RGB difference between the frames extracted in the first step also play a vital role in determining the presence of violence in frames, and hence in videos. These features help in classifying the movements or actions of people in videos, and hence in categorizing violence and non-violence. The fully connected layer with two neurons, representing the two categories (fight and no-fight), takes the output of the LSTM cell and classifies whether violence is present in the input video. In the proposed approach, DarkNet19 is selected as the pretrained network architecture, with features learned from the ImageNet [9] dataset with 1000 categories. DarkNet19 consists of 19 convolutional layers and five max-pooling layers. With 19 layers, it achieves a top-5 accuracy 13.5% higher than AlexNet, which makes it an efficient neural network for computer vision. On the basis of these factors, DarkNet19 has been chosen for transfer learning, and experiments are conducted on the Hockey and Movies datasets.
Convolutional Layers: The architecture mostly uses 3 × 3 filters, and the number of filters is doubled at the start of each new block of convolutional layers. To compress the feature maps between the 3 × 3 convolutions, 1 × 1 convolutional filters are used. Max-Pooling Layers: After every pooling step, the number of channels is doubled. Batch normalization is used to stabilize training, speed up convergence, and regularize the model.
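The temporal part of the architecture (per-frame features passed through an LSTM cell and then a fully connected layer with two neurons for fight and no-fight) can be sketched in NumPy as follows. This is a forward-pass illustration only, under several assumptions: the per-frame feature vectors stand in for the concatenated outputs of the two CNNs after the fully connected layer, the weights are random rather than trained, the feature and hidden sizes are hypothetical, and the gate ordering is our own convention. The sequence length of 40 matches the unrolled size given in Sect. 32.5.2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(xs, W_x, W_h, b):
    """Run one LSTM cell over a sequence and return the final hidden state.

    xs: (T, D) per-frame features; W_x: (D, 4H); W_h: (H, 4H); b: (4H,).
    Assumed gate order in the stacked weights: input, forget, output, candidate.
    """
    H = W_h.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in xs:
        z = x_t @ W_x + h @ W_h + b
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell state
        c = f * c + i * g            # update the cell memory
        h = o * np.tanh(c)           # expose part of the memory as the hidden state
    return h

def classify_clip(xs, params):
    """LSTM over frames, then a two-neuron fully connected layer with softmax."""
    h = lstm_last_hidden(xs, params["W_x"], params["W_h"], params["b"])
    logits = h @ params["W_fc"] + params["b_fc"]  # shape (2,): [fight, no-fight]
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

# Hypothetical sizes: 40 frames (the unrolled size), 64-d frame features, 32 hidden units.
T, D, H = 40, 64, 32
rng = np.random.default_rng(0)
params = {
    "W_x": rng.normal(scale=0.1, size=(D, 4 * H)),
    "W_h": rng.normal(scale=0.1, size=(H, 4 * H)),
    "b": np.zeros(4 * H),
    "W_fc": rng.normal(scale=0.1, size=(H, 2)),
    "b_fc": np.zeros(2),
}
probs = classify_clip(rng.normal(size=(T, D)), params)  # two class probabilities summing to 1
```

Only the final hidden state reaches the classifier, so the LSTM has to carry evidence of violence from any frame in the clip forward to the last time step.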

32.4 Datasets The experiments for the proposed method are conducted on two benchmark datasets for violent activity detection: Hockey and Movies Fight. Hockey Dataset: The Hockey dataset comprises 1000 video clips (Fig. 32.3) divided into two categories, fight and no-fight, with 500 clips in each category. The videos are taken from different scenes of National Hockey League (NHL) games, and the resolution of the clips is 360 × 288. Each video contains some background motion and sound and is nearly three seconds long. The size of the dataset is 212 MB. For this work, we divided the dataset as follows: 500 clips for training, 100 clips for validation, and 400 clips for testing.
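The 500/100/400 partition of the Hockey clips can be reproduced with a seeded shuffle, as in the sketch below. Whether the authors shuffled, stratified by class, or used a fixed ordering is not stated, so the plain random split here is an assumption; the function name split_clips is hypothetical.

```python
import random

def split_clips(n_clips: int, n_train: int, n_val: int, seed: int = 0):
    """Shuffle clip indices and split them into train / validation / test lists."""
    indices = list(range(n_clips))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # all remaining clips
    return train, val, test

train, val, test = split_clips(1000, n_train=500, n_val=100)  # 500 / 100 / 400 clips
```

A stratified variant (splitting the fight and no-fight clips separately) would keep the two classes balanced in every subset.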


Fig. 32.3 Sample video scenes from Hockey dataset

Fig. 32.4 Sample video scenes from Movies dataset

Peliculas Dataset: This dataset covers a wide variety of scenes (Fig. 32.4) recorded at different resolutions. The mean resolution of the videos is 360 × 250 pixels. The videos are taken from Hollywood action movies, and the dataset contains 200 clips: 100 clips of non-fight videos and another 100 clips of fight scenes. Clip duration varies from two to three seconds, and the frame size varies as well. The Peliculas dataset, also known as the Movie Fight Dataset, has complex viewing angles due to changing radiance and background motion with various obstructions. The Hockey dataset is also challenging because it contains sudden camera motion in recordings of non-fight situations. These characteristics make both datasets (Hockey and Movies) appropriate sources for violence detection work. The size of the dataset is 734 MB.

32.5 Implementation 32.5.1 Setup The proposed method incorporates the DarkNet19 network architecture as its pretrained model for transfer learning [11] (Fig. 32.1). This network architecture is pretrained on the ImageNet dataset with 1000 categories. The whole process is divided into three phases, namely training, validation, and testing. For instance, the Hockey dataset, containing 1000 videos, is divided into 500 videos for training, 400 for testing, and the remaining 100 for validation. The proposed model is trained on a Google Colab TPU.


Fig. 32.5 Accuracy graphs on Hockey and Movie dataset

32.5.2 Parameters The proposed network is fine-tuned with an initial learning rate of 0.00001, a batch size of 4, and an unrolled size of 40, where the unrolled size is the number of frames in each video sequence fed to the network. Considering the limited size of the datasets, we used fivefold cross-validation to evaluate the performance of the proposed method (Fig. 32.5).
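Fivefold cross-validation partitions the clips into five disjoint folds and uses each fold once for validation while training on the other four. A minimal index-level sketch follows; the seeded shuffle and the name k_fold_splits are assumptions, not details from the paper.

```python
import random

def k_fold_splits(n_items: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of (nearly) equal size
    for i in range(k):
        val = folds[i]
        # Training uses every fold except the one held out for validation.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

For the 1000 Hockey clips, each of the five rounds trains on 800 clips and validates on the remaining 200, and the reported score is typically the mean over the five rounds.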

32.5.3 Result The proposed method outperformed the state-of-the-art techniques. Previous methods based on traditional features such as HOG and STIP obtained accuracies of nearly 94%. With CNN and LSTM layers built on the DarkNet19 network architecture, our proposed method achieved promising accuracies of 95% on the Hockey dataset and 100% on the Movie Fight dataset, outperforming the traditional approaches that use classifiers with static features such as STIP. The random forest-based approach achieved an accuracy of 83.2% on the Hockey dataset, and the approach using SVM classifiers [4] achieved 86% on the Hockey dataset. In [1, 3], the authors used a 2D CNN with Hough forest classifiers, which was reported to be 94.6% accurate on the Hockey dataset. The proposed model was trained for over 10 epochs on both datasets. The results also demonstrate the remarkable effectiveness of the DarkNet19 architecture for violence detection. For the evaluation of the proposed method, we incorporated fivefold cross-validation and used precision and recall as performance metrics. On the Hockey dataset, we obtained a precision of 0.9412 and a recall of 0.98.
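Precision and recall follow directly from confusion-matrix counts, with fight as the positive class. The sketch below computes them and also derives the F1 score implied by the precision and recall reported above; that F1 value is simple arithmetic on the reported numbers, not a figure the authors report. The function names are hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted fights that are real fights
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real fights that were detected
    return precision, recall, f1_from(precision, recall)

# F1 implied by the reported Hockey precision (0.9412) and recall (0.98):
print(round(f1_from(0.9412, 0.98), 4))  # 0.9602
```

The higher recall than precision means that, on the Hockey dataset, the model misses few fights but raises somewhat more false alarms.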


32.6 Conclusion This paper aims at identifying violent scenes in videos quickly and effectively. To avoid overfitting on small datasets, transfer learning, i.e., using pretrained models to build an effective target model, is utilized. The method presented in this paper incorporates two CNNs to extract both low-level and high-level features from videos. There is a need for a well-structured system that can help the concerned authorities of the respective places bring these crimes under control, and our future work aims to address this. The model we have planned for future work is intended to detect violence more powerfully. CCTV footage may be helpful for this, as CCTV cameras are now common in public places.

32.7 Future Work One idea is to fit an alarm close to each camera that sounds for some time whenever an event of violence is detected. This would make people in the surrounding area aware of the dangerous activity, alert them, and help them anticipate what may happen next. Likewise, the idea can be extended to other kinds of detection, for example, fire accidents and theft.

References
1. Ismael, S., Oscar, D., Espinosa, A., et al.: Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans. Image Process. 27, 4787–4797 (2019)
2. Wei, S., Jing, Y., et al.: A novel violent video detection scheme based on modified 3D convolutional neural networks. IEEE Access 7, 39172–39179 (2019)
3. Mauricio, P., Alex, C., Anderson, R.: Detection of real-world fights in surveillance videos. In: International Conference on Acoustics, Speech, and Signal Processing, pp. 2662–2666. IEEE, UK (2019). https://doi.org/10.1109/ICASSP.2019.8683676
4. Sarah, A., Helio, P., et al.: Fight detection in video sequences based on multi-stream convolutional neural networks. In: SIBGRAPI Conference on Graphics, Patterns and Images, pp. 8–15. Images (2019). https://doi.org/10.1109/SIBGRAPI.2019.00010
5. Xu, Q., John, S., Weiyao, L.: Localization guided fight action detection in surveillance videos. In: IEEE International Conference on Multimedia and Expo, pp. 568–573. IEEE, China (2019). https://doi.org/10.1109/ICME.2019.00104
6. Ersin, E., Mehmet, A., Medeni, S.: Fight detection in surveillance videos. In: 11th International Workshop on Content-Based Multimedia Indexing, pp. 131–135. IEEE, Hungary (2013). https://doi.org/10.1109/CBMI.2013.6576569
7. Jia, Z., Jun, Y., et al.: Fighting detection based on pedestrian pose estimation. In: International Congress on Image and Signal Processing, Biomedical Engineering and Informatics, pp. 1–5. IEEE, China (2019). https://doi.org/10.1109/CISP-BMEI.2018.8633057
8. Aqib, M., Allah, B., Zulfiqar, H.: Violence detection in surveillance videos with deep network using transfer learning. In: European Conference on Electrical Engineering and Computer Science, pp. 558–563. IEEE, Switzerland (2018). https://doi.org/10.1109/EECS.2018.00109


9. Pawar, K., Attar, V.: Application of deep learning for crowd anomaly detection from surveillance videos. In: International Conference on Confluence the Next Generation Information Technology Summit, pp. 506–511. IEEE, India (2021). https://doi.org/10.1109/Confluence51648.2021.9377055
10. Mabrouk, A.B., Zagrouba, E.: Spatiotemporal feature based convolutional neural network for violence detection. In: Thirteenth International Conference on Machine Vision, Italy (2021). https://doi.org/10.1117/12.2587276
11. Laisong, K., Shifeng, L., Hankun, Z., Daqing, G.: Person anomaly detection-based videos surveillance system in urban integrated pipe gallery. Build. Res. Inf. 49(5), 1–14 (2020). https://doi.org/10.1080/09613218.2020.1779020
12. Konstantinos, G., Konstantinos, I., Theodora, T., et al.: A crowd analysis framework for detecting violence scenes. In: International Conference on Multimedia Retrieval, pp. 276–280. Ireland (2020). https://doi.org/10.1145/3372278.3390725

Chapter 33

Stationary Wavelet-Based Fusion Approach for Enhancement of Microscopy Images
Disha Singh, Vikrant Bhateja, and Ankit Yadav

Abstract Microscopy images are acquired from the microscopic view of blood samples analyzed under a microscope. The visual quality of these images is often poor because they are acquired through the microscope lens. The guided image filter (GIF) is suitable for noise suppression, the morphological filter (MF) provides dependable contrast enhancement, and the unsharp mask (UM) filter is deployed for image sharpening. This paper proposes a combinative fusion approach for the responses of these filters using the stationary wavelet transform (SWT). The image quality assessment (IQA) of the fused image is carried out using parameters such as peak signal-to-noise ratio (PSNR), enhancement measure estimation (EME), and entropy to assess the quality traits of the different filter responses. Increased values of these quality parameters demonstrate that the resulting microscopic images are free from background noise and possess better contrast and sharpness, so that bacterial clusters are properly differentiated from the background.
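The three quality measures named above can be sketched in NumPy using their commonly used definitions: PSNR from the mean squared error against a reference image, entropy from the normalized gray-level histogram, and EME as the average, over non-overlapping blocks, of 20·log10(max/min). The paper's own formulations are discussed in Sect. 33.3; the 8 × 8 block size, the base-10 logarithm, the 8-bit peak value, and the small offset eps that avoids division by zero in EME are assumptions, and other EME variants exist.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (peak = 255 assumes 8-bit images)."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))

def entropy(image: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale image's histogram."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # empty gray levels contribute 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

def eme(image: np.ndarray, block: int = 8, eps: float = 1e-4) -> float:
    """Enhancement measure estimation: mean of 20*log10(max/min) over non-overlapping blocks."""
    img = image.astype(float)
    k1, k2 = img.shape[0] // block, img.shape[1] // block
    if k1 == 0 or k2 == 0:
        raise ValueError("image is smaller than one block")
    total = 0.0
    for i in range(k1):
        for j in range(k2):
            b = img[i * block:(i + 1) * block, j * block:(j + 1) * block]
            total += 20 * np.log10((b.max() + eps) / (b.min() + eps))  # local contrast of the block
    return total / (k1 * k2)
```

Higher PSNR indicates less deviation from the reference, higher entropy indicates more information content, and higher EME indicates stronger local contrast, which is why increases in all three are read as improvement.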

D. Singh · V. Bhateja (B) · A. Yadav Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Faizabad Road, Lucknow 226028, India Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_33

33.1 Introduction The identification of bacterial species and genera is important because biological information about microorganisms is essential in veterinary science, medicine, the food industry, biochemistry, and farming. Many microorganisms are useful in day-to-day life (such as Lactobacillus bacteria for curd formation), but some can also cause many diseases, including infectious ones. Therefore, their identification is necessary for proper detection and diagnosis of the infection level [1]. A microorganism is a living thing that is too small to be seen with the naked eye; hence, microscopes are used to observe them. The images of microorganisms taken by a microscope are known as microscopic images. For taking a microscopic


D. Singh et al.

image, a sample such as blood, urine, or an oral cavity swab is taken from the human body, and a slide is prepared and placed under a microscope to retrieve the microscopic image. These images are then used by the pathologist to detect the species and genera of the microorganisms present [2]. However, these images suffer from poor contrast, low visibility, acquired noise, etc. Thus, enhancement of these images is necessary for increasing the accuracy as well as making the process less time-consuming. Pre-processing techniques are applied to improve image quality and achieve better results. Various works [3–5] used multi-scale retinex (MSR) for image enhancement. This method provides color constancy and dynamic range compression. These works mostly emphasize modifications of MSR for image enhancement using color correction and enhancement of local and global contrast. Noise filtering is performed by GIF in [6], but it suffers from the gradient reversal artifact. In [7], MF is used for contrast enhancement, but it blurs the images. In [8], illumination correction is carried out using a predefined illumination factor. In [9], the authors implemented UM for sharpening the image; however, this filter is extremely sensitive to noise. Fusing the results of these three filters may compensate for their individual limitations: the fused result would have reduced noise, enhanced contrast, and increased sharpness, and may thus produce a considerably better result. The rest of the paper is organized as follows: Sect. 33.2 gives a general overview of GIF, MF, and UM along with the proposed fusion-based enhancement approach. Sect. 33.3 discusses the IQA, the performance metrics, and a summary and discussion of the results. Sect. 33.4 concludes the work.

33.2 Proposed Fusion-Based Enhancement Approach 33.2.1 Guided Image Filter (GIF) GIF is a very popular filter because of its high efficiency and exceptional edge-preserving property [10]. Taking into consideration the structure of the guidance image, GIF uses a local linear model to compute the output pixel values within a specified window. Because the regularization parameter (one of the controlling parameters of GIF) is taken as a constant, halo artifacts are unavoidably produced near strong edges in the output images. Like the bilateral filter, GIF has excellent smoothing and edge-preserving characteristics, but it does not suffer from the gradient reversal artifacts of the conventional bilateral filter. GIF can also be used beyond smoothing [11]: with an appropriate guidance image, it can make the output image more structured and less smoothed than the input. However, GIF has certain limitations that make its usage challenging, as its result depends on the regularization parameter and the window size. These parameters are constant in

33 Stationary Wavelet-Based Fusion Approach for Enhancement …


the conventional methods, although suitable values generally differ between kinds of images, so they should be chosen carefully to avoid distorting the image. In this work, GIF is used with a regularization parameter of 0.0001 and a window size of 16 for noise reduction.
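The guided filter described above can be sketched in a few lines of NumPy, following the box-filter formulation of He et al. [6]. This is a minimal grey-scale sketch, not the authors' implementation: the function names are hypothetical, borders are handled by edge replication, and mapping the "window size of 16" to a radius r (window width 2r + 1) is an assumption.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window via an integral image.

    Borders are handled by edge replication (an assumption of this sketch)."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    window_sum = (integral[k:, k:] - integral[:-k, k:]
                  - integral[k:, :-k] + integral[:-k, :-k])
    return window_sum / (k * k)

def guided_filter(guide, src, r=8, eps=1e-4):
    """Grey-scale guided filter (local linear model q = a * I + b).

    guide, src: 2-D float arrays scaled to [0, 1].
    r: window radius; eps: regularization parameter."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i ** 2
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)      # per-window linear coefficient
    b = mean_p - a * mean_i         # per-window offset
    # Average the coefficients of every window that covers each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)
```

For self-guided denoising the image is passed as both guide and input, e.g. `guided_filter(img, img, r=8, eps=1e-4)`. A small eps keeps strong edges (large local variance gives a ≈ 1), while flat regions (a ≈ 0) are replaced by their local mean.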

33.2.2 Morphological Filter (MF)

MF [12] is a group of non-linear filtering techniques based on the shape and structure of the objects in the image. Morphological filtering combines morphological operators, each of which processes the input image with a structuring element. The structuring element defines the neighborhood of a pixel, and its size and shape are selected largely from information acquired from the image [13]. The most frequently used morphological operators are erosion, dilation, opening, and closing [14]. Erosion shrinks bright regions of the image, whereas dilation expands them [15]. In this work, a disk of size 7 is selected as the structuring element, and top-hat and bottom-hat filters are applied to the input image x. The contrast-enhanced image is obtained by adding the top-hat response to x and subtracting the bottom-hat response.
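The top-hat/bottom-hat contrast enhancement of this subsection can be sketched as follows. A flat disk of radius 7 stands in for the "disk of size 7" (MATLAB's `strel('disk', 7)` approximates the disk differently by default, so results will not match exactly); the helper names, the edge-replicated borders, and intensities in [0, 1] are assumptions of this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def disk(radius):
    """Flat disk-shaped structuring element as a boolean mask."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def _grey(img, se, reduce):
    """Flat grey-level erosion (reduce=np.min) or dilation (reduce=np.max)."""
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="edge")             # border handling: assumption
    windows = sliding_window_view(padded, se.shape)  # shape (H, W, 2r+1, 2r+1)
    return reduce(windows[..., se], axis=-1)         # only pixels inside the disk

def erode(img, se):
    return _grey(img, se, np.min)

def dilate(img, se):
    return _grey(img, se, np.max)

def tophat_contrast(img, radius=7):
    """x + top-hat(x) - bottom-hat(x) with a disk structuring element."""
    se = disk(radius)
    opening = dilate(erode(img, se), se)
    closing = erode(dilate(img, se), se)
    top_hat = img - opening        # bright details smaller than the disk
    bottom_hat = closing - img     # dark details smaller than the disk
    return np.clip(img + top_hat - bottom_hat, 0.0, 1.0)
```

Bright structures smaller than the disk are boosted and dark ones are deepened, which raises local contrast while leaving large uniform regions unchanged.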

33.2.3 Unsharp Masking (UM)

UM is a filtering technique in which a blurred version of the image is subtracted from the input image to produce a sharper output image. It has many applications in the printing and publishing industry [9]. The steps involved in UM are:
• The input image is blurred with a Gaussian kernel.
• The blurred image is subtracted from the original image, and the difference is stored as the mask.
• The output is the sum of the original image and the mask.
The UM implementation used here takes name–value pair arguments, where the name identifies a parameter and the value sets it. These parameters define the amount of sharpening required and the filter used to blur the image. In this work, a Gaussian filter is used with a sharpening strength of 0.8.
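A minimal sketch of the three UM steps listed above. Here `sigma` plays the role of the Gaussian "radius" and `amount` the sharpening strength (0.8 in this section); scaling the mask by `amount` and leaving the result unclipped are assumptions of this sketch, not details from the text.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                       # unit gain: flat regions unchanged
    padded = np.pad(img, r, mode="edge")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)  # blur along x
    return np.apply_along_axis(conv, 0, rows)    # blur along y

def unsharp_mask(img, sigma=2.0, amount=0.8):
    blurred = gaussian_blur(img, sigma)  # step 1: Gaussian blur
    mask = img - blurred                 # step 2: mask = original - blurred
    return img + amount * mask           # step 3: original + (scaled) mask
```

Because the mask is non-zero only around edges, the output overshoots on both sides of a step, which is what makes edges look sharper; it is also why UM amplifies noise.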


D. Singh et al.

33.2.4 Fusion-Based Enhancement Filter Using SWT

The results obtained by applying GIF, MF, and UM are fused to obtain the desired output. First, the responses of GIF and MF are decomposed using SWT [16] into approximation coefficients (AC) and detail coefficients (DC). The AC of the first image is fused with the AC of the second image using principal component analysis (PCA) [16], and the inverse stationary wavelet transform (ISWT) is applied to the result. Similarly, the DC of the GIF and MF outputs are fused using PCA, and ISWT is applied to the fused result. The two ISWT outputs are then fused using PCA. Next, the output of UM and the output of the GIF–MF fusion are taken as new inputs. Both are decomposed by SWT into AC and DC; DC is fused with DC and AC with AC using PCA. ISWT is applied after fusion, and the two results are again fused using PCA to obtain the final output, as depicted in Fig. 33.1.
The algorithm for the proposed fusion-based enhancement filter is given below.

Algorithm 1: Procedural Steps for Fusion-Based Enhancement Filter using SWT
BEGIN
Step 1: Input the test image
Step 2: Apply RGB to grayscale conversion
Step 3: G1 = response of GIF
Step 4: G2 = response of MF
Step 5: G3 = response of UM
Step 6: Decompose G1 and G2 into AC and DC using SWT
Step 7: Fuse the DC of G1 and G2 using PCA
Step 8: Fuse the AC of G1 and G2 using PCA
Step 9: Reconstruct the outputs of Steps 7 and 8 separately using ISWT
Step 10: Fuse the results of Step 9 using PCA to obtain G4
Step 11: Decompose G3 and G4 into AC and DC using SWT
Step 12: Fuse the DC of G3 and G4 using PCA
Step 13: Fuse the AC of G3 and G4 using PCA
Step 14: Reconstruct the outputs of Steps 12 and 13 separately using ISWT
Step 15: Fuse the results of Step 14 using PCA to obtain G5
Step 16: Output the fused result G5
END

SWT applies a low-pass and a high-pass filter to the image data at every level, producing two sequences whose length equals that of the original sequence. Unlike the decimated wavelet transform, SWT does not downsample the output; instead, the filters are upsampled at each level by inserting zeros between their coefficients. PCA fuses the images as a weighted sum of the source images: the weight of each source is obtained from the normalized eigenvector (for the largest eigenvalue) of the covariance matrix of the source images, and the weighted sum gives the fusion result.

Fig. 33.1 Proposed design methodology for fusion-based enhancement of microscopy images
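The SWT–PCA fusion step used twice in Algorithm 1 can be sketched as below. Assumptions: a one-level undecimated Haar transform with periodic borders stands in for the SWT (the paper does not state the wavelet or the number of levels), its scaling is chosen so that the four sub-bands sum back exactly to the image, and AC and DC are fused and then reconstructed in a single ISWT. The separate reconstructions followed by a further PCA stage (Steps 9–10 and 14–15) are not modelled. All function names are hypothetical.

```python
import numpy as np

def swt_haar(x):
    """One-level undecimated Haar transform (periodic borders).

    Returns (approx, (d1, d2, d3)); the 1/2 scaling makes the four
    sub-bands sum back exactly to x."""
    lo = lambda a, ax: (a + np.roll(a, -1, axis=ax)) / 2
    hi = lambda a, ax: (a - np.roll(a, -1, axis=ax)) / 2
    L, H = lo(x, 0), hi(x, 0)
    return lo(L, 1), (hi(L, 1), lo(H, 1), hi(H, 1))

def iswt_haar(approx, details):
    """Inverse of swt_haar: the sub-bands simply add up."""
    return approx + sum(details)

def pca_weights(a, b):
    """Fusion weights from the principal eigenvector of the 2x2 covariance."""
    cov = np.cov(np.vstack([a.ravel(), b.ravel()]))
    _, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    v = np.abs(vecs[:, -1])             # principal component
    return v / v.sum()                  # normalize so the weights sum to 1

def pca_fuse(a, b):
    w1, w2 = pca_weights(a, b)
    return w1 * a + w2 * b

def swt_pca_fuse(img1, img2):
    a1, d1 = swt_haar(img1)
    a2, d2 = swt_haar(img2)
    approx = pca_fuse(a1, a2)                                # AC with AC
    details = tuple(pca_fuse(p, q) for p, q in zip(d1, d2))  # DC with DC
    return iswt_haar(approx, details)

def fuse_responses(g1, g2, g3):
    """Simplified cascade: G4 from the GIF/MF responses, G5 from UM and G4."""
    g4 = swt_pca_fuse(g1, g2)
    return swt_pca_fuse(g3, g4)
```

Fusing an image with itself returns the same image, since PCA then assigns equal weights of 0.5 to both sources in every sub-band.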

33.3 Results and Discussions

33.3.1 Image Quality Assessment

The above methods are used not only for image enhancement but also for noise filtering. Enhancement modifies the pixel values of the image, noise filtering removes noisy pixels, and the fusion process selects and modifies pixels from all input images to produce the output image. Quality assessment of the output image is therefore essential to verify the result. Noise filtering is evaluated by the PSNR of the output image; the higher the PSNR (in dB), the better the noise filtering. Contrast enhancement is evaluated by EME [17], computed for both the input and the output images since it is an absolute (no-reference) measure; the higher the EME, the better the result. The fusion process is assessed by entropy, which is also an absolute measure and quantifies the amount of information contained in the image; a higher entropy indicates a better result.
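The three IQA measures can be computed as sketched below. PSNR is computed against a reference image with peak value 1, entropy from a 256-bin histogram of an 8-bit image, and EME as the block average of 20·log10(max/min). The reference used for PSNR, the EME block size, the log base, and the small offset `c` are assumptions, since the text does not specify them; the function names are hypothetical.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def entropy(img_u8):
    """Shannon entropy (bits) of the 256-bin grey-level histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def eme(img, block=8, c=1e-4):
    """Measure of enhancement: mean over blocks of 20*log10(max/min).

    Block size and the offset c (avoids division by zero) are assumptions."""
    h, w = (s - s % block for s in img.shape)   # drop incomplete border blocks
    blocks = img[:h, :w].astype(float).reshape(h // block, block, w // block, block)
    bmax = blocks.max(axis=(1, 3)) + c
    bmin = blocks.min(axis=(1, 3)) + c
    return float(np.mean(20 * np.log10(bmax / bmin)))
```

A perfectly flat image has an EME and an entropy of zero, so both measures grow as enhancement introduces local contrast and spreads the grey-level histogram.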

33.3.2 Experimental Results

The simulation is performed on microscopy images of bacterial cells taken from the DiBAS dataset [18]. First, some images from the dataset are selected as test images and converted to grayscale. GIF is then applied to these images, followed by MF with a disk of size 7 as the structuring element, followed by UM with radius 2 and amount 1. The results of these filters are fused as discussed in Sect. 33.2.4. The responses are shown in Figs. 33.2 and 33.3: Fig. 33.2 shows Test Image#1 with the results at every stage of pre-processing, while Fig. 33.3 shows the remaining test images with their final outputs after fusion. IQA of the images is carried out by computing PSNR, EME, and entropy as discussed in Sect. 33.3.1, and the corresponding values are presented in Table 33.1.

Fig. 33.2 a Original Test Image#1 (EME = 0.791, Entropy = 4.236). b Response of GIF (EME = 1.246, Entropy = 4.781, PSNR = 20.32 dB). c Response of MF (EME = 3.872, Entropy = 5.328, PSNR = 32.15 dB). d Response of UM (EME = 2.489, Entropy = 5.983, PSNR = 35.12 dB). e Response of Fusion-based Enhancement Approach (EME = 4.356, Entropy = 6.786, PSNR = 40.38 dB)

33.3.3 Discussions

Figure 33.2 shows the test image of Candida albicans and its output at every stage of pre-processing, whereas Fig. 33.3 shows Micrococcus spp and Acinetobacter baumannii bacterial cells and their outputs after fusion. The cells in the original Test Image#1 of Candida albicans appear hazy because the images are captured through the lens of the microscope. After GIF, the resultant image in Fig. 33.2 is less hazy. The MF output has the edges of the cells enhanced. The UM output of Test Image#1 is sharp, which distinguishes the cell boundaries from each other. When these images are fused, the result becomes clearer, and the cells are clearly visible and distinguished from the background. Test Image#2 shows Micrococcus spp bacterial cells, and the fused output image is much clearer and sharper than the input image. Test Image#3 shows Acinetobacter baumannii bacterial cells, and the fused image is enhanced and has better visual characteristics than the input test image.

Fig. 33.3 a Original Test Image#2 (EME = 0.853, Entropy = 5.678). b Response of Fusion-based Enhancement Approach, Test Image#2 (EME = 5.467, Entropy = 8.236, PSNR = 38.27 dB). c Original Test Image#3 (EME = 0.615, Entropy = 3.548). d Response of Fusion-based Enhancement Approach, Test Image#3 (EME = 5.623, Entropy = 7.516, PSNR = 35.72 dB)

Table 33.1 Values of Entropy, PSNR, and EME for the proposed method

| Images | PSNR (in dB) of result | Entropy (Original) | Entropy (Fused) | EME (Original) | EME (Fused) |
|---|---|---|---|---|---|
| Test Image#1 Candida albicans | 40.38 | 4.236 | 6.786 | 0.791 | 4.356 |
| Test Image#2 Micrococcus spp | 38.27 | 5.678 | 8.236 | 0.853 | 5.467 |
| Test Image#3 Acinetobacter baumannii | 35.72 | 3.548 | 7.516 | 0.615 | 5.623 |

33.4 Conclusion

This paper presents an improved image enhancement approach in which the responses of noise reduction, contrast enhancement, and edge sharpening are fused into a single resultant image. Noise filtering is performed by GIF, contrast enhancement by MF, and sharpening by UM. The resultant images have improved visual characteristics, which is confirmed by the IQA parameters: EME and entropy increase relative to the original images, and the fused output has a higher PSNR than each individual filter response (Fig. 33.2). The PSNR values are also comparable to the results of various other techniques. In the proposed fusion-based enhancement approach, the cells in the microscopy images are preserved and clearly separated from the background, which can further support cell counting and classification for disease diagnosis. Hence, the approach provides a reliable enhancement of the tested images.

References
1. Zielinski, B., Oleksiak, A.S., Rymarczyk, D., Piekarczyk, A.: Deep learning approach to describing and classifying fungi microscopic images. arXiv 1–21 (2020)
2. Nizar, A., Yigit, A., Isik, Z., Alpkocak, A.: Identification of Leukemia subtypes from microscopic images using convolutional neural network. Diagnostics 9(3), 1–11 (2019)
3. Jobson, D.J., Rahman, Z., Woodell, G.A.: Spatial aspect of color and scientific implications of retinex image processing. In: Visual Information Processing, International Society for Optics and Photonics, vol. 4388, pp. 117–128, Aug 2001
4. Rahman, Z., Jobson, D.J., Woodell, G.A.: Retinex processing for automatic image enhancement. J. Electron. Imaging 13(1), 100–110 (2004)
5. Barnard, K., Funt, B.: Analysis and improvement of multi-scale retinex. In: Color and Imaging Conference, Society for Imaging Science and Technology, vol. 1997, no. 1, pp. 221–226, Jan 1997
6. He, K., Sun, J., Tang, X.: Guided image filtering. In: Proceeding of European Conference on Computer Vision, pp. 1–14. Berlin, Heidelberg, Sept 2010
7. Oh, J., Hwang, H.: Feature enhancement of medical images using morphology-based homomorphic filter and differential evolution algorithm. Int. J. Control Autom. Syst. 8(4), 857–861 (2010)



8. Tek, F.B., Dempster, A.G., Kale, I.: Parasite detection and identification for automated thin blood film malaria diagnosis. Comput. Vis. Image Underst. 114(1), 21–32 (2010)
9. Deng, G.: A generalized unsharp masking algorithm. IEEE Trans. Image Process. 20(5), 1249–1261 (2010)
10. Awasthi, N., Katare, P., Gorthi, S.S., Yalavarthy, P.K.: Guided filter based image enhancement for focal error compensation in low cost automated histopathology microscopic system. J. Biophotonics 13(11), 1–23 (2020)
11. Sharma, A., Bhateja, V., Sinha, A.K.: Synthesis of flash and no-flash image pairs using guided image filtering. In: Proceeding of 2nd International Conference on Signal Processing and Integrated Networks (SPIN), pp. 768–773. Noida, India, Apr 2015
12. Alankrita, A.R., Shrivastava, A., Bhateja, V.: Contrast improvement of cerebral MRI features using combination of non-linear enhancement operator and morphological filter. In: Proceeding of IEEE International Conference on Network and Computational Intelligence (ICNCI), pp. 182–187. Zhengzhou, China, May 2011
13. Somasundaram, K., Kalaiselvi, T.: Automatic brain extraction methods for T1 magnetic resonance images using region labeling and morphological operations. Comput. Biol. Med. 41(8), 716–725 (2011)
14. Benson, C.C., Lajish, V.L.: Morphology based enhancement and skull stripping of MRI brain images. In: Proceeding of IEEE International Conference on Intelligent Computing Applications (ICICA), pp. 254–257. Coimbatore, India, Mar 2014
15. Tiwari, D.K., Bhateja, V., Anand, D., Srivastava, A., Omar, Z.: Combination of EEMD and morphological filtering for baseline wander correction in EMG signals. In: Proceeding of 2nd International Conference on Micro-Electronics, Electromagnetics and Telecommunications, pp. 365–373. Singapore, Sept 2018
16. Tyagi, T., Gupta, P., Singh, P.: A hybrid multi-focus image fusion technique using SWT and PCA. In: Proceeding of 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 491–497. Noida, India, Jan 2020
17. Arya, A., Bhateja, V., Nigam, M., Bhadauria, A.S.: Enhancement of brain MR-T1/T2 images using mathematical morphology. In: Proceeding of Information and Communication Technology for Sustainable Development, pp. 833–840. Springer, Singapore, June 2019
18. The bacterial image dataset (DIBaS) is available online at: http://misztal.edu.pl/software/databases/dibas/. Last visited on 10 Dec 2020

Chapter 34

A Novel Optimization for Synthesis of Concentric Circular Array Antenna

G. Challa Ram, D. Girish Kumar, and M. Venkata Subbarao

Abstract This paper presents a concentric circular antenna array (CCAA) optimized using biogeography-based optimization (BBO). The CCAA is preferred over similar array antenna designs because of its azimuthal scanning capability. Optimization techniques such as particle swarm optimization (PSO), the genetic algorithm (GA), and BBO are used to synthesize a CCAA with minimum side lobe level (SLL). The results show that BBO with thinning reduces the SLL to about −23.33 dB, a better result than that of the PSO algorithm.

34.1 Introduction

A CCAA consists of several concentric rings of different radii, and the number of elements can differ from ring to ring [1]. In general, an array antenna depends strongly on parameters such as the relative positions of the individual radiators, their relative phases, and their relative magnitudes. These parameters are used as inputs to the optimization algorithms to obtain the best side lobe levels. In this paper, PSO, GA, and BBO are used to synthesize a CCAA and a linear antenna array (LAA) with minimum SLL. The SLL can be decreased further by thinning, which sets some of the elements to the OFF state [2]. Thinning not only decreases the SLL but also reduces the cost of manufacturing the antenna, since some elements of the array can be switched off.

G. Challa Ram (B) · D. Girish Kumar · M. Venkata Subbarao
Department of ECE, Shri Vishnu Engineering College for Women, Bhimavaram, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_34



G. Challa Ram et al.

34.2 Concentric Circular Antenna Array

A CCAA is a structure of concentric rings of different radii, and the number of elements can differ from ring to ring. All elements lie on the circumference of their ring and are center-fed. In a uniform CCAA, the spacing between adjacent elements on each ring is half a wavelength [3]. Figure 34.1 shows the structure of a CCAA [1] with M rings placed at radii r_m, m = 1, 2, …, M; the number of elements on the m-th ring is N_m. All elements are assumed to be isotropic, so the radiation pattern is given by the array factor [4]. E_mi is the excitation of the i-th element on the m-th ring [5]. The array factor of the CCAA is given by [6]

$$AF(\theta, E) = \sum_{m=1}^{M} \sum_{i=1}^{N_m} E_{mi} \exp\left[ j\left( k r_m \sin\theta \cos(\phi - \phi_{mi}) + \alpha_{mi} \right) \right] \qquad (34.1)$$

where E_mi is the excitation amplitude of the i-th element on the m-th ring and k = 2π/λ is the wavenumber. The pattern is evaluated in a fixed plane, so the angle φ is held constant, and φ_mi is the angular position of the i-th element on the m-th ring; the elements are equally spaced around each ring [7]. The phase α_mi and the angular position φ_mi are given by

$$\alpha_{mi} = -k r_m \sin\theta_1 \cos(\phi - \phi_{mi}) \qquad (34.2)$$

$$\phi_{mi} = \frac{2\pi (i - 1)}{N_m} \qquad (34.3)$$

Fig. 34.1 Concentric circular array

34 A Novel Optimization for Synthesis …


θ_1 is the value of θ at the peak of the main lobe [8].
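Eqs. (34.1)–(34.3) translate directly into the NumPy sketch below, which evaluates the CCAA array factor in a fixed φ plane. It assumes r_m = mλ/2 (Eq. 34.5) and N_m = ⌊2πm⌋, so that the arc spacing on every ring is at least λ/2 (Eq. 34.6); the function names are hypothetical.

```python
import numpy as np

def ring_sizes(num_rings):
    """N_m = floor(2*pi*m): with r_m = m*lambda/2 this keeps the spacing >= lambda/2."""
    return [int(np.floor(2 * np.pi * m)) for m in range(1, num_rings + 1)]

def ccaa_array_factor(theta, excitations, phi=0.0, theta1=0.0, wavelength=1.0):
    """Complex array factor of Eq. (34.1).

    theta: 1-D array of angles (rad); phi: fixed observation plane (rad).
    excitations: list of 1-D arrays, excitations[m-1][i-1] = E_mi (1 = ON, 0 = OFF).
    theta1: main-beam direction used in the phase alpha_mi of Eq. (34.2)."""
    k = 2 * np.pi / wavelength
    af = np.zeros_like(theta, dtype=complex)
    for m, e_m in enumerate(excitations, start=1):
        r_m = m * wavelength / 2                                  # Eq. (34.5)
        n_m = len(e_m)
        phi_mi = 2 * np.pi * np.arange(n_m) / n_m                 # Eq. (34.3)
        alpha = -k * r_m * np.sin(theta1) * np.cos(phi - phi_mi)  # Eq. (34.2)
        phase = k * r_m * np.outer(np.sin(theta), np.cos(phi - phi_mi)) + alpha
        af += (np.asarray(e_m) * np.exp(1j * phase)).sum(axis=1)
    return af
```

At the steering direction θ = θ_1 all phases cancel, so |AF| equals the number of ON elements there; a normalized pattern in dB is 20·log10(|AF| / max|AF|).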

34.3 Design Equations

After the AF is defined, the next step is to define the objective (fitness) function that the optimization algorithms minimize. The fitness function (FF) must be defined so that the synthesized antenna is practically applicable [9]. The main factor to consider when designing an array antenna is the side lobe level (SLL). A low SLL can be achieved in two ways: by varying the element positions with uniform amplitudes, or by varying the amplitudes with uniform spacing between the elements. Thinning reduces the SLL by switching off some of the elements in the array [3]; it gives low side lobe levels and also lowers the manufacturing cost, since some of the antennas in the array are switched off [10]. The fitness function for thinning of the CCAA is defined as

$$CF = k \, \frac{\left| AF(\theta_{m1}, E_{mi}) + AF(\theta_{m2}, E_{mi}) \right|}{\left| AF(\theta_1, E_{mi}) \right|} \qquad (34.4)$$

In the above equation, θ_m1 is the angle at which the maximum side lobe AF(θ_m1, E_mi) is attained in the lower band, θ_m2 is the angle at which the maximum side lobe AF(θ_m2, E_mi) is attained in the upper band, and θ_1 is the angle of the maximum of the main lobe, with θ ∈ [−π, π]. k is a constant weighting parameter [2]. Minimizing the fitness function reduces the side lobe levels in both the lower and upper bands [11]. On each ring of the CCAA, the elements are placed with uniform spacing and zero phase [12]. The ring radius and the number of elements per ring are

$$r_m = m \, \frac{\lambda}{2}, \quad m = 1, 2, \ldots, M \qquad (34.5)$$

$$N_m = \frac{2\pi r_m}{S_m}, \quad S_m \ge \frac{\lambda}{2} \qquad (34.6)$$

where r_m is the radius of the m-th ring, N_m is the number of elements on the m-th ring, and S_m is the arc spacing between adjacent elements on that ring; with the minimum spacing S_m = λ/2, N_m ≈ 2πm. E_mi is equal to 1 if the element is ON and 0 if it is OFF.
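The cost of Eq. (34.4) needs the main-lobe peak and the highest side lobe on each side of it. The sketch below finds them on a sampled pattern, assuming the main lobe ends at the first minimum on either side of the peak and that the angles are sampled on an increasing grid; the function name is hypothetical.

```python
import numpy as np

def side_lobe_cost(theta, af, k=1.0):
    """Peak SLL (dB), the cost of Eq. (34.4), and (theta_m1, theta_m2).

    theta: increasing 1-D array of angles; af: complex AF sampled at theta."""
    mag = np.abs(af)
    peak = int(np.argmax(mag))             # main-lobe maximum, theta_1
    lo = peak
    while lo > 0 and mag[lo - 1] <= mag[lo]:
        lo -= 1                            # walk down to the first null below the peak
    hi = peak
    while hi < len(mag) - 1 and mag[hi + 1] <= mag[hi]:
        hi += 1                            # ... and to the first null above it
    m1 = int(np.argmax(mag[:lo + 1]))      # highest side lobe, lower band
    m2 = hi + int(np.argmax(mag[hi:]))     # highest side lobe, upper band
    sll_db = 20 * np.log10(max(mag[m1], mag[m2]) / mag[peak])
    cost = k * abs(af[m1] + af[m2]) / mag[peak]
    return sll_db, cost, (theta[m1], theta[m2])
```

As a sanity check, a uniformly excited 20-element half-wavelength linear array gives the familiar peak SLL of about −13.2 dB. In a thinning run, `af` would come from the CCAA array factor with ON/OFF excitations, and `cost` is the quantity the optimizer minimizes.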



Fig. 34.2 Flowchart for biogeography-based optimization (BBO) [14]

34.4 Biogeography-Based Optimization (BBO)

This algorithm is based on the principle of how species migrate among islands. Species migrate toward islands that are more favorable for their survival. Such islands have a high habitat suitability index (HSI) [13]; i.e., they have a high emigration rate and a very low immigration rate. Species therefore tend to remain on islands that are favorable to live on, while species on islands with a low HSI tend to move to more favorable ones. The main steps of BBO are evaluation, migration, and mutation. BBO is similar to the genetic algorithm, with the migration operation playing the role of crossover, but in every iteration the worst habitats are replaced by the best solutions before the fitness function is evaluated again [14] (Fig. 34.2).
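A compact binary BBO loop illustrating the migration, mutation, and elitism steps described above. It is a generic sketch, not the authors' MATLAB code: rank-based linear migration rates, roulette selection of the emigrating habitat, bit-flip mutation, and all parameter values are assumptions. Each habitat is a vector of E_mi values (1 = ON, 0 = OFF), and `cost` can be any function to be minimized, such as Eq. (34.4).

```python
import numpy as np

def bbo_binary(cost, n_bits, pop_size=30, n_iter=100, p_mut=0.02, n_elite=2, seed=0):
    """Minimize cost(h) over binary vectors h of length n_bits."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    costs = np.array([cost(h) for h in pop])
    rank = np.arange(1, pop_size + 1)
    mu = (pop_size + 1 - rank) / (pop_size + 1)  # emigration: high for good habitats
    lam = 1 - mu                                 # immigration: high for poor habitats
    for _ in range(n_iter):
        order = np.argsort(costs)                # best (highest HSI) first
        pop, costs = pop[order], costs[order]
        elites, elite_costs = pop[:n_elite].copy(), costs[:n_elite].copy()
        new = pop.copy()
        for i in range(pop_size):
            for j in range(n_bits):
                if rng.random() < lam[i]:        # habitat i accepts an immigrant SIV
                    src = rng.choice(pop_size, p=mu / mu.sum())
                    new[i, j] = pop[src, j]
        flips = rng.random(new.shape) < p_mut    # mutation: random bit flips
        new[flips] ^= 1
        new_costs = np.array([cost(h) for h in new])
        worst = np.argsort(new_costs)[-n_elite:] # elitism: replace the worst habitats
        new[worst], new_costs[worst] = elites, elite_costs
        pop, costs = new, new_costs
    best = int(np.argmin(costs))
    return pop[best], costs[best]
```

Because the previous best habitats always survive, the best cost never increases from one iteration to the next. A thinning run would wrap the CCAA array factor and the side-lobe cost into `cost`, with one bit per element.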

34.5 Results

The algorithms were implemented in MATLAB R2013. The results were obtained by evaluating the fitness function with each algorithm to minimize the side lobe level (SLL). The SLL of a CCAA is around −17.5 dB, and it can be reduced further by thinning the elements of the array.

34.5.1 Results for Concentric Circular Array Antenna Using GA

See Table 34.1.

Table 34.1 Analysis for CCAA antenna optimization using genetic algorithm

| No. of rings (m) | SLL in dB (without thinning) | SLL in dB (with thinning) |
|---|---|---|
| 20 | −17.38 | −19.11 |
| 16 | −17.29 | −18.84 |
| 10 | −17.21 | −18.46 |

34.5.2 Results for Concentric Circular Array Antenna Using PSO

See Table 34.2.

Table 34.2 Analysis for CCAA antenna optimization using PSO algorithm

| No. of rings (m) | SLL in dB (without thinning) | SLL in dB (with thinning) |
|---|---|---|
| 20 | −17.42 | −22.38 |
| 16 | −17.31 | −20.87 |
| 10 | −17.27 | −19.65 |

34.5.3 Results for Concentric Circular Array Antenna Using BBO

The BBO results are listed in Table 34.3, and Table 34.4 compares the SLL values obtained for the CCAA with GA, PSO, and BBO. For 20 rings with thinning, BBO reduces the SLL to −23.33 dB, compared with −19.11 dB and −22.38 dB for GA and PSO, respectively.

Table 34.3 Analysis for CCAA antenna optimization using BBO

| No. of rings (m) | SLL in dB (without thinning) | SLL in dB (with thinning) |
|---|---|---|
| 20 | −17.62 | −23.33 |
| 16 | −17.53 | −22.34 |
| 10 | −17.22 | −20.91 |

Table 34.4 Comparison of SLL values for CCAA using GA, BBO, and PSO algorithms

| No. of rings (m) | SLL in dB using GA | SLL in dB using PSO | SLL in dB using BBO |
|---|---|---|---|
| 20 | −19.11 | −22.38 | −23.33 |
| 16 | −18.84 | −20.87 | −22.34 |
| 10 | −18.46 | −19.65 | −20.91 |

34.6 Conclusions

In this paper, optimized SLL values are obtained for an LAA and a CCAA with non-uniform and uniform excitations. With BBO optimization of the linear antenna array, the SLL is reduced from −13.19 dB to −27.48 dB. The SLL of the concentric circular array is reduced further by thinning: with BBO and thinning, the SLL of the 20-ring CCAA is reduced from −17.62 dB to −23.33 dB. BBO gave a lower SLL than GA and PSO for every ring count tested (Table 34.4).

References
1. Liang, S., Feng, T., Sun, G.: Sidelobe-level suppression for linear and circular antenna arrays via the cuckoo search–chicken swarm optimisation algorithm. IET Microwaves Antennas Propag. 11(2), 209–218 (2016)
2. Mandal, D., Ghoshal, S.P., Bhattacharjee, A.K.: Swarm intelligence based optimal design of concentric circular antenna array. J. Electric. Eng. 10(3), 30–39 (2010)
3. Haupt, R.L.: Thinned interleaved linear arrays. In: Wireless Communications and Applied Computational Electromagnetics, 2005. IEEE/ACES International Conference, pp. 241–244, 3–7 April 2005
4. Stearns, C., Stewart, A.: An investigation of concentric ring antennas with low side lobes. IEEE Trans. Antennas Propag. 13(6), 856–863 (1965)
5. Haupt, R.L.: Thinned concentric ring arrays. In: Antennas and Propagation Society International Symposium, 2008. AP-S 2008, pp. 1–4. IEEE, 5–11 July 2008
6. Venkata Subbarao, M., Sayedu Khasim, N., Thati, J., Sastry, M.H.H.: Tapering of antenna array for efficient radiation pattern. e-J. Sci. Technol. (e-JST) 8(2), 37–42 (2013)
7. Sayedu Khasim, N., Murali Krishna, Y., Thati, J., Venkata Subbarao, M.: Analysis of different tapering techniques for efficient radiation pattern. e-J. Sci. Technol. (e-JST) 8(5), 47–55 (2013)
8. Dessouky, M., Sharshar, H., Albagory, Y.: Efficient side lobe reduction technique for small-sized concentric circular arrays. Prog. Electromagn. Res. 65, 187–200 (2006)



9. Challa Ram, G., Venkateswararao, N.: Optimization of SLL and FNBW in linear arrays using PSO optimization. In: International Conference on Communication, Computing and Internet of Things (IC3IoT) 2018
10. Mandal, D., Ghoshal, S.P., Bhattacharjee, A.K.: Design of concentric circular antenna array with central element feeding using particle swarm optimization with constriction factor and inertia weight approach and evolutionary programming technique. J. Infrared Milli Terahz Waves 31(6), 667–680 (2010)
11. Terlapu, S.K., Subba Rao, M.V., Chowdary, P.S.R., Satapathy, S.C.: Design and analysis of koch fractal slots for ultra-wideband applications. In: Lecture Notes in Electrical Engineering, vol. 655. Springer, Singapore (2021)
12. Das, R.: Concentric ring array. IEEE Trans. Antennas Propag. 14(3), 398–400 (1966)
13. Challa Ram, G., Venkateswararao, N.: Synthesis of linear array antenna with wide range of beam widths using DE algorithms. Int. J. Eng. Technol. 7(4.5), 273–276 (2018)
14. Challaram, G., Venkateswararao, N.: Synthesis of linear array antenna for required range of beamwidth using optimization algorithms. Int. J. Eng. Technol. 7(3), 16–20 (2018)

Chapter 35

PAPR Analysis of FBMC and UFMC for 5G Cellular Communications

T. Sairam Vamsi, Sudheer Kumar Terlapu, and M. Vamshi Krishna

Abstract Orthogonal frequency-division multiplexing (OFDM) is a well-known multiple access technique for fourth-generation (4G) wireless cellular systems, as it provides good transmit power efficiency, robustness to multipath propagation, and high spectral efficiency. However, OFDM does not satisfy some of the requirements of fifth-generation (5G) cellular systems because of its high side-band leakage power, high peak-to-average power ratio (PAPR), and out-of-band (OOB) radiation. The main objective of this paper is to design an efficient waveform that provides high spectral efficiency and low PAPR for 5G systems. Different numbers of sub-carriers and different QAM orders are used to analyse the PAPR of multiplexing techniques that serve 5G requirements, namely universal-filtered multicarrier (UFMC) and filter bank multicarrier (FBMC) modulation, in comparison with OFDM for 4G. Finally, the paper identifies which modulation is best suited for 5G and satisfies all basic requirements.

35.1 Introduction

5G must address many challenges, such as supporting diverse users and their requirements, providing large bandwidth with efficient use of the available spectrum, and reducing power requirements while serving a greater number of users [1]. Choosing the multiplexing technique is one of the most challenging decisions for any vendor providing 5G services to customers [2]. OFDM is the most prominent multiple access technique for providing efficient

T. Sairam Vamsi (B)
Centurion University of Technology and Management, Paralakhemundi, India
S. K. Terlapu
Shri Vishnu Engineering College for Women, Bhimavaram, India
M. Vamshi Krishna
Dhanekula Institute of Engineering and Technology, Vijayawada, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_35



T. Sairam Vamsi et al.

services in present 4G systems [3]. In OFDM, the available bandwidth is divided into sub-bands, and each sub-band is allocated to a user, who transmits data on mutually orthogonal sub-carriers [4]. However, data rates and the number of users are increasing rapidly and must be served. New multiple access techniques such as generalized frequency-division multiplexing (GFDM), UFMC, and FBMC have therefore been proposed to meet these requirements [5]. FBMC is an extension of OFDM that does not use a cyclic prefix (CP) and instead uses a bank of filters at both the transmitter and the receiver, and it can give better results than OFDM. However, its long filters and independently shifted sub-carriers result in higher complexity and more inter-carrier interference (ICI) [6]. UFMC is an advanced multiplexing technique for new-generation cellular systems that combines features of OFDM and FBMC. In UFMC, the bandwidth is divided into sub-bands, the information is divided into samples, and each sample is allocated to one sub-carrier, which uses the bandwidth effectively and reduces the side lobe power. In summary, inserting the CP lowers the spectral efficiency of OFDM, and UFMC and FBMC improve on this by removing the CP and using independent sub-carriers and independent filters [7].

35.2 FBMC and UFMC

These two modulation techniques have gained much attention as alternatives to conventional OFDM systems. Neither technique uses a CP during data transmission, which gives higher spectral efficiency.

35.2.1 FBMC

FBMC filters each sub-carrier individually before transmission over the communication channel. In this technique, each symbol is long and the frequency band is divided into many sub-bands, so FBMC systems are less prone to fading channels. These systems have lower side lobe power because they use pulse-shaping filters instead of the rectangular window used in OFDM. The PAPR is lower than in OFDM systems but still needs to be reduced to meet 5G challenges. In FBMC, adjacent symbols overlap, which increases the PAPR [8]; hence, the PAPR must be determined from a symbol together with its adjacent symbols. The transmitter and receiver of FBMC are shown in Fig. 35.1. The input data is split into sub-symbols, each allocated a separate sub-band and sub-carrier. Each data stream is modulated with M-ary QAM, in which each symbol carries log2 M bits; in FBMC the QAM symbols are usually transmitted in offset form (OQAM) so that orthogonality between the carriers is maintained. Next, an inverse FFT is performed

35 PAPR Analysis of FBMC and UFMC for 5G Cellular Communications


Fig. 35.1 FBMC transceiver block diagram

on the transmitted data, and the result is filtered by the prototype filter bank. Finally, the synthesized signal is transmitted through the channel. In FBMC, the PHYDYAS filter bank is used to design the prototype filter h(t), whose length is L_q = PN − 1, where P is the overlapping factor of the prototype filter, N is the total number of sub-carriers in the system, and H_p are the prototype filter coefficients. The impulse response of the transceiver prototype filter is expressed in Eq. 35.1:

$$h(t) = 1 + 2 \sum_{p=1}^{P-1} H_p \cos\left( \frac{2\pi p t}{PT} \right) \qquad (35.1)$$

where T is the symbol period.
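Eq. (35.1) can be sampled directly. The sketch assumes P = 4 with the commonly quoted PHYDYAS coefficients H_0 = 1, H_1 ≈ 0.97196, H_2 = 1/√2, H_3 ≈ 0.23515, measures t from the filter centre, and takes N samples per symbol period T, giving L_q = PN − 1 samples; the function name is hypothetical.

```python
import numpy as np

# Frequency-domain coefficients H_p of the PHYDYAS prototype for P = 4 (H_0 = 1).
# Other overlapping factors use different coefficient sets.
PHYDYAS_H4 = np.array([1.0, 0.97195983, 1 / np.sqrt(2), 0.23514695])

def phydyas_prototype(N, H=PHYDYAS_H4, T=1.0):
    """Sample h(t) = 1 + 2 * sum_{p=1}^{P-1} H_p cos(2*pi*p*t / (P*T)).

    N: number of sub-carriers (samples per symbol period T).
    Returns (t, h) with L_q = P*N - 1 samples, t measured from the filter centre."""
    P = len(H)
    L = P * N - 1
    t = (np.arange(L) - (L - 1) / 2) * T / N
    p = np.arange(1, P)
    cosines = np.cos(2 * np.pi * np.outer(p, t) / (P * T))  # shape (P-1, L)
    h = 1 + 2 * (H[1:, None] * cosines).sum(axis=0)
    return t, h
```

For these coefficients h(0) = 1 + 2(H_1 + H_2 + H_3) ≈ 4.83, and h(t) falls to zero at t = ±PT/2, just outside the sampled span, so the filter spans P symbol periods.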

At the receiver, the same prototype filters as in the transmitter are used, maintaining the symmetry between them. The received signal is first filtered with the appropriate offsets, and the original data is then recovered by FFT and QAM demodulation.

35.2.2 UFMC

This modulation technique is considered one of the most promising for meeting the challenges of 5G cellular systems, and it is expected to improve on the OFDM and FBMC schemes. UFMC filters groups of sub-carriers (sub-bands), whereas FBMC filters individual sub-carriers and OFDM filters the entire band; filtering per sub-band allows a shorter filter length in UFMC [9]. Figure 35.2 shows how filtering is performed in OFDM, FBMC, and UFMC. Figure 35.3 shows the transceiver block diagram of UFMC, in which the K sub-carriers are divided into sub-bands, each assigned a fixed number of sub-carriers. An N-point inverse FFT is performed for each sub-band, with zeros padded on the sub-carriers not allocated to that sub-band. The output of the N-point IFFT is expressed in Eq. 35.2.


T. Sairam Vamsi et al.

Fig. 35.2 Filtering mechanism of OFDM, FBMC and UFMC

Fig. 35.3 Transceiver block diagram of UFMC

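$$Q_i = \mathrm{IFFT}\{p_i\} \qquad (35.2)$$

where p_i is the zero-padded frequency-domain symbol vector of sub-band i and Q_i is its time-domain representation. Each sub-band signal is then filtered with a Chebyshev band filter of length L, which can be written as

$$q_i = H\,\tilde{Y}\,p_i \qquad (35.3)$$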


where H is the Toeplitz matrix built from the filter impulse response, of size (N + L − 1) × N, and Ỹ is the IFFT matrix [10]; the transmitted UFMC symbol is the sum of the filtered sub-band signals. After the signal has passed through the channel, a 2N-point FFT converts the received time-domain signal back to the frequency domain, and demapping converts the recovered symbol streams into bits. Because zeros are padded between successive IFFT blocks, intersymbol interference (ISI) can be reduced [11].
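The UFMC transmit chain of Eqs. 35.2 and 35.3 can be sketched in Python as below. This is a minimal, dependency-free illustration under stated assumptions: a naive DFT stands in for the IFFT, the sub-band layout (offset, sub_size) is hypothetical, and the prototype argument is any real low-pass FIR of length L, used here in place of the Chebyshev filter described in the text. Filtering is done by linear convolution, which is exactly the product with the (N + L − 1) × N Toeplitz matrix H.

```python
import cmath
import math


def idft(X):
    """Naive N-point inverse DFT: the IFFT matrix ~Y applied to the vector X."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def toeplitz_filter(h, x):
    """Linear convolution h * x, identical to multiplying x by the
    (len(x) + len(h) - 1) x len(x) Toeplitz matrix H built from h."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, hi in enumerate(h):
        for n, xn in enumerate(x):
            y[i + n] += hi * xn
    return y


def ufmc_symbol(symbols, N, sub_size, offset, prototype):
    """Return one UFMC symbol, the sum over sub-bands of H_i ~Y p_i.

    symbols   -- QAM symbols; their count must be a multiple of sub_size
    offset    -- index of the first allocated sub-carrier (hypothetical layout)
    prototype -- real low-pass FIR of length L (stand-in for the Chebyshev filter)
    """
    if len(symbols) % sub_size:
        raise ValueError("number of symbols must be a multiple of sub_size")
    if offset + len(symbols) > N:
        raise ValueError("allocated sub-carriers exceed the FFT length")
    L = len(prototype)
    q = [0j] * (N + L - 1)
    for i in range(len(symbols) // sub_size):
        start = offset + i * sub_size
        p_i = [0j] * N                       # zero-padded frequency vector of sub-band i
        p_i[start:start + sub_size] = symbols[i * sub_size:(i + 1) * sub_size]
        Q_i = idft(p_i)                      # Eq. 35.2
        centre = start + (sub_size - 1) / 2  # centre sub-carrier of sub-band i
        # Shift the low-pass prototype up to the centre frequency of sub-band i
        h_i = [c * cmath.exp(2j * math.pi * centre * n / N) for n, c in enumerate(prototype)]
        for n, v in enumerate(toeplitz_filter(h_i, Q_i)):  # Eq. 35.3
            q[n] += v
    return q


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 2  # two sub-bands of four symbols
q = ufmc_symbol(qpsk, N=32, sub_size=4, offset=4, prototype=[0.25, 0.5, 0.25])
print(len(q))  # 34 = N + L - 1
```

With a single-tap prototype ([1.0]) the output reduces to a plain zero-padded IDFT, which is a convenient sanity check; with a length-L filter each UFMC symbol grows to N + L − 1 samples, and this filter tail takes the place of the OFDM cyclic prefix.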

35.3 PAPR
High PAPR arises in multicarrier communication because many sub-carriers, each modulated independently with its own phase, are summed. When many sub-carriers reach their maximum values at the same time, the output envelope shows a peak that is large compared with the average value of the signal [12]. The PAPR is given in Eq. 35.4:

$$\mathrm{PAPR} = 10\log_{10}\left(\frac{\max_n |y[n]|^2}{\mathbb{E}\left[|y[n]|^2\right]}\right)\ \mathrm{dB} \qquad (35.4)$$

From the above expression, the PAPR is the ratio of the peak power of y[n] to its average power. For FBMC and UFMC, the PAPR can also be characterized with the complementary cumulative distribution function (CCDF):

$$\mathrm{CCDF}(\delta) = \Pr\left(\mathrm{PAPR}(y[n]) > \delta\right) = 1 - \left(1 - e^{-\delta}\right)^{N} \qquad (35.5)$$

where δ is the PAPR threshold and N is the number of sub-carriers.
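Eqs. 35.4 and 35.5 can be checked numerically. The plain-Python sketch below computes the PAPR of a sampled signal, the theoretical CCDF and an empirical CCDF over random QPSK OFDM symbols. It assumes Nyquist-rate sampling (no oversampling), under which Eq. 35.5 is the usual approximation; the helper names are illustrative.

```python
import cmath
import math
import random


def papr_db(y):
    """Eq. 35.4: peak power over mean power of the samples y[n], in dB."""
    powers = [abs(v) ** 2 for v in y]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def ccdf_theory(delta_db, N):
    """Eq. 35.5: Pr(PAPR > delta) = 1 - (1 - exp(-delta))**N, with delta converted to linear."""
    delta = 10 ** (delta_db / 10)
    return 1 - (1 - math.exp(-delta)) ** N


def ccdf_empirical(papr_values_db, delta_db):
    """Fraction of simulated symbols whose PAPR exceeds the threshold."""
    return sum(p > delta_db for p in papr_values_db) / len(papr_values_db)


def ofdm_symbol(X):
    """Naive N-point IDFT of the sub-carrier symbols X (no oversampling)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


# Worst case: all 64 sub-carriers in phase add coherently, PAPR = 10*log10(64)
print(round(papr_db(ofdm_symbol([1] * 64)), 2))  # 18.06

# Random QPSK symbols: empirical vs theoretical CCDF at an 8 dB threshold
rng = random.Random(0)
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
paprs = [papr_db(ofdm_symbol([rng.choice(qpsk) for _ in range(64)])) for _ in range(100)]
print(ccdf_empirical(paprs, 8.0), ccdf_theory(8.0, 64))
```

The first print reproduces the worst case described above, in which all sub-carriers peak together. Sampling only at the Nyquist rate can miss the true peaks of the continuous envelope, so both CCDF values should be read as approximations.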

35.4 Simulation Results
For the simulation, the entire band is divided into 20 sub-bands, each sub-band is allocated 5 symbols, and the sub-band filters are designed for low side lobes. The system parameters used to simulate the three modulation techniques are listed in Table 35.1. The power spectral densities (PSDs) of 100 sub-carriers for UFMC and OFDM are shown in Figs. 35.4 and 35.5. In FBMC, the PSD varies with the overlapping factor K (denoted P in Eq. 35.1), as shown in Fig. 35.6 for K = 2, 3 and 4. Comparing the PSDs of UFMC, OFDM and FBMC, UFMC has the lowest side lobes: at normalized frequency −0.5, the PSD


Table 35.1 Parameters definition for OFDM, FBMC and UFMC

OFDM parameter | Preferred value | FBMC parameter | Preferred value | UFMC parameter | Preferred value
FFT length | 512 | FFT length | 512 | FFT length | 512
No. of symbols | 5 | No. of symbols | 100 | No. of symbols | 5
No. of bits per sub-carrier | 2, 4, 6, 8 or 10 | No. of bits per sub-carrier | 2, 4, 6, 8 or 10 | No. of bits per sub-carrier | 2, 4, 6, 8 or 10
SNR (dB) | 15 | SNR (dB) | 15 | SNR (dB) | 15
Sub-band offset | 156 | No. of guard bands | 212 | Sub-band offset | 156
Filter length | 43 | No. of overlapping symbols (K) | 2, 3 or 4 | Filter length | 43
Sub-band size | 20 | – | – | Sub-band size | 20
Side-lobe attenuation | 40 dB | – | – | Side-lobe attenuation | 40 dB

Fig. 35.4 Normalized frequency versus PSD (dBW/Hz) in UFMC

Fig. 35.5 Normalized frequency versus PSD (dBW/Hz) in OFDM


Fig. 35.6 Normalized frequency vs PSD (dBW/Hz) in FBMC for different values of K

of UFMC is −80 dBW/Hz, that of OFDM is −40 dBW/Hz and that of FBMC with K = 2 is −50 dBW/Hz. UFMC therefore uses the frequency spectrum most efficiently, which increases its spectral efficiency. Increasing K reduces the side-lobe power further, but a large K is not preferable in a multicarrier framework because it increases the overlap between symbols, which leads to duplication of data.

35.4.1 PAPR Analysis
The PAPR is determined from the CCDF for different numbers of bits per sub-carrier for the three modulation techniques, and the comparison is given in Table 35.2. For every modulation order, UFMC has lower PAPR than FBMC and OFDM, which is attributed to its lower side lobes.

Table 35.2 PAPR analysis of three modulation techniques

QAM modulation | No. of bits per sub-carrier | PAPR of UFMC (dB) | PAPR of FBMC (dB) | PAPR of OFDM (dB)
4-QAM | 2 | 6.182 | 9.6503 | 9.8319
16-QAM | 4 | 5.956 | 8.7711 | 9.066
64-QAM | 6 | 5.695 | 8.1743 | 8.856
256-QAM | 8 | 5.66 | 7.3 | 7.6705
1024-QAM | 10 | 4.7 | 6.4 | 7.2561

35.5 Conclusion and Future Scope
The current cellular technology (4G) has limitations such as low spectral efficiency and high PAPR. The loss of spectral efficiency stems from the cyclic prefix (CP) added to the original information in OFDM. These limitations can be addressed with UFMC and FBMC, which are promising new waveforms for 5G cellular systems. We conclude that UFMC is a better waveform than OFDM and FBMC, as it has lower PAPR and higher spectral efficiency because it does not use a CP and divides the overall bandwidth into sub-bands. In future work, various PAPR-reduction techniques can be applied, such as signal-distortion and signal-scrambling methods. Signal-distortion methods (for example, clipping) reduce peaks at the cost of in-band and out-of-band distortion, whereas signal-scrambling methods generate coded or scrambled variants of the signal and transmit the one with the lowest PAPR.

References
1. Wang, C.-X., Haider, F., Gao, X., You, X.-H., Yang, Y., Yuan, D., Aggoune, H., Haas, H., Fletcher, S., Hepsaydir, E.: Cellular architecture and key technologies for 5G wireless communication networks. Commun. Mag. IEEE 52(2), 122–130 (2014)
2. Sahin, A., Guvenc, I., Arslan, H.: A survey on multicarrier communications: prototype filters, lattice structures, and implementation aspects. Commun. Surv. Tutorials IEEE 16(3), 1312–1338 (2014)
3. Kansal, P.K., Shankhwae, A.K.: FBMC vs OFDM waveform contenders for 5G wireless communication system. Wirel. Eng. Technol. 59–70 (2017). https://doi.org/10.4236/wet.2017.84005
4. Choo, Y.S., Kim, J., Yang, W.Y.: MIMO-OFDM Wireless Communications with MATLAB. Wiley (Asia) Pte Ltd (2010)
5. Park, Y.: 5G Vision and Requirements. 5G Forum, Korea (2014)
6. Timoshenko, A.G., Osipenko, N.K., Bakhtin, A.A., Volkova, E.A.: 5G communication systems signal processing PAPR reduction technique. In: 2018 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO)
7. Sidiq, S., Mustafa, F., Sheikh, J.A., Malik, B.A.: FBMC and UFMC: the modulation techniques for 5G. In: 2019 International Conference on Power Electronics, Control and Automation (ICPECA), New Delhi, India, pp. 1–5 (2019). https://doi.org/10.1109/ICPECA47973.2019.8975581
8. Xu, L.T.: Modulation method of FBMC with low delay in 5G system. Electron. Meas. Technol. 41 (2018)
9. Sathipriya, N.S.: Implementation and study of universal filtered multi carrier frequency offset for 5G. Int. J. Electron. Commun. (IIJEC) 4(5), 1–5 (2016)
10. Si, F., Zheng, J., Chen, C.: Reliability-based signal detection for universal filtered multicarrier. IEEE Wirel. Commun. Lett. https://doi.org/10.1109/LWC.2020.3043735
11. Vamsi, T.S., Krishna, M.V., Kumar, T.S.: Channel estimation techniques for OFDM and GFDM: a review. Test Eng. Manage. 83, 17143–17149. ISSN: 0193-4120
12. Baig, I., Farooq, U., Hasan, N.U., Zghaibeh, M., Arshad, M.A., Imran, M.: A joint SLM and precoding based PAPR reduction scheme for 5G UFMC cellular networks. In: 2020 International Conference on Computing and Information Technology (ICCIT-1441), Tabuk, Saudi Arabia, pp. 30–33 (2020). https://doi.org/10.1109/ICCIT-144147971.2020.9213778

Chapter 36

Touchless Doorbell with Sanitizer Dispenser: A Precautionary Measure of COVID-19
G. R. L. V. N. Srinivasa Raju, T. Sairam Vamsi, and Sanjay Dubey

Abstract We are all aware of the worldwide impact of COVID-19, which has made us more cautious in our social lives. In this situation, we need to adopt precautionary measures such as washing hands regularly, sanitization and social distancing. Devices that many people touch regularly carry a high risk of virus transmission and therefore need to be upgraded. A doorbell is one such device, so it should be redesigned to avoid physical contact. This motivated us to develop a touchless doorbell with a sanitizer dispenser, which avoids physical contact and also sanitizes the user's hands. It consists of two IR transceivers and a microcontroller whose supporting circuitry controls the sanitizer dispenser and the doorbell.

G. R. L. V. N. Srinivasa Raju (B) · T. Sairam Vamsi
Shri Vishnu Engineering College for Women, Bhimavaram, India
S. Dubey
B V Raju Institute of Technology, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_36

36.1 Introduction
Most house owners today use a doorbell, and doorbells come in several designs: wired, wireless and smart. Wired doorbells are connected to the electrical system of the building and are operated by a switch placed outside the door. Wireless doorbells use a transmitter, acting as the switch, outside the door and a receiver that can be placed anywhere in the house within the specified range. They use radio-frequency (RF) technologies such as Bluetooth or Zigbee, or infrared links, for data transfer between transmitter and receiver [1]. The main advantages of wireless doorbells are low cost, easy installation, low power consumption and, above all, no need to drill into walls. The smart


G. R. L. V. N. Srinivasa Raju et al.

doorbells are also a type of wireless doorbell, but they operate over the Internet and are controlled from smart devices such as a mobile phone or a remote. Their main limitations are loss of connectivity when the Internet is unavailable and the need for a supported smartphone application. The touchless doorbell device can be controlled in several ways [2]:
• With serial joysticks containing potentiometers and push buttons, which control the motion of a motor through pulse-width modulation (PWM).
• By interfacing the device with a personal computer (PC) through a serial port.
• Wirelessly, with a 433.92 MHz RF transmitter.
• With Zigbee communication.
The COVID pandemic has brought many changes to human life and to the technology we use, such as increased digital transactions, which reduce the exchange of cash, and smart technologies that provide safety precautions against the pandemic [3]. Precautionary measures against COVID include maintaining social distance, regularly sanitizing hands and wearing a face mask outdoors. Of these, we have taken hand sanitization as the concept and linked it with a smart doorbell mechanism [4]. In daily life, many people visit our homes or offices; each visitor must press the switch to ring the doorbell and may also hand over goods, and both should be avoided during the pandemic. This paper therefore discusses the design of such a system, an automatic doorbell with sanitizer dispenser.

36.2 Implementation Diagram
Figure 36.1 shows the implementation diagram of the project, which consists mainly of a transmitter and a receiver. The transmitter block is placed outside the door and consists of two IR transmitters, an Arduino Uno, a relay circuit, a sanitizer bottle and an antenna for wireless data transfer [5]. The outer part of the design has two LEDs: the red LED is used for enabling the doorbell and the green LED for dispensing sanitizer. Each LED is connected to the Arduino Uno microcontroller through an IR transmitter. To dispense the sanitizer, a relay connected to the Arduino drives a motor placed inside the sanitizer bottle [6]. When a person wants to ring the bell, he/she holds a hand near the IR transmitter of the red LED; the Arduino then transmits this information through the antenna, and it is received by the antenna of the receiver at the doorbell, which activates the bell [7].

36 Touchless Doorbell with Sanitizer Dispenser: A Precautionary …


Fig. 36.1 Implementation diagram

36.3 Design Approach of Project
This section describes the flowchart of the project and how the IR sensors are used to obtain the required outputs.
Flowchart of project: Fig. 36.2 shows the flow diagram of the project. The person first decides whether to ring the doorbell or to clean his/her hands with the sanitizer dispenser, and places a hand near the corresponding LED. When the hand is detected, the corresponding action takes place.
Accessing Mechanism: Table 36.1 shows the accessing mechanism of the project. An action is executed only when exactly one LED glows; if neither LED is on, no operation takes place, and the two LEDs are never activated simultaneously (if both are triggered, the input is treated as invalid) [8].

36.4 Results and Discussions
Figure 36.3 shows the overall project implementation, which consists of the bell setup, the sanitizing setup and the doorbell speaker at the receiver. Because the product must operate without physical contact, IR sensors and RF modules replace the conventional switch of a doorbell. A sanitizing setup is installed in addition to the doorbell setup.
Working of Doorbell: When a person wants to ring the doorbell, he/she places a hand near the IR sensor in the bell setup. When the hand comes near the IR


Fig. 36.2 Flow diagram of the project

Table 36.1 Accessing mechanism of the project

Red LED | Green LED | Type of operation
0 | 0 | No operation takes place
0 | 1 | Sanitizer dispensed
1 | 0 | Doorbell activated
1 | 1 | Invalid input/no operation

sensor, at a distance of less than 5 cm, the Arduino outputs a pulse of width 200 ms on one of its digital output pins, and a signal is sent to the bell speaker through the RF transceiver. Activation of the IR sensor is indicated by the red LED. Figure 36.4 shows the corresponding output that activates the doorbell.
Working with Sanitization: To sanitize the hands, the person places a hand near the sanitizer slot. When the hand is within 7 cm of the IR sensor, the sensor is activated (indicated by the green LED) and sanitizing liquid is dispensed for 2 s. Figure 36.5 shows the sanitizer dispenser setup.
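The decision logic of Table 36.1, combined with the distances and timings given above, can be modelled as follows. This Python model is only an illustration: the actual firmware runs on the Arduino Uno, the constant and function names are hypothetical, and the sketch assumes each IR sensor reports a distance rather than a simple on/off output.

```python
# Thresholds and timings taken from Sect. 36.4; names are hypothetical
BELL_RANGE_CM = 5.0       # hand closer than this triggers the bell sensor (red LED)
SANITIZER_RANGE_CM = 7.0  # hand closer than this triggers the sanitizer sensor (green LED)
BELL_PULSE_S = 0.2        # 200 ms pulse that keys the RF transmitter
DISPENSE_S = 2.0          # relay on-time for the sanitizer motor


def decide(bell_distance_cm, sanitizer_distance_cm):
    """Map the two IR readings to (action, duration in seconds) following Table 36.1."""
    red = bell_distance_cm < BELL_RANGE_CM
    green = sanitizer_distance_cm < SANITIZER_RANGE_CM
    if red and green:
        return ("invalid", 0.0)   # both LEDs on: invalid input, no operation
    if red:
        return ("ring_bell", BELL_PULSE_S)
    if green:
        return ("dispense", DISPENSE_S)
    return ("idle", 0.0)          # neither LED on: no operation


print(decide(3.0, 50.0))   # ('ring_bell', 0.2)
print(decide(50.0, 6.0))   # ('dispense', 2.0)
print(decide(3.0, 6.0))    # ('invalid', 0.0)
```

Treating the "both sensors active" case explicitly mirrors the last row of Table 36.1, so a hand held across both slots never rings the bell and dispenses sanitizer at the same time.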


Fig. 36.3 Overall project implementation

Fig. 36.4 Doorbell indication


Fig. 36.5 Sanitizer dispenser

36.5 Conclusion
We have designed a product that is useful during the COVID pandemic because it provides two-fold safety: sanitizer is dispensed, and the calling bell is rung, without touching any device. Safety is therefore provided at both ends, for the people living inside the home and for the visitors who come to the door. The product is built only from existing, low-cost devices and can be used by anyone. In future, we are extending the project with image processing to detect the person outside the door and notify the occupants, and with image fusion sensors to scan a person at some distance from the home to check whether he/she carries any malicious devices.

References
1. Lobaccaro, G., Carlucci, S., Löfström, E.: A review of systems and technologies for smart homes and smart grids. Published in Horizon 2020, QUANTUM, Quality management for building performance: Improving energy performance by life cycle quality management (May 2016)
2. Vamsi, T.S., Radha, K.: ARM based stair climbing robot controlling through DTMF technology. Int. J. Recent Technol. Eng. (IJRTE) 2(3) (2013). ISSN: 2277-3878
3. Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., Niu, P., Zhan, F., Ma, X., Wang, D., Xu, W., Wu, G., Gao, G., Tan, W.F.: A novel Coronavirus from patients with pneumonia in China, 2019. New England J. Med. 382(8), 727–733 (2020)
4. Sharma, N., Hasan, Z., Velayudhan, A., Emil, M.A., Mangal, D.K., Gupta, S.D.: Personal protective equipment: challenges and strategies to combat COVID-19 in India: a narrative review. J. Health Manage. 11 (2020)
5. Vamsi, T.S., Srinivasa Raju, G.R.L.V.N.: ARM-based industrial circuitous like robotic implementation using CAN bus for surveillance. In: Satapathy, S., Bhateja, V., Chowdary, P., Chakravarthy, V., Anguera, J. (eds.) Proceedings of 2nd International Conference on Micro-Electronics, Electromagnetics and Telecommunications. Lecture Notes in Electrical Engineering, vol. 434. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-4280-5_28
6. Majeed, R., Abdullah, N.A., et al.: An intelligent, secure, and smart home automation system. Scientific Programming, Hindawi. ISSN: 1058-9244. https://doi.org/10.1155/2020/4579291
7. Vamsi, T.S., Viswanadam, R., Praveen kumar, K.: An industrial survivalence sensoric robot controlling through arduino and GSM. Int. J. Control Theor. Appl. (IJCTE) 9(23), 87–96. ISSN: 0974-5572
8. Sanjay, D., Kumar, P.R., Savithri, T.S.: Hardware and software codesign methodology for implementing algorithms of person follower robotic system. Int. J. Mech. Rob. Syst. 4(2), 89–106 (2018). https://doi.org/10.1504/IJMRS.2018.090274

Chapter 37

Optic Disc Segmentation Based on Active Contour Model for Detection and Evaluation of Glaucoma on a Real-Time Challenging Dataset
Sonali Dash, P. Satish Rama Chowdary, C. V. Gopala Raju, Y. Umamaheshwar, and K. J. N. Siva Charan

Abstract Glaucoma is a chronic disease that can lead to permanent blindness if it is not diagnosed at an early stage. Ophthalmologists use medical techniques such as Heidelberg Retinal Tomography (HRT) and Optical Coherence Tomography (OCT) to detect glaucoma, but these techniques are costly and time-consuming. Automatic analysis of retinal images is therefore gaining attention, because it can deliver accurate results faster than the manual process. Glaucoma increases the cup-to-disc ratio (CDR) and decreases the rim-to-disc ratio (RDR), leading to loss of peripheral vision. This work proposes a new denoising technique followed by an active contour model and a texture-based procedure to diagnose glaucoma from the CDR and RDR. The robustness of the proposed approach is evaluated on a highly challenging real-time retinal database, the VISAKHA database, whose images were collected from the Visakha Eye Hospital, Visakhapatnam, AP, India. The proposed method detects glaucoma with a success rate of about 94–96% on this real-time dataset.

S. Dash (B) · P. Satish Rama Chowdary
Department of Electronics and Communication Engineering, Raghu Institute of Technology, Visakhapatnam, India
C. V. Gopala Raju · Y. Umamaheshwar
Visakha Eye Hospital, NBE Post-Graduate Training Center, Visakhapatnam, AP, India
K. J. N. Siva Charan
Department of Ophthalmology, Maharajahs Institute of Medical Sciences, Nellimarla, AP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_37


S. Dash et al.

37.1 Introduction
Glaucoma gradually damages the optic nerve, and if it is not detected in time, the damage can progress to blindness. Detecting glaucoma at an early stage is therefore vital for timely treatment [1–3]. Precise identification of glaucoma requires investigation and evaluation of the optic nerve head; the cup-to-disc ratio and rim-to-disc ratio are the most widely used structural indicators of glaucoma [4]. Measuring the cup-to-disc ratio is costly and time-consuming, and currently only a few experts perform this task. Automatic identification and segmentation of the optic disc, which can be extended to clinical practice, is therefore highly desirable for glaucoma diagnosis. The optic nerve head can be assessed with two different approaches. The first extracts features and separates normal from abnormal cases by binary classification, which is very challenging. The second uses clinical parameters of the optic disc region, such as the cup-to-disc ratio, the rim-to-disc ratio and the inferior, superior, nasal and temporal (ISNT) rule [5]. Many approaches to optic disc (OD) localization and segmentation for glaucoma detection have been proposed. Sinha and Babu proposed OD localization using L1 minimization for glaucoma detection [6]. Abdullah et al. proposed an approach based on thresholding and the circular Hough transform that exploits the characteristics of the OD [7]. Bharkad designed an equiripple filter to suppress the high-intensity variation within the OD and then used it for segmentation [8]. Zhou et al. presented a method that segments the optic disc and optic cup jointly and robustly with a local statistical active contour model and a structure prior [9]. Mvoulana et al. combined a brightness criterion with template matching to detect the OD, and then used a texture-based method to segment the optic cup (OC) and OD and compute the cup-to-disc ratio for glaucoma screening. This work proposes a fast, fairly simple algorithm that localizes the OD and then segments the OD and OC. After segmentation, the cup-to-disc ratio (CDR) and rim-to-disc ratio (RDR) are calculated to screen for glaucoma. The main contribution of this article is that the approach is assessed on a challenging real-time dataset collected from Visakha Eye Hospital, Visakhapatnam, AP, India. The paper is structured as follows: Sect. 37.2 explains the materials and methods used in the work, experimental results are described in Sect. 37.3, and the conclusion is drawn in Sect. 37.4.

37 Optic Disc Segmentation Based on Active Contour Model …


37.2 Method and Materials
37.2.1 Method
The proposed method consists of the following stages: pre-processing, homomorphic filtering, the active contour model, and texture-based OD and OC identification. Figure 37.1 illustrates the procedural steps of the approach. In the pre-processing step, the red channel of the RGB retinal image is selected. The green channel is generally chosen for the analysis of diabetic retinopathy (DR), whereas the red channel is chosen to identify the OD, because the OD stands out more clearly from the rest of the retinal image in the red channel: the vessels are less prominent and the OD appears with the highest contrast against the background. In the next step, a homomorphic filter is used to compensate for illumination, because strong illumination variations are observed in the real-time data. The operation of the homomorphic filter is described below. An image I(x, y) can be expressed as the product of an illumination component M(x, y) and a reflectance component F(x, y):

$$I(x, y) = M(x, y)\,F(x, y) \qquad (37.1)$$

The homomorphic filter is described by the following five equations.

[Flowchart blocks: Original image; Red component; Homomorphic filter; Thresholded output; Approximate OD boundary; Morphological output; OD segmented; Disc boundary; Cup boundary; Classifying as glaucoma or normal]

Fig. 37.1 Procedural steps of the suggested method

$$A(x, y) = \ln M(x, y) + \ln F(x, y) \qquad (37.2)$$

$$A(u, v) = F_M(u, v) + F_F(u, v) \qquad (37.3)$$

$$B(u, v) = H(u, v)\,F_M(u, v) + H(u, v)\,F_F(u, v) \qquad (37.4)$$

$$B(x, y) = \mathcal{F}^{-1}\left\{H(u, v)\,F_M(u, v) + H(u, v)\,F_F(u, v)\right\} \qquad (37.5)$$

$$E(x, y) = \exp\{B(x, y)\} \qquad (37.6)$$

where A(x, y) is the sum of the logarithms of illumination and reflectance, A(u, v) is the Fourier transform of A(x, y), F_M(u, v) and F_F(u, v) are the Fourier transforms of ln M(x, y) and ln F(x, y), respectively, H(u, v) is the filter function, B(u, v) is the filtered image in the frequency domain, B(x, y) is the filtered image in the spatial domain, and E(x, y) is the final enhanced image. Many variants of the filter function H(u, v) exist in the literature; the difference of Gaussian (DoG) filter used in the experiments is

$$H(u, v) = (\gamma_H - \gamma_L)\left[1 - \exp\left(-c\,\frac{D^2(u, v)}{D_0^2}\right)\right] + \gamma_L \qquad (37.7)$$

where the constant c controls the steepness, D0 is the cut-off frequency, D(u, v) is the distance from the origin of the centred Fourier transform, and γH and γL are the high-frequency and low-frequency gains, respectively. Because this work is assessed on real-time data, the fundus images visibly contain many other pathologies around the OD. For example, the ODs of images 04L, 19L and 23R in Fig. 37.2 are surrounded by high-intensity regions, which causes OD identification to fail badly. To overcome this difficulty, a region of interest (ROI) is used. Because contrast varies strongly between retinal images, automated and robust ROI identification is difficult. Extracting the ROI restricts the region in which processing is performed, which improves the results and reduces the processing time. Segmentation algorithms rely on two basic properties of pixel intensity: similarity and discontinuity. Similarity-based methods partition the image into regions of similar intensity according to specific criteria. Discontinuity-based methods identify regions from abrupt changes in pixel intensity, aiming to detect edges; the active contour is an example of this approach. In the next step, an active contour is applied to segment the approximate OD region. The deformable contour is associated with an energy, which is the key concept

[Figure panels: fundus images 1l–25l and 1r–25r]

Fig. 37.2 Fundus images of VISAKHA dataset selected for glaucoma identification

of the method. Analogous to a physical process, the contour energy has two parts: an internal energy, determined by the elasticity and stiffness of the model, and an external energy, determined by the image features used for segmentation. The snake energy function is defined as

$$E_{snake} = \int_0^1 E_{snake}(c(m))\,dm \qquad (37.8)$$

$$E_{snake} = \int_0^1 \left[E_{in}(c(m)) + E_{img}(c(m)) + E_{cn}(c(m))\right] dm \qquad (37.9)$$

where c(m) = (x(m), y(m)) is the contour expressed in Cartesian coordinates and m is the parameter along which the coordinates vary, E_in is the internal energy of the deformable curve, which encodes prior knowledge, E_img is the energy derived from the image in which the snake is placed, and E_cn is the energy arising from external constraints. The internal energy keeps the curve smooth when it is placed in the force field, and the image energy is a weighted combination of line and edge terms:

$$E_{in} = \alpha(m)\left|\frac{dc}{dm}\right|^2 + \beta(m)\left|\frac{d^2c}{dm^2}\right|^2 \qquad (37.10)$$

$$E_{img} = \omega_{line}E_{line} + \omega_{edge}E_{edge} \qquad (37.11)$$


where α(m) controls the elasticity of the curve, β(m) controls its stiffness, and ω_line and ω_edge are the weights of the line and edge energies. In the next step, the OC and OD boundaries are detected efficiently with a texture-based approach. Finally, the CDR and RDR are calculated to diagnose glaucoma.
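Two of the steps above can be sketched in plain Python: homomorphic filtering (Eqs. 37.2–37.7) and the discrete internal snake energy (Eq. 37.10). This is a minimal illustration under stated assumptions, not the chapter's implementation: the gains γL, γH, c and D0 are hypothetical defaults, a naive DFT replaces the FFT to keep the example dependency-free, log1p/expm1 replace ln/exp so that zero-valued pixels are allowed, and α and β are taken as constants along the contour.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive separable 2-D DFT (rows, then columns) on a list of lists."""
    sign = 1 if inverse else -1

    def dft1(v):
        n = len(v)
        out = [sum(v[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
               for m in range(n)]
        return [x / n for x in out] if inverse else out

    rows = [dft1(r) for r in img]
    cols = [dft1(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def homomorphic(img, gamma_l=0.5, gamma_h=2.0, c=1.0, d0=2.0):
    """Log, DFT, high-emphasis filter H(u, v) of Eq. 37.7, inverse DFT, exp."""
    M, N = len(img), len(img[0])
    A = dft2([[math.log1p(p) for p in row] for row in img])  # Eqs. 37.2-37.3
    for u in range(M):
        for v in range(N):
            du, dv = min(u, M - u), min(v, N - v)  # distance to the (centred) origin
            d2 = du * du + dv * dv
            H = (gamma_h - gamma_l) * (1 - math.exp(-c * d2 / d0 ** 2)) + gamma_l
            A[u][v] *= H                                      # Eq. 37.4
    B = dft2(A, inverse=True)                                 # Eq. 37.5
    return [[math.expm1(z.real) for z in row] for row in B]   # Eq. 37.6


def snake_internal_energy(points, alpha=1.0, beta=1.0):
    """Discrete Eq. 37.10 summed over a closed contour given as (x, y) points."""
    n = len(points)
    energy = 0.0
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        d1 = (x2 - x1) ** 2 + (y2 - y1) ** 2                    # |dc/dm|^2, forward difference
        d2 = (x2 - 2 * x1 + x0) ** 2 + (y2 - 2 * y1 + y0) ** 2  # |d^2c/dm^2|^2
        energy += alpha * d1 + beta * d2
    return energy


print(snake_internal_energy([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 12.0
```

Because H(0, 0) = γL, the mean log-intensity (the slowly varying illumination) is attenuated while higher frequencies (reflectance detail) are boosted by up to γH, which is the illumination compensation the method relies on. For the unit-square contour, the stretching term contributes 4α and the bending term 8β.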

37.2.2 Materials
This work uses real-time images collected from Visakha Eye Hospital, Visakhapatnam, AP, India. Around 1500 images were collected, with various sizes such as 3956 × 3958, 4008 × 4006, 3912 × 3914 and 4304 × 4306 pixels, labelled with the names of the patients. The images were captured with a CLARUS 700 retinal camera, which captures wide-field, true-colour images. Initially, fifty images covering both left and right eyes were chosen. The images were renamed and resized to 512 × 512 pixels to create the dataset, named the VISAKHA dataset. Figure 37.2 shows all the images selected for glaucoma detection.

37.3 Results and Discussions
The key purpose of this work is an automated technique that segments the OD and computes the CDR and RDR for glaucoma identification. A CDR threshold of 0.5 is widely accepted: if the CDR is higher than 0.5, the retinal image is classified as glaucomatous. The RDR is the ratio of the rim area in the inferior temporal region to the optic disc area, and a threshold of 0.4 is widely accepted: if the RDR is lower than 0.4, the fundus image is classified as glaucomatous. CDR and RDR are calculated as follows:
• CDR = optic cup area / optic disc area
• RDR = rim area in inferior temporal region / optic disc area.
Table 37.1 lists the computed CDR and RDR values and the resulting classification of each eye as healthy or glaucomatous. Because the dataset contains images with varying brightness and different OD sizes and positions, OD detection failed in three cases: images 07l, 07r and 23r. The pathologies present in this dataset damage the images considerably, so in a few images the proposed approach segmented the OD improperly. For example, retinal images 04l and 19l have highly uneven illumination, which makes OD segmentation challenging. Figure 37.3 shows some segmented ODs for evenly and unevenly illuminated retinal images, and Fig. 37.4 shows the disc and cup boundaries for evenly and unevenly illuminated fundus images. In general, uneven illumination is observed in fundus (back-of-the-eye)


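Table 37.1 CDR and RDR values of glaucoma and healthy eyes

Sample of retina | Healthy (CDR) | Glaucoma (CDR) | Healthy (RDR) | Glaucoma (RDR) | Remarks
1l | 0.436 | – | 0.656 | – |
1r | 0.465 | – | 0.591 | – |
2l | 0.438 | – | 0.650 | – |
2r | 0.490 | – | 0.536 | – |
3l | 0.487 | – | 0.544 | – |
3r | 0.439 | – | 0.698 | – |
4l | 0.358 | – | 0.815 | – |
4r | 0.414 | – | 0.724 | – |
5l | 0.432 | – | 0.704 | – |
5r | 0.455 | – | 0.617 | – |
6l | – | 0.528 | – | 0.356 |
6r | – | 0.602 | – | 0.390 |
7l | – | – | – | – | OD is reddish in colour
7r | – | – | – | – | OD is reddish in colour
8l | 0.457 | – | 0.606 | – |
8r | 0.455 | – | 0.665 | – |
9l | 0.455 | – | 0.608 | – |
9r | 0.398 | – | 0.770 | – |
10l | 0.421 | – | 0.699 | – |
10r | 0.430 | – | 0.664 | – |
11l | 0.487 | – | 0.542 | – |
11r | 0.459 | – | 0.604 | – |
12l | 0.446 | – | 0.631 | – |
12r | 0.361 | – | 0.702 | – |
13l | 0.437 | – | 0.656 | – |
13r | 0.496 | – | 0.519 | – |
14l | 0.440 | – | 0.649 | – |
14r | 0.503 | – | 0.510 | – |
15l | 0.437 | – | 0.656 | – |
15r | 0.436 | – | 0.658 | – |
16l | 0.415 | – | 0.717 | – |
16r | 0.368 | – | 0.878 | – |
17l | 0.424 | – | 0.693 | – |
17r | 0.458 | – | 0.609 | – |
(continued)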


S. Dash et al.

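Table 37.1 (continued)

Sample of retina | Healthy (CDR) | Glaucoma (CDR) | Healthy (RDR) | Glaucoma (RDR) | Remarks
18l | 0.397 | – | 0.776 | – |
18r | 0.398 | – | 0.766 | – |
19l | 0.480 | – | 0.558 | – |
19r | 0.398 | – | 0.776 | – |
20l | 0.445 | – | 0.632 | – |
20r | 0.423 | – | 0.692 | – |
21l | 0.449 | – | 0.790 | – |
21r | – | 0.501 | – | 0.398 |
22l | 0.373 | – | 0.856 | – |
22r | 0.390 | – | 0.797 | – |
23l | 0.497 | – | 0.516 | – |
23r | – | – | – | – | Highly uneven illumination
24l | 0.363 | – | 0.892 | – |
24r | 0.345 | – | 0.964 | – |
25l | 0.423 | – | 0.696 | – |
25r | 0.362 | – | 0.897 | – |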

Fig. 37.3 Examples of segmented OD for evenly and unevenly illuminated fundus images (panels: 4l, 5r, 10r, 12l, 13l)

images, which are frequently blurry and of low contrast; this normally reduces segmentation accuracy. Rim thickness is a vital indicator of the glaucomatous condition of a fundus image. In a healthy optic disc, the rim is thickest in the inferior region, followed by the superior and nasal regions, and is thinnest in the temporal region. In a glaucomatous eye, the cup enlarges mainly in the vertical direction, thinning the rim in the infero-temporal region of the disc. Therefore, the rim-to-disc ratio of the infero-temporal rim is estimated to determine the glaucomatous condition.
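The CDR and RDR rules described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes binary masks for the segmented disc and cup, uses a hypothetical `sector_mask` for the infero-temporal region, measures areas as pixel counts, and combines the two thresholds with a logical AND. The AND combination is an assumption; it agrees with Table 37.1, where sample 14r (CDR 0.503, RDR 0.510) is listed as healthy.

```python
import numpy as np

CDR_THRESHOLD = 0.5  # CDR above this suggests glaucoma (value stated in the text)
RDR_THRESHOLD = 0.4  # RDR below this suggests glaucoma (value stated in the text)


def cup_to_disc_ratio(disc_mask, cup_mask):
    """CDR = optic cup area / optic disc area, with areas counted in pixels."""
    disc_area = np.count_nonzero(disc_mask)
    cup_area = np.count_nonzero(np.logical_and(cup_mask, disc_mask))
    return cup_area / disc_area


def rim_to_disc_ratio(disc_mask, cup_mask, sector_mask):
    """RDR = rim area inside the infero-temporal sector / optic disc area.

    The rim is the part of the disc not covered by the cup; sector_mask is a
    hypothetical input marking the infero-temporal region.
    """
    rim = np.logical_and(disc_mask, np.logical_not(cup_mask))
    rim_in_sector = np.count_nonzero(np.logical_and(rim, sector_mask))
    return rim_in_sector / np.count_nonzero(disc_mask)


def classify_eye(cdr, rdr):
    """Both criteria must indicate glaucoma (the AND combination is an assumption)."""
    if cdr > CDR_THRESHOLD and rdr < RDR_THRESHOLD:
        return "glaucoma"
    return "healthy"


# Toy masks: a 10x10 disc, a 4x4 cup in its centre, and the lower-left quadrant as the sector
disc = np.ones((10, 10), dtype=bool)
cup = np.zeros_like(disc)
cup[3:7, 3:7] = True
sector = np.zeros_like(disc)
sector[5:, :5] = True
print(cup_to_disc_ratio(disc, cup), rim_to_disc_ratio(disc, cup, sector))
print(classify_eye(0.528, 0.356), classify_eye(0.503, 0.510))  # rows 6l and 14r of Table 37.1
```

In practice the sector would have to be defined relative to the disc centre and the eye side (left or right); the toy quadrant here only exercises the arithmetic.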

Fig. 37.4 Examples of disc and cup boundaries for evenly and unevenly illuminated fundus images (panels: disc boundary and cup boundary for 5r, 10r, 12l, and 13l)


37.4 Conclusion OD identification and segmentation are very important for diagnosing glaucoma. This work develops an automated method for OD localization and segmentation and for evaluating the CDR and RDR. In the first step, the red channel is selected because it suppresses the blood vessels in the fundus image. Morphological operations are then applied to enhance visualization and to remove coarseness from the image. Candidate OD locations are detected using a homomorphic filter and Otsu's method. An active contour model is used for OD segmentation, and a texture-based model is used to find the disc and cup boundaries. The pathological condition of the eye is then determined by evaluating the CDR and RDR. The proposed method is evaluated on a challenging real-world dataset containing many retinal images with severe pathologies, which can mislead OD segmentation. Although the OD is segmented accurately in most images, the proposed approach fails completely for three fundus images, and in a few other images, high-intensity patches are segmented along with the OD. Future work will investigate the effects of varying illumination and of different denoising techniques and will add more fundus images to the VISAKHA dataset. The dataset is available to researchers on request.

References
1. Costagliola, C., dell'Omo, R., Romano, M.R., Rinaldi, M., Zeppa, L., Parmeggiani, F.: Pharmacotherapy of intraocular pressure-part II. Carbonic anhydrase inhibitors, prostaglandin analogues and prostamides. Exp. Opin. Pharmacother. 10(17), 2859–2870 (2009)
2. European Glaucoma Society: Terminology and Guidelines for Glaucoma, 4th edn. Publicomm, Savona, Italy (2014)
3. Almazroa, A., Burman, R., Raahemifar, K., Lakshminarayanan, V.: Optic disc and optic cup segmentation methodologies for glaucoma image detection: a survey. J. Ophthalmol. 2015(180972), 1–28 (2015)
4. Nicolela, M.T.: Optic nerve: clinical examination. In: Giaconi, J.A., Law, S.K., Coleman, A.L., Caprioli, J. (eds.) Pearls of Glaucoma Management, pp. 15–21. Springer, Berlin, Germany (2010)
5. Cheng, J., Liu, J., Yu et al.: Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE Trans. Med. Imaging 32(6), 1019–1032 (2013)
6. Sinha, N., Babu, R.V.: Optic disc localization using L1 minimization. In: Proceedings of 19th IEEE International Conference on Image Processing (ICIP'12), pp. 2829–2832, Orlando, FL, USA, October 2012
7. Abdullah, M., Fraz, M.M., Barman, S.A.: Localization and segmentation of optic disc in retinal images circular Hough transform. PeerJ 4(3), e2003 (2016)
8. Bharkad, S.: Automatic segmentation of optic disc in retinal images. Biomed. Sig. Process. Control 31, 483–498 (2017)
9. Zhou, W., Yi, Y., Gao, Y., Dai, J.: Optic disc and cup segmentation in retinal images for glaucoma diagnosis by locally statistical active contour model with structure prior. Comput. Math. Methods Med. 2019, 1–16 (2019). ID 8973287


10. Mvoulana, A., Kachouri, R., Akil, M.: Fully automated method for glaucoma screening using robust optic nerve head detection and unsupervised segmentation based cup-to-disc ratio computation in retinal fundus images. Comput. Med. Imaging Graph. 77(101643), 1–19 (2019)

Chapter 38

Array Thinning Using Social Modified Social Group Optimization Algorithm
E. V. S. D. S. N. S. L. K. Srikala, M. Murali, M. Vamshi Krishna, and G. S. N. Raju

Abstract Thinning an antenna array involves reducing the number of active elements while achieving the desired sidelobe level (SLL) and beamwidth (BW). In this paper, a linear antenna array (LAA) is thinned with the objective of suppressing the SLL as far as possible under the constraint that the BW remains fixed at that of the uniform array. The LAA has 40 elements in its full configuration, in which all elements are switched ON. The LAA is then thinned to different degrees, and the elements to be switched OFF are determined according to this objective using the social group optimization algorithm (SGOA). Thinning can be viewed as a non-uniform spacing technique for suppressing the SLL under constraints. The results are analyzed using radiation pattern plots, and the simulations are carried out in MATLAB.

38.1 Introduction Linear antenna arrays are simple, easy-to-design configurations for highly directive radiating systems [1–3]. Thinning an antenna array yields patterns whose SLL and BW are uncompromised or only marginally compromised while a considerable number of array elements are left unused or switched OFF [4–7]. This helps meet the required radiation characteristics with fewer elements in the array antenna. Thinning can also be carried out with a uniform amplitude distribution [5, 6]; the process then acts as a non-uniform spacing technique in which the spacings are usually multiples of λ/2. The
E. V. S. D. S. N. S. L. K. Srikala (B) Department of ECE, Centurion University of Technology and Management Andhra Pradesh, Gidijala, AP, India
M. Murali · G. S. N. Raju Centurion University of Technology and Management Andhra Pradesh, Gidijala, AP, India
M. Vamshi Krishna Dhanekula Institute of Engineering and Technology, Vijayawada, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_38


E. V. S. D. S. N. S. L. K. Srikala et al.

efficient thinning process in an LAA involves determining the appropriate elements to be switched OFF. From an engineering perspective, this resembles the traveling salesman problem, where the objective is to determine the proper sequence of nodes. It is also inherently a multimodal problem, which is difficult to solve with numerical techniques. Several evolutionary computing tools have been applied to the antenna array synthesis problem with objectives such as SLL suppression, null control, and shaped beams. In this paper, the recently popular SGOA [8–10] is used to determine the positions of the elements to be switched OFF in a linear array of 40 elements. The SGOA has previously been employed to optimize linear and circular antenna arrays with several objectives and constraints [10–14]. The manuscript is organized into five sections: the problem formulation is given in Sect. 38.2, the algorithm and its implementation are discussed in Sect. 38.3, the results are presented and discussed in Sect. 38.4, and the overall conclusions are drawn in Sect. 38.5.

38.2 Problem Formulation A symmetric LAA is considered to simulate and demonstrate thinning. Consider the 2N-element LAA geometry shown in Fig. 38.1. The corresponding array factor is given as [2]

$$E(\varphi) = 2\sum_{n=1}^{N} I_n \cos\left[k x_n \cos(\varphi) + \phi_n\right] \tag{38.1}$$

Here, I_n, ϕ_n, and x_n are the excitation amplitude, phase, and position of the nth element, respectively, φ is the observation angle, and k = 2π/λ is the wavenumber. The geometry of the symmetric LAA is shown in Fig. 38.1a, and the uniform radiation pattern of the 40-element LAA is shown in Fig. 38.1b. For the uniform array, the SLL is −13.24 dB and the BW is 5.7°.
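To make Eq. (38.1) concrete, the following Python/NumPy sketch evaluates the array factor of a symmetric 2N-element array and measures its SLL and beamwidth. It rests on assumptions the chapter does not state: broadside excitation (all ϕ_n = 0), λ/2 spacing with element positions x_n = (n − 1/2)λ/2 measured from the array centre, and BW taken as the first-null beamwidth. With all 20 half-array amplitudes set to 1, it gives values close to the −13.24 dB SLL and 5.7° BW quoted above.

```python
import numpy as np


def array_factor_db(amps, phi_deg, spacing=0.5):
    """Normalized |E(phi)| of Eq. (38.1) in dB for a symmetric 2N-element array.

    amps    -- I_n for the N elements on one side (mirrored on the other side)
    phi_deg -- observation angles in degrees (90 deg = broadside)
    spacing -- element spacing in wavelengths (lambda/2 assumed)
    """
    amps = np.asarray(amps, dtype=float)
    n = np.arange(1, amps.size + 1)
    x = (n - 0.5) * spacing            # positions in wavelengths, symmetric about the centre
    k = 2 * np.pi                      # wavenumber when x is measured in wavelengths
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    # E(phi) = 2 * sum_n I_n cos(k x_n cos(phi)); phases phi_n taken as zero
    e = 2 * np.sum(amps[:, None] * np.cos(k * x[:, None] * np.cos(phi)[None, :]), axis=0)
    mag = np.abs(e)
    return 20 * np.log10(mag / mag.max() + 1e-12)   # small offset avoids log10(0) at nulls


def sll_and_fnbw(pattern_db, phi_deg):
    """Peak sidelobe level (dB) and first-null beamwidth (deg) of a sampled pattern."""
    peak = int(np.argmax(pattern_db))
    left = peak
    while left > 0 and pattern_db[left - 1] < pattern_db[left]:
        left -= 1                      # walk down the main lobe to the first null
    right = peak
    while right < pattern_db.size - 1 and pattern_db[right + 1] < pattern_db[right]:
        right += 1
    sidelobes = np.concatenate([pattern_db[:left], pattern_db[right + 1:]])
    return float(sidelobes.max()), float(phi_deg[right] - phi_deg[left])


phi = np.linspace(0.0, 180.0, 18001)   # 0.01 deg grid
uniform = np.ones(20)                  # 40-element array, all elements ON
sll, bw = sll_and_fnbw(array_factor_db(uniform, phi), phi)
print(f"SLL = {sll:.2f} dB, BW = {bw:.2f} deg")
```

Passing a 0/1 vector instead of `uniform` gives the pattern of a thinned state, which is the quantity the optimizer in Sect. 38.3 would evaluate.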

Fig. 38.1 Linear array antenna: (a) geometry, (b) uniform radiation pattern

38 Array Thinning Using Social Modified Social Group …


The proposed generalized fitness function is formulated as follows. For θ_l ≤ θ ≤ θ_h,

$$f_1 = \left|\mathrm{SLL}_{des} - \mathrm{SLL}_{obt}\right| \tag{38.2}$$

$$f_2 = \left|\mathrm{BW}_{des} - \mathrm{BW}_{obt}\right| \tag{38.3}$$

$$f_3 = \left|n_{cOFF} - n_{OFF}\right| \tag{38.4}$$

For θ_h ≤ θ ≤ θ_l

Here, SLL_des and BW_des are the desired SLL and BW, while SLL_obt and BW_obt are the values obtained from simulation. Also, n_cOFF is the number of elements to be switched OFF, and n_OFF is the number of elements in the OFF state. The final cost function is

$$f = f_1 + f_2 + f_3 \tag{38.5}$$
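Equations (38.2) to (38.5) combine into one scalar cost. The sketch below assumes that the SLL and BW of a candidate have already been measured from its simulated pattern, that elements are numbered 1 to 20 from the array centre on each side (hypothetical, since the numbering is not stated), and that n_cOFF counts OFF elements in the full 40-element array; the −20 dB target SLL in the usage line is likewise a hypothetical value.

```python
import numpy as np


def mirror(half_state):
    """Full ON/OFF vector of the symmetric array from the N elements on one side."""
    half = np.asarray(half_state, dtype=int)
    return np.concatenate([half[::-1], half])


def thinning_cost(half_state, sll_obt, bw_obt, sll_des, bw_des, n_c_off):
    """Cost f = f1 + f2 + f3 of Eqs. (38.2)-(38.5).

    sll_obt (dB) and bw_obt (deg) are measured from the simulated pattern of this
    state; n_c_off is the required number of OFF elements in the full array.
    """
    n_off = int(np.count_nonzero(mirror(half_state) == 0))  # elements in OFF state
    f1 = abs(sll_des - sll_obt)
    f2 = abs(bw_des - bw_obt)
    f3 = abs(n_c_off - n_off)
    return f1 + f2 + f3


state = np.ones(20, dtype=int)
state[8] = 0        # element 9 OFF on each side, i.e. 2 of 40 elements (5 % thinning)
print(thinning_cost(state, sll_obt=-13.45, bw_obt=5.7, sll_des=-20.0, bw_des=5.7, n_c_off=2))
```

Because the BW and element-count terms are zero for a feasible state, the cost reduces to the SLL gap, so minimizing f pushes the SLL down while penalizing any BW drift or wrong thinning count.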

38.3 The Social Group Optimization Algorithm The SGOA mimics the social behavior of human beings [8–10]. Every person possesses some degree of intelligence and skill to become aware of a problem and to respond to it. The individual with the highest degree of knowledge or skill can provide the best solution to a problem, and individuals can subsequently improve their problem-solving skills. The SGOA formalizes this exchange of knowledge among individuals. For a population of N individuals, each individual X_i represents a candidate solution to a design problem of dimension d:

$$X = [X_1, X_2, X_3, \ldots, X_N] \tag{38.6}$$

$$X_i = [x_1, x_2, x_3, \ldots, x_d] \tag{38.7}$$

where i = 1, 2, …, N. The algorithm progresses in two phases, namely the improving phase and the acquiring phase. In the improving phase, every individual tries to improve its solution with respect to the best individual in the society or group. This is formulated as


$$X_{new,i} = c\,X_i + r\,(X_{best} - X_i) \tag{38.8}$$

Here, X_new,i is the improved version of the ith individual, X_best is the best individual in the group, r is a random number, and c is the self-introspection parameter of SGO (0 < c < 1) [8]. Similarly, the following formulation is used to update individuals in the acquiring phase:

$$X_{new,i} = X_i + D\,r_1\,(X_i - X_r) + r_2\,(X_{best} - X_i) \tag{38.9}$$

$$D = \begin{cases} 1, & f(X_i) < f(X_r) \\ -1, & f(X_i) > f(X_r) \end{cases} \tag{38.10}$$

Here, r_1 and r_2 are two random numbers uniformly distributed in the range (0, 1), D is a determinant factor, and X_r is another individual selected at random from the population.
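The two SGOA phases can be sketched as a generic minimizer in Python. This is an illustration under stated assumptions, not the authors' MATLAB code: it works on continuous variables, uses c = 0.2 as a hypothetical choice within 0 < c < 1, keeps a candidate only if it lowers the fitness (greedy acceptance, assumed here), clips candidates to the search bounds, and sets D = −1 when f(X_i) = f(X_r), a case Eq. (38.10) leaves open. How continuous positions are mapped to the binary ON/OFF states needed for thinning (for instance by thresholding) is not described in the chapter and is left out.

```python
import numpy as np


def sgoa_minimize(f, dim, n_pop=30, iters=200, lo=-5.0, hi=5.0, c=0.2, seed=0):
    """Minimize f over [lo, hi]^dim using the improving and acquiring phases of SGOA."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_pop, dim))      # population, Eq. (38.6)
    fit = np.array([f(x) for x in X])
    for _ in range(iters):
        # Improving phase, Eq. (38.8): X_new = c*X_i + r*(X_best - X_i)
        best = X[np.argmin(fit)].copy()
        for i in range(n_pop):
            r = rng.random()
            cand = np.clip(c * X[i] + r * (best - X[i]), lo, hi)
            fc = f(cand)
            if fc < fit[i]:                         # greedy acceptance (assumed)
                X[i], fit[i] = cand, fc
        # Acquiring phase, Eqs. (38.9)-(38.10)
        best = X[np.argmin(fit)].copy()
        for i in range(n_pop):
            j = int(rng.integers(n_pop - 1))
            j += j >= i                             # random partner X_r with r != i
            D = 1.0 if fit[i] < fit[j] else -1.0
            r1, r2 = rng.random(), rng.random()
            cand = np.clip(X[i] + D * r1 * (X[i] - X[j]) + r2 * (best - X[i]), lo, hi)
            fc = f(cand)
            if fc < fit[i]:
                X[i], fit[i] = cand, fc
    k = int(np.argmin(fit))
    return X[k], float(fit[k])


# Usage on a simple test function (sum of squares, minimum at the origin)
x_best, f_best = sgoa_minimize(lambda x: float(np.sum(x ** 2)), dim=5)
print(x_best, f_best)
```

In the acquiring phase, D = 1 moves a better individual further away from a worse partner, while D = −1 pulls a worse individual toward a better one, which is the learning-from-peers behaviour the text describes.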

38.4 Results and Discussions The simulations are carried out in MATLAB on an i5 processor. The experimental framework is structured around the objectives of LAA thinning, and several rules accompany these objectives. The first is that the amplitude distribution has only two levels: an element can have an amplitude of either ‘1’, meaning switched ON, or ‘0’, meaning switched OFF. The main objective of the thinning process is to determine which elements have to be switched OFF to obtain an SLL as low as possible. At the same time, the BW is fixed to that of the uniform distribution. A 40-element symmetric LAA is considered. Because of the symmetry, it is sufficient to determine the states of the 20 elements on one side of the reference point; these are mirrored onto the 20 elements on the other side. Thinning is carried out for degrees ranging from 5 to 40% in steps of 5%. Here, 5% corresponds to 2 elements switched OFF and the remaining 38 elements switched ON. For 5% thinning, SGOA determined that element 9 should be switched OFF, giving an SLL of −13.45 dB, which is better than the uniform SLL of −13.24 dB. The corresponding radiation pattern and amplitude distribution are presented in Fig. 38.2a, b, respectively. Next, a 10% thinning target is considered. SGOA determined that switching OFF the 8th and 13th elements on either side of the

Fig. 38.2 5% thinned array: (a) radiation pattern and (b) amplitude distribution

reference point results in an SLL of −14.25 dB. The radiation pattern and the amplitude distribution stem plot are given in Fig. 38.3a, b. Similarly, the best SLL is obtained when the elements indexed (1, 2, 6, 12, 14, 18) for 25% thinning and the elements indexed (4, 5, 10, 11, 14, 19) for 30% thinning are switched OFF, giving SLLs of −13.61 dB and −13.26 dB, respectively. The corresponding radiation pattern and amplitude distribution for 25% thinning are shown in Fig. 38.6a, b, and those for 30% thinning in Fig. 38.7a, b. SGOA is also employed to achieve 35% and 40% thinning, which results in patterns with SLLs of −13.03 and −11.71 dB. The corresponding radiation patterns are presented in Figs. 38.8a and 38.9a, and the amplitude distributions in Figs. 38.8b and 38.9b, respectively.

Fig. 38.3 10% thinned array: (a) radiation pattern and (b) amplitude distribution

Further, the simulation is extended to 15 and 20% thinning, where the SLL is further suppressed to −14.41 and −14.91 dB, respectively. The radiation patterns and amplitude distributions for these two cases are given in Figs. 38.4a, b and 38.5a, b. The 4th, 12th, and 15th elements (15% thinning) and the 4th, 9th, 13th, and 15th elements (20% thinning) are switched OFF to obtain the low SLL. The SLLs for all thinning cases are consolidated in Table 38.1. Interestingly, the SLL suppression improves from 5 to 20% thinning. For 25 and 30% thinning, although the SLL is still better than in the uniform case, the suppression decreases. For 35 and 40% thinning, the SLL is no better than in the uniform case.
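The link between the thinning percentage and the number of OFF elements, and the comparison with the uniform SLL of −13.24 dB, can be checked with a few lines of Python. The SLL values are copied from Table 38.1; the helper and constant names are illustrative.

```python
TOTAL_ELEMENTS = 40
UNIFORM_SLL_DB = -13.24

# SLL values (dB) as listed in Table 38.1, keyed by thinning percentage
TABLE_38_1 = {5: -13.45, 10: -14.25, 15: -14.41, 20: -14.99,
              25: -13.61, 30: -13.26, 35: -13.02, 40: -11.71}


def elements_off(percent, total=TOTAL_ELEMENTS):
    """OFF elements in the full array and on each side of the symmetric array."""
    n_off = percent * total // 100
    return n_off, n_off // 2


best = min(TABLE_38_1, key=TABLE_38_1.get)                    # most suppressed SLL
better = [p for p, s in TABLE_38_1.items() if s < UNIFORM_SLL_DB]
for p, s in TABLE_38_1.items():
    print(f"{p:2d}% thinning: {elements_off(p)} OFF (total, per side), SLL {s} dB")
print("lowest SLL at", best, "% thinning; better than uniform:", better)
```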

Fig. 38.4 15% thinned array: (a) radiation pattern and (b) amplitude distribution

Fig. 38.5 20% thinned array: (a) radiation pattern and (b) amplitude distribution

Fig. 38.6 25% thinned array: (a) radiation pattern and (b) amplitude distribution

Fig. 38.7 30% thinned array: (a) radiation pattern and (b) amplitude distribution

Fig. 38.8 35% thinned array: (a) radiation pattern and (b) amplitude distribution

Fig. 38.9 40% thinned array: (a) radiation pattern and (b) amplitude distribution

Table 38.1 Variation of the SLL with respect to the percentage of thinning

Percentage of thinning (%) | SLL (dB)
5 | −13.45
10 | −14.25
15 | −14.41
20 | −14.99
25 | −13.61
30 | −13.26
35 | −13.02
40 | −11.71

38.5 Conclusion The SGOA is applied to thin a linear antenna array. The algorithm thins the array robustly for several degrees of thinning. Thinning can provide patterns with suppressed SLL because it can be viewed as a non-uniform spacing technique for optimizing antenna arrays. However, the SLL suppression is not consistent: it varies nonlinearly with the degree of thinning, which makes it highly unpredictable.

References
1. Raju, G.S.N.: Antennas and Wave Propagation. Pearson Education India (2006)
2. Devi, G.G., Raju, G.S.N., Sridevi, P.V.: Application of genetic algorithm for reduction of sidelobes from thinned arrays. Adv. Model. Anal. B 58(1), 35–52 (2015)
3. Chakravarthy, V.V.S.S.S., Chowdary, P.S.R., Anguera, J., Mokara, D., Satapathy, S.C.: Pattern recovery in linear arrays using grasshopper optimization algorithm. In: Microelectronics, Electromagnetics and Telecommunications, pp. 745–755. Springer, Singapore (2021)
4. Haupt, R.L.: Linear and planar array factor synthesis. In: Antenna Arrays, pp. 115–215. Wiley (2010)
5. Haupt, R.L.: Adaptively thinned arrays. IEEE Trans. Antennas Propag. 63(4), 1626–1632 (2015)
6. Sartori, D., Oliveri, G., Manica, L., Massa, A.: Hybrid design of non-regular linear arrays with accurate control of the pattern sidelobes. IEEE Trans. Antennas Propag. 61(12), 6237–6242 (2013)
7. Dalirian, S., Majedi, M.S.: Hybrid DS-CP technique for pattern synthesis of thinned linear array antennas. In: Iranian Conference on Electrical Engineering (ICEE), pp. 416–419. IEEE (May 2018)
8. Naik, A., Satapathy, S.C., Ashour, A.S., Dey, N.: Social group optimization for global optimization of multimodal functions and data clustering problems. Neural Comput. Appl. 30(1), 271–287 (2018)
9. Naik, A., Satapathy, S.C.: A comparative study of social group optimization with a few recent optimization algorithms. Complex Intell. Syst. 1–47 (2020)
10. Naik, A., Satapathy, S.C., Abraham, A.: Modified social group optimization: a meta-heuristic algorithm to solve short-term hydrothermal scheduling. Appl. Soft Comput. 95, 106524 (2020)
11. Swathi, A.V.S., Chakravarthy, V.V.S.S.S.: Synthesis of constrained patterns of circular arrays using social group optimization algorithm. In: Smart Intelligent Computing and Applications, pp. 453–459. Springer, Singapore (2020)
12. Sekhar, B.V.D.S., Reddy, P.P., Venkataramana, S., Chakravarthy, V.V., Chowdary, P.S.R.: Image denoising using novel social grouping optimization algorithm with transform domain technique. Int. J. Nat. Comput. Res. (IJNCR) 8(4), 28–40 (2019)
13. Chakravarthy, V.V.S.S.S., Chowdary, P.S.R., Satapathy, S.C., Anguera, J., Andújar, A.: Social group optimization algorithm for pattern optimization in antenna arrays. In: Socio-cultural Inspired Metaheuristics, pp. 267–302. Springer, Singapore (2019)
14. Chakravarthy, V.V.S.S.S., Rao, P.M.: Circular array antenna optimization with scanned and unscanned beams using novel particle swarm optimization. Indian J. Appl. Res. 5(4) (2015)

Chapter 39

Impact of Flash Flood on Landuse and Landcover Dynamics and Erosional Pattern of Jiadhal River Basin, Assam, India
Amar Kumar Kathwas, Rakesh Saur, and V. S. Rathore
Abstract Flash floods are among the most disastrous natural hazards, disrupting both the environment and societies. They are mainly governed by intense rainfall, and because they act quickly, an effective response is challenging. Several studies have assessed the impact of flash flood events on land use/landcover (LULC), financial losses, and the destruction of infrastructure. Combined with rising population and the expansion of urban centers along floodplains, the increasing frequency of flash floods results in enormous damage to natural resources. Flash floods are primarily governed by intense rainfall events, topography, LULC, soil, and available soil moisture, which are also the prime parameters governing soil erosion. This rationale motivates the present assessment of the influence of flash floods on LULC dynamics and on the pattern and magnitude of soil loss in a mountainous river basin at the sub-watershed scale. The findings of the present study can inform effective land management strategies for natural resource sustainability.

39.1 Introduction A flood can be defined as a hydrological event, characterized by rainfall or the discharge of high water levels, that inundates land bordering streams, rivers, lakes, wetlands, or other water bodies. Two main types of floods exist in nature: first, fluvial floods, which occur when the water level in a stream, river, or lake rises to the extent that it overflows the river banks, shores, and adjacent land,
A. K. Kathwas Haryana Space Applications Centre, CCS Haryana Agricultural University, Hisar, Haryana 125004, India
R. Saur (B) · V. S. Rathore Department of Remote Sensing, Birla Institute of Technology, Mesra, Ranchi, Jharkhand 835215, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_39


A. K. Kathwas et al.

and second, pluvial floods or flash floods, which are characterized by an intense, high-velocity torrent of water triggered by torrential rain falling within a short time in the vicinity or on nearby elevated terrain [2, 3, 6]. In contrast to fluvial floods, flash floods subside quickly; however, they are extremely dangerous and destructive owing to the force of the water and the debris swept up by the flow [5]. They also disrupt the natural balance of the river basin by causing various kinds of damage, for example, loss of fertile soil through erosion and economic and physical losses. Climate change has increased rainfall intensity at both national and global scales, and in combination with rapid urbanization, this has significantly escalated the frequency and impacts of flash floods [7]. Studies of flash floods reveal that LULC categories such as cropland and built-up land sustain most of the damage [4]. Despite the disruption flash floods cause to soil fertility, economies, and societies, they have not received systematic, comprehensive study commensurate with their impacts. Flash floods not only deplete the nutrient stock of soils by eroding the nutrient-rich topsoil but also radically transform LULC. These alterations result in severe land degradation, primarily through soil erosion. The nature and impact of flash floods are principally governed by rainfall, topography, LULC, soil types, and soil moisture, which are also the prime erosion-governing parameters [3]. It can therefore be presumed that flash floods have a significant impact on LULC dynamics and soil erosion. The present study was carried out in the mountainous watershed of the Jiadhal River basin; the Jiadhal is a tributary of the Brahmaputra River flowing through the state of Assam, India.
In the present study, an attempt has been made to understand the relationship between flash floods and LULC dynamics and their impact on the soil erosion process, computed using the Universal Soil Loss Equation (USLE), at spatial and temporal scales. The findings help establish the role of flash floods in LULC dynamics and in higher soil loss in the watershed.

39.2 Study Area The Jiadhal River basin is located in the northeast region of India (Fig. 39.1). The Jiadhal, a tributary of the Brahmaputra River, is severely affected by flash floods every year. The basin covers approximately 1173.65 km², extending from 27°11’N to 27°4’N latitude and 94°14’E to 94°38’E longitude. Physiographically, the basin can be divided into two broad categories: the Himalayan hill range and the fluvial flood plains. The hills cover approximately 34 percent of the basin area, whereas the flood plains cover 66 percent. The Jiadhal River originates in the lower Himalayan ranges in the West Siang district of Arunachal Pradesh, flows south through the flood plains of the Dhemaji district in Assam, and meets the Brahmaputra River near Majuli Island.

39 Impact of Flash Flood on Landuse and Landcover Dynamics …


Fig. 39.1 The location of the study area

39.3 Methodology First, satellite images for three time-periods a decade apart (1990, 2000, and 2010) were downloaded from USGS EarthExplorer (https://earthexplorer.usgs.gov/), and atmospheric correction was applied to the various bands in ERDAS Imagine 9.2. The satellite images were then geometrically rectified with respect to SOI (Survey of India) topomaps, and image enhancement techniques were applied for better visualization of the features. The enhanced images were used to generate LULC maps by digitization in a GIS environment using ArcGIS 10. The images were also used to generate the Normalized Difference Vegetation Index (NDVI) for computing the C-factor for each time-period. Monthly rainfall datasets with a spatial resolution of 1 km were downloaded from CHELSA (Climatologies at High Resolution for the Earth's Land Surface Areas) (http://chelsa-climate.org/) to compute the R-factor, whereas the Cartosat DEM was downloaded from Bhuvan (https://bhuvan-app3.nrsc.gov.in/) and used to compute the slope length and steepness factor (LS-factor) and the conservation practices factor (P-factor). Because the analysis is carried out at the sub-watershed scale, the DEM was also used to delineate sub-watersheds with the ArcSWAT extension of ArcGIS. The soil map procured from the National Bureau of Soil Survey and Land Use Planning (NBSS & LUP) was used to derive the K-factor.


The USLE is a multiplicative equation of the erosion-governing parameters, whose product is the amount of soil eroded from a land parcel. Among these parameters, the R- and C-factors are time-variant, whereas the others are time-invariant. The USLE can be represented as

$$A = R \times K \times LS \times C \times P \tag{39.1}$$

where A is the computed soil loss, R is the rainfall-runoff erosivity factor, K is the soil erodibility factor, L is the slope length factor, S is the slope steepness factor, C is the cover management factor, and P is the supporting practices factor [7].
R-factor: Rainfall erosivity quantifies the erosive force of the rainfall occurring in a year; it represents the impact of raindrops and the rate of runoff. In the present study, the equation used to compute rainfall erosivity was taken from [1].
K-Factor: Soil erodibility can be defined as the rate of soil loss per unit of rainfall erosivity from a standard unit plot. The soil erodibility values for the various soil texture groups were taken from [1].
LS-Factor: Slope length and slope steepness together constitute the topographic factor, LS. The LS-factor was computed using the ArcGIS Raster Calculator tool and can be expressed as

$$LS = \left(\frac{\text{flow accumulation} \times \text{cell size}}{22.13}\right)^{0.6} \times \left(\frac{\sin(\text{slope} \times 0.01745)}{0.09}\right)^{1.3} \tag{39.2}$$
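Eq. (39.2) maps directly onto raster algebra. The NumPy sketch below is a stand-in for the ArcGIS Raster Calculator expression, assuming flow accumulation (upslope cell counts) and slope in degrees have already been derived from the DEM as arrays; it uses the constants exactly as printed in Eq. (39.2), and the 30 m cell size in the usage lines is a hypothetical value.

```python
import numpy as np


def ls_factor(flow_acc, cell_size, slope_deg):
    """Topographic LS factor of Eq. (39.2), evaluated cell by cell.

    flow_acc  -- flow accumulation (number of upslope cells) for each cell
    cell_size -- DEM cell size in metres
    slope_deg -- slope in degrees; 0.01745 converts degrees to radians
    """
    flow_acc = np.asarray(flow_acc, dtype=float)
    slope_rad = np.asarray(slope_deg, dtype=float) * 0.01745
    return (flow_acc * cell_size / 22.13) ** 0.6 * (np.sin(slope_rad) / 0.09) ** 1.3


flow = np.array([[0, 10], [50, 200]])
slope = np.array([[2.0, 5.0], [10.0, 25.0]])
print(ls_factor(flow, 30.0, slope))
```

The constants 22.13 and 0.09 normalize the result to a reference plot, so a cell with a contributing length of 22.13 m and a slope whose sine is 0.09 gets LS = 1.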

C-Factor: The cover and management factor represents the ratio of soil loss from a land parcel cropped under specified conditions to the corresponding loss from clean-tilled, continuous fallow land. It reflects the combined effect of land cover and crop sequence. It was computed using the NDVI, an indicator of vegetation health and vigor, given by

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \tag{39.3}$$

where NIR and R denote the near-infrared and red reflectances, respectively (R here is the red band, not the rainfall erosivity factor).

The NDVI values were then transformed to C-factor values using the following equation [6]:

$$C_{NDVI} = \left(\frac{1 - \mathrm{NDVI}}{2}\right)^{(1 + \mathrm{NDVI})} \tag{39.4}$$
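Equations (39.3) and (39.4) can be applied per pixel as sketched below with NumPy. This is an illustrative sketch, not the authors' ERDAS/ArcGIS processing chain: it assumes co-registered NIR and red reflectance arrays, and pixels where NIR + R = 0 would need to be masked beforehand.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R), Eq. (39.3), computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def c_factor(ndvi_values):
    """C = ((1 - NDVI) / 2) ** (1 + NDVI), Eq. (39.4)."""
    v = np.clip(np.asarray(ndvi_values, dtype=float), -1.0, 1.0)
    return ((1.0 - v) / 2.0) ** (1.0 + v)


v = ndvi([0.5, 0.3, 0.2], [0.1, 0.2, 0.2])   # three example pixels
print(v)
print(c_factor(v))                           # C falls toward 0 as NDVI rises
```

The mapping sends NDVI = 1 to C = 0 and NDVI = 0 to C = 0.5, which is why damaged vegetation (lower NDVI) raises the C-factor and hence the computed soil loss.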

P-Factor: The support practice factor is the ratio of soil loss under a specific support practice, such as contouring, strip cropping, or terracing, to the corresponding loss under up-and-down-slope cultivation. P values range from 0 to 1. Various studies show that the P-factor is slope dependent; the values were therefore derived from [8] according to slope. Finally, the computed soil loss was standardized for each sub-watershed as (sum of soil loss from all pixels)/(sub-watershed area) to determine the net soil loss from individual sub-watersheds.
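The pixel-wise USLE product of Eq. (39.1) and the per-sub-watershed standardization can be sketched with NumPy as below. This illustrates the arithmetic under stated assumptions, not the authors' ArcGIS workflow: the factor rasters are assumed to be co-registered arrays, `subwatershed_id` is a hypothetical raster of sub-watershed labels (such as the ArcSWAT delineation would provide), soil loss per pixel is taken as the USLE rate times the pixel area, and all factor values in the usage lines are illustrative.

```python
import numpy as np


def usle(R, K, LS, C, P):
    """Pixel-wise soil loss A = R * K * LS * C * P, Eq. (39.1)."""
    return R * K * LS * C * P


def net_soil_loss(A, subwatershed_id, pixel_area):
    """Net soil loss per sub-watershed: (sum of soil loss over its pixels) / its area.

    Assumes A is a per-area rate, so each pixel contributes A * pixel_area and the
    result is in the same units as A; pixel_area is in the area unit A refers to.
    """
    A = np.asarray(A, dtype=float)
    ids = np.asarray(subwatershed_id)
    result = {}
    for sid in np.unique(ids):
        mask = ids == sid
        total_loss = A[mask].sum() * pixel_area
        area = np.count_nonzero(mask) * pixel_area
        result[int(sid)] = total_loss / area
    return result


R = np.array([[400.0, 400.0], [600.0, 600.0]])   # illustrative factor values
K = np.full((2, 2), 0.2)
LS = np.array([[1.0, 3.0], [2.0, 2.0]])
C = np.full((2, 2), 0.1)
P = np.ones((2, 2))
A = usle(R, K, LS, C, P)
ids = np.array([[1, 1], [2, 2]])                 # two hypothetical sub-watersheds
print(net_soil_loss(A, ids, pixel_area=0.09))    # 30 m pixels = 0.09 ha
```

Normalizing by area makes small upland sub-watersheds comparable with large lowland ones, which is what allows the comparison of net soil loss against spatial extent in Fig. 39.3.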


39.4 Results and Discussion 39.4.1 Land Use Land Cover and Erosion Dynamics The dynamics of LULC and soil loss were assessed for two time-periods, i.e., 1990–2000 and 2000–2010. The change in LULC during the two time-periods was analyzed on a percent-change basis (Fig. 39.2). The results (Fig. 39.4) reveal that sub-watersheds 1–4 and 10–16 show little or no change, whereas sub-watersheds 5–9 and 17 show a significantly high degree of change during both time-periods (1990–2000 and 2000–2010). The pattern of LULC transformation remains largely the same during the two time-periods; however, sub-watersheds 5, 7, and 8 show more transition among LULC categories during 1990–2000 than during 2000–2010. The net soil loss from individual sub-watersheds generally shows a positive trend, i.e., an increase in soil loss from 1990–2000 to 2000–2010. However, during 2000–2010, sub-watersheds 7, 11, and 16 show a negative trend, i.e., a decrease in soil loss. In contrast to LULC, the pattern of change in soil loss across sub-watersheds remains largely dissimilar between the two time-periods. Figure 39.3 shows the net soil loss from individual sub-watersheds against their spatial extent. A notable observation from Fig. 39.3 is that sub-watersheds 1–3 and 11–16 are smaller in area than sub-watersheds 4–10 and 17 but still show higher soil loss. This can be attributed to steep slopes, as these sub-watersheds lie in the upper part of the river basin, which has areas with steeper slopes.

Fig. 39.2 Percent LULC transformation and Net change in soil erosion magnitude in respective sub-watersheds


Fig. 39.3 Net soil erosion pertaining to various sub-watersheds in respective time-periods

Fig. 39.4 Soil erosion severity under different sub-watersheds corresponding to respective time-periods

39.4.2 Impact of Flash Flood on Soil Loss The mean and coefficient of variation (CV) values of the time-variant erosion-governing parameters, i.e., the R-factor and C-factor, are presented in Figs. 39.5 and 39.6. The mean and CV values of the R-factor reveal that the magnitude and pattern of rainfall erosivity in individual sub-watersheds remain largely the same. In contrast, the mean C-factor values of the respective sub-watersheds show an increasing trend, i.e., 1990 < 2000 < 2010. The smaller mean C-factor values of sub-watersheds 10, 11, and 16


Fig. 39.5 Mean and CV values of R-factor corresponding to different sub-watersheds in respective time-periods

Fig. 39.6 Mean and CV values of C-factor corresponding to different sub-watersheds in respective time-periods

in 2010 resulted in lower soil loss than in 2000. In contrast, sub-watershed 14 shows a small increase in net soil loss in 2010 compared with 2000.

39.5 Conclusion

The statistical analysis reveals that the flash floods caused a significant change in the spatial extent of LULC in sub-watersheds 1, 4–10, and 17. During 1990–2000, the highest degree of LULC transformation occurred in sub-watersheds 8, 7, and 9, followed by a marginal degree of transformation in sub-watersheds 5, 17, and 6, whereas sub-watersheds 5, 1, and 10 show a minute degree of LULC transformation. During 2000–2010, sub-watersheds 8, 9, and 7 witnessed the highest degree of transformation, followed by sub-watersheds 7, 6, and 17 with marginal change, while sub-watersheds 4, 1, and 10 depict a minute degree of change. The remaining sub-watersheds show no change in the spatial extent of LULC categories. Since sub-watersheds 1–4 and 10–16 lie in the upper reaches and are dominated by vegetation cover (more than 80 percent), their higher degree of soil loss can be attributed to significant damage to vegetation caused by flash floods, which was verified during ground truthing. This damage leads to smaller NDVI values and hence a higher degree of erosion in the respective sub-watersheds.

Acknowledgments The authors would like to acknowledge USGS and ISRO-Bhuvan for providing satellite imagery free of cost.

References
1. Akangsha, B., Dhrubajyoti, S., Ashok, B., Santonu, G.: Soil loss estimation of Jiadhal river basin, Assam, using revised universal soil loss equation model (RUSLE). North Eastern Geographer 40, 113–128 (2019)
2. Esposito, G., Matano, F., Scepi, G.: Analysis of increasing flash flood frequency in the densely urbanized coastline of the campi flegrei volcanic area, Italy. Front. Earth Sci. 6, 63 (2018)
3. Hapuarachchi, H.A.P., Wang, Q.J., Pagano, T.C.: A review of advances in flash flood forecasting. Hydrol. Process. 25, 2771–2784 (2011)
4. Hazarika, H., Das, A.K., Borah, S.B.: Assessing land-use changes driven by river dynamics in chronically flood affected Upper Brahmaputra plains, India, using RS-GIS techniques. Egypt. J. Remote. Sens. Space Sci. 39(18), 107–118 (2015)
5. Khajehei, S., Ahmadalipour, A., Shao, W., Hamid, M.: A place-based assessment of flash flood hazard and vulnerability in the contiguous United States. Sci. Rep. 10, 448 (2020)
6. Lin, C., Lin, W., Chou, W.: Soil erosion prediction and sediment yield estimation: the Taiwan experience. Soil Tillage Res. 68, 143–215 (2002)
7. Renard, K.G.: Predicting soil erosion by water: a guide to conservation planning with the revised universal soil loss equation (RUSLE). United States. Agricultural Research Service. U.S. Department of Agriculture, Agricultural Research Service (1997)
8. Shin, G.J.: The analysis of soil erosion analysis in watershed using GIS. Ph.D. Dissertation, Ang-Won National University (1999)
9. Saharia, M., Kirstetter, P., Vergara, H., Gourley, J.J., Hong, Y., Giroud, M.: Mapping flash flood severity in the United States. J. Hydrometeorol. 397–411 (2017)
10. Sayama, T., Yamada, M., Sugawara, Y., Yamazaki, D.: Ensemble flash flood predictions using a high-resolution nationwide distributed rainfall-runoff model: case study of the heavy rain event of July 2018 and Typhoon Hagibis in 2019. Progress in Earth and Planetary Sci. 7, 75 (2020)

Chapter 40

Landslide Risk Dynamics Modeling Using AHP-TOPSIS Model, Computational Intelligence Methods, and Geospatial Analytics: A Case Study of Aizawl City, Mizoram—India

Gospel Rohmingthangi, F. C. Kypacharili, Alok Bhushan Mukherjee, and Bijay Singh Mipun

Abstract Landslides are a ubiquitous phenomenon. They can be perilous in terms of possible losses, costing human lives and damaging network and economic infrastructure. Hence, quantifying landslide risk is indispensable for effective disaster management. This study assessed and quantified the risk associated with landslide events. Moreover, it demonstrates the utility of computational intelligence methods, such as fuzzy spatial overlay operations along with GIS spatial overlay operations, and of the AHP-TOPSIS model for deciphering various aspects of landslide risk dynamics. Social vulnerability was also assessed in the context of the infrastructure of the region. The findings suggest that rainfall, which has the highest relative weight among the identified influential factors, may have a significant impact in triggering landslides. The highest risk zone for landslides falls in the vicinity of the central and eastern parts of Aizawl city, surrounded by a high-moderate risk zone.

40.1 Introduction

There have been significant transformations and modifications of the earth's surface since the beginning of human civilization. These transformations may have been caused by internal and external forces of the earth. Landslides correspond to the movement of a mass of rock, debris, or earth down a slope under the force of gravity [2].

G. Rohmingthangi (B) · F. C. Kypacharili
Mizoram University, Aizawl, Mizoram, India

A. B. Mukherjee
R&D Head-NE India Operations, Leads Research Lab, (LeadsConnect Services Pvt. Ltd), Noida, Uttar Pradesh, India

B. S. Mipun
North Eastern Hill University, Shillong, Meghalaya, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_40


A total of 4862 fatal landslide events were recorded across the world between 2004 and 2016, and the majority of the recorded events have been observed to be related to slope failure [3]. Anthropogenic activities have also been responsible for some landslide events in the past, although the impact of human-induced landslides is generally not significant. However, population growth may encourage new urban clusters to crop up, with a subsequent increase in economic development, and such growth can be alarming [5]. The availability of information on the magnitude and expected frequency of mass movements may help contain the intensity of the consequent effects of landslides [10]. Preliminary aspects of landslide hazard were investigated earlier by Jian and Xiang-Guo [6]. The interrelationships among influential factors and the development of factor analysis models are important for designing landslide hazard assessment frameworks [1]. Landslides are the most frequent natural hazard in Aizawl city, Mizoram, and are especially widespread in the monsoon season. The majority of landslides in the city are consequences of random anthropogenic activities. Hence, this study aims to assess and quantify the landslide risk dynamics of the study area using advanced frameworks and methodologies.

40.2 Materials and Methodology

40.2.1 Location of the Study Area

This study was performed in Aizawl, the capital city of Mizoram. The study area is located in the southern part of North-East India (Fig. 40.1), extending from 92°39′54″E to 92°46′57″E longitude and from 23°39′54″N to 23°50′35″N latitude. Aizawl is covered by Survey of India toposheets No. 83 A/9, 83 A/10, and 83 A/14. The geographical area of Aizawl is about 120.3 km², with an average annual rainfall of 1996.1 mm.

40.2.2 Data Used

Landsat series satellite imagery and the ASTER Global DEM (GDEM) were acquired from USGS for this study. Topographic variables such as elevation and slope were derived from the DEM, and the drainage density layer was prepared using the DEM and ArcSWAT software. Rainfall data and soil data were also used for quantifying landslide risk dynamics.


Fig. 40.1 Study area

40.2.3 Methodology

The research methodology flowchart in Fig. 40.2 represents the process flow for quantifying landslide risk dynamics.


Fig. 40.2 Research methodology flowchart

40.2.3.1 Assessment of Landslide Hazard Zones

Landslide hazard zonation primarily studies the interrelation between causative factors, builds factor analysis models, and then delineates hazard zones [6]. Landslide hazard was assessed using spatial overlay operations such as knowledge-based GIS overlay, AHP-GIS overlay, and fuzzy-GIS overlay. Slope, elevation, LULC, lithology, rainfall, and drainage density were among the influential factors used in this study. Because these factors may differ in importance for landslide hazard analysis, relative importance weights were assigned to them using knowledge-based, analytic hierarchy process (AHP), and fuzzy methods. The weighted influential layers were then integrated using spatial overlay frameworks such as knowledge-based, weighted sum, weighted overlay, and fuzzy overlay.

The analytic hierarchy process (AHP), developed by Thomas L. Saaty in 1980, is an effective technique for assigning relative weights to influential parameters, and it also helps control inconsistencies in the weighting process. Applying AHP is straightforward and involves three steps: constructing a pairwise comparison matrix, computing the importance weights, and checking consistency [4]. Factors are compared pairwise on Saaty's scale of 1 to 9 to construct the matrix [9]. After the relative importance weights are computed, consistency is checked within the AHP framework using the following equations:

$$\mathrm{CR} = \frac{\text{Consistency Index (CI)}}{\text{Random Consistency Index (RI)}} \tag{40.1}$$

$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1} \tag{40.2}$$

where λmax and n correspond to the principal eigenvalue and the order of the pairwise comparison matrix, respectively. The random consistency index (RI) is a tabulated value provided by Saaty (1980) and must be chosen according to the order of the matrix. For the pairwise comparison matrix in Table 40.1, CI = 0.03 and the consistency ratio (CR) = 0.02. For both weighted sum and weighted overlay, the priority weights computed using AHP (Table 40.1) were used in the AHP-GIS spatial overlay frameworks. In the fuzzy spatial overlay, influential layers were reclassified using a fuzzy membership function: a linear membership operator was used for fuzzy reclassification, and the layers were combined in the fuzzy-GIS overlay framework using the "SUM" overlay type.

Table 40.1 Priority weights of landslide influencing factors

| Factors | Ra | S | G | E | LULC | DD | So | Weights |
|---|---|---|---|---|---|---|---|---|
| Rainfall (Ra) | 1 | 1.5 | 3 | 4 | 5 | 6 | 6.5 | 0.34 |
| Slope (S) | 1/1.5 | 1 | 2 | 3 | 3.5 | 4.5 | 5 | 0.24 |
| Geology (G) | 1/3 | 1/2 | 1 | 2 | 2.5 | 4 | 4 | 0.16 |
| Elevation (E) | 1/4 | 1/3 | 1/2 | 1 | 2 | 2.5 | 3 | 0.10 |
| LULC | 1/5 | 1/3.5 | 1/2.5 | 1/2 | 1 | 2 | 2 | 0.07 |
| Drainage density (DD) | 1/6 | 1/4.5 | 1/4 | 1/2 | 1/2 | 1 | 2 | 0.05 |
| Soil (So) | 1/6.5 | 1/5 | 1/3 | 1/3 | 1/2 | 1/2 | 1 | 0.04 |
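The weights and consistency check of Table 40.1 and Eqs. (40.1) and (40.2) can be reproduced approximately with the principal-eigenvector method. A minimal sketch follows, assuming NumPy and Saaty's commonly tabulated RI values (RI = 1.32 for n = 7); the matrix is transcribed from Table 40.1 as printed. Two printed pairs are not exact reciprocals (geology vs soil is 4 but soil vs geology is 1/3; elevation vs drainage density is 2.5 but the reverse is 1/2), so the computed CI and CR may differ slightly from the reported 0.03 and 0.02, while the weights should come out close to the table's.

```python
import numpy as np

FACTORS = ["Rainfall", "Slope", "Geology", "Elevation", "LULC", "Drainage density", "Soil"]

# Pairwise comparison matrix transcribed from Table 40.1 (row factor compared with column factor)
A = np.array([
    [1,       1.5,    3,      4,   5,   6,   6.5],
    [1 / 1.5, 1,      2,      3,   3.5, 4.5, 5],
    [1 / 3,   1 / 2,  1,      2,   2.5, 4,   4],
    [1 / 4,   1 / 3,  1 / 2,  1,   2,   2.5, 3],
    [1 / 5,   1 / 3.5, 1 / 2.5, 1 / 2, 1, 2,   2],
    [1 / 6,   1 / 4.5, 1 / 4, 1 / 2, 1 / 2, 1, 2],
    [1 / 6.5, 1 / 5,  1 / 3,  1 / 3, 1 / 2, 1 / 2, 1],
])

# Saaty's random consistency index (RI) for matrices of order n (commonly cited values)
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp(matrix):
    """Return (weights, lambda_max, CI, CR) using the principal-eigenvector method."""
    n = matrix.shape[0]
    eigvals, eigvecs = np.linalg.eig(matrix)
    k = int(np.argmax(eigvals.real))      # principal (Perron) eigenvalue of a positive matrix
    lambda_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)        # Perron vector entries share one sign
    w = w / w.sum()                       # normalise so the weights sum to 1
    ci = (lambda_max - n) / (n - 1)       # Eq. (40.2)
    cr = ci / RANDOM_INDEX[n]             # Eq. (40.1)
    return w, lambda_max, ci, cr


weights, lambda_max, ci, cr = ahp(A)
for name, w in zip(FACTORS, weights):
    print(f"{name:17s} {w:.2f}")
print(f"lambda_max={lambda_max:.3f}  CI={ci:.3f}  CR={cr:.3f}")
```

A CR below 0.1 is the usual acceptance threshold for Saaty's method; if it were exceeded, the pairwise judgements would be revised before the weights are used in the overlay.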

40.2.3.2 Assessment of Social Vulnerability to Landslides Using AHP-TOPSIS Model

Landslide vulnerable zones in Aizawl city were identified using the AHP-TOPSIS model in the context of "Infrastructure." The influential factors used were density population to household (DPH), built-up density, and a composite index (CI). AHP was applied to DPH, built-up density, and CI to obtain their priority weights. The basic principle of TOPSIS is to identify the alternative that is closest to the positive ideal solution and farthest from the negative ideal solution [8]. To compute the positive ideal solution, the maximum values of DPH and built-up density (treated as positive factors) and the minimum value of CI (treated as a negative factor) were taken; the reverse was used for the negative ideal solution.
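The TOPSIS step described above can be sketched as follows. This is a minimal sketch of standard TOPSIS with vector normalisation; the chapter does not report the AHP weights for DPH, built-up density and CI or the ward values, so the weights and the three wards below are hypothetical. As in the text, DPH and built-up density are treated as positive criteria and CI as a negative criterion.

```python
import numpy as np


def topsis(matrix, weights, positive):
    """Relative closeness of each alternative (row) to the positive ideal solution.

    matrix   : (m, k) criterion values, one row per alternative (e.g. ward)
    weights  : (k,) criterion weights summing to 1 (e.g. from AHP)
    positive : (k,) booleans, True where larger values mean higher vulnerability
    """
    m = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    b = np.asarray(positive, dtype=bool)
    # 1. Vector-normalise each criterion column, then apply the weights
    v = w * m / np.linalg.norm(m, axis=0)
    # 2. Positive ideal: max of positive criteria, min of negative ones; negative ideal: the reverse
    pis = np.where(b, v.max(axis=0), v.min(axis=0))
    nis = np.where(b, v.min(axis=0), v.max(axis=0))
    # 3. Euclidean distance of every alternative to both ideals
    d_pos = np.linalg.norm(v - pis, axis=1)
    d_neg = np.linalg.norm(v - nis, axis=1)
    # 4. Closeness: 1 = at the positive ideal, 0 = at the negative ideal
    return d_neg / (d_pos + d_neg)


# Hypothetical wards: [DPH, built-up density, composite index CI]
wards = [[6.0, 0.8, 0.2],
         [4.0, 0.5, 0.5],
         [3.0, 0.2, 0.9]]
weights = [0.5, 0.3, 0.2]        # hypothetical AHP priority weights
positive = [True, True, False]   # CI is the negative factor
closeness = topsis(wards, weights, positive)
print(closeness.round(3))        # ward 0 sits at the positive ideal, ward 2 at the negative ideal
```

Higher closeness values indicate more vulnerable wards; slicing the closeness scores into five ranges gives a five-class vulnerability map of the kind described in Sect. 40.3.3.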

40.3 Results and Discussion

40.3.1 Results Obtained from Assessment of Landslide Hazard Area

With the AHP-based weights, the high hazard zone covers 13.39 km² in the weighted overlay map and 21.59 km² in the weighted sum overlay map. With the knowledge-based weights, the high zone covers 2.03 km² in the weighted overlay map and 21.32 km² in the weighted sum overlay map. The fuzzy overlay map has the largest high zone, i.e., 96.03 km². Comparing AHP-based and knowledge-based hazard zonation, the AHP-based map shows more precise results than the knowledge-based map, as inferred from the validation of landslide hazard accuracy.
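The fuzzy overlay of Sect. 40.2.3.1 combines layers rescaled by a linear membership function using the "SUM" overlay type. A minimal NumPy sketch follows, assuming ArcGIS-style definitions (linear rescaling between a lower and an upper bound, and fuzzy SUM = 1 − ∏(1 − μᵢ)); the bounds, weights and sample values are hypothetical. Because fuzzy SUM is never smaller than the largest input membership, whereas a weighted sum with weights summing to 1 never exceeds it, fuzzy SUM tends to push cells towards higher values, which is worth keeping in mind when comparing the high-zone areas above.

```python
import numpy as np


def linear_membership(layer, low, high):
    """Rescale raw values linearly to [0, 1]: values <= low -> 0, values >= high -> 1."""
    return np.clip((np.asarray(layer, dtype=float) - low) / (high - low), 0.0, 1.0)


def fuzzy_sum(memberships):
    """Fuzzy SUM overlay: 1 - prod(1 - mu_i) over the layers, computed cell by cell."""
    stacked = np.stack(memberships)
    return 1.0 - np.prod(1.0 - stacked, axis=0)


def weighted_sum(layers, weights):
    """Weighted sum overlay of scored layers (weights assumed to sum to 1)."""
    return sum(w * np.asarray(layer, dtype=float) for w, layer in zip(weights, layers))


# Hypothetical rainfall (mm) and slope (degrees) values for three cells
rain = linear_membership([1800.0, 2000.0, 2400.0], low=1800.0, high=2200.0)  # 0.0, 0.5, 1.0
slope = linear_membership([10.0, 25.0, 40.0], low=10.0, high=40.0)         # 0.0, 0.5, 1.0
print(fuzzy_sum([rain, slope]))                  # 0.0, 0.75, 1.0
print(weighted_sum([rain, slope], [0.6, 0.4]))   # 0.0, 0.5, 1.0 (hypothetical weights)
```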

40.3.2 Map Validation of Landslide Hazard Accuracy Using Historical Landslide Data

The landslide hazard zonation map was validated by computing, for each of the five zones, the number of historical landslides falling in that zone divided by the total number of past landslides known so far. The comparison was carried out for all the techniques implemented: fuzzy overlay produces the largest high-hazard classes, followed by weighted sum overlay, with weighted overlay producing the smallest, as the resulting maps show (Figs. 40.3, 40.4, 40.5, 40.6, 40.7 and 40.8).
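The validation described above is a simple share computation: the hazard class at each historical landslide location is read from the zonation raster, and the count per class is divided by the total number of landslides. A minimal sketch follows, assuming the zonation map is a NumPy array of class codes 1 (lowest hazard) to 5 (highest hazard) and that landslide locations are already given as row/column indices; all values are hypothetical.

```python
import numpy as np


def landslide_share_by_zone(hazard_zones, rows, cols, n_zones=5):
    """Percentage of historical landslides falling in each hazard class 1..n_zones."""
    zones_at_points = hazard_zones[rows, cols]                        # class under each landslide
    counts = np.bincount(zones_at_points, minlength=n_zones + 1)[1:]  # drop unused class 0
    return 100.0 * counts / len(zones_at_points)


# Hypothetical 3 x 3 zonation map (1 = lowest, 5 = highest hazard)
hazard = np.array([[1, 2, 3],
                   [3, 4, 5],
                   [5, 5, 4]])
# Hypothetical row/column positions of five past landslides
rows = np.array([1, 1, 2, 2, 0])
cols = np.array([1, 2, 0, 1, 0])
print(landslide_share_by_zone(hazard, rows, cols))  # 20, 0, 0, 20, 60 (%) for classes 1..5
```

A map that places most past landslides in its high-hazard classes, while keeping those classes compact, is the one the validation favours.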


Fig. 40.3 AHP-LHZ map (weighted overlay)

40.3.3 Results Obtained from the Identification of Vulnerable Zones

According to the AHP-TOPSIS results, most of the central part of the city falls in the high vulnerability zones in terms of infrastructure. Areas with high vulnerability generally have high population density and high household density, and vice versa. The AHP-TOPSIS vulnerable zone map (based on the ward map) has been classified into five zones, namely high, high-moderate, moderate, low, and least vulnerable (Fig. 40.9). This map was then refined by combining it with a proximity-to-vulnerable-zone map, giving more comprehensive results for the social vulnerable zones, which are likewise categorized by intensity into five zones: high, high-moderate, moderate, low, and least vulnerable (Fig. 40.10).


Fig. 40.4 AHP-LHZ map (weighted sum overlay)

40.3.4 Results of the Assessment of Risk Analysis for Landslide Hazard in Aizawl City

The risk analysis shows that the highest risk zone for landslide hazard falls in and around the central and eastern parts of Aizawl city, surrounded by a high-moderate risk zone. Most of the northwestern and southern parts of the city are in moderate risk zones, and the least risk zone is found in the western part of Aizawl, where infrastructure is sparse (Fig. 40.11).

40.4 Conclusions

This study demonstrates the utility of the AHP-TOPSIS model, along with computationally intelligent techniques and geospatial analytics, for assessing and quantifying the landslide risk dynamics of Aizawl city. The AHP-TOPSIS modeling suggests that highly vulnerable zones usually occur in densely populated regions of the city. In addition, the risk analysis map shows that the highest risk zone is located within the central part of the city. Overall, the investigations reveal that localities in the vicinity of the middle part of the city are more prone to landslide hazard and lie in a high-risk zone, while most of the surrounding areas are not vulnerable to landslides. The resultant map can be used for purposes such as disaster management planning, which will further help in alleviating landslide risk.

Fig. 40.5 LHZ map (fuzzy overlay)

Fig. 40.6 Transitional zone for each respective landslide hazard zone (black dots represent some of the past landslide locations)

Fig. 40.7 KB-LHZ map (weighted sum overlay)

Fig. 40.8 KB-LHZ map (weighted overlay)

Fig. 40.9 AHP-TOPSIS vulnerable zone map

Fig. 40.10 Social vulnerable zone map

Fig. 40.11 Landslide hazard risk analysis map

Acknowledgements The authors are grateful to the management of North Eastern Hill University (NEHU) for providing the opportunity to perform the research. Special thanks are also due to the managing director of Leads Connect Services Pvt. Ltd. for providing a platform to refine the study and validate the results.

References
1. Anbalagan, R.: Landslide hazard evaluation and zonation mapping in mountainous terrain. Eng. Geol. 32, 269–277 (1992)
2. Cruden, D.M.: A simple definition of a Landslide. Bull. Int. Assoc. Eng. Geol. 43, 27–29 (1991)
3. Froude, M.J., Petley, D.N.: Global fatal landslide occurrence from 2004 to 2016. Nat. Hazard. 18, 2161–2181 (2018)
4. Gorsevski, P.V., Jankowski, P., Paul, E.G.: Heuristic approach for mapping landslide hazard integrating fuzzy logic with analytical hierarchy process. Control Cybernet. 35 (2006)
5. Jaboyedoff, M., Michoud, C., Derron, M.H., Voumard, J., Leibundgut, G., Rieux, K.S., Nadim, F., Leroi, E.: Human-induced landslides: toward the analysis of anthropogenic changes of the slope environment (2016)
6. Jian, W., Xiang-guo, P.: GIS based landslide hazard zonation model and its application. Proc. Earth Planet. Sci. 1, 1198–1204 (2009). https://doi.org/10.1016/j.proeps.2009.09.184
7. Kypacharili, F.C., Rohmingthangi, G., Mukherjee, A.B., Mipun, B.S.: Assessment of landslide hazard zone using soft computing approach and geospatial technology. In: 2nd Annual convention of North East (India) Academy of Science and Technology (NEAST) and International Seminar on Recent Advances in Science and Technology (IRSRAST) (2020)
8. Mukherjee, A.B., Krishna, A., Patel, N.: Application of remote sensing technology, GIS and AHP-TOPSIS model to quantify urban landscape vulnerability to land use transformation (2018)
9. Saaty, T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psychol. 15, 231–281 (1977)
10. Saha, A.K., Gupta, R.P., Arora, M.K.: GIS-based landslide hazard zonation in the Bhagirathi (Ganga) valley Himalayas. Int. J. Remote Sens. 23(2), 357–369 (2002)

Chapter 41

Delineation and Assessment of Groundwater Potential Zones Using Geospatial Technology and Fuzzy Analytical Hierarchy Process Model

Hundashisha Thabah and Bijay Singh Mipun

Abstract Water resources are the most important resources for the survival of the human race. Sustainable development is not possible without proper monitoring and preservation of groundwater resources. It is therefore necessary to develop robust approaches and methods to identify and manage groundwater potential zones. The main aim of this study is to identify the groundwater potential zones in the regions surrounding Mawphlang Dam in East Khasi Hills, Meghalaya. Landuse/landcover, elevation, slope, drainage density, geology, and soil have been identified as influencing factors. Techniques such as knowledge-based GIS overlay, AHP-based overlay, and fuzzy AHP-based overlay have been employed. The results have been compared with the groundwater potential map prepared by Bhuvan, which suggests that they are in the right direction, and the findings have also been validated by field observation. Significant variations were found among the results of the different techniques, which helps in analyzing the intrinsic characteristics of the models and of the event more effectively.

41.1 Introduction

Groundwater is one of the important resources that contribute significantly to the total annual water supply. Overexploitation of this resource has led to considerable land subsidence in some places. Examining the potential zones of groundwater recharge is extremely important for protecting and managing water quality and groundwater systems. Remote sensing and geographic information systems (GIS) make it possible to demarcate groundwater potential zones [1], and they hold great capability for improving groundwater exploration. Integrating local field observations into conventional GIS-based models helps produce better local results [2]. This study aims to locate groundwater potential zones in the study area and demonstrates the benefits of different computational/informational techniques there. Understanding the impact of different techniques on the results is necessary to identify the most appropriate technique for such studies.

H. Thabah (B)
Mawmluh, Smit, Shillong, Meghalaya, India

B. S. Mipun
North Eastern Hill University, Shillong, Meghalaya, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_41

41.2 Study Area

Mawphlang has been chosen as the case study area for this investigation. It is primarily a rural region in East Khasi Hills district: a village situated in the southern part of Shillong, on the slopes of the "U Ryngkew Mawiong Hills." Its name is believed to derive from the abundance of stone and grass. Mawphlang owes its reputation to its sacred grove, which is the best-conserved grove in Meghalaya and in India as well. It stands in a saucer-shaped depression with hills sloping all around [3]. A buffer around the demarcated region has also been considered; hence, some portions of the study area fall in West Khasi Hills district. Mawphlang belongs to Meghalaya, a state in the northeastern part of India whose name means "abode of clouds." Meghalaya is bounded by Assam on the north and east and by Bangladesh on the south. Its climate depends on altitude: the higher the altitude, the cooler the climate. The average yearly rainfall is around 2600 mm in western Meghalaya, northern Meghalaya receives between 2500 and 3000 mm annually, and southeastern Meghalaya receives more than 4000 mm [4] (Fig. 41.1).

41.3 Tools and Techniques Used in This Study

This study integrated advanced technologies and informational/computational techniques such as remote sensing, geographical information systems, the analytic hierarchy process [5], and the fuzzy analytic hierarchy process [6] to locate groundwater potential zones. Thematic layers of geology, soil, slope, elevation, landuse/landcover, and drainage density were prepared by integrating remote sensing and GIS. The analytic hierarchy process and fuzzy analytic hierarchy process were then used to assign weights quantifying the relative importance of the factors. The influence of a factor on a geospatial event may vary with context: a variable that is highly significant in one context may be much less significant in another. It is therefore necessary to assess and quantify the possibility that an influential variable shifts in terms of its influence on the event, that is, the possibility of a shift in the boundary of influential variables. Advanced techniques such as fuzzy AHP are included to analyze this possible shift in boundary more effectively.

Fig. 41.1 Location map of the study area


41.4 Research Methodology

Landsat 8 TM data (UTM projection, WGS 84 spheroid and datum, zone 46 North), digital data images, and the ASTER GDEM have been used in the present study. Secondary data on geology and soil were collected from the Geological Survey of India and the National Bureau of Soil Survey and Land Use Planning, Nagpur, respectively. ArcSWAT, ArcGIS, and ERDAS IMAGINE software were used for analysis and mapping of the individual layers. The methodology flowchart is shown in Fig. 41.2.

Fig. 41.2 Methodology flow chart


41.5 Identification of Influencing Factors

Six thematic layers, namely slope, geology, soil, drainage density, landuse/landcover, and elevation, have been identified as influential factors. The maps are shown in Fig. 41.3.

Slope: Slope gradient directly influences surface water infiltration. On steep slopes, water flows rapidly downhill, leaving limited time for it to infiltrate the soil, whereas moderate to flat areas hold rainwater longer and allow more infiltration. The gentle slope category (slope less than 10°) covers most of the study area [7].

Geology: Geology largely governs the aquifers in which groundwater is stored, and the absorptive capacity of a formation influences the quality of an aquifer. Sedimentary rocks mostly have higher permeability and porosity than crystalline rocks. In Fig. 41.3b, the majority of the area is underlain by sedimentary rocks [2].

Drainage: The drainage density of an area depends mostly on the structure and nature of the geological formation and on the absorptive capacity of the soil. High drainage density results in low infiltration due to rapid surface runoff, while low drainage density results in higher infiltration due to slow surface runoff [8]. In Fig. 41.3a, higher drainage density is found mostly in the northern and southern parts of the study area.

Soil: Geomorphology, geology, relief, and time are the main factors determining the spatial distribution of soil types in an area. The relationship between runoff and infiltration rates is mainly determined by soil properties [9]. In Fig. 41.3d, coarse loamy soil covers a major portion of the study area.

Landuse/Landcover: Geomorphology, agro-ecology, climate, and human-induced activities are among the factors that determine the landuse/landcover of an area, which in turn regulates the availability and occurrence of groundwater [9]. Five LULC classes were identified through supervised image classification.

41.6 Results Obtained from Knowledge-Based GIS Assessment

Six thematic maps (geology, soil, elevation, drainage density, slope, and landuse/landcover) were classified and weighted to generate the groundwater potential map, and areas with very low, low, moderate, high, and very high groundwater potential were identified. In the knowledge-based technique, geology was taken as the most influential factor for delineating groundwater potential, followed by soil, elevation, slope, river density, and landuse/landcover (Table 41.1). This technique shows a high groundwater potential zone in the southern part of the study area, with potential decreasing towards the north (Fig. 41.4).


Fig. 41.3 a Drainage density map; b geology map; c slope map; d soil map; e landuse/landcover map

Table 41.1 Weights which have been assigned in knowledge-based GIS method for determining groundwater potential zones

| Raster | Weight |
|---|---|
| Landuse/landcover | 1 |
| Elevation | 4 |
| Slope | 3 |
| River density | 2 |
| Soil | 5 |
| Geology | 6 |

Fig. 41.4 Groundwater potential map (knowledge base)

41.7 Results Obtained from AHP-Based GIS Assessment

Six thematic maps (geology, soil, elevation, drainage density, slope, and landuse/landcover) were classified and weighted to generate the groundwater potential map, and areas with very low, low, moderate, high, and very high groundwater potential were identified. In the AHP-based technique, each factor was weighted according to its importance to obtain the priority weights. Geology has the highest weight, followed by soil, elevation, slope, drainage density, and landuse/landcover (Table 41.2). The AHP-based groundwater potential map shows the highest groundwater potential in the southern part of the area and the lowest in the northwestern part, because of the higher porosity and permeability of sedimentary rocks (Fig. 41.5).

Table 41.2 Weights which have been assigned in AHP-based GIS method for determining groundwater potential zones

| Raster | Weight |
|---|---|
| Landuse/landcover | 0.05 |
| Elevation | 0.15 |
| Slope | 0.09 |
| River density | 0.06 |
| Soil | 0.25 |
| Geology | 0.40 |

Fig. 41.5 Groundwater potential map (analytic hierarchy process)
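The AHP-based overlay can be sketched as a weighted sum of reclassified layers followed by slicing into five potential classes. A minimal sketch follows, assuming every layer has already been reclassified onto a common score (higher = more favourable for groundwater) and that the five classes are equal-interval slices; the chapter does not state its classification method, so that choice and the sample rasters are hypothetical. The weights are transcribed from Table 41.2; passing the Table 41.1 ranks instead gives the knowledge-based variant.

```python
import numpy as np

# Priority weights transcribed from Table 41.2 (AHP)
AHP_WEIGHTS = {"geology": 0.40, "soil": 0.25, "elevation": 0.15,
               "slope": 0.09, "river_density": 0.06, "lulc": 0.05}
# Ranks transcribed from Table 41.1 (knowledge-based)
KB_WEIGHTS = {"geology": 6, "soil": 5, "elevation": 4,
              "slope": 3, "river_density": 2, "lulc": 1}
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def weighted_overlay(scores, weights):
    """Cell-by-cell sum of weight * score over all layers named in `weights`."""
    return sum(weights[name] * np.asarray(scores[name], dtype=float) for name in weights)


def classify_equal_interval(index, n_classes=5):
    """Slice the index range into equal-width bins; returns codes 0..n_classes-1."""
    edges = np.linspace(index.min(), index.max(), n_classes + 1)
    return np.digitize(index, edges[1:-1])


# Hypothetical scores (1 = poor ... 5 = very good) for a 2 x 2 area
scores = {name: np.array([[1, 2], [3, 5]]) for name in AHP_WEIGHTS}
scores["geology"] = np.array([[2, 2], [5, 5]])
gwp = weighted_overlay(scores, AHP_WEIGHTS)
codes = classify_equal_interval(gwp)
print(gwp)
print([[CLASSES[c] for c in row] for row in codes])
```

Equal-interval slicing is only one option; natural-breaks or quantile classification would move the class boundaries and therefore the areas assigned to each potential zone.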

41.8 Results Obtained from Fuzzy AHP-Based GIS Assessment

Six thematic maps (geology, soil, elevation, drainage density, slope, and landuse/landcover) were classified and weighted to generate the groundwater potential map, and areas with very low, low, moderate, high, and very high groundwater potential were identified. Fuzzy AHP, an extended form of AHP, was used to weight each factor according to its importance and obtain normalized weights (Table 41.3). The result obtained with this technique is similar to that of AHP (Fig. 41.6).

Table 41.3 Weights which have been assigned in fuzzy AHP-based GIS method for determining groundwater potential zone

| Raster | Weight |
|---|---|
| Landuse/landcover | 0.04 |
| Elevation | 0.17 |
| Slope | 0.12 |
| River density | 0.10 |
| Soil | 0.23 |
| Geology | 0.35 |

Fig. 41.6 Groundwater potential map (fuzzy analytic hierarchy process)
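The chapter does not specify which fuzzy AHP variant produced Table 41.3. As one common option, Buckley's geometric-mean method with triangular fuzzy numbers (TFNs) is sketched below; this is a swapped-in illustration, not the authors' procedure, and the 3 × 3 fuzzy judgement matrix is hypothetical. Each judgement is a TFN (l, m, u); row-wise fuzzy geometric means are normalised into fuzzy weights, defuzzified by the centroid (l + m + u)/3, and renormalised.

```python
import numpy as np


def tfn_reciprocal(t):
    """Reciprocal of a triangular fuzzy number (l, m, u) is (1/u, 1/m, 1/l)."""
    l, m, u = t
    return (1.0 / u, 1.0 / m, 1.0 / l)


def fuzzy_ahp_buckley(tfn):
    """Crisp weights from an (n, n, 3) matrix of TFN pairwise judgements."""
    tfn = np.asarray(tfn, dtype=float)
    n = tfn.shape[0]
    r = np.prod(tfn, axis=1) ** (1.0 / n)   # fuzzy geometric mean of each row, shape (n, 3)
    total = r.sum(axis=0)                    # (sum of l, sum of m, sum of u)
    # Fuzzy division r_i / total: the lower bound uses the largest total, the upper the smallest
    fuzzy_w = np.column_stack([r[:, 0] / total[2], r[:, 1] / total[1], r[:, 2] / total[0]])
    crisp = fuzzy_w.mean(axis=1)             # centroid defuzzification (l + m + u) / 3
    return crisp / crisp.sum()               # renormalise so the weights sum to 1


# Hypothetical judgements for geology, soil and slope
one = (1.0, 1.0, 1.0)
g_s, g_sl, s_sl = (1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (1.0, 2.0, 3.0)
matrix = [
    [one,                  g_s,                  g_sl],
    [tfn_reciprocal(g_s),  one,                  s_sl],
    [tfn_reciprocal(g_sl), tfn_reciprocal(s_sl), one],
]
print(fuzzy_ahp_buckley(matrix).round(3))
```

When every TFN collapses to a crisp value (l = m = u), the method reduces to the row geometric-mean approximation of ordinary AHP, which is a useful sanity check.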


41.9 Results Obtained from Proximity-Based GIS Assessment

Proximity analysis is an analytical technique used to determine the relationship between selected points and their neighborhood. A comparative analysis was performed using the results of the three techniques: AHP, fuzzy AHP, and knowledge-based. The proximity impact of Mawphlang Dam on the surrounding area can be analyzed from these results: most of the areas located south of the dam have greater groundwater potential (Tables 41.4, 41.5 and 41.6; Figs. 41.7, 41.8 and 41.9).

Table 41.4 Impact of proximity (AHP)

| Categories | Criterion | Area covered (ha) | % of study area coverage |
|---|---|---|---|
| Proximity map (AHP) | 2.5046–3.7219 | 9.602 | 11.99 |
| | 3.7219–4.6300 | 19.015 | 23.75 |
| | 4.6300–5.3643 | 21.81 | 27.24 |
| | 5.3643–6.0792 | 22.261 | 27.81 |
| | 6.0792–7.4317 | 7.352 | 9.18 |

Table 41.5 Impact of proximity (fuzzy AHP)

| Categories | Criterion | Area covered (ha) | % of study area coverage |
|---|---|---|---|
| Proximity map (fuzzy AHP) | 3.9562–5.9292 | 4.386 | 5.47 |
| | 5.9292–6.9289 | 22.314 | 27.87 |
| | 6.9289–7.9286 | 20.057 | 25.05 |
| | 7.9286–8.7705 | 26.042 | 32.53 |
| | 8.7705–10.6647 | 7.34 | 9.16 |

Table 41.6 Impact of proximity (knowledge based)

| Categories | Criterion | Area covered (ha) | % of study area coverage |
|---|---|---|---|
| Proximity map (knowledge based) | 44.9385–68.9023 | 11.65 | 14.55 |
| | 68.9023–87.0818 | 23.642 | 29.53 |
| | 87.0818–104.8480 | 14.45 | 18.04 |
| | 104.8480–120.9616 | 24.484 | 30.59 |
| | 120.9616–150.2967 | 5.808 | 4.75 |
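The area and percentage columns of Tables 41.4 to 41.6 follow from counting the cells of each result raster that fall in each criterion range. A minimal sketch follows, assuming a 30 m cell (0.09 ha, typical of Landsat-derived rasters but not stated in the chapter) and hypothetical sample values; the percentage helper is checked against the Table 41.4 areas.

```python
import numpy as np


def class_areas(index, breaks, cell_area_ha=0.09):
    """Area (ha) of `index` cells in each range [b_i, b_i+1); the last range is closed."""
    counts, _ = np.histogram(index, bins=breaks)
    return counts * cell_area_ha


def percent_coverage(areas_ha):
    """Share of the total area, in percent, covered by each class."""
    areas = np.asarray(areas_ha, dtype=float)
    return 100.0 * areas / areas.sum()


# Criterion ranges and areas transcribed from Table 41.4 (proximity map, AHP)
breaks_41_4 = [2.5046, 3.7219, 4.6300, 5.3643, 6.0792, 7.4317]
areas_41_4 = [9.602, 19.015, 21.81, 22.261, 7.352]
print(np.round(percent_coverage(areas_41_4), 2))  # agrees with Table 41.4 to within about 0.01

# Hypothetical index values for six cells of an AHP proximity raster
sample = np.array([2.6, 3.0, 4.0, 5.0, 5.5, 7.0])
print(class_areas(sample, breaks_41_4))           # 0.18, 0.09, 0.09, 0.09, 0.09 ha
```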

Fig. 41.7 Proximity map (AHP)

Fig. 41.8 Proximity map (fuzzy AHP)

Fig. 41.9 Proximity map (knowledge base)

41.10 Results Obtained by Adding the Result Maps of All Techniques (Knowledge Based, AHP, Fuzzy AHP)

Delineated groundwater potential varies from one geospatial technique to another. Fig. 41.10 shows the average map, obtained by adding up all the result layers in the raster calculator and dividing by the total number of layers.
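The averaging step is a cell-by-cell mean of the three result rasters. A minimal NumPy equivalent of the raster-calculator expression is sketched below; the input arrays are hypothetical and assumed to be co-registered with the same cell size.

```python
import numpy as np


def average_map(*rasters):
    """Cell-by-cell mean: (r1 + r2 + ... + rn) / n, as in a raster-calculator expression."""
    stacked = np.stack([np.asarray(r, dtype=float) for r in rasters])
    return stacked.sum(axis=0) / len(rasters)


# Hypothetical 1 x 2 result rasters from the three techniques
kb = [[3.0, 6.0]]
ahp = [[6.0, 9.0]]
fuzzy_ahp = [[0.0, 3.0]]
print(average_map(kb, ahp, fuzzy_ahp))  # [[3. 6.]]
```

A plain average implicitly weights each input by its numeric range, so if one technique's output spans much larger values than the others (compare the knowledge-based criterion ranges in Table 41.6 with those in Tables 41.4 and 41.5), that layer dominates the result; rescaling each raster to a common range before averaging is one option.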

Fig. 41.10 Average map

41.11 Summary, Conclusion and Recommendations

Elevation, slope, geology, soil, drainage density and land use/land cover are some of the thematic layers prepared for delineating the groundwater potential zones in the present study. Knowledge-based weighted overlay, AHP, fuzzy AHP and proximity analysis were used to produce the groundwater potential zonation map of the study area. The study area is divided into five groundwater potential (GWP) zones: very low, low, moderate, high and very high. Very low and low GWP zones are found mostly on steep slopes and at high elevations. Moderate GWP corresponds to gentle slopes and lower elevations. High to very high GWP occurs on very gentle slopes with very low topographical features.

Slope plays a major role in determining groundwater potential. Most areas on steeper slopes have been observed to have lower groundwater potential than areas on gentle slopes, because gentler slopes produce less runoff and steep slopes produce more [2]. The type of soil in an area is also a dominant factor, both for groundwater recharge through percolation and for loss through runoff. Soil type and porosity determine how much water a soil can hold and how readily water infiltrates it (Godebo, 2005). Infiltration is slower in clayey soil than in sandy soil because clay has a low absorption capability [2]. Geology is one of the main factors that determine groundwater storage. Sedimentary rocks such as sandstone and limestone are good aquifers, whereas metamorphic and igneous rocks have very low porosity. Higher drainage density results in lower groundwater potential because surface runoff is rapid. Lower drainage density results in higher groundwater potential because surface runoff is slow.

Groundwater mapping using GIS and remote sensing has advanced greatly in the field of geology [2]. These techniques provide accurate measurements and more accurate data, which eases the scaling process. Water is key to the survival of all living organisms. Water demand has increased tremendously over the years because of a growing population, while at the same time water has been misused and wasted. Water is one of the most precious and indispensable resources and therefore needs to be conserved. By applying GIS in water resource management, various techniques can be implemented over the study area, and potential zones can be marked and evaluated for effective future development in managing water resources.

Acknowledgements The author would like to thank Dr. Alok Bhushan Mukherjee and Prof. S. Dey, Department of Geography, North-Eastern Hill University, Shillong, for their meticulous supervision, affectionate advice, support and encouragement.

References
1. Nilawar, A.L.: Identification of groundwater potential zone using remote sensing and GIS technique. Shri Guru Gobind Singhji Institute of Engineering and Technology 3, 1 (2014)
2. Adeyeeye, O.A., Ikopokonte, E.A., Arabi, S.A.: GIS based groundwater potential mapping within this Dengri area, North Central Nigeria. 1, 5, 7–8 (2018)
3. Ryngnga, P.K.: Land evaluation for landuse planning: a case of Mawphlang, East Khasi Hills Meghalaya. vol. 2, Issue 2, pp. 34
4. Balasubramanian, A.: Meghalaya: At a Glance, pp. 2 (2017)
5. Whitaker, R.: The analytic hierarchy process: what it is and how it is used, vol. 9 (1987)
6. Wang, Y.M., Chin, K.S.: Fuzzy analytic hierarchy process: a logarithmic fuzzy preference programming methodology (2010)
7. Thapa, R., Gupta, S., Guin, S., Kaur, H.: Assessment of groundwater potential zones using multi influencing factor and GIS: a case study from Birbhum District, West Bengal. pp. 10, May 2017
8. Razavi Termeh, S.V., Sadeghi-Niaraki, A., Choi, S.M.: Groundwater potential mapping using an integrated ensemble of three bivariate statistical models with random forest and logistic model tree model. pp. 5 (2019)
9. Hussein, A.A., Govindu, V., Nigusse, A.G.M.: Evaluation of groundwater potential using geospatial techniques. pp. 5, 9 (2016)

Chapter 42

COVID-19: Geospatial Analysis of the Pandemic—A Case Study of Bihar State, India, Using Data Derived from Remote Sensing Satellites and COVID-19 National Geoportal

Pallavi Kumari, Richa Sharma, and Virendra Singh Rathore

Abstract A sizeable part of Bihar's population, especially its skilled and unskilled labourers, lives and works across the nation. These workers serve as a human resource for nation building in various sectors across the length and breadth of India. After the World Health Organization (WHO) declared the spread of the COVID-19 virus a pandemic, a nationwide lockdown was imposed from 23 March 2020 to curb its spread. These daily wage workers were left stranded far and wide without any resources, and transport services were initially withdrawn to check the spread. Once conditions allowed the migrant labourers to return to their native state of Bihar, there was fear of a disaster: the returning migrant labour force could become a vector for the spread of COVID-19 in the state. The state government, in consonance with the Central Government, laid down strict protocols for the returning migrants. The present paper statistically analyses the spread of the pandemic after the migrant population began returning to their home state. It investigates whether the influx of so many people from various parts of the country created infectious hot spots of the virus. Because many researchers were trying to correlate atmospheric nitrogen dioxide (NO2) with the spread of the COVID-19 virus, the paper also examines the amount of NO2 over the study area on the day when the maximum number of cases was reported. The atmospheric NO2 values used in this paper were derived from satellite data, and a time series analysis of these data was performed. This analysis identified the peak day and the day when NO2 levels were at their minimum. Incidentally, the number of COVID-19 cases reported synchronized with the NO2 levels in the atmosphere. Spatial autocorrelation was computed using Moran's I test on these two days. The resulting values showed no hot spots and indicated that the virus had spread in a dispersed manner.

P. Kumari · R. Sharma (✉) · V. S. Rathore
Department of Remote Sensing, Birla Institute of Technology, Mesra, Ranchi, India
R. Sharma
Birla Institute of Technology, Mesra, Ranchi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_42

P. Kumari et al.

42.1 Introduction

Many research articles have analysed the spread of the COVID-19 pandemic and its related causes. Early work tracked the number of cases online and through mobile applications. Maged and Estella [1] then began spatially mapping the spread of the disease. Terms such as cumulative incidence rate (CIR) and cumulative mortality rate (CMR) were coined [2]. Satellite-based NO2 data were examined to determine the effect of the lockdown on atmospheric gases [3, 4]. Later studies assessed remote sensing techniques for monitoring NO2 and related gases in the atmosphere. Authors [5, 6] and, eventually, many other researchers reported evidence that aerosol transmission of COVID-19 was possible. The present study combines satellite-derived atmospheric nitrogen dioxide data with data from two sources: the COVID-19 Dashboard of BHUVAN, the Indian Geo-Platform of the Indian Space Research Organization (ISRO), and the national portal (www.COVID19.india.org). A time series analysis of NO2 identifies the peak day and the day when NO2 levels were at their minimum. Incidentally, the number of COVID-19 cases also synchronized with the NO2 levels in the atmosphere. Moran's I spatial autocorrelation test was then applied on these two days, and the results were analysed.

42.2 Study Area and Methodology

42.2.1 Study Area

Bihar lies in the eastern region of India between latitudes 24°20′10″N and 27°31′15″N and longitudes 82°19′50″E and 88°17′40″E. It is entirely landlocked and lies in a subtropical region of the temperate zone. The state has 38 districts, which are further subdivided into 534 blocks. Bihar is situated in the Indo-Gangetic Plain and is rich in water resources, with trellis and dendritic drainage patterns. Because the region is water-rich and has fertile soil, agriculture is the mainstream sector of the state's economy and accounts for about three-fourths of its economic output. Partly because of this fertile land, the population density is 1106 persons per sq km (Census of India 2011), which makes Bihar the third most densely populated state of the country. The pattern of movement of the migrant population has its own season and direction, and it changes

42 COVID-19: Geospatial Analysis of the Pandemic …


Fig. 42.1 Study area: Bihar State of India

from time to time [7]. The underlying fact, however, is that labourers migrate for their own reasons, and these reasons change with time (Fig. 42.1).

42.2.2 Data Used

The following data sets were used in the current study. NO2 data were taken from the NASA Aura NO2 tropospheric vertical column density product for the period 15–17 Jan 2018 to 15–17 Oct 2020. The Aura satellite studies the Earth's ozone layer, air quality and climate. The data were downloaded from NASA's Giovanni website. OMI NO2 data for 2018 to 2020 were downloaded and pre-processed, and the download specification is given in Table 42.1. The data were interpolated over the study area and month-wise average raster values were extracted to points. A time series plot was then obtained using the methodology shown in Fig. 42.2.

Table 42.1 Specification for downloading NO2 data

Variable | Units | Source | Temporal resolution | Spatial resolution | Begin date | End date
NO2 Total Column (30% cloud screened) (OMNO2d V003) | 1/cm² | OMI | Daily | 0.25° | 2004-10-01 | 2020-11-09
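The month-wise averaging and the search for the peak and minimum days can be sketched with the Python standard library alone. The sketch assumes the daily OMI values have already been extracted to points and stored as a hypothetical mapping from ISO date strings to NO2 column values (1/cm², as in Table 42.1). None marks cloud-screened or missing days.

```python
from collections import defaultdict
from statistics import mean

def monthly_means(daily):
    """Average daily NO2 values per calendar month ('YYYY-MM'), skipping missing days."""
    groups = defaultdict(list)
    for day, value in daily.items():
        if value is not None:
            groups[day[:7]].append(value)        # 'YYYY-MM-DD' -> 'YYYY-MM'
    return {month: mean(values) for month, values in sorted(groups.items())}

def peak_and_minimum_days(daily):
    """Return (peak_day, minimum_day) over the days that have a valid value."""
    valid = {day: value for day, value in daily.items() if value is not None}
    return max(valid, key=valid.get), min(valid, key=valid.get)

# Illustrative values only, not data from this study
sample = {"2020-04-01": 1.0e15, "2020-04-02": 3.0e15,
          "2020-05-01": 2.0e15, "2020-05-02": None}
print(monthly_means(sample))
print(peak_and_minimum_days(sample))
```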


Fig. 42.2 Methodology adopted for the extraction of satellite data for time series analysis

42.3 Methodology

The methodology adopted to find the hot spots and cold spots is summarized in Fig. 42.3.

Fig. 42.3 Methodology for carrying out Moran's I autocorrelation test on COVID-19 data
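The global Moran's I statistic interpreted in Tables 42.2 and 42.3 can be computed directly. Below is a minimal NumPy sketch. It assumes a district-level attribute vector (for example, case counts) and a spatial weights matrix built beforehand; the four-district contiguity matrix in the example is hypothetical. The z-score uses the standard variance under the normality assumption. GIS tools may instead use the randomization assumption or permutation tests, so their values can differ slightly.

```python
import math
import numpy as np

def morans_i(values, weights):
    """Global Moran's I, its z-score (normality assumption) and two-sided p-value.

    values  : attribute per areal unit, e.g. COVID-19 cases per district
    weights : n x n spatial weights, w[i, j] > 0 for neighbours, zero diagonal
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                       # deviations from the mean
    s0 = w.sum()                           # sum of all weights
    i_stat = (n / s0) * (z @ w @ z) / (z @ z)

    expected = -1.0 / (n - 1)              # E[I] with no spatial autocorrelation
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    z_score = (i_stat - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))   # two-sided normal p-value
    return i_stat, z_score, p_value

# Hypothetical example: four districts in a row, each adjacent to its neighbours
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], w))   # I = 1/3 > E[I] = -1/3: tendency to cluster
print(morans_i([1, 4, 1, 4], w))   # I = -1: dispersed
```

A negative I with a significant z-score points to dispersion, and a positive one points to clustering, matching the reading in Table 42.2.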

Table 42.2 Moran's I scores and their indications

Moran's I score | Distribution (spatial pattern)
−1 | Dispersed
0 | Random
+1 | Clustered

Table 42.3 Summary of the combination of p-values with z-values

Z-score (standard deviations) | p-value (probability) | Confidence level (%)
+1.65

… 0 do
    Compute the loss function g(x*): g(x*) = [max_{k≠t} Z(x*_j)_k] − Z(x*)_t
    Compute gradient descent for g(x*): grad = ∂g(x*_j)/∂x*_j, using the Adam optimizer
    Apply the gradient to the feature variable vector: x*_j = x*_j − (1/η) grad
    Logical OR((x*_j^ones · x*_j > 0.5), x), where x*_j^ones is the array with the shape of x*_j filled with ones
  end while
  Return x*_j
end while

condition for misclassification is g(x) ≤ 0. The adversarial EMR samples crafted by the CW attack are fed to the trained LSTM predictive model, and the resulting misclassifications are tabulated in Table 50.2 for the various LSTM variants considered. For the CW attack on the EMR dataset, accuracy drops from 97.11% to 1.86% for the L2 norm and to 5.34% for the L∞ norm. The attack is strong and successful: the adversarial samples it crafts confuse the trained model and cause misclassification.
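The loss and the binarization step of the algorithm above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation. It assumes t is the target class and that EMR features are binary. It reads the "Logical OR" step as thresholding the relaxed adversarial vector at 0.5 and OR-ing the result with the original record x, so features present in x are never removed. The gradient step itself (Adam on an LSTM) is omitted.

```python
import numpy as np

def cw_loss(logits, target):
    """g(x*) = max_{k != t} Z(x*)_k - Z(x*)_t for logits Z(x*).

    g(x*) <= 0 means the target class t already has the largest logit,
    which is the misclassification condition stated in the text.
    """
    z = np.asarray(logits, dtype=float)
    best_other = np.max(np.delete(z, target))   # largest logit among k != t
    return best_other - z[target]

def project_to_binary(x_adv, x_orig, threshold=0.5):
    """Threshold the relaxed features and OR them with the original record (assumed reading)."""
    switched_on = np.asarray(x_adv, dtype=float) > threshold
    return np.logical_or(switched_on, np.asarray(x_orig, dtype=bool)).astype(int)

print(cw_loss([2.0, 0.5, 3.0], target=2))   # negative: class 2 already wins
print(project_to_binary([0.7, 0.2, 0.4], [0, 0, 1]))
```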

50.4 Handling Adversarial Attacks

The proposed method uses explainable AI (XAI) techniques to understand how perturbed samples affect the model's predictions. A model used in sensitive areas such as healthcare must be reliable and trustworthy: it should produce consistent, accurate results and also explain them. XAI can be described as machine learning techniques that build models which both perform well and are interpretable. An explainable model earns more of its users' trust by offering a detailed analysis.

50 SafeXAI to Detect Adversarial Attacks in EMR


50.4.1 Local Interpretable Model-Agnostic Explanations (LIME)

Perturbation-based feature importance is one example of a post-hoc method. LIME is such a perturbation-based tool. It explains a model by highlighting the salient features of an input that lead to the result, and it does so by building locally linear models around the predictions of an opaque model. Its explanations therefore count both as explanations by simplification and as local explanations. LIME varies the input repeatedly, observes the changes in the predictions and derives an explanation from them. The local explanation that LIME produces for an instance is given by:

$$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

where ξ(x) is the explanation of the instance x, and g is the simple interpretable model. G is the set of interpretable models, such as linear models and decision trees. f is the complex medical diagnosis classifier. π_x is the similarity kernel for instance x. L(f, g, π_x) measures how unfaithfully g approximates f in the neighbourhood defined by π_x. Ω(g) is the complexity of the interpretable model g, which must be kept small.

Algorithm 50.2 Local Explanations using LIME
Inputs: complex medical diagnosis classifier f; number of Electronic Medical Record (EMR) samples N; EMR instance x and its interpretable version x′; N perturbed samples z′ around x′; similarity kernel π_x; number of features to be explained K

Z ← {}
for i in z′ do
    π_x(i) ← weight i according to the similarity score between i and x′
    Z ← coefficients of g(i, f(i), π_x(i))
end for
w ← K most contributing coefficients, K-Lasso(Z, K)
return w

Here, f is the complex medical diagnosis classifier and g is the simple interpretable model, which could be a linear model or a decision tree.

LIME for Medical Informatics. LIME takes as input the medical features extracted from an EMR sample. It then generates a set of perturbed data points and obtains the black-box model's predictions for them; this black-box model is the one being interpreted. For each sample in the dataset, LIME fits a linear interpretable model on the perturbed points, weighting each point by its distance to the original sample. The fitted weights show how each feature drives the model's prediction. This explains why the model assigned a specific sample to a specific target class. Figure 50.2 shows the top features contributing to each class, along with the predicted class probability, for a natural EMR sample. Fig. 50.3 shows the same for its adversarial variant.
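The perturb, weight and fit loop described above can be sketched from scratch in NumPy for binary EMR features. This is a simplified illustration, not the `lime` library. The exponential kernel on the fraction of removed features and its width are assumptions. A weighted least-squares fit followed by picking the K largest coefficients stands in for K-Lasso.

```python
import numpy as np

def lime_explain(f, x, num_samples=500, kernel_width=0.75, k=3, seed=0):
    """LIME-style local explanation of black-box f around a binary record x.

    f maps an (m, d) array of records to an (m,) array of class probabilities.
    Returns the indices of the k most influential features and their weights.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.size
    # z': random masks that keep (1) or switch off (0) each feature of x
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1                                  # include x itself
    preds = f(masks * x)                          # black-box predictions f(z)
    # pi_x(z): similarity kernel, 1 for x and decaying as features are removed
    distance = 1.0 - masks.mean(axis=1)
    pi = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted least squares for the local surrogate g(z') = w0 + w . z'
    design = np.hstack([np.ones((num_samples, 1)), masks])
    sqrt_pi = np.sqrt(pi)
    coef, *_ = np.linalg.lstsq(design * sqrt_pi[:, None], preds * sqrt_pi, rcond=None)
    weights = coef[1:]                            # drop the intercept w0
    top = np.argsort(-np.abs(weights))[:k]        # K most contributing features
    return top, weights[top]

def black_box(z):
    """Hypothetical classifier: probability driven by features 0 and 2 only."""
    return 0.6 * z[:, 0] + 0.3 * z[:, 2]

top, weights = lime_explain(black_box, [1, 1, 1, 1], k=2)
print(top, weights)
```

Comparing the top features returned for a natural record and for its adversarial variant is the kind of contrast shown in Figs. 50.2 and 50.3.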


S. Selvaganapathy et al.

Fig. 50.2 LIME explanation for natural EMR sample

Fig. 50.3 LIME explanation for an adversarial EMR sample

50.5 Conclusion and Future Work

EMR-driven clinical deep learning research is on the rise. This article addresses the problem of adversarial attacks on learning-based medical solutions by deploying XAI techniques, which interpret a model's predictions at local and global scale. The article uses a naive vanilla LSTM model as the baseline and observes the misclassifications caused by adversarial medical records crafted with the CW attack. It then tries to alleviate the perturbations by detecting them: XAI techniques are used to compare the natural sample with the adversarially crafted one. The methodology sounds promising but still has several limitations, which we will address in future work. One aspect to explore is using metrics other than misclassification rate to measure how effective an attack is. Advanced learning techniques that handle time-aware input data, such as GRUs, Tree-LSTMs and peephole LSTMs, can also be considered.

References
1. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
2. Kartoun, U.: Advancing informatics with electronic medical records bots (EMRBots). Softw. Impacts 2, 100006 (2019)
3. Kelly, C.J., Karthikesalingam, A., Suleyman, M., Corrado, G., King, D.: Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17(1), 1–9 (2019)
4. Lipton, Z.C., Kale, D.C., Elkan, C., Wetzel, R.: Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677 (2015)
5. Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recogn. 110, 107332 (2021)
6. Molnar, C.: Interpretable machine learning. Lulu.com (2020)
7. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (Euro S&P), pp. 372–387. IEEE (2016)
8. Papernot, N., McDaniel, P., Swami, A., Harang, R.: Crafting adversarial input sequences for recurrent neural networks. In: MILCOM 2016 - 2016 IEEE Military Communications Conference, pp. 49–54. IEEE (2016)
9. Rahman, A., Hossain, M.S., Alrajeh, N.A., Alsolami, F.: Adversarial examples: security threats to COVID-19 deep learning systems in medical IoT devices. IEEE Internet Things J. (2020)
10. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
11. Wu, D., Fang, W., Zhang, Y., Yang, L., Luo, H., Ding, L., Xu, X., Yu, X.: Adversarial attacks and defenses in physiological computing: a systematic review. arXiv preprint arXiv:2102.02729 (2021)

Chapter 51

Exploring Historical Stock Price Movement from News Articles Using Knowledge Graphs and Unsupervised Learning

Amol Jain, Binayak Chakrabarti, Yashaswi Upmon, and Jitendra Kumar Rout

Abstract Technological advancements such as natural language processing (NLP) and machine learning (ML) have shown their impact on the financial sector in recent years. A major development has been the rapid adoption of online news articles to determine the direction of different companies' stocks. This is the main theme of our paper, which focuses on how Facebook stock reacts to various kinds of news about the company. Our research uses knowledge graphs to link different news items to each other. To extract features, Word2Vec was used to convert the sources, targets and edges of all the knowledge graphs into five-dimensional vectors. Each dimension is treated as a feature. The features were clustered using the K-Means, Mini Batch K-Means, Gaussian Mixture, Birch and DBSCAN clustering algorithms. Each algorithm was evaluated by how distinct its clusters were, and the Gaussian Mixture algorithm formed the best clusters. The model aims to give investors an idea of the effect they can expect on the Facebook stock price from any event relating to the firm.

51.1 Introduction

Stock markets today are highly influenced by external events, including news about the company. Twenty-first-century digital media have had a profound impact on how the market reacts to the ever-changing dynamics of the business world. Studying the vast amount of available news is therefore essential for understanding the stock behavior of various companies [1]. With this in mind, we have created a model that keeps investors apprised of the immediate future trend of the Facebook stock price. This work primarily focuses on the use of knowledge graphs to analyze company news. A firm like Facebook appears in many news stories every day, and different kinds of news affect the stock price with different intensities. Sentiment analysis of individual articles alone is therefore not sufficient. Moreover, many of those articles together describe a single event. Considering the entire event at once, instead of single articles, gives a better understanding of which events lead to what percentage increase or decrease in the stock price. This is where knowledge graphs come into play [2]. A knowledge graph consists of three parts:
– a source, which is the subject of a statement;
– a target, which is the object of the statement;
– an edge joining the source and target, which is the verb linking the subject and the object.
The main strength of a knowledge graph is that statements in a single piece of text that share the same subject share a single source node. This groups similar news together in the graph and eventually forms the entire event. In our work, such a knowledge graph was created for each day's news, and the graphs were then clustered using various clustering techniques to obtain the different types of news related to Facebook. This work adds to the technological innovations in studying stock markets. The effect of various categories of news on the stock price is an interesting phenomenon that is set to draw investors' attention more than ever before. The major contribution of this work is exploring stock behavior using knowledge graphs with a clustering approach. The research aims to add to existing technologies by apprising investors of stock price movements due to financial news. The rest of the paper is organized as follows. Sect. 51.2 reviews related work. The proposed work, including the research architecture, is delineated in Sect. 51.3. The implementation is given in Sect. 51.4, and the results are discussed in Sect. 51.5. Section 51.6 concludes the work and outlines possible future work.

A. Jain · B. Chakrabarti · Y. Upmon · J. K. Rout (✉)
KIIT Deemed to be University, Bhubaneswar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_51

A. Jain et al.
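The node-merging property of knowledge graphs can be illustrated with a small Python sketch. It assumes (source, edge, target) triples have already been extracted from the headlines. The triples below are hand-written for illustration; in practice a dependency parser would produce them. Lower-casing is a simple, assumed normalization so that repeated subjects map to one source node.

```python
from collections import defaultdict

def build_knowledge_graph(triples):
    """Map each source node to its outgoing (edge, target) pairs.

    Statements that share a subject share a single source node, so news items
    about the same entity end up connected through that node.
    """
    graph = defaultdict(list)
    for source, edge, target in triples:
        key = source.strip().lower()                 # assumed normalization
        graph[key].append((edge.strip().lower(), target.strip().lower()))
    return dict(graph)

# Hand-written triples loosely based on the headlines in Table 51.2
triples = [
    ("Facebook", "pulls", "voices for QAnon"),
    ("Facebook", "removes", "misleading Trump campaign post"),
    ("Facebook", "starts", "TikTok rival"),
    ("Facebook gaming", "is available in", "App store"),
]
kg = build_knowledge_graph(triples)
print(len(kg), len(kg["facebook"]))   # 2 source nodes; "facebook" has 3 edges
```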

51.2 Related Work

This paper builds on recent technological advancements applied to stock markets. Many research works in the past few years have moved beyond traditional prediction practices and treated financial news as an indicator of stock price movement. The articles [3–5] focused primarily on technical indicators for predicting the future value of stocks. However, technical indicators alone cannot always accurately foretell future stock prices, because real-world stock prices are highly volatile and easily affected by unforeseen external events. Long et al. [6] used trading behaviors to predict stock price movements in the Chinese stock market. The impact of external factors in the market has also been taken into

51 Exploring Historical Stock Price Movement …


consideration. However, external factors alone are not sufficient to estimate stock prices in non-ideal environments. The models proposed by Nguyen et al. [7] and Wang et al. [8] used sentiment analysis of news for stock price prediction. Knowledge graphs, which are used in our work, analyze news statements better than sentiment analysis because they extract entities and their relationships. Liu et al. [9] suggested an attention-based event relevance model for stock price movement prediction. Ding et al. [1] extracted events from news articles and represented them as dense vectors. They then trained a novel neural tensor network together with a deep convolutional neural network model to determine stock market movements. In contrast, we use knowledge graphs to extract entities and relations for stock price movement analysis. Hu et al. [2] predicted stock trends with a hybrid attention method. We instead combine unsupervised learning with knowledge graphs to categorize news articles for the analysis of stock movement. An extensive literature survey revealed several open issues. We address them by analyzing news articles with a state-of-the-art NLP tool, the knowledge graph, combined with clustering techniques. Each cluster was compared using the average daily return on the dates of the news articles in that cluster. This metric shows the investor what effect a given kind of news article can have on the stock price, instead of relying on an expected future value that might prove inaccurate.

51.3 Proposed Methodology

The proposed methodology consists of the research architecture and explains the detailed stepwise process for data collection, data cleaning, data preprocessing and feature engineering. This section also explains the different clustering techniques used in this work.

51.3.1 Research Architecture

The detailed research layout is given in Fig. 51.1. The steps are as follows:
– The data were collected from the New York Times by searching for keywords such as "Facebook". Web scraping was used to collect relevant news that affects the Facebook stock price. The Facebook stock price was also imported from Yahoo into our Jupyter notebook.
– The raw data extracted from the New York Times were cleaned by extracting the dates and titles separately.


– Feature Selection: The knowledge graphs constructed from the titles were vectorized using Word2Vec, a model that converts each word into a vector such that related words have vectors close to each other. These directed graphs contain nodes (entities) and edges (relations) connecting them. In Fig. 51.1, the starting node of each edge is termed the "source", the destination node is labeled the "target", and the connection itself is termed the "edge". Since the source, target and edge are words, their embeddings were generated using Word2Vec. Each source, edge and target of all the graphs was converted into a five-dimensional vector, and each dimension is a new feature. The features of all the source, target and edge words were collected together to form the final data frame.

Fig. 51.1 Layout of proposed architecture


– Clustering Technique: The clustering techniques used are K-Means, Mini Batch K-Means, Birch, DBSCAN and the Gaussian Mixture algorithm.
• K-Means: Partitions the population into K sets so as to minimize the variance within each cluster.
• Birch: Balanced Iterative Reducing and Clustering using Hierarchies extracts cluster centroids from a tree structure that it builds.
• DBSCAN: Density-Based Spatial Clustering of Applications with Noise finds high-density regions in the feature space and expands clusters from them.
• Mini Batch K-Means: A modified version of K-Means that updates the cluster centroids using mini-batches of samples instead of the whole data set.
• Gaussian Mixture Model: Summarizes a multivariate probability density function as a mixture of Gaussian probability distributions.
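The feature construction described under Feature Selection can be sketched as follows. The sketch assumes an already trained embedding lookup. For example, the `wv` vectors of a gensim `Word2Vec` model trained with `vector_size=5` support the same `in` and indexing operations. The toy vectors below are made up. Averaging the word vectors of multi-word node labels is also an assumption, since the text does not say how multi-word labels were handled.

```python
import numpy as np

DIM = 5  # five-dimensional embeddings, as in the text

def label_vector(label, embeddings):
    """Average the vectors of the known words in a node or edge label (assumption)."""
    vectors = [embeddings[w] for w in label.lower().split() if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(DIM)

def triple_features(triples, embeddings):
    """One row per (source, edge, target) triple: 5 + 5 + 5 = 15 features."""
    return np.vstack([
        np.concatenate([label_vector(s, embeddings),
                        label_vector(e, embeddings),
                        label_vector(t, embeddings)])
        for s, e, t in triples
    ])

# Made-up 5-dimensional vectors, for illustration only
toy = {w: np.full(DIM, float(i))
       for i, w in enumerate(["facebook", "removes", "post", "starts", "rival"])}
X = triple_features([("Facebook", "removes", "post"),
                     ("Facebook", "starts", "TikTok rival")], toy)
print(X.shape)   # (2, 15)
```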

51.3.2 Data Information

We extracted one year of news headlines related to "Facebook" published in the New York Times between August 2019 and August 2020 [10]. The database consists of a large number of unique headlines and their corresponding publication dates. We assume that news headlines accurately represent the content of the news [11]. The Facebook-related headlines were used to analyze the past behavior of the Facebook stock price.

51.3.3 Data Preprocessing

A web crawler was built using Requests, Beautiful Soup and Selenium to parse and extract the data. Selenium automated clicks on the website's Load More button. The Text Edition news was extracted from the New York Times because it is more accurate and is published only after verification. The extracted data were unstructured and were preprocessed using appropriate techniques. Table 51.1 shows the data as extracted directly from the website, and Table 51.2 shows the data after preprocessing. The key visible difference is that the preprocessed data have the date and title in separate, structured fields.


Table 51.1 Samples of extracted data from Web site

Sl. No | Title
0 | Facebook pulls voices for QAnon from its site—August 20, 2020, Page B1
1 | Facebook removes misleading Trump campaign post—August 6, 2020, Page B4
2 | Facebook starts a TikTok Rival: Instagram Reels—August 6, 2020, Page B4
3 | After many attempts, Facebook gaming is finally available in the App store—August 8, 2020, Page B4

Table 51.2 Samples of data after preprocessing

Sl. No. | Title | Date
1 | Facebook pulls voices for QAnon from its site | 2020-08-20
2 | Facebook removes misleading Trump campaign post | 2020-08-06
3 | Facebook starts a TikTok rival: Instagram Reels | 2020-08-06
4 | After many attempts, Facebook gaming is finally available in the App store | 2020-08-08
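The cleaning step that turns Table 51.1 into Table 51.2 can be sketched with a regular expression. The pattern assumes the raw layout seen in Table 51.1: the title, a dash separator, a "Month D, YYYY" date, then a page reference. Other layouts would need their own rules, and unmatched rows are returned as None for inspection.

```python
import re
from datetime import datetime

# Raw layout assumed from Table 51.1: "<title><dash><Month D, YYYY>, Page <page>"
RAW_HEADLINE = re.compile(
    r"^(?P<title>.+?)\s*\u2014\s*(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}), Page \w+$"
)

def split_headline(raw):
    """Return (title, ISO date) or None if the row does not match the layout."""
    match = RAW_HEADLINE.match(raw.strip())
    if match is None:
        return None
    date = datetime.strptime(match["date"], "%B %d, %Y").date().isoformat()
    return match["title"], date

print(split_headline("Facebook removes misleading Trump campaign post\u2014August 6, 2020, Page B4"))
```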

Fig. 51.2 Facebook stock price

51.4 Implementation

The Facebook stock price was plotted to show the general trend of the stock over the period considered (Fig. 51.2). The five clustering algorithms mentioned earlier were used to cluster the knowledge graph embeddings, and three of them gave decent results. The clusters created by each algorithm were compared using the average daily return of each cluster of knowledge graphs. Average values that were far apart indicated good cluster formation. The number of clusters for each algorithm was chosen using the silhouette coefficient. For each point, a is the mean distance to the other points in its own cluster, and b is the mean distance to the points of the nearest other cluster. The point's silhouette is (b − a)/max(a, b), and the silhouette score is the mean of this value over all points. The features are dimensions of the vectors of the knowledge graphs' sources, edges and targets, and words with similar meanings have vectors close to each other. The silhouette coefficient is therefore a good metric for choosing how many clusters to feed into the model. The number of clusters with the highest silhouette score was chosen as the hyperparameter for the clustering algorithms. Plotting the silhouette coefficient against the number of clusters showed that 3 clusters give the highest score, so 3 is the optimal number. Figs. 51.3, 51.4 and 51.5 show these plots for the Gaussian Mixture, K-Means and Mini Batch K-Means algorithms, respectively.

Fig. 51.3 Silhouette coefficient for Gaussian Mixture algorithm

Fig. 51.4 Silhouette coefficient for K-Means algorithm
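The selection rule described in Sect. 51.4 picks the number of clusters with the highest silhouette score. It can be sketched from scratch in NumPy. K-Means with k-means++ seeding stands in for the clustering step here. The candidate range 2 to 6, the seed and the toy data are assumptions. A library such as scikit-learn would normally supply both the clustering and the score.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-Means with k-means++ seeding; returns a label per row of X."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed drawn with probability proportional to squared distance
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def silhouette_score(X, labels):
    """Mean over points of (b - a) / max(a, b)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0                                     # undefined for one cluster
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)                          # usual singleton convention
            continue
        a = D[i, own].sum() / (own.sum() - 1)           # D[i, i] = 0 is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

def choose_k(X, candidates=range(2, 7)):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette_score(X, kmeans(X, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Three well-separated toy blobs standing in for the 15-dimensional features
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1]])
X = np.vstack([offsets + c for c in ([0, 0], [10, 10], [20, 0])])
best, scores = choose_k(X)
print(best, scores)
```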

51.5 Results The clusters formed by the Gaussian Mixture model were the best of all the algorithms we used, followed by K-Means and Mini Batch K-Means; their results are shown in Table 51.3. The performance of DBSCAN and Birch clustering, however, was not satisfactory. DBSCAN underperformed because, as a density-based clustering technique, it could not differentiate the closely located word embedding features: it took the entire dense region of word embeddings as one cluster, which was not what we needed. Birch underperformed because it is designed to summarize large data sets; our data set was not large enough to summarize and cluster, so the model formed only one cluster of the entire data.

The label column of Table 51.3 gives the cluster labels formed by the three algorithms, and the adjacent column, Pct_change, shows the average daily percentage change of the stock price on the days whose news articles were assigned to each label. Using the Gaussian Mixture model, we clustered the news into three groups: the average daily return was 0.296% for group 0, 0.18% for group 1, and −0.445% for group 2. Mini Batch K-Means and K-Means also gave three clusters each. For Mini Batch K-Means, the average daily return was 0.16% for group 0, −0.35% for group 1, and 0.175% for group 2. For K-Means, it was 0.42% for group 0, −0.19% for group 1, and 0.32% for group 2.

A. Jain et al.

Fig. 51.5 Silhouette coefficient for Mini Batch K-Means algorithm

Table 51.3 Comparison of results for different algorithms (daily percentage change per cluster label)
Label | Gaussian Mixture | Mini Batch K-Means | K-Means | Birch | DBSCAN
0 | 0.296121 | 0.163985 | 0.425305 | 0.335976 | 0.344567
1 | 0.180012 | 0.351533 | −0.190920 | 0.312897 | 0.321658
2 | −0.445320 | 0.175005 | 0.324976 | 0.108456 | 0.239875
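Comparing clusters by their average daily return amounts to a group-by over dates. The sketch below assumes two hypothetical dictionaries keyed by date string, one holding each date's cluster label and one holding that day's percentage change (the Pct_change column); it illustrates the computation and is not the authors' pipeline. With pandas, the same step would be a `groupby` on the label column followed by `mean()`.

```python
from collections import defaultdict
from typing import Dict, Hashable


def mean_return_per_cluster(labels: Dict[str, Hashable],
                            pct_change: Dict[str, float]) -> Dict[Hashable, float]:
    """Average daily percentage change of the dates assigned to each cluster label."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for date, label in labels.items():
        if date in pct_change:  # skip dates with no trading-day return
            sums[label] += pct_change[date]
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def spread(means: Dict[Hashable, float]) -> float:
    """Gap between the highest and lowest cluster mean; a wider gap means more distinct clusters."""
    return max(means.values()) - min(means.values())
```

The spread gives a single number for the informal criterion used above, that cluster averages lying far apart indicate good cluster formation.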

51.6 Conclusion and Future Work We have thus observed how different categories of news affect the stock price of Facebook. Each day's news articles were grouped together, a knowledge graph was created from them, and the knowledge graphs were then embedded and clustered. The dates were categorized by the cluster labels assigned to their news, and the clusters were compared by the average daily percentage change of the stock price on the dates in each cluster. We envisage that this will prompt investors to check the impact of different kinds of company-related events on the company's daily stock returns. Our work has focused on one company. The approach can also be applied to companies in other domains, since companies in different domains will have different classes of news affecting their stock prices.

References
1. Ding, X., Zhang, Y., Liu, T., Duan, J.: Deep learning for event-driven stock prediction. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 2327–2333 (2015)
2. Hu, Z., Liu, W., Bian, J., Liu, X., Liu, T.Y.: Listening to chaotic whispers: a deep learning framework for news-oriented stock trend prediction. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 261–269. Marina Del Rey, CA, USA (2018)
3. Naik, N., Mohan, B.R.: Stock price movements classification using machine and deep learning techniques: the case study of Indian stock market. In: Proceedings of the International Conference on Engineering Applications of Neural Networks, pp. 445–452. Springer, Cham (2019)
4. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 42(1), 259–268 (2015)
5. Nabipour, M., Nayyeri, P., Jabani, H., Mosavi, A., Salwana, E.: Deep learning for stock market prediction. Entropy 22(8), 840 (2020)
6. Long, J., Chen, Z., He, W., Wu, T., Ren, J.: An integrated framework of deep learning and knowledge graph for prediction of stock price trend: an application in Chinese stock exchange market. Appl. Soft Comput. 91, 106205 (2020)
7. Nguyen, T.H., Shirai, K., Velcin, J.: Sentiment analysis on social media for stock movement prediction. Expert Syst. Appl. 42(24), 9603–9611 (2015)
8. Li, X., Wu, P., Wang, W.: Incorporating stock prices and news sentiments for stock market prediction: a case of Hong Kong. Inf. Process. Manage. 57(5), 102212 (2020)
9. Liu, J., Chen, Y., Liu, K., Zhao, J.: Attention-based event relevance model for stock price movement prediction. In: Proceedings of the China Conference on Knowledge Graph and Semantic Computing, pp. 37–49. Springer, Heidelberg (2017)
10. Times Topics: Facebook. https://www.nytimes.com/search?dropmab=true&endDate=20200823&query=Facebook&sort=best&startDate=20190801. Last accessed 24 Aug 2020
11. Fehrer, R., Feuerriegel, S.: Improving decision analytics with deep learning: the case of financial disclosures. Stat 1050, 9 (2015)

Chapter 52

Comparative Study of Classical and Quantum Cryptographic Techniques Using QKD Simulator

Cherry Mangla and Shalli Rani

Abstract Security is a crucial aspect of all applications and networks, and its importance grows with the increasing use of the Internet all over the world. Cryptography converts plaintext into ciphertext to protect it from hackers, and the best cryptographic algorithm is usually chosen among the available ones on the basis of speed and efficiency. Over time, attackers have become more capable than in the past; quantum adversaries are one example. To secure networks against such attacks, quantum cryptography, which is based on the laws of quantum mechanics, is used. In this paper, we compare several classical and quantum cryptographic algorithms on different parameters in different settings to determine the best algorithm for each of the two techniques (classical and quantum).

52.1 Cryptography Cryptography refers to techniques for keeping online data safe and secure from hackers. Various algorithms are used to encrypt and decrypt data so that information reaches its final destination safely. The growing use of the Internet of Things (IoT) increases the amount of data exchanged, which in turn raises problems of security, privacy, and integrity. Intruders capture data from all available sources and redistribute it. It has therefore become essential to secure people's sensitive information, such as social security numbers and credit and debit card details. In the last few years, various cryptographic algorithms have been developed and used to secure this type of information [7, 11, 13].

C. Mangla · S. Rani (B) Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab 140401, India e-mail: [email protected] C. Mangla e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_52


Fig. 52.1 Types of cryptography

Table 52.1 Key sizes used by various symmetric (private key) encryption algorithms
S. no. | Keys | Algorithms
1 | 64-bit key | RC2, DES, 3DES (triple DES)
2 | 128-, 192-, or 256-bit key (various key sizes) | AES, RC6
3 | 32- to 448-bit key (any size in between) | Blowfish

Cryptography is divided into two types:
• Symmetric (private key): In private key cryptography, a single key is used both to encrypt and to decrypt the data transmitted between sender and receiver. The security of the algorithm depends on the length of the key [15]. Table 52.1 shows the key sizes of the algorithms mentioned in Fig. 52.1. Encryption with a shorter key is less secure than encryption with a longer key, because a short key is easier to recover by trying every possible key, whereas longer keys are much harder to break.
• Asymmetric (public key): Asymmetric key cryptography is used where distributing a key before transmission is difficult; it was designed to solve the key-distribution problem [19]. Two keys are used: the public key, which is used to encrypt the data, and the receiver's private key, which he or she uses for decryption. Asymmetric encryption is reported to be about 1000 times slower than symmetric encryption [15].
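The two points made about key length and key pairs can be checked with a few lines of Python. This sketch assumes textbook RSA with deliberately tiny primes (the classic p = 61, q = 53, e = 17 example) as a representative public-key scheme; it is insecure and only shows that the public key encrypts and the private key decrypts. `brute_force_keys` simply counts the keys an exhaustive search must try, so each extra key bit doubles the work. All function names are hypothetical.

```python
from math import gcd
from typing import Tuple

Key = Tuple[int, int]  # (modulus n, exponent)


def brute_force_keys(bits: int) -> int:
    """Number of candidate keys an exhaustive search must try for a `bits`-bit symmetric key."""
    return 2 ** bits


def toy_rsa_keypair(p: int, q: int, e: int) -> Tuple[Key, Key]:
    """Textbook RSA key pair from two small primes (insecure, illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1)(q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)  # (public key, private key)


def rsa_apply(key: Key, m: int) -> int:
    """Encrypt with the public key or decrypt with the private key: m^exp mod n."""
    n, exp = key
    return pow(m, exp, n)


if __name__ == "__main__":
    public, private = toy_rsa_keypair(61, 53, 17)
    c = rsa_apply(public, 65)
    print(public, private, c, rsa_apply(private, c))
```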

52.2 Quantum Cryptography Quantum cryptography works on the laws of physics, under which the state of a qubit changes when it is measured. The classic paper of Wootters and Zurek [23] states the underlying principle: a single quantum cannot be cloned. An intruder therefore cannot copy the data on its way to the receiver without disturbing it, which makes quantum mechanics well suited to cryptography. Even more notably, a large-scale quantum computer could break widely used classical public-key algorithms quickly: most of them rely on the difficulty of factoring large numbers into primes or of computing discrete logarithms (as in Diffie–Hellman key exchange), and quantum algorithms can solve these problems efficiently. Quantum cryptography uses quantum mechanical principles to perform cryptographic tasks and provide security. Currently, only one of the seven quantum cryptographic constructions described in Sect. 52.2.2, quantum key distribution (QKD), is in practical use, because it does not need a quantum computer (QC) and can be used with currently available optical fiber and lasers [18]. Consequently, keys generated at random through QKD are hard for attackers to break.

Fig. 52.2 Position of a qubit

52.2.1 Quantum Mechanics and Its Properties Quantum mechanics is a branch of physics; quantum cryptography based on it works with photons instead of bytes. In computers, encryption of bytes is done with mathematical formulations, whereas photons travel as light and have physical properties that make them a safer and more secure basis for future cryptography. Machines based on quantum mechanical principles (quantum superposition, quantum entanglement, and quantum annealing) are known as quantum computers (QC). The relevant properties are described below.

Quantum Superposition: Superposition is the ability of a system to exist in a combination of several states at once. Interference effects between these states can be observed even in the particle description, whereas in classical mechanics interference is a feature of waves and is not expected of particles. The superposition property helps keep data safe [6]: when an intruder tries to read the data, the state of the qubits changes (Fig. 52.2).

Quantum Entanglement: Entanglement can be explained as a correlation between two or more particles, although correlation by itself is not a quantum mechanical effect. For instance, suppose an experiment produces pairs of particles whose total momentum is 0. The momenta of the two particles then depend on each other, so they are correlated, but there is nothing quantum mechanical in this. If, however, the particles are measured with remotely placed measuring devices that are uncorrelated with each other, the results show features that only quantum mechanics describes and that cannot be reproduced by any classical probability theory [6].
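Superposition and measurement disturbance can be made concrete with the polarization states that QKD uses. Under the standard Born rule, a photon linearly polarized at angle θ relative to the measurement axis yields the first outcome with probability cos²θ and the second with probability sin²θ. A diagonally (45°) polarized photon, an equal superposition of horizontal and vertical polarization, therefore gives a random result when measured in the rectilinear basis. The helper below (a hypothetical name) is a minimal sketch of that rule, not a full quantum simulator.

```python
import math
from typing import Tuple


def measurement_probabilities(photon_angle_deg: float, basis_angle_deg: float) -> Tuple[float, float]:
    """Born-rule outcome probabilities for a linearly polarized photon.

    The photon is polarized at `photon_angle_deg`; the measurement basis has its
    first axis at `basis_angle_deg` and its second axis perpendicular to it.
    """
    theta = math.radians(photon_angle_deg - basis_angle_deg)
    p_first = math.cos(theta) ** 2  # probability of the outcome along the first axis
    return p_first, 1.0 - p_first


if __name__ == "__main__":
    print(measurement_probabilities(0, 0))   # matching basis: deterministic result
    print(measurement_probabilities(45, 0))  # diagonal photon, rectilinear basis: 50/50
```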

52.2.2 Quantum Cryptographic Constructions Many researchers confuse quantum cryptography with QKD, but QKD is only one type of quantum cryptographic construction [5]. The seven types of quantum cryptography are listed below. QKD is the most popular because it is the only one that can be applied alongside classical systems. In this article, we use the BB84 QKD protocol.
• Conjugate Coding [22]: It encodes binary sequences in light (photon states). It is also known as quantum coding or quantum multiplexing [3].
• Quantum Key Distribution (QKD): QKD uses the laws of quantum physics for secure communication between sender and receiver. Two channels, a quantum channel and a classical channel, are used side by side to share a key [9], so that communication remains secure against all types of adversaries. The sender and receiver exchange the key through the quantum channel and compare their measurements through the classical channel. According to quantum physics, if a hacker tries to read a bit, the act of reading changes its value; such changes are easily identified when the bits are compared over the classical channel, and the exchange is then started all over again. QKD can thus be used along with classical channels for secure communication in classical networks. A few QKD protocols are listed below.
– BB84 is a QKD scheme for exchanging keys securely between two parties. Developed by Bennett and Brassard in 1984, it is the first QKD protocol and works with the polarization of photons. The key it produces can be used for one-time pad encryption [1, 20].
– In the Secoqc QKD network, as many keys as possible are generated and stored, and they are used according to the traffic on the network. This helps with the selective-forwarding issue [12].
– The COW (coherent one-way) protocol sends weak coherent light pulses in one direction, from sender to receiver, and encodes the key in their arrival times.


– KMB09 is based on the Heisenberg uncertainty principle, which states that the exact position and momentum of a particle cannot be known simultaneously.
– E91 relies on entanglement: the sender and receiver each hold one photon of an entangled pair, so an eavesdropper cannot obtain the key without disturbing the correlations between the photons.
– Six-state uses a six-state polarization scheme on three orthogonal bases.
• Bit Commitment Implies Oblivious Transfer: Bit commitment and oblivious transfer are rudimentary but essential cryptographic primitives. In oblivious transfer, Alice sends two messages to Bob, but Bob receives only one of them; Alice does not know which message Bob received, and Bob learns nothing about the other message [14, 17, 21].
• Limited-quantum-storage Models [5]: To attack secure two-party computation, an adversary needs to store some information, but practical quantum memory is still at an early stage, which becomes an advantage for the honest parties. Damgård et al. [10] proposed the bounded-quantum-storage model, which assumes that attackers can store only part of the information and therefore lose some information about the sender in the process.
• Delegated Quantum Computation: Quantum computers are currently available in only a few places, so data must be sent to one of them for computation, which threatens its privacy; researchers have studied this problem for many years. Childs [8] and Arrighi and Salvail [2] addressed it, and universal blind quantum computation (UBQC) was later proposed [4].
• Quantum Protocols for Coin-flipping and Cheat-sensitive Primitives [5]: These include a quantum strong coin-flipping protocol that achieves the best possible bias, 1/√2 − 1/2 (about 0.207).
• Device-independent Cryptography: It allows protocols to run securely on devices that the users do not trust.
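The BB84 exchange can be simulated classically to show why eavesdropping is detectable. The sketch below is an idealized model under stated assumptions: a noiseless channel, single photons, and an intercept-resend eavesdropper who measures each photon in a random basis and resends her result. The function name `bb84` and its parameters are hypothetical, and for simplicity the whole sifted key is compared, whereas real implementations reveal only a random sample before error correction and privacy amplification.

```python
import random
from typing import List, Tuple


def bb84(n: int, eavesdrop: bool = False, seed: int = 0) -> Tuple[List[int], List[int], float]:
    """Simulate BB84 key sifting, optionally with an intercept-resend eavesdropper.

    Bases: '+' = rectilinear, 'x' = diagonal. Returns (Alice's sifted key,
    Bob's sifted key, quantum bit error rate of the sifted key).
    """
    rng = random.Random(seed)

    def measure(bit: int, prep_basis: str, meas_basis: str) -> int:
        # Matching bases reproduce the bit; mismatched bases give a random result.
        return bit if prep_basis == meas_basis else rng.randint(0, 1)

    alice_bits = [rng.randint(0, 1) for _ in range(n)]
    alice_bases = [rng.choice("+x") for _ in range(n)]
    bob_bases = [rng.choice("+x") for _ in range(n)]

    bob_bits = []
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases):
        photon_bit, photon_basis = bit, a_basis
        if eavesdrop:  # Eve measures in a random basis and resends what she saw
            eve_basis = rng.choice("+x")
            photon_bit, photon_basis = measure(photon_bit, photon_basis, eve_basis), eve_basis
        bob_bits.append(measure(photon_bit, photon_basis, b_basis))

    # Sifting: the bases (not the bits) are compared over the classical channel.
    keep = [i for i in range(n) if alice_bases[i] == bob_bases[i]]
    alice_key = [alice_bits[i] for i in keep]
    bob_key = [bob_bits[i] for i in keep]
    errors = sum(a != b for a, b in zip(alice_key, bob_key))
    return alice_key, bob_key, (errors / len(keep) if keep else 0.0)
```

Without an eavesdropper the sifted keys agree exactly; with intercept-resend the error rate of the sifted key approaches 25%, which is what Alice and Bob detect when they compare bits over the classical channel.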

52.3 Motivation and Background Study
• Classical Versus Quantum Cryptography
Classical Cryptography: It is based on mathematics and relies on the difficulty of computations such as factoring large numbers; its security rests mainly on the high complexity of these factoring problems [19].
Quantum Cryptography: It is based on the laws of quantum mechanics, a branch of physics. In this type of cryptography, a secure connection between sender and receiver is made using photons, which cannot be cloned on the way and which even reveal an intruder's presence. Table 52.2 shows the differences between classical and quantum cryptography based on various factors.
• QKD Simulator Process: Both the classical and the quantum channels are used for key distribution by Alice (sender) and Bob (receiver). The adversary can attack both channels (assuming that the adversary has a quantum system). The quantum channel transmits keys using the laws of quantum mechanics. In BB84 (the QKD protocol used here), each photon is polarized in a randomly chosen basis, which can be rectilinear or diagonal, as specified by the QKD protocol. When Bob receives the photons, he measures each of them in a basis he chooses at random. Alice and Bob then use the classical channel to compare the bases used for every single photon. Finally, after some local operations, they use the classical channel for error correction and privacy amplification of the data shared between them.

Table 52.2 Comparison of classical cryptography and quantum cryptography
Factor | Classical cryptography | Quantum cryptography
Base | Mathematical computations (factoring of large numbers) | Quantum mechanics
Usage | Widely used | Less in use due to the limited availability of QC; only QKD is in use
Digital signature | Present | Absent
Bitrate | Computational power-dependent | Average 1 Mbps
Bit storage | 2^n n-bit strings | One n-bit string
Communication range | Millions of miles | Maximum ten miles (still in progress)
Deployment and testing | Properly checked and in use | In initial stages
Upgrading | Required when computational power rises | The base is physical laws
Expense | Less | High

Data Encryption Standard (DES) is a block cipher: it encrypts data in fixed-size blocks rather than one bit at a time. DES groups the plaintext into 64-bit blocks and, using permutation and substitution under a secret 56-bit key, maps each 64-bit input block into a 64-bit ciphertext block. Each block is encrypted in 16 rounds, and DES can be used in four different modes. Decryption uses the reverse process. In general, the brute-force attack is the most common type of attack on DES.
Triple Data Encryption Standard (TDES) is an extended version of DES that takes three 64-bit keys instead of one, a total key length of 192 bits. The procedure is the same as in DES, except that DES is applied three times (as the name suggests): the data is encrypted with the first key, decrypted with the second key, and encrypted again with the third key.
Advanced Encryption Standard (AES) is also a block cipher, with a fixed block size of 128 bits and key sizes of 128, 192, or 256 bits. It is based on a substitution–permutation network and is efficient in both hardware and software.
Blowfish is a symmetric block cipher used to encrypt and secure data. Compared with other ciphers, it has the problem of weak keys.
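The Feistel structure that DES uses, and the way TDES chains three passes, can be illustrated with a toy 64-bit cipher. The round function and SHA-256-based key schedule below are hypothetical stand-ins, not the DES S-boxes or key schedule, and the construction is not secure. It only shows that a 16-round Feistel network is decrypted by running the same network with the subkeys in reverse order, and that TDES-style encrypt-decrypt-encrypt (EDE) with three equal keys reduces to a single encryption.

```python
import hashlib
from typing import List

MASK32 = 0xFFFFFFFF


def round_keys(key: bytes, rounds: int = 16) -> List[int]:
    """Hypothetical key schedule: one 32-bit subkey per round from SHA-256(key || round)."""
    return [int.from_bytes(hashlib.sha256(key + bytes([r])).digest()[:4], "big")
            for r in range(rounds)]


def round_function(half: int, subkey: int) -> int:
    """Toy mixing function; NOT the DES expansion/S-box/permutation function."""
    x = (half ^ subkey) & MASK32
    x = ((x << 7) | (x >> 25)) & MASK32  # rotate left by 7 bits
    return (x * 0x9E3779B1) & MASK32


def feistel(block: int, subkeys: List[int]) -> int:
    """Run a 64-bit block through len(subkeys) Feistel rounds (DES-style structure)."""
    left, right = block >> 32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ round_function(right, k)
    return (right << 32) | left  # final swap of the halves, as in DES


def encrypt(block: int, key: bytes) -> int:
    return feistel(block, round_keys(key))


def decrypt(block: int, key: bytes) -> int:
    return feistel(block, round_keys(key)[::-1])  # same network, subkeys reversed


def ede_encrypt(block: int, k1: bytes, k2: bytes, k3: bytes) -> int:
    """Triple encryption in encrypt-decrypt-encrypt (EDE) order, as in TDES."""
    return encrypt(decrypt(encrypt(block, k1), k2), k3)


def ede_decrypt(block: int, k1: bytes, k2: bytes, k3: bytes) -> int:
    return decrypt(encrypt(decrypt(block, k3), k2), k1)
```

The EDE ordering is the reason TDES hardware can also run single DES: with all three keys equal, the first two passes cancel.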


52.4 Proposed System The main aim of this article is to identify the best-suited of the four algorithms (DES, TDES, AES, and Blowfish) by comparing their results on various performance parameters in different settings: processing data files of different sizes (1 MB to 6 MB), with a 1-KB change in the key, in both classical and quantum cryptography. The results and analysis identify the best of all eight variants (DES, DESQ, TDES, TDESQ, AES, AESQ, BLOWFISH, BLOWFISHQ) on performance parameters such as encryption/decryption time, throughput, and avalanche effect.
Evaluation Metrics
• Encryption time: Encryption is the conversion of plaintext into ciphertext at the sender side, and the time consumed by this conversion is the encryption time. It depends mainly on the plaintext block size, key size, and mode, and is measured in milliseconds [16].
• Decryption time: Decryption is the conversion of ciphertext back into plaintext at the receiver side, and the time taken by this conversion is the decryption time. To make a system faster and more responsive, decryption time should be less than encryption time. It is measured in milliseconds [16].
• Throughput: The amount of data processed relative to the total time taken by encryption and decryption [16]; a higher throughput means a faster algorithm.
• Avalanche Effect: The property of a cryptographic algorithm whereby a slight change in the input produces a significant change in the output. For example, changing a key slightly should change the ciphertext substantially. This prevents adversaries from predicting the plaintext through statistical analysis.
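The evaluation metrics above can be computed as follows. The sketch assumes that throughput is measured as megabytes processed per second of combined encryption and decryption time, and measures the avalanche effect as the percentage of output bits that flip. Because Python's standard library has no DES or AES implementation, a keyed SHA-256 (`keyed_stand_in`, a hypothetical helper) stands in for a cipher when demonstrating the avalanche effect. Timings from `time.perf_counter` are wall-clock measurements and vary between runs.

```python
import hashlib
import time
from typing import Callable, Tuple


def timed_ms(fn: Callable, *args) -> Tuple[object, float]:
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def throughput_mb_per_s(num_bytes: int, total_ms: float) -> float:
    """Data processed per second of (encryption + decryption) time, in MB/s (1 MB = 10**6 bytes)."""
    return (num_bytes / 1e6) / (total_ms / 1000.0)


def avalanche_percent(out1: bytes, out2: bytes) -> float:
    """Percentage of output bits that differ between two equal-length outputs."""
    if len(out1) != len(out2):
        raise ValueError("outputs must have equal length")
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(out1, out2))
    return 100.0 * flipped / (8 * len(out1))


def keyed_stand_in(key: bytes, data: bytes) -> bytes:
    """SHA-256(key || data): a stand-in for a cipher, since the stdlib has no DES/AES."""
    return hashlib.sha256(key + data).digest()


if __name__ == "__main__":
    k1 = bytes(16)
    k2 = b"\x01" + bytes(15)  # differs from k1 in a single bit
    print(avalanche_percent(keyed_stand_in(k1, b"data"), keyed_stand_in(k2, b"data")))
```

A well-designed cipher should flip close to 50% of the output bits for a one-bit key change.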

52.5 Results Sections 52.2 and 52.3 discussed how classical cryptography differs from quantum cryptography. In this section, we applied all four security algorithms (DES, TDES, AES, and BLOWFISH) in both the classical and the quantum cryptographic settings and recorded the results. The figures below present the results and graphs used to analyze which algorithm performs best in each setting.
• Performance: Fig. 52.3 compares the encryption and decryption times of all eight cryptographic algorithms for file sizes from 1 to 6 MB. It shows that TDESQ, working with the BB84 protocol, has the lowest encryption and decryption time of all the algorithms. Fig. 52.4 compares the throughput of all eight algorithms for file sizes from 1 to 6 MB. It shows that TDESQ with the BB84 protocol again performs best, with an average throughput performance rate of 0.31 compared with the other classical and quantum algorithms.
• Avalanche Effect: Fig. 52.5 compares the avalanche effect of all eight algorithms for file sizes from 1 to 6 MB with a 1-KB key change. It shows that TDESQ with the BB84 protocol produces a notable change in the output in response to the key change, performing well compared with the other classical and quantum algorithms.

Fig. 52.3 Encryption and decryption time for various file sizes in the different algorithms

Fig. 52.4 Throughput for different file sizes in all eight algorithms

Fig. 52.5 Avalanche effect for different file sizes in the different algorithms


52.6 Conclusion In this article, we presented a comparative analysis and experimental results for four cryptographic algorithms, AES, DES, TDES, and BLOWFISH, applied in both classical and quantum scenarios to find the best among them. Our results show that TDES performs better than all the other algorithms on both parameters (performance and avalanche effect).

References
1. Abbas, Y.A., Abdullah, A.A.: Efficient hardware implementation for quantum key distribution protocol using FPGA. In: IOP Conference Series: Materials Science and Engineering, vol. 1076, p. 012043. IOP Publishing (2021)
2. Arrighi, P., Salvail, L.: Blind quantum computation. Int. J. Quantum Inf. 4(05), 883–898 (2006)
3. Bennett, C.H., Brassard, G., Breidbart, S.: Quantum cryptography II: how to re-use a one-time pad safely even if P = NP. Nat. Comput. 13(4), 453–458 (2014)
4. Broadbent, A., Fitzsimons, J., Kashefi, E.: Universal blind quantum computation. In: 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS '09) (2009)
5. Broadbent, A., Schaffner, C.: Quantum cryptography beyond quantum key distribution. Des. Codes Crypt. 78(1), 351–382 (2016)
6. Chaudhary, N., et al.: A pedagogical approach to quantum computing using spin-1/2 particles. In: 2006 6th IEEE Conference on Nanotechnology, vol. 2, pp. 882–885. IEEE (2006)
7. Chhabra, R., Verma, S., Krishna, C.R.: A survey on driver behavior detection techniques for intelligent transportation systems. In: 2017 7th International Conference on Cloud Computing, Data Science and Engineering-Confluence, pp. 36–41. IEEE (2017)
8. Childs, A.M.: Secure assisted quantum computation. arXiv preprint quant-ph/0111046 (2001)
9. Ciesla, R.: Implementations of QKD. In: Encryption for Organizations and Individuals, pp. 247–256. Springer (2020)
10. Damgård, I.B., Fehr, S., Salvail, L., Schaffner, C.: Cryptography in the bounded-quantum-storage model. SIAM J. Comput. 37(6), 1865–1890 (2008)
11. Datta, P., Sharma, B.: A survey on IoT architectures, protocols, security and smart city based applications. In: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–5. IEEE (2017)
12. Dianati, M., Alléaume, R.: Transport layer protocols for the SECOQC quantum key distribution (QKD) network. In: 32nd IEEE Conference on Local Computer Networks (LCN 2007), pp. 1025–1034. IEEE (2007)
13. Dixit, P., Gupta, A.K., Trivedi, M.C., Yadav, V.K.: Traditional and hybrid encryption techniques: a survey. In: Networking Communication and Data Knowledge Engineering, pp. 239–248. Springer (2018)
14. Goldreich, O., Micali, S., Rivest, R.: A fair protocol for signing contracts. IEEE Trans. Inf. Theory 36(1), 40–46 (1990)
15. Kader, H., Hadhoud, M.: Performance evaluation of symmetric encryption algorithms. In: Performance Evaluation, pp. 58–64 (2009)
16. Lakshmi, P.S., Murali, G.: Comparison of classical and quantum cryptography using QKD simulator. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 3543–3547. IEEE (2017)
17. Merkle, R.C.: A digital signature based on a conventional encryption function. In: Conference on the Theory and Application of Cryptographic Techniques, pp. 369–378. Springer (1987)


18. Nanda, A., Puthal, D., Mohanty, S.P., Choppali, U.: A computing perspective of quantum cryptography [energy and security]. IEEE Consum. Electron. Mag. 7(6), 57–59 (2018)
19. Patil, P.A., Boda, R.: Analysis of cryptography: classical verses quantum cryptography. Int. Res. J. Eng. Technol. (IRJET) 3(5) (2016)
20. Raddo, T.R., Rommel, S., Land, V., Okonkwo, C., Monroy, I.T.: Quantum data encryption as a service on demand: Eindhoven QKD network testbed. In: 2019 21st International Conference on Transparent Optical Networks (ICTON), pp. 1–5. IEEE (2019)
21. Wiesner, S.: Conjugate coding. Sigact News 15, 78–88 (1983)
22. Wiesner, S.: Conjugate coding. ACM Sigact News 15(1), 78–88 (1983)
23. Wootters, W.K., Zurek, W.H.: A single quantum cannot be cloned. Nature 299(5886), 802–803 (1982)

Chapter 53

A Novel Approach to Encrypt the Data Using DWT and Histogram Feature

Sandeep Kumar Srivastava, Sandhya Katiyar, and Sanjay Kumar

Abstract In recent years, digital image data have been used on a very large scale. Data watermarking is an effective technique for hiding secret information in an image. In this paper, the low-frequency band of the discrete wavelet transform (DWT) is used for embedding the watermark. The data to be hidden are represented in binary code (0s and 1s) and hidden in the image with the watermarking technique. In the proposed work, information is hidden in a carrier image: the data are embedded in the DWT features by histogram shifting. With this technique, information can be hidden point by point, and the image can be recovered easily. The method is further verified on images from an authentic dataset. The results, evaluated with SNR and PSNR, demonstrate that the proposed work hides the maximum information securely.

53.1 Introduction Watermarking is used in various contexts depending on the user's requirements. Several types of watermarking are available: image, video, audio, digital signal, and text watermarking. Watermarking is used, for example, in military operations to convey useful information from the sender to the receiver. In watermarking, the sender sends an encoded message, which is received at the receiver end; the receiver then decodes the entire message and extracts the plaintext. Many encryption algorithms, such as one-time key-based, bit-level permutation-based, DNA rule-based, and unromantic mathematical model-based ones [1], have shown good performance. The encryption system can be symmetric [2, 3] or asymmetric, and a variety of techniques can be used to send the encoded message to the receiver end. Tabassum et al. [4] used a hashing-based DWT technique to hide information in an image, implementing the watermark embedding technique in that image. Useful information can be embedded as a watermark in a host text or image. A watermarking algorithm hides information behind an image: the message is put into a host image using the watermarking algorithm, which yields the watermarked image. Joshi and Sharm [5] stress that embedding should be a reversible process: if the process is irreversible, the message cannot be extracted. Users can choose according to their requirements; reversible watermarking is used for ownership or copyright issues. They then provide a watermarking solution that uses a reversible process for authentication, offering more reliability, confidentiality, and security. Watermarking schemes can be classified by different parameters, according to their properties and the user's requirements. Hartung et al. [3] classify them by purpose, for example into tamper-detection watermarks and anti-counterfeiting marks. A watermark that can be seen is called a visible watermark, and one that cannot be seen is called an invisible watermark. Watermarking can be applied to images, video, text, audio, and medical data. It is widely used in medical applications, where embedding information in images reduces the large amount of bandwidth and storage usually required for medical images. A patient report can be hidden inside a medical image without being seen by unauthorized persons. The goal of watermarking is to provide confidentiality, integrity, and indexing of data by means of a watermarking algorithm.

S. K. Srivastava (B) · S. Katiyar · S. Kumar Department of Information Technology, Galgotias College of Engineering and Technology, Greater Noida, India S. Kumar School of Computing Science and Engineering, Galgotias University, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_53
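The reversible, histogram-shifting embedding mentioned in the abstract can be sketched on any list of integers, such as pixel values or integer DWT low-band coefficients. This sketch assumes the classic peak/zero-bin scheme: values between the histogram peak and the first empty bin above it are shifted up by one, and each occurrence of the peak then carries one bit. It is not the authors' exact procedure; the function names are hypothetical, and overflow handling for 8-bit pixels is omitted. The receiver needs the peak, the zero bin, and the payload length as side information, and PSNR measures the distortion introduced.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def hs_embed(values: Sequence[int], bits: Sequence[int]) -> Tuple[List[int], int, int]:
    """Embed 0/1 bits by histogram shifting. Returns (marked values, peak, zero)."""
    counts = Counter(values)
    peak = max(counts, key=lambda v: (counts[v], -v))  # most frequent value
    zero = peak + 1
    while counts[zero]:  # first empty histogram bin above the peak
        zero += 1
    if len(bits) > counts[peak]:
        raise ValueError("payload larger than capacity (occurrences of the peak value)")
    payload = iter(bits)
    marked = []
    for v in values:
        if peak < v < zero:
            marked.append(v + 1)  # shift right to free the bin peak + 1
        elif v == peak:
            marked.append(v + next(payload, 0))  # bit 1 -> peak + 1, bit 0 -> peak
        else:
            marked.append(v)
    return marked, peak, zero


def hs_extract(marked: Sequence[int], peak: int, zero: int,
               n_bits: int) -> Tuple[List[int], List[int]]:
    """Recover the payload and restore the original values exactly."""
    bits, restored = [], []
    for v in marked:
        if v == peak:
            bits.append(0)
            restored.append(v)
        elif v == peak + 1:
            bits.append(1)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)  # undo the shift
        else:
            restored.append(v)
    return bits[:n_bits], restored


def psnr(original: Sequence[int], marked: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the two signals are identical."""
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```

Every value moves by at most one, which is why histogram shifting keeps PSNR high while still allowing exact recovery of the carrier.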

53.2 Literature Survey Chen et al. (2018) [6] analyze a technique based on non-negative matrix factorization. It is described as a secure verification technique that strengthens the DWT technique, and the proposed DLW (Double Line Watermarking) scheme is compared with a DW (Data Watermarking) scheme. Mittal et al. (2017) [7] use an improved watermarking scheme based on a combination of DWT, FFT and SVD; another goal is to determine the lifetime of the digital watermark as it propagates, and to provide integrated watermarking for sharing picture data captured by front-line cameras. D’Silva et al. (2017) [8] use a hybrid methodology that combines SVD and DWT into a single scheme. Several advantages of combining SVD and DWT in the hybrid technique are shown, and the scheme is fast because of the SVD. It is implemented in a MATLAB environment. Hua et al. (2016) [9] improve the security of dual audio watermarking using wavelet packet analysis. They use hyperchaotic encryption, and the zero-watermark, which contains a binary sequence, is generated by the algorithm from selected coefficients. Yang et al. (2017) [10] propose a dual watermarking scheme based on the wavelet transform to protect data in the smart grid. Two types of watermark are used, one robust and one fragile, both embedded in DWT coefficients to protect copyright, integrity and confidentiality. Chen et al. (2016) [11] describe digital watermarking as a multimedia technique that protects ownership through authentication. Watermarked videos are evaluated against different types of attack, using a correlation factor as a measure of security, confidentiality and integrity. Kiatpapan et al. (2015) [12] analyze image tamper detection using encryption techniques. The scheme is based on watermarking performance and tamper detection, and it helps to detect large-scale tampering, such as tampering with the entire message. Xiang-Gen Xia et al. [13], Daxing Zhang et al. [14] and Qiang Wang et al. [15] proposed three different ways of using the DWT to embed the watermark into the host image. In [13], a multiresolution watermark is produced by adding pseudorandom codes to the coefficients in the high and middle frequency bands of the DWT. The method is robust against global image distortions, and the computational load of retrieving the watermark depends on the noise level of the image. In [14], a contour-based semi-fragile image watermarking algorithm is developed by dividing the Y component of the original image into 4 × 4 blocks and applying a two-level DWT before applying the Canny edge detector to obtain a filtered contour image. In [15], a chaos-based method is implemented in the wavelet domain, in which the watermark is embedded into the singular values of the host image’s DWT sub-bands.

53.3 Methodology The main goal of this work is to hide digital information in an image. The work has two stages: hiding the information, and extracting it from the image. Accordingly, the entire work is divided into embedding and extraction steps. Embedding is imperceptible, i.e., it is done in such a manner that the hidden information cannot be seen by the naked eye. During extraction, the information should be recovered from the received image with no loss of the embedded data (Fig. 53.1).


S. K. Srivastava et al.

Fig. 53.1 Block diagram of proposed work

53.3.1 Pre-processing An image matrix holds pixel values in a given range, such as 0–255, 0–1 or 0–360, and these values represent the progression of intensity. The complete work is carried out on images whose pixel values lie in the range 0–255, with the image represented as a grid of pixels.
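Bringing an image from another range (for example 0–1 or 0–360) into the 0–255 range used throughout this work can be done with a linear rescaling. The sketch below assumes a simple clamp-and-scale mapping with rounding to the nearest integer; the function name rescale_to_8bit is hypothetical and not part of the authors' MATLAB code.

```python
def rescale_to_8bit(pixels, lo, hi):
    """Linearly map values in [lo, hi] to integers in [0, 255].

    Values outside [lo, hi] are clamped first. pixels is a list of rows.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255 / (hi - lo)
    return [
        [round((min(max(v, lo), hi) - lo) * scale) for v in row]
        for row in pixels
    ]

# A 2 x 2 image stored in the 0-1 range
print(rescale_to_8bit([[0.0, 0.5], [1.0, 0.25]], 0.0, 1.0))  # [[0, 128], [255, 64]]
```

Python's round() rounds halves to the nearest even integer, so 127.5 becomes 128; a different rounding rule would only shift such ties by one grey level.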

53.3.2 DWT (Discrete Wavelet Transform) LL: In Fig. 53.2, the upper-left part is called the LL block. It is obtained by low-pass filtering the image along both rows and columns. This block does not carry additional detail information and can be regarded as an approximation of the image. HL: The upper-right part in Fig. 53.2 is called the HL block. It is obtained by applying a high-pass filter in one direction and a low-pass filter in the other, and it carries edge information of the image. LH: The lower-left part in Fig. 53.2 is called the LH block. It is obtained by applying a low-pass filter in one direction and a high-pass filter in the other, and it carries edge information in the other orientation. HH: The lower-right part is called the HH block. It is obtained by high-pass filtering along both rows and columns, and it contains the diagonal edge information of the image.

53 A Novel Approach to Encrypt the Data Using DWT and Histogram …


Fig. 53.2 DWT of Lena image from [8]

The LL band of the DWT output is therefore used, so that the embedded information cannot be identified by the naked eye.
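To make the four sub-bands concrete, here is a minimal one-level 2D Haar DWT in plain Python, with its exact inverse. It is a sketch under stated assumptions: the Haar wavelet (the text does not name a wavelet family), even image dimensions, and one common HL/LH naming convention (libraries differ on which band is called HL). It is not the authors' MATLAB implementation.

```python
def haar_dwt2(img):
    """One-level orthonormal 2D Haar DWT of an image with even height and width.

    Returns a dict with LL, HL, LH and HH sub-bands, each half the size of img.
    Assumed convention: HL holds differences between columns (vertical edges),
    LH holds differences between rows (horizontal edges).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = {k: [] for k in ("LL", "HL", "LH", "HH")}
    for i in range(0, h, 2):
        rows = {k: [] for k in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            rows["LL"].append((a + b + c + d) / 2)   # low-pass in both directions
            rows["HL"].append((a - b + c - d) / 2)   # high-pass across columns
            rows["LH"].append((a + b - c - d) / 2)   # high-pass across rows
            rows["HH"].append((a - b - c + d) / 2)   # high-pass in both (diagonal)
        for k in bands:
            bands[k].append(rows[k])
    return bands


def haar_idwt2(bands):
    """Invert haar_dwt2 exactly, rebuilding each 2 x 2 block."""
    ll, hl, lh, hh = bands["LL"], bands["HL"], bands["LH"], bands["HH"]
    img = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        img += [top, bottom]
    return img


image = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
sub = haar_dwt2(image)
print(sub["LL"])                  # [[119.0, 132.0], [125.0, 158.5]]
print(haar_idwt2(sub) == image)   # True
```

A flat region produces zeros in HL, LH and HH, which is why LL carries most of the visual content and edges show up in the detail bands.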

53.3.3 Image Histogram The pixel values are arranged in a vector S (obtained after reverse ordering), and the histogram of the image is computed with one bin per pixel value. For example, if the color scale runs from 1 to 10, the number of pixels with each value in the image is counted. For the S vector above, H_i = [0, 0, 0, 0, 4, 5, 5, 1, 2, 0], where H holds the count of pixels for each color value and i is the position in H, i.e., the color value.
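A one-bin-per-value histogram and the peak and zero points that histogram shifting relies on can be computed as follows. This is a minimal sketch, assuming pixel values from 1 to `levels`; it is applied to the vector S used in the data-hiding example of Sect. 53.3.4.2, and the helper names are illustrative.

```python
from collections import Counter


def histogram(values, levels):
    """Count how many times each value 1..levels occurs (one bin per value)."""
    counts = Counter(values)
    return [counts.get(v, 0) for v in range(1, levels + 1)]


def peak_and_zeros(hist):
    """Return the most frequent value and the values that never occur."""
    peak = max(range(len(hist)), key=lambda i: hist[i]) + 1
    zeros = [i + 1 for i, c in enumerate(hist) if c == 0]
    return peak, zeros


S = [4, 5, 7, 6, 4, 6, 6, 6, 5, 4, 7, 8, 9, 5, 4, 6, 9]
H = histogram(S, 10)
print(H)                  # [0, 0, 0, 4, 3, 5, 2, 1, 2, 0]
print(peak_and_zeros(H))  # (6, [1, 2, 3, 10])
```

The result reproduces the peak p = {6} and the zero values Z = {1, 2, 3, 10} used in the following subsections.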

53.3.4 Histogram Shifting and Data Hiding Histogram shifting is a reversible data-hiding technique: it modifies the image by shifting pixel values within its histogram so that data can be embedded and later removed without loss. In the example above, the highest peak is at pixel value p = {6}, and the zero points are the pixel values that do not occur in the image.

53.3.4.1 Data Hiding and Histogram Shifting This data-hiding technique shifts the histogram of the image. The peak value is the pixel value that occurs most often in the histogram, p = {6} in the example, and the zero values are those that never occur, Z = {1, 2, 3, 10}. Histogram shifting thus manipulates the peak value among the pixel values present. One disadvantage is that a single peak cannot hide much information: P = {6} occurs at five locations of the S vector, and each occurrence carries one bit, so only five bits can be hidden. To increase the capacity of the proposed work, more histogram peaks are used; for example, the peak values P = {6, 5, 5, 7} can be paired with the zero values Z = {1, 2, 3, 10}.

53.3.4.2 Data Hiding Data are hidden by visiting each occurrence of the peak value in turn and using it to carry one bit. Consider hiding the data H = [1, 0, 0, 1]. As explained in the histogram-shifting step, the peak value is left unchanged when the bit to hide is 1; otherwise, the peak value is replaced with the zero value. With peak value 6 and zero value 1:

S = [4, 5, 7, 6, 4, 6, 6, 6, 5, 4, 7, 8, 9, 5, 4, 6, 9]
H = [1, 0, 0, 1]
HS = [4, 5, 7, 6, 4, 1, 1, 6, 5, 4, 7, 8, 9, 5, 4, 6, 9]
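The embedding rule of this subsection and its inverse can be sketched in a few lines. This follows the peak/zero replacement as described here (bit 1 keeps the peak, bit 0 writes the zero value), not the classic histogram-shifting variant that shifts every bin between the peak and the zero point by one. The function names are hypothetical, and the receiver is assumed to know the peak, the zero value and the number of hidden bits.

```python
def embed(pixels, bits, peak, zero):
    """Hide bits in the occurrences of `peak`, scanning left to right.

    Bit 1 keeps the peak value; bit 0 replaces it with `zero`, a value that
    does not occur in the cover, so every change can be undone.
    """
    if zero in pixels:
        raise ValueError("zero value must not occur in the cover")
    capacity = pixels.count(peak)
    if len(bits) > capacity:
        raise ValueError(f"capacity is {capacity} bits, got {len(bits)}")
    out, k = [], 0
    for v in pixels:
        if v == peak and k < len(bits):
            out.append(peak if bits[k] == 1 else zero)
            k += 1
        else:
            out.append(v)
    return out


def extract(stego, n_bits, peak, zero):
    """Recover the hidden bits and restore the original cover pixels."""
    bits, restored = [], []
    for v in stego:
        if v in (peak, zero) and len(bits) < n_bits:
            bits.append(1 if v == peak else 0)
            restored.append(peak)  # both carrier values came from the peak
        else:
            restored.append(v)
    return bits, restored


S = [4, 5, 7, 6, 4, 6, 6, 6, 5, 4, 7, 8, 9, 5, 4, 6, 9]
HS = embed(S, [1, 0, 0, 1], peak=6, zero=1)
print(HS)                      # [4, 5, 7, 6, 4, 1, 1, 6, 5, 4, 7, 8, 9, 5, 4, 6, 9]
bits, restored = extract(HS, 4, peak=6, zero=1)
print(bits, restored == S)     # [1, 0, 0, 1] True
```

Reversibility depends on the zero value being absent from the cover: any pixel equal to it in the stego vector must have been a peak that carried a 0.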

53.3.5 Extraction Steps In the extraction stage, the hidden data and the image are recovered at the receiver end, following the block diagram in Fig. 53.1.

53.3.5.1 Extraction of Image The proposed work extracts the useful content at the recipient end. To do so, the receiver must know the histogram parameters, i.e., the peak value and the zero value must be passed along with the image, which contains the secret information. The extracted ASCII bit stream is then converted back into the corresponding characters, after which the decryption step of the DWT methodology is carried out.
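Converting the message to a bit stream before embedding, and the extracted bit stream back to ASCII characters afterwards, can look like the sketch below. It assumes plain 7-bit ASCII text stored as 8 bits per character, most significant bit first; the helper names are illustrative.

```python
def text_to_bits(message):
    """Encode an ASCII message as a flat list of bits, 8 per character (MSB first)."""
    bits = []
    for ch in message.encode("ascii"):
        bits.extend((ch >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def bits_to_text(bits):
    """Decode consecutive 8-bit groups back into ASCII characters."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    chars = []
    for i in range(0, len(bits), 8):
        value = 0
        for b in bits[i:i + 8]:
            value = (value << 1) | b
        chars.append(chr(value))
    return "".join(chars)


bits = text_to_bits("Hi")
print(bits)                # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(bits_to_text(bits))  # Hi
```

The number of bits (8 per character) fixes how many carrier pixels the histogram-shifting step must provide, so it bounds the message length for a given image.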


53.4 Experiment The proposed procedure for protecting images was evaluated experimentally. All computations were carried out in MATLAB on a 2.7 GHz Intel Core i5 machine with 4 GB RAM running Windows 7 Home Basic.

Dataset The method is evaluated on standard test images such as Mandrill, Lena and Tree, obtained from http://sipi.usc.edu/database/?volume=misc. The framework was also tested on everyday pictures.

Evaluation Parameter

Peak signal-to-noise ratio:

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{Maximum\_Pixel\_Value}^{2}}{\mathrm{Mean\_Square\_Error}}\right)$$

Signal-to-noise ratio:

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\mathrm{Signal}}{\mathrm{Noise}}\right)$$

Extraction rate:

$$\eta = \frac{n_c}{n_a} \times 100$$

where n_c is the number of correctly extracted pixels and n_a is the total number of pixels.
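The three evaluation parameters can be computed as in the sketch below. Assumptions: images are flattened to equal-length lists of 8-bit values (maximum pixel value 255), PSNR uses the standard squared-peak form, signal power is the sum of squared cover pixels, and noise power is the sum of squared differences between cover and stego images. This is an illustration, not the authors' MATLAB code.

```python
import math


def mse(original, distorted):
    """Mean squared error between two equal-length pixel sequences."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)


def psnr(original, distorted, max_value=255):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(original, distorted)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)


def snr(original, distorted):
    """SNR in dB: signal power divided by noise (difference) power."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, distorted))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def extraction_rate(n_correct, n_total):
    """Percentage of correctly extracted items, eta = n_c / n_a * 100."""
    return n_correct / n_total * 100


cover = [52, 55, 61, 66]
stego = [53, 55, 60, 66]
print(mse(cover, stego))              # 0.5
print(round(psnr(cover, stego), 2))   # 51.14
print(round(snr(cover, stego), 2))    # 38.39
print(extraction_rate(3, 4))          # 75.0
```

Because PSNR grows as the MSE shrinks, the 70+ dB values reported below correspond to very small average pixel changes between cover and stego images.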

53.5 Result To investigate the security performance of the proposed algorithm, a program was developed. The experimental data come from the image database; we chose the image Lenna.jpg, which is suitable for testing each processing algorithm because it mixes detail, smooth areas, shadow and texture (Fig. 53.3). Table 53.1 compares the PSNR of the proposed work with that of previous work. The DWT technique is used to regenerate the images in different color modes, so that the data can be hidden easily. The resulting PSNR is higher than that of the previous work by 20.4 to 21.1 dB for the three test images (Fig. 53.4). Table 53.2 compares the proposed and previous work under the SNR evaluation parameter. Because the DWT and histogram-shifting algorithm regenerates images in color format only, this parameter is also higher than the previous values (Figs. 53.5, 53.6 and 53.7). Tables 53.3, 53.4 and 53.5 compare the proposed and previous work under a filtering attack; the evaluation of decryption is based on these rate parameters [8]. Because the DWT technique regenerates images in a different color format, these parameters are again higher than the previous ones (Figs. 53.8, 53.9 and 53.10). Tables 53.6 and 53.7 compare the proposed and previous work under a noise attack. Because DWT regenerates the image in a different color format, these parameters are also higher than those of the other method (Table 53.8).

Fig. 53.3 PSNR-based comparison between proposed work and previous work

Table 53.1 Analysis of proposed work and previous work at PSNR

PSNR-based comparison (dB)
Images   Proposed work   Previous work
Tree     71.6914         50.5478
Bowl     72.7578         52.3594
Lena     71.8595         51.3273

Fig. 53.4 SNR-based comparison between proposed and previous work

Table 53.2 Analysis of proposed and previous work at SNR

SNR-based comparison (dB)
Images   Proposed work   Previous work
Tree     23.5462         3.45078
Bowl     24.7507         3.38207
Lena     23.9455         3.43929

53.6 Conclusion The proposed work hides useful information in an image so that confidential information can be transferred easily from the sender to the receiver. It provides high security while preserving the embedded data. During embedding, the histogram algorithm shuffles the confidential data, and the algorithm decrypts the message at the receiver end. The results show that the proposed work improves on previous work in terms of PSNR and MSE, and that its extraction rate is higher. As future work, because images contain a large amount of redundancy, DWT is suggested for introduction into the image encryption algorithm itself, so that not every pixel needs to be processed during encryption. Using DWT to compress and encrypt the image can effectively improve the speed and effect of encryption. However, it may also cost considerable time to reconstruct the image, so much work remains to be done in the future.


Fig. 53.5 PSNR-Noise based comparison between proposed and previous work


Fig. 53.6 SNR-Filter based comparison between proposed and previous work


Fig. 53.7 Filter attack based comparison between proposed work and previous work

Table 53.3 Analysis of proposed work and previous work at PSNR

Filter attack-based PSNR comparison (dB)
Images   Proposed work   Previous work
Tree     51.082          50.2961
Bowl     59.3812         52.2136
Lena     52.2868         51.1646

Table 53.4 Analysis of proposed work and previous work at SNR

Filter attack-based SNR comparison (dB)
Images   Proposed work   Previous work
Tree     3.93683         3.19903
Bowl     10.3742         3.23626
Lena     4.37279         3.27661

Table 53.5 Extraction rate comparison between proposed and previous work

Filter attack-based data extraction comparison (%)
Images   Proposed work   Previous work
Tree     40.666          22.222
Bowl     57.333          16.666
Lena     40.666          22.916

Fig. 53.8 Noise attack based comparison between proposed work and previous work


Fig. 53.9 SNR-Noise based comparison between proposed work and previous work


Fig. 53.10 Extraction rate of comparison between proposed work and previous work

Table 53.6 Extraction rate comparison between proposed and previous work

Noise attack-based data extraction comparison (%)
Images   Proposed work   Previous work
Tree     57.3333         42.055
Bowl     57.3333         33.722
Lena     57.3333         34.416

Table 53.7 SNR-based comparison between proposed and previous work

Noise attack-based SNR comparison (dB)
Images   Proposed work   Previous work
Tree     8.4888          3.4147
Bowl     9.374           3.33241
Lena     11.032          3.39444

Table 53.8 Extraction of rate comparison between proposed and previous work

Noise attack-based PSNR comparison (dB)
Images   Proposed work   Previous work
Tree     56.634          50.511
M        57.381          51.309
Lena     57.946          50.282

References

1. Patro, K.A.K., Acharya, B.: A novel multi-dimensional multiple image encryption technique. Multimedia Tools Appl. 79(19–20), 12959–12994 (2020)
2. Chai, X., Fu, X., Gan, Z., Lu, Y., Chen, Y.: A color image cryptosystem based on dynamic DNA encryption and chaos. Signal Process. 155, 44–62 (2019)
3. Ponuma, R., Amutha, R.: Encryption of image data using compressive sensing and chaotic system. Multimedia Tools Appl. 78(9), 11857–11881 (2019)
4. Tabassum, T., Mohidul Islam, S.M.: A digital image data hiding technique based on identical frame extraction in 3-level DWT, 13(7), 560–576 (2003)
5. Joshi, A.K., Sharm, S.: Reversible data hiding by utilizing AES encryption image & LZW compression. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 6(2), ISSN: 2278-1323 (2017)
6. Hartung, F., Su, J.K., Girod, B.: Spread spectrum data hiding: malicious attacks and counterattacks of multimedia contents. Int. J. Res. Eng. Technol., eISSN: 2319-1163, pISSN: 2321-7308 (2005)
7. Chen, Z., Li, L., Peng, H., Liu, Y., Yang, Y.: A novel digital watermarking based on general nonnegative matrix factorization. IEEE Trans. Multimedia (2018). https://doi.org/10.1109/TMM.2018.2794985
8. Mittal, N., Bisen, A.S., Gupta, R.: An improved digital watermarking technique based on 5DWT, FFT & SVD. In: International Conference on Trends in Electronics and Informatics (ICEI 2017). IEEE (2017)
9. D’Silva, A., Shenvi, N.: Data security using SVD based digital watermarking technique. In: International Conference on Trends in Electronics and Informatics (ICEI 2017). IEEE (2017)
10. Hua, G., Bi, G., Xiang, Y.: Dual channel watermarking—A filter perspective. In: 2016 International Conference on Progress in Informatics and Computing (PIC). IEEE (2016)
11. Yang, J.-X., Niu, D.-D.: A novel dual watermarking algorithm for digital audio. In: 2017 17th IEEE International Conference on Communication Technology. IEEE (2016)
12. Chen, Q., Xiong, M.: Dual watermarking based on wavelet transform for data protection in smart grid. In: 2016 3rd International Conference on Information Science and Control Engineering. IEEE (2016)


13. Kiatpapan, S., Kondo, T.: Sawiya Kiatpapan and Toshiaki Kondo. In: 2015 12th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). IEEE (2015)
14. Xia, X.-G., Boncelet, C.G., Arce, G.R.: A multiresolution watermark for digital images. In: Proceedings of International Conference on Image Processing, 1997, vol. 1. IEEE (1997)
15. Zhang, D., Pan, Z., Li, H.: A contour-based semi-fragile image watermarking algorithm in DWT domain. In: Second International Workshop on Education Technology and Computer Science (ETCS), 2010, vol. 3. IEEE (2010)

Chapter 54

Auto-generation of Smart Contracts from a Domain-Specific XML-Based Language

Vimal Dwivedi and Alex Norta

Abstract Smart contracts are a means of facilitating, verifying and enforcing digital agreements. Blockchain technology, which includes an inherent consensus mechanism and programming languages, enables the concept of smart contracts. However, smart contracts written in existing languages such as Solidity and Vyper are difficult for domain stakeholders and programmers to understand, which hinders developing code efficiently and without error; the cause is a conceptual gap between the contractual provisions and the respective code. Our study addresses the problem by creating the smart legal contract markup language (SLCML), an XML-based smart-contract language with a pattern and transformation rules that automatically convert XML code to the Solidity language. In particular, we develop an XML schema (the SLCML schema) that is used to instantiate any type of business contract, is understandable to IT and non-IT practitioners, and can be processed by computers. To reduce the effort and risk associated with smart-contract development, we advocate a pattern for converting SLCML contracts to Solidity, a contract-oriented programming language. We exemplify and assess SLCML and the transformation approach by defining a dairy supply-chain contract based on real-world data.

54.1 Introduction Blockchain technology has gained traction in a variety of industries, including finance [1] and healthcare [2], due to its distributed, decentralized and immutable ledger, which functions even when individual entities cannot be trusted. Blockchains remove the need for an intermediary trusted authority by securing and validating transactions through cryptographic signatures and a consensus mechanism. Several blockchains, including Ethereum and Hyperledger, use smart contracts to define business rules and to automate the business processes that govern transactions. According to Nick Szabo, “A smart contract is a computer program or a transaction protocol which is intended to automatically execute, control or document legally relevant events and actions according to the terms of a contract or an agreement” [3]. In 1996, Szabo pioneered the concept of smart contracts, which was later adopted by blockchain technology. The first version of blockchain (also known as blockchain 1.0) was implemented in 2008 as a byproduct of bitcoin, without the capability of smart contracts [4]. Blockchain 2.0 introduced smart-contract languages (SCLs) such as Solidity and Vyper, which have significantly increased the use of smart contracts and blockchain implementations beyond digital currencies [5]. A smart contract is frequently confused with a computer program written by an IT programmer, but it is an interdisciplinary concept that includes, but is not restricted to, business, financial services and legal principles [6]. In the business and finance domains, a smart contract defines how exchanges and disbursements are shared between various wallets. A contract is a legal agreement between collaborating stakeholders that consists of consensual commitments, whereas in a smart contract these commitments are encoded in computer programs that are executed automatically [6]. Because smart contracts are interdisciplinary, practitioners from various domains, such as lawyers, computer engineers, and business and finance experts, can collaborate to design, propose and implement them. However, existing SCLs (such as Solidity [7] and Vyper [8]) are primarily technology-oriented, and smart contracts written in these languages are incomprehensible to professionals outside the IT sector. From the programmer’s perspective, the legal properties of a smart contract (for example, rights and obligations) are equivalent to software requirements that the program must meet.

V. Dwivedi (B) · A. Norta
Tallinn University of Technology, Akadeemia tee 15a, Tallinn, Estonia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. C. Satapathy et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 266, https://doi.org/10.1007/978-981-16-6624-7_54
As a result, IT programmers need legal knowledge to write contract content and to communicate with business people in order to elicit and clearly define software requirements. Because IT professionals often lack legal knowledge, verifying legal requirements in smart contracts is difficult and time-consuming. Another unsolved issue is that existing SCLs make it impossible to express sophisticated cooperative business relationships (such as DAOs) in a legally compliant manner [9]. Several approaches for developing legally binding SCLs have been published in the scientific literature, including SmaCoNat [10], ADICO [11] and SPESC [12]. These publications present intriguing approaches and findings. However, the proposed domain-specific languages offer no model transformation to an executable smart-contract implementation [13]. Furthermore, existing SCLs are not process-aware when it comes to writing collaborative business contracts. This study therefore fills a gap in the existing literature by addressing the research question “How to build a BPMN choreography model for converting an SLCML contract to Solidity?” The paper’s contributions include the creation of SLCML (an XML-based language, available at shorturl.at/uBHR6), a smart-contract implementation that is process-aware and understandable by both IT and non-IT practitioners. SLCML allows the requirements of a smart contract’s configuration (including execution) to be used in the creation of collaborative business contracts. To reduce the effort and risks associated with smart-contract development, we propose a pattern and its transformation rules for building a choreography model that translates smart contracts written in SLCML into Solidity. To simplify the main research question and achieve a separation of concerns, we derive the following sub-research questions. What is the structure of the SLCML instantiation that is crucial for the choreography transformation? What are the patterns and rules for converting SLCML code to a BPMN choreography model? What is the approach for evaluating the feasibility of the proposed solution for a use case? The paper’s outline is as follows. In Sect. 54.2, we discuss the traceability of rights and obligations in the dairy supply chain and present the preliminaries that aid the reader’s comprehension of the sections that follow. In Sect. 54.3, we define the SLCML instantiation for the running case and describe its vocabulary and configuration. The pattern and transformation rules are discussed in Sect. 54.4, and the feasibility of the code generator is assessed in Sect. 54.5. Related work is described in Sect. 54.6, and Sect. 54.7 concludes the paper.

54.2 Motivating Example and Preliminaries To demonstrate how legally binding smart contracts can be created, we use an ongoing case study from the dairy food supply chain. We believe that supply-chain representatives, both upstream (manufacturing companies, producers, etc.) and downstream (distribution companies, resellers, etc.), track and process conformance data in order to demonstrate compliance with enforcement criteria to both public officials and increasingly demanding consumers. In Sect. 54.2.1, we present the running case and discuss a conflict scenario involving the rights and obligations of upstream and downstream parties. Section 54.2.2 then describes the related background, preparing the reader for the following sections.

54.2.1 Running Case Blockchain technology has the potential to benefit the food supply chain, including the pork supply chain [14], the fish supply chain [15] and others. A significant use case for blockchain is tracking and monitoring product safety and regulatory compliance throughout the food supply chain [16]. To better understand the traceability of rights and obligations among the various food supply-chain stakeholders, we employ the dairy supply-chain conceptual model [17] for our research, as shown in Fig. 54.1. Many stakeholders, including manufacturers and retailers, are in charge of managing the supply-chain operation, from the start, when a cow on a farm produces raw milk, to the finished product, when a consumer consumes baby milk powder. The traceability of an actor’s internal processes is referred to as internal traceability, whereas chain traceability corresponds to the traceability of the entire supply chain [18]. External traceability describes the traceability between two actors. Each actor uses different types of technology, such as IoT devices and location-based technology, to retrieve and provide information to the Food Safety Information System (FSIS). The FSIS includes a variety of data that food supply-chain actors require to achieve transparency and quality assurance. According to [17], whether the FSIS information is managed centrally or in a decentralized manner is not specified; the emphasis is on each actor possessing similar abilities at the same time. The FSIS stores traceability data that reflect compliance with the relevant regulations.

Fig. 54.1 Dairy food supply chain [19] (figure: farm, factory, distributor, retailer and consumer, each with internal traceability and linked by external traceability within supply-chain traceability to the Food Safety Information System; supporting technologies: wireless identification and sensor technology, location-based technology, information and communications technology, Internet and web technology; Food Safety & Quality Assurance System: Good Practices (GMP, GHP, ...), HACCP, ISO Standards, TQM)

We are only interested in market exchanges between entities. Farmers keep detailed records of their farm’s location and of their animals’ breeds, immunizations, treatments and, if applicable, special regimens. Animal health and movement are tracked using RFID devices or any other sensor network that incorporates blockchain technology. Similarly, data on animal migration are captured by sophisticated machines and stored on blockchains. When the cows have been milked, the distributor is notified via public blockchain platforms that the milk is ready for collection. Maintaining the temperature during transportation is critical to preventing milk spoilage, and sensor devices are used to accomplish this. Furthermore, GPS is used for real-time vehicle tracking. When the milk is delivered to the factory, the relevant information, such as the location of the unit and the amount delivered in a specific lot, is updated on a blockchain network. The factory processes the milk and produces the


baby milk powder, and it provides consumers with accurate information about the food products, such as management guidelines, validity periods and usage directions. According to [20], smart contracts are required for food supply-chain operations to excel in terms of improved buyer service and quality assurance. The supply-chain operation clauses are specified in the food safety and quality assurance system to trigger specific events. For example, a smart contract charges a penalty if a distributor fails to deliver milk of the specified quality to producers (i.e., a factory) within a specified time. Collaborating organizations in a supply-chain process often have little or no control over which organizations are responsible for inefficiencies. Smart contracts and blockchain technology enable collaborating stakeholders to track and check the progress of products and exchanges, making this oversight feasible. Nonetheless, because blockchain technology and SCLs are still in their infancy, economic and regulatory concerns arise. What happens if the quality of the delivered milk does not meet the producer’s specifications? Assume that a smart contract automatically releases financial resources (ether, bitcoin, etc.) when milk is supplied to the producer. If the milk quality was already poor before delivery, the producer seeks compensation or an exchange of the milk. On the distributor’s side, the obligation to pay that compensation must be imposed. The distributor may, in turn, claim that the poor-quality milk stems from the farmer. In such cases, legal constraints must be specified in smart contracts according to contract law. Contracts must include exchange provisions that define the rules for exchanging the product, canceling the contract and calculating interest adjustments in payments.

54.2.2 Preliminaries The previous section discussed the challenges of developing collaborative smart contracts for the dairy supply chain, in which the participants’ rights and obligations must be stipulated. We use the Liquid Studio tool (see the Liquid Studio home page) to create SLCML (i.e., an XML schema) as a foundation for instantiating every type of real-world business contract. Liquid Studio includes a robust collection of XML and JSON authoring tools, data visualization and conversion tools (such as an XSLT editor and an XQuery editor), and a graphical XML schema editor for visualizing, authoring and manipulating complex XML schemata. The graphical editor provides an interactive, logical view of the XML document, allowing it to be edited intuitively while retaining access to all the features of W3C XML Schema. A Solidity code-generator tool is then created to convert XML-based smart contracts into a blockchain-specific programming language (i.e., Solidity [21]). Next, we present the SLCML schema definition and the SLCML instantiation of a legally binding dairy-milk supply-chain contract with the XML smart-contract code.


V. Dwivedi and A. Norta

54.3 SLCML: A Contract-Specification Language SLCML [22] is a machine-readable, XML-based choreography language for specifying cross-organizational business smart contracts. It is derived from a smart-contract ontology (shorturl.at/gxFKT), which includes the concepts and properties of legally binding contractual business collaboration; we do not discuss the ontology in detail because it is beyond the scope of this paper. SLCML extends the eSourcing Markup Language (eSML) with the goal of incorporating a smart-contract collaboration configuration. eSML is based on a real-world contractual framework, in which collaborating stakeholders project process views externally for cross-organizational matching. When a match takes place, an agreement is formed, which is the primary criterion for contract formation. A process view is an abstraction of an internal process that enables customers to control the progress of process instances while concealing sensitive or insignificant aspects of the supplier’s process. Legally binding constructs, together with process views and their matching relationships, are critical factors in establishing cross-organizational smart-contract collaboration. We do not discuss process views and their matching relationships in the SLCML instantiation in detail, because they are part of eSML and have already been discussed in [23, 24]. SLCML is built on a subset of eSML, which we supplement with additional schemas for rights and obligations. The complete schema, as well as the process views, can be downloaded from the link provided in Sect. 54.1. Below, we discuss the SLCML instantiation for our running case, based on the SLCML schema, in terms of the rights and obligations that are not part of eSML. The code excerpt in Listing 54.1 defines the fundamental contractual elements required for any legally binding, business-oriented smart contract.

To resolve the conflict, the producer (i.e., the factory) and the distributor create a smart contract with a unique ID that cannot be altered while the contract is being enforced. Line 2 specifies the producer’s public key, and Line 6 specifies the milk distributor’s public key. Lines 3 and 7 specify the names of the parties, namely the producer and the distributor. The contracting parties’ roles, i.e., the producer as a service consumer and the distributor as a milk supplier, are defined in Lines 4 and 8, respectively. Line 10 specifies the contract consideration (milk) for which the parties have agreed to a contractual relationship. The obligations and rights outlined in Listings 54.2 and 54.3 are incorporated into the terms and conditions.

 1  <contract contract_id="Id1">
 2    <party address="03m6">
 3      <name>Producer</name>
 4      <role>Service consumer</role>
 5    </party>
 6    <party address="31x7">
 7      <name>Distributor</name>
 8      <role>Milk supplier</role>
 9    </party>
10    <consideration>Milk</consideration>
11    <terms_and_conditions>
12      <obligation/>
13      <right/>
14      <prohibitions/>
15    </terms_and_conditions>
16  </contract>

Listing 54.1 Contract instantiation for the dairy supply chain.
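To make the structure of Listing 54.1 concrete, the following Python sketch parses a contract instantiation with the standard-library `xml.etree.ElementTree` module and extracts the contract ID, the parties keyed by their public-key address, and the consideration. It assumes the element and attribute names shown in the listing (`contract`, `party`, `address`, `name`, `role`, `consideration`); the function `read_contract` is a hypothetical helper for illustration, not part of the SLCML tooling.

```python
import xml.etree.ElementTree as ET

# Contract instantiation from Listing 54.1 (element names as in the listing).
SLCML_CONTRACT = """
<contract contract_id="Id1">
  <party address="03m6">
    <name>Producer</name>
    <role>Service consumer</role>
  </party>
  <party address="31x7">
    <name>Distributor</name>
    <role>Milk supplier</role>
  </party>
  <consideration>Milk</consideration>
  <terms_and_conditions>
    <obligation/>
    <right/>
    <prohibitions/>
  </terms_and_conditions>
</contract>
"""


def read_contract(xml_text):
    """Return the contract ID, the parties keyed by address, and the consideration."""
    root = ET.fromstring(xml_text.strip())
    parties = {
        party.get("address"): {
            "name": party.findtext("name"),
            "role": party.findtext("role"),
        }
        for party in root.findall("party")
    }
    return {
        "contract_id": root.get("contract_id"),
        "parties": parties,
        "consideration": root.findtext("consideration"),
    }


contract = read_contract(SLCML_CONTRACT)
print(contract["parties"]["31x7"]["role"])  # Milk supplier
```

Keying the parties by address mirrors how the later listings refer to the producer and distributor (03m6 and 31x7) rather than by name.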

V. Dwivedi and A. Norta

Listing 54.2 illustrates the producer's commitment to pay for the milk. The obligation has a name and a unique ID used to track its performance; we consider it a monetary obligation because it has economic or financial consequences. Line 3 sets the obligation state to enabled, indicating that the producer collects milk in accordance with the orders and is obligated to pay the distributor. The producer is the promisor (obligor) and is responsible for fulfilling this obligation, as stated in Line 6. Line 5 names the distributor as the beneficiary of the obligation, and we assume that no intermediaries or arbitrators are involved. The obligation type is a to-do legal obligation: the producer is expected to act by paying the amount, and this obligation has legal ramifications. Line 12 states the precondition of the obligation: the producer and the distributor have signed the contract (act1) and the milk has been transferred to the producer. The performance type is the payment that must be transferred from the producer's wallet address to the distributor's wallet address. The performance object is an invoice for the purchase, for which a particular amount must be paid within a particular time period. The payment time limit is specified in the rule conditions, and the purchase-payment plan in Line 15. Finally, the obligation references a late-payment remedy: if the producer fails to pay within the specified time frame, the producer must transfer a specified monetary amount to the distributor.

 1 <obligation_rule tag_name="paying_invoices" rule_id="0001"
 2     changeable="false" monetary="true">
 3   <state>enabled</state>
 4   <parties>
 5     <beneficiary>Distributor (31x7)</beneficiary>
 6     <obligor>Producer (03m6)</obligor>
 7     <third_party>nil</third_party>
 8   </parties>
 9   <obligation_type>
10     <legal_obligation>to-do</legal_obligation>
11   </obligation_type>
12   <precondition>act1(signed) & Milk(transferred)</precondition>
13   <performance_type>payment(03m6, 31x7, buy)</performance_type>
14   <performance_object>invoice(buy, amount)</performance_object>
15   <rule_conditions>date(before delivery of milk)</rule_conditions>
16   <remedy>late_payment_interest(amount, 03m6, 31x7)</remedy>
17 </obligation_rule>

Listing 54.2 Illustration of the obligation to pay for milk.
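The lifecycle of the paying_invoices obligation can be sketched in Python as a small check: the precondition (contract signed and milk transferred) must hold, a payment made by the deadline fulfils the obligation, and a late payment triggers the late-payment remedy. The class name, the use of calendar dates for the deadline, and the flat 5% interest are hypothetical choices for illustration; SLCML itself does not prescribe them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PaymentObligation:
    """Hypothetical model of the paying_invoices obligation (rule_id 0001)."""
    obligor: str                    # producer's address, "03m6"
    beneficiary: str                # distributor's address, "31x7"
    amount: int                     # invoiced amount (unit assumed)
    deadline: date                  # payment time limit from the rule conditions
    late_interest_percent: int = 5  # assumed parameter of the late-payment remedy

    def precondition_holds(self, contract_signed: bool, milk_transferred: bool) -> bool:
        # act1(signed) & Milk(transferred)
        return contract_signed and milk_transferred

    def amount_due(self, paid_on: date, contract_signed: bool = True,
                   milk_transferred: bool = True) -> int:
        """Total the obligor must transfer to the beneficiary when paying on `paid_on`."""
        if not self.precondition_holds(contract_signed, milk_transferred):
            raise ValueError("obligation is not yet in force")
        if paid_on <= self.deadline:
            return self.amount  # obligation fulfilled on time
        # Remedy late_payment_interest(amount, obligor, beneficiary)
        return self.amount + self.amount * self.late_interest_percent // 100


obligation = PaymentObligation("03m6", "31x7", amount=1000, deadline=date(2021, 6, 30))
print(obligation.amount_due(date(2021, 6, 15)))  # 1000
print(obligation.amount_due(date(2021, 7, 10)))  # 1050
```

Integer arithmetic is used for the interest so that amounts stay exact, as they would be in a contract handling token units.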

The obligation is complemented by the provisions in the code extract of Listing 54.3. Because the parties' rights and obligations are intertwined, if one party asserts its rights, the other must comply. Like the obligation in Listing 54.2, a right has a beneficiary who benefits from it and an obligor who must honour it. For example, if a producer receives poor-quality milk, he or she has the right to demand that the milk be replaced, and the distributor is then required to replace it.

 1 <right_rule tag_name="milk_replacement" rule_id="0002"
 2     changeable="true" monetary="false">
 3   <state>enabled</state>
 4   <parties>
 5     <beneficiary>Producer (03m6)</beneficiary>
 6     <obligor>Distributor (31x7)</obligor>
 7     <third_party>nil</third_party>
 8   </parties>
 9   <right_type>
10     <conditional_right>claim</conditional_right>
11   </right_type>
12   <precondition>act1(signed) & Milk(transferred)</precondition>
13   <performance_type>replace(poor-quality milk)</performance_type>
14   <action_object>milk(cans of milk, type, and batch unit)</action_object>
15   <rule_conditions>deadline(date)</rule_conditions>
16   <remedy>late_replacement_interest(amount, 31x7)</remedy>
17 </right_rule>

Listing 54.3 Right to replacement of low-quality milk.

As in Listing 54.2, the right defined in Line 1 has a name and an ID. Because the right is changeable, the distributor may contest or revoke it: for example, the distributor can persuade the producer that the quality of the milk was ruined during logistics by a faulty sensor that was not the distributor's fault. If the distributor agrees to replace the milk, the right can be changed while the contract is being executed, and monetary compensation is set to false. The parties are specified as in Listing 54.2, and the state of the right is enabled, so it can be exercised immediately. Because the producer demands that the milk be replaced, the right type is a conditional right (a claim). For the right to be exercised, the contract must be signed and the milk must be delivered to the producer. The performance type is the replacement of the poor-quality milk, and the action object identifies the milk to be substituted by cans of milk, type, and batch unit. Following the activation of this right, the distributor's corresponding obligation must be met within the specified timeframe; otherwise, the producer is entitled to monetary compensation. In the following section, we discuss the rules for converting SLCML code into a choreography model and then into Solidity smart-contract code.
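The interplay between the milk_replacement right and the distributor's resulting obligation can be sketched as follows: exercising the right (allowed only after the contract is signed and the milk delivered) creates a replacement obligation with a deadline, and missing that deadline entitles the producer to compensation. The function names, the three-day deadline, and the fixed compensation amount are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

PRODUCER, DISTRIBUTOR = "03m6", "31x7"  # addresses from Listing 54.1


@dataclass
class ReplacementObligation:
    """Distributor's duty that arises once the producer exercises milk_replacement."""
    obligor: str      # distributor
    beneficiary: str  # producer
    batch: str        # batch unit of the poor-quality milk
    due: date         # deadline(date) from the rule conditions


def exercise_replacement_right(contract_signed: bool, milk_delivered: bool,
                               batch: str, today: date,
                               days_allowed: int = 3) -> ReplacementObligation:
    """Exercise the conditional right (claim) and create the corresponding obligation."""
    if not (contract_signed and milk_delivered):
        raise PermissionError("precondition act1(signed) & Milk(transferred) not met")
    return ReplacementObligation(DISTRIBUTOR, PRODUCER, batch,
                                 today + timedelta(days=days_allowed))


def compensation_due(obligation: ReplacementObligation, checked_on: date,
                     replaced_on: Optional[date] = None, penalty: int = 200) -> int:
    """Remedy late_replacement_interest: compensation owed to the producer, if any."""
    if replaced_on is not None and replaced_on <= obligation.due:
        return 0  # replaced in time
    if replaced_on is None and checked_on <= obligation.due:
        return 0  # deadline not reached yet
    return penalty  # replaced late, or not replaced by the deadline


claim = exercise_replacement_right(True, True, batch="B-17", today=date(2021, 6, 1))
print(claim.due)                                   # 2021-06-04
print(compensation_due(claim, date(2021, 6, 10)))  # 200
```

The right itself carries no money (monetary="false"); only the remedy attached to the derived obligation produces a monetary consequence.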

54.4 Patterns and Transformation Rules

We borrow concepts from the translation of ADICO statements to Solidity [25] and from the translation of XML to a choreography model [26]. According to Frantz et al. [25], contract statements are divided into five components, abbreviated as ADICO: ‘Attributes’, ‘Deontic’, ‘AIm’, ‘Conditions’, and ‘Or else’. Attributes denote actor characteristics, and the deontic component describes obligations, permissions, or prohibitions. The aim describes the action that the statement regulates, conditions describe the contract's contextual conditions, and or-else describes the consequences of non-compliance. Furthermore, the authors propose mapping rules that enable developers to generate Solidity code from ADICO components. Our main contribution is to first transform the rights and obligations written in SLCML into a choreography model, following [26], and then into Solidity code, following [25]. The proposed SLCML-to-Solidity mapping is summarized in Table 54.1; this core construct mapping is the cornerstone for translating SLCML specifications into Solidity contracts. The supply chain is modelled as a process choreography model, the Supply chain choreography model, which is then transformed into a smart contract, the Supply chain smart contract (rule (a)). Steps that interact with other contracts map to external functions (rule (b)); in Solidity, external functions can only be invoked via message calls, for example by other contracts or through transactions. In our case, such interactions occur between the primary smart contract (the Supply chain contract) and the Supply chain oracle contract, which is discussed further below. Product quantity, quality, and other constraints are attached to attribute components, which are contract-level variables that translate to Solidity struct members (rule (c)).
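As a hedged illustration of the ADICO decomposition, the paying_invoices obligation of Listing 54.2 can be split into its five components and recombined into an institutional statement. The decomposition shown and the sentence template are our own assumptions, not the mapping rules of Frantz et al. [25].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdicoStatement:
    attributes: str  # actor characteristics
    deontic: str     # "must" (obligation), "may" (permission) or "must not" (prohibition)
    aim: str         # action the statement regulates
    conditions: str  # contextual conditions
    or_else: str     # consequence of non-compliance

    def kind(self) -> str:
        """Classify the statement by its deontic component."""
        return {"must": "obligation", "may": "permission",
                "must not": "prohibition"}[self.deontic]

    def as_sentence(self) -> str:
        return (f"{self.attributes} {self.deontic} {self.aim} "
                f"{self.conditions}, or else {self.or_else}.")


paying_invoices = AdicoStatement(
    attributes="Producer (03m6)",
    deontic="must",
    aim="pay the invoice amount to Distributor (31x7)",
    conditions="after act1 is signed and the milk is transferred",
    or_else="late-payment interest is owed",
)

print(paying_invoices.kind())  # obligation
print(paying_invoices.as_sentence())
```

In SLCML terms, the obligor and beneficiary feed the attributes, the legal obligation type the deontic, the performance type the aim, the precondition and rule conditions the conditions, and the remedy the or-else component.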
Performance types map to functions and events (rule (f)), whereas function modifiers introduce checks that abort function execution (rules (d), (e), and (g)), reflecting the combination of rights, obligations, and their preconditions. The performance type (rule (f)) is refined further by allowing the configuration of an object, such as an invoice, as shown in Listing 54.2, and of a target associated with a particular operation, such as a replacement, as shown in Listing 54.3. Events emitted when encapsulated conditions are fulfilled (e.g., reaching a payment deadline) are a subset of this type. Remedies, the repercussions of breaching the provisions checked in function modifiers, are translated by default using the throw primitive (rule (h)); throw was removed in Solidity 0.5.0, where revert() and require() serve the same purpose. Conditions joined by quantifiers are expressed in a single modifier construct, with their semantic integration delegated to the developer. For example, a payment is represented as a choreography task that interacts with the merchant account and is implemented in Solidity as an external function (rule (i)).
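The mapping just described can be sketched as a tiny Python code generator that emits a Solidity fragment for the paying_invoices obligation: the precondition becomes a function modifier (rule (e)), the performance type becomes an external payable function plus an event (rule (f)), and the default remedy aborts execution when the deadline has passed (rule (h)). The generated identifiers and the state variables `contractSigned`, `milkTransferred`, and `paymentDeadline` are hypothetical and assumed to be declared elsewhere in the contract; because `throw` no longer compiles in current Solidity, the sketch emits `require`. It is an unverified skeleton, not the authors' generator.

```python
def generate_obligation(rule_id, name, precondition_vars, deadline_var):
    """Emit an illustrative Solidity fragment for a monetary obligation rule."""
    cond = " && ".join(precondition_vars)
    # paying_invoices -> PayingInvoices, used for the event name
    event = "".join(part.capitalize() for part in name.split("_")) + "Fulfilled"
    lines = [
        f"// Obligation {rule_id}: {name}",
        f"event {event}(address obligor, address beneficiary, uint amount);",
        "",
        # Rule (e): the precondition becomes a function modifier
        f"modifier precondition{rule_id}() {{",
        f'    require({cond}, "precondition not met");',
        "    _;",
        "}",
        "",
        # Rules (f) and (i): the payment becomes an external function and an event
        f"function {name}(address payable beneficiary) external payable precondition{rule_id} {{",
        # Rule (h): default remedy aborts execution (require instead of throw)
        f'    require(block.timestamp <= {deadline_var}, "deadline passed: remedy applies");',
        "    beneficiary.transfer(msg.value);",
        f"    emit {event}(msg.sender, beneficiary, msg.value);",
        "}",
    ]
    return "\n".join(lines)


print(generate_obligation("0001", "paying_invoices",
                          ["contractSigned", "milkTransferred"], "paymentDeadline"))
```

Replacing the deadline `require` with a branch that charges late-payment interest would correspond to the "alternative control flow" option of rule (h).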


In the process choreography model, tasks represent steps in the supply chain that interact with external resources. As a result, we identify three kinds of data: (1) information recorded by an external actor (e.g., the service providers) and passed to the smart contract, such as data about service costs, which is processed by the implementation of the process choreography model; (2) data passed to the smart contract by existing technologies and utilities, such as information about a payment task or transportation configurations; and (3) data read from smart contracts stored on the blockchain. Special contracts known as oracles are used to handle external data. In the following section, we apply these transformation rules to the SLCML code to generate the choreography model and then the Solidity code.
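The oracle idea can be illustrated with a minimal Python model (deliberately not Solidity) of a supply-chain oracle: only authorized reporters, such as a service provider, may push external values, and the main contract reads the latest value for a key. The class and method names are hypothetical; a real oracle would be a separate on-chain contract that the Supply chain contract calls through an external function.

```python
class SupplyChainOracle:
    """Toy, off-chain stand-in for the Supply chain oracle contract (hypothetical API)."""

    def __init__(self, owner: str):
        self.owner = owner
        self.reporters = {owner}  # callers allowed to push external data
        self.values = {}          # latest reported value per key

    def authorize(self, caller: str, reporter: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the owner can authorize reporters")
        self.reporters.add(reporter)

    def report(self, caller: str, key: str, value) -> None:
        # Data kinds (1) and (2): external actors or existing systems push data in
        if caller not in self.reporters:
            raise PermissionError(f"{caller} is not an authorized reporter")
        self.values[key] = value

    def read(self, key: str):
        # Data kind (3): the main contract reads data already stored by the oracle
        return self.values[key]


oracle = SupplyChainOracle(owner="admin")
oracle.authorize("admin", "31x7")           # the distributor may report its costs
oracle.report("31x7", "service_cost", 120)
print(oracle.read("service_cost"))          # 120
```

Restricting who may write is the essential design point: the smart contract trusts the oracle's values, so the oracle must control which external actors can supply them.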

54.5 Feasibility Evaluation

The SLCML code examples in Listings 54.1, 54.2, and 54.3 serve as our starting point. The workflow model in Figures 54.2 and 54.3 is generated using the XML-to-choreography transformation rules in Table 54.1; in it, two organizations, namely a service consumer and a service provider, are engaged in executing the cross-organizational milk supply chain process. A service provider organization (e.g., the milk distributor) executes a workflow process on behalf of a service consumer (e.g., the milk producer). However, the service provider does not wish to disclose all the details of the workflow process it implements, preferring to disclose only those aspects of the process that are of interest to potential consumer organizations. We do not represent the process views in the SLCML code due to page constraints; however,

Table 54.1 Rules for transitioning SLCML to Solidity

Rule ID | XML component | Choreography component | Solidity code
(a) | Root element: supply chain | Supply chain: choreography model | Supply chain: smart contract
(b) | Step containing a supply chain | Choreography task | External function
(c) | Attributes | Data perspectives | Struct
(d) | Obligation | Choreography task | Function modifier, Events
(e) | Precondition | Choreography task | Function modifier
(f) | Performance type | Choreography task | Functions, Events
(g) | Right | Choreography task | Function modifier
(h) | Remedy | Embedded in model documentation | Throw statements/alternative control flow
(i) | Step containing payment | Choreography task | External function
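Table 54.1 can also be read as a lookup table. The Python sketch below encodes the nine rules as a dictionary keyed by rule ID, indexes them by XML component, and translates the components of the paying_invoices obligation. The lowercase component keys and the `translate` helper are our own hypothetical naming of the table rows.

```python
# Rule ID -> (XML component, choreography component, Solidity construct), per Table 54.1
RULES = {
    "a": ("root element: supply chain", "choreography model", "smart contract"),
    "b": ("step containing a supply chain", "choreography task", "external function"),
    "c": ("attributes", "data perspectives", "struct"),
    "d": ("obligation", "choreography task", "function modifier, events"),
    "e": ("precondition", "choreography task", "function modifier"),
    "f": ("performance type", "choreography task", "functions, events"),
    "g": ("right", "choreography task", "function modifier"),
    "h": ("remedy", "embedded in model documentation",
          "throw statements/alternative control flow"),
    "i": ("step containing payment", "choreography task", "external function"),
}

# Reverse index: XML component -> (rule ID, choreography component, Solidity construct)
BY_COMPONENT = {xml: (rule_id, chor, sol) for rule_id, (xml, chor, sol) in RULES.items()}


def translate(component: str):
    """Return (rule ID, choreography component, Solidity construct) for an SLCML component."""
    return BY_COMPONENT[component.lower()]


for part in ["Obligation", "Precondition", "Performance type", "Remedy"]:
    rule_id, chor, sol = translate(part)
    print(f"({rule_id}) {part}: {chor} -> {sol}")
```

A generator built on such a table makes the mapping explicit and easy to audit, while the semantic integration of combined conditions remains a manual step for the developer.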

[Figure: consumer process view of the milk supply chain, sequences SEQ1 and SEQ2; tasks include milk packaging, take an order, sign contract, determine rights & obligations, assign batch number, deliver regular, claim parcel, test parcel, schedule route, pay money, and fulfilled]