Proceedings of Research and Applications in Artificial Intelligence: RAAI 2020: 1355 (Advances in Intelligent Systems and Computing, 1355) [1st ed. 2021] 981161542X, 9789811615429


English Pages 370 [350] Year 2021


Table of contents :
RAAI 2020 Committee Members
Preface
Contents
Editors and Contributors
Prediction and Analysis on COVID-19 Using Positive and Negative Association Rule Mining
1 Introduction
2 Methodology
2.1 Data Source and COVID-19 Symptoms
2.2 Algorithms
3 Result and Discussions
4 Conclusion
References
Hybrid Algorithm Based on DWT-DCT-RSA with Digital Watermarking for Secure Image Transfer
1 Introduction
2 Literature Review
3 Proposed Hybrid DWT-DCT-RSA Algorithm
3.1 Procedure for Encryption
3.2 Procedure for Decryption
4 Proposed Algorithms
4.1 Key generation Algorithm [5]
4.2 The DWT-DCT-RSA based Encryption Algorithm
4.3 The DWT-DCT-RSA based Decryption algorithm
5 Result and Performance Evaluation
6 Conclusion
References
Detecting Sexually Predatory Behavior on Open-Access Online Forums
1 Introduction
2 Methodology
2.1 Data Manipulation and Text Pre-processing
2.2 Vector Representation of Words
2.3 Feature Extraction Using Word Embedding Aggregation
2.4 Two-Stage Classification System
3 Results
3.1 Sexual Predatory Conversation Identification Task
3.2 First-Stage Classification Results
3.3 Second-Stage Classification Results
3.4 Contextual Details
4 Analyzing the Classification System
4.1 First-Stage Classifier
4.2 Second-Stage Classifier
5 Conclusion
References
Swarm-Based Sudoku Solution: An Optimization Procedure
1 Introduction
2 Related Work
3 Proposed Work
3.1 Swarm-Based Sudoku Puzzle Solution
4 Experimental Result
5 Conclusion
References
Application of Cellular Automata (CA) for Predicting Urban Growth and Disappearance of Vegetation and Waterbodies
1 Introduction
2 Methodology
2.1 Study Area
2.2 Data Used
2.3 Method
3 Results and Discussions
4 Conclusions
References
Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text and English Humor Literature
1 Introduction
2 Recent Works
3 Corpus
3.1 Sarcastic Words Distribution
4 Deep pLSTM Architecture
4.1 Hyperparameters Tuning
5 Results and Analysis
6 Benchmark Comparisons
7 Discussion
8 Conclusion and Future Work
References
Sentiment Analysis of Covid-19 Tweets Using Evolutionary Classification-Based LSTM Model
1 Introduction
2 Related Works
3 Preparing Covid-19 Dataset
3.1 Data Pre-Processing
4 Feature A: Covid-19 Specified Words Identification
4.1 Word Popularity
5 Feature B: Word Popularity Detection Using N-gram
6 Sentiment Analysis
6.1 Sentiment Classification
7 Sentiment Modeling Using Sequential LSTM
8 Conclusion & Future Scope
References
Clustering as a Brain-Network Detection Tool for Mental Imagery Identification
1 Introduction
2 Proposed Techniques
2.1 Computation of FCM-Based Brain Network Features
2.2 Computation of SOM-Based Brain-Network Features
3 Experiments and Results
3.1 Data Acquisition and Pre-processing
3.2 Training and Classification for FCM-Based Brain Networks
3.3 Brain-Network Computation by Extended SOM
4 Performance Analysis
4.1 Classifier Performance
4.2 Statistical Validation Using Wilcoxon Signed-Rank Test
5 Discussion
6 Conclusions
References
Comparative Study of the Effect of Different Fitness Functions in PSO Algorithm on Band Selection of Hyperspectral Imagery
1 Introduction
2 Literature Survey
3 Relevant Techniques for the Proposed Method
3.1 Particle Swam Optimization
4 Proposed Work
4.1 Algorithm of the Proposed Work
5 Experiment and Analysis
5.1 Data Set Description
5.2 Result Analysis
6 Conclusion
References
Breast Abnormality Detection Using Texture Feature Extracted by Difference-Based Variable-Size Local Filter (DVLF)
1 Introduction
2 Proposed System
2.1 Pre-processing
2.2 Blood Perfusion Image Generation
2.3 Difference-Based Variable-Size Local Filter (DVLF)
2.4 Asymmetry Analysis
3 Results and Discussion
3.1 Dataset Collection
3.2 Classification
4 Conclusion
References
Nuclei Image Boundary Detection Based on Interval Type-2 Fuzzy Set and Bat Algorithm
1 Introduction
2 Boundary Detection in an Image
2.1 Interval Type-2 Fuzzy Set (IT2FS)
3 Mapping of Gradients into IT2FS
4 Boundary Detection as Constraint Optimization
5 Theory of Bat Algorithm (BA)
5.1 Virtual Bats Movement
6 Proposed Method for Boundary Detection
7 Results and Discussion
8 Conclusions
References
Machine Learning Approach to Sentiment Analysis from Movie Reviews Using Word2Vec
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 Data Collection
3.2 Cleaning and Preprocessing Data
3.3 Feature Selection
3.4 Splitting Training and Testing Set
3.5 Apply Machine Learning Algorithm
4 Results and Discussion
5 Conclusions and Future Work
References
Selection of Edge Detection Techniques Based on Machine Learning Approach
1 Introduction
2 Various Edge Detection Techniques
3 Machine Learning
4 Analysis of the Operators: ML Approach
5 Results and Discussions
References
ANN-Based Self-Tuned PID Controller for Temperature Control of Heat Exchanger
1 Introduction
2 System Modeling with Traditional PID Controller
3 System Modeling with the Proposed ANN-PID Controller
4 Simulation Results and Analysis
5 Conclusion
References
A Novel Partitioning Algorithm to Process Large-Scale Data
1 Introduction
2 Concept Behind Pairwise Partitioning
3 Pseudocode of Pairwise Partitioning
3.1 Partitioning Table Generation Procedure
4 Importance of Partitioning Table on Big Data Management
4.1 Important Observations Regarding Essential Vertex & Result Analysis
5 Conclusions and Future Scope
References
Segmentation of Blood Vessels, Optic Disc Localization, Detection of Exudates, and Diabetic Retinopathy Diagnosis from Digital Fundus Images
1 Introduction
1.1 Diabetic Retinopathy
1.2 Motivation
1.3 Proposed Methods
2 Background
3 Materials and Methods
3.1 Hardware and Libraries
3.2 Datasets
3.3 Proposed Methods
4 Experimental Results
4.1 Segmented Blood Vessels
4.2 Localized Optic Disc
4.3 Detected Exudates
4.4 Binary Diabetic Retinopathy Diagnosis
5 Conclusions
References
Interval Type-2 Fuzzy Framework for Healthcare Monitoring and Prediction
1 Introduction
2 Preliminaries
2.1 Type-1 Fuzzy Set (T1FS)
2.2 Type-2 Fuzzy Set (T2FS)
2.3 Footprint of Uncertainty (FOU)
2.4 Interval Type-2 Fuzzy Set
3 Interval Type-2 Fuzzy Logic System (IT2FL)
3.1 Fuzzifier
3.2 Fuzzy Rules
3.3 Fuzzy Inference
3.4 Type-Reducer and Defuzzifier
3.5 Performance Evaluation
4 Results and Discussion
5 Conclusions
References
Real-time Social Distancing Monitoring and Detection of Face Mask to Control the Spread of COVID-19
1 Introduction
2 Literature Study
3 Proposed Methodology
3.1 Social Distancing Violations and Face Mask Classifier Using DBSCAN and DSFD Algorithm
3.2 Real-Time Face Mask Classification
4 Results and Discussion
5 Conclusion
References
Emotion Recognition from Feature Mapping Between Two Different Lobes of Human Brain Using EEG
1 Introduction
2 Principles and Methodologies
2.1 EEG Data Acquisition
2.2 Pre-processing
2.3 Feature Extraction
2.4 Feature Mapping
3 Experimental Results
4 Conclusion and Future Scope
References
Secured Diabetic Retinopathy Detection through Hard Exudates
1 Introduction
2 Proposed Scheme for DR Detection Through Hard Exudates
3 Dual Watermarking on the Automated Report
3.1 Watermark Embedding Methodology
3.2 Watermark Extraction Methodology
4 Results and Discussion
4.1 Results and Discussion on DR Detection
4.2 Results and Discussion on Dual Watermarking
5 Conclusion
References
SAR Image Change Detection Using Modified Gauss-Log Ratio Operator and Convolution Neural Network
1 Introduction
2 The Problem Statements and an Overview of the Proposed Process
3 Methodology
3.1 Preclassification by Using Modified Gauss Log Ratio and FCM
3.2 Sample Selection, Patch Generation, and CNN Training
3.3 Classification Using CNN
4 Experimental Results and Analysis
4.1 Dataset Description
4.2 Experimental Settings
4.3 Result and Discussion
5 Conclusions
References
Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Acquisition
3.2 Preprocessing
3.3 Classification of EEG Spectrogram Images with Proposed CNN Model
4 Result and Discussion
5 Conclusion
References
Design and Development of a Pipeline Inspection Robot for Visual Inspection and Fault Detection
1 Introduction
2 System Design
3 Result and Analysis
4 Conclusion and Future Work
References
A Random Forest Classifier Combined with Missing Data Strategies for Predicting Chronic Kidney Disease Stages
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Data Analysis and Filtering
2.3 Random Forest Classifier
3 Computational Experiments
3.1 Scenarios
3.2 Experimental Setup
3.3 Results and Discussion
4 Conclusion
References
Automatic Cataract Detection Using Sobel and Morphological Dilation Operation
1 Introduction
2 Related Works
3 Problem Identification
4 Proposed Work and Implementation
4.1 Image Pre-processing
4.2 Thresholding
4.3 Sobel Edge Detection Technique
4.4 Morphological Dilation
4.5 Cataract Region Recognition
4.6 Sobel Magnitude and Dilation Algorithm:
5 Experimental Analyses
6 Conclusions
References
Context Based Searching in Cloud Data for Improved Precision of Search Results
1 Introduction
1.1 Context-Based Searching
2 Designing Ontologies
2.1 Restriction Types
3 OWL Constructs Overview
3.1 OWL Syntax of an Ontology
3.2 Ontology Building Tools: Protégé
3.3 Sample Ontologies
4 Word Relativity Computation
4.1 Concept of Affinity Table
5 Conclusion
References
Gaussian-Based Spatial FCM Technique for Interdisciplinary Image Segmentation
1 Introduction
2 Methodology
3 Result and Discussion
4 Conclusion
References
Interactive and Intelligent Tutoring of Graphical Solutions
1 Introduction
2 Proposed Model
3 Experiment
3.1 Lesson Mode
3.2 Test Mode
4 Conclusion
References
COVID-19 India Forecast Preparedness for Potential Emergencies
1 Introduction
1.1 Objective
2 Methodology and Results
3 Discussion and Conclusion
References
Prediction of Cyclodextrin Host-Guest Binding Through a Hybrid Support Vector Method
1 Introduction
2 Materials and Methods
2.1 Data Collection
2.2 Machine Learning Approach
3 Results and Discussion
4 Conclusion
References
Hybrid Unsupervised Extreme Learning Machine Applied to Facies Identification
1 Introduction
2 Materials and Methods
3 Results and Discussions
4 Conclusion
References
Decision Tree-Based Classification Model to Predict Student Employability
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Data Collection
3.2 Data Preprocessing
3.3 Feature Selection
3.4 Model Building and Cross-Validation
4 Experimental Results
5 Conclusion
References
Author Index

Advances in Intelligent Systems and Computing 1355

Indrajit Pan Anirban Mukherjee Vincenzo Piuri   Editors

Proceedings of Research and Applications in Artificial Intelligence RAAI 2020

Advances in Intelligent Systems and Computing Volume 1355

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Nikhil R. Pal, Indian Statistical Institute, Kolkata, India Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba Emilio S. Corchado, University of Salamanca, Salamanca, Spain Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil Ngoc Thanh Nguyen , Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/11156

Indrajit Pan · Anirban Mukherjee · Vincenzo Piuri Editors

Proceedings of Research and Applications in Artificial Intelligence RAAI 2020

Editors Indrajit Pan Department of Information Technology RCC Institute of Information Technology Kolkata, West Bengal, India

Anirban Mukherjee Department of Information Technology RCC Institute of Information Technology Kolkata, West Bengal, India

Vincenzo Piuri Department of Computer Science Università degli Studi di Milano Milan, Milano, Italy

ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-16-1542-9 ISBN 978-981-16-1543-6 (eBook) https://doi.org/10.1007/978-981-16-1543-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Dr. Indrajit Pan would like to dedicate this book to Prof. (Dr.) Siddhartha Bhattacharyya Prof. Anirban Mukherjee would like to dedicate this book to all his research scholars Prof. Vincenzo Piuri would like to dedicate this book to all researchers and professionals who are extraordinarily contributing to make our life better, and to students who will change our future

RAAI 2020 Committee Members

Honorary General Chairs Dr. Kalyanmoy Deb, Michigan State University, USA Dr. Aboul Ella Hassanien, Cairo University, Egypt Dr. Valentina Emilia Balas, University of Arad, Romania

Chief Patron Dr. Pranabesh Das, Director, Technical Education, Government of West Bengal, India and Chairman, BOG, RCCIIT, India

General Chair Dr. Anirban Mukherjee, RCC Institute of Information Technology, Kolkata, India

International Advisory Chairs Dr. Elizabeth Behrman, Wichita State University, Kansas Dr. Xiao-Zhi Gao, University of Eastern Finland, Finland Dr. Ricardo Baeza-Yates, Universitat Pompeu Fabra, Barcelona, Spain Dr. Valentina Emilia Balas, University of Arad, Romania Dr. Kalyanmoy Deb, Michigan State University, USA Dr. Aboul Ella Hassanien, Cairo University, Egypt Dr. Vincenzo Piuri, Universita degli Studi di Milano, Italy Dr. Daniela Romano, University College of London, London, UK


Dr. Debotosh Bhattacharjee, Jadavpur University, India Dr. Siddhartha Bhattacharyya, Christ University, Bengaluru, India Dr. Debashis De, Maulana Abul Kalam Azad University of Technology, West Bengal, India Dr. Paramartha Dutta, Visva Bharati University, Santiniketan, India Dr. Hafizur Rahaman, Indian Institute of Engineering Science and Technology, Shibpur, India Dr. Chiranjoy Chattopadhyay, Indian Institute of Technology, Jodhpur, India Dr. Koushik Mondal, Indian Institute of Technology, (ISM), Dhanbad, India Dr. Nilanjan Dey, Techno International, New Town, India Dr. Ernesto Cuadros-Vargas, School of Computer Science, UTEC, Peru

Organizing Secretary Dr. Indrajit Pan, RCC Institute of Information Technology, Kolkata, India Moumita Deb, RCC Institute of Information Technology, Kolkata, India

Program Chairs Dr. Ernesto Cuadros-Vargas, School of Computer Science, UTEC, Peru Dr. Dipankar Majumdar, RCC Institute of Information Technology, Kolkata, India Dr. Abhijit Das, RCC Institute of Information Technology, Kolkata, India Dr. Shyantani Maiti, RCC Institute of Information Technology, Kolkata, India

Technical Program Committee Dr. Rabie A. Ramadan, Cairo University, Egypt Dr. Amlan Chatterjee, California State University, USA Dr. Sherif Ismail, Umm Al-Qura University, Saudi Arabia Dr. Ahmed A. Elngar, Beni-Suef University, Egypt Dr. Pushpendu Kar, The University of Nottingham, Ningbo, China Dr. Mohamed Abdelfattah, MET, Mansoura, Egypt Dr. Rony Chatterjee, Microsoft Corporation, USA Dr. Indrajit Banerjee, Indian Institute of Engineering Science and Technology, Shibpur, India Dr. Tuhina Samanta, Indian Institute of Engineering Science and Technology, Shibpur, India Dr. Shibakali Gupta, University of Burdwan, Burdwan, India Dr. Subhamita Mukherjee, Techno Main, Salt Lake, India


Prof. Piyal Sarkar, Techno Main, Salt Lake, India Dr. Sudip Ghosh, Indian Institute of Engineering Science and Technology, Shibpur, India Dr. Arijit Ghosal, St. Thomas College of Engineering and Technology, Kolkata, India Dr. Abhishek Bhattacharya, Institute of Engineering and Management, Kolkata, India Dr. Sachi Nandan Mohanty, KIIT University, India Dr. Anirban Das, University of Engineering and Management, Kolkata, India Dr. Tiya Dey Malakar, RCC Institute of Information Technology, Kolkata, India Dr. Arpita Ghosh, RCC Institute of Information Technology, Kolkata, India Dr. Papia Datta, RCC Institute of Information Technology, Kolkata, India Dr. Biswarup Neogi, JIS College of Engineering, India Dr. Tanupriya Chourdhury, University of Petroleum and Energy Studies, India Dr. Swarnendu Chakraborty, National Institute of Technology, Arunachal Pradesh, India Dr. Anilesh Dey, Narula Institute of Technology, Kolkata, India Dr. Soumi Dutta, Institute of Engineering and Management, Kolkata, India Dr. Debashis Mondal, RCC Institute of Information Technology, Kolkata, India Dr. Anup Kumar Kolya, RCC Institute of Information Technology, Kolkata, India Dr. Sutirtha Kumar Guha, Meghnad Saha Institute of Technology, Kolkata, India Dr. Baisakhi Das, Institute of Engineering and Management, Kolkata, India Dr. Shiladitya Pujari, University Institute of Technology, Burdwan, India Dr. Chandan Koner, Dr. B. C. Roy Engineering College, Durgapur, India Dr. Tanmay Bhattacharya, Techno Main Salt Lake, India Dr. Chandan Bhattacharyya, Sister Nivedita University, Kolkata, India Dr. Sangita Agarwal, RCC Institute of Information Technology, Kolkata, India Dr. Tathagata Deb, RCC Institute of Information Technology, Kolkata, India Sanjib Saha, Dr. B. C. Roy Engineering College, Durgapur, India Uddalak Chatterjee, BITM, Santiniketan, India Rabi Narayan Behera, Institute of Engineering and Management, Kolkata, India Sudipta Bhattacharya, Bengal Institute of Technology, Kolkata, India Anjan Bandyopadhyay, Amity University, Kolkata, India Anirban Bhar, Narula Institute of Technology, Kolkata, India Mousumi Bhattacharyya, Sister Nivedita University, Kolkata, India Sudeep Basu, North Bengal University, India Mithun Roy, Siliguri Institute of Technology, Siliguri, India Soumitra Sasmal, Techno Main Salt Lake, India

Institutional Advisory Chairs Dr. Minakshi Banerjee, RCC Institute of Information Technology, Kolkata, India Dr. Ashoke Mondal, RCC Institute of Information Technology, Kolkata, India Dr. Alok Kole, RCC Institute of Information Technology, Kolkata, India Dr. Abhishek Basu, RCC Institute of Information Technology, Kolkata, India


Dr. Soham Sarkar, RCC Institute of Information Technology, Kolkata, India Dr. Srijan Bhattacharyya, RCC Institute of Information Technology, Kolkata, India Dr. Arindam Mondal, RCC Institute of Information Technology, Kolkata, India Dr. Shilpi Bhattacharya, RCC Institute of Information Technology, Kolkata, India

Institutional Program Committee Arpan Deyasi, RCC Institute of Information Technology, Kolkata, India Soumen Mukherjee, RCC Institute of Information Technology, Kolkata, India Arup Kumar Bhattacharjee, RCC Institute of Information Technology, Kolkata, India Rajib Saha, RCC Institute of Information Technology, Kolkata, India Harinandan Tunga, RCC Institute of Information Technology, Kolkata, India Biswanath Chakraborty, RCC Institute of Information Technology, Kolkata, India Arijit Ghosh, RCC Institute of Information Technology, Kolkata, India Nitai Banerjee, RCC Institute of Information Technology, Kolkata, India Alokananda De, RCC Institute of Information Technology, Kolkata, India Satabdwi Sarkar, RCC Institute of Information Technology, Kolkata, India Pampa Debnath, RCC Institute of Information Technology, Kolkata, India Naiwrita Dey, RCC Institute of Information Technology, Kolkata, India Nandan Bhattacharyya, RCC Institute of Information Technology, Kolkata, India Budhaditya Biswas, RCC Institute of Information Technology, Kolkata, India Kalyan Biswas, RCC Institute of Information Technology, Kolkata, India Avishek Paul, RCC Institute of Information Technology, Kolkata, India Anindya Basu, RCC Institute of Information Technology, Kolkata, India Sk. Mazharul Islam, RCC Institute of Information Technology, Kolkata, India Koushik Mallick, RCC Institute of Information Technology, Kolkata, India Parama Bagchi, RCC Institute of Information Technology, Kolkata, India Priya Sen Purkait, RCC Institute of Information Technology, Kolkata, India Satarupa Chatterjee, RCC Institute of Information Technology, Kolkata, India Dr. Joyeeta Basu Pal, RCC Institute of Information Technology, Kolkata, India Moumita Banerjee, RCC Institute of Information Technology, Kolkata, India Deepam Ganguly, RCC Institute of Information Technology, Kolkata, India Subhrajit Sinha Roy, RCC Institute of Information Technology, Kolkata, India Sarbojit Mukherjee, RCC Institute of Information Technology, Kolkata, India Nijam Ud-Din Molla, RCC Institute of Information Technology, Kolkata, India

Organizing Chairs Hrishikesh Bhaumik, RCC Institute of Information Technology, Kolkata, India Abantika Choudhury, RCC Institute of Information Technology, Kolkata, India Ranjan Jana, RCC Institute of Information Technology, Kolkata, India


Jayanta Datta, RCC Institute of Information Technology, Kolkata, India Hiranmoy Roy, RCC Institute of Information Technology, Kolkata, India Soumyadip Dhar, RCC Institute of Information Technology, Kolkata, India Amit Khan, RCC Institute of Information Technology, Kolkata, India Pankaj Pal, RCC Institute of Information Technology, Kolkata, India Sudarsan Biswas, RCC Institute of Information Technology, Kolkata, India Shaswati Roy, RCC Institute of Information Technology, Kolkata, India


Preface

Artificial intelligence is a leading theory of computer science now. Scientists across all engineering disciplines are interested in this concept. Artificial intelligence has opened the door for all research enthusiasts and application developers to propose new-age research and application concepts. The future of artificial intelligence enabled research and application is very promising. This book will discuss the recent research trends and upcoming applications based on artificial intelligence. Many of the versatile fields of artificial intelligence will be categorically addressed in different chapters of this volume. Over the years scientists have developed several efficient algorithms to address different real-world problems realistically and propose meaningful solution for them. However, the classical problem-solving algorithms often fall short of offering a robust solution to handle the multiple constraints encountered in real-life situations since these core methods are often uncertain and imprecise, which remain intractable to process in practice using the conventional classical methods. Moreover, with the progress of technology, the need for advanced computational techniques is always called for addressing the complex real-life problems. The objective of such computational paradigm is to give rise to fail-safe and robust solutions to the emerging problems faced by mankind. Imparting intelligence in a machine is the need of the hour. Several intelligent techniques have been in vogue over the year in this direction. Among these techniques, the soft computing techniques stand in good stead. However, it is often noted that the soft computing techniques often fall short in offering a formidable solution. On and above, if the different components of the soft computing paradigm are conjoined together, the resultant hybrid intelligent computing paradigm is found to be more efficient and robust by design and performance in these situations. This book aims to introduce to the prospective readers the latest trends in artificial intelligence with reference to both the classical and hybrid computational paradigms. The editors would like to take this opportunity to express their heartfelt regards to the Management of RCC Institute of Information Technology, Kolkata, and all the


committee members of RAAI 2020, especially the technical committee members who have critically reviewed all the articles. Special thanks to Mr. Aninda Bose, Senior Editor, Springer, India, for his constant support and guidance during this book project tenure. Kolkata, India January 2021

Indrajit Pan Anirban Mukherjee Vincenzo Piuri

Contents

Prediction and Analysis on COVID-19 Using Positive and Negative Association Rule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sujit Chakraborty, Sudarsan Biswas, and Sourav Debnath

1

Hybrid Algorithm Based on DWT-DCT-RSA with Digital Watermarking for Secure Image Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . Arkadeep Dey, Portret Mallick, and Harinandan Tunga

13

Detecting Sexually Predatory Behavior on Open-Access Online Forums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yash Singla

27

Swarm-Based Sudoku Solution: An Optimization Procedure . . . . . . . . . . Sayak Haldar, Pritam Kumar Roy, and Sutirtha Kumar Guha

41

Application of Cellular Automata (CA) for Predicting Urban Growth and Disappearance of Vegetation and Waterbodies . . . . . . . . . . . . Debasrita Baidya, Abhijit Sarkar, Arpita Mondal, and Diptarshi Mitra

49

Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text and English Humor Literature . . . . . . . . . . . . . . . . . . . . . . . . . Sourav Das and Anup Kumar Kolya

63

Sentiment Analysis of Covid-19 Tweets Using Evolutionary Classification-Based LSTM Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Arunava Kumar Chakraborty, Sourav Das, and Anup Kumar Kolya

75

Clustering as a Brain-Network Detection Tool for Mental Imagery Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reshma Kar and Indronil Mazumder

87

Comparative Study of the Effect of Different Fitness Functions in PSO Algorithm on Band Selection of Hyperspectral Imagery . . . . . . . . 101 Aditi Roy Chowdhury, Joydev Hazra, Kousik Dasgupta, and Paramartha Dutta


Breast Abnormality Detection Using Texture Feature Extracted by Difference-Based Variable-Size Local Filter (DVLF) . . . . . . . . . . . . . . . 111 Sourav Pramanik, Debotosh Bhattacharjee, and Mita Nasipuri Nuclei Image Boundary Detection Based on Interval Type-2 Fuzzy Set and Bat Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Soumyadip Dhar, Hiranmoy Roy, Rajib Saha, Parama Bagchi, and Bishal Ghosh Machine Learning Approach to Sentiment Analysis from Movie Reviews Using Word2Vec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Amit Khan, Dipankar Majumdar, and Bikromadittya Mondal Selection of Edge Detection Techniques Based on Machine Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Soumen Santra, Dipankar Majumdar, and Surajit Mandal ANN-Based Self-Tuned PID Controller for Temperature Control of Heat Exchanger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Godavarthi Charan, Dasa Sampath, K. Sandeep Rao, and Y. V. Pavan Kumar A Novel Partitioning Algorithm to Process Large-Scale Data . . . . . . . . . . 163 Indradeep Bhattacharya and Shibakali Gupta Segmentation of Blood Vessels, Optic Disc Localization, Detection of Exudates, and Diabetic Retinopathy Diagnosis from Digital Fundus Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Soham Basu, Sayantan Mukherjee, Ankit Bhattacharya, and Anindya Sen Interval Type-2 Fuzzy Framework for Healthcare Monitoring and Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Uduak Umoh, Samuel Udoh, Abdultaofeek Abayomi, and Alimot Abdulazeez Real-time Social Distancing Monitoring and Detection of Face Mask to Control the Spread of COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 Shreyas Mishra Emotion Recognition from Feature Mapping Between Two Different Lobes of Human Brain Using EEG . . . . . . . . . . . . . . . . . . . . . . . . . 203 Susmita Chaki, Anirban Mukherjee, and Subhajit Chatterjee Secured Diabetic Retinopathy Detection through Hard Exudates . . . . . . . 213 Subhrajit Sinha Roy, Abhishek Basu, and Avik Chattopadhyay SAR Image Change Detection Using Modified Gauss-Log Ratio Operator and Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Chanchal Ghosh, Dipankar Majumdar, and Bikromadittya Mondal Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 Nandini Kumari, Shamama Anwar, and Vandana Bhattacharjee


Design and Development of a Pipeline Inspection Robot for Visual Inspection and Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Abu Salman Shaikat, Molla Rashied Hussein, and Rumana Tasnim A Random Forest Classifier Combined with Missing Data Strategies for Predicting Chronic Kidney Disease Stages . . . . . . . . . . . . . . 255 João P. Scoralick, Gabriele C. Iwashima, Fernando A. B. Colugnati, Leonardo Goliatt, and Priscila V. S. Z. Capriles Automatic Cataract Detection Using Sobel and Morphological Dilation Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 Akanksha Soni and Avinash Rai Context Based Searching in Cloud Data for Improved Precision of Search Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 Viji Gopal and Varghese Paul Gaussian-Based Spatial FCM Technique for Interdisciplinary Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 Srirupa Das COVID-19 India Forecast Preparedness for Potential Emergencies . . . . . 297 Narayana Darapaneni, Ankit Rastogi, Bhagyashri Bhosale, Subhash Bhamu, Turyansu Subhadarshy, Usha Aiyer, and Anwesh Reddy Paduri Prediction of Cyclodextrin Host-Guest Binding Through a Hybrid Support Vector Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 Ruan M. Carvalho, Iago G. L. Rosa, Priscila V. Z. C. Goliatt, Diego E. B. Gomes, and Leonardo Goliatt Hybrid Unsupervised Extreme Learning Machine Applied to Facies Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 Camila M. Saporetti, Iago G. L. Rosa, Ruan M. Carvalho, Egberto Pereira, and Leonardo G. da Fonseca Decision Tree-Based Classification Model to Predict Student Employability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 Chandra Patro and Indrajit Pan Interactive and Intelligent Tutoring of Graphical Solutions . . . . . . . . . . . . 335 Prapty Chanda, Nilormi Das, Dishani Kar, Anirban Mukherjee, and Arindam Mondal Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

Editors and Contributors

About the Editors Indrajit Pan received his B.E. in Computer Science and Engineering with Honors from The University of Burdwan (2005) and M.Tech. in Information Technology from Bengal Engineering and Science University, Shibpur (2009). He was the recipient of University Medal in his Masters. He obtained Ph.D. in Engineering from Indian Institute of Engineering Science and Technology, Shibpur, in 2015. His current research interest includes community detection, cloud computing, influence theory, and digital microfluidic biochip. He joined RCC Institute of Information Technology in 2006 and is now Associate Professor and Head of the Information Technology Department. He has research publications in different international journals, edited books, and conference proceedings. He has also co-authored some edited research volumes and international conference proceedings. He served as the guest editor in International Journal of Hybrid Intelligence, Inderscience, and is currently an editorial board member of Elsevier’s Applied Soft Computing Journal. He is now a senior member of IEEE, USA, and a member of ACM, USA. Dr. Anirban Mukherjee did his Bachelors in Civil Engineering in 1994 from Jadavpur University, Kolkata. While in service, he achieved a professional Diploma in Operations Management (PGDOM) in 1998 and completed his Ph.D. on ‘Automatic Diagram Drawing based on Natural Language Text Understanding’ from Indian Institute of Engineering, Science and Technology (IIEST), Shibpur, in 2014. Serving RCC Institute of Information Technology (RCCIIT), Kolkata, since inception (in 1999), he is currently Professor in the Department of Information Technology. Before joining RCCIIT, he served as Engineer in the Scientific and Technical Application Group in erstwhile RCC, Calcutta, for 6 years. He has several international journal and conference papers to his credit. He is the co-editor of one IGI Global Book and also the co-editor of two books of CRC Press and Willey. He has reviewed several journal papers and book chapters and acted as the guest editor of Journal of Pattern Recognition Research and International Journal of Computers and Applications.


Vincenzo Piuri has received his Ph.D. in Computer Engineering at Politecnico di Milano, Italy (1989). He is Full Professor in Computer Engineering at the Università degli Studi di Milano, Italy (since 2000). He has been Associate Professor at Politecnico di Milano, Italy, and Visiting Professor at the University of Texas at Austin and at George Mason University, USA. His main research interests are artificial intelligence, computational intelligence, intelligent systems, machine learning, pattern analysis and recognition, signal and image processing, biometrics, intelligent measurement systems, industrial applications, digital processing architectures, fault tolerance, and cloud computing infrastructures. Original results have been published in 400+ papers in international journals, proceedings of international conferences, books, and book chapters. He is Fellow of the IEEE, Distinguished Scientist of ACM and a senior member of INNS. He is President of the IEEE Systems Council (2020–2021) and has been IEEE Vice President for Technical Activities (2015), IEEE Director, President of the IEEE Computational Intelligence Society, Vice President for Education of the IEEE Biometrics Council, Vice President for Publications of the IEEE Instrumentation and Measurement Society and the IEEE Systems Council, and Vice President for Membership of the IEEE Computational Intelligence Society.

Contributors Abdultaofeek Abayomi Department of Computer Science, University of Uyo, Uyo, Akwa Ibom State, Nigeria Alimot Abdulazeez Department of Computer Science, University of Uyo, Uyo, Akwa Ibom State, Nigeria Usha Aiyer Great Learning, Bangalore, India Shamama Anwar Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India Parama Bagchi RCC Institute of Information Technology, Kolkata, India Debasrita Baidya Department of Geography, Kazi Nazrul University, Asansol, India Abhishek Basu Electronics and Communication Engineering, RCC Institute of Information Technology, Kolkata, India Soham Basu Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, West Bengal, India Subhash Bhamu Great Learning, Bangalore, India Debotosh Bhattacharjee Jadavpur University, Kolkata, India Vandana Bhattacharjee Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India


Ankit Bhattacharya Tata Consultancy Services, Kolkata, West Bengal, India Indradeep Bhattacharya University Institute of Technology, The University of Burdwan, Bardhaman, West Bengal, India Bhagyashri Bhosale Great Learning, Bangalore, India Sudarsan Biswas Department of Information Technology, RCC Institute of Information Technology, Kolkata, W.B, India Priscila V. S. Z. Capriles Federal University of Juiz de Fora, Juiz de Fora, Brazil Ruan M. Carvalho Computational Modeling Program, Federal University of Juiz de Fora (UFJF), São Pedro, Juiz de Fora, Minas Gerais, Brazil Susmita Chaki University of Engineering & Management, Kolkata, India Arunava Kumar Chakraborty Department of Computer Science & Engineering, RCC Institute of Information Technology, Kolkata, India Sujit Chakraborty Department of Information Technology, RCC Institute of Information Technology, Kolkata, W.B, India Prapty Chanda RCC Institute of Information Technology, Kolkata, West Bengal, India Godavarthi Charan School of Electronics Engineering, VIT-AP University, Amaravati, AP, India Subhajit Chatterjee University of Engineering & Management, Kolkata, India Avik Chattopadhyay Radio Physics and Electronics, University of Calcutta, Kolkata, India Aditi Roy Chowdhury Women’s Polytechnic, Kolkata, India Fernando A. B. Colugnati Federal University of Juiz de Fora, Juiz de Fora, Brazil Leonardo G. da Fonseca Computational Modeling Program, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil Narayana Darapaneni Northwestern University/Great Learning, Evanston, USA Nilormi Das RCC Institute of Information Technology, Kolkata, West Bengal, India Sourav Das Maulana Abul Kalam Azad University of Technology, WB, Kolkata, India Srirupa Das RCC Institute of Information Technology, Kolkata, India Kousik Dasgupta Kalyani Government Engineering College, Kalyani, Nadia, India Sourav Debnath Electrical Engineering Department, Camellia Institute of Technology, Kolkata, W.B, India


Arkadeep Dey Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, West Bengal, India Soumyadip Dhar RCC Institute of Information Technology, Kolkata, India Paramartha Dutta Visvabharati University, Santiniketan, India Bishal Ghosh RCC Institute of Information Technology, Kolkata, India Chanchal Ghosh Department of MCA, Calcutta Institute of Technology, Uluberia, Howrah, India Leonardo Goliatt Computational Modeling, Federal University of Juiz de Fora (UFJF), São Pedro, Juiz de Fora, Minas Gerais, Brazil Priscila V. Z. C. Goliatt Computational Modeling, Federal University of Juiz de Fora (UFJF), São Pedro, Juiz de Fora, Minas Gerais, Brazil Diego E. B. Gomes Computational Modeling, Federal University of Juiz de Fora (UFJF), São Pedro, Juiz de Fora, Minas Gerais, Brazil Viji Gopal School of Engineering, Cochin University of Science and Technology, Cochin, Kerala, India Sutirtha Kumar Guha Meghnad Saha Institute of Technology, Kolkata, West Bengal, India Shibakali Gupta University Institute of Technology, The University of Burdwan, Bardhaman, West Bengal, India Sayak Haldar Meghnad Saha Institute of Technology, Kolkata, West Bengal, India Joydev Hazra Heritage Institute of Technology, Kolkata, India Molla Rashied Hussein University of Asia Pacific, Dhaka, Bangladesh Gabriele C. Iwashima Federal University of Juiz de Fora, Juiz de Fora, Brazil Dishani Kar RCC Institute of Information Technology, Kolkata, West Bengal, India Reshma Kar Artificial Intelligence Laboratory, ETCE Department, Jadavpur University, Kolkata, India Amit Khan Department of IT, RCC Institute of Information Technology, Kolkata, India Anup Kumar Kolya Department of Computer Science & Engineering, RCC Institute of Information Technology, Kolkata, India Nandini Kumari Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India Dipankar Majumdar Department of CSE, RCC Institute of Information Technology, Kolkata, West Bengal, India


Portret Mallick Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, West Bengal, India Surajit Mandal Department of ECE, B.P. Poddar Institute of Management & Technology, Kolkata, West Bengal, India Indronil Mazumder ECE Department, RCC Institute of Information Technology, Kolkata, India Shreyas Mishra National Institute of Technology, Rourkela, India Diptarshi Mitra Department of Geography, Kazi Nazrul University, Asansol, India; Salt Lake City, Kolkata, India Arindam Mondal RCC Institute of Information Technology, Kolkata, West Bengal, India Arpita Mondal Department of Geography, Kazi Nazrul University, Asansol, India Bikromadittya Mondal Department of CSE, B. P. Poddar Institute of Management and Technology, Kolkata, India Anirban Mukherjee RCC Institute of Information Technology, Kolkata, West Bengal, India Sayantan Mukherjee Tata Consultancy Services, Kolkata, West Bengal, India Mita Nasipuri Jadavpur University, Kolkata, India Indrajit Pan RCC Institute of Information Technology, Kolkata, West Bengal, India Chandra Patro RCC Institute of Information Technology, Kolkata, West Bengal, India Varghese Paul School of Engineering, Cochin University of Science and Technology, Cochin, Kerala, India Y. V. Pavan Kumar School of Electronics Engineering, VIT-AP University, Amaravati, AP, India Egberto Pereira State University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil Sourav Pramanik New Alipore College, Kolkata, India Avinash Rai Department of ECE, UIT (RGPV), Bhopal, M.P., India K. Sandeep Rao School of Electronics Engineering, VIT-AP University, Amaravati, AP, India Ankit Rastogi Great Learning, Bangalore, India Anwesh Reddy Paduri Great Learning, Bangalore, India


Iago G. L. Rosa Computational Modeling Program, Federal University of Juiz de Fora (UFJF), São Pedro, Juiz de Fora, Minas Gerais, Brazil Hiranmoy Roy RCC Institute of Information Technology, Kolkata, India Pritam Kumar Roy Meghnad Saha Institute of Technology, Kolkata, West Bengal, India Subhrajit Sinha Roy Electronics and Communication Engineering, RCC Institute of Information Technology, Kolkata, India; Radio Physics and Electronics, University of Calcutta, Kolkata, India Rajib Saha RCC Institute of Information Technology, Kolkata, India Dasa Sampath School of Electronics Engineering, VIT-AP University, Amaravati, AP, India Soumen Santra Department of MCA, Techno International New Town, Kolkata, West Bengal, India Camila M. Saporetti State University of Minas Gerais, Divinópolis, MG, Brazil Abhijit Sarkar Department of Geography, Kazi Nazrul University, Asansol, India João P. Scoralick Federal University of Juiz de Fora, Juiz de Fora, Brazil Anindya Sen Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, West Bengal, India Abu Salman Shaikat World University of Bangladesh, Dhaka, Bangladesh Yash Singla Manav Rachna International Institute of Research and Studies, Faridabad, Haryana, India Akanksha Soni Department of ECE, UIT (RGPV), Bhopal, M.P., India Turyansu Subhadarshy Great Learning, Bangalore, India Rumana Tasnim World University of Bangladesh, Dhaka, Bangladesh Harinandan Tunga Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, West Bengal, India Samuel Udoh Department of Information and Communication Technology, Mangosuthu University of Technology, Durban, South Africa Uduak Umoh Department of Computer Science, University of Uyo, Uyo, Akwa Ibom State, Nigeria

Prediction and Analysis on COVID-19 Using Positive and Negative Association Rule Mining Sujit Chakraborty, Sudarsan Biswas, and Sourav Debnath

Abstract Enormous, complicated, heterogeneous data can be processed easily using data mining. An attempt has been made to generate patterns for the COVID-19 disease that can be used to help detect as well as treat affected patients. Both the most common symptoms and the rare, hidden ones are analyzed and predicted by applying positive (interesting) and negative (uninteresting) association rules. The study therefore covers both frequent and infrequent itemsets of affected patients, keeping in mind the risk levels of the worldwide pandemic situation. The extracted frequent and infrequent itemsets can assist medical professionals in making diagnostic recommendations and determining the riskiness of patients at an initial stage, since test reports are generated only a few days after sample collection. Keywords Data mining · COVID-19 prediction · Medical data · Frequent itemsets · Association rule mining

S. Chakraborty (B) · S. Biswas Department of Information Technology, RCC Institute of Information Technology, Kolkata 700015, W.B, India e-mail: [email protected] S. Debnath Electrical Engineering Department, Camellia Institute of Technology, Kolkata 700129, W.B, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_1


1 Introduction In today’s world, COVID-19 is the preeminent human-killer pandemic. In India, co-morbidity is the cause of death from COVID-19 in most cases [1]. It has claimed millions of lives globally and thousands of lives in India to date. Common symptoms of COVID-19 are fever, tiredness and dry cough [2]. Less familiar symptoms that may affect some patients include sore throat, aches and pains, headache, nasal congestion, loss of taste or smell, conjunctivitis, diarrhea or a rash on the skin [3]. Some people who are COVID-19 positive have only very mild symptoms. Data mining is widely used in the clinical field. Association rule mining is the most efficient data mining approach for extracting frequent itemsets from enormous datasets; its foremost objective is to bring out frequent itemsets from a transactional database. However, association rule mining algorithms neglect many valuable infrequent itemsets. These infrequent itemsets with low support can give rise to important negative association rules with high confidence. The problem addressed in this paper is extracting positive(+ve) and negative(−ve) association rules from the frequent and infrequent itemsets. Applying negative rule mining along with positive rule mining in the medical field makes it possible to extract the rare symptoms of a particular disease. Most research articles on frequent itemsets from medical data analyze HIV [4, 5], heart disease [6–8], cancer and tumor [9, 10], and diabetes mellitus [11, 12] using the apriori algorithm [13]. Only a few researchers have worked on negative association rules to find rare features of various diseases [14–20]. There is very little work on COVID-19 using positive(+ve) and negative(−ve) association rules, which is discussed with experiments in this paper.
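The measures used throughout the paper (support, confidence, lift) are not written out in this part of the text; for reference, the standard definitions, which are consistent with how the algorithms in Sect. 2.2 use them, are given below. Here TRD denotes the transaction database of patients, and X and Y are generic itemsets of symptoms (these symbol names are ours, not the paper's).

```latex
% Standard association-rule measures (assumed, consistent with Algorithms 1-3):
\mathrm{supp}(X)=\frac{\lvert\{T\in TRD : X\subseteq T\}\rvert}{\lvert TRD\rvert},\qquad
\mathrm{conf}(X\Rightarrow Y)=\frac{\mathrm{supp}(X\cup Y)}{\mathrm{supp}(X)},\qquad
\mathrm{lift}(X\Rightarrow Y)=\frac{\mathrm{conf}(X\Rightarrow Y)}{\mathrm{supp}(Y)}.
```

A positive rule has the form X ⇒ Y, while negative rules involve negated itemsets (X ⇒ ~Y, ~X ⇒ Y, ~X ⇒ ~Y), where supp(~X) = 1 − supp(X); a rule is considered interesting when its confidence reaches the minimum confidence and its lift indicates positive correlation (lift of at least 1).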

2 Methodology The proposed method derives both positive(+ve) and negative(−ve) association rules from frequent and infrequent itemsets, considering a sample text dataset of COVID-19 symptoms. "Minimum support" and "minimum confidence" values are applied to the pre-processed dataset to generate the frequent itemsets. In the last step, positive(+ve) and negative(−ve) association rules are derived from the "frequent" and "infrequent" itemsets using lift, as elaborated in Fig. 1.

2.1 Data Source and COVID-19 Symptoms Symptoms are simulated for a COVID-19 dataset containing sample records of 1000 patients [21–24, 19, 20] and are tabulated in Table 1.


Fig. 1 Block diagram of methodology (blocks: COVID-19 dataset; pre-processed dataset; minimum support and confidence; symptoms; proposed method; frequent and infrequent itemsets of symptoms; lift; positive and negative association rules and possibility of being affected by COVID-19)

Table 1 Symptoms for COVID-19 affected patients

Symptoms number | Symptom name         | Symptoms number | Symptom name
1               | Shortness of breath  | 10              | Cough
2               | Sneezing             | 11              | Chilblains
3               | Throat pain          | 12              | Dizziness
4               | Diarrhea             | 13              | Fatigue
5               | Vomiting             | 14              | Fever
6               | Chest pain           | 15              | Body pain
7               | Nausea               | 16              | Changes in heart rhythm
8               | Headache             | 17              | Skin rashes or blood clots
9               | Conjunctivitis       | 18              | Loss of taste or smell

2.2 Algorithms All four algorithms are noted below.

2.2.1 Algorithm-1

Input: TRD - COVID-19 dataset; Min_support - "minimum support" value; Min_confi - "minimum confidence"
Output: FREQ - "frequent" itemsets; inFREQ - "infrequent" itemsets
(1) initialize FREQ = Φ; inFREQ = Φ;
(2) tem1 = {P | P is a candidate 1-itemset (symptom)};
(3) FREQ1 = {P | P ∈ tem1 and support(P) ≥ min_support};
(4) inFREQ1 = tem1 − FREQ1; z = 2;
(5) while (temz−1 != Φ) do
      Dz = candidate generation(temz−1, min_support);
      for each transaction tran ∈ TRD do
        Dtran = subset(Dz, tran);
        for each candidate d ∈ Dtran do d.counter++; end for;
      end for;
      d.supp = d.counter / |TRD|;
      temz = {d | d ∈ Dz and d.supp ≥ min_support};
(6)   FREQz = {P | P ∈ temz and P.supp ≥ min_support};
(7)   inFREQz = temz − FREQz;
(8)   FREQ = ∪z FREQz;
(9)   inFREQ = ∪z inFREQz;
(10)  z++;
    end while;
(11) return FREQ and inFREQ;

Description of Algorithm-1 In this algorithm, patients are expressed as transactions and their symptoms as itemsets; the support of an itemset is its frequency of occurrence in the dataset. First, "FREQ" and "inFREQ" are initialized to the empty set. All 1-candidate itemsets (individual symptoms) are added to "tem1". Symptoms whose support is at least the minimum support are assigned to "FREQ1", and "FREQ1" is subtracted from "tem1" to obtain the infrequent 1-itemsets, which are assigned to "inFREQ1". The itemset size z (the number of symptoms for a patient) is initialized to 2. While "temz−1" is not empty, candidate z-itemsets are generated from "temz−1" and stored in "Dz". The database "TRD" is then scanned: for each patient (transaction "tran"), the candidates contained in that transaction are identified and their counters incremented, after which the support of each candidate z-itemset is computed as its counter divided by |TRD|. A candidate d (a set of symptoms) is kept in "temz" if its support is at least the minimum support. Itemsets of "temz" whose support reaches the threshold are added to "FREQz", while the remaining ones are added to "inFREQz" by subtracting "FREQz" from "temz". The generated frequent z-itemsets are accumulated in "FREQ" and the infrequent ones in "inFREQ", the itemset size z is incremented by 1, and the loop repeats. Finally, "FREQ" and "inFREQ" are returned.
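To make the flow concrete, the following is a minimal Python sketch of the frequent/infrequent split that Algorithm-1 describes. It is not the authors' implementation: the transactions, symptom names, and the 0.5 support threshold in the example are purely illustrative.

```python
# A minimal sketch of the frequent/infrequent itemset split of Algorithm-1
# (not the authors' implementation). Transactions are sets of symptom names;
# the example data and the 0.5 support threshold are purely illustrative.

def frequent_and_infrequent(transactions, min_support):
    n = len(transactions)
    freq, infreq = [], []           # lists of (itemset, support) pairs
    # Candidate 1-itemsets: every symptom that occurs in the data.
    candidates = {frozenset([s]) for t in transactions for s in t}
    k = 1
    while candidates:
        # Scan the database once per level to count support of each candidate.
        support = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        freq_k = {c for c, s in support.items() if s >= min_support}
        infreq_k = candidates - freq_k  # low-support itemsets are kept, not discarded
        freq.extend((set(c), support[c]) for c in freq_k)
        infreq.extend((set(c), support[c]) for c in infreq_k)
        # Apriori-style join: build (k+1)-item candidates from frequent k-itemsets.
        k += 1
        candidates = {a | b for a in freq_k for b in freq_k if len(a | b) == k}
    return freq, infreq

if __name__ == "__main__":
    data = [{"fever", "cough", "fatigue"},
            {"fever", "cough"},
            {"fever", "loss of taste or smell"},
            {"cough", "throat pain"}]
    freq, infreq = frequent_and_infrequent(data, min_support=0.5)
    print("frequent itemsets:  ", freq)
    print("infrequent itemsets:", infreq)
```

The difference from plain Apriori is that the low-support candidates are retained in a separate collection rather than discarded, so that negative rules can later be mined from them.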

2.2.2 Algorithm 2

Input: minisupport: "minimum support"; minconfi: "minimum confidence"; FREQ (frequent itemsets); inFREQ (infrequent itemsets)
Output: POSR: Positive(+ve) Association Rules; NEGR: Negative(-ve) Association Rules
(1) POSR = Φ; NEGR = Φ;
(2) for each itemset S in FREQ do
      for each itemset pair S1, S2 with S1 ∪ S2 = S and S1 ∩ S2 = Φ do
(3)     if confidence(S1 ⇒ S2) ≥ minconfi && lift(S1 ⇒ S2) ≥ 1 then
          output the rule (S1 ⇒ S2); POSR = POSR ∪ (S1 ⇒ S2);
        else
(4)       if confidence(S1 ⇒ ~S2) ≥ minconfi && lift(S1 ⇒ ~S2) ≥ 1 then
            output the rule (S1 ⇒ ~S2); NEGR = NEGR ∪ (S1 ⇒ ~S2);
          if confidence(~S1 ⇒ S2) ≥ minconfi && lift(~S1 ⇒ S2) ≥ 1 then
            output the rule (~S1 ⇒ S2); NEGR = NEGR ∪ (~S1 ⇒ S2);
          if confidence(~S1 ⇒ ~S2) ≥ minconfi && lift(~S1 ⇒ ~S2) ≥ 1 then
            output the rule (~S1 ⇒ ~S2); NEGR = NEGR ∪ (~S1 ⇒ ~S2);
      end for;
    end for;
(5) for each itemset S in inFREQ do
      for each itemset pair S1, S2 with S1 ∪ S2 = S, S1 ∩ S2 = Φ, support(S1) ≥ minisupport and support(S2) ≥ minisupport do
(6)     if confidence(S1 ⇒ S2) ≥ minconfi && lift(S1 ⇒ S2) ≥ 1 then
          output the rule (S1 ⇒ S2); POSR = POSR ∪ (S1 ⇒ S2);
        else
          if confidence(S1 ⇒ ~S2) ≥ minconfi && lift(S1 ⇒ ~S2) ≥ 1 then
            output the rule (S1 ⇒ ~S2); NEGR = NEGR ∪ (S1 ⇒ ~S2);
          if confidence(~S1 ⇒ S2) ≥ minconfi && lift(~S1 ⇒ S2) ≥ 1 then
            output the rule (~S1 ⇒ S2); NEGR = NEGR ∪ (~S1 ⇒ S2);
          if confidence(~S1 ⇒ ~S2) ≥ minconfi && lift(~S1 ⇒ ~S2) ≥ 1 then
            output the rule (~S1 ⇒ ~S2); NEGR = NEGR ∪ (~S1 ⇒ ~S2);
      end for;
    end for;
(7) return POSR and NEGR;


Description of Algorithm 2 Confidence indicates how often a rule has been found to be true, and lift measures the interestingness of a rule. First, POSR and NEGR are initialized to the empty set. Association rules are then generated from FREQ (the frequent itemsets). For an individual patient S, all symptoms belong to FREQ. If S1 and S2 are any two symptom sets of patient S, rules of the form S1 ⇒ S2 are generated. If the confidence and lift of (S1 ⇒ S2) are at least the minimum confidence and 1 respectively, the output is a positive (+ve) rule. Otherwise, if the confidence and lift of (S1 ⇒ ~S2) are at least the minimum confidence and 1 respectively, the output is a negative (−ve) rule; the same test is applied to (~S1 ⇒ S2) and (~S1 ⇒ ~S2), each yielding a negative rule when it passes. Association rules are then generated in the same way from inFREQ (the infrequent itemsets) for a patient S with symptom sets S1 and S2, provided the supports of S1 and S2 are each at least the minimum support: (S1 ⇒ S2) gives a positive rule, while (S1 ⇒ ~S2), (~S1 ⇒ S2), and (~S1 ⇒ ~S2) give negative rules whenever their confidence is at least the minimum confidence and their lift is at least 1. Finally, the values of POSR and NEGR are returned.
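The confidence and lift tests at the heart of Algorithm 2 can be sketched in Python as follows. Only the S1 ⇒ S2 and S1 ⇒ ~S2 branches are written out; the ~S1 ⇒ S2 and ~S1 ⇒ ~S2 branches follow the same pattern. The helper names are illustrative and not taken from the authors' code.

def support(itemset, transactions):
    # Fraction of patient transactions that contain all symptoms in `itemset`
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def classify_rules(s1, s2, transactions, min_conf):
    """Tag S1 => S2 as positive or S1 => ~S2 as negative, per Algorithm 2."""
    sup1 = support(s1, transactions)
    sup2 = support(s2, transactions)
    sup12 = support(s1 | s2, transactions)
    rules = []
    conf_pos = sup12 / sup1                                # confidence(S1 => S2)
    if conf_pos >= min_conf and conf_pos / sup2 >= 1:      # lift = confidence / support(S2)
        rules.append(("POSR", "S1 => S2"))
    elif sup2 < 1.0:
        # support(S1 and ~S2) = support(S1) - support(S1 and S2)
        conf_neg = (sup1 - sup12) / sup1                   # confidence(S1 => ~S2)
        if conf_neg >= min_conf and conf_neg / (1 - sup2) >= 1:
            rules.append(("NEGR", "S1 => ~S2"))
    return rules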

2.2.3 Algorithm 3

Given: support(S1 ∪ S2) ≥ minisupport
(1) if confidence(S1 ⇒ S2) ≥ minconfi, and lift(S1 ⇒ S2) > 1
        then S1 ⇒ S2 is an effective positive rule; S1 and S2 are positively correlated with minimum confidence.
    else
(2) if confidence(S1 ⇒ S2) < minconfi, and lift(S1 ⇒ S2) < 1
        then S1 ⇒ S2 is not an effective positive rule; S1 and S2 are negatively correlated with lower than minimum confidence. Hence, negative association rules are generated from itemset S.
(3) if confidence(S1 ⇒ ~S2) ≥ minconfi, and lift(S1 ⇒ ~S2) > 1
        then S1 ⇒ ~S2 is an effective negative rule; S1 and S2 are positively correlated with minimum confidence.


Description of Algorithm 3 Here, the support of (S1 ∪ S2) is greater than or equal to the minimum support. If the confidence of (S1 ⇒ S2) is greater than or equal to the minimum confidence and the lift of (S1 ⇒ S2) is greater than 1, then S1 ⇒ S2 is an effective positive rule, and S1 and S2 are positively correlated with minimum confidence. Otherwise, if the confidence and lift of (S1 ⇒ S2) are less than the minimum confidence and 1 respectively, then S1 ⇒ S2 is not an effective positive rule, and S1 and S2 are negatively correlated with lower than minimum confidence; hence, negative association rules are generated from itemset S. If the confidence of (S1 ⇒ ~S2) is greater than or equal to the minimum confidence and the lift of (S1 ⇒ ~S2) is greater than 1, then S1 ⇒ ~S2 is an effective negative rule, and S1 and S2 are positively correlated with minimum confidence.

2.2.4 Algorithm 4

Given: support(S1 ∪ S2) < minisupport, and support(S1 ∪ S2) != 0,
       support(S1) ≥ minisupport, and support(S2) ≥ minisupport
(1) if confidence(S1 ⇒ S2) ≥ minconfi, and lift(S1 ⇒ S2) > 1
        then S1 ⇒ S2 is an effective positive rule; S1 and S2 are positively correlated with minimum confidence.
    else
(2) if confidence(S1 ⇒ ~S2) ≥ minconfi and lift(S1 ⇒ ~S2) > 1
        then S1 ⇒ ~S2 is an effective negative rule; S1 and ~S2 are positively correlated with minimum confidence.

Description of Algorithm 4 Here, the support of (S1 ∪ S2) is less than the minimum support and not equal to 0, while the supports of S1 and S2 are each greater than or equal to the minimum support. If the confidence of (S1 ⇒ S2) is greater than or equal to the minimum confidence and the lift of (S1 ⇒ S2) is greater than 1, then S1 ⇒ S2 is an effective positive rule, and S1 and S2 are positively correlated with minimum confidence. Otherwise, if the confidence of (S1 ⇒ ~S2) is greater than or equal to the minimum confidence and the lift of (S1 ⇒ ~S2) is greater than 1, then S1 ⇒ ~S2 is an effective negative rule, and S1 and ~S2 are positively correlated with minimum confidence.


3 Result and Discussions

The generated itemsets are summarized in Table 2. It has been observed that the number of frequent itemsets decreases as the minimum support value increases, while the number of infrequent itemsets gradually increases, as shown in Fig. 2. All four types of association rules have been implemented and the corresponding results are recorded in Tables 3, 4, 5 and 6; these results are also visualized in Figs. 3, 4, 5 and 6, respectively. The risk factor of a COVID-19 patient can be readily predicted from these rules. Here, {Cough ⇒ Shortness of breath} means that "shortness of breath" is experienced by a patient who is suffering from "cough", and {~Throat pain, Headache ⇒ ~Cough} means that if a patient experiences "headache" but not "throat pain", then he may not have "cough", with high confidence.

Table 2 Generated numbers of frequent and infrequent items

Support | Frequent itemsets | Infrequent itemsets
0.1     | 14                | 21
0.15    | 10                | 25
0.25    | 8                 | 27
0.3     | 6                 | 29
0.4     | 4                 | 31

Fig. 2 Graphical representation of generated numbers of frequent and infrequent items

Table 3 [+ve] rules from frequent itemsets

Rules                              | Support | Confidence | Lift
{cough} ⇒ {shortness of breath}    | 0.41    | 0.63       | 1.07
{shortness of breath} ⇒ {fever}    | 0.39    | 0.67       | 1.15
{cough} ⇒ {fever}                  | 0.39    | 0.63       | 1.08
{body pain} ⇒ {headache}           | 0.31    | 0.58       | 1.18
{cough} ⇒ {body pain}              | 0.37    | 0.69       | 1.13


Table 4 [−ve] rules from frequent itemsets

Rules                       | Support | Confidence | Lift
{Cough} ⇒ {~fatigue}        | 0.5     | 0.92       | 1.29
{~sneezing} ⇒ {cough}       | 0.42    | 1.00       | 1.47
{~dizziness} ⇒ {fatigue}    | 0.67    | 0.89       | 1.21

Table 5 [+ve] rules from infrequent itemsets

Rules                           | Support | Confidence | Lift
{headache} ⇒ {throat pain}      | 0.20    | 0.92       | 2.21
{vomiting} ⇒ {diarrhea}         | 0.15    | 0.93       | 2.75
{fever} ⇒ {chilblain}           | 0.10    | 0.91       | 2.23
{loss of taste} ⇒ {headache}    | 0.05    | 0.93       | 2.54

Table 6 [−ve] rules from infrequent itemsets

Rules                                   | Support | Confidence | Lift
{conjunctivitis} ⇒ {~chest pain}        | 0.34    | 0.94       | 1.67
{nausea} ⇒ {~changes in heart rhythm}   | 0.32    | 0.97       | 1.94
{throat pain} ⇒ {~chilblain}            | 0.30    | 0.97       | 1.56

Fig. 3 Graphical representation of Table 3

Fig. 4 Graphical representation of Table 4


Fig. 5 Graphical representation of Table 5

Fig. 6 Graphical representation of Table 6

4 Conclusion

Positive and negative association rule mining is beneficial and can be used for subsequent prediction of diseases across all kinds of medical sectors. Using data mining, a predictive model has been generated successfully from retrospective data. The proposed method can be applied not only to COVID-19 but also to any member of the coronavirus family, as well as to datasets of other viral diseases, to predict and analyze the exposure of patients on the basis of the chosen attributes (symptoms). Numerous data mining techniques are available to project the outbreak, each with its own merits and demerits. The present research has investigated the prevalence of coronavirus worldwide based on retrospective data, and a predictive model has been presented. This model is able to carry out successful prediction within an acceptable error limit.


Hybrid Algorithm Based on DWT-DCT-RSA with Digital Watermarking for Secure Image Transfer Arkadeep Dey, Portret Mallick, and Harinandan Tunga

Abstract The accelerated expansion of digitalized information in the form of multimedia has created the need for secure encryption techniques to protect data from illegal manipulation. Digital image watermarking is one of the technologies developed to protect a digital image from strangers or third-party authorities. The aim of this research paper is to develop a new technique to encrypt digital images and decrypt them with a higher PSNR value. The article proposes a new algorithm that combines three techniques (DWT, DCT, and RSA) to boost the performance and security of image watermarking. The proposed algorithm is tested against a range of previously proposed algorithms and the results are reported. It has also been tested with different image formats such as PNG, JPEG, GIF, and TIFF. The study thereby addresses the combined problem of watermarking and encrypting digital images. Keywords DCT · Decryption · Digital watermarking · DWT · Encryption · Imperceptibility · LSB · PSNR · Public Key · Private Key · Robustness · RSA

1 Introduction Watermarking is a technique that provides copyright protection and authentication of information. And digital watermarking is one such procedure for inserting one digitalized multimedia into another multimedia object, which can be later extracted for various purposes including authentication and identification. The preexisting A. Dey (B) · P. Mallick · H. Tunga Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, West Bengal, India e-mail: [email protected] P. Mallick e-mail: [email protected] H. Tunga e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_2


text encryption algorithms were not strong enough against unwanted tampering attacks such as adding, erasing, or reformatting a text sequence. A watermark produced by encrypting images makes the digital information stronger and more secure against illegal attacks. We have therefore tried to develop an image watermarking technique that is more robust and secure for creating and transferring a digitally encrypted image. In the field of image watermarking, there are two types of techniques. One is visible watermarking, where the embedded image is visible to the naked eye within the watermarked content. The other is invisible watermarking, the opposite of visible watermarking, where the viewer is unable to identify the secret image within the cover image, but the secret image can easily be extracted using decoding techniques. In this article, we achieve invisible watermarking with a higher PSNR value. To embed the watermark image into the original image, we initially used the LSB encoding technique, but it was not sufficiently secure and was easy for attackers to decrypt. We therefore replaced that process with the DWT (Discrete Wavelet Transform) and DCT (Discrete Cosine Transform) algorithms, and finally combined the two techniques to obtain better performance, implementing a hybrid DWT-DCT digital image watermarking algorithm. To ensure secure transmission of the digitally watermarked image, we applied the RSA encryption algorithm. Using the RSA key generation algorithm, two key pairs are generated: a public key pair for the sender and a private key pair for the receiver. The public key is used to encrypt the watermarked image into a binary file, and the private key decrypts that file back into the watermarked image. After applying the DWT-DCT decryption algorithm, two separate images are obtained: (1) the cover image and (2) the secret image. We have calculated the PSNR values of the images. Performance evaluation results show that our proposed algorithm (DWT+DCT+RSA) improves the performance of the watermarking technique.

2 Literature Review

Abdullah B., R. Ibrahim, and Mohd. Najib B. Mohd Salleh proposed an LSB-based watermarking algorithm that uses the third and fourth bits of an 8-bit pixel for hiding the data instead of the last bit [1], but this technique was not robust. C. A. Dhote and S. P. Ingale came up with a DWT-based approach that provides both imperceptibility and robustness by choosing the most appropriate sub-band to watermark [2]. The discrete wavelet transform (DWT) is an image transformation technique in which the wavelets are discretely sampled; the transform splits the signal into higher- and lower-frequency segments. In watermarking, the high-frequency segments are mostly utilized, as the human eye is not very sensitive to changes that occur at the edges of an image (Fig. 1). For the 2D DWT, we obtain four sub-bands, i.e., one low-frequency (Low-Low) band and three high-frequency (High-High, Low-High, High-Low) bands.


Fig. 1 2-level DWT of image

Instead of using DWT, A. Saxena and M. Singh used DCT and a genetic algorithm for watermarking [3]. Their technique breaks the digital image down into non-intersecting sections and applies the transformation to each section. The transformation produces three sub-bands, i.e., a low-frequency, a mid-frequency, and a high-frequency sub-band. The technique relies on two facts. The first is that the low-frequency section, which carries most of the signal energy, establishes the most important visual parts of the digital image. The second is that the high-frequency segment of an image is largely destroyed by noise attacks and compression. For this reason, encryption is performed by changing the coefficients of the mid-frequency band [4].

3 Proposed Hybrid DWT-DCT-RSA Algorithm

The watermarking scheme proposed here is based on a combined DWT-DCT-RSA technique, referred to as the hybrid algorithm, in which both DWT and DCT are used to identify the most appropriate sub-band for embedding so that the image quality is preserved as far as possible. To further provide secure transmission of the digital signal from sender to receiver, the algorithm is extended with RSA encryption.

3.1 Procedure for Encryption

To watermark an image within an image, three inputs are required: the cover image, the secret image, and the RSA-generated public key. First, the cover image is converted into a fixed-size grayscale image array. DWT is then applied to decompose the cover image array into four non-intersecting multi-resolution sub-bands. The Low-Low (LL) sub-band is divided into 8 × 8 segments, and DCT is applied to each segment in the LL sub-band. An embedding function adds the secret image to the LL sub-band, i.e., each bit of the secret image is placed in the 7th bit position of every 8th pixel of the cover image. We tested every bit position and found that the 7th bit gives the maximum PSNR value. Inverse DCT is then applied to every segment after modifying the mid-band coefficients to embed the watermark bits as described above. After that, inverse DWT is applied to the DWT-transformed image, including the modified sub-band, to produce the watermarked host image array. Using the RSA algorithm, two keys, a public key pair and a private key pair, are generated. Each pixel value of the watermarked image is encrypted with the public key, and the encrypted data are written to a binary file. The working principle of the encryption is shown in Figs. 2 and 3.

Fig. 2 Block diagram of our proposed DWT-DCT-RSA encryption
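The transform pipeline described above can be sketched in Python with PyWavelets and SciPy as follows. The bit-embedding step is a simplified stand-in for the 7th-bit-of-every-8th-pixel rule, and the helper names are illustrative assumptions, not the authors' implementation.

import numpy as np
import pywt
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block.T, norm='ortho').T, norm='ortho')

def idct2(block):
    return idct(idct(block.T, norm='ortho').T, norm='ortho')

def embed(cover, secret_bits, wavelet='haar'):
    # 1. One-level 2D DWT of the grayscale cover image: LL plus detail sub-bands
    LL, (LH, HL, HH) = pywt.dwt2(cover.astype(float), wavelet)
    # 2. DCT of the LL sub-band
    C = dct2(LL)
    # 3. Toy embedding: write each secret bit into bit 6 (the "7th bit") of the
    #    rounded coefficient at every 8th position of the flattened LL band
    flat = np.rint(C).astype(np.int64).ravel()
    for i, bit in enumerate(secret_bits):          # assumes len(secret_bits)*8 <= flat.size
        pos = i * 8
        flat[pos] = (flat[pos] & ~(1 << 6)) | (int(bit) << 6)
    C = flat.reshape(C.shape).astype(float)
    # 4. Inverse DCT, then inverse DWT to rebuild the watermarked image
    LL_mod = idct2(C)
    watermarked = pywt.idwt2((LL_mod, (LH, HL, HH)), wavelet)
    return np.clip(watermarked, 0, 255).astype(np.uint8)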

3.2 Procedure for Decryption

For decryption, two inputs are available: the encrypted binary file and the previously generated private key pair. The encrypted binary file is decrypted with the RSA-generated private key, and the resulting values are arranged into a fixed-size watermarked image array. DWT is then applied to decompose the watermarked image array into four non-intersecting multi-resolution sub-bands. The LL sub-band is divided into 8 × 8 segments. After that, the discrete cosine transform is applied to every segment in the LL band and the mid-band coefficient of every segment is extracted. The secret image is then reconstructed from the watermark bits extracted from the image. Finally, the resemblance between the extracted watermark and the original image is calculated for performance analysis. The block diagram is given in Fig. 4, and the decryption technique is further illustrated in Figs. 5 and 6.

Fig. 3 Our proposed DWT-DCT-RSA encryption mechanism

Fig. 4 Our proposed decrypting mechanism

Fig. 5 Block diagram of our proposed DWT-DCT-RSA decryption

Fig. 6 Overall use case diagram

Fig. 7 Cover images (first image) and secret images (last two)

Fig. 8 DWT-DCT-RSA watermarked images and decrypted secret images

4 Proposed Algorithms

4.1 Key Generation Algorithm [5]

Algorithm: Key generation [5]
This algorithm generates the public and private keys and saves them in two separate files.
Input: A collection of large prime numbers between 2 and 100K in a text file.
Output: One public key pair and one private key pair.
Step 1. Take two random prime numbers in two variables named p and q.
Step 2. Compute n and the totient phi (φ), i.e., n = p × q and φ = (p − 1) × (q − 1).
Step 3. Choose a value e such that the HCF of e and the totient φ is 1 and 1 < e < n.
Step 4. Compute d such that (e × d) = 1 (mod φ) and 1 < d < φ, where e and d are inverses (mod φ); the extended Euclidean HCF computation is used for this.
Step 5. Make sure d (the private key) is positive: if (x < 0) then d = x + φ else d = x.
Step 6. Write the key pairs to two different files: set the key pairs in list variables (i.e., public = [e, n] and private = [d, n]), then create the files public.txt and private.txt and save the keys.


4.2 The DWT-DCT-RSA-Based Encryption Algorithm

Algorithm: Encryption of image
This is the main algorithm, which takes two images and generates the watermarked image using DWT-DCT. Then, using RSA and the public key, it converts the watermarked image into a binary file.
Input: The original (secret) image array, a watermark (cover) image array, and the public key file.
Output: One encrypted image file.
Step 1. Transform the cover image into a grayscale image array and the secret image into a grayscale image array of (1/16)th of the cover image size.
Step 2. Apply the two-dimensional (2D) DWT to the cover image array, which returns a tuple [ca1, (ch1, cv1, cd1)], and copy ca1 (the low-low frequency band) into the coefficient image array variable (coeffs_image[]).
Step 3. Apply DCT to coeffs_image[0].
Step 4. Convert the 2D secret watermark image into a 1D array and store it.
Step 5. Encode each value of that array into the 7th bit of every 8th pixel.
Step 6. Apply the inverse DCT to the embedded image array and store it back into the coeffs_image[0] array.
Step 7. Apply the inverse DWT to the image array to obtain the watermarked image array.
Step 8. Open the file public.txt to collect the public key (e, n).
Step 9. Collect each value from the image array and set it into the 'plain_text' variable.
Step 10. If 'plain_text' is greater than n, an exception occurs because the plain text is too large for encoding; otherwise encode using the RSA encoding function, i.e., c = (m^e) mod n, where m is 'plain_text'.
Step 11. Save the encoded output in the 'output.txt' file.
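Steps 8-11 amount to modular exponentiation of each pixel value with the public key. A hedged Python sketch, using toy key values and an illustrative file name, is shown below.

def rsa_encrypt_pixels(pixels, e, n):
    # c = (m ** e) mod n for every pixel value m; pow(m, e, n) is the fast built-in form
    if max(pixels) >= n:
        raise ValueError("plain text too large for encoding with this modulus")
    return [pow(int(m), e, n) for m in pixels]

# Example: encrypt a few 8-bit pixel values with a toy key pair (e, n)
cipher = rsa_encrypt_pixels([12, 200, 45], e=17, n=3233)
with open("output.txt", "w") as f:
    f.write(" ".join(map(str, cipher)))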


4.3 The DWT-DCT-RSA-Based Decryption Algorithm

Algorithm: Decryption of image
This is the algorithm for decryption. It takes an encrypted file and regenerates the watermarked image using DCT and DWT, returning the secret image to the receiver.
Input: Binary file and private key file.
Output: Original secret image.
Step 1. Open the file 'private.txt' to collect the private key (d, n).
Step 2. Open the file 'output.txt' to collect the encrypted data and set the values in the 'encrypted_text' variable.
Step 3. Take every value of the 'encrypted_text' variable and decrypt it with the RSA decryption function, i.e., m = (c^d) mod n, where m is the plain text and c is the encrypted text.
Step 4. Collect the image array from Step 3.
Step 5. Apply the DWT transformation to the image array and store it in the coeffs_watermarked_image variable.
Step 6. Perform DCT on the coefficient array.
Step 7. Extract the 7th bit from every 8th pixel of that image to obtain the 1D bit sequence of the secret image.
Step 8. Convert the 1D bit sequence into a 2D image array.
Step 9. Take the image array passed as a parameter and bound every value between 0 and 255.
Step 10. Convert that array into the type "uint8".
Step 11. Convert the 2D array into an image and save this final decrypted image.

5 Result and Performance Evaluation

We have implemented the above-mentioned hybrid algorithm to encrypt and decrypt images using the Python language and produced a watermarked image as the result. The watermarked image quality is compared with the host image quality using performance factors. To measure the watermarked image quality, we have calculated the peak signal-to-noise ratio (PSNR) for both images [6]. The PSNR, in decibels (dB), is given by

PSNR_dB = 10 · log10(MAX² / MSE) = 20 · log10(MAX / √MSE)

where MAX is the maximum possible pixel value and MSE is the mean squared error between the input and output pixel values.
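A straightforward NumPy version of this PSNR computation for 8-bit images (MAX = 255) is given below, as an illustrative sketch rather than the authors' code.

import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    # Mean squared error between the two images, computed in floating point
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10 * np.log10(max_val ** 2 / mse)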


The performance of the discrete wavelet transform (DWT)-based encryption, discrete cosine transform (DCT)-based encryption, combined DWT-DCT-based encryption, and our proposed DWT-DCT-RSA hybrid encryption is evaluated using two cover images, i.e., the 4096 × 4096 images 'Lena' and 'Baboon', and two secret images, i.e., the 256 × 256 images 'Foods' and 'Cameraman' (Figs. 7 and 8). The performance of our proposed hybrid encryption technique is described alongside the DWT-, DCT-, and DWT-DCT-based encryption techniques. For comparison, we have also evaluated the watermarking performance when the High-High sub-band of the DWT is used, although that approach gave lower performance in terms of PSNR; the High-Low and Low-High sub-bands of the DWT gave similarly unacceptable results. To boost performance, we used the Low-Low sub-band of the DWT together with the DCT transformation. We also varied the bit position in the watermark image used to embed the secret image's bit sequence and calculated the resulting PSNR, as reported in Table 1. We conclude that embedding in the 7th bit position of every 8th pixel of the Low-Low DWT sub-band of the host image gives the highest PSNR value, which is clear from Fig. 9A. To provide secure transmission, we extend the algorithm with RSA encryption: to decrypt the binary data into an image file, the authorized recipient needs to know the unique private key pair, which makes the algorithm robust and secure. From the results in Table 2, it is also clear that using host and secret images in TIFF format yields the highest PSNR value.

Table 1 PSNR between secret image and decrypted image for embedding in different bit positions of different DWT sub-bands of the host image

DWT sub-band         | 0th   | 1st  | 2nd   | 3rd   | 4th   | 5th   | 6th   | 7th
PSNR for LL sub-band | 53.31 | 65.9 | 65.97 | 66.08 | 61.66 | 66.01 | 66.05 | 66.15
PSNR for HH sub-band | 36.97 | 36.9 | 36.9  | 36.97 | 36.97 | 36.97 | 36.97 | 36.97

If we consider the LL part of the DWT and embed in the 7th bit, we get the highest PSNR.

Fig. 9 Comparative PSNR analysis from the data of Tables 1 and 3, respectively (A: PSNR vs. embedding bit position for the LL and HH DWT sub-bands; B: PSNR of the DWT, DCT, DWT-DCT, and DWT-DCT-RSA techniques for the Table 3 data)


Table 2 PSNR between secret image and decrypted image for different image formats using our proposed algorithm

Cover image  | Secret image    | PSNR achieved
Lena.gif     | Foods.tiff      | 62.788
Lena.gif     | Foods.png       | 62.725
Lena.gif     | Foods.jpg       | 56.78
Lena.gif     | Foods.gif       | 56.125
Lena.tiff    | Foods.tiff      | 66.15
Lena.tiff    | Foods.png       | 66.11
Lena.tiff    | Foods.jpg       | 56.95
Lena.tiff    | Foods.gif       | 56.12
Lena.png     | Foods.tiff      | 65.202
Baboon.jpg   | Cameraman.tiff  | 66.2
Baboon.jpg   | Cameraman.png   | 66.268
Baboon.jpg   | Cameraman.jpg   | 62.87
Baboon.gif   | Cameraman.tiff  | 66.15
Baboon.gif   | Cameraman.png   | 66.15
Baboon.gif   | Cameraman.jpg   | 65.00
Baboon.gif   | Cameraman.gif   | 61.55
Baboon.tiff  | Cameraman.tiff  | 66.28
Baboon.tiff  | Cameraman.png   | 66.28
Baboon.tiff  | Cameraman.jpg   | 65.07
Baboon.tiff  | Cameraman.gif   | 56.56
Baboon.png   | Cameraman.tiff  | 66.28
Baboon.png   | Cameraman.png   | 66.28
Baboon.png   | Cameraman.jpg   | 65.06
Baboon.png   | Cameraman.gif   | 56.56

We have also encrypted the secret image using the DCT and DWT algorithms separately. Table 3 gives the PSNR values obtained with the different encryption techniques, and Fig. 9B compares these PSNR values graphically using the experimental data of Table 3. From the Table 3 data it is clear that our proposed algorithm gives the highest PSNR value. For the sake of comparison, we have also reported the PSNR values achieved by implementing different types of algorithms, and it is clear that our proposed algorithm achieves the highest PSNR value (approx. 66.28).

6 Conclusion

In this article, we have implemented an algorithm that combines the discrete cosine transform and the discrete wavelet transform with RSA encryption to watermark digital images. Both transforms have previously been applied in numerous ways for digital watermarking, but our hybrid algorithm offers a highly effective technique in terms of both performance and security. The combined DCT-DWT transformation improves performance compared with standalone DWT- or DCT-based watermarking techniques, while RSA makes the digital image more secure against third-party access through asymmetric encryption. We can therefore say that, in this method, combining various transforms has an effective impact on performance. However, our proposed algorithm has some limitations. So far, we have implemented it only for images in 8-bit grayscale format. In future, we will try to encrypt digital images while keeping the image format unchanged. We will also apply machine learning (ML)-based algorithms


Table 3 PSNR between secret image and decrypted image for different techniques (PSNR achieved)

Serial No. | Image used              | DWT   | DCT   | DWT-DCT | DWT-DCT-RSA
1          | Lena.jpg and Foods.jpg  | 37.12 | 42.32 | 48.32   | 55.688
2          | Lena.jpg and Foods.gif  | 31.27 | 45.48 | 51.16   | 56.85
3          | Lena.gif and Foods.tiff | 36.21 | 51.32 | 57.36   | 62.78
4          | Lena.gif and Foods.gif  | 35.48 | 39.20 | 51.51   | 56.12
5          | Lena.gif and Foods.png  | 35.48 | 39.21 | 52.36   | 56.82
6          | Lena.jpg and Foods.png  | 37.26 | 41.45 | 58.78   | 65.60
7          | Lena.gif and Foods.png  | 37.34 | 40.13 | 56.26   | 62.70
8          | Lena.gif and Foods.gif  | 36.94 | 40.11 | 56.37   | 61.55
9          | Lena.tiff and Foods.tiff| 38.97 | 41.73 | 59.14   | 66.15
10         | Lena.png and Foods.png  | 37.65 | 42.25 | 58.32   | 66.28

Table 4 Comparative PSNR analysis between our proposed algorithm and previously proposed algorithms

Algorithm                                                                    | Maximum PSNR achieved
Our proposed algorithm (DWT-DCT-RSA)                                         | 66.28
Algorithm based on LSB, by Abdullah Bamatraf [1]                             | 54.6
DWT- and DCT-based algorithm by Wang Na, Wang Yunjin, Li Xia [4]             | 51.01
Algorithm based on DCT and Genetic Algorithm (GA) by M. Singh and A. Saxena [3] | 53.32

to identify the high-frequency pixels of the cover image, to make this encryption process more efficient and imperceptible. We will also try to apply quantum concepts and other newer approaches to obtain a more secure and robust watermarked image.


References
1. Abdullah, B., Ibrahim, R., Salleh, M.N.B.: Digital watermarking algorithm using LSB. IEEE (2010). https://doi.org/10.1109/ICCAIE.2010.5735066
2. Ingale, S.P., Dhote, C.A.: Digital watermarking algorithm using DWT technique. IJCSMC 5(5) (2016)
3. Singh, M., Saxena, A.: Image watermarking using discrete cosine transform [DCT] and genetic algorithm [GA]. IJIERM 04(03), Paper id IJIERM-IV-II1275 (2017)
4. Na, W., Yunjin, W., Xia, L.: A novel robust watermarking algorithm based on DWT and DCT. IEEE (2009). https://doi.org/10.1109/CIS.2009.135
5. RSA Algorithm in Cryptography - GeeksforGeeks. https://www.geeksforgeeks.org/rsaalgorithm-cryptography (2020). Last accessed 28 Nov 2020
6. Al-Haj, A.: Combined DWT-DCT digital image watermarking. J. Comput. Sci. (2007). https://doi.org/10.3844/jcssp.2007.740.746

Detecting Sexually Predatory Behavior on Open-Access Online Forums Yash Singla

Abstract Technological advancements have turned the world into a global village. Online communication channels have surpassed geographical boundaries and lessened the gap between us. At the same time, the malicious activity persisting in platforms such as online chat rooms cannot be overlooked. It creates an unsafe atmosphere particularly for children on the internet. The objective of this research is to enhance children’s safety by protecting them from social ills that make them vulnerable to sexual predators online. In this study, online conversations are examined using computational linguistics and statistical learning to identify sexually predatory behavior. The research is carried out with a special focus on contextual details, which identifies the sections of conversations that are unique to misbehaving users. The proposed algorithm detects potential misuse of chat rooms with a two-stage classification system and classifies them into three groups of potential levels of maliciousness. F1-score is used to measure the accuracy of this algorithm. Keywords Computational linguistics · Statistical learning · Sexually predatory behavior online

1 Introduction Virtual communication services and social networking platforms have made it easier than ever to summon those we love and care about with the click of a button. However, the world of online chat rooms is twisted into a loser’s lobby due to an alarmingly high number of perverts. The existence of sexual predators that enter into chat rooms or forums and try to convince children to provide some sexual favor is a socially worrying issue [1]. Examples of such acts are the online pedophiles who “groom” children, that is, who meet underage victims online, engage in sexually explicit text or video chat with them, and eventually convince the children to meet them in person Y. Singla (B) Manav Rachna International Institute of Research and Studies, Faridabad, Haryana 121004, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_3


[2]. According to the research on Sexual-Orientated Online Chat Conversations—Characteristics and Testing Pathways of Online Perpetrators [3], there can be two types of perpetrators, Clint-type and Small-talk type, who persuade the minor to engage in (online) sexual activities using a direct or indirect approach. Therefore, developing intelligent systems that can identify sexually predatory behavior online is a pressing priority. Some of the praiseworthy works surrounding the identification of sexually predatory behavior in online chat rooms include prediction using chained classifiers [4], examining large sets of chat logs based on psycholinguistics using a learning-based approach [1], and exploring high-level features for detecting cyberpedophilia [5]. Spotting the pedophile by telling victim from predator in text chats [6] and learning to identify sexual predation [7] are two other notable works worth mentioning. This research explores the dark undertow of online chat rooms that creates an unsafe environment for children. It focuses on analyzing the entire conversation and deciding whether it contains sexually predatory behavior, as opposed to previous approaches, which focused on detecting predators. To detect the potential misuse of instant messaging in such an environment, an approach using deep learning and statistical learning models coupled with feature extraction methods is proposed in this research. Contextual details are also accounted for while developing the approach. Besides that, the proposed algorithm incorporates Word2Vec, Linear Discriminant Analysis, and AdaBoost.

2 Methodology The proposed algorithm’s input is parsed as a CSV file consisting of all conversation data. These data also include a conversational label that separates conversations showing predatory behavior (binary value 1) from those that do not (binary value 0).

2.1 Data Manipulation and Text Pre-processing In this study, preprocessing the data involves data cleaning methods such as removing conversations of less than three words, extra white spaces, HTML tags, links, and numerical characters. In addition, all the characters are converted to lowercase and spell-checked. The preprocessed text output consists of a conversation label along with the entire conversation in a flat file.


2.2 Vector Representation of Words

Google's Word2Vec is used to convert the textual data into a high-dimensional vector representation in order to quantify the information. The model is trained to reconstruct the linguistic context of words by placing the vectors of words used in the same context close to each other (Fig. 1). If dimensionality reduction is performed on the word embeddings, these semantic regularities can be visualized in so-called low-dimensional word vector spaces. A good example of such a vector space can be seen in Fig. 2.

Fig. 1 Detailed overview of the deep learning model called Word2Vec. Image credits go to [8]

Fig. 2 Three-dimensional vector space, which contains the vectors “Man”, “Woman”, “King”, “Queen”. “Man” and “Woman” are related to each other the same way as “King” and “Queen” are related to each other, since the vectors in between them describe the male–female relationship. Photo Credit to [9]


In order to train the Word2Vec model, the Gensim Python library's implementation of Word2Vec was used. Gensim's implementation requires the text corpus and the size of the word vectors as parameters; size = 400, workers = 20, and a text file containing one pre-processed conversation per line were the parameters given to the trained model. size = 400 was chosen experimentally: 100, 200, 300, 400, and 500 were considered, but word vectors of size 400 appeared to work best with the classification models discussed later. The Word2Vec model creates a vector representation of each unique word used somewhere within a conversation, so a conversation can be represented as a set of n word vectors, where n is the number of words present in that specific conversation.
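A short sketch of this training step is given below. Recent Gensim releases name the dimensionality parameter vector_size (older releases used size, as in the text); the corpus file name is an assumption.

from gensim.models import Word2Vec

# One pre-processed conversation per line, tokenized on whitespace
with open("conversations.txt", encoding="utf-8") as f:
    corpus = [line.split() for line in f]

model = Word2Vec(sentences=corpus, vector_size=400, workers=20, min_count=1)
vector = model.wv["hello"]   # 400-dimensional embedding for one word (if present)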

2.3 Feature Extraction Using Word Embedding Aggregation

This research lays special focus on the exploration of contextual details within a conversation; for this purpose, feature extraction is implemented. A word vector-specific technique called "word embedding aggregation" [10] is used to extract the features. In the data manipulation section, it was mentioned that each conversation is concatenated into one very long sentence, which may seem like an odd choice for data manipulation. The real purpose of conversation concatenation is to set up conversations to be aggregated into "conversation embeddings", using the Word2Vec embeddings and De Boom et al.'s word-embedding-aggregated sentence embeddings. Based on De Boom et al.'s results, the algorithm applies coordinate-wise min and max functions to the word embeddings within a conversation. This approach works well, assuming that "conversation embeddings" behave the same way as sentence embeddings. If the vectors for the n words in the conversation are v1, v2, …, vn ∈ R^d, then min(v1, v2, …, vn) and max(v1, v2, …, vn) are computed: the coordinate-wise minimum vector and maximum vector of the n word vectors within a conversation become two separate feature vectors. Finally, as suggested in De Boom et al.'s work, concatenating these two feature vectors yields a coordinate-wise min-max feature vector for feature extraction purposes. In this project, the word embeddings are 400-dimensional, so the concatenation of the coordinate-wise minimum and maximum vectors results in an 800-dimensional feature vector per conversation.
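A short NumPy sketch of this aggregation step (illustrative only, not the author's code):

import numpy as np

def conversation_embedding(word_vectors):
    """word_vectors: array of shape (n_words, 400) for one conversation."""
    v = np.asarray(word_vectors)
    # Coordinate-wise minimum and maximum, concatenated into one 800-d vector
    return np.concatenate([v.min(axis=0), v.max(axis=0)])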


2.4 Two-Stage Classification System

After feature extraction, the 800-dimensional feature vectors were ready to be fed into various classification models. However, no single model was able to minimize false positives and false negatives at the same time. Therefore, a two-stage classification system is designed for this project.

First-Stage Classifier. The first classifier reduces the number of chats suggesting predatory behavior that are labeled otherwise. This first-stage classifier is trained to predict whether a conversation contains predatory behavior or not. Using the first-stage classifier, the predicted groups can be interpreted in the following ways:
1. Conversations predicted by the first-stage classifier as containing non-predatory behavior: these are conversations that most likely do not contain predatory behavior.
2. Conversations predicted by the first-stage classifier as containing predatory behavior: these are conversations that need to be looked at again by a secondary classifier, to filter out false positives.

A comparison of the performance of different first-stage classification models can be found in the Results section, but it is worth noting that the Linear Discriminant Analysis classifier was chosen as the first-stage classifier, purely on the basis of cross-validated error rate and precision measurements.

Second-Stage Classifier. The second-stage classifier filters through all conversations labeled by the first-stage classifier as containing possible predatory behavior, reducing the number of non-predatory chats labeled otherwise. Using the second-stage classifier, the predicted groups can be interpreted in the following ways:
1. Conversations predicted by the second-stage classifier as containing non-predatory behavior: these are conversations that could possibly contain predatory behavior.
2. Conversations predicted by the second-stage classifier as containing predatory behavior: these are conversations that most likely contain predatory behavior.

A comparison of the performance of different second-stage classifiers can be found in the Results section, but it is worth noting that the AdaBoost classifier was chosen as the second-stage classifier, purely on the basis of cross-validated error rate, precision, and recall measurements. Overall, the two-stage classification system generates three groups, in increasing danger levels: conversations most likely not containing predatory behavior, conversations possibly containing predatory behavior, and conversations most likely containing predatory behavior. The first-stage classifier produces two groups, and the second-stage classifier then breaks one of those groups into two separate groups, which results in a hierarchical relationship between the groups.
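A minimal scikit-learn sketch of such a two-stage system is given below, assuming X is a NumPy array of 800-dimensional conversation feature vectors and y holds the 0/1 conversation labels; it is an illustration, not the author's exact pipeline.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier

def fit_two_stage(X, y):
    # Stage 1: LDA separates "not predatory" (0) from "needs a second look" (1)
    stage1 = LinearDiscriminantAnalysis().fit(X, y)
    flagged = stage1.predict(X) == 1
    # Stage 2: AdaBoost re-examines only the conversations flagged by stage 1
    stage2 = AdaBoostClassifier().fit(X[flagged], y[flagged])
    return stage1, stage2

def predict_two_stage(stage1, stage2, X_new):
    # 0: most likely not predatory, 1: possibly predatory, 2: most likely predatory
    labels = np.zeros(len(X_new), dtype=int)
    flagged = stage1.predict(X_new) == 1
    if flagged.any():
        labels[flagged] = np.where(stage2.predict(X_new[flagged]) == 1, 2, 1)
    return labels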


3 Results

3.1 Sexual Predatory Conversation Identification Task

A controlled case study was performed to analyze whether a conversation contains predatory behavior, instead of identifying a predator. The dataset used for this controlled case study was obtained from Inches and Crestani's work [11]. The test set was chosen over the training set, since the two sets' ground-truth labels are different and only the test set's labels matched the needs of the controlled case study. Inches and Crestani's test set provides an XML file containing 155,128 conversations, alongside author and line metadata. A ground-truth label file is also given, containing the conversation ids and line ids of those lines considered suspicious (of perverted behavior) in a particular conversation. Given these data, the conversation XML file was processed into a CSV file, with columns containing conversation id, line number, author, and text, and each row representing a line within a conversation. To get the data into the format described at the beginning of Sect. 2, the predatory line labels had to be converted to conversation-level labels. To do so, each conversation's line-level labels were checked, and a conversation was assigned the label 1 (predatory) if any of its lines contained predatory behavior, and 0 (non-predatory) otherwise. After these initial data processing steps, the whole algorithm described in Sect. 2 was executed. The conversation dataset contains 155,128 observations, which is a large sample size, so in order to speed up the computational process a sample of 100,000 conversations was taken. In this random sample, 99,450 conversations had the label 0, meaning no predatory behavior had been identified within the conversation, and 550 conversations had the label 1, indicating that predatory behavior is present within the conversation. A seed value of 2017 was used to create reproducible results. The final results of this controlled case study are the classification results obtained from the two-stage classification system. K-fold cross-validation was used to assess the performance of various classification models on the feature vectors extracted from each conversation; k = 10 was chosen for the number of folds in the cross-validation process.

3.2 First-Stage Classification Results

In the first-stage classification process, LASSO, Linear Discriminant Analysis, Support Vector Machine, Random Forest, Bagging, and generalized boosted models were tested. The Random Forest model consistently outperformed the Bagging model, and since the two are similar, the Bagging model was dropped. The two groups within the dataset are massively unbalanced, so measuring only the error rate of the classifiers would be a mistake. Instead, recall and precision measurements for the predatory-behavior label, together with F-scores, are taken into consideration.

Table 1 Average recall and precision, and overall F1-score from the top five first-stage classification models

Classification model  | Average recall | Average precision | F1-score
LDA                   | 0.5223         | 0.9144            | 0.6648
SVM                   | 0.7664         | 0.6585            | 0.7084
Random forest         | 0.8241         | 0.2982            | 0.4379
LASSO                 | 0.6739         | 0.6492            | 0.6613
Gr. boosting machine  | 0.8646         | 0.3018            | 0.4474

In the statistical analysis of binary classification, the F-score is a harmonic mean of precision and recall and has a parameter β, a constant that places more emphasis on either precision or recall. For the purposes of this binary classification task, β = 1 was chosen, which makes the F-scores F1-scores: recall and precision are weighted equally, while extreme recall or precision values are punished. Table 1 contains the average precision and recall measurements from each fold within the cross-validation process, along with the overall F1-score for each model. All results are color-coded based on cross-comparisons between each model's average recall, precision, and F1 scores; lower values within the table are shown in red, while higher values are shown in green. As previously established, the first-stage classifier needs to reduce the number of false negatives (predatory conversations labeled as non-predatory); therefore the model with the highest average precision value is the best first-stage classifier. Random Forest has a large average recall value of 0.8241, but its average precision is only 0.2982, so it is not an appropriate first-stage classifier. Its F1-score is under 0.5, which makes it an ineffective classifier for the overall classification task; nonetheless, the Random Forest model is good at detecting non-predatory conversations labeled as predatory. The Support Vector Machine (SVM) classifier has a lower average precision value (0.6585) than its average recall value (0.7664), so it does not minimize the number of false negatives and cannot be used as a first-stage classifier. Its F1-score of 0.7084 is acceptable, but using this SVM classifier would balance recall against precision without focusing on minimizing either, so this model is not ideal for providing the best results. The LASSO model's performance is very similar to the SVM's, with an F1-score of 0.6613 and almost equal precision (0.6492) and recall (0.6739) values; it therefore has the same problem of not minimizing the number of false negatives and is not a good first-stage classifier. It is also worth noting that the Gradient Boosting Machine model's performance is very similar to that of the Random Forest model: both are good at detecting non-predatory conversations labeled as predatory, with an average recall of 0.8646, but both have really poor average precision (0.3018) and are therefore ineffective at detecting predatory conversations accurately. Finally, Linear Discriminant Analysis (LDA) minimizes the number of false negatives, having the largest average precision value at 0.9144; therefore it was chosen as the first-stage classifier. The LDA classifier is very good at identifying conversations that contain predatory behavior, but its very low recall rate of 0.5223 shows that this model cannot solve all the problems at once. The low recall rate created a concern over the large number of false positives, which triggered the addition of a second-stage classifier focused on filtering out false positives while not creating a large number of false negatives. Figure 3 shows the confusion matrix after predicting for the left-out subset of observations during the cross-validation process for the LDA model. As shown in Fig. 3, the LDA model correctly classifies 503 out of 550 predatory conversations, out of a total of 100,000 conversations. This model minimizes the number of undetected predatory conversations, which is what makes it so effective, with a large average precision value. However, its F1-score is noticeably low, only 0.6648, which is understandable since the average recall is only 0.5223. It can be concluded that the LDA model is only good at precision and is ineffective at avoiding non-predatory conversations being labeled as predatory, where it mis-classifies conversations about half of the time. Models like Random Forest and SVM are good at this specific task, so a second-stage classifier in addition to the LDA model would help.

Fig. 3 Confusion matrix from the cross-validated LDA model results
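For reference, the F-β score used above can be written as the following standard expression (with β = 1 it reduces to the F1-score):

F_{\beta} = (1 + \beta^{2}) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\beta^{2} \cdot \mathrm{precision} + \mathrm{recall}}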

3.3 Second-Stage Classification Results

For the second-stage classification process, the following classifiers were trained: LASSO, Ridge classifier, Naive Bayes classifier, k-NN classifier, Linear Discriminant Analysis, Support Vector Machine classifier, Random Forest classifier, Bagging classifier, and AdaBoost classifier. The second-stage classifier's emphasis is on filtering through all conversations labeled by the first-stage classifier (LDA) as containing predatory behavior, reducing the number of non-predatory chats labeled otherwise.

Table 2 Second-stage classification process, with cross-validated results from the top five models and their average precision and recall measurements, alongside F1-scores

Classification model | Average recall | Average precision | F1-score
SVM                  | 0.6934         | 0.8246            | 0.7533
Naïve Bayes          | 0.5858         | 0.9285            | 0.7184
LASSO                | 0.7442         | 0.7711            | 0.7574
AdaBoost             | 0.7767         | 0.8091            | 0.7926
k-NN                 | 0.6279         | 0.8966            | 0.7385

Since the second-stage classifier only filters through flagged conversations, there is no need to classify all 100,000 conversations, only those labeled by the LDA model as predatory (961 conversations, from Fig. 3). Table 2 contains the five best models built for the second-stage classification process. The performance results from these five models are cross-validated, and the average precision and recall measurements from each fold within the cross-validation process are returned, alongside the overall F1-score for each model. All results are color-coded based on cross-comparisons between each model's average recall, precision, and F1 scores. The five best classifiers for the second-stage classification process were SVM, Naive Bayes, LASSO, k-NN, and AdaBoost. Looking at the F1-scores, all five models have values between 0.7 and 0.8, so all five are competitive at filtering out the mistakes that the LDA model makes in the first-stage classification. The Naive Bayes and k-NN classifiers perform quite similarly, having similarly large average precision values, 0.9285 for Naive Bayes and 0.8966 for k-NN, but both lack performance in average recall, where the values are quite low, 0.5858 and 0.6279 respectively. Overall, both have F1-scores over 0.7, but the conclusion is that both the Naive Bayes and k-NN models are effective at precision but not at recall, which leaves room for plenty of mis-classifications. A good second-stage classifier needs to be effective at both precision and recall, since if the chosen classifier were optimized for just one of precision or recall, a third-stage classifier would be needed. The SVM model has an F1-score of 0.7533, which is larger than that of the previous two models, and its average precision is even over 0.8, but its average recall value is just shy of 0.7. Unfortunately, the SVM model's performance places more emphasis on precision, so choosing it as the second-stage classifier would lead to the same problem as with the Naive Bayes and k-NN models. The LASSO model has an F1-score of 0.7574, better than all the previous second-stage classifier models; its average recall is 0.7442 and its precision is 0.7711, which makes it an effective model that is balanced between recall and precision. This LASSO model would be a good choice for the second-stage classifier, but there is one problem: the AdaBoost model outperforms it. AdaBoost has an average recall of 0.7767 and an average precision of 0.8091, making its F1-score 0.7926, just shy of 0.80. This model performed best at filtering out the mis-classifications made by the first-stage classifier, and it achieves equally large and balanced performance on both recall and precision, making the second-stage classifier a complete system for accurate classification without needing a third classifier.


Fig. 4 AdaBoost second-stage classification result’s confusion matrix

Table 3 Recall, precision, and F1 scores from the first-stage classifier combined with each possible second-stage classifier

System of classifiers | Recall | Precision | F1-score
LDA → SVM             | 0.6928 | 0.7545    | 0.7224
LDA → Naive Bayes     | 0.5859 | 0.8491    | 0.6934
LDA → AdaBoost        | 0.7767 | 0.74      | 0.7579
LDA → LASSO           | 0.7433 | 0.7055    | 0.7239
LDA → k-NN            | 0.6273 | 0.82      | 0.7108

Figure 4 shows the confusion matrix after predicting for the left-out subset of observations during the cross-validation process for the AdaBoost second-stage classifier model. As shown in Fig. 4, the AdaBoost model correctly classifies 407 out of 503 predatory conversations, and it mis-classifies only 117 out of 458 non-predatory conversations. This AdaBoost model is as effective as it can be at minimizing errors, having large values for both average precision and recall. It can be concluded that the AdaBoost model is effective at filtering out the mis-classifications made by the LDA first-stage classifier. The combination of LDA as the first-stage classifier and AdaBoost as the second-stage classifier answers one of the research questions, "what kind of classification system and what statistical machine learning models can make a difference and predict whether or not a conversation contains sexual predatory behavior?" The combined performance of the first-stage classifier with each possible second-stage classifier is summarized in Table 3.


3.4 Contextual Details

In order to come up with a unique approach that detects predatory behavior in conversations, the algorithm is centered around the insight that lies within the contextual details of a conversation. More specifically, with the help of vector representations of words and customized feature extraction, the vector representation of a whole conversation is obtained. These "conversation feature vectors" are composed of each conversation's contextual details detected by the Word2Vec model, then carefully selected and aggregated by a feature extraction process. The "conversation feature vectors" obtained in this manner are the essential input for the classification models, which decide whether or not a conversation contains predatory behavior. These concepts answer the research question formulated as "how to extract semantic details from conversations, such that conversations containing malicious intent could be detected?" It is also worth noting that this approach considers the whole conversation as one large textual observation, without looking at discrepancies between individual lines in the conversation. To do so, the original labels from Inches and Crestani's dataset need to be parsed. The original labels are predatory line labels, meaning that within each conversation, each line is labeled as predatory or non-predatory. The end goal of the label parsing process is to create conversation-level labels, where each conversation carries a single predatory/non-predatory label. This can be obtained by checking whether or not a predatory line is present within a conversation; a sketch of this parsing step is given below.
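A minimal sketch of the label parsing is shown below; the dictionary layout (conversation id mapped to per-line 0/1 labels) is an assumption made for illustration, not the format of the original PAN-2012 files.

```python
# A conversation is labeled predatory if at least one of its lines is predatory.
def conversation_labels(line_labels):
    # line_labels: {conversation_id: [0 or 1 for each line]} (assumed layout)
    return {conv_id: int(any(labels)) for conv_id, labels in line_labels.items()}

# Example: {"c1": [0, 0, 1, 0], "c2": [0, 0]} -> {"c1": 1, "c2": 0}
print(conversation_labels({"c1": [0, 0, 1, 0], "c2": [0, 0]}))
```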

4 Analyzing the Classification System

4.1 First-Stage Classifier

By studying the first-stage classifier, one would probably like to understand why a Linear Discriminant model does so much better than most other predictive models. Without going too deeply into what a Linear Discriminant model is and how it works, one only needs to understand how the model decides whether an observation xi belongs to class k = 0 or k = 1, where xi is an n-dimensional vector and class k is simply a group within the data. Equation (1) shows the calculation of the discriminant function, δk, which decides to which group an observation belongs. The discriminant function has several parameters: μk is the mean of group k, σk is the variance of group k, and πk is the prior class membership probability of group k.







\delta_k(x_i) = x_i \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)    (1)


First, LDA uses some estimation method to estimate the mean μk and variance σk of a group k; it then needs to know or estimate the prior class membership probability πk. After the parameters have been estimated, the LDA classifier plugs the estimates for μk, σk, and πk into Eq. (1) and assigns an observation X = xi to the class for which δk is the largest. One of the key parameters that can influence the discriminant function's decision is πk, the prior class membership probability of group k. During training of the LDA model, the training data influence this parameter by setting it to the prior class weights of the training data, which are known, since the training data have labels. For the first-stage classifier, the LDA model was trained on 100,000 observations. Out of this training set, 99,450 observations belong to the non-predatory behavior group (k = 0), while 550 observations belong to the predatory behavior group (k = 1); thus π0 = 0.99450 and π1 = 0.00550. These πk values adjust the weighting on the probability that a new observation belongs to class k, and therefore had a massive influence on the performance and accuracy of the LDA model. As the first-stage classifier, LDA was very good at identifying conversations that contain predatory behavior, but it produced a concerningly large percentage of false positives, which in the end triggered the fitting of a second-stage classifier. About half of the conversations flagged as containing predatory behavior were actually non-predatory conversations classified as predatory. This means that too many innocent conversations were mislabeled, which affected the classification system's overall trustworthiness. The classification system needed a filtering model focused on lowering the overall number of false positives, and this model was called the second-stage classifier.
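The decision rule of Eq. (1) can be sketched in a few lines of NumPy, shown below with placeholder means and a shared variance; the priors mirror the 99,450/550 training split, while everything else is illustrative rather than estimated from the actual data.

```python
import numpy as np

def lda_assign(x, means, sigma2, priors):
    # delta_k(x) = x . mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k), Eq. (1)
    deltas = [x @ mu / sigma2 - (mu @ mu) / (2 * sigma2) + np.log(pi)
              for mu, pi in zip(means, priors)]
    return int(np.argmax(deltas))           # assign x to the class with the largest delta_k

priors = [0.99450, 0.00550]                 # pi_0, pi_1 from the 100,000 training chats
means = [np.zeros(800), np.full(800, 0.1)]  # placeholder 800-dimensional class means
x = np.random.default_rng(0).normal(size=800)
print(lda_assign(x, means, sigma2=1.0, priors=priors))
```

Running this with the heavily skewed priors above almost always returns class 0, which illustrates how strongly πk weights the decision.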





















4.2 Second-Stage Classifier

In the second-stage classification, AdaBoost performs well, better than similar tree-based methods like Random Forest or Bagging, since its algorithm is a similar but improved version of Bagging. Boosting works in a sequential manner: each tree within the AdaBoost model is fitted on random subsets of the original training set, without the use of bootstrapping, and finally the iterative models are added up to create a strong classifier. The LASSO model performs well as both a first- and second-stage classifier. It can be noted that the LASSO model ranked among the top five most accurate models for both the first- and second-stage classification processes. Unfortunately, both times it was outperformed by other models, but it is nonetheless a strong model, good at minimizing both types of errors to some degree. With F1-scores of 0.6613 and 0.7574 for the first- and second-stage classifiers, respectively, one might wonder why such a model is worth taking into consideration for classification tasks. The explanation is simple: LASSO is a supervised machine learning method, also known as a shrinkage and variable selection method for linear regression models. In this project, LASSO is applied to the "conversation feature vectors", which are 800-dimensional.


One could only speculate which of those 800 dimensions are useful predictors for the Sexual Predatory Conversation Identification task, but the LASSO model can actually do so. By carefully choosing λ, the model’s penalty term, the model applies constraints on the original coefficients, which end up shrinking the coefficients of useless predictors to zero. Penalizing the useless predictors results in selecting the most important variables associated with the response variable. The advantage of using LASSO for this classification task is great variable selection, which provides greater prediction accuracy and better model interpretability.
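A hedged sketch of this variable selection idea is given below, using an L1-penalized logistic regression from scikit-learn on stand-in 800-dimensional vectors (in scikit-learn the penalty strength is expressed as C = 1/λ). It illustrates the shrinkage behavior only and is not the project's actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(961, 800))          # stand-in conversation feature vectors
y = rng.integers(0, 2, size=961)         # stand-in predatory / non-predatory labels

# Smaller C (i.e., larger lambda) shrinks more coefficients exactly to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])
print(f"{selected.size} of 800 dimensions kept as useful predictors")
```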

5 Conclusion

This research project aims to enhance children's safety in online chat room environments by leveraging computational linguistics and statistical machine learning to detect sexually predatory behavior. The system is principally designed around an approach that can detect and classify online conversations as either containing or not containing sexual predatory behavior. Upon examining 100,000 messy and unstructured online chat-room conversations, 550 malicious conversations were detected. The proposed algorithm uses models like Word2Vec, Linear Discriminant Analysis, and AdaBoost to detect potential predatory behavior with an accuracy measured as an F1-score of 0.7579. The algorithm's two-stage classification system creates a three-group classification of conversations based on their uncertainty levels of maliciousness. Future work involves considering a deep learning approach for the classification system, favoring prediction accuracy over model interpretability. The full version of De Boom's algorithm could also be used to create a better feature extractor, by applying representation learning concepts combined with weighted word embedding aggregation.

References

1. Parapar, J., Losada, D.E., Barreiro, A.: A learning-based approach for the identification of sexual predators in chat logs. In: CLEF (Online Working Notes/Labs/Workshop), vol. 1178 (2012)
2. Villatoro-Tello, E., Juárez-González, A., Escalante, H.J., Montes-y-Gómez, M., Pineda, L.V.: A two-step approach for effective detection of misbehaving users in chats. In: CLEF (Online Working Notes/Labs/Workshop), vol. 1178 (2012)
3. Kleijn, M., Bogaerts, S.: Sexual-orientated online chat conversations—characteristics and testing pathways of online perpetrators. In: Sweetie 2.0, pp. 95–112. TMC Asser Press, The Hague (2019)
4. Escalante, H.J., Villatoro-Tello, E., Juárez, A., Montes, M., Villaseñor-Pineda, L.: Sexual predator detection in chats with chained classifiers. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 46–54 (2013)


5. Bogdanova, D., Rosso, P., Solorio, T.: Exploring high-level features for detecting cyberpedophilia. Comput. Speech Lang. 28(1), 108–120 (2014)
6. Pendar, N.: Toward spotting the pedophile telling victim from predator in text chats. In: International Conference on Semantic Computing (ICSC 2007), pp. 235–241. IEEE (2007)
7. McGhee, I., Bayzick, J., Kontostathis, A., Edwards, L., McBride, A., Jakubowski, E.: Learning to identify internet sexual predation. Int. J. Electron. Commer. 15(3), 103–122 (2011)
8. Kevin, L.: Graphic representations of word2vec and doc2vec (2015)
9. NSS: An intuitive understanding of word embeddings: from count vectors to word2vec. Accessed 21 August 2017
10. De Boom, C., Van Canneyt, S., Demeester, T., Dhoedt, B.: Representation learning for very short texts using weighted word embedding aggregation. Pattern Recogn. Lett. 80, 150–156 (2016)
11. Inches, G., Crestani, F.: Overview of the international sexual predator identification competition at PAN-2012. In: CLEF (Online Working Notes/Labs/Workshop), vol. 30 (2012)

Swarm-Based Sudoku Solution: An Optimization Procedure Sayak Haldar, Pritam Kumar Roy, and Sutirtha Kumar Guha

Abstract This paper introduces the Ant Traversing Method (ATM), an optimized searching method applied to the typical 9 × 9 Sudoku puzzle problem. In ATM, a set of intelligent ants called Ant Agents is introduced to find the optimal solution for the Sudoku puzzle. Agents work independently to find the result. The final outcome is obtained based on the amount of pheromone deposited by the Ant Agents on the target cells of the matrix. Each Ant Agent starts its journey from a vacant cell of the Sudoku matrix. Agents deposit pheromone, in the form of digits, into the starting cell after completing the tour along the predefined path. The efficiency and optimality of ATM are examined through experiments. It is found that the performance of ATM is satisfactory compared to the typical backtracking method for solving a typical Sudoku puzzle.

Keywords Sudoku Puzzle problem · Swarm Intelligence · Ant Colony Method · Agent · Ant Traversing · Puzzle Optimization

1 Introduction

A Sudoku problem is a number-based puzzle problem. A typical Sudoku problem consists of a 9 × 9 matrix, having a few cells filled with digits 1–9 while the rest of the cells are vacant. The objective is to fill the vacant cells in such a manner that there is no row-wise or column-wise repetition of digits. A 9 × 9 matrix is subdivided

S. Haldar · P. K. Roy · S. K. Guha (B) Meghnad Saha Institute of Technology, Kolkata, West Bengal, India e-mail: [email protected] S. Haldar e-mail: [email protected] P. K. Roy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_4


into nine 3 × 3 sub-matrices. There should not be any repetition of digits within a sub-matrix either. Different variations of Sudoku, based on matrix size, have been introduced in different parts of the world. The backtracking method is commonly used to solve a typical Sudoku puzzle. The backtracking approach is essentially a brute-force method: it visits the vacant cells of the matrix and fills them with candidate digits; if a filled digit is found invalid as per the rules of the puzzle, the digit is discarded and another digit is placed. A major drawback of this method is its time complexity. The ant colony optimization algorithm is a swarm intelligence method for solving computational problems, inspired by the food hunting and searching behavior of real ants. The pheromone-based method is used as an intelligent and automatic replacement for the tracking procedure in the computational field. Different ant behavior-inspired algorithms have already been implemented in various sectors, replacing the natural pheromone of ants by chemical or physical means. Nature-inspired algorithms have been applied by researchers in puzzle-solving areas to reduce time complexity. The traveling salesperson problem has been solved by implementing the Ant Colony Optimization method, where ants are considered as agents and the shortest tour visiting all cities and returning to the source is found by the agents [1]. The Artificial Bee Colony algorithm is implemented in [2] to solve the Graph Coloring problem.

2 Related Work

A variation of the P system is introduced in [3] to solve the conventional Sudoku puzzle. A minigrid-based backtracking approach is proposed in [4] to obtain a more efficient Sudoku solution. The authors of [5] propose a novel Sudoku solution approach that produces probable values column-wise instead of computing them cell-wise. A multi-agent-based distributed framework has been proposed to implement the Swarm Intelligence concept for solving the graph search problem in [6]. A solution of the graph coloring problem using swarm intelligence has been proposed in [7]. Different Sudoku solution methods and their performance have been evaluated in [8]. The performance and methodology of various Sudoku solution techniques are discussed in [9].

3 Proposed Work

3.1 Swarm-Based Sudoku Puzzle Solution

ATM is introduced in this paper to find an optimal solution for a typical 9 × 9 Sudoku puzzle. This method is different from the typical backtracking-based Sudoku puzzle


Fig. 1 Implementation of ATM in a typical 9 × 9 Sudoku Puzzle

solution in four main aspects: (i) an optimized solution is yielded based on an in-depth feasibility study performed by the Ant Agents, (ii) wasted effort is reduced as no backtracking is performed, (iii) a minimum number of changes is made to the value of a cell compared to the backtracking method, and (iv) a complete solution is proposed, as a pre-implementation survey is performed about the appropriate placement of the digits. Informally, ATM works as follows: 'n' Ant Agents start their journey from the vacant cells, where 'n' is the number of vacant cells to be filled with appropriate digits. Each Ant Agent carries nine types of pheromone, representing the digits 1 to 9. An ant traverses all the cells of the corresponding row, column, and sub-matrix. As depicted in Fig. 1, the ant deposits into the starting cell the pheromones of the digits that are not found along the path from the starting cell to the end cell. Ant Agents start their journey from cells chosen according to some initialization rule (e.g., randomly or from a previously visited cell). Each agent starts a repetitive tour by applying a stochastic greedy approach for each empty cell of the Sudoku matrix. An agent changes the pheromone secretion mechanism based on the digits it encounters, by applying the Pheromone Elimination (PE) process. Once all cells of the agent's predefined path are traversed, the selected digits, in the form of pheromones, are deposited on the originating cell. Empty cells having a single probable digit are filled first. The Ant Agent starts the same tour for every empty cell to find the candidate digits in the form of pheromone. In contrast to the typical ant system, an Ant Agent has to follow a predefined fixed route for every traversal. The ant pheromone is used here to indicate the probable digits in each cell. A cell with a minimum amount of pheromone is filled first, since the number of candidate digits is minimal and hence there are fewer selection options.


The Pheromone Elimination (PE) method is designed in such a manner that each time the Ant Agent finds a digit in its traversing path, the corresponding pheromone secretion is closed until the agent reaches its origin. The PE method is depicted in Algorithm 1.

Algorithm 1: Pheromone_Elimination (ph[9], p)
Input: Total pheromone of the Ant Agent; pheromone found in a cell
Output: Pheromone available in the Ant Agent after traversing the allotted path
Data structures used:
ph[9] := a 1-D array used for storing the 9 types of pheromone representing the digits 1 to 9
p := a variable that stores the pheromone found in the corresponding cell of the Sudoku matrix
n := number of rows of the Sudoku matrix = number of columns of the Sudoku matrix
m[n][n] := a 2-D array used to represent an (n × n) Sudoku matrix
Step 1: Start
Step 2: For i := 1 to n
            For j := 1 to n, repeat Step 3.
Step 3: Repeat for k := 1 to 9:
            if (ph[k] = m[i][j]) then ph[k] := DELETED
Step 4: For i := 1 to n, display ph[i].
Step 5: Exit.

The ATM differs from the conventional ant colony process in three main aspects: (i) each Ant Agent carries nine types of pheromone, in the form of the digits 1–9, (ii) a pheromone of the ant agent is locked if the agent encounters the same pheromone in its traversed path, and (iii) after traversing along the three paths, the Ant Agent deposits the unlocked pheromones onto the source cell.
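The pheromone bookkeeping behind these three points can be illustrated with the short Python sketch below (the paper's implementation is in C; here the 9 × 9 grid is a list of lists with 0 marking vacant cells, and the sketch covers only candidate collection and the fill-the-least-pheromone-cell-first rule).

```python
# Illustrative sketch of ATM's pheromone idea, not the authors' implementation.
def candidate_digits(grid, r, c):
    pheromone = set(range(1, 10))                       # nine pheromone types (digits 1-9)
    pheromone -= set(grid[r])                           # row traversal locks matched digits
    pheromone -= {grid[i][c] for i in range(9)}         # column traversal
    br, bc = 3 * (r // 3), 3 * (c // 3)
    pheromone -= {grid[i][j]                            # 3x3 sub-matrix traversal
                  for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return pheromone                                    # unlocked pheromones for this cell

def fill_forced_cells(grid):
    # Cells with the least pheromone (fewest candidates) are filled first; cells
    # with exactly one candidate can be filled deterministically.
    changed = True
    while changed:
        changed = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    options = candidate_digits(grid, r, c)
                    if len(options) == 1:
                        grid[r][c] = options.pop()
                        changed = True
```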


Table 1 Time consumption and number of value changes (T) for ATM and the backtracking method

|V|   Hardness   ATM                   Backtracking
                 T          Time       T          Time
28    Medium     504        0.008      1482       0.008
29    Medium     1939       0.008      1939       0.008
29    Medium     1985       0.012      1985       0.008
29    Medium     583        0.006      796        0.007
30    Medium     0          0.006      713        0.007
22    Hard       137,212    0.085      138,156    0.085
23    Hard       1202       0.004      3207       0.010
24    Hard       575        0.007      79,609     0.061
25    Hard       529        0.008      882        0.008
26    Hard       11,603     0.015      13,418     0.016

4 Experimental Result

In this section, the ATM concept is implemented on Sudoku problems of different degrees of toughness. The result is compared with the conventional backtracking method for Sudoku solving in terms of time and the number of times the value of a cell is overwritten. Our proposed method is implemented using the C language on a notebook PC with a 2.66 GHz CPU. Two types of Sudoku problems, based on hardness, are considered. The results are summarized and compared in Table 1. In Table 1, |V| denotes the number of clues in the puzzle and 'T' denotes the number of times values are changed. It is noted that Table 1 contains the average of 10 runs in each case. A comparative analysis of execution time between ATM and the backtracking method for Sudoku solving is pictorially represented in Fig. 2. In a typical backtracking method, a cell value would be changed many more times compared to our proposed method, as shown in Fig. 3.

5 Conclusion

The ATM method is proposed for solving the typical Sudoku puzzle. We focus on the time consumed to solve the puzzle successfully and the number of failed attempts to insert a digit in the appropriate place. The Ant Traversing Method (ATM) is introduced here to find the most accurate candidate for an empty cell. Each ant is represented as an agent that dispenses the appropriate pheromone, in the form of a digit, at the right place. In the future, it may be possible to reduce the number of agents without hampering the efficiency of the process. A variation of the proposed method may be introduced


Fig. 2 Execution Time comparison between ATM and Backtracking

Fig. 3 Comparison of the number of times value of a cell is changed

to solve different puzzle-based real-time problems. It can be concluded that the proposed ATM would be beneficial for solving different computational problems.


References

1. Dorigo, M., Gambardella, L.M.: Ant colony system: a cooperative learning approach to the travelling salesman problem. IEEE Trans. Evol. Comput. 1(1), 53–66 (1997)
2. Dorrigiv, M., Markib, H.Y.: Algorithms for the graph coloring problem based on swarm intelligence. In: The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), Fars, Iran (2012)
3. Diaz-Pernil, D., Fernandez-Marquez, C.M., Garcia-Quismondo, M., Gutierrez-Naranjo, M.A., Martinez-del-Amor, M.A.: Solving Sudoku with membrane computing. In: IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA), Changsha, China. IEEE (2010)
4. Maji, A.K., Pal, R.K.: Sudoku solver using minigrid based backtracking. In: IEEE International Advance Computing Conference (IACC), Gurgaon, New Delhi, India. IEEE (2014)
5. Jana, S., Maji, A.K., Pal, R.K.: A novel Sudoku solving technique using column based permutation. In: International Symposium on Advanced Computing and Communication (ISACC), Silchar, India. IEEE (2015)
6. Ilie, S., Badica, C.: Multi-agent distributed framework for swarm intelligence. In: International Conference on Computational Science, Barcelona, Spain. Elsevier (2013)
7. Dorrigiv, M., Markib, H.Y.: Algorithms for the graph coloring problem based on swarm intelligence. In: The 16th International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), Shiraz, Iran. IEEE (2012)
8. Maji, A.K., Jana, S., Roy, S., Pal, R.K.: An exhaustive study on different Sudoku solving techniques. Int. J. Comput. Sci. 11(2), no. 1, 247–253 (2014)
9. Thenmozhi, M., Jain, P., Anand, R.S., Ram, B.S.: Analysis of Sudoku solving algorithms. Int. J. Eng. Technol. 9(3), 1745–1749 (2017)

Application of Cellular Automata (CA) for Predicting Urban Growth and Disappearance of Vegetation and Waterbodies Debasrita Baidya, Abhijit Sarkar, Arpita Mondal, and Diptarshi Mitra

Abstract For sustainable development and also urban planning, the data regarding the type and amount of urban growth, and disappearance of vegetation and waterbodies, because of urbanization, in the near/distant future, are absolutely necessary. This work endeavors to know, how effective (freely available) MOLUSCE (MOdules for Land USe Change Evaluation) plugin of QGIS is, for the prediction of the growth of Asansol city (in Eastern India), and the disappearance of vegetation and waterbodies, because of urbanization, in the area around the city of Rajarhat New Town (in Eastern India). The study considers the closeness to the Asansol Railway Station as the only factor governing the urban growth; and the closeness to the Sector-V Metro Station (adjacent to New Town) as the sole factor governing the disappearance of vegetation and waterbodies. MOLUSCE has used suitably classified Landsat images (Landsat images are freely available), and utilized Cellular Automata technique, with Logistic Regression method for transition potential modeling, to produce the prediction maps of the Asansol city, the vegetation around New Town, and the waterbodies around New Town. Classified Landsat images of the respective areas have been used to validate these prediction maps; the correctness is 75.44% for urban growth, 78.18% for disappearance of vegetation, and 85.73% for disappearance of waterbodies. However, there is not much similarity between the corresponding classified images and the prediction maps, because of the consideration of only two simple D. Baidya · A. Sarkar · A. Mondal · D. Mitra (B) Department of Geography, Kazi Nazrul University, Asansol 713340, India e-mail: [email protected] D. Baidya e-mail: [email protected] A. Sarkar e-mail: [email protected] A. Mondal e-mail: [email protected] Present Address: D. Mitra Salt Lake City 700064, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_5


factors for prediction. Nevertheless, the result shows the potential of MOLUSCE for land use change prediction.

Keywords Urban growth · Disappearance of vegetation · Disappearance of waterbodies · Cellular automata · Logistic regression · MOLUSCE · Landsat · Asansol · Rajarhat New Town

1 Introduction

Sustainable development is extremely important in the present scenario [1]. But this is an era of rapid urbanization [2]. And the growth of a city often leads, among other things, to the disappearance of vegetation and waterbodies in the pertinent area. However, for sustainable development, urbanization should be done without disturbing the ecological balance [3]. Hence, data about the type and amount of urban growth, and the disappearance of vegetation and waterbodies because of urbanization, in the near/distant future, are absolutely necessary for sustainable development and also urban planning. The objective of this project is to predict the growth of the city of Asansol (in Eastern India) for the year 2020, and the disappearance of vegetation and waterbodies in the area around the city of Rajarhat New Town (in Eastern India) for the year 2019. The prediction uses the classified images of the Asansol region for 1990 and 2005, and of the Rajarhat New Town area for 1999 and 2009, employs the Cellular Automata (CA) technique, and selects the closeness to the Asansol Railway Station as the only factor for predicting urban growth and the closeness to the Sector-V Metro Station (beside Rajarhat New Town) as the sole factor for predicting the disappearance of vegetation and waterbodies. Cellular Automata can be thought of as an array of regular cells. At a particular time, a cell can be in a specific state. The cell can change its state depending on the states of the cells in its neighborhood. And this change of state is governed by some transition rules. The process can be mathematically depicted using Eq. (1):

\{S_{t+1}\} = f(\{S_t\}, I_h^t)    (1)

where {S_{t+1}} is the state of the cell at time t + 1, {S_t} is the state of the cell at time t, I_h^t is the neighborhood of the cell, h is the neighborhood size, t denotes the time steps in temporal space, and f denotes the transition rules [4]. In this work, Cellular Automata has been implemented using the (freely available) MOLUSCE (MOdules for Land USe Change Evaluation) plugin of QGIS (which is an open source system), where the Logistic Regression technique has been selected in the MOLUSCE plugin for modeling the potential of cells for transition to other states or the same state.
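The transition of Eq. (1) can be illustrated generically as below; the NumPy grid, the neighborhood size h = 1, and the majority rule are all placeholders chosen only to show the S_{t+1} = f(S_t, I_h^t) update, and have nothing to do with MOLUSCE's internals.

```python
import numpy as np

def ca_step(states, rule):
    # One synchronous update: each cell's next state depends on its current
    # state and its (edge-clipped) 3x3 neighbourhood, i.e. S_{t+1} = f(S_t, I_h^t).
    next_states = states.copy()
    rows, cols = states.shape
    for i in range(rows):
        for j in range(cols):
            nbhd = states[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            next_states[i, j] = rule(states[i, j], nbhd)
    return next_states

majority_rule = lambda s, nbhd: int(nbhd.mean() > 0.5)   # example transition rule f
grid = np.random.randint(0, 2, (50, 50))                 # 0 = other features, 1 = urban
print(ca_step(grid, majority_rule).sum(), "urban cells after one step")
```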


Logistic Regression is a Machine Learning technique which uses the logistic function to calculate the probability of transition. Equation (2) shows the logistic function in mathematical terms:

p = \frac{1}{1 + e^{-x}}    (2)

where p is the probability of transition and x is the input to the logistic function. For each cell, the values of p for all possible transitions are computed, and the transition with the highest value of p is allowed. Before starting this project, a brief literature survey was performed; however, no work dealing with the prediction of the growth of Asansol city or the disappearance of vegetation or waterbodies in the Rajarhat New Town area, using the MOLUSCE plugin, was found.
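As a small illustration of the transition step described above (Eq. (2)), the sketch below turns a per-cell score x for each possible transition into a probability p and keeps the most probable transition; the scores are placeholders, not MOLUSCE outputs.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))           # Eq. (2)

# Hypothetical scores for a single cell's possible transitions, e.g.
# (stay "other features", become "urban area").
scores = np.array([-0.4, 1.3])
p = logistic(scores)
allowed = int(np.argmax(p))                   # the transition with the highest p wins
print(p, "-> transition", allowed)
```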

2 Methodology

2.1 Study Area

The study area comprises the Asansol region and the area around Rajarhat New Town. The bounding coordinates of the region comprising the Asansol city and the surrounding areas, used in this work for studying the urban growth, are 23.63 °N, 23.73 °N, 86.87 °E, 87.05 °E. The bounding coordinates of the region comprising the Rajarhat New Town city and the surrounding areas, used for studying the disappearance of vegetation, are 22.52 °N, 22.69 °N, 88.35 °E, 88.58 °E. The bounding coordinates of the region comprising the Rajarhat New Town city and the surrounding areas, used for studying the disappearance of waterbodies, are 22.45 °N, 22.70 °N, 88.33 °E, 88.74 °E.

2.2 Data Used

In this study, suitable Landsat-5 (Thematic Mapper) and Landsat-8 (Operational Land Imager) images have been used.

2.3 Method

The classified images of the study area (i.e., the Asansol region and the area around Rajarhat New Town), for the relevant years (i.e., 1990 and 2005 for Asansol, and


1999 and 2009 for Rajarhat New Town), have been prepared from the corresponding (freely available) Landsat-5 images. For Rajarhat New Town area, two sets of classified images have been created: one needed for predicting the disappearance of vegetation, and the other required for predicting the disappearance of waterbodies. Each of these three sets of images contains only two classes: the urban area class and the other features class, or the vegetation class and the other features class, or the waterbody class and the other features class, as the case may be. Besides, maps showing concentric buffers of thicknesses: 1, 2, 3, 4, and 5 km, drawn around the Asansol Railway Station, concentric buffers of thicknesses: 1, 2, 3, 4, 5, and 6 km, drawn around the Sector-V Metro Station (meant for predicting the disappearance of vegetation), and concentric buffers of thicknesses: 1, 2, 3, 4, 5, 6, 7, and 8 km, drawn around the Sector-V Metro Station (meant for predicting the disappearance of waterbodies), have been prepared. This work considers the closeness to the Asansol Railway Station as the only factor governing the urban growth, and the closeness to the Sector-V Metro Station (beside New Town) as the sole factor governing the disappearance of vegetation and waterbodies. The classified images and the buffer maps have been provided as input to the MOLUSCE plugin (this plugin is to be opened in QGIS 2.x). The maps depicting the changes in the urban area/vegetation/waterbody class and the other features class, in the study area (i.e., the Asansol region and the Rajarhat New Town area), during the relevant periods (i.e., between 1990 and 2005 for Asansol, and between 1999 and 2009 for Rajarhat New Town), have been yielded by MOLUSCE as the intermediate output. Subsequently, the Logistic Regression method has been employed, and the prediction maps showing the urban area/vegetation/waterbodies, in the study area (i.e., the Asansol region and the Rajarhat New Town area), in the relevant years (i.e., 2020 for Asansol and 2019 for Rajarhat New Town), have been obtained from MOLUSCE as the final output. In the process of generation of prediction maps, the intermediate output (i.e., the maps depicting the changes) has been utilized; in fact, this intermediate output is the input ‘x’ in Eq. (2). Next, a classified image of the Asansol region for the year 2020, and two classified images of the Rajarhat New Town area for the year 2019, have been prepared (from the corresponding (freely available) Landsat-8 images), such that each image contains only two classes viz., urban area/vegetation/waterbody class and other features class. Subsequently, MOLUSCE has been employed to validate the prediction maps using the classified images of the study area for the year 2020/2019, and the percentages of correctness have been noted. Here, all image classifications have been done using the Maximum Likelihood Classification technique. The closeness to the Asansol Railway Station has been used as the only factor for prediction of urban growth, for no special reason. Actually, the Asansol Railway Station (established in the nineteenth century) is an important part of the Asansol city. Hence, it can be expected that, nearer a region (of Asansol city) is to the Asansol Railway Station, more will be the chances of urban growth there. Similarly, there is no particular reason behind choosing the closeness to the Sector-V Metro Station as the only factor for prediction of the disappearance of vegetation and waterbodies.


Actually, the area where the Sector-V Metro Station is situated today can be considered the point from which the growth of the city of Rajarhat New Town started. So, it is understandable that the farther an area is from the Sector-V Metro Station, the greater the possibility that vegetation can be found there, and the farther a waterbody is from the Sector-V Metro Station, the greater its chance of survival. It is quite likely that there are other factors which dictate urban growth in the Asansol region or the disappearance of vegetation or waterbodies in the Rajarhat New Town area; however, incorporating them in this work would demand a lot of time and effort. Thus, the focus of this project is to study how MOLUSCE performs with these two simple factors of prediction (i.e., the closeness to the Asansol Railway Station and the closeness to the Sector-V Metro Station). If the performance is satisfactory (i.e., at least some positive results are obtained), MOLUSCE can be expected to give better results when other factor(s) are also used. The outline of the method used here is shown as a flowchart in Fig. 1.

Fig. 1 Flowchart depicting the method
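The validation step in the flowchart boils down to a cell-by-cell agreement count between a prediction map and the corresponding classified image. The sketch below illustrates that kind of "percentage of correctness" computation on stand-in NumPy rasters; it is not MOLUSCE's own validation routine.

```python
import numpy as np

def percent_correct(predicted, reference):
    # Both rasters hold class codes, e.g. 1 = urban area, 0 = other features.
    return 100.0 * np.mean(predicted == reference)

rng = np.random.default_rng(0)
reference = rng.integers(0, 2, (400, 400))    # stand-in classified image (2020 / 2019)
predicted = rng.integers(0, 2, (400, 400))    # stand-in prediction map
print(f"correctness: {percent_correct(predicted, reference):.2f}%")
```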


3 Results and Discussions

The Landsat images and the corresponding classified images of the Asansol region, for 1990, 2005, and 2020, are shown in Fig. 2a–f. The image showing the concentric buffers (used for predicting the urban growth) around the Asansol Railway Station is depicted in Fig. 3. It is interesting to note that some parts of the Asansol city experienced negative growth between 2005 and 2020 (as per Fig. 2d and f).


Fig. 2 a Landsat image showing the Asansol region in 1990. b Classified image showing the Asansol region in 1990. c Landsat image showing the Asansol region in 2005. d Classified image showing the Asansol region in 2005. e Landsat image showing the Asansol region in 2020. f Classified image showing the Asansol region in 2020.


Fig. 3 The concentric buffers around the Asansol Railway Station

Besides, it should be mentioned that, in the classified images of the Asansol region (Fig. 2b, d, and f), several non-urban features have been classified by ArcGIS as urban area, due to the similarities in signatures. The Landsat images and the corresponding classified images of the Rajarhat New Town region, for 1999, 2009, and 2019, are shown in Fig. 4a–f; these images have been used for studying the disappearance of vegetation. The image showing the concentric buffers (used for predicting the disappearance of vegetation) around the Sector-V Metro Station is depicted in Fig. 5. It is surprising to note that vegetation increased in some parts of the Rajarhat New Town area between 1999 and 2009 (as per Fig. 4b and d). The Landsat images and the corresponding classified images of the Rajarhat New Town region, for 1999, 2009, and 2019, are shown in Fig. 6a–f; these images have been used for studying the disappearance of waterbodies. The image showing the concentric buffers (used for predicting the disappearance of waterbodies) around the Sector-V Metro Station, is depicted in Fig. 7. It is astonishing to note that the area covered by the waterbodies increased significantly in the Rajarhat New Town region between 2009 and 2019 (as per Fig. 6d and f). The prediction map of the Asansol region, for the year 2020, the prediction map of the Rajarhat New Town area, used for studying the disappearance of vegetation, for the year 2019, and the prediction map of the Rajarhat New Town area, used for studying the disappearance of waterbodies, for the year 2019, are, respectively, shown in Figs. 8, 9, and 10. It should be admitted that there are significant differences between the prediction map of the Asansol region for the year 2020 (Fig. 8) and the classified image of this region for the same year (Fig. 2f), between the prediction map (focusing on



Fig. 4 a Landsat image showing the Rajarhat New Town area in 1999. b Classified image showing the Rajarhat New Town area in 1999. c Landsat image showing the Rajarhat New Town area in 2009. d Classified image showing the Rajarhat New Town area in 2009. e Landsat image showing the Rajarhat New Town area in 2019. f Classified image showing the Rajarhat New Town area in 2019.

vegetation) of the Rajarhat New Town area for the year 2019 (Fig. 9) and the classified image of this region for the same year (Fig. 4f), and between the prediction map (focusing on waterbodies) of the Rajarhat New Town area for the year 2019 (Fig. 10) and the classified image of this region for the same year (Fig. 6f). This is most probably because of the fact that only two simple factors (viz., closeness to the Asansol Railway Station and closeness to the Sector-V Metro Station) have been used for prediction.


Fig. 5 The concentric buffers around the Sector-V Metro Station

Besides, it must be acknowledged that the impression of the concentric buffers is visible in two prediction maps (Figs. 8 and 9). Nevertheless, after validation (using MOLUSCE and employing pertinent classified images), the percentage of correctness has been found to be 75.44% for the prediction map for urban growth, 78.18% for the prediction map for disappearance of vegetation, and 85.73% for the prediction map for disappearance of waterbodies. If factor/s, other than the ones used here, were also considered for prediction, more accurate results could have been possibly obtained. Thus, this study indicates that MOLUSCE (which is freely available) has the capability of predicting land use changes satisfactorily, if suitable input data are available. Hence, MOLUSCE can be a suitable alternative to the costly and proprietary software systems like TerrSet (formerly IDRISI), with regard to land use change prediction. Besides, an accurate prediction map showing urban growth, or disappearance of vegetation or waterbodies, because of urbanization, in the near/distant future, which this work has attempted to produce, is expected to help the pertinent authority in taking appropriate steps towards sustainable development and/or urban planning.

4 Conclusions

In this study, freely available Landsat data and freely obtainable MOLUSCE plugin (of QGIS which is an open source system) have been used for predicting urban growth and disappearance of vegetation and waterbodies; accurate prediction maps are expected to be helpful for sustainable development and/or urban planning. And, the outcome indicates that the MOLUSCE plugin can be a suitable alternative to the costly software systems like TerrSet (formerly IDRISI), with regard to land use change prediction.



Fig. 6 a Landsat image showing the Rajarhat New Town area in 1999. b Classified image showing the Rajarhat New Town area in 1999. c Landsat image showing the Rajarhat New Town area in 2009. d Classified image showing the Rajarhat New Town area in 2009. e Landsat image showing the Rajarhat New Town area in 2019. f Classified image showing the Rajarhat New Town area in 2019.


Fig. 7 The concentric buffers around the Sector-V Metro Station

Fig. 8 Prediction map (showing the urban area) for the year 2020


Fig. 9 Prediction map (showing vegetation) for the year 2019

Fig. 10 Prediction map (showing the waterbodies) for the year 2019

Acknowledgements The authors are indebted to Dr. Paramita Roychowdhury (Head of the Department of Geography, Kazi Nazrul University) and the other faculty members of the Department of Geography, Kazi Nazrul University, for their help, encouragement, and support with regard to this work. Also, the authors are thankful to Dr. Asim Ratan Ghosh, Senior Scientist, Department of Science and Technology and Biotechnology, Government of West Bengal, for his help, support, and cooperation in connection with this study.


References

1. Keiner, M. (ed.): The future of sustainability (2006)
2. Bodo, T.: Rapid urbanisation: theories, causes, consequences and coping strategies. Ann. Geogr. Stud. 2(3), 32–45 (2019)
3. Camhis, M.: Sustainable development and urbanization. In: Keiner, M. (ed.) The future of sustainability, pp. 69–98. Springer, Dordrecht (2006)
4. Maithani, S.: Application of cellular automata and GIS techniques in urban growth modelling: a new perspective. Inst. Town Plan. India J. 7(1), 36–49 (2010)

Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text and English Humor Literature Sourav Das and Anup Kumar Kolya

Abstract Sarcasm is a sophisticated way of wrapping any immanent truth, message, or even mockery in a hilarious manner. The advent of communication through social networks has mass-produced new avenues of socialization. It can further be said that humor, irony, sarcasm, and wit are the four chariots of being socially funny in the modern days. In this paper, we manually extract the sarcastic word distribution features of a benchmark pop culture sarcasm corpus, containing sarcastic dialogues and monologues. We generate input sequences formed of the weighted vectors from such words. We further propose an amalgamation of four parallel deep long-short term networks (pLSTM), each with a distinctive activation classifier. These modules are primarily aimed at successfully detecting sarcasm from the text corpus. Our proposed model for detecting sarcasm peaks at a training accuracy of 98.95% when trained with the discussed dataset. Consecutively, it obtains the highest overall validation accuracy of 98.31% on two handpicked Project Gutenberg English humor literature works among all the test cases. Our approach transcends previous state-of-the-art works on several sarcasm corpora and results in a new gold standard performance for sarcasm detection.

Keywords Sarcasm Detection · Pop Culture Sarcasm · English Humor Literature · Parallel LSTM

1 Introduction

Sarcasm used in any language relies heavily on the context of the subjectivity being discussed. Thus, it can be delivered without showing any outward expression whatsoever, whether with a straight face, a smirk, or even laughter. It is a challenge

S. Das (B) Maulana Abul Kalam Azad University of Technology, WB, Salt Lake, Kolkata 700064, India e-mail: [email protected] A. K. Kolya Dept. of Computer Science, RCC Institute of Information Technology, Kolkata 700015, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_6


to extract the actual underlying meaning of a sarcastic statement, rather than simply finding the targeted sentiment polarity. However, when an independent sarcastic text corpus is curated, we may not have to consider the whole surrounding context, but can work with the sarcastic texts only. Most research in this area is conducted as binary classification tasks with empirical details to detect whether a text or phrase is sarcastic or not [1]. Also, the accuracy of neural networks deployed for sarcasm detection tasks depends on context- or sentence-level attention covering an entire sarcasm corpus [2, 3]. Presently, deep learning networks are represented as computational combinations of weighted vectors fed through input neurons. Hence, training can improve their ability to reproduce the weight combinations over time, enhancing the overall learnability and detection capability for sarcasm, but it often fails to capture the subjectivity. In this work, we aim to accurately identify complex sarcasm within spoken dialogues translated to text. First, we take up a sarcasm text corpus, reading and manually tokenizing the file and indexing the lexical occurrences within it. Second, we channelize the weighted vectors gained from the dataset and combine our LSTM networks by feeding them with such vectors from the token indices. We introduce a simple yet robust set of four parallel long short-term networks (pLSTM) with dense hidden layers, each set having a different activation function. Finally, the proposed network is evaluated against a handpicked number of open-source humor and sarcasm text corpora. For all the generic validation cases, our parallel long-short term networks with cross-activation classifiers achieve a better test accuracy, up to 98.31%, when compared with other diverse data and models for a similar task. The rest of the sections are organized as follows: In Sect. 2, some recent works in a similar domain are mentioned. In Sect. 3, we discuss the attributes of the dataset used, along with the feature extraction for sarcastic words. In Sect. 4, we introduce and formulate our proposed parallel long-short term networks. We state results and analysis regarding validation and classification in Sect. 5. In Sect. 6, the performance of our approach is compared against several standard humor and sarcasm datasets. Section 7 consists of a discussion of a few pivotal points of our work. Finally, we consider future extensions of the work and conclude in Sect. 8.

2 Recent Works

Recent trends in deep neural networks have opened up diversified applications in sarcasm detection. Accordingly, a group of researchers proposed sentiment classification and sarcasm detection as correlated parts of a common linguistic challenge [4]. For their approach, they chalked out a multi-task learning system with separate sentiment and sarcasm classification tags. They combined a gated network with a fully connected layer and softmax activation, and observed that their proposed classifier utilizes the sentiment shift to detect sarcasm better. Kumar et al. exploited attention-based bidirectional and convolutional networks for sarcasm detection from a benchmark dataset [5]. They selected an already


developed sarcastic tweets dataset along with randomly streamed tweets on sarcasm. They introduced an attention layer constructed within an LSTM network, with the softmax activation function embedded into the backpropagation property. The backpropagation-based feedback helps to identify and differentiate tweets from each other. Sundararajan and Palanisamy classified sarcasm detection into different genres, namely polite, rude, and raging types of sarcasm [6]. Instead of defining the straight polarity of a sarcastic expression, they extracted the mixed emotions associated with the sarcasm itself. They fragmented several feature parameters of the input tweets and ensembled them for a better semantic understanding of the tweets. Finally, they applied a rule-based classifier to eliminate the fuzziness of the inputs.

3 Corpus

To train our model, we select the text corpus from MUStARD, the Multimodal Sarcasm Detection Dataset [7]. As is explicit from the corpus name, the dataset comprises multimodal aspects of sarcastic expressions, i.e., videos, audio, and a text corpus accumulated from the dialogues. The data contain a collection of 6000 videos from several popular TV comedy shows. The utterances of the videos were manually annotated, defining in which context the characters delivered sarcastic dialogues. Furthermore, from the entire range of their video repository, the curators selected only a little over 600 videos as a balanced blend of sarcastic and non-sarcastic labels. From the multimodal parameters of the dialogues' utterance, only the transcription, or textual modality, is selected. The researchers' overall character-label ratio and distribution help us to visualize which characters utter the most words throughout the gathered dialogues, and what share of the dialogue contributions each character has in building the corpus. The final proposed textual corpus consists of 690 individually line-indexed dialogues. The longest dialogue per character reaches a maximum of 65 uninterrupted words, whereas the shortest dialogue is 7 words. This indicates that the corpus consists of both monologues and dialogues. To represent the contextual text utterance, the researchers represented sentence-level utterance features from BERT [8]. Finally, the dataset showcases the occurrences of sarcastic word utterances belonging to the respective sentences, alongside the first token average for the sentences.

3.1 Sarcastic Words Distribution

As the entire corpus largely consists of sarcastic statements, we do not further partition the data into balanced or imbalanced allocations. We primarily focus on extracting the frequency of the hundred most recurring words in the dialogues. Since this is not plain documented text from literature but rather consists of sarcastic dialogues, it can be inferred that each time one of these words occurs within a dialogue or monologue, it is intended


Table 1 A few of the most frequent words with their frequency span within the corpus

Words      Frequency   Distribution (%)
'Oh'       74          8.21
'Know'     49          7.90
'Like'     46          7.78
'Yeah'     43          7.66
'Well'     39          7.52
'Go'       37          7.50
'Right'    34          7.47
'Think'    32          7.46
'Really'   30          7.44

for saying something sarcastic. A few of the most frequently occurring words from the entire corpus are shown in Table 1.
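The frequency counts in Table 1 can be reproduced with a few lines of Python along the lines below; the file name and the tokenization pattern are assumptions, since the corpus format is not prescribed here.

```python
import re
from collections import Counter

with open("mustard_dialogues.txt", encoding="utf-8") as f:   # assumed file name
    tokens = re.findall(r"[a-z']+", f.read().lower())

top_hundred = Counter(tokens).most_common(100)               # hundred most recurring words
for word, freq in top_hundred[:10]:
    print(word, freq)
```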

4 Deep pLSTM Architecture

We propose to construct four equipollent long-short term networks for parallel learning with identical sizes as our combined baseline model. We term them pLSTM networks, with a fixed batch of inputs but individual outputs. Each LSTM network has a distinct end classifier function at the output. These classifiers, generically termed activation functions, help us to understand the analogous comparison of their behavior while handling the large input vectors generated from fragmenting the text corpus. We channelize the input vectors generated from tokens of the text data in a four-parallel way as the input feed for each corresponding model. Each of these models contains fully connected deep layers, but isolated from each other within the architecture. The proposed networks use bidirectional signaling within their layers, containing a backward LSTM pass and a forward LSTM pass, for the forget gate to get backpropagation, as well as the output gate serving as the input for the next layer. The combination of a forward and backward pass within an LSTM layer can be standardized formally as follows. For incoming input into a layer:

i = \Phi(W_i x_i + U_i(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_i)    (1)

where Φ denotes the activation function of the respective network, W is the weight vector of input in the layer, U can be denoted as the updated combination of sequence signals, and b is the bias vector for output summarization of that particular layer. Similarly, for the output generation from a layer:

o = \Phi(W_o x_o + U_o(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_o)    (2)

For the forget gate of a layer:

f = \Phi(W_f x_f + U_f(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_f)    (3)

Finally, the computation cell first takes the output of the forget gate f and accordingly sustains or forgets the previous layer's input characteristics. Similarly, it takes the input coming from the input gate i and channels it as the new computational memory \tilde{c}. It then sums these two results to produce the final computation memory c. Hence, for the computation cell operation:

c = (f(Wn + Un) \cdot c) + (i_n \cdot \tilde{c})    (4)

Now, these are the representative equations for one layer of a single long-short term network. As the layering densifies, these equations are clubbed together for each layer. Henceforth, we sum up the inputs from Eq. (1) for each LSTM channel as:

i_{1...n} = (softmax)[W_i x_i + U_i(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_i] +
i_{2...n} = (sigmoid)[W_i x_i + U_i(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_i] +
i_{3...n} = (relu)[W_i x_i + U_i(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_i] +
i_{4...n} = (tanh)[W_i x_i + U_i(F_{seq(\rightarrow)} + R_{seq(\leftarrow)}) + b_i],    (5)

where n denotes the number of inherent layers within each respective LSTM structure. Similarly, the output cumulations o_1 → o_4, forget gates f_1 → f_4, and cell memories c_1 → c_4 of the fragments within the unified architecture are summarized for each layer.

4.1 Hyperparameters Tuning

The model tuning contains a word embedding dimension of 400. We further deploy 500 hidden layers for each of the deep LSTM substructures within the composite architecture. The dense layer holds different activation classifiers for each substructure. The adam optimizer is used with categorical cross-entropy for binary categorical classification of sarcasm within the training phase. It is then initialized with the loss function to evaluate the training loss. The learning rate of the optimizer is kept as 0.01. The dropout rate for avoiding overfitting is respectively 0.6 for the vectorization and 0.4 for the bidirectional layers. The epochs for each LSTM module are set


Fig. 1 Proposed architecture for parallel long-short term networks

as 500. The verbose information is kept as 1 for words (vectors) to training logs. Finally, the collective outcome is printed as model.summary() (Fig. 1).
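Under the stated settings, one branch of the architecture could be assembled roughly as below, assuming a Keras implementation (the paper prints model.summary()) and interpreting the 500 hidden layers as 500 LSTM units; vocab_size and seq_len are placeholders, and the other three branches would differ only in the final activation (sigmoid, relu, tanh). This is a sketch under those assumptions, not the authors' code.

```python
from tensorflow.keras import Sequential, layers, optimizers

vocab_size, seq_len = 20000, 65              # placeholders (65 = longest monologue length)

def build_branch(activation="softmax"):
    model = Sequential([
        layers.Embedding(vocab_size, 400, input_length=seq_len),  # embedding dimension 400
        layers.Dropout(0.6),                                      # dropout after vectorization
        layers.Bidirectional(layers.LSTM(500)),                   # forward + backward pass
        layers.Dropout(0.4),                                      # dropout on the bidirectional part
        layers.Dense(2, activation=activation),                   # sarcastic / non-sarcastic
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

branch = build_branch("softmax")
branch.summary()
# branch.fit(X_train, y_train, epochs=500, verbose=1)   # X_train, y_train are placeholders
```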

5 Results and Analysis

At first, we compare the performance of the four pLSTM modules with distinct activation classifiers. The comparison is made when all the modules reach the assigned training limit, i.e., the 500th epoch. We show the scalable comparison graph in Fig. 2. From Fig. 2, it can be observed that our baseline pLSTM module with the softmax activation function reduces the training error to a minimal range even before the first 100 epochs. The training accuracy also peaks above 90% during this phase. Hence, an overlap takes place between 0 and 100 epochs, where the error rate comes down while the accuracy rises. Meanwhile, the accuracy goes up to almost touch the peak margin of 98.95%, and it mostly settles within the high accuracy range of 96–98% for the rest of the training phase. The LSTM coupled with sigmoid performs similarly, but closer analytical observation reveals that it tops out at an accuracy of 96.88%, narrowly falling short of the first module. On the other hand, the LSTM modules coupled with ReLU and tanh both suffer from a heavy range of data loss during training and perform drastically worse. Since the loss values remain


Fig. 2 Epoch level analysis of pLSTM modules with training loss and accuracy

constantly high, the training success for base sarcasm classification does not progress much. Table 2 presents the training analysis with respect to epochs. Following this, we discard the training sets of both the LSTM coupled with ReLU and the LSTM coupled with tanh. We further compare the two better-performing classifier-combined modules in Table 3, where the visible performance helps to narrow down the F-measure analysis. Here, the performance represents the classification report along with the F-score generated from the training phase.

Table 2 Summary information for training sets, where S/NS is the successful detection of sarcastic or non-sarcastic dialogues and monologues from the data per 100 epochs

Modules            S/NS (%) at epoch
                   100      200      300      400      500
pLSTM + softmax    93.07    92.15    95.33    97.79    98.95
pLSTM + sigmoid    91.12    90.03    92.50    94.52    96.88
pLSTM + relu       21.27    12.20    11.16    11.90    10.63
pLSTM + tanh       10.60    09.23    09.11    08.25    08.16


Table 3 Comparative evaluation of the better performing pLSTM modules w.r.t. classification report

Modules            Sarcasm                              Accuracy
                   Precision   Recall    F1-Score
pLSTM + softmax    0.9900      0.9800    0.9851         0.9850
pLSTM + sigmoid    0.9600      0.9400    0.9505         0.9500

6 Benchmark Comparisons

We carry out similar rounds of experimentation with several open-sourced and sarcasm-based datasets to evaluate generic validation accuracy. At first, we select two works of English humor and comedy literature from the Project Gutenberg digital library: The Comedy of Errors and Three Men in a Boat. Each of them is linear plain text, without the need for any preprocessing. Following that, we examine our proposed framework with three more sarcasm datasets scraped and developed from the internet. The first is a combination of sarcastic and non-sarcastic data curated by a supervised pattern (SIGN) [9]; the main texts in the corpus are questions and rhetorical statements, in quote and response pairs. The second is the Sarcasm V2 corpus, also a combination of sarcastic and non-sarcastic data [10]. The final corpus selected is a collection of train and test data on sarcasm from Reddit posts (SARC) [11]; it is a manually annotated corpus primarily consisting of 8 years of 1.3 million sarcastic Reddit posts, their replies, and comments. We shuffle the data by performing fivefold random cross-training on each dataset, in a 3:2 manner, and then apply testing on the entire corpora. Table 4 shows that among all the sarcasm datasets, our baseline pLSTM with softmax obtains the highest validation accuracy of 98.31%, leading by a margin of 2.27% over the next best performing substructure, pLSTM with sigmoid (96.04%). Moreover, the softmax- and sigmoid-attached modules also obtain F-scores of 97.03 and 94.58, respectively, for the TCE-based validation, which are the highest among the benchmark analysis. For [10, 11], our proposed method outperforms the previous state-of-the-art results [12, 13] by a notable margin. It is evident from the results that the feature vectors fed as input to the parallel LSTMs do not lead to overfitting, and the deep structured module(s) utilize them almost to saturation. As already adjudged, we use our two better-performing modules, pLSTM + softmax and pLSTM + sigmoid, to represent the performance comparison with similar tasks for sarcasm, satire, and/or irony detection.
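The fivefold 3:2 evaluation can be sketched as below with scikit-learn's ShuffleSplit; the stand-in classifier and random feature matrix only illustrate the splitting protocol, not the pLSTM training itself.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression    # stand-in for a pLSTM branch

rng = np.random.default_rng(0)
X = rng.normal(size=(690, 400))                         # stand-in dialogue feature vectors
y = rng.integers(0, 2, size=690)                        # stand-in sarcastic / non-sarcastic labels

splitter = ShuffleSplit(n_splits=5, train_size=0.6, random_state=0)   # five random 3:2 splits
scores = []
for train_idx, _ in splitter.split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X, y))                      # testing on the entire corpus
print("mean validation accuracy:", float(np.mean(scores)))
```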

7 Discussion

Our proposed architecture can successfully detect sarcasm, as learned across the training epochs. All the LSTM modules in our architecture operate independently,


Table 4 Detailed information of the datasets, where V represents the lexical vocabulary length, Size is the cumulation of training and test size in MB of the respective data, and SoA is the respective state-of-the-art results observed. SARC represents (tr + ts), i.e., training and test data combined. Dashed lines are introduced where no results are reported yet

Data         | V      | Size | Model           | Train | Test  | SoA
TCE^a        | 18.020 | 100  | pLSTM + softmax | 98.90 | 98.31 | __
             |        |      | pLSTM + sigmoid | 98.16 | 96.04 |
TMB^a        | 69.849 | 382  | pLSTM + softmax | 97.70 | 96.93 | __
             |        |      | pLSTM + sigmoid | 96.00 | 95.40 |
SIGN [9]     | 41.097 | 2076 | pLSTM + softmax | 97.05 | 95.89 | __
             |        |      | pLSTM + sigmoid | 95.96 | 94.09 |
Sar. V2 [10] | 43.327 | 2568 | pLSTM + softmax | 95.37 | 94.06 | 76.00 [12]
             |        |      | pLSTM + sigmoid | 93.20 | 91.54 |
SARC [11]    | 51.476 | 4646 | pLSTM + softmax | 94.91 | 93.00 | 77.00 [13]
             |        |      | pLSTM + sigmoid | 91.02 | 89.43 |

^a https://www.gutenberg.org/

drawing the same input sets. The performance of one such module does not affect the others. This provides room for monitoring the individual performance of each standalone module. Also, since the modules are substructures of a cohesive architecture as a whole, the hyperparameter tuning and simulation environment are identical for all the modules. Finally, the statements are generated as a whole set combining all the modules' independent outputs. Besides maintaining overall consistent performance, our method also misses out on certain occasions. We worked with an independent sarcasm corpus; hence, we did not consider contextual and situational sarcasm within regular conversations. The data are already rich with sarcastic dialogues and punchlines only. But sarcasm and irony within regular conversations are mostly reliant on the conversational context. As the training set of sarcastic statements is not part of any ongoing conversation, a few of them could actually sound generic rather than sarcastic. Keeping this in mind, we tested our model on multiple other public sarcasm datasets as well, on which the previously applicable state-of-the-art works were also compared. This provided a comparative view of how our proposed approach fared on some of the standard sarcasm corpora available.

8 Conclusion and Future Work

Sarcasm can be produced through multi-parametric expressions. It is rather complex to understand at times, even for humans. But when dealing with linguistic data only, we can exploit language-specific features such as the syntax, semantics, and vocabulary of that particular data. In this paper, we chose an open-sourced text


corpus manually built by collecting some popular comedy show dialogues. We introduced the parallel deep LSTM network architecture, keeping in mind how the homogeneity of such network modules would compare when they are fed with the same inputs and identical tuning, but with different classifiers in the outermost layers. As expected, two of the standalone modules performed well on the training data, achieving a maximum overall accuracy of 98.95%. We also traditionally tested our framework on popular public sarcasm corpora, and two of our independent modules scored over 95% accuracy in detecting sarcasm, with 98.31% being the highest accuracy obtained. For future endeavors, instead of deploying four deep neural models to achieve the homogeneous task of sarcasm detection, we intend to develop a model that mimics human-like statement succession, initiating from random user-input seed words to produce auto-generated natural sarcastic dialogues. For sentence-level sarcasm, there are a few areas of impact which can be identified with the help of attention mechanisms. Considering this, we would like to build a persuasive model of contextual conversation containing humor, wit, and irony for creating a sarcastic vocabulary entirely produced by deep neural networks.

References
1. Mukherjee, S., Bala, P.K.: Sarcasm detection in microblogs using Naïve Bayes and fuzzy clustering. Technol. Soc. 48, 19–27 (2017). https://doi.org/10.1016/j.techsoc.2016.10.003
2. Avvaru, A., Vobilisetty, S., Mamidi, R.: Detecting Sarcasm in conversation context using transformer-based models. In: Proceedings of the Second Workshop on Figurative Language Processing, pp. 98–103. Association for Computational Linguistics, Online (2020)
3. Gregory, H., Li, S., Mohammadi, P., Tarn, N., Draelos, R., Rudin, C.: A Transformer approach to contextual Sarcasm detection in Twitter. In: Proceedings of the Second Workshop on Figurative Language Processing, pp. 270–275. Association for Computational Linguistics, Online (2020)
4. Majumder, N., Poria, S., Peng, H., Chhaya, N., Cambria, E., Gelbukh, A.: Sentiment and Sarcasm classification with multitask learning. IEEE Intell. Syst. 34, 38–43 (2019). https://doi.org/10.1109/MIS.2019.2904691
5. Son, L.H., Kumar, A., Sangwan, S.R., Arora, A., Nayyar, A., Abdel-Basset, M.: Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access 7, 23319–23328 (2019). https://doi.org/10.1109/ACCESS.2019.2899260
6. Sundararajan, K., Palanisamy, A.: Multi-rule based ensemble feature selection model for Sarcasm type detection in Twitter. https://www.hindawi.com/journals/cin/2020/2860479/
7. Castro, S., Hazarika, D., Pérez-Rosas, V., Zimmermann, R., Mihalcea, R., Poria, S.: Towards multimodal Sarcasm detection (An _Obviously_ Perfect Paper). In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4619–4629. Association for Computational Linguistics, Florence, Italy (2019)
8. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019)


9. Peled, L., Reichart, R.: Sarcasm SIGN: Interpreting Sarcasm with sentiment based monolingual machine translation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1690–1700. Association for Computational Linguistics, Vancouver, Canada (2017)
10. Oraby, S., Harrison, V., Reed, L., Hernandez, E., Riloff, E., Walker, M.: Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue. In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 31–41. Association for Computational Linguistics, Los Angeles (2016)
11. Khodak, M., Saunshi, N., Vodrahalli, K.: A Large Self-Annotated Corpus for Sarcasm. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018)
12. Ilić, S., Marrese-Taylor, E., Balazs, J., Matsuo, Y.: Deep contextualized word representations for detecting sarcasm and irony. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 2–7. Association for Computational Linguistics, Brussels, Belgium (2018)
13. Pelser, D., Murrell, H.: Deep and Dense Sarcasm Detection. arXiv:1911.07474 [cs] (2019)

Sentiment Analysis of Covid-19 Tweets Using Evolutionary Classification-Based LSTM Model Arunava Kumar Chakraborty, Sourav Das, and Anup Kumar Kolya

Abstract As Covid-19 spread rapidly all over the world day by day and affected the lives of millions, a number of countries declared complete lockdowns to check its intensity. During this lockdown period, social media platforms have played an important role in spreading information about this pandemic across the world, as people used to express their feelings through social networks. Considering this catastrophic situation, we developed an experimental approach to analyze the reactions of people on Twitter, taking into account the popular words either directly or indirectly related to this pandemic. This paper presents sentiment analysis on a large number of collected tweets on Coronavirus or Covid-19. At first, we analyze the trend of public sentiment on topics related to the Covid-19 epidemic using an evolutionary classification followed by n-gram analysis. Then we calculated sentiment ratings for the collected tweets based on their class. Finally, we trained a long short-term memory network using two types of rated tweets to predict sentiment on Covid-19 data and obtained an overall accuracy of 84.46%. Keywords Covid-19 · Gram selection · LSTM · Sentiment analysis

A. K. Chakraborty (B) · A. K. Kolya
Department of Computer Science & Engineering, RCC Institute of Information Technology, Beleghata, Kolkata 700015, India
e-mail: [email protected]
A. K. Kolya
e-mail: [email protected]
S. Das
Maulana Abul Kalam Azad University of Technology, WB, Salt Lake, Kolkata 700064, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_7

1 Introduction

On 31st December 2019, the Covid-19 outbreak was first reported in Wuhan, Hubei Province, China, and it started spreading rapidly all over the world. Finally, WHO announced the Covid-19 outbreak as a pandemic on 11th March 2020, when the


virus continued to spread [1]. Starting from China, this virus infected and killed thousands of people in Italy, Spain, the USA, the UK, Brazil, Russia, and many other countries as well. By 21st August 2020, more than 22.5 million cases of Covid-19 had been reported in more than 188 countries and territories, yielding more than 7,92,000 deaths, although 14.4 million people were reported to have recovered.¹ While this pandemic continued to affect the lives of millions, many countries enforced strict lockdowns for different periods to break the chain of the pandemic [1]. Since Covid-19 vaccines were still yet to be discovered, maintaining social distancing was the only solution to check the spreading rate of this virus [2]. During the lockdown period, a lot of people chose Twitter to share their expressions about this disease, which inspired us to measure human sentiment about this epidemic by analyzing this huge volume of Twitter data [3]. Initially, we faced many challenges while streaming the English tweets from the multilingual tweets posted all over the world, as people in most foreign countries used their native languages rather than English to express their feelings on social media [3]. However, we developed our dataset of 160 k English tweets exclusively on Covid-19, collected during April–May 2020. We found the most popular words from the word corpus. Then we analyzed the trend of tweets using an n-gram model. Further, we assigned sentiment scores to our preprocessed tweets based on their sentiment polarity and classified our dataset on the basis of those sentiment scores. Finally, we used these tweets and their sentiment ratings to train our LSTM model. The remaining sections are organized as follows: In Sect. 2, we describe some previous related research works. The architecture of our dataset and the proposed pre-processing approach are presented in Sect. 3. Section 4 consists of Feature A for identifying the Covid-19 specified words based on the word lexicon. In Sect. 5, we describe Feature B as the trend of tweet words using the n-gram model. The evolutionary classification of the sentiment-rated tweets based on their sentiment polarity is given in Sect. 6. In Sect. 7, we train our LSTM model based on the classified tweets including their sentiment ratings, whereas Sect. 8 concludes with the future prospects of our research work.

1 https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6.

2 Related Works

A machine learning and cloud computing-based Covid-19 prediction model was developed in May 2020 to predict the future trend of this epidemic. The authors mainly used probabilistic distribution functions like Gaussian, Beta, Fisher-Tippet, and Log-Normal functions to predict the trend [1]. A Covid-19 trend prediction model was introduced in June 2020 for predicting the number of COVID-19 positive cases in different states of India. The researchers


mainly developed an LSTM-based prediction model, as LSTM models perform better for time series predictions. They tested different LSTM variants such as stacked, convolutional, and bidirectional LSTM on the historical data, and based on the absolute error they found that Bi-LSTM gives more accurate results than other LSTM models for short-term prediction [4]. In July 2020, evolutionary K-means clustering on Twitter data related to Covid-19 was carried out by some researchers. They analyzed the tweet patterns using an n-gram model. As a result, they observed the difference between the occurrences of n-grams in the dataset [5]. Another research work describes a deep LSTM architecture for message-level and topic-based sentiment analysis. The authors used LSTM networks augmented with two kinds of attention mechanisms, on top of word embeddings pre-trained on a big collection of Twitter messages [7]. A group of researchers developed LSTM hyperparameter optimization for a neural network-based emotion recognition framework. In their experiment, they found that optimizing LSTM hyperparameters significantly improves the recognition rate of four-quadrant dimensional emotions, with a 14% increase in accuracy, and the model based on the optimized LSTM classifier achieved 77.68% accuracy using the Differential Evolution algorithm [8].

3 Preparing Covid-19 Dataset

Since this Covid-19 epidemic has affected the entire world, we collected worldwide Covid-19 related English tweets at a rate of 10 k per day between April 19 and May 20, 2020, to create our dataset of about 160 k tweets. The dataset we developed contains the important information about most of the tweets as its attributes. The attributes of our dataset are id [Number], created_at [DateTime], source [Text], original_text [Text], favorite_count [Number], retweet_count [Number], original_author [Text], hashtags [Text], user_mentions [Text], place [Text]. Finally, we collected 1,61,400 tweets containing hash-tagged keywords such as #covid19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, #staysafe, etc. In Fig. 1 we present an overview of our dataset.

3.1 Data Pre-Processing

Data pre-processing is mainly used for cleaning the raw data by following certain steps to achieve better results in further evaluations. We performed the pre-processing on our collected data by developing a user-defined pre-processing function based on NLTK (Natural Language Toolkit, a Python library for NLP). Stemming helps to reduce inflected words to their word stem, base, or root form, whereas through tokenization this function splits each sentence into smaller word tokens.
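As a rough illustration of this cleaning step, the sketch below uses NLTK's tokenizer and Porter stemmer; the function name, cleaning rules, and sample tweet are assumptions rather than the authors' exact implementation, and the punkt and stopwords resources must be downloaded beforehand.

import re
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(tweet):
    """Clean a raw tweet, tokenize it, and reduce each token to its stem."""
    text = re.sub(r"http\S+|@\w+|#", " ", tweet.lower())     # drop URLs, mentions, '#'
    tokens = word_tokenize(text)                             # split into word tokens
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
    return [stemmer.stem(t) for t in tokens]                 # stemming to root form

print(preprocess("Still #Covid19 wave is running https://t.co/xyz"))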

4 Feature A: Covid-19 Specified Words Identification

After pre-processing, we developed a Bag-of-Words (BOW) model using the frequently occurring words from the word lexicon and obtained a list of the most frequent Covid-19 exclusive words. In Fig. 2, we present a dense word cloud of some of the most used words within the corpus.

Fig. 1 Partial snapshot of the Covid-19 tweets corpus

Fig. 2 Some of the most popular Covid-19 related words from our corpus


4.1 Word Popularity

Several words within the generated corpus occur at different times and in different positions of the tweets. Here we counted the recurrence of each word and present the top 50 popular words along with their popularity in Fig. 3. After finding the word popularity, we calculated the probability of repetition for each word on the basis of a total of 3,53,704 words from the corpus. Table 1 presents the popularity and probability scores of some of the most frequent words.

P(W_i) = \frac{count(W_i)}{\sum_{i=0}^{n} count(W_i)}    (1)
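A minimal sketch of Eq. (1) on a toy token list is shown below, assuming Python's collections.Counter; the corpus variable is illustrative only.

from collections import Counter

corpus_tokens = ["covid19", "test", "covid19", "new", "people", "death", "covid19"]

counts = Counter(corpus_tokens)                               # popularity: count(W_i)
total = sum(counts.values())                                  # total word occurrences
probability = {w: c / total for w, c in counts.items()}       # P(W_i) as in Eq. (1)

for word, c in counts.most_common(5):
    print(f"{word}: popularity={c}, probability={probability[word]:.6f}")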

Fig. 3 Graphical representation of the popularity for most frequent Covid-19 exclusive words

5 Feature B: Word Popularity Detection Using N-gram

Lexical n-gram models are widely used in Natural Language Processing for statistical analysis and syntactic feature mapping. We developed an n-gram model to analyze our generated corpus of tokenized words and to find the popularity of words or groups of adjacent words. Here the probability of the occurrence of a sequence can be calculated using the probability chain rule:

P(x_1, x_2, x_3, ..., x_n) = P(x_1) P(x_2|x_1) P(x_3|x_1, x_2) ... P(x_n|x_1, x_2, x_3, ..., x_{n-1})    (2)

                           = \prod_{i=1}^{n} P(x_i | x_1^{i-1})    (3)

For example, consider the sentence "Still Covid-19 wave is running". As per the probability chain rule, P("Still Covid19 wave is running") = P("Still") x P("Covid19" | "Still") x P("wave" | "Still Covid19") x P("is" | "Still Covid19 wave") x P("running" | "Still Covid19 wave is"). The probability of the words in each sentence after applying the probability chain rule is:

P(W_1, W_2, W_3, ..., W_n) = \prod_{j} P(W_j | W_1, W_2, W_3, ..., W_{j-1})    (4)

                           = \prod_{j=1}^{n} P(W_j | W_1^{j-1})    (5)

The bigram model estimates the probability of a word by using only the conditional probability P(W_i | W_{i-1}) of one preceding word, instead of conditioning on all the previous words P(W_i | W_1^{i-1}) [6]:

P(W_1, W_2, ..., W_n) ≈ \prod_{i=2}^{n} P(W_i | W_{i-1})    (6)

The expression for this conditional probability is

P(W_k | W_{k-1}) = \frac{count(W_{k-1}, W_k)}{count(W_{k-1})}    (7)

We have identified the most popular unigrams, bigrams, and trigrams within our corpus using the n-gram model. The graphical representations of the 50 most popular unigrams, bigrams, and trigrams along with their popularity are presented in Figs. 4, 5, and 6, respectively. As a result of this analysis, we found that the popularity of trigrams is lower than that of bigrams, and the unigrams' popularity is the highest according to this n-gram model.
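The n-gram counting described above can be sketched as follows, assuming NLTK's ngrams helper over the tokenized corpus; the token list shown is only a placeholder.

from collections import Counter
from nltk.util import ngrams

tokens = ["still", "covid19", "wave", "is", "running", "covid19", "wave", "is", "rising"]

for n in (1, 2, 3):
    grams = Counter(ngrams(tokens, n))      # count unigrams, bigrams, trigrams
    print(f"top {n}-grams:", grams.most_common(3))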

Fig. 4 Graphical representation of the popularity for most frequent unigrams

Fig. 5 Graphical representation of the popularity for most frequent bigrams

6 Sentiment Analysis

To measure the trend of public opinions, we use sentiment analysis, a specific type of data mining through Natural Language Processing (NLP), computational linguistics, and text analysis. The subjective information from social media is analyzed and extracted to classify the text into multiple classes such as positive, negative, and neutral. Here we calculated the sentiment polarity of each cleaned and preprocessed tweet using the NLTK-based sentiment analyzer and obtained the sentiment scores for the positive, negative, and neutral classes to calculate the compound sentiment score for each tweet.
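The text refers to an NLTK-based sentiment analyzer; the sketch below assumes NLTK's VADER SentimentIntensityAnalyzer (which requires the vader_lexicon resource) to obtain the positive, negative, neutral, and compound scores for a tweet.

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweet = "staying home and staying safe, we will get through this together"
scores = analyzer.polarity_scores(tweet)   # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
print(scores["compound"])                  # compound score used for classification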


Fig. 6 Graphical representation of the popularity for most frequent trigrams

6.1 Sentiment Classification

We have classified the tweets on the basis of the compound sentiment into three different classes, i.e., Positive, Negative, and Neutral. Then we assigned the sentiment polarity rating for each tweet based on the algorithm presented in Table 2. In Fig. 7, we present the sentiment classification with the overall percentage of positive, negative, and neutral tweets found in the dataset. It can be visualized that the sentiment classes are naturally imbalanced, as a large portion of social media users are either negative or neutral toward Covid-19 and the medical details associated with it.

Fig. 7 Sentiment distribution of three class polarity along with the percentage of Covid-19 tweets occurred from each class

7 Sentiment Modeling Using Sequential LSTM

In traditional textual sentiment analysis, LSTM (Long Short-Term Memory) networks have already been proven to perform better than similar neural models [3]. We exploit a sequential LSTM model for sentiment evaluation of our Covid-19 dataset. We developed a new dataset consisting of the cleaned and preprocessed tweets along with their corresponding positive (1.0) and negative (0.0) sentiments. Then we created two sets X and y for the cleaned tweets and their sentiment scores, respectively, and split the dataset in an 80:20 ratio, i.e., 80% for training (X_train, y_train) and 20% for validation (X_test, y_test) purposes. A large number of Covid-19 exclusive words were generated by this model from the new dataset. We then converted these words into word vectors using word2vec, setting the vector dimension to 200 for each collected n-gram within a sentence, and developed new X_train and X_test sets consisting of the calculated word vectors for further processing. From the updated training set, the word vectors and the respective sentiment scores are fed into the model as the first layer of inputs. In this experiment, we used the TensorFlow framework and the Keras library to build a Sequential LSTM model with Dense layers. We trained our five-layered model for 30 epochs with two types of activation functions, along with the model parameters, optimizer, loss, and accuracy. We used the ReLU (Rectified Linear Unit) activation function for the initial set of Dense layers with 128, 64, and 32 units, respectively, and the Sigmoid activation function for the outermost final Dense layer with 2 units. During training, we used a batch size of 32 and a verbosity level of 2 for our model. Table 3 presents the training accuracy vs. loss and validation accuracy vs. loss for selected epochs. After completing the training of our model, we finally achieved 91.67% overall training accuracy, whereas the validation accuracy is 84.46% on the testing data. Tables 4 and 5 present the confusion matrix and classification report, showing the differences between the predicted and the actual tweets along with the different classes.
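A hedged Keras sketch of the model described above is given below. The LSTM width, optimizer, loss, and the random placeholder data are assumptions; the paper specifies only the Dense layer sizes (128/64/32 with ReLU, 2 with Sigmoid), the 200-dimensional word2vec inputs, 30 epochs, batch size 32, and verbosity 2.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.utils import to_categorical

timesteps, vec_dim = 30, 200                       # assumed max tweet length, word2vec size
X_train = np.random.rand(256, timesteps, vec_dim)  # placeholder word-vector sequences
y_train = to_categorical(np.random.randint(0, 2, 256), num_classes=2)

model = Sequential([
    LSTM(128, input_shape=(timesteps, vec_dim)),   # sequence encoder (width assumed)
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(2, activation="sigmoid"),                # positive / negative outputs
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=2, validation_split=0.2)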

Table 1 Popularity & probability of most frequent words

  | Words   | Popularity | Probability
0 | Covid19 | 91,794     | 0.259522
1 | Test    | 11,663     | 0.032974
2 | New     | 11,305     | 0.031962
3 | People  | 10,834     | 0.030630
4 | Death   | 10,783     | 0.030486


Table 2 Algorithm used for sentiment classification of our Covid-19 tweets

Algorithm: Sentiment Classification of Tweets (compound, sentiment)
1. for each k in range(0, len(tweet.index)):
2.     if tweet_k[compound] < 0:
3.         tweet_k[sentiment] = 0.0    # assigned 0.0 for Negative Tweets
4.     elif tweet_k[compound] > 0:
5.         tweet_k[sentiment] = 1.0    # assigned 1.0 for Positive Tweets
6.     else:
7.         tweet_k[sentiment] = 0.5    # assigned 0.5 for Neutral Tweets
8. end
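A runnable rendering of the rule in Table 2 is sketched below, assuming the tweets are held in a pandas DataFrame with a 'compound' column; the column names are illustrative.

import numpy as np
import pandas as pd

tweets = pd.DataFrame({"compound": [-0.42, 0.0, 0.61]})   # toy compound scores
tweets["sentiment"] = np.select(
    [tweets["compound"] < 0, tweets["compound"] > 0],
    [0.0, 1.0],          # negative -> 0.0, positive -> 1.0
    default=0.5,         # neutral  -> 0.5
)
print(tweets)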

Table 3 Training accuracy versus loss, validation accuracy versus loss within 30 epochs

Epochs    | Train loss (%) | Train accuracy (%) | Val loss (%) | Val accuracy (%)
Initially | 59.93          | 67.63              | 56.18        | 70.45
5th       | 42.71          | 79.27              | 43.75        | 78.74
10th      | 35.91          | 83.38              | 39.58        | 81.71
15th      | 30.43          | 86.36              | 39.28        | 82.81
20th      | 26.23          | 88.40              | 41.04        | 83.47
25th      | 22.44          | 90.15              | 41.96        | 84.24
30th      | 19.38          | 91.67              | 45.57        | 84.46

Table 4 Confusion matrix

Actual   | Predicted Positive | Predicted Negative
Positive | 8298 (TP)          | 1941 (FP)
Negative | 1946 (FN)          | 7924 (TN)

Table 5 Classification report

               | Precision | Recall | F1-Score | Support
Positive (1.0) | 0.81      | 0.81   | 0.81     | 10,239
Negative (0.0) | 0.80      | 0.80   | 0.80     | 9870
Avg/Total      | 0.81      | 0.81   | 0.81     | 20,109

In Fig. 8, we have plotted the percentages of training accuracy vs. loss and validation accuracy vs. loss achieved by the Sequential LSTM model during training. From the figure, it is evident that there is a significant loss difference between the training and testing epochs. This indicates slight overfitting of the data, which can be attributed to the several tweet collection parameters differing from time to time during the tweet streaming phase.


Fig. 8 Performance metrics from the training loss vs. accuracy and validation loss vs. accuracy by the proposed model

8 Conclusion & Future Scope

This experiment is mainly focused on deep learning-based sentiment analysis of Covid-19 tweets. We extracted the most popular words and analyzed the popularity of groups of words using the n-gram model as the two main features of our dataset. We then developed a model to assign sentiment ratings to the tweets based on their sentiment polarities calculated by the sentiment analyzer, and classified all tweets into positive and negative classes based on their assigned sentiment ratings. Then, using this classified dataset containing the cleaned and preprocessed tweets and their sentiment ratings, i.e., 1.0 for positive and 0.0 for negative, we trained our deep learning-based LSTM model. We divided the dataset in an 80:20 ratio, i.e., 80% for training and 20% for testing purposes. After running 30 epochs on almost 93,474 parameters, we achieved a validation accuracy of 84.46%. For future work, we want to develop a polarity-popularity model based on the features extracted during this experiment, so that we can assign refined sentiment ratings to the tweets based on the polarity of the most recurrent words [3]. With that data we will train the deep learning model to enhance the validation accuracy of our system.


References 1. Tuli, S., Tuli, S., Tuli, R., Gill, S.S.: Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing. Internet of Things, p. 100222 (2020) 2. Dubey, A.D.: Twitter sentiment analysis during COVID19 outbreak. Available at SSRN 3572023 (2020) 3. Das, S., Das, D., Kolya, A.K.: Sentiment classification with GST tweet data on LSTM based on polarity-popularity model. Sadhana. 45(1) (2020) 4. Arora, P., Kumar, H., Panigrahi, B.K.: Prediction and analysis of COVID-19 positive cases using deep learning models: a descriptive case study of India. Chaos, Solitons Fractals 139, 110017 (2020) 5. Arpaci, I., Alshehabi, S., Al-Emran, M., Khasawneh, M., Mahariq, I., Abdeljawad, T., Hassanien, A.E.: Analysis of twitter data using evolutionary clustering during the COVID-19 Pandemic. CMC-Comput. Mat. Cont. 65(1), 193–203 (2020) 6. Jurafsky, D., 2000. Speech & language processing. Pearson Education India. 7. Baziotis, C., Pelekis, N., Doulkeridis, C.: Datastories at semeval-2017 task 4: deep lstm with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017), pp. 747–754 (2017, August) 8. Nakisa, B., Rastgoo, M.N., Rakotonirainy, A., Maire, F., Chandran, V.: Long short term memory hyperparameter optimization for a neural network based emotion recognition framework. IEEE Access 6, 49325–49338 (2018)

Clustering as a Brain-Network Detection Tool for Mental Imagery Identification Reshma Kar and Indronil Mazumder

Abstract Brain connectivity measures have been identified as effective feature extraction tools for the classification of EEG data. However, there exist certain theoretical limitations in the computation of brain networks. First, bivariate models of brain connectivity are incapable of handling the multivariate nature of brain connections. Second, multivariate brain connectivity models are typically based on regression models. These regression models are associated with stationary assumptions, which do not hold for EEG data. To solve this problem, the authors propose clustering as a tool to perform multivariate brain connectivity analysis. Extended variants of Fuzzy c-means and self-organizing map-based clustering are proposed to compute brain networks, which are subsequently used as features for mental imagery detection. Experiments undertaken demonstrate the superiority of the proposed brain network features over its traditional counterparts. Keywords Brain networks · Clustering · Fuzzy C means · Neuromarketing · Multivariate connectivity · Self-organizing maps · Support vector machine

R. Kar
Artificial Intelligence Laboratory, ETCE Department, Jadavpur University, Kolkata, India
e-mail: [email protected]
I. Mazumder (B)
ECE Department, RCC Institute of Information Technology, Kolkata, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_8

1 Introduction

The human brain is believed to organize computation by establishing communication among multiple interconnected regions [1]. The exact mechanism of information transfer in the brain is unknown to date, but it is widely accepted that the brain exchanges information through electrical, anatomical, and chemical pathways. A biological brain network updates itself to perform and coordinate tasks in response to internal and external stimuli. Information about the communication protocol in biological brain networks and its associated update mechanisms can answer questions


pertaining to disease, genetics, evolution and computation [1–4]. Thus, the study of brain networks is an important research area, which needs to be addressed. Typically, the brain computes information within milliseconds, and hence the brain-rhythm capturing devices like Electroencephalogram (EEG), which offer high temporal resolution, are preferable means for detection of cognitive states in the brain and associated brain networks. The general approach in the detection of brain networks involves computing statistical measures of interdependence among brain signals and declaring highly inter-dependent signals as being connected. Apart from its biological significance, the detection of brain networks can also lead to the computation of a better feature space for the classification of cognitive states [5–7]. However, the problem of analyzing and detecting biological brain networks remains plagued by various issues as discussed next. Brain networks are typically network structures with nodes indicating brain regions and edges representing connectivity between nodes. Computation of brain networks may be multivariate or bivariate in nature [8]. Bivariate measures of connectivity are only capable of capturing limited information as brain-level interactions have often been suggested to be multivariate in nature [9]. Further, the computation of multivariate connectivity in EEG signal space is dependent on multivariate regression models, subject to assumptions of stationary and normal distribution of regression model error [10]. Unfortunately, as EEG data are non-stationary, these models of computing brain connectivity may be misleading. Common approaches in this regard include computing signal interactivity measures in either time domain [11] or spectral domain [12]. Even after many limitations of EEG-based brain network computation, these have consistently out-performed traditional feature extraction techniques for the detection of disease biomarkers [13–16] and cognitive tasks [8, 17]. However, the exact mechanism of brain network computation remains widely debated among researches of the domain [4]. Typically, an EEG brain network is intended to capture the interactions among brain regions by analyzing EEG signals. Interestingly, the question of whether two continuous variables (in this case EEG signals) interact among each other is philosophical in nature and can be formulated in many ways [11, 12, 18]. Frequency level interaction has been deemed important in EEG analysis [12, 19, 20]. The major contribution of this paper involves computation of frequency domain coupling in multivariate brain networks for non-stationary EEG signals. The paper employs clustering for the computation of brain networks. Clustering of brain signals leads to result in a multivariate approximation of the coupling among brain signals. To this end, two popular clustering algorithm variants are used, namely fuzzy c-means (FCM) and self-organizing maps (SOM). FCM is chosen for its inherent ability to model uncertainty in signal space, SOM is chosen because it can be used as a nonparametric tool for computation of brain networks. The principle is to compute clustering among EEG signals and store the results in an n × m matrix, where the matrix contains the frequency domain coupling among all n cluster centers/neurons and all m number of EEG signals acquired. This matrix when multiplied with its transpose yields an m × m matrix, indicating inter-electrode coupling which may be used as a brain network. 
The elements of the brain network matrix are then subsequently used


for the classification of cognitive states. The proposed variant of the SOM for brain network computation is non-parametric. However, as FCM requires the number of cluster centers as input, the authors propose using classification accuracy [21] as a metric to adjust the number of clusters for optimal performance. The rest of the paper is organized as follows. The second section discusses an overview of the proposed multivariate brain connectivity techniques; Section 3 lists the experimental details and Sect. 4 describes the performance analysis of the proposed techniques. The conclusions and prospects of the proposed techniques are summarized in Sect. 5.

2 Proposed Techniques

The proposed technique computes multivariate brain connectivity in the frequency domain. The aim is to compute multivariate spectral-domain connectivity measures free from stationarity assumptions. Let E1, E2, and E3 be three EEG electrode positions used to acquire the electrical signals from a subject's scalp at a given experimental instance. We may then cluster the signals obtained from each electrode using a clustering algorithm. For illustrative purposes, a crisp clustering among the electrodes' signals is considered. Thus, if we cluster the signals into two clusters (C1 and C2), we can compute a matrix A with binary indicators of the electrodes belonging to each cluster, indicated in black as shown in Fig. 1. It may be noted that multiplying matrix A with its transpose results in a symmetric matrix, in which electrodes belonging to the same cluster are indicated to be connected (shaded in black). This transformation is used to compute brain networks using clustering.
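A small numpy sketch of this transformation for the three-electrode, two-cluster example is shown below; the membership values are illustrative.

import numpy as np

# Rows: electrodes E1, E2, E3; columns: clusters C1, C2 (crisp memberships).
A = np.array([[1, 0],
              [1, 0],
              [0, 1]])

BN = A @ A.T    # electrodes sharing a cluster are marked as connected
print(BN)
# [[1 1 0]
#  [1 1 0]
#  [0 0 1]]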

2.1 Computation of FCM-Based Brain Network Features

The authors employed the FCM algorithm to compute multivariate brain networks in the spectral domain.

Fig. 1 Illustrative example outlining computation of brain networks using clustering matrix A (black/connected: 0, white/not connected: 1)


Fig. 2 Computation of brain network from pre-processed signals. PSD: Power spectral density, FCM: Fuzzy C-Means. E1 –En : EEG signal acquisition electrodes

Fuzzy sets are capable of handling uncertainty in EEG signal space [22], arising out of their non-stationary nature. The ideal number of cluster centers in the FCM algorithm for detecting brain networks is selected by iteratively optimizing the number of cluster centers to select the best performing EEG brain network features for a given classification task. For each instance of multiple-channel EEG data, artifact removal is performed using windsorising [23]. Then each signal is transformed to the frequency domain by performing power spectral density estimation [24]. Clustering among the obtained frequency-domain signals is thereafter performed to estimate the frequency-domain inter-electrode coupling, which is often given emphasis in the computational neuroscience literature [25]. The membership matrix computed by FCM is multiplied with its transpose to detect brain networks. The technique of brain-network computation using FCM, for a single iteration, is given in Fig. 2. These brain network matrices are subsequently used as features. The number of cluster centers in the FCM algorithm is optimized by testing all values in the range [3, 15] for the given pattern recognition problem. The aforementioned range is chosen keeping in mind that data are obtained from 30 electrodes. The technique of brain network computation is outlined in Algorithm 1.


Algorithm 1: Proposed Brain Network Feature Extraction Algorithm using FCM
Step 1: Perform artifact removal by windsorising.
Step 2: Compute Welch's power spectral density (PSD) estimate to transfer signals to the frequency domain: Ef = PSD(Et).
Step 3: For a given number of clusters n in the range [3, 15], perform steps 4 to 6.
Step 4: Perform FCM clustering among the frequency-transformed signals obtained from all electrodes, and store the cluster memberships of each electrode in an m×n matrix A using FCM: A = FCM(Ef).
Step 5: Let m be the number of data acquisition electrodes. Compute an m×m Fuzzy Brain Network (BN) representing inter-electrode connections by multiplying the membership matrix with its transpose: BN = A×A^T.
Step 6: Select the lower diagonal matrix of the computed brain networks as features.
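The following sketch outlines Algorithm 1 under stated assumptions: scipy is used for Welch's PSD, the scikit-fuzzy package supplies the FCM routine, and artifact removal is reduced to simple clipping. Neither library is named by the authors, and the signal dimensions and sampling rate are placeholders.

import numpy as np
from scipy.signal import welch
from skfuzzy.cluster import cmeans

rng = np.random.default_rng(0)
eeg = rng.standard_normal((30, 1000))              # 30 electrodes x 1000 samples (placeholder)

eeg = np.clip(eeg, -3, 3)                          # crude stand-in for windsorising
_, psd = welch(eeg, fs=250, nperseg=256, axis=-1)  # Step 2: frequency-domain transform

c = 5                                              # number of clusters (tuned in [3, 15])
# scikit-fuzzy expects data of shape (features, samples): here (n_freqs, n_electrodes)
cntr, u, *_ = cmeans(psd.T, c, 2.0, 1e-4, 200)

A = u.T                                            # Step 4: membership matrix, m x c
BN = A @ A.T                                       # Step 5: fuzzy brain network, m x m
features = BN[np.tril_indices_from(BN, k=-1)]      # Step 6: lower-triangular features
print(features.shape)                              # (435,) for 30 electrodes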

As demonstrated in the experiments section, the number of cluster centers chosen to implement feature extraction has a significant impact on the features extracted and in turn affects the performance of the classifier. In order to select the optimal number of cluster centers for feature extraction purposes, the feature extraction process is repeated multiple times, for each number of cluster centers considered. The optimal number of cluster centers is selected as follows. The selected feature-label set is partitioned into training and test sets, namely Tr and Te. Brain network features are computed for n number of cluster centers. The optimal number of cluster centers (c) is identified as the one which results in maximum classification accuracy for the Tr2 set using LSVM [26].

Algorithm 2: Tuning number of cluster centers in FCM-based brain networks for optimal classifier performance
Step 1: For number of cluster centers c = [3, 15], repeat steps 2 to 4.
Step 2: For each data instance of subject s, extract features by Algorithm 1.
Step 3: Divide data into training sets Tr1, Tr2 and test set Te.
Step 4: Train SVM using Tr1 and test its performance on Tr2 using the features computed in step 3.
Step 5: Identify the feature set with the best classification accuracy on Tr2, and use it to compute the final performance on test set Te.


2.2 Computation of SOM-Based Brain-Network Features

The authors propose computation of multivariate connectivity in the brain by employing SOMs, which are fundamentally based upon the same principle as biological neuronal connectivity models, viz. "neurons which fire together, wire together" [4]. This work implements the "phase-locking value" [27] among signals as a distance metric for the SOM-based brain network computation measure. The training and recall mechanism of the extended SOM-based brain network computation is outlined next. The traditional SOM is adapted in the following ways: (a) the phase-locking value is used as a distance metric between neurons and input vectors, and (b) the self-organizing neurons are arranged as a one-dimensional vector for simplicity, because a two-dimensional representation of the obtained clusters is not relevant to the given problem. As the training phase progresses, the random neuron weights start adapting to resemble the closest EEG signals, as shown in Fig. 3. At the end of the recall phase, similar signals are mapped to nearby neurons.

Fig. 3 Adaptation of neuronal weights in proposed SOM

• Training Phase

(a) A neuron field of 1 × k dimension is initialized, each neuron having a 1 × n dimensional weight vector mapping it to the input vector x = {x_i} (here each input vector is an EEG signal), which is also of dimension 1 × n. The weights are initialized such that no two weight vectors are the same.

(b) The distance among weight vectors and input vectors is computed as the Phase-Locking Value (PLV) among inputs and weights:

d_{i,j} = PLV(x_i, w_j)    (1)

(c) The winning neuron associated with the highest PLV with the input vector, and its neighborhood neurons within radius r_t, are adapted by the following formula:

w_c = w_c + η_t (x_i − w_c)    (2)

where a neuron is said to be within radius r_t of the winning neuron if the PLV between the winning neuron and the selected neuron is within threshold r_t.

(d) Steps (b) and (c) are performed iteratively, and the learning rate η_t and neighborhood radius r_t are updated with each iteration as follows:

η_{t+1} = η_t / t,    r_{t+1} = 0.5 + r_t / t    (3)

Here t represents the number of iterations. An increasing r_t implies that the neighborhood becomes smaller with iterations as the threshold for coupling increases.

• Recall Phase

(a) The distance/clustering coefficient between the neurons and signals is computed using the phase-locking value:

a_{i,j} = PLV(x_i, w_j)    (4)
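A standard way to estimate the phase-locking value of Eqs. (1) and (4) is via instantaneous phases obtained from the Hilbert transform, as sketched below; the authors' exact PLV estimator may differ.

import numpy as np
from scipy.signal import hilbert

def plv(x, w):
    """PLV between a signal x and a neuron weight vector w of equal length."""
    phase_x = np.angle(hilbert(x))
    phase_w = np.angle(hilbert(w))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_w))))

t = np.linspace(0, 1, 250)
x = np.sin(2 * np.pi * 10 * t)
w = np.sin(2 * np.pi * 10 * t + 0.3)   # same rhythm with a constant phase lag
print(plv(x, w))                       # close to 1.0 for phase-locked signals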

3 Experiments and Results

3.1 Data Acquisition and Pre-processing

The EEG data are acquired from an online four-class mental imagery dataset [28] for 9 subjects, acquired in 2 sessions of 48 trials each. Artifact removal is performed by a technique called windsorising [23], which involves scaling EEG data into a specific range. For the FCM-based experiments, the data for all users are then divided into training and test sets in the ratio 2:1, and the training set Tr is divided into two further partitions Tr1 and Tr2 in the ratio 1:1. For the SOM-based experiments, the data are divided into training and test sets in the ratio 2:1.

Table 1 Average classification accuracy (CA) with LSVM

3.2 Training and Classification for FCM-Based Brain Networks

Brain networks are computed with different numbers of cluster centers. Sample brain networks for a given mental imagery task are given in Table 1. It can be seen that with an increasing number of cluster centers, the clustering criteria become more restricted and hence the computed synchronization among features is lower. All features are extracted after performing windsorising on 4 s clips of EEG data collected while the subjects performed mental imagery. The proposed brain network features yield an accuracy of 81.44% with linear support vector machines on the test set, which is a significant improvement over the earlier work (average accuracy 66.94%) [10]. The precision, recall, and F1-score for all 10 numbers of cluster centers, corresponding to training set Tr2, are given in Fig. 4.

3.3 Brain-Network Computation by Extended SOM

Using the proposed SOM technique, a matrix M is computed, where the element in the i-th row and j-th column of M indicates the clustering coefficient of the signal collected from the i-th electrode Ei with the j-th neuron Nj. The matrix M multiplied with its transpose results in the connectivity matrix among all electrodes.


Fig. 4 Precision, recall, F-score on set Tr2 for a different number of cluster centers

An illustrative example with clustering coefficients among electrodes and neurons is presented in Fig. 5, where electrodes belonging to the same cluster are indicated as being connected in the computed Brain Network (BN).

Fig. 5 Computation of brain network by multiplying the clustering-coefficient matrix with its transpose. Red: high connectivity, Blue: low connectivity

Table 2 Average classification accuracy (CA) with LSVM

                       | Features              | CA
Traditional features   | Hjorth-parameters [8] | 0.65
                       | PSD [33]              | 0.67
                       | AYP [12]              | 0.80
                       | Yule-parameters [23]  | 0.72
Brain network features | PLV [19]              | 0.64
                       | Coherence [4, 27]     | 0.74
                       | Cross-entropy [20]    | 0.69
                       | Proposed FCM features | 0.75
                       | Proposed SOM features | 0.71

4 Performance Analysis

This section provides the experimental basis for performance analysis and comparison of the proposed brain network computation algorithms with traditional/existing ones.

4.1 Classifier Performance

The proposed method of brain network computation is compared with other multivariate and bivariate measures of brain network computation. For this purpose, brain networks were computed by various connectivity measures including Granger causality, phase-locking value, correlation, and coherence [4, 10, 27, 29–31]. Elements of the brain networks were selected as features and used for the five-class mental imagery classification problem. The proposed brain network connectivity-based features outperformed the traditional brain network connectivity-based features in terms of average recognition accuracy using linear support vector machines. Thus, the authors believe that the proposed method of brain network computation may be used as an effective tool for feature extraction from EEG signals. The classification results are given in Table 2.

4.2 Statistical Validation Using Wilcoxon Signed-Rank Test

To statistically validate the performance of the proposed brain-network algorithms, we employ a non-parametric Wilcoxon signed-rank test [32] using the classification accuracy as a metric on a single database. Let H_o be the null hypothesis, indicating identical performance of a given algorithm B with respect to a reference algorithm A.


Table 3 Results of statistical validation with the proposed methods as reference, one at a time

Existing features                                     | Proposed FCM-based features (reference) | Proposed SOM-based features (reference)
Traditional features   | Hjorth-parameters [8]        | + | +
                       | PSD [33]                     | + | +
                       | AYP [12]                     | + | +
                       | Yule-parameters [23]         | + | +
Brain network features | PLV [19]                     | + | +
                       | Coherence [4, 27]            | − | +
                       | Cross-entropy [20]           | − | +
                       | Proposed FCM-based features  |   | +
                       | Proposed SOM-based features  | − |

Here, A = any one of the two proposed brain network computation techniques and B = any one of the nine feature extraction techniques listed in Table 3, employed along with a linear support vector machine classifier. The plus (minus) sign in Table 3 represents that the signed-rank values of an individual method with the proposed method as reference are significant (not significant). Here, a 95% confidence level is achieved with 1 degree of freedom, studied at a p-value greater than 0.05.

5 Discussion

Computation of the feature space requires multiplying the clustering coefficient matrix with its transpose. This is done because, for each clustering-based brain network matrix, the cluster centers are computed randomly, and the same cluster of signals may be grouped into different regions in space. Multiplying the clustering matrices keeps the required information regarding which signals are clustered together and removes the cluster-center-related spatial information. Also, a high number of cluster centers implies that more signals will be mapped to a single cluster. This may affect the classification accuracy of the proposed classifier, and hence the number of clusters needs to be chosen judiciously. In this paper, an iterative technique of choosing clusters is outlined.


6 Conclusions

A novel technique for computing EEG brain networks has been proposed in this paper. FCM and SOM algorithms have been employed for clustering signals in the frequency domain. The chief theoretical advantage of this technique is that multivariate spectral-domain synchrony may be computed without stationarity assumptions. It may also be noted that while most spectral-domain synchrony measures are computed as frequency band-specific, the proposed technique measures the overall synchrony among different frequency bands and decreases ambiguity in selecting the appropriate frequency band for a given classification problem. Further, the uncertainty in EEG signal space can be modeled by FCM to detect similar signals in the spectral domain. Experimental results validated the superiority of the proposed approach over existing techniques. In the future, the authors would like to use various clustering models and classifier combinations for feature extraction and classification in different types of EEG classification problems.

References 1. Morin, C.: Neuromarketing: the new science of consumer behavior. Society 48, 131–135 (2011) 2. Murphy, E.R., Illes, J., Reiner, P.B.: Neuroethics of neuromarketing. J. Consum. Behav. Int. Res. Rev. 7, 293–302 (2008) 3. Javor, A., et al.: Neuromarketing and consumer neuroscience: contributions to neurology. BMC Neurol. 13(1), 13 (2013) 4. Ural, G., Kaçar, F., Canan, S.: Wavelet phase coherence estimation of EEG signals for neuromarketing studies. Neuro Quantol. 17 (2019) 5. Telpaz, A., Webb, R., Levy, D.J.: Using EEG to predict consumers’ future choices. J. Market. Res. 52(4), 511–529 (2015) 6. Hassan, M., et al.: EEG source connectivity analysis: from dense array recordings to brain networks. PloS ONE 9(8), e105041 (2014) 7. Kar, R., et al.: Detection of signaling pathways in human brain during arousal of specific emotion. In: 2014 International Joint Conference on Neural Networks (IJCNN). IEEE (2014) 8. Sakkalis, V.: Review of advanced techniques for the estimation of brain connectivity measured with EEG/MEG. Comput. Biol. Med. 41(12), 1110–1117 (2011) 9. Zanin, M., et al.: Optimizing functional network representation of multivariate time series. Sci. Rep. 2, 630 (2012) 10. Yadava, M., et al.: Analysis of EEG signals and its application to neuromarketing. Multimedia Tools Appl. 76(18), 19087–19111 (2017) 11. Rodgers, J.L., Nicewander, W.A.: Thirteen ways to look at the correlation coefficient. Am. Stat. 42(1), 59–66 (1988) 12. Aydore, S., Pantazis, D., Leahy, R.M.: A note on the phase locking value and its properties. Neuroimage 74, 231–244 (2013) 13. Oon, H.N., Saidatul, A., Ibrahim, Z.: Analysis on non-linear features of electroencephalogram (EEG) signal for neuromarketing application. In: International Conference on Computational Approach in Smart Systems Design and Applications (ICASSDA). IEEE (2018) 14. Cecchin, T., et al.: Seizure lateralization in scalp EEG using Hjorth parameters. Clin. Neurophysiol. 121(3), 290–300 (2010) 15. Pittner, S., Kamarthi, S.V.: Feature extraction from wavelet coefficients for pattern recognition tasks. IEEE Trans. Pattern Anal. Mach. Intell. 21(1), 83–88 (1999)


16. Greicius, M.D., et al.: Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI. Proc. Natl. Acad. Sci. 101(13), 4637–4642 (2004) 17. Heuvel, M.P.V., Sporns, O.: Network hubs in the human brain. Trends Cogn. Sci. 17(12), 683–696 (2013) 18. Liu, W., Pokharel, P.P., Principe, J.C.: Correntropy: a localized similarity measure. In: The Proceedings of IEEE International Joint Conference on Neural Network Proceedings (2006) 19. Bowyer, S.M.: Coherence a measure of the brain networks: past and present. Neuropsychiatr. Electrophysiol. 2(1), 1 (2016) 20. Korzeniewska, A., et al.: Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method. J. Neurosci. Methods 125(1–2), 195–207 (2003) 21. Duda, R.O., Hart, P.E., Stork, D.: Pattern Classification. Wiley (2000) 22. Kar, R., et al.: Uncertainty management by feature space tuning for single-trial P300 detection. Int. J. Fuzzy Syst. 21(3), 916–929 (2019) 23. Lotte, F., et al.: Towards ambulatory brain-computer interfaces: a pilot study with P300 signals. In: Proceedings of the International Conference on Advances in Computer Enterntainment Technology. ACM (2009) 24. Welch, P.: The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 15(2), 70–73 (1967) 25. Kitzbichler, M.G., et al.: Broadband criticality of human brain network synchronization. PLoS Comput. Biol. 5(3), e1000314 (2009) 26. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995) 27. Priestley, M.B.: Spectral Analysis and Time Series, vol. 1. Academic press, London (1981) 28. https://bnci-horizon-2020.eu/database/data-sets. 29. Hall, M.A.: Correlation-based feature selection for machine learning (1999) 30. Baccalá, L.A., Sameshima, K.: Partial directed coherence: a new concept in neural structure determination. Biol. Cybern. 84, 463–474 (2001) 31. Wang, G., Takigawa, M.: Directed coherence as a measure of interhemispheric correlation of EEG. Int. J. Psychophysiol. 13(2), 119–128 (1992) 32. Wilcoxon, F., Katti, S.K., Wilcox, R.A.: Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Select. Tables Math. Stat. 1, 171–259 (1970) 33. Mazumder, I.: An analytical approach of EEG analysis for emotion recognition. In: 2019 Devices for Integrated Circuit (DevIC) 2019 Mar 23 (pp. 256–260). IEEE

Comparative Study of the Effect of Different Fitness Functions in PSO Algorithm on Band Selection of Hyperspectral Imagery

Aditi Roy Chowdhury, Joydev Hazra, Kousik Dasgupta, and Paramartha Dutta

Abstract The innate intricacy of hyperspectral images and the absence of a labeled data set make band selection a challenging task in hyperspectral imaging. Computational complexity can be decreased by identifying suitable bands and simultaneously optimizing the number of bands. A PSO (Particle Swarm Optimization)-based technique is used for this purpose. The fitness function takes a significant role in PSO in balancing the optimal solution against the accuracy rate. Different distance metrics like Euclidean, City Block, etc. are used as fitness functions, and the results of a comparative investigation on different data sets are reported in the present paper. Keywords Hyperspectral · PSO · Fitness function · Band selection

A. R. Chowdhury (B)
Women's Polytechnic, Kolkata, India
e-mail: [email protected]
J. Hazra
Heritage Institute of Technology, Kolkata, India
K. Dasgupta
Kalyani Government Engineering College, Kalyani, Nadia, India
P. Dutta
Visvabharati University, Santiniketan, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_9

1 Introduction

Hyperspectral images can be visualized as a data cube in a three-dimensional space, where the x,y plane represents spatial information and the z plane represents spectral information. Different sensors like AVIRIS, HYDICE, etc. can collect image information with hundreds of bands. Accordingly, the data volume of a hyperspectral image becomes enormous, which poses a challenge in data transmission as well as in data processing. Dimensionality reduction is a method to reduce this immense volume of data while maintaining classification accuracy.


2 Literature Survey

Dimensionality reduction is a well-known technique in the field of hyperspectral images to reduce data volume. Band choice, or significant band identification, is one of the dimensionality reduction techniques. Depending on the accessibility of the data set, band selection techniques can be of three types: supervised, semi-supervised, and unsupervised. In differential mutual information, the authors proposed mutual information between the pixel values in the spectral image and the corresponding labels in the reference map to establish the dependency. But it requires a large amount of sample data, and the technique depends on the training data set. The key problems with supervised techniques are to optimize the number of bands as well as minimize the classification error. In MVPCA [1], the bands are sorted based on some criteria and their correlation with other bands. In [2], CSA was used for dimensionality reduction. In trivariate MI and semi-supervised TMI [3], the authors considered the relationship among three variables, i.e., the class labels and two bands, and they incorporated CSA as a search strategy. In PSO [4], the authors proposed a semi-automatic technique where two PSOs are fused: the outer PSO deals with the optimum number of selected bands, and the inner PSO deals with the optimal bands. But the computational cost is very high. The pairwise band selection (PWBS) framework is a semi-supervised technique in band selection [5]. Another semi-supervised band selection technique based on a hybrid paradigm is reported in [6]; this technique hybridized the clonal selection algorithm and 2DPCA. Due to the lack of prior knowledge, unsupervised techniques are gaining popularity. In [7], the authors proposed a hierarchical unsupervised clustering algorithm using mutual information or Kullback–Leibler divergence. Due to the destruction of band correlation, some crucial information is distorted, and the spatial information and interpretation of hyperspectral images are also lost. The multiobjective optimization technique [8] deals with the optimization of two objective functions simultaneously, i.e., information entropy and the number of selected bands, but it is not very effective at removing redundancy. The unsupervised split-and-merge technique [9] splits low-correlated bands and merges highly correlated bands and sub-bands, but it depends on some algorithmic parameters which in turn depend on the hyperspectral sensors. In [10], researchers incorporate Fuzzy logic with the PSO (Particle Swarm Optimization) algorithm to improve the performance of band selection. Dominant set extraction [11] is another unsupervised technique for band selection. In [12], an objective function called MSR (maximum–submaximum ratio) is proposed for the detection of useful bands during PSO. In this article, we perform a comparative analysis of different fitness functions used in an optimization algorithm to select useful bands. In this work, we used PSO to select the combinations of the best bands. As fitness functions, we used various distance metrics like City Block, Euclidean, Cosine, etc. Overall accuracy and Kappa coefficient for the best fitness functions give convincing results.


3 Relevant Techniques for the Proposed Method Particle Swarm Optimization, or PSO, is a nature-inspired optimization algorithm. It is essentially based on the flocking behavior of birds. A flock of birds is haphazardly searching for food in a specific region. At first, they have no clue about the location or the amount of food, yet that region contains a modest quantity of food. In each trial/iteration, the birds get an idea of their distance from the food. Thus, the best strategy for each bird is to follow the bird with the smallest distance from the food.

3.1 Particle Swarm Optimization The above-mentioned idea is mimicked in PSO. In the search space, every single candidate solution is known as a particle (bird). All the particles have fitness values (computed using a fitness function). They also have velocities, which direct the flight of the particles. Initially, PSO has a set of particles and afterward enhances the solution (particle) set over successive iterations. In every iteration, the particles are updated by two values. One is the best fitness value, or best solution, of any particular particle, called PL_best. The other is the best fitness value obtained so far from all the particles in the search space, i.e., GL_best. After finding the two values, every particle updates its velocity (the rate of change of position) and position using (1) and (2).

V_i(k + 1) = I × V_i(k) + α1 × rand(0, 1) × (PL_best − present_i) + α2 × rand(0, 1) × (GL_best − present_i)    (1)

position_i(k + 1) = position_i(k) + V_i(k + 1)    (2)

where V_i is the velocity of the ith particle, I is the inertia coefficient, rand(0, 1) is a random number in (0, 1), and present_i is the current position of the particle. The α1 and α2 are the learning rates toward the local optimum and the global optimum, respectively. The choice of the learning rates is very important: for large values, the speed of updating will probably make the particles surpass the plausible limits of the solutions, whereas when they are excessively small, the speed of searching will be slow.
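To make the update rules in Eqs. (1) and (2) concrete, the following is a minimal NumPy sketch of one PSO iteration. The inertia and learning-rate values shown are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def pso_update(position, velocity, pl_best, gl_best,
               inertia=0.7, alpha1=1.5, alpha2=1.5):
    """One PSO step following Eqs. (1) and (2).

    position, velocity : arrays of shape (n_particles, n_dims)
    pl_best            : each particle's personal best position (PL_best)
    gl_best            : best position found by the whole swarm (GL_best)
    """
    r1 = np.random.rand(*position.shape)   # rand(0, 1) for the local term
    r2 = np.random.rand(*position.shape)   # rand(0, 1) for the global term
    velocity = (inertia * velocity
                + alpha1 * r1 * (pl_best - position)
                + alpha2 * r2 * (gl_best - position))   # Eq. (1)
    position = position + velocity                      # Eq. (2)
    return position, velocity
```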

4 Proposed Work In this proposed method, the PSO algorithm is used to optimize the number of bands for the hyperspectral image. The fitness function is a significant factor in PSO. It should be chosen such that it strikes a balance among the optimal


solution, the computational cost, and the highest accuracy, since if the evaluation of the fitness function is computationally intricate, the overall computational cost of the optimization becomes expensive. Different fitness functions can be utilized in PSO for band selection. Here, distance metrics are used as fitness functions. Different distance metrics like Euclidean, City Block, Cosine, and EMD are used as fitness functions, as given in Eqs. (3), (4), (5), and (6). In Eq. (6), cdf is the cumulative distribution function.

Euclidean Distance: D(1, 2) = sqrt( Σ_{i=1}^{N} (v1_i − v2_i)² )    (3)

City Block Distance: D(1, 2) = Σ_{i=1}^{N} |v1_i − v2_i|    (4)

Cosine Distance: D(1, 2) = 1 − (v1 · v2) / (||v1|| ||v2||)    (5)

EM Distance: D(1, 2) = Σ |cdf(x) − cdf(y)|    (6)
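As a rough illustration of how these four metrics can serve as PSO fitness functions, the sketch below implements Eqs. (3)-(6) in NumPy for two band vectors v1 and v2. How the cdfs in Eq. (6) are formed is not spelled out in the text, so treating the vectors as unnormalized histograms whose cumulative sums give the cdfs is an assumption.

```python
import numpy as np

def euclidean(v1, v2):                       # Eq. (3)
    return np.sqrt(np.sum((v1 - v2) ** 2))

def city_block(v1, v2):                      # Eq. (4)
    return np.sum(np.abs(v1 - v2))

def cosine(v1, v2):                          # Eq. (5)
    return 1.0 - np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def em_distance(v1, v2):                     # Eq. (6), assumed histogram cdfs
    cdf_x = np.cumsum(v1) / np.sum(v1)
    cdf_y = np.cumsum(v2) / np.sum(v2)
    return np.sum(np.abs(cdf_x - cdf_y))
```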

4.1 Algorithm of the Proposed Work Algorithm 1 explains the basic steps of the proposed method.

Algorithm 1 Proposed algorithm
1. Initial particles pt_i are generated by selecting bands randomly from the set of bands.
2. Calculate the value of the fitness function of each particle, save it to ptbest_i(t), and set gtbest(t) = max(ptbest_i(t)).
3. Initialize the particle velocities pv_i.
4. For each particle pt_i, i = 1, 2, 3, 4, ..., N (N is the number of particles), compute the velocity of the particle using (1) and the position of the particle using (2).
5. Calculate the value of the fitness function for the particle pt_i(t + 1) using any one of Eqs. (3), (4), (5), (6).
6. If pt_i(t + 1) > ptbest_i(t), then ptbest_i(t + 1) = pt_i(t + 1).
7. If max(pt_i(t + 1)) > gtbest(t), then update gtbest(t + 1) = max(pt_i(t + 1)).
8. If the stopping criterion is satisfied, go to step 9; else go to step 4.
9. Terminate.


5 Experiment and Analysis 5.1 Data Set Description To compare the effectiveness of different fitness functions in the PSO algorithm, different tests are carried out on two well-known hyperspectral data sets, namely the Botswana and Indiana Pine images. The Botswana data set contains 145 bands with 1476 × 256 pixels in each band. It consists of 14 identified classes, which represent the impact of flooding on different vegetation. The size of the image of the Indiana data set (Fig. 1) is 145 × 145 and it has 220 spectral bands. After removing noisy and useless bands, the number of bands becomes 185. There are 16 different classes in the image. Tables 1 and 2 present the class names and the number of samples in each class used in the proposed method.

5.2 Result Analysis The performance of the proposed PSO with different distance metrics on the above-mentioned data sets is depicted in Figs. 2 and 3. Before experimenting, the data were normalized. The experiments are performed for different numbers of bands, from 10 to 30.

Fig. 1 Indiana Pine image (a) Actual image (b) Ground truth


Table 1 Class label and number of samples of the Botswana data set

Class   No. of samples   Land type
CL1     270              Water
CL2     101              Hippo Grass
CL3     251              FloodPlain Grasses 1
CL4     215              FloodPlain Grasses 2
CL5     269              Reeds
CL6     269              Riparian
CL7     259              Firescar
CL8     203              Island Interior
CL9     314              Acacia Woodlands
CL10    248              Acacia Shrublands
CL11    305              Acacia Grasslands
CL12    181              Short Mopane
CL13    268              Mixed Mopane
CL14    95               Exposed Soils

Table 2 Class label and number of samples of the Indiana data set

Class   No. of samples   Land type
CL1     52               Alfalfa
CL2     224              Corn
CL3     380              Buildings-grass-trees-drives
CL4     20               Grass-pasture-mowed
CL5     734              Corn-min till
CL6     1234             Corn-no till
CL7     486              Hay-windrowed
CL8     495              Grass/pasture
CL9     746              Grass/trees
CL10    2408             Soybean-min till
CL11    19               Oats
CL12    898              Soybean-no till
CL13    610              Soybean-clean till
CL14    1290             Woods
CL15    90               Stone-steel-towers
CL16    210              Wheat


Fig. 2 Comparative analysis of different fitness functions in PSO based on the accuracy of Botswana Data

Fig. 3 Comparative analysis of different fitness functions in PSO based on the accuracy of Indiana Pine Data

From the given graphs, it is clear that the choice of the fitness function in the PSO algorithm depends on the specific data set. In terms of overall accuracy, City Block distance performs better in the case of the Indiana data set, whereas for the Botswana data set, Euclidean distance performs better. The confusion matrix of the Botswana data set computed by the proposed method with Euclidean distance as the fitness function is given in Table 3. From Table 3, it is noticed that classes CL1–CL2, CL7–CL9, and CL11–CL14 have better classification accuracy than the remaining classes. The computational complexity of the proposed method is the same as that of PSO, since the overhead associated with the fitness function is negligible. From Fig. 4, it is clear that the computational time with cosine distance is better than that of the other two.
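For reference, the overall accuracy (OA) and Kappa coefficient reported in Table 3 can be computed from a confusion matrix as in the short sketch below (a standard computation, not code from the paper).

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """cm: square confusion matrix (rows = classified as, columns = ground truth)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    po = np.trace(cm) / total                                   # overall accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return po, kappa
```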

Table 3 Confusion matrix of the proposed algorithm with Euclidean distance as a fitness function on the Botswana data set

Class No        CL1    CL2    CL3    CL4    CL5    CL6    CL7    CL8    CL9    CL10   CL11   CL12   CL13   CL14   Pixels Classified   User Accur.
CL1             268    0      0      0      0      0      0      0      3      0      0      0      0      0      271                 98.89
CL2             0      95     0      0      0      0      0      0      0      0      0      0      0      0      95                  100
CL3             0      2      213    0      0      0      0      6      21     0      0      0      0      0      242                 88.02
CL4             0      4      16     177    0      10     0      0      0      0      0      0      0      0      207                 85.51
CL5             2      0      0      38     255    6      0      2      0      0      0      0      0      0      303                 84.16
CL6             0      0      22     0      0      251    14     0      0      2      0      0      0      0      289                 86.85
CL7             0      0      0      0      0      0      239    0      0      0      0      0      0      0      239                 100
CL8             0      0      0      0      12     2      0      195    0      0      0      0      0      0      209                 93.30
CL9             0      0      0      0      2      0      0      0      254    0      0      22     0      0      278                 91.37
CL10            0      0      0      0      0      0      6      0      32     246    2      0      0      0      286                 86.01
CL11            0      0      0      0      0      0      0      0      4      0      293    14     0      2      313                 93.61
CL12            0      0      0      0      0      0      0      0      0      0      10     145    0      0      155                 93.55
CL13            0      0      0      0      0      0      0      0      0      0      0      0      268    8      276                 97.10
CL14            0      0      0      0      0      0      0      0      0      0      0      0      0      85     85                  100
Pixels in Gt    270    101    251    215    269    269    259    203    314    248    305    181    268    95     3248
Producer Accur. 99.26  94.06  84.86  82.33  94.80  93.31  92.28  96.06  80.89  99.19  96.07  80.11  100    89.47
Overall Accuracy (OA) = 91.87, Kappa = 0.9119



Fig. 4 Computation time of the proposed algorithm with different distance metrics as fitness functions on Indiana Data set

6 Conclusion In image processing, the fitness function is a very significant factor, especially in optimization algorithms. However, many times the fitness function is not carefully considered, and thus an inaccurate outcome is obtained. In this paper, our aim is to identify the most suitable fitness function for PSO for the selection of bands from the hyperspectral image cube. We conclude that the same fitness function is not suitable for achieving the highest classification accuracy on different hyperspectral data sets.

References 1. Sun, T.-L., Chang, C.I., Du, Q., Althouse, M.L.G.: A joint band prioritization and band decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 37(6), 2631–2641 (1999) 2. Huang, B., Gong, J., Zhang, L., Zhong, Y.: Dimensionality reduction based on clonal selection for hyperspectral imagery. IEEE Geosci. Remote Sens. Lett. 45(12), 4172–4186 (2007) 3. Zhang, X., Feng, J., Jiao, L.C., Sun, T.: Hyperspectral band selection based on trivariate mutual information and clonal selection. IEEE Trans. Geosci. Remote Sens. 52(7), 4092–4105 (2014) 4. Su, Genshe Chen Peijun Du Hongjun, Qian, Du: Optimized hyperspectral band selection using particle swarm optimization. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 7(6), 2659–2670 (2014)


5. Bai, Limin Shi Jun., Xiang, Shiming, Pan, Chunhong: Semisupervised pair-wise band selection for hyperspectral images. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 8(6), 2798– 2813 (2015) 6. Chowdhury, A.R., Hazra, J., Dutta, P.: A hybrid approach for band selection of hyperspectral images. In: Hybrid Intelligence for Image Analysis and Understanding, Chapter 11, pp. 263– 282. John Wiley and Sons Ltd. (2017) 7. Sotoca, J.M., Martinez-Uso, A., Pla, F., Garcia-Sevilla, P.: Clustering based hyperspectral band selection using information measures. IEEE Trans. Geosci. Remote Sens. 45(12), 4158–4171 (2007) 8. Gong, Yuan Yuan Maoguo, Zhang, Mingyang: Unsupervised band selection based on evolutionary multiobjective optimization for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 54(1), 544–557 (2016) 9. Rashwan, S., Dobigeon, N.: A split-and-merge approach for hyperspectral band selection. IEEE Geosci. Remote Sens. Lett. 14(8) (2017) 10. Chang, C.I., Wu, C.C., Liu, K.H., Chen, H.M., Chen, C.C.C., Wen, C.H.: Progressive band processing of linear spectral unmixing for hyperspectral imagery. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 8(6), 2583–2597 (2015) 11. Zhu, Jingsheng Lei Zhongqin Bi Feifei Xu Guokang, Huang, Yuancheng: Unsupervised hyperspectral band selection by dominant set extraction. IEEE Trans. Geosci. Remote Sens. 54(1), 227–239 (2016) 12. Younan, Nicolas H., Yan, Xu, Qian, Du: Particle swarm optimization-based band selection for hyperspectral target detection. IEEE Geosci. Remote Sens. Lett. 14(4), 554–558 (2017)

Breast Abnormality Detection Using Texture Feature Extracted by Difference-Based Variable-Size Local Filter (DVLF) Sourav Pramanik, Debotosh Bhattacharjee, and Mita Nasipuri

Abstract This paper proposes a novel texture descriptor, called difference-based variable-size local filter (DVLF), to detect breast abnormality in breast thermograms. Firstly, the contrast in the grayscale thermal breast image is improved using a BBP (Breast Blood Perfusion) model. Then, DVLF is applied to extract local texture features from the contrast-enhanced image. In the next phase, asymmetry features are extracted by applying a block-based version of distance correlation measure. Finally, the feedforward backpropagation network with the Levenberg–Marquardt training method is used as a classifier. The proposed system has been tested on 100 frontal view breast thermograms of the DMR-IR database, including 60 benign and 40 malignant. Experimental results on this dataset show that the proposed system can distinguish benign and malignant breasts with an accuracy of 95.6%, and a sensitivity and specificity of 94% and 97%, respectively. Keywords Breast thermogram · DVLF · Distance correlation · Texture features

1 Introduction Breast cancer is one of the world’s most daunting and lethal diseases in females [1]. However, it is well known that early diagnosis of this deadly disease can reduce the mortality rate [2]. In recent years, thermography has received lots of attention from various researchers to diagnose breast cancer in its early growing stage. The presence of a tumor in the breast typically generates more heat than the normal breast tissue. S. Pramanik (B) New Alipore College, Kolkata, India e-mail: [email protected] D. Bhattacharjee · M. Nasipuri Jadavpur University, Kolkata, India e-mail: [email protected] M. Nasipuri e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_10


This temperature appears as a hot spot in the breast thermogram and plays a crucial in the diagnosis of breast abnormality [2, 3]. However, interpretation of these hot spots, in search of abnormalities, presently relies on the visual analysis by the experts. Typically, radiologists compared the right and left breast thermograms to find out the degree of asymmetry between them. A minute difference in temperature patterns between them typically indicates the existence of breast abnormality. However, sometimes this minute difference in temperature patterns may miss by the radiologist because of constraints in the human visual system. Thus, the computer-assisted analysis of breast thermograms has drawn considerable attention from various researchers for the early detection of breast cancer. As a consequence, notable progress has been made in this regard. In [3–5], the authors have presented a comprehensive review of thermography-based breast cancer detection. However, most of the methods mentioned in [3–5], used standard statistical features, fractal dimension-based features, histogram-based features, and GLCM-based features for asymmetry analysis. Recently, Madhu et al. [6] used some interpretable medical features and reported convincing results concerning specificity and sensitivity. In [7], the authors derived GLCM-based features and histogrambased features from the right and left breast thermograms of a patient to assess the asymmetry between them. In this work, we presented a new difference-based variable-size local filter (DVLF) for the extraction of temperature distribution by means of texture patterns in breast thermograms. Then, a method is proposed based on DVLF to detect breast abnormality in breast thermograms. It is important to note that here, we focused on differentiating benign and malignant breasts. However, it is very hard to find any literature on this aspect. At the very beginning of the proposed method, left and right breast thermograms’ contrast is enhanced using a BBP model. Then, local texture information is captured using the proposed second-order DVLF. Thereafter, asymmetry features between left and right breast thermal images are extracted using a blockbased version of the distance correlation measure. Finally, a Feedforward Backpropagation Network (FBN) is employed for classification. The experiment is conducted on the images of the DMR-IR database. By observing experimental results, it is clear that our proposed method differentiates benign breasts from the malignant one significantly well.

2 Proposed System Figure 1 illustrates a pictorial diagram of the proposed system. The proposed system includes multiple steps: pre-processing, blood perfusion image generation, localization of texture feature using DVLF, and finally, an asymmetry analysis. The following subsections would detail each of the steps.


Fig.1 Pipeline of the proposed system


Fig. 2 a Gray-level breast thermograms, b-c cropped right and left breast areas, d-e corresponding BP images

2.1 Pre-processing Typically, the grayscale breast thermograms consist of non-breast regions along with the breast region (see Fig. 2a). Non-breast areas such as the arms, neck, and abdomen do not yield any relevant information for breast cancer diagnosis. Hence, cropping the breast region from a breast thermogram is a key step in the detection of breast cancer. Regrettably, very few works have been published in the literature on automated breast region segmentation [3, 7], and they are not sufficiently stable to be applied to various types of breast thermogram databases. Hence, in the present work, we manually segmented the breast region with the help of experts.

2.2 Blood Perfusion Image Generation Typically, the grayscale breast thermograms are of very low contrast that adequately fails to reveal different regions’ structural details. In effect, it is challenging to extract essential features like edges, regular and irregular texture information, and boundary information of different regions from those images to further process them. Thus, a BBP (breast blood perfusion) model [8] is used here to transform the grayscale thermal breast images into blood perfusion (BP) images. The BBP model is formulated based on the breast’s thermal physiology, and thus, it results in high blood perfusion in the relatively high-temperature region in the breast and low blood perfusion in the low-temperature region. Figure 2 shows the thermal breast image and the corresponding BP image. It can be seen that the BP image provides considerably


better contrast between low- and high-temperature areas compared to the originally captured grayscale thermal breast image.

2.3 Difference-Based Variable-Size Local Filter (DVLF) In this section, we present a new texture descriptor, called the difference-based variable-size local filter (DVLF), which is capable of characterizing more detailed image information. This method is motivated by the difference of inverse probability (DIP) [9], which is a well-known descriptor for the extraction of sketch features like edges and valleys from grayscale images. In DIP, the difference between the inverse of the probability of the center pixel and that of the pixel having the maximum value in the local region is taken and considered as the change of the center pixel. However, it is very sensitive to noise. If the center pixel and the pixel having the maximum value in the local region are equal, it produces a zero response. Also, the response at edge or valley regions is very low. The proposed difference-based variable-size local filter (DVLF) also uses the inverse probability, as used in DIP, in a local region to extract texture features from an image. However, there is a fundamental difference between our DVLF and DIP, which is also claimed as the novelty of our method. In DVLF, we use the differences between the inverse of the probability of the neighboring pixels and that of the center pixel as the change of the center pixel. Let I be an image and w denote a local region in the image I; let p0 be the center pixel in the local region w, and let p_i (i = 0, 1, ..., l) signify the neighboring pixels of p0. The size of the local region w is defined as (2m + 1, 2m + 1). The value of m plays a key role in the computation of DVLF. Therefore, in this work, we define DVLF up to the nth order based on the value of m. The first-order DVLF at p0 in w can be computed as

DVLF¹(p0) = Σ_{i=0}^{l} ( p/p_i − p/p0 )    (1)

where p = Σ_{i=0}^{l} p_i is the sum of intensities in w, l = (2m + 1) × (2m + 1), and m = 1 (as we are calculating the first-order DVLF). Figure 3a shows the local region representation used in the first-order DVLF. In the very first step of the first-order DVLF, the differences between the inverse of the probability of the neighboring pixels and that of the center pixel in w are computed. Then, we encode the neighboring effect on the center pixel by taking the sum of the differences. It is noted that the value of the first-order DVLF is negative if the neighboring pixels are less than p0. Similarly, it is positive when the neighboring pixels p_i are greater than p0. Thus, the proposed DVLF preserves more discriminating image information compared to


Fig. 3 a-b Pixel representation for 1st–order DVLF and 2nd–order DVLF

the absolute difference. As we have encoded the neighboring effect on the center pixel, it is very robust to noise and has more discriminative ability than DIP. The second-order DVLF at p0 in W is defined as

DVLF²(p0) = Σ_{i=0}^{l} ( q/p_i − q/p0 )    (2)

where q = Σ_{j=0}^{k} p_j denotes the sum of intensities in a small region w of size k = (2m + 1) × (2m + 1), p_i denotes the intensities in a large local region W, and k ⊆ l; here, for the small region the value of m = 1 and for the large region m = 2 (as we are calculating the second-order DVLF). Figure 3b shows the local region representation for the second-order DVLF. It can be observed from the above formulation that the second-order DVLF is slightly different from the first-order DVLF. For the computation of the second-order DVLF, two local regions of different sizes, namely the small region (w) and the large region (W), are considered. The size of the small region is the same as the size of the local region considered in the case of the first-order DVLF, and the size of the large region depends on the order of DVLF. The second-order DVLF computes the differences between the inverse of the probability of the neighboring pixels and that of the center pixel in a large local region with respect to a small local region. As a result, it preserves more discriminating image features compared to the first-order DVLF. In a general formulation, the sizes of the local regions, i.e., the small region and the large region, for the nth-order DVLF can be defined as m = n − 1 for the smaller region (w) and m = n for the larger region (W). The higher-order DVLF can provide more discriminating features than the first-order DVLF. However, the problem is that when the order (n) becomes large, it tends to be sensitive to noise.
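Since the second-order DVLF is the variant used in this work, a minimal, unoptimized sketch of Eq. (2) follows, with a 3 × 3 small region for q and a 5 × 5 large region for the neighbours; the small epsilon guard against division by zero is an implementation assumption, not part of the paper.

```python
import numpy as np

def dvlf2(img, eps=1e-6):
    """Second-order DVLF (Eq. 2): 3x3 small region (m = 1) gives q,
    5x5 large region (m = 2) gives the neighbouring pixels p_i."""
    img = img.astype(float) + eps                     # guard against zero intensities
    h, w = img.shape
    out = np.zeros_like(img)
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            q = img[r - 1:r + 2, c - 1:c + 2].sum()   # sum over the small region w
            large = img[r - 2:r + 3, c - 2:c + 3]     # large region W
            p0 = img[r, c]                            # center pixel
            out[r, c] = np.sum(q / large - q / p0)
    return out
```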


Fig. 4 a DIP image, b 1st-order DVLF image, and c 2nd-order DVLF image

In this work, we have used second-order DVLF to extract texture features from the thermal breast images. Figure 4 shows the results of DVLF on the thermal breast image.

2.4 Asymmetry Analysis As stated earlier, a minute difference in the temperature of the left and right breasts may indicate abnormality [2]. In this work, we used a block-based distance correlation measure [10] to compute the degree of asymmetry between the two breasts. The distance correlation value is zero when two breasts are completely dissimilar, and a higher value implies that the two breasts are statistically symmetrical. An important property of this measure that motivated us is that it not only considers the pixel values but also uses the spatial information of the pixels to measure the degree of asymmetry between two samples. Since breast size varies across women, the number of blocks would also differ. To overcome this problem, each breast thermogram is divided into a fixed number of blocks; here, each breast thermogram is divided into 100 blocks. Then, the distance correlation is computed between the two corresponding blocks and considered as the asymmetry feature of that block. The detailed algorithm for the asymmetry measure using distance correlation is given below.
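As an illustration of the block-based distance-correlation feature described above (not the authors' exact algorithm), the sketch below computes the sample distance correlation between the pixel intensities of two corresponding blocks and assembles the 100-dimensional block-wise feature vector; treating each block as a one-dimensional intensity sample is a simplifying assumption.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two equal-size samples."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    a = np.abs(x[:, None] - x[None, :])                  # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    denom = np.sqrt(np.sqrt(dvar_x * dvar_y))
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0)) / denom

def block_asymmetry_features(left, right, n_blocks=(10, 10)):
    """Split each breast image into a fixed 10x10 grid (100 blocks) and
    compute the block-wise distance correlation as asymmetry features."""
    rows = np.array_split(np.arange(left.shape[0]), n_blocks[0])
    cols = np.array_split(np.arange(left.shape[1]), n_blocks[1])
    feats = [distance_correlation(left[np.ix_(r, c)], right[np.ix_(r, c)])
             for r in rows for c in cols]
    return np.array(feats)   # length 100 for a 10x10 grid
```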


3 Results and Discussion 3.1 Dataset Collection Breast thermograms of the DMR-IR (Database for Mastology Research with Infrared Images) database [11] are used in this work to test the proposed method. A detailed description of the database could be found in [11]. The database contains breast thermograms of 237 individuals with three different breast conditions: normal, benign, and malignant. In this work, we have randomly selected 100 frontal view breast thermograms with forty malignant and sixty benign cases for the experimental purpose.

3.2 Classification A three-layer Feedforward Backpropagation Network (FBN) is used here as the classifier which has 100-neurons in both the input and hidden layers. The output


Fig. 5 ROC curves

layer contains one neuron. For the input layer, a linear transfer function is used and both the hidden and output layer use the tan-sigmoid transfer function. The Levenberg–Marquardt backpropagation algorithm is used to train (learning rate = 0.1) the network. 60 (35 benign and 25 malignant) samples are randomly selected from a set of 100 breast thermograms for training the network, and the remaining samples are used as a test set. Three performance measures, such as sensitivity (Sen.), specificity (Spec.), and accuracy (Acc.) [3], are used here to evaluate the performance capability of the classifier. The higher the values of these metrics, the greater the efficiency of the system. In the field of medical image analysis, the ROC curve is the most common representation of the overall system’s performance. The area under the ROC curve (AUC) is a metric used to evaluate the characteristic of the ROC curve. The AUC value close to 0.5 signifies a lousy test, and closure of 1 means a better diagnostic test [12]. In this work, we also used the ROC curve, see Fig. 5, to quantify the system performance. Comparatively, the second-order DVLF retained an AUC of 0.995, which is very close to 1. Table 1 shows the experimental results of our proposed system along with two other methods. It can be seen that the discriminating ability of the second-order DVLF significantly improved compared to the first-order DVLF. Also, it outperforms the original DIP. Also, we compared our proposed method with a texture featurebased method [13]. In [13], Acharya et al. derived some texture features from breast thermograms and used an SVM classifier for classification. For a fair comparison, we implemented this method and applied it to the dataset that we used for this work. The results are also given in Table-I. The resultant sensitivity is very low compared to our method, which means the capability of identifying malignant breast thermograms is significantly lower. Moreover, the proposed method also performed better concerning sensitivity, specificity, and accuracy against the method mentioned in [7]. In [7], the reported accuracy, sensitivity, and specificity are 90%, 87.5%, and 92.5%, respectively. Hence, we can infer that our proposed texture descriptor can


Table 1 Performance comparison of the DIP, Acharya method, and the proposed DVLF-based method

Methods                  Sen. (%)   Spec. (%)   Acc. (%)   AUC
DIP + FBN                88         91          89.7       0.963
1st-order DVLF           94         88          91.2       0.962
2nd-order DVLF           94         97          95.6       0.995
Acharya-features [13]    77.5       90          85         0.855

differentiate breast thermograms into malignant and benign cases significantly well compared to the original DIP and the other methods [7, 13]. Also, our proposed system (i.e., 2nd-order DVLF + FBN) produces results comparable to a deep learning-based breast abnormality detection method [14] in terms of accuracy, sensitivity, and specificity. In [14], the reported accuracy, sensitivity, and specificity are 95%, 94%, and 92%, respectively.
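Since the exact Levenberg–Marquardt FBN described above is tied to a specific toolbox, the following is only a rough stand-in using scikit-learn's MLPClassifier ('tanh' hidden activation and the 'lbfgs' solver substitute for the described transfer functions and LM training), together with the sensitivity, specificity, accuracy, and AUC metrics used in this section.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

# X: (n_samples, 100) block-wise asymmetry features, y: 0 = benign, 1 = malignant
def train_and_evaluate(X_train, y_train, X_test, y_test):
    clf = MLPClassifier(hidden_layer_sizes=(100,), activation="tanh",
                        solver="lbfgs", max_iter=2000, random_state=0)
    clf.fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    auc = roc_auc_score(y_test, prob)
    return sensitivity, specificity, accuracy, auc
```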

4 Conclusion We developed a local feature descriptor, called DVLF, for breast abnormality detection in thermal breast images. Firstly, the grayscale breast thermogram is converted to the BP image. Then, texture features are extracted from the BP image using the proposed DVLF descriptor. After that, asymmetry features are extracted from the right and left breast thermograms of an individual. Here, we derive an asymmetry feature vector of length 100 for an individual and employed a three-layer FBN to differentiate benign and malignant breasts. Experimental results on a dataset of the DMR-IR database demonstrate that the proposed method based on DVLF performs significantly better than the DIP and Acharya method.

References 1. “Cancer Today,” (Date last accessed 16th Sept. 2019). [Online]. Available: https://gco.iarc.fr/ today/home 2. Foster, K.R.: Thermographic detection of breast cancer. IEEE Eng. Med. Biol. 17, 10–14 (1998) 3. Borchartt, T.B., Conic, A., Lima, R.C.F., Resmini, R., Sanchez, A.: Breast thermography from an image processing viewpoint: a survey. Sign. Proc. 93, 2785–2803 (2013) 4. Pramanik, S., Bhattacharjee, D., Nasipuri, M.: Texture analysis of breast thermogram for differentiation of malignant and benign breast. Proc. IEEE International Conference on Advances in Computing, Communications and Informatics (ICACCI-2016), 8–14 (2016) 5. Singh, D., Singh, A.K.: Role of image thermography in early breast cancer detection-Past, present and future. Computer methods and programs in biomedicine, 105074 (2019) 6. Madhu, H., Kakileti, S.T., Venkataramani, K., Jabbireddy, S.: Extraction of medically interpretable features for classification of malignancy in breast thermography. IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC), pp. 1062–1065 (2016)


7. Sathish, D., Kamath, S., Prasad, K., Kadavigere, R., Martis, R.J.: Asymmetry analysis of breast thermograms using automated segmentation and texture features. SIViP (2016). https://doi.org/ 10.1007/s11760-016-1018-y 8. Pramanik, S., Banik, D., Bhattacharjee, D., Nasipuri, M., Bhowmik, M.K.: Breast blood perfusion (BBP) model and its application in differentiation of malignant and benign breast. In Advanced Computational and Communication Paradigms (pp. 406–413). Springer, Singapore (2018) 9. Ryoo, Y.J., Kim, N.C.: Valley operator for extracting sketch features: DIP. Electron. Lett. 248:pp. 461–463 (1988, Apr) 10. Szekely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependency by correlation of distances. The Annals Stat. 35(6), 2769–2794 (2007) 11. Silva, L.F., Saade, D.C.M., Sequeiros-Olivera, G.O., Silva, A.C., Paiva, A.C., Bravo, R.S., Conci, A.: A new database for breast research with infrared image. J. Med. Imag. Health Inform. 4(1), 92–100 (2014) 12. Tilaki, K.H.: Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J. Intern. Med. 4(2), 627–635 (2013) 13. Acharya, U.R., Ng, E.Y.K., Tan, J.H., Sree, S.V.: Thermography based breast cancer detection using texture features and support vector machine. J. Med. Syst. 36(3), 1503–1510 (2012) 14. Baffa, M.D.F.O., Lattari, L.G.: Convolutional neural networks for static and dynamic breast infrared imaging classification, In 2018 IEEE 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 174–181 (2018)

Nuclei Image Boundary Detection Based on Interval Type-2 Fuzzy Set and Bat Algorithm Soumyadip Dhar, Hiranmoy Roy, Rajib Saha, Parama Bagchi, and Bishal Ghosh

Abstract In biological science, detection of nuclei in cell images is necessary for research and disease identification. The low light and varying illumination in the nuclei images make them highly uncertain to differentiate. As a result, the automatic boundary detection of the nuclei in cell images is a crucial task. Here, we propose a nuclei image boundary detection based on an interval type-2 fuzzy set(IT2FS). The IT2FS manages the uncertainties in the nuclei images and helps to detect accurate boundaries. The bat algorithm (BA) is exploited to generate the proper IT2FS build upon statistics of the image. The Kaggle 2018 dataset is used to measure the performance experimentally. Our method is found to be superior to the recently published methods on the standard dataset. Keywords Nuclei image · Type-2 fuzzy · Bat algorithm · Image boundary

1 Introduction In pathology for early detection of diseases, identification of cell nuclei is very necessary [1]. For proper identification of nuclei images, accurate detection of the boundaries between the nuclei is required. Not only the detection but also the localS. Dhar (B) · H. Roy · R. Saha · P. Bagchi · B. Ghosh RCC Institute of Information Technology, Kolkata, India e-mail: [email protected] H. Roy e-mail: [email protected] R. Saha e-mail: [email protected] P. Bagchi e-mail: [email protected] B. Ghosh e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_11


ization of the boundaries is necessary for the analysis of the nuclei images. The detection of boundaries between the nuclei means the detection of edges between them. Due to various types of perturbations like low light and varying illumination, it is difficult to recognize the boundaries between them. The reason is that due to the perturbations, there are high uncertainties in the nuclei image patterns. Due to the uncertainties, it is difficult and challenging to detect accurate boundaries between the nuclei images. Edge or boundary identification is an elementary task in object identification in an image. One of the famous methods for edge detection was proposed by Canny [2]. In this paper, he showed that the detection and localization of boundaries simultaneously is an ill-defined problem. That means high uncertainties arise in the detection and localization of boundaries. As a result, automatic boundary detection of cell nuclei is a crucial problem in computer vision. To detect the edges or boundaries, several methods can be found in literature [3, 4]. The conventional methods for boundary detection use different filters to extract the edges from an image [5]. Some methods used spatial information for the detection of the boundary between the objects in an image [6]. Though the methods are efficient, low-intensity images like nuclei images require proper handling of uncertainties for accurate detection of boundaries. To manage the uncertainties in boundary detection, several methods used fuzzybased techniques. The fuzzy set can properly manage the uncertainties in an image for boundary detection [7–11]. The limitation of a fuzzy-based technique is the difficulty in the mapping of pixel values in the fuzzy domain. In most cases, the membership functions are related to a number of parameters. The parameters in the conventional fuzzy-based systems are ad-hoc or fixed. But, the proper values of the parameters are required for the detection of low-intensity nuclei images. Given the above limitations, here, we propose an interval type-2 fuzzy set(IT2FS)based technique for boundary detection in nuclei images. The generation of the IT2FS is based on statistical information of the image. To accelerate the generation of the IT2FS based on image statistics, we employ the bat algorithm [12]. The bat algorithm(BA) helps to find the optimized boundary in the nuclei images. The novelty of the proposed method is that here we determine the boundary between the nuclei by minimizing the uncertainties using the IT2FS. The uncertainties in parameter detection are minimized by utilizing the parameters based on image statistics. The search of the parameters is optimized based on the evolutionary bat algorithm.

2 Boundary Detection in an Image The basic work to find the boundaries between images is to find out the gradients in an image. Various methods for boundary detection can be found in the literature. Different methods used different techniques to calculate the gradient in an image. The intensity of the boundary gradient is x and it is given by


x = (|G 1 | + |G 2 |)/2.

(1)

Here, G1 and G2 represent the row gradient and column gradient of the image, respectively. For the gradient detection, the Laplacian is the second difference operator and it is given by

G1 = (x_{m−1,n} + x_{m+1,n} − 2 x_{m,n}) / 2,   G2 = (x_{m,n−1} + x_{m,n+1} − 2 x_{m,n}) / 2    (2)

Here, x_{m,n} is the pixel intensity at location (m, n) in the image. Among the different types of operators, the Laplacian has a good capacity for detecting corners and line ends. This is the reason we use the Laplacian gradient for the boundary detection of nuclei images in this paper. For boundary detection, the gradient detection is followed by the computation of a threshold value t to differentiate the edge and non-edge pixels. Finding the true edges using a threshold value is difficult, as it depends on the local and global statistics of the image. That means high uncertainty is involved in the detection of the threshold. The uncertainty increases in the nuclei images due to their highly uncertain image patterns. We have to determine the value of t such that the uncertainty is minimum. To minimize the uncertainties in boundary detection, we use an IT2FS: the gradient is transformed into an IT2FS. In the next section, we address the mapping of the gradient into an IT2FS.
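A small sketch of Eqs. (1)-(2), assuming the second-difference form of the Laplacian given above; SciPy's convolve is used for convenience.

```python
import numpy as np
from scipy.ndimage import convolve

def gradient_intensity(img):
    """Combined gradient intensity x = (|G1| + |G2|)/2 from Eqs. (1)-(2)."""
    img = img.astype(float)
    k1 = np.array([[0, 1, 0], [0, -2, 0], [0, 1, 0]]) / 2.0  # second difference along rows
    k2 = np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]]) / 2.0  # second difference along columns
    g1 = convolve(img, k1, mode="nearest")
    g2 = convolve(img, k2, mode="nearest")
    return (np.abs(g1) + np.abs(g2)) / 2.0
```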

2.1 Interval Type-2 Fuzzy Set (IT2FS) An IT2FS is a fuzzy set in which the membership values are not crisp; the membership values lie within a range of values. Mathematically, an IT2FS Ã is given by

Ã = ∫_{x∈X} ∫_{u∈Jx} μ_Ã(x, u)/(x, u),  Jx ⊆ [0, 1]    (3)

In the above representation, Jx is the primary membership of x and μ_Ã(x, u) is the secondary membership function. If x is discrete, ∫ is replaced by Σ. If μ_Ã(x, u) = 1 ∀x, then Ã is called an interval type-2 fuzzy set (IT2FS) [13]. Alternatively, an IT2FS is also written as Ã(x) = {x, [μ_L(x), μ_U(x)] | x ∈ X}, where μ_L(x) and μ_U(x) are the lower and upper membership functions.


3 Mapping of Gradients into IT2FS For mapping the gradients into an IT2FS, the gradients are first mapped into a type-1 fuzzy set (T1FS). To map the gradients into the T1FS, here we use the membership function proposed by Tizhoosh [14]. The function is given by

μ(x) = 1 − (x/t)^k1,  if x < t;   μ(x) = 1 − ((L − x)/(L − t))^k2,  if x ≥ t    (4)

where k1, k2 > 0. Here, x is the gradient value and t is the threshold value. The edge pixels are given by x ≥ t and the non-edge pixels by x < t. The T1FS described above is transformed into an IT2FS by a blurring operation. The conversion into the interval type-2 fuzzy membership functions is given by

μ_U(x) = μ(x)^(1/α),  μ_L(x) = μ(x)^α    (5)

Here, α is taken in the range α = [1, 3]. Measure of uncertainty in IT2FS: The upper membership function μ_U(x) and the lower membership function μ_L(x) play a significant role in measuring the uncertainty. Here, we utilize Kacprzyk and Szmidt's [15] measure of uncertainty in the IT2FS domain. The measure is given by

E(Ã) = (1/N) Σ_{i=1}^{N} [1 − max(1 − μ_U(x_i), μ_L(x_i))] / [1 − min(1 − μ_U(x_i), μ_L(x_i))]    (6)

Here, N is the cardinality of the fuzzy set. For the proper value of t, the uncertainty measure E will be minimum.
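A compact sketch of Eqs. (4)-(6), i.e., the type-1 membership, its blurring into upper and lower IT2FS memberships, and the uncertainty measure that serves as the fitness value; the epsilon guards and the exact form assumed for Eq. (4) follow the reconstruction above and are implementation assumptions.

```python
import numpy as np

def it2fs_entropy(grad, t, k1, k2, alpha, L=None):
    """Type-2 fuzzy entropy E of a gradient image, following Eqs. (4)-(6)."""
    x = grad.astype(float).ravel()
    L = x.max() if L is None else L
    # Type-1 membership, Eq. (4)
    mu = np.where(x < t,
                  1.0 - (x / max(t, 1e-12)) ** k1,
                  1.0 - ((L - x) / max(L - t, 1e-12)) ** k2)
    mu = np.clip(mu, 0.0, 1.0)
    # Upper and lower memberships of the IT2FS, Eq. (5)
    mu_up = mu ** (1.0 / alpha)
    mu_lo = mu ** alpha
    # Uncertainty measure, Eq. (6)
    num = 1.0 - np.maximum(1.0 - mu_up, mu_lo)
    den = 1.0 - np.minimum(1.0 - mu_up, mu_lo)
    return np.mean(num / np.maximum(den, 1e-12))
```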

4 Boundary Detection as Constraint Optimization From the preceding sections, it is apparent that the uncertainty should be minimized, i.e., E should be minimized, to obtain the proper boundary between the nuclei. But the transformation of gradients into an IT2FS involves the parameters k1, k2, and α. For proper boundary detection, the parameters should be chosen based on the statistical information of the input image. That means boundary detection can be presented as a constraint optimization problem. The problem can be defined as


Minimize: E, subject to: k1, k2 ∈ [1, 3], α ∈ [1, 3], t ∈ [x_min, x_max]    (7)

To solve the constraint optimization problem, we use the bat algorithm. The bat algorithm helps to minimize E by finding the proper combination of parameters depending on the image at hand.

5 Theory of Bat Algorithm (BA) The BA finds a set of parameters that optimizes a function by mimicking the food-searching technique of bats [12]. The method combines the virtues of both simulated annealing and particle swarm optimization. The bat algorithm executes the following steps: (1) assign initial values to the parameters; then repeat: (2) create the current solutions; (3) search locally to diminish the effect of local optima; (4) fly randomly to create new solutions; (5) keep the new best solutions until the optimization criteria are met.

5.1 Virtual Bats Movement The population in BA is initialized randomly. In this algorithm, new solutions are established by the movement of the simulated bats following the equations below.

f_i = f_min + (f_max − f_min) γ    (8)

v_i^p = v_i^(p−1) + (b_i^p − b*) f_i    (9)

b_i^p = b_i^(p−1) + v_i^p    (10)

Here, the ith bat has velocity v_i^p, frequency f_i, and position b_i^p at the pth iteration. The algorithm starts by assigning a random frequency f to each bat, where f ∈ [f_min, f_max]. The factor γ ∈ [0, 1] takes a random value from a uniform distribution. The current global best solution is given by b*, which is computed by comparing the solutions in the bat population. This is followed by the selection of the best solution from the set of current best solutions. Then a random walk is performed to create a new solution locally, and the walk is given by

b_new = b_old + η A^p    (11)


The parameter η ∈ [−1, 1] is randomly drawn from a Gaussian distribution; it acts as the step size for reaching an updated solution. The average loudness of the best solutions at the present time step is given by A^p. The value of b_old is computed by Eq. (10).
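A minimal sketch of one bat movement step following Eqs. (8)-(11); the frequency range and loudness shown are illustrative placeholders rather than values from the paper.

```python
import numpy as np

def bat_step(positions, velocities, best, f_min=0.0, f_max=2.0, loudness=0.9):
    """One movement step of the virtual bats, Eqs. (8)-(11)."""
    n, d = positions.shape
    gamma = np.random.rand(n, 1)                          # uniform gamma in [0, 1]
    freq = f_min + (f_max - f_min) * gamma                # Eq. (8)
    velocities = velocities + (positions - best) * freq   # Eq. (9)
    positions = positions + velocities                    # Eq. (10)
    eta = np.clip(np.random.randn(n, d), -1.0, 1.0)       # Gaussian step in [-1, 1]
    local = positions + eta * loudness                    # Eq. (11), A^p = average loudness
    return positions, velocities, local
```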

6 Proposed Method for Boundary Detection In the proposed method, the introductory bat count is taken as b = 50 with iterations number iter = 40. The objective is to find out the parameters k1, k2, α, and threshold t with the constraints given in Sect. 4. The objective is to find the proper value of the parameters to minimize the entropy E. Here, E acts as a fitting function for the BA. For finding the best position of each bat in the population, i.e., best solution, the entropy is minimized and this minimization gives the best results for boundary detection. The algorithm for boundary detection is given in Algorithm 1. Input: A nucleus image of size M × N Output: The detected boundary of the images Initialisation: The bat population b is initialized by the position vectors given by random positions bi = [k1, k2, α, t]i . The frequency f i , the velocity vi , the loudness Ai and pulse emission rate ri where i = 1, 2, . . . , b are initialized for each bat in the population. The total count of iterations is M1 with initial iteration iter = 0. 1: Create the gradient image from input nuclei image using the Eq. (2). 2: repeat 3: for every bat bi in the bat population do 4: f i , vi and pi for bi are updated by b∗ (Eqs. 8, 9 and 10) to create new solution, i.e., the updated parameter vector. 5: if (rand > ri ) then 6: choose a local solution from the best solution neighborhood. Here rand stands for random variable; 7: end if 8: New solution binew (Eq. 11) is generated by random fly (Eq. 11); 9: Create the T1FS and corresponding IT2FS from the image using the Eqs. (4) and (5) respectively using the parameters [k1, k2, α, t]. 10: Calculate the type-2 fuzzy entropy E from Eq. (6), which acts as a fitting function. 11: if (rand < Ai and E(binew ) < E(b∗ )) then 12: Updated solution is created as bi = binew with increment of ri and the decrement of Ai ; 13: end if 14: end for 15: iter = iter + 1; 16: Pick up the current best solution b∗ for which the entropy is minimum in the present bat population; 17: until iter = M1 or there is no change in fitting function 18: Pick up the parameter bi = [k1, k2, α, t]i corresponding to the best solution. 19: Generate the boundaries of the nuclei image.

Algorithm 1: Proposed algorithm for nuclei image boundary detection


7 Results and Discussion We investigated the proposed method for boundary detection on the nuclei images. For the experimental purpose, the Kaggle 2018 dataset [16] was used. The dataset contains nuclei images of mice, humans, and flies. The images were captured under different illuminations and light conditions. The highly uncertain image form made it very difficult to detect boundaries between the nuclei images. The image dimensions varied from 200 × 200 to 300 × 300. The dataset also provided the ground truth of the boundaries. For the quantitative measure, in this paper, we used two metrics. They are figure of Pratt (IMP) [17] and edge-based structural similarity (ESSIM) [18]. In both measures, the high value indicates the edge detection results resemble the ground truth. In this paper, we compared the proposed method with boundary detection methods by Khan et al. [9], Shrivastav et al. [10], and Tang et al. [11]. We compared the latest methods both qualitatively and quantitatively for the nuclei images from the Kaggle 2018 dataset. The subjective or qualitative results of the presented method are demonstrated in Fig. 1. The quantitative results are demonstrated in Table 1. From the results, it is clear that the proposed methods can detect the nuclei boundaries more accurately than that of the methods mentioned here. The method by Khan used a type-1 fuzzy rule-based for uncertainty reduction in boundary detection. The proposed method used IT2FS for uncertainty reduction and it was a powerful tool for the reduction of uncertainties. Shrivastav also used a type-1 fuzzy set for boundary detection. The fuzzy set did not depend on the statistical information of the input image. The proposed method used IT2FS based on image statistics. The IT2FS similarity was used by Tan for boundary detection in images. But, the generation

Fig. 1 Boundary detection in the nuclei images from the Kaggle 2018 dataset. Row-wise (1) Original test images. (2) The boundary detection by the proposed method


Table 1 Performance of boundary detection of nuclei images from the Kaggle 2018 dataset. ↑ indicates that a higher value marks better performance

Metric    Khan     Shrivastav   Tang     Proposed
IMP ↑     0.7321   0.8120       0.8319   0.9224
ESSIM ↑   0.7231   0.8012       0.8231   0.9172

of IT2FS depending on the bat algorithm helped the proposed method to detect the boundaries in the nuclei images more accurately. It is to be noted for the quantitative measure, we ran the proposed algorithm 20 times and took the average results.

8 Conclusions In this paper, we present a novel method for nuclei image boundary detection. The uncertainties in the boundary detection are reduced by an IT2FS. The gradients of the nuclei image are mapped into the IT2FS for boundary detection. In the proposed method, the bat algorithm helps to minimize the uncertainties by generating the proper set of parameters based on the statistical information of the image. The state-of-the-art methods found in the literature cannot reduce the uncertainties efficiently as the uncertainties exist in the determination of the proper parameter values. The experimental performances demonstrate that the presented method can identify boundaries from highly uncertain patterns efficiently. In the future, the method can be extended to detect boundaries under noise. Using the proposed technique, currently, we are working on boundary detection in highly uncertain medical images.

References 1. Abdolhoseini, M., Kluge, M.G., Walker, F.R., Johnson, S.J.: Segmentation of heavily clustered nuclei from histopathological images. Sci. Rep. 9(1), 1–13 (2019) 2. Canny, J.F.: Finding Edges and Lines in Images. Technical report, Massachusetts Inst of Tech Cambridge Artificial Intelligence Lab (1983) 3. Chen, X., Liu, H., Cao, W.M., Feng, J.Q.: Multispectral image edge detection via clifford gradient. Sci. China Inf. Sci. 55(2), 260–269 (2012) 4. Zhang, X., Liu, C.: An ideal image edge detection scheme. Multidimension. Syst. Signal Process. 25(4), 659–681 (2014) 5. Rao, D., Rai, S.: A Review on Edge Detection Technique in Image Processing Techniques, vol. 2, pp. 345–349 (2016) 6. Howard, M., Hock, M.C., Meehan, B.T., Dresselhaus-Cooper, L.E.: A locally adapting technique for edge detection using image segmentation. SIAM J. Sci. Comput. 40(4), B1161–B1179 (2018) 7. Liang, L.R., Looney, C.G.: Competitive fuzzy edge detection. Appl. Soft Comput. 3(2), 123– 137 (2003)


8. Om Prakash Verma and Anil Singh Parihar: An optimal fuzzy system for edge detection in color images using bacterial foraging algorithm. IEEE Trans. Fuzzy Syst. 25(1), 114–127 (2016) 9. uddin Khan, N., Arya, K.V.: A new fuzzy rule based pixel organization scheme for optimal edge detection and impulse noise removal. Multimedia Tools Appl. 1–27 (2020) 10. Shrivastav, U., Singh, S.K., Khamparia, A.: A nobel approach to detect edge in digital image using fuzzy logic. In: First International Conference on Sustainable Technologies for Computational Intelligence, pp. 63–74. Springer (2020) 11. Tang, L., Xie, J., Chen, M., Xu, C., Zhang, R.: Image edge detection based on interval type-2 fuzzy similarity. In: 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), pp. 958–963. IEEE (2019) 12. Yang, X.S.: Bat algorithm: literature, review and applications. Int. J. Bio-inspired Comput. 5(3), 141–149 (2013) 13. Mendel, J.M., John, R.I.B.: Type-2 sets made simple. IEEE Trans. Fuzzy Syst. 10, 117–127 (2002) 14. Tizhoosh, H.: Image thresholding using type-2 fuzzy sets. Pattern Recognit. 38, 2363–2372 (2005) 15. Szmidt, E., Kacprzyk, J.: Entropy for intuitionistic fuzzy sets. Fuzzy Sets Syst. 118(3), 467–477 (2001) 16. http://data.broadinstitute.org/bbbc/BBBC038/ 17. Abdou, I.E., Pratt, W.K.: Quantitative design and evaluation of enhancement/thresholding edge detectors. Proc. IEEE 67(5), 753–763 (1979) 18. Chen, G.-H., Yang, C.-L., Po, L.-M., Xie, S.-L.: Edge-based structural similarity for image quality assessment. In: 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2, pp. II–II. IEEE (2006)

Machine Learning Approach to Sentiment Analysis from Movie Reviews Using Word2Vec Amit Khan, Dipankar Majumdar, and Bikromadittya Mondal

Abstract Nowadays, many organizations use customer reviews to improve their business. They analyze the reviews given by the customer to get a concrete decision about the quality of the products or services provided by them. Sentiment Analysis in recent times has come up in a broad way for the analysis of the customer reviews and comments to know the customer views regarding consumer products or services. Through sentiment analysis, the attitude of the customers toward the product can be easily determined. In this paper, we used movie reviews dataset to extract the sentiment of viewers using machine learning approaches. We have used Word2Vec feature extraction method to obtain features from movie reviews. An arithmetic mean of the word vectors is obtained along each dimension and, thereafter, the same mean vector is used to train the different machine learning classifiers. The same method is used on performance classifiers against the available test data. Our proposed model yields an overall good performance based on accuracy, recall, and F1 score. Finally, we made a comparative analysis, among various methods used here based on their performance. Keywords Machine learning classifier · Sentiment analysis · Word2Vec · CBOW

A. Khan (B) Department of IT, RCC Institute of Information Technology, Kolkata 700015, India e-mail: [email protected] D. Majumdar Department of CSE, RCC Institute of Information Technology, Kolkata 700015, India e-mail: [email protected] B. Mondal Department of CSE, B.P. Poddar Institute of Management and Technology, Kolkata 700052, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_12


1 Introduction Tremendous use of Internet makes available a large amount of accessible data with many freely available platforms to share opinions. A large number of users share their opinions on virtual platform with an emotional attitude, every day. Consequently, it is almost impossible to handle this high volume of data manually to extract sentiment from the data. As a result, an automatic sentiment analysis comes into the picture. Sentiment analysis techniques handle these large volumes of data collected from various social networking platforms and able to extract the attitude of the users toward the particular product or service [1]. Today, most of the business intelligence applications are largely dependent on sentiment analysis. This is so because, business analysts can use sentiment analysis as a tool to analyze the sentiments of the end users about different products, policies, and services. Sentiment analysis is a method to examine secret opinions of textual comments and classify them into different sentiment categories i.e., positive, neural, or negative [2]. Mainly two sentiment analysis techniques are available in the literature: lexicon approach and machine learning approach. In lexicon approach, Chen [3] used an algorithm based on opinion tendency element mutual content to compute the polarity score of words to determine the opinion trend of textual contents. This method highly depends on the predefined dictionaries to calculate sentiment score those depict the emotional orientation of the text. The performance of this extremely relies on the quality of lexicons. In contrast, machine learning approach extract the features from labeled dataset and using this features train the classifier to create the classification model that able to predict the sentiment from some unknown data [4]. Sentiment analysis takes a very vital function to devise decision. As per the survey done in April 2013, almost 90% of decisions regarding a particular movie are taken from online movie reviews [5]. Due to the poor accuracy provided by the lexicon-based approach, researchers paid their attention to machine learning-based approach. This work suggests a machine learning technique to opinion mining from movie reviews dataset using Word2Vec as a word embedding method. This article is divided into five sections. Section 2 presents a summary of connected works on sentiment analysis. Section 3 focuses on the proposed methodology. The experimental result is mentioned in Sect. 4 and the conclusion and future scope of the work is written in Sect. 5.

2 Related Works Most of the researchers contributed their works on sentiment analysis either dictionary-based or machine learning-based. Pang and Lee [6] successfully applied various classification methods like support vector machine (SVM), maximum entropy to categorize the movie reviews and able to attain very satisfactory results.


Ye et al. [7] analyzed online travel notes using supervised sentiment classification techniques and achieved improved accuracy and recall of returned travel notes of search engines. Raja et al. [8] analyzed Twitter data to observe the sentiment of people using pattern-based machine learning approach. In [9], Xie et al. proposed a sentence level sentiment polarity method to analyze each tweet using trained SVM classifier. Word vector clustering approach for sentiment analysis from hotel reviews is reported in [10]. Word embeddings technique successfully applied for sentiment analysis [11] and text classification [12]. In [13], authors showed that when unigram, bigram, trigram features are combined and used for opinion mining using various machine learning approaches like Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), and Stochastic Gradient Descent (SGD). It provides a better result using different performance measures, i.e., precision, recall, f-measure, and accuracy. In [14], Maas et al. addressed document-level sentiment analysis using a combination of both supervised and unsupervised machine learning approaches. They also focused on word vector learning technique that includes sentiment and non-sentiment annotations.

3 Proposed Methodology

Our proposed method, from data collection to the final output, is displayed in the detailed diagram of Fig. 1; the steps are discussed subsequently.

3.1 Data Collection

We collected the IMDB movie reviews dataset from Kaggle. The dataset consists of 50,000 reviews and their corresponding sentiment labels, positive or negative: 25,000 reviews carry positive sentiment and the remaining 25,000 are negative.

Fig. 1 Flow diagram of our proposed system (Start → Collect IMDB movie reviews dataset from Kaggle → Data cleaning and preprocessing → Generation of unique words from each review → Word vector generation using Word2Vec → Calculate the mean of all word vectors present in the review → Continue the process for all the reviews → Split train and test set (70% and 30%) → Train machine learning classifier using training set → Test the performance of the classifier using test set → End)


The original dataset labels the reviews as positive or negative; we converted the positive label to 1 and the negative label to 0.

3.2 Cleaning and Preprocessing Data

Data cleaning and preprocessing involve tokenization and the elimination of markup tags, punctuation, special characters, numbers, and stopwords. Although stopwords are the most frequently used words, they play no role in determining sentiment and only increase the execution time. Finally, we applied a stemming function to normalize the data.
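The following is a minimal sketch of such a cleaning pipeline in Python; the exact tokenizer, stopword list, and stemmer used by the authors are not specified, so NLTK's English stopword list and Porter stemmer are assumed here purely for illustration.

```python
# Illustrative cleaning pipeline (assumed details: NLTK stopwords + Porter stemmer).
import re
from nltk.corpus import stopwords          # requires a prior nltk.download('stopwords')
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def clean_review(raw_review: str) -> list[str]:
    """Strip markup, punctuation, numbers and stopwords, then stem each token."""
    text = re.sub(r"<[^>]+>", " ", raw_review)       # remove HTML/markup tags
    text = re.sub(r"[^a-zA-Z]", " ", text).lower()   # keep letters only
    tokens = text.split()                            # simple whitespace tokenizer
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]
```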

3.3 Feature Selection

We used the Word2Vec model for feature construction. Word2Vec uses two language models: the continuous bag-of-words (CBOW) model and the skip-gram model. Both are shallow neural networks that map source words to target words. During training, the network learns the weights associated with the words and, finally, each word is represented by a word vector. The CBOW model predicts a single target word from a group of surrounding words, whereas the skip-gram model works the opposite way: it takes one source word as input and attempts to predict the group of surrounding words. The two models are illustrated in Figs. 2 and 3 and can be expressed as

$$P_{\text{CBOW}} = P(x_s \mid x_{s-k}, \ldots, x_{s-1}, x_{s+1}, \ldots, x_{s+k}) \qquad (1)$$

$$P_{\text{skip-gram}} = P(x_{s-k}, \ldots, x_{s-1}, x_{s+1}, \ldots, x_{s+k} \mid x_s) \qquad (2)$$

where $x_s$ denotes the current word and $k$ denotes the adjacent window size (Figs. 2 and 3). In our proposed model we used a 300-dimensional feature vector, a minimum word count of four, and a context window of ten words. Instead of using any pre-trained word vectors, we trained our own vectors, since their size is more compact than that of pre-trained vectors. To prepare a vector for each movie review we used a simple technique: we calculate the average of all word vectors occurring in the review, so the resulting vector keeps the same length, in our case 300. We then continued this procedure for all reviews in the dataset and obtained their vectors. Novelty of Our Approach: Instead of using the word vectors directly to train the classifiers, we compute the average of all the word vectors present in each review.


Fig. 2 Continuous bag of words (CBOW) model (context words x(s−2), x(s−1), x(s+1), x(s+2) predict the current word x(s))

Fig. 3 Skip-gram model (the current word x(s) predicts the context words x(s−2), x(s−1), x(s+1), x(s+2))

We repeat the same process for all the reviews present in the dataset, and the resulting review vectors are used to train the classifiers. The primary benefit of this technique is that the number of feature vectors is limited to the number of reviews in the dataset; as a result, both training and inference become much faster.
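A rough sketch of this feature-construction step is shown below, assuming the gensim library (version 4 API) as the Word2Vec implementation; the hyperparameters follow the text (300 dimensions, minimum count of 4, window of 10, CBOW, 100 iterations), but the authors' actual implementation may differ.

```python
# Word2Vec training plus per-review averaging (gensim >= 4 API assumed).
import numpy as np
from gensim.models import Word2Vec

def build_review_vectors(tokenised_reviews):
    model = Word2Vec(sentences=tokenised_reviews, vector_size=300,
                     window=10, min_count=4, sg=0, epochs=100)  # CBOW, 100 iterations
    dim = model.vector_size
    features = []
    for tokens in tokenised_reviews:
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        # mean of all word vectors in the review (zero vector if none survive min_count)
        features.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return model, np.vstack(features)
```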

3.4 Splitting Training and Testing Set

In the previous step we obtained a feature vector for each review in the dataset. We now split the features, 70% for training and the remaining 30% for testing. The training features and the corresponding labels are used to train


different machine learning classifiers, whereas testing features are utilized to test the performance of the trained classifiers.

3.5 Apply Machine Learning Algorithm

After extracting the features from the reviews, the next step is to select machine learning classifiers and train them on the features obtained from the training data; the trained classifiers are then evaluated on the testing features. We selected several standard classifiers, namely Gaussian Naive Bayes, logistic regression, linear support vector classifier, random forest, and KNN, and finally a voting classifier to ensemble the outputs of the individual classifiers.

Naive Bayes: This is a probabilistic classification method based on Bayes' theorem. It assumes that the individual features are strongly independent of each other, and for a given input it predicts the class with maximum likelihood. Bayes' theorem can be written as

$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(Y)\,P(X \mid Y)}{\sum_{Y} P(Y)\,P(X \mid Y)} \qquad (3)$$

where $Y$ indicates the predicted class and $X$ indicates the input instance.

Logistic Regression: We used logistic regression as a binary classification method. For this purpose we chose a decision threshold, which is what turns the logistic regression model into a classifier. Determining the threshold is a crucial aspect of the method, as it depends on the nature of the classification problem.

Linear Support Vector Classifier: The fundamental idea behind this classifier is to find a hyperplane that separates the training set into two classes. It can be stated by the equation

$$w^{T} x + b = 0 \qquad (4)$$

where $w$ indicates the direction of the hyperplane and $b$ represents the offset of the hyperplane from the origin.

Random Forest: We used random forest as the classification process for our sentiment analysis problem. From randomly selected data samples, the random forest algorithm builds decision trees, obtains a prediction from each tree, and selects the best solution by voting. Information gain can be used as one of the feature selection criteria and is defined as


$$g(D \mid A) = H(D) - H(D \mid A) \qquad (5)$$

where, for a particular training set $D$, $H(D)$ denotes the entropy and $H(D \mid A)$ denotes the conditional entropy given feature $A$.

K-Nearest Neighbors (KNN): This is a simple supervised classification method in which $K$ denotes the number of neighbors considered when taking a decision. The $K$ nearest neighbors can be computed with any distance measure, such as the Euclidean or Manhattan distance. KNN is a lazy, nonparametric algorithm; it is called lazy because it defers all computation until classification time.
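A hedged sketch of this classification stage with scikit-learn is given below; the hyperparameters of the individual classifiers are not reported in the paper, so library defaults are used, and the hard-voting ensemble is only one possible way to combine their outputs.

```python
# Training the classifiers described above plus a hard-voting ensemble (defaults assumed).
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_and_evaluate(X, y):
    # 70/30 split as described in Sect. 3.4
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    members = [("gnb", GaussianNB()),
               ("lr", LogisticRegression(max_iter=1000)),
               ("lsvc", LinearSVC()),
               ("rf", RandomForestClassifier()),
               ("knn", KNeighborsClassifier())]
    ensemble = VotingClassifier(estimators=members, voting="hard")
    for name, clf in members + [("voting", ensemble)]:
        clf.fit(X_tr, y_tr)
        print(name, classification_report(y_te, clf.predict(X_te)))
```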

4 Results and Discussion

In this section we measure the performance of the various classifiers used in our work with different performance metrics: accuracy, recall, precision, and F1-score. Before discussing the metrics, we introduce some parameters. True Positive (TP) is the number of positive instances identified correctly and True Negative (TN) is the number of negative instances identified correctly by our model, whereas False Positive (FP) is the number of negative instances wrongly identified as positive and False Negative (FN) is the number of positive instances wrongly identified as negative.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
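For illustration, the four metrics can be computed directly from the confusion-matrix counts:

```python
# Direct implementation of the four metrics above from TP, TN, FP, FN.
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```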

The researchers in [15] used hybrid features on the IMDB dataset and obtained the accuracy and F-measure values shown in Table 1 for different classifiers.

Table 1 Results obtained by the researchers in [15]

| Classifiers | Accuracy (%) | F-measure |
|---|---|---|
| Naive Bayes (NB) | 63.4 | 0.567 |
| Support vector machine (SVM) | 76.6 | 0.764 |
| K-nearest neighbors (KNN) | 72.26 | 0.723 |


Table 2 Results obtained from our proposed model

| Classifiers | Accuracy (%) | F-measure | Precision | Recall |
|---|---|---|---|---|
| Gaussian Naive Bayes (GNB) | 78.01 | 0.777 | 0.786 | 0.769 |
| Linear support vector classifier (LSVC) | 88.29 | 0.884 | 0.875 | 0.892 |
| K-nearest neighbors (KNN) | 83.36 | 0.822 | 0.872 | 0.774 |
| Logistic regression (LR) | 87.06 | 0.871 | 0.863 | 0.880 |
| Random forest (RF) | 85.20 | 0.839 | 0.871 | 0.854 |

The results obtained by our proposed model on the IMDB movie review dataset are shown in Table 2. From these results it is clear that our proposed technique provides improved performance in terms of both accuracy and F1-score. This is because we used Word2Vec word embeddings, which capture the semantic features of words [13] using a context of surrounding words; in our case the context comprises ten such words for the CBOW (continuous bag-of-words) model, and the Word2Vec model was trained for 100 iterations. The results obtained with the different classifiers are also shown graphically in Fig. 4.

Fig. 4 Graphical representation of the results obtained from our model


5 Conclusions and Future Work

The aim of the proposed work is to address the opinion mining problem by developing a technique that classifies movie reviews with different trained classifiers and compares the results using various performance measures. Our contribution lies mainly in the feature extraction method: instead of using the word vectors directly to train the classifiers, we compute the average of all word vectors occurring in each review, and the resulting review vectors are used for training. The main benefit of this approach is that the total number of feature vectors is limited to the number of reviews in the dataset, so both training and inference become faster. We evaluated the model on the IMDB movie reviews dataset and achieved reasonably good performance in terms of accuracy, recall, and F1-score. Future work may cluster the word vectors into groups and apply other statistical aggregation functions to improve model performance; we also intend to extend this work with different deep learning classifiers.

References 1. El Rahman, S.A., AlOtaibi, F.A., AlShehri, W.A.: Sentiment analysis of twitter data. In: International Conference on Computer and Information Sciences (ICCIS), Aljouf, Kingdom of Saudi Arabia (2019) 2. Singh, V.K., Piryani, R., Uddin, A., Waila, P.: Sentiment analysis of movie reviews: a new feature-based heuristic for aspect-level sentiment classification. In: 2013 International MultiConference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), IEEE, Kottayam, Kerala, India, pp. 712–717 (2013) 3. Chen, X.D.: Research on sentiment dictionary based emotional tendency analysis of Chinese microblog. Huazhong University of Science & Technology (2012) 4. Fan, Z., Su, L., Liu, X., Wang, S.: Multi-label Chinese question classification based on word2vec. In: 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, pp. 546–550 (2017) 5. Ling, P., Geng, C., Menghou, Z., Chunya, L.: What do seller manipulations of online product reviews mean to consumers? In: HKIBS Working Paper Series 070–1314. Hong Kong Institute of Business Studies, Lingnan University, Hong Kong (2014) 6. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning technique [C]. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (2002) 7. Ye, Q., Zhang, Z., Law, R.: Sentiment classification of online reviews to travel destinations by supervised machine learning approached [J]. Expert Syst. Appl. 36(3), 6527–6535 (2009) 8. Raja, H., Ilyas, M.U., Saleh, S., Liu, A.X., Radha, H.: Detecting national political unrest on twitter. In: 2016 IEEE International Conference on Communications (ICC). IEEE, Kuala Lumpur, Malaysia, pp. 1–7 (2016) 9. Xie, L., Zhou, M., Sun, M.: Hierarchical structure based hybrid approach to sentiment analysis of chinese micro blog and its feature extraction. J. Chinese Inf. Process. 26(1), 73–83 (2012) 10. Zhang, X., Yu, Q.: Hotel reviews sentiment analysis based on word vector clustering. In: 2017 2nd IEEE International Conference on Computational Intelligence and Applications, Beijing, China (2017)


11. Xue, B., Fu, C., Shaobin, Z.: A study on sentiment computing and classification of sina weibo with word2vec. In: 2014 IEEE International Congress on sa (BigData Congress). IEEE, Anchorage, AK, USA, pp. 358–363 (2014) 12. Lilleberg, J., Zhu, Y., Zhang, Y.: Support vector machines and word2vec for text classification with semantic features. In: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC). IEEE, pp. 136–140 (2015) 13. Tripathy, A., Agrawal, A., Rath, S.K.: Classification of sentiment reviews using n-gram machine learning approach. Expert Syst. Appl. 57, 117–126 (2016) 14. Maas, A., Daly, R., Pham, P., Huang, D., Ng, A., Potts, C.: Learning word vectors for sentiment analysis. In: 49th Annual Meeting of the Association for Computational Linguistics. Human Language Technologies, vol. 1, pp. 142–150 (2011) 15. Keerthi Kumar, H.M., Harish, B.S., Darshan, H.K.: Sentiment analysis on IMDb movie reviews using hybrid feature extraction method. Int. J. Inter. Multimedia Artif. Intell. 109–114 (2018).

Selection of Edge Detection Techniques Based on Machine Learning Approach Soumen Santra, Dipankar Majumdar, and Surajit Mandal

Abstract Machine Learning (ML) plays an important role in Image Processing where we can apply different algorithms of ML for better analysis of an image. In this communication, we present that the application of ML may help in selecting a particular edge detection technique for image analysis. We consider various components of confusion matrix and other parameters to assess different edge detection techniques. Keywords Edge detection · Filtering · Confusion matrix · Learning model · Adversarial search algorithm · Statistical learning

1 Introduction

An image is a two-dimensional array, i.e., a matrix arranged in rows and columns. An edge is formed by an abrupt change in the color intensity of neighboring pixels [1–9]. Edge detection is used in multiple domains of science and engineering, such as medical science, aerospace engineering, robotics, and many more [1–9]. Image processing is a method of converting an image into digital form and performing operations on it in order to obtain an enhanced image or to extract useful information from it. The input may be a video frame or a photograph, and the output may be another image or some characteristics associated with that image [1–9].


Fig. 1 Process flow of ML-based image processing

Image processing comprises the three steps shown in Fig. 1 [1–9].

2 Various Edge Detection Techniques

The popular edge detection techniques can be classified into the classical operators (Sobel, Prewitt, and Robert) and the Canny operator, which are gradient-based, and the Marr-Hildreth technique, which is a Laplacian-based edge detection technique [1–9]. In medical science the Sobel operator was long the most common choice for edge detection, but nowadays the Canny operator has become very popular because it can detect edges in noisy images with very low error. Gradient-based operators are also known as first-order derivative edge detectors [10, 11]. In gradient-based edge detection two quantities must be calculated: the magnitude of the gradient at each pixel and the direction of the gradient [1–9]. The Sobel operator uses kernel values ranging from −2 to +2 at the (x, y) position; its convolution (kernel) matrices are given in Table 1 [1–9]. The Prewitt operator uses values from −1 to +1; its convolution matrices are given in Table 2 [1–7].

Table 1 Convolution matrices for the Sobel operator

Horizontal (x):          Vertical (y):
−1   0  +1               −1  −2  −1
−2   0  +2                0   0   0
−1   0  +1               +1  +2  +1

Table 2 Convolution matrices for the Prewitt operator

Horizontal (x):          Vertical (y):
−1   0  +1               −1  −1  −1
−1   0  +1                0   0   0
−1   0  +1               +1  +1  +1

Table 3 Convolution matrices for the Robert operator

Horizontal (x):    Vertical (y):
 1   0              0   1
 0  −1             −1   0

Table 4 Convolution matrix for the Canny operator: the 5 × 5 Gaussian kernel, scaled by 1/159

2   4   5   4   2
4   9  12   9   4
5  12  15  12   5
4   9  12   9   4
2   4   5   4   2

Like Sobel and Prewitt operators, Robert is also a gradient-based edge detection operator; but the kernels of previous two operators are 3 × 3 convolution matrix and, in case of Robert operator, the matrix will be 2 × 2 [1–9]. The convolution matrix of the Robert operator is [1–9] (Table 3). Canny operator can give very smooth output from a noisy image with a very low percentage of errors [6–9]. In case of Canny operator, the kernel matrix will be 5 × 5 [1, 3–7] (Table 4). Before the Canny operator came, Marr-Hildreth was the most useful edge detection operator. This operator is a Laplacian operator which can be generated by secondorder derivative. Like Canny operator, Marr-Hildreth operator can also make image blur by applying Gaussian Blur function [1–9] as follows in Eq. (1): G(x, y) =

2

1 

σ

e− 2

x 2 +y 2 2σ 2

(1)

σ is used to define the radius for Gaussian.
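As an illustration of the gradient-based operators above, the following sketch applies the Sobel kernels of Table 1 by 2-D convolution and thresholds the gradient magnitude; the threshold value and boundary handling are illustrative choices, not taken from the paper.

```python
# Gradient-based edge detection with the Sobel kernels of Table 1 (NumPy/SciPy sketch).
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T   # vertical kernel

def sobel_edges(gray: np.ndarray, threshold: float = 100.0) -> np.ndarray:
    gx = convolve2d(gray, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(gray, SOBEL_Y, mode="same", boundary="symm")
    magnitude = np.hypot(gx, gy)   # gradient magnitude at each pixel
    return magnitude > threshold   # boolean edge map
```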

3 Machine Learning

Machine learning is a subset of artificial intelligence and is classified into three wings: supervised learning, unsupervised learning, and reinforcement learning. The three main components of ML are data input, data processing, and data output, and all three must be tuned to create a model or machine "M." As per Bengio et al. [11], it is a combination of task, performance, and measure. Several techniques such as regression, classification, clustering, probability theory, decision trees, support vector machines, and principal component analysis are applied to measure the performance of a model [2].


Supervised learning deals with labeled data, i.e., data that can be separated by class, e.g., ratings of restaurants (5 stars or 4 stars) together with the price of food. Unsupervised learning deals with data in which the dependent variables do not vary directly with the independent variables; such data cannot be managed with a single linear equation or separated in a class-wise format, for example the ages of students versus their marks in a particular subject. Reinforcement learning combines aspects of the previous two approaches: to analyze a set of data, a machine may start with unsupervised learning and then adopt supervised learning, for example when an algorithm playing a game must be analyzed for better accuracy at each step.

4 Analysis of the Operators: ML Approach

We cannot determine which edge detection operator is best simply by visual inspection; the percentage of detected edge pixels must be calculated by a mathematical function. Before the introduction of ML this assessment was done by eye alone, but with these calculations the results can be compared to select the best edge detection operator. To analyze an edge detection technique, three parameters are considered: true positive pixels (TP), false positive pixels (FP), and false negative pixels (FN) [10, 11]. The true positive pixels are the correctly detected edge pixels, the pixels erroneously classified as edge pixels are the false positives, and the actual edge pixels that are not detected are the false negatives. Based on these parameters we calculate the merit of Pratt (IMP), which expresses the quality of the detected edges: the greater the value of IMP, the better the edge detection operator. To calculate IMP we first compute the percentage of correctly detected pixels by Eq. (2) [1–3],

$$P_{CO} = \frac{TP}{\max(I, B)} \qquad (2)$$

where I is the number of edge points of the actual image and B is the number of detected edge points. Next, the percentage of pixels that were not detected is given by Eq. (3) [1–4],

$$P_{ND} = \frac{FN}{\max(I, B)} \qquad (3)$$

and finally the percentage of erroneously detected pixels is given by Eq. (4) [1–4],

$$P_{FA} = \frac{FP}{\max(I, B)} \qquad (4)$$


After calculating these three percentages, the value of IMP is obtained from Eq. (5) [1–4],

$$\mathrm{IMP} = \frac{1}{\max(I, B)} \sum_{i=1}^{B} \frac{1}{1 + a\,d_i^{2}} \qquad (5)$$

where $d_i$ is the distance between the i-th detected edge pixel and the nearest edge pixel of the ground truth, and a is an empirical calibration constant, taken here as a = 1/9.
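A sketch of this assessment in Python is shown below; it assumes the detected and ground-truth edge maps are boolean arrays and uses a Euclidean distance transform to obtain the distances d_i, which is one common way to evaluate Eq. (5).

```python
# TP/FP/FN, the ratios of Eqs. (2)-(4), and Pratt's merit of Eq. (5) with a = 1/9.
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_assessment(detected: np.ndarray, ground_truth: np.ndarray, a: float = 1 / 9):
    tp = np.count_nonzero(detected & ground_truth)
    fp = np.count_nonzero(detected & ~ground_truth)
    fn = np.count_nonzero(~detected & ground_truth)
    I, B = np.count_nonzero(ground_truth), np.count_nonzero(detected)
    denom = max(I, B)
    # distance from every pixel to the nearest ground-truth edge pixel
    d = distance_transform_edt(~ground_truth)
    imp = np.sum(1.0 / (1.0 + a * d[detected] ** 2)) / denom
    return {"TP": tp, "FP": fp, "FN": fn,
            "P_CO": tp / denom, "P_ND": fn / denom, "P_FA": fp / denom, "IMP": imp}
```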

5 Results and Discussions

After executing the IMP calculation for all the edge operators, the outputs are reported in Tables 5, 6, and 7 for Figs. 2, 3, and 4, respectively. The image assessment parameters TP, FP, FN, P(CO), P(ND), P(FA), and IMP have been calculated for the different images: Figs. 2, 3, and 4 show the Rubik's cube, Lena, and synthetic images, respectively, and the corresponding assessment parameters are tabulated in Tables 5, 6, and 7.

Table 5 Values of the parameters of the different filter operators for Fig. 2

| Algorithms | TP | FP | FN | P(CO) | P(ND) | P(FA) | IMP |
|---|---|---|---|---|---|---|---|
| Sobel (h) | 12,453 | 178,128 | 124,748 | 0.0908 | 0.9092 | 1.2983 | 1.1700000000E-10 |
| Sobel (v) | 5983 | 176,815 | 132,531 | 0.0379 | 0.8404 | 1.1212 | 9.0700000000E-11 |
| Prewitt (h) | 9063 | 180,237 | 126,029 | 0.0575 | 0.7992 | 1.1429 | 1.0100000000E-10 |
| Prewitt (v) | 3958 | 177,531 | 133,840 | 0.0251 | 0.8487 | 1.1258 | 9.0700000000E-11 |
| Robert (h) | 180 | 167,339 | 147,810 | 0.0011 | 0.9373 | 1.0611 | 9.0700000000E-11 |
| Robert (v) | 180 | 167,109 | 148,040 | 0.0011 | 0.9388 | 1.0591 | 9.0700000000E-11 |
| Canny | 11,467 | 175,752 | 128,286 | 0.0727 | 0.8135 | 1.1145 | 9.7000000000E-11 |

Table 6 Values of the parameters of the different filter operators for Fig. 3

| Algorithms | TP | FP | FN | P(CO) | P(ND) | P(FA) | IMP |
|---|---|---|---|---|---|---|---|
| Sobel (h) | 1640 | 48,788 | 265,072 | 0.0061 | 0.9939 | 0.1829 | 3.2600700000E-10 |
| Sobel (v) | 2892 | 48,857 | 263,751 | 0.0108 | 0.9892 | 0.1832 | 3.2745263153E-10 |
| Prewitt (h) | 1100 | 49,082 | 265,372 | 0.0041 | 0.9959 | 1.184 | 3.3065926196E-10 |
| Prewitt (v) | 1493 | 50,067 | 263,940 | 0.0056 | 0.9944 | 0.1886 | 3.2894535355E-10 |
| Robert (h) | 2 | 49,918 | 266,380 | 7.51E-06 | 0.99998 | 0.1843 | 3.2780526996E-10 |
| Robert (v) | 2 | 48,889 | 266,609 | 7.50E-06 | 0.99999 | 0.1833 | 3.2753006402E-10 |
| Canny | 165 | 51,459 | 263,876 | 0.0006 | 0.8135 | 1.1145 | 3.2538186958E-10 |


Table 7 Values of the parameters of the different filter operators for Fig. 4

| Algorithms | TP | FP | FN | P(CO) | P(ND) | P(FA) | IMP |
|---|---|---|---|---|---|---|---|
| Sobel (h) | 2029 | 31,615 | 281,840 | 0.0071 | 0.9929 | 0.1114 | 4.7747000000E-10 |
| Sobel (v) | 1801 | 30,093 | 284,164 | 0.0043 | 0.9957 | 0.1054 | 4.3758979035E-10 |
| Prewitt (h) | 1533 | 31,986 | 281,965 | 0.0054 | 0.9946 | 0.1128 | 4.8119032842E-10 |
| Prewitt (v) | 893 | 30,171 | 284,420 | 0.0031 | 0.9969 | 0.1057 | 4.3770942038E-10 |
| Robert (h) | 0 | 31,016 | 284,468 | 0 | 1 | 0.10903 | 4.3969126385E-10 |
| Robert (v) | 0 | 30,942 | 284,542 | 0 | 1 | 0.10874 | 4.3969912831E-10 |
| Canny | 664 | 33,256 | 281,564 | 0.0024 | 0.9976 | 0.1178 | 4.7447061689E-10 |

Fig. 2 Various edge detection filters applied to the Rubik's cube image (panels: Original, Sobel, Prewitt, Robert, Canny, Marr-Hildreth)

Fig. 3 Various edge detection filters applied to the noisy Lena image (panels: Original, Sobel, Prewitt, Robert, Canny, Marr-Hildreth)

Fig. 4 Various edge detection filters applied to the noisy synthetic image (panels: Original, Sobel, Prewitt, Robert, Canny, Marr-Hildreth)

The values of these parameters indicate which particular edge detection technique should be utilized for further processing. As evident from Table 5, the TP values are the greatest for the Sobel edge detection technique.


In terms of TP, Sobel (h) is the greatest among the gradient-based operators, whereas in terms of IMP, Prewitt (h) is the greatest; Robert (h) and Robert (v) perform poorly for all types of images. From these assessment parameters we observe that Canny gives a consistent output for all types of images, while Sobel (v) works well for synthetic images. Hence, no single edge detection operator is applicable to every situation: before applying any edge detection technique, its assessment parameters should be checked in order to decide which technique to adopt.

References 1. He, X., Yung, N.H.C.: Performance improvement of edge detection based on edge likelihood index. In: Li, S., Pereira, F., Shum, H.-Y., Tescher, A.G. (Eds.) Visual Communications and Image Processing. Proc. SPIE 5960. https://doi.org/10.1117/12.633216 2. Boaventura, A.G., Gonzaga, A.: Methsod to Evaluate the Performance of Edge Detector. https:// doi.org/10.1.1.562.2382 3. Nadernejad, E., Sharifzadeh, S., Hassanpour, H.: Edge detection techniques: evaluations and comparisons. Appl. Math. Sci. 2(31), 1507–1520 4. Santra, S., Mandal, S.: A new approach towards invariant shape descriptor tools for shape classification through morphological analysis of image. In: 2nd International Conference On Computational Advancement In Communication Circuit And System (ICCACCS-2018), Computational Advancement in Communication Circuits and Systems (2020) 5. Maini, R., Aggarwal, H.: Study and comparison of various image edge detection techniques. Int. J. Image Process. (IJIP) 3(1). https://doi.org/10.1.1.301.927 6. Bhardwaj, S., Mittal, A.: A survey on various edge detector techniques. Proc. Technol. 4, 220–226 (2012). https://doi.org/10.1016/j.protcy.2012.05.033 7. Santra, S., Mukherjee, P., Sardar, P., Mandal, S. and Deyasi, A.: Object detection in clustered scene using point feature matching for non-repeating texture pattern. In: Conference on Control, Signal Processing and Energy System (CSPES 2018), Lecture Note of Electrical Engineering. Springer (2019) 8. Juneja, M., Sandhu, P.S.: Performance evaluation of edge detection techniques for images in spatial domain. Techniques for Images in Spatial Domain January 2009. Int. J. Comput. Theory Eng. 1(5), 614–621. https://doi.org/10.7763/IJCTE.2009.V1.100 9. Rashmi, Kumar, M., Saxena, R.: Algorithm and technique on various edge detection—a survey. Signal Image Process. Int. J. 4(3), 65–75 (2013). https://doi.org/10.5121/sipij.2013.4306 10. Santra, S., Mandal, S., Das, K., Bhattacharjee, J., Deyasi, A.: A comparative study of ztransform and fourier transform applied on medical images for detection of cancer segments. In: IEEE 3rd International Conference on Electronics, Materials Engineering & NanoTechnology (IEMENTech) (2019) 11. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016) 12. Santra, S., Mandal, S., Das, K., Bhattacharjee, J., Roy, A.: a modified canny edge detection approach to early detection of cancer cell. In: IEEE 3rd International Conference on Electronics, Materials Engineering & NanoTechnology (IEMENTech) (2019)

ANN-Based Self-Tuned PID Controller for Temperature Control of Heat Exchanger Godavarthi Charan, Dasa Sampath, K. Sandeep Rao, and Y. V. Pavan Kumar

Abstract Control is one of the most important aspects of regulating industrial processes such as pressure, level, and temperature. Generally, conventional PID controllers are used for this purpose, and numerous algorithms such as the Ziegler-Nichols and Cohen-Coon methods are available for tuning them. The challenge with these conventional tuning methods is selecting a suitable method for a given system: no single method gives the best performance for all systems, and the usefulness of a method changes from system to system. Even after the best method for a system is found, it may not be able to reject external disturbances to the desired degree. So, there is a need for artificial intelligence to overcome these problems. Thus, this paper focuses on the implementation of an artificial neural network-based self-tuned PID (ANN-PID) controller for a heat exchanger system that can produce the desired results even under disturbances. Further, various nonlinearities are applied to the system to test the robustness of the proposed controller. The simulations are done in Simulink. The results lead to the conclusion that the proposed ANN-PID method gives robust performance compared to the key traditional PID-tuning methods.

Keywords Temperature control · Heat exchanger · Artificial neural networks · Self-tuned PID · Disturbance rejection



1 Introduction

PID controllers are the most commonly used controllers in industrial processes because they are easily tunable and simply structured; they can even be tuned manually by trial and error as long as the system gives satisfactory results. Many tuning algorithms have evolved in the literature, such as OLTR (open-loop transient response), EPI (error performance indices), and ultimate cycle methods [1–3]. Further, a fractional PID controller was explained and compared with traditional PID controllers in [4–6]. Nevertheless, many controllers used in industry are cumbersome and possess poor rejection of nonlinear disturbances, which motivates further research in PID controller design. Hence, this paper proposes an ANN-PID controller to overcome the restrictions of traditional PIDs. To test the effectiveness of the proposed controller, an industrial heat exchanger temperature control system is used; it is of vital importance because the control depends on the dynamic properties of temperature flow, resistance, and friction. The detailed operation of the tubular heat exchanger is given in [7, 8], and its transfer function is derived as Eq. (1) from [9], where the gain K = 1, the time constant T = 38 s, and the delay time τ = 15 s. The proposed ANN-PID is compared with traditional offline PID controllers to analyze its usefulness.

$$G(s) = \frac{K\,e^{-\tau s}}{T s + 1} = \frac{e^{-15 s}}{38 s + 1} \qquad (1)$$
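For reference, the open-loop unit-step response of this first-order-plus-dead-time model can be evaluated analytically, as in the small sketch below (the time grid is an arbitrary choice for illustration).

```python
# Open-loop step response of Eq. (1): y(t) = K*(1 - exp(-(t - tau)/T)) for t >= tau.
import numpy as np

def fopdt_step_response(t: np.ndarray, K: float = 1.0, T: float = 38.0, tau: float = 15.0):
    y = np.zeros_like(t, dtype=float)
    active = t >= tau                    # output stays at zero during the dead time
    y[active] = K * (1.0 - np.exp(-(t[active] - tau) / T))
    return y

t = np.linspace(0, 300, 601)
y = fopdt_step_response(t)               # response of the heat exchanger to a unit step
```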

2 System Modeling with Traditional PID Controller

Traditional PID controllers are capable of controlling only LTI systems. Basically, a PID controller drives the system response to a desired value by setting up the controller gains with the help of tuning algorithms [10]. Open-loop tuning methods such as Ziegler-Nichols (ZN), Astrom-Hagglund (AH), Cohen-Coon (CC), and Hang were used for controlling the heat exchanger system. The procedural steps for calculating the gain values in these methods are as follows [11]:

• Step 1: Detach the feedback loop and controller from the system.
• Step 2: Apply a step signal at the input and observe the open-loop response of the system.
• Step 3: Find the inflection point and draw a tangent to the curve at the inflection point, as shown in Fig. 1.
• Step 4: Note the fall-back time (L), reaction time (T), and stationary gain (K) and calculate the controller gains from these parameters as given in [10].

All the computed gains are given in Table 1. Figure 2 shows the system model designed with the CC tuning algorithm; among the traditional methods, the CC method gives satisfactory outcomes in contrast with the others.


Fig. 1 Output curve of an open-loop system

Table 1 PID gains computed with various conventional tuning techniques

| Technique | KP | KI | KD | TI | TD |
|---|---|---|---|---|---|
| ZN | 0.61 | 0.01 | 10.01 | 33.39 | 16.69 |
| CC | 2.17 | 0.02 | 16.32 | 94.47 | 7.76 |
| Hang | 2.96 | 0.14 | 11.24 | 22.48 | 3.74 |
| AH | 3.43 | 0.12 | 1.46 | 30.02 | 7.49 |

Fig. 2 Model of the system with CC PID controller


However, the drawback of these offline tuning algorithms is that they cannot cope with the online disturbances that occur in the system. Hence, there is a need for a controller that can address online disturbances, which is achieved by the ANN-PID controller proposed in this paper.

3 System Modeling with the Proposed ANN-PID Controller

An ANN has the ability to predict and control the system response without using pre-established mathematical relations. From the literature it can be observed that ANNs are used for process control in various sectors such as aerospace, automotive, and electronics [12, 13]. The fault tolerance and data processing of an ANN are similar to the characteristics of the human nervous system. In biological neurons the storage of information is very complex and information is processed through both simple aggregation and more complex processes, whereas in an ANN information is stored in digital, analog, as well as spike models, and problems are solved using summation, multiplication, and improved aggregation operations. The simplified mathematical representation of a neuron is shown in Fig. 3. The scalar inputs are denoted x1, x2, …, xn, the synaptic weights w1, w2, …, wn, and the bias w0. The bias plus the weighted sum of the scalar inputs is passed to the activation function, and the neuron output y is given by Eq. (2). Linear, tan-sigmoid, and log-sigmoid functions, shown in Fig. 4, are the commonly used activation functions; their mathematical relations are given in Eqs. (3), (4), and (5), respectively. Sigmoid activation functions are usually preferred because they are smooth, continuous, monotonically increasing, and have a positive derivative. Figure 5 shows the adjustment of the synaptic weights: random initial weights are chosen, the network output is calculated and compared with the desired value, and the resulting error is compensated by adding the required weight corrections to the network, as shown in Eq. (6).

Fig. 3 Mathematical representation of neuron


Fig. 4 Activation functions

Fig. 5 Synaptic weights adjustment for the neural network

$$y = f(p) = f\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right) \qquad (2)$$

$$y = \mathrm{purelin}(p) = p = w_0 + \sum_{i=1}^{n} w_i x_i \qquad (3)$$

$$y = \mathrm{tansig}(p) = \frac{2}{1 + e^{-2p}} - 1 \qquad (4)$$

$$y = \mathrm{logsig}(p) = \frac{1}{1 + e^{-p}} \qquad (5)$$

$$w_i(\text{new}) = w_i(\text{previous}) + \Delta w_i \qquad (6)$$
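A plain-NumPy rendering of these relations is given below for illustration; the summation limits follow the definitions in the text, with the bias w0 kept separate from the weighted inputs.

```python
# NumPy versions of Eqs. (2)-(6): neuron output, activation functions, weight update.
import numpy as np

def purelin(p):  return p                                        # Eq. (3)
def tansig(p):   return 2.0 / (1.0 + np.exp(-2.0 * p)) - 1.0     # Eq. (4)
def logsig(p):   return 1.0 / (1.0 + np.exp(-p))                 # Eq. (5)

def neuron_output(x, w, w0, activation=tansig):
    """Eq. (2): y = f(w0 + sum_i w_i * x_i)."""
    return activation(w0 + np.dot(w, x))

def update_weights(w, dw):
    """Eq. (6): w_new = w_previous + dw."""
    return w + dw
```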

The ANN-PID controller takes advantage of both the ANN and the conventional PID controller: the ANN logic updates the gain values of the PID controller according to the disturbances. The procedure for generating the training data of the ANN controller is given in Fig. 6. As shown in Fig. 7, one hidden layer with five hidden neurons is chosen for the feed-forward neural network [14]; tan-sigmoid and pure-linear activation functions are used for the hidden-layer and output-layer neurons, respectively. For ANN training, the LMBP (Levenberg-Marquardt back-propagation) algorithm is used as given in [15].

Fig. 6 Flow for design of the proposed ANN-PID controller


Fig. 7 ANN architecture for predicting PID gains
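A hedged sketch of such a gain-predicting network is given below using scikit-learn's MLPRegressor: one hidden layer of five tan-sigmoid neurons and a linear output, as in Fig. 7. Scikit-learn offers no Levenberg-Marquardt solver, so "lbfgs" is used purely as a stand-in for the LMBP training of [15], and the training set (disturbance features mapped to Kp, Ki, Kd) is assumed to come from the procedure of Fig. 6.

```python
# Sketch only: a 5-neuron tanh hidden layer with a linear output predicting PID gains.
from sklearn.neural_network import MLPRegressor

def fit_gain_predictor(X_disturbance, y_gains):
    """X_disturbance: (n_samples, n_features); y_gains: (n_samples, 3) = [Kp, Ki, Kd]."""
    net = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                       solver="lbfgs", max_iter=5000)
    net.fit(X_disturbance, y_gains)
    return net          # net.predict(x_new) returns the tuned [Kp, Ki, Kd]
```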

4 Simulation Results and Analysis

The response of the system under various conditions is plotted to analyze the effectiveness of the proposed controller. Figure 8 compares the system responses obtained with the conventional tuning methods of Table 1; the response with the CC PID controller is relatively good when there is no disturbance in the system. Figure 9 shows the response of the CC PID design with and without disturbance: the response becomes unsatisfactory when a disturbance enters the system, and this drawback motivates the new technique. Figure 10 shows the model used to compare the responses of the conventional and proposed controllers under applied disturbances. The regression plot of the ANN used to predict the PID gains is shown in Fig. 11, which indicates how well the data fit the ANN. Figure 12 shows the ANN-PID response for different disturbance magnitudes, and Fig. 13 compares the proposed ANN-PID and the conventional CC-PID responses for different disturbances. Various performance metrics for the ANN-PID are computed in Table 2, and the collective comparison between the conventional and proposed methods is given in Table 3.

5 Conclusion

The major disadvantages of conventional offline PID controllers are overcome in this paper by introducing ANN-based intelligent concepts into the PID controller design. The proposed controller can cope with online disturbances and protects the system from all kinds of nonlinearities.


Fig. 8 System responses with conventional tuning methods

Fig. 9 System response with CC-PID with and without a disturbance


Fig. 10 Simulink model for conventional CC-PID vs proposed ANN-PID analysis

Fig. 11 Regression plot of ANN training

Table 3 indicates the usefulness of the proposed control method over the conventional methods. Hence, the proposed controller ensures robust and stable system operation under all conditions.


Fig. 12 Response of the system with ANN-PID for step disturbances


Fig. 13 Comparison of ANN-PID and CC-PID system responses for different disturbances: a repeating sequence disturbance, b band-limited white noise disturbance, c sinusoidal disturbance, d uniform random disturbance

Table 2 Time-domain performance of the system response with the ANN-PID controller

| Applied disturbance | Delay time (s) | Rise time (s) | Settling time (s) | Overshoot (%) |
|---|---|---|---|---|
| 0.1% of set point | 7.546 | 16.478 | 39.06 | 1.79 |
| 0.05% of set point | 9.24 | 16.786 | 39.10 | 1.75 |
| 0 | 9.548 | 17.094 | 40 | No overshoot |
| −0.05% of set point | 8.008 | 17.71 | 40.52 | No overshoot |
| −0.1% of set point | 8.162 | 17.864 | 41.5 | No overshoot |


Table 3 Comparison of the conventional and proposed systems using time-domain parameters

| Disturbance type | Performance metric | Conventional system | Proposed system | Gain |
|---|---|---|---|---|
| No disturbance | Delay time (s) | 11.214 | 9.584 | 1.63 |
| | Rise time (s) | 82.236 | 17.094 | 65.14 |
| | Settling time (s) | 322.96 | 40 | 282.9 |
| | Peak overshoot (%) | 0 | 0 | 0 |
| | Steady state error (%) | 0 | 0 | 0 |
| Step of 10% of input | Delay time (s) | 49.518 | 7.546 | 41.91 |
| | Rise time (s) | 9.17 | 16.478 | −7.31 |
| | Settling time (s) | 312.975 | 39.06 | 273.9 |
| | Peak overshoot (%) | 0 | 1.79 | −1.79 |
| | Steady state error (%) | 0 | 0 | 0 |
| Step of −10% of input | Delay time (s) | 92.617 | 8.162 | 84.45 |
| | Rise time (s) | 10.087 | 17.864 | −7.78 |
| | Settling time (s) | 383.526 | 41.5 | 342.1 |
| | Peak overshoot (%) | 0 | 0 | 0 |
| | Steady state error (%) | 0 | 0 | 0 |
| Interpolating repeated sequence | Delay time (s) | 11.004 | 13.755 | −2.75 |
| | Rise time (s) | 79.779 | 17.423 | 62.36 |
| | Settling time (s) | 276.85 | 100.638 | 176.2 |
| | Peak overshoot (%) | 0 | 3.96 | −3.96 |
| | Steady state error (%) | 0 | 0 | 0 |
| Random number with uniformity | Delay time (s) | 9.887 | 8.253 | 1.634 |
| | Rise time (s) | 50.435 | 15.589 | 34.85 |
| | Settling time (s) | – | 69.855 | −69.8 |
| | Peak overshoot (%) | 7.45 | 0 | 7.45 |
| | Steady state error (%) | 0.0173 | 0 | 0.017 |
| Sinusoid of 15 rad/sec | Delay time (s) | 11.004 | 9.17 | 1.834 |
| | Rise time (s) | 80.696 | 16.506 | 64.19 |
| | Settling time (s) | 360.50 | 87.25 | 273.3 |
| | Peak overshoot (%) | 0 | 4.1 | −4.1 |
| | Steady state error (%) | 0 | 0 | 0 |
| White noise (band limited) | Delay time (s) | 6.0975 | 10.087 | −3.99 |
| | Rise time (s) | 95.121 | 17.423 | 77.69 |
| | Settling time (s) | 378.652 | 41.34 | 337.3 |
| | Peak overshoot (%) | 0 | 1.5 | −1.5 |
| | Steady state error (%) | 0.0073 | 0 | 0.007 |

161

References 1. Antony, A.P., Varghese, E.: Comparison of performance indices of PID controller with different tuning methods. Int. Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1–6, Nagercoil (2016) 2. Kucherov, D., Kozub, A., Rasstrygin, A.: Setting the PID controller for controlling quadrotor flight: a gradient approach. IEEE 5th Int. Conference on Methods and Systems of Navigation and Motion Control (MSNMC), pp. 90–93, Kiev (2018) 3. Surya, S., Singh, D.B.: Comparative study of P, PI and PID Controllers for operation of a pressure regulating valve in a blow-down wind tunnel. IEEE Int. Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), pp. 1–3, Manipal, India (2019) 4. Patel, H.R., Shah, V.A.: Comparative study between fractional order PIλ Dμ and integer order PID Controller: a case study of coupled conical tank system with actuator faults. 2019 4th Conference on Control and Fault Tolerant Systems (SysTol), pp. 390–396, Casablanca, Morocco (2019) 5. Charef, M., Charef, A.: Fractional order controller based on the fractionalization of PID controller. 5th Int. Conference on Electrical Engineering – Boumerdes (ICEE-B), pp. 1–5, Boumerdes (2017) 6. Durgadevi, K., Karthik, R.: Performance Analysis of Zeta Converter Using Classical PID and Fractional Order PID Controller. 2018 Int. Conference on Power, Energy, Control and Transmission Systems (ICPECTS), pp. 312–317, Chennai, India (2018). 7. Qi, W., Xiao, J.: Fuzzy predictive control for a tubular heat exchanger system based multiple models strategy. 2013 5th Int. Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 220–223, Hangzhou (2013) 8. Al-Dhaifallah, M.: Heat exchanger control using fuzzy fractional-order PID. 16th Int. MultiConference on Systems, Signals and Devices (SSD), pp. 73–77, Istanbul, Turkey (2019) 9. Zhou, H.: Simulation on temperature fuzzy control in injection mould machine by Simulink. 2008 IEEE Int. Conference on Networking, Sensing and Control, pp. 123–128, Sanya (2008) 10. Kumar, A., S. Pan, S.: A PID Controller Design Method using Stability Margin with Transient Improvement Criteria. 4th Int. Conf. Electrical Energy Systems, pp. 506–510, Chennai, India (2018). 11. Bharath Kumar, V., Charan, G., Pavan Kumar, Y.V.: Design of robust PID Controller for improving voltage response of a cuk converter. Innov. Elect. Elect. Eng. Springer Lect Notes Elect. Eng. 661, 301–308 (2020) 12. Liu, J., Tao, X., Ma, X., Feng, K., Chen, J.: Fuzzy controllers with neural network predictor for second-order linear systems with time delay. IEEE Access 8, 206049–206062 (2020) 13. Gueye, D., Ndiaye, A., Diao, A.: Adaptive Controller based on neural network artificial to improve three-phase inverter connected to the grid. 2020 9th Int. Conference on Renewable Energy Research and Application (ICRERA), pp. 72–77, Glasgow, United Kingdom (2020) 14. Ramirez, H.J., Juarez, S.O.U., Hernandez, G.L., Hernandez, R.A., Olivares, D.R.S.: Voltage control base on a back-propagation artificial neural network algorithm. 2020 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1–6, Ixtapa, Mexico (2020) 15. Sandeep Rao, K., Siva Praneeth, V.N., Pavan Kumar, Y.V., John Pradeep, D.: Investigation on various training algorithms for robust ANN-PID controller design. Int. J. Sci. Technol. Res. 9(02) (2020)

A Novel Partitioning Algorithm to Process Large-Scale Data Indradeep Bhattacharya

and Shibakali Gupta

Abstract In mathematics and graph theory, the graph partitioning problem defines the reduction of graphs into smaller graphs by partitioning its set of nodes into mutually exclusive groups. Fundamentally, finding a partition that simplifies a graph is very hard to figure out and hence, the problem falls under the NP-hard category. Numerous algorithms and mechanisms exist for the evaluation of graph partitioning. In this paper, we have discussed a novel partitioning process named pairwise partitioning, where the prime focus is to derive all possible pairs of vertices to simplify any graph. The process of partitioning has been developed using an equality-inequality mechanism. The entire graph will be decomposed into meaningful pairs represented through certain sets where each set must contain distinct pairs of vertices. Nowadays, data management is an essential task to be performed. To deal with a large amount of data, it should be very difficult and next to impossible. In this situation, our proposed algorithm (pairwise partitioning algorithm) can provide a good solution. As we can simplify a graph in the form of pairs, so, a better understanding should be developed to analyze any graph. Throughout this paper, we have discussed the working principle of the pairwise partitioning algorithm and its significant impact on big data analysis. Keywords Pairwise-partitioning · Equality-inequality detection · Essential vertex · Central node

1 Introduction Consider a graph G (V, E), where V denotes the set of n vertices and E denotes the set of edges. Compute the vertex partitioning with V = v0 ∪ v1 ∪ v2 ∪ … ∪ vn-1 I. Bhattacharya (B) · S. Gupta University Institute of Technology, The University of Burdwan, Bardhaman 713104, West Bengal, India e-mail: [email protected] S. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_15

163

164

I. Bhattacharya and S. Gupta

such that vi ∩ vj = , for i = j, vi ~ vj (roughly balanced), Ecut is equivalent to {(u, v) | u ∈ vi and v ∈ vj }, and this Ecut should be minimized. Based on this theory of partitioning, Zhang et al. [1] proposed a streaming graph partitioning algorithm named, Akin. Their prime focus was to provide a better solution that fits the need of streaming graph partitioning in a distributed system. To reduce the edge-cut ratio, they exploited the similarity measure on the degree of vertices to gather structurally related vertices in the same partition as much as possible. In their research, they had shown clearly that their proposed algorithm was able to achieve preferable partitioning quality in terms of edge-cut ratio during maintaining a reasonable balance between all partitions. Their research-oriented improvements should be applicable to all reallife graphs. Upon analysis of the graph partitioning algorithm, Patwary et al. [2] proposed another streaming graph partitioning algorithm called WStream. It is a window based streaming graph partitioning algorithm based on edge-cut partitioning methodology. The primary goal of this algorithm is to distribute a vertex among the partitions. The authors claimed that this algorithm must be able to partition large graph data efficiently while keeping the load balanced across various partitions, and communication to a minimum. In a study of big data management, Wang et al. [3] have discussed about more scalable, effective, and low complexity approach to produce high-quality dataset partitions with a lesser number of links between partitions. They displayed experimentally that it works well in diminishing the communication cost of query processing. To enhance the versatility and expressivity of graphs, Nisar et al. [4] had studied the impact of different graph partitionings on run time and network I/O and claimed drastic reductions in network traffic. In this work, we have provided a novel concept to simplify any undirected graph through a pairwise partitioning algorithm. We aim to figure out all possible pairs from any given graph, then with the assistance of an equality-inequality detection mechanism these pairs should be mapped into their equivalent set. Initially, the vertices in a graph should be grouped according to their degree. After making those groups, a cartesian product should be applied between each group and with itself to get all possible pairs of vertices. Throughout the algorithm we use an undirected graph for our experiment so, if any pair (a, b) will exist inside a group; (b, a) cannot reside in that group as it represents the same edge. In this way, the resultant group (modified) should be formed. In the next few sections, we have discussed the concept behind pairwise partitioning, its pseudocode, and its impact on large scale data processing.

2 Concept Behind Pairwise Partitioning As we have discussed earlier that, initially we must consider the degree of each vertex in a graph and then create groups or sets according to the degree equivalency [5]. The equation referenced underneath generates the degree of each vertex: λ(v) = k

(1)

A Novel Partitioning Algorithm to Process Large-Scale Data

165

Here, v is any vertex, and k should be any positive integer that holds the degree of each vertex in a graph G (V, E). Assume, a graph comprises five vertices where, λ (v1 ) = λ (v3 ) = a, and λ (v2 ) = λ (v4 ) = λ (v5 ) = b, where a, b should be defined as the degree the respective vertices. In this case, two sets can be formed, and they are S1 = {v1 , v3 }, and S2 = {v2 , v4 , v5 }. After the creation of these sets with the help of Eq. 1, the cartesian product between each group and with itself must be applied to get all possible pairs from the graph. The formula (Eq. 2) referenced underneath makes our understanding even clearer. n 

{(Si × Si ) ∪

i=1

n−1 

(Si × S j+1 )}

(2)

j=1

To remove symmetric pairs from the resultant set R, we have modified Eq. 2 mentioned below. n  i=1

{(Si × Si ) ∪

n−1  j=1

(Si × S j+1 )} −

n n−1   { (S j+1 × Si )}

(3)

i=1 j=1

Equation 3 is sufficient to extract all possible pairs from any given graph. For each pair (x, y) in the resultant set R, we must seek for reflexive pairs (i.e. (x, x) or (y, y)), and then by removing those pairs from R we should get the optimized R set for further experiment. Now, for each pair in the optimized R set, the equalityinequality mechanism should be applied to create partitions for those pairs in R. In the upcoming sections, we have discussed the pseudocode of this partitioning method and how large-scale data should be managed with the assistance of partitioning table.

3 Pseudocode of Pairwise Partitioning In this section, we have shown the pseudocode of the pairwise partitioning process to make our understanding clearer.

166

I. Bhattacharya and S. Gupta

3.1 Partitioning Table Generation Procedure The creation of a partitioning table is the fundamental concept to manage the large volume of data. In this section, we have demonstrated the procedure of the creation of the partitioning table which has been mentioned already in our algorithm.

A Novel Partitioning Algorithm to Process Large-Scale Data

167

The above diagram (Fig. 1) depicts the initial representation data in the form of an undirected graph. Now, our task is to create meaningful partitions of the above graph. According to the algorithm, λ (v1 ) = 3, λ (v2 ) = 2, λ (v3 ) = 3, λ (v4 ) = 2, λ (v5 ) = 3, λ (v6 ) = 4, and λ (v7 ) = 3. Therefore, according to λ-equivalence we can create three groups or sets such as S1 = {v1 , v3 , v5 , v7 }, S2 = {v2 , v4 }, and S3 = {v6 }. Applying Eq. 3 and eliminating reflexive pairs we must get an optimized R set (Roptimized ). For this instance, Roptimized = {(v1 , v3 ), (v1 , v5 ), (v1 , v7 ), (v3 , v5 ), (v3 , v7 ), (v2 , v4 ), (v1 , v4 ), (v3 , v2 ), (v5 , v2 ), (v7 , v4 ), (v2 , v6 ), (v4 , v6 ), (v1 , v2 ), (v3 , v4 ), (v5 , v4 ), (v7 , v2 ), (v1 , v6 ), (v3 , v6 ), (v5 , v6 ), (v7 , v6 ), (v5 , v7 )}. All the pairs within Roptimized are distinct. For each pair (x, y) ∈ Roptimized , the equality-inequality mechanism should be applied to get the respective partitioning set for each pair. An equality flag has been used which displays either 0 or 1 depending on the adjacency value of (x, y). If the equality flag becomes 1, we should perform a logical X-NOR operation (as equality detector) otherwise X-OR operation (as inequality detector) should be performed [6]. For this instance, let us suppose (v1 , v3 ) ∈ Roptimized , and adj (v1 , v3 ) = 1. Therefore, the equality flag becomes 1 so, we must go for the equality detection operation i.e. X-NOR. To do this, we should have to convert the degree of each vertex inside any specific pair to its equivalent binary. Here, binary (λ (v1 )) = 0011; and binary (λ (v3 )) = 0011. Therefore, [binary (λ (v1 )) = 0011]  [binary (λ (v3 )) = 0011] = 15 so according to the algorithm, (v1 , v3 ) pair should be inserted into P15 partitioning set. In the case of (v2 , v6 ), the equality flag becomes 0, as they are not related to each other. Therefore, an X-OR operation should be performed. Here, [binary (λ (v2 )) = 0010] ⊕ [binary (λ (v6 )) = 0100] = 6, so, the pair (v2 , v6 ) should be inserted to P6 . This process will continue until all the pairs (in Roptimized ) will get their respective partitioning sets. The table referenced underneath (Table 1) simplifies the initial representation of data (shown in Fig. 1) and we will get the respective partitions for it. The above table simplifies the entire representation, as we have successfully partitioned the graph (Fig. 1) in the form of pairs. As, P0 ∪ P1 ∪ P6 ∪ P8 ∪ P14 ∪ P15 = Roptimized , and P0 ∩ P1 ∩ P6 ∩ P8 ∩ P14 ∩ P15 = . Therefore, the partitioning of the above graph has been achieved successfully (Proved). Fig. 1 Undirected graph with random edge distribution

168

I. Bhattacharya and S. Gupta

Table 1 Partitioning table of Fig. 1 Partitioning set

Selected pairs

Status

P0

{(v1 , v5 ), (v1 , v7 ), (v3 , v5 ), (v3 , v7 ), (v2 , v4 )}

Non-essential partitioning set

P1

{(v1 , v4 ), (v3 , v2 ), (v5 , v2 ), (v7 , v4 )}

Non-essential partitioning set

P6

{(v2 , v6 ), (v4 , v6 )}

Non-essential partitioning set

P8

{(v1 , v6 ), (v3 , v6 ), (v5 , v6 ), (v7 , v6 )}

Essential partitioning set

P14

{(v1 , v2 ), (v3 , v4 ), (v5 , v4 ), (v7 , v2 )}

Essential partitioning set

P15

{(v1 , v3 ), (v5 , v7 )}

Essential partitioning set

4 Importance of Partitioning Table on Big Data Management Big data is a field of modern computer science [7] that treats ways to analyze large scale data, systematically extracts information, or otherwise working with data sets that are too large or complex to be dealt with. In a big data management system [8], one of the biggest problems is storage. The access time (t) is inversely proportional to the size of storage (s) [9]. Therefore, if the volume of a data set is sufficiently large then, we must spend a huge amount of time accessing useful information from it. On this occasion, our novel partitioning algorithm provides some important concepts to solve this problem. As we have discussed earlier that, an undirected graph can be used to represent the relationships among data [10]. The vertices which are connected to others by an edge must carry necessary information about each other. If the size of the input graph is very large then, the space complexity [11] will be more and it is very difficult to extract useful information from that graph. The pairwise partitioning algorithm could be applied to resolve this issue. According to the structure of the partitioning table, it can be further divided into essential and non-essential partitioning sets. The essential partitioning set contains those pairs (x, y) where, adj (x, y) = 1, and in the case of non-essential partitioning set, adj (x, y) = 0. Therefore, instead of storing the entire graph into memory, we can store each partition. These partitions could be stored inside memory in a non-contiguous manner so, to access these partitions, we must follow the circular doubly linked list [12] architecture (shown in Fig. 2). Some additional information (which is not essential) should be kept inside non-essential partitioning sets. The size of each partition is always lesser than the entire representation as it has been designed in the form of pairs, which simplifies the initial arrangement of data. The above architecture could be followed to diminish the space complexity when dealing with large scale data, even when the partitions are non-contiguous in memory. In the next section, we have shown the mathematical derivation to extract the most essential vertex from the initial representation of data. The extracted essential vertex should help to understand the graph in a better way.


Fig. 2 Basic architecture of partitioning sets in memory

4.1 Important Observations Regarding the Essential Vertex and Result Analysis

Let Ep denote the set of essential pairs and NEp denote the set of non-essential pairs. Let α be a well-formed formula stated as follows:

α = [(∃v) P(v) ∨ (∃u) P(u)]

(4)

Here, P(v): v ∈ (u, v) is a vertex with maximum occurrences in Ep, and P(u): u ∈ (u, v) is a vertex with maximum occurrences in Ep. The well-formed formula α is satisfiable for some vertex belonging to (u, v) in Ep. It derives the most important vertex from the set of all possible essential pairs. During the experiment, we observed an important result: if α is satisfiable for some vertex v ∈ (u, v), then v is the central node of the graph G. The maximum occurrence of a vertex is calculated using max(λ(vi)) for i = 1 to n. Let M(v): max(λ(vi)), E(v): v is an essential vertex, and C(v): v is the central node. Then,

(∀v)[M(v) → E(v) ∧ C(v)]

(5)

Equation 5 is the most important observation in our experiment. In the case of Fig. 1, the essential vertex is v6, and according to Eq. 5 it is also the central node. Figure 3 below makes Eq. 5 easier to understand. We have already mentioned that Ep denotes the set of essential pairs; for this experiment, Ep = {P8 ∪ P14 ∪ P15}. Here, P8 ⊂ Ep is the only partitioning set generating the essential vertex (v6), which is also the central node of the initial representation of data (shown in Fig. 1). Table 1 clearly portrays meaningful and simplified partitions of the entire arrangement of large-scale data. If |SPi | denotes


Fig. 3 Vertex_6 as a central node and carries useful information

the size of each partition, then |G| ≥ |SPi|. In our experiment |G| = |V| + |E| = 17, whereas |SP8| = 9, which clearly satisfies the relation |G| > |SP8|. As the size of each partitioning set is smaller than |G|, accessing information from each partition becomes easier and less time-consuming.

5 Conclusions and Future Scope

Nowadays, data is important because it helps to make better decisions. Any business organization with a website, a social media presence, or electronic payments of any kind is collecting data about customers, their daily habits, web traffic, demographics, and so on. Initially, a graph-based model can be used to represent the relationships among data; however, in the long run the entire structure cannot be stored in memory, otherwise the space complexity becomes too high. In this context, the pairwise partitioning algorithm gives an efficient solution to reduce space complexity. The partitioning table plays an important role in managing large-scale data, as it has been designed in the form of pairs. These pairs are further sub-divided into essential and non-essential partitioning sets. Currently required information is stored inside essential partitioning sets, while additional information, which may be required in the future, is stored inside non-essential partitioning sets. The following are a few areas for future research:
• Conversion of data relationships into computable square grids or lattice graphs.
• The hierarchical arrangement of information through simplified components, i.e., extracting useful information not only from essential vertices but also from connected components arranged in hierarchical form.


References

1. Zhang, W., Chen, Y., Akin, D.D.: A streaming graph partitioning algorithm for distributed graph storage systems. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 183–192. IEEE, Washington, DC, USA (2018)
2. Patwary, M., Garg, S., Kang, B.: Window-based streaming graph partitioning algorithm. In: Proceedings of the Australasian Computer Science Week Multiconference, pp. 1–10. Association for Computing Machinery, United States (2019)
3. Wang, R., Chiu, K.: A stream partitioning approach to processing large scale distributed graph datasets. In: 2013 IEEE International Conference on Big Data, pp. 537–542. IEEE, Silicon Valley, CA, USA (2013)
4. Nisar, M., Fard, A., Miller, J.A.: Techniques for graph analytics on big data. In: 2013 IEEE International Congress on Big Data, pp. 255–262. IEEE, Santa Clara, CA, USA (2013)
5. Tripathi, A., Tyagi, H.: A simple criterion on degree sequences of graphs. Discrete Appl. Math. 156(18), 3513–3517 (2008)
6. Uthayakumar, T., Vasantha, R., Raja, J., Porsezian, K.: Realization of all-optical logic gates through three core photonic crystal fiber. Opt. Commun. 296(1), 124–131 (2013)
7. Zhang, H., Chen, G., Ooi, B.C., Tan, K., Zhang, M.: In-memory big data management and processing: a survey. IEEE Trans. Knowl. Data Eng. 27(7), 1920–1948 (2015)
8. Bou-Harb, E., Debbabi, M., Assi, C.: Big data behavioral analytics meet graph theory: on effective botnet takedowns. IEEE Netw. 31(1), 18–26 (2017)
9. Park, Y., Cho, K., Bahn, H.: Challenges and implications of memory management systems under fast SCM storage. In: 6th International Conference on Information Science and Control Engineering (ICISCE), pp. 190–194. IEEE, Shanghai, China (2019)
10. Richiardi, J., Van De Ville, D., Riesen, K., Bunke, H.: Vector space embedding of undirected graphs with fixed-cardinality vertex sequences for classification. In: 20th International Conference on Pattern Recognition, pp. 902–905. IEEE, Istanbul, Turkey (2010)
11. Liu, J., Liang, Y., Ansari, N.: Spark-based large-scale matrix inversion for big data processing. IEEE Access 4, 2166–2176 (2016)
12. Gupta, K.G.: Dynamic implementation using linked list. Int. J. Eng. Res. Manage. Technol. 1(5), 44–48 (2014)

Segmentation of Blood Vessels, Optic Disc Localization, Detection of Exudates, and Diabetic Retinopathy Diagnosis from Digital Fundus Images Soham Basu, Sayantan Mukherjee, Ankit Bhattacharya, and Anindya Sen

Abstract Diabetic Retinopathy (DR) is a complication of long-standing, unchecked diabetes, and one of the leading causes of blindness in the world. This paper focuses on improved and robust methods to extract some of the features of DR, viz., Blood Vessels and Exudates. Blood vessels are segmented using multiple morphological and thresholding operations. For the segmentation of exudates, k-means clustering and contour detection on the original images are used. Extensive noise reduction is performed to remove false positives from the vessel segmentation algorithm’s results. The localization of optic disc using k-means clustering and template matching is also performed. Lastly, this paper presents a Deep Convolutional Neural Network (DCNN) model with 14 Convolutional Layers and 2 Fully Connected Layers, for the automatic, binary diagnosis of DR. The vessel segmentation, optic disc localization and DCNN achieve accuracies of 95.93%, 98.77%, and 75.73%, respectively. Keywords Image processing · Artificial intelligence · Image segmentation · Deep learning · Convolutional neural network · Image classification · Template matching · Diabetic retinopathy · Blood vessels · Exudates · Optic disc · Fundus images

S. Basu (B) · A. Sen Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata 700107, West Bengal, India e-mail: [email protected] A. Sen e-mail: [email protected] S. Mukherjee Tata Consultancy Services, Delta Park, Kolkata 700091, West Bengal, India e-mail: [email protected] A. Bhattacharya Tata Consultancy Services, Ecospace Business Park, Kolkata 700156, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_16


Fig. 1 Different features in a typical DR image

1 Introduction 1.1 Diabetic Retinopathy Diabetic Retinopathy is a direct consequence of prolonged, unchecked diabetes, wherein the retinal blood vessels get damaged and leak fluid into the retina. If left untreated, DR can eventually lead to total blindness. DR can be classified as Mild, Moderate, Severe, and Proliferative Diabetic Retinopathy (PDR). These stages can be identified by the presence and extent of certain features (Fig. 1).

1.2 Motivation Ophthalmologists identify Diabetic Retinopathy based on features like blood vessel area, soft and hard exudates, hemorrhages, cotton wool spots, and microaneurysms. Automatic extraction of these features from fundus images helps in the quick and early diagnosis of DR. PDR is easily identified by studying the abnormal pattern of retinal blood vessels.

1.3 Proposed Methods The proposed algorithm utilizes the structure and contrast of the darker blood vessels with respect to the brighter background and aims to efficiently and accurately segment the vessels from retinal fundus images.


Next, the structural profile of the Optic Disc is used to generate a template and the images are matched with this template to calculate the similarity between the two. The exudates detection method performs k-means clustering to cluster the different intensities in the original image, and extract the pixels with the highest intensities. Finally, the proposed DCNN employs 14 convolutional layers to generate feature maps from images and predict the correct labels for the diagnosis of DR.

2 Background Wang et al. [1] demonstrated the use of two classifiers—Convolutional Neural Network (CNN) and Random Forest (RF), which can automatically learn features from raw images and predict patterns, by combining feature learning and traditional learning. Zhang et al. [2] proposed an algorithm which classifies vessel pixels using a texton dictionary. It focused more on the thin vessel regions which increased its sensitivity. However, non-vessel pixels may be recognized as vessel pixels, thereby decreasing accuracy and specificity. Singh and Srivastava [3] used entropy-based optimal thresholding and length filtering, while Al-Diri et al. [4] used active contours. Abbadi et al. [5] used the grey levels of the OD to approximate its boundary. Abdullah and Fraz [6] used grow-cut, Mary et al. [7] used active contours and Marin et al. [8] used thresholding on morphologically transformed images. Some use the green channel, the red channel, or a combination of both. These algorithms fail due to the poor contrast of the OD or saturation due to overexposure in the red channel. Besides, the shapes and sizes of exudates may often be comparable to that of the OD. Liu et al. [9] used thresholding and region growing to detect exudates. Long et al. [10] used Dynamic Thresholding and SVM Classification, while Ege et al. [11] used Bayesian, Mahalanobis, and nearest neighbor classifiers for the same. Lam et al. [12] proposed the concept of transfer learning using pre-trained neural networks like GoogLeNet and AlexNet from ImageNet. Pratt et al. [13] proposed another CNN model which was trained using Kaggle’s DR database. However, it could only be trained on a high-end GPU to achieve acceptable results.

3 Materials and Methods 3.1 Hardware and Libraries Algorithms proposed in this paper were coded in Python (version 3.6.9) using OpenCV and Scikit-learn libraries in Jupyter notebooks [14, 15]. TensorFlow Keras was used to build the DCNN for DR diagnosis. The Jupyter notebooks were executed on virtual machines provided by Google Colaboratory (standard runtime) which


Fig.2 a Original Image. b Green channel component of (a). c CLAHE applied image. d Background estimated after Alternate sequential filtering. e Image (d) subtracted from (c) and CLAHE applied again. f Median blur and thresholding. g Final segmentation output

consist of Intel® Xeon single core, 2.3 GHz CPUs, and around 12.5 GB of available RAM. The DCNN was trained on the GPU runtime, with a single Tesla K80 GPU.

3.2 Datasets For the Blood Vessel Segmentation, we have used the DRIVE dataset [16]. It is an openly available dataset consisting of 40 images of size 565 × 584 pixels, split equally among training and test sets. Thirty-three out of 40 images are without signs of DR and seven images are with the signs of DR. The proposed method has been evaluated on the test set and reported in this paper. The IDRiD [17] Segmentation dataset was used for the Localization of the Optic Disc and detection of hard exudates. It is a publicly available dataset consisting of 81 images of 4288 × 2848 pixels each, split into training and test sets of 54 and 27 images, respectively. The IDRiD Disease Grading dataset was used for training and testing the DCNN for the diagnosis of Diabetic Retinopathy. It is also a publicly available dataset consisting of 516 images, split into training and test sets of 413 and 103 images, respectively.


3.3 Proposed Methods

Blood Vessel Segmentation. The green channel of the original RGB image (Fig. 2b) was selected because the retina is most sensitive to the green wavelength of light and the vessels have the best contrast in the green channel. Contrast Limited Adaptive Histogram Equalization (CLAHE) was applied to the green channel image to create better local contrast of the vessel pixels (Fig. 2c). The image was passed through four iterations of Alternate Sequential Filtering, where elliptical kernels of sizes (5, 5), (7, 7), (15, 15), (11, 11) were applied to the respective iterations (Fig. 2d). The result was subtracted from the image in Fig. 2c to generate a rough outline of the vessels, removing the background features. CLAHE was applied a second time to create an even better contrast against most other background features (Fig. 2e). The image was then passed through a Median Filter with a kernel size of (3, 3) to filter out salt and pepper noise. Thresholding was done based on the average intensity value (Fig. 2f). Contour detection was then performed to remove larger, isolated specks. The final image with segmented vessels is shown in Fig. 2g. The algorithm's steps are shown in Fig. 3.

Contour Detection. Contour detection employs Green's theorem. If C is a simple, closed curve, D is the plane region enclosed by C, and P and Q are functions of (x, y) defined on an open region containing D (given that their partial derivatives exist in that region), then Green's theorem can be stated as:

\oint_C (P\,dx + Q\,dy) = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy \qquad (1)

where the path of integration along the curve C is counterclockwise. Localization of Optic Disc. The proper localization of the OD is an essential step in the segmentation of exudates from fundus images, because of the comparable grey intensity levels of the OD and the exudates. The only difference is the irregular structure and (usually) smaller sizes of the exudates. The proposed method resized a colored fundus image to 300 × 300 pixels and performed k-means clustering on its grayscale counterpart (Fig. 4c). The k-means algorithm minimizes the squared difference between ‘k’ color centers (or means) and the respective pixel values (color values) in the image. If xi is the i-th color center and p j is the color value of the j-th pixel in the image, then the squared error function

Fig. 3 Blood vessel segmentation flow
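A minimal OpenCV sketch of the flow in Fig. 3 is given below. It is an illustration under stated assumptions: absdiff is used so the vessel outline comes out bright regardless of subtraction order, and the CLAHE clip limit and the speck-area cutoff of 10 pixels are arbitrary illustrative values, not the paper's tuned parameters.

```python
import cv2

def segment_vessels(bgr_image):
    green = bgr_image[:, :, 1]                                   # green channel (Fig. 2b)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)                                # CLAHE (Fig. 2c)
    background = enhanced
    for k in (5, 7, 15, 11):                                     # alternate sequential filtering (Fig. 2d)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
        background = cv2.morphologyEx(background, cv2.MORPH_OPEN, kernel)
        background = cv2.morphologyEx(background, cv2.MORPH_CLOSE, kernel)
    outline = cv2.absdiff(enhanced, background)                  # rough vessel outline
    outline = clahe.apply(outline)                               # second CLAHE pass (Fig. 2e)
    outline = cv2.medianBlur(outline, 3)                         # remove salt-and-pepper noise
    _, binary = cv2.threshold(outline, int(outline.mean()), 255, cv2.THRESH_BINARY)
    # contour detection to drop larger isolated specks (Fig. 2g)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 10:
            cv2.drawContours(binary, [c], -1, 0, thickness=-1)
    return binary
```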


Fig.4 a Original image. b Grayscale of (a). c Result of k-means clustering. d Generated template. e Template matching result (using NCCOEFF; notice the OD region has the highest similarity). f Marking OD and its center. g Masking OD region

J_{SE} to be minimized by k-means is given by

J_{SE} = \sum_{i=1}^{k} \sum_{j=1}^{n} \left\| x_i - p_j \right\|^2 \qquad (2)

A template of size comparable to the average size of the optic disc in all the images was generated (Fig. 4d) and matched with the image, using the Normalized Correlation Coefficient (NCCOEFF) method (Fig. 4e). NCCOEFF is computed as:

R(x, y) = \frac{\sum_{x', y'} T'(x', y') \cdot I'(x + x', y + y')}{\sqrt{\sum_{x', y'} T'(x', y')^2 \cdot \sum_{x', y'} I'(x + x', y + y')^2}} \qquad (3)

The location of maximum values from the template-matched result was extracted. This indicates the approximate center of the Optic Disc in the image (Fig. 4f). The algorithm’s steps are shown in Fig. 5. Detection of Exudates. Exudates are small, yellowish deposits located on the outer layers of the retina, formed as a result of protein leakage from the retinal vessels. Initially, k-means clustering was performed on the original image (Fig. 6b), and the cluster with the highest intensities was extracted and binarized. This extracts the exudates (including the largest ones) along with the OD (Fig. 6c). Canny Edge [18] and Contour detection were then sequentially performed on the green channel image to filter out large structures. Thresholding was applied to create a binary image, where the pixels with maximum grey level intensities were considered. This extracts the small exudates with the OD. The images containing the small and large exudates were then logically added (Fig. 6d). The OD was detected in the grayscale image using the


Fig. 5 Optic disc localization flow
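As an illustration of the flow in Fig. 5, the following OpenCV sketch clusters grey intensities with k-means and then matches a disc-like template with the normalized correlation coefficient of Eq. (3). The number of clusters and the template size are assumptions made for the example.

```python
import cv2
import numpy as np

def locate_optic_disc(bgr_image, template, k=4):
    small = cv2.resize(bgr_image, (300, 300))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # k-means clustering of the grey intensities (Fig. 4c)
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, k, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    clustered = centers[labels.flatten()].reshape(gray.shape).astype(np.uint8)
    # template matching with the normalized correlation coefficient (Fig. 4e)
    response = cv2.matchTemplate(clustered, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(response)
    th, tw = template.shape[:2]
    return (max_loc[0] + tw // 2, max_loc[1] + th // 2)  # approximate OD centre (Fig. 4f)

# a bright circular template roughly the size of the OD in a 300 x 300 image
disc = np.zeros((60, 60), dtype=np.uint8)
cv2.circle(disc, (30, 30), 25, 255, -1)
```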

Fig. 6 a Original image. b K-means clustering result. c Extracting the exudates from (b) and thresholding. d Logical OR of (c) and the images containing the smallest exudates. e Final segmentation result after OD masking

aforementioned method, and a circular, black mask was placed over it, in the image in Fig. 6d. This yields the final segmentation result (Fig. 6e). The algorithm’s steps are shown in Fig. 7. Binary Diagnosis of Diabetic Retinopathy using Deep Convolutional Neural Network. The following DCNN was developed for the binary classification of DR. The proposed DCNN architecture was adapted from the VGG-16 [19] architecture and improvised to perform DR diagnosis. It is composed of four stages of convolutional layers (Fig. 8) with a 2 × 2 2D Max Pooling layer between each stage, which

Fig. 7 Exudates detection flow
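The exudates stage in Fig. 7 can be sketched in the same style. The snippet below is illustrative only: the cluster count, Canny thresholds, intensity cutoff, and OD mask radius are assumed values, not the paper's tuned parameters.

```python
import cv2
import numpy as np

def detect_exudates(bgr_image, od_center, od_radius=40, k=4):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # k-means on the image intensities; keep the brightest cluster (large exudates + OD)
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(samples, k, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    brightest = int(np.argmax(centers))
    large = (labels.reshape(gray.shape) == brightest).astype(np.uint8) * 255
    # small exudates: Canny edges and a high-intensity threshold on the green channel
    green = bgr_image[:, :, 1]
    edges = cv2.dilate(cv2.Canny(green, 50, 150), None)
    _, bright = cv2.threshold(green, int(green.max()) - 15, 255, cv2.THRESH_BINARY)
    small = cv2.bitwise_and(bright, edges)
    combined = cv2.bitwise_or(large, small)             # logical OR of both maps (Fig. 6d)
    cv2.circle(combined, od_center, od_radius, 0, -1)   # mask out the optic disc (Fig. 6e)
    return combined
```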

Fig. 8 DCNN architecture with the corresponding kernel/filter sizes (k), number of feature maps (n), and strides (s) specified for each convolutional layer


downsamples the input image by a factor of 2. In each convolutional layer, small 3 × 3 sized kernels were used with Rectified Linear Unit (ReLU) as the activation function. ReLU (x) = max(0, x), x ∈ R

(4)

The original, colored images are resized to 300 × 300 pixels and trained with a batch size of 8. The output of the final convolutional stage is fed to a fully connected layer with 1024 neurons and the ReLU activation function. The final layer is a single neuron with the Sigmoid activation function for binary classification, which is given by

S(x) = \frac{1}{1 + e^{-x}} \qquad (5)

To optimize the network, the Adam [20] optimizer was chosen with a learning rate of 0.0001, along with a Binary Cross-Entropy loss J_{BCE}, which can be computed as:

J_{BCE} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \cdot \log(p(y_i)) + (1 - y_i) \cdot \log(1 - p(y_i)) \right] \qquad (6)

where yi is the actual label, p(yi ) is the predicted probability of yi and m is the number of training/test examples.
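The architecture described above can be sketched in Keras as follows. The per-stage filter counts (32, 64, 128, 256) are assumptions for illustration, since the exact numbers appear only in Fig. 8; the layer arrangement (14 convolutional layers in four stages, two fully connected layers, Adam with learning rate 0.0001, binary cross-entropy) follows the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dr_classifier():
    model = models.Sequential()
    first = True
    # four convolutional stages (2 + 3 + 4 + 5 = 14 conv layers) separated by 2x2 max pooling
    for n_layers, n_filters in [(2, 32), (3, 64), (4, 128), (5, 256)]:
        for _ in range(n_layers):
            if first:
                model.add(layers.Conv2D(n_filters, (3, 3), padding="same",
                                        activation="relu", input_shape=(300, 300, 3)))
                first = False
            else:
                model.add(layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))   # first fully connected layer
    model.add(layers.Dense(1, activation="sigmoid"))   # binary DR diagnosis (Eq. 5)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy",          # Eq. (6)
                  metrics=["accuracy"])
    return model
```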

4 Experimental Results 4.1 Segmented Blood Vessels See Fig. 9 and Table 1. Fig. 9 The segmentation result for the best case (left) with the ground truth (right)


4.2 Localized Optic Disc The proposed Optic Disc Localization algorithm could accurately locate the Optic Disc in 80 of the 81 images in the whole dataset, achieving an accuracy of 98.77% (Fig. 10). Fig. 10 Optic Disc masked image (left) with the original image (right)

Fig. 11 a, b Result of proposed exudates detection method (left) with original image (right)

Fig. 12 DCNN training results, plotting training loss, training accuracy, and test accuracy (y-axis: metric value 0–1) against epochs (x-axis: 0–100)


4.3 Detected Exudates The proposed algorithm was able to identify exudates in all the 81 images in the IDRiD Segmentation dataset quite appreciably (Fig. 11).

4.4 Binary Diabetic Retinopathy Diagnosis The proposed model achieved a maximum training accuracy of 99.27% over 100 epochs. The optimal accuracy on the test set was found to be 75.73% (Fig. 12).

Table 1 Results of proposed vessel segmentation method on DRIVE dataset

Accuracy (%) | Specificity (%) | Sensitivity (%) | Dice coefficient
95.93 | 98.32 | 71.19 | 0.75

Table 2 Results of known vessel segmentation methods on DRIVE dataset (highest in bold)

Method | Year | Accuracy (%) | Specificity (%) | Sensitivity (%)
Azzopardi et al. [21] | 2015 | 94.42 | 97.05 | 76.55
Roychowdhury et al. [22] | 2015 | 95.20 | 98.30 | 72.50
GeethaRamani et al. [23] | 2016 | 95.36 | 97.78 | 70.79
Christodoulidis et al. [24] | 2016 | 94.79 | 95.82 | 85.06
U-net [25] | 2018 | 95.31 | 98.20 | 75.37
R2U-net [25] | 2018 | 95.56 | 98.16 | 77.51
LadderNet [26] | 2018 | 95.61 | 98.10 | 78.56
BCDU-Net [27] | 2019 | 95.60 | 97.86 | 80.07
Sun et al. [28] | 2020 | 95.45 | 97.41 | 82.09
Proposed method | 2020 | 95.93 | 98.32 | 71.19

5 Conclusions

The proposed blood vessel segmentation method performs better than most of the available segmentation techniques (Table 2). It utilizes simple yet highly efficient morphological operations that perform background estimation and segmentation with more precision than existing methods. However, DCNNs are not as prone to noise and usually have higher sensitivity than image processing methods. Methods that rely on the intensity levels of the OD often fail to locate it correctly when the exudates have comparable grey level intensities. The proposed method is very efficient in locating the OD accurately, as it relies on the size and structure of the OD instead of its grey level intensity.


However, it may fail to distinguish between the OD and a large exudate when the size of the latter is comparable to that of the OD. Our exudates detection algorithm extracts not only the largest exudates but also the smaller ones because of the contour detection stage at the end, although it may generate false positives if the images are very unevenly illuminated. The proposed DCNN architecture takes about an hour to train on Google Colab and has a test accuracy of 75.73%, which is better than most of the previous architectures. The biggest advantage of the proposed architecture over existing networks is the significantly lower training time taken to produce acceptable results.

References

1. Wang, S., et al.: Hierarchical retinal blood vessel segmentation based on feature and ensemble learning. Neurocomputing 149, 708–717 (2015)
2. Zhang, L., Fisher, M., Wang, W.: Retinal vessel segmentation using multi-scale textons derived from keypoints. Comput. Med. Imaging Graph. 45, 47–56 (2015)
3. Singh, N.P., Srivastava, R.: Retinal blood vessels segmentation by using Gumbel probability distribution function based matched filter. Comput. Methods Programs Biomed. (2016)
4. Al-Diri, B., Hunter, A., Steel, D.: An active contour model for segmenting and measuring retinal vessels. IEEE Trans. Med. Imaging 28(9), 1488–1497 (2009)
5. Abbadi, N.K., Al-Saadi, E.H.: Automatic detection of exudates in retinal images (2013)
6. Abdullah, M., et al.: Localization and segmentation of optic disc in retinal images using circular Hough transform and grow-cut algorithm. PeerJ 4, e2003 (2016)
7. Mary, M.C.V.S., et al.: An empirical study on optic disc segmentation using an active contour model. Biomed. Signal Process. Control 18, 19–29 (2015)
8. Marin, D., Gegundez-Arias, M.E., Suero, A., Bravo, J.M.: Obtaining optic disc center and pixel region by automatic thresholding methods on morphologically processed fundus images. Comput. Methods Programs Biomed. 118(2), 173–185 (2015)
9. Liu, Z., Chutatape, O., Krishna, S.M.: Automatic image analysis of fundus photograph. IEEE Conf. Eng. Med. Biol. 2, 524–525 (1997)
10. Long, S., Huang, X., Chen, Z., Pardhan, S., Zheng, D.: Automatic detection of hard exudates in color retinal images using dynamic threshold and SVM classification: algorithm development and evaluation. BioMed Res. Int. 2019 (2019)
11. Ege, B.M., et al.: Screening for diabetic retinopathy using computer-based image analysis and statistical classification. Comput. Methods Programs Biomed. 62(3), 165–175 (2000)
12. Lam, C.K., et al.: Automated detection of diabetic retinopathy using deep learning. AMIA Summits Transl. Sci. Proc. 2018, 147–155 (2018)
13. Pratt, H., et al.: Convolutional neural networks for diabetic retinopathy. In: Procedia Computer Science, vol. 90, pp. 200–205. Elsevier B.V. (2016)
14. Bradski, G.: The OpenCV library. Dr. Dobb's J. Softw. Tools (2000)
15. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. JMLR 12 (2011)
16. Digital Retinal Images for Vessel Extraction (DRIVE). https://www.isi.uu.nl/Research/Databases/DRIVE. Accessed 09 Dec 2020
17. Porwal, P., et al.: Indian diabetic retinopathy image dataset (IDRiD). https://doi.org/10.21227/H25W98. Accessed 09 Dec 2020
18. Canny, J.F.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR (2015). arXiv:1409.1556
20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR (2015)
21. Azzopardi, G., et al.: Trainable COSFIRE filters for vessel delineation with application to retinal images. Med. Image Anal. 19(1), 46–57 (2015)
22. Roychowdhury, S., et al.: Blood vessel segmentation of fundus images by major vessel extraction and subimage classification. IEEE J. Biomed. Health Inform. 19, 1118–1128 (2015)
23. GeethaRamani, R., Balasubramanian, L.: Retinal blood vessel segmentation employing image processing and data mining techniques for computerized retinal image analysis. Biocybern. Biomed. Eng. 36, 102–118 (2016)
24. Christodoulidis, A., et al.: A multi-scale tensor voting approach for small retinal vessel segmentation in high resolution fundus images. Comput. Med. Imaging Graph. (2016)
25. Alom, M.Z., et al.: Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation (2018)
26. Zhuang, J.: LadderNet: multi-path networks based on U-Net for medical image segmentation (2018). arXiv:1810.07810
27. Azad, R., et al.: Bi-directional ConvLSTM U-Net with densely connected convolutions. Int. Conf. Comput. Vis. (2019)
28. Sun, X., et al.: Robust retinal vessel segmentation from a data augmentation perspective (2020). arXiv:2007.15883

Interval Type-2 Fuzzy Framework for Healthcare Monitoring and Prediction Uduak Umoh , Samuel Udoh , Abdultaofeek Abayomi , and Alimot Abdulazeez

Abstract The popularity of interval type-2 fuzzy logic systems (IT2 FLSs) in the last decade cannot be overemphasized, as they have shown superior and more accurate performance in many applications. In this paper, we investigate healthcare monitoring and prediction using an interval type-2 fuzzy logic system (IT2FLS) based on Mamdani fuzzy inference. The study also investigates healthcare monitoring and prediction problems using a conventional type-1 fuzzy logic system (T1FLS) for comparison purposes. The empirical comparison was carried out on the developed work using cardiovascular disease patients' health datasets. The study observed that interval type-2 fuzzy logic could cope with more information and could handle more uncertainties in health data compared to its counterpart. Root Mean Squared Error (RMSE) evaluation results of 0.018 and 0.0006 were observed for type-1 fuzzy logic and interval type-2 fuzzy logic respectively in the cardiac shock level prediction experiment, which showed the superiority of the IT2FLS paradigm over T1FLS. Keywords Fuzzy logic · Fuzzy inference · Type-2 fuzzy sets · Fuzzy controller · Defuzzification · Cardiac patient

1 Introduction The increase in popularity of the fuzzy logic systems in problem solving can be attributed to its ability to incorporate human reasoning in its algorithm. The notion of fuzzy sets (FSs) was introduced in [1] as a method of representing uncertainty and vagueness in a way that elements are not limited to 0 or 1membership functions (MFs), instead, it is a continuity between 0 and 1. A type-1 fuzzy system (T1FLS) U. Umoh (B) · A. Abayomi · A. Abdulazeez Department of Computer Science, University of Uyo, PMB 1017, Uyo, Akwa Ibom State, Nigeria e-mail: [email protected] S. Udoh Department of Information and Communication Technology, Mangosuthu University of Technology, P.O. Box 12363 Jacobs, 4026 Durban, South Africa © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_17


can be viewed as a procedure that uses fuzzy set theory to map crisp inputs to outputs [2]. T1FLSs are capable of processing data and information with the use of linguistic variables and can make decisions in the face of imprecision, vagueness, ambiguity and uncertainty. The strength of T1FLSs lies in their ability to characterize unpredictable input/output associations through IF/THEN statements, known as rules [3]. A T1FLS consists of four components, namely the fuzzifier, fuzzy rules, inference engine and defuzzifier [4]. T1FLSs have recorded huge accomplishments in handling many distinct real-world challenges such as classification, regression, control, decision making, prediction and so on [5, 6]. However, T1FLSs cannot adequately cope with or minimise the effects of the uncertainties posed by the complex nature of many real-world problems. Fuzzy logic uncertainties can result from uncertainty in inputs and outputs, linguistic differences, changes in operating conditions and noisy data [7]. To address this problem, [8] recognized this limitation and introduced a higher type of fuzzy set, leading to the type-2 fuzzy logic system (T2FLS) built on type-2 fuzzy sets (T2FSs), an extension of T1FLS in which the MFs are themselves fuzzy, with more degrees of freedom (DoF), and the actual MF grade is assumed to lie inside the closed interval [0, 1]. However, T2FLS suffers from computational complexity; to resolve this, the interval type-2 fuzzy logic system (IT2FLS) is used, a simplified version of T2FLS with reduced computational intensity that makes it quite practicable. Recently, T2FLSs and IT2FLSs have been widely used to cope with uncertainties in prediction problems, and the results are very encouraging [9–14]. In this study, an IT2FLS for a prediction problem is presented. Our motivation is to apply IT2FSs to reduce prediction error in the task of modeling uncertainties in healthcare data. A Mamdani fuzzy inference system (FIS) is employed as the background algorithm. The rest of the paper is structured as follows: Sect. 2 gives the preliminaries and reviews of concepts in T1FLS and IT2FLS. In Sect. 3, the design of the IT2FLS for prediction is carried out. Our model results and conclusion are presented in Sects. 4 and 5, respectively.

2 Preliminaries

2.1 Type-1 Fuzzy Set (T1FS)

Definition 1 A T1FS, given a finite set X, is formulated as A = \{(x, \mu_A(x)) \mid \forall x \in X\}, where the function \mu_A(x) : X \to [0, 1] is the membership degree of x \in X, 0 \le \mu_A(x) \le 1 holds, and X is the domain of definition of the variable [15].


2.2 Type-2 Fuzzy Set (T2FS)

Definition 2 A T2FS takes the form \tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\}, where x \in X and u \in J_x \subseteq [0, 1]; x is the primary variable having X as its domain, and at each x \in X the secondary variable u has domain J_x [12]. x has the primary membership J_x, where the secondary grades equal 1 [10, 11].

2.3 Footprint of Uncertainty (FOU)

Definition 3 The FOU of \tilde{A} is the union of all primary memberships,

FOU(\tilde{A}) = \bigcup_{\forall x \in X} J_x = \{(x, u) : u \in J_x \subseteq [0, 1]\}

\bar{\mu}_{\tilde{A}}(x) \equiv \overline{FOU(\tilde{A})} \quad \forall x \in X \qquad (1)

\underline{\mu}_{\tilde{A}}(x) \equiv \underline{FOU(\tilde{A})} \quad \forall x \in X \qquad (2)

J_x = \{(x, u) : u \in [\underline{\mu}_{\tilde{A}}(x), \bar{\mu}_{\tilde{A}}(x)]\} \qquad (3)

FOU(\tilde{A}) = \bigcup_{\forall x \in X} [\underline{\mu}_{\tilde{A}}(x), \bar{\mu}_{\tilde{A}}(x)] \qquad (4)

Equations (1) and (2) are the two type-1 MFs (upper and lower bounds of FOU(\tilde{A})) representing the upper membership function (UMF) and lower membership function (LMF) of \tilde{A} [16].

2.4 Interval Type-2 Fuzzy Set

Definition 4 An IT2FS is represented as \tilde{A} = 1/FOU(\tilde{A}) = 1/\bigcup_{x \in X} [\underline{\mu}_{\tilde{A}}(x), \bar{\mu}_{\tilde{A}}(x)], where X and U are discrete, and the domain of \tilde{A} is equal to the union of all of its embedded T1 FSs [16].


3 Interval Type-2 Fuzzy Logic System (IT2FL) The IT2FL model is similar to the classical T1FLS with additional components. The structure of IT2FL is made up of the fuzzification unit, rule base, inference engine and output processing (type-reducer and defuzzifier).

3.1 Fuzzifier

The fuzzification process maps the crisp input vector x \in X into an IT2FS A, assigning a degree of membership to each element using a modified triangular MF as in Eq. (5):

\mu_A(x) = \begin{cases} 0, & x \le a \\ \frac{x - a}{b - a}, & a \le x \le b \\ \frac{c - x}{c - b}, & b \le x \le c \\ 0, & x \ge c \end{cases} \qquad (5)

where a, b, and c are the x coordinates of the three vertices of \mu_A(x) in a fuzzy set A; a and c are the lower and upper boundaries at which the membership grade is zero, and b is the centre with membership grade equal to 1. The UMF and LMF of the IT2FL are based on (6) and (7) respectively, where y denotes the left end point, r the right end point, and q the peak point of the corresponding bound:

\bar{\mu}(x)_{IT2FL\ UMF} = \begin{cases} 0, & x \le y_1 \\ \frac{x - y_1}{q_1 - y_1}, & y_1 \le x \le q_1 \\ 1, & q_1 \le x \le q_2 \\ \frac{r_2 - x}{r_2 - q_2}, & q_2 \le x \le r_2 \end{cases} \qquad (6)

\underline{\mu}(x)_{IT2FL\ LMF} = \begin{cases} 0, & x \le y_2 \\ \frac{x - y_2}{q_2 - y_2}, & y_2 \le x \le x^* \\ \frac{r_2 - x}{r_2 - q_2}, & x^* < x < r_1 \\ 0, & x \ge r_2 \end{cases}, \quad x^* = \frac{r_1(q_2 - y_2) + y_2(r_1 - q_1)}{(q_2 - y_2) + (r_1 - q_1)} \qquad (7)

3.2 Fuzzy Rules

An IT2FIS is characterized by IF-THEN rules; in this case, the antecedent and consequent parts are type-2 fuzzy sets [4]. The study adopts Mamdani fuzzy rules as defined in (8), and the corresponding membership-function form is shown in (9):

R^k: \text{IF } x_1 \text{ is } \tilde{A}_1^k \text{ and } \ldots \text{ and } x_n \text{ is } \tilde{A}_n^k \text{ THEN } y^k \text{ is } \tilde{B}^k \qquad (8)

R^k: \text{IF } x_1 \text{ is } \tilde{A}^*_{1k} \text{ and } \ldots \text{ and } x_p \text{ is } \tilde{A}^*_{pk} \text{ THEN } y^k \text{ is } \tilde{B}^*_k \qquad (9)

where x_i, i = 1, 2, \ldots, n are the antecedents and y is the consequent of the kth rule of the IT2FLS; the \tilde{A}_i's are the MFs \mu_{\tilde{A}_{ik}}(x_i) of the antecedent part of the ith input x_i, and the \tilde{B}^k's are the MFs \mu_{\tilde{B}_{ik}}(y) of the consequent part of the output y.

3.3 Fuzzy Inference

The paper adopts a Mamdani inferencing system due to its ability to shape the final function somewhat locally, without necessarily altering the relation on other regions. It combines the IT2FIS inputs and the output of each fuzzy IF-THEN rule. The firing strength of the kth fired rule is then calculated as an interval bounded by its lower and upper values using (10), with the bounds computed by (11) and (12) respectively:

F^k(x) = \left[ \underline{f}^k(x), \bar{f}^k(x) \right] \equiv \left[ \underline{f}^k, \bar{f}^k \right] \qquad (10)

\underline{f}^k = \underline{\mu}_{\tilde{A}_{1k}}(x_1) * \underline{\mu}_{\tilde{A}_{2k}}(x_2) * \cdots * \underline{\mu}_{\tilde{A}_{pk}}(x_p) \qquad (11)

\bar{f}^k = \bar{\mu}_{\tilde{A}_{1k}}(x_1) * \bar{\mu}_{\tilde{A}_{2k}}(x_2) * \cdots * \bar{\mu}_{\tilde{A}_{pk}}(x_p) \qquad (12)

where F^k(x) is the firing interval of the antecedent of the kth rule and \mu_{\tilde{A}_{ik}} is the degree of membership of the ith input, i = 1, \ldots, p [17, 18].
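For concreteness, the firing interval of Eqs. (10)-(12) can be computed as below, assuming the product t-norm for the '*' operation; the per-antecedent membership grades and example values are made up for illustration.

```python
import numpy as np

def firing_interval(lower_grades, upper_grades):
    """Lower and upper firing strengths of one rule (Eqs. 11-12), product t-norm."""
    return float(np.prod(lower_grades)), float(np.prod(upper_grades))

# e.g. a two-antecedent rule (say systolic blood pressure and heart rate memberships)
print(firing_interval([0.4, 0.7], [0.6, 0.9]))  # (0.28, 0.54)
```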

3.4 Type-Reducer and Defuzzifier

The type-reducer in Eq. (13) computes the centroid of an IT2FS to obtain a T1FS, giving an interval of uncertainty for the output of an IT2FLS. To compute the centroid of an IT2FS, both the left and right end points are needed [7, 19]. See [15] and [17, 18] for a detailed definition of Eq. (13). The study adopts the Karnik and Mendel (KM) algorithm [19] to calculate the exact end points in Eqs. (14) and (15) respectively. For each output k, the defuzzified (crisp) output is obtained using Eq. (16).

Y_{TR}(x) = [y_l(x), y_r(x)] \equiv [y_l, y_r] = \int_{y^1 \in [y_l^1, y_r^1]} \cdots \int_{y^N \in [y_l^N, y_r^N]} \int_{f^1 \in [\underline{f}^1, \bar{f}^1]} \cdots \int_{f^N \in [\underline{f}^N, \bar{f}^N]} 1 \Big/ \frac{\sum_{i=1}^{N} f^i y^i}{\sum_{i=1}^{N} f^i} \qquad (13)

y_r = \frac{\sum_{i=1}^{N} f_r^i y_r^i}{\sum_{i=1}^{N} f_r^i} \qquad (14)

y_l = \frac{\sum_{i=1}^{N} f_l^i y_l^i}{\sum_{i=1}^{N} f_l^i} \qquad (15)

Y_k(X) = \frac{y_{lk} + y_{rk}}{2} \qquad (16)

3.5 Performance Evaluation

The IT2FLS is analyzed with real-world health datasets. A T1FLS is also developed and analyzed, and the performance is compared. An IT2FS is utilized in the experiment to minimize the model complexity. The performance criteria, Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), defined in Eqs. (17) and (18), are applied to measure our experimental results:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y^x - y)^2 \qquad (17)

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y^x - y)^2} \qquad (18)

where y x is desired output, y is the computed output and N is the number of data.
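Putting Sects. 3.3-3.5 together, the sketch below approximates the Karnik-Mendel end points of Eqs. (14)-(15), averages them as in Eq. (16), and scores predictions with the RMSE of Eq. (18). It is a simplified illustration (plain KM iterations, not the enhanced variants), with made-up numbers in the example.

```python
import numpy as np

def km_type_reduce(y, f_lower, f_upper, max_iter=100):
    """Karnik-Mendel iterations for the left/right end points y_l and y_r."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(y)
    y, fl, fu = y[order], np.asarray(f_lower, float)[order], np.asarray(f_upper, float)[order]

    def end_point(right):
        f = (fl + fu) / 2.0
        yk = np.dot(f, y) / np.sum(f)
        for _ in range(max_iter):
            k = np.searchsorted(y, yk)  # switch point between lower and upper strengths
            f = np.concatenate([fl[:k], fu[k:]]) if right else np.concatenate([fu[:k], fl[k:]])
            yk_new = np.dot(f, y) / np.sum(f)
            if np.isclose(yk_new, yk):
                return yk_new
            yk = yk_new
        return yk

    return end_point(right=False), end_point(right=True)

def rmse(desired, computed):
    d, c = np.asarray(desired, float), np.asarray(computed, float)
    return float(np.sqrt(np.mean((d - c) ** 2)))  # Eq. (18)

y_l, y_r = km_type_reduce([0.2, 0.5, 0.9], [0.3, 0.6, 0.2], [0.7, 0.9, 0.5])
crisp = (y_l + y_r) / 2  # defuzzified shock level, Eq. (16)
print(crisp, rmse([0.6], [crisp]))
```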

4 Results and Discussion In this paper, experimental results for prediction problem as applied to predicting shock level in cardiovascular diesease patients, using IT2FLS has been presented. The effectiveness and generalization capability of IT2FLS have been tested using 1000 datasets obtained from University of Uyo Teaching Hospital and Federal Medical


Centre, Yenagoa, all in Nigeria. Five cardiovascular health variables namely: systolic blood pressure, diastolic blood pressure, temperature, heart rate and respiratory rate served as the input while cardiac shock level served as the desired output. Triangular membership function was employed for fuzzification of the input. Fuzzy inference was derived using Mamdani inference process. Investigation of the performance of IT2FLS and that of its type-1 counterpart were carried out. Statistical, using performance metrics of Mean Square Error (MSE) and Root Mean Squared error (RMSE) were established. Figure 1 shows the plots of transformed cardiac patients’ health dataset for (a) blood pressure diastolic(b) blood pressure systolic (c) temperature (d) respiratory rate (e) heart rate respectively. Figure 2a, b gives the prediction results for IT2FLS and (b) T1FLS. Figure 3 presents the system’s response of the IT2FLS for the actual and predicted shock level. Figure 4 gives the results of the performance of IT2FLS with that of T1FLS where, Shock Level-Actual is the actual shock level, Shock Level-Predicted-IT2FLS is the predicted shock level by IT2FLS and Shock Level-Predicted-T1FLS is the predicted shock level by T1FLS respectively. The study compares statistical analysis between TIFLS and IT2FLS. Two experiments are conducted in order to explore these analyses. In each case, the performance metrics are the MSE and the RMSE. (a)


Fig. 1 Plots of the transformed cardiac patients’ health dataset for a blood pressure diastolic b blood pressure systolic c temperature d respiratory rate e heart rate



Fig. 2 a and b give the prediction results for IT2FLS and T1FLS


Fig. 3 System’s response of the a IT2FLS and b T1FLS for the actual and predicted shock level

Fig. 4 Plots of the system’s response of actual shock level and predicted shock level for IT2FLS and T1FLS

Table 1 compares the performance of the IT2FLS with the T1FLS with respect to MSE and RMSE; the results of the statistical comparison of model performance are presented there. From Fig. 4 it is observed that, with the use of the IT2FLS, the study is able to model uncertainty adequately and predict the shock level of a cardiovascular disease patient better than the T1FLS. Also, with MFs that are intervals, the IT2FLS models uncertainty in predicting the shock level of a cardiac patient more accurately than its counterpart, the T1FLS, whose MFs are not characterized in the form of interval values.

Table 1 Comparison of T1FLS and IT2FLS in shock level prediction

Models | Performance | MSE | RMSE
T1FLS | Prediction error | 0.004 | 0.018
T1FLS | Model error | 0.0003 | 0.079
T1FLS | Prediction accuracy | 0.9997 | 0.9821
IT2FLS | Prediction error | 0.0001 | 0.0006
IT2FLS | Model error | 2.57E−07 | 0.0005
IT2FLS | Prediction accuracy | 1 | 0.9995

5 Conclusions

In this paper, the predictive capability of an interval type-2 fuzzy logic system (IT2FLS) based on Mamdani fuzzy inference has been investigated. A conventional type-1 fuzzy logic system (T1FLS) was implemented for the purpose of comparison. By the use of the IT2FLS, we have been able to predict different shock levels for cardiac patients. Specifically, the following conclusions are made: the IT2FLS copes with more information and handles more uncertainties in health data; the IT2FLS performs significantly better than the T1FLS; and an IT2FLS with interval MFs, when applied to many healthcare problems, has the ability to minimize the effects of uncertainties and offers a better solution. We intend to explore Takagi-Sugeno-Kang (TSK) fuzzy inference in the future and to conduct more experiments using the same data sets. Also, we intend to optimize our system using the flower pollination algorithm for performance improvement. Furthermore, we hope to employ other fuzzy modeling techniques and compare experimental results.

Ethical Issues
Ethical issues came into play in this research, as it involved gathering data from cardiovascular patients' records. However, the research did not involve direct collection of the data, but rather a review of patients' files and medical histories after due permission was granted by the responsible authorities. Hence, we discuss the ethical issues under two areas:
Consent form: Written permission was obtained from the health authority before embarking on the research. A sample of the authorization clearance is the one used by the ethical committees of Federal Medical Centre, Yenagoa and University of Uyo Teaching Hospital, Uyo.
Data Protection: Data protection is ensured by not revealing patients' personal details such as name, address, occupation and others; the data gathered excluded this information.

References

1. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
2. Negnevitsky, M.: Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley (2002)
3. Chiu, S.: Extracting fuzzy rules from data for function approximation and pattern classification. In: Fuzzy Information Engineering: A Guided Tour of Applications. Wiley (1997)
4. Mamdani, E.H.: Application of fuzzy algorithms for control of simple dynamic plant. Proc. Inst. Electr. Eng. 121(12), 1585–1588 (1974)
5. Umoh, U., Inyang, U.G.: A fuzzy-neural intelligent trading model for stock price prediction. IJCSI Int. J. Comput. Sci. 12(3), 36–44 (2015)
6. Umoh, U., Asuquo, D.: Fuzzy logic-based quality of service evaluation for multimedia transmission over wireless ad hoc networks. Int. J. Comput. Intell. Appl. (IJCIA) 16(4), 1–22 (2017). https://doi.org/10.1142/S1469026817500237
7. Mendel, J., John, R.: Type-2 fuzzy sets made simple. IEEE Trans. Fuzzy Syst. 10(2), 117–127 (2002)
8. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning. Inf. Sci. 8, 199–249 (1975)
9. Hagras, H.: Type-2 FLCs: a new generation of fuzzy controllers. IEEE Comput. Intell. Mag. 2(1), 30–43 (2007)
10. Mendel, J.M.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions (2001)
11. Mendel, J.M., John, R.I., Liu, F.: Interval type-2 fuzzy logic systems made simple. IEEE Trans. Fuzzy Syst. 14(6), 808–821 (2006)
12. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications. Springer, New York (2012)
13. Castillo, O., Melin, P.: Recent Advances in Interval Type-2 Fuzzy Systems, vol. 1. Springer, USA (2012)
14. Umoh, U.A., Inyang, U.G., Nyoho, E.E.: Interval type-2 fuzzy logic for fire outbreak detection. Int. J. Soft Comput. Artif. Intell. Appl. (IJSCAI) 8(3), 1–20 (2019)
15. Castillo, O., Melin, P.: Type-2 Fuzzy Logic: Theory and Applications. Springer, New York (2008)
16. Wu, H., Mendel, J.M.: Uncertainty bounds and their use in the design of interval type-2 fuzzy logic systems. IEEE Trans. Fuzzy Syst. 10(5), 622–640 (2011)
17. Wu, D.: Design and Analysis of Type-2 Fuzzy Logic Systems. Master's Thesis, Department of Electrical and Computer Engineering, National University of Singapore (2005)
18. Wu, D.R., Tan, W.W.: Computationally efficient type-reduction strategies for a type-2 fuzzy logic controller. In: FUZZ-IEEE, Reno, USA, pp. 353–358, May 2005
19. Karnik, N.N., Mendel, J.M.: Centroid of a type-2 fuzzy set. Inf. Sci. 132(1), 195–220 (2001)

Real-time Social Distancing Monitoring and Detection of Face Mask to Control the Spread of COVID-19 Shreyas Mishra

Abstract The novel Coronavirus Disease 2019 (COVID-19) which first emerged in Wuhan, China in late December 2019, has now spread to all the countries in the world. Many pharmaceutical companies around the world are racing to find a cure, but the possibility of a vaccine in the near future seems bleak. Recommended solutions for preventing the spread of the virus include wearing face masks, physical and social distancing and regular use of alcohol-based hand sanitizers. The governments of many countries are actively promoting social distancing and the use of a face mask. This paper will propose techniques for the real-time monitoring of violations of social distancing guidelines as well as detection of a face mask. This paper has successfully monitored public spaces and indoor shops in real time. The methods can be applied on live CCTV camera video feed both in indoor as well as outdoor setting. They are essential to control the spread of COVID-19 in accordance with local government guidelines. Keywords COVID-19 · Social distancing · Face mask classifier · Real-time object tracking

1 Introduction The COVID-19 outbreak was declared a pandemic on 11 March, 2020. As of November 2020, nearly 60 million people had been infected and more than 1 million people have died all over the world, with no definite cure or vaccine in sight. As of now, the only ways to prevent getting infected are by practicing social distancing and wearing face masks, both indoors and outdoors. It is recommended to maintain a distance of at least 1 m from each other so as to not get infected by a potential transmitter of the virus. This issue is solved by the use of a face mask, so as to not catch the virus droplets in the air and not transmit the virus if the person is infected but asymptomatic. Local S. Mishra (B) National Institute of Technology, Rourkela 769008, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_18


governments around the world are actively calling on people to follow the safety guidelines which have been recommended by the WHO and their country’s health departments. Many local governments have slapped hefty fines on people who do not practice public health guidelines. Monitoring public areas such as small streets used for walking and narrow marketplace lanes, and indoor spaces such as the different aisles inside a supermarket or the billing counter can be quite helpful in preventing the accumulation of a large crowd. CCTV cameras installed in public spaces and shops for security purposes can be used for the purpose of crowd monitoring. The live video feed from the CCTV camera is usually stored in local devices or cloud servers where the video is fed to software in real time which can monitor social distancing violations and give real time results as well as monitor whether a person is wearing a face mask or not. There are important applications of both these processes individually. The software should also be able to monitor people’s faces and distance between them in real time simultaneously. The applications of these results can be used to generate alerts when the safety guidelines are not being followed and take necessary steps to carry out the changes. Cameras can be installed in front of the door of a shop which opens automatically only if the person is wearing a mask. Monitoring distances between people can be accompanied by keeping a tab on the number of people not at a safe distance from each other. This paper will propose a technique to monitor social distancing in real time and monitor whether people are wearing a face mask in real-time video feeds.

2 Literature Study There are studies which have introduced the concept of social distancing and practicing the usage of face mask to protect oneself and others around the person from getting infected [1, 2]. The authors observed very less clusters of infection in masked settings than non-masked settings [2]. Many studies have simulated the spread of infection among the populace by taking into account parameters such as reproduction rate and rate of infection using social distancing measures [3–7]. Methods to monitor the distance between people include CCTV camera video feeds and wearable devices connected via Bluetooth or Wi-Fi, which can detect the distance between two devices and raise an alert. Very few researches have been published in the field of real-time monitoring of social distancing [8–10]. In the DeepSOCIAL [8] model, the author has proposed a detection technique for people using a three-step architecture. This includes real-time surveillance of public places, where the video is sent as real-time input to a network that can detect the people present in each video frame. The distance between each detected person is calculated and the ones which are measured as more than a threshold are highlighted. In [9], the COVID-Robot designed by the author detects violations of social distancing and encourages people to move apart if they do not follow the required guidelines. The authors in vision-based monitoring [10] have used an AI-based detection system to calculate the density of the crowd. The authors estimated the size of


the reference object inside the image by comparing it to the width of pedestrians detected using their object detection model. They determined an upper limit of the social density in the crowd taken into account and have tried to maintain crowd density under the upper limit so as to decrease the probability of social distancing violations. The use of deep learning techniques for face mask detection has been illustrated in [11]. They have extracted the features from different image datasets of people wearing and not wearing masks and have trained these features using the ResNet-50 architecture. The classification of these features has been done using an ensemble machine learning model, decision trees and support vector machines and have compared the results achieved on each dataset.

3 Proposed Methodology

Social distancing violations can be determined using an algorithm built around the YoloV3 [12] object detection algorithm, combined with object density calculation using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm. Face masks can also be detected by modifying the Yolo configuration file.

3.1 Social Distancing Violations and Face Mask Classifier Using DBSCAN and DSFD Algorithms

This algorithm uses the bounding boxes obtained from the YoloV3 algorithm. The DBSCAN and DSFD algorithms are applied to the individual bounding boxes to obtain the distance between persons and to detect masks on their faces. The predicted classes with person labels are detected and taken into account. The dimensions of bounding boxes with a confidence level above 0.5 are used to find clusters using the DBSCAN algorithm. In this method, given a set of points, the algorithm groups together points that are closely packed, while points that lie in low-density regions are marked as outliers. Simultaneously, the faces are obtained using the Dual Shot Face Detector (DSFD) algorithm. This is better than simpler algorithms like Haar Cascades or MTCNN because it can detect faces in low light and low resolution and in a wide range of face orientations. A Feature Enhance Module (FEM) is used to enhance the quality of the original feature maps to obtain dual-shot detection from single-shot detection. After that, Progressive Anchor Loss (PAL) and Improved Anchor Matching (IAM) are used by different anchors to provide better initialization for the regressor. Popular benchmarks have demonstrated the superiority of DSFD over other face detection systems [14, 15].


The faces are passed to a trained classification model. This model is built using a modified ResNet-50 architecture. The dataset was created by downloading images of faces with and without masks. To increase the number of training images, an algorithm was used to place different types of masks artificially on the original unmasked faces [17]. Various augmentation techniques, such as rotation of the image, horizontal and vertical flips, and brightness shifts, are applied to the images before training the model to diversify them, so that the training images contain faces of various orientations. Images are also randomly blurred by applying different types of noise such as Gaussian blur, horizontal and vertical motion blur, and anti-diagonal motion blur. The faces detected by the DSFD algorithm are passed to this model, trained using transfer learning, which determines whether the person is wearing a face mask or not and produces alerts. The algorithm has been illustrated in Fig. 1. The algorithm repeats itself as long as video feed is input to the system.

Fig. 1 Monitor social distancing violations and face mask classifier using DBSCAN and DSFD algorithms
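As a concrete illustration of the clustering step in Fig. 1, the snippet below groups person detections whose centroids fall within a pixel radius; the radius standing in for the 1 m guideline is an assumed value and would need camera calibration in practice.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def flag_distancing_violations(person_boxes, eps_pixels=75.0):
    """Flag person detections whose centroids are closer than eps_pixels.

    person_boxes: list of (x, y, w, h) boxes from an object detector such as
    YoloV3, with confidence filtering assumed to be done already.
    """
    centroids = np.array([[x + w / 2.0, y + h / 2.0] for x, y, w, h in person_boxes])
    labels = DBSCAN(eps=eps_pixels, min_samples=2).fit_predict(centroids)
    # label -1 marks outliers (isolated people); any other label is a cluster
    # of people standing too close together, i.e. a potential violation
    return [i for i, lbl in enumerate(labels) if lbl != -1]

boxes = [(10, 20, 40, 100), (45, 25, 40, 100), (400, 30, 40, 100)]
print(flag_distancing_violations(boxes))  # -> [0, 1]
```

Detections flagged this way correspond to the red bounding boxes over persons described in Sect. 4.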

Fig. 2 Real-time face mask classification using YoloV3


3.2 Real-Time Face Mask Classification The algorithm provided for monitoring social distancing can be effective if tuned properly with the right parameters but is computationally expensive to implement in the real world. CCTV cameras with installed software, high-end processors, cloud services and alert generation systems can make the whole system quite expensive which cannot be afforded by small shops and can be costly for taxpayers if the local government decides to use them widely. These devices can be replaced with a face mask detector system which can accurately determine whether a single person or multiple persons in a frame are wearing a face mask or not. The dataset used in the previous subsection has been used to train the model. This dataset consists of images of various people with real face masks and artificially placed masks on unmasked faces. The algorithm edits the YoloV3 configuration file by deleting all the classes present in the file because these classes have to be filtered out. Two classes are added namely, mask and no mask. The images were annotated and a text file containing the coordinates of the bounding box containing the mask was created. Once the algorithm is trained, it accepts real time videos as input and produces the output in real time, as to whether the person is wearing a mask or not. This is because this process is not computationally extensive if the training has been completed beforehand. This can be useful in monitoring how many people in a crowd are actually wearing a mask, the system can be connected to an automatic door opener where the front door to a shop opens only if the customer is wearing a face mask. The system has been illustrated in Fig. 2.
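A minimal way to run such a retrained two-class YoloV3 model on live frames is sketched below with OpenCV's DNN module; the configuration and weights file names are placeholders for the retrained network, and the confidence threshold is an assumed value.

```python
import cv2
import numpy as np

# Load a YoloV3 network retrained with two classes ("mask", "no_mask").
CLASSES = ["mask", "no_mask"]
net = cv2.dnn.readNetFromDarknet("yolov3-mask.cfg", "yolov3-mask.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_masks(frame, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    h, w = frame.shape[:2]
    for output in net.forward(layer_names):
        for row in output:
            scores = row[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > conf_threshold:
                cx, cy, bw, bh = row[0] * w, row[1] * h, row[2] * w, row[3] * h
                box = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
                detections.append((CLASSES[class_id], float(scores[class_id]), box))
    return detections
```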

4 Results and Discussion Table 1 provides the comparison of mean average precision (mAP) values. As shown in Table 1, YoloV3 performs better in terms of mean average precision value than Faster R-CNN and Yolov4. Figure 3 illustrates the output frame for the algorithm using DBSCAN to cluster and DSFD to detect faces. This algorithm is quite efficient to detect the presence of face mask and monitor social distancing violation. As shown in Fig. 3, two bounding boxes appear, either in red or green for the person and the face. Using the YoloV3 algorithm, the objects which in this case are the persons, this paper achieves an accuracy of 69.65% which is higher than object detection using other algorithms. The DSFD algorithm detects the faces present in the video frame. These faces are taken as input to the ResNet-50 classifier which has been trained using transfer learning on the dataset which has been created. If the person is not wearing a mask, a red bounding box appears on the face. If the persons are not following social distancing, a red bounding box appears over each person. The ResNet model classifies the faces detected using the DSFD model with an accuracy of 98.8%.



Fig. 3 Output frame for DBSCAN and DSFD algorithm

Fig. 4 Real-time Face Mask Classifier using YoloV3

Figure 4 illustrates the results obtained from the face mask classifier built by modifying the YoloV3 configuration file, with the mask treated as the detected object. This algorithm detects whether a face is covered with a mask. It runs faster on video feeds than the pipeline that also monitors social distancing. In Fig. 4, the test videos have been taken from YouTube clips [18, 19].

5 Conclusion

This paper proposes techniques to control the spread of COVID-19 by monitoring social distancing between people in public spaces and by detecting the presence of face masks using object detection techniques.

Table 1 Comparison of evaluation metrics for object detection using different models

Model               mAP (%)      FPS
Faster R-CNN [9]    42.1–42.7    25
Yolov4 [9]          41.2–43.5    25
Proposed Work       69.65        30

Monitoring social distancing was achieved by calculating cluster density using the DBSCAN algorithm. Face mask detection was carried out both by detecting faces with the DSFD algorithm followed by a ResNet-based classifier, and by using the YoloV3 algorithm to detect the presence of a mask on a person's face. It was observed that YoloV3 provides a very good mean average precision for mask detection in comparison with other object detection techniques. The paper also observed that monitoring social distancing in crowds is computationally very expensive, whereas the detection of face masks is quite fast with the right processing system. These techniques can be used to generate alerts whenever public safety guidelines are not being followed. The face mask classifier can be used with CCTV cameras in storefronts to prevent doors from opening if a customer is not wearing a face mask, and local governments can fine people for not following safety measures by continuously monitoring these locations. Such methods can help curb the spread of COVID-19 in a community. Future research opportunities include thermal screening systems fitted alongside CCTV cameras, as well as infrared cameras that can measure the body temperature of a customer. People's movements can be monitored by law enforcement agencies for crowd control and for maintaining social distancing regulations, and overcrowded spots can be highlighted in a different colour on the live video feed with an accompanying alarm.

References 1. Tyrrell, C.J., et al.: The paradox of social distancing: Implications for older adults in the context of COVID-19. Psychol. Trauma: Theory, Res. Prac. Policy 12(S1), 214–216 (2020) 2. Cheng, V.C., et al.: The role of community-wide wearing of face mask for control of coronavirus disease 2019 (COVID-19) epidemic due to SARS-CoV-2. J. Infect. 81(1), 107–114 (2020) 3. Lee, L.Y.K., et al.: Practice and technique of using face mask amongst adults in the community: a cross-sectional descriptive study. BMC Public Health 20(1), 1–11 (2020) 4. Mao, L.: Agent-based simulation for weekend-extension strategies to mitigate influenza outbreaks. BMC Public Health 11(1), 522 (2011) 5. Kumar, S., et al.: Policies to reduce influenza in the workplace: impact assessments using an agent-based model. Am. J. Public Health 103(8), 1406–1411 (2013) 6. Cauchemez, S., et al.: Estimating the impact of school closure on influenza transmission from Sentinel data. Nature 452(7188), 750–754 (2008) 7. Milne, G.J., et al.: A small community model for the transmission of infectious diseases: comparison of school closure as an intervention in individual-based models of an influenza pandemic. PLoS ONE 3(12), 4005 (2008) 8. Rezaei, M., Azarmi, M.: DeepSOCIAL: social distancing monitoring and infection risk assessment in COVID-19 Pandemic. arXiv preprint arXiv:2008.11672 (2020)



9. Yang, D, et al.: A vision-based social distancing and critical density detection system for COVID-19. Image video Process. DOI (2020) 10. Sathyamoorthy, A.J, et al.: COVID-Robot: Monitoring Social Distancing Constraints in Crowded Scenarios. arXiv preprint arXiv:2008.06585 (2020) 11. Loey, M., et al.: A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167(1), 108288 (2020) 12. Redmon, J, et al.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018) 13. Li, J., et al.: DSFD: dual shot face detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5060–5069. IEEE, Long Beach, USA (2019) 14. Wang, J., et al.: Face attention network: An effective face detector for the occluded faces. arXiv preprint arXiv:1711.07246 (2017) 15. Tang, X., Du, D.K., et al.: Pyramidbox: A context-assisted single shot face detector. In: Ferrari, V., Hebert, M., (eds.) Proceedings of the European Conference on Computer Vision (ECCV), LNCS, vol. 11213 pp. 812–828. Springer, Munich, Germany (2018) 16. Deng, J., et al.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE, Miami, USA (2009) 17. GitHub Mask Augmentation. https://github.com/aqeelanwar/MaskTheFace, last accessed 2020/05/10 18. YouTube link for test dataset. https://www.youtube.com/watch?v=OzaryngZ5Kk, last accessed 2020/10/08 19. YouTube link for test dataset. https://www.youtube.com/watch?v=YCmKEq5wzDU, last accessed 2020/10/08

Emotion Recognition from Feature Mapping Between Two Different Lobes of Human Brain Using EEG Susmita Chaki, Anirban Mukherjee, and Subhajit Chatterjee

Abstract The occipital lobe of the human brain is responsible for visual perception and the prefrontal lobe is responsible for emotion recognition. A novel approach to understanding the interrelation between the occipital lobe and the prefrontal lobe for human emotions is presented in this paper. Electroencephalography is an important and efficient tool for emotion recognition. In this paper, data acquisition is performed using the 10–20 electrode placement system. Data are acquired from the occipital lobe as well as from the prefrontal lobe while five different emotional videos are shown. The raw EEG data from the occipital lobe and prefrontal lobe are pre-processed using surface Laplacian filtering. After removal of artifacts and noise, feature extraction is performed using the wavelet transform, and the power spectral density is considered as the feature. Feature mapping between the occipital lobe and the prefrontal lobe is performed for different emotions. The work has been carried out in MATLAB. The present work shows faster convergence of the weights of the proposed Type-1 fuzzy neural network compared to the back-propagation neural network, which indicates better perceptual ability. Keywords Human emotion · Wavelet transforms · Type-1 fuzzy

1 Introduction

Emotions are among the most important features of human communication. For the analysis of human emotions, different methods are available, such as electromyography (EMG), electrocardiography (ECG) and electroencephalography (EEG). Among these methods, EEG is the most effective and convenient.

S. Chaki (B) · S. Chatterjee University of Engineering & Management, Kolkata, India e-mail: [email protected] A. Mukherjee RCC Institute of Information Technology, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_19




In the proposition of [1], an interrelation between the temporal lobe and the prefrontal lobe was established for understanding auditory perceptual ability. In [2], emotion classification is performed from EEG with the help of the discrete wavelet transform; pre-processing of the EEG signals is done using surface Laplacian filtering, and the signals are analysed in three different frequency bands. Human emotions are also classified in [3] using KNN, which gives the highest classification accuracy compared to other algorithms. To classify discrete emotions, a combination of surface Laplacian filtering, time-frequency analysis with the wavelet transform and linear classifiers is used in [4]. Linear discriminant analysis is also used for internal emotion classification [5]. Four different emotions are classified using EEG and a broad learning system in [6], where one electrode channel is selected for feature extraction. For binary classification of emotions, a merged LSTM model has been proposed in [7]. A deep learning technique is used to identify emotions from raw EEG signals, with an LSTM model used for feature extraction, in [8]. In proposition [9], BiLSTM, an improved version of the LSTM model, is used to analyse signals for emotion classification. In this paper we propose automated understanding of the perceptual ability associated with different human emotions. Twenty-one channel electrodes are used, of which the electrodes over the occipital lobe and the prefrontal lobe are the most relevant to the proposed work. The objective is to understand the perceptual ability of humans by mapping features from the occipital lobe and the prefrontal lobe while the subject visualizes and recognizes different emotions. Five different emotional videos are shown to a subject, and during that time data acquisition is performed from the occipital lobe and the prefrontal lobe. Surface Laplacian filtering is used for the removal of artifacts, and the wavelet transform for feature extraction. Power spectral density (PSD) features from the occipital lobe and the prefrontal lobe are extracted and mapped. The PSD features for all five emotional videos over multiple sessions are collected, and a Back Propagation Neural Network (BPNN) and a Type-1 fuzzy algorithm are used independently to map the feature-class relationship using MATLAB. Upon developing the mapping function between the occipital lobe and the prefrontal lobe, it is necessary to find the variation of the mapping function with time, if any. If there is no change in the mapping function for a particular subject over time, then an accurate interrelation has been established between the two brain lobes in visualizing and recognizing emotions.

2 Principles and Methodologies

In this section, the methodology and the working of the proposed system are presented and explained in the successive discussions.



Fig. 1 Block Diagram of our proposed work

2.1 EEG Data Acquisition

In this section we propose an approach to map the occipital-lobe and prefrontal-lobe EEG responses. First, data are accumulated from the occipital lobe corresponding to five different emotions; for the same emotions, EEG responses are taken from the prefrontal lobe for 10 different subjects. Figure 1 shows the block diagram of the proposed work, representing the feature mapping between the occipital lobe and the prefrontal lobe: in the first block, data acquisition is performed from the occipital lobe and the prefrontal lobe; features are extracted after pre-processing; and finally feature mapping is performed.

2.2 Pre-processing

EEG signals recorded from different brain lobes are contaminated with noise due to eye blinking, undesired muscle movement, and so on. Complete removal of artifacts is not possible, and aggressive removal would also discard useful information needed for the proposed work. A couple of methods are available for artifact removal; here the surface Laplacian (SL) filter is used for removal of noise and artifacts. The surface Laplacian is a technique that has been utilized to improve the spatial resolution of EEG. Using the SL filter, electrical activities that are spatially close to an electrode are emphasized, while EEG activity common to all channels is attenuated, enhancing the spatial resolution of the recorded signal. The surface Laplacian filter is mathematically modeled as follows:



X_{new}(t) = X(t) - \frac{1}{N_E} \sum_{j=1}^{N_E} X_j(t)    (1)

where X_{new} is the filtered signal, X(t) is the raw signal, X_j(t) is the signal of the j-th neighbouring electrode, and N_E denotes the number of neighbouring electrodes.
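The paper implements its pipeline in MATLAB; the following is a minimal NumPy sketch of the surface Laplacian filtering of Eq. (1), where the channel count, neighbourhood map and toy data are illustrative assumptions.

```python
import numpy as np

def surface_laplacian(x, neighbours):
    """Surface Laplacian filtering as in Eq. (1): subtract from each channel the
    average of its neighbouring electrodes. `x` is an (n_channels, n_samples)
    array and `neighbours` maps a channel index to a list of neighbour indices."""
    filtered = np.empty_like(x, dtype=float)
    for ch in range(x.shape[0]):
        nb = neighbours[ch]
        filtered[ch] = x[ch] - x[nb].mean(axis=0)
    return filtered

# Toy example with 3 channels and an illustrative neighbourhood map.
eeg = np.random.randn(3, 1000)
print(surface_laplacian(eeg, {0: [1, 2], 1: [0, 2], 2: [0, 1]}).shape)
```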

2.3 Feature Extraction

As far as research related to emotion analysis from EEG signals is concerned, the non-parametric method of feature extraction based on the wavelet transform has been reported in the literature. The time-frequency resolution produced by the wavelet transform (WT) is well suited to extracting signal details and approximations; neither the Fast Fourier Transform (FFT) nor the Short-Time Fourier Transform (STFT) [2, 3] is as good for this purpose. The non-stationary EEG signals are expanded onto basis functions obtained by scaling and shifting a single prototype function (the mother wavelet \psi_{a,b}) selected specifically for the signal. The mother wavelet \psi_{a,b}(t) is given as:

\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi\left(\frac{t-b}{a}\right)    (2)

where a, b \in R, a > 0, R is the wavelet space, a is the scaling factor and b the shifting factor. The admissibility condition for selecting a prototype function as the mother wavelet is:

C_\psi = \int_{-\infty}^{+\infty} \frac{|\Psi(\omega)|^2}{|\omega|} \, d\omega < \infty    (3)
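The paper carries out this step in MATLAB; as an illustration only, the following Python sketch (assuming the PyWavelets library) computes simple sub-band power values from a discrete wavelet decomposition, a stand-in for the PSD features used for feature mapping. The wavelet family, decomposition level and sampling rate are assumptions.

```python
import numpy as np
import pywt

def wavelet_band_power(signal, wavelet="db4", level=4):
    """Decompose one EEG channel with the discrete wavelet transform and return
    the mean power of each sub-band (approximation first, then details)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [float(np.mean(np.square(c))) for c in coeffs]

# Toy 1-second segment sampled at 256 Hz.
print(wavelet_band_power(np.random.randn(256)))
```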

4.3 Sobel Edge Detection Technique

Sobel is a popular filter in digital image processing that highlights edges and removes unnecessary information. Gray-level values lie between 0 and 255: the lower the gray level, the darker the corresponding region of the scene, and the higher the gray level, the brighter that region of the picture [10]. The filter is usually utilized for locating the approximate absolute gradient magnitude at every point in the scene.

G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} \times A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \times A

Here, A is the input 2D image array and G_x, G_y are the masks applied to A, with G_x acting as a horizontal kernel and G_y as a vertical kernel. The value at each pixel is computed by shifting the mask along the rows until the end. The gradient approximations given by G_x and G_y are merged to provide the gradient magnitude [11]:

G = \sqrt{G_x^2 + G_y^2}



Fig. 6 Smoothen data

Once the Sobel response has been acquired, its inverse matrix G^{-1} is determined for segmenting the smoothed data (Fig. 6).
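A minimal sketch of the Sobel gradient-magnitude step described above, assuming OpenCV (the paper does not name a library); the input file name is hypothetical.

```python
import cv2
import numpy as np

# Gradient magnitude with 3x3 Sobel kernels, as in the equations above.
gray = cv2.imread("eye.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)      # horizontal gradient Gx
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)      # vertical gradient Gy
magnitude = np.sqrt(gx ** 2 + gy ** 2)               # G = sqrt(Gx^2 + Gy^2)
edges = np.uint8(np.clip(magnitude, 0, 255))         # 8-bit edge map
```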

4.4 Morphological Dilation

Morphological operations are utilized for extracting picture elements and are also useful for describing image shape [12]; they are normally applied to adjust pixel intensities. In this work, we apply the morphological dilation operation to fill very small gaps and broken regions by altering pixel intensities, making the image more suitable for extracting the region of interest [13]. The operation does not alter cells surrounded by consecutive 0s; it only fills those 0s where a consecutive 1 is present in the same row or column.

4.5 Cataract Region Recognition

The cataract region can be extracted by comparing the resulting picture from the proposed system with a healthy eye image. If the scene contains large uncertainties (holes), a cataract is present in the output; the affected region is segmented by the foreground extraction technique (Fig. 7).

Fig. 7 Cataract region extraction



4.6 Sobel Magnitude and Dilation Algorithm

Require: G_x ← horizontal kernel, G_y ← vertical kernel, G ← absolute gradient magnitude, (x, y) ← coordinates, A ← input picture, x ← grayscale picture, T ← threshold, P_x ← probability.
INPUT: A ← cataract picture as a 2D array
OUTPUT: G ← overall gradient magnitude

Step 1: Input the 2-dimensional picture as an array.
Step 2: Convert the RGB picture into a grayscale picture.
Step 3: Adjust the contrast via histogram equalization (HE) using the cumulative distribution function
    cdf_x(i) = \sum_{j=0}^{i} P_x(j)
    where cdf is the cumulative distribution function.
Step 4: Replace a pixel with a dark pixel if its intensity I_{i,j} < constant T.
Step 5: Apply Sobel with the gradient masks G_x and G_y,
    G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A,
    G = \sqrt{G_x^2 + G_y^2},
    and then determine G^{-1} for smoothening.
Step 6: Apply morphological dilation.
Step 7: If contrast > T, report a cataract-affected picture; else report that no cataract is distinguished.
Step 8: Highlight the infected area.
Step 9: End.

The flow graph illustrates the same progression of the developed scheme as explained above (Fig. 8).
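A minimal Python/OpenCV sketch of the steps listed above, assuming OpenCV as the image-processing library; the threshold values, the use of the standard deviation as a simple contrast measure, and the file path are illustrative assumptions, not the paper's exact implementation.

```python
import cv2
import numpy as np

def detect_cataract(path, threshold=127, contrast_T=30.0):
    """Sketch of the algorithm above: grayscale, histogram equalisation,
    thresholding, Sobel magnitude, dilation and a simple contrast test."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)          # Steps 1-2
    eq = cv2.equalizeHist(gray)                                        # Step 3 (HE)
    _, binary = cv2.threshold(eq, threshold, 255, cv2.THRESH_BINARY)   # Step 4
    gx = cv2.Sobel(binary, cv2.CV_64F, 1, 0, ksize=3)                  # Step 5
    gy = cv2.Sobel(binary, cv2.CV_64F, 0, 1, ksize=3)
    g = np.uint8(np.clip(np.sqrt(gx ** 2 + gy ** 2), 0, 255))
    dilated = cv2.dilate(g, np.ones((3, 3), np.uint8), iterations=1)   # Step 6
    return "cataract" if dilated.std() > contrast_T else "no cataract" # Steps 7-8
```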



Fig. 8 Flow graph

5 Experimental Analyses

The outcome has been evaluated on the basis of TP, TN, FP and FN. A total of 130 sample images were examined, of which 62 samples were counted as true positive (predicted positive and actually positive), 1 as false positive (predicted positive but actually negative), 3 as true negative and 64 as false negative. Thus, the overall accuracy has been computed as 96.92% (Fig. 9 and Tables 1 and 2).

Fig. 9 Console result

Table 1 Outcome analysis

Terms    Result obtained
TTC      130
TP       62
TN       3
FP       1
FN       64

Table 2 Result comparison

Terms       Harini et al. [6]    Proposed
TP          27                   62
TN          14                   3
FP          1                    1
FN          3                    64
Accuracy    91.11%               96.92%

Accuracy = \frac{TTC - (TN + FP)}{TTC} \times 100 = \frac{130 - (3 + 1)}{130} \times 100 = 96.92\%

6 Conclusions

In this work, we proposed an efficient approach for the automatic identification of cataracts from fundus pictures with a high precision rate and low processing time. First, the input image is acquired and converted into a grayscale picture, and histogram equalization is applied so that unwanted noise is reduced; thresholding, the Sobel filter and morphological dilation are then applied to find the object boundaries, which helps in recognizing the cataract, and finally the desired output is obtained. The system achieves 96.92% accuracy when tested on 130 sample images, with the outcome assessed on the basis of true positives, true negatives, false positives and false negatives. Automatic identification of cataracts overcomes human mistakes; in the field of medical science, automatic recognition algorithms are now a trend that genuinely saves lives, time and money, and the developed process is predictable and effortless to use. The system can be extended in the future by utilizing different kinds of techniques and filters to achieve even higher accuracy, because correctness is an indispensable factor in medicine.



References 1. Formerly Danbury Eye Physicians & Surgeons Greater Waterbury Laser Eye Physicians: Passfaces. https://www.danburyeye.com/cataract-surgery-new-milford.htm 2. Jindal, I., Gupta, P., Goyal, A.: Cataract detection using digital image processing. In: Global Conference for Advancement in Technology (GCAT). IEEE (2019) 3. Zhanga, L., Lia, J, Zhang, I., Han, H., Liu, B., Yang, J., Wang, Q.: Automatic cataract detection and grading using deep convolutional neural network. IEEE (2017) 4. Jagadale, A.B., Jadhav, D.V.: Early detection and categorization of cataract using slit-lamp images by hough circular transform. In: International Conference on Communication and Signal Processing. IEEE (2016) 5. Pavan, T.R., Deepak, A.: Automatic cataract detection of optical image using histogram of gradient. Int. J. Eng. Res. Technol. (2018) 6. Harini, V., Bhanumathi, V.: Automatic cataract classification system. In: International Conference on Communication and Signal Processing (2016) 7. Kolhe, S., Guru, S.K.: Remote automated cataract detection system based on fundus images. Int. J. Innovative Res. Sci., Eng. Technol. (2016) 8. Patwari, M.A.U., Arif, M.D., Chowdhury, M.N.A., Arefin, A.: Detection, categorization, and assessment of eye cataracts using digital image processing. In: The First International Conference on Interdisciplinary Research and Development, Thailand (2011) 9. Zheng, J., Guo, L.Y., Peng, L.H., Li, J.Q., Yang, J.J., Liang, Q.F.: Fundus image based cataract classification. In: IEEE International Conference on Imaging Systems and Techniques Proceedings, pp. 90–94 (2014) 10. Shen, H.L., Hao, H.W., Wei, L.H., Wang, Z.B.: An image based classification method for cataract. In: International Symposium on Computer Science and Computational Technology. ISCSCT’08, vol. 1, pp. 583–58 (2008) 11. Jia, Y.Q., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678 (2014) 12. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: IJCAI ProceedingsInternational Joint Conference on Artificial Intelligence, vol. 22, no. 1, pp. 1237–1242 (2011) 13. Qiao, Z., Zhang, Q., Dong, Y., Yang, J.-J.: Application of SVM based on genetic algorithm in classification of cataract fundus images. In: IEEE Instrumentation and Measurement Society (2017) 14. Holennavar, V., Kumar, P.: Instant detection of cataracts. Int. J. Latest Trends Eng. Technol. (2017)

Context Based Searching in Cloud Data for Improved Precision of Search Results Viji Gopal and Varghese Paul

Abstract The World Wide Web has spread its branches into all areas of the day-to-day life of humans. It is considered the largest data repository in the world and works as a key driving force for many novel architectures in the area of information technology. With the expansion of the volume of content, it has become hard to build an intuitive web search using conventional keyword search. One of the worst issues of traditional search engines is that they typically depend on mere keyword processing. An idea is proposed here to improve the process of searching using data extracted from the semantic model of the domain. With the improvement of network storage services, the cloud architecture has proved to be inexpensive, easy to manage, highly scalable and widely accessible. These features attract an ever-increasing number of enterprises and lead them to outsource enormous amounts of information to a third party. This helps many small and medium enterprises avoid development and maintenance expenses, and thus it has wide market possibilities. However, companies must pay for the bandwidth they use to access the data they store in the cloud, which may become an economic burden on smaller enterprises when a large number of their employees fetch data from the cloud over networks. Such expense may keep users away from the conveniences of cloud usage. In this situation, returning large volumes of data as a search result is not a good solution; the returned search results must be small in volume and highly precise. This brings forth the requirement of an optimized approach for searching in clouds and fetching results, for which context-based searching is very beneficial. Ontology is the backbone of semantic web technologies.

Keywords Cloud computing · Ontology · Semantic search · Context-based search · Affinity table · Word similarity

V. Gopal (B) · V. Paul School of Engineering, Cochin University of Science and Technology, Cochin, Kerala, India e-mail: [email protected] V. Paul e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_26




1 Introduction

An ontology is characterized as a formal vocabulary, a knowledge representation that describes the essential classes of being by characterizing entities, categories of entities, and the relationships among these specified entities. An ontology is defined as a "specification of a conceptualization" as per Gruber [1]. Normally, ontologies are made for a particular domain. Information can be effectively organized in an ontology so that reuse and sharing become seamless. The ultimate aim is to determine the semantic similarity between concepts from various ontologies. The basic elements of an ontology are classes, instances (individuals), attributes (the properties of a class), relationships and a hierarchical structure [2].

1.1 Context-Based Searching

Current keyword-based web search engines return plain text for a search query and consider only the string value of the query, so they cannot provide optimum results. A lot of irrelevant data that merely contains the keyword is returned, wasting resources such as computation power, bandwidth and time. Semantic Web-based systems are built on a network of ontologies [3]. To fetch more relevant results, an ontology mixes semantics with inference rules. An information retrieval system or a search engine can use an ontology in two different phases of the retrieval process. The first is disambiguation, which means selecting the proper search domain; for example, a jaguar in the automobile domain is not a jaguar in an ecosystem. The second is relevance feedback, in which the system returns search results and then gathers feedback from the user, which guides the results of the next similar query [4–6].

2 Designing Ontologies

During the last few years, many Semantic Web-related technologies have emerged or have been elaborated. One of the most important parts of these improvements is the status of ontology development languages, which now looks more stable. The World Wide Web Consortium (W3C) has approved the Resource Description Framework (RDF) and the Web Ontology Language (OWL). The W3C Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language, so knowledge expressed in OWL can be exploited by computer programs, for example, to verify the consistency of that knowledge or to make implicit knowledge explicit.



2.1 Restriction Types

A restriction is a class that is characterized by a description of its members in terms of existing properties and classes. The language construct in OWL for building new class descriptions based on descriptions of their members is called the restriction (owl:Restriction). Class expressions (also referred to as descriptions) and complex concepts can be constructed from classes and their property expressions. A class expression constitutes a group of individuals by formally specifying certain conditions on the attributes of the corresponding individuals; individuals satisfying these conditions are taken to be instances of the respective class expression. Table 1 lists a set of the most popular restriction types used by OWL. OWL offers various sorts of restrictions [7], as given below:

i. Object Property Restrictions
ii. Data Property Restrictions
iii. Object Property Cardinality Restrictions
iv. Data Property Cardinality Restrictions
v. Propositional Connectives and Enumeration of Individuals.

3 OWL Constructs Overview

OWL ontologies are categorised as OWL Full, OWL-DL and OWL-Lite. OWL-Lite confines the maximum and minimum cardinality to 1 or 0. OWL-DL loosens this limitation by permitting arbitrary minimum and maximum values. OWL Full additionally permits an instance to be defined as a class, which is not available in the other two variants [8]. Figure 1 depicts the relation between three classes: Person, Occupation and Hobby.

Table 1 Commonly used operators in OWL

Operator   Name                           Meaning
≥          Min Cardinality                "At least n"
≤          Max Cardinality                "At most n"
=          Cardinality                    "Exactly n"
∋          hasValue                       "equals x"
∀          Universal, allValuesFrom       "Only"
∃          Existential, someValuesFrom    "Some", "At least one"



Fig. 1 Figure depicting the relation between classes

3.1 OWL Syntax of an Ontology

A clearer and more comprehensible syntax of an ontology can be presented as below:

Class(SpicyPizza complete
  annotation(rdfs:label "PizzaTemperada"@pt)
  annotation(rdfs:comment "Any pizza that has a spicy topping is a SpicyPizza"@en)
  Pizza
  restriction(hasTopping someValuesFrom(SpicyTopping))
)

3.2 Ontology Building Tools: Protégé

The Protégé-OWL API is an open-source Java library for the Web Ontology Language (OWL) and RDF(S). The API provides classes and methods to load and save OWL files, to query and manipulate OWL data models, and to perform reasoning based on Description Logic engines. Furthermore, the API is optimized for the implementation of graphical user interfaces. The Protégé platform supports the modeling of ontologies.
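For illustration only, the following sketch loads and inspects an OWL ontology from Python using the owlready2 library; this is an assumption for demonstration purposes, not the Protégé-OWL Java toolchain used in the paper, and the file name is hypothetical.

```python
from owlready2 import get_ontology

# Load a local OWL file (hypothetical path) and enumerate its classes.
onto = get_ontology("file://pizza.owl").load()
for cls in onto.classes():
    parents = [p.name for p in cls.is_a if hasattr(p, "name")]
    print(cls.name, "->", parents)
```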



3.3 Sample Ontologies

Some sample ontologies have been implemented as part of the research; their screenshots are presented in Figs. 2 and 3.

4 Word Relativity Computation

A document should not be thought of as a bagful of words; instead, the patterns inside them decide what the document is about [9–11]. It is not the number of occurrences of a word, but which words are connected and how they are connected, that decides the destiny of a web text document [12, 13]. A few weighting schemes may be used to describe information relevance. In this work, we choose the strategy known as the Term Frequency × Inverse Document Frequency (TF × IDF) weighting scheme [14]. Under the TF × IDF weighting scheme, we assume the following: C represents a concept and C_i is an instance of C; the set of web documents is W = {W_1, W_2, …, W_n}; N is the total number of web documents; n_i represents the number of documents containing keywords in C_i; and O_ij is the frequency with which words from C_i appear in W_j, as depicted in Table 2. Given these frequencies, under the TF × IDF scheme the probability P_ij can be computed as:

Fig. 2 Sample of a pizza ontology



Fig. 3 Sample of a wine ontology

Table 2 Table showing the word relativity computation

       W1     W2     …    Wj     …    Wn
C1     O11    O12    …    O1j    …    O1n
C2     O21    O22    …    O2j    …    O2n
…      …      …      …    …      …    …
Ci     Oi1    Oi2    …    Oij    …    Oin
…      …      …      …    …      …    …
Ck     Ok1    Ok2    …    Okj    …    Okn

P_{ij} = TF_{ij} \times IDF_{ij} = O_{ij} \times \log\left(\frac{N}{n_i}\right)

The higher the P_ij, the more relevant W_j is to C_i. For every W_j there is a set {P_1j, P_2j, …, P_ij, …, P_kj}, where k is the total number of concepts, and the document is classified into the concept with the highest P_ij. This mapping also indicates how the document is related to the other concepts in the field of interest. The frequency mapping is used to rank the results in the first phase: for a given search pattern, the documents are sorted by the P_ij values obtained from the TF × IDF calculation, constructing the initial set of results. In the second phase, we propose the use of an affinity table to relate the concepts to one another, which can give more insightful search results to the user.



Relating concepts based on their similarity across documents leads to a network of related concepts, giving rise to the notion of an affinity table, which is used in this paper to find the concepts with the highest affinity. After presenting the initial set of results, the next set comes from the territory of the second related concept, then from the third related concept, and so on. This ensures that the user is not bombarded with hundreds of irrelevant search results. Moreover, if a document contains more than one concept and those concepts are related to each other in the affinity table, that document tends to be more useful to the user. We also propose to cache results in intermediate servers to provide a quick response: when a similar query originates from the connected network, the server can immediately serve the request from the cached results. Such stored data is assigned a life span, after which its precision rating is lowered; when the precision rating goes below a threshold δ, the aged data is permanently removed from the server. A sketch of the relevance computation is given below.
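A minimal NumPy sketch of the P_ij computation defined above; the toy frequency matrix is an illustrative assumption, and the max-guard against concepts that occur in no document is an implementation convenience rather than part of the paper's formulation.

```python
import numpy as np

def concept_relevance(O):
    """Compute P_ij = O_ij * log(N / n_i) from the concept-document frequency
    matrix O of Table 2, where N is the number of documents and n_i the number
    of documents containing keywords of concept C_i."""
    O = np.asarray(O, dtype=float)          # shape: (k concepts, N documents)
    N = O.shape[1]
    n_i = np.count_nonzero(O, axis=1)       # documents touching each concept
    idf = np.log(N / np.maximum(n_i, 1))    # guard against n_i = 0
    return O * idf[:, None]

# Toy example: 3 concepts, 4 documents.
P = concept_relevance([[2, 0, 1, 0], [0, 3, 0, 1], [1, 1, 1, 1]])
print(P.argmax(axis=0))   # concept with the highest relevance for each document
```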

4.1 Concept of Affinity Table

The affinity table AT is represented as a matrix [AT_ij]. It is a square matrix of size k × k, where k is the number of concepts in the selected field. Initially, all values in the affinity table are zero; the values are populated at a later stage, once the search algorithm starts working. The relation of documents to the various concepts under consideration is available after the initial TF × IDF computations, and this reveals the affinity between two different concepts. For example, when a person searches for a pesticide, the system can also show its side effects, the reason for a side effect, and an alternative, if any, with fewer side effects. In the classification, if concept x always ranks next to concept y when a specific pattern is searched, that indicates a high affinity; if a document belonging to concept x is never ranked in concept y, a very low affinity is implied. The calculated affinity values are entered in the affinity table as shown in Table 3. This becomes a self-learning system: as the number of searches increases, the precision of the values in the affinity table increases.

Table 3 The sample of an affinity table

      C1       C2       C3       C4       C5
C1    10.00    8.21     7.27     9.29     5.89
C2    8.21     10.00    9.73     5.63     8.73
C3    7.27     9.73     10.00    9.23     6.71
C4    9.29     5.63     9.23     10.00    6.96
C5    5.89     8.73     6.71     6.96     10.00



This enables the system to provide more accurate and valuable search results, giving the user maximum information related to the searched concept within the first set of results itself [15]. This saves a lot of the user's time, making this proposal a significant contribution in this area. An illustrative update rule for the affinity table is sketched below.
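The paper does not spell out the exact numerical update rule for the affinity table, so the following NumPy sketch is only one plausible interpretation: concepts that are co-ranked for the same query have their mutual affinity strengthened by a fixed, illustrative increment.

```python
import numpy as np

def update_affinity(AT, ranked_concepts, increment=1.0):
    """Illustrative (assumed) update: whenever two concepts are ranked together
    for the same query, strengthen their affinity symmetrically in the
    k x k table AT."""
    for a in ranked_concepts:
        for b in ranked_concepts:
            if a != b:
                AT[a, b] += increment
    return AT

k = 5
AT = np.zeros((k, k))
update_affinity(AT, ranked_concepts=[0, 2, 3])   # concepts co-ranked for one query
print(AT)
```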

5 Conclusion

Several papers in the area of ontology and context-based searching were reviewed, and existing systems were closely analysed to find their flaws and loopholes. Currently, we are working on the proposed method to improve ontology-based semantic similarity computation in the area of agriculture. Work is also in progress to store the affinity table as a hash table to reduce storage size and improve access. As future work, we suggest supporting the searching of multiple concepts together, which is not implemented in this paper. If semantic knowledge can be brought into information retrieval systems, intelligent reasoning can be implemented, which in turn will enhance the ability of machines to understand the meanings of concepts and thereby return the most related documents as search results.

References 1. Gruber, T.R.: Toward principles for the design of ontologiesused for knowledge sharing. Hum. Comput. Stud. 43, 907–928 (1995) 2. Jun Zhai, Meng Li, and Jianfeng Li “Semantic Information Retrieval Based on RDF and Fuzzy Ontology for University Scientific Research Management” Affective Computing and Intelligent Interaction 2012, AISC 137, pp. 661–668 3. Guerram, T., Mellal, N.: A domain independent approach for ontology semantic enrichment. In: 7th International Conference on Natural Language Processing (NLP 2018) pp. 13–19 (2018) 4. Stergiou, C., Psannis, K.E., Brij Gupta, B.-G.: Secure integration of IoT and Cloud Computing. Futur. Gener. Comput. Syst. 78, Part 3, pp. 964–975 (2018) 5. Li, Y., Keke, G., Longfei, Q., Meikang, Q., Zhao, H.: Intelligent cryptography ap-proach for secure distributed big data storage in cloud computing. Inf. Sci. 387, 103–115 (2017) 6. Yong, Y., Man, H.A., Giuseppe, A., Xinyi, H., Willy, S., Yuanshun, D., Geyong, M.: Identitybased remote data integrity checking with perfect data privacy preserving for cloud storage. In: IEEE Transactions on Information Forensics and Security, vol. 12, no. 4 (April 2017) 7. https://www.coursehero.com/file/43447716/ProtegeOWLTutorialppt/ 8. Neethukrishnan K V,Swaraj K P. “Ontology Based Research Paper Recommendation Using Personal Ontology Similarity Method”, Second 2017 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT) 9. Shujun, X., Qinglai, G., Jianhui, W., Chen, C., Hongbin, S., Boming, Z.: Information masking theory for data protection in future cloud-based energy management. IEEE Trans. Smart Grid (Early Access) https://ieeexplore.ieee.org/abstract/docu-ment/7586112/ in (2018) 10. Ye, Z., Zhan-lin, Y.: Research on ontology-based semantic similarity computation. Int. Conf. Mach. Vis. Hum.-Mach. Interface, 472–475 (2010) 11. Senthil Kumar, K., Abirami, A.: Personalized web search based on client side ontology. Int. J. Eng. Sci. Comput., 16083–16086 (2018)



12. https://uc-r.github.io/word_relationships 13. https://workingontologist.org/Examples/ 14. Joachims, T.: A probabilistic analysis of the rocchio algorithm with TFIDF for text categorization. In: Int. Conf. Machine Learning (1997) 15. Iman Keivanloo, Feng Zang, Ying Zou “Threshold-Free Code Clone Detection for a LargeScale Heterogeneous Java Repository”, 2015 IEEE Conference

Gaussian-Based Spatial FCM Technique for Interdisciplinary Image Segmentation Srirupa Das

Abstract Spatial information combined with the fuzzy membership function plays an important role in segmenting and classifying remote sensing images as well as medical images. In this paper, a Gaussian distribution-based spatial fuzzy c-means method has been proposed for the segmentation and classification of remote sensing images. To check the working principle of the proposed method in an interdisciplinary setting, it has also been tested on a brain MRI image. The intensity-based distances have been replaced with the complement of a Gaussian distribution to focus on the active artifacts in the datasets at the time of segmentation. The correlation of the neighbours has been estimated as a local spatial membership, which is used to deal with uncertainties and artifacts. The partition coefficient and partition entropy have been measured as quantitative statistical parameters, and the segmented images serve as qualitative parameters; both have been used to establish the superiority of the proposed method over the considered state-of-the-art techniques. Keywords Artifacts · Brain MRI image · Fuzzy c-means · Intensity inhomogeneity · Remote sensing · Segmentation · Spatial information

S. Das (B) RCC Institute of Information Technology, Kolkata, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_27




1 Introduction

In remote sensing, the spectral and spatial resolutions of satellite images vary depending on the sensors' properties. Due to low spatial resolution, each pixel of a hyperspectral image covers a huge area on the earth's surface; hence, classifying and unmixing hyperspectral pixels is an important task. Among the several available unmixing techniques [1, 2], the Linear Mixing Model (LMM) [3, 4] has been widely used. If the endmembers present in the image scene are linearly arranged and only single interactions occur between them, then LMM-based methods are useful for classification and unmixing. Generally, however, the arrangement of endmembers on the earth's surface is non-linear and they lie very close to each other. Many non-linear and bilinear models [5–8] are also widely used to deal with this non-linearity, such as Nascimento's model [8], Fan's model [6], the Support Vector Machine (SVM) [9, 10] and the Gaussian mixture model (GMM) [11]. Conventional fuzzy-based methods [12, 13] are also popular here, as fuzzy membership functions are effectively used in unmixing and perform well when the considered pixel is free from noise and uncertainty [14–19]. Non-homogeneity, artifacts and uncertainties are very common in hyperspectral pixels due to sensor errors and the close contact of endmembers on the surface. To deal with these issues, the authors in [18] considered the influence of local constraints in FCM-based methods; although the effect of noise and non-homogeneity can be minimized with this approach, it introduces a blurring effect at the edges. To overcome these issues, a fuzzy local information c-means technique (FLICM) was defined in [14]. Recently, the authors in [16] introduced an entropy-based FCM approach to segment image data, but the constraints related to classification in the presence of uncertainties are not fully addressed. To avoid the above-mentioned issues and obtain better classification in the presence of noise and a high degree of non-homogeneity, a Gaussian-based spatial fuzzy c-means method is proposed in this study. The spatial information is computed as a spatial membership function using a Gaussian-based distance and a geometrical distance. The proposed algorithm has been simulated and examined on the Jasper Ridge and brain MRI datasets, and it has shown its superiority in terms of visual output, partition coefficient and entropy over the considered techniques. The rest of the paper is arranged as follows: Section 2 describes the proposed method, Section 3 elaborates the result analysis, and Section 4 concludes the study.



2 Methodology

The proposed method deals with the issues that arise during the classification and segmentation of remote sensing images by state-of-the-art fuzzy-based methods in the presence of a high degree of non-homogeneity and noise. A 5 × 5 dynamic mask (M_i) is constructed to estimate the local spatial information, which is combined with the global membership for pixel x_i of the remote sensing dataset X = {x_i | i = 1, 2, …, (nr × nc × nz)}, where (nr × nc × nz) is the dimension of X. At the very beginning, the number of clusters C and the initial cluster centres v_j are determined by histogram-peak associative rules [17], and the standard deviation is calculated considering the cluster centres as sample means. The global membership u_{ij} is calculated for the entire image in the same way as in FCM. A local distance matrix D_{ij} is constructed from the difference between the neighbours in the mask and the cluster centres for each processing pixel, as defined in Eq. (1):

D_{ij} = \| M_i - v_j \|, \quad \forall j    (1)

The Gaussian-based distance G_{d_{ij}} between each member of the local distance matrix D_{ij} and its centre D_c is calculated using Eq. (2):

G_{d_{ij}} = e^{-\frac{(D_{ij} - D_c)^2}{2\sigma_1^2}}    (2)

Next, a spatial distance matrix S_{d_i} is constructed (Eq. (3)) by computing the Gaussian-based geo-spatial distance of each neighbour (s, t) from the processing pixel (i, j):

S_{d_i} = e^{-\frac{\|(i,j) - (s,t)\|^2}{2\sigma_2^2}}    (3)

The standard deviation σ_1 is computed for the whole image based on the cluster centres, and σ_2 is computed for the considered mask based on the distance matrix D_{ij}. The matrices generated from Eq. (2) and Eq. (3) are then convolved (Eq. (4)) to produce the resultant distance matrix L_{ij}, and the local membership is generated by normalizing L_{ij} so that it satisfies the constraints of a fuzzy membership function (Eq. (5)).

Algorithm 1: Gaussian-based spatial FCM technique.

L_{ij} = \sum_{\forall k,j \,|\, x_k \subset M_i} S_{d_i} * G_{d_{ij}}    (4)

L'_{ij} = \frac{L_{ij}}{\sum_{\forall j} L_{ij}}, \quad \text{with} \quad \sum_{\forall j} L'_{ij} = 1    (5)

The modified membership function u'_{ij} (Eq. (6)) is computed by combining the local and global membership functions. New cluster centres V are estimated from the modified membership (Eq. (7)), and the standard deviation for the image is updated based on the new cluster centres. The proposed method is executed iteratively; optimization of the objective function ensures optimal cluster centres and, consequently, optimal segmentation, as defined in Eq. (8).

u'_{ij} = \frac{u_{ij}^{p} \times L_{ij}^{q}}{\sum_{j=1}^{C} u_{ij}^{p} \times L_{ij}^{q}}, \quad \text{with} \quad \sum_{j=1}^{C} u'_{ij} = 1    (6)

V = \left\{ v_j \,\middle|\, v_j = \frac{\sum_{i=1}^{nr \times nc} (u'_{ij})^{m} \, x_i}{\sum_{i=1}^{nr \times nc} (u'_{ij})^{m}}, \; x_i \in X \right\}    (7)

J = \sum_{j=1}^{C} \sum_{i=1}^{nr \times nc} u_{ij}^{m} d_{ij} + \sum_{j=1}^{C} \sum_{i=1}^{nr \times nc} L_{ij}^{m} G_{d_{ij}}    (8)

The detailed process of execution of the proposed method has been depicted in Fig. 1 as a flowchart and the detailed steps have been shown in Algorithm 1.
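The paper implements the method in MATLAB; purely as an illustration of Eqs. (6) and (7), the following NumPy sketch fuses the global and local memberships and updates the cluster centres, with toy data and parameter values assumed.

```python
import numpy as np

def combine_memberships(u, L, p=1, q=1):
    """Eq. (6): fuse the global FCM membership u (n_pixels x C) with the local
    spatial membership L and renormalise over the clusters."""
    w = (u ** p) * (L ** q)
    return w / w.sum(axis=1, keepdims=True)

def update_centres(u_new, x, m=2):
    """Eq. (7): recompute cluster centres from the modified memberships."""
    w = u_new ** m                               # (n_pixels, C)
    return (w.T @ x) / w.sum(axis=0)[:, None]    # (C, n_features)

# Toy data: 6 pixels with 1 feature, 2 clusters (values are illustrative).
x = np.array([[0.1], [0.2], [0.15], [0.8], [0.9], [0.85]])
u = np.random.dirichlet(np.ones(2), size=6)
L = np.random.dirichlet(np.ones(2), size=6)
u_new = combine_memberships(u, L)
print(update_centres(u_new, x))
```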

Fig. 1 Detailed working principle of the proposed method as flowchart



3 Result and Discussion

The proposed Gaussian-based spatial FCM (GSFCM) method has been tested on a remote sensing dataset, the Jasper Ridge dataset [20], and also on a brain MRI image to check its interdisciplinary performance. The method has been developed and simulated in MATLAB (v. 2015a) on a Windows 10 environment with an Intel i3 processor and 4 GB of memory. An image scene of 100 × 100 pixels with 198 bands has been considered, with a spectral resolution of 9.46 nm. Four major endmembers are present in the image scene, namely soil, tree, water and road. The performance of the GSFCM has also been estimated for the brain MRI image [21] of size 181 × 217, slice 84. The statistical cluster-validation functions, i.e. the partition coefficient (V_pc) and partition entropy (V_pe) [18], have been measured to establish the superiority of the proposed GSFCM over the considered recent fuzzy-based works. The quantitative and qualitative performances of the proposed method are analysed in Table 1 and Figs. 2 and 3 for the Jasper Ridge and brain MRI datasets. The outcomes have also been compared with the considered fuzzy-based methods, namely FCM [12], ASIFC [19], FLICM [14], sFCM [18] and EFCM [16], and the proposed GSFCM shows significant improvements.

Table 1 Comparisons of statistical quantitative results of GSFCM with considered methods for Jasper ridge and MRI datasets

Dataset / Band                              Validation function   FCM      ASIFC    FLICM    sFCM     EFCM     Proposed Method
Jasper ridge dataset [20], band 22          Vpc                   0.7467   0.8099   0.8783   0.9399   0.7423   0.9802
                                            Vpe                   0.4808   0.3822   0.2351   0.1004   0.5032   0.0282
Band 55                                     Vpc                   0.7468   0.7665   0.9079   0.9402   0.8148   0.9733
                                            Vpe                   0.4808   0.4348   0.1556   0.1007   0.3489   0.0200
Band 112                                    Vpc                   0.7469   0.7462   0.8827   0.9464   0.8049   0.9795
                                            Vpe                   0.4807   0.4841   0.2149   0.0893   0.3856   0.0173
Band 119                                    Vpc                   0.7467   0.7426   0.8996   0.9477   0.8078   0.9817
                                            Vpe                   0.4808   0.4883   0.1836   0.0877   0.3723   0.0130
Band 163                                    Vpc                   0.7468   0.7376   0.8757   0.9409   0.7951   0.9822
                                            Vpe                   0.4807   0.4986   0.2248   0.0982   0.4053   0.0159
Brain MRI (9% noise, 40% IIH), slice #84    Vpc                   0.803    0.843    0.8015   0.897    0.906    0.9764
                                            Vpe                   0.378    0.290    0.3847   0.180    0.119    0.0387


Fig. 2 Qualitative results for Jasper ridge dataset of band 112. 1st column represents original image (band 112), 1st row contains water, 2nd row contains vegetation, 3rd row contains soil and 4th row contains road for FCM [12], ASIFC [19], FLICM [14], sFCM [18], EFCM [16] and proposed method


Fig. 3 Qualitative results for T1 weighted brain MRI. 1st column represents original image (#84) with 9% noise and 40% inhomogeneity, 1st row contains WM, 2nd row contains GM and 3rd row contains CSF for FCM [12], ASIFC [19], FLICM [14], sFCM [18], EFCM [16] and proposed method

4 Conclusion

In this paper, a Gaussian distribution-based spatial fuzzy c-means method has been proposed for the unmixing and classification of remote sensing images. To check the working principle of the proposed method in an interdisciplinary setting, it has also been tested on a brain MRI image in the presence of non-homogeneity and noise. The intensity-based distances have been replaced with the complement of a Gaussian distribution to focus on the active artifacts in the datasets at the time of classification. The partition coefficient and partition entropy have been measured as quantitative



statistical parameters and the segmented image as qualitative parameters, which are shown in Table 1, Fig. 2 and Fig. 3 respectively. From the table and figures, it has been understood that the proposed method shows significant improvement over considered fuzzy-based techniques.

References 1. Keshava, N., Mustard, J.F.: Spectral unmixing. IEEE Signal Process. Mag. 19(1), 44–57 (2002) 2. Meer, F.V.D.: Iterative spectral unmixing (isu). Int. J. Remote Sens 20(17), 3431–3436 (1999) 3. Heinz, D.C., Chang, C.-I.: Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 39(3), 529–545 (2001) 4. Zanotta, D.C., Haertel, V., Shimabukuro, Y.E., Renno, C.D.: Linear spectral mixing model for identifying potential missing endmembers in spectral mixture analysis. IEEE Trans. Geosci. Remote Sens. 52(5), 3005–3012 (2014) 5. Dobigeon, N., Tits, L., Somers, B., Altmann, Y., Coppin, P.: A comparison of nonlinear mixing models for vegetated areas using simulated and real hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7(6), 1869–1878 (2014) 6. Fan, W., Baoxin, H., Miller, J., Mingze, L.: Comparative study between a new nonlinear model and common linear model for analysing laboratory simulated—forest hyperspectral data. Int. J. Remote Sens. 30(11), 2951–2962 (2009) 7. Halimi, A., Altmann, Y., Dobigeon, N., Tourneret, J.-Y.: Nonlinear unmixing of hyperspectral images using a generalized bilinear model. IEEE Trans. Geosci. Remote Sens. 49(11), 4153– 4162 (2011) 8. Nascimento, J.M.P. Bioucas-Dias, J.M.: Nonlinear mixture model for hyperspectral unmixing. In: Bruzzone, L., Notarnicola, C., Posa, F. (eds.) Proceedings of the SPIE Image and Signal Processing for Remote Sensing XV, Berlin, Germany, 7477 (2012) 9. Ping-Xiang, L., Wu, B. Zhang , L.: Abundance estimation from hyperspectral image based on probabilistic outputs of multi-class support vector machines. Paper presented at the IEEE International Geoscience and Remote Sensing Symposium, Seoul, pp. 4315–4318 (2005) 10. Tang, Y., Krasser, S., Yuanchen H., Yang, W., Alperovitch, D.: Support vector machines and random forests modeling for spam senders behavior analysis. Paper presented at the IEEE Global Telecommunications Conference, New Orleans (2008) 11. Cheng, B., Zhao, C., Wang, Y.: Algorithm to unmixing hyperspectral images based on APSOGMM. Paper presented at the 2010 First International Conference on Pervasive Computing, Signal Processing and Applications Signal Processing and Applications, Harbin, pp. 964–967 (2010) 12. Foody, G.M.: Approaches for the production and evaluation of fuzzy land cover classifications from remotely-sensed data. Int. J. Remote Sens. 17, 1317–1340 (1996) 13. Bastin, L.: Comparison of fuzzy c-means classification, linear mixture modelling and MLC probabilities as tools for unmixing coarse pixels. Int. J. Remote Sens. 18(17), 3629–3648 (1997) 14. Krinidis, S., Chatzis, V.: A robust fuzzy local information C-means clustering algorithm. IEEE Trans. Image Process. 19(5), 1328–1337 (2010) 15. Ma, A., Zhong, Y., Zhang, L.: Adaptive multi objective memetic fuzzy clustering algorithm for remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 53(8), 4202–4217 (2015) 16. Kahali, S., Sing, J.K., Saha, P.K.: A new entropy-based approach for fuzzy c-means clustering and its application to brain MR image segmentation. Soft. Comput. 23, 10407–10414 (2019) 17. Namburu, A., Samayamantula, S.K., Edara, S.R.: Generalised rough intuitionistic fuzzy cmeans for magnetic resonance brain image segmentation. IET Image Proc. 11(9), 777–785 (2017)



18. Chuang, K.S., Tzeng, H.L., Chen, S., Wu, J., Chen, T.J.: Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imaging Graph 30(1), 915 (2006) 19. Wang, Z., Song, Q., Soh, Y.C., Sim, K.: An adaptive spatial information-theoretic fuzzy clustering algorithm for image segmentation. Comput. Vis. Image Underst. 117(10), 1412–1420 (2013) 20. https://www.escience.cn/people/feiyunZHU/Dataset_GT.html 21. https://www.bic.mni.mcgill.ca/brainweb/

COVID-19 India Forecast Preparedness for Potential Emergencies Narayana Darapaneni, Ankit Rastogi, Bhagyashri Bhosale, Subhash Bhamu, Turyansu Subhadarshy, Usha Aiyer, and Anwesh Reddy Paduri

Abstract The aim of this work is to forecast COVID-19 information for India and to predict an estimate of the number of beds needed in hospitals and healthcare centres for the predicted active cases; this forecast would aid Indian government agencies in preparing for and handling the needs of people who are infected with COVID-19 and require hospital care. Initially, literature related to COVID-19 was reviewed to understand the global situation and the prediction models being used to forecast COVID-19 information. Subsequently, a visual exploratory data analysis was performed on the available official datasets pertaining to India to understand the state-wise impact of the COVID-19 outbreak. Furthermore, an SIR model was designed to forecast COVID-19 information related to susceptible, infected and recovered cases, from which an estimate of the required hospital beds can be computed. This estimate of the hospital beds needed for active COVID-19 patients would be handy for government agencies, especially while planning their preparatory steps to handle potential emergencies. The dataset specific to COVID-19 in India was analysed, as it holds data records since March 2020. A total of 365 days was considered in the SIR model for the prediction of COVID-19 information specific to the Indian states. On further analysis, it was observed that Maharashtra is a highly impacted state, so for this use case the SIR model was designed to forecast the possible COVID-19 confirmed, recovered and active cases in the upcoming 6 months. Based on this forecast, an estimate of the beds available in hospitals can be computed so that the agencies can handle active COVID-19 patients. Keywords Covid-19 · Pandemic · Infected · Susceptible · SIR Model

N. Darapaneni Northwestern University/Great Learning, Evanston, USA A. Rastogi · B. Bhosale · S. Bhamu · T. Subhadarshy · U. Aiyer · A. Reddy Paduri (B) Great Learning, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_29




1 Introduction

COVID-19 is a disease caused by a novel virus called SARS-CoV-2, the severe acute respiratory syndrome coronavirus 2 [11], which has spread over more than 200 nations, infecting more than 60 lakh people around the world. The outbreak of this sickness was declared a pandemic by the World Health Organization on March 11, 2020. India too is profoundly impacted by this viral disease, with more than 200,000 people affected [7]. The symptoms of COVID-19 range from mild to severe and include fever, aches and pains, nasal congestion, runny nose and sore throat; for some people, it can cause serious illness [8]. Around 1 in every 5 people who are infected with COVID-19 develop difficulty in breathing and require hospital care [3]. The two most significant modes of transmission of the coronavirus are respiratory droplets and contact transmission, with an incubation period of 2–14 days. It is an infectious disease caused by the coronavirus and is now a pandemic affecting many countries across the globe [12]. The infection has spread globally in countries such as China, the USA, Spain, Russia, the UK, Italy and India, and numerous investigations have been published by scientists globally to comprehend the dynamics of this pandemic. The Indian government took prompt preventive measures from the time the first COVID-19-infected individual was identified in Kerala. Though various preventive measures were imposed, viz. a nation-wide lockdown, screening of travellers at airports, suspension of transport including domestic and international flights, and dedicated COVID-19 hospitals and test centres, the spread of COVID-19 has not yet subsided. To address this issue, AI-based models are essential, first for predicting the spread of such infectious diseases, as these diseases quickly spread from one person to another. Moreover, these AI-based models can provide deep insights into the impact of COVID-19, such as the proportion of the population infected, deceased or recovered [2]. To model infectious diseases such as COVID-19 that spread sporadically from one person to another, insights have to be gained about the pattern of spread, viz. the proportion of a population it infects, the proportion that expires, and so forth [9]. It is recommended to use a compartmental AI model, which separates the population into several compartments, namely Susceptible, Infected and Recovered [9]. Though the number of recovered cases in India is large, the number of COVID-19 infected cases has been growing rapidly, which is a cause of big concern. Looking at the current situation across the globe and in India, government agencies need to keep essential medical resources ready for use as a step towards preparedness for potential emergencies. Moreover, taking into consideration the demographic and land diversity in India, a distinct state-wise prediction of COVID-19 curve flattening is needed to determine the needs of medical essentials like PPE kits and the availability of hospital beds, using AI models [12].


1.1 Objective
Build an AI-based predictive model to forecast COVID-19 infected cases in India and, based on it, estimate the number of beds available in hospitals and healthcare centers; this would aid in planning care for COVID-19 infected patients. The forecast information would help Indian government agencies meet the needs of people who require hospital care for COVID-19 treatment.

2 Methodology and Results
Initially, a visual exploratory data analysis (EDA) was performed on the available COVID-19 India datasets to understand the state-wise impact of the COVID-19 outbreak, so that the necessary preparatory steps can be planned by government agencies to handle emerging potential emergencies. At the beginning of the pandemic only a small number of people brought the infection into the country, but the entire population is susceptible, as such diseases spread rapidly. There is a time lag between when a person gets infected and when they are confirmed positive; therefore, the total number of infected people can be assumed to be greater than the total number of confirmed positives during exponential growth, and approximately equal to the total number of confirmed cases once the growth stops [4]. To provide insight into the actual situation and to forecast COVID-19 information, the SIR model can be used, as it is apt to fit the exponential growth, linear growth, and natural decline of cases that are possible in a pandemic [4].
Therefore, two models, the SIR model and the ARIMA model, were analyzed for forecasting COVID-19 information such as active, confirmed, and recovered cases. It was observed that the ARIMA model had certain limitations, as it could forecast COVID-19 information for at most one to two weeks. Hence, for this use case, the SIR model was used to predict the COVID-19 susceptible population, recovered population, and active cases in India. Based on this forecast information, an estimate of the number of beds available in hospitals and healthcare centers is computed, so that government agencies can effectively plan to accommodate COVID-19 infected patients across different states in India.


While implementing the SIR model, the following computations were performed [4]:

Rate of change of the susceptible population: $\dfrac{dS}{dt} = -\beta \cdot I \cdot \dfrac{S}{N}$  (1)

Rate of change of the infectious population: $\dfrac{dI}{dt} = \beta \cdot I \cdot \dfrac{S}{N} - \gamma \cdot I$  (2)

Rate of change of the recovered/removed population: $\dfrac{dR}{dt} = \gamma \cdot I$  (3)
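Equations (1)–(3) can be integrated numerically to generate the forecast curves. The following minimal sketch uses SciPy's odeint; the population size, initial counts, and the β and γ rates are chosen purely for illustration and are not the fitted values used in this study.

```python
import numpy as np
from scipy.integrate import odeint

def sir_derivatives(y, t, n, beta, gamma):
    """Right-hand side of Eqs. (1)-(3): returns dS/dt, dI/dt, dR/dt."""
    s, i, r = y
    ds_dt = -beta * i * s / n
    di_dt = beta * i * s / n - gamma * i
    dr_dt = gamma * i
    return ds_dt, di_dt, dr_dt

# Illustrative values only: population, initial infections and rates are placeholders.
n = 120_000_000          # e.g. population of a large Indian state
i0, r0 = 100, 0          # initial infected and recovered counts
s0 = n - i0 - r0
beta, gamma = 0.25, 0.1  # assumed transmission and recovery rates

t = np.linspace(0, 365, 365)                       # forecast horizon of 12 months
s, i, r = odeint(sir_derivatives, (s0, i0, r0), t,
                 args=(n, beta, gamma)).T          # columns: S(t), I(t), R(t)
```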

The following COVID-19 India live dataset was used to perform the above operations: https://api.covid19india.org/csv/latest/state_wise_daily.csv [1]. This dataset is maintained by volunteers who collect data from trusted sources and feed it into a spreadsheet in standard .csv format. The curated data are used to obtain the COVID-19 status and patient data across India [1]. The datasets are scalable in that they provide the latest cumulative daily COVID-19 counts at the national, state, and district levels, and the contents of the repository are reviewed regularly and made official for public use [1]. In this use case, the COVID-19 India live dataset is used to fetch the confirmed, recovered, and deceased status of COVID-19 patients across India for forecasting. The active cases are computed daily by deducting the recovered cases from the confirmed cases.
From the SIR model applied to this dataset, it is observed that Maharashtra is the most affected state of India; Fig. 1 depicts predicted versus actual COVID-19 data until July 2020 and shows a rapid growth in the spread of the disease in Maharashtra. Additionally, Fig. 2 displays the trend of the infected, recovered, and confirmed COVID-19 cases (y-axis: 1 unit = 100,000) in Maharashtra until March 2021.
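A rough sketch of how the live feed can be pulled and the daily active cases derived is shown below; the column names ("Date", "Status", and two-letter state codes such as "MH") are assumed from the public CSV layout, and this is not the authors' exact pipeline.

```python
import pandas as pd

# Column names are assumed from the public state_wise_daily.csv layout.
url = "https://api.covid19india.org/csv/latest/state_wise_daily.csv"
daily = pd.read_csv(url)
daily["Date"] = pd.to_datetime(daily["Date"], dayfirst=True, errors="coerce")

# Daily new counts per status for Maharashtra, then cumulative totals.
mh = (daily.pivot_table(index="Date", columns="Status", values="MH", aggfunc="sum")
           .sort_index()
           .cumsum())

# As in the text, active cases are confirmed cases minus recovered ones
# (deceased cases could be subtracted as well).
mh["Active"] = mh["Confirmed"] - mh["Recovered"]
print(mh.tail())
```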

Fig. 1 Number of Covid-19 infected people in Maharashtra


Fig. 2 Trend of infected, recovered, total

The SIR model was designed in this use case to forecast the COVID-19 information, and Fig. 3 visualizes the predicted infected population versus the actual active COVID-19 cases in Maharashtra state over a period of 12 months.

Fig. 3 Covid-19 predicted versus actual active cases in Maharashtra


Furthermore, based on the forecast information, an estimate of the number of beds available in hospitals and healthcare centers across the public and private health sectors of Maharashtra is computed, so that government agencies can effectively plan to accommodate the active patients for medical care and treatment. The estimates of the existing hospital capacity of India's public and private health sectors, in terms of the number of available hospital beds, produced by a team of researchers affiliated with CDDEP, are listed below [6].
• Total number of hospital beds in Maharashtra (public and private sectors): 231,739.
• This estimate is used in computing the threshold value of available beds.
Figure 4 depicts, as a green dotted line, the threshold value of available hospital beds for COVID-19 patients who need hospital care in Maharashtra. In addition, Fig. 5 depicts, as a green dotted line, the threshold value of available ICU beds in hospitals [10] for COVID-19 active patients who are in a critical stage and need ICU treatment.

Fig. 4 Hospitalized (20%) active cases versus available beds


Fig. 5 Available ICU beds versus critical cases in Maharashtra

In addition, the estimates of the existing ICU capacity of India's public and private health sectors, produced by a team of researchers affiliated with CDDEP, are listed below [6].
• Total number of ICU beds in Maharashtra (public and private health sectors): 11,587.
• This estimate is used in computing the threshold value of available ICU beds.
Furthermore, when SIR modelling is applied to other states in India, Karnataka and Tamil Nadu are found to be the next most affected states after Maharashtra, which has the highest number of COVID-19 cases in India. Figure 6 visualizes the predicted infected population versus the actual active COVID-19 cases in Karnataka, and Fig. 7 shows, as a green dotted line, the threshold of available hospital beds for COVID-19 patients. Figure 8 visualizes the predicted infected population versus the actual active COVID-19 cases in Tamil Nadu over a period of 12 months, and Fig. 9 shows, as a green dotted line, the threshold of available hospital beds for infected patients who need hospital care in Tamil Nadu.
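The bed and ICU thresholds can be checked against a forecast with a simple calculation such as the sketch below; the 20% hospitalization share follows the figure captions, the roughly 5% critical-care share follows the report cited in [8], and the forecast value passed in is only an example.

```python
# Illustrative back-of-the-envelope check of forecast demand against capacity.
TOTAL_BEDS_MH = 231_739   # hospital beds in Maharashtra (public + private) [6]
TOTAL_ICU_MH = 11_587     # ICU beds in Maharashtra (public + private) [6]

def capacity_gap(active_forecast, hosp_share=0.20, icu_share=0.05,
                 beds=TOTAL_BEDS_MH, icu_beds=TOTAL_ICU_MH):
    """Return (hospital-bed shortfall, ICU-bed shortfall) for a forecast of active cases.

    The hospitalization and critical-care shares are assumptions, not fitted values.
    """
    need_beds = hosp_share * active_forecast
    need_icu = icu_share * active_forecast
    return max(0.0, need_beds - beds), max(0.0, need_icu - icu_beds)

print(capacity_gap(active_forecast=1_500_000))  # example forecast of active cases
```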


Fig. 6 Covid-19 predicted versus actual cases in Karnataka

3 Discussion and Conclusion
Overall, the prediction model designed in this use case, the SIR model, is very effective in predicting the required COVID-19 information, namely the rate of change of the susceptible, infectious, and recovered populations. This predicted information enables a fairly accurate estimate of the essential healthcare and medical needs, viz. the number of beds currently available in hospitals and the number of additional beds required to serve the predicted infected patients. Most researchers are focusing on constantly improving their prediction models to incorporate feedback and new findings, or drawing on experiments carried out on other diseases that used more complicated models, typically with more compartments. With continued research, emerging prediction models may produce more accurate results, but the research needs to be ongoing to scale the developed models to future datasets covering new demographics and populations.


Fig. 7 Hospitalized (20%) active cases versus available beds

Based on the visual exploratory data analysis and the SIR prediction model applied to the COVID-19 India live dataset, it is observed that the impact of the COVID-19 outbreak on India is increasing drastically. The beds currently available in hospitals after occupancy by patients are depicted in Fig. 10 for the five most affected states, viz. Maharashtra, Tamil Nadu, Karnataka, Gujarat, and Delhi.


Fig. 8 Predicted versus actual active cases in Tamilnadu

This estimate of the current availability of hospital beds and of the number of additional beds required in healthcare centers would be a vital aid for government agencies to plan ahead for a crisis, especially when the number of COVID-19 active cases goes beyond the number of available hospital beds.


Fig. 9 Hospitalized (20%) active cases versus beds available

Fig. 10 Total available beds


References
1. Dhanwant, J.N., Ramanathan, V.: Forecasting COVID 19 growth in India using Susceptible-Infected-Recovered (S.I.R) model (2020). arXiv [q-bio.PE]
2. "COVID19-India API," Covid19india.org. https://api.covid19india.org/documentation/csv/ (2020). Accessed 30 Nov 2020
3. Froese, H.: Infectious disease modelling, part I: Understanding SIR. Towards Data Science, 06-Apr-2020. https://towardsdatascience.com/infectious-disease-modelling-part-i-understanding-sir-28d60e29fdfc?gi=8564062200eb (2020). Accessed 30 Nov 2020
4. Kumar, K., Meitei, W.B., Singh, A.: Projecting the future trajectory of COVID-19 infections in India using the susceptible-infected-recovered (SIR) model. Iipsindia.ac.in. https://iipsindia.ac.in/sites/default/files/iips_covid19_pfti.pdf (2020). Accessed 30 Nov 2020
5. Kapoor, G., et al.: State-wise estimates of current hospital beds, intensive care unit (ICU) beds and ventilators in India: are we prepared for a surge in COVID-19 hospitalizations? (2020). bioRxiv, p. 2020.06.16.20132787
6. Maplesoft.com. https://www.maplesoft.com/applications/download.aspx?SF=127836/SIRModel.pdf (2020). Accessed 30 Nov 2020
7. Researchgate.net. https://www.researchgate.net/publication/340362418_Modeling_and_Predictions_for_COVID_19_Spread_in_India (2020). Accessed 30 Nov 2020
8. Ghosh, A.: Glimmer in Covid surge: less than 5% of all patients require critical care. The Indian Express, 29-May-2020 (2020)
9. "No title," Thelancet.com. https://www.thelancet.com/pdfs/journals/laninf/PIIS1473-3099(20)30300-5.pdf (2020). Accessed 30 Nov 2020
10. "Hospitals in the Country," Gov.in. https://pib.gov.in/PressReleasePage.aspx?PRID=1539877. Accessed 30 Nov 2020
11. Media statement: Knowing the risks for COVID-19. Who.int. https://www.who.int/indonesia/news/detail/08-03-2020-knowing-the-risk-for-covid-19 (2020). Accessed 30 Nov 2020
12. Gupta, D.P., Sharma, P.K.K., Joshi, P.S.D., Goyal, D.S.: A data-driven method to detect the flattening of the COVID-19 pandemic curve and estimating its ending life-cycle using only the time-series of new cases per day (2020). bioRxiv, p. 2020.05.15.20103374

Prediction of Cyclodextrin Host-Guest Binding Through a Hybrid Support Vector Method Ruan M. Carvalho, Iago G. L. Rosa, Priscila V. Z. C. Goliatt, Diego E. B. Gomes, and Leonardo Goliatt

Abstract Applying in silico experiments is part of interdisciplinary areas such as the rational study of drugs, since it reduces the time and cost of discovering new drugs. This work applies a hybrid machine learning technique coupling Randomized Search and Support Vector Regression to predict the complexation energy between cyclodextrin and ligand molecules in host-guest systems. The method was able to fit the data (R² = 0.776) with low prediction errors (RMSE = 1.932 kJ/mol and MAE = 1.351 kJ/mol). The results were compatible with the literature, even though a method of lower computational complexity was used.
Keywords Molecular interaction · Machine learning · Cyclodextrin

1 Introduction The inclusion of in silico experiments in the scientific context in the past decades allowed the consolidation of interdisciplinary areas such as bioinformatics, computational biology, computational chemistry, among others that seek to describe,

R. M. Carvalho (B) · I. G. L. Rosa · P. V. Z. C. Goliatt · D. E. B. Gomes · L. Goliatt Computational Modeling, Federal University of Juiz de Fora (UFJF), José Lourenco Kelmer, São Pedro, Juiz de Fora, Minas Gerais 36036-330, Brazil e-mail: [email protected] I. G. L. Rosa e-mail: [email protected] P. V. Z. C. Goliatt e-mail: [email protected] D. E. B. Gomes e-mail: [email protected] L. Goliatt e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_30


Fig. 1 Chemical structure of the three classes of cyclodextrins used in the work

understand, and predict natural events through mathematical equations and computational methods [1]. In this context, researchers are commonly interested in predicting measures of interaction between molecules, mainly to enable the rational study of drugs [1]. Screening potential drugs computationally aims to reduce the time and cost of discovering new drugs [1, 2]. Cyclodextrins are a class of molecules with a diverse field of application in drugs, food, and agriculture. Their solubility allows the creation of complexes through interaction with different guest molecules in their nonpolar interior [3]. Furthermore, their spatial arrangement (Fig. 1) allows their application as carrier molecules [4]. In the pharmaceutical area, this type of inclusion complex has been used mainly in vehicles that enhance the pharmacokinetics of drugs inside the body by incorporating nonpolar (and sometimes cytotoxic) drugs inside wrapper molecules during drug delivery [3, 4]. Such applicability makes cyclodextrins essential molecules for developing controlled drug-release technologies, which justifies studies that predict the interaction properties of these molecules through computational molecular-screening techniques. For this, there are several proposals for objective functions based on mathematical models from the perspective of classical or quantum physics [5, 6]. More recently, data-driven predictive models adjusted by machine learning methods have emerged [5, 7]. Some of these methods have shown results superior to physics-based models, besides having lower prediction times [5]. This paper presents the use of a hybrid support vector learner for the prediction of interaction measures between molecular pairs. We used simplified host-receptor systems, commonly referred to as toy systems for the host-guest problem [8], considering hosts with well-known physical properties and receptors with reduced geometric and dynamic complexity.


2 Materials and Methods
2.1 Data Collection
All selected data were curated and made available by the BindingDB community (https://www.bindingdb.org). Each record covers a host (large) and a guest (small) molecular complex. The database provides the molecular structural information through SMILES strings, the experimental conditions, including pH and temperature (°C), and the binding free energy, measured as ΔG (kJ/mol). Here we focused on the α-, β-, and γ-cyclodextrin (-CD) classes [9], as shown in Fig. 1, due to their larger data availability. We only considered experiments with available pH and temperature measurements within the ranges 6.9 ≤ pH ≤ 7.4 and 14.5 ≤ Temp ≤ 30.1, resulting in 280 unique observations of α-CD (73), β-CD (164), and γ-CD (43). From the SMILES representation of each host and guest molecule, the other physico-chemical properties are calculated using the RDKit Descriptor Calculation module [10] from KNIME (https://www.knime.com). Table 1 presents the calculated descriptors for the three host molecules considered. The guest molecules are represented with the same descriptors plus the Formal Charge (FC). Figure 2 shows some of the distributions of the guest-molecule descriptors, plus the distribution of the complexation energy values.
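The descriptor set of Table 1 can be approximated directly with RDKit alone. In the sketch below the function name and the phenol example are ours, and the approximate surface area is taken as RDKit's Labute ASA, which may differ from the KNIME node the authors used.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors

def guest_descriptors(smiles: str) -> dict:
    """Descriptors analogous to those in Table 1 (exact node settings not reproduced)."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        "SlogP": Crippen.MolLogP(mol),
        "SMR": Crippen.MolMR(mol),
        "ASA": rdMolDescriptors.CalcLabuteASA(mol),   # approximation of the ASA column
        "TPSA": Descriptors.TPSA(mol),
        "AMW": Descriptors.MolWt(mol),
        "HBA": Descriptors.NumHAcceptors(mol),
        "HBD": Descriptors.NumHDonors(mol),
        "RB": Descriptors.NumRotatableBonds(mol),
        "Atoms": Chem.AddHs(mol).GetNumAtoms(),       # total atoms incl. hydrogens
        "FC": Chem.GetFormalCharge(mol),
    }

print(guest_descriptors("c1ccccc1O"))  # e.g. phenol as a toy guest molecule
```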

Table 1 Descriptors for each cyclodextrin

        SlogP     SMR^a     ASA^b     TPSA^c    AMW^d      HBA^e   HBD^f   RB^g   Atoms
α-CD    −13.055   195.800   372.130   474.900    972.846   30.0    18.0    6.0    126.0
β-CD    −15.231   228.434   433.924   554.050   1134.987   35.0    21.0    7.0    147.0
γ-CD    −17.406   261.067   495.717   633.200   1297.128   40.0    24.0    8.0    168.0

^a SMR: Molecular Refractivity. ^b ASA: Approximate Surface Area. ^c TPSA: Topological Polar Surface Area. ^d AMW: Average Molecular Weight. ^e HBA: HB Acceptor. ^f HBD: HB Donor. ^g RB: number of Rotatable Bonds

Fig. 2 Distributions for guest descriptors: (a) SlogP, (b) RB, (c) FC, and (d) ΔG

2.2 Machine Learning Approach
The database contains the following characteristics: 280 unique observations, 145 guests, 3 hosts, and 25 descriptors (9 for hosts, 10 for guests, 2 for the environment,


3 identifiers, and 1 objective variable). The database was divided into training (224) and testing (56) sets using the stratified K-fold method, to maintain the same proportion of instances of each cyclodextrin class in both sets [7], coupled with a Kullback–Leibler divergence analysis between the sets. The data can be found in the Git repository.
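A class-stratified split of this kind can be reproduced with scikit-learn; the sketch below uses stand-in arrays in place of the real descriptors and does not reproduce the authors' fold construction or the Kullback–Leibler check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays: in practice X holds the 25 descriptors, y the ΔG values, and
# cd_class the cyclodextrin label of each of the 280 observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(280, 25))
y = rng.normal(loc=-12.0, scale=4.0, size=280)
cd_class = np.repeat(["alpha", "beta", "gamma"], [73, 164, 43])

# A 224/56 split stratified by cyclodextrin class, mirroring the paper's proportions.
X_tr, X_te, y_tr, y_te, c_tr, c_te = train_test_split(
    X, y, cd_class, test_size=56, stratify=cd_class, random_state=0)
print(len(y_tr), len(y_te))  # 224 56
```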

2.2.1 The ε-Support Vector Regression (ε-SVR)

The ε-Support Vector Regression (ε-SVR) is a classical method [11] applied in several fields and cited as a potential tool for drug discovery studies [12]. Thus, we used this ML technique as the core of our predictions, and we investigated whether acceptable prediction levels can be achieved with a method of low computational complexity such as the ε-SVR. The ε-SVR is a linear regression model:

$$f(x) = \sum_{j=1}^{N} w_j\, K(x, x_j) + b, \qquad (1)$$

where $K(\cdot,\cdot)$ is a kernel function or nonlinear transformation, $w = [w_1, \ldots, w_N]^{\top}$ is the vector of weights, $b$ is a bias, and $N$ is the number of samples. In this paper, we use the radial basis kernel function as the nonlinear transformation (Eq. 2), where $\Gamma = \frac{1}{2\sigma}$ and $\sigma$ is the length-scale parameter:

$$K(x, x_i) = \exp\!\left(-\Gamma\, \|x - x_i\|^2\right) \qquad (2)$$

In ε-SVR, the optimal $w$ and $b$ are computed by minimizing Eq. (3) [13]:

$$J = \sum_{i=1}^{N} w_i^2 + \frac{C}{N}\sum_{i=1}^{N} L_\varepsilon\!\left(y_i - f(x_i)\right), \qquad
L_\varepsilon\!\left(y - f(x)\right) = \begin{cases} 0 & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| & \text{otherwise,} \end{cases} \qquad (3)$$

where $y_i$ is the output associated with $x_i$, $L_\varepsilon$ is the ε-insensitive loss function [14], $C$ is a regularization parameter, and $\varepsilon$ is an SVR parameter. The internal parameters of the ε-SVR model to be adjusted are $C$, $\varepsilon$, and $\Gamma$, which form the model-building parameter vector tuned through a Randomized Search strategy.

2.2.2 Randomized Search (RS) Strategy

In addition to the internal parameters adjusted in the training step, ML methods are generally sensitive to hyperparameter definitions [15]. In this work, we propose an SVR learner coupled with the hyperparameters tuning through an RS strategy. The RS performs a random choice of values on the parameters, where each setting is sampled from a distribution over possible parameter values [16]. The strategy allows

(Fig. 3 panels: (a) C, (b) ε, (c) Γ; the best-selected model has C = 8308.25, ε = 0.19, Γ = 0.09.)

Fig. 3 Best hyperparameters tuning distribution during 1000 runs of Randomized Search. The hatched bar indicates the range that involves the best-selected model

a budget to be chosen independently of the number of parameters; moreover, adding parameters that do not influence performance does not decrease efficiency. Each run of RS in the present paper uses threefold cross-validation and 1000 samples of ε-SVR parameters drawn from uniform distributions, where C ∈ [0, 10^4], ε ∈ [0, 10], and Γ ∈ [0, 10]. Each ε-SVR machine is adjusted in at most 10,000 iterations.
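In scikit-learn terms, the coupling of RS and ε-SVR corresponds roughly to the sketch below. It mirrors the stated distributions and budget but is not the authors' implementation; the fit call is left commented out because the training arrays are not defined here.

```python
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

param_distributions = {
    "C": uniform(0, 1e4),        # C ~ U[0, 10^4]
    "epsilon": uniform(0, 10),   # ε ~ U[0, 10]
    "gamma": uniform(0, 10),     # Γ ~ U[0, 10]
}
search = RandomizedSearchCV(
    SVR(kernel="rbf", max_iter=10_000),
    param_distributions,
    n_iter=1000,                 # 1000 sampled parameter settings
    cv=3,                        # threefold cross-validation
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
# search.fit(X_tr, y_tr)        # e.g. the stratified training split sketched earlier
# print(search.best_params_)
```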

2.2.3 Model Performance Criterion

We apply three metrics to calculate the prediction errors on the training and test sets: (i) $R^2 = 1 - \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \big/ \sum_{i=1}^{n}(y_i - \bar{y})^2$, (ii) $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, and (iii) $\mathrm{MAE} = \tfrac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$, where $y$ is the measured value, $\hat{y}$ is the predicted value, and $\bar{y} = \tfrac{1}{n}\sum_{i=1}^{n} y_i$ is the average of the measured values over the $n$ instances.
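The same three metrics are available in scikit-learn, as in this small sketch with toy values only:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true, y_pred):
    """R², RMSE and MAE in the same units as ΔG (kJ/mol)."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return r2_score(y_true, y_pred), rmse, mean_absolute_error(y_true, y_pred)

print(report([-10.0, -12.5, -8.0], [-9.5, -13.0, -8.4]))  # toy values
```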

3 Results and Discussion
The prediction results for the optimized SVR model are presented next. First, Fig. 3 shows the distribution of the optimized parameters selected over 1000 runs of RS, and Table 2 shows the overall average metrics for the training and testing sets. Both results demonstrate the consistency of the method in the given task. Note that, in Fig. 3b, even though the ε search space was defined over a larger range, the best models were consistent, having ε ≤ 3. A similar result is observed for Γ, since it mostly converges to values close to zero. The smallest training RMSE indicates the best-optimized model shown in Fig. 3 (hatched bar). Figure 4 shows the high adjustability of the best-optimized model to the training data (R² = 0.922), along with the other evaluation metrics. The figure also indicates instance attributes that frequently mislead the predictions; the definition of each class is described in Table 3.


Table 2 Average metrics (mean ± std) over 1000 runs of ε-SVR in the RS

Dataset    R² score         RMSE (kJ/mol)    MAE (kJ/mol)
Training   0.881 ± 0.040    1.621 ± 0.256    1.188 ± 0.399
Testing    0.727 ± 0.041    2.127 ± 0.154    1.692 ± 0.220

Fig. 4 Best model (training set): predicted (ŷ) versus experimental (y) ΔG by (a) ligand FC (neg./neutral/pos.), (b) ligand RB (low/medium/high), and (c) CD class (α-, β-, γ-CD). R² = 0.922, RMSE = 1.331 and MAE = 0.505


Fig. 5 Best model (testing set). R 2 = 0.776, RMSE = 1.932 and MAE = 1.351

Figure 5 shows the model's prediction ability over the testing set. The model's generality is clear from the values of the evaluation metrics. The increase in error from the training to the testing step stays within a safe threshold, showing that the method learned the data patterns rather than memorizing the data. The instances with higher error have ligands with higher RB and the γ-CD host; this behavior may originate from the smaller number of γ-CD instances in the database. Table 3 shows the overall (1000-run) average RMSE for each instance subset (ligand charge, ligand flexibility, and CD class). There are indeed oscillations in the predicted measurements between the subsets, mainly for different RB levels; this should be taken into account before applying the method to new datasets. Given the previous results, we need to verify whether the error levels are compatible with the model's application domain of molecular interactions. In this context, interactions


Table 3 RMSE values in kJ/mol (mean ± std) for each classification of ligand Formal Charge (FC), ligand Rotatable Bonds (RB), and host type. Low ligand RB covers the range [0, 3], medium RB [4, 7], and high RB [8, 11]. The instance that generated the outlier prediction was disregarded in this RMSE calculation.

Group   Class     Training          Testing
FC      Neg.      1.607 ± 0.249     1.957 ± 0.103
FC      Neutral   1.231 ± 0.299     2.191 ± 0.145
FC      Pos.      2.242 ± 0.240     2.147 ± 0.365
RB      Low       1.630 ± 0.283     2.097 ± 0.153
RB      Medium    1.662 ± 0.261     1.574 ± 0.253
RB      High      1.467 ± 0.125     3.156 ± 0.160
CD      α-CD      2.045 ± 0.290     2.728 ± 0.280
CD      β-CD      1.401 ± 0.284     1.710 ± 0.143
CD      γ-CD      1.460 ± 0.118     2.883 ± 0.169

of electrostatic potential, van der Waals forces, salt bridges, and hydrogen bonds (HB) are prevalent forms of interaction. Among them, hydrogen bonds and electrostatic interactions are determinant in receptor-ligand complexes [17]. HB are weaker than electrostatic interactions but far more frequent. The literature highlights that HB connections have an interaction potential always lower than 10 kJ/mol and, in the vast majority of cases, lower than 3 kJ/mol [17, 18]. In protein complexes, for example, the contribution from the formation of an HB is 5 ± 2.5 kJ/mol [17]. Thus, the error level in the predictions can be related to the number of HB falling in the non-predicted range, and for this work an error of up to one non-predicted hydrogen bond is considered acceptable. The RMSE and MAE metrics indicate an average error while keeping the same unit of measure as our objective variable (kJ/mol). Observing our best model's RMSE and MAE in training (RMSE = 1.331 and MAE = 0.505) and testing (RMSE = 1.932 and MAE = 1.351), we see that the values remain safely below the threshold of 3 kJ/mol; the same holds for the overall average analysis presented in Table 2. This corresponds to an average error of about one hydrogen bond between the real and the predicted ΔG, showing the applicability of the predictions in this domain. Considering the average RMSE levels obtained for each instance subset in Table 3, the worst scenario occurs for instances whose ligands have high numbers of RB. Indeed, molecular systems with greater physical degrees of freedom are more difficult for in silico studies. In the present study, however, the higher error levels may also be explained by the smaller number of instances with high RB in the database, as shown in Fig. 2b. We expect that a more balanced dataset may decrease the error levels for this instance type, since the training data were well adjusted. Finally, we briefly compare the results with some papers in the literature. Note that this comparison is not over results based on the same datasets; we only seek to demonstrate that, for the present database, the error levels are compatible, a priori, with what is frequently presented in the literature. Dimas et al. [19] selected a set of


complexes between β-CD and 57 small organic molecules that had been previously studied with the binding energy distribution analysis method in combination with an implicit solvent model. Even though that study focused only on β-CD and applied a physics-based method, the error levels were R² = 0.66 and RMSE = 9.330 kJ/mol, values that are worse than our average metrics in Table 2 and our best model in Fig. 5. Zhao et al. [20], on the other hand, also applied ML methods to predict the interaction energy between a large number of cyclodextrin classes and ligand molecules. That study is data-driven over a dataset of 3000 instances that was not published because of intellectual property issues. The best result obtained by Zhao was based on an eXtreme Gradient Boosting (XGB) approach (R² = 0.86, RMSE = 1.83 kJ/mol, and MAE = 1.38 kJ/mol). Comparing this result with our best ε-SVR model presented in Fig. 5, we see that we achieved competitive results considering the database size and the lower computational complexity of ε-SVR.

4 Conclusion
In this paper, we applied the ε-SVR method, coupled with the RS method for hyperparameter tuning, to the prediction of interaction measures in cyclodextrin host-guest systems. The approach was sufficient to define a model with good generality (R² = 0.776) and low associated error (RMSE = 1.932 kJ/mol and MAE = 1.351 kJ/mol), equivalent to about one hydrogen bond. The results were compatible with the literature, even though a method of lower computational complexity was used. As future work, we may apply other ML methods to this task (e.g., Elastic Net, ELM, GB, XGB), as well as investigate the use of Differential Evolution instead of the Randomized Search method for hyperparameter optimization.
Acknowledgements The authors thank the financial support from FAPEMIG and CAPES. We also thank the Grupo de Modelagem Computacional Aplicada (GMCA/CNPq).

References 1. Katsila, T., et al.: Computational approaches in target identification and drug discovery. Comput. Struct. Biotechnol. 14, 177–184 (2016) 2. Kumar, N., Hendriks, B.S., et al.: Applying computational modeling to drug discovery and development. Drug Discov. Today 11(17–18), 806–811 (2006) 3. Mura, P.: Advantages of the combined use of cyclodextrins and nanocarriers in drug delivery: a review. Int. J. Pharm. 119181 (2020) 4. Gadade, D.D., Pekamwar, S.S.: Cyclodextrin based nanoparticles for drug delivery and theranostics. Adv. Pharm. Bull. 10(2), 166 (2020) 5. Lu, J., Hou, X., Wang, C., Zhang, Y.: Incorporating explicit water molecules and ligand conformation stability in machine-learning scoring functions. J. Chem. Inf. Model. 59(11), 4540–4549 (2019)


6. Haghighatlari, M., Li, J., Heidar-Zadeh, F., Liu, Y., Guan, X., Head-Gordon, T.: Learning to make chemical predictions: the interplay of feature representation, data, and machine learning algorithms. arXiv:2003.00157 (2020) 7. Gao, H., Ye, Z., et al: Predicting drug/phospholipid complexation by the lightGBM method. Chem. Phys. Lett. 137354 (2020) 8. Mobley, D.L., Gilson, M.K.: Predicting binding free energies: frontiers and benchmarks. Annu. Rev. Biophys. 46, 531–558 (2017) 9. Hu, Q.D., Tang, G.P., Chu, P.K.: Cyclodextrin-based host-guest supramolecular nanoparticles for delivery. Acc. Chem. Res. 47(7), 2017–2025 (2014) 10. Landrum, G.: RDKit: open-source cheminformatics software. GitHub SourceForge 10, 3592822 (2016) 11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 1–27 (2011) 12. Karthikeyan, M., Vyas, R.: Machine learning methods in chemoinformatics for drug discovery. In: Practical Chemoinformatics, pp. 133–194. Springer (2014) 13. Kargar, K., Samadianfard, S., Parsa, J., Nabipour, N., Shamshirband, S., Mosavi, A., Chau, K.W.: Estimating longitudinal dispersion coefficient in natural streams using empirical models and machine learning algorithms. Eng. Appl. Comput. Fluid Mech. 14(1), 311–322 (2020) 14. Gunn, S.R., et al.: Support vector machines for classification and regression. ISIS Tech. Rep. 14(1), 5–16 (1998) 15. Schmidt, M., Safarani, S., Gastinger, J., Jacobs, T., Nicolas, S., Schülke, A.: On the performance of differential evolution for hyperparameter tuning. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2019) 16. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012) 17. Zerbe, O., Jurt, S.: Applied NMR Spectroscopy for Chemists and Life Scientists. Wiley (2013) 18. Blundell, C.D., Nowak, T., Watson, M.J.: Measurement, interpretation and use of free ligand solution conformations in drug discovery. In: Progress in Medicinal Chemistry, vol. 55, pp. 45–147. Elsevier (2016) 19. Suarez, N.: Affinity calculations of cyclodextrin host–guest complexes: assessment of strengths and weaknesses of end-point free energy methods. J. Chem. Inf. Model. 59(1), 421–440 (2018) 20. Zhao, Q., Ye, Z., Su, Y., Ouyang, D.: Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques. Acta Pharm. Sin. B 9(6), 1241–1252 (2019)

Hybrid Unsupervised Extreme Learning Machine Applied to Facies Identification Camila M. Saporetti, Iago G. L. Rosa, Ruan M. Carvalho, Egberto Pereira, and Leonardo G. da Fonseca

Abstract Predictive models for classifying the distribution of heterogeneities and quality in hydrocarbon reservoirs are fundamental for exploring and optimizing oil and gas field production. Determining heterogeneities manually through facies is generally a time-consuming task; thus, computational intelligence, such as clustering techniques, appears as an alternative. This work applies the UnSupervised Extreme Learning Machine (US-ELM) to cluster petrographic data collected from the Paraná Basin, Brazil. We propose a hybrid approach to tune the internal parameters of US-ELM and use Principal Component Analysis to remove redundant attributes. The results show that the hybrid US-ELM achieved higher average accuracy, silhouette, and adjusted Rand score than methods commonly used in the literature.
Keywords Facies identification · Unsupervised extreme learning machine · Differential evolution

C. M. Saporetti State University of Minas Gerais, Av. Paraná, 3001, Jardim Belvedere I, Divinópolis, MG 35501-170, Brazil e-mail: [email protected] I. G. L. Rosa (B) · R. M. Carvalho · L. G. da Fonseca Computational Modeling Program, Federal University of Juiz de Fora, José Lourenco Kelmer, Juiz de Fora, MG 36036-330, Brazil e-mail: [email protected] R. M. Carvalho e-mail: [email protected] L. G. da Fonseca e-mail: [email protected] E. Pereira State University of Rio de Janeiro, Av. Sao Francisco Xavier, 524, Maracana, Rio de Janeiro, RJ 20550-900, Brazil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_31


1 Introduction
Determining and mapping the heterogeneities of hydrocarbon reservoirs is strategically crucial for characterizing and defining an oil and gas field's productivity and commercial character. Heterogeneities are characterized through several sedimentary facies that represent a limited part of the area of a given stratigraphic unit and present properties significantly different from the other parts of the unit [1]. Facies can be defined as rocky units characterized by similar lithological attributes (composition, texture, sedimentary structures, and color) and paleontological properties (content and fossil record) [2]. Thin sections that have similar characteristics belong to the same facies. The facies determination process is usually time-consuming and, given the large data sizes, does not take all information into consideration. Automating the identification steps is therefore worthwhile, since it speeds up the analysis needed to obtain information about the reservoir rocks.
In recent years, several papers have applied computational intelligence techniques to assist in reservoir characterization. Martinelli and Eidsvik [3] employed grouping strategies to construct sequential designs for Bayesian Networks and Markov Random Fields of pre-oil prospects, helping decide which well to drill first. El Sharawy and Gaafar [4] used Cluster Analysis and Principal Component Analysis to identify electrofacies in seismic profile data (gamma-ray, sonic, density, and neutron). Methe et al. [5] used cluster analysis to obtain information on lithology; they tested clustering algorithms with different approaches (Ward, DBSCAN, K-Means, and Mean-Shift) on geophysical datasets from well drilling. Oloso et al. [6] introduced a hybrid strategy of Functional Networks (FN) and K-Means grouping to determine the volume, pressure, and temperature attributes of crude oil; K-Means was used to find groups in the input data before FN predicted the target variables. Wang et al. [7] used an unsupervised KNN optimized with the cosine distance to identify lithology in data from the Gaoqing field at the Jiyang depression. Abdideh and Ameri [8] applied a clustering method based on multi-resolution graphs (MRGC) to geological and petrophysical features to separate log facies from sequences of carbonate gas reservoirs. Chopra et al. [9] compared various computational intelligence techniques, namely Principal Component Analysis (PCA), waveform classification, K-Means, and a supervised Bayesian technique, to identify facies from the Delaware Basin. Hong et al. [10] developed an unsupervised facies identification model based on deep neural networks using well logs from the Council Grove gas reservoir in Southwest Kansas.
Huang and collaborators [11] developed the US-ELM method, the UnSupervised Extreme Learning Machine, as an extension of the standard ELM method to unsupervised learning. The adaptation consists of calculating the Laplacian of the input data, finding its eigenvectors, and inserting them in the ELM's objective function; as a result, it is possible to deal with data clustering problems. However, adjusting the US-ELM hyperparameters can be a challenging task, and the performance of the method strongly depends on this choice [12]. We therefore propose using a Differential Evolution (DE) algorithm in a hybrid approach to tune the internal parameters for


US-ELM from the user-defined hyperparameters space. This work aims to use the US-ELM method coupled with DE hyperparameters tuning to determine sedimentary facies from the Paraná Basin data, Brazil.

2 Materials and Methods

The evaluated data comprise samples collected from wells in the Paleosul region, a Devonian member of the Paraná Basin. This basin covers an area of 1.4 million km² spanning Brazil, Argentina, Paraguay, and Uruguay. The database contains information from 3 wells and 60 samples in total, with 25 constituents each. From the manual classification of the data, it was possible to identify five distinct facies in the sandstones: A1, A2, A3, A4, and A5 [13]. Petrographic databases usually have few samples, as the constituent percentages are determined through thin-section analysis. Since some techniques are sensitive to redundancy, there is a risk of input bias when dealing with many attributes. We apply Principal Component Analysis (PCA) to remove redundant attributes and resize the database to a three-dimensional space. Figure 1 shows the data dispersion within the eigenvector space (components PC1, PC2, and PC3), and Table 1 presents the variability of the first five principal components. The first three components together carry more than 70% of the original information and are used as input data for the clustering methods.

Fig. 1 Data projections within PCA space (pairwise panels of PC1, PC2, and PC3; classes A1–A5)

Table 1 Variability for the five principal components

PC1        PC2        PC3        PC4        PC5
0.36887    0.23890    0.11974    0.09169    0.07753599

US-ELM extends ELM to process unlabeled data. The unsupervised approach rests on two assumptions: (I) the unlabeled data $X_u$ are drawn from the same marginal distribution $P_X$, and (II) if two points $x_1$ and $x_2$ are

C. M. Saporetti et al.

similar, then the distributions P(y|x1 ) and P(y|x2 ) possibly will be analogous. The regularization scheme initiated by [11] to decrease the cost function is Lm =

1 wi j ||P(y|xi ) − P(y|x j )||2 , 2 i, j

(1)

where wi j represents the proximity among xi and x j . The weights wi j is generally calculated using Gaussian function ex p(−||xi − x j ||2 /2δ 2 ). Equation 1 can be written as the matrix equation Lˆ m = T r (Yˆ T L Yˆ ), where T r (.) indicates the matrix’s trace, Yˆ is array of the predicted labelsof X u , L = −W + D is the Laplacian, and D represents a diagonal matrix Dii = uj=1 wi j . N The data set X = {xi }i=1 are unlabeled since this is an unsupervised problem. The formulation of the US-ELM is given by min ||β||2 + λT r (β T H T L H β),

β∈R L xn o

(2)

where λ is a swap parameter. When L ≤ N , we have β ∗ = [v˜2 , v˜3 , . . . , v˜n 0 +1 ], where v˜i = vi /||H vi ||, i = 2, . . . , n 0 + 1 are the normalized eigenvectors. γi is the ith least possible eigenvalues of Eq. (3), and vi is the relative eigenvectors. (I L + λH T L H )v = γ H T H v

(3)

If L > N , Eq. (3) is indefinite. In this instance, a second formulation (Eq. (4)) is given by applying the same maneuver. (I N + λL H H T )u = γ H H T u

(4)

Also, u i is the derived eigenvectors corresponding the ith least possible eigenvalues of Eq. (4). Hence, the final solution is presented in Eq. (5). β ∗ = H T [u˜ 2 , u˜ 3 , . . . , u˜ n 0 +1 ],

(5)

where u˜ i = u˜ i /||H H T u˜ i ||, i = 2, . . . , n 0 + 1 are the normalized eigenvectors. Finding the best parameters for a clustering method is often hard work. Usually, the researcher sets these parameters by testing different configurations manually. Otherwise, a possibility is the apply heuristic approaches, such as evolutionary algorithms. In this paper, we use a Differential Evolution Algorithm (DE) [14] to search the optimal parameter settings, where every individual in the DE population is a candidate representation of a US-ELM. The use of Differential Evolution has grown in recent years, this is due to its good performance [12]. Considering the population of parameter vectors {θi, J |i = 1, . . . , N P} in the generation J , the following steps are performed iteratively:

Hybrid Unsupervised Extreme Learning Machine Applied to Facies Identification

323

Table 2 Candidate solutions encoding and parameter ranges for the search space Decision Variable Description Possible values/range θ1 θ2

Weight type Laplacian distance (LD)

θ3 θ4

No. neighbors (NN) Activation function (AF)

θ5 θ6 θ7

α parameter No. hidden neurons (HL) No. clusters (NC)

0: Binary, 1: Distance, 2: Heat 0: Euclidean, 1: cosine, 2: Hamming [0, 5] 0: Sigmoid, 1: Gaussian, 2: Tanh, 3: Identity, 4: Relu, 5: Swish [0, 1] [1, 500] [2, 8]

1. Mutation operator: Given a vector θi,J +1 , i = 1, 2, . . . , N P, a new vector is created as ν i,J +1 = θr1 ,J + F(θr2 ,J − θr3 ,J ) with random and mutually distinct indexes r1 , r2 , r3 ∈ 1, 2, . . . , N P and 0 ≤ F ≤ 2. F scales the variation for (θr2 ,G − θr3 ,G ). 2. Crossover operator: The trial vector μi,J +1 = (μ1i,J +1 μ2i,J +1 , . . . , μ Di,J +1 ) is generated according to  μ ji,J +1 =

ν ji,J +1 if randb( j) ≤ C R or j = r nbr (i), if randb( j) > C R and j = r nbr (i). θ ji,J

(6)

where randb( j) is a random uniform number in [0, 1], C R is the user-defined probability, 1 ≤ r nbr (i) ≤ D is index selected randomly to ensure that μi,J +1 achieves at least one entry in ν i,J +1 . 3. Selection Operator: If vector μi,J +1 achieves better performance than θi,J , then θi,J +1 is set to μi,J +1 . Contrariwise, the previous value θi,J is kept as θi,J +1 . In the proposed approach, an individual θ = (θ1 , θ2 , θ3 , θ4 , θ5 , θ6 , θ7 ) encodes an US-ELM as exposed in Table 2. The objective of the DE algorithm is to find the US-ELM hyperparameters so that the method reproduces computed outputs to be the best possible clustering. The Adjusted Rand Score [15] was used as the objective function to be maximized in the evolutionary search.

3 Results and Discussions Table 3 presents the average and standard deviation values of the Silhouette Coefficient (SC), Accuracy, and Adjusted Rand Score (ARS) by US-ELM, K-Means, and Ward (averaged over 30 runs). Note that the average SC is low for all methods since the clusters are geometrically close within the considered hyperspace. Thus, there are points closer to the nearest cluster elements than from the elements of its cluster.

324

C. M. Saporetti et al.

Table 3 Average values for silhouette coefficient, accuracy and adjusted rand score Method SC Accuracy ARS 0.3229 ± 0.0459 0.4117 ± 0.0208 0.4222 ± 0.0000

US-ELM K-Means Ward

0.2056 ± 0.0997 0.1772 ± 0.1009 0.1333 ± 0.0000

0.1723 ± 0.0233 0.1437 ± 0.0123 0.0759 ± 0.0000

Table 4 Best parameters according to accuracy, and the maximum values achieved for SC and ARS considering 30 independent runs of DE Method Parameters Accuracy SC∗ ARS∗ US-ELM

K-Means Ward

Weights: Binary, 0.5333 LD: Hamming, LN: 2, AF: Gaussian, α: 0.6507, HL: 422, NC: 4 NC: 5 0.4333 NC: 8, Compute Full 0.1333 Tree: False

(a) US-ELM

(b) K-means

0.4260

0.2182

0.4189 0.4222

0.1973 0.0759

(c) Ward

Fig. 2 Clustering results for best models according to accuracy. Each marker refers to a different predicted cluster

The average accuracy and ARS suggest that the facies found are not consistent with those found in the manual method. Table 4 presents the best metrics for the hybrid US-ELM, K-Means, and Ward parameters. US-ELM increases the hit for accuracy in 23.08% and the maximum ARS in 10.59%, and SC in 18.75%, considering the best K-Means model. Figure 2 shows the clustering results of each of the best models according to accuracy. The creation of facies databases generally requires an experienced specialist to analyze and identify the detailed data. Although descriptive, some incoherences can be existing in facies individualization, such as the information registered is related

Hybrid Unsupervised Extreme Learning Machine Applied to Facies Identification

325

Table 5 Average metrics over 30 runs using silhouette as DE objective function Method SC Accuracy US-ELM K-Means Ward

0.1450 ± 0.0839 0.6285 ± 0.0196 0.6346 ± 0.0000

0.2711 ± 0.1147 0.2144 ± 0.0512 0.2333 ± 0.0000

to the practice and know-how of the geologist in the samples documentation [16]. Another restriction is the wide difference in the formation of determined similar materials. Due to this, in cases where facies have been wrongly classified in the real data or are not registered due to scale restrictions, it can produce poor clustering results. It is also essential to discuss which evaluation metrics to use for the given problem. The paper of the US-ELM proposal considers accuracy as a model evaluation metric. However, issues related to facies and petrofacies generally tend to be related to unbalanced data classes, making accuracy an unfair metric. Alternatively, metrics such as Balanced Accuracy or Adjusted Random Score provide more appropriate evaluations for classes of different sizes. Finally, it is worth mentioning when training the method over new data, the hyperparameter optimization with DE must consider unsupervised objective functions. Table 5 presents the same tests described previously but using the silhouette coefficient as the DE’s objective function. Note that the result shows a case in which the silhouette evaluation opposes the evaluation of accuracy metrics, demonstrating the difficulty in creating models for clustering facies.

4 Conclusion In this paper, we have evaluated the use of Differential Evolution for searching the best US-ELM hyperparameters in the problem of identifying facies from petrographic data. US-ELM was more accurate than other methods such as nearest neighbor approaches, hierarchical methods, and density-based clustering techniques. However, the results showed that identifying facies in an unsupervised way is challenging and requires further research. Acknowledgements We thank the financial support from CNPq (grant 429639/2016-3), FAPEMIG (grants 01106/15 and 00334/18), and CAPES—Finance Code 001.

326

C. M. Saporetti et al.

References 1. Cevolani, J.T., Oliveira, L.C., Goliatt, L., Pereira, E.: Visualizacao e classificacao automática de petrofácies sedimentares. In: 6o Congresso Brasileiro de Pesquisa e Desenvolvimento em Petróleo e Gás (2011) 2. Hyne, N.: Dictionary of Petroleum Exploration, Drilling & Production. PennWell Corporation (2014) 3. Martinelli, G., Eidsvik, J.: Dynamic exploration designs for graphical models using clustering with applications to petroleum exploration. Knowl. Based Syst. 58, 113–126 (2014) 4. El Sharawy, M.S., Gaafar, G.R.: Reservoir zonation based on statistical analyses: a case study of the Nubian sandstone, Gulf of Suez, Egypt. J. Afr. Earth Sci. 124, 199–210 (2016) 5. Methe, P., Goepel, A., Kukowski, N.: Testing the results of estimating lithological stratigraphy through cluster analysis on geophysical borehole logging data through multi sensor core logging data. In: EGU General Assembly Conference Abstracts, vol. 19, p. 9642 (2017) 6. Oloso, M.A., Hassan, M.G., et al.: Hybrid functional networks for oil reservoir PVT characterisation. Expert Syst. Appl. 87, 363–369 (2017) 7. Wang, X., Yang, S., Zhao, Y., Wang, Y.: Lithology identification using an optimized KNN clustering method based on entropy-weighed cosine distance in Mesozoic strata of Gaoqing field, Jiyang depression. J. Pet. Sci. Eng. 166, 157–174 (2018). http://www.sciencedirect.com/ science/article/pii/S0920410518302201 8. Abdideh, M., Ameri, A.: Cluster analysis of petrophysical and geological parameters for separating the electrofacies of a gas carbonate reservoir sequence. Nat. Resour. Res. 29, 1–14 (2019) 9. Chopra, S., Marfurt, K., Sharma, R.: Unsupervised machine learning facies classification in the Delaware basin and its comparison with supervised Bayesian facies classification. In: SEG Technical Program Expanded Abstracts 2019, pp. 2619–2623. Society of Exploration Geophysicists (2019) 10. Hong, Y., Wang, S., Bae, J., Yoo, J., Yoon, S., et al.: Automated facies identification using unsupervised clustering. In: Offshore Technology Conference (2020) 11. Huang, G., Song, S., et al.: Semi-supervised and unsupervised extreme learning machines. IEEE Trans. Cybern. 44(12), 2405–2417 (2014) 12. Zhu, Q.Y., Qin, A., Suganthan, P., Huang, G.B.: Evolutionary extreme learning machine. Pattern Recognit. 38(10), 1759–1763 (2005) 13. Brazil, F.A.F.: Estratigrafia de Sequencias e Processo Diagenético: Exemplo dos Arenitos Marinho-Rasos da Formacao Ponta Grossa, Noroeste da Bacia do Paraná. Master’s thesis, UERJ (2004) 14. Storn, R., Price, K.: Differential evolution—an efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997) 15. Hubert, L., et al.: Comparing partitions. J. Classif. 2, 193–218 (1985) 16. Pollock, D.W., Barron, O.V., et al.: 3D exploratory analysis of descriptive lithology records using regular expressions. Comput. Geosci. 39, 111–119 (2012)

Decision Tree-Based Classification Model to Predict Student Employability Chandra Patro and Indrajit Pan

Abstract The employability of students has been a major concern for all academic institutions. Academic organizations are keen to analyze multi-faceted, performance-centric data of their students to enhance performance outcomes. The concept of data mining on educational data, also known as Educational Data Mining (EDM), has gained much interest in facilitating this need for performance analysis, as it helps extract meaningful information from raw academic data. This work uses a dataset of academic performances and test scores and applies a decision tree classifier to predict the employability of students across different disciplines. The experimental study shows that the decision tree classifier yields high accuracy for employability prediction. Keywords Classification techniques · Decision tree classifier · Educational data mining · Ensemble method · Higher education · Random forest

1 Introduction
The economy of a nation is deeply reliant on the higher education industry. The industry is a major driving force for the development of a trained workforce, and several dependent industries also benefit from this ready workforce [1]. The reputation of higher educational institutes is largely based on the competence of the students graduating from them. Data mining has great significance in the performance analysis of academic institutes: it provides data patterns for analyzing the strengths and weaknesses of every student and their overall performance [2]. Effective use of data mining techniques in the educational domain helps to monitor several factors associated with student performance. This monitoring process will
C. Patro (B) · I. Pan RCC Institute of Information Technology, Kolkata, West Bengal 700015, India e-mail: [email protected] I. Pan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Pan et al. (eds.), Proceedings of Research and Applications in Artificial Intelligence, Advances in Intelligent Systems and Computing 1355, https://doi.org/10.1007/978-981-16-1543-6_32


2 Literature Survey There are many research reports available in the literature which explains the role of educational data mining for the benefits of the students in quality improvement. A brief survey in this section will provide a thorough insight towards the possibilities of this research domain. In the work reported [5] has discussed a decision tree-based model in the development of programming performance of the students. The authors have worked on an internal dataset prepared on a programming language paper. They have classified students into three different performance categories. This has intended to find out weak students and further leverage special care to them. A feature selection method has been proposed in [6] which has focused on enhancing the prediction accuracy. The authors have insisted upon selecting more number of performance features while predicting performance trend. Considering a wide dimension of features has evolved in better prediction as described in the result section. An e-learning platform based parametric analysis has been done in [7]. They have focused on predicting the reason for the poor performance of participant and scoring


low grades in the evaluation. Nowadays e-learning platforms are emerging as alternative and convenient solutions for learning new skills; however, there is a huge gap between the number of registrations and the number of successful completions. This work attempts to predict the reason behind this difference so that the necessary preventive measures can be taken. Another research work [8] presents a thorough review of various reports on educational data mining, its challenges and its prospects. The study argues that detailed data mining applications on the individual datasets generated by educational institutes can provide thorough analysis and prediction of different performance metrics for every individual student, which can certainly contribute to the betterment of overall performance. A Naïve Bayesian method for predicting future student performance from historical data is proposed in [9]; the research shows that it performs better than techniques such as regression and clustering. This brief survey encourages the design of a model that applies data mining to educational performance data to improve student employability. The reputation of higher education institutes depends heavily on the outcomes of graduating students. Effective use of educational data mining can assist management and faculty members in identifying the weak areas of every individual student and devising a plan to strengthen them, so that they can perform better in the final competition [10]. Their performance will, in turn, enhance the reputation of the institute.

3 Proposed Methodology This work proposes a decision tree-based model that predicts student employability in two categories, i.e., whether an individual is employable or not. Following this prediction, the accuracy of the model is compared with models built using other well-known classification techniques such as logistic regression, Gaussian Naive Bayes and k-nearest neighbors. The detailed phases of the work are described in the following subsections.

3.1 Data Collection This project involves data analysis on a student employability dataset. A dataset in CSV format based on AMCAT (Aspiring Minds Computer Adaptive Test) scores was downloaded from the Kaggle public dataset repository. It contains 33 columns (student attributes) and 3998 rows (student records), in which employability is the dependent variable and the rest are independent variables. The remaining attributes are characterized as


• Demographic data as general attributes: ID (unique ID of the student), Gender (male, female), DOB (date of birth), 10percentage (marks obtained in grade 10), 10board (school board), 12graduation (year of graduation from senior high school), 12board (high school board), CollegeID (unique ID identifying the university or college), CollegeTier (each college annotated as tier 1 or 2), Degree (degree obtained), Specialization (specialization pursued), CollegeGPA (aggregate GPA at graduation), CollegeCityID (unique ID of the city in which the college is located), CollegeCityTier (tier of the city in which the college is located), CollegeState (state in which the college is located), GraduationYear (year of graduation).
• Aptitude and soft-skill attributes: English (score in English), Logical (score in the logical section), Quant (score in the quantitative ability section), Domain (score in the domain module), ComputerProgramming (score in the computer programming section), ElectronicsAndSemicon (score in the electronics and semiconductor engineering section), ComputerScience (score in the computer science section), MechanicalEngg (score in the mechanical engineering section), ElectricalEngg (score in the electrical engineering section), TelecomEngg (score in the telecommunication engineering section), CivilEngg (score in the civil engineering section).
• Emotional attributes: Conscientiousness (score for the quality of wishing to do one's work or duty well and thoroughly), Agreeableness (personality-trait or behavioral-characteristics score), Extraversion (personality-test score indicating how outgoing and social the person is), Neuroticism (personality-test score indicating how frequently the person is subject to changing emotions).
The dataset contains both continuous attributes (the grade 10 and 12 percentages and the college GPA) and categorical attributes (the rest, including the output variable).

3.2 Data Preprocessing
• Demographic data removal: The independent attributes that have no effect on employability, such as ID, DOB, CollegeCityID, 10board, CollegeState, 12graduation and CollegeID, are dropped, leaving 26 attributes.
• Renaming columns: Since Python is the development platform and is case sensitive, all attribute names are shortened and converted to lowercase to avoid case-sensitivity mistakes.
• Missing-value analysis and checks for duplicate data are performed.
• Data distribution analysis: Univariate and bivariate analyses of the categorical variables (emp, gen, etc.) are carried out to find class imbalance and to identify outliers; histograms and skewness are examined to check the distribution of the continuous variables. Logarithm and square-root transforms are applied to remove any skewness present in the data.
• Object variables of the dataset are converted to integers for feature selection. A sketch of these steps is given below.
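As an illustration, the preprocessing steps listed above can be sketched in pandas; the file name, the renaming map and some column names (e.g. 12percentage) are assumptions and not taken from the paper.

```python
import numpy as np
import pandas as pd

# Load the AMCAT-based employability dataset (file name assumed).
df = pd.read_csv("amcat_employability.csv")

# Drop demographic attributes that have no effect on employability.
df = df.drop(columns=["ID", "DOB", "CollegeCityID", "10board", "CollegeState",
                      "12graduation", "CollegeID"], errors="ignore")

# Rename columns: lowercase, short forms (mapping assumed; 'emp' is the target).
df.columns = [c.strip().lower() for c in df.columns]
df = df.rename(columns={"employability": "emp", "gender": "gen"})

# Missing-value and duplicate checks.
print(df.isnull().sum().sort_values(ascending=False).head())
df = df.drop_duplicates()

# Reduce skewness of continuous attributes with a log transform (names assumed).
for col in ["10percentage", "12percentage", "collegegpa"]:
    if col in df.columns:
        df[col] = np.log1p(df[col])

# Convert object (string) columns to integer codes for feature selection.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes
```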


3.3 Feature Selection In this step, the features that contribute most to the output are selected using the SelectKBest method of the sklearn.feature_selection package of Python, in a Jupyter notebook environment. This yields the 20 best features, ranked by the scores of the statistical function f_classif, which captures the linear relationship between the dependent and independent variables. A highly correlated feature receives a higher score and a less correlated feature receives a lower score. f_classif is used only for categorical targets and is based on the Analysis of Variance (ANOVA) statistical test.
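A minimal sketch of this step with SelectKBest and f_classif; it assumes the DataFrame df and the target column name emp from the preprocessing sketch above.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Predictors and the employability label ('emp', per the renaming in Sect. 3.2).
X = df.drop(columns=["emp"])
y = df["emp"]

# Keep the 20 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=20)
X_selected = selector.fit_transform(X, y)

# Inspect the scores and the names of the retained features.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores.head(20))
selected_columns = X.columns[selector.get_support()]
```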

3.4 Model Building and Cross-Validation The decision tree is a supervised learning technique for solving classification problems. It is a tree-structured classifier in which internal nodes represent features of the dataset, branches represent decision rules and leaf nodes represent the outcomes. Class prediction begins at the root node, where the attribute values of a record are compared with the root attribute; control then jumps to the next node based on the outcome of the comparison, and the process continues until a leaf node is reached. The CART (Classification and Regression Tree) decision tree classifier is used in this project for model building. It constructs a binary tree, where each internal node has exactly two outgoing edges, and finally predicts one of two classes (Yes and No): Yes means the concerned student is employable and No means the student is not. The total dataset is divided into training and testing data: 70% of the records are used to train the model, with 20 decision points, and 30% are used to validate it. After this, the decision tree model is evaluated and cross-validated. Cross-validation is a procedure for comparing and evaluating models. Here the records (with the 20 selected features) are partitioned into k bins (k = 10) of equal size; k separate learning experiments are run, k rounds of testing are performed, and the average over the k experiments is reported for each model. k-fold cross-validation is used in this work to compare several models.
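A sketch of the model building and 10-fold cross-validation described above, assuming X_selected and y from the feature-selection step; the hyperparameter max_leaf_nodes=20 is only an illustrative reading of the "20 decision points", not the authors' exact setting.

```python
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# 70% of the records train the model, 30% are held out for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.30, random_state=42, stratify=y)

# CART-style binary decision tree (sklearn's DecisionTreeClassifier builds binary trees).
tree = DecisionTreeClassifier(max_leaf_nodes=20, random_state=42)
tree.fit(X_train, y_train)

# 10-fold cross-validation, averaged over the folds.
cv_scores = cross_val_score(tree, X_train, y_train, cv=10, scoring="accuracy")
print("10-fold CV accuracy: %.4f (+/- %.4f)" % (cv_scores.mean(), cv_scores.std()))
```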

4 Experimental Results The proposed model has been implemented in Python. The decision tree classifier-based employability prediction model achieves 90.08% accuracy when the predicted values are compared against the actual output values. The ROC-AUC score (ROC: receiver operating characteristic curve) for the same model was 0.5494. The ROC curve summarizes the trade-off between the true positive rate and the false positive rate of a predictive model.
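The reported hold-out evaluation can be reproduced along the following lines, assuming the fitted tree and the 30% test split from the previous sketch; the exact numbers naturally depend on the split and settings.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Accuracy: predicted labels vs. actual labels on the 30% hold-out set.
y_pred = tree.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# ROC-AUC: uses the predicted probability of the positive (employable) class.
y_prob = tree.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, y_prob))
```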


Fig. 1 Model comparison in boxplot

The AUC, or area under the curve, measures the ability of the classifier to distinguish between classes and is used as a summary of the ROC curve; it is known from the literature that a higher AUC represents better model performance. Because the ROC-AUC score of the decision tree model was only moderate, k-fold cross-validation was used to compare its accuracy with that of other classification models. Figure 1 shows a glimpse of that comparison.
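A comparison of the kind shown in Fig. 1 can be generated by running k-fold cross-validation over the candidate classifiers of Sect. 3 and plotting the fold scores as boxplots. The sketch below is a generic version of such a comparison, not the authors' script, and assumes X_train and y_train from the earlier sketches.

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "CART": DecisionTreeClassifier(max_leaf_nodes=20, random_state=42),
    "LogReg": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
}

# 10-fold accuracy scores for each candidate model.
results = {name: cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
           for name, model in models.items()}

# Box-and-whisker comparison, one box per model (cf. Fig. 1).
plt.boxplot([results[name] for name in models])
plt.xticks(range(1, len(models) + 1), list(models.keys()))
plt.ylabel("10-fold CV accuracy")
plt.title("Model comparison")
plt.show()
```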

5 Conclusion This work offers insights into student employability analysis using the kind of in-house student dataset that any educational institute maintains. Decision tree analysis is used in this model to classify students into two groups, the basic objective being to determine the readiness of a student at a given point in time. Since there were only two classification classes, the decision tree model was chosen for operational simplicity. However, the discussion of the existing literature suggests that the decision tree model may not be suitable for multi-class classification.


References
1. Jacob, J., Jha, K., Kotak, P., Puthran, S.: Educational data mining techniques and their applications. In: Proceedings of 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), Noida, pp. 1344–1348 (2015). https://doi.org/10.1109/ICGCIoT.2015.7380675
2. Zhang, W., Qin, S.: A brief analysis of the key technologies and applications of educational data mining on online learning platform. In: Proceedings of 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), pp. 83–86 (2018). https://doi.org/10.1109/ICBDA.2018.8367655
3. Stefanova, K., Kabakchieva, D.: Educational data mining perspectives within university big data environment. In: Proceedings of 2017 International Conference on Engineering, Technology and Innovation (ICE/ITMC), Madeira, Portugal, pp. 264–270 (2017). https://doi.org/10.1109/ICE.2017.8279898
4. Kerdprasop, K., Kerdprasop, N.: From educational data mining model to the automated knowledge based system construction. In: Proceedings of 2015 8th International Conference on UbiMedia Computing (UMEDIA), Colombo, pp. 186–191 (2015). https://doi.org/10.1109/UMEDIA.2015.7297452
5. Pathan, A.A., Hasan, M., Ahmed, M.F., Farid, D.M.: Educational data mining: a mining model for developing students' programming skills. In: Proceedings of the 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), pp. 1–5 (2014). https://doi.org/10.1109/SKIMA.2014.7083552
6. Manzoor, M.Z., Hashmani, A., Savita, K.S.: Performance analysis of feature selection algorithm for educational data mining. In: Proceedings of IEEE Conference on Big Data and Analytics (ICBDA), pp. 7–12 (2017). https://doi.org/10.1109/ICBDAA.2017.8284099
7. Nasiri, M., Minaei, B., Vafaei, F.: Predicting GPA and academic dismissal in LMS using educational data mining: a case mining. In: Proceedings of 6th National and 3rd International Conference of E-Learning and E-Teaching, pp. 53–58 (2012). https://doi.org/10.1109/ICELET.2012.6333365
8. Anoopkumar, M., Rahman, A.M.J.M.Z.: A review on data mining techniques and factors used in educational data mining to predict student amelioration. In: Proceedings of 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), Ernakulam, pp. 122–133 (2016). https://doi.org/10.1109/SAPIENCE.2016.7684113
9. Devasia, T., Vinushree, T.P., Hegde, V.: Prediction of students performance using educational data mining. In: Proceedings of 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), Ernakulam, pp. 91–95 (2016). https://doi.org/10.1109/SAPIENCE.2016.7684167
10. Khan, M.A., Gharibi, W., Pradhan, S.K.: Data mining techniques for business intelligence in educational system: a case mining. In: Proceedings of 2014 World Congress on Computer Applications and Information Systems (WCCAIS), Hammamet, pp. 1–5 (2014). https://doi.org/10.1109/WCCAIS.2014.6916559

Interactive and Intelligent Tutoring of Graphical Solutions Prapty Chanda, Nilormi Das, Dishani Kar, Anirban Mukherjee, and Arindam Mondal

Abstract Nowadays, online teaching–learning has become an integral part of the education of students ranging from school to college. The present paper proposes a feasible implementation of a student-centric automated system for learning and tutoring school-level concepts of linear equations. The system can intelligently assess the student's understanding of the concepts by testing him/her with relevant questions. When the student plots a line in the interactive graphical interface corresponding to a given algebraic equation, the system evaluates the correctness of the plotted line. The system also checks the student's response regarding the graphical solution of two equations, and the student's score is evaluated on the basis of the correctness of the responses given. The intelligence of the system lies in the fact that, by evaluating a student's understanding, it can automatically test him/her with harder or easier concepts. The test case cited demonstrates the usefulness of the system. Keywords Intelligent tutoring · Graph plotting of equations · Equation solving · Computer-based teaching–learning


1 Introduction Nowadays there is a revolution in the use of technology, especially computing systems, in the education system. In particular, the recent pandemic has shown that students have no alternative but to use a computer or smartphone to learn the lessons taught online by schools or colleges. Teachers likewise use different software or apps to deliver lessons to students: the teacher gives assignments, classwork and quizzes on an online platform and the students submit their responses in online mode. This mode of education has become the standard not only in the higher education segment but at all levels, starting from the primary or nursery school level. A teacher sometimes needs to take an online class for nearly a hundred or more students, so it is not possible for the teacher to monitor each student individually to see whether that student has learnt a lesson properly or not. Here lies the need for an intelligent tutoring system that can dynamically track a student's performance or understanding and accordingly feed him/her appropriate exercises or lessons. This implies that if a student learns faster, the system automatically gives more complex problems to test, while if the student does not understand a topic (as tested by a problem), it tests more fundamental concepts by giving easier problems. The proposed system for intelligent tutoring of linear algebraic equations is designed so that it helps the student learn the concepts of linear graph drawing and of finding graphical solutions in a self-paced manner, and also helps monitor and test his/her progress through the concepts in order of increasing difficulty. A computer-based teaching–learning process is nothing new in today's education system; new software and apps are made available on a regular basis. Mukherjee et al. introduced a system for automated diagram drawing from natural language descriptions using GeometryNet [1, 2], a knowledge base of school-level geometry concepts. Intelligent evaluation of the correctness of student-drawn geometric diagrams was first proposed by Mukherjee et al. in [3]. Based on these works, Mondal et al. [4] proposed a basic framework for intelligent tutoring of basic geometrical drawing for primary school students; using this prototype, a student can self-learn and self-test basic concepts of geometry. Skultety et al. [5] suggest that students who used active geometry software were more successful in discovering new mathematical ideas than when they used paper-based construction. There are other intelligent tutoring systems [6, 7] which present different ways of teaching students. In the present context of automated teaching of linear algebraic equation solving, a few available software tools are studied and enumerated below. GeoGebra is a software package that explicitly links (a bidirectional combination of) geometry and algebra. It has a built-in Computer Algebra System (CAS) and the ability to use variables for numbers, vectors and points, and to find derivatives and integrals of functions. iMathematics Pro is a useful tool that can solve various exercises


with some built-in tools like Advanced Calculator, Fraction Approximator and Equation Solver. Desmos is an online graphing calculator that allows students to interact dynamically with equations; it also supports lists, plots, inequalities, regressions, interactive variables, simultaneous graphing, tables of data, transformations and more. Quickmath is a site for answering common algebra questions automatically: users enter a mathematical expression and decide whether they wish to expand, factorize or simplify it, and the answer is computed and returned to them within a couple of minutes. Wolfram Mathematica is a computational platform or toolkit that encompasses computer algebra, symbolic and numerical computation, visualization and statistics capabilities; its features include instant dynamic interactivity, high-impact adaptive visualization and so on. A few other similar tools include Webmath, Mathigon, Mathspace and Graspable Math. Although all of the above software can be used for algebraic problem solving and graph drawing, none of it is designed for intelligent, student-centric tutoring.

2 Proposed Model The proposed system is divided into two modes, the lesson mode and the test mode, and students can choose either mode to self-learn or self-test the related concepts. In the lesson mode, the fundamental concepts of linear algebraic equations (variables, constants, coefficients, slopes, intercepts, etc.), the solution of a set of equations, the graphical representation of equations (how to plot a line graph) and graphical solutions (how to find the intersection of two lines) are demonstrated step by step. At the next level, the system explains the different conditions under which a set of lines represents feasible or infeasible solutions: parallel lines (no solution), coincident lines (infinite solutions) and intersecting lines (unique solution). In the test mode, which forms the most important and intelligent part of the system, the student's understanding of linear algebraic equations and linear graphs is tested and monitored. The student is asked to plot the linear graphs of equations that are randomly generated by the system, and is also asked for the type of solution (unique/infinite/no solution) and the exact solution, if any, that is, the intersection point of the corresponding straight lines. If there is no unique solution or there are infinite solutions (a 0 or null intersection point is considered), the student has to enter "0" in the input field to score full marks. As the linear algebraic equations are generated from an equation database maintained by the system, the question changes every time. After selecting the question, the system algebraically calculates the slopes, intercepts, type of solution and intersection point and stores them in the background. The student can click on any two points with the mouse on the graphical (grid) interface presented, which draws a straight line through the points. After plotting both lines corresponding to the two equations, the student selects the option for the 'type of solution' and inputs the intersection point found from the graph. The system automatically takes the points clicked by the student and calculates and compares the resultant slopes and intercepts


of the lines drawn by the student with the pre-calculated slopes and intercepts of the given equations. Finally, the system shows the percentage scored by the student. If the student fails to answer any question correctly, the system also shows the correct graph and the correct answers along with the score. Just as in the lesson mode, typical problems are presented in the test mode according to the increasing level of difficulty of the related concepts. When a student scores the maximum on a given test problem, he/she may subsequently be tested with a slightly more difficult problem. For example, if one responds correctly to a problem having a unique solution, then one may be given a problem that has no solution or infinite solutions, and vice versa if the student does not respond correctly to any of the latter problems. Also, if a student fails to understand and respond correctly to a problem with, say, a unique solution, then he/she may be given multiple problems of the same level. Then, if the average score at a level is below a threshold (say 80%), he/she will be given fundamental problems such as plotting a straight line from an algebraic equation or finding the slope or intercept of a line from an equation or from points on a line, parallel lines, vertical lines, horizontal lines, intersecting lines, etc. The equation database stores equations (sets of x, y coefficients and constants) of lines or sets of lines, with difficulty levels assigned to each such dataset. The concept of attaching a difficulty level to an individual linear equation or a set of linear equations and automatically guiding a student back and forth between easier and harder concepts according to his/her performance is what makes student-centric intelligent tutoring possible. This is what a teacher is expected to do, but it is practically not always possible to track a particular student's understanding in a live online class attended by a hundred-odd students or more. As far as the development of the system is concerned, Python has been selected as the programming platform. Since it is a tutoring system for students, it has a user-friendly and easy-to-use interface. The UI for the lesson mode and test mode was created using Tkinter. In the lesson mode, all the theory necessary for teaching the concepts of linear equations is presented for easy understanding by the student. The graph of the system is an interactive graph, created with Matplotlib for visualization and Tkinter for tracking the mouse clicks. The process flow of the system is given below.

Process Flow of the proposed system

Define an Equation Database to contain different types of linear equations (horizontal/vertical/inclined/lines with specific slopes and intercepts) and sets of pairs of equations having different types of solutions (no solution/infinite solutions/unique solution), with different difficulty levels attached to each equation or each set of equations.

Step 1: User selects Lesson Mode or Test Mode.
Step 2: If Lesson Mode, content is displayed (from the Equation Database) on different types of equations and corresponding line graphs, and also on different types of solutions for a pair of equations, from the lower difficulty levels to the higher ones.
Step 3: If Test Mode, then for the graphical solution of two linear equations, follow Step 4–Step 6.
Step 4: Randomly generate a pair of linear equations (having a finite solution) from the Equation Database.
Step 5: User selects two points for each equation on the graphical interface and the corresponding lines are drawn by the system; the user identifies the intersection point.
Step 6: Pre-calculated slope, intercept and intersection point (from the equations) are compared with the slope, intercept and intersection of the lines actually drawn by the user (calculated from the captured point coordinates selected on screen by the user).
– If the match % of the line plot, nature of solution and intersection point is greater than 75%, then the user is given another problem of the same nature (same difficulty level), OR
– If the match % is 100%, then the user is given another problem pertaining to infinite solutions or no solution (higher difficulty level), OR
– If the match % is less than 75%, then the user is given another problem pertaining to a single line plot (lower difficulty level).
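The scoring and difficulty-adaptation logic of Step 6 can be sketched as follows; the function names, the tolerance and the restriction to slope and intercept (ignoring the solution-type and intersection components of the score) are simplifying assumptions, not the authors' implementation.

```python
def line_from_equation(a, b, c):
    """Slope and y-intercept of ax + by = c (assumes b != 0, i.e. a non-vertical line)."""
    return -a / b, c / b

def line_from_points(p1, p2):
    """Slope and y-intercept of the line through two clicked points (non-vertical)."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def match_percent(expected, drawn, tol=1e-6):
    """Percentage of parameters (slope, intercept) that agree within a tolerance."""
    hits = sum(abs(e - d) <= tol for e, d in zip(expected, drawn))
    return 100.0 * hits / len(expected)

# Example from Sect. 3.2: equation 4x + 2y = -2 and clicked points (0, -1), (-0.5, 0).
expected = line_from_equation(4, 2, -2)       # slope -2, intercept -1
drawn = line_from_points((0, -1), (-0.5, 0))  # the same line
score = match_percent(expected, drawn)

# Difficulty adaptation as in Step 6: 100% -> harder, >75% -> same level, else easier.
if score == 100:
    next_level = "harder"
elif score > 75:
    next_level = "same"
else:
    next_level = "easier"
print(score, next_level)
```

In the actual system, the two clicked points would come from the mouse clicks tracked through the Tkinter/Matplotlib interface described above.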

3 Experiment 3.1 Lesson Mode An example of the lesson mode is shown in Fig. 1, where the screenshot on the left describes how to plot a linear equation on a graph and the one on the right demonstrates graphically the unique solution of a set of equations. Different types (difficulty levels) of line equations are presented (or may be selected by the student) and the corresponding change in the graphical plot is shown dynamically.

Fig. 1 Lesson mode

3.2 Test Mode Two test cases are presented here: one where the student's response is fully correct and another where the student gives an erroneous input. Case 1: Correct solution Here the system provides the student with two linear equations. The student has to compute the x, y coordinates of two points that satisfy each of the equations and plot the corresponding straight lines, as taught in the lesson mode. The graphical interface is interactive and the student can dynamically select the points for plotting. Subsequently, the student has to click on the "Plot on Graph" button and plot the points on the given graphical interface (Fig. 2). Here, the student plots (0, −1) and (−0.5, 0) for the equation 4x + 2y = −2 and (4, 0) and (0, 2) for the equation 3x + 6y = 12. Accordingly, the line graphs are plotted by the system (Fig. 3). Having plotted the graph, the student now has to opt for the type of solution for the given set of equations. If it has a unique solution, a text box appears for the student to enter the intersection point. Finally, he/she has to click the submit button to get the result (Fig. 4).

Fig. 2 Equations given for Case 1

Fig. 3 Graphs for (a) 4x + 2y = −2 and (b) 3x + 6y = 12

Here, the student has given the correct answer and scored 100%. In the backend, the system computes the slopes, intercepts and solution for the given set of equations. The same quantities are calculated from the inputs given by the student, and the values are then matched to calculate the score percentage. Here, all the parameters calculated from the student's inputs match the system-stored values (Table 1) and hence the degree of correctness is 100%. Case 2: Incorrect solution In this case, one of the lines plotted by the student, and hence the intersection point, was incorrect. A pop-up appears depicting the incorrect plot in red. The correct plot and the intersection point are then displayed by the system on the same screen (Fig. 5).
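As a quick check of what the backend pre-computes for Case 1, the system 4x + 2y = −2, 3x + 6y = 12 can be solved directly; using NumPy here is an assumption about tooling, not the authors' code.

```python
import numpy as np

# Coefficients of 4x + 2y = -2 and 3x + 6y = 12.
A = np.array([[4.0, 2.0],
              [3.0, 6.0]])
b = np.array([-2.0, 12.0])

# det(A) = 4*6 - 2*3 = 18 != 0, so the lines intersect at a unique point.
x, y = np.linalg.solve(A, b)
print(x, y)  # -2.0 3.0: the intersection point is (-2, 3)
```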

Fig. 4 Correct solution

Fig. 5 Equations and plots


Table 1 User score

             Plot 1   Plot 2   MCQ   Intersection point
Max score    1        1        1     1
User score   1        1        1     1

Table 2 Choice of solution and entering intersection point

             Plot 1   Plot 2   MCQ   Intersection point
Max score    1        1        1     1
User score   1        0        1     0

Here, since the student has drawn the second line incorrectly and entered the wrong intersection point, his/her answer is only partially correct and hence the degree of correctness is 50% (Table 2).

4 Conclusion This paper proposes a feasible implementation of a student-centric automated system for tutoring graphical solutions of equations. Basic linear algebraic equations are taken as a case study because they support a basic understanding of graph plotting. A difficulty level is attached to each topic of the learning content, starting from the basics of plotting a straight line from an equation, through identifying different types of lines (such as horizontal, vertical, parallel and diagonal lines), to finding graphical solutions of a set of lines (parallel, intersecting, coincident). By evaluating the student's understanding of each topic, the entire content is tutored in an intelligent way. This model shows how the system is sensitive to the student's responses and accordingly enables student-wise tracking of learning progress, which can serve as a useful component of an online teaching–learning system. The proposed system can be extended in the future to graph plotting and finding graphical solutions of non-linear, trigonometric and multi-variable equations, which poses a real challenge towards developing a comprehensive intelligent tutoring system for graphical analysis.

References
1. Mukherjee, A., Garain, U., Nasipuri, M.: On construction of a GeometryNet. In: IASTED International Conference on Artificial Intelligence and Applications, Calgary, Canada, pp. 530–536. ACTA Press (2007)
2. Mukherjee, A., Sengupta, S., Chakraborty, D., et al.: Text to diagram conversion: a method for formal representation of natural language geometry problems. In: IASTED International Conference on Artificial Intelligence and Applications, Austria, pp. 137–144 (2013)
3. Mukherjee, A., Garain, U., Biswas, A.: Evaluation of the graphical representation for text-to-graphic conversion systems. In: 10th IAPR International Workshop on Graphics Recognition—GREC 2013, Bethlehem, PA, USA (2013)
4. Mondal, A., Mukherjee, A., Garain, U.: Intelligent monitoring and evaluation of digital geometry figures drawn by students. In: Bhattacharyya, S. (ed.) Intelligent Multimedia Data Analysis. De Gruyter (2019)
5. Skultety, L., Gonzalez, G., Vargas, G.: Using technology to support teachers' lesson adaptations during lesson study. J. Technol. Teach. Educ. 25(2) (2017)
6. Feng, M., Roschelle, J., Heffernan, N., Fairman, J., Murphy, R.: Implementation of an intelligent tutoring system for online homework support in an efficacy trial. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) Intelligent Tutoring Systems. ITS 2014 (2014)
7. AbuEloun, N., Naser, S.: Mathematics intelligent tutoring system. Int. J. Adv. Sci. Res. 2, 11–16 (2017)

Author Index

A Abayomi, Abdultaofeek, 185 Abdulazeez, Alimot, 185 Aiyer, Usha, 297 Anwar, Shamama, 233

B Bagchi, Parama, 121 Baidya, Debasrita, 49 Basu, Abhishek, 213 Basu, Soham, 173 Bhamu, Subhash, 297 Bhattacharjee, Debotosh, 111 Bhattacharjee, Vandana, 233 Bhattacharya, Ankit, 173 Bhattacharya, Indradeep, 163 Bhosale, Bhagyashri, 297 Biswas, Sudarsan, 1

C Capriles, Priscila V. S. Z., 255 Carvalho, Ruan M., 309, 319 Chaki, Susmita, 203 Chakraborty, Arunava Kumar, 75 Chakraborty, Sujit, 1 Chanda, Prapty, 335 Charan, Godavarthi, 149 Chatterjee, Subhajit, 203 Chattopadhyay, Avik, 213 Chowdhury, Aditi Roy, 101 Colugnati, Fernando A. B., 255

D da Fonseca, Leonardo G., 319 Darapaneni, Narayana, 297 Dasgupta, Kousik, 101 Das, Nilormi, 335 Das, Sourav, 63, 75 Das, Srirupa, 287 Debnath, Sourav, 1 Dey, Arkadeep, 13 Dhar, Soumyadip, 121 Dutta, Paramartha, 101 G Ghosh, Bishal, 121 Ghosh, Chanchal, 223 Goliatt, Leonardo, 255, 309 Goliatt, Priscila V. Z. C., 309 Gomes, Diego E. B., 309 Gopal, Viji, 277 Guha, Sutirtha Kumar, 41 Gupta, Shibakali, 163 H Haldar, Sayak, 41 Hazra, Joydev, 101 Hussein, Molla Rashied, 243 I Iwashima, Gabriele C., 255 K Kar, Dishani, 335


Kar, Reshma, 87 Khan, Amit, 131 Kolya, Anup Kumar, 63, 75 Kumari, Nandini, 233

M Majumdar, Dipankar, 131, 141, 223 Mallick, Portret, 13 Mandal, Surajit, 141 Mazumder, Indronil, 87 Mishra, Shreyas, 195 Mitra, Diptarshi, 49 Mondal, Arindam, 335 Mondal, Arpita, 49 Mondal, Bikromadittya, 131, 223 Mukherjee, Anirban, 203, 335 Mukherjee, Sayantan, 173

N Nasipuri, Mita, 111

P Pan, Indrajit, 327 Patro, Chandra, 327 Paul, Varghese, 277 Pavan Kumar, Y. V., 149 Pereira, Egberto, 319 Pramanik, Sourav, 111

R Rai, Avinash, 267 Rao, K. Sandeep, 149 Rastogi, Ankit, 297 Reddy Paduri, Anwesh, 297 Rosa, Iago G. L., 309, 319 Roy, Hiranmoy, 121 Roy, Pritam Kumar, 41 Roy, Subhrajit Sinha, 213

S Saha, Rajib, 121 Sampath, Dasa, 149 Santra, Soumen, 141 Saporetti, Camila M., 319 Sarkar, Abhijit, 49 Scoralick, João P., 255 Sen, Anindya, 173 Shaikat, Abu Salman, 243 Singla, Yash, 27 Soni, Akanksha, 267 Subhadarshy, Turyansu, 297

T Tasnim, Rumana, 243 Tunga, Harinandan, 13

U Udoh, Samuel, 185 Umoh, Uduak, 185