Machine Intelligence and Emerging Technologies: First International Conference, MIET 2022, Noakhali, Bangladesh, September 23-25, 2022, Proceedings, Part I (Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 490) 3031346181, 9783031346187



Table of contents:
Preface
Organization
Contents – Part I
Contents – Part II
Imaging for Disease Detection
Potato-Net: Classifying Potato Leaf Diseases Using Transfer Learning Approach
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Description
3.2 Data Preprocessing
3.3 Model Implementation
3.4 Performance Calculation
4 Results and Discussions
5 Conclusion
References
False Smut Disease Detection in Paddy Using Convolutional Neural Network
1 Introduction
2 Related Work
3 Methodology
3.1 False Smut Disease
3.2 Data Collection and Preprocessing
3.3 Convolutional Neural Network
4 Result and Performance Analyses
5 Future Research Direction and Conclusion
References
Gabor Wavelet Based Fused Texture Features for Identification of Mungbean Leaf Diseases
1 Introduction
2 Methodology
2.1 Dataset
2.2 Feature Extraction
2.3 Classification Using Cubic SVM
3 Results and Discussion
3.1 GW Decomposition
3.2 Statistical Analysis
3.3 Comparison to Other Classifiers
3.4 Comparison to Existing Approaches
3.5 Analysis of Computational Time
3.6 Discussion
4 Conclusion
References
Potato Disease Detection Using Convolutional Neural Network: A Web Based Solution
1 Introduction
2 Related Work
3 Proposed System
3.1 Methodology
3.2 Dataset Description
3.3 Data Preprocessing
3.4 Evaluation Measures
3.5 UML Use Case Diagram of Proposed Web Application
3.6 UML Activity Diagram of Proposed Web Application
4 Experimental Result
4.1 Comparison of CNN Models
4.2 Model Evaluation
4.3 User Interface of the Web Application
4.4 Sample Prediction of the Web Application
4.5 Comparison with Other Works
5 Conclusion
References
Device-Friendly Guava Fruit and Leaf Disease Detection Using Deep Learning
1 Introduction
2 Related Works
3 Dataset
4 Method
4.1 Model Training
4.2 Model Optimization
5 Results and Discussion
6 Conclusion and Future Work
References
Cassava Leaf Disease Classification Using Supervised Contrastive Learning
1 Introduction
2 Related Work
3 Dataset
4 Methodology
4.1 Representation Learning Framework
4.2 Projection Head
4.3 Classifier Head
4.4 Supervised Contrastive Learning Loss
4.5 First Stage Training
4.6 Second Stage Training (Encoder + Classifier Head)
5 Evaluation
5.1 Evaluation Metric
5.2 Experimental Setup
5.3 Evaluation and Comparison
6 Conclusion
References
Diabetes Mellitus Prediction Using Transfer Learning
1 Introduction
2 Background Study
3 Methodology
3.1 The Dataset
3.2 Exploratory Data Analysis
3.3 Data Transforming
3.4 Design of Our Model
3.5 Application of Convolutional Neural Networks
4 Results and Discussion
5 Conclusion
References
An Improved Heart Disease Prediction Using Stacked Ensemble Method
1 Introduction
2 Literature Review
3 Methodology
3.1 Research Design
3.2 Data Collection and Preprocessing
3.3 Models
4 Result Analysis
5 Conclusion and Future Recommendation
References
Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset Collection
3.2 EDA (Exploratory Data Analysis) & F.E (Feature Engineering)
3.3 Preparing to Model
4 Results and Discussion
4.1 Comparing This Research Work with Some Previous Research
5 Conclusion
References
PreCKD_ML: Machine Learning Based Development of Prediction Model for Chronic Kidney Disease and Identify Significant Risk Factors
1 Introduction
2 Materials and Methods
2.1 Data
2.2 Data Preprocessing
2.3 Performance Evaluation Metrics
2.4 Machine Learning Approaches
2.5 Model Selection and Features Importance
3 Results and Discussion
4 Conclusion and Future Work
References
A Reliable and Efficient Transfer Learning Approach for Identifying COVID-19 Pneumonia from Chest X-ray
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Summarization
3.2 Dataset Preprocessing
3.3 Altered Transfer Learning Methods
3.4 Training and Testing
3.5 Ensemble of Best TL Models
4 Result Analysis
4.1 Result Analysis of 9 TL Models Based on Dataset 1 (80:20) Split
4.2 Result Analysis of 5 TL Models in First Scenario
4.3 Result Analysis of 5 TL Models in Second Scenario
4.4 Result Analysis of 5 TL Models in Third Scenario
4.5 Result Analysis of 5 TL Models in Fourth Scenario
4.6 Result Analysis of 5 TL Models in Fifth Scenario
4.7 Average Performance of the 5 TL Models Throughout 5 Scenarios
4.8 Result Analysis of Ensemble Model
4.9 Discussion
5 Conclusion
References
Infection Segmentation from COVID-19 Chest CT Scans with Dilated CBAM U-Net
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Description and Preprocessing
3.2 Model Overview
3.3 U-Net Architecture
3.4 Convolutional Block Attention Module (CBAM)
4 Result and Analysis
4.1 Visual Results
4.2 Performance Table
5 Conclusion and Future Work
References
Convolutional Neural Network Model to Detect COVID-19 Patients Utilizing Chest X-Ray Images
1 Introduction
2 Literature Review
3 Materials and Methods
3.1 Data Collection
3.2 Data Pre-processing
3.3 Proposed Convolutional Neural Networks
3.4 Baseline Classifiers
3.5 Pre-trained Transfer Learning Models
3.6 Evaluation
4 Experiment Results
5 Discussion
6 Conclusion and Future Work
References
Classification of Tumor Cell Using a Naive Convolutional Neural Network Model
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Description
3.2 Data Pre-processing
3.3 Models
3.4 Experimental Setup
4 Result Analysis
5 Conclusion
References
Tumor-TL: A Transfer Learning Approach for Classifying Brain Tumors from MRI Images
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Description
3.2 Data Preprocessing
3.3 Model Implementation
3.4 Performance Calculation
4 Results and Discussions
5 Conclusion
References
Deep Convolutional Comparison Architecture for Breast Cancer Binary Classification
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Pre-processing Using GAN
3.2 Feature Extraction, Selection and Classification
4 Result Analysis and Discussion
4.1 Dataset
4.2 Experimental Setup
4.3 Classification Result and Confusion Matrix
4.4 Model Accuracy, Loss Function, and ROC Graph Analysis
4.5 Discussion
5 Conclusion and Future Work
References
Lung Cancer Detection from Histopathological Images Using Deep Learning
1 Introduction
2 Related Works
3 Dataset
4 Methodology
4.1 Collection and Analysis of Data
4.2 Data Preprocessing
4.3 Deep Neural Network
4.4 Analysis and Visualization
5 Result and Discussions
6 Conclusion
References
Brain Tumor Detection Using Deep Network EfficientNet-B0
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 Data
3.2 Deep Network
4 Experimental Section
4.1 Experiment
4.2 Result
4.3 Performance Matrices
4.4 Discussion
5 Conclusion
References
Cancer Diseases Diagnosis Using Deep Transfer Learning Architectures
1 Introduction
2 Background Study
2.1 Convolutional Neural Network
2.2 Transfer Learning
2.3 Related Works Using Mammograms and Images
2.4 Related Works Using CT Scan Images
2.5 Related Works Using Dermatoscopic Images
3 Workflow
3.1 Data Collection
3.2 Data Pre-processing
3.3 Architectures
4 Experimental Result Analysis
4.1 Experiment 1: Breast Cancer
4.2 Experiment 2: Lung Cancer
4.3 Experiment 3: Skin Cancer
5 Conclusion
References
Transfer Learning Based Skin Cancer Classification Using GoogLeNet
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset Description
3.2 Data Preprocessing
3.3 Data Augmentation
3.4 GoogLeNet
3.5 Xception Model
3.6 DenseNet Model
3.7 Inception-Resnet V2 Model
3.8 Transfer Learning
4 Result and Discussion
4.1 System Configuration
4.2 Training and Test Datasets
4.3 Transfer Learning Model's Hyperparameters
4.4 Performance Matrices
4.5 Result
4.6 Comparison with Existing Work
5 Conclusion and Future Work
References
Assessing the Risks of COVID-19 on the Health Conditions of Alzheimer’s Patients Using Machine Learning Techniques
1 Introduction
2 Literature Review
3 Dataset Description
4 Research Methodology
5 Learning Models
6 Results and Analysis
6.1 Correlation Model Performance (Pearson Correlation Analysis)
6.2 Confusion Matrix
6.3 Accuracy, Precision, Recall, and F1-Score
6.4 Receiver Operating Characteristic (ROC)
7 Comparative Study
8 Conclusions
References
MRI Based Automated Detection of Brain Tumor Using DWT, GLCM, PCA, Ensemble of SVM and PNN in Sequence
1 Introduction
2 Dataset
3 Proposed Method
3.1 Pre-processing
3.2 Segmentation
3.3 Feature Extraction
3.4 Feature Selection Using PCA
3.5 Brain Tumor Classification
4 Results
5 Conclusion
References
Pattern Recognition and Natural Language Processing
Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets
1 Introduction
2 Literature Review
3 Methodology
3.1 ASUS Tinker SBC Preparation
3.2 Dataset Creation and Preprocessing
3.3 Image Augmentation
3.4 Training of MobileNetV2 and Classification
4 Result and Discussion
5 Conclusion
References
Fake Profile Detection Using Image Processing and Machine Learning
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Objective and Contributions
2 Background
2.1 Literature Review
3 Dataset Description
4 Proposed Model
4.1 Image Processing
4.2 One Time Password (OTP)
5 Implementation and Result Analysis
6 Analysis of Image Processing
7 Conclusion
References
A Novel Texture Descriptor Evaluation Window Based Adjacent Distance Local Binary Pattern (EADLBP) for Image Classification
1 Introduction
2 Background Study
2.1 Local Binary Pattern LBP
2.2 Local Binary Patterns by Neighborhoods nLBPd
3 Proposed Texture Descriptor
3.1 Proposed Adjacent Distance Based Local Binary Pattern AdLBP
3.2 Proposed Evaluation Window Based Local Binary Pattern EwLBP
3.3 Proposed Evaluation Window Based Adjacent Distance Local Binary Pattern EADLBP
4 Experimental Analysis
5 Conclusion
References
Bornomala: A Deep Learning-Based Bangla Image Captioning Technique
1 Introduction
2 Literature Review
3 Attention Based Caption Generation
4 Materials and Methods
4.1 Dataset Preparations
4.2 Model Description
5 Result Analysis and Discussions
6 Conclusion
References
Traffic Sign Detection and Recognition Using Deep Learning Approach
1 Introduction
2 Literature Review
3 Methodology
3.1 Image Collection
3.2 Preprocessing
3.3 Detection Method
3.4 Recognition Method
4 Experimental Results
4.1 CNN
4.2 InceptionV3
4.3 AlexNet
5 Model Evaluation and Analysis
5.1 Classification Report of the Models
5.2 Prediction Results
5.3 Comparative Analysis
6 Conclusion
References
A Novel Bangla Spoken Numerals Recognition System Using Convolutional Neural Network
1 Introduction
2 Related Work
3 Preliminaries
3.1 Convolutional Neural Network (CNN)
3.2 Mel Frequency Cepstral Coefficients (MFCCs)
4 Proposed Method
4.1 Dataset Description
4.2 Preprocessing and Augmentation
4.3 Feature Extraction
4.4 Train Test Split
4.5 Feature Learning and Classification Using CNN
5 Experimental Results and Analysis
6 Future Work
7 Conclusion
References
Bangla Speech-Based Person Identification Using LSTM Networks
1 Background
1.1 Related Work
2 Methods
2.1 Create a Dataset
2.2 Feature Extraction
2.3 Data Preprocessing
2.4 Model Overview
3 Results
3.1 Training
3.2 Performance Measure
4 Conclusion and Future Works
References
VADER vs. BERT: A Comparative Performance Analysis for Sentiment on Coronavirus Outbreak
1 Introduction
2 Related Work
3 Methodology
3.1 Data Acquisition and Pre-processing
3.2 Sentiment Analysis
3.3 Model Building
4 Experiment Setup and Results
4.1 Dataset
4.2 Word Embedding
4.3 Experimental Results and Analysis
5 Conclusion and Future Works
References
Aspect Based Sentiment Analysis of COVID-19 Tweets Using Blending Ensemble of Deep Learning Models
1 Introduction
2 Related Work
3 Methodology
3.1 Topic Modeling for Aspect Extraction
3.2 Sentiment Analysis Using Blending Ensemble
3.3 Aspect Based Sentiment Categorization
4 Experiments
4.1 Dataset
4.2 Data Preprocessing
4.3 Hyper-Parameters Settings
4.4 Models Selection and Integration with Blending Ensemble Classifier
4.5 Effect of Iterations of Models
4.6 Insights from the Outputs to Provide Decision Support
5 Discussion
6 Conclusion and Future Work
References
Covid-19 Vaccine Sentiment Detection and Analysis Using Machine Learning Technique and NLP
1 Introduction
2 Related Work
3 Proposed Method
4 Result and Discussion
5 Conclusion
References
Sentiment Analysis of Tweets on Covid Vaccine (Pfizer): A Boosting-Based Machine Learning Solution
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Collection
3.2 Labeling Sentiment
3.3 Feature Extraction
3.4 Dataset Split
3.5 Classification
4 Result Evaluation
5 Conclusion and Future Work
References
Matching Job Circular with Resume Using Different Natural Language Processing Based Algorithms
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 Overview
3.2 Collecting Dataset and Job Seekers Resume
3.3 Feature Extraction
3.4 Feature Extraction
3.5 Pre-Processing
3.6 Calculating Similarity Score
4 Different Models
4.1 Doc2Vec Model
4.2 BERT Model
4.3 Word2Vec Model (Trained with Our Corpus)
4.4 TF-IDF Model
4.5 Word2Vec Model (Pre-Trained with Google News)
4.6 GloVe Model
5 Experimental Result Analysis
5.1 Performance Evaluation
5.2 Doc2Vec Model
5.3 BERT Model
5.4 Word2Vec Model (Trained with Our Corpus)
5.5 TF-IDF Model
5.6 Word2Vec Model (Pre-Trained with Google News)
5.7 GloVe Model
5.8 Result Analysis
6 Implementation
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
References
Transformer-Based Text Clustering for Newspaper Articles
1 Introduction
2 Background and Related Work
3 Dataset and Proposed Methodology
3.1 Dataset Preparation and Preprocessing
3.2 Clustering Process and Evaluation Metric
4 Experimental Results
4.1 Performance of Distance-Based Clustering Algorithms
4.2 Performance of Density-Based Clustering Algorithm
4.3 Comparison Between K-Means and DBSCAN
5 Conclusion and Future Work
References
Bangla to English Translation Using Sequence to Sequence Learning Model Based Recurrent Neural Networks
1 Introduction
2 Recurrent Neural Networks Model
3 Proposed Methodology
3.1 RNN with Attention
4 Experimental Evaluation
4.1 Testing Result
5 Conclusion
References
Bangla Spelling Error Detection and Correction Using N-Gram Model
1 Introduction
2 Related Works
3 Proposed Bangla Spelling Error Detection and Correction Model
3.1 Data Collection
3.2 Data Preprocessing
3.3 Non-word Error Detection and Suggestion
3.4 Real-Word Error Detection and Correction
4 Performance Evaluation
5 Conclusion
References
Bidirectional Long-Short Term Memory with Byte Pair Encoding and Back Translation for Bangla-English Machine Translation
1 Introduction
2 BiLSTM and Its Performance Improvement Mechanisms
2.1 Bidirectional Long-Short Term Memory (BiLSTM) Basics
2.2 BiLSTM with Attention for MT
2.3 Byte Pair Encoding (BPE)
2.4 Back Translation (BT)
3 BiLSTM Based Bangla-English MT (BEMT) with ATT, BPE and BT
3.1 Benchmark Dataset
3.2 BiLSTM with Attention (BiLSTM+ATT)
3.3 BiLSTM+ATT+BPE Model
3.4 BiLSTM+ATT+BT Model
3.5 BiLSTM+ATT+BPE+BT Model
4 Experimental Studies
4.1 Experimental Setup and Performance Evaluation
4.2 Experimental Results and Analysis
5 Conclusions
References
Face Recognition-Based Mass Attendance Using YOLOv5 and ArcFace
1 Introduction
2 Related Works
3 Methodology
3.1 Face Detection Algorithms
3.2 Face Recognition Algorithms
4 System Architecture
5 Dataset Collection
6 Training the System and Recording Attendance
7 Experimental Setup and Results
8 Conclusion and Future Works
References
A Hybrid Watermarking Technique Based on LH-HL Subbands of DWT and SVD
1 Introduction
2 Literature Review
3 Theoretical Background
3.1 DWT
3.2 SVD
4 Proposed Methodology
4.1 Image Watermark Embedding
4.2 Watermark Extraction
5 Experimental Results and Discussion
6 Discussion
7 Conclusion
References
A Smartphone Based Real-Time Object Recognition System for Visually Impaired People
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Object Detection Layer
3.2 Object Recognition Layer
3.3 Cloud Layer
4 Application Development
5 Experimental Results and Analysis
5.1 Object Detection
5.2 Object Recognition
6 Conclusions
References
Bangla Speech Emotion Recognition Using 3D CNN Bi-LSTM Model
1 Introduction
2 Bangla Speech Emotion Recognition Using 3DCNN-Bi-LSTM Model
2.1 Speech Processing and Transformation
2.2 Classification Using 3D CNN Bi-LSTM Model
3 Experimental Studies
3.1 Benchmark Dataset and Experimental Setup
3.2 Experimental Results and Analysis
3.3 Performance Comparison with Existing Models
4 Conclusions
References
An RNN Based Approach to Predict Next Word in Bangla Language
1 Introduction
2 Related Works
3 Methodology
3.1 Dataset
3.2 Preprocessing
3.3 Bi-directional LSTM
3.4 Bi-directional GRU
3.5 Implementation
4 Result and Analysis
5 Conclusion
References
Author Index

Md. Shahriare Satu · Mohammad Ali Moni · M. Shamim Kaiser · Mohammad Shamsul Arefin (Eds.)


Machine Intelligence and Emerging Technologies First International Conference, MIET 2022 Noakhali, Bangladesh, September 23–25, 2022 Proceedings, Part I


Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Volume 490

Editorial Board Members:
Ozgur Akan, Middle East Technical University, Ankara, Türkiye
Paolo Bellavista, University of Bologna, Bologna, Italy
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong, China
Geoffrey Coulson, Lancaster University, Lancaster, UK
Falko Dressler, University of Erlangen, Erlangen, Germany
Domenico Ferrari, Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla, UCLA, Los Angeles, USA
Hisashi Kobayashi, Princeton University, Princeton, USA
Sergio Palazzo, University of Catania, Catania, Italy
Sartaj Sahni, University of Florida, Gainesville, USA
Xuemin Shen, University of Waterloo, Waterloo, Canada
Mircea Stan, University of Virginia, Charlottesville, USA
Xiaohua Jia, City University of Hong Kong, Kowloon, Hong Kong
Albert Y. Zomaya, University of Sydney, Sydney, Australia


The LNICST series publishes ICST's conferences, symposia and workshops. LNICST reports state-of-the-art results in areas related to the scope of the Institute. The types of material published include:

• Proceedings (published in time for the respective event)
• Other edited monographs (such as project reports or invited volumes)

LNICST topics span the following areas:

• General Computer Science
• E-Economy
• E-Medicine
• Knowledge Management
• Multimedia
• Operations, Management and Policy
• Social Informatics
• Systems

Md. Shahriare Satu · Mohammad Ali Moni · M. Shamim Kaiser · Mohammad Shamsul Arefin Editors

Machine Intelligence and Emerging Technologies First International Conference, MIET 2022 Noakhali, Bangladesh, September 23–25, 2022 Proceedings, Part I

Editors
Md. Shahriare Satu, Noakhali Science and Technology University, Noakhali, Bangladesh
Mohammad Ali Moni, The University of Queensland, St. Lucia, QLD, Australia
M. Shamim Kaiser, Jahangirnagar University, Dhaka, Bangladesh
Mohammad Shamsul Arefin, Daffodil International University, Dhaka, Bangladesh, and Chittagong University of Engineering and Technology, Chattogram, Bangladesh

ISSN 1867-8211    ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-031-34618-7    ISBN 978-3-031-34619-4 (eBook)
https://doi.org/10.1007/978-3-031-34619-4

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

Machine intelligence is the practice of designing computer systems that make intelligent decisions based on context rather than direct input. It is important to understand that machine intelligence relies on huge volumes of data. Emerging technology, on the other hand, can be described as radically novel and relatively fast-growing technology, characterized by a degree of coherence that persists over time and by the potential to exert a significant impact on the socio-economic domain, an impact observed in the composition of actors, institutions, and patterns of interaction among them, as well as in the associated knowledge production processes.

To that end, the 1st International Conference on Machine Intelligence and Emerging Technologies (MIET 2022) provided an opportunity to engage researchers, academics, industry professionals, and experts across these multidisciplinary fields, and to share cutting-edge research results gained through the application of machine learning, data science, the Internet of Things, cloud computing, sensing, and security. Researchers in the relevant fields were invited to submit their original, novel, and extended unpublished work to this conference. The conference was hosted by Noakhali Science and Technology University (NSTU), Sonapur, Noakhali 3814, Bangladesh. Support for MIET 2022 came from the IEEE Computer Society Bangladesh Chapter, the Center for Natural Science and Engineering Research (CNSER), the University Grants Commission (UGC) Bangladesh, Agrani Bank Limited, Mercantile Bank Limited, Union Bank Limited, Globe Pharmaceuticals Limited, EXIM Bank Limited, and Janata Bank Limited.

The papers presented at MIET 2022 covered theoretical and methodological frameworks for replicating applied research, and they offer a representative cross-section of recent academic progress in the wide-ranging applications of AI and IT. The accepted papers fall into five main categories: (1) imaging for disease detection; (2) pattern recognition and NLP; (3) biosignals and recommendation systems for well-being; (4) network, security, and nanotechnology; and (5) emerging technologies for society and industry. Across these five streams, MIET 2022 received a total of 272 submissions from authors in 12 countries. Every paper went through one round of double-blind review and was read by at least two experts (one of whom was the managing chair). After a thorough review procedure in which the reviewers' and track chairs' reports on the various articles were considered, 104 full papers from authors in 9 countries were accepted for presentation at the conference. All 104 papers that were presented in person at MIET 2022 are included in this volume of the proceedings.

All of us on the MIET 2022 committee are indebted to the committee members for their tireless efforts and invaluable contributions. Without the hard work and dedication of the MIET 2022 Program Committee members in assessing the conference papers, we would not have had the fantastic program that we had. The success of MIET 2022 is also due to the hard work of many people and the financial backing of our kind sponsors. We would like to give particular thanks to Springer Nature and the Springer LNICST and EAI teams for their support of our work and for their tireless efforts in managing the publication of this volume. Finally, we would like to express our gratitude to everyone who helped us prepare for MIET 2022 and contributed to it in any way.

November 2022

Md. Shahriare Satu
Mohammad Ali Moni
M. Shamim Kaiser
Mohammad Shamsul Arefin

Organization

International Advisory Committee

Ali Dewan, Athabasca University, Canada
Amir Hussain, Edinburgh Napier University, UK
Anirban Bandyopadhyay, National Institute for Materials Science, Japan
Anton Nijholt, University of Twente, The Netherlands
Chanchal K. Roy, University of Saskatchewan, Canada
David Brown, Nottingham Trent University, UK
Enamul Hoque Prince, York University, Canada
Jarrod Trevathan, Griffith University, Australia
Joarder Kamruzzaman, Federation University, Australia
Kanad Ray, Amity University, India
Karl Andersson, Luleå University of Technology, Sweden
Kenichi Matsumoto, Nara Institute of Science and Technology, Japan
M. Julius Hossain, European Molecular Biology Laboratory, Germany
Md. Atiqur Rahman Ahad, University of East London, UK
Mohammad Tariqul Islam, Universiti Kebangsaan Malaysia, Malaysia
Manzur Murshed, Federation University, Australia
Mukesh Prasad, University of Technology Sydney, Australia
Ning Zhong, Maebashi Institute of Technology, Japan
Pranab K. Muhuri, South Asian University, India
Saidur Rahman, Sunway University, Malaysia
Saifur Rahman, Virginia Tech Advanced Research Institute, USA
Shariful Islam, Deakin University, Australia
Stefano Vassanelli, University of Padova, Italy
Suresh Chandra Satapathy, KIIT Deemed to be University, India
Syed Ishtiaque Ahmed, University of Toronto, Canada
Upal Mahbub, Qualcomm Inc., USA
V. R. Singh, National Physical Laboratory, India
Yang Yang, Maebashi Institute of Technology, Japan
Yoshinori Kuno, Saitama University, Japan
Zamshed Chowdhury, Intel Corporation, USA

National Advisory Committee

A. B. M. Siddique Hossain, AIUB, Bangladesh
A. Z. M. Touhidul Islam, RU, Bangladesh
Abu Sayed Md. Latiful Hoque, BUET, Bangladesh
Dibyadyuti Sarkar, NSTU, Bangladesh
Hafiz Md. Hasan Babu, DU, Bangladesh
Kaushik Deb, CUET, Bangladesh
Firoz Ahmed, NSTU, Bangladesh
Kazi Muheymin-Us-Sakib, DU, Bangladesh
M. Lutfar Rahman, DIU, Bangladesh
M. M. A. Hashem, KUET, Bangladesh
M. Rizwan Khan, UIU, Bangladesh
M. Sohel Rahman, BUET, Bangladesh
Md. Liakot Ali, BUET, Bangladesh
Md. Mahbubur Rahman, MIST, Bangladesh
Md. Sazzad Hossain, UGC, Bangladesh
Mohammod Abul Kashem, DUET, Bangladesh
Mohammad Hanif, NSTU, Bangladesh
Mohammad Kaykobad, BRACU, Bangladesh
Mohammad Mahfuzul Islam, BUET, Bangladesh
Mohammad Salim Hossain, NSTU, Bangladesh
Mohammad Shahidur Rahman, SUST, Bangladesh
Mohammad Shorif Uddin, JU, Bangladesh
Mohd Abdur Rashid, NSTU, Bangladesh
Mozammel Huq Azad Khan, EWU, Bangladesh
Muhammad Quamruzzaman, CUET, Bangladesh
Munaz Ahmed Noor, BDU, Bangladesh
Nasim Akhtar, CSTU, Bangladesh
Newaz Mohammed Bahadur, NSTU, Bangladesh
S. I. Khan, BRACU, Bangladesh
Subrata Kumar Aditya, SHU, Bangladesh
Suraiya Pervin, DU, Bangladesh
Syed Akhter Hossain, CUB, Bangladesh

Organizing Committee

General Chairs
Pietro Lió, University of Cambridge, UK
A. B. M. Shawkat Ali, CQUniversity, Australia
Nazmul Siddique, University of Ulster, UK

General Co-chairs
Ashadun Nobi, NSTU, Bangladesh
Mohammad Ali Moni, University of Queensland, Australia
Mufti Mahmud, Nottingham Trent University, UK

TPC Chairs
Ashikur Rahman Khan, NSTU, Bangladesh
M. Shamim Kaiser, JU, Bangladesh
Mohammad Shamsul Arefin, DIU, Bangladesh

TPC Co-chairs
Md. Amran Hossain Bhuiyan, NSTU, Bangladesh
Md. Zahidul Islam, IU, Bangladesh

Track Chairs
Fateha Khanam Bappee, NSTU, Bangladesh
M. Shamim Kaiser, JU, Bangladesh
Nazia Majadi, NSTU, Bangladesh
Rashed Mustafa, CU, Bangladesh
Mohammad Jamshed Patwary, IIUC, Bangladesh
Md. Atiqur Rahman Ahad, University of East London, UK
Yusuf Sulistyo Nugroho, Universitas Muhammadiyah Surakarta, Indonesia
Mohammad Ali Moni, University of Queensland, Australia
Md. Amran Hossain Bhuiyan, NSTU, Bangladesh
Firoz Mridha, AIUB, Bangladesh
Ashik Iftekher, Nikon Corporation, Japan
Mohammad Shamsul Arefin, DIU, Bangladesh
Syful Islam, NSTU, Bangladesh

General Secretary
S. M. Mahabubur Rahman, NSTU, Bangladesh

Joint Secretaries
Md. Auhidur Rahman, NSTU, Bangladesh
Md. Iftekharul Alam Efat, NSTU, Bangladesh
Mimma Tabassum, NSTU, Bangladesh

Technical Secretary
Koushik Chandra Howlader, North Dakota State University, USA

Organizing Chair
Md. Shahriare Satu, NSTU, Bangladesh

Organizing Co-chairs
A. Q. M. Salauddin Pathan, NSTU, Bangladesh
Md. Abidur Rahman, NSTU, Bangladesh

Finance Subcommittee
Main Uddin, NSTU, Bangladesh
Md. Javed Hossain, NSTU, Bangladesh
Md. Omar Faruk, NSTU, Bangladesh
Md. Shahriare Satu, NSTU, Bangladesh

Keynote Selection Subcommittee
A. R. M. Mahamudul Hasan Rana, NSTU, Bangladesh
Nazia Majadi, NSTU, Bangladesh

Special Session Chairs
Md. Kamal Uddin, NSTU, Bangladesh
Zyed-Us-Salehin, NSTU, Bangladesh

Tutorials Chairs
Dipanita Saha, NSTU, Bangladesh
Sultana Jahan Soheli, NSTU, Bangladesh

Panels Chair
Falguny Roy, NSTU, Bangladesh

Workshops Chair
Nishu Nath, NSTU, Bangladesh

Proceeding Publication Committee
Md. Shahriare Satu, NSTU, Bangladesh
Mohammad Ali Moni, University of Queensland, Australia
M. Shamim Kaiser, JU, Bangladesh
Mohammad Shamsul Arefin, DIU, Bangladesh

Industrial Session Chairs
Apurba Adhikhary, NSTU, Bangladesh
Tanvir Zaman Khan, NSTU, Bangladesh

Project and Exhibition Chairs
Dipok Chandra Das, NSTU, Bangladesh
Rutnadip Kuri, NSTU, Bangladesh

Publicity Chair
Muhammad Abdus Salam, NSTU, Bangladesh

Kit and Registration Chairs
K. M. Aslam Uddin, NSTU, Bangladesh
Md. Bipul Hossain, NSTU, Bangladesh

Venue Preparation Subcommittee
Md. Habibur Rahman, NSTU, Bangladesh
Md. Shohel Rana, NSTU, Bangladesh
Md. Hasnat Riaz, NSTU, Bangladesh

Accommodation and Food Management Subcommittee
Kamruzaman, NSTU, Bangladesh
Md. Mamun Mia, NSTU, Bangladesh
Subrata Bhowmik, NSTU, Bangladesh

Public Relation Chairs
Md. Al-Amin, NSTU, Bangladesh
Tasniya Ahmed, NSTU, Bangladesh

Award Chair
Md. Abul Kalam Azad, NSTU, Bangladesh

International Guest Management Subcommittee
Iftakhar Parvez, NSTU, Bangladesh
Tonmoy Dey, NSTU, Bangladesh
Trisha Saha, NSTU, Bangladesh

Web Masters
Md. Jane Alam Adnan, NSTU, Bangladesh
Rahat Uddin Azad, NSTU, Bangladesh

Graphics Designers
Mohit Sarkar, NSTU, Bangladesh
Shamsun Nahar Needhe, Hezhou University, China

Technical Program Committee

A. K. M. Mahbubur Rahman, IUB, Bangladesh
A. S. M. Sanwar Hosen, JNU, South Korea
A. A. Mamun, JU, Bangladesh
A. F. M. Rashidul Hasan, RU, Bangladesh
Abdul Kader Muhammad Masum, IIUC, Bangladesh
Abdul Kaium Masud, NSTU, Bangladesh
Abdullah Nahid, KU, Bangladesh
Abdur Rahman Bin Shahid, Concord University, USA
Abdur Rouf, DUET, Bangladesh
A. B. M. Aowlad Hossain, KUET, Bangladesh
Adnan Anwar, Deakin University, Australia
Ahmed Imteaj, Florida International University, USA
Ahmed Wasif Reza, EWU, Bangladesh
Ahsanur Rahman, NSU, Bangladesh
Alessandra Pedrocchi, Politecnico di Milano, Italy
Alex Ng, La Trobe University, Australia
Anindya Das Antar, University of Michigan, USA
Anirban Bandyopadhyay, NIMS, Japan
Antesar Shabut, Leeds Trinity University, UK
Antony Lam, Mercari Inc., Japan
Anup Majumder, JU, Bangladesh
Anupam Kumar Bairagi, KU, Bangladesh
Arif Ahmad, SUST, Bangladesh
Asif Nashiry, JUST, Bangladesh
A. S. M. Kayes, La Trobe University, Australia
Atik Mahabub, Concordia University, Canada
Aye Su Phyo, Computer University Kalay, Myanmar
Azizur Rahman, City University of London, UK
Babul Islam, RU, Bangladesh
Banani Roy, University of Saskatchewan, Canada
Belayat Hossain, Loughborough University, UK
Boshir Ahmed, RUET, Bangladesh
Chandan Kumar Karmakar, Deakin University, Australia
Cosimo Ieracitano, University Mediterranea of Reggio Calabria, Italy
Cris Calude, University of Auckland, New Zealand
Derong Liu, University of Illinois at Chicago, USA
Dewan Md. Farid, UIU, Bangladesh
Dipankar Das, RU, Bangladesh
Duong Minh Quan, University of Da Nang, Vietnam
Eleni Vasilaki, University of Sheffield, UK
Emanuele Ogliari, Politechnico di Milano, Italy
Enamul Hoque Prince, York University, Canada
Ezharul Islam, JU, Bangladesh
Farah Deeba, DUET, Bangladesh
Fateha Khanam Bappee, NSTU, Bangladesh
Francesco Carlo Morabito, Mediterranean University of Reggio Calabria, Italy
Gabriela Nicoleta Sava, University POLITEHNICA of Bucharest, Romania
Giancarlo Ferregno, Politechnico di Milano, Italy
Golam Dastoger Bashar, Boise State University, USA
H. Liu, Wayne State University, USA
Habibur Rahman, IU, Bangladesh
Hishato Fukuda, Saitama University, Japan
Imtiaz Mahmud, Kyungpook National University, South Korea
Indika Kumara, Jheronimus Academy of Data Science, The Netherlands
Iqbal Hasan Sarkar, CUET, Bangladesh
Joarder Kamruzzaman, Federation University, Australia
John H. L. Hansen, University of Texas at Dallas, USA
Jonathan Mappelli, University of Modena, Italy
Joyprokash Chakrabartty, CUET, Bangladesh
Kamruddin Md. Nur, AIUB, Bangladesh
Kamrul Hasan Talukder, KU, Bangladesh
Kawsar Ahmed, University of Saskatchewan, Canada
K. C. Santosh, University of South Dakota, USA
Khan Iftekharuddin, Old Dominion University, USA
Khondaker Abdullah-Al-Mamun, UIU, Bangladesh
Khoo Bee Ee, Universiti Sains Malaysia, Malaysia
Lamia Iftekhar, NSU, Bangladesh
Linta Islam, Jagannath University, Bangladesh
Lu Cao, Saitama University, Japan
Luca Benini, ETH, Switzerland
Luca Berdondini, IIT, Italy
Luciano Gamberini, University of Padova, Italy
M. Tanseer Ali, AIUB, Bangladesh
M. Firoz Mridha, AIUB, Bangladesh
M. Julius Hossain, EMBL, Germany
M. M. Azizur Rahman, Grand Valley State University, USA
M. Tariqul Islam, Universiti Kebangsaan, Malaysia
Mahfuzul Hoq Chowdhury, CUET, Bangladesh
Mahmudul Kabir, Akita University, Japan
Manjunath Aradhya, JSS S&T University, India
Manohar Das, Oakland University, USA
Marzia Hoque Tania, University of Oxford, UK
Md. Badrul Alam Miah, UPM, Malaysia
Md. Faruk Hossain, RUET, Bangladesh
Md. Fazlul Kader, CU, Bangladesh
Md. Manirul Islam, AIUB, Bangladesh
Md. Saiful Islam, CUET, Bangladesh
Md. Sanaul Haque, University of Oulu, Finland
Md. Shirajum Munir, Kyung Hee University, South Korea
Md. Whaiduzzaman, Queensland University of Technology, Australia
Md. Abdul Awal, KU, Bangladesh
Md. Abdur Razzak, IUB, Bangladesh
Md. Abu Layek, Jagannath University, Bangladesh
Md. Ahsan Habib, MBSTU, Bangladesh
Md. Al Mamun, RUET, Bangladesh
Md. Amzad Hossain, NSTU, Bangladesh
Md. Golam Rashed, RU, Bangladesh
Md. Hanif Seddiqui, CU, Bangladesh
Md. Hasanul Kabir, IUT, Bangladesh
Md. Hasanuzzaman, DU, Bangladesh
Md. Kamal Uddin, NSTU, Bangladesh
Md. Mahfuzur Rahman, Queensland University of Technology, Australia
Md. Murad Hossain, University of Turin, Italy
Md. Nurul Islam Khan, BUET, Bangladesh
Md. Obaidur Rahman, DUET, Bangladesh
Md. Raju Ahmed, DUET, Bangladesh
Md. Rakibul Hoque, DU, Bangladesh
Md. Saiful Islam, Griffith University, Australia
Md. Sanaul Rabbi, CUET, Bangladesh
Md. Shamim Ahsan, KU, Bangladesh
Md. Shamim Akhter, SUB, Bangladesh
Md. Sipon Miah, IU, Bangladesh
Md. Ziaul Haque, NSTU, Bangladesh
Mehdi Hasan Chowdhury, City University of Hong Kong, China
Michele Magno, ETH, Switzerland
Milon Biswas, University of Alabama at Birmingham, USA
Min Jiang, Xiamen University, China
Mohammad Abu Yousuf, JU, Bangladesh
Mohammad Hammoudeh, Manchester Metropolitan University, UK
Mohammad Mehedi Hassan, King Saud University, KSA
Mohammad Motiur Rahman, MBSTU, Bangladesh
Mohammad Nurul Huda, UIU, Bangladesh
Mohammad Osiur Rahman, CU, Bangladesh
Mohammad Zoynul Abedin, Teesside University, UK
Mohiuddin Ahmed, Edith Cowan University, Australia
Monirul Islam Sharif, Google, USA
Monjurul Islam, Canberra Institute of Technology, Australia
Muhammad Mahbub Alam, IUT, Bangladesh
Muhammed J. Alam Patwary, IIUC, Bangladesh
Nabeel Mohammed, NSU, Bangladesh
Nahida Akter, UNSW, Australia
Nashid Alam, Aberystwyth University, UK
Nasfikur Rahman Khan, University of South Alabama, USA
Nelishia Pillay, University of Pretoria, South Africa
Nihad Karim Chowdhury, CU, Bangladesh
Nilanjan Dey, JIS University, India
Noushath Shaffi, College of Applied Sciences, Oman
Nur Mohammad, CUET, Bangladesh
Nursadul Mamun, University of Texas at Dallas, USA
Omaru Maruatona, Aiculus Pty Ltd, Australia
Omprakash Kaiwartya, Nottingham Trent University, UK
Osman Ali, NSTU, Bangladesh
Paolo Massobrio, University of Genova, Italy
Partha Chakraborty, Cumilla University, Bangladesh
Paul Watters, La Trobe University, Australia
Phalguni Gupta, IIT Kanpur, India
Pranab Kumar Dhar, CUET, Bangladesh
Rahma Mukta, UNSW, Australia
Ralf Zeitler, Venneos GmbH, Germany
Ramani Kannan, Universiti Teknologi PETRONAS, Malaysia
Rameswar Debnath, KU, Bangladesh
Rashed Mustafa, CU, Bangladesh
Risala Tasin Khan, JU, Bangladesh
Rokan Uddin Faruqui, CU, Bangladesh
Roland Thewes, Technical University of Berlin, Germany
Ryote Suzuki, Saitama University, Japan
S. M. Rafizul Haque, Canadian Food Inspection Agency, Canada
S. M. Riazul Islam, Sejong University, South Korea
S. M. Abdur Razzak, RUET, Bangladesh
Saiful Azad, Universiti Malaysia Pahang, Malaysia
Saifur Rahman, University of North Dakota, USA
Sajal Halder, RMIT, Australia
Sajib Chakraborty, Vrije Universiteit Brussel, Belgium
Sajjad Waheed, MBSTU, Bangladesh
Samrat Kumar Dey, BOU, Bangladesh
Sayed Asaduzzaman, University of North Dakota, USA
Sayed Mohsin Reza, University of Texas at El Paso, USA
Sazzadur Rahman, JU, Bangladesh
Shafkat Kibria, SUST, Bangladesh
Shahidul Islam Khan, IIUC, Bangladesh
Shahriar Badsha, University of Nevada, USA
Shamim A. Mamun, JU, Bangladesh
Shanto Roy, University of Houston, USA
Sharmin Majumder, Texas A&M University, USA
Silvestro Micera, Scuola Superiore Sant'Anna, Italy
Surapong Uttama, Mae Fah Luang University, Thailand
Syed Md. Galib, JUST, Bangladesh
Syful Islam, NSTU, Bangladesh
Tabin Hassan, AIUB, Bangladesh
Tamal Adhikary, University of Waterloo, Canada
Tarique Anwar, Macquarie University, Australia
Tauhidul Alam, LSU Shreveport University, USA
Tawfik Al-Hadhrami, Nottingham Trent University, UK
Themis Prodomakis, University of Southampton, UK
Thompson Stephan, M. S. Ramaiah University of Applied Sciences, India
Tianhua Chen, University of Huddersfield, UK
Tingwen Huang, Texas A&M University, Qatar
Tomonori Hashiyama, University of Electro-Communications, Japan
Touhid Bhuiyan, DIU, Bangladesh
Tushar Kanti Saha, JKKNIU, Bangladesh
Wladyslaw Homenda, Warsaw University of Technology, Poland
Wolfgang Maas, Technische Universität Graz, Austria
Yasin Kabir, Missouri University of Science and Technology, USA
Yusuf Sulistyo Nugroho, Universitas Muhammadiyah Surakarta, Indonesia
Zubair Fadlullah, Lakehead University, Canada

Contents – Part I

Imaging for Disease Detection

Potato-Net: Classifying Potato Leaf Diseases Using Transfer Learning Approach . . . . 3
Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Aka Das, Md. Ashikur Rahman, and Masud Rabbani

False Smut Disease Detection in Paddy Using Convolutional Neural Network . . . . 15
Nahid Hasan, Tanzila Hasan, Shahadat Hossain, and Md. Manzurul Hasan

Gabor Wavelet Based Fused Texture Features for Identification of Mungbean Leaf Diseases . . . . 22
Sarna Majumder, Badhan Mazumder, and S. M. Taohidul Islam

Potato Disease Detection Using Convolutional Neural Network: A Web Based Solution . . . . 35
Jannathul Maowa Hasi and Mohammad Osiur Rahman

Device-Friendly Guava Fruit and Leaf Disease Detection Using Deep Learning . . . . 49
Rabindra Nath Nandi, Aminul Haque Palash, Nazmul Siddique, and Mohammed Golam Zilani

Cassava Leaf Disease Classification Using Supervised Contrastive Learning . . . . 60
Adit Ishraq, Sayefa Arafah, Sadiya Akter Mim, Nusrat Jahan Shammey, Firoz Mridha, and Md. Saifur Rahman

Diabetes Mellitus Prediction Using Transfer Learning . . . . 72
Md Ifraham Iqbal, Ahmed Shabab Noor, and Ahmed Rafi Hasan

An Improved Heart Disease Prediction Using Stacked Ensemble Method . . . . 84
Md. Maidul Islam, Tanzina Nasrin Tania, Sharmin Akter, and Kazi Hassan Shakib

Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm . . . . 98
Nusrat Alam, Samiul Alam, Farzana Tasnim, and Sanjida Sharmin


PreCKD_ML: Machine Learning Based Development of Prediction Model for Chronic Kidney Disease and Identify Significant Risk Factors . . . . 109
Md. Rajib Mia, Md. Ashikur Rahman, Md. Mamun Ali, Kawsar Ahmed, Francis M. Bui, and S M Hasan Mahmud

A Reliable and Efficient Transfer Learning Approach for Identifying COVID-19 Pneumonia from Chest X-ray . . . . 122
Sharmeen Jahan Seema and Mosabber Uddin Ahmed

Infection Segmentation from COVID-19 Chest CT Scans with Dilated CBAM U-Net . . . . 137
Tareque Bashar Ovi, Md. Jawad-Ul Kabir Chowdhury, Shaira Senjuti Oyshee, and Mubdiul Islam Rizu

Convolutional Neural Network Model to Detect COVID-19 Patients Utilizing Chest X-Ray Images . . . . 152
Md. Shahriare Satu, Khair Ahammed, Mohammad Zoynul Abedin, Md. Auhidur Rahman, Sheikh Mohammed Shariful Islam, A. K. M. Azad, Salem A. Alyami, and Mohammad Ali Moni

Classification of Tumor Cell Using a Naive Convolutional Neural Network Model . . . . 167
Debashis Gupta, Syed Rahat Hassan, Renu Gupta, Urmi Saha, and Mohammed Sowket Ali

Tumor-TL: A Transfer Learning Approach for Classifying Brain Tumors from MRI Images . . . . 177
Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Sabina Yesmin, and Md. Jueal Mia

Deep Convolutional Comparison Architecture for Breast Cancer Binary Classification . . . . 187
Nasim Ahmed Roni, Md. Shazzad Hossain, Musarrat Bintay Hossain, Md. Iftekharul Alam Efat, and Mohammad Abu Yousuf

Lung Cancer Detection from Histopathological Images Using Deep Learning . . . . 201
Rahul Deb Mohalder, Khandkar Asif Hossain, Juliet Polok Sarkar, Laboni Paul, M. Raihan, and Kamrul Hasan Talukder

Brain Tumor Detection Using Deep Network EfficientNet-B0 . . . . 213
Mosaddeq Hossain and Md. Abdur Rahman


Cancer Diseases Diagnosis Using Deep Transfer Learning Architectures . . . . 226
Tania Ferdousey Promy, Nadia Islam Joya, Tasfia Haque Turna, Zinia Nawrin Sukhi, Faisal Bin Ashraf, and Jia Uddin

Transfer Learning Based Skin Cancer Classification Using GoogLeNet . . . . 238
Sourav Barman, Md Raju Biswas, Sultana Marjan, Nazmun Nahar, Mohammad Shahadat Hossain, and Karl Andersson

Assessing the Risks of COVID-19 on the Health Conditions of Alzheimer's Patients Using Machine Learning Techniques . . . . 253
Prosenjit Karmaker and Muhammad Sajjadur Rahim

MRI Based Automated Detection of Brain Tumor Using DWT, GLCM, PCA, Ensemble of SVM and PNN in Sequence . . . . 267
Md. Sakib Ahmed, Sajib Hossain, Md. Nazmul Haque, M. M. Mahbubul Syeed, D. M. Saaduzzaman, Md. Hasan Maruf, and A. S. M. Shihavuddin

Pattern Recognition and Natural Language Processing

Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets . . . . 283
Ferdib-Al-Islam, Nusrat Jahan, Farjana Yeasmin Rupa, Suprio Sarkar, Sifat Hossain, and Sk. Shalauddin Kabir

Fake Profile Detection Using Image Processing and Machine Learning . . . . 294
Shuva Sen, Mohammad Intisarul Islam, Samiha Sofrana Azim, and Muhammad Iqbal Hossain

A Novel Texture Descriptor Evaluation Window Based Adjacent Distance Local Binary Pattern (EADLBP) for Image Classification . . . . 309
Most. Maria Akter Misti, Sajal Mondal, Md Anwarul Islam Abir, and Md Zahidul Islam

Bornomala: A Deep Learning-Based Bangla Image Captioning Technique . . . . 318
Jannatul Naim, Md. Bipul Hossain, and Apurba Adhikary

Traffic Sign Detection and Recognition Using Deep Learning Approach . . . . 331
Umma Saima Rahman and Maruf

A Novel Bangla Spoken Numerals Recognition System Using Convolutional Neural Network . . . . 344
Ovishake Sen, Pias Roy, and Al-Mahmud


Bangla Speech-Based Person Identification Using LSTM Networks . . . . 358
Rahad Khan, Saddam Hossain, Akbor Hossain, Fazlul Hasan Siddiqui, and Sabah Binte Noor

VADER vs. BERT: A Comparative Performance Analysis for Sentiment on Coronavirus Outbreak . . . . 371
Subrata Saha, Md. Imran Hossain Showrov, Md. Motinur Rahman, and Md. Ziaul Hasan Majumder

Aspect Based Sentiment Analysis of COVID-19 Tweets Using Blending Ensemble of Deep Learning Models . . . . 386
Khandaker Tayef Shahriar, Md Musfique Anwar, and Iqbal H. Sarker

Covid-19 Vaccine Sentiment Detection and Analysis Using Machine Learning Technique and NLP . . . . 401
Abdullah Al Maruf, Md. Nur Hossain Biplob, and Fahima Khanam

Sentiment Analysis of Tweets on Covid Vaccine (Pfizer): A Boosting-Based Machine Learning Solution . . . . 415
Promila Haque, Rahatul Jannat Fariha, Israt Yousuf Nishat, and Mohammed Nazim Uddin

Matching Job Circular with Resume Using Different Natural Language Processing Based Algorithms . . . . 428
S. M. Shawal Chowdhury, Mrithika Chowdhury, and Arifa Sultana

Transformer-Based Text Clustering for Newspaper Articles . . . . 443
Sumona Yeasmin, Nazia Afrin, and Mohammad Rezwanul Huq

Bangla to English Translation Using Sequence to Sequence Learning Model Based Recurrent Neural Networks . . . . 458
Rafiqul Islam, Mehedi Hasan, Mamunur Rashid, and Rabea Khatun

Bangla Spelling Error Detection and Correction Using N-Gram Model . . . . 468
Promita Bagchi, Mursalin Arafin, Aysha Akther, and Kazi Masudul Alam

Bidirectional Long-Short Term Memory with Byte Pair Encoding and Back Translation for Bangla-English Machine Translation . . . . 483
Md. Tasnin Tanvir, Asfia Moon Oishy, M. A. H. Akhand, and Nazmul Siddique

Face Recognition-Based Mass Attendance Using YOLOv5 and ArcFace . . . . 496
Omar Faruque, Fazlul Hasan Siddiqui, and Sabah Binte Noor


A Hybrid Watermarking Technique Based on LH-HL Subbands of DWT and SVD . . . . 511
Fauzia Yasmeen, Mahbuba Begum, and Mohammad Shorif Uddin

A Smartphone Based Real-Time Object Recognition System for Visually Impaired People . . . . 524
Md. Atikur Rahman, Kazi Md. Rokibul Alam, and Muhammad Sheikh Sadi

Bangla Speech Emotion Recognition Using 3D CNN Bi-LSTM Model . . . . 539
Md. Riadul Islam, M. A. H. Akhand, and Md Abdus Samad Kamal

An RNN Based Approach to Predict Next Word in Bangla Language . . . . 551
Asif Mahmud, Md. Nazmul Hasan Rony, Deba Dip Bhowmik, Ratnadip Kuri, and A. R. M. Mahmudul Hasan Rana

Author Index . . . . 567

Contents – Part II

Bio Signals and Recommendation Systems for Wellbeing

Diagnosis and Classification of Fetal Health Based on CTG Data Using Machine Learning Techniques . . . . 3
Md. Monirul Islam, Md. Rokunojjaman, Al Amin, Md. Nasim Akhtar, and Iqbal H. Sarker

Epileptic Seizure Prediction Using Bandpass Filtering and Convolutional Neural Network . . . . 17
Nabiha Mustaqeem, Tasnia Rahman, Jannatul Ferdous Binta Kalam Priyo, Mohammad Zavid Parvez, and Tanvir Ahmed

Autism Spectrum Disorder Detection from EEG Through Hjorth Parameters and Classification Using Neural Network . . . . 31
Zahrul Jannat Peya, Bipasha Zaman, M. A. H. Akhand, and Nazmul Siddique

A Review on Heart Diseases Prediction Using Artificial Intelligence . . . . 41
Rehnuma Hasnat, Abdullah Al Mamun, Ahmmad Musha, and Anik Tahabilder

Machine Learning Models to Identify Discriminatory Factors of Diabetes Subtypes . . . . 55
Shahriar Hassan, Tania Akter, Farzana Tasnim, and Md. Karam Newaz

Analysis of Hand Movement from Surface EMG Signals Using Artificial Neural Network . . . . 68
S. A. Ahsan Rajon, Mahmudul Hasan Abid, Niloy Sikder, Kamrul Hasan Talukder, Md. Mizanur Rahman, Md. Shamim Ahsan, Abu Shamim Mohammad Arif, and Abdullah-Al Nahid

Design and Implementation of a Drowsiness Detection System Up to Extended Head Angle Using FaceMesh Machine Learning Solution . . . . 79
Jafirul Islam Jewel, Md. Mahabub Hossain, and Md. Dulal Haque

Fuzziness Based Semi-supervised Deep Learning for Multimodal Image Classification . . . . 91
Abeda Asma, Dilshad Noor Mostafa, Koli Akter, Mufti Mahmud, and Muhammed J. A. Patwary


Human Emotion Recognition from Facial Images Using Convolutional Neural Network . . . . 106
Saima Sultana, Rashed Mustafa, and Mohammad Sanaullah Chowdhury

Emotion Recognition from Brain Wave Using Multitask Machine Learning Leveraging Residual Connections . . . . 121
Rumman Ahmed Prodhan, Sumya Akter, Muhammad Bin Mujib, Md. Akhtaruzzaman Adnan, and Tanmoy Sarkar Pias

Emotion Recognition from EEG Using Mutual Information Based Feature Map and CNN . . . . 137
Mahfuza Akter Maria, A. B. M. Aowlad Hossain, and M. A. H. Akhand

A Machine Learning-Based System to Recommend Appropriate Military Training Program for a Soldier . . . . 151
Md Tauhidur Rahman, Raquib Hasan Dewan, Md Abdur Razzak, Sumaiya Nuha Mustafina, and Muhammad Nazrul Islam

Integrated Music Recommendation System Using Collaborative and Content Based Filtering, and Sentiment Analysis . . . . 162
Arafat Bin Hossain, Wordh Ul Hasan, Kimia Tuz Zaman, and Koushik Howlader

A Clustering Based Niching Method for Effectively Solving the 0-1 Knapsack Problem . . . . 173
Md. Meheruzzaman Sarker, Md. Jakirul Islam, and Md. Zakir Hossain

Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset Based on Active Learning . . . . 188
Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, and Hasan Mahmud

The Impact of Data Locality on the Performance of Cluster-Based Under-Sampling . . . . 204
Ahmed Shabab Noor, Muhib Al Hasan, Ahmed Rafi Hasan, Rezab Ud Dawla, Afsana Airin, Akib Zaman, and Dewan Md. Farid

An Analysis of Islamic Inheritance System Under Object-Oriented Paradigm . . . . 216
A. H. M. Sajedul Hoque, Sadia Tabassum, Rashed Mustafa, Mohammad Sanaullah Chowdhury, and Mohammad Osiur Rahman

Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion? . . . . 226
Nibir Chandra Mandal, Tashreef Muhammad, and G. M. Shahariar


An Empirical Study on How the Developers Discussed About Pandas Topics . . . . 242
Sajib Kumar Saha Joy, Farzad Ahmed, Al Hasib Mahamud, and Nibir Chandra Mandal

BSDRM: A Machine Learning Based Bug Triaging Model to Recommend Developer Team . . . . 256
K. M. Aslam Uddin, Md. Kowsher, and Kazi Sakib

A Belief Rule Based Expert System to Diagnose Schizophrenia Using Whole Blood DNA Methylation Data . . . . 271
Mohammad Shahadat Hossain, Mumtahina Ahmed, S. M. Shafkat Raihan, Angel Sharma, Raihan Ul Islam, and Karl Andersson

Network, Security and Nanotechnology

Reactive and Proactive Routing Protocols Performance Evaluation for MANETS Using OPNET Modeler Simulation Tools . . . . 285
Mala Rani Barman, Dulal Chakraborty, and Jugal Krishna Das

A Novel MIMO Antenna for 6G Applications . . . . 294
Umor Fasal, Md. Kamrul Hasan, Ayubali, Md. Emdadul Hoque Bhuiyan, Abu Zafar Md. Imran, Md. Razu Ahmed, and Ferose Khan

Modification of Link Speed Estimation Model for IEEE 802.11ac WLANs by Considering Shadowing Effect . . . . 303
Mohammed Aman Ullah Aman and Sumon Kumar Debnath

Electromagnetic Absorption Analysis of 5G Wireless Devices for Different Electromagnetic Shielding Techniques . . . . 317
Abdullah Al Imtiaz, Md. Saifur Rahman, Tanveer Ahsan, Mohammed Shamsul Alam, Abdul Kader Mohammad Masum, and Touhidul Alam

ToothHack: An Investigation on a Bluetooth Dongle to Implement a Low-Cost and Dynamic Wireless Control-Signal Transmission System . . . . 325
Md. S. Shantonu, Imran Chowdhury, Taslim Ahmed, Al Imtiaz, and Md. Rokonuzzaman

Robustness of Eigenvalue-Spread Based Rule of Combination in Dynamic Networked System with Link Failures . . . . 339
Miss. Nargis Parvin, Md. Saifur Rahman, Md. Tofael Ahmed, and Maqsudur Rahman


Blockchain Based Services in Education: A Bibliometric Analysis . . . 348
Md. Shariar Hossain and A. K. M. Bahalul Haque

An Approach Towards Minimizing Covid-19 Situation Using Android App and Drone-Based Technology . . . 363
Robi Paul, Junayed Bin Nazir, and Arif Ahammad

IoT and ML Based Approach for Highway Monitoring and Streetlamp Controlling . . . 376
Mushfiqur Rahman, Md. Faridul Islam Suny, Jerin Tasnim, Md. Sabab Zulfiker, Mohammad Jahangir Alam, and Tajim Md. Niamat Ullah Akhund

Cyber-Attack Detection Through Ensemble-Based Machine Learning Classifier . . . 386
Mohammad Amaz Uddin, Khandaker Tayef Shahriar, Md. Mokammel Haque, and Iqbal H. Sarker

A Stacked Ensemble Spyware Detection Model Using Hyper-Parameter Tuned Tree Based Classifiers . . . 397
Nowshin Tasnim, Md. Musfique Anwar, and Iqbal H. Sarker

IoT Based Framework for Remote Patient Monitoring . . . 409
Ifti Akib Abir, Sazzad Hossain Rafi, and Mosabber Uddin Ahmed

Block-chain Aided Cluster Based Logistic Network for Food Supply Chain . . . 422
Rahat Uddin Azad, Khair Ahammed, Muhammad Abdus Salam, and Md. Ifthekarul Alam Efat

Programmable Logic Array in Quantum Computing . . . 435
Fatema Akter, Tamanna Tabassum, and Mohammed Nasir Uddin

QPROM: Quantum Nanotechnology for Data Storage Using Programmable Read Only Memory . . . 447
Tamanna Tabassum, Fatema Akter, and Mohammed Nasir Uddin

Analytical Modeling of Multi-junction Solar Cell Using SiSn Alloy . . . 460
Tanber Hasan Shemanto and Lubaba Binte Billah

Design and Fabrication of a Low-Cost Customizable Modern CNC Laser Cutter . . . 472
Radif Uddin Ahmed, Mst. Nusrat Yasmin, Avishek Das, and Syed Masrur Ahmmad


Hole Transport Layer Free Non-toxic Perovskite Solar Cell Using ZnSe Electron Transport Material . . . 486
Rukon Uddin, Subrata Bhowmik, Md. Eyakub Ali, and Sayem Ul Alam

A Novel ADI Based Method for Model Reduction of Discrete-Time Index 2 Control Systems . . . 499
Mohammad-Sahadet Hossain, Atia Afroz, Oshin Mumtaha, and Musannan Hossain

Emerging Technologies for Society and Industry

Prevalence of Stroke in Rural Bangladesh: A Population Based Study . . . 515
Md. Mashiar Rahman, Rony Chowdhury Ripan, Farhana Sarker, Moinul H. Chowdhury, A. K. M. Nazmul Islam, and Khondaker A. Mamun

Segmented-Truncated-SVD for Effective Feature Extraction in Hyperspectral Image Classification . . . 524
Md. Moshiur Rahman, Shabbir Ahmed, Md. Shahriar Haque, Md. Abu Marjan, Masud Ibn Afjal, and Md. Palash Uddin

Effective Feature Extraction via Folded-Sparse-PCA for Hyperspectral Image Classification . . . 538
Md. Hasanul Bari, Tanver Ahmed, Masud Ibn Afjal, Adiba Mahjabin Nitu, Md. Palash Uddin, and Md. Abu Marjan

Segmented-Incremental-PCA for Hyperspectral Image Classification . . . 550
Shabbir Ahmed, Md. Moshiur Rahman, Md. Shahriar Haque, Md. Abu Marjan, Md. Palash Uddin, and Masud Ibn Afjal

Spectral–Spatial Feature Reduction for Hyperspectral Image Classification . . . 564
Md. Touhid Islam, Mohadeb Kumar, and Md. Rashedul Islam

Predicting the Risk of COVID-19 Infection Using Lifestyle Data . . . 578
Nafiz Fuad Siam, Mahira Tabassum Khan, M. R. Rownak, Md. Rejaben Jamin Juel, and Ashraf Uddin

Forecasting Dengue Incidence in Bangladesh Using Seasonal ARIMA Model, a Time Series Analysis . . . 589
Nur Mohammed and Md. Zahidur Rahman

The Impact of Social and Economic Indicators on Infant Mortality Rate in Bangladesh: A Vector Error Correction Model (VECM) Approach . . . 599
Muhmmad Mohsinul Hoque and Md. Shohel Rana


Machine Learning Approaches to Predict Movie Success . . . 613
Md. Afzazul Hoque and Md. Mohsin Khan

Structure of Global Financial Networks Before and During COVID-19 Based on Mutual Information . . . 628
Sheikh Shadia Hassan, Mahmudul Islam Rakib, Kamrul Hasan Tuhin, and Ashadun Nobi

Employee Attrition Analysis Using CatBoost . . . 644
Md. Monir Ahammod Bin Atique, Md. Nesarul Hoque, and Md. Jamal Uddin

Readiness Towards Industry 4.0 of Selected Industrial Sector . . . 659
Choudhury Abul Anam Rashed, Mst. Nasima Bagum, and Mahfuzul Haque

Estimating Energy Expenditure of Push-Up Exercise in Real Time Using Machine Learning . . . 674
Md. Shoreef Uddin, Sadman Saumik Islam, and M. M. Musharaf Hussain

Cross-Layer Architecture for Energy Optimization of Edge Computing . . . 687
Rushali Sharif Uddin, Nusaiba Zaman Manifa, Latin Chakma, and Md. Motaharul Islam

Energy Consumption Issues of a Data Center . . . 702
Nabila Islam, Lubaba Alam Chhoa, and Ahmed Wasif Reza

Trade-Offs of Improper E-waste Recycling: An Empirical Study . . . 715
Md Shamsur Rahman Talukdar, Marwa Khanom Nurtaj, Md Nahid Hasan, Aysha Siddeka, Ahmed Wasif Reza, and Mohammad Shamsul Arefin

A Hybrid Cloud System for Power-Efficient Cloud Computing . . . 730
S. M. Mursalin, Md. Abdul Kader Jilani, and Ahmed Wasif Reza

A Sustainable E-Waste Management System for Bangladesh . . . 739
Md. Shahadat Anik Sheikh, Rashik Buksh Rafsan, Hasib Ar Rafiul Fahim, Md. Tabib Khan, and Ahmed Wasif Reza

Machine Learning Algorithms on COVID-19 Prediction Using CpG Island and AT-CG Feature on Human Genomic Data . . . 754
Md. Motaleb Hossen Manik, Md. Ahsan Habib, and Tanim Ahmed


Statistical and Bioinformatics Model to Identify the Influential Genes and Comorbidities of Glioblastoma . . . 763
Nitun Kumar Podder and Pintu Chandra Shill

Protein Folding Optimization Using Butterfly Optimization Algorithm . . . 775
Md. Sowad Karim, Sajib Chatterjee, Ashis Hira, Tarin Islam, and Rezanul Islam

Author Index . . . 789

Imaging for Disease Detection

Potato-Net: Classifying Potato Leaf Diseases Using Transfer Learning Approach

Abu Kowshir Bitto1, Md. Hasan Imam Bijoy2,3(B), Aka Das2,4, Md. Ashikur Rahman1, and Masud Rabbani2,3

1 Department of Software Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]
2 Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
{hasan15-11743,masud.cse}@diu.edu.bd
3 Daffodil International University, Dhaka 1341, Bangladesh
4 Premier University, Chattogram 4203, Bangladesh

Abstract. Research on pertinent topics is more important than ever for the long-term development of agriculture, given the advancements in contemporary farming and the use of artificial intelligence (AI) for identifying crop illnesses. There are numerous diseases, and they all significantly affect the quantity and quality of potatoes. Early, automated detection of these illnesses during the budding phase can help increase the output of potato crops, but it requires a high level of skill. Several models have already been created to identify various plant diseases. In this study, we use a variety of convolutional neural network architectures to recognize potato leaf disease and assess their early-detection accuracy against that of other researchers' work. The learning sample for our algorithm included both the original and the augmented photos. Each model was then evaluated to ensure that it was accurate: after being trained on the potato leaf disease dataset using the Inception-v3, Xception, and ResNet50 models, performance was evaluated using test images. ResNet50 has the highest accuracy and lowest error rate for detecting potato leaf disease, followed by Inception-v3 with an accuracy of ninety-four point two five percent (94.25%) and Xception with an accuracy of eighty-nine point seven one percent (89.71%). Keywords: Potato · Leaf Disease · Inception-V3 · Xception · ResNet50 · Transfer Learning

1 Introduction

The potato is a popular vegetable in practically every nation on the planet [1]. It is a readily accessible and affordable vegetable; therefore, people consume it widely and farmers grow it plentifully. Potato ranks fourth among crops by production: after rice, grains, and corn, it is the crop that farmers grow most plentifully.


Farmers grow potatoes everywhere in Bangladesh as well, and the crop does particularly well in districts with favourable weather and good marketing. We mainly use the potato as a vegetable because it is delicious on its own and combines well with other vegetables. Potatoes can also be eaten as a starch alternative because they are a source of starch and carbohydrate, and they are taken as the main meal in forty countries, including Bangladesh. The potato is a short-duration but high-yield crop that can increase the food production of Bangladesh: farmers here currently harvest only 11 tons of potatoes per hectare, but this could be increased to 20 tons. If people started taking potatoes as an alternative to rice, the pressure on agriculture could decrease considerably; at the least, eating potatoes between February and June would lessen the dependence on grain. Farmers should choose a good potato variety to cultivate a good yield; the advantage of a good variety is that the potatoes can be stored for a long time and more good crops can be grown from them. Since 1960, farmers of Bangladesh have been developing suitable potato varieties; some of the best kinds are hira, diamond, bari potato-11 (chook), bari potato-12 (dhira), potato-13 (granola), and potato-15 (binela), while bari tropics-1 and bari tropics-2 are two hybrid potatoes developed by the Bangladesh Agricultural Institute. Farmers in Bangladesh apply many techniques to cultivate potatoes, but few of them are scientific; if they used scientific methods to raise the yield, the profit would increase. The yield of potatoes is good when the plant is healthy. Though the potato grows at any time of year, it is mainly a winter-season vegetable. In winter the plants become dry and need irrigation; with a lack of irrigation, the plants dry out and the potatoes end up smaller than usual, so it is essential to water the plants as needed. It is also important to keep the plants hydrated and to loosen the soil 30–35 days after planting. The roots must be kept clear of weeds, because weeds prevent the plants from growing naturally and suck nutrition from them. Potato plants can be attacked by insects while they are in the field; if this happens, irrigation should stop immediately, because continued irrigation favours the spread of the infestation to healthy plants. Otherwise, people have to depend on imported food, which increases expenditure and leads to health risks [11]. Sometimes potato plants fall victim to pests: the insects that mostly destroy the crops are brown, can be 10–15 mm in size, cut the roots, drill into the leaves, and can destroy the entire cultivation, and various other kinds of insects destroy plants and fruits, so farmers need to apply insecticide according to the guidance of an agricultural officer. When storing potatoes, farmers should use sand, making a thin layer over and under the potatoes to keep them fresh for a long time, and they should pick out rotten or insect-affected potatoes and throw them away, since otherwise even healthy, fresh ones will be affected despite proper care. The most common diseases of potato plants are septoria leaf spot, early blight, bacterial wilt, late blight, and common scab; there are also fungal diseases, namely powdery mildew, powdery scab (caused not by a fungus but by a Rhizaria), Rosellinia black rot, and black scurf/canker.
The disorders of potatoes can be recognized using machine learning, which lets us detect disease more accurately and in more depth; once the problem is seen in detail, the solution becomes more straightforward.


It is essential to keep the plant healthy to harvest a good amount of fruit, and in this modern world technology lets us detect a problem more accurately and solve it more quickly than before [10]. It is therefore highly possible to detect the diseases of potato plants and address them using modern technology, preventing losses for the farmers who invest their money and hard work in growing potatoes and balancing the economy. CNN is one of the most powerful techniques in pattern recognition and, given a large amount of data, yields encouraging results for detecting these diseases [11]. In this paper, we use several CNN architectures to detect potato leaf disease, determine which algorithm has the best detection accuracy, and compare it with other researchers' work. Section 2 presents the literature review. Section 3 discusses the method for identifying potato leaf disease. Section 4 presents and discusses the experimental findings. Section 5 concludes the paper.

2 Literature Review

Many papers, publications, and research projects focus on the detection and categorization of potato leaf disease; a few of them are reviewed below. Tiwari et al. [1] focus on various diseases of the potato leaf: a deep learning model is used to detect potato leaf diseases, and multiple classifiers are used, with techniques including K-means clustering and the VGG19 model; the dataset was collected from Kaggle. The competition winner, GoogLeNet, had a 6.7% error rate, whereas VGG19 had a 6.8% error rate in the top-5 validation and test error categories. Sravan et al. [2] use machine learning and deep learning approaches to identify diseased plants at the lowest possible cost; the data comprise 20,639 images taken from the PlantVillage database, and the work is done by fine-tuning the ResNet-50 model, with the suggested strategy achieving the greatest classification accuracy of 99.26%. Islam et al. [3] focus on diagnosing potato illnesses using a few leaf images and cutting-edge machine learning techniques: transfer learning is applied to detect potato diseases early, with techniques such as CNN models, VGG19, VGG16, ResNet50, and RGB imaging. Five key methodologies are used: data collection and analysis, image pre-processing, data splitting, classification model construction, and model testing, with data collected from PlantVillage; in testing, the software predicts with an accuracy of 99.43% using 20% test data and 80% train data. Chen et al. [4] focus on identifying various plant diseases using a deep learning model and transfer learning; MobileNet-V2, CNNs, classification activation maps (CAM), and other approaches are used to detect leaf diseases, and AlexNet and ResNet101 are also applied. They gathered information from the Fujian Institute of Subtropical Botany in Xiamen, China; on the public dataset, the method achieves an average recognition accuracy of ninety-nine point eight five percent (99.85%), and the average accuracy on the collected plant disease photos reaches 99.11% even with numerous classes and difficult background conditions. Dasgupta et al. [5] focus on the detection of diseases in potato leaves.


To identify diseases in potato leaves, they used a deep learning model and transfer learning techniques together with computer vision and image categorization, with the accuracy measure used to quantify and visually display the model's performance. Because the model is lightweight and resilient, it may be integrated into an application for a handheld device such as a smartphone, allowing crop growers to spot diseased crops on the go and save them from ruin. A CNN was utilized in this deep-learning-based approach to automatically detect sick maize plants in fields using leaf photos, obtaining 97% accuracy. A portion of the PlantVillage dataset was used for this challenge, containing three types of potato leaf photos: Alternaria solani, Phytophthora infestans, and healthy leaf images; the dataset originally contained fifty thousand photos of disease-infected leaves from thirty-eight different plant species. Mukti et al. [6] focus on transfer-learning-based plant disease detection using ResNet50: a CNN model based on transfer learning was built to accurately identify plant diseases, with the ResNet50 network as the major focus, and this model produced the greatest outcome, with a training accuracy of 99.80%. Farmers who use this application can manage infections and increase productivity. The plant disease dataset was obtained from the GitHub repository of the 'salathegroup', a well-known research group. The model's performance was evaluated by comparison with a few different transfer learning models and appropriate graphics; more picture data is necessary for the best generalization of a CNN model as its depth increases. Too et al. [7] focus on plant disease identification using ResNet with 50, 101, and 152 layers, VGG 16, Inception V4, and DenseNets with 121 layers, with the dataset collected from PlantVillage, which has 54,306 images covering twenty-six diseases of fourteen crop plants; DenseNet achieves a testing accuracy of 99.75% to beat the rest of the architectures. Sumalatha et al. [8] focus on transfer-learning-based plant disease detection using deep CNNs, where deep neural networks (DNN) have seen great success in image classification challenges: six different CNN architectures, Xception, ResNet50, MobileNet, VGG16, InceptionV3, and DenseNet121, are compared, and DenseNet121 achieves the best accuracy of 95.48% on test data, with 11,333 images from PlantVillage used for training, validation, and testing. Arshad et al. [9] focus on plant disease identification using transfer learning: ResNet50 with transfer learning is used for tomato, potato, and corn disease identification, with CNN and VGG16 also used, and the best performance was attained by ResNet50, which scored 98.7%; data were collected from PlantVillage. Jasim et al. [10] focus on plant leaf disease detection using a CNN, achieving an accuracy of 98.29% for training and 98.029% for testing over all data sets used; they collected 20,636 images from PlantVillage and used a convolutional neural network algorithm. Future research on the suggested system could also test out different learning rates and optimizers.


3 Methodology

The main aim of our study is to develop a transfer learning model that detects potato leaf disease. To attain this aim we go through numerous phases, including dataset collection, data preprocessing, and model creation. The working procedure is presented in Fig. 1.

Fig. 1. Working procedure diagram to classify the potato leaf diseases.

3.1 Dataset Description

The data were collected from a GitHub repository. A total of 7148 images were collected across the leaf classes: 2183 images of early blight disease, 2498 of late blight disease, and 2468 of healthy leaves. The 7148 photographs in the customized leaf disease dataset are divided into 5722 for training and 1426 for testing; a minimal loading sketch is given below. Sample images are visualized in color in Fig. 2.
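As a rough illustration of this split, the snippet below loads such a three-class image folder with Keras and holds out 20% for testing. The "data" directory name and layout are hypothetical, since the paper does not publish its repository structure.

```python
# A minimal sketch of loading the three-class dataset with an 80/20 split,
# assuming one folder per class (e.g., data/Early_Blight, data/Late_Blight,
# data/Healthy -- hypothetical paths).
import tensorflow as tf

IMG_SIZE = (220, 220)   # input size used for all three models in this paper

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=64)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=64)
```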

Fig. 2. Sample dataset for (a) Early Blight, (b) Late Blight, (c) Healthy.


3.2 Data Preprocessing

Preprocessing employs geometric modifications such as rotation, scaling, and translation. During the preparation steps, we reduced the data resolution: for Xception, Inception-v3, and ResNet50, all pictures were resized to 220×220 pixels, and all photographs are of the same high quality. Based on these image modifications, shear shifting, rotation, height shifting, width shifting, and horizontal flipping were applied to the pictures.

3.3 Model Implementation

In this study, we applied CNN-based transfer learning algorithms to the potato leaf disease dataset. The relevant theory of the transfer learning models is given below, followed by a minimal implementation sketch.

Transfer Learning (TL): Transfer learning is the machine learning approach in which a previously trained model is utilized as the foundation for a new model on a different topic. Simply put, a model developed for one task is reused on a comparable task as an optimization to enable quick modeling progress on the second task [12, 13]. By avoiding the need to train many machine learning models from scratch to carry out similar tasks, transfer learning is frequently utilized to save time and money, especially in resource-intensive machine learning applications such as image classification.

ResNet-50: ResNet [14], or Residual Networks, is a well-known neural network that serves as the foundation for many computer vision applications; this model, proposed by He and his collaborators, won first place in the 2015 ImageNet competition. ResNet changed the game because it made it possible to successfully train very deep (150-layer) neural networks. ResNet50, a widely used ResNet variant, is made up of 48 convolution layers, one MaxPool layer, and one average pool layer, and requires 3.8 × 10⁹ floating-point operations in all.

Xception: Xception is a 71-layer convolutional neural network [15]. A version of the network pre-trained on more than a million images from the ImageNet database can classify photos into a thousand different object categories, including keyboards, pencils, mice, and various animals. Xception [16] enhances the Inception architecture by substituting depthwise separable convolutions for the original Inception modules.

Inception-V3: Inception Net set a new standard for CNN classifiers by pushing to enhance performance and accuracy without compromising computational cost, and the Inception network is carefully designed: stacked 1×1 convolutions are employed to reduce dimensionality and deliver more efficient computation and deeper networks. Its modules were created to address problems such as over-fitting and computational expense. Convolutional layers with 1×1, 3×3, and 5×5 filters make up the Inception layer, and their output filter banks are concatenated into a single output vector that is used as the input for the following stage.
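To make the preprocessing and transfer-learning steps concrete, the following minimal Keras sketch applies the augmentations listed in Sect. 3.2 and builds a three-class classifier on a frozen, ImageNet-pre-trained ResNet50 backbone. The augmentation magnitudes and the dense head are illustrative assumptions; the paper does not report its exact hyper-parameters.

```python
# A sketch of Sects. 3.2-3.3 in Keras; parameter values are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentations named in Sect. 3.2: shear, rotation, height/width shift, flip.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255, shear_range=0.2, rotation_range=20,
    height_shift_range=0.1, width_shift_range=0.1, horizontal_flip=True)

# Transfer learning: reuse the pre-trained convolutional features and train
# only a new classifier head for the three leaf classes.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(220, 220, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # Early Blight / Late Blight / Healthy
])
```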


3.4 Performance Calculation

We used test data to quantify the models' performance after training. Several metrics were calculated for performance evaluation, and using these parameters we identified the most accurate model for prediction. Based on each model's confusion matrix, the percentage performance measures are given in Eqs. (1)–(7).

Accuracy = (True Positive + True Negative) / (Total Number of Images) × 100%  (1)

True Positive Rate (TPR) = True Positive / (True Positive + False Negative) × 100%  (2)

True Negative Rate (TNR) = True Negative / (False Positive + True Negative) × 100%  (3)

False Positive Rate (FPR) = False Positive / (False Positive + True Negative) × 100%  (4)

False Negative Rate (FNR) = False Negative / (False Negative + True Positive) × 100%  (5)

Precision = True Positive / (True Positive + False Positive) × 100%  (6)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall) × 100%  (7)
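The seven measures follow directly from one class's confusion-matrix counts. The helper below is a sketch of Eqs. (1)–(7); the sample call uses the Xception "Late Blight" counts from Table 1 and reproduces the corresponding row of Table 2.

```python
# Eqs. (1)-(7) computed from one class's confusion-matrix counts.
def rates(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total * 100            # Eq. (1)
    tpr = tp / (tp + fn) * 100                    # Eq. (2), recall
    tnr = tn / (fp + tn) * 100                    # Eq. (3)
    fpr = fp / (fp + tn) * 100                    # Eq. (4)
    fnr = fn / (fn + tp) * 100                    # Eq. (5)
    precision = tp / (tp + fp) * 100              # Eq. (6)
    f1 = 2 * precision * tpr / (precision + tpr)  # Eq. (7)
    return accuracy, tpr, fnr, fpr, tnr, precision, f1

print(rates(tp=844, fn=41, fp=50, tn=491))
# -> approximately (93.62, 95.37, 4.63, 9.24, 90.76, 94.41, 94.88)
```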

4 Results and Discussions

The potato leaf disease images were split at a ratio of 80:20 into 5722 training pictures and 1426 validation pictures. The experiment platform has an Intel Core i5 CPU and 8 GB of RAM. All input pictures were scaled to 220×220 for each of the Xception, ResNet50, and Inception-v3 models, and weights from the pre-trained Xception, ResNet50, and Inception-v3 models were applied. The resulting confusion matrix (TP, FN, FP, TN) for each model is shown in Table 1 with three classes. For Xception, we chose a batch size of 64 and 40 epochs; after training, we built the confusion matrix and evaluated the performance of each class, with Table 2 displaying the computed performance and Fig. 3 the accuracy and loss graphs. We likewise used a batch size of 64 and 40 epochs for ResNet-50; Fig. 4 displays its accuracy and loss graphs, while Table 3 displays the computed performance. For Inception-V3, we utilized a batch size of 64 and 30 epochs and, once training finished, created the confusion matrix and assessed each class's performance.
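Continuing the earlier sketches (`model`, `train_ds`, `test_ds`), a training run with the stated batch size and epoch budget might look as follows; the optimizer and loss are assumptions, as the paper does not state them.

```python
# Batch size 64 is already applied inside the tf.data pipelines above, so it
# is not repeated here; use epochs=30 for the Inception-v3 variant.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_ds, validation_data=test_ds, epochs=40)
```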

Table 1. Confusion matrices for applied transfer learning with three classes.

Model         Class          TP    FN    FP    TN
Xception      Late Blight    844   41    50    491
Xception      Early Blight   867   106   24    429
Xception      Healthy        726   129   90    481
ResNet-50     Late Blight    854   56    13    503
ResNet-50     Early Blight   882   23    11    510
ResNet-50     Healthy        785   60    13    568
Inception-v3  Late Blight    463   22    11    930
Inception-v3  Early Blight   470   108   15    833
Inception-v3  Healthy        364   92    4     966

Fig. 3. Diagram for (a) Xception accuracy and (b) Xception loss on 40 epochs.

Table 2. Class-wise performance evaluation matrices for Xception.

Model     Class         Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
Xception  Late Blight   93.62         95.37    4.63     9.24     90.76    94.41          94.88
Xception  Early Blight  90.88         89.11    10.89    5.30     94.70    97.31          93.03
Xception  Healthy       84.64         84.91    15.09    15.76    84.24    88.97          86.89

The accuracy and loss graphs for Inception-V3 are shown in Fig. 5, and the computed performance is shown in Table 4. In this study, we evaluated the trained models using the test data set.


Fig. 4. Diagram for (a) ResNet-50 accuracy and (b) ResNet-50 loss on 40 epochs.

Table 3. Class-wise performance evaluation matrices for ResNet-50.

Model      Class         Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
ResNet-50  Late Blight   95.16         93.85    6.15     2.52     97.48    98.50          96.12
ResNet-50  Early Blight  97.62         97.46    2.54     2.11     97.89    98.77          98.11
ResNet-50  Healthy       94.88         92.90    7.10     2.24     97.76    98.37          95.56

Fig. 5. Diagram for (a) Inception-V3 accuracy and (b) Inception-V3 loss on 30 epochs.

Our models were given a training dataset containing both the original and the augmented photographs as an additional learning option, and were subsequently verified for accuracy: each model's performance was evaluated using test images after being trained on the potato leaf disease dataset with the ResNet50, Inception-v3, and Xception architectures. We experimented with the weights of the ResNet50, Xception, and Inception-v3 models in order to compare them against other well-known pre-trained transfer learning networks [18].

Table 4. Class-wise performance evaluation matrices for Inception-v3.

Model         Class         Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
Inception-v3  Late Blight   97.69         95.46    4.54     1.17     98.83    97.68          96.56
Inception-v3  Early Blight  91.37         81.31    18.69    1.77     98.23    96.91          88.43
Inception-v3  Healthy       93.27         79.83    20.17    0.41     99.59    98.91          88.35

We investigated which pre-trained network best complements this dataset among the three distinct models: ResNet50, Xception, and Inception-v3. Table 5 reveals that for identifying potato leaf disease, ResNet50 has the best accuracy (96.11%) and the lowest error rate, Inception-v3 is second (94.25%), and Xception has the lowest accuracy (89.71%) and the highest error rate.

Table 5. Final accuracy table for the computed performance of applied transfer learning.

Model         Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
Xception      89.71         89.80    10.20    10.1     89.9     93.56          91.6
ResNet-50     96.11         94.74    5.26     2.3      97.7     98.55          96.60
Inception-V3  94.25         85.53    14.47    1.12     98.88    97.83          91.11

We will integrate these models into a mobile application. With such a smartphone application, farmers will be able to detect potato leaf disease as early as possible, allowing them to produce more potatoes.

5 Conclusion

In this article, we used different transfer learning approaches to recognize potato leaf disease and contrasted them with previous studies, in order to determine which model recognizes the disease earliest and most precisely. The training data for our models included both the original and the augmented photographs. Each model was then tested to verify its accuracy: utilizing test images, performance was assessed after training on the potato leaf disease dataset with the Xception, ResNet50, and Inception-v3 architectures, whose pre-trained weights we experimented with.


ResNet50 came first for identifying potato leaf disease with the best accuracy (96.11%) and a low error rate, Inception-v3 came second with 94.25% accuracy, and Xception came last with the lowest detection accuracy (89.71%) and the highest error rate. We considered why the accuracy did not reach 100%: it is most likely owing to the nature of the leaves, since images of some classes are strikingly similar in shape, color, and texture to those of others, which at times makes it difficult for the networks to predict the true labels correctly. These plant leaves can be captured with a mobile camera, and with an accurate plant disease detection model included in such smartphones, farmers will be able to spot plant illnesses in a timely and straightforward manner. Farmers will have the freedom to decide for themselves, and it will benefit the development of agriculture.

References

1. Tiwari, D., Ashish, M., Gangwar, N., Sharma, A., Patel, S., Bhardwaj, S.: Potato leaf diseases detection using deep learning. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 461–466. IEEE (2020)
2. Sravan, V., Swaraj, K., Meenakshi, K., Kora, P.: A deep learning based crop disease classification using transfer learning. In: Materials Today: Proceedings (2021)
3. Islam, F., Hoq, M.N., Rahman, C.M.: Application of transfer learning to detect potato disease from leaf image. In: 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON), pp. 127–130. IEEE (2019)
4. Chen, J., Zhang, D., Nanehkaran, Y.A.: Identifying plant diseases using deep transfer learning and enhanced lightweight network. Multimed. Tools Appl. 79(41–42), 31497–31515 (2020). https://doi.org/10.1007/s11042-020-09669-w
5. Dasgupta, S.R., Rakshit, S., Mondal, D., Kole, D.K.: Detection of diseases in potato leaves using transfer learning. In: Das, A.K., Nayak, J., Naik, B., Pati, S.K., Pelusi, D. (eds.) Computational Intelligence in Pattern Recognition. AISC, vol. 999, pp. 675–684. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9042-5_58
6. Mukti, I.Z., Biswas, D.: Transfer learning based plant diseases detection using ResNet50. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pp. 1–6. IEEE (2019)
7. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 161, 272–279 (2019)
8. Sumalatha, G., Krishna Rao, D.S., Singothu, D., Rani, J.: Transfer learning-based plant disease detection (2021)
9. Arshad, M.S., Rehman, U.A., Fraz, M.M.: Plant disease identification using transfer learning. In: 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), pp. 1–5. IEEE (2021)
10. Jasim, M.A., Al-Tuwaijari, J.M.: Plant leaf diseases detection and classification using image processing and deep learning techniques. In: 2020 International Conference on Computer Science and Software Engineering (CSASE), pp. 259–265. IEEE (2020)
11. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
12. Mia, J., Bijoy, H.I., Uddin, S., Raza, D.M.: Real-time herb leaves localization and classification using YOLO. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7 (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579718
13. Krishna, R., Menzies, T.: Bellwethers: a baseline method for transfer learning. IEEE Trans. Softw. Eng. 45(11), 1081–1105 (2018)
14. Theckedath, D., Sedamkar, R.R.: Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 1(2), 1–7 (2020). https://doi.org/10.1007/s42979-020-0114-9
15. Bitto, A.K., Mahmud, I.: Multi categorical of common eye disease detect using convolutional neural network: a transfer learning approach. Bull. Electr. Eng. Inform. 11(4), 2378–2387 (2022)
16. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
17. Xia, X., Xu, C., Nan, B.: Inception-v3 for flower classification. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 783–787. IEEE (2017)
18. Hasan, S., Rabbi, G., Islam, R., Bijoy, H.I., Hakim, A.: Bangla font recognition using transfer learning method. In: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 57–62 (2022). https://doi.org/10.1109/ICICT54344.2022.9850765

False Smut Disease Detection in Paddy Using Convolutional Neural Network

Nahid Hasan1, Tanzila Hasan1(B), Shahadat Hossain3, and Md. Manzurul Hasan2

1 City University, Dhaka, Bangladesh [email protected]
2 American International University-Bangladesh (AIUB), Dhaka, Bangladesh [email protected]
3 Daffodil International University, Dhaka, Bangladesh [email protected]

Abstract. Rice false smut (RFS) is the most severe grain disease affecting rice agriculture worldwide. Because of the various mycotoxins produced by the causal pathogen, Villosiclava virens (anamorph: Ustilaginoidea virens), epidemics result in yield loss and poor grain quality. As a result, the farmers' main concern is disease management measures that are effective, simple, and practical. For this reason, we examine images of RFS to understand and predict this severe grain disease. This research proposes a model based on the Convolutional Neural Network (CNN), widely used for image classification and identification due to its high accuracy. First, we acquire high-resolution RFS images from actual rice farming fields. Then, we train and test our model's performance using actual images to compare and validate it. Our model provides 90.90% accurate results for detecting RFS in actual photos. Finally, we evaluate and record all of the data for subsequent studies.

Keywords: CNN · False Smut · Detection · Image Processing

1 Introduction

Rice (Oryza sativa) is the main crop in Bangladesh and many other countries. In Bangladesh, 90% of all farmers are involved in rice farming, and 95% of the country's food needs are met by paddy. The importance of rice to economic growth is huge: Bangladesh exports paddy to other countries every year, which employs many people and brings in much foreign currency. According to the United States Department of Agriculture, Bangladesh produces 3.6 crore tons of rice annually, making it the fourth-largest rice producer after China, India, and Indonesia. Several factors limit rice production, but one of the most important is disease in paddy grains. In 2019, a survey was conducted in the paddy fields of NRRI in Cuttack, which found that the grown rice genotypes had lost many grains to brown, black, or ash-colored discolouration, usually accompanied by chaffiness. Discoloured grains in different rice genotypes ranged from 25% to 92% [1].


This disease has also struck rice fields in India, where the highest infection rate, 85%, was recorded in Tamil Nadu [10]. False smut (Ustilaginoidea virens) is a pathogen that damages crops. The most common sign of the disease is the growth of black fungus on the rice grains, which appear covered by yellow fungus in the field. Fully grown spores are orange, but they turn yellowish-green or greenish-black as they age. Most of the time, only a few grains on a panicle are affected, and the rest are fine; however, rice farmers now encounter this disease more often than when it was first observed. The grain consequently fetches a low market price because its colour is faded and its quality is poor, and false smut is therefore increasingly likely to spread. Recently, RFS has been found in many of the world's most important rice-growing areas, such as China, India, and the United States [12]. We built a CNN system that makes it easy to spot paddy false smut disease. The rest of this article is organized as follows. Section 2 discusses related work. In Sect. 3, we explain how the analysis was done. Section 4 describes our experiment with the CNN model, showing and discussing how well the model performed on the dataset. Before concluding, Sect. 5 outlines where we want to take this research in the future.

2 Related Work

In agriculture, diagnosing disease from a picture is an interesting line of study. Based on pictures of sick rice plants, a way has been proposed to determine whether rice is diseased: the colour, shape, and texture of the extracted sample are analysed to find bacterial leaf blight, brown spots, and leaf smut, with an accuracy of 73.33% on the test dataset and 93.33% on the training dataset [7]. Using image processing techniques to make diagnoses from pictures of leaves is one of the newest areas of research in agriculture. One study suggests using a Deep Neural Network optimized with the Jaya Algorithm to find and categorize diseases in paddy leaves: pictures of the paddy field were taken covering four diseases, bacterial blight, blast, brown spot, and sheath rot. RGB photos were converted into HSV images, enabling colour-based background removal and masking in pre-processing, and a clustering algorithm was used to separate the diseased, normal, and background regions [8]. Using photos of leaves to determine what kind of disease a plant has is likewise an active topic: pictures of rice leaves have been used to identify three diseases, Bacterial Leaf Blight, Brown Spot, and Leaf Smut, with the deep convolutional AlexNet model performing feature extraction and detection. Rice is one of the most important crops right now because the economy depends on agricultural production and productivity, and rice is a crop that most people in most countries eat daily. On the other hand, rice grown in the field can be infected by bacteria and fungi, so we must be more careful about growing and testing it. With the help of image processing, the three diseases Bacterial Leaf Blight, Brown Spot, and Leaf Smut were detected using the UCI Machine Learning Repository's Rice Leaf Disease dataset [9]. Residual neural networks could classify the pictures into the correct disease category 95.83% of the time [6].


Other studies focus mostly on paddy blast, brown spot disease, and narrow brown spot disease, using Matlab as the tool. A neural network classifies paddy blast, brown spot disease, narrow brown spot disease, and normal paddy leaves; ten pictures of Blast Disease, ten of Brown Spot Disease, and ten of Narrow Brown Spot Disease were used as training examples, and the test accuracy was 92.5% [13]. Rice is India's main crop, and most paddy farming land is used to grow brown and white rice; rice is grown in almost every Indian state, and more than three-quarters of India's people work in agriculture. Rice plants become diseased first when fungi or bacteria attack them, and second when the weather changes without warning. Diseases in grain plants or abrupt weather changes can cause famine and could also hurt the economy. Rice blast, brown spot, leaf smut, tungro, and sheath blight are the most common diseases, and rice disease is the most common problem farmers face, which is why early diagnosis is very important [11].

3 Methodology

This section explains our analysis workflow and the methods required to examine our collected dataset and detect paddy false smut disease (Fig. 1).

Fig. 1. Workflow diagram

3.1 False Smut Disease

Paddy is the major crop in many nations throughout the world; thus its production and diagnosis are critical for us [9]. There are several diseases of paddy, which is why we lose paddy production every year. One of these is false smut disease. According to [2], the fungal pathogen Ustilaginoidea virens, which produces both sexual ascospores and asexual chlamydospores in its life cycle, is the primary cause of rice false smut disease (Fig. 2).


Fig. 2. Affected rice and healthy rice

3.2 Data Collection and Preprocessing

The dataset is the most crucial part of image processing, so we should be careful when collecting data; an imbalanced dataset is one of the significant problems in a dataset [5]. Our research uses images taken directly from a paddy field in Gazipur, Bangladesh. Every dataset contains two classes, Disease and Healthy. After collecting the data, we pre-processed it, using image augmentation for re-scaling, cropping, rotation, zooming, resizing, and formatting the images for further analysis, as sketched below.
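A minimal sketch of such an augmentation pipeline with Keras preprocessing layers is given below; all parameter values are illustrative assumptions, not values taken from the paper.

```python
# Augmentation steps described above: resizing, re-scaling, rotation, zoom,
# and crop. Shapes and magnitudes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.Resizing(180, 180),     # resize every field photo to one shape
    layers.Rescaling(1.0 / 255),   # re-scale pixel values to [0, 1]
    layers.RandomRotation(0.1),    # random rotation
    layers.RandomZoom(0.1),        # random zoom
    layers.RandomCrop(160, 160),   # random crop
])
```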

3.3 Convolutional Neural Network

CNN is a class of neural networks used for deep learning. In a nutshell, it is a robust machine learning method for automatically classifying images, and it is often used to segment images as well. A convolutional neural network (CNN) is an important part of many well-known image classification methods. A CNN applies a sequence of layers to determine the final output [4]: the input image is passed through a set of filters in the convolution layers, and the resulting feature maps are combined in a fully connected network. Because we perform binary classification, we use the sigmoid activation in the output layer of our network.
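The sketch below shows the kind of binary CNN this describes: convolution and pooling blocks, a flattened feature map feeding a fully connected layer, and a single sigmoid output for the Disease/Healthy decision. All layer sizes are assumptions, since the paper does not list its exact architecture.

```python
# A minimal binary CNN, assuming 160x160 RGB inputs from the pipeline above.
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(160, 160, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary: diseased vs. healthy
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```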

4 Result and Performance Analyses

In Table 1, we report how well our model works, with performance metrics including precision, specificity, sensitivity, and the F1 score. Accuracy metrics maintain two local variables, "total" and "count," which track how often y_pred matches y_true [3]. Based on our research, the on-field dataset yields an accuracy of 90.90%. Precision is another way to measure how well a machine learning model works; it is calculated by dividing the number of true positives by the total number of positive predictions (Fig. 3).
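That "total"/"count" behaviour matches the accuracy metrics of Keras; a tiny, self-contained illustration (the numbers are arbitrary, not the paper's data) is:

```python
# Keras accuracy metrics internally accumulate "total" and "count" as
# update_state() is called; result() returns count / total.
import tensorflow as tf

m = tf.keras.metrics.BinaryAccuracy()
m.update_state(y_true=[1, 0, 1, 1], y_pred=[0.9, 0.2, 0.8, 0.4])
print(float(m.result()))  # 0.75 -- three of the four predictions match
```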


Table 1. Result Table

Dataset           Precision  Specificity  Sensitivity  F1 score  Accuracy
On-field Dataset  90.62%     91.17%       90.62%       90.62%    90.90%

Fig. 3. On-field dataset model curves

Sensitivity measures a model's ability to predict the true positives of each available category, and specificity measures its ability to predict the true negatives of each available category. Our model achieves a sensitivity of 90.62% and a specificity of 91.17%. The F1 score is a better metric than accuracy in many settings since it is the harmonic mean of precision and recall; in our on-field CNN model, the F1 score is 90.62%. A confusion matrix summarizes the prediction results of a classification problem, and Fig. 4 shows the confusion matrix for the on-field photographs.

Fig. 4. Confusion matrix of the CNN model


After fitting our model to the dataset, we obtain two curves each for training and validation: an accuracy curve and a loss curve. In the accuracy curve, the x-axis shows the number of epochs and the y-axis the accuracy; likewise, in the loss curve, the x-axis represents the number of epochs and the y-axis the loss value.
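Such curves are typically drawn from the History object returned by Keras; a sketch, assuming `history = cnn.fit(...)` was run with validation data, is:

```python
# Plot training/validation accuracy and loss from a Keras History object.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"], label="training")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set(xlabel="epoch", ylabel="accuracy")
ax1.legend()
ax2.plot(history.history["loss"], label="training")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set(xlabel="epoch", ylabel="loss")
ax2.legend()
plt.show()
```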

5 Future Research Direction and Conclusion

We have detected the final stage of false smut disease based on healthy and diseased images taken in the field. In future extensions of this work, aimed at ensuring early detection of false smut disease, we will try to collect images of the second stage of the condition. Paddy is Bangladesh's most significant food crop, accounting for 80% of the country's total cropped land, and paddy agriculture employs 90% of the farmer population; the importance of paddy in socioeconomic development is immense. Unfortunately, false smut disease destroys many paddies, so farmers must absorb significant losses, which affect not only the farmers but also the country's economy. We therefore developed a model to detect false smut disease and obtained noticeable results on our datasets. We hope our research contributes to solving one of Bangladesh's most severe agricultural problems and inspires other scholars to investigate agriculture's well-being.

References

1. Baite, M.S., Raghu, S., Prabhukarthikeyan, S., Keerthana, U., Jambhulkar, N.N., Rath, P.C.: Disease incidence and yield loss in rice due to grain discolouration. J. Plant Dis. Prot. 127(1), 9–13 (2020)
2. Biswas, A.: False smut disease of rice: a review. Environ. Ecol. 19(1), 67–83 (2001)
3. Flach, P.A.: The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 194–201 (2003)
4. Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019). https://doi.org/10.1109/ACCESS.2019.2927169
5. López, V., Fernández, A., Herrera, F.: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed. Inf. Sci. 257, 1–13 (2014)
6. Patidar, S., Pandey, A., Shirish, B.A., Sriram, A.: Rice plant disease detection and classification using deep residual learning. In: Bhattacharjee, A., Borgohain, S.K., Soni, B., Verma, G., Gao, X.-Z. (eds.) MIND 2020. CCIS, vol. 1240, pp. 278–293. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-6315-7_23
7. Prajapati, H.B., Shah, J.P., Dabhi, V.K.: Detection and classification of rice plant diseases. Intell. Decis. Technol. 11(3), 357–373 (2017)
8. Ramesh, S., Vydeki, D.: Recognition and classification of paddy leaf diseases using optimized deep neural network with Jaya algorithm. Inf. Process. Agric. 7(2), 249–260 (2020)
9. Rao, D.S., Kavya, N., Kumar, S.N., Venkat, L.Y., Kumar, N.P.: Detection and classification of rice leaf diseases using deep learning. Int. J. Adv. Sci. Tech. 29(03), 5868–5874 (2020)
10. Sethy, P.K., Barpanda, N.K., Rath, A.K., Behera, S.K.: Rice false smut detection based on faster R-CNN. Indonesian J. Electr. Eng. Comput. Sci. 19(3), 1590–1595 (2020)
11. Shah, J.P., Prajapati, H.B., Dabhi, V.K.: A survey on detection and classification of rice plant diseases. In: 2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), pp. 1–8. IEEE (2016)
12. Wang, W.M., Fan, J., Jeyakumar, J.M.J., Jia, Y.: Rice false smut: an increasing threat to grain yield and quality. In: Protecting Rice Grains in the Post-genomic Era, pp. 89–108. IntechOpen, London (2019)
13. Zainon, R.: Paddy disease detection system using image processing. Ph.D. thesis, UMP (2012)

Gabor Wavelet Based Fused Texture Features for Identification of Mungbean Leaf Diseases

Sarna Majumder1,2, Badhan Mazumder1,3(B), and S. M. Taohidul Islam1,4

1 Faculty of Computer Science and Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh
[email protected]
2 Department of Computer and Communication Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh
3 Department of Computer Science and Engineering, Dhaka International University, Dhaka, Bangladesh
4 Department of Electrical and Electronics Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh

Abstract. Early diagnosis of crop plant diseases is crucial since many of these diseases pose a considerable threat not only to global food security but also to agricultural productivity. To that end, we introduce in this paper a novel approach based on the Gabor Wavelet Transform (GWT) and Cubic SVM for detecting mungbean leaf diseases at the beginning stage. We perform GWT to decompose the given images into eighteen directional sub-bands and then extract a fusion of Gabor Wavelet (GW) based texture features from each detailed GWT coefficient sub-band. Finally, Cubic SVM is deployed to classify three different disease classes using these GW based features, with 10-fold cross validation. Experimental evaluation on our self-prepared dataset of mungbean leaf diseases shows that the proposed method yields an overall sensitivity of 91.11%, a specificity of 95.56%, a precision of 91.39%, and an accuracy of 91.11%. Moreover, the outcomes of our comparative analysis confirm the advantage of the proposed framework over three currently existing approaches.

Keywords: Crop Diseases · Gabor Wavelet · Texture Feature · Cubic SVM

1 Introduction

Plant diseases, especially plant leaf diseases, are considered a prime reason for both quantitative and qualitative losses in agricultural production, which badly affect the production cost as well as the agriculture-based economy of Bangladesh.


Since tools for precise and quick diagnosis still remain scarce, nutrition security, the food supply, and not least the livelihoods of farmers hang in the balance whenever bacterial or fungal disease outbreaks occur. Conventionally, farmers in our country still detect diseases with their naked eyes, which usually leads to inaccurate and biased decisions since most leaf disease lesions appear exactly the same at the primary stage. This traditional approach eventually leads to the extensive use of pesticides, which not only increases the production cost significantly but also impacts the surrounding environment negatively. Hence, a robust early disease detection technique is crucial to prevent the widespread of diseases as well as substantial economic damage. Recent advances in computer vision have paved the way to capture the texture features of a lesion better than human eyes. Adopting such features, several studies have been conducted in the past few years to recognize plant diseases using different machine learning (ML) algorithms and deep learning (DL) techniques. For plant disease identification, Konstantinos et al. [5] developed a model based on the Visual Geometry Group (VGG) network, which obtained 99.53% accuracy. Using GLCM and wavelet-based features, Akhtar et al. introduced an automated system for plant disease diagnosis [1]; different ML methods were trained on these features, including K-Nearest Neighbour (KNN), Decision Tree, Support Vector Machine (SVM), Naive Bayes, and Recurrent Neural Networks. Trivedi et al. employed the K-means clustering algorithm with hue values to segment the contaminated region of the leaf first; features from the desired region of interest were then retrieved and used to train an SVM for classification [18]. A hyperspectral measurement based approach to detect leaf diseases was proposed by Ashourloo et al. [4]. Too et al. [17] compared the outcomes of several deep learning models, including Inception V4, Visual Geometry Group (VGG), DenseNet, and ResNet, for disease classification on the popular "Plant Village" dataset, and concluded that DenseNet was the most convenient with 99.75%. To classify diseases in crops, Mohanty et al. [9] used a transfer learning (TL) technique with a pre-trained AlexNet; with a dataset of 54,306 samples, the model can diagnose 26 different diseases in 14 different crop species with 99.35% accuracy. Rangarajan et al. [11] classified tomato leaf diseases using AlexNet and VGG16, with VGG16 achieving 97.29% and AlexNet 97.49%. Shijie et al. [12] also employed VGG16 to classify tomato leaf diseases, obtaining an accuracy of 88%. In addition, Ashok et al. [3] introduced a CNN based algorithm for diagnosing tomato leaf lesions. As previously stated, deep learning algorithms make up the majority of the recently suggested models for plant disease recognition. The performance of deep learning models, however, is highly dependent on the training dataset provided: on a sufficiently big dataset, these models deliver better outcomes and excellent generalizability. Since there is no publicly available dataset on mungbean leaf diseases, the datasets currently in our possession lack sufficient images in a variety of conditions, which is considered necessary for developing high-accuracy trained models. With such a small dataset, a developed model may overfit and perform poorly on actual test data.


In this work, we introduce a two-stage method for detecting and classifying three major mungbean leaf diseases: Cercospora Leaf Spot, Powdery Mildew, and Yellow Mosaic. Our work's primary novelties can be summed up as follows. In the first stage, pre-processing with a morphological technique segments the lesion on the mungbean leaf surface. In the subsequent stage, a Gabor Wavelet (GW) based decomposition of the segmented pictures into 18 directional coefficient sub-bands is performed, and three different texture characteristics retrieved from each directional sub-band are fused. We picked a Gabor-based technique because it allows for directional decomposition, with each sub-band providing independent directional information in its own channel. Finally, a Cubic SVM was used to analyse these extracted GW-based fused texture features and categorise them into three separate disease classes using 10-fold cross validation. Our key contribution is to replace traditional texture features with a fusion of GW-based texture features to overcome their limitations and develop a robust mungbean leaf disease detection approach. To the best of our knowledge, this is the first work that proposes a mungbean disease diagnosis method based on the merging of GW-based features. Mungbeans are widely grown in Bangladesh's southern coastal regions, notably in the Barisal and Patuakhali districts. We believe that farmers in these remote areas will benefit from our proposed framework, which will not only expand mungbean productivity but also deter farmers from overusing pesticides, slowing the rate of environmental pollution in the near future. The remainder of this paper is structured as follows: Sect. 2 contains information about our mungbean leaf disease dataset as well as a detailed description of our proposed GW fused features and Cubic SVM-based framework. The experimental results of our investigation are presented in Sect. 3, followed by a thorough discussion, and the entire paper is summed up in Sect. 4.

2 Methodology

The following are the main phases of our proposed GW-based mungbean leaf disease detection approach: pre-processing and segmentation of the mungbean leaf lesion, fused feature extraction using GW, and disease classification employing Cubic SVM. In a nutshell, Fig. 1 depicts our proposed methodology. This section contains an elaborate overview of the dataset we utilized and the methodology we developed.

2.1 Dataset

Our dataset includes 120 photos of mungbean leaves infected with three different diseases: Cercospora Leaf Spot, Powdery Mildew, and Yellow Mosaic. The samples were obtained from several locations in the Patuakhali area, which is recognized for producing the most mungbean in Bangladesh. When blobs appeared, the leaves were collected and photographed the same day using a Canon 700D DSLR (18–55 mm lens) at a resolution of 5184 × 3456. We clipped the ROI from each image and shrank it to 256 × 256 pixels to make the model more efficient and reduce the required computation time. The three types of mungbean leaf diseases in our dataset are depicted in Fig. 2.

Fig. 1. Block diagram of our proposed GW based system.

Fig. 2. Three classes of mungbean leaf diseases in our study.

Lesion Segmentation. Because of the numerous contrasts on the leaf surface, a contrast enhancement approach is utilized to adjust pixel intensities, which brings out further information in certain sections of an image. In this study, we deploy an advanced contrast enhancement strategy [10] to improve low-contrast features and contrast quality. The fundamental idea behind this method is to keep an input image's mean brightness intact while adjusting contrast in local areas. To begin, the RGB color channels of the input image are transformed to HSI. The method considers only the intensity parameter, leaving the hue and saturation parameters unchanged. Following that, a separator divides the intensities into two groups, high and low, using Eq. 1 [10]:

$$\beta_{hi} = \{\beta(s) \mid s > \beta_m\}, \qquad \beta_{lo} = \{\beta(g) \mid g \leq \beta_m\} \tag{1}$$

where $\beta_m$ represents the transitory threshold intensity that partitions the image into two sub-images, and $\beta_{lo}$ and $\beta_{hi}$ are the low- and high-intensity groups, respectively. To accomplish the improved intensity, estimates of the two intensity-based sub-parameters are combined using Eq. 2 [10]:

$$\beta_{enhance}(s) = \beta_{lo} + (\beta_{hi} - \beta_{lo}) \times z(s) \tag{2}$$

where $z(s)$ represents the incremental density obtained from the histogram. To reduce inaccuracy, both the mean and the given brightness are computed and compared, and this procedure is repeated until the improved intensity value is optimal. To construct the output image, the boosted intensity is aggregated with the preliminary hue and saturation values and subsequently converted back to the RGB channels. This contrast-enhanced RGB image is then converted to HSV space, and CLAHE [20] is applied to the H channel. The CLAHE approach analyses an intensity histogram in a contextual region centred at each pixel and remaps the intensity at that pixel according to the rank of the pixel's intensity in its own histogram. After that, a global thresholding approach transforms the image into binary. To segment the lesion, this binary image is used as a mask applied to the original RGB image. The effect of our lesion segmentation approach is partially illustrated in Fig. 3.

Fig. 3. Outcomes of pre-processing and lesion segmentation.
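For readers who want to experiment with a comparable pipeline, the following Python/OpenCV sketch reproduces the latter part of the segmentation stage (HSV conversion, CLAHE on the hue channel, global thresholding, and masking). It is a minimal illustration, not the authors' MATLAB implementation: the brightness-preserving enhancement of Eqs. 1–2 is omitted, and the file name, clip limit, and tile size are assumptions.

```python
import cv2

def segment_lesion(bgr_image):
    """Rough analogue of the segmentation stage: CLAHE on the hue channel,
    Otsu global thresholding, and masking of the original image."""
    # Convert to HSV; the paper applies CLAHE to the H channel
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed settings
    h_eq = clahe.apply(h)
    # Global thresholding (Otsu) to obtain a binary lesion mask
    _, mask = cv2.threshold(h_eq, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Apply the binary mask to the original image to keep only the lesion
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask)

segmented = segment_lesion(cv2.imread("leaf.jpg"))  # hypothetical input file
cv2.imwrite("lesion.png", segmented)
```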

2.2 Feature Extraction

Gabor Wavelet Representation. The two-dimensional Gabor wavelet can be defined using Eq. 3 [15]:

$$\psi_G(x) = e^{jk_0 x}\, e^{-\frac{1}{2}|Bx|^2} \tag{3}$$

where $B$ is a 2 × 2 diagonal matrix, $B = \mathrm{diag}[\varepsilon^{-1/2}, 1]$ with $\varepsilon \geq 1$, which elongates the filter in a specific direction, and $k_0$ represents the frequency of the complex exponential. Elongation [14] of this filter is done by setting $k_0 = [0, 3]$ and $\varepsilon = 4$, giving a low-frequency complex exponential with a small number of significant oscillations orthogonal to the wavelet's main axis. These two parameters were chosen to make the transform respond strongly to pixels linked with lesion texture, as they are well suited to characterizing directional information. Alongside frequency and elongation, the other two fundamental characteristics of the Gabor wavelet are scale and orientation: scale determines the breadth of the elongated object, whereas orientation determines the direction of the objects. Lesions of varying widths can be found on mungbean leaves in distinct orientations. To accommodate all possible lesion sizes and orientations, we used two scales, 8 and 16, with nine different orientations (0°, 25°, 50°, 75°, 100°, 125°, 150°, 175°, and 200°), and kept the highest responses from all nine orientations for further analysis.

Gabor Wavelet Based Texture Feature Fusion. GLCMs, unlike traditional textural features, reveal spatial information and occurrence details, despite being time consuming to compute [19]. Leaf lesions may have a distinct texture in one direction that can be tracked using GLCMs. Likewise, binary patterns in mungbean leaf lesions can assist in the classification of disease groups: to characterize the spatial organization of a local image texture, the Local Binary Pattern (LBP) derives binary patterns from neighbouring pixels [7]. The Gray Level Run Length (GLRL) texture feature is a sophisticated texture feature that is responsive to intensity patterns; since it can differentiate between distinct image intensities, it can facilitate the detection of lesion formation and dispersion.

GW was employed solely for decomposition in our proposed feature extraction method. GW decomposes the input image into detailed directional coefficient sub-bands according to the chosen scales and orientations. As previously stated, we applied GW with two scales and nine orientations, yielding 18 separate directional sub-bands per input image. From each of the 18 directional sub-bands, GLCM, LBP, and GLRL features are then extracted. Our study employs the twenty-two most common GLCM features [2]. Four main directions, 0° (Horizontal H), 90° (Vertical V), 45° (Diagonal D1), and 135° (Diagonal D2), are chosen to construct the corresponding GLCMs, and their average is used for experimentation in order to capture irregular patterns of mungbean leaf lesions. LBP is calculated by comparing pixel values to neighbouring pixels in each directional sub-band [8]. For each gray level, GLRL is formed by stating the

direction and quantifying the number and length of runs in those directions [16]. Algorithm 1 summarizes our proposed GW-based texture feature extraction algorithm.

Algorithm 1. GW-based texture feature extraction algorithm
Input: Lesion-segmented mungbean leaf images; Gabor wavelet scale array S (2 values) and orientation array O (9 values).
Output: GW-based texture dataset.
1: for m = 1 to the number of lesion-segmented mungbean leaf images do
2:   Read input image I_m
3:   Read the scale array S and orientation array O of the Gabor wavelet
4:   Apply GW decomposition of I_m according to the S and O arrays
5:   Obtain 18 directional sub-bands in total
6:   Determine GLCM + LBP + GLRL features from each of the 18 directional sub-bands
7:   Append the resulting GW texture feature vector to the dataset
8: end for
9: return GW-based texture dataset
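A minimal Python sketch of Algorithm 1 is given below, using scikit-image. It is illustrative only: the mapping from the paper's scales (8 and 16) to filter frequencies is an assumption, and a reduced feature set is computed per sub-band (four GLCM properties plus a uniform-LBP histogram); the paper's full set of 22 GLCM features and the GLRL statistics are omitted.

```python
import numpy as np
from skimage import io, color
from skimage.filters import gabor
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def gw_texture_features(gray, frequencies=(1 / 8, 1 / 16), n_orientations=9):
    """Gabor decomposition into 2 scales x 9 orientations = 18 sub-bands,
    then GLCM and LBP statistics per sub-band (GLRL omitted here)."""
    features = []
    thetas = np.deg2rad(np.arange(n_orientations) * 25.0)  # 0..200 deg, per the paper
    for f in frequencies:
        for theta in thetas:
            real, _ = gabor(gray, frequency=f, theta=theta)
            # Rescale the sub-band response to 8-bit for GLCM/LBP computation
            band = np.uint8(255 * (real - real.min()) / (np.ptp(real) + 1e-9))
            # GLCM averaged over the four principal directions (0, 45, 90, 135 deg)
            glcm = graycomatrix(band, distances=[1],
                                angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                                levels=256, symmetric=True, normed=True)
            for prop in ("contrast", "homogeneity", "energy", "correlation"):
                features.append(graycoprops(glcm, prop).mean())
            # Uniform LBP histogram summarizes the local binary patterns
            lbp = local_binary_pattern(band, P=8, R=1, method="uniform")
            hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
            features.extend(hist)
    return np.asarray(features)

gray = color.rgb2gray(io.imread("lesion.png"))  # hypothetical segmented image
feature_vector = gw_texture_features(np.uint8(gray * 255))
```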

2.3 Classification Using Cubic SVM

We used an SVM classifier to categorize mungbean leaf lesions based on the GW texture features; SVMs are favourable when memory is limited. Around 70% of the lesion-containing leaf images in our database were used for training, while 30% were used for testing. To generate the trained model, the derived GW texture features were supplied to the SVM classifier. In multidimensional feature space, SVM finds a hyperplane that separates the classes in the best feasible way [6]. SVM employs a mathematical function known as the kernel to generate the separating hyperplane; linear, polynomial, radial basis function (RBF), sigmoid, and other non-linear functions are among the common kernels [13]. The basic kernel is defined by Eq. 4:

$$k(\bar{x}) = \begin{cases} 1 & \text{if } |\bar{x}| \leq 1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

In our work, we implemented a Cubic SVM with a polynomial kernel function of order 3 (box constraint level = 1), which can be expressed using Eq. 5:

$$k(x_i, x_j) = \left(x_i^T x_j + 1\right)^3 \tag{5}$$
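As a point of reference, the Cubic SVM of Eq. 5 with 10-fold cross validation can be reproduced with scikit-learn as sketched below. This is not the authors' MATLAB code; the feature and label file names are hypothetical, and the standardization step is an added assumption.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

# X: 120 x 3827 GW-based fused texture features, y: three disease labels
X = np.load("gw_features.npy")  # hypothetical feature file
y = np.load("labels.npy")       # hypothetical label file

# Cubic SVM = polynomial kernel of order 3 (Eq. 5) with box constraint C = 1
cubic_svm = make_pipeline(StandardScaler(),
                          SVC(kernel="poly", degree=3, coef0=1, C=1.0))

# 10-fold cross-validated predictions, as in the paper's evaluation
y_pred = cross_val_predict(cubic_svm, X, y, cv=10)
print(classification_report(y, y_pred, digits=4))
```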

3 Results and Discussion

For our experimental studies, we employed a 64-bit Acer PC with an Intel Core i5 CPU (2.4 GHz) and 8 GB RAM, and MATLAB R2017b was used for all program implementation.

3.1 GW Decomposition

Figure 4(a) exhibits a lesion-segmented image of an affected mungbean leaf, and Fig. 5 demonstrates all 18 sub-bands derived using GW decomposition with two distinct scales and nine different orientations. To highlight the feasibility of the GW representation, we also performed a discrete wavelet transform (DWT) with a basic filter (Haar, 4 levels) on the same image and present all 16 generated sub-bands in Fig. 4(b).

Fig. 4. (a) Segmented image (b) Outcome of decomposition by applying DWT (4 levels).

3.2 Statistical Analysis

We feed the Cubic SVM classifier a 120 × 3827 matrix, where the row count 120 is the total number of leaf photos in our dataset and the column count 3827 is the total number of retrieved GW-based fused texture features per image. For the performance assessment of our method, we evaluated four indices: sensitivity, specificity, precision, and accuracy. We also carried out 10-fold cross validation to ensure proper validation; the corresponding statistics for each disease class are shown in Table 1. Table 1 demonstrates that the overall sensitivity, specificity, precision, and accuracy under 10-fold cross validation are 91.11%, 95.56%, 91.39%, and 91.11%, respectively. This is because GW-based fused texture features are superior at categorizing the complicated texture of mungbean leaf lesions and at retrieving local texture properties.

Fig. 5. Obtained 18 sub-bands of GW decomposition.

Table 1. Statistical analysis adopting 10-fold cross validation (unit: %).

Disease's Name        Sensitivity  Specificity  Precision  Accuracy
Cercospora Leaf Spot  93.33        93.33        87.50      93.33
Powdery Mildew        86.67        93.33        86.67      86.70
Yellow Mosaic         93.33        100          100        93.33
Overall               91.11        95.56        91.39      91.11

3.3 Comparison to Other Classifiers

For comparison with the Cubic SVM, our extracted features were also supplied to linear discriminant analysis (LDA), decision tree (DT), and k-nearest neighbour (KNN) classifiers, each employing 10-fold cross validation; their outcomes are reported in Table 2.

Table 2. Comparison of classifiers (unit: %).

Classifiers  Sensitivity  Specificity  Precision  Accuracy
LDA          75.56        87.78        75.48      75.56
DT           62.22        81.11        63.55      62.22
KNN          75.56        87.78        76.12      75.56
Cubic SVM    91.11        95.56        91.39      91.11

3.4 Comparison to Existing Approaches

To demonstrate the effectiveness and sustainability of our work, we compared our proposed GW+Texture+Cubic SVM strategy to three different methodologies; the findings are reported in Table 3. Table 3 indicates that DCT+DWT+Linear SVM [1] yields 82.22%, 91.11%, 82.46%, and 82.25%; Hue+Linear SVM [18] yields 75.60%, 87.78%, 75.93%, and 75.55%; and CNN [3] yields 88.89%, 94.44%, 89.36%, and 88.91%, in terms of sensitivity, specificity, precision, and accuracy, respectively. Our proposed GW+Texture+Cubic SVM approach achieves 91.11%, 95.56%, 91.39%, and 91.11% on those four indices, which is superior to the other methods.

Table 3. Comparison to existing approaches (unit: %).

Methods                          Sensitivity  Specificity  Precision  Accuracy
DCT+DWT+Linear SVM [1]           82.22        91.11        82.46      82.25
Hue+Linear SVM [18]              75.60        87.78        75.93      75.55
CNN [3]                          88.89        94.44        89.36      88.91
GW+Texture+Cubic SVM (Proposed)  91.11        95.56        91.39      91.11

3.5 Analysis of Computational Time

We assessed the elapsed time of each program using two MATLAB functions, tic and toc. The computational time analysis for all four employed classifiers is depicted in Fig. 6, which reveals that LDA, DT, KNN, and Cubic SVM consume 2.7310 ± 2.0297, 3.1002 ± 1.8578, 2.5138 ± 1.9354, and 3.2910 ± 1.8958 s (mean ± standard deviation), respectively.

3.6 Discussion

Table 2 highlights that Cubic SVM surpasses the other deployed classifiers under 10-fold cross validation, with a sensitivity of 91.11%, a specificity of 95.56%, a precision of 91.39%, and an accuracy of 91.11%. DT, on the other hand, provides the worst performance, with a sensitivity of 62.22%, a specificity of 81.11%, a precision of 63.55%, and an accuracy of 62.22%, allowing us to state that Cubic SVM delivers the best classification performance of all the selected classifiers. The outcomes in Table 3 imply that our GW+Texture+Cubic SVM approach is extremely resilient, which could be for two basic reasons. First, we proposed a novel feature extraction strategy that provides a productive collection of features for our disease detection system by integrating GW and texture characteristics. Second, Cubic SVM is a powerful way to improve classification results by evaluating detailed directional data.

Fig. 6. Comparison of the classifiers' required computational time (unit: seconds).

The plots in Fig. 6 illustrate that the computation times for all employed classifiers are satisfactory and fairly close. In addition, as model training will not be necessary for each new instance, calculation time will be reduced even further in practice. We employed a self-collected database with a limited collection of pictures to examine the performance of our retrieved GW-based texture features for classifying mungbean leaf lesions, which is the major shortcoming of our study. In the future, we plan to use additional affected mungbean leaf images to assess our GW+Texture+Cubic SVM approach. Moreover, in addition to other classification approaches for evaluating the overall performance of the extracted features, there are more highly developed transforms, such as the Shearlet Transform, that have the potential to outperform GW, which we would like to explore in future research.

4 Conclusion

Our study introduces a Gabor wavelet based fused texture feature extraction algorithm for early disease diagnosis in mungbean leaves. Traditional texture features are substituted with GW-based texture features in order to categorize disease classes more correctly with overall enhanced performance. Comparative experimental findings revealed that our GW+Texture+Cubic SVM method outperforms three other current techniques, demonstrating our method's reliability and robustness.


Acknowledgement. This study was carried out with the collaboration of the CRG of PIU-BARC, NATP-2, Asi@Connect, and the TEIN society. Special thanks to Sudipto Baral and Manish Sah from the CSE 12th batch, Patuakhali Science and Technology University, Patuakhali, Bangladesh, for their efforts and assistance in preparing the dataset.

References

1. Akhtar, A., Khanum, A., Khan, S.A., Shaukat, A.: Automated plant disease analysis (APDA): performance comparison of machine learning techniques. In: 2013 11th International Conference on Frontiers of Information Technology, pp. 60–65. IEEE (2013)
2. Albregtsen, F., et al.: Statistical texture measures computed from gray level co-occurrence matrices. Image Processing Laboratory, Department of Informatics, University of Oslo 5(5) (2008)
3. Ashok, S., Kishore, G., Rajesh, V., Suchitra, S., Sophia, S.G., Pavithra, B.: Tomato leaf disease detection using deep learning techniques. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 979–983. IEEE (2020)
4. Ashourloo, D., Aghighi, H., Matkan, A.A., Mobasheri, M.R., Rad, A.M.: An investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9(9), 4344–4351 (2016)
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018)
6. Jain, U., Nathani, K., Ruban, N., Raj, A.N.J., Zhuang, Z., Mahesh, V.G.: Cubic SVM classifier based feature extraction and emotion detection from speech signals. In: 2018 International Conference on Sensor Networks and Signal Processing (SNSP), pp. 386–391. IEEE (2018)
7. Li, L., Fieguth, P.W., Kuang, G.: Generalized local binary patterns for texture classification. In: BMVC, vol. 123, pp. 1–11 (2011)
8. Mäenpää, T., Pietikäinen, M.: Texture analysis with local binary patterns. In: Handbook of Pattern Recognition and Computer Vision, pp. 197–216. World Scientific (2005)
9. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
10. Rahman, M., Liu, S., Lin, S., Wong, C., Jiang, G., Kwok, N.: Image contrast enhancement for brightness preservation based on dynamic stretching. Int. J. Image Process. 9(4), 241 (2015)
11. Rangarajan, A.K., Purushothaman, R., Ramesh, A.: Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput. Sci. 133, 1040–1047 (2018)
12. Shijie, J., Peiyi, J., Siping, H., et al.: Automatic detection of tomato diseases and pests based on leaf images. In: 2017 Chinese Automation Congress (CAC), pp. 2537–2510. IEEE (2017)
13. Singh, S., Kumar, R.: Histopathological image analysis for breast cancer detection using cubic SVM. In: 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 498–503. IEEE (2020)


14. Soares, J.V., Cesar Jr., R.M.: Segmentation of retinal vasculature using wavelets and supervised classification: theory and implementation. In: Automated Image Detection of Retinal Pathology, pp. 239–286. CRC Press (2009)
15. Soares, J.V., Leandro, J.J., Cesar, R.M., Jelinek, H.F., Cree, M.J.: Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans. Med. Imaging 25(9), 1214–1222 (2006)
16. Tang, X.: Texture information in run-length matrices. IEEE Trans. Image Process. 7(11), 1602–1609 (1998)
17. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 161, 272–279 (2019)
18. Trivedi, V.K., Shukla, P.K., Pandey, A.: Hue based plant leaves disease detection and classification using machine learning approach. In: 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), pp. 549–554. IEEE (2021)
19. Wei, L., Hong-ying, D.: Real-time road congestion detection based on image texture analysis. Procedia Eng. 137, 196–201 (2016)
20. Zuiderveld, K.J.: Contrast limited adaptive histogram equalization. In: Graphics Gems (1994)

Potato Disease Detection Using Convolutional Neural Network: A Web Based Solution

Jannathul Maowa Hasi and Mohammad Osiur Rahman

Department of Computer Science and Engineering, University of Chittagong, Chattogram 4331, Bangladesh
[email protected], [email protected]

Abstract. Despite being reliant on agriculture for the provision of food, many nations, including Bangladesh, struggle to feed their populations adequately. The potato (Solanum tuberosum) is Bangladesh's second-most popular and in-demand crop, but the deadly diseases late blight and early blight cause an enormous loss in potato production. To increase plant yields, it is important to identify the symptoms of these diseases at an early stage and advise farmers on how to respond. This project involves the development of a web application that allows users to upload images of potato leaves and then uses a trained CNN model to diagnose the disease from those images. After comparing different convolutional neural network models (EfficientNetB0–B3, MobileNetV2, DenseNet121, and ResNet50V2), MobileNetV2 reached an accuracy of 96.14% on the test dataset in detecting early blight and late blight, so MobileNetV2 is deployed in the web application to detect the disease in the input image. The developed web application offers a user-friendly interface that enables farmers who are less tech-savvy to identify disease at an early stage and prevent it.

Keywords: Deep Learning · CNN · Potato Disease · Late Blight · Early Blight · Disease Detection

1 Introduction

No food, no life. In Bangladesh, more than 40 million people (27 percent of the population) are food insecure, with more than 11 million suffering from intense hunger [1]. In Bangladeshi farming, the potato (Solanum tuberosum) is one of the most in-demand and widely cultivated plants. Bangladesh is the world's seventh largest potato grower, and potato is the second most produced crop in Bangladesh behind rice [2]. But, affected by deadly diseases like late blight and early blight, Bangladeshi potato cultivation faces a great loss every year. According to the Department of Agricultural Extension (DAE), with an annual average demand of around 70 lakh tonnes, there was a surplus of about 40 lakh tonnes of potatoes in 2020 [22].

Late blight and early blight are the two most common and destructive diseases of potatoes. Alternaria solani is the fungus that causes early blight in potatoes. The disease damages leaves, stems, and tubers, reducing yield, tuber size, tuber storability, fresh-market and processed tuber quality, and crop marketability [3]. Every year, early blight wreaks havoc on the potato crop, producing premature defoliation, a major reduction in tuber output, and a quality loss of up to 50% [4]. Late blight (caused by the water mold Phytophthora infestans) is the most destructive fungal disease in potatoes. Late blight causes the loss of almost three million hectares of potato fields globally [5]. Bangladesh also loses 25–27% of its potato production each year to this disease according to the FAO (Food and Agriculture Organization of the United Nations) [6]. The effect of applying remedies like fungicides against potato blights depends highly on the timing of the application. After infecting certain plants, this disease can destroy the plants of the entire field in no time [7], and the decay continues even after uprooting all the infected plants. To improve the efficiency of controlling late blight, detecting infection effectively at the primary stage and informing the farmer about the prediction is mandatory. The fungi responsible for late and early blight enter through the leaves, so the symptoms appear on the leaves first (Fig. 1).

Fig. 1. Healthy, Late Blight and Early Blight Infected Potato Leaves

Traditionally, farmers depend on their eyesight and visual inspection to detect the symptoms, but this is not reliable as humans are not error-free. Fortunately, in recent years many researchers have used CNNs to detect plant disease with truly significant results [8].

2 Related Work

Earlier, many researchers worked with CNNs to develop plant disease classification systems and recommended various approaches. Tiwari et al. [9] proposed a deep learning model to detect potato leaf diseases. The authors used the pre-trained models InceptionV3, VGG16, and VGG19 to extract the features of an input image, and KNN, SVM, neural network, and logistic regression classifiers to classify it. According to that study, the optimal result was found with the VGG19 model and the logistic regression classifier.

Emma Harte [21] worked on a plant disease recognition system using CNN. The author used ResNet34 and optimized it to get a better result while dealing with 'in-field' images. The great irony of this work is that, although the author's model showed high accuracy when tested with laboratory-controlled images, the same model showed an accuracy of only 44% on images collected from fields or other uncontrolled sources. Moreover, the author implemented a web-based classification system with that optimized model.

Agarwal et al. [10] presented a deep learning based technique to detect early and late blight diseases in potatoes. Their experiments show that the proposed model works even in difficult conditions like changing backdrops, image sizes, spatial differentiation, high-frequency variation in illumination grades, and real-world photos. The proposed convolutional neural network architecture includes four convolution layers, with 32, 16, and 8 filters, and provides training and testing accuracy of 99.47% and 99.8%, respectively. Afzaal, H. et al. [11] trained three convolutional neural networks, GoogleNet, VGGNet, and EfficientNet, using the PyTorch framework to detect early blight with AI (artificial intelligence). They achieved validation accuracy in the range of 0.95–0.97 for all three candidates, with EfficientNet and VGGNet performing better than GoogleNet. Rashid, J. et al. [12] developed a multi-level deep learning model to detect potato leaf diseases. The first level uses the YOLOv5 image segmentation technique to extract potato leaf images from potato plants; the second level uses a PDDCNN (Potato Disease Detection Convolutional Neural Network) to classify the input images as healthy, late blight infected, or early blight infected. The model achieves an average accuracy of 91.15%.

3 Proposed System

Figure 2 illustrates the block diagram of the proposed system. The aim is to develop a web application using the convolutional neural network model MobileNetV2, which proved to be the best-performing model when seven CNN models (ResNet50V2, MobileNetV2, DenseNet121, and EfficientNetB0–B3) were compared on the same dataset. First, the CNN model is trained with the selected dataset; the trained model is then used to implement the web application. The application takes a potato leaf image as input, which is then preprocessed, after which feature extraction takes place. The extracted features drive the classification process, and finally the classification result, which is the name of the predicted class, is shown to the application user on the web interface.

3.1 Methodology

This research has two main parts: comparison of models and implementation of the web application to detect potato diseases. For the comparison, some well-known convolutional neural network (CNN) models were trained on the dataset using TensorFlow's Keras framework, and the models were compared with respect to their performance. After the comparison, the best candidate model was selected for deployment to the web application. The pretrained model is integrated with the web app, developed using a Flask API, to classify the inputted image.


Fig. 2. Block Diagram of the System
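As a concrete illustration of the training setup described above, the following sketch fine-tunes a pre-trained MobileNetV2 with TensorFlow/Keras using the hyperparameters reported in Sect. 4.1 (RMSprop, learning rate 0.0001, batch size 6, 10 epochs, softmax head). The directory layout, seed, and output file name are assumptions, not part of the original work.

```python
import tensorflow as tf

# Hypothetical directory layout: data/{Early_Blight,Healthy,Late_Blight}
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=6, label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data", validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=6, label_mode="categorical")

# Pre-trained MobileNetV2 backbone with a new 3-class softmax head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
model.save("mobilenetv2_potato.h5")  # hypothetical output path
```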

CNNs have multiple layers, so when an image passes through the CNN layers it technically gets deeper while getting smaller at every layer [13]. So, when the leaf image passes each layer, it gets deeper and smaller, and the most important features are filtered out. MobileNetV2 was introduced to gain better performance on mobile devices [15]. The basic architecture of MobileNetV2 consists of 17 building blocks in a row followed by a 1 × 1 convolution layer, an average pooling layer, and a classification layer, making the model 53 layers deep [14].

3.2 Dataset Description

All images of potato leaves are collected from the New Plant Disease Dataset [16]. This dataset contains a total of 87.9k images in 38 subdirectories with test and train classes. To carry out the research, the dataset is divided into 80% for training and 20% for validation. Table 1 shows the number of samples under the Early_Blight, Healthy, and Late_Blight classes after the division. Sample images from the dataset are shown in Fig. 3.

Table 1. Dataset Description

Class         Train  Test  Total
Healthy       1824   456   2280
Late_Blight   1939   485   2424
Early_Blight  1939   485   2424

Fig. 3. Sample images from Healthy, EarlyBlight and LateBlight classes

3.3 Data Preprocessing

Images from the selected dataset are preprocessed to get better classification results. In the preprocessing phase the input image was resized to (256, 256) for EfficientNetB0, DenseNet121, ResNet50V2, and MobileNetV2, (240, 240) for EfficientNetB1, (260, 260) for EfficientNetB2, and (300, 300) for EfficientNetB3. For data augmentation, three different versions (excluding the original) of each image from the New Plant Disease Dataset are used to ensure that the models are not biased while classifying a sample [16].


Fig. 4. Different version of leaf images
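For readers who want to generate additional image versions themselves rather than use the pre-augmented images shipped with the dataset, a minimal Keras augmentation pipeline is sketched below. The specific transform parameters are illustrative assumptions, not the dataset's original augmentation settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation: random variants of each leaf image
augmenter = ImageDataGenerator(
    rotation_range=25,              # random rotations
    horizontal_flip=True,           # mirrored leaves
    zoom_range=0.2,                 # slight zoom in/out
    brightness_range=(0.8, 1.2),    # lighting variation
    rescale=1.0 / 255)              # normalize pixel values

# Hypothetical directory layout: data/train/{Early_Blight,Healthy,Late_Blight}
train_gen = augmenter.flow_from_directory(
    "data/train", target_size=(256, 256), batch_size=6,
    class_mode="categorical")
```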

3.4 Evaluation Measures

The following evaluation measures [17] are used to evaluate the convolutional neural network model that is integrated into the web application.

Accuracy

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Number of total predictions}} \tag{1}$$

Latency. The time taken by the model to process one unit of data, measured in seconds.

Precision

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$

Recall

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$

F1-Score

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
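Equations 1–4 plus the latency measure map directly onto standard scikit-learn calls, as sketched below. The names `model`, `test_images`, and `y_true` are hypothetical placeholders for the trained Keras classifier and the held-out test set.

```python
import time
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Measure total inference time to derive per-image latency
start = time.perf_counter()
probs = model.predict(test_images)
elapsed = time.perf_counter() - start

y_pred = np.argmax(probs, axis=1)
print("Accuracy :", accuracy_score(y_true, y_pred))                    # Eq. 1
print("Precision:", precision_score(y_true, y_pred, average="macro"))  # Eq. 2
print("Recall   :", recall_score(y_true, y_pred, average="macro"))     # Eq. 3
print("F1-score :", f1_score(y_true, y_pred, average="macro"))         # Eq. 4
print("Latency  :", elapsed / len(test_images), "s per image")
```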

3.5 UML Use Case Diagram of Proposed Web Application

UML use case diagrams are used to capture a system's requirements, including those resulting from both internal and external influences. When a system is analysed to determine its functionality, use cases are created and actors are identified [18]. The user is the actor in the web application, and the three primary features are uploading an image, viewing predictions, and watching associated videos. The use case diagram of the proposed web app is shown in Fig. 5.


Fig. 5. UML Use Case Diagram

3.6 UML Activity Diagram of Proposed Web Application

An activity diagram is a behavioural diagram that shows a system's behaviour [18]. The system's activity diagram shows the control flow from beginning to end along with every action being performed. The activity diagram of the proposed system is shown in Fig. 6.

4 Experimental Result

In this section, the comparison of CNN models, the model evaluation, the user interface of the web application, a sample prediction of the web application, and a comparison with other works are described in detail.

4.1 Comparison of CNN Models

The results of the comparison of the CNN models EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, MobileNetV2, DenseNet121, and ResNet50V2 are shown in Table 2. A comparison of the accuracy and training time of those models is also shown in Fig. 7. The hyperparameters used in the comparison were:

– Activation = Softmax
– Class Mode = Categorical
– Optimizer = RMSprop
– Classifier = Softmax
– Input Shape = (224, 224, 3) for EfficientNetB0, DenseNet121, ResNet50V2, and MobileNetV2; (240, 240, 3) for EfficientNetB1; (260, 260, 3) for EfficientNetB2; (300, 300, 3) for EfficientNetB3

Fig. 6. UML Activity Diagram

Fig. 7. Comparison of accuracy and training time

After the comparison, MobileNetV2, EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, DenseNet121, and ResNet50V2 reached 96.14%, 93.55%, 91.94%, 92.78%, 91.37%, 92.85%, and 82.68% accuracy, respectively. Considering latency, EfficientNetB1 (0.0301 s) takes the lead over MobileNetV2 (0.0336 s), but MobileNetV2 beats EfficientNetB1 with a training time of 16.8 s. Moreover, MobileNetV2 was designed to get better results on mobile devices. Considering all of this, the comparison makes MobileNetV2 the best candidate.

Table 2. Comparison of CNN models. All experiments used batch size 6, 10 epochs, and a learning rate of 0.0001; per-class precision, recall, and F1-scores were also recorded for the Early Blight, Healthy, and Late Blight classes.

Experiment no.  Model           Latency   Accuracy
1               EfficientNetB0  0.0396 s  0.9355
2               DenseNet121     0.0408 s  0.9285
3               ResNet50V2      0.0385 s  0.8268
4               MobileNetV2     0.0336 s  0.9614
5               EfficientNetB1  0.0301 s  0.9194
6               EfficientNetB2  0.0358 s  0.9278
7               EfficientNetB3  0.0482 s  0.9137


4.2 Model Evaluation

A confusion matrix evaluates a model's performance through its ability to correctly classify samples as positive or negative. Figure 8 shows the confusion matrix of MobileNetV2.

Fig. 8. Confusion Matrix & performance matrix of the model

Accuracy, precision, recall, F1-score, etc. help us understand how well the model actually performs while classifying images. The model loss measures how well a model performs during classification, whereas the accuracy measures how close a model's predictions are to the actual data. After each optimization iteration, a model's performance is reflected by its loss value [19] (Fig. 9).

Fig. 9. Model Accuracy and Model Loss of MobileNetV2

4.3 User Interface of the Web Application

The user interface of the implemented web application is shown in Fig. 10. It contains two buttons: one for selecting the leaf image to upload from the device's storage, and the other to predict the disease by analyzing the input image.


Fig. 10. User Interface of the web app

4.4 Sample Prediction of the Web Application

The proposed system predicts whether the uploaded image is healthy or infected by late blight or early blight and shows the prediction as the result. The prediction takes place when the user clicks the predict button; the predicted class name is then shown at the bottom of the web page alongside a preview of the uploaded image, some preventative measures, and some disease-related videos to guide the user's next steps. Figure 11 shows a sample output of the system.

46

J. M. Hasi and M. O. Rahman

Fig. 11. Sample output of the system
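The upload-and-predict flow of Sects. 4.3–4.4 can be sketched as a minimal Flask endpoint around the trained model. This is an illustrative reconstruction, not the authors' code: the form field name, template file, model path, and class ordering are all assumptions.

```python
import numpy as np
import tensorflow as tf
from flask import Flask, request, render_template
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("mobilenetv2_potato.h5")  # hypothetical path
CLASSES = ["Early_Blight", "Healthy", "Late_Blight"]          # assumed ordering

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded leaf image and prepare it for the model
    img = Image.open(request.files["image"]).convert("RGB").resize((224, 224))
    batch = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.expand_dims(np.array(img, dtype=np.float32), axis=0))
    probs = model.predict(batch)[0]
    # Render the predicted class name back to the web page
    return render_template("result.html",
                           prediction=CLASSES[int(np.argmax(probs))])

if __name__ == "__main__":
    app.run()
```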

4.5 Comparison with Other Works

Many authors have previously worked on detecting plant diseases. Although the outcome of this work is a web application that helps users detect disease by simply capturing an image, which is the main point to be compared, a comparative performance analysis with some existing works is shown in Table 3.

Table 3. Comparative Analysis with Other Works

Reference      Plant     Used Approach                                  Accuracy  Remarks
[9]            Multiple  ResNet34                                       44%       Better accuracy with field-captured images
[12]           Potato    YOLOv5 image segmentation and multilayer CNN   91.15%    Comparatively better accuracy
[10]           Potato    Multilayer CNN model with 4 layers             99.45%    No proof that the proposed method is better than MobileNetV2 on mobile devices
Proposed Work  Potato    MobileNetV2                                    96.14%    Final outcome is a handy, user-friendly web app; better accuracy with mobile devices


5 Conclusion

Early detection of plant diseases is essential for maintaining crop quality. In comparison to hand-crafted-feature-based approaches, deep learning techniques, notably convolutional neural network architectures, show promising outcomes. Potato is one of the major crops of Bangladesh, but due to certain diseases the country faces a massive loss in potato production each year. Detecting late blight and early blight, the two deadliest diseases of potatoes, at an earlier stage can help reduce the production loss. In this research, several CNN models (MobileNetV2, ResNet50V2, DenseNet121, and EfficientNetB0–B3) were trained with the New Plant Disease Dataset [16], containing thousands of potato images. MobileNetV2 reached the best performance with 96.14% test accuracy, so the trained MobileNetV2 model was deployed and integrated into a web application that takes a potato leaf image as input, detects whether the leaf is healthy, and predicts the disease otherwise. In the future, there is a plan to develop an organized dataset by collecting images through field studies of Bangladeshi farms and gardens. Working with other deadly diseases of potato (for example, Common Scab (Streptomyces spp.) and Fusarium Dry Rot (Fusarium spp.)) and other major vegetables and crops (for example, Rice Blast of rice (Oryza sativa), Fusarium Wilt of lentil (Lens culinaris) [20], and Stemphylium leaf blight of onion (Allium cepa)) is also among the future aspects of this research. Nowadays, almost everyone has a mobile device in their hand, so using a cellphone to detect plant disease by simply capturing an image of the leaves can act as a lifesaver for our farmers. Hence, it is recommended to use this system to detect potato diseases at an early stage and take proper steps to reduce the damage caused by these deadly diseases.

References

1. GHI: Global Hunger Index for Bangladesh (2021). https://www.globalhungerindex.org/bangladesh.html. Accessed 4 May 2022
2. BBS, Bangladesh Bureau of Statistics (BBS): Agricultural Statistics Yearbook, 2012–2013
3. Bauske, M.J., Robinson, A.P.: Early blight in potato (2018). https://www.ag.ndsu.edu/publications/crops/early-blight-in-potato
4. Landschoot, S., Vandecasteele, M., De Baets, B., Hofte, M., Audenaert, K., Haesaert, G.: Identification of A. arborescens, A. grandis, and A. protenta as new members of the European Alternaria population on potato. Fung. Biol. 121, 172–188 (2017). https://doi.org/10.1016/j.funbio.2016.11.005
5. The Daily Star: Potato freed from deadly disease. https://www.thedailystar.net/frontpage/potato-freed-deadly-disease-209158. Accessed 29 Jan 2016, 21 Nov 2021
6. Hengsdijk, H., van Uum, J.: Geodata to control potato late blight in Bangladesh. https://www.fao.org/e-agriculture/news/geodata-control-potato-late-blight-bangladesh-geopotato. Accessed 15 Mar 2017
7. Pande, A., Jagyasi, B.G., Choudhuri, R.: Late blight forecast using mobile phone based agro advisory system. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds.) PReMI 2009. LNCS, vol. 5909, pp. 609–614. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-11164-8_99
8. Toda, Y., Okura, F.: How convolutional neural networks diagnose plant disease. Plant Phenom. 2019, 1–14 (2019). https://doi.org/10.34133/2019/9237136


9. Tiwari, D., Ashish, M., Gangwar, N.: Potato leaf diseases detection using deep learning. IEEE (2020). 978-1-7281-4876-2/20/$31.00
10. Agarwal, M., Sinha, A., Gupta, S.K., Mishra, D., Mishra, R.: Potato crop disease classification using convolutional neural network. In: Somani, A.K., Shekhawat, R.S., Mundra, A., Srivastava, S., Verma, V.K. (eds.) Smart Systems and IoT: Innovations in Computing. Smart Innovation, Systems and Technologies, vol. 141, pp. 391–400. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8406-6-37
11. Afzaal, H., et al.: Detection of a potato disease (early blight) using artificial intelligence. Remote Sens. 13, 411 (2021). https://doi.org/10.3390/rs13030411
12. Rashid, J., Khan, I., Ali, G., Almotiri, S.H., AlGhamdi, M.A., Masood, K.: Multi-level deep learning model for potato leaf disease recognition. Electronics 10(17), 2064 (2021). https://doi.org/10.3390/electronics10172064
13. Géron, A.: Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Chap. 14. O'Reilly (2019). https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/
14. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
15. Hollemans, M.: MobileNet version 2. https://machinethink.net/blog/mobilenet-v2/. Accessed 22 Apr 2018
16. Bhattarai, S.: New Plant Disease Dataset (2018). https://www.kaggle.com/vipoooool/new-plant-diseases-dataset
17. Gad, A.F.: Evaluating deep learning models: the confusion matrix, accuracy, precision, and recall. https://blog.paperspace.com/deep-learning-metrics-precision-recall-accuracy/. Accessed 20 May 2021
18. Sommerville, I.: Software Engineering, 9th edn., Chap. 7 (2010). ISBN-13: 978-0-13-703515-1
19. Riva, M.: Interpretation of loss and accuracy for a machine learning model (2021). https://www.baeldung.com/cs/ml-loss-accuracy
20. Tiwari, N., Ahmed, S., Kumar, S., Sarker, A.: Fusarium Wilt: A Killer Disease of Lentil (2018). https://doi.org/10.5772/intechopen.72508
21. Harte, E.: Plant disease detection using CNN (2020). https://doi.org/10.13140/RG.2.2.36485.99048
22. Potato freed from deadly disease. https://www.thedailystar.net/frontpage/potato-freed-deadly-disease-209158. Accessed 25 July 2022

Device-Friendly Guava Fruit and Leaf Disease Detection Using Deep Learning

Rabindra Nath Nandi1, Aminul Haque Palash1, Nazmul Siddique2, and Mohammed Golam Zilani3

1 BJIT Limited, Dhaka, Bangladesh
[email protected]
2 School of Computing, Engineering and Intelligent Systems, Ulster University, Coleraine, UK
[email protected]
3 Swinburne Online Australia, Melbourne, Australia
[email protected]

Abstract. This research presents machine learning models that detect plant disease from images of fruits and leaves. Five state-of-the-art models are used, achieving high accuracy in detecting plant disease. The problem with such high-accuracy models is their large size, which prevents them from being deployed on end-user devices. In this research, model quantization techniques such as float16 and dynamic range quantization are applied to the model architectures. The experimental results show that the GoogleNet model reached a size of 0.143 MB with an accuracy of 97%, and the EfficientNet model reached a size of 4.2 MB with an accuracy of 99%. The source code is available at https://github.com/CompostieAI/Guava-diseasedetection.

Keywords: Fruits and leaf disease detection · Guava disease · convolutional neural network · model quantization · model size reduction

1 Introduction

Agriculture is a key player in sustainable development and global food security, which is becoming more challenging with the growing world population. Agriculture faces crop loss every year due to drought, pests, and plant diseases. Plant disease is a major threat to plant health and the production of major crops, and causes economic loss in agriculture every year. Some plant pathogens can spread from plant to plant very fast; therefore, early diagnosis of plant disease is of key importance to prevent disease spread and protect crop production. Pathological testing for the diagnosis of plant disease is very often time consuming due to sample collection, processing, and analysis. Moreover, pathological laboratory facilities are not available in many parts of the country. An alternative to pathological testing is the traditional visual assessment of plant symptoms. The traditional method requires an experienced expert in the domain, but the visual assessment method


is too subjective, and the domain expert may not always be available in remote areas. Moreover, there are variants of disease due to variations in plant species and climate change, which can make it difficult for an expert to diagnose plant disease correctly. Machine learning methods are widely employed for plant disease diagnosis with high accuracy [1–3]. The use of deep learning [4], a class of machine learning algorithms, is increasing due to its promising results in numerous applications including agriculture and big data. One study shows that a deep CNN (convolutional neural network) provides an accuracy of 99.35% on a dataset containing 54,000 images of 26 different types of diseases across 14 crops; many such deep learning methods are web-based applications that do not consider the end-user's device, leading to degraded system performance [5]. A similar study on a dataset containing 7,000 images of 25 different plant categories used multiple DCNN architectures and achieved a highest accuracy of 99.53%, suggesting the use of the model for real-time plant disease identification [6]. Unfortunately, these researchers did not investigate model size complexity and application overhead for real-time plant disease detection.

Deep architectures are usually large, and inference time depends directly on the number of parameters of the architecture. A trained model of considerable size is sometimes not feasible for mobile-based applications where the hardware does not support executing the model in real time [7]. Edge AI refers to the use of AI models for prediction on edge devices, and deploying highly efficient large deep models on edge devices is currently an active research area [8]. The first requirement of Edge AI is model size optimization: optimized models are smaller and suitable for deployment on edge devices. Google provides useful tools and techniques for model quantization, compression, and conversion to a lite version [9].

This research investigates different deep learning models for detecting fruit and leaf disease based on an available dataset. Two popular quantization techniques are employed: 1) float16 quantization and 2) dynamic range quantization. The model performances are verified after quantization and found promising. The empirical investigations show that the optimized models are feasible for the affordable, low-specification smartphones available in Bangladesh. The primary contributions of this study are: i) the development of deep learning models for fruit and leaf disease diagnosis, ii) the optimization of those deep learning models, and iii) the conversion of the models into optimized TF-Lite versions applicable to smartphone applications. The rest of the paper is organized as follows: Sect. 2 describes related works, Sect. 3 describes the dataset, Sect. 4 describes the method, Sect. 5 presents the experiments, and Sect. 6 concludes the paper with a few directions for future work.

2 Related Works

Several studies have used both machine learning and deep learning for guava disease detection. The generic procedure for plant disease detection consists of several stages: image acquisition, labeling, feature extraction, feature fusion, feature selection, and disease classification. Image acquisition is the first step of image processing.


It is done with a high-resolution digital camera, followed by labeling for classification. Feature extraction is a crucial part that refers to transforming raw data (i.e., images) into numerical features usable by a machine learning algorithm. The most widely used features are color features, the Local Binary Pattern (LBP) [10], the Gray Level Co-Occurrence Matrix (GLCM) [11], and the Scale Invariant Feature Transform (SIFT) [12]. Image segmentation [13], the process of partitioning an image into multiple segments or sets of pixels, is often applied before feature extraction.

Deep convolutional neural networks (DCNNs) are also used for disease detection. Bhushanamu et al. [14] used a temporal CNN where contour detection is first applied to detect the shape of the leaf and a Fourier feature descriptor is used as the feature input to a 1D CNN. Mostafa et al. [15] used five different CNN structures, ResNet-50, ResNet-101, AlexNet, SqueezeNet, and GoogLeNet, to identify different guava diseases. A CNN-based plant disease identification model was developed by Mohanty et al. [5] which can detect 26 diseases of 14 crop species. An attention mechanism was developed by Yu et al. [16] that highlights the leaf area and captures more discriminative features. Jalal et al. [17] used a DNN to develop a detection system for apple leaf diseases, where SURF is used for feature extraction and an evolutionary algorithm is used for feature optimization. A leaf disease model based on MobileNet is developed in [18], and the performance of MobileNet is evaluated and compared with ResNet152 and InceptionV3. In all of these studies, deep learning models are developed and trained for plant disease identification, but there has been no proper study on model optimization, i.e., reducing model size to suit end-user device applications.

3 Dataset

Guava belongs to the Myrtaceae plant family and is a common tropical fruit cultivated in many tropical and subtropical regions like Bangladesh, India, Pakistan, Brazil, and Cuba [15]. Guavas are incredibly delicious and rich in antioxidants, vitamin C, potassium, calcium, nicotinic acid, and fiber. The data was gathered in the middle of 2021 by an expert team from Bangladesh Agricultural University at a reasonably sized guava plantation in Bangladesh. A digital SLR camera was used to capture the images, with no preprocessing applied [19]. The dataset includes four prevalent diseases, Red Rust, Scab, Stylar End Rot, and Phytophthora, as well as disease-free leaves and fruits. A fungus-like pathogen called Phytophthora causes a fruit disease that appears as black blemishes on young fruits. Red Rust is a guava leaf disease caused by fungus. Differently shaped lesions, e.g. ovoid, corky, and spherical, on the surface of guava fruits are the signs of scab, which is also fungal. Stylar End Rot begins at the stylar end and spreads towards the root of guava fruits, as shown in Fig. 1.

The dataset comprises two parts: an original dataset and an augmented dataset. The original dataset contains 681 samples and the augmented dataset contains 8525 samples. The class-wise data distribution is provided in Table 1. The minimum number of samples is 87, for Red Rust, and the maximum is 154, belonging to Disease-free (fruit), so the original dataset is roughly balanced. The augmented dataset is about 10 times larger than the original dataset, with class sizes ranging from 1264 to 1626. For the experiments, the original and augmented datasets are combined.

Table 1. Disease-wise data distribution for both the original and augmented datasets

Disease Name          Original Data  Augmented Data
Phytophthora          114            1342
Red Rust              87             1554
Scab                  106            1264
Stylar End Rot        96             1463
Disease-free (leaf)   126            1276
Disease-free (fruit)  154            1626

Fig. 1. Samples of images containing the four types of diseases (Red Rust, Stylar End Rot, Phytophthora, Scab), a disease-free leaf, and a disease-free fruit.

4 Method

There are two objectives of this research. The primary objective is to detect the disease from the samples with high accuracy, and the secondary objective is to optimize the model without degrading detection performance. The work comprises two parts: 1) model training and 2) model optimization. The overall architecture of the proposed system is illustrated in Fig. 2.

4.1 Model Training

The dataset is first divided into training, validation, and test sets, as the source images are not provided in a grouped manner. As part of the image pre-processing, the images are resized; their original orientation and colors are kept. No special feature extraction is used before the model. Five prominent image classification models are used: VGG-16 [20], GoogleNet [21], ResNet-18 [22], MobileNet-v2 [23], and EfficientNet [24]. The model parameters are described in Table 2: VGG-16 has the highest number of parameters at 138,357,544, while MobileNet-v2 has 2,230,277 parameters, the lowest among these models. A common experimental setup is used, and the pretrained models are fine-tuned on the guava dataset. During model training, the models are evaluated on both training and validation data, and after training, the models are saved and validated with the test dataset.

The parameters of the listed models are large in number, and model size grows with the number of parameters, which adversely impacts inference time, battery consumption, and device storage. Hence, model optimization is needed to compress the model for suitable use on edge devices.


Fig. 2. The overall workflow of our system: data splitting, model training and performance analysis, model optimization, and further performance analysis of the optimized models.

Table 2. Number of trainable parameters of different models on the ImageNet dataset

Model            Trainable Parameters
VGG-16           138,357,544
GoogleNet        5,605,029
ResNet-18        11,179,077
MobileNet-v2     2,230,277
EfficientNet-b2  4,013,953

There are mainly four types of quantization techniques: quantization-aware training, post-training float16 quantization, post-training dynamic range quantization, and post-training integer quantization [17]. A summary of the features of the post-training quantization techniques is presented in Table 3, and the post-training quantization approach is illustrated in Fig. 3.


Table 3. Summary of three types of post-training quantization techniques

Technique                   Benefits                      Hardware
Dynamic Range Quantization  4x smaller, 2x-3x speedup     CPU
Full Integer Quantization   4x smaller, 3x+ speedup       CPU, Edge TPU, Microcontrollers
Float16 Quantization        2x smaller, GPU acceleration  CPU, GPU

In float16 quantization, the model constants, such as weights and bias values, are converted from full-precision floating point (32-bit) to a reduced-precision floating-point data type (IEEE FP16), which reduces model size by up to 50%. Dynamic range quantization reduces model size by up to 75% by quantizing the 32-bit floating-point weights down to 8 bits; additionally, it employs "dynamic-range" operators that dynamically quantize activations to 8 bits according to their range and carry out computations with 8-bit weights and activations. Full integer quantization requires calibrating all floating-point tensor ranges to obtain their min-max values; a representative dataset is needed to estimate the min-max values of variable tensors such as the model input, activations, and model output [17].
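For concreteness, the following sketch shows how the two post-training quantization modes used in this study could be applied with the TensorFlow Lite converter. Here `model` stands for any trained Keras model (e.g., a fine-tuned MobileNet-v2) and the file names are illustrative assumptions, not the authors' exact code.

```python
import tensorflow as tf

def quantize(model, mode):
    """Convert a trained Keras model to TF-Lite with post-training quantization.

    mode: "float16" halves the weight precision (32-bit -> 16-bit floats);
          "dynamic" stores weights in 8 bits and quantizes activations at runtime.
    """
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if mode == "float16":
        converter.target_spec.supported_types = [tf.float16]
    # Full integer quantization would additionally set
    # converter.representative_dataset (not used in this study).
    return converter.convert()  # serialized .tflite flatbuffer (bytes)

# Hypothetical usage: write both variants so their sizes can be compared.
# for mode in ("float16", "dynamic"):
#     open(f"model_{mode}.tflite", "wb").write(quantize(model, mode))
```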

Fig. 3. Three types of quantization techniques (Float16 Quantization, Dynamic Range Quantization, and Full Integer Quantization) with their operational details.

The quantized models are smaller than the original model; in theory, their performance should also degrade. To understand the effect of quantization, the quantized models are tested with the validation and test data sets. The next step is to compare the quantized and original models in terms of both size and accuracy. Finally, to balance the trade-off between size and performance, an optimal model is chosen for specific hardware. The selected model is in TF-Lite format, which can be used directly by Android and iOS applications through their convenient APIs; a minimal sketch of loading and testing such a model follows below.
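This sketch shows how a saved .tflite model could be size-checked and exercised with the TF-Lite interpreter, as a mobile app would do; the file name and the zero-filled input batch are assumptions for illustration.

```python
import os
import numpy as np
import tensorflow as tf

# Hypothetical file produced by the conversion step above.
path = "model_dynamic.tflite"
print(f"size on disk: {os.path.getsize(path) / 1e6:.3f} MB")

interpreter = tf.lite.Interpreter(model_path=path)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# `image` is an assumed preprocessed sample matching the model's input shape.
image = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
print("predicted class:", np.argmax(interpreter.get_tensor(out["index"])))
```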


5 Results and Discussion

Experiments have been carried out on all five models using pretrained weights without freezing any layers; only the decision layer is changed to match the number of guava disease classes. Stochastic Gradient Descent (SGD) is used, with the learning rate decayed by a factor of 0.1 every 7 epochs. For evaluation purposes, accuracy, precision, recall, and macro-F1 scores are used. Table 4 shows these scores for both the validation and test data. EfficientNet-b2 provides 99% accuracy and a 100% F1-score, which is the overall best result, but the other models also provide satisfactory results; GoogleNet, for instance, provides 97% accuracy and F1-score on the test data.

Table 4. Experimental results (Acc, Pr, Re, macro-F1) on validation and test data using different backbone CNN networks

                 Validation data            Test data
Model            Acc   Pr    Re    F1       Acc   Pr    Re    F1
VGG-16           0.97  0.97  0.96  0.96     0.97  0.98  0.97  0.98
GoogleNet        0.96  0.96  0.96  0.96     0.97  0.97  0.97  0.97
ResNet-18        0.96  0.97  0.96  0.96     0.97  0.97  0.97  0.97
MobileNet-v2     0.97  0.98  0.98  0.98     0.98  0.99  0.99  0.99
EfficientNet-b2  0.98  0.99  0.98  0.98     0.99  1.00  1.00  1.00

Table 5 shows the size reduction after applying the two quantization methods. The VGG-16 model has an original size of 512.3 MB and is reduced to 256.10 MB and 128.08 MB after applying float16 quantization and dynamic range quantization, respectively: a reduction of 50% for float16 quantization and 75% for dynamic range quantization. For the GoogleNet architecture, the original model size is 22.6 MB, and the reduced sizes are 0.668 MB and 0.143 MB using float16 and dynamic range quantization, respectively. With dynamic range quantization this is a reduction of over 99% (about 158 times smaller); the quantized model is less than 1% of the original size. For the ResNet-18 architecture, the reduction is from 44.8 MB to 22.4 MB and 1.7 MB for float16 quantization and dynamic range quantization, respectively; that is about a 96% reduction with dynamic range quantization, leaving a quantized model only 4% of the original size. For the MobileNet-v2 architecture, the original model size is 9.2 MB, and the quantized model sizes are 0.991 MB and 0.188 MB for float16 quantization and dynamic range quantization, respectively. MobileNet-v2 has the smallest original size of all the models, and with dynamic range quantization it becomes 49 times smaller still. The quantized EfficientNet-b2 models are 8.1 MB and 4.5 MB, where the original model size is 16.4 MB. Overall, dynamic range quantization reduces the model size more effectively than float16 quantization for all five models.


From Table 5, the quantized GoogleNet model size using dynamic range quantization, 0.143 MB, is the lowest among all the models.

Table 5. Experimental size comparison before and after applying quantization

Model            Optimization Method         Previous size (MB)  Optimized size (MB)
VGG-16           Float16 quantization        512.30              256.10
                 Dynamic range quantization  512.30              128.08
GoogleNet        Float16 quantization        22.60               0.668
                 Dynamic range quantization  22.60               0.143
ResNet-18        Float16 quantization        44.80               22.40
                 Dynamic range quantization  44.80               1.70
MobileNet-v2     Float16 quantization        9.20                0.991
                 Dynamic range quantization  9.20                0.188
EfficientNet-b2  Float16 quantization        16.40               8.10
                 Dynamic range quantization  16.40               4.50

From Table 6, it can be seen that the results do not change much from the original models to the quantized models; there is only a 1-2% change for a few models. The VGG-16 accuracy decreases from 97% to 96% and the F1-score from 98% to 96% with float16 quantization. For GoogleNet, quantization has no impact: the accuracy is 97% for both the original and quantized models. The accuracy of the ResNet-18 model is 97%, and its float16- and dynamic-range-quantized variants reach 96% and 95%, respectively. For the EfficientNet-b2 model, there is only a minor change in accuracy, precision, recall, and F1-score. From the performance perspective, EfficientNet-b2 is the best among the quantized models. The likely reason all models perform well is that the data quality is high and the guava dataset contains only a few classes. Therefore, the optimal choice is the GoogleNet model with dynamic range quantization when model size is the priority; if model size is not an issue, the EfficientNet-b2 model with dynamic range quantization is a good choice.

In this case study, model optimization techniques are explored, and their impact on performance, storage size, and memory is analyzed. Without optimization, MobileNet-v2 would be the best choice, as its 9.2 MB is the smallest original size; after quantization, however, the GoogleNet model with dynamic range quantization, at only 0.143 MB with an F1-score of 0.97, is overall the best candidate among the models described in Table 6. EfficientNet-b2 with dynamic range quantization can be considered for mobile applications if its 4.5 MB size is acceptable, as its F1-score of 0.99 is better than that of the quantized GoogleNet model.


Table 6. Experimental results (Acc, Pr, Re, macro-F1) on test data using different backbone CNN networks after applying model optimization, with size comparison to the original models

                                              Test data
Model            Optimization                Size (MB)  Acc   Pr    Re    F1
VGG-16           No optimization             512.30     0.97  0.98  0.97  0.98
                 Float16 quantization        256.10     0.96  0.96  0.96  0.96
                 Dynamic range quantization  128.08     0.95  0.96  0.96  0.95
GoogleNet        No optimization             22.60      0.97  0.97  0.97  0.97
                 Float16 quantization        0.668      0.97  0.97  0.97  0.97
                 Dynamic range quantization  0.143      0.97  0.97  0.97  0.97
ResNet-18        No optimization             44.80      0.97  0.97  0.97  0.97
                 Float16 quantization        22.40      0.96  0.96  0.96  0.96
                 Dynamic range quantization  1.70       0.95  0.96  0.95  0.95
MobileNet-v2     No optimization             9.20       0.98  0.99  0.99  0.99
                 Float16 quantization        0.991      0.96  0.97  0.97  0.96
                 Dynamic range quantization  0.188      0.96  0.97  0.97  0.96
EfficientNet-b2  No optimization             16.40      0.99  1.00  1.00  1.00
                 Float16 quantization        8.10       0.99  0.99  0.99  0.99
                 Dynamic range quantization  4.50       0.99  0.99  0.99  0.99

6 Conclusion and Future Work

On-device machine prediction is an active research area aimed at overcoming the complexity and cost of cloud computing. In this study, it is demonstrated that model optimization is an elegant way to use large-scale neural network models in edge computing. A similar study can be applied to other problems and datasets as well. The future plan is to work with larger and more complex disease datasets, considering not only theoretical justification but also real-life product development.

References

1. Pallathadka, H., et al.: Application of machine learning techniques in rice leaf disease detection. Mater. Today: Proc. 51, 2277–2280 (2022)
2. Sharma, A., Jain, A., Gupta, P., Chowdary, V.: Machine learning applications for precision agriculture: a comprehensive review. IEEE Access 9, 4843–4873 (2020)
3. Fan, X., Luo, P., Mu, Y., Zhou, R., Tjahjadi, T., Ren, Y.: Leaf image based plant disease identification using transfer learning and feature fusion. Comput. Electron. Agric. 196, 106892 (2022)
4. Wick, C.: Deep Learning. Informatik-Spektrum 40(1), 103–107 (2016). https://doi.org/10.1007/s00287-016-1013-2


5. Mohanty, S., Hughes, D., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
6. Ferentinos, K.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018)
7. Deng, Y.: Deep learning on mobile devices: a review. In: Mobile Multimedia/Image Processing, Security, and Applications 2019, vol. 10993, p. 109930A. International Society for Optics and Photonics (2019)
8. Li, E., Zeng, L., Zhou, Z., Chen, X.: Edge AI: on-demand accelerating deep neural network inference via edge computing. IEEE Trans. Wireless Commun. 19(1), 447–457 (2019)
9. Verma, G., Gupta, Y., Malik, A.M., Chapman, B.: Performance evaluation of deep learning compilers for edge inference. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 858–865. IEEE (2021)
10. Almutiry, O., et al.: A novel framework for multi-classification of guava disease. CMC-Comput. Mater. Continua 69, 1915–1926 (2021)
11. Kour, H., Chand, L.: Healthy and unhealthy leaf classification using convolution neural network and CSLBP features. Int. J. Eng. Adv. Technol. (IJEAT) 10(1) (2020). ISSN: 2249-8958
12. Thilagavathi, M., Abirami, S.: Application of image processing in diagnosing guava leaf diseases. Int. J. Sci. Res. Manage. (IJSRM) 5(07), 5927–5933 (2017)
13. Perumal, P., et al.: Guava leaf disease classification using support vector machine. Turkish J. Comput. Math. Educ. (TURCOMAT) 12(7), 1177–1183 (2021)
14. Bhushanamu, M.B.N., Rao, M.P., Samatha, K.: Plant curl disease detection and classification using active contour and Fourier descriptor. Eur. J. Mol. Clin. Med. 7(5), 1088–1105 (2020)
15. Mostafa, A.M., Kumar, S.A., Meraj, T., Rauf, H.T., Alnuaim, A.A., Alkhayyal, M.A.: Guava disease detection using deep convolutional neural networks: a case study of guava plants. Appl. Sci. 12(1), 239 (2021)
16. Yu, H.-J., Son, C.-H., Lee, D.H.: Apple leaf disease identification through region-of-interest-aware deep convolutional neural network. J. Imaging Sci. Technol. 64(2), 20507–20510 (2020)
17. Al-bayati, J.S.H., Üstündağ, B.B.: Evolutionary feature optimization for plant leaf disease detection by deep neural networks. Int. J. Comput. Intell. Syst. 13(1), 12 (2020)
18. Bi, C., Wang, J., Duan, Y., Fu, B., Kang, J.-R., Shi, Y.: MobileNet based apple leaf diseases identification. Mob. Netw. Appl. 27, 1–9 (2020)
19. Rajbongshi, A., Sazzad, S., Shakil, R., Akter, B., Sara, U.: A comprehensive guava leaves and fruits dataset for guava disease recognition. Data Brief 42, 108174 (2022)
20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
21. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
23. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510–4520 (2018)
24. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
25. Dillon, J.V., et al.: TensorFlow distributions. arXiv preprint arXiv:1711.10604 (2017)

Cassava Leaf Disease Classification Using Supervised Contrastive Learning

Adit Ishraq1, Sayefa Arafah1, Sadiya Akter Mim1, Nusrat Jahan Shammey1, Firoz Mridha2(B), and Md. Saifur Rahman1

1 Bangladesh University of Business and Technology, Dhaka, Bangladesh
2 American International University-Bangladesh, Dhaka, Bangladesh
[email protected]

Abstract. Cassava is a nutty-flavored, bulbous, starchy root vegetable. It is the principal source of calories and carbohydrates for many people around the world, especially in southern Africa, where cassava production is common because the crop survives well in harsh environments. The cassava crop sometimes gets affected by leaf disease, which harms its overall production and reduces farmers' income, and manual leaf disease detection may not achieve adequate accuracy. Current studies on detecting cassava leaf diseases face various challenges, such as poor accuracy, a low detection rate, and high processing time. In our research, we have used supervised contrastive learning to detect four diseases of the cassava leaf and to identify healthy leaves, obtaining strong results. We also used data augmentation modules, encoder networks, and projection networks to perform tasks such as network training, embedding similar classes nearby and other classes far away, and image labeling. Using the supervised contrastive model, we achieved an accuracy, precision, recall, and F1-score of 88%, 78%, 79%, and 79%, respectively.

Keywords: Cassava Disease · Data Augmentation · Disease Classification · Supervised Contrastive Loss · Deep Learning

1 Introduction

About 800 million people in 80 countries around the world eat cassava [1]. It is a staple food consumed worldwide as a vegetable [2]. Cassava is a low-priced, rich source of carbohydrates; per acre, it can provide more calories than whole grains, making it a truly beneficial crop in developing countries [3]. Cassava leaves contain protein, vitamins, minerals, and essential amino acids; they aid digestion and help relieve constipation. Cassava faces various challenges during the harvest, e.g., leaf disease and inferior quality, and it is vulnerable to a wide range of diseases caused by viruses. The main problem is that cassava leaf disease decreases production, directly affecting farmers' revenue.


revenue of farmers’ can straightly get affected by it. That is why cassava leaves need to be handled superiorly to improve disease diagnosis and production capacity. Early recognition of leaf disease helps in rescuing species before planting can be permanently infected. Without proper diagnosis and causative agents, disease control measures can be a waste of time and money and cause further damage to plants [4]. Therefore, various strategies can play an important role in accurately detecting leaf disease in a short period during the growth of the plant and also fungal infections. The possibility of spreading the virus through planting material is further increased because none of the farmers were aware that planting material could be the source of further virus transmission in the soil ground [5]. The symptoms can often be misleading because of the wide variety of diseases. So the manual diagnosis can be very time-consuming, inefficient, or incorrect. This can hinder the overall production of Cassava. A technology that can efficiently diagnose the deceases at an early state with incredible accuracy should be introduced. Artificial intelligence applications have achieved huge success [6]. With the advancement of computer vision and machine learning, We used supervised contrastive learning to detect cassava leaf diseases which actually distinguishes between similar data and dissimilar data. Supervised contrastive learning can improve the accuracy and robustness of classifiers with minimal complexity. The overall contribution of our study can be summarized as, – This study presents a full overview of four types of cassava leaf disease detection. – This study presents a detailed description of supervised contrastive learning algorithm and explains how the model can upgrade Cassava leaf disease detection. – The suggested model is evaluated using different performance metrics and compared with some existing studies. – Data augmentation have been used for getting the best results while training linear classifiers. Rest of the study is organized as follows: Sect. 2 represents the related work; Sect. 3 indicates the dataset. Section 4 covers the overall architectural methods related to our study. Section 5 explains the proposed model’s implementation, results and comparisons. In the end, Sect. 6 concludes the study.

2 Related Work

Cassava is a staple vegetable that is widely eaten in many parts of the world, especially in Africa and Thailand. Due to its ability to withstand harsh growing conditions, it grows in tropical regions of the world. Cassava crops are exposed to many diseases and infections, such as leaf blight and fungal infections, throughout their life cycle, which reduces their productivity. For researchers, the main challenge is to identify cassava leaf diseases accurately during the early stages. Various studies have been conducted to improve the accuracy of cassava leaf diagnosis.


In this section, we give an overview of existing research on this particular topic. Rao [7] proposed a separable U-Net architecture for detecting Cassava Bacterial Blight disease and Cassava Mosaic Disease, which resulted in accuracies of 63.6% and 83.9%, respectively. In terms of accuracy, their model achieved comparable performance using the separable U-Net, the Dice coefficient, and the mean IoU, and eventually became more efficient than the original U-Net model. Ravi [8] suggested an approach that can be considered a deployable tool for the classification of cassava leaf diseases in agriculture, using pre-trained CNN-based EfficientNet models to identify and locate the infected tiny regions of the cassava leaf. With macro recall, macro precision, and macro F1-score results of 0.69, 0.73, and 0.70, respectively, the EfficientNetB4 model exceeded the other EfficientNet models. However, this work deals with a highly imbalanced dataset, and the proposed process is sensitive to such data. Based on an Enhanced CNN model (ECNN), Lilhore [9] presented an extensive learning technique for real-time identification of cassava leaf disease. On the stabilized dataset, the proposed ECNN classifier impressively attained an accuracy of 99.3%; on the other hand, the model was not capable enough with respect to other disease classes and data sizes. Oyewola [10] proposed a novel Deep Residual Convolutional Neural Network (DRNN) for detecting Cassava Mosaic Disease in cassava leaf images with distinct block processing. The DRNN model generated the finest outputs and attained 96.75% accuracy on the Kaggle Cassava Disease Dataset. Though the method has favorable results, it also has some drawbacks, as all deep learning-based methods are apt to overfit the training datasets, which stops them from generalizing; also, under adverse photographing conditions, image enhancement using gamma correction may not always be the ideal method. Ayu [11] developed an intelligent system to detect cassava leaf disease, built with MobileNetV2 and displayed through a Python graphical user interface (GUI). The study proposed to identify five classes, namely Cassava Green Mite (CGM), Cassava Bacterial Blight (CBB), Cassava Mosaic Disease (CMD), Cassava Brown Streak Disease (CBSD), and healthy; the accuracy obtained on the test data was 65.6%, which is not very good. Sambasivam [12] proposed a CNN (Convolutional Neural Network) model with a very small dataset to detect cassava leaf disease, with the challenge of achieving high accuracy. Since the dataset was very imbalanced, the identification became heavily biased towards the cassava brown streak virus disease and cassava mosaic disease classes. An overall accuracy of 93% was achieved, but no mobile application for the detection was introduced.


Sangbamrung [13] proposed a method that automatically classifies unhealthy, infected cassava and healthy cassava, introducing a novel deep learning method (specifically, a CNN) for classifying cassava disease. Among the many diseases, the focus was only on Cassava Brown Streak Virus Disease (CBSD), and a diagnosis accuracy of 96% was achieved; however, the study worked only on this particular disease even though there are many more disease classes of cassava leaves. Emuoyibofarhe [14] developed a trained machine learning system with a Cubic Support Vector Machine (CSVM) model for health diagnosis, and also used a Coarse Gaussian Support Vector Machine (CGSVM) to detect cassava mosaic disease (CMD) and cassava bacterial blight disease (CBBD), with accuracies of 83.9% and 61.8%, respectively. However, the accuracy of these models is not very good, and they require a large sample. Surya [15] proposed a method of diagnosing cassava leaves using a Convolutional Neural Network (CNN) with the TensorFlow package in Google Colab and the MobileNetV2 architecture, with the ReLU activation function, Softmax as the classifier function, and categorical cross-entropy as the loss function. The total accuracy was 0.8538 for the training process and 0.7496 for the validation process; the validation accuracy is less than 80%, which needs to be improved. Metlek [16] proposed deep learning disease detection methods using the MobileNetV2 and ResNet50 algorithms on identified areas, and used the K-nearest neighbor algorithm and a support vector machine to classify the extracted properties. The average maximum success rate was achieved with the ResNet50 architecture and SVM classifier; the resulting values for the training and testing processes were 85.4% and 84.4%, respectively, but the model was not able to use larger datasets.

3 Dataset

A dataset from a Kaggle research competition was used for the classification of cassava leaf disease in this work. The dataset has five classes: four for diseases and one for healthy leaves. The main goal of the model is to learn to classify given images into these five classes. Local farmers in Uganda collected these pictures from their gardens. The four diseases in this dataset are Cassava Mosaic Disease (CMD), Cassava Bacterial Blight (CBB), Cassava Green Mite (CGM), and Cassava Brown Streak Disease (CBSD). More than 22,031 cassava leaf images were included in total; here, we used 5,656 images to train the model.

4 Methodology

As the proposed concept of supervised contrastive learning is quite simple, the method is implemented and applied to detect cassava leaf diseases.


Fig. 1. This figure illustrates four types of diseases that the Cassava leaf contains. Each row presents a set of disease conditions for Cassava leaves.

We know that supervised contrastive learning enables mapping the normalized encodings of instances of the same class closer together and farther away from instances of other classes (Fig. 1). The neural network transforms an image into a representation and then uses this representation to predict the outcome, so it becomes uncomplicated for the classifier to give the correct result. Firstly, the contrastive loss is used to train the network: the images are encoded so that similar classes are embedded close together and other classes far apart, and the image labels are also used. This phase consists of three components, namely the Data Augmentation Module, the Encoder Network, and the Projection Network, which are described separately below. Secondly, the encoder network used in the previous stage is frozen, and the projection network is abandoned; the representation learned by the encoder network is used to learn a classifier, which can be considered a linear layer. We also use the cross-entropy loss to forecast the labels.

4.1 Representation Learning Framework

– Data Augmentation Module: In this module, the input images are transformed into augmented images. With various augmentation principles, two images are augmented for each particular image (Fig. 2).


Fig. 2. The supervised contrastive loss learns representations using a contrastive loss, but uses label information to sample positives in addition to augmentations of the same picture. Both contrastive techniques can have an optional second stage which trains a model on top of the learned representations.

• To acquire the first augmented image, the module haphazardly crops the original image and resizes it to the actual size of the input image.
• To acquire the second augmented image, there are three different options: AutoAugment, RandAugment, and SimAugment (the augmentation scheme proposed in SimCLR). The best results were found when we used the same data augmentation principle as in stage 2 when training linear classifiers.

For every individual image, two different augmented images are produced; this means the module returns 2N augmented images for every N images.

– Encoder Network: In the encoder network, the image simply gets converted into a representation vector. Using headless ResNet-50 and ResNet-200 as base models for encoder networks, the authors have found excellent results. The two augmented versions of the input image, obtained from the data augmentation module, are sent separately to the same encoder, which outputs two representation vectors; these outputs are normalized values.

– Projection Network: The projection network converts the representation vectors into a vector that is compatible with the contrastive loss calculation. We have used a multi-layer perceptron with a single hidden layer of size 2048 and an output vector of size DP = 128. This network is fed the encoded vectors obtained as output from the encoder network. The projection vector output by this network is first normalized and then used in the loss function: when the supervised contrastive loss function receives the output vector of the projection network, the loss is calculated and minimized. A sketch of the encoder and projection head follows below.
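To make the two-part architecture concrete, here is a minimal Keras sketch of an encoder with a projection head, assuming a ResNet-50 backbone, the hidden/output sizes quoted above (2048 and 128), and the 512 × 512 × 3 input size stated in Sect. 5.2; layer names are illustrative assumptions, not the authors' exact configuration.

```python
from tensorflow import keras

IMG_SIZE = 512  # input resolution from the experimental setup

def build_encoder():
    # Headless ResNet-50: global-average-pooled features, no classifier.
    backbone = keras.applications.ResNet50(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(IMG_SIZE, IMG_SIZE, 3))
    return keras.Model(backbone.input, backbone.output, name="encoder")

def add_projection_head(encoder):
    # MLP projection: one 2048-unit hidden layer, 128-d normalized output.
    x = keras.layers.Dense(2048, activation="relu")(encoder.output)
    z = keras.layers.Dense(128)(x)
    z = keras.layers.UnitNormalization()(z)  # L2-normalize for the loss
    return keras.Model(encoder.input, z, name="encoder_with_projection")

encoder = build_encoder()
stage1_model = add_projection_head(encoder)
```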


4.2 Projection Head

The projection head is mainly responsible for projecting the output of the encoders down to a smaller-dimensional space.

4.3 Classifier Head

The classifier head is used in the optional second phase of training. Once the SCL phase is complete, we can remove the projection head, add the classifier head to the encoder, and fine-tune the model with the regular cross-entropy loss.

4.4 Supervised Contrastive Learning Loss

The supervised contrastive loss is an alternative loss function that is said to outperform cross-entropy. Its only parameter is the temperature, whose default value is 0.1, though it can be changed: a low temperature can benefit from longer training, while a higher temperature makes the classes more distinct. The loss is defined as

$$\mathcal{L}^{sup} = \sum_{i=1}^{2N} \mathcal{L}_i^{sup} \qquad (1)$$

$$\mathcal{L}_i^{sup} = \frac{-1}{2N_{\tilde{y}_i} - 1} \sum_{j=1}^{2N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{\tilde{y}_i = \tilde{y}_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \cdot \exp(z_i \cdot z_k / \tau)} \qquad (2)$$

where N is the number of sample images drawn at random into a mini-batch; passing N images through the stage-1 model yields 2N augmented images. The index i denotes an arbitrary augmented image in the mini-batch, j denotes the index of another augmented image, and k indexes the images other than X_i and X_j. τ is a positive scalar temperature parameter. N_ỹ is the total number of images that share the label ỹ. z_i and z_j are the projected vectors for the two views of the same image, and z_k is the projected vector of any other image, i.e., z_i = P(E(X_i)).
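A direct, unoptimized transcription of Eqs. (1)-(2) may help; this NumPy sketch assumes the projection vectors are already L2-normalized and loops over the batch for readability rather than speed (real implementations vectorize and use log-sum-exp for numerical stability).

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss of Eqs. (1)-(2), written naively.

    z:      (2N, d) array of L2-normalized projection vectors.
    labels: (2N,) class labels; each original image contributes two views.
    """
    n = z.shape[0]
    sim = z @ z.T / tau                      # pairwise z_i . z_j / tau
    total = 0.0
    for i in range(n):
        # Positives: all other views that share the anchor's label.
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        mask = np.arange(n) != i             # denominator excludes k == i
        log_denom = np.log(np.exp(sim[i][mask]).sum())
        total += (-1.0 / len(pos)) * sum(sim[i][j] - log_denom for j in pos)
    return total
```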


The dot product z_i · z_j is an inner product between normalized vectors; each z is a normalized 128-dimensional vector. The indicator 1_B equals 1 if the condition B is true and 0 otherwise. The exponential exp(z_i · z_j / τ) is computed for the cassava leaf images in a batch; the log-probability is then taken, summed over all cassava leaf images in the batch except the anchor itself, and normalized by the number of positives, 2N_ỹi − 1.

4.5 First Stage Training

This stage of training is completed using the supervised contrastive learning loss with the encoder and projection head.

4.6 Second Stage Training (Encoder + Classifier Head)

In the second phase of training, we remove the projection head, add the classifier head on top of the encoder, and train the model as usual with the regular cross-entropy loss; a sketch of this stage follows below.
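Continuing the earlier Keras sketch, the second stage could look as follows; freezing the encoder is one reasonable reading of the stage-2 description, and the optimizer settings are illustrative assumptions.

```python
from tensorflow import keras

# `encoder` is the model from the stage-1 sketch above; freeze it and
# attach a linear classifier head for the five cassava classes.
encoder.trainable = False
logits = keras.layers.Dense(5)(encoder.output)
stage2_model = keras.Model(encoder.input, logits, name="encoder_classifier")
stage2_model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# stage2_model.fit(train_images, train_labels, validation_data=val_data)
```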

5 Evaluation

We first clarify the evaluation metrics; then the experimental setup is described; finally, the assessment is presented with a comprehensive analysis.

5.1 Evaluation Metric

Our evaluation metrics, accuracy, precision, and recall, are based on the confusion matrix. A confusion matrix summarizes the prediction results of a machine learning (including deep learning) classification problem using four measures: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The architecture's performance is then evaluated with these measurements.

Precision: Precision is the proportion of correctly classified positive samples to the total number of samples classified as positive. It reflects how reliable the model is in classifying samples as positive. The formula can be stated as follows:

$$Precision = \frac{TP}{TP + FP} \qquad (3)$$

Recall: Recall is computed as the portion of positive samples correctly classified as positive divided by the total number of positive instances:

$$Recall = \frac{TP}{TP + FN} \qquad (4)$$

F1-score: The F1-score is the harmonic mean of precision and recall, which takes both into account and combines them into a single number:

$$F1\text{-}score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \qquad (5)$$

The F1-score ranges from 0 to 1; the closer it is to 1, the better the model. A short sketch of these metrics follows below.
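As a quick illustration of Eqs. (3)-(5), the values can be computed directly from predictions; this sketch uses scikit-learn (an assumption, since the paper does not name its metrics library) with macro averaging, and the label arrays are hypothetical.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth and predicted class indices for five classes.
y_true = [0, 1, 2, 3, 4, 1, 2, 0]
y_pred = [0, 1, 2, 3, 0, 1, 2, 2]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1-score :", f1_score(y_true, y_pred, average="macro"))
```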

5.2 Experimental Setup

Python is used for data collection, pre-processing, testing, and evaluation of the models. Keras is used to build the neural network architecture and implement the deep learning model, and the Adam [17] optimization function is used to train the model, with a learning rate of 0.001. NumPy is used for basic mathematical operations, TensorFlow is used to exploit GPU performance for the neural network, and ImageNet weights are used for each of the models. The size of our leaf images is 512 × 512 × 3. The dataset is divided into three parts, train, test, and validation, with a 50%/25%/25% split. During training, we use the validation dataset to estimate the quality of the deep learning model, and the test data serves as the concluding evaluation dataset.

5.3 Evaluation and Comparison

To confirm the performance of the supervised contrastive model for this classification task, we compared it with other models, such as DenseNet121, VGG16, InceptionV3 [11], and ResNet50, in the same test environment and on the same dataset. The results are given in Table 1, which lists the precision, recall, F1-score, and accuracy of the applied classic models. For DenseNet121, the precision, F1-score, recall, and accuracy values are 0.67, 0.53, 0.53, and 0.70. For EfficientNetB7, the recall, F1-score, accuracy, and precision are 0.69, 0.69, 0.72, and 0.72. The InceptionV3 [11] model has a recall of 0.46, an F1-score of 0.43, a precision of 0.55, and a total accuracy of 0.50. The MobileNetV2 [18] model has an F1-score of 0.65, a recall of 0.65, a total accuracy of 0.68, and a precision of 0.66. For the ResNet50 model, the F1-score is 0.59, the precision is 0.62, the accuracy is 0.68, and the recall is 0.59. For the VGG16 model, the recall is 0.32, the F1-score is 0.27, the accuracy is 0.32, and the precision is 0.35. For the VGG19 model, the recall is 0.43, the precision is 0.50, the F1-score is 0.43, and the absolute accuracy is 0.54. Next, the recall and F1-score are 0.69, the precision is 0.70, and the accuracy is 0.79 for the Xception model.


Table 1. Accuracy, precision, recall, and F1-score values of our model and popular architectures

Model                   Precision  Recall  F1-score  Accuracy
DenseNet121 [8]         0.67       0.53    0.53      0.70
EfficientNetB7 [8]      0.72       0.69    0.69      0.72
InceptionV3 [11]        0.55       0.46    0.43      0.50
MobileNetV2 [18]        0.66       0.65    0.65      0.68
ResNet50 [8]            0.62       0.59    0.59      0.68
VGG16 [8]               0.35       0.32    0.27      0.32
VGG19 [8]               0.50       0.43    0.43      0.54
Xception [8]            0.70       0.69    0.69      0.79
Supervised Contrastive  0.78       0.79    0.79      0.88

Finally, the supervised contrastive model achieves a precision of 0.78, an F1-score of 0.79, and a recall of 0.79. This study attains the highest accuracy for disease detection on cassava leaf images, 88%, which indicates the acceptability of our proposed method; it also exceeds the other models in terms of F1-score, precision, and recall.

6 Conclusion

Cassava leaf disease detection is a very prominent field of research. A supervised contrastive learning (SCL) model has been developed in this study to identify four diseases of cassava leaves as well as healthy leaves. Data augmentation modules, encoder networks, and projection networks have also been used to perform tasks such as network training, embedding similar classes nearby and other classes far away, and image labeling. We obtained the best results when we used the same data augmentation principle as in stage 2 when training linear classifiers. We evaluated the architecture on four different diseases of cassava leaves and on identifying healthy leaves. Our proposed architecture is convincing, achieving strong performance in detecting any condition of cassava leaves and identifying healthy leaves, with an accuracy of 88% and precision, recall, and F1-score of 78%, 79%, and 79%, respectively. The dataset used in this study is highly imbalanced, and the recommended approach is sensitive to imbalanced data; controlling the imbalance by modifying the proposed model can be considered future work. Cassava diseases are similar to one another, so different concepts from fine-grained image classification could also be employed to help reduce the proposed model's misclassification rate.


References

1. Oyewole, O.B.: Cassava processing in Africa. In: Application of Biotechnology to Traditional Fermented Foods. Report of an Ad Hoc Panel of the Board on Science and Technology for International Development, USA, National Research Council, pp. 89–92 (1992)
2. Li, S., Cui, Y., Zhou, Y., Luo, Z., Liu, J., Zhao, M.: The industrial applications of cassava: current status, opportunities and prospects. J. Sci. Food Agric. 97(8), 2282–2290 (2017)
3. Zhao, P., et al.: Analysis of different strategies adapted by two cassava cultivars in response to drought stress: ensuring survival or continuing growth. J. Exp. Bot. 66(5), 1477–1488 (2015)
4. Kabir, M.M., Ohi, A.Q., Mridha, M.F.: A multi-plant disease diagnosis method using convolutional neural network. In: Uddin, M.S., Bansal, J.C. (eds.) Computer Vision and Machine Learning in Agriculture. AIS, pp. 99–111. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6424-0_7
5. Prodeep, A.R., Hoque, A.M., Kabir, M.M., Rahman, M.S., Mridha, M.F.: Plant disease identification from leaf images using deep CNN's EfficientNet. In: 2022 International Conference on Decision Aid Sciences and Applications (DASA), pp. 523–527. IEEE (2022)
6. Jani, R., Shanto, M.S.I., Kabir, M.M., Rahman, M.S., Mridha, M.F.: Heart disease prediction and analysis using ensemble architecture. In: 2022 International Conference on Decision Aid Sciences and Applications (DASA), pp. 1386–1390. IEEE (2022)
7. Rao, P.K., et al.: Cassava leaf disease classification using separable convolutions UNet. Turk. J. Comput. Math. Educ. (TURCOMAT) 12(7), 140–145 (2021)
8. Ravi, V., Acharya, V., Pham, T.D.: Attention deep learning-based large-scale learning classifier for cassava leaf disease classification. Expert Syst. 39(2), e12862 (2022)
9. Lilhore, U.K., et al.: Enhanced convolutional neural network model for cassava leaf disease identification and classification. Mathematics 10(4), 580 (2022)
10. Oyewola, D.O., Dada, E.G., Misra, S., Damaševičius, R.: Detecting cassava mosaic disease using a deep residual convolutional neural network with distinct block processing. PeerJ Comput. Sci. 7, e352 (2021)
11. Ayu, H.R., Surtono, A., Apriyanto, D.K.: Deep learning for detection cassava leaf disease. In: Journal of Physics: Conference Series, vol. 1751, p. 012072. IOP Publishing (2021)
12. Sambasivam, G., Opiyo, G.D.: A predictive machine learning application in agriculture: cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt. Informat. J. 22(1), 27–34 (2021)
13. Sangbamrung, I., Praneetpholkrang, P., Kanjanawattana, S.: A novel automatic method for cassava disease classification using deep learning. J. Adv. Inf. Technol. 11(4), 241–248 (2020)
14. Emuoyibofarhe, O., Emuoyibofarhe, J.O., Adebayo, S., Ayandiji, A., Demeji, O., James, O.: Detection and classification of cassava diseases using machine learning. Int. J. Comput. Sci. Soft. Eng. (IJCSSE) 8(7), 166–176 (2019)
15. Surya, R., Gautama, E.: Cassava leaf disease detection using convolutional neural networks. In: 2020 6th International Conference on Science in Information Technology (ICSITech), pp. 97–102. IEEE (2020)


16. Metlek, S.: Disease detection from cassava leaf images with deep learning methods in web environment. Int. J. 3D Print. Technol. Digit. Ind. 5(3), 625–644 (2021)
17. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
18. Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J., Hughes, D.P.: Deep learning for image-based cassava disease detection. Front. Plant Sci. 8, 1852 (2017)

Diabetes Mellitus Prediction Using Transfer Learning

Md Ifraham Iqbal1(B), Ahmed Shabab Noor2, and Ahmed Rafi Hasan2

1 Department of Data Science, Friedrich Alexander University of Erlangen, Erlangen, Germany
[email protected]
2 Department of Computer Science and Engineering, United International University (UIU), Madani Avenue, Badda, Dhaka, Bangladesh
{anoor193024,ahasan191131}@bscse.uiu.ac.bd

Abstract. Over 25% of the elderly population suffer from diabetes. Diabetes has no cure, but an early diagnosis can assist in reducing its effects. Previously, machine learning has proven to be effective for diabetes prediction; however, in the literature, barely any methods have used the high learning capacity of Deep Learning (DL) techniques for diabetes prediction. Hence, in this study, we propose methods for diabetes diagnosis using DL. All the attributes in the Pima Indian Diabetes Dataset (PIDD) are crucial for diabetes diagnosis, and since medical data is sensitive, a non-biased classifier is of utmost importance. Thus, the goal of this study is to create an intelligent model that can predict the presence of diabetes without using dimensionality reduction techniques. A 4-layered Neural Network (NN) model is used, whose hidden layers consist of 64 neurons each. Testing and evaluation demonstrate that the model achieves an accuracy of 93.33% on the PIDD. Alongside this, the data is also converted into an image dataset in order to apply transfer learning to the PIDD. The obtained results are significantly better than those obtained in previous studies: the CNN models produce 100% accuracy scores on the PIDD. The study proves that CNNs can be successful when used on small medical datasets. Based on the results, we can also conclude that the proposed system can effectively diagnose diabetes.

Keywords: Diabetes Prediction · Convolutional Neural Networks · Tabular data to Image · Feature Analysis · Classification · Tabular Convolution · Transfer Learning · Neural Networks

1 Introduction

A study in the United States has shown that more than 25.6 million people over the age of 20 (12.6 million of them women), and more than 1 in 4 people over the age of 65, have been diagnosed with diabetes [6].


In the long term, diabetes can lead to eye diseases (i.e., glaucoma, retinopathy, etc.), kidney damage, nerve damage, and more. The most significant issue, however, is that diabetes can also lead to cardiovascular diseases (CVD) [16], the leading cause of death in the US [5]. Another study concluded that in 2017 alone, diabetes diagnoses led to an expenditure of over 327 billion dollars in the US [35]. Diabetes has no cure, but research has shown that it arises from the combination of daily lifestyle and a person's genes [23]; it can, however, be avoided with regular, comprehensive lifestyle management. In the US, more than 1 in 3 people have pre-diabetes, which can lead to diabetes in the future, and the majority do not even know they suffer from this condition. Why would they change their lifestyle if they are unaware that they are at risk of developing diabetes? Hence, our goal in this project is to build a system that will predict whether a person will have diabetes in the future.

Technology has advanced dramatically during the last few decades and is being used in robotics [32], home assistance [14,15], smart city applications [11], etc. With the development of Machine Learning (ML), technology also plays a huge role in the medical sector: researchers have already used machine learning to forecast CVD risks [25], the growth of COVID-19 cases [13,22], and more. In this study, we discuss a Deep Learning (DL) technique we have developed for diabetes diagnosis. The models give positive or negative classification results so that individuals can reduce the impact of diabetes later in their lives. Furthermore, the dataset is converted into images so that CNNs can be applied to it, and multiple CNN models are applied to the PIDD using the transfer learning approach. The CNNs perform significantly better than the existing literature, achieving sensitivity and specificity scores exceeding 0.90. Furthermore, the results suggest that DL models can be used on small datasets.

Section 2 looks at previous studies on similar topics. In Sect. 3, we present the methods used in this study. Section 4 displays the results we have obtained along with our discussion. Finally, Sect. 5 concludes the study.

2 Background Study

Some researchers have already attempted to implement ML models to make an early prediction of diabetes. Jarullah et al. [1] used a Decision Tree (DT) via the WEKA software; the DT accurately predicted 78.18% of the instances, with specificity and sensitivity of 0.82 and 0.72, respectively. Srivanesan et al. [29] used the J48 DT on the dataset; however, the results did not improve, with an accuracy of only 76.58% and specificity and sensitivity of 0.62 and 0.86, respectively; the J48 DT was biased in this case. Kumari et al. [21] used a Support Vector Machine (SVM) on the Pima Indian Diabetes dataset and achieved an accuracy of 78%, with sensitivity and specificity scores of 0.80 and 0.77, respectively. Wei et al. [34] applied several ML models and found that a deep neural network (DNN) achieved the best performance (i.e., accuracy = 77.86%).


However, none of these works focused on feature selection; hence, their performances are not up to the mark. Kaur et al. [17] used the Boruta Wrapper Algorithm (BWA) for feature selection and then used five different ML methods for classification; the linear-kernel SVM gave the best results, with an accuracy of 89% and an AUC score of 0.90. Calisir et al. [3] introduced a Morlet Wavelet Support Vector Machine (MWSVM) classifier, using Linear Discriminant Analysis (LDA) for feature reduction before classification with the MWSVM; the classifier achieved accuracy, specificity, and sensitivity scores of 89.74%, 93.75%, and 83.33%, respectively. Erkaymaz et al. [7] introduced a Small-World Feed-Forward Artificial Neural Network (SW-FFANN), constructing the small-world network using the Watts-Strogatz approach; this method achieved an accuracy of over 90%, with an excellent specificity score of 0.9615 and a respectable sensitivity score of 0.85, though it is computationally costly. Nilashi et al. [24] introduced a novel classification method: they used Self-Organizing Maps (SOM) for clustering the data, applied Principal Component Analysis (PCA) for dimensionality reduction and noise removal, and then fed the data into a Neural Network (NN) for classification, with 10-fold cross-validation used in the process; this SOM-PCA-NN combination achieves a remarkable accuracy of 92.28%. A recent study [4] applied Naive Bayes but failed to provide results similar to the studies mentioned above, and another study [2] used Long Short-Term Memory and Neural Networks but failed to reach 90% accuracy. Finally, the study in [26] compared the performance of ML models on the PIDD and found that the NN performed best, achieving an accuracy of over 97%. Based on this information, we are motivated to use the NN for our study; however, we failed to achieve similar levels of performance with the Neural Network, and so a CNN-based approach is introduced in this study.

From previous research, it can be observed that all the features available in the dataset are associated with having diabetes. Hence, in our proposed method, we try to achieve maximum precision without removing any features. Our objective is to introduce a method that performs even better than the above in diagnosing diabetes in women of Pima Indian heritage.

3 Methodology

In this section, we describe the methods we used to achieve high accuracy on the PIDD. For data analysis and model design, we used pandas and PyTorch in Python.

3.1 The Dataset

We have used the Pima Indian Diabetes Dataset (PIDD) for this study. The dataset comes from the National Institute of Diabetes and Digestive and Kidney Diseases; in 1978, Knowler et al. [19] conducted a longitudinal study on the Pima Indian residents of Arizona. The dataset contains only data from females aged 21 and above.


Table 1. The Pima Indians Diabetes Dataset

Attribute Name              Details                                              Values
Pregnancies                 Number of pregnancies                                Numerical
Glucose                     Plasma glucose concentration                         Numerical
Blood Pressure              Diastolic blood pressure (mmHg)                      Numerical
Skin Thickness              Triceps skin fold thickness (mm)                     Numerical
Insulin                     Serum insulin (muU/ml)                               Numerical
BMI                         Body Mass Index (kg/m2)                              Numerical
Diabetes Pedigree Function  History of diabetes in relatives and family members  Numerical
Age                         Age of the individual in years                       Numerical
Outcome                     Whether the individual has diabetes or not           0 = False, 1 = True

In total, the PIDD has 9 columns; details of these columns are given in Table 1. The Outcome column represents whether a person was diagnosed with diabetes later in their life. In this dataset, a total of 768 individuals were tested, of whom 268 tested positive for diabetes.

3.2 Exploratory Data Analysis

Firstly, we checked the distribution and skewness of the data. From Table 2, it can be observed that Blood Pressure, Insulin, Diabetes Pedigree Function, and Age are highly skewed and do not follow a normal distribution; the Pregnancies column is moderately skewed; and the Glucose, Skin Thickness, and BMI columns are approximately symmetric. Next, we computed the Pearson correlation of the features amongst one another, shown in Fig. 1. It can be observed that most of the features are correlated with the class column, Outcome (correlation value > 0.05). Additionally, none of the features are highly collinear. From this, it can be concluded that all the features are relevant for diabetes.

3.3 Data Transforming

The PIDD features lie on widely different scales, and thus we used the Standard Scaler to transform the dataset; after applying it, each feature has a mean of 0 and a standard deviation of 1. The formula for the Standard Scaler is given in Eq. 1:

$$z = \frac{x_i - \mu}{\sigma} \qquad (1)$$
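As a small illustration of Eq. (1), scikit-learn's StandardScaler performs exactly this per-column transformation; the column subset shown is an assumption for brevity.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical slice of the PIDD with two of its eight feature columns.
df = pd.DataFrame({"Glucose": [148, 85, 183, 89], "Age": [50, 31, 32, 21]})

scaled = StandardScaler().fit_transform(df)  # z = (x - mu) / sigma per column
print(scaled.mean(axis=0))  # ~0 for each column
print(scaled.std(axis=0))   # ~1 for each column
```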

Table 2. Distribution of the Data

Attribute Name              Skewness Value  Distribution
Pregnancies                 0.90            Moderately skewed
Glucose                     0.17            Symmetric
Blood Pressure              -1.84           Highly skewed
Skin Thickness              0.11            Symmetric
Insulin                     2.27            Highly skewed
BMI                         0.43            Symmetric
Diabetes Pedigree Function  1.92            Highly skewed
Age                         1.13            Highly skewed

Fig. 1. Correlation of the Columns in PIDD

3.4 Design of Our Model

Neural networks were designed to help computers analyze problems in ways a human brain does. Deep learning models are a key part of modern society due to their ability to learn complex patterns in data. As we use all the features available in the dataset without any feature selection, finding complex patterns and hidden relations in the data was vital. We split the dataset into training and testing sets: 95% of the data was used for training, while the rest was used for testing and evaluation. The training set was further divided into 64 batches, which were fed into the network one by one.

Fig. 2. Model Architecture

We have designed a neural network with 4 layers for predicting whether a person will have diabetes. The Rectified Linear Unit (ReLU) is used as the activation function, dropout (DP) is used to prevent overfitting, and batch normalization (BN) is used twice in the network.

Network Architecture. The first layer, the input layer of our network, consists of 8 neurons, which take the values of the 8 input features. These values are forwarded to the first hidden layer (H1), which consists of 64 neurons. After completing its computation, this layer forwards its values to the third layer, the second hidden layer (H2) of our model, which also consists of 64 neurons. After computation, the results are forwarded to the output layer, which outputs 0 or 1, indicating whether diabetes is present in an individual. The network architecture is shown in Fig. 2.

Activation Function. Activation functions determine the activity of a neuron in a network and are also essential in determining the speed at which a neural network is trained.


Back-propagation is used to train neural networks, and it requires a tremendous amount of computation; hence, activation functions must be computationally efficient. In our network, we used the ReLU activation. ReLU is linear for positive values and 0 for negative values; due to this linear scale, ReLU suffers least from vanishing gradients compared to other activation functions, is cost-effective, and lets networks converge rapidly. ReLU also has a derivative, which allows it to be used in back-propagation. In the final layer of the neural network, a softmax function with cross-entropy loss is applied.

Optimizer and Criterion. Optimizers adjust the weights and learning rate of the network to reduce the loss. In the proposed system, we used the Adaptive Moment Estimation (Adam) [18] optimizer. Adam uses first- and second-order momentums, calculating exponentially decaying averages of past gradients and of past squared gradients. Adam updates the parameters using Eq. 2:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \qquad (2)$$

Binary Cross-Entropy (BCE) is our experiment's loss function; we use logit loss as our criterion, which combines a sigmoid layer and BCE for numerical stability.

Batch Normalization. Batch normalization (BN) standardizes the outputs of a layer across each batch [10]. BN speeds up the training process, makes the network more robust, and also tackles vanishing- and exploding-gradient issues. We use BN on the outputs of H1 and H2.

Dropout. Dropout (DP) is a method in which neurons are turned off randomly during training in each iteration [30]. Neural networks tend to overfit on many occasions, as nodes adapt too much to the training data; randomly turning nodes off mitigates this issue. We used a DP probability of 0.1 for each node.

Learning Rate. The learning rate (LR) determines how the loss is minimized after every iteration. The LR is usually a small positive value, as too high an LR may cause the network to miss the minimum loss, while too low an LR takes significantly more time and epochs to reach the minima. Hence, we used an LR value of 0.0002 and 100 epochs to train our network; the LR decreases as training approaches the local minima. A sketch of this network follows below.
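A minimal PyTorch sketch of the described network follows, assuming the layer order input → H1 → H2 → output with BN, ReLU, and dropout as stated; the exact ordering of BN relative to the activation is an assumption, as the paper does not spell it out, and the final layer is paired with the logit loss described above.

```python
import torch
from torch import nn

# 8 input features -> two 64-neuron hidden layers -> 1 output logit.
model = nn.Sequential(
    nn.Linear(8, 64),
    nn.BatchNorm1d(64),   # BN on the outputs of H1
    nn.ReLU(),
    nn.Dropout(p=0.1),    # dropout probability of 0.1
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),   # BN on the outputs of H2
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 1),     # output layer (diabetes: yes/no)
)

criterion = nn.BCEWithLogitsLoss()  # sigmoid + BCE ("logit loss")
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)

# One illustrative training step on a dummy batch of 64 samples.
x, y = torch.randn(64, 8), torch.randint(0, 2, (64, 1)).float()
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```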

3.5 Application of Convolutional Neural Networks

Recently, several researchers have studied the effectiveness of Convolutional Neural Networks (CNNs) on image data; however, their application to tabular medical data remained unexplored for a long period. In recent times, methods like SuperTML [31], DeepInsight [28] and DWTM [12] have opened avenues for applying Convolutional Neural Networks to tabular data. The DWTM was preferred here, as it uses feature importance to convert the tabular data into an image dataset. The same statistical technique, Pearson Correlation, which was used earlier in this study to find the relevance of features, is used in the DWTM for creating the image dataset. Figure 3 shows the image of an instance created using the DWTM. It can be observed that the font size varies: the font size is directly proportional to the correlation between the class and the feature. This enables the CNNs to focus more on the more significant features, which boosts their performance.

Fig. 3. An instance created using the DWTM
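The DWTM implementation itself is available from [12]; purely to illustrate the idea of correlation-weighted text rendering, a minimal Pillow sketch follows. The canvas size, font file, scaling rule, and layout are illustrative assumptions, not the authors' exact design; the sample row is the first PIDD record with approximate Pearson correlations.

```python
from PIL import Image, ImageDraw, ImageFont

def row_to_image(values, correlations, size=224):
    """Render one tabular instance as an image, DWTM-style:
    each feature value is drawn as text whose font size grows
    with the absolute Pearson correlation to the class."""
    img = Image.new("RGB", (size, size), "black")
    draw = ImageDraw.Draw(img)
    y = 8
    for value, corr in zip(values, correlations):
        font_px = int(12 + 28 * abs(corr))  # illustrative scaling rule
        try:
            font = ImageFont.truetype("DejaVuSans.ttf", font_px)  # assumed font
        except OSError:
            font = ImageFont.load_default()  # fallback if the font is missing
        draw.text((8, y), f"{value:g}", fill="white", font=font)
        y += font_px + 4
    return img

# Example: first PIDD row, with approximate feature-outcome correlations.
img = row_to_image([6, 148, 72, 35, 0, 33.6, 0.627, 50],
                   [0.22, 0.47, 0.07, 0.07, 0.13, 0.29, 0.17, 0.24])
```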

Various state-of-the-art CNN models have been developed in recent times using the ImageNet dataset; architectures like VGG, ResNet, and Inception are quite popular. Due to their success in image classification, ResNet [8], DenseNet [9] and RegNet [27] are applied in this study. The transfer learning approach is used to apply these models to the PIDD: PyTorch provides pre-trained models of these CNN architectures, and each model is further trained for up to 10 epochs using the 80% training split of the PIDD. This enables the CNNs to update their weights and fine-tune themselves to successfully predict diabetes in patients.
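A sketch of this transfer-learning step, assuming a recent torchvision; ResNet-18 stands in for the family of pretrained backbones, and `train_loader` is a hypothetical DataLoader over the DWTM-generated images of the 80% training split.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet and replace the classifier head
# with a 2-class output (diabetes / no diabetes).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def fine_tune(model, train_loader, epochs=10):
    """Fine-tune the pretrained backbone for a few epochs."""
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```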

4 Results and Discussion

In this section, we discuss the results obtained using our approach and the overall impact of our study. Accuracy, Sensitivity, and Specificity are the measures used for evaluating the performance of the Deep Learning models in this study. Specificity and Sensitivity are calculated using Eqs. 3 and 4, respectively:

\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)

\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)
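For reference, both measures follow directly from a confusion matrix; a minimal scikit-learn sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred):
    # Confusion matrix layout for binary labels {0, 1}:
    # [[TN, FP],
    #  [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # Eq. (4): true positive rate
    specificity = tn / (tn + fp)  # Eq. (3): true negative rate
    return sensitivity, specificity

# Hypothetical predictions, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
```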

Table 3. Comparison of our Results on PIDD with Previous Research

| Author | Accuracy | Specificity | Sensitivity | Method Used |
|---|---|---|---|---|
| Kaur et al. [17] | 89.00% | NA | NA | BWA-SVM |
| Calisir et al. [3] | 89.74% | 0.93 | 0.83 | LDA-MWSVM |
| Erkaymaz et al. [7] | 91.66% | 0.96 | 0.85 | SW-FFANN |
| Nilashi et al. [24] | 92.28% | NA | NA | PCA-SOM-NN |
| Kumari et al. [20] | 97.20% | NA | NA | Voting Classifier |
| Neural Network* | 93.33% | 95.20 | 90.90 | PC+NN |
| Inception* | 100% | 100 | 100 | Transfer Learning |
| ResNet* | 100% | 100 | 100 | Transfer Learning |
| RegNet* | 100% | 100 | 100 | Transfer Learning |

Table 3 compares the results of our models with the results from previous studies; the * marks denote the models applied in this study. The accuracy of the proposed NN surpasses the performance of the studies mentioned in Sect. 2. Furthermore, the proposed network outperforms the PCA-SOM-NN of [24], which used clustering and feature selection. On top of that, the network used in [24] was trained for 200 epochs, while ours used only 100 epochs; hence, our system is computationally more efficient and produces much better results. However, for medical data the sensitivity score is more crucial than accuracy [33]. The sensitivity score refers to the percentage of people positively diagnosed with diabetes who are correctly classified; a high sensitivity score is critical, as a wrong result may lead to deadly consequences. Due to this, we have heavily emphasized balancing our sensitivity and specificity scores. From the table, it can be observed that the proposed models are the only ones in the chart that achieve sensitivity and specificity scores higher than 0.90. Therefore, it can be deduced that our model works effectively in diagnosing diabetes.


However, in recent times even better methods have been introduced that perform better than the NN proposed in this study. Hence, our study used data transformation techniques to apply CNNs to the diabetes dataset. The CNNs performed remarkably, and multiple CNN methods, including the ResNet, Inception, and RegNet architectures, produced 100% accuracy scores on the PIDD dataset. The impact of this study is considerable, as it shows the effectiveness of CNNs for predicting diabetes. Additionally, they easily surpass the ML models' performance, including the Neural Network. It also shows the effectiveness of CNNs on tabular data, further strengthening the case for CNNs on tabular data made in [31]. The CNNs mentioned above are pre-trained on the ImageNet dataset for an extended period. This enhances their learning prowess and makes them robust for use on tabular datasets. As a result, the CNNs perform exceptionally well on the PIDD dataset with the assistance of the DWTM. Furthermore, the weighting of features by relevance when creating the images also plays a crucial role in boosting the performance of the CNNs.

People who suffer from prediabetes can test themselves and, based on the results, take action to prevent diabetes in the future. Furthermore, individuals who test positive can take precautions to minimize the impact of diabetes on their lives. On top of that, clinical experts can use this as an assistive system to validate their results and vice versa.

In the past, it has been shown that CNNs do not work well on tabular data or small datasets. However, our results show that the CNNs perform better than traditional classifiers on the PIDD, which is a small dataset. Despite minimal feature selection, the networks perform better than traditional classifiers, and they have worked well on a skewed dataset. On top of that, only 5 epochs were used, making this system computationally very efficient. This suggests that increasing the number of epochs might yield an even better result; the caveat is that the system would become computationally more expensive, so the slight increase in performance may not be worth it.

Despite all its advantages, the proposed study does have a few shortcomings. First, the dataset covers only women of a specific heritage. As mentioned earlier, the system could be further improved by increasing the number of epochs (iterations) in the training loop. In future studies, this proposed method can also be applied to other datasets to test its robustness.

5 Conclusion

In this paper, we proposed a transfer learning approach for predicting diabetes mellitus with 100% accuracy. Furthermore, convolutional neural networks can find very complex patterns within the data due to the inner computations. As a result, the CNNs worked very well in predicting whether a person will have diabetes. We believe this study will have a significant impact on the medical field and can help ordinary people to control the impact diabetes has on their lives. Furthermore, this study proves that when using medical data, where keeping all the attributes is crucial, CNNs can work better than traditional ML models.


References

1. Al Jarullah, A.A.: Decision tree discovery for the diagnosis of type II diabetes. In: 2011 International Conference on Innovations in Information Technology, pp. 303–307. IEEE (2011)
2. Butt, U.M., Letchmunan, S., Ali, M., Hassan, F.H., Baqir, A., Sherazi, H.H.R.: Machine learning based diabetes classification and prediction for healthcare applications. J. Healthc. Eng. 2021 (2021)
3. Çalişir, D., Doğantekin, E.: An automatic diabetes diagnosis system based on LDA-wavelet support vector machine classifier. Expert Syst. Appl. 38(7), 8311–8315 (2011)
4. Chang, V., Bailey, J., Xu, Q.A., Sun, Z.: Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput. Appl. 1–17 (2022)
5. Centers for Disease Control and Prevention: Missed opportunities in preventive counseling for cardiovascular disease—United States, 1995. MMWR Morb. Mortal. Wkly. Rep. 47(5), 91 (1998)
6. Centers for Disease Control and Prevention: National diabetes fact sheet: national estimates and general information on diabetes and prediabetes in the United States, 2011. US Department of Health and Human Services, Centers for Disease Control and Prevention, Atlanta, GA, vol. 201, no. 1, pp. 2568–2569 (2011)
7. Erkaymaz, O., Ozer, M.: Impact of small-world network topology on the conventional artificial neural network for the diagnosis of diabetes. Chaos Solitons Fractals 83, 178–185 (2016)
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
9. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
10. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
11. Iqbal, M.I., Leon, M.I., Tonmoy, N.H., Islam, J., Ghosh, A.: Deep learning based smart parking for a metropolitan area. In: 2021 IEEE Region 10 Symposium (TENSYMP), pp. 1–5. IEEE (2021)
12. Iqbal, M.I., Mukta, M., Hossain, S., Hasan, A.R.: A dynamic weighted tabular method for convolutional neural networks. arXiv preprint arXiv:2205.10386 (2022)
13. Iqbal, M., Leon, M., Azim, S.: Analysing and predicting coronavirus infections and deaths in Bangladesh using machine learning algorithms. SSRN Electron. J. (2020)
14. Islam, A., et al.: EduBot: an educational robot for underprivileged children. In: 2019 International Conference on Automation, Computational and Technology Management (ICACTM), pp. 232–236. IEEE (2019)
15. Islam, J., Ghosh, A., Iqbal, M.I., Meem, S., Ahmad, N.: Integration of home assistance with a gesture controlled robotic arm. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 266–270. IEEE (2020)
16. Kannel, W.B., McGee, D.L.: Diabetes and cardiovascular disease: the Framingham study. JAMA 241(19), 2035–2038 (1979)
17. Kaur, H., Kumari, V.: Predictive modelling and analytics for diabetes using a machine learning approach. Appl. Comput. Inform. (2020)
18. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR) (2014)
19. Knowler, W.C., Bennett, P.H., Hamman, R.F., Miller, M.: Diabetes incidence and prevalence in Pima Indians: a 19-fold greater incidence than in Rochester, Minnesota. Am. J. Epidemiol. 108(6), 497–505 (1978)
20. Kumari, S., Kumar, D., Mittal, M.: An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. Int. J. Cogn. Comput. Eng. 2, 40–46 (2021)
21. Kumari, V.A., Chitra, R.: Classification of diabetes disease using support vector machine. Int. J. Eng. Res. Appl. 3(2), 1797–1801 (2013)
22. Leon, M.I., Iqbal, M.I., Azim, S.M., Al Mamun, K.A.: Predicting COVID-19 infections and deaths in Bangladesh using machine learning algorithms. In: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), pp. 70–75. IEEE (2021)
23. Li, G., et al.: The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing diabetes prevention study: a 20-year follow-up study. Lancet 371(9626), 1783–1789 (2008)
24. Nilashi, M., Ibrahim, O., Dalvi, M., Ahmadi, H., Shahmoradi, L.: Accuracy improvement for diabetes disease classification: a case on a public medical dataset. Fuzzy Inf. Eng. 9(3), 345–357 (2017)
25. Patil, P.B., Shastry, P.M., Ashokumar, P.: Machine learning based algorithm for risk prediction of cardio vascular disease (CVD). J. Crit. Rev. 7(9), 836–844 (2020)
26. Patil, V., Ingle, D.: Comparative analysis of different ML classification algorithms with diabetes prediction through Pima Indian diabetics dataset. In: 2021 International Conference on Intelligent Technologies (CONIT), pp. 1–9. IEEE (2021)
27. Schneider, N., Piewak, F., Stiller, C., Franke, U.: RegNet: multimodal sensor registration using deep neural networks. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1803–1810. IEEE (2017)
28. Sharma, A., Vans, E., Shigemizu, D., Boroevich, K.A., Tsunoda, T.: DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9(1), 1–7 (2019)
29. Sivanesan, R., Dhivya, K.D.R.: A review on diabetes mellitus diagnoses using classification on Pima Indian diabetes data set. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 5(1) (2017)
30. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
31. Sun, B., et al.: SuperTML: two-dimensional word embedding for the precognition on structured tabular data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
32. Uday, T.I.R., et al.: Design and implementation of the next generation Mars rover. In: 2018 21st International Conference of Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2018)
33. Veropoulos, K., et al.: Controlling the sensitivity of support vector machines. In: Proceedings of the International Joint Conference on AI, vol. 55, p. 60 (1999)
34. Wei, S., Zhao, X., Miao, C.: A comprehensive exploration to the machine learning techniques for diabetes identification. In: 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), pp. 291–295. IEEE (2018)
35. Yang, W., Dall, T.M., Beronjia, K., Semilla, A.P., Chakrabarti, R., Hogan, P.F.: Economic costs of diabetes in the US in 2017. Diabetes Care 41(5), 917–928 (2018)

An Improved Heart Disease Prediction Using Stacked Ensemble Method

Md. Maidul Islam1, Tanzina Nasrin Tania1, Sharmin Akter1, and Kazi Hassan Shakib2(B)

1 City University, Dhaka, Bangladesh
2 Chittagong University of Engineering and Technology, Chittagong, Bangladesh

[email protected]

Abstract. Several cardiac failures, heart disease mortality, and diagnostic costs can all be reduced with early identification and treatment. The discovery of previously unknown patterns and connections can support improved decisions when forecasting heart disorder risk. In this study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques like outlier detection and removal, checking for and removing missing entries, feature normalization, and cross-validation; nine classification algorithms, namely RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM; and eight metrics for measuring classifier performance: classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient. Our method can easily differentiate between people who have cardiac disease and those who are normal. Receiver operating characteristic curves and the areas under the curves were determined for every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models have been discussed in this study. The performance of the proposed scheme has been confirmed utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms.

Keywords: Prediction · Heart Disease · CART · GBM · Multilayer Perceptron

1 Introduction

Heart disorder, which affects the heart and arteries, is one of the most devastating human diseases. The heart is unable to pump the required volume of blood toward the other parts of the body when it suffers from cardiac problems. In the case of heart disease, the valves and heart muscles are particularly affected. Cardiac illness is also referred to as cardiovascular disease. The cardiovascular framework comprises all blood vessels, including arteries, veins, and capillaries, which constitute an intricate system carrying the bloodstream throughout the body. Cardiovascular infections include cardiac illnesses, cerebrovascular infections, and artery illnesses. Heart disease is a hazard that is usually unavoidable and an imminent cause of casualty. Heart disease is currently a prominent issue among all other health ailments, since many people are losing their lives to it. Cardiovascular disease kills 17.7 million people per year, accounting for 31% of all deaths globally, as per the World Health Organization (WHO). Heart attacks and strokes account for 85% of these cases. Heart-related disorders have also become the major cause of death in India [1]. In the United States, one person is killed every 34 seconds [9]. Heart diseases killed 1.7 million Indians in 2016, according to the 2016 Global Burden of Disease Report, released on September 15, 2017 [3]. According to a WHO report published in 2018, nearly 6 million people died globally in 2016 because of heart infections [4]. Controlling heart disorders costs approximately 3% of total healthcare spending [20]. The World Health Organization's projections provided the impetus for this project: the WHO predicts that roughly 23.6 million people will die from heart disease by 2030. The expanding rate of heart infections has raised worldwide concern. Heart failure is tougher to diagnose because of diabetes, hypertension, hyperlipidemia, irregular ventricular rate, and other pertinent diagnosable conditions. As cardiac illness becomes increasingly common, data on the condition is getting more nonlinear, nonnormal, association-structured, and complicated. As a result, forecasting heart illness is a major difficulty in medical data exploration, and clinicians find it extremely difficult to properly forecast a heart disease diagnosis.

Several studies have endeavored to use advanced approaches to analyze heart disease data. If bagging is not adequately represented in the ensemble approach, it might result in excessive bias and consequently under-fitting; boosting is also difficult to apply in real time due to the algorithm's increasing complexity. On the other hand, our proposed approach may combine the skills of several high-performing models on a classification or regression task to provide predictions that outperform any single model in the ensemble, while also being simpler to build. Our suggested system hasn't received much attention, so we've attempted to build it correctly and come up with a good outcome and a superior prediction system.

The organization of the paper is as follows. In Sect. 2, we state related research contributions, their major contributions, and compare them with our work; we also provide a table with an overview of the related works and comparison analytics for readers. In Sect. 3, we provide an outline of the system methodology and the architecture. In Sect. 4, the implementations and experimental results are described. In Sect. 5, we discuss our limitations and conclude the paper.

2 Literature Review

This study looks into how data mining techniques may be used to diagnose cardiac problems [15]. Practitioners and academics have previously employed pattern recognition and data mining technologies in the realm of diagnostics and healthcare for prediction purposes [13]. Various contributions have been made in recent times to determine the preferred approach for predicting heart disorders [8]. The following part explores numerous analytical methodologies while providing a quick overview of the existing literature regarding heart disorders. In addition, current techniques are evaluated in several ways, including a comprehensive comparison at the end of this section.


Mohan S. et al. [1] developed a unique approach to determine which ML techniques can increase the accuracy of heart illness forecasting. The forecast model is introduced using a variety of feature combinations and well-known classification methods. They attain an enhanced performance level with an accuracy of 88.7% using the Hybrid Random Forest with Linear Model (HRFLM) prediction model for heart disease. The ML techniques used in this study include DT, NB, DL, GLM, RF, LR, GBT, and SVM. All 13 characteristics as well as all ML techniques were used to reproduce the investigation.

Palaniappan S. et al. [2] built a technology demonstrator, the Intelligent Heart Disease Prediction System (IHDPS), using data mining approaches such as DT, NB, and NN. The results show that each approach has a different advantage in reaching the defined extraction criteria. Based on medical factors like sex, age, blood sugar, and blood pressure, it can forecast the probability of individuals developing heart disorders. It enables considerable knowledge to be established, such as patterns and correlations among medical aspects connected to heart illness. IHDPS is built on the Microsoft .NET platform, and its mining models follow the CRISP-DM approach.

Bashir S. et al. [4] discuss how data science can be used to predict cardiac disease in the medical industry. Although several studies have been undertaken on the issue, prediction accuracy still needs to be improved. As a result, the focus of their study is on attribute selection strategies as well as algorithms, with numerous heart disease datasets being utilized for testing and improving accuracy. Attribute selection methodologies are combined with DT, LR, SVM, NB, and RF in RapidMiner, and the results indicate an increase in efficiency.

Le, H.M. et al. [5] use the ranks and weights of the Infinite Latent Feature Selection (ILFS) approach to weight and reorder HD characteristics. A soft-margin linear SVM is used to classify a subset of the supplied attributes into discrete HD classes. The experiment makes use of the universal UCI Machine Learning Repository heart disorders dataset. Experiments revealed that the suggested method is useful for making precise HD predictions; their tactic performed the best, with an accuracy of 90.65% and an AUC of 0.96 for discriminating 'No existence' HD from 'Existence' HD.

Yadav, D.C. and Pal [6] presented and investigated tree-based classification algorithms: M5P, Random Tree, and Reduced Error Pruning with the Random Forest Ensemble Method. All prediction-based methods were applied after identifying features of the cardiac patient dataset. Three feature-based techniques were employed: PC, RFE, and LR. For improved prediction, the set of variables was evaluated using various feature selection approaches. From the findings, they concluded that the attribute selection methods PC and LR, along with the random forest ensemble approach, deliver 99% accuracy.

Kabir, P.B. and Akter, S. [7] note that tree-based techniques are among the most fundamental and widely used ensemble learning algorithms. Tree-based models such as Random Forest (RF) and Decision Tree (DT), according to the study, provide valuable intelligence with enhanced efficiency, consistency, and applicability. Using a Feature Selection (FS) method, relevant features are discovered, and classifier output is produced using these features; FS eliminates non-essential characteristics without affecting learning outcomes. Their aim is to apply FS in conjunction with tree-based approaches to increase heart disease prediction accuracy.

Islam, M.T. et al. [8] used PCA to reduce the number of characteristics. For the final clustering, a hybrid genetic algorithm (HGA) with k-means was applied. The k-means approach is often applied for clustering data; because it is a heuristic approach, it can become trapped in local optima. To avoid this problem, they used the HGA for data clustering. The suggested methodology has a prediction accuracy of 94.06% for early cardiac disease.

Rahman, M.J.U. et al. [10] aim to create a Robust Intelligent Heart Disease Prediction System (RIHDPS) applying several classifiers, namely NB, LR, and NN. Their work investigated the effectiveness of medical decision assistance systems by combining these three data mining modelling techniques into an ensemble method.

Patel, J. et al. [12] evaluate alternative decision tree classification algorithms using WEKA to improve heart disorder detection. The methods tested include the J48, LMT, and RF techniques. Using existing datasets of heart disease patients from the UCI repository's Cleveland database, the performance of the decision tree algorithms is examined and validated. The aim of the research is to utilize data mining tools to uncover hidden patterns in cases of heart problems and to forecast the existence of heart disorders in individuals, ranging from no existence to likely existence.

Bhatla, N. et al. [28] look at different data mining techniques that might be employed in computerized heart disorder forecasting systems. According to the data, the NN with 15 features has the best accuracy (100%) so far. DT, on the other hand, looked impressive with 99.62% accuracy when using 15 characteristics. Furthermore, the Decision Tree showed 99.2% efficiency when combined with the Genetic Algorithm and 6 characteristics (Table 1).

3 Methodology

This section proposes an advanced and efficient prediction of heart disease based on past historical training data. The strategy is to analyze and test various data-mining algorithms and to implement the algorithm that gives the highest accuracy. This research also includes a visualization module in which the heart disease datasets are displayed diagrammatically using different data visualization techniques, for user convenience and better understanding. The subsections that follow go through the materials and methodologies in detail: the research design is shown in Sect. 3.1, data collection and preprocessing are summarized in Sect. 3.2, and the ML classification techniques and stacked ensemble approach are explained in Sect. 3.3.


Table 1. A literature evaluation of cardiac disease predictions, comparing several methods.

| Source | Datasets | FS | Attributes | Classifier & Validation techniques | Accuracy |
|---|---|---|---|---|---|
| Mohan S. [1] | Cleveland UCI repository | HRFLM | 14 attributes | DT, GLM, RF, and 5 more | 88.4% |
| Bashir S. [4] | UCI dataset | Minimum Redundancy Maximum Relevance (MRMR) | FBS, Cp, Trestbps, Chol, Age, Slope, Sex, and 7 more attributes | NB, Logistic Regression, LR SVM, DT and RF | NB: 84.24%; LR (SVM): 84.85% |
| Le, H.M. [5] | UCI Machine Learning Repository | Infinite Latent Feature Selection (ILFS) | 58 attributes | WEKA, NB, LR, Non-linear SVM (Gaussian, Polynomial, Sigmoid), and Linear SVM | Linear SVM: 89.93%; ILFS: 90.65% |
| Yadav D.C. and Pal [6] | UCI repository | Lasso Regularization, Recursive Feature Elimination and Pearson Correlation | Resting, FBS, CP, Chol, Sex, Ca, Age, and 7 more attributes | Random Tree, M5P, and Reduced Error Pruning with Random Forest Ensemble Method | Random Forest ensemble method: 99% |
| Kabir P.B. and Akter S. [7] | Hungary (HU), Long Beach (LB), Cleveland (Cleve.), and Switzerland (SR) | Hybrid | Cordocentesis, Max HR achieved, Epoch, Triglyceride, Sign, Coronary Infarction, Diastolic Pressure, and 6 more attributes | LGBM, RF, NB, SVM, and 3 more algorithms | KNN: 100.00%; DT: 100.00%; RF: 100.00% |
| Islam M.T. [8] | UCI Machine Learning Repository | PCA | 14 attributes | HGA with k-means | 94.06% |
| Patel J. [12] | Cleveland UCI repository | WEKA | 13 attributes | DT (J48), LMT, RF technique | J48 tree technique: 56.76% |

3.1 Research Design

In this step, all of the data are gathered into a single dataset. The feature-extraction approach for cardiovascular disease prognostication may also be applied with this aspect-analysis procedure. Following the identification of accessible data resources, the data are further selected, cleansed, and then converted to the required distribution. The outlier identification survey provides valuable characteristics for predicting coronary artery disease prognosis. Cross-validation, several classification approaches, and the stacked ensemble method are utilized to make predictions from the pre-processed data. After completing all of these steps, the illness can be forecast favorably. Following that, we assess the entire performance; the outcome is determined after the performance review (Fig. 1).

Fig. 1. Methodological framework for heart disease prediction.

3.2 Data Collection and Preprocessing

In this study, we used the Statlog, Cleveland, and Hungary datasets as the three datasets in this compilation. There are 1190 records in all, with 11 characteristics and one target variable: chest pain, cholesterol, sex, resting blood pressure, age, resting ECG (normal (0), ST-T abnormality (1), LV hypertrophy (2)), fasting blood sugar, max heart rate, exercise angina, old-peak, and ST slope (normal (0), upsloping (1), flat (2), downsloping (3)), with the target taking 0 for no disease and 1 for illness. It should be noted that zero values are used to represent null or missing entries; as a result, we must handle null values throughout the data preparation step. In our case, however, there are no null values. After that, we perform exploratory data analysis (Table 2).

Table 2. Features of the dataset: descriptive information.

| Features | Definition | Type |
|---|---|---|
| Age | Patient's age in completed years | Numerical |
| Sex | Male patients are indicated by 1 and female patients by 0 | Nominal |
| Chest Pain | The four types of chest pain that patients feel: 1. typical angina, 2. atypical angina, 3. non-anginal pain, 4. asymptomatic | Nominal |
| Resting BPS | Blood pressure in mm/Hg while in resting mode | Numerical |
| Cholesterol | Cholesterol in the bloodstream in mg/dl | Numerical |
| Fasting Blood Sugar | Fasting blood sugar levels >120 mg/dl are expressed as 1 (true) and 0 (false) | Nominal |
| Resting ECG | The ECG result at rest, displayed in three values: 0: Normal; 1: ST-T wave abnormality; 2: Left ventricular hypertrophy | Nominal |
| Max heart rate | Maximum heart rate achieved | Numerical |
| Exercise angina | Exercise-induced angina: 0 represents No and 1 represents Yes | Nominal |
| Oldpeak | ST-depression induced by exercise, relative to the resting state | Numerical |
| ST slope | The slope of the ST segment at peak exercise, taking three values: 1. upsloping, 2. flat, 3. downsloping | Nominal |
| Target | The objective variable that we must forecast. A value of 1 indicates that the person is at risk for heart disease, whereas a value of 0 indicates that the person is in good health | Numerical |
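A minimal pandas sketch of the preprocessing pipeline described in this section and in the workflow of Sect. 3.3 (z-score outlier removal and an 80/20 split); the file path and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")  # hypothetical path to the combined dataset

# Drop rows whose numerical features lie more than 3 standard
# deviations from the mean (z-score outlier removal).
num_cols = ["age", "resting bp s", "cholesterol", "max heart rate", "oldpeak"]
z = np.abs(stats.zscore(df[num_cols]))
df = df[(z < 3).all(axis=1)]

# 80/20 train/test split on the cleaned data.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```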

3.3 Models

Machine learning classification methods are utilized in this phase to classify cardiac patients and healthy people. The system employs the RF Classifier, MLP, KNN, ET Classifier, XGBoost, SVC, AdaBoost Classifier, CART, and GBM, among other common classification techniques. For our suggested system, we apply the stacked ensemble approach, which requires constructing base models and a meta-learner algorithm. The most relevant and standard evaluation metrics for this problem area, such as sensitivity, specificity, precision, F1-score, ROC, log loss, and Matthews correlation coefficient, are used to assess the outcome of each event.

1. RF Classifier: The Random Forest model is a classification technique that uses a random forest as its foundation. In both regression and classification, the algorithm can handle data sets with continuous as well as categorical variables, and it outperforms the competition on classification problems. The criterion is the function that determines the quality of a split; we utilized "entropy" for information gain, while "gini" stands for Gini impurity:

\text{Gini} = 1 - \sum_{i=1}^{G} (p_i)^2

\text{Entropy} = \sum_{i=1}^{G} -p_i \log_2(p_i)

2. MLP: A multi-layer perceptron (MLP) is a feedforward neural network that produces a set of outputs from a collection of inputs. An MLP comprises multiple layers of nodes, where the layers between the input and the output are connected as a directed graph.
3. KNN: The K-NN method is straightforward to implement and does not require a hypothesis or any other constraints. This algorithm may be used for exploration, validation, and categorization. Although K-NN is the most straightforward approach, it is hampered by duplicated and unnecessary data.
4. Extra Tree Classifier: Extremely Randomized Trees, or Extra Trees, is a machine learning ensemble technique. It is a decision tree ensemble comparable to bootstrap aggregation and random forest, among other decision tree ensemble approaches. The Extra Trees approach uses the training data to construct a significant number of extremely randomized decision trees. An average of the decision tree estimates is used in regression, whereas a majority vote is utilized in classification.
5. XGBoost: The XGBoost classifier is a machine learning method for categorizing both structured and tabular data. XGBoost is a high-speed and high-performance implementation of gradient-boosted decision trees. As a result, it is a sophisticated machine learning method with many moving parts, yet it can handle large, complicated datasets with ease. XGBoost is an ensemble modelling approach.
6. SVC: In both classification and regression problems, the Support Vector Classifier (SVC) is a common supervised learning technique. The SVC method's purpose is to find the optimal hyperplane for categorizing n-dimensional regions, so that subsequent observations may be readily classified. SVC selects the extreme points (support vectors) that aid in the construction of the hyperplane. The method belongs to the Support Vector Machine family, of which support vector classifiers are prominent examples.
7. AdaBoost Classifier: AdaBoost, shorthand for Adaptive Boosting, is a boosting approach used in machine learning as ensemble learning. Each instance's weights are reassigned, with larger weights applied to instances that were incorrectly classified; this is known as "Adaptive Boosting".
8. CART: In decision trees, a kind of supervised machine learning, data is split up repeatedly based on a parameter, with the input and the associated output specified in the training data. The tree may be explained by two entities: decision nodes and leaves.
9. GBM: Gradient boosting is a collection of algorithms that may be applied to a variety of problems, including classification and regression. It assembles a prediction system from a collection of weak models, usually decision trees.


10. Stacked Ensemble: The term "ensemble" refers to the procedure of combining many models: instead of employing a single model to make predictions, a group of models is used. Ensembles use two classic techniques:

• Bagging creates unique training subsets by sampling with replacement from the training data, and the outcome is determined by a majority vote. Consider the Random Forest example.
• Boosting converts weak learners into strong learners by creating sequential models, with the overall performance forming the final model, as in AdaBoost and XGBoost.

In this work, the stacked ensemble approach is used. The stacked ensemble approach is a supervised ensemble classification strategy that stacks many prediction algorithms to find the optimum combination. Stacking, also called Super Learning or Stacking Regression, is a framework in which a second-level model, the "meta-learner", is trained to find the optimum combination of the first-level base learners. In contrast to bagging and boosting, stacking aims to bring together strong, diverse learners.

We completed our work in the following steps (a code sketch of the resulting stacked model follows the description of Fig. 2 below):

1. For this system, we import all of the necessary libraries.
2. After loading our dataset, we clean and preprocess it.
3. We use the z-score to identify and eliminate outliers.
4. We divide the data into two parts, training and testing, with an 80/20% split.
5. We develop a model using cross-validation.
6. For the stacked ensemble technique, we stack all of the models: RF, MLP, KNN, ETC, XGB, SVC, ADB, CART, and GBM.
7. We assess and compare our model to other models.

Fig. 2. Stacked Ensemble Method

Figure 2 depicts two levels: LEVEL 0 and LEVEL 1. First, we use the base learners (level 0) to make forecasts. The ensemble prediction is then generated by feeding those forecasts into the meta-learner (level 1).
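A minimal scikit-learn sketch of this two-level design is given below. The paper does not state which meta-learner was used, so logistic regression is assumed here; XGBoost is omitted to keep the snippet dependency-free, but an xgboost.XGBClassifier could be appended to the list in the same way.

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Level-0 base learners.
base_learners = [
    ("rf", RandomForestClassifier()),
    ("mlp", MLPClassifier(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("etc", ExtraTreesClassifier()),
    ("svc", SVC(probability=True)),   # probabilities needed for stacking
    ("adb", AdaBoostClassifier()),
    ("cart", DecisionTreeClassifier()),
    ("gbm", GradientBoostingClassifier()),
]

# Level-1 meta-learner combines the base learners' cross-validated
# predictions into the final ensemble prediction.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=5)
# stack.fit(X_train, y_train); stack.predict(X_test)
```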


4 Result Analysis

This section presents the outcomes of the ten models indicated above. PRC (precision), sensitivity, specificity, F1 score, ROC, log loss, and MCC are the most common evaluation metrics used in this analysis. Precision refers to the proportion of retrieved instances that are relevant, whereas recall refers to the proportion of genuinely positive instances that are retrieved (Table 3).

Table 3. Result of various models with proposed model.

| Model | Accuracy | PRC | Sensitivity | Specificity | F1 Score | ROC | Log_Loss | MCC |
|---|---|---|---|---|---|---|---|---|
| Stacked Classifier | 0.910638 | 0.898438 | 0.934959 | 0.883929 | 0.916335 | 0.909444 | 3.086488 | 0.821276 |
| RF | 0.893617 | 0.865672 | 0.943089 | 0.839286 | 0.902724 | 0.891188 | 3.674399 | 0.789339 |
| MLP | 0.821277 | 0.809160 | 0.861789 | 0.776786 | 0.834646 | 0.819287 | 6.172973 | 0.642127 |
| KNN | 0.800000 | 0.787879 | 0.845528 | 0.750000 | 0.815686 | 0.797764 | 6.907851 | 0.599458 |
| Extra Tree Classifier | 0.885106 | 0.869231 | 0.918699 | 0.848214 | 0.893281 | 0.883457 | 3.968343 | 0.770445 |
| XGB | 0.897872 | 0.896000 | 0.910569 | 0.883929 | 0.903226 | 0.897249 | 3.527409 | 0.795248 |
| SVC | 0.812766 | 0.788321 | 0.878049 | 0.741071 | 0.830769 | 0.809560 | 6.466933 | 0.627138 |
| AdaBoost | 0.817021 | 0.812500 | 0.845528 | 0.785714 | 0.828685 | 0.815621 | 6.319943 | 0.633084 |
| CART | 0.851064 | 0.879310 | 0.829268 | 0.875000 | 0.853556 | 0.852134 | 5.144121 | 0.703554 |
| GBM | 0.829787 | 0.826772 | 0.853659 | 0.803571 | 0.840000 | 0.828615 | 5.879016 | 0.658666 |

The Stacked Ensemble Classifier is the best performer, with an accuracy of 0.910, sensitivity of 0.934, specificity of 0.883, the best F1-score of 0.916, the minimum log loss of 3.08, and the highest ROC value of 0.909. Across the same evaluation metrics, Random Forest has the highest sensitivity, while XGBoost is the second best overall.
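For completeness, the metric suite of Table 3 can be reproduced with scikit-learn; a sketch under the assumption that all scores, including Log_Loss and ROC, were computed from hard 0/1 predictions, which appears consistent with the magnitudes reported in the table.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss,
                             matthews_corrcoef, confusion_matrix)

def evaluate(model, X_test, y_test):
    """Compute the Table 3 metric suite for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "PRC": precision_score(y_test, y_pred),
        "Sensitivity": recall_score(y_test, y_pred),  # TP / (TP + FN)
        "Specificity": tn / (tn + fp),                # TN / (TN + FP)
        "F1 Score": f1_score(y_test, y_pred),
        # On hard labels, ROC AUC reduces to (sensitivity + specificity) / 2,
        # and log loss grows with the error rate, as in Table 3.
        "ROC": roc_auc_score(y_test, y_pred),
        "Log_Loss": log_loss(y_test, y_pred),
        "MCC": matthews_corrcoef(y_test, y_pred),
    }
```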

Fig. 3. Accuracy Chart of ML Models

Figure 3 shows a visual depiction of the effectiveness of all the previously discussed machine learning techniques. The stacked classifier model's accuracy is 91.06%, with an F1 score of 0.9163. The accuracies of the XGB and RF algorithms, on the other hand, are 89.78% and 89.36%, respectively, with F1 scores of 0.8972 and 0.8911. The accuracies of the Extra Tree Classifier, CART, GBM, MLP, SVC, and KNN algorithms are 88.51%, 85.10%, 82.97%, 82.12%, 81.27%, and 80.00%, respectively (Fig. 4).

Fig. 4. Confusion matrices of the stacked classifier model and ROC curve.

The confusion matrix for the implemented system is generated as shown in the figure above. In the area of machine learning, a confusion matrix is also referred to as an error matrix: a tabular form that allows the results of an approach to be visualized and reproduced. Such a summary is extremely useful when analyzing an ensemble learning approach, as it details exactly where the quantitative categorization succeeds and fails.

Fig. 5. Heart Disease Identification.

Figure 5 depicts a visual representation of all detected cardiac problems. Red indicates heart disease, whereas green indicates no cardiac disease.


5 Conclusion and Future Recommendation

Heart disease is among the most significant threats to human survival, and predicting cardiac illness has become a major concern and priority in the medical industry. Using the Stacked Ensemble Classifier, we have shown an improved heart disease prediction method that incorporates a number of different prediction techniques. In this work, we examined prediction performance in terms of accuracy, precision, ROC, sensitivity, specificity, F1 score, log loss, and MCC. To identify whether or not a person has a heart problem, we applied machine learning techniques, using the medical data set in a variety of ways. From the findings, we discovered that the enhanced stacked ensemble approach provides better accuracy than previous methods. The purpose of this research was to investigate particular ML techniques within one framework; we further wanted to increase the dependability of the system's operations, to provide more adequate assertions, and to encourage approaches for recognizing the appearance of CVD. The above-mentioned structure could be adapted and repurposed for new purposes. The results show that these data mining algorithms may accurately predict cardiac disease with a 91.06% accuracy rate. As our study is based on recorded data from the Statlog, Cleveland, and Hungary datasets, for future research we aim to train and test on a large medical data set using many ensemble methods, to see if we can enhance their performance. Our ensemble method is superior to traditional methods: even if it overfits at times, it usually reduces variance and minimizes modeling bias. It also has superior predictive performance, reduces dispersion, and achieves superior efficiency by choosing the best combination of models.

References

1. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019)
2. Palaniappan, S., Awang, R.: Intelligent heart disease prediction system using data mining techniques. In: 2008 IEEE/ACS International Conference on Computer Systems and Applications, pp. 108–115. IEEE (2008)
3. Ramalingam, V.V., Dandapath, A., Raja, M.K.: Heart disease prediction using machine learning techniques: a survey. Int. J. Eng. Technol. 7(2.8), 684–687 (2018)
4. Bashir, S., Khan, Z.S., Khan, F.H., Anjum, A., Bashir, K.: Improving heart disease prediction using feature selection approaches. In: 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST), pp. 619–623. IEEE (2019)
5. Le, H.M., Tran, T.D., Van Tran, L.: Automatic heart disease prediction using feature selection and data mining technique. J. Comput. Sci. Cybern. 34(1), 33–48 (2018)
6. Yadav, D.C., Pal, S.: Prediction of heart disease using feature selection and random forest ensemble method. Int. J. Pharm. Res. 12(4), 56–66 (2020)
7. Kabir, P.B., Akter, S.: Emphasised research on heart disease divination applying tree based algorithms and feature selection. In: 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1–6. IEEE (2021)
8. Islam, M.T., Rafa, S.R., Kibria, M.G.: Early prediction of heart disease using PCA and hybrid genetic algorithm with k-means. In: 2020 23rd International Conference on Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2020)


9. Soni, J., Ansari, U., Sharma, D., Soni, S.: Intelligent and effective heart disease prediction system using weighted associative classifiers. Int. J. Comput. Sci. Eng. 3(6), 2385–2392 (2011)
10. Rahman, M.J.U., Sultan, R.I., Mahmud, F., Shawon, A., Khan, A.: Ensemble of multiple models for robust intelligent heart disease prediction system. In: 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (ICEEiCT), pp. 58–63. IEEE (2018)
11. Vinothini, S., Singh, I., Pradhan, S., Sharma, V.: Heart disease prediction. Int. J. Eng. Technol. 7(3.12), 753 (2018)
12. Patel, J., TejalUpadhyay, D., Patel, S.: Heart disease prediction using machine learning and data mining technique. Heart Dis. 7(1), 129–137 (2015)
13. Dinesh, K.G., Arumugaraj, K., Santhosh, K.D., Mareeswari, V.: Prediction of cardiovascular disease using machine learning algorithms. In: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), pp. 1–7. IEEE (2018)
14. Kunjir, A., Sawant, H., Shaikh, N.F.: Data mining and visualization for prediction of multiple diseases in healthcare. In: 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), pp. 329–334. IEEE (2017)
15. Babu, S., et al.: Heart disease diagnosis using data mining technique. In: 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 1, pp. 750–753. IEEE (2017)
16. Karthiga, A.S., Mary, M.S., Yogasini, M.: Early prediction of heart disease using decision tree algorithm. Int. J. Adv. Res. Basic Eng. Sci. Technol. 3(3), 1–16 (2017)
17. Repaka, A.N., Ravikanti, S.D., Franklin, R.G.: Design and implementing heart disease prediction using Naives Bayesian. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 292–297. IEEE (2019)
18. Sonawane, J.S., Patil, D.R.: Prediction of heart disease using learning vector quantization algorithm. In: 2014 Conference on IT in Business, Industry and Government (CSIBIG), pp. 1–5. IEEE (2014)
19. Amin, S.U., Agarwal, K., Beg, R.: Genetic neural network based data mining in prediction of heart disease using risk factors. In: 2013 IEEE Conference on Information & Communication Technologies, pp. 1227–1231. IEEE (2013)
20. Ul Haq, A., Li, J.P., Memon, M.H., Nazir, S., Sun, R.: A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inf. Syst. (2018)
21. Gavhane, A., Kokkula, G., Pandya, I., Devadkar, K.: Prediction of heart disease using machine learning. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1275–1278. IEEE (2018)
22. Shah, D., Patel, S., Bharti, S.K.: Heart disease prediction using machine learning techniques. SN Comput. Sci. 1(6), 1–6 (2020)
23. Singh, A., Kumar, R.: Heart disease prediction using machine learning algorithms. In: 2020 International Conference on Electrical and Electronics Engineering (ICE3), pp. 452–457. IEEE (2020)
24. Soni, J., Ansari, U., Sharma, D., Soni, S.: Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int. J. Comput. Appl. 17(8), 43–48 (2011)
25. Dangare, C.S., Apte, S.S.: Improved study of heart disease prediction system using data mining classification techniques. Int. J. Comput. Appl. 47(10), 44–48 (2012)
26. Anushya, D.A.: Genetic exploration for feature selection. Int. J. Comput. Sci. Eng. 7(2) (2019)
27. Chen, A.H., Huang, S.Y., Hong, P.S., Cheng, C.H., Lin, E.J.: HDPS: heart disease prediction system. In: 2011 Computing in Cardiology, pp. 557–560. IEEE (2011)


28. Bhatla, N., Jyoti, K.: An analysis of heart disease prediction using different data mining techniques. Int. J. Eng. 1(8), 1–4 (2012)
29. Stacking Ensemble Machine Learning with Python. https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/. Accessed 22 Feb 2022

Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm

Nusrat Alam1(B), Samiul Alam2, Farzana Tasnim1, and Sanjida Sharmin1

1 International Islamic University Chittagong, Chattogram, Bangladesh
[email protected]
2 East Delta University, Chittagong, Bangladesh

Abstract. Predicting heart disease requires great precision and correctness, because a small fault may pose a big danger for a patient. In the field of machine learning, there are many classification algorithms for predicting heart disease. This paper presents the prediction of heart disease probability by machine learning classifiers whose input datasets are processed with feature engineering techniques. Feature engineering builds features by applying domain knowledge to the data. A comparison is shown before and after feature engineering for these supervised learning algorithms, identifying the algorithm with the best accuracy. The performance of each algorithm is determined, and a comparison is made based on the precision of the calculation and the evaluation time. The proposed method uses the Cleveland dataset and another dataset consisting of four datasets (Switzerland, Hungary, Cleveland, and Long Beach) downloaded from the Kaggle repository. The best accuracy for the Cleveland dataset, 86.89%, was obtained from the Ridge Classifier, while the other dataset yielded 100% accuracy for the Gradient Boosting classifier, Bagging Classifier, and Gaussian Process classifier. This research will help to predict heart disease at an early stage, which will reduce the death rate from heart disease.

Keywords: Machine learning algorithms · Disease Prediction · Heart Disease prediction · Feature selection · Feature engineering technique

1 Introduction

Machine learning has become an important subject for any engineering department. It is very important for data analysis, classification, and prediction, and it is closely associated with Big Data, Data Science, and Artificial Intelligence [19]. At present, various ML theories are also applied to ordinary web apps or mobile phones so that the applications become more intelligent and can acquire the ability to understand the human mind. The difference between a normal app and an ML-implemented app is that the normal app will always stay the same, whereas the ML-implemented app will be unique: every time you use it, you will feel that the app is becoming more intelligent [20]. However, ML does not only give intelligence to apps; ML works for any kind of classification and prediction, starting from diagnosis.

Heart disease is a major pathological state in today's time, referring to a large number of medical conditions associated with the heart. These medical terms describe the abnormal health issues that directly impact the heart and all of its components. Over 5.8 million Americans have it, and there are over 23 million affected worldwide [9]. Coronary artery disease, otherwise known as coronary illness, is the most commonly recognized form of heart disease [18]. It develops when the arteries that supply blood to the heart become obstructed with plaque, which makes them harden and narrow. Plaque contains cholesterol and other substances. Consequently, the blood supply lessens, and the heart receives less oxygen and fewer nutrients. As a result, the heart muscle weakens, and there is a risk of heart failure and arrhythmias. When plaque builds up in the arteries, this is known as coronary artery disease [12, 13].

The recognition of 5 common heart attack warning signs (neck, jaw, or back pain [8]; weakness or dizziness; chest discomfort; arm or shoulder pain; and shortness of breath) is greater in females than in males (54.4% versus 45.6%) and in whites (54.8%) than in blacks (43.1%) and Asians (33.5%). Between 1999 to 2000 and 2015 to 2016 [8], the prevalence of ideal levels of many cardiovascular health components improved for US children (12–19 years old), including nonsmoking, total cholesterol, and BP; but while no notable changes were seen in the prevalence of an ideal healthy-diet score among children over this period, the prevalence of ideal levels of body mass index, physical activity, and diabetes mellitus (DM) declined. Data from the Society of Thoracic Surgeons Database reflect that a total of 122,459 congenital heart surgical procedures were performed from July 2014 to June 2018. In 2018, 3,408 heart transplantations were performed in the United States, the most ever [8]. In a cohort of 58,671 parous females participating in the Nurses' Health Study II, without hypertension at baseline, gestational hypertension and preeclampsia during first pregnancy were associated with a higher rate of self-reported physician-diagnosed chronic hypertension over a 25- to 32-year follow-up (HR, 2.8 [95% CI, 2.6–3.0] for gestational hypertension and HR, 2.2 [95% CI, 2.1–2.3] for preeclampsia) [8].

The proposed research work is based on machine learning, which is a branch of Artificial Intelligence (AI). In previous research, authors calculated the accuracy of different machine learning algorithms and, based on that calculation, concluded which one was the best among them. There were some research gaps: limited machine learning algorithms were used for prediction, and the work was done on limited datasets for measuring accuracy. There is also a need to improve feature engineering and to find the machine learning approaches that yield the best analysis of heart diseases for the best prediction accuracy.
This research focuses on the development of new techniques to achieve the best accuracy of heart disease prediction: we applied the feature engineering technique to various efficient machine learning algorithms and made a comparison between the accuracy before and after feature engineering, calculating the accuracy of heart disease prediction on the mentioned datasets [6, 7]. We also made a comparison with various previous research works, and we report the best prediction rate for every model in the final declaration.

2 Related Work

In the research work by Archana et al. [1], the machine learning algorithms SVM, linear regression, decision tree, and K-nearest neighbor were used to predict cardiac disease. Jupyter notebook is employed as the simulation tool, as it is simple to use for Python programming projects. Taking into account the confusion matrix, which is based on the true positive, false positive, true negative, and false negative values, they find that among the evaluated algorithms KNN provided the highest accuracy, at 87%; a similar conclusion was provided by Pranav et al. The article by Nidhi et al. [3] offers a study of the various data mining techniques that can be applied in these automated systems. A patient will undergo fewer tests as a result of this automation; thus, it will not only save expenses but also save analysts' and patients' time [3]. Based on a combination of variables, such as risk factors defining the disease, Raghunath et al. [4] employed and developed suitable machine learning algorithms that are computationally efficient as well as accurate for predicting heart disease occurrence. Using the Cleveland dataset from UCI, which consists of 303 records and 14 features, Kumar et al. [5] applied various machine learning techniques and algorithms; after analyzing the results, they discovered that SVM provided the best accuracy, followed by naive Bayes, KNN, and decision tree. The Correlation-based Feature Selection (CFS) technique was put forth by Gazeloglu et al. [10] with a classifier that achieved a best accuracy of 84.81%; additionally, they used the RBF network and Chi-square to reach an accuracy of 81.1%. The Relief-based feature selection method (RFRS) with a C4.5 ensemble classifier was proposed by Liu et al. [11] using the StatLog dataset, achieving the greatest accuracy of 92.5%. PCA and Chi-square feature selection methods were used by Farzana et al. [14] to minimize the number of features in the Cleveland dataset, where RF with PCA gave the highest accuracy of 92.85%. Marappan, R. et al. [16] proposed a method to extract hidden patterns in a dataset using machine learning (ML) techniques, analyzing the accuracy of numerous ML algorithms to find the best prediction of heart disease; random forest achieved the highest accuracy of 90.16%. Riyaz, L. et al. [17] applied various ML techniques for predicting the incidence of coronary heart disease. According to their results, the highest average prediction accuracy was achieved by an ANN (86.90%), while the C4.5 DT method came up with the lowest prediction accuracy of 74.0%.


3 Methodology

The work is about predicting the probability of heart disease with machine learning algorithms, where two heart disease datasets with the same attributes were collected. A preprocessing technique was then applied to remove unwanted occurrences in the datasets. The proposed work was divided into two methods:
• First method: no feature engineering was applied to the two datasets; 15 machine learning algorithms were applied to the preprocessed datasets, and the performance of the algorithms was measured.
• Second method: the feature engineering method was applied to both preprocessed datasets. Feature engineering is a process that extracts features using domain knowledge so that they are better understood by machine learning algorithms. The same 15 machine learning algorithms were then applied and the performance of each algorithm analyzed.
Finally, the performance of the two methods, before and after feature engineering, is compared. Figure 1 shows the experimental workflow of this work.

Fig. 1. Experimental workflow of this research

3.1 Dataset Collection

For this research, two open-source datasets have been collected. The Cleveland dataset [6], collected from the UCI repository, contains 303 patient records; another dataset, consisting of four databases (Switzerland, Cleveland, Hungary, and Long Beach V) [7] and collected from the Kaggle repository, has 1,025 patient records. Both datasets have the same 14 attributes. The attribute descriptions are given in Table 1.

Table 1. Description of the attributes

3.2 EDA (Exploratory Data Analysis) & F.E (Feature Engineering)

Feature engineering is a method for taking vague raw data and turning it into features that a model can better understand and use to make decisions. Domain expertise about the data is required in order to build features. By creating features from raw data and thereby facilitating the machine learning process, feature engineering increases the predictive power of machine learning algorithms. Machine learning algorithms use data as input and provide accurate, informative results; the features, typically represented as organized columns, are extracted from the incoming data [15]. Through feature engineering, the dataset becomes more practical and intelligible for the machine, and the accuracy becomes more predictable. To achieve prestigious results from machine learning algorithms, feature engineering is necessary. So, basically, creating such features is what is fed to the model, so that the model may understand the data better. Feature engineering can arise out of domain understanding; it can come from data analysis or the exploratory data analysis phase as well, where some insight is converted into a feature. Sometimes features can also come from an external data provider. Feature engineering aims to set up the correct input dataset, compatible with the ML algorithm requirements, and to improve the performance of the ML models. Feature engineering techniques are: 1) Imputation 2) Categorical Encoding 3) Binning 4) Scaling 5) Feature Selection 6) Handling Outliers.
• Categorical Encoding: For a good prediction from a dataset, there is a need for an intelligent way to preprocess datasets that contain categorical features. Categorical encoding is a feature engineering technique where such variables are extracted from the dataset, separated, and marked by their categorical type. There are many types of categorical encoding, such as Label Encoding, One-Hot Encoding, Count Encoding, Target Encoding, and Leave-One-Out Target Encoding. In the proposed work, Label Encoding was applied, which converts each categorical value into a number.

Improved and Intelligent Heart Disease Prediction System

103

For example, if a 'Fruits' feature contains 3 categories, the values 0, 1, and 2 can be assigned to Mango, Apple, and Banana; the sex categories FEMALE and MALE can be encoded with the values 0 and 1.
• Scaling: To make better predictions, the proposed method needs to preprocess the datasets. Scaling is a technique where data are rescaled by some mathematical transformation, such as Standard Scaling, Min-Max Scaling (normalization), or Quantile Transformation. In this research work, Standard Scaling and Min-Max Scaling were applied.
• Feature engineering: Both datasets have 14 attributes, the last of which is the label. The first 13 attributes went through the feature engineering method, which increased the number of attributes in the dataset. Table 2 shows the attributes of the dataset before feature engineering, where the 13 patient attributes have integer and categorical values; a brief encoding and scaling sketch follows the table.

Table 2. The 13 attributes before feature engineering: (1) age, (2) sex, (3) cp, (4) trestbps, (5) chol, (6) fbs, (7) restecg, (8) thalach, (9) exang, (10) oldpeak, (11) slope, (12) ca, (13) thal.
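As an illustration of the label encoding and scaling steps above, here is a minimal scikit-learn sketch; the toy column names are hypothetical, and note that scikit-learn's LabelEncoder assigns codes in alphabetical order rather than in the order of the fruit example above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Toy frame with two categorical columns and one numeric column (hypothetical).
df = pd.DataFrame({"sex": ["FEMALE", "MALE", "FEMALE"],
                   "fruit": ["Mango", "Apple", "Banana"],
                   "chol": [233.0, 286.0, 199.0]})

# Label encoding: each category becomes an integer, assigned alphabetically,
# so FEMALE -> 0, MALE -> 1 and Apple -> 0, Banana -> 1, Mango -> 2.
for col in ["sex", "fruit"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Standard scaling (zero mean, unit variance); swap in MinMaxScaler for [0, 1].
df[["chol"]] = StandardScaler().fit_transform(df[["chol"]])
print(df)
```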

Table 3 shows the attributes after applying the feature engineering method, which generates 47 additional patient attributes. As the feature creation method, selection of the categorical features and encoding of the categorical features (label encoding) were applied. The previous 13 attributes and the 47 newly generated attributes were merged to create a new dataset of 60 attributes; a sketch of this construction follows the table.

3.3 Preparing to Model

Here, for predicting heart disease, both datasets were split into 80% training data and 20% testing data. Models were then created from the training data by applying different machine learning algorithms: Linear Regression, Decision Tree Classifier, Support Vector Machines, Logistic Regression, k-Nearest Neighbors (KNN), Neural Network (NN), Gaussian Process Classification, Naive Bayes, and so on. The label of the test data was then predicted and the accuracy of each algorithm evaluated for both datasets.

Table 3. The 47 new attributes after feature engineering: age2, trestbps2, chol2, thalch2, oldpeak2, sex_cp, sex_trestbps2, sex_chol2, sex_thalach2, sex_oldpeak2, sex_slope, sex_ca, age2_cp, age2_trestbps2, age2_chol2, age2_thalach2, age2_oldpeak2, age2_slope, age2_ca, fbs_cp, fbs_trestbps2, fbs_chol2, fbs_thalach2, fbs_oldpeak2, fbs_slope, fbs_ca, restecg_cp, restecg_trestbps2, restecg_chol2, restecg_thalach2, restecg_oldpeak2, restecg_slope, restecg_ca, exang_cp, exang_trestbps2, exang_chol2, exang_thalach2, exang_oldpeak2, exang_slope, exang_ca, thal_cp, thal_trestbps2, thal_chol2, thal_thalach2, thal_oldpeak2, thal_slope, thal_ca.
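The attribute names in Tables 2 and 3 suggest squared terms for the numeric attributes plus pairwise products between six base attributes and seven partners (6 × 7 = 42 interactions, plus 5 squares = 47). A minimal sketch under that assumption, including the 80:20 split and one of the fifteen classifiers, follows; the file name heart.csv and the 'target' label column are assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("heart.csv")  # assumed file: 13 UCI attributes plus 'target'

# Squared versions of the numeric attributes (age2, trestbps2, chol2, ...).
for col in ["age", "trestbps", "chol", "thalach", "oldpeak"]:
    df[col + "2"] = df[col] ** 2

# Pairwise products mirroring the Table 3 names (e.g. sex_cp, fbs_chol2).
bases = ["sex", "age2", "fbs", "restecg", "exang", "thal"]
partners = ["cp", "trestbps2", "chol2", "thalach2", "oldpeak2", "slope", "ca"]
for a in bases:
    for b in partners:
        df[f"{a}_{b}"] = df[a] * df[b]

X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80:20 split as in Sect. 3.3

scaler = StandardScaler()
X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)

model = SVC().fit(X_train, y_train)  # one of the 15 algorithms compared
print("SVM accuracy:", accuracy_score(y_test, model.predict(X_test)))
```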

4 Results and Discussion

The work predicts heart disease with machine learning algorithms. Two datasets, the Cleveland dataset and the combined (Switzerland, Cleveland, Hungary, and Long Beach) dataset, were collected. The proposed work was divided into two methods: first, no feature engineering was applied to the two datasets; second, the feature engineering method was applied. The datasets were then divided into training and test sets, the machine learning algorithms were applied, and the performance of the different algorithms was analyzed.

Comparing Accuracy Before and After Feature Engineering for the Cleveland Dataset [6]. Table 4 shows the comparison before and after feature engineering on the Cleveland dataset [6] of 303 instances. Machine learning techniques were applied, and after applying the feature engineering technique, better accuracy was achieved: the Support Vector Machine gave the best accuracy of 88.52%, a better result than the other machine learning techniques in this model. The equations of the evaluation parameters are given below (see also Fig. 2), with TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative:

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

Table 4. Accuracy before and after applying feature engineering on dataset [6]

Algorithm | Accuracy before feature engineering | Accuracy after feature engineering
Support Vector Machines | 77.05% | 88.52%
Extra Trees Classifier | 80.33% | 85.25%
Ridge Classifier | 77.05% | 86.89%
Linear SVC | 78.69% | 86.89%
Logistic Regression | 77.05% | 86.89%
KNN | 77.05% | 83.61%
Gradient Boosting Classifier | 78.69% | 81.97%

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

F1score = \frac{2 \times (Recall \times Precision)}{Recall + Precision}

Fig. 2. Precision, Recall, F1-score before and after FE on Cleveland dataset.


Comparing Accuracy Before and After Applying Feature Engineering for Dataset [7]. Tables 5 and 6 reflect the comparison before and after feature engineering on the dataset consisting of the four databases Switzerland, Hungary, Cleveland, and Long Beach V [7], with 1,025 instances, following the same procedure as for dataset [6]. The Gradient Boosting Classifier, Gaussian Process Classifier, and Bagging Classifier gave a better accuracy, of 100%, than the other algorithms.

Table 5. Comparison of accuracy before and after applying feature engineering on dataset [7]

Algorithm | Accuracy before feature engineering | Accuracy after feature engineering
Support Vector Machines | 87.80% | 90.24%
KNN | 91.71% | 99.02%
LGBM Classifier | 95.12% | 97.56%
Decision Tree Classifier | 94.15% | 98.54%
Gaussian Process Classifier | 97.07% | 100%
Bagging Classifier | 95.12% | 100%

Table 6. Evaluation metrics before and after applying feature engineering to dataset [7]

Algorithm | Precision (before FE) | Precision (after FE) | Recall (before FE) | Recall (after FE) | F1-score (before FE) | F1-score (after FE)
SVM | 0.89 | 0.91 | 0.88 | 0.90 | 0.88 | 0.90
KNN | 0.92 | 0.99 | 0.92 | 0.99 | 0.92 | 0.99
LGBM Classifier | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99
DT | 0.94 | 0.99 | 0.94 | 0.99 | 0.94 | 0.99
Gaussian Process Classifier | 0.97 | 1.00 | 0.97 | 1.00 | 0.97 | 1.00
Bagging Classifier | 0.95 | 1.00 | 0.95 | 1.00 | 0.94 | 1.00

4.1 Comparing This Research Work with Some Previous Research

Table 7 shows the comparison between the proposed methods and previous research works; the proposed method gives better results than the others.


Table 7. Comparing the accuracy of heart disease prediction with different algorithms from different research

Author | Dataset | Feature selection technique | Algorithms | Highest accuracy
Archana [1] | Cleveland dataset | No feature selection technique | SVM, DT, LR, KNN | 87%
D. Raghunath et al. [4] | Cleveland dataset with 500 samples | No feature selection technique | KNN, DT, LR, NB, SVM | 84.76%
X. Liu et al. [11] | StatLog dataset | RFRS | C4.5 | 92.5%
C. Gazeloglu et al. [10] | Cleveland dataset | CFS & Chi-square | NB, RBF classifier | 84.81%
Marappan, R. et al. [16] | Cleveland and Hungarian datasets | No feature selection | RF, LR, NB, SVM | 90.165%
AJN (proposed work) | Cleveland dataset [6]; Cleveland, Hungary, Switzerland, Long Beach V [7] | Feature engineering | Dataset [6]: SVM; Dataset [7]: Gradient Boosting, Gaussian Process, Bagging Classifier | 89% (dataset [6]); 100% (dataset [7])

5 Conclusion

This research provides an analysis of various machine learning techniques that will be useful to medical analysts and healthcare practitioners in making an accurate diagnosis of heart disease. The experimental analysis shows that, by applying feature engineering techniques and a machine learning algorithm, heart disease can be predicted. Here, the feature engineering technique increased the total number of attributes of the dataset, which enhanced the performance of the machine learning algorithms. The proposed method applied fifteen machine learning classification algorithms; for dataset [6] the best accuracy was 88.52%, gained by SVM, and for dataset [7] the best accuracy was 100%, gained by the Bagging Classifier and the Gaussian Process Classifier. In future work, local clinic datasets will be gathered; the attributes of the dataset can similarly be changed and the proposed model applied. The dataset can be examined with its correlation values so that suitable attributes are selected for predicting heart disease.

References

1. Chudhey, A.S., Sharma, A., Singh, M.: Heart disease prediction using various machine learning algorithms. In: Mahapatra, R.P., Peddoju, S.K., Roy, S., Parwekar, P., Goel, L. (eds.) ICRTC 2021. LNNS, vol. 341, pp. 325–335. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7118-0_28


2. Bhoyar, S., Wagholikar, N., Bakshi, K., Chaudhari, S.: Real-time heart disease prediction system using multilayer perceptron. In: 2021 2nd International Conference for Emerging Technology (INCET), pp. 1–4. IEEE (2021)
3. Bhatla, N., Jyoti, K.: An analysis of heart disease prediction using different data mining techniques. Int. J. Eng. 1(8), 1–4 (2012)
4. Raghunath, D., Usha, C., Veera, K., Manoj, V.: Predicting heart disease using machine learning techniques. Int. Res. J. Comput. Sci. 149–153 (2019)
5. Rajesh, N., Maneesha, T., Hafeez, S., Krishna, H.: Prediction of heart disease using machine learning algorithms. Int. J. Eng. Technol. (UAE) 7(2.32 Special Issue 32), 363–366 (2018)
6. "HeartDiseaseDataset." https://www.kaggle.com/johnsmith88/heart-disease-dataset
7. Silva, F.S., et al.: Hyperbaric oxygen therapy mitigates left ventricular remodeling, upregulates MMP-2 and VEGF, and inhibits the induction of MMP-9, TGF-β1, and TNF-α in streptozotocin-induced diabetic rat heart. Life Sci. 295, 120393 (2022)
8. Gazeloglu, C.: Prediction of heart disease by classifying with feature selection and machine learning methods. Prog. Nutr. 22(2), 660–670 (2020)
9. Liu, X., et al.: A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput. Math. Methods Med. 2017 (2017)
10. Nguyen, T.N.A., Bouzerdoum, A., Phung, S.L.: A scalable hierarchical Gaussian process classifier. IEEE Trans. Signal Process. 67(11), 3042–3057 (2019)
11. Patel, J., TejalUpadhyay, D., Patel, S.: Heart disease prediction using machine learning and data mining technique. Heart Dis. 7(1), 129–137 (2015)
12. Tasnim, F., Habiba, S.U.: A comparative study on heart disease prediction using data mining techniques and feature selection. In: 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 338–341. IEEE (2021)
13. Almustafa, K.M.: Prediction of heart disease and classifiers' sensitivity analysis. BMC Bioinform. 21(1), 1–18 (2020)
14. Srivastava, K., Choubey, D.K.: Heart disease prediction using machine learning and data mining. Int. J. Recent Technol. Eng. 9(1), 212–219 (2020)
15. Essinger, S.D., Rosen, G.L.: An introduction to machine learning for students in secondary education. In: 2011 Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), pp. 243–248. IEEE (2011)
16. Marappan, R.: Heart disease prediction analysis using machine learning algorithms. J. Appl. Math. Comput. 6(3), 273–281 (2022). https://doi.org/10.26855/jamc.2022.09.001
17. Riyaz, L., Butt, M.A., Zaman, M., Ayob, O.: Heart disease prediction using machine learning techniques: a quantitative review. In: Khanna, A., Gupta, D., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds.) International Conference on Innovative Computing and Communications. AISC, vol. 1394, pp. 81–94. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-3071-2_8
18. Hossen, M.K.: Heart disease prediction using machine learning techniques. Am. J. Comput. Sci. Technol. 5(3), 146–154 (2022)
19. Mahmud, M., Kaiser, M.S., Hussain, A., Vassanelli, S.: Applications of deep learning and reinforcement learning to biological data. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2063–2079 (2018). https://doi.org/10.1109/TNNLS.2018.2790388
20. Mahmud, M., Kaiser, M.S., McGinnity, T.M., Hussain, A.: Deep learning in mining biological data. Cogn. Comput. 13(1), 1–33 (2020). https://doi.org/10.1007/s12559-020-09773-x

PreCKD_ML: Machine Learning Based Development of Prediction Model for Chronic Kidney Disease and Identify Significant Risk Factors

Md. Rajib Mia1, Md. Ashikur Rahman1, Md. Mamun Ali1, Kawsar Ahmed2,3(B), Francis M. Bui3, and S M Hasan Mahmud4

1 Department of Software Engineering (SWE), Daffodil International University (DIU), Daffodil Smart City, Ashulia, Savar, Dhaka 1341, Bangladesh {rajib.swe,ashikur35-562,mamun35-274}@diu.edu.bd
2 Group of Biophotomatix, Department of ICT, Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail 1902, Bangladesh [email protected], [email protected], [email protected]
3 Department of Electrical and Computer Engineering (ECE), University of Saskatchewan (USASK), 57 Campus Drive, Saskatoon, SK S7N 5A9, Canada {k.ahmed,francis.bui}@usask.ca
4 Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh [email protected]

Abstract. Chronic Kidney Disease (CKD) has been a major cause of death in recent years; it can be cured with early treatment and proper supervision, but for that, CKD must be detected early and the exact risk factors must be known. This study mainly aims to address this issue by building a predictive model and discovering the most significant risk factors for CKD patients employing a machine learning (ML) approach. Four individual machine learning classifiers were applied to conduct this study. It is found that GB performed very poorly compared to the other applied classifiers, while RF and LightGBM outperformed the rest with 99.167% accuracy. In terms of risk factors, it is found that sg, hemo, sc, pcv, al, rbcc, htn, dm, bgr, and sod are the most significant factors, which are mainly correlated with CKD. The findings indicate that the study will enable patients, doctors, and clinicians to identify CKD patients early and ensure proper treatment for them.

Keywords: CKD · Feature Importance · Hemoglobin · Random Forest · Specific Gravity


1 Introduction

The kidney is one of the major organs in the human body. The kidneys are two bean-shaped organs that work together. Their primary function is to filter the blood and discharge waste, as well as to balance the fluid in the human body through the excretion of excess fluid in the urine. Healthy kidneys filter between 120 and 150 quarts of blood and make between 1 and 2 quarts of urine every day. The creation of urine comprises a series of very complex excretion and re-absorption processes. This process is essential for keeping a stable balance of the body's salts, potassium, and acids, and for generating hormones needed for other organs' functionality. CKD is one of the fatal diseases causing concern throughout the world at the present moment. Besides, it also increases the risk of other conditions such as heart disease and heart failure, strokes, and early death [2]. CKD is a condition in which the kidneys' functionality declines progressively. About 850 million people around the world were suffering from kidney disease in 2020 alone, and the number is increasing every day. CKD affects 10.4 percent of men and 11.8 percent of women in this group [1]. The measured or estimated glomerular filtration rate (eGFR), which is determined by the creatinine level, gender, and age, is used to assess the stages of CKD. A large sum of money is required for the treatment of CKD patients: every year, countries all over the world spend a vast amount of their healthcare budgets on CKD patients. It has therefore been a big concern for developing countries like India, Bangladesh, and so on to spend a huge amount on CKD treatment, and it is necessary to raise awareness among people about the risks and symptoms of CKD.

Several studies have attempted in the last decade to predict CKD using various ML algorithms. Ramesh et al. (2019) applied data mining (DM) with Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) classifiers to predict CKD; the RF classifier gave an accuracy of 99.16%. Though this study used three classifier algorithms, it only focused on the accuracy of the classifiers, and accuracy is not the only performance metric that makes a model good [3]. Yashfi et al. (2020) used an Artificial Neural Network (ANN) and the RF algorithm to identify CKD, performed a chi-square test for feature selection, and applied 10-fold cross-validation on the dataset; however, only two algorithms were utilized to predict CKD in this investigation, and only the chi-square test was used to identify the characteristics [4]. Ward et al. (2019) applied four Machine Learning (ML) algorithms, namely Logistic Regression (LR), SVM, RF, and Gradient Boosting (GB); among them, GB gave the highest accuracy, 99.7% on training data and 99.0% on testing data [5]. Pankaj et al. (2021) trained models using ML classifiers such as ANN, C5.0, Logistic Regression (LR), linear SVM, K-Nearest Neighbors (KNN), and Random Tree (RT); they also utilised a Deep Neural Network (DNN), which had a 99.6% accuracy rate [6]. Jiongming et al. (2019) applied various ML algorithms to analyze the different features that are important for predicting CKD; among those algorithms, KNN had 99.25% accuracy [7].


Hasan et al. (2019) performed a study applying five ML algorithms, namely Adaptive Boosting (AB), Bootstrap Aggregating (BA), Extra Trees (ET), GB, and the RF classifier; with 99 percent accuracy, AB was the most accurate [8], although this study only focused on feature importance techniques. Celik et al. (2016) proposed a model based on SVM and DT classifier algorithms and got an accuracy of 100% for DT Test 1 but 91.6667% for DT Test 2; here, only two algorithms were used to predict CKD [9]. Surya et al. (2021) proposed a model based on Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory (BLSTM). This study got an AUROC of 0.957 for 6-month predictions and an AUROC of 0.954 for 12-month predictions, although the dataset used in the study is not a clinical dataset: it contains only age, sex, comorbidities, and medications [10]. Another study in CKD was proposed by Almansour et al. (2019) based on ANN and SVM approaches, where the ANN got the better performance with 99.75% accuracy; though the performance was good, the study only used two approaches for predicting CKD [11]. Furthermore, Radha and Ramya (2015) carried out a study with Naïve Bayes (NB), DT, KNN, and SVM classifier algorithms to identify CKD and got the highest accuracy of 98% for KNN, with NB giving poor accuracy, whereas no significant characteristics were identified by this study [12]. In addition, Chiu et al. proposed an ANN-based model with an accuracy of 94.75%; however, this research did not come across any features that were significant for making predictions [13].

According to the above-mentioned discussion, it can be found that there is still scope to improve the existing methods. From this perspective, this study concentrated on the characteristics that have been shown to predict CKD and also on which features are important to predict positive CKD and which features are impactful to predict negative CKD. Various Machine Learning (ML) algorithms have been used for performing predictions; nowadays, ML is very useful for predicting many kinds of disease in the medical field. The primary goal of this study is to analyze the features, predict which features are impactful in predicting CKD, and select the best model among different ML algorithms by analyzing the statistical results. Our contributions are as follows:
✦ Having collected open-source data from the Kaggle online repository, the data was preprocessed to fit the ML classifiers.
✦ Four of the most widely used individual ML algorithms were then applied to find the best-fit ML classifier for the expected predictive model.
✦ After comparing the performances of all the applied classifiers, the best-fit classifier was selected.
✦ SHAP values were calculated to find out the impact of each feature on CKD.
✦ The risk factors of CKD were analysed and discussed.


Fig. 1. Experimental methodology

2 Materials and Methods

2.1 Data

The dataset used in this study came from the UCI ML Repository [14]. There are 400 instances in all, 250 of which are CKD and the remaining 150 not CKD. In addition, the dataset has 25 features in total; among them, 24 are labeled features and 1 is the targeted feature. ckd (stated as "CKD patient") and notckd (stated as "Not CKD patient") were used to categorize the target feature. Furthermore, there are basically two types of data, numeric and nominal: 14 features are nominal and the remaining 11 are numeric. Table 1 contains detailed information on the dataset, including the name, type, and interpretation of every feature.

2.2 Data Preprocessing

Every ML or DM approach requires data preprocessing, because the efficiency of an ML approach depends on it. The data was preprocessed using Weka version 3.8.6 as a DM tool; Python version 3.7.12 was used for Exploratory Data Analysis (EDA) visualisation and model building. To begin, a ReplaceMissingValues filter was used to handle missing data. Secondly, we performed encoding to convert string data to numeric format, using label encoding for this conversion. Label encoding is the process of converting labels into numbers so that machines can read them; ML techniques might then be able to figure out how these labels should best be handled. It is a very important step in the preparation of classifiers for structured datasets [26]. Then some EDA with visualization was performed on the processed dataset; a brief preprocessing sketch is given below. The working methodology of this study is shown in Fig. 1.
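A minimal Python sketch of these two preprocessing steps, assuming the file name kidney_disease.csv; scikit-learn's SimpleImputer stands in here for Weka's ReplaceMissingValues filter.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("kidney_disease.csv")  # assumed file name for the CKD data

# Replace missing values: mean for numeric columns, most frequent value for
# nominal columns (mirroring the effect of Weka's ReplaceMissingValues filter).
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

# Label-encode the nominal features (yes/no, normal/abnormal, ckd/notckd, ...).
for col in cat_cols:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
```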


Table 1. Data and feature interpretation

Feature | Type | Interpretation | Unit/Attribute values
age | numeric | Patient age | Years
bp | numeric | Blood Pressure | mm/Hg
sg | nominal | Specific Gravity | 1.005, 1.010, 1.015, 1.020, 1.025
al | nominal | Albumin | 0, 1, 2, 3, 4, 5
su | nominal | Sugar | 0, 1, 2, 3, 4, 5
rbc | nominal | Red Blood Cell | normal, abnormal
pc | nominal | Pus Cell | normal, abnormal
pcc | nominal | Pus Cell clumps | present, not present
ba | nominal | Bacteria | present, not present
bgr | numeric | Blood Glucose Random | mgs/dl
bu | numeric | Blood Urea | mgs/dl
sc | numeric | Serum Creatinine | mgs/dl
sod | numeric | Sodium | mEq/l
pot | numeric | Potassium | mEq/l
hemo | numeric | Hemoglobin | gms
pcv | numeric | Packed Cell Volume | –
wc | numeric | White Blood Cell Count | cells/cumm
rc | numeric | Red Blood Cell Count | millions/cmm
htn | nominal | Hypertension | yes, no
dm | nominal | Diabetes Mellitus | yes, no
cad | nominal | Coronary Artery Disease | yes, no
appet | nominal | Appetite | good, poor
pe | nominal | Pedal Edema | yes, no
ane | nominal | Anemia | yes, no
class | nominal | Target Class | ckd, notckd

2.3 Performance Evaluation Metrics

Four different classification techniques were applied to the dataset in order to determine which technique performed best when accuracy and other statistical metrics were compared using a train-test split. RF, GB, XGBoost (XGB), and the Light Gradient Boosting Machine (LightGBM) were used in this study and compared on the basis of their performance evaluation metrics. This part provides an overview of the various performance assessments. The confusion matrix was used to determine the sensitivity (Sn), specificity (Sp), and accuracy (Acc) of each algorithm's outcome. All parameters were calculated using the formulas shown below [20,23]:

Acc = \frac{TP + TN}{TP + TN + FP + FN}   (1)

Sn = \frac{TP}{TP + FN}   (2)

Sp = \frac{TN}{TN + FP}   (3)

The number of accurate predictions provided by the model over all possible predictions is referred to as accuracy in classification ML problems; accuracy is a suitable metric when the target variable classes in the data are roughly balanced [20]. The percentage of true positive instances that were predicted to be positive is known as sensitivity [23]. The term "specificity" refers to the fraction of real negatives that were predicted as negatives [24]. Multiple statistical measures such as the kappa statistic (Kp), recall (Rc), precision (Pr), and f1-measure (F1) were also used to evaluate the performance of the algorithms. "Recall" refers to the proportion of positives recovered by our ML algorithm [21]. Precision is the number of real positives divided by the total number of predicted positives [22]. The F1 score is the harmonic mean, i.e., the weighted average, of precision and recall [21]. The kappa statistic is a measure used to compare observed and expected accuracy [25].

Rc = \frac{TP}{TP + FN}   (4)

Pr = \frac{TP}{TP + FP}   (5)

F1 = \frac{2 \times Pr \times Rc}{Pr + Rc}   (6)

Kp = \frac{observed\ accuracy - expected\ accuracy}{1 - expected\ accuracy}   (7)
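As a quick illustration (not from the paper), these quantities can be computed from a binary confusion matrix with scikit-learn; y_true and y_pred below are placeholder arrays.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Placeholder test labels and model predictions for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = (tp + tn) / (tp + tn + fp + fn)      # Eq. (1)
sn = tp / (tp + fn)                        # Eq. (2); equals recall, Eq. (4)
sp = tn / (tn + fp)                        # Eq. (3)
pr = tp / (tp + fp)                        # Eq. (5)
f1 = 2 * pr * sn / (pr + sn)               # Eq. (6)
kp = cohen_kappa_score(y_true, y_pred)     # Eq. (7)
print(acc, sn, sp, pr, f1, kp)
```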

2.4 Machine Learning Approaches

In this study, four ML classifiers, RF, GB, LightGBM, and XGBoost, were used. They are explained below.

Random Forest: RF is a supervised learning approach. It builds a "forest" from a group of decision trees, most of which are trained using the "bagging" method; the basic idea behind bagging is that combining several learners makes the final result better [15]. This supervised learning approach predicts the result by voting: if a majority of the trees in the forest predict 1, then the RF gives 1 as the final prediction, and vice versa [16]. RF is also a meta-method that uses DT classifiers on repeated resamples of the dataset and then uses averaging to improve prediction accuracy and avoid overfitting. When bootstrap=True, the size of the resamples is determined by the max_samples argument; otherwise, the entire dataset is used to create each tree [27]. For this research, the best-fit n_estimators value was 50 and max_depth was 4, which provided the best performance on the used dataset.

Gradient Boosting: GB builds additive regression models by repeatedly fitting a simple parameterized function (base learner) to the current "pseudo"-residuals by least squares [17]. GB classifiers are a group of ML techniques that combine many weak learning models to make a strong prediction model; DTs are typically used as the base learners. GB models are becoming more popular because they are good at classifying large, complicated datasets [28]. Additionally, the GB algorithm employs a sequential ensemble learning technique: the weak learners improve step by step through this loss-optimization strategy. For instance, the second weak learner is stronger than the first, and the third is stronger than the second. For this research, the best-fit n_estimators value was 25 and the learning rate was 0.1, which resulted in the best performance on the dataset used in the study.

LightGBM: LightGBM is a GB method that uses tree-based learning techniques. It is distributed and supports parallel and GPU learning, making it capable of managing massive amounts of data. LightGBM can run about six times as fast as XGBoost. XGBoost is a very quick and accurate ML method, but it is currently being challenged by LightGBM, which runs quicker with equivalent model accuracy and provides users with more hyperparameters to tune. The critical performance difference is that XGBoost splits the tree nodes one level at a time, while LightGBM does so one leaf at a time [19]. In other words, LightGBM grows trees leaf-wise, while the other method grows trees level-wise, and LightGBM picks the leaf with the maximum delta loss to grow [30]. For this study, the best-fit random_state value was 75, the learning rate was 0.09, and max_depth was 5, which performed optimally on the used dataset.

Extreme Gradient Boosting: XGBoost is an ensemble learning method. At times, the result of a single ML model may not be enough; ensemble learning is a structured way to combine the predictive ability of many learners, so that a single model combines the results of many models [18]. XGBoost is a distributed GB framework that has been designed to be very efficient, flexible, and portable; it implements ML techniques within the GB framework and uses parallel tree boosting to solve a number of data science tasks quickly and accurately [29]. The XGBoost algorithm is designed around the concepts of performance and execution time, and it performs much quicker than other boosting algorithms. Both regression and classification problems may be solved with XGBoost. This strategy essentially enhances the DT sequence and improves the accuracy depending on the weights. For this analysis, the best-suited base_score value was 0.5 and the learning rate was 0.1, which provided the highest performance for the used dataset.
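A minimal sketch instantiating the four classifiers with the hyperparameter values reported above; the dataframe df is assumed to be the preprocessed frame from Sect. 2.2 with 'class' as the target, and the 70:30 split ratio is an assumption added for illustration.

```python
from lightgbm import LGBMClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# 'df' is the preprocessed frame from Sect. 2.2; 'class' is the target column.
X = df.drop(columns="class")
X_train, X_test, y_train, y_test = train_test_split(X, df["class"], test_size=0.3)

# Hyperparameters as reported in the text for each classifier.
models = {
    "RF": RandomForestClassifier(n_estimators=50, max_depth=4),
    "GB": GradientBoostingClassifier(n_estimators=25, learning_rate=0.1),
    "LightGBM": LGBMClassifier(random_state=75, learning_rate=0.09, max_depth=5),
    "XGBoost": XGBClassifier(base_score=0.5, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```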

2.5 Model Selection and Features Importance

For every ML approach, selecting the best model is crucial. In this study, we select the best model by considering various evaluation metrics and statistically analyzing their results. Another crucial step in ML is selecting the important features for prediction. Feature importance matters because finding and ranking the important features has a really big impact on prediction research in the fields of biomedicine and social science. In this work, SHAP values have been used to sort out the important features of this dataset. SHAP values quantify the effect of a particular feature taking a particular value, relative to the prediction that would be generated at some baseline value of that feature [31]. The SHAP value is estimated using the equation below:

\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (K - |S| - 1)!}{K!} \left[ f(S \cup \{i\}) - f(S) \right]   (8)

Here φi is the feature importance value of the ith feature, K is the number of independent features, and S runs over the subsets of non-zero indexes.
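A sketch of how such a ranking is typically produced with the shap library, assuming one of the fitted tree models from the sketch above; the paper's exact plotting calls are not specified.

```python
import shap

# TreeExplainer supports the tree ensembles used here (RF, GB, XGB, LightGBM).
explainer = shap.TreeExplainer(models["RF"])   # a fitted model from above
shap_values = explainer.shap_values(X_test)

# Summary plot ranking features by their impact on the prediction (cf. Fig. 2).
shap.summary_plot(shap_values, X_test, max_display=20)
```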

3 Results and Discussion

In this study, Python (version 3.8.5) was employed to conduct the experiments, and Google Colab was used as the IDE and programming environment. The results of the study are presented in this section. Table 2 lists the accuracy, sensitivity, and specificity scores for the different ML approaches used in this work. Table 2 shows that two of the ML approaches, RF and LightGBM, predicted CKD with the maximum accuracy of 99.167%. The highest sensitivity score is 1.0, for RF; XGB gives the highest specificity of 1.0; the highest recall score found is 1.0, for RF; and XGB and LightGBM give the highest precision score, 1.0, among the four ML approaches. The F1-measure score is 0.989 for RF and LightGBM, which is the highest, and a kappa statistic score of 0.982 is shown by LightGBM and RF, the highest among all the used ML algorithms. These results are shown in Table 2.

Table 2. Performance comparison among all the applied classifiers

Algorithm | Accuracy | Sensitivity | Specificity | Recall | Precision | F1 Measure | Kappa Statistic
GB | 95.83% | 0.959 | 0.956 | 0.959 | 0.972 | 0.946 | 0.912
XGBoost | 98.33% | 0.973 | 1.0 | 0.973 | 1.0 | 0.979 | 0.965
RF | 99.167% | 1.00 | 0.978 | 1.0 | 0.986 | 0.989 | 0.982
LightGBM | 99.167% | 0.986 | 1.0 | 0.986 | 1.0 | 0.989 | 0.982

This research mainly focused on the features that are important for predicting CKD. To find the important features, this study used the SHAP summary plot, in which features are listed according to their impact on the prediction. Figure 2 shows the SHAP summary plots for the four different ML approaches used in this study; in each subplot, the top twenty features out of the total twenty-four are shown. Subplot A in Fig. 2 shows the features that are important for predicting CKD with the RF classifier; subplot B shows the impactful features of the GB classifier; and subplots C and D, respectively, show the features with the most impact on CKD prediction for the XGB and LightGBM classifiers. In Table 3, the top 10 features that have an impact on predicting CKD are summarized for the four ML algorithms.

Table 3. Top 10 significant features for CKD patients

Algorithm | Top ten features
RF | sg, hemo, sc, pcv, al, rbcc, htn, dm, bgr, sod
GB | sg, hemo, pcv, al, rbcc, dm, sc, htn, bgr, ba
XGB | sg, hemo, pcv, sc, sod, al, age, htn, rbcc, bgr
LightGBM | sg, hemo, sc, al, pcv, htn, sod, dm, age, bgr

In brief, a CKD dataset for this study was collected from an online repository known as Kaggle. The dataset was preprocessed as necessary to prepare the data for the ML approaches, and four different ML approaches were applied to predict CKD. It was found that the RF and LightGBM approaches give the highest accuracy, 99.167%. Later on, a feature importance method was applied to find the features that matter for predicting CKD: the SHAP summary plot is used in this work to show feature importance and impact and to find the important risk factors, and the important features of the four ML approaches were identified. Chittora et al. (2021) found the important features to be (rbcc, pc, al, ba, su, pcc, sc, age, bp, bgr) [6], and Qin et al. (2019) showed that (sg, hemo, sc, al, pcv, rbcc, htn, dm, bgr, bu) are the most important features [7]. These publications' outcomes indicate that our findings are valid and that the predictive model has high potential to predict CKD. The study will support doctors, clinicians, and patients in predicting CKD and related complexities and in finding their impact using the proposed methods. Overall, the study will contribute to the medical sector in predicting and analyzing the risk factors of a CKD patient.


Fig. 2. Significant features and their impact on CKD

4 Conclusion and Future Work

The study proposed an ML model, RF, to predict CKD with a significant accuracy of 99.17%. The study also focused on discovering the risk factors that are most significant for CKD prediction; it was found that sg, hemo, sc, and pcv are the most significant risk factors, i.e., those most responsible for CKD. Besides, all the features were ranked according to their significance. It can be noticed that the CKD dataset evaluated in this study is not particularly


large and was compiled by others. For improved analysis and a better account of the features' impact on model evaluation, raw data from CKD patients will be collected in the future. In addition, more advanced technology will be applied to upgrade the model and its performance. This research could have important therapeutic benefits, and the analysis of the study results could assist doctors and researchers in better predicting when someone will develop CKD.

Acknowledgement. This work was supported by funding from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

1. Davis, G., Kurse, A., Agarwal, A., Sheikh-Hamad, D., Kumar, M.R.: Nanoencapsulation strategies to circumvent drug-induced kidney injury and targeted nanomedicines to treat kidney diseases. Current Opinion in Toxicology, p. 100346 (2022)
2. Revathy, S., Bharathi, B., Jeyanthi, P., Ramesh, M.: Chronic kidney disease prediction using machine learning models. Int. J. Eng. Adv. Technol. (IJEAT) 9 (2019)
3. Yashfi, S.Y., Islam, M.A., Sakib, N., Islam, T., Shahbaaz, M., Pantho, S.S.: Risk prediction of chronic kidney disease using machine learning algorithms. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–5. IEEE (2020)
4. Qin, J., Chen, L., Liu, Y., Liu, C., Feng, C., Chen, B.: A machine learning methodology for diagnosing chronic kidney disease. IEEE Access 8, 20991–21002 (2019)
5. Zubair Hasan, K.M., Zahid Hasan, M.: Performance evaluation of ensemble-based machine learning techniques for prediction of chronic kidney disease. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds.) Emerging Research in Computing, Information, Communication and Applications. AISC, vol. 882, pp. 415–426. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-5953-8_34
6. Celik, E., Atalay, M., Kondiloglu, A.: The diagnosis and estimate of chronic kidney disease using the machine learning methods. Int. J. Intell. Syst. Appl. Eng. 4(Special Issue-1), 27–31 (2016)
7. Krishnamurthy, S., et al.: Machine learning prediction models for chronic kidney disease using national health insurance claim data in Taiwan. In: Healthcare, vol. 9, no. 5, p. 546. Multidisciplinary Digital Publishing Institute (2021)
8. Almansour, N.A., et al.: Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study. Comput. Biol. Med. 109, 101–111 (2019)
9. Radha, N., Ramya, S.: Performance analysis of machine learning algorithms for predicting chronic kidney disease. Int. J. Comput. Sci. Eng. Open Access 3, 72–76 (2015)
10. Chiu, R.K., Chen, R.Y., Wang, S.A., Jian, S.J.: Intelligent systems on the cloud for the early detection of chronic kidney disease. In: 2012 International Conference on Machine Learning and Cybernetics, vol. 5, pp. 1737–1742. IEEE (2012)
11. Ebiaredoh-Mienye, S.A., Esenogho, E., Swart, T.G.: Integrating enhanced sparse autoencoder-based artificial neural network technique and softmax regression for medical diagnosis. Electronics 9(11), 1963 (2020)


12. Donges, N.: A complete guide to the random forest algorithm. Built In, 16 (2019)
13. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986). https://doi.org/10.1007/BF00116251
14. Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)
15. Sundaram, R.B.: An end-to-end guide to understand the math behind XGBoost (2018)
16. Gupta, A., Gupta, A., Verma, V., Khattar, A., Sharma, D.: Texture feature extraction: impact of variants on performance of machine learning classifiers: study on chest x-ray – pneumonia images. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds.) BDA 2020. LNCS, vol. 12581, pp. 151–163. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66665-1_11
17. Pramanik, R., Khare, S., Gourisaria, M.K.: Inferring the occurrence of chronic kidney failure: a data mining solution. In: Gupta, D., Khanna, A., Kansal, V., Fortino, G., Hassanien, A.E. (eds.) Proceedings of Second Doctoral Symposium on Computational Intelligence. AISC, vol. 1374, pp. 735–748. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-3346-1_59
18. Ali, M.M.: Machine learning-based statistical analysis for early stage detection of cervical cancer. Comput. Biol. Med. 139, 104985 (2021)
19. Haitaamar, Z.N., Abdulaziz, N.: Detection and semantic segmentation of rib fractures using a convolutional neural network approach. In: 2021 IEEE Region 10 Symposium (TENSYMP), pp. 1–4. IEEE (2021)
20. Shah, A., Rathod, D., Dave, D.: DDoS attack detection using artificial neural network. In: International Conference on Computing Science, Communication and Security, pp. 46–66. Springer, Cham (2021)
21. Piech, M., Smywinski-Pohl, A., Marcjan, R., Siwik, L.: Towards automatic points of interest matching. ISPRS Int. J. Geo Inf. 9(5), 291 (2020)
22. Nelson, D.: Gradient boosting classifiers in Python with scikit-learn. Retrieved from Stack Abuse. https://stackabuse.com/gradient-boosting-classifiers-in-python-with-scikit-learn (2019)
23. Chen, T., He, T., Benesty, M., Khotilovich, V.: Package 'xgboost'. R version, 90 (2019)
24. Abdurrahman, M.H., Irawan, B., Setianingsih, C.: A review of light gradient boosting machine method for hate speech classification on twitter. In: 2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), pp. 1–6. IEEE (2020)
25. Lazich, I., Bakris, G.L.: Prediction and management of hyperkalemia across the spectrum of chronic kidney disease. In: Seminars in Nephrology, vol. 34, no. 3, pp. 333–339. WB Saunders (2014)
26. Rabby, A.S.A., Mamata, R., Laboni, M.A., Abujar, S.: Machine learning applied to kidney disease prediction: comparison study. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2019)
27. Bhutani, H., et al.: A comparison of ultrasound and magnetic resonance imaging shows that kidney length predicts chronic kidney disease in autosomal dominant polycystic kidney disease. Kidney Int. 88(1), 146–151 (2015)
28. Elhoseny, M., Shankar, K., Uthayakumar, J.: Intelligent diagnostic prediction and classification system for chronic kidney disease. Sci. Rep. 9(1), 1–14 (2019)
29. Grams, M.E.: Predicting timing of clinical outcomes in patients with chronic kidney disease and severely decreased glomerular filtration rate. Kidney Int. 93(6), 1442–1451 (2018)


30. Merzkani, M.A., et al.: Kidney microstructural features at the time of donation predict long-term risk of chronic kidney disease in living kidney donors. In: Mayo Clinic Proceedings, vol. 96, no. 1, pp. 40–51. Elsevier (2021)
31. Farrington, K., et al.: Clinical practice guideline on management of older patients with chronic kidney disease stage 3b or higher (eGFR < 45 mL/min/1.73 m2): a summary document from the European Renal Best Practice Group. Nephrol. Dial. Transplant. 32(1), 9–16 (2017)

A Reliable and Efficient Transfer Learning Approach for Identifying COVID-19 Pneumonia from Chest X-ray

Sharmeen Jahan Seema1(B) and Mosabber Uddin Ahmed2

1 Department of Information and Communication Technology, Bangladesh University of Professionals, Dhaka 1216, Bangladesh [email protected]
2 Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka 1000, Bangladesh [email protected]

Abstract. Over 500 million people have fallen prey to the coronavirus (COVID-19) epidemic that is sweeping the world. The traditional method for detecting it is pathogenic laboratory testing, but this has a high risk of false negatives, forcing the development of additional diagnostic approaches to combat the disease. X-ray imaging is a straightforward and patient-friendly procedure that may be performed in almost any healthcare facility. The aim of this report is to use transfer learning models to build a feasible mechanism for determining COVID-19 pneumonia automatically from chest X-ray images while enhancing detection accuracy. We ran several experiments on three publicly available datasets. The recommended mechanism is intended to provide multi-class classification diagnostics (COVID-19 pneumonia vs. Non COVID-19 pneumonia vs. Normal). In this study, the 5 best transfer learning methods, selected out of 9 alternative models, were tested in various scenarios with varied dataset splitting and amalgamation. Based on their performance with the Merged dataset, an ensemble model was developed using the top three models. Our proposed ensemble model had classification accuracy, precision, recall, and f1-score of 99.62%, 1.00, 0.99, and 1.00 for the multi-class case, respectively, and it detected 99.12% of COVID-19 pneumonia cases accurately. This recommended system can considerably improve COVID-19 diagnosis time and efficiency.

Keywords: Transfer Learning · Ensemble · Convolutional Neural Network · COVID-19 · Pneumonia

1 Introduction

SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) first appeared in 2019, became pandemic in 2020, and is now a particularly major cause of

pneumonia known as COVID-19 pneumonia [1]. As of April 14, 2022, the pandemic had spread to over 200 nations, with well over 500 million verified illnesses and surpassing 6 million deaths [2]. The infection might spread to the lungs as the virus multiplies; if this happens, it is probable that one will get pneumonia [3]. COVID-19-associated pneumonia was formerly known as Novel Coronavirus-Infected Pneumonia (NCIP); the World Health Organization termed the disease COVID-19, which stands for coronavirus disease 2019. It can affect anybody, although it is more common in persons aged 65 and over. Fever, chest tightness, and a persistent cough are obvious indicators of COVID-19. Symptoms such as a high heart rate, breathlessness, difficulty breathing, and disorientation may occur if the COVID-19 infection escalates to pneumonia [4]. Pneumonia is one of the most common lung diseases: a bacterial, viral, or fungal infection that affects the lungs [5]. Infants and small children, those over 65, and those with health issues or compromised immune systems are the most vulnerable [6]. According to the World Health Organization, pneumonia causes around 4 million individuals to die prematurely each year; it affects over 150 million people each year, primarily children under the age of five [7]. In some circumstances, distinguishing COVID-19 pneumonia from conventional pneumonia can be tricky, especially when clinical criteria are taken into account. In the initial days of COVID-19, the primary symptoms were fever, exhaustion, and a persistent cough, but people with General Pneumonia (GP) displayed the same common characteristics [8]. As a result, timely identification and isolation of those suffering from GP and COVID-19 pneumonia can hopefully keep the outbreak from propagating [9]. The conventional technique to pinpoint COVID-19 disease is reverse transcription polymerase chain reaction (RT-PCR). Unfortunately, it has a number of shortcomings, including false positives, limited sensitivity, high cost, and the need for professionals to perform the test. As the incidence rate climbs, it becomes increasingly important to develop an accurate, quick, and low-cost swift evaluation system. Because they are inexpensive to obtain and easily available, chest X-ray images could be used as a substitute method. However, COVID-19 can only be diagnosed from a chest X-ray by a professional physician, and there are very few specialists who can make this assessment. COVID-19 is also an extremely deadly viral disease, with healthcare professionals and attendants potentially extremely vulnerable. Early diagnosis of pneumonia is significant both for preventing epidemic spread and for ensuring a patient's recovery. By expediting the detection process, doctors may identify different types of pneumonia from chest X-rays more effectively and easily, potentially saving thousands of lives and lowering treatment costs. Although the literature describes a variety of methods for distinguishing X-ray images and detecting the COVID-19 pathogen, the bulk of the methods only distinguish between two groups (COVID-19 pneumonia vs. Normal). Nonetheless, well-developed models are required to categorize COVID-19 pneumonia against Non COVID-19 pneumonia and healthy instances [10]. The study's main focus is to establish a robust model that will identify COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal cases from X-ray images collected from different datasets. The existing datasets are typically


small in size due to the scarcity of COVID-19 pneumonia samples, and the works done on large datasets or amalgamations of small datasets are very few in number. This paper works with 3 different publicly available datasets which contain a comparatively higher number of chest X-ray images than the datasets previously worked upon. On top of that, this work implemented various ways of splitting and merging the datasets, which ultimately increased the amount of data. A performance analysis of the 5 best transfer learning models, chosen out of 9 transfer learning models, for accurate recognition and classification of the type of pneumonia is also given in this work. Finally, an ensemble model has been proposed for differentiating the 3 classes.

2 Literature Review

The underlying features from 13 Convolutional Neural Network (CNN) models were supplied to an SVM model by Sethy et al. [11]; ResNet50 with SVM outperformed the other 12 classification models with an efficiency of 95.33%. Khan et al. [12] suggested CoroNet, a deep convolutional neural network model based on the Xception architecture; the suggested model exhibited a classification accuracy of 95%. Narin et al. [13] offered five CNN-based models for the recognition of COVID-19, healthy, and pneumonia-infected patients; among the five, the ResNet50 model performed best. To categorize COVID-19 pneumonia, non-COVID-19 pneumonia, and normal cases, Nishio et al. [14] used the VGG16, MobileNet, DenseNet121, and EfficientNet CNN models; with an accuracy of 83.6%, the VGG16 model surpassed its competitor models. Ozturk et al. [15] classified chest X-rays using DarkCovidNet on 1,127 samples with an accuracy of 87.02%. EDL-COVID was used by Tang et al. [16] on 15,477 samples, with an accuracy of roughly 95%. To differentiate COVID-19 pneumonia from non-COVID pneumonia and normal patients, Öksüz et al. [17] developed Ensemble-CVDNet, and experimental results showed that it had a 98.30% accuracy rate. Bhardwaj et al. [18] used 4 distinct models to create a COVID-19-detecting deep ensemble learning system, which achieved a multiclass accuracy of 92.36%. To detect COVID-19, Afifi et al. [19] utilized three networks for a three-class problem, and the outcomes of the trial revealed that their model was 91.2% accurate.

3 Methodology

Recognition and classification of COVID-19 pneumonia and Non COVID-19 pneumonia using digital chest X-rays is a challenging undertaking because the X-ray images of both diseases show little to no difference. For precise classification, we need a robust and optimal model, which can be achieved through deep learning methods. In order to identify a reliable and effective model, we examined the effectiveness of various transfer learning models under various conditions. We also implemented a novel CNN-based model using an ensemble method. The recommended methodology is summarized in Fig. 1.


Fig. 1. Diagrammatic representation of the work technique.

3.1 Dataset Summarization

In this paper, three publicly available datasets have been used: the COVID-19 Radiography Dataset, COVID IEEE, and the Pneumonia and Normal Chest X-ray PA Dataset. In this work, they will be alluded to as Dataset 1, Dataset 2, and Dataset 3, respectively.

Table 1. Dataset designation and data distribution for all classes (modality: X-ray)

Dataset name | Total samples | Normal | Non COVID-19 pneumonia | COVID-19 pneumonia
Dataset 1 | 15,153 | 10,192 | 1,345 | 3,616
Dataset 2 | 1,708 | 668 | 619 | 421
Dataset 3 | 4,575 | 1,525 | 1,525 | 1,525
Merged dataset | 21,240 | 12,385 | 3,297 | 5,558

Dataset 1 was collected from various resources; it incorporates COVID-19-positive chest X-rays as well as normal and viral pneumonia images [20,21]. Dataset 2 contains 1,708 images, including 421 images of COVID-19, 619 images of viral pneumonia, and 668 images of normal patients [22]. The chest X-ray posterior-anterior (PA) images of Dataset 3 were obtained from several sources; it consists of a total of 4,575 images, with 1,525 images for each condition [23]. We removed some redundant images present in both Dataset 2 and Dataset 3. The Merged dataset, an integration of Datasets 1, 2, and 3, consists of a total of 21,240 images. The dataset designation and data distribution for all classes are depicted in Table 1.

Table 2. Dataset splitting and amalgamation.

| Datasets Used                     | Splitting Ratio / Training-Testing Dataset             | Training (Images) Total (Normal, Non COVID-19, COVID-19) | Testing (Images) Total (Normal, Non COVID-19, COVID-19) |
|-----------------------------------|--------------------------------------------------------|----------------------------------------------------------|---------------------------------------------------------|
| Dataset 1                         | 80:20                                                  | 12122 (8153, 1076, 2893)                                 | 3031 (2039, 269, 723)                                   |
| Dataset 1                         | 70:30                                                  | 10609 (7135, 942, 2532)                                  | 4544 (3057, 403, 1084)                                  |
| Dataset 1                         | 60:40                                                  | 9092 (6115, 807, 2170)                                   | 6061 (4077, 538, 1446)                                  |
| Dataset 1 & Dataset 2             | 100% Dataset 1 (training) & 100% Dataset 2 (testing)   | 15153 (10192, 1345, 3616)                                | 1708 (668, 619, 421)                                    |
| Dataset 1 & Dataset 3             | 100% Dataset 1 (training) & 100% Dataset 3 (testing)   | 15153 (10192, 1345, 3616)                                | 4575 (1525, 1525, 1525)                                 |
| Dataset 1 + Dataset 2 + Dataset 3 | 80:20                                                  | 17084 (9908, 2638, 4538)                                 | 4156 (2477, 659, 1020)                                  |

3.2 Dataset Preprocessing

The datasets underwent minimal preparation, consisting of image scaling and splitting. To make them compatible with all of the models, all images are scaled to the input image size of the respective transfer learning model. Three types of splitting have been performed on Dataset 1: an 80:20 split uses 80% of the images for training and 20% for testing, a 70:30 split uses 70% for training and 30% for testing, and a 60:40 split uses 60% for training and 40% for testing. Furthermore, Dataset 1 and Dataset 2 were employed for training and testing, respectively; next, Dataset 1 and Dataset 3 were employed for training and testing, respectively. Finally, the Merged dataset was divided 80:20 (80% training, 20% testing). The dataset splitting and amalgamation is described in Table 2.
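As a concrete illustration of this preprocessing step, the following is a minimal sketch, assuming OpenCV and scikit-learn are used (the paper does not name its tooling); the `prepare` helper and its defaults are hypothetical names for illustration:

```python
import cv2
from sklearn.model_selection import train_test_split

def prepare(image_paths, labels, target_size=(224, 224), test_frac=0.20):
    """Resize each image to the chosen TL model's input size, then split.

    target_size would be (299, 299) for Xception, InceptionV3 and
    InceptionResNetV2, and (224, 224) for the remaining six models.
    """
    images = [cv2.resize(cv2.imread(p), target_size) for p in image_paths]
    # An 80:20 split; stratification is our assumption to keep class balance.
    return train_test_split(images, labels, test_size=test_frac,
                            stratify=labels, random_state=42)
```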

3.3 Altered Transfer Learning Methods

Transfer Learning (TL) refers to the process of reusing a model that has already been developed for one task to solve a related task. As a byproduct of the ImageNet challenge, numerous CNN models have emerged for image classification, and these pre-trained models are applicable via transfer learning to a multitude of image classification tasks [24,25]. In this analysis, nine comparable pre-trained CNN models (InceptionV3 [26], InceptionResNetV2, Xception [27], VGG16 [28], VGG19, ResNet50 [29], ResNet101 [30], MobileNet [31], and DenseNet201 [32]) have been modified and used on the 4 datasets. We considered models of every weight, from the lightest (MobileNet), through a mid-weight model (ResNet50), to a heavyweight model (VGG19). From a depth perspective, InceptionResNetV2 is the deepest and VGG16 the shallowest. VGG19 contains the highest number of parameters (143,667,240) and MobileNet the lowest (4,253,864). The input image size of Xception, InceptionV3, and InceptionResNetV2 is 299 × 299, while that of VGG16, VGG19, ResNet50, ResNet101, MobileNet, and DenseNet201 is 224 × 224. In each of the nine TL models, the final dense layer was removed and replaced with a dense layer using a softmax activation function. The new layer has three neurons, one per class: Normal cases are assigned to class 0, Non COVID-19 pneumonia cases to class 1, and COVID-19 pneumonia cases to class 2. All of the models were trained over 50 epochs using Adam as the optimizer, a learning rate of 0.001, and a categorical cross-entropy loss function.
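A minimal Keras sketch of this modification, shown for InceptionV3 (the other eight models follow the same pattern); the global average pooling layer is our assumption, since the text only specifies the replaced dense layer:

```python
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)    # assumed pooling
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)  # one neuron per class
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=50)  # 50 epochs, as stated in the text
```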

3.4 Training and Testing

The 9 TL models are first tested on Dataset 1 (split 80:20). Based on accuracy, the best 5 of the 9 models are chosen. The 5 chosen models are then tested in 5 different scenarios. To begin, they are tested on the same dataset (Dataset 1) but with two different splits (70:30 and 60:40). Furthermore, the 5 TL models are tested in another environment consisting of Dataset 1 and Dataset 2, with Dataset 1 used for training and Dataset 2 for testing. Next, the five selected TL models are tested with another combination (Dataset 1 for training and Dataset 3 for testing). After that, the Merged dataset is created and the 5 TL models are tested on it. In the end, the performance of the models is assessed by averaging their results over all the phases.

3.5 Ensemble of Best TL Models

A voting ensemble is a method for combining predictions from several independent models. Compared to a single model, ensemble techniques often produce more accurate findings. In classification, the predictions for each label are tallied, and the category with the most votes is picked [33]. In the final scenario, a voting ensemble was performed on the three best models from the five TL models that were trained and evaluated using the Merged dataset (80:20 split). Each of the three models first predicts the class label of each sample, producing three prediction columns. If all three predict the same class label (0/1/2) for a sample, the sample is assigned that label. If the majority of the models predict one class and the remaining model another, the sample is classified according to the majority prediction. Finally, if the three models predict three different classes, the prediction of the model with the highest individual accuracy is used. Eventually, the ensemble model's accuracy is compared to that of the top three models based on average accuracy. The robust and ideal model is the one with the highest levels of accuracy, precision, recall, f1-score, and class accuracy (Normal/Non COVID-19 pneumonia/COVID-19 pneumonia). Figure 1 depicts the methodology of this approach.
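A sketch of this voting rule under the stated assumptions; `best` is the index of the model with the highest individual accuracy, used here as the tie-breaker when all three models disagree:

```python
import numpy as np
from collections import Counter

def hard_vote(preds_a, preds_b, preds_c, best=0):
    """Majority vote over three per-sample class-label arrays (values 0/1/2)."""
    stacked = np.stack([preds_a, preds_b, preds_c], axis=1)  # shape (n, 3)
    final = []
    for row in stacked:
        label, votes = Counter(row).most_common(1)[0]
        # two or three agreeing models win; if all three disagree, fall back
        # to the model with the highest individual accuracy
        final.append(label if votes >= 2 else row[best])
    return np.array(final)
```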

4 Result Analysis

The performance of TL models for identifying COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal patients was evaluated in this study using 9 TL models. We also examined the results of the five best models in five distinct scenarios. Finally, a voting ensemble was performed on the three top models selected from the fifth scenario. The experiments were carried out on Kaggle using Python. All programs were run on an Acer Aspire 5 laptop with an NVIDIA GeForce MX150 graphics card (Intel Core i5 7th Gen processor, 8 GB RAM, 2 TB hard drive, Windows 10 Pro). Accuracy, precision, recall, f1-score, and class-wise accuracy are used to evaluate each classifier's performance.

4.1 Result Analysis of 9 TL Models Based on Dataset 1 (80:20 Split)


In this phase, the 9 TL models (InceptionV3 [26], InceptionResNetV2, Xception [27], VGG16 [28], VGG19, ResNet50 [29], ResNet101 [30], MobileNet [31] and DenseNet201 [32]) were evaluated on Dataset 1 (80:20 split). With an accuracy of 99.4%, precision 1, recall 0.99, and f1-score 0.99, InceptionV3 and InceptionResNetV2 were the most successful. Both had the highest class accuracy (99.85%) for the Normal class; InceptionResNetV2 had the highest class accuracy (99.88%) for the Non COVID-19 pneumonia class, and InceptionV3 had the highest class accuracy (99.78%) for the COVID-19 pneumonia class. Xception, MobileNet, and DenseNet201 came in second, third, and fourth place with accuracies of 99.34%, 99.17%, and 99.07%, respectively. With 96.2%, VGG19 was the poorest performer. The performance of the models in this stage is depicted in Fig. 2. The 5 best-performing models (InceptionV3, InceptionResNetV2, Xception, MobileNet, and DenseNet201) were selected for further experimentation in the next phases.

Fig. 2. Performance comparison of 9 TL models based on accuracy, precision, recall, f1-score and class accuracy.

4.2 Result Analysis of 5 TL Models in First Scenario

The 5 TL models selected from the 9 are tested in a new scenario with the same dataset (Dataset 1) but with a different split (70:30). In this scenario, InceptionResNetV2, Xception, and DenseNet201 came in first, second, and third with accuracies of 99.33%, 99.09%, and 98.67%, respectively. InceptionV3 performed the worst in this environment. The overall performance of all five models is displayed in Fig. 3(a).

4.3 Result Analysis of 5 TL Models in Second Scenario

The 5 chosen TL models are tested again using the same dataset (Dataset 1) but a different split (60:40). With accuracies of 98.64%, 98.13%, and 98.05%, respectively, Xception, MobileNet, and InceptionResNetV2 placed first, second, and third in this scenario. DenseNet201 performed the poorest in this context. Figure 3(b) depicts the overall performance of all five models.

4.4 Result Analysis of 5 TL Models in Third Scenario

This time, the 5 selected TL models are put to the test in a new scenario using two datasets: training is done with Dataset 1, while testing is done with Dataset 2. DenseNet201, Xception, and InceptionV3 took first, second, and third place with accuracies of 97.48%, 96.66%, and 94.78%, respectively. InceptionResNetV2 performed the poorest in this context. Figure 3(c) depicts the overall performance of all five models.

4.5 Result Analysis of 5 TL Models in Fourth Scenario

The five TL models are again tested in a new context with two datasets. The first dataset (Dataset 1) is utilized for training, while the third dataset (Dataset 3) is used for testing. In this scenario, DenseNet201, InceptionResNetV2, and InceptionV3 came in first, second, and third with accuracies of 93.68%, 92.89%, and 92.15%, respectively. Xception scored the worst in this regard. Figure 3(d) shows the overall performance of all five models.

4.6 Result Analysis of 5 TL Models in Fifth Scenario

With the Merged dataset, the five TL models are now tested in a new context. The dataset is separated into two parts, with 80% utilized for training and 20% for testing. In this scenario, InceptionResNetV2, InceptionV3, and MobileNet placed first, second, and third, respectively, with accuracies of 99.42%, 99.27%, and 99.15%. DenseNet201 had the lowest score in this category. The overall performance of all five models is shown in Fig. 3(e).



Fig. 3. Performance comparison of 5 TL models based on accuracy, precision, recall, f1-score and class accuracy in five different scenarios (a) Dataset 1 (70:30 split). (b) Dataset 1 (60:40 split). (c) Dataset 1 (training) & Dataset 2 (testing). (d) Dataset 1 (training) & Dataset 3 (testing). (e) Dataset 1 + Dataset 2 + Dataset 3 (80:20 split).


Fig. 4. Average Accuracy of 5 TL Models Considering All Five Scenarios.

4.7 Average Performance of the 5 TL Models Throughout 5 Scenarios

We’ve already seen how all of the models performed in five distinct scenarios. In different setups, the models performed differently. There was no uniformity in the models’ performance. In this situation, we took the average of all the models’ accuracy across the five scenarios. Densenet201 took top place with a 97.25 %, followed by Xception in second place with a 96.98 % and InceptionV3 in third place with a 96.56 %. InceptionResNetV2 and MobileNet’s average accuracy results were not up to par. The visual representation of the result is shown in Fig. 4.

Fig. 5. Confusion Matrix of Ensemble Model.

4.8 Result Analysis of Ensemble Model

A voting ensemble has been performed on the 3 best models from the fifth scenario, i.e. InceptionResNetV2, InceptionV3, and MobileNet. The ensemble model proved to perform the best, with an accuracy of 99.62%, precision 1.00, recall 0.99, and f1-score 1.00. It achieved 99.87% class accuracy in identifying normal cases, 99.39% for Non COVID-19 pneumonia cases, and 99.12% for COVID-19 pneumonia cases. Figure 5 presents the ensemble model's confusion matrix, in which the columns indicate the predicted values and the rows the actual values. The model correctly predicted 2474 out of 2477 images for normal cases, 655 out of 659 images for Non COVID-19 pneumonia, and 1011 out of 1020 images for COVID-19 pneumonia. The comparison between the individual best three models and the ensemble model in the fifth scenario is given in Table 3. The ensemble model clearly beat all other models in accuracy, precision, recall, f1-score, and class accuracy for identifying COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal cases.

Table 3. Comparison of 3 best TL models from fifth scenario and ensemble model.

| Models            | Accuracy | Precision | Recall | F1-Score | Normal | Non COVID-19 | COVID-19 |
|-------------------|----------|-----------|--------|----------|--------|--------------|----------|
| InceptionResNetV2 | 99.42%   | 0.99      | 0.99   | 0.99     | 99.75% | 98.63%       | 99.11%   |
| InceptionV3       | 99.27%   | 1         | 0.99   | 0.99     | 99.83% | 99.39%       | 97.84%   |
| MobileNet         | 99.16%   | 0.99      | 0.99   | 0.99     | 99.75% | 99.24%       | 97.64%   |
| Ensemble Model    | 99.62%   | 1         | 0.99   | 1        | 99.87% | 99.39%       | 99.12%   |
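For illustration, the class accuracies in Table 3 can be recovered from the confusion matrix as per-class recall; the diagonal counts and row totals below are the ones reported for the ensemble model, while the off-diagonal split is illustrative only:

```python
import numpy as np

def per_class_recall(cm):
    """Fraction of each class's samples (rows) that were predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

cm = np.array([[2474, 0, 3],    # Normal (off-diagonals are illustrative)
               [2, 655, 2],     # Non COVID-19 pneumonia
               [5, 4, 1011]])   # COVID-19 pneumonia
print(per_class_recall(cm))     # ~[0.9987, 0.9939, 0.9912]
```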

4.9 Discussion

In order to build a practical process for estimating COVID-19 pneumonia from digital chest X-ray images and boost the recognition rate, this study employs TL models. There are numerous studies on this topic in the literature, as seen in Table 4. It is typical to discriminate between COVID-19 positive and healthy patients using binary classification, but it is critical to make a distinction between COVID-19 pneumonia patients and those who have viral/bacterial pneumonia, another lung illness. Few studies in the literature use enough samples, particularly COVID-19 pneumonia samples. Additionally, the performance of previous works is not substantial, and there are few ensemble-related works available. Sethy et al. [11] provided the SVM classifier with the deep features from 13 pre-trained CNN models, employing 127 COVID-19 samples; the most accurate classifier was ResNet50, with an accuracy of 95.33% for the three categories of normal, pneumonia, and COVID-19. Khan et al. [12] suggested CoroNet, a deep convolutional neural network model based on the Xception architecture, which exhibited a classification accuracy of 95%. When the VGG16 model with a variety of augmentation techniques was tested alongside the MobileNet, DenseNet121, and EfficientNet CNN models, Nishio et al. [14] found that it performed best, with an accuracy of 83.6%; 215 COVID-19 samples were used in that work. The DarkCovidNet deep learning network was developed by Ozturk et al. [15] to provide precise diagnostics for binary and multi-class classification; the model was built on 1,127 images, including 127 COVID-19 samples, and achieved a multi-class accuracy of 87.02%. EDL-COVID was used by Tang et al. [16] on 15,477 samples, with an accuracy of roughly 95%. To distinguish COVID-19 pneumonia (219 samples) from Non COVID-19 pneumonia, Öksüz et al. [17] developed Ensemble-CVDNet, an amalgamation of three pre-trained models; among the other studies, this model showed the best accuracy, at 98.30%. Chowdhury et al. [20] applied 8 different transfer learning models on 3,487 images to classify them into three categories; DenseNet201 outperformed them all with an accuracy of 97.94%. Bhardwaj et al. [18] presented a COVID-19 detecting deep ensemble learning architecture utilizing four different pre-trained deep neural network architectures; the experiment's findings indicated a multiclass accuracy of 92.36%.

Table 4. Performance comparison of previous works on classification of COVID-19 from chest X-rays with the proposed model.

| Reference             | Data Type | Total Images | Method(s)                                                | Accuracy |
|-----------------------|-----------|--------------|----------------------------------------------------------|----------|
| Sethy et al. [11]     | X-ray     | 381          | ResNet50 + SVM                                           | 95.33%   |
| Khan et al. [12]      | X-ray     | 1,251        | CoroNet                                                  | 89.60%   |
| Nishio et al. [14]    | X-ray     | 1,248        | VGG16                                                    | 83.60%   |
| Ozturk et al. [15]    | X-ray     | 1,127        | DarkCovidNet                                             | 87.02%   |
| Tang et al. [16]      | X-ray     | 15,477       | EDL-COVID                                                | 95%      |
| Öksüz et al. [17]     | X-ray     | 2,905        | Ensemble-CVDNet                                          | 98.30%   |
| Chowdhury et al. [20] | X-ray     | 3,487        | DenseNet201                                              | 97.94%   |
| Bhardwaj et al. [18]  | X-ray     | 10,046       | InceptionV3, DenseNet121, InceptionResNetV2 and Xception | 92.36%   |
| Proposed Model        | X-ray     | 21,240       | Ensemble (InceptionResNetV2, InceptionV3, MobileNet)     | 99.62%   |

Most of the existing works have concentrated on a small number of COVID-19 pneumonia chest X-ray images; the available datasets are largely small, and there have been few studies on large datasets or amalgamations of small ones. Our proposed model works with the amalgamation of three datasets containing a total of 21,240 samples, including the highest number of COVID-19 samples (5,558) compared to the previous studies. Few researchers performed analyses and experiments to ensure model robustness, which is a key aspect of this type of work; our models have been trained and tested in different environments for this purpose. In the first two scenarios, we implemented different splits on the same dataset (Dataset 1); InceptionResNetV2 and Xception performed the best in these two environments. We then tried combinations of two different datasets to create two more environments, the third and fourth scenarios, in both of which DenseNet201 performed the best. In the fifth scenario, we combined all the datasets into a Merged dataset consisting of the highest number of samples. The models performed best in this environment, with the highest accuracy being 99.42% and the lowest 98.67%. We can conclude that if models are developed on a massive number of images in different scenarios before being tested, they will perform better at correctly identifying the images. The more the models are trained and tested in various settings, the more we can observe their behavior and eventually choose one that will work better in future circumstances. The average accuracy of all the models was taken over the five scenarios; DenseNet201 excelled among all the models with an average accuracy of 97.25%. As the models performed best in the fifth scenario with the Merged dataset, we selected the 3 best-performing models (InceptionResNetV2, InceptionV3, and MobileNet) from that environment and implemented a voting ensemble on them in order to establish a more robust and effective model. It proved to operate the best, with an accuracy of 99.62%; it achieved 99.87% in identifying normal cases, 99.39% in identifying Non COVID-19 pneumonia cases, and 99.12% in identifying COVID-19 pneumonia cases. When clinical criteria are taken into account, it can often be challenging to distinguish COVID-19 pneumonia from ordinary pneumonia, and a weary expert may make more mistakes in these repetitive assessments and conclusions. A proper decision support system can function as a radiologist's assistant, saving valuable time and relieving them of the load of deciphering countless chest X-ray images. Our recommended model can be employed for pre-screening of X-rays: a website could be built where patients upload their X-ray images and immediately learn about their condition before going to the doctor; depending on the severity of the findings, they can consult a doctor later on. Our study does have certain limitations. Only chest X-rays were used to evaluate our model; CT scans were not used, and working with CT scans might make the suggested model more successful. We tested our model using open datasets, but clinical data should also be used to assess its robustness; radiologists' approval and clinical usefulness were not obtained. This study did not attempt the sub-classification of COVID-19 into mild, moderate, or severe disease because insufficient data was available. Although we worked with a larger number of COVID-19 pneumonia samples than previous works, there is still scope for working with much larger datasets. The future goal of this research is to work with larger datasets, particularly those containing CT scans. This work also lacks data augmentation on the collected datasets and feature extraction, and we hope to classify the severity of the disease into mild, moderate and severe categories.

5 Conclusion

We aimed to build a durable and efficient model in this study by testing different TL models in various situations. Experiments show that merging numerous datasets improves model performance significantly. For COVID-19 pneumonia identification utilizing X-ray images, an ensemble model based on three TL models is proposed. For multi-class cases, our suggested ensemble model had classification accuracy, precision, recall, and f1-score of 99.62%, 1.00, 0.99, and 1.00, respectively. It accurately identified 99.87% of normal cases, 99.39% of Non COVID-19 pneumonia, and 99.12% of COVID-19 pneumonia. The high accuracy of this machine screening aid can significantly increase COVID-19 diagnosis speed and accuracy. We believe that the strategy suggested in this paper will be beneficial to doctors and medical specialists.


References

1. Biology, P.: What is pneumonia? https://www.bumc.bu.edu/pneumonia/background/what/. Accessed 18 Apr 2022
2. Pham, T.D.: Classification of Covid-19 chest X-rays with deep learning: new models or fine tuning? Health Inf. Sci. Syst. 9(1) (2021)
3. Seladi-Schulman, J.: Coronavirus and pneumonia: Covid-19 pneumonia symptoms, treatment (2020). https://www.healthline.com/health/coronavirus-pneumonia. Accessed 18 Apr 2022
4. WebMD: Pneumonia and coronavirus. https://www.webmd.com/lung/covid-and-pneumonia1. Accessed 18 Apr 2022
5. AL Association: Learn about pneumonia. https://www.lung.org/lung-health-diseases/lung-disease-lookup/pneumonia/learn-about-pneumonia. Accessed 26 July 2022
6. Mayo: Pneumonia symptoms and causes. https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204. Accessed 18 Apr 2022
7. Stephen, O., Sain, M., Maduh, U.J., Jeong, D.U.: An efficient deep learning approach to pneumonia classification in healthcare. J. Healthcare Eng. 2019 (2019)
8. Cheng, Z., et al.: Clinical features and chest CT manifestations of coronavirus disease 2019 (Covid-19) in a single-center study in Shanghai, China. Am. J. Roentgenol. 215(1), 121–126 (2020)
9. Liu, C., Wang, X., Liu, C., Sun, Q., Peng, W.: Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning. Biomed. Eng. Online 19(1), 1–14 (2020)
10. Ibrahim, A.U., Ozsoz, M., Serte, S., Al-Turjman, F., Yakoi, P.S.: Pneumonia classification using deep learning from chest X-ray images during Covid-19. Cogn. Comput. 1–13 (2021)
11. Sethy, P.K., Behera, S.K.: Detection of coronavirus disease (Covid-19) based on deep features (2020)
12. Khan, A.I., Shah, J.L., Bhat, M.M.: CoroNet: a deep neural network for detection and diagnosis of Covid-19 from chest X-ray images. Comput. Methods Programs Biomed. 196, 105581 (2020)
13. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (Covid-19) using X-ray images and deep convolutional neural networks. Pattern Anal. Appl. 24(3), 1207–1220 (2021)
14. Nishio, M., Noguchi, S., Matsuo, H., Murakami, T.: Automatic classification between Covid-19 pneumonia, non-Covid-19 pneumonia, and the healthy on chest X-ray image: combination of data augmentation methods. Sci. Rep. 10(1), 1–6 (2020)
15. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Acharya, U.R.: Automated detection of Covid-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 121, 103792 (2020)
16. Tang, S., et al.: EDL-Covid: ensemble deep learning for Covid-19 case detection from chest X-ray images. IEEE Trans. Ind. Inf. 17(9), 6539–6549 (2021)
17. Öksüz, C., Urhan, O., Güllü, M.K.: Ensemble-CVDNet: a deep learning based end-to-end classification framework for Covid-19 detection using ensembles of networks. arXiv preprint arXiv:2012.09132 (2020)
18. Bhardwaj, P., Kaur, A.: A novel and efficient deep learning approach for Covid-19 detection using X-ray imaging modality. Int. J. Imaging Syst. Technol. 31(4), 1775–1791 (2021)
19. Afifi, A., Hafsa, N.E., Ali, M.A., Alhumam, A., Alsalman, S.: An ensemble of global and local-attention based convolutional neural networks for Covid-19 diagnosis on chest X-ray images. Symmetry 13(1), 113 (2021)
20. Chowdhury, M.E., et al.: Can AI help in screening viral and Covid-19 pneumonia? IEEE Access 8, 132665–132676 (2020)
21. Rahman, T., et al.: Exploring the effect of image enhancement techniques on Covid-19 detection using chest X-ray images. Comput. Biol. Med. 132, 104319 (2021)
22. Chen, Z.H.: Mask-RCNN detection of Covid-19 pneumonia symptoms by employing stacked autoencoders in deep unsupervised learning on low-dose high resolution CT (2020). https://doi.org/10.21227/4kcm-m312
23. Alqudah, A.M.: Augmented Covid-19 X-ray images dataset (2020)
24. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)
25. Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016). https://doi.org/10.1186/s40537-016-0043-6
26. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
27. Fei-Fei, L., Deng, J., Li, K.: ImageNet: constructing a large-scale image database. J. Vis. 9(8), 1037 (2009)
28. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
29. Wu, Z., Shen, C., Van Den Hengel, A.: Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn. 90, 119–133 (2019)
30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
31. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
32. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
33. Dong, X., Yu, Z., Cao, W., Shi, Y., Ma, Q.: A survey on ensemble learning. Front. Comput. Sci. 14(2), 241–258 (2020)

Infection Segmentation from COVID-19 Chest CT Scans with Dilated CBAM U-Net

Tareque Bashar Ovi(B), Md. Jawad-Ul Kabir Chowdhury, Shaira Senjuti Oyshee, and Mubdiul Islam Rizu

Department of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
[email protected]

Abstract. The novel coronavirus illness (COVID-19) is a highly contagious virus that has swept around the world and has presented a serious threat to every country's economy and public health. It has been demonstrated that COVID-19 can be accurately diagnosed with computed tomography (CT) scans that automatically partition infected areas. However, accurate segmentation continues to be a difficult task due to the lack of pixel-level annotated medical images. For the automatic segmentation of distinct COVID-19 infection zones, a Convolutional Block Attention U-Net with additional dilated blocks is proposed in this study. The suggested architecture for the automatic segmentation of COVID-19 chest CT images, with its dual attention mechanism and dilated block, performs remarkably well in experiments, reaching an IoU score of 89.0% and a dice score of 90.2%. The proposal provides a novel, promising approach for quantitative COVID-19 detection utilizing CT scans of lung infection by overcoming the aforementioned issues.

Keywords: COVID-19 · Segmentation · Chest CT · U-Net · CBAM · Dilated Convolution Block

1 Introduction

Novel Coronavirus disease, or COVID-19, was recognized in December 2019. As of 18 May 2022, it had infected more than 524 million people worldwide, of which more than 494 million have recovered and close to 6.3 million have died [1]. Being such a deadly disease, its early diagnosis is crucial for a sound recovery. Reverse transcriptase-polymerase chain reaction (RT-PCR) testing exists as a method of diagnosing COVID-19; however, its sensitivity ranges only from 42% to 71% [2]. On the other hand, chest CT images have shown 97% sensitivity for the diagnosis of COVID-19 [3] and have been found to be sensitive even before patients exhibited clinical symptoms [4]. Therefore, the accurate segmentation of chest CT images has been studied extensively in recent times. Automatic segmentation of infectious parts in a chest CT image is quite challenging due to the low contrast among parts of the image and also due to how varied the position and shape of the region of interest can be [5]. However, it has been found that machine learning-based approaches perform quite well for challenges like this [6, 7]. In particular, it has been shown that U-Net-based approaches work well for most medical image segmentation tasks [8]. Thus, a novel technique based on a U-Net that comprises convolutional block attention modules (CBAM) and a dilated convolutional block is examined in this study. The dilated convolutional block, utilized as a link between the encoder and the decoder of the U-Net, helps retain an enhanced amount of information for the decoder to work with. The CBAM collects important information and suppresses extraneous information along both the channel and spatial axes. To optimize the dual attention process, image enhancement using Contrast Limited Adaptive Histogram Equalization (CLAHE) has been combined with normalizing and cropping. In the rest of the paper, first, a summary of some relevant works can be found in the literature review section. Then, a detailed discussion of the proposed model architecture is included in the methodology section. Subsequently, the experimental study and performance assessment are included in the result and analysis section, and finally, the conclusion, complete with a discussion and summary of the work, can be found at the end of the paper.

2 Literature Review

In this section, three relevant past contributions are reviewed. Chen et al. [9] proposed a novel deep learning approach that adds both a residual network and an attention mechanism to a ResNeXt block-based U-Net. The model achieves 89% accuracy, 94% dice similarity coefficient (DSC), and 95% precision with augmentation. According to the authors, in multi-class segmentation the model shows an improvement of over 10% compared to a plain U-Net under certain conditions. Zhao et al. [10] proposed D2A U-Net, which uses a ResNeXt-50-based encoder for a dual attention U-Net to segment images; it also utilizes hybrid dilated convolution to boost performance. The dual attention component, with gate attention and decoder attention modules, refines feature maps, which helps produce much-improved feature representations. D2A U-Net achieves a dice score of 72.98%, a recall score of 70.71%, and a pixel error of 0.0311. Again, Wang et al. [11] investigated the transferability of a model's segmentation capability by including non-COVID-19 datasets in the training of a 3D U-Net. Their test showed a dice similarity coefficient (DSC) of 70.4%, a normalized surface distance of 0.735, a sensitivity of 68.2%, an F1-score of 70.7%, an accuracy of 99.4%, and a Matthews correlation coefficient (MCC) of 0.716, and reported better generalization; the over-fitting risks for segmenting COVID-19 infections were also observed to be lower in this approach. Considering all of this, a model is proposed here that can achieve superior metrics without the need for non-COVID-19 datasets, based upon an approach that has shown promising performance in the cited works.


3 Methodology

In this section, first, a brief dataset description and preprocessing details are given. Then, the overall structure of the proposed model is presented, with a brief description of the dilated convolution module and the backbone network, U-Net. Subsequently, a brief description of the convolutional block attention module (CBAM) is introduced. Finally, the summarized model architecture is illustrated in Fig. 4.

3.1 Dataset Description and Preprocessing

The dataset contains 20 CT scans of individuals who have been given a COVID-19 diagnosis, along with expert segmentations of the lung infections. To use this dataset, the following steps are performed, a process inspired by [18]:

1. Slicing
2. Enhancement
3. Normalizing
4. Cropping by finding boundaries and contours

After slicing, 301 slices are collected. Enhancement is then done using Contrast Limited Adaptive Histogram Equalization (CLAHE). A few sample results are given in Fig. 1.

Fig. 1. Samples After CLAHE Enhancement.

After the equalization, normalization is done by dividing each pixel by the maximum pixel value. Cropping is then done using a 3 × 3 kernel, a 2D filter, and a binary threshold.
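A minimal OpenCV sketch of this enhancement, normalization, and cropping pipeline; the CLAHE clip limit and tile size are assumptions, as the paper does not state them:

```python
import cv2
import numpy as np

def preprocess(slice_gray):
    """CLAHE enhancement, max-normalization, and contour-based cropping."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(slice_gray.astype(np.uint8))
    norm = enhanced / enhanced.max()                         # divide by max pixel
    blurred = cv2.filter2D(enhanced, -1, np.ones((3, 3)) / 9.0)  # 3x3 2D filter
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binary threshold
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return norm[y:y + h, x:x + w]                            # crop to lung region
```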


Fig. 2. Samples After Enhancement and Cropping (left) and Final Sample (right).

Figure 2 (left) shows the state of our input image after enhancement and cropping. Finally, the input image and ground truth are shown in Fig. 2 (right). We then performed the augmentations listed in Table 1.

Table 1. Augmentation Parameters

| Feature            | Value  |
|--------------------|--------|
| Shear range        | 0.2    |
| Zoom range         | 0.2    |
| Horizontal flip    | True   |
| Rescale            | 1./255 |
| Vertical flip      | True   |
| Width shift range  | 0.2    |
| Rotation range     | 15     |
| Height shift range | 0.2    |
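The Table 1 parameter names match Keras' ImageDataGenerator API, so the augmentation can presumably be reproduced as follows (a sketch, not the authors' confirmed code):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1.0 / 255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
# e.g. train_gen = aug.flow(x_train, y_train, batch_size=64)
```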

3.2 Model Overview

The proposed model uses a U-Net architecture with four residual convolutional blocks to encode, or down-sample, the input data and extract spatial features. The novelty of the proposed approach lies at the 5th block of the encoder network: a dilated module, which effectively expands the area covered by the kernel by skipping a set number of pixels between the pixels sampled by the kernel. A normal convolutional layer is equivalent to a dilated convolutional layer with a dilation rate of 1; if the dilation rate is set to 2, one pixel is skipped between each sampled pixel of the input field. The dilated convolutional layer thus covers a larger area of its input, and concatenating the results from multiple dilated convolutional layers with varying dilation rates creates an output that raises the chances of identifying the region of interest. After the encoder network, the model moves on to its decoder network, consisting of 4 blocks. Finally, the Sigmoid function is used as the activation function of a 1 × 1 convolutional layer to extract the output from the model. The complete structure of the proposed architecture is given in Fig. 3.
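A minimal sketch of such a dilated module: parallel 3 × 3 convolutions with increasing dilation rates whose outputs are concatenated; the specific rates (1, 2, 4) are assumptions, as the paper does not list them:

```python
import tensorflow as tf

def dilated_block(x, filters=128):
    """Parallel dilated convolutions concatenated into one feature map."""
    branches = [
        tf.keras.layers.Conv2D(filters, 3, padding="same",
                               dilation_rate=r, activation="relu")(x)
        for r in (1, 2, 4)  # assumed dilation rates
    ]
    return tf.keras.layers.Concatenate()(branches)
```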

Fig. 3. U-Net Model Architecture.

3.3 U-Net Architecture

U-Net is a type of fully convolutional network [12] proposed by Ronneberger et al. [7]. It is an artificial neural network (ANN) primarily made of sets of convolutional layers; these layers make up the blocks used in the encoder (down-sampling) network, while sets of deconvolutional layers make up the blocks used in the decoder (up-sampling) network. The entire architecture is symmetric due to the structuring of its encoder and decoder networks. The encoder-decoder mechanism of the proposed model is given in Fig. 4.

Fig. 4. Encoder Block (Top) and Decoder Block (Bottom).


The encoder, designed to extract spatial features from its input, contains a sequence of blocks, each of which contains two sub-blocks. The first, called the convolutional block, contains a residual connection layer in parallel with two branches, one of which contains a single convolutional layer while the other contains three. The two branches are concatenated, and from that result two branches sprout out again, this time one containing a single convolutional layer and the other two. Finally, the residual layer and the output of the branches are concatenated, and the output goes through a convolutional block attention module (CBAM). All of the convolutional layers in these blocks use a kernel size of 3 × 3, except for the residual connection layer, which uses a 1 × 1 kernel. Every layer uses ReLU activation, except for the CBAM block, which uses Sigmoid activation. The convolutional block is depicted in Fig. 5. The ReLU and Sigmoid functions are defined as:

ReLU: f(x) = max{0, x}

Sigmoid: f(x) = 1 / (1 + e^(-x))

Fig. 5. The architecture of Convolutional Block (used in both encoder and decoder blocks).

In the end, the output from the CBAM block goes through the second sub-block of an encoder block, which contains, in parallel, a max-pooling layer and an average-pooling layer, both with a pool size of 2 × 2; their outputs are concatenated before leaving the encoder block. Four of these encoder blocks are used in series in the encoder network. The filter number for the first block is 16 and doubles at every block. The encoder network and the decoder network are then conjoined using a dilated module with a filter size of 128, the same as the filter size used in the last encoder block and the first decoder block. The decoder is designed to create the segmented feature map from the spatial features retrieved by the encoder. It contains another sequence of blocks, each of which starts with a transpose convolutional layer followed by a concatenation layer: the first decoder block concatenates the feature information retrieved from the convolutional layer of the last encoder block, and, moving through the decoder, the last decoder block concatenates information from the convolutional layer of the first encoder block. Each decoder block is concluded with a convolutional block similar to the one used in the encoder blocks. Four of these decoder blocks are used in the decoder network; the filter size is halved at every block, and a 2 × 2 kernel is used with (2, 2) strides. Finally, the model produces its output using a convolutional block with a filter size of 1, a 1 × 1 kernel, and Sigmoid as its activation function, as only binary classification is needed.

3.4 Convolutional Block Attention Module (CBAM)

Proposed by Woo et al. [13], the convolutional block attention module (CBAM) emphasizes features that are impactful along two primary dimensions, the channel and spatial axes, instead of integrating cross-channel and spatial data together as in a regular convolutional block. The concept of CBAM is shown in Fig. 6.

Fig. 6. Convolutional Block Attention Module (CBAM) Architecture.

CBAM, as shown in Fig. 6, runs data sequentially through a channel attention module first and then a spatial attention module. This gives each branch of processing a clear and focused path along both the channel and spatial axes, which, within the network, allows the necessary information to be emphasized and unnecessary details suppressed. The proposed work can be summarized in Fig. 7:

Fig. 7. Workflow.
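As a concrete illustration of the attention mechanism just described, the following is a compact CBAM sketch; the reduction ratio of 8 and the 7 × 7 spatial kernel follow the original CBAM paper [13] rather than values stated here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam(x, ratio=8):
    """Channel attention followed by spatial attention, as in CBAM."""
    ch = x.shape[-1]
    # channel attention: shared MLP over avg- and max-pooled descriptors
    mlp = tf.keras.Sequential([layers.Dense(ch // ratio, activation="relu"),
                               layers.Dense(ch)])
    avg = mlp(layers.GlobalAveragePooling2D()(x))
    mx = mlp(layers.GlobalMaxPooling2D()(x))
    ca = layers.Activation("sigmoid")(layers.Add()([avg, mx]))
    x = layers.Multiply()([x, layers.Reshape((1, 1, ch))(ca)])
    # spatial attention: 7x7 conv over channel-wise avg and max maps
    avg_map = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(x)
    max_map = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(x)
    sa = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        layers.Concatenate()([avg_map, max_map]))
    return layers.Multiply()([x, sa])
```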

The training parameters of our model, fine-tuned through trial and error, are given in Table 2.

Table 2. Training Parameters

| Name of the Hyper-parameter | Parameter Value      |
|-----------------------------|----------------------|
| Epochs                      | 350                  |
| Batch Size                  | 64                   |
| Output Layer Activation     | Sigmoid              |
| Optimizer                   | Adam (epsilon = 0.1) |
| Learning rate               | 0.05                 |
| Decay Rate                  | 1.43e-4              |
| Total Number of Parameters  | 7,755,445            |
| Trainable Parameters        | 7,747,765            |
| Non-Trainable Parameters    | 7,680                |
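A sketch of a compile step matching the Table 2 hyperparameters, together with a dice-coefficient metric implementing the formula defined in the next section; the smoothing constant is an assumption, and the 1.43e-4 decay rate is omitted for brevity:

```python
import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Pixel-wise Dice = 2*TP / (2*TP + FP + FN), in soft form."""
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, [-1]), tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.05, epsilon=0.1)
# model.compile(optimizer=optimizer, loss="binary_crossentropy",
#               metrics=[dice_coefficient])
# model.fit(train_gen, epochs=350)  # batch size 64 set in the generator
```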

4 Result and Analysis

To evaluate our model, we have used the following evaluation metrics:

1. Dice Coefficient: The Dice coefficient is applied to assess the pixel-wise agreement between a predicted segmentation and the corresponding ground truth. It is defined as:

   Dice Coefficient = 2TP / (2TP + FP + FN)

Figure 8(a) depicts the dice coefficient curve, and from the curve it is evident that the changes become very minute after a bit over 100 epochs.

2. Loss: After each optimization iteration, a model's performance is shown by its loss value. Loss functions assess how closely an estimated value resembles the ground truth value. Validation loss peaks at a little over 50 epochs, as seen in Fig. 8(b).

3. Sensitivity: Sensitivity measures how well a model can predict the true positives for each available class:

   Sensitivity = TP / (TP + FN)

Figure 8(c) depicts the sensitivity graph, and it is clear that after 100 epochs the sensitivity does not change by any appreciable amount.

4. Specificity: A metric that evaluates the ability of a model to estimate the true negatives of the available categories:

   Specificity = TN / (TN + FP)

Figure 8(d) depicts the specificity-over-epochs curve, which proves the potential of the proposed model, as specificity approaches 100% from nearly the beginning of training.


Fig. 8. Evaluation Metrics: (a) Dice Coefficient; (b) Loss vs Epoch; (c) Sensitivity; (d) Specificity vs Epoch; (e) IoU vs Epoch; (f) Accuracy vs Epoch; (g) BCE Dice Loss vs Epoch; (h) Matthews Correlation vs Epoch; (i) Precision vs Epoch; (j) Recall vs Epoch.


5. Accuracy: The proportion of accurately predicted data points among all data points:

   Accuracy = (TP + TN) / (TN + TP + FN + FP)

Figure 8(f) depicts the accuracy curve, and again the proposed model proves its capability, reaching nearly 100% accuracy after about 20 epochs.

6. Intersection over Union (IoU): IoU is a typical metric for comparing a proposed image segmentation to a known ground-truth segmentation:

   IoU = TP / (TP + FP + FN)

Figure 8(e) depicts the IoU curve; after 200 epochs, the IoU does not improve appreciably.

7. BCE Dice Loss: A loss that combines the binary cross-entropy between the target and the output with the dice loss. Figure 8(g) shows the BCE dice loss curve, from which it is apparent that the score does not change appreciably after 200 epochs.

8. Matthews Correlation Coefficient (MCC): MCC is used for statistical model evaluation. It measures the difference between the estimated and real values and is comparable to chi-square statistics for a 2 × 2 contingency table. Figure 8(h) shows the MCC-over-epochs curve, from which it is clear that the MCC does not increase appreciably after 200 epochs.

9. Precision: The proportion of accurately classified positive samples (true positives) to the total number of positively classified samples; it measures the model's accuracy in classifying a sample as positive:

   Precision = TP / (TP + FP)

From Fig. 8(i), which depicts the precision curve, it is evident that the precision does not change appreciably after 150 epochs.

10. Recall: The proportion of positive samples correctly identified as positive to all positive samples; it gauges how well the model identifies positive samples (the more positive samples identified, the larger the recall):

    Recall = TP / (TP + FN)

From Fig. 8(j), which shows the recall curve, it is evident that the recall score settles down after 100 epochs.

4.1 Visual Results

Figure 9 reflects how the proposed model performs on some test data. The first column consists of original CT images, the middle column shows the annotation made on the input images, and the last column shows the prediction made by the proposed model on the same input image. From the figures, it can be seen that the predictions made by the proposed model are quite similar to the annotations, and in some cases the similarity extends to the point of being nearly indistinguishable from the ground truth.

Fig. 9. Model performance visualized using test data.

4.2 Performance Table

Comparative performance analysis of our model against the existing literature is depicted in Table 3. From the table, it can be concluded that the proposed model has outperformed all the existing literature in every evaluation parameter.

Table 3. Performance Analysis

| Recent Works | Used Dataset | Algorithm | Accuracy | DSC | IoU | Precision |
|--------------|--------------|-----------|----------|-----|-----|-----------|
| [14] | https://medicalsegmentation.com/covid19 | MPS-Net | - | 0.8325 | 0.742 | - |
| [15] | Sourced by author | Novel CNN | - | 0.987 (for lung), 0.726 (for COVID-19) | - | 0.99 (for lung), 0.726 (for COVID-19) |
| [11] | COVID-19 Dataset, MSD Lung Tumor, StructSeg Lung Cancer, NSCLC Pleural Effusion | Attention-based selective fusion unit with a dedicated and modified encoder for dynamic feature extraction and grouping | 0.994 | 0.704 | - | - |
| [16] | Kaggle CXR public dataset | COVID-SSNet | - | - | 0.9971 | 0.9953 |
| [10] | https://medicalsegmentation.com/covid19/, https://zenodo.org/record/3757476 | D2A U-Net | - | 0.7298 (with ResNeXt-50 backbone) | - | - |
| [17] | Sourced by author | COVID-SegNet | - | 0.987 (for lung), 0.726 (for COVID-19) | - | 0.99 (for lung), 0.726 (for COVID-19) |
| [9] | SIRM COVID Dataset | U-Net with residual and attention mechanism | 0.89 | 0.94 | - | 0.95 |
| Ours | https://www.kaggle.com/datasets/andrewmvd/covid19-ct-scans | U-Net | 0.84 | 0.875 | 0.67 | 0.85 |
| Ours | (same) | Link-Net | 0.86 | 0.89 | 0.71 | 0.873 |
| Ours | (same) | CBAM U-Net with dilated block (Proposed architecture) | 0.998 | 0.902 | 0.89 | 0.99 |

5 Conclusion and Future Work

According to recent research, CT imaging is now the most popular screening method for COVID-19; it can help the community determine the severity of COVID-19 more promptly and accurately. In this study, a dual-attention-based deep learning architecture for automated segmentation of COVID-19 infectious areas from CT images has been proposed and shown to be both plausible and superior to previous research. To enhance performance, a modified CBAM U-Net with a dilated block that employs an effective channel and spatial attention strategy has been presented. The performance table shows that the suggested model outperforms the previous methods by 15% in terms of IoU. A recent study found that early COVID-19 detection is crucial: if the infection location in the chest CT image can be found early, patients have a greater chance of surviving. Radiologists now have a trustworthy and promising deep learning architecture for identifying COVID-19 and segmenting the infected lung areas. The proposed approach has the potential to be applied to a broader range of therapeutic applications in the future, such as assisting with the diagnosis of other diseases from CT images. The quantity of ground truth data available for a new disease, such as the coronavirus, is often limited due to the complexity of data collection and annotation, which restricts model performance to a broader extent. Our next objective is to raise the validation IoU to 99 percent, since it did not reach above 89 percent in Fig. 8(e); preprocessing and adjusting the hyperparameters will be necessary. A semi-supervised generative model will be used to increase the capacity to address special problems. Future study should also focus on interpretability, which is crucial for medical applications; the attention techniques proposed in this article can induce some level of interpretation of the internal decision process, despite deep learning's well-known lack of interpretability. This approach will continue to be developed in order to gain more scientific knowledge, and research into hybrid and multi-head attention models will also be conducted in order to provide the best possible semantic segmentation.

References

1. Coronavirus Update. https://www.worldometers.info/coronavirus/. Accessed 18 May 2022
2. Simpson, S., et al.: Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA. Radiology: Cardiothoracic Imaging 2(2) (2020)
3. Ai, T., et al.: Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (Covid-19) in China: a report of 1014 cases. Radiology 296(2) (2020)
4. Salehi, S., Abedi, A., Balakrishnan, S., Gholamrezanezhad, A.: Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. Am. J. Roentgenol. 1–7 (2020)
5. Shan, F., et al.: Lung infection quantification of Covid-19 in CT images with deep learning. arXiv preprint arXiv:2003.04655 (2020)
6. Shi, F., et al.: Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for Covid-19. arXiv preprint arXiv:2004.02731 (2020)
7. Shen, D., Wu, G., Suk, H.-I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017)
8. Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv preprint arXiv:1802.06955 (2018)
9. Chen, X., Lina, Y., Yu, Z.: Residual attention U-Net for automated multi-class segmentation of Covid-19 chest CT images. arXiv preprint arXiv:2004.05645 (2020)
10. Zhao, X., et al.: D2A U-Net: automatic segmentation of Covid-19 lesions from CT slices with dilated convolution and dual attention mechanism. arXiv preprint arXiv:2102.05210 (2021)
11. Wang, Y., et al.: Does non-COVID-19 lung lesion help? Investigating transferability in COVID-19 CT image segmentation. Comput. Methods Programs Biomed. 202, 106004 (2021)
12. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
13. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
14. Pei, H.-Y., Yang, D., Liu, G.-R., Lu, T.: MPS-Net: multi-point supervised network for CT image segmentation of COVID-19. IEEE Access 9, 47144–47153 (2021)
15. Yan, Q., et al.: COVID-19 chest CT image segmentation: a deep convolutional neural network solution. arXiv preprint arXiv:2004.10987 (2020)
16. Prakash, N.B., Murugappan, M., Hemalakshmi, G.R., Jayalakshmi, M., Mahmud, M.: Deep transfer learning for COVID-19 detection and infection localization with superpixel based segmentation. Sustain. Cities Soc. 75, 103252 (2021)
17. Yan, Q., et al.: COVID-19 chest CT image segmentation network by multi-scale fusion and enhancement operations. IEEE Trans. Big Data 7(1), 13–24 (2021)
18. https://www.kaggle.com/code/haksorus/covid19-lungs-inf-segmentation-baseline

Convolutional Neural Network Model to Detect COVID-19 Patients Utilizing Chest X-Ray Images

Md. Shahriare Satu1, Khair Ahammed2(B), Mohammad Zoynul Abedin3,4, Md. Auhidur Rahman2, Sheikh Mohammed Shariful Islam5, A. K. M. Azad6, Salem A. Alyami7, and Mohammad Ali Moni8

1 Department of Management Information Systems, Noakhali Science and Technology University, Noakhali, Bangladesh
[email protected]
2 Institute of Information Technology, Noakhali Science and Technology University, Noakhali, Bangladesh
[email protected]
3 International Business School, Teesside University, Middlesbrough, UK
4 Department of Finance and Banking, Hajee Mohammad Danesh Science and Technology University, Rangpur, Bangladesh
5 Institute for Physical Activity and Nutrition, Deakin University, Geelong, Australia
[email protected]
6 iThree Institute, Faculty of Science, University Technology of Sydney, Sydney, Australia
[email protected]
7 Department of Mathematics and Statistics, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
[email protected]
8 School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
[email protected]

Abstract. This study aims to propose a deep learning model to detect COVID-19 chest X-ray cases more precisely. We merged all the publicly available chest X-ray datasets of COVID-19 infected patients from Kaggle and GitHub and pre-processed them using random sampling. Then, we applied an enhanced convolutional neural network (CNN) model to this dataset and obtained a 94.03% accuracy, 95.52% AUC and 94.03% f-measure for detecting COVID-19 patients. We also performed a performance comparison between the proposed CNN model and several state-of-the-art classifiers, including support vector machine, random forest, k-nearest neighbor, logistic regression, Gaussian naïve Bayes, Bernoulli naïve Bayes, decision tree, XGBoost, multilayer perceptron, nearest centroid, perceptron, and deep neural network, as well as pre-trained models such as residual neural network 50, visual geometry group network 16, and inception network V3; our model yielded outperforming results compared to all other models. While evaluating the performance of our models, we emphasized specificity along with accuracy to identify non-COVID-19 individuals more accurately, which may potentially facilitate the early detection of COVID-19 patients for preliminary screening, especially in under-resourced health infrastructure with insufficient PCR testing systems and facilities. This model could also be applicable to cases of other lung infections.

Keywords: COVID-19 · Chest X-ray Images · Machine Learning · Deep Learning · Convolutional Neural Network

1 Introduction

Novel coronavirus disease (COVID-19) is an ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [30]. The first case of COVID-19 is believed to have been detected in Wuhan, China in December 2019, and the disease spread rapidly throughout the world [11]. It has been reported that viruses from the Coronaviridae family were first discovered in the 1960s from the nasal cavities of patients [17]. Coronaviridae is a large family of enveloped ribonucleic acid (RNA) viruses that generate different types of respiratory, hepatic and neurological diseases among humans and other mammals [33]. Some of its members have induced community transmission, such as the Middle East respiratory syndrome coronavirus (MERS-CoV) and the severe acute respiratory syndrome coronavirus (SARS-CoV). SARS-CoV, MERS-CoV and SARS-CoV-2 were all reported to have originated from bats [18]; SARS-CoV-2 in particular shows phylogenetic similarity with SARS-CoV [33]. It causes COVID-19, the third coronavirus emergency in the past two decades, preceded by the SARS-CoV and MERS-CoV outbreaks in 2002 and 2012, respectively. The World Health Organization (WHO) announced this situation as a public health emergency of international concern on 30th January and declared it a pandemic on 11th March, 2020 [9]. Moreover, WHO issued public health advice on maintaining different precautions, such as keeping social distance, washing hands with soap and sanitizer, and avoiding touching the nose, mouth and eyes. Most of the affected countries underwent complete lock-downs to prevent the local transmission of COVID-19 infection in their regions. SARS-CoV-2 infected patients were commonly identified by some common symptoms such as fever, cough, fatigue, loss of appetite and muscle pain. Hence, they needed to be identified and isolated, and a treatment policy ensured, at an early stage. Two types of diagnostic procedures exist, namely i) molecular diagnostic tests and ii) serologic tests [23]. The reverse transcription polymerase chain reaction (RT-PCR) test is a molecular diagnostic test, currently considered the gold standard [4], which detects the viral RNA of SARS-CoV-2 from sputum or a nasopharyngeal swab. Nevertheless, its true positive rate is limited and it requires specific equipment [4]. Another technique currently under development, called viral antigen detection, explores virus proteins to identify COVID-19.


When a recovered patient needs to be tested again, molecular tests may fail to detect the disease over longer periods, as antibody growth appears in reaction to the host. The serologic test is another primary tool, which verifies antibodies in blood to diagnose patients. Due to the required analysis and skilled human resources, these procedures are time consuming and sometimes unavailable for many people, especially in low and middle income countries, which therefore demands an alternative but cheaper solution for early diagnosis of COVID-19 infections as fast as possible. Recently, medical images such as chest X-ray and computed tomography (CT) scan images have been used to determine COVID-19 positive cases [21]. However, CT scan imaging is a costly procedure and is not available in every hospital or medical center. Alternatively, chest X-ray machines are found in almost all nearby clinics, medical labs and hospitals, unlike biomolecular laboratory facilities. X-ray is thus a cheaper, faster and widely-used way to generate 2D images of patients [16] that can potentially be used for COVID-19 screening as well. Moreover, radiologists have used these images to explore pathology and detect relevant diseases. Most of the existing works implemented machine learning algorithms on medical images for detecting COVID-19 patients and focused on how classifiers were adopted to identify positive cases, but failed to detect false negative cases, which cause more community transmission of COVID-19, requiring more stringent attention to the specificity of the model predictions. In this study, we propose a convolutional neural network (CNN) to investigate chest X-ray images and identify COVID-19 patients at an early stage more precisely and with higher specificity, which may aid public health systems in reducing the local community transmission rate. This paper is organized as follows: Sect. 2 provides some related works on chest X-ray image analysis. Sect. 3 describes the working dataset and the step-by-step procedure of analyzing it using machine learning (ML) and deep learning (DL) models. Sect. 4 shows the experimental outcomes, and Sect. 5 discusses the performance of this work. Finally, Sect. 6 concludes this work by providing some future research directions.

2 Literature Review

Several studies have reported the use of medical images of COVID-19 infection for further investigation using various machine and deep learning methods. [31] generated a large benchmark dataset with 13,975 chest X-ray images called COVIDx and investigated it using a deep learning model that showed 93.30% accuracy. [1] proposed the DeTraC deep CNN model, which provides a solution by transferring knowledge from generic object recognition to domain-specific tasks; their algorithm showed 95.55% accuracy (with a specificity of 91.87% and a precision of 93.36%). [3] implemented transfer learning using CNNs on a small medical image dataset (1427 X-ray images), which provided at best 96.78% accuracy, 98.66% sensitivity, and 96.46% specificity. [13] proposed a deep CNN architecture called COVIDX-Net that investigated 50 chest X-ray images with 25 COVID-19 cases and provided 90% accuracy and a 91% F-score.


[19] proposed a deep learning framework based on 5000 images, named COVID-Xray-5k, where they applied ResNet18, ResNet50, SqueezeNet and DenseNet-121 and produced a sensitivity of 97.5% and a specificity of 90% on average. [10] used a transfer learning based VGG16 model on chest X-ray images which showed 94.5% accuracy, 98.4% sensitivity and 98% specificity. Again, [15] presented a deep neural network based on Xception, named CoroNet, that provided 89.6% accuracy for four-class and 95% accuracy for three-class images. [5] used a two-phase classification approach with a majority-vote based ensemble classifier and showed 91.03% accuracy in distinguishing COVID-19 from pneumonia. Also, [12] used several deep learning approaches, namely deep feature extraction with SVM, fine-tuning of pre-trained CNNs, and an end-to-end trained CNN model, to classify COVID-19 and normal chest X-ray images. [14] proposed a customized CNN with a distinctive filter learning module that shows 97.94% accuracy and a 96.90% F1-score for predicting four classes. [22] proposed an automatic detection model for COVID-19 infection based on chest X-ray images, where MobileNet with a linear-kernel SVM provides 98.5% accuracy and F1-score, and DenseNet201 with MLP shows 95.6% accuracy and F1-score. [20] investigated 1616 chest X-ray images using DenseNet161, which shows 79.89% accuracy in classifying normal, pathological and COVID-19 patients. [8] presented a custom CNN based model named COVID-XNet that shows 94.43% average accuracy, 98.8% AUC, 96.33% sensitivity, and 93.76% specificity. [29] provided a siamese neural network called MetaCOVID to integrate contrastive learning with a fine-tuned pre-trained ConvNet encoder and capture unbiased feature representations using 10-shot learning scores, comparing the meta-learning algorithm with InceptionV3, Xception, Inception-ResNetV2, and VGG16. [28] proposed a fusion model of hand-crafted with deep learning features (FM-HCF-DLF) that used a multilayer perceptron (MLP) and InceptionV3, where the MLP generated 94.08% accuracy.

3 Materials and Methods

The following working methodology was used to detect COVID-19 patients from the publicly available datasets. The approach is described briefly as follows.

3.1 Data Collection

The primary chest X-ray images were obtained from the COVID-19 Radiography Database [6]. It contained 1,341 normal, 1,345 viral pneumonia, and 219 COVID-19 patient images, which were taken as the primary dataset. However, the distribution of the different types of images was not the same. To balance this dataset, we collected 66 additional images from [7] and added them to the COVID-19 images of the primary dataset. For the other classes (normal and pneumonia), a random under-sampling method was used to generate a balanced number of instances of each class. Finally, the experimental dataset contained 285 images each of the normal, viral pneumonia and COVID-19 classes.
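The balancing step can be illustrated with a short sketch; the file names below are hypothetical stand-ins for the dataset's actual image files, and the random seed is an assumption.

import random

random.seed(42)
covid = [f"covid_{i}.png" for i in range(285)]        # 219 + 66 merged images
normal = [f"normal_{i}.png" for i in range(1341)]
pneumonia = [f"pneumonia_{i}.png" for i in range(1345)]

n = len(covid)  # minority class size (285)
balanced = covid + random.sample(normal, n) + random.sample(pneumonia, n)
# 855 images in total, 285 per class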

3.2 Data Pre-processing

In this step, we normalized the training set into grayscale images, and all baseline classifiers were then implemented on this transformed dataset. However, pre-trained CNN models such as VGG16, ResNet50 and InceptionV3 do not support grayscale input, so we employed them directly on the primary (RGB) dataset.
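A minimal sketch of this pre-processing step follows, assuming OpenCV as the image library and a synthetic array standing in for an actual X-ray file; the 100 × 100 target size matches the CNN input described below.

import numpy as np
import cv2

# Synthetic stand-in for a chest X-ray image (the real files come from the
# merged Kaggle/GitHub dataset).
img = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # collapse RGB to one channel
gray = cv2.resize(gray, (100, 100))            # match the CNN input size
x = gray.astype(np.float32) / 255.0            # scale pixel values to [0, 1]
x = x.reshape(1, 100, 100, 1)                  # add batch and channel dims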

3.3 Proposed Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a special class of artificial neural network (ANN) that manipulates an input layer along with a sequence of hidden and output layers. It maintains sparse connections between layers, with weights shared across the output neurons in the hidden layers. Like a regular ANN, a CNN contains a sequence of hidden layers, denoted convolutional and pooling layers; the operations of these layers are called the convolution and pooling operations, respectively. These layers are stacked and lead into a series of fully connected layers followed by an output layer. In many research fields, including image recognition, object detection, semantic segmentation and medical image analysis, CNN models yield considerably higher performance than other state-of-the-art methods (Fig. 1).

Fig. 1. Proposed Convolutional Neural Network


Convolutional Layer. The convolution layer is the core structure of a CNN; it performs the convolution operation (represented by ∗) instead of general matrix multiplication and accomplishes most of the computation of the CNN model. The number of filters, the size of the local region, the stride, and the padding are the hyper-parameters of this layer. Convolution layers extract and learn features using these filters; hence this is known as the feature extraction layer. The same filter is traversed across the whole image for a single feature. The main objective of this layer is to identify common features of input images and map their appearance to the feature map. The convolution operation is given as:

F(i, j) = (I \ast K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)    (1)

To introduce non-linearity, the output of each convolutional layer is fed to an activation function. Numerous activation functions are available, but the Rectified Linear Unit (ReLU) is the most widely used in deep learning. It is calculated as follows:

f(x) = \max(0, x)    (2)

In this model we use fewer layers and filters: two convolutional layers whose number of filters increases from 32 to 64, operating on an input image of size 100 × 100 with pixel values normalized to the range 0 to 1. In the first convolutional layer, this image is convolved with a 3 × 3 kernel for 32 filters, producing a 100 × 100 × 32 feature map. This output is forwarded to the second convolutional layer, where a 3 × 3 kernel with 64 filters is convolved with the 100 × 100 × 32 extracted features, and a 100 × 100 × 64 output feature map is produced.

Pooling Layer. In a CNN, a sequence of convolution layers is followed by an optional pooling or down-sampling layer to lessen the volume of the input and the number of parameters. This layer computes quickly and helps preclude over-fitting. The most common pooling technique is max pooling, which simply outputs the highest value of each input region; other options are average pooling and sum pooling. Two hyper-parameters are essential for the pooling layer, namely the filter size and the stride. In this model, we apply a 2 × 2 filter to the 100 × 100 × 64 output feature map and create a reduced 50 × 50 × 64 feature map.

Flatten Layer. After the pooling layer, a flatten layer is employed to flatten the network: it converts the entire pooled feature-map matrix into a single column.


Dense Layer. We then implemented three dense layers, also known as fully connected layers. The input from the previous layers, flattened from a matrix into a vector, is forwarded to these layers as in a regular neural network. A dense layer views the output of the past layers and decides which features most closely match each individual class; therefore, fully connected layers can yield accurate probabilities for the different classes. The outputs are classified using the activation function at the output layer, which in our case was the softmax function, calculating the probability of a particular class k as defined by the following equation:

Z^{k} = \frac{e^{x^{k}}}{\sum_{i=1}^{n} e^{x^{i}}}    (3)

Dropout Layer. When a large feed-forward neural network is trained on a small training set, it usually performs poorly on held-out test data; dropout is a useful procedure to mitigate this problem. In our model, we used a dropout layer after each dense layer to reduce over-fitting by preventing complex co-adaptations on the training data. A minimal sketch of the resulting model is given below.
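The following Keras sketch assembles the layers described above into the 9-layer model summarized in Table 1; the dropout rate is an assumption, since the text does not state it.

from tensorflow.keras import layers, models

def build_proposed_cnn(input_shape=(100, 100, 1), num_classes=3):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=input_shape),                 # 100x100x32
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),  # 100x100x64
        layers.MaxPooling2D((2, 2)),                            # 50x50x64
        layers.Flatten(),                                       # 160000 units
        layers.Dense(120, activation="relu"),
        layers.Dropout(0.5),                                    # assumed rate
        layers.Dense(60, activation="relu"),
        layers.Dropout(0.5),                                    # assumed rate
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_proposed_cnn().summary()  # 19,226,379 parameters, matching Table 1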

3.4 Baseline Classifiers

Several machine learning classifiers were used for comparative performance assessment: support vector machine (SVM), random forest (RF), k-nearest neighbor (k-NN), logistic regression (LR), Gaussian naïve Bayes (GNB), Bernoulli naïve Bayes (BNB), decision tree (DT), XGBoost (XGB), multilayer perceptron (MLP), nearest centroid (NC) and perceptron. While training on the pre-processed dataset, several model hyper-parameters were fine-tuned (i.e., changed and optimized) to obtain better predictive accuracy. The parameters of the baseline classifiers are listed in Table 2; an illustrative usage sketch follows.
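As an illustration, a few of the baseline classifiers can be evaluated with 10-fold cross-validation as sketched below; the synthetic data stands in for the flattened grayscale image vectors, and parameter values follow Table 2 where given.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((90, 10000))            # stand-in for 100x100 flattened images
y = rng.integers(0, 3, 90)             # three classes

classifiers = {
    "SVM": SVC(kernel="linear", gamma=0.0001),
    "LR": LogisticRegression(solver="liblinear", max_iter=1000),
    "k-NN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")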

3.5 Pre-trained Transfer Learning Models

Moreover, several deep learning classifiers were employed, such as a deep neural network (DNN) and several pre-trained CNNs, namely residual neural network 50 (ResNet50), visual geometry group network 16 (VGG16), and inception network V3 (InceptionV3), for transfer learning. These models have been widely used to investigate images in various domains [2,25,26]. As with the general classifiers, various parameters were tuned to obtain more accurate results for detecting COVID-19. For the DNN, we considered a batch size of 32, 50 epochs, the Adam optimizer, and a learning rate of 0.0001 with weight decay. Some regularization terms were also employed to reduce over-fitting in the deep learning models. When the pre-trained models were loaded, the required packages were downloaded to manipulate the input images. A flatten layer was then added to these pre-trained models, which flattens the input to one dimension. Next, we implemented a dense layer with 64 neurons, the ReLU activation function and a regularizer of 0.001. Dropout layers were used before and after this dense layer to reduce over-fitting. Finally, the three classes were assigned with the softmax activation function. To compile the models, the categorical cross-entropy loss function and the Adam optimizer were used with a 0.00001 learning rate. We kept the last trainable layer for ResNet50 and the last 62 trainable layers for InceptionV3 (Table 1).

Table 1. A summary of the proposed 9-layer model

Layer (Type)                   | Output Shape   | Param #
conv2d_1 (Conv2D)              | (100, 100, 32) | 320
conv2d_2 (Conv2D)              | (100, 100, 64) | 18496
max_pooling2d_1 (MaxPooling2D) | (50, 50, 64)   | 0
flatten_1 (Flatten)            | 160000         | 0
dense_1 (Dense)                | 120            | 19200120
dropout_1 (Dropout)            | 120            | 0
dense_2 (Dense)                | 60             | 7260
dropout_2 (Dropout)            | 60             | 0
dense_3 (Dense)                | 3              | 183
Total params: 19,226,379; Trainable params: 19,226,379; Non-trainable params: 0
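A sketch of the transfer-learning head described above is given below for the VGG16 variant, assuming Keras; the number of frozen layers for VGG16 is not stated in the text, so freezing the whole convolutional base is an assumption.

from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import VGG16
from tensorflow.keras.optimizers import Adam

base = VGG16(weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False  # assumed: freeze the convolutional base

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dropout(0.5),                      # dropout before the dense layer
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),                      # and after it
    layers.Dense(3, activation="softmax"),    # normal / pneumonia / COVID-19
])
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])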

Table 2. Model parameters of the classical machine learning classifiers

Classifier | Parameters
SVM        | linear kernel, gamma = 0.0001
KNN        | k = 5, euclidean distance
GNB        | priors = None, var_smoothing = 1e-09
BNB        | alpha = 1.0, binarize = 0.0
DT         | gini criterion, best splitter
LR         | liblinear solver, max_iter = 1000
RF         | max_depth = None, random_state = 0
GB         | max_features = 2, max_depth = 2, random_state = 0
XGB        | learning_rate = 0.1, max_depth = 3
MLP        | adam solver, alpha = 1e-5, random_state = 1
NC         | manhattan metric
Perceptron | tol = 1e-3, random_state = 0

3.6 Evaluation

The performance of the individual classifiers was assessed by different evaluation metrics: accuracy, AUC, F-measure, sensitivity, and specificity. Brief descriptions are given as follows.

– Accuracy is the sum of TP and TN divided by the total number of instances in the confusion matrix:

  Accuracy = \frac{TP + TN}{TP + FP + TN + FN}    (4)

– AUC measures the capability of the model to distinguish between classes:

  AUC = \int_{x=0}^{1} TPR\left(FPR^{-1}(x)\right) \, dx    (5)

– F-Measure (F1-score) is the harmonic mean of precision and recall:

  F\text{-}Measure = \frac{2TP}{2TP + FP + FN}    (6)

– Sensitivity is the ratio of true positives that are correctly identified:

  Sensitivity = \frac{TP}{TP + FN}    (7)

– Specificity is the ratio of true negatives that are correctly identified:

  Specificity = \frac{TN}{TN + FP}    (8)
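For a multi-class problem such as this one, these metrics can be computed per class in a one-vs-rest fashion, as in the hypothetical helper below (scikit-learn is assumed; the sample labels are made up for illustration).

from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    total = cm.sum()
    for i, label in enumerate(labels):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp          # missed positives of this class
        fp = cm[:, i].sum() - tp          # other classes predicted as it
        tn = total - tp - fn - fp
        print(f"{label}: acc={(tp + tn) / total:.4f} "
              f"sens={tp / (tp + fn):.4f} spec={tn / (tn + fp):.4f} "
              f"f1={2 * tp / (2 * tp + fp + fn):.4f}")

labels = ["normal", "pneumonia", "covid"]
y_true = ["covid", "normal", "pneumonia", "covid", "normal", "pneumonia"]
y_pred = ["covid", "normal", "pneumonia", "normal", "normal", "covid"]
per_class_metrics(y_true, y_pred, labels)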

4 Experiment Results

In this work, the proposed CNN was used to investigate chest X-ray images of normal, pneumonia and COVID-19 patients. We used various classifiers, e.g. SVM, RF, KNN, LR, GNB, BNB, XGB, MLP, NC and perceptron, via the scikit-learn library in Python, while the deep learning models, such as the DNN and the pre-trained CNNs (VGG16, ResNet50, InceptionV3), were implemented using the Keras library. All classifiers were evaluated with a 10-fold cross-validation procedure in Python. All experiments were run on an Asus VivoBook S laptop (8th-generation Core i5-8250U processor, 8 GB RAM, 1 TB HDD, Windows 10 Home, Intel UHD Graphics 620). The performance of all classifiers was assessed by the evaluation metrics accuracy, AUC, F-measure, sensitivity, and specificity; the results are shown in Table 3. Most of the models yielded substantially good predictive performance in classifying chest X-ray images: seven of them (the proposed CNN, XGB, LR, SVM, MLP, RF and GB) achieved results greater than 90%, while six classifiers (perceptron, KNN, GNB, NC, DT and DNN) performed between 70% and 90%.

Table 3. Performance analysis of classification methods

Classifier   | Accuracy | AUC    | F-Measure | Sensitivity | Specificity
XGB          | 0.9274   | 0.9456 | 0.9274    | 0.9274      | 0.9637
LR           | 0.9251   | 0.9438 | 0.9253    | 0.9251      | 0.9625
SVM          | 0.9228   | 0.9421 | 0.9230    | 0.9228      | 0.9614
MLP          | 0.9157   | 0.9368 | 0.9163    | 0.9157      | 0.9578
RF           | 0.9064   | 0.9298 | 0.9060    | 0.9064      | 0.9532
GB           | 0.9052   | 0.9289 | 0.9049    | 0.9052      | 0.9526
Perceptron   | 0.8935   | 0.9201 | 0.8938    | 0.8935      | 0.9467
KNN          | 0.8538   | 0.8903 | 0.8558    | 0.8538      | 0.9269
GNB          | 0.8187   | 0.8640 | 0.8155    | 0.8187      | 0.9093
NC           | 0.7988   | 0.8491 | 0.8009    | 0.7988      | 0.8994
DT           | 0.7883   | 0.8412 | 0.7889    | 0.7883      | 0.8941
DNN          | 0.7029   | 0.7772 | 0.7035    | 0.7029      | 0.8514
ResNet50     | 0.6070   | 0.7052 | 0.5948    | 0.6070      | 0.8035
VGG16        | 0.6035   | 0.7026 | 0.5816    | 0.6035      | 0.8017
BNB          | 0.5625   | 0.6719 | 0.5180    | 0.5625      | 0.7812
InceptionV3  | 0.5298   | 0.6473 | 0.5314    | 0.5298      | 0.7649
Proposed CNN | 0.9403   | 0.9552 | 0.9403    | 0.9403      | 0.9701

The remaining classifiers, namely BNB and the pre-trained CNNs ResNet50, VGG16 and InceptionV3, provided outcomes below 70%. The neural network and regression based classifiers thus proved more suitable for investigating the COVID-19 dataset. The average results of the different classifiers are shown in Fig. 2. Among all of these classifiers, the proposed CNN shows the highest accuracy (94.03%), AUC (95.52%), F-measure (94.03%), sensitivity (94.03%), and specificity (97.01%), correctly classifying 272 out of 285 COVID-19 instances. XGB, LR, SVM, MLP, RF and GB demonstrate better results than the other algorithms, except for the CNN. Moreover, the average performance of all classifiers is satisfactory across all evaluation metrics. We believe that, due to the small number of COVID-19 cases, the deep pre-trained classifiers could not achieve results as accurate as the proposed CNN and the classical models. Therefore, our proposed CNN model is found to be the best classifier for identifying both COVID-19 positive and negative cases with high performance, and it can assist physicians and policy makers in identifying cases quickly and taking the necessary steps.

In the proposed CNN, metrics like accuracy, F-measure and sensitivity indicate how well COVID-19 positive (target) cases can be determined, and they proved effective in exploring and identifying the target cases. The model shows 94.03% accuracy, F-measure and sensitivity, and 95.52% AUC.


Fig. 2. Average results of individual classifier

Besides, specificity is one of the most important metrics here because it shows how accurately COVID-19 negative patients can be identified: a higher specificity denotes more appropriate identification of COVID-19 negative cases, and community transmission by infectious persons who were not precisely detected as positive can thus be reduced. Note that our proposed CNN yielded the highest specificity (97.01%) among all the models evaluated in this work, outperforming them all (Table 3). Thus, the proposed CNN demonstrates its potential to avoid false positive cases.

5 Discussion

In this work, several machine and deep learning classifiers were used to investigate chest X-ray images and detect COVID-19 positive cases rapidly. Most of the classifiers used here were widely implemented in earlier works and showed good predictive performance. Recently, many studies have been conducted on COVID-19 chest X-ray image analysis, in which some limitations can be identified. Many works focused on high sensitivity, i.e., on how frequently the classifiers can identify COVID-19 positive cases [12–15,22]. However, community transmission is nowadays a great issue in preventing the spread of COVID-19, and an accelerated growth of false negative rates is a great concern. So, in this work, we particularly focused on specificity (i.e., reduced false negative rates) along with the other metrics. Our proposed CNN model shows better specificity (97.01%) than many existing works [1,3,5,8,19,24]. At the application level, the use of the proposed model may help prevent COVID-19 community transmission by detecting false negative cases more accurately. We also verified our experimental results using various evaluation metrics: accuracy, AUC, F-measure, sensitivity, and specificity. Several works analyzed a small number of COVID-19 samples along with other cases, and their experimental datasets remained imbalanced [13,27,28,32].


Besides, some works were conducted on separate datasets, each with a scarcity of samples [3,28]. In the proposed model, we integrated COVID-19 samples from related datasets and additionally balanced the target classes, normal and viral pneumonia, using random under-sampling relative to these samples. Some works improved their results, especially accuracy, by merging non-COVID classes (e.g., reducing three classes to two) [3], although normal and viral pneumonia cases are hardly associated with COVID-19. In the current study, we considered these conditions and analyzed them to reflect the epidemic situation. Many state-of-the-art CNNs, such as pre-trained transfer learning models, are available to investigate chest X-ray images and classify COVID-19 [1,3,24], but they did not show better results for a small number of samples; increasing the number of images decreases the risk of a high false positive rate. Moreover, most previous works did not evaluate machine learning alongside deep learning simultaneously [3,13]. Instead, our approach gathers both small and large collections of chest X-ray images to investigate COVID-19 cases more precisely. Techniques such as the RT-PCR test and viral antigen detection are useful to identify COVID-19 cases accurately, but most of them are costly, time-consuming and require specific instructions to implement. Moreover, sample collection from a large population is a slow process in which infections may remain undetected. Chest X-ray images, by contrast, are more accessible than other diagnostic measures. In this light, the proposed CNN is more vigilant against false negative predictions, tackling community transmission and achieving better predictability, as undetected cases cannot further trigger more infections. Physicians and healthcare workers cannot take proper steps when many patients are admitted to hospital; if cases are detected at an early stage, patients can be isolated from their community and given proper treatment rapidly to reduce the transmission of COVID-19. In this situation, we need a suitable tool that detects COVID-19 positive and negative cases in a more feasible way; our proposed CNN model can automatically detect these cases more accurately, with high specificity. In this work, we focused on COVID-19 negative cases so that community transmission can be restrained as strongly as possible. This pandemic has grown severe day by day: different sectors such as agriculture, business and finance have faced huge losses, and many people have lost their jobs without finding other working opportunities. Also, the transmission rate of SARS-CoV-2 is extremely high, so any undetected COVID-19 case may potentially spread the disease throughout the community. Therefore, early detection via cost-effective tools with high predictive power is urgently required to recognize cases and take proper steps as soon as possible. The high specificity of our proposed CNN model can successfully reduce false negative rates by detecting subtle cases, which significantly benefits not only public health but, consequently, the return to social and economic normality.

6 Conclusion and Future Work

This study proposed a CNN model that analyzed chest X-ray images of COVID-19, healthy and viral pneumonia patients to classify and diagnose COVID-19 patients automatically in a short period of time. Various machine and deep learning approaches were used to benchmark the performance of our proposed CNN, which yields the highest scores: 94.03% accuracy, 95.52% AUC, 94.03% F-measure, 94.03% sensitivity and 97.01% specificity for detecting COVID-19 patients. Despite all measures taken to avoid over-fitting, the performance of the proposed CNN model is surprisingly good on small datasets; it would nevertheless be interesting to see its performance with a larger training dataset. Hence, in the future, we will collect a large number of images from various sources and analyze them to obtain more reliable outcomes. This approach may be helpful in clinical practice for the detection of COVID-19 cases to prevent future community transmission.

References

1. Abbas, A., Abdelsamea, M.M., Gaber, M.M.: Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl. Intell. 51, 854–864 (2020). https://doi.org/10.1007/s10489-020-01829-7
2. Ahammed, K., Satu, M.S., Khan, M.I., Whaiduzzaman, M.: Predicting infectious state of hepatitis C virus affected patient's applying machine learning methods. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1371–1374. IEEE (2020)
3. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 1 (2020)
4. Butt, C., Gill, J., Chun, D., Babu, B.A.: Deep learning system to screen coronavirus disease 2019 pneumonia. Appl. Intell. 1 (2020)
5. Chandra, T.B., Verma, K., Singh, B.K., Jain, D., Netam, S.S.: Coronavirus disease (COVID-19) detection in chest X-ray images using majority voting based classifier ensemble. Expert Syst. Appl. 165, 113909 (2021). https://doi.org/10.1016/j.eswa.2020.113909
6. Chowdhury, M.E., et al.: Can AI help in screening viral and COVID-19 pneumonia? arXiv preprint arXiv:2003.13145 (2020)
7. Cohen, J.P., Morrison, P., Dao, L.: COVID-19 image data collection. arXiv:2003.11597 (2020). https://github.com/ieee8023/covid-chestxray-dataset
8. Duran-Lopez, L., Dominguez-Morales, J.P., Corral-Jaime, J., Vicente-Diaz, S., Linares-Barranco, A.: COVID-XNet: a custom deep learning system to diagnose and locate COVID-19 in chest X-ray images. Appl. Sci. 10(16), 5683 (2020). https://doi.org/10.3390/app10165683
9. Dutta, S., Bandyopadhyay, S.K., Kim, T.H.: CNN-LSTM model for verifying predictions of COVID-19 cases. Asian J. Res. Comput. Sci. 25–32 (2020). https://doi.org/10.9734/ajrcos/2020/v5i430141


10. Heidari, M., Mirniaharikandehei, S., Khuzani, A.Z., Danala, G., Qiu, Y., Zheng, B.: Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. Int. J. Med. Inform. 144, 104284 (2020). https://doi.org/10.1016/j.ijmedinf.2020.104284
11. Holshue, M.L., et al.: First case of 2019 novel coronavirus in the United States. New Engl. J. Med. (2020)
12. Ismael, A.M., Şengür, A.: Deep learning approaches for COVID-19 detection based on chest X-ray images. Expert Syst. Appl. 164, 114054 (2021). https://doi.org/10.1016/j.eswa.2020.114054
13. Karar, M.E., Hemdan, E.E.D., Shouman, M.A.: Cascaded deep learning classifiers for computer-aided diagnosis of COVID-19 and pneumonia diseases in X-ray scans. Complex Intell. Syst. 7, 235–247 (2020). https://doi.org/10.1007/s40747-020-00199-4
14. Karthik, R., Menaka, R., M., H.: Learning distinctive filters for COVID-19 detection from chest X-ray using shuffled residual CNN. Appl. Soft Comput. 106744 (2020). https://doi.org/10.1016/j.asoc.2020.106744
15. Khan, A.I., Shah, J.L., Bhat, M.M.: CoroNet: a deep neural network for detection and diagnosis of COVID-19 from chest X-ray images. Comput. Methods Programs Biomed. 196, 105581 (2020). https://doi.org/10.1016/j.cmpb.2020.105581
16. Kroft, L.J., van der Velden, L., Girón, I.H., Roelofs, J.J., de Roos, A., Geleijns, J.: Added value of ultra-low-dose computed tomography, dose equivalent to chest X-ray radiography, for diagnosing chest pathology. J. Thorac. Imaging 34(3), 179 (2019)
17. Lippi, G., Plebani, M.: Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis. Clin. Chim. Acta 505, 190 (2020)
18. Lu, R., et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395(10224), 565–574 (2020)
19. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., Jamalipour Soufi, G.: Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020). https://doi.org/10.1016/j.media.2020.101794
20. Moura, J.D., et al.: Deep convolutional approaches for the analysis of COVID-19 using chest X-ray images from portable devices. IEEE Access 8, 195594–195607 (2020). https://doi.org/10.1109/ACCESS.2020.3033762
21. Ng, M.Y., et al.: Imaging profile of the COVID-19 infection: radiologic findings and literature review. Radiol. Cardiothorac. Imaging 2(1), e200034 (2020)
22. Ohata, E.F., et al.: Automatic detection of COVID-19 infection using chest X-ray images through transfer learning. IEEE/CAA J. Autom. Sinica 8(1), 239–248 (2021). https://doi.org/10.1109/JAS.2020.1003393
23. World Health Organization: Laboratory testing for coronavirus disease 2019 (COVID-19) in suspected human cases: interim guidance, 2 March 2020. Technical report, World Health Organization (2020)
24. Pandit, M.K., Banday, S.A.: SARS n-CoV2-19 detection from chest X-ray images using deep neural networks. Int. J. Pervasive Comput. Commun. 16(5), 419–427 (2020). https://doi.org/10.1108/IJPCC-06-2020-0060


25. Shahriare Satu, M., Atik, S.T., Moni, M.A.: A novel hybrid machine learning model to predict diabetes mellitus. In: Uddin, M.S., Bansal, J.C. (eds.) Proceedings of International Joint Conference on Computational Intelligence. AIS, pp. 453–465. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3607-6_36
26. Satu, M.S., Rahman, S., Khan, M.I., Abedin, M.Z., Kaiser, M.S., Mahmud, M.: Towards improved detection of cognitive performance using bidirectional multilayer long-short term memory neural network. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 297–306. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_27
27. Sekeroglu, B., Ozsahin, I.: Detection of COVID-19 from chest X-ray images using convolutional neural networks. SLAS Technol.: Transl. Life Sci. Innov. 25(6), 553–565 (2020). https://doi.org/10.1177/2472630320958376
28. Shankar, K., Perumal, E.: A novel hand-crafted with deep learning features based fusion model for COVID-19 diagnosis and classification using chest X-ray images. Complex Intell. Syst. 7, 1277–1293 (2020). https://doi.org/10.1007/s40747-020-00216-6
29. Shorfuzzaman, M., Hossain, M.S.: MetaCOVID: a siamese neural network framework with contrastive loss for N-shot diagnosis of COVID-19 patients. Pattern Recognit. 107700 (2020). https://doi.org/10.1016/j.patcog.2020.107700
30. Stoecklin, S.B., et al.: First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures, January 2020. Eurosurveillance 25(6), 2000094 (2020)
31. Wang, L., Lin, Z.Q., Wong, A.: COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 10(1), 19549 (2020). https://doi.org/10.1038/s41598-020-76550-z
32. Zebin, T., Rezvy, S.: COVID-19 detection and disease progression visualization: deep learning on chest X-rays for classification and coarse localization. Appl. Intell. 51, 1010–1021 (2020). https://doi.org/10.1007/s10489-020-01867-1
33. Zhu, N., et al.: A novel coronavirus from patients with pneumonia in China, 2019. New Engl. J. Med. (2020)

Classification of Tumor Cell Using a Naive Convolutional Neural Network Model

Debashis Gupta1(B), Syed Rahat Hassan2, Renu Gupta3, Urmi Saha1, and Mohammed Sowket Ali1

1 Bangladesh Army University of Science and Technology, Saidpur, Bangladesh
{debashisgupta,sahaurmi,sowket}@baust.edu.bd
2 Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
[email protected]
3 TMSS Medical College and Rafatullah Community Hospital, Bogra, Bangladesh
[email protected]

Abstract. Early detection of tumor tissue leading to cancer is one of the most pressing health issues of the present world, owing to increases in radiation, ultraviolet light, radon gas, infectious agents, etc. To diagnose tumor cells promptly, computer-aided detection (CAD) systems using convolutional neural networks (CNN) nowadays play a significant role in the health sector. Many complicated CNN models have been introduced to classify tumor cells effectively; in this study, however, we propose a comparatively less complex deep learning approach that is as effective and reliable as renowned pre-trained models such as VGG19, Inception-v3, ResNet-50 and DenseNet-201. Our proposed architecture classifies tumor cells on the PatchCamelyon (PCam) dataset with a satisfactory validation accuracy of 94.70% while using fewer computational parameters than the mentioned pre-trained models.

Keywords: Tumor Cell · Neural Network · Classification · CNN · Digital histopathology · Computer-assisted diagnosis · Transfer learning

1 Introduction

In 2018, the WHO reported that cancer was responsible for 9.6 million deaths and is one of the prime causes of death around the globe [1]. Cancer cells grow without bound and can spread throughout the body. Among many other causes, one reason behind cancer is genetic change; environmental elements like the chemicals in tobacco, radiation, etc. also play a role in different kinds of cancer [2]. To save valuable lives, early detection of cancer is mandatory. In the early days of medical diagnosis, testing was quite laborious and expensive. Over time, science has shown us the path to more efficient, fast and cheap ways to run tests and acquire more reliable results. Machine learning became one of the well-established emerging techniques for medical diagnosis tasks in the late 20th and early 21st century [8]. One of the problems


with machine learning techniques was that they required handcrafted feature engineering. Deep learning later helped this field thrive more robustly than ever [3]. With different deep learning methods, we are continuously improving diagnosis and overcoming obstacles. Various medical imaging techniques exist, like MRI, X-ray, PET, etc., to identify diseases, and among deep learning methods the Convolutional Neural Network (CNN) has proven its worth quite clearly when it comes to image data [12]. Analyzing medical image data, pathology slides, etc. can be done with CNNs more efficiently than before. Modern deep learning models represent an advanced improvement in classifying medical images and report impressive success in diagnosing diseases. However, these state-of-the-art models are trained with numerous parameters, which requires superior computational power and incurs cost complexity. Hence, in this study, we present a model which achieves analogous accuracy, along with other evaluation metrics like F1-score, precision, recall, and support, similar to the widely accepted pre-trained models, i.e., VGG-19, Inception-v3, ResNet-50, and DenseNet-201. Furthermore, we also show that our proposed architecture, while having equivalent metrics, uses far fewer computational parameters than the mentioned models.

2 Literature Review

Zheng et al. [19] demonstrated a ResNet based architecture to classify cancer cells; they used test-time augmentation to make random changes to the test inputs before feeding them through their model, which shows an AUC score of 98%. Wang et al. [18] used the Camelyon16 dataset with two different evaluation approaches and achieved adequate results: the first approach was slide-based classification and the second was lesion-based classification, with which they achieved AUCs of 0.925 and 0.7051, respectively, using deep learning methods. Liang et al. [10] introduced a CNN model with a Convolutional Block Attention Module (CBAM) to identify cancer cells; the model was validated on the PCam dataset and obtained a 0.976 AUC score. Lantang et al. [9] proposed their own convolutional neural network to identify cancer cells on the PCam dataset, with a total of 8 convolution layers and a max pool layer after every two convolution layers. After the convolution stage, a flatten layer turned the feature maps into a vector, and finally three FC (fully connected) layers with dropout completed the architecture. With this approach they got 92% accuracy, a 94% F1-score for the cancerous-cell class and 94% for the non-cancerous class, and an AUC of 98%. Kassani et al. [7] proposed a three-path ensemble architecture using the three pre-trained models VGG19, MobileNetV2 and DenseNet201 for the classification of breast cancer, ending up with decent accuracies of 90.84%, 89.09% and 87.84%, respectively, on the PatchCamelyon dataset.


In [17], for breast cancer detection on the BreakHis dataset, a combination of boosting tree classifiers and a CNN was proposed, where the model engaged the Inception-ResNet-v2 model for feature extraction from multi-scale images; afterwards, a gradient boosting tree classifier was used for the final classification step. Roy et al. [11] proposed a patch-based classifier using a CNN and a majority voting method to classify breast cancer histopathology on the ICIAR dataset; the proposed model predicts the output class label for both binary and multi-class tasks. In this paper, we built a CNN (Convolutional Neural Network) model and also implemented 4 pre-trained CNN models to identify metastatic cancer cells from small image patches acquired from larger digital pathology scans.

3 Methodology

3.1 Dataset Description

The dataset has been taken from a Kaggle competition. The data is a slightly modified version of the PatchCamelyon (PCam) [16] benchmark dataset: the original dataset contained duplicates due to probabilistic sampling, whereas the version we work with does not contain any duplicates. The dataset provides a large number of small pathology images to classify. An id is used to represent the filenames, and the train_labels.csv file provides the ground truth for the images in the train folder. A positive label indicates that at least one pixel of tumor tissue is present in the middle 32 × 32 pixel section of a patch; tumor tissue in the outermost portion of the patch has no effect on the label. This outside region allows fully-convolutional models without zero-padding to behave consistently when applied to a whole-slide image. Sample images from the dataset are shown in Fig. 1. There are in total 220,025 data samples in the training set, of which 130,908 are labeled 0 (no tumor tissue) and 89,117 are labeled 1 (has tumor tissue). Along with this, 57,458 data samples are found in the testing set. It is worth mentioning that all of these data are unique.

3.2 Data Pre-processing

For our experiments, we took only 160,000 random data samples for the training set, which were later divided into a train set and a validation set of 144,000 and 16,000 random data samples, respectively. We took all the images at a size of 96 × 96 pixels in the RGB channels. Data normalization is important for a uniform pixel-value distribution and makes convergence faster; to normalize the pixel values to between 0 and 1, they were divided by 255.
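A brief sketch of this split and normalization follows, with synthetic arrays standing in for the PCam images; the random seed and variable names are assumptions.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 256, (1000, 96, 96, 3), dtype=np.uint8)  # stand-in patches
y = rng.integers(0, 2, 1000)          # 0: no tumor tissue, 1: has tumor tissue

# 144,000 / 16,000 out of 160,000 corresponds to a 90/10 split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y)

X_train = X_train.astype(np.float32) / 255.0   # normalize pixels to [0, 1]
X_val = X_val.astype(np.float32) / 255.0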


Fig. 1. Sample images of the PCam dataset

3.3 Models

In this subsection, we briefly discuss the pre-trained architectures that we used in our experiments and then present our proposed architecture.

VGG-19: Our first experimental model is VGG19 [13]. The VGG19 model is a variant of the VGG model that, in a nutshell, consists of 19 layers in total: 16 convolution layers, 3 fully connected layers, 5 MaxPool layers, and 1 softmax layer. In our experimental work we took only the feature-extractor part of this model to extract the features in the image; stacked fully connected (FC) layers were applied on top of the convolutional stack as the classifier. A dropout value of 0.5 is applied to the FC layer, which has 512 channels.

Inception-V3: The second experimental model is Inception-v3 [15], the successor to Inception-v1, with 24M parameters. This pre-trained model achieves state-of-the-art accuracy in recognizing general objects across 1000 classes. In the first step, the model extracts general features from input images, and in the second step it classifies the images based on those features. As with the previous model, we employed only the feature-extraction portion of Inception-v3, adding two fully-connected (FC) layers to perform the classification; both FC layers have 1024 channels and a dropout value of 0.5.


Fig. 2. Our proposed architecture

DenseNet-201: In the DenseNet architecture [5], each layer in the dense blocks takes the feature maps of the previous layers and passes its own to the next layer; feature maps from different layers are intertwined through concatenation. These dense connections give the architecture a better pathway and achieve a significantly finer gradient flow. Here, 201 represents the number of trainable layers with weights. Each dense layer has 1 × 1 and 3 × 3 convolution operations. The feature re-use ability of this architecture is quite efficient, since there is no compelling reason to relearn inessential feature maps. After extracting the feature maps using the base model DenseNet201, a global-average pooling layer was used; then one FC layer with 256 units is added, with batch normalization, and a dropout of 0.5 placed before and after the layer.

ResNet-50: ResNet was the winner of ILSVRC 2015 image classification, detection, and localization. In the ResNet architecture, shortcut connections are introduced


whose job is to fit the input to the next layer without modifying the incoming features from the former layer [4]. This mitigates the vanishing gradient problem. To reduce time complexity, a bottleneck strategy is implemented in which a 1 × 1 convolution layer is placed at the start and at the end of a block, lessening the number of parameters without significantly corrupting the performance of the architecture. The base model ResNet50 was implemented like DenseNet201, with global-average pooling, batch normalization and dropout around one FC layer with 256 units.

Proposed Architecture: The architecture has 3 main convolutional blocks, each containing 3 convolution layers with the ReLU activation function, 3 batch-normalization layers and, lastly, a max pool layer; a visual representation is given in Fig. 2, and a code sketch follows Table 2. In the first block the 3 convolutional layers have 64 filters, in the second block 128 filters, and in the third block 256 filters. A convolutional layer applies its filters to the pixel values and extracts feature maps. After every convolution layer there is a batch-normalization layer: to make the network more stable, it normalizes the feature map taken from the previous layer by subtracting the batch mean and dividing by the batch standard deviation [6], which also has some regularization effect. At the end of every convolutional block, a max pool layer is placed; it takes the feature maps from the previous layer and keeps the highest value from each small patch, effectively down-sampling the input, and works better here than average pooling. After the 3 convolutional blocks, a dropout layer is added to improve generalization and address the over-fitting problem [14]. Then comes the flatten layer, which turns the 2D matrix into a vector so that it can be given to the fully connected layers. There are three FC layers, and the last one predicts the classification with the help of the sigmoid function.

Table 1. Classification report

Model          | Class            | Precision(%) | Recall(%) | F1-Score(%) | Support | Val Accuracy(%) | AUC(%)
VGG-19         | no tumor tissue  | 94           | 97        | 95          | 8000    | 95.38           | 98.91
               | has tumor tissue | 97           | 94        | 95          | 8000    |                 |
               | avg              | 95           | 95        | 95          | 16000   |                 |
Inception-v3   | no tumor tissue  | 93           | 96        | 95          | 8000    | 94.41           | 98.63
               | has tumor tissue | 96           | 93        | 94          | 8000    |                 |
               | avg              | 94           | 94        | 94          | 16000   |                 |
ResNet-50      | no tumor tissue  | 92           | 97        | 95          | 8000    | 94.44           | 98.74
               | has tumor tissue | 97           | 92        | 94          | 8000    |                 |
               | avg              | 95           | 94        | 94          | 16000   |                 |
DenseNet-201   | no tumor tissue  | 94           | 97        | 95          | 8000    | 95.42           | 98.95
               | has tumor tissue | 97           | 94        | 95          | 8000    |                 |
               | avg              | 95           | 95        | 95          | 16000   |                 |
Proposed Model | no tumor tissue  | 94           | 95        | 95          | 8000    | 94.70           | 98.63
               | has tumor tissue | 95           | 94        | 95          | 8000    |                 |
               | avg              | 95           | 95        | 95          | 16000   |                 |

Table 2. Comparison of computational parameters

Architecture          | Total Computational Parameters
Inception-V3          | 122,224,546
VGG-19                | 45,717,570
ResNet-50             | 24,122,070
DenseNet-201          | 18,823,062
Proposed architecture | 7,171,842
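The block structure described above can be sketched in Keras as follows; the kernel size, dropout rate and the widths of the first two FC layers are assumptions, as the text does not state them.

from tensorflow.keras import layers, models

def conv_block(x, filters):
    # Three (Conv + BatchNorm) pairs followed by max pooling, as described.
    for _ in range(3):
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    return layers.MaxPooling2D((2, 2))(x)

inputs = layers.Input(shape=(96, 96, 3))
x = conv_block(inputs, 64)
x = conv_block(x, 128)
x = conv_block(x, 256)
x = layers.Dropout(0.5)(x)                     # assumed rate
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)    # assumed width
x = layers.Dense(128, activation="relu")(x)    # assumed width
outputs = layers.Dense(1, activation="sigmoid")(x)  # tumor / no tumor

model = models.Model(inputs, outputs)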

3.4 Experimental Setup

In this paper, we performed a classification task on a version of the PatchCamelyon (PCam) dataset, deciding whether a histopathology image contains cancerous tissue or not. All models were trained on the training set for 25 epochs. As we are performing binary classification, binary cross-entropy was used as the loss function, and the validation and training batch size was set to 10. To minimize the loss function, the Adam optimizer was used with a learning rate of 0.0001. All experiments were run in Kaggle using a GPU. For the evaluation of the models, we chose accuracy, precision, recall, F1-score and AUC.
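Assuming the model and data splits sketched earlier, the training configuration described here amounts to the following; only the optimizer, loss, batch size, learning rate and epoch count come from the text.

from tensorflow.keras.metrics import AUC, Precision, Recall
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss="binary_crossentropy",
              metrics=["accuracy", AUC(), Precision(), Recall()])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=10, epochs=25)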

4 Result Analysis

From Table 1 it is clearly seen that the most dominant models are DenseNet-201 and VGG-19, with 95.42% and 95.38% validation accuracy, respectively; these models also have better precision, recall and F1-score than the other models. The other pre-trained models, i.e., Inception-v3 and ResNet-50, also look strong, with decent accuracy and AUC. Alongside these giant pre-trained models, our proposed model, with its far less complex architecture, achieves a comparable accuracy of 94.70%, better than Inception-v3 and ResNet-50. In Fig. 3a, the small gap between the training and validation loss curves indicates a good fit of the model. Table 1 also shows that the proposed model attains precision, recall, and F1-score percentages identical to the best-performing models, i.e., DenseNet-201 and VGG-19, with a minute accuracy difference of 0.72% and 0.68%, respectively. Compared on the number of computational parameters, however, Table 2 shows a discernible difference between the proposed model and DenseNet-201 and VGG-19 (Fig. 4).


Fig. 3. Performance Curve of Proposed Architecture

Fig. 4. Confusion Matrix of Proposed Architecture

5 Conclusion

In this paper, 4 pre-trained models were implemented with transfer learning to identify tumor cells. The proposed model architecture, with fewer computational parameters, performs quite close to the pre-trained models. Among the 4 tuned pre-trained models, DenseNet-201 and VGG-19 performed best. With our model, we achieved 94.70% accuracy and an AUC score of 98.63% (Table 1), which is quite close to the performance of the DenseNet-201 and VGG-19 models. The main limitation was the considerable computational power required by a well-equipped GPU hardware system. Day by day, medical image analysis is becoming renowned for its effectiveness; this kind of study can help the medical community make more robust decisions and effectively identify tumor cells leading to cancer, for better health care.


Acknowledgement. The authors Debashis Gupta, Syed Rahat Hassan, and Urmi Saha sincerely thank Dr. Renu Gupta for sharing knowledge on tumor tissue and validating the proposed model's predictions. Additionally, the authors express their gratitude to Dr. Engr. Mohammed Sowket Ali for his continual supervision.

Author contributions. Conceptualization and Methodology - Debashis Gupta, Urmi Saha, Syed Rahat Hassan, and Renu Gupta. Implementation Coding - Debashis Gupta, Urmi Saha, and Syed Rahat Hassan. Resource Management - Debashis Gupta, Renu Gupta and Urmi Saha. Writing - original draft preparation - Debashis Gupta, Syed Rahat Hassan and Urmi Saha. Writing - Review and Editing - Renu Gupta, Mohammed Sowket Ali, and Debashis Gupta. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest. The authors declare no conflict of interest.

References

1. Cancer. www.who.int/health-topics/cancer. Accessed 15 Jan 2020
2. What is cancer? www.cancer.gov/about-cancer/understanding/what-is-cancer. Accessed 19 Jan 2020
3. Bakator, M., Radosav, D.: Deep learning and medical diagnosis: a review of literature. Multimodal Technol. Interact. 2(3), 47 (2018)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
5. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
6. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
7. Kassani, S.H., Kassani, P.H., Wesolowski, M.J., Schneider, K.A., Deters, R.: Classification of histopathological biopsy images using ensemble of deep learning networks. arXiv preprint arXiv:1909.11870 (2019)
8. Kononenko, I.: Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23(1), 89–109 (2001)
9. Lantang, O., Tiba, A., Hajdu, A., Terdik, G.: Convolutional neural network for predicting the spread of cancer. In: 2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), pp. 175–180. IEEE (2019)
10. Liang, Y., Yang, J., Quan, X., Zhang, H.: Metastatic breast cancer recognition in histopathology images using convolutional neural network with attention mechanism. In: 2019 Chinese Automation Congress (CAC), pp. 2922–2926. IEEE (2019)
11. Roy, K., Banik, D., Bhattacharjee, D., Nasipuri, M.: Patch-based system for classification of breast histology images using deep learning. Comput. Med. Imaging Graph. 71, 90–103 (2019)
12. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)


14. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
15. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
16. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology. CoRR abs/1806.03962 (2018). arxiv.org/abs/1806.03962
17. Vo, D.M., Nguyen, N.Q., Lee, S.W.: Classification of breast cancer histology images using incremental boosting convolution networks. Inf. Sci. 482, 123–138 (2019)
18. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718 (2016)
19. Zheng, Z., Zhang, H., Li, X., Liu, S., Teng, Y.: ResNet-based model for cancer detection. In: 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), pp. 325–328. IEEE (2021)

Tumor-TL: A Transfer Learning Approach for Classifying Brain Tumors from MRI Images

Abu Kowshir Bitto1, Sabina Yesmin1, Md. Hasan Imam Bijoy2(B), and Md. Jueal Mia2

1 Department of Software Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]
2 Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]

Abstract. An intracranial tumor, another name for a brain tumor, is a fast-proliferating, uncontrolled mass of tissue that appears unaffected by the mechanisms that govern normal cells. The identification and segmentation of brain tumors are among the most difficult and time-consuming tasks in medical image processing. MRI is a medical imaging technique that allows radiologists to see internal body structures without requiring surgery. The information MRI provides about human soft tissue contributes to the diagnosis of brain tumors. In this paper, we use several Convolutional Neural Network architectures to identify brain tumors in MRI scans. We use a variety of pre-trained models, such as VGG16, VGG19, and ResNet50, which we have found to be critical for reaching competitive performance. ResNet50 performs best among all the models, with an accuracy of 96.76%. Keywords: Brain Tumor · MRI · VGG16 · VGG19 · ResNet50 · Transfer Learning

1 Introduction
With millions of neurons cooperating, the brain is among the body's most intricate systems [1]. When a tumor grows in the head, the pressure inside the brain rises, causing damage to the brain. An intracranial neoplasm, often known as a brain tumor, is a disorder in which abnormal cells grow in the human brain. There are two types of brain tumors: malignant (cancerous) and benign (non-cancerous). Cancerous tumors may be primary tumors or secondary (metastatic) tumors. Brain tumors result from flaws in the DNA of normal brain cells. Cells in the body constantly divide and die, only to be replaced by new cells. In some circumstances, new cells form while the old cells are not completely eliminated. These cells accumulate as a result, and they have the potential to form tumors. Brain tumors are frequently passed


down through the generations. The most prevalent and aggressive are gliomas [2]. Glioma detection at an early stage is critical for achieving the best treatment results [3]. Several diagnostic imaging modalities provide vital information about the shape, size, location, and metabolism of brain tumors. While these modalities are used in combination to provide the most detailed information about brain tumors, MRI is considered the standard strategy due to its significant soft-tissue contrast and broad accessibility [4]. According to the World Health Organization's categorization scheme for identifying brain tumors, upwards of 120 varieties of brain tumors exist, varying in origin, range, size, and features. The major goal of our research is to use transfer learning and deep learning to recognize this complicated human condition and classify it from image data. Transfer learning is the technique of using the knowledge gleaned from a pre-trained model to learn a new set of facts [2]. Analysts have presented numerous computational techniques for identifying and classifying brain tumors through brain MRI pictures ever since it became possible to scan and load meaningful images into the computer [3]. A Convolutional Neural Network (CNN)-based multitask classification is built to determine the kind and location of malignancies. We used Kaggle and Figshare datasets for this study. A total of 7116 brain MRI scans have been divided into four categories: pituitary, meningioma, glioma, and no tumor, on which the CNN pre-trained models are applied. The remainder of the paper is organized as follows. Section 2 presents the literature review for MRI studies. Section 3 discusses the method for identifying brain tumors from MRI images. Section 4 presents and discusses the findings. Section 5 draws the whole study to a conclusion.

2 Literature Review
Many papers, publications, and research projects focus on the detection and categorization of brain tumors in MRI scans. A few of the relevant works are reviewed below. Pereira et al. [1] described a CNN-based approach for segmenting brain tumors in MRI images; compared with perceptron-based and prior models, these models frequently incorporate probabilistic processing. Pre-processing, classification by CNN, and post-processing are the three key parts of the approach. The BRATS 2013 and 2015 databases were used to test the suggested technique. The three most typical varieties of brain tumors (glioma, meningioma, and pituitary tumors) are distinguished using a three-class classification system by Deepak et al. [2]. To extract properties from brain MRI data, the suggested categorization method uses deep transfer learning and a pre-trained Google Neural Network. The authors extracted features using a pre-trained VGG-16 model and a fine-tuned AlexNet, then categorized them using a support vector machine. The study employed 3D CNN architectures based on knowledge transfer; approaches using handcrafted features reported precision metrics comparable to the transfer learning-based calculations. Talo et al. [3] proposed a technique for automatically distinguishing abnormal and normal brain MR pictures. The ResNet34 model was utilized, and brain anomalies were accurately recognized by utilizing a PSO-SVM classifier with a radial basis kernel. The experiments in


this paper were carried out using the Harvard Medical School MR dataset, and numerous sophisticated deep learning approaches for hyperparameter optimization were used. Ahuja et al. [4] proposed employing the superpixel approach to detect and segment brain tumors via transfer learning. First, MRI slices are divided into three categories: normal, low-grade glioma (LGG), and high-grade glioma (HGG). The proposed methodology achieved 99.82% and 96.32% precision on validation data using VGG-19 at epoch 6. The LGG and HGG MRI brain tumor images are then segmented to show the tumor, and tumor division is accomplished using the superpixel method. The tumor segmentation produces an average Dice index of 0.932. Future research should examine the method using a real-time patient database; to raise the average Dice index, segmentation networks must be supported by improved post-processing methods. Khan et al. [5] describe a fully automated deep learning system for multimodal brain tumor classification that includes contrast enhancement. The work proceeded in three stages. First, contrast stretching with an edge-based approach and histogram equalization (HE) was used to enhance the image contrast of the tumor region in the preprocessing step. Robust deep learning features were then extracted. Finally, an extreme learning machine (ELM) classifier was used to categorize the detected tumors into the appropriate group. With the aid of transfer learning, this study fused the features from two separate CNN models; the goal of combining two CNN models was to create a richer feature vector carrying more data. The experiment was conducted on the BraTs datasets, and the outcomes revealed an improvement in precision (98.16%, 97.26%, and 93.40%, respectively, for the BraTs2015, BraTs2017, and BraTs2018 datasets). For MR brain image categorization, Kaur et al. [6] compare separate pre-trained DCNN models using transfer learning. Using pre-trained DCNN models with transfer learning proved successful in providing a high recognition rate. Out of all the models tested, the AlexNet model performed best, with classification rates of 100%, 94%, and 95.92% across the data sets, surpassing existing conventional and deep learning algorithms on brain classification tasks. The authors describe future work concentrating on running models on GPU-enabled frameworks, which is anticipated to reduce computational overhead, and on investigating various fine-tuning techniques. Khan et al. [5] also describe an automatic multimodal classification technique for brain tumor type categorization based on deep learning. Brain tumors may be cancerous or noncancerous, and both can injure the brain, which might be fatal. This study used a linear contrast enhancement technique based on histogram equalization. Features were extracted from two different CNN models via transfer learning and then fused; the goal of combining two CNN architectures was to provide a richer feature vector.

3 Methodology
Our study's primary objective is to create a method for identifying and classifying brain tumors from MRI scans. To attain our aim, we must go through numerous phases, including dataset collection, data preprocessing, and model creation. The working procedure is presented in Fig. 1.


Fig. 1. MRI image classification according to the working technique.

3.1 Data Description
We obtained brain tumor MRI images from Figshare and Kaggle [7]. There are 7116 brain MRI pictures in this dataset, divided into four classes: pituitary, meningioma, glioma, and no tumor. Color maps have been applied to the collected images, and sample data are presented in Fig. 2.

Fig. 2. Sample dataset for (a) Pituitary, (b) Meningioma, (c) Glioma and (d) No Tumor.

3.2 Data Preprocessing
Geometric alterations are used in the data preprocessing procedures. We resize the images to 220 × 220 pixels for VGG19, VGG16, and ResNet50; all the images are of the same high quality. Using image transformations, the pictures were rotated, width-shifted, height-shifted, shear-shifted, and horizontally flipped.
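A minimal sketch of this augmentation pipeline using Keras' ImageDataGenerator is shown below; the directory path and the exact transformation ranges are illustrative assumptions, not the authors' reported settings.

```python
# Sketch of the preprocessing described above: resize to 220x220 and apply
# rotation, width/height shift, shear, and horizontal flip on the fly.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel intensities to [0, 1]
    rotation_range=15,        # random rotation (range is an assumption)
    width_shift_range=0.1,    # horizontal shift
    height_shift_range=0.1,   # vertical shift
    shear_range=0.1,          # shear transformation
    horizontal_flip=True,     # horizontal flip
)

train_generator = train_datagen.flow_from_directory(
    "data/train",             # hypothetical folder with one subfolder per class
    target_size=(220, 220),   # resize all MRI images to 220x220 as in the paper
    batch_size=32,
    class_mode="categorical", # pituitary, meningioma, glioma, no tumor
)
```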


3.3 Model Implementation
In this study, we applied CNN-based transfer learning algorithms to the brain tumor dataset. The relevant theory of the transfer learning models is given below.
Transfer Learning (TL): Transfer learning is a supervised machine learning approach in which a model developed for one task is reused as the starting point for a related task [8, 9]. Given the enormous computation required to train from scratch, it is a common method in computer vision and natural language generation tasks. In computer vision, neural networks ordinarily learn to identify edges in the first layers, shapes in the middle layers, and task-specific features in the later layers. In transfer learning, the early and middle layers are reused, and the later layers are retrained, making use of the labeled data from the task the network was originally trained on.
VGG-16: VGG-16 has 16 layers [10] and a homogeneous architecture, making it quite appealing. It is extremely similar to AlexNet in that it only includes 3×3 convolutions, but with a large number of filters. It can take 2-3 weeks to train on 4 GPUs. It is now one of the most popular methods in the community for extracting features from images. Transfer learning makes VGG practical: the model is pre-trained on a dataset, the parameters are adjusted for improved accuracy, and the parameter values can then be reused.
VGG-19: The VGG-19 architecture [11] consists of 5 convolutional blocks, followed by 3 fully connected layers. A rectified linear unit (ReLU) activation is applied after each convolution, and a max-pooling operation is periodically utilized to reduce the spatial dimension. Two fully connected layers with 4,096 ReLU-activated units precede the final 1,000-way softmax layer. The convolutional blocks act as feature-extraction layers, and the activation maps they produce are the bottleneck features.
ResNet-50: "ResNet50" denotes a Residual Neural Network with 50 layers [12]. It is a convolutional neural network (CNN) and a variant of ResNet that can operate with up to 50 neural network layers. ResNet50 has 48 convolution layers, one MaxPool layer, and one Average Pool layer. It performs about 3.8 × 10^9 floating-point operations in all and has roughly 23 million trainable parameters [13].
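As a minimal sketch of the transfer-learning setup this section describes (a pre-trained backbone with frozen early and middle layers plus a retrained head), the snippet below uses ResNet50 from keras.applications; swapping in VGG16 or VGG19 only changes the import. The head size and optimizer are illustrative assumptions.

```python
# Frozen pre-trained feature extractor + new trainable classification head.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

base = ResNet50(weights="imagenet", include_top=False, input_shape=(220, 220, 3))
base.trainable = False  # keep the pre-trained early/middle layers fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),   # retrained task-specific layer
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),  # four tumor classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```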

3.4 Performance Calculation
We used test data to evaluate the algorithms' efficiency after training. A few of the metrics generated for performance review are listed below; using these criteria, we identified the most effective model. Based on the confusion matrix that each model provides, the percentage performance measurements are computed using Eqs. (1)-(7).

$$\text{Accuracy} = \frac{TP + TN}{\text{Total Number of Images}} \times 100\% \quad (1)$$

$$\text{True Positive Rate (TPR)} = \frac{TP}{TP + FN} \times 100\% \quad (2)$$

$$\text{True Negative Rate (TNR)} = \frac{TN}{FP + TN} \times 100\% \quad (3)$$

$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN} \times 100\% \quad (4)$$

$$\text{False Negative Rate (FNR)} = \frac{FN}{FN + TP} \times 100\% \quad (5)$$

$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (6)$$

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (7)$$
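The small helper below recomputes the per-class metrics of Eqs. (1)-(7) directly from the TP/FN/FP/TN counts; it is a sketch, not the authors' code, and the example values come from Table 1 (VGG-16, Pituitary class).

```python
# Per-class metrics from confusion-matrix counts, as in Eqs. (1)-(7).
def class_metrics(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    accuracy = 100 * (tp + tn) / total            # Eq. (1)
    tpr = 100 * tp / (tp + fn)                    # Eq. (2), recall/sensitivity
    tnr = 100 * tn / (fp + tn)                    # Eq. (3), specificity
    fpr = 100 * fp / (fp + tn)                    # Eq. (4)
    fnr = 100 * fn / (fn + tp)                    # Eq. (5)
    precision = 100 * tp / (tp + fp)              # Eq. (6)
    f1 = 2 * precision * tpr / (precision + tpr)  # Eq. (7)
    return accuracy, tpr, tnr, fpr, fnr, precision, f1

# Example: VGG-16 on the Pituitary class (Table 1: TP=818, FN=48, FP=23, TN=515)
# yields accuracy 94.94, TPR 94.46, precision 97.27, F1 95.84 -- matching Table 2.
print(class_metrics(818, 48, 23, 515))
```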

4 Results and Discussions
The images were split in an 80:20 ratio into 5712 training images and 1404 validation images. An Intel Core i7 CPU with 16 GB of RAM powered the experimental platform. All input pictures for the VGG-16, VGG-19, and ResNet-50 models were scaled to 220 × 220 pixels, and the weights of the pre-trained models were employed. The resulting confusion matrix for each model (TP, FN, FP, TN) is shown in Table 1 for the four classes.

Table 1. Confusion matrices as applied transfer learning with four classes.

Model      Class        TP    FN   FP   TN
VGG-16     Pituitary    818   48   23   515
VGG-16     Meningioma   778   43   18   565
VGG-16     Glioma       723   32   23   626
VGG-16     No Tumor     671   53   8    672
VGG-19     Pituitary    840   55   14   495
VGG-19     Meningioma   773   65   13   553
VGG-19     Glioma       670   54   12   670
VGG-19     No Tumor     868   22   10   504
ResNet-50  Pituitary    798   25   15   566
ResNet-50  Meningioma   760   12   19   613
ResNet-50  Glioma       745   36   26   597
ResNet-50  No Tumor     723   34   21   626

We used 40 epochs with a batch size of 32 for VGG-16. When VGG-16 training is completed, we construct the confusion matrix and assess performance for each class. Figure 3 shows the accuracy and loss graphs, while Table 2 presents the computed performance.
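As a sketch, the training call below reflects this configuration (40 epochs, batch size 32 set on the data generators) and plots curves in the style of Fig. 3; `model` and `train_generator` follow the earlier sketches, and `val_generator` is an assumed analogous generator over the validation split.

```python
# Hypothetical continuation of the earlier sketches: train for 40 epochs
# and plot accuracy curves as in Fig. 3.
import matplotlib.pyplot as plt

history = model.fit(train_generator, validation_data=val_generator, epochs=40)

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()
```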


Fig. 3. Diagram for (a) VGG-16 accuracy and (b) VGG-16 loss on 40 epochs.

Table 2. Evaluation appraisal table by class for VGG-16.

Model   Class        Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
VGG-16  Pituitary    94.94         94.46    5.54     4.28     95.72    97.27          95.84
VGG-16  Meningioma   95.67         94.76    5.24     3.07     96.97    97.74          96.23
VGG-16  Glioma       96.09         95.76    4.24     3.53     96.47    96.92          96.34
VGG-16  No Tumor     95.66         92.68    7.32     1.17     98.83    98.82          95.66

For VGG-19, we utilized 40 epochs and a batch size of 32. When VGG-19 training is completed, we create the confusion matrix from the model and evaluate each class's performance. The accuracy and loss graphs are shown in Fig. 4, and the computed performance is presented in Table 3.

Fig. 4. Diagram for (a) VGG-19 accuracy and (b) VGG-19 loss on 40 epochs.


Table 3. Evaluation appraisal table by class for VGG-19.

Model   Class        Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
VGG-19  Pituitary    95.08         93.85    6.15     2.75     97.25    98.36          96.05
VGG-19  Meningioma   95.20         92.24    7.76     2.30     97.70    98.35          95.20
VGG-19  Glioma       95.31         92.54    7.46     1.75     98.25    98.24          95.31
VGG-19  No Tumor     98.19         97.53    2.47     1.95     98.05    98.86          98.19

Fig. 5. Diagram for (a) ResNet-50 accuracy and (b) ResNet-50 loss on 40 epochs.

We used 40 epochs and a batch size of 32 for ResNet-50. When ResNet-50 training is completed, we generate the confusion matrix and evaluate each class's performance. Figure 5 shows the accuracy and loss graphs, while Table 4 shows the computed performance.

Table 4. Evaluation appraisal table by class for ResNet-50.

Model      Class        Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
ResNet-50  Pituitary    97.56         96.96    3.04     2.59     97.41    98.15          97.56
ResNet-50  Meningioma   97.79         98.45    1.55     3.01     96.99    97.56          98.00
ResNet-50  Glioma       95.59         95.39    4.61     4.16     95.84    96.63          96.01
ResNet-50  No Tumor     96.09         95.51    4.49     3.24     96.76    97.18          96.34

The trained model is assessed using the validation dataset in the proposed study. The training dataset for our model, which contained both the baseline and augmented images, was used for learning. After that, the model was verified to confirm that it was correct. After being trained on the tumor dataset using all available model architectures, each model's performance was measured on test images. The pre-trained ResNet-50, VGG-16, and VGG-19 model weights were experimented with [14]. This was done to compare our model to other well-known, previously trained transfer learning networks and to examine which trained network is the most appropriate for this dataset. The overall results of the three distinct models, VGG-16, VGG-19, and ResNet-50, are presented in Table 5.

Table 5. Final accuracy table for the computed performance of applied transfer learning.

Model      Accuracy (%)  TPR (%)  FNR (%)  FPR (%)  TNR (%)  Precision (%)  F1 Score (%)
VGG-16     95.59         94.415   5.585    3.0125   96.9975  97.6875        96.01
VGG-19     95.95         94.04    5.96     2.19     97.81    98.45          96.19
ResNet-50  96.76         96.58    3.42     3.25     96.75    97.38          96.978

5 Conclusion
With millions of cells interacting, the brain is one of the body's most complex organs. When a brain tumor grows in the head, the pressure inside the brain rises, causing the brain to be damaged. An intracranial neoplasm, often known as a brain tumor, is a disorder in which abnormal cells grow in the human brain. Using data from Figshare and Kaggle, this study describes the detection performance of transfer learning and deep feature extraction on brain tumor MRI recognition. Deep feature extraction and transfer learning are carried out using three well-known deep CNN architectures: VGG16, VGG19, and ResNet50. The collected dataset was chosen for the experimental work due to its large number of example images. ResNet50 has the highest accuracy of all the models, with a 96.76% detection rate for brain tumor MRI. In the future, we intend to use several CNN architectural approaches to improve detection accuracy.

References
1. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
2. Deepak, S., Ameer, P.M.: Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 111, 103345 (2019)
3. Talo, M., Baloglu, U.B., Yıldırım, Ö., Rajendra Acharya, U.: Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 54, 176–188 (2019)
4. Ahuja, S., Panigrahi, B.K., Gandhi, T.: Transfer learning based brain tumor detection and segmentation using superpixel technique. In: 2020 International Conference on Contemporary Computing and Applications (IC3A), pp. 244–249. IEEE (2020)


5. Khan, M.A., et al.: Multimodal brain tumor classification using deep learning and robust feature selection: a machine learning application for radiologists. Diagnostics 10(8), 565 (2020). https://doi.org/10.3390/diagnostics10080565
6. Kaur, T., Gandhi, T.K.: Deep convolutional neural networks with transfer learning for automated brain image classification. Mach. Vis. Appl. 31(3), 1–16 (2020). https://doi.org/10.1007/s00138-020-01069-2
7. Nickparvar, M.: Brain tumor MRI dataset. Kaggle (2021). https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset. Accessed 24 Mar 2022
8. Mia, J., Bijoy, H.I., Uddin, S., Raza, D.M.: Real-time herb leaves localization and classification using YOLO. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7 (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579718
9. Krishna, R., Menzies, T.: Bellwethers: a baseline method for transfer learning. IEEE Trans. Softw. Eng. 45(11), 1081–1105 (2018)
10. Alippi, C., Disabato, S., Roveri, M.: Moving convolutional neural networks to embedded systems: the AlexNet and VGG-16 case. In: 2018 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 212–223. IEEE (2018)
11. Mateen, M., Wen, J., Song, S., Huang, Z.: Fundus image classification using VGG-19 architecture with PCA and SVD. Symmetry 11(1), 1 (2018)
12. Theckedath, D., Sedamkar, R.R.: Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 1(2), 1–7 (2020)
13. Bitto, A.K., Mahmud, I.: Multi categorical of common eye disease detect using convolutional neural network: a transfer learning approach. Bull. Electr. Eng. Inform. 11(4), 2378–2387 (2022). https://doi.org/10.11591/eei.v11i4.3834
14. Hasan, S., Rabbi, G., Islam, R., Imam Bijoy, H., Hakim, A.: Bangla font recognition using transfer learning method. In: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 57–62 (2022). https://doi.org/10.1109/ICICT54344.2022.9850765

Deep Convolutional Comparison Architecture for Breast Cancer Binary Classification

Nasim Ahmed Roni1(B), Md. Shazzad Hossain2, Musarrat Bintay Hossain3, Md. Iftekharul Alam Efat4, and Mohammad Abu Yousuf5

1 Institute of Information Technology (IIT), Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected]
2 Daffodil International University, DIU Rd, Dhaka 1341, Bangladesh
[email protected]
3 Changsha University of Science and Technology, Hunan, China
4 Institute of Information Technology (IIT), Noakhali Science and Technology University, University Rd, Noakhali 3814, Bangladesh
5 Institute of Information Technology (IIT), Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected]

Abstract. Early detection of breast cancer can significantly improve the prospects of successful recovery and survival, but diagnosis takes a lot of time and frequently leads to disagreement among pathologists. Recently, much research has tried to develop the best breast cancer classification models to help pathologists make more precise diagnoses. Convolutional networks are prominent in biomedical imaging because they discover significant features and automate image processing. Knowing which CNN models are optimal for breast cancer binary classification is therefore crucial. This work proposes an architecture for finding the best CNN model. Inception-V3, ResNet-50, VGG-16, VGG-19, DenseNet-121, DenseNet-169, DenseNet-201, and Xception are analyzed as classifiers in this paper. We examined these deep learning techniques on a breast ultrasound image dataset. Due to limited data, a generative adversarial network is used to improve the algorithms' precision. Several statistical analyses are used to determine the finest convolutional technique for early breast cancer detection using improved images in binary class scenarios. This binary classification experiment evaluates each strategy across various dimensions to determine which aspects improve success. In both normalized and denormalized conditions, Xception maintained 95% accuracy. Xception uses a complete knowledge-digging technique and is highly advanced; its accuracy is therefore considered to be better than that of the others. Keywords: Breast Cancer · Binary Classification · Convolutional Network · Generative Adversarial Network · Data Augmentation · Xception



1 Introduction
Breast cancer is a complicated and multidimensional disease with a wide range of risk factors [1], histological findings, clinical manifestations, and treatment choices [2]. Breast cancer is characterized by the unrestrained growth of malignant cells in the mammary epithelial tissue and is an illness that affects both men and women. Breast cancer is the most common cancer in women worldwide, with a prevalence that rises with age [3]. It is the second largest cause of death for women after lung cancer [4]. Recent medical discoveries have produced new and improved techniques to prevent, diagnose, categorize, and treat breast cancer [5]. One of the most successful ways to automatically detect and diagnose diseases at an early stage is to use a computer-aided diagnosis (CAD) tool for medical imaging. CAD binary classification methods use intelligent approaches to automatically identify breast findings as benign or cancerous. These medical imaging techniques may be useful in early breast cancer diagnosis [6]. As a result, due to the rising importance of predictive diagnosis and therapy, there is a growing trend in cancer detection and classification research to apply machine learning algorithms for projection and prognosis [7]. The use of histopathology data to classify breast abnormalities has grown in popularity in recent years. In classic machine learning-based CAD systems, an independent classifier is built on a set of hand-crafted attributes, so the extraction of features from histopathology images is critical [8]. Because histopathological images include various aspects, it is challenging to determine the particular features due to pathologists' lack of experience in recognizing specific needs. Most of the time, patterns, fractals, colors, and intensity levels can be used to identify images. Manual feature extraction is time-consuming and can require in-depth prior knowledge of the disease in order to find highly representative characteristics [9]. As processing power and massive amounts of data have become more readily available, the number of studies using machine learning has increased. Because of its ability to learn from raw data, machine learning is becoming increasingly popular for dealing with simple and complex situations, and the correct machine learning models can reduce diagnostic error rates [10]. Massive volumes of complex data may now be examined and comprehended using new machine learning techniques [11]. To circumvent the drawbacks of conventional machine learning techniques, deep learning was created to efficiently employ the significant information that may be extracted from raw images for categorization [12]. Deep learning uses general-purpose learning techniques to do away with the necessity for manual feature tuning. Deep learning utilizing convolutional neural networks has recently made many advancements in the field of medical image analysis, including the classification of mitotic cells from microscopic images and the identification of tumors [13]. Convolutional neural networks perform admirably with large data sets but poorly with smaller ones. Several research teams have investigated the application of convolutional networks in medical image processing, with encouraging findings [14]. Convolutional networks have been used to overcome the difficulty of identifying and classifying tumors in ultrasound images [15].
Furthermore, several recent studies evaluated the feasibility of [16]


CNNs, AlexNet, U-Net, VGG16, VGG19, ResNet18, ResNet50, MobileNet-V2, and Xception for the problem of classifying ultrasound images as benign or malignant [17]. The purpose of this article is to compare the performance of pre-trained deep learning models in order to determine which convolutional techniques are best for breast cancer binary classification, to identify the best-performing model from a large number of alternatives, and to propose a mechanism for differentiating them. Eight pre-trained CNN architectures are trained on the ultrasound image dataset. The remaining paper is constructed as follows. Section 2 presents the previous studies on breast cancer classification. Section 3 contains the data preprocessing techniques, feature extraction, and methodology. Section 4 encompasses the dataset description, experimental assessment, and discussion. Finally, Sect. 5 provides the conclusion and future direction.

2 Literature Review
In recent years, numerous breast cancer classification strategies have been proposed, with CNN models receiving much of the research interest [18]. Various machine learning and deep learning algorithms are available for cancer detection and classification. Convolutional Neural Networks, Recurrent Neural Networks, and pre-trained models like AlexNet, GoogLeNet, VGG16, VGG19, ResNet50, InceptionV3, DenseNet121, DenseNet169, DenseNet201, and Xception are some of the most used deep learning approaches for breast cancer classification [19]. Using the idea of transfer learning, SanaUllah et al. developed a CNN-based framework that can recognize and binary-categorize breast cytology images; their proposed framework reached 97.52% accuracy [20]. They also employed data augmentation to expand the size of the data set and improve the efficiency of the CNN algorithms. Gupta et al. use SVM and Logistic Regression to compare deep feature extraction and classification performance [24]; on the standard breast cancer dataset, the proposed study outperforms previous strategies and produces state-of-the-art results. Hong Fang et al. proposed an improved multilayer perceptron for breast cancer detection [21]. In contrast, Saeed Talatian et al. proposed an intelligent ensemble classification method based on a multilayer perceptron neural network (IEC-MLP) with an average accuracy of 98.74% for breast cancer detection [5]. The suggested approach is composed of two components: parameter optimization and ensemble classification. Esraa et al. demonstrated a fully automated breast cancer diagnosis technique with a 99.33% accuracy rate, utilizing a U-Net to determine the breast region from thermal images and a deep learning method to evaluate abnormal breast tissues [22]. Mohammed Abdullah et al. created a DCNN classifier model based on Inception V3 and V4 for breast cancer detection to study the behavior of many modern deep learning techniques for breast cancer diagnosis [23]. The results showed that employing color thermal imaging, DCNN Inception V4, and the updated Inception MV4 considerably increased accuracy and efficiency in detecting breast cancer. Karan Gupta et al. used a deep learning model to automatically categorize breast cancer images, relying on pre-trained CNN activation features fed to traditional classifiers [6].


The difficulty in gathering sufficient positive cases and the problems in developing breast cancer binary classification algorithms make it challenging to tackle overfitting issues, as is the case with many machine learning applications in healthcare [26]. Numerous subsequent papers used generative adversarial networks (GANs) for data augmentation [27]; in this instance, the training dataset can be improved by using a GAN [28]. Shuyue et al. devised a technique for detecting breast cancer using artificial mammograms that included the use of GANs as an image enhancement technique, achieving a validation accuracy of 79.8% [29]. Asha et al. devised a discriminative robustness approach to increase accuracy and improve the classification model's performance [30]. In that paper, they compare the performance of numerous CNN + traditional classifier configurations, such as VGG-16 + SVM, VGG-19 + SVM, Xception + SVM, and ResNet-50 + SVM; according to the researchers, the ResNet-50 network had a maximum accuracy of 93.27%. Aditya et al. suggested a Modified VGG (MVGG) model based on transfer learning to diagnose breast cancer in mammography images [25]. According to the trials, the suggested transfer learning combination of MVGG and ImageNet achieves 94.3% accuracy, and the suggested hybrid network outperforms other convnets. The background investigation revealed that other authors had previously employed several approaches with varying degrees of success. We assessed their work and uncovered several problems, prompting us to take action. Prior studies each examined a few algorithms, but none compared as many as we have. Because convolutional networks outperform previous models, the suggested comparative performance model will aid pathologists in accurately diagnosing breast cancer at an early stage. Additionally, data augmentation approaches are rarely employed in low-data scenarios, which leaves issues such as feature forecasting unaddressed. As a result, this paper gives a comparative analysis of breast cancer binary classification utilizing convolutional networks. We employed eight pre-trained CNN models and the GAN augmentation technique to extract and forecast attributes from ultrasound images for the purpose of classifying benign and malignant lesions.

3 Methodology
The workflow of the Deep Convolutional Comparison Architecture (DCCA) is divided into five components and prepared for binary classification, as described in Fig. 1. The first stage of this suggested comparison approach is data pre-processing: the dataset is prepared, and the data are then augmented using the Generative Adversarial Network (GAN) architecture, completing the pre-processing stage. Ian Goodfellow and his colleagues first proposed the GAN technique in June 2014 [31]. GANs are a data enrichment tool that pits two neural networks against each other to create new, synthetic data instances that can pass for actual data; they are commonly utilized in image, video, and voice generation. The experimental environment was then set up, and networks were employed to prepare the data for classification through feature extraction and selection. The model training phase began once all of the features


were identified and selected, and eight different convolutional techniques were utilized to categorize the data in the experimental setting. Finally, the model assessment step determines the best binary classification result. The complete architecture is displayed in Fig. 1.

Fig. 1. Deep Convolutional Comparison Architecture (DCCA)

3.1 Data Pre-processing Using GAN
The size of the training dataset heavily influences deep learning models' performance. As a result, strategies for increasing dataset cardinality, such as data augmentation, are crucial. By addressing the issue of over-fitting, data augmentation enhances network performance. In this study, GAN data augmentation tactics improve the generalizability of the fitted model. A generator and a discriminator network were built to provide GAN augmentation. A noise vector is sent into the generator as input; it creates augmented data, which are then supplied to the discriminator, along with actual data, to identify which distribution the samples came from. The generator's purpose, on the other hand, is to learn the accurate distribution without seeing it, such that its output is indistinguishable from actual samples. Both networks are trained simultaneously and in opposite directions until an equilibrium is attained. For $x \in \mathbb{R}^d$, $y = p_{data}(x)$ is a mapping from $x$ to actual data $y$ in $d$-dimensional space. A neural network dubbed the generator $G$ models this mapping. Sample $y$ is genuine if it comes from $p_{data}$; sample $z$ is synthetic if it comes from $G$. The discriminator $D$ is a neural network that determines whether or not a sample is genuine; $D(y) = 1$, $D(z) = 0$ represents the ideal situation. $G$ and $D$ are the two neural networks that constitute the GAN. The corresponding loss function of a two-player mini-max game is used to train these adversarial networks:

$$\min_G \max_D V(G, D) = \mathbb{E}\{\log D[p_{data}(x)]\} + \mathbb{E}(\log\{1 - D[G(x)]\}) \quad (1)$$
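To make the objective in Eq. (1) concrete, below is a minimal TensorFlow sketch of one adversarial training step; the generator, discriminator, and optimizers are assumed to be defined elsewhere, and the widely used non-saturating generator loss stands in for the literal log(1 − D(G(x))) term, so this is an illustration of the training scheme rather than the authors' exact code.

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy()  # D outputs values in (0, 1)

@tf.function
def train_step(real_images, generator, discriminator, g_opt, d_opt, noise_dim=100):
    # Noise vector x ~ N(0, I), as described below for this study.
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_scores = discriminator(real_images, training=True)
        fake_scores = discriminator(fake_images, training=True)
        # D maximizes log D(y) + log(1 - D(G(x))): real -> 1, fake -> 0.
        d_loss = (cross_entropy(tf.ones_like(real_scores), real_scores)
                  + cross_entropy(tf.zeros_like(fake_scores), fake_scores))
        # G tries to fool D (non-saturating variant of minimizing log(1 - D(G(x)))).
        g_loss = cross_entropy(tf.ones_like(fake_scores), fake_scores)
    d_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    g_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    return g_loss, d_loss
```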


For $G(x) = p_{data}(x)$, there is a global optimal solution to this min-max problem: the goal is to recover the distribution of the real data. When $D(y) = D(z) = 0.5$, the discriminator $D$ can no longer tell the difference between a genuine and a manufactured sample. By adjusting the input $x$, $G$ can be used to make artificial samples. The input $x$ for $G$ in this investigation was a noise vector with 100 attributes drawn from a Gaussian distribution $N(0, 1)$. It is vital for a well-trained GAN to be able to create data samples that appear real from noise vectors. With the exception of the output layer, the generator network was designed with four up-sampling layers and five convolutional layers. The ReLU activation function is used in every layer, whereas the tanh function is used at the output layer. The generator's purpose is to create a 229 × 229 × 3 picture from a 100-length vector. In contrast, the discriminator takes a 229 × 229 × 3 picture as input and outputs a value between 0 and 1, indicating whether the image is augmented or not. The discriminator network is constructed with four conv layers with max-pooling layers and one fully connected layer, just like a standard CNN. Figure 2 shows a sample output of augmented benign and malignant class pictures after processing the data via this GAN network.

3.2 Feature Extraction, Selection and Classification
In this study, eight different deep convolutional methods are used to extract, select, and classify features, letting the input image flow forward until it reaches a pre-specified layer and then using that layer's outputs as the resulting features. The pre-trained network acts as an arbitrary feature extractor and selector.
DenseNet: This is a cutting-edge CNN model for visual object recognition that requires fewer parameters to attain state-of-the-art performance. DenseNet is very similar to ResNet, except that DenseNet concatenates (.) the preceding layer's output with a subsequent layer, whereas ResNet uses an additive operation (+) to integrate the result of prior layers with the output of subsequent layers. The DenseNet architecture seeks to solve this problem by densely connecting all layers. The DenseNet-121, DenseNet-169, and DenseNet-201 architectures were used in this study.
VGG: The VGG Net architecture allows for exceptional accuracy. The Visual Geometry Group architecture is categorized into six variants, and the architecture includes repeated convolution and pooling layers. VGG-19 Net has 19 layers, including 16 conv and three fully connected (FC) layers, whereas VGG-16 Net has just 13 conv layers and three FC layers. According to the deep-structure VGG-Net, the network's depth is essential for good performance.
Xception: The Xception architecture uses a depth-wise separable sequential array of convolution layers with residual blocks. The purpose of depth-wise separable convolution is to cut down on processing time and memory usage. The 36 conv layers of Xception are divided into 14 components. In Xception, separable convolution helps in the resolution of issues such as fading gradients and representational bottlenecks. A channel in the sequential network separates channel-wise and space-wise feature learning.


Instead of being concatenated, this shortcut connection uses a summing operation so that the outcome of the previous layer can be used as an input to the final layer.
Inception V3: There are 42 layers in the Inception module. The InceptionV3 module from Google Brain comprises 159 layers and is the third iteration of the Inception module. The Inception module's main idea is to mix small and big kernels to learn multi-scale representations while keeping the computational cost and parameter count to a minimum.
ResNet-50: The idea of a residual block was introduced by ResNet, a deep residual learning network. The first block's input is connected to the second block's output through residual blocks. This strategy enables the residual block to acquire knowledge about the residual function without inflating the parameters. A conv layer, 48 residual blocks with small 1×1 and 3×3 filters, and a classifier layer make up the 50-layer residual network known as ResNet50.
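As an illustration of how the eight classifiers compared in this section can be instantiated, the sketch below builds each backbone from keras.applications with a small binary head. The 229 × 229 × 3 input follows the GAN output size mentioned above, while the frozen base and single-unit sigmoid head are illustrative assumptions, not the authors' exact configuration.

```python
# Instantiate any of the eight pre-trained backbones with a binary head.
from tensorflow.keras import applications, layers, models

BACKBONES = {
    "VGG-16": applications.VGG16,
    "VGG-19": applications.VGG19,
    "DenseNet-121": applications.DenseNet121,
    "DenseNet-169": applications.DenseNet169,
    "DenseNet-201": applications.DenseNet201,
    "Xception": applications.Xception,
    "Inception-V3": applications.InceptionV3,
    "ResNet-50": applications.ResNet50,
}

def build_classifier(name, input_shape=(229, 229, 3)):
    base = BACKBONES[name](weights="imagenet", include_top=False,
                           input_shape=input_shape)
    base.trainable = False  # use the backbone as a fixed feature extractor
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # benign vs. malignant
    ])

model = build_classifier("Xception")
```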

4 Result Analysis and Discussion This study employed several statistical criteria to evaluate the techniques, including accuracy, precision, recall, and F1 score. A classification confusion matrix is available for both normalized and non-normalized data. To aid comprehension, graphs of model accuracy and loss function are provided.

Fig. 2. Augmented Data in Various Epochs

4.1 Dataset For this investigation, we used the Breast Ultrasound Image Dataset, which was also used for binary classification. The Breast Ultrasound Image Dataset is a freely accessible dataset found on Kaggle [32]. Walid et al. constructed the dataset in 2018 [33]. At the


baseline, the original dataset had 780 images, comprising ultrasound images of women between the ages of 25 and 75. The classification is based on groups of 306 benign and 294 malignant images, respectively. The data augmented at various epochs using the GAN augmentation approach is shown in Fig. 2.

4.2 Experimental Setup
In this experiment, data passed through eight algorithms to extract and select features, and the retrieved features were then used to train the convolutional models. Before the feature extraction and selection stages, as well as the classification phase, some experimental setup is done to ensure that the pipeline runs smoothly. Finally, we assess the experimental findings and choose the model that best fits the data.
a. Workstation: The work is done on the Google Colaboratory virtual environment.
b. Packages and Libraries: The following were used in this experiment:
– NumPy version 1.21.5
– TensorFlow version 2.8.0
– Colab GPU (NVIDIA Tesla K80)
– Keras version 2.x

4.3 Classification Result and Confusion Matrix
Data augmentation with GAN produced 1200 images in total, of which 600 were tumorous (benign + malignant) and 600 were normal, for use in model training in this study. We split the data into an 80:20 train-test ratio. The results are based on 240 test images that were not used during the training period. Table 1 summarizes the accuracy, precision, recall, and F1 score of the gathered findings for the binary classification. Table 1 shows that DenseNet-121 and DenseNet-169 both had an accuracy of 92%, but Xception had the highest accuracy of 95% along with the highest precision of 97%. DenseNet-121, on the other hand, earns the highest recall of 98%, and these models all reach a 95% F1 score. The same analysis is performed for the malignant class: while the accuracy for DenseNet-121, DenseNet-169, and Xception is still 92%, the precision is now 90% for DenseNet-121, with an 88% F1 score, and 84% for Xception. In this case study, the confusion matrix is vital to emphasize, since it shows the fundamental importance of data processing strategies. Figure 3 shows the performance evaluation of the classification results for normalized data; the correctly classified counts of each model are greater on normalized data and lower on non-normalized data. Digging a little deeper, we can observe that in each case Xception, DenseNet-169, DenseNet-121, and VGG-16 determined the real outcome with a high degree of accuracy.

4.4 Model Accuracy, Loss Function, and ROC Graph Analysis
To understand the presented models' performance, graphs are crucial to complete the result analysis. The Xception model's consistency in both the training and testing phases is considerably superior to the others, as seen in Fig. 4, where DenseNet-121 and DenseNet-169 have


Table 1. Results of Binary Classification

Model          Accuracy  Precision  Recall  F1 Score
VGG-16         0.91      0.91       0.98    0.94
VGG-19         0.88      0.91       0.95    0.93
DenseNet-121   0.92      0.92       0.98    0.95
DenseNet-169   0.92      0.94       0.96    0.95
DenseNet-201   0.81      0.96       0.78    0.86
Xception       0.95      0.97       0.93    0.95
ResNet-50      0.79      0.90       0.82    0.86
Inception-V3   0.88      0.91       0.95    0.93

higher training accuracy but lower testing accuracy. In this scenario, the performance of the VGG-16 is also considerable. In Fig. 5, it can be seen that while employing the categorical loss function, various abnormalities in the loss function graph occurred, with the greatest consistency of loss in VGG-16 and Xception but larger swings in the other cases. Finally, the ROC graph analysis in Fig. 6 completes the picture. This case study revealed that the Xception model's ROC is the highest, implying that the Xception model is more resilient than the other presented models in this binary classification case study. It held the peak position in each dimension and produced the best-performing model.

4.5 Discussion
In this binary classification scenario, the Xception model outperformed the DenseNet-121, DenseNet-169, and VGG-16 models, its closest competitors. Furthermore, in dealing with the issue of classification, it is vital to note that the balance between convolutional layers and residual connections is critical. The detachable convolution channel of the sequence network in Xception combines channel-wise and space-wise feature learning to aid in the resolution of difficulties like fading gradients and representational limits. Instead of concatenating, the alternative


model focuses on attribute management, deep structural analysis, computational cost, and parameter explosion.

Fig. 3. Confusion Matrix of Data with Normalization

Fig. 4. Model Accuracy Graphs of Presented Model

The DenseNet design may maximize the residual mechanism by having each layer tightly connected to the ones below it, based on the layering activity of these models. The model's compactness makes it non-redundant, because the learned features are shared through community knowledge. Densely linked deep networks are trained with implicit deep supervision using convolutions, average pooling, max pooling, dropouts, and fully connected layers, allowing the gradient to flow back more quickly thanks to the short connections. Convolutions, average pooling, max pooling, dropouts, and fully connected layers,


Fig. 5. Model Loss Graphs of Presented Model

Fig. 6. ROC Graphs of Presented Model

on the other hand, are symmetric and asymmetric building components in the Inception model. It primarily uses factorized convolutions to reduce the number of connections and parameters to train, resulting in faster processing and better results; as a result, it serves as an optimized image-classification booster. ResNet works similarly, layering all of these blocks exceedingly deeply and regularly. In addition, the VGG approach uses


both stacked convolutional and fully connected layers, resulting in a computationally costly strategy with a lower error rate than the others. Xception, an extreme version of Inception using depth-wise separable convolution, is even better than Inception V3. In Xception, depth-wise convolution is a channel-wise n × n spatial convolution, and point-wise convolution is a 1 × 1 convolution that changes the dimension. Xception employs the most detailed and in-depth knowledge-digging technique to extract explicit features from an image. Its residual network aids in achieving an optimal learning rate, making it the most efficient and best-fitting of all the convolutional models presented. Finally, despite the model's strength, it can reasonably be said that it is rather lightweight. Here, the emphasis is on efficiency in data training. It is bias-free and has reached a new level using transfer learning and GAN data augmentation approaches. Earlier studies sometimes report higher accuracy than ours; however, the weaknesses of those models frequently stem from the use of typical machine learning models, biases, and simple data augmentation methods like shifting and rotating, and other factors may also make those models less transparent. As a result, even though our model is not extremely accurate, it is still better than the others.
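A small sketch of the parameter saving behind this depth-wise separable design is shown below, comparing a standard convolution with Keras' SeparableConv2D (channel-wise n × n spatial filtering followed by a 1 × 1 point-wise convolution); the layer sizes are illustrative.

```python
# Compare parameter counts: standard Conv2D vs. depth-wise separable conv.
from tensorflow.keras import layers, models

standard = models.Sequential([
    layers.Input((64, 64, 128)),
    layers.Conv2D(256, 3, padding="same"),           # full 3x3 x 128 x 256 kernel
])
separable = models.Sequential([
    layers.Input((64, 64, 128)),
    layers.SeparableConv2D(256, 3, padding="same"),  # depth-wise 3x3 + point-wise 1x1
])

# ~295k parameters for the standard layer vs. ~34k for the separable one.
print(standard.count_params(), separable.count_params())
```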

5 Conclusion and Future Work
This study compared eight different convolutional models to discover which one outperformed the rest across all case studies. The fittest model was determined by comparing each model's performance on normalized and de-normalized forms of an ultrasound image dataset, with the Xception model outperforming in each dimensional case study. The model is the most appropriate and consistent for the binary categorization of breast cancer. The use of the GAN architecture to pre-process the dataset enhanced the performance of each model, implying that data processing approaches are helpful in reaching the intended result. This also aids in determining which convolutional model performs best in binary breast cancer classification scenarios with fewer images. In addition, working with biomedical data is difficult due to its scarcity, and most envisioned work necessitates the use of augmented data and data pre-processing terminology to extract the appropriate parameters. Overall, the contribution is the idea of using deep learning methods rather than hybrid methods, together with advanced data pre-processing techniques that can generate forecast images to teach models with different dimensional parameters, select suitable features to train deep learning models, and enrich the transfer learning process, thereby taking biomedical imaging ideas to multidimensional or correlational grounds. In the future, we will quantify breast cancer severity to develop a system that represents an automated version of the BI-RADS severity measurement scale.

References
1. Polyak, K.: Heterogeneity in breast cancer. J. Clin. Investig. 121(10), 3786–3788 (2011). https://doi.org/10.1172/JCI60534


2. Lopez-Garcia, M.A., Geyer, F.C., Lacroix-Triki, M., Marchió, C., Reis-Filho, J.S.: Breast cancer precursors revisited: molecular features and progression pathways. Histopathology 57(2), 171–192 (2010). https://doi.org/10.1111/j.1365-2559.2010.03568.x
3. Lukong, K.E.: Understanding breast cancer – the long and winding road. BBA Clin. 7, 64–77 (2017)
4. Zardavas, D., Irrthum, A., Swanton, C., Piccart, M.: Clinical management of breast cancer heterogeneity. Nat. Rev. Clin. Oncol. 12(7), 381–394 (2015)
5. Talatian Azad, S., Ahmadi, G., Rezaeipanah, A.: An intelligent ensemble classification method based on multilayer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. J. Exp. Theor. Artif. Intell. 1–21 (2021)
6. Gupta, V., Vasudev, M., Doegar, A., Sambyal, N.: Breast cancer detection from histopathology images using modified residual neural networks. Biocybern. Biomed. Eng. 41(4), 1272–1287 (2021)
7. Makki, J.: Diversity of breast carcinoma: histological subtypes and clinical relevance. Clin. Med. Insights Pathol. 8, CPath-S31563 (2015)
8. Sanchez-Morillo, D., González, J., García-Rojo, M., Ortega, J.: Classification of breast cancer histopathological images using KAZE features. In: Rojas, I., Ortuño, F. (eds.) Bioinformatics and Biomedical Engineering. IWBBIO 2018. LNCS, vol. 10814, pp. 276–286. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78759-6_26
9. Miah, M.B.A., Yousuf, M.A.: Detection of lung cancer from CT image using image processing and neural network. In: 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), pp. 1–6. IEEE, May 2015
10. Xie, J., Liu, R., Luttrell IV, J., Zhang, C.: Deep learning based analysis of histopathological images of breast cancer. Front. Genet. 10, 80 (2019)
11. Huma, F., Jahan, M., Rashid, I.B., Yousuf, M.A.: Wavelet and LSB-based encrypted watermarking approach to hide patient's information in medical image. In: Uddin, M.S., Bansal, J.C. (eds.) Proceedings of International Joint Conference on Advances in Computational Intelligence. Algorithms for Intelligent Systems, pp. 89–104. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0586-4_8
12. Faruqui, N., Yousuf, M.A., Whaiduzzaman, M., Azad, A.K.M., Barros, A., Moni, M.A.: LungNet: a hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data. Comput. Biol. Med. 139, 104961 (2021)
13. Shin, H.C., et al.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016)
14. Khatun, M.A., Yousuf, M.A.: Human activity recognition using smartphone sensor based on selective classifiers. In: 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), pp. 1–6. IEEE, December 2020
15. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
16. Chen, Y., Jiang, H., Li, C., Jia, X., Ghamisi, P.: Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 54(10), 6232–6251 (2016)
17. Gómez-Flores, W., De Albuquerque Pereira, W.C.: A comparative study of pre-trained convolutional neural networks for semantic segmentation of breast tumors in ultrasound. Comput. Biol. Med. 126, 104036 (2020)
18. Daoud, M.I., Abdel-Rahman, S., Alazrai, R.: Breast ultrasound image classification using a pre-trained convolutional neural network. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 167–171. IEEE, November 2019


19. Khuriwal, N., Mishra, N.: Breast cancer detection from histopathological images using deep learning. In: 2018 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE), pp. 1–4. IEEE, November 2018
20. Khan, S., Islam, N., Jan, Z., Din, I.U., Rodrigues, J.J.C.: A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recogn. Lett. 125, 1–6 (2019)
21. Fang, H., Fan, H., Lin, S., Qing, Z., Sheykhahmad, F.R.: Automatic breast cancer detection based on optimized neural network using whale optimization algorithm. Int. J. Imaging Syst. Technol. 31(1), 425–438 (2021)
22. Mohamed, E.A., Rashed, E.A., Gaber, T., Karam, O.: Deep learning model for fully automated breast cancer detection system from thermograms. PLoS ONE 17(1), e0262349 (2022)
23. Al Husaini, M.A.S., Habaebi, M.H., Gunawan, T.S., Islam, M.R., Elsheikh, E.A., Suliman, F.M.: Thermal-based early breast cancer detection using Inception V3, Inception V4 and modified Inception MV4. Neural Comput. Appl. 34(1), 333–348 (2022)
24. Gupta, K., Chawla, N.: Analysis of histopathological images for prediction of breast cancer using traditional classifiers with pre-trained CNN. Procedia Comput. Sci. 167, 878–889 (2020)
25. Khamparia, A., et al.: Diagnosis of breast cancer based on modern mammography using hybrid transfer learning. Multidimension. Syst. Signal Process. 32(2), 747–765 (2021)
26. Wu, E., Wu, K., Lotter, W.: Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms. arXiv preprint arXiv:2006.00086 (2020)
27. Sohan, K., Yousuf, M.A.: 3D bone shape reconstruction from 2D X-ray images using MED generative adversarial network. In: 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), pp. 53–58. IEEE, November 2020
28. Peng, X., Tang, Z., Yang, F., Feris, R.S., Metaxas, D.: Jointly optimize data augmentation and network training: adversarial data augmentation in human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2226–2234 (2018)
29. Guan, S., Loew, M.: Breast cancer detection using synthetic mammograms from generative adversarial networks in convolutional neural networks. J. Med. Imaging 6(3), 031411 (2019)
30. Pang, T., Wong, J.H.D., Ng, W.L., Chan, C.S.: Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification. Comput. Methods Programs Biomed. 203, 106018 (2021)
31. Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 (2014)
32. Breast Ultrasound Image Dataset. https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset
33. Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data Brief 28, 104863 (2020)

Lung Cancer Detection from Histopathological Images Using Deep Learning

Rahul Deb Mohalder1(B), Khandkar Asif Hossain2, Juliet Polok Sarkar2, Laboni Paul1, M. Raihan2, and Kamrul Hasan Talukder1

1 Khulna University, Khulna, Bangladesh
{rahul,khtalukder}@ku.ac.bd, [email protected]
2 North Western University, Khulna, Bangladesh

Abstract. Computed tomography (CT) is critical for identifying tumors and detecting lung cancer. Following recent practice, we wished to incorporate a well-trained deep learning algorithm to recognize and categorize lung nodules based on clinical CT imagery. This investigation used open-source datasets and data from multiple centers. Deep learning is a widely used and powerful technique for pattern recognition and categorization. However, because large datasets of medical images are not always accessible, there are few deep structured applications in diagnostic medical imaging. In this research, a deep learning model was created to identify lung tumors from histopathological images. Our proposed Deep Learning (DL) model achieved 95% accuracy with a loss of 0.158073.

Keywords: Deep Learning · Classification · Lung Cancer · Tumor · Detection

1 Introduction

Accounting for 18% of all cancer-related deaths, lung cancer is the most common cause of cancer death. Death rates for other cancers are 9.4% for colon cancer, 8.3% for liver cancer, 7.7% for stomach cancer, and 6.8% for female breast cancer. There is a broad spectrum of disease progression and therapy response among lung cancer patients. Additionally, according to the European Medical Association (EMA) and the World Health Organization (WHO), lung cancer is becoming the biggest cause of mortality in the United States (US) and Europe, surpassing heart disease [16]. As a result, accurate and timely diagnosis is critical for deciding on and applying each lung cancer patient's treatment [27]. Multimodality is a prominent lung cancer therapy technique. However, the current survival rate for cancer patients ranges from 4 to 17%. In the early stages of lung cancer,


Fig. 1. Lung cancer [1].

a successful resection can be curative. Non-small cell lung cancer (NSCLC) patients undergoing resection had a 5-year survival rate of 75%–100% for stage IA NSCLC, but only 25% for stage IIIA NSCLC. However, confirming its pathological status via biopsy is difficult, particularly for tiny tumors in their early stages. This constraint may negatively impact clinical decision-making and management. Increasing use of computed tomography (CT), which enables extensive surgical dissection, has led to an increase in the detection of NSCLC at an early stage (Fig. 1). In a sizable screening population, the National Lung Screening Trial (NLST) compared low-dose computed tomography (LDCT) to chest radiography and found that LDCT reduced lung cancer-specific mortality by 20%. Conventional CT analysis is time-consuming and requires radiologist approval. CT-based lung cancer screening sometimes gives false-positive results. Due to CNNs' accuracy and low dependency on human participation in other computer vision applications, interest in pulmonary nodule recognition and classification has surged in recent years. Cancer treatment requires microscopic diagnosis. Pathologists must find minute histopathological features in complex tumor tissue. This approach is time-consuming and critical, and it results in considerable variation between and among observers [4,7]. The most often used staining method is hematoxylin and eosin (H&E) [2]. As technology progresses, H&E-stained whole slide imaging (WSI) is becoming a regular medical practice, resulting in a massive volume of high-resolution pathology images. Digital pathology is currently encountering a bottleneck as a result of the limited capacity of histopathology and pathological image processing methods. Historically, medical treatment has depended on symptom analysis. In other words, a patient's symptoms are initially looked into, and if necessary, they are referred for a more in-depth assessment. Current


usage of the term precision medicine reflects the challenge posed by the vast but fragmented nature of biological data. To address this, patients' digital data is stored in shareable online databases and patient-centric appointments are employed [16]. Recent breakthroughs in Deep Learning (DL) and Deep Neural Networks (DNN) have improved the technology of image processing and object recognition from images. We may use a DNN to search for or match objects in an image and determine whether or not they are recognized. Additionally, when examining a photograph, we might look for numerous patterns. Frequently, a preset dataset is required to train the Neural Network (NN), through which the network learns to detect, recognize, and categorize images. DNNs can be used to extract features and categorize images in different kinds of diagnosis applications. In this research, we propose a learning model to detect lung cancer from a histopathological image dataset. The main objective of this research is to enhance the performance of the Deep Learning model so that it identifies lung cancer efficiently. The rest of the paper is organized as follows: Sect. 2 reviews previous lung cancer works; Sect. 3 describes the lung cancer dataset used in our experiment; Sect. 4 presents our proposed system and working procedure; Sect. 5 analyzes our results and compares our outcomes with others; finally, Sect. 6 summarizes this work.

2 Related Works

Jiang et al. [17] created two-dimensional CNN architectures for tumor or spot detection. They used images with vascular characteristics. Setio et al. [26] employed 2D CNN architectures to locate lung cancer; CT scans with several planar perspectives were used as training data. Several lung nodule identification techniques were combined using CNNs to improve the result [29]. Dou et al. [10] proposed a combined 3D CNN architecture to reduce false positives in lung cancer diagnosis. Ding et al. [9] employed two steps to identify lung nodules: to begin, Faster R-CNN was enhanced with a deconvolutional structure to find lung candidates. Anirudh et al. [3] trained a 3D CNN by extending a 3D region from a single voxel point supplied by weakly labeled input. Artificial intelligence (AI) has recently demonstrated exceptional effectiveness in medical data processing and analysis due to the rapid advancement of DL methods, which have demonstrated an increasing capacity to handle difficult real-life problems [13,19,20]. Advanced deep learning algorithms may assist pathologists with tough diagnostic issues. Ivanov et al. projected the DNN layers onto a dynamic image and found that the layer count had an effect on image processing [15]. Notably, the authors of [11] used a modified AlexNet; by using an auto-encoder, they were able to boost the success rate of the DNN's learned features to 90.1%. Using histopathology data, Mohalder et al. proposed a deep learning approach for precisely identifying and categorizing lung cancer levels. With 15,000 lung cancer histology images used for training and validation, they were able to reach a prediction accuracy of 99.80% [22].


Chen et al. [6] made excellent use of machine learning technologies in 2017 to forecast chronic sickness epidemics in disease-prone communities. Zhang et al. [28] devised a fast-Fourier-transformation-coupled machine learning strategy for forecasting short-term sickness risk and generating suitable recommendations on clinical test requirements in chronic heart disease patients. That model was a combination of ANN, LS-SVM, and NB. A different study group [23] proposed that clustering, noise reduction, and prediction algorithms be used to develop an analytical framework for forecasting disease; CART was used to generate fuzzy rules. Kotsavasiloglou and colleagues [18] developed a system that can classify unknown patients based on their line-drawing ability using an advanced machine learning methodology. Sedaghat et al. [25] developed a two-step technique for improving the outputs of sequence-based prediction models. The first phase utilized consensus learning, while the second phase utilized SVM (unary and binary) to recognize the evaluated connections in the gene regulatory network that depend on gene binding and network features.

3 Dataset

Borkowski et al. published a dataset on colon and lung cancer in 2019 [5]. It contains 25,000 histopathological images, with 5,000 images in each class, and is more recent than comparable datasets. There are five types of tissue data, covering lung cancer and colon cancer. The original size of the images was 1024 × 768 px, but they were published as cropped images of 768 × 768 px. We used only the lung cancer portion of the dataset for this research. Figure 2 shows three sample images from the lung cancer dataset used in this work. Table 1 presents the class name and id assigned to the lung cancer histopathological images. Table 1. Assigned class name and id of LC25000 dataset.

Cancer Type                Class Name   Class ID   # of Images
Adenocarcinoma             lung aca     0          5000
Benign Tissue              lung n       1          5000
Squamous Cell Carcinoma    lung scc     2          5000

4 Methodology

This section discusses the approaches utilized to develop our model and increase the accuracy of its predictions. Figure 3 illustrates an architectural overview of our proposed system's workflow.


Fig. 2. Images of 2(a) lung adenocarcinoma (lung aca), 2(b) lung benign tissue (lung n), and 2(c) lung squamous cell carcinoma (lung scc) from LC25000 dataset.

4.1 Collection and Analysis of Data

We used the LC25000 [5] dataset for this study. It contains five types of histopathology imaging data covering colon and lung cancer. We used only the lung cancer portion: 5000 images of lung adenocarcinoma, 5000 of squamous cell carcinoma, and 5000 of benign tissue.

4.2 Data Preprocessing

As a result of the numerous irregularities and poor pixel quality of the acquired images, the predicted images of lung cancer are less accurate. The quality of the lung images was enhanced using a pixel intensity analysis method that influences how image pixels are seen. Both the unreliable and the noisy pixels were eliminated by continuous pixel modification.


Fig. 3. Architectural overview of our proposed system.

Histogram algorithms were frequently used to improve image quality because they are versatile and easy to implement. We also classified images into two groups by the classification process, which ensured that there was no mixed or noisy image in each class. In the data preprocessing step we converted all histopathological data from RGB to HSV. By converting from RGB to HSV we obtained precise information about the cancerous area and the affected level from the histopathological data.
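To make the color-space step concrete, here is a minimal sketch of the RGB-to-HSV conversion using OpenCV; the file name and any downstream thresholding are illustrative assumptions, not the authors' exact pipeline.

```python
# Hypothetical sketch of the RGB-to-HSV preprocessing step with OpenCV.
import cv2

def to_hsv(image_path: str):
    """Load a histopathology image and convert it to the HSV color space."""
    bgr = cv2.imread(image_path)                # OpenCV loads images as BGR
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

hsv = to_hsv("lung_aca_sample.jpeg")  # placeholder file name
print(hsv.shape)  # e.g. (768, 768, 3): hue, saturation, value channels
```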

4.3 Deep Neural Network

A DNN or DL model is a combination of input, hidden, and output layers. Before constructing our DL model, we first split our dataset into two groups: one for training and one for testing. To train the DL model we used 80% of the total data and the remaining 20% for testing. These components operate similarly to human brains and can be trained like any other machine learning algorithm. We created a neural network model with four layers and employed the ReLU and Softmax activation functions in our model. Layer type, output shape, parameters, and the total number of parameters are shown in Fig. 4. In Fig. 5 we show our proposed DL model structure with neurons and activation functions, and Fig. 6 shows a 3D structural overview of our DL model's working procedure. Conv2D, MaxPooling2D, dropout, flatten, and dense layers are all depicted in this graphic using various colors.
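A minimal Keras sketch of this kind of small CNN follows. The exact filter counts, kernel sizes, dropout rate, and the reduced input resolution are our assumptions; the paper does not publish the full layer configuration.

```python
# Sketch of an 80/20 split plus a small Conv2D/MaxPooling2D/Dropout/Flatten/Dense
# network with ReLU and Softmax, in the spirit of the model described above.
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

# images, labels stand in for the loaded HSV arrays and class ids (0, 1, 2)
# X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # lung aca, lung n, lung scc
])
model.summary()
```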

4.4 Analysis and Visualization

Through this process we evaluated and visualized the findings of the experiments. To assess the model, accuracy, precision, recall, and F1-score outcomes were computed.
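A hedged sketch of how such per-class metrics can be computed with scikit-learn follows; the label arrays are placeholders, not the paper's predictions.

```python
# Computing accuracy, precision, recall, and F1-score per class.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 1, 2, 2, 1, 0]   # placeholder ground-truth class ids
y_pred = [0, 1, 2, 0, 1, 0]   # placeholder predicted class ids
print(classification_report(y_true, y_pred,
                            target_names=["lung aca", "lung n", "lung scc"]))
print(confusion_matrix(y_true, y_pred))
```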


Fig. 4. Our proposed DL model’s structure.

Fig. 5. DL model’s structure with neuron and activation function.


Fig. 6. 3D view of our DL model.

Fig. 7. Training and validation accuracy-loss curve.


Fig. 8. Confusion matrix of proposed DL model.

Table 2. Measure the performance of our DL model.

              Precision   Recall   F1-Score   Support
lung aca      0.95        0.91     0.93       1000
lung n        1.00        1.00     1.00       1000
lung scc      0.91        0.96     0.93       1000
accuracy                           0.95       3000
macro avg     0.95        0.95     0.95       3000
weighted avg  0.95        0.95     0.95       3000

5 Result and Discussions

Only the lung cancer portion of the LC25000 dataset was used in this analysis. It contains 15,000 histopathological images divided into three classes, i.e., three types of lung cancer image. All of the images were the same size (768 × 768 px). A comet-tail graph was used to evaluate the picture intensity analysis outcomes. In the image preprocessing stage we processed our images by converting from RGB to HSV, because the raw RGB images lack the crucial information needed to forecast lung cancer.

Table 3. Comparisons with previous works.

Reference                  Algorithm                                 Accuracy (%)
Chen et al. [6]            CNN based Multimodal Prediction Model     94.8
Da Nóbrega et al. [8]      NB, MLP, SVM, KNN, RF                     93.19
Ding et al. [9]            Deep CNN Model                            94.60
Gao et al. [12]            Improvement of Image Classification       94.00
Gunaydin et al. [14]       PCA, KNN, SVM, NB, DT, and ANN            93.24
Mehmood et al. [21]        Transfer Learning Method                  89.00
Phankokkruad et al. [24]   VGG16, ResNet50V2, and DenseNet201        90.00
Our Model                  Deep Learning Model                       95.00

After converting from RGB to HSV we obtain black-and-white images in which the black regions indicate affected areas. We considered only the affected areas in our calculation; the other regions were not used because they do not carry any vital information. Following this, we divided our dataset into train and test groups. To train our DL model we set the number of epochs to 10, with 375 batches per epoch. For model learning we used a dynamic learning rate, and the Adam optimizer was used for model optimization. The total number of trainable parameters was 3,314,275. The training and validation process completed successfully in 2 h 56 min 24 s. We obtained 95% accuracy with a loss of 0.158073 from our proposed DL model. Figure 7 illustrates the accuracy and loss curves for the training and validation process. We analyzed the precision, recall, and F1-score values of our model; Table 2 shows the performance and Fig. 8 shows the confusion matrix of our DL model. We compared our accuracy with other researchers' works; Table 3 illustrates the comparison of our model's accuracy with the accuracy of other researchers' proposed models. Most researchers used big and complex deep learning models for their detection and prediction tasks, but for our prediction task we tried to keep our model simple and compact. For that reason our model achieved strong accuracy within a short amount of time.
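A sketch of the stated training configuration follows (10 epochs, 375 batches per epoch, Adam). The paper does not say how the "dynamic learning rate" was realized, so the ReduceLROnPlateau callback below is our assumption, as are the dataset object names.

```python
# Hedged sketch of the reported training setup; model, train_ds, and val_ds
# are placeholders for the compiled CNN and the prepared tf.data pipelines.
from tensorflow.keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=10,
                    steps_per_epoch=375,
                    callbacks=[reduce_lr])
```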

6 Conclusion

This study aims to categorize the severity of lung cancer and to identify malignant lung nodules in an input lung image. The location of malignant lung nodules is identified in this study utilizing ground-breaking deep learning techniques. In this scenario, traits are classified using deep learning. Future research will concentrate on enhancing the performance of pulmonary nodule classification and optimizing the proposed model. Additional work will be done to grade the images in accordance with the malignancy of the pulmonary nodules, which is crucial for the practical applications of diagnosing and treating lung cancer.


References

1. Lung cancer. www.verywellhealth.com/lung-cancer-overview-4581940
2. Alturkistani, H.A., Tashkandi, F.M., Mohammedsaleh, Z.M.: Histological stains: a literature review and case study. Global J. Health Sci. 8(3), 72 (2016)
3. Anirudh, R., Thiagarajan, J.J., Bremer, T., Kim, H.: Lung nodule detection using 3D convolutional neural networks trained on weakly labeled data. In: Medical Imaging 2016: Computer-Aided Diagnosis, vol. 9785, p. 978532. International Society for Optics and Photonics (2016)
4. Van den Bent, M.J.: Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective. Acta Neuropathol. 120(3), 297–304 (2010)
5. Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A., Mastorides, S.M.: Lung and colon cancer histopathological image dataset (LC25000). arXiv preprint arXiv:1912.12142 (2019)
6. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017)
7. Cooper, L.A., Kong, J., Gutman, D.A., Dunn, W.D., Nalisnik, M., Brat, D.J.: Novel genotype-phenotype associations in human cancers enabled by advanced molecular platforms and computational analysis of whole slide images. Lab. Invest. 95(4), 366–376 (2015)
8. Da Nóbrega, R.V.M., Peixoto, S.A., da Silva, S.P.P., Rebouças Filho, P.P.: Lung nodule classification via deep transfer learning in CT lung images. In: 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 244–249. IEEE (2018)
9. Ding, J., Li, A., Hu, Z., Wang, L.: Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 559–567. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_64
10. Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P.A.: Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection. IEEE Trans. Biomed. Eng. 64(7), 1558–1567 (2016)
11. Gao, F., Huang, T., Wang, J., Sun, J., Yang, E., Hussain, A.: Combining deep convolutional neural network and SVM to SAR image target recognition. In: 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1082–1085. IEEE (2017)
12. Gao, X., et al.: Improvement of image classification by multiple optical scattering. IEEE Photonics J. 13(5), 1–5 (2021). https://doi.org/10.1109/JPHOT.2021.3109016
13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
14. Günaydin, Ö., Günay, M., Şengel, Ö.: Comparison of lung cancer detection algorithms. In: 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), pp. 1–4. IEEE (2019)
15. Ivanov, A., Zhilenkov, A.: The prospects of use of deep learning neural networks in problems of dynamic images recognition. In: 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), pp. 886–889. IEEE (2018)

16. Jakimovski, G., Davcev, D.: Using double convolution neural network for lung cancer stage detection. Appl. Sci. 9(3), 427 (2019)
17. Jiang, H., Ma, H., Qian, W., Gao, M., Li, Y.: An automatic detection system of lung nodule based on multigroup patch-based deep learning network. IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2017)
18. Kotsavasiloglou, C., Kostikis, N., Hristu-Varsakelis, D., Arnaoutoglou, M.: Machine learning-based classification of simple drawing movements in Parkinson's disease. Biomed. Signal Process. Control 31, 174–180 (2017)
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
20. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
21. Mehmood, S., et al.: Malignancy detection in lung and colon histopathology images using transfer learning with class selective image processing. IEEE Access 10, 25657–25668 (2022). https://doi.org/10.1109/ACCESS.2022.3150924
22. Mohalder, R.D., Sarkar, J.P., Hossain, K.A., Paul, L., Raihan, M.: A deep learning based approach to predict lung cancer from histopathological images. In: 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), pp. 1–4 (2021). https://doi.org/10.1109/ICECIT54077.2021.9641341
23. Nilashi, M., Bin Ibrahim, O., Ahmadi, H., Shahmoradi, L.: An analytical method for diseases prediction using machine learning techniques. Comput. Chem. Eng. 106, 212–223 (2017)
24. Phankokkruad, M.: Ensemble transfer learning for lung cancer detection. In: 2021 4th International Conference on Data Science and Information Technology, pp. 438–442 (2021)
25. Sedaghat, N., Fathy, M., Modarressi, M.H., Shojaie, A.: Combining supervised and unsupervised learning for improved miRNA target prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 15(5), 1594–1604 (2017)
26. Setio, A.A.A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016)
27. Sung, H., et al.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J. Clin. 71(3), 209–249 (2021)
28. Zhang, J., et al.: Coupling a fast Fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access 5, 10674–10685 (2017)
29. Zia ur Rehman, M., Javaid, M., Shah, S.I.A., Gilani, S.O., Jamil, M., Butt, S.I.: An appraisal of nodules detection techniques for lung cancer in CT images. Biomed. Signal Process. Control 41, 140–151 (2018). https://doi.org/10.1016/j.bspc.2017.11.017, www.sciencedirect.com/science/article/pii/S1746809417302811

Brain Tumor Detection Using Deep Network EfficientNet-B0

Mosaddeq Hossain1,2(B) and Md. Abdur Rahman1

1 Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected]
2 Manarat International University, Ashulia, Dhaka 1349, Bangladesh

Abstract. The brain tumor is among the deadliest diseases in human beings. It can lead a person quickly to death if it is not detected and treated at the primary stage. However, detecting a brain tumor with the naked eye can be misleading, and finding someone who is a master in this field can be costly. So, the deep learning (DL) method is a boon for detecting tumors from images in the health sector. Here, we propose a DL-based modified model, backed by EfficientNet-B0, one of the EfficientNet (EN) models, to predict whether MRI images are tumorous or non-tumorous. Our proposed model consists of blocks of deep layers, and the image classification is done by the SoftMax classifier. In our methodology, we have used a significant number of MRI images to train and test our proposed model; this is rare among the papers that have endeavored to detect brain tumors using deep learning. Also, very few attempts so far have used the EfficientNet model in brain tumor detection. We attained a detection accuracy of 99.97%, precision of 91.63%, F1-score of 86.94%, and recall of 85.49% with our proposed model. The detection accuracy of our model is relatively higher than that of other models which have used EfficientNet.

Keywords: EfficientNet · EfficientNet-B0 · deep learning · CNN · brain tumor detection · MRI · SoftMax

1 Introduction

A tumor is typically a rounded mass that can arise under the skin of any part of the body [1]. There are multiple kinds of tumors, such as brain tumors, colon tumors, tongue tumors, thyroid tumors, liver tumors, breast tumors, etc. Among these kinds of tumors, brain tumors (BT) are the most detrimental. Many people die every year due to brain tumor disease. However, other types of tumors are also severe because they can create complications in the human body by producing carcinogenic cells. In this paper, our intended target is to analyze an MRI image using deep learning and predict whether it contains a tumor or not. Typically brain tumors are of two kinds, namely benign and malignant. Because they cannot produce cancer cells and cannot move to the body's other organs, benign tumors are normally not carcinogenic. On the

1 Introduction A tumor is typically a rounded shape thing that could arise under the skin of any part of the body [1]. There are multiple kinds of tumors, such as brain tumors, colon tumors, tongue tumors, thyroid tumors, liver tumors, breast tumors, etc. Among these kinds of tumors, brain tumors (BT) are the most detrimental. Many people die every year due to Brain Tumor disease. However, other types of tumors are also severe because they could create complications in the human body by producing carcinogenic cells. In this paper, our intended target is to analyze an MRI image using deep learning and predict whether it contains tumors or not. Typically brain tumors are two kinds, namely benign and malignant. Because of their inability to produce cancer cells and immobile criteria to move the body’s other organs, benign tumors are normally not carcinogenic. On the © ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2023 Published by Springer Nature Switzerland AG 2023. All Rights Reserved Md. S. Satu et al. (Eds.): MIET 2022, LNICST 490, pp. 213–225, 2023. https://doi.org/10.1007/978-3-031-34619-4_18


other hand, malignant brain tumors are carcinogenic because they can proliferate and be transmitted to other organs of the body [2]. The latter kind of tumor is the most dangerous: in 2016, in the USA, BT was one of the significant causes of carcinogenic death in children (ages 0–14) [3], and it was the third leading cause of death for adolescents and young adults (ages 15–39) [4]. Since cancer is a life-threatening disease, early detection of tumors can help in medicating a malignant tumor and stop it from further deteriorating into carcinogenic cells. So, our goal is to bring forward a model that can forecast the appearance of brain tumors more accurately from the brain's MRI images. Although BT can be diagnosed from both MRI and CT scan images, MRI images give a more accurate depiction of the tumor. A doctor can detect a tumor from an MRI image; however, not all doctors have enough experience to predict the tumor from an image when a difficult situation appears, especially in undeveloped regions. At present, machine learning (ML) is a popular method among data scientists because of its aptness to learn from the input data as well as to make predictions by forming a model based on the inputs [5]. Machine learning techniques require an efficient inspection to develop a model; however, deep learning (DL) has made this easier because it doesn't require such sophisticated inspection. There are various kinds of DL methods, namely Convolutional Neural Network (CNN), Long Short-Term Memory Network, Recurrent Neural Network, Generative Adversarial Network, Radial Basis Function Network, Multilayer Perceptron, Self Organizing Map, Deep Belief Network, Restricted Boltzmann Machine, etc. [6–10]. These models can be trained on a dataset and can make further predictions based on it. In this way, in many regions where the most experienced doctor is not available, or even if available, doctors can be assured about the presence of tumors with the assistance of these DL methods. The problem is that deep learning algorithms need massive data to train a model to get a more accurate result. However, this problem can be avoided with transfer learning techniques: a transfer learning DL algorithm is trained on a vast amount of data, and that pre-trained model can then be applied to a small amount of data to make predictions. Since it requires a lot of time to train a conventional deep model, people can save a lot of time by using transfer learning. In this method, we have implemented EN, pre-trained on ImageNet, to identify the existence of a BT in MRI slices. We can summarize our work as follows:

• First, we developed a modified DL model grounded on EfficientNet-B0, a CNN.
• We utilized a publicly available dataset, BD_Brain_Tumor, containing 20,000 MRI images, to train and evaluate our proposed model. We split our dataset into train, test, and validation segments to train, test, and validate our proposed model.
• Since a deep learning model needs a huge amount of data, different data augmentation techniques, namely zooming, rotating, flipping, etc., were implemented to enlarge our dataset for better outcomes.
• Also, we have utilized a pre-processing technique that selects the central region of the brain tumor and crops it to reduce the burden of analyzing the unnecessary pixels outside the main contour.


• Our proposed model can predict the presence of a brain tumor by analyzing MRI images, with an excellent accuracy of 99.97%, a precision of 91.63%, and an F1-score of 86.94%.

The following sections are organized as Related Works, Proposed Methodology, Experimental Section, and Conclusion.

2 Related Works

Many articles have already addressed the issue of detecting brain tumors using DL, achieving different accuracies depending on the model. Many of these methods might achieve higher accuracy; however, achieving higher accuracy doesn't mean that a method is the most suitable or the most plausible, because many of those methods might use a small dataset or suffer from overfitting in the trained model. Herein, we discuss some papers that used EN for prediction and classification. The authors of [11] presented a combined method to classify an MRI image into three categories: Meningioma, Glioma, and Pituitary. The main steps of this method are preprocessing, enhancing image quality, convolutional layers, and a classification stage. In this paper, they employed EfficientNet-B0 (EN-B0), the baseline network of the ENs, and used ReLU as the activation function for classifying the input images. Using a publicly available dataset named Figshare, containing 3064 images of the T1 modality, their model achieved an overall classification accuracy of 98.04%. Although not directly related to this study, we can get some ideas about detecting brain tumors from [12]. This paper developed a modified EN model to detect the presence of lymph node metastases in breast tumors. The authors also created Random Center Cropping (RCC), a data augmentation technique, and introduced attention and feature-fusion mechanisms for feature extraction. According to their experiment, these two mechanisms boost the efficiency of EfficientNet-B3 (EN-B3). A rectified version of Patch Camelyon (RPCam), a Kaggle competition dataset, was used for their investigation. This paper achieved an accuracy of 97.96% ± 0.03% with the boosted EN-B3 model. Childhood Medulloblastoma is a special kind of brain tumor, and this issue was addressed nicely by using EN in [13]. The authors discussed the accuracies of the various EN network models and computed both multiclass and binary classification of Medulloblastoma. An accuracy range of 95.67% to 98.78% was achieved for multiclass classification, and 100% accuracy was achieved for binary classification. However, they used a dataset that contains only 355 images of different kinds of Medulloblastoma. Another paper uses EN to segregate small-cell from non-small-cell lung cancer [14]. This work is somewhat different from the others because they analyzed brain tumors with lung cancer origins. Using 102 brain MRI images from 69 patients, they achieved an average classification accuracy of 90%. Some authors propose a method for classifying two kinds of Medulloblastoma, a severe type of childhood brain tumor, in [15]. The authors found that EN-B5 outperforms the other EN variants when pre-trained weights are not used. The authors also compared their model's outcome with other CNN models like VGG16,


etc. They collected 2769 images from 161 patients at different hospitals. This method achieved an F1-score of 80.1%. Again, a few authors propose a DL model to detect different brain tumors [16]. They used 6 DL models with the BRATS 2013 and Whole Brain Atlas datasets. Although they didn't provide any information regarding data preprocessing, they achieved 96–99% optimum results in all six models. A capsule net method was developed to detect the presence of BT [17]. By flipping and patching, the authors augmented the Figshare data of the T1wCE modality. They also resized the images to 28 × 28 resolution. Their experiment achieved an accuracy of 87% when the input data weren't pre-processed and 92% when they were. Other authors propose a method to detect the presence of BT using a CNN model [18]. In this method, the dataset provided by Chakroborty on Kaggle was used, and data augmentation was used for data balancing. This model secures an overall accuracy of 96.77% (Fig. 1).

Fig. 1. Sample MRI images from the brain tumor dataset that has been used in our study (images with brain tumor and images without brain tumor).

3 Proposed Methodology

In this part, we discuss our proposed model in detail. In the first step, MRI input images go through the pre-processing stage, which includes resizing the images to 224 × 224 resolution and cropping the main brain region; the images then enter the EfficientNet-B0 blocks, where the image slices are processed and analyzed by the convolutional layers and blocks. Finally, the last layer classifies the output images into two categories: tumorous or non-tumorous.

3.1 Data

Data Source. In this method, we used "BD_Brain-Tumor," a public dataset on Kaggle [19]. This dataset of 20,000 images has been split into three parts: a 'Training Set' containing 13,547 images, a 'Testing Set' containing 2,064 images, and a 'Validation Set' containing 4,356 images. There are 19,200 images in jpg format, 156 images in png format, and 645 images in jpeg format. However, 34 images were rejected because of their very blurry appearance. All the MRI images in this dataset have different resolutions; therefore, we resized them to 224 × 224 resolution according to our proposed model's requirement. In our experiment, we split our dataset


into training and testing, which contain 80% and 20% of all data, respectively. Then we used 80% of the training dataset to train our model and the remaining 20% to validate it. All the images we used in our method are of 224 × 224 resolution.

Data Resizing and Cropping. Since the raw MRI images have different resolutions, we needed to resize all of them to a uniform resolution of 224 × 224 before passing them through the main architecture of our proposed model. Running a deep learning model on MRI images takes a long time because of the vast number of pixels in an image. Therefore, we removed all the pixels outside the central brain region by selecting the main contour of the MRI slices. This step reduces the unwanted impact of outward pixels and avoids unnecessary computation on pixels not related to the brain tumor region. Figure 2 shows the image cropping process in our proposed model.

Fig. 2. Cropping the MRI images before entering into our proposed deep model (1st step: loads the raw MRI; 2nd step: selects the biggest contour; 3rd step: selects the extreme points; 4th step: crops and saves as new image).
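A plausible OpenCV implementation of the four steps in Fig. 2 follows. The blur kernel, threshold value, and morphology iterations are our assumptions; the paper does not publish these parameters.

```python
# Hedged sketch of the brain-region cropping step described above.
import cv2

def crop_brain_region(img):
    """Crop an MRI slice to its largest contour (assumed to be the brain)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # 1st step: raw MRI loaded
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)  # assumed threshold
    thresh = cv2.erode(thresh, None, iterations=2)
    thresh = cv2.dilate(thresh, None, iterations=2)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)              # 2nd step: biggest contour
    left = tuple(c[c[:, :, 0].argmin()][0])             # 3rd step: extreme points
    right = tuple(c[c[:, :, 0].argmax()][0])
    top = tuple(c[c[:, :, 1].argmin()][0])
    bottom = tuple(c[c[:, :, 1].argmax()][0])
    cropped = img[top[1]:bottom[1], left[0]:right[0]]   # 4th step: crop
    return cv2.resize(cropped, (224, 224))              # resize for the network
```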

Data Augmentation. Since DL models need massive data to train, we applied various data augmentation processes. Rotation, zooming, and width shifting were done to augment our dataset. For image rotation, we rotated each image by 10°. For zooming purposes, we zoomed each image at a 10x scale. These augmentation techniques were applied using standard Python libraries.
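A minimal sketch of these augmentations with Keras' ImageDataGenerator follows; the zoom and shift ranges are our assumptions, since the paper gives only the rotation angle precisely.

```python
# Hedged sketch of the stated augmentations (rotation, zooming, width shifting).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,       # rotate each image by up to 10 degrees
    zoom_range=0.1,          # assumed 10% zoom range
    width_shift_range=0.1,   # assumed 10% width shifting
    horizontal_flip=True,    # flipping, as listed in the contribution summary
)
# augmented_batches = augmenter.flow(X_train, y_train, batch_size=32)
```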

3.2 Deep Network

EfficientNet. EN is a relatively new method in the deep learning field. This network was developed by researchers from the Google Research Brain team [20], who presented it at a conference in 2019 [20]. This method gained popularity because of its compound scaling of the network model and its simple structure. EN can uniformly scale a network's depth, width, and resolution using compound coefficients and yields better performance. Its effectiveness was measured by scaling up ResNet and MobileNets [20, 21], and its effectiveness for transfer learning is also supported by their experiments [20].

Proposed Model. The baseline EN-B0 consists of a total of 9 stages, of which the first is a convolutional layer with kernel size 3 × 3. The subsequent seven stages, i.e., the 2nd to 8th, are mobile inverted bottleneck (MBConv) blocks with kernel


sizes of 3 × 3 or 5 × 5. The final stage consists of convolution, fully connected, and pooling layers [20]. In this paper, we have used the EN-B0 model as the building block for our experiments. The last stage of our model is built by selecting 'SoftMax' as the activation function for the linear classification of the brain tumors, and 'average pooling' was chosen for the pooling layer. We also conducted experiments on the other ENs, and we found that our proposed model gives the highest accuracy among the papers related to EN. Figure 3 describes the graphical architecture of our proposed model.


Fig. 3. Proposed model’s architecture.
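A minimal Keras sketch of such a model follows: ImageNet-pretrained EN-B0, an average-pooling layer, and a SoftMax head for the two classes. Whether the base network is frozen or fine-tuned, and the optimizer settings, are our assumptions.

```python
# Hedged sketch of an EfficientNet-B0 transfer-learning model for two classes.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

base = EfficientNetB0(include_top=False, weights="imagenet",
                      input_shape=(224, 224, 3))
base.trainable = False  # assumption: reuse the ImageNet features as-is

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # the "average pooling" layer
    layers.Dense(2, activation="softmax"),  # tumor / not tumor
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```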

Compound Scaling Method of EN Model. The model's efficiency doesn't depend solely on the complexity of the model; properly scaling the network is one of the key reasons for its success. An example of compound scaling is given below [20]. Figure 4 shows a graphical presentation of the scaling method. Let w, d, r denote the width, depth, and resolution, respectively, such that d = α^φ, w = β^φ, r = γ^φ. Here, the symbol φ is the user-defined compound coefficient of the compound scaling functions. The relation among them can then be expressed by the following equation:

α · β^2 · γ^2 ≈ 2, where α ≥ 1, β ≥ 1, γ ≥ 1    (1)
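For intuition, the snippet below plugs in the B0 grid-search coefficients reported in [20] (α = 1.2, β = 1.1, γ = 1.15); this is an illustrative check of Eq. 1, not part of the authors' pipeline.

```python
# Illustrative check of the compound-scaling constraint in Eq. 1.
alpha, beta, gamma = 1.2, 1.1, 1.15     # coefficients from the EfficientNet paper [20]
print(alpha * beta**2 * gamma**2)       # ~1.92, i.e. approximately 2

phi = 1  # user-chosen compound coefficient (one scaling step up from B0)
depth, width, resolution = alpha**phi, beta**phi, gamma**phi
print(depth, width, resolution)         # multipliers for depth, width, resolution
```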


Fig. 4. Compound Scaling method of EfficientNet [20]

4 Experimental Section

In this part, we discuss all the steps of our experiment and the results with various kinds of measurement metrics.

4.1 Experiment

Input images go through the first step of our proposed model. In our compound scaling model, images are processed by various kinds of layers that scale up the images and extract their features. We divided our dataset into 80% and 20% in this experiment, to train and test respectively. Further, we shuffled the allocated training data in order to avoid any kind of learning partiality. Since deep learning models need a vast amount of data, we later applied rotation, zooming, and other processes for data augmentation. We started running our model with one epoch and observed the model's accuracy. Then we gradually increased the number of epochs and simultaneously noted our model's accuracy and learning rate. After reaching around epoch 50, we saw that the accuracy and learning rate of the model no longer increased. At that point, we stopped conducting more epochs and plotted those results in the graphs. In Fig. 5, we can see the different accuracies and losses associated with the various epochs, namely 10, 20, 30, 40, 50, etc. The 'Adam' optimizer was employed to reduce the noise problem. Adam implements a variant of the stochastic gradient descent method, which is the process of finding the best fit between the actual result and the predicted outcome. A few reasons instigated us to use the Adam optimizer: it optimizes better than other optimizers, needs fewer parameters for tuning, and takes less computational time to train a deep model. Equations 2 and 3 show how the Adam optimizer mathematically works:

W_{t+1} = W_t − α · m_t    (2)

where

m_t = β · m_{t−1} + (1 − β) · (∂L/∂W_t)    (3)

Here, m_t is the aggregated gradient at time t, W_t the weight at time t, α the learning rate, L the loss function, and β the moving-average parameter.
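The following toy NumPy snippet illustrates the moving-average update of Eqs. 2-3; note that full Adam also tracks a second moment and bias corrections, which Eqs. 2-3 (and this sketch) omit.

```python
# Toy illustration of the momentum-style update in Eqs. 2-3 (not full Adam).
import numpy as np

def momentum_step(w, grad, m, lr=0.001, beta=0.9):
    m = beta * m + (1 - beta) * grad   # Eq. 3: exponential moving average of grads
    w = w - lr * m                     # Eq. 2: weight update
    return w, m

w, m = np.array([0.5]), np.zeros(1)
w, m = momentum_step(w, grad=np.array([0.2]), m=m)
print(w, m)
```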


'Average pooling' was used to make the feature maps smoother and to pick the best features. In the final layer, the SoftMax classifier was used to predict whether the image contained a tumor or not. We trained and tested our model with SoftMax, ReLU, and Sigmoid classifiers; however, we got higher accuracy with the SoftMax classifier. The mathematical formula for the SoftMax classifier is given by Eq. 4:

σ(X)_i = e^{X_i} / Σ_{j=1}^{N} e^{X_j}    (4)

where the vector X denotes the input, N is the number of outcome types to be predicted, and X_i takes any real value.

4.2 Result

Our experiment gives 99.97% accuracy on the training set and 82.66% accuracy on the testing dataset. Other performance metrics have been used for a more narrative portrayal of our model's performance. To overcome the stagnation problem, we reduced the learning rate to benefit the model and to avoid overfitting. At the end of the last epoch, the learning rate had been reduced to 3.138105874196098 × 10^−15.

4.3 Performance Metrics

Various kinds of performance measurements have been conducted. Since there are different kinds of evaluation scales, measuring only the accuracy isn't enough to judge a model. Essential measures can be calculated from the initial evaluation: if we consider the true-positives as TP, true-negatives as TN, false-positives as FP, and false-negatives as FN [22–25], then mathematically we can write:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (5)

Precision = TP / (TP + FP)    (6)

F1-Score = 2TP / (2TP + FN + FP)    (7)

Recall = TP / (TP + FN)    (8)
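As a quick worked illustration of Eqs. 5-8, the sketch below computes the four measures from confusion-matrix counts; the TP/TN/FP/FN values are placeholders, not the paper's actual counts.

```python
# Eqs. 5-8 computed from assumed confusion-matrix counts.
TP, TN, FP, FN = 90, 95, 5, 10   # placeholder counts

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # Eq. 5
precision = TP / (TP + FP)                    # Eq. 6
f1_score  = 2 * TP / (2 * TP + FN + FP)       # Eq. 7
recall    = TP / (TP + FN)                    # Eq. 8
print(accuracy, precision, f1_score, recall)
```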

These performance measurements are given in Table 1.

Loss Function. There are various kinds of loss functions, and their applications vary according to the type and number of outputs. Graphs of our model's accuracy and loss can be found in Fig. 5.


Table 1. Performance of our work

Accuracy   Precision   F1-Score   Recall
99.97%     91.63%      86.94%     85.49%

If we choose M as the number of training examples, K as the number of classes, y_m^k as the target, x_m as the input for example m, and h_θ as the model, then the mathematical formula for the standard CCE equation is [26]:

L_CCE = −(1/M) Σ_{k=1}^{K} Σ_{m=1}^{M} y_m^k × log(h_θ(x_m, k))    (9)
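A short NumPy sketch of Eq. 9 for one-hot targets follows; the target and prediction arrays are placeholders chosen only to make the computation concrete.

```python
# NumPy illustration of the categorical cross-entropy in Eq. 9.
import numpy as np

y_true = np.array([[1, 0], [0, 1]])           # one-hot targets y_m^k, M=2, K=2
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # model probabilities h_theta(x_m, k)

cce = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(cce)  # ~0.164
```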


Fig. 5. Graphs of the accuracy and loss function

4.4 Discussion

Several papers have been published on various deep learning approaches to detect brain tumors. Many of those papers used the transfer learning method for their experiments, and others used conventional DL methods. In our model, we have used the pre-trained EN-B0 model, trained on ImageNet, for our experiment. Of the papers that used EN to classify or detect brain tumors, most used only a small amount of data; one of them used just 102 images to train and validate their model. However, our network structure has achieved higher accuracy than the other models. A comparative study on the accuracy of our model against the other papers that used EN is presented in Table 2. We also conducted a comparative study on other deep learning methods that endeavored to identify the presence of BT from MRI slices. These studies used CNN techniques other than EN. Table 3 provides a comparative analysis of those kinds of models against our proposed model.

Table 2. Accuracy among those models that used EfficientNet

References           EfficientNet model name                        Accuracy
[11]                 EfficientNet-B0                                98.04%
[12]                 EfficientNet-B3                                97.96% ± 0.03%
[13]                 Comparative study of different EfficientNets   95.67% to 98.78%
[14]                 EfficientNet-B0                                90%
[15]                 EfficientNet-B5                                F1-Score of 80.1%
Our proposed model                                                  99.97%

Table 3. Comparative study of those models that didn't use EfficientNet [27] and our proposed model.

References           Model Name                                             Accuracy                       F1-Score
[28]                 Stacked sparse Autoencoder (SSAE)                      98%                            Not specified
[29]                 Modified ResNet50                                      97.1%                          96.9%
[30]                 CNN-GAN                                                With GAN pretrained: 95.60%    With GAN pretrained: 95.10%
[31]                 RNN                                                    96%                            Not mentioned
[32]                 Modified GAN to increase data and ResNet50 to detect   91%                            Not mentioned
[33]                 CNN                                                    94%                            94%
[34]                 Segmentation using MKPC and classification using DL    83%                            Not mentioned
[35]                 Modified CapsNet                                       95.54%                         Not mentioned
Our proposed model                                                          99.97%                         86.94%

5 Conclusion

Many researchers have tried to develop a DL model to address this issue. Many of those models used small datasets and still achieved good accuracies; however, we know that deep network models need a vast amount of data to make predictions accurately. Another point is that a model's accuracy doesn't rely solely on the complexity of its structure: the EN models showed that properly scaling the model is a key factor in building a more successful model. Although EfficientNet's design is less convoluted, its compound scaling method rationally scales the model's depth, width, and resolution. In our model, we conducted 50 epochs with the SoftMax activation function and achieved an accuracy of 99.97%, precision of 91.63%, and F1-score of 86.94%.


From this perspective, we can say that our model gives more accurate results than the other similar models. We have developed our model to predict the presence of brain tumors, i.e., tumorous or not tumorous, in MRI images. However, the efficiency of our model is yet to be determined for identifying the various kinds of brain tumors, namely Meningioma, Glioma, Pituitary, Glioblastoma, Sarcoma, etc. Further study can extend our proposed model to identify these various brain tumors.

Acknowledgment. Firstly, I want to thank my creator for giving me the patience and ability to conduct this study. Secondly, I want to thank Professor Md. Abdur Rahman, who was the mentor of this work. Also, I'm grateful to my wife for her mental support and inspiration on this journey.

References

1. Mohammadi, F., Rastgar-Jazi, M.: Analytical and experimental solution for heat source located under skin: modeling chest tumor detection in male subjects by infrared thermography. J. Med. Biol. Eng. 38(2), 316–324 (2017)
2. Rehman, A., Naz, S., Razzak, M.I., Akram, F., Imran, M.: A deep learning-based framework for automatic brain tumors classification using transfer learning. Circuits Syst. Signal Process. 39, 757–775 (2020)
3. Varade, A.A., Ingle, K.S.: Brain MRI classification using PNN and segmentation using K means clustering. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 6, 6181–6188 (2017)
4. Abiwinanda, N., Hanif, M., Hesaputra, S.T., Handayani, A., Mengko, T.R.: Brain tumor classification using convolutional neural network. In: Lhotska, L., Sukupova, L., Lacković, I., Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018. IP, vol. 68/1, pp. 183–189. Springer, Singapore (2019). https://doi.org/10.1007/978-981-10-9035-6_33
5. Song, Y., et al.: Association of GSTP1 Ile105Val polymorphism with the risk of coronary heart disease: an updated meta-analysis. PLoS ONE 16(7), e0254738 (2021)
6. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
7. Sirichotedumrong, W., Kiya, H.: A GAN-based image transformation scheme for privacy-preserving deep neural networks. In: 2020 28th European Signal Processing Conference (EUSIPCO), pp. 745–749. IEEE (2021)
8. Karlik, B., Olgac, A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 1(4), 111–122 (2011)
9. Schmah, T., et al.: Generative versus discriminative training of RBMs for classification of fMRI images. In: NIPS (2008)
10. Srivastava, N., Mansimov, E., Salakhudinov, R.: Unsupervised learning of video representations using LSTMs. In: International Conference on Machine Learning, pp. 843–852. PMLR (2015)
11. Guan, Y., et al.: A framework for efficient brain tumor classification using MRI images. Math. Biosci. Eng. 18, 5790–5815 (2021). https://doi.org/10.3934/mbe.2021292
12. Wang, J., Liu, Q., Xie, H., Yang, Z., Zhou, H.: Boosted EfficientNet: detection of lymph node metastases in breast cancer using convolutional neural networks. Cancers 13(4), 661 (2021)
13. Bhuma, C.M., Kongara, R.: Childhood medulloblastoma classification using EfficientNets. In: 2020 IEEE Bombay Section Signature Conference (IBSSC), pp. 64–68. IEEE (2020)


14. Grossman, R., Haim, O., Abramov, S., Shofty, B., Artzi, M.: Differentiating small-cell lung cancer from non-small-cell lung cancer brain metastases based on MRI using EfficientNet and transfer learning approach. Technol. Cancer Res. Treat. 20, 15330338211004920 (2021)
15. Bengs, M., Bockmayr, M., Schüller, U., Schlaefer, A.: Medulloblastoma tumor classification using deep transfer learning with multi-scale EfficientNets. In: Medical Imaging 2021: Digital Pathology, vol. 11603, p. 116030D. International Society for Optics and Photonics (2021)
16. Kalaiselvi, T., Padmapriya, S.T., Sriramakrishnan, P., Somasundaram, K.: Deriving tumor detection models using convolutional neural networks from MRI of human brain scans. Int. J. Inf. Technol. 12(2), 403–408 (2020). https://doi.org/10.1007/s41870-020-00438-4
17. Vimal Kurup, R., Sowmya, V., Soman, K.P.: Effect of data pre-processing on brain tumor classification using capsulenet. In: Gunjan, V.K., Garcia Diaz, V., Cardona, M., Solanki, V.K., Sunitha, K.V.N. (eds.) ICICCT 2019, pp. 110–119. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8461-5_13
18. Toğaçar, M., Cömert, Z., Ergen, B.: Classification of brain MRI using hyper column technique with convolutional neural network and feature selection method. Expert Syst. Appl. 149, 113274 (2020)
19. Kaggle Dataset: https://www.kaggle.com/datasets/dorianea/bd-braintumor. Accessed 10 Apr 2022
20. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 6105–6114 (2019). https://proceedings.mlr.press/v97/tan19a.html
21. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
22. Acharya, U.R., et al.: Automated detection of Alzheimer's disease using brain MRI images—a study with various feature extraction techniques. J. Med. Syst. 43, 1–14 (2019)
23. Acharya, U.R., Sree, S.V., Ang, P.C.A., Yanti, R., Suri, J.S.: Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals. Int. J. Neural Syst. 22, 1250002 (2012)
24. Acharya, U.R., Sudarshan, V.K., Adeli, H., Santhosh, J., Koh, J.E.W.: A novel depression diagnosis index using nonlinear features in EEG signals. Eur. Neurol. 74, 79–83 (2015)
25. Rajinikanth, V., Joseph Raj, A.N., Thanaraj, K.P., Naik, G.R.: A customized VGG19 network with concatenation of deep and handcrafted features for brain tumor detection. Appl. Sci. 10(10), 3429 (2020)
26. Ho, Y., Wookey, S.: The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access 8, 4806–4813 (2019). https://doi.org/10.1109/ACCESS.2019.2962617
27. Nazir, M., Shakil, S., Khurshid, K.: Role of deep learning in brain tumor detection and classification (2015 to 2020): a review. Comput. Med. Imaging Graph. 91, 101940 (2021)
28. Amin, J., Sharif, M., Gul, N., Yasmin, M., Ali, S.: Brain tumor classification based on DWT fusion of MRI sequences using convolutional neural network. Pattern Recognit. Lett. 129, 115–122 (2020)
29. Çinar, A., Yildirim, M.: Detection of tumors on brain MRI images using the hybrid convolutional neural network architecture. Med. Hypotheses 139, 109684 (2020)
30. Ghassemi, N., Shoeibi, A., Rouhani, M.: Deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images. Biomed. Signal Process. Control 57, 101678 (2020)
31. Begum, S.S., Lakshmi, D.R.: Combining optimal wavelet statistical texture and recurrent neural network for tumour detection and classification over MRI. Multimed. Tools Appl. 79(19–20), 14009–14030 (2020). https://doi.org/10.1007/s11042-020-08643-w


32. Han, C., et al.: Infinite brain MR images: PGGAN-based data augmentation for tumor detection. In: Esposito, A., Faundez-Zanuy, M., Morabito, F.C., Pasero, E. (eds.) Neural Approaches to Dynamics of Signal Exchanges. SIST, vol. 151, pp. 291–303. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8950-4_27
33. Zhou, Y., et al.: Holistic brain tumor screening and classification based on DenseNet and recurrent neural network. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) BrainLes 2018. LNCS, vol. 11383, pp. 208–217. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11723-8_21
34. Rathi, V.G.P., Palani, S.: Brain tumor detection and classification using deep learning classifier on MRI images. Res. J. Appl. Sci. Eng. Technol. 10(2), 177–187 (2015)
35. Adu, K., Yu, Y., Cai, J., Tashi, N.: Dilated capsule network for brain tumor type classification via MRI segmented tumor region. In: 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 942–947 (2019)

Cancer Diseases Diagnosis Using Deep Transfer Learning Architectures

Tania Ferdousey Promy1, Nadia Islam Joya1, Tasfia Haque Turna1, Zinia Nawrin Sukhi1, Faisal Bin Ashraf1, and Jia Uddin2(B)

1 Department of Computer Science and Engineering, School of Data Science, Brac University, Dhaka, Bangladesh
2 AI and Big Data Department, Endicott College, Woosong University, Daejeon, South Korea
[email protected]

Abstract. Cancer is among the most lethal diseases in the world. It is clinically known as 'Malignant Neoplasm', a vast group of diseases that encompasses unmonitored cell expansion. It can begin anywhere in the body, such as the breast, skin, liver, lungs, or brain. As reported by the National Institutes of Health (NIH), the projected growth of new cancer cases is forecast at 29.5 million and cancer-related deaths at 16.4 million through 2040. There are many medical procedures to identify cancer cells; mammography, MRI, and CT scans are common methods for cancer diagnosis. These methods have been found to be ineffective and necessitate the development of new and smarter cancer diagnostic technologies. Motivated by the success of medical image classification using deep learning, our work aims to analyze the performance of different deep transfer learning models for cancer cell diagnosis. In this paper, we have used the VGG16, Inception V3, and MobileNet V2 deep architectures to diagnose breast cancer (KAU-BCMD dataset), lung cancer (IQ-OTH/NCCD dataset), and skin cancer (HAM10000 dataset). Experimental results demonstrate that the VGG16 architecture shows comparatively higher accuracy, exhibiting 98.5% accuracy for breast cancer, 99.90% for lung cancer, and 93% for the skin cancer dataset.

Keywords: Cancer Detection · Convolutional Neural Network (CNN) · Image Processing · Deep Transfer Learning

1 Introduction

Cancer is one of the world's most life-threatening diseases. It is the biggest source of death in the United States, affecting people of different ages. Early detection is the key to cancer treatment, but it is often not that easy. Cancer is a complicated disease, and there are a lot of ways it can show up in the body. Sometimes the symptoms do not appear until the cancer is quite advanced; at other times, there are no symptoms at all. Certain cancers, such as breast cancer, are comparatively easy to identify; on the other hand, cancers like lung, kidney, or brain cancer are very hard to detect. The majority


of cancers are now discovered only after they have progressed beyond the organs in which they began. Nevertheless, because of recent breakthroughs in deep learning (an artificial intelligence approach that can spot patterns in images, audio, or text, among other things), we can now construct effective artificial intelligence techniques and algorithms to improve cancer detection and quality of life for patients. This is also why cancer researchers are making remarkable progress. Deep learning based detection and classification of different forms of cancer has opened up a new area of research for early cancer diagnosis, demonstrating the possibility of eliminating the limitations of manual systems. The key purpose of this paper is to present a brief analysis and comparison of three deep transfer learning models (VGG16, InceptionV3, MobileNetV2) utilizing dynamic datasets for breast cancer, lung cancer, and skin cancer (melanoma) diagnosis from accuracy and precision perspectives.

The rest of the paper is organized as follows. Section 2 includes the literature review, and the detailed workflow is presented in Sect. 3. Experimental result analysis is given in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Background Study

In this section, we discuss the relevant architectures along with reviews of the most recent and relevant works.

2.1 Convolutional Neural Network

Deep Convolutional Neural Networks (DCNN) [1, 2] can identify particular kinds of images in image analysis, image recognition and classification, medical image analysis, computer vision, NLP, etc. A DCNN has three types of neuron layers: convolutional layers, pooling layers, and fully connected layers. Filter banks, feature pooling layers, batch normalization layers, dropout layers, and dense layers are all used to build CNNs for various object identification tasks, including detection, segmentation, and classification. The literature offers a variety of pre-trained architectures, including LeNet, AlexNet, GoogleNet, VGGNet, Inception V3, and others [3]. CNNs feature several hierarchies of layers, which means that the distribution of the inputs to each layer alters as training progresses. Preprocessed inputs, acquired through a whitening process, are necessary for achieving superior results in a variety of tasks [4].

2.2 Transfer Learning

Through transfer learning we can transfer knowledge from a previous activity to improve learning in a new activity [5]. Transfer learning is prominent in deep learning because it supplies the large resources essential to train deep learning models, as well as the benefit of the huge and complex datasets those models were trained on. Transfer learning means first training a base network on a baseline dataset, then transferring the learned features to a target task, which is trained on the target dataset. This works if the learned characteristics are general, i.e., applicable to both the base and target tasks rather than exclusive to the base activity [6].
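As a concrete illustration of this feature-transfer idea, the following minimal Keras sketch (our own illustration, not code from any of the reviewed works; the two-class head is a hypothetical choice) loads a base network pre-trained on the ImageNet source task, freezes its convolutional features, and attaches a new classifier head for a target task:

```python
# Minimal transfer-learning skeleton (illustrative sketch, not the paper's code).
import tensorflow as tf
from tensorflow.keras import layers, models

# Base network trained on the source task (ImageNet), classifier removed.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the transferred (general) features

# New head trained only on the target task's data.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(2, activation="softmax"),  # hypothetical 2-class target task
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```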


2.3 Related Works Using Mammograms and X-ray Images

Mammogram images are used by doctors to detect initial indications of breast cancer [7]. Many studies have applied CNNs with transfer learning to breast cancer; several researchers used CNNs in this way to diagnose breast abnormalities from mammograms. In [8], the authors used MIAS [9] and DDSM [10] for image processing and classification. In [11], MIAS images in PGM format were used, and the DDSM Utility converted DDSM images into PNG. The images were processed in MATLAB, and ROIs were used to train the model. Since the CNN (a VGG16 model [12]) takes only RGB images of a particular size, the images were converted and resampled accordingly. This study used a pre-trained CNN with handcrafted features, giving an average accuracy (benign vs. malignant) of 91.02% and an AUC of about 0.76. Another study proposed a CAD system, DCNN-SVM–AlexNet, for classifying benign and malignant masses, which achieves an accuracy of 87.2% [13].

2.4 Related Works Using CT Scan Images

DCNNs have lately demonstrated remarkable effectiveness in lung nodule identification. Imaging tests are the common approach to lung cancer nodule detection. In [14], the National Cancer Institute established the Lung Image Database Consortium (LIDC) to boost research and development operations. The LIDC database was established with three types of items to be marked by four radiologists: nodules greater than or equal to 3 mm in diameter with assumed histology, nodules less than 3 mm in size with an uncertain origin, and non-nodules greater than or equal to 3 mm in diameter but benign [15]. To work with the dataset, multiple CT slices of each patient are downloaded and stored in a directory according to the XML file [16]. To help with feature extraction, the system recognizes and distinguishes lung structures and nodules. An automatic lung nodule detection system using a multi-group patch-based deep learning network is used in [17], where multi-group 2D lung CT images are utilized from the LIDC dataset. The CNN structure is tested on two sets of pictures: original images and binary images. The goal of segmenting lungs from a CT scan is to find distinct characteristics that will help the classifier better categorize the candidates. Besides, contemporary lung Computer-Aided Diagnosis (CAD) systems [18] can help medical decision-making using chest CT scans. The definitive objective of these systems is to distinguish between cancerous and non-cancerous nodules. Furthermore, when a Multi-Level CNN was constructed and tested on the LIDC dataset, an accuracy of 84.81% was reported by the researchers [19]. Thus, a considerable number of researchers have proposed contrasting techniques using deep transfer learning to identify lung cancer.

2.5 Related Works Using Dermatoscopic Images

Dermoscopy is a diagnostic process used to identify small lesions on a broad area of the body. Beyond dermoscopy itself, image classification can be used for the same purpose, for example using Convolutional Neural Networks pre-trained on ImageNet along with transfer learning. In [20], the researchers used HAM10000 as their dataset. This dataset covers all seven distinct skin cancer


cases (Actinic Keratosis, Basal Cell Carcinoma, Melanoma, Benign Keratosis, Dermatofibroma, Vascular Skin Lesion, Melanocytic Nevi), and [21] consists of 10015 images with a resolution of 600 × 450 pixels. With the final picture size set to 224 × 224 pixels, data augmentation in the form of flipping, cropping, and rotating was performed. Each model needs input in a specific structure, and a preprocess function transforms the data into that layout. The researchers tested pre-trained models with the dataset and, based on the weighted recalls, chose ResNet50 as the base model for their custom model.

3 Workflow

The goal of this paper is to compare different CNN and deep transfer architectures by utilizing different types of cancer datasets. To do so, the system accepts an image from an image dataset as input, runs it through the CNN model architecture layers, and delivers the output after classification, as shown in Fig. 1. The following procedure is used. At first, different image datasets for each cancer are collected, and the datasets for each cancer are combined. Next, 1000 images are chosen from them and processed. The processed images are then saved and randomly split in an 80:20 ratio for training and testing the different models. Lastly, the processed images are used to train and test the pre-trained models, and the test accuracies are analyzed and compared to determine the best model.

3.1 Data Collection

For breast cancer, we used multiple datasets of mammographic images: MINI-DDSM [22] and KAU-BCMD [11], each contributing 50%. MINI-DDSM is a lighter edition of the now-outdated DDSM (Digital Database for Screening Mammography) collection. The KAU-BCMD dataset comprises 1416 cases, including pictures from both breasts (right and left) and two types of views (CC and MLO), for a total of 5662 mammography images. Following the BI-RADS approach, the dataset was divided into six groups.

For skin cancer (melanoma), we used HAM10000, which consists of dermatoscopic pictures from diverse demographics, gathered and stored using various modalities. The complete dataset contains 10015 dermatoscopic pictures that may be utilized as a training set for machine learning models [21]. More than half of the lesions are verified by histopathology (histo), with the other instances relying on follow-up examinations, expert consensus, or in-vivo confocal microscopy confirmation. The lesion_id column in the HAM10000 metadata file may be used to track lesions with multiple pictures in the dataset.

For lung cancer, we also used multiple datasets: IQ-OTH/NCCD [23] and the Chest CT-Scan Images dataset [24]. We mostly gathered our data from IQ-OTH/NCCD and used only around 40 files from the other dataset to compensate for the shortage of abnormal (cancerous) cells in the main IQ-OTH/NCCD dataset (Table 1).


Fig. 1. Workflow diagram of the system.

Table 1. Details of the datasets used in this paper.

Cancer type        Dataset type           Dataset name
Breast             Mammographic images    MINI-DDSM & KAU-BCMD
Lung               CT scan images         IQ-OTH/NCCD & Chest CT scan image
Skin (Melanoma)    Dermatoscopic images   HAM10000

3.2 Data Pre-processing

We restructured and resized all of the dataset’s photos to a single size of 256 × 256. We labeled our images into two classes, normal and cancerous. One-Hot Encoding (OHE) is a categorical encoding method that converts each element of a categorical column into a new column with a binary value of 0 or 1 to indicate the presence of the category value. Here we implemented the one-hot encoding method to categorize our image datasets, labeling normal (healthy) images as 1 and cancerous images as 0.
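A minimal sketch of this preprocessing step (our illustration; the directory names and helper function are assumptions, not the paper's code) resizes each image to 256 × 256, reads it as a NumPy array, and encodes normal images as 1 and cancerous images as 0:

```python
import os
import cv2  # OpenCV reads images directly as NumPy arrays
import numpy as np

IMG_SIZE = 256

def load_folder(folder, label):
    """Read every image in `folder`, resize to 256x256, and attach `label`."""
    pairs = []
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name))
        if img is None:              # skip unreadable files
            continue
        pairs.append((cv2.resize(img, (IMG_SIZE, IMG_SIZE)), label))
    return pairs

# Hypothetical layout: normal (healthy) images -> 1, cancerous images -> 0.
data = load_folder("data/normal", 1) + load_folder("data/cancerous", 0)
X = np.array([img for img, _ in data], dtype="float32")
y = np.array([lbl for _, lbl in data])
y_onehot = np.eye(2)[y]              # one-hot encode the two categories
np.save("images.npy", X)             # save the processed arrays locally
np.save("labels.npy", y_onehot)
```

The 80:20 split described in Sect. 3.2.2 can then be obtained, for example, with `train_test_split(X, y_onehot, test_size=0.2)` from scikit-learn.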


We also used the NumPy library to conduct simple image editing and saved the results to our local system after converting the loaded photos to and from NumPy arrays. Images are read as arrays in both the Keras API and OpenCV.

3.2.1 Data Training and Implementation
We downloaded pre-trained weights and printed out the InceptionV3, VGG16, and MobileNetV2 model summaries after importing the relevant deep learning libraries. We started with Inception V3 and subsequently moved on to VGG16 and MobileNet V2 for training on our datasets. For improved accuracy, we tweaked the models. Since most of their layers are already pre-trained, we do not need to retrain them; we only set the input layers to values suitable for our datasets. The key advantage of transfer learning is that it roughly halves the training time while still giving output with proper accuracy.

3.2.2 Data Splitting
The complete dataset was randomly split in an 80:20 ratio into 80% training data and 20% testing data, as in [16]. Furthermore, the divided datasets each have 20 batches, and the target size is (256, 256). Here, 1000 images were utilized for each cancer, with 600 malignant (cancerous) image files and 400 healthy (non-cancerous) image files. We implemented three models for training and testing: Inception V3, VGG16, and MobileNetV2.

3.3 Architectures
Transfer learning refers to the application of a previously learned model to a new challenge; it can train deep neural networks using very little data. For this paper we used ImageNet-trained transfer learning models to solve real-world picture classification challenges, because the ImageNet dataset has over 14 million images in over 20,000 categories.

3.3.1 Visual Geometry Group (VGG16)
VGG16 significantly outperforms AlexNet by serially substituting the large kernel-size filters (11 and 5 in the first and second convolutional layers, respectively) with numerous 3 × 3 kernel-size filters. VGG16 was trained for weeks on NVIDIA Titan Black GPUs. The input to the first conv layer is a 224 × 224 RGB image. The image is passed through a series of conv layers with a limited receptive field of 3 × 3; a few of the configurations also use 1 × 1 convolution filters, which are a linear transformation of the input channels. The convolution stride is fixed to one pixel, and the convolution input is spatially padded so that the spatial resolution is preserved after convolution. Spatial pooling is handled by five max-pooling layers that follow some of the convolution layers. Max-pooling is performed over a 2 × 2-pixel window with a stride of 2.


3.3.2 Inception V3
Inception V3 is a widely used image recognition model that has been shown to attain at least 78.1% accuracy on the ImageNet dataset [25]. The model is the product of multiple ideas investigated by a variety of researchers over time, and it is built from symmetric and asymmetric components.

3.3.3 MobileNetV2
MobileNetV2 is designed to be mobile-friendly. It includes a fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers [26]. Its building blocks are of two kinds, the inverted residual block and the bottleneck residual block, and it uses 1 × 1 convolutions without any non-linearity.
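To make the comparison uniform across the three architectures, one can wrap each ImageNet-pretrained backbone with the same small classification head; the sketch below is our illustration (input size and head design are assumptions), not the authors' exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(backbone_fn, num_classes=2, input_shape=(256, 256, 3)):
    """Attach a small classification head to a frozen, pretrained backbone."""
    base = backbone_fn(weights="imagenet", include_top=False,
                       input_shape=input_shape)
    base.trainable = False  # reuse pre-trained layers without retraining them
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

backbones = {
    "VGG16": tf.keras.applications.VGG16,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
}
candidates = {name: build_transfer_model(fn) for name, fn in backbones.items()}
```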

4 Experimental Result Analysis

4.1 Experiment 1: Breast Cancer

We ran the InceptionV3, VGG16, and MobileNetV2 models on our chosen datasets, MINI-DDSM and KAU-BCMD, to obtain the accuracy and loss of these models. Figure 2 shows the accuracies of the models. For VGG16, the training accuracy is around 99.9% and the validation accuracy reaches 98.5%; the training loss is around 0.0004 and the validation loss around 0.0860. For the InceptionV3 model, the training accuracy is around 99.87% and the validation accuracy around 94%, with a training loss of about 0.0141 and a validation loss of around 0.2154. For the MobileNetV2 model, the training accuracy is around 99.98% and the validation accuracy around 69.50%; the training and validation losses are about 0.0075 and 0.9186, respectively.

Fig. 2. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Breast cancer)

4.2 Experiment 2: Lung Cancer

For lung cancer, we used 1000 CT scan images with the models. We ran the IQ-OTH/NCCD and Chest CT-Scan datasets through VGG16, Inception V3, and MobileNet V2 and gathered the accuracy and loss of these models; most of the data were taken from


the IQ-OTH/NCCD dataset. With VGG16, we obtained a training accuracy of around 99.9% and a validation accuracy of 99.9%, with a training loss of about 0.0026 and a validation loss of around 0.0041. For Inception V3, we obtained a training accuracy of around 99.9% and a validation accuracy of around 88.5%; the training loss was 0.0105 and the validation loss around 0.304. For MobileNet V2, the training accuracy is around 99.9% and the validation accuracy around 89.5%, with training and validation losses of around 0.002 and 0.267, as depicted in Fig. 3.

Fig. 3. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Lung cancer)

4.3 Experiment 3: Skin Cancer

We used 1000 images from the HAM10000 dataset for skin cancer. This dataset consists of dermatoscopic images acquired and stored via different modalities. The VGG16, Inception V3, and MobileNetV2 models were run on this dataset.

Fig. 4. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Skin cancer)

Using VGG16, we get a training accuracy of around 99.9% and a validation accuracy of 93%; the training loss is 0.005 while the validation loss is around 0.27. With Inception V3, we get a training accuracy of around 99.25% and a validation accuracy of around 87.50%; the training loss is 0.041 and the validation loss 0.37. For MobileNetV2, the training accuracy is around 99.9% and the validation accuracy around 86.5%, with a training loss of around 0.0051 and a validation loss of around 0.294, as depicted in Fig. 4. Figure 5 illustrates the confusion matrices of the three architectures for the three cancer datasets. Comparing the accuracies of the predefined models for cancer detection, using the same number of epochs, VGG16 performs excellently on each of the selected cancer datasets.


Fig. 5. Confusion Matrix of different models
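Confusion matrices such as those in Fig. 5 can be computed from a trained model's predictions on the 20% test split; a brief sketch (our illustration — `model`, `X_test`, and `y_test` are assumed from the earlier pipeline):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class indices
y_true = np.argmax(y_test, axis=1)                 # true class indices
print(confusion_matrix(y_true, y_pred))            # rows: true, cols: predicted
```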

First, for breast cancer, the accuracy of VGG16 reached 99.8% for training and 98.5% for validation, while InceptionV3 delivered an almost similar training accuracy of 99.87% with 94% validation accuracy. Although MobileNetV2 reached a similar training accuracy, its validation accuracy dropped to 69.50%. Therefore, we consider VGG16 the best performing model, followed by InceptionV3. For our lung cancer dataset, VGG16 achieved 99.9% training accuracy with a similar validation accuracy; for InceptionV3, the training accuracy remained the same but the validation accuracy was 88.5%. For MobileNetV2 the training accuracy did not change, and the validation accuracy was around 89.5%. So we consider VGG16 and MobileNetV2 suitable models for lung cancer, though InceptionV3 performed quite well too. Lastly, for skin cancer (melanoma), VGG16 achieved a training accuracy of 99.98% with a validation accuracy of 93%. InceptionV3 delivered 87.50% validation accuracy with a similar training accuracy, and MobileNetV2 yielded 86.5% validation accuracy, as illustrated in Table 2.


Table 2. Test accuracy of different architectures for the various cancer datasets.

Dataset          VGG16     Inception V3    MobileNet
Breast Cancer    98.5%     94%             69.5%
Lung Cancer      99.90%    88.5%           89.5%
Melanoma         93%       87.5%           86.5%
Average          97.13%    90%             81.83%

5 Conclusion

Our approach has demonstrated itself to be highly effective across multiple datasets. We also offered a thorough overview of existing methods for diagnosing and quickly detecting a variety of cancers that have a significant impact on the human body. The purpose of this article is to examine, categorize, and compare cancer-related approaches with small datasets, as well as to identify any gaps. Another goal of this study is to provide new researchers with a thorough background so that they can begin their research in this sector. Our findings indicate that pre-trained CNN models can automatically extract features from mammographic, CT scan, and dermoscopic images, and that a good classifier can be trained utilizing these features without any need for hand-crafted features. In this analysis, we concentrated on transfer learning approaches, pre-processing, pre-training models, and convolutional neural network (CNN) models as they apply to the mentioned image recognition and detection tasks. In a thorough evaluation of the models, VGG16 delivered comparatively higher accuracy on our datasets. We therefore expect good results on other datasets as well, and we want to continue working on other datasets in the future to develop a customized model that can be used for multiple cancer detection.

References 1. Bhuiyan, M.R., et al.: A deep crowd density classification model for Hajj pilgrimage using fully convolutional neural network. PeerJ Comput. Sci. 25(8), e895 (2022) 2. Sabab, M.N., Chowdhury, M.A.R., Nirjhor, S.M.M.I., Uddin, J.: Bangla speech recognition using 1D-CNN and LSTM with different dimension reduction techniques. In: Miraz, M.H., Excell, P.S., Ware, A., Soomro, S., Ali, M. (eds.) iCETiC 2020. LNICSSITE, vol. 332, pp. 158– 169. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60036-5_11 3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012) 4. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: The International Conference on Machine Learning, pp. 448–456. PMLR (2015) 5. Ruhi, Z.M., Jahan, S., Uddin, J.: A novel hybrid signal decomposition technique for transfer learning based industrial fault diagnosis. Ann. Emerg. Technol. Comput. 5(4), 37–53 (2021). https://doi.org/10.33166/AETiC.2021.04.004


6. Brownlee, J.: A gentle introduction to transfer learning for deep learning. Mach. Learn. Mastery 20 (2017) 7. CDCBreastCancer. “What is a mammogram?” Centers for Disease Control and Prevention (2022). https://www.cdc.gov/cancer/breast/basicinfo/mammograms.htm. Accessed 12 May 2022 8. Guan, S., Loew, M.: Breast cancer detection using transfer learning in convolutional neural networks. In: 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–8. IEEE (2017) 9. Suckling, J.P.: The mammographic image analysis society digital mammogram database. Digit. Mammo 375–386 (1994) 10. Pub, M.H., Bowyer, K., Kopans, D., Moore, R., Kegelmeyer, P.: The digital database for screening mammography. In: Fifth International Workshop on Digital Mammography, pp. 212–218 (2001) 11. Alsolami, A.S., Shalash, W., Alsaggaf, W., Ashoor, S., Refaat, H., Elmogy, M.: King Abdulaziz University breast cancer mammogram dataset (KAU-BCMD). Data 6(11), 111 (2021) 12. Islam, M.N., et al.: Diagnosis of hearing deficiency using EEG based AEP signals: CWT and improved-VGG16 pipeline. PeerJ Comput. Sci. 7, e638 (2021) 13. Ragab, D.A., Sharkas, M., Marshall, S., Ren, J.: Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ 7, e6201 (2019) 14. Fedorov, A., et al.: Standardized representation of the LIDC annotations using DICOM (No. e27378v2). PeerJ Preprints (2019) 15. Pehrson, L.M., Nielsen, M.B., Ammitzbøl Lauridsen, C.: Automatic pulmonary nodule detection applying deep learning or machine learning algorithms to the LIDC-IDRI database: a systematic review. Diagnostics 9(1), 29 (2019) 16. Sajja, T., Devarapalli, R., Kalluri, H.: Lung cancer detection based on CT scan images by using deep transfer learning. Traitement du Signal 36(4), 339–344 (2019) 17. Jiang, H., Ma, H., Qian, W., Gao, M., Li, Y.: An automatic detection system of lung nodules based on a multigroup patch-based deep learning network. IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2017) 18. Da Nóbrega, R.V.M., Peixoto, S.A., da Silva, S.P.P., Rebouças Filho, P.P.: Lung nodule classification via deep transfer learning in CT lung images. In: 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 244–249. IEEE (2018) 19. Lyu, J., Ling, S.H.: Using multi-level convolutional neural networks for classification of lung nodules on CT images. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 686–689. IEEE (2018) 20. Kondaveeti, H.K., Edupuganti, P.: Skin cancer classification using transfer learning. In: 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMI), pp. 1–4. IEEE (2020) 21. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1), 1–9 (2018) 22. Lekamlage, C.D., Afzal, F., Westerberg, E., Cheddad, A.: Mini-DDSM: mammography-based automatic age estimation. In: 2020 3rd International Conference on Digital Medicine and Image Processing, pp. 1–6 (2020) 23. Kareem, H.F., AL-Husieny, M.S., Mohsen, F.Y., Khalil, E.A., Hassan, Z.S.: Evaluation of SVM performance in the detection of lung cancer in marked CT scan dataset. Indonesian J. Electr. Eng. Comput. Sci. 21(3), 1731–1738 (2021) 24. Bhandary, A., et al.: Deep-learning framework to detect lung abnormality–a study with chest X-Ray and lung CT scan images. Pattern Recogn. Lett. 129, 271–278 (2020)


25. Sachan, A.: Detailed guide to understand and implement ResNets (2019). Accessed 5 Nov 2020 26. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)

Transfer Learning Based Skin Cancer Classification Using GoogLeNet

Sourav Barman1(B), Md Raju Biswas1, Sultana Marjan1, Nazmun Nahar1, Mohammad Shahadat Hossain2, and Karl Andersson3

1 Noakhali Science and Technology University, Noakhali, Bangladesh
{barman2514,raju2514,marjan2514}@student.nstu.edu.bd, [email protected]
2 University of Chittagong, Chittagong, Bangladesh
[email protected]
3 Lulea University of Technology, Skelleftea, Sweden
[email protected]

Abstract. Skin cancer has been one of the top three cancers that can be fatal when caused by broken DNA. Damaged DNA causes cells to expand uncontrollably, and the rate of growth is currently increasing rapidly. Some studies have been conducted on the computerized detection of malignancy in skin lesion images. However, due to problematic aspects such as light reflections from the skin surface, differences in color lighting, and varying forms and sizes of the lesions, analyzing these images is extremely difficult. As a result, evidence-based automatic skin cancer detection can help pathologists improve their accuracy and competency in the early stages of the disease. In this paper, we present a transfer learning strategy based on a convolutional neural network (CNN) model for accurately classifying various types of skin lesions. Preprocessing normalizes the input photos for accurate classification, and data augmentation increases the number of images, which enhances classification accuracy. The performance of the GoogLeNet transfer learning model is compared to that of other transfer learning models such as Xception, InceptionResNetV2, and DenseNet. The model was tested on the ISIC dataset, and we obtained the highest training and testing accuracies of 91.16% and 89.93%, respectively. When compared to existing transfer learning models, the final results characterize our proposed GoogLeNet transfer learning model as more dependable and resilient.

Keywords: Skin cancer · GoogLeNet · Data augmentation · Transfer learning

1 Introduction

The skin is the body’s biggest organ, protecting all of the inner organs from the environment. It aids in temperature regulation and infection protection. There are three layers to the skin: the epidermis, the dermis, and the hypodermis.


Cancer is a life-threatening disease for humans and can sometimes result in death. Various types of cancer can exist in the human body, and skin cancer is one of the most rapidly developing tumors that can lead to death. It is triggered by a variety of circumstances, including smoking, alcohol consumption, allergies, infections, viruses, physical stress, changes in the environment, and exposure to ultraviolet (UV) rays, among others. UV radiation from the sun has the potential to destroy the DNA within skin. Skin cancer can also be caused by unusual inflammations of the human body. According to the World Health Organization (WHO), skin cancer accounts for one out of every three cancer cases [23]. In the United States, Canada, and Australia, the number of people diagnosed with skin cancer has been steadily growing over the previous few years. In the United States, it is estimated that 5.4 million cases of skin cancer are detected each year. Every day, there is growing pressure for speedy and accurate clinical testing [29].

As a consequence, timely identification of skin cancer may lead to earlier detection and treatment, potentially saving lives. Various forms of computer-aided diagnosis (CAD) methods have been designed to detect skin cancer throughout the last few years. To identify cancer, conventional computer vision techniques are mostly employed as detectors that capture a large number of attributes such as shape, size, color, and texture. Artificial intelligence (AI) has evolved the capability to address these issues in recent years. Several architectures are commonly used in the medical field, such as the deep neural network (DNN), the convolutional neural network (CNN), long short-term memory (LSTM), and the recurrent neural network (RNN). All of these models are able to check for skin cancer, and CNNs and DNNs in particular produce satisfying results in this field. The most often used method is the CNN, which combines feature learning and classification techniques. The outcome can be further improved using transfer learning on vast datasets.

The following is a summary of our paper’s primary contributions:
– We present a transfer learning model based on the GoogLeNet model that detects skin cancer more effectively, even at a preliminary phase.
– With a large dataset, our suggested transfer learning model performs better in terms of accuracy than other currently available deep learning (DL) models.

The remainder of this paper is organized as follows: Sect. 2 presents the literature review, Sect. 3 explains the methodology, Sect. 4 contains the results and discussion, and Sect. 5 contains the conclusion and future work.

2 Related Work

In [6], AlEnezi et al. proposed a system combining two machine learning algorithms: feature extraction is done using a convolutional neural network (CNN), and classification with a support vector machine. The proposed system can successfully detect three skin diseases with an accuracy rate of 100%, and it is fast and accurate. However, the system can only detect three diseases, and being web-based, it was not accessible to everyone.

In [33], Vijayalakshmi et al. offered three alternative techniques to effectively identify and categorize melanoma skin cancer. The model is designed in three phases. Pre-processing, which removes hair, glare, and shade from photos, is the initial phase. Segmentation and classification form the second phase, utilizing a Convolutional Neural Network (CNN) to extract features. A neural network and a support vector machine are then used to classify the photos, achieving an accuracy of 85%.

In [28], Rathod et al. proposed an automated image-based system for skin disease recognition using machine learning classification. The proposed system extracts features using a Convolutional Neural Network (CNN) and classifies the image with a softmax classifier. Initial training gives an output accuracy of approximately 70%. In this paper, they initially tested five diseases; the accuracy could be increased beyond 90% with a larger dataset.

In [10], Bhadula et al. applied five different machine learning techniques to a dataset of skin infections to predict skin diseases: random forest, naive Bayes, logistic regression, kernel SVM, and CNN. Of these, the Convolutional Neural Network (CNN) gives the best training and testing accuracy of 99.05%. Early diagnosis and classification of skin diseases helps to lessen a disease’s effects. In this study, the researchers faced some limitations regarding access to and availability of medical information.

In [24], Padmavathi et al. proposed convolutional neural networks (CNN) and residual neural networks (ResNet) to predict skin


diseases. A dataset of 10015 dermatoscopic images divided into seven classes was used in this study. The experimental results show that the convolutional neural network has an accuracy of 77%, whereas ResNet has an accuracy of 68%. They note that Convolutional Neural Networks perform better than Residual Neural Networks in diagnosing skin diseases.

In [13], El Saleh et al. presented a convolutional neural network model named VGG-16 for face skin disease identification. A dataset comprising ten classes, each containing 1200 photos, is used to test and analyze the suggested approach. The model can successfully identify eight facial skin diseases. Python is utilized to implement the algorithms, while OpenCV is employed for pre-processing. The model achieves an accuracy of 88%. The model could be improved further by increasing the dataset size and applying a new deep neural network.

In [5], Akyeramfo-Sam et al. proposed an intelligent way to detect skin diseases by machine learning (ML) that uses a Convolutional Neural Network (CNN), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). The CNN model and the patterns it learned are used to classify the test dataset. The system successfully detects three types of diseases, with an average accuracy of 85.9%.

In [12], Daghrir, Tlig et al. built an automated system to detect melanoma using three different methods, relying on a convolutional neural network and two classical machine learning methods. After feature extraction, a training phase is necessary to create a classification model for melanoma detection; support vector machines (SVMs) process the image and compute the features. They also compared a K-Nearest Neighbor (KNN) classifier and an Artificial Neural Network (ANN), which showed that the ANN was more accurate than the KNN. Though the proposed system is fast and successful, it covers only some diseases, and CNNs could be used to improve it.

In [31], Xiaoxiao Sun et al. proposed a benchmark for automated skin disease detection. They collected datasets for detecting several skin diseases; the paper presents no specific method or system but introduces a dataset whose main contribution is the data collection, which can be used in future work.

3 Methodology

The technique we recommend for detecting skin cancer is outlined in this section. The approach is separated into several parts. First, we gather a training dataset; then we pre-process it to obtain clean image data for better input and carry out data augmentation. Data analysis is the last step before the classification and learning model is created. Figure 1 illustrates our research’s general technique.


Fig. 1. Proposed Model

3.1 Dataset Description

The data were obtained via Kaggle [1]. The collection contains many contrasting pictures covering nine classes of skin cancer: actinic keratosis, basal cell carcinoma, dermatofibroma, melanoma, nevus, pigmented benign keratosis, seborrheic keratosis, squamous cell carcinoma, and vascular lesion. The system takes a picture, compares it against the dataset, and acts on the result.

3.2 Data Preprocessing

The collection contains photographs of varying sizes; however, as GoogLeNet is built to accept colored photos with an input layer size of 224 × 224 × 3, pre-processing is necessary. Z-score normalization was used to standardize the intensity levels of the images, applying Eq. (1) to each image:

z = (x − μ) / s   (1)

where x is a training sample, μ is the mean of the training sample, and s is its standard deviation.
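A minimal NumPy sketch of this normalization (our illustration; applying the statistics per image is our assumption about how Eq. (1) was used):

```python
import numpy as np

def zscore_normalize(img):
    """Standardize pixel intensities: subtract the mean, divide by the std."""
    img = img.astype("float32")
    return (img - img.mean()) / (img.std() + 1e-8)  # epsilon avoids divide-by-zero
```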

3.3 Data Augmentation

Deep learning models such as the GoogLeNet transfer learning model for skin disease classification become hampered by insufficient training data. To increase stability and expand the functional variety of the model, more data are needed. To achieve this, we employ augmentation [30] to substantially enlarge the dataset.


Image augmentation techniques include rotation, width shift, shear range, height shift, and zoom. The augmented data allow the model to generalize more effectively. In this regard, we have utilized the Keras ImageDataGenerator. The settings for data augmentation used in this study are given in Table 1.

Table 1. Data augmentation settings

Augmentation technique    Range
Rotation                  40
Width Shift               0.2
Shear range               0.2
Height Shift              0.2
Zoom                      0.20
Fill mode                 nearest
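The settings in Table 1 map directly onto the parameters of Keras' `ImageDataGenerator`; a short sketch, assuming this is the generator configuration the authors describe:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=40,        # Rotation
    width_shift_range=0.2,    # Width Shift
    shear_range=0.2,          # Shear range
    height_shift_range=0.2,   # Height Shift
    zoom_range=0.20,          # Zoom
    fill_mode="nearest",      # Fill mode
)
```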

3.4 GoogLeNet

GoogLeNet is a 22-layer deep CNN (Convolutional Neural Network) featuring nine linearly stacked inception modules. The architecture uses global average pooling, which replaces fully connected layers with the average of each feature map. Nowadays, GoogLeNet is utilized for various computer vision tasks, including face detection and identification, adversarial training, and so on. GoogLeNet took first place in the ILSVRC 2014 competition thanks to the inception block, which utilizes parallel convolutions to enhance the width and depth of networks. The specifics of each inception block are shown in Fig. 2. Each inception block employs four paths to obtain detailed spatial information. The use of 1 × 1 convolutions reduces feature dimensions and processing costs: because features are concatenated after each inception block, computation costs would increase within a few steps as feature dimensions grew if no constraints were put in place, so the dimensions of the intermediate features are reduced by 1 × 1 convolutions. The units on each path use different filter widths after convolution to guarantee that separate local spatial feature sets can be retrieved and combined. Noteworthy is the use of max-pooling in the final path, which extracts features without requiring additional parameters. Following the integration of all these ideas into a well-designed architecture, GoogLeNet topped the ImageNet classification test.

3.5 Xception Model

Depthwise separable convolutions are employed in the Xception deep convolutional neural network design [11]. Francois Chollet, an employee at Google, Inc.,


Fig. 2. Inception Block.

introduced this network. Its name stands for “extreme Inception”, and it derives from the Inception module; the inventor was also inspired by the 2010 film Inception (directed by Christopher Nolan). Xception is a 71-layer convolutional neural network whose accuracy improves upon Inception V3; in this sense it is an extreme version of Inception, pushing Inception’s core concepts to their absolute extent. In Inception, 1 × 1 convolutions were used to compress the original input, and different sorts of filters were applied to each depth space of those compressed input spaces. With Xception, the opposite happens: Xception applies the filters to every depth map separately before compressing the input space all at once with a 1 × 1 convolution. This method is quite similar to a depthwise separable convolution. There is another difference between Inception and Xception: whether or not there is a non-linearity after the first operation. The Inception model has a ReLU non-linearity following both operations, while Xception doesn’t introduce any non-linearity.

3.6 DenseNet Model

A Dense Convolutional Network (DenseNet) connects its layers in a feed-forward fashion [35], with each layer connected to every subsequent layer. The main focus of this model is to let networks go deeper while making them more efficient to train. In other neural networks there are L connections for L layers, but a DenseNet has L(L+1)/2 direct connections. Image classification is a fundamental and essential computer vision task. VGG has 19 layers, the original LeNet5 had 5, and Residual Networks (ResNet) have crossed the 100-layer threshold. Such models can encounter issues including too many parameters, gradient disappearance, and challenging training. In comparison to models like VGG and ResNet, the Dense Convolutional Network (DenseNet)


exhibits dense connectivity. Direct connections from any layer to all subsequent layers distinguish the DenseNet model from other CNNs and potentially enhance the information flow between layers. As a result, DenseNet can effectively reduce the number of parameters, improve feature map propagation, and mitigate the gradient vanishing problem.

3.7 Inception-ResNet V2 Model

More than one million pictures were used to train the convolutional neural network named Inception-ResNet V2; in this case, ImageNet was used to train the model [34]. The Inception-ResNet V2 model contains a 164-layer network that can classify images into 1000 different object categories, including pencils, mice, keyboards, and animals. Through a comparative investigation and examination of the classification model’s structure, an improved Inception-ResNet-v2 model based on CNNs was created to increase the accuracy of convolutional neural networks (CNNs) in image classification [27]. The Inception-ResNet-v2 model can extract features under various receptive fields while lowering the number of model parameters. In addition, it provides a channel filtering module, based on a comparison of all available data, to filter and combine channels, realizing efficient feature extraction.

3.8 Transfer Learning

A technique called transfer learning uses a model that has already been trained to learn new information from an existing set of data [32]. There is a source domain (Ds) with a source task (Ts), and a target domain (Dt) with a target task (Tt), together with the related data. Transfer learning seeks to raise performance on the target task (Tt) by combining knowledge from Ds and Ts. Different transfer learning settings are distinguished depending on the type of task and the nature of the data available in the source and destination domains. The setting is called “inductive transfer learning” when both the source and target domains have labelled data available for a classification task [25]. The labelled data in this instance are D = {(xi, yi)}, where xi is the feature vector of the i-th training instance and yi is its class label.

There are 24 million trainable parameters in the 164-layer GoogLeNet. A deep model of this kind needs a sizable dataset for training and optimization, which is why GoogLeNet was trained on the ImageNet dataset, which has over 1.2 million photos organized into 1000 different categories. For smaller datasets, such as our skin cancer dataset, overfitting is likely to be a problem for the model. This is where transfer learning plays a role: we create the model from pretrained weights and then fine-tune it for the task at hand, so we do not need to train the model from scratch on a small dataset. Because the GoogLeNet model was designed for a different purpose, structural changes are needed to classify skin cancer. The last three layers of the GoogLeNet model were adjusted to fit the intended purpose: the original classification head, designed to categorize 1000 separate classes, was removed, and a flatten layer and a new fully connected (FC) layer with the required number of outputs were added after the average pooling layer. After the FC layer, the softmax layer was similarly replaced with a new one.
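As a sketch of this head replacement (our illustration, not the authors' code — their pipeline is TensorFlow-based per Sect. 4.1, but torchvision is used here only because it ships a pre-trained GoogLeNet directly; the nine-class output matches the dataset of Sect. 3.1):

```python
import torch
import torchvision

# GoogLeNet pre-trained on ImageNet, with its 1000-way classifier swapped out.
model = torchvision.models.googlenet(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                      # freeze the transferred layers

num_classes = 9                                  # nine skin-lesion classes
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
# Only the new FC layer is trained; the softmax is applied inside the loss
# (torch.nn.CrossEntropyLoss) rather than as a separate layer.
```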

4 Result and Discussion

This section describes our approach’s experiments and outcomes.

4.1 System Configuration

In this system we used TensorFlow to run the convolutional neural network, chiefly because it efficiently handles the many matrix multiplications involved. CPU-only processing was a bottleneck for our work, so we used the Google Colaboratory cloud server, operating it conveniently through Jupyter Notebook. With this setup we trained and evaluated our proposed deep learning approach.

4.2 Training and Test Datasets

In this experiment we use 2357 colored, 521 × 512 sized images of skin cancer. The total number of images in each class is given in Table 2.

Table 2. Number of training and testing images per class

Class                         Training Images    Testing Images
Actinic Keratosis             114                16
Basal Cell Carcinoma          376                16
Dermatofibroma                95                 16
Melanoma                      438                16
Nevus                         357                16
Pigmented Benign Keratosis    462                16
Seborrheic Keratosis          77                 3
Squamous Cell Carcinoma       181                16
Vascular Lesion               139                3
Total                         2239               118

4.3 Transfer Learning Model’s Hyperparameters

Categorical cross-entropy is the loss function we employ to train our model. We train our approach for a maximum of 50 epochs, since beyond that there are no further variations in training and validation accuracy. The loss function is optimized using the Adam optimizer. Table 3 lists the best-configured hyper-parameters of our method; the total number of epochs and the batch size in our test are 50 and 16, respectively.

Table 3. Hyperparameters

Hyper-parameter    Value
Loss Function      Categorical Cross-entropy
Epochs             50
Batch Size         16
Optimizer          Adam
Learning rate      0.001
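Wired into a Keras-style training run, the hyper-parameters of Table 3 would look roughly as follows (a sketch; `model`, `train_gen`, and `val_gen` are assumed from the preceding steps, with the batch size of 16 configured on the data generators):

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001),  # learning rate 0.001
              loss="categorical_crossentropy",      # loss function
              metrics=["accuracy"])

history = model.fit(train_gen,
                    validation_data=val_gen,
                    epochs=50)                      # 50 epochs
```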

4.4 Performance Metrics

For the traditional evaluation of a classification model, a number of performance measures are used. The most common is classification accuracy, determined by the ratio of correctly classified observations to the total number of observations. Precision, recall (or sensitivity), and specificity, which are all significant measures in classification problems, can be calculated from the counts of classified true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The F-score, the harmonic mean of precision and recall, is a useful statistical tool for classification. Accuracy, precision, recall, and F-score are determined as follows:

Accuracy = (TP + TN) / (TP + FN + FP + TN)   (2)

Precision = TP / (TP + FP)   (3)

Recall = TP / (TP + FN)   (4)

F-Score = (2 × Precision × Recall) / (Precision + Recall)   (5)
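Equations (2)–(5) translate directly into code; a small self-contained sketch using hypothetical confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Eqs. (2)-(5) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical counts, for illustration only:
print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```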

4.5 Result

Evaluating our final model, we obtain a training accuracy of 91.17% with a loss of 1.11, and a testing accuracy of 89.93% with a loss of 1.99. These figures are given in Table 4.

Table 4. Accuracy and Loss

Model Part    Accuracy    Loss
Training      91.17%      1.11
Testing       89.93%      1.99

Accuracy alone is most reliable when the test dataset contains an equal number of observations for each class, which is not the case for our dataset; the aforementioned classification problem therefore needs a more in-depth analysis of the proposed strategy employing additional performance metrics. Table 5 shows the precision, recall, and F1-score of our recommended transfer learning method as well as a comparison to alternative methods, namely DenseNet, Xception, and InceptionResNetV2. Precision, recall, and F1-score for our suggested technique are 0.785, 0.687, and 0.733, respectively. As the table shows, our approach also outperforms the other three pre-trained models.

Table 5. Performance metrics

Model                Accuracy    Precision    Recall    F1-Score
GoogLeNet            89.93%      0.785        0.687     0.733
Xception             86.81%      0.762        0.562     0.638
DenseNet             85.59%      0.730        0.593     0.654
InceptionResNetV2    88.89%      0.750        0.656     0.706

Our CNN-based transfer model was trained for up to 50 iterations; the optimal number of epochs is 50 because beyond it our model’s training and validation accuracies no longer increase. The accuracy and loss of the model are depicted in Figs. 3 and 4. In Fig. 3, the training accuracy is less than 72.5% and the validation accuracy is less than 80.00% at the first epoch. In Fig. 4, the training loss is more than 1.4 and the validation loss is greater than 1.1 at the starting epoch. As the number of epochs increases, accuracy improves and loss decreases.
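Curves such as those in Figs. 3 and 4 are typically drawn from the Keras training history; a short sketch assuming the `history` object from the training step above:

```python
import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```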


Fig. 3. Training and Validation Accuracy of GoogLeNet

Fig. 4. Training and Validation Loss of GoogLeNet

4.6 Comparison with Existing Work

The paper in [13] applied the VGG-16 model to identify skin defects; a database of 12,000 photos was used to train and verify the model, and it has an 88% accuracy rate. In [24], skin diseases are predicted with 77% and 68% accuracy using convolutional neural networks (CNN) and residual neural networks (ResNet), respectively; it was also found that Convolutional Neural Networks outperform Residual Neural Networks in diagnosing skin diseases. To increase the accuracy, a hierarchical classification algorithm utilizing the retrieved photos might be needed, so that predictions can be made more reliably than with earlier models by utilizing ensemble features and deep learning. In [28], the system analyzes an image, performs the most important part, feature extraction, using the CNN method, and applies a softmax image classifier to identify diseases; initial training results in an output accuracy of about 70%.

Table 6. Comparison with existing work

Author                       Method            Accuracy
El Saleh, R. et al. [13]     VGG-16            88%
S. Padmavathi et al. [24]    CNN and ResNet    77% and 68%
J. Rathod [28]               CNN               70%
Proposed Method              GoogLeNet         89.93%

5 Conclusion and Future Work

The categorization of skin malignancies using transfer learning with GoogLeNet was discussed in this work. We categorized skin cancer into nine kinds in our study, which is the most comprehensive categorization of skin cancer to date. We used data augmentation techniques on the existing dataset because we needed a large amount of data for effective training and deployment of the CNN-based architecture, and we were able to obtain the required result with this method. According to the exploratory research, the suggested approach greatly outperforms state-of-the-art models on a variety of performance metrics, including the weighted average and total accuracy; in addition, the model demonstrates its capability with precision, recall, and F1 scores of 76.16%, 78.15%, and 76.92%, respectively. More research can be done to analyze and understand these results. In future we will collect more data to detect skin cancer disease, and we will work with other deep learning and related methods [2–4, 7–9, 14–17, 19–22, 26, 36].

References 1. The International Skin Imaging Collaboration (ISIC). The international skin imaging collaboration (ISIC). Accessed 30 April 2022 2. Abedin, M.Z., Akther, S., Hossain, M.S.: An artificial neural network model for epilepsy seizure detection. In: 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), pp. 860–865. IEEE (2019) 3. Ahmed, T.U., Hossain, M.S., Alam, M.J., Andersson, K.: An integrated CNNRNN framework to assess road crack. In: 2019 22nd International Conference on Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2019) 4. Ahmed, T.U., Jamil, M.N., Hossain, M.S., Andersson, K., Hossain, M.S.: An integrated real-time deep learning and belief rule base intelligent system to assess facial expression under uncertainty. In: 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR), pp. 1–6. IEEE (2020) 5. Akyeramfo-Sam, S., Philip, A.A., Yeboah, D., Nartey, N.C., Nti, I.K.: A web-based skin disease diagnosis using convolutional neural networks. Int. J. Inf. Technol. Comput. Sci. 11(11), 54–60 (2019) 6. ALEnezi, N.S.A.: A method of skin disease detection using image processing and machine learning. Procedia Computer Science 163, 85–92 (2019) 7. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for micro hand gesture recognition. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 379–392. Springer, Cham (2021). https://doi.org/10. 1007/978-3-030-68154-8 35 8. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for bangla lexical sign language recognition. In: Kaiser, M.S., Bandyopadhyay, A., Mahmud, M., Ray, K. (eds.) Proceedings of International Conference on Trends in Computational and Cognitive Engineering. AISC, vol. 1309, pp. 695–707. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4673-4 57 9. Basnin, N., Nahar, N., Anika, F.A., Hossain, M.S., Andersson, K.: Deep learning approach to classify parkinson’s disease from MRI samples. In: Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N. (eds.) BI 2021. LNCS (LNAI), vol. 12960, pp. 536–547. Springer, Cham (2021). https://doi.org/10.1007/978-3-03086993-9 48 10. Bhadula, S., Sharma, S., Juyal, P., Kulshrestha, C.: Machine learning algorithms based skin disease detection. Int. J. Innovative Technol. Explor. Eng. (IJITEE) 9(2) (2019)


11. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) 12. Daghrir, J., Tlig, L., Bouchouicha, M., Sayadi, M.: Melanoma skin cancer detection using deep learning and classical machine learning techniques: a hybrid approach. In: 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pp. 1–5. IEEE (2020) 13. El Saleh, R., Bakhshi, S., Amine, N.A.: Deep convolutional neural network for face skin diseases identification. In: 2019 5th International Conference on Advances in Biomedical Engineering (ICABME), pp. 1–4. IEEE (2019) 14. Gosh, S., Nahar, N., Wahab, M.A., Biswas, M., Hossain, M.S., Andersson, K.: Recommendation system for e-commerce using alternating least squares (ALS) on apache spark. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 880–893. Springer, Cham (2021). https://doi.org/10.1007/978-3-03068154-8 75 15. Islam, R.U., Hossain, M.S., Andersson, K.: A deep learning inspired belief rulebased expert system. IEEE Access 8, 190637–190651 (2020) 16. Islam, R.U., Ruci, X., Hossain, M.S., Andersson, K., Kor, A.L.: Capacity management of hyperscale data centers using predictive modelling. Energies 12(18), 3438 (2019) 17. Kabir, S., Islam, R.U., Hossain, M.S., Andersson, K.: An integrated approach of belief rule base and deep learning to predict air pollution. Sensors 20(7), 1956 (2020) 18. Kumar, V.B., Kumar, S.S., Saboo, V.: Dermatological disease detection using image processing and machine learning. In: 2016 3rd International Conference on Artificial Intelligence and Pattern Recognition (AIPR), pp. 1–6. IEEE (2016) 19. Nahar, N., Ara, F., Neloy, M.A.I., Biswas, A., Hossain, M.S., Andersson, K.: Feature selection based machine learning to improve prediction of parkinson disease. In: Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N. (eds.) BI 2021. LNCS (LNAI), vol. 12960, pp. 496–508. Springer, Cham (2021). https://doi.org/ 10.1007/978-3-030-86993-9 44 20. Nahar, N., Ara, F., Neloy, M.A.I., Barua, V., Hossain, M.S., Andersson, K.: A comparative analysis of the ensemble method for liver disease prediction. In: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), pp. 1–6. IEEE (2019) 21. Nahar, N., Hossain, M.S., Andersson, K.: A machine learning based fall detection for elderly people with neurodegenerative disorders. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 194–203. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6 18 22. Neloy, M.A.I., Nahar, N., Hossain, M.S., Andersson, K.: A weighted average ensemble technique to predict heart disease. In: Kaiser, M.S., Ray, K., Bandyopadhyay, A., Jacob, K., Long, K.S. (eds.) Proceedings of the Third International Conference on Trends in Computational and Cognitive Engineering. LNNS, vol. 348, pp. 17–29. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7597-3 2 23. Pacheco, A.G., Krohling, R.A.: Recent advances in deep learning applied to skin cancer detection. arXiv preprint arXiv:1912.03280 (2019) 24. Padmavathi, S., Mithaa, E., Kiruthika, T., Ruba, M.: Skin diseases prediction using deep learning framework. Int. J. Recent Technol. Eng. (IJRTE) (2020) 25. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)

252

S. Barman et al.

26. Pathan, R.K., Uddin, M.A., Nahar, N., Ara, F., Hossain, M.S., Andersson, K.: Gender classification from inertial sensor-based gait dataset. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 583–596. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8 51 27. Peng, C., Liu, Y., Yuan, X., Chen, Q.: Research of image recognition method based on enhanced inception-resnet-v2. Multimedia Tools and Applications, pp. 1–21 (2022) 28. Rathod, J., Waghmode, V., Sodha, A., Bhavathankar, P.: Diagnosis of skin diseases using convolutional neural networks. In: 2018 2nd International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1048–1051. IEEE (2018) 29. Rogers, H.W., Weinstock, M.A., Feldman, S.R., Coldiron, B.M.: Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the us population, 2012. JAMA Dermatol. 151(10), 1081–1086 (2015) 30. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019) 31. Sun, X., Yang, J., Sun, M., Wang, K.: A benchmark for automatic visual classification of clinical skin disease images. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 206–222. Springer, Cham (2016). https:// doi.org/10.1007/978-3-319-46466-4 13 32. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: K˚ urkov´ a, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11141, pp. 270–279. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01424-7 27 33. Vijayalakshmi, M.: Melanoma skin cancer detection using image processing and machine learning. Int. J. Trend Sci. Re. Develop. (IJTSRD) 3(4), 780–784 (2019) 34. Wan, X., Ren, F., Yong, D.: Using inception-resnet v2 for face-based age recognition in scenic spots. In: 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS), pp. 159–163. IEEE (2019) 35. Zhong, Z., Zheng, M., Mai, H., Zhao, J., Liu, X.: Cancer image classification based on densenet model. In: Journal of Physics: Conference Series. vol. 1651, p. 012143. IOP Publishing (2020) 36. Zisad, S.N., Hossain, M.S., Andersson, K.: Speech emotion recognition in neurological disorders using convolutional neural network. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 287–296. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6 26

Assessing the Risks of COVID-19 on the Health Conditions of Alzheimer's Patients Using Machine Learning Techniques

Prosenjit Karmaker and Muhammad Sajjadur Rahim(B)

Department of Information and Communication Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
[email protected]

Abstract. There is currently little evidence linking COVID-19 to Alzheimer's Disease (AD). The goal of this paper is to examine the correlation among COVID-19 symptoms to identify risks for AD patients and to determine the conditions that put AD patients in danger. We have developed a Machine Learning (ML) based model called AD-Cov-CorrelationNet that shows the relationship between various health issues and whether every attribute in the dataset is connected. We have discovered a direct link between several health issues in AD patients. The risk of infection is very high when AD patients are in direct contact with the outside environment. Even without direct contact with the outside environment, AD patients remain vulnerable to health issues that cause serious problems and increase the risk of death. Supervised learning models such as Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) are utilized to understand the disease prognosis. The risk factors that the models predicted are clinically meaningful and relevant to reducing fatality. This comparative analysis achieves more than 98% accuracy, 97% precision, 97% recall, 97% F1 score, and accurate Receiver Operating Characteristic (ROC) curves.

Keywords: COVID-19 · Alzheimer's disease · Machine learning algorithms · Cross-validation · Correlation · Symptom · Predictive model · Logistic Regression · KNN · Decision Tree · Random Forest · SVM · MLP

1 Introduction

Late in 2019, the novel coronavirus SARS-CoV-2 made its first appearance in China, with a market in Wuhan serving as its source. The virus spread across the world, causing the World Health Organization to declare a global pandemic in March 2020. There had been 3,349,786 COVID-19 cases and 238,628 deaths globally as of May 3, 2020. As our understanding of COVID-19 has grown, older age groups have emerged as one of the key risk variables linked to severe outcomes following infection, with adults over 58 years old having a risk of dying from COVID-19


that is double that of children [1]. COVID-19 symptoms can vary from mild to severe, and can even be fatal in some cases. Coughing, fever, and loss of smell and taste are common symptoms, with migraine, nasal congestion, and respiratory problems being less so. In moderate to severe cases, other symptoms include severe stomach pain, sore throat, diarrhea, eye problems, swollen or purple toes, and breathlessness.

Alzheimer's disease (AD) is a neurological disorder characterized by memory loss, emotional disturbances, and behavioral abnormalities. Alzheimer's disease affects more than 50 million individuals worldwide (Alzheimer's Report, WHO), and most pharmaceutical medicines have only a palliative effect. Aside from negatively impacting patients' quality of life and human health, Alzheimer's disease has a large financial impact. AD is the most widespread degenerative nerve disorder globally, accounting for up to 80% of dementia cases. Among the 50 leading causes of decreased life expectancy, it is one of the fastest-growing; if current trends continue, the number of Alzheimer's disease patients will exceed 150 million by 2050 [2, 3]. Patients with Alzheimer's disease often have short-term and long-term memory loss, as well as confusion, rage, violence, language issues, and mood changes as the disease progresses. Alzheimer's disease has a global economic cost of one billion dollars every year.

State-of-the-art supervised learning models such as Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) are used to assess the prognosis and course of the disease. These techniques can classify enormous volumes of unstructured data, including correlations between symptoms and outcomes [4, 5]. Machine learning architectures and algorithms have developed rapidly in recent years because of their use in a variety of industries, including speech recognition, image processing, and answering biological questions. In biological settings, the risk relationship might depend on other independent causative factors that are strongly correlated with the disease. However, a variety of unbalanced datasets frequently limits model performance, and all of these models exhibit individual limitations. The risk factors that the models predicted are clinically meaningful and relevant to reducing fatality. The Pearson correlation model is expected to perform noticeably better than its competitors in modeling the complex interconnection of risks for Alzheimer's disease (AD) patients of catching COVID-19. As a result, three research questions (RQs) are investigated to evaluate the effectiveness of the suggested approach against state-of-the-art approaches:

• RQ1: How can unbalanced data be effectively handled and prepared for machine learning (ML) models? In other words, how can unbalanced data be made more balanced for ML processing?
• RQ2: How effectively can the correlation method categorize an AD patient's COVID-19 infection risk based on symptoms?
• RQ3: What possible risk factors could lead to serious problems for Alzheimer's disease patients with COVID-19 (AD-COVID-19)?

The proposed solution is a correlation study. To accomplish this, we have developed an ML-based AD-Cov-CorrelationNet model. Using the proposed model, this research shows the relationship between various health issues and whether every attribute in


the dataset is connected to one another. Relational attributes vary in strength and direction when the main attribute changes. We discovered a direct link between several health issues that are risky for Alzheimer's disease (AD) patients and increase their risk of death. The goal of our inquiry is to identify the risk factors that endanger people with Alzheimer's disease (AD). This paper's main contributions are as follows:

• In order to balance huge unbalanced datasets, the study investigated cutting-edge resampling approaches and evaluated them across the learning models (Logistic Regression, KNN, Decision Tree, Random Forest, SVM, and MLP). In comparison to the body of previous work, this dataset is huge and imbalanced, and the data balancing method used is well established in the field.
• Based on a real dataset, the study used a correlation method, the Pearson correlation coefficient, for classifying symptoms of AD-COVID-19 patients. Sweeping attributes are used to optimize the model throughout the experiment.
• Without deleting feature subsets, the study demonstrated a reliable correlation result for identifying risk variables from already-existing, diverse feature sets related to AD-COVID-19 patients.
• The study revealed that ML models may be applied in clinical practice by providing patients with risk variables that have clear therapeutic benefits, in addition to improvements in performance and accuracy.

The remainder of the paper is organized as follows: (2) Literature Review, (3) Dataset Description, (4) Research Methodology, (5) Learning Models, (6) Results and Analysis, (7) Comparative Study, and (8) Conclusions.

2 Literature Review

According to a study in [3], Alzheimer's disease was the sixth-leading cause of death in the U.S. in 2019, and the fifth-leading cause of mortality among Americans aged 65 and older. Deaths from stroke, heart problems, and HIV decreased between 2000 and 2019, whereas recorded Alzheimer's disease mortality climbed by more than 145%. Many researchers have already used a machine learning-based approach to predict COVID-19 using a cough dataset [6]. That research provides coronavirus positive or negative predictions for different age groups and regions but is not able to detect which illness or symptoms affect a patient badly. This study inspired us to do further research on coronavirus and Alzheimer's patients. Furthermore, we studied the lifestyle and situation of Alzheimer's patients. From a study on Alzheimer's patients [7], we identified the principal causes and difficulties faced by dementia patients. We also found an AI-based home care solution, but it could not identify how different illnesses interact in coronavirus-positive Alzheimer's patients. The work in [8] gave us the idea of studying different illnesses of coronavirus-positive Alzheimer's patients. In addition, the research in [9] provided the fatality rate of Alzheimer's patients due to COVID-19. In this paper, using Google Colaboratory, the sensitivity, accuracy, specificity, and area under the ROC curve of the comparative analysis are evaluated. We compare every illness factor for each Alzheimer's patient who is COVID-19 positive. This


study focuses on the correlation between different illnesses in coronavirus-positive Alzheimer's patients. We also report the accuracy of our models to ensure proper outcomes.

3 Dataset Description

Using the World Health Organization (WHO)'s open data repository (collected from kaggle.com), this research examines over 5,000 AD-COVID-19 patients worldwide to identify how different health conditions and risk factors affect AD-COVID-19 patients. Using Pearson's correlation coefficient, we attempt to detect infection risks based on symptoms related to COVID-19 infection. Therefore, we use a survey dataset which records, for every patient, 13 health conditions along with exposure and risk factors, as depicted in Table 1.

Table 1. Dataset description.

| Attribute                     | Data Type | Equivalent Data Type | Non-Null Count |
| Breathing problem             | Object    | Binary               | 5434 non-null  |
| Fever                         | Object    | Binary               | 5434 non-null  |
| Sore throat                   | Object    | Binary               | 5434 non-null  |
| Runny nose                    | Object    | Binary               | 5434 non-null  |
| Dry cough                     | Object    | Binary               | 5434 non-null  |
| Asthma                        | Object    | Binary               | 5434 non-null  |
| Chronic lung disease          | Object    | Binary               | 5434 non-null  |
| Heart disease                 | Object    | Binary               | 5434 non-null  |
| Headache                      | Object    | Binary               | 5434 non-null  |
| Diabetes                      | Object    | Binary               | 5434 non-null  |
| Hyper tension                 | Object    | Binary               | 5434 non-null  |
| Fatigue                       | Object    | Binary               | 5434 non-null  |
| Gastrointestinal              | Object    | Binary               | 5434 non-null  |
| Abroad travel                 | Object    | Binary               | 5434 non-null  |
| Contact with COVID-19 patient | Object    | Binary               | 5434 non-null  |
| Attended large gathering      | Object    | Binary               | 5434 non-null  |
| Visited public exposed places | Object    | Binary               | 5434 non-null  |
| Wearing masks                 | Object    | Binary               | 5434 non-null  |
| Sanitization                  | Object    | Binary               | 5434 non-null  |
| COVID-19                      | Object    | Binary               | 5434 non-null  |


4 Research Methodology

There are five subsystems in the proposed AD-Cov-CorrelationNet model, as shown in Fig. 1. Data categorization and characterization are covered in the first subsystem, which explains how the symptoms are divided into attributes in the dataset. The second subsystem deals with how imbalanced data is processed. The optimal approach to represent the data using statistical indicators is determined in the third subsystem, utilizing a variety of machine learning algorithms. The fourth subsystem addresses the correlation method used to categorize the risks of getting an infection. The fifth subsystem addresses performance evaluation and provides the accuracy, precision, recall, and F1 score. Another subsystem connected to the fourth subsystem provides comprehensive processing of the correlation method.

Fig. 1. The operational outline of the proposed AD-Cov-CorrelationNet model.

5 Learning Models

Table 2 gives a concise view of the different ML algorithms used as learning models.

Table 2. Definition of ML algorithms and characterization of learning models.

| Sl. No | ML Algorithm | Definition | Pros and Cons |
| 1 | Logistic Regression | In order to predict a binary outcome, logistic regression uses prior observations from a data collection | Training is very efficient, and the model is easy to implement and analyze. If the number of data points is smaller than the number of features, logistic regression should not be used |
| 2 | K-Nearest Neighbors (KNN) | Classification and regression problems can be addressed using this supervised machine learning method | It is instance-based learning. KNN is simple to use; only two parameters are needed to implement it. It is unable to handle huge datasets and is sensitive to noisy data, missing values, and outliers |
| 3 | Decision Tree | A decision-support method that utilizes a tree-like model to describe options and their possible results, including the possibility of chance events | Easily interpreted and understood, excellent for visual depiction. It can use both numerical and categorical features |
| 4 | Random Forest | A classification system made up of several decision trees | It is effective with non-linear data, has a low probability of mistakes, and effectively uses a large dataset. Training is slow. For linear problems with numerous sparse features, it is not recommended |
| 5 | Support Vector Machine (SVM) | Support-vector machines (SVMs) are supervised learning models that analyze data for regression and classification | When there is a distinct margin of separation it works incredibly well, and it performs well in high-dimensional spaces. With a large dataset it does not perform as well, because the training time is longer |
| 6 | Multi-Layer Perceptron (MLP) | A fully connected feed-forward neural network | Ability to learn non-linear models; real-time (online) learning capability. Feature scaling has an impact on MLP |

6 Results and Analysis

6.1 Correlation Model Performance (Pearson Correlation Analysis)

The correlation model is used to quantify the linear relationship between two variables. The correlation coefficient falls between −1.0 and 1.0; it cannot exceed 1.0 or fall below −1.0. A correlation of −1.0 denotes a perfect negative correlation, whereas a correlation of 1.0 denotes a perfect positive correlation. The performance of the correlation model is given in Table 3.
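For reference, Pearson's correlation coefficient between two attributes x and y over n patients is the standard quantity

$$ r \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} $$

where x̄ and ȳ are the attribute means; the strength and direction bands in Table 3 are based on this definition.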

Table 3. Correlation model performance (Pearson correlation analysis).

| Pearson correlation coefficient (r) | Strength | Direction | Main Attribute | Relational Attribute (changes with the main attribute in strength and direction) | Findings |
| Greater than 0.5 | Strong | Positive | Breathing problem, Fever, Dry cough, Sore throat, Asthma, Lung disease, Heart disease, Diabetes, Hypertension | Contact with COVID-19 patients, attended large gathering, abroad travel | Patients in direct contact with the outside world mostly suffer from COVID-19 |
| Between 0.3 and 0.5 | Moderate | Positive | Breathing problem, Fever, Dry cough, Sore throat, Runny nose | Asthma, Lung disease, Heart disease, Diabetes, Hypertension | Patients with serious chronic illness and COVID-19 symptoms also suffer from infection in spite of no direct contact with the outside world |
| Between 0 and 0.3 | Weak | Positive | Breathing problem, Fever, Dry cough | Diabetes, Hypertension | Patients with only diabetes or hypertension and mild COVID-19 symptoms also suffer from infection, but the cases are not that significant |
| 0 | None | None | Headache, Fatigue, Gastrointestinal | Contact with COVID-19 patients, attended large gathering, abroad travel | Patients with headache, fatigue, or gastrointestinal problems are less likely to suffer from COVID-19 in spite of direct contact with the outside |
| Between 0 and −0.3 | Weak | Negative | Headache, Hypertension, Fatigue, Gastrointestinal | Diabetes, Fatigue, Breathing problem | Patients show a low COVID-19 positive rate (in rare cases) |
| Between −0.3 and −0.5 | Moderate | Negative | None | None | No correlation |
| Less than −0.5 | Strong | Negative | None | None | No correlation |
The results of the correlation heat map are presented in Fig. 2.

Fig. 2. Correlation heat map results (simple format).
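As an illustration of how such a matrix and heat map can be produced, a minimal sketch with pandas and seaborn follows (the file name and Yes/No encoding are assumptions based on Table 1, not the authors' exact code):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file name; the survey data has binary Yes/No columns (Table 1)
df = pd.read_csv("covid_ad_symptoms.csv").replace({"Yes": 1, "No": 0})

# Pairwise Pearson correlation between every pair of attributes
corr = df.corr(method="pearson")

# Heat map of the correlation matrix, as in Fig. 2
sns.heatmap(corr, cmap="coolwarm", vmin=-1.0, vmax=1.0, square=True)
plt.tight_layout()
plt.show()
```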


6.2 Confusion Matrix

A confusion matrix summarizes actual versus predicted classifications. Its four entries are as follows: True Positives (TP) are positive cases correctly predicted as positive; False Negatives (FN) are positive cases wrongly categorized as negative; False Positives (FP) are negative cases wrongly labeled as positive; and True Negatives (TN) are negative cases correctly predicted as negative. The confusion matrix of Logistic Regression is shown in Fig. 3. Table 4 gives the measured values of all the confusion matrices.

Fig. 3. Confusion matrix of Logistic Regression.

6.3 Accuracy, Precision, Recall, and F1-Score

It is important to measure accuracy, precision, recall, and F1 score.

1. Accuracy: the proportion of correct predictions among all predictions made.
2. Recall: also called sensitivity or True Positive Rate (TPR); it measures how many of the actual positive cases the classifier recognized, and it should be as high as possible.
3. Precision: the proportion of correctly classified positive instances among all instances predicted as positive.
4. F1 score: the harmonic mean of recall (sensitivity) and precision.

Figure 4 depicts the accuracy comparison of the different ML classifier models. Table 5 presents the performance comparison of the different classifier models in terms of accuracy, precision, recall, and F1 score.
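As an illustration, these quantities can be computed with scikit-learn as follows (the file name, Yes/No encoding, and Logistic Regression configuration are assumptions mirroring the earlier sketch, not the authors' exact setup):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Hypothetical file name; binary Yes/No attributes as in Table 1
df = pd.read_csv("covid_ad_symptoms.csv").replace({"Yes": 1, "No": 0})
X, y = df.drop(columns=["COVID-19"]), df["COVID-19"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()  # TN, FP, FN, TP
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```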


Table 4. Confusion matrices of the six ML models (N = 5434).

| Model               | TN   | FP  | FN | TP   |
| Logistic Regression | 947  | 104 | 55 | 4328 |
| KNN                 | 896  | 155 | 8  | 4375 |
| Decision Tree       | 1029 | 22  | 72 | 4311 |
| Random Forest       | 822  | 229 | 1  | 4382 |
| SVM                 | 963  | 88  | 33 | 4350 |
| MLP                 | 958  | 93  | 61 | 4322 |

Fig. 4. Accuracy of ML Algorithms.

6.4 Receiver Operating Characteristic (ROC)

Figure 5 shows the Receiver Operating Characteristic (ROC) comparison of the different ML classifier models.

Table 5. Performance comparison of different ML classifier models.

| ML Algorithm        | Accuracy | Precision | Recall | F1 Score |
| Logistic Regression | 97.07%   | 97.65%    | 98.74% | 98.19%   |
| KNN                 | 97.00%   | 96.57%    | 99.81% | 98.17%   |
| Decision Tree       | 98.27%   | 99.49%    | 98.35% | 98.92%   |
| Random Forest       | 95.76%   | 95.03%    | 99.97% | 97.44%   |
| SVM                 | 97.77%   | 98.01%    | 99.24% | 98.62%   |
| MLP                 | 98.60%   | 97.92%    | 98.61% | 98.25%   |

Fig. 5. ROC comparison of the different ML classifier models.
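A brief sketch of how such ROC curves can be generated with scikit-learn, reusing the hypothetical clf, X_test, and y_test from the sketch in Sect. 6.3 (one curve per trained model can be drawn the same way):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# clf, X_test, y_test: fitted classifier and held-out data from the earlier sketch
RocCurveDisplay.from_estimator(clf, X_test, y_test)
plt.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
plt.show()
```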

7 Comparative Study

Table 6 compares the results of the correlation model with relevant studies. These results show that the correlation model performed competitively in comparison to different models and studies. Nevertheless, we could not find more than one binary COVID-19 patient dataset containing AD patients and medical information for comparison. This correlation study achieves around 98% accuracy with the AD-Cov-CorrelationNet model.

Table 6. Comparative study.

| Description | Dataset Type | Implemented Method/Algorithm | Accuracy | Reference |
| Predicting COVID-19 | X-ray image | LSTM-RNN | 96.0% | [10] |
| Predicting COVID-19 | X-ray image | LSTM-RNN | 93.0% | [11] |
| Predicting COVID-19 | X-ray image | Res-CovNet | 86.0% | [12] |
| Predicting AD-COVID-19 mortality | Binary | AD-CovNet | 97.0% | [9] |
| AD-COVID-19 symptoms correlation | Binary | AD-Cov-CorrelationNet | 98.60% | Our work |

8 Conclusions

This study discovers substantial connections between distinct COVID-19 symptom cases and the worldwide burden of dementia. Health policymakers must have thorough plans in place to identify those at risk (including older people) and limit the risk of infection, while also paying attention to clinical and psychiatric well-being, at this key stage of the epidemic, when countries are ready to lift their national lockdowns and begin opening their borders. Such patients may be prioritized based on their risk level as vaccination becomes more broadly available. As a result, it is critical to assess the impact of COVID-19 on Alzheimer's patients' health. When it comes to vaccination, Alzheimer's sufferers should be given extra attention and importance. The mortality rate of Alzheimer's patients may be lowered as a result of this research. A comparative analysis is conducted using Google Colaboratory, evaluating the performance of each ML technique included in the AD-Cov-CorrelationNet model in terms of accuracy, precision, recall, and F1 score. The accuracy of the Logistic Regression, KNN, Decision Tree, Random Forest, and SVM classification models is greater than 95%, and MLP yields the best accuracy of 98.60%. The outcomes of this study also show accurate ROC curves. A patient who is in direct contact with the outside world suffers more illnesses associated with coronavirus. Symptoms like breathing problems, fever, dry cough, and sore throat are very sensitive indicators of COVID-19 cases, so it is safer for sensitive patients to stay at home. Hypertension, headache, and gastrointestinal problems are not serious illnesses for Alzheimer's patients, so patients with only these minor symptoms remain in a less risky position.


Finally, the findings of the correlation study are given below:

Finding 1: AD patients in direct contact with the outside world mostly suffer from COVID-19.
Finding 2: AD patients with serious chronic illness and COVID-19 symptoms also suffer from infection in spite of no direct contact with the outside world.
Finding 3: AD patients with only diabetes or hypertension and mild COVID-19 symptoms also suffer from infection, but the cases are not that significant.
Finding 4: AD patients with headache, fatigue, and gastrointestinal problems are less likely to suffer from COVID-19 in spite of direct contact with the outside.
Finding 5: AD patients show a low COVID-19 positive rate (in rare cases).

References

1. WHO: Coronavirus disease (COVID-19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019. Accessed 17 Oct 2022
2. Wang, Q., Davis, P.B., Gurney, M.E., Xu, R.: COVID-19 and dementia: analyses of risk, disparity, and outcomes from electronic health records in the US. Alzheimer's Dement. 17(8), 1297–1306 (2021)
3. Wiley, J.: Alzheimer's disease facts and figures. Alzheimer's Dement. 17, 327–406 (2021)
4. Bzdok, D., Altman, N., Krzywinski, M.: Statistics versus machine learning. Nat. Methods 15(4), 233–234 (2018)
5. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinform. 18(5), 851–869 (2016)
6. Laguarta, J., Hueto, F., Subirana, B.: COVID-19 artificial intelligence diagnosis using only cough recordings. IEEE Open J. Eng. Med. Biol. 1, 275–281 (2020)
7. Jesmin, S., Kaiser, M.S., Mahmud, M.: Artificial and internet of healthcare things based Alzheimer care during COVID-19. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) Brain Informatics. BI 2020. LNCS, vol. 12241, pp. 263–274. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_24
8. Villavicencio, C.N., Macrohon, J.J., Inbaraj, X.A., Jeng, J.H., Hsieh, J.G.: Development of a machine learning based web application for early diagnosis of COVID-19 based on symptoms. Diagnostics 12(4), 821 (2022)
9. Akter, S., et al.: AD-CovNet: an exploratory analysis using a hybrid deep learning model to handle data imbalance, predict fatality, and risk factors in Alzheimer's patients with COVID-19. Comput. Biol. Med. 105657 (2022)
10. Alassafi, M.O., Jarrah, M., Alotaibi, R.: Time series predicting of COVID-19 based on deep learning. Neurocomputing 468, 335–344 (2022)
11. Alorini, G., Rawat, D.B., Alorini, D.: LSTM-RNN based sentiment analysis to monitor COVID-19 opinions using social media data. In: ICC 2021 - IEEE International Conference on Communications, pp. 1–6. IEEE (2021)
12. Madhavan, M.V., Khamparia, A., Gupta, D., Pande, S., Tiwari, P., Hossain, M.S.: Res-CovNet: an internet of medical health things driven COVID-19 framework using transfer learning. Neural Comput. Appl. 1–14 (2021)

MRI Based Automated Detection of Brain Tumor Using DWT, GLCM, PCA, Ensemble of SVM and PNN in Sequence

Md. Sakib Ahmed1, Sajib Hossain1, Md. Nazmul Haque1, M. M. Mahbubul Syeed2, D. M. Saaduzzaman1, Md. Hasan Maruf1, and A. S. M. Shihavuddin2(B)

1 Green University of Bangladesh (GUB), Begum Rokeya Sarani, Dhaka 1207, Bangladesh
{saaduzzaman,maruf}@eee.green.edu.bd
2 Independent University, Bangladesh (IUB), Bashundhara R/A, Dhaka, Bangladesh
{mahbubul.syeed,shihav}@iub.edu.bd

Abstract. Classifying, segmenting, and detecting the area of infection in MRI images of brain tumors is a challenging, iterative, error-prone, and time-consuming process. Moreover, visualizing and numerically quantifying the structural properties of the abnormal human brain requires advanced and expensive tools, even with sophisticated Magnetic Resonance Imaging techniques. MRI can better differentiate and clarify the neuronal architecture of the human brain compared to other imaging methodologies. In this study, a complete pipeline is proposed to classify abnormal structures in the human brain from MRI images that might be early signs of tumor formation. The proposed pipeline consists of noise reduction techniques, gray-level co-occurrence matrix (GLCM) feature extraction, DWT-based brain tumor segmentation to reduce complexity, and a Support Vector Machine (SVM) with the Radial Basis Function (RBF) kernel in ensemble with a PNN for classification. SVM and PNN in combination provide a data-driven prediction model of the possible existence and location of a brain tumor in MRI images. Experimental results achieved nearly 99% accuracy in identifying healthy and tumorous tissue based on structure from brain MRI images. The proposed method, with comparable accuracy, is reasonably lightweight and fast compared to existing deep learning-based methods.

Keywords: pre-processing · image segmentation · DWT · GLCM · PCA · MRI tumor classification · feature extraction · Support Vector Machine · Probabilistic Neural Network

1 Introduction

In the field of digital images [1], the pixels in combination holistically represent information about the object of interest. With the recent availability of


computation-intensive hardware at lower costs, deep learning based solutions have taken over AI based problem solving. However, the energy cost of such solutions, due to heavy and inefficient computations during training, is still alarming. For some specific problems, utilizing prior knowledge from field experts, a simple image processing pipeline can generate effective yet computationally light solutions. In this work, we have addressed Magnetic Resonance Image (MRI) based automated detection of brain tumors, which can enable clinical professionals to deliver early detection and corresponding healthcare to patients [1,2]. This study uses gray-level co-occurrence matrix (GLCM) feature extraction [3] and Support Vector Machine (SVM) classification [25] for the identification and classification of normal and cancerous tissues from MR images [5].

Brain tumors develop as abnormal, uncontrolled cancerous brain tissue. A brain tumor may be benign or malignant [7]. A benign tumor is structurally uniform and does not contain active cancer cells. A malignant tumor has a non-uniform structure and includes active cancer cells that migrate between different sections [6]. The grading scales used range from Grade I to Grade IV, which, according to the World Health Organization, classify benign and malignant tumors [11]. In this scheme, Grades I and II in general represent lower-grade tumors, while Grades III and IV represent higher-grade tumors. Brain tumors can form in the adult human brain at any age, and the impact may not be the same for each individual. Because of the complex structure of the human brain, diagnosing the tumor area in the brain is challenging [8,11]. Malignant Grade III and IV tumors may grow very rapidly; they can affect healthy brain cells, be influenced by lifestyle and external environmental factors, and easily spread to other parts of the brain or the spinal cord, thereby being harmful if untreated. The identification and classification of such brain tumors at an early stage is, therefore, of utmost importance for the right diagnosis and cure procedure selection [9]. It allows a professional physician to monitor and track the incidence and development of tumor-affected areas at different stages using the latest imaging techniques, so that accurate diagnoses can be made by scanning these images [10]. Based on these facts, the most suitable therapy, radiation, surgical procedure, or chemotherapy may be determined. As a result, it is clear that early detection of a tumor can significantly increase the chance of a tumor-infected patient surviving [10,11].

Using imaging methods, segmentation is used to evaluate the part affected by the tumor [7]. Segmentation is the process of dividing an object into component parts that share similar properties such as color, texture, contrast, and borders [6,7]. Brain tumors have been among the leading causes of death in humans in recent decades. It is clear that if the tumor is recognized early and detected effectively, the chances of survival may increase. Conventional invasive procedures for identifying and classifying benign (non-cancerous) and malignant (cancerous) brain tumors include biopsy, lumbar puncture, and spinal tap procedures. A computer-assisted diagnostic algorithm [12] was developed


to replace traditional invasive devices and time-consuming techniques in order to improve brain tumor diagnosis in terms of accuracy and precision. This article also provides a high-quality, intelligent, feature-based tumor classification technique in which magnetic resonance (MR) pixels are classified into normal, non-cancerous (benign), and cancerous (malignant) tumor classes. The proposed method consists of three steps: (1) wavelet decomposition, (2) texture extraction, and (3) classification. The Discrete Wavelet Transform is used in much of the literature to decompose the MR image into unique sub-bands and approximate coefficients, after which structural data such as energies are obtained. The proposed strategy has been applied to regular MR images, and it is observed that the classification accuracy using probabilistic neural networks in ensemble with SVM is nearly 99%. The main contributions of this work are the following:

– Development of a lightweight texture-feature-based segmentation and an ensemble of SVM and PNN based classification of brain tumors from MRI, with accuracy on par with heavy-duty deep learning based approaches.
– Because it is feature based, the proposed method can achieve higher accuracy with a comparatively smaller number of training samples than deep learning based methods. The proposed method is also generalizable and easily adaptable to similar problems.
– The proposed features are explainable in terms of the biological changes occurring in the brain due to tumor formation, which is still a matter of research in the field of deep learning.

2 Dataset

The dataset used in this work is assembled from axial T2-weighted MR images with 256 × 256 in-plane resolution, collected from a publicly available dataset and some local ones. T2-weighted images offer higher contrast and better visuals than the T1 and PET modalities, which is why T2 images were preferred in this case. The abnormal brain MR images in the dataset cover diseases such as Alzheimer's, sarcoma, and Huntington's. From the selected datasets, images were randomly sampled to generate the training, testing, and validation sets and to perform cross-validation.

3 Proposed Method

The proposed method for the classification of brain tumors is composed of five major steps: pre-processing, segmentation, extraction of features using DWT and GLCM, selection of features using PCA, and detection and identification of tumors using the PNN-RBF network, as illustrated in Fig. 1.


Fig. 1. Three stage diagram of the proposed method for the MRI image processing and classification.

3.1 Pre-processing

The first step of pre-processing is filtering and denoising the images. Denoising is done using filters such as anisotropic diffusion to remove induced noise that may influence image quality and features important for identifying tumorous tissue. This noise generally appears in the image during acquisition, transmission, or compression due to sensor limitations and external influences. An example of a denoised image is illustrated in Fig. 2. The overall performance is sensitive to the denoising algorithm used in the pre-processing step. Other methods such as low-pass filtering, Gaussian smoothing, and edge-preserving smoothing can also be useful in this case; a sketch of the anisotropic diffusion filter is given below.
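This is a minimal, illustrative NumPy implementation of classical Perona-Malik anisotropic diffusion; the parameter values are assumptions to be tuned per dataset, not the exact configuration used in this work:

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=15, kappa=30.0, gamma=0.2):
    """Perona-Malik anisotropic diffusion: smooths flat regions
    while preserving edges (kappa controls edge sensitivity)."""
    img = img.astype(np.float64)
    for _ in range(n_iter):
        # Finite differences toward the four nearest neighbours
        d_n = np.roll(img, -1, axis=0) - img
        d_s = np.roll(img, 1, axis=0) - img
        d_e = np.roll(img, -1, axis=1) - img
        d_w = np.roll(img, 1, axis=1) - img
        # Conduction coefficients: near 1 in flat areas, near 0 across edges
        c_n = np.exp(-(d_n / kappa) ** 2)
        c_s = np.exp(-(d_s / kappa) ** 2)
        c_e = np.exp(-(d_e / kappa) ** 2)
        c_w = np.exp(-(d_w / kappa) ** 2)
        # Explicit diffusion update
        img = img + gamma * (c_n * d_n + c_s * d_s + c_e * d_e + c_w * d_w)
    return img
```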

3.2 Segmentation

In this work, segmentation, the process of separating an image into parts whose pixels share similar characteristics, is performed with a classical approach. Image segmentation [14,15] can be performed through a wide variety of approaches, as shown in Fig. 3. Deep learning based approaches have consistently performed very well in recent years; however, they require thousands of positive training examples. In the case of brain tumors, that many examples are very expensive to gather and validate. In this work, we approached the same problem with feature-based methods, as they provide more interpretability, lower training-sample requirements, and faster training with equally reliable performance, as reported in these experiments.


Fig. 2. Example of an original acquisition image and the noise-reduced version of the original image.

Local consistency of the initial segmentation is maintained using morphological operations with some incorporated prior knowledge, as sketched below.
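A minimal sketch of such a morphological cleanup with scikit-image (the structuring-element sizes are illustrative assumptions):

```python
from skimage import morphology

def clean_mask(mask, min_size=64):
    """Enforce local consistency of a binary segmentation mask."""
    mask = mask.astype(bool)
    mask = morphology.binary_opening(mask, morphology.disk(2))  # remove specks
    mask = morphology.binary_closing(mask, morphology.disk(2))  # close small gaps
    return morphology.remove_small_objects(mask, min_size=min_size)
```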

Fig. 3. Image segmentation types

3.3 Feature Extraction

Feature extraction is a very important part of this work, as classification is performed on that information. In this stage, the aim is to extract quantitative information from the MR images representing texture, form, and contrast. The discrete wavelet transform (DWT) [16] is used to extract wavelet coefficients, in concatenation with the gray-level co-occurrence matrix (GLCM) [17] for locally distributed inherent texture feature extraction. Five decomposition levels were created by dividing the MRI images, making sure that the identical coefficients of the LL and HL bands were picked. From the sub-bands obtained by the wavelet decomposition, statistical textural characteristics such as energy, correlation, entropy, and homogeneity were also extracted from the GLCM.


DWT Features. As a feature vector, the proposed system makes use of the coefficients of the Discrete Wavelet Transform (DWT) [23]. The wavelet is an effective mathematical tool for extracting features, and here it is used to find the wavelet coefficients of the MR images. Wavelets are localized basis functions that are scaled and shifted versions of some fixed mother wavelet. The primary advantage of wavelets is that they provide localized frequency information about a signal, which is extremely useful for classification. The continuous wavelet transform of a square-integrable signal x(t), relative to a real-valued wavelet ψ(t), is defined as the integral

$$ W_{\psi}(a, b) \;=\; \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\,\psi\!\left(\frac{t-b}{a}\right) dt, $$

where the wavelets ψ_{a,b} are obtained by translation b and dilation a of the mother wavelet, and the mother wavelet must have zero mean. This transform can be utilized effectively by restricting a and b to discrete values, which yields the discrete wavelet transform, as illustrated in Figs. 4 and 5.

Fig. 4. Discrete wavelet decomposition steps from input images during the feature extraction process

Here, Hi_D is the high-pass filter and Lo_D is the low-pass filter, and the decomposition uses the Daubechies wavelet transform. In this work, the overall performance of the HL sub-band features was higher than that of the LL sub-band features. Therefore, a five-level decomposition using the Daubechies wavelet was chosen, and the characteristics were extracted from the LH and HL sub-bands generated using DWT.

GLCM Features. Based on local texture features extracted by the Gray Level Co-occurrence Matrix (GLCM), it is possible to distinguish between regular and anomalous tissue with reliable accuracy [21,22]. GLCM texture features also structurally contrast malignant tissue with regular tissue, which can often be challenging even for human experts. Texture based automated analysis is therefore regularly used in computer-assisted pathology, providing supplementary strategies to biopsy. Even in deep learning based training, the mid-level features extracted during training are mainly local texture features of the object surfaces [24]. GLCM in general calculates the frequency of specific gray levels in an image region, taking the correlations between neighboring pixels into consideration. Texture records are primarily based on the probability of finding a pair of gray levels at predefined distances and angles across a whole frame. Using the Gray Level Co-occurrence Matrix (GLCM), also known


Fig. 5. Block diagram for the extraction and reduction of the DWT features used in this work

as the Gray Level Spatial Dependence Matrix (GLSDM), statistical aspects of the MR images are obtained. Haralick's GLCM is a statistical technique that can explain the relation of pixels at particular gray levels. The GLCM is a two-dimensional histogram in which entry (i, j) counts the frequency of co-occurrence of gray levels i and j. Using the GLCM, statistical elements such as contrast, energy, entropy, homogeneity, correlation, and shade, which capture the prominent interrelationships, are extracted. Homogeneity and entropy were also extracted from the first five wavelet decomposition levels of the LH and HL sub-bands; a combined sketch of this feature extraction follows.
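For illustration, a minimal sketch of this DWT + GLCM feature extraction using PyWavelets and scikit-image (the wavelet choice, distances, and angles are assumptions; the paper's exact settings may differ):

```python
import numpy as np
import pywt
from skimage.feature import graycomatrix, graycoprops

def extract_features(img):
    """DWT + GLCM texture features from a grayscale MR image (uint8)."""
    # Five-level Daubechies wavelet decomposition; keep detail-band energies
    coeffs = pywt.wavedec2(img, wavelet='db1', level=5)
    feats = []
    for lh, hl, _hh in coeffs[1:]:
        feats += [np.mean(np.abs(lh)), np.mean(np.abs(hl))]

    # GLCM statistics (distance 1, angle 0) on the original image
    glcm = graycomatrix(img, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    for prop in ('contrast', 'correlation', 'energy', 'homogeneity'):
        feats.append(graycoprops(glcm, prop)[0, 0])

    # Entropy of the normalized co-occurrence distribution
    p = glcm[:, :, 0, 0]
    feats.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    return np.array(feats)
```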

3.4 Feature Selection Using PCA

Principal component analysis (PCA) is a well-established and widely used method for dimensionality reduction. PCA seeks a lower-dimensional linear representation of the feature matrix that preserves as much of the data variance as possible while keeping the projected components orthogonal to each other. A lower-dimensional feature representation reduces redundancy in the data and, as a consequence, enhances classification accuracy. A feature vector computed by concatenating DWT and GLCM features and passed through PCA-based reduction to a chosen dimension results in an effective classification algorithm, as illustrated in Fig. 6. Each feature is normalized before dimension reduction through PCA. For normalization, we converted each feature vector across the samples into the range of −1 to 1, using the minimum and maximum values of that particular feature across samples, following Eq. 1. The performance of PCA is also sensitive to the number of dimensions projected to; with this parameter optimized, a better dimension reduction can be achieved while maintaining the orthogonality of the projected features.

$$ \mathrm{Feat}_{norm} \;=\; 2\left(\frac{\mathrm{Feat} - \mathrm{Feat}_{min}}{\mathrm{Feat}_{max} - \mathrm{Feat}_{min}} \;-\; 0.5\right) \qquad (1) $$
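A minimal sketch of this normalization and PCA reduction with scikit-learn (the target dimension is a hypothetical value to be tuned):

```python
from sklearn.decomposition import PCA

def normalize_and_reduce(features, n_components=10):
    """Scale each feature to [-1, 1] per Eq. (1), then project with PCA.

    `features` is an (n_samples, n_features) matrix; `n_components`
    is an assumed target dimension, tuned in practice."""
    f_min = features.min(axis=0)
    f_max = features.max(axis=0)
    norm = 2.0 * ((features - f_min) / (f_max - f_min) - 0.5)  # Eq. (1)
    return PCA(n_components=n_components).fit_transform(norm)
```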

Feature selection is the matter of choosing subsets from the set of variables that demonstrate the behavior of the entire set: choosing the


Fig. 6. Block diagram for the extraction and reduction of the features used

helpful variables and discarding the inapplicable ones. As an input vector, the extracted feature vectors were used to train and evaluate the output of the PNN networks [18] for the corresponding classification task. The statistical field characteristics of these feature vectors are shown in Table 1.

Table 1. Gray-level co-occurrence matrix of trained MRI images representing the corresponding statistical field.

| MRI ID | Contrast | Correlation | Energy   | Homogeneity | Entropy |
| 1      | 0.38654  | 0.08050     | 0.815681 | 0.90000     | 2.32236 |
| 2      | 0.36012  | 0.17250     | 0.81607  | 0.94738     | 0.24365 |
| 3      | 0.265017 | 0.1248      | 0.778082 | 0.938330    | 2.87463 |
| 4      | 0.315628 | 0.0928      | 0.796363 | 0.940348    | 2.67392 |
| 5      | 0.365406 | 0.1353      | 0.808549 | 0.9447      | 2.51157 |
| 6      | 0.285873 | 0.1143      | 0.758115 | 0.930993    | 2.88055 |
| 7      | 0.299221 | 0.1129      | 0.78455  | 0.938665    | 2.91007 |
| 8      | 0.267241 | 0.1246      | 0.778839 | 0.936707    | 2.91136 |
| 9      | 0.2744   | 0.1381      | 0.79678  | 0.9325      | 2.2727  |
| 10     | 0.2812   | 0.1477      | 0.8074   | 0.8635      | 2.8837  |

3.5 Brain Tumor Classification

Image classification is the process of extracting data classes from multi-band raster images. There are basically three types of classification: per pixel, per subpixel, and per object. This study focuses on pixel-scale image classification [19], which can be divided into three groups: the supervised classifier (user-guided),


the unsupervised classifier [20] (calculated by the software), which are the two most common approaches, and object-based image analysis, the rarest and most recent technique, which uses high-resolution images as input. Figure 7 depicts, from various points of view, the various types of classical object classification methods available in the literature.

Fig. 7. Different types of techniques for image recognition

Ensemble of Support Vector Machine (SVM) and Probabilistic Neural Network (PNN) for Classification: The support vector machine is a very powerful tool for feature based classification. Before deep learning took over the state of the art, SVMs were among the top contenders in providing the best accuracy for varied classification challenges. The main idea behind the SVM is that it tries to set the class boundary using predefined kernels, putting more weight on the class examples near the boundaries. Together with the SVM, a Probabilistic Neural Network (PNN) is trained on the feature mapping and used in an ensemble method [30] for the final classification prediction, as sketched below.
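The following is an illustrative sketch of such an ensemble, pairing scikit-learn's RBF-kernel SVM with a simple Parzen-window PNN; the PNN implementation, sigma value, and probability averaging are assumptions, not the paper's exact formulation:

```python
import numpy as np
from sklearn.svm import SVC

class SimplePNN:
    """Minimal probabilistic neural network (Parzen-window classifier)."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(self.y)  # sorted, matching SVC.classes_
        return self

    def predict_proba(self, X):
        X = np.asarray(X, float)
        probs = np.zeros((len(X), len(self.classes_)))
        for j, c in enumerate(self.classes_):
            Xc = self.X[self.y == c]
            # Mean Gaussian kernel response of this class's pattern units
            d2 = ((X[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
            probs[:, j] = np.exp(-d2 / (2 * self.sigma ** 2)).mean(axis=1)
        return probs / probs.sum(axis=1, keepdims=True)

def ensemble_predict(X_train, y_train, X_test):
    """Average the class probabilities of an RBF-kernel SVM and the PNN."""
    svm = SVC(kernel='rbf', probability=True).fit(X_train, y_train)
    pnn = SimplePNN().fit(X_train, y_train)
    proba = (svm.predict_proba(X_test) + pnn.predict_proba(X_test)) / 2.0
    return svm.classes_[proba.argmax(axis=1)]
```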

4 Results

The tests were performed on a standard Intel i5 platform running Windows 10. For implementation, MATLAB 2018 (MathWorks) was used: the Wavelet Toolbox to develop the feature extraction algorithm, and the bio-statistics and SVM toolboxes, with extended SVM kernels, for the classification and comparison in the MR brain tumor analysis. We evaluated the accuracy with four SVM kernels (Linear (LIN), higher-order polynomial (HPOL), lower-order polynomial (IPOL), and Gaussian Radial Basis Function (GRBF)) to select the best one. The results found from the experiments showed an interesting gain in performance when SVM and


PNN are used in ensemble. The classical approaches covering DWT and other post-processing performed reasonably well; however, they need sophisticated feature extraction, normalization, reduction, and classification approaches [26–29,31,32]. With the proposed method, the MRI images are processed step by step in sequence in an efficient way to extract the optimal information, in terms of features, to make a robust and informed prediction with high accuracy. The overall result is presented in Fig. 8. It illustrates the best performance of our proposed DWT + GLCM + PCA + SVM (GRBF) + PNN system when compared with other state-of-the-art methods on our combined dataset, achieving the highest classification accuracy of 99%. The next best is the DWT + GLCM + PCA + SVM (GRBF) method with 98.75% accuracy. All of these methods follow the same feature-based principle; however, with SVM and PNN in combination we are able to exploit both the support-vector information and the class clusters in the higher-dimensional feature domain through the PNN, which made the system more comprehensive and able to achieve better results. However, the training and ensembling require more time and hardware resources compared to the others, and the ensemble method needs to be further tuned for the optimal configuration. The accuracy of the trained and tested images was measured on the basis of the classification of normal and abnormal tumorous tissues; the results of this identification are shown in Fig. 8. The features extracted along the way are also very useful for a better understanding of the biological process of tumor formation and its behavioral pattern. Classification accuracy is the ratio of correct classifications to the total number of classification tests. The classification results, rather than directly being used to diagnose patients, can also work as a suggestive system for doctors, who can make the final decision based on corresponding additional patient data. SVM and PNN capture complementary traits from the features and guide the system in combination toward more accurate results: SVM focuses more on the support vectors and individual features, whereas PNN contributes by looking more into the locally coherent features in its decision making. The ensemble method can be further explored and tuned, which is part of our future direction.

Fig. 8. Comparative results in terms of accuracy

5 Conclusion

In this study, we used brain MRI images divided into healthy brain tissue and tumorous tissue. Preprocessing is used to remove noise and smooth the image, which also contributes to an improved SNR. We then used a discrete wavelet transform that decomposes the image, extracted the characteristics from the gray-level co-occurrence matrix (GLCM), and followed with morphological operations. An SVM with RBF kernel in ensemble with a PNN is used as the classifier to identify tumors in brain MRI images with around 99% accuracy. The final accuracy is comparatively higher on average than other lightweight classical approaches, while requiring a minimal amount of positive training samples. The research results clearly indicate that diagnosing brain tumors using the proposed method can offer a quicker and more reliable solution than diagnosis only by a medical professional. The proposed method can be a helping tool for real-life practitioners for more accurate diagnosis alongside tumor location identification.

References

1. Scholl, I., Aach, T., Deserno, T.M., et al.: Challenges of medical image processing. Comput. Sci. Res. Dev. 26, 5–13 (2011). https://doi.org/10.1007/s00450-010-0146-9
2. Eklund, A., Dufort, P., Forsberg, D., LaConte, S.M.: Medical image processing on the GPU - past, present and future. Med. Image Anal. 17(8), 1073–1094 (2013). https://doi.org/10.1016/j.media.2013.05.008
3. de Siqueira, F.R., Schwartz, W.R., Pedrini, H.: Multi-scale gray level co-occurrence matrices for texture description. Neurocomputing 120, 336–345 (2013). https://doi.org/10.1016/j.neucom.2012.09.042
4. Wu, S.G., Bao, F.S., Xu, E.Y., Wang, Y., Chang, Y., Xiang, Q.: A leaf recognition algorithm for plant classification using probabilistic neural network. In: IEEE International Symposium on Signal Processing and Information Technology, pp. 11–16 (2007). https://doi.org/10.1109/ISSPIT.2007.4458016
5. Ji, Z., Liu, J., Cao, G., Sun, Q., Chen, Q.: Robust spatially constrained fuzzy C-means algorithm for brain MR image segmentation. Pattern Recogn. 47(7), 2454–2466 (2014). https://doi.org/10.1016/j.patcog.2014.01.017
6. Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015). https://doi.org/10.1109/TMI.2014.2377694
7. Işın, A., Direkoğlu, C., Şah, M.: Review of MRI-based brain tumor image segmentation using deep learning methods. Procedia Comput. Sci. 102, 317–324 (2016). https://doi.org/10.1016/j.procs.2016.09.407
8. Othman, M.F., Basri, M.A.M.: Probabilistic neural network for brain tumor classification. In: 2011 Second International Conference on Intelligent Systems, Modelling and Simulation, pp. 136–138 (2011). https://doi.org/10.1109/ISMS.2011.32
9. Abiwinanda, N., Hanif, M., Hesaputra, S.T., Handayani, A., Mengko, T.R.: Brain tumor classification using convolutional neural network. In: Lhotska, L., Sukupova, L., Lacković, I., Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018. IP, vol. 68/1, pp. 183–189. Springer, Singapore (2019). https://doi.org/10.1007/978-981-10-9035-6_33
10. Bondy, M.L., et al.: Brain tumor epidemiology: consensus from the brain tumor epidemiology consortium. Cancer 113, 1953–1968 (2008). https://doi.org/10.1002/cncr.23741
11. Anaraki, A.K., Ayati, M., Kazemi, F.: Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms. Biocybern. Biomed. Eng. 39(1), 63–74 (2019). https://doi.org/10.1016/j.bbe.2018.10.004
12. Syeed, M., Lindman, J., Hammouda, I.: Measuring perceived trust in open source software communities. In: Balaguer, F., Di Cosmo, R., Garrido, A., Kon, F., Robles, G., Zacchiroli, S. (eds.) OSS 2017. IAICT, vol. 496, pp. 49–54. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57735-7_5
13. Gupta, M., Taneja, H., Chand, L., Goyal, V.: Enhancement and analysis in MRI image denoising for different filtering techniques. J. Stat. Manag. Syst. 21(4), 561–568 (2018)
14. Kaur, D., Kaur, Y.: Various image segmentation techniques: a review. Int. J. Comput. Sci. Mob. Comput. 3(5), 809–814 (2014)
15. Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3523–3542 (2021)
16. Ramya, J., Vijaylakshmi, H.C., Saifuddin, H.M.: Segmentation of skin lesion images using discrete wavelet transform. Biomed. Signal Process. Control 69, 102839 (2021)
17. Riesaputri, D.F., Sari, C.A., Rachmawanto, E.H.: Classification of breast cancer using PNN classifier based on GLCM feature extraction and GMM segmentation. In: 2020 International Seminar on Application for Technology of Information and Communication (iSemantic), pp. 83–87. IEEE (2020)
18. Virmani, J., Singh, G.P., Singh, Y.: PNN-based classification of retinal diseases using fundus images. In: Sensors for Health Monitoring, pp. 215–242. Academic Press (2019)
19. Alvarez, M.A., Theran, C.A., Arzuaga, E., Sierra, H.: Analyzing the effects of pixel-scale data fusion in hyperspectral image classification performance. In: Algorithms, Technologies, and Applications for Multispectral and Hyperspectral Imagery XXVI, vol. 11392, p. 1139205. International Society for Optics and Photonics (2020)
20. Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., Aljaaf, A.J.: A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Supervised and Unsupervised Learning for Data Science, pp. 3–21 (2020)
21. Shihavuddin, A.S.M., Gracias, N., Garcia, R., Gleason, A.C., Gintert, B.: Image-based coral reef classification and thematic mapping. Remote Sens. 5(4), 1809–1841 (2013)
22. Shihavuddin, A.S.M., Gracias, N., Garcia, R., Escartin, J., Pedersen, R.B.: Automated classification and thematic mapping of bacterial mats in the north sea. In: 2013 MTS/IEEE OCEANS - Bergen, pp. 1–8. IEEE (2013)
23. Kociolek, M., Materka, A., Strzelecki, M., Szczypiński, P.: Discrete wavelet transform-derived features for digital image texture analysis. In: International Conference on Signals and Electronic Systems, vol. 2 (2001)
24. Ghosh, S., Das, N., Das, I., Maulik, U.: Understanding deep learning techniques for image segmentation. ACM Comput. Surv. (CSUR) 52(4), 1–35 (2019)
25. Noble, W.S.: What is a support vector machine? Nat. Biotechnol. 24(12), 1565–1567 (2006)
26. Zhang, Y.D., Wu, L.: An MR brain images classifier via principal component analysis and kernel support vector machine. Progress Electromagnet. Res. 130, 369–388 (2012)
27. Chaplot, S., Patnaik, L.M., Jagannathan, N.R.: Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed. Signal Process. Control 1(1), 86–92 (2006)
28. Zhang, Y., Wang, S., Wu, L.: A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Progress Electromagnet. Res. 109, 325–343 (2010)
29. El-Dahshan, E.S.A., Hosny, T., Salem, A.B.M.: Hybrid intelligent techniques for MRI brain images classification. Digit. Signal Process. 20(2), 433–441 (2010)
30. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
31. Varuna Shree, N., Kumar, T.N.R.: Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform. 5(1), 23–30 (2018)
32. Mathur, Y., Jain, P., Singh, U.: Foremost section study and kernel support vector machine through brain images classifier. In: International Conference of Electronics, Communication and Aerospace Technology (ICECA), pp. 559–562 (2017). https://doi.org/10.1109/ICECA.2017.8212726

Pattern Recognition and Natural Language Processing

Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets

Ferdib-Al-Islam(B), Nusrat Jahan, Farjana Yeasmin Rupa, Suprio Sarkar, Sifat Hossain, and Sk. Shalauddin Kabir

Northern University of Business and Technology, Khulna, Bangladesh
[email protected]

Abstract. The world has faced a massive health emergency due to the rapid transmission of the coronavirus (COVID-19) over the last two years. Since there is no specific treatment for COVID-19, infections have to be limited through prevention methods. Wearing a face mask is an effective preventive method in public areas. However, it is impractical to enforce such regulations manually across large areas and to trace every infraction. Automatic face mask detection facilitated by deep learning techniques provides a better alternative. This research introduces an automatic face mask detection system using the ASUS Tinker single-board computer and the MobileNetV2 model. As most publicly available face mask detection datasets were artificially generated, in this work a real face mask detection dataset was first created, consisting of a total of 300 images. The ASUS Tinker board's training and testing performance and training time have been assessed for this dataset and for a publicly accessible dataset of 1376 images. The recommended system reached 99% test accuracy, precision, recall, and f1-score for the newly collected dataset and 100% test accuracy, precision, recall, and f1-score for the publicly available dataset.

Keywords: COVID-19 · Face Mask Detection · Deep Learning · MobileNetV2 · ASUS Tinker

1 Introduction

COVID-19 emerged suddenly in 2019 and has had a worldwide impact: it has infected over 259,502,031 people globally and killed over 5,183,003 as of November 27, 2021 [1], and this figure is continuously rising. The World Health Organization (WHO) has recognized the coronavirus as primarily characterized by fever, dry cough, exhaustion, diarrhea, and loss of taste and smell [2]. Numerous prophylactic measures have been taken to counteract COVID-19. The most significant safeguards are washing hands frequently, keeping a safe distance, wearing a protective face mask, and actively avoiding touching the face, with mask-wearing being the most straightforward. COVID-19 is a contagious illness that may be prevented by correctly using a face mask. COVID-19 may be warded off by


maintaining a tight social distance and wearing masks. However, many people do not adhere to the restrictions, which contributes to the infection's spread. Identifying individuals who are not adhering to the recommendations and notifying the relevant authorities may assist in halting the transmission of the coronavirus. According to the WHO, the correct technique is to adjust the mask to cover the mouth, nose, and chin [3]. If masks are not worn properly, protection is severely decreased. Security officers are currently stationed in public areas to advise individuals to wear masks. However, this method exposes the guards to virus-infected air, causes overcrowding at entrances due to its inefficiency, and the guards themselves can transmit COVID-19 to others, as it is highly contagious. As a result, a quick and effective solution is required. Face mask detection is used to decide whether someone is wearing a mask; it is analogous to detecting any other object in an image. Deep learning algorithms are widely used in face mask detection and other image classification applications [4–7]. These algorithms can be applied to detect a mask on a person's face in real time. The main problems in face mask detection research are that most publicly available datasets were created artificially from existing face detection datasets, and that performance issues arise in real-time implementation. In this research, a face mask detection dataset of 300 images of real people has been created. Another publicly available dataset has also been used to analyze the performance of face mask detection (with face mask vs. without face mask) using the MobileNetV2 model. The trained model was deployed for real-time inference on the ASUS Tinker Single Board Computer (SBC). The paper is organized as follows: the section "Literature Review" summarizes recent research on face mask detection, the section "Methodology" details the materials and methodology used to conduct this research, the "Result and Discussion" section outlines the study's findings, and finally, the "Conclusion" section summarizes the conclusions and recommendations.

2 Literature Review

Many researchers have proposed methods to detect whether a person is wearing a mask using machine learning, deep learning, and computer vision algorithms. In this section, previous research on face mask detection and the shortcomings of those works are discussed. Nagrath et al. [8] developed a face mask detection system with SSDMNV2. In this approach, a single-shot multibox detector was used to recognize the face, and MobileNetV2 was used to perform real-time face mask detection. Their work achieved 92.64% accuracy and an F1-score of 93%. The "Properly Wearing Masked Face Detection Dataset" (PWMFD) was introduced by Jiang et al. [9]. This dataset contains a total of 9205 images in three sections. They also proposed a mask detector model known as Squeeze and Excitation (SE)-YOLOv3 and achieved a higher detection speed. Using super-resolution and classification networks (SRCNet), Qin et al. [10] designed an automatic categorization method for facemask-wearing conditions using 3835 images of a public mask dataset. The dataset was divided into three categories: no mask-wearing (671 images), correct mask-wearing (3030 images), and incorrect mask-wearing (134 images). The proposed method achieved an accuracy of


98.70%. Li et al. [11] used the CelebA and WIDER FACE databases for training and the FDDB database for evaluation with the YOLOv3 algorithm based on the Darknet-19 architecture. This system achieved 93.9% accuracy. To automatically identify face masks, Chowdary et al. [12] proposed a transfer learning model based on InceptionV3. In that method, they used the "Simulated Masked Face Dataset" (SMFD) for training and testing, consisting of 785 unmasked and 785 masked facial images, and the model achieved 99.9% accuracy. Sethi et al. [13] suggested a face mask detection model that joins single-stage and double-stage detectors to detect whether people are wearing a mask. They used a large dataset containing 45000 images and three popular deep learning models - AlexNet, MobileNet, and ResNet50. After evaluation, the recommended model with ResNet50 obtained the highest accuracy of 98.2%. Kayali et al. [14] proposed a face mask detection system using two popular deep learning networks, ResNet50 and NASNetMobile. The "Labeled Faces in the Wild" (LFW) dataset, augmented with face masks, was used in their proposed system, and the ResNet50 model achieved 92% face mask detection accuracy.

3 Methodology

The significant steps in the implementation process are as follows:

• ASUS Tinker SBC Preparation
• Dataset Creation and Preprocessing
• Image Augmentation
• Training of MobileNetV2 and Classification

3.1 ASUS Tinker SBC Preparation

The ASUS Tinker SBC is part of a new generation of more capable maker tools. As a single-board computer, it provides a builder with many alternatives: it can programmatically manipulate hardware and run customized operating systems for particular purposes. ASUS's Tinker Board is a significant player in the SBC market. Emulation is much smoother with its Rockchip RK3288 SoC and Mali-T764 GPU, paired with 2 GB of DDR3 memory. The board also offers a non-shared GBit LAN port for increased performance, an upgradeable embedded shielded Wi-Fi module for reliable IoT and network connectivity, HD Audio (192 kHz/24-bit), and accelerated HD and UHD (4K) video playback. Tinker Board Debian OS (V2.1.11) was used as the operating system in this research [15]. The necessary programming environment - TensorFlow, Keras, Imutils, OpenCV, Scikit-learn, Matplotlib, and Seaborn - was then installed on the system.

3.2 Dataset Creation and Preprocessing

Two different face mask detection datasets have been used in this research to analyze the ASUS Tinker SBC and the MobileNetV2 model. One of the datasets has been created


Fig. 1. Sample images of (a) people with face masks and (b) without face masks from the created dataset

by the authors of this paper. This dataset is available for download upon request via Zenodo [16]. The dataset encompasses 300 authentic images of two classes (150 images of people with face masks and 150 without). The other dataset contains 1376 images (690 of people with face masks and 686 without), made available by Prajna Bhandary on GitHub [17]. She developed the dataset by collecting normal photos of faces and masking them using a programming script after detecting facial landmarks. Sample images from both datasets are illustrated in Fig. 1.


In the preprocessing step, the images were resized to 224 × 224 pixels. The images were then converted into 8-bit integers and stored in an array, and normalization was performed so that pixel values were scaled between -1 and 1. One-hot encoding was applied to the image labels. The datasets were partitioned using the percentage split technique with a training-to-test ratio of 80:20, and the validation set was constructed from the test set. A minimal sketch of this pipeline is given after Table 1.

3.3 Image Augmentation

The term "image augmentation" denotes a variety of strategies for generating "new" training examples from existing ones by introducing random perturbations and distortions (without altering the image's class labels) [18]. Generally, a deep learning model performs well when fed a large quantity of data, so image augmentation is an effective strategy when a large volume of data is unavailable for developing a deep learning model. There are generally five different techniques for performing image augmentation. During training, on-the-fly image augmentation was applied to both datasets to increase generalization using the Keras "ImageDataGenerator". The parameters of image augmentation are enumerated in Table 1.

Table 1. Parameter details of image augmentation

Parameter Name       Selected Value
rotation_range       20
shear_range          0.15
zoom_range           0.15
height_shift_range   0.2
width_shift_range    0.2
horizontal_flip      True
fill_mode            "nearest"
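As a rough illustration of the preprocessing and augmentation pipeline just described, the following Keras sketch applies the Table 1 parameters; the dataset folder layout and the use of MobileNetV2's preprocess_input for the [-1, 1] scaling are assumptions, not details stated by the authors.

import os
import numpy as np
from imutils import paths
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from tensorflow.keras.utils import to_categorical

data, labels = [], []
for image_path in paths.list_images("dataset"):           # hypothetical root folder
    labels.append(image_path.split(os.path.sep)[-2])      # class name from subfolder
    image = load_img(image_path, target_size=(224, 224))  # resize to 224 x 224
    data.append(preprocess_input(img_to_array(image)))    # scale pixels to [-1, 1]

data = np.array(data, dtype="float32")
labels = to_categorical(LabelBinarizer().fit_transform(labels))  # one-hot labels

# 80:20 percentage split, as stated in the text
(train_x, test_x, train_y, test_y) = train_test_split(
    data, labels, test_size=0.20, stratify=labels, random_state=42)

# On-the-fly augmentation with the Table 1 parameters
aug = ImageDataGenerator(rotation_range=20, shear_range=0.15, zoom_range=0.15,
                         height_shift_range=0.2, width_shift_range=0.2,
                         horizontal_flip=True, fill_mode="nearest")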

3.4 Training of MobileNetV2 and Classification

The MobileNetV2 architecture is a variant of convolutional neural networks generally used on mobile devices [19]. The MobileNetV2 design depends on an inverted residual structure, with the residual blocks' input and output being thin bottleneck layers. Additionally, it employs lightweight depthwise convolutions to filter features in the expansion layer. Finally, non-linearities are eliminated in the narrow layers. The architecture of MobileNetV2 is shown in Fig. 2. In this research, a fine-tuned MobileNetV2 model has been used without retraining the whole model. Fine-tuning is accomplished by unfreezing some of the top layers of a fixed model base and simultaneously training the new auxiliary classifier layers and the base model's final layers. It permits


Fig. 2. Architecture of MobileNetV2 [19]

“fine-tuning” of the underlying model's higher-order extracted features to make them more appropriate for the precise task. The old head of MobileNetV2 was replaced by constructing a new fully-connected (FC) head and appending it to the base; the base layers of MobileNetV2 were then frozen. As a result, the weights of the base layers are not changed during backpropagation, while the weights of the head layers are adjusted. In this work, the MobileNetV2 architecture was modified by adding a pooling layer with a 7 × 7 kernel using an "AveragePooling2D" layer, which was connected to a flatten layer. A dense layer with "relu" activation was appended next, and a dropout of 0.5 was chosen to prevent the model from overfitting. A fully-connected dense layer with "softmax" activation was used for classification. The MobileNetV2 model was instantiated pre-loaded with weights trained on ImageNet. For training the head of MobileNetV2, the batch size was 32, the learning rate was 0.0183, and the number of epochs was 10; the Adam optimizer was chosen for optimization, and binary cross-entropy was used as the loss function. A sketch of this model construction is given below.
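Continuing the preprocessing sketch above, a minimal construction of the modified MobileNetV2 described in this subsection might look as follows; the 128-unit size of the dense layer is an assumption, since the text does not state it.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Base model pre-loaded with ImageNet weights, original classifier head removed
base = MobileNetV2(weights="imagenet", include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))

# New fully-connected head appended to the base
head = AveragePooling2D(pool_size=(7, 7))(base.output)
head = Flatten()(head)
head = Dense(128, activation="relu")(head)    # 128 units is an assumed size
head = Dropout(0.5)(head)                     # dropout of 0.5 against overfitting
head = Dense(2, activation="softmax")(head)   # with_mask vs. without_mask

model = Model(inputs=base.input, outputs=head)
for layer in base.layers:                     # freeze the base layers
    layer.trainable = False

model.compile(loss="binary_crossentropy",
              optimizer=Adam(learning_rate=0.0183),  # hyperparameters from the text
              metrics=["accuracy"])
model.fit(aug.flow(train_x, train_y, batch_size=32),
          validation_data=(test_x, test_y), epochs=10)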

4 Result and Discussion

This system's training, testing, and inference were accomplished on the ASUS Tinker SBC. The implemented system's performance was evaluated using distinct performance metrics - accuracy, precision, recall, and f1-score. The comprehensive classification report of the MobileNetV2 model for both datasets is presented in Table 2. The training time was 358 s for Prajna Bhandary's dataset [17] and 301 s for the dataset collected by this paper's authors. The accuracies for the datasets presented in [17] and [16] were 100% and 99%, respectively.

Table 2. Classification report of MobileNetV2 model for both datasets

Dataset Name                           Class Name     Training Time (sec.)  Accuracy (%)  Precision (%)  Recall (%)  F1-Score (%)
Dataset by Prajna Bhandary [17]        with_mask      358                   100           100            100         100
                                       without_mask                                       100            100         100
Dataset by authors of this work [16]   with_mask      301                   99            98             100         99
                                       without_mask                                       100            98          99


Fig. 3. Loss vs. accuracy graphs of the training and validation sets for (a) Prajna Bhandary's dataset and (b) the authors' collected dataset


The training reports (loss vs. accuracy) are shown in Fig. 3. The loss and accuracy for the training and validation sets were observed for ten epochs, by which point the accuracy had reached its maximum. The implementation of the system is illustrated in Fig. 4, using an A4Tech web camera and the ASUS Tinker SBC. The web camera was connected to the ASUS Tinker SBC via USB 2.0.

Fig. 4. Proposed system's experimental hardware setup (A4Tech USB web camera connected to the ASUS Tinker SBC)

The average inference time for video was ~1.6 s. A sample inference from the implemented system is illustrated in Fig. 5 for both conditions, with confidence scores. The performance of the MobileNetV2 model on both datasets is compared to the previous works in Table 3. It can be seen that the proposed system with the frozen MobileNetV2 model performed better than the previous works.


Fig. 5. Real-time inference on ASUS Tinker SBC for (a) a person with a face mask, and (b) a person without a face mask

Table 3. Comparison of the proposed system with previous works

Author              Method                  Dataset Size  Accuracy (%)
Nagrath et al. [8]  SSDMNV2 + MobileNetV2   5521          92.64
Qin et al. [10]     SRCNet                  3835          98.7
Li et al. [11]      YOLOv3 + DarkNet-19     5171          93.9
Sethi et al. [13]   ResNet50                45000         98.2
Kayali et al. [14]  ResNet50                13233         92
This Work           MobileNetV2             1376          100
This Work           MobileNetV2             300           99

5 Conclusion

The COVID-19 pandemic has compelled most nations to mandate wearing face masks, and manually monitoring face mask compliance in crowded areas is a demanding duty. A system for detecting whether a person is wearing a face mask would greatly assist the authorities. The suggested embedded vision-based system may be used in any setting, including public places, stations, corporate environments, streets, retail malls, and test centers, where precision and sensitivity are critical for the task at hand. This research addresses the gaps in previous studies with superior performance (99% test accuracy, precision, recall, and f1-score for the authors' created dataset and 100% for a publicly obtainable dataset) and with the creation of a real-world dataset, which can be helpful in face mask detection research. In the future, the dataset size can be increased by collecting natural images of people, and the system can be adapted to detect incorrectly worn masks. Introducing the Internet of Things (IoT) can allow the authorities responsible for enforcing mask-wearing to do so remotely.

References

1. COVID Live - Coronavirus Statistics - Worldometer. https://www.worldometers.info/coronavirus/
2. Ferdib-Al-Islam, Ghosh, M.: COV-doctor: a machine learning based scheme for early identification of COVID-19 in patients. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds.) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol. 95, pp. 39–50. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-6636-0_4
3. When and how to use masks. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public/when-and-how-to-use-masks
4. Saha, P., et al.: COV-VGX: an automated COVID-19 detection system using X-ray images and transfer learning. Inf. Med. Unlocked 26, 100741 (2021)
5. Islam, M., et al.: A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inf. Med. Unlocked 20, 100412 (2020)


6. Muhammad, L.J., Islam, M.M., Usman, S.S., Ayon, S.I.: Predictive data mining models for novel coronavirus (COVID-19) infected patients' recovery. SN Comput. Sci. 1(4), 1–7 (2020). https://doi.org/10.1007/s42979-020-00216-w
7. Liu, L., et al.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2019). https://doi.org/10.1007/s11263-019-01247-4
8. Nagrath, P., et al.: SSDMNV2: a real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 66, 102692 (2021)
9. Jiang, X., et al.: Real-time face mask detection method based on YOLOv3. Electronics 10(7), 837 (2021)
10. Qin, B., Li, D.: Identifying facemask-wearing condition using image super-resolution with classification network to prevent COVID-19. Sensors 20(18), 5236 (2020)
11. Li, C., Wang, R., Li, J., Fei, L.: Face detection based on YOLOv3. In: Jain, V., Patnaik, S., Popențiu-Vlădicescu, F., Sethi, I.K. (eds.) Recent Trends in Intelligent Computing, Communication and Devices. AISC, vol. 1006, pp. 277–284. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9406-5_34
12. Jignesh Chowdary, G., Punn, N.S., Sonbhadra, S.K., Agarwal, S.: Face mask detection using transfer learning of InceptionV3. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds.) BDA 2020. LNCS, vol. 12581, pp. 81–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66665-1_6
13. Sethi, S., et al.: Face mask detection using deep learning: an approach to reduce risk of Coronavirus spread. J. Biomed. Inform. 120, 103848 (2021)
14. Kayali, D., et al.: Face mask detection and classification for COVID-19 using deep learning. In: 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1–6 (2021)
15. Tinker Board | Single Board Computer | ASUS Sri Lanka. https://www.asus.com/bd/Motherboards-Components/Single-Board-Computer/All-series/Tinker-Board/
16. Ferdib-Al-Islam, et al.: Face Mask Detection Dataset (2021). https://doi.org/10.5281/zenodo.5305989
17. observations/experiements/data at master · prajnasb/observations. https://github.com/prajnasb/observations/tree/master/experiements/data
18. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019). https://doi.org/10.1186/s40537-019-0197-0
19. Sandler, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. IEEE, Salt Lake City (2018)

Fake Profile Detection Using Image Processing and Machine Learning

Shuva Sen, Mohammad Intisarul Islam, Samiha Sofrana Azim, and Muhammad Iqbal Hossain(B)

Department of Computer Science and Engineering, BRAC University, Mohakhali, Dhaka 1212, Bangladesh
{shuva.sen,mohammad.intisarul.islam,samiha.sofrana.azim}@g.bracu.ac.bd, [email protected]

Abstract. In today's technologically evolved society, almost everyone has a presence on social media, which makes creating phony accounts incredibly simple. An account that poses as someone else is referred to as a "fake profile." These accounts are mostly used to malign somebody by impersonating them. Nevertheless, a phony profile can also be utilized for a number of purposes, such as inciting regional tensions, propagating false information, and publishing provocative material involving current sensitive issues. Because such profiles pose a major hazard to everyone, a model is suggested that can help reduce the number of fraudulent profiles. It can reliably pinpoint users who may be fraudulent, notably those without a personal image. In the suggested methodology, machine learning and image recognition were both employed to guarantee that each user has a distinct profile. The goal of this concept is to prevent users from opening accounts with someone else's image or personally identifiable information. To prevent someone from using another user's photograph or an image of an object, the model additionally uses an image processing service for face recognition. To prevent fictitious individuals from opening an account using someone else's identity, One Time Password (OTP) technology was deployed. A real dataset of respondents' survey answers was used to characterize fake accounts. The k-means clustering algorithm was applied to the dataset to separate inaccurate records, and an accuracy of 75.30% was obtained.

Keywords: Fake profile · photo identification · cyber crime · Social media

1 Introduction

In the era of the internet, social networks such as Facebook, Instagram, Twitter, and LinkedIn have become a constant necessity for our generation, allowing us to connect with people all over the world quickly and easily. Everyone, from


millennials to the elderly, is using online social networking sites to improve their lives. They use these social networking platforms to discover and make friends by making and posting personal memories, photos, videos, and chats. Students, for example, use social networking features to acquire more expertise and equip themselves with a variety of specialties. Tutors use online social networks to connect with their students and help them in their learning process. Many businesses have used a variety of websites to market and sell their products and services online. Public agencies use social media to effectively provide government programs and keep citizens updated on different circumstances. Every kind of person relies on social networking platforms for numerous reasons. Hence, to connect with other people, they need to share their personal information to create a profile, which is available on social networks. Facebook is acclaimed as the most prominent and commonly utilized social network in the world, with more than 2.4 billion active members per month. The exchange of messages, photographs, and comments enables connections between people all over the world. Individuals use Facebook all around the world to communicate with others for their own purposes. Any good practice, though, comes with its own set of issues. Unfortunately, people often use Facebook unacceptably. They create accounts using other people's information with a tendency toward harassment, spreading fake news, and creating panic among people. They even aim to embarrass celebrities by making profiles with their names and personal details. According to Facebook's study, the social media platform deleted 5.4 billion false profiles in 2019. According to Facebook, about 5% of monthly active users were fraudulent, and one out of every ten likes on a Facebook post might be a response from a false account. In this work, a deep learning algorithm was used to detect existing fake accounts, and image processing together with a one-time password (OTP) was used to prevent fake users from creating new ones. The k-means algorithm was applied to distinguish fake data from a real dataset by calculating Euclidean distances and modifying the centroid points.

1.1 Motivation

The various social platforms have become an integral aspect of people's everyday lives. In the modern era, everyone's public interaction has evolved to being connected through online interpersonal networks. These platforms have brought about a significant change in the way we track our civic activity. It has become easier to add new companions and keep in contact with them and their posts. Almost everyone has an account on various social networks, and sometimes more than one account or accounts on different platforms. As applications and their utilization increase in our daily lives, we continuously post unwanted and careless content on social networks and create a mess on these platforms. As a result, creating a fictitious account is quite easy. There are over 200 million fake Facebook profiles under female names. Controlling these accounts is therefore difficult, and money laundering, smuggling, and other anti-social activities result from these false accounts. Furthermore, these fictitious identities are being used to


spread rumors, hate speech, or even post pictures or thoughts of other people without their permission. In this work, we consider false accounts and individuals who use social platforms to launder money, and aim to put a stop to such a serious situation. Nobody has yet come up with a feasible solution to problems like fake profiles and online impersonation. We hope to include a mechanism in this project that allows for the automated identification of false accounts, ensuring that people's social lives are protected. We also hope that this automatic detection strategy will make it easy for sites to handle the large number of profiles that cannot be handled manually. We hope to reduce fake accounts so that no fake news or information can be spread from them. It will also help to reduce identity theft.

1.2 Problem Statement

Fake profiles are being used to spread rumors and hate speech and to post pictures and thoughts of other people without their permission. This type of action entails the mass production of false identities to launch an online assault on social media. We will also create multiple accounts in our app for comparison. For this, a database will be introduced where all the information about current users is stored. The identification focuses on the users' Facebook behavior, their interactions with other users, and their news feed information. We also use image recognition to determine whether or not different accounts have the same images. By comparing account features against dataset features from the database using image processing and algorithms, we focus on which features are missing or different in the fake accounts.

1.3 Objective and Contributions

The preliminary plan of this research is to detect fake accounts on various social media platforms, starting with Facebook. The steps are as follows:

i) First, construct a database with some fundamental information about the users to distinguish the characteristics of real accounts.
ii) Use image processing to classify the images being used to open a new account.
iii) Use the face detection and object detection features of OpenCV to do so.
iv) Use the k-means algorithm to divide our database into different clusters.
v) Lastly, add a one-time password (OTP) system to assure additional security for the accounts.

In today's world, the threat of cybercrime is growing along with the number of users on various social media platforms. One of the simplest attacks is to share erroneous data or to construct a profile of a random individual and disparage them. Our contributions are listed below:


i) Working to address potential threats and vulnerabilities brought on by duplicate and incorrectly categorized accounts.
ii) Smoking out bogus accounts in addition to preventing individuals from creating identical accounts.

2 Background

2.1 Literature Review

The algorithms used in related proposed models include supervised ML, map reduction, pattern recognition approaches, and an unsupervised two-layer meta-classifier method, together with the PCA algorithm, SMOTE, medium Gaussian SVM, and logistic regression, among various classifier algorithms. In one such model, linear SVM gives 95.8% accuracy, medium Gaussian SVM provides 97.6%, and logistic regression gives 96.6% [5]. Random forest along with C4.5 and adaptive boosting with decision stump are used as a second classifier, in case the accuracy of the first classifier is less effective [2]. The ROC curve (Receiver Operating Characteristics curve) has been generated to measure the classifiers' performance, along with other metrics such as precision, recall, F1-score, etc. Supervised machine learning algorithms are used to dig out fake profiles. There is one more algorithm, a skin detection algorithm, which has been applied to find decent pictures from account holders [3]. If any portrayal contains a human face, it undergoes skin detection, where the percentage of skin present in the image is computed. Using all these algorithms, 80% accuracy was obtained from ML, the rest of the classifiers have 60–80% accuracy, and the error rate is 20% [6]. Another paper proposes SVM, a neural network, SMOTE-NC, and Naive Bayes with Gaussian distribution [1]. To detect robotic accounts and to differentiate and check the effectiveness of the executed techniques, precision, recall, and F1-score are valued in the evaluation metric. In the proposed model, fake accounts are divided into two sections: user-unclassified accounts and undesirable accounts [13]. User-unclassified accounts are personal profiles created by users for a company, organization, or non-human entity such as a pet [14]. User profiles that break Facebook's terms of service, including spamming, intentionally and for specific purposes, are undesirable accounts [15]. A set of 17 attributes is named and measured, which dictates the actions and behavior of Facebook users. These attributes are then fed as input in setting up learning models. Machine learning algorithms are divided into two major groups: 1) supervised and 2) unsupervised [8]. K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms, where the input data have unlabeled responses and presumptions are made from the dataset using only input vectors. It performs an iteration that partitions the dataset into K pre-determined, independent, well-separated clusters where each data point is a member of only one group [9]. Through a deterministic global search process that includes N (where N is the size of the data set) executions of the k-means algorithm from suitable initial positions, it dynamically adds one cluster center at a


time. It assembles similarities to form intra-cluster data points and differentiates clusters on the basis of dissimilarities. If k is small, it produces tighter clusters than hierarchical clustering. Its output is strongly impacted by the initial inputs: the number of clusters and the order of the data have a strong impact on the final output [10]. Therefore, the arbitrary selection of the initial centroids has a significant impact on the quality of the k-means algorithm's ultimate clustering outcomes. It requires the specification of the number of cluster centers, is very sensitive to re-scaling, and is unable to handle non-linear datasets, noisy data, and outliers. The Best First Search algorithm is an informed search and traversal technique that uses both a priority queue and a heuristic function to find the most promising node [11]. Best-first searches require a great deal of bookkeeping to keep track of all candidate nodes. The algorithm uses two lists for tracking the traversal and searching the graph space: 1) OPEN and 2) CLOSED. The nodes that are currently open for traversal are listed in "OPEN," whereas the nodes that have already been traversed are listed in "CLOSED." The algorithm traverses the most promising path first in the queue; its time complexity is O(n log n). First, the two empty lists OPEN and CLOSED are created. Then we start from the initial node and put it in the ordered OPEN list [12]. If the OPEN list is empty, the loop exits and returns "False". The first node N in the OPEN list is then chosen and moved to the CLOSED list. If N is a goal node, it is added to the closed list and the loop ends by returning "True"; if N is not a goal node, it is expanded to create its successor nodes, which are all added to the OPEN list. Finally, the nodes in the OPEN list are rearranged in ascending order using an evaluation function f(n). A rough sketch of this search loop is shown below.
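As a generic illustration of that loop (not the Weka implementation actually used for attribute selection in this work), the following sketch assumes a successor map and an evaluation function f(n):

import heapq

def best_first_search(successors, f, start, is_goal):
    """Generic best-first search. `successors` maps a node to its children,
    `f` is the evaluation function, `is_goal` tests for a goal node."""
    open_list = [(f(start), start)]          # OPEN: frontier ordered by f(n)
    closed = set()                           # CLOSED: already-expanded nodes
    while open_list:                         # empty OPEN list -> return False
        _, node = heapq.heappop(open_list)   # most promising node first
        if node in closed:
            continue
        closed.add(node)                     # move the node to CLOSED
        if is_goal(node):
            return True
        for child in successors.get(node, []):   # expand N, enqueue children
            if child not in closed:
                heapq.heappush(open_list, (f(child), child))
    return False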

3 Dataset Description

To prevent identity theft, information about Facebook profiles, features, and privacy policies had to be gathered. A survey form consisting of 25 questions helped us collect these data. It was posted throughout social media and Facebook groups for research purposes. After 3–4 weeks, a total of 505 responses were gathered. The details that people provided in the survey form were completely protected out of consideration for their privacy. After data processing, all identifying information was encoded immediately. Names, identification numbers, and email IDs were maintained on a protected local server available only to the team. After the last influx of data is processed, the collected information will be destroyed. To access any part of the dataset, designated users need approval from the team. The questions are shown in Table 1.


Table 1. List of questions

1. How frequently you change your profile picture?
2. How much time do you spend using social media every day?
3. How many FRIENDS do you have?
4. How often do you comment on others activities?
5. How many likes do you get on your posts (On average)?
6. How many comments do you get on your posts each week (On average)?
7. How many Picture album do you have?
8. How many videos do you have in your account?
9. How many artists do you follow on Facebook?
10. How many Facebook groups are you a member of?
11. How often do you post status updates on Facebook?
12. Are your Facebook posts public or private?
13. Do you use the video chat option for Facebook messaging?
14. What did you use to create your account?
15. How often you visit the links you see on Facebook?
16. Do you use Facebook app to use your account?
17. Why do you use your Facebook account for?
18. How many other apps/sites connected with your Facebook account?
19. How often do you share posts of others?
20. How many friend request you sent per week?
21. How many unknown message requests you get from others each week?
22. Do you use your real name/picture on Facebook?
23. Which of the following apps/games you have played?
24. How often you watch live streaming on Facebook?
25. Do you keep your Facebook accounts locked for unknown person?

Then, all the possible answers from the form to assign values for the responses from the sheet were saved there. While checking the responses, there were many responses where people shared their views on those questions differently. Along with the provided possible options, the individual responses are being taken and assigned numerical values. Also, in some parts where their views on the particular questions were similar, assigned a unique numerical value. This is how preprocessing of the data is done to use for implementation.

4 Proposed Model

Following the preprocessing of the data, the next task was to run the dataset through a suitable algorithm to determine the ratio of false to true data. Since the purpose is to cluster the dataset into one group and gather all the data in one region so that additional test data can be compared against it, it is necessary to find out whether the system can distinguish fake from real under the aegis of machine learning. Looking at the classification of clustering algorithms, the k-means algorithm was found suitable for this research. The value of K usually determines the number of clusters; however, only one cluster is needed to store the data in one category for detecting fake accounts. The algorithm maintains a centroid point and dynamically adds one cluster center at a time using a deterministic global search method made up of N k-means executions starting from appropriate initial positions [9]. The calculation is done based on the Euclidean distance between two points, and the distance is stored simultaneously for every data point. The arbitrary selection of the starting centroid has a massive impact on the quality of the k-means algorithm's ultimate clustering results [10]. Our dataset contains 25 questions, to which 505 responses were given. The 505 responses were used as training data, with the first data point serving as the initial centroid. Then, the Euclidean distance was measured for each piece of data and the value saved in an array; the centroid point was also updated simultaneously for each data point contained in the cluster. The aim of storing the distance for each piece of data in an array is to determine the maximum distance, which can be used as a threshold. The most recent centroid is also maintained for each data point. After obtaining the most recent centroid and the maximal distance for the training dataset, the test data were checked to see whether they were true or false. Each test data point was compared against the most recent centroid: if the distance is greater than the threshold, the data is fake; otherwise, if the new distance is smaller than the threshold, the data is genuine and is stored in the cluster (Fig. 1). A minimal sketch of this thresholding scheme is given below. Since 25 attributes were used, the sophistication of the algorithm and the program's runtime are both relatively high, which also had an impact on the quality of the results. So Weka was used to reduce and search the modified attributes. Among the simple algorithms for selecting attributes in Weka, two suitable ones are the best first algorithm and the greedy stepwise algorithm. The program was initially tested with both. Upon running, the best first algorithm provided nine attributes, and the greedy stepwise algorithm provided six attributes. The best first search algorithm was chosen for the dataset for greater efficiency. After that, the nine attributes were taken to create a new dataset to run through the algorithm. A start index and an end index were also added to the program to handle any specific index and to check the limited attributes, finding the centroid and Euclidean distance respectively.
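Under the stated procedure (first response as the initial centroid, a running centroid update, and the maximum training distance used as the threshold), a minimal NumPy sketch could read as follows; the variable names and the running-mean update rule are illustrative assumptions.

import numpy as np

def fit_single_cluster(train):
    """Single-cluster scheme: the centroid starts at the first data point,
    is updated for every training response, and the largest Euclidean
    distance observed during training becomes the threshold."""
    centroid = train[0].astype(float)
    distances = []
    for i, x in enumerate(train, start=1):
        distances.append(np.linalg.norm(x - centroid))  # Euclidean distance
        centroid += (x - centroid) / i                   # running centroid update
    return centroid, max(distances)                      # threshold = max distance

def is_fake(sample, centroid, threshold):
    """A test response farther from the centroid than the threshold is
    flagged as fake; otherwise it is treated as genuine."""
    return np.linalg.norm(sample - centroid) > threshold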


Fig. 1. Steps of fake account detection system

4.1 Image Processing

Nowadays, one of the most active topics in the tech industry is image processing. It has many branches, and face recognition, the main focus of this project, is one of them. The steps of face recognition and how the recognition flow works are shown in Figs. 2 and 3, respectively.

Fig. 2. Steps of face recognition system applications


Fig. 3. Face Recognition processing flow

First of all, the user inputs an image. It is then checked whether the image contains an object (such as a bird, airplane, table, horse, chair, cow, dining table, bus, motorbike, dog, sheep, or sofa) or a human face. This detection is done by a face detection model. Figure 4 shows the system detecting the picture as a car, in which case the user is denied further access, because the user must provide an image of a human face. Afterward, as in Fig. 5, if the image is of a human face, the server tries to find a match in the dataset of images (Fig. 6). A minimal sketch of the face-detection step is given after Fig. 6.

Fig. 4. Object detection

Fig. 5. Recognizing face using OpenCV


Fig. 6. Workflow of image processing
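As a rough sketch of the face-versus-object gate in this flow, the following uses OpenCV's bundled Haar cascade as the face detector; the actual detection models used by the authors are not specified, so this is only one plausible realization.

import cv2

def contains_face(image_path):
    """Return True if at least one human face is detected in the image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError("could not read image: " + image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # cascades operate on grayscale
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0   # no face (e.g., a car) -> deny further access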

4.2 One Time Password (OTP)

The user will be granted or denied access depending on whether the image matches any other image in the database. If matched, an OTP (one-time password) is sent to the email address of the person whose image matched, to authenticate the user, as shown in Fig. 7. For instance, assume the image matches the image of person X; an OTP is then sent to that person's registered email address. There can be a scenario where he or she wants a new account, which depends solely on that person: at their discretion, they can continue to open the account.


The user retrieves the OTP and inserts it into the prompt to authenticate their identity and obtain access. But if a trespasser tries to open an account with someone else's image already in the database, they are stopped in their tracks: without the OTP, which is only available to the actual possessor of the account, they cannot proceed.

Fig. 7. Input OTP from email

On the other hand, if the image does not match any image in the dataset, access is approved without further complication. A real demand for personal security is met by this defense system. If an adversary tries to open an account using someone else's identity, he or she will no longer be successful; but if somebody wants to have multiple accounts, this process maintains their privacy and allows them to do so. The image that the user supplies is matched against the previous images in the database, and an OTP is sent to the email address of the person whose image matched to authenticate the user. If it is the same user, he or she can enter the OTP and continue opening the account; without the OTP, which is only available to the actual possessor of the account, a new account cannot be opened (Fig. 8). A minimal sketch of the OTP step is given below.
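A minimal sketch of OTP generation and verification follows; the email delivery and the image-matching service are assumed to exist elsewhere in the system.

import secrets

def generate_otp(length=6):
    """Generate a random numeric one-time password."""
    return "".join(secrets.choice("0123456789") for _ in range(length))

def verify_otp(entered, issued):
    """Constant-time comparison of the entered code with the issued one."""
    return secrets.compare_digest(entered, issued)

# Hypothetical flow: when an uploaded image matches an existing account,
# an OTP is generated and emailed to that account's address.
issued_otp = generate_otp()
# send_email(matched_account_email, issued_otp)   # delivery mechanism assumed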


Fig. 8. Workflow of OTP

5 Implementation and Result Analysis

Once the whole process has been completed, the next step is to determine how accurately the algorithm works. Since no suitable dataset was available, a dataset was created for this research; a comparative analysis is therefore not possible, as there is no prior work on this dataset. To find the percentage, some fabricated data were added to the dataset to check the algorithm and measure the accuracy of the research. For binary classification, accuracy can be calculated in terms of positive and negative data. We used the standard performance evaluation equation to measure the accuracy of our work [16].


Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn)    (1)

To find the accuracy, a few more data points were added as test data so that the above equation could be applied. 480 data points were assigned as the training dataset. From the test data, we fabricated 25 data points as true positives and 36 as true negatives, and then added 10 false positives and 10 false negatives. Substituting these counts into Eq. 1 gives (25 + 36) / (25 + 36 + 10 + 10) = 61/81 ≈ 75.30%. For a larger dataset, the accuracy is expected to rise above the current value. There will be periodic re-evaluation by collecting the data of existing accounts.

6 Analysis of Image Processing

Moreover, to upgrade the performance of human face detection, we plan to add improvements such as color processing and edge detection. The human face detection rates are given in Table 2.

Table 2. Human Face detection rate

Human face detection is based on five cases, and the average time of this detection is 2–4 s. This procedure was executed on an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz, 2712 MHz, 2 cores, 4 logical processors. In the future, our goal is to acquire more accurate and precise results by using more advanced face detection methodologies and libraries.

7 Conclusion

To detect fake accounts, a work plan for the proposed solution has been developed. Data have been gathered and preprocessed into a dataset. The best algorithm was then chosen to distinguish between true and false accounts; to identify fake accounts, the relevant data must be clustered. K-means clustering was chosen for implementation because it is more accurate than the other algorithms for our proposed solution. The algorithm then evaluates the results


and accuracy rate. Upon running this algorithm on our dataset, the accuracy is 75.30%; for a larger dataset, the accuracy should increase. Image processing then contributes by gathering images and distinguishing between true and false profiles. Finally, after matching the user's data with the database, a one-time password is sent to the user for authentication. By doing so, false accounts will be identified and identity fraud can be avoided. The model will help to reduce the number of fake accounts and the vast amount of trouble they can cause. Image processing and OTP are planned to run together on a website, as the motive is to stop users from creating fake accounts while leaving genuine users unaffected. There is a chance that genuine users might want to create more than one account. To make this possible, an OTP is generated: if a user already has an account and wants to create another, the user's image is first checked, and then an OTP is sent to the email address of the first created account for verification. By doing this, no fake accounts can be created. In the future, the goal is to add more features to help us detect the image. In addition, the OTP will be sent to the user's phone number so that the password can be obtained more easily. Finally, our model will be gradually applied to various other social media such as LinkedIn, Instagram, and Twitter.

References

1. Akyon, F.C., Kalfaoglu, M.E.: Instagram fake and automated account detection. In: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–7. IEEE (2019)
2. Chen, Y.-C., Wu, S.F.: FakeBuster: a robust fake account detection by activity analysis. In: 2018 9th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 108–110. IEEE (2018)
3. Cresci, S., et al.: Fame for sale: efficient detection of fake Twitter followers. Decis. Support Syst. 80, 56–71 (2015)
4. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recognit. 36(2), 451–461 (2003)
5. Mohammadrezaei, M., Shiri, M.E., Rahmani, A.M.: Identifying fake accounts on social networks based on graph analysis and classification algorithms. Secur. Commun. Netw. 2018 (2018)
6. Smruthi, M., Harini, N.: A hybrid scheme for detecting fake accounts in Facebook. Int. J. Recent Technol. Eng. (IJRTE) 7, 5S3 (2019)
7. Yedla, M., Pathakota, S.R., Srinivasa, T.M.: Enhancing K-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 1(2), 121–125 (2010)
8. Garbade, M.J.: Understanding K-means in machine learning. Towards Data Science, 18 September 2018
9. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recognit. 36(2), 451–461 (2003). https://doi.org/10.1016/s0031-3203(02)00060-2
10. Yedla, M., Pathakota, S.R., Srinivasa, T.M.: Enhanced K-means clustering algorithm with improved initial center. Int. J. Sci. Inf. Technol. 1(2), 121–125 (2010)
11. Berliner, H.: The B* tree search algorithm: a best-first proof procedure. Artif. Intell. 12(1), 23–40 (1979). https://doi.org/10.1016/0004-3702(79)90003-1
12. Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of A*. J. ACM 32(3), 505–536 (1985). https://doi.org/10.1145/3828.3830
13. Gupta, A., Kaushal, R.: Towards detecting fake user accounts in Facebook. In: ISEA Asia Security and Privacy (ISEASP) 2017, pp. 1–6. IEEE (2017)
14. Llorens, F., Mora, F.J., Pujol, M., Rizo, R., Villagra, C.: Working with OpenCV and Intel image processing libraries. Processing image data tools
15. El Azab, A., Idrees, A.M., Mahmoud, M.A., Hefny, H.: Fake account detection in Twitter based on minimum weighted feature set. Int. Sch
16. Shajihan, N.: Classification of stages of Diabetic Retinopathy using Deep Learning

A Novel Texture Descriptor Evaluation Window Based Adjacent Distance Local Binary Pattern (EADLBP) for Image Classification

Most. Maria Akter Misti1, Sajal Mondal1, Md Anwarul Islam Abir1(B), and Md Zahidul Islam2

1 Department of CSE, Green University of Bangladesh, Dhaka, Bangladesh
[email protected]
2 Department of Information and Communication Technology, Islamic University, Kushtia, Bangladesh

Abstract. In this research, we propose a novel distance-based texture descriptor, the Adjacent Local Binary Pattern (AdLBP), based on the adjacent neighbor window and the relationships among sequential neighbors' pixel values with a given distance parameter. The suggested technique computes the neighbors and extracts the binary code from the adjacent neighborhood window and the surrounding sub-image window, in order to improve the adjacent neighbor information and to change the conventional LBP thresholding scheme. Additionally, we extend this adjacent distance-based local binary pattern (AdLBP) and combine it with the evaluation window based local binary pattern (EwLBP) to create a texture descriptor for texture classification that is more robust against noise. Finally, AdLBP and EwLBP are combined using an encoding strategy to propose the Evaluation Window based Adjacent Distance Local Binary Pattern (EADLBP) descriptor for image classification. These descriptors were tested on the KTH-TIPS and KTH-TIPS2-b datasets to assess the applicability of the proposed method. In comparison, the proposed EADLBP approach is more robust against noise and consistently outperforms all of the fundamental methods.

Keywords: neighborhoods Local binary patterns nLBPd · Adjacent Distance Local Binary Pattern EADLBP · Evaluation Window based Local Binary Pattern EwLBP

1 Introduction

Applications like face identification, remote sensing, document analysis, medical image analysis, fingerprint identification, and classification of outdoor images have all made extensive use of texture analysis, which is crucial to image processing and computer vision frameworks. Local Binary Patterns (LBP) are among the most widely used techniques in practice, even though many different


strategies for extracting texture features have been proposed in the past few decades. Among the simplest methods is LBP, proposed by Ojala et al. [1] to describe texture. It makes use of the local structure or the statistical intensity of an image: each pixel is compared to its neighbors, which are placed on a circle surrounding the pixel, and to represent these relationships a binary pattern is converted into a histogram [2,3]. However, there are numerous problems, such as noise sensitivity and lighting variation. Several approaches, including the Completed Robust Local Binary Pattern (CRLBP) [4], have been suggested to enhance LBP performance. Because CRLBP is not rotationally invariant, the Improved Completed Robust Local Binary Pattern (ICRLBP) [5] was proposed. Another line of work describes a technique called the enhanced micro-structure descriptor (EMSD) for characterizing batik images; EMSD is an enhancement of the micro-structure descriptor (MSD) proposed by Guang-Hai Liu [6]. The M2ECSLBP algorithm also has the benefit of faster and more efficient computation than LBP and other approaches. However, noise particularly affects the conventional LBP encoding strategy: random noise can quickly change the values of the neighbors, making local binary patterns unstable. Circular searches might also miss certain textures, because micro-patterns can occasionally be oriented through a pixel. The "micro-structure" information of LBP and of local binary patterns by neighborhoods (nLBPd) [7] has been investigated. However, noise interference has a more significant impact on smaller amounts of "micro-structure" information, which may be the cause of the sensitivity of LBP and its variants to noise. nLBPd [7] does not capture the immediate adjacent neighbor information, nor color information. LBP, nLBPd, and most variant descriptors are vulnerable to noise. Taking this into account, we construct a distance-based Adjacent Local Binary Pattern, which is based on the adjacent neighbor window and computes the pixel values near the adjacent neighbor. The rest of this article is structured as follows: the background study is presented in Sect. 2; the proposed AdLBP, EwLBP, and EADLBP are presented in detail in Sect. 3; the experiments and findings are described in Sect. 4; and Sect. 5 concludes the paper.

2

Background Study

There are numerous texture feature extraction methodologies in the literature [8–11]. These techniques are typically broken down into four categories:

– Statistical strategies
– Structural strategies
– Model-based strategies
– Filter-based strategies


Statistical and model-based strategies frequently explore the spatial relations of pixels based on small pixel neighborhoods. Markov random field (MRF) models, local binary patterns (LBP), and gray level co-occurrence matrices (GLCM) [2] are the most widely used of these techniques.

2.1 Local Binary Pattern (LBP)

The local binary pattern (LBP) [1] was proposed in texture analysis to evaluate local contrast; it searches for micro-textons in a very local region. As shown in Fig. 1, a binary pattern is produced by thresholding each neighboring pixel against the value of the center pixel.

Fig. 1. The basic encoding process of the Local Binary Pattern (LBP).

Using Eq. 1, texture in a local neighborhood is defined as a gray-scale invariant measure derived from a basic definition of texture. The original LBP only took a pixel's eight neighbors into account, but it has since been extended to circular neighborhoods with an arbitrary number of sampling points P and radius R:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p \qquad (1)$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (2)$$
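For concreteness, the following minimal NumPy sketch implements this basic encoding for the common P = 8, R = 1 case, where the circular neighbors reduce to the 3×3 neighborhood; the function name and the clockwise neighbor ordering are our choices, not part of the original formulation.

```python
import numpy as np

def lbp_8_1(image):
    """Basic LBP of Eq. 1 for P = 8, R = 1: threshold each of the
    eight 3x3 neighbors against the center pixel (Eq. 2) and sum
    the resulting bits weighted by powers of two."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Eight neighbors, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += ((neighbor - center) >= 0).astype(np.int32) << p
    return codes  # per-pixel codes in [0, 255]
```

A 256-bin histogram of these codes then serves as the texture feature vector.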

2.2 Local Binary Patterns by Neighborhoods (nLBPd)

Y. Kaya [7] proposed two new local binary pattern descriptors for texture analysis that detect distinctive patterns in images. The first, nLBPd, depends on the relationships between consecutive neighbors of a center pixel at a given distance, while the second, dLBPα, identifies the neighbors lying along the same orientation through the central pixel. The nLBPd descriptor is based on the relationships of a pixel's eight neighbors, P = {P0, P1, P2, P3, P4, P5, P6, P7}, with one another: given the distance parameter d, each neighbor pixel value is compared sequentially with the neighbor d steps further along. In dLBPα, the comparison is instead made between pixel values along the same orientation at an angle α, which may take the values 0°, 45°, 90°, or 135°.
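To make the sequential comparison concrete, here is a small Python sketch of the nLBPd code for a single pixel, with the eight neighbors listed clockwise; the MSB-first bit weighting matches the worked example in Sect. 3.1, and the function name is ours.

```python
def nlbp_d(neighbors, d=1):
    """nLBPd code for one pixel: compare each of the eight clockwise
    neighbors P_i against the neighbor d steps ahead, P_{(i+d) mod 8},
    and read the eight resulting bits as a number (MSB first)."""
    bits = [1 if neighbors[i] > neighbors[(i + d) % 8] else 0
            for i in range(8)]
    return sum(b << (7 - i) for i, b in enumerate(bits))

# Increasing d changes which neighbor pairs are compared:
# nlbp_d(p, d=1) uses immediate successors, nlbp_d(p, d=2) skips one.
```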

3 Proposed Texture Descriptor

3.1 Proposed Adjacent Distance Based Local Binary Pattern (AdLBP)

We propose a new feature extraction technique: adjacent local binary patterns by neighborhoods based on distance (AdLBP), designed to improve feature extraction performance. AdLBP builds on the local binary patterns by neighborhoods (nLBPd) proposed by Kaya [7]. In AdLBP, the comparison is carried out between pixels in the adjacent orientation, clockwise, according to the distance parameter. We consider two 3×3 windows: a sub-image window, AdLBP_{(current,d=1)}, and an adjacent neighborhood window, AdLBP_{(next,d=1)}, which collects the adjacent neighbor information. The descriptor focuses on the relationships among the 8 neighbors of the 3×3 sub-image window and the 8 neighbors of the adjacent neighborhood window, P = {P0, P1, ..., P7}, around a pixel. Each neighboring pixel value is compared with the pixel next to it, which yields only the values 1 or 0. This procedure produces a binary pattern for each center pixel of a 3×3 window; the binary digits are multiplied by predetermined weights and summed to obtain the AdLBP pattern value for that center pixel. Once the pattern of every pixel has been extracted in this way, the feature map for the image can be created. The final feature vector is the histogram of this feature map. AdLBP is calculated by Eq. 7, and the current and adjacent windows are calculated by Eq. 5. Figure 2 depicts the AdLBP_{d=1} encoding procedure, and Fig. 3 gives an overview of the relationships between neighbors within AdLBP.

Fig. 2. The encoding process of AdLBP.


Fig. 3. Overview of relationships between neighbors within AdLBP

Distance = 1 ⇒ AdLBP_{(current,d=1)}:

P_c = {S(129 > 158), S(158 > 150), S(150 > 164), S(164 > 155), S(155 > 141), S(141 > 108), S(108 > 103), S(103 > 129)} = {0, 1, 0, 1, 1, 1, 1, 0},

so P_c takes the value 94.

Distance = 1 ⇒ AdLBP_{(next,d=1)}:

P_c = {S(158 > 150), S(150 > 134), S(134 > 136), S(136 > 155), S(155 > 155), S(155 > 141), S(141 > 150), S(150 > 158)} = {1, 1, 0, 0, 0, 1, 0, 0},

so P_c takes the value 196.
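As a quick sanity check, both pattern values can be reproduced in a few lines of Python using the same sequential comparison (the helper name is ours; the neighbor sequences are read clockwise from Fig. 2):

```python
def adlbp_code(seq, d=1):
    # Compare each neighbor with the one d steps ahead (clockwise),
    # then read the bits MSB-first, as in the example above.
    bits = [1 if seq[i] > seq[(i + d) % 8] else 0 for i in range(8)]
    return sum(b << (7 - i) for i, b in enumerate(bits))

current = [129, 158, 150, 164, 155, 141, 108, 103]   # sub-image window
adjacent = [158, 150, 134, 136, 155, 155, 141, 150]  # adjacent window

print(adlbp_code(current))   # 94
print(adlbp_code(adjacent))  # 196
```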

$$AdLBP_{(current,d=1)} = \sum_{i=0}^{8} s(P_i > P_j) \qquad (3)$$

$$AdLBP_{(next,d=1)} = \sum_{j=0}^{8} s(P_i > P_j) \qquad (4)$$

where P_c is the center gray intensity value, P_i and P_j are the gray intensity values of the neighboring pixels of the current frame AdLBP_{(current,d=1)} and the next frame AdLBP_{(next,d=1)} around the sampling center pixel, and s(·) is defined as in Eq. 6. The final feature vector is obtained by Eq. 7:

$$AdLBP_{(current,d=1)} \Rightarrow \left[ AdLBP_{(next,d=1)} \sum_{i=0,\,j=0}^{i=8,\,j=8} s(P_i > P_j)\, 2^p \right] \qquad (5)$$

$$s(P_i > P_j) = \begin{cases} 1, & \text{if } P_i > P_j \\ 0, & \text{if } P_i \le P_j \end{cases} \qquad (6)$$

$$s(P) = \begin{cases} 1, & \text{if } X_{(AdLBP_{current,d=1})},\; Y_{(AdLBP_{next,d=1})} > 0 \\ 0, & \text{else} \end{cases} \qquad (7)$$

where X_{(AdLBP_{current,d=1})} is the value of the current frame and Y_{(AdLBP_{next,d=1})} denotes the value of the next (adjacent) frame of AdLBP.
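Under one reading of Eqs. 5–7 — where a bit survives in the fused code only if both the current-window and adjacent-window responses are positive — the final feature could be assembled as in the sketch below; the bitwise-AND fusion and the normalized 256-bin histogram are our assumptions about how Eq. 7 and the histogram step combine, not a verbatim transcription of the authors' implementation.

```python
import numpy as np

def adlbp_feature(current_codes, next_codes):
    """Hypothetical assembly of the final AdLBP feature vector:
    fuse the per-pixel codes of the sub-image window and the
    adjacent window (Eq. 7 keeps a response only where both are
    positive, modeled here as a bitwise AND), then take the
    normalized 256-bin histogram of the fused feature map."""
    fused = np.bitwise_and(current_codes, next_codes)
    hist, _ = np.histogram(fused, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)
```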

3.2 Proposed Evaluation Window Based Local Binary Pattern (EwLBP)

Next, to reduce noise interference, we introduce an evaluation window centered on each neighbor. The Evaluation Window based Local Binary Pattern (EwLBP) suppresses the noise of each neighbor pixel, whereas the local binary pattern and its variant descriptors focus only on the center. The proposed technique therefore remains more resistant to Gaussian noise: it builds an evaluation window that improves on the traditional thresholding scheme of LBP and its variants, and EwLBP is consequently more robust than LBP against noise interference. It is commonly assumed that LBP can effectively describe local texture because it detects "micro-structure" [1]; however, the smaller the "micro-structure" information, the greater the impact of noise interference, which may be the cause of LBP's sensitivity to noise. Since the proposed AdLBP is a variant of LBP, we adopt EwLBP to improve the robustness of the feature. Instead of using the raw values of the neighbors, EwLBP creates an evaluation window at each of the P neighbor positions and uses the average of the pixel values inside that window, which reduces the noise in each neighbor's value more effectively. The evaluation window also enlarges the scale of the "micro-structure" information, so that this information can be extracted effectively and EwLBP can achieve higher classification accuracy despite noise interference. EwLBP is calculated by Eq. 8:

$$EwLBP_{P,R} = \sum_{p=0}^{P-1} s(x_p - g_c)\, 2^p \qquad (8)$$

$$s(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$
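The following sketch illustrates the EwLBP computation of Eq. 8, assuming a 3×3 evaluation window around each radius-1 neighbor (the window size is our choice for illustration, as is the use of SciPy's uniform filter to obtain the window means):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ewlbp_8_1(image, eval_window=3):
    """EwLBP (Eq. 8) for P = 8, R = 1: each neighbor value x_p is the
    *mean* of an evaluation window centered on that neighbor, and it
    is thresholded against the center pixel g_c, damping the effect
    of single noisy pixels on the binary code."""
    img = np.asarray(image, dtype=np.float64)
    means = uniform_filter(img, size=eval_window)  # window averages
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        x_p = means[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += ((x_p - center) >= 0).astype(np.int32) << p
    return codes
```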