Md. Shahriare Satu · Mohammad Ali Moni · M. Shamim Kaiser · Mohammad Shamsul Arefin (Eds.)
Machine Intelligence and Emerging Technologies
First International Conference, MIET 2022
Noakhali, Bangladesh, September 23–25, 2022
Proceedings, Part I
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 490

Editorial Board Members
Ozgur Akan, Middle East Technical University, Ankara, Türkiye
Paolo Bellavista, University of Bologna, Bologna, Italy
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong, China
Geoffrey Coulson, Lancaster University, Lancaster, UK
Falko Dressler, University of Erlangen, Erlangen, Germany
Domenico Ferrari, Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla, UCLA, Los Angeles, USA
Hisashi Kobayashi, Princeton University, Princeton, USA
Sergio Palazzo, University of Catania, Catania, Italy
Sartaj Sahni, University of Florida, Gainesville, USA
Xuemin Shen, University of Waterloo, Waterloo, Canada
Mircea Stan, University of Virginia, Charlottesville, USA
Xiaohua Jia, City University of Hong Kong, Kowloon, Hong Kong
Albert Y. Zomaya, University of Sydney, Sydney, Australia
The LNICST series publishes ICST’s conferences, symposia and workshops. LNICST reports state-of-the-art results in areas related to the scope of the Institute. The type of material published includes:

• Proceedings (published in time for the respective event)
• Other edited monographs (such as project reports or invited volumes)

LNICST topics span the following areas:

• General Computer Science
• E-Economy
• E-Medicine
• Knowledge Management
• Multimedia
• Operations, Management and Policy
• Social Informatics
• Systems
Md. Shahriare Satu · Mohammad Ali Moni · M. Shamim Kaiser · Mohammad Shamsul Arefin
Editors
Machine Intelligence and Emerging Technologies
First International Conference, MIET 2022
Noakhali, Bangladesh, September 23–25, 2022
Proceedings, Part I
Editors
Md. Shahriare Satu – Noakhali Science and Technology University, Noakhali, Bangladesh
Mohammad Ali Moni – The University of Queensland, St. Lucia, QLD, Australia
M. Shamim Kaiser – Jahangirnagar University, Dhaka, Bangladesh
Mohammad Shamsul Arefin – Daffodil International University, Dhaka, Bangladesh, and Chittagong University of Engineering and Technology, Chattogram, Bangladesh
ISSN 1867-8211  ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-031-34618-7  ISBN 978-3-031-34619-4 (eBook)
https://doi.org/10.1007/978-3-031-34619-4

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
Machine intelligence is the practice of designing computer systems that make intelligent decisions based on context rather than direct input, and it relies on large volumes of data. An emerging technology, on the other hand, can be defined as a radically novel and relatively fast-growing technology, characterized by a certain degree of coherence that persists over time and by the potential to exert a significant impact on the socio-economic domain — an impact observed in the composition of actors, institutions, and the patterns of interaction among them, as well as in the associated knowledge-production processes. To that end, the 1st International Conference on Machine Intelligence and Emerging Technologies (MIET 2022) provided an opportunity to engage researchers, academics, industry professionals, and experts across this multidisciplinary field and to share cutting-edge research results gained through the application of machine learning, data science, the Internet of Things, cloud computing, sensing, and security. Researchers in the relevant fields were invited to submit their original, novel, and extended unpublished work to this conference. The conference was hosted by Noakhali Science and Technology University (NSTU), Sonapur, Noakhali 3814, Bangladesh. Support for MIET 2022 came from the IEEE Computer Society Bangladesh Chapter, the Center for Natural Science and Engineering Research (CNSER), the University Grants Commission (UGC) Bangladesh, Agrani Bank Limited, Mercantile Bank Limited, Union Bank Limited, Globe Pharmaceuticals Limited, EXIM Bank Limited, and Janata Bank Limited.

The papers presented at MIET 2022 covered theoretical and methodological frameworks alongside applied research, and they offer a representative cross-section of recent academic progress in AI and its wide-ranging IT applications. The accepted papers fall into five main categories: (1) imaging for disease detection; (2) pattern recognition and NLP; (3) biosignals and recommendation systems for well-being; (4) network, security, and nanotechnology; and (5) emerging technologies for society and industry. The five MIET 2022 tracks received a total of 272 submissions from authors in 12 countries. All submissions underwent one round of double-blind review, and each paper was read by at least two experts (one of whom was the managing chair). In the end, after a thorough review procedure in which the reports of the reviewers and the track chairs were considered, 104 full papers from authors in 9 countries were accepted for presentation at the conference. All 104 papers presented physically at MIET 2022 are included in this volume of the proceedings.

All of us on the MIET 2022 committee are indebted to the committee members for their tireless efforts and invaluable contributions. Without the hard work and dedication of the MIET 2022 Program Committee members in assessing the conference papers, we would not have had such a fantastic program. The success of MIET 2022 is due to the hard work of many people and the financial backing of our kind sponsors. We would like to give particular thanks to Springer Nature and the rest of the Springer LNICST and EAI team for their support of
our work and for their tireless efforts in managing the publication of this volume. Finally, we would like to express our gratitude to everyone who helped us prepare for MIET 2022 and contributed to it in any way.

November 2022
Md. Shahriare Satu Mohammad Ali Moni M. Shamim Kaiser Mohammad Shamsul Arefin
Organization
International Advisory Committee
Ali Dewan – Athabasca University, Canada
Amir Hussain – Edinburgh Napier University, UK
Anirban Bandyopadhyay – National Institute for Materials Science, Japan
Anton Nijholt – University of Twente, The Netherlands
Chanchal K. Roy – University of Saskatchewan, Canada
David Brown – Nottingham Trent University, UK
Enamul Hoque Prince – York University, Canada
Jarrod Trevathan – Griffith University, Australia
Joarder Kamruzzaman – Federation University, Australia
Kanad Ray – Amity University, India
Karl Andersson – Luleå University of Technology, Sweden
Kenichi Matsumoto – Nara Institute of Science and Technology, Japan
M. Julius Hossain – European Molecular Biology Laboratory, Germany
Md. Atiqur Rahman Ahad – University of East London, UK
Mohammad Tariqul Islam – Universiti Kebangsaan Malaysia, Malaysia
Manzur Murshed – Federation University, Australia
Mukesh Prasad – University of Technology Sydney, Australia
Ning Zhong – Maebashi Institute of Technology, Japan
Pranab K. Muhuri – South Asian University, India
Saidur Rahman – Sunway University, Malaysia
Saifur Rahman – Virginia Tech Advanced Research Institute, USA
Shariful Islam – Deakin University, Australia
Stefano Vassanelli – University of Padova, Italy
Suresh Chandra Satapathy – KIIT Deemed to be University, India
Syed Ishtiaque Ahmed – University of Toronto, Canada
Upal Mahbub – Qualcomm Inc., USA
V. R. Singh – National Physical Laboratory, India
Yang Yang – Maebashi Institute of Technology, Japan
Yoshinori Kuno – Saitama University, Japan
Zamshed Chowdhury – Intel Corporation, USA
National Advisory Committee
A. B. M. Siddique Hossain – AIUB, Bangladesh
A. Z. M. Touhidul Islam – RU, Bangladesh
Abu Sayed Md. Latiful Hoque – BUET, Bangladesh
Dibyadyuti Sarkar – NSTU, Bangladesh
Hafiz Md. Hasan Babu – DU, Bangladesh
Kaushik Deb – CUET, Bangladesh
Firoz Ahmed – NSTU, Bangladesh
Kazi Muheymin-Us-Sakib – DU, Bangladesh
M. Lutfar Rahman – DIU, Bangladesh
M. M. A. Hashem – KUET, Bangladesh
M. Rizwan Khan – UIU, Bangladesh
M. Sohel Rahman – BUET, Bangladesh
Md. Liakot Ali – BUET, Bangladesh
Md. Mahbubur Rahman – MIST, Bangladesh
Md. Sazzad Hossain – UGC, Bangladesh
Mohammod Abul Kashem – DUET, Bangladesh
Mohammad Hanif – NSTU, Bangladesh
Mohammad Kaykobad – BRACU, Bangladesh
Mohammad Mahfuzul Islam – BUET, Bangladesh
Mohammad Salim Hossain – NSTU, Bangladesh
Mohammad Shahidur Rahman – SUST, Bangladesh
Mohammad Shorif Uddin – JU, Bangladesh
Mohd Abdur Rashid – NSTU, Bangladesh
Mozammel Huq Azad Khan – EWU, Bangladesh
Muhammad Quamruzzaman – CUET, Bangladesh
Munaz Ahmed Noor – BDU, Bangladesh
Nasim Akhtar – CSTU, Bangladesh
Newaz Mohammed Bahadur – NSTU, Bangladesh
S. I. Khan – BRACU, Bangladesh
Subrata Kumar Aditya – SHU, Bangladesh
Suraiya Pervin – DU, Bangladesh
Syed Akhter Hossain – CUB, Bangladesh
Organizing Committee

General Chairs
Pietro Lió – University of Cambridge, UK
A. B. M. Shawkat Ali – CQUniversity, Australia
Nazmul Siddique – University of Ulster, UK
General Co-chairs
Ashadun Nobi – NSTU, Bangladesh
Mohammad Ali Moni – University of Queensland, Australia
Mufti Mahmud – Nottingham Trent University, UK
TPC Chairs
Ashikur Rahman Khan – NSTU, Bangladesh
M. Shamim Kaiser – JU, Bangladesh
Mohammad Shamsul Arefin – DIU, Bangladesh
TPC Co-chairs
Md. Amran Hossain Bhuiyan – NSTU, Bangladesh
Md. Zahidul Islam – IU, Bangladesh
Track Chairs
Fateha Khanam Bappee – NSTU, Bangladesh
M. Shamim Kaiser – JU, Bangladesh
Nazia Majadi – NSTU, Bangladesh
Rashed Mustafa – CU, Bangladesh
Mohammad Jamshed Patwary – IIUC, Bangladesh
Md. Atiqur Rahman Ahad – University of East London, UK
Yusuf Sulistyo Nugroho – Universitas Muhammadiyah Surakarta, Indonesia
Mohammad Ali Moni – University of Queensland, Australia
Md. Amran Hossain Bhuiyan – NSTU, Bangladesh
Firoz Mridha – AIUB, Bangladesh
Ashik Iftekher – Nikon Corporation, Japan
Mohammad Shamsul Arefin – DIU, Bangladesh
Syful Islam – NSTU, Bangladesh
General Secretary
S. M. Mahabubur Rahman – NSTU, Bangladesh
Joint Secretaries
Md. Auhidur Rahman – NSTU, Bangladesh
Md. Iftekharul Alam Efat – NSTU, Bangladesh
Mimma Tabassum – NSTU, Bangladesh
Technical Secretary
Koushik Chandra Howlader – North Dakota State University, USA
Organizing Chair
Md. Shahriare Satu – NSTU, Bangladesh
Organizing Co-chairs
A. Q. M. Salauddin Pathan – NSTU, Bangladesh
Md. Abidur Rahman – NSTU, Bangladesh
Finance Subcommittee
Main Uddin – NSTU, Bangladesh
Md. Javed Hossain – NSTU, Bangladesh
Md. Omar Faruk – NSTU, Bangladesh
Md. Shahriare Satu – NSTU, Bangladesh
Keynote Selection Subcommittee
A. R. M. Mahamudul Hasan Rana – NSTU, Bangladesh
Nazia Majadi – NSTU, Bangladesh

Special Session Chairs
Md. Kamal Uddin – NSTU, Bangladesh
Zyed-Us-Salehin – NSTU, Bangladesh
Tutorials Chairs
Dipanita Saha – NSTU, Bangladesh
Sultana Jahan Soheli – NSTU, Bangladesh
Panels Chair
Falguny Roy – NSTU, Bangladesh
Workshops Chair
Nishu Nath – NSTU, Bangladesh
Proceeding Publication Committee
Md. Shahriare Satu – NSTU, Bangladesh
Mohammad Ali Moni – University of Queensland, Australia
M. Shamim Kaiser – JU, Bangladesh
Mohammad Shamsul Arefin – DIU, Bangladesh
Industrial Session Chairs
Apurba Adhikhary – NSTU, Bangladesh
Tanvir Zaman Khan – NSTU, Bangladesh
Project and Exhibition Chairs
Dipok Chandra Das – NSTU, Bangladesh
Rutnadip Kuri – NSTU, Bangladesh
Publicity Chair
Muhammad Abdus Salam – NSTU, Bangladesh
Kit and Registration Chairs
K. M. Aslam Uddin – NSTU, Bangladesh
Md. Bipul Hossain – NSTU, Bangladesh
Venue Preparation Subcommittee
Md. Habibur Rahman – NSTU, Bangladesh
Md. Shohel Rana – NSTU, Bangladesh
Md. Hasnat Riaz – NSTU, Bangladesh
Accommodation and Food Management Subcommittee
Kamruzaman – NSTU, Bangladesh
Md. Mamun Mia – NSTU, Bangladesh
Subrata Bhowmik – NSTU, Bangladesh
Public Relation Chairs
Md. Al-Amin – NSTU, Bangladesh
Tasniya Ahmed – NSTU, Bangladesh
Award Chair
Md. Abul Kalam Azad – NSTU, Bangladesh
International Guest Management Subcommittee
Iftakhar Parvez – NSTU, Bangladesh
Tonmoy Dey – NSTU, Bangladesh
Trisha Saha – NSTU, Bangladesh
Web Masters
Md. Jane Alam Adnan – NSTU, Bangladesh
Rahat Uddin Azad – NSTU, Bangladesh
Graphics Designers
Mohit Sarkar – NSTU, Bangladesh
Shamsun Nahar Needhe – Hezhou University, China
Technical Program Committee
A. K. M. Mahbubur Rahman – IUB, Bangladesh
A. S. M. Sanwar Hosen – JNU, South Korea
A. A. Mamun – JU, Bangladesh
A. F. M. Rashidul Hasan – RU, Bangladesh
Abdul Kader Muhammad Masum – IIUC, Bangladesh
Abdul Kaium Masud – NSTU, Bangladesh
Abdullah Nahid – KU, Bangladesh
Abdur Rahman Bin Shahid – Concord University, USA
Abdur Rouf – DUET, Bangladesh
A. B. M. Aowlad Hossain – KUET, Bangladesh
Adnan Anwar – Deakin University, Australia
Ahmed Imteaj – Florida International University, USA
Ahmed Wasif Reza – EWU, Bangladesh
Ahsanur Rahman – NSU, Bangladesh
Alessandra Pedrocchi – Politecnico di Milano, Italy
Alex Ng – La Trobe University, Australia
Anindya Das Antar – University of Michigan, USA
Anirban Bandyopadhyay – NIMS, Japan
Antesar Shabut – Leeds Trinity University, UK
Antony Lam – Mercari Inc., Japan
Anup Majumder – JU, Bangladesh
Anupam Kumar Bairagi – KU, Bangladesh
Arif Ahmad – SUST, Bangladesh
Asif Nashiry – JUST, Bangladesh
A. S. M. Kayes – La Trobe University, Australia
Atik Mahabub – Concordia University, Canada
Aye Su Phyo – Computer University Kalay, Myanmar
Azizur Rahman – City University of London, UK
Babul Islam – RU, Bangladesh
Banani Roy – University of Saskatchewan, Canada
Belayat Hossain – Loughborough University, UK
Boshir Ahmed – RUET, Bangladesh
Chandan Kumar Karmakar – Deakin University, Australia
Cosimo Ieracitano – University Mediterranea of Reggio Calabria, Italy
Cris Calude – University of Auckland, New Zealand
Derong Liu – University of Illinois at Chicago, USA
Dewan Md. Farid – UIU, Bangladesh
Dipankar Das – RU, Bangladesh
Duong Minh Quan – University of Da Nang, Vietnam
Eleni Vasilaki – University of Sheffield, UK
Emanuele Ogliari – Politecnico di Milano, Italy
Enamul Hoque Prince – York University, Canada
Ezharul Islam – JU, Bangladesh
Farah Deeba – DUET, Bangladesh
Fateha Khanam Bappee – NSTU, Bangladesh
Francesco Carlo Morabito – Mediterranean University of Reggio Calabria, Italy
Gabriela Nicoleta Sava – University POLITEHNICA of Bucharest, Romania
Giancarlo Ferregno – Politecnico di Milano, Italy
Golam Dastoger Bashar – Boise State University, USA
H. Liu – Wayne State University, USA
Habibur Rahman – IU, Bangladesh
Hishato Fukuda – Saitama University, Japan
Imtiaz Mahmud – Kyungpook National University, South Korea
Indika Kumara – Jheronimus Academy of Data Science, The Netherlands
Iqbal Hasan Sarkar – CUET, Bangladesh
Joarder Kamruzzaman – Federation University, Australia
John H. L. Hansen – University of Texas at Dallas, USA
Jonathan Mappelli – University of Modena, Italy
Joyprokash Chakrabartty – CUET, Bangladesh
Kamruddin Md. Nur – AIUB, Bangladesh
Kamrul Hasan Talukder – KU, Bangladesh
Kawsar Ahmed – University of Saskatchewan, Canada
K. C. Santosh – University of South Dakota, USA
Khan Iftekharuddin – Old Dominion University, USA
Khondaker Abdullah-Al-Mamun – UIU, Bangladesh
Khoo Bee Ee – Universiti Sains Malaysia, Malaysia
Lamia Iftekhar – NSU, Bangladesh
Linta Islam – Jagannath University, Bangladesh
Lu Cao – Saitama University, Japan
Luca Benini – ETH, Switzerland
Luca Berdondini – IIT, Italy
Luciano Gamberini – University of Padova, Italy
M. Tanseer Ali – AIUB, Bangladesh
M. Firoz Mridha – AIUB, Bangladesh
M. Julius Hossain – EMBL, Germany
M. M. Azizur Rahman – Grand Valley State University, USA
M. Tariqul Islam – Universiti Kebangsaan, Malaysia
Mahfuzul Hoq Chowdhury – CUET, Bangladesh
Mahmudul Kabir – Akita University, Japan
Manjunath Aradhya – JSS S&T University, India
Manohar Das – Oakland University, USA
Marzia Hoque Tania – University of Oxford, UK
Md. Badrul Alam Miah – UPM, Malaysia
Md. Faruk Hossain – RUET, Bangladesh
Md. Fazlul Kader – CU, Bangladesh
Md. Manirul Islam – AIUB, Bangladesh
Md. Saiful Islam – CUET, Bangladesh
Md. Sanaul Haque – University of Oulu, Finland
Md. Shirajum Munir – Kyung Hee University, South Korea
Md. Whaiduzzaman – Queensland University of Technology, Australia
Md. Abdul Awal – KU, Bangladesh
Md. Abdur Razzak – IUB, Bangladesh
Md. Abu Layek – Jagannath University, Bangladesh
Md. Ahsan Habib – MBSTU, Bangladesh
Md. Al Mamun – RUET, Bangladesh
Md. Amzad Hossain – NSTU, Bangladesh
Md. Golam Rashed – RU, Bangladesh
Md. Hanif Seddiqui – CU, Bangladesh
Md. Hasanul Kabir – IUT, Bangladesh
Md. Hasanuzzaman – DU, Bangladesh
Md. Kamal Uddin – NSTU, Bangladesh
Md. Mahfuzur Rahman – Queensland University of Technology, Australia
Md. Murad Hossain – University of Turin, Italy
Md. Nurul Islam Khan – BUET, Bangladesh
Md. Obaidur Rahman – DUET, Bangladesh
Md. Raju Ahmed – DUET, Bangladesh
Md. Rakibul Hoque – DU, Bangladesh
Md. Saiful Islam – Griffith University, Australia
Md. Sanaul Rabbi – CUET, Bangladesh
Md. Shamim Ahsan – KU, Bangladesh
Md. Shamim Akhter – SUB, Bangladesh
Md. Sipon Miah – IU, Bangladesh
Md. Ziaul Haque – NSTU, Bangladesh
Mehdi Hasan Chowdhury – City University of Hong Kong, China
Michele Magno – ETH, Switzerland
Milon Biswas – University of Alabama at Birmingham, USA
Min Jiang – Xiamen University, China
Mohammad Abu Yousuf – JU, Bangladesh
Mohammad Hammoudeh – Manchester Metropolitan University, UK
Mohammad Mehedi Hassan – King Saud University, KSA
Mohammad Motiur Rahman – MBSTU, Bangladesh
Mohammad Nurul Huda – UIU, Bangladesh
Mohammad Osiur Rahman – CU, Bangladesh
Mohammad Zoynul Abedin – Teesside University, UK
Mohiuddin Ahmed – Edith Cowan University, Australia
Monirul Islam Sharif – Google, USA
Monjurul Islam – Canberra Institute of Technology, Australia
Muhammad Mahbub Alam – IUT, Bangladesh
Muhammed J. Alam Patwary – IIUC, Bangladesh
Nabeel Mohammed – NSU, Bangladesh
Nahida Akter – UNSW, Australia
Nashid Alam – Aberystwyth University, UK
Nasfikur Rahman Khan – University of South Alabama, USA
Nelishia Pillay – University of Pretoria, South Africa
Nihad Karim Chowdhury – CU, Bangladesh
Nilanjan Dey – JIS University, India
Noushath Shaffi – College of Applied Sciences, Oman
Nur Mohammad – CUET, Bangladesh
Nursadul Mamun – University of Texas at Dallas, USA
Omaru Maruatona – Aiculus Pty Ltd, Australia
Omprakash Kaiwartya – Nottingham Trent University, UK
Osman Ali – NSTU, Bangladesh
Paolo Massobrio – University of Genova, Italy
Partha Chakraborty – Cumilla University, Bangladesh
Paul Watters – La Trobe University, Australia
Phalguni Gupta – IIT Kanpur, India
Pranab Kumar Dhar – CUET, Bangladesh
Rahma Mukta – UNSW, Australia
Ralf Zeitler – Venneos GmbH, Germany
Ramani Kannan – Universiti Teknologi PETRONAS, Malaysia
Rameswar Debnath – KU, Bangladesh
Rashed Mustafa – CU, Bangladesh
Risala Tasin Khan – JU, Bangladesh
Rokan Uddin Faruqui – CU, Bangladesh
Roland Thewes – Technical University of Berlin, Germany
Ryote Suzuki – Saitama University, Japan
S. M. Rafizul Haque – Canadian Food Inspection Agency, Canada
S. M. Riazul Islam – Sejong University, South Korea
S. M. Abdur Razzak – RUET, Bangladesh
Saiful Azad – Universiti Malaysia Pahang, Malaysia
Saifur Rahman – University of North Dakota, USA
Sajal Halder – RMIT, Australia
Sajib Chakraborty – Vrije Universiteit Brussel, Belgium
Sajjad Waheed – MBSTU, Bangladesh
Samrat Kumar Dey – BOU, Bangladesh
Sayed Asaduzzaman – University of North Dakota, USA
Sayed Mohsin Reza – University of Texas at El Paso, USA
Sazzadur Rahman – JU, Bangladesh
Shafkat Kibria – SUST, Bangladesh
Shahidul Islam Khan – IIUC, Bangladesh
Shahriar Badsha – University of Nevada, USA
Shamim A. Mamun – JU, Bangladesh
Shanto Roy – University of Houston, USA
Sharmin Majumder – Texas A&M University, USA
Silvestro Micera – Scuola Superiore Sant’Anna, Italy
Surapong Uttama – Mae Fah Luang University, Thailand
Syed Md. Galib – JUST, Bangladesh
Syful Islam – NSTU, Bangladesh
Tabin Hassan – AIUB, Bangladesh
Tamal Adhikary – University of Waterloo, Canada
Tarique Anwar – Macquarie University, Australia
Tauhidul Alam – LSU Shreveport University, USA
Tawfik Al-Hadhrami – Nottingham Trent University, UK
Themis Prodomakis – University of Southampton, UK
Thompson Stephan – M. S. Ramaiah University of Applied Sciences, India
Tianhua Chen – University of Huddersfield, UK
Tingwen Huang – Texas A&M University, Qatar
Tomonori Hashiyama – University of Electro-Communications, Japan
Touhid Bhuiyan – DIU, Bangladesh
Tushar Kanti Saha – JKKNIU, Bangladesh
Wladyslaw Homenda – Warsaw University of Technology, Poland
Wolfgang Maas – Technische Universität Graz, Austria
Yasin Kabir – Missouri University of Science and Technology, USA
Yusuf Sulistyo Nugroho – Universitas Muhammadiyah Surakarta, Indonesia
Zubair Fadlullah – Lakehead University, Canada
Contents – Part I
Imaging for Disease Detection

Potato-Net: Classifying Potato Leaf Diseases Using Transfer Learning Approach . . . 3
Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Aka Das, Md. Ashikur Rahman, and Masud Rabbani

False Smut Disease Detection in Paddy Using Convolutional Neural Network . . . 15
Nahid Hasan, Tanzila Hasan, Shahadat Hossain, and Md. Manzurul Hasan

Gabor Wavelet Based Fused Texture Features for Identification of Mungbean Leaf Diseases . . . 22
Sarna Majumder, Badhan Mazumder, and S. M. Taohidul Islam

Potato Disease Detection Using Convolutional Neural Network: A Web Based Solution . . . 35
Jannathul Maowa Hasi and Mohammad Osiur Rahman

Device-Friendly Guava Fruit and Leaf Disease Detection Using Deep Learning . . . 49
Rabindra Nath Nandi, Aminul Haque Palash, Nazmul Siddique, and Mohammed Golam Zilani

Cassava Leaf Disease Classification Using Supervised Contrastive Learning . . . 60
Adit Ishraq, Sayefa Arafah, Sadiya Akter Mim, Nusrat Jahan Shammey, Firoz Mridha, and Md. Saifur Rahman

Diabetes Mellitus Prediction Using Transfer Learning . . . 72
Md Ifraham Iqbal, Ahmed Shabab Noor, and Ahmed Rafi Hasan

An Improved Heart Disease Prediction Using Stacked Ensemble Method . . . 84
Md. Maidul Islam, Tanzina Nasrin Tania, Sharmin Akter, and Kazi Hassan Shakib

Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm . . . 98
Nusrat Alam, Samiul Alam, Farzana Tasnim, and Sanjida Sharmin
PreCKD_ML: Machine Learning Based Development of Prediction Model for Chronic Kidney Disease and Identify Significant Risk Factors . . . 109
Md. Rajib Mia, Md. Ashikur Rahman, Md. Mamun Ali, Kawsar Ahmed, Francis M. Bui, and S M Hasan Mahmud

A Reliable and Efficient Transfer Learning Approach for Identifying COVID-19 Pneumonia from Chest X-ray . . . 122
Sharmeen Jahan Seema and Mosabber Uddin Ahmed

Infection Segmentation from COVID-19 Chest CT Scans with Dilated CBAM U-Net . . . 137
Tareque Bashar Ovi, Md. Jawad-Ul Kabir Chowdhury, Shaira Senjuti Oyshee, and Mubdiul Islam Rizu

Convolutional Neural Network Model to Detect COVID-19 Patients Utilizing Chest X-Ray Images . . . 152
Md. Shahriare Satu, Khair Ahammed, Mohammad Zoynul Abedin, Md. Auhidur Rahman, Sheikh Mohammed Shariful Islam, A. K. M. Azad, Salem A. Alyami, and Mohammad Ali Moni

Classification of Tumor Cell Using a Naive Convolutional Neural Network Model . . . 167
Debashis Gupta, Syed Rahat Hassan, Renu Gupta, Urmi Saha, and Mohammed Sowket Ali

Tumor-TL: A Transfer Learning Approach for Classifying Brain Tumors from MRI Images . . . 177
Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Sabina Yesmin, and Md. Jueal Mia

Deep Convolutional Comparison Architecture for Breast Cancer Binary Classification . . . 187
Nasim Ahmed Roni, Md. Shazzad Hossain, Musarrat Bintay Hossain, Md. Iftekharul Alam Efat, and Mohammad Abu Yousuf

Lung Cancer Detection from Histopathological Images Using Deep Learning . . . 201
Rahul Deb Mohalder, Khandkar Asif Hossain, Juliet Polok Sarkar, Laboni Paul, M. Raihan, and Kamrul Hasan Talukder

Brain Tumor Detection Using Deep Network EfficientNet-B0 . . . 213
Mosaddeq Hossain and Md. Abdur Rahman
Cancer Diseases Diagnosis Using Deep Transfer Learning Architectures . . . 226
Tania Ferdousey Promy, Nadia Islam Joya, Tasfia Haque Turna, Zinia Nawrin Sukhi, Faisal Bin Ashraf, and Jia Uddin

Transfer Learning Based Skin Cancer Classification Using GoogLeNet . . . 238
Sourav Barman, Md Raju Biswas, Sultana Marjan, Nazmun Nahar, Mohammad Shahadat Hossain, and Karl Andersson

Assessing the Risks of COVID-19 on the Health Conditions of Alzheimer’s Patients Using Machine Learning Techniques . . . 253
Prosenjit Karmaker and Muhammad Sajjadur Rahim

MRI Based Automated Detection of Brain Tumor Using DWT, GLCM, PCA, Ensemble of SVM and PNN in Sequence . . . 267
Md. Sakib Ahmed, Sajib Hossain, Md. Nazmul Haque, M. M. Mahbubul Syeed, D. M. Saaduzzaman, Md. Hasan Maruf, and A. S. M. Shihavuddin

Pattern Recognition and Natural Language Processing

Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets . . . 283
Ferdib-Al-Islam, Nusrat Jahan, Farjana Yeasmin Rupa, Suprio Sarkar, Sifat Hossain, and Sk. Shalauddin Kabir

Fake Profile Detection Using Image Processing and Machine Learning . . . 294
Shuva Sen, Mohammad Intisarul Islam, Samiha Sofrana Azim, and Muhammad Iqbal Hossain

A Novel Texture Descriptor Evaluation Window Based Adjacent Distance Local Binary Pattern (EADLBP) for Image Classification . . . 309
Most. Maria Akter Misti, Sajal Mondal, Md Anwarul Islam Abir, and Md Zahidul Islam

Bornomala: A Deep Learning-Based Bangla Image Captioning Technique . . . 318
Jannatul Naim, Md. Bipul Hossain, and Apurba Adhikary

Traffic Sign Detection and Recognition Using Deep Learning Approach . . . 331
Umma Saima Rahman and Maruf

A Novel Bangla Spoken Numerals Recognition System Using Convolutional Neural Network . . . 344
Ovishake Sen, Pias Roy, and Al-Mahmud
Bangla Speech-Based Person Identification Using LSTM Networks . . . 358
Rahad Khan, Saddam Hossain, Akbor Hossain, Fazlul Hasan Siddiqui, and Sabah Binte Noor

VADER vs. BERT: A Comparative Performance Analysis for Sentiment on Coronavirus Outbreak . . . 371
Subrata Saha, Md. Imran Hossain Showrov, Md. Motinur Rahman, and Md. Ziaul Hasan Majumder

Aspect Based Sentiment Analysis of COVID-19 Tweets Using Blending Ensemble of Deep Learning Models . . . 386
Khandaker Tayef Shahriar, Md Musfique Anwar, and Iqbal H. Sarker

Covid-19 Vaccine Sentiment Detection and Analysis Using Machine Learning Technique and NLP . . . 401
Abdullah Al Maruf, Md. Nur Hossain Biplob, and Fahima Khanam

Sentiment Analysis of Tweets on Covid Vaccine (Pfizer): A Boosting-Based Machine Learning Solution . . . 415
Promila Haque, Rahatul Jannat Fariha, Israt Yousuf Nishat, and Mohammed Nazim Uddin

Matching Job Circular with Resume Using Different Natural Language Processing Based Algorithms . . . 428
S. M. Shawal Chowdhury, Mrithika Chowdhury, and Arifa Sultana

Transformer-Based Text Clustering for Newspaper Articles . . . 443
Sumona Yeasmin, Nazia Afrin, and Mohammad Rezwanul Huq

Bangla to English Translation Using Sequence to Sequence Learning Model Based Recurrent Neural Networks . . . 458
Rafiqul Islam, Mehedi Hasan, Mamunur Rashid, and Rabea Khatun

Bangla Spelling Error Detection and Correction Using N-Gram Model . . . 468
Promita Bagchi, Mursalin Arafin, Aysha Akther, and Kazi Masudul Alam

Bidirectional Long-Short Term Memory with Byte Pair Encoding and Back Translation for Bangla-English Machine Translation . . . 483
Md. Tasnin Tanvir, Asfia Moon Oishy, M. A. H. Akhand, and Nazmul Siddique

Face Recognition-Based Mass Attendance Using YOLOv5 and ArcFace . . . 496
Omar Faruque, Fazlul Hasan Siddiqui, and Sabah Binte Noor
A Hybrid Watermarking Technique Based on LH-HL Subbands of DWT and SVD . . . 511
Fauzia Yasmeen, Mahbuba Begum, and Mohammad Shorif Uddin

A Smartphone Based Real-Time Object Recognition System for Visually Impaired People . . . 524
Md. Atikur Rahman, Kazi Md. Rokibul Alam, and Muhammad Sheikh Sadi

Bangla Speech Emotion Recognition Using 3D CNN Bi-LSTM Model . . . 539
Md. Riadul Islam, M. A. H. Akhand, and Md Abdus Samad Kamal

An RNN Based Approach to Predict Next Word in Bangla Language . . . 551
Asif Mahmud, Md. Nazmul Hasan Rony, Deba Dip Bhowmik, Ratnadip Kuri, and A. R. M. Mahmudul Hasan Rana

Author Index . . . 567
Contents – Part II
Bio Signals and Recommendation Systems for Wellbeing

Diagnosis and Classification of Fetal Health Based on CTG Data Using Machine Learning Techniques . . . 3
Md. Monirul Islam, Md. Rokunojjaman, Al Amin, Md. Nasim Akhtar, and Iqbal H. Sarker

Epileptic Seizure Prediction Using Bandpass Filtering and Convolutional Neural Network . . . 17
Nabiha Mustaqeem, Tasnia Rahman, Jannatul Ferdous Binta Kalam Priyo, Mohammad Zavid Parvez, and Tanvir Ahmed

Autism Spectrum Disorder Detection from EEG Through Hjorth Parameters and Classification Using Neural Network . . . 31
Zahrul Jannat Peya, Bipasha Zaman, M. A. H. Akhand, and Nazmul Siddique

A Review on Heart Diseases Prediction Using Artificial Intelligence . . . 41
Rehnuma Hasnat, Abdullah Al Mamun, Ahmmad Musha, and Anik Tahabilder

Machine Learning Models to Identify Discriminatory Factors of Diabetes Subtypes . . . 55
Shahriar Hassan, Tania Akter, Farzana Tasnim, and Md. Karam Newaz

Analysis of Hand Movement from Surface EMG Signals Using Artificial Neural Network . . . 68
S. A. Ahsan Rajon, Mahmudul Hasan Abid, Niloy Sikder, Kamrul Hasan Talukder, Md. Mizanur Rahman, Md. Shamim Ahsan, Abu Shamim Mohammad Arif, and Abdullah-Al Nahid

Design and Implementation of a Drowsiness Detection System Up to Extended Head Angle Using FaceMesh Machine Learning Solution . . . 79
Jafirul Islam Jewel, Md. Mahabub Hossain, and Md. Dulal Haque

Fuzziness Based Semi-supervised Deep Learning for Multimodal Image Classification . . . 91
Abeda Asma, Dilshad Noor Mostafa, Koli Akter, Mufti Mahmud, and Muhammed J. A. Patwary
Human Emotion Recognition from Facial Images Using Convolutional Neural Network . . . 106
Saima Sultana, Rashed Mustafa, and Mohammad Sanaullah Chowdhury

Emotion Recognition from Brain Wave Using Multitask Machine Learning Leveraging Residual Connections . . . 121
Rumman Ahmed Prodhan, Sumya Akter, Muhammad Bin Mujib, Md. Akhtaruzzaman Adnan, and Tanmoy Sarkar Pias

Emotion Recognition from EEG Using Mutual Information Based Feature Map and CNN . . . 137
Mahfuza Akter Maria, A. B. M. Aowlad Hossain, and M. A. H. Akhand

A Machine Learning-Based System to Recommend Appropriate Military Training Program for a Soldier . . . 151
Md Tauhidur Rahman, Raquib Hasan Dewan, Md Abdur Razzak, Sumaiya Nuha Mustafina, and Muhammad Nazrul Islam

Integrated Music Recommendation System Using Collaborative and Content Based Filtering, and Sentiment Analysis . . . 162
Arafat Bin Hossain, Wordh Ul Hasan, Kimia Tuz Zaman, and Koushik Howlader

A Clustering Based Niching Method for Effectively Solving the 0-1 Knapsack Problem . . . 173
Md. Meheruzzaman Sarker, Md. Jakirul Islam, and Md. Zakir Hossain

Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset Based on Active Learning . . . 188
Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, and Hasan Mahmud

The Impact of Data Locality on the Performance of Cluster-Based Under-Sampling . . . 204
Ahmed Shabab Noor, Muhib Al Hasan, Ahmed Rafi Hasan, Rezab Ud Dawla, Afsana Airin, Akib Zaman, and Dewan Md. Farid

An Analysis of Islamic Inheritance System Under Object-Oriented Paradigm . . . 216
A. H. M. Sajedul Hoque, Sadia Tabassum, Rashed Mustafa, Mohammad Sanaullah Chowdhury, and Mohammad Osiur Rahman

Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion? . . . 226
Nibir Chandra Mandal, Tashreef Muhammad, and G. M. Shahariar
An Empirical Study on How the Developers Discussed About Pandas Topics . . . 242
Sajib Kumar Saha Joy, Farzad Ahmed, Al Hasib Mahamud, and Nibir Chandra Mandal

BSDRM: A Machine Learning Based Bug Triaging Model to Recommend Developer Team . . . 256
K. M. Aslam Uddin, Md. Kowsher, and Kazi Sakib

A Belief Rule Based Expert System to Diagnose Schizophrenia Using Whole Blood DNA Methylation Data . . . 271
Mohammad Shahadat Hossain, Mumtahina Ahmed, S. M. Shafkat Raihan, Angel Sharma, Raihan Ul Islam, and Karl Andersson

Network, Security and Nanotechnology

Reactive and Proactive Routing Protocols Performance Evaluation for MANETS Using OPNET Modeler Simulation Tools . . . 285
Mala Rani Barman, Dulal Chakraborty, and Jugal Krishna Das

A Novel MIMO Antenna for 6G Applications . . . 294
Umor Fasal, Md. Kamrul Hasan, Ayubali, Md. Emdadul Hoque Bhuiyan, Abu Zafar Md. Imran, Md. Razu Ahmed, and Ferose Khan

Modification of Link Speed Estimation Model for IEEE 802.11ac WLANs by Considering Shadowing Effect . . . 303
Mohammed Aman Ullah Aman and Sumon Kumar Debnath

Electromagnetic Absorption Analysis of 5G Wireless Devices for Different Electromagnetic Shielding Techniques . . . 317
Abdullah Al Imtiaz, Md. Saifur Rahman, Tanveer Ahsan, Mohammed Shamsul Alam, Abdul Kader Mohammad Masum, and Touhidul Alam

ToothHack: An Investigation on a Bluetooth Dongle to Implement a Low-Cost and Dynamic Wireless Control-Signal Transmission System . . . 325
Md. S. Shantonu, Imran Chowdhury, Taslim Ahmed, Al Imtiaz, and Md. Rokonuzzaman

Robustness of Eigenvalue-Spread Based Rule of Combination in Dynamic Networked System with Link Failures . . . 339
Miss. Nargis Parvin, Md. Saifur Rahman, Md. Tofael Ahmed, and Maqsudur Rahman
Blockchain Based Services in Education: A Bibliometric Analysis . . . 348
Md. Shariar Hossain and A. K. M. Bahalul Haque

An Approach Towards Minimizing Covid-19 Situation Using Android App and Drone-Based Technology . . . 363
Robi Paul, Junayed Bin Nazir, and Arif Ahammad

IoT and ML Based Approach for Highway Monitoring and Streetlamp Controlling . . . 376
Mushfiqur Rahman, Md. Faridul Islam Suny, Jerin Tasnim, Md. Sabab Zulfiker, Mohammad Jahangir Alam, and Tajim Md. Niamat Ullah Akhund

Cyber-Attack Detection Through Ensemble-Based Machine Learning Classifier . . . 386
Mohammad Amaz Uddin, Khandaker Tayef Shahriar, Md. Mokammel Haque, and Iqbal H. Sarker

A Stacked Ensemble Spyware Detection Model Using Hyper-Parameter Tuned Tree Based Classifiers . . . 397
Nowshin Tasnim, Md. Musfique Anwar, and Iqbal H. Sarker

IoT Based Framework for Remote Patient Monitoring . . . 409
Ifti Akib Abir, Sazzad Hossain Rafi, and Mosabber Uddin Ahmed

Block-chain Aided Cluster Based Logistic Network for Food Supply Chain . . . 422
Rahat Uddin Azad, Khair Ahammed, Muhammad Abdus Salam, and Md. Ifthekarul Alam Efat

Programmable Logic Array in Quantum Computing . . . 435
Fatema Akter, Tamanna Tabassum, and Mohammed Nasir Uddin

QPROM: Quantum Nanotechnology for Data Storage Using Programmable Read Only Memory . . . 447
Tamanna Tabassum, Fatema Akter, and Mohammed Nasir Uddin

Analytical Modeling of Multi-junction Solar Cell Using SiSn Alloy . . . 460
Tanber Hasan Shemanto and Lubaba Binte Billah

Design and Fabrication of a Low-Cost Customizable Modern CNC Laser Cutter . . . 472
Radif Uddin Ahmed, Mst. Nusrat Yasmin, Avishek Das, and Syed Masrur Ahmmad
Hole Transport Layer Free Non-toxic Perovskite Solar Cell Using ZnSe Electron Transport Material . . . 486
Rukon Uddin, Subrata Bhowmik, Md. Eyakub Ali, and Sayem Ul Alam

A Novel ADI Based Method for Model Reduction of Discrete-Time Index 2 Control Systems . . . 499
Mohammad-Sahadet Hossain, Atia Afroz, Oshin Mumtaha, and Musannan Hossain

Emerging Technologies for Society and Industry

Prevalence of Stroke in Rural Bangladesh: A Population Based Study . . . 515
Md. Mashiar Rahman, Rony Chowdhury Ripan, Farhana Sarker, Moinul H. Chowdhury, A. K. M. Nazmul Islam, and Khondaker A. Mamun

Segmented-Truncated-SVD for Effective Feature Extraction in Hyperspectral Image Classification . . . 524
Md. Moshiur Rahman, Shabbir Ahmed, Md. Shahriar Haque, Md. Abu Marjan, Masud Ibn Afjal, and Md. Palash Uddin

Effective Feature Extraction via Folded-Sparse-PCA for Hyperspectral Image Classification . . . 538
Md. Hasanul Bari, Tanver Ahmed, Masud Ibn Afjal, Adiba Mahjabin Nitu, Md. Palash Uddin, and Md. Abu Marjan

Segmented-Incremental-PCA for Hyperspectral Image Classification . . . 550
Shabbir Ahmed, Md. Moshiur Rahman, Md. Shahriar Haque, Md. Abu Marjan, Md. Palash Uddin, and Masud Ibn Afjal

Spectral–Spatial Feature Reduction for Hyperspectral Image Classification . . . 564
Md. Touhid Islam, Mohadeb Kumar, and Md. Rashedul Islam

Predicting the Risk of COVID-19 Infection Using Lifestyle Data . . . 578
Nafiz Fuad Siam, Mahira Tabassum Khan, M. R. Rownak, Md. Rejaben Jamin Juel, and Ashraf Uddin

Forecasting Dengue Incidence in Bangladesh Using Seasonal ARIMA Model, a Time Series Analysis . . . 589
Nur Mohammed and Md. Zahidur Rahman

The Impact of Social and Economic Indicators on Infant Mortality Rate in Bangladesh: A Vector Error Correction Model (VECM) Approach . . . 599
Muhmmad Mohsinul Hoque and Md. Shohel Rana
Machine Learning Approaches to Predict Movie Success . . . 613
Md. Afzazul Hoque and Md. Mohsin Khan

Structure of Global Financial Networks Before and During COVID-19 Based on Mutual Information . . . 628
Sheikh Shadia Hassan, Mahmudul Islam Rakib, Kamrul Hasan Tuhin, and Ashadun Nobi

Employee Attrition Analysis Using CatBoost . . . 644
Md. Monir Ahammod Bin Atique, Md. Nesarul Hoque, and Md. Jamal Uddin

Readiness Towards Industry 4.0 of Selected Industrial Sector . . . 659
Choudhury Abul Anam Rashed, Mst. Nasima Bagum, and Mahfuzul Haque

Estimating Energy Expenditure of Push-Up Exercise in Real Time Using Machine Learning . . . 674
Md. Shoreef Uddin, Sadman Saumik Islam, and M. M. Musharaf Hussain

Cross-Layer Architecture for Energy Optimization of Edge Computing . . . 687
Rushali Sharif Uddin, Nusaiba Zaman Manifa, Latin Chakma, and Md. Motaharul Islam

Energy Consumption Issues of a Data Center . . . 702
Nabila Islam, Lubaba Alam Chhoa, and Ahmed Wasif Reza

Trade-Offs of Improper E-waste Recycling: An Empirical Study . . . 715
Md Shamsur Rahman Talukdar, Marwa Khanom Nurtaj, Md Nahid Hasan, Aysha Siddeka, Ahmed Wasif Reza, and Mohammad Shamsul Arefin

A Hybrid Cloud System for Power-Efficient Cloud Computing . . . 730
S. M. Mursalin, Md. Abdul Kader Jilani, and Ahmed Wasif Reza

A Sustainable E-Waste Management System for Bangladesh . . . 739
Md. Shahadat Anik Sheikh, Rashik Buksh Rafsan, Hasib Ar Rafiul Fahim, Md. Tabib Khan, and Ahmed Wasif Reza

Machine Learning Algorithms on COVID-19 Prediction Using CpG Island and AT-CG Feature on Human Genomic Data . . . 754
Md. Motaleb Hossen Manik, Md. Ahsan Habib, and Tanim Ahmed
Statistical and Bioinformatics Model to Identify the Influential Genes and Comorbidities of Glioblastoma . . . 763
Nitun Kumar Podder and Pintu Chandra Shill

Protein Folding Optimization Using Butterfly Optimization Algorithm . . . 775
Md. Sowad Karim, Sajib Chatterjee, Ashis Hira, Tarin Islam, and Rezanul Islam

Author Index . . . 789
Imaging for Disease Detection
Potato-Net: Classifying Potato Leaf Diseases Using Transfer Learning Approach

Abu Kowshir Bitto1, Md. Hasan Imam Bijoy2,3(B), Aka Das2,4, Md. Ashikur Rahman1, and Masud Rabbani2,3

1 Department of Software Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]
2 Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
{hasan15-11743,masud.cse}@diu.edu.bd
3 Daffodil International University, Dhaka 1341, Bangladesh
4 Premier University, Chattogram 4203, Bangladesh
Abstract. With the advances in contemporary farming and the use of artificial intelligence (AI) for identifying crop illnesses, research on such topics is more important than ever for the long-term development of agriculture. Numerous diseases significantly affect both the quantity and the quality of potatoes. Early, automated detection of these illnesses during the budding phase can help increase the output of potato crops, but it requires a high level of skill. Several models have already been created to identify various plant diseases. In this study, we use a variety of convolutional neural network architectures to recognize potato leaf disease, assess their early detection accuracy, and compare the results with other researchers’ work. The learning sample for each model included both the original and augmented photos, and every model was then evaluated to verify its accuracy. After being trained on the potato leaf disease dataset, the Inception-v3, Xception, and ResNet50 models were evaluated on test images. ResNet50 achieved the highest accuracy and lowest error rate for detecting potato leaf disease, followed by Inception-v3 with an accuracy of ninety-four point two five percent (94.25%) and Xception with an accuracy of eighty-nine point seven one percent (89.71%).

Keywords: Potato · Leaf Disease · Inception-V3 · Xception · ResNet50 · Transfer Learning
1 Introduction

The potato is a popular vegetable in practically every nation on the planet [1]. It is readily accessible and affordable, so people consume it widely and farmers grow it plentifully. Potato ranks fourth among crops in terms of production: after rice, wheat, and maize, it is the food crop that farmers grow most plentifully. Farmers grow potatoes all over Bangladesh, and the crop does particularly well in several districts thanks to favorable weather and good marketing. Potatoes are mainly used as a vegetable because they are delicious both on their own and combined with other vegetables. They can also be eaten as an alternative source of starch, since they are rich in carbohydrate, and they are taken as a staple meal in forty countries, including Bangladesh. The potato is a short-duration but high-yielding crop that can increase the food production of Bangladesh. Farmers in Bangladesh currently harvest only 11 tons of potatoes per hectare, but this could be increased to 20 tons. If people started eating potatoes as an alternative to rice, the pressure on agriculture could be decreased considerably; at the least, potatoes can be eaten between February and June to lessen the dependence on grain.

Farmers should choose a good variety of potato to obtain a good harvest. The advantage of a good variety is that the potatoes can be stored for a long time and more good crops can be grown from them. Since 1960, farmers in Bangladesh have been developing suitable types of potatoes. Some of the best varieties are Hira, Diamond, BARI Potato-11 (Chook), BARI Potato-12 (Dhira), Potato-13 (Granola), Potato-15 (Binela), and so on. BARI TPS-1 and BARI TPS-2 are two hybrid varieties developed by the Bangladesh Agricultural Research Institute. Farmers in Bangladesh apply many techniques to cultivate potatoes, but few of them are scientific; if scientific methods were used to raise crop yields, the profit would increase. The yield of potatoes is good when the plants are healthy. Although potatoes grow at any time of the year, they are mainly a winter-season vegetable. In winter the plants become dry and need irrigation; with a lack of irrigation the plants dry out and the potatoes remain smaller than usual. It is therefore essential to water the plants as needed, keep them hydrated, and loosen the soil 30–35 days after planting. The roots of the plants must also be kept clear of weeds, which prevent the plants from growing naturally and draw nutrition away from them. Potato plants can be attacked by insects while they are in the field; if this happens, irrigation should be stopped immediately, because irrigation affects the health of the plants. Otherwise, people have to depend on imported food, which increases their expenditure and poses health risks [11]. Sometimes potato plants also contract diseases. Insects destroy much of the crop: brown insects of 10–15 mm in size cut the roots, drill through the leaves, and can destroy an entire cultivation, and various other insects damage plants and fruits, so farmers need to apply insecticide under the guidance of an agricultural officer. When storing potatoes, farmers should use sand, spreading a thin layer over and under the potatoes to keep them fresh for a long time, and should remove rotten or insect-affected potatoes, which would otherwise spoil the healthy, fresh ones despite proper care. The most common diseases of potato plants are septoria leaf spot, early blight, bacterial wilt, late blight, and common scab. There are also fungal and fungus-like diseases: powdery mildew, powdery scab (caused not by a fungus but by a rhizarian), Rosellinia black rot, and black scurf/canker.

The disorders of potatoes can be recognized using machine learning, which allows a disease to be detected more accurately and examined in depth; once a problem is seen in detail, the solution becomes more straightforward. It is essential to keep plants healthy to harvest a good amount of fruit, and in this modern world technology lets us solve problems more quickly than before [10]: we can detect a problem more accurately and profoundly, and then solve it. It is therefore highly possible to detect the diseases of potato plants and address them using modern technology, preventing losses for the farmers who invest their money and hard work to grow potatoes and support the economy. The CNN is one of the most powerful techniques in pattern recognition, and given a large amount of data it yields encouraging results for detecting these diseases [11]. In this paper, we use several CNN architectures to detect potato leaf disease, determine which algorithm has the best detection accuracy, and compare it with other researchers’ work. Section 2 reviews the related literature. Section 3 describes the method for identifying potato leaf disease. Section 4 presents and discusses the experimental findings. Section 5 concludes the paper.
2 Literature Review

Many papers, publications, and research projects focus on the detection and categorization of potato leaf disease; a few representative works are reviewed below. Tiwari et al. [1] focus on various diseases of the potato leaf. A deep learning model is used to detect potato leaf diseases with multiple classifiers, employing techniques such as K-means clustering and the VGG19 model, on a dataset collected from Kaggle. The competition winner, GoogLeNet, had a 6.7% error rate, whereas VGG19 had a 6.8% error rate in the top-5 validation and test error categories. Sravan et al. [2] use machine learning and deep learning approaches to identify diseased plants at the lowest possible cost. Their data comprise 20,639 images taken from the PlantVillage database, and the work is done by fine-tuning the ResNet-50 model; the proposed strategy achieves the highest classification accuracy of 99.26%. Islam et al. [3] focus on diagnosing potato diseases from a small set of leaf images using cutting-edge machine learning techniques, applying transfer learning for early detection. Techniques used include CNN models, VGG19, VGG16, ResNet50, and RGB imaging, organized into five key steps: data collection and analysis, image pre-processing, data splitting, classification model construction, and model testing. With data collected from PlantVillage, their software predicts with an accuracy of 99.43% using 20% test data and 80% training data. Chen et al. [4] focus on identifying plant diseases using deep learning and transfer learning, employing MobileNet-V2, CNNs, class activation maps (CAM), and other approaches, with AlexNet and ResNet101 also used. They gathered data from the Fujian Institute of Subtropical Botany in Xiamen, China. Their model reaches an average recognition accuracy of 99.85% on the public dataset, and the average accuracy on the collected plant disease photos reaches 99.11% even under numerous classes and difficult background conditions. Dasgupta et al. [5] focus on detecting diseases in potato leaves using a deep learning model together with transfer learning, computer vision, and image classification, with the accuracy metric used to quantify and visually display model performance. Because the model is lightweight and resilient, it can be integrated into an application for a handheld device such as a smartphone, allowing crop growers to spot diseased crops on the go and save them from ruin. Their CNN-based deep learning approach to automatically detecting diseased plants in fields from leaf photos achieved 97% accuracy. A portion of the PlantVillage dataset was used for this challenge: it contains three types of potato leaf photos (Alternaria solani, Phytophthora infestans, and healthy leaves), while the full dataset originally contained fifty thousand photos of disease-infected leaves from thirty-eight different plant species. Mukti et al. [6] focus on transfer-learning-based plant disease detection using ResNet50. In their study, a CNN model based on transfer learning was built to accurately identify plant diseases, with the ResNet50 network as the major focus; this model produced the best result, with a training accuracy of 99.80%. If farmers use this application, they can manage infections and increase productivity. The plant disease dataset was obtained from the GitHub repository of the well-known ‘salathegroup’ research group, and the model’s performance was evaluated by comparison with several other transfer learning models and appropriate visualizations; as model depth increases, more image data is necessary for the best generalization of a CNN. Too et al. [7] focus on plant disease identification using ResNet with 50, 101, and 152 layers, VGG16, Inception V4, and DenseNets with 121 layers, on the PlantVillage dataset of 54,306 images covering twenty-six diseases of fourteen crop plants; DenseNet achieves a testing accuracy of 99.75%, beating the other architectures. Sumalatha et al. [8] focus on transfer-learning-based plant disease detection with deep CNNs, noting that deep neural networks (DNNs) have seen great success in image classification challenges. Six CNN architectures (Xception, ResNet50, MobileNet, VGG16, InceptionV3, and DenseNet121) are compared, and DenseNet121 achieves the best accuracy of 95.48% on test data; 11,333 PlantVillage images were used for training, validation, and testing. Arshad et al. [9] focus on plant disease identification using transfer learning: ResNet50 with transfer learning is applied to tomato, potato, and corn disease identification, with CNN and VGG16 also examined. ResNet50 attains the best performance of 98.7% on data collected from PlantVillage. Jasim et al. [10] focus on plant leaf disease detection using a convolutional neural network, achieving an accuracy of 98.29% for training and 98.029% for testing over all datasets used, with 20,636 images collected from PlantVillage. Future research on their proposed system could also test different learning rates and optimizers.
3 Methodology
The main aim of our study is to develop a transfer learning model that detects potato leaf disease. We must go through numerous phases to attain this aim, including dataset collection, data preprocessing, and model creation. The working procedure is presented in Fig. 1.
Fig. 1. Working procedure diagram to classify the potato leaf diseases.
3.1 Dataset Description
The data were collected from a GitHub repository. A total of 7148 images were collected across the three classes: 2183 images of early blight disease, 2498 of late blight disease, and 2468 of healthy leaves. The 7148 photographs in the customized leaf disease dataset are divided into 5722 for training and 1426 for testing. Sample data from the dataset are visualized in Fig. 2.
Fig. 2. Sample dataset for (a) Early Blight, (b) Late Blight, (c) Healthy.
3.2 Data Preprocessing
Preprocessing employs geometric transformations; image modification includes rotation, scaling, and translation. During preprocessing we reduced the data resolution: for Xception, Inception-v3, and ResNet50, all pictures were resized to 220×220 pixels. The photographs are all of the same quality. Shear shifting, rotation, height shifting, width shifting, and horizontal flipping were applied to the pictures as image modifications.

3.3 Model Implementation
In this study, we applied CNN-based transfer learning algorithms to the potato leaf disease dataset. The relevant theory of the transfer learning models is given below.

Transfer Learning (TL): In transfer learning, a previously trained model is utilized as the foundation for a new model on a different topic. Simply put, a model developed for one job is reused for a comparable one as an optimization that enables quick modeling progress on the second task [12, 13]. Transfer learning is frequently utilized to save time and money by avoiding the need to train many machine learning models from scratch for similar tasks, especially in resource-intensive applications such as image classification.

ResNet-50: ResNet [14], or Residual Networks, is a well-known neural network that serves as the foundation for many computer vision applications. This model, proposed by He and his collaborators, won first place in the 2015 ImageNet competition. ResNet changed the game because it made it possible to successfully train 150-layer deep neural networks. ResNet50, a ResNet variant, consists of 48 convolution layers, one MaxPool layer, and one Average Pool layer, and requires 3.8 × 10^9 floating-point operations in all.

Xception: Xception is a 71-layer convolutional neural network [15]. A version of the network pre-trained on more than a million images from the ImageNet database can classify photos into a thousand different object categories, including keyboards, pencils, mice, and various animals. Xception [16] enhances the Inception architecture by substituting depthwise separable convolutions for the original Inception modules.

Inception-V3: Inception Net set a new standard for CNN classifiers by improving performance and accuracy without increasing computational cost, and the Inception network has been carefully designed: stacked 1×1 convolutions are employed to reduce dimensionality and deliver more efficient computation and deeper networks. The modules were created to address problems such as overfitting and computational expense. Convolutional layers of sizes 1×1, 3×3, and 5×5 make up the Inception module, and their output filter banks are concatenated into a single output vector that serves as the input for the following stage.
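As a concrete sketch of this preprocessing and transfer-learning setup (ours, not the authors' code; the directory layout and augmentation magnitudes are assumptions, since the paper names the operations but not their parameters), a Keras implementation might look like this:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation as described above: shear, rotation, height/width shifts,
# and horizontal flips; all images are resized to 220x220 pixels.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,          # illustrative value, not stated in the paper
    shear_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
train_data = train_gen.flow_from_directory(
    "potato_leaf/train",        # hypothetical directory layout
    target_size=(220, 220),
    batch_size=64,
    class_mode="categorical",   # early blight, late blight, healthy
)

# Transfer learning: an ImageNet-pretrained ResNet50 used as a frozen
# feature extractor with a small classification head on top.
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(220, 220, 3)
)
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_data, epochs=40)    # 40 epochs, batch size 64, as in Sect. 4
```

The same skeleton applies to Xception and Inception-v3 by swapping the `base` model for `tf.keras.applications.Xception` or `InceptionV3`.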
3.4 Performance Calculation
We used test data to quantify the models' performance after training. The following metrics were calculated for performance evaluation, and we identified the most accurate model using these parameters. Based on each model's confusion matrix, several percentage performance measures are computed using Eqs. (1)–(7):

Accuracy = (True Positive + True Negative) / (Total Number of Images) × 100%    (1)

True Positive Rate (TPR) = True Positive / (True Positive + False Negative) × 100%    (2)

True Negative Rate (TNR) = True Negative / (False Positive + True Negative) × 100%    (3)

False Positive Rate (FPR) = False Positive / (False Positive + True Negative) × 100%    (4)

False Negative Rate (FNR) = False Negative / (False Negative + True Positive) × 100%    (5)

Precision = True Positive / (True Positive + False Positive) × 100%    (6)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (7)
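To make Eqs. (1)–(7) concrete, here is a small Python sketch (ours, not from the paper) that computes the per-class metrics from one-vs-rest confusion counts; the example uses the Xception/Late Blight counts from Table 1 and reproduces the corresponding row of Table 2:

```python
def class_metrics(tp, fn, fp, tn):
    """Percentage metrics for one class from its one-vs-rest confusion
    counts, following Eqs. (1)-(7)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn) * 100   # Eq. (1)
    tpr = tp / (tp + fn) * 100                         # Eq. (2), i.e. recall
    tnr = tn / (fp + tn) * 100                         # Eq. (3)
    fpr = fp / (fp + tn) * 100                         # Eq. (4)
    fnr = fn / (fn + tp) * 100                         # Eq. (5)
    precision = tp / (tp + fp) * 100                   # Eq. (6)
    f1 = 2 * precision * tpr / (precision + tpr)       # Eq. (7)
    return accuracy, tpr, fnr, fpr, tnr, precision, f1

# Xception / Late Blight row of Table 1: TP=844, FN=41, FP=50, TN=491
print([round(v, 2) for v in class_metrics(844, 41, 50, 491)])
# -> [93.62, 95.37, 4.63, 9.24, 90.76, 94.41, 94.88]
```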
4 Results and Discussions
A ratio of 80:20 was used to split the data into 5722 training pictures of potato leaf disease and 1426 validation images. The experiment platform has an Intel Core i5 CPU and 8 GB of RAM. All input pictures were scaled to 220×220 for the Xception, ResNet50, and Inception-v3 models, and weights from the pre-trained Xception, ResNet50, and Inception-v3 models were applied. The resulting confusion matrix (TP, FN, FP, TN) for each of the applicable models is shown in Table 1 with three classes. For Xception, we chose a batch size of 64 and 40 epochs. When Xception training is finished, we build the confusion matrix and evaluate performance for each class: Table 2 displays the computed performance, while Fig. 3 depicts the accuracy graph and loss. We used a batch size of 64 and 40 epochs for ResNet-50. When ResNet-50 training is finished, we build the confusion matrix from the model and assess the performance of each class: Fig. 4 displays the accuracy graph and loss, while Table 3 displays the computed performance. For Inception-V3, we utilized a batch size of 64 and 30 epochs. When Inception-V3 training is finished, we create the confusion matrix and assess each class's performance.
Table 1. Confusion matrices for applied transfer learning with three classes.

| Model        | Class        | TP  | FN  | FP | TN  |
|--------------|--------------|-----|-----|----|-----|
| Xception     | Late Blight  | 844 | 41  | 50 | 491 |
|              | Early Blight | 867 | 106 | 24 | 429 |
|              | Healthy      | 726 | 129 | 90 | 481 |
| ResNet-50    | Late Blight  | 854 | 56  | 13 | 503 |
|              | Early Blight | 882 | 23  | 11 | 510 |
|              | Healthy      | 785 | 60  | 13 | 568 |
| Inception-v3 | Late Blight  | 463 | 22  | 11 | 930 |
|              | Early Blight | 470 | 108 | 15 | 833 |
|              | Healthy      | 364 | 92  | 4  | 966 |
Fig. 3. Diagram for (a) Xception accuracy and (b) Xception loss on 40 epochs.
Table 2. Class-wise performance evaluation matrices for Xception.

| Model    | Class        | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|----------|--------------|--------------|---------|---------|---------|---------|---------------|--------------|
| Xception | Late Blight  | 93.62 | 95.37 | 4.63  | 9.24  | 90.76 | 94.41 | 94.88 |
|          | Early Blight | 90.88 | 89.11 | 10.89 | 5.30  | 94.70 | 97.31 | 93.03 |
|          | Healthy      | 84.64 | 84.91 | 15.09 | 15.76 | 84.24 | 88.97 | 86.89 |
The accuracy graph and loss are shown in Fig. 5, and the computed performance is shown in Table 4. In this study, we evaluate the trained model using the test dataset. Our model was given the training dataset, which contained both the original and the
Fig. 4. Diagram for (a) ResNet-50 accuracy and (b) ResNet-50 loss on 40 epochs.
Table 3. Class-wise performance evaluation matrices for ResNet-50.

| Model     | Class        | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|-----------|--------------|--------------|---------|---------|---------|---------|---------------|--------------|
| ResNet-50 | Late Blight  | 95.16 | 93.85 | 6.15 | 2.52 | 97.48 | 98.50 | 96.12 |
|           | Early Blight | 97.62 | 97.46 | 2.54 | 2.11 | 97.89 | 98.77 | 98.11 |
|           | Healthy      | 94.88 | 92.90 | 7.10 | 2.24 | 97.76 | 98.37 | 95.56 |
Fig. 5. Diagram for (a) Inception-V3 accuracy and (b) Inception-V3 loss on 30 epochs.
improved photographs, as an additional learning option. The model was subsequently verified for accuracy. The model's performance was evaluated using test images after it had been trained on the potato leaf disease dataset using the ResNet50, Inception-v3, and Xception architectures. We experimented with the ResNet50, Xception, and Inception-v3 models' weights in order to compare our model to other well-known pre-trained transfer learning networks [18]. We
Table 4. Class-wise performance evaluation matrices for Inception-v3.

| Model        | Class        | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|--------------|--------------|--------------|---------|---------|---------|---------|---------------|--------------|
| Inception-v3 | Late Blight  | 97.69 | 95.46 | 4.54  | 1.17 | 98.83 | 97.68 | 96.56 |
|              | Early Blight | 91.37 | 81.31 | 18.69 | 1.77 | 98.23 | 96.91 | 88.43 |
|              | Healthy      | 93.27 | 79.83 | 20.17 | 0.41 | 99.59 | 98.91 | 88.35 |
investigated which pre-trained network best complements this dataset among the three models: ResNet50, Xception, and Inception-v3. Table 5 reveals that for identifying potato leaf disease, ResNet50 has the best accuracy (96.11%) and the lowest error rate, Inception-v3 is second (94.25%), and Xception has the lowest accuracy (89.71%) and the highest error rate.

Table 5. Final accuracy table for the computed performance of applied transfer learning.

| Model        | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|--------------|--------------|---------|---------|---------|---------|---------------|--------------|
| Xception     | 89.71 | 89.80 | 10.20 | 10.1 | 89.9  | 93.56 | 91.6  |
| ResNet-50    | 96.11 | 94.74 | 5.26  | 2.3  | 97.7  | 98.55 | 96.60 |
| Inception-V3 | 94.25 | 85.53 | 14.47 | 1.12 | 98.88 | 97.83 | 91.11 |
We will integrate these into a mobile application. With the use of a smartphone application, farmers will be able to detect potato leaf disease as early as possible, allowing them to produce more potatoes.
5 Conclusion
In this article, we used different transfer learning approaches to recognize potato leaf disease and contrasted them with previous studies in order to determine which model recognizes the disease earliest and most precisely. The training data for our model included both the original and modified photographs as a learning option, and the model was then tested for accuracy. The model's performance was assessed using test images after being trained on the potato leaf disease dataset with the Xception, ResNet50, and Inception-v3 architectures, and we experimented with these models' weights. For identifying potato leaf disease, ResNet50 came first with 96.11% accuracy and a low error rate, Inception-v3 came second with 94.25% accuracy, and Xception came last with the least accurate detection rate of 89.71% and the most significant error rate. We considered why accuracy did not reach 100%. It is most likely due to the nature of the leaves: images of some classes are strikingly similar in shape, color, and texture to those of other classes, so it can at times be difficult for networks to predict the true labels correctly. These plant leaves can be captured with a mobile camera, and with an accurate plant disease detection model included on smartphones, farmers will be able to spot plant illnesses in a timely and straightforward manner. Farmers will have the freedom to make decisions for themselves, and it will benefit the development of agriculture.
References
1. Tiwari, D., Ashish, M., Gangwar, N., Sharma, A., Patel, S., Bhardwaj, S.: Potato leaf diseases detection using deep learning. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 461–466. IEEE (2020)
2. Sravan, V., Swaraj, K., Meenakshi, K., Kora, P.: A deep learning based crop disease classification using transfer learning. In: Materials Today: Proceedings (2021)
3. Islam, F., Hoq, M.N., Rahman, C.M.: Application of transfer learning to detect potato disease from leaf image. In: 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON), pp. 127–130. IEEE (2019)
4. Chen, J., Zhang, D., Nanehkaran, Y.A.: Identifying plant diseases using deep transfer learning and enhanced lightweight network. Multimed. Tools Appl. 79(41–42), 31497–31515 (2020). https://doi.org/10.1007/s11042-020-09669-w
5. Dasgupta, S.R., Rakshit, S., Mondal, D., Kole, D.K.: Detection of diseases in potato leaves using transfer learning. In: Das, A.K., Nayak, J., Naik, B., Pati, S.K., Pelusi, D. (eds.) Computational Intelligence in Pattern Recognition. AISC, vol. 999, pp. 675–684. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9042-5_58
6. Mukti, I.Z., Biswas, D.: Transfer learning based plant diseases detection using ResNet50. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pp. 1–6. IEEE (2019)
7. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 161, 272–279 (2019)
8. Sumalatha, G., Krishna Rao, D.S., Singothu, D., Rani, J.: Transfer learning-based plant disease detection (2021)
9. Arshad, M.S., Rehman, U.A., Fraz, M.M.: Plant disease identification using transfer learning. In: 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), pp. 1–5. IEEE (2021)
10. Jasim, M.A., Al-Tuwaijari, J.M.: Plant leaf diseases detection and classification using image processing and deep learning techniques. In: 2020 International Conference on Computer Science and Software Engineering (CSASE), pp. 259–265. IEEE (2020)
11. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
12. Mia, J., Bijoy, H.I., Uddin, S., Raza, D.M.: Real-time herb leaves localization and classification using YOLO. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7 (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579718
13. Krishna, R., Menzies, T.: Bellwethers: a baseline method for transfer learning. IEEE Trans. Softw. Eng. 45(11), 1081–1105 (2018)
14. Theckedath, D., Sedamkar, R.R.: Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 1(2), 1–7 (2020). https://doi.org/10.1007/s42979-020-0114-9
15. Bitto, A.K., Mahmud, I.: Multi categorical of common eye disease detect using convolutional neural network: a transfer learning approach. Bull. Electr. Eng. Inform. 11(4), 2378–2387 (2022)
16. François, C.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
17. Xia, X., Xu, C., Nan, B.: Inception-v3 for flower classification. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 783–787. IEEE (2017)
18. Hasan, S., Rabbi, G., Islam, R., Bijoy, H.I., Hakim, A.: Bangla font recognition using transfer learning method. In: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 57–62 (2022). https://doi.org/10.1109/ICICT54344.2022.9850765
False Smut Disease Detection in Paddy Using Convolutional Neural Network

Nahid Hasan1, Tanzila Hasan1(B), Shahadat Hossain3, and Md. Manzurul Hasan2

1 City University, Dhaka, Bangladesh
[email protected]
2 American International University-Bangladesh (AIUB), Dhaka, Bangladesh
[email protected]
3 Daffodil International University, Dhaka, Bangladesh
[email protected]

Abstract. Rice false smut (RFS) is the most severe grain disease affecting rice agriculture worldwide. Because of the various mycotoxins produced by the causal pathogen, Villosiclava virens (anamorph: Ustilaginoidea virens), epidemics result in yield loss and poor grain quality. As a result, the farmers' main concern is disease management measures that are effective, simple, and practical. Because of this, we look at images of RFS to understand and predict this severe grain disease. This research proposes a model based on the Convolutional Neural Network (CNN), widely used for image classification and identification due to its high accuracy. First, we acquire high-resolution RFS images from actual rice farming fields. Then, we train and test our model's performance using actual images to compare and validate it. As a result, our model provides 90.90% accurate results for detecting RFS in actual photos. Finally, we evaluate and record all of the data for subsequent studies.

Keywords: CNN · False Smut · Detection · Image Processing
1 Introduction
Rice (Oryza sativa) is the main crop in Bangladesh and many other countries. In Bangladesh, 90% of all farmers are involved in rice farming, and 95% of the country's food needs are met by paddy. The importance of rice to economic growth is huge; for example, Bangladesh exports paddy to other countries every year, which employs many people and brings in much foreign money. The United States Department of Agriculture states that Bangladesh produces 3.6 crore tons of rice annually, making it the fourth-largest rice producer after China, India, and Indonesia. Several factors limit rice production, but one of the most important is disease in paddy grains. In 2019, a survey was conducted in the paddy fields of NRRI in Cuttack, which found that the grown rice genotypes had lost many grains to brown, black, or ash-colored discolouration, usually accompanied by chaffiness. Discoloured grains in different rice genotypes ranged from 25% to 92% [1].
This disease has also spread widely in India; the highest infection rate, 85%, was found in Tamil Nadu [10]. False smut (Ustilaginoidea virens) is a pathogen that damages crops. The most common sign of the disease is the growth of black fungus on the rice grains, which are covered by yellow fungus in the field. Fully grown spores are orange, but they turn yellowish-green or greenish-black as they age. Most of the time, only a few grains on a panicle are affected, and the rest are fine. However, rice farmers now encounter this disease more often than when it was first observed, and the grain fetches a low market price because its colour is faded and of poor quality. As a result, false smut disease is increasingly likely to spread. Recently, RFS has been found in many of the world's most important rice-growing areas, such as China, India, and the United States [12]. We built a CNN system that makes it easy to spot paddy false smut disease. The rest of this article is organized as follows: Sect. 2 discusses related work; Sect. 3 describes how the analysis was done; Sect. 4 presents our experiment with the CNN model and discusses how well it performed on the dataset; and Sect. 5 covers future research directions before concluding.
2 Related Work
In agriculture, diagnosing disease from a picture is an interesting problem. A method has been proposed to determine whether rice is diseased based on pictures of sick rice plants; it examines the extracted sample's color, shape, and texture to find bacterial leaf blight, brown spots, and leaf smut, achieving 73.33% accuracy on the test dataset and 93.33% on the training dataset [7]. Using image processing techniques to make diagnoses from pictures of leaves is one of the newest areas of research in agriculture. One study suggests using an Optimized Deep Neural Network with the Jaya Algorithm to find and categorize diseases in paddy leaves. The authors photographed the paddy field and identified four diseases: bacterial blight, blast, brown spot, and sheath rot. RGB photos were converted into HSV images, enabling background removal and masking based on color during pre-processing, and a clustering algorithm was used to separate the diseased, normal, and background regions [8]. Using photos of leaves to determine what kind of disease a plant has is an active topic in agriculture. For example, pictures of rice leaves have been used to detect three diseases: Bacterial Leaf Blight, Brown Spot, and Leaf Smut. The deep convolutional neural network AlexNet model was used to perform accurate feature extraction and detection. Rice is one of the most important crops, as the economy depends on agricultural production and productivity; moreover, rice is a staple food in most countries. On the other hand, rice grown in the field can be infected by bacteria and fungi, so we must be more careful about growing and testing rice. With the help of image processing, three rice diseases were detected, Bacterial Leaf Blight, Brown Spot, and Leaf Smut, using the UCI Machine Learning Repository's Rice Leaf Disease dataset [9]. A residual neural network could assign the pictures to the correct disease category 95.83% of the time [6]. Other studies are
mostly about paddy blast, brown spot disease, and narrow brown spot disease. They used Matlab, and their neural network classifies paddy blast, brown spot disease, narrow brown spot disease, and normal paddy leaves into different groups. As training examples, ten pictures each of blast disease, brown spot disease, and narrow brown spot disease are used, and the test accuracy is 92.5% [13]. India's main crop is rice, and most paddy farming land is used to grow brown and white rice; rice is grown in almost every Indian state, and more than three-quarters of India's people work in agriculture. Rice plants become sick first when fungi or bacteria attack them, and second when the weather changes without warning. Diseases in grain plants or changes in the weather can cause famine and could also hurt the economy. Rice blast, brown spot, leaf smut, tungro, and sheath blight are the most common diseases. Rice disease is the most common problem for most farmers, which means that early diagnosis is very important [11].
3 Methodology
This section explains our analysis workflow and the methods required to observe our collected dataset and detect paddy false smut disease (Fig. 1).
Fig. 1. Work flow diagram
3.1 False Smut Disease
Paddy is the major crop in many nations throughout the world; thus its production and diagnosis are critical for us [9]. There are several diseases of paddy, which is why we lose paddy production every year. One of them is false smut disease. The fungal pathogen Ustilaginoidea virens, which produces both sexual ascospores and asexual chlamydospores in its life cycle, is the primary cause of rice false smut disease [2] (Fig. 2).
Fig. 2. Affected rice and healthy rice
3.2 Data Collection and Preprocessing
The dataset is the most crucial part of image processing, so we should be careful when collecting data; an imbalanced dataset is one of the significant problems in this regard [5]. Our research contains images taken directly from a paddy field in Gazipur, Bangladesh. The dataset has two classes: Disease and Healthy. After collecting the data, we pre-processed it, using image augmentation for re-scaling, cropping, rotation, zooming, resizing, and formatting the images for further analysis.

3.3 Convolutional Neural Network
CNN is a class of neural networks used for deep learning. In a nutshell, it is a robust machine learning method for automatically classifying images; it is often used to segment images as well. A convolutional neural network (CNN) is an important part of many well-known image classification methods. A CNN uses several layers to determine the final output [4]. The input image is passed through a set of filters in the convolution layers. We use binary classification and connect each feature map to a fully connected network; because of this, we use the sigmoid activation function in our research.
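A minimal Keras sketch of such a binary CNN (layer sizes and input resolution are illustrative assumptions; the paper does not list its exact architecture) is:

```python
from tensorflow.keras import layers, models

# Binary classifier (Disease vs. Healthy) with a sigmoid output,
# matching the description above. Layer sizes are illustrative.
model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # sigmoid for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```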
4 Result and Performance Analyses
In Table 1, we report how well our model works, showing performance metrics such as precision, specificity, sensitivity, and the F1 score. Accuracy metrics maintain two local variables, "total" and "count," which determine how often y_pred and y_true match up [3]. Based on our research, our on-field dataset achieves 90.90% accuracy. Accuracy is one way to measure how well a machine learning model works; it is computed as the fraction of predictions the model gets right out of all predictions made. Our research has given us an on-field dataset model with an accuracy of 90.90% (Fig. 3).
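As a concrete illustration of this total/count bookkeeping, the following sketch uses tf.keras (an assumption; the paper does not name its framework) to compute accuracy incrementally:

```python
import tensorflow as tf

# Keras accuracy metrics keep two running variables, `total` and `count`,
# and report total / count over all batches seen so far.
m = tf.keras.metrics.BinaryAccuracy()
m.update_state([1, 1, 0, 0], [0.98, 0.7, 0.2, 0.6])  # y_true, y_pred
print(float(m.result()))  # 0.75: three of the four predictions match
```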
Table 1. Result table.

|                  | Precision | Specificity | Sensitivity | F1 score | Accuracy |
|------------------|-----------|-------------|-------------|----------|----------|
| On-field Dataset | 90.62%    | 91.17%      | 90.62%      | 90.62%   | 90.90%   |
Fig. 3. On-field dataset model curves
Sensitivity measures a model's ability to predict the true positives of each available category, and specificity measures a model's ability to predict the true negatives of each available category. Our model achieves a sensitivity of 90.62% and a specificity of 91.17%. The F1 score is a better metric than accuracy in many settings since it is the harmonic mean of precision and recall; in our on-field CNN model, the F1 score is 90.62%. A confusion matrix summarizes the prediction results of a classification problem. Figure 4 shows the confusion matrix for on-field photographs.
Fig. 4. Confusion matrix of the CNN model
After fitting our model to the dataset, we obtain two curves each for training and validation: accuracy and loss. In the accuracy curve, the x-axis shows the number of epochs and the y-axis the accuracy value; likewise, in the loss curve, the x-axis represents the number of epochs and the y-axis the loss value.
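These curves can be produced from the object returned by Keras model.fit; a minimal sketch (assuming a `history` object trained with validation data; not the authors' code) follows:

```python
import matplotlib.pyplot as plt

# `history` is assumed to be the return value of model.fit(...,
# validation_data=...), whose .history dict holds per-epoch metrics.
epochs = range(1, len(history.history["accuracy"]) + 1)

plt.plot(epochs, history.history["accuracy"], label="training accuracy")
plt.plot(epochs, history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()

plt.plot(epochs, history.history["loss"], label="training loss")
plt.plot(epochs, history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```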
5 Future Research Direction and Conclusion
We have detected the last stage of false smut disease based on healthy and diseased on-field images. In our future expanded effort to ensure early detection of false smut disease, we will try to collect images of the second stage of this condition. Paddy is Bangladesh's most significant food crop, accounting for 80% of the country's cultivated land area, and paddy agriculture employs 90% of the total farmer population. The importance of paddy in socioeconomic development is immense. Unfortunately, false smut disease destroys many paddies, so farmers must face a significant amount of loss; not only farmers but also the country's economy takes a considerable hit. As a result, we developed a model to detect false smut disease and obtained noticeable results on our dataset. We hope our research contributes to developing a solution to one of Bangladesh's most severe agricultural problems and inspires other scholars to investigate agriculture's well-being.
References
1. Baite, M.S., Raghu, S., Prabhukarthikeyan, S., Keerthana, U., Jambhulkar, N.N., Rath, P.C.: Disease incidence and yield loss in rice due to grain discolouration. J. Plant Dis. Prot. 127(1), 9–13 (2020)
2. Biswas, A.: False smut disease of rice: a review. Environ. Ecol. 19(1), 67–83 (2001)
3. Flach, P.A.: The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 194–201 (2003)
4. Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019). https://doi.org/10.1109/ACCESS.2019.2927169
5. López, V., Fernández, A., Herrera, F.: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed. Inf. Sci. 257, 1–13 (2014)
6. Patidar, S., Pandey, A., Shirish, B.A., Sriram, A.: Rice plant disease detection and classification using deep residual learning. In: Bhattacharjee, A., Borgohain, S.K., Soni, B., Verma, G., Gao, X.-Z. (eds.) MIND 2020. CCIS, vol. 1240, pp. 278–293. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-6315-7_23
7. Prajapati, H.B., Shah, J.P., Dabhi, V.K.: Detection and classification of rice plant diseases. Intell. Decis. Technol. 11(3), 357–373 (2017)
8. Ramesh, S., Vydeki, D.: Recognition and classification of paddy leaf diseases using optimized deep neural network with Jaya algorithm. Inf. Process. Agric. 7(2), 249–260 (2020)
9. Rao, D.S., Kavya, N., Kumar, S.N., Venkat, L.Y., Kumar, N.P.: Detection and classification of rice leaf diseases using deep learning. Int. J. Adv. Sci. Tech. 29(03), 5868–5874 (2020)
10. Sethy, P.K., Barpanda, N.K., Rath, A.K., Behera, S.K.: Rice false smut detection based on Faster R-CNN. Indonesian J. Electr. Eng. Comput. Sci. 19(3), 1590–1595 (2020)
11. Shah, J.P., Prajapati, H.B., Dabhi, V.K.: A survey on detection and classification of rice plant diseases. In: 2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), pp. 1–8. IEEE (2016)
12. Wang, W.M., Fan, J., Jeyakumar, J.M.J., Jia, Y.: Rice false smut: an increasing threat to grain yield and quality. In: Protecting Rice Grains in the Post-genomic Era, pp. 89–108. IntechOpen, London (2019)
13. Zainon, R.: Paddy disease detection system using image processing. Ph.D. thesis, UMP (2012)
Gabor Wavelet Based Fused Texture Features for Identification of Mungbean Leaf Diseases

Sarna Majumder1,2, Badhan Mazumder1,3(B), and S. M. Taohidul Islam1,4

1 Faculty of Computer Science and Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh
[email protected]
2 Department of Computer and Communication Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh
3 Department of Computer Science and Engineering, Dhaka International University, Dhaka, Bangladesh
4 Department of Electrical and Electronics Engineering, Patuakhali Science and Technology University, Patuakhali, Bangladesh
Abstract. Early diagnosis of crop plant disease is crucial, since many of these diseases pose a considerable threat not only to global food security but also to agricultural productivity. Aiming at the detection of mungbean leaf diseases at the beginning stage, we introduce a novel approach based on the Gabor Wavelet Transform (GWT) and Cubic SVM in this paper. We perform GWT to decompose the given images into eighteen directional sub-bands and then extract a fusion of Gabor Wavelet (GW) based texture features from each detailed GWT coefficient sub-band. Finally, a Cubic SVM is deployed to classify three different disease classes using these GW based features, conducting 10-fold cross validation. Outcomes of our experimental evaluation on our self-prepared dataset of mungbean leaf diseases show that our proposed method yields an overall sensitivity of 91.11%, a specificity of 95.56%, a precision of 91.39%, and an accuracy of 91.11%. Moreover, the outcomes of our comparative analysis confirm the supremacy of our proposed framework over three currently existing approaches.
Keywords: Crop Diseases · Gabor Wavelet · Texture Feature · Cubic SVM

1 Introduction
Plant diseases, especially plant leaf diseases, are considered a prime reason for both quantitative and qualitative losses in agricultural production, which badly affect the production cost as well as the agriculture-based economy of Bangladesh. Since tools for precise and quick diagnosis still remain scarce,
Gabor Wavelet Based Fused Texture Features
23
nutrition security, the food supply, and not least the livelihoods of farmers hang in the balance whenever any sort of bacterial or fungal disease outbreak occurs. Conventionally, farmers in our country still detect diseases with their naked eyes, which usually leads to inaccurate and biased decisions since most leaf disease lesions appear exactly the same at the primary stage. This traditional approach eventually leads to the extensive use of pesticides, which not only increases the production cost significantly but also impacts the surrounding environment negatively. Hence, a robust early disease detection technique is crucial to prevent the widespread of diseases as well as substantial economic damage.

Recent advances in computer vision have paved the way to capture the texture features of a lesion better than human eyes. Adopting these features, several studies have been conducted in the recent past to recognize plant diseases using different machine learning (ML) algorithms and deep learning (DL) techniques. For disease identification of plants, Konstantinos et al. [5] developed a model based on the Visual Geometry Group (VGG) network, which obtained 99.53% accuracy. Using GLCM and wavelet-based features, Akhtar et al. introduced an automated system for plant disease diagnosis [1]; different ML methods were trained on these features, including K-Nearest Neighbour (KNN), Decision Tree, Support Vector Machine (SVM), Naive Bayes Classifier, and Recurrent Neural Networks. Trivedi et al. employed a K-means clustering algorithm with hue values to segment the contaminated region of the leaf; features from the desired region of interest were then retrieved and trained with an SVM for classification [18]. A hyperspectral-measurement-based approach to detect leaf diseases was proposed by Ashourloo et al. in their work [4]. Too et al. [17] compared the outcomes of several deep learning models, including Inception V4, Visual Geometry Group (VGG), DenseNet, and ResNet, for disease classification on the popular "Plant Village" dataset, and concluded that DenseNet was the most convenient with 99.75%. To classify diseases in crops, Mohanty et al. [9] used a transfer learning (TL) technique with a pre-trained AlexNet; with a dataset of 54,306 samples and 99.35% accuracy, the model can diagnose 26 different diseases in 14 different crop species. Rangarajan et al. [11] classified tomato leaf diseases using AlexNet and VGG16, with VGG16 achieving 97.29% and AlexNet achieving 97.49%. Shijie et al. [12] also employed VGG16 to classify tomato leaf diseases, obtaining an accuracy of 88%. In addition, Ashok et al. [3] introduced a CNN-based algorithm in their work for diagnosing tomato leaf lesions.

As previously stated, deep learning algorithms comprise the majority of recently suggested models for recognition of plant diseases. The performance of deep learning models, however, is highly dependent on the training dataset provided: on a sufficiently big dataset, these models deliver better outcomes and excellent generalizability. Since there is no publicly available dataset on mungbean leaf diseases, the datasets currently in our possession lack sufficient images in a variety of conditions, which is necessary for developing high-accuracy trained models. With such a small dataset, a developed model may perform poorly on actual test data by overfitting.
We introduce a two-stage method for detecting and classifying three major mungbean leaf diseases, Cercospora Leaf Spot, Powdery Mildew, and Yellow Mosaic, in this work. Our work's primary novelties can be summed up as follows. In the first stage, pre-processing with a morphological technique segments the lesion on the mungbean leaf surface. In the subsequent stage, a Gabor Wavelet (GW) based decomposition of the segmented pictures into 18 directional coefficient sub-bands is done, with a fusion of three different texture characteristics retrieved from each of the directional sub-bands. We picked a Gabor-based technique because it allows for directional decomposition, with each sub-band providing independent directional information in its own channel. Finally, a Cubic SVM is used to analyse these extracted GW-based fused texture features and categorise them into three separate disease classes using 10-fold cross validation. Our key contribution is to replace traditional texture features with a fusion of GW-based texture features to overcome their limitations and develop a robust mungbean leaf disease detection approach. To the best of our knowledge, this is the first work that proposes a mungbean disease diagnosis method based on the merging of GW-based features. Mungbeans are widely grown in Bangladesh's southern coastal regions, notably in the Barisal and Patuakhali districts. We believe that the farmers in these remote areas will benefit from our proposed framework, which will not only expand mungbean productivity but also deter farmers from overusing pesticides, slowing the rate of environmental pollution in the near future. The remainder of this paper is structured as follows: Sect. 2 contains sufficient information about our mungbean leaf disease dataset as well as a detailed description of our proposed GW fused features and Cubic SVM-based framework. The experimental results of our investigation are presented in Sect. 3, followed by a thorough discussion, and the entire paper is summed up in Sect. 4.
2 Methodology
The following are the main phases in our proposed GW-based mungbean leaf disease detection approach: pre-processing and segmentation of the mungbean leaf lesion, fused feature extraction using GW, and disease classification employing Cubic SVM. In a nutshell, Fig. 1 depicts our proposed methodology. This section contains an elaborate overview of the dataset we utilized and the methodology we developed.

2.1 Dataset
Our dataset includes 120 photos of mungbean leaves infected with three different diseases: Cercospora Leaf Spot, Powdery Mildew, and Yellow Mosaic. The samples were obtained from several locations in the Patuakhali area, which is recognized for producing the most mungbean in Bangladesh. When blobs appeared, the leaves were collected and photographed the same day using a Canon 700D DSLR (18–55 mm lens) at a resolution of 5184 × 3456. We clipped the
Fig. 1. Block diagram of our proposed GW based system.
ROI from each image and shrunk it to 256 × 256 pixels to make the model more efficient and reduce required computation time. The three types of mungbean leaf diseases in our dataset are depicted in Fig. 2.
Fig. 2. Three classes of mungbean leaf diseases in our study.
Lesion Segmentation. Because of the numerous contrasts on the leaf surface, a contrast enhancement approach is utilized to adjust pixel intensities, providing further information in certain sections of an image. In this study, we deploy an advanced contrast enhancement strategy [10] for further investigation, to improve low-contrast features and contrast quality. The fundamental idea behind this method is to keep an input image's mean brightness intact while adjusting contrast in local areas. To begin, the RGB color
channels of the input image are transformed to HSI. This method considers only the intensity parameter, leaving the hue and saturation parameters unchanged. Following that, a separator divides the intensity into two groups, high and low, using Eq. (1) [10]:

β_hi = {β(s) | s > β_m},   β_lo = {β(g) | g ≤ β_m}    (1)

where β_m represents the transitory threshold intensity that partitions the image into two sub-images, and β_lo and β_hi are the low and high intensity groups, respectively. To accomplish the improved intensity, the estimates of the two intensity-based sub-parameters are combined using Eq. (2) [10]:

β_enhance(s) = β_lo + (β_hi − β_lo) × z(s)    (2)
where z(s) represents the incremental density obtained from the histogram. To reduce inaccuracy, both the mean and the given brightness are computed and compared; this procedure is repeated until the improved intensity value is found to be optimal. To construct the output image, the boosted intensity is aggregated with the preliminary hue and saturation values and subsequently converted back to the RGB channels. This contrast-enhanced RGB image is then converted to HSV space, and CLAHE [20] is applied to the H channel. The CLAHE approach analyses an intensity histogram in a contextual region centered at each pixel and assigns the intensity at that pixel according to the pixel's rank in its own histogram. After that, a global thresholding approach transforms the image into binary. To segment the lesion, this binary image is used as a mask and applied to the original RGB image. The effect of our lesion segmentation approach is partially illustrated in Fig. 3.
Fig. 3. Outcomes of pre-processing and lesion segmentation.
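A rough OpenCV sketch of the later stages of this pipeline is shown below; it skips the dynamic-stretching enhancement of Eqs. (1)–(2), and the file name, clip limit, and tile size are illustrative assumptions (Otsu's method stands in for the global threshold):

```python
import cv2

# Convert the (already contrast-enhanced) RGB image to HSV, apply CLAHE
# to the H channel, binarize with a global (Otsu) threshold, and use the
# result as a mask on the original image, as described above.
img = cv2.imread("mungbean_leaf.jpg")              # hypothetical input file
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed params
hsv[:, :, 0] = clahe.apply(hsv[:, :, 0])

_, mask = cv2.threshold(hsv[:, :, 0], 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)
lesion = cv2.bitwise_and(img, img, mask=mask)      # lesion-segmented image
```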
2.2 Feature Extraction
Gabor Wavelet Representation. The two-dimensional Gabor wavelet can be defined using Eq. (3) [15]:

ψ_G(x) = e^{j k_0 · x} e^{−(1/2)|Bx|²}    (3)

where B is a 2×2 diagonal matrix that elongates the filter in a specific direction, defined as B = diag(ε^{−1/2}, 1) with ε ≥ 1, and the parameter k_0 represents the frequency of the complex exponential. Elongation [14] of this filter is done by setting k_0 = [0, 3] and ε = 4, giving a low-frequency complex exponential with a small number of significant oscillations orthogonal to the wavelet's main axis. These two parameters were chosen so that the transform gives higher responses to pixels linked with lesion texture, as they are well suited to the characterization of directional information. Alongside frequency and elongation, the other two fundamental characteristics of the Gabor wavelet are scale and orientation: scale determines the breadth of the elongated object, whereas orientation determines its direction. Lesions of varying widths can be found on mungbean leaves in distinct orientations. To accommodate all possible sizes and orientations of lesions, we used two scales, 8 and 16, with 9 different orientations (0°, 25°, 50°, 75°, 100°, 125°, 150°, 175°, and 200°) in our work, and kept the highest responses from all 9 orientations for further analysis.

Gabor Wavelet Based Texture Feature Fusion. GLCMs, unlike traditional textural features, reveal spatial information and occurrence details despite being time consuming [19]. Leaf lesions may have a distinct texture in one direction that can be tracked using GLCMs. Likewise, binary patterns in mungbean leaf lesions can assist in classifying disease groups: to characterize the spatial organization of a local image texture, the Local Binary Pattern (LBP) derives binary patterns from neighbouring pixels [7]. The Gray Level Run Length (GLRL) texture feature is a sophisticated texture feature that is responsive to intensity patterns; since it can differentiate between distinct image intensities, it can facilitate the detection of lesion formation and dispersion. In our proposed feature extraction method, GW is employed solely for decomposition: it decomposes the input image into detailed directional coefficient sub-bands according to the chosen scales and orientations. As previously stated, we applied GW with two different scales and nine distinct orientations, yielding 18 separate directional sub-bands from each input image. After that, GLCM, LBP, and GLRL features are extracted from each of the 18 directional sub-bands. Our study employs the twenty-two most common GLCM features [2]. Four main directions, 0° (Horizontal H), 90° (Vertical), 45° (Diagonal D1), and 135° (Diagonal D2), are chosen to construct the corresponding GLCMs, and their average is used for experimentation in order to capture irregular patterns of mungbean leaf lesions. LBP is calculated by comparing pixel values to neighbouring pixels in each directional sub-band [8]. For each gray level, GLRL is formed by stating the
direction and quantifying the number of runs and their lengths in those directions [16]. Algorithm 1 demonstrates our proposed GW-based texture feature extraction algorithm; an illustrative code sketch follows it.

Algorithm 1. GW based texture feature extraction algorithm
Input: Lesion-segmented mungbean leaf images, values of the scaling array S[2] and orientation array O[9] of the Gabor wavelet.
Output: GW based texture dataset
1: for m = 1 to the number of lesion-segmented mungbean leaf images do
2:   Read input image Im
3:   Read input values of arrays S and O for the Gabor wavelet
4:   Apply GW decomposition of Im according to the S and O arrays
5:   Obtain 18 directional sub-bands in total
6:   Determine GLCM+LBP+GLRL features from each of the 18 directional sub-bands
7:   Return the GW texture feature vector (1×18 sub-bands) and add it to the corresponding dataset
8: end for
9: return GW based texture dataset
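The following Python sketch (ours; the paper's implementation is in MATLAB) illustrates the decomposition and a reduced version of the feature fusion using scikit-image: a 2 × 9 Gabor filter bank produces 18 sub-bands, from which GLCM statistics and an LBP histogram are pulled. GLRL features, the paper's exact orientations, and its full 22-feature GLCM set are omitted for brevity:

```python
import numpy as np
from skimage.filters import gabor
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def gw_texture_features(image, frequencies=(1/8, 1/16), n_orientations=9):
    """Gabor decomposition of a 2-D uint8 grayscale image into 18
    directional sub-bands, with GLCM + LBP features from each sub-band.
    Parameter choices are a reduced illustration of the paper's fusion."""
    features = []
    for f in frequencies:                      # two scales
        for k in range(n_orientations):        # nine orientations (approx.)
            theta = k * np.pi / n_orientations
            real, _ = gabor(image, frequency=f, theta=theta)
            # Rescale the sub-band to uint8 for the co-occurrence matrix.
            sub = np.uint8(255 * (real - real.min()) / (np.ptp(real) + 1e-9))
            glcm = graycomatrix(sub, distances=[1],
                                angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                                levels=256, symmetric=True, normed=True)
            for prop in ("contrast", "homogeneity", "energy", "correlation"):
                features.append(graycoprops(glcm, prop).mean())
            # Uniform LBP histogram (P=8, R=1) as the binary-pattern part.
            lbp = local_binary_pattern(sub, P=8, R=1, method="uniform")
            hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
            features.extend(hist)
    return np.asarray(features)
```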
2.3 Classification Using Cubic SVM
We used an SVM classifier in our study to categorize mungbean leaf lesions based on GW texture features, which is favourable when memory space is limited. Around 70% of all lesion-containing leaf images in our database were used for training, while 30% were used for testing. The derived GW texture features were supplied to the SVM classifier to generate the requisite trained model. In multidimensional feature space, SVM finds a hyperplane that separates the classes in the best feasible way [6]. SVM employs a mathematical function known as the kernel to generate the separating hyperplane; linear, polynomial, radial basis function (RBF), sigmoid, and other non-linear functions are among the common kernels [13]. The kernel is basically defined employing Eq. (4):
(4)
In our work, we implemented Cubic SVM with a polynomial kernel function of order 3 (box constraint level = 1), which can be expressed using Eq. 5. 3 k xTi .xj + 1
3
(5)
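A minimal scikit-learn sketch of this classifier is given below; sklearn's polynomial kernel is (gamma·⟨x_i, x_j⟩ + coef0)^degree, so degree=3 and coef0=1 match Eq. (5) up to the gamma scaling. The random arrays are stand-ins for the 120 × 3827 GW feature matrix and the three disease labels:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((120, 3827))        # stand-in for the GW feature matrix
y = rng.integers(0, 3, size=120)   # stand-in for the 3 disease labels

# Cubic SVM: polynomial kernel of degree 3, box constraint C = 1,
# evaluated with 10-fold cross validation as in the paper.
clf = SVC(kernel="poly", degree=3, coef0=1, C=1)
scores = cross_val_score(clf, X, y, cv=10)
print("mean accuracy: %.2f%%" % (100 * scores.mean()))
```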
Results and Discussion
For our experimental studies, we employed an 64-bit PC (Acer) with a i5 CPU (2.4 GHz) and 8 GB RAM, as well as MATLAB R2017b for all sorts of program implementation.
3.1 GW Decomposition
Figure 4(a) exhibits a lesion-segmented image of an affected mungbean leaf, and Fig. 5 demonstrates all 18 sub-bands derived using GW decomposition with two distinct scales and nine different orientations. To highlight the feasibility of the GW representation, we performed a discrete wavelet transform (DWT) with a basic filter (filter: haar, order: 4) on the same image shown in Fig. 4(a) and present all 16 generated sub-bands in Fig. 4(b).
Fig. 4. (a) Segmented image (b) Outcome of decomposition by applying DWT (4 levels).
3.2 Statistical Analysis
We feed the Cubic SVM classifier a 120×3827 matrix, where the row count 120 is the total number of leaf photos in our dataset and the column count 3827 is the overall number of retrieved GW-based fused texture features per image. For the performance assessment of our suggested method, we evaluated four distinct indices: sensitivity, specificity, precision, and accuracy. We also carried out 10-fold cross validation to ensure optimal validation, and the corresponding statistics for each disease class are shown in Table 1. Table 1 demonstrates that the overall sensitivity, specificity, precision, and accuracy under 10-fold cross validation are 91.11%, 95.56%, 91.39%, and 91.11%, respectively. This is because GW-based fused texture features are superior at categorizing the complicated texture of mungbean leaf lesions and retrieving local texture properties.
Fig. 5. Obtained 18 sub-bands of GW decomposition

Table 1. Statistical analysis adopting 10-fold cross validation (unit: %).

| Disease's Name       | Sensitivity | Specificity | Precision | Accuracy |
|----------------------|-------------|-------------|-----------|----------|
| Cercospora Leaf Spot | 93.33 | 93.33 | 87.50 | 93.33 |
| Powdery Mildew       | 86.67 | 93.33 | 86.67 | 86.7  |
| Yellow Mosaic        | 93.33 | 100   | 100   | 93.33 |
| Overall              | 91.11 | 95.56 | 91.39 | 91.11 |

3.3 Comparison to Other Classifiers
Our extracted features were also supplied to linear discriminant analysis (LDA), decision tree (DT), and k-nearest neighbour (KNN) classifiers for comparison with Cubic SVM, employing 10-fold cross validation; their outcomes are reported in Table 2.

Table 2. Comparison of classifiers (unit: %).

| Classifiers | Sensitivity | Specificity | Precision | Accuracy |
|-------------|-------------|-------------|-----------|----------|
| LDA         | 75.56 | 87.78 | 75.48 | 75.56 |
| DT          | 62.22 | 81.11 | 63.55 | 62.22 |
| KNN         | 75.56 | 87.78 | 76.12 | 75.56 |
| Cubic SVM   | 91.11 | 95.56 | 91.39 | 91.11 |

3.4 Comparison to Existing Approaches
To demonstrate the effectiveness and sustainability of our work, we compared our suggested GW+Texture+CubicSVM strategy to three different methodologies, the findings of which are reported in Table 3. Table 3 indicates that DCT+DWT+Linear SVM [1] yields 82.22%, 91.11%, 82.46%, and 82.25%; Hue+Linear SVM [18] yields 75.6%, 87.78%, 75.93%, and 75.55%; and CNN [3] yields 88.89%, 94.44%, 89.36%, and 88.91%, respectively, in terms of sensitivity, specificity, precision, and accuracy. Our proposed GW+Texture+CubicSVM approach achieves 91.11%, 95.56%, 91.39%, and 91.11% on those four indices, which is superior to the others.

Table 3. Comparison to existing approaches (unit: %).

| Methods                          | Sensitivity | Specificity | Precision | Accuracy |
|----------------------------------|-------------|-------------|-----------|----------|
| DCT+DWT+Linear SVM [1]           | 82.22 | 91.11 | 82.46 | 82.25 |
| Hue+Linear SVM [18]              | 75.6  | 87.78 | 75.93 | 75.55 |
| CNN [3]                          | 88.89 | 94.44 | 89.36 | 88.91 |
| GW+Texture+Cubic SVM (Proposed)  | 91.11 | 95.56 | 91.39 | 91.11 |

3.5 Analysis of Computational Time
We assessed the elapsed time for each program using two MATLAB functions, tic and toc. The findings of the computational time analysis for all four classifiers are depicted in Fig. 6, which reveals that LDA, DT, KNN, and Cubic SVM consume 2.7310 ± 2.0297, 3.1002 ± 1.8578, 2.5138 ± 1.9354, and 3.2910 ± 1.8958 seconds (mean ± standard deviation), respectively.

3.6 Discussion
Table 2 highlights that Cubic SVM surpasses the other deployed classifiers under 10-fold cross validation, with a sensitivity of 91.11%, a specificity of 95.56%, a precision of 91.39%, and an accuracy of 91.11%. DT, on the other hand, provides the worst performance, with a sensitivity of 62.22%, a specificity of 81.11%, a precision of 63.55%, and an accuracy of 62.22%, allowing us to state that Cubic SVM delivers the best classification performance of all the selected classifiers. The outcomes in Table 3 imply that our GW+Texture+Cubic SVM approach is extremely resilient, probably for two basic reasons. First, we suggested a novel feature extraction strategy that provides a productive collection of features for our disease detection system by integrating GW and texture characteristics. Second, Cubic SVM is a powerful way to improve classification results by evaluating detailed directional data. The plots in Fig. 6 illustrate that the computation times of all employed classifiers are satisfactory and fairly close. In addition, as model
Fig. 6. Comparison of classifiers required computational time (Unit: seconds)
training will not be necessary for each new instance, calculation time will be reduced even further in practice. We employed a self-collected database with a limited collection of pictures to examine the performance of our retrieved GW-based texture features for classifying mungbean leaf lesions, which is the major shortcoming of our study. In the future, we plan to use additional affected mungbean leaf images to assess our GW+Texture+Cubic SVM approach. Moreover, in addition to other classification approaches for evaluating the overall performance of the extracted features, there are more highly developed transforms, such as the Shearlet transform, that have the potential to outperform GW, which we would like to explore in our future research work.
4 Conclusion
Our study introduces a Gabor wavelet based fused texture feature extraction algorithm for early disease diagnosis in mungbean leaves. Traditional texture features are substituted with GW-based texture features in order to categorize disease classes more correctly with overall enhanced performance. Comparative experimental findings revealed that our GW+Texture+CubicSVM method outperforms three other current techniques by obtaining improved outcomes, which directly demonstrates our method's reliability and superiority.
Acknowledgement. This study is being carried out with the collaboration of the CRG of PIU-BARC, NATP-2, Asi@Connect, and the TEIN society. Special thanks to Sudipto Baral and Manish Sah from the CSE 12th batch, Patuakhali Science and Technology University, Patuakhali, Bangladesh for their efforts and assistance in preparing the dataset.
References
1. Akhtar, A., Khanum, A., Khan, S.A., Shaukat, A.: Automated plant disease analysis (APDA): performance comparison of machine learning techniques. In: 2013 11th International Conference on Frontiers of Information Technology, pp. 60–65. IEEE (2013)
2. Albregtsen, F., et al.: Statistical texture measures computed from gray level coocurrence matrices. Image Processing Laboratory, Department of Informatics, University of Oslo 5(5) (2008)
3. Ashok, S., Kishore, G., Rajesh, V., Suchitra, S., Sophia, S.G., Pavithra, B.: Tomato leaf disease detection using deep learning techniques. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 979–983. IEEE (2020)
4. Ashourloo, D., Aghighi, H., Matkan, A.A., Mobasheri, M.R., Rad, A.M.: An investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9(9), 4344–4351 (2016)
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018)
6. Jain, U., Nathani, K., Ruban, N., Raj, A.N.J., Zhuang, Z., Mahesh, V.G.: Cubic SVM classifier based feature extraction and emotion detection from speech signals. In: 2018 International Conference on Sensor Networks and Signal Processing (SNSP), pp. 386–391. IEEE (2018)
7. Li, L., Fieguth, P.W., Kuang, G.: Generalized local binary patterns for texture classification. In: BMVC, vol. 123, pp. 1–11 (2011)
8. Mäenpää, T., Pietikäinen, M.: Texture analysis with local binary patterns. In: Handbook of Pattern Recognition and Computer Vision, pp. 197–216. World Scientific (2005)
9. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
10. Rahman, M., Liu, S., Lin, S., Wong, C., Jiang, G., Kwok, N.: Image contrast enhancement for brightness preservation based on dynamic stretching. Int. J. Image Process. 9(4), 241 (2015)
11. Rangarajan, A.K., Purushothaman, R., Ramesh, A.: Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput. Sci. 133, 1040–1047 (2018)
12. Shijie, J., Peiyi, J., Siping, H., et al.: Automatic detection of tomato diseases and pests based on leaf images. In: 2017 Chinese Automation Congress (CAC), pp. 2537–2510. IEEE (2017)
13. Singh, S., Kumar, R.: Histopathological image analysis for breast cancer detection using cubic SVM. In: 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 498–503. IEEE (2020)
14. Soares, J.V., Cesar Jr., R.M.: Segmentation of retinal vasculature using wavelets and supervised classification: theory and implementation. In: Automated Image Detection of Retinal Pathology, pp. 239–286. CRC Press (2009)
15. Soares, J.V., Leandro, J.J., Cesar, R.M., Jelinek, H.F., Cree, M.J.: Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans. Med. Imaging 25(9), 1214–1222 (2006)
16. Tang, X.: Texture information in run-length matrices. IEEE Trans. Image Process. 7(11), 1602–1609 (1998)
17. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 161, 272–279 (2019)
18. Trivedi, V.K., Shukla, P.K., Pandey, A.: Hue based plant leaves disease detection and classification using machine learning approach. In: 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), pp. 549–554. IEEE (2021)
19. Wei, L., Hong-ying, D.: Real-time road congestion detection based on image texture analysis. Procedia Eng. 137, 196–201 (2016)
20. Zuiderveld, K.J.: Contrast limited adaptive histogram equalization. In: Graphics Gems (1994)
Potato Disease Detection Using Convolutional Neural Network: A Web Based Solution

Jannathul Maowa Hasi(B) and Mohammad Osiur Rahman
Department of Computer Science and Engineering, University of Chittagong, Chattogram 4331, Bangladesh [email protected], [email protected]
Abstract. Despite being reliant on agriculture for the provision of food, many nations, including Bangladesh, struggle to feed their populations adequately. The potato (Solanum tuberosum) is Bangladesh's second-most popular and in-demand crop, but the deadly diseases late blight and early blight cause an enormous loss in potato production. To increase plant yields, it is important to identify the symptoms of these diseases at an early stage and advise farmers on how to respond. This project involves the development of a web application that allows users to upload images of potato leaves and then uses a trained CNN model to diagnose the disease from those images. After a comparison of different Convolutional Neural Network models (EfficientNetB0-B3, MobileNetV2, DenseNet121, and ResNet50V2), MobileNetV2 reached an accuracy of 96.14% on the test dataset in detecting early blight and late blight. So, MobileNetV2 is deployed in the web application to detect the disease in the input image. The developed web application offers a user-friendly interface that enables farmers who are less tech-savvy to use this method to identify disease at an early stage and prevent it. Keywords: Deep Learning · CNN · Potato Disease · Late Blight · Early Blight · Disease Detection
1 Introduction
No food, no life. In Bangladesh, more than 40 million people (27 percent of the population) are food insecure, with more than 11 million people suffering from intense hunger [1]. In Bangladeshi farming, the potato (Solanum tuberosum) is one of the most in-demand and widely cultivated crops. Bangladesh is the world's seventh largest potato grower, and potato is the second most produced crop in Bangladesh, behind rice [2]. But being affected by deadly diseases like late blight and early blight, Bangladeshi potato cultivation faces a great loss every year. According to the Department of Agricultural Extension (DAE), with an annual average demand of around 70 lakh tonnes, there was a surplus of about 40 lakh tonnes of potatoes in 2020 [22]. Late blight and early blight are the two most common and destructive diseases of potatoes. Alternaria solani is the fungus that causes early blight in potatoes. The disease
damages leaves, stems, and tubers, reducing yield, tuber size, tuber storability, fresh-market and processed tuber quality, and crop marketability [3]. Every year, early blight wreaks havoc on the potato crop, producing premature defoliation, a major reduction in tuber output, and a quality loss of up to 50% [4]. Late blight (caused by the water mold Phytophthora infestans) is the most destructive potato disease. Late blight causes the loss of almost three million hectares of potato fields globally [5]. Bangladesh also faces a loss of 25-27% of potato production each year due to this specific disease, according to the FAO (Food and Agriculture Organization of the United Nations) [6]. The effect of applying remedies such as fungicides against potato blights highly depends on the timing of the application. After infecting certain plants, this disease can destroy the plants of an entire field in no time [7], and the decay continues even after uprooting all the infected plants. To improve the efficiency of controlling late blight, it is mandatory to detect the infection effectively at the primary stage and inform the farmer about the prediction. The pathogens responsible for late and early blight enter through the leaves, so the symptoms appear on the leaves first (Fig. 1).
Fig. 1. Healthy, Late Blight and Early Blight Infected Potato Leaves
Traditionally, farmers depend on their eyesight and visual inspection to detect the symptoms. But this is not reliable, as humans are not error-free. Fortunately, in recent years many researchers have used CNNs to detect plant disease with truly significant results [8].
2 Related Work
Earlier, many researchers worked with CNNs to develop plant disease classification systems and recommended various approaches. Tiwari et al. [9] proposed a model using deep learning to detect potato leaf diseases. The authors used the pre-trained models InceptionV3, VGG16, and VGG19 to extract the features of an input image, and the classifiers KNN, SVM, neural network, and logistic regression for classification. According to that study, the optimal result was found with the VGG19 model and the logistic regression classifier. Emma Harte [21] worked on a plant disease recognition system using CNN. The author used ResNet34 and optimized it to get a better result while dealing with 'in-field' images. But the great irony of this work is that, although the author's model showed great accuracy when tested with laboratory-controlled images, the same model showed an accuracy of only 44% with images collected from fields or other non-controlled image
sources. Moreover, the author implemented a web-based classification system with that optimized model. Agarwal et al. [10] presented a deep learning-based technique to detect early and late blight diseases in potatoes. Their experiments show that the proposed model works even in difficult conditions like changing backdrops, image sizes, spatial differentiation, high-frequency variation in illumination grades, and real-world photos. The proposed Convolutional Neural Network (CNN) architecture includes four convolution layers, with 32, 16, and 8 filters, and provides training and testing accuracy of 99.47% and 99.8%, respectively. Afzaal et al. [11] trained three convolutional neural networks, GoogleNet, VGGNet, and EfficientNet, using the PyTorch framework to detect early blight using AI (Artificial Intelligence). As a result, they achieved validation accuracy in the range of 0.95-0.97 for all three candidates in detecting early blight, but the performance accuracy of EfficientNet and VGGNet was higher compared with GoogleNet. Rashid et al. [12] developed a model to detect potato leaf diseases using multi-level deep learning. In the first level of the proposed model, the YOLOv5 image segmentation technique is used to extract potato leaf images from potato plants. In the second level, a PDDCNN (Potato Disease Detection Convolutional Neural Network) is developed to classify the input images as healthy, late blight infected, or early blight infected. The developed model achieves an average accuracy of 91.15%.
3 Proposed System
In Fig. 2 the block diagram of the proposed system is illustrated. It is aimed to develop a web application using the Convolutional Neural Network model MobileNetV2. MobileNetV2 proved to be the best model concerning performance when comparing seven different CNN models (ResNet50V2, MobileNetV2, DenseNet121, and EfficientNetB0-B3) on the same dataset. At first, the CNN model is trained with the selected dataset. Then the trained CNN model is used to implement a web application. The application takes the potato leaf's image as input. The input image is then preprocessed. After preprocessing, feature extraction takes place. Extracted features are used to run the classification process, and finally the result of classification, which is the name of the predicted class, is shown to the application user on the web interface.

3.1 Methodology
This research has two main parts: comparison of models and implementation of the web application to detect potato diseases. For the comparison part, some well-known Convolutional Neural Network (CNN) models were trained using the dataset. For the training purpose, TensorFlow's Keras framework is used, and the models are compared concerning their performances. After the comparison, the best candidate model was selected to be deployed to the web application. The trained model is integrated with the web app, developed using the Flask API, to classify the input image.
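To make this concrete, the following is a minimal sketch of how a pretrained MobileNetV2 could be fine-tuned with TensorFlow's Keras framework for the three potato-leaf classes, using the hyperparameters reported later in Sect. 4.1 (softmax output, categorical labels, RMSprop, learning rate 0.0001, batch size 6, 10 epochs); the directory path and exact preprocessing are illustrative assumptions rather than the authors' code.

```python
import tensorflow as tf
from tensorflow import keras

NUM_CLASSES = 3  # Healthy, Early_Blight, Late_Blight

# Pretrained MobileNetV2 backbone with a fresh softmax classification head
base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")

model = keras.Sequential([
    base,
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

# "dataset/train" is a hypothetical directory of class-labelled leaf images
train_ds = keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(224, 224),
    batch_size=6, label_mode="categorical")

model.fit(train_ds, epochs=10)
```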
Fig. 2. Block Diagram of the System
CNNs have multiple layers, so when an image passes through the CNN layers, its feature maps technically get deeper and deeper while getting spatially smaller in every layer [13]. So, as the leaf's image passes through each layer, it gets deeper and smaller, and the most important features are filtered out. MobileNetV2 was introduced to gain better performance with mobile devices [15]. The basic architecture of MobileNetV2 consists
of 17 building blocks in a row, followed by a 1x1 convolution layer, an average-pooling layer, and a classification layer, making the model 53 layers deep [14].

3.2 Dataset Description
All images of potato leaves are collected from the New Plant Disease Dataset [16]. This dataset contains a total of 87.9k images in 38 subdirectories with test and train classes. To carry out the research, the dataset is divided into 80% for training and 20% for validation. Table 1 shows the number of samples under the Early_Blight, Healthy, and Late_Blight classes of the dataset after the division. Sample images from the used dataset are also viewable in Fig. 3.

Table 1. Dataset Description

Class        | Train | Test | Total
Healthy      | 1824  | 456  | 2280
Late_Blight  | 1939  | 485  | 2424
Early_Blight | 1939  | 485  | 2424
Fig. 3. Sample images from Healthy, EarlyBlight and LateBlight classes
3.3 Data Preprocessing
Images from the selected dataset are preprocessed to get better results during classification. In the preprocessing phase, the input image was resized to (256, 256) for EfficientNetB0, DenseNet121, ResNet50V2, and MobileNetV2, (240, 240) for EfficientNetB1, (260, 260) for EfficientNetB2, and (300, 300) for EfficientNetB3. For data augmentation, three different versions (excluding the original version) of the images from the New Plant Disease Dataset are used to ensure that the models are not biased while classifying a sample [16].
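As an illustration, the resizing and augmentation step might look like the following Keras sketch; the specific augmentation operations (flips, rotation, zoom) are assumptions, since the paper does not list the exact transformations used to create the three augmented versions.

```python
import tensorflow as tf
from tensorflow import keras

# Per-model input sizes used in the comparison
INPUT_SIZES = {
    "EfficientNetB0": (256, 256), "DenseNet121": (256, 256),
    "ResNet50V2": (256, 256), "MobileNetV2": (256, 256),
    "EfficientNetB1": (240, 240), "EfficientNetB2": (260, 260),
    "EfficientNetB3": (300, 300),
}

# Hypothetical augmentation pipeline producing extra versions of each image
augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal_and_vertical"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])

def preprocess(image, label, model_name="MobileNetV2"):
    # Resize to the input size expected by the chosen backbone
    image = tf.image.resize(image, INPUT_SIZES[model_name])
    # Apply a random distortion so the model does not become biased
    return augment(image, training=True), label
```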
Fig. 4. Different versions of leaf images
3.4 Evaluation Measures
The following evaluation measures [17] are used to evaluate the convolutional neural network model that is integrated into the web application.

Accuracy:
Accuracy = Number of correct predictions / Number of total predictions    (1)

Latency:
The time taken by the model to process one unit of data is referred to as latency. It is measured in seconds.

Precision:
Precision = True Positive / (True Positive + False Positive)    (2)

Recall:
Recall = True Positive / (True Positive + False Negative)    (3)

F1-Score:
F1-score = 2 x (Precision x Recall) / (Precision + Recall)    (4)
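These measures can be computed directly from the model's predictions. The sketch below uses scikit-learn, which is an assumption since the paper does not name its metrics library; it reports accuracy, per-class precision/recall/F1, and per-image latency.

```python
import time
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(model, x_test, y_true):
    start = time.perf_counter()
    probs = model.predict(x_test)  # one forward pass over the test set
    latency = (time.perf_counter() - start) / len(x_test)  # seconds per image

    y_pred = np.argmax(probs, axis=1)
    acc = accuracy_score(y_true, y_pred)                 # Eq. (1)
    pr, re, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=None)                    # Eqs. (2)-(4), per class
    return acc, pr, re, f1, latency
```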
3.5 UML Use Case Diagram of Proposed Web Application UML Use Case Diagrams are used to collect a system’s needs, including those resulting from both internal and external influences. When a system is investigated to determine its functioning, use cases are created and actors are recognized [18]. The user is the actor in the online application, and the three primary features are uploading an image, viewing predictions, and watching associated videos. The Use Case Diagram of the proposed web app is shown in Fig. 5.
Fig. 5. UML Use Case Diagram
3.6 UML Activity Diagram of Proposed Web Application A behavioral diagram that shows a system’s behavior is called an activity diagram [18]. The control flow from beginning to end and every action that is being done are shown in the system’s activity diagram. The Activity Diagram of the proposed system is shown in Fig. 6.
4 Experimental Result
In this section, the comparison of CNN models, model evaluation, the user interface of the web application, sample predictions of the web application, and a comparison with other works are described in detail.

4.1 Comparison of CNN Models
The result of the comparison of the CNN models EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, MobileNetV2, DenseNet121, and ResNet50V2 is shown in Table 2. A comparison of the accuracy and training time of those models is also shown in Fig. 7.

Hyperparameters used in the comparison:
– Activation = Softmax
– Class Mode = Categorical
– Optimizer = RMSprop
– Classifier = Softmax
– Input Shape = (224, 224, 3) for EfficientNetB0, DenseNet121, ResNet50V2, MobileNetV2, (240, 240, 3) for EfficientNetB1, (260, 260, 3) for EfficientNetB2 and (300, 300, 3) for EfficientNetB3
Fig. 6. UML Activity Diagram
Fig. 7. Comparison of accuracy and training time
After the comparison, MobileNetV2, EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, DenseNet121, and ResNet50V2 reached accuracies of 96.14%, 93.55%, 91.94%, 92.78%, 91.37%, 92.85%, and 82.68%, respectively. Considering the latency, EfficientNetB1 (0.0301 s) takes a lead over MobileNetV2 (0.0336 s), but MobileNetV2 beats EfficientNetB1 with a training time of 16.8 s. Moreover, MobileNetV2 was designed to get better results with mobile devices. Considering all of this, the comparison makes MobileNetV2 the best candidate.
Table 2. Comparison of CNN models. All experiments used batch size 6, 10 epochs, and a learning rate of 0.0001; per-class F1-scores are reported for the Early Blight, Healthy, and Late Blight classes.

Exp. | Model          | F1 Early Blight | F1 Healthy | F1 Late Blight | Latency  | Accuracy
1    | EfficientNetB0 | 0.9256          | 0.9683     | 0.9150         | 0.0396 s | 0.9355
2    | DenseNet121    | 0.9450          | 0.9296     | 0.9102         | 0.0408 s | 0.9285
3    | ResNet50V2     | 0.8756          | 0.8155     | 0.7795         | 0.0385 s | 0.8268
4    | MobileNetV2    | 0.9654          | 0.9738     | 0.9461         | 0.0336 s | 0.9614
5    | EfficientNetB1 | 0.9221          | 0.9436     | 0.8935         | 0.0301 s | 0.9194
6    | EfficientNetB2 | 0.9550          | 0.9282     | 0.8990         | 0.0358 s | 0.9278
7    | EfficientNetB3 | 0.9407          | 0.9240     | 0.8765         | 0.0482 s | 0.9137
4.2 Model Evaluation
A confusion matrix evaluates a model's performance by its ability to correctly classify samples as positive or negative. Figure 8 shows the confusion matrix of MobileNetV2.
Fig. 8. Confusion Matrix & performance matrix of the model
Accuracy, precision, recall, F1-score, etc. help us to know how well the model actually performs while classifying images. The model loss measures how well a model performs during classification, whereas the accuracy measures how closely a model's forecast matches the actual data. After each optimization iteration, a model's performance is shown by its loss value [19] (Fig. 9).
Fig. 9. Model Accuracy and Model Loss of MobileNetV2
4.3 User Interface of the Web Application
The user interface of the implemented web application is shown in Fig. 10. The user interface contains two buttons, one for selecting the leaf's image to upload from the device's storage and the other for predicting the disease by analyzing the input image.
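A minimal Flask route behind such an interface could look like the sketch below; the model file name, template name, form field, and class ordering are all hypothetical placeholders, not the authors' actual implementation.

```python
import numpy as np
from flask import Flask, request, render_template
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("mobilenetv2_potato.h5")  # hypothetical saved model
CLASS_NAMES = ["Early_Blight", "Healthy", "Late_Blight"]  # assumed label order

@app.route("/predict", methods=["POST"])
def predict():
    # Image uploaded through the "select" button on the web page
    file = request.files["leaf_image"]
    img = keras.utils.load_img(file, target_size=(224, 224))
    x = keras.utils.img_to_array(img)[np.newaxis] / 255.0
    probs = model.predict(x)[0]
    # Show the predicted class name back on the result page
    return render_template("result.html",
                           prediction=CLASS_NAMES[int(np.argmax(probs))])
```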
Fig. 10. User Interface of the web app
4.4 Sample Prediction of the Web Application
The proposed system predicts whether the uploaded image is healthy or infected by late blight or early blight and shows the prediction as a result. The prediction process takes place when the user clicks the Predict button; as a result, the predicted class name is shown in the bottom section of the web page alongside a preview of the uploaded image, some preventive measures, and some disease-related videos to guide the user. Figure 11 shows a sample output of the system.

4.5 Comparison with Other Works
Previously, many authors worked to detect plant diseases. Although the outcome of this work is a web application that helps users detect the disease by simply capturing an image, which is the main point to be compared, a comparative performance analysis with some existing works is shown in Table 3.
Fig. 11. Sample output of the system
Table 3. Comparative Analysis with Other Works

Reference     | Plant    | Used Approach                                            | Accuracy | Remarks
[9]           | Multiple | ResNet34                                                 | 44%      | Better accuracy with field captured images
[12]          | Potato   | YOLOv5 image segmentation technique and multilayer CNN   | 91.15%   | Comparatively better accuracy
[10]          | Potato   | Multilayer CNN model with 4 layers                       | 99.45%   | No proof that the proposed method is better than MobileNetV2 with mobile devices
Proposed Work | Potato   | MobileNetV2                                              | 96.14%   | Final outcome is a handy, user-friendly web app; better accuracy with mobile devices
5 Conclusion
Early detection of plant diseases is essential for maintaining crop quality. In comparison to hand-crafted approaches, deep learning techniques, notably convolutional neural network architectures, show promising outcomes. Potato is one of the major crops of Bangladesh, but due to some diseases, the country faces a massive loss in potato production each year. Detecting late blight and early blight, the two most deadly diseases of potatoes, at an earlier stage can help reduce the production loss. In this research, some CNN models (MobileNetV2, ResNet50V2, DenseNet121, EfficientNetB0-B3) were trained with the New Plant Disease Dataset [16] containing thousands of potato images. While measuring performance, MobileNetV2 reached the best performance with 96.14% test accuracy. So, the trained MobileNetV2 model was deployed and integrated into a web application that takes a potato leaf's image as input, detects whether the leaf is healthy, and predicts the disease otherwise. In the future, there is a plan to develop an organized dataset by collecting images through field studies of Bangladeshi farms and gardens. Working with other deadly diseases of potato (for example, Common Scab (Streptomyces spp.) and Fusarium Dry Rot (Fusarium spp.)) and other major vegetables and crops (for example, Rice Blast of Rice (Oryza sativa), Fusarium Wilt of Lentil (Lens culinaris) [20], and Stemphylium Leaf Blight of Onion (Allium cepa)) is also among the future aspects of this research. Nowadays, almost everyone has a mobile device in their hand, so using a cellphone to detect plant disease by simply capturing an image of leaves can act as a lifesaver for our farmers. Hence, it is recommended to use this system to detect potato diseases at an early stage and take proper steps to reduce the decay caused by these deadly diseases.
References
1. GHI: Global Hunger Index for Bangladesh (2021). https://www.globalhungerindex.org/bangladesh.html. Accessed 4 May 2022
2. BBS, Bangladesh Bureau of Statistics (BBS): Agricultural Statistics Yearbook, 2012-2013
3. Bauske, M.J., Robinson, A.P.: Early blight in potato (2018). https://www.ag.ndsu.edu/publications/crops/early-blight-in-potato
4. Landschoot, S., Vandecasteele, M., De Baets, B., Hofte, M., Audenaert, K., Haesaert, G.: Identification of A. arborescens, A. grandis, and A. protenta as new members of the European Alternaria population on potato. Fung. Biol. 121, 172-188 (2017). https://doi.org/10.1016/j.funbio.2016.11.005
5. The Daily Star: Potato freed from deadly disease. https://www.thedailystar.net/frontpage/potato-freed-deadly-disease-209158. Accessed 29 Jan 2016, 21 Nov 2021
6. Hengsdijk, H., van Uum, J.: Geodata to control potato late blight in Bangladesh. https://www.fao.org/e-agriculture/news/geodata-control-potato-late-blight-bangladesh-geopotato. Accessed 15 Mar 2017
7. Pande, A., Jagyasi, B.G., Choudhuri, R.: Late blight forecast using mobile phone based agro advisory system. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds.) PReMI 2009. LNCS, vol. 5909, pp. 609-614. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-11164-8_99
8. Toda, Y., Okura, F.: How convolutional neural networks diagnose plant disease. Plant Phenom. 2019, 1-14 (2019). https://doi.org/10.34133/2019/9237136
9. Tiwari, D., Ashish, M., Gangwar, N.: Potato leaf diseases detection using deep learning. IEEE (2020). 978-1-7281-4876-2/20/$31.00
10. Agarwal, M., Sinha, A., Gupta, S.K., Mishra, D., Mishra, R.: Potato crop disease classification using convolutional neural network. In: Somani, A.K., Shekhawat, R.S., Mundra, A., Srivastava, S., Verma, V.K. (eds.) Smart Systems and IoT: Innovations in Computing. Smart Innovation, Systems and Technologies, vol. 141, pp. 391-400. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8406-6-37
11. Afzaal, H., et al.: Detection of a potato disease (early blight) using artificial intelligence. Remote Sens. 13, 411 (2021). https://doi.org/10.3390/rs13030411
12. Rashid, J., Khan, I., Ali, G., Almotiri, S.H., AlGhamdi, M.A., Masood, K.: Multi-level deep learning model for potato leaf disease recognition. Electronics 10(17), 2064 (2021). https://doi.org/10.3390/electronics10172064
13. Géron, A.: Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Chap. 14. O'Reilly (2019). https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/
14. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
15. Hollemans, M.: MobileNet version 2. https://machinethink.net/blog/mobilenet-v2/. Accessed 22 Apr 2018
16. Bhattarai, S.: New Plant Disease Dataset (2018). https://www.kaggle.com/vipoooool/new-plant-diseases-dataset
17. Gad, A.F.: Evaluating deep learning models: the confusion matrix, accuracy, precision, and recall. https://blog.paperspace.com/deep-learning-metrics-precision-recall-accuracy/. Accessed 20 May 2021
18. Sommerville, I.: Software Engineering, 9th edn., Chap. 7 (2010). ISBN-13: 978-0-13-703515-1, ISBN-10: 0-13-703515-2
19. Riva, M.: Interpretation of loss and accuracy for a machine learning model (2021). https://www.baeldung.com/cs/ml-loss-accuracy
20. Tiwari, N., Ahmed, S., Kumar, S., Sarker, A.: Fusarium Wilt: A Killer Disease of Lentil (2018). https://doi.org/10.5772/intechopen.72508
21. Harte, E.: Plant disease detection using CNN (2020). https://doi.org/10.13140/RG.2.2.36485.99048
22. Potato freed from deadly disease. https://www.thedailystar.net/frontpage/potato-freed-deadly-disease-209158. Accessed 25 July 2022
Device-Friendly Guava Fruit and Leaf Disease Detection Using Deep Learning

Rabindra Nath Nandi1(B), Aminul Haque Palash1, Nazmul Siddique2, and Mohammed Golam Zilani3
[email protected]
2 School of Computing, Engineering and Intelligent Systems, Ulster University, Coleraine, UK
[email protected]
3 Swinburne Online Australia, Melbourne, Australia
[email protected]
Abstract. This research presents a machine learning model that detects plant disease from images of fruits and leaves. Five state-of-the-art machine learning models are used in this research, which achieved high accuracy in detecting plant disease. The problem with such high-accuracy models is that they are large, which does not allow them to be deployed on end-user devices. In this research, model quantization techniques such as float16 and dynamic range quantization are applied to the model architectures. The experimental results show that the GoogleNet model reached a size of 0.143 MB with an accuracy of 97%, and the EfficientNet model reached a size of 4.2 MB with an accuracy of 99%. The source codes are available at https://github.com/CompostieAI/Guava-diseasedetection. Keywords: Fruits and leaf disease detection · Guava disease · convolutional neural network · model quantization · model size reduction
1 Introduction
Agriculture is a key player in sustainable development and global food security, which is becoming more challenging with the growing world population. Agriculture faces crop loss every year due to drought, pests, and plant diseases. Plant disease is a major threat to plant health and the production of major crops, and causes economic loss in agriculture every year. There are plant pathogens which can spread from plant to plant very fast. Therefore, early diagnosis of plant disease is of key importance to prevent disease spread and protect crop production. Pathological testing for diagnosis of plant disease is very often time consuming due to sample collection, processing, and analysis. Moreover, pathological laboratory facilities are also not available in many parts of the country. An alternative to pathological testing is the traditional visual assessment of plant symptoms. The traditional method requires an experienced expert in the domain, but the visual assessment method
is too subjective, and the domain expert may not always be available in remote areas. Moreover, there are variants of disease due to variations of plant species and climate changes, which can make it difficult for an expert to diagnose plant disease correctly. Machine learning methods are widely employed for plant disease diagnosis with higher accuracy [1-3]. The use of deep learning [4], a class of machine learning algorithms, is increasing due to its promising results in numerous applications including agriculture and big data. A study shows that a deep CNN (Convolutional Neural Network) provides an accuracy of 99.35% for a dataset containing 54,000 images of 26 different types of diseases for 14 crops. Many such deep learning methods are web-based applications, which do not consider the end-user's device, leading to degraded performance of the system [5]. A similar study with a dataset containing 7,000 images of 25 different plant categories is reported where multiple DCNN architectures are used and the highest accuracy achieved is 99.53%. The study suggested the use of the model for real-time plant disease identification [6]. Unfortunately, these researchers didn't investigate the model size complexity and application overhead for real-time plant disease detection. Deep architectures are usually large in size, and the inference time is directly dependent on the number of parameters of the architecture. Sometimes a trained model with a reasonably large size is not feasible for mobile-based applications where the hardware does not support the execution of the model in real time [7]. Edge AI refers to the use of AI models for prediction on edge devices, and it is currently an active research area to use highly efficient larger deep models on edge devices [8]. The first requirement of Edge AI is model size optimization. The optimized models are smaller in size and suitable to deploy on edge devices. Google has some fascinating tools and techniques for model quantization, compression, and lite-version conversion [9]. This research investigates different deep learning models for detecting fruit and leaf disease based on an available dataset. Two popular quantization techniques are employed: 1) Float16 quantization and 2) Dynamic range quantization. The model performances are verified after quantization and found promising. The empirical investigations show that the optimized models are feasible for use by affordable smartphones with low specifications available in Bangladesh. The primary contributions of this study are: i) the development of deep learning models for fruit and leaf disease diagnosis, ii) optimization of the deep learning models, and iii) conversion of the models into optimized TF-Lite versions applicable to smartphone applications. The rest of the paper is organized as follows: Sect. 2 describes related works, Sect. 3 describes the dataset, Sect. 4 describes the method, Sect. 5 presents the experiments, and Sect. 6 concludes the paper with a few directions for future work.
2 Related Works
There are several studies using both machine learning and deep learning for guava disease detection. The generic procedure for detection of plant disease consists of six stages: image acquisition, labeling, feature extraction, feature fusion, feature selection, and disease classification. Image acquisition is the first step towards image processing.
It is done with a high-resolution digital camera, followed by labeling for classification. Feature extraction is a very crucial part that refers to the process of transforming raw data (i.e., an image) into numerical features that can be used by a machine learning algorithm. The most widely used features are color features, Local Binary Patterns (LBP) [10], the Gray Level Co-Occurrence Matrix (GLCM) [11], and the Scale Invariant Feature Transform (SIFT) [12]. Image segmentation [13], a process of partitioning the image into multiple segments or sets of pixels, is often applied before feature extraction. Deep convolutional neural networks (DCNNs) are used for disease detection. Bhushanamu et al. [14] used a temporal CNN where, firstly, contour detection is applied to detect the shape of the leaf, and a Fourier feature descriptor is used as features for the 1D CNN. Mostafa et al. [15] used five different CNN structures (ResNet-50, ResNet-101, AlexNet, SqueezeNet, and GoogLeNet) to identify different guava diseases. A CNN-based plant disease identification model is developed by Mohanty et al. [5], which can detect 26 diseases of 14 crop species. An attention mechanism is developed by Yu et al. [16] that highlights the leaf area and is capable of capturing more discriminative features. Jalal et al. [17] used a DNN to develop a plant disease detection system for apple leaf diseases, where SURF is used for feature extraction and an evolutionary algorithm is used for feature optimization. A leaf disease model based on the MobileNet model is developed in [18], and the performance of MobileNet is evaluated and compared with the ResNet152 and InceptionV3 models. In all of these studies, deep learning models are developed and trained for plant disease identification, but there has been no proper study on model optimization, meaning reduction of model size suitable for end-user device applications.
3 Dataset
Guava belongs to the Myrtaceae plant family, and it is a common tropical fruit cultivated in many tropical and subtropical regions like Bangladesh, India, Pakistan, Brazil, and Cuba [15]. Guavas are incredibly delicious and rich in antioxidants, vitamin C, potassium, calcium, nicotinic acid, and fiber. The data was gathered in the middle of 2021 by an expert team from Bangladesh Agricultural University from a reasonably sized guava plantation in Bangladesh. A digital SLR camera was used to capture the images, with no preprocessing being used [19]. The dataset includes four prevalent diseases, Red Rust, Scab, Styler end Rot, and Phytophthora, as well as disease-free leaves and fruits. A fungus called Phytophthora causes a fruit disease that appears as black blemishes on young fruits. Red Rust is a guava leaf disease caused by a fungus. Different shapes of lesions, e.g., ovoid, corky, and spherical, on the surface of the guava fruits are the signs of scab, which is a fungal disease. Styler end Rot begins at the styler end and spreads towards the root in guava fruits, as shown in Fig. 1. The dataset comprises two parts: an original dataset and an augmented dataset. The original dataset contains 681 samples, and the augmented dataset contains 8525 samples. The class-wise data distribution is provided in Table 1. The minimum number of samples is 87, for the Red Rust disease, and the maximum number is 154, belonging to Disease-free (fruit). It can be roughly said that the original dataset is a roughly balanced dataset. The augmented dataset is about 10 times larger than
the original dataset. The per-class sample counts of the augmented dataset range from 1264 to 1626. For the sake of the experiments, the original and augmented datasets are combined.

Table 1. Disease-wise data distribution for both original and augmented dataset

Disease Name         | Original Data | Augmented Data
Phytophthora         | 114           | 1342
Red Rust             | 87            | 1554
Scab                 | 106           | 1264
Styler end Rot       | 96            | 1463
Disease-free (leave) | 126           | 1276
Disease-free (fruit) | 154           | 1626
Fig. 1. Samples of images containing four types of diseases (Red Rust, Styler end Rot, Phytophthora, Scab), and a disease-free leaf and disease-free fruit.
4 Method
There are two objectives of this research. The primary objective is to detect the disease from the samples with high accuracy, and the secondary objective is to optimize the model without degrading the detection performance. The work comprises two parts: 1) model training and 2) model optimization. The overall architecture of the proposed system is illustrated in Fig. 2.

4.1 Model Training
The dataset is firstly divided into training, validation, and test sets, as the source images are not provided in a grouped manner. As part of the image pre-processing, resizing is applied to the images; the original orientation and colors are kept. No special type of feature extraction is used before using the model. Five prominent image classification models are used: VGG-16 [20], GoogleNet [21], ResNet-18 [22], MobileNet-v2 [23], and EfficientNet [24]. The model parameters are described in Table 2. VGG-16 has the highest number of parameters, 138,357,544. MobileNet-v2 has 2,230,277 parameters, which is the lowest among these models. A common experimental setup is used, and pretrained models are used for fine-tuning to the guava dataset. During the model training, the models are tested on both training and validation data, and after the model training, the models are saved and validated with the test dataset.

The parameters of the listed models are large in number, and model size grows with the number of parameters, which has an adverse impact on inference time, battery consumption, and device storage. Hence, model optimization is needed to optimize and compress the model for suitable use on edge devices.

4.2 Model Optimization
Model optimization involves different factors: low latency, memory utilization, low power consumption, low cost, and lowering the size of the payload for over-the-air model updates. Resources are even more limited on edge devices, such as smartphones and Internet of Things (IoT) devices; therefore, model size and computational efficiency become a top priority. Besides this, when using machine learning models, inference efficiency is a significant concern. One possible solution is to enable execution on hardware optimized for fixed-point operations and to prepare optimized models for special-purpose hardware accelerators. According to the TensorFlow documentation [25], model optimization can be carried out in three ways: quantization, pruning, and clustering. The precision of the numbers used in the representation of the model parameters is decreased by quantization. Model parameters are 32-bit floating-point values by default. Pruning reduces the model parameters by removing parameters with a minor impact on model prediction. When using clustering, the weights of each layer in the model are divided into a predetermined number of clusters, and only the centroid values of each cluster are taken into account. Only quantization techniques are applied for model optimization in this research.
Fig. 2. The overall workflow of our system: data split, model training and performance analysis, model optimization, and further performance analysis of the optimized models.
Table 2. Number of trainable parameters of different models on the ImageNet dataset

Model           | Trainable Parameters
VGG-16          | 138,357,544
GoogleNet       | 5,605,029
ResNet-18       | 11,179,077
MobileNet-V2    | 2,230,277
EfficientNet-b2 | 4,013,953
There are mainly four types of quantization techniques: quantization-aware training, post-training float16 quantization, post-training dynamic range quantization, and post-training integer quantization [17]. A summary of the features of the post-training quantization techniques is presented in Table 3, and the post-training quantization approach is presented in Fig. 3.
Table 3. Summary of three types of post-training quantization techniques

Technique                  | Benefits                     | Hardware
Dynamic Range Quantization | 4x smaller, 2x-3x speedup    | CPU
Full Integer Quantization  | 4x smaller, 3x+ speedup      | CPU, Edge TPU, Microcontrollers
Float16 Quantization       | 2x smaller, GPU acceleration | CPU, GPU
In Float16 quantization, the model constants, such as weights and bias values, are converted from full-precision floating point (32-bit) to a reduced-precision floating-point data type (IEEE FP16), which reduces model sizes by up to 50%. Dynamic-range quantization reduces model size by up to 75% by quantizing the 32-bit floating-point weights to 8 bits. Additionally, it employs "dynamic-range" operators that dynamically quantize activations to 8 bits according to their range and carry out computations using 8-bit weights and activations. Full integer quantization requires calibration of all floating-point tensor ranges to get the max-min values. It requires a representative dataset to get the max-min values of variable tensors like model input, activations, and model output [17].
Fig. 3. Three types of quantization techniques (Float16 quantization, dynamic range quantization, and full integer quantization) with their operational details.
The quantized models are small in size compared to the original model; theoretically, their performance should also degrade. To understand the effect of quantization, the quantized models are tested with the validation and test data sets. The next step is to compare the quantized and original models according to both size and accuracy. Finally, to balance the trade-off between size and performance, an optimal model is chosen for specific hardware. The selected model is in TF-Lite format, which can be directly used by Android and iOS applications through their convenient APIs.
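As a sketch of how the two quantized TF-Lite variants can be produced with the TensorFlow Lite converter (the untrained MobileNetV2 placeholder below stands in for any of the trained models, and the file name is an assumption):

```python
import tensorflow as tf

# Placeholder for a trained model; any of the five backbones could be used here
model = tf.keras.applications.MobileNetV2(
    weights=None, input_shape=(224, 224, 3), classes=6)

# Float16 quantization: weights stored as 16-bit floats (~2x smaller)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()

# Dynamic range quantization: 8-bit weights, activations quantized at run time (~4x smaller)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic = converter.convert()

with open("model_dynamic.tflite", "wb") as f:
    f.write(tflite_dynamic)
```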
5 Results and Discussion
Experiments have been carried out on all five models, starting from pretrained weights without freezing any layers. Only the decision layer is changed as per the number of classes of guava disease. Stochastic Gradient Descent (SGD) is used, with the learning rate decayed by a factor of 0.1 every 7 epochs (a minimal schedule sketch is given after Table 4). For evaluation purposes, accuracy, precision, recall, and macro-F1 scores are used. Table 4 shows the accuracy, precision, recall, and macro-F1 score for both the validation data and the test data. EfficientNet provides 99% accuracy and a 100% F1-score, which is the overall best result, but the other models also provide satisfactory results. GoogleNet provides 97% accuracy and F1-score on test data.

Table 4. Experimental results (Acc, Pr, Re, macro-F1) on validation and test data using different backbone CNN networks

Model           | Validation data (Acc / Pr / Re / F1) | Test data (Acc / Pr / Re / F1)
VGG-16          | 0.97 / 0.97 / 0.96 / 0.96            | 0.97 / 0.98 / 0.97 / 0.98
GoogleNet       | 0.96 / 0.96 / 0.96 / 0.96            | 0.97 / 0.97 / 0.97 / 0.97
ResNet-18       | 0.96 / 0.97 / 0.96 / 0.96            | 0.97 / 0.97 / 0.97 / 0.97
MobileNet-v2    | 0.97 / 0.98 / 0.98 / 0.98            | 0.98 / 0.99 / 0.99 / 0.99
EfficientNet-b2 | 0.98 / 0.99 / 0.98 / 0.98            | 0.99 / 1.00 / 1.00 / 1.00
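The step decay mentioned above can be expressed as a Keras learning-rate callback; the following is a hedged sketch (the initial learning rate is left to the optimizer, as the paper does not state it):

```python
import tensorflow as tf

def step_decay(epoch, lr):
    # Multiply the current learning rate by 0.1 at every 7th epoch
    if epoch > 0 and epoch % 7 == 0:
        return lr * 0.1
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
# Usage: model.fit(train_ds, epochs=30, callbacks=[lr_callback])
```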
Table 5 shows the size reduction after applying the two types of quantization. The VGG-16 model has an original size of 512.3 MB and is reduced to 256.10 MB and 128.08 MB after applying float16 quantization and dynamic range quantization, respectively. This shows a reduction of the model by 50% for float16 quantization and 75% for dynamic range quantization. For the GoogleNet architecture, the original size of the model is 22.6 MB, and the reduced sizes are 0.668 MB and 0.143 MB using float16 and dynamic range quantization, respectively. This is about a 99% reduction (roughly 158 times smaller) using dynamic range quantization; the quantized model size is less than 1% of the original size. For the ResNet-18 architecture, the reduction is from 44.8 MB to 22.4 MB and 1.7 MB for float16 quantization and dynamic range quantization, respectively. This is about a 96% reduction using dynamic range quantization; the quantized model size is only 4% of the original size. For the MobileNet-v2 architecture, the original model size is 9.2 MB, and the quantized model sizes are 0.991 MB and 0.188 MB for float16 quantization and dynamic range quantization, respectively. The MobileNet base model is the smallest among all the original models, and with dynamic range quantization the quantized model is about 49 times smaller than the original. The quantized EfficientNet-b2 models have sizes of 8.1 MB and 4.5 MB, where the original model size is 16.4 MB. It is shown that the dynamic range quantization method reduces the model size more effectively for all five models compared to float16 quantization. From Table 5,
the quantized GoogleNet model size using dynamic range quantization is 0.143 MB, which is the lowest among all the models.

Table 5. Experimental size comparison before and after applying quantization

Model           | Optimization Method        | Previous size (MB) | Optimized size (MB)
VGG-16          | Float16 quantization       | 512.3              | 256.10
VGG-16          | Dynamic range quantization | 512.3              | 128.08
GoogleNet       | Float16 quantization       | 22.60              | 0.668
GoogleNet       | Dynamic range quantization | 22.60              | 0.143
ResNet-18       | Float16 quantization       | 44.80              | 22.40
ResNet-18       | Dynamic range quantization | 44.80              | 1.70
MobileNet-v2    | Float16 quantization       | 9.20               | 0.991
MobileNet-v2    | Dynamic range quantization | 9.20               | 0.188
EfficientNet-b2 | Float16 quantization       | 16.40              | 8.10
EfficientNet-b2 | Dynamic range quantization | 16.40              | 4.50
From Table 6, it can be seen that the results do not change much from the original models to the quantized models; there are only 1-2% changes for a few models. The VGG-16 model's accuracy decreased from 97% to 96% and its F1-score from 98% to 96% with float16 quantization. For GoogleNet, there is no impact from model quantization, and the accuracy is 97% for both the original and quantized models. The accuracy of the ResNet model is 97%, and the quantized models have accuracies of 96% and 95%, respectively. For the EfficientNet model, there are minor changes in accuracy, precision, recall, and F1-score. From the performance perspective, the EfficientNet model is the best among all quantized models. The reason all models perform well is probably that the data quality is very good and only a few classes are available in the guava dataset. Therefore, the optimal choice is to use the GoogleNet model with dynamic range quantization when model size is the priority; if model size is not an issue, then the EfficientNet model with dynamic range quantization would be a good choice. In this case study, model optimization techniques are explored, and their impact on performance, storage size, and memory is analyzed. It is clear that, if optimization is not considered, MobileNet is the best choice as it has a size of 9.2 MB, which is the lowest; but after quantization, the GoogleNet model with dynamic range quantization, with a size of only 0.143 MB and an F1-score of 0.97, is the overall best candidate among all the models described in Table 6. EfficientNet-b2 with dynamic range quantization can be considered for the mobile application if its 4.5 MB size is not a big issue, as it has an F1-score of 0.99, which is better than the quantized GoogleNet model.
58
R. N. Nandi et al.
Table 6. Experimental results (Acc, Pr, Re, macro-F1) on test data using different backbone CNN networks after applying model optimization, and size comparison with the original models

Model           | Optimization               | Size (MB) | Acc  | Pr   | Re   | F1
VGG-16          | No optimization            | 512.3     | 0.97 | 0.98 | 0.97 | 0.98
VGG-16          | Float16 quantization       | 256.10    | 0.96 | 0.96 | 0.96 | 0.96
VGG-16          | Dynamic range quantization | 128.08    | 0.95 | 0.96 | 0.96 | 0.95
GoogleNet       | No optimization            | 22.6      | 0.97 | 0.97 | 0.97 | 0.97
GoogleNet       | Float16 quantization       | 0.668     | 0.97 | 0.97 | 0.97 | 0.97
GoogleNet       | Dynamic range quantization | 0.143     | 0.97 | 0.97 | 0.97 | 0.97
ResNet-18       | No optimization            | 44.8      | 0.97 | 0.97 | 0.97 | 0.97
ResNet-18       | Float16 quantization       | 22.4      | 0.96 | 0.96 | 0.96 | 0.96
ResNet-18       | Dynamic range quantization | 1.70      | 0.95 | 0.96 | 0.95 | 0.95
MobileNet-v2    | No optimization            | 9.20      | 0.98 | 0.99 | 0.99 | 0.99
MobileNet-v2    | Float16 quantization       | 0.991     | 0.96 | 0.97 | 0.97 | 0.96
MobileNet-v2    | Dynamic range quantization | 0.188     | 0.96 | 0.97 | 0.97 | 0.96
EfficientNet-b2 | No optimization            | 16.4      | 0.99 | 1.00 | 1.00 | 1.00
EfficientNet-b2 | Float16 quantization       | 8.10      | 0.99 | 0.99 | 0.99 | 0.99
EfficientNet-b2 | Dynamic range quantization | 4.50      | 0.99 | 0.99 | 0.99 | 0.99
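On the device side, a chosen .tflite file is loaded and executed through the TF-Lite interpreter. The following minimal sketch shows the inference path; the file name matches the earlier conversion sketch, and a random tensor stands in for a real leaf or fruit image.

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_dynamic.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in for a preprocessed guava leaf/fruit image
image = np.random.rand(1, 224, 224, 3).astype(np.float32)

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])[0]
print("Predicted class index:", int(np.argmax(probs)))
```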
6 Conclusion and Future Work
Device-end prediction is an active research area aiming to overcome the complexity and cost of cloud computing. In this study, it is demonstrated that model optimization is an elegant way to use large-scale neural network models in edge computing. A similar study can be applied to other problems and datasets as well. The future plan is to work with large and complex disease datasets, considering not only the aspects of theoretical justification but also real-life product development.
References
1. Pallathadka, H., et al.: Application of machine learning techniques in rice leaf disease detection. Mater. Today: Proc. 51, 2277-2280 (2022)
2. Sharma, A., Jain, A., Gupta, P., Chowdary, V.: Machine learning applications for precision agriculture: a comprehensive review. IEEE Access 9, 4843-4873 (2020)
3. Fan, X., Luo, P., Mu, Y., Zhou, R., Tjahjadi, T., Ren, Y.: Leaf image based plant disease identification using transfer learning and feature fusion. Comput. Electron. Agric. 196, 106892 (2022)
4. Wick, C.: Deep Learning. Informatik-Spektrum 40(1), 103-107 (2016). https://doi.org/10.1007/s00287-016-1013-2
5. Mohanty, S., Hughes, D., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)
6. Ferentinos, K.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311-318 (2018)
7. Deng, Y.: Deep learning on mobile devices: a review. In: Mobile Multimedia/Image Processing, Security, and Applications 2019, vol. 10993, p. 109930A. International Society for Optics and Photonics (2019)
8. Li, E., Zeng, L., Zhou, Z., Chen, X.: Edge AI: on-demand accelerating deep neural network inference via edge computing. IEEE Trans. Wireless Commun. 19(1), 447-457 (2019)
9. Verma, G., Gupta, Y., Malik, A.M., Chapman, B.: Performance evaluation of deep learning compilers for edge inference. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 858-865. IEEE (2021)
10. Almutiry, O., et al.: A novel framework for multi-classification of guava disease. CMC-Comput. Mater. Continua 69, 1915-1926 (2021)
11. Kour, H., Chand, L.: Healthy and unhealthy leaf classification using convolution neural network and CSLBP features. Int. J. Eng. Adv. Technol. (IJEAT) 10(1) (2020). ISSN: 2249-8958
12. Thilagavathi, M., Abirami, S.: Application of image processing in diagnosing guava leaf diseases. Int. J. Sci. Res. Manage. (IJSRM) 5(07), 5927-5933 (2017)
13. Perumal, P., et al.: Guava leaf disease classification using support vector machine. Turkish J. Comput. Math. Educ. (TURCOMAT) 12(7), 1177-1183 (2021)
14. Bhushanamu, M.B.N., Rao, M.P., Samatha, K.: Plant curl disease detection and classification using active contour and Fourier descriptor. Eur. J. Mol. Clin. Med. 7(5), 1088-1105 (2020)
15. Mostafa, A.M., Kumar, S.A., Meraj, T., Rauf, H.T., Alnuaim, A.A., Alkhayyal, M.A.: Guava disease detection using deep convolutional neural networks: a case study of guava plants. Appl. Sci. 12(1), 239 (2021)
16. Yu, H.-J., Son, C.-H., Lee, D.H.: Apple leaf disease identification through region-of-interest-aware deep convolutional neural network. J. Imaging Sci. Technol. 64(2), 20507-20510 (2020)
17. Al-Bayati, J.S.H., Üstündağ, B.B.: Evolutionary feature optimization for plant leaf disease detection by deep neural networks. Int. J. Comput. Intell. Syst. 13(1), 12 (2020)
18. Bi, C., Wang, J., Duan, Y., Fu, B., Kang, J.-R., Shi, Y.: MobileNet based apple leaf diseases identification. Mob. Netw. Appl. 27, 1-9 (2020)
19. Rajbongshi, A., Sazzad, S., Shakil, R., Akter, B., Sara, U.: A comprehensive guava leaves and fruits dataset for guava disease recognition. Data Brief 42, 108174 (2022)
20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
21. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9 (2015)
22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778 (2016)
23. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520 (2018)
24. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105-6114. PMLR (2019)
25. Dillon, J.V., et al.: TensorFlow distributions. arXiv preprint arXiv:1711.10604 (2017)
Cassava Leaf Disease Classification Using Supervised Contrastive Learning

Adit Ishraq1, Sayefa Arafah1, Sadiya Akter Mim1, Nusrat Jahan Shammey1, Firoz Mridha2(B), and Md. Saifur Rahman1

1 Bangladesh University of Business and Technology, Dhaka, Bangladesh
2 American International University-Bangladesh, Dhaka, Bangladesh
[email protected]
Abstract. Cassava is a nutty-flavored, long, bulbaceous, starchy root vegetable. It is the principal source of calories and carbs for many people around the world, especially in southern Africa. Cassava production is most common in South Africa because it can survive well in a harsh environment. Sometimes the cassava crop gets affected by leaf disease, which affects its overall production and reduces the farmers' income, and manual leaf disease detection may not obtain proper accuracy. In order to detect cassava leaf diseases, current studies face various challenges such as poor accuracy, low detection rate, and high processing time. In our research, we have used supervised contrastive learning to detect four diseases of the cassava leaf and identify healthy leaves, from which we got tremendous results. We also used data augmentation modules, encoder networks, and projection networks to perform certain tasks such as network training, embedding similar classes nearby and other classes away, image labeling, and so on. From our study, we have achieved an accuracy, precision, recall, and F1-score of 88%, 78%, 79%, and 79%, respectively, using the supervised contrastive model. Keywords: Cassava Disease · Data Augmentation · Disease Classification · Supervised Contrastive Loss · Deep Learning
1 Introduction
About 800,000 people eat cassava in 80 countries around the world [1]. It is a staple meal consumed by numerous people worldwide as a vegetable [2]. Cassava is a low-priced, rich source of carbohydrates. Per acre, cassava can provide more calories than whole grains, making it a surely beneficial crop in developing countries [3]. The leaves of cassava contain protein, vitamins, minerals, and essential amino acids. They help in digestion and also in relieving constipation. Cassava faces various challenges during the harvest, e.g., leaf disease and inferior quality. Cassava is vulnerable to a wide range of diseases caused by viruses. The main problem is that cassava leaf disease decreases production, and the
revenue of farmers can be directly affected by it. That is why cassava leaves need to be handled carefully to improve disease diagnosis and production capacity. Early recognition of leaf disease helps in rescuing plants before the planting can be permanently infected. Without proper diagnosis of the causative agents, disease control measures can be a waste of time and money and cause further damage to plants [4]. Therefore, various strategies can play an important role in accurately detecting leaf disease, as well as fungal infections, in a short period during the growth of the plant. The possibility of spreading the virus through planting material is further increased because none of the farmers were aware that planting material could be the source of further virus transmission into the soil [5]. The symptoms can often be misleading because of the wide variety of diseases, so manual diagnosis can be very time-consuming, inefficient, or incorrect. This can hinder the overall production of cassava. A technology that can efficiently diagnose the diseases at an early state with high accuracy should be introduced. Artificial intelligence applications have achieved huge success [6]. With the advancement of computer vision and machine learning, we used supervised contrastive learning, which distinguishes between similar data and dissimilar data, to detect cassava leaf diseases. Supervised contrastive learning can improve the accuracy and robustness of classifiers with minimal complexity. The overall contributions of our study can be summarized as:

– This study presents a full overview of four types of cassava leaf disease detection.
– This study presents a detailed description of the supervised contrastive learning algorithm and explains how the model can upgrade cassava leaf disease detection.
– The suggested model is evaluated using different performance metrics and compared with some existing studies.
– Data augmentation has been used for getting the best results while training linear classifiers.

The rest of the study is organized as follows: Sect. 2 presents the related work; Sect. 3 describes the dataset; Sect. 4 covers the overall architectural methods related to our study; Sect. 5 explains the proposed model's implementation, results, and comparisons. In the end, Sect. 6 concludes the study.
2 Related Work
Cassava is a staple vegetable that is widely eaten in many parts of the world, especially in Africa and Thailand. Due to its ability to withstand harsh growing conditions, it grows in tropical regions of the world. These Cassava crops are exposed to many diseases and infections such as leaf blight and fungal infections throughout their life cycle, which reduces their productivity. For researchers, the main challenge is to identify the Cassava leaf diseases accurately during the early stages. Various studies have been accomplished to improve the accuracy of
cassava leaf diagnosis. In this section, we give an overview of existing research papers on this particular topic.

Kiran Rao [7] proposed a Separable U-Net architecture for Cassava Bacterial Blight Disease and Cassava Mosaic Disease detection, which resulted in accuracies of 63.6% and 83.9%, respectively. In terms of accuracy, their model achieved comparable performance using Separable U-Net, the Dice coefficient, and also the mean IoU, and eventually became more efficient than the original U-Net model. Ravi [8] suggested an approach that can be considered a deployable tool for the classification of cassava leaf diseases in agriculture. To do so, the author used pre-trained CNN-based EfficientNet models for identifying and locating the infected tiny regions of the cassava leaf. With macro recall, macro precision, and macro F1-score results of 0.69, 0.73, and 0.70, respectively, the EfficientNetB4 model exceeded the other EfficientNet models. But this study carries a highly imbalanced dataset, and the proposed process is sensitive toward this kind of data. Based on an Enhanced CNN model (ECNN), Lilhore [9] gave an extensive learning technique for real-time identification of cassava leaf disease. For the stabilized dataset, the proposed ECNN classifier impressively exceeded the others and attained an accuracy of 99.3%. On the other hand, the model was not capable enough with respect to other disease classes and data sizes. Oyewola [10] proposed a novel Deep Residual Convolutional Neural Network (DRNN) for detecting Cassava Mosaic Disease in cassava leaf images with distinct block processing. The DRNN model generated the finest outputs and attained 96.75% accuracy on the Cassava Disease Dataset from Kaggle. Though the method has favorable results, it also has some drawbacks, as all deep learning-based methods are apt to overfit training datasets, which stops them from generalizing. Also, under unfavorable photographing conditions, image enhancement using gamma correction may not always be the ideal method. Ayu [11] developed an intelligent system to detect cassava leaf disease, where MobileNetV2 was used to create it and a Python graphical user interface (GUI) to display it. The study proposes to identify five classes, namely Cassava Green Mite (CGM), Cassava Bacterial Blight (CBB), Cassava Mosaic Disease (CMD), Cassava Brown Streak Disease (CBSD), and healthy. The accuracy obtained on the test data was 65.6%, which is not very good. Sambasivam [12] proposed a CNN (Convolutional Neural Network) model with a very small dataset to detect cassava leaf disease, with the challenge of achieving high accuracy. Since the dataset was very imbalanced, the identification became very biased towards the Cassava Brown Streak Virus Disease and Cassava Mosaic Disease classes. An overall accuracy of 93% was achieved, but they could not introduce any mobile application for the detection.
Sangbamrung et al. [13] proposed a method that automatically classifies infected and healthy cassava. Deep learning, specifically a CNN, was used to introduce a novel method of classifying cassava disease. Among the many diseases, the focus was only on Cassava Brown Streak Virus Disease (CBSD), for which a diagnosis accuracy of 96% was achieved. However, this study addressed only this particular disease, although there are many more disease classes of cassava leaves. Emuoyibofarhe et al. [14] developed a trained machine learning system with a Cubic Support Vector Machine (CSVM) for health diagnosis. They also used a Coarse Gaussian Support Vector Machine (CGSVM) to detect Cassava Mosaic Disease (CMD) and Cassava Bacterial Blight Disease (CBBD), with accuracies of 83.9% and 61.8%, respectively. The accuracy of these models is not high, and they require a large sample. Surya et al. [15] proposed a method of diagnosing cassava leaves using a Convolutional Neural Network (CNN) built with the TensorFlow package in Google Colab and the MobileNetV2 architecture. They used the ReLU activation function, Softmax as the classifier function, and categorical cross-entropy as the loss function. The total accuracy for the training process was 0.8538 and for the validation process 0.7496; a validation accuracy below 80% needs to be improved. Metlek [16] proposed disease detection methods based on deep learning, using the MobileNetV2 and ResNet50 architectures on identified regions. A K-nearest neighbor algorithm and a support vector machine were used to classify the extracted properties. The average maximum success rate was achieved with the ResNet50 architecture and the SVM classifier, with training and testing results of 85.4% and 84.4%, respectively. However, the model was not evaluated on larger datasets.
3 Dataset
A dataset from a Kaggle research competition was used for the classification of cassava leaf disease in this work. The dataset has five classes; four correspond to diseases and one to healthy leaves. The main goal of the model is to learn to classify given images into these five classes. Local farmers in Uganda collected these pictures from their gardens. The four diseases in this dataset are Cassava Mosaic Disease (CMD), Cassava Bacterial Blight (CBB), Cassava Green Mite (CGM), and Cassava Brown Streak Disease (CBSD). The dataset contains 22,031 cassava leaf images, of which we used 5,656 to train the model.
4 Methodology
As the proposed concept of supervised contrastive learning is quite simple, the method is implemented and applied to detect Cassava leaf diseases. We know
Fig. 1. Four types of diseases found on cassava leaves. Each row presents a set of disease conditions for cassava leaves.
that supervised contrastive learning maps the normalized encodings of instances of the same class closer together and farther away from instances of other classes (Fig. 1). The neural network transforms the image into a representation and then uses this representation to predict the outcome, making it straightforward for the classifier to give the correct result. Firstly, the contrastive loss is used to train the network: the images are encoded so that similar classes are embedded closer together and other classes farther apart, and the image labels are also used. This phase consists of three components, namely the Data Augmentation Module, the Encoder Network, and the Projection Network, which are described separately below. Secondly, the encoder network from the previous stage is frozen, and the projection network is discarded. To learn a classifier, which can be considered a linear layer, the representation obtained from the encoder network is used, and the cross-entropy loss is applied to predict the labels.

4.1 Representation Learning Framework
– Data Augmentation Module: In this module, the input images are transformed into augmented images. With various augmentation principles, two augmented images are produced for each input image (Fig. 2).
Fig. 2. The supervised contrastive loss learns representations using a contrastive loss, but uses label information to sample positives in addition to augmentations of the same picture. Both contrastive techniques can have an optional second stage, which trains a classifier on top of the learned representations.
• To acquire the first augmented image, the module randomly crops the original image and resizes it to the actual size of the input image.
• To acquire the second augmented image, there are three different options: AutoAugment, RandAugment, and SimAugment (the augmentation scheme proposed in SimCLR). The best results were found when we used the same data augmentation principle as in stage 2 when training linear classifiers.

For every individual image, two different augmented images are produced; this means the module returns 2N augmented images for N input images.
– Encoder Network: In the encoder network, the image is converted into a representation vector. Using headless ResNet-50 and ResNet-200 as base models for the encoder network, the authors found excellent results. The two augmented images obtained from the data augmentation module are sent separately to the same encoder, which outputs two representation vectors; these outputs are normalized values.
– Projection Network: The projection network converts the representation vectors into vectors compatible with the contrastive loss calculation. We used a multi-layer perceptron with a single hidden layer of size 2048 and an output vector of size DP = 128. The encoded vectors obtained as output from the encoder network are fed into this network. The output projection vector is first normalized and then used in the loss function: when the supervised contrastive loss function receives the output vector of this projection network, the loss is calculated and minimized.
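To make the framework concrete, the following is a minimal Keras sketch of the stage-1 encoder plus projection head described above. The layer sizes (a 2048-unit hidden layer and a 128-dimensional projection) follow the text; the choice of ResNet-50 as the base encoder, the ImageNet weights, and all names are illustrative assumptions rather than the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_encoder(input_shape=(512, 512, 3)):
    # Headless ResNet-50 backbone with global average pooling
    base = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                       input_shape=input_shape, pooling="avg")
    inputs = keras.Input(shape=input_shape)
    return keras.Model(inputs, base(inputs), name="encoder")

def add_projection_head(encoder, hidden_dim=2048, proj_dim=128):
    # MLP projection head; its L2-normalized output feeds the contrastive loss
    x = layers.Dense(hidden_dim, activation="relu")(encoder.output)
    x = layers.Dense(proj_dim)(x)
    outputs = tf.math.l2_normalize(x, axis=1)
    return keras.Model(encoder.input, outputs, name="encoder_with_projection")

encoder = build_encoder()
stage1_model = add_projection_head(encoder)  # trained with the SCL loss of Sect. 4.4
```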
4.2 Projection Head
The projection head is mainly responsible for projecting the output of the encoders onto a smaller-scale space.

4.3 Classifier Head
In training, the classifier head is used in the second, optional phase. Once the SCL phase is complete, we can remove the projection head, add the classifier head to the encoder, and fine-tune the model with the regular cross-entropy loss.

4.4 Supervised Contrastive Learning Loss
Supervised Contrastive Loss is an alternative to the cross-entropy loss. Its only hyperparameter is the temperature, with a default value of 0.1, though it can be changed; low temperatures can benefit from longer training, while high temperatures can make classes more distinct. The SCL loss is defined as:

$$\mathcal{L}^{sup} = \sum_{i=1}^{2N} \mathcal{L}_i^{sup} \tag{1}$$

$$\mathcal{L}_i^{sup} = \frac{-1}{2N_{\tilde{y}_i} - 1} \sum_{j=1}^{2N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{\tilde{y}_i = \tilde{y}_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \cdot \exp(z_i \cdot z_k / \tau)} \tag{2}$$
where N is the number of sample images drawn randomly in a mini-batch; passing N images through stage 1 of the model yields 2N augmented images. Here, i is the index of an arbitrary augmented image in the mini-batch, j is the index of the other augmented image of the same source image, and k indexes the other images apart from x_i and x_j. τ is a positive scalar temperature parameter. N_ỹ is the total number of images in the batch that have the same label ỹ. z_i and z_j are the projected vectors for the two augmentations of the same image, and z_k for any other, i.e., z_i = P(E(x_i)).
z_i · z_j computes the inner (dot) product between the normalized vectors; each z is a normalized 128-dimensional vector. The indicator 1_B equals 1 if condition B is true and 0 otherwise. The numerator exp(z_i · z_j / τ) ranges over all cassava leaf images in a batch. The log probability is then taken, summed over all cassava leaf images in the batch except the anchor itself, and divided by 2N_ỹ − 1.
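For readers who prefer code, the following is a minimal NumPy sketch of Eqs. (1)–(2); it is our own illustrative rendering, not the authors' released implementation, and assumes the projections z (shape 2N × 128) are already L2-normalized.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """z: (2N, d) L2-normalized projections; labels: (2N,) class labels."""
    labels = np.asarray(labels)
    sim = np.exp(z @ z.T / tau)             # pairwise exp(z_i . z_k / tau)
    np.fill_diagonal(sim, 0.0)              # exclude i == k from the denominator
    denom = sim.sum(axis=1, keepdims=True)  # sum over all k != i
    log_prob = np.log(sim / denom + 1e-12)  # log softmax over each row
    positives = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(positives, 0.0)        # exclude the anchor i == j
    n_pos = positives.sum(axis=1)           # this is 2N_y - 1 in Eq. (2)
    loss_per_anchor = -(positives * log_prob).sum(axis=1) / np.maximum(n_pos, 1.0)
    return loss_per_anchor.sum()            # Eq. (1): sum over all 2N anchors
```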
4.5 First Stage Training
This phase of the training is completed using the supervised contrastive learning loss with the encoder and projection head.
4.6 Second Stage Training (Encoder + Classifier Head)
In the second phase of training, we apply the regular cross-entropy loss and train the model as usual. Here we remove the projection head and add the classifier head on top of the (now frozen) encoder.
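Continuing the stage-1 sketch from Sect. 4.1, a hedged illustration of this second stage could look as follows; freezing the encoder and the names encoder, keras, and layers are assumptions carried over from that sketch.

```python
# Stage 2: discard the projection head, freeze the encoder,
# and train a classifier head with the regular cross-entropy loss.
encoder.trainable = False
inputs = keras.Input(shape=(512, 512, 3))
features = encoder(inputs, training=False)
outputs = layers.Dense(5, activation="softmax")(features)  # 5 cassava classes
classifier = keras.Model(inputs, outputs, name="stage2_classifier")
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```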
5 Evaluation
First, we clarify the evaluation metrics; then the experimental setup is described; finally, the assessment is presented with a comprehensive analysis.
5.1 Evaluation Metric
Our evaluation metrics, accuracy, precision, and recall, are based on the confusion matrix. A confusion matrix summarizes the prediction results of a machine learning (including deep learning) classification problem using four measures: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The architecture's performance is then evaluated from these measurements.

Precision: Precision is the proportion of correctly classified positive samples to the total number of samples classified as positive. It reflects how reliable the model is in classifying samples as positive:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
Recall: Recall is computed as the number of positive samples correctly classified as positive divided by the total number of positive instances:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$
F1-score: The F1-score is the harmonic mean of precision and recall, taking both into account and combining them into a single number:

$$F1\text{-}score = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
The range of the F1-score is 0 to 1; the closer it is to 1, the better the model.
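Eqs. (3)–(5) correspond directly to standard library calls; a minimal scikit-learn sketch (our assumption; the paper does not state which metrics implementation was used) is:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground-truth and predicted class indices for the five cassava classes
y_true = [0, 1, 2, 3, 4, 0, 1, 2]
y_pred = [0, 1, 2, 3, 0, 0, 1, 2]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```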
5.2 Experimental Setup
Python was used for data collection, pre-processing, testing, and evaluation of the models. Keras was used to build the neural network architecture and implement the deep learning model, with the Adam [17] optimization function, at a learning rate of 0.001 (0.1%), used to train the model. NumPy was used for basic mathematical operations. TensorFlow was used to exploit GPU performance for the neural network, and ImageNet weights were used for each of the models. The size of each leaf image is 512 × 512 × 3. Our dataset is divided into three parts (train, test, and validation) with partitions of 50%, 25%, and 25%, respectively. During training, the validation dataset is used to estimate the quality of the deep learning model, and the test data serves as the final evaluation set.
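A hedged sketch of the 50/25/25 split described above; the use of scikit-learn's train_test_split and the placeholder arrays are our assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny placeholder arrays standing in for the leaf images (really 512 x 512 x 3)
images = np.random.rand(40, 32, 32, 3)
labels = np.tile(np.arange(5), 8)  # 8 samples per class

# 50% train; the remaining half is split evenly into validation and test sets
X_train, X_rest, y_train, y_rest = train_test_split(
    images, labels, train_size=0.50, stratify=labels, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42)
```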
5.3 Evaluation and Comparison
To confirm the performance of the supervised contrastive model for classification, we compared it with other models, such as DenseNet121, VGG16, InceptionV3 [25], and ResNet50, using the same test environment and dataset. The results for all models are given in Table 1, which lists precision, recall, F1-score, and accuracy. For DenseNet121, the precision, recall, F1-score, and accuracy values are 0.67, 0.53, 0.53, and 0.70. For EfficientNetB7, the precision, recall, F1-score, and accuracy are 0.72, 0.69, 0.69, and 0.72. The InceptionV3 [11] model has a precision of 0.55, a recall of 0.46, an F1-score of 0.43, and a total accuracy of 0.50. The MobileNetV2 [18] model has a precision of 0.66, a recall of 0.65, an F1-score of 0.65, and a total accuracy of 0.68. For the ResNet50 model, the precision is 0.62, recall is 0.59, F1-score is 0.59, and accuracy is 0.68. For the VGG16 model, the precision is 0.35, recall is 0.32, F1-score is 0.27, and accuracy is 0.32. For the VGG19 model, the precision is 0.50, recall is 0.43, F1-score is 0.43, and accuracy is 0.54. Precision is 0.70, recall and F1-score are 0.69, and accuracy is 0.79 for Xception.
Table 1. Accuracy, precision, recall, and F1-score values of our work and of popular architectures.

Model                    Precision  Recall  F1-score  Accuracy
DenseNet121 [8]          0.67       0.53    0.53      0.70
EfficientNetB7 [8]       0.72       0.69    0.69      0.72
InceptionV3 [11]         0.55       0.46    0.43      0.50
MobileNetV2 [18]         0.66       0.65    0.65      0.68
ResNet50 [8]             0.62       0.59    0.59      0.68
VGG16 [8]                0.35       0.32    0.27      0.32
VGG19 [8]                0.50       0.43    0.43      0.54
Xception [8]             0.70       0.69    0.69      0.79
Supervised Contrastive   0.78       0.79    0.79      0.88
Eventually, the supervised contrastive model reaches a precision of 0.78, an F1-score of 0.79, and a recall of 0.79. This study has the highest accuracy for disease detection from cassava leaf images, with an accuracy of 88%, which demonstrates the acceptability of our proposed method. It also exceeds the other models in F1-score, precision, and recall.
6 Conclusion
Cassava leaf disease detection is a very prominent field of research. A supervised contrastive learning (SCL) model has been developed in this study to identify four diseases of cassava leaves as well as healthy leaves. Data augmentation modules, encoder networks, and projection networks have been used to perform tasks such as network training, embedding similar classes nearby and other classes far away, and using image labels. We obtained the best results when using the same data augmentation principle as in stage 2 when training linear classifiers. We evaluated the architecture on four different diseases of cassava leaves and on identifying healthy leaves. Our proposed architecture is convincing, achieving strong performance in detecting any condition of cassava leaves and identifying healthy leaves, with precision, recall, and F1-score of 78%, 79%, and 79%, respectively. The dataset used in this study is highly imbalanced, and the recommended approach is sensitive to imbalanced data; handling the imbalance by modifying the proposed model can be considered future work. Moreover, many cassava diseases look similar to one another, so concepts from fine-grained image classification could be employed to help reduce the proposed model's misclassification rate.
References
1. Oyewole, O.B.: Cassava processing in Africa. In: Application of Biotechnology to Traditional Fermented Foods. Report of an Ad Hoc Panel of the Board on Science and Technology for International Development, USA, National Research Council, pp. 89–92 (1992)
2. Li, S., Cui, Y., Zhou, Y., Luo, Z., Liu, J., Zhao, M.: The industrial applications of cassava: current status, opportunities and prospects. J. Sci. Food Agric. 97(8), 2282–2290 (2017)
3. Zhao, P., et al.: Analysis of different strategies adapted by two cassava cultivars in response to drought stress: ensuring survival or continuing growth. J. Exp. Bot. 66(5), 1477–1488 (2015)
4. Kabir, M.M., Ohi, A.Q., Mridha, M.F.: A multi-plant disease diagnosis method using convolutional neural network. In: Uddin, M.S., Bansal, J.C. (eds.) Computer Vision and Machine Learning in Agriculture. AIS, pp. 99–111. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6424-0_7
5. Prodeep, A.R., Hoque, A.M., Kabir, M.M., Rahman, M.S., Mridha, M.F.: Plant disease identification from leaf images using deep CNN's EfficientNet. In: 2022 International Conference on Decision Aid Sciences and Applications (DASA), pp. 523–527. IEEE (2022)
6. Jani, R., Shanto, M.S.I., Kabir, M.M., Rahman, M.S., Mridha, M.F.: Heart disease prediction and analysis using ensemble architecture. In: 2022 International Conference on Decision Aid Sciences and Applications (DASA), pp. 1386–1390. IEEE (2022)
7. Rao, P.K., et al.: Cassava leaf disease classification using separable convolutions UNet. Turk. J. Comput. Math. Educ. (TURCOMAT) 12(7), 140–145 (2021)
8. Ravi, V., Acharya, V., Pham, T.D.: Attention deep learning-based large-scale learning classifier for cassava leaf disease classification. Expert Syst. 39(2), e12862 (2022)
9. Lilhore, U.K., et al.: Enhanced convolutional neural network model for cassava leaf disease identification and classification. Mathematics 10(4), 580 (2022)
10. Oyewola, D.O., Dada, E.G., Misra, S., Damaševičius, R.: Detecting cassava mosaic disease using a deep residual convolutional neural network with distinct block processing. PeerJ Comput. Sci. 7, e352 (2021)
11. Ayu, H.R., Surtono, A., Apriyanto, D.K.: Deep learning for detection cassava leaf disease. In: Journal of Physics: Conference Series, vol. 1751, p. 012072. IOP Publishing (2021)
12. Sambasivam, G., Opiyo, G.D.: A predictive machine learning application in agriculture: cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt. Informat. J. 22(1), 27–34 (2021)
13. Sangbamrung, I., Praneetpholkrang, P., Kanjanawattana, S.: A novel automatic method for cassava disease classification using deep learning. J. Adv. Inf. Technol. 11(4), 241–248 (2020)
14. Emuoyibofarhe, O., Emuoyibofarhe, J.O., Adebayo, S., Ayandiji, A., Demeji, O., James, O.: Detection and classification of cassava diseases using machine learning. Int. J. Comput. Sci. Soft. Eng. (IJCSSE) 8(7), 166–176 (2019)
15. Surya, R., Gautama, E.: Cassava leaf disease detection using convolutional neural networks. In: 2020 6th International Conference on Science in Information Technology (ICSITech), pp. 97–102. IEEE (2020)
16. Metlek, S.: Disease detection from cassava leaf images with deep learning methods in web environment. Int. J. 3D Print. Technol. Digit. Ind. 5(3), 625–644 (2021)
17. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
18. Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J., Hughes, D.P.: Deep learning for image-based cassava disease detection. Front. Plant Sci. 8, 1852 (2017)
Diabetes Mellitus Prediction Using Transfer Learning

Md Ifraham Iqbal1(B), Ahmed Shabab Noor2, and Ahmed Rafi Hasan2

1 Department of Data Science, Friedrich Alexander University of Erlangen, Erlangen, Germany
[email protected]
2 Department of Computer Science and Engineering, United International University (UIU), Madani Avenue, Badda, Dhaka, Bangladesh
{anoor193024,ahasan191131}@bscse.uiu.ac.bd

Abstract. Over 25% of the elderly population suffer from diabetes. Diabetes has no cure, but an early diagnosis can assist in reducing its effects. Previously, machine learning has proven to be effective for diabetes prediction. However, in the literature, barely any methods have used the high learning capacities of Deep Learning (DL) techniques for diabetes prediction. Hence, in this study, we propose methods for diabetes diagnosis using DL. All the attributes in the Pima Indian Diabetes Dataset (PIDD) are crucial for diabetes diagnosis. Since medical data is sensitive, a non-biased classifier is of utmost importance. Thus, the goal of this study is to create an intelligent model that can predict the presence of diabetes without using dimensionality reduction techniques. A 4-layered Neural Network (NN) model was used, in which the hidden layers consist of 64 neurons each. Testing and evaluation demonstrated that the model achieves an accuracy of 93.33% on the PIDD. Alongside this, the data was also converted into an image dataset to apply transfer learning to the PIDD. The obtained results are significantly better than those obtained in previous studies: the CNN models produce 100% accuracy scores on the PIDD. The study proves that CNNs can be successful when used on small medical datasets. Based on the results, we can also conclude that the proposed system can effectively diagnose diabetes.

Keywords: Diabetes Prediction · Convolutional Neural Networks · Tabular data to Image · Feature Analysis · Classification · Tabular Convolution · Transfer Learning · Neural Networks
1 Introduction
A study in the United States has shown that more than 25.6 million people over the age of 20 (12.6 million of them women) and more than 1 in 4 people over the age of 65 have been diagnosed with diabetes [6]. In the long term, diabetes
can lead to eye diseases (e.g., glaucoma, retinopathy), kidney damage, nerve damage, etc. However, the most significant issue is that diabetes can also lead to cardiovascular diseases (CVD) [16], which are the leading cause of death in the US [5]. Another study concluded that in 2017 alone, diabetes diagnoses led to an expenditure of over 327 billion dollars in the US [35]. Diabetes has no cure, but research has shown that diabetes occurs due to the combination of daily lifestyle and a person's genes [23]. However, diabetes can be avoided with comprehensive, regular lifestyle management. In the US, more than 1 in 3 people have pre-diabetes, which can lead to diabetes in the future, and the majority do not even know they suffer from this condition. Why would they change their lifestyle if they are unaware that they are developing risks of diabetes? Hence, our goal in this project is to build a system that will predict whether a person will have diabetes in the future. Technology has advanced dramatically during the last few decades; it is being used in robotics [32], home assistance [14,15], smart city applications [11], etc. With the advent of Machine Learning (ML), technology also plays a huge role in the medical sector. Researchers have already used machine learning to forecast CVD risks [25], the growth of COVID-19 cases [13,22], etc. In this study, we discuss a Deep Learning (DL) technique we have developed for diabetes diagnosis. The models give positive or negative classification results so that individuals can reduce the impact of diabetes later in their lives. Furthermore, the dataset is converted into images so that CNNs can be applied to it. Finally, multiple CNN models are applied to the PIDD using the transfer learning approach. The CNNs perform significantly better than the existing literature, achieving sensitivity and specificity scores exceeding 0.90. Furthermore, the results suggest that DL models can be used on small datasets. Section 2 reviews previous studies on similar topics. In Sect. 3 we present the methods used in this study. Section 4 presents the results we obtained and our discussion. Finally, Sect. 5 concludes the study.
2 Background Study
Some researchers have already attempted to implement ML models to make an early prediction of diabetes. Jarullah et al. [1] used a Decision Tree (DT) via the WEKA software. The DT accurately predicted 78.18% of the instances with specificity and sensitivity of 0.82 and 0.72, respectively. Sivanesan et al. [29] used the J48 DT on the dataset. However, the results did not improve, with an accuracy of only 76.58% and specificity and sensitivity of 0.62 and 0.86; the J48 DT was biased in this case. Kumari et al. [21] used a Support Vector Machine (SVM) on the Pima Indian Diabetes dataset. They achieved an accuracy of 78% with sensitivity and specificity scores of 0.80 and 0.77, respectively. Wei et al. [34] applied several ML models and found that a deep neural network (DNN) achieved the best performance (accuracy = 77.86%).
However, none of these works focused on feature selection; hence, their performances are not up to the mark. Kaur et al. [17] used the Boruta Wrapper Algorithm (BWA) for feature selection and then used five different ML methods for classification. A linear-kernel SVM gave the best results, with an accuracy of 89% and an AUC score of 0.90. Calisir et al. [3] introduced a Morlet Wavelet Support Vector Machine (MWSVM) classifier. They used Linear Discriminant Analysis (LDA) for feature reduction, and afterward classification was done using the MWSVM classifier, which achieved accuracy, specificity, and sensitivity scores of 89.74%, 93.75%, and 83.33%, respectively. Erkaymaz et al. [7] introduced a Small-World Feed-Forward Artificial Neural Network (SW-FFANN), constructing the small-world network using the Watts-Strogatz approach. This method achieved an accuracy of over 90% with an incredible specificity score of 0.9615 and a respectable sensitivity score of 0.85; however, it is computationally costly. Nilashi et al. [24] introduced a novel classification method: they used Self-Organizing Maps (SOM) for clustering the data, applied Principal Component Analysis (PCA) for dimensionality reduction and noise removal, and then fed the data into a Neural Network (NN) for classification, with 10-fold cross-validation. This combination of SOM-PCA-NN achieves a remarkable accuracy of 92.28%. A recent study [4] applied Naive Bayes but failed to provide results similar to the studies mentioned above. Another study [2] used Long Short-Term Memory and Neural Networks but failed to reach 90% accuracy. Finally, the study in [26] compared the performance of ML models on the PIDD and found that the NN performed best, achieving an accuracy of over 97%. Based on this information, we are motivated to use the NN for our study. However, we failed to achieve similar levels of performance with the Neural Network, and so a CNN-based approach is introduced in this study. From previous research, it can be observed that all the features available in the dataset are associated with having diabetes. Hence, in our proposed method, we try to achieve maximum precision without removing any features. Our objective is to introduce a method that performs even better than the above in diagnosing diabetes in women of Pima Indian heritage.
3 Methodology
In this section, we describe the methods used to achieve high accuracy on the PIDD dataset. For data analysis and model design we used pandas and PyTorch in Python.

3.1 The Dataset
We have used the Pima Indian Diabetes Dataset (PIDD) for this study. This dataset is from the National Institute of Diabetes and Digestive and Kidney Diseases. In 1978, Knowler et al. [19] conducted a longitudinal study on the Pima Indian residents of Arizona. The dataset contains only data from females of age 21 and above.
Table 1. The Pima Indians Diabetes Dataset

Attribute Name               Details                                              Values
Pregnancies                  Number of pregnancies                                Numerical
Glucose                      Plasma glucose concentration                         Numerical
Blood Pressure               Diastolic blood pressure (mmHg)                      Numerical
Skin Thickness               Triceps skin fold thickness (mm)                     Numerical
Insulin                      Serum insulin (muU/ml)                               Numerical
BMI                          Body Mass Index (kg/m2)                              Numerical
Diabetes Pedigree Function   History of diabetes in relatives and family members  Numerical
Age                          Age of the individual in years                       Numerical
Outcome                      Whether the individual has diabetes or not           0 = False, 1 = True
In total, the PIDD has 9 columns; details of these columns are given in Table 1. The Outcome column represents whether a person was diagnosed with diabetes later in their life or not. In this dataset, a total of 768 individuals were tested, of whom 268 tested positive for diabetes.

3.2 Exploratory Data Analysis
Firstly, we checked the distribution and skewness of the data. From Table 2, it can be observed that the Blood Pressure, Insulin, Diabetes Pedigree Function, and Age columns are highly skewed and do not follow a normal distribution, the Pregnancies column is moderately skewed, and the Glucose, Skin Thickness, and BMI columns are approximately symmetric. Next, we computed the Pearson correlation of the features amongst one another, shown in Fig. 1. It can be observed that most of the features are correlated with the class column, Outcome (correlation value > 0.05). Additionally, none of the features are highly collinear. From this, it can be concluded that all the features are relevant for diabetes.
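A brief pandas sketch of this analysis (the column names follow Table 1; the CSV filename is an assumption):

```python
import pandas as pd

df = pd.read_csv("diabetes.csv")  # PIDD with the 9 columns of Table 1

# Skewness of each attribute (cf. Table 2) and Pearson correlation (cf. Fig. 1)
print(df.skew(numeric_only=True))
print(df.corr(method="pearson")["Outcome"].sort_values(ascending=False))
```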
3.3 Data Transforming
We used Standard Scaler to transform the dataset. After applying Standard Scaler, each feature has a mean of 0 and a standard deviation of 1 (so scaled values can be negative). The formula for Standard Scaler is given in Eq. 1:

$$z = \frac{x_i - \mu}{\sigma} \tag{1}$$
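Eq. (1) corresponds to scikit-learn's StandardScaler; its use here is our assumption of how the transformation was implemented:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 85.0], [6.0, 148.0], [8.0, 183.0]])  # toy feature rows
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # each column: mean 0, std 1
```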
Table 2. Distribution of the Data

Attribute Name               Skewness Value  Distribution
Pregnancies                  0.90            Moderately skewed
Glucose                      0.17            Symmetric
Blood Pressure               -1.84           Highly skewed
Skin Thickness               0.11            Symmetric
Insulin                      2.27            Highly skewed
BMI                          0.43            Symmetric
Diabetes Pedigree Function   1.92            Highly skewed
Age                          1.13            Highly skewed
Fig. 1. Correlation of the Columns in PIDD
3.4 Design of Our Model
Neural networks were designed to help computers analyze information the way a human brain does to solve complex problems. Deep learning models are a key part of modern society due to their ability to learn complex patterns in data. As we use all the features available in the dataset without any feature selection, finding complex patterns and hidden relations in the data was vital. We split the dataset into training and testing sets: 95% of the data was used for training, while the rest was used for testing and evaluation. The training set was further divided into 64 batches, which were fed into the network one by one.
Fig. 2. Model Architecture
We designed a neural network with 4 layers for predicting whether a person will have diabetes or not. The Rectified Linear Unit (ReLU) is used as the activation function, Dropout (DP) is used to ensure no overfitting occurs, and Batch Normalization (BN) is used twice in our network.

Network Architecture. The first layer, or input layer, of our network consists of 8 neurons, which take the values of the 8 input features. These values are then forwarded to our first hidden layer (H1), which consists of 64 neurons. After completing its computation, the second layer forwards its values to our third layer, the second hidden layer (H2) of our model, which also consists of 64 neurons. After computation, the results are forwarded to the output layer, which outputs 0 or 1, indicating whether diabetes is present in an individual. The network architecture is shown in Fig. 2.
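A minimal PyTorch sketch of this 8-64-64-1 architecture, with ReLU, BN on H1 and H2, and the dropout probability of 0.1 stated below; the class name and exact layer ordering are our assumptions:

```python
import torch
import torch.nn as nn

class DiabetesNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(8, 64),    # input layer -> H1
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(p=0.1),
            nn.Linear(64, 64),   # H1 -> H2
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(p=0.1),
            nn.Linear(64, 1),    # H2 -> output logit (diabetes yes/no)
        )

    def forward(self, x):
        return self.net(x)

model = DiabetesNet()
logits = model(torch.randn(4, 8))  # batch of 4 standardized feature vectors
```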
Activation Function. Activation functions determine the activity of a neuron in a network and are essential in determining the speed at which a neural network is trained. Back-propagation is used to train neural networks; it requires a tremendous amount of computation, and hence activation functions must be computationally efficient. In our network, we used the ReLU activation. ReLU is linear for positive values and 0 for negative values. Due to this linear scale, ReLU is least affected by the vanishing gradient problem compared to other activation functions. Furthermore, due to the linear scale, it is cost-effective, and networks using ReLU can converge rapidly. ReLU has a derivative, which allows it to be used in back-propagation. In the final layer of the neural network, a Softmax function with cross-entropy loss is applied.

Optimizer and Criterion. Optimizers adjust the weights and learning rate of the network to reduce the loss. In the proposed system, we used the Adaptive Moment Estimation (Adam) [18] optimizer. Adam uses first- and second-order momentums: it maintains exponentially decaying averages of past gradients and of past squared gradients. Adam updates the parameters using Eq. 2:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \tag{2}$$

Binary Cross-Entropy (BCE) is our experiment's loss function. We also use logit loss as our criterion; logit loss combines a Sigmoid layer and BCE for numerical stability.

Batch Normalization. Batch Normalization (BN) is used to standardize the outputs of a layer across each batch [10]. BN speeds up the training process and makes the network more robust; it also tackles the vanishing and exploding gradient issues. We use BN on the outputs of H1 and H2.

Dropout. Dropout (DP) is a method in which neurons are randomly turned off during training in each iteration [30]. Neural networks tend to overfit on many occasions, as nodes adapt too much to the training data; randomly turning them off mitigates this issue. We used a DP probability of 0.1 for each node.

Learning Rate. The learning rate (LR) determines how the loss is minimized after every iteration. The LR is usually a small positive value, as too high an LR may lead to the network missing the minimum loss; however, if the LR is too low, it takes a significant amount of time and more epochs to reach the minima. Hence, we used an LR value of 0.0002 and 100 epochs to train our network. As a result, the LR decreases exponentially as it approaches the local minima.
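A hedged sketch of the training loop implied by this setup (Adam at LR 0.0002, the BCE-with-logits criterion, 100 epochs); DiabetesNet is carried over from the sketch above, and the placeholder tensors are illustrative, not the authors' code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(128, 8)                    # placeholder scaled features
y = torch.randint(0, 2, (128, 1)).float()  # placeholder outcomes
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = DiabetesNet()
criterion = torch.nn.BCEWithLogitsLoss()   # Sigmoid + BCE ("logit loss")
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```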
3.5 Application of Convolutional Neural Networks
Recently, several researchers have studied the effectiveness of Convolutional Neural Networks (CNNs) on image data. However, their application to tabular medical data remained unexplored for a long period. In recent times,
methods like SuperTML [31], DeepInsight [28], and the DWTM [12] have opened avenues for applying Convolutional Neural Networks to tabular data. The DWTM was preferred, as it uses feature importance to convert the tabular data into an image dataset. The same statistical technique, Pearson correlation, which was used earlier in this study to find the relevance of features, is used in the DWTM for creating the image dataset. Figure 3 shows the image of an instance created using the DWTM. Here, it can be observed that the font size varies: the font size is directly proportional to the correlation between the class and the feature. This enables the CNNs to focus more on the more significant features, which boosts their performance.
Fig. 3. An instance created using the DWTM
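To illustrate the idea (not the DWTM's exact layout rules), a minimal sketch that renders one tabular row as an image, with font size proportional to each feature's correlation with the class, might look like this; the canvas size, font, and placement are our assumptions:

```python
from PIL import Image, ImageDraw, ImageFont  # requires Pillow >= 10.1

def row_to_image(values, correlations, size=224):
    """Render feature values as text; font size grows with |correlation|."""
    img = Image.new("RGB", (size, size), "black")
    draw = ImageDraw.Draw(img)
    y = 0
    for value, corr in zip(values, correlations):
        font_size = max(8, int(10 + 30 * abs(corr)))  # more relevant -> larger
        font = ImageFont.load_default(size=font_size)
        draw.text((4, y), f"{value:.2f}", fill="white", font=font)
        y += font_size + 2
    return img

# Toy example: four PIDD feature values and their class correlations
img = row_to_image([6.0, 148.0, 72.0, 33.6], [0.22, 0.47, 0.07, 0.29])
```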
Various state-of-the-art CNN models have been developed in recent times using the ImageNet dataset. Architectures like VGG, ResNet, and Inception are quite popular. Due to their success in image classification, ResNet [8], DenseNet [9], and RegNet [27] are applied in this study, using the transfer learning approach on the PIDD. PyTorch provides pre-trained models of these CNN architectures. Each model is further trained for up to 10 epochs using the 80% training split of the PIDD. This enables the CNNs to update their weights and fine-tune themselves to successfully predict diabetes in the patients.
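A hedged torchvision sketch of this fine-tuning step; the paper only states that pre-trained ResNet, DenseNet, and RegNet models from PyTorch were used, so the specific ResNet-18 weights and the two-class head below are our assumptions:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet and replace its head for 2 classes
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# The model is then fine-tuned for a few epochs on the DWTM-generated
# PIDD images, e.g. with CrossEntropyLoss and the Adam optimizer.
```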
4 Results and Discussion
In this section, we discuss the results obtained using our approach and the overall impact of our study. Accuracy, sensitivity, and specificity are the measures used for evaluating the performance of the deep learning models in this study. Specificity and sensitivity are calculated using Eqs. 3 and 4, respectively:

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$
Table 3. Comparison of our Results on PIDD with Previous Research

Author                Accuracy  Specificity  Sensitivity  Method Used
Kaur et al. [17]      89.00%    NA           NA           BWA-SVM
Calisir et al. [3]    89.74%    0.93         0.83         LDA-MWSVM
Erkaymaz et al. [7]   91.66%    0.96         0.85         SW-FFANN
Nilashi et al. [24]   92.28%    NA           NA           PCA-SOM-NN
Kumari et al. [20]    97.20%    NA           NA           Voting Classifier
Neural Network*       93.33%    95.20        90.90        PC+NN
Inception*            100%      100          100          Transfer Learning
ResNet*               100%      100          100          Transfer Learning
RegNet*               100%      100          100          Transfer Learning
Table 3 compares the results of our models with the results from previous studies; the * marks denote the models applied in this study. The accuracy of the proposed NN surpasses the performance of the studies mentioned in Sect. 2. Furthermore, the proposed network outperforms the PCA-SOM-NN of [24], which used clustering and feature selection. On top of that, the network used in [24] was trained for 200 epochs, while ours used only 100; hence, our system is computationally more efficient and produces much better results. However, for medical data, the sensitivity score is more crucial than accuracy [33]. The sensitivity score here refers to the correctly classified percentage of people who are positively diagnosed with diabetes. A high sensitivity score is critical, as a wrong result may lead to deadly consequences. Due to this, we have heavily emphasized balancing our sensitivity and specificity scores. From the table, it can be observed that the proposed models are the only ones in the chart that achieve sensitivity and specificity scores higher than 0.90. Therefore, it can be deduced that our model works effectively in diagnosing diabetes.
However, in recent times even better methods have been introduced that perform better than the NN proposed in this study. Hence, our study used data transformation techniques to apply CNNs to the diabetes dataset. The CNNs performed remarkably, and multiple CNN methods, including the ResNet, Inception, and RegNet architectures, produced 100% accuracy scores on the PIDD dataset. The impact of this study has enormous consequences, as it shows the effectiveness of CNNs for predicting diabetes. Additionally, the CNNs easily surpass the ML models' performance, including the Neural Network. It also shows the effectiveness of CNNs on tabular data, further strengthening the case for CNNs on tabular data as mentioned in [31]. The CNNs mentioned above were pre-trained on the ImageNet dataset for an extended period. This enhances their learning prowess and makes them robust for use on any tabular dataset. As a result, the CNNs perform exceptionally well on the PIDD dataset with the assistance of the DWTM. Furthermore, accounting for the relevance of features when creating the images also plays a crucial role in boosting the performance of the CNNs. People who suffer from pre-diabetes can test themselves and, based on the results, take action to prevent diabetes in the future. Furthermore, individuals who test positive can take precautions to minimize the impact of diabetes on their lives. On top of that, clinical experts can use this as an assistive system to validate their results and vice versa. In the past, it has been shown that CNNs do not work well on tabular data or small datasets. However, our results show that CNNs perform better than traditional classifiers on the PIDD, which is a small dataset. Despite minimal feature selection, the network performs better than traditional classifiers, and it has worked well on a skewed dataset. On top of that, only 5 epochs were used, making this system computationally very efficient. This suggests that increasing the number of epochs might yield an even better result; the caveat is that the system would become computationally more expensive, so the slight increase in performance may not be worth it. Despite all its advantages, the proposed study does have a few shortcomings. First, the dataset covers only women of a specific heritage. As mentioned earlier, the system could be further improved by increasing the number of epochs (iterations) in the training loop. In future studies, this proposed method can also be applied to other datasets to test its robustness.
5 Conclusion
In this paper, we proposed a transfer learning approach for predicting diabetes mellitus with 100% accuracy. Convolutional neural networks can find very complex patterns within the data due to their internal computations; as a result, the CNNs worked very well in predicting whether a person will have diabetes. We believe this study will have a significant impact on the medical field and can help ordinary people control the impact diabetes has on their lives. Furthermore, this study proves that when using medical data, where keeping all the attributes is crucial, CNNs can work better than traditional ML models.
References
1. Al Jarullah, A.A.: Decision tree discovery for the diagnosis of type II diabetes. In: 2011 International Conference on Innovations in Information Technology, pp. 303–307. IEEE (2011)
2. Butt, U.M., Letchmunan, S., Ali, M., Hassan, F.H., Baqir, A., Sherazi, H.H.R.: Machine learning based diabetes classification and prediction for healthcare applications. J. Healthc. Eng. 2021 (2021)
3. Çalişir, D., Doğantekin, E.: An automatic diabetes diagnosis system based on LDA-wavelet support vector machine classifier. Expert Syst. Appl. 38(7), 8311–8315 (2011)
4. Chang, V., Bailey, J., Xu, Q.A., Sun, Z.: Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput. Appl. 1–17 (2022)
5. Centers for Disease Control and Prevention: Missed opportunities in preventive counseling for cardiovascular disease, United States, 1995. MMWR Morb. Mortal. Wkly. Rep. 47(5), 91 (1998)
6. Centers for Disease Control and Prevention: National diabetes fact sheet: national estimates and general information on diabetes and prediabetes in the United States, 2011. US Department of Health and Human Services, Atlanta, GA, 201(1), 2568–2569 (2011)
7. Erkaymaz, O., Ozer, M.: Impact of small-world network topology on the conventional artificial neural network for the diagnosis of diabetes. Chaos Solitons Fractals 83, 178–185 (2016)
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
9. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
10. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
11. Iqbal, M.I., Leon, M.I., Tonmoy, N.H., Islam, J., Ghosh, A.: Deep learning based smart parking for a metropolitan area. In: 2021 IEEE Region 10 Symposium (TENSYMP), pp. 1–5. IEEE (2021)
12. Iqbal, M.I., Mukta, M., Hossain, S., Hasan, A.R.: A dynamic weighted tabular method for convolutional neural networks. arXiv preprint arXiv:2205.10386 (2022)
13. Iqbal, M., Leon, M., Azim, S.: Analysing and predicting coronavirus infections and deaths in Bangladesh using machine learning algorithms. SSRN Electron. J. (2020)
14. Islam, A., et al.: EduBot: an educational robot for underprivileged children. In: 2019 International Conference on Automation, Computational and Technology Management (ICACTM), pp. 232–236. IEEE (2019)
15. Islam, J., Ghosh, A., Iqbal, M.I., Meem, S., Ahmad, N.: Integration of home assistance with a gesture controlled robotic arm. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 266–270. IEEE (2020)
16. Kannel, W.B., McGee, D.L.: Diabetes and cardiovascular disease: the Framingham study. JAMA 241(19), 2035–2038 (1979)
17. Kaur, H., Kumari, V.: Predictive modelling and analytics for diabetes using a machine learning approach. Appl. Comput. Inform. (2020)
18. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR) (2014)
19. Knowler, W.C., Bennett, P.H., Hamman, R.F., Miller, M.: Diabetes incidence and prevalence in Pima Indians: a 19-fold greater incidence than in Rochester, Minnesota. Am. J. Epidemiol. 108(6), 497–505 (1978)
20. Kumari, S., Kumar, D., Mittal, M.: An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. Int. J. Cogn. Comput. Eng. 2, 40–46 (2021)
21. Kumari, V.A., Chitra, R.: Classification of diabetes disease using support vector machine. Int. J. Eng. Res. Appl. 3(2), 1797–1801 (2013)
22. Leon, M.I., Iqbal, M.I., Azim, S.M., Al Mamun, K.A.: Predicting COVID-19 infections and deaths in Bangladesh using machine learning algorithms. In: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), pp. 70–75. IEEE (2021)
23. Li, G., et al.: The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing diabetes prevention study: a 20-year follow-up study. Lancet 371(9626), 1783–1789 (2008)
24. Nilashi, M., Ibrahim, O., Dalvi, M., Ahmadi, H., Shahmoradi, L.: Accuracy improvement for diabetes disease classification: a case on a public medical dataset. Fuzzy Inf. Eng. 9(3), 345–357 (2017)
25. Patil, P.B., Shastry, P.M., Ashokumar, P.: Machine learning based algorithm for risk prediction of cardio vascular disease (CVD). J. Crit. Rev. 7(9), 836–844 (2020)
26. Patil, V., Ingle, D.: Comparative analysis of different ML classification algorithms with diabetes prediction through Pima Indian diabetics dataset. In: 2021 International Conference on Intelligent Technologies (CONIT), pp. 1–9. IEEE (2021)
27. Schneider, N., Piewak, F., Stiller, C., Franke, U.: RegNet: multimodal sensor registration using deep neural networks. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1803–1810. IEEE (2017)
28. Sharma, A., Vans, E., Shigemizu, D., Boroevich, K.A., Tsunoda, T.: DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9(1), 1–7 (2019)
29. Sivanesan, R., Dhivya, K.D.R.: A review on diabetes mellitus diagnoses using classification on Pima Indian diabetes data set. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 5(1) (2017)
30. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
31. Sun, B., et al.: SuperTML: two-dimensional word embedding for the precognition on structured tabular data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
32. Uday, T.I.R., et al.: Design and implementation of the next generation Mars rover. In: 2018 21st International Conference of Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2018)
33. Veropoulos, K., et al.: Controlling the sensitivity of support vector machines. In: Proceedings of the International Joint Conference on AI, vol. 55, p. 60 (1999)
34. Wei, S., Zhao, X., Miao, C.: A comprehensive exploration to the machine learning techniques for diabetes identification. In: 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), pp. 291–295. IEEE (2018)
35. Yang, W., Dall, T.M., Beronjia, K., Semilla, A.P., Chakrabarti, R., Hogan, P.F.: Economic costs of diabetes in the US in 2017. Diabetes Care 41(5), 917–928 (2018)
An Improved Heart Disease Prediction Using Stacked Ensemble Method

Md. Maidul Islam1, Tanzina Nasrin Tania1, Sharmin Akter1, and Kazi Hassan Shakib2(B)

1 City University, Dhaka, Bangladesh
2 Chittagong University of Engineering and Technology, Chittagong, Bangladesh
[email protected]

Abstract. Several cardiac failures, heart disease deaths, and diagnostic costs can all be reduced with early identification and treatment. The discovery of previously unknown patterns and connections can support improved decisions when forecasting heart disorder risk. In this study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques such as outlier detection and removal, checking for and removing missing entries, feature normalization, and cross-validation; nine classification algorithms, namely RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM; and eight metrics for measuring classifier performance: classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient. Our method can easily differentiate between people who have cardiac disease and those who are normal. Receiver operating characteristic curves and the areas under the curves were determined for every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models are discussed in this study. The performance of the proposed scheme has been confirmed utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that combined these nine algorithms.

Keywords: Prediction · Heart Disease · CART · GBM · Multilayer Perception
1 Introduction

Heart disorder, which affects the heart and arteries, is one of the most devastating human diseases. The heart is unable to pump the required volume of blood to other parts of the body when it suffers from cardiac problems. In the case of heart disease, the valves and heart muscles are particularly affected. Cardiac illness is also referred to as cardiovascular disease. The cardiovascular system comprises all blood vessels, including arteries, veins, and capillaries, that constitute the intricate network of the bloodstream throughout the body. Cardiovascular diseases include cardiac illnesses, cerebrovascular diseases, and artery illnesses. Heart disease is a hazard, usually unavoidable
and an imminent cause of death. Heart disease is currently a prominent issue among all other health ailments, since many people are losing their lives to it. Cardiovascular disease kills 17.7 million people per year, accounting for 31% of all deaths globally, as per the World Health Organization (WHO). Heart attacks and strokes account for 85% of these cases. Heart-related disorders have also become the major cause of death in India [1]. In the United States, one person is killed by heart disease every 34 seconds [9]. Heart diseases killed 1.7 million Indians in 2016, according to the 2016 Global Burden of Disease Report, released on September 15, 2017 [3]. According to a WHO report published in 2018, nearly 6 million people died globally in 2016 because of heart diseases [4]. Controlling heart disorders costs approximately 3% of total healthcare spending [20]. The World Health Organization's projections provided the impetus for this project: the WHO predicts that roughly 23.6 million people will die from heart disease by 2030. The expanding rate of heart disease has raised worldwide concern. Heart failure is tougher to diagnose because of diabetes, hypertension, hyperlipidemia, irregular ventricular rate, and other pertinent diagnosable conditions. As cardiac illness becomes increasingly common, data on the condition is becoming more nonlinear, non-normal, association-structured, and complicated. As a result, forecasting heart illness is a major difficulty in medical data exploration, and clinicians find it extremely difficult to properly forecast a heart disease diagnosis. Several studies have endeavored to use advanced approaches to analyze heart disease data. If the bagging is not adequately represented in the ensemble approach, it might result in excessive bias and consequently under-fitting. Boosting is also difficult to apply in real time due to the algorithm's increasing complexity. On the other hand, our proposed approach may combine the skills of several high-performing models on a classification or regression task to provide predictions that outperform any single model in the ensemble, while also being simpler to build. Our suggested system has not received much attention, so we have attempted to build it correctly, arriving at a good outcome and a superior prediction system. The organization of the paper is as follows. In Sect. 2, we state related research contributions, summarize their major contributions, and compare them with our work; we also provide a table with an overview of the related works and comparison analytics for readers. In Sect. 3, we provide an outline of the system methodology and its architecture. In Sect. 4, the implementation and experimental results are described. In Sect. 5, we discuss our limitations and conclude the paper.
2 Literature Review

This study aims to look into how data mining techniques may be used to diagnose cardiac problems [15]. Practitioners and academics have previously employed pattern recognition and data mining technologies in the realm of diagnostics and healthcare for prediction purposes [13]. Various contributions have been made in recent times to determine the most preferred approach for predicting heart disorders [8]. The following part explores numerous analytical methodologies while providing a quick overview of the existing literature regarding heart disorders. In addition, current techniques have been evaluated in several ways, including a comprehensive comparison at the end of this section.
Mohan, S. et al. [1] developed a unique approach to determine which ML approaches can be used to increase the accuracy of heart illness forecasting. The forecast model is introduced using a variety of feature combinations and well-known classification methods. They attain an enhanced performance level with an accuracy of 88.7% using the Hybrid Random Forest with Linear Model (HRFLM) for heart disease prediction. The ML techniques used in this study include DT, NB, DL, GLM, RF, LR, GBT, and SVM; the investigation was reproduced with all 13 characteristics as well as all ML techniques. Palaniappan, S. et al. [2] applied a technology demonstrator, the Intelligent Heart Disease Prediction System (IHDPS), using data mining approaches such as DT, NB, and NN. The results show that each approach has a different advantage in reaching the defined extraction criteria. Based on medical factors like sex, age, blood sugar, and blood pressure, it can forecast the probability of individuals developing heart disorders, and it enables considerable knowledge to be established, such as patterns and correlations among clinical aspects connected to heart illness. The Microsoft .NET platform underpins IHDPS, and the mining models are built using the CRISP-DM approach. Bashir, S. et al. [4] discuss how data science can be used to predict cardiac disease in the medical industry. Although several studies have been undertaken on the issue, prediction accuracy still needs to be improved. As a result, this study focuses on attribute selection strategies as well as algorithms, with numerous heart disease datasets being utilized for testing and improving accuracy. Attribute selection methodologies with DT, Logistic Regression, Logistic Regression SVM, NB, and RF are used within RapidMiner, and the results indicate an increase in efficiency. Le, H.M. et al. [5] use the ranks and weights of the Infinite Latent Feature Selection (ILFS) approach to weight and reorder heart disease (HD) characteristics. A soft-margin linear SVM is used to classify a subset of supplied qualities into discrete HD classes. The experiment makes use of the universal heart disorders dataset from the UCI Machine Learning Repository. Experiments revealed that the suggested method is useful for making precise HD predictions; their tactic performed best, with an accuracy of 90.65% and an AUC of 0.96 for discriminating 'No existence' HD from 'Existence' HD. Yadav, D.C. and Pal et al. [6] presented and investigated tree-based classification algorithms: M5P, Random Tree, and Reduced Error Pruning with the Random Forest ensemble method. All prediction-based methods were applied after identifying features of the cardiac patient dataset. Three feature-based techniques were employed in their paper: PC, RFE, and LR. The set of variables was evaluated using various feature selection approaches for improved prediction. From the findings, they concluded that the attribute selection methods PC and LR, along with the random forest ensemble approach, deliver 99% accuracy. Kabir, P.B. and Akter, S. et al. [7] note that tree-based techniques are among the most fundamental and widely used ensemble learning algorithms. Tree-based models such as Random Forest (RF) and Decision Tree (DT), according to the study, provide valuable intelligence with enhanced efficiency, consistency, and applicability.
Using a Feature Selection (FS) method, relevant features are discovered and the classifier output is produced from those features; FS eliminates non-essential characteristics without
affecting learning outcomes. Their research aims to apply FS in conjunction with tree-based approaches to increase heart disease prediction accuracy. Islam, M.T. et al. [8] used PCA to reduce the number of characteristics and applied a hybrid genetic algorithm (HGA) with k-means for the final clustering. The k-means approach is often applied for clustering data, but because it is a heuristic approach it can become trapped in local optima; to avoid this problem, they used the HGA for data clustering. The suggested methodology has a prediction accuracy of 94.06% for early cardiac disease. Rahman, M.J.U. et al. [10] set out to create a Robust Intelligent Heart Disease Prediction System (RIHDPS) by combining three data mining modelling techniques, NB, LR, and NN, into an ensemble method, and they investigated the effectiveness of medical decision assistance systems built on ensembles of these three algorithms. Patel, J. et al. [12] used WEKA to evaluate alternative decision tree classification algorithms for heart disorder detection, testing the J48, LMT, and RF methods. Using existing records of heart disease patients from the UCI repository's Cleveland database, the performance of the decision tree algorithms is examined and validated. The aim of that research is to utilize data mining tools to uncover hidden patterns in cases of heart problems and to forecast the existence of heart disorders in individuals, ranging from no existence to likely existence. Bhatla, N. et al. [28] examined different data mining techniques that might be employed in computerized heart disorder forecasting systems. According to their data, an NN with 15 features has the best accuracy so far (100%), while DT looked impressive with 99.62% accuracy using 15 characteristics; furthermore, DT showed 99.2% accuracy when combined with a genetic algorithm and 6 characteristics (Table 1).
3 Methodology
This section proposes an advanced and efficient prediction of heart disease based on past historical training data. The strategy is to analyze and test various data-mining algorithms and to deploy the algorithm that yields the highest accuracy. The research also includes a visualization module in which the heart disease datasets are displayed diagrammatically using different data visualization techniques, for user convenience and better understanding. The subsections that follow go through the materials and methodologies in detail: the research design is presented in Sect. 3.1, the data collection and preprocessing are summarized in Sect. 3.2, and the ML classification techniques and stacked ensemble approach are explained in Sect. 3.3.
Table 1. A literature evaluation of cardiac disease predictions, comparing several methods.

| Source | Datasets | FS | Attributes | Classifier & Validation techniques | Accuracy |
|---|---|---|---|---|---|
| Mohan S. [1] | Cleveland UCI repository | HRFLM | 14 attributes | DT, GLM, RF, and 5 more | 88.4% |
| Bashir S. [4] | UCI dataset | Minimum Redundancy Maximum Relevance (MRMR) | FBS, Cp, Trestbps, Chol, Age, Slope, Sex, and 7 more attributes | NB, Logistic Regression, LR SVM, DT and RF | NB: 84.24%; LR (SVM): 84.85% |
| Le, H.M. [5] | UCI Machine Learning Repository | Infinite Latent Feature Selection (ILFS) | 58 attributes | WEKA, NB, LR, non-linear SVM (Gaussian, Polynomial, Sigmoid), and linear SVM | Linear SVM: 89.93%; ILFS: 90.65% |
| Yadav D.C. and Pal [6] | UCI repository | Lasso Regularization, Recursive Feature Elimination and Pearson Correlation | Resting, FBS, CP, Chol, Sex, Ca, Age, and 7 more attributes | Random Tree, M5P, and Reduced Error Pruning with Random Forest Ensemble Method | Random Forest ensemble method: 99% |
| Kabir P.B. and Akter S. [7] | Hungary (HU), Long Beach (LB), Cleveland (Cleve.), and Switzerland (SR) | Hybrid | Cordocentesis, Max HR achieved, Epoch, Triglyceride, Sign, Coronary Infarction, Diastolic Pressure, and 6 more attributes | LGBM, RF, NB, SVM, and 3 more algorithms | KNN: 100.00%; DT: 100.00%; RF: 100.00% |
| Islam M.T. [8] | UCI Machine Learning Repository | PCA | 14 attributes | HGA with k-means | 94.06% |
| Patel J. [12] | Cleveland UCI repository | WEKA | 13 attributes | DT (J48), LMT, RF | J48 tree technique: 56.76% |
3.1 Research Design
In this stage, all of the data are gathered into a single dataset. After the accessible data resources are identified, the data are selected, cleansed, and converted into the required distribution, and feature analysis extracts the characteristics that are valuable for cardiovascular disease prognosis. Cross-validation, several classification approaches, and the stacked ensemble method are then applied to the pre-processed data to make predictions. After all of these steps are completed, the illness is forecast, the overall performance is assessed, and the outcome is determined from the performance review (Fig. 1).
Fig. 1. Methodological framework of the heart disease prediction system.
3.2 Data Collection and Preprocessing
In this study we combined three datasets: Statlog, Cleveland, and Hungary. There are 1190 records in all, with 11 characteristics and one target variable. The features are chest pain type, cholesterol, sex, resting blood pressure, age, resting ECG (normal (0), ST-T abnormality (1), LV hypertrophy (2)), fasting blood sugar, maximum heart rate, exercise-induced angina, oldpeak, and ST slope (normal (0), upsloping (1), flat (2), downsloping (3)); in the target, 0 denotes no disease and 1 denotes illness. It should be noted that zero values are used to represent null or missing entries, so null values must be removed during the data preparation step; in our case, however, there are no null values. After that, we complete exploratory data analysis (Table 2). A short preprocessing sketch follows the table.
Table 2. Features of the dataset: descriptive information.

| Features | Definition | Type |
|---|---|---|
| Age | Patient's age in completed years | Numerical |
| Sex | Male patients are indicated by 1 and female patients by 0 | Nominal |
| Chest Pain | The four types of chest pain that patients feel: 1. typical angina, 2. atypical angina, 3. non-anginal pain, 4. asymptomatic angina | Nominal |
| Resting BPS | Blood pressure in mm/Hg while in resting mode | Numerical |
| Cholesterol | Cholesterol in the bloodstream in mg/dl | Numerical |
| Fasting Blood Sugar | Fasting blood sugar >120 mg/dl is expressed as 1 (true) and 0 (false) | Nominal |
| Resting ECG | The ECG results at rest take three values. 0: normal; 1: ST-T wave abnormality; 2: left ventricular hypertrophy | Nominal |
| Max heart rate | Maximum heart rate achieved | Numerical |
| Exercise angina | Exercise-induced angina: 0 represents No and 1 represents Yes | Nominal |
| Oldpeak | ST depression induced by exercise relative to the resting state | Numerical |
| ST slope | The slope of the ST segment at peak exercise, with three values: 1. upsloping, 2. flat, 3. downsloping | Nominal |
| Target | The objective variable to forecast. A value of 1 indicates that the person is at risk for heart disease, whereas 0 indicates that the person is in good health | Numerical |
3.3 Models
Machine learning classification methods are utilized in this phase to classify cardiac patients and healthy people. The system employs the RF classifier, MLP, KNN, ET classifier, XGBoost, SVC, AdaBoost classifier, CART, and GBM, among other common classification techniques. For our suggested system, we apply the stacked ensemble approach, for which a set of base models and a meta-learner algorithm must be constructed. The most relevant and standard evaluation metrics for this problem area, such as sensitivity, specificity, precision, F1-score, ROC, log loss, and the Matthews correlation coefficient, are used to assess the outcome of each experiment.
1. RF Classifier: the Random Forest model is a classification technique that uses a random forest as its foundation. The algorithm can handle data sets with both continuous and categorical variables, in regression as well as classification, and it performs particularly well on classification problems. The criterion is the function that measures the quality of a split; we utilized "entropy" for information gain, while "gini" stands for Gini impurity:

Gini = 1 - \sum_{i=1}^{G} (p_i)^2

Entropy = \sum_{i=1}^{G} -p_i \log_2(p_i)
2. MLP: a multi-layer perceptron (MLP) is a feedforward neural network that establishes a number of outputs from a collection of inputs. An MLP comprises multiple layers of nodes, in which the layers between the input and output are connected as a directed graph.
3. KNN: the K-NN method is straightforward to implement and does not require a hypothesis or any other constraints. The algorithm may be used for exploration, validation, and categorization. Despite being the most straightforward approach, K-NN is hampered by duplicated and unnecessary data.
4. Extra Tree Classifier: Extremely Randomized Trees, or Extra Trees, is a machine learning ensemble technique. It is a decision tree ensemble comparable to bootstrap aggregation and random forest, among other decision tree ensemble approaches. The Extra Trees approach uses the training data to construct a significant number of extremely randomized decision trees; the average of the decision tree estimates is used in regression, whereas a majority vote is used in classification.
5. XGBoost: the XGBoost classifier is a machine learning method for categorizing both structured and tabular data. XGBoost is a high-speed and high-performance implementation of gradient boosted decision trees; as a result, it is a sophisticated ensemble modelling approach with many moving parts, and it handles large, complicated datasets with ease.
6. SVC: the Support Vector Classifier (SVC) is a common supervised learning technique for both classification and regression problems. The purpose of the SVC method is to find the optimal hyperplane for partitioning n-dimensional space so that new observations can be readily classified; the extreme points that aid in the construction of the hyperplane are the support vectors. The underlying method is the Support Vector Machine, of which support vector classifiers are prominent examples.
7. AdaBoost Classifier: AdaBoost, shorthand for Adaptive Boosting, is a boosting approach used in machine learning as an ensemble learning method. Each instance's weight is reassigned, with larger weights applied to instances that were incorrectly classified; this is known as "adaptive boosting".
8. CART: in decision trees, a kind of supervised machine learning, the data are divided repeatedly based on a parameter, with the inputs and associated outputs specified in the training data. Two entities may be used to explain the tree: decision nodes and leaves.
9. GBM: gradient boosting is a family of algorithms that may be applied to a variety of issues such as classification and regression problems. It assembles a prediction system from a collection of weak models, usually decision trees.
10. Stacked Ensemble: the term "ensemble" refers to the procedure of combining many models, so that instead of employing a single model to make predictions, a group of models is used. Ensembles use two classic techniques:
• Bagging creates unique training subsets by sampling the training data with replacement, and the outcome is determined by a majority vote; Random Forest is an example.
• Boosting converts weak learners into strong learners by creating sequential models whose combination is the final model; AdaBoost and XGBoost are examples.
Here, the stacked ensemble approach is used. It is a supervised ensemble classification strategy that stacks many prediction algorithms to find the optimum combination: in stacking, also known as stacked generalization, a second-level model (the "meta-learner") is trained on the outputs of first-level models to find the best possible combination of the base learners. In contrast to bagging and boosting, stacking aims to bring together strong, varied learners. We completed our work in the following steps (a sketch of steps 3 and 4 is given after this list):
1. For this system, we import all of the necessary libraries.
2. After loading our dataset, we clean and preprocess it.
3. We use the z-score to identify and eliminate outliers.
4. We divide the data into two parts, training and testing, with an 80/20% split.
5. We develop each model using cross-validation.
6. For the stacked ensemble technique, we stack all of the models: RF, MLP, KNN, ETC, XGB, SVC, ADB, CART, and GBM.
7. We assess and compare our model against the other models.
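Continuing the sketch above, steps 3 and 4 could look as follows; the z-score cut-off of 3 is an assumption, since the paper does not state its threshold.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import train_test_split

X = df.drop(columns="target")
y = df["target"]

# Step 3: keep rows whose numeric features lie within 3 standard deviations
z = np.abs(stats.zscore(X.select_dtypes("number")))
mask = (z < 3).all(axis=1)
X, y = X[mask], y[mask]

# Step 4: 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```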
Fig. 2. Stacked Ensemble Method
Figure 2 depicts two levels: LEVEL 0 and LEVEL 1. First, we use the base learners (level 0) to make forecasts. The ensemble prediction is then generated by feeding those forecasts into the meta-learner (level 1).
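A minimal sketch of this two-level design using scikit-learn's StackingClassifier is given below; the logistic-regression meta-learner is an assumption, as the paper does not name its level-1 algorithm.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Level 0: the nine base learners listed in step 6
level0 = [
    ("rf", RandomForestClassifier()),
    ("mlp", MLPClassifier(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("etc", ExtraTreesClassifier()),
    ("xgb", XGBClassifier()),
    ("svc", SVC(probability=True)),
    ("adb", AdaBoostClassifier()),
    ("cart", DecisionTreeClassifier()),
    ("gbm", GradientBoostingClassifier()),
]

# Level 1: a meta-learner trained on the base learners' cross-validated forecasts
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```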
4 Result Analysis
This section presents the outcomes of the ten classifiers described above. Accuracy, PRC, sensitivity, specificity, F1 score, ROC, log loss, and MCC are the evaluation metrics used in this analysis. Precision measures what proportion of the predicted positives are genuinely positive, whereas recall measures how many of the genuinely positive cases are retrieved (Table 3).

Table 3. Results of the various models together with the proposed model.

| Model | Accuracy | PRC | Sensitivity | Specificity | F1 Score | ROC | Log_Loss | MCC |
|---|---|---|---|---|---|---|---|---|
| Stacked Classifier | 0.910638 | 0.898438 | 0.934959 | 0.883929 | 0.916335 | 0.909444 | 3.086488 | 0.821276 |
| RF | 0.893617 | 0.865672 | 0.943089 | 0.839286 | 0.902724 | 0.891188 | 3.674399 | 0.789339 |
| MLP | 0.821277 | 0.809160 | 0.861789 | 0.776786 | 0.834646 | 0.819287 | 6.172973 | 0.642127 |
| KNN | 0.800000 | 0.787879 | 0.845528 | 0.750000 | 0.815686 | 0.797764 | 6.907851 | 0.599458 |
| Extra Tree Classifier | 0.885106 | 0.869231 | 0.918699 | 0.848214 | 0.893281 | 0.883457 | 3.968343 | 0.770445 |
| XGB | 0.897872 | 0.896000 | 0.910569 | 0.883929 | 0.903226 | 0.897249 | 3.527409 | 0.795248 |
| SVC | 0.812766 | 0.788321 | 0.878049 | 0.741071 | 0.830769 | 0.809560 | 6.466933 | 0.627138 |
| AdaBoost | 0.817021 | 0.812500 | 0.845528 | 0.785714 | 0.828685 | 0.815621 | 6.319943 | 0.633084 |
| CART | 0.851064 | 0.879310 | 0.829268 | 0.875000 | 0.853556 | 0.852134 | 5.144121 | 0.703554 |
| GBM | 0.829787 | 0.826772 | 0.853659 | 0.803571 | 0.840000 | 0.828615 | 5.879016 | 0.658666 |
The Stacked Ensemble Classifier is the best performer, with an accuracy of 0.910, sensitivity of 0.934, specificity of 0.883, the best F1 score of 0.916, the minimum log loss of 3.08, and the highest ROC value of 0.909. Across the same evaluation metrics, Random Forest has the highest sensitivity, while XGBoost is the second-best model overall.
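All of the metrics in Table 3 can be derived from the confusion matrix. A sketch, reusing `stack`, `X_test`, and `y_test` from the snippets above:

```python
from sklearn.metrics import (confusion_matrix, log_loss,
                             matthews_corrcoef, roc_auc_score)

y_pred = stack.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

sensitivity = tp / (tp + fn)   # also known as recall
specificity = tn / (tn + fp)
mcc = matthews_corrcoef(y_test, y_pred)
# Log loss of hard 0/1 predictions: roughly 34.5 per misclassified sample,
# which matches the magnitudes reported in Table 3
ll = log_loss(y_test, y_pred)
roc = roc_auc_score(y_test, y_pred)
print(sensitivity, specificity, mcc, ll, roc)
```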
Fig. 3. Accuracy Chart of ML Models
Figure 3 gives a visual depiction of the effectiveness of all the machine learning techniques discussed above. The stacked classifier model's accuracy is 91.06%, with an F1 score of 0.9163. The accuracies of the XGB and RF algorithms, on the other hand, are 89.78% and 89.36%,
respectively, with F1 scores of 0.8972 and 0.8911. The accuracies of the Extra Tree Classifier, CART, GBM, MLP, SVC, and KNN algorithms are 88.51%, 85.10%, 82.97%, 82.12%, 81.27%, and 80.00%, respectively (Fig. 4).
Fig. 4. Confusion Matrices of the Stacked Classifier Model and ROC Curve.
The confusion matrix for the implemented system is generated as shown in the diagram above. It is a statistical summary that allows the results of an approach to be reproduced: it tabulates, for each class, how many instances were classified correctly and incorrectly, which is particularly useful for evaluating an ensemble learning approach.
Fig. 5. Heart Disease Identification.
Figure 5 depicts a visual representation of all detected cardiac problems: red indicates heart disease, whereas green indicates no cardiac disease.
5 Conclusion and Future Recommendation
Heart disease is among the most significant threats to human survival, and predicting cardiac illness has become a major concern and priority in the medical industry. Using the Stacked Ensemble Classifier, which incorporates a number of different prediction techniques, we have demonstrated an improved heart disease prediction method. In this work, we examined prediction performance in terms of accuracy, precision, ROC, sensitivity, specificity, F1 score, log loss, and MCC. We applied machine learning techniques to identify whether or not a person has a heart problem, using the medical data set in a variety of ways. The findings show that the enhanced stacked ensemble approach provides better accuracy than previous methods: these data mining algorithms can predict cardiac disease with a 91.06% accuracy rate. The purpose of this research was to investigate particular ML techniques on this form of data, and we further wanted to increase the dependability of the system's operation, to provide a more adequate assessment, and to encourage approaches for recognizing the appearance of CVD; the structure described above could be adapted and repurposed for new tasks. As our study is based on recorded data from the Statlog, Cleveland, and Hungary datasets, for future research we aim to train and test many ensemble methods on a large medical data set to see whether their performance can be enhanced further. Our ensemble method is superior to traditional methods: even if it overfits at times, it usually reduces variance as well as the bias of the modeling method, it has superior predictive performance with reduced dispersion, and it achieves superior efficiency by choosing the best combination of models.
References 1. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019) 2. Palaniappan, S., Awang, R.: Intelligent heart disease prediction system using data mining techniques. In: 2008 IEEE/ACS International Conference on Computer Systems and Applications, pp. 108–115. IEEE (2008) 3. Ramalingam, V.V., Dandapath, A., Raja, M.K.: Heart disease prediction using machine learning techniques: a survey. Int. J. Eng. Technol. 7(2.8), 684–687 (2018) 4. Bashir, S., Khan, Z.S., Khan, F.H., Anjum, A., Bashir, K.: Improving heart disease prediction using feature selection approaches. In: 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST), pp. 619–623. IEEE (2019) 5. Le, H.M., Tran, T.D., Van Tran, L.: Automatic heart disease prediction using feature selection and data mining technique. J. Comput. Sci. Cybern. 34(1), 33–48 (2018) 6. Yadav, D.C., Pal, S.: Prediction of heart disease using feature selection and random forest ensemble method. Int. J. Pharm. Res. 12(4), 56–66 (2020) 7. Kabir, P.B., Akter, S.: Emphasised research on heart disease divination applying tree based algorithms and feature selection. In: 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1–6. IEEE (2021) 8. Islam, M.T., Rafa, S.R., Kibria, M.G.: Early prediction of heart disease using PCA and hybrid genetic algorithm with k-means. In: 2020 23rd International Conference on Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2020)
9. Soni, J., Ansari, U., Sharma, D., Soni, S.: Intelligent and effective heart disease prediction system using weighted associative classifiers. Int. J. Comput. Sci. Eng. 3(6), 2385–2392 (2011) 10. Rahman, M.J.U., Sultan, R.I., Mahmud, F., Shawon, A., Khan, A.: Ensemble of multiple models for robust intelligent heart disease prediction system. In: 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (ICEEiCT), pp. 58–63. IEEE (2018) 11. Vinothini, S., Singh, I., Pradhan, S., Sharma, V.: Heart disease prediction. Int. J. Eng. Technol. 7(3.12), 753 (2018) 12. Patel, J., TejalUpadhyay, D., Patel, S.: Heart disease prediction using machine learning and data mining technique. Heart Dis. 7(1), 129–137 (2015) 13. Dinesh, K.G., Arumugaraj, K., Santhosh, K.D., Mareeswari, V.: Prediction of cardiovascular disease using machine learning algorithms. In: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), pp. 1–7. IEEE (2018) 14. Kunjir, A., Sawant, H., Shaikh, N.F.: Data mining and visualization for prediction of multiple diseases in healthcare. In: 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), pp. 329–334. IEEE (2017) 15. Babu, S., et al.: Heart disease diagnosis using data mining technique. In: 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 1, pp. 750–753. IEEE (2017) 16. Karthiga, A.S., Mary, M.S., Yogasini, M.: Early prediction of heart disease using decision tree algorithm. Int. J. Adv. Res. Basic Eng. Sci. Technol. 3(3), 1–16 (2017) 17. Repaka, A.N., Ravikanti, S.D., Franklin, R.G.: Design and implementing heart disease prediction using Naives Bayesian. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 292–297. IEEE (2019) 18. Sonawane, J.S., Patil, D.R.: Prediction of heart disease using learning vector quantization algorithm. In: 2014 Conference on IT in Business, Industry and Government (CSIBIG), pp. 1–5. IEEE (2014) 19. Amin, S.U., Agarwal, K., Beg, R.: Genetic neural network based data mining in prediction of heart disease using risk factors. In: 2013 IEEE Conference on Information & Communication Technologies, pp. 1227–1231. IEEE (2013) 20. Ul Haq, A., Li, J.P., Memon, M.H., Nazir, S., Sun, R.: A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inf. Syst. (2018) 21. Gavhane, A., Kokkula, G., Pandya, I., Devadkar, K.: Prediction of heart disease using machine learning. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1275–1278. IEEE (2018) 22. Shah, D., Patel, S., Bharti, S.K.: Heart disease prediction using machine learning techniques. SN Comput. Sci. 1(6), 1–6 (2020) 23. Singh, A., Kumar, R.: Heart disease prediction using machine learning algorithms. In: 2020 International Conference on Electrical and Electronics Engineering (ICE3), pp. 452–457. IEEE (2020) 24. Soni, J., Ansari, U., Sharma, D., Soni, S.: Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int. J. Comput. Appl. 17(8), 43–48 (2011) 25. Dangare, C.S., Apte, S.S.: Improved study of heart disease prediction system using data mining classification techniques. Int. J. Comput. Appl. 47(10), 44–48 (2012) 26. Anushya, D.A.: Genetic exploration for feature selection. Int. J. Comput. Sci. Eng. 7(2) (2019) 27. 
Chen, A.H., Huang, S.Y., Hong, P.S., Cheng, C.H., Lin, E.J.: HDPS: heart disease prediction system. In: 2011 Computing in Cardiology, pp. 557–560. IEEE (2011)
28. Bhatla, N., Jyoti, K.: An analysis of heart disease prediction using different data mining techniques. Int. J. Eng. 1(8), 1–4 (2012) 29. Stacking Ensemble Machine Learning with Python. https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/. Accessed 22 Feb 2022
Improved and Intelligent Heart Disease Prediction System Using Machine Learning Algorithm Nusrat Alam1(B) , Samiul Alam2
, Farzana Tasnim1
, and Sanjida Sharmin1
1 International Islamic University Chittagong, Chattogram, Bangladesh
[email protected] 2 East Delta University, Chittagong, Bangladesh
Abstract. Predicting heart disease needs great precision and correctness, because a small fault may pose a big danger to a patient. In the field of machine learning, there are many classification algorithms for predicting heart disease. This paper examines the probability of heart disease prediction using several machine learning classifiers whose input datasets are processed by feature engineering techniques. Feature engineering builds features from data using domain knowledge. A comparison is made between the supervised learning algorithms before and after feature engineering, and the algorithm with the best accuracy is identified. The performance of each algorithm is determined, and the algorithms are compared based on the precision of the calculation and the evaluation time. The proposed method uses the Cleveland dataset and another dataset consisting of four databases (Switzerland, Hungary, Cleveland, and Long Beach) downloaded from the Kaggle repository. The best accuracy for the Cleveland dataset, 86.89%, was obtained by the Ridge Classifier, while the other dataset yielded 100% accuracy for the Gradient Boosting, Bagging, and Gaussian Process classifiers. This research will help to predict heart disease at an early stage, which will reduce the death rate from heart disease. Keywords: Machine learning algorithms · Disease Prediction · Heart Disease prediction · Feature selection · Feature engineering technique
1 Introduction
Machine learning has become an important subject for any engineering department. It is very important for data analysis, classification, and prediction, and it is closely associated with Big Data, Data Science, and Artificial Intelligence [19]. At present, various ML techniques are also applied in ordinary web apps and mobile phones so that the applications become more intelligent and can acquire the ability to understand the human mind. The difference between a normal app and an ML-implemented app is that the normal app will always behave the same, but the ML-implemented
app will be unique: every time you use it, you will feel that the app is becoming more intelligent [20]. However, ML does not only give intelligence to apps; ML works for any kind of classification and prediction task, including diagnosis. Heart disease is a major health problem of our time that refers to a large number of medical conditions associated with the heart; these medical terms describe abnormal health issues that have a direct impact on the heart and each of its components. Over 5.8 million Americans have it, and there are over 23 million sufferers worldwide [9]. Coronary artery disease, otherwise known as coronary heart disease, is the most commonly recognized form of heart disease. It develops once the arteries that supply blood to the heart become obstructed with plaque, which makes them harden and narrow. Plaque contains cholesterol and other substances; consequently, the blood supply lessens, and the heart gets less oxygen and fewer nutrients. As a result, the heart muscle weakens, and there is a risk of heart failure and arrhythmias. When plaque builds up inside the arteries, this is known as atherosclerosis [12, 13]. The recognition of the five common heart attack warning signs (neck, jaw, or back pain [8]; weakness or dizziness; chest discomfort; arm or shoulder pain; and shortness of breath) is greater in females than in males (54.4% versus 45.6%) and in whites (54.8%) than in blacks (43.1%) and Asians (33.5%). Between 1999 to 2000 and 2015 to 2016 [8], the prevalence of ideal levels of many cardiovascular health components improved for US children (12–19 years old), including nonsmoking, total cholesterol, and blood pressure. While no remarkable changes were seen in the prevalence of an ideal healthy-diet score among children over this period, the prevalence of ideal levels of body weight, physical activity, and diabetes mellitus (DM) declined. Data from the Society of Thoracic Surgeons Database reflect that a total of 122,459 congenital heart operations were performed from July 2014 to June 2018, and in 2018, 3,408 heart transplantations were performed in the United States, the most ever [8]. In a cohort of 58,671 parous females participating in the Nurses' Health Study II without hypertension at baseline, gestational hypertension and preeclampsia during first pregnancy were associated with a higher rate of self-reported physician-diagnosed chronic hypertension over a 25- to 32-year follow-up (HR, 2.8 [95% CI, 2.6–3.0] for gestational hypertension and HR, 2.2 [95% CI, 2.1–2.3] for preeclampsia) [8]. The proposed research work is based on machine learning, which is a branch of Artificial Intelligence (AI). In previous research, the accuracy of different machine learning algorithms was calculated, and on the basis of that calculation it was concluded which one was the best among them. There were some research gaps: a limited set of machine learning algorithms was used for prediction, and a limited set of datasets was used for measuring accuracy. There is also a need to improve feature engineering and to find the machine learning approaches that give the best analysis of heart diseases and the best prediction accuracy.
This research focuses on the development of new techniques to achieve the best accuracy of heart disease prediction: the feature engineering technique is applied to various efficient machine learning algorithms, and a comparison is made between the accuracy before and after feature engineering when calculating the heart disease prediction accuracy on the mentioned datasets [6, 7]. It also makes
a comparison with various previous research works and demonstrates the best prediction rate for every model in the final summary.
2 Related Work
In the research work by Archana et al. [1], the machine learning algorithms SVM, linear regression, decision tree, and K-nearest neighbor were used to predict cardiac disease. Jupyter Notebook was employed as the simulation tool because it is simple to use for Python programming projects. Taking into account the confusion matrix, which is based on the true positive, false positive, true negative, and false negative values, they found that among the evaluated algorithms KNN provided the highest accuracy, at 87%, and concluded which algorithm was best. The article by Nidhi et al. [3] offers a study of the various data mining techniques that can be applied in automated heart disease prediction systems. A patient will undergo fewer tests as a result of this automation, so it will not only save expenses but also save analysts' and patients' time [3]. Based on a combination of variables, such as the risk factors defining the disease, Raghunath et al. [4] developed and employed suitable machine learning algorithms that are computationally efficient as well as accurate for predicting the occurrence of heart disease. Using the Cleveland dataset from UCI, which consists of 303 records and 14 features, Kumar et al. [5] applied various machine learning techniques and algorithms; after analyzing the results, they discovered that SVM provided the best accuracy, followed by Naive Bayes, KNN, and decision tree. The Correlation-based Feature Selection (CFS) technique was put forward by Gazeloglu et al. [10] with a classifier that achieved the best accuracy of 84.81%; additionally, they used the RBF network and chi-square to get an accuracy of 81.1%. The Relief feature selection method (RFRS) and a C4.5 ensemble classifier were proposed by Liu et al. [11] using the Statlog dataset, achieving the greatest accuracy of 92.5%. PCA and chi-square feature selection methods were used by Farzana et al. [14] to reduce the number of features on the Cleveland dataset, where RF with PCA gave the highest accuracy of 92.85%. Marappan, R. et al. [16] proposed a method to extract hidden models in a dataset using machine learning (ML) techniques and analyzed the accuracy of numerous ML algorithms to discover the best prediction of heart disease; Random Forest achieved the highest accuracy of 90.16%. Riyaz, L. et al. [17] reviewed various ML techniques used for predicting the incidence of coronary heart disease; according to their results, the highest average prediction accuracy, 86.90%, was achieved by ANN, while the C4.5 DT method came up with the lowest prediction accuracy of 74.0%.
3 Methodology
This work predicts the probability of heart disease with machine learning algorithms. Two heart disease datasets with the same attributes were collected, and a preprocessing technique was applied to remove unwanted occurrences from the data. The proposed approach is divided into two methods:
• First method: no feature engineering is applied to the two datasets, and 15 machine learning algorithms are run on the preprocessed datasets, after which the performance of the algorithms is measured.
• Second method: the feature engineering method is applied to both preprocessed datasets. Feature engineering is a process that extracts features using domain knowledge so that they are better understood by machine learning algorithms. The same 15 machine learning algorithms are applied and the performance of each algorithm is analyzed.
Finally, the performance of the two methods, before and after feature engineering, is compared. Figure 1 shows the experimental workflow of this work.
Fig. 1. Experimental workflow of this research
3.1 Dataset Collection
For this research, two open-source datasets were collected. The Cleveland dataset [6], collected from the UCI repository, contains 303 patient records, and another dataset consisting of four databases (Switzerland, Cleveland, Hungary, and Long Beach V) [7], collected from the Kaggle repository, has 1025 patient records. Both datasets have the same 14 attributes, which are described in Table 1.
Table 1. Description of the attributes
3.2 EDA (Exploratory Data Analysis) & F.E (Feature Engineering)
Feature engineering is a method for taking vague raw data and turning it into features that a model can better understand and use to make decisions. Domain expertise about the data is required in order to build such features. By creating features from raw data and preparing them for the machine learning process, feature engineering increases the predictive power of machine learning algorithms. Machine learning algorithms take data as input and provide accurate, informative results; the features, typically represented by organized columns, are extracted from the incoming data [15]. Through feature engineering, the dataset becomes more practical and intelligible, allowing the machine to predict more accurately, so feature engineering is necessary to achieve good results from machine learning algorithms. Basically, the created features are what is fed to the model, so that the model can understand the data better. Feature engineering can arise out of domain understanding; it can come from the data analysis or exploratory data analysis phase, when some insight is converted into a feature; and sometimes features can also come from an external data provider. Feature engineering aims to prepare an input dataset that is compatible with the ML algorithm's requirements and to improve the performance of ML models. Feature engineering techniques include: 1) Imputation 2) Categorical Encoding 3) Binning 4) Scaling 5) Feature Selection 6) Handling Outliers.
• Categorical Encoding: For a good prediction from a dataset, categorical features need to be preprocessed intelligently. Categorical encoding is a feature engineering technique in which categorical variables are extracted from the dataset and marked with their categorical type. There are many types of categorical encoding, such as Label Encoding, One-Hot Encoding, Count Encoding, Target Encoding, and Leave-One-Out Target Encoding. In the proposed work, label encoding was applied, which converts each categorical value into a number.
For example, a 'Fruits' feature containing 3 categories could be encoded by assigning 0 to Mango, 1 to Apple, and 2 to Banana; the sex categories FEMALE and MALE can be encoded with the values 0 and 1.
• Scaling: To make better predictions, the datasets need further preprocessing. Scaling is a technique in which data are rescaled by some mathematical transformation, such as Standard Scaling, Min-Max Scaling (normalization), or Quantile Transformation. In this research work, Standard Scaling and Min-Max Scaling were applied.
• Feature engineering: Both datasets have 14 attributes, of which the last is the label of the dataset. The first 13 attributes go through the feature engineering method, which increases the number of attributes in the dataset. Table 2 shows the attributes of the dataset before feature engineering, where the 13 patient attributes have integer and categorical values; a brief encoding and scaling sketch is given below.
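A brief sketch of these two steps with scikit-learn, assuming the dataset is loaded in a DataFrame `df` whose column names follow the dataset description (an illustration, not the authors' code):

```python
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Label encoding: e.g. sex FEMALE/MALE -> 0/1
df["sex"] = LabelEncoder().fit_transform(df["sex"])

# Standard scaling of the numeric columns; min-max scaling is the alternative
num_cols = ["age", "trestbps", "chol", "thalach", "oldpeak"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
# df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```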
Table 2. The 13 attributes before feature engineering

| # | Attribute | # | Attribute | # | Attribute |
|---|---|---|---|---|---|
| 1 | age | 6 | fbs | 11 | slope |
| 2 | sex | 7 | restecg | 12 | ca |
| 3 | cp | 8 | thalach | 13 | thal |
| 4 | trestbps | 9 | exang | | |
| 5 | chol | 10 | oldpeak | | |
Table 3 shows the attributes after applying the feature engineering method, which generates 47 additional patient attributes. The feature creation method determines the categorical features and applies categorical encoding (label encoding) to them. The previous 13 attributes and the newly generated 47 attributes are merged to create a new dataset of 60 attributes.
3.3 Preparing to Model
Here, for predicting heart disease, both datasets are split into 80% training data and 20% testing data. Models are then created from the training data by applying different machine learning algorithms: Linear Regression, Decision Tree Classifier, Support Vector Machines, Logistic Regression, k-Nearest Neighbors (KNN), Neural Network (NN), Gaussian Process Classification, Naive Bayes, and so on. The labels of the test data are then predicted, and the accuracy of each algorithm is evaluated on both datasets. (A sketch of how the derived features could be generated appears after Table 3.)
Table 3. The 47 new attributes after feature engineering

| # | Attribute | # | Attribute | # | Attribute |
|---|---|---|---|---|---|
| 1 | age2 | 17 | age2_oldpeak2 | 33 | restecg_ca |
| 2 | trestbps2 | 18 | age2_slope | 34 | exang_cp |
| 3 | chol2 | 19 | age2_ca | 35 | exang_trestbps2 |
| 4 | thalch2 | 20 | fbs_cp | 36 | exang_chol2 |
| 5 | oldpeak2 | 21 | fbs_trestbps2 | 37 | exang_thalach2 |
| 6 | sex_cp | 22 | fbs_chol2 | 38 | exang_oldpeak2 |
| 7 | sex_trestbps2 | 23 | fbs_thalach2 | 39 | exang_slope |
| 8 | sex_chol2 | 24 | fbs_oldpeak2 | 40 | exang_ca |
| 9 | sex_thalach2 | 25 | fbs_slope | 41 | thal_cp |
| 10 | sex_oldpeak2 | 26 | fbs_ca | 42 | thal_trestbps2 |
| 11 | sex_slope | 27 | restecg_cp | 43 | thal_chol2 |
| 12 | sex_ca | 28 | restecg_trestbps2 | 44 | thal_thalach2 |
| 13 | age2_cp | 29 | restecg_chol2 | 45 | thal_oldpeak2 |
| 14 | age2_trestbps2 | 30 | restecg_thalach2 | 46 | thal_slope |
| 15 | age2_chol2 | 31 | restecg_oldpeak2 | 47 | thal_ca |
| 16 | age2_thalach2 | 32 | restecg_slope | | |
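The names in Table 3 follow a clear pattern: squared copies of the five numeric features, plus combinations of six base features with seven others (6 × 7 = 42, for 47 in total). A plausible sketch of this construction is shown below; realizing the combined features as simple products is an assumption, since the paper does not spell out the arithmetic.

```python
# Assumes the 13 original attributes are loaded in a DataFrame df
num_cols = ["age", "trestbps", "chol", "thalach", "oldpeak"]

# Squared copies: age2, trestbps2, chol2, thalach2, oldpeak2
for c in num_cols:
    df[c + "2"] = df[c] ** 2

# Pairwise combinations such as sex_cp, fbs_chol2, thal_ca
prefixes = ["sex", "age2", "fbs", "restecg", "exang", "thal"]
suffixes = ["cp", "trestbps2", "chol2", "thalach2", "oldpeak2", "slope", "ca"]
for p in prefixes:
    for s in suffixes:
        df[f"{p}_{s}"] = df[p] * df[s]  # product form is assumed
```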
4 Results and Discussion
This work predicts heart disease with machine learning algorithms, using the two collected datasets: the Cleveland dataset [6] and the combined (Switzerland, Hungary, Cleveland, and Long Beach) dataset [7]. The proposed approach is divided into two methods: in the first, feature engineering is not applied to the two datasets, and in the second, the feature engineering method is applied. The datasets are then divided into training and test sets, the machine learning algorithms are applied, and the performance of the different algorithms is analyzed.
Comparing Accuracy Before and After Feature Engineering for the Cleveland Dataset [6]
Table 4 shows the comparison before and after feature engineering on the Cleveland dataset [6] of 303 instances. The machine learning techniques were applied, and better accuracy was achieved after applying the feature engineering techniques: Support Vector Machines gave the best accuracy, 88.52%, a better result than the other machine learning techniques in this model. The equations of the evaluation parameters are given after Table 4 (see also Fig. 2), where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.
Table 4. Accuracy before and after applying Feature Engineering [6]

| Algorithms | Accuracy before Feature Engineering | Accuracy after Feature Engineering |
|---|---|---|
| Support Vector Machines | 77.05% | 88.52% |
| Extra Trees Classifier | 80.33% | 85.25% |
| Ridge Classifier | 77.05% | 86.89% |
| Linear SVC | 78.69% | 86.89% |
| Logistic Regression | 77.05% | 86.89% |
| KNN | 77.05% | 83.61% |
| Gradient Boosting Classifier | 78.69% | 81.97% |
Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

F1score = \frac{2 \times (Recall \times Precision)}{Recall + Precision}
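The four formulas translate directly into code; a small helper function (hypothetical, for illustration):

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int):
    """Evaluate the four evaluation-parameter formulas above."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (recall * precision) / (recall + precision)
    return accuracy, precision, recall, f1

# Illustrative counts only, not taken from the paper
print(metrics_from_counts(tp=50, tn=48, fp=7, fn=5))
```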
Fig. 2. Precision, Recall, F1-score before and after FE on Cleveland dataset.
Comparing Accuracy Before and After Applying Feature Engineering for Dataset [7]
Tables 5 and 6 reflect the before-and-after feature engineering comparison for the dataset consisting of the four databases Switzerland, Hungary, Cleveland, and Long Beach V [7], with 1025 instances, following the same procedure as for dataset [6]. The Gradient Boosting Classifier, Gaussian Process Classifier, and Bagging Classifier gave a better accuracy, 100%, than the other algorithms.

Table 5. Comparison of accuracy before and after applying Feature Engineering [7]

| Algorithms | Accuracy before Feature Engineering | Accuracy after Feature Engineering |
|---|---|---|
| Support Vector Machines | 87.80% | 90.24% |
| KNN | 91.71% | 99.02% |
| LGBM Classifier | 95.12% | 97.56% |
| Decision Tree Classifier | 94.15% | 98.54% |
| Gaussian Process Classifier | 97.07% | 100% |
| Bagging Classifier | 95.12% | 100% |
Table 6. Evaluation metrics before and after applying feature engineering to dataset [7]

| Algorithm Name | Precision (Before FE) | Precision (After FE) | Recall (Before FE) | Recall (After FE) | F1-score (Before FE) | F1-score (After FE) |
|---|---|---|---|---|---|---|
| SVM | 0.89 | 0.91 | 0.88 | 0.90 | 0.88 | 0.90 |
| KNN | 0.92 | 0.99 | 0.92 | 0.99 | 0.92 | 0.99 |
| LGBM Classifier | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 |
| DT | 0.94 | 0.99 | 0.94 | 0.99 | 0.94 | 0.99 |
| Gaussian Process Classifier | 0.97 | 1.00 | 0.97 | 1.00 | 0.97 | 1.00 |
| Bagging Classifier | 0.95 | 1.00 | 0.95 | 1.00 | 0.94 | 1.00 |
4.1 Comparing This Research Work with Some Previous Research
Table 7 shows the comparison between the proposed methods and previous research works, where the proposed method gives better results than the others.
Table 7. Comparing the accuracy of heart disease prediction with different algorithms from different research

| Author | Dataset | Feature selection technique | Algorithms | Highest Accuracy |
|---|---|---|---|---|
| Archana [1] | Cleveland Dataset | No feature selection technique | SVM, DT, LR, KNN | 87% |
| D. Raghunath et al. [4] | Cleveland dataset with 500 samples | No feature selection technique | KNN, DT, LR, NB, SVM | 84.76% |
| X. Liu et al. [11] | Statlog dataset | RFRS | C4.5 | 92.5% |
| C. Gazeloglu et al. [10] | Cleveland dataset | CFS & CHI square | NB, RBF classifier | 84.81% |
| Marappan, R. et al. [16] | Cleveland Dataset and Hungarian Dataset | No feature selection | RF, LR, NB, SVM | 90.165% |
| AJN (proposed) | Cleveland dataset [6]; Cleveland, Hungary, Switzerland, Long Beach V [7] | Feature Engineering | Dataset [6]: SVM; Dataset [7]: Gradient Boosting, Bagging Classifier | 89%; 100%; 100%; 100% |
5 Conclusion
This research provides an analysis of various machine learning techniques that will be useful to medical analysts and healthcare practitioners in making an accurate diagnosis of heart disease. The experimental analysis shows that heart disease can be predicted by applying feature engineering techniques together with a machine learning algorithm. Here, the feature engineering technique increased the total number of attributes of the dataset, which enhanced the performance of the machine learning algorithms. The proposed method applied fifteen machine learning classification algorithms: for dataset [6], the best accuracy of 88.52% was gained by SVM, and for dataset [7], the best accuracy of 100% was gained by the Bagging Classifier and the Gaussian Process Classifier. In future work, we propose to gather local clinic datasets; the attributes of the dataset can similarly be changed and the proposed model applied. The dataset is displayed with its correlation values, so suitable attributes can be selected for predicting heart disease.
References 1. Chudhey, A.S., Sharma, A., Singh, M.: Heart disease prediction using various machine learning algorithms. In: Mahapatra, R.P., Peddoju, S.K., Roy, S., Parwekar, P., Goel, L. (eds.) ICRTC 2021. LNNS, vol. 341, pp. 325–335. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7118-0_28
108
N. Alam et al.
2. Bhoyar, S., Wagholikar, N., Bakshi, K., Chaudhari, S.: Real-time heart disease prediction system using multilayer perceptron. In: 2021 2nd International Conference for Emerging Technology (INCET), pp. 1–4. IEEE (2021) 3. Bhatla, N., Jyoti, K.: An analysis of heart disease prediction using different data mining techniques. Int. J. Eng. 1(8), 1–4 (2012) 4. Raghunath, D., Usha, C., Veera, K., Manoj, V.: Predicting heart disease using machine learning techniques. Int. Res. J. Comput. Sci. 149–153 (2019) 5. Rajesh, N., Maneesha, T., Hafeez, S., Krishna, H.: Prediction of heart disease using machine learning algorithms. Int. J. Eng. Technol. (UAE) 7(2.32 Special Issue 32), 363–366 (2018) 6. “HeartDiseaseDataset.” https://www.kaggle.com/johnsmith88/heart-disease-dataset 7. Silva, F.S., et al.: Hyperbaric oxygen therapy mitigates left ventricular remodeling, upregulates MMP-2 and VEGF, and inhibits the induction of MMP-9, TGF-β1, and TNF-α in streptozotocin-induced diabetic rat heart. Life Sci. 295, 120393 (2022) 8. Gazeloglu, C.: Prediction of heart disease by classifying with feature selection and machine learning methods. Prog. Nutr. 22(2), 660–670 (2020) 9. Liu, X., et al.: A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput. Math. Methods Med. 2017 (2017) 10. Nguyen, T.N.A., Bouzerdoum, A., Phung, S.L.: A scalable hierarchical Gaussian process classifier. IEEE Trans. Signal Process. 67(11), 3042–3057 (2019) 11. Patel, J., TejalUpadhyay, D., Patel, S.: Heart disease prediction using machine learning and data mining technique. Heart Dis. 7(1), 129–137 (2015) 12. Tasnim, F., Habiba, S.U.: A comparative study on heart disease prediction using data mining techniques and feature selection. In: 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 338–341. IEEE (2021) 13. Almustafa, K.M.: Prediction of heart disease and classifiers’ sensitivity analysis. BMC Bioinform. 21(1), 1–18 (2020) 14. Srivastava, K., Choubey, D.K.: Heart disease prediction using machine learning and data mining. Int. J. Recent Technol. Eng. 9(1), 212–219 (2020) 15. Essinger, S.D., Rosen, G.L.: An introduction to machine learning for students in secondary education. In: 2011 Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), pp. 243–248. IEEE (2011) 16. Marappan, R.: Heart disease prediction analysis using machine learning algorithms. J. Appl. Math. Comput. 6(3), 273–281 (2022). https://doi.org/10.26855/jamc.2022.09.001 17. Riyaz, L., Butt, M.A., Zaman, M., Ayob, O.: Heart disease prediction using machine learning techniques: a quantitative review. In: Khanna, A., Gupta, D., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds.) International Conference on Innovative Computing and Communications. AISC, vol. 1394, pp. 81–94. Springer, Singapore (2022). https://doi.org/ 10.1007/978-981-16-3071-2_8 18. Hossen, M.K.: Heart disease prediction using machine learning techniques. Am. J. Comput. Sci. Technol. 5(3), 146–154 (2022) 19. Mahmud, M., Kaiser, M.S., Hussain, A., Vassanelli, S.: Applications of deep learning and reinforcement learning to biological data. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2063–2079 (2018). https://doi.org/10.1109/TNNLS.2018.2790388 20. Mahmud, M., Kaiser, M.S., McGinnity, T.M., Hussain, A.: Deep learning in mining biological data. Cogn. Comput. 13(1), 1–33 (2020). https://doi.org/10.1007/s12559-020-09773-x
PreCKD ML: Machine Learning Based Development of Prediction Model for Chronic Kidney Disease and Identify Significant Risk Factors Md. Rajib Mia1 , Md. Ashikur Rahman1 , Md. Mamun Ali1 , Kawsar Ahmed2,3(B) , Francis M. Bui3 , and S M Hasan Mahmud4 1
Department of Software Engineering (SWE), Daffodil International University (DIU), Daffodil Smart City, Ashulia, Savar, Dhaka 1341, Bangladesh {rajib.swe,ashikur35-562,mamun35-274}@diu.edu.bd 2 Group of Biophotomatix, Department of ICT, Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail 1902, Bangladesh [email protected], [email protected], [email protected] 3 Department of Electrical and Computer Engineering (ECE), University of Saskatchewan (USASK), 57 Campus Drive, Saskatoon, SK S7N 5A9, Canada {k.ahmed,francis.bui}@usask.ca 4 Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh [email protected]
Abstract. Chronic Kidney Disease (CKD) has been a major cause of death in recent years, but it can be managed through early treatment and proper supervision. Early detection of CKD and the exact risk factors must be known to ensure proper treatment. This study aims to address the issue by building a predictive model and discovering the most significant risk factors for CKD patients, employing a machine learning (ML) approach. Four individual machine learning classifiers were applied to conduct this study. It was found that GB performed very poorly compared to the other applied classifiers, whereas RF and LightGBM outperformed the rest with 99.167% accuracy. In terms of risk factors, sg, hemo, sc, pcv, al, rbcc, htn, dm, bgr, and sod were found to be the most significant factors, which are mainly correlated with CKD. The findings indicate that this work will enable patients, doctors, and clinicians to identify CKD patients early and ensure proper treatment for them.

Keywords: CKD · Feature Importance · Hemoglobin · Random Forest · Specific Gravity
1 Introduction
The kidney is one of the major organs in the human body. The kidneys are two bean-shaped organs that work together. Their primary function is to filter the blood and discharge waste, as well as to balance the fluid in the human body through the excretion of excess fluid in the urine. Science says that healthy kidneys filter between 120 and 150 quarts of blood and make between 1 and 2 quarts of urine every day. The creation of urine comprises a series of very complex excretion and re-absorption processes. This process, which regulates potassium and acids and generates hormones essential for the functionality of other organs, is very much needed for keeping a static balance of the body's salt. CKD is one of the fatal diseases that is causing concern throughout the world at the present moment. Besides, it also increases the risk of other conditions such as heart disease and heart failure, strokes, and early death [2]. CKD is a condition in which the kidneys' functionality declines progressively. About 850 million people around the world were suffering from kidney disease in 2020 alone, and the number is increasing every day; CKD affects 10.4 percent of men and 11.8 percent of women in this group [1]. The measured or estimated glomerular filtration rate (EGFR), which is determined by the creatinine level, gender, and age, is used to assess the stages of CKD. A large sum of money is required for the treatment of CKD patients: every year, countries all over the world spend a vast amount of their healthcare budgets on CKD patients, so spending such amounts on CKD treatment has been a big concern for developing countries like India, Bangladesh, and so on. Therefore, it is necessary to raise awareness among people about the risks and symptoms of CKD. Several studies have attempted to predict CKD using various ML algorithms in the last decade. Ramesh et al. (2019) performed data mining (DM) with Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) classifiers to predict CKD; the RF classifier gave an accuracy of 99.16%. Though this study used three classifier algorithms, it only focused on the accuracy of the classifiers, but accuracy is not the only performance metric that makes a model good [3]. Yashfi et al. (2020) used an Artificial Neural Network (ANN) and the RF algorithm to identify CKD, performed a chi-square test for feature selection, and applied 10-fold cross-validation on the dataset. However, only two algorithms were utilized to predict CKD in this investigation, and only the chi-square test was used to identify the characteristics [4]. Ward et al. (2019) applied four Machine Learning (ML) algorithms, Logistic Regression (LR), SVM, RF, and Gradient Boosting (GB); among them, GB gave the highest accuracy, 99.7% on training data and 99.0% on testing data [5]. Pankaj et al. (2021) trained models using ML classifiers such as ANN, C5.0, Logistic Regression (LR), linear SVM, K-Nearest Neighbors (KNN), and Random Tree (RT); they also utilised a Deep Neural Network (DNN), which has a 99.6% accuracy rate [6]. Jiongming et al. (2019) performed various ML algorithms to analyze the different features that are important for predicting CKD; among those algorithms, KNN had 99.25% accuracy [7].
Hasan et al. (2019) performed a study with five ML algorithms: Adaptive Boosting (AB), Bootstrap Aggregating (BA), Extra Trees (ET), GB, and the RF classifier. With 99 percent accuracy, AB was the most accurate [8], though this study only focused on feature importance techniques. Celik et al. (2016) proposed a model based on SVM and DT classifier algorithms and got an accuracy of 100% for DT Test 1 but 91.6667% for DT Test 2; here, only two algorithms were used to predict CKD [9]. Surya et al. (2021) proposed a model based on Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory (BLSTM), obtaining an AUROC of 0.957 for 6-month predictions and 0.954 for 12-month predictions, although the dataset used in this study is not a clinical dataset, containing only age, sex, comorbidities, and medications [10]. Another CKD study was proposed by Almansour et al. (2019) based on ANN and SVM approaches, where ANN achieved the better performance of 99.75% accuracy; though the performance was good, the study only used two approaches for predicting CKD [11]. Furthermore, Radha and Ramya (2015) carried out a study with Naïve Bayes (NB), DT, KNN, and SVM classifier algorithms to identify CKD and got the highest accuracy of 98% for KNN, whereas NB gave poor accuracy, and no significant characteristics were shown by this study [12]. In addition, Chiu et al. proposed an ANN-based model with an accuracy of 94.75%; however, this research did not identify any features that were significant for making predictions [13]. According to the above discussion, there is still scope to improve the existing methods. From this perspective, this study concentrates on the characteristics used to predict CKD and also on which features are important for predicting positive CKD and which are impactful for predicting negative CKD. Various Machine Learning (ML) algorithms have been used for performing predictions; nowadays, ML is very useful for predicting many kinds of disease in the medical field. The primary goal of this study is to analyze the features, predict which features are impactful in predicting CKD, and select the best model among different ML algorithms by analyzing the statistical results. Our contributions are as follows:
✦ Having collected open-source data from the Kaggle online repository, the data were preprocessed to fit the ML classifiers.
✦ Four of the most widely used individual ML algorithms were then applied to find the best-fit ML classifier for the expected predictive model.
✦ After comparing the performances of all the applied classifiers, the best-fit classifier was selected.
✦ SHAP values were calculated to find the impact of each feature on CKD.
✦ The risk factors of CKD were analysed and discussed.
Fig. 1. Experimental methodology
2 Materials and Methods

2.1 Data
The dataset used in this study came from the UCI ML Repository [14]. There are 400 instances in all, 250 of which are CKD and the remaining 150 not CKD. The dataset has 25 features in total; 24 are labeled features and 1 is the target feature. The target feature is categorized as ckd (stated as "CKD patient") and notckd (stated as "Not CKD patient"). Furthermore, there are basically two types of data, numeric and nominal: 14 features are nominal and the remaining 11 are numeric. Table 1 contains detailed information on the dataset, including the name, type, and interpretation of every feature.
2.2 Data Preprocessing
Every ML or DM approach requires data preprocessing, because the efficiency of an ML approach depends on it. The data was preprocessed using Weka version 3.8.6 as a DM tool; Python version 3.7.12 was used for Exploratory Data Analysis (EDA) visualisation and model building. To begin, a ReplaceMissingValues filter was used to handle missing data. Secondly, we performed encoding to convert string data to numeric format, using label encoding for the conversion. Label encoding is the process of converting labels into numbers so that machines can read them; ML techniques can then decide how these labels should be handled. It is a very important step in the preparation of classifiers for structured datasets [26]. Some EDA with visualization was then performed on the processed dataset. The working methodology of this study is shown in Fig. 1.
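As an illustration of this step, the sketch below shows how missing-value replacement and label encoding might look in Python with pandas and scikit-learn. The file name is hypothetical, and the imputation rule (mean for numeric, mode for nominal) mirrors the behaviour of Weka's ReplaceMissingValues filter rather than reproducing the authors' exact pipeline.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical load of the CKD dataset; column names follow Table 1.
df = pd.read_csv("chronic_kidney_disease.csv")

# Replace missing values: mean for numeric columns, mode for nominal ones,
# mirroring Weka's ReplaceMissingValues filter.
for col in df.columns:
    if df[col].dtype.kind in "if":
        df[col] = df[col].fillna(df[col].mean())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# Label-encode every nominal column, including the target (ckd/notckd).
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])
```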
Table 1. Data and feature interpretation
Features | Type    | Interpretation          | Unit/Attribute Values
age      | numeric | Patient age             | Years
bp       | numeric | Blood Pressure          | mm/Hg
sg       | nominal | Specific Gravity        | 1.005, 1.010, 1.015, 1.020, 1.025
al       | nominal | Albumin                 | 0, 1, 2, 3, 4, 5
su       | nominal | Sugar                   | 0, 1, 2, 3, 4, 5
rbc      | nominal | Red Blood Cell          | normal, abnormal
pc       | nominal | Pus Cell                | normal, abnormal
pcc      | nominal | Pus Cell clumps         | present, not present
ba       | nominal | Bacteria                | present, not present
bgr      | numeric | Blood Glucose Random    | mgs/dl
bu       | numeric | Blood Urea              | mgs/dl
sc       | numeric | Serum Creatinine        | mgs/dl
sod      | numeric | Sodium                  | mEq/l
pot      | numeric | Potassium               | mEq/l
hemo     | numeric | Hemoglobin              | gms
pcv      | numeric | Packed Cell Volume      | –
wc       | numeric | White Blood Cell Count  | cells/cumm
rc       | numeric | Red Blood Cell Count    | millions/cmm
htn      | nominal | Hypertension            | yes, no
dm       | nominal | Diabetes Mellitus       | yes, no
cad      | nominal | Coronary Artery Disease | yes, no
appet    | nominal | Appetite                | good, poor
pe       | nominal | Pedal Edema             | yes, no
ane      | nominal | Anemia                  | yes, no
class    | nominal | Target Class            | ckd, notckd

2.3 Performance Evaluation Metrics
Four different classification techniques were applied to the dataset in order to determine which technique performed best when accuracy and other statistical metrics were compared using a train-test split. RF, GB, XGBoost (XGB) and the Light Gradient Boosting Machine (LightGBM) were used in this study, and these algorithms were compared on the basis of their performance evaluation metrics. This part provides an overview of the various performance assessments. The confusion matrix was used to determine the sensitivity (Sn), specificity (Sp), and accuracy (Acc) of each algorithm's outcome. All parameters were calculated using the formulas shown below [20,23].

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$Sn = \frac{TP}{TP + FN} \quad (2)$$

$$Sp = \frac{TN}{TN + FP} \quad (3)$$

In classification ML problems, accuracy refers to the number of correct predictions made by the model over all predictions; it is a suitable metric when the target variable classes in the data are roughly balanced [20]. Sensitivity is the percentage of true positive instances that were predicted to be positive [23], while specificity refers to the fraction of real negatives that were predicted as negatives [24]. Several further statistical measures, such as the kappa statistic (Kp), recall (Rc), precision (Pr) and F1-measure (F1), were also used to evaluate the performance of the algorithms. Recall refers to the quantity of positives returned by the ML algorithm [21]. Precision is the number of true positives divided by the total number of predicted positives [22]. The F1 score is the harmonic mean of precision and recall [21]. The Kappa statistic is a measure used to compare observed and expected accuracy [25].

$$Rc = \frac{TP}{TP + FN} \quad (4)$$

$$Pr = \frac{TP}{TP + FP} \quad (5)$$

$$F1 = \frac{2 \times Pr \times Rc}{Pr + Rc} \quad (6)$$

$$Kp = \frac{\text{observed accuracy} - \text{expected accuracy}}{1 - \text{expected accuracy}} \quad (7)$$
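To make Eqs. (1)-(7) concrete, the following sketch computes these metrics from a binary confusion matrix with scikit-learn. It is a minimal illustration under the assumption of binary labels, not the authors' evaluation code.

```python
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    # For a binary problem, .ravel() unpacks the matrix as TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    sn = tp / (tp + fn)                     # Eq. (2); recall, Eq. (4), is identical
    sp = tn / (tn + fp)                     # Eq. (3)
    pr = tp / (tp + fp)                     # Eq. (5)
    f1 = 2 * pr * sn / (pr + sn)            # Eq. (6)
    kp = cohen_kappa_score(y_true, y_pred)  # Eq. (7)
    return acc, sn, sp, pr, f1, kp
```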
2.4 Machine Learning Approaches
In this study, four ML classifiers, RF, GB, LightGBM, and XGBoost, were used. These are explained in the next section.

Random Forest: RF is a supervised learning approach. It builds a "forest" from a group of decision trees, most of which are trained using the "bagging" method; the basic idea behind bagging is that combining different learners makes the final result better [15]. This supervised learning approach predicts the result based on a voting technique: if a majority of the trees in the forest predict 1, then the RF gives 1 as the final prediction, and vice versa [16]. RF is also a meta-estimator that fits DT classifiers on repeated resamples of the dataset and then uses averaging to improve prediction accuracy and avoid overfitting. When bootstrap=True, the size of the resamples is determined by the max_samples argument; otherwise,
the entire dataset is used to create each tree [27]. For this research, the best-fit n_estimators value was 50 and max_depth was 4, which provided the best performance on the dataset used.

Gradient Boosting: GB builds additive regression models by repeatedly fitting a simple parameterized function (the base learner) to the "pseudo"-residuals by least squares [17]. GB classifiers are a family of ML techniques that combine many weak learning models to form a strong prediction model; DTs are often used as the weak learners. GB models are becoming more popular because they are good at classifying big, complex datasets [28]. Additionally, the GB algorithm employs a sequential ensemble learning technique: through this loss-optimization strategy, the weak learners improve significantly over time, so the second weak learner is stronger than the first, the third is stronger than the second, and so on. For this research, the best-fit n_estimators value was 25 and the learning_rate was 0.1, which resulted in the best performance on the dataset used in the study.

LightGBM: LightGBM is a GB method that uses tree-based learning techniques. It is distributed and supports parallel and GPU learning, making it capable of managing massive amounts of data. LightGBM can be up to six times the speed of XGBoost. XGBoost is a very quick and accurate ML method, but it is currently being challenged by LightGBM, which runs quicker with equivalent model accuracy and provides users with more hyperparameters to tune. The critical performance difference is that XGBoost splits the tree level-wise, one level at a time, while LightGBM splits leaf-wise, one node at a time [19]. In other words, LightGBM grows trees vertically, one leaf at a time, while level-wise methods grow them horizontally, one level at a time; LightGBM picks the leaf with the maximum delta loss to grow [30]. For this study, the best-fit random_state value was 75, the learning_rate was 0.09 and the max_depth was 5, which performed optimally on the dataset used.

Extreme Gradient Boosting: XGBoost is an ensemble learning method. At times it may not be enough to rely on the results of a single ML model; ensemble learning combines the predictive ability of many learners in a structured way, producing a single model that aggregates the results of several models [18]. XGBoost is also a distributed GB framework designed to be highly efficient, flexible, and portable. It implements ML techniques within the GB framework and uses parallel tree boosting to solve a number of data science tasks quickly and accurately [29]. The XGBoost algorithm is designed around performance and execution time, and it runs much quicker than other boosting algorithms. Both regression and classification problems may be solved with XGBoost. This strategy essentially
enhances the DT sequence and improves the accuracy depending on the weights. For this analysis, the best-suited base_score value was 0.5 and the learning_rate was 0.1, which provided the highest performance on the dataset used.
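A minimal sketch of how the four classifiers could be instantiated with the hyperparameter values reported above, using the scikit-learn, XGBoost and LightGBM APIs; all other settings are left at their defaults, which is an assumption, and X_train/y_train/X_test/y_test are presumed to come from the train-test split described earlier.

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Hyperparameters as reported in Sect. 2.4; everything else left at defaults.
models = {
    "RF":       RandomForestClassifier(n_estimators=50, max_depth=4),
    "GB":       GradientBoostingClassifier(n_estimators=25, learning_rate=0.1),
    "LightGBM": LGBMClassifier(random_state=75, learning_rate=0.09, max_depth=5),
    "XGBoost":  XGBClassifier(base_score=0.5, learning_rate=0.1),
}

for name, model in models.items():
    model.fit(X_train, y_train)                 # assumed train split
    print(name, model.score(X_test, y_test))    # accuracy on the test split
```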
2.5 Model Selection and Features Importance
Selecting the best model is crucial for every ML approach. In this study, we select the best model by considering the various evaluation metrics and statistically analyzing their results. Another crucial task in ML is selecting the important features for prediction. Feature importance is crucial because finding and ranking the important features has a really big impact on prediction research in fields such as biomedicine and social science. In this work, SHAP values have been used to sort out the important features of this dataset. SHAP values quantify the effect of observing a particular value for a particular feature, relative to the prediction we would generate if that feature took some other value [31]. The SHAP value is estimated using the equation below:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(K - |S| - 1)!}{K!}\,\left[f(S \cup \{i\}) - f(S)\right] \quad (8)$$
Here φi is the feature importance value of the ith feature, K is the number of independent features, and S ranges over the subsets of non-zero feature indexes.
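In practice, Eq. (8) is approximated rather than computed exactly; for tree ensembles, the shap library provides a fast TreeExplainer. A minimal sketch follows, in which model and X are placeholders for a fitted tree ensemble (e.g., the RF classifier of Sect. 2.4) and the preprocessed feature matrix.

```python
import shap

# model: any fitted tree ensemble; X: preprocessed features as a DataFrame.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot ranking features by their mean impact on the prediction,
# analogous to the plots shown in Fig. 2.
shap.summary_plot(shap_values, X)
```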
3 Results and Discussion
In this study, Python (version 3.8.5) was employed to conduct the experiments, and Google Colab was used as the IDE and programming environment. The results of the study are presented in the following section. Table 2 reports the accuracy, sensitivity and specificity scores for the different ML approaches used in this work. As Table 2 shows, both the RF and LightGBM approaches predicted CKD with the maximum accuracy of 99.167%. The highest sensitivity score is 1.0, achieved by RF, while XGB gives the highest specificity of 1.0. The highest recall score is 1.0, again for RF. XGB and LightGBM give the highest precision score of 1.0 among the four ML approaches. The F1 measure is 0.989 for RF and LightGBM, which is the highest, and the highest kappa statistic of 0.982 is shared by LightGBM and RF among all the used ML algorithms.

Table 2. Performance comparison among all the applied classifiers

Algorithm | Accuracy | Sensitivity | Specificity | Recall | Precision | F1 Measure | Kappa Statistic
GB        | 95.83%   | 0.959       | 0.956       | 0.959  | 0.972     | 0.946      | 0.912
XGBoost   | 98.33%   | 0.973       | 1.0         | 0.973  | 1.0       | 0.979      | 0.965
RF        | 99.167%  | 1.00        | 0.978       | 1.0    | 0.986     | 0.989      | 0.982
LightGBM  | 99.167%  | 0.986       | 1.0         | 0.986  | 1.0       | 0.989      | 0.982
This research mainly focused on the features that are important for predicting CKD. To find the important features, this study used the SHAP summary plot, in which features are listed according to their impact on the prediction. Figure 2 shows the SHAP summary plots for the four ML approaches used in this study; for each approach, the top twenty features out of a total of twenty-four are shown. Subplot A in Fig. 2 shows the features that are important for predicting CKD with the RF classifier, subplot B shows the impactful features of the GB classifier, and subplots C and D show the features with the most impact on CKD prediction for the XGB and LightGBM classifiers, respectively. Table 3 summarizes the top 10 features that have an impact on predicting CKD for the four ML algorithms.

Table 3. Top 10 significant features for CKD patients

Algorithm | Top Ten Features
RF        | sg, hemo, sc, pcv, al, rbcc, htn, dm, bgr, sod
GB        | sg, hemo, pcv, al, rbcc, dm, sc, htn, bgr, ba
XGB       | sg, hemo, pcv, sc, sod, al, age, htn, rbcc, bgr
LightGBM  | sg, hemo, sc, al, pcv, htn, sod, dm, age, bgr
In brief, a CKD dataset for this study was collected from an online repository. The dataset was preprocessed as necessary to prepare the data for the ML approaches, and four different ML approaches were then applied to predict CKD. The RF and LightGBM approaches gave the highest accuracy of 99.167%. A feature importance method was then applied to find the features that matter for predicting CKD: the SHAP summary plot is used in this work to show the feature importances and their impact, and thereby to find the important risk factors. This study identified the important features for the four ML approaches. Chittora et al. (2021) found the important features to be (rbcc, pc, al, ba, su, pcc, sc, age, bp, bgr) [6], and Qin et al. (2019) showed that (sg, hemo, sc, al, pcv, rbcc, htn, dm, bgr, bu) are the most important features [7]. These publications' outcomes indicate that our findings are valid and that the predictive model has high potential for predicting CKD. The study will support doctors, clinicians and patients in predicting CKD and related complexities and in finding their impact using the proposed methods. Overall, the study will contribute to the medical sector in predicting and analyzing the risk factors of a CKD patient.
Fig. 2. Significant features and their impact on CKD
4 Conclusion and Future Work
This study proposed an ML model, RF, to predict CKD with a significant accuracy of 99.17%. The study also focused on discovering the risk factors that are most significant for CKD prediction: it was found that sg, hemo, sc and pcv are the most significant risk factors, being chiefly responsible for CKD, and all the features were ranked according to their significance. It should be noted that the CKD dataset evaluated in this study is not particularly
large and was compiled by others. For improved analysis and a better account of the features' impact on model evaluation, raw data from CKD patients will be collected in the future. In addition, more advanced technology will be applied to upgrade the model and its performance. This research could have important therapeutic benefits, and analysis of the study results could help doctors and researchers better predict when someone will develop CKD.

Acknowledgement. This work was supported by funding from the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
1. Davis, G., Kurse, A., Agarwal, A., Sheikh-Hamad, D., Kumar, M.R.: Nanoencapsulation strategies to circumvent drug-induced kidney injury and targeted nanomedicines to treat kidney diseases. Current Opinion in Toxicology, p. 100346 (2022)
2. Revathy, S., Bharathi, B., Jeyanthi, P., Ramesh, M.: Chronic kidney disease prediction using machine learning models. Int. J. Eng. Adv. Technol. (IJEAT), 9 (2019)
3. Yashfi, S.Y., Islam, M.A., Sakib, N., Islam, T., Shahbaaz, M., Pantho, S.S.: Risk prediction of chronic kidney disease using machine learning algorithms. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–5. IEEE (2020)
4. Qin, J., Chen, L., Liu, Y., Liu, C., Feng, C., Chen, B.: A machine learning methodology for diagnosing chronic kidney disease. IEEE Access 8, 20991–21002 (2019)
5. Zubair Hasan, K.M., Zahid Hasan, M.: Performance evaluation of ensemble-based machine learning techniques for prediction of chronic kidney disease. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds.) Emerging Research in Computing, Information, Communication and Applications. AISC, vol. 882, pp. 415–426. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-5953-8_34
6. Celik, E., Atalay, M., Kondiloglu, A.: The diagnosis and estimate of chronic kidney disease using the machine learning methods. Int. J. Intell. Syst. Appl. Eng. 4(Special Issue-1), 27–31 (2016)
7. Krishnamurthy, S., et al.: Machine learning prediction models for chronic kidney disease using national health insurance claim data in Taiwan. In: Healthcare, vol. 9, no. 5, p. 546. Multidisciplinary Digital Publishing Institute (2021)
8. Almansour, N.A., et al.: Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study. Comput. Biol. Med. 109, 101–111 (2019)
9. Radha, N., Ramya, S.: Performance analysis of machine learning algorithms for predicting chronic kidney disease. Int. J. Comput. Sci. Eng. Open Access 3, 72–76 (2015)
10. Chiu, R.K., Chen, R.Y., Wang, S.A., Jian, S.J.: Intelligent systems on the cloud for the early detection of chronic kidney disease. In: 2012 International Conference on Machine Learning and Cybernetics, vol. 5, pp. 1737–1742. IEEE (2012)
11. Ebiaredoh-Mienye, S.A., Esenogho, E., Swart, T.G.: Integrating enhanced sparse autoencoder-based artificial neural network technique and softmax regression for medical diagnosis. Electronics 9(11), 1963 (2020)
12. Donges, N.: A complete guide to the random forest algorithm. Built In, 16 (2019)
13. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986). https://doi.org/10.1007/BF00116251
14. Friedman, J.H.: Stochastic gradient boosting. Comput. Statistics Data Anal. 38(4), 367–378 (2002)
15. Sundaram, R.B.: An end-to-end guide to understand the math behind XGBoost (2018)
16. Gupta, A., Gupta, A., Verma, V., Khattar, A., Sharma, D.: Texture feature extraction: impact of variants on performance of machine learning classifiers: study on chest x-ray – pneumonia images. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds.) BDA 2020. LNCS, vol. 12581, pp. 151–163. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66665-1_11
17. Pramanik, R., Khare, S., Gourisaria, M.K.: Inferring the occurrence of chronic kidney failure: a data mining solution. In: Gupta, D., Khanna, A., Kansal, V., Fortino, G., Hassanien, A.E. (eds.) Proceedings of Second Doctoral Symposium on Computational Intelligence. AISC, vol. 1374, pp. 735–748. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-3346-1_59
18. Ali, M.M.: Machine learning-based statistical analysis for early stage detection of cervical cancer. Comput. Biol. Med. 139, 104985 (2021)
19. Haitaamar, Z.N., Abdulaziz, N.: Detection and semantic segmentation of rib fractures using a convolutional neural network approach. In: 2021 IEEE Region 10 Symposium (TENSYMP), pp. 1–4. IEEE (2021)
20. Shah, A., Rathod, D., Dave, D.: DDoS attack detection using artificial neural network. In: International Conference on Computing Science, Communication and Security, pp. 46–66. Springer, Cham (2021)
21. Piech, M., Smywinski-Pohl, A., Marcjan, R., Siwik, L.: Towards automatic points of interest matching. ISPRS Int. J. Geo Inf. 9(5), 291 (2020)
22. Nelson, D.: Gradient boosting classifiers in python with scikit-learn. Retrieved from Stack Abuse. https://stackabuse.com/gradient-boosting-classifiers-in-python-with-scikit-learn (2019)
23. Chen, T., He, T., Benesty, M., Khotilovich, V.: Package 'xgboost'. R version, 90 (2019)
24. Abdurrahman, M.H., Irawan, B., Setianingsih, C.: A review of light gradient boosting machine method for hate speech classification on twitter. In: 2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), pp. 1–6. IEEE (2020)
25. Lazich, I., Bakris, G.L.: Prediction and management of hyperkalemia across the spectrum of chronic kidney disease. In: Seminars in Nephrology, vol. 34, no. 3, pp. 333–339. WB Saunders (2014)
26. Rabby, A.S.A., Mamata, R., Laboni, M.A., Abujar, S.: Machine learning applied to kidney disease prediction: comparison study. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2019)
27. Bhutani, H., et al.: A comparison of ultrasound and magnetic resonance imaging shows that kidney length predicts chronic kidney disease in autosomal dominant polycystic kidney disease. Kidney Int. 88(1), 146–151 (2015)
28. Elhoseny, M., Shankar, K., Uthayakumar, J.: Intelligent diagnostic prediction and classification system for chronic kidney disease. Sci. Rep. 9(1), 1–14 (2019)
29. Grams, M.E.: Predicting timing of clinical outcomes in patients with chronic kidney disease and severely decreased glomerular filtration rate. Kidney Int. 93(6), 1442–1451 (2018)
30. Merzkani, M.A., et al.: Kidney microstructural features at the time of donation predict long-term risk of chronic kidney disease in living kidney donors. In: Mayo Clinic Proceedings, vol. 96, no. 1, pp. 40–51. Elsevier (2021)
31. Farrington, K., et al.: Clinical practice guideline on management of older patients with chronic kidney disease stage 3b or higher (eGFR < 45 mL/min/1.73 m2): a summary document from the European Renal Best Practice Group. Nephrology Dialysis Transplantation 32(1), 9–16 (2017)
A Reliable and Efficient Transfer Learning Approach for Identifying COVID-19 Pneumonia from Chest X-ray

Sharmeen Jahan Seema1(B) and Mosabber Uddin Ahmed2

1 Department of Information and Communication Technology, Bangladesh University of Professionals, Dhaka 1216, Bangladesh
[email protected]
2 Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka 1000, Bangladesh
[email protected]
Abstract. Over 500 million people have fallen prey to the coronavirus (COVID-19) epidemic that is sweeping the world. The traditional method for detecting it is pathogenic laboratory testing, but this carries a high risk of false negatives, forcing the development of additional diagnostic approaches to combat the disease. X-ray imaging is a straightforward and patient-friendly procedure that may be performed in almost any healthcare facility. The aim of this report is to use transfer learning models to build a feasible mechanism for automatically detecting COVID-19 pneumonia from chest X-ray images while enhancing detection accuracy. We ran several experiments on three publicly available datasets. The recommended mechanism is intended to provide multi-class classification diagnostics (COVID-19 pneumonia vs. Non COVID-19 pneumonia vs. Normal). In this study, the 5 best transfer learning methods selected from 9 alternative models were tested in various scenarios with varied dataset splitting and amalgamation. Based on their performance on the Merged dataset, an ensemble model was developed from the top three models. Our proposed ensemble model had classification accuracy, precision, recall, and f1-score of 99.62%, 1, 0.99, and 1.00 for multi-class cases, respectively, and it detected 99.12% of COVID-19 pneumonia cases accurately. This recommended system can considerably improve COVID-19 diagnosis time and efficiency.

Keywords: Transfer Learning · Ensemble Network · Convolutional Neural Network · COVID-19 · Pneumonia

1 Introduction
SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) first appeared in 2019, became pandemic in 2020, and is now a particularly major cause of
pneumonia known as COVID-19 pneumonia [1]. As of April 14, 2022, the pandemic had spread to over 200 nations, with well over 500 million verified illnesses and surpassing 6 million deaths [2]. The infection might spread to the lungs as the virus multiplies; if this happens, it is probable that one will get pneumonia [3]. COVID-19-associated pneumonia was formerly known as Novel Coronavirus-Infected Pneumonia (NCIP). The World Health Organization termed the disease COVID-19, which stands for coronavirus disease 2019. It can affect anybody; however, it is more common in persons aged 65 and over. Fever, chest tightness and a persistent cough are obvious indicators of COVID-19. Symptoms such as a high heart rate, breathlessness, difficulty breathing and disorientation may occur if the COVID-19 infection escalates to pneumonia [4]. Pneumonia itself is one of the most common lung diseases: a bacterial, viral, or fungal infection that affects the lungs [5]. Infants and small children, those over 65, and those with health issues or compromised immune systems are the most vulnerable [6]. According to the World Health Organization, pneumonia causes around 4 million premature deaths each year and affects over 150 million people annually, primarily children under the age of five [7]. In some circumstances, distinguishing COVID-19 pneumonia from conventional pneumonia can be tricky, especially when clinical criteria are taken into account. In the initial days of COVID-19, the primary symptoms were fever, exhaustion, and a persistent cough, but people with General Pneumonia (GP) displayed the same common characteristics [8]. As a result, timely identification and isolation of those suffering from GP and COVID-19 pneumonia can hopefully minimize propagation of the outbreak [9]. The conventional technique to pinpoint COVID-19 is reverse transcription polymerase chain reaction (RT-PCR). Unfortunately, it has a number of shortcomings, including false positives, limited sensitivity, high cost, and the need for professionals to perform the test. As the incidence rate climbs, it becomes increasingly important to develop an accurate, quick, and low-cost evaluation system. Because they are inexpensive to obtain and easily available, chest X-ray images could be used as a substitute method. COVID-19 can only be diagnosed from a chest X-ray by a professional physician, and there are very few specialists who can make this assessment. COVID-19 is also an extremely deadly viral disease, with healthcare professionals and attendants potentially extremely vulnerable. Early diagnosis of pneumonia is significant both for preventing epidemic spread and for ensuring a patient's recovery. By expediting the detection process, doctors may identify different types of pneumonia from chest X-rays more effectively and easily, potentially saving thousands of lives and lowering treatment costs. Although the literature describes a variety of methods for distinguishing X-ray images and detecting the COVID-19 pathogen, the bulk of the methods only distinguish between two groups (COVID-19 pneumonia vs. Normal). Well-developed models are nonetheless required to separate COVID-19 pneumonia from Non COVID-19 pneumonia and healthy instances [10]. The study's main focus is to establish a robust model that will identify COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal cases from X-ray images collected from different datasets. The existing datasets are typically
small in size due to the scarcity of COVID-19 pneumonia samples, and works on large datasets or amalgamations of small datasets are very few in number. This paper works with 3 different publicly available datasets which contain a comparatively higher number of chest X-ray images than the datasets worked on previously. On top of that, this work implemented various ways of splitting and merging the datasets, which ultimately increased the amount of data. A performance analysis of the 5 best transfer learning models, chosen from 9 transfer learning models, for the accurate recognition and classification of the type of pneumonia is also presented in this work. Finally, an ensemble model has been proposed for differentiating the 3 classes.
2 Literature Review
Sethy et al. [11] supplied the underlying features from 13 Convolutional Neural Network (CNN) models to an SVM model; ResNet50 with SVM outperformed the other 12 classification models with an efficiency of 95.33%. Khan et al. [12] suggested CoroNet, a deep convolutional neural network model based on the Xception architecture; the suggested model exhibited a classification accuracy of 95%. Narin et al. [13] offered five CNN-based models for the recognition of COVID-19, healthy, and pneumonia-infected patients; among the five models, ResNet50 performed the best. To categorize COVID-19 pneumonia, non-COVID-19 pneumonia, and normal cases, Nishio et al. [14] used the VGG16, MobileNet, DenseNet121, and EfficientNet CNN models, and found that the VGG16 model surpassed its competitor models with an accuracy of 83.6%. Ozturk et al. [15] classified chest X-rays using DarkCovidNet on 1,127 samples, achieving an accuracy of 87.02%. EDL-COVID was used by Tang et al. [16] on 15,477 samples, with an accuracy of roughly 95%. To differentiate COVID-19 pneumonia from non-COVID pneumonia and normal patients, Öksüz et al. [17] developed Ensemble-CVDNet, and experimental results showed that it had a 98.30% accuracy rate. Bhardwaj et al. [18] used 4 distinct models to create a COVID-19-detecting deep ensemble learning system, achieving a multiclass accuracy of 92.36%. To detect COVID-19, Afifi et al. [19] utilized three networks for a three-class problem, and the outcomes of the trial revealed that their model was 91.2% accurate.
3 Methodology
Recognition and classification of COVID-19 pneumonia and Non COVID-19 pneumonia using digital chest X-rays is a challenging undertaking because the X-ray images of both diseases show little to no difference. For precise classification we need a robust and optimal model, which can be achieved through deep learning methods. In order to identify a reliable and effective model, we have examined the effectiveness of various transfer learning models under various conditions, and we have also implemented a novel CNN-based model using an ensemble method. The recommended methodology is summarized in Fig. 1.
Fig. 1. Diagrammatic representation of the work technique.
3.1 Dataset Summarization
In this paper, three publicly available datasets have been used: the COVID-19 Radiography Dataset, COVID IEEE, and the Pneumonia and Normal Chest X-ray PA Dataset. In this work, they will be referred to as Dataset 1, Dataset 2, and Dataset 3, respectively.

Table 1. Dataset designation and data distribution for all classes.

Dataset Name   | Total Samples | Normal | Non COVID-19 Pneumonia | COVID-19 Pneumonia | Modality
Dataset 1      | 15,153        | 10,192 | 1,345                  | 3,616              | X-ray
Dataset 2      | 1,708         | 668    | 619                    | 421                | X-ray
Dataset 3      | 4,575         | 1,525  | 1,525                  | 1,525              | X-ray
Merged Dataset | 21,240        | 12,385 | 3,297                  | 5,558              | X-ray
Dataset 1 was collected from various resources. It incorporates COVID-19-positive chest X-rays as well as normal and viral pneumonia images [20,21]. Dataset 2 contains 1,708 images, including 421 images of COVID-19, 619 images of viral pneumonia, and 668 images of normal patients [22]. The chest X-ray posterior-anterior (PA) images of Dataset 3 were obtained from several sources; it consists of a total of 4,575 images, with 1,525 images for each condition [23]. We removed some redundant images present in both Dataset 2 and Dataset 3. The Merged dataset, an integration of Datasets 1, 2 and 3, consists of a total of 21,240 images. The dataset designation and data distribution for all classes are depicted in Table 1.
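The paper does not state how the redundant images were detected; one simple, commonly used approach is to hash the raw file bytes and keep a single copy per hash, as in the hypothetical sketch below (the folder name and file extension are placeholders).

```python
import hashlib
from pathlib import Path

seen, duplicates = set(), []
for img_path in Path("merged_dataset").rglob("*.png"):   # hypothetical layout
    digest = hashlib.md5(img_path.read_bytes()).hexdigest()
    if digest in seen:
        duplicates.append(img_path)   # exact byte-level duplicate
    else:
        seen.add(digest)

for p in duplicates:
    p.unlink()                        # remove the duplicate copies
```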
Table 2. Dataset splitting and amalgamation.

Datasets Used                     | Splitting Ratio/Training-Testing Dataset            | Training (Images): Total (Normal, Non COVID-19, COVID-19) | Testing (Images): Total (Normal, Non COVID-19, COVID-19)
Dataset 1                         | 80:20                                               | 12122 (8153, 1076, 2893)                                  | 3031 (2039, 269, 723)
Dataset 1                         | 70:30                                               | 10609 (7135, 942, 2532)                                   | 4544 (3057, 403, 1084)
Dataset 1                         | 60:40                                               | 9092 (6115, 807, 2170)                                    | 6061 (4077, 538, 1446)
Dataset 1 & Dataset 2             | 100% Dataset 1 (training) & 100% Dataset 2 (testing) | 15153 (10192, 1345, 3616)                                 | 1708 (668, 619, 421)
Dataset 1 & Dataset 3             | 100% Dataset 1 (training) & 100% Dataset 3 (testing) | 15153 (10192, 1345, 3616)                                 | 4575 (1525, 1525, 1525)
Dataset 1 + Dataset 2 + Dataset 3 | 80:20                                               | 17084 (9908, 2638, 4538)                                  | 4156 (2477, 659, 1020)
3.2 Dataset Preprocessing
The datasets have undergone minimal preparation, consisting of image scaling and splitting. To make them compatible with all of the models, all images are scaled to the input image size of the individual transfer learning models. Three splits were applied to Dataset 1: an 80:20 split (80% of the images used for training, 20% for testing), a 70:30 split (70% training, 30% testing) and a 60:40 split (60% training, 40% testing). Furthermore, Dataset 1 and Dataset 2 were employed for training and testing, respectively, and then Dataset 1 and Dataset 3 were employed for training and testing, respectively. Finally, the Merged dataset was divided 80:20 (80% training, 20% testing). The dataset splitting and amalgamation are described in Table 2.
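As an illustration of the scaling and splitting step, the sketch below resizes images to a given model's input size and performs an 80:20 split with the Keras utilities; the directory name, seed, and target size are placeholders, not the authors' actual configuration.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # 299 x 299 for Xception/InceptionV3/InceptionResNetV2

# Hypothetical directory layout: one sub-folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset1", image_size=IMG_SIZE, validation_split=0.2,
    subset="training", seed=42, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset1", image_size=IMG_SIZE, validation_split=0.2,
    subset="validation", seed=42, label_mode="categorical")
```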
3.3 Altered Transfer Learning Methods
Transfer Learning (TL) refers to the process of employing a model that has already been developed for one task to solve a similar one. As a byproduct of the ImageNet challenge, numerous CNN models have emerged for the image classification problem, and these pre-trained models are applicable via transfer learning to a multitude of image classification tasks [24,25]. In this analysis, nine comparable pre-trained CNN models, InceptionV3 [26], InceptionResNetV2, Xception [27], VGG16 [28], VGG19, ResNet50 [29], ResNet101 [30], MobileNet [31] and DenseNet201 [32], have been modified and used on the 4 datasets. We have taken into consideration models of every weight, including the lightest model (MobileNet), a mid-weight model (ResNet50) and a heavy-
weight model (VGG19). From a depth perspective, InceptionResNetV2 comes first and VGG16 comes last. VGG19 contains the highest number of parameters, 143,667,240, and MobileNet contains the lowest, 4,253,864. The input image size of the 3 models Xception, InceptionV3 and InceptionResNetV2 is 299 × 299; the input image size of the other 6 models, VGG16, VGG19, ResNet50, ResNet101, MobileNet and DenseNet201, is 224 × 224. In each of the nine TL models, the final dense layer was eliminated and a substitute dense layer with a softmax activation function was added in its place. Three neurons make up the new layer, indicating the three classes: Normal cases are assigned to class 0, Non COVID-19 pneumonia cases to class 1, and COVID-19 pneumonia cases to class 2. All of the models were trained over 50 epochs employing Adam as the optimizer, a learning rate of 0.001, and a categorical cross-entropy loss function.
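A minimal Keras sketch of the modification described above, using InceptionV3 as an example base. Whether the base layers were frozen and how pooling was applied are not stated in the paper, so those choices here are assumptions.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models, optimizers

# Pre-trained base without its original classification head (assumption:
# global average pooling bridges the base to the new dense layer).
base = InceptionV3(weights="imagenet", include_top=False,
                   pooling="avg", input_shape=(299, 299, 3))

# Substitute 3-neuron softmax head:
# class 0 = Normal, class 1 = Non COVID-19, class 2 = COVID-19.
model = models.Sequential([base, layers.Dense(3, activation="softmax")])

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=50)  # training regime as reported in the paper
```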
3.4 Training and Testing
The 9 TL models are first tested on Dataset 1 (split 80:20). Based on accuracy, the best 5 of the 9 models are chosen, and these 5 models are tested in 5 different scenarios. To begin, the 5 models are tested on the same dataset (Dataset 1) but with two different splits (70:30 and 60:40). Furthermore, the 5 TL models are tested in another environment consisting of Dataset 1 and Dataset 2, where Dataset 1 is utilized for training and Dataset 2 for testing. Next, the five selected TL models are tested with another combination (Dataset 1 as training and Dataset 3 as testing). After that, the Merged dataset is created and tested with the 5 TL models. In the end, the performance of the models is assessed by averaging their results across all the phases.
3.5 Ensemble of Best TL Models
A voting ensemble is a method for combining the recommendations of several independent models; when compared to a single model, ensemble techniques often produce more accurate findings. For classification, the predictions for each label are tallied, and the category with the majority of votes is picked [33]. In the final scenario, a voting ensemble was performed on the three best models from the five TL models that were trained and evaluated using the Merged dataset (80:20 split). To begin, each of the three models predicts the class label of each sample in three separate columns. If all three predict the same class label (0/1/2) for a single sample, the sample is assigned that class label. If, on the other hand, the majority of the models predict one class for a sample and the remaining model another, the sample is classified according to the majority prediction. Finally, if the three models each predict a different class for a sample, the prediction of the model with the maximum accuracy is used. Eventually, the ensemble model's accuracy is compared to that of the top three models based on average accuracy. The robust and ideal model is the one with the highest levels
of accuracy, precision, recall, f1-score and class accuracy (Normal/Non COVID-19 pneumonia/COVID-19 pneumonia). Figure 1 depicts the methodology of this approach.
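A sketch of the voting rule just described: majority vote across the three models, falling back to the most accurate model when all three disagree. This is one interpretation of the stated procedure, not the authors' code, and the function and variable names are hypothetical.

```python
import numpy as np

def vote(preds_a, preds_b, preds_c, best_model_preds):
    """Hard voting over three arrays of class labels (0/1/2)."""
    stacked = np.stack([preds_a, preds_b, preds_c], axis=1)
    out = np.empty(len(stacked), dtype=int)
    for i, row in enumerate(stacked):
        labels, counts = np.unique(row, return_counts=True)
        if counts.max() >= 2:              # unanimous or 2-vs-1 majority
            out[i] = labels[counts.argmax()]
        else:                              # all three models disagree
            out[i] = best_model_preds[i]   # defer to the most accurate model
    return out
```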
4 Result Analysis
The performance of TL models in identifying COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal patients was evaluated in this study using 9 TL models. We also looked at the results of the five best models in five distinct scenarios. Finally, a voting ensemble was performed on the three top models selected from the fifth scenario. The experiments were carried out in Kaggle using Python. All programs were run on an Acer Aspire 5 laptop with an NVIDIA GeForce MX150 graphics card (Intel Core i5 7th-Gen processor, 8 GB RAM, 2 TB hard drive, Windows 10 Pro). Accuracy, precision, recall, f1-score, and class-wise accuracy are all used to evaluate each classifier's performance.
4.1 Result Analysis of 9 TL Models Based on Dataset 1 (80:20 Split)
In this phase the 9 TL models (InceptionV3 [26], InceptionResNetV2, Xception [27], VGG16 [28], VGG19, ResNet50 [29], ResNet101 [30], MobileNet [31] and DenseNet201 [32]) were evaluated on Dataset 1 (80:20 split). With an accuracy of 99.4%, precision 1, recall 0.99, and f1-score 0.99, InceptionV3 and InceptionResNetV2 were the most successful. They both had the highest class accuracy (99.85%) for the Normal class, InceptionResNetV2 had the highest class accuracy (99.88%) for the Non COVID-19 pneumonia class, and InceptionV3 had the highest class accuracy (99.78%) for the COVID-19 pneumonia class. Xception, MobileNet, and DenseNet201 came in second, third, and fourth place, respectively, with accuracies of 99.34%, 99.17%, and 99.07%. With 96.2%, VGG19 was the poorest performer. The performance of the models in this stage is depicted in Fig. 2. The 5 best-performing models (InceptionV3, InceptionResNetV2, Xception, MobileNet, and DenseNet201) were selected for further experimentation in the next phases.
Fig. 2. Performance comparison of 9 TL models based on accuracy, precision, recall, f1-score and class accuracy.
4.2 Result Analysis of 5 TL Models in First Scenario
The 5 TL models selected from the 9 are tested in a new scenario with the same dataset (Dataset 1) but with a different split (70:30). In this scenario, InceptionResNetV2, Xception and DenseNet201 came first, second and third, with accuracies of 99.33%, 99.09% and 98.67%, respectively. InceptionV3 performed the worst in this environment. The overall performance of all five models is displayed in Fig. 3(a).
4.3 Result Analysis of 5 TL Models in Second Scenario
The 5 chosen TL models are tested again using the same dataset (Dataset 1) but a different split (60:40). With accuracies of 98.64%, 98.13%, and 98.05%, respectively, Xception, MobileNet, and InceptionResNetV2 placed first, second, and third in this scenario. In this context, DenseNet201 performed the poorest. Figure 3(b) depicts the overall performance of all five models.
4.4 Result Analysis of 5 TL Models in Third Scenario
This time, the 5 selected TL models are put to the test in a new scenario using two datasets: training is done with Dataset 1, while testing is done with Dataset 2. DenseNet201, Xception, and InceptionV3 took first, second, and third place, with accuracies of 97.48%, 96.66%, and 94.78%, respectively. In this context, InceptionResNetV2 performed the poorest. Figure 3(c) depicts the overall performance of all five models.
4.5 Result Analysis of 5 TL Models in Fourth Scenario
The five TL models are again tested in a new context with two datasets: Dataset 1 is utilized for training, while Dataset 3 is used for testing. In this scenario, DenseNet201, InceptionResNetV2, and InceptionV3 came first, second, and third, with accuracies of 93.68%, 92.89%, and 92.15%, respectively. Xception scored the worst in this regard. Figure 3(d) shows the total performance of all five models.
4.6 Result Analysis of 5 TL Models in Fifth Scenario
With the Merged dataset, the five TL models are now tested in a new context. The dataset is separated into two parts, with 80% utilized for training and 20% for testing. In this scenario, InceptionResNetV2, InceptionV3, and MobileNet placed first, second, and third, respectively, with accuracies of 99.42%, 99.27%, and 99.15%. DenseNet201 had the lowest score in this category. The total performance of all five models is shown in Fig. 3(e).
Fig. 3. Performance comparison of 5 TL models based on accuracy, precision, recall, f1-score and class accuracy in five different scenarios (a) Dataset 1 (70:30 split). (b) Dataset 1 (60:40 split). (c) Dataset 1 (training) & Dataset 2 (testing). (d) Dataset 1 (training) & Dataset 3 (testing). (e) Dataset 1 + Dataset 2 + Dataset 3 (80:20 split).
(Average accuracy over the five scenarios, as plotted: DenseNet201 0.9725, Xception 0.9699, InceptionV3 0.9656, InceptionResNetV2 0.9653, MobileNet 0.9621.)
Fig. 4. Average Accuracy of 5 TL Models Considering All Five Scenarios.
4.7 Average Performance of the 5 TL Models Throughout 5 Scenarios
We have already seen how all of the models performed in five distinct scenarios. The models performed differently in different setups; there was no uniformity in their performance. We therefore took the average of each model's accuracy across the five scenarios. DenseNet201 took top place with 97.25%, followed by Xception in second place with 96.98% and InceptionV3 in third place with 96.56%. The average accuracy results of InceptionResNetV2 and MobileNet were not up to par. A visual representation of the result is shown in Fig. 4.
Fig. 5. Confusion Matrix of Ensemble Model.
4.8 Result Analysis of Ensemble Model
A voting ensemble was performed on the 3 best models from the fifth scenario, i.e. InceptionResNetV2, InceptionV3, and MobileNet. The ensemble model proved to perform the best, with an accuracy of 99.62%, precision 1.00, recall 0.99 and f1-score 1.00. It achieved 99.87% in identifying normal cases, 99.39% in identifying Non COVID-19 pneumonia cases and 99.12% in identifying COVID-19 pneumonia cases. Figure 5 presents the ensemble model's confusion matrix, in which the columns indicate the predicted values and the rows indicate the actual values.
Table 3. Comparison of 3 Best TL Models from Fifth Scenario and Ensemble Model.

Models            | Accuracy | Precision | Recall | F1-Score | Normal | Non COVID-19 | COVID-19
InceptionResNetV2 | 99.42%   | 0.99      | 0.99   | 0.99     | 99.75% | 98.63%       | 99.11%
InceptionV3       | 99.27%   | 1         | 0.99   | 0.99     | 99.83% | 99.39%       | 97.84%
MobileNet         | 99.16%   | 0.99      | 0.99   | 0.99     | 99.75% | 99.24%       | 97.64%
Ensemble Model    | 99.62%   | 1         | 0.99   | 1        | 99.87% | 99.39%       | 99.12%
The model correctly predicted 2474 out of 2477 images for normal cases, 655 out of 659 images for Non COVID-19 pneumonia, and 1011 out of 1020 images for COVID-19 pneumonia. The comparison between the three individual best models and the ensemble model in the fifth scenario is given in Table 3. The ensemble model clearly beat all other models in terms of accuracy, precision, recall, f1-score, and class accuracy in identifying COVID-19 pneumonia, Non COVID-19 pneumonia, and Normal cases.
4.9 Discussion
This study employed TL models in order to build a practical process for detecting COVID-19 pneumonia from digital chest X-ray images and to boost the recognition rate. There are numerous studies on this topic in the literature, as seen in Table 4. It is typical to discriminate between COVID-19-positive and healthy patients using binary classification, but it is critical to distinguish COVID-19 pneumonia patients from those with viral/bacterial pneumonia, another lung illness. There are not many studies in the literature that use enough samples, particularly COVID-19 pneumonia samples. Additionally, the performances of previous works are not substantial, and there are not many ensemble-related works available. Sethy et al. [11] provided an SVM classifier with the deep features from 13 pre-trained CNN models, employing 127 COVID-19 samples; the most accurate combination was ResNet50, with an accuracy of 95.33% for the three categories of normal, pneumonia, and COVID-19. Khan et al. [12] suggested CoroNet, a deep convolutional neural network model based on the Xception architecture; the suggested model exhibited a classification accuracy of 95%. When the VGG16 model with a variety of augmentation techniques was tested alongside the MobileNet, DenseNet121, and EfficientNet CNN models, Nishio et al. [14] discovered that it performed best, with an accuracy of 83.6%; 215 COVID-19 samples were used in that work. The DarkCovidNet deep learning network was developed by Ozturk et al. [15] to provide precise diagnostics for binary and multi-class classification; the model was created using 1,127 images, including 127 COVID-19 samples, and reached a multi-class accuracy of 87.02%. EDL-COVID was used by Tang et al. [16] on 15,477 samples, with an accuracy of roughly 95%. To distinguish COVID-19 pneumonia (219 samples) from Non COVID-19 pneumonia, Öksüz et al. [17] developed Ensemble-CVDNet, an amalgamation of three pre-trained
models. Among the compared studies, this model showed the best accuracy, at 98.30%. Chowdhury et al. [20] applied 8 different transfer learning models on 3,487 images to classify them into three categories; DenseNet201 outperformed them all with an accuracy of 97.94%. Bhardwaj et al. [18] presented a COVID-19-detecting deep ensemble learning architecture utilizing four different pre-trained deep neural network architectures; the experiment's findings indicated a multiclass accuracy of 92.36%.

Table 4. Performance comparison of previous works on classification of COVID-19 from chest X-rays with the proposed model.

Reference             | Data Type | Total Images | Method(s)                                                | Accuracy
Sethy et al. [11]     | X-ray     | 381          | ResNet50 + SVM                                           | 95.33%
Khan et al. [12]      | X-ray     | 1,251        | CoroNet                                                  | 89.60%
Nishio et al. [14]    | X-ray     | 1,248        | VGG16                                                    | 83.60%
Ozturk et al. [15]    | X-ray     | 1,127        | DarkCovidNet                                             | 87.02%
Tang et al. [16]      | X-ray     | 15,477       | EDL-COVID                                                | 95%
Öksüz et al. [17]     | X-ray     | 2,905        | Ensemble-CVDNet                                          | 98.30%
Chowdhury et al. [20] | X-ray     | 3,487        | DenseNet201                                              | 97.94%
Bhardwaj et al. [18]  | X-ray     | 10,046       | InceptionV3, DenseNet121, InceptionResNetV2 and Xception | 92.36%
Proposed Model        | X-ray     | 21,240       | Ensemble (InceptionResNetV2, InceptionV3, MobileNet)     | 99.62%
Most of the existing works have concentrated on a small number of COVID-19 pneumonia chest X-ray images, and the available datasets are largely small; there have been few studies on large datasets or amalgamations of small datasets. Our proposed model works with the amalgamation of three datasets containing a total of 21,240 samples, including the highest number of COVID-19 samples (5,558) compared to previous studies. Few researchers performed analyses and experiments to ensure model robustness, which is a key aspect of this type of work; our models have been trained and tested in different environments for robustness purposes. In the first two scenarios, we implemented different splits on the same dataset (Dataset 1), where InceptionResNetV2 and Xception performed the best. We then tried combinations of two different datasets to create two more environments, the third and fourth scenarios; in both, DenseNet201 performed the best. In the fifth scenario, we combined all the datasets and created a Merged dataset consisting of the highest number of samples compared to the previous datasets. The models performed the best in this environment, with the highest accuracy being 99.42% and the lowest 98.67%. We can conclude that if the models are developed on a massive number of images in different scenarios before being tested, they will perform better at correctly identifying the images. The more the models are trained and tested in various settings, the more we can see how they behave, and eventually we can choose one that will work better in future circumstances. The average accuracy
of each model is taken over the five scenarios. DenseNet201 excelled among all the models with an average accuracy of 97.25%. As the models performed the best in the fifth scenario with the Merged dataset, we selected the 3 best-performing models from that environment (InceptionResNetV2, InceptionV3, and MobileNet) and applied a voting ensemble to them in order to establish a more robust and effective model. It proved to operate the best, with an accuracy of 99.62%, achieving 99.87% in identifying normal cases, 99.39% in identifying Non COVID-19 pneumonia cases and 99.12% in identifying COVID-19 pneumonia cases. When clinical criteria are taken into account, it can often be challenging to separate COVID-19 pneumonia from ordinary pneumonia, and a weary expert may make more mistakes when making these assessments and conclusions. A proper decision support system can function as a radiologist's helper, saving valuable time and relieving the radiologist of the load of deciphering countless chest X-ray images. Our recommended model can be employed for pre-screening of the X-rays: a website could be built where patients upload their X-ray images and immediately learn about their condition before going to the doctor; depending on the severity of the findings, they can then go to the doctor. Our study does have certain limitations. Only chest X-rays were used to evaluate our model; we did not use CT scans, and working with CT scans might be even more successful with this suggested model. We tested our suggested model using open datasets, but clinical data should also be used to test its robustness. Radiologists' approval and clinical usefulness were not obtained. This study did not attempt the sub-classification of COVID-19 into mild, moderate, or severe disease because there was insufficient data available. Although we worked with a number of COVID-19 pneumonia samples that is high in comparison with previous works, there is still scope for working with a much larger amount of data. The future goal of this research is to work with larger datasets, particularly consisting of CT scans. This work also lacks data augmentation on the collected datasets and feature extraction, and we hope to classify the severity of the disease into mild, moderate and severe categories.
5 Conclusion
We aimed to build a durable and efficient model in this study by testing different TL models in various situations. Experiments show that merging numerous datasets improves the model's performance significantly. For COVID-19 pneumonia identification utilizing X-ray images, an ensemble model based on three TL models is proposed. For multi-class cases, our suggested ensemble model had classification accuracy, precision, recall, and f1-score of 99.62%, 1, 0.99, and 1.00, respectively. It accurately identified 99.87% of normal cases, 99.39% of Non COVID-19 pneumonia, and 99.12% of COVID-19 pneumonia. The high accuracy of this machine screening aid can significantly increase COVID-19 diagnosis speed and accuracy. We believe that the strategy suggested in this paper will be beneficial to doctors and medical specialists.
References
1. Biology, P.: What is pneumonia? https://www.bumc.bu.edu/pneumonia/background/what/. Accessed 18 Apr 2022
2. Pham, T.D.: Classification of Covid-19 chest X-rays with deep learning: new models or fine tuning? Health Inf. Sci. Syst. 9(1) (2021)
3. Seladi-Schulman, J.: Coronavirus and pneumonia: Covid-19 pneumonia symptoms, treatment (2020). https://www.healthline.com/health/coronavirus-pneumonia. Accessed 18 Apr 2022
4. WebMD: Pneumonia and coronavirus. https://www.webmd.com/lung/covid-and-pneumonia1. Accessed 18 Apr 2022
5. AL Association: Learn about pneumonia. https://www.lung.org/lung-health-diseases/lung-disease-lookup/pneumonia/learn-about-pneumonia. Accessed 26 July 2022
6. Mayo: Pneumonia symptoms and causes. https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204. Accessed 18 Apr 2022
7. Stephen, O., Sain, M., Maduh, U.J., Jeong, D.U.: An efficient deep learning approach to pneumonia classification in healthcare. J. Healthcare Eng. 2019 (2019)
8. Cheng, Z., et al.: Clinical features and chest CT manifestations of coronavirus disease 2019 (Covid-19) in a single-center study in Shanghai, China. Am. J. Roentgenol. 215(1), 121–126 (2020)
9. Liu, C., Wang, X., Liu, C., Sun, Q., Peng, W.: Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning. Biomed. Eng. Online 19(1), 1–14 (2020)
10. Ibrahim, A.U., Ozsoz, M., Serte, S., Al-Turjman, F., Yakoi, P.S.: Pneumonia classification using deep learning from chest X-ray images during Covid-19. Cogn. Comput. 1–13 (2021)
11. Sethy, P.K., Behera, S.K.: Detection of coronavirus disease (Covid-19) based on deep features (2020)
12. Khan, A.I., Shah, J.L., Bhat, M.M.: Coronet: a deep neural network for detection and diagnosis of Covid-19 from chest X-ray images. Comput. Methods Programs Biomed. 196, 105581 (2020)
13. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (Covid-19) using X-ray images and deep convolutional neural networks. Pattern Anal. Appl. 24(3), 1207–1220 (2021)
14. Nishio, M., Noguchi, S., Matsuo, H., Murakami, T.: Automatic classification between Covid-19 pneumonia, non-Covid-19 pneumonia, and the healthy on chest X-ray image: combination of data augmentation methods. Sci. Rep. 10(1), 1–6 (2020)
15. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Acharya, U.R.: Automated detection of Covid-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 121, 103792 (2020)
16. Tang, S., et al.: EDL-Covid: ensemble deep learning for Covid-19 case detection from chest X-ray images. IEEE Trans. Ind. Inf. 17(9), 6539–6549 (2021)
17. Öksüz, C., Urhan, O., Güllü, M.K.: Ensemble-CVDNet: a deep learning based end-to-end classification framework for Covid-19 detection using ensembles of networks. arXiv preprint arXiv:2012.09132 (2020)
18. Bhardwaj, P., Kaur, A.: A novel and efficient deep learning approach for Covid-19 detection using X-ray imaging modality. Int. J. Imaging Syst. Technol. 31(4), 1775–1791 (2021)
136
S. J. Seema and M. U. Ahmed
19. Afifi, A., Hafsa, N.E., Ali, M.A., Alhumam, A., Alsalman, S.: An ensemble of global and local-attention based convolutional neural networks for Covid-19 diagnosis on chest X-ray images. Symmetry 13(1), 113 (2021) 20. Chowdhury, M.E., et al.: Can AI help in screening viral and Covid-19 pneumonia? IEEE Access 8, 132665–132676 (2020) 21. Rahman, T., et al.: Exploring the effect of image enhancement techniques on Covid19 detection using chest X-ray images. Comput. Biol. Med. 132, 104319 (2021) 22. Chen, Z.H.: Mask-RCNN detection of Covid-19 pneumonia symptoms by employing stacked autoencoders in deep unsupervised learning on low-dose high resolution CT (2020). https://doi.org/10.21227/4kcm-m312 23. Alqudah, A.M.: Augmented Covid-19 X-ray images dataset (2020) 24. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009) 25. Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016). https://doi.org/10.1186/s40537-016-0043-6 26. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016) 27. Fei-Fei, L., Deng, J., Li, K.: ImageNet: constructing a large-scale image database. J. Vis. 9(8), 1037 (2009) 28. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 29. Wu, Z., Shen, C., Van Den Hengel, A.: Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn. 90, 119–133 (2019) 30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 31. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 32. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) 33. Dong, X., Yu, Z., Cao, W., Shi, Y., Ma, Q.: A survey on ensemble learning. Front. Comput. Sci. 14(2), 241–258 (2020)
Infection Segmentation from COVID-19 Chest CT Scans with Dilated CBAM U-Net

Tareque Bashar Ovi(B), Md. Jawad-Ul Kabir Chowdhury, Shaira Senjuti Oyshee, and Mubdiul Islam Rizu

Department of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
[email protected]
Abstract. The novel coronavirus illness (COVID-19) is a highly contagious virus that has swept around the world and has presented a serious threat to every country's economy and public health. It has been demonstrated that COVID-19 can be accurately diagnosed with computed tomography (CT) scans that automatically partition infected areas. However, accurate segmentation continues to be a difficult task due to the lack of pixel-level annotated medical images. For the automatic segmentation of distinct COVID-19 infection zones, a Convolutional Block Attention U-Net with additional dilated blocks is proposed in this study. The suggested architecture for the automatic segmentation of COVID-19 chest CT images with a dual attention mechanism and dilated block performs remarkably well in experiments, reaching an IoU score of 89.0% and a dice score of 90.2%. The proposal provides a novel, promising approach for quantitative COVID-19 detection utilizing CT scans of lung infection by overcoming the aforementioned issues.

Keywords: COVID-19 · Segmentation · Chest CT · U-Net · CBAM · Dilated Convolution Block
1 Introduction

Novel Coronavirus disease, or COVID-19, was recognized in December 2019. As of 18 May 2022, it has infected more than 524 million people worldwide, of which more than 494 million have recovered and close to 6.3 million have died [1]. Being such a deadly disease, its early diagnosis can be deemed crucial for a sound recovery. As a method of diagnosing COVID-19, reverse transcriptase-polymerase chain reaction (RT-PCR) testing exists; however, its sensitivity ranges only from 42% to 71% [2]. On the other hand, chest CT images have shown 97% sensitivity for the diagnosis of COVID-19 [3], and chest CT images have been found to be sensitive even before clinical symptoms were exhibited by patients [4]. Therefore, the field of accurate segmentation of chest CT images has been studied extensively in recent times. Automatic segmentation of infectious parts in a chest CT image is quite challenging due to how low the contrast is among parts of the image,
and also due to how varied the position and shape of the region of interest can be [5]. However, it has been found that machine learning-based approaches perform quite well for challenges like this [6, 7]. In particular, it has been shown that U-Net-based approaches work quite well for most medical image segmentation tasks [8]. Thus, a novel technique based on a U-Net that comprises convolutional block attention modules (CBAM) and a dilated convolutional block is examined in this study. The dilated convolutional block, utilized as a link between the encoder and the decoder of the U-Net, helps retain an enhanced amount of information for the decoder to work with. The CBAM collects important information and suppresses extraneous information along both the channel and spatial axes. To optimize the dual attention process, image enhancement using Contrast Limited Adaptive Histogram Equalization (CLAHE) has been combined with normalizing and cropping. In the rest of the paper, first, a summary of some relevant works can be found in the literature review section. Then, a detailed discussion of the proposed model architecture is included in the methodology section. The experimental study and performance assessment follow in the result and analysis section, and finally, the conclusion, complete with a discussion and summary of the work, can be found at the end of the paper.
2 Literature Review

In this section, three relevant past contributions are reviewed. Chen et al. [9] proposed a novel deep learning approach, which adds both a residual network and an attention mechanism to a ResNeXt block-based U-Net. The model achieves 89% accuracy, 94% dice similarity coefficient (DSC), and 95% precision with augmentation. According to the authors, in multi-class segmentation the model shows an improvement of over 10% when compared to a plain U-Net under certain conditions. Zhao et al. [10] proposed D2A U-Net, a dual attention U-Net with a ResNeXt-50-based encoder for image segmentation. It also utilizes hybrid dilated convolution to boost its performance. The dual attention component, with gate attention and decoder attention modules, refines feature maps, which helps produce much-improved feature representations. D2A U-Net achieves a dice score of 72.98%, a recall score of 70.71%, and a pixel error of 0.0311. Again, Wang et al. [11] investigated the transferability of a model's segmentation capability by including non-COVID-19 datasets in the training of a 3D U-Net. Their test showed a dice similarity coefficient (DSC) of 70.4%, a normalized surface distance of 0.735, a sensitivity of 68.2%, an F1-score of 70.7%, an accuracy of 99.4%, and a Matthews correlation coefficient (MCC) of 0.716, and reported better generalization. The over-fitting risks for segmenting COVID-19 infections are also observed to be lower in this approach. Therefore, considering it all, a model has been proposed that can achieve superior metrics without the need for non-COVID-19 datasets, based upon an approach that has shown promising performance in the cited works.
3 Methodology

In this section, first, a brief dataset description and preprocessing details are given. Then, the overall structure of the proposed model, with a brief description of the dilated convolution module and the backbone network, U-Net, is included. Sequentially, a brief description of the convolutional block attention module (CBAM) is introduced. Finally, the summarized model architecture is illustrated in Fig. 4.

3.1 Dataset Description and Preprocessing

The dataset contains 20 CT scans of individuals who have been given a COVID-19 diagnosis, and it also includes expert segmentation of infections in the lungs. To use this dataset, the following steps are performed, a process inspired by [18]:

1. Slicing
2. Enhancement
3. Normalizing
4. Cropping by finding boundaries and contours
After slicing, 301 slices are collected. Enhancement is then done using Contrast Limited Adaptive Histogram Equalization (CLAHE). A few results are given in Fig. 1.
Fig. 1. Samples After CLAHE Enhancement.
After equalization, normalization is done by dividing each pixel by the maximum pixel value. Cropping is then done using a 3 × 3 kernel, a 2D filter, and a binary threshold, as sketched below.
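A minimal sketch of this preprocessing pipeline using OpenCV is given below; the CLAHE clip limit and tile size, and the use of Otsu's method for the binary threshold, are assumptions, since only the kernel size and the general steps are stated above.

```python
import cv2
import numpy as np

def preprocess_slice(ct_slice):
    """Enhance, normalize, and crop a single grayscale CT slice (uint8)."""
    # 1) Contrast Limited Adaptive Histogram Equalization (CLAHE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(ct_slice)

    # 2) Normalize pixel intensities by the maximum pixel value
    normalized = enhanced.astype(np.float32) / enhanced.max()

    # 3) Crop: smooth with a 3x3 2D filter, binarize, and keep the
    #    bounding box of the largest contour
    blurred = cv2.filter2D(enhanced, -1, np.ones((3, 3), np.float32) / 9.0)
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return normalized[y:y + h, x:x + w]
```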
Fig. 2. Samples After Enhancement and Cropping (left) and Final Sample (right).
Figure 2 (left) shows the state of our input image after enhancement and cropping. Finally, the input image and ground truth are shown in Fig. 2 (right). We then performed the augmentations listed in Table 1; a code sketch follows the table.

Table 1. Augmentation Parameters

Feature             Value
Shear range         0.2
Zoom range          0.2
Horizontal flip     True
Rescale             1./255
Vertical flip       True
Width shift range   0.2
Rotation range      15
Height shift range  0.2
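The feature names in Table 1 map directly onto arguments of Keras' ImageDataGenerator, so the augmentation can be sketched as follows (assuming that implementation):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings taken verbatim from Table 1
datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1. / 255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)
```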
3.2 Model Overview

The proposed model uses a U-Net architecture with four residual convolutional blocks to encode, or down-sample, the input data and extract spatial features.
Infection Segmentation from COVID-19 Chest CT Scans
141
The novelty of the proposed approach lies in the 5th block of the encoder network. It is a dilated module, which effectively expands the area covered by the kernel by skipping a set number of pixels between the pixels sampled by the kernel. A normal convolutional layer is comparable to a dilated convolutional layer with a dilation rate of 1; if the dilation rate is set to 2, one pixel is skipped between each sampled pixel in the input sampling field. Due to this, the dilated convolutional layer covers a larger area of its input to perform convolution; concatenating the results of multiple dilated convolutional layers with varying dilation rates creates an output that raises the chances of identifying the region of interest. After the completion of the encoder network, the model moves on to its decoder network, consisting of 4 blocks. Finally, the Sigmoid function is used as the activation function for a 1 × 1 convolutional layer to extract the output from the model. The complete structure of the proposed architecture is given in Fig. 3, and a sketch of the dilated module follows the figure.
Fig. 3. U-Net Model Architecture.
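A minimal Keras sketch of such a dilated bridge module; the specific dilation rates are an assumption, since only the mechanism and the 128-filter size are stated in the text.

```python
from tensorflow.keras import layers

def dilated_module(x, filters=128):
    """Bridge between encoder and decoder: parallel dilated convolutions
    whose outputs are concatenated to widen the receptive field."""
    branches = [
        layers.Conv2D(filters, 3, padding="same",
                      dilation_rate=rate, activation="relu")(x)
        for rate in (1, 2, 4, 8)  # dilation rates assumed for illustration
    ]
    return layers.Concatenate()(branches)
```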
3.3 U-Net Architecture

U-Net is a type of fully convolutional network [12], proposed by Ronneberger et al. [7]. It is a type of artificial neural network (ANN) primarily made of sets of convolutional layers. These layers make up the blocks used in the encoder, or down-sampling network, while a set of deconvolutional layers makes up the blocks used in the decoder, or up-sampling network. The entire architecture is symmetric due to the structuring of its encoder and decoder networks. The encoder-decoder mechanism of the proposed model is given in Fig. 4.
Fig. 4. Encoder Block (Top) and Decoder Block (Bottom).
The encoder, designed to extract spatial features from its input, contains a sequence of blocks, each of which contains two sub-blocks. The first, called the convolutional block, contains a residual connection layer and, in parallel, two branches: one with a single convolutional layer and the other with three of the same. The two branches are then concatenated, and from that, two branches sprout out again; this time one branch contains a single convolutional layer and the other contains two of the same. Finally, the residual layer and the output of the branches are concatenated, and the output goes through a convolutional block attention module (CBAM). All of the convolutional layers used in these blocks use a kernel size of 3 × 3, except for the residual connection layer, which uses a kernel of size 1 × 1. Every layer uses ReLU activation, except for the CBAM block, which uses Sigmoid activation. The convolutional block is depicted in Fig. 5, and a sketch follows it. The ReLU and Sigmoid functions are defined as:

ReLU: $f(x) = \max(0, x)$

Sigmoid: $f(x) = \dfrac{1}{1 + e^{-x}}$
Fig. 5. The architecture of Convolutional Block (used in both encoder and decoder blocks).
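Under the description above, the convolutional block can be sketched in Keras as follows; `cbam` refers to the attention helper sketched in Sect. 3.4 below, and the exact wiring of the branches is our reading of Fig. 5, not code published by the authors.

```python
from tensorflow.keras import layers

def conv_block(x, filters):
    """Residual convolutional block of Fig. 5: two stages of parallel
    branches, a 1x1 residual path, and a CBAM at the end."""
    conv = lambda t: layers.Conv2D(filters, 3, padding="same",
                                   activation="relu")(t)
    residual = layers.Conv2D(filters, 1, padding="same",
                             activation="relu")(x)

    # Stage 1: one convolutional layer in parallel with three stacked ones
    merged = layers.Concatenate()([conv(x), conv(conv(conv(x)))])

    # Stage 2: one convolutional layer in parallel with two stacked ones
    out = layers.Concatenate()([residual, conv(merged), conv(conv(merged))])
    return cbam(out)  # channel + spatial attention (see Sect. 3.4)
```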
In the end, the output from the CBAM block goes through the second sub-block of an encoder block, which contains, in parallel, a max-pooling layer and an average-pooling layer, both with a pool size of 2 × 2, whose outputs are concatenated before leaving the encoder block. Four of these encoder blocks are used in series in the encoder network. The filter number for the first block is 16, and the number of filters doubles at every block. The encoder network and the decoder network are then conjoined using a dilated module with a filter size of 128, the same as the filter size used in the last encoder block and the first decoder block. The decoder is designed to create the segmented feature map from the spatial features retrieved by the encoder. It contains another sequence of blocks, each of which starts with a transpose convolutional layer, followed by a concatenation layer that merges in the feature information retrieved from the convolutional layer of the corresponding encoder block: the first decoder block concatenates information from the last encoder block, and the last decoder block concatenates information from the first encoder block. Finally, a decoder block is concluded with a convolutional block similar to the one used in the encoder network. Four of these decoder blocks are used
in the decoder network. The filter size is halved at every block, and a 2 × 2 kernel is used with (2, 2) strides. Finally, the model outputs through a convolutional block with a filter size of 1, a 1 × 1 kernel, and Sigmoid as its activation function, as only binary classification is needed.

3.4 Convolutional Block Attention Module (CBAM)

Proposed by Woo et al. [13], the convolutional block attention module (CBAM) tries to emphasize impactful features along two primary dimensions, the channel and spatial axes, instead of integrating cross-channel data and spatial data together as in a regular convolutional block. The concept of CBAM is shown in Fig. 6.
Fig. 6. Convolutional Block Attention Module (CBAM) Architecture.
CBAM, as shown in Fig. 6, runs data sequentially through a channel attention module first, and then a spatial attention module. This gives the processing a clear and focused path along both the channel and spatial axes, which, within the network, allows necessary information to be emphasized and unnecessary details to be suppressed; a sketch is given after Fig. 7. The proposed work is summarized in Fig. 7:
Fig. 7. Workflow.
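A minimal Keras sketch of a CBAM block following Woo et al. [13]; the channel-reduction ratio and the 7 × 7 spatial kernel are conventional defaults, not values stated in this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam(x, ratio=8):
    """Channel attention followed by spatial attention (Woo et al. [13])."""
    channels = x.shape[-1]

    # Channel attention: shared MLP over global average- and max-pooled maps
    shared = [layers.Dense(channels // ratio, activation="relu"),
              layers.Dense(channels)]
    def mlp(t):
        for layer in shared:
            t = layer(t)
        return t
    ca = layers.Activation("sigmoid")(layers.Add()([
        mlp(layers.GlobalAveragePooling2D()(x)),
        mlp(layers.GlobalMaxPooling2D()(x))]))
    x = layers.Multiply()([x, layers.Reshape((1, 1, channels))(ca)])

    # Spatial attention: a 7x7 convolution over channel-wise mean/max maps
    avg_map = layers.Lambda(lambda t: tf.reduce_mean(t, -1, keepdims=True))(x)
    max_map = layers.Lambda(lambda t: tf.reduce_max(t, -1, keepdims=True))(x)
    sa = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        layers.Concatenate()([avg_map, max_map]))
    return layers.Multiply()([x, sa])
```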
The training parameters of our model are given in Table 2; they were fine-tuned through trial and error.
Table 2. Training Parameters

Name of the Hyper-parameter    Parameter Value
Epochs                         350
Batch Size                     64
Output Layer Activation        Sigmoid
Optimizer                      Adam (epsilon = 0.1)
Learning rate                  0.05
Decay Rate                     1.43e-4
Total Number of Parameters     7,755,445
Trainable Parameters           7,747,765
Non-Trainable Parameters       7,680
4 Result and Analysis

To evaluate our model, we have used the following evaluation metrics:

1. Dice Coefficient: The Dice coefficient is applied to assess the pixel-wise agreement between a predicted segmentation and the corresponding ground truth. It is described as:

$\text{Dice Coefficient} = \dfrac{2TP}{2TP + FP + FN}$
Figure 8(a) depicts the dice coefficient curve, and from the curve it is evident that changes to the value become very minute after a little over 100 epochs.

2. Loss: After each optimization iteration, a model's performance is shown by its loss value. Loss functions are used to assess how closely an estimated value resembles the ground truth value. Validation loss peaks at a little over 50 epochs, as seen in Fig. 8(b).

3. Sensitivity: Sensitivity is the measure of how well a model can predict the true positives for each available class. This concept can be formulated as:

$\text{Sensitivity} = \dfrac{TP}{TP + FN}$
Figure 8(c) depicts the sensitivity graph, and it is clear that after 100 epochs the sensitivity does not change by any appreciable amount.

4. Specificity: It is a metric that evaluates the ability of a model to estimate the true negatives of the available categories. The equation for this concept is:

$\text{Specificity} = \dfrac{TN}{TN + FP}$
Figure 8(d) depicts the specificity over epochs curve, and the curve demonstrates the potential of the proposed model, as it gets close to 100 from nearly the beginning of training.
Fig. 8. Evaluation Metrics: (a) Dice Coefficient; (b) Loss vs Epoch; (c) Sensitivity; (d) Specificity vs Epoch; (e) IoU vs Epoch; (f) Accuracy vs Epoch; (g) BCE Dice Loss vs Epoch; (h) Matthews Correlation vs Epoch; (i) Precision vs Epoch; (j) Recall vs Epoch.
5. Accuracy: The proportion of accurately predicted data points among all the data points is known as accuracy. The formula for this concept is:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
Figure 8(f) depicts the accuracy curve, and again the proposed model proves its capability, as it reaches nearly 100% accuracy after about 20 epochs.

6. Intersection over Union (IoU): IoU is a typical metric for comparing the accuracy of a proposed image segmentation to a known ground-truth segmentation. The formula for IoU is:

$\text{IoU} = \dfrac{TP}{TP + FP + FN}$
Figure 8(e) depicts the IoU curve; after 200 epochs, the IoU does not improve appreciably.

7. BCE Dice Loss: BCE Dice loss combines the binary cross entropy between the target and the output with the dice loss. Figure 8(g) shows the BCE dice loss curve, and from the curve it is apparent that the score does not change appreciably after 200 epochs.

8. Matthews Correlation (MCC): MCC is used for statistical model evaluation. It measures the difference between the estimated values and the real values, and is comparable to chi-square statistics for a 2 × 2 contingency table. Figure 8(h) shows the MCC over epochs curve, and from the curve it is clear that MCC does not increase by any appreciable amount after 200 epochs.

9. Precision: The proportion of accurately categorized positive samples (true positives) to the total number of positively classified samples is known as precision. Precision measures the model's accuracy in classifying a sample as positive:

$\text{Precision} = \dfrac{TP}{TP + FP}$
From Fig. 8(i), which depicts the precision curve, it is evident that the precision does not change appreciably after 150 epochs.

10. Recall: Recall is determined as the proportion of positive samples that were correctly identified as positive to all positive samples. Recall gauges how well the model can identify positive samples; the more positive samples that are identified, the larger the recall. The mathematical intuition behind this concept is as follows:

$\text{Recall} = \dfrac{TP}{TP + FN}$
From Fig. 8(j), which shows the recall curve, it is evident that the recall score settles down after 100 epochs. These metrics are sketched in code below.
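A minimal NumPy sketch of the pixel-wise metrics defined above, computed from binary prediction and ground-truth masks (guards against division by zero are omitted for brevity):

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-wise TP, FP, TN, FN for binary masks of 0s and 1s."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    tn = np.sum(~pred & ~truth)
    fn = np.sum(~pred & truth)
    return tp, fp, tn, fn

def dice(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)

def iou(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)

def sensitivity(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return tp / (tp + fn)

def specificity(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return tn / (tn + fp)
```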
Fig. 9. Model performance visualized using test data.
4.1 Visual Results

Figure 9 reflects how the proposed model performs on some test data. The first column consists of original CT images, the middle column shows the annotation made on the input images, and the last column shows the prediction made by the proposed model on the same input image. From the figures, it can be seen that the predictions made by the proposed model are quite similar to the annotations, and in some cases the similarity extends to the point of being nearly indistinguishable from the ground truth.

4.2 Performance Table

Comparative performance analysis of our model against the existing literature is depicted in Table 3. From the table, it can be concluded that the proposed model has outperformed the existing literature in every evaluation parameter.

Table 3. Performance Analysis

[14] Dataset: https://medicalsegmentation.com/covid19; Algorithm: MPS-Net; Accuracy: –; DSC: 0.8325; IoU: 0.742; Precision: –
[15] Dataset: sourced by author; Algorithm: novel CNN; Accuracy: –; DSC: 0.987 (lung), 0.726 (COVID-19); IoU: –; Precision: 0.99 (lung), 0.726 (COVID-19)
[11] Dataset: COVID-19 Dataset, MSD Lung Tumor, StructSeg Lung Cancer, NSCLC Pleural Effusion; Algorithm: attention-based selective fusion unit with a dedicated and modified encoder for dynamic feature extraction and grouping; Accuracy: 0.994; DSC: 0.704; IoU: –; Precision: –
[16] Dataset: Kaggle CXR public dataset; Algorithm: COVID-SSNet; Accuracy: –; DSC: –; IoU: 0.9971; Precision: 0.9953
[10] Dataset: https://medicalsegmentation.com/covid19/ and https://zenodo.org/record/3757476; Algorithm: D2A U-Net; Accuracy: –; DSC: 0.7298 (with ResNeXt-50 backbone); IoU: –; Precision: –
[17] Dataset: sourced by author; Algorithm: COVID-SegNet; Accuracy: –; DSC: 0.987 (lung), 0.726 (COVID-19); IoU: –; Precision: 0.99 (lung), 0.726 (COVID-19)
[9] Dataset: SIRM COVID Dataset; Algorithm: U-Net with residual and attention mechanism; Accuracy: 0.89; DSC: 0.94; IoU: –; Precision: 0.95
Ours (dataset: https://www.kaggle.com/datasets/andrewmvd/covid19-ct-scans):
  U-Net: Accuracy 0.84; DSC 0.875; IoU 0.67; Precision 0.85
  Link-Net: Accuracy 0.86; DSC 0.89; IoU 0.71; Precision 0.873
  CBAM U-Net with dilated block (proposed architecture): Accuracy 0.998; DSC 0.902; IoU 0.89; Precision 0.99
5 Conclusion and Future Work

According to recent research, CT imaging is now the most popular screening method for COVID-19, and it can aid the community in determining the severity of COVID-19 more promptly and properly. In this study, a dual-attention-based deep learning architecture for automated segmentation of COVID-19 infectious areas from CT images has been proposed, and it has been shown to be both plausible and superior to previous research. To enhance performance, a modified CBAM U-Net with a dilated block that employs an effective channel and spatial attention strategy has been presented. The performance table shows that the suggested model outperforms the previous method by 15% in terms of IoU. Recent studies have found that early COVID-19 detection is crucial: if the infection location in the chest CT image can be found early, patients have a greater chance of surviving. Radiologists now have a trustworthy and promising deep learning architecture for identifying COVID-19 and segmenting the infected lung areas. The proposed approach has the potential to be applied to a broader range of therapeutic applications in the future, such as assisting with the diagnosis of more diseases from CT
images. The quantity of ground truth data accessible in the case of a new disease such as the coronavirus is often limited due to the complexity of data collection and annotation, which restricts model performance to a broader extent. Since our validation IoU in Fig. 8(e) did not reach above 89 percent, our next objective is to raise it to 99 percent; preprocessing and adjusting the hyperparameters will be necessary. A semi-supervised generative model will be used to increase the capacity to address special problems. Future study should also focus on interpretability, which is crucial for medical applications. The attention techniques proposed in this article can induce some level of interpretation of the internal decision process, despite the fact that deep learning is well known for its lack of interpretability. This approach will continue to be developed in order to gain more scientific knowledge, and research into hybrid and multi-head attention models will also be conducted in order to provide the best possible semantic segmentation.
References

1. Coronavirus Update. https://www.worldometers.info/coronavirus/. Accessed 18 May 2022
2. Simpson, S., et al.: Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA. Radiol. Cardiothorac. Imaging 2(2) (2020)
3. Ai, T., et al.: Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (Covid-19) in China: a report of 1014 cases. Radiology 296(2) (2020)
4. Salehi, S., Abedi, A., Balakrishnan, S., Gholamrezanezhad, A.: Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. Am. J. Roentgenol. 1–7 (2020)
5. Shan, F., et al.: Lung infection quantification of Covid-19 in CT images with deep learning. arXiv preprint arXiv:2003.04655 (2020)
6. Shi, F., et al.: Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for Covid-19. arXiv preprint arXiv:2004.02731 (2020)
7. Shen, D., Wu, G., Suk, H.-I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017)
8. Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv preprint arXiv:1802.06955 (2018)
9. Chen, X., Yao, L., Zhang, Y.: Residual attention U-Net for automated multi-class segmentation of Covid-19 chest CT images. arXiv preprint arXiv:2004.05645 (2020)
10. Zhao, X., et al.: D2A U-Net: automatic segmentation of Covid-19 lesions from CT slices with dilated convolution and dual attention mechanism. arXiv preprint arXiv:2102.05210 (2021)
11. Wang, Y., et al.: Does non-COVID-19 lung lesion help? Investigating transferability in COVID-19 CT image segmentation. Comput. Methods Programs Biomed. 202, 106004 (2021)
12. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
13. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
14. Pei, H.-Y., Yang, D., Liu, G.-R., Lu, T.: MPS-Net: multi-point supervised network for CT image segmentation of COVID-19. IEEE Access 9, 47144–47153 (2021)
15. Yan, Q., et al.: COVID-19 chest CT image segmentation: a deep convolutional neural network solution. arXiv preprint arXiv:2004.10987 (2020)
16. Prakash, N.B., Murugappan, M., Hemalakshmi, G.R., Jayalakshmi, M., Mahmud, M.: Deep transfer learning for COVID-19 detection and infection localization with superpixel based segmentation. Sustain. Cities Soc. 75, 103252 (2021)
17. Yan, Q., et al.: COVID-19 chest CT image segmentation network by multi-scale fusion and enhancement operations. IEEE Trans. Big Data 7(1), 13–24 (2021)
18. COVID-19 lungs and infection segmentation baseline. https://www.kaggle.com/code/haksorus/covid19-lungs-inf-segmentation-baseline
Convolutional Neural Network Model to Detect COVID-19 Patients Utilizing Chest X-Ray Images

Md. Shahriare Satu1, Khair Ahammed2(B), Mohammad Zoynul Abedin3,4, Md. Auhidur Rahman2, Sheikh Mohammed Shariful Islam5, A. K. M. Azad6, Salem A. Alyami7, and Mohammad Ali Moni8

1 Department of Management Information Systems, Noakhali Science and Technology University, Noakhali, Bangladesh
[email protected]
2 Institute of Information Technology, Noakhali Science and Technology University, Noakhali, Bangladesh
[email protected]
3 International Business School, Teesside University, Middlesbrough, UK
4 Department of Finance and Banking, Hajee Mohammad Danesh Science and Technology University, Rangpur, Bangladesh
5 Institute for Physical Activity and Nutrition, Deakin University, Geelong, Australia
[email protected]
6 iThree Institute, Faculty of Science, University of Technology Sydney, Sydney, Australia
[email protected]
7 Department of Mathematics and Statistics, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
[email protected]
8 School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
[email protected]
Abstract. This study aims to propose a deep learning model and detect COVID-19 chest X-ray cases more precisely. We have merged all the publicly available chest X-ray datasets of COVID-19 infected patients from Kaggle and GitHub, and pre-processed them using random sampling. Then, we applied our proposed enhanced convolutional neural network (CNN) model to this dataset and obtained 94.03% accuracy, 95.52% AUC and 94.03% f-measure for detecting COVID-19 patients. We have also performed a comparative performance analysis between the proposed CNN model and several state-of-the-art classifiers, including support vector machine, random forest, k-nearest neighbor, logistic regression, Gaussian naïve Bayes, Bernoulli naïve Bayes, decision tree, XGBoost, multilayer perceptron, nearest centroid, perceptron, and deep neural network, as well as pre-trained models such as residual neural network 50, visual geometry group network 16, and inception network V3, where our model yielded outperforming results compared to all other models. While evaluating the
performance of our models, we have emphasized specificity along with accuracy to identify non-COVID-19 individuals more accurately, which may potentially facilitate the early detection of COVID-19 patients for preliminary screening, especially in under-resourced health infrastructures with insufficient PCR testing systems and facilities. This model could also be applicable to cases of other lung infections.

Keywords: COVID-19 · Chest X-ray Images · Machine Learning · Deep Learning · Convolutional Neural Network
1 Introduction
Introduction
Novel coronavirus disease (COVID-19) is an ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [30]. The first case of COVID-19 was believed to be detected at Wuhan, China in December 2019 and it had been spread rapidly throughout the world [11]. In has been reported that viruses from the Coronaviridae family were discovered in 1960s from the nasal pits of patients [17]. It is a massive infectious group that enveloped ribonucleic acid (RNA) viruses and generated different types of respiratory, hepatic and neurological diseases among humans and other mammals [33]. It contains a large family of viruses where some of them induced community transmissions, such as the middle east respiratory syndrome (MERS-CoV) and severe acute respiratory syndrome (SARS-CoV). The SARS-CoV, MERS-CoV and SARS-CoV-2 were reported to be originated from bats [18], but, however, SARS-CoV-2 has been found to have phylogenetic similarity with SARS-CoV [33]. It causes COVID-19 that makes the third coronavirus emergent condition after the past two decades, preceded by the SARS-CoV and MERS-CoV outbreak in 2002 and 2012, respectively. The World Health Organization (WHO) has announced this situation as a public health emergency of international concern on 30th January and declared the situation as a pandemic on 11th March, 2020 [9]. Moreover, WHO issued public health advises to maintain different precautions like keep social distancing, wash hand with soap and sanitizer, avoid touching nose, mouth, and eye etc. However, most of the affected countries underwent a completed locked-down to prevent the local transmission of COVID-19 infection in their regions. However, SARS-CoV-2 infected patients were commonly identified primarily with some common symptoms such as fever, cough, fatigue, loss of appetite, muscle pain etc. Hence, they were needed to identify, isolate and ensure the treatment policy at early stages. There are existing two types of procedures, namely i) molecular diagnostic tests, and ii) serologic tests [23]. The reverse transcription polymerase chain reaction (RT-PCR) test is a molecular diagnostic test, which is currently considered as the gold standard [4] that detects the viral RNA of SARS-CoV-2 from sputum or nasopharyngeal swab. Nevertheless, it is relatively associated with true positive rate and required specific equipment [4]. Another technique is currently under development that are explored virus proteins to identify COVID19 called viral antigen detection. When a patient has been recovered and needed
154
Md. S. Satu et al.
to test again, the molecular tests cannot detect this disease for longer periods as the growth of the antibody may be shown in its reaction to the host. Serologic test is another primary tool to verify antibodies in blood and diagnose patients. Due to lack of analysis and skilled human resources, these procedures are time consuming and sometimes unavailable for many people especially in low and middle income countries, which, therefore, demands an alternate but cheaper solution for early diagnostics of COVID-19 inflections fast as possible. Recently, medical images such as chest X-ray and computed tomography (CT) scan images have been used to determine COVID-19 positive cases [21]. But, CT scan imaging is a costly procedure and not available in every hospital or medical centers. Alternatively, chest X-ray scanning machines are found in almost all the nearest clinic, medical lab or hospitals rather than biomolecular laboratory test. So, it is cheaper, faster and widely-used way to generate 2D images of the patients [16], that can potentially be used for COVID-19 patients as well. Moreover, radiologists have used these images to explore its pathology and detect relevant diseases. Most of the existing works were implemented machine learning algorithms into medical images for detecting COVID-19 patients and focused on how classifiers were adopted to identify positive cases, but failed to detect false negative cases that cause more community transmission of COVID19, requiring more stringent attention on the specificity measure of thoe model predictions. In this study, we proposed a convolutional neural network (CNN) to investigate chest X-ray images and identify COVID-19 patients in early stage more precisely with higher specificity, that may aid public health systems to reduce the local community transmission rate. This paper is organized as follows: Sect. 2 provides some related works about chest X-ray image analysis. Then, Sect. 3 describes about working dataset and step-by-step procedure of how we can analyze it using machine learning (ML) and deep learning (DL) models. Section 4 shows the experimental outcomes and Sect. 5 explains the performance of this work. Finally, Sect. 6 concludes this work by providing some future research directions.
2
Literature Review
Several studies have reported the use of medical images of COVID-19 infection for further investigation using various machine and deep learning methods. [31] generated a large benchmark dataset with 13,975 chest X-ray images called COVIDx and investigated them using deep learning model that showed 93.30% accuracy. [1] proposed DeTraC deep CNN model that gave solution by transferring knowledge from generic object recognition to domain-specific tasks. Their algorithm showed 95.55% accuracy (specificity of 91.87%, and a precision of 93.36%). [3] implemented transfer learning using CNNs into a small medical image dataset (1427 X-ray images), which provided highest 96.78% accuracy, 98.66% sensitivity, and 96.46% specificity respectively. [13] proposed a deep CNN architecture called COVIDX-Net that investigated 50 chest X-ray images with 25 COVID-19 cases and provided 90% accuracy and 91% F-score. [19] proposed
Convolutional Neural Network Model
155
a deep learning framework based on 5000 images named COVID-Xray-5k where they applied ResNet18, ResNet50, SqueezeNet and Densenet-121 into them and produced sensitivity 97.5% and specificity 90% on average. [10] used transfer learning based VGG16 model into chest x-ray images which showed 94.5% accuracy, 98.4% sensitivity and 98% specificity. Again, [15] represented a deep neural network based on Xception named CoroNet that provided 89.6% accuracy for four class and 95% accuracy for three class images. [5] used two-phase classification approach that extracted majority vote based ensemble classifier and showed 91.03% accuracy to detect COVID-19 from pneumonia. Also, [12] used several deep learning approaches, namely deep feature extraction with SVM, fine tuning pre-trained CNN, and end-to-end trained CNN model, which classified COVID19 and normal chest X-ray images. [14] proposed a customized CNN with distinctive filter learning module that shows 97.94% accuracy and 96.90% F1-score for predicting four classes respectively. [22] proposed an automatic detection model where MobileNet with the SVM (linear kernel) provides 98.5% accuracy and an F1-score and DenseNet201 with MLP shows 95.6% accuracy and an F1-score for COVID-19 infection based on chest X-ray images. [20] investigated 1616 chest X-ray images using DenseNet161 where it shows 79.89% accuracy to classify normal, pathological and COVID-19 patients. [8] represented a custom CNN based model named COVID-XNet that shows 94.43% average accuracy, 98.8% AUC, 96.33% sensitivity, and 93.76% specificity respectively. [29] provided a siamese neural network called MetaCOVID to integrate contrastive learning with a finetuned pre-trained ConvNet encoder and capture unbiased feature representations using 10-shot learning scores and compared among the meta learning algorithm with InceptionV3, Xception, Inception, ResNetV2, and VGG16. [28] proposed fusion model hand-crafted with deep learning features (FM-HCF-DLF) that used multi-layer perceptron (MLP) and InceptionV3 where MLP generated 94.08% accuracy.
3
Materials and Methods
The working methodology has been used to detect COVID-19 patients from the publicly available datasets. This approach is described briefly as follows: 3.1
Data Collection
The primary chest X-ray images have been obtained from the COVID-19 Radiography Database [6]. It contained 1,341 normal, 1,345 viral pneumonia, and 219 COVID-19 patient’s images, which have been taken as primary dataset. However, the distribution of different types of images was not the same. To balance this dataset, we collected 66 images from [7] and added them with COVID-19 images of primary dataset. For other classes (normal and pneumonia), a random under-sampling method has been used and generated balanced instances of each class. Finally, this experimental dataset had been contained 285 normal, viral pneumonia and COVID-19 images respectively.
156
3.2
Md. S. Satu et al.
Data Pre-processing
In this step, we normalized training set into grayscale images. Then, all baseline classifiers have been implemented with transformed dataset respectively. But, pre-trained CNN models such as VGG16, ResNet50, InceptionV3 cannot support grayscale images, hence we directly employed them into primary dataset. 3.3
Proposed Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a special class of artificial neural network (ANN) that manipulates an input layer along with a sequence of hidden and output layers. It maintains sparse connections between layers, with weights shared with output neurons in the hidden layers. Like a regular ANN, a CNN contains a sequence of hidden layers, denoted as convolutional and pooling layers; the operations of these layers are called the convolution and pooling operations, respectively. They are stacked to lead into a series of fully connected layers followed by an output layer. In many research fields including image recognition, object detection, semantic segmentation and medical image analysis, CNN models yield considerably higher performances compared to the state of the art (Fig. 1).
Fig. 1. Proposed Convolutional Neural Network
Convolutional Layer. The convolution layer is the core structure of a CNN, performing the convolution operation (represented by $*$) instead of general matrix multiplication. This layer accomplishes most of the computations of the CNN model. The number of filters, the size of the local region, the stride, and the padding are the hyper-parameters of this layer. Convolution layers extract and learn features using these filters; hence this is known as the feature extraction layer. Parameters are shared, as the same filter is traversed across the whole image for a single feature. The main objective of this layer is to identify common features of the input images and map their appearance to the feature map. The convolution operation is given as:

$F(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n) \quad (1)$
To introduce non-linearity, the output of each convolutional layer is fed to an activation function. Numerous activation functions are available, but the Rectified Linear Unit (ReLU) is widely used in the deep learning field. It is mathematically calculated as follows:

$f(x) = \max(0, x) \quad (2)$
In this model, we have used fewer layers and filters: it consists of two convolutional layers, gradually increasing the number of filters from 32 to 64, with an input image of size 100 × 100 whose pixel values lie between 0 and 1. In the first convolutional layer, this image is convolved with a 3 × 3 kernel for 32 filters, producing a 100 × 100 × 32 feature map. Subsequently, this output is forwarded to the second convolutional layer, where a 3 × 3 kernel for 64 filters is convolved with the 100 × 100 × 32 extracted features, and a 100 × 100 × 64 sized output feature map is produced in this layer.

Pooling Layer. In a CNN, the sequence of convolution layers is followed by an optional pooling or down-sampling layer to lessen the volume of input images and the number of parameters. This layer computes fast and precludes over-fitting. The most common pooling technique is called Max Pooling, which merely takes the highest value of the input region. Other pooling options are average pooling and sum pooling. Two hyper-parameters are essential for the pooling layer, namely filter size and stride. In this model, we apply a 2 × 2 filter to the 100 × 100 × 64 sized output feature map and create a 50 × 50 × 64 reduced feature map.

Flatten Layer. After the pooling layer, a flatten layer is employed to flatten the entire network. It converts the entire pooled feature map matrix into a single column.
Dense Layer. Then, we have implemented three dense layers, also known as fully connected layers. Here, the input from the previous layers is flattened from a matrix into a vector and forwarded onward as in a regular neural network. This layer views the output of the past layers and decides which features mostly match each individual class; therefore, a fully connected layer can yield accurate probabilities for the different classes. The outputs are classified using the activation function at the output layer, which in our case was the Softmax function, calculating the probability of a particular class $k$ as defined by the following equation:

$Z^{k} = \dfrac{e^{x^{k}}}{\sum_{n=1}^{N} e^{x^{n}}} \quad (3)$
Dropout Layer. When a large feed-forward neural network is trained on a small training set, it usually shows poor performance on held-out test data, and dropout is a useful procedure to mitigate this problem. In our model, we used a dropout layer after each dense layer to reduce over-fitting by preventing complex co-adaptations on the training data. A sketch of the full model follows.
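Putting the layers above together, the proposed network can be reconstructed in Keras as follows; the dropout rates are an assumption, while the layer sizes follow the text and Table 1 below (the parameter counts match a grayscale 100 × 100 input).

```python
from tensorflow.keras import layers, models

# Reconstruction of the 9-layer model summarized in Table 1;
# dropout rates are assumed, everything else follows the text
model = models.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(100, 100, 1)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                       # 50 * 50 * 64 = 160,000 features
    layers.Dense(120, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(60, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # normal / pneumonia / COVID-19
])
```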
3.4 Baseline Classifiers
Several machine learning classifiers have been used to perform comparative performance assessments, such as support vector machine (SVM), random forest (RF), k-nearest neighbor (k-NN), logistic regression (LR), Gaussian naïve Bayes (GNB), Bernoulli naïve Bayes (BNB), decision tree (DT), XGBoost (XGB), multilayer perceptron (MLP), nearest centroid (NC) and perceptron. While training with the pre-processed dataset, several model hyper-parameters were fine-tuned (i.e. changed and optimized) to get better predictive accuracy. These parameters of the baseline classifiers are presented in Table 2.
3.5 Pre-trained Transfer Learning Models
Moreover, several deep learning classifiers were employed, such as a deep neural network (DNN) and several pre-trained CNNs, namely residual neural network (ResNet50), visual geometry group network 16 (VGG16), and inception network V3 (InceptionV3), for transfer learning. These models have been widely used to investigate images in various domains [2,25,26]. As with the classical classifiers, various parameters were tuned to get more accurate results for detecting COVID-19. For the DNN, we considered a batch size of 32, 50 epochs, the Adam optimizer, and a learning rate of 0.0001 with weight decay. Some regularization terms have also been employed to reduce over-fitting in the deep learning models. When the pre-trained models were loaded, the required packages to manipulate the input images were downloaded. A flatten layer was then added to these pre-trained models, flattening the input to one dimension. Then, we implemented a dense layer with 64 neurons, the ReLU activation function
and the regularizer as 0.001, respectively. Before and after the dense layer, a dropout layer has been used to reduce over-fitting. Finally, the three classes have been assigned with the softmax activation function. To compile the models, the categorical cross-entropy loss function and the Adam optimizer were taken with a 0.00001 learning rate. We used the last trainable layer for ResNet50 and the last 62 trainable layers for InceptionV3 (Table 1); a sketch of this transfer-learning head follows Table 1.

Table 1. A summary of the proposed 9-layer model

Layer (Type)                    Output Shape     Param #
conv2d_1 (Conv2D)               (100, 100, 32)   320
conv2d_2 (Conv2D)               (100, 100, 64)   18496
max_pooling2d_1 (MaxPooling2D)  (50, 50, 64)     0
flatten_1 (Flatten)             160000           0
dense_1 (Dense)                 120              19200120
dropout_1 (Dropout)             120              0
dense_2 (Dense)                 60               7260
dropout_2 (Dropout)             60               0
dense_3 (Dense)                 3                183

Total params: 19,226,379
Trainable params: 19,226,379
Non-trainable params: 0
and the regularizer as 0.001, respectively. Before and after employing the dense layer, dropout layer has been used to reduce the over-fitting issues. Finally, three classes have been assigned with the softmax activation function. To compile the model, categorical crossentopy loss function and adam optimizer were taken with 0.00001 learning rate. Therefore, we used last 1 trainable layer for ResNet50 and considered the last 62 trainable layers for InceptionV3 (Table 1). Table 2. Different model parameters of classical machine learning classifiers Classifier
Table 2. Different model parameters of classical machine learning classifiers

Classifier   Parameters
SVM          linear kernel, gamma = 0.0001
KNN          k = 5, euclidean distance
GNB          priors = None, var_smoothing = 1e-09
BNB          alpha = 1.0, binarize = 0.0
DT           gini criterion, best splitter
LR           liblinear solver, max_iter = 1000
RF           max_depth = None, random_state = 0
GB           max_features = 2, max_depth = 2, random_state = 0
XGB          learning_rate = 0.1, max_depth = 3
MLP          adam solver, alpha = 1e-5, random_state = 1
NC           manhattan metric
Perceptron   tol = 1e-3, random_state = 0
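A few of the baseline classifiers in Table 2 instantiated with scikit-learn and XGBoost, as a sketch of how the listed hyper-parameters map onto code:

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from xgboost import XGBClassifier

# Hyper-parameters taken from Table 2
baselines = {
    "SVM": SVC(kernel="linear", gamma=0.0001),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "LR": LogisticRegression(solver="liblinear", max_iter=1000),
    "XGB": XGBClassifier(learning_rate=0.1, max_depth=3),
    "Perceptron": Perceptron(tol=1e-3, random_state=0),
}
```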
3.6 Evaluation
The performance of the individual classifiers was assessed by different evaluation metrics: accuracy, AUC, F-measure, sensitivity, and specificity, respectively. A brief description of each is given as follows.

– Accuracy is the sum of TP and TN divided by the total instance count of the confusion matrix:

$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN} \quad (4)$

– AUC measures the capability of the model to distinguish between classes:

$\text{AUC} = \int_{x=0}^{1} TPR\left(FPR^{-1}(x)\right) dx \quad (5)$

– F-Measure/F1-score is the harmonic mean of precision and recall:

$\text{F-Measure} = \dfrac{2TP}{2TP + FP + FN} \quad (6)$

– Sensitivity is the ratio of true positives that are correctly identified:

$\text{Sensitivity} = \dfrac{TP}{TP + FN} \quad (7)$

– Specificity is the ratio of true negatives that are correctly identified:

$\text{Specificity} = \dfrac{TN}{TN + FP} \quad (8)$
4 Experiment Results
In this work, the proposed CNN was used to investigate chest X-ray images of normal, pneumonia and COVID-19 patients. We used various classifiers, e.g. SVM, RF, KNN, LR, GNB, BNB, XGB, MLP, NC and perceptron, via the scikit-learn library in Python. Deep learning models such as the DNN and the pre-trained CNNs (VGG16, ResNet50, InceptionV3) were implemented using the Keras library. All of these classifiers were evaluated with a 10-fold cross validation procedure in Python. All experiments were run on an Asus VivoBook S laptop with a Core i5-8250U 8th generation processor, 8 GB RAM, 1 TB HDD, Windows 10 Home, and Intel UHD Graphics 620. The performance of all classifiers was assessed by different evaluation metrics: accuracy, AUC, F-measure, sensitivity, and specificity, respectively. The results of these classifiers are shown in Table 3. Most of the models yielded substantially good predictive performance in classifying chest X-ray images: seven of them, namely the proposed CNN, XGB, LR, SVM, MLP, RF and GB, show results greater than 90%, while six classifiers, namely perceptron, KNN, GNB, NC, DT and
Table 3. Performance Analysis of Classification Methods

Classifier     Accuracy  AUC     F-Measure  Sensitivity  Specificity
XGB            0.9274    0.9456  0.9274     0.9274       0.9637
LR             0.9251    0.9438  0.9253     0.9251       0.9625
SVM            0.9228    0.9421  0.9230     0.9228       0.9614
MLP            0.9157    0.9368  0.9163     0.9157       0.9578
RF             0.9064    0.9298  0.9060     0.9064       0.9532
GB             0.9052    0.9289  0.9049     0.9052       0.9526
Perceptron     0.8935    0.9201  0.8938     0.8935       0.9467
KNN            0.8538    0.8903  0.8558     0.8538       0.9269
GNB            0.8187    0.8640  0.8155     0.8187       0.9093
NC             0.7988    0.8491  0.8009     0.7988       0.8994
DT             0.7883    0.8412  0.7889     0.7883       0.8941
DNN            0.7029    0.7772  0.7035     0.7029       0.8514
ResNet50       0.6070    0.7052  0.5948     0.6070       0.8035
VGG16          0.6035    0.7026  0.5816     0.6035       0.8017
BNB            0.5625    0.6719  0.5180     0.5625       0.7812
InceptionV3    0.5298    0.6473  0.5314     0.5298       0.7649
Proposed CNN   0.9403    0.9552  0.9403     0.9403       0.9701
DNN, show performance greater than 70% and less than 90%. Consequently, the remaining classifiers, for instance BNB and the pre-trained CNNs ResNet50, VGG16 and InceptionV3, provide outcomes below 70%. The characteristics of the neural network/regression-based classifiers are more suited to investigating the COVID-19 dataset. The average results of the different classifiers are shown in Fig. 2. Among all of these classifiers, the proposed CNN shows the highest accuracy (94.03%), AUROC (95.52%), F-measure (94.03%), sensitivity (94.03%), and specificity (97.01%), and it classifies 272 of 285 COVID-19 instances accurately. Then, XGB, LR, SVM, MLP, RF and GB demonstrate better results than the other algorithms except for the CNN. Moreover, the average performance of all classifiers is satisfactory across all evaluation metrics. We believe that, due to the small number of COVID-19 cases, the deep classifiers cannot show results as accurate as the CNN and the others. Therefore, our proposed CNN model is found to be the best classifier for identifying both COVID-19 positive and negative cases with high performance and can assist physicians and policy makers in identifying them quickly and taking necessary steps. In the proposed CNN, metrics like accuracy, f-measure and sensitivity indicate how well COVID-19 positive (target) cases can be determined, and they have been effective in exploring and identifying the target cases. This model shows 94.03% accuracy, f-measure and sensitivity, and 95.52% AUC, correspondingly.
Fig. 2. Average results of individual classifiers
Besides, specificity is one of the most important terms because it shows how accurately COVID-19 negative patients can be identified. An increased specificity denotes more appropriate identification of COVID-19 negative cases; hence, community transmission by infectious persons who were not precisely detected as positive cases can be reduced. Note that our proposed CNN yielded the highest specificity (97.01%) among all the models evaluated in this work, outperforming them (Table 3). Thus, the proposed CNN indicates its potential to avoid false positive cases.
5 Discussion
In this work, several machine and deep learning classifiers were used to investigate chest X-ray images and detect COVID-19 positive cases rapidly. Most of the classifiers used here have been widely implemented in earlier works and have shown good predictive performance. Recently, many studies related to COVID-19 chest X-ray image analysis have been conducted, in which some limitations can be identified. Many works focused on high sensitivity, i.e. how frequently the classifiers can identify COVID-19 positive cases [12–15,22]. But community transmission is now a great issue in preventing the spread of COVID-19, and the growth of false negative rates is a great concern. So, in this work, we particularly focused on specificity (i.e. reduced false negative rates) along with the other metrics. Our proposed CNN model shows better specificity (97.01%) than many existing works [1,3,5,8,19,24]. At the application level, the use of the proposed model may help prevent COVID-19 community transmission by detecting false negative cases more accurately. Also, we verified our experimental results using various evaluation metrics: accuracy, AUC, F-measure, sensitivity, and specificity, respectively. Several works analyzed a small number of COVID-19 samples along with other cases, where their
experimental dataset remained imbalanced [13,27,28,32]. Others were conducted on separate datasets, where a scarcity of samples was found in both [3,28]. For the proposed model, we integrated the COVID-19 samples of the related datasets and balanced the target classes named normal and viral pneumonia using random under-sampling with respect to these samples. Some works improved their results, especially accuracy, by merging non-COVID classes (e.g., from 3 classes to 2) [3]. It should be realized that normal and viral pneumonia cases are hardly associated with COVID-19; in the current study, we considered these conditions and analyzed them to reflect the epidemic situation. Many state-of-the-art CNNs, such as pre-trained transfer learning models, are available to investigate chest X-ray images and classify COVID-19 [1,3,24], but they did not show better results on a small number of samples. If we increase the number of images, the risk of a high false positive rate decreases. Moreover, most previous works did not justify their approach with machine learning alongside deep learning simultaneously [3,13]. Instead, our approach gathers both small and large collections of chest X-ray images to investigate COVID-19 cases more precisely. Techniques such as the RT-PCR test and viral antigen detection are useful for identifying COVID-19 cases accurately, but most of them are costly, time-consuming and require specific instructions to implement. Moreover, sample collection from a large population is a slow process, during which infections may remain undetected. Chest X-ray images, in contrast, are more accessible than other diagnostic measures. In this light, the proposed CNN is more vigilant against false negative predictions to tackle community transmission, as undetected cases cannot then trigger further infections. Besides, physicians and healthcare workers cannot take proper steps when many patients are admitted to hospital at once; if these cases are detected at an early stage, patients can isolate from their community and receive proper treatment rapidly, reducing the transmission of COVID-19. In this situation, we need a suitable tool that detects COVID-19 positive and negative cases in a feasible way. Our proposed CNN model can automatically detect these cases accurately with high specificity. In this work, we focus on COVID-19 negative cases so that community transmission is broken as early as possible. Recently, this pandemic condition has been getting more severe day by day. Different sectors such as agriculture, business and finance have faced huge losses during this period; many people have lost their jobs and cannot find other work in this pandemic situation. Also, the transmission rate of SARS-CoV-2 is extremely high, hence any undetected COVID-19 case may potentially spread the disease throughout the community. Therefore, early detection via cost-effective tools with high predictive power is extremely necessary to recognize cases and take proper steps as soon as possible. The high specificity of our proposed CNN model can successfully reduce false negative rates by detecting subtle cases, which significantly affects not only public health but also helps re-establish social and economic normality.
6 Conclusion and Future Work
The study proposed a CNN model, which analyzed chest X-ray images of COVID-19, healthy, and other viral pneumonia patients to classify and diagnose COVID-19 patients automatically in a short period of time. Various machine and deep learning-based approaches were used to benchmark the performance of our proposed CNN, which yields the highest scores: 94.01% accuracy, 95.53% AUC, 94.01% F-measure, 94.01% sensitivity, and 97.01% specificity for detecting COVID-19 patients. Despite all measures taken to avoid over-fitting, the proposed CNN model performs surprisingly well on small datasets; it would nevertheless be interesting to see its performance with a larger training dataset. Hence, in the future, we will collect a large number of images from various sources and analyze them to obtain more feasible outcomes. This approach may be helpful for clinical practice and the detection of COVID-19 cases to prevent future community transmission.
References

1. Abbas, A., Abdelsamea, M.M., Gaber, M.M.: Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl. Intell. 51, 854–864 (2020). https://doi.org/10.1007/s10489-020-01829-7
2. Ahammed, K., Satu, M.S., Khan, M.I., Whaiduzzaman, M.: Predicting infectious state of hepatitis C virus affected patient's applying machine learning methods. In: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1371–1374. IEEE (2020)
3. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 1 (2020)
4. Butt, C., Gill, J., Chun, D., Babu, B.A.: Deep learning system to screen coronavirus disease 2019 pneumonia. Appl. Intell. 1 (2020)
5. Chandra, T.B., Verma, K., Singh, B.K., Jain, D., Netam, S.S.: Coronavirus disease (COVID-19) detection in chest X-ray images using majority voting based classifier ensemble. Expert Syst. Appl. 165, 113909 (2021). https://doi.org/10.1016/j.eswa.2020.113909
6. Chowdhury, M.E., et al.: Can AI help in screening viral and COVID-19 pneumonia? arXiv preprint arXiv:2003.13145 (2020)
7. Cohen, J.P., Morrison, P., Dao, L.: COVID-19 image data collection. arXiv:2003.11597 (2020). https://github.com/ieee8023/covid-chestxray-dataset
8. Duran-Lopez, L., Dominguez-Morales, J.P., Corral-Jaime, J., Vicente-Diaz, S., Linares-Barranco, A.: COVID-XNet: a custom deep learning system to diagnose and locate COVID-19 in chest X-ray images. Appl. Sci. 10(16), 5683 (2020). https://doi.org/10.3390/app10165683
9. Dutta, S., Bandyopadhyay, S.K., Kim, T.H.: CNN-LSTM model for verifying predictions of COVID-19 cases. Asian J. Res. Comput. Sci. 25–32 (2020). https://doi.org/10.9734/ajrcos/2020/v5i430141
10. Heidari, M., Mirniaharikandehei, S., Khuzani, A.Z., Danala, G., Qiu, Y., Zheng, B.: Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. Int. J. Med. Inform. 144, 104284 (2020). https://doi.org/10.1016/j.ijmedinf.2020.104284
11. Holshue, M.L., et al.: First case of 2019 novel coronavirus in the United States. New Engl. J. Med. (2020)
12. Ismael, A.M., Şengür, A.: Deep learning approaches for COVID-19 detection based on chest X-ray images. Expert Syst. Appl. 164, 114054 (2021). https://doi.org/10.1016/j.eswa.2020.114054
13. Karar, M.E., Hemdan, E.E.D., Shouman, M.A.: Cascaded deep learning classifiers for computer-aided diagnosis of COVID-19 and pneumonia diseases in X-ray scans. Complex Intell. Syst. 7, 235–247 (2020). https://doi.org/10.1007/s40747-020-00199-4
14. Karthik, R., Menaka, R., Hariharan, M.: Learning distinctive filters for COVID-19 detection from chest X-ray using shuffled residual CNN. Appl. Soft Comput. 106744 (2020). https://doi.org/10.1016/j.asoc.2020.106744
15. Khan, A.I., Shah, J.L., Bhat, M.M.: CoroNet: a deep neural network for detection and diagnosis of COVID-19 from chest X-ray images. Comput. Methods Program. Biomed. 196, 105581 (2020). https://doi.org/10.1016/j.cmpb.2020.105581
16. Kroft, L.J., van der Velden, L., Girón, I.H., Roelofs, J.J., de Roos, A., Geleijns, J.: Added value of ultra-low-dose computed tomography, dose equivalent to chest X-ray radiography, for diagnosing chest pathology. J. Thorac. Imaging 34(3), 179 (2019)
17. Lippi, G., Plebani, M.: Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis. Clin. Chim. Acta 505, 190 (2020)
18. Lu, R., et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395(10224), 565–574 (2020)
19. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., Jamalipour Soufi, G.: Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020). https://doi.org/10.1016/j.media.2020.101794
20. Moura, J.D., et al.: Deep convolutional approaches for the analysis of COVID-19 using chest X-ray images from portable devices. IEEE Access 8, 195594–195607 (2020). https://doi.org/10.1109/ACCESS.2020.3033762
21. Ng, M.Y., et al.: Imaging profile of the COVID-19 infection: radiologic findings and literature review. Radiol. Cardiothorac. Imaging 2(1), e200034 (2020)
22. Ohata, E.F., et al.: Automatic detection of COVID-19 infection using chest X-ray images through transfer learning. IEEE/CAA J. Autom. Sinica 8(1), 239–248 (2021). https://doi.org/10.1109/JAS.2020.1003393
23. World Health Organization, et al.: Laboratory testing for coronavirus disease 2019 (COVID-19) in suspected human cases: interim guidance, 2 March 2020. Technical report, World Health Organization (2020)
24. Pandit, M.K., Banday, S.A.: SARS n-CoV2-19 detection from chest X-ray images using deep neural networks. Int. J. Pervasive Comput. Commun. 16(5), 419–427 (2020). https://doi.org/10.1108/IJPCC-06-2020-0060
25. Shahriare Satu, M., Atik, S.T., Moni, M.A.: A novel hybrid machine learning model to predict diabetes mellitus. In: Uddin, M.S., Bansal, J.C. (eds.) Proceedings of International Joint Conference on Computational Intelligence. AIS, pp. 453–465. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3607-6_36
26. Satu, M.S., Rahman, S., Khan, M.I., Abedin, M.Z., Kaiser, M.S., Mahmud, M.: Towards improved detection of cognitive performance using bidirectional multilayer long-short term memory neural network. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 297–306. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_27
27. Sekeroglu, B., Ozsahin, I.: Detection of COVID-19 from chest X-ray images using convolutional neural networks. SLAS Technol.: Transl. Life Sci. Innov. 25(6), 553–565 (2020). https://doi.org/10.1177/2472630320958376
28. Shankar, K., Perumal, E.: A novel hand-crafted with deep learning features based fusion model for COVID-19 diagnosis and classification using chest X-ray images. Complex Intell. Syst. 7, 1277–1293 (2020). https://doi.org/10.1007/s40747-020-00216-6
29. Shorfuzzaman, M., Hossain, M.S.: MetaCOVID: a Siamese neural network framework with contrastive loss for N-shot diagnosis of COVID-19 patients. Pattern Recognit. 107700 (2020). https://doi.org/10.1016/j.patcog.2020.107700
30. Stoecklin, S.B., et al.: First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures, January 2020. Eurosurveillance 25(6), 2000094 (2020)
31. Wang, L., Lin, Z.Q., Wong, A.: COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 10(1), 19549 (2020). https://doi.org/10.1038/s41598-020-76550-z
32. Zebin, T., Rezvy, S.: COVID-19 detection and disease progression visualization: deep learning on chest X-rays for classification and coarse localization. Appl. Intell. 51, 1010–1021 (2020). https://doi.org/10.1007/s10489-020-01867-1
33. Zhu, N., et al.: A novel coronavirus from patients with pneumonia in China, 2019. New Engl. J. Med. (2020)
Classification of Tumor Cell Using a Naive Convolutional Neural Network Model

Debashis Gupta1(B), Syed Rahat Hassan2, Renu Gupta3, Urmi Saha1, and Mohammed Sowket Ali1

1 Bangladesh Army University of Science and Technology, Saidpur, Bangladesh
{debashisgupta,sahaurmi,sowket}@baust.edu.bd
2 Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
[email protected]
3 TMSS Medical College and Rafatullah Community Hospital, Bogra, Bangladesh
[email protected]

Abstract. Early detection of tumor tissue leading to cancer is among the most pressing health issues of the present world, owing to increased exposure to radiation, ultraviolet light, radon gas, infectious agents, etc. To diagnose tumor cells promptly, computer-aided detection (CAD) systems based on convolutional neural networks (CNNs) now play a significant role in the health sector. Many complicated CNN models have been introduced to classify tumor cells effectively, but in this study we propose a relatively less complex deep learning approach that is as effective and reliable as renowned pre-trained models such as VGG19, Inception-v3, ResNet-50, and DenseNet-201. Our proposed architecture classifies tumor cells on the PatchCamelyon (PCam) dataset with an appreciable validation accuracy of 94.70% while using far fewer computational parameters than the mentioned pre-trained models.

Keywords: Tumor Cell · Neural Network · Classification · CNN · Digital histopathology · Computer-assisted diagnosis · Transfer learning
1 Introduction
In 2018, the WHO reported that cancer was responsible for 9.6 million deaths and is one of the prime causes of death around the globe [1]. Cancer cells grow without bound and can spread throughout the body. Among many other causes, one reason behind cancer is genetic change; environmental elements like the chemicals in tobacco, radiation, etc. also play a role in different kinds of cancer [2]. To save valuable lives, early detection of cancer is mandatory. In the early days of medical diagnosis, testing was quite laborious and expensive. With time, science has shown us the path to more efficient, faster, and cheaper tests that yield more reliable results. Machine learning became one of the well-established emerging techniques for medical diagnosis tasks in the late 20th and early 21st century [8]. One of the problems with machine learning techniques was that they required handcrafted feature engineering. Later, deep learning helped this field thrive more robustly than ever [3]. With different deep learning methods, we are continuously improving diagnosis and overcoming obstacles. Various medical imaging techniques, like MRI, X-ray, PET, etc., exist for identifying diseases. Among the various deep learning methods, the Convolutional Neural Network (CNN) has proven its worth quite clearly when it comes to image data [12]. Analyzing medical image data, pathology slides, etc. can be done with CNNs more efficiently than before. Modern deep learning models represent an advanced improvement in classifying medical images and report impressive success in diagnosing diseases. However, these state-of-the-art models are trained with numerous parameters, which requires superior computational power and incurs high cost. Hence, in this study, we present a model that achieves accuracy analogous to the widely accepted pre-trained models, i.e., VGG-19, Inception-v3, ResNet-50, and DenseNet-201, along with similar experimental metrics such as F1-score, precision, recall, and support. Furthermore, we show that our proposed architecture, while having equivalent metrics, has far fewer computational parameters than the mentioned models.
2 Literature Review
Zheng et al. [19] demonstrated a ResNet-based architecture to classify cancer cells. They used test-time augmentation to make random changes to the test inputs before feeding them through their model, which achieved an AUC score of 98%. Wang et al. [18] used the Camelyon16 dataset with two different evaluation approaches and achieved adequate results. The first approach was slide-based classification and the second was lesion-based classification; with deep learning methods, they achieved an AUC of 0.925 in the first case and 0.7051 in the second. Liang et al. [10] introduced a CNN model with a Convolutional Block Attention Module (CBAM) to identify cancer cells; the model was validated on the PCam dataset and obtained a 0.976 AUC score. Lantang et al. [9] proposed their own convolutional neural network to identify cancer cells on the PCam dataset. They had eight convolution layers in total, with a max pool layer after every two convolution layers. After the convolution process, a flatten layer converted the feature maps into a vector. Finally, three FC (fully connected) layers with dropout completed the architecture. With this approach they obtained 92% accuracy, a 94% F1-score for the cancerous cell class and 94% for the non-cancerous cell class, and an AUC of 98%. Kassani et al. [7] proposed a three-path ensemble architecture using three pre-trained models, VGG19, MobileNetV2, and DenseNet201, for the classification of breast cancer. With this approach they achieved decent accuracies of 90.84%, 89.09%, and 87.84% for VGG19, MobileNetV2, and DenseNet201, respectively, on the PatchCamelyon dataset.
In [17], for breast cancer detection on the BreakHis dataset, a combination of boosting tree classifiers and a CNN was proposed, where the Inception-ResNet-v2 model was engaged for feature extraction from multi-scale images. Afterward, a gradient boosting tree classifier was used for the final classification step. Roy et al. [11] proposed a patch-based classifier using a CNN and a majority voting method to classify breast cancer histopathology on the ICIAR dataset; the proposed model predicts the output class label for both binary and multi-class tasks. In this paper, we built a CNN (Convolutional Neural Network) model and also implemented four pre-trained CNN models to identify metastatic cancer cells from small image patches acquired from larger digital pathology scans.
3 Methodology

3.1 Dataset Description
The dataset has been taken from a Kaggle competition. The data is a slightly modified version of the PatchCamelyon (PCam) [16] benchmark dataset. The original dataset contained duplicates due to probabilistic sampling; the version we work with does not contain any duplicates. The dataset provides a large number of small pathology images to classify. Here, id represents the filenames, and the train_labels.csv file provides the ground truth for the images in the train folder. A positive label indicates that at least one pixel of tumor tissue is present in the middle 32 × 32 pixel section of a patch. Tumor tissue in the outermost portion of the patch has no effect on the label; this outer region allows fully-convolutional models without zero-padding to behave consistently when applied to a whole-slide image. Sample images from the dataset are shown in Fig. 1. There are a total of 220,025 data samples in the training set, of which 130,908 are labeled 0 (no tumor tissue) and 89,117 are labeled 1 (has tumor tissue). Along with this, 57,458 data samples are found in the test set. It is worth mentioning that all of these data are unique.
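This labeling rule is easy to make concrete: only the central 32 × 32 region decides the label. A minimal sketch of extracting that region (the file name is hypothetical, and PCam patches are assumed to be 96 × 96 RGB images):

```python
import numpy as np
from PIL import Image

def center_region(path, size=32):
    """Return the central size x size crop that determines the PCam label."""
    img = np.array(Image.open(path))   # PCam patches are 96 x 96 x 3
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

# patch = center_region("train/some_id.tif")  # hypothetical file name
```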
3.2 Data Pre-processing
For our experiments, we took only 160,000 random data samples for the training set, which were later divided into a train set and a validation set of 144,000 and 16,000 random samples, respectively. All images were used at a size of 96 × 96 pixels in the RGB channels. Data normalization is important for a uniform pixel value distribution and makes convergence faster; to normalize the pixel values to the range 0 to 1, they were divided by 255.
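A minimal sketch of the sampling, splitting, and normalization steps described above; the CSV file name follows the Kaggle competition layout, and the random seed is an assumption:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

labels = pd.read_csv("train_labels.csv")             # columns: id, label
sample = labels.sample(n=160_000, random_state=42)   # 160,000 random samples

# 144,000 training / 16,000 validation samples, as described above
train_df, val_df = train_test_split(sample, test_size=16_000, random_state=42)

def normalize(img):
    """Scale pixel values from [0, 255] to [0, 1] for faster convergence."""
    return img.astype("float32") / 255.0
```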
Fig. 1. Sample images of the PCam dataset
3.3 Models
In this subsection, we briefly discuss the pre-trained architectures used in our experiment and present our proposed architecture.

VGG-19: Our first experimental model is VGG19 [13]. The VGG19 model is a variant of the VGG model that consists of 19 weight layers in total: 16 convolution layers and 3 fully connected layers, along with 5 MaxPool layers and 1 softmax layer. Its homogeneous architecture makes it quite appealing. It is very similar to AlexNet in that it only uses 3 × 3 convolutions, but with a large number of filters, and it can be trained in 2–3 weeks on 4 GPUs. It remains one of the most popular ways to extract features from images. During our experimental work we only took the feature-extractor part of this model to extract the features of the image. A fully connected (FC) classifier was stacked on top of the convolutional layers; the FC layer has 512 channels and a dropout value of 0.5.

Inception-V3: The second experimental model is Inception-v3 [15]. Inception-v3 is the successor to Inception-v1 and has 24M parameters. This pre-trained model accomplishes state-of-the-art accuracy for the recognition of general objects across 1000 classes. In the first step, the model extracts general features from input images, and in the second step, it classifies the images based on those features. As with the previous model, we employed only the feature-extraction portion of Inception-v3 and added two fully connected (FC) layers to classify the images; both FC layers have 1024 channels and a dropout value of 0.5.
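For illustration, the VGG-19 setup described above (frozen convolutional base, one 512-unit FC layer with 0.5 dropout, sigmoid output) might be sketched in Keras as follows; only the layer sizes come from the text, everything else is an assumption:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False,
             input_shape=(96, 96, 3))    # feature-extractor part only
base.trainable = False                    # keep the pre-trained weights fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),   # FC layer with 512 channels
    layers.Dropout(0.5),                    # dropout of 0.5, as in the text
    layers.Dense(1, activation="sigmoid"),  # tumor vs. no tumor
])
```

The Inception-v3 variant described in the text would replace the base with InceptionV3 and use two 1024-unit FC layers instead.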
Fig. 2. Our proposed architecture
DenseNet-201: In the DenseNet architecture [5], each layer in the dense blocks takes the feature maps of all previous layers and passes its own feature maps to the next layer. Feature maps obtained from different layers are combined through concatenation. These dense connections give the architecture a better pathway, achieving significantly finer gradient flow. Here, 201 denotes the number of trainable layers with weights. Each dense layer performs 1 × 1 and 3 × 3 convolution operations. The feature re-use ability of this architecture is quite efficient, since there is no compelling reason to relearn inessential feature maps. After extracting the feature maps using the base DenseNet201 model, a global average pooling layer was used, followed by one FC layer with 256 units, with batch normalization and a dropout of 0.5 placed before and after the layer.

ResNet-50: ResNet was the winner of ILSVRC 2015 image classification, detection, and localization. In the ResNet architecture, shortcut connections are introduced whose job is to pass the incoming features from the former layer to the next layer without modification [4]. This alleviates the vanishing gradient problem. To reduce time complexity, a bottleneck strategy is implemented in which a 1 × 1 convolution layer is placed at the start and at the end of each block to lessen the number of parameters without corrupting the performance of the architecture to any great extent. The base ResNet50 model was implemented like DenseNet201, with global average pooling, batch normalization, and dropout around one FC layer with 256 units.

Proposed Architecture: The architecture contains 3 main convolutional blocks, each containing 3 convolution layers with the ReLU activation function, 3 batch-normalization layers, and lastly a max pool layer. Figure 2 gives a visual representation of the architecture. In the first block the 3 convolutional layers have 64 filters, in the second block 128 filters, and in the third block 256 filters. A convolutional layer applies its filters to the pixel values and extracts feature maps. After every convolution layer there is a batch-normalization layer; by subtracting the batch mean and dividing by the batch standard deviation, it normalizes the feature maps from the previous layer, making the network more stable [6]. It also has some regularization effect. At the end of every convolutional block, a max pool layer is placed. This layer takes the feature maps from its previous layer and keeps the highest value from each small patch; it down-samples the input and works better here than average pooling. After the 3 convolutional blocks, a dropout layer is added to improve generalization and address over-fitting [14]. Then comes the flatten layer, which converts the 2D feature maps into a vector that can be given to the fully connected layers. There are three FC layers, and the last one performs the classification with the help of the sigmoid function.
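A hedged Keras sketch of the proposed architecture as described (three blocks of three convolutions with batch normalization and max pooling, then dropout, flatten, and three FC layers ending in a sigmoid). The kernel size, dropout rate, and the widths of the first two FC layers are not stated in the text and are assumptions here:

```python
from tensorflow.keras import layers, models

def add_conv_block(model, filters):
    """Three convolution layers, each followed by batch normalization,
    then a max pooling layer, as described in the text."""
    for _ in range(3):
        model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                activation="relu"))
        model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))

model = models.Sequential([layers.Input(shape=(96, 96, 3))])
for filters in (64, 128, 256):     # filter counts per block, from the text
    add_conv_block(model, filters)
model.add(layers.Dropout(0.5))     # dropout rate assumed
model.add(layers.Flatten())
model.add(layers.Dense(256, activation="relu"))   # first two FC widths assumed
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))  # final sigmoid prediction
```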
Table 1. Classification Report

| Model | Class | Precision (%) | Recall (%) | F1-Score (%) | Support | Val Accuracy (%) | AUC (%) |
|---|---|---|---|---|---|---|---|
| VGG-19 | no tumor tissue | 94 | 97 | 95 | 8000 | | |
| VGG-19 | has tumor tissue | 97 | 94 | 95 | 8000 | | |
| VGG-19 | avg | 95 | 95 | 95 | 16000 | 95.38 | 98.91 |
| Inception-v3 | no tumor tissue | 93 | 96 | 95 | 8000 | | |
| Inception-v3 | has tumor tissue | 96 | 93 | 94 | 8000 | | |
| Inception-v3 | avg | 94 | 94 | 94 | 16000 | 94.41 | 98.63 |
| ResNet-50 | no tumor tissue | 92 | 97 | 95 | 8000 | | |
| ResNet-50 | has tumor tissue | 97 | 92 | 94 | 8000 | | |
| ResNet-50 | avg | 95 | 94 | 94 | 16000 | 94.44 | 98.74 |
| DenseNet-201 | no tumor tissue | 94 | 97 | 95 | 8000 | | |
| DenseNet-201 | has tumor tissue | 97 | 94 | 95 | 8000 | | |
| DenseNet-201 | avg | 95 | 95 | 95 | 16000 | 95.42 | 98.95 |
| Proposed Model | no tumor tissue | 94 | 95 | 95 | 8000 | | |
| Proposed Model | has tumor tissue | 95 | 94 | 95 | 8000 | | |
| Proposed Model | avg | 95 | 95 | 95 | 16000 | 94.70 | 98.63 |
Table 2. Comparison of Computational Parameters

| Architecture | Total Computational Parameters |
|---|---|
| Inception-V3 | 122,224,546 |
| VGG-19 | 45,717,570 |
| ResNet-50 | 24,122,070 |
| DenseNet-201 | 18,823,062 |
| Proposed architecture | 7,171,842 |
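Parameter counts such as those in Table 2 can be read directly off a built Keras model; a minimal sketch, assuming model refers to one of the networks constructed above:

```python
# 'model' is any built Keras model, e.g. the proposed architecture sketched above
print(f"Total computational parameters: {model.count_params():,}")
model.summary()   # also lists per-layer parameter counts
```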
3.4 Experimental Setup
In this paper, we performed a classification task on a version of the PatchCamelyon (PCam) dataset, classifying whether a histopathology image contains cancerous tissue or not. All models were trained on the training set for 25 epochs. As we are performing binary classification, binary cross-entropy is used as the loss function. The validation and training batch size was set to 10. To minimize the loss function, the Adam optimizer was used with a learning rate of 0.0001. All experiments were done in Kaggle using a GPU. For the evaluation of the models, we chose accuracy, precision, recall, F1-score, and AUC.
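A sketch of this training configuration in Keras; the training and validation arrays are hypothetical placeholders:

```python
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",   # loss for binary classification
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])

history = model.fit(x_train, y_train,       # hypothetical training arrays
                    validation_data=(x_val, y_val),
                    epochs=25, batch_size=10)
```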
4 Result Analysis
From Table 1 it is clearly seen that the most dominant models are DenseNet-201 and VGG-19, with 95.42% and 95.38% validation accuracy, respectively. These models also have better precision, recall, and F1-scores than the other models. The other models, i.e., Inception-v3 and ResNet-50, also perform well, with decent accuracy and AUC. Alongside these giant pre-trained models, our proposed model, with its less complex architecture, attains a comparable accuracy of 94.70%, which is better than Inception-v3 and ResNet-50. In Fig. 3a, the training and validation loss curves show only a small gap between the training loss and the validation loss, which indicates a good fit of the model. From Table 1, the proposed model shows precision, recall, and F1-score percentages identical to those of the best-performing models, i.e., DenseNet-201 and VGG-19, with minute differences in accuracy of 0.72% and 0.68%, respectively. But when it comes to the number of computational parameters, Table 2 shows a discernible difference between the proposed model and DenseNet-201 and VGG-19 (Fig. 4).
Fig. 3. Performance Curve of Proposed Architecture
Fig. 4. Confusion Matrix of Proposed Architecture
5 Conclusion
In this paper, four pre-trained models were implemented with transfer learning to identify tumor cells. The proposed model architecture, with far fewer computational parameters, performs quite close to the pre-trained models. Among the four pre-trained models with tuning, DenseNet-201 and VGG-19 performed the best. With our model, we achieved 94.70% accuracy and an AUC score of 98.70%, which is quite close to the performance of the DenseNet-201 and VGG-19 models. The main limitation was the considerable computational power required, in the form of a well-equipped GPU hardware system. Medical image analysis is becoming more popular by the day for its effectiveness; this kind of study can help the medical community make more robust decisions and effectively identify tumor cells leading to cancer, for better health care.
Acknowledgement. The authors Debashis Gupta, Syed Rahat Hassan, and Urmi Saha sincerely thank Dr. Renu Gupta for sharing knowledge on tumor tissue and validating the proposed model's predictions. The authors also express their gratitude to Dr. Engr. Mohammed Sowket Ali for his continual supervision.

Author contributions. Conceptualization and Methodology - Debashis Gupta, Urmi Saha, Syed Rahat Hassan, and Renu Gupta. Implementation and Coding - Debashis Gupta, Urmi Saha, and Syed Rahat Hassan. Resource Management - Debashis Gupta, Renu Gupta, and Urmi Saha. Writing - original draft preparation - Debashis Gupta, Syed Rahat Hassan, and Urmi Saha. Writing - review and editing - Renu Gupta, Mohammed Sowket Ali, and Debashis Gupta. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest. The authors declare no conflict of interest.
References

1. Cancer. www.who.int/health-topics/cancer. Accessed 15 Jan 2020
2. What is cancer? www.cancer.gov/about-cancer/understanding/what-is-cancer. Accessed 19 Jan 2020
3. Bakator, M., Radosav, D.: Deep learning and medical diagnosis: a review of literature. Multimodal Technol. Interact. 2(3), 47 (2018)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
5. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
6. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
7. Kassani, S.H., Kassani, P.H., Wesolowski, M.J., Schneider, K.A., Deters, R.: Classification of histopathological biopsy images using ensemble of deep learning networks. arXiv preprint arXiv:1909.11870 (2019)
8. Kononenko, I.: Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23(1), 89–109 (2001)
9. Lantang, O., Tiba, A., Hajdu, A., Terdik, G.: Convolutional neural network for predicting the spread of cancer. In: 2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), pp. 175–180. IEEE (2019)
10. Liang, Y., Yang, J., Quan, X., Zhang, H.: Metastatic breast cancer recognition in histopathology images using convolutional neural network with attention mechanism. In: 2019 Chinese Automation Congress (CAC), pp. 2922–2926. IEEE (2019)
11. Roy, K., Banik, D., Bhattacharjee, D., Nasipuri, M.: Patch-based system for classification of breast histology images using deep learning. Comput. Med. Imaging Graph. 71, 90–103 (2019)
12. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
14. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
15. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
16. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology. CoRR abs/1806.03962 (2018). arxiv.org/abs/1806.03962
17. Vo, D.M., Nguyen, N.Q., Lee, S.W.: Classification of breast cancer histology images using incremental boosting convolution networks. Inf. Sci. 482, 123–138 (2019)
18. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718 (2016)
19. Zheng, Z., Zhang, H., Li, X., Liu, S., Teng, Y.: ResNet-based model for cancer detection. In: 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), pp. 325–328. IEEE (2021)
Tumor-TL: A Transfer Learning Approach for Classifying Brain Tumors from MRI Images

Abu Kowshir Bitto1, Md. Hasan Imam Bijoy2(B), Sabina Yesmin1, and Md. Jueal Mia2

1 Department of Software Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]
2 Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
[email protected]
Abstract. An intracranial tumor, another name for a brain tumor, is a fast-proliferating, uncontrolled mass of tissue that seems unaffected by the mechanisms that normally govern normal cells. The identification and segmentation of brain tumors are among the most difficult and time-consuming tasks in medical image processing. MRI is a medical imaging technique that allows radiologists to see inside body structures without requiring surgery. The information MRI provides about human soft tissue contributes to the diagnosis of brain tumors. In this paper, we use several Convolutional Neural Network architectures to identify brain tumors in MRI. We use a variety of pre-trained models, such as VGG16, VGG19, and ResNet50, which we have found to be critical for reaching competitive performance. ResNet50 performs best among all the models, with an accuracy of 96.76%.

Keywords: Brain Tumor · MRI · VGG16 · VGG19 · ResNet50 · Transfer Learning
1 Introduction

Having millions of neurons cooperating, the brain is among the entire body's most intricate systems [1]. When a tumor grows in the head, the weighted interior of the brain expands, causing damage to the brain. An intracranial neoplasm, often known as a brain tumor, is a disorder in which abnormal cells grow in the human brain. There are two types of brain tumors: malignant (cancerous) and benign (non-cancerous). Cancer-causing tumors may be primary tumors or secondary (metastatic) tumors. Brain tumors result when the DNA of normal brain cells has a flaw. Cells in the body constantly divide and die, only to be replaced by other cells. In some circumstances new cells are formed but the old cells are not completely eliminated; these cells coagulate as a result, and they have the potential to form tumors. Brain tumors are frequently passed
down through the generations. The most prevalent and aggressive are gliomas [2]. Glioma detection at an early stage is critical for achieving the best treatment results [3]. Several diagnostic imaging techniques provide vital information about the shape, size, location, and metabolism of brain tumors. While these modalities are used in combination to provide the most detailed information about brain tumors, MRI is considered the standard strategy due to its significant soft-tissue contrast and broad accessibility [4]. According to the World Health Organization's categorization scheme for identifying brain tumors, upwards of 120 varieties of brain tumors exist, varying in origin, range, size, and features. The major goal of our research article is to figure out how to use transfer learning and deep learning to recognize this complicated human condition and to quantify performance on image data. Transfer learning is the technique of using the knowledge gleaned from a trained model to learn a new set of facts [2]. Analysts have presented numerous computational techniques for identifying and classifying brain tumors from brain MRI pictures ever since it became possible to scan and load meaningful images into the computer [3]. A Convolutional Neural Network (CNN)-based multitask classification is built to determine the kind and location of malignancies. We used Kaggle and Figshare datasets for this study. A total of 7116 brain MRI scans have been divided into four categories: pituitary, meningioma, glioma, and no tumor, on which pre-trained CNN models are used. The rest of the paper is organized as follows. Section 2 presents the literature review for MRI studies. Section 3 discusses the method for identifying brain tumors from MRI images. Section 4 presents and discusses the findings. Section 5 draws the whole study to a conclusion.
2 Literature Review

Many papers, publications, and research projects focus on the detection and categorization of brain tumor MRI scans. A few of the relevant works are reviewed below. The study of Pereira et al. [1] described a CNN-based approach for segmenting brain tumors in MRI images; compared to perceptrons and prior models, these models frequently incorporate probability functions. Pre-processing, classification by CNN, and post-processing are the three key parts of the approach, which was tested on the BRATS 2013 and 2015 databases. The three most typical varieties of brain tumors, glioma, meningioma, and pituitary tumors, are distinguished using a three-class classification system by Deepak et al. [2]. To extract properties from brain MRI data, their suggested categorization method uses deep transfer learning and a pre-trained GoogLeNet. The authors extracted features using a pre-trained VGG-16 model and a fine-tuned AlexNet, then categorized them using a support vector machine. Their approach used a 3D CNN architecture and was based on knowledge transfer; handcrafted-feature approaches were compared using the accuracy metrics reported for the transfer learning-based methods. Talo et al. [3] proposed a technique for automatically identifying abnormal and normal brain MR pictures. The ResNet34 model was utilized, and brain anomalies were accurately recognized using a PSO-SVM classifier with a radial basis kernel. The experiments in
that study were carried out using the Harvard Medical School MR dataset, and several sophisticated deep learning approaches for hyperparameter optimization were used. Ahuja et al. [4] proposed employing the superpixel approach to detect and segment brain tumors via transfer learning. First, MRI slices are divided into three categories: normal, LGG, and HGG. The proposed methodology achieved 99.82% training and 96.32% validation precision using VGG-19 at epoch 6. The LGG and HGG MRI brain tumor images are then segmented to show the tumor, with tumor segmentation accomplished using the superpixel method; the segmentation produces an average Dice index of 0.932. Future research should examine the method on a real-time patient database, and to raise the average Dice index, the segmentation networks need improved post-processing methods. Khan et al. [5] describe a fully automated deep learning system for multimodal brain tumor classification that includes contrast enhancement. The work proceeded in three stages. First, in the pre-processing step, edge-based contrast stretching and histogram equalization (HE) were used to enhance the image contrast of the tumor region. Second, robust deep learning features were extracted. Finally, an ELM classifier was used to categorize the detected tumors into the appropriate group. With the aid of transfer learning, this study fused the features of two separate CNN models; the goal of combining two CNN models was to create a richer feature vector with more information. The experiment was conducted on the BraTs datasets, and the outcomes revealed an improvement in precision (98.16%, 97.26%, and 93.40%, respectively, for the BraTs2015, BraTs2017, and BraTs2018 datasets). For MR brain image categorization, Kaur et al. [6] compare separate pre-trained DCNN models using transfer learning. Using pre-trained DCNN models with transfer learning proved successful in providing a high recognition rate. Out of all the models tested, the AlexNet model performed the best, with classification rates of 100%, 94%, and 95.92% on the three data sets, surpassing existing conventional and deep learning algorithms on brain classification tasks. The authors describe future work that will concentrate on running the models on GPU-enabled frameworks, which is anticipated to reduce computational overhead, and on investigating various fine-tuning techniques. Khan et al. [5] also describe an automatic multimodal classification technique for brain tumor type categorization based on deep learning. Brain tumors may be cancerous or non-cancerous, and either can injure the brain, which might be fatal. This study used a direct contrast-enhancement technique based on histogram equalization. Features extracted from two different CNN models via transfer learning were then integrated; the goal of combining two CNN architectures was to provide more information in an enriched feature vector.
3 Methodology

Our study's primary objective is to create a method for identifying and classifying brain tumors from MRI scans. To attain this aim we go through numerous phases, including dataset collection, data preprocessing, and model creation. The working procedure is presented in Fig. 1.
Fig. 1. MRI image classification according to the working technique.
3.1 Data Description

We used Figshare and Kaggle to obtain brain tumor MRI images [7]. There are 7116 brain MRI pictures in this dataset, divided into four classes: pituitary, meningioma, glioma, and no tumor. Colors have been applied to the captured images, and sample data are presented in Fig. 2.
Fig. 2. Sample dataset for (a) Pituitary, (b) Meningioma, (c) Glioma and (d) No Tumor.
3.2 Data Preprocessing

Geometric alterations are used in the data preprocessing procedures. We resize the images to 220 × 220 pixels for VGG19, VGG16, and ResNet50; all images are of the same high quality. For augmentation, the photographs were rotated, width-shifted, height-shifted, shear-shifted, and horizontally flipped.
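A minimal sketch of these geometric augmentations using Keras' ImageDataGenerator; the directory name and the exact augmentation magnitudes are assumptions, as the text does not state them:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Geometric augmentations described above; parameter values are assumptions.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    horizontal_flip=True,
).flow_from_directory("Training",              # hypothetical directory layout
                      target_size=(220, 220),  # 220 x 220, as in the text
                      batch_size=32,
                      class_mode="categorical")  # four tumor classes
```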
3.3 Model Implementation

In this study, we used CNN-based transfer learning algorithms on the brain tumor dataset. The relevant theory of the transfer learning models is given below.

Transfer Learning (TL): Transfer learning is a machine learning approach in which a model developed for one task is reused as the starting point for a model on a related task [8, 9]. Given the enormous computation involved, it is a common method in computer vision and natural language processing tasks. In computer vision, neural networks ordinarily detect edges in the first layers, shapes in the middle layers, and task-specific features in the later layers. In transfer learning, the early and middle layers are reused and the later layers are retrained, making use of the labeled data from the task the network was originally trained on.

VGG-16: VGG-16 has 16 layers [10] and a homogeneous architecture, making it quite appealing. It is extremely similar to AlexNet in that it only includes 3 × 3 convolutions but a large number of filters. It may be trained for 2–3 weeks on 4 GPUs. It is now one of the most popular methods in the community for extracting features from photos. VGG can be leveraged through transfer learning: the model is pre-trained on a dataset, the parameters are adjusted for improved accuracy, and the learned parameter values can be reused.

VGG-19: The VGG19 architecture [11] consists of 5 convolutional blocks followed by 3 fully connected layers. A rectified linear unit (ReLU) activation is applied after each convolution, and a max-pooling operation is periodically utilized to minimize the spatial dimension. Two fully connected layers with 4,096 ReLU-activated units are followed by a final 1,000-way softmax layer. The convolutional components act as feature-extraction layers, and the activation maps created by these layers are the bottleneck features.

ResNet-50: "ResNet50" stands for Residual Neural Network [12]. ResNet-50 is a convolutional neural network (CNN) with 50 layers, a variant of ResNet that can operate with up to 50 neural network layers. ResNet50 has 48 convolution layers, one MaxPool layer, and one average pool layer. There are about 3.8 × 10^9 floating-point operations in all, and ResNet-50 [13] has roughly 23 million trainable parameters.
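As an illustration of the transfer-learning recipe above (reuse the early and middle layers, retrain the head), here is a hedged Keras sketch of ResNet50 adapted to the four tumor classes; the head design is an assumption, not the authors' exact configuration:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(220, 220, 3))
base.trainable = False   # reuse early/middle layers, retrain only the head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),   # head width is an assumption
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),  # pituitary, meningioma, glioma, no tumor
])
```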
3.4 Performance Calculation

We used test data to evaluate the algorithms' efficiency after training. Several metrics were generated for the performance review, and we used these criteria to discover the most effective predictive model. Based on the confusion matrix that each model provides, the percentage performance measures were computed using Eqs. (1)–(7), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{\text{Total Number of Images}} \times 100\% \quad (1)$$

$$\text{True Positive Rate (TPR)} = \frac{TP}{TP + FN} \times 100\% \quad (2)$$

$$\text{True Negative Rate (TNR)} = \frac{TN}{FP + TN} \times 100\% \quad (3)$$

$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN} \times 100\% \quad (4)$$

$$\text{False Negative Rate (FNR)} = \frac{FN}{FN + TP} \times 100\% \quad (5)$$

$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (6)$$

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \quad (7)$$
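Equations (1)–(7) are straightforward to compute from the confusion-matrix counts reported in Table 1 below; a small sketch:

```python
def metrics(tp, fn, fp, tn):
    """Per-class metrics of Eqs. (1)-(7), returned as percentages."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # recall is the TPR
    return {
        "accuracy": 100 * (tp + tn) / total,
        "TPR": 100 * recall,
        "TNR": 100 * tn / (fp + tn),
        "FPR": 100 * fp / (fp + tn),
        "FNR": 100 * fn / (fn + tp),
        "precision": 100 * precision,
        "F1": 100 * 2 * precision * recall / (precision + recall),
    }

# Example: the ResNet-50 pituitary row of Table 1
print(metrics(tp=798, fn=25, fp=15, tn=566))
```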
4 Results and Discussions

The 5712 training images and 1404 validation images were obtained via an 80:20 split. An Intel Core i7 CPU with 16 GB of RAM powers the experimental platform. All input pictures for the VGG-16, VGG-19, and ResNet-50 models were scaled to 220 × 220 pixels, and the weights of these pre-trained models were employed. The resultant confusion matrix entries (TP, FN, FP, TN) for each model and each of the four classes are shown in Table 1.

Table 1. Confusion matrices for the applied transfer learning models with four classes.

| Model | Class | TP | FN | FP | TN |
|---|---|---|---|---|---|
| VGG-16 | Pituitary | 818 | 48 | 23 | 515 |
| VGG-16 | Meningioma | 778 | 43 | 18 | 565 |
| VGG-16 | Glioma | 723 | 32 | 23 | 626 |
| VGG-16 | No Tumor | 671 | 53 | 8 | 672 |
| VGG-19 | Pituitary | 840 | 55 | 14 | 495 |
| VGG-19 | Meningioma | 773 | 65 | 13 | 553 |
| VGG-19 | Glioma | 670 | 54 | 12 | 670 |
| VGG-19 | No Tumor | 868 | 22 | 10 | 504 |
| ResNet-50 | Pituitary | 798 | 25 | 15 | 566 |
| ResNet-50 | Meningioma | 760 | 12 | 19 | 613 |
| ResNet-50 | Glioma | 745 | 36 | 26 | 597 |
| ResNet-50 | No Tumor | 723 | 34 | 21 | 626 |
We used 40 epochs with a batch size of 32 for VGG-16. Once VGG-16 training is complete, we construct the confusion matrix and assess the performance for each class. Figure 3 shows the accuracy and loss graphs, while Table 2 presents the computed performance.
Fig. 3. Diagram for (a) VGG-16 accuracy and (b) VGG-16 loss on 40 epochs.
Table 2. Evaluation appraisal table by class for VGG-16

| Model | Class | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|---|---|---|---|
| VGG-16 | Pituitary | 94.94 | 94.46 | 5.54 | 4.28 | 95.72 | 97.27 | 95.84 |
| VGG-16 | Meningioma | 95.67 | 94.76 | 5.24 | 3.07 | 96.97 | 97.74 | 96.23 |
| VGG-16 | Glioma | 96.09 | 95.76 | 4.24 | 3.53 | 96.47 | 96.92 | 96.34 |
| VGG-16 | No Tumor | 95.66 | 92.68 | 7.32 | 1.17 | 98.83 | 98.82 | 95.66 |
For VGG-19, we utilized 40 epochs and a batch size of 32. Once VGG-19 training is complete, we create the confusion matrix from the model and evaluate each class's performance. The accuracy and loss graphs are shown in Fig. 4, and the computed performance is presented in Table 3.
Fig. 4. Diagram for (a) VGG-19 accuracy and (b) VGG-19 loss on 40 epochs.
Table 3. Evaluation appraisal table by class for VGG-19

| Model | Class | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|---|---|---|---|
| VGG-19 | Pituitary | 95.08 | 93.85 | 6.15 | 2.75 | 97.25 | 98.36 | 96.05 |
| VGG-19 | Meningioma | 95.20 | 92.24 | 7.76 | 2.30 | 97.70 | 98.35 | 95.20 |
| VGG-19 | Glioma | 95.31 | 92.54 | 7.46 | 1.75 | 98.25 | 98.24 | 95.31 |
| VGG-19 | No Tumor | 98.19 | 97.53 | 2.47 | 1.95 | 98.05 | 98.86 | 98.19 |
Fig. 5. Diagram for (a) ResNet-50 accuracy and (b) ResNet-50 loss on 40 epochs.
We used 40 epochs and a batch size of 32 for ResNet-50. Once ResNet-50 training is complete, we generate the confusion matrix and evaluate each class's performance. Figure 5 shows the accuracy and loss graphs, while Table 4 shows the computed performance.

Table 4. Evaluation appraisal table by class for ResNet-50

| Model | Class | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | Pituitary | 97.56 | 96.96 | 3.04 | 2.59 | 97.41 | 98.15 | 97.56 |
| ResNet-50 | Meningioma | 97.79 | 98.45 | 1.55 | 3.01 | 96.99 | 97.56 | 98.00 |
| ResNet-50 | Glioma | 95.59 | 95.39 | 4.61 | 4.16 | 95.84 | 96.63 | 96.01 |
| ResNet-50 | No Tumor | 96.09 | 95.51 | 4.49 | 3.24 | 96.76 | 97.18 | 96.34 |
In the proposed study, the trained model is assessed using the validation dataset. The training dataset for our model, which contained both the baseline and augmented photographs, was used for learning. After that, the model was verified to confirm that it was correct. After being trained on the tumor dataset with all available model architectures, each model's performance was measured using test images. The pre-trained ResNet-50, VGG-16, and VGG-19 model weights were experimented with [14]; this was done to compare our results across well-known, previously trained transfer learning networks and to examine which trained network is the most appropriate for this dataset. The overall results of the three distinct models, VGG-16, VGG-19, and ResNet-50, are presented in Table 5.

Table 5. Final accuracy table for the computed performance of the applied transfer learning models.

| Model | Accuracy (%) | TPR (%) | FNR (%) | FPR (%) | TNR (%) | Precision (%) | F1 Score (%) |
|---|---|---|---|---|---|---|---|
| VGG-16 | 95.59 | 94.415 | 5.585 | 3.0125 | 96.9975 | 97.6875 | 96.01 |
| VGG-19 | 95.95 | 94.04 | 5.96 | 2.19 | 97.81 | 98.45 | 96.19 |
| ResNet-50 | 96.76 | 96.58 | 3.42 | 3.25 | 96.75 | 97.38 | 96.978 |
5 Conclusion

With millions of cells interacting, the brain is one of the most complex organs of the entire body. When a brain tumor grows in the head, the pressure inside the brain rises, causing the brain to be damaged. An intracranial neoplasm, often known as a brain tumor, is a disorder in which abnormal cells grow in the human brain. Using data from Figshare and Kaggle, this study describes transfer learning and deep feature extraction attempts at brain tumor MRI recognition. Deep feature extraction and transfer learning are carried out using three well-known deep CNN architectures: VGG16, ResNet50, and VGG19. The collected dataset was chosen for the experimental work due to its large number of example pictures. ResNet50 has the highest accuracy of all the models, with a 96.76% detection rate for brain tumor MRI. In the future, we intend to use several other CNN architectural approaches to improve detection accuracy.
References

1. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
2. Deepak, S., Ameer, P.M.: Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 111, 103345 (2019)
3. Talo, M., Baloglu, U.B., Yıldırım, Ö., Rajendra Acharya, U.: Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 54, 176–188 (2019)
4. Ahuja, S., Panigrahi, B.K., Gandhi, T.: Transfer learning based brain tumor detection and segmentation using superpixel technique. In: 2020 International Conference on Contemporary Computing and Applications (IC3A), pp. 244–249. IEEE (2020)
5. Khan, M.A., et al.: Multimodal brain tumor classification using deep learning and robust feature selection: a machine learning application for radiologists. Diagnostics 10(8), 565 (2020). https://doi.org/10.3390/diagnostics10080565
6. Kaur, T., Gandhi, T.K.: Deep convolutional neural networks with transfer learning for automated brain image classification. Mach. Vis. Appl. 31(3), 1–16 (2020). https://doi.org/10.1007/s00138-020-01069-2
7. Nickparvar, M.: Brain tumor MRI dataset. Kaggle (2021). https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset. Accessed 24 Mar 2022
8. Mia, J., Bijoy, H.I., Uddin, S., Raza, D.M.: Real-time herb leaves localization and classification using YOLO. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7 (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579718
9. Krishna, R., Menzies, T.: Bellwethers: a baseline method for transfer learning. IEEE Trans. Softw. Eng. 45(11), 1081–1105 (2018)
10. Alippi, C., Disabato, S., Roveri, M.: Moving convolutional neural networks to embedded systems: the AlexNet and VGG-16 case. In: 2018 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 212–223. IEEE (2018)
11. Mateen, M., Wen, J., Song, S., Huang, Z.: Fundus image classification using VGG-19 architecture with PCA and SVD. Symmetry 11(1), 1 (2018)
12. Theckedath, D., Sedamkar, R.R.: Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 1(2), 1–7 (2020)
13. Bitto, A.K., Mahmud, I.: Multi categorical of common eye disease detect using convolutional neural network: a transfer learning approach. Bull. Electr. Eng. Inform. 11(4), 2378–2387 (2022). https://doi.org/10.11591/eei.v11i4.3834
14. Hasan, S., Rabbi, G., Islam, R., Imam Bijoy, H., Hakim, A.: Bangla font recognition using transfer learning method. In: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 57–62 (2022). https://doi.org/10.1109/ICICT54344.2022.9850765
Deep Convolutional Comparison Architecture for Breast Cancer Binary Classification

Nasim Ahmed Roni1(B), Md. Shazzad Hossain2, Musarrat Bintay Hossain3, Md. Iftekharul Alam Efat4, and Mohammad Abu Yousuf5

1 Institute of Information Technology (IIT), Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected]
2 Daffodil International University, DIU Rd, Dhaka 1341, Bangladesh
[email protected]
3 Changsha University of Science and Technology, Hunan, China
4 Institute of Information Technology (IIT), Noakhali Science and Technology University, University Rd, Noakhali 3814, Bangladesh
5 Institute of Information Technology (IIT), Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected]
Abstract. Early discernment of breast cancer can significantly improve the prospect of successful recovery and survival, but diagnosis takes a lot of time and frequently leads to disagreement among pathologists. Recently, much research has tried to develop the best breast cancer classification models to help pathologists make more precise diagnoses. Convolutional networks are prominent in biomedical imaging because they discover significant features and automate image processing, so knowing which CNN models are optimal for breast cancer binary classification is crucial. This work proposes an architecture for finding the best CNN model. Inception-V3, ResNet-50, VGG-16, VGG-19, DenseNet-121, DenseNet-169, DenseNet-201, and Xception are analyzed as classifiers in this paper. We examined these deep learning techniques on a breast ultrasound image dataset. Due to limited data, a generative adversarial network is used to augment the data and improve the algorithms' precision. Several statistical analyses are used to determine the finest convolutional technique for early breast cancer detection using the improved images in binary-class scenarios. This binary classification experiment evaluates each strategy across various dimensions to determine which aspects improve success. In both normalized and denormalized conditions, Xception maintained 95% accuracy. Xception mines the available knowledge thoroughly and is highly advanced; therefore, its accuracy is considered better than that of the others.

Keywords: Breast Cancer · Binary Classification · Convolutional Network · Generative Adversarial Network · Data Augmentation · Xception
1 Introduction

Breast cancer is a complicated and multidimensional disease with a wide range of risk factors [1], histological findings, clinical manifestations, and treatment choices [2]. Breast cancer is characterized by the unrestrained growth of malignant cells in the mammary epithelial tissue and is an illness that affects both men and women. Breast cancer is the most common cancer among women in the world, with a prevalence that rises with age [3], and it is the second largest cause of death for women after lung cancer [4]. Recent medical discoveries have produced new and improved techniques to prevent, diagnose, categorize, and treat breast cancer [5]. One of the most successful ways to automatically detect and diagnose diseases at an early stage is to use a computer-aided detection (CAD) tool for medical imaging. CAD binary classification methods use intelligent approaches to automatically identify breast findings as benign or cancerous. These medical imaging techniques may be useful in early breast cancer diagnosis [6]. As a result, owing to the rising importance of predictive diagnosis and therapy, there is a growing trend in cancer detection and classification research to apply machine learning algorithms for projection and prognosis [7]. The use of histopathology data to classify breast abnormalities has grown in popularity in recent years. In classic machine learning-based CAD systems, an independent classifier is built on a set of hand-crafted attributes, so the extraction of features from histopathology images is critical [8]. Because histopathological images include various aspects, it is challenging to determine the particular features, and pathologists may lack experience in recognizing the specific needs. Most of the time, images can be identified by patterns, fractals, colors, and intensity levels. Manual feature extraction is time-consuming and can require in-depth prior knowledge of the disease in order to find highly representative characteristics [9]. As processing power and massive amounts of data become more readily available for the learning process, the number of studies using machine learning has increased. Because of its ability to learn from raw data, machine learning is becoming increasingly popular for dealing with both simple and complex situations, and the right machine learning models can reduce diagnostic error rates [10]. Massive volumes of complex data may now be examined and comprehended using new machine learning techniques [11]. To circumvent the drawbacks of conventional machine learning techniques, deep learning was created to efficiently employ the significant information that may be extracted from raw images for categorization [12]. Deep learning uses a general-purpose learning technique that does away with the necessity for manual feature tuning. Deep learning with convolutional neural networks has recently made many advancements in the field of medical image analysis, including the classification of mitotic cells from microscopic images and the identification of tumors [13]. Convolutional neural networks perform admirably with large data sets, but poorly with smaller data sets. Several research teams have investigated the application of convolutional networks in medical image processing, with encouraging findings [14]. Convolutional networks have been used to overcome the difficulty of identifying and classifying tumors in ultrasound images [15].
Furthermore, several recent studies evaluated the feasibility of CNNs [16], including AlexNet, U-Net, VGG16, VGG19, ResNet18, ResNet50, MobileNet-V2, and Xception, for the problem of classifying ultrasound images as benign or malignant [17]. The purpose of this article is to compare the performance of pre-trained deep learning models in order to determine which convolutional techniques are best for breast cancer binary classification, to identify the best performing model from a large number of alternatives, and to propose a mechanism for differentiating them. Eight pre-trained CNN architectures are trained on the ultrasound image dataset for this classification task.

The remainder of the paper is constructed as follows. Section 2 reviews previous studies on breast cancer classification. Section 3 covers the data pre-processing techniques, feature extraction, and methodology. Section 4 encompasses the dataset description, experimental assessment, and discussion. Finally, Sect. 5 provides the conclusion and future directions.
2 Literature Review

In recent years, numerous breast cancer classification strategies have been proposed, with CNN models receiving a lot of research interest [18]. Various machine learning and deep learning algorithms are available for cancer detection and classification. Convolutional Neural Networks, Recurrent Neural Networks, and pre-trained models such as AlexNet, GoogLeNet, VGG16, VGG19, ResNet50, InceptionV3, DenseNet121, DenseNet169, DenseNet201, and Xception are some of the most used deep learning approaches for breast cancer classification [19].

Using the idea of transfer learning, Khan et al. developed a CNN-based framework that can recognize and binary-classify breast cytology images; their proposed framework reached 97.52% accuracy [20]. They also applied data augmentation to expand the size of the dataset and improve the efficiency of the CNN algorithms. Gupta et al. used SVM and Logistic Regression to compare deep feature extraction and classification performance [24]; on the standard breast cancer dataset, their study outperforms previous strategies and produces state-of-the-art results. Fang et al. proposed an improved multilayer perceptron for breast cancer detection [21]. In contrast, Talatian Azad et al. proposed an intelligent ensemble classification method based on a multilayer perceptron neural network (IEC-MLP), with an average accuracy of 98.74% for breast cancer detection [5]; the suggested approach is composed of two components, parameter optimization and ensemble classification. Mohamed et al. demonstrated a fully automated breast cancer diagnosis technique with a 99.33% accuracy rate, utilizing a U-Net to determine the breast region from thermal images and a deep learning method to evaluate abnormal breast tissues [22]. Al Husaini et al. created DCNN classifier models based on Inception V3 and V4 for breast cancer detection, to study the behavior of several modern deep learning techniques for breast cancer diagnosis [23]; the results showed that employing color thermal imaging, DCNN Inception V4 and the updated Inception MV4 considerably increased accuracy and efficiency in detecting breast cancer. Gupta et al. used a deep learning model that relied on pre-trained CNN activation features with traditional classifiers to automatically categorize breast cancer images [6].
The difficulty of gathering sufficient positive cases and the problems in developing breast cancer binary classification algorithms make it challenging to tackle overfitting issues, as is the case with many machine learning applications in healthcare [26]. Numerous subsequent papers used generative adversarial networks (GANs) for data augmentation [27]; in this setting, the training dataset can be enlarged by using a GAN [28]. Guan et al. devised a technique for detecting breast cancer that used GANs to generate artificial mammograms as an image augmentation technique, achieving a validation accuracy of 79.8% [29]. To enhance classification performance, Pang et al. devised a discriminative robustness approach to increase accuracy [30]. In that paper, the authors compare the performance of numerous CNN + traditional classifier configurations, such as VGG-16 + SVM, VGG-19 + SVM, Xception + SVM, and ResNet-50 + SVM; according to the researchers, the ResNet-50 network had a maximum accuracy of 93.27%. Khamparia et al. suggested a Modified VGG (MVGG) model based on transfer learning to diagnose breast cancer in mammography images [25]; according to their trials, the suggested transfer learning combination of MVGG and ImageNet achieves 94.3% accuracy, outperforming other convnets.

The prior background investigation revealed that other authors had employed several approaches with varying degrees of success. We assessed their work and uncovered several shortcomings, which prompted this study. Most prior studies evaluated only a few algorithms, whereas this work compares a larger set of them. Because convolutional networks outperform previous models, the suggested comparative performance model can aid pathologists in accurately diagnosing breast cancer at an early stage. Additionally, data augmentation approaches are rarely employed in low-data scenarios, which leaves issues such as feature forecasting unaddressed. As a result, this paper gives a comparative analysis of breast cancer binary classification utilizing convolutional networks. We employed eight pre-trained CNN models and the GAN augmentation technique to extract and forecast attributes from ultrasound images for the purpose of classifying benign and malignant lesions.
3 Methodology

The workflow of the Deep Convolutional Comparison Architecture (DCCA) is divided into five components prepared for binary classification, as described in Fig. 1. The first stage of this comparison approach is data pre-processing: the dataset is prepared and then augmented using the Generative Adversarial Network (GAN) architecture, which completes the pre-processing stage. Ian Goodfellow and his colleagues first proposed the GAN technique in June 2014 [31]. GANs are a data-enrichment technique that pits two neural networks against each other to create new, synthetic data instances that can pass for actual data; they are commonly utilized in image, video, and voice generation. Next, the experimental environment was set up, and the networks were employed to prepare the data for classification through feature extraction and selection.
The model training phase began once all of the features were identified and selected; eight different convolutional techniques were then utilized to categorize the data in the experimental setting. Finally, the model assessment step determines the best binary classification result. The complete architecture is displayed in Fig. 1.
Fig. 1. Deep Convolutional Comparison Architecture (DCCA)
3.1 Data Pre-processing Using GAN

The size of the training dataset heavily influences deep learning models' performance. As a result, strategies for increasing dataset cardinality, such as data augmentation, are crucial. By addressing the issue of over-fitting, data augmentation enhances network performance. In this study, GAN-based data augmentation improves the generalizability of the fitted models. A generator and a discriminator network were built to provide GAN augmentation. A noise vector is fed into the generator as input, which creates augmented data; these are then supplied to the discriminator, along with actual data, to identify which distribution the samples came from. The generator's purpose, on the other hand, is to learn the true data distribution without seeing it, such that its output is indistinguishable from actual samples. Both networks are trained simultaneously and in opposite directions until an equilibrium is attained.

For $x \in \mathbb{R}^d$, consider a mapping from noise $x$ to actual data $y$ in $d$-dimensional space; this mapping is modeled by a neural network dubbed the generator $G$. A sample $y$ is genuine if it comes from $p_{data}$; a sample $z = G(x)$ is synthetic. The discriminator $D$ is a neural network that determines whether or not a sample is genuine; $D(y) = 1$, $D(z) = 0$ represents the ideal case. $G$ and $D$ are the two neural networks that constitute the GAN. These adversarial networks are trained with the loss function of a two-player mini-max game:

$$\min_G \max_D V(G, D) = \mathbb{E}_{y \sim p_{data}}[\log D(y)] + \mathbb{E}_{x}[\log(1 - D(G(x)))] \qquad (1)$$

A global optimum of this min-max problem is attained when the generator distribution matches the data distribution, i.e., $p_G = p_{data}$; the goal is thus to learn the distribution of the real data. When $D(y) = D(z) = 0.5$, the discriminator $D$ can no longer tell the difference between a genuine and a manufactured sample, and by varying the input $x$, $G$ can be used to make artificial samples. The input $x$ for $G$ in this investigation was a noise vector with 100 attributes drawn from a Gaussian distribution $N(0, 1)$; it is vital for a well-trained GAN to be able to create realistic-looking data samples from such noise vectors. The generator network was designed with four up-sampling layers and five convolutional layers; the ReLU activation function is used in every layer except the output layer, where the tanh function is used. The generator's purpose is to create a 229 × 229 × 3 image from a 100-length vector. In contrast, the discriminator takes a 229 × 229 × 3 image as input and outputs a value between 0 and 1, indicating whether the image is augmented or real. The discriminator network is constructed with four conv layers with max-pooling layers and one fully connected layer, just like a standard CNN. Figure 2 shows sample outputs of augmented benign and malignant class images after processing the data through this GAN network.
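As a concrete illustration of the objective in Eq. (1), the following is a minimal sketch of one adversarial training step in TensorFlow/Keras (the stack listed in Sect. 4.2). Here `generator` and `discriminator` stand for models built as described above; the optimizers and the non-saturating generator loss are common practical choices and are assumptions, not the authors' exact code.

```python
# Minimal sketch of one adversarial training step for Eq. (1).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
NOISE_DIM = 100  # the 100-attribute noise vector x ~ N(0, 1)

@tf.function
def train_step(real_images, generator, discriminator):
    noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake_images, training=True)
        # D maximizes log D(y) + log(1 - D(G(x))), written here as a BCE loss.
        d_loss = bce(tf.ones_like(d_real), d_real) + \
                 bce(tf.zeros_like(d_fake), d_fake)
        # Non-saturating G loss: push D toward outputting 1 for fakes.
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```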
3.2 Feature Extraction, Selection and Classification

In this study, eight different deep convolutional methods are used to extract, select, and classify features: the input image flows forward until it reaches a pre-specified layer, and that layer's outputs are then used as the resulting features. The pre-trained network thus acts as an arbitrary feature extractor and selector.

DenseNet: This is a state-of-the-art CNN model for visual object recognition that requires fewer parameters to attain state-of-the-art performance. DenseNet is very similar to ResNet, except that DenseNet combines the output of the preceding layers with a subsequent layer using concatenation (.), whereas ResNet uses element-wise addition (+) to integrate the result of prior layers with the output of subsequent layers. The DenseNet architecture connects all layers densely. The DenseNet-121, DenseNet-169, and DenseNet-201 architectures were used in this study.

VGG: The VGG Net architecture allows for excellent accuracy. The Visual Geometry Group architecture comes in six configurations, each consisting of repeated convolution and pooling layers. VGG-19 Net has 19 layers, including 16 conv and three fully connected (FC) layers, whereas VGG-16 Net has 13 conv layers and three FC layers. According to the deep-structure VGG-Net, the network's depth is essential for good performance.

Xception: The Xception architecture uses a depth-wise separable sequential array of convolution layers with residual blocks. The purpose of depth-wise separable convolution is to cut down on processing time and memory usage. The 36 conv layers of Xception are divided into 14 modules. In Xception, separable convolution helps resolve issues such as vanishing gradients and representational bottlenecks, and a channel in the sequential network separates channel-wise and space-wise feature learning.
Instead of being concatenated, this shortcut connection uses a summing operation so that the outcome of the previous layer can be used as an input to the following layer.

Inception V3: The Inception V3 module from Google Brain is the third iteration of the Inception module; it is 42 layers deep and comprises 159 layers in its full implementation. The Inception module's main idea is to mix small and big kernels to learn multi-scale representations while keeping the computational cost and parameter count to a minimum.

ResNet-50: The idea of a residual block was first introduced by ResNet, a deep residual learning network. The first block's input is connected to the second block's output through residual blocks; this strategy enables the residual block to learn the residual function without inflating the parameters. A conv layer, 48 convolutional layers arranged in residual blocks with small 1 × 1 and 3 × 3 filters, and a classifier layer make up the 50-layer residual network known as ResNet50.
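To make the shared pattern behind all eight models concrete, the following is a minimal Keras sketch of a frozen pre-trained backbone with a small binary classification head; Xception is shown as an example, and the frozen base, head sizes, and the 229 × 229 × 3 input quoted above are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch: a frozen pre-trained CNN as feature extractor/selector
# with a small binary classification head (benign vs. malignant).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False,
    input_shape=(229, 229, 3), pooling="avg")
base.trainable = False  # features come from the pre-trained, frozen layer stack

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Swapping `Xception` for any of the other seven backbones (VGG16, VGG19, the DenseNets, ResNet50, InceptionV3) follows the same pattern.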
4 Result Analysis and Discussion

This study employed several statistical criteria to evaluate the techniques, including accuracy, precision, recall, and F1 score. Classification confusion matrices are reported for both normalized and non-normalized data, and graphs of model accuracy and the loss function are provided to aid comprehension.
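As an illustration of these criteria, a short scikit-learn sketch is given below; `y_true`, `y_pred`, and `y_score` are placeholders for the test labels, model predictions, and predicted probabilities, and the row-normalized matrix corresponds to the normalized plots discussed later.

```python
# Sketch of the evaluation criteria: accuracy, precision, recall, F1, ROC AUC,
# and normalized vs. non-normalized confusion matrices.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))
    print("ROC AUC  :", roc_auc_score(y_true, y_score))
    print(confusion_matrix(y_true, y_pred))                    # raw counts
    print(confusion_matrix(y_true, y_pred, normalize="true"))  # normalized
```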
Fig. 2. Augmented Data in Various Epochs
4.1 Dataset

For this investigation, we used the Breast Ultrasound Image Dataset for binary classification. It is a freely accessible dataset found on Kaggle [32]; Al-Dhabyani et al. constructed the dataset in 2018 [33].
At the baseline, the original dataset had 780 images, comprising ultrasound images of women between the ages of 25 and 75. The classification is based on groups of 306 benign and 294 malignant images, respectively. The data augmented at various epochs using the GAN approach is shown in Fig. 2.

4.2 Experimental Setup

In this experiment, data passed through the eight algorithms to extract and select features, and the retrieved features were then used to train the convolutional models. Before the feature extraction and selection stages, as well as the classification phase, some experimental setup is done to ensure that the pipeline runs smoothly. Finally, the experimental findings are assessed and the model that best fits the data is chosen.

a. Workstation: The work is done on the premium virtual environment, Google Colaboratory.
b. Packages and Libraries: The following libraries were used in this experiment:
– NumPy version 1.21.5
– TensorFlow version 2.8.0
– Colab GPU: NVIDIA Tesla K80
– Keras version 2.x
4.3 Classification Result and Confusion Matrix

Data augmentation with GAN produced 1200 images in total, of which 600 were abnormal (benign + malignant) and 600 were normal, for use in model training in this study. We split the data into an 80:20 train-test ratio, and the results are based on 240 test images that were not used during training. Table 1 summarizes the accuracy, precision, recall, and F1 score of the gathered findings for the binary classification. Table 1 shows that DenseNet-121 and DenseNet-169 both had an accuracy of 92%, while Xception had the highest accuracy of 95% as well as the highest precision of 97%; DenseNet-121, on the other hand, attains the highest recall of 98%, and both end up with a 95% F1 score. The same analysis is performed for the malignant class, where the accuracy for DenseNet-121, DenseNet-169, and Xception is still 92%, while the precision drops to 90% for DenseNet-121, with F1 scores of 88% and 84% for Xception.

In this case study, the confusion matrix is vital to emphasize since it shows the fundamental importance of data processing strategies. Figure 3 shows the performance evaluation of the classification results for normalized data. The actual result in each model is greater with normalized data and less so with non-normalized data. Looking more closely, we can observe that in each case, Xception, DenseNet-169, DenseNet-121, and VGG-16 determined the real outcome with a high degree of accuracy.
Table 1. Results of Binary Classification

Model | Accuracy | Precision | Recall | F1 Score
VGG-16 | 0.91 | 0.91 | 0.98 | 0.94
VGG-19 | 0.88 | 0.91 | 0.95 | 0.93
DenseNet-121 | 0.92 | 0.92 | 0.98 | 0.95
DenseNet-169 | 0.92 | 0.94 | 0.96 | 0.95
DenseNet-201 | 0.81 | 0.96 | 0.78 | 0.86
Xception | 0.95 | 0.97 | 0.93 | 0.95
ResNet-50 | 0.79 | 0.90 | 0.82 | 0.86
Inception-V3 | 0.88 | 0.91 | 0.95 | 0.93
4.4 Model Accuracy, Loss Function, and ROC Graph Analysis

To understand the presented models' performance, graphs are crucial to complete the result analysis. As seen in Fig. 4, the consistency of the Xception model across both the training and testing phases is considerably superior to the others, whereas DenseNet-121 and DenseNet-169 show higher training accuracy but lower testing accuracy. In this scenario, the performance of VGG-16 is also considerable. In Fig. 5, it can be seen that while employing the categorical loss function, various abnormalities occurred in the loss function graphs, with the greatest consistency of loss in VGG-16 and Xception but larger swings in the other cases. Finally, the ROC graph analysis in Fig. 6 completes the picture. This case study revealed that the ROC of the Xception model is the highest, implying that the Xception model is more resilient than the other models in this binary classification case study; it held the peak position in each dimension and produced the best-performing model.

4.5 Discussion

In this binary classification scenario, the Xception model outperformed the DenseNet-121, DenseNet-169, and VGG-16 models. Furthermore, it is vital to note that the balance between convolutional layers and residual connections is critical in dealing with the classification issue. The separable convolution channel of the sequential network in Xception combines channel-wise and space-wise feature learning to aid in the resolution of difficulties like vanishing gradients and representational limits. Instead of concatenating, the alternative models focus on attribute management, deep structural analysis, computational cost, and parameter explosion.
Fig. 3. Confusion Matrix of Data with Normalization
Fig. 4. Model Accuracy Graphs of Presented Model
Based on the layering behavior of these models, the DenseNet design maximizes the residual mechanism by having each layer tightly connected to the ones below it. The model's compactness makes it non-redundant because learned features are shared through community knowledge. Convolutions, average pooling, max pooling, dropouts, and fully connected layers train these densely linked deep networks with implicit deep supervision, allowing the gradient to flow back more quickly through the short connections.
Fig. 5. Model Loss Graphs of Presented Model
Fig. 6. ROC Graphs of Presented Model
Convolutions, average pooling, max pooling, dropouts, and fully connected layers, on the other hand, are the symmetric and asymmetric building components of the Inception model. It primarily uses factorized convolutions to reduce the number of connections and parameters to train, resulting in faster processing and better results; as such, it serves as an optimized image classification booster. ResNet works similarly, layering all of these blocks exceedingly deeply and regularly.
In addition, the VGG approach uses both independent and fully connected conv layers, resulting in a computationally costly strategy with a lower error rate than others. Xception, an extreme version of Inception using depth-wise separable convolution, performs even better than Inception V3. In Xception, depth-wise convolution is a channel-wise n × n spatial convolution, and point-wise convolution is a 1 × 1 convolution that changes the dimension. Xception employs the most detailed and in-depth knowledge-digging technique to extract explicit features from an image, and its residual network aids in achieving an optimal learning rate, making it the most efficient and best-fitting of all the convolutional models presented.

Finally, despite the model's strength, it can reasonably be said that it is rather lightweight. Here, the emphasis is on efficiency in data training: using transfer learning and GAN data augmentation approaches, it is relatively bias-free. Some earlier studies report higher accuracy than ours; however, the weaknesses of those models frequently result from the use of typical machine learning models, biases, and simple data augmentation methods such as shifting and rotating, and other factors may also make such models less transparent. As a result, even though our model is not the most accurate, it remains preferable to the others.
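The parameter savings behind this argument can be checked directly; the sketch below compares a standard 3 × 3 convolution with a depth-wise separable one in Keras on an arbitrary example shape (the shapes are illustrative and not taken from the paper).

```python
# Parameter count: standard vs. depth-wise separable 3x3 convolution.
from tensorflow.keras import layers, models

standard = models.Sequential(
    [layers.Conv2D(128, 3, padding="same", input_shape=(64, 64, 64))])
separable = models.Sequential(
    [layers.SeparableConv2D(128, 3, padding="same", input_shape=(64, 64, 64))])

print(standard.count_params())   # 64*3*3*128 + 128 = 73,856
print(separable.count_params())  # 64*3*3 + 64*128 + 128 = 8,896
```

The separable version needs roughly an eighth of the parameters here, which is the efficiency Xception trades on.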
5 Conclusion and Future Work

This study compared eight different convolutional models to discover which one outperformed the rest across all case studies. The fittest model was determined by comparing each model's performance on normalized and non-normalized forms of an ultrasound image dataset, with the Xception model outperforming the others in each dimensional case study; it is the most appropriate and consistent model for the binary classification of breast cancer. The use of the GAN architecture to pre-process the dataset enhanced the performance of each model, implying that data processing approaches are helpful in reaching the intended result. This also aids in determining which convolutional model performs best in binary breast cancer classification scenarios with few images. Working with biomedical data is difficult due to its scarcity, so such work necessitates augmented data and data pre-processing to extract the appropriate parameters. The contribution of this work is to use deep learning methods, rather than hybrid methods, together with advanced data pre-processing techniques that can generate synthetic images, train models with different dimensional parameters, select suitable features to train deep learning models, and enrich the transfer learning process, taking biomedical imaging toward multidimensional or correlational grounds. In the future, we will quantify breast cancer severity to develop a system that represents an automated version of the BI-RADS severity measurement scale.
References

1. Polyak, K.: Heterogeneity in breast cancer. J. Clin. Investig. 121(10), 3786–3788 (2011). https://doi.org/10.1172/JCI60534
2. Lopez-Garcia, M.A., Geyer, F.C., Lacroix-Triki, M., Marchió, C., Reis-Filho, J.S.: Breast cancer precursors revisited: molecular features and progression pathways. Histopathology 57(2), 171–192 (2010). https://doi.org/10.1111/j.1365-2559.2010.03568.x
3. Lukong, K.E.: Understanding breast cancer – the long and winding road. BBA Clin. 7, 64–77 (2017)
4. Zardavas, D., Irrthum, A., Swanton, C., Piccart, M.: Clinical management of breast cancer heterogeneity. Nat. Rev. Clin. Oncol. 12(7), 381–394 (2015)
5. Talatian Azad, S., Ahmadi, G., Rezaeipanah, A.: An intelligent ensemble classification method based on multilayer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. J. Exp. Theor. Artif. Intell. 1–21 (2021)
6. Gupta, V., Vasudev, M., Doegar, A., Sambyal, N.: Breast cancer detection from histopathology images using modified residual neural networks. Biocybern. Biomed. Eng. 41(4), 1272–1287 (2021)
7. Makki, J.: Diversity of breast carcinoma: histological subtypes and clinical relevance. Clin. Med. Insights Pathol. 8, CPath-S31563 (2015)
8. Sanchez-Morillo, D., González, J., García-Rojo, M., Ortega, J.: Classification of breast cancer histopathological images using KAZE features. In: Rojas, I., Ortuño, F. (eds.) Bioinformatics and Biomedical Engineering. IWBBIO 2018. LNCS, vol. 10814, pp. 276–286. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78759-6_26
9. Miah, M.B.A., Yousuf, M.A.: Detection of lung cancer from CT image using image processing and neural network. In: 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), pp. 1–6. IEEE, May 2015
10. Xie, J., Liu, R., Luttrell IV, J., Zhang, C.: Deep learning based analysis of histopathological images of breast cancer. Front. Genet. 10, 80 (2019)
11. Huma, F., Jahan, M., Rashid, I.B., Yousuf, M.A.: Wavelet and LSB-based encrypted watermarking approach to hide patient's information in medical image. In: Uddin, M.S., Bansal, J.C. (eds.) Proceedings of International Joint Conference on Advances in Computational Intelligence. Algorithms for Intelligent Systems, pp. 89–104. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0586-4_8
12. Faruqui, N., Yousuf, M.A., Whaiduzzaman, M., Azad, A.K.M., Barros, A., Moni, M.A.: LungNet: a hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data. Comput. Biol. Med. 139, 104961 (2021)
13. Shin, H.C., et al.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016)
14. Khatun, M.A., Yousuf, M.A.: Human activity recognition using smartphone sensor based on selective classifiers. In: 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), pp. 1–6. IEEE, December 2020
15. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
16. Chen, Y., Jiang, H., Li, C., Jia, X., Ghamisi, P.: Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 54(10), 6232–6251 (2016)
17. Gómez-Flores, W., De Albuquerque Pereira, W.C.: A comparative study of pre-trained convolutional neural networks for semantic segmentation of breast tumors in ultrasound. Comput. Biol. Med. 126, 104036 (2020)
18. Daoud, M.I., Abdel-Rahman, S., Alazrai, R.: Breast ultrasound image classification using a pre-trained convolutional neural network. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 167–171. IEEE, November 2019
19. Khuriwal, N., Mishra, N.: Breast cancer detection from histopathological images using deep learning. In: 2018 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE), pp. 1–4. IEEE, November 2018
20. Khan, S., Islam, N., Jan, Z., Din, I.U., Rodrigues, J.J.C.: A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recogn. Lett. 125, 1–6 (2019)
21. Fang, H., Fan, H., Lin, S., Qing, Z., Sheykhahmad, F.R.: Automatic breast cancer detection based on optimized neural network using whale optimization algorithm. Int. J. Imaging Syst. Technol. 31(1), 425–438 (2021)
22. Mohamed, E.A., Rashed, E.A., Gaber, T., Karam, O.: Deep learning model for fully automated breast cancer detection system from thermograms. PLoS ONE 17(1), e0262349 (2022)
23. Al Husaini, M.A.S., Habaebi, M.H., Gunawan, T.S., Islam, M.R., Elsheikh, E.A., Suliman, F.M.: Thermal-based early breast cancer detection using Inception V3, Inception V4 and modified Inception MV4. Neural Comput. Appl. 34(1), 333–348 (2022)
24. Gupta, K., Chawla, N.: Analysis of histopathological images for prediction of breast cancer using traditional classifiers with pre-trained CNN. Procedia Comput. Sci. 167, 878–889 (2020)
25. Khamparia, A., et al.: Diagnosis of breast cancer based on modern mammography using hybrid transfer learning. Multidimension. Syst. Signal Process. 32(2), 747–765 (2021)
26. Wu, E., Wu, K., Lotter, W.: Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms (2020). arXiv preprint arXiv:2006.00086
27. Sohan, K., Yousuf, M.A.: 3D bone shape reconstruction from 2D X-ray images using MED generative adversarial network. In: 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), pp. 53–58. IEEE, November 2020
28. Peng, X., Tang, Z., Yang, F., Feris, R.S., Metaxas, D.: Jointly optimize data augmentation and network training: adversarial data augmentation in human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2226–2234 (2018)
29. Guan, S., Loew, M.: Breast cancer detection using synthetic mammograms from generative adversarial networks in convolutional neural networks. J. Med. Imaging 6(3), 031411 (2019)
30. Pang, T., Wong, J.H.D., Ng, W.L., Chan, C.S.: Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification. Comput. Methods Programs Biomed. 203, 106018 (2021)
31. Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 (2014)
32. Breast Ultrasound Image Dataset. https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset
33. Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data Brief 28, 104863 (2020)
Lung Cancer Detection from Histopathological Images Using Deep Learning Rahul Deb Mohalder1(B) , Khandkar Asif Hossain2 , Juliet Polok Sarkar2 , Laboni Paul1 , M. Raihan2 , and Kamrul Hasan Talukder1 1 Khulna University, Khulna, Bangladesh {rahul,khtalukder}@ku.ac.bd, [email protected] 2 North Western University, Khulna, Bangladesh
Abstract. Computed tomography (CT) is critical for identifying tumors and detecting lung cancer. As in recent work, we wish to incorporate a well-trained deep learning algorithm to recognize and categorize lung nodules based on clinical CT imagery. This investigation used open-source datasets and data from multiple centers. Deep learning is a widely used and powerful technique for pattern recognition and categorization. However, because large datasets of medical images are not always accessible, there are few deep structured applications in diagnostic medical imaging. In this research, a deep learning model was created to identify lung tumors from histopathological images. Our proposed Deep Learning (DL) model reached 95% accuracy with a loss of 0.158073.
Keywords: Deep Learning · Classification · Lung Cancer · Tumor · Detection

1 Introduction
With 18% of all cancer-related deaths, lung cancer is the most common cancer-related cause of death. Other cancer death rates are 9.4% for colon cancer, 8.3% for liver cancer, 7.7% for stomach cancer, and 6.8% for female breast cancer. There is a broad spectrum of disease development and therapy response among lung cancer patients. Additionally, lung cancer is becoming the biggest cause of mortality in the United States and Europe, surpassing heart disease, according to the European Medical Association (EMA), the World Health Organization (WHO), and the United States (US) Association [16]. As a result, accurate and timely diagnosis and detection are critical for deciding on and applying each lung cancer patient's treatment [27]. Multimodality is a prominent lung cancer therapy technique. However, the current survival rate for cancer patients ranges from 4 to 17%.
Fig. 1. Lung cancer [1].
In the early stages of lung cancer, a successful resection can be curative. Non-small cell lung cancer (NSCLC) patients undergoing resection had a 5-year survival rate of 75%–100% for stage IA NSCLC, but only 25% for stage IIIA NSCLC. However, confirming its pathological status via biopsy is difficult, particularly for tiny tumors in their early stages, and this constraint may negatively impact clinical decision-making and management. The increasing use of computed tomography (CT), which enables extensive surgical dissection, has led to an increase in the detection of NSCLC at an early stage (Fig. 1). In a sizable screening population, the National Lung Screening Trial (NLST) compared low-dose computed tomography (LDCT) to chest radiography and found that LDCT reduced lung cancer-specific mortality by 20%. Conventional CT analysis is time-consuming and requires radiologist approval, and CT-based lung cancer screening sometimes gives false-positive results. Due to CNNs' accuracy and low dependency on human participation in other computer vision applications, interest in pulmonary nodule recognition and classification has surged in recent years.

Cancer treatment requires microscopic diagnosis. Pathologists must find minute histopathological features in complex tumor tissue. This approach is time-consuming, critical, and results in considerable variation between and among observers [4,7]. The most often used staining method is hematoxylin and eosin (H&E) [2]. As technology progresses, H&E-stained whole slide imaging (WSI) is becoming a regular medical practice, resulting in a massive volume of high-resolution pathology images. Digital pathology is currently encountering a bottleneck as a result of the limited capacity of histopathology image processing methods. Historically, medical treatment has depended on symptom analysis: a patient's symptoms are initially looked into and, if necessary, the patient is referred for a more in-depth assessment.
The current usage of the term precision medicine responds to the difficulty posed by the vast but fragmented nature of biological data. To address this, patients' digital data are stored in shareable online databases and patient-centric appointments are employed [16]. Recent breakthroughs in Deep Learning (DL) and Deep Neural Networks (DNN) have improved the technology of image processing and object recognition from images. We may use a DNN to search for or match objects in an image and determine whether or not they are recognized; additionally, when examining an image, we might seek numerous patterns. Frequently, a preset dataset is required to train the Neural Network (NN), by which the network can learn, detect or recognize, and categorize images. DNNs can be used to extract features and categorize images in different kinds of diagnostic applications. In this research, we propose a learning model to detect lung cancer from a histopathological image dataset. The main objective of this research study is to enhance the performance of the Deep Learning model to identify lung cancer efficiently. We organized this paper as follows: Sect. 2 reviews previous lung cancer works; Sect. 3 describes the lung cancer dataset used in our experiment; Sect. 4 presents our proposed system and working procedure; we analyze our results and compare our outcomes with others in Sect. 5; finally, Sect. 6 summarizes this work.
2 Related Works
Jiang et al. [17] created a two-dimensional CNN architecture for tumor or spot detection, using images with vascular characteristics. Setio et al. [26] employed 2D CNN architectures to locate lung cancer, with CT scans from several planar perspectives used as training data; several lung nodule identification techniques were combined using CNNs to improve the result [29]. Dou et al. [10] proposed a combined 3D CNN architecture to reduce false positives in lung cancer diagnosis. Ding et al. [9] employed two steps to identify lung nodules: first, Faster R-CNN was enhanced with a deconvolutional structure to find lung candidates. Anirudh et al. [3] trained a 3D CNN by extending a 3D region from a single voxel point supplied by weakly labeled input. Artificial intelligence (AI) has recently demonstrated exceptional effectiveness in medical data processing and analysis due to the rapid advancement of DL methods, which have shown an increasing capacity to handle difficult real-life problems [13,19,20]. By equipping pathologists with advanced deep learning algorithms, they may be assisted with tough diagnostic issues. Ivanov et al. projected the DNN layers onto a dynamic image and found that the layer count had an effect on image processing [15]. Notably, the authors of [11] used a modified AlexNet; by using an auto-encoder, they were able to boost the success rate of the DNN's learned features to 90.1%. Using histopathology data, Mohalder et al. proposed a deep learning approach for precisely identifying and categorizing lung cancer levels; with 15,000 lung cancer histology images used for training and validation, they were able to reach a prediction accuracy of 99.80% [22].
Chen et al. [6] made excellent use of machine learning technologies in 2017 to forecast chronic sickness epidemics in disease-prone communities. Zhang et al. [28] devised a fast Fourier transformation-coupled machine learning strategy for forecasting short-term sickness risk and generating suitable recommendations on clinical test requirements for chronic heart disease patients; that model was a combination of ANN, LS-SVM, and NB. A different study group [23] proposed that clustering, noise reduction, and prediction algorithms be used to develop an analytical framework for forecasting disease, with CART used to generate fuzzy rules. Kotsavasiloglou and colleagues [18] developed a system that can classify unknown patients based on their line-drawing ability using an advanced machine learning methodology. Sedaghat et al. [25] developed a two-step technique for improving the outputs of sequence-based prediction techniques or models: the first phase utilized consensus learning, while the second phase utilized SVM (unary and binary) to recognize the evaluated connections in the gene regulatory network that depend on gene binding and network features.
3 Dataset
Borkowski et al. published a dataset on colon and lung cancer in 2019 [5]. It contains 25,000 histopathological images, with 5,000 images in each class, and is more recent than other datasets. There are five types of tissue data, covering lung cancer and colon cancer. The original size of those images was 1024 × 768 px, but the published images were cropped to 768 × 768 px. We used only the lung cancer subset for this research. Figure 2 shows three sample images from the lung cancer dataset used in this work, and Table 1 presents the class names and IDs assigned to the lung cancer histopathological images.

Table 1. Assigned class name and id of LC25000 dataset.
Cancer Type | Class Name | Class ID | # of Images
Adenocarcinoma | lung aca | 0 | 5000
Benign Tissue | lung n | 1 | 5000
Squamous Cell Carcinoma | lung scc | 2 | 5000

4 Methodology
This section discusses the approaches utilized to develop our model and increase the accuracy of our predictions. Figure 3 illustrates an architectural overview of our proposed system's workflow.
Fig. 2. Images of 2(a) lung adenocarcinoma (lung aca), 2(b) lung benign tissue (lung n), and 2(c) lung squamous cell carcinoma (lung scc) from LC25000 dataset.
4.1 Collection and Analysis of Data
We used the LC25000 [5] dataset for this study. There are five types of histopathology imaging data in this collection. These are some details on colon and lung cancer. We only used the dataset for lung cancer. There was 5000 of lung adenocarcinoma, 5000 of squamous cell carcinoma, and 5000 of benign tissue image. 4.2
4.2 Data Preprocessing
Because of the numerous irregularities and poor pixel quality of the acquired images, predictions from raw lung cancer images are less accurate. The quality of the lung images was enhanced using a pixel intensity analysis method that influences how image pixels are seen. Unreliable and noisy pixels were eliminated by continuous pixel modification.
Fig. 3. Architectural overview of our proposed system.
Histogram algorithms were frequently used to improve image quality because they are versatile and easy to implement. We also separated the images into their groups through the classification process, which ensured that no mixed or noisy images remained in any class. In the data preprocessing step, we converted all histopathological data from RGB to HSV. By converting RGB to HSV, we obtained more precise information about the cancerous area and the affected level in the histopathological data.
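A minimal sketch of this color-space conversion step using OpenCV follows; the file path is a placeholder. Note that OpenCV loads images in BGR order, so the conversion flag is BGR2HSV for images read with cv2.imread.

```python
# Sketch of the RGB-to-HSV conversion step described above.
import cv2

img = cv2.imread("histopathology_sample.jpeg")    # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)        # hue, saturation, value
cv2.imwrite("histopathology_sample_hsv.jpeg", hsv)
```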
4.3 Deep Neural Network
A DNN or DL model is a combination of input, hidden, and output layers. Before constructing our DL model, we first split our dataset into two groups, one for training and one for testing; 80% of the total data was used to train the DL model and the remaining 20% for testing. These components operate similarly to human brains and can be trained like any other machine learning algorithm. We created a neural network model with four layers and employed the ReLU and Softmax activation functions. Layer types, output shapes, per-layer parameters, and the total number of parameters are shown in Fig. 4. Figure 5 shows our proposed DL model structure with neurons and activation functions, and Fig. 6 shows a 3D structural overview of our DL model's working procedure; Conv2D, MaxPooling2D, dropout, flatten, and dense layers are all depicted in this graphic using various colors.
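A hedged reconstruction of this kind of compact network is sketched below; the exact filter counts, input size, and layer order are illustrative assumptions (the authors' true structure is in Figs. 4–6), but the building blocks (Conv2D, MaxPooling2D, dropout, flatten, dense with ReLU and Softmax) match those named above.

```python
# Illustrative compact CNN with the building blocks described in the text.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # lung aca / lung n / lung scc
])
model.summary()  # layer types, output shapes, parameter counts (cf. Fig. 4)
```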
4.4 Analysis and Visualization
In this step, we evaluated and visualized the findings of the experiments. Accuracy, precision, recall, and F1-score were computed to assess performance.
Fig. 4. Our proposed DL model’s structure.
Fig. 5. DL model’s structure with neuron and activation function.
Fig. 6. 3D view of our DL model.
Fig. 7. Training and validation accuracy-loss curve.
Fig. 8. Confusion matrix of proposed DL model.

Table 2. Measure the performance of our DL model.

Class | Precision | Recall | F1-Score | Support
lung aca | 0.95 | 0.91 | 0.93 | 1000
lung n | 1.00 | 1.00 | 1.00 | 1000
lung scc | 0.91 | 0.96 | 0.93 | 1000
accuracy | | | 0.95 | 3000
macro avg | 0.95 | 0.95 | 0.95 | 3000
weighted avg | 0.95 | 0.95 | 0.95 | 3000

5 Result and Discussions
Only the lung cancer portion of the LC25000 dataset was used in this analysis: 15,000 histopathological images in three groups, covering three types of lung cancer tissue. All of the images were the same size (768 × 768 px). The comet-tail graph was used to evaluate the outcomes of the image intensity analysis. In the image preprocessing stage, we processed our images by converting RGB to HSV, because raw images lack the crucial information needed to forecast lung cancer.
Table 3. Comparisons with previous works.

Reference | Algorithm | Accuracy (%)
Chen et al. [6] | CNN based Multimodal Prediction Model | 94.8
Da Nobrega et al. [8] | NB, MLP, SVM, KNN, RF | 93.19
Ding et al. [9] | Deep CNN Model | 94.60
Gao et al. [12] | Improvement of Image Classification | 94.00
Gunaydin et al. [14] | PCA, KNN, SVM, NB, DT, and ANN | 93.24
Mehmood et al. [21] | Transfer Learning Method | 89.00
Phankokkruad et al. [24] | VGG16, ResNet50V2, and DenseNet201 | 90.00
Our Model | Deep Learning Model | 95.00
After converting RGB to HSV, we obtained black-and-white images in which the black regions indicate affected areas. We considered only the affected areas in our calculation; the other regions were not used because they do not carry any vital information. Following this, we divided our dataset into train and test groups. To train our DL model, we set the number of epochs to 10 with 375 batches per epoch. For model learning we used a dynamic learning rate, and the Adam optimizer was used for model optimization. The total number of trainable parameters was 3,314,275. The training and validation process completed successfully in 2 h 56 min 24 s. We obtained 95% accuracy with a loss of 0.158073 from our proposed DL model. Figure 7 illustrates the accuracy and loss curves for the training and validation process. We analyzed the precision, recall, and F1-score values of our model; Table 2 shows the performance and Fig. 8 shows the confusion matrix of our DL model. We also compared our accuracy with other researchers' work: Table 3 illustrates the comparison of our model's accuracy with the accuracies of models proposed by other researchers. Most researchers used big and complex deep learning models for their detection and prediction tasks, but for our prediction task we tried to keep our model simple and compact. For that reason, our model achieved strong accuracy within a short amount of time.
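The reported training configuration can be expressed as in the sketch below; it reuses the `model`, `train_ds`, and `test_ds` objects from the earlier sketches, and the ReduceLROnPlateau callback is only one common way to realize a dynamic learning rate, assumed here rather than confirmed by the authors.

```python
# Sketch of the reported training setup: Adam optimizer, dynamic learning
# rate, 10 epochs with 375 batches per epoch.
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    train_ds, validation_data=test_ds,
    epochs=10, steps_per_epoch=375,
    callbacks=[tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss")])
```

Note that 12,000 training images at a batch size of 32 yields exactly 375 batches per epoch, consistent with the figures above.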
6 Conclusion
This study aims to categorize the severity of lung cancer and to identify malignant lung nodules in an input lung image. The location of malignant lung nodules is identified in this study utilizing ground-breaking deep learning techniques. In this scenario, traits are classified using deep learning. Future research will concentrate on enhancing the performance of pulmonary nodule classification and optimizing the proposed model. Additional work will be done to grade the images in accordance with the malignancy of the pulmonary nodules, which is crucial for the practical applications of diagnosing and treating lung cancer.
References

1. Lung cancer. www.verywellhealth.com/lung-cancer-overview-4581940
2. Alturkistani, H.A., Tashkandi, F.M., Mohammedsaleh, Z.M.: Histological stains: a literature review and case study. Global J. Health Sci. 8(3), 72 (2016)
3. Anirudh, R., Thiagarajan, J.J., Bremer, T., Kim, H.: Lung nodule detection using 3D convolutional neural networks trained on weakly labeled data. In: Medical Imaging 2016: Computer-Aided Diagnosis, vol. 9785, p. 978532. International Society for Optics and Photonics (2016)
4. Van den Bent, M.J.: Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective. Acta Neuropathol. 120(3), 297–304 (2010)
5. Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A., Mastorides, S.M.: Lung and colon cancer histopathological image dataset (LC25000). arXiv preprint arXiv:1912.12142 (2019)
6. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017)
7. Cooper, L.A., Kong, J., Gutman, D.A., Dunn, W.D., Nalisnik, M., Brat, D.J.: Novel genotype-phenotype associations in human cancers enabled by advanced molecular platforms and computational analysis of whole slide images. Lab. Invest. 95(4), 366–376 (2015)
8. Da Nóbrega, R.V.M., Peixoto, S.A., da Silva, S.P.P., Rebouças Filho, P.P.: Lung nodule classification via deep transfer learning in CT lung images. In: 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 244–249. IEEE (2018)
9. Ding, J., Li, A., Hu, Z., Wang, L.: Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 559–567. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_64
10. Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P.A.: Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection. IEEE Trans. Biomed. Eng. 64(7), 1558–1567 (2016)
11. Gao, F., Huang, T., Wang, J., Sun, J., Yang, E., Hussain, A.: Combining deep convolutional neural network and SVM to SAR image target recognition. In: 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1082–1085. IEEE (2017)
12. Gao, X., et al.: Improvement of image classification by multiple optical scattering. IEEE Photonics J. 13(5), 1–5 (2021). https://doi.org/10.1109/JPHOT.2021.3109016
13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
14. Günaydin, Ö., Günay, M., Şengel, Ö.: Comparison of lung cancer detection algorithms. In: 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), pp. 1–4. IEEE (2019)
15. Ivanov, A., Zhilenkov, A.: The prospects of use of deep learning neural networks in problems of dynamic images recognition. In: 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), pp. 886–889. IEEE (2018)
16. Jakimovski, G., Davcev, D.: Using double convolution neural network for lung cancer stage detection. Appl. Sci. 9(3), 427 (2019)
17. Jiang, H., Ma, H., Qian, W., Gao, M., Li, Y.: An automatic detection system of lung nodule based on multigroup patch-based deep learning network. IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2017)
18. Kotsavasiloglou, C., Kostikis, N., Hristu-Varsakelis, D., Arnaoutoglou, M.: Machine learning-based classification of simple drawing movements in Parkinson's disease. Biomed. Signal Process. Control 31, 174–180 (2017)
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
20. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
21. Mehmood, S., et al.: Malignancy detection in lung and colon histopathology images using transfer learning with class selective image processing. IEEE Access 10, 25657–25668 (2022). https://doi.org/10.1109/ACCESS.2022.3150924
22. Mohalder, R.D., Sarkar, J.P., Hossain, K.A., Paul, L., Raihan, M.: A deep learning based approach to predict lung cancer from histopathological images. In: 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), pp. 1–4 (2021). https://doi.org/10.1109/ICECIT54077.2021.9641341
23. Nilashi, M., Bin Ibrahim, O., Ahmadi, H., Shahmoradi, L.: An analytical method for diseases prediction using machine learning techniques. Comput. Chem. Eng. 106, 212–223 (2017)
24. Phankokkruad, M.: Ensemble transfer learning for lung cancer detection. In: 2021 4th International Conference on Data Science and Information Technology, pp. 438–442 (2021)
25. Sedaghat, N., Fathy, M., Modarressi, M.H., Shojaie, A.: Combining supervised and unsupervised learning for improved miRNA target prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 15(5), 1594–1604 (2017)
26. Setio, A.A.A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016)
27. Sung, H., et al.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J. Clin. 71(3), 209–249 (2021)
28. Zhang, J., et al.: Coupling a fast Fourier transformation with a machine learning ensemble model to support recommendations for heart disease patients in a telehealth environment. IEEE Access 5, 10674–10685 (2017)
29. Zia ur Rehman, M., Javaid, M., Shah, S.I.A., Gilani, S.O., Jamil, M., Butt, S.I.: An appraisal of nodules detection techniques for lung cancer in CT images. Biomed. Signal Process. Control 41, 140–151 (2018). https://doi.org/10.1016/j.bspc.2017.11.017
Brain Tumor Detection Using Deep Network EfficientNet-B0 Mosaddeq Hossain1,2(B) and Md. Abdur Rahman1 1 Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
[email protected] 2 Manarat International University, Ashulia, Dhaka 1349, Bangladesh
Abstract. Brain tumors are among the deadliest diseases in human beings and can quickly lead a person to death if not detected and treated at the primary stage. However, detecting a brain tumor with the bare eye can sometimes lead to misguidance, and finding someone who is a master in this field can be costly. The deep learning (DL) method is therefore a boon for detecting tumors from images in the health sector. Here we propose a DL-based modified model, backed by EfficientNet-B0, one of the EfficientNet (EN) models, to predict whether MRI images are tumorous or non-tumorous. Our proposed model consists of blocks of deep layers, and the image classification is done by the SoftMax classifier. In our methodology, we have used a significant number of MRI images to train and test our proposed model, whereas this is rare in papers that have endeavored to detect brain tumors using deep learning; also, very few attempts so far have used the EfficientNet model for brain tumor detection. We attained a detection accuracy of 99.97%, precision of 91.63%, F1-score of 86.94%, and recall of 85.49% with our proposed model. The detection accuracy of our model is relatively higher than that of other models that have used the EfficientNet model. Keywords: EfficientNet · EfficientNet-B0 · deep learning · CNN · brain tumor detection · MRI · SoftMax
1 Introduction

A tumor is typically a rounded mass that can arise under the skin of any part of the body [1]. There are multiple kinds of tumors, such as brain tumors, colon tumors, tongue tumors, thyroid tumors, liver tumors, breast tumors, etc. Among these, brain tumors (BT) are the most detrimental, and many people die every year due to brain tumor disease. Other types of tumors are also severe because they can create complications in the human body by producing carcinogenic cells. In this paper, our intended target is to analyze an MRI image using deep learning and predict whether it contains a tumor or not. Typically, brain tumors are of two kinds, namely benign and malignant. Because of their inability to produce cancer cells and their immobility with respect to the body's other organs, benign tumors are normally not carcinogenic.
On the other hand, malignant brain tumors are carcinogenic because they can proliferate and be transmitted to other organs of the body [2]; this kind of tumor is the most dangerous. In 2016, in the USA, BT was one of the significant causes of carcinogenic death in children (ages 0–14) [3], and brain tumor was the third death-causing factor for adolescents and teenagers (ages 15–39) [4]. Since cancer is a life-threatening disease, early detection of tumors can help in the medication of malignant tumors and stop them from further deteriorating into carcinogenic cells. So, our goal is to bring forward a model that can forecast the appearance of brain tumors more accurately from the brain's MRI images. Although BT can be diagnosed from both MRI and CT scan images, MRI images give a more accurate depiction of the tumor. A doctor can detect the tumor from an MRI image; however, not all doctors have enough experience to predict the tumor from an image when a difficult situation appears, especially in undeveloped regions.

At present, machine learning (ML) is a popular method among data scientists because of its aptness to learn from the input data as well as to make predictions by forming a model based on the inputs [5]. Machine learning techniques require an efficient inspection to develop a model; however, deep learning (DL) has made this easier because it does not require such sophisticated inspection. There are various kinds of DL methods, namely Convolutional Neural Network (CNN), Long Short-Term Memory Network, Recurrent Neural Network, Generative Adversarial Network, Radial Basis Function Network, Multilayer Perceptron, Self-Organizing Map, Deep Belief Network, Restricted Boltzmann Machine, etc. [6–10]. These models can be trained on a dataset and can make further predictions based on it. In this way, in many regions where the most experienced doctor is not available, or even where one is, doctors can be assured about the presence of tumors with the assistance of these DL methods. The problem is that deep learning algorithms need massive data to train a model to get a more accurate result. However, this problem can be avoided with transfer learning techniques: a transfer learning DL algorithm is trained on a vast amount of data, and that pre-trained model can be applied to a small amount of data to make predictions. Since training a conventional deep model requires a lot of time, people can save a lot of time by using transfer learning. In this method, we have implemented EN, pre-trained on ImageNet, to identify the existence of a BT in MRI slices. We can summarize our work as follows:

• First, we developed a modified DL model grounded on EfficientNet-B0, a CNN.
• We utilized a publicly available dataset, BD_Brain_Tumor, containing 20,000 CT scan images to train and evaluate our proposed model. We split our dataset into train, test, and validation segments to train, test, and validate our proposed model.
• Since a deep learning model needs a huge amount of data, different data augmentation techniques, namely zooming, rotating, flipping, etc., were implemented to enlarge our dataset for better outcomes.
• Also, we have utilized a pre-processing technique that selects the central region of the brain tumor and crops it to reduce the burden of analyzing the unnecessary pixels outside the main contour (a sketch of this step appears at the end of this section).
• Our proposed model can predict the presence of a brain tumor by analyzing MRI images with an excellent accuracy of 99.97% (on the training set), a precision of 91.63%, and an F1-Score of 86.94%. The following sections are organized as Related Works, Proposed Methodology, Experimental Section, and Conclusion.
2 Related Works Many articles have already addressed the issue of detecting brain tumors using DL, achieving different accuracies depending on their models. Many of these methods might achieve higher accuracy; however, achieving higher accuracy doesn't mean that the method is the most suitable or the most plausible, because many of those methods might use a small dataset, or there might be an overfitting problem in their trained model. Herein, we will discuss some papers that used EN for their predictions and classifications. The authors of [11] presented a combined method to classify an MRI image into three categories: Meningioma, Glioma, and Pituitary. The main steps of this method are pre-processing, enhancing image quality, convolutional layers, and a classification stage. They employed EfficientNet-B0 (EN-B0), the baseline network of the ENs, and used ReLU as the activation function for classifying the input images. Using a publicly available dataset named Figshare, containing 3064 photographs of the T1 modality, their model reached an overall classification accuracy of 98.04%. Although [12] is not directly about brain tumors, we can get some ideas from it. That paper developed a modified EN model to detect the presence of lymph node metastasis in breast tumors. The authors also created Random Center Cropping (RCC), a data augmentation technique, and introduced attention and feature-fusion mechanisms for feature extraction. According to their experiment, these two mechanisms boost the efficiency of EfficientNet-B3 (EN-B3). A rectified version of Patch Camelyon (RPCam), a Kaggle competition dataset, was used for their investigation, and the paper achieved an accuracy of 97.96% ± 0.03% with this boosted EN-B3 model. Childhood Medulloblastoma is a special kind of brain tumor, and this issue was addressed nicely using EN in [13]. The authors discussed the accuracies of the various EN network models, covering both multiclass and binary classification of Medulloblastoma: an accuracy range of 95.67% to 98.78% was achieved for multiclass classification, and 100% accuracy for binary classification. However, they used a dataset that contains only 355 images of different kinds of Medulloblastoma. Another paper uses EN to segregate small-cell from non-small-cell lung cancer [14]. This work is somewhat different from the others because the authors analyzed brain tumors with lung cancer origins; using 102 brain MRI images from 69 patients, they achieved an average accuracy of 90% for the classification. In [15], some authors propose a method for classifying two kinds of Medulloblastoma, a type of severe childhood brain tumor. The authors find that EN-B5 outperforms the other EN variants when pre-trained models are not used. The authors also compare their model's outcome with other CNN models like VGG16,
etc. They collected 2769 images from 161 patients from different hospitals, and the method achieved an F1-Score of 80.1%. Again, a few authors propose a DL model to detect different brain tumors [16]. They used 6 DL models with the BRATS 2013 and Whole Brain Atlas datasets; although they didn't provide any information regarding data preprocessing, they achieved 96–99% optimum results in all six models. A capsule net method was developed in [17] to detect the presence of BT. By flipping and patching, the authors augmented the Figshare data of the T1wCE modality, and they resized the images to 28 × 28 resolution. Their experiment achieved an accuracy of 87% when the input data weren't pre-processed; however, it gave an accuracy of 92% when the input data were pre-processed. Other authors propose a method to detect the presence of BT using a CNN model [18]. In this method, the dataset provided by Chakroborty on Kaggle was used, along with data augmentation for data balancing; this model secures an overall accuracy of 96.77% (Fig. 1).
[Fig. 1 panels: images with brain tumor | images without brain tumor]
Fig. 1. Sample MRI images from the brain tumor dataset used in our study.
3 Proposed Methodology In this part, we will discuss our proposed model in detail. In the first step, MRI input images go through the pre-processing stage, which resizes the images to 224 × 224 resolution and crops the main brain region; the image slices then enter the EfficientNet-B0 blocks, where they are processed and analyzed by the convolutional layers and blocks. Finally, the last layer classifies the output images into two categories: tumorous or non-tumorous. 3.1 Data Data Source. In this method, we used "BD_Brain-Tumor," a public dataset on Kaggle [19]. This dataset of 20,000 images has been split into three parts: a 'Training Set' containing 13,547 images, a 'Testing Set' containing 2,064 images, and a 'Validation Set' containing 4,356 images. There are 19,200 images in jpg format, 156 images in png format, and 645 images in jpeg format; however, 34 images were rejected because of their very blurry appearance. All the MRI images we found in this dataset have different resolutions. Therefore, we resized those images to 224 × 224 resolution according to our proposed model's requirement. In our experiment, we split our dataset
into training and testing, which contain 80% and 20% of all data, respectively. Then we used 80% of the training dataset to train our model, and the remaining 20% to validate it. All the photos we used in our method are of 224 × 224 resolution. Data Resizing and Cropping. Since the raw MRI images have different resolutions, we needed to resize all of them into a uniform resolution of 224 × 224 before passing them through the main architecture of our proposed model. Running a deep learning model on MRI images takes too much time because of the vast number of pixels in an image. Therefore, we removed all the pixels outside the central brain region by selecting the main contour of the MRI slices. This step reduces the unwanted impact of outward pixels and the unnecessary calculation of pixels not related to the brain tumor region. Figure 2 shows the image cropping process in our proposed model.
[Fig. 2 steps: (1) load the raw MRI; (2) select the biggest contour; (3) select the extreme points; (4) crop and save as a new image]
Fig. 2. Cropping the MRI images before they enter our proposed deep model
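A minimal sketch of this contour-based cropping with OpenCV is given below; the thresholding and morphology settings are assumptions for illustration, not the authors' exact values.

```python
# Sketch of the four cropping steps in Fig. 2 (assumed OpenCV implementation).
import cv2
import numpy as np

def crop_brain_region(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop an MRI slice to the extreme points of its biggest contour."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)       # step 1: raw slice
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)
    thresh = cv2.erode(thresh, None, iterations=2)       # remove small specks
    thresh = cv2.dilate(thresh, None, iterations=2)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)               # step 2: biggest contour
    left = tuple(c[c[:, :, 0].argmin()][0])              # step 3: extreme points
    right = tuple(c[c[:, :, 0].argmax()][0])
    top = tuple(c[c[:, :, 1].argmin()][0])
    bottom = tuple(c[c[:, :, 1].argmax()][0])
    cropped = image[top[1]:bottom[1], left[0]:right[0]]  # step 4: crop
    return cv2.resize(cropped, (size, size))             # model input size
```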
Data Augmentation. Since DL models need massive data to train, we applied various data augmentation processes: rotation, zooming, and width shifting were used to augment our dataset. For image rotation, we rotated each image by up to 10°. For zooming purposes, we applied a zoom range of 0.1 (up to 10%). These augmentation techniques were applied using standard Python libraries; a minimal sketch follows.
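A hedged sketch of these settings with Keras' ImageDataGenerator is shown below; the exact shift fraction and rescaling are assumptions beyond what the text states.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,      # rotate each image by up to 10 degrees
    zoom_range=0.1,         # zoom in/out within a 10% range
    width_shift_range=0.1,  # horizontal shift (assumed fraction)
    rescale=1.0 / 255,      # normalize pixel intensities (assumed)
)
```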
3.2 Deep Network EfficientNet. EN is a relatively new method in the deep learning field. The network was developed by researchers from the Google Research Brain team and presented at a conference in 2019 [20]. The method gained popularity because of its compound scaling of the network model and its simple structure. EN can uniformly scale up the network's depth, width, and resolution by using compound coefficients, yielding better performance. Its effectiveness was measured by scaling up ResNet and MobileNets [20, 21], and the effectiveness of using EN for transfer learning is also confirmed by their experiments [20]. Proposed Model. The baseline EN-B0 consists of a total of 9 stages, of which the first is a convolutional layer with kernel size 3 × 3. The subsequent seven stages, i.e., the 2nd to 8th, are mobile inverted bottleneck (MBConv) blocks with kernel
sizes of 3 × 3 or 5 × 5. The final stage consists of convolution, fully connected, and pooling layers [20]. In this paper, we have used the EN-B0 model as the building block for our experiments. The last stage of our model is built by selecting 'SoftMax' as the activation function for the linear classification of brain tumors, and 'average pooling' was chosen for the pooling layer. We also conducted experiments on the other ENs, and we found that our proposed model gives the highest accuracy among the related EN papers. Figure 3 describes the graphical architecture of our proposed model.
[Fig. 3 pipeline: MRI input → resizing (224 × 224) → cropping of the brain region → data augmentation (shifting, zooming, rotating, re-scaling, flipping, shearing) → training/validation/testing → EfficientNet-B0 blocks (Conv 3 × 3 at 224 × 224 × 32; MBConv1 3 × 3 at 112 × 112 × 16; MBConv6 3 × 3 and 5 × 5 stages at 112 × 112 × 24, 56 × 56 × 40, 28 × 28 × 80, 14 × 14 × 112, 14 × 14 × 192, and 7 × 7 × 320; Conv 1 × 1 at 7 × 7 × 1280) → fully connected layer → SoftMax → output: Tumor / Not Tumor]
Fig. 3. Proposed model’s architecture.
Compound Scaling Method of the EN Model. Basically, the model's efficiency doesn't depend solely on the complexity of the model; properly scaling the network is one of the key reasons for success. An example of compound scaling is given below [20]. Figure 4 shows the graphical presentation of its scaling method. If w, d, r denote the width, depth, and resolution, respectively, such that d = α^φ, w = β^φ, r = γ^φ, where the symbol φ is the user-defined compound coefficient of the compound scaling functions, then the relation among them can be explained by the following equation:

α · β^2 · γ^2 ≈ 2, where α ≥ 1, β ≥ 1, γ ≥ 1   (1)
Fig. 4. Compound Scaling method of EfficientNet [20]
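As a worked example of Eq. (1), the baseline coefficients reported for EfficientNet in [20] (α = 1.2, β = 1.1, γ = 1.15) nearly satisfy the constraint, so raising them to a user-chosen φ roughly doubles the FLOPs per unit increase of φ:

```python
alpha, beta, gamma = 1.2, 1.1, 1.15    # values reported in [20]
print(alpha * beta**2 * gamma**2)      # ≈ 1.92, i.e. approximately 2

phi = 1                                # user-defined compound coefficient
depth_mult = alpha ** phi              # d = α^φ
width_mult = beta ** phi               # w = β^φ
resolution_mult = gamma ** phi         # r = γ^φ
```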
4 Experimental Section In this part, we discuss all the steps of our experiment and the results with various kinds of evaluation metrics. 4.1 Experiment Input images go through the first step of our proposed model. In our compound scaling model, images are processed by various kinds of layers that scale the images and extract the features of the pictures. We divided our dataset into 80% and 20% in this experiment for training and testing, respectively. Further, we shuffled the allocated training data in order to avoid any kind of learning partiality. Since deep learning models need a vast amount of data, we later applied rotation, zooming, and other processes for data augmentation. We started running our model with one epoch and observed the model's accuracy; we then gradually increased the number of epochs while noting the model's accuracy and learning rate. After approximately 50 epochs, we saw that the accuracy and learning rate of the model no longer improved, so we stopped conducting more epochs and plotted those results in the graphs. In Fig. 5, we can see the different accuracies and losses associated with the various epochs (10, 20, 30, 40, 50). The "Adam" optimizer was employed to reduce the noise problem. The Adam optimizer works with the stochastic gradient descent method, the process of finding the best fit between the actual result and the predicted outcome. A few reasons instigated us to use the Adam optimizer: it optimizes better than other optimizers, needs fewer parameters for tuning, and takes less computational time to train a deep model. Equations 2 and 3 show how the Adam optimizer works mathematically:

W_(t+1) = W_t − α · m_t   (2)

where m_t = β · m_(t−1) + (1 − β) · ∂L/∂W_t   (3)

Here, m_t is the aggregated gradient at time t, W_t the weight at time t, α the learning rate, L the loss function, and β the moving-average parameter.
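A hedged Keras sketch of this training configuration follows, reusing the `model` built in Sect. 3.2; the initial learning rate and plateau-based schedule are assumptions consistent with the learning-rate decay reported in Sect. 4.2.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed initial LR
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# Shrink the learning rate when the validation loss plateaus (assumed schedule).
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=3)
```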
"Average pooling" was used to make the feature maps smoother and to pick the best features. In the final layer, the SoftMax classifier was used to predict whether the image contained a tumor or not. We trained and tested our model with SoftMax, ReLU, and Sigmoid classifiers; however, we got higher accuracy with the SoftMax classifier. The mathematical formula for the SoftMax classifier is given below by Eq. 4:

σ(X)_i = e^(X_i) / Σ_(j=1..N) e^(X_j)   (4)
where the vector X denotes the input, N is the number of outcome classes to be predicted, and X_i takes any real value. 4.2 Result Our experiment gives 99.97% accuracy on the training set and 82.66% accuracy on the testing dataset. Other performance metrics have been used for a more complete portrayal of our model's performance. To overcome the plateau problem, we reduced the learning rate to benefit the model and to avoid overfitting; at the end of the last epoch, the learning rate had been reduced to 3.138105874196098 × 10^(−15). 4.3 Performance Metrics Various kinds of performance measurements have been conducted. Since there are different kinds of evaluation scales, measuring only the accuracy isn't enough to judge a model. The essential measures can be calculated from the initial evaluation: if we denote true-positives as TP, true-negatives as TN, false-positives as FP, and false-negatives as FN [22–25], then mathematically we can write:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (5)

Precision = TP / (TP + FP)   (6)

F1-Score = 2TP / (2TP + FN + FP)   (7)

Recall = TP / (TP + FN)   (8)
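As a quick numeric check of Eqs. (5)–(8), the helper below computes all four measures from confusion-matrix counts; the counts used are illustrative only, not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5)
    precision = tp / (tp + fp)                   # Eq. (6)
    f1_score = 2 * tp / (2 * tp + fn + fp)       # Eq. (7)
    recall = tp / (tp + fn)                      # Eq. (8)
    return accuracy, precision, f1_score, recall

print(classification_metrics(tp=850, tn=1100, fp=78, fn=36))
```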
These performance measurements are given in Table 1. Loss Function. There are various kinds of loss functions, and their applications vary according to the type and number of outputs. Graphs of our model's accuracy and loss can be found in Fig. 5.
Table 1. Performance of our work

Accuracy   Precision   F1-Score   Recall
99.97%     91.63%      86.94%     85.49%
If we choose M as the number of training examples, K as the number of classes, y_m^k as the target, x_m as the input for example m, and h_θ as the model, then the mathematical formula for the standard CCE (categorical cross-entropy) loss is [26]:

L_CCE = −(1/M) Σ_(m=1..M) Σ_(k=1..K) y_m^k · log(h_θ(x_m, k))   (9)
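A minimal NumPy rendering of Eq. (9) is shown below; the targets and predictions are illustrative placeholders, not data from this experiment.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot targets (M, K); y_pred: predicted probabilities (M, K)."""
    y_pred = np.clip(y_pred, eps, 1.0)                 # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0], [0, 1]])                    # tumor / not tumor
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
print(categorical_cross_entropy(y_true, y_pred))       # ≈ 0.164
```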
[Fig. 5 panels: train-accuracy and val-accuracy (left) and train-loss and val-loss (right), plotted against epochs 0–50]
Fig. 5. Graphs of the accuracy and loss function
4.4 Discussion Several papers have been published on various deep learning approaches to detect brain tumors. Many of those papers used the transfer learning method for their experiments, and others used conventional DL methods. In our model, we used the pre-trained EN-B0 model, which has been trained on ImageNet. Of the papers that used EN to classify or detect brain tumors, most used only a small amount of data; one of them, for instance, used only 102 images to train and validate its model. However, our network structure has achieved higher accuracy than the other models. A comparative study of the accuracy of our model against the other papers that used EN is presented in Table 2. We also conducted a comparative study of other deep learning methods that endeavored to identify the presence of BT from MRI slices. These studies used CNN techniques other than EN. Table 3 provides a comparative analysis of those kinds of models against our proposed model.
Table 2. Accuracy among those models that used EfficientNet

References           EfficientNet model name                       Accuracy
[11]                 EfficientNet-B0                               98.04%
[12]                 EfficientNet-B3                               97.96% ± 0.03%
[13]                 Comparative study of different EfficientNets  95.67% to 98.78%
[14]                 EfficientNet-B0                               90%
[15]                 EfficientNet-B5                               F1-Score of 80.1%
Our proposed model   EfficientNet-B0                               99.97%
Table 3. Comparative study of models that didn't use EfficientNet [27] and our proposed model.

References           Model Name                                            Accuracy   F1-Score
[28]                 Stacked Sparse Autoencoder (SSAE)                     98%        Not specified
[29]                 Modified ResNet50                                     97.1%      96.9%
[30]                 CNN-GAN (with GAN pre-training)                       95.60%     95.10%
[31]                 RNN                                                   96%        Not mentioned
[32]                 Modified GAN to increase data and ResNet50 to detect  91%        Not mentioned
[33]                 CNN                                                   94%        94%
[34]                 Segmentation using MKPC and classification using DL   83%        Not mentioned
[35]                 Modified CapsNet                                      95.54%     Not mentioned
Our proposed model   EfficientNet-B0                                       99.97%     86.94%
5 Conclusion Many researchers have tried to develop DL models to detect brain tumors. Many of those models used small datasets and still achieved good accuracies. However, we know that deep network models need a vast amount of data to make predictions accurately. Moreover, a model's accuracy doesn't rely solely on the complexity of its structure: the EN models showed that properly scaling the model is a key factor in making it more successful. Although EfficientNet's design is less convoluted, its compound scaling method rationally scales the model's depth, width, and resolution. In our model, we conducted 50 epochs with the SoftMax activation function and achieved an accuracy of 99.97%, a precision of 91.63%, and an F1-score of 86.94%.
From this perspective, we can say that our model gives more accurate results than the other similar models. We developed our model to predict the presence of brain tumors, i.e., tumorous or not tumorous, in MRI images. However, the efficiency of our model in determining the presence of various kinds of brain tumors, namely Meningioma, Glioma, Pituitary, Glioblastoma, Sarcoma, etc., is yet to be established. Further study can extend our proposed model to identify these various brain tumors. Acknowledgment. Firstly, I want to thank my creator for giving me the patience and ability to conduct this study. Secondly, I want to thank Professor Md. Abdur Rahman, who mentored this work. I am also grateful to my wife for her mental support and inspiration on this journey.
References 1. Mohammadi, F., Rastgar-Jazi, M.: Analytical and experimental solution for heat source located under skin: modeling chest tumor detection in male subjects by infrared thermography. J. Med. Biol. Eng. 38(2), 316–324 (2017) 2. Rehman, A., Naz, S., Razzak, M.I., Akram, F., Imran, M.: A deep learning-based framework for automatic brain tumors classification using transfer learning. Circuits Syst. Signal Process. 39, 757–775 (2020) 3. Varade, A.A., Ingle, K.S.: Brain MRI classification using PNN and segmentation using K means clustering. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 6, 6181–6188 (2017) 4. Abiwinanda, N., Hanif, M., Hesaputra, S.T., Handayani, A., Mengko, T.R.: Brain tumor classification using convolutional neural network. In: Lhotska, L., Sukupova, L., Lackovi´c, I., Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018. IP, vol. 68/1, pp. 183–189. Springer, Singapore (2019). https://doi.org/10.1007/978-981-109035-6_33 5. Song, Y., et al.: Association of GSTP1 Ile105Val polymorphism with the risk of coronary heart disease: an updated meta-analysis. PLoS ONE 16(7), e0254738 (2021) 6. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014) 7. Sirichotedumrong, W., Kiya, H.: A GAN-based image transformation scheme for privacypreserving deep neural networks. In: 2020 28th European Signal Processing Conference (EUSIPCO), pp. 745–749. IEEE (2021) 8. Karlik, B., Olgac, A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 1(4), 111–122 (2011) 9. Schmah, T., et al.: Generative versus discriminative training of RBMs for classification of fMRI images. In: NIPS (2008) 10. Srivastava, N., Mansimov, E., Salakhudinov, R.: Unsupervised learning of video representations using LSTMs. In: International Conference on Machine Learning, pp. 843–852. PMLR (2015) 11. Guan, Y., et al.: A framework for efficient brain tumor classification using MRI images. Math. Biosci. Eng. 18, 5790–5815 (2021). https://doi.org/10.3934/mbe.2021292 12. Wang, J., Liu, Q., Xie, H., Yang, Z., Zhou, H.: Boosted efficientnet: detection of lymph node metastases in breast cancer using convolutional neural networks. Cancers 13(4), 661 (2021) 13. Bhuma, C.M., Kongara, R.: Childhood medulloblastoma classification using EfficientNets. In: 2020 IEEE Bombay Section Signature Conference (IBSSC), pp. 64–68. IEEE (2020)
14. Grossman, R., Haim, O., Abramov, S., Shofty, B., Artzi, M.: Differentiating small-cell lung cancer from non-small-cell lung cancer brain metastases based on MRI using efficientnet and transfer learning approach. Technol. Cancer Res. Treat. 20, 15330338211004920 (2021) 15. Bengs, M., Bockmayr, M., Schüller, U., Schlaefer, A.: Medulloblastoma tumor classification using deep transfer learning with multi-scale EfficientNets. In: Medical Imaging 2021: Digital Pathology, vol. 11603, p. 116030D. International Society for Optics and Photonics (2021) 16. Kalaiselvi, T., Padmapriya, S.T., Sriramakrishnan, P., Somasundaram, K.: Deriving tumor detection models using convolutional neural networks from MRI of human brain scans. Int. J. Inf. Technol. 12(2), 403–408 (2020). https://doi.org/10.1007/s41870-020-00438-4 17. Vimal Kurup, R., Sowmya, V., Soman, K.P.: Effect of data pre-processing on brain tumor classification using capsulenet. In: Gunjan, V.K., Garcia Diaz, V., Cardona, M., Solanki, V.K., Sunitha, K.V.N. (eds.) ICICCT 2019, pp. 110–119. Springer, Singapore (2020). https:// doi.org/10.1007/978-981-13-8461-5_13 18. To˘gaçar, M., Cömert, Z., Ergen, B.: Classification of brain MRI using hyper column technique with convolutional neural network and feature selection method. Expert Syst. Appl. 149, 113274 (2020) 19. Kaggle Dataset: https://www.kaggle.com/datasets/dorianea/bd-braintumor. Accessed 10 Apr 2022 20. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research, vol. 97, pp. 6105–6114 (2019). https://proceedings.mlr.press/ v97/tan19a.html 21. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) 22. Acharya, U.R., et al.: Automated detection of Alzheimer’s disease using brain MRI images—a study with various feature extraction techniques. J. Med. Syst. 43, 1–14 (2019) 23. Acharya, U.R., Sree, S.V., Ang, P.C.A., Yanti, R., Suri, J.S.: Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals. Int. J. Neural Syst. 22, 1250002 (2012) 24. Acharya, U.R., Sudarshan, V.K., Adeli, H., Santhosh, J., Koh, J.E.W.: A novel depression diagnosis index using nonlinear features in EEG signals. Eur. Neurol. 74, 79–83 (2015) 25. Rajinikanth, V., Joseph Raj, A.N., Thanaraj, K.P., Naik, G.R.: A customized VGG19 network with concatenation of deep and handcrafted features for brain tumor detection. Appl. Sci. 10(10), 3429 (2020) 26. Ho, Y., Wookey, S.: The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access 8, 4806–4813 (2019). https://doi.org/10.1109/ACCESS.2019.296 2617 27. Nazir, M., Shakil, S., Khurshid, K.: Role of deep learning in brain tumor detection and classification (2015 to 2020): a review. Comput. Med. Imaging Graph. 91, 101940 (2021) 28. Amin, J., Sharif, M., Gul, N., Yasmin, M., Ali, S.: Brain tumor classification based on DWT fusion of MRI sequences using convolutional neural network. Pattern Recognit. Lett. 129, 115–122 (2020) 29. Çinar, A., Yildirim, M.: Detection of tumors on brain MRI images using the hybrid convolutional neural network architecture. Med. Hypotheses 139, 109684 (2020) 30. 
Ghassemi, N., Shoeibi, A., Rouhani, M.: Biomedical signal processing and control deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images. Biomed. Signal Process. Control 57, 101678 (2020) 31. Begum, S.S., Lakshmi, D.R.: Combining optimal wavelet statistical texture and recurrent neural network for tumour detection and classification over MRI. Multimed. Tools Appl. 79(19–20), 14009–14030 (2020). https://doi.org/10.1007/s11042-020-08643-w
32. Han, C., et al.: Infinite brain MR images: PGGAN-based data augmentation for tumor detection. In: Esposito, A., Faundez-Zanuy, M., Morabito, F.C., Pasero, E. (eds.) Neural Approaches to Dynamics of Signal Exchanges. SIST, vol. 151, pp. 291–303. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8950-4_27 33. Zhou, Y., et al.: Holistic brain tumor screening and classification based on DenseNet and recurrent neural network. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) BrainLes 2018. LNCS, vol. 11383, pp. 208–217. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11723-8_21 34. Rathi, V.G.P., Palani, S.: Brain tumor detection and classification using deep learning classifier on MRI images. Res. J. Appl. Sci. Eng. Technol. 10(2), 177–187 (2015) 35. Adu, K., Yu, Y., Cai, J., Tashi, N.: Dilated capsule network for brain tumor type classification via MRI segmented tumor region. In: 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 942–947 (2019)
Cancer Diseases Diagnosis Using Deep Transfer Learning Architectures Tania Ferdousey Promy1 , Nadia Islam Joya1 , Tasfia Haque Turna1 , Zinia Nawrin Sukhi1 , Faisal Bin Ashraf1 , and Jia Uddin2(B) 1 Department of Computer Science and Engineering, School of Data Science, Brac University,
Dhaka, Bangladesh 2 AI and Big Data Department, Endicott College, Woosong University, Daejeon, South Korea
[email protected]
Abstract. Cancer is one of the most lethal diseases in the world. It is clinically known as 'malignant neoplasm', a vast group of diseases that encompasses unmonitored cell expansion. It can begin anywhere in the body, such as the breast, skin, liver, lungs, or brain. As reported by the National Institutes of Health (NIH), the projected growth of new cancer cases is forecast at 29.5 million and cancer-related deaths at 16.4 million through 2040. There are many medical procedures to identify cancer cells; mammography, MRI, and CT scans are common methods for cancer diagnosis. These methods have been found to be insufficient on their own, necessitating the development of new and smarter cancer diagnostic technologies. Motivated by the success of medical image classification using deep learning, our initiative targets analyzing the performance of different deep transfer learning models for cancer cell diagnosis. In this paper, we have used the VGG16, Inception V3, and MobileNet V2 deep architectures to diagnose breast cancer (KAU-BCMD dataset), lung cancer (IQ-OTH/NCCD dataset), and skin cancer (HAM10000 dataset). Experimental results demonstrate that the VGG16 architecture shows comparatively higher accuracy, exhibiting 98.5% accuracy for the breast cancer dataset, 99.90% for lung cancer, and 93% for skin cancer. Keywords: Cancer Detection · Convolutional Neural Network (CNN) · Image Processing · Deep Transfer Learning
1 Introduction Cancer is one of the world's most life-threatening diseases and a leading cause of death in the United States, affecting people of all ages. Early detection is the key to cancer treatment, but it is often not that easy. Cancer is a complicated disease, and there are a lot of ways it can show up in the body. Sometimes the symptoms do not appear until the cancer is quite advanced; at other times, there are no symptoms at all. Certain cancers, such as breast cancer, are comparatively easy to identify, while cancers of the lung, kidney, or brain are very hard to detect. The majority
of cancers are now discovered only after they have progressed beyond the organs in which they began. Nevertheless, because of recent breakthroughs in deep learning (an artificial intelligence approach that allows us to spot patterns in images, audio, or text, among other things), we can now construct fine artificial intelligence techniques and algorithms for cancer detection to improve detection rates and quality of life for patients. This is also why cancer researchers are making quite remarkable progress. Different forms of cancer detection and classification with deep learning have opened up a new area of research for early cancer diagnosis, demonstrating the possibility of eliminating the limitations of manual systems. The key purpose of this paper is to present a brief analysis and comparison of three deep transfer learning models (VGG16, InceptionV3, MobileNet) utilizing dynamic datasets for breast cancer, lung cancer, and skin cancer (melanoma) diagnosis from accuracy and precision perspectives. The rest of the paper is organized as follows: Sect. 2 includes the literature review, the detailed workflow is presented in Sect. 3, the experimental result analysis is in Sect. 4, and finally we conclude the paper in Sect. 5.
2 Background Study In this section, we discuss the relevant architectures along with reviews of the most recent and relevant works. 2.1 Convolutional Neural Network Deep Convolutional Neural Networks (DCNNs) [1, 2] can identify certain kinds of images in tasks such as image analysis, image recognition and classification, medical image analysis, computer vision, NLP, etc. In a DCNN, there are three types of neuron layers: convolutional layers, pooling layers, and fully connected layers. Filter banks, feature pooling layers, batch normalization layers, dropout layers, and dense layers are all used to build CNNs for various object identification tasks, including detection, segmentation, and classification. The family offers a variety of pre-trained architectural models, including LeNet, AlexNet, GoogleNet, VGGNet, Inception V3, and others [3]. CNNs feature several hierarchies, which means that the distribution of inputs alters as the training progresses; pre-processed inputs, acquired through a whitening process, are very necessary for achieving superior results in a variety of tasks [4]. 2.2 Transfer Learning Through transfer learning, we can transfer the knowledge from a previous task to improve learning in a new task [5]. Transfer learning is prominent in deep learning because deep learning models demand large resources and huge, complex datasets to train. Transfer learning means first training a base network on a baseline dataset and then transferring the gained features to a target task, which will be trained on the target dataset. This works if the learned characteristics are general, i.e., applicable to both the base and target tasks, rather than exclusive to the base task [6].
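A minimal Keras sketch of this base-then-target idea is shown below; VGG16 with ImageNet weights stands in for the base network, and the two-class target head is an assumption.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(include_top=False, weights="imagenet",
             input_shape=(224, 224, 3))   # features learned on the base task
base.trainable = False                    # transfer them frozen

target_model = models.Sequential([        # new head trained on the target task
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
```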
2.3 Related Works Using Mammograms and X-ray Images Mammogram images are used by doctors to detect initial indications of breast cancer [7]. For breast cancer, we have seen many studies using CNNs with transfer learning, and several researchers used CNNs this way to diagnose breast cancer abnormalities from mammograms. In [8], the authors used MIAS [9] and DDSM [10] for image processing and classification. In [11], MIAS PGM-formatted images were used, and the DDSM Utility converted DDSM images into PNG. They ran these images in MATLAB and used ROIs to train the model. Since the CNN (the VGG16 model [12]) takes only RGB images of a particular size, they converted and resampled the images accordingly. This study used a pre-trained CNN with handcrafted features, giving an average accuracy (benign vs. malignant) of 91.02% and an AUC of about 0.76. Another study proposed a CAD system, DCNN-SVM–AlexNet, for classifying benign and malignant masses, which gives an accuracy of 87.2% [13]. 2.4 Related Works Using CT Scan Images DCNNs have lately demonstrated remarkable effectiveness in lung nodule identification; imaging tests are the common route to lung cancer nodule detection. In [14], the National Cancer Institute established the Lung Image Database Consortium (LIDC) to boost research and development operations. The LIDC database was established with 3 types of items to be marked by 4 radiologists: nodules greater than or equal to 3 mm in diameter with assumed histology, nodules less than 3 mm in size with an uncertain origin, and non-nodules smaller than 3 mm in diameter but benign [15]. To work with the dataset, multiple CT slices of each patient are downloaded and stored in a directory depending on the XML file [16]. To help with feature extraction, the system recognizes and distinguishes lung structures and nodules. An automatic lung nodule detection system using a multi-group patch-based DL network is used in [17]; multi-group 2D lung CT images are utilized from the LIDC dataset, and the CNN structure is tested on 2 sets of pictures: original images and binary images. The goal of segmenting lungs from a CT scan is to find distinct characteristics that will help the classifier better categorize the candidates. Besides, contemporary lung computer-aided diagnosis (CAD) systems [18] can help medical decision-making by using chest CT scans; the ultimate objective of these systems is to distinguish between cancerous and non-cancerous nodules. Furthermore, when a multi-level CNN was constructed and tested on the LIDC dataset, an accuracy of 84.81% was reported by researchers [19]. Thus, to identify lung cancer, a considerable number of researchers have proposed contrasting techniques using deep transfer learning. 2.5 Related Works Using Dermatoscopic Images Dermoscopy is a diagnostic process that is used to identify small lesions from a broad area of the body. Beyond dermoscopy itself, image classification can be used for the same purpose, using convolutional neural networks pre-trained on ImageNet along with transfer learning. In [20], the researchers used HAM10000 as their dataset. This dataset covers all 7 distinct skin cancer
cases (Actinic Keratosis, Basal Cell Carcinoma, Melanoma, Benign Keratosis, Dermatofibroma, Vascular Skin Lesion, Melanocytic Nevi), and [21] consists of 10015 images with a resolution of 600 × 450 pixels. With the final picture size set to 224 × 224 pixels, data augmentation in the form of flipping, cropping, and rotating was performed. Each model needs input in a specific structure, and the preprocess function assists in transforming the data into that layout. The researchers tested pre-trained models on the dataset and, based on the weighted recalls, chose ResNet50 as the base model for their custom model.
3 Workflow The goal of this paper is to compare different CNN and deep transfer architectures by utilizing different types of cancer datasets. To do so, the system accepts an image from an image dataset as input, runs it through the CNN model architecture layers, and delivers the output after classification, as shown in Fig. 1. The procedure is as follows: first, different image datasets for each cancer are collected and combined; next, 1000 images are chosen from them and processed; the processed images are then saved and randomly split in an 80:20 ratio for training and testing the different models; lastly, the processed images are used to train and test the pre-trained models, analyzing the test accuracies and comparing them to determine the best model. 3.1 Data Collection For breast cancer, we used multiple datasets of mammographic images: MINI-DDSM [22] and KAU-BCMD [11], each contributing 50%. MINI-DDSM is a lighter edition of the now-outdated DDSM (Digital Database for Screening Mammography) collection. The dataset comprises 1416 cases, including pictures of both breasts (right and left) and two types of views (CC and MLO), for a total of 5662 mammography images; following the BI-RAD approach, the dataset was divided into six groups. For skin cancer (melanoma), we used HAM10000, dermatoscopic pictures from diverse demographics gathered and preserved using various modalities. The completed dataset contains 10015 dermatoscopic pictures that may be utilized as a training set in machine learning models [21]. More than half of the tumors are verified by histopathology (histo), with the other instances relying on follow-up examinations, expert consensus, or in-vivo confocal microscopy confirmation. The lesion-id column in the HAM10000 metadata file may be used to track tumors with multiple pictures in the dataset. For lung cancer, we also used multiple datasets: IQ-OTH/NCCD [23] and the Chest CT-Scan Images dataset [24]. We mostly gathered our data from IQ-OTH/NCCD and used only around 40 files from the other, owing to the shortage of abnormal (cancerous) cells in the main dataset (IQ-OTH/NCCD) (Table 1).
Fig. 1. Workflow diagram of the system.
Table 1. Details of the datasets used in this paper.

Cancer types     Dataset types          Dataset name
Breast           Mammographic images    MINI-DDSM & KAU-BCMD
Lung             CT scan images         IQ-OTH/NCCD & Chest CT-Scan Images
Skin (Melanoma)  Dermatoscopic images   HAM10000
3.2 Data Pre-processing We restructure and resize all of the dataset's photos into a single size of 256 × 256. We labeled our images into two classes, normal and cancerous. OHE (One-Hot Encoding) is a categorical encoding method that converts all elements of a categorical column into new columns with binary values of 0 or 1 to indicate the existence of the category value. Here we implemented the one-hot encoding method for categorizing our image datasets, converting normal (healthy) image labels into 1 and cancerous image labels into 0 (a minimal sketch follows).
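A hedged sketch of this preprocessing is given below; the file path, helper name, and example labels are hypothetical.

```python
import cv2
import numpy as np
from tensorflow.keras.utils import to_categorical

def load_image(path: str) -> np.ndarray:
    """Read an image as a NumPy array and resize it to 256 × 256."""
    img = cv2.imread(path)                       # hypothetical file path
    return cv2.resize(img, (256, 256))

labels = np.array([1, 0, 0, 1])                  # 1 = normal, 0 = cancerous
one_hot = to_categorical(labels, num_classes=2)  # e.g. 1 -> [0., 1.]
```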
We also used the NumPy libraries to conduct simple image editing and store the results on our local system after converting the loaded photos to and from NumPy arrays; images are read as arrays in both the Keras API and OpenCV. 3.2.1 Data Train and Implementation We downloaded pre-trained weights and printed out the InceptionV3, VGG16, and MobileNetV2 models after importing the relevant deep learning libraries. We started with Inception V3 and subsequently moved on to VGG16 and MobileNet V2 for training our datasets, tweaking the models for improved accuracy. Since most of their layers come pre-trained in the model, we do not need to retrain them; we only set the input layers to values suitable for our datasets. The key advantage of transfer learning is that it cuts the training time roughly in half while still giving output with proper accuracy. 3.2.2 Data Splitting Based on an 80:20 ratio, the complete dataset was randomly split into 80% training data and 20% testing data, as in [16]. The split datasets each use batches of 20, with a target size of (256, 256). Here, for each cancer, 1000 images were utilized: 600 malignant (cancerous) image files and 400 healthy (non-cancerous) image files. We implemented three models for training and testing: Inception-v3, VGG16, and MobileNetV2. 3.3 Architectures Transfer learning refers to the application of a previously learned model to a new challenge, since it can train deep neural networks with very little data. For our paper we have used ImageNet-trained transfer learning models to solve real-world picture classification challenges, because the ImageNet dataset has over 14 million images in over 20,000 categories. 3.3.1 Visual Geometry Group (VGG16) It significantly outperforms AlexNet by serially substituting the large kernel-size filters (11 and 5 in the first and second convolutional layers, respectively) with numerous 3 × 3 kernel-size filters. VGG16 was trained for weeks on NVIDIA Titan Black GPUs. The input to the conv layers is a 224 × 224 RGB image. The image passes through a series of conv layers with a limited receptive field of 3 × 3; a few of the configurations also use 1 × 1 convolution filters, which are a linear transformation of the input channels. The convolution stride is set to one pixel, and the convolution is spatially padded so that the spatial resolution is preserved after convolution. Spatial pooling is handled by 5 max-pooling layers that follow some of the convolution layers; max-pooling is performed over a 2 × 2-pixel window with a stride of 2.
3.3.2 Inception V3 Inception v3 is a widely used image recognition model that has been shown to attain at least 78.1% accuracy on the ImageNet dataset [25]. The model is the product of multiple ideas investigated by a variety of researchers over time, and it is built from both symmetric and asymmetric construction components. 3.3.3 MobileNetV2 MobileNetV2 is designed to be mobile-friendly. MobileNetV2 includes a fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers [26]. Its building block is divided into two parts, the inverted residual block and the bottleneck residual block, the latter using a 1 × 1 convolution without any non-linearity.
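Tying Sects. 3.2 and 3.3 together, the sketch below shows one plausible Keras pipeline: the 80:20 split with batches of 20 at 256 × 256 from Sect. 3.2.2 feeding the three frozen ImageNet backbones compared in Sect. 4. The directory layout, classifier head, and epoch count are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory(
    "dataset/", target_size=(256, 256), batch_size=20,
    class_mode="categorical", subset="training")      # 80% for training
test_gen = datagen.flow_from_directory(
    "dataset/", target_size=(256, 256), batch_size=20,
    class_mode="categorical", subset="validation")    # 20% for testing

backbones = {
    "VGG16": tf.keras.applications.VGG16,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
}
for name, build in backbones.items():
    base = build(include_top=False, weights="imagenet",
                 input_shape=(256, 256, 3))
    base.trainable = False                            # reuse ImageNet features
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),        # cancerous vs. normal
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_gen, validation_data=test_gen, epochs=5)  # epochs assumed
```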
4 Experimental Result Analysis 4.1 Experiment 1: Breast Cancer We ran the InceptionV3, VGG16, and MobileNetV2 models on our chosen datasets, MINI-DDSM and KAU-BCMD, to obtain the accuracy and loss of these models. Figure 2 shows the accuracies of the models. For VGG16, the training accuracy hovers around 99.9% and the validation accuracy reaches up to 98.5%; the training loss is around 0.0004 and the validation loss around 0.0860. For the InceptionV3 model, the training accuracy is around 99.87% and the validation accuracy around 94%, with a training loss of about 0.0141 and a validation loss of around 0.2154. For the MobileNetV2 model, the training accuracy is around 99.98% and the validation accuracy around 69.50%; the training and validation losses are about 0.0075 and 0.9186, respectively.
Fig. 2. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Breast cancer)
4.2 Experiment 2: Lung Cancer For lung cancer, we used 1000 CT scan images with the models. We ran the IQ-OTH/NCCD and Chest CT-Scan datasets through VGG16, Inception V3, and MobileNet V2, and gathered the accuracy and loss results of these models; most of the data were taken from the
IQ-OTH/NCCD dataset. In VGG16, we got a training accuracy around 99.9% and a validation accuracy of 99.9%, with a training loss of about 0.0026 and a validation loss of around 0.0041. For Inception V3, we got a training accuracy around 99.9% and a validation accuracy around 88.5%; the training loss was 0.0105 and the validation loss around 0.304. For MobileNet V2, the training accuracy is around 99.9% and the validation accuracy around 89.5%, with training and validation losses of around 0.002 and 0.267, as depicted in Fig. 3.
Fig. 3. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Lung cancer)
4.3 Experiment 3: Skin Cancer We used 1000 images from the HAM10000 dataset for skin cancer. This dataset consists of dermatoscopic images acquired and stored by different modalities. The VGG16, Inception-V3, and MobileNet-V2 models were run on this dataset.
Fig. 4. Accuracy graphs of VGG16, InceptionV3, MobilenetV2 model on the dataset (Skin cancer)
Using VGG16, we get a training accuracy around 99.9% and a validation accuracy of 93%; the training loss is 0.005 while the validation loss is around 0.27. In Inception V3, we get a training accuracy around 99.25% and a validation accuracy around 87.50%; the training loss is 0.041 and the validation loss is 0.37. For MobileNetV2, the training accuracy is around 99.9% and the validation accuracy around 86.5%; the training loss is around 0.0051 and the validation loss around 0.294, as depicted in Fig. 4. Figure 5 illustrates the confusion matrices of the 3 different architectures for the 3 cancer datasets. After comparing the accuracies of the predefined models for cancer detection, using the same number of epochs, VGG16 performs excellently for each of the selected cancer datasets.
Fig. 5. Confusion Matrix of different models
At first, for breast cancer, the accuracy reached 99.8% for training and 98.5% for validation, where InceptionV3 delivered an almost similar training accuracy of 99.87% but 94% for validation. Although for MobileNetV2 we get the same training accuracy, the validation accuracy dropped to 69.50%. Therefore, we can consider VGG16 the best-performing model, followed by InceptionV3. For our lung cancer dataset, we get a training accuracy of 99.9% with a similar validation accuracy for VGG16; for InceptionV3, the training accuracy remained the same but the validation accuracy was 88.5%. Next, for MobileNetV2 the training accuracy did not change, but we get a validation accuracy around 89.5%. So, we can consider VGG16 and MobileNetV2 as suitable models for lung cancer, though InceptionV3 performed quite well too. Lastly, for skin cancer (melanoma), we get a training accuracy of 99.98% for VGG16, but the validation accuracy delivered was 93% in this case. InceptionV3 delivered 87.50% validation accuracy with the same training accuracy. After that, MobileNetV2 delivers 86.5% accuracy, as illustrated in Table 2.
Table 2. Test accuracy of different architectures for the various cancer datasets.

Dataset        VGG16    Inception V3   MobileNet
Breast Cancer  98.5%    94%            69.5%
Lung Cancer    99.90%   88.5%          89.5%
Melanoma       93%      87.5%          86.5%
Average        97.13%   90%            81.83%
5 Conclusion Our technique has been demonstrated to be highly effective across multiple datasets. We also offered a thorough overview of existing methods for the diagnosis and quick detection of a variety of cancers that have a significant impact on the human body. The purpose of this article is to examine, categorize, and compare cancer-related approaches with small datasets, as well as to identify any gaps. Another goal of this study is to provide new researchers with a thorough background in order to begin their research in this sector. Our findings indicate that pre-trained CNN models can automatically extract features from mammographic, CT scan, and dermoscopic images, and that a good classifier can be trained utilizing these features without any need for hand-crafted features. In this analysis, we concentrated on transfer learning approaches, pre-processing, pre-training models, and convolutional neural network (CNN) models as they apply to all the mentioned image recognition and detection tasks. In conclusion, in a thorough evaluation of the models, VGG16 delivered comparatively higher accuracy with our datasets. We therefore expect to see far better results on other datasets, and we want to continue working on other datasets in the future to develop a customized model that can be used for multiple-cancer detection.
References 1. Bhuiyan, M.R., et al.: A deep crowd density classification model for Hajj pilgrimage using fully convolutional neural network. PeerJ Comput. Sci. 25(8), e895 (2022) 2. Sabab, M.N., Chowdhury, M.A.R., Nirjhor, S.M.M.I., Uddin, J.: Bangla speech recognition using 1D-CNN and LSTM with different dimension reduction techniques. In: Miraz, M.H., Excell, P.S., Ware, A., Soomro, S., Ali, M. (eds.) iCETiC 2020. LNICSSITE, vol. 332, pp. 158– 169. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60036-5_11 3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012) 4. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: The International Conference on Machine Learning, pp. 448–456. PMLR (2015) 5. Ruhi, Z.M., Jahan, S., Uddin, J.: A novel hybrid signal decomposition technique for transfer learning based industrial fault diagnosis. Ann. Emerg. Technol. Comput. 5(4), 37–53 (2021). https://doi.org/10.33166/AETiC.2021.04.004
6. Brownlee, J.: A gentle introduction to transfer learning for deep learning. Mach. Learn. Mastery 20 (2017) 7. CDCBreastCancer. “What is a mammogram?” Centers for Disease Control and Prevention (2022). https://www.cdc.gov/cancer/breast/basicinfo/mammograms.htm. Accessed 12 May 2022 8. Guan, S., Loew, M.: Breast cancer detection using transfer learning in convolutional neural networks. In: 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–8. IEEE (2017) 9. Suckling, J.P.: The mammographic image analysis society digital mammogram database. Digit. Mammo 375–386 (1994) 10. Pub, M.H., Bowyer, K., Kopans, D., Moore, R., Kegelmeyer, P.: The digital database for screening mammography. In: Fifth International Workshop on Digital Mammography, pp. 212–218 (2001) 11. Alsolami, A.S., Shalash, W., Alsaggaf, W., Ashoor, S., Refaat, H., Elmogy, M.: King Abdulaziz University breast cancer mammogram dataset (KAU-BCMD). Data 6(11), 111 (2021) 12. Islam, M.N., et al.: Diagnosis of hearing deficiency using EEG based AEP signals: CWT and improved-VGG16 pipeline. PeerJ Comput. Sci. 7, e638 (2021) 13. Ragab, D.A., Sharkas, M., Marshall, S., Ren, J.: Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ 7, e6201 (2019) 14. Fedorov, A., et al.: Standardized representation of the LIDC annotations using DICOM (No. e27378v2). PeerJ Preprints (2019) 15. Pehrson, L.M., Nielsen, M.B., Ammitzbøl Lauridsen, C.: Automatic pulmonary nodule detection applying deep learning or machine learning algorithms to the LIDC-IDRI database: a systematic review. Diagnostics 9(1), 29 (2019) 16. Sajja, T., Devarapalli, R., Kalluri, H.: Lung cancer detection based on CT scan images by using deep transfer learning. Traitement du Signal 36(4), 339–344 (2019) 17. Jiang, H., Ma, H., Qian, W., Gao, M., Li, Y.: An automatic detection system of lung nodules based on a multigroup patch-based deep learning network. IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2017) 18. Da Nóbrega, R.V.M., Peixoto, S.A., da Silva, S.P.P., Rebouças Filho, P.P.: Lung nodule classification via deep transfer learning in CT lung images. In: 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 244–249. IEEE (2018) 19. Lyu, J., Ling, S.H.: Using multi-level convolutional neural networks for classification of lung nodules on CT images. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 686–689. IEEE (2018) 20. Kondaveeti, H.K., Edupuganti, P.: Skin cancer classification using transfer learning. In: 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMI), pp. 1–4. IEEE (2020) 21. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1), 1–9 (2018) 22. Lekamlage, C.D., Afzal, F., Westerberg, E., Cheddad, A.: Mini-DDSM: mammography-based automatic age estimation. In: 2020 3rd International Conference on Digital Medicine and Image Processing, pp. 1–6 (2020) 23. Kareem, H.F., AL-Husieny, M.S., Mohsen, F.Y., Khalil, E.A., Hassan, Z.S.: Evaluation of SVM performance in the detection of lung cancer in marked CT scan dataset. Indonesian J. Electr. Eng. Comput. Sci. 21(3), 1731–1738 (2021) 24. Bhandary, A., et al.: Deep-learning framework to detect lung abnormality–a study with chest X-Ray and lung CT scan images. Pattern Recogn. Lett. 129, 271–278 (2020)
25. Sachan, A.N.K.I.T.: Detailed guide to understand and implement ResNets (2019). Accessed 5 Nov 2020 26. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetv2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Transfer Learning Based Skin Cancer Classification Using GoogLeNet Sourav Barman1(B) , Md Raju Biswas1 , Sultana Marjan1 , Nazmun Nahar1 , Mohammad Shahadat Hossain2 , and Karl Andersson3 1
Noakhali Science and Technology University, Noakhali, Bangladesh {barman2514,raju2514,marjan2514}@student.nstu.edu.bd, [email protected] 2 University of Chittagong, Chittagong, Bangladesh hossain [email protected] 3 Lulea University of Technology, Skelleftea, Sweden [email protected]
Abstract. Skin cancer has been one of the top three cancers that can be fatal when caused by broken DNA. Damaged DNA causes cells to expand uncontrollably, and the rate of growth is currently increasing rapidly. Some studies have been conducted on the computerized detection of malignancy in skin lesion images. However, due to some problematic aspects such as light reflections from the skin surface, differences in color lighting, and varying forms and sizes of the lesions, analyzing these images is extremely difficult. As a result, evidence-based automatic skin cancer detection can help pathologists improve their accuracy and competency in the early stages of the disease. In this paper, we present a transfer learning strategy based on a convolutional neural network (CNN) model for accurately classifying various types of skin lesions. Preprocessing normalizes the input photos for accurate classification, and data augmentation increases the number of images, which enhances the accuracy of the classification rate. The performance of the GoogLeNet transfer learning model is compared to that of other transfer learning models such as Xception, InceptionResNetV2, and DenseNet, among others. The model was tested on the ISIC dataset, and we ended up with the highest training and testing accuracy of 91.16% and 89.93%, respectively. When compared to existing transfer learning models, the final results of our proposed GoogLeNet transfer learning model characterize it as more dependable and resilient.

Keywords: Skin cancer · GoogLeNet · Data augmentation · Transfer learning

1 Introduction
The skin is the body’s biggest organ, protecting all of the inner organs from the environment. It aids in temperature regulation and infection protection. Thereare three layers to the skin: The epidermis, dermis, and hypodermis are the three layers of the skin. c ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2023 Published by Springer Nature Switzerland AG 2023. All Rights Reserved Md. S. Satu et al. (Eds.): MIET 2022, LNICST 490, pp. 238–252, 2023. https://doi.org/10.1007/978-3-031-34619-4_20
Cancer is a life-threatening disease for humans and can sometimes result in death. Various types of cancer can exist in the human body, and skin cancer is one of the most rapidly developing tumors that can lead to death. It is triggered by a variety of circumstances, including smoking, alcohol consumption, allergies, infections, viruses, physical stress, changes in the environment, and exposure to ultraviolet (UV) rays, among others. UV radiation from the sun has the potential to destroy the DNA within the skin. Skin cancer can also be caused by unusual inflammations of the human body. According to the World Health Organization (WHO), skin cancer accounts for one out of every three cancer cases [23]. In the United States, Canada, and Australia, the number of persons diagnosed with skin cancer has been growing steadily over the previous few years; in the United States, it is estimated that 5.4 million cases of skin cancer will be detected each year. Every day, there is growing pressure for speedy and accurate clinical testing [29]. As a consequence, timely identification of skin cancer may lead to earlier detection and treatment, potentially saving lives. Various forms of computer-aided diagnosis (CAD) methods have been designed to detect skin cancer over the last few years. In order to identify cancer, conventional computer vision techniques are mostly employed as a detector to capture a large number of attributes such as shape, size, color, and texture. Artificial intelligence (AI) has evolved into a capable tool to address these issues in recent years. Several architectures are widely used in the medical field, such as DNNs (deep neural networks), CNNs (convolutional neural networks), LSTMs (long short-term memory networks), and recurrent neural networks (RNNs). All of these models can be used to check for skin cancer; furthermore, CNNs and DNNs produce satisfactory results in this field. The most often used method is CNN, which combines feature learning and classification techniques. The outcome can be improved further by using transfer learning on vast datasets. The following is a summary of our paper's primary contributions: – We present a transfer learning model based on the GoogLeNet model that detects skin cancer more effectively, even at a preliminary phase. – With a large dataset, our suggested transfer learning model performs better in terms of accuracy than other currently available deep learning (DL) models. The remainder of this paper is organized as follows: Sect. 2 presents the literature review, Sect. 3 explains the methodology, Sect. 4 contains the results and discussion, and Sect. 5 contains the conclusion and future work.
2 Related Work
In [6], ALEnezi et al. proposed a system combining two machine learning algorithms: feature extraction is done using a convolutional neural network (CNN), and classification is done with a support vector machine. The proposed system can successfully detect three skin diseases with an accuracy of 100%, and it is fast and accurate. However, it can only detect three diseases, and being web-based, it is not accessible to everyone.

In [33], Vijayalakshmi offered three alternative techniques to effectively identify and categorise melanoma skin cancer. The model is designed in three phases. Pre-processing involves removing hair, glare, and shade from the photos in the initial stage; segmentation and classification form the second phase. A Convolutional Neural Network (CNN) is used to extract features, and a neural network and a support vector machine classify the images, achieving an accuracy of 85%.

In [28], Rathod et al. proposed an automated image-based system for skin disease recognition using machine learning classification. The proposed system extracts the features using a Convolutional Neural Network (CNN) and classifies the image with a softmax classifier. Initial training gives an output accuracy of approximately 70%. In this paper, they initially tested five diseases; the accuracy could be increased beyond 90% with a larger dataset.

In [10], Bhadula et al. applied five different machine learning techniques to a dataset of skin infections to predict skin diseases: random forest, naive Bayes, logistic regression, kernel SVM, and CNN. Of these, the Convolutional Neural Network (CNN) gives the best training and testing accuracy of 99.05%. Early diagnosis and classification of skin diseases help to lessen the disease's effects. In this study, the researchers faced some limitations in the access to and availability of medical information.

In [18], Kumar et al. presented a dermatological disease detection approach based on image processing and machine learning techniques.

In [24], Padmavathi et al. proposed convolutional neural networks (CNN) and residual neural networks (ResNet) to predict skin
disease. A dataset of 10015 dermatoscopic images divided into seven classes was used in this study. The experimental results show that the convolutional neural network has an accuracy of 77%, whereas ResNet has an accuracy of 68%. They concluded that convolutional neural networks perform better than residual neural networks in diagnosing skin diseases.

In [13], El Saleh et al. presented a convolutional neural network model named VGG-16 for face skin disease identification. A dataset comprising ten classes, each containing 1200 photos, is used to test and analyse the suggested approach. The model can successfully identify eight facial skin diseases. The algorithms are implemented in Python, with OpenCV employed for pre-processing. The model achieves an accuracy of 88% and could be further improved by increasing the dataset size and applying a new deep neural network.

In [5], Akyeramfo-Sam et al. proposed an intelligent way to detect skin diseases by machine learning (ML) using a Convolutional Neural Network (CNN), decision trees (DT), an artificial neural network (ANN), and support vector machines (SVM). The CNN model and the patterns it learned are used to classify the test dataset. The system successfully detects three types of diseases with an average accuracy of 85.9%.

In [12], Daghrir, Tlig et al. developed an automated system to detect melanoma using three different methods: a convolutional neural network and two classical machine learning methods. After feature extraction, a training phase is necessary to create a classification model for melanoma detection, and support vector machines (SVMs) process the image and assess its complexity. They also compared a K-Nearest Neighbor (KNN) classifier and an Artificial Neural Network (ANN), which showed that the ANN was more accurate than the KNN. Though their system is fast and successful, it covers only some diseases; a CNN could be used to improve it.

In [31], Xiaoxiao Sun et al. proposed an automated skin disease detection system. They worked on several datasets to detect some skin diseases. Their paper presents no exact method or system; rather, it introduces a benchmark dataset for identifying skin diseases. Its main contribution is the data collection, which can be used in future work.
3 Methodology
The technique we recommend for detecting skin cancer is outlined in this section. The approach is separated into several stages. We first gather a training dataset; after gathering it, we pre-process the dataset to obtain clean image data for better input and carry out data augmentation. The data are then analyzed, a learning model is created, and classification is performed. Figure 1 illustrates our research's general technique.
Fig. 1. Proposed Model
3.1 Dataset Description
The data were obtained via Kaggle [1]. The collection contains a wide variety of images covering nine types of skin cancer: Actinic Keratosis, Basal Cell Carcinoma, Dermatofibroma, Melanoma, Nevus, Pigmented Benign Keratosis, Seborrheic Keratosis, Squamous Cell Carcinoma, and Vascular Lesion. The system takes an input picture, compares it against what it has learned from the dataset, and classifies it accordingly.

3.2 Data Preprocessing
The pictures in the collection were taken under varying conditions. However, as GoogLeNet is built to accept coloured photos with an input layer size of 224 × 224 × 3, pre-processing is necessary. Z-score normalization was used to standardize the intensity levels of the images. Equation 1 was used to normalize each image's values to be within the range of 0 to 1:

z = (x − μ) / s    (1)

where x represents a training sample, μ the mean of the training samples, and s the standard deviation of the training samples.
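As a minimal sketch of this step (assuming, as the example does, that the mean and standard deviation are computed over the training images), the normalization can be written as:

```python
import numpy as np

def zscore_normalize(image, mean, std):
    """Normalize pixel intensities using training-set statistics (Eq. 1)."""
    return (image.astype(np.float32) - mean) / std

# Stand-in for the real training images; shapes and values are illustrative only
train_images = np.random.rand(10, 224, 224, 3).astype(np.float32)
mu, s = train_images.mean(), train_images.std()

normalized = zscore_normalize(train_images[0], mu, s)
```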
3.3 Data Augmentation
Deep learning models such as the GoogLeNet transfer learning model for skin disease classification are hampered by insufficient training data. To increase stability and expand the functional variety of the model, more data are needed. To achieve this, we employ augmentation [30] to significantly enlarge the dataset.
Image augmentation techniques include rotation, width shift, shear range, height shift, and zoom. The model can now generalize more effectively thanks to the augmented data. In this regard, we have utilized the Image Data Generator. The settings for data augmentation used in this study are listed in Table 1, and a sketch of their use follows the table.

Table 1. Data augmentation settings

Augmentation technique | Range
Rotation | 40
Width shift | 0.2
Shear range | 0.2
Height shift | 0.2
Zoom | 0.20
Fill mode | nearest
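As a minimal sketch, the Table 1 settings map directly onto Keras's ImageDataGenerator; the directory layout and the rescaling factor are assumptions made for the example, not details from the paper:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings taken from Table 1
datagen = ImageDataGenerator(
    rotation_range=40,       # degrees
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
    rescale=1.0 / 255,       # assumption: scale intensities to [0, 1]
)

# Hypothetical directory layout: one sub-folder per lesion class
train_gen = datagen.flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=16,
    class_mode='categorical')
```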
3.4 GoogLeNet
GoogLeNet is a 22-layer-deep CNN (Convolutional Neural Network) featuring nine linearly stacked inception modules. The architecture uses global average pooling, which replaces fully connected layers with the average of each feature map. Nowadays, GoogLeNet is utilized for various computer vision tasks, including face detection and identification, adversarial training, and so on. GoogLeNet took first place in the ILSVRC 2014 competition thanks to the inception block, which utilizes parallel convolutions to enhance the width and depth of networks. The specifics of each inception block are shown in Fig. 2. Each inception block employs four paths to obtain detailed spatial information. 1 × 1 convolutions are used to reduce feature dimensions and processing costs: because features are concatenated after each inception block, computation costs would grow within a few steps as feature dimensions increased if no constraints were put in place. The intermediate features' dimensions are therefore reduced by 1 × 1 convolutions. The paths apply filters of different widths after convolution to guarantee that separate local spatial feature sets may be retrieved and combined. Noteworthy is the use of max-pooling in the final path, which extracts features without requiring additional parameters. Thanks to this well-designed architecture, GoogLeNet topped the ImageNet classification test. A minimal sketch of such an inception block appears below.
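As an illustration only (not the authors' implementation), an inception block of this four-path shape can be written in Keras as follows; the per-path filter counts are free parameters chosen for the example:

```python
from tensorflow.keras import layers

def inception_block(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """One GoogLeNet-style inception block with four parallel paths."""
    # Path 1: plain 1x1 convolution
    p1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    # Path 2: 1x1 dimension reduction followed by 3x3 convolution
    p2 = layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)
    p2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(p2)
    # Path 3: 1x1 dimension reduction followed by 5x5 convolution
    p3 = layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)
    p3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(p3)
    # Path 4: parameter-free max-pooling, then a 1x1 projection
    p4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    p4 = layers.Conv2D(pool_proj, 1, padding='same', activation='relu')(p4)
    # Outputs of all four paths are concatenated along the channel axis
    return layers.Concatenate()([p1, p2, p3, p4])
```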
3.5 Xception Model
Fig. 2. Inception Block.

Depthwise separable convolutions are employed in the Xception deep convolutional neural network design [11]. François Chollet of Google, Inc. introduced this network. The name stands for an "extreme" version of the Inception module (whose own name was inspired by the movie Inception, directed by Christopher Nolan and released in 2010). Xception is a 71-layer convolutional neural network whose accuracy improves on Inception V3; in this sense it is an extreme version of Inception, pushing Inception's core concepts to their absolute extent. In Inception, 1 × 1 convolutions were used to compress the original input, and different sorts of filters were applied to each depth space from each of those input spaces. With Xception, the opposite happens: it applies the filters to every depth map separately and then compresses the input space all at once with a 1 × 1 convolution. This method is quite similar to a depthwise separable convolution. There is another difference between Inception and Xception: whether there is a non-linearity after the first operation. The Inception model applies a ReLU non-linearity after both operations, while Xception does not introduce any intermediate non-linearity. The contrast is sketched below.
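To make the contrast concrete, the sketch below compares a standard convolution with Keras's built-in SeparableConv2D; the filter count and kernel size are arbitrary example values:

```python
from tensorflow.keras import layers

# A standard convolution mixes spatial and channel information in one step
standard = layers.Conv2D(64, 3, padding='same', activation='relu')

# An Xception-style depthwise separable convolution first filters each
# channel separately, then mixes channels with a 1x1 pointwise convolution;
# no activation is set, mirroring Xception's lack of an intermediate
# non-linearity between the two operations.
separable = layers.SeparableConv2D(64, 3, padding='same')
```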
3.6 DenseNet Model
The Dense Convolutional Network (DenseNet) connects its layers in a feed-forward fashion [35]. Its main design goals are to go deeper while remaining efficient to train. In a conventional neural network there are L connections for L layers, but a DenseNet has L(L+1)/2 direct connections. Image classification is a fundamental and essential computer vision task. VGG has 19 layers, the original LeNet-5 had 5, and Residual Networks (ResNet) have crossed the 100-layer threshold. Such models can encounter issues including too many parameters, gradient disappearance, and challenging training. In comparison to models like VGG and ResNet, DenseNet exhibits dense connectivity: direct connections from any layer to all subsequent layers distinguish the DenseNet model from other CNNs and potentially enhance the information flow between layers. As a result, DenseNet may effectively reduce the number of parameters, improve feature map propagation, and mitigate the gradient vanishing problem. A small dense block illustrating this connectivity pattern is sketched below.
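As a rough illustration (not the authors' implementation), the concatenation-based connectivity can be sketched in Keras as follows; the layer count and growth rate are arbitrary choices for the example:

```python
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    """DenseNet-style block: each layer receives all preceding feature maps."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        # Dense connectivity: concatenate new features with everything so far
        x = layers.Concatenate()([x, y])
    return x
```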
3.7 Inception-ResNet V2 Model
Inception-ResNet V2 is a convolutional neural network trained on more than one million pictures; in this case, ImageNet was used to train the model [34]. The Inception-ResNet V2 model contains a 164-layer network that can classify images into 1000 different object categories, including pencils, mice, keyboards, and animals. Through a comparative investigation and examination of the classification model's structure, an improved CNN-based Inception-ResNet-v2 model was created in order to increase the accuracy of convolutional neural networks (CNN) in image classification [27]. The improved Inception-ResNet-v2 model can extract features under various receptive fields while lowering the number of model parameters. In addition, it uses a channel filtering module, based on a comparison of all available data, to filter and combine channels, realizing efficient feature extraction.
3.8 Transfer Learning
Transfer learning is a technique in which a model that has already been trained on one dataset is reused to learn a new task [32]. There is a source domain (Ds) with a source task (Ts) and a target domain (Dt) with a target task (Tt); transfer learning seeks to improve performance on the target task (Tt) by exploiting knowledge from Ds and Ts. Different transfer learning settings are defined depending on the type of task and the nature of the data available in the source and target domains. The setting is called "inductive transfer learning" when both the source and target domains have labelled data available for a classification task [25]. The domain in this instance is D = {(xi, yi)}, where xi is the feature vector of the ith training example and yi the corresponding class label.

There are 24 million trainable parameters in GoogLeNet. Training and optimizing this kind of deep model requires a sizable dataset, which is why GoogLeNet was trained on the ImageNet dataset, which has over 1.2 million photos organised into 1000 different categories. For smaller datasets, such as the skin cancer dataset analyzed here, overfitting is likely to be a problem for such a model. This is where transfer learning plays its role: we create the model from pretrained weights and then fine-tune it for the task at hand. For smaller datasets like ours, we therefore do not need to train the model from scratch.

Because the GoogLeNet model was designed for a different purpose, considerable structural changes are needed to classify skin cancer. The last three layers of the GoogLeNet model were adapted to the intended purpose: the original classifier head, designed to categorise 1000 separate classes, was scrapped; the average pooling layer was followed by a flatten layer; a new fully connected (FC) layer sized to the target classes was added; and the softmax activation after the FC layer was similarly swapped out for a fresh one. A hedged sketch of this head replacement follows.
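The sketch below illustrates this kind of head replacement in Keras. Since GoogLeNet itself is not shipped with keras.applications, InceptionV3 is used here as a stand-in backbone, and the nine-class output layer reflects the dataset of Sect. 3.1; this is an assumption-laden sketch, not the authors' exact code:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# ImageNet-pretrained backbone; the original 1000-class head is dropped
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))
base.trainable = False  # freeze pretrained weights for initial training

# New classifier head: pooling, then a fresh FC layer with softmax output
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(9, activation='softmax')(x)  # nine lesion classes
model = models.Model(base.input, outputs)
```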
4 Result and Discussion
This section describes our approach's experiments and outcomes.

4.1 System Configuration
In this system we used TensorFlow to run the convolutional neural network, mainly because it handles the many matrix multiplications involved efficiently. Working locally with CPU-only processing proved to be the main bottleneck, so we moved to the Google Colaboratory cloud server, where we could conveniently use the provided hardware and Jupyter notebooks. With this setup we trained and evaluated our proposed deep learning approach.

4.2 Training and Test Datasets
In this experiment we use 2357 colored 512 × 512 images of skin cancer. The total number of images in each class is given in Table 2.

Table 2. Number of training and testing images per class

Class | Training Images | Testing Images
Actinic Keratosis | 114 | 16
Basal Cell Carcinoma | 376 | 16
Dermatofibroma | 95 | 16
Melanoma | 438 | 16
Nevus | 357 | 16
Pigmented Benign Keratosis | 462 | 16
Seborrheic Keratosis | 77 | 3
Squamous Cell Carcinoma | 181 | 16
Vascular Lesion | 139 | 3
Total | 2239 | 118
4.3 Transfer Learning Model's Hyperparameters
Categorical cross-entropy is the loss function we employ to train our model. We train for a maximum of 50 epochs, since beyond that there are no further meaningful variations in training and validation accuracy. The loss function is optimized using the Adam optimizer. Table 3 lists the best-configured hyper-parameters of our method; the total number of epochs and the batch size in our experiments are 50 and 16, respectively.

Table 3. Hyperparameters

Hyper-parameter | Value
Loss function | Categorical cross-entropy
Epochs | 50
Batch size | 16
Optimizer | Adam
Learning rate | 0.001
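Continuing the earlier sketches, the Table 3 settings translate into a compile-and-fit step such as the following; model and train_gen come from the previous sketches, and val_gen is a hypothetical validation generator (the batch size of 16 is set when the generators are created):

```python
from tensorflow.keras.optimizers import Adam

# Hyperparameters from Table 3
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_gen, validation_data=val_gen, epochs=50)
```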
4.4 Performance Metrics
A number of performance measures are described for the traditional evaluation of a classification model. The most often used statistic is classification accuracy, determined by the ratio of correctly classified observations to the total number of observations. Precision, recall (or sensitivity), and specificity, which are all significant measures in classification problems, can be calculated from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The F-score, a useful statistical tool for classification, is the harmonic mean of precision and recall. Accuracy, precision, recall, and F-score are determined as follows:

Accuracy = (TP + TN) / (TP + FN + FP + TN)    (2)
Precision = TP / (TP + FP)    (3)
Recall = TP / (TP + FN)    (4)
F-Score = (2 × Precision × Recall) / (Precision + Recall)    (5)
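For illustration, Eqs. (2)–(5) translate directly into code; the sketch below simply transcribes the formulas from confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics of Eqs. (2)-(5) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Example with made-up counts
print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```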
4.5 Result
Training our model to convergence, we obtained a training accuracy of 91.17% with a loss of 1.11; for the testing data we obtained an accuracy of 89.93% with a loss of 1.99. These figures are given in Table 4.

Table 4. Accuracy and Loss

Model Part | Accuracy | Loss
Training | 91.17% | 1.11
Testing | 89.93% | 1.99
Our evaluation would be most reliable if the test dataset contained an equal number of observations for each class; nevertheless, our dataset is adequate for the classification problem at hand, and the proposed strategy merits a more in-depth analysis using additional performance metrics. Table 5 shows the precision, recall, and F1-score of our recommended transfer learning method as well as a comparison to alternative methods, namely DenseNet, Xception, and InceptionResNetV2. Precision, recall, and F1-score for our suggested technique are 0.785, 0.687, and 0.733, respectively. As the table shows, our approach also outperforms the other three pre-trained models.

Table 5. Performance metrics

Model | Accuracy | Precision | Recall | F1-Score
GoogLeNet | 89.93% | 0.785 | 0.687 | 0.733
Xception | 86.81% | 0.762 | 0.562 | 0.638
DenseNet | 85.59% | 0.730 | 0.593 | 0.654
InceptionResNetV2 | 88.89% | 0.750 | 0.656 | 0.706
In this measurement our CNN-based transfer model was trained for up to 50 iterations; 50 is the optimal number of epochs because beyond it our model's training and validation accuracies no longer increase. The accuracy and loss of the model are depicted in Figs. 3 and 4. In Fig. 3, the training accuracy is below 72.5 percent and the validation accuracy below 80.00 percent at the first epoch. In Fig. 4, the training loss is more than 1.4 and the validation loss greater than 1.1 at the starting epoch. As the number of epochs increases, accuracy improves and loss decreases.
Fig. 3. Training and Validation Accuracy of GoogLeNet

Fig. 4. Training and Validation Loss of GoogLeNet

4.6 Comparison with Existing Work
In [13], the VGG-16 model was applied to identify skin defects; the model was trained and verified on a database of 12,000 photos and has an 88% accuracy rate. In [24], skin diseases were predicted with 77% and 68% accuracy using convolutional neural networks (CNN) and residual neural networks (ResNet), respectively; it was also found that convolutional neural networks outperform residual neural networks in diagnosing skin diseases. To increase the accuracy, a hierarchical classification algorithm utilizing the retrieved photos might be needed; predictions could then be made more reliably than with earlier models by utilizing ensemble features and deep learning. In [28], the system analyses an image, performs the most important step of feature extraction using the CNN method, and uses a softmax image classifier to identify diseases; initial training results in an output accuracy of about 70%. Table 6 summarizes the comparison.

Table 6. Comparison with Existing Work

Author | Method | Accuracy
El Saleh, R. et al. [13] | VGG-16 | 88%
S. Padmavathi et al. [24] | CNN and ResNet | 77% and 68%
J. Rathod [28] | CNN | 70%
Proposed Method | GoogLeNet | 89.93%

5 Conclusion and Future Work
The categorization of skin malignancies using transfer learning with GoogLeNet was discussed in this work. We categorised skin cancer into nine kinds in our study, which is the most comprehensive categorization of skin cancer to date. We used data augmentation techniques on the existing dataset because a large amount of data is needed for effective training and deployment of a CNN-based architecture, and we were able to obtain the required results with this method. According to the exploratory research, the suggested approach greatly outperforms state-of-the-art models, with precision, recall, and F1 scores of 76.16%, 78.15%, and 76.92%, respectively. Using a variety of performance metrics, including the weighted average and overall accuracy, the model further demonstrates its capability. More research can be done to analyze and interpret these results. In future we will collect more data to detect skin cancer disease and we will work with other deep learning and related methods [2–4, 7–9, 14–17, 19–22, 26, 36].
References 1. The International Skin Imaging Collaboration (ISIC). The international skin imaging collaboration (ISIC). Accessed 30 April 2022 2. Abedin, M.Z., Akther, S., Hossain, M.S.: An artificial neural network model for epilepsy seizure detection. In: 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), pp. 860–865. IEEE (2019) 3. Ahmed, T.U., Hossain, M.S., Alam, M.J., Andersson, K.: An integrated CNNRNN framework to assess road crack. In: 2019 22nd International Conference on Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2019) 4. Ahmed, T.U., Jamil, M.N., Hossain, M.S., Andersson, K., Hossain, M.S.: An integrated real-time deep learning and belief rule base intelligent system to assess facial expression under uncertainty. In: 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR), pp. 1–6. IEEE (2020) 5. Akyeramfo-Sam, S., Philip, A.A., Yeboah, D., Nartey, N.C., Nti, I.K.: A web-based skin disease diagnosis using convolutional neural networks. Int. J. Inf. Technol. Comput. Sci. 11(11), 54–60 (2019) 6. ALEnezi, N.S.A.: A method of skin disease detection using image processing and machine learning. Procedia Computer Science 163, 85–92 (2019) 7. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for micro hand gesture recognition. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 379–392. Springer, Cham (2021). https://doi.org/10. 1007/978-3-030-68154-8 35 8. Basnin, N., Nahar, L., Hossain, M.S.: An integrated CNN-LSTM model for bangla lexical sign language recognition. In: Kaiser, M.S., Bandyopadhyay, A., Mahmud, M., Ray, K. (eds.) Proceedings of International Conference on Trends in Computational and Cognitive Engineering. AISC, vol. 1309, pp. 695–707. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4673-4 57 9. Basnin, N., Nahar, N., Anika, F.A., Hossain, M.S., Andersson, K.: Deep learning approach to classify parkinson’s disease from MRI samples. In: Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N. (eds.) BI 2021. LNCS (LNAI), vol. 12960, pp. 536–547. Springer, Cham (2021). https://doi.org/10.1007/978-3-03086993-9 48 10. Bhadula, S., Sharma, S., Juyal, P., Kulshrestha, C.: Machine learning algorithms based skin disease detection. Int. J. Innovative Technol. Explor. Eng. (IJITEE) 9(2) (2019)
11. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) 12. Daghrir, J., Tlig, L., Bouchouicha, M., Sayadi, M.: Melanoma skin cancer detection using deep learning and classical machine learning techniques: a hybrid approach. In: 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pp. 1–5. IEEE (2020) 13. El Saleh, R., Bakhshi, S., Amine, N.A.: Deep convolutional neural network for face skin diseases identification. In: 2019 5th International Conference on Advances in Biomedical Engineering (ICABME), pp. 1–4. IEEE (2019) 14. Gosh, S., Nahar, N., Wahab, M.A., Biswas, M., Hossain, M.S., Andersson, K.: Recommendation system for e-commerce using alternating least squares (ALS) on apache spark. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 880–893. Springer, Cham (2021). https://doi.org/10.1007/978-3-03068154-8 75 15. Islam, R.U., Hossain, M.S., Andersson, K.: A deep learning inspired belief rulebased expert system. IEEE Access 8, 190637–190651 (2020) 16. Islam, R.U., Ruci, X., Hossain, M.S., Andersson, K., Kor, A.L.: Capacity management of hyperscale data centers using predictive modelling. Energies 12(18), 3438 (2019) 17. Kabir, S., Islam, R.U., Hossain, M.S., Andersson, K.: An integrated approach of belief rule base and deep learning to predict air pollution. Sensors 20(7), 1956 (2020) 18. Kumar, V.B., Kumar, S.S., Saboo, V.: Dermatological disease detection using image processing and machine learning. In: 2016 3rd International Conference on Artificial Intelligence and Pattern Recognition (AIPR), pp. 1–6. IEEE (2016) 19. Nahar, N., Ara, F., Neloy, M.A.I., Biswas, A., Hossain, M.S., Andersson, K.: Feature selection based machine learning to improve prediction of parkinson disease. In: Mahmud, M., Kaiser, M.S., Vassanelli, S., Dai, Q., Zhong, N. (eds.) BI 2021. LNCS (LNAI), vol. 12960, pp. 496–508. Springer, Cham (2021). https://doi.org/ 10.1007/978-3-030-86993-9 44 20. Nahar, N., Ara, F., Neloy, M.A.I., Barua, V., Hossain, M.S., Andersson, K.: A comparative analysis of the ensemble method for liver disease prediction. In: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), pp. 1–6. IEEE (2019) 21. Nahar, N., Hossain, M.S., Andersson, K.: A machine learning based fall detection for elderly people with neurodegenerative disorders. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 194–203. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6 18 22. Neloy, M.A.I., Nahar, N., Hossain, M.S., Andersson, K.: A weighted average ensemble technique to predict heart disease. In: Kaiser, M.S., Ray, K., Bandyopadhyay, A., Jacob, K., Long, K.S. (eds.) Proceedings of the Third International Conference on Trends in Computational and Cognitive Engineering. LNNS, vol. 348, pp. 17–29. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-7597-3 2 23. Pacheco, A.G., Krohling, R.A.: Recent advances in deep learning applied to skin cancer detection. arXiv preprint arXiv:1912.03280 (2019) 24. Padmavathi, S., Mithaa, E., Kiruthika, T., Ruba, M.: Skin diseases prediction using deep learning framework. Int. J. Recent Technol. Eng. (IJRTE) (2020) 25. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)
26. Pathan, R.K., Uddin, M.A., Nahar, N., Ara, F., Hossain, M.S., Andersson, K.: Gender classification from inertial sensor-based gait dataset. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 583–596. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8 51 27. Peng, C., Liu, Y., Yuan, X., Chen, Q.: Research of image recognition method based on enhanced inception-resnet-v2. Multimedia Tools and Applications, pp. 1–21 (2022) 28. Rathod, J., Waghmode, V., Sodha, A., Bhavathankar, P.: Diagnosis of skin diseases using convolutional neural networks. In: 2018 2nd International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1048–1051. IEEE (2018) 29. Rogers, H.W., Weinstock, M.A., Feldman, S.R., Coldiron, B.M.: Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the us population, 2012. JAMA Dermatol. 151(10), 1081–1086 (2015) 30. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019) 31. Sun, X., Yang, J., Sun, M., Wang, K.: A benchmark for automatic visual classification of clinical skin disease images. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 206–222. Springer, Cham (2016). https:// doi.org/10.1007/978-3-319-46466-4 13 32. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: K˚ urkov´ a, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11141, pp. 270–279. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01424-7 27 33. Vijayalakshmi, M.: Melanoma skin cancer detection using image processing and machine learning. Int. J. Trend Sci. Re. Develop. (IJTSRD) 3(4), 780–784 (2019) 34. Wan, X., Ren, F., Yong, D.: Using inception-resnet v2 for face-based age recognition in scenic spots. In: 2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS), pp. 159–163. IEEE (2019) 35. Zhong, Z., Zheng, M., Mai, H., Zhao, J., Liu, X.: Cancer image classification based on densenet model. In: Journal of Physics: Conference Series. vol. 1651, p. 012143. IOP Publishing (2020) 36. Zisad, S.N., Hossain, M.S., Andersson, K.: Speech emotion recognition in neurological disorders using convolutional neural network. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) BI 2020. LNCS (LNAI), vol. 12241, pp. 287–296. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6 26
Assessing the Risks of COVID-19 on the Health Conditions of Alzheimer's Patients Using Machine Learning Techniques

Prosenjit Karmaker and Muhammad Sajjadur Rahim(B)

Department of Information and Communication Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
[email protected]
Abstract. There is currently little evidence linking COVID-19 to Alzheimer's Disease (AD). The goal of this paper is to examine the correlation among COVID-19 symptoms to identify risks for AD patients and to determine the conditions that put AD patients in danger. We have developed a Machine Learning (ML) based model called AD-Cov-CorrelationNet that shows the relationship between various health issues and whether the attributes in the dataset are connected. We have discovered a direct link between several health issues in AD patients. Their risk of getting an infection is very high when they are in direct contact with the outside environment; even without such contact, AD patients remain vulnerable to some health issues which cause serious problems and increase the risk of death. Supervised learning models such as Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) are utilized to understand the disease prognosis. The risk factors that the models predicted are clinically meaningful and relevant to reducing fatality. This comparative analysis achieves more than 98% accuracy, 97% precision, 97% recall, a 97% F1 score, and accurate Receiver Operating Characteristic (ROC) curves.

Keywords: COVID-19 · Alzheimer's disease · Machine learning algorithms · Cross-validation · Correlation · Symptom · Predictive model · Logistic Regression · KNN · Decision Tree · Random Forest · SVM · MLP
1 Introduction

Late in 2019, the new coronavirus SARS-CoV-2 made its first appearance in China, with a market in Wuhan serving as its source. The COVID-19 virus spread across the world, causing the World Health Organization to declare a global pandemic in March 2020. There had been 3,349,786 COVID-19 cases and 238,628 deaths globally as of May 3, 2020. As our understanding of COVID-19 has grown, older age groups have emerged as one of the key risk variables linked to terrible outcomes following infection, with adults over 58 years old having a risk of dying from COVID-19
that is double that of children [1]. COVID-19 symptoms can vary from mild to severe, and can even be fatal in some cases. Coughing, fever, and loss of smell and taste are all common side effects, with migraine, nasal congestion, and respiratory problems being less so. In moderate to severe cases, other symptoms include severe stomach pains, sore throat, diarrhea, eye problems, swollen or purple toes, and breathlessness.

A neurological disorder known as Alzheimer's disease (AD) is characterized by memory loss, emotional disturbances, and behavioral abnormalities. Alzheimer's disease affects more than 50 million individuals worldwide (Alzheimer's Report, WHO), and most pharmaceutical medicines have only a palliative effect. Aside from negatively impacting patients' quality of life and human health, Alzheimer's disease has a large financial impact. AD is the most widespread degenerative nerve disorder globally, accounting for up to 80% of dementia cases. Among the 50 leading reasons for decreased life expectancy, it is one of the fastest-growing; if current trends continue, the number of Alzheimer's disease patients will exceed 150 million by 2050 [2, 3]. Patients with Alzheimer's disease often have short-term and long-term memory loss, as well as confusion, rage, violence, language issues, and mood changes as the disease progresses. Alzheimer's disease has a global economic cost of one billion dollars every year.

State-of-the-art supervised learning models such as Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) are used to assess the prognosis and course of the disease. These techniques can classify enormous volumes of unstructured data, including correlations between symptoms and outcomes [4, 5]. Machine learning architectures and algorithms have developed rapidly in recent years because of their use in a variety of industries, including speech recognition, picture processing, and answering biological inquiries. In biological circumstances, a risk relationship might depend on other independent causative factors that have a strong correlation to the disease. However, unbalanced datasets frequently limit model performance, and all these models exhibit individual limitations. The results are superior to the accuracy of conventional diagnosis, and the risk factors that the models predicted are clinically meaningful and relevant to reducing fatality. The Pearson correlation model is expected to perform noticeably better than its competitors in modelling the complex interconnection of risks for Alzheimer's disease (AD) patients of catching COVID-19. As a result, three research questions (RQs) are investigated to evaluate the effectiveness of the suggested tactic against state-of-the-art approaches:

• RQ1: How can unbalanced data be effectively handled and prepared for machine learning (ML) models? Or, how can unbalanced data be made more balanced in order to process ML models?
• RQ2: How effectively can a correlation method categorize an AD patient's COVID-19 infection risk based on symptoms?
• RQ3: What possible risk factors could lead to serious problems for Alzheimer's disease patients with COVID-19 (AD-COVID-19)?

The proposed solution is a correlation study. To accomplish this, we have developed the ML-based AD-Cov-CorrelationNet model. Using the proposed model, this research shows the relationship between various health issues and whether every attribute in
the dataset is connected to the others. Relational attributes vary in strength and direction when the main attribute changes. We discovered a direct link between several health issues that are risky for Alzheimer's disease (AD) patients and increase their risk of death. The goal of our inquiry is to identify the risk factors that endanger people with Alzheimer's disease (AD). This paper's main contributions are as follows:

• In order to balance huge unbalanced datasets, the study investigated cutting-edge resampling approaches and evaluated them with standard ML models (Logistic Regression, KNN, Decision Tree, Random Forest, SVM and MLP). In comparison to the body of previous work, this dataset is huge and imbalanced, and the data balancing method used is one that researchers in the field trust.
• Based on a real dataset, the study used a correlation method, the Pearson correlation coefficient, for classifying symptoms of AD-COVID-19 patients. Attribute sweeps are used to optimize the model throughout the experiment.
• Without deleting feature subsets, the study demonstrated a reliable correlation result for identifying risk variables from already-existing, diverse feature sets related to AD-COVID-19 patients.
• The study revealed that ML models may be applied in clinical practice by providing patients with risk variables that have clear therapeutic benefits, in addition to improvements in performance and accuracy.

The remainder of the paper contains (2) Literature Review, (3) Dataset Description, (4) Research Methodology, (5) Learning Models, (6) Results and Analysis, (7) Comparative Study, and (8) Conclusions.
2 Literature Review

According to a study [3], Alzheimer's disease was the sixth-leading cause of death in the U.S. in 2019, and the fifth-leading cause of mortality among Americans aged 65 and older. Deaths from stroke, heart problems, and HIV decreased between 2000 and 2019, whereas recorded Alzheimer's disease mortality climbed by more than 145%. Many researchers have already used a machine learning-based approach to predict COVID-19 using a cough dataset [6]. This research provides coronavirus positive or negative predictions for different age groups and regions but is not able to detect which illness or symptoms affect a patient badly. This study inspired us to do further research on coronavirus and Alzheimer's patients. Furthermore, we studied the lifestyle and situation of Alzheimer's patients. From a study on Alzheimer's patients [7], we learned the main reasons for and difficulties faced by dementia patients. We found an AI-based home care solution too, but were unable to identify how different illnesses interact in coronavirus-positive Alzheimer's patients. The work in [8] gave us the idea of studying different illnesses of coronavirus in Alzheimer's patients, and the research in [9] gave us an idea of the fatality rate of Alzheimer's patients due to COVID-19. In this paper, using Google Colaboratory, the sensitivity, accuracy, specificity, and area under the ROC curve of the comparative analysis are evaluated. We compare every illness factor for each Alzheimer's patient who is COVID-19 positive. This
study focuses on the correlation between different illnesses and coronavirus-positive Alzheimer’s patients. We also identify the accuracy of our research to ensure proper outcomes.
3 Dataset Description

Using the World Health Organization (WHO)'s open data repository (collected from kaggle.com), this research examines over 5000 AD-COVID-19 patients worldwide to identify how different health conditions and risk factors affect AD-COVID-19 patients. Using Pearson's correlation coefficient, we attempt to detect infection risks based on symptoms related to COVID-19 infection. We therefore use a survey dataset which holds, for every patient, information on 13 health conditions and 8 risk factors, as depicted in Table 1.

Table 1. Dataset description.

Attribute | Data Type | Equivalent Data Type | Non-Null Count
Breathing problem | Object | Binary | 5434 non-null
Fever | Object | Binary | 5434 non-null
Sore throat | Object | Binary | 5434 non-null
Runny nose | Object | Binary | 5434 non-null
Dry cough | Object | Binary | 5434 non-null
Asthma | Object | Binary | 5434 non-null
Chronic lung disease | Object | Binary | 5434 non-null
Heart disease | Object | Binary | 5434 non-null
Headache | Object | Binary | 5434 non-null
Diabetes | Object | Binary | 5434 non-null
Hyper tension | Object | Binary | 5434 non-null
Fatigue | Object | Binary | 5434 non-null
Gastrointestinal | Object | Binary | 5434 non-null
Abroad travel | Object | Binary | 5434 non-null
Contact with COVID-19 patient | Object | Binary | 5434 non-null
Attended large gathering | Object | Binary | 5434 non-null
Visited public exposed places | Object | Binary | 5434 non-null
Wearing masks | Object | Binary | 5434 non-null
Sanitization | Object | Binary | 5434 non-null
COVID-19 | Object | Binary | 5434 non-null
4 Research Methodology

There are five subsystems in the proposed AD-Cov-CorrelationNet model, as shown in Fig. 1. Data categorization and characterization are covered in the first subsystem, which explains how the symptoms are organized as attributes in the dataset. The second subsystem deals with how imbalanced data are processed. In the third subsystem, the optimal approach for representing the data using statistical indicators is determined utilizing a variety of machine learning algorithms. The fourth subsystem addresses the correlation method used to categorize the risks of getting an infection. The fifth subsystem addresses the performance evaluation part and provides the accuracy, precision, recall, and F1 score. Another subsystem connected to the fourth subsystem provides comprehensive processing of the correlation method.
Fig. 1. The operational outline of the proposed AD-Cov-CorrelationNet model.
5 Learning Models

Table 2 gives a concise view of the different ML algorithms used as learning models; a sketch of how such models might be trained and compared follows the table.
Table 2. Definition of ML algorithms and characterization of learning models.

Sl. No | ML Algorithm | Definition | Pros and Cons
1 | Logistic Regression | In order to predict a binary outcome, logistic regression uses prior observations from a data collection | Training is very effective, and the model is easy to implement and analyze. If the number of data points is smaller than the number of features, logistic regression should not be used
2 | K-Nearest Neighbors (KNN) | Classification and regression problems can be addressed using the supervised machine learning method known as K-Nearest Neighbors (KNN) | It is instance-based learning, simple to use, and only two parameters are needed to implement it. It is unable to handle huge datasets and is sensitive to noisy data, missing values, and outliers
3 | Decision Tree | A method of decision support that utilizes a tree-like model to describe options and their possible results, including the possibility of chance events | Easily interpreted and understood, excellent for visual depiction. It can use both numerical and categorical features
4 | Random Forest | A classification system made up of several decision trees | Effective with non-linear data, low probability of mistakes, and uses large datasets effectively. Training is slow. For linear algorithms with numerous sparse features, it is not recommended
5 | Support Vector Machine (SVM) | SVMs, also referred to as support-vector machines, are supervised learning models used to analyze data for regression and classification | Works incredibly well when there is a distinct margin of separation, and performs well in high-dimensional spaces. With a large dataset it does not perform as well because the training time is longer
6 | Multi-Layer Perceptron (MLP) | A completely connected feed-forward neural network | Ability to learn non-linear models and real-time model learning capability (online learning). Scaling of features has an impact on MLP
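As an illustrative sketch (not the authors' code), the six models can be trained and compared with scikit-learn as follows; the synthetic X and y stand in for the binary-encoded survey data of Table 1:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the binary-encoded symptom data (the real survey has 5434 rows)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5434, 19))
y = rng.integers(0, 2, size=5434)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'KNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(),
    'MLP': MLPClassifier(max_iter=500),
}

# Fit each model and report its test accuracy
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```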
6 Results and Analysis

6.1 Correlation Model Performance (Pearson Correlation Analysis)

The correlation model is used to quantify the linear relationship between two variables. The correlation coefficient can fall between −1.0 and 1.0; values cannot exceed 1.0 or fall below −1.0. A correlation of −1.0 denotes a perfect negative correlation, whereas a correlation of 1.0 denotes a perfect positive correlation. The performance of the correlation model is given in Table 3.

Table 3. Correlation model performance (Pearson correlation analysis).

Pearson correlation coefficient (r) value | Strength | Direction | Main Attribute | Relational Attribute (when the main attribute changes, the relational attribute changes too, with the given strength and direction) | Findings
Greater than 0.5 | Strong | Positive | Breathing problem, Fever, Dry cough, Sore throat, Asthma, Lung disease, Heart disease, Diabetes, Hypertension | Contact with COVID-19 patients, attended large gathering, abroad travel | Patients in direct contact with the outside world mostly suffer from COVID-19
Between 0.3 and 0.5 | Moderate | Positive | Breathing problem, Fever, Dry cough, Sore throat, Runny nose | Asthma, Lung disease, Heart disease, Diabetes, Hypertension | Patients with serious chronic illness and COVID-19 symptoms also suffer from infection in spite of no direct contact with the outside world
Between 0 and 0.3 | Weak | Positive | Breathing problem, Fever, Dry cough | Diabetes, Hypertension | Patients with only diabetes or hypertension and mild COVID-19 symptoms also suffer from infection, but the cases are not that significant
0 | None | None | Headache, Fatigue, Gastrointestinal | Contact with COVID-19 patients, attended large gathering, abroad travel | Patients with headache, fatigue, or gastrointestinal problems are less likely to suffer from COVID-19 in spite of direct contact with the outside
Between 0 and −0.3 | Weak | Negative | Headache, Hypertension, Fatigue, Gastrointestinal | Diabetes, Fatigue, Breathing problem | Patients show a low COVID-19 positive rate (in rare cases)
Between −0.3 and −0.5 | Moderate | Negative | None | None | No correlation
Less than −0.5 | Strong | Negative | None | None | No correlation

The results of the correlation heat map are presented in Fig. 2.
Fig. 2. Correlation heat map results (simple format).
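For illustration, a Pearson correlation matrix and heat map of this kind can be produced with pandas and seaborn as sketched below; the small synthetic DataFrame stands in for the full Table 1 survey data:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in for the binary survey data; the real dataset uses the
# Table 1 attributes with Yes/No answers mapped to 1/0
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(5434, 4)),
                  columns=['Breathing problem', 'Fever',
                           'Dry cough', 'COVID-19'])

corr = df.corr(method='pearson')  # pairwise Pearson r between attributes
sns.heatmap(corr, cmap='coolwarm', vmin=-1, vmax=1, annot=True)
plt.title('Correlation heat map')
plt.show()
```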
6.2 Confusion Matrix

A confusion matrix summarizes actual versus predicted classifications. Its four cells are as follows: True Positive (TP) is the number of positive cases correctly predicted as positive; False Negative (FN) is the number of positive cases wrongly categorized as negative; False Positive (FP) is the number of negative cases inadvertently labeled as positive; and True Negative (TN) is the number of negative cases correctly predicted as negative. The confusion matrix of Logistic Regression is shown in Fig. 3. Table 4 gives the measured values of all the confusion matrices.
Fig. 3. Confusion matrix of Logistic Regression.
6.3 Accuracy, Precision, Recall, and F1-Score

It is important to measure accuracy, recall, precision, and the F1-score.
1. Accuracy: the proportion of correct predictions among all predictions.
2. Recall: also called sensitivity or True Positive Rate (TPR); a measurement of how many of the actual positive cases the classifier recognized. It ought to be high.
3. Precision: the proportion of correctly classified positive instances among all instances predicted positive.
4. F1 score: the harmonic mean of recall (sensitivity) and precision.

Figure 4 depicts the accuracy comparison of different ML classifier models. Table 5 presents the performance comparison of different classifier models in terms of accuracy, precision, recall, and F1 score.
Table 4. Confusion matrices of the six ML models (N = 5434).

Model | TN | FP | FN | TP
Logistic Regression | 947 | 104 | 55 | 4328
KNN | 896 | 155 | 8 | 4375
Decision Tree | 1029 | 22 | 72 | 4311
Random Forest | 822 | 229 | 1 | 4382
SVM | 963 | 88 | 33 | 4350
MLP | 958 | 93 | 61 | 4322
Fig. 4. Accuracy of ML Algorithms.
6.4 Receiver Operating Characteristic (ROC)

Figure 5 shows the Receiver Operating Characteristic (ROC) comparison of the different ML classifier models.
Table 5. Performance comparison of different ML classifier models.

ML Algorithm | Accuracy | Precision | Recall | F1 Score
Logistic Regression | 97.07% | 97.65% | 98.74% | 98.19%
KNN | 97.00% | 96.57% | 99.81% | 98.17%
Decision Tree | 98.27% | 99.49% | 98.35% | 98.92%
Random Forest | 95.76% | 95.03% | 99.97% | 97.44%
SVM | 97.77% | 98.01% | 99.24% | 98.62%
MLP | 98.60% | 97.92% | 98.61% | 98.25%
Fig. 5. ROC comparison of different ML classifier models.
7 Comparative Study

Table 6 compares the results of the correlation model with relevant studies. These results show that the correlation model performed competitively in comparison to different models and studies. Nevertheless, we could not find more than one binary COVID-19 patient dataset containing AD patients and medical information
for comparison. This correlation study achieves around 98% accuracy with the AD-Cov-CorrelationNet model.

Table 6. Comparative study.

Description | Dataset Type | Implemented Method/Algorithm | Accuracy | Reference
Predicting COVID-19 | X-ray image | LSTM-RNN | 96.0% | [10]
Predicting COVID-19 | X-ray image | LSTM-RNN | 93.0% | [11]
Predicting COVID-19 | X-ray image | Res-CovNet | 86.0% | [12]
Predicting AD-COVID-19 Mortality | Binary | AD-CovNet | 97.0% | [9]
AD-COVID-19 symptoms correlation | Binary dataset | AD-Cov-CorrelationNet | 98.60% | Our work
8 Conclusions

This study discovers substantial connections between distinct COVID-19 symptom cases and the worldwide burden of dementia. Health policymakers must have thorough plans in place to identify those at risk (including older people) and limit the risk of infection, while also paying attention to clinical and psychiatric well-being, at this key stage of the epidemic, when countries are ready to lift their national lockdowns and begin opening their borders. Such patients may be prioritized based on their risk level as vaccination becomes more broadly available. As a result, it is critical to assess the impact of COVID-19 on Alzheimer's patients' health; when it comes to vaccination, Alzheimer's sufferers should be given extra attention and importance. The mortality rate of Alzheimer's patients may be lowered as a result of this research. A comparative analysis is conducted using Google Colaboratory, which has evaluated the performance of each ML technique included in the AD-Cov-CorrelationNet model in terms of accuracy, precision, recall, and F1 score. The accuracy of the Logistic Regression, KNN, Decision Tree, Random Forest, and SVM classification models is greater than 95%, and MLP yields the best accuracy of 98.60%. The outcomes of this study also show accurate ROC curves. A patient who is in direct contact with the outside world suffers more illnesses associated with coronavirus. Symptoms like breathing problems, fever, dry cough, and sore throat are very sensitive indicators in COVID-19 cases, so it is safer for sensitive patients to stay at home. Hypertension, headache, and gastrointestinal problems are not serious illnesses for Alzheimer's patients, so patients with only these minor problems remain in a less risky position.
Finally, the findings of the correlation study are given below:
Finding 1: AD patients in direct contact with the outside world mostly suffer from COVID-19.
Finding 2: AD patients with serious chronic illness and COVID-19 symptoms also suffer from infection in spite of no direct contact with the outside world.
Finding 3: AD patients with only diabetes or hypertension and mild COVID-19 symptoms also suffer from infection, but the cases are not that significant.
Finding 4: AD patients with headache, fatigue, and gastrointestinal problems are less likely to suffer from COVID-19 in spite of direct contact with the outside.
Finding 5: AD patients show a low COVID-19 positive rate (in rare cases).
References 1. WHO: Coronavirus disease (COVID-19). https://www.who.int/emergencies/diseases/novelcoronavirus-2019. Accessed 17 Oct 2022 2. Wang, Q., Davis, P.B., Gurney, M.E., Xu, R.: COVID-19 and dementia: analyses of risk, disparity, and outcomes from electronic health records in the US. Alzheimer’s Dement. 17(8), 1297–1306 (2021) 3. Wiley, J.: Alzheimer’s disease facts and figures. Alzheimer’s Dement. 17, 327–406 (2021) 4. Bzdok, D., Altman, N., Krzywinski, M.: Statistics versus machine learning. Nat. Methods 15(4), 233–234 (2018) 5. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinform. 18(5), 851–869 (2016) 6. Laguarta, J., Hueto, F., Subirana, B.: COVID-19 artificial intelligence diagnosis using only cough recordings. IEEE Open J. Eng. Med. Biol. 1, 275–281 (2020) 7. Jesmin, S., Kaiser, M.S., Mahmud, M.: Artificial and internet of healthcare things based Alzheimer care during COVID 19. In: Mahmud, M., Vassanelli, S., Kaiser, M.S., Zhong, N. (eds.) Brain Informatics. BI 2020. LNCS, vol. 12241, pp. 263–274. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59277-6_24 8. Villavicencio, C.N., Macrohon, J.J., Inbaraj, X.A., Jeng, J.H., Hsieh, J.G.: Development of a machine learning based web application for early diagnosis of COVID-19 based on symptoms. Diagnostics 12(4), 821 (2022) 9. Akter, S., et al.: AD-CovNet: an exploratory analysis using a hybrid deep learning model to handle data imbalance, predict fatality, and risk factors in Alzheimer’s patients with COVID19. Comput. Biol. Med. 105657 (2022) 10. Alassafi, M.O., Jarrah, M., Alotaibi, R.: Time series predicting of COVID-19 based on deep learning. Neurocomputing 468, 335–344 (2022) 11. Alorini, G., Rawat, D.B., Alorini, D.: LSTM-RNN based sentiment analysis to monitor COVID-19 opinions using social media data. In: ICC 2021-IEEE International Conference on Communications, pp. 1–6. IEEE (2021) 12. Madhavan, M.V., Khamparia, A., Gupta, D., Pande, S., Tiwari, P., Hossain, M.S.: Res-CovNet: an internet of medical health things driven COVID-19 framework using transfer learning. Neural Comput. Appl. 1–14 (2021)
MRI Based Automated Detection of Brain Tumor Using DWT, GLCM, PCA, Ensemble of SVM and PNN in Sequence

Md. Sakib Ahmed1, Sajib Hossain1, Md. Nazmul Haque1, M. M. Mahbubul Syeed2, D. M. Saaduzzaman1, Md. Hasan Maruf1, and A. S. M. Shihavuddin2(B)

1 Green University of Bangladesh (GUB), Begum Rokeya Sarani, Dhaka 1207, Bangladesh
{saaduzzaman,maruf}@eee.green.edu.bd
2 Independent University, Bangladesh (IUB), Bashundhara R/A, Dhaka, Bangladesh
{mahbubul.syeed,shihav}@iub.edu.bd

Abstract. Classifying, segmenting, and detecting the area of infection in MRI images of brain tumors is a challenging, iterative, error-prone and time-consuming process. Moreover, visualizing and numerically quantifying the structural properties of an abnormal human brain, even with sophisticated Magnetic Resonance Imaging techniques, requires advanced and expensive tools. MRI can better differentiate and clarify the neuronal architecture of the human brain compared to other imaging methodologies. In this study, a complete pipeline is proposed to classify abnormal structures in the human brain from MRI images that might be early signs of tumor formation. The proposed pipeline consists of noise reduction techniques, gray-level co-occurrence matrix (GLCM) feature extraction, DWT-based brain tumor area segmentation to reduce complexity, and a Support Vector Machine (SVM) using the Radial Basis Function (RBF) kernel in ensemble with a PNN for classification. SVM and PNN in combination provide a data-driven prediction model of the possible existence and location of a brain tumor in MRI images. Experimental results achieved nearly 99% accuracy in identifying healthy and tumorous tissue based on structure from brain MRI images. The proposed method, together with comparable accuracy, is reasonably lightweight and fast compared to other existing deep learning-based methods.

Keywords: pre-processing · image segmentation · DWT · GLCM · PCA · MRI tumor classification · feature extraction · Support vector machine · Probabilistic Neural Network
1 Introduction
In the field of digital images [1], each pixel in combination holistically represents the information about the object of interest. With the recent availability of
computation-intensive hardware at lower costs, deep learning based solutions have taken over AI-based problem solving. However, the energy cost of such solutions, due to heavy and inefficient computations during training, is still alarming. For some specific problems, by utilizing prior knowledge from field experts, a simple image processing pipeline can be produced to generate effective yet computationally light solutions. In this work, we have addressed Magnetic Resonance Imaging (MRI) based automated detection of brain tumors, which can enable clinical professionals to deliver early detection and corresponding healthcare to patients [1,2]. This study uses gray-level co-occurrence matrix (GLCM) feature extraction [3] and Support Vector Machine (SVM) classification [25] to effectively identify and classify normal and cancerous tissues from MR images [5].

Brain tumors develop as abnormal, uncontrolled cancerous tissue in the brain. A brain tumor may be benign or malignant [7]. A benign tumor is structurally uniform and does not contain active cancer cells. A malignant tumor has a non-uniform structure and contains active cancer cells that migrate between different sections [6]. According to the World Health Organization, a grading system ranging from Grade I to Grade IV is used to classify benign and malignant tumors [11]. In this system, Grades I and II in general represent lower-grade tumors, while Grades III and IV represent higher-grade tumors. Brain tumors can form in the human brain at any age, and their impact may not be the same for each individual. Because of the complex structure of the human brain, diagnosing the tumor area in the brain is challenging [8,11]. Malignant tumors may grow very rapidly; influenced by lifestyle and external environmental factors, they can affect healthy brain cells and easily spread to other parts of the brain or the corresponding spinal cord, and can thus be harmful if untreated. The identification and classification of such brain tumors at an early stage is therefore of utmost importance for the right diagnosis and selection of the cure procedure [9]. Improved modern imaging techniques allow a professional physician to monitor and track the incidence and development of tumor-affected areas at different stages, so that accurate diagnoses can be made by scanning these images [10]. Based on these facts, the most suitable therapy, radiation, surgical procedure, or chemotherapy may be determined. As a result, it is clear that early detection of a tumor can significantly increase the chance of a tumor-infected patient surviving [10,11]. Using imaging methods, segmentation is used to evaluate the tumor-affected part [7]. Segmentation is the process of dividing an object into component parts that share similar properties such as color, texture, contrast, and borders [6,7].

Brain tumors have been among the leading causes of death in humans in recent decades. It is clear that if the tumor is recognized early and detected effectively, the chances of survival may increase. The usual techniques for identifying and classifying benign (non-cancer) and malignant (cancer) brain tumors are invasive approaches such as biopsy, lumbar puncture, and spinal tap procedures. A computer-assisted diagnostic algorithm [12] was developed
to replace traditional invasive devices and time-consuming techniques in order to improve brain tumor diagnosis in terms of accuracy and precision. This article provides a high-quality, intelligent, feature-based tumor classification technique in which specific magnetic resonance (MR) image pixels are classified into non-cancerous (benign) and cancerous (malignant) tumor classes. The proposed method consists of three steps: (1) wavelet decomposition, (2) texture extraction, and (3) classification. The Discrete Wavelet Transform has been used in much of the literature to decompose the MR image into unique detail segments and approximation coefficients, from which structural data such as energies are obtained. The proposed strategy has been applied to regular MR images, and it is observed that the classification accuracy using probabilistic neural networks in ensemble with SVM is nearly 99%.

The main contributions of this work are the following:

– Development of a lightweight, texture-feature-based segmentation and an ensemble of SVM and PNN for classification of brain tumors from MRI, with accuracy on par with heavy-duty deep learning based approaches.
– Being feature based, the proposed method can achieve higher accuracy with a comparatively smaller number of training samples than deep learning based methods. The proposed method is also generalized and adaptable to similar problems with ease.
– The proposed features are explainable in terms of the biological changes occurring in the brain due to tumor formation, which is still a matter of research in the field of deep learning.
2 Dataset
The dataset used in this work was assembled from axial T2-weighted MR images of 256 × 256 in-plane resolution, collected from publicly available datasets and some local ones. T2-weighted images offer higher contrast and better visual quality compared to the T1 and PET modalities, which is why T2 images were preferred in this case. The abnormal brain MR images in the dataset cover diseases such as Alzheimer's disease, sarcoma, and Huntington's disease. From the selected datasets, images were randomly chosen to generate the training, testing, and validation sets and to perform cross-validation.
3 Proposed Method
The proposed method for the classification of brain tumors is composed of five major steps: pre-processing, segmentation, feature extraction using DWT and GLCM, feature selection using PCA, and tumor detection and identification using the SVM and PNN-RBF network, as illustrated in Fig. 1.
Fig. 1. Three stage diagram of the proposed method for the MRI image processing and classification.
3.1 Pre-processing
The first step of pre-processing is the filtration and denoising of the images. Denoising is done using filters such as anisotropic diffusion to remove induced noise that may influence the image quality and the important features for the identification of tumorous tissue. Such noise in general appears in the image during acquisition, transmission, or compression, due to limitations of the sensors and external influences. An example of a denoised image is illustrated in Fig. 2. The overall performance is sensitive to the denoising algorithm used in the pre-processing step. Other methods such as low-pass filtering, Gaussian smoothing, and edge-preserving smoothing can also be useful in this case.
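Since the text names anisotropic diffusion as the denoising filter, the following is a minimal Perona-Malik style sketch written directly in NumPy; the iteration count, conduction parameter kappa, and step size gamma are illustrative assumptions, not values taken from the paper.

import numpy as np

def anisotropic_diffusion(img, n_iter=15, kappa=30.0, gamma=0.15):
    # Classic Perona-Malik diffusion: smooth homogeneous regions while
    # preserving edges. np.roll wraps at the borders, an acceptable
    # simplification for a sketch.
    img = img.astype(np.float64)
    for _ in range(n_iter):
        dN = np.roll(img, -1, axis=0) - img   # difference to each neighbour
        dS = np.roll(img, 1, axis=0) - img
        dE = np.roll(img, -1, axis=1) - img
        dW = np.roll(img, 1, axis=1) - img
        # Conduction coefficients: large gradients (edges) diffuse less.
        cN, cS = np.exp(-(dN / kappa) ** 2), np.exp(-(dS / kappa) ** 2)
        cE, cW = np.exp(-(dE / kappa) ** 2), np.exp(-(dW / kappa) ** 2)
        img = img + gamma * (cN * dN + cS * dS + cE * dE + cW * dW)
    return img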
3.2 Segmentation
In this work, segmentation, the process of separating an image into different parts such that each pixel within a segmented part has similar characteristics, is performed with a classical approach. Image segmentation [14,15] can be performed through a wide variety of approaches, as shown in Fig. 3. Deep learning based approaches have consistently performed very well in recent years; however, they require thousands of positive samples to train from. In the case of brain tumors, that many examples are very expensive to gather and validate. In this work, we approached the same problem with feature based methods, as they provide more interpretability, low training sample requirements, and faster training, with equally reliable performance as reported from these experiments.
Fig. 2. Example of an original acquisition image and the noise-reduced version of the original image.
Local consistency of the initial segmentation is maintained using morphological operations, with some prior knowledge incorporated into them.
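A minimal sketch of such a morphological clean-up with OpenCV is shown below: opening removes small false-positive specks and closing fills small holes in a binary tumor mask; the elliptical 5 × 5 structuring element is an illustrative choice, not the paper's stated configuration.

import cv2

def refine_mask(binary_mask):
    # binary_mask is assumed to be a single-channel uint8 image (0/255).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    opened = cv2.morphologyEx(binary_mask, cv2.MORPH_OPEN, kernel)   # remove specks
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)       # fill holes
    return closed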
Fig. 3. Image segmentation types
3.3 Feature Extraction
In this work, the feature extraction part is very important, as the classification is performed based on that information. In this segment of the work, the aim was to extract quantitative information from MR images, such as texture, form, and contrast. The discrete wavelet transform (DWT) [16] is used for extracting wavelet coefficients, in concatenation with the gray-level co-occurrence matrix (GLCM) [17] for extracting locally distributed inherent texture features. Five different decomposition levels were created by dividing the MRI images, making sure that the corresponding coefficients of the LL and HL bands were picked. Once these sub-bands were obtained from the wavelet decomposition, statistical textural characteristics such as energy, correlation, entropy, and homogeneity were extracted from the GLCM.
DWT Features. As a feature vector, the proposed system makes use of the coefficients of the Discrete Wavelet Transform (DWT) [23]. The wavelet is an effective mathematical apparatus for extracting features, and here it is used to find the wavelet coefficients of an MR image. Wavelets are localized basis functions, being scaled and shifted versions of some fixed mother wavelet. The primary advantage of wavelets is that they provide localized frequency information about a signal, which is extremely useful for classification. The continuous wavelet transform of a square-integrable signal x(t), relative to a real-valued mother wavelet ψ(t), is defined as the integral

W(a, b) = (1/√|a|) ∫ x(t) ψ((t − b)/a) dt,

where the wavelets ψ_{a,b} are generated from the mother wavelet by translation b and dilation a. The mother wavelet must satisfy the admissibility condition that the waveform has zero mean. This transform can be utilized effectively by restricting a and b to discrete values, which yields the discrete wavelet transform, as illustrated in Figs. 4 and 5.
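The paper's implementation used MATLAB's wavelet toolbox; as a hedged stand-in, the following sketch performs the five-level decomposition with the PyWavelets library, where "db4" is an assumed member of the Daubechies family (the paper does not state the order) and mri_image is a 2-D array loaded beforehand.

import pywt

# wavedec2 returns [cA5, (cH5, cV5, cD5), ..., (cH1, cV1, cD1)]:
# the level-5 approximation (LL) followed by per-level detail tuples.
coeffs = pywt.wavedec2(mri_image, wavelet="db4", level=5)
cA5 = coeffs[0]
# Keep the LH (horizontal) and HL (vertical) detail sub-bands per level,
# as described in the text.
lh_hl_bands = [(cH, cV) for (cH, cV, cD) in coeffs[1:]]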
Fig. 4. Discrete wavelet decomposition steps from input images during the feature extraction process
Here, Ho_D is the high-pass filter, Lo_D is the low-pass filter, and T denotes the Daubechies wavelet transform. In this work, the overall performance of the HL sub-band features was higher than that of the LL sub-band features. Therefore, a five-level decomposition using the Daubechies wavelet was chosen in this procedure, and the traits were extracted from the LH and HL sub-bands generated using the DWT.

GLCM Features. Based on local texture features extracted by the Gray Level Co-occurrence Matrix (GLCM), it is possible to distinguish between regular and anomalous tissue with reliable accuracy [21,22]. GLCM texture features also structurally contrast malignant tissue with regular tissue, which can often be challenging even for human experts. Texture based automated analysis is therefore regularly used in computer-assisted pathology, providing supplementary strategies for biopsy. Even in deep learning based training, the mid-level features that are extracted during training are mainly the local texture features from the object surfaces [24]. GLCM in general calculates the frequency of a specific gray level in an image region, taking the correlations between neighboring pixels into consideration. Texture records are primarily based on the probability of discovering a pair of gray levels at predefined distances and angles across a whole image.
Fig. 5. Block diagram for the extraction and reduction of the DWT features used in this work
Using the Gray Level Co-occurrence Matrix (GLCM), also known as the Gray Level Spatial Dependence Matrix (GLSDM), statistical aspects of the MR images are obtained. Haralick's GLCM is a statistical technique that can explain the relation of pixels of particular gray levels. The GLCM is a two-dimensional histogram whose entry (i, j) is the frequency of occurrence of gray level i together with gray level j. Using the GLCM, statistical elements such as contrast, energy, entropy, homogeneity, correlation, and shade, which capture the prominent interrelationships, are extracted. Homogeneity and entropy were also extracted from the first five wavelet decomposition levels of the LH and HL sub-bands.
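A sketch of this GLCM feature step using scikit-image (an assumed stand-in for the paper's MATLAB implementation) is given below; entropy is not a built-in graycoprops property, so it is computed from the normalized co-occurrence matrix directly.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img):
    # gray_img is assumed to be uint8; one distance and one angle are
    # used here for brevity.
    glcm = graycomatrix(gray_img, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    feats = {p: float(graycoprops(glcm, p).mean())
             for p in ("contrast", "correlation", "energy", "homogeneity")}
    p = glcm[glcm > 0]
    feats["entropy"] = float(-np.sum(p * np.log2(p)))
    return feats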
3.4 Feature Selection Using PCA
Principal component analysis (PCA) is a well-established and widely used method for dimensionality reduction. PCA seeks a lower-dimensional representation of the feature matrix that preserves as much of the data variance as possible while keeping the projected components orthogonal to each other. A lower-dimensional feature vector representation reduces the redundancy in the data and, as a consequence, also enhances the classification accuracy. The feature vector calculated from the DWT and GLCM in concatenation goes through a feature reduction step down to a chosen dimension size by PCA, resulting in an effective classification algorithm, as illustrated in Fig. 6. Each feature is normalized before applying the dimension reduction through PCA. For normalization, we converted each feature across the samples into the range of −1 to 1, using the minimum and maximum values of that particular feature across samples, following Eq. (1). The performance of PCA is also sensitive to the number of dimensions the data is projected to; with this parameter optimized, a better dimension reduction can be achieved while maintaining the orthogonality of the projected features.

Feat_norm = 2 × ((Feat − Feat_min) / (Feat_max − Feat_min) − 0.5)    (1)
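The normalization of Eq. (1) followed by the PCA projection can be sketched as follows, assuming a (samples × features) matrix of concatenated DWT and GLCM descriptors; the number of retained components is a tunable assumption.

import numpy as np
from sklearn.decomposition import PCA

def normalize_features(F):
    # Per-feature min-max scaling into [-1, 1], mirroring Eq. (1).
    fmin, fmax = F.min(axis=0), F.max(axis=0)
    return 2.0 * ((F - fmin) / (fmax - fmin) - 0.5)

F_norm = normalize_features(feature_matrix)   # feature_matrix assumed prepared
pca = PCA(n_components=20)                    # illustrative dimensionality
F_reduced = pca.fit_transform(F_norm)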
Feature selection is the matter of choosing a subset of variables that demonstrates the behavior of the entire set, keeping the helpful variables and discarding the inapplicable ones.
Fig. 6. Block diagram for the extraction and reduction of the features used
As an input vector, the extracted feature vectors were used to train and evaluate the output of the PNN network [18] for the corresponding classification task. The statistical field characteristics of these feature vectors are shown in Table 1.

Table 1. Gray-level co-occurrence matrix of trained MRI images representing the corresponding statistical field

MRI ID  Contrast   Correlation  Energy    Homogeneity  Entropy
1       0.38654    0.08050      0.815681  0.90000      2.32236
2       0.36012    0.17250      0.81607   0.94738      0.24365
3       0.265017   0.1248       0.778082  0.938330     2.87463
4       0.315628   0.0928       0.796363  0.940348     2.67392
5       0.365406   0.1353       0.808549  0.9447       2.51157
6       0.285873   0.1143       0.758115  0.930993     2.88055
7       0.299221   0.1129       0.78455   0.938665     2.91007
8       0.267241   0.1246       0.778839  0.936707     2.91136
9       0.2744     0.1381       0.79678   0.9325       2.2727
10      0.2812     0.1477       0.8074    0.8635       2.8837
3.5 Brain Tumor Classification
Classification of images is a process of extracting data classes from multi-band raster images. There are basically three types of classification: per pixel, per subpixel, and per object. This study focuses on pixel-scale image classification [19]. Supervised classification (user-guided) and
unsupervised classification [20] (calculated by the software) are the two most common approaches; object-based image analysis is rarer and is the most recent technique, in which high-resolution images are used as input for the analysis. Figure 7 depicts, from various points of view, the various types of classical object classification methods available in the literature.
Fig. 7. Different types of techniques for image recognition
Ensemble of Support Vector Machine (SVM) and Probabilistic Neural Network (PNN) for Classification: The support vector machine is a very powerful tool for feature based classification. Before deep learning took over the state of the art, the SVM used to be one of the top contenders in providing the best accuracy for varied classification challenges. The main idea behind the SVM is that it tries to set the class boundary using predefined kernels, putting more weight on the class examples near the boundaries. Together with the SVM, a Probabilistic Neural Network (PNN) is trained on the same feature mapping and used in an ensemble [30] for the final classification prediction.
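Since the paper does not publish its ensemble code, the following is a hedged sketch of one plausible realization: an RBF-kernel SVC produces class probabilities, a minimal Parzen-window PNN estimates per-class likelihoods with Gaussian kernels, and the two are averaged; the smoothing parameter sigma and the probability-averaging rule are assumptions, not the paper's stated configuration.

import numpy as np
from sklearn.svm import SVC

class SimplePNN:
    # Minimal probabilistic neural network (Parzen-window classifier).
    def __init__(self, sigma=0.5):
        self.sigma = sigma
    def fit(self, X, y):
        self.classes_ = np.unique(y)          # sorted, matching SVC.classes_
        self.X_, self.y_ = np.asarray(X), np.asarray(y)
        return self
    def predict_proba(self, X):
        X = np.asarray(X)
        probs = np.zeros((len(X), len(self.classes_)))
        for j, c in enumerate(self.classes_):
            Xc = self.X_[self.y_ == c]
            d2 = ((X[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
            probs[:, j] = np.exp(-d2 / (2 * self.sigma ** 2)).mean(axis=1)
        return probs / probs.sum(axis=1, keepdims=True)

# X_train, y_train, X_test are assumed to be the PCA-reduced features.
svm = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
pnn = SimplePNN(sigma=0.5).fit(X_train, y_train)
avg_proba = (svm.predict_proba(X_test) + pnn.predict_proba(X_test)) / 2.0
y_pred = avg_proba.argmax(axis=1)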
4 Results
The tests were performed on a standard Intel i5 platform running Windows 10. For the implementation, the wavelet toolbox was used to develop the feature extraction algorithm, and MATLAB 2018's bio-statistics toolbox (MathWorks), the SVM toolbox, and extended SVM kernels were used for the classification and comparison of MR brain tumor analysis. We evaluated the accuracy with four SVM kernels (Linear (LIN), higher-order polynomial (HPOL), lower-order polynomial (IPOL), and Gaussian Radial Basis Function (GRBF)) to select the best one. The results of the experiments showed an interesting gain in performance when SVM and
PNN are used in ensemble. The classical approaches covering DWT and other post-processing performed reasonably well; however, they need sophisticated feature extraction, normalization, reduction, and classification approaches [26–29,31,32]. With the proposed method, the MRI images are processed step by step in sequence, in an efficient way, to extract the optimum information in terms of features and make a robust and informed prediction with high accuracy. The overall result is presented in Fig. 8. It illustrates the best performance of our proposed DWT + GLCM + PCA + SVM (GRBF) + PNN system when compared with other state-of-the-art methods on our combined dataset, achieving the highest classification accuracy of 99%. The next best is the DWT + GLCM + PCA + SVM (GRBF) method, with 98.75% accuracy. All of these methods follow the same feature-based principle; however, with the SVM and PNN in combination we are able to exploit both the support vector information and the class clusters in the higher-dimensional feature domain through the PNN, which made the approach more comprehensive and able to achieve better results. However, the ensemble requires more training time and hardware space compared to the others, and the ensemble method needs to be further tuned for the optimal configuration. The accuracy of the trained and tested images was measured on the basis of the classification of normal and abnormal tumor tissues; the corresponding findings are shown in Fig. 8. The features extracted along the way are also very useful for a better understanding of the biological process of tumor formation and its behavioral pattern. Classification accuracy, or the rate of correct decisions, is the ratio of acceptable classifications to the total number of classification tests. Rather than being used directly to diagnose patients, the classification results can also work as a suggestive system for doctors, who can make the final decision based on corresponding additional patient data. SVM and PNN capture complementary traits from the features and guide the system in combination towards more accurate results: the SVM focuses more on the support vectors and individual features, whereas the PNN contributes by looking more into the locally coherent features in its decision making. The ensemble method can be further explored and tuned, which is part of our future direction.
Fig. 8. Comparative results in terms of accuracy
5 Conclusion
In this study, we used brain MRI pictures divided into healthy brain tissue and tumorous tissue. Preprocessing is used to remove noise and smooth the image; this also contributes to an improved SNR. We then used a discrete wavelet transform to decompose the image and extracted the characteristics from the gray-level co-occurrence matrix (GLCM), followed by morphological operations. An SVM with an RBF kernel in ensemble with a PNN is used as the classifier to identify tumors in brain MRI images with around 99% accuracy. The final accuracy is on average comparatively higher than that of other lightweight classical approaches, while requiring a minimal amount of positive training samples. The research results clearly state that diagnosis of brain tumors using the proposed method can offer a quicker and more reliable solution than diagnosing directly only by a medical professional. The proposed method can be a helping tool for real-life practitioners for more accurate diagnosis alongside tumor location identification.
References
1. Scholl, I., Aach, T., Deserno, T.M., et al.: Challenges of medical image processing. Comput. Sci. Res. Dev. 26, 5–13 (2011). https://doi.org/10.1007/s00450-010-0146-9
2. Eklund, A., Dufort, P., Forsberg, D., LaConte, S.M.: Medical image processing on the GPU - past, present and future. Med. Image Anal. 17(8), 1073–1094 (2013). https://doi.org/10.1016/j.media.2013.05.008
3. de Siqueira, F.R., Schwartz, W.R., Pedrini, H.: Multi-scale gray level co-occurrence matrices for texture description. Neurocomputing 120, 336–345 (2013). https://doi.org/10.1016/j.neucom.2012.09.042
4. Wu, S.G., Bao, F.S., Xu, E.Y., Wang, Y., Chang, Y., Xiang, Q.: A leaf recognition algorithm for plant classification using probabilistic neural network. In: IEEE International Symposium on Signal Processing and Information Technology, pp. 11–16 (2007). https://doi.org/10.1109/ISSPIT.2007.4458016
5. Ji, Z., Liu, J., Cao, G., Sun, Q., Chen, Q.: Robust spatially constrained fuzzy C-means algorithm for brain MR image segmentation. Pattern Recogn. 47(7), 2454–2466 (2014). https://doi.org/10.1016/j.patcog.2014.01.017
6. Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015). https://doi.org/10.1109/TMI.2014.2377694
7. Işın, A., Direkoğlu, C., Şah, M.: Review of MRI-based brain tumor image segmentation using deep learning methods. Procedia Comput. Sci. 102, 317–324 (2016). https://doi.org/10.1016/j.procs.2016.09.407
8. Othman, M.F., Basri, M.A.M.: Probabilistic neural network for brain tumor classification. In: 2011 Second International Conference on Intelligent Systems, Modelling and Simulation, pp. 136–138 (2011). https://doi.org/10.1109/ISMS.2011.32
9. Abiwinanda, N., Hanif, M., Hesaputra, S.T., Handayani, A., Mengko, T.R.: Brain tumor classification using convolutional neural network. In: Lhotska, L., Sukupova, L., Lacković, I., Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018, vol. 68/1, pp. 183–189. Springer, Singapore (2019). https://doi.org/10.1007/978-981-10-9035-6_33
10. Bondy, M.L., et al.: Brain tumor epidemiology: consensus from the brain tumor epidemiology consortium. Cancer 113, 1953–1968 (2008). https://doi.org/10.1002/cncr.23741
11. Anaraki, A.K., Ayati, M., Kazemi, F.: Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms. Biocybern. Biomed. Eng. 39(1), 63–74 (2019). https://doi.org/10.1016/j.bbe.2018.10.004
12. Syeed, M., Lindman, J., Hammouda, I.: Measuring perceived trust in open source software communities. In: Balaguer, F., Di Cosmo, R., Garrido, A., Kon, F., Robles, G., Zacchiroli, S. (eds.) OSS 2017. IAICT, vol. 496, pp. 49–54. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57735-7_5
13. Gupta, M., Taneja, H., Chand, L., Goyal, V.: Enhancement and analysis in MRI image denoising for different filtering techniques. J. Stat. Manag. Syst. 21(4), 561–568 (2018)
14. Kaur, D., Kaur, Y.: Various image segmentation techniques: a review. Int. J. Comput. Sci. Mob. Comput. 3(5), 809–814 (2014)
15. Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3523–3542 (2021)
16. Ramya, J., Vijaylakshmi, H.C., Saifuddin, H.M.: Segmentation of skin lesion images using discrete wavelet transform. Biomed. Signal Process. Control 69, 102839 (2021)
17. Riesaputri, D.F., Sari, C.A., Rachmawanto, E.H.: Classification of breast cancer using PNN classifier based on GLCM feature extraction and GMM segmentation. In: 2020 International Seminar on Application for Technology of Information and Communication (iSemantic), pp. 83–87. IEEE (2020)
18. Virmani, J., Singh, G.P., Singh, Y.: PNN-based classification of retinal diseases using fundus images. In: Sensors for Health Monitoring, pp. 215–242. Academic Press (2019)
19. Alvarez, M.A., Theran, C.A., Arzuaga, E., Sierra, H.: Analyzing the effects of pixel-scale data fusion in hyperspectral image classification performance. In: Algorithms, Technologies, and Applications for Multispectral and Hyperspectral Imagery XXVI, vol. 11392, p. 1139205. International Society for Optics and Photonics (2020)
20. Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., Aljaaf, A.J.: A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Supervised and Unsupervised Learning for Data Science, pp. 3–21 (2020)
21. Shihavuddin, A.S.M., Gracias, N., Garcia, R., Gleason, A.C., Gintert, B.: Image-based coral reef classification and thematic mapping. Remote Sens. 5(4), 1809–1841 (2013)
22. Shihavuddin, A.S.M., Gracias, N., Garcia, R., Escartin, J., Pedersen, R.B.: Automated classification and thematic mapping of bacterial mats in the North Sea. In: 2013 MTS/IEEE OCEANS-Bergen, pp. 1–8. IEEE (2013)
23. Kociolek, M., Materka, A., Strzelecki, M., Szczypiński, P.: Discrete wavelet transform-derived features for digital image texture analysis. In: International Conference on Signals and Electronic Systems, vol. 2 (2001)
24. Ghosh, S., Das, N., Das, I., Maulik, U.: Understanding deep learning techniques for image segmentation. ACM Comput. Surv. (CSUR) 52(4), 1–35 (2019)
25. Noble, W.S.: What is a support vector machine? Nat. Biotechnol. 24(12), 1565–1567 (2006)
26. Zhang, Y.D., Wu, L.: An MR brain images classifier via principal component analysis and kernel support vector machine. Progress Electromagnet. Res. 130, 369–388 (2012)
27. Chaplot, S., Patnaik, L.M., Jagannathan, N.R.: Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed. Signal Process. Control 1(1), 86–92 (2006)
28. Zhang, Y., Wang, S., Wu, L.: A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Progress Electromagnet. Res. 109, 325–343 (2010)
29. El-Dahshan, E.S.A., Hosny, T., Salem, A.B.M.: Hybrid intelligent techniques for MRI brain images classification. Digit. Signal Process. 20(2), 433–441 (2010)
30. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
31. Varuna Shree, N., Kumar, T.N.R.: Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform. 5(1), 23–30 (2018)
32. Mathur, Y., Jain, P., Singh, U.: Foremost section study and kernel support vector machine through brain images classifier. In: International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 559–562 (2017). https://doi.org/10.1109/ICECA.2017.8212726
Pattern Recognition and Natural Language Processing
Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets Ferdib-Al-Islam(B) , Nusrat Jahan, Farjana Yeasmin Rupa, Suprio Sarkar, Sifat Hossain, and Sk. Shalauddin Kabir Northern University of Business and Technology, Khulna, Bangladesh [email protected]
Abstract. The world has faced a massive health emergency due to the rapid transmission of coronavirus (COVID-19) over the last two years. Since there is no specific treatment for COVID-19, infections have to be limited through prevention methods, and wearing a face mask is an effective preventive method in public areas. However, it is impractical to manually enforce such regulations over large areas and to trace any infractions. Automatic face mask detection facilitated by deep learning techniques provides a better alternative. This research introduces an automatic face mask detection system using the ASUS Tinker single-board computer and the MobileNetV2 model. As most of the publicly available face mask detection datasets were artificially generated, in this work a real face mask detection dataset was first created, consisting of a total of 300 images. The ASUS Tinker board's model training and testing performance and training time were assessed on this dataset and on a publicly accessible dataset of 1376 images. The recommended system reached 99% test accuracy, precision, recall, and F1-score for the newly collected dataset and 100% test accuracy, precision, recall, and F1-score for the publicly available dataset.

Keywords: COVID-19 · Face Mask Detection · Deep Learning · MobileNetV2 · ASUS Tinker
1 Introduction

COVID-19 emerged suddenly in 2019 and has had a worldwide impact, as it has infected over 259,502,031 people globally and killed over 5,183,003 people as of November 27, 2021 [1]. This figure is continuously rising. The World Health Organization (WHO) has recognized coronavirus disease as primarily characterized by fever, dry cough, exhaustion, diarrhea, and loss of taste and smell [2]. Numerous prophylactic measures have been taken to counteract COVID-19. The most significant safeguards are washing hands frequently, keeping a safe distance, wearing a protective face mask, and actively avoiding touching the face, the last being the most straightforward. COVID-19 is a contagious illness that may be prevented by correctly using a face mask. COVID-19 may be warded off by
maintaining a tight social distance and wearing masks. However, many people are not adhering to the restrictions, which contributes to the infection's spread. Identifying individuals who are not adhering to the recommendations and notifying the relevant authorities may assist in halting the transmission of the coronavirus. According to the WHO, the correct technique for wearing a mask is to adjust it to cover the mouth, nose, and chin [3]. If masks are not worn properly, protection is severely decreased. Security officers are currently stationed in public areas to advise individuals to wear masks. However, this method exposes the guards to virus-infected air, causes overcrowding at the doors due to its inefficiency, and the guards themselves can transmit COVID-19 to healthy people, as it is highly contagious. As a result, a quick and effective solution is required. Face mask detection is used to decide whether someone is wearing a mask, analogous to detecting any object in an image. Deep learning algorithms are widely used in face mask detection and other image classification applications [4–7], and these algorithms can be applied to detect a mask on a person's face in real time. The main problems in face mask detection research are that most publicly available datasets were created artificially from existing face detection datasets, and the performance issues in real-time implementation. In this research, a face mask detection dataset of 300 images has been created from images of real people. Then, another publicly available dataset has also been used to analyze the performance of face mask detection (with face mask vs. without face mask) using the MobileNetV2 model. The trained model was deployed for real-time inference on the ASUS Tinker Single Board Computer (SBC). The paper is arranged as follows: the section "Literature Review" summarizes recent research on face mask detection, the section "Methodology" details the materials and methodology used to conduct this research, the "Result and Discussion" section outlines the study's findings, and finally, the "Conclusion" section summarizes the conclusion and recommendations.
2 Literature Review

Many researchers have proposed different methods to detect whether a person is wearing a mask, using machine learning, deep learning, and computer vision algorithms. In this section, previous research on face mask detection and the shortcomings of those works are discussed. Nagrath et al. [8] developed a face mask detection system called SSDMNV2. In this approach, a single-shot multibox detector was used to recognize the face, and MobileNetV2 was used to perform real-time face mask detection. Their work achieved 92.64% accuracy and an F1-score of 93%. The "Properly Wearing Masked Face Detection Dataset" (PWMFD) was introduced by Jiang et al. [9]; this dataset contains a total of 9205 images in three sections. They also proposed a mask detector model known as Squeeze and Excitation (SE)-YOLOv3 and achieved a higher detection speed. Using super-resolution and classification networks (SRCNet), Qin et al. [10] designed an automatic categorization method for facemask-wearing conditions using 3835 images of a public mask dataset. The dataset was divided into three categories: no mask-wearing (671 images), correct mask-wearing (3030 images), and incorrect mask-wearing (134 images). The proposed method achieved an accuracy of
98.70%. Li et al. [11] used the CelebA and WIDER FACE databases for training and the FDDB database for evaluation with the YOLOv3 algorithm based on the Darknet-19 architecture. This system achieved 93.9% accuracy. To automatically identify face masks, Chowdary et al. [12] proposed a transfer learning model built on InceptionV3. In that method, they used the "Simulated Masked Face Dataset" (SMFD) for training and testing, consisting of 785 unmasked face images and 785 masked face images, and the model achieved 99.9% accuracy. Sethi et al. [13] suggested a face mask detection model joining single-stage and double-stage detectors for detecting whether people are wearing a mask or not. They used a large dataset containing 45000 images and three popular deep learning models: AlexNet, MobileNet, and ResNet50. After the evaluation, the recommended model with ResNet50 obtained the highest accuracy of 98.2%. Kayali et al. [14] proposed a system for face mask detection using two popular deep learning based networks, ResNet50 and NASNetMobile. The "Labeled Faces in the Wild" (LFW) dataset, augmented by adding face masks, was used in their proposed system, and the ResNet50 model achieved 92% face mask detection accuracy.
3 Methodology

The significant steps in the implementation process have been classified as follows:

• ASUS Tinker SBC Preparation
• Dataset Creation and Preprocessing
• Image Augmentation
• Training of MobileNetV2 and Classification
3.1 ASUS Tinker SBC Preparation

The ASUS Tinker SBC is part of a new generation of more capable maker tools. As a single-board computer, it provides a builder with many alternatives: it can programmatically manipulate hardware and run customized operating systems for particular purposes. ASUS's Tinker Board is a significant player in the SBC market. It is built around the Rockchip RK3288 SoC with a Mali-T764 GPU and 2 GB of DDR3 memory, which makes processing noticeably smoother. It also offers a non-shared GBit LAN port for increased performance, upgradeable embedded shielded Wi-Fi for reliable IoT and network connectivity, and a highly functional PCB configuration with HD audio (192 kHz/24-bit) and accelerated HD and UHD (4K) video playback. Tinker Board Debian OS (V2.1.11) was used as the operating system in this research [15]. The necessary programming environment for the implementation, including TensorFlow, Keras, Imutils, OpenCV, Scikit-learn, Matplotlib, and Seaborn, was then installed on the system.

3.2 Dataset Creation and Preprocessing

Two different face mask detection datasets have been used in this research to analyze the ASUS Tinker SBC and the MobileNetV2 model. One of the datasets has been created
Fig. 1. Sample images of (a) people with face masks and (b) without face masks from the created dataset
by the authors of this paper. This dataset is available for download upon submitting a request on Zenodo [16]. The dataset encompasses 300 authentic images of two classes (150 images of people with face masks and 150 images of people without face masks). The other dataset contains 1376 images (690 images of people with face masks and 686 images of people without face masks), made available by Prajna Bhandary on GitHub [17]. She developed the dataset by collecting normal photos of faces and masking them using a programming script after detecting facial landmarks. Sample images from both datasets are illustrated in Fig. 1.
In the preprocessing step, the images were resized to 224 × 224. The images were then converted into 8-bit integer arrays. Normalization was performed on the images, scaling the pixel values between −1 and 1. One-hot encoding was performed on the image labels. The datasets were partitioned using the percentage split technique with a training-to-test ratio of 80:20; the validation set was constructed from the test set.

3.3 Image Augmentation

The term "image augmentation" denotes a variety of strategies for generating "new" training examples from prevailing ones by introducing random perturbations and distortions (without altering the images' class labels) [18]. Generally, a deep learning model performs well when fed a large quantity of data, so image augmentation is an effective strategy when a large volume of data is unavailable for developing a deep learning model. There are generally five different techniques for performing image augmentation. During training, on-the-fly image augmentation was applied to both datasets to increase generalization using Keras' "ImageDataGenerator". The parameters of the image augmentation are enumerated in Table 1, and a corresponding configuration sketch follows the table.
Table 1. Parameter details of image augmentation

Parameter Name      Selected Value
rotation_range      20
shear_range         0.15
zoom_range          0.15
height_shift_range  0.2
width_shift_range   0.2
horizontal_flip     True
fill_mode           "nearest"
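The configuration in Table 1 maps directly onto Keras' ImageDataGenerator as follows.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters taken verbatim from Table 1.
aug = ImageDataGenerator(
    rotation_range=20,
    shear_range=0.15,
    zoom_range=0.15,
    height_shift_range=0.2,
    width_shift_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest",
)
# Typically supplied to training via aug.flow(trainX, trainY, batch_size=32).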
3.4 Training of MobileNetV2 and Classification

The MobileNetV2 architecture is a variant of convolutional neural networks generally used on mobile devices [19]. The MobileNetV2 design is based on an inverted residual structure, with the residual blocks' input and output being thin bottleneck layers. Additionally, it employs lightweight depthwise convolutions to filter the features of the expansion layer. Finally, non-linearities are eliminated in the narrow layers. The architecture of MobileNetV2 is demonstrated in Fig. 2. In this research, a fine-tuned simple MobileNetV2 model has been used without retraining the whole model. Fine-tuning is accomplished by unfreezing some of the top layers of a fixed model base and simultaneously training the new auxiliary classifier layers and the base model's concluding layers. It permits
Fig. 2. Architecture of MobileNetV2 [19]
"fine-tuning" of the underlying model's higher-order extracted features to make them more appropriate for the specific task. The old head of MobileNetV2 was replaced by constructing a new fully-connected (FC) head and appending it to the base. Then, the base layers of MobileNetV2 were frozen; as a result, the weights of the base layers are not changed during backpropagation, while the weights of the head layers are adjusted. In this work, the MobileNetV2 architecture was modified by adding a pooling layer with a 7 × 7 kernel size using an "AveragePooling2D" layer, which was connected to a flatten layer. A dense layer with "relu" activation was appended next. A dropout of 0.5 was chosen to prevent the model from overfitting. A fully connected dense layer with "softmax" activation was used for classification. The MobileNetV2 model was instantiated pre-loaded with the weights trained on ImageNet. For training the head of MobileNetV2, the batch size was 32, the learning rate was 0.0183, and the number of epochs was 10; the Adam optimizer was chosen for optimization, and binary cross-entropy was used as the loss function.
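A sketch of the described head construction and training setup is shown below; the width of the intermediate dense layer (128 here) is an assumption, as the paper does not state it.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))
head = AveragePooling2D(pool_size=(7, 7))(base.output)
head = Flatten()(head)
head = Dense(128, activation="relu")(head)   # layer width assumed
head = Dropout(0.5)(head)
head = Dense(2, activation="softmax")(head)
model = Model(inputs=base.input, outputs=head)

for layer in base.layers:   # freeze the base; only the new head is trained
    layer.trainable = False

model.compile(optimizer=Adam(learning_rate=0.0183),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(aug.flow(trainX, trainY, batch_size=32), epochs=10, ...)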
4 Result and Discussion

The training, testing, and inference of this system have been accomplished on the ASUS Tinker SBC. The evaluation of the implemented system's performance was accomplished using distinct performance metrics: accuracy, precision, recall, and F1-score. The comprehensive classification report of the MobileNetV2 model for both datasets is presented in Table 2. The training time for Prajna Bhandary's dataset [17] was 358 s, and for the dataset collected by this paper's authors it was 301 s. The accuracies for the datasets presented in [17] and [16] were 100% and 99%, respectively.
Class Name
Training Time (sec.)
Accuracy (%)
Precision (%)
Recall (%)
F1-Score (%)
Dataset by Prajna Bhandary [17]
with_mask
358
100
100
100
100
100
100
100
Dataset by authors of this work [16]
with_mask
301
99
98
100
99
100
98
99
without_mask
without_mask
Fig. 3. Loss vs. accuracy graph of the training and validation sets for (a) Prajna Bhandary's dataset and (b) the authors' collected dataset
The training reports (loss vs. accuracy) are demonstrated in Fig. 3. The loss and accuracy for the training and validation sets were observed for ten epochs, by which point performance had peaked. The implementation of the system is illustrated in Fig. 4, using an A4Tech web camera and the ASUS Tinker SBC. The web camera was connected to the ASUS Tinker SBC through USB 2.0.
Fig. 4. Proposed system's experimental hardware setup (A4Tech USB web camera connected to the ASUS Tinker SBC)
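A hedged sketch of the real-time loop on the Tinker SBC is shown below: frames are read from the USB camera with OpenCV, faces are localized, and each face crop is classified by the trained model from the previous section. The Haar-cascade face detector, the preprocessing, and the class ordering are assumptions, as the paper does not detail them.

import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)   # the USB web camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
        face = (face.astype("float32") / 127.5) - 1.0   # scale to [-1, 1]
        probs = model.predict(face[np.newaxis])[0]      # trained model assumed
        label = "Mask" if probs[0] > probs[1] else "No Mask"  # class order assumed
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Face mask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()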
The average inference time for the video was ~1.6 s. A sample inference from the implemented system is illustrated in Fig. 5 for both conditions, with confidence scores. The performance of the MobileNetV2 model for both datasets is compared to the previous works in Table 3. It can be seen that the proposed system with the frozen MobileNetV2 model performed better than the previous works.
Fig. 5. Real-time inference on the ASUS Tinker SBC for (a) a person with a face mask and (b) a person without a face mask
Table 3. Comparison of the proposed system with previous works

Author              Method                  Dataset Size  Accuracy (%)
Nagrath et al. [8]  SSDMNV2 + MobileNetV2   5521          92.64
Qin et al. [10]     SRCNet                  3835          98.7
Li et al. [11]      YOLOv3 + DarkNet-19     5171          93.9
Sethi et al. [13]   ResNet50                45000         98.2
Kayali et al. [14]  ResNet50                13233         92
This Work           MobileNetV2             1376          100
                                            300           99
5 Conclusion

The pandemic due to COVID-19 has compelled most nations to mandate wearing face masks. Monitoring face masks manually in crowded areas is a critical duty, so developing a system for detecting whether a person is wearing a face mask would greatly assist the authorities. The suggested embedded vision-based system may be used in any setting, including public places, stations, corporate environments, streets, retail malls, and test centers, where precision and sensitivity are critical for the task at hand. This research eliminates the research gaps in the previous studies with superior performance (99% test accuracy, precision, recall, and F1-score for the authors' created dataset and 100% test accuracy, precision, recall, and F1-score for a publicly obtainable dataset) and with the creation of a real-world dataset, which can be helpful in face mask detection research. In the future, the dataset size can be increased by collecting natural images of people, and this system can be adapted for detecting people wearing masks incorrectly. The introduction of the Internet of Things (IoT) can benefit the authorities responsible for enforcing face mask mandates remotely.
References
1. COVID Live - Coronavirus Statistics - Worldometer. https://www.worldometers.info/coronavirus/
2. Ferdib-Al-Islam, Ghosh, M.: COV-doctor: a machine learning based scheme for early identification of COVID-19 in patients. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds.) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol. 95, pp. 39–50. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-6636-0_4
3. When and how to use masks. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public/when-and-how-to-use-masks
4. Saha, P., et al.: COV-VGX: an automated COVID-19 detection system using X-ray images and transfer learning. Inf. Med. Unlocked 26, 100741 (2021)
5. Islam, M., et al.: A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inf. Med. Unlocked 20, 100412 (2020)
6. Muhammad, L.J., Islam, M.M., Usman, S.S., Ayon, S.I.: Predictive data mining models for novel coronavirus (COVID-19) infected patients' recovery. SN Comput. Sci. 1(4), 1–7 (2020). https://doi.org/10.1007/s42979-020-00216-w
7. Liu, L., et al.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2019). https://doi.org/10.1007/s11263-019-01247-4
8. Nagrath, P., et al.: SSDMNV2: a real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 66, 102692 (2021)
9. Jiang, X., et al.: Real-time face mask detection method based on YOLOv3. Electronics 10(7), 837 (2021)
10. Qin, B., Li, D.: Identifying facemask-wearing condition using image super-resolution with classification network to prevent COVID-19. Sensors 20(18), 5236 (2020)
11. Li, C., Wang, R., Li, J., Fei, L.: Face detection based on YOLOv3. In: Jain, V., Patnaik, S., Popentiu-Vlădicescu, F., Sethi, I.K. (eds.) Recent Trends in Intelligent Computing, Communication and Devices. AISC, vol. 1006, pp. 277–284. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9406-5_34
12. Jignesh Chowdary, G., Punn, N.S., Sonbhadra, S.K., Agarwal, S.: Face mask detection using transfer learning of InceptionV3. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds.) BDA 2020. LNCS, vol. 12581, pp. 81–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66665-1_6
13. Sethi, S., et al.: Face mask detection using deep learning: an approach to reduce risk of coronavirus spread. J. Biomed. Inform. 120, 103848 (2021)
14. Kayali, D., et al.: Face mask detection and classification for COVID-19 using deep learning. In: 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1–6 (2021)
15. Tinker Board | Single Board Computer | ASUS. https://www.asus.com/bd/Motherboards-Components/Single-Board-Computer/All-series/Tinker-Board/
16. Ferdib-Al-Islam, et al.: Face Mask Detection Dataset (2021). https://doi.org/10.5281/zenodo.5305989
17. observations/experiements/data at master · prajnasb/observations. https://github.com/prajnasb/observations/tree/master/experiements/data
18. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019). https://doi.org/10.1186/s40537-019-0197-0
19. Sandler, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. IEEE, Salt Lake City (2018)
Fake Profile Detection Using Image Processing and Machine Learning

Shuva Sen, Mohammad Intisarul Islam, Samiha Sofrana Azim, and Muhammad Iqbal Hossain(B)

Department of Computer Science and Engineering, BRAC University, Mohakhali, Dhaka 1212, Bangladesh
{shuva.sen,mohammad.intisarul.islam,samiha.sofrana.azim}@g.bracu.ac.bd, [email protected]

Abstract. In today's technologically evolved society, almost everyone has a presence on social media, which makes creating phony accounts incredibly simple. A "fake profile" refers to an account through which one may pose as someone else. These accounts are mostly used to malign somebody by impersonating them; nevertheless, a phony profile can also be utilized for a number of other purposes, such as inciting regional tensions, propagating false information, and publishing provocative material involving current sensitive issues. Since fake profiles pose such a major hazard to everyone, a model is suggested that can potentially help reduce their number. It can reliably pinpoint users who can be suspected of being fraudulent, notably anyone without a personal image. In the suggested methodology, machine learning and image recognition were both employed to guarantee that each user has a distinct profile. The goal of this approach is to prevent users from opening accounts with someone else's image or personally identifiable information. To prevent someone from using another user's photograph, or an image of an arbitrary object, this model additionally uses an image processing service for face recognition. One-Time Password (OTP) technology was deployed to prevent fictitious individuals from opening an account using someone else's identity. It is vital to identify fake accounts using deep learning on a real dataset of respondents' responses. The k-means clustering algorithm was applied to the dataset to pick out the fraudulent records, and an accuracy of 75.30% was obtained.
Keywords: Fake profile · photo identification · cyber crime · Social media

1 Introduction
In the era of the internet, social networks such as Facebook, Instagram, Twitter, and LinkedIn have become a constant necessity for our generation, allowing us to connect with people all over the world quickly and easily. Everyone, from
millennials to the elderly, is using online social networking sites to improve their lives. They use these social networking platforms to discover and make friends by creating and posting personal memories, photos, videos, and chats. Students, for example, use social networking features to acquire more expertise and equip themselves with a variety of specialties. Tutors use online social networks to connect with their students and help them in their learning process. Many businesses have used a variety of websites to market and sell their products and services online. Public agencies use social media to effectively provide government programs and keep citizens updated on different circumstances. Every kind of person relies on social networking platforms for numerous reasons. Hence, to get connected with every other person, they need to share their personal information to create a profile, which is available on the social network. Facebook is acclaimed as the most prominent and most commonly utilized social network in the world, with more than 2.4 billion active members per month. The exchange of messages, photographs, and comments enables connections between people all over the world, and individuals use Facebook to communicate with others for their own purposes. Any good practice, though, comes with its own set of issues. Unfortunately, people often use Facebook unacceptably: they create accounts using other people's information with the intent of harassment, spreading fake news, and creating panic among people. They even aim to embarrass celebrities by making profiles with their names and personal details. According to Facebook's own reporting, the social media platform deleted 5.4 billion false profiles in 2019, and about 5% of monthly active users were fraudulent. According to the study, one out of every ten likes on a Facebook post might be a response from a false account. In this work, a deep learning algorithm was used to identify current fake accounts, along with image processing and a one-time password (OTP) to prevent fake users from creating new ones. The k-means algorithm was applied to distinguish fake data from a real dataset by calculating Euclidean distances and modifying the centroid points.
1.1 Motivation
The various social platforms have become an integral aspect of people's everyday lives. In the modern era, much of everyone's public interaction has moved to online social networks, and these platforms have brought about a significant change in the way we conduct our civic activity. It has become easier to add new friends and keep in contact with them and their posts. Almost everyone has an account on various social networks, and sometimes more than one account, or accounts on different platforms. In the recent era of technology, as these applications and their utilization increase in our daily lives, we continuously post unwanted content, often unawares, and create a mess on the social platforms. As a result, creating a fictitious account is quite easy. There are over 200 million fake Facebook profiles in the name of women. Controlling these accounts is therefore difficult, and money laundering, smuggling, and other anti-social activities result from these false accounts. Furthermore, these fictitious identities are being used to
spread rumors and hate speech, or even to post pictures or thoughts of other people without their permission. In this work, we consider false accounts and individuals who use such sites to launder money, and aim to put a stop to such a serious situation. Nobody has yet come up with a feasible solution to problems like fake profiles and online impersonation. We hope to include a mechanism in this project that allows for the automated identification of false accounts, ensuring that people's social lives are protected. We also hope that, by using this automatic detection strategy, we can make it easy for sites to handle the large number of profiles that cannot be handled manually. We hope to reduce fake activity so that no fake news or information can be spread from any fake account; this will also help reduce identity theft.
Problem Statement
Fake profiles are being used to spread rumors and hate speech and to post pictures and thoughts of other people without their permission. This type of action entails the mass production of false identities to launch online assaults on social media. For comparison, we also create multiple accounts in our application. A database is introduced in which all the information about current users is stored. Identification is based on the users' Facebook behavior, their interactions with other users, and their user-feed information. We also use image recognition to determine whether different accounts use the same images. By comparing account features against the dataset features in the database using image processing and the algorithms described below, we identify which features are missing or different in fake accounts.

1.3 Objective and Contributions
The preliminary plan of this research is to detect fake accounts on various social media platforms, starting with Facebook. The steps are as follows:

i) Construct a database with fundamental information about users to distinguish the characteristics of real accounts.
ii) Use image processing to classify the images being used to open a new account.
iii) Use the face detection and object detection features of OpenCV to do so.
iv) Use the k-means algorithm to divide the database into clusters.
v) Lastly, add a one-time password (OTP) system to provide additional security for the accounts.

In today's world, the threat of cybercrime is growing along with the number of users on social media platforms. One of the simplest ways to commit such crimes is to share erroneous data or to construct a profile of a random individual and disparage them. Our contributions are listed below:
i) Working to address potential threats and vulnerabilities brought on by duplicate and incorrectly categorized accounts.
ii) Detecting bogus accounts in addition to preventing individuals from creating identical accounts.
2 Background

2.1 Literature Review
Previously proposed models use algorithms including supervised ML, map reduction, a pattern recognition approach, and an unsupervised two-layer meta-classifier method, along with PCA, SMOTE, medium Gaussian SVM, logistic regression, and various other classifiers. In one such model, linear SVM gives 95.8% accuracy, medium Gaussian SVM provides 97.6%, and logistic regression gives 96.6% [5]. Random forest, along with C4.5 and adaptive boosting with a decision stump, is used as a second classifier in case the accuracy of the first classifier is less effective [2]. An ROC curve (Receiver Operating Characteristic curve) has been generated to measure the classifiers' performance, along with other metrics such as precision, recall, and F1-score. Supervised machine learning algorithms are used to dig out fake profiles. A skin detection algorithm has also been applied to find decent pictures from account holders [3]: if a portrayal contains a human face, it goes through skin detection, where the percentage of skin present in the image is computed. Using all these algorithms, 80% accuracy was obtained from ML, while the remaining classifiers have 60–80% accuracy with an error rate of 20% [6]. Another paper proposes SVM, a neural network, SMOTE-NC, and Naive Bayes with Gaussian distribution to detect robotic accounts [1]; precision, recall, and F1-score are used as evaluation metrics to differentiate and check the effectiveness of the executed techniques. In another proposed model, fake accounts are divided into two categories: user-misclassified accounts and undesirable accounts [13]. User-misclassified accounts are personal profiles created by users for a company, organization, or non-human entity such as a pet [14]. Undesirable accounts are user profiles that intentionally break Facebook's terms of service for specific purposes, including spamming [15]. A set of 17 attributes is named and measured to capture the actions and behavior of Facebook users; these attributes are then fed as input when setting up learning models. Machine learning algorithms are divided into two major groups: 1) supervised and 2) unsupervised [8]. K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms: the input data have no labeled response, and the algorithm makes presumptions from the dataset using only the input vectors. It iteratively partitions the dataset into K predetermined, independent, well-separated clusters, where each data point is a member of only one group [9]. Through a deterministic global search process that includes N executions of the k-means algorithm from suitable initial positions (where N is the size of the dataset), it dynamically adds one cluster center at a
time. It assembles similarities to form intra-cluster data points and differentiates clusters on the basis of dissimilarities. If K is small, it produces tighter clusters than hierarchical clustering. Its output is strongly influenced by the initial inputs: the number of clusters and the order of the data have a strong impact on the final result [10]. Therefore, the arbitrary selection of the initial centroids has a significant impact on the quality of the k-means algorithm's ultimate clustering outcome. The algorithm requires the number of cluster centers to be specified in advance, is very sensitive to re-scaling, and cannot handle non-linear datasets, noisy data, or outliers. The Best First Search algorithm is an informed search and traversal technique that uses both a priority queue and a heuristic function to find the most promising node [11]. Best-first searches require a great deal of bookkeeping to keep track of all compelling nodes. The algorithm uses two lists for tracking the traversal and searching the graph space: 1) OPEN and 2) CLOSED. The nodes currently open for traversal are listed in OPEN, whereas the nodes that have already been traversed are listed in CLOSED. The algorithm traverses the most promising path first in the queue, and its time complexity is O(n log n). First, two empty lists, OPEN and CLOSED, are created, and the initial node is placed in the ordered OPEN list [12]. If the OPEN list is empty, the loop exits and returns "False". Otherwise, the first node in the OPEN list is chosen and moved to the CLOSED list. If this node N is a goal node, it is added to the CLOSED list and the loop ends by returning "True"; if N is not a goal node, N is expanded to create its successor nodes, which are all added to the OPEN list. Finally, the nodes in the OPEN list are rearranged in ascending order using an evaluation function f(n).
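This OPEN/CLOSED procedure can be sketched in a few lines of Python; the graph and heuristic values below are illustrative placeholders rather than anything from the paper, and a priority queue (heapq) stands in for the explicit re-sorting of the OPEN list by f(n):

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best-first search following the OPEN/CLOSED scheme above.
    `graph` maps a node to its successors; `h` maps a node to f(n)."""
    open_list = [(h[start], start)]       # OPEN: priority queue ordered by f(n)
    closed = set()                        # CLOSED: already-traversed nodes
    while open_list:                      # empty OPEN list => return False
        _, node = heapq.heappop(open_list)
        if node == goal:
            return True
        closed.add(node)
        for succ in graph.get(node, []):  # expand N, push successors onto OPEN
            if succ not in closed:
                heapq.heappush(open_list, (h[succ], succ))
    return False
```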
3 Dataset Description
To prevent identity theft, information about Facebook profiles, features, and privacy practices had to be gathered. A survey form consisting of 25 questions helped us collect these data. It was posted across social media and Facebook groups for research purposes, and after 3–4 weeks a total of 505 responses had been gathered. The details people provided in the survey form were fully protected to preserve their privacy: after data processing, all identifying information was encoded immediately. Names, identification numbers, and email IDs were maintained on a protected local server available only to fellow team members. After the last influx of data is processed, the collected information will be destroyed. To access any part of the dataset, users must obtain approval from the team. The questions are shown in Table 1. As mentioned, the responses were collected over a month. During this time, we developed ideas for two clustering algorithms and for how to preprocess the dataset to implement them. The initial idea was to assign a numerical value to every possible answer to all 25 questions before running the dataset.
Table 1. List of questions

1. How frequently do you change your profile picture?
2. How much time do you spend using social media every day?
3. How many FRIENDS do you have?
4. How often do you comment on others' activities?
5. How many likes do you get on your posts (on average)?
6. How many comments do you get on your posts each week (on average)?
7. How many picture albums do you have?
8. How many videos do you have in your account?
9. How many artists do you follow on Facebook?
10. How many Facebook groups are you a member of?
11. How often do you post status updates on Facebook?
12. Are your Facebook posts public or private?
13. Do you use the video chat option for Facebook messaging?
14. What did you use to create your account?
15. How often do you visit the links you see on Facebook?
16. Do you use the Facebook app to use your account?
17. What do you use your Facebook account for?
18. How many other apps/sites are connected with your Facebook account?
19. How often do you share posts of others?
20. How many friend requests do you send per week?
21. How many unknown message requests do you get from others each week?
22. Do you use your real name/picture on Facebook?
23. Which of the following apps/games have you played?
24. How often do you watch live streaming on Facebook?
25. Do you keep your Facebook account locked for unknown persons?
All the possible answers from the form were then saved in a sheet so that values could be assigned to the responses. While checking the responses, we found many in which people answered the questions in their own words. Along with the provided options, these individual responses were also taken and assigned numerical values; where views on a particular question were similar, the responses were assigned the same unique numerical value. This is how the data was preprocessed for implementation.
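A minimal sketch of this answer-to-number encoding is shown below; the question text, option strings, and codes are hypothetical examples, not the actual coding scheme used for the survey:

```python
# Hypothetical option-to-code tables; free-text answers get a fresh code.
answer_codes = {
    "How frequently do you change your profile picture?": {
        "Never": 0, "Monthly": 1, "Weekly": 2, "Daily": 3,
    },
}

def encode_response(question, answer, codes=answer_codes):
    mapping = codes[question]
    if answer not in mapping:                        # free-text answer
        mapping[answer] = max(mapping.values()) + 1  # assign a new unique code
    return mapping[answer]
```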
4 Proposed Model
Following the preprocessing of the data, the next task is to run the dataset through a suitable algorithm to determine the ratio of false to true data. Since the purpose is to cluster the dataset into one group, gathering all the data in one region so that additional test data can be compared against it, it is necessary to find out whether the system can distinguish fake from real under the aegis of machine learning. Looking at the available clustering algorithms, the k-means algorithm was found suitable for this research. The value of K usually determines the number of clusters; however, only one cluster is needed here, storing the data in one category for detecting fake accounts. The algorithm maintains a centroid point, and dynamically adds one cluster center at a time using a deterministic global search method made up of N k-means executions starting from appropriate initial positions [9]. The calculation is based on the Euclidean distance between two points, computed and stored for every data point. The arbitrary selection of the starting centroid has a massive impact on the quality of the k-means algorithm's ultimate clustering results [10]. Our dataset contains 25 questions, to which 505 responses were given. The 505 responses were used as training data, with the first data point serving as the initial centroid. The Euclidean distance was then measured for each piece of data and saved in an array, while the centroid point was updated for each data point added to the cluster. The aim of storing the distance of each piece of data in an array is to determine the maximum distance, which is used as a threshold; the most recent centroid is also kept updated for each data point. After obtaining the final centroid and maximal distance for the training dataset, the test data was checked to see whether each record was true or false: each test data point was measured against the most recent centroid. If the distance is greater than the threshold, the data is fake; otherwise, the data is genuine and is stored in the cluster (Fig. 1). Since 25 attributes were used, both the sophistication of the algorithm and the program's runtime are relatively high, which also affected the quality of the results. Weka was therefore used to reduce and search the attributes. Among the simple attribute-selection algorithms in Weka, two are suitable: the best-first algorithm and the greedy stepwise algorithm. The program was initially tested with both. Upon running, the best-first algorithm selected nine attributes, and the greedy stepwise algorithm selected six. The best-first search algorithm was chosen for the dataset for greater efficiency. The nine selected attributes were then used to create a new dataset to run through the algorithm. A start index and an end index were also added to the program so that any specific range of attributes can be checked against the algorithm to compute the centroid and Euclidean distances respectively.
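The following sketch shows one plausible reading of this single-cluster procedure: the first response seeds the centroid, the centroid is updated as the data arrive, the maximum Euclidean distance over the training data becomes the threshold, and test points beyond it are flagged as fake. The running-mean update rule is our assumption, since the paper does not spell out the exact centroid-update formula:

```python
import numpy as np

def fit_single_cluster(train):
    """train: (n_samples, n_features) array of encoded survey responses."""
    centroid = train[0].astype(float)     # first data point as initial centroid
    distances = []
    for i, x in enumerate(train, start=1):
        distances.append(np.linalg.norm(x - centroid))  # Euclidean distance
        centroid += (x - centroid) / i                   # running-mean update
    return centroid, max(distances)                      # threshold = max dist

def is_fake(sample, centroid, threshold):
    # A test point farther from the centroid than any training point => fake.
    return np.linalg.norm(sample - centroid) > threshold
```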
Fig. 1. Steps of fake account detection system
4.1 Image Processing
Image processing is currently one of the most active topics in the tech industry. It has many branches, and face recognition, the main focus of this project, is one of them. The steps of face recognition and the face recognition processing flow are shown in Fig. 2 and Fig. 3 respectively.
Fig. 2. Steps of face recognition system applications
Fig. 3. Face Recognition processing flow
First, the user inputs an image. The system then checks whether the image contains an object (such as a bird, airplane, table, horse, chair, cow, dining table, bus, motorbike, dog, sheep, or sofa) or a human face; this detection is done by a face detection model. Figure 4 shows the system detecting the picture as a car, in which case the user is denied further access, because the user must provide an image of a human face. As shown in Fig. 5, if the image is of a human face, the server then tries to find a match in the dataset of images (Fig. 6).
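A hedged sketch of this screening step using OpenCV is given below. The paper does not name the specific detector, so the standard Haar cascade that ships with OpenCV is assumed here:

```python
import cv2

# Standard frontal-face Haar cascade bundled with opencv-python (an assumption;
# the paper only states that OpenCV face detection is used).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(image_path):
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0   # no detectable human face => deny further access
```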
Fig. 4. Object detection
Fig. 5. Recognizing face using OpenCV
Fig. 6. Workflow of image processing
4.2 One-Time Password (OTP)
The user is granted access depending on whether the image matches any other image in the database. If it matches, an OTP (one-time password) is sent to the email address of the person whose image it matched, to authenticate the user, as shown in Fig. 7. For instance, suppose the image matches the image of person X: an OTP is sent to that person's registered email address. There can be a scenario where the same person genuinely wants a new account; in that case, it is entirely up to them, and they can continue opening the account.
The user retrieves the OTP and enters it into the prompt to authenticate their identity and obtain access. But if a trespasser tries to open an account with someone else's image that is already in the database, they will be stopped in their tracks: without the OTP, which is available only to the actual owner of the account, they cannot proceed.
Fig. 7. Input OTP from email
On the other hand, if the image does not match any image in the dataset, access is approved without further complication. This defense system meets a pressing demand for personal security: an adversary trying to open an account using someone else's identity will no longer succeed, while somebody who legitimately wants multiple accounts can still create them with their privacy maintained. To summarize, the user's image is matched against the images already in the database, and an OTP is sent to the email address of the person whose image matched, to authenticate the user. If it is the same user, he or she can enter the OTP and continue opening the account; without the OTP, which is available only to the actual owner of the account, it is not possible to open the account (Fig. 8).
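The OTP step could be sketched as follows; the six-digit format and the send_email helper are assumptions for illustration, as the paper does not specify the OTP length or delivery code:

```python
import secrets

def issue_otp(matched_email, send_email):
    """Generate and deliver a one-time password to the matched account's
    registered e-mail. `send_email` is a hypothetical delivery helper."""
    otp = f"{secrets.randbelow(10**6):06d}"   # cryptographically random 6 digits
    send_email(matched_email, f"Your account-creation OTP is {otp}")
    return otp

def verify_otp(issued, entered):
    return secrets.compare_digest(issued, entered)  # constant-time comparison
```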
Fig. 8. Workflow of OTP
5 Implementation and Result Analysis
Once the whole process has been completed, the next step is to determine how accurately the algorithm works. Since no suitable dataset was available, a dataset was created for this research; a comparative analysis is therefore not possible, as there is no prior work on this dataset. To measure the accuracy of the research, some fabricated data was added to the dataset as a check on the algorithm. For binary classification, accuracy can be calculated in terms of positive and negative data; we used the standard performance evaluation equation to measure the accuracy of our work [16].
$$\mathrm{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \qquad (1)$$
To compute the accuracy, additional records were added as test data so that Eq. 1 could be evaluated. 480 records were assigned as the training dataset. From the test data, we fabricated 25 records as true positives and 36 as true negatives, and then added 10 false positives and 10 false negatives. Substituting these values into Eq. 1 gives an accuracy of (25 + 36)/(25 + 36 + 10 + 10) = 61/81 ≈ 75.30%. With a larger dataset, the accuracy is expected to rise above the current figure. The model will be re-evaluated from time to time by collecting the data of existing accounts.
6 Analysis of Image Processing
Moreover, to upgrade the performance of human face detection, we plan to improve several aspects, such as color processing and edge detection. The human face detection rate is reported in Table 2.

Table 2. Human Face detection rate
Human face detection is based on five cases, and the average time of this detection is 2–4 s. The procedure was executed on an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz (2712 MHz, 2 cores, 4 logical processors). In the future, our goal is to obtain more accurate and precise results by using more advanced methodologies and libraries for face detection.
7 Conclusion
To detect fake accounts, a work plan has been developed for the proposed solution. Data was gathered and preprocessed into a dataset, and the most suitable algorithm was then chosen to distinguish between true and false accounts. To identify fake accounts, the relevant data must be clustered; k-means clustering was chosen for implementation because it is more accurate than the other algorithms for our proposed solution. The algorithm then evaluates the results
and accuracy rate. Upon running this algorithm on our dataset, the accuracy is 75.30%; with a larger dataset, the accuracy should increase. Image processing then contributes by gathering images and distinguishing between true and false profiles. Finally, after matching the user's data with the database, a one-time password is submitted for authentication. In this way, false accounts are identified and identity fraud can be avoided. The model will help reduce the number of fake accounts and the vast amount of trouble they can cause. Image processing and OTP are planned to run together on a website; the motive is to stop users from creating fake accounts while ensuring that no genuine user is affected. A genuine user might want to create more than one account: in that case, the user's image is checked first, and an OTP is then sent to the email address of the user's first account for verification. By doing this, no fake accounts can be created. In the future, the goal is to add more features to help detect images; in addition, the OTP will be sent to the user's phone number so that the password can be obtained more easily. Finally, our model will gradually be applied to other social media platforms such as LinkedIn, Instagram, and Twitter.
References

1. Akyon, F.C., Kalfaoglu, M.E.: Instagram fake and automated account detection. In: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1–7. IEEE (2019)
2. Chen, Y.-C., Wu, S.F.: FakeBuster: a robust fake account detection by activity analysis. In: 2018 9th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 108–110. IEEE (2018)
3. Cresci, S., et al.: Fame for sale: efficient detection of fake Twitter followers. Decis. Support Syst. 80, 56–71 (2015)
4. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recognit. 36(2), 451–461 (2003)
5. Mohammadrezaei, M., Shiri, M.E., Rahmani, A.M.: Identifying fake accounts on social networks based on graph analysis and classification algorithms. Secur. Commun. Netw. 2018 (2018)
6. Smruthi, M., Harini, N.: A hybrid scheme for detecting fake accounts in Facebook. Int. J. Recent Technol. Eng. (IJRTE) 7(5S3) (2019)
7. Yedla, M., Pathakota, S.R., Srinivasa, T.M.: Enhancing K-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 1(2), 121–125 (2010)
8. Garbade, M.J.: Understanding K-means in machine learning. Towards Data Science, 18 September 2018
9. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recognit. 36(2), 451–461 (2003). https://doi.org/10.1016/s0031-3203(02)00060-2
10. Yedla, M., Pathakota, S.R., Srinivasa, T.M.: Enhanced K-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 1(2), 121–125 (2010)
11. Berliner, H.: The B* tree search algorithm: a best-first proof procedure. Artif. Intell. 12(1), 23–40 (1979). https://doi.org/10.1016/0004-3702(79)90003-1
12. Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of A*. J. ACM 32(3), 505–536 (1985). https://doi.org/10.1145/3828.3830
13. Gupta, A., Kaushal, R.: Towards detecting fake user accounts in Facebook. In: 2017 ISEA Asia Security and Privacy (ISEASP), pp. 1–6. IEEE (2017)
14. Llorens, F., Mora, F.J., Pujol, M., Rizo, R., Villagra, C.: Working with OpenCV and Intel image processing libraries. Processing image data tools
15. El Azab, A., Idrees, A.M., Mahmoud, M.A., Hefny, V.H.: Fake account detection in Twitter based on minimum weighted feature set. Int. Sch
16. Shajihan, N.: Classification of stages of Diabetic Retinopathy using Deep Learning
A Novel Texture Descriptor Evaluation Window Based Adjacent Distance Local Binary Pattern (EADLBP) for Image Classification

Most. Maria Akter Misti¹, Sajal Mondal¹, Md Anwarul Islam Abir¹(B), and Md Zahidul Islam²

¹ Department of CSE, Green University of Bangladesh, Dhaka, Bangladesh
[email protected]
² Department of Information and Communication Technology, Islamic University, Kushtia, Bangladesh

Abstract. In this research, we propose a novel distance-based texture descriptor, the Adjacent Local Binary Pattern (AdLBP), based on the adjacent-neighbor window and the relationships among sequential neighbor pixel values at a given distance parameter. The suggested technique computes each neighbor and extracts the binary code from both the adjacent neighborhood window and the surrounding sub-image window, in order to enrich the adjacent-neighbor information and modify the conventional LBP thresholding scheme. Additionally, we extend this adjacent distance-based local binary pattern (AdLBP) and combine it with the evaluation window based local binary pattern (EwLBP) to create a texture descriptor for texture classification that is more robust against noise. Finally, AdLBP and EwLBP are combined using an encoding strategy to propose the Evaluation window based Adjacent Distance Local Binary Pattern (EADLBP) descriptor for image classification. These descriptors are tested on the KTH-TIPS and KTH-TIPS2-b datasets to demonstrate the applicability of the proposed method. In comparison, the proposed EADLBP approach is more robust against noise and consistently outperforms all of the baseline methods.

Keywords: Local binary patterns by neighborhoods nLBPd · Adjacent Distance Local Binary Pattern EADLBP · Evaluation Window based Local Binary Pattern EwLBP
1 Introduction
Applications like face identification, remote sensing, document analysis, medical image analysis, fingerprint identification, and classification of real outdoor images have all made extensive use of texture analysis, which is crucial to image processing and computer vision frameworks. The most widely used technique in practice is the Local Binary Pattern (LBP), even though many different strategies for extracting texture features have been developed over the past few decades. Among the simplest methods is LBP, proposed by Ojala et al. [1], which is used to describe texture. It makes use of local structure, or an image's statistical intensity: each pixel is compared to its neighbors, which are placed on a circle surrounding the pixel, and the resulting binary patterns are accumulated into a histogram [2,3]. However, there are numerous problems, such as noise sensitivity and lighting variation. Several extensions, including the Completed Robust Local Binary Pattern (CRLBP) [4], have been suggested to enhance LBP performance. Because CRLBP is not rotation invariant, the Improved Completed Robust Local Binary Pattern (ICRLBP) [5] was suggested as a new approach. Another study describes a novel technique, the enhanced micro-structure descriptor (EMSD), for characterizing batik images; EMSD is an enhancement of the micro-structure descriptor (MSD) proposed by Guang-Hai Liu [6]. The M2ECSLBP algorithm also has the benefit of computational times that are faster and more efficient than LBP and other approaches. However, the conventional LBP encoding strategy is particularly affected by noise: random noise can quickly change the values of the neighbors, making local binary patterns unstable. Circular searches might also miss texture because micro-patterns can occasionally be oriented through a pixel. The "micro-structure" information of LBP and of local binary patterns by neighborhoods (nLBPd) [7] has been investigated; however, the smaller the amount of "micro-structure" information, the more significant the impact of noise interference. This may be the cause of the noise sensitivity of LBP and its variants. In addition, nLBPd [7] does not capture immediate adjacent-neighbor information or color information. LBP, nLBPd, and most variant descriptors are vulnerable to noise, since random noise can easily change the values of the neighbors. Taking this into account, we construct a distance-based Adjacent Local Binary Pattern based on the window of the adjacent neighbor, which computes the pixel values near the adjacent neighbor. The rest of this article is structured as follows: the background study is presented in Sect. 2; the proposed AdLBP, EwLBP, and EADLBP are presented in detail in Sect. 3; the experiments and findings are described in Sect. 4; and Sect. 5 concludes the paper.
2 Background Study
There are numerous texture feature extraction methodologies in the literature [8–11]. These techniques are typically broken down into four categories:

– Statistical strategies
– Structural strategies
– Model-based strategies
– Filter-based strategies
Statistical and model-based strategies frequently explore the spatial relations of pixels based on small pixel neighborhoods. Markov random field models (MRF), local binary patterns (LBP), and gray-level co-occurrence matrices (GLCM) [2] are the most widely used of these techniques.

2.1 Local Binary Pattern (LBP)
The local binary pattern (LBP) [1], proposed in texture analysis to evaluate local contrast, searches for micro-textons in a very local region. As shown in Fig. 1, a binary pattern is produced by thresholding each neighboring pixel against the value of the center pixel.
Fig. 1. Basic encoding process of the Local Binary Pattern (LBP).
Using Eq. 1, texture in a local neighborhood is defined as a gray-scale invariant measure derived from a basic definition of texture. The original LBP took into account only a pixel's eight neighbors, but it has since been extended to P circular neighbors at radius R:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p \qquad (1)$$

$$s(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \qquad (2)$$
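For reference, a minimal sketch of Eq. 1 for a single 3×3 window (P = 8, R = 1) is given below; the clockwise neighbor ordering is a convention chosen here, not fixed by the paper:

```python
import numpy as np

def lbp_3x3(window):
    """LBP code of the center pixel of a 3x3 window (Eq. 1 with P=8, R=1)."""
    gc = window[1, 1]
    # eight neighbors, read clockwise starting from the top-left corner
    gp = window[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    return sum(int(g >= gc) << p for p, g in enumerate(gp))  # s(gp - gc) * 2^p

window = np.array([[6, 5, 2],
                   [7, 6, 1],
                   [9, 8, 7]])
print(lbp_3x3(window))  # LBP code of the center pixel
```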
2.2 Local Binary Patterns by Neighborhoods (nLBPd)
Y. Kaya [7] proposed two new local binary pattern descriptors for texture analysis to detect unique patterns in images. The first, nLBPd, depends on the relationships among the sequential neighbors of a center pixel at a given distance, while the second, dLBPα, focuses on the neighbors lying along the same orientation through the central pixel. The descriptor nLBPd is based on the relationships of the eight neighbors of a pixel, P = {P0, P1, P2, P3, P4, P5, P6, P7}, with one another. With the
distance parameter d, a given neighbor pixel value is compared sequentially with the immediately following neighbor pixel value. For dLBPα, the comparison is made among pixel values along the same orientation at an angle α, which may be 0°, 45°, 90°, or 135°.
3 Proposed Texture Descriptor

3.1 Proposed Adjacent Distance Based Local Binary Pattern (AdLBP)
A new feature extraction technique, adjacent local binary patterns by neighborhoods based on distance (AdLBP), is proposed to improve the performance of feature extraction. AdLBP builds on the local binary patterns by neighborhoods (nLBPd) proposed by Kaya [7]. In AdLBP, the comparison is done by pixels in the adjacent orientation, clockwise, based on a distance parameter. We consider two 3×3 windows: a sub-image window (AdLBP_current,d=1) and an adjacent neighborhood window (AdLBP_next,d=1), the latter used for collecting the adjacent-neighbor information. The descriptor focuses on the relationships among the 8 neighbors of the 3×3 sub-image window and the 8 neighbors of the adjacent neighborhood window, p = {p0, p1, p2, p3, p4, p5, p6, p7}, with each other around a pixel. Each neighboring pixel's value is compared to the pixel next to it, and the comparison yields only 1 or 0. This procedure produces a binary number pattern for each center pixel in a 3×3 window. The binary number is then multiplied by predetermined weights, and the resulting values are summed to obtain the AdLBP pattern value of the center pixel. With this method, once the pattern of every pixel has been extracted, the feature map of an image can be created; the final feature vector is the histogram of this feature map. AdLBP is calculated by Eq. 7, while the current window and the next (adjacent) window are calculated by Eq. 5. Figure 2 depicts the AdLBP_{d=1} calculation procedure (see also Fig. 3).
Fig. 2. The Encoding process of AdLBP
Fig. 3. Overview of relationships between neighbors within AdLBP
For distance d = 1, the worked example of Fig. 2 gives:

AdLBP_(current,d=1) ⇒ P_c = {s(129>158), s(158>150), s(150>164), s(164>155), s(155>141), s(141>108), s(108>103), s(103>129)} = {0, 1, 0, 1, 1, 1, 1, 0}, so P_c takes the value 94.

AdLBP_(next,d=1) ⇒ P_c = {s(158>150), s(150>134), s(134>136), s(136>155), s(155>155), s(155>141), s(141>150), s(150>158)} = {1, 1, 0, 0, 0, 1, 0, 0}, so P_c takes the value 196.

$$\mathrm{AdLBP}_{(current,d=1)} = \sum_{i=0}^{8} (P_i > P_j) \qquad (3)$$

$$\mathrm{AdLBP}_{(next,d=1)} = \sum_{j=0}^{8} (P_i > P_j) \qquad (4)$$

where P_c is the center gray-intensity value, P_i and P_j are gray-intensity values of neighboring pixels of the current window (AdLBP_current,d=1) and the next window (AdLBP_next,d=1), and s(·) is defined in Eq. 6. The final feature vector is obtained by Eq. 7.

$$\mathrm{AdLBP}_{(current,d=1)} \Rightarrow \left[ \mathrm{AdLBP}_{(next,d=1)} \left\{ \sum_{i=0,\,j=0}^{i=8,\,j=8} s(P_i > P_j)\, 2^p \right\} \right] \qquad (5)$$

$$s(P_i > P_j) = \begin{cases} 1, & \text{if } P_i > P_j \\ 0, & \text{if } P_i \le P_j \end{cases} \qquad (6)$$

$$s(P) = \begin{cases} 1, & \text{if } X_{(\mathrm{AdLBP}_{current,d=1})},\ Y_{(\mathrm{AdLBP}_{next,d=1})} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

where X_(AdLBP_current,d=1) is the value of the current window and Y_(AdLBP_next,d=1) denotes the value of the next (adjacent) window of AdLBP.
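A short sketch of our reading of this encoding follows; it reproduces the two pattern values of the worked example (94 and 196) using MSB-first bit weights, which is the weighting implied by those values:

```python
def adlbp_code(neighbors, d=1):
    """Compare each of the 8 neighbors (clockwise) with the neighbor d
    positions ahead (with wraparound) and pack the bits MSB-first.
    A sketch of our interpretation, not the authors' reference code."""
    bits = [int(neighbors[i] > neighbors[(i + d) % 8]) for i in range(8)]
    return sum(b << (7 - k) for k, b in enumerate(bits))  # MSB-first weights

# Neighbor values from the worked example's two windows (clockwise order)
current = [129, 158, 150, 164, 155, 141, 108, 103]
adjacent = [158, 150, 134, 136, 155, 155, 141, 150]
print(adlbp_code(current))   # -> 94
print(adlbp_code(adjacent))  # -> 196
```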
3.2 Proposed Evaluation Window Based Local Binary Pattern (EwLBP)
Next, we attach an evaluation window centered on each neighbor to reduce noise interference. The Evaluation Window based Local Binary Pattern (EwLBP) reduces the noise of each neighbor pixel, whereas the local binary pattern and its variant descriptors focus only on the center; the suggested technique is thus more resistant to Gaussian noise. The proposed method creates an evaluation window to improve the traditional thresholding scheme of LBP and its variants, and we find EwLBP to be more robust than LBP against noise interference. It is commonly assumed that LBP can effectively describe local texture by detecting "micro-structure" [1]; the smaller the "micro-structure" information, however, the greater the impact of noise interference, which may be the cause of LBP's sensitivity to noise. Since our proposed AdLBP is a variant of LBP, we adopt the Evaluation Window based Local Binary Pattern (EwLBP) to improve the robustness of the feature. Instead of using the raw values of the neighbors, EwLBP creates an evaluation window at each of the P neighbor positions and uses the average of the pixel values in that window, which more effectively reduces the noise in each neighbor's value. Additionally, the evaluation window expands the scale of the "micro-structure" information, so it can be extracted effectively, allowing EwLBP to achieve higher classification accuracy despite noise interference. EwLBP is calculated by Eq. 8.
P −1
S (xp− gc ) 2p
P =0
s (x) ←→
1 if x≥0, 0 if x