Lecture Notes in Networks and Systems Volume 618
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
M. Shamim Kaiser · Sajjad Waheed · Anirban Bandyopadhyay · Mufti Mahmud · Kanad Ray Editors
Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering TCCE 2022
Editors M. Shamim Kaiser Jahangirnagar University Dhaka, Bangladesh
Sajjad Waheed Mawlana Bhashani Science and Technology University Tangail, Bangladesh
Anirban Bandyopadhyay National Institute for Materials Science Tsukuba, Japan
Mufti Mahmud Nottingham Trent University Nottingham, UK
Kanad Ray Amity University Jaipur, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-9482-1 ISBN 978-981-19-9483-8 (eBook) https://doi.org/10.1007/978-981-19-9483-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Organization
Chief Patron Md. Forhad Hossain, Vice-chancellor, MBSTU, Bangladesh
Patron A. R. M. Solaiman, Pro-VC, MBSTU, Bangladesh Md. Sirajul Islam, Treasurer, MBSTU, Bangladesh
Conference Chairs Chi-Sang Poon, Massachusetts Institute of Technology, USA Kanad Ray, Amity University, Rajasthan, India Mufti Mahmud, Nottingham Trent University, UK
Steering Committee Anirban Bandyopadhyay, National Institute for Materials Science, Japan Anirban Dutta, The State University of New York at Buffalo, USA Chi-Sang Poon, Massachusetts Institute of Technology, USA J. E. Lugo, University of Montreal, Canada Jocelyn Faubert, University of Montreal, Canada Kanad Ray, Amity University Rajasthan, India Luigi M. Caligiuri, The University of Calabria, Italy
Mufti Mahmud, Nottingham Trent University, UK M. Shamim Kaiser, Jahangirnagar University, Bangladesh Subrata Ghosh, Northeast Institute of Science & Technology, India Shamim Al Mamun, Jahangirnagar University, Bangladesh
Advisory Committee Alamgir Hossain, Teesside University, UK Hashim Bin Saim, UTHM, Malaysia Jalil Bin Ali, UTM, Malaysia Joarder Kamruzzaman, Federation University Australia Nafarizal Bin Nayan, UTHM, Malaysia Kazi M. Ahmed, UAP, Bangladesh Kok Lay Teo, Sunway University, Malaysia Rosli Bin Omar, UTHM, Malaysia Md. Abu Taher, BUP, Bangladesh Md. Abdur Razzaque, DU, Bangladesh Md. Atiqur Rahman Ahad, DU, Bangladesh Md. Saiful Islam, BUET, Bangladesh Md. Obaidur Rahman, DUET, Bangladesh Md. Hanif Ali, Jahangirnagar University, Bangladesh Md. Roshidul Hasan, BSMRAU, Bangladesh Mohamad Zaky Bin Noh, UTHM, Malaysia Muhammad H. Rashid, University Of West Florida, USA Sarwar Morshed, AUST, Bangladesh Satya Prashad Mazumder, BUET, Bangladesh Subrata Kumar Aditya, SHU, Bangladesh
Organizing Chair Sajjad Waheed, Dean, Faculty of Engineering, MBSTU
Organizing Secretary Md. Ahsan Habib (Tareq), ICT, MBSTU
Technical Program Committee Chair Anirban Bandyopadhyay, National Institute for Materials Science, Japan M. Shamim Kaiser, IIT, JU Md. Mahbubul Hoque, Dean, Faculty of Life Science, MBSTU Md. Anwar Hossain, Dean, Faculty of Science, MBSTU Sazzad Parvez, CSE, DU Md. Shorif Uddin, CSE, JU Md. Shahadat Hossain, CU, Bangladesh Nilanjan Dey, Techno India, Kolkata
Registration Monir Morshed, ICT, MBSTU
Venue and Accommodation Preparation Md. Iqbal Mahmood, TE, MBSTU
Workshop Md. Matiur Rahman, CSE, MBSTU
Special Session Muhammad Shahin Uddin, ICT, MBSTU
Finance Committee Mostafa Kamal Nasir, CSE, MBSTU
Refreshment Mehedi Hasan Talukdar, CSE, MBSTU
Accommodation A. S. M. Delowar Hossain, CSE, MBSTU
Local Arrangement Mst. Nargis Aktar, ICT, MBSTU
Project and Exhibition Md. Sazzad Hossain, CSE, MBSTU
Publicity Md. Abir Hossain, ICT, MBSTU Shamim Al Mamun, JU, Bangladesh M. Arifur Rahman, NTU, UK
Website Management Chair Tanvir Rahman, ICT, MBSTU Bikash Kumar Paul, ICT, MBSTU S. M. Shamim, ICT, MBSTU
Organizing Committee Mohd Helmy Abd Wahab, UTHM, Malaysia (Chair) Syed Zuhaib Haider Rizvi, UTHM, Malaysia (Co-Chair)
M. Shamim Kaiser, JU, Bangladesh (Co-Chair) Muhammad Sufi Bin Roslan, UTHM, Malaysia (Secretary) Kek Sie Long, UTHM, Malaysia (Registration Chair) Umar Abu Bakar, UTHM, Malaysia (Treasurer) Maizatulazrina Binti Yaakob, UTHM, Malaysia Syed Zulkarnain Syed Idrus, UTHM, Malaysia Haryana Mohd Hairi, UTHM, Malaysia Saiful Najmee Bin Mohamad, UTM, Malaysia Iliana Md. Ali, TATIUC, Malaysia Shamim Al Mamun, JU, BD Syed Asim Shah, UTHM, Malaysia
Technical Program Committee Chairs Anirban Bandyopadhyay, National Institute for Materials Science, Japan Sie Kek, UTHM, Malaysia M. Arif Jalil, UTM, Malaysia
Technical Program Committee Kavi Kumar, UTM, Malaysia Nabihah Ahmad, UTM, Malaysia S. P. Tiwari, IIT, Dhanbad, India Su Rong, NTU, Singapore Said Broumi, University Of New Mexico, USA Pal Madhumangal, Vidyasagar University, India F. Suresh Singh, University Of Kerala, India Sabariah Saharan, UTHM, Malaysia Arif Jalil, UTM, Malaysia Muhammad Arifur Rahman, Jahangirnagar University, Bangladesh Sajjad Waheed, MBSTU, Bangladesh Md. Zahidur Rahman, GUB, Bangladesh Muhammad Golam Kibria, ULAB, Bangladesh Md. Majharul Haque, Deputy Director, Bangladesh Bank Samsul Arefin, CUET, Bangladesh Md. Obaidur Rahman, DUET, Bangladesh Mustafa Habib Chowdhury, IUB, Bangladesh Marzia Hoque-Tania, Oxford University, UK Antesar Shabut, CSE, Leeds Trinity University, UK Md. Khalilur Rhaman, BRAC University, Bangladesh Md. Hanif Seddiqui, University of Chittagong, Bangladesh
M. M. A. Hashem, KUET, Bangladesh Tomonori Hashiyama, The University of Electro-Communications, Japan Wladyslaw Homenda, Warsaw University of Technology, Poland M. Moshiul Hoque, CUET, Bangladesh A. B. M. Aowlad Hossain, KUET, Bangladesh Sheikh Md. Rabiul Islam, KUET, Bangladesh Manohar Das, Oakland University, USA Kaushik Deb, CUET, Bangladesh Carl James Debono, University of Malta, Malta M. Ali Akber Dewan, Athabasca University, Canada Belayat Hossain, Loughborough University, UK Khoo Bee Ee, Universiti Sains Malaysia, Malaysia Ashik Eftakhar, Nikon Corporation, Japan Md. Tajuddin Sikder, Jahangirnagar University, Bangladesh Mrs. Shayla Islam, UCSI, Malaysia Antony Lam, Mercari Inc., Japan Ryote Suzuki, Saitama University, Japan Hishato Fukuda, Saitama University, Japan Md. Golam Rashed, Rajshahi University, Bangladesh Md Sheikh Sadi, KUET, Bangladesh Tushar Kanti Shaha, JKKNIU, Bangladesh M. Shazzad Hosain, NSU, Bangladesh M. Mostazur Rahman, AIUB, Bangladesh Tabin Hassan, AIUB, Bangladesh Aye Su Phyo, Computer University Kalay, Myanmar Md. Shahedur Rahman, Jahangirnagar University Lu Cao, Saitama University, Japan Nihad Adnan, Jahangirnagar University Mohammad Firoz Ahmed, Jahangirnagar University A. S. M. Sanwar Hosen, JNU, South Korea Mahabub Hossain, ECE, HSTU, Bangladesh Md. Sarwar Ali, Rajshahi University, Bangladesh Risala T. Khan, Jahangirnagar University, Bangladesh Mohammad Shahidul Islam, Jahangirnagar University, Bangladesh Manan Binte Taj Noor, Jahangirnagar University, Bangladesh Md Abu Yousuf, Jahangirnagar University, Bangladesh Md. Sazzadur Rahman, Jahangirnagar University, Bangladesh Rashed Mazumder, Jahangirnagar University, Bangladesh Md. Abu Layek, Jagannath University, Bangladesh Saiful Azad, Universiti Malaysia Pahang, Malaysia Mostofa Kamal Nasir, MBSTU, Bangladesh Mufti Mahmud, NTU, UK A. K. M. Mahbubur Rahman, IUB, Bangladesh Al Mamun, Jahangirnagar University, Bangladesh Al-Zadid Sultan Bin Habib, KUET, Bangladesh
Anup Majumder, Jahangirnagar University, Bangladesh Atik Mahabub, Concordia University, Canada Bikash Kumar Paul, MBSTU, Bangladesh Md. Obaidur Rahman, DUET, Bangladesh Nazrul Islam, MIST, Bangladesh Ezharul Islam, Jahangirnagar University, Bangladesh Farah Deeba, DUET, Bangladesh Md. Manowarul Islam, Jagannath University, Bangladesh Md. Waliur Rahman Miah, DUET, Bangladesh Rubaiyat Yasmin, Rajshahi University, Bangladesh Sarwar Ali, Rajshahi University, Bangladesh Rabiul Islam, Kulliyyah of ICT, Malaysia Dejan C. Gope, Jahangirnagar University, Bangladesh Sk. Md. Masudul Ahsan, KUET, Bangladesh Mohammad Shahriar Rahman, ULAB, Bangladesh Golam Dastoger Bashar, Boise State University, USA Md. Hossam-E-Haider, MIST, Bangladesh H. Liu, Wayne State University, USA Imtiaz Mahmud, Kyungpook National University, Korea Kawsar Ahmed, MBSTU, Bangladesh Kazi Abu Taher, BUP, Bangladesh Linta Islam, Jagannath University, Bangladesh Md. Musfique Anwar, Jahangirnagar University, Bangladesh Md. Sanaul Haque, University of Oulu, Finland Md. Ahsan Habib, MBSTU, Bangladesh Md. Habibur Rahman, MBSTU, Bangladesh M. A. F. M. Rashidul Hasan, Rajshahi University, Bangladesh Md. Badrul Alam Miah, UPM, Malaysia Mohammad Ashraful Islam, MBSTU, Bangladesh Mokammel Haque, CUET, Bangladesh Muhammad Ahmed, ANU, Australia Nazia Hameed, University of Nottingham, UK Partha Chakraborty, CoU, Bangladesh Kandrapa Kumar Sarma, Gauhati University, India Vaskar Deka, Gauhati University, India K. M. Azharul Islam, KUET, Bangladesh Tushar Sarkar, RUET, Bangladesh Surapong Uttama, Mae Fah Luang University, Thailand Sharafat Hossain, KUET, Bangladesh Shaikh Akib Shahriyar, KUET, Bangladesh A. S. M. Sanwar Hosen, Jeonbuk National University, Korea
Publication Committee Kanad Ray, Amity University Rajasthan, India Kavikumar S/O Jacob, UTHM, Malaysia Mohd Helmy Bin Abd Wahab, UTHM, Malaysia Kek Sie Long, UTHM, Malaysia Mufti Mahmud, NTU, UK
Preface
The Fourth International Conference on Trends in Computational and Cognitive Engineering (TCCE 2022) was held at the Mawlana Bhashani Science and Technology University in Tangail, Bangladesh, during 18–19 December 2022. The previous three editions were held at Universiti Tun Hussein Onn Malaysia, Malaysia in 2021; Jahangirnagar University, Bangladesh in 2020; and Central University of Haryana, India in 2019. Experimental, theoretical, and applied aspects of computational and cognitive engineering are the focus of TCCE. Computational and cognitive engineering investigate diseases and behavioural disorders with computer and mathematical approaches that are widespread in science, engineering, technology, and industry. The goal of the conference is to bring together researchers, educators, and business experts from allied research and development disciplines. This volume is a collection of peer-reviewed papers presented at the conference. TCCE 2022 received 132 articles from 10 countries. Out of these, only 35 high-quality full papers from 7 countries were accepted for presentation at the conference. The papers that were submitted went through a single-blind review process. At least two experts, including two independent reviewers, the track co-chair, and the respective track chair, were asked for their opinions. The papers presented in this book are insightful for those interested in learning about computational intelligence and cognitive engineering. The accepted papers are split into five segments: Intelligent Image Analysis; Internet of Things; Machine Learning for Industry; Network and Security; and NLP and Robotics for Industrial and Social Applications. We are grateful to the authors who contributed to this conference and supported the advancement of knowledge in cognitive engineering. We want to express our gratitude to all the Organizing Committee members and the Technical Committee members for their hard work and unconditional support, particularly the Chairs, the Co-Chairs, and the Reviewers. Special thanks to the honourable vice-chancellor of Mawlana Bhashani Science and Technology University for his thorough support. TCCE 2022 could not have taken place without the tremendous work of the team and the gracious assistance. We are grateful to Mr. Aninda Bose and other team members of Springer-Nature for their continuous support in coordinating
this volume’s publication. Last but not least, we thank all of our volunteers for their hard work during this challenging time to make TCCE 2022 a grand success. Dhaka, Bangladesh Nottingham, UK Dhaka, Bangladesh Jaipur, India Tsukuba, Japan December 2022
M. Shamim Kaiser Mufti Mahmud Md. Sajjad Waheed Kanad Ray Anirban Bandyopadhyay
Contents
Image Analysis

Early Prediction and Analysis of DTI and MRI-Based Alzheimer's Disease Through Machine Learning Techniques . . . 3
Amira Mahjabeen, Md Rajib Mia, F. N. U. Shariful, Nuruzzaman Faruqui, and Imran Mahmud

Ensemble Machine Learning Technique for Identifying COVID-19 from CT Scan Images . . . 15
Rahul Deb Mohalder, Apu Sarder, Khandkar Asif Hossain, Laboni Paul, and Farhana Tazmim Pinki

Machine Learning-Based Tomato Leaf Disease Diagnosis Using Radiomics Features . . . 25
Faisal Ahmed, Mohammad Naim Uddin Rahi, Raihan Uddin, Anik Sen, Mohammad Shahadat Hossain, and Karl Andersson

Effective Feature Extraction via Segmented t-Stochastic Neighbor Embedding for Hyperspectral Image Classification . . . 37
Tanver Ahmed, Md. Hasanul Bari, Masud Ibn Afjal, Adiba Mahjabin Nitu, Md. Palash Uddin, and Md. Abu Marjan

Argument Mining on Clinical Trial Abstracts on Lung Cancer Patients . . . 49
Md Yasin Arafat Khondoker and Mohammad Abu Yousuf

Car Detection from Unmanned Aerial Vehicles Based on Deep Learning: A Comparative Study . . . 61
Sohag Hossain, Sajjad Waheed, and M. Abdullah

Light Convolutional Neural Network to Detect Eye Diseases from Retinal Images: Diabetic Retinopathy and Glaucoma . . . 73
Milon Biswas, Sudipto Chaki, Saurav Mallik, Loveleen Gaur, and Kanad Ray
Internet of Things for Society

Evaluating Usability of AR-based Learning Applications for Children Using SUS and Heuristic Evaluation . . . 87
Sheikh Tasfia, Muhammad Nazrul Islam, Syeda Ajbina Nusrat, and Nusrat Jahan

IoT Based Biofloc Aquaculture Monitoring System . . . 99
Tahsin Jannat Alam, Abdullah Al Shabab Bin Hayder, Ahsan Fuad Apu, Md. Hasan Al Banna, and Md. Sazzadur Rahman
Optimal Control Problem With Non-Standard Conditions: Direct and Indirect Approaches . . . 113
Wan Noor Afifah Wan Ahmad, Suliadi Firdaus Sufahani, Mahmod Abd Hakim Mohamad, Mohd Saifullah Rusiman, Rozaini Roslan, Mohd Zulariffin Md Maarof, Muhamad Ali Imran Kamarudin, Ruzairi Abdul Rahim, and Naufal Ishartono

Descriptive Analysis for Electric Bus During Non-Operational Stage . . . 121
Wan Noor Afifah Wan Ahmad, Suliadi Firdaus Sufahani, Mohd Fahmy-Abdullah, and Muhammad Syamil Abdullah Sani

The Effectiveness Level on the Electric Buses Operation: Case Study for Affordability and Accessibility . . . 133
Ahmad Husaini Mohamad Tajudin, Mohd Fahmy-Abdullah, Suliadi Firdaus Sufahani, and Wan Noor Afifah Wan Ahmad

IoMT-based Android Application for Monitoring COVID-19 Patients Using Real-Time Data . . . 145
Mohammad Farshid, Atia Binti Aziz, Nanziba Basnin, Mohoshena Akhter, Karl Andersson, and Mohammad Shahadat Hossain

A Cost-Effective Unmanned Ground Vehicle (UGV) Using Swarm Robotics Technology for Surveillance and Future Combat . . . 159
Shamim Ahmed, Md. Khoshnur Alam, M. Rifat Abdullah Dipu, Swarna Debnath, Sadia Haque, and Taiba Akhter

Neural Network-Based Obstacle and Pothole Avoiding Robot . . . 173
Md. Mahedi Al Arafat, Mohammad Shahadat Hossain, Delowar Hossain, and Karl Andersson

Machine Learning for Society

A Comparative Study of Psychiatric Characteristics Classification for Predicting Psychiatric Disorder . . . 187
Md. Sydur Rahman and Boshir Ahmed
Material Named Entity Recognition (MNER) for Knowledge-Driven Materials Using Deep Learning Approach . . . 199
M. Saef Ullah Miah and Junaida Sulaiman

An Improved Optimization Algorithm-Based Prediction Approach for the Weekly Trend of COVID-19 Considering the Total Vaccination in Malaysia: A Novel Hybrid Machine Learning Approach . . . 209
Marzia Ahmed, Mohd Herwan Sulaiman, Ahmad Johari Mohamad, and Mostafijur Rahman

Analyzing the Effectiveness of Several Machine Learning Methods for Heart Attack Prediction . . . 225
Khondokar Oliullah, Alistair Barros, and Md. Whaiduzzaman

Solving the Royalty Payment Problem Through Shooting Method . . . 237
Wan Noor Afifah Wan Ahmad, Suliadi Firdaus Sufahani, Mahmod Abd Hakim Mohamad, Mohd Saifullah Rusiman, Rozaini Roslan, Mohd Zulariffin Md. Maarof, Muhamad Ali Imran Kamarudin, Ruzairi Abdul Rahim, and Naufal Ishartono

ECG Signal Classification Using Transfer Learning and Convolutional Neural Networks . . . 243
Tanzila Tahsin Mayabee, Kazi Tahsinul Haque, Saadia Binte Alam, Rashedur Rahman, M. Ashraful Amin, and Syoji Kobashi

Partitional Technique for Searching Initial Cluster Centers in K-means Algorithm . . . 255
Md. Hamidur Rahman and Momotaz Begum

A Novel Ensemble Methodology to Validate Fuzzy Clusters of Big Data . . . 267
Tanvir Habib Sardar, Rashel Sarkar, Sheik Jamil Ahmed, and Anjan Bandyopadhyay

Model Analysis for Predicting Prostate Cancer Patient's Survival: A SEER Case Study . . . 279
Md. Shohidul Islam Polash, Shazzad Hossen, and Aminul Haque

Quantum-Inspired Neural Network on Handwriting Datasets . . . 291
Manik Ratna Shah, Jay Sarraf, Prasant Kumar Pattnaik, and Anjan Bandyopadhyay
Network and Security

An Efficient and Secure Data Deduplication Scheme for Cloud Assisted Storage Systems with Access Control . . . 309
Md. Nahiduzzaman, M. Shamim Kaiser, Muhammad R. Ahmed, and Marzia Hoque Tania

Priority-Based Intelligent Reflecting Surface for Uplink 6G Communication . . . 321
Binodon, Md. Rafatul Haque, Md. Amirul Hasan Shanto, Amit Karmaker, and Md. Abir Hossain

Identifying Duplicate Questions Leveraging Recurrent Neural Network . . . 331
Maksuda Bilkis Baby, Bushra Ankhari, Md Shajalal, Md. Atabuzzaman, Fazle Rabbi, and Masud Ibn Afjal

StegoPix2Pix: Image Steganography Method via Pix2Pix Networks . . . 343
Md. Min-ha-zul Abedin and Mohammad Abu Yousuf

Low-Cost Energy Efficient Encryption Algorithm for Portable Device . . . 357
Abrar Fahim Alam and M. Shamim Kaiser

Detection of Dental Issues Using the Transfer Learning Methods . . . 367
Famme Akter Meem, Jannatul Ferdus, William Ankan Sarkar, Md Imtiaz Ahmed, and Mohammad Shahidul Islam

Healthcare Professionals Credential Verification Model Using Blockchain-Based Self-sovereign Identity . . . 381
Shubham Saha, Sifat Nawrin Nova, and Md. Ishtiaq Iqbal

Emerging Applications for Society

A Classified Mental Health Disorder (ADHD) Dataset Based on Ensemble Machine Learning from Social Media Platforms . . . 395
Sabrina Mostafij Mumu, Hasibul Hoque, and Nazmus Sakib

Language Identification in Multilingual Text Document Using Machine Learning . . . 407
Md. Mahmodul Hasan, A. S. M. Shafi, and Al-Imtiaz

Sentiment Analysis of Restaurant Reviews Using Machine Learning . . . 419
M. Abdullah, Sajjad Waheed, and Sohag Hossain

Transliteration from Banglish to Bengali Language Using Neural Machine Translation . . . 429
Shourov Kabiraj, Sajjad Waheed, and Zaber All Khaled
An Evaluation of BdSL 49 Dataset Using Transfer Learning Techniques: A Review . . . 437
Saqib Sizan Khan, Ashraful Haque, Nipa Khatun, Nasima Begum, Nusrat Jahan, and Tanjina Helaly

1D to 20D Tensors Like Dodecanions and Icosanions to Model Human Cognition as Morphogenesis in the Density of Primes . . . 449
Sudeshna Pramanik, Pushpendra Singh, Pathik Sahoo, Kanad Ray, and Anirban Bandyopadhyay

Metamaterials-Based Photonic Crystal Fiber (PCF) Design for Wireless Charging . . . 469
Kisalaya Chakrabarti and Mayank Goswami

A Ranking Model of Paddy Farmers for Their Welfare . . . 477
Suneeta Mohanty, Shaswati Patra, Prabhat Ranjan Patra, and Prasant Kumar Pattnaik

A Real-Time Bangla Local Language Recognition from Voice . . . 489
Mohammad Junayed Khan Noor, Fatema Tuj Johora, Md. Mahin, and Muhammad Aminur Rahaman

A Reviewer Recommender System for Scientific Articles Using a New Similarity Threshold Discovery Technique . . . 503
Saiful Azad, M. Ariful Hoque, Nahim Ahmed Rimon, M. Mahabub Sazid Habib, Mufti Mahmud, M. Shamim Kaiser, and M. Rezaul Karim

Self-survival of Quantum Vibrations of a Tubulin Protein and Microtubule: Quantum Conductance and Quantum Capacitance . . . 519
Komal Saxena, Pushpendra Singh, Satyajit Sahu, Subrata Ghosh, Pathik Sahoo, Soami Daya Krishnananda, and Anirban Bandyopadhyay

Author Index . . . 537
About the Editors
Dr. M. Shamim Kaiser is currently working as Professor at the Institute of Information Technology of Jahangirnagar University, Savar, Dhaka-1342, Bangladesh. He received his Bachelor's and Master's degrees in Applied Physics Electronics and Communication Engineering from the University of Dhaka, Bangladesh, in 2002 and 2004, respectively, and the Ph.D. degree in Telecommunication Engineering from the Asian Institute of Technology, Thailand, in 2010. His current research interests include data analytics, machine learning, wireless network & signal processing, cognitive radio network, big data and cybersecurity, renewable energy. He has authored more than 100 papers in different peer-reviewed journals and conferences. He is Associate Editor of the IEEE Access Journal, Guest Editor of Brain Informatics Journal, and Cognitive Computation Journal. Dr. Kaiser is Life Member of Bangladesh Electronic Society; Bangladesh Physical Society. He is also Senior Member of IEEE, USA and IEICE, Japan, and Active Volunteer of the IEEE Bangladesh Section. He is founding Chapter Chair of the IEEE Bangladesh Section Computer Society Chapter. Dr. Kaiser organized various international conferences such as ICEEICT 2015–2018, IEEE HTC 2017, IEEE ICREST 2018, and BI2020. Sajjad Waheed received his M.Sc. in Computer Science and Engineering in 1999 and Ph.D. in Computer Engineering in 2013. He has been serving the Mawlana Bhashani Science and Technology University as Professor in the Department of Information and Communication Technology. He is currently Chairman of the Department of Information and Communication Technology and also Dean of the Faculty of Engineering. He has actively collaborated with numerous IT-related conferences in Bangladesh in the last few years. Anirban Bandyopadhyay is Senior Scientist in the National Institute for Materials Science (NIMS), Tsukuba, Japan, has Ph.D. from Indian Association for the Cultivation of Science (IACS), Kolkata 2005, December, on supramolecular electronics, 2005–2007: is ICYS Research Fellow NIMS, Japan, 2007–now, Permanent Scientist in NIMS, Japan, has 10 patents on building artificial organic brain, big data, molecular bot, cancer & Alzheimer drug, fourth circuit element, etc., 2013–2014 is Visiting
Scientist in MIT, USA, on biorhythms, World Technology Network, WTN Fellow, (2009–continued), Hitachi Science and Technology award 2010, Inamori Foundation award 2011–2012, Kurata Foundation Award, Inamori Foundation Fellow (2011-), and Sewa Society International SSS Fellow (2012-), Japan; SSI Gold medal (2017). Dr. Mufti Mahmud is Senior Lecturer of Computing at the Nottingham Trent University, UK. He received Ph.D. degree in Information Engineering from University of Padova—Italy, in 2011. A recipient of the Marie-Curie postdoctoral fellowship, he served at various positions in the industry and academia in India, Bangladesh, Italy, Belgium, and the UK since 2003. An expert in computational intelligence, data analysis, and big data technologies, Dr. Mahmud has published over 80 peer-reviewed articles and papers in leading journals and conferences. Dr. Mahmud serves as Associate Editor to the Cognitive Computation, IEEE Access, Big Data Analytics, and Brain Informatics journals. Dr. Mahmud is Senior Member of IEEE and ACM, Professional Member of the British Computer Society, and Fellow of the Higher Education Academy—UK. During the year 2020–2021, he is serving as Vice Chair of the Intelligent System Application Technical Committee of IEEE CIS, Member of the IEEE CIS Task Force on Intelligence Systems for Health and the IEEE R8 Humanitarian Activities Subcommittee, and Project Liaison Officer of the IEEE UK and Ireland SIGHT committee. Dr. Mahmud is also serving as Local Organizing Chair of IEEE-WCCI2020; General Chair of BI2020 and BI2021, and Program Chair of IEEE-CICARE2020. Kanad Ray (Senior Member, IEEE) received the M.Sc. degree in physics from Calcutta University and the Ph.D. degree in physics from Jadavpur University, West Bengal, India. He has been Professor of Physics and Electronics and Communication and is presently working as Head of the Department of Physics, Amity School of Applied Sciences, Amity University Rajasthan (AUR), Jaipur, India. His current research areas of interest include cognition, communication, electromagnetic field theory, antenna and wave propagation, microwave, computational biology, and applied physics. He has been serving as Editor for various Springer book series. He was Associate Editor of the Journal of Integrative Neuroscience (The Netherlands: IOS Press). He has been Visiting Professor to UTM & UTeM, Malaysia and Visiting Scientist to NIMS, Japan. He has established MOU with UTeM Malaysia, NIMS Japan, and University of Montreal, Canada. He has visited several countries such as Netherlands, Turkey, China, Czechoslovakia, Russia, Portugal, Finland, Belgium, South Africa, Japan, Singapore, Thailand, and Malaysia for various academic missions. He has organized various conferences such as SoCPROS, SoCTA, ICOEVCI, and TCCE as General Chair and Steering Committee Member.
Image Analysis
Early Prediction and Analysis of DTI and MRI-Based Alzheimer’s Disease Through Machine Learning Techniques Amira Mahjabeen , Md Rajib Mia , F. N. U. Shariful , Nuruzzaman Faruqui , and Imran Mahmud
Abstract Alzheimer’s disease (AD) is a neurodegenerative disease generally occurring in 65 years or older, destroying neurons and various brain areas. Initially mild, the symptoms develop increasingly severe over time. Patients with AD are becoming more numerous every day. As a result, it is essential to detect AD progression early. Different clinical methods and neuroimaging techniques are used to detect this disease. Due to the complicated nature of AD, only clinical methods or neuroimaging techniques cannot correctly detect early AD and the progression of MCI patients. Besides, these techniques are costly, time-consuming, and limited availability. This research work uses Alzheimer’s disease Neuroimaging Initiative (ADNI) database to make significant predictions. Numerous machine learning models were examined to recognize early AD and mild cognitive impairment in cognitively normal people with identical features. Gaussian Naive Bayes identifies Alzheimer’s patient’s mild cognitive impairment and healthy people with a better classification accuracy of 96.92% using the selected and correlated features—ADMCI3, AV45, APOE4, AV45AB12, ADASQ4, RAVLT_perc_forgetting, AD_CGH_L, MD_CGH_L, RD_CGH_L, etc. than other models. The study findings showed that by using neuropsychological data combined with cognitive data, machine learning techniques could help diagnose AD. Keywords Alzheimer’s disease · Machine learning · Neuron-imaging · Gaussian Naive Bayes
A. Mahjabeen · F. N. U. Shariful · N. Faruqui · I. Mahmud Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh M. R. Mia (B) · F. N. U. Shariful · N. Faruqui · I. Mahmud University of North Florida, Florida, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_1
1 Introduction

In 2018, fifty million people globally had dementia, a number projected to reach one hundred and fifty-two million by 2050. According to the 2018 World Alzheimer Report, a new case of dementia arises every 3 s [19]. Between 60 and 80% of all dementia cases are due to AD [9]. AD causes brain cell death and memory loss. Patients with early-stage AD experience memory loss, mild cognitive difficulties, and numerous challenges in daily activities; the patient passes away after three to ten years [13]. Besides, patients face challenges in writing and speaking, short-term cognitive impairment, difficulties performing everyday activities, and mood changes [27]. Early intervention may be necessary to treat AD properly or to delay disease progression [2]. Genetic mutation and Down syndrome are uncommon genetic factors associated with AD. Risk factors for AD include APOE, age, and family history. In the age groups of 65–74, 75–84, and 85 and over, AD affects about 3, 17, and 32% of people, respectively. The leading cause of AD is genetic changes, which result in abnormal beta-amyloid deposition around brain cells and abnormal tau protein deposition within brain cells [4].

The Montreal Cognitive Assessment (MoCA), Mini-Mental State Examination (MMSE), Alzheimer's Disease Assessment Scale (ADAS), positron emission tomography (PET), magnetic resonance imaging (MRI), and diffusion tensor imaging (DTI) are used to predict this disease. The MMSE and MoCA are used to examine cognitive performance and track the progression of AD. To differentiate between AD patients and healthy people, PET scans patterns of amyloid deposition and glucose metabolism. MRI can produce a 3D image of the internal anatomy of the brain, and structural MRI can reveal AD-related structural changes. Functional MRI (rs-fMRI) measures the relationship between the blood oxygen level-dependent signal and neural activity to determine functional relationships across various brain regions. DTI measures structural changes in white matter; it can detect the diffusion of water molecules in the brain and recognize abnormal spread [11, 22, 32, 34].

Numerous studies use machine learning to discover the best biomarkers of AD. Several studies used the SVM model to distinguish AD [3, 12, 21, 25], but this method's detection process is relatively slow. One investigation used only clinical data to classify and detect AD [7], although clinical data alone are not enough to determine AD. Likewise, a single neuroimaging test is insufficient to assess AD risk [32], yet many studies used only MRI data to classify and detect AD. Another study used only psychological parameters [24], which is insufficient for AD diagnosis. One study used logistic regression, multivariate linear regression, and SVM to distinguish AD pathology from aging-related cognitive impairment [30]. Another proposed study uses SVM, hard voting, soft voting, DT, XGBoost, and RFC to identify AD based on various parameters; with an accuracy of 86%, the soft voting classifier predicts the outcome best [31]. A further study uses a machine learning technique with a dual-task gait assessment to identify MCI and AD and compare individuals with healthy participants. An SVM classifier was trained on the MoCA score and gait features; the accuracy for the gait features was 78%, and for the MoCA score it was 83% [14].
Eight machine learning methods based on MRI images were evaluated, performing best with the ten-fold cross-validation technique [5]. A study applied a filter-based attribute selection methodology to MMSE and MRI data and demonstrated that neuropsychological test scores can improve classification model performance [33]. In another work, MRI images were converted into numerical data, and SVM, GB, NN, K-NN, and RF models were used to detect AD [20]. Another study utilized two models, a convolutional neural network (CNN) and K-NN; 84.5% accuracy was obtained from the CNN model [10].

Several studies show that clinical methods focus on various cognitive domains but have limited testing capabilities. Structural MRI may not identify the histological characteristics of AD because its atrophy patterns overlap with those of other diseases. Additionally, PET neuroimaging techniques are comparatively costly and of limited accessibility, and unlike structural MRI and FDG PET, amyloid PET is not a suitable biomarker of disease progression during the clinical stage. FDG retention is a general metabolic indicator, though, and it may be abnormal for various reasons. DTI is hypothesis-dependent, and most brain areas have not been explored [15, 26]. This research aims to precisely predict an individual's risk of AD and MCI based on specified parameters, including various cognitive and neuroimaging factors.

This research work is structured as follows. Section 2 gives an overview of previous studies and presents the materials, the suggested approach, and the algorithms. Section 3 analyzes and compares the model outputs. Section 4 presents the conclusion and future work.
2 Material and Methodology

There has been much research on Alzheimer's disease. This research study has used different machine learning models to detect AD. The processing stages are shown in Fig. 1.
Fig. 1 Steps of machine learning approach
Fig. 2 The X-axis represents the label feature, and the Y-axis represents the number of samples in each label class. Count plot for the label feature: 33 CN people are labeled 1, 36 AD patients are labeled 2, and 108 MCI patients are labeled 3
2.1 Data Source

The Alzheimer's Disease Neuroimaging Initiative (ADNI) (adni.loni.usc.edu) provided the dataset used in this study [1]. The ADNI dataset is the best dataset for diagnosing Alzheimer's disease. The ADNI initiative was started in 2003 to create genetic, clinical, and neuroimaging biomarkers for the early detection of AD [23]. The dataset includes cerebrospinal fluid (CSF) biomarkers, genetic biomarkers, results from cognitive tests, and data from MRI, PET, and DTI neuroimaging techniques. Data on DTI were obtained from the ADNI-2 and ADNI-3 projects (www.adni.org) as well as the Laboratory of Neuroimaging, UCLA (Nir et al., 2013, link at adni.org). The dataset includes 177 individuals and 266 features, with AD123 as the label feature. It contains information on people aged 55–90, of whom 107 were men and 70 were women. Figure 2 depicts the number of samples in each class of the label feature, and Fig. 3 shows the density of the label data.
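As a rough illustration of working with this table, the sketch below loads the ADNI-derived dataset with pandas and inspects the label distribution. The file name adni_dti_mri_features.csv is a placeholder of our own; the actual table must be requested from ADNI under a data use agreement.

```python
import pandas as pd

# Placeholder file name; the real table comes from adni.loni.usc.edu
# under ADNI's data use agreement.
df = pd.read_csv("adni_dti_mri_features.csv")

print(df.shape)                    # expected (177, 266): 177 subjects, 266 columns
print(df["AD123"].value_counts())  # label: 1 = CN (33), 2 = AD (36), 3 = MCI (108)
```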
2.2 Data Preprocessing

In this research work, missing values are denoted by 999999; these entries are first converted to NaN. A label encoder converts categorical data into numeric data, and missing data are then replaced with the mean value of the corresponding feature. Because the dataset is unbalanced and relatively small, the Synthetic Minority Oversampling Technique (SMOTE) has been used to create synthetic data samples and resample the features so that the classes match. The MinMaxScaler technique has been used for feature normalization.

Fig. 3 The X-axis represents the label feature, and Y-axis represents the density of each label feature class. It displays the distribution of the label feature
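A minimal sketch of the preprocessing pipeline described above, assuming scikit-learn and imbalanced-learn and the df table loaded earlier; the exact ordering of the steps beyond what the text states is our assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from imblearn.over_sampling import SMOTE

# The sentinel 999999 marks missing values; convert it to NaN first.
df = df.replace(999999, np.nan)

# Encode any categorical columns as integers.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Replace remaining missing entries with the mean of their column.
df = df.fillna(df.mean())

X, y = df.drop(columns=["AD123"]), df["AD123"]
feature_names = X.columns

# Oversample the minority classes so CN, AD, and MCI are balanced.
X, y = SMOTE(random_state=42).fit_resample(X, y)

# Scale every feature into the [0, 1] range.
X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=feature_names)
```

Note that fitting SMOTE and the scaler on the full dataset before the train/test split, as described here, lets some information from the test subjects leak into training; fitting both on the training split only would be the stricter protocol.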
2.3 Feature Selection

Feature selection is regularly used to eliminate redundant and unnecessary features. This technique also improves accuracy by reducing overfitting on the dataset and helps to train a model faster. This research work used the SelectKBest method to determine the critical features. This feature selection method is based on the chi-square (χ²) test: it computes the χ² score between the label feature and every non-negative independent feature [35]. Using this technique, 30 significant features were selected from the 265 features to train the model.
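One possible realization of this step with scikit-learn's SelectKBest is sketched below, continuing the variable names from the preprocessing sketch above.

```python
from sklearn.feature_selection import SelectKBest, chi2

# chi2 scoring requires non-negative inputs, which the earlier
# min-max scaling to [0, 1] guarantees.
selector = SelectKBest(score_func=chi2, k=30)
X_30 = selector.fit_transform(X, y)

# Names of the 30 retained features (e.g., AV45, APOE4, ADASQ4, ...).
print(list(feature_names[selector.get_support()]))
```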
2.4 Machine Learning Algorithms

Several machine learning models have been trained on the significant features: K-NN, SVM, and GNB. The three models were trained on data from 80% of the participants in the CN, MCI, and AD groups and tested on the remaining 20% of participants' data.

Support Vector Machine (SVM). SVM is a classification method built around an optimal hyper-plane that divides the data points; a new instance is classified by its location relative to the hyper-plane. SVM identifies the optimal hyper-plane that delivers the largest minimal distance to the training data set [6]. In this research work, many potential hyperplanes could separate the three classes of data points. The margin of separation is crucial for classification accuracy because it supports the accurate classification of unseen data points.

K-Nearest Neighbor (K-NN). K-NN selects the k nearest neighbors of a sample in the training set and assigns the sample the class that appears most frequently in that neighborhood. The value of K can be chosen freely; a new sample is classified by calculating its distance to the nearest neighbor points [18]. The normalized Euclidean distance function evaluates the distance:
$$\operatorname{dist}(X, Y) = \sqrt{\sum_{i=1}^{n} \frac{(x_i - y_i)^2}{n}} \qquad (1)$$
In Eq. (1), X and Y are feature vectors ($X = x_1, x_2, x_3, \ldots, x_n$ and $Y = y_1, y_2, y_3, \ldots, y_n$), and $n$ denotes the dimension of the feature space [8]. Here, K = 15 was selected for the best result. This algorithm can handle multiple classes without any additional work, and it works very fast.

Gaussian Naive Bayes (GNB). GNB is a probability-based classification system built on the Bayes theorem. With respect to each class, GNB views each feature variable as independent. The benefit of this model is that it requires only a small amount of training data. A hypothesis is produced for each class's specific set of features, and the mean and variance are calculated:

$$P(x_i \mid Y = cl) = \frac{1}{\sqrt{2\pi\sigma_{cl}^2}}\, e^{-\frac{(x_i - \mu_{cl})^2}{2\sigma_{cl}^2}} \qquad (2)$$
In Eq. (2), $cl$ stands for the class label, $x_i$ is the feature value, $\sigma_{cl}^2$ is the variance, and $\mu_{cl}$ is the mean value [17]. In this research work, GNB was used because it does not take much time to handle the discrete and clinical data in the dataset.
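The three classifiers could be trained as sketched below. Apart from K = 15 and the 80/20 split, hyperparameters are not stated in the paper, so scikit-learn defaults are assumed.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X_train, X_test, y_train, y_test = train_test_split(
    X_30, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "SVM": SVC(),                                  # kernel and C left at defaults
    "K-NN": KNeighborsClassifier(n_neighbors=15),  # K = 15 as chosen above
    "GNB": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} test accuracy: {model.score(X_test, y_test):.4f}")
```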
3 Result and Discussion In this study, a model based on ADNI data was developed to identify CN individuals, those with MCI, and those with AD. The dataset contains cognitive test data, clinical test data, etc. Using a feature selection method, 30 significant features have been extracted. AV45, APOE4, ADMCI3, AV45AB12, ADASQ4, RAVLT_perc_forgetting, MD_CGH_L, AD_CGH_L, RD_CGH_L are highly correlated with the label feature. Accuracy, Precision, Recall, and f1-score have been calculated to examine the overall result. The ratio of accurately categorized records to the total amount of labeled data in the testing set is described as recall. Besides,
precision is referred to as the proportion of accurately classified records relative to the number of classification attempts. For multi-label classification, the equations are as follows:

$$\text{Accuracy} = \frac{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} TN_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} TN_i + \sum_{i=1}^{n} FP_i + \sum_{i=1}^{n} FN_i} \qquad (3)$$

$$\text{Precision} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i} \qquad (4)$$

$$\text{Recall} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FN_i} \qquad (5)$$

$$\text{F1 score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$
where $TP_i$, $TN_i$, $FP_i$, and $FN_i$ represent the true positives, true negatives, false positives, and false negatives. Figure 4a–c depicts the confusion matrices of the SVM, GNB, and K-NN models, comparing the actual and the predicted results. Different machine learning methods have been applied to the 30 significant features in order to detect the disease early. The model has been trained using 80% of the data, and the remaining 20% has been used to test it. To increase accuracy, the SMOTE oversampling technique has been used to balance the dataset, and the min-max scaler has been used to normalize it. Support vector machine, K-nearest neighbor, and Gaussian Naive Bayes models were applied to identify the disease accurately. The accuracies of SVM, K-NN, and GNB are 93.84, 84.61, and 96.92%, respectively. GNB has the highest accuracy and predicts the disease correctly. The K-NN classifier achieves the lowest accuracy, 84.61%, even though K was set to 15 to improve its performance.
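Under the same assumptions as the training sketch above, the per-class metrics of Eqs. (3)–(6) and the confusion matrices of Fig. 4 can be computed with scikit-learn's built-in helpers:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision, recall, and F1 for the best model (GNB),
# i.e., Eqs. (4)-(6) evaluated class by class as reported in Table 1.
y_pred = models["GNB"].predict(X_test)
print(classification_report(y_test, y_pred, target_names=["CN", "AD", "MCI"]))

# Raw counts behind the confusion matrix (rows: actual, columns: predicted).
print(confusion_matrix(y_test, y_pred))
```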
Fig. 4 Confusion matrix of a SVM, b GNB, and c K-NN results in AD identification of three classes named CN as 1, AD as 2, and MCI as 3

Table 1 reports the performance scores of each classifier; GNB provides better accuracy than K-NN and SVM. Table 1 lists the accuracy, precision, recall, and F1-score values of the three models, and Table 2 compares the performance of prior studies with the suggested model for early AD detection.

Table 1 Outcome statistics for the three different algorithms using the selected features

Method name | Stage | Precision | Recall | F1 score | Accuracy (%)
K-NN        | CN    | 0.72      | 0.86   | 0.78     | 84.61
K-NN        | MCI   | 0.82      | 0.67   | 0.74     |
K-NN        | AD    | 1.00      | 1.00   | 1.00     |
SVM         | CN    | 0.90      | 0.90   | 0.90     | 93.84
SVM         | MCI   | 0.90      | 0.90   | 0.90     |
SVM         | AD    | 1.00      | 1.00   | 1.00     |
GNB         | CN    | 1.00      | 0.90   | 0.95     | 96.92
GNB         | MCI   | 0.91      | 1.00   | 0.95     |
GNB         | AD    | 1.00      | 1.00   | 1.00     |
Table 2 Compares the results from previous studies with the present research work

References (Authors)        | Target                            | Best model                               | Accuracy (%)
Thushara et al. [34]        | AD, MCI, cMCI, NC                 | Random forest                            | 69.33
Rallabandi et al. [28]      | CN, eMCI, lMCI, AD                | Non-linear SVM (RBF kernel)              | 75
Dahiwade et al. [10]        | –                                 | Convolutional neural network             | 84.5
Neelaveni and Devasana [24] | –                                 | Support vector machine                   | 85
Badnjevic et al. [5]        | AD, cognitively normal            | Random forest                            | 85.77
Shah et al. [31]            | Non-demented, demented            | Soft voting classifier                   | 86
Almubark et al. [3]         | MCI, CN                           | Support vector machine with RBF kernel   | 88.06
Karatekin [16]              | CN, MCI, AD                       | Random forest                            | 91
Madiwalar [21]              | Demented, non-demented            | Extra tree classifier                    | 93.14
Noor et al. [25]            | Non-demented, demented, converted | SVM using linear kernel with 'C' value 2 | 95
Proposed work               | CN, MCI, AD                       | Gaussian Naive Bayes                     | 96.92
4 Conclusion

This research study detects Alzheimer's disease using various data sources and machine learning models so that AD stages can be identified automatically and faster. Thirty significant features were extracted from multiple biomarkers, including neuroimaging test data and clinical test data, by the SelectKBest feature selection method; the feature selection algorithm chooses features from every biomarker group, which also helps to improve accuracy. Three models were compared to determine the best machine learning model, which makes the disease's stages simpler to resolve. GNB offers the highest accuracy with the selected thirty features compared to the other algorithms, reaching 96.92%. This study has some limitations as well: the dataset was small, and the accuracy of the K-NN model was comparatively low, so a much larger dataset could help to improve accuracy. In the future, brain images will be used to diagnose Alzheimer's disease stages.
References

1. ADNI. Alzheimer's disease neuroimaging initiative. http://adni.loni.usc.edu/. Accessed 12 Jan 2022
2. Almubark I, Chang L, Nguyen T, Turner RS, Jiang X (2019) Early detection of Alzheimer's disease using patient neuropsychological and cognitive data and machine learning techniques. In: 2019 IEEE international conference on big data (Big Data), pp 5971–5973. https://doi.org/10.1109/BigData47090.2019.9006583
3. Almubark I, Alsegehy S, Jiang X, Chang L-C (2020) Early detection of mild cognitive impairment using neuropsychological data and machine learning techniques. In: 2020 IEEE conference on big data and analytics (ICBDA). https://doi.org/10.1109/icbda50157.2020.92897
4. 2019 Alzheimer's disease facts and figures. Alzheimer's Dement 15(3):321–387. https://doi.org/10.1016/j.jalz.2019.01.010
5. Badnjevic A, Škrbić R, Gurbeta Pokvić L (2020) [IFMBE Proceedings] CMBEBIH 2019, vol 73 (Proceedings of the international conference on medical and biological engineering, 16–18 May 2019, Banja Luka, Bosnia and Herzegovina). Automatic detection of Alzheimer disease based on histogram and random forest, pp 91–96. https://doi.org/10.1007/978-3-030-17971-7_14
6. Battineni G, Chintalapudi N, Amenta F (2019) Machine learning in medicine: performance calculation of dementia prediction by support vector machines (SVM). Inform Med Unlocked 100200. https://doi.org/10.1016/j.imu.2019.100200
7. Benyoussef EM, Elbyed A, El Hadiri H (2017) Data mining approaches for Alzheimer's disease diagnosis. Lecture notes in computer science, pp 619–631. https://doi.org/10.1007/978-3-319-68179-5_54
8. Bucholc M, Ding X, Wang H, Glass DH, Wang H, Prasad G, Wong-Lin K et al (2019) A practical computerized decision support system for predicting the severity of Alzheimer's disease of an individual. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2019.04.022
9. Chen Y, Xia Y (2021) Iterative sparse and deep learning for accurate diagnosis of Alzheimer's disease. Pattern Recogn 116:107944. https://doi.org/10.1016/j.patcog.2021.107944
10. Dahiwade D, Patle G, Meshram E (2019) Designing disease prediction model using machine learning approach. In: 2019 3rd international conference on computing methodologies and communication (ICCMC). https://doi.org/10.1109/iccmc.2019.8819782
11. De A, Chowdhury AS (2020) DTI based Alzheimer disease classification with rank modulated fusion of CNNs and random forest. Expert Syst Appl 114338. https://doi.org/10.1016/j.eswa.2020.114338
12. Eke CS, Jammeh E, Li X, Carroll C, Pearson S, Ifeachor E (2021) Early detection of Alzheimer's disease with blood plasma proteins using support vector machines. IEEE J Biomed Health Inform 25(1):218–226. https://doi.org/10.1109/JBHI.2020.2984355
13. Fan Z, Xu F, Qi X et al (2020) Classification of Alzheimer's disease based on brain MRI and machine learning. Neural Comput Appl 32:1927–1936. https://doi.org/10.1007/s00521-019-04495-0
14. Ghoraani B, Boettcher LN, Hssayeni MD, Rosenfeld A, Tolea MI, Galvin JE (2021) Detection of mild cognitive impairment and Alzheimer's disease using dual-task gait assessments and machine learning. Biomed Signal Process Control 64:102249. https://doi.org/10.1016/j.bspc.2020.102249
15. Johnson KA, Fox NC, Sperling RA, Klunk WE (2012) Brain imaging in Alzheimer disease. Cold Spring Harb Perspect Med 2(4):a006213. https://doi.org/10.1101/cshperspect.a006213. PMID: 22474610. PMCID: PMC3312396
16. Karatekin Ç (2021) Early detection of Alzheimer's disease using data mining: comparison of ensemble feature selection approaches
17. Kruthika KR, Rajeswari, Maheshappa HD (2019) Multistage classifier-based approach for Alzheimer's disease prediction and retrieval. Inform Med Unlocked 14:34–42. https://doi.org/10.1016/j.imu.2018.12.003
18. Kulkarni N (2018) Use of complexity based features in diagnosis of mild Alzheimer disease using EEG signals. Int J Inf Technol 10:59–64. https://doi.org/10.1007/s41870-017-0057-0
19. Liu L, Zhao S, Chen H, Wang A (2019) A new machine learning method for identifying Alzheimer's disease. Simul Model Pract Theory 102023. https://doi.org/10.1016/j.simpat.2019.102023
20. Lodha P, Talele A, Degaonkar K (2018) Diagnosis of Alzheimer's disease using machine learning. In: 2018 fourth international conference on computing communication control and automation (ICCUBEA). https://doi.org/10.1109/iccubea.2018.8697386
21. Madiwalar S (2020) Classification and investigation of Alzheimer disease using machine learning algorithms. Biochem Biophys Res Commun
22. Mahmud M, Kaiser MS, McGinnity TM, Hussain A (2021) Deep learning in mining biological data. Cogn Comput 13(1):1–33
23. Naz S, Ashraf A, Zaib A (2022) Transfer learning using freeze features for Alzheimer neurological disorder detection using ADNI dataset. Multimedia Syst 28:85–94. https://doi.org/10.1007/s00530-021-00797-3
24. Neelaveni J, Devasana MSG (2020) Alzheimer disease prediction using machine learning algorithms. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), pp 101–104. https://doi.org/10.1109/ICACCS48705.2020.9074248
25. Noor MBT, Zenia NZ, Kaiser MS, Mamun SA, Mahmud M (2020) Application of deep learning in detecting neurological disorders from magnetic resonance images: a survey on the detection of Alzheimer's disease, Parkinson's disease and schizophrenia. Brain Inform 7(1):1–21
26. Oishi K, Mielke MM, Albert M, Lyketsos CG, Mori S (2011) DTI analyses and clinical applications in Alzheimer's disease. J Alzheimers Dis 26(Suppl 3):287–296. https://doi.org/10.3233/JAD-2011-0007. PMID: 21971468. PMCID: PMC3294372
27. Perera S, Hewage K, Gunarathne C, Navarathna R, Herath D, Ragel RG (2020) Detection of novel biomarker genes of Alzheimer's disease using gene expression data. In: 2020 Moratuwa engineering research conference (MERCon), pp 1–6. https://doi.org/10.1109/MERCon50084.2020.9185336
28. Rallabandi VPS, Tulpule K, Gattu M (2020) Automatic classification of cognitively normal, mild cognitive impairment and Alzheimer's disease using structural MRI analysis. Inform Med Unlocked 100305. https://doi.org/10.1016/j.imu.2020.100305
29. Rohini M, Surendran D (2019) Classification of neurodegenerative disease stages using ensemble machine learning classifiers. Procedia Comput Sci 165:66–73. https://doi.org/10.1016/j.procs.2020.01.071
30. Rohini M, Surendran D (2021) Toward Alzheimer's disease classification through machine learning. Soft Comput 25:2589–2597. https://doi.org/10.1007/s00500-020-05292-x
31. Shah A, Lalakiya D, Desai S, Shreya, Patel V (2020) Early detection of Alzheimer's disease using various machine learning techniques: a comparative study. In: 2020 4th international conference on trends in electronics and informatics (ICOEI) (48184). https://doi.org/10.1109/icoei48184.2020.9142975
32. Talwar P, Kushwaha S, Chaturvedi M et al (2021) Systematic review of different neuroimaging correlates in mild cognitive impairment and Alzheimer's disease. Clin Neuroradiol 31:953–967. https://doi.org/10.1007/s00062-021-01057-7
33. Thapa S, Singh P, Jain DK, Bharill N, Gupta A, Prasad M (2020) Data-driven approach based on feature selection technique for early diagnosis of Alzheimer's disease. In: 2020 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207359
34. Thushara A, UshaDevi Amma C, John A, Saju R (2020) Multimodal MRI based classification and prediction of Alzheimer's disease using random forest ensemble. In: 2020 advanced computing and communication technologies for high performance applications (ACCTHPA), pp 249–256. https://doi.org/10.1109/ACCTHPA49271.2020.9213211
35. Zulfiker MS, Kabir N, Biswas AA, Nazneen T, Uddin MS (2021) An in-depth analysis of machine learning approaches to predict depression. Curr Res Behav Sci 2:100044. https://doi.org/10.1016/j.crbeha.2021.100044
Ensemble Machine Learning Technique for Identifying COVID-19 from CT Scan Images Rahul Deb Mohalder, Apu Sarder, Khandkar Asif Hossain, Laboni Paul, and Farhana Tazmim Pinki
Abstract COVID-19 has caused a pandemic around the globe, and its alarmingly rapid spread is also affecting the economy. Automatic identification of COVID-19 is now needed to avoid time-consuming testing processes and erroneous detection. In this research, we propose an ensemble machine learning-based technique for detecting COVID-19 and compare the results with existing deep learning-based approaches. A publicly available SARS-CoV-2 computerized tomography (CT) scan dataset is used, which contains 2482 CT scan images: 1252 COVID-positive cases and 1230 negative cases. Linear discriminant analysis is used to reduce the dimensionality of the dataset. We applied state-of-the-art machine learning techniques and compared their accuracy. Random forest and extreme gradient boosting provide accuracy of about 99%, which is comparable to or better than other works that use deep learning techniques. The proposed method can support people and policymakers in detecting COVID-19 automatically and reduce the suffering of the general public. Keywords COVID-19 · Machine learning · Image analysis · CT scan image
R. Deb Mohalder (B) · L. Paul · F. Tazmim Pinki Khulna University, Khulna, Bangladesh e-mail: [email protected] L. Paul e-mail: [email protected] F. Tazmim Pinki e-mail: [email protected] A. Sarder · K. Asif Hossain North Western University, Khulna, Bangladesh e-mail: [email protected] K. Asif Hossain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_2
1 Introduction

COVID-19, a coronavirus infection, has astonished the globe with its quick spread and potential virulence [8]. Funk et al. [7] presented a diagram of how humans are infected with the SARS-CoV-2 virus. Rapid testing and the implementation of isolation protocols are necessary to fight this epidemic. The RT-PCR technique is suitable in terms of accuracy, but it still has limitations for bulk testing, so new testing techniques are due. For that reason, detection techniques based on computer vision, machine learning, and deep learning deserve attention. As these techniques are highly data-driven, they cannot be used as a primary testing mechanism for COVID-19; nonetheless, once a considerable amount of high-quality data has been accumulated, they can be highly efficient. Once thoracic CT is used to diagnose or screen patients, as recent studies suggest, there is an immediate need to assess a potentially very large number of imaging studies quickly. Machine learning (ML) technology has the potential to assist radiologists with data triage, trend analysis, and quantification. Chest CT scans can reveal lung problems, and ML solutions can examine many cases simultaneously to determine whether a chest CT scan reveals such problems. Once validated and proven, these systems, or variations on them, can be extremely useful in identifying and treating virus-infected people. Just as COVID-19 represents a novel coronavirus strain not previously detected in people, and possibly a morphology of another coronavirus, an ML technique can be quickly built from one or more algorithms that perform a comparable task. We therefore expect that ML-based strategies can be developed rapidly by modifying and adapting existing ML models and combining them with prior clinical knowledge to address new difficulties such as the COVID-19 category. Our objective is to create automated, machine learning-based CT image processing tools that can distinguish coronavirus patients from those who do not have the disease, to help diagnose, evaluate, and monitor disease progression [4, 16]. We organized this paper as follows. Section 2 reviews previous COVID-19 detection techniques. Section 3 describes the COVID-19 dataset used in this work. Section 4 presents our methodology. In Sect. 5 we analyze our outcomes and contrast them with other researchers' work. Finally, in Sect. 6, we give the summary and future plan of this research.
2 Related Works

Fang et al. [6] evaluated the sensitivity of the COVID-19 gold-standard assay, reverse transcription-polymerase chain reaction (RT-PCR), which identifies viral nucleic acid, against non-contrast chest computed tomography (CT). According to the authors, initial RT-PCR sensitivity was only 71%. In comparison, non-contrast chest CT was projected to have a sensitivity of 98%
for identifying COVID-19 infection (relative to the results of the first RT-PCR test). The ground-glass opacities in the examples they described in their research might be diffuse or focal. Xie et al. [21] also examined the lack of sensitivity of early RT-PCR testing and argued that chest CT should be utilized to prevent false-negative laboratory investigations; they revealed that 3% of 167 patients had negative RT-PCR results despite infection, while showing chest CT characteristics indicative of viral pneumonia. Bernheim et al. [3] evaluated 121 chest CT scans from four Chinese hospitals during the very early, middle, and late phases of illness. They identified ground-glass opacities—specifically, bilateral and peripheral consolidative and ground-glass pulmonary opacities—as a characteristic of the infection. With increasing time since the start of symptoms, they saw rising disease severity, and later illness indicators included more frequent reverse halo signs, linear opacities, crazy-paving patterns, and greater lung involvement. Bilateral lung involvement was present in 28% of patients with early disease, 76% of patients with intermediate disease, and 88% of patients with late illness (6–12 days). Maghdid et al. aimed to develop an extensive collection of X-ray and CT images from several sources and a straightforward yet efficient COVID-19 detection method using deep learning and transfer learning techniques. The collected X-ray and CT images were processed using a simple modified pre-trained AlexNet design and a convolutional neural network (CNN). According to their tests, the techniques achieved accuracy values of up to 98% with the pre-trained network and 94.1% with the modified CNN [12]. El Asnaoui et al. compared contemporary deep learning models and discovered that DenseNet201 and Inception-ResNetV2 generated superior results (Inception-ResNetV2 had an accuracy of 92.18%, while DenseNet201 had an accuracy of 88.09%) [5]. Ahsan et al. compared recent deep learning models and found that, in terms of accuracy, NasNetMobile fared better than the other models on chest X-ray (95.4–100%) and CT scan (81.5–95.2%) image datasets [1]. Ahuja et al. experimentally evaluated popular pre-trained architectures and found that, on the analyzed image dataset, the pre-trained transfer learning-based ResNet18 model outperformed the alternatives (99.82% for training, 97.32% for validation, and 99.4% for testing) [2]. Jin et al., Wang et al., Xu et al., and Zheng et al. [9, 20, 22, 23] also proposed deep learning-based models to detect COVID-19 from CT scan images. Ozsahin et al. reviewed AI approaches to COVID-19 diagnosis from chest CT [14]. Kumar et al. [10] suggested a framework that collects a small amount of normalized data from various hospitals, acquired with different types of computed tomography (CT) scanners, and trains a global deep learning model with Capsule Network-based segmentation and classification and blockchain-backed federated learning. The data is authenticated via blockchain technology, and federated learning develops the model worldwide while maintaining the organizations' anonymity. They conducted extensive trials to validate their suggested strategy and showed that it outperformed the alternatives in identifying COVID-19 patients.
Fig. 1 a A COVID-affected CT scan image and b a non-COVID CT scan image
Fig. 2 Workflow diagram of our proposed system
3 Dataset

In this research, we used the well-known, frequently updated SARS-CoV-2 CT-scan dataset [19]. This dataset comprises CT scan data from actual patients at hospitals in São Paulo, Brazil, where the ratio of negative to positive SARS-CoV-2 infections is roughly 50:50. Figure 1 illustrates a COVID and a non-COVID CT scan image from the dataset. The dataset comprises 1252 CT scans that are positive for SARS-CoV-2 infection (COVID-19) and 1230 CT scans that are negative, for a total of 2482 CT scans.
4 Methodology

This section describes the approaches used to develop our ML model and improve the accuracy of its predictions. Figure 2 illustrates an architectural overview of our proposed COVID-19 detection system.
4.1 Image Preprocessing

In the image preprocessing step, we first resize the images to 150 × 150 pixels. We then apply a filtering technique; the filtering process lets us recover the relevant information from the CT scan images. Through a color space conversion process, we remove unwanted or unusable parts of the image. In the last step of this process, we apply the image augmentation technique to the CT scan dataset.
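A minimal sketch of this pipeline is given below, assuming OpenCV; the median-filter kernel size and the flip-based augmentation are illustrative assumptions, since the paper does not state which filter or augmentations were used.

```python
# Sketch of the preprocessing steps described above (OpenCV assumed).
import cv2

def preprocess(path):
    img = cv2.imread(path)                        # BGR image from disk
    img = cv2.resize(img, (150, 150))             # resize to 150 x 150 pixels
    img = cv2.medianBlur(img, 3)                  # filtering (kernel size assumed)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # color space conversion
    return gray

def augment(img):
    # Simple augmentation: original plus horizontal flip (one possible choice).
    return [img, cv2.flip(img, 1)]
```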
4.2 Image Segmentation and Dataset Splitting

We segmented the images using the K-means clustering method. We then split the dataset into two groups, a training set and a test set: 80% of the data were set aside for training and 20% for testing.
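The sketch below shows one plausible reading of these two steps with scikit-learn; the number of clusters (2) and the fixed random seed are assumptions, not values reported by the authors.

```python
# K-means segmentation of one grayscale image plus an 80/20 dataset split.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def segment(gray, n_clusters=2):
    pixels = gray.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
    # Replace each pixel by its cluster centre to obtain a segmented image.
    return km.cluster_centers_[km.labels_].reshape(gray.shape)

# X: feature matrix built from the segmented images, y: COVID / non-COVID labels.
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.20, stratify=y, random_state=0)
```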
4.3 Classification

We applied several classification methods to classify the CT scan images. In this study, we used the classification techniques Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (CART), Random Forest (RF), and XGBClassifier (XGBoost). LR: Logistic Regression (LR) is a supervised machine learning method for classification. It models the probability of a class label from the independent variables, and its primary use is determining the relationship between the variables and the prediction. With LR, we were able to attain an accuracy of 82.8%. LDA: Linear Discriminant Analysis (LDA) is a dimensionality reduction method commonly used to model differences between groups while training models with supervised learning methods. Some of the extensions of LDA are QDA, FDA, and RDA. We used LDA and achieved 84.4% accuracy. KNN: KNN, often known as "K-nearest neighbors," is a supervised learning technique primarily used to classify data based on the categorization of its neighbours. KNN keeps all of the available examples and categorizes additional cases using a similarity metric. We used KNN and achieved 94.9% accuracy. CART: Classification and Regression Trees (CART) may be used to solve classification as well as regression problems; the distinction lies in the target variable, and for classification we attempt to predict a class label. We used CART and achieved 96.4% accuracy. RF: Random Forest is a well-known machine learning method that falls within the supervised learning family.
Table 1 Accuracy and loss results of our proposed system

Model     Accuracy  Loss
LR        0.828     0.021
LDA       0.844     0.025
KNN       0.949     0.010
CART      0.964     0.005
RF        0.981     0.006
XGBoost   0.986     0.003
It can be used for both classification and regression problems. By increasing the number of decision trees, accuracy may be improved and over-fitting controlled. We utilized Random Forest and attained an accuracy of 98.1%. Random Forest's fundamental (Gini impurity) equation is as follows:

Gini = 1 - \sum_{i=1}^{c} (P_i)^2

where P_i reflects the relative frequency of class i in the dataset under consideration and c indicates the number of classes. XGBoost: XGBoost implements gradient-boosted decision trees. This technique generates decision trees sequentially. In XGBoost, weights have a significant impact: weights are assigned to each independent variable before it is entered into the decision tree, which forecasts outcomes. We utilized XGBoost and attained an accuracy of 98.6%. XGBoost's simplified fundamental equation is as follows:

\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

where \tilde{L}^{(t)} is the simplified objective at iteration t, f_t is the tree added at that iteration, g_i and h_i are the first- and second-order gradients of the loss for sample x_i, and \Omega(f_t) is the regularization term.
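As a minimal sketch (not the authors' code), the six-model comparison could be run with scikit-learn and the xgboost package as follows; all hyperparameters are left at library defaults, which is an assumption, since the paper does not report them.

```python
# Train the six classifiers and report the test accuracy of each.
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
}

def compare(X_train, y_train, X_test, y_test):
    for name, model in models.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{name}: {acc:.3f}")
```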
5 Result Analysis and Discussions

We used only the SARS-CoV-2 CT scan dataset [19] in this study, with 2482 CT scan images in two groups. We trained our model on 80% of the whole data and tested (verified) it on the remaining 20%. Our main target was to classify the CT scan images correctly into two groups, COVID-positive or COVID-negative. We achieved the best accuracy with RF and XGBoost: RF accuracy was 98.1% with 0.006 loss, and XGBoost accuracy was 99.04% with 0.003 loss. Table 1 summarizes the accuracy and loss of each classifier. Precision, recall, and F1-score values of the XGBoost classifier were also calculated.
Table 2 Performance measures of the XGBoost classifier

              Precision  Recall  F1-score  Support
COVID         1.00       1.00    1.00      246
Non-COVID     1.00       1.00    1.00      246
Accuracy                         1.00      492
Macro avg     1.00       1.00    1.00      492
Weighted avg  1.00       1.00    1.00      492
Fig. 3 Confusion matrix of XGBoost classifier
Table 2 presents those outcomes, and Fig. 3 shows the confusion matrix of the XGBoost classifier. In the result analysis process, we compared the accuracy of all the ML algorithms and presented the comparison as a box plot; Fig. 4 illustrates this box-plot comparison of the algorithms' accuracy. Figure 5 shows the ROC curve, which characterizes classification (prediction) performance; in terms of AUC, the ensemble method reaches the apex. Table 3 presents a comparative performance analysis of our proposed model against previous works. Our proposed system's lowest accuracy was 83.00% and its highest accuracy was 99.04%.
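As a minimal sketch (placeholder names, not the authors' code), the confusion matrix and ROC curve of Figs. 3 and 5 could be produced with scikit-learn and Matplotlib as follows, where xgb is the fitted XGBoost model and X_test/y_test are the held-out split.

```python
# Confusion matrix and ROC curve for the fitted XGBoost model.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve, auc

y_pred = xgb.predict(X_test)
print(confusion_matrix(y_test, y_pred))

# The ROC curve needs scores, not hard labels: use the positive-class probability.
scores = xgb.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
plt.plot(fpr, tpr, label=f"XGBoost (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```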
Fig. 4 Machine learning algorithm accuracy comparison
Fig. 5 ROC curve

Table 3 Comparison with other works

Reference               Algorithm                        Accuracy (%)
Kundu et al. [11]       Fuzzy rank-based CNN model       98.93
Manocha et al. [13]     Deep fusion strategy             98.78
Panwar et al. [15]      DL and Grad-CAM based model      89.47–95.61
Rahimzadeh et al. [17]  Fully automated DL network       98.49
Sen et al. [18]         Bi-stage feature selection       98.39
Our model               LR, LDA, KNN, CART, RF, XGBoost  83.00–99.04
6 Conclusion

Because COVID-19 spreads so rapidly, automatic detection is required. This study sought to identify even the slightest signs of COVID-19 in patients, comparing the accuracy of several machine learning approaches used to categorize COVID-19. According to the study, RF and XGBoost offer greater accuracy than the other machine learning techniques, and the suggested method beats other current methods based on deep learning. Future studies will focus on improving classification performance and optimizing the proposed model. Furthermore, additional work will be done to grade the images based on the infection ratio, which is critical for detecting and treating lung disease in practical applications.
References 1. Ahsan MM, Gupta KD, Islam MM, Sen S, Rahman M, Hossain MS et al (2020) Study of different deep learning approach with explainable AI for screening patients with covid-19 symptoms: using CT scan and chest x-ray image dataset. arXiv preprint arXiv:2007.12525 2. Ahuja S, Panigrahi BK, Dey N, Rajinikanth V, Gandhi TK (2021) Deep transfer learning-based automated detection of covid-19 from lung CT scan slices. Appl Intell 51(1):571–585 3. Bernheim A, Mei X, Huang M, Yang Y, Fayad ZA, Zhang N, Diao K, Lin B, Zhu X, Li K, et al (2020) Chest CT findings in coronavirus disease-19 (covid-19): relationship to duration of infection. Radiology 200463 4. Biswas M et al (2021) Accu3rate: a mobile health application rating scale based on user reviews. PloS One 16(12):e0258050 5. El Asnaoui K, Chawki Y (2021) Using x-ray images and deep learning for automated detection of coronavirus disease. J Biomol Struct Dyn 39(10):3615–3626 6. Fang Y, Zhang H, Xie J, Lin M, Ying L, Pang P, Ji W (2020) Sensitivity of chest CT for covid-19: comparison to RT-PCR. Radiology 296(2):E115–E117 7. Funk CD, Laferrière C, Ardakani A (2020) A snapshot of the global race for vaccines targeting SARs-cov-2 and the covid-19 pandemic. Front Pharmacol 11. https://doi.org/10.3389/fphar. 2020.00937 8. Jesmin S, Kaiser MS, Mahmud M (2020) Towards artificial intelligence driven stress monitoring for mental wellbeing tracking during covid-19. In: Proceedings of the WI-IAT, pp 845–851 9. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, Deng L, Zheng C, Zhou J, Shi H et al (2020) Development and evaluation of an artificial intelligence system for covid-19 diagnosis. Nat Commun 11(1):1–14 10. Kumar R, Khan AA, Kumar J, Golilarz NA, Zhang S, Ting Y, Zheng C, Wang W et al (2021) Blockchain-federated-learning and deep learning models for covid-19 detection using ct imaging. IEEE Sens J 21(14):16301–16314 11. Kundu R, Basak H, Singh PK, Ahmadian A, Ferrara M, Sarkar R (2021) Fuzzy rank-based fusion of CNN models using Gompertz function for screening covid-19 CT-scans. Sci Rep 11(1):1–12 12. Maghdid HS, Asaad AT, Ghafoor KZ, Sadiq AS, Mirjalili S, Khan MK (2021) Diagnosing covid-19 pneumonia from x-ray and CT images using deep learning and transfer learning algorithms. In: Multimodal image exploitation and learning 2021, vol 11734. International Society for Optics and Photonics, p 117340E 13. Manocha A, Bhatia M (2022) A novel deep fusion strategy for covid-19 prediction using multimodality approach. Comput Electr Eng 103:108274
14. Ozsahin I, Sekeroglu B, Musa MS, Mustapha MT, Uzun Ozsahin D (2020) Review on diagnosis of covid-19 from chest CT images using artificial intelligence. Comput Math Methods Med 2020 15. Panwar H, Gupta P, Siddiqui MK, Morales-Menendez R, Bhardwaj P, Singh V (2020) A deep learning and grad-cam based color visualization approach for fast detection of covid-19 cases using chest x-ray and ct-scan images. Chaos, Solitons Fractals 140:110190 16. Paul A, Basu A, Mahmud M, Kaiser MS, Sarkar R (2022) Inverted bell-curve-based ensemble of deep learning models for detection of covid-19 from chest x-rays. Neural Comput Appl 1–15 17. Rahimzadeh M, Attar A, Sakhaei SM (2021) A fully automated deep learning-based network for detecting covid-19 from a new and large lung CT scan dataset. Biomed Signal Process Control 68:102588 18. Sen S, Saha S, Chatterjee S, Mirjalili S, Sarkar R (2021) A bi-stage feature selection approach for covid-19 prediction using chest CT images. Appl Intell 51(12):8985–9000 19. Soares E, Angelov P, Biaso S, Froes MH, Abe DK (2020) Sars-cov-2 CT-scan dataset: a large dataset of real patients CT scans for Sars-cov-2 identification. MedRxiv. https://doi.org/10. 1101/2020.04.24.20078584. https://www.medrxiv.org/content/early/2020/05/14/2020.04.24. 20078584 20. Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X et al (2021) A deep learning algorithm using CT images to screen for corona virus disease (covid-19). Eur Radiol 31(8):6096–6104 21. Xie X, Zhong Z, Zhao W, Zheng C, Wang F, Liu J (2020) Chest CT for typical coronavirus disease 2019 (covid-19) pneumonia: relationship to negative RT-PCR testing. Radiology 296(2):E41–E45 22. Xu X, Jiang X, Ma C, Du P, Li X, Lv S, Yu L, Ni Q, Chen Y, Su J et al (2020) A deep learning system to screen novel coronavirus disease 2019 pneumonia. Engineering 6(10):1122–1129 23. Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H, Liu W, Wang X (2020) Deep learning-based detection for covid-19 from chest CT using weak label. MedRxiv
Machine Learning-Based Tomato Leaf Disease Diagnosis Using Radiomics Features Faisal Ahmed, Mohammad Naim Uddin Rahi, Raihan Uddin, Anik Sen, Mohammad Shahadat Hossain, and Karl Andersson
Abstract Tomato leaves can be infected with various infectious viruses and fungal diseases that drastically reduce tomato production and incur a great economic loss. Therefore, tomato leaf disease detection and identification are crucial for maintaining the global demand for tomatoes for a large population. This paper proposes a machine learning-based technique to identify diseases on tomato leaves and classify them into three diseases (Septoria, Yellow Curl Leaf, and Late Blight) and one healthy class. The proposed method extracts radiomics-based features from tomato leaf images and identifies the disease with a gradient boosting classifier. The dataset used in this study consists of 4000 tomato leaf disease images collected from the Plant Village dataset. The experimental results demonstrate the effectiveness and applicability of our proposed method for tomato leaf disease detection and classification. Keywords Tomato leaf disease · Machine learning · Radiomics features · Classification
1 Introduction

Bangladesh is an agricultural country. Agriculture is one of the driving forces of Bangladesh's economy, contributing around 12.92% of GDP. About 50% of Bangladesh's population primarily engages in agriculture, with more than 70% of the land devoted to growing crops. The tomato is one of the most essential and popular vegetables
in Bangladesh. Bangladesh is the third-largest tomato producer in South Asia, with a cultivation area of nearly 13,066 hectares and production quantities of almost 74,000 metric tons. The tomato is one of the most popular and nutritious vegetable crops worldwide. It contains three essential antioxidants: vitamins E and C and beta-carotene. It also has a vital antioxidant called lycopene, which helps prevent cancer. In 2014–15, Bangladesh produced about 413,610 MT of tomatoes; 95% of tomatoes were consumed fresh (BBS 2015). The total market value of fresh and processed tomatoes for FY 2014–15 was approximately USD 256 million. From 2005 to 2015, tomato production increased at an average rate of roughly 13% per year, the highest in South Asia [42]. Unfortunately, however, all the cultivated varieties of tomato plants suffer from several diseases, especially on their leaves, over their entire growth period. Such diseases damage 10–30% of the whole tomato crop and cost at least $220 billion. Therefore, diagnosing tomato leaf disease is necessary to avoid such a substantial financial loss. It is challenging and time-consuming for humans to detect leaf diseases accurately because of their complex nature. Therefore, much research has been done on diagnosing tomato leaf diseases using artificial intelligence techniques such as machine learning and image processing. In recent years, artificial intelligence (AI) has attracted many researchers to contribute in diverse fields and challenging research assignments such as brain signal analysis [46], neurodevelopmental disorder assessment and classification focusing on autism [7, 11, 12, 14, 23, 35, 45], neurological disorder detection and management [10, 13, 15, 43], ultrasound image analysis [44], various disease diagnoses [17, 19, 33, 36–38], smart healthcare service delivery [16, 22, 32], text and social media mining [2, 24, 39], understanding student engagement [41], etc. This paper presents a method for diagnosing the three most common tomato leaf diseases, Late Blight, Septoria, and Yellow Curl, using radiomic features and the gradient boosting algorithm. It can assist farmers in avoiding financial loss through early diagnosis of tomato leaf disease.
2 Literature Review

A review of existing works highlights the various directions this research field has taken. The recent success of deep learning algorithms has led researchers to investigate their performance in identifying leaf diseases. This section presents some existing studies on identifying and classifying leaf diseases, especially tomato leaf diseases. A CNN model consisting of 3 convolutional layers and 3 max-pooling layers was designed to detect tomato leaf disease utilizing samples from the PlantVillage dataset [3]. There were 10000, 7000, and 500 images in the training, validation, and test sets, respectively. The proposed model attained an accuracy of 91.2% for the nine diseases and one healthy class. Another study on a similar dataset combined AlexNet and SqueezeNet to recognize tomato plant leaf disease [21].
In [25], the authors introduced a CNN model for identifying tomato leaf disease using the SqueezeNet architecture. The images employed to train and test this model were provided by the Vegetable Crops Research Institute (Balitsa) in Lembang. They used 1400 images of tomato leaves from seven different classes, including healthy leaves. Applying cross-validation, the proposed model recorded 86.92% accuracy. The authors of [50] proposed a transfer learning approach from pre-trained InceptionV3 to detect disease in tomato leaves. Their proposed model reached 92.19% accuracy on the training set and 93.03% accuracy on the test set. To identify tomato leaf disease, Tian et al. [47] proposed a deep neuro-fuzzy network model. They built this model using a large dataset that included eight different types of tomato leaves, both infected and uninfected. To extract complex features, they used the neuro-fuzzy network's fuzzy inference and pooling layers. The authors of [34] proposed a modified YOLOv3 object detection model to detect tomato diseases; multiple feature detection was applied based on the image pyramid, attaining an accuracy of 92.39%. All the works mentioned above are based on deep learning techniques, which require high computational power and a vast volume of data. To the best of our knowledge, no prior work has explored radiomic features with traditional machine learning for tomato leaf disease detection. In this study, we have addressed this research gap. We have proposed a method for the diagnosis of tomato leaf disease that extracts radiomic features and feeds them to a gradient boosting classifier to predict disease. The proposed method achieves state-of-the-art results.
3 Proposed Method

The block diagram of our proposed method is shown in Fig. 1. As the figure shows, the proposed method consists of four steps: image preprocessing, feature extraction, feature selection, and classification. Each of these steps is described below.
3.1 Image Preprocessing

In this step, all noisy, blurry, and distorted images are removed from the dataset. After that, the background is subtracted from the images. Finally, all the images are resized to 256 × 256 pixels.
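A small sketch of this cleaning step, assuming OpenCV; the Otsu threshold used for the background subtraction is an illustrative choice, not the authors' stated procedure.

```python
# Background subtraction and resizing for one leaf image.
import cv2

def clean_leaf(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding separates the leaf from the background (assumed method).
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    img = cv2.bitwise_and(img, img, mask=mask)
    return cv2.resize(img, (256, 256))
```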
3.2 Feature Extraction

In this step, several discriminatory feature groups known as radiomics features are extracted from the background-removed images. To the best of our knowledge,
Fig. 1 Block diagram of proposed method
Table 1 Summary of the different types of features extracted

Feature group                                     Dimension
First order statistics                            19
Shape-based (3D)                                  16
Shape-based (2D)                                  10
Gray Level Co-occurrence Matrix (GLCM)            24
Gray Level Run Length Matrix (GLRLM)              16
Gray Level Size Zone Matrix (GLSZM)               16
Neighbouring Gray Tone Difference Matrix (NGTDM)  5
Gray Level Dependence Matrix (GLDM)               14
Total                                             120
radiomics-based features are used for the first time for tomato leaf disease classification in this study. The feature groups extracted for tomato leaf disease identification include First Order Statistics, Shape-based (3D), Shape-based (2D), Gray Level Co-occurrence Matrix, Gray Level Run Length Matrix, Gray Level Size Zone Matrix, Neighbouring Gray Tone Difference Matrix, and Gray Level Dependence Matrix [1]. The feature dimension of each feature group is shown in Table 1.
All the feature groups are concatenated together to form a feature vector to train a classifier for tomato leaf diseases.
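A minimal sketch of this extraction step with the PyRadiomics package (the library cited in [1]) is shown below; treating the whole background-subtracted leaf as the mask, and the extractor settings, are assumptions, since the paper does not state them.

```python
# Extract the radiomics feature groups from one leaf image and flatten them
# into a single feature vector.
import numpy as np
import SimpleITK as sitk
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
extractor.disableAllFeatures()
# Enable the feature classes listed in Table 1 (names per PyRadiomics).
for cls in ["firstorder", "shape2D", "glcm", "glrlm", "glszm", "ngtdm", "gldm"]:
    extractor.enableFeatureClassByName(cls)

def leaf_features(gray_image):
    # Wrap the 2D image as a single-slice 3D volume, as PyRadiomics expects.
    img = sitk.GetImageFromArray(gray_image[np.newaxis, ...].astype(np.float32))
    # Assumed mask: the entire (background-subtracted) leaf region.
    mask = sitk.GetImageFromArray((gray_image[np.newaxis, ...] > 0).astype(np.uint8))
    result = extractor.execute(img, mask)
    # Keep only the computed feature values, dropping the diagnostics entries.
    return np.array([v for k, v in result.items() if k.startswith("original_")],
                    dtype=float)
```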
3.3 Feature Selection

Not all of the extracted radiomic features may be significant for classification, and using them all may lead to the curse of dimensionality, which degrades performance. Therefore, the features important for identifying leaf diseases are chosen using the Analysis of Variance (ANOVA) hypothesis-testing approach. Based on the test statistics, we picked the 50 features whose p-values are less than the significance level of 0.01. The ANOVA test shows that first-order statistics, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), and Gray Level Size Zone Matrix (GLSZM) are the most important feature groups for tomato leaf disease classification.
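One plausible implementation of this ANOVA-based selection is scikit-learn's f_classif scoring, sketched below; casting the "p < 0.01" criterion as a top-50 selection mirrors the paper's description, but the exact procedure is an assumption, and X_train/y_train are placeholder names.

```python
# Keep the 50 features with the strongest ANOVA F-statistics.
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=50)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# The p-values are available for checking the 0.01 significance threshold.
print((selector.pvalues_ < 0.01).sum(), "features below p = 0.01")
```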
3.4 Classification

Finally, this study applies the Gradient Boosting (GB) algorithm to identify tomato leaf diseases and healthy leaves. GB is an ensemble machine learning technique that iteratively turns a group of weak learners into a single robust learner by optimizing any differentiable loss function using the gradient descent method [20].
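A minimal sketch of this classifier with scikit-learn follows; the hyperparameters shown are library defaults and purely illustrative, as the paper does not report the configuration used.

```python
# Train a gradient boosting classifier on the selected radiomics features.
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=0)
gb.fit(X_train_sel, y_train)
print("Test accuracy:", gb.score(X_test_sel, y_test))
```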
4 Experiments and Results

In this section, we describe our experiments, present the results, and discuss them.
4.1 Datasets

The dataset for this study is collected from the PlantVillage website and includes images of tomato leaves with three types of disease (Yellow Curl Leaf, Late Blight, and Septoria) as well as healthy leaves. The dataset has a total of 4000 images, 1000 of each category. We divided the dataset into train and test sets with an 80:20 ratio: the model is trained on 80% of the data and tested on the rest.
4.2 Experimental Setup

All the experiments were performed on an Intel Core i5-6400 CPU @ 2.70 GHz running the Windows operating system. The Python (3.7.7) programming language was used to implement the method. Several Python libraries were also used, including PyRadiomics, SimpleITK, logging, Pandas, NumPy, Scikit-learn, OpenCV (cv2), Matplotlib, and Seaborn.
4.3 Performance Measure

The performance metrics used in this study are accuracy, precision, sensitivity/recall, and F1-score, all widely used performance measures for classification problems. The metrics are calculated using equations (1)–(4). Accuracy is defined as the ratio of the total number of predictions that are correct:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \quad (1)

Precision can be thought of as a measure of exactness. It measures what percentage of tuples labelled as positive are actually such:

Precision = \frac{TP}{TP + FP} \quad (2)

Recall, also known as sensitivity, is a measure of completeness. It is the proportion of positive observed values correctly predicted as positive:

Recall = \frac{TP}{TP + FN} \quad (3)

F1-score combines precision and recall into a single measure: it is their harmonic mean, defined as

F1\text{-}score = \frac{2 \times precision \times recall}{precision + recall} \quad (4)
We also graphically show the performance of our proposed method for different threshold values using the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate for different thresholds. Curves that approach the top-left corner indicate better performance; the closer a curve lies to the forty-five-degree diagonal of the ROC space, the less discriminative the test.
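As a small illustration of equations (1)–(4) (not code from the paper), the sketch below computes the four metrics directly from confusion-matrix counts; the counts shown are arbitrary example values.

```python
# Compute the four metrics of equations (1)-(4) from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=95, tn=90, fp=10, fn=5))  # example counts only
```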
4.4 Result

We compared our method with other machine learning-based classifiers: k-nearest neighbour (KNN), Decision Tree (DT), and Support Vector Machine (SVM). For a fair comparison, the same set of features is used to train all the classifiers. The results in terms of macro precision, weighted precision, macro f1, weighted f1, macro recall, and weighted recall are shown in Table 2. The results show that our proposed method achieved the highest score on all performance measures, which indicates its superiority. The results of the other classifiers do not deviate significantly from the proposed method, which demonstrates that the extracted features are significant in classifying different leaf diseases. The class-wise classification performance of our proposed method is shown in Table 3. We can see that precision, recall, and f1-score are consistent across all the classes. This supports the robustness and generality of our proposed method. The class-wise ROC curves and their AUC values are shown in Fig. 2 for a better visual representation of the performance of our proposed method. The figure shows that each class achieves a near-perfect AUC value; the micro-average and macro-average AUC scores are both 0.99. Such performance comes from the use of an efficient set of radiomics features and the gradient boosting tree classifier.
Table 2 Comparison of our proposed method with other machine learning algorithms

Method                 Macro          Weighted       Macro   Weighted  Macro       Weighted
                       precision (%)  precision (%)  f1 (%)  f1 (%)    recall (%)  recall (%)
Proposed method (GBC)  94             94             94      94        94          94
KNN                    82             82             82      81        82          82
DT                     89             89             89      89        89          89
SVC                    78             78             77      77        77          77
Table 3 Class-wise classification report of the proposed method

Category               Precision (%)  Recall (%)  F1-score (%)
Healthy leaf           99             98          98
Yellow curl leaf spot  95             96          96
Late blight leaf spot  90             92          91
Septoria leaf spot     92             91          92
Fig. 2 Class-wise receiver operating characteristic (ROC) curves of the proposed method
5 Conclusion

Agriculture is one of the driving forces of Bangladesh's economy, and the majority of the population is engaged in it. Tomatoes are among the most common crops grown in huge quantities, so tomato leaf disease detection matters for the economy's growth. In this study, we have developed a machine learning-based approach to identify Bangladesh's three most common types of tomato leaf disease. The proposed model extracts significant radiomic features from tomato leaf images and identifies disease types with 94% accuracy by applying the gradient boosting algorithm. Therefore, it can be used as a decision tool to help farmers identify tomato leaf diseases. An explainable artificial intelligence technique for tomato leaf disease diagnosis using the belief rule-based expert system [5, 6, 9, 26–29, 31, 48, 49] will be investigated in the future. We will also implement federated learning of convolutional neural networks [4, 8, 18, 30, 40, 51] for multi-institutional collaboration on tomato leaf disease diagnosis.
References 1. Radiomic features (2022). https://pyradiomics.readthedocs.io/en/latest/features.html. Last accessed 13 Feb 2022 2. Adiba FI, Islam T, Kaiser MS, Mahmud M, Rahman MA (2020) Effect of corpora on classification of fake news using Naive bayes classifier. Int J Autom Artif Intell Mach Learn 1(1):80–92 3. Agarwal M, Singh A, Arjaria S, Sinha A, Gupta S (2020) Toled: tomato leaf disease detection using convolution neural network. Procedia Comput Sci 167:293–301 4. Ahmed F, Akther N, Hasan M, Chowdhury K, Mukta MSH (2021) Word embedding based news classification by using CNN. In: 2021 international conference on software engineering & computer systems and 4th international conference on computational science and information management (ICSECS-ICOCSIM). IEEE, pp 609–613 5. Ahmed F, Chakma RJ, Hossain S, Sarma D et al (2020) A combined belief rule based expert system to predict coronary artery disease. In: 2020 international conference on inventive computation technologies (ICICT). IEEE, pp 252–257 6. Ahmed F, Hossain MS, Islam RU, Andersson K (2021) An evolutionary belief rule-based clinical decision support system to predict covid-19 severity under uncertainty. Appl Sci 11(13):5810 7. Ahmed S et al (2022) Toward machine learning-based psychological assessment of autism spectrum disorders in school and community. In: Proceedings of the TEHI, pp 139–149 8. Ahmed TU, Hossain S, Hossain MS, ul Islam R, Andersson K (2019) Facial expression recognition using convolutional neural network with data augmentation. In: 2019 ICIEV. IEEE, pp 336–341 9. Ahmed TU, Jamil MN, Hossain MS, Andersson K, Hossain MS (2020) An integrated realtime deep learning and belief rule base intelligent system to assess facial expression under uncertainty. In: 2020 ICIEV. IEEE, pp 1–6 10. Akhund NU et al (2018) Adeptness: AlzheimerÕs disease patient management system using pervasive sensors-early prototype and preliminary results. In: Proceedings of the Brain Informatics, pp 413–422 11. Akter T et al (2021) Towards autism subtype detection through identification of discriminatory factors using machine learning. In: Proceedings of the brain informatics, pp 401–410 12. Al Banna M et al (2020) A monitoring system for patients of autism spectrum disorder using artificial intelligence. In: Proceedings of the brain informatics, pp 251–262 13. Al Mamun S, Kaiser MS, Mahmud M (2021) An artificial intelligence based approach towards inclusive healthcare provisioning in society 5.0: a perspective on brain disorder. In: Proceedings of the brain informatics, pp 157–169 14. Biswas M, Kaiser MS, Mahmud M, Al Mamun S, Hossain M, Rahman MA et al (2021) An Xai based autism detection: the context behind the detection. In: Proceedings of the brain informatics, pp 448–459 15. Biswas M, Rahman A, Kaiser MS, Al Mamun S, Ebne Mizan KS, Islam MS, Mahmud M (2021) Indoor navigation support system for patients with neurodegenerative diseases. In: Proceedings of the brain informatics, pp 411–422 16. Biswas M et al (2021) Accu3rate: a mobile health application rating scale based on user reviews. PloS one 16(12):e0258050 17. Chen T et al (2022) A dominant set-informed interpretable fuzzy system for automated diagnosis of dementia. Front Neurosci 16:86766 18. Chowdhury RR, Hossain MS, ul Islam R, Andersson K, Hossain S (2019) Bangla handwritten character recognition using convolutional neural network with data augmentation. In: 2019 ICIEV. IEEE, pp 318–323 19. 
Deepa B, Murugappan M, Sumithra M, Mahmud M, Al-Rakhami MS (2021) Pattern descriptors orientation and map firefly algorithm based brain pathology classification using hybridized machine learning algorithm. IEEE Access 10:3848–3863 20. DeepAI: Gradient boosting (2019). https://deepai.org/machine-learning-glossary-and-terms/ gradient-boosting. Last accessed 12 May 2022
21. Durmu¸s H, Güne¸s EO, Kırcı M (2017) Disease detection on the leaves of the tomato plants by using deep learning. In: 2017 6th international conference on agro-geoinformatics. IEEE, pp 1–5 22. Farhin F, Kaiser MS, Mahmud M (2021) Secured smart healthcare system: blockchain and Bayesian inference based approach. In: Proceedings of the TCCE, pp 455–465 (2021) 23. Ghosh T et al (2021) Artificial intelligence and internet of things in screening and management of autism spectrum disorder. Sustain Cities Soc 74:103189 24. Ghosh T et al (2021) An attention-based mood controlling framework for social media users. In: Proceedings of the brain informatics, pp 245–256 25. Hidayatuloh A, Nursalman M, Nugraha E (2018) Identification of tomato plant diseases by leaf image using squeezenet model. In: 2018 international conference on information technology systems and innovation (ICITSI). IEEE, pp 199–204 26. Hossain MS, Ahmed F, Andersson K et al (2017) A belief rule based expert system to assess tuberculosis under uncertainty. J Med Syst 41(3):1–11 27. Hossain MS, Habib IB, Andersson K (2017) A belief rule based expert system to diagnose dengue fever under uncertainty. In: 2017 computing conference. IEEE, pp 179–186 28. Hossain MS, Rahaman S, Kor AL, Andersson K, Pattinson C (2017) A belief rule based expert system for datacenter PUE prediction under uncertainty. IEEE Trans Sustain Comput 2(2):140–153 29. Hossain MS, Rahaman S, Mustafa R, Andersson K (2018) A belief rule-based expert system to assess suspicion of acute coronary syndrome (ACS) under uncertainty. Soft Comput 22(22):7571–7586 30. Islam MZ, Hossain MS, ul Islam R, Andersson K (2019) Static hand gesture recognition using convolutional neural network with data augmentation. In: 2019 ICIEV. IEEE, pp 324–329 31. Islam RU, Hossain MS, Andersson K (2020) A deep learning inspired belief rule-based expert system. IEEE Access 8:190637–190651 32. Kaiser MS et al (2021) 6g access network for intelligent internet of healthcare things: opportunity, challenges, and research directions. In: Proceedings of the TCCE, pp 317–328 33. Kumar I et al (2022) Dense tissue pattern characterization using deep neural network. Cogn Comput 14(5):1728–1751 34. Liu J, Wang X (2020) Tomato diseases and pests detection based on improved yolo v3 convolutional neural network. Front Plant Sci 11:898 35. Mahmud M et al (2022) Towards explainable and privacy-preserving artificial intelligence for personalisation in autism spectrum disorder. In: Proceedings of the HCII, pp 356–370 36. Mammoottil MJ et al (2022) Detection of breast cancer from five-view thermal images using convolutional neural networks. J Healthc Eng 2022 37. Mukherjee H et al (2021) Automatic lung health screening using respiratory sounds. J Med Syst 45(2):1–9 38. Mukherjee P et al (2021) icondet: an intelligent portable healthcare app for the detection of conjunctivitis. In: Proceedings of the AII, pp 29–42 39. Rabby G et al (2018) A flexible keyphrase extraction technique for academic literature. Procedia Comput Sci 135:553–563 40. Rahman KJ, Ahmed F, Akhter N, Hasan M, Amin R, Aziz KE, Islam AM, Mukta MSH, Islam AN (2021) Challenges, applications and design aspects of federated learning: a survey. IEEE Access 9:124682–124700 41. Rahman MA et al (2022) Explainable multimodal machine learning for engagement analysis by continuous performance test. In: Proceedings of the HCII, pp 386–399 42. Sarma P, Ali M (2019) Value chain analysis of tomato: a case study in Jessore district of Bangladesh. 
Int J Sci Res 8(2):924–932 43. Shaffi N et al (2022) Triplet-loss based Siamese convolutional neural network for 4-way classification of Alzheimer's disease. In: Proceedings of the Brain Informatics, pp 277–287 44. Singh R, Mahmud M, Yovera L (2021) Classification of first trimester ultrasound images using deep convolutional neural network. In: Proceedings of the AII, pp 92–105
45. Sumi AI et al (2018) Fassert: a fuzzy assistive system for children with autism using internet of things. In: Proceedings of the brain informatics, pp 403–412 46. Tahura S, Hasnat Samiul S, Shamim Kaiser M, Mahmud M (2021) Anomaly detection in electroencephalography signal using deep learning model. In: Proceedings of the TCCE, pp 205–217 47. Tian X, Meng X, Wu Q, Chen Y, Pan J (2022) Identification of tomato leaf diseases based on a deep neuro-fuzzy network. J Inst Eng (India) Ser A 1–12 48. Ul Islam R, Andersson K, Hossain MS (2015) A web based belief rule based expert system to predict flood. In: Proceedings of the 17th international conference on information integration and web-based applications and services, pp. 1–8 49. Ul Islam R, Hossain MS, Andersson K (2018) A novel anomaly detection algorithm for sensor data under uncertainty. Soft Comput 22(5):1623–1639 50. Wadadare SS, Fadewar H (2022) Deep learning convolution neural network for tomato leaves disease detection by inception. In: International conference on computing in engineering and technology. Springer, pp 208–220 51. Zisad SN, Hossain MS, Andersson K (2020) Speech emotion recognition in neurological disorders using convolutional neural network. In: International conference on brain informatics. Springer, pp 287–296
Effective Feature Extraction via Segmented t-Stochastic Neighbor Embedding for Hyperspectral Image Classification Tanver Ahmed , Md. Hasanul Bari , Masud Ibn Afjal , Adiba Mahjabin Nitu, Md. Palash Uddin , and Md. Abu Marjan
Abstract Remote sensing hyperspectral imagery (HSI) contains significant information about the earth's objects, captured by hundreds of narrow and adjoining spectral bands. When all the bands are considered for classification, performance is hampered; consequently, it is crucial to reduce the HSI bands, usually via feature extraction and feature selection. Principal Component Analysis (PCA) is one of the broadly used unsupervised feature extraction techniques. However, it considers global variance, neglecting the local structure of the data. Segmented PCA (SPCA) overcomes this problem of PCA to some extent. t-Stochastic Neighbor Embedding (t-SNE), another unsupervised feature extraction technique, can also be effectively used for data visualization and feature extraction; t-SNE is a probabilistic approach that tries to preserve the local structure of the dataset by preserving the relative distances between the data points. More subtle local characteristics can be considered by performing segmentation before applying t-SNE, as in SPCA. As such, in this paper, we propose the Segmented t-SNE (Seg t-SNE) feature extraction method, exploiting the benefits of segmentation and t-SNE together. To analyze its efficacy, the performance of our proposed Seg t-SNE is compared with PCA, SPCA, and t-SNE. The experimental outcomes demonstrate that Seg t-SNE (88.70%) outperforms PCA (84.39%), all the variants of SPCA (maximum of 88.59%), and t-SNE (77.52%) considering all classes' samples. Keywords Remote sensing · Hyperspectral images · Feature extraction · PCA · Segmented PCA · t-SNE · Segmented t-SNE
T. Ahmed · Md. H. Bari · M. I. Afjal (B) · A. M. Nitu · Md. P. Uddin · Md. A. Marjan Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh e-mail: [email protected] T. Ahmed e-mail: [email protected] A. M. Nitu e-mail: [email protected] Md. P. Uddin e-mail: [email protected] Md. A. Marjan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_4
1 Introduction

Hyperspectral Images (HSI) are captured with the aid of remote sensing techniques and technologies. Several hundreds of narrow and contiguous bands make up every hyperspectral image, and these characteristics allow the earth's materials to be explored in fine detail. The spectral bands span the range from 0.4 to 2.5 µm [1, 2]. Several conventional applications (e.g., mining, agriculture, geology, and military surveillance) and new ones (food quality, security, pharmaceutical, skin cancer analysis, and verification of counterfeit goods) are being developed using HSI [3–5]. The improved discriminability in data processing provided by HSI, with its spectral resolution in nanometers, comes at the trade-off of very large datasets and computational complexity [3]. The images are susceptible to distortions since they are often captured by aircraft or satellite sensors, so it is necessary to undertake further preprocessing, such as atmospheric, radiometric, or geometric correction. HSI data is commonly arranged as a hypercube, with the first two dimensions defining the spatial information and the third dimension describing the spectral information. The datacube can be denoted as X × Y × F, where X, Y, and F represent the spatial height, the spatial width, and the number of spectral bands, respectively. The complexity and duration of processing increase along with the number of bands. A few bands could provide less discriminating information because of the strong correlation between nearby bands, and different bands may convey different amounts of information [6]. Moreover, due to its high dimensionality, HSI is susceptible to the Hughes phenomenon, or curse of dimensionality, when there is an unbalanced ratio between the training samples and spectral bands [7]. To solve the dimensionality issue, dimensionality reduction techniques can be used to pick out the important bands and obtain the inherent information. For dimensionality reduction, many feature selection and feature extraction approaches are used. Feature selection selects a subset of pertinent features from the original dataset while excluding the less significant ones. Feature extraction, on the other hand, reduces the number of features by reusing existing ones to produce new ones; the extracted features summarize the majority of the information carried by the original dataset. There are two ways to execute feature extraction: supervised and unsupervised. While no a priori information is required for the unsupervised technique, it must be supplied for the supervised method in order to extract features [8]. Principal Component Analysis (PCA) is one of the most commonly used unsupervised feature extraction techniques [9]. To get the lowest dimensional representation of the dataset that retains the most data, PCA
employs orthogonal projections. The modified features are linearly uncorrelated in nature and are known as principal components. Often, in comparison to the original feature dimension, the number of principal components is significantly reduced. When there are too many samples, it becomes harder to calculate the covariance matrix, which PCA requires. As the global variance is taken into account, PCA struggles to extract the divergent contributions of the features [3]. HSI bands that are close to one another typically correlate more strongly than bands that are farther apart. To exploit this characteristic of HSI data, the authors of [10] introduced a method named Segmented PCA (SPCA), a modification of conventional PCA. Furthermore, to extract relevant data for efficiently identifying earth materials, a new segmentation method based on wavelength regions was introduced, called Spectrally Segmented PCA (SSPCA). In [11], a probabilistic approach named Stochastic Neighbor Embedding (SNE) is introduced. The method embeds high-dimensional data points into a lower dimension, preserving the relative distances between neighboring data points; the data points are adjusted in the lower dimension by minimizing Kullback-Leibler divergences. A variant of SNE named t-Stochastic Neighbor Embedding (t-SNE) is introduced in [12]; it eliminates the propensity for points to cluster in the map's center, which makes this approach better than the existing SNE. Another approach, combining t-SNE with a convolutional neural network (CNN) for HSI classification, is introduced in [13]; the potential assembly features, which are taken from both the dimension-reduced CNN (DR-CNN) and the multiscale CNN, are automatically captured. The authors of [14] introduced an interactive tool named t-viSNE, which analyzes t-SNE projections and enables users to evaluate the accuracy of the projections and comprehend the reasoning behind the algorithm's clustering decisions. In this paper, we propose the Segmented t-SNE (Seg t-SNE) feature extraction method for extracting more subtle local features from the band subgroups of an HSI as well as from the entire HSI. To this end, the following list summarizes this paper's significant contributions.

• A detailed investigation into the performance of PCA-based feature extraction techniques for HSI classification.
• A correlation-based effective feature extraction technique, Segmented t-SNE (Seg t-SNE), for HSI classification.
• An empirical study to demonstrate the superiority of the suggested Seg t-SNE through a number of experiments.

The rest of this paper is structured as follows. Section 2 discusses the insights of the PCA and t-SNE-based feature extraction methods for HSI. In Sect. 3, we explain the overall idea and derivation of the proposed Seg t-SNE approach. Section 4 provides the experimental setup and outcome analysis, while the studies and conclusions are summed up in Sect. 5.
2 Feature Extraction

2.1 Principal Component Analysis (PCA) for HSI

For the implementation of PCA [15, 16], a 2D data matrix D is built up first from the HSI datacube. The matrix holds a dimension of F × S, where F symbolizes the number of features or bands carried by the dataset, and the total number of samples (pixels) is denoted as S = X × Y. Every spectral vector identifies an object uniquely and is represented as x_n = [x_{n1} x_{n2} ... x_{nF}]^T, where n ∈ [1, S]. The mean spectral vector M is computed by making use of the F spectral bands as follows:

M = \frac{1}{S} \sum_{n=1}^{S} x_n \quad (1)

The mean adjusted spectral vector I_n is formed via the equation I_n = x_n − M. Then, the mean adjusted data matrix I is expressed as I = [I_1 I_2 ... I_n]. The covariance matrix Cov for the mean adjusted data I is calculated on the basis of the expression:

Cov = \frac{1}{S} I I^T \quad (2)

Since PCA is based on the eigenvalue decomposition of the covariance matrix, the eigenvectors and eigenvalues are calculated using the expression:

Cov = V E V^T \quad (3)
where E = diag[E_1 E_2 ... E_F] bears the eigenvalues of Cov on the main diagonal and V = [V_1 V_2 ... V_F] holds the corresponding eigenvectors. The eigenvalues are arranged in decreasing order (E_1 ≥ E_2 ≥ ... ≥ E_F), and the corresponding eigenvectors are reordered along with them. For the construction of the final projection matrix PM, a matrix of dimension F × k is formed, where k represents the number of first selected eigenvectors, with k typically much smaller than F.
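A minimal sketch of this PCA step on an HSI datacube with NumPy and scikit-learn is shown below; the cube shape and the choice of k are placeholders, not values from the paper (scikit-learn's PCA performs the mean adjustment of Eq. (1) internally).

```python
# PCA feature extraction on an HSI datacube of shape (X, Y, F).
import numpy as np
from sklearn.decomposition import PCA

def pca_features(cube, k):
    X, Y, F = cube.shape
    D = cube.reshape(X * Y, F)        # S x F data matrix, S = X * Y
    pca = PCA(n_components=k)         # eigen-decomposition of the covariance
    reduced = pca.fit_transform(D)    # first k principal components
    return reduced.reshape(X, Y, k)

# Example with a random stand-in cube (145 x 145 x 200 is Indian Pines-like).
cube = np.random.rand(145, 145, 200)
features = pca_features(cube, k=20)
```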
\rho = \begin{cases} 1, & y \le 0.5z \\ 1.2, & y > 0.5z \end{cases} \quad (3)
The continuous approach was applied to produce the following hyperbolic tangent function; this approach was implemented to deal with the non-differentiability that the switch introduces during the differentiation process:

\rho = 1.1 + 0.1 \tanh\big(k(y - 0.5z)\big) \quad (4)
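As a small numerical illustration (not from the paper), the smoothed rate of Eq. (4) can be compared with the underlying switch; larger values of the sharpness parameter k push the tanh curve toward the discontinuous step.

```python
# Smoothed rate rho(y) of equation (4) versus the hard switch at y = 0.5z.
import numpy as np

def rho_smooth(y, z, k):
    return 1.1 + 0.1 * np.tanh(k * (y - 0.5 * z))

def rho_step(y, z):
    return np.where(y > 0.5 * z, 1.2, 1.0)

z = 1.0
y = np.linspace(0.0, 1.0, 5)
print(rho_smooth(y, z, k=50))  # close to 1.0 below 0.5z and 1.2 above it
print(rho_step(y, z))
```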
At this stage, let us discuss the optimality conditions that an optimal solution must satisfy. A few necessary conditions must hold, including the stationarity condition and the state and costate equations. Furthermore, the initial conditions for the state and the costate are both given. At the terminal time, the boundary condition of the integral must be satisfied: the generated value of z needs to equal y(T) at the end of the process, which ensures that the system is sufficiently close to zero. The iteration converges when four identical costate values are generated in succession; only then is the optimal solution produced.
3 The Direct and Indirect Method

The direct method involved the Runge–Kutta method; the program was constructed in the AMPL modelling language with the MINOS solver [5]. Meanwhile, the indirect method involved the shooting method. The Newton method is one of the possible iterative techniques for the shooting method [2]; hence, a combination of the Newton method and the Golden Section Search was considered in this process. The constructed program makes use of highly precise Numerical Recipes procedures in the C++ programming language [7]. The Golden Section Search computes the best possible range of state and costate values, and the Newton method uses these values to perform root iteration; this ensures that the system is sufficiently close to zero before the ODE solver is applied to solve the problem. The system is then checked to determine whether it is close to zero and whether the computed optimal value maximizes the performance index at the final time. If not, the Golden Section Search runs the identical program until it produces the best possible value, one that is optimal and maximizes the performance index. Press et al. [7] categorize the Golden Section Search as a one-dimensional minimization technique; consequently, to solve the maximization problem, the performance index passed to the minimization routine is multiplied by negative one [7]. In this research, the shooting method produces an optimal solution with a final state value y(T) equal to 0.351080, while the optimal initial and final costate values are −0.031822 and 0.056910, respectively. These optimal solutions result in a maximized performance index equal to 0.643299 at the terminal time. Meanwhile, the Runge–Kutta method yields an optimal final state value of 0.351821 and an initial costate value of −0.032116, with a maximal performance index of 0.646865. Based on these results, the shooting and Runge–Kutta methods produce nearly identical solutions; the results are summarized in Table 1 for a clear comparison. According to Table 1, the optimal final state value y(T) agrees to three decimal places between the two methods, the optimal initial costate p(0) agrees to two decimal places, and the optimal objective function agrees to two decimal places.
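A minimal sketch of a single-shooting iteration of this kind is given below using SciPy; the dynamics shown are a generic placeholder system, not the royalty-payment problem's actual state and costate equations, and the terminal target is an arbitrary stand-in.

```python
# Single shooting: find the initial costate p0 so that the terminal
# boundary condition y(T) = z is met, using Newton iteration on the residual.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import newton

T = 1.0
z = 0.351  # terminal target for y(T) (placeholder value)

def dynamics(t, w):
    y, p = w
    # Placeholder state/costate right-hand sides; the real ones come from
    # the problem's Hamiltonian.
    return [-p, -y]

def residual(p0):
    sol = solve_ivp(dynamics, (0.0, T), [0.0, p0], rtol=1e-10, atol=1e-10)
    return sol.y[0, -1] - z  # y(T) - z must vanish at the optimum

p0_opt = newton(residual, x0=0.0)
print("initial costate satisfying the boundary condition:", p0_opt)
```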
Table 1 The optimal solution for the shooting and Runge–Kutta methods

| No. | Method | Final state value, y(T) | Initial costate value, p(0) | Final costate value, p(T) | Performance index, J |
|-----|--------|-------------------------|-----------------------------|---------------------------|----------------------|
| 1. | Shooting method | 0.351080 | −0.031822 | 0.056910 | 0.643299 |
| 2. | Runge–Kutta method | 0.351821 | −0.032116 | – | 0.646865 |
The optimal curves for the state, costate, and control variables, as well as the performance index, are presented in Fig. 1. The Runge–Kutta plot differs slightly at certain periods even though the method produced a near-identical optimal solution to the shooting method, probably because of a discretization error during the process. In contrast, the shooting method presents a smooth optimal plot. To conclude, both the shooting and Runge–Kutta methods can be applied to yield an optimal solution, but the shooting method is more likely to produce a more accurate answer. Owing to a larger convergence domain, direct techniques are easier to initialize [11, 12]. As a result of the discretization error that occurs during the process, the direct
Fig. 1 Plots of the generated optimal solutions for the state, costate, and control values that maximize the performance index, from the initial time to the final time
method, however, produces results that are less accurate than those obtained using the indirect approach [11, 13]. Compared to the direct method, indirect methods provide a more accurate solution with fast convergence that satisfies the optimality criterion [11, 14]. However, initializing this procedure is challenging [11].
4 Conclusion

In conclusion, the shooting method, implemented by combining the Newton and Golden Section Search methods, yields an optimal solution that is more accurate than that of the Runge–Kutta method. At the same time, the necessary conditions were demonstrated and fulfilled at the final time. Moreover, this research makes use of the hyperbolic tangent approach so that the system is differentiable at all times. Consequently, the findings can serve as a guideline and be valuable to future researchers exploring new mathematical methods for solving real-world problems. On top of that, the methodologies employed can remain current, especially in the academic field.

Acknowledgement This research was supported by the Ministry of Higher Education (MOHE) through the Fundamental Research Grant Scheme (FRGS/1/2021/STG06/UTHM/03/3). Thank you to the Research Management Center (RMC), Universiti Tun Hussein Onn Malaysia (UTHM), for managing the research and publication process.
References

1. Ahmad A, Sakidin H, Dahlia A, Zetriuslita Z, Qudsi R (2020) Haze and its impact to paddy monitoring from remote sensing satellites. Int J Adv Trends Comput Sci Eng 9(4)
2. Betts JT (2010) Practical methods for optimal control using nonlinear programming. Advances in Design and Control, Society for Industrial and Applied Mathematics, Philadelphia, PA
3. Bryson AE (2018) Applied optimal control: optimization, estimation and control. Routledge
4. Cruz PAF, Torres DFM, Zinober ASI (2010) A non-classical class of variational problems. Int J Math Model Numer Optim 1(3):227–236
5. Fourer R, Gay DM, Kernighan BW (1990) A modelling language for mathematical programming. Manag Sci 36(5):519–554
6. Malinowska AB, Torres DFM (2010) Natural boundary conditions in the calculus of variations. Math Methods Appl Sci 33(14):1712–1722
7. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
8. Spence AM (1981) The learning curve and competition. Bell J Econ 12(1):49–70
9. Zinober ASI (2010) Optimal control theory lecture notes (Unpublished). The University of Sheffield
10. Zinober ASI, Kaivanto K (2008) Optimal production subject to piecewise continuous royalty payment obligations. University of Sheffield
11. Passenberg B, Kröninger M, Schnattinger G, Leibold M, Stursberg O, Buss M (2011) Initialization concepts for optimal control of hybrid systems. IFAC Proc Vol 44(1):10274–10280. https://doi.org/10.3182/20110828-6-IT-1002.03012
12. Passenberg B (2012) Theory and algorithms for indirect methods in optimal control of hybrid systems. Ph.D. thesis, Technische Universität München
13. von Stryk O, Bulirsch R (1992) Direct and indirect methods for trajectory optimization. Ann Oper Res 37(1):357–373
14. Benson DA, Huntington GT, Thorvaldsen TP, Rao AV (2006) Direct trajectory optimization and costate estimation via an orthogonal collocation method. J Guid Control Dyn 29(6):1435–1440
Descriptive Analysis for Electric Bus During Non-Operational Stage Wan Noor Afifah Wan Ahmad , Suliadi Firdaus Sufahani , Mohd Fahmy-Abdullah , and Muhammad Syamil Abdullah Sani
Abstract Electric vehicles have minimal operating costs: they require fewer maintenance-intensive moving components and are eco-friendly because they use almost no petroleum products. Therefore, electric vehicles are acknowledged as a potential replacement for the current vehicle generation to solve the issues of growing emissions, global warming, and resource depletion. This study created a framework for assessing an electric bus's technical components during its non-operational period, including the battery, temperature, framework, and energy use. The research additionally explores the connection between energy consumption practices and the technical components. The analysis used two electric buses, a standard and a luxury bus. The study was conducted using qualitative data, with observation as the data collection method. At the end of this study, the best practice for technical knowledge of the electric bus is expected to be determined. The findings can be utilized to design electric buses and to further explore low-carbon urban communities and electric transport in metropolitan cities. Keywords Electric bus · Electric vehicle · Renewable energy
W. N. A. W. Ahmad · S. F. Sufahani (B) Department of Mathematics and Statistics, Faculty of Applied Sciences and Technology, Universiti Tun Hussein Onn Malaysia, Pagoh Campus, 84600 Pagoh, Johor, Malaysia e-mail: [email protected] M. Fahmy-Abdullah Department of Production and Operations Management, Faculty of Technology and Business Management, Universiti Tun Hussein Onn Malaysia, Parit Raja Campus, 86400 Parit Raja, Johor, Malaysia M. S. A. Sani Perisind Samudra Sdn. Bhd, 15, Jalan Dato Abdullah Tahir, Taman Abad, 80300, Johor Bahru, Johor, Malaysia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_11
1 Introduction

The fastest-growing significant factor causing environmental change worldwide is transportation. Most nations employ very little renewable energy and greener transportation. Unsustainable transportation exacerbates existing issues, such as climate change, urban air pollution, and depleting oil reserves [1–5]. Several nations, including Malaysia, actively contribute to lowering carbon dioxide emissions through domestic mitigation efforts and international agreements. Many electric vehicles (EVs) have been manufactured worldwide to address the environmental and energy issues brought on by outdated combustion engine automobiles [2, 6–9]. EVs have low operating costs: they have fewer moving components to maintain and are very environmentally friendly because they use little or no fossil fuel. There are several advantages to electric buses (e-buses), the most notable being the reduction in carbon emissions; changing to an e-bus could have a significant impact on improving air quality. The e-bus is also significantly quieter than the diesel bus, providing passengers with a more comfortable trip and significantly lowering environmental noise. This paper analyzes the effect of technical components on e-bus performance during the non-operational stage.
2 Basic E-Buses Introduction

One of the keys to the analysis is knowledge of the buses used during the data collection process. This study involved two e-buses, the standard and luxury buses owned by Perisind Samudra Sdn. Bhd. The data collection process was done during the non-operational stage. The basic descriptions of both e-buses are given in Table 1. The average distance traveled per e-bus, as given by Perisind Samudra Sdn. Bhd., is 200 km per day on a full battery charge. This figure is used to calculate the total time for the battery to fully discharge each day. Given that the non-operational stage involves a stationary e-bus, the study focuses on the specifications of the e-bus itself. The first set of specifications involves the bus technical parameters (Table 2). The e-buses also include dashboard components, such as the pressure, temperature, and battery indicators (Fig. 2).
3 Performance Analysis

The performance of both the standard and luxury buses was measured based on engine and battery performance.
Table 1 Basic specification for standard and luxury buses (entries without a bus-specific split apply to both buses)

| No | Parameter | Standard bus / Luxury bus |
|----|-----------|---------------------------|
| 1 | Passenger capacity | 18-seater / 13-seater |
| 2 | Battery | 135 kWh (100%) |
| 3 | Curb weight | 4 750 kg |
| 4 | Gross vehicle weight | 10 000 kg |
| 5 | Material | Aluminum alloy and composite fiber |
| 6 | Equipment | CCTV, television, radio/AUX, air conditioner, dim light, USB port (each seat), ergonomic seat, automatic passenger door |
| 7 | Dimension | Length: 7 935 mm, Width: 2 260 mm, Height: 3 000 mm, Wheelbase: 3 985 mm, Ground clearance: 254 mm |

Table 2 Vehicle technical parameters

| No | Parameter | Description |
|----|-----------|-------------|
| 1 | Climbing capacity | 23% |
| 2 | Driving range | 200 km |
| 3 | Minimum turning radius | 7 m |
| 4 | Maximum speed | 100 km/h |
| 5 | Motor model | BYD-3425TZ-XS-A |
| 6 | Motor power | 180 kW |
| 7 | Battery type | BYD lithium iron phosphate |
| 8 | Charger | 40 kW (AC 3-phase, 400 V, 63 A) |
| 9 | Charge connector | Type 2 (IEC 62196-2) |
| 10 | Charge duration | 4 h |

Fig. 1 Buses interior equipment
Fig. 2 Dashboard component
3.1 Standard Bus

During the non-operational stage, the study was conducted by starting the engine of the standard bus. This activity measures the time taken for the bus to become fully operational and for the (OK) indicator to light up. The average times for the standard bus to reach full operation and for the (OK) indicator to fully light up are 81.5 s and 13.3 s, respectively. This process was repeated over ten days.

Engine Performance. The standard bus's engine took 21.7 s to start for the first time. Table 3 displays the engine start and shutdown times; this test was performed 20 times in one day. Based on Table 3, the average time for the engine to start is 7.1075 s and the average time for the engine to fully shut down is 15.988 s. The starting and shutdown performance of the engine is shown in Fig. 3.

Battery Performance. A few conditions were applied during battery discharge to measure the battery performance. The battery percentage was recorded at the beginning, after which the battery level during discharge was recorded 20 times under each condition. Testing the battery performance involved five conditions. First, with no air conditioner turned on, the average battery level during discharge was 78.13%. With the air conditioner at 20 °C, the average was 76.33%; it decreased to 73.77% with the air conditioner at 16 °C. With the lights also turned on, the average dropped further, to 71.26% (air conditioner at 20 °C) and 66.405% (air conditioner at 16 °C). Based on Fig. 4, the recorded battery level showed a decreasing trend under all the applied conditions, from the first count until the twentieth.
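To make these summary figures concrete, the sketch below shows how the reported averages can be recomputed from the raw counts; the arrays contain only the first five readings from Tables 3 and 4, purely for illustration.

```python
# Minimal sketch: recomputing the summary statistics reported in the text
# from the recorded measurements.  Only the first five of the 20 readings
# from Tables 3 and 4 are reproduced here.
import numpy as np

# Engine start/shutdown times for the standard bus (first 5 counts)
start_times = np.array([8.5, 7.3, 6.9, 6.3, 10.5])
shutdown_times = np.array([15.7, 16.6, 16.6, 15.8, 16.9])
print("mean start time:", start_times.mean(), "s")
print("mean shutdown time:", shutdown_times.mean(), "s")

# Battery level during discharge, without air conditioner (first 5 counts)
battery = np.array([78.5, 78.4, 78.4, 78.3, 78.3])
print("mean battery level:", battery.mean(), "%")
# Per-count drops confirm the decreasing trend seen in Fig. 4
print("drop per count:", np.diff(battery))
```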
3.2 Luxury Bus

During the non-operational stage, the study was likewise conducted by starting the engine of the luxury bus. This activity is the same as for the standard bus, where the time taken for
Table 3 Engine start time and shutdown time in a day for the standard bus

| Count | Engine start time (s) | Engine shutdown time (s) |
|-------|-----------------------|--------------------------|
| 1 | 8.5 | 15.7 |
| 2 | 7.3 | 16.6 |
| 3 | 6.9 | 16.6 |
| 4 | 6.3 | 15.8 |
| 5 | 10.5 | 16.9 |
| 6 | 6.2 | 16.7 |
| 7 | 6.0 | 16.8 |
| 8 | 6.2 | 16.9 |
| 9 | 6.3 | 15.7 |
| 10 | 9.6 | 15.6 |
| 11 | 6.2 | 15.5 |
| 12 | 6.1 | 15.5 |
| 13 | 6.4 | 15.6 |
| 14 | 6.2 | 15.6 |
| 15 | 6.2 | 15.5 |
| 16 | 10.8 | 15.9 |
| 17 | 7.9 | 15.4 |
| 18 | 6.1 | 15.5 |
| 19 | 6.3 | 16.7 |
| 20 | 6.4 | 15.4 |
Fig. 3 Engine start and shutdown time for the standard bus
the bus to operate fully and for the (OK) indicator to light up was recorded for ten days. The average times for the luxury bus to reach full operation and for the (OK) indicator to fully light up are 275.2 s and 13.3 s, respectively.

Engine Performance. The engine took 21.7 s to start for the first time. The engine start and shutdown times were recorded 20 times in one day.
Table 4 Battery discharged (%) for the standard bus under different conditions

| Count | Without air conditioner | AC at 20 °C | AC at 16 °C | AC at 20 °C, lights on | AC at 16 °C, lights on |
|-------|------|------|------|------|------|
| 1 | 78.5 | 77.6 | 75.1 | 72.4 | 68.0 |
| 2 | 78.4 | 77.5 | 75.0 | 72.2 | 67.9 |
| 3 | 78.4 | 77.3 | 74.8 | 72.1 | 67.7 |
| 4 | 78.3 | 77.2 | 74.7 | 72.0 | 67.5 |
| 5 | 78.3 | 77.0 | 74.6 | 71.9 | 67.3 |
| 6 | 78.3 | 76.9 | 74.5 | 71.7 | 67.1 |
| 7 | 78.3 | 76.7 | 74.3 | 71.6 | 67.0 |
| 8 | 78.2 | 76.6 | 74.2 | 71.5 | 66.8 |
| 9 | 78.2 | 76.4 | 74.0 | 71.4 | 66.7 |
| 10 | 78.1 | 76.3 | 73.8 | 71.2 | 66.5 |
| 11 | 78.1 | 76.2 | 73.7 | 71.2 | 66.3 |
| 12 | 78.1 | 76.0 | 73.6 | 71.1 | 66.2 |
| 13 | 78.1 | 75.9 | 73.5 | 71.0 | 66.1 |
| 14 | 78.0 | 75.8 | 73.2 | 70.9 | 65.8 |
| 15 | 78.0 | 75.8 | 73.0 | 70.7 | 65.7 |
| 16 | 77.9 | 75.7 | 72.9 | 70.7 | 65.5 |
| 17 | 77.9 | 75.6 | 72.8 | 70.5 | 65.3 |
| 18 | 77.9 | 75.4 | 72.7 | 70.4 | 65.1 |
| 19 | 77.8 | 75.4 | 72.6 | 70.4 | 64.9 |
| 20 | 77.8 | 75.3 | 72.4 | 70.3 | 64.7 |
| Average | 78.13 | 76.33 | 73.77 | 71.26 | 66.405 |
| Starting % | 78.5 | 77.8 | 75.2 | 72.4 | 68.2 |
Based on Table 5, the average time for the engine to start is 6.835 s and the average time for the engine to fully shut down is 15.695 s. The starting and shutdown performance of the engine is shown in Fig. 5.

Battery Performance. The same conditions as for the standard bus were applied when analyzing the battery performance of the luxury bus. The battery level during discharge was recorded 20 times under each condition, with the battery percentage recorded at the beginning. Based on Table 6, the average battery levels for the conditions of no air conditioner, air conditioner at 20 °C, air conditioner at 16 °C, air conditioner at 20 °C with lights on, and air conditioner at 16 °C with lights on are 73.935%, 72.65%, 70.85%, 69.29%, and 65.9%, respectively.
Fig. 4 Battery discharged for the standard bus

Table 5 Engine start time and shutdown time in a day for the luxury bus

| Count | Engine start time (s) | Engine shutdown time (s) |
|-------|-----------------------|--------------------------|
| 1 | 7.6 | 15.6 |
| 2 | 6.6 | 15.4 |
| 3 | 6.5 | 15.5 |
| 4 | 6.6 | 15.8 |
| 5 | 6.2 | 15.7 |
| 6 | 6.8 | 15.6 |
| 7 | 6.3 | 15.7 |
| 8 | 6.7 | 15.4 |
| 9 | 7.9 | 15.5 |
| 10 | 6.6 | 15.4 |
| 11 | 6.0 | 15.7 |
| 12 | 6.4 | 15.9 |
| 13 | 6.5 | 16.1 |
| 14 | 6.2 | 15.8 |
| 15 | 6.4 | 15.8 |
| 16 | 6.3 | 15.6 |
| 17 | 10.5 | 16.0 |
| 18 | 6.9 | 15.7 |
| 19 | 6.3 | 15.8 |
| 20 | 7.4 | 15.9 |
Fig. 5 Engine start and shutdown time for the luxury bus

Table 6 Battery discharged (%) for the luxury bus under different conditions

| Count | Without air conditioner | AC at 20 °C | AC at 16 °C | AC at 20 °C, lights on | AC at 16 °C, lights on |
|-------|------|------|------|------|------|
| 1 | 74.2 | 73.6 | 71.8 | 70.3 | 67.6 |
| 2 | 74.1 | 73.5 | 71.7 | 70.2 | 67.4 |
| 3 | 74.1 | 73.4 | 71.6 | 70.0 | 67.2 |
| 4 | 74.1 | 73.3 | 71.5 | 69.9 | 67.0 |
| 5 | 74.0 | 73.1 | 71.4 | 69.8 | 66.8 |
| 6 | 74.0 | 73.0 | 71.3 | 69.7 | 66.7 |
| 7 | 74.0 | 72.9 | 71.2 | 69.6 | 66.5 |
| 8 | 74.0 | 72.8 | 71.1 | 69.5 | 66.3 |
| 9 | 74.0 | 72.7 | 71.0 | 69.4 | 66.2 |
| 10 | 73.9 | 72.7 | 70.9 | 69.3 | 66.0 |
| 11 | 73.9 | 72.6 | 70.8 | 69.2 | 65.8 |
| 12 | 73.9 | 72.5 | 70.7 | 69.1 | 65.7 |
| 13 | 73.9 | 72.4 | 70.6 | 69.0 | 65.5 |
| 14 | 73.9 | 72.3 | 70.5 | 69.0 | 65.3 |
| 15 | 73.8 | 72.2 | 70.4 | 68.9 | 65.1 |
| 16 | 73.8 | 72.2 | 70.3 | 68.7 | 64.9 |
| 17 | 73.8 | 72.1 | 70.2 | 68.7 | 64.8 |
| 18 | 73.8 | 72.0 | 70.1 | 68.6 | 64.6 |
| 19 | 73.8 | 71.9 | 70.0 | 68.5 | 64.4 |
| 20 | 73.7 | 71.8 | 69.9 | 68.4 | 64.2 |
| Average % | 73.935 | 72.65 | 70.85 | 69.29 | 65.9 |
| Starting % | 74.3 | 73.7 | 71.8 | 70.4 | 67.7 |
Fig. 6 Battery discharge for the luxury bus
Figure 6 shows the decreasing trend under all applied conditions as the battery discharge percentage was recorded for the luxury bus.
4 Maintenance and Safety Practice

Maintenance and safety practices are also essential subjects when handling e-buses. In the event of an emergency such as a fire involving electrical equipment, the fire extinguisher (FE) must be used correctly as follows:

1. Take the type ABC FE located behind the driver's seat.
2. Step back five steps and aim at the source of the fire.

In the case of a malfunction, the e-buses need emergency towing. The towing truck must be able to raise, lift, twist, and hold the vehicle; these functions involve low-loader trucks (Fig. 7).
5 Conclusion

In this non-operational stage, the standard and luxury electric buses were analyzed based on their technical specifications, performance, maintenance components, and the safety practices that need attention. The longest period was spent on the performance part, where the buses underwent several tests: the time taken to turn the engines on and off was recorded, and battery performance was measured under five different conditions. The findings are beneficial to future researchers exploring energy consumption and low-carbon urban cities using e-buses.
Fig. 7 Maintenance component of e-buses
Acknowledgements The research work is supported by Universiti Tun Hussein Onn Malaysia (UTHM) and Perisind Samudra Sdn. Bhd. (PSSB) through the industry grant with reference Vot M058. Thank you to the Research Management Center (RMC) for managing the research and publication process.
References

1. Chan CC (1993) An overview of electric vehicle technology. Proc IEEE 81(9):1202–1213. https://doi.org/10.1109/5.237530
2. Chan CC, Wong YS (2004) Electric vehicles charge forward. IEEE Power Energy Mag 2(6):24–33
3. Conway T (2016) Electric vehicles
4. Gao Z, Lin Z, LaClair TJ, Liu C, Li JM, Birky AK, Ward J (2017) Battery capacity and recharging needs for electric buses in city transit service. Energy 122:588–600. https://doi.org/10.1016/j.energy.2017.01.101
5. Iclodean C, Cordos N, Todorut A (2019) Analysis of the electric bus autonomy depending on the atmospheric conditions. Energies 12(23):4535. https://doi.org/10.3390/en12234535
6. Liu K, Wang J, Yamamoto T, Morikawa T (2018) Exploring the interactive effects of ambient temperature and vehicle auxiliary loads on electric vehicle energy consumption. Appl Energy 227:324–331
7. Nageshrao SP, Jacob J, Wilkins S (2017) Charging cost optimization for EV buses using neural network based energy predictor. IFAC-PapersOnLine 50(1):5947–5952. https://doi.org/10.1016/j.ifacol.2017.08.1493
8. Quarles N, Kockelman KM, Mohamed M (2020) Costs and benefits of electrifying and automating bus transit fleets. Sustainability 12(10):3977. https://doi.org/10.3390/SU12103977
9. Saadon Al-Ogaili A, Ramasamy A, Juhana Tengku Hashim T, Al-Masri AN, Hoon Y, Neamah Jebur M, Verayiah R, Marsadek M (2020) Estimation of the energy consumption of battery driven electric buses by integrating digital elevation and longitudinal dynamic models: Malaysia as a case study. Appl Energy 280:115873. https://doi.org/10.1016/j.apenergy.2020.115873
The Effectiveness Level on the Electric Buses Operation: Case Study for Affordability and Accessibility Ahmad Husaini Mohamad Tajudin, Mohd Fahmy-Abdullah , Suliadi Firdaus Sufahani , and Wan Noor Afifah Wan Ahmad
Abstract The emerging technology of electric vehicles is becoming a viable alternative in the transportation sector, especially in developing countries like Malaysia. However, operation remains a challenge, regardless of its level of effectiveness. The research focuses on two elements related to customer satisfaction and service quality: affordability and accessibility. The research objectives are to collect data on electric buses in Bandar Subang. The study was conducted using a quantitative method, where the data was collected through a survey questionnaire; a total of 57 responses was collected through an online platform. Descriptive analyses were used, and a discussion and conclusion were made on the quantitative findings. This research showed that all elements of the level of effectiveness recorded a high mean score, with affordability having the higher mean score of the two elements. There is a strong correlation between affordability and the other selected questions in the questionnaire. Keywords Effectiveness level · Customer satisfaction · Electric bus · Public transport · Affordability
A. H. M. Tajudin · M. Fahmy-Abdullah (B) · W. N. A. W. Ahmad Faculty of Technology Management and Business, Universiti Tun Hussein Onn Malaysia, Parit Raja, 86400 Batu Pahat, Johor, Malaysia e-mail: [email protected] S. F. Sufahani Faculty of Applied Sciences and Technology, Universiti Tun Hussein Onn Malaysia, Pagoh Higher Educational Hub, 84600 Pagoh, Johor, Malaysia M. Fahmy-Abdullah · S. F. Sufahani Oasis Integrated Group (OIG), Institute for Integrated Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, 86400 Batu Pahat, Johor, Malaysia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_12
1 Introduction

Electric buses (EBs) are becoming increasingly common in cities to reduce greenhouse gas (GHG) emissions and local air pollutants relative to diesel buses. This implementation is often the focus of efforts in urban areas because conventional city buses emit 0.3 kg of carbon dioxide (CO2) per passenger mile, while 0.08 kg of CO2 per passenger mile is released per long-distance bus trip of more than 20 miles [16]. Therefore, the Malaysian government aims to electrify public transport, introducing 2 000 EBs by 2030. Customer satisfaction is a term that reflects how satisfied customers are with a business's products, services, and capabilities [10]; according to [19], customer satisfaction is based on the relationship between a customer and a product or service provider. According to [9], customer satisfaction data collection is a tool businesses can use to increase sales, reduce costs, and ensure that customers remain satisfied and continue using the product or service based on previous positive experiences. The purpose of this study is to conduct a customer satisfaction analysis using residents' data to ascertain their satisfaction level with Bandar Subang's EBs. The main aim is to determine the correlation between the independent variables (affordability and accessibility) and the dependent variable (customer satisfaction). The evaluation was conducted using cross-tabulation on EBs. The analyzed data will later provide an understanding of customers' satisfaction with Bandar Subang's EB service.
1.1 Problem Statements

Transportation costs can be a significant burden when factored into monthly expenses, particularly for low-income earners. The high cost of transportation fares significantly impacts low-income earners who rely on public transportation daily. The affordability of public transport is a major concern for the less fortunate, even more so with the rising cost of living. This factor contributes significantly to the low number of public transportation users: when people compare, they discover that owning a personal vehicle is more affordable than using public transportation daily. Studies on customer satisfaction with the implementation of EBs are still significantly lacking compared to other studies, and there are currently no customer-oriented studies of EBs in Malaysia; [13] only examined the perceived quality of EBs, not overall customer satisfaction with the transportation service. As a result of these shortcomings and research gaps, an attempt is made to ascertain customer satisfaction with the implementation of EBs in Bandar Subang.
1.2 Research Questions

The main research questions are: (1) How can data collection on EBs in Bandar Subang be established? (2) What are the important service quality elements for satisfying customers in public transportation? (3) What is the cross-tabulation for the elements of the level of satisfaction with EBs?
1.3 Research Objectives and Its Scope

The main goals of this research are to establish data collection on the EBs in Bandar Subang, to analyze the affordability and accessibility of current public transportation, and to analyze the cross-tabulation for the elements of the level of satisfaction with EBs. The methodology was carried out through a quantitative approach, which includes publication research and a survey. Via this method, we could discover the level of effectiveness of EBs in Bandar Subang.
2 Concept of Effectiveness Level

According to [21], public transportation providers generate economic and social benefits; he also added that effectiveness is achieved when both efficiency and satisfaction are considered. Customer satisfaction can be defined as the customer's fulfillment response [15]: a judgment that a product or service feature, or the product or service itself, provides a satisfying level of consumption-related fulfillment. Customers' perceived qualities are typically formed by previous purchasing experiences, recommendations from friends, and the producer's own claims. Thus, customers' pleasure is based on their perceptions of the product's quality compared to what the manufacturer claims it can be [14].
3 Empirical Study for the Effectiveness Level and Its Elements

Many studies found that improving service quality through understanding customers' needs can considerably boost customer satisfaction. According to [1], customer satisfaction is an independent construct significantly influenced by service quality. Customer loyalty can be influenced directly by service quality [2], but it can also be influenced indirectly through customer satisfaction [5]. Apart from that, as [9] points out, customer happiness significantly impacts the profitability of every business. A customer satisfied with the service quality will eventually recommend the business
to nine to ten additional people. On the other hand, poor service quality results in decreased customer satisfaction. This circumstance will harm the business, causing it to lose many customers, not just existing customers but also potential new ones. The information gained from customer satisfaction can help companies identify the most critical requirements for pleasing their customers and consequently motivate enterprises to devote more resources to the areas most valued by customers [9]. Customers' expected service quality for public transportation may vary by user, as the population comprises numerous groups of individuals from different social classes, so their service expectations for public transit may differ; [8] categorized the expected service quality, which can also be framed in terms of mobility and travel restrictions. The two components of service quality, affordability and accessibility, have been referenced numerous times in the literature, including [8] and [3], and have evolved into the two most crucial service characteristics customers expect from their transportation provider.
3.1 Affordability

Affordability can relate to the financial costs incurred by households when traveling from one location to another to carry out daily activities such as school, work, healthcare, shopping, and other social activities [12]. According to [4], transportation affordability refers to the capacity to make financial concessions to carry out important transportation movements without jeopardizing other critical activities. Transportation is one of the most critical facilities in a growing country, as improvements in public transportation may be used to gauge the country's growth [18]. Affordability is a key component of service quality that most users look for in a public transportation service [6]. According to [4], in major cities in other countries, such as India, the United Kingdom, and Mexico, affordability is undoubtedly a critical factor that must be addressed.
3.2 Accessibility

The European Conference of Ministers of Transport [7] stated that accessibility has long been a critical component of an efficient, sustainable, and high-quality public transportation system. It corresponds to the idea of inclusiveness, which includes offering access to all groups of people from all socioeconomic backgrounds and classes. It has also become a critical component of designing and evaluating public transportation systems to achieve a higher level of mobility and sustainability [11, 17]; these works emphasized that accessibility provides equitable access to all people, particularly those with limited access to facilities and services, such as people with disabilities and senior citizens.
Fig. 1 Research hypothesis
The primary objectives of accessibility are to improve connectedness between people and places while considerably reducing traffic congestion and providing a viable alternative to the personal automobile, which harms the environment and public health [20]. Accessibility is a component of the quality of public transportation services directly related to affordability. The country's low-income earners have the greatest need for an accessible and inexpensive public transportation system, as this group relies significantly on public transit. Additionally, low-income earners can afford a relatively restricted number of journeys on public transportation, reducing their chances of becoming less poor [8]. According to [4], an inconvenient location of public transport facilities results in low-income citizens having limited access. This circumstance significantly impacts this group since they are forced to walk or pay for cabs to reach the nearest public transport facilities.
4 Research Hypothesis

H0: There is no positive correlation between the affordability elements and the level of effectiveness in the operation of public transportation services (Fig. 1).
H1: There is a positive correlation between the affordability elements and the level of effectiveness in the operation of public transportation services.
H2: There is a positive correlation between the accessibility elements and the level of effectiveness in the operation of public transportation services.
5 Research Methodology

A survey and statistical calculations are the methods used to obtain accurate data and information in this research; the quantitative method was chosen to analyze the independent variables against the effectiveness level of the EB operation in Bandar Subang (the dependent variable). Figure 2 shows the research flow and process.
Fig. 2 Flow chart of research methodology
The primary data are original data collected for the first time in this study. A questionnaire was used as the instrument to collect the data and was distributed to respondents who live in Bandar Subang and use its public transport (BRT Sunway Line). A total of 57 respondents participated in the questionnaire. IBM SPSS Version 26 was used for the descriptive statistical analysis. The research measures central tendency and spread, and it uses cross-tabulation to find the correlation between the dependent variable and the independent variables.
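Although the study used IBM SPSS, the same descriptive statistics and cross-tabulation can be sketched in a few lines of Python with pandas. The data frame below is hypothetical (it is not the study's 57 responses), and the column names AF2/AF3 merely follow the item codes used in Sect. 6.

```python
# Minimal sketch of the descriptive analysis: mean, standard deviation, and
# cross-tabulation of Likert-scale responses.  The data are hypothetical;
# the study's actual 57 responses are not reproduced here.
import pandas as pd

df = pd.DataFrame({
    "AF2": [5, 4, 4, 3, 5, 4, 5, 2, 4, 5],  # ticket price affordable
    "AF3": [5, 5, 4, 3, 5, 4, 5, 1, 4, 5],  # less expensive than taxis
})

# Central tendency and spread, as in Tables 1 and 2
print(df.agg(["mean", "std"]))

# Collapse the five-point scale to unsatisfied / normal / satisfied,
# then cross-tabulate, as in Tables 3 and 4
af2 = pd.cut(df["AF2"], bins=[0, 2, 3, 5],
             labels=["unsatisfied", "normal", "satisfied"])
af3 = pd.cut(df["AF3"], bins=[0, 2, 3, 5],
             labels=["unsatisfied", "normal", "satisfied"])
print(pd.crosstab(af2, af3, margins=True))
```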
6 Results and Discussion

6.1 Affordability

Based on Table 1, each item for the affordability element recorded a high mean score according to the central tendency level, while the standard deviation for each item ranges from 0.76325 to 0.87502.
Table 1 Affordability

| No | Question | N | Mean | Std. deviation | Level | Rank |
|-----|----------|---|------|----------------|-------|------|
| AF1 | Public transport ticket price consistent with travel distance | 57 | 4.2456 | 0.76325 | High | 3 |
| AF2 | The ticket price is affordable for every level of society | 57 | 4.1579 | 0.79708 | High | 5 |
| AF3 | Less expensive than taxis? | 57 | 4.5614 | 0.82413 | High | 1 |
| AF4 | Is the fare affordable for older passengers? | 57 | 4.2807 | 0.79629 | High | 2 |
| AF5 | Is the fare affordable to passengers with disabilities? | 57 | 4.1930 | 0.87502 | High | 4 |
| Average | | 57 | 4.2877 | 0.66334 | High | – |
The item stating that the EB fare is less expensive than taxis recorded the highest mean score, 4.5614, compared with the other items. The results showed an overall mean value of 4.2877, which remains at a high level, with a standard deviation of 0.66334.
6.2 Accessibility

Based on Table 2, the accessibility element recorded high mean scores according to the central tendency model for AC3 and AC4, but moderate mean scores for AC1 and AC2. The standard deviation for each item ranges from 0.92107 to 1.1826. The highest mean score, 3.7193, came from AC3. These results showed an overall mean value of 3.5219, which remains at a high level, with a standard deviation of 0.85340.

Table 2 Accessibility

| No | Question | N | Mean | Std. deviation | Level | Rank |
|-----|----------|---|------|----------------|-------|------|
| AC1 | The bus stand/stop is near to your house? | 57 | 3.3158 | 1.0028 | Moderate | 4 |
| AC2 | The bus stop/stand covers all the area? | 57 | 3.3684 | 1.0112 | Moderate | 3 |
| AC3 | Plenty of buses all the time? | 57 | 3.7193 | 0.92107 | High | 1 |
| AC4 | Availability of a special path for passengers with disabilities? | 57 | 3.6842 | 1.1826 | High | 2 |
| Average | | 57 | 3.5219 | 0.85340 | High | – |
6.3 Cross-Tabulation

Cross-tabulation was used to investigate the relationship between the affordability of ticket prices for every level of society and the selected questions. The data were collapsed into a three-point ordinal scale (unsatisfied, normal, and satisfied). Tables 3, 4 and 5 summarize the results of the cross-tabulation procedure. Each cell contains an entry indicating the number of respondents assigned to that cell, and the values in brackets represent the corresponding percentages (of the total respondents). Cross-tabulation results provide critical information about the degree of complete agreement: for example, the percentage of respondents rating both items 1 and 1 (unsatisfied and unsatisfied), 2 and 2 (normal and normal), or 3 and 3 (satisfied and satisfied). Complete disagreement is, for example, the percentage of respondents rating the items 1 and 3 (unsatisfied and satisfied) or vice versa. Table 5 shows the degree of complete agreement and complete disagreement for the selected questions (Fig. 3). A summary of the computed means of all the items according to variables is shown in Table 6. Both elements recorded mean scores at a high level, between 3.5219 and 4.2877, while the standard deviation ranges from 0.66334 to 0.85340; thus, the data are spread out, with the points lying above the mean. The highest mean score, 4.2877, belongs to affordability, making it the first priority.

Table 3 Cross-tabulation between 'the ticket price affordable for every level of society' (AF2) and 'less expensive than taxis?' (AF3). Entries are N (%).

| AF2 \ AF3 | Unsatisfied | Normal | Satisfied | Total |
|-----------|-------------|--------|-----------|-------|
| Unsatisfied | 2 (3.5) | NIL | 1 (1.8) | 3 (5.3) |
| Normal | NIL | 2 (3.5) | 3 (5.3) | 5 (8.8) |
| Satisfied | 1 (1.8) | 1 (1.8) | 47 (82.5) | 49 (86.0) |
| Total | 3 (5.3) | 3 (5.3) | 51 (89.5) | 57 (100) |
Table 4 Cross-tabulation of 'the ticket price affordable for every level of society' (AF2) with 'plenty of bus all the time?' (AC3). Entries are N (%).

| AF2 \ AC3 | Unsatisfied | Normal | Satisfied | Total |
|-----------|-------------|--------|-----------|-------|
| Unsatisfied | 3 (5.3) | NIL | NIL | 3 (5.3) |
| Normal | 1 (1.8) | 3 (5.3) | 1 (1.8) | 5 (8.8) |
| Satisfied | 2 (3.5) | 10 (17.5) | 37 (64.9) | 49 (86.0) |
| Total | 6 (10.5) | 13 (22.8) | 38 (66.7) | 57 (100) |
Table 5 Degree of complete agreement and disagreement between the affordability of ticket prices for every level of society and the selected questions

| No | Questions | Complete agreement (%) | Complete disagreement (%) |
|----|-----------|------------------------|---------------------------|
| 1 | Less expensive than taxis? | 3.5 + 3.5 + 82.5 = 89.5 | 3.6 |
| 2 | Plenty of bus all the time? | 5.3 + 5.3 + 64.9 = 75.5 | 3.5 |
Fig. 3 Cross-tabulation charts for 'less expensive than taxis?' and 'plenty of bus all the time?'
Table 6 Summary of the means of the computed items

| Element | N | Mean | Std. deviation | Level | Rank |
|---------|---|------|----------------|-------|------|
| Affordability | 57 | 4.2877 | 0.66334 | High | 1 |
| Accessibility | 57 | 3.5219 | 0.85340 | High | 2 |
| Average | 57 | 3.9410 | 0.64491 | High | – |
However, this result reveals that the respondents consider all factors to carry some degree of importance concerning satisfaction with EBs in Bandar Subang, because the means of the computed items are above 3.50. The values for complete agreement between the affordability of ticket prices for every level of society and the selected questions range from 75.5% to 89.5%, as shown in Table 5, while complete disagreement has a value of only 3.6% (less expensive than taxis) and 3.5% (plenty of buses all the time). These findings show that people who are unsatisfied with the affordability of ticket prices for all levels of society have low satisfaction levels; people with normal perceptions of affordability have normal satisfaction levels; and those who rated the affordability of ticket prices for all levels of society as satisfactory are very happy with the organization's services. Hence, there is a strong link between affordability and customer satisfaction, and an increase in one will almost certainly lead to a rise in the other.
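For completeness, the agreement and disagreement figures in Table 5 are simply the diagonal and opposite-corner cells of a cross-tabulation. A minimal sketch of that computation, using the counts from Table 3 (with NIL cells treated as zero), is given below.

```python
# Minimal sketch: complete agreement is the sum of the diagonal cells of the
# cross-tabulation; complete disagreement is the sum of the two opposite-
# corner cells.  Counts are from Table 3 (AF2 vs AF3); NIL cells are 0.
import numpy as np

# Rows/columns: unsatisfied, normal, satisfied
crosstab = np.array([
    [2, 0, 1],
    [0, 2, 3],
    [1, 1, 47],
])
total = crosstab.sum()  # 57 respondents

agreement = np.trace(crosstab) / total * 100
disagreement = (crosstab[0, 2] + crosstab[2, 0]) / total * 100
print(f"complete agreement: {agreement:.1f}%")       # ≈ 89.5%
# ≈ 3.5% here; Table 5 reports 3.6, obtained by summing the
# rounded cell percentages 1.8 + 1.8 instead
print(f"complete disagreement: {disagreement:.1f}%")
```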
7 Conclusion

The research established data collection on the EBs in Bandar Subang. The mean score was computed by analyzing affordability and accessibility, and the results showed that all elements are high. Through cross-tabulation of the satisfaction-level elements, a strong link between affordability and customer satisfaction was found. Due to the research's limited sampling population (57 respondents), the data available for analyzing the first goal are limited. As a result, future research must employ a relevant and valid sampling population to avoid overgeneralizing the acquired data. Additionally, future studies may employ alternative methods for data analysis, such as STATA, SYSTAT, SAS, or AHP, to give more accurate and reliable results. This research aims to provide and extend valuable information for various stakeholders, most notably for future research and development. This study can potentially increase public awareness of the critical nature of environmental preservation. The analysis enables the researcher to ascertain how satisfied customers are with the implementation of EBs in Bandar Subang.

Acknowledgements The research work is supported by Universiti Tun Hussein Onn Malaysia (UTHM) and Perisind Samudra Sdn. Bhd. (PSSB) through the industry grant with reference Vot M057. Thank you to the Research Management Center (RMC) for managing the research and publication process.
References

1. Aryani D, Rosinta F (2010) Pengaruh kualitas layanan terhadap kepuasan pelanggan dalam membentuk loyalitas pelanggan. Jurnal Ilmu Administrasi & Organisasi 17(2):114–126
2. Berry L, Zeithaml V, Parasuraman AP (1990) Five imperatives for improving service quality. MIT Sloan Manag Rev 31:9–38
3. Cafiso S, Di Graziano A, Pappalardo G (2013) Using the Delphi method to evaluate opinions of public transport managers on bus safety. Saf Sci 57:254–263
4. Carruthers R, Dick M, Saurkar A (2005) Affordability of public transport in developing countries. The World Bank Group
5. Caruana A (2002) Service loyalty: the effects of service quality and the mediating role of customer satisfaction. Eur J Mark 36(7/8):811–828
6. Dell'Olio L, Ibeas A, Cecin P (2011) The quality of service desired by public transport users. Transp Policy 18:217–227
7. European Conference of Ministers of Transport (2009) Improving transport accessibility for all
8. Guzman L, Oviedo Hernandez D (2018) Accessibility, affordability and equity: assessing 'pro-poor' public transport subsidies in Bogotá. Transp Policy 68
9. Ilieska K (2013) Customer satisfaction index—as a base for strategic marketing management. TEM J 2(4):327–331
10. Ismael F (2010) Measuring customer satisfaction: must or not? Deniz Bilimleri ve Mühendisliği Dergisi 6(2)
11. Khalifeh Soltani S, Sham M, Awang M, Yaman R (2012) Accessibility for disabled in public transportation terminal. Procedia Soc Behav Sci 35
12. Litman T (2016) Transportation affordability: evaluation and improvement strategies. Victoria Transport Policy Institute
13. Munim ZH, Noor T (2020) Young people's perceived service quality and environmental performance of hybrid electric bus service. Travel Behav Soc 20:133–143
14. Nurfarida IN (2003) Pengukuran indeks kepuasan pelanggan untuk peningkatan kualitas layanan. J Ekon Mod 11(2):135–146
15. Roberts K, Varki S, Brodie R (2003) Measuring the quality of relationships in consumer services: an empirical study. Eur J Mark 37(1/2):169–196
16. Saadon Al-Ogaili A, Ramasamy A, Juhana Tengku Hashim T, Al-Masri AN, Hoon Y, Neamah Jebur M, Verayiah R, Marsadek M (2020) Estimation of the energy consumption of battery driven electric buses by integrating digital elevation and longitudinal dynamic models: Malaysia as a case study. Appl Energy 280:115873
17. Saif MA, Maghrour Zefreh M, Torok A (2018) Public transport accessibility: a literature review. Period Polytech Transp Eng 3
18. Talmizi M, Asyraaf MS, Tahir Z (2020) Keberkesanan perkhidmatan pengangkutan awam di Bandar Baru Bangi. J Wacana Sarjana 4(1):1–7
19. American Society for Quality (2021) What is customer satisfaction?
20. Yatskiv I, Budilovich E, Gromule V (2017) Accessibility to Riga public transport services for transit passengers. Procedia Eng 187:82–88
21. Zhang C, Xiao G, Liu Y, Yu F (2018) The relationship between organizational forms and comprehensive effectiveness for public transport services in China. Transp Res Part A Policy Pract 118:783–802
IoMT-based Android Application for Monitoring COVID-19 Patients Using Real-Time Data Mohammad Farshid , Atia Binti Aziz , Nanziba Basnin , Mohoshena Akhter , Karl Andersson , and Mohammad Shahadat Hossain
Abstract Having survived three years of the pandemic since December 2019, monitoring COVID-19 patients in a projected way is still challenging. Even after testing negative for coronavirus, people face many post-COVID stresses and symptoms. The scarcity of hospital beds and the shortage of medical equipment such as oxygen and ventilators have made the situation worse, as people fail to receive proper treatment. In this regard, this work proposes an IoMT-based wearable monitoring device for assessing COVID-19-related vital signs. Furthermore, by continuously monitoring data, the device promptly warns the concerned clinical personnel about any breach of isolation by possibly contaminated patients. The data from the body-wearable sensor is processed and analyzed by an edge node in the IoMT cloud to characterize the condition of health. A wearable IoMT sensor layer, a cloud layer with an Application Programming Interface (API), and an Android-based mobile prototype are part of the proposed system. Each layer has its own function; for example, the data from the IoMT sensor layer is used to characterize the wellness of the symptoms. The Android mobile application layer is in charge of informing and cautioning the possibly infected patient's family members, the nearest hospital, and the patient's assigned doctor about the potential contamination. Two APIs and a variety of applications are synchronized in the integrated system to predict and disrupt the situation. In a word, the target is to monitor this data, send it to the cloud through the IoMT gateway, and monitor these parameters using the Android app. The doctor and the patient's relatives can also observe the monitoring system through the app using the device id. With fewer available beds in hospitals, more people are dying as a result of inadequate care. Keywords COVID-19 · IoMT · WSN · Cloud computing M. Farshid (B) · A. B. Aziz · N. Basnin · M. Akhter International Islamic University Chittagong, Chittagong, Bangladesh e-mail: [email protected] K. Andersson Lulea University of Technology, Skelleftea, Sweden M. S. Hossain University of Chittagong, Chittagong, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_13
1 Introduction

The COVID-19 pandemic caused severe social and economic disruption throughout the world, leaving tens of millions of people with long-term or permanent physical damage. The pandemic affects almost all social groups of a population, most detrimentally those in vulnerable social conditions [18]. Coronavirus is a catastrophe that hit the globe in late 2019; it has involved around 203 nations, impacted more than 155 M individuals, and ended the lives of around 3.2 M people [1] up to the end of April 2021. As people fail to receive proper treatment due to the scarcity of empty beds in hospitals, the mortality rate has increased. Bangladesh faced an acute scarcity of ICU beds during all of its COVID-19 waves [20]. Hence, treating and monitoring patients from home and maintaining isolation properly have become a crying need. The proposed work, therefore, is an IoMT prototype for monitoring COVID-19 patients using real-time data from a wearable sensor [10]. Through an Android app, the system automatically alerts the concerned medical authorities or families about any bad situation during the isolation of the patient. The Internet of Medical Things (IoMT) is a network of Internet-connected medical equipment, hardware infrastructure, and software applications that connect healthcare IT. The main advantage of using IoMT here is the ease of passing the data, and because of the ever-expanding fields of IoMT [14], the implementation will be revolutionary and well suited to the present context. As a result, an IoMT-based wearable monitoring device is being developed to track various COVID-19-related vital indicators. The wearable sensor is associated with an edge hub in the IoMT cloud, where the information is handled and investigated to decide the current status of wellbeing [21]. The proposed framework is executed with three layered functionalities: a wearable IoMT sensor layer, a cloud layer with an Application Programming Interface (API), and an Android layer for cell phones. The key symptoms of COVID-19 are expected to be utilized to follow a patient and isolate him in confined places, limiting the virus's spread. Health markers such as body temperature, SpO2, and pulse rate have the ability to detect novel coronavirus symptoms. So, our main purpose is to monitor this data, send it to the cloud through the IoMT gateway, and monitor these parameters using the Android app. The doctor and the patient's relatives can also observe the monitoring system through the app using the device id.
2 Literature Review

To design a good quality-based IoMT model, we reviewed a number of papers. Our proposed framework is based on a decentralized system, and a lot of prior IoT work helped us build it. In [4], a hybrid approach based on the Internet of Things (IoT) and a Belief Rule Base (BRB) is introduced to
evaluate autism spectrum disorder (ASD); this method can instantly and automatically diagnose autistic children by gathering their sign and symptom data from a variety of autistic children. Another IoT paper [13] aims to create an IoT-based smart home that can operate particular gadgets and monitor the house using Android mobile devices; temperature monitoring, gas detection, and a door lock are examples of the features used, users may view a channel, alarm system, and door lock from any mobile device, and the user is alerted if the monitored data changes. There are also many papers based on COVID-19. The research in [2] developed a COVID-19 severity prediction approach: in pandemic scenarios, the suggested approach will assist hospital administrators in identifying severe and non-severe COVID-19 patients and in developing effective treatment strategies, and experiments conducted on real patients' data demonstrate that the method is feasible and accurate. The goal of another study [27] was to create a model that could predict confirmed cases in Bangladesh; the combined model can predict confirmed cases with 90–99% accuracy and can also estimate the overall number of quarantined people. Fake news detection [16] is a difficult research challenge during the COVID-19 epidemic because disinformation endangers the lives of many online users; this research suggested a LIME-BiLSTM integrated model in which BiLSTM maintains classification accuracy and LIME ensures transparency and explainability, with Kendall's tau correlation coefficient used to assess the explainability of the model. Ahmed et al. [3] present an integrated CNN-BRBES strategy for predicting the survival probability of COVID-19 patients: a pre-trained model (VGG19) is used as the CNN portion for evaluating the condition of COVID-19 patients, determining from chest X-ray images whether or not a patient is a critical COVID-19 patient. The suggested model provides more flexibility in expert-validated outcomes since it incorporates both data-driven and knowledge-driven techniques rather than relying on just one of the two. Accurate diagnosis [23] is essential in avoiding the spread of the SARS-CoV-2 virus in the ongoing COVID-19 epidemic; the current standard diagnostic approach, RT-PCR testing, has a high probability of false negatives and false positives. That study used a belief rule-based expert system (BRBES) to diagnose COVID-19 lung tissue infection in adult pneumonia patients using hematological and CT scan data; the system was optimized using a nature-inspired optimization technique based on adaptive differential evolution (BRBaDE) and tested on a real-world dataset of COVID-19 patients from a previous study. Furthermore, the performance of BRBaDE was compared to that of a BRBES improved with a genetic algorithm and MATLAB's fmincon function, with BRBaDE outranking both and achieving the highest accuracy of 73.91%. A CNN-based approach [19] identifies face masks using deep learning, TensorFlow, Keras, and OpenCV; it is very affordable to implement and might be considered for use in security tasks. A GAN-generated face-masked dataset was chosen for assessment, and when compared to other typical convolutional neural network models, the suggested framework beat them all, achieving 99.73% accuracy. Another paper [22] is
Fig. 1 Methodology
mainly focused on a business product based on digital health and IoT: this commercial device is used to make a decision on COVID-19 based on different IoT parameters, using clinical trials that may come to a false positive conclusion. Data security, data-sharing design, and a calculation to screen accurately are all issues that need to be addressed. A body-area-sensor IoMT framework [11] is used to gather information for giving early warning of heart failure.
3 Methodology

In the implementation, we use a four-layer architecture: a sensing layer, a communication layer, a processing (storage) layer, and an application layer (Fig. 1).
3.1 Sensing Layer

This is the design's physical layer, where the sensors and accompanying devices, which compile various measurements according to the project's needs, become possibly the most crucial aspect [12]. These could be edge devices, sensors, or actuators that communicate with their current firmware [6]. In our proposal, this layer contains the sensors for oxygen measurement, pulse rate measurement, and temperature measurement, connected to the IoMT gateway.
3.2 Communication Layer

The data collected by these devices must be sent and processed; the network layer is in charge of this. It connects smart objects, servers, and network devices, and it manages the transfer of all information. The wireless protocol is particularly important at this tier [24]. Wireless sensors, as opposed to those that require cables, can be used in difficult-to-reach areas and require fewer material and
human resources to install. In our proposal, the communication layer uses a Wi-Fi protocol system [15]. According to various electronic designers, Wi-Fi is a preferred alternative for IoMT integration because of the framework on which it is developed: it features a fast data transfer rate and the ability to manage a large amount of data, and the Wi-Fi standard 802.11 allows many megabits to be transferred in a single second. The sole drawback of this IoMT protocol is that it can consume a large portion of an IoMT application's resources. That is the reason we chose a Wi-Fi protocol system.
3.3 Processing Layer

In this layer, IoMT frameworks are designed to capture, store, and process data for additional needs [8]. Each device transmits a large number of data streams through the IoMT network, and data arrives in a variety of formats, speeds, and sizes; isolating the important data from these massive streams is a critical concern for developers in this layer [7]. Edge computing is a method of increasing the performance of IoMT systems by distributing data processing to the network's peripheral nodes; we used edge computing to preprocess the data locally and then transmitted the data to our central cloud system [25]. Cloud computing allows organizations to access infrastructure, platform, and application services that run in centralized or distributed environments, allowing system owners to use the cloud provider's operating capacity, processing, and data storage at a cost based on the resources the system actually uses or requires [17]. Because cloud services are frequently maintained outside of an agency's own IT infrastructure, cloud computing contingency planning relies on the cloud provider's capabilities, procedures, and solutions, as well as outside workers, to undertake contingency operations and activities [5]. This is the main layer, where we have all of our database management storage.
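As a rough illustration of this edge role, the sketch below smooths raw readings locally and forwards only meaningful updates to the cloud. The endpoint URL, payload fields, and the read_sensor() stub are hypothetical, not the system's actual API.

```python
# Minimal sketch of the edge-node role described above: average raw sensor
# readings locally and forward only meaningful updates to the central
# cloud, reducing traffic and latency.  All names here are illustrative.
import time
import requests

CLOUD_URL = "https://example.com/api/v1/vitals"  # hypothetical endpoint
WINDOW = 10  # average over ten readings

def read_sensor():
    """Placeholder for the pulse/SpO2/temperature readings arriving
    from the sensing layer over Wi-Fi."""
    return {"pulse": 72, "spo2": 97, "temp": 98.2}

buffer, last_sent = [], None
while True:
    buffer.append(read_sensor())
    if len(buffer) == WINDOW:
        avg = {k: sum(r[k] for r in buffer) / WINDOW for k in buffer[0]}
        buffer.clear()
        # Forward only if the averaged reading changed noticeably
        if last_sent is None or any(abs(avg[k] - last_sent[k]) > 0.5
                                    for k in avg):
            requests.post(CLOUD_URL,
                          json={"device_id": "demo-01", **avg}, timeout=5)
            last_sent = avg
    time.sleep(1)
```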
3.4 Application Layer The application layer [26] contains all of the products required to provide a specific service. Databases, analysis software, and other tools are used to store, collect, filter, and process information from the previous levels. Thanks to this preparation, the information is made available to actual IoMT applications (smart wearables, smart vehicles, and so on). This is typically done using middleware software, which has the task of hiding the heterogeneity of the underlying layers. Our application layer has two kinds of users [9]: a normal user, who may be the patient or someone from the patient's family, and a doctor.
Fig. 2 System flowchart
3.5 System Analysis We built a user interface in a mobile application where the data of COVID-19 patients can be viewed 24/7. The login credential is the device ID: every device gets a unique ID for logging in to the application, so in the future other patients or family members can use it. On the device side, we interface the sensors with the D1 Mini, and the development board connects with the edge device locally to reduce latency over the cloud; the filtered data are then transmitted to the centralized cloud system. The flowchart of our system, highlighted below, defines the threshold for every parameter; it shows the three basic parameters with their threshold limits and the situations in which the corresponding authority will be notified. For pulse, the threshold
limit is 50 < p < 90. For oxygen, it is 94 < O2 < 100, and for body temperature it is 97.5 < T < 99.5. For body temperature, one degree must be added whenever the measurement is taken at the skin (Fig. 2).
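A minimal sketch of this threshold logic might look as follows; the notify() stub stands in for the app/SMS notification path, which the paper does not show as code, and the alert conditions simply mirror the stated bounds.

```c
#include <stdio.h>

/* Alert stub standing in for the Android-app/SMS notification path. */
void notify(const char *msg) { printf("ALERT: %s\n", msg); }

/* Flag any vital sign that leaves the flowchart's stated ranges:
   pulse 50 < p < 90, oxygen 94 < O2 < 100, temperature 97.5 < T < 99.5. */
void check_vitals(float pulse, float spo2, float temp_f) {
    if (pulse <= 50 || pulse >= 90)       notify("pulse out of range");
    if (spo2 <= 94 || spo2 >= 100)        notify("oxygen out of range");
    if (temp_f <= 97.5 || temp_f >= 99.5) notify("temperature out of range");
}

int main(void) {
    check_vitals(88.0f, 92.0f, 98.2f);  /* example reading: low oxygen */
    return 0;
}
```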
4 Result and Discussion The schematic diagram of our proposed system is given below (Fig. 3). It shows the connection between the development board and the sensor interface: the MAX30100 connects to the D1 Mini using the I2C protocol. Figure 4 shows real implementation images of the data-reading process, demonstrating how pulse and oxygen are read from the MAX30100: a finger is placed on the pulse-oximeter sensor, the reading is taken ten times, and an average is produced from the results. Figure 5 depicts the procedure of reading body temperature using the LM35; since the sensor is placed on the skin, +1 °C is added to produce the correct body temperature. In Fig. 6, a snippet of the app interface is given, mainly representing the monitoring interface. Figure 7 shows the app interface where users related to the patient (i.e., doctors and people close to the patient) open an account by adding their contact number as well as the device they are using.
Fig. 3 Schematic diagram of proposed system
Fig. 4 App interface
Fig. 5 Step 1. Contact addition
Fig. 6 Step 2. Notification in Android app
Fig. 7 Step 3. Emergency notification
In Fig. 8, the notification is displayed in real time from the Android application; the person related to the patient is notified only if a real-time value crosses any of the threshold values. Further, as shown in Fig. 9, if the user's Internet connection is unstable, the user receives a text notification over a 2G network to draw attention to the patient's deteriorating health. Moreover, in Fig. 10 the prototype developed in this research is compared to devices currently available in the market, in order to check the accuracy and precision of the prototype.
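The ten-sample averaging and skin-temperature correction described above can be sketched as follows. The reader functions are hypothetical stand-ins for the MAX30100 and LM35 driver calls, which the paper does not show.

```c
#include <stdio.h>

#define NUM_READINGS 10
#define SKIN_OFFSET_C 1.0f  /* +1 degree for skin-surface measurement */

/* Hypothetical single-shot readers standing in for the real drivers. */
float read_spo2_once(void)   { return 97.0f; }  /* MAX30100 stand-in */
float read_temp_c_once(void) { return 36.2f; }  /* LM35 stand-in     */

/* Take ten readings and return their average, as the paper describes. */
float average_of(float (*read_once)(void)) {
    float sum = 0.0f;
    for (int i = 0; i < NUM_READINGS; i++)
        sum += read_once();
    return sum / NUM_READINGS;
}

int main(void) {
    printf("SpO2: %.1f %%\n", average_of(read_spo2_once));
    printf("Body temp: %.1f C\n", average_of(read_temp_c_once) + SKIN_OFFSET_C);
    return 0;
}
```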
Fig. 8 Comparison with a real pulse oximeter
Fig. 9 Pulse rate comparison
Fig. 10 Oxygen level comparison
5 Conclusion This system is designed to play a significant role in proper patient care through intensive monitoring with a low-cost IoMT device architecture. Patients are provided a 24/7 monitoring system, which most hospitals failed to offer during the crisis. An Android application interface has been implemented for the whole monitoring system, and the model uses cloud-based data storage management, which prevents data loss. In the future, this research aims to include telemedicine features as well as a survey based on patient data; this survey will help predict the number of people affected by COVID-19 in particular locations.
References 1. Ahmadi H, Arji G, Shahmoradi L, Safdari R, Nilashi M, Alizadeh M (2019) The application of internet of things in healthcare: a systematic literature review and classification. Univ Access Inform Soc 18(4):837–869 2. Ahmed F, Hossain MS, Islam RU, Andersson K (2021) An evolutionary belief rule-based clinical decision support system to predict COVID-19 severity under uncertainty. Appl Sci 11(13):5810 3. Ahmed TU, Jamil MN, Hossain MS, Islam RU, Andersson K (2022) An integrated deep learning and belief rule base intelligent system to predict survival of COVID-19 patient under uncertainty. Cogn Comput 14(2):660–676
4. Alam ME, Kaiser MS, Hossain MS, Andersson K (2018) An IoT-belief rule base smart system to assess autism. In: 2018 4th international conference on electrical engineering and information & communication technology (iCEEiCT). IEEE, pp 672–676 5. Almolhis N, Alashjaee AM, Duraibi S, Alqahtani F, Moussa AN (2020) The security issues in IoT-cloud: a review. In: 2020 16th IEEE international colloquium on signal processing & its applications (CSPA). IEEE, pp 191–196 6. Gong B, Zhang Y, Wang Y (2018) A remote attestation mechanism for the sensing layer nodes of the internet of things. Future Gener Comput Syst 78:867–886 7. Kakkar L, Gupta D, Saxena S, Tanwar S (2021) IoT architectures and its security: a review. In: Proceedings of the second international conference on information management and machine intelligence. Springer, pp 87–94 8. Krishnamurthi R, Kumar A, Gopinathan D, Nayyar A, Qureshi B (2020) An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors 20(21):6076 9. Li S, Zhang Y, Raychaudhuri D, Ravindran R, Zheng Q, Dong L, Wang G (2015) IoT middleware architecture over information-centric network. In: 2015 IEEE globecom workshops (GC Wkshps). IEEE, pp 1–7 10. Mackenzie M (2016) LPWA networks for IoT: worldwide trends and forecasts 2015–2025. Analysys Mason Limited (srpanj, 2016) 11. Majumder A, ElSaadany YA, Young R, Ucci DR (2019) An energy efficient wearable smart IoT system to predict cardiac arrest. Adv Hum Comput Interact 12. Mrabet H, Belguith S, Alhomoud A, Jemai A (2020) A survey of IoT security based on a layered architecture of sensing and data analysis. Sensors 20(13):3625 13. Nahar L, Hossain MS, Jahan N, Tasnim M, Andersson K, Hossain M et al (2022) Smart home surveillance based on IoT. In: Proceedings of international conference on fourth industrial revolution and beyond 2021. Springer, pp 563–574 14. Palani D, Venkatalakshmi K (2019) An IoT based predictive modelling for predicting lung cancer using fuzzy cluster based segmentation and classification. J Med Syst 43(2):1–12 15. Pokhrel SR, Vu HL, Cricenti AL (2019) Adaptive admission control for IoT applications in home WiFi networks. IEEE Trans Mob Comput 19(12):2731–2742 16. Progga NI, Hossain MS, Andersson K (2020) A deep transfer learning approach to diagnose COVID-19 using x-ray images. In: 2020 IEEE international women in engineering (WIE) conference on electrical and computer engineering (WIECON-ECE). IEEE, pp 177–182 17. Ray PP (2016) A survey of IoT cloud platforms. Future Comput Inf J 1(1–2):35–46 18. Ray PP, Dash D, Kumar N (2020) Sensors for internet of medical things: state-of-the-art, security and privacy issues, challenges and future directions. Comput Commun 160:111–131 19. Rezoana N, Hossain MS, Andersson K (2022) Face mask detection in the era of Covid-19: a CNN-based approach. In: Proceedings of the third international conference on trends in computational and cognitive engineering. Springer, pp 3–15 20. Saif S, Jana M, Biswas S (2021) Recent trends in IoT–based smart healthcare applying ml and dl. Emerg Technol Data Min Inform Secur 785–797 21. Sciarrone A, Bisio I, Garibotto C, Lavagetto F, Staude G, Knopp A (2020) A wearable prototype for neurological symptoms recognition. In: ICC 2020–2020 IEEE international conference on communications (ICC). IEEE, pp 1–7 22. 
Seshadri DR, Davies EV, Harlow ER, Hsu JJ, Knighton SC, Walker TA, Voos JE, Drummond CK (2020) Wearable sensors for COVID-19: a call to action to harness our digital infrastructure for remote patient monitoring and virtual assessments. Front Digit Health 8 23. Shafkat Raihan S, Islam RU, Hossain MS, Andersson K (2022) A BRBES to support diagnosis of Covid-19 using clinical and CT scan data. In: Proceedings of the international conference on big data, IoT, and machine learning. Springer, pp 483–496 24. Sharma M, Sharma P (2015) A detail review on IEEE 802.16 m (wimax-2). Int J Adv Res Comput Eng Technol (IJARCET) 4(5) 25. Xhafa F, Kilic B, Krause P (2020) Evaluation of IoT stream processing at edge computing layer for semantic data enrichment. Future Gener Comput Syst 105:730–736
26. Yassein MB, Shatnawi MQ et al (2016) Application layer protocols for the internet of things: a survey. In: 2016 international conference on engineering & MIS (ICEMIS). IEEE, pp 1–4 27. Zisad SN, Hossain MS, Hossain MS, Andersson K (2021) An integrated neural network and SEIR model to predict COVID-19. Algorithms 14(3):94
A Cost-Effective Unmanned Ground Vehicle (UGV) Using Swarm Robotics Technology for Surveillance and Future Combat Shamim Ahmed , Md. Khoshnur Alam, M. Rifat Abdullah Dipu, Swarna Debnath, Sadia Haque, and Taiba Akhter
Abstract Swarm robotics, which takes its inspiration from nature, is a hybrid of swarm intelligence and robotics that holds a lot of promise in various areas. The goal of this study is to build and develop a cost-effective, remotely operated, multidirectional Unmanned Ground Vehicle (UGV) using swarm robotics technology. This study presents the development of an open-source, low-cost communication module that connects small-sized robots to a mobile phone via Wi-Fi. Robots in close proximity are able to communicate and identify each other's positions and bearings. The development involved a NodeMCU in the remote-controlled UGV robot, DC motors, an L298N motor controller, servo motors, an ESP32 camera, and mechanical wheels. An Android-based application called Blynk is used on a smartphone to control the speed and direction of the vehicle to reach a specific destination. We investigate four steps of optimization used to improve overall performance, looking at the potential of hardware-level changes in conjunction with software improvements. In contemporary warfare, UGVs may be employed as a robotic army in battle and as a surveillance device for a variety of objectives. Keywords Swarm robotics technology · Remotely controlled · Node micro-controller unit (MCU) · Blynk app · Mechanical wheel · Unmanned ground vehicles (UGV) · Moveable gun · Surveillance
S. Ahmed (B) · Md. K. Alam · M. R. A. Dipu · S. Debnath · S. Haque · T. Akhter Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_14
1 Introduction Robotics has significantly aided humans in completing a variety of tasks. Robots are built to work in a variety of environments and to complete activities on behalf of people. They work under actual, real-time conditions and must monitor complex physical features. In certain settings, robots are used in place of human soldiers in battle: Unmanned Ground Vehicles (UGVs) can be employed in situations where a human operator would be inconvenienced or endangered, or where human presence is impossible, such as forest firefighting [1]. Because of the apparent tactical benefits in military missions such as scouting and reconnaissance, UGVs work as collaborative systems [2]. UGVs are vehicles that function without a human onboard. Human-robot cooperation during war brings huge advantages, allowing the robots to support soldiers effectively and interact like a virtual leader [3]. UGVs can operate in places held by enemies and therefore withstand considerably greater pressure during combat, for example, a search-and-destroy operation whose feasibility and efficacy are simulated and analyzed using a series of computers [4].

In this research, we introduce two UGVs that can be controlled from a safe distance, each with a weapon on top. One robot is identified as the “Master Robot” and the other as the “Slave Robot”. The “Master Robot” acts as the main part of the pair, and the “Slave Robot” follows all of the “Master Robot's” instructions. The two robots are connected through a Wi-Fi network: when the “Master Robot” detects a target (enemy) with its camera, it broadcasts a signal over the Wi-Fi network to the “Slave Robot”, and they engage the target together. The UGV and its remote-controlled light machine gun are operated by a smartphone from a safe distance. The two bots construct a swarm and function together as a unit. This kind of UGV swarm can replace soldiers in battle and decrease the loss of human life. The UGV swarm has a number of potential benefits, including reducing the number of soldiers injured or killed in battle and increasing mobility by allowing the UGV to travel to challenging locations that humans cannot reach. Such a machine might also be used as a security machine in emergency situations.

Many other researchers are working on unmanned land vehicles and autonomous aerial surveillance vehicles for purposes such as agriculture, autonomous wheelchairs, battlefield surveillance, and fire-strike planning [5–8]. But because many autonomous UGVs and robots use sophisticated sensors that are costly to deploy, a few research concerns arise. First, is it possible to develop a low-cost unmanned ground vehicle? Second, is it possible to control more than one unmanned ground vehicle at the same time? Third, can such robots perform during warfare? These questions prompted us to treat the issues posed above as a problem to be solved, and motivated us to build military-grade UGVs that will decrease human casualties during warfare and save soldiers' lives.
2 Methodology We chose a centralized management framework as the project's implementation architecture. A centralized management strategy allows the UGV framework to have lower expectations of, and little coordination with, individual UGVs: essentially, each UGV receives data from a centralized controller and acts on it. Hypothetically, this strategy is quicker and easier to refine. Our proposed system methodology is shown in Fig. 1. After developing the hardware and software sides of the UGV, the overall architecture starts by making a connection with the Blynk app. Once the connection between the UGV and the Blynk app is established, the system is ready to move with video streaming. The UGV moves in any direction per the Blynk app's instructions; for example, if the values for the right side and left side are both 0 (false), the UGV moves forward (see the sketch after Fig. 1). These instructions, and all of the other movements, are programmed into the NodeMCU. If all of the instructions meet their requirements, the UGVs reach the required position and start finding targets. If they find any targets nearby, both UGVs start firing per the controller's instruction. The controller receives visual data from the “Master Robot” and decides whether to engage.
Fig. 1 Flowchart of overall methodology
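The direction decoding just described can be sketched as follows; the flag names and stub values are illustrative, not the authors' code.

```c
#include <stdio.h>
#include <stdbool.h>

typedef enum { FORWARD, LEFT, RIGHT, STOP } Direction;

/* When neither a left nor a right turn is requested (both flags 0/false),
   the UGV moves forward, following the rule stated in the methodology. */
Direction decode(bool left, bool right) {
    if (!left && !right) return FORWARD;
    if (left && !right)  return LEFT;
    if (!left && right)  return RIGHT;
    return STOP;  /* both asserted: treated here as stop */
}

int main(void) {
    printf("%d\n", (int)decode(false, false));  /* prints 0 = FORWARD */
    return 0;
}
```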
2.1 Preliminary Studies and Planning The research focuses on wireless connectivity between the dual UGV swarm and the UGV management system. We used the NodeMCU's default Wi-Fi module, attached directly to a cell phone, for interaction among the UGVs and with the managing device. Planning the entire model of our UGV swarm robot architecture is essentially separated into two sections, hardware and software. The plan covers the motion of the two wheels [9] according to the directions from the processing unit (the Blynk app on a mobile phone), the communication efficiency over the distance between the UGV and the main controller, and the positioning of the wheels, which determines the UGV's angle of motion. For surveillance and finding targets, we use the ESP32 Wi-Fi camera.
2.2 System Architecture The unmanned ground vehicle's high-level configuration is shown in Fig. 2. The handset and the UGV must be connected to monitor the UGV globally; the mobile phone and the UGV are paired using the NodeMCU's standard Wi-Fi module. The smartphone app (Blynk app) connects to the UGV via Wi-Fi and sends commands to it. The NodeMCU on the UGV receives the commands from the Android app and acts accordingly. Since the operator steers the UGV based on the video input, the UGV's ESP32 camera transmits video to the Android app to build the swarm structure. The “Master Robot” and “Slave Robot” are two different robots.
Fig. 2 UGV’s architectural design
The “Master Robot” carries the ESP32 camera. The operator monitors the “Master Robot” based on the video input, and the “Slave Robot” obeys the master robot. The internal communication between the two robots happens via a Wi-Fi hotspot: because they are on the same network, they can communicate through it. Each robot has a gun on top that is operated by the base processing unit based on the video stream.
Node MCU (Node Micro-Controller Unit) The NodeMCU is included because it has a framework that allows one to program it in C or use it like an Arduino. We utilize the NodeMCU since the UGV and the mobile phone can be connected using its standard Wi-Fi module.
DC Motors (Direct Current Motors) Two DC motors are included. They rotate faster as the power delivered to them increases. The two motors are used in conjunction with a circular ball toward the front of the UGV, allowing it to be positioned and driven along any path.
L298N Motor Driver To allow the processor to operate the motors, there must be a link between the DC motors and the NodeMCU; the L298N module provides this link. It can spin the DC motors clockwise and anti-clockwise by reversing the positive and negative control terminals.
Servo Motor A servo motor is capable of precise rotation. In this type of motor, the control signal responds to feedback about the current position of the motor shaft, which enables servo motors to spin with great precision. We used three servos in this research for the gun mechanism.
ESP32 Wi-Fi Camera An ESP32 wireless Wi-Fi camera serves as the “Master Robot's” eye. It provides real-time video updates to assist in determining the direction and detecting targets.
Switch, Capacitor, Battery, Connecting Wire, and Others These were also included in this project for diverse reasons. The switch turns the swarm UGV's control on and off, the battery is the primary power supply, and the robot's internal circuitry was constructed using transistors and control electronics.
2.3 Build Electronic Circuits There are two types of electronic circuits in this project: one for the “Master Robot” and another for the “Slave Robot”. Master Robot In the “Master Robot's” electronic circuit (Fig. 3), the NodeMCU acts as the robot's main module. The L298N motor driver controls the robot's wheels, and the ESP32 camera provides vision for the “Master Robot”. In the master robot, the battery is the main power source. Three servo motors control the robot's gun.
Fig. 3 Master robot
Slave Robot In Fig. 4, the basic circuit of the “Slave Robot” is almost the same as the “Master Robot's”. The only difference is that no ESP32 camera is used, because the “Slave Robot” follows the “Master Robot”.
2.4 System Code Development For the smart functionality of the UGV, some existing libraries are used on top of newly written code. On the UGV, the NodeMCU is responsible for two critical components, the motor driver and the servo motors, and a section of code was developed for controlling them. Another section of code was developed for receiving the ESP32 camera feed. For communication, authentication code was developed on the UGVs to ensure the security of the swarm [10], and a further section of code was developed to avoid collisions between the two UGVs. All of the above code was written in the C programming language. One special configuration, “CAMERA_MODEL_AI_THINKER”, is used for the camera module; it helps identify real-world objects using image processing technology [11] and is retained for future image-processing work.
Fig. 4 Slave robot
3 Result and Discussion After all the evaluations, the results should be transparent and easy to understand, so here we present our results and discuss the overall ideas at a glance.
3.1 Functionality Test In this section, functionality tests were performed to ensure that all the required components work perfectly, including movement tests of the multidirectional UGVs and their guns. Figure 5 shows the left (a) and right (c) movements of the swarm UGV per the instructions from the Blynk app; we define the x-axis and y-axis in the code so that the Blynk app can provide direction and movement. It also shows the forward and backward (b) movement of the swarm UGV. The required functionality of the swarm UGV is met after this test. The gun of the swarm UGV also moves according to the Blynk app's instructions, and the joysticks together help control the robot in the required direction. To prevent collisions during movement, we set a separation range of 2 m between the two robots so that they can avoid collisions while moving on their own (see the sketch below).
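A minimal sketch of the 2 m separation rule, assuming some distance estimate between the robots is available (the paper does not specify how this distance is measured):

```c
#include <stdio.h>
#include <stdbool.h>

#define MIN_SEPARATION_M 2.0f  /* separation range stated above */

/* The follower halts whenever the gap to its partner closes below 2 m. */
bool may_advance(float distance_to_partner_m) {
    return distance_to_partner_m >= MIN_SEPARATION_M;
}

int main(void) {
    printf("%s\n", may_advance(1.5f) ? "advance" : "hold");  /* prints hold */
    return 0;
}
```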
Fig. 5 a Left, b forward and backward (both), and c right movement and gun direction of the swarm UGV per the instructions from the Blynk app
3.2 Blynk App Test An Android app developed for IoT projects acts as the remote control for the robots. Figure 7 shows the first view of the Blynk app. In this app there is a slider used for zooming the camera and joysticks for moving the robots, along with many options for manually controlling the mounted camera; in other words, options for the robot's visual perception. We also get live feedback from the “Master Robot”: a real-time visual perspective that includes the UGVs' movement and streams live seamlessly.
4 Experiment 4.1 Performance of Swarm UGV During implementation, as shown in Fig. 6, four experiments were performed over several phases of the two UGVs, marked Experiment 1 through Experiment 4, to measure the performance of the swarm UGV. We scored performance out of 10 based on controlling, gun movement, and video-feedback capability. In short, this paper examines four stages of optimization: the first two optimization steps were applied at the software level, and the second two explored hardware-level changes in conjunction with the software level.
Fig. 6 Overall performance of swarm UGV
Fig. 7 Main interface, camera feature, live feedback of blynk app
We achieved the required performance after the fourth round of improvements to the UGVs. From Experiment 4, we can extract the best results for controlling, gun movement, and video-feedback capability.
4.2 Wi-Fi Frequency Test of UGV In a communication platform, the baud rate is the rate at which data are transmitted; the term is typically used when describing serial communication devices. "9000 baud" on the serial port means that the device controller can send a maximum of 9000 bits per second. In Table 1, "Yes" indicates that the UGV works properly at that radius, while "No" indicates that it does not operate fully at that radius. We took measurements at three baud rates (9000, 55,000, and 110,000), with distance measured in meters. The experiment was carried out by driving a UGV in an open region at various distances and baud rates. The tests began by setting the baud rate on the UGV and the Blynk app to 9000 baud, with distance ranges from less than 8 m to more than 1600 m; we then adjusted the baud rate to 55,000 and 110,000 to get further results. The outcome is shown in Table 2, which gives the range achieved at each tested baud rate. The UGV's anticipated range was up to 1500 m; based on the results, the system's highest range is up to 1400 m, with a 5.57% error. According to Table 2, the 55,000 baud rate is the optimal choice for this UGV system, as it gives the best result with the lowest percentage of error. In Table 3, we analyze and compare different existing systems. Networked unmanned robotics technology is being actively developed for both military and civilian applications to carry out a variety of tedious, hazardous, and dirty tasks [16–23].
Table 1 Wireless frequency data

Baud rate   8–600 m   600–1000 m   1000–1200 m   1200–1400 m   1400–1600 m
9000        Yes       Yes          Yes           No            No
55,000      Yes       Yes          Yes           Yes           No
110,000     Yes       Yes          No            No            No
Table 2 Result and error (percentage)

Baud frequency   Range (meter)   Percentage error (%)
9000             1000            26
55,000           1400            5.57
110,000          1200            16.33
Table 3 Comparison of various existing systems

Refs. [12]. Model: UAV dead-reckoning system, image processing by colored pattern. Key features: IoT object tracking, path planning, localization. Technology: intelligence surveillance, reconnaissance, collaborative systems communication. Application: military use.

Refs. [13]. Model: distance threshold, path length, bat algorithm (BA) and particle swarm optimization (PSO). Key features: obstacle avoidance. Technology: detecting a shorter way, making decisions, coordinating motion and leading. Application: rescue, surveillance, and military operations.

Refs. [14]. Model: artificial intelligence (AI), KNN, nearest insertion and 2-opt methods. Key features: robotic pickers, vegetable monitoring, fruit growth calculations, soil quality improvement, monitoring, and maintenance. Technology: smart ground and air vehicles, sorting and sensing, monitoring and maintenance. Application: agricultural use.

Refs. [15]. Model: BPNN, YOLOv3, and histogram of oriented gradients (HOG) image processing algorithms, sensory system. Key features: mining, cleaning and disposing of dirt hazards, rescuing, handling exhaust fumes. Technology: speed control, vision sensor, gas detector, infrared thermography, human perception detection. Application: rescue operations.
5 Conclusion The combination of different innovations into one framework has provided a roadmap to achieving targets that have not been accomplished in such a cost-effective manner before; all components used in this research are cheap. These developments result in a self-reliant, capable system that can handle problems on its own and make a human's job easier. This kind of robot will typically operate outdoors over a wide range of terrain, taking the place of humans. The results demonstrate that the proposed model is effective and successful in finding specific target destinations and can fire together with neighboring robots in multi-robot systems such as UGVs (Unmanned Ground Vehicles). For future work, we will use advanced modules for robotic swarms with large populations. In this project, we ran experiments that provide logical raw data.
We have ensured a high success rate in controlling the UGVs, gun movement, and live video streaming from an external source. With today's technology, unmanned ground vehicles can serve valuable functions in normal, unconstrained settings. The recommendation of this paper is to enable tactical and safe missions during warfare or any critical situation where humans cannot risk their lives, so that UGVs can save more soldiers' lives and make their missions easier. Our project has some limitations: our robots cannot interlink with unknown devices, and they cannot process massive numbers of images to make decisions on their own. We will develop the project further so that these limitations no longer exist. Furthermore, we will interlink these bots with UAVs (Unmanned Air Vehicles), because together the devices can identify enemies more accurately and take them down upon detection. Mechanical improvements such as heavier guns, explosives, and medical kits could be delivered as friendly reinforcement. Finally, the most significant change will be adding facial recognition and gathering enough information to examine situations and let the robots decide on their own how to perform in dangerous and critical conditions. With these improvements, a complete UGV can be developed for the military, and it can become a leading technology of our modern era.
References 1. Roldán-Gómez JJ, González-Gironda E, Barrientos A (2021) A survey on robotic technologies for forest firefighting: applying drone swarms to improve firefighters’ efficiency and safety. Appl Sci 11(1):363 2. Liang X et al (2021) Design and development of ground station for UAV/UGV heterogeneous collaborative system. Ain Shams Eng J 3. Cao F, Jiang H (2021) Trajectory planning and tracking control of unmanned ground vehicle leading by motion virtual leader on expressway. IET Intell Transp Syst 4. Teow BHA, Yakimenko O (2018) Contemplating urban operations involving a UGV swarm. In: 2018 international conference on control and robots (ICCR). IEEE 5. Ju C, Son HI (2019) Modeling and control of heterogeneous agricultural field robots based on Ramadge-Wonham theory. IEEE Robot Autom Lett 5(1):48–55 6. Sezer V et al (2020) Conversion of a conventional wheelchair into an autonomous personal transportation testbed. Service Robotics. IntechOpen 7. Maheswaran S et al (2020) Unmanned ground vehicle for surveillance. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT), Kharagpur, India, pp 1–5. https://doi.org/10.1109/ICCCNT49239.2020.9225313 8. Liang H, Qiang H, Feng Z (2019) Research on capability characteristics modeling and cooperative fire strike planning for unmanned ground vehicles. In: 2019 2nd international conference on artificial intelligence and big data (ICAIBD). IEEE 9. Dosoftei C et al (2020) Simplified Mecanum wheel modelling using a reduced omni wheel model for dynamic simulation of an omnidirectional mobile robot. In: 2020 international conference and exposition on electrical and power engineering (EPE). IEEE 10. Hong W et al (2020) A provably secure aggregate authentication scheme for unmanned aerial vehicle cluster networks. Peer-to-Peer Netw Appl 13(1):53–63 11. Kim P (2019) UAV-UGV cooperative 3D environmental mapping. In: Computing in civil engineering 2019: data, sensing, and analytics. American Society of Civil Engineers, Reston, VA, pp 384–392
12. Tutunji TA, Salah-Eddin M, Abdalqader H (2020) Unmanned ground vehicle control using IoT. In: 2020 21st international conference on research and education in mechatronics (REM), Cracow, Poland, pp 1–5. https://doi.org/10.1109/REM49740.2020.9313890 13. Haruna Z et al (2021) Obstacle avoidance scheme based elite opposition bat algorithm for unmanned ground vehicles. Covenant J Inf Commun Technol 9(1) 14. Guzey A, Akinci MM, Guzey HM (2021) Smart agriculture with autonomous unmanned ground and air vehicles: approaches to calculating optimal number of stops in harvest optimization and a suggestion. In: Artificial intelligence and IoT-based technologies for sustainable farming and smart agriculture. IGI Global, pp 51–174 15. Szrek J et al (2021) Application of the infrared thermography and unmanned ground vehicle for rescue action support in underground mine-the amicos project. Remote Sens 13(1):69 16. Sakib AN, Ahmed S, Rahman S, Mahmud I, Belali MH (2012) WPA 2 (Wi-Fi Protected Access 2) security enhancement: analysis and improvement. Global J Comput Sci Technol 17. Ahmed S, Begum M, Siddiqui FH, Kashem MA (2012) Dynamic web service discovery model based on artificial neural network with QoS support. Int J Sci Eng Res 3(3):1–7 18. Chaki S, Ahmed S, Biswas M, Tamanna I (2022) A framework of an obstacle avoidance robot for the visually impaired people. In: Proceedings of trends in electronics and health informatics. Springer, Singapore, pp 269–280 19. Rahaman MN, Biswas MS, Chaki S, Hossain MM, Ahmed S, Biswas M (2021) Lane detection for autonomous vehicle management: PHT approach. In: 2021 24th international conference on computer and information technology (ICCIT). IEEE, pp 1–6 20. Chaki S, Ahmed S, Easha NN, Biswas M, Sharif GTA, Shila DA (2021) A framework for LED signboard recognition for the autonomous vehicle management system. In: 2021 international conference on science & contemporary technologies (ICSCT). IEEE, pp 1–6 21. Ahmed S, Shaharier MM, Roy S, Lima AA, Biswas M, Mahi MJN, ... Gaur L (2022) An intelligent and multi-functional stick for blind people using IoT. In: 2022 3rd international conference on intelligent engineering and management (ICIEM). IEEE, pp 326–331 22. Ahmed S, Biswas M, Hasanuzzaman M, Mahi MJN, Islam MA, Chaki S, Gaur L (2022) A secured peer-to-peer messaging system based on blockchain. In: 2022 3rd international conference on intelligent engineering and management (ICIEM). IEEE, pp 332–337 23. Mahi M, Nayeen J, Chaki S, Ahmed S, Tamanna I, Biswas M (2022) LCADP: a low-cost accident detection prototype for a vehicular ad hoc network. In: Proceedings of the third international conference on trends in computational and cognitive engineering. Springer, Singapore, pp 391–403
Neural Network-Based Obstacle and Pothole Avoiding Robot Md. Mahedi Al Arafat , Mohammad Shahadat Hossain , Delowar Hossain, and Karl Andersson
Abstract The main challenge of any mobile robot is to detect and avoid obstacles and potholes. This paper presents the development and implementation of a novel mobile robot. An Arduino Uno is used as the processing unit of the robot. A Sharp distance measurement sensor and Ultrasonic sensors are used for taking inputs from the environment. The robot trains a neural network based on a feedforward backpropagation algorithm to detect and avoid obstacles and potholes. For that purpose, we have used a truth table. Our experimental results show that our developed system can ideally detect and avoid obstacles and potholes and navigate environments. Keywords Mobile robot · Artificial intelligence · Neural network · Obstacle avoiding · Pothole avoiding
1 Introduction A mobile robot is a self-controlled robot that can observe its surroundings with the help of sensors and can move on its own in an unknown environment. This environment can be unpredictable, large, dynamic, and partially or completely unstructured [1].
Md. M. Al Arafat (B) · M. S. Hossain Department of Computer Science and Engineering, University of Chittagong, Chittagong 4331, Bangladesh e-mail: [email protected] M. S. Hossain e-mail: [email protected] D. Hossain Cumming School of Medicine, University of Calgary, Calgary, AB T2N 1N4, Canada e-mail: [email protected] K. Andersson Department of Computer Science, Electrical and Space Engineering, Lulea University of Technology, Skellefteå SE-931 87, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_15
A mobile robot should be capable of comprehending the environment's structure [2–6]. Mobile robot navigation is a broad area that currently covers a large number of technologies and applications [7]. These robots are becoming more and more essential in our daily lives [8], for example, in automatic cleaning, agriculture, housekeeping, hazardous environments, entertainment, space exploration, the military, social robotics, nuclear power plants, and so on [9, 10]. They have the potential to raise our standard of living [11, 12]. The primary issues with mobile robots are detecting and avoiding obstacles as well as potholes on the way toward a goal [13]. Brooks's well-known subsumption architecture states that obstacle avoidance is the lowest, or zeroth, level of competence, which means it is the core functionality of a mobile robot system upon which everything else depends [14].

The field of mobile robots is developing day by day [10]. Unlike other types of robots, a mobile robot has to navigate through the environment to accomplish its tasks, and in recent years there has been a lot of research across the world to find appropriate navigation methods for mobile robots [15, 16]. Cheng and Wang [17] presented a robot that used Gmapping SLAM and Hector SLAM for navigation. The robot's performance was very good, but the system is complex and needs high computing power. Dung et al. [18] described a humanoid mobile robot that used a neural network to navigate environments; they proposed a method using depth images for path planning, obstacle avoidance, path following, and robot localization. Their robot works very well in indoor environments, but it is expensive and needs high computing power. Zhu et al. [19] proposed a method based on monocular vision. This method performs well in simple environments but not in complex ones, and its performance depends on lighting conditions. Cho and Hong [20] presented a robot that used a laser range finder for localization. The map is given as a set of vertexes, and the robot works by matching vertexes of the map with vertexes from the laser range finder; however, it cannot detect occluded objects. Biswas et al. [21] introduced a method of robot localization and navigation based on fast sampling plane filtering, taking a 3D observation from a depth image and mapping it to a 2D map. This method shows good performance on local localization, but the robot's performance in a dynamic environment is unknown.

In this paper, we present a novel way to develop a mobile robot using an Arduino Uno as the processing unit. This research makes several contributions: (1) the robot can detect and avoid obstacles as well as potholes simultaneously, (2) it works perfectly in any lighting condition, and (3) it is very cost-effective.
2 Methodology In this section, we describe the structure of the robot, sensory system, circuit diagram, the brain of the robot, design of the neural network, training process, and robot functional diagram.
2.1 Architecture of OPHAR The robot consists of three HC-SR04 ultrasonic sensors and a Sharp GP2Y0A21YK0F distance measurement sensor (DMS) with a range of 10–80 cm [22], which scan the environment and provide input to the Arduino Uno. The Arduino processes all the input data and sends outputs to the motor driver and the servo motor: the L298N motor driver receives signals from the Arduino and drives the two DC motors accordingly, while the servo motor receives signals from the Arduino and rotates two of the sensors. Figure 1 shows the block diagram of OPHAR. The ultrasonic sensor, shown in Fig. 2a, measures distance by emitting ultrasonic sound waves (23–40 kHz) through its transmitter and waiting for the sound to reflect: it measures the time from the emission of the sound waves until it receives the reflected sound and computes the distance from that interval (a code sketch of this timing principle follows the component list below). We used the Sharp GP2Y0A21YK0F distance measurement sensor, shown in Fig. 2b, to detect potholes.
Fig. 1 Block diagram of OPHAR
Fig. 2 Sensors of OPHAR
(a) Sonar sensor.
(b) Sharp distance measurement sensor.
176
Md. M. Al Arafat et al.
Fig. 3 Physical architecture of OPHAR
Right DC Motor, and Robot chassis. The physical architecture of OPHAR is shown in Fig. 3.
2.2 Neural Network-Based Controller for OPHAR A neural network is a computational algorithm [23]. Every network contains an input layer, an output layer, and one or more hidden layers [24, 25]. The structure of the neural network for OPHAR is shown in Fig. 4. The network has four neurons in the input layer: three take input from the three HC-SR04 ultrasonic sensors, and the fourth takes input from the Sharp GP2Y0A21YK0F distance measurement sensor. The hidden layer contains five neurons; it extracts features from the input data and uses them to link inputs to the correct outputs [26]. Lastly, the output layer has three neurons: two give output for the left and right motors, and the third provides output for the servo motor. The basic idea of designing a C program to create a neural network is to build an architecture of data arrays to store the weights and keep track of the running totals as signals pass forward and errors are fed backward. When the backpropagation algorithm runs through the network, many nested loops iterate across these arrays and perform the required calculations [27].
Fig. 4 Structure of neural network for OPHAR
The learning rate, the number of hidden neurons, the momentum, and the initial weights all work together to train the system faster, make it effective, and minimize the errors that occur in neural network design [28, 29]. A lower learning rate results in slower learning, but it reduces the likelihood of the network entering an oscillating condition in which it overshoots the solution and never converges [30, 31]. We use a learning rate of 0.3. By integrating a part of the earlier backpropagation into the current one, momentum smooths out the training process and helps find a good solution. Hidden neurons affect the speed of training and can help prevent getting stuck in a local minimum. The capability to solve complex problems also depends on the hidden layers, but a larger number of neurons requires storing a larger number of weights. The initial weight values should be small; at the beginning, we set all weights between −0.5 and 0.5. The values of these parameters depend on the training data. The threshold level of error is set by the success value, which is very small; the total error of this type of network approaches 0 but cannot reach it. The Arduino Uno's 2 KB of SRAM can easily run 4 input, 5 hidden, and 3 output neurons. We used the backpropagation algorithm, which enforces the delta rule, to train the neural network. The principal preoccupation of a neural network is to determine accurate weight values, so the network starts with random weights. Gradient descent drives the backpropagation process: it calculates the magnitude of error and adjusts the weights to reduce it. First, gradient descent calculates the magnitude of error at each neuron; the larger the difference between the actual output and the target output, the higher the error. Calculating the magnitude of error at the output layer is simple, but the hidden layer has no target to measure against, so calculating its error is a little more complex. The relationship between the output-layer errors and the weights determines the magnitude of error for each hidden neuron.
Algorithm 1 High-level breakdown of program logic
1. Initialize the arrays. The weight array holds random numbers, and the two additional arrays needed for backpropagation are set to 0.
2. Start a big loop in which the whole system runs using the training dataset.
3. Randomize the order of the training set on each iteration to reduce the chance of getting stuck in a local minimum.
4. Calculate the hidden- and output-layer activations and errors.
5. Back-propagate the errors.
6. Recalculate the weight values.
7. Run again if the error value exceeds the success threshold.
8. If the success threshold exceeds the error value, stop and display success as well as all values on the serial monitor.
9. The result of a test run is displayed on the serial monitor every 500 cycles.
Table 1 Training dataset

             Input                                Output
Sonar front   DMS   Sonar left   Sonar right     Left motor   Right motor   Servo motor
1             0     1            1                1            1             1
1             1     1            1                0            0             0
0             1     0            0                0            0             0
The code iterates through all the output connections for every hidden neuron, multiplying the weights by the magnitude of error and keeping track of the total; the magnitude of error at the hidden layer is then calculated. After that, the amount of change in every weight is determined, and the new weight is calculated by adding the change value to the old weight. Finally, we come to bias, which is a straightforward concept: each of the input and hidden layers has one additional neuron that is constantly active. The bias has many positive effects: it improves stability and increases the number of available solutions. The most crucial aspect is that if all the inputs were zero, no signal would pass through the network; the bias avoids this possibility. The network is trained with the help of a training dataset, which consists of a set of inputs and the corresponding outputs.
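A condensed sketch of this backpropagation step, under the paper's 4-5-3 layout and 0.3 learning rate, is given below. The array names are mine, and the momentum and bias terms the paper uses are omitted for brevity.

```c
#include <stdio.h>

#define INPUTS 4
#define HIDDEN 5
#define OUTPUTS 3
#define LR 0.3f  /* learning rate stated in the text */

float input[INPUTS], hidden_out[HIDDEN];
float output_delta[OUTPUTS], hidden_delta[HIDDEN];
float w_ih[INPUTS][HIDDEN];   /* input-to-hidden weights  */
float w_ho[HIDDEN][OUTPUTS];  /* hidden-to-output weights */

void backpropagate(void) {
    /* Hidden-layer error: for each hidden neuron, iterate over its output
       connections, multiplying each weight by the output error; scale the
       total by the sigmoid derivative h*(1-h), since the hidden layer has
       no direct target to compare against. */
    for (int h = 0; h < HIDDEN; h++) {
        float acc = 0.0f;
        for (int o = 0; o < OUTPUTS; o++)
            acc += w_ho[h][o] * output_delta[o];
        hidden_delta[h] = acc * hidden_out[h] * (1.0f - hidden_out[h]);
    }
    /* Delta rule: new weight = old weight + change, where the change is
       the learning rate times the error times the incoming signal. */
    for (int h = 0; h < HIDDEN; h++)
        for (int o = 0; o < OUTPUTS; o++)
            w_ho[h][o] += LR * output_delta[o] * hidden_out[h];
    for (int i = 0; i < INPUTS; i++)
        for (int h = 0; h < HIDDEN; h++)
            w_ih[i][h] += LR * hidden_delta[h] * input[i];
}

int main(void) {
    backpropagate();            /* one update pass on zero-initialized arrays */
    printf("%f\n", w_ho[0][0]);
    return 0;
}
```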
2.2.1
Training of Neural Network for OPHAR
There are two data arrays in the configuration section: one for input and another for output. These arrays form the truth table for training the network. In the input section, 1 is the maximum output value of a sensor and 0 is the minimum. In the output section, 1 for the left or right motor means that wheel rotates forward at full speed, and 0 means it rotates backward at full speed. If the servo motor's output is less than 0.5, the robot stops moving and the servo motor rotates the sensors to observe the surroundings. The truth table shown in Table 1 is used to train the network until it achieves a certain
degree of accuracy. With the help of this truth table, the neural network adjusts the weights to reduce the error. The training results are periodically written to the serial port and can be monitored with the Serial Monitor of the Arduino IDE.
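The two configuration arrays the text describes could look roughly like this, following the reconstruction of Table 1 above; the array names and row comments are illustrative.

```c
#include <stdio.h>

/* Truth-table training data from Table 1 expressed as C arrays.
   Inputs:  sonar front, DMS, sonar left, sonar right
   Outputs: left motor, right motor, servo motor */
const int PatternCount = 3;
const float Input[3][4] = {
    {1, 0, 1, 1},  /* path clear           -> drive forward */
    {1, 1, 1, 1},  /* pothole detected     -> stop and scan */
    {0, 1, 0, 0},  /* obstacles all around -> stop and scan */
};
const float Target[3][3] = {
    {1, 1, 1},
    {0, 0, 0},
    {0, 0, 0},
};

int main(void) {
    printf("%d training patterns, first target %.0f\n",
           PatternCount, Target[0][0]);
    return 0;
}
```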
2.3 Working Flowchart of OPHAR We used two types of sensors in this robot: three ultrasonic sensors and one Sharp 2Y0A21 distance measurement sensor (DMS). The ultrasonic sensors face forward and the DMS faces downward, measuring the distance from the robot chassis to the ground; if the robot finds a hole in front of it, it will not move forward. One ultrasonic sensor and the distance measurement sensor are attached on top of the servo motor, which is mounted at the front of the robot. The motor has a rotational range of 0° to 180°, with its initial position set at 90°. When the direction is set to 0, the motor rotates to 0°, which corresponds to the left side of the robot; similarly, 90° represents the front, while 180° corresponds to the right side, akin to the concept of a protractor. Two additional ultrasonic sensors are attached on the left and right sides of the robot chassis, both facing forward; they cover the areas that the first sensor cannot. The sensor outputs are mapped and constrained between 0 and 100 and then divided by 100 before being fed into the neural network. As we used the sigmoid function, the system's outputs lie between 0 and 1. Each output is then multiplied by 100 and fed into one of three functions: one for the left motor, one for the right motor, and one for the servo motor. If the input to the left-motor function is less than 50, it selects direction 0: the system sets the L298N motor driver's left-motor input pin 1 LOW and input pin 2 HIGH, maps the driving-force value from (49, 0) to (20, 180), and sends it to the left motor's PWM pin, which adjusts the rotation speed (a code sketch of this mapping follows this subsection). We made the system robust by mapping between 20 and 180: if the PWM value is less than 20, the motor driver does not provide enough power to move the robot, and if it is greater than 180, the robot moves so fast that there is a chance of collision. If the left-motor function gets an input greater than 50, it selects direction 1: the system sets the driver's left-motor input pin 1 HIGH and input pin 2 LOW, maps the driving-force value from (51, 100) to (20, 180), and sends it to the PWM pin. If the left-motor function gets an input of exactly 50, it sets both left-motor input pins LOW and the robot stops moving. The right-motor function works the same way as the left-motor function. The functional flowchart is shown in Fig. 5. Lastly, the servo-motor function also gets an input between 0 and 100; the bigger the value, the closer the object. If it gets an input less than 40, it sets the driving force to 50 for both the left and right motors so that the robot stops moving. Initially, the servo motor is at a 90° angle. First, it rotates the sensors to 0° and measures the distance in that direction. Then it rotates to
Fig. 5 Functional flowchart of OPHAR
180° and measures the distance in that direction. OPHAR then compares the two readings and selects a direction. If the robot selects the 0° direction, the left-motor driving force is set to 70 and the right-motor driving force to 25 for 250 milliseconds; in this way, the right wheel rotates forward while the left wheel rotates backward, turning the robot left. Then the robot moves forward again. If the robot selects the 180° direction, the left-motor driving force is set to 70 and the right-motor driving force to 25 for 250 milliseconds. After all of these operations, the robot moves forward again.
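The motor-drive mapping described in this section can be sketched as an Arduino-style function; the pin numbers are placeholders, while the direction rules and the (20, 180) PWM clamp follow the text.

```c
// Sketch of the left-motor drive logic described above; map() and
// analogWrite() are Arduino built-ins. Pin numbers are illustrative.
const int IN1 = 7, IN2 = 8, ENA = 5;  // L298N left-motor pins (placeholders)

void driveLeftMotor(int force) {      // force: neural-net output scaled 0..100
  if (force < 50) {                   // direction 0: pin 1 LOW, pin 2 HIGH
    digitalWrite(IN1, LOW);
    digitalWrite(IN2, HIGH);
    analogWrite(ENA, map(force, 49, 0, 20, 180));   // PWM kept in 20..180
  } else if (force > 50) {            // direction 1: pin 1 HIGH, pin 2 LOW
    digitalWrite(IN1, HIGH);          // (forward, per the training convention)
    digitalWrite(IN2, LOW);
    analogWrite(ENA, map(force, 51, 100, 20, 180));
  } else {                            // force == 50: both pins LOW, stop
    digitalWrite(IN1, LOW);
    digitalWrite(IN2, LOW);
  }
}

void setup() {
  pinMode(IN1, OUTPUT);
  pinMode(IN2, OUTPUT);
  pinMode(ENA, OUTPUT);
}

void loop() {
  driveLeftMotor(75);  // example: drive forward at moderate speed
}
```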
3 Experimental Results After powering on, the robot first trains the neural network with the help of the truth table shown in Table 1. Then it takes input from the sensors. The Arduino Uno processes the sensor data and provides the necessary signals to the motor driver and the servo motor; the motor driver powers the robot's wheels, and the servo motor rotates the sensors to observe the surroundings. Figures 6 and 7 show how the robot avoids obstacles and potholes: Fig. 6 shows snapshots of robot navigation, and Fig. 7 shows how the robot avoids potholes. The robot's performance was tested several times in different environments, each time with several obstacles and potholes around it, and the robot successfully avoided all of them. Although the robot performs well in both light and dark conditions, we compared its performance across the two; Fig. 8 shows the comparison. In every round, the robot was tested 10 times in each environment. In round 1, the robot successfully avoided obstacles 9 times indoors and 8 times outdoors. Figure 8 shows five rounds of test results.
Fig. 6 Robot avoiding obstacles: a The OPHAR; b found obstacle, read sensors, and select new direction; c turned right, found obstacle; d turned left, found obstacle; e turned right, found obstacle; f turned right and found exit
Fig. 7 Robot avoiding potholes: a found pothole; b changed direction
Fig. 8 Indoor and outdoor result comparison
4 Conclusion and Future Work We have developed and implemented a mobile robot based on the Arduino Uno. It uses a distance measurement sensor and ultrasonic sensors to scan its surroundings for navigation. We trained a neural network based on a feedforward backpropagation algorithm to navigate environments while avoiding obstacles and potholes. The robot performed very well in different environments and lighting conditions, and it is very cost-effective.
In the future, we plan to replace the Arduino Uno with a Raspberry Pi and add a camera so that the robot can not only detect but also recognize objects. The robot will gain more functionality and become much more efficient.
Machine Learning for Society
A Comparative Study of Psychiatric Characteristics Classification for Predicting Psychiatric Disorder Md. Sydur Rahman and Boshir Ahmed
Abstract Psychiatry addresses one of the greatest issues in public health. There are numerous indicators that can be utilized to assess a person's mental health, and several factors can affect a person's physical and financial well-being. Mental illness may require psychiatric treatment. Schizophrenia primarily affects women and can be lethal, although men are more likely than women to exhibit symptoms of this illness. Mental illness causes antisocial conduct, which distorts social interactions; consequently, social issues that were already apparent have spread. According to a global survey, adults in their 20s and 30s are susceptible to anxiety, substance misuse, hazardous conduct, hubris, suicidal thoughts, despair, bewilderment, and disturbances of consciousness. The prevalence of mental disease increased from 10.5% in 1990 to 19.86% in 2022, and psychiatric illness accounts for 14.3% of deaths worldwide (about eight million in all). A labor-intensive survey questionnaire was utilized to obtain data. Combining the essential components of surveys for distinct psychiatric conditions from various research studies yielded the optimal translation of responses into numerical values. After data collection, a normalization technique is applied and psychiatric features are extracted from the processed dataset. We applied K-Nearest Neighbor (KNN), polynomial kernel SVM, Naïve Bayes, decision tree, and logistic regression classifiers to categorize the extracted features on a scaled dataset of psychiatric patients. By comparing its performance to that of all other classifiers, this study proposes the polynomial kernel SVM as the classifier to predict the psychiatric disease of each patient. Keywords Psychiatric disorder classification · Performance analysis · Prediction
Md. S. Rahman (B) Bangladesh Army University of Science & Technology (BAUST), Saidpur, Bangladesh e-mail: [email protected] B. Ahmed Rajshahi University of Engineering & Technology (RUET), Rajshahi, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_16
1 Introduction Using algorithms learned from training data, the relative weights of numerous categories and relevant parameters are determined. Psychiatric traits were collected through surveys, and each group was surveyed regularly. The survey's positive results led to the implementation of a data collection and processing environment. No single machine learning classifier can accurately separate psychiatric diseases from healthy controls using psychiatric features alone. Using machine learning algorithms, the relative importance of symptoms, mental toughness, and past and present psychiatric goals is determined. We hypothesized that by applying KNN, SVM, LR, and DT classifiers to each predefined characteristic collected from a questionnaire survey, we would obtain unbiased estimates of the most influential elements in the classification of psychiatric diseases. These estimates could then be used to direct classification algorithms based on the retrieved features. Carefully designed questions elicit responses that reflect critical thinking skills, and statistics were utilized to assess psychiatric symptoms. An alternate objective is to examine healthy controls who may have received a diagnosis of mental illness; geography and age have a bearing on psychological evaluations. After classifying all implemented datasets of specific psychiatric features, it would be simpler to predict a person's mental capacity and recommend whether or not they should be referred to a psychiatrist.
2 Background Study In this study, a classification technique based on machine learning was combined with complicated multi-layer data, such as health evaluations, treatment decisions, and evaluations of risk factors, for medical decision-making [1]. Various mental health issues were identified in psychiatric patients using machine learning approaches. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), logistic regression, and decision tree classifiers have been used to categorize the level of psychiatry [2] based on the transformation and transposition of significant characteristics such as anxiety, drug abuse, high-risk behavior, arrogance, suicidal ideation, depression, and confusion. Generalized anxiety disorder (GAD) is a prevalent mental condition that is difficult to identify objectively; one study assessed the validity and reliability of a self-report measure developed to identify GAD cases [3]. The DAST-10, suitable for older children and adults, gauges the severity of drug-related effects and can be utilized for population screening, clinical case detection, and therapy evaluation [4]. Arrogance is discussed in many fields; aggressive, comparative, and individual arrogance have been categorized (each logically related to the next), and illusory self-worth causes inflated self-esteem, false superiority, and disrespectful behavior [5]. A depression–anxiety self-assessment test was used [6]. Analyzing bewilderment requires the same methods used to diagnose neurological or physical illnesses: using the patient's medical history, physical examination, and observations, the doctor can diagnose and treat the patient [7]. Risk-taking has always appealed to professionals and scientists, yet multiple studies demonstrate that current measurement methods are neither trustworthy nor accurate. In view of current calls for new measures of the risk-taking concept, one study examines six important factors to consider when creating or implementing such measures, re-evaluates the measurement of passive risk-taking, and promotes greater prudence while taking risks; these ideas should enable researchers to make educated judgments regarding risk [8]. Neuroscientists have explored consciousness [9]. Applying a machine learning classification algorithm to complex multi-layer [10] data, such as risk factor assessments, therapy options, and health evaluations, improved medical decision-making.
3 Methodology The aim of our research is to categorize psychiatric qualities using different types of machine learning algorithms and to illustrate the statistical breakdown of each trait. Based on the data, a prediction may then be made of who would experience severe, mild, or no psychiatric symptoms. Addressing the problem involves three crucial steps: 1. data collection, 2. feature extraction, and 3. applying the classifiers. Figure 1 depicts the methodology's data flow diagram.
3.1 Data Collection Surveys with sufficient questionnaires should examine anxiety, substance misuse, risk-taking, hubris, confusion, and despair.
Fig. 1 Workflow diagram of our methodology
Table 1 Sample responses to anxiety level measurement

| Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Anxiety |
| Low | Yes | Medium | Medium | Moderate | High | Yes | 15 |
| Moderate | Yes | High | Medium | Low | Low | Slightly | 13 |
| Low | At times | High | Medium | High | High | Yes | 16 |

Table 2 Sample responses of arrogance level measurement

| Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Arrogance |
| Low | Yes | Medium | Medium | Moderate | High | 15 |
| Acute | Yes | High | Medium | Moderate | Moderate | 17 |
| Low | At times | High | Medium | High | High | 16 |
Each question's answers were rated 0, 1, 2, and 3 for accuracy, and participants' replies were tabulated. A threshold for the presence of each factor, in terms of numbers and categories, was necessary for all enquiries, and we summed each attribute's values to get the score. The psychiatric traits were treated similarly. Generalized anxiety disorder (GAD) causes excessive worry; most people know their dread is unjustified and uncontrollable, and constant stress may produce anxiety. We wanted to compare test results to anxious patients' survey responses. Table 1 shows anxiety-measuring responses. We addressed hubris from an interdisciplinary standpoint. Six stimulant components are conceptually interconnected: misunderstandings, inefficient communication, a disproportionate feeling of superiority over others, an unjustified appraisal of them, and blatant contempt are some of the contributing factors. Even if each component is present when the next one operates, the direction of causation may go either way. Table 2 exhibits arrogance question responses and reply remarks. In a similar fashion, we collected tables of responses from the incorporated survey for substance usage, despair, a propensity for taking large risks, and confusion, to generate our desired dataset.
3.2 Feature Extraction Before applying machine learning algorithms to the dataset, the features must be extracted. Six mental qualities, namely anxiety, drug usage, arrogance, depression, a desire to take risks, and perplexity, are illustrated in Table 3. After scaling for feature extraction, each feature lies in a range between a minimum of zero and a maximum of one. Using this compression technique, the data are reduced to fit inside a predetermined range, frequently between 0 and 1: a feature scaler modifies the data by rescaling the attributes to the predetermined range [11].
Table 3 A data set example with six psychiatric characteristics

| Anxiety | Drug abuse | Arrogance | Depression | Risk-taking | Confusion |
| 13 | 11 | 14 | 19 | 14 | 12 |
| 17 | 20 | 13 | 20 | 13 | 16 |
| 16 | 12 | 18 | 17 | 20 | 29 |

Table 4 Scaled features

| Anxiety | Drug abuse | Arrogance | Depression | Big risk | Confusion |
| 0.22 | 0.30 | 0.4285 | 0.6190 | 0.4111 | 0.5238 |
| 0.777 | 0 | 0.5238 | 0.9523 | 0.7058 | 0.6190 |
| 0.6666 | 0.55 | 0.6666 | 0.9047 | 0.5882 | 1.00 |
It keeps the structure of the original distribution when used to condense a range of values into a smaller one. Min–max scaling is computed using the following formulas:

x_std = (x − x_min) / (x_max − x_min)   (1)

x_scaled = x_std × (max − min) + min   (2)

where (min, max) is the desired feature range, x_min is the minimum value of the feature, and x_max is the maximum value of the feature. Table 4 shows an example of our scaled data after applying min–max normalization.
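To make the scaling step concrete, the following minimal Python sketch reproduces Eqs. (1) and (2). The rows are only the three example records from Table 3, so the output differs from Table 4, whose min/max values were computed over the full dataset.

```python
# Min-max scaling of Eqs. (1)-(2), a minimal sketch on the Table 3 rows only.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[13, 11, 14, 19, 14, 12],
              [17, 20, 13, 20, 13, 16],
              [16, 12, 18, 17, 20, 29]], dtype=float)

# Eq. (1): standardise each column using its own min and max
x_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# Eq. (2) with feature range (min, max) = (0, 1) leaves x_std unchanged
print(x_std)

# scikit-learn equivalent of the same transformation
print(MinMaxScaler(feature_range=(0, 1)).fit_transform(X))
```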
3.3 KNN Classifier The K-nearest neighbor (KNN) approach is a fundamental supervised learning technique. It assumes that new cases are similar to existing ones and groups them into the class that most closely resembles the existing categories; by comparing newly collected data to previously annotated data, KNN classifies new data quickly [12]. The data plotted after applying the KNN classifier is shown in Fig. 2. We chose the number of neighbors k = 5 and computed the Euclidean distance, a notion from geometry measuring the separation between two points, to determine which classes contained the nearest neighbors.
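A hedged sketch of this step follows; X_scaled (the min–max-scaled feature matrix) and levels (the psychiatry-level labels) are placeholders for the dataset described above, not variables defined in the paper.

```python
# k = 5 nearest-neighbour classification with Euclidean distance, a sketch.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, levels, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of correctly classified test samples
```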
Fig. 2 Data plotting after applying KNN
Fig. 3 Data plotting after applying SVM
3.4 SVM with Polynomial Kernel A polynomial kernel is a common example of a kernel of degree greater than one; it is a more general form of the linear kernel [13]. The polynomial kernel function is represented by the following mathematical function:

K(X_i, Y_j) = (X_i · Y_j + c)^n   (3)

where n is the degree of the polynomial and "·" is the dot product of the two vectors. In Fig. 3, three different classes of psychiatric patients have been separated using two distinct hyper-planes with the polynomial kernel SVM classifier.
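As a sketch, scikit-learn's polynomial kernel computes (gamma · ⟨X_i, Y_j⟩ + coef0)^degree, which matches Eq. (3) when gamma = 1, coef0 = c, and degree = n. The values below are illustrative, not the paper's tuned settings, and the split comes from the KNN sketch above.

```python
# Polynomial-kernel SVM matching Eq. (3) with gamma = 1, a sketch.
from sklearn.svm import SVC

svm_poly = SVC(kernel="poly", degree=3, coef0=1.0, gamma=1.0)
svm_poly.fit(X_train, y_train)
print(svm_poly.score(X_test, y_test))
```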
3.5 Naïve Bayes Classifier In addition to the Bayes rule, the Naïve Bayes approach makes the strong assumption that the characteristics are conditionally independent given the class. Although this independence assumption is occasionally violated in reality, the accuracy of Naïve Bayes classification is often comparable, and Naïve Bayes is frequently utilized in practice [14] because of its computational efficiency and a plethora of other attractive qualities. Naïve Bayes is a type of Bayesian network classifier based on Bayes' rule and uses the following formula:

P(y|x) = P(y) P(x|y) / P(x)   (4)

The data plotted after applying the Naïve Bayes classifier is presented in Fig. 4.
Fig. 4 Data plotting after applying the Naive Bayes classifier
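A short sketch of this classifier on the same train/test split follows; the Gaussian variant is one assumption for continuous features, since the paper does not name the exact variant used.

```python
# Gaussian Naive Bayes, a sketch: Bayes' rule of Eq. (4) under the
# conditional-independence assumption, reusing the earlier split.
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.predict_proba(X_test[:3]))  # posterior P(y | x) for each class
print(nb.score(X_test, y_test))
```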
3.6 Decision Tree Classifier Decision trees are popular for classification and regression problems. A tree has two kinds of nodes: decision (inner) nodes, which test a dataset attribute and branch accordingly, and leaf nodes, which hold the classification outcome. Figure 5 shows the training and test data plotted after applying the decision tree classifier; a short sketch follows the figure.
Fig. 5 Data plotting after applying the decision tree classifier
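The sketch below fits a tree on the same split; max_depth and the feature names are illustrative assumptions, not values reported in the paper.

```python
# Decision tree, a sketch; export_text prints the decision nodes (attribute
# tests) and the leaf nodes (class outcomes) of the learned tree.
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["anxiety", "drug_abuse", "arrogance",
            "depression", "risk_taking", "confusion"]
tree = DecisionTreeClassifier(max_depth=4)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=features))
```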
Fig. 6 Data plotting after applying logistic regression classifier
3.7 Logistic Regression Categorization is aided by multinomial logistic regression. By default, logistic regression handles two categories; extensions such as one-vs-rest divide a multi-class problem into several binary classification problems [15]. In multinomial logistic regression, the loss function is changed to a cross-entropy loss, and the predicted probability distribution becomes a multinomial probability distribution. Figure 6 shows the plotted dataset after logistic regression classification.
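A minimal sketch of the multinomial form on the same split follows; max_iter is an illustrative choice.

```python
# Multinomial logistic regression, a sketch: the softmax output is a
# multinomial probability distribution and the loss is cross-entropy.
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(multi_class="multinomial", max_iter=1000)
logreg.fit(X_train, y_train)
print(logreg.predict_proba(X_test[:3]))  # one probability per psychiatry level
```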
4 Performance and Result Analysis The effectiveness of the employed classifiers is evaluated using the following statistical criterion. A classifier's accuracy is the proportion of correct assessments among all assessments, calculated with the following expression:

Accuracy = (TN + TP) / (TN + TP + FN + FP)   (5)
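The sketch below evaluates Eq. (5) for the polynomial-kernel SVM fitted earlier; in the multi-class case accuracy_score reduces to the same ratio of correct predictions over all predictions.

```python
# Accuracy of Eq. (5) computed from the confusion matrix, a sketch.
from sklearn.metrics import accuracy_score, confusion_matrix

y_pred = svm_poly.predict(X_test)
print(confusion_matrix(y_test, y_pred))
# correct / total, i.e. (TN + TP)/(TN + TP + FN + FP) in the binary case
print(accuracy_score(y_test, y_pred))
```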
The classifier’s True Positive, True Negative, False Positive, and False Negative values are indicated here by the letters TP, TN, FP, and FN. Table 5 lists the classification accuracy of each classifier. With accuracy rates of 96.62% for KNN, 99.17% for polynomial Kernel SVM, 97.04% for Naive Bayes, 94.05% for Decision Tree, and 96.62% for Logistic Regression, the performance of each classifier has been represented as a bar graph. A visual illustration of it is shown in Fig. 7. The AUC scores of KNN), SVM, Naïve Bayes, DT, and LR classifier have been found as like as presented in Table 6.
Table 5 Accuracy measurement with different classifiers

| Classifier name | Accuracy (%) |
| KNN | 96.62 |
| Polynomial Kernel SVM | 99.17 |
| Naïve Bayes | 97.04 |
| Decision tree | 94.05 |
| Logistic regression | 96.62 |

Fig. 7 Accuracy bar chart for each classifier

Table 6 AUC score measurement with different classifiers

| Classifier name | AUC score |
| KNN | 96.62 |
| Polynomial Kernel SVM | 99.17 |
| Naïve Bayes | 97.04 |
| Decision tree | 94.05 |
| Logistic regression | 96.62 |
Using the K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naive Bayes, decision tree, and logistic regression classifiers on our processed data of various psychiatric features, we can quickly analyze the statistical results. Additionally, we obtained the confusion matrices shown in Fig. 8 for each classifier. Classification reports for the KNN, SVM, Naive Bayes, decision tree, and logistic regression classifiers are presented in Fig. 9. Finally, we obtained ROC curves for every classified level of psychiatry on the dataset built from the psychiatric characteristics, shown in Fig. 10.
Fig. 8 Confusion matrix of applied classifiers
Fig. 9 Classification report summary of all classifiers
5 Prediction Having completed the most challenging tasks of generating a dataset from the responses to predefined questionnaires for each psychiatric feature and applying the kernel SVM to it, a prediction of the psychiatric condition of any individual may be demonstrated. Considering all confusion matrices, performance charts, classification reports, and ROC curves of KNN, SVM, Naïve Bayes, DT, and LR, we found that our suggested polynomial kernel SVM performed with the best accuracy of 99.17%. We may therefore propose the polynomial kernel SVM as our expected classifier to predict the psychiatric disorder of each psychiatric patient.
Fig. 10 ROC curve of multi-class for all classifiers
6 Conclusion We have devised a technique to quickly identify psychological issues in individuals. After a person has responded to a simple survey of questionnaires covering a variety of psychiatric aspects, we may quickly assess their mental condition by utilizing a support vector machine (SVM) classifier with a non-linear polynomial kernel and adhering to the procedures described above.
References
1. Rahman MS (2022) Predicting psychiatric disorder from the classified psychiatric characteristics using machine learning algorithm. Eur J Inf Technol Comput Sci 2(4):5–10
2. Walsh-Messinger J (2019) Relative importance of symptoms, cognition, and other multilevel variables for psychiatric disease classifications by machine learning. Psychiatry Res 278:27–34
3. Spitzer RL (2006) A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med 166(10):1092–1097
4. French MT (2001) Using the drug abuse screening test (DAST-10) to analyze health services utilization and cost for substance users in a community based setting. Subst Use Misuse 36(6–7):927–943
5. Cowan N (2019) Foundations of arrogance: a broad survey and framework for research. Rev Gen Psychol 23(4):425–443
6. Epstein-Lubow G (2010) Evidence for the validity of the American Medical Association's caregiver self-assessment questionnaire as a screening measure for depression. J Am Geriatr Soc
7. Johnson MH (2001) Assessing confused patients. J Neurol Neurosurg Psychiatry 71(suppl 1):i7–i12
8. Bran A (2020) Assessing risk-taking: what to measure and how to measure it. J Risk Res 23(4):490–503
9. Vink P (2018) Consciousness assessment: a questionnaire of current neuroscience nursing practice in Europe. J Clin Nurs 27(21–22):3913–3919
10. Mayoraz E, Alpaydin E (1999) Support vector machines for multi-class classification. In: International work-conference on artificial neural networks. Springer, pp 833–842
11. Saranya C, Manikandan G (2013) A study on normalization techniques for privacy preserving data mining. Int J Eng Technol 5:2701–2704
12. Zhang (2020) Cost-sensitive KNN classification. Neurocomputing 391:234–242
13. Pujari P (2022) Classification of Pima Indian diabetes dataset using support vector machine with polynomial kernel. Deep Learn Mach Learn IoT Biomed Health Inf 5(2):55–67
14. Rish I (2001) An empirical study of the naïve Bayes classifier. In: IJCAI workshop on empirical methods in artificial intelligence, vol 3, pp 41–46
15. Karsmakers P, Pelckmans K, Suykens J (2007) Multi-class kernel logistic regression: a fixed-size implementation. IEEE 18:1756–1761
Material Named Entity Recognition (MNER) for Knowledge-Driven Materials Using Deep Learning Approach M. Saef Ullah Miah and Junaida Sulaiman
Abstract The scientific literature contains an abundance of cutting-edge knowledge in the field of materials science, as well as useful data (e.g., numerical values from experimental results, properties, and structure of materials). To speed up the identification of new materials, these data are essential for data-driven machine learning (ML) and deep learning (DL) techniques. Due to the large and growing amount of publications, it is difficult for humans to manually retrieve and retain this knowledge. In this context, we investigate a deep neural network model based on Bi-LSTM to retrieve knowledge from published scientific articles. The proposed deep neural network-based model achieves an F1 score of 97% for the Material Named Entity Recognition (MNER) task. The study addresses motivation, relevant work, methodology, hyperparameters, and overall performance evaluation. The analysis provides insight into the results of the experiment and points to future directions for current research. Keywords Named entity recognition · Material named entity recognition · Materials science · EDLC · Bi-LSTM
M. Saef Ullah Miah (B) · J. Sulaiman Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, 26600 Pekan, Pahang, Malaysia e-mail: [email protected] J. Sulaiman e-mail: [email protected] J. Sulaiman Center for Data Science and Artificial Intelligence (Data Science Center), Universiti Malaysia Pahang, 26600 Pekan, Pahang, Malaysia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_17
1 Introduction The task of named entity recognition (NER) in the realm of materials science is termed material named entity recognition (MNER) [19]. NER is a natural language processing (NLP) technique that automatically detects and classifies named entities in a text [21]. Names of persons, organisations, dates,
monetary values, location names, and amounts are all examples of entities. Entities in a text are considered key pieces of information and are important for understanding its context. The term "Named Entity", and in particular the word "Named", is intended to limit the range of potential entities to those for which one or more rigid designators serve as the referent [1, 2, 4, 17, 23]. When a designator designates the same item in all possible worlds, it is rigid; on the contrary, flaccid designators can refer to a variety of things in a variety of possible worlds [20]. In the field of materials science, for example, various material names, their property names, and synthesis processes can be defined as named entities. The MNER model comprises two steps, like any other NER model: 1. detection of a material named entity, and 2. categorisation of the detected named entity. The first step is to identify a word or series of words that together form an entity. Each word in a string represents a token, and each named entity can be formed from a single token or a combination of tokens: "carbon" is a named entity with a single token, whereas "carbon monoxide" is a named entity with multiple tokens. This recognition can be done by a variety of methods, such as rules, dictionaries, or machine learning [5]. The second phase entails the establishment of material-specific entity types; for example, "Mat," "Proc," and "Cmt" can refer to materials, synthesis processes, and characterisation methods, respectively. These categories can be defined or generated by domain experts as needed for a specific task. As mentioned earlier, NER enables easy identification of key components within a text, and extraction of those components enables organisation of unstructured data and detection of critical information; similarly, MNER is critical for dealing with large data sets or knowledge-based materials systems. The number of scientific publications in materials science has increased by orders of magnitude over the last few decades, and a significant inconvenience in materials discovery occurs when new results are compared to previously published literature. A possible workaround to this problem is to convert the fragmented raw text of articles into formatted database records that are queryable programmatically. To accomplish this, text mining combined with MNER is performed to extract large amounts of information from the literature of materials science publications. MNER is critical to data- or knowledge-driven materials science, also known as materials informatics [11], an obvious component of Industrial Revolution 4.0. MNER helps identify materials, synthesis processes, characterisation methods, and many other types of entities that are essential for materials discovery research and for identifying the synthesis processes that produce a substance or new object. MNER is the most important part of any knowledge-based materials system that deals with the discovery, extraction, and knowledge representation of materials, processes, and other entities from published works. The contributions of this study are as follows:
– A deep neural network architecture for recognising material and process entities from scientific articles. – A performance comparison between the proposed model and different baseline machine learning models. The remaining portion of the manuscript is organised as follows: Sect. 2 discusses relevant works; Sect. 3 presents the methodology employed in this study; the experiment and results are presented in Sect. 4; and Sect. 5 concludes the study.
2 Related Work NER is undeniably a new field for the materials community. Training data is required for entity recognition models; if a domain already has a knowledge base, distant supervision can be used to train on known items and relationships. The most extensively used NER methods are dictionaries, rules, and machine learning, including deep learning, and knowledge-driven materials pipelines typically use all three methodologies. For efficient utilisation of annotated data, hybrid systems apply machine learning only when dictionaries or rules cannot manage the situation, with the deep learning model processing the data in a sequential or semantic fashion. Dictionary searches, on the other hand, cover material compositions, chemical element names, properties, procedures, and experimental data. A rule-based approach is a set of manually created rules or specifications that indicate how entities should be handled and in what relative order the rules apply. Rules can be created using corpus-based methods, where multiple cases are analysed to find patterns, or by using domain knowledge and lexical conventions. LeadMine [12, 16] uses rules for naming conventions, ChemicalTagger [8] analyses empirical synthesis sections of chemical texts, and parts of ChemDataExtractor use nested rules. For example, to use ChemDataExtractor [24] with magnetic materials, researchers added domain-specific parsing rules with domain-specific terms (e.g., magnetic materials such as ferromagnetic and ferrimagnetic) [3]. Finally, using a feature-based representation of the observed data, machine learning-based algorithms identify specific entity names. Since a sentence is a series of words, it is not sufficient to focus on the current word only: in sequential models, the preceding, current, and next words are required. Unlike rule-based approaches, supervised machine learning models require a huge amount of expert-annotated data and strict annotation rules, and machine learning methods require careful evaluation of the recognised classes and the tag classification order.
constituents and molar ratio balances. To find balanced reactions between precursor and target materials, the authors solved a system of equations, and the open substances were generated from combinations of the precursor and target materials. Because not all sorts of patterns can be implemented for all domains, and a considerable amount of corpus data is required for a specific domain, rule-based or dictionary-based approaches are time-consuming and do not guarantee performance. After examining several studies and systems, this work eschewed those methodologies in favour of a machine learning-based strategy for the Material Named Entity Recognition task, encompassing both classic machine learning and deep learning techniques.
3 Methodology 3.1 Problem Formulation The sequence labelling strategy is utilised for the entity recognition task. The sequence labelling task is processed using the form of the word, the context of the word in a sentence, and the word representation. Sentences (S) are tokenised from a piece of text (T), then tokens or words (W) are tokenised from the sentences, and finally each token is associated with a corresponding label (L). Formally, a piece of text T contains a set of natural sentences S, T = {S_1, S_2, S_3, …, S_n}. Each sentence S contains a sequence of n tokens S = <W_1, W_2, W_3, …, W_n> and the associated labels Lb = <lb_1, lb_2, lb_3, …, lb_n>. The objective is to predict a list of tokens and associated labels (W_i, lb_i) from an input set of unknown entities.
3.2 Deep Neural Network Model for MNER Task Because this is a sequence labelling task, the Long Short-Term Memory (LSTM) variation of the Recurrent Neural Network (RNN) [10] is used. Since the RNN suffers from context difficulties as the sequence grows longer, the LSTM outperforms the original RNN. Along with the other layers of the deep neural network model, a bidirectional LSTM (Bi-LSTM) [6] layer is used in this study. Forward encoding of input tokens and reverse encoding of input tokens are combined in Bi-LSTM to provide the optimal context of a token inside a sentence. The three gates in an LSTM network that update and control cell states are the forget gate, input gate, and output gate. The gates are activated by sigmoid and hyperbolic tangent functions. The input gate regulates how much new data will be encoded into the cell state in response to incoming input data. The forget gate determines whether the information in the cell state has to be deleted in response to fresh information entering the network. The
output gate decides whether the data contained in the cell state is sent as input to the network in the following time step.
Fig. 1 The proposed deep neural network model architecture for MNER task
The deep neural network architecture developed for the MNER task proposed in this study is shown in Fig. 1. As the figure shows, the proposed model has five layers: three hidden layers plus the input and output layers. The first hidden layer is an embedding layer, which turns each word in a given sentence into a fixed-length vector; to use the embedding layer, all data is integer coded, meaning that each word is represented by a distinct integer. The next layer is a spatial dropout layer, employed as a regularisation strategy to reduce over-fitting and increase performance; it regularises the network during training by probabilistically eliminating activations and weight updates on the input and recurrent connections to the LSTM units. The bidirectional LSTM layer follows the dropout layer: two LSTM layers are introduced within the bidirectional Keras wrapper, where the first LSTM learns the sentence's word order and the second learns the reverse order. A time-distributed dense layer is placed after the LSTM layer; to maintain one-to-one relationships between input and output, the time-distributed wrapper applies the dense layer to every temporal slice of the input. The proposed deep neural network model implementation can be expressed using Algorithm 1; a runnable sketch follows the algorithm. The Python programming language and the Keras library are used to build the deep neural network, and the TensorFlow library is used to train it.
Algorithm 1 Proposed Deep Neural Network model architecture
1: Input: sentence list with words and word labels: SL
2: num_words = unique words in dataset, num_tags = unique labels in dataset
3: set maxLen = 90
4: for S in sent_list: X = pad word sequences
5: for S in sent_list: y = pad label sequences; y = one-hot encode y
6: split data into training and testing sets
7: input_word = Input(shape=(maxLen,))
8: Embedding(input_dim=num_words, output_dim=maxLen, input_length=maxLen)
9: SpatialDropout1D(0.2)
10: Bidirectional(LSTM(units=200, return_sequences=True, recurrent_dropout=0.2))
11: out = TimeDistributed(Dense(num_tags, activation='softmax'))
12: matrec = Model(input_word, out)
13: matrec.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[accuracy, precision_mt, recall_mt, f1_mt])
14: matrec.fit(train-test data, batch_size=16, verbose=1, epochs=50, validation_split=0.2, callbacks=[tensorboard_cbk, es])
15: matrec.save('matrec.h5')
16: Return matrec.h5
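The following hedged Keras sketch builds the same five-layer network as Algorithm 1; num_words, num_tags, and the padded, one-hot-encoded arrays X and y are assumed to have been prepared as in steps 2–6, and the custom metric helpers of step 13 are replaced here by plain accuracy.

```python
# Keras sketch of the proposed five-layer MNER network (Algorithm 1),
# assuming num_words, num_tags, X, and y are already prepared.
from tensorflow.keras.layers import (Input, Embedding, SpatialDropout1D,
                                     Bidirectional, LSTM, TimeDistributed, Dense)
from tensorflow.keras.models import Model

max_len = 90
inputs = Input(shape=(max_len,))
x = Embedding(input_dim=num_words, output_dim=max_len,
              input_length=max_len)(inputs)          # word -> fixed-length vector
x = SpatialDropout1D(0.2)(x)                          # regularisation layer
x = Bidirectional(LSTM(units=200, return_sequences=True,
                       recurrent_dropout=0.2))(x)     # forward + reverse encoding
outputs = TimeDistributed(Dense(num_tags, activation="softmax"))(x)

matrec = Model(inputs, outputs)
matrec.compile(optimizer="adam", loss="categorical_crossentropy",
               metrics=["accuracy"])
matrec.fit(X, y, batch_size=16, epochs=50, validation_split=0.2)
matrec.save("matrec.h5")
```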
3.3 Evaluation Methods Precision, recall, and F1 are used to evaluate the proposed Material Named Entity Recognition (MNER) model. When analysing entity predictions, an entity is marked as valid only if each of its tokens is correctly identified. True positives (TPos) are counted when entities are successfully predicted; false positives (FPos) are marked when the predicted initial token does not match the entity's annotated token; and false negatives (FNeg) are registered when the system predicts the initial token of an entity erroneously. The mentioned evaluation metrics are determined with Eqs. (1), (2), and (3):

Precision = TPos / (TPos + FPos)   (1)

Recall = TPos / (TPos + FNeg)   (2)

F1 = (2 × Precision × Recall) / (Precision + Recall)   (3)
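A minimal sketch of the entity-level counting behind these metrics follows: an entity counts as a true positive only when every one of its IOB tokens is predicted correctly. The two tag sequences are hypothetical and not drawn from the corpus.

```python
# Entity-level TP/FP/FN counting for Eqs. (1)-(3), a sketch.
def entities(tags):
    """Extract (start, end, type) spans from an IOB tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return set(spans)

gold = ["B-material", "I-material", "O", "B-process"]
pred = ["B-material", "I-material", "O", "O"]
tp = len(entities(gold) & entities(pred))   # 1: the material span matches fully
fp = len(entities(pred) - entities(gold))   # 0: no spurious span predicted
fn = len(entities(gold) - entities(pred))   # 1: the process span was missed
print(tp, fp, fn)
```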
Table 1 Overview of the dataset employed in the MNER task with the proposed DNN model

| Dataset parameter | Value |
| Number of annotated articles | 50 |
| Number of sentences having any entity | 1115 |
| Total annotated words | 3155 |
| Sentences containing material entity | 980 |
| Sentences containing process entity | 265 |
| Average number of sentences having any entity per article | 22.3 |
| Average entity annotated per document | 63.1 |
| Average entity annotated per sentence | 2.8 |
| Average material entity containing word per document | 51.1 |
| Average process entity containing word per document | 12 |
| Average material entity containing word per sentence | 2.6 |
| Average process entity containing word per sentence | 2.3 |
4 Experiment and Results 4.1 Dataset In this study, a hand-crafted dataset annotated by a domain expert from the electric double-layer capacitor (EDLC) domain is used [18]. The dataset is curated from the full text of fifty scientific articles from the EDLC domain. The text is annotated in Inside-Outside-Beginning (IOB) [22] format. There are five labelled classes in the dataset, namely 1. B-material, 2. I-material, 3. B-process, 4. I-process, and 5. O. The summary of the dataset is presented in Table 1.
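To illustrate the five classes, the sketch below tags a hypothetical sentence (not an excerpt from the corpus) in IOB format.

```python
# Illustrative IOB annotation with the dataset's five classes; the sentence
# itself is hypothetical, not drawn from the annotated corpus.
tokens = ["Activated", "carbon", "was", "prepared", "by",
          "chemical", "activation", "."]
labels = ["B-material", "I-material", "O", "O", "O",
          "B-process", "I-process", "O"]
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```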
4.2 Hyperparameters The hyperparameter values for the different layers of the proposed deep neural network vary from layer to layer. For the LSTM layer, tanh is used as the activation function with 200 neurons, and 0.2 is set as the recurrent dropout value. For the dense layer activation, softmax is used. "Adam" is used as the optimiser, and "categorical_crossentropy" as the loss function. The sequence length is set to 90. The model is trained on a GPU with a batch size of 16. The proposed model uses an early-stopping callback with minimal validation loss to avoid overtraining. To train the
Table 2 Comparative results obtained from the experiment in terms of precision, recall, and F1 score

| Model name | Precision | Recall | F1 |
| Proposed DNN model | 0.965 | 0.957 | 0.961 |
| DecisionTree classifier | 0.948 | 0.9 | 0.938 |
| ExtraTree classifier | 0.947 | 0.9 | 0.938 |
| RandomForest classifier | 0.945 | 0.88 | 0.927 |
| XGBoost classifier | 0.93 | 0.74 | 0.839 |
| K Neighbors classifier | 0.92 | 0.6 | 0.732 |
| LGBM classifier | 0.91 | 0.37 | 0.526 |
| Support vector classifier | 0.84 | 0.17 | 0.282 |
| AdaBoost classifier | 0.83 | 0.17 | 0.282 |
| Gaussian NB | 0.82 | 0.09 | 0.162 |
| Mimicking model [7] | 0.673 | 0.818 | 0.731 |
model, the network is fed 80% of the sentences in the dataset, and the model is tested on the remaining 20% of the sentences with 5-fold cross-validation. The model is stored and evaluated after training and testing using various evaluation measures.
4.3 Result Analysis The experimental results of the proposed MNER model are compared with different state-of-the-art machine learning models in Table 2. The findings of this experiment show that in the entity recognition task, the proposed deep neural network model outperforms many state-of-the-art baseline machine learning models. The proposed DNN model achieved a better F1 score than the system proposed by Guha et al. [7], which shows that the proposed model generalises better than models using various pretrained word embedding models and conditional random fields along with other network layers. The proposed model also achieved better precision and recall scores than the other compared models. With reference to the evaluation results, the proposed model is quite promising for the NER task in the EDLC domain. The comparison of the different evaluation metrics between the proposed model and the baseline models is shown in Fig. 2.
Fig. 2 Precision, recall, and F1 score comparison among different baseline machine learning models with proposed deep neural network model
5 Conclusion and Future Work Overall, the research reported in this article explores the possibilities of a Bi-LSTM-based deep neural network model in a knowledge-based materials system. Our initial research focused on material entity recognition. Without significant knowledge of the material context, the deep neural network model performed admirably. We believe that this LSTM-based model is a promising direction towards a more sophisticated knowledge-based system for extracting material knowledge from scientific literature, since deep learning-based models are designed to adapt to a variety of different natural language processing tasks rather than concentrate on a single task.
References
1. Ahmed S et al (2022) Toward machine learning-based psychological assessment of autism spectrum disorders in school and community. In: Proceedings TEHI, pp 139–149
2. Al Banna M et al (2020) A monitoring system for patients of autism spectrum disorder using artificial intelligence. In: Proceedings of brain informatics, pp 251–262
3. Court CJ, Cole JM (2018) Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. Sci Data 5(1):1–12
4. Ghosh T et al (2021) Artificial intelligence and internet of things in screening and management of autism spectrum disorder. Sustain Cities Soc 74:103189
5. Goyal A, Gupta V, Kumar M (2018) Recent named entity recognition and classification techniques: a systematic review. Comput Sci Rev 29:21–43
6. Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5–6):602–610
7. Guha S, Mullick A, Agrawal J, Ram S, Ghui S, Lee SC, Bhattacharjee S, Goyal P (2021) MatScIE: an automated tool for the generation of databases of methods and parameters used in the computational materials science literature. Comput Mater Sci 192:110325
8. Hawizy L, Jessop DM, Adams N, Murray-Rust P (2011) ChemicalTagger: a tool for semantic text-mining in chemistry. J Cheminformatics 3(1):1–13
9. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
10. Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
11. Jose R, Ramakrishna S (2018) Materials 4.0: materials big data enabled materials discovery. Appl Mater Today 10:127–132
12. Kaiser MS et al (2021) 6G access network for intelligent internet of healthcare things: opportunity, challenges, and research directions. In: Proceedings of TCCE, pp 317–328
13. Kim E, Huang K, Saunders A, McCallum A, Ceder G, Olivetti E (2017) Materials synthesis insights from scientific literature via text extraction and machine learning. Chem Mater 29(21):9436–9444
14. Kononova O, Huo H, He T, Rong Z, Botari T, Sun W, Tshitoyan V, Ceder G (2019) Text-mined dataset of inorganic materials synthesis recipes. Sci Data 6(1):203. https://doi.org/10.1038/s41597-019-0224-1
15. Lafferty JD, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Brodley CE, Danyluk AP (eds) Proceedings of the eighteenth international conference on machine learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28–July 1, pp 282–289. Morgan Kaufmann
16. Lowe DM, Sayle RA (2015) LeadMine: a grammar and dictionary driven approach to entity recognition. J Cheminformatics 7(1):1–9
17. Mahmud M et al (2022) Towards explainable and privacy-preserving artificial intelligence for personalisation in autism spectrum disorder. In: Proceedings of HCII, pp 356–370
18. Miah MSU, Sulaiman J, Jose R, Sarwar TB (2022) MATREC: material and process named entity recognition dataset for EDLC. https://data.mendeley.com/datasets/s3st6n77pr. 10.17632/s3st6n77pr.1
19. Miah MSU, Sulaiman J, Sarwar TB, Naseer A, Ashraf F, Zamli KZ, Jose R (2022) Sentence boundary extraction from scientific literature of electric double layer capacitor domain: tools and techniques. Appl Sci 12(3):1352
20. Nadeau D, Sekine S (2007) A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1):3–26
21. Paul A, Basu A, Mahmud M, Kaiser MS, Sarkar R (2022) Inverted bell-curve-based ensemble of deep learning models for detection of Covid-19 from chest x-rays. Neural Comput Appl 1–15
22. Ramshaw LA, Marcus MP (1999) Text chunking using transformation-based learning. In: Natural language processing using very large corpora. Springer, pp 157–176
23. Sumi AI et al (2018) fASSERT: a fuzzy assistive system for children with autism using internet of things. In: Proceedings of brain informatics, pp 403–412
24. Swain MC, Cole JM (2016) ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J Chem Inf Model 56(10):1894–1904. https://doi.org/10.1021/acs.jcim.6b00207
25. Weston L, Tshitoyan V, Dagdelen J, Kononova O, Trewartha A, Persson KA, Ceder G, Jain A (2019) Named entity recognition and normalization applied to large-scale information extraction from the materials science literature. J Chem Inf Model 59(9):3692–3702. https://doi.org/10.1021/acs.jcim.9b00470
An Improved Optimization Algorithm-Based Prediction Approach for the Weekly Trend of COVID-19 Considering the Total Vaccination in Malaysia: A Novel Hybrid Machine Learning Approach Marzia Ahmed, Mohd Herwan Sulaiman, Ahmad Johari Mohamad, and Mostafijur Rahman Abstract SARS-CoV-2 causes severe acute respiratory syndrome and produces a multi-organ disease characterized by a wide range of symptoms. When it first emerged, it rapidly spread from its origin to adjacent nations, infecting millions of people around the globe. To take appropriate preventative and precautionary actions, it is necessary to anticipate positive COVID-19 cases and thereby better comprehend future risks. It is therefore vital to build mathematical models that are resilient and have as few prediction errors as feasible. This research recommends an optimization-based Least Square Support Vector Machine (LSSVM) for forecasting COVID-19 confirmed cases along with the daily total vaccination frequency. In this work, a novel hybrid Barnacle Mating Optimizer (BMO) with a Gaussian distribution is combined with the Least Squares Support Vector Machine for time series forecasting. The data source consists of the daily occurrences of cases and the frequency of total vaccinations from February 24, 2021, to July 27, 2022, in Malaysia. LSSVM then conducts the prediction task with the hyper-parameter values optimized by BMO via the Gaussian distribution. Based on its experimental findings, this study concludes that the hybrid IBMO-LSSVM with optimally tuned parameters outperforms cross-validation, the original BMO, ANN, and a few other hybrid approaches.
M. Ahmed (B) · M. H. Sulaiman · A. J. Mohamad Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang (UMP), Pahang, Malaysia e-mail: [email protected] M. Ahmed Department of Software Engineering, Daffodil Smart City, Daffodil International University, Ashulia, Dhaka, Bangladesh M. Rahman Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_18
Keywords Time series prediction · Barnacle Mating Optimizer · Gaussian distribution · Least square support vector machines · COVID-19 confirmed cases and total vaccination
1 Introduction 1.1 History of COVID-19 The spread of infectious diseases in large populations may result in considerable financial, psychological, and social turmoil, along with prolonged hospitalization and death over a broad geographic area. Epidemics are defined as widespread outbreaks of infectious diseases that affect a large number of people. In 2019, the world was exposed to the coronavirus SARS-CoV-2 [1], a novel coronavirus first identified in Wuhan, China. The epidemic brought up memories of previous coronavirus outbreaks carried by travelers, such as SARS and the Middle East respiratory syndrome (MERS). The virus had infected 6,057,853 people and killed 371,166 of them globally as of June 1, 2020 [2]. After just three months, the new disease had spread across the globe, prompting the World Health Organization to declare a pandemic on March 11, 2020 [3, 4]. According to the World Health Organization, the coronavirus is a negative-stranded RNA virus that causes respiratory infections such as common colds and other respiratory ailments, and it was most likely transmitted by animals [5]. By the end of 2020, over half a billion people had been exposed to the virus, with 1.3 million dying as a result [6]. Symptoms of COVID-19 may range from subtle changes in general health to major multi-organ failure needing treatment in an intensive care unit [7]. The virus is transmitted mostly by close contact or close exposure to respiratory droplets, with aerosols and contaminated surfaces also playing a role to a lesser level [8]. The COVID-19 incubation period is 5 days, with the majority of persons experiencing symptoms by day 11 of the incubation period. Patients with COVID-19 have an average of 7 days between the onset of symptoms and admission to the hospital. Hospitalized patients range in age from 47 to 73, with 60% being male [6, 9]. In general, the mutation rate of COVID-19 is believed to be rather high, especially when compared to other single-stranded RNA virus strains; on average, roughly 10^−4 substitutions per site per year are reported [10]. During the week of January 17–23, 2022, there was a 5% rise in the number of new cases of COVID-19, although there was no change in the number of fatalities. During that week, more than 21 million new cases were recorded throughout the six WHO regions, marking the highest weekly total since the pandemic started [11, 12]. Figure 1 shows the daily recorded confirmed cases and deaths all over the world.
Fig. 1 COVID-19 confirmed cases and daily death per day around the world [13]
Therefore, Fig. 2 illustrates the total cases in Malaysia for the last 2 weeks, and Fig. 3 shows the overall report for Malaysia along with the globally recorded details. Several public safety precautions have been stressed in this regard, including the use of facemasks, social distancing, and travel restrictions. Aside from that, prediction is one of the activities that may be undertaken to anticipate growth in the number of cases. This publication advances the field by bringing together a wide range of machine learning-based COVID-19 and other severe-illness diagnostic, detection, and prediction methods, as well as other experimental studies, e.g., the ANN model for Parkinson's disease prediction [15]. In India, Support Vector Regression (SVR) was used to forecast COVID-19 cases [16], while Brazil, Portugal, and the United States used Artificial Neural Networks (ANN) to predict confirmed cases [17]. ARIMA, SARIMA, and SARIMAX, together with exponential smoothing, are a few of the statistical prediction models used by Oliveira et al. [18]; [19] takes a similar strategy. However, due to their linear nature, statistical methods cannot capture nonlinear patterns in time series data.
Fig. 2 COVID-19 confirmed cases in Malaysia for past 2 weeks [13]
Fig. 3 Overall report of COVID-19 in Malaysia [WHO]
Despite the fact that numerous solutions have been offered for this problem, there is still room for improvement, and developing new approaches is vital given the pandemic's severity. LSSVM has been acknowledged as an effective machine learning method for many problems, including prediction. The objective of this paper is to build a prediction model that predicts confirmed COVID-19 cases with a minimal error rate. To predict the weekly trend of COVID-19 in accordance with total vaccination frequency, a novel evolutionary optimization method called Improved Barnacles Mating Optimizer (IBMO) is combined with the Least Square Support Vector Machine (LSSVM). Improved BMO-LSSVM was used to enhance the RBF kernel-based LSSVM model on data from February 24, 2021, until July 27, 2022. This hybrid approach has been shown to work well since the beginning of the COVID-19 epidemic in Malaysia. This research's contributions include the following: 1. Cumulative confirmed COVID-19 cases have been combined with cumulative total vaccinations in Malaysia as input–output. 2. The original Barnacle Mating Optimizer (BMO), enhanced by a Gaussian distribution and paired with the Least Square Support Vector Machine, improves time series forecasting for infectious disease scenario analysis. 3. Compared with the state of the art, IBMO-LSSVM better forecasts the weekly trend of COVID-19 cases when paired with total vaccination counts. The rest of this paper is arranged as follows: Sect. 2 includes a quick introduction and mathematical explanation of Improved BMO and LSSVM, respectively. This is followed by the proposed applied approach in Sect. 3. Section 4 explains the result,
while the analysis of the findings is presented in Sect. 5. Finally, Sect. 6 completes the research.
2 Improved Barnacle Mating Optimizer 2.1 Barnacle Mating Optimizer (BMO) BMO is a bio-inspired algorithm based on the barnacle's mating behavior [20, 22]. Barnacles mate in nature through copulation and sperm-casting. In BMO, the Punnett square established by the Hardy–Weinberg principle is implemented for regular copulation and viewed as exploitation, while sperm-casting is treated as exploration.
2.2 Mathematical Model of BMO The enhancement of the original BMO will be discussed in the next subtopic.
2.2.1
Initialization
The first possible solutions are shown in the following equation [20]; candidate solutions were created at this stage using a random generator:

X = | x_1^1 … x_1^N |
    |  ⋮    ⋱    ⋮  |
    | x_n^1 … x_n^N |   (1)
Here, N refers to the number of control variables that need to be optimized, and n is the total population size. Then, once each candidate has been assessed, the population is sorted and the best solution is placed at the top.
2.2.2
Parents’ Selection
Random selection is used in the mating process to create new progeny; however, the parameter pl, which reflects the range of barnacles that may be mated, has to be set. Barnacles are regarded as hermaphrodites since they can produce their own sperm as well as accept sperm from others [23].
However, for the sake of simplicity in the formulation of the BMO algorithm, it is assumed that each barnacle can be fertilized by only one barnacle at a time. Self-mating can occur in nature but is extremely uncommon, as stated in Yusa et al. [24]; consequently, it is not taken into account in BMO's mathematical model.
2.2.3
Off-Spring Generation
How BMO reproduces is shown in the following equations. Here, x_barnacle_d^N and x_barnacle_m^N are the control variables of the barnacle parents, p is a normally distributed random number, and q = (1 − p); rand() is a simple random number in [0, 1]:

x_i^N_new = p × x_barnacle_d^N + q × x_barnacle_m^N   (2)

x_i^n_new = rand() × x_barnacle_m^n   (3)

2.2.4 Gauss Distribution Implementation
The value of pl is essential to BMO's definition of the exploration and exploitation processes. It is clear from Eq. (3) that the generation of new offspring, which is regarded as the exploration process, uses simple random numbers. The exploration procedure is improved in this study by changing Eq. (3) to the following expression:

$$x_i^{N\_new} = x^{n}_{barnacle\_m} + Gauss(N) \quad (4)$$

$$Gauss(N) = 0.01 \times StepSize = 0.01 \times \frac{r_1 \times \sigma_s}{|r_2|^{1/\beta}} \quad (5)$$

The Gauss distribution step is performed using the following equation:

$$\sigma_s = \sigma_0 \exp(-\mu n) \quad (6)$$

where σ0 and μ are constants valued at ½ and −0.0001, respectively, and n is the current generation. A proper step size is important for the search space: if the step size is too large, a new selection will be too far from the old one, and if it is too small, the change in position will also be too small.
Starting the IBMO process flow entails generating a population X according to Eq. (1). The new offspring are produced using Eqs. (2) and (3). Each new offspring is then evaluated before being joined with its parents. The sorting method places the most effective solution at the top of the population. For the next iteration, only the top half of the combined population (a mixture of parents and offspring) is kept, while the bottom half is discarded.
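To make this loop concrete, the following is a minimal sketch of the IBMO iteration just described, applied to a stand-in sphere objective rather than the paper's LSSVM-based fitness; the population sizes, the pl threshold, the normal distribution used for p, and the β exponent are illustrative assumptions:

```python
# Minimal IBMO sketch: Eq. (1) initialization, Eqs. (2)-(6) offspring
# generation, then merge-and-sort selection of the top half.
import numpy as np

rng = np.random.default_rng(0)
n, N, T = 20, 5, 100          # population size, variables, iterations (assumed)
pl, beta = 7, 1.5             # mating range and step exponent (assumed)
sigma0, mu = 0.5, -0.0001     # constants from Eq. (6)

def fitness(x):               # stand-in objective (minimization)
    return np.sum(x**2, axis=-1)

X = rng.uniform(-5, 5, size=(n, N))        # Eq. (1): random initial population
X = X[np.argsort(fitness(X))]              # best solution placed on top

for t in range(1, T + 1):
    dad, mum = rng.integers(n, size=n), rng.integers(n, size=n)
    offspring = np.empty_like(X)
    for i in range(n):
        if abs(dad[i] - mum[i]) <= pl:     # exploitation, Eq. (2)
            p = rng.normal(0.5, 0.1)       # assumed distribution for p
            offspring[i] = p * X[dad[i]] + (1 - p) * X[mum[i]]
        else:                              # exploration, Eqs. (4)-(6)
            sigma_s = sigma0 * np.exp(-mu * t)
            r1, r2 = rng.normal(size=N), rng.normal(size=N)
            step = 0.01 * r1 * sigma_s / np.abs(r2) ** (1 / beta)
            offspring[i] = X[mum[i]] + step
    merged = np.vstack([X, offspring])     # combine parents and offspring
    X = merged[np.argsort(fitness(merged))][:n]   # keep the top half

print("best fitness:", fitness(X[0]))
```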
3 Least Square Support Vector Machine

SVM linear regression is carried out by nonlinear mapping in a high-dimensional feature space. One of the biggest drawbacks of the SVM is that it requires the solution of a massive quadratic programming problem. The least squares support vector machine (LSSVM) was developed to deal with optimization issues involving linear equations rather than quadratic programming [25–27]. Kernel functions (linear, polynomial, multilayer perceptron, and Radial Basis Function (RBF)), stated as k(x_i, x_j) in Eqs. (7) and (8), and α_i, x_i, b in Eqs. (7) and (9), represent the Lagrange multipliers, the ith support vector, and the bias parameter, respectively. The LSSVM regression model with RBF is expressed in Eq. (9):

$$y(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x_j) + b \quad (7)$$

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \quad (8)$$

$$y(x) = \sum_{i=1}^{n} \alpha_i \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) + b \quad (9)$$
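For intuition, here is a hedged sketch of how an LSSVM regressor with the RBF kernel of Eqs. (7)–(9) can be trained: in the standard LSSVM formulation, the dual variables α and bias b come from one linear system involving the regularization parameter γ, not from quadratic programming. The toy data and hyperparameter values are illustrative, not the paper's:

```python
# LSSVM regression sketch: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Eq. (8): k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = np.sum(A**2, 1)[:, None] - 2 * A @ B.T + np.sum(B**2, 1)[None, :]
    return np.exp(-d2 / (2 * sigma2))

def lssvm_fit(X, y, gamma, sigma2):
    n = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.block([[0.0, np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]                     # alpha, b

def lssvm_predict(Xtr, alpha, b, Xte, sigma2):
    return rbf_kernel(Xte, Xtr, sigma2) @ alpha + b   # Eq. (9)

# toy usage with assumed hyperparameters gamma and sigma^2
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel()
alpha, b = lssvm_fit(X, y, gamma=100.0, sigma2=1.0)
print(lssvm_predict(X, alpha, b, X[:3], sigma2=1.0))
```

In the paper's setting, γ and σ² are exactly the two hyper-parameters that IBMO searches over (see Sect. 4.4).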
4 Proposed Methodology This section describes the approach to the suggested solution. The tasks included dataset preparation, the development of the suggested hybrid algorithm, and how the assessment is carried out.
Fig. 4 Total vaccination report during the date range in Malaysia
4.1 Dataset Preparation

The data covers February 24, 2021, to July 27, 2022, and was collected daily. Following that, it was grouped into different sets: training, validation, and testing. The entire COVID-19 cumulative confirmed dataset is available for retrieval at https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-casess. The vaccination frequency data, plotted in Fig. 4, has been collected from https://ourworldindata.org/covid-vaccinations, where the frequency has been calculated from the cumulative total vaccinations over the same date range.
4.2 Experimental Setup The experiment design that is compatible with the study is detailed below, and it contains information about the model’s input and output, as well as how data has been divided into three phases: training, validation, and testing. Finally, there are performance evaluation criteria for the IBMO-LSSVM model.
Fig. 5 Schematic diagram of data modeling for the confirmed cases and total vaccination frequency in Malaysia
4.3 Input–Output

This research predicts the weekly average of confirmed COVID-19 cases. It takes the daily data, computes each week's average, and uses it to predict the following week's average. For clarity regarding the input, a schematic diagram is provided in Fig. 5.
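A small sketch of this weekly aggregation, under an assumed CSV layout with date, cumulative-case, and cumulative-vaccination columns (the file and column names are hypothetical), could look like this:

```python
# Group daily cumulative records into 7-day averages and pair each week's
# average with the next week's average as the prediction target.
import pandas as pd

df = pd.read_csv("malaysia_covid.csv", parse_dates=["date"])  # assumed file
df = df.set_index("date").sort_index()

weekly = df[["cumulative_cases", "cumulative_vaccinations"]].resample("7D").mean()
weekly["target_next_week"] = weekly["cumulative_cases"].shift(-1)
weekly = weekly.dropna()
print(weekly.head())
```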
4.4 BMO-Gauss-LSSVM

The process flow of IBMO starts by initializing a population X, as described in Eq. (1). A few pre-defined parameters used in Eqs. (2) and (3), such as pl, the population size, and the maximum number of iterations, are set at the start. The next stage is to determine every function detail, which may comprise test system data or the boundaries of the search areas. This is followed by the evaluation process for each new offspring, which is then combined with the parents. Finally, the sorting process is executed to place the current best solution at the top of the population. Only the top half of the combined population (a mix of parents and offspring) is chosen for the evaluation of the next iteration, while the bottom half is eliminated. The proposed Improved BMO acts as an optimization tool for setting the LSSVM hyper-parameters, particularly the regularization parameter γ and the kernel parameter σ², so that the LSSVM algorithm can anticipate the COVID-19 outbreaks correctly. Figure 6 depicts the IBMO implementation process flow, and Fig. 7 illustrates the pseudo-code of IBMO-LSSVM explicitly. In line 16, the method is modified using the Gaussian distribution, and in lines 3 and 20, LSSVM is used to evaluate the fitness function through training and validation.
Fig. 6 The workflow diagram of IBMO
4.5 Performance Evaluation

The prediction model performance is evaluated based on the following metrics: Mean Absolute Percentage Error (MAPE), Accuracy, and Theil's U. These performance metrics for regression are defined as follows:

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_{predicted} - y_{actual}}{y_{actual}}\right| \times 100\% \quad (10)$$

$$\text{Theil's } U = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{actual} - y_{predicted}\right)^2}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} y_{actual}^2} + \sqrt{\frac{1}{N}\sum_{i=1}^{N} y_{predicted}^2}} \quad (11)$$
Fig. 7 The pseudo-code of IBMO
$$\text{Accuracy} = 1 - MAPE \quad (12)$$
Here, N is the number of test instances, y_predicted the predicted value at the ith time, and y_actual the actual value at the ith time. Theil's U is a statistic indicating a forecasting model's superiority over naïve forecasting: the model is better than naïve forecasting if the value is less than 1, and worse than naïve forecasting if the value is larger than 1. The statistic is calculated from the square root of the sum of squared errors [23]. Equations (10)–(12) are common evaluation indicators defined by the error rate of the prediction model for regression; the error values should be as small as possible.
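Eqs. (10)–(12) translate directly into NumPy, for example:

```python
# Regression metrics from Eqs. (10)-(12); y_actual and y_predicted are arrays
# of weekly values from the testing phase.
import numpy as np

def mape(y_actual, y_predicted):
    """Eq. (10): MAPE as a fraction (multiply by 100 for a percentage)."""
    return np.mean(np.abs((y_predicted - y_actual) / y_actual))

def theils_u(y_actual, y_predicted):
    """Eq. (11): RMSE scaled by the RMS magnitudes of both series."""
    num = np.sqrt(np.mean((y_actual - y_predicted) ** 2))
    den = np.sqrt(np.mean(y_actual**2)) + np.sqrt(np.mean(y_predicted**2))
    return num / den

def accuracy(y_actual, y_predicted):
    """Eq. (12): Accuracy = 1 - MAPE."""
    return 1.0 - mape(y_actual, y_predicted)
```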
5 Result

The results of the series of analyses are compiled in this section. The performance assessment metrics for IBMO-LSSVM, BMO-LSSVM, SSA-LSSVM, EMA-LSSVM (a very recent algorithm [28]), BMO-Levy (accuracy 99.86), and ANN are shown in Table 1. It is clear from the table that IBMO-LSSVM excels, generating the lowest error and the highest accuracy across all performance metrics.
Table 1 Performance evaluation of different algorithms

Applied algorithms   Theils' U            MAPE       Accuracy
BMO-LSSVM            2.83331150279882     0.011887   0.988113
ANN                  26.1902743956193     0.11       0.89
EMA-LSSVM            0.060                0.00361    0.9964
IBMO-LSSVM           0.833326912587873    0.00354    0.9964598
SSA-LSSVM            0.0107               0.0224     0.9776
The accuracy is 99.65% for IBMO-LSSVM. Figure 8 shows the accuracy comparison among the algorithms. Figure 9 shows the convergence curve of IBMO-LSSVM during the testing phase. Table 2 shows the week-by-week comparison among the applied algorithms with the actual and predicted values. Based on the specified parameters, the algorithm's best value for each week is highlighted in bold.
Fig. 8 Accuracy comparison among algorithms

Fig. 9 Convergence curve of IBMO-LSSVM
Table 2 During testing phase, target values versus predicted values for weekly trend analysis

Week   Target        IBMO-LSSVM    BMO-LSSVM     ANN           SSA-LSSVM
67     4502155       4486397.458   4448579.356   4006917.95    4401306.728
68     4513277.286   4497480.815   4459569.286   4016816.784   4412179.875
69     4524651.714   4508815.433   4470808.359   4026940.026   4423299.516
70     4538708.143   4522822.664   4484697.516   4039450.247   4437041.08
71     4554246.143   4538306.281   4500050.614   4053279.067   4452231.029
72     4571123.857   4555124.924   4516727.483   4068300.233   4468730.683
73     4592335.143   4576261.97    4537686.355   4087178.277   4489466.836
74     4617926.571   4601763.828   4562973.245   4109954.649   4514485.016
75     4647891.571   4631623.951   4592581.662   4136623.499   4543778.8
6 Discussion

The IBMO-LSSVM results show that the recommended prediction model can generate prediction values that are more accurate than those produced by the original BMO-LSSVM, hybrid SSA-LSSVM, EMA-LSSVM, and ANN. The results show strong agreement across all of the statistical metrics utilized. In addition to the prediction error, the convergence rate shows positive results. On the other hand, the prediction model's failure to anticipate accurate values during a sharp rise in occurrences is one of the challenges in time series prediction. Another issue is that leave-one-out cross-validation takes a long time to complete and is costly to compute for bigger datasets. Therefore, as it would enhance prediction generalization, integrating optimization techniques with LSSVMs is an intriguing subject for future research.
7 Conclusion

The COVID-19 pandemic is aggressively and rapidly spreading around the world. In order to minimize unplanned risk in terms of mental, physical, social, or financial harm, or even to minimize fatalities, COVID-19 forecasting is essential, and it plays a significant role in warnings and management. However, the volatile nature of the COVID-19 case series makes it hard to predict with a high degree of accuracy. In this work, a novel model called the Improved BMO-LSSVM was proposed to address the difficulty of forecasting confirmed cases. The effectiveness of the proposed model was evaluated between February 24, 2021, and July 27, 2022, using confirmed-case data and the total immunization frequency in Malaysia. Results from ANN and the LSSVM variants were compared to see which was more accurate. Three statistical metrics, MAPE, Theil's U, and Accuracy, were used to analyze the model's performance. In terms of forecasting error, the proposed model outperforms
the ones already in use. The presented model may be used as a valuable and simple tool to forecast confirmed cases, take action, and manage the issue.

Limitation: Although BMO-LSSVM, EMA-LSSVM, SSA-LSSVM, and ANN were used in the paper as comparisons for the predictions made with IBMO-LSSVM, future simulations of alternative decomposition modes with other variants should investigate more outstanding single models in order to improve the proposed approach's accuracy in predicting confirmed cases.

Acknowledgements This research study was supported by Ministry of Education Malaysia (MOE) and Universiti Malaysia Pahang under Fundamental Research Grant Scheme (FRGS/1/2019/ICT02/UMP/03/1) & (#RDU1901133).
References 1. Rogers JP et al (2020) Psychiatric and neuropsychiatric presentations associated with severe coronavirus infections: a systematic review and meta-analysis with comparison to the COVID19 pandemic. Lancet Psychiatry 7(7):611–627. https://doi.org/10.1016/S2215-0366(20)302 03-0 2. Liu J et al (2020) Community transmission of severe acute respiratory syndrome coronavirus 2, Shenzhen, China, 2020. Emerg Infect Dis 26(6):1320–1323. https://doi.org/10.3201/eid2606. 200239 3. Montelongo-Jauregui D, Vila T, Sultan AS, Jabra-Rizk MA (2020) Convalescent serum therapy for COVID-19: a 19th century remedy for a 21st century disease. PLoS Pathog 16(8):1–7. https://doi.org/10.1371/JOURNAL.PPAT.1008735 4. Cucinotta D, Vanelli M (2020) WHO declares COVID-19 a pandemic. Acta Biomed 91(1):157– 160. https://doi.org/10.23750/abm.v91i1.9397 5. Peiris JSM et al (2003) Coronavirus as a possible cause of severe acute respiratory syndrome. Lancet 361(9366):1319–1325. https://doi.org/10.1016/S0140-6736(03)13077-2 6. Ni L et al (2020) Detection of SARS-CoV-2-specific humoral and cellular immunity in COVID19 convalescent individuals. Immunity 52(6):971-977.e3. https://doi.org/10.1016/j.immuni. 2020.04.023 7. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC (2020) Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review. JAMA J Am Med Assoc 324(8):782–793. https://doi.org/10.1001/jama.2020.12839 8. Jayaweera M, Perera H, Gunawardana B, Manatunge J (2020) Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company’ s public news and information. Environ Res 188(January):1–18 9. Wang D et al (2020) Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA J Am Med Assoc 323(11):1061– 1069. https://doi.org/10.1001/jama.2020.1585 10. Su S et al (2016) Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Trends Microbiol 24(6):490–502. https://doi.org/10.1016/j.tim.2016.03.003 11. Weekly epidemiological update on COVID-19–25 January 2022. https://www.who.int/public ations/m/item/weekly-epidemiological-update-on-covid-19---25-january-2022. Accessed 27 Jan 2022 12. Sohrabi C et al (2020) World Health Organization declares global emergency: a review of the 2019 novel coronavirus (COVID-19). Int J Surg 76(February):71–76. https://doi.org/10.1016/ j.ijsu.2020.02.034
13. COVID live—coronavirus statistics—Worldometer. https://www.worldomters.info/corona virus/. Accessed 18 Apr 2022 14. WHO coronavirus (COVID-19) dashboard. https://covid19.who.int/table/. Accessed 18 Apr 2022 15. Deshmukh R, Gourkhede P, Rangari S (2019) Heart disease prediction using artificial neural network. IJARCCE 8(1):85–89. https://doi.org/10.17148/IJARCCE.2019.8119 16. Parbat D, Chakraborty M (2020) A python based support vector regression model for prediction of COVID19 cases in India. Chaos Solitons Fractals 138:109942. https://doi.org/10.1016/j. chaos.2020.109942 17. Sudden Cardiac Death (SCD): symptoms, causes. https://my.clevelandclinic.org/health/dis eases/17522-sudden-cardiac-death-sudden-cardiac-arrest. Accessed 08 Feb 2022 18. de Oliveira LS, Gruetzmacher SB, Teixeira JP (2021) Covid-19 time series prediction. Procedia Comput Sci 181(2019):973–980. https://doi.org/10.1016/j.procs.2021.01.254 19. To˘ga G, Atalay B, Toksari MD (2021) COVID-19 prevalence forecasting using autoregressive integrated moving average (ARIMA) and artificial neural networks (ANN): case of Turkey. J Infect Public Health 14(7):811–816. https://doi.org/10.1016/j.jiph.2021.04.015 20. Sulaiman MH, Mustaffa Z, Saari MM, Daniyal H (2020) Barnacles Mating Optimizer: a new bio-inspired algorithm for solving engineering optimization problems. Eng Appl Artif Intell 87:265–270. https://doi.org/10.1016/j.engappai.2019.103330 21. Sulaiman MH, Mustaffa Z, Saari MM, Daniyal H, Musirin I, Daud MR (2018) Barnacles Mating Optimizer: an evolutionary algorithm for solving optimization. In: 2018 IEEE international conference on automatic control and intelligent systems (I2CACIS), Oct 2018, pp 99–104. https://doi.org/10.1109/I2CACIS.2018.8603703 22. Sulaiman MH et al (2019) Barnacles Mating Optimizer: a bio-inspired algorithm for solving optimization problems. In: 2018 19th IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD), June 2018, vol 87, no September 2019, pp 265–270. https://doi.org/10.1109/SNPD.2018.8441097 23. Barazandeh M, Davis CS, Neufeld CJ, Coltman DW, Palmer AR (2013) Something darwin didn’t know about barnacles: Spermcast mating in a common stalked species. In: Proceedings of Royal Society B Biological Sciences 24. Yusa Y, Yoshikawa M, Kitaura J, Kawane M, Ozaki Y, Yamato S, Høeg JT (2012) Adaptive evolution of sexual systems in pedunculate barnacles. In: Proceedings of the Royal Society B: Biological Sciences, vol 279, pp 959–966 25. Zeroual A, Harrou F, Dairi A, Sun Y (2020) Deep learning methods for forecasting COVID-19 time-Series data: a Comparative study. Chaos Solitons Fractals 140:110121. https://doi.org/10. 1016/j.chaos.2020.110121 26. Shastri S, Singh K, Kumar S, Kour P, Mansotra V (2020) Time series forecasting of Covid19 using deep learning models: India-USA comparative case study. Chaos Solitons Fractals 140:110227. https://doi.org/10.1016/j.chaos.2020.110227 27. Kumar N, Susan S (2020) COVID-19 pandemic prediction using time series forecasting models. In: 2020 11th international conference on computing, communication and networking technologies, ICCCNT 2020. https://doi.org/10.1109/ICCCNT49239.2020.9225319 28. Sulaiman MH, Mustaffa Z, Saari MM, Daniyal H, Mirjalili S (2023) Evolutionary mating algorithm. Neural Comput Appl 35(1):487–516
Analyzing the Effectiveness of Several Machine Learning Methods for Heart Attack Prediction Khondokar Oliullah, Alistair Barros, and Md. Whaiduzzaman
Abstract Heart attack or heart failure cases are rising quickly each day, so it is crucial to anticipate any problems in advance. A heart attack is a significant medical emergency that happens when the blood circulation to the heart is abruptly clogged, normally by a blood clot. For the prevention and treatment of heart failure, an accurate and prompt identification of heart disease is essential. Traditional medical history has been criticized for not being a trustworthy method of diagnosing heart disease in many ways. Machine learning techniques are effective and reliable for distinguishing healthy individuals from those at risk of heart attack. This study proposes a model based on machine learning methods such as decision trees, random forests, neural networks, voting, gradient boosting, and logistic regression using a dataset from the UCI repository that incorporates numerous heart disease-related variables. The aim of this paper is to foresee the probability of a heart attack or failure in patients. According to the results, the gradient boosting approach exhibits the best performance in terms of accuracy, precision, recall, specificity, and f1-score. Decision tree, random forest, voting, and Gaussian naive Bayes also show good performance. Keywords Machine Learning · Heart disease · Heart attack prediction · Gradient boosting · Random Forest · Decision tree
K. Oliullah (B) Institute of Information Technology, Jahangirnagar University, Dhaka, Bangladesh e-mail: [email protected] A. Barros · Md. Whaiduzzaman School of Information Systems, Queensland University of Technology, Brisbane, Australia e-mail: [email protected] Md. Whaiduzzaman e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_19
1 Introduction

The leading cause of death worldwide is still heart disease, or cardiovascular disease. According to the report of the American Heart Association, 7.08 million deaths globally in 2020 were attributed to cerebrovascular illness; notably, 3.48 million deaths were from coronary heart disease, the major cause of heart attacks [12]. In the USA, a stroke victim died on average every 3 min and 30 s. The regions with the greatest rates of overall stroke mortality are Central, Southeast and East Asia, Oceania, and sub-Saharan Africa [11]. Typically, this illness leaves the heart unable to pump the necessary amount of blood to the rest of the body to support its normal functions, and eventually heart failure happens [5]. Risk factors for heart disease include genetic predisposition, bad habits in one's personal and professional life, and lifestyle decisions. In addition to physiological risk factors like obesity, hypertension, high blood cholesterol, and pre-existing heart diseases, a number of behavioral risk factors, such as smoking, excessive alcohol and caffeine use, stress, and physical inactivity, are associated with an increased risk of developing cardiovascular disease [8]. For patients to lessen their risks of severe heart problems and enhance heart security [26], an exact and proper identification of their heart disease risk is required. Medical personnel evaluate the patient's medical history, the results of the physical examination, and any alarming symptoms before using any invasive methods to diagnose heart disease. Due to human error, each of these steps frequently results in erroneous diagnoses and delays in diagnostic outcomes. Furthermore, the process requires longer examinations and is more expensive and computationally difficult [25]. Machine learning is a still-emerging field of artificial intelligence. Its principal objective is to construct systems, let them learn, and then use what they have learned to generate predictions. By training machine learning algorithms on a heart disease dataset, a model is built; the model then uses new input data to predict cardiac disease. By utilizing machine learning to uncover hidden patterns [9] in the input dataset, it generates models that can make more precise predictions on fresh datasets. Missing values are filled, and the dataset is cleansed. The new input data are used to forecast heart illness or heart attacks, and the model's accuracy is then assessed [18]. Here, our contributions are

• To design a decision support system that is capable of accurately determining a patient's chance of survival with heart failure.
• To evaluate the overall effectiveness of various ML techniques in order to improve the model's predictability.
• To look into the key risk elements from the dataset that also influence how well the machine learning algorithm performs.

The paper is designed as follows: in Sect. 2, a short review of previous work is presented. The materials and methodology of the proposed model are discussed
in Sect. 3. The details of the experiment’s findings are covered in Sect. 4. The conclusion of the paper is addressed in the final Sect. 5.
2 Literature Review

A number of earlier studies that employed data mining methods to identify heart issues served as inspiration for this work. A short review of the literature follows. Khourdifi and Bahaj [17] have claimed that their proposed optimization tends to improve the overall correctness of medical data. According to the technique, there has been an improvement in performance and an increase in classification accuracy; the mentioned model achieves scores of 99.65% with KNN and 99.6% with RF, which are considered precise. Additionally, Benjamin Fredrick David [17] has made reference to the method in the old system, which is claimed to provide greater precision in the overall categorization. The proposed machine learning algorithms have made it possible to swiftly comprehend and forecast heart problems. Furthermore, the accuracy discovered using the random model allowed for the earliest possible detection of coronary disease. For the Cleveland Heart Database dataset, Phasinam et al. [20] employed a variety of classification techniques, including the Naive Bayes, ID3, C4.5, and SVM algorithms. In that study, the accuracy (92%) and error rate (8%) of the SVM beat those of the other three algorithms. The researchers employed significant factors, such as the accuracy and error-rate results of the classification algorithms, to build a better comparison between the models. This methodology provided a thorough and comparative analysis of the various ML models. A hybrid intelligent machine-learning-based predictive system was put forth to identify heart disease, and the Cleveland heart disease dataset was used to evaluate the system [24]. Three feature selection techniques were combined with seven well-known classifiers, including K-NN, ANN, SVM, NB, DT, logistic regression, and random forest. The crucial features were selected using Relief, mRMR, and LASSO. The best accuracy, 89%, was achieved by logistic regression with tenfold cross-validation. The goal of the research in [25] was to solve the issue at hand and build a reliable ML technique to predict heart disease. The proposed work has been tested both on the entire dataset's features and on a sample of them. The reduction of features affects the efficiency of classifiers in terms of the evaluation matrix and execution time. Experimental findings in terms of accuracy from SVM, K-NN, and logistic regression are 97.5%, 95%, and 93%, with computation times of 4.4, 7.3, and 8 s, respectively. Three ML classification modelling techniques, KNN, RF, and logistic regression, have been used to develop a model [15] for the detection of cardiovascular disease. This method predicts persons with cardiovascular disease using the medical history of patients affected by this illness from a dataset. The accuracy rate of this model is 87.5%.
Experts and practitioners have grown increasingly interested in using traditional machine learning approaches to diagnose cardiac disease over time. In their research studies [3, 19, 21, 27], experts typically use a classification strategy to build a model for the diagnosis of heart disease. Preliminary computational findings show that a machine learning model can identify heart failure with 91% accuracy [6] using the UCI dataset, which is not sufficient for such a significant issue. Therefore, there is a lot of work being done to raise the performance evaluation rate in this sector.
3 Materials and Methodology

This research intends to provide a computerized prediction of heart attacks that is useful for physicians and patients in the medical field. The research methods and materials used to achieve this aim are briefly covered in the following subsections. Figure 1 represents the proposed system framework.
3.1 Dataset

A number of researchers [1, 4, 10, 22] use the "Cleveland heart disease dataset 2016," which is accessible via the University of California, Irvine's online data mining repository and the Kaggle repository [16]. We use this dataset, which holds the medical records of 303 patients of varying ages. The medical parameters provided by this dataset, such as chest pain, cholesterol, and sugar level, allow us to identify patients who are prone to a heart attack. The dataset's 13 medical features are described in Table 1.

Data Preprocessing. Since real-world data frequently contains noise and missing values, and may be in an unfavorable format, it cannot be used directly to train machine learning models. Data preprocessing [13] increases the accuracy and efficiency of a machine learning model and is required to clean the data and prepare it for the model. Here, we prepare the dataset for the logistic regression, decision tree, random forest, neural network, voting, gradient boosting, and Gaussian NB models.
Fig. 1 The proposed system framework
Table 1 Description of the dataset

No   Name of the features                                            Values
1    Age                                                             30 ~ 77
2    Sex                                                             Male = 1, Female = 0
3    Chest pain type                                                 1 = atypical, 2 = typical, 3 = asymptomatic, 4 = nonanginal
4    Resting blood pressure                                          94–200 mm Hg
5    Serum cholesterol                                               120–564 mg/dl
6    Sugar level in blood with fasting > 120 mg/dl                   1 = True, 0 = False
7    Resting electrocardiographic results                            0 = normal, 1 = having ST–T abnormality, 2 = hypertrophy
8    Heart rate (max)                                                71–202
9    Exercise-induced angina                                         1 = yes, 0 = no
10   Old peak = ST depression induced by exercise relative to rest   0–6.2
11   Slope of the peak exercise ST segment                           1 = up sloping, 2 = flat, 3 = down sloping
12   Major vessels number colored by fluoroscopy                     0–3
13   Thallium scan                                                   3 = normal, 6 = fixed defect, 7 = reversible defect
Missing data handling. Missing data can cause serious issues for a machine learning model, so any missing values in the dataset must be handled.

Removing outliers. Data points that stand out from the rest of the dataset are known as outliers. These abnormal observations, which typically come from imprecise observations or poor data entry, frequently distort the data distribution. It is crucial to find and eliminate outliers in order to guarantee that the trained model generalizes well to the acceptable range of test inputs.

Feature selection. A crucial component that enhances classification accuracy is precise and accurate feature selection. A smart strategy is to understand the data first and attempt to glean as many insights from it as possible. Exploratory Data Analysis (EDA) focuses on making sense of the data before using it, and is therefore a helpful option for feature selection to improve model accuracy. After being categorized, the data are divided into a training dataset and a test dataset, which are then subjected to a variety of algorithms to obtain better accuracy results.
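As a hedged sketch of these preprocessing steps on the Kaggle heart-attack dataset (the column names such as 'trtbps', 'chol', and 'output' follow the Kaggle file and are assumptions here, not quoted from the paper), one might write:

```python
# Drop missing values, trim IQR outliers on numeric columns, and split 80/20.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")                 # assumed Kaggle heart-attack file
df = df.dropna()                              # handle missing values

for col in ["trtbps", "chol", "thalachh", "oldpeak"]:   # assumed numeric columns
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[(df[col] >= q1 - 1.5 * iqr) & (df[col] <= q3 + 1.5 * iqr)]

X, y = df.drop(columns="output"), df["output"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```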
3.2 Machine Learning Techniques

Several machine learning classification algorithms are applied to foretell whether a person is prone to a heart attack or not. This subsection briefly discusses a number of well-known classification algorithms and their theoretical foundations.

Decision Tree. A decision tree is a classification method that can be applied to both categorical and numerical data. Decision trees build tree-like structures and are a straightforward and popular tool for managing medical datasets. The data in a tree-shaped graph can be easily implemented and analyzed [23]. Three kinds of nodes—the root node, internal nodes, and leaf nodes—make up the decision tree model.

Random Forest. The random forest algorithm is a technique used for supervised classification. In this algorithm, a forest is made up of many trees. Each tree emits a class prediction, and the class with the most votes determines the model's prediction. The more trees in the technique, the more accurate it is.

Neural Networks. Similar to fault-diagnosis networks, a neural network used for feature categorization only permits one output response for any input pattern, as opposed to permitting numerous errors to happen for a particular set of operational conditions [2]. The general structure of a neural network consists of three layers: the input layer, hidden layer, and output layer.

Gaussian Naive Bayes. Gaussian Naive Bayes is a particular kind of NB algorithm. It is specially employed when the features contain continuous values, and it is presumed that all of the features follow a normal (Gaussian) distribution.

Voting Classifier. This is a kind of ML estimator that trains a number of base models or estimators and makes decisions by aggregating their results; the aggregating criteria can combine the votes of each estimator's output.

Gradient Boosting. Gradient boosting classifiers are a subset of machine learning methods that combine a group of weak learning models into a powerful predictive model. Decision trees are widely used in it, aiming to strengthen weak learners through a series of adjustments to the learner or hypothesis [28]. The concept of Probably Approximately Correct (PAC) learning serves as the foundation for this sort of hypothesis boosting. To create a gradient boosting classifier, we must fit the model, tune its hyperparameters and parameters, make forecasts, and interpret the findings.

Logistic Regression. Logistic regression is a parametric model in which the dependent variable is categorical [7]. This parametric model is employed to categorize categorical values, which rely on the probability of success. It is used to tackle binary classification problems: it predicts values for variables 0 and 1 and divides them into two groups, negative (0) and positive (1).
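A minimal scikit-learn sketch of these seven classifiers is given below; it reuses the train/test split from the preprocessing sketch above, and the default hyperparameters are illustrative assumptions rather than the paper's settings:

```python
# Build, fit, and score the seven classifiers discussed in Sect. 3.2.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
gb = GradientBoostingClassifier(random_state=0)
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
gnb = GaussianNB()
lr = LogisticRegression(max_iter=1000)
voting = VotingClassifier(
    estimators=[("dt", dt), ("rf", rf), ("gb", gb), ("gnb", gnb)],
    voting="soft")    # soft voting averages predicted class probabilities

models = {"Decision Tree": dt, "Random Forest": rf, "Gradient Boosting": gb,
          "Neural Network": nn, "Gaussian NB": gnb, "Voting": voting,
          "Logistic Regression": lr}
for name, model in models.items():
    model.fit(X_train, y_train)            # split from the preprocessing sketch
    print(name, model.score(X_test, y_test))
```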
3.3 Performance Matrix

Seven machine learning techniques are applied to predict whether a patient is heart attack prone or not. These models are evaluated on a variety of performance factors clarified in Eqs. (1)–(5): Accuracy, Precision, Recall, Specificity, and F1-score. Here, true positive (TP) is the total samples with a lack of heart problem symptoms predicted as no possibility of a heart attack; false positive (FP) is the total samples with the presence of heart problem symptoms predicted as no possibility of a heart attack; true negative (TN) is the total samples with the presence of heart problems predicted as the possibility of a heart attack; and false negative (FN) is the total samples with a lack of heart problem symptoms predicted as the possibility of a heart attack.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (4)$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)$$
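These formulas can be computed from a confusion matrix, for example with scikit-learn (the code below follows scikit-learn's convention that label 1 is the positive class, whereas the paper's TP/TN wording uses its own class convention; the formulas themselves are identical):

```python
# Eqs. (1)-(5) computed from the binary confusion matrix.
from sklearn.metrics import confusion_matrix

def classification_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
    recall = tp / (tp + fn)                             # Eq. (2)
    precision = tp / (tp + fp)                          # Eq. (3)
    specificity = tn / (tn + fp)                        # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (5)
    return accuracy, precision, recall, specificity, f1
```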
4 Experimental Results and Discussion

The purpose of this study is to foretell the possibility of a patient having a heart attack. This study examined a number of machine learning methods utilizing the Kaggle dataset, including decision trees, random forests, neural networks, voting, gradient boosting, and logistic regression. Numerous experiments were carried out with various machine learning methods through the Anaconda tool. An 8th-generation Intel Core i7 machine with a 6600U processor clocked at up to 3.1 GHz and 16 GB of RAM was used for the research. The dataset was divided into a training set and a test set after classification. The data is pre-processed, and various ML algorithms are used to calculate the accuracy score. Using Python programming, the accuracy scores of several strategies were evaluated for the training (80%) and test (20%) data sets. Figure 2 illustrates the heart attack prediction accuracy on training and test data for the different machine learning algorithms. In general, the training accuracies of the Random Forest, Gradient Boosting, and Voting models are the highest,
and the rest of the techniques achieved close to the highest accuracy, except for the Neural Network model. Here, the highest accuracy is 99.80% and the lowest is 53.3% in terms of training data. Similarly, the Gradient Boosting algorithm's accuracy on test data is the highest at 98.8%, and the remaining algorithms achieved near 98.8% or above 90%, except for the Neural Network model. The overall accuracy of Gradient Boosting is comparatively better. Meanwhile, the accuracies of Decision Tree, Random Forest, Gaussian NB, and Voting are also good: 96.6%, 97.7%, 95.7%, and 95%, respectively. The entire dataset was tested on the selected ML algorithms in this experiment, with 80% and 20% of the data allocated for training and testing, respectively. In Table 2 and Fig. 3, the performance metrics of the prediction results for heart attack occurrence (yes and no) are presented. Gradient boosting shows better performance with 0.99 (no) and 0.98 (yes) precision, 0.97 (no) and 0.99 (yes) recall, and 0.99 (no) and 0.99 (yes) f1-score. On the other hand, the artificial neural network presents a lower performance with 0.59 precision, 0.99 recall, and 0.74 f1-score for heart attack prediction (yes). All other models, like Decision Tree, Random Forest, Gaussian NB, and Voting, display almost the same performance as gradient boosting, except logistic regression, which achieved slightly lower performance with 0.87 (no) and 0.92 (yes) precision, 0.89 (no) and 0.91 (yes) recall, and 0.88 (no) and 0.91 (yes) f1-score. Figure 4 shows the predictions of the mentioned algorithms. It is apparent from the results that gradient boosting predicts the highest total of true positives and true negatives. Although ANN predicts the highest total of true positives, like Gradient Boosting, it also predicts the highest total of false positives (presence of heart problem symptoms predicted as no possibility of a heart attack). Random forest has the second-highest total of true positives, whereas the Decision Tree, Gaussian NB, and Voting models have the third highest.
Fig. 2 Heart attack prediction accuracy in terms of test and training data (values recovered from the chart):

Model                 Accuracy (Train) %   Accuracy (Test) %
Decision Tree         99.5                 96.6
Random Forest         99.75                97.7
Gradient Boosting     99.8                 98.8
Neural Network        53.3                 58.8
Gaussian NB           98                   95.7
Voting Model          99.7                 95
Logistic Regression   90.4                 90
Table 2 Performance metric indices for machine learning algorithms evaluation

Model                 Heart attack possibility   Precision   Recall   f1-score
Decision tree         No                         0.99        0.92     0.96
                      Yes                        0.95        0.99     0.97
Random forest         No                         0.99        0.95     0.97
                      Yes                        0.96        0.99     0.98
Gradient boosting     No                         0.99        0.97     0.99
                      Yes                        0.98        0.99     0.99
Neural network        No                         0           0        0
                      Yes                        0.59        0.99     0.74
Gaussian NB           No                         0.99        0.95     0.97
                      Yes                        0.96        0.99     0.98
Voting model          No                         0.99        0.95     0.97
                      Yes                        0.96        0.99     0.98
Logistic regression   No                         0.87        0.89     0.88
                      Yes                        0.92        0.91     0.91
Fig. 3 Performance measure indices of machine learning algorithms
Actually, the Gaussian NB and Voting models show the same properties in Fig. 4. Logistic regression is the lowest performer after ANN in terms of false negatives and false positives. According to Fig. 5, Gradient Boosting surpassed all other machine learning techniques with a maximum prediction accuracy of 0.99, while Random Forest came in second (0.97). Additionally, Gradient Boosting exhibited strong precision and recall of 0.99 and 0.98, respectively. The greatest F1 score of 0.99 and the highest specificity of 0.97 show that this technique is best suited for identifying patients
Fig. 4 Prediction results of machine learning algorithms
who have heart attack symptoms. In light of these performance metrics, we can therefore conclude that Gradient Boosting outperforms other methods in predicting whether a patient is heart attack prone or not. Table 3 contrasts the proposed classification method with the findings of earlier studies. We find that, in terms of heart attack prediction and classification, our optimized model outperforms the other models individually when compared with the existing approaches and experimental findings.
Fig. 5 Overall performance measurements of different machine learning techniques (values recovered from the chart):

Metric        Decision Tree   Random Forest   Gradient Boosting   Neural Network   Gaussian NB   Voting Model   Logistic Regression
Accuracy      0.96            0.97            0.99                0.59             0.95          0.95           0.9
Precision     0.94            0.94            0.99                0.29             0.92          0.92           0.93
Recall        0.96            0.97            0.98                0.5              0.97          0.97           0.9
Specificity   0.92            0.92            0.97                0                0.89          0.89           0.89
f1-score      0.97            0.97            0.99                0.37             0.95          0.95           0.89
Table 3 Comparative performance of different ML algorithms using the same dataset

Model                             Techniques              Accuracy (%)
Phasinam, Khongdet, et al. [20]   SVM                     92
                                  C4.5                    85
                                  ID3                     78
Ishaq, Abid, et al. [14]          AdaBoost                88
                                  Random Forest           91
                                  Extra Tree Classifier   92
Proposed model                    Decision Tree           96
                                  Random Forest           97
                                  Gradient Boosting       99
                                  Voting Model            95
5 Conclusion

The main goal of this work is to describe some machine learning methods, including logistic regression, decision trees, random forests, neural networks, voting, gradient boosting, and Gaussian NB, that are beneficial in accurately predicting whether a heart condition can lead to a heart attack or not. These approaches are trained on the patient's medical history, which includes information on chest pain, blood sugar levels, blood pressure, oxygen saturation, and other factors. We take into account only 13 essential attributes in this analysis, and the models employed pre-processed data. In the experimental analysis, the gradient boosting classifier achieved the best performance with 99% accuracy, 99% precision, 98% recall, 97% specificity, and 99% f1-score. Voting, decision trees, random forests, and Gaussian naive Bayes have all performed well.
References 1. Ahmed, Hager, et al. (2020) Heart disease identification from patients’ social posts, machine learning solution on Spark. Futur Gener Comput Syst 111: 714–722. 2. Mohd Amiruddin, Ahmad Azharuddin Azhari, et al. (2020) Neural network applications in fault diagnosis and detection: an overview of implementations in engineering-related systems. Neural Comput Appl 32.2 447–472 3. Al Badarin, Firas J, Saurabh Malhotra. (2019) Diagnosis and prognosis of coronary artery disease with SPECT and PETCurr Cardiol Rep 21.7: 1–11 4. Budholiya, Kartik, Shailendra Kumar Shrivastava, Vivek Sharma (2020) An optimized XGBoost based diagnostic system for effective prediction of heart disease. J King Saud Univ-Comput Inf Sci 5. Bui AL, Horwich TB, Fonarow GC (2011) Epidemiology and risk profile of heart failure. Nat Rev Cardiol 8(1):30–41 6. David H, Antony Belcy S (2018) Heart disease prediction using data mining techniques. ICTACT J. Soft Comput 9.1
7. Desai, Shrinivas D et al. (2019) Back-propagation neural network versus logistic regression in heart disease classification. Adv Comput Commun Technol. Springer, Singapore. 133–144 8. Dibben Grace et al. (2021) Exercise-based cardiac rehabilitation for coronary heart disease. Cochrane Database Syst Rev 11 9. Faruqui, Nuruzzaman, et al. (2021) LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data. Comput Biol Med 139: 104961 10. Haq, Amin Ul et al. (2019) Heart disease prediction system using model of machine learning and sequential backward selection algorithm for features selection.In: 2019 IEEE 5th International Conference for Convergence in Technology (I2CT). IEEE 11. Heart disease and stroke statistics update fact sheet at-a-glance. (n.d.), https://profes sional.heart.org/en/science-news/-/media/8D840F1AA88D423888ED3BA96DD61010.ashx , last accessed 2022/08/05 12. NHS Homepage, https://www.nhs.uk/conditions/heart-attack/, last accessed 2022/07/22 13. Hossen, Rakib et al. (2021) BDPS: An efficient spark-based big data processing scheme for cloud Fog-IoT Orchestration. Information 12.12: 517 14. Ishaq, Abid, et al. (2021) Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE access 9: 39707–39716 15. Jindal, Harshit, et al. (2021) Heart disease prediction using machine learning algorithms.In: IOP conference series: materials science and engineering. 1022(1). IOP Publishing 16. Kaggle, https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-predic tion-dataset, last accessed 2022/07/22 17. Khourdifi Y, Bahaj M (2019) Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. Int J Intell Eng Syst 12(1):242–252 18. Van Klompenburg, Thomas, Ayalew Kassahun, Cagatay Catal (2020) Crop yield prediction using machine learning: A systematic literature review. Comput Electron Agric 177: 105709 19. Minou, John, et al. (2020) Classification techniques for cardio-vascular diseases using supervised machine learning.“ Medical Archives 74.1: 39 20. Phasinam, Khongdet, et al. (2022) Analyzing the performance of machine learning techniques in disease prediction. J Food Qual 2022 21. Plati, Dafni K, et al. (2021) A machine learning approach for chronic heart failure diagnosis. Diagnostics 11.10: 1863 22. Rani, Pooja, Rajneesh Kumar, Anurag Jain (2021) Multistage model for accurate prediction of missing values using imputation methods in heart disease dataset. Innov Data Commun Technol Appl. Springer, Singapore. 637–653 23. Sagar, Shuvashish Paul, et al. (2021) PRCMLA: Product review classification using machine learning algorithms. In: Proceedings of International Conference on Trends in Computational and Cognitive Engineering. Springer, Singapore 24. Shah, Devansh, Samir Patel, Santosh Kumar Bharti (2020) Heart disease prediction using machine learning techniques. SN Comput Sci 1.6: 1–6 25. Ullah, Farhat, et al. (2022) An efficient machine learning model based on improved features selections for early and accurate heart disease predication. Comput Intell Neurosci 2022 26. Whaiduzzaman, Md, et al. (2020) AUASF: An anonymous users authentication scheme for fog-IoT environment. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE 27. Whaiduzzaman, Md, et al. (2021) HIBAF: A data security scheme for fog computing. 
J High Speed Netw Preprint: 1–22 28. Zhang Y, Haghani A (2015) A gradient boosting method to improve travel time prediction. Transp Res Part C: Emerg Technol 58:308–324
Solving the Royalty Payment Problem Through Shooting Method Wan Noor Afifah Wan Ahmad, Suliadi Firdaus Sufahani, Mahmod Abd Hakim Mohamad, Mohd Saifullah Rusiman, Rozaini Roslan, Mohd Zulariffin Md. Maarof, Muhamad Ali Imran Kamarudin, Ruzairi Abdul Rahim, and Naufal Ishartono
Abstract Nowadays, Optimal Control problems tend to fall into non-standard settings, especially in the economic field. This research deals with a non-standard Optimal Control problem involving royalty payments. In maximizing the performance index, the difficulty arises because the final state value is unknown, resulting in a non-zero final costate value. In addition, the royalty function cannot be differentiated at a certain time frame. Therefore, a hyperbolic tangent (tanh) approximation was used as a continuous approach, and the shooting method was implemented to solve the resulting problem. The shooting method was implemented in C++. At the end of the study, the results produced are optimal. Future academics may build on this discovery as they create mathematical modeling techniques to address practical economic issues. Moreover, the new method can advance the academic field in line with today's technological advances. Keywords Optimal Control · Royalty Payment Problem · Shooting Method
W. N. A. W. Ahmad · S. F. Sufahani (B) · M. A. H. Mohamad · M. S. Rusiman · R. Ros-lan · M. Z. Md. Maarof · R. A. Rahim Universiti Tun Hussein Onn Malaysia, Pagoh Higher Educational Hub, 84600 Pagoh, Johor, Malaysia e-mail: [email protected] W. N. A. W. Ahmad · S. F. Sufahani · M. S. Rusiman · R. Ros-lan · M. A. I. Kamarudin Mathematics and Statistics Research Group, Faculty of Applied Sciences and Technology, Universiti Tun Hussein Onn Malaysia, Pagoh Higher Educational Hub, 84600 Pagoh, Johor, Malaysia N. Ishartono Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia Universitas Muhammadiyah Surakarta, Jl. A. Yani, Mendungan, Pabelan, Kec. Kartasura, Kabupaten Sukoharjo, Jawa Tengah 57169, Surakarta, Indonesia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_20
1 Introduction

Many ambitious scholars, including [1], have used Optimal Control (OC) theory to tackle problems pertaining to real-world applications in their financial studies. In addition to economics, the study of OC can be applied in a variety of other fields [5, 12]; the COVID-19 protective measure study by [2] is one of them. A particular goal function or performance index can be maximized or minimized using OC, an extension of the Calculus of Variations (CoV). OC can be defined as the process of searching through all admissible control variables to identify the one that shifts the dynamic system from some starting state at some initial time to some terminal state at some terminal time [4, 10]. In this study's OC problem, the performance index is an integral that relies on the final state value y(T), which is unknown. Furthermore, this situation qualifies as a non-standard OC problem because the costate value p(T) is not equal to zero at the final time. Consequently, a new requirement is established: y(T) must equal a particular integral z, which is a continuous system and a component of the final state value. In order to solve the non-standard OC problem, this study employs a royalty payment function in the form of a two-stage piecewise function. The royalty function ρ is not differentiable at a certain level. To get around this, the continuous hyperbolic tangent (tanh) approximation was used. After that, the shooting method was developed in the C++ programming language to compute the optimal solution. There are four sections in this paper. The following section discusses the non-standard OC problem, including the usage of the Newton method within the shooting method as well as the Golden Section Search method. An illustrative royalty payment problem is presented in the following section, and a concise conclusion is given at the end.
2 Non-standard Optimal Control Problem

The non-standard OC problem is the primary focus of this paper. The performance index integral in this study depends on the final state value y(T), which is unknown. Moreover, the final costate value p(T) is not equal to zero; it is instead expected to satisfy another boundary condition that deviates from standard OC theory. The new boundary condition concept can be referred to in Cruz, Torres, and Zinober (2012) as well as [7]. From the OC theory perspective, one has p(T) equal to $g_{\dot{y}}(t, y(t), \dot{y}(t), z)$. Hence, the new boundary condition is known as

$$p(T) = \eta(T) = -\int_{S}^{T} g_z(t, y(t), \dot{y}(t), z)\,dt$$

subject to p as the costate variable.
The non-zero boundary condition on p(T) appears in Malinowska and Torres (2010). Additionally, the system $g_{\dot{y}}(t, y(t), \dot{y}(t), z)$ can be differentiated with respect to z. In light of the facts already provided, let us move on to the numerical problem that was evaluated by [9, 11]. Let us take into account the following ODE system

$$\dot{y}(t) = u(t) \quad (1)$$

to maximize the following performance index (Zinober & Kaivanto, 2008; [9])

$$J[u(t)] = \int_{t_0}^{T} g(t, y(t), u(t))\,dt = \int_{t_0}^{T} \left[a(t)\,u^{1-\alpha} - \left(Q + m_0 + c_0 e^{-\lambda y}\right)u(t)\right] e^{-rt}\,dt \quad (2)$$

where $a = e^{0.025t}$, α = 0.5, m0 = 1.0, c0 = 1.0, λ = 0.12, and r = 0.1. As was previously noted, the system depends on the state variable y(t) and on the royalty function ρ, which in this study takes the form of a two-stage piecewise function. The suggested configuration here is t0 = 0 and T = 10; the known initial state is y(0) = 0, while y(T) is unknown. The stationary condition, the state and costate equations, and a few other essential requirements must all be met. Additionally, the known condition y(0) = 0 and an estimated initial value p(0) are both provided. The integral's boundary requirement must also be fulfilled at the final time T. It is also necessary for the iterated value z of the state equation to equal the value yT at the last iteration, so that the corresponding residual is close to zero. In our calculation, this is satisfied only when the costate equation converges; the optimal solution is then found.

3 Royalty Payment Problem

The aforementioned two-stage royalty function will be used to solve the proposed problem:

$$\rho(y) = \begin{cases} 1 & \text{for } y \le 0.5z \\ 1.2 & \text{for } y > 0.5z \end{cases} \quad (3)$$

The continuous hyperbolic tangent (tanh) approach was applied to produce

$$Q = 1.1 + 0.1\,\tanh(k(y - 0.5z)) \quad (4)$$
To compute the optimal solution, the shooting method was applied, in which Newton's method is combined with the Golden Section Search to tackle the non-standard OC problem. The constructed algorithm makes use of the highly precise Numerical Recipes library procedures in the C++ programming language [8]. Two scalar conditions must be satisfied: the final state value y(T) must equal yT, and the final costate value p(T) must equal η(T). Given a range of possible yT values as input, the Golden Section Search method runs the program to find the best yT. The root iteration is then performed using Newton's method, with the program executed by the ODE solver from initial guessed values. Within the defined range, the Golden Section Search method yields a candidate yT, which is passed to Newton's iteration; in this phase, Newton's method uses the yT value at each iterative step, solving the problem with the ODE solver, to make sure that both scalar functions are near zero. At the final time, the closeness of the scalar functions to zero is checked, as well as whether the resulting solution yields a yT value that maximizes the performance index J(T). If not, the Golden Section Search algorithm continues with a different yT value and repeats the same procedure until it finds the best yT. Once the software has generated the same answer four times and can no longer improve the yT value, the optimal performance index J(T) together with the optimal yT is attained [8]. Press et al. [8] classify the Golden Section Search method as a one-dimensional minimization technique; consequently, to solve the maximization problem, the performance index J(T) is multiplied by negative one for the minimization routine [8].
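To illustrate the core single-shooting idea on a toy boundary-value problem (deliberately simpler than the royalty system above), the sketch below integrates forward from a guessed initial slope and uses Newton's method to drive the terminal mismatch to zero. In the paper's scheme, this inner Newton iteration (on the unknown initial costate) is nested inside a Golden Section Search over yT, e.g. via scipy.optimize.minimize_scalar(..., method='golden') applied to −J(yT), since Golden Section Search minimizes:

```python
# Single shooting on the toy BVP y'' = -y, y(0) = 0, y(T) = 1; the exact
# answer is y = sin(t), so the recovered slope y'(0) should equal 1.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import newton

T = np.pi / 2

def rhs(t, s):
    y, v = s              # v = y'
    return [v, -y]        # toy dynamics y'' = -y

def terminal_mismatch(v0):
    """y(T) - 1 for the trajectory started at y(0) = 0 with slope v0."""
    sol = solve_ivp(rhs, (0.0, T), [0.0, v0], rtol=1e-10, atol=1e-10)
    return sol.y[0, -1] - 1.0

v0_star = newton(terminal_mismatch, x0=0.5)
print(f"shooting found y'(0) = {v0_star:.6f}")
```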
4 Results and Discussion

In this research, the shooting method produces an optimal solution: the final state value y(T) equals 0.351080, and the optimal initial and final costate values are −0.031822 and 0.056910, respectively. These yield an optimal (maximized) performance index equal to 0.643299 at the terminal time. The optimal curves for the state, costate, and control variables, as well as the performance index plot, are shown in Fig. 1; the optimal solution is shown from the starting time of zero until the ending time of ten.
5 Conclusion

The shooting approach, which combined Newton's method with the Golden Section Search minimization technique, was applied in this study to demonstrate the non-standard OC problem and its outcomes.
Fig. 1 The optimal solution generated from the initial time to the terminal time (four panels: state y(t) vs time t, costate p(t) vs time t, control u(t) vs time t, and performance index J(t) vs time t)
Additionally, this research demonstrated the use of appropriate boundary conditions and numerical approaches to solve non-standard OC problems properly. The shooting technique made use of the C++ programming language, and the outcome is optimal. As a result, other ambitious academics may use this research finding as a springboard to investigate novel mathematical techniques for addressing contemporary economic issues. This ensures that the methodologies being employed remain current, especially in the academic field.

Acknowledgements This research was supported by the Ministry of Higher Education (MOHE) through Fundamental Research Grant Scheme (FRGS/1/2021/STG06/UTHM/03/3). Thank you to Research Management Center (RMC), Universiti Tun Hussein Onn Malaysia (UTHM), for managing the research and publication process.
References Ahmad WNAW, Sufahani SF, Zinober A (2019) Solving royalty problem through a new modified shooting method. Int J Recent Technol Eng 8(1). Blue Eyes Intelligence Engineering & Sciences Publication, pp 469–475 Amin C, Priyono P, Umrotun U, Fatkhiyah M, Sufahani SF (2021) Exploring the prevalence of protective measure adoption in mosques during the COVID-19 pandemic in Indonesia. Sustainability 13(24):13927 Betts JT (2010) Practical methods for optimal control using nonlinear programming. Adv Des & Control, Soc Ind & Appl Math, Philadelphia, PA Bryson AE (2018) Applied optimal control: optimization, estimation and control. Routledge Cruz PAF, Torres DFM, Zinober ASI (2010) A non-classical class of variational problems. Int J Math Model Numer Optim 1(3). Inderscience Publishers, pp 227–236 Fourer R, Gay DM, Kernighan BW (1990) A modelling language for mathematical programming. Manage Sci 36(5):519–554 Malinowska AB, Torres DFM (2010) Natural boundary conditions in the calculus of variations. Math Methods Appl Sci 33(14). Wiley Online Library, pp 1712–1722 Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge Spence AM (1981) The learning curve and competition. Bell J Econ. JSTOR, pp 49–70 Zinober ASI (2010) Optimal control theory lecture notes. The University of Sheffield Zinober ASI, Kaivanto K (2008) Optimal production subject to piecewise continuous royalty payment obligations. University of Sheffield Zinober ASI, Sufahani S (2013) A non-standard optimal control problem arising in an economics application. Pesquisa Operacional 33(1). Braz Oper Res Soc, pp 63–71
ECG Signal Classification Using Transfer Learning and Convolutional Neural Networks Tanzila Tahsin Mayabee, Kazi Tahsinul Haque, Saadia Binte Alam, Rashedur Rahman, M. Ashraful Amin, and Syoji Kobashi
Abstract The number of heart disease cases, as well as the deaths associated with them, is rising every year. It is now more important than ever to diagnose heart abnormalities quickly and correctly so that proper treatment can be provided in time. A common tool for diagnosing heart abnormalities is the electrocardiogram (ECG), a procedure that uses electrodes to monitor and record the activity of the heart as a signal. In this paper, a method is proposed to classify standard 12-lead ECG signals using continuous wavelet transform (CWT) and a convolutional neural network (CNN). First, CWT is used to extract and represent features of the ECG signals as 2-dimensional (2D) RGB images. The RGB images are then classified into normal and abnormal cases using a pre-trained CNN. The proposed method is evaluated using a dataset containing ECG signals from 18,885 subjects. The maximum accuracy, precision, recall, F1 score, and AUC obtained are 74.78%, 78.968%, 71.003%, 72.957%, and 0.81126, respectively. Keywords Electrocardiogram (ECG) · Cardiovascular Disease (CVD) · Computer aided diagnosis (CAD) · Convolutional neural networks · Transfer learning · Binary classification
T. T. Mayabee (B) · K. T. Haque · S. B. Alam · M. A. Amin Center for Computational and Data Sciences, Independent University, Bangladesh (IUB), Dhaka 1299, Bangladesh e-mail: [email protected] K. T. Haque e-mail: [email protected] S. B. Alam e-mail: [email protected] M. A. Amin e-mail: [email protected] R. Rahman · S. Kobashi University of Hyogo, Shosha, Himeji 2167, Hyogo, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_21
1 Introduction Cardiovascular diseases (CVDs) refer to a group of disorders that affect the heart and blood vessels. CVDs are one of the leading causes of mortality worldwide: around 17.9 million individuals died as a result of CVDs in 2019 alone, accounting for approximately 32% of all fatalities [1]. Moreover, CVDs such as heart failure are complex and progressive in nature, and patients are often diagnosed too late [9]. Early detection can vastly help in controlling CVDs and preventing them from becoming life-threatening. A standard tool for diagnosing CVDs is the electrocardiogram (ECG). The process involves attaching electrodes to different parts of the body, which detect the electrical activity generated by the heart each time it beats. The ECG allows the medical attendant to see the anatomy and physiological environment of the patient's heart, from which they can interpret and diagnose. The appeal of the ECG lies in the fact that it is a standardized, well-established method that is quick to perform and uses machines that are readily accessible in most countries. However, as the ECG is primarily a screening test, its accuracy relies on the quality of the interpretation, which varies greatly depending on the attendant's skill. A reliable way of decreasing this dependency is to combine the interpretations of several attendants; however, this approach is neither cost-effective nor time-efficient. Recently, Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx) have been used as alternatives to address the issue of interpretability and assist in diagnosis. CADe and CADx are computer-based systems created to provide a 'second opinion' to doctors and medical attendants, helping them interpret medical images and signals. CAD helps to make the diagnostic process faster and less error-prone [4]. CAD application areas can be categorized by the type of data being used: sound and signal data such as those generated by the ECG, medical image data such as ultrasound or X-ray images, or data related to various lab tests such as blood pressure tests, respiratory tests, etc. [21]. Contemporary AI technology such as deep convolutional neural networks (DCNNs) is now used to develop CAD systems for various tasks, including classification of signals generated by the standard 12-lead electrocardiograph. This paper proposes a transfer learning approach on the PTB-XL dataset in which ECG signals are trained using a pre-trained ResNet50 architecture. The pre-trained CNN model accepts RGB images as inputs; therefore, the raw ECG signals are first translated to the time–frequency domain using the Continuous Wavelet Transform (CWT). The generated images are then used as input for the neural network. The contributions of this paper are as follows: 1. A different representation approach on the PTB-XL dataset. The one-dimensional signals from the dataset are first converted to two-dimensional scalograms using CWT; most prior work on this dataset considers raw signals as input. 2. Classification and evaluation using residual-based networks, specifically the ResNet50 architecture, on the PTB-XL dataset.
The rest of the paper is structured as follows: Sect. 2 describes previous research on ECG classification, both using CNNs specifically and using other methods. Section 3 describes the publicly available dataset used in this paper. Section 4 gives a brief overview of the methodology and the architecture of the network. Section 5 describes the results of the experiment and their interpretation. Finally, Sect. 6 concludes the paper with discussions on possible future work.
2 Literature Review Heart disease classification using ECG signals has been a topic of interest in various research. Several deep learning models, including the multilayer perceptron (MLP), convolutional neural networks (CNN), deep neural networks (DNN), and long short-term memory (LSTM), have been used for ECG signal classification. Most ECG classification methods use raw ECG signals as the input to a neural network, often specifically a one-dimensional CNN. The most common dataset used in ECG classification tasks is the MIT-BIH Arrhythmia dataset, which consists of five heartbeat classes. Wu et al. [18] proposed a 12-layer deep one-dimensional CNN for multiclass classification of the aforementioned dataset. Experiments were conducted using both raw data and data that had been denoised using self-adaptive thresholding. The model showed better results in accuracy, sensitivity, and robustness compared to models such as the backpropagation neural network, the random forest classifier, and other CNN networks [18]. Wang et al. [16] also used CNNs, but the ECG signals were first transformed to the time–frequency domain through the Continuous Wavelet Transform (CWT). The proposed CNN used both the entire scalogram and the segmentation of the RR interval for its training. The model had over 98% accuracy for the supraventricular ectopic beat (SVEB) and ventricular ectopic beat (VEB) classes in the dataset and achieved an F1-score 4.75–16.85% higher than existing methods [16]. Similarly, [19] used a CNN-based method consisting of 9 layers: 4 convolutional, 2 fully connected, a softmax layer, and 2 sub-sampling layers. The method had an accuracy, positive predictive value, and specificity of over 99% for the SVEB and VEB classes. The proposed system could be directly implemented in wearable devices so that long-term ECG data can be monitored [19]. The dataset was also used by [20], but with a two-stage CNN workflow in which the heartbeat signals were first classified into two classes (abnormal and normal), and the abnormal signals were then further classified into the other four classes in the second stage. This reduces the false negative rate of normal heartbeat detection. The proposed CNN was then implemented as a spiking neural network (SNN) to reduce power consumption, and experiments were conducted using CNN + CNN, SNN + CNN, and SNN + SNN approaches, with the first yielding the highest accuracy of 92% [20]. Malik et al. [8] used one-dimensional self-organized
operational neural networks (Self-ONNs), the first instance of the model being used for a classification task. Their experiments showed that 1D Self-ONNs outperform 1D CNNs by a significant margin, with average accuracies of 98% and 99.04% for the SVEB and VEB classes respectively, while maintaining a similar computational complexity [8]. Although much of the work in this area uses the MIT-BIH Arrhythmia dataset, that dataset is strongly imbalanced. Shaker et al. [12] used generative adversarial networks (GANs) as a data-augmentation technique and noted that the augmented dataset improves the performance of ECG classification [12]. A more recent dataset is the PTB-XL dataset, the largest ECG dataset to date, with up to 20 classes. Śmigiel et al. [13] used the PTB-XL dataset for binary and multiclass (5-class and 20-class) classification using a CNN, SincNet, and a convolutional network with entropy features, with the latter achieving the highest accuracy for both binary and multiclass classification [13]. Feyisa et al. [5] used a multi-receptive field CNN (MRF-CNN), which avoids the large parameter counts that are a staple of traditional 1D-CNNs. The proposed MRF-CNN achieves a score of 0.93 for the 5 superclasses and 0.92 for the 20 subclasses [5]. Another popular approach to the ECG classification problem using CNNs is transfer learning, which reduces the need for a large number of annotated samples. First, the CNN is pre-trained on large publicly available datasets, and then the model is fine-tuned using smaller datasets. Rahhal et al. [10] pre-trained a DCNN on the publicly available ImageNet dataset and then fine-tuned the model on the MIT-BIH Arrhythmia, INCART, and SVDB databases. The proposed method achieved better results in detecting VEB and SVEB compared to state-of-the-art methods [10]. Weimann and Conrad [17] showed that pretraining improves performance on target tasks by up to 6.57%, thereby reducing the number of annotations required to achieve the same results as non-pre-trained networks [17]. Transfer learning was also used for automatic sleep stage classification with the pre-trained CNN SqueezeNet by [7]. Single-channel electroencephalogram (EEG) signals were first transformed into the time–frequency domain using CWT and then used as input for the CNN. The method was evaluated using the publicly available EDFx dataset. The results show that the method achieves almost state-of-the-art accuracy despite using only a single-channel EEG signal [7].
3 Dataset In this study, the PTB-XL [6, 14, 15] dataset was used, which is one of the largest freely accessible ECG datasets available right now. The dataset contains a total of 21,837 12-lead clinical ECG signals from 18,885 patients, each 10 s in length. The patients' ages range from 0 to 95 years, with a median age of 62; 52% of the patients are male and 48% are female. Each signal is annotated by at least one cardiologist, and some are annotated by two. The dataset is multi-labelled. There are 5 primary diagnostic superclasses, each
Table 1 Dataset superclasses

Superclass   Description              Number of records
NORM         Normal ECG               9528
MI           Myocardial infarction    5486
STTC         ST/T change              5250
CD           Conduction disturbance   4907
HYP          Hypertrophy              2665
of which is split further into multiple subclasses. The waveform files are available in 16-bit precision at a resolution of 1 µV/LSB and sampling frequencies of 500 Hz and 100 Hz. Although the dataset is multi-labelled, it was used here for binary classification: the data was split into two classes, normal and abnormal. As part of the data cleaning step, all instances with multiple or no annotations were removed (Table 1).
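A hypothetical sketch of this label-preparation step is shown below. The metadata file names and columns (ptbxl_database.csv, scp_codes, scp_statements.csv, diagnostic_class) follow the public PTB-XL release, but the exact filtering logic is our reading of the text, not the authors' code.

```python
# Sketch: derive binary normal/abnormal labels from PTB-XL metadata and
# drop records with multiple or no superclass annotations.
import ast
import pandas as pd

meta = pd.read_csv('ptbxl_database.csv', index_col='ecg_id')
scp = pd.read_csv('scp_statements.csv', index_col=0)
scp = scp[scp.diagnostic == 1]                    # diagnostic statements only

def superclasses(codes):
    codes = ast.literal_eval(codes)               # scp_codes is a dict string
    return {scp.loc[c, 'diagnostic_class'] for c in codes if c in scp.index}

meta['super'] = meta.scp_codes.apply(superclasses)
meta = meta[meta['super'].apply(len) == 1]        # remove multi/no annotations
meta['label'] = meta['super'].apply(lambda s: 0 if s == {'NORM'} else 1)
print(meta['label'].value_counts())               # 0 = normal, 1 = abnormal
```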
4 Methodology 4.1 Overview The flowchart below gives an overview of the proposed method. First, the ECG signals are converted to CWT images, both as a means of feature extraction and because the pre-trained model accepts RGB images as input. The images are then used as input for a pre-trained CNN. Finally, the results are evaluated by calculating accuracy, precision, recall, F1 score, and area under the curve (AUC) (Fig. 1).
Fig. 1 Overview of methodology
4.2 Time–Frequency Domain Transformation via CWT Since the pre-trained convolutional network takes RGB images as input and the ECG signals are composed of different frequency components, the raw ECG signals are converted to the time–frequency domain using CWT. CWT is a time–frequency analysis tool that decomposes a signal into wavelets. Although there are many similar time–frequency tools, the CWT is particularly adept at decomposing highly non-homogeneous signals. The raw ECG signals were first normalized and then used to generate CWT images (Figs. 2 and 3). Unlike other popular time–frequency tools such as the Fourier Transform (FT), where the analyzing functions are complex exponentials, the CWT uses a wavelet ψ. The CWT compresses and stretches the wavelet by varying the translation and scale parameters and compares it to the signal to create a 2D representation [2]. CWT inherits and develops the idea of localization from the STFT; however, CWT provides high time resolution and low frequency resolution at high frequencies, and high frequency resolution and low time resolution at low frequencies. Formally, given a signal x(t), a scale value a, and a translational value b, the
Fig. 2 Raw ECG signal
Fig. 3 CWT of signal
Fig. 4 Network architecture
CWT is defined by Eq. (1):

$$X_w(a, b) = \frac{1}{|a|^{1/2}} \int_{-\infty}^{\infty} x(t)\, \overline{\psi}\left(\frac{t - b}{a}\right) dt \quad (1)$$
where $\overline{\psi}$ is the complex conjugate of the mother wavelet ψ, a continuous function localized in both the time and frequency domains.
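As a concrete illustration, the snippet below generates a scalogram image from a one-dimensional signal with PyWavelets. The synthetic signal, the Morlet wavelet, and the scale range are our assumptions for the sketch; the paper does not state which wavelet or scales were used.

```python
# Sketch: turn a 1-D signal into a 2-D RGB scalogram via CWT (Eq. (1)).
import numpy as np
import pywt
import matplotlib.pyplot as plt

fs = 100.0                                   # sampling frequency in Hz (assumed)
t = np.arange(0, 10, 1 / fs)                 # a 10 s record, as in PTB-XL
x = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 8.0 * t)
x = (x - x.min()) / (x.max() - x.min())      # normalize, as described above

scales = np.arange(1, 128)                   # dilation parameter a in Eq. (1)
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)

plt.imshow(np.abs(coeffs), aspect='auto', cmap='jet')
plt.axis('off')                              # the CNN consumes pixels, not axes
plt.savefig('scalogram.png', bbox_inches='tight', pad_inches=0)
```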
4.3 Classification via ResNet50 This paper uses a ResNet50 convolutional neural network that was pre-trained on the publicly available ImageNet dataset [3]. A CNN is a type of artificial neural network comprising some combination of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer. ResNet50 is a residual-based CNN, invented to solve the problem of vanishing or exploding gradients by using skip connections. The ResNet50 architecture has 48 convolution layers, 1 MaxPool layer, and 1 AveragePool layer, with a total of 25.6 million trainable parameters (Fig. 4). 1. Convolution Layers: Convolution layers work as feature extractors through the use of learnable kernels. These kernels glide over the input and use matrix multiplication to create feature maps. The output can be tuned through hyper-parameters such as the depth, the stride, and the padding. 2. Pooling Layers: The output from the convolution layer is high-dimensional; the pooling layer reduces the spatial size of the convolution layer outputs and ensures that the most dominant features are extracted. Max pooling takes the maximum value from the area covered by the kernel, while average pooling returns the average of all the values. 3. Skip Connections: Skip connections add the input to the output of a convolution block, providing an alternate path through which the gradient can flow. They also ensure the higher layer will perform at least as well as the lower layer, if not better (see the sketch after this list).
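The snippet below is a toy residual block illustrating the skip-connection idea in item 3; it is a simplified identity-shortcut block written in PyTorch, not ResNet50's exact bottleneck block.

```python
# Toy residual block: the input x is added back to the convolution output.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)        # skip connection: gradient shortcut

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)        # torch.Size([1, 64, 56, 56])
```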
The pre-trained ResNet50 model was trained on more than a million images from the ImageNet database which has a total of 1000 object categories [11]. Applying CWT to the raw ECG waveforms generates RGB time–frequency representations. These are then fed into the pre-trained CNN to generate features. The output layer is replaced by an additional fully connected layer. After the initial training stage, fine-tuning is achieved by training the fully connected layer using the learned features.
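The authors implemented this with MATLAB toolboxes (Sect. 5); an equivalent two-stage transfer-learning setup in PyTorch might look like the following sketch. The frozen backbone, the two-class head, and the hyperparameters echo the description here and in Sect. 5, but the code itself is an assumption, not the original implementation.

```python
# Sketch: ImageNet-pretrained ResNet50 with a replaced two-class head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

for p in model.parameters():                 # stage 1: freeze the backbone
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)    # normal vs abnormal head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-6, weight_decay=1e-6)
criterion = nn.CrossEntropyLoss()

# Stage 2 (fine-tuning) would unfreeze all layers and rebuild the optimizer:
# for p in model.parameters(): p.requires_grad = True
```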
5 Results and Discussion The architecture was implemented using the MATLAB image processing, neural network, and deep learning toolboxes. The total dataset was randomly split into a training set (70%), a validation set (20%), and a test set (10%). The training process involved two steps. First, only the classification layers were trained and the parameters of all other layers were kept frozen. Training was done for 30 epochs, using minibatches of size 64 and a learning rate and regularization parameter of 1e−6. Then the weights were initialized and the rest of the layers were fine-tuned using minibatches of size 64, a learning rate and regularization parameter of 1e−6, and early stopping to obtain the best model. The model was evaluated by calculating accuracy (ACC), precision (PR), recall (RC), F1 score (F1), and AUC score (AUC). The ACC, PR, RC, and F1 score were calculated by Eqs. (2)–(5):

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$

$$PR = \frac{TP}{TP + FP} \quad (3)$$

$$RC = \frac{TP}{TP + FN} \quad (4)$$

$$F1 = \frac{2 \times PR \times RC}{PR + RC} \quad (5)$$
where TP represents true positives or the number of correct predictions of abnormal cases, FP represents false positives or the number of normal cases incorrectly predicted as abnormal, TN represents true negatives or the number of correct predictions of normal cases, and FN represents false negatives or the number of abnormal cases incorrectly predicted as normal. The maximum accuracy for binary classification reached 74.78% for Lead 3, followed by 74.119% for Lead 1 and Lead 10. The ratio of true positives to all predicted positives is known as the precision; it measures how many of the values identified as positive were correctly identified. A network with a high precision score returns a correctly predicted label for the majority of the data. The
maximum precision for the proposed network is 78.968% for Lead 3 and 77.009% for Lead 1. Recall is a measure of whether a network correctly identifies true positives. The maximum recall obtained by our network was 71.003%, for Lead 11. The F1 score is another important evaluation metric; it is calculated from the two aforementioned metrics, precision and recall. Higher values of precision and recall naturally lead to higher F1 scores, but it is often difficult to maximize both at once, so the combined F1 score is also an indicator of a network's performance: the closer the F1 score is to 1, the better the network performs. The F1 score achieved is high on average, with a peak value of 72.957% for Lead 10. The receiver operating characteristic (ROC) is a curve that plots the true positive rate against the false positive rate at multiple threshold values. AUC is the total area under the ROC curve, which measures the ability of a classifier to distinguish between classes; higher AUC scores mean the model is better at discerning a positive class from a negative class. The AUC scores are maximal at 0.81126 and 0.80519 for Lead 1 and Lead 3, respectively. Whether a prediction is classified as positive or negative is determined by the model operating point. For this network the operating point was the default value of 0.5, so any result greater than 0.5 was considered 1 and any result less than 0.5 was considered 0. Generally, the model operating point is chosen depending on the use case: the numbers of TP, TN, FP, and FN change with the operating point, so the point is chosen such that the metrics that depend on them, such as precision and recall, are maximized. The classification results for all performance metrics for each of the 12 leads are shown in Table 2. The ROC curves for each lead are shown in Fig. 5.

Table 2 Classification results

Lead No   Accuracy   Precision   Recall    F1 Score   AUC score
0         72.522     0.74113     0.69129   0.71535    0.79177
1         74.119     0.77009     0.68688   0.72611    0.81126
2         65.529     0.69433     0.55347   0.61595    0.70828
3         74.78      0.78968     0.67475   0.72771    0.81059
4         65.04      0.66824     0.62624   0.64656    0.71598
5         66.5       0.68176     0.63065   0.65521    0.74185
6         64.317     0.65812     0.59427   0.62457    0.71106
7         68.227     0.7279      0.58104   0.64623    0.73519
8         67.566     0.70281     0.6075    0.65169    0.73742
9         70.925     0.72533     0.67255   0.69794    0.78024
10        74.119     0.76294     0.69901   0.72957    0.80519
11        72.192     0.72686     0.71003   0.71835    0.78122
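For reference, the metrics in Eqs. (2)–(5) and the AUC can be computed as in the short sketch below; the label and score arrays are placeholders, not the paper's results.

```python
# Sketch: evaluation metrics of Eqs. (2)-(5) plus AUC with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])        # 1 = abnormal (placeholder)
y_score = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1])
y_pred = (y_score > 0.5).astype(int)               # default operating point 0.5

print("ACC:", accuracy_score(y_true, y_pred))      # Eq. (2)
print("PR :", precision_score(y_true, y_pred))     # Eq. (3)
print("RC :", recall_score(y_true, y_pred))        # Eq. (4)
print("F1 :", f1_score(y_true, y_pred))            # Eq. (5)
print("AUC:", roc_auc_score(y_true, y_score))      # threshold-independent
```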
Fig. 5 ROC Curves for a lead 0, b lead 1, c lead 2, d lead 3, e lead 4, f lead 5, g lead 6, h lead 7, i lead 8, j lead 9, k lead 10, and l lead 11
6 Conclusion In this paper, binary classification was performed on the PTB-XL dataset, a large publicly available dataset of ECG signals, to distinguish between normal and abnormal cases. The method consisted of transferring knowledge from a ResNet50 architecture pre-trained on the ImageNet dataset, with CWT applied to the ECG signals so they could be used as input to the network. The experiment showed admirable AUC scores; however, the accuracy has room for improvement. The research can be further improved or expanded in a few ways. Firstly, no preprocessing was performed on the ECG signals other than CWT; preprocessing prior to performing CWT would most likely lead to better accuracy. Secondly, the network architecture used here is a residual-based network, but other networks can be explored.
References 1. Cardiovascular diseases (CVDs) (2022). https://www.who.int/health-topics/cardiovascular-diseases. Online; Last Accessed 31 Aug 2022 2. Continuous wavelet transform and scale-based analysis—MATLAB Simulink. https://www.mathworks.com/help/wavelet/gs/continuous-wavelet-transform-and-scale-based-analysis.html. Online; Last Accessed 31 Aug 2022 3. Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 248–255 4. Doi K (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph 31(4–5):198–211 5. Feyisa DW, Debelee TG, Ayano YM, Kebede SR, Assore TF (2022) Lightweight multireceptive field CNN for 12-lead ECG signal classification. Comput Intell Neurosci 6. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220 7. Jadhav P, Rajguru G, Datta D, Mukhopadhyay S (2022) Automatic sleep stage classification using time–frequency images of CWT and transfer learning using convolution neural network. J Biocybern Biomed Eng 40(1):494–504 8. Malik J, Devecioglu OC, Kiranyaz S, Ince T, Gabbouj M (2021) Real-time patient-specific ECG classification by 1d self-operational neural networks. IEEE Trans Biomed Eng 69(5):1788–1801 9. Quinn GR, Ranum D, Song E, Linets M, Keohane C, Riah H, Greenberg P (2017) Missed diagnosis of cardiovascular disease in outpatient general medicine: insights from malpractice claims data. Jt Comm J Qual Patient Saf 43(10):508–516 10. Rahhal MM, Bazi Y, Zuair M, Othman E, BenJdira B (2018) Convolutional neural networks for electrocardiogram classification. J Med Biol Eng 38(6):1014–1025 11. ResNet-50 convolutional neural network—MATLAB resnet50. https://www.mathworks.com/help/deeplearning/ref/resnet50.html. Online; Last Accessed 31 Aug 2022 12. Shaker T, Tolba S (2020) Generalization of convolutional neural networks for ECG classification using generative adversarial networks. IEEE Access 8:35592–35605 13. Śmigiel S, Pałczyński K, Ledziński D (2021) ECG signal classification using deep learning techniques based on the PTB-XL dataset. Entropy 23(9):1121
14. Wagner P, Strodthoff N, Bousseljot RD, Samek W, Schaeffter T (2020) PTB-XL, a large publicly available electrocardiography dataset. PhysioNet 15. Wagner P, Strodthoff N, Bousseljot RD, Kreiseler D, Lunze FI, Samek W, Schaeffter T (2020) PTB-XL, a large publicly available electrocardiography dataset. Sci Data 7(1) 16. Wang T, Lu C, Sun Y, Yang M, Liu C, Ou C (2021) Automatic ECG classification using continuous wavelet transform and convolutional neural network. Entropy (Basel) 23(1):119 17. Weimann K, Conrad TOF (2021) Transfer learning for ECG classification. Sci Rep 11(1):5251 18. Wu M, Lu Y, Yang W, Wong SY (2020) A study on Arrhythmia via ECG signal classification using the convolutional neural network. Front Comput Neurosci 14:564015 19. Xu X, Liu H (2020) ECG heartbeat classification using convolutional neural networks. IEEE Access 8:8614–8619 20. Yan Z, Zhou J, Wong WF (2021) Energy efficient ECG classification with spiking neural network. Biomed Signal Process Control 63:102170 21. Yanase J, Triantaphyllou E (2019) A systematic survey of computer-aided diagnosis in medicine: past and present developments. Exp Syst Appl 138:112821
Partitional Technique for Searching Initial Cluster Centers in K-means Algorithm Md. Hamidur Rahman and Momotaz Begum
Abstract Although many clustering algorithms have been introduced, the K-means clustering algorithm is the most widely used because of its simplicity. Several modifications have also been introduced to overcome the drawbacks of the K-means clustering algorithm, but it still suffers primarily from the initialization of cluster centroids, which strongly influences clustering efficiency. Hence this paper puts forward a way of initializing cluster centroids smartly so that the drawbacks of the previous algorithm may be eliminated. The proposed algorithm divides the data space into K (the number of clusters) equal partitions and sets the center point of each partition as an initial cluster center. Compared to the original K-means grouping algorithm and some other modified versions made by researchers, the proposed model provides better results in terms of time complexity, space complexity, inter-cluster distance, etc. Keywords K-means clustering · Cluster centroid · Time complexity · Space complexity · Intercluster distance
1 Introduction Huge amounts of raw data are generated every second owing to the advancement of modern science and technology in our daily, professional, and research lives. Traditional data processing systems are incapable of handling such extensive data. In this case, data mining could be a viable option for resolving the issue. Clustering is a crucial tool in data mining and expert systems. It is an unsupervised knowledge engineering technique that creates a cluster of multiple similar Md. Hamidur Rahman (B) · M. Begum Dhaka University of Engineering and Technology, DUET, Gazipur, Bangladesh e-mail: [email protected] M. Begum e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_22
data points based on similarities and differences between the data items in question. Clustering algorithms may be applied in different application areas like artificial intelligence, data compression, customer service, medicine, pattern recognition, data mining, marketing, information retrieval, biology, image processing, machine learning, psychology, expert systems, statistics, and so on. Hence the efficiency of the clustering algorithm affects the efficiency of these application areas, so clustering algorithms should be highly capable of optimizing overall performance, and research on them needs to continue. In this paper, we try to enhance the efficiency of the K-means clustering algorithm by initializing cluster centroids after partitioning the data space into the required number of clusters. For performance analysis, we have used the iris and wine datasets, where we compare the proposed approach with the original K-means and GBKM models in terms of run-time and inter-cluster distance. The rest of the paper is arranged as follows. Section 2 studies the background and the conventional K-means clustering method. Section 3 describes the proposed system and illustrates the implementation of the presented algorithm. Section 4 shows the performance analysis. Section 5 summarizes the work and outlines directions for further research.
2 Background and Present State of the Problem Clustering is a strategy that classifies data thoughtfully and searches for hidden patterns that may exist in datasets [1]. Various types of clustering techniques suit specific application fields. The clustering methods can be divided into [2] (a) partition-based clustering algorithms like K-means and K-modes [3], (b) hierarchical clustering algorithms, for example BIRCH [4] and CURE [5], (c) density-based clustering algorithms like DBSCAN [6], OPTICS [7], and CFSFDP [8], (d) grid-based algorithms like WaveCluster [9], and finally (e) model-based algorithms like I-KSOM [10], where a model is hypothesized for every cluster to find the best fit of the data to the given model. K-means is a non-deterministic, unsupervised, numerical, iterative method [1]. It is a notably straightforward and widely used partition-based clustering algorithm; as a result, the K-means algorithm is heavily used worldwide. Notably, the authors of [11–13], among many others, make use of the K-means method and implement the algorithm to reach categorical decisions in specific regions of interest. Such papers show the utilization of the K-means algorithm but do not focus on the core improvement of the algorithm itself. However, the conventional K-means algorithm possesses many pitfalls, which await further optimization. The authors of [2] listed some disadvantages of the K-means clustering method: it is a time-consuming algorithm that has difficulty converging to the accurate result and is sensitive to noise.
Since the K-means algorithm relies on random selection of the initial cluster centers, it may produce different cluster results and accuracy every time. Besides, the K-means algorithm works well only on spherical data and is heavily affected by noise points. Comprehensive research should be carried out to overcome the mentioned shortcomings as much as possible; therefore, researchers continue to study it. Owing to the problem of initializing cluster centroids, numerous academics have modified the K-means method in recent decades. Recently, Mahnaz Mardi and Mohammad Reza Keyvanpour [14] introduced a clustering algorithm based on a genetic algorithm, namely GBKM. GBKM can provide the best cluster centroids for initializing the standard K-means algorithm, but it consumes a huge amount of running time in its fitness function. A Split-Merge step [15] was used to re-start the K-means algorithm after reaching a fixed point with an associated error, but it needs the cluster centroids as parameters, which may greatly affect the performance. The Convex Hull strategy was used for computing the first two centroids, and the nearest neighbor technique was used to ensure the election of one centroid per cluster [16]; this suffers from high complexity because it finds the farthest samples by calculating all the distances between samples before setting them as cluster centers. Another method divides the supplied data into K * K segments, where K is the appropriate number of clusters [17]; however, it necessitates more partitioning than K, as well as a significant level of time complexity. To manipulate unsupervised high-featured data, Begum and Akthar [18] presented an algorithm that utilizes KSOM to find the cluster number and then KSOMKM to find a more accurate number, which lags in efficiency in practice. For dimension reduction and initial cluster centroids, Principal Component Analysis (PCA) [19] has been utilized. According to [20], one can find the distance between all the data points and then choose the point of maximum density as a cluster centroid. In addition, by combining the Clustering by Fast Search and Finding of Density Peaks (CFSFDP) and CLustering In QUEst (CLIQUE) algorithms, [2] suggested an improved K-means algorithm; however, it has a higher level of time complexity, as CFSFDP requires a cut-off distance dc, which may be difficult to determine for different datasets. The within-cluster squared error criterion [21] is used to evaluate the quality of K-means clustering. A new clustering-based routing protocol for Vehicular Ad Hoc Networks (VANETs) was proposed in [22], which employed a modified K-means algorithm with a Continuous Hopfield Network and the Maximum Stable Set Problem. The writers in [23] presented a new initial centroid selection technique: distances between representatives are calculated, the sums of distances for all points are determined, and the points are then sorted by their total distances from maximum to minimum. The point with the maximum total distance is defined as the first centroid. Then the next N/K points are rejected, and the first remaining sample is selected as the second initial centroid. This procedure is repeated K times. Since [24] finds the farthest samples by calculating all the distances between samples and setting them as cluster centers, it consumes a huge amount of time.
Furthermore, Zhenfeng and Chunyan introduced EKM [25] as a hybrid model of K-means and a Genetic Algorithm (GA), which may produce better output but sacrifices time complexity because of the fitness calculation and mutation process of the GA. In addition, a new method was suggested by Zhao and Zhou for calculating similarity based on a weighted Euclidean distance [26], but it is a time-consuming concept as it requires more computation than K-means. Moreover, the theta and beta parameters of [26] are expected to be supplied by the user, but the type and range of these parameters are not mentioned. On the other hand, the authors of [27] evaluate the K-means algorithm in terms of execution time with various distance measures. In this paper, we address the issue of cluster centroid initialization for the K-means clustering algorithm and, at the same time, present a new point of view on it.
3 Methodology of Proposed Approach 3.1 Partitioning the Dataspace With the specific aim of addressing the above-mentioned issues of adaptability and functionality, we design and implement a flexible algorithm to initialize cluster centroids instead of selecting them randomly. The proposed system focuses on cluster centroid initialization; the affected and modified areas of the K-means algorithm are shown in Fig. 1. The proposed system takes the input data points and the number of clusters (K), and the modification is used to initialize the cluster centroids. After calculating the cluster centroids, the distance between the cluster centers and each data point is calculated using the Euclidean distance equation [24], and each data point is assigned to the nearest cluster center. The cluster centers are then updated by calculating the mean value of the points belonging to each cluster, and the distances are calculated again as before. This process is repeated until the stopping criteria are met. To explain the proposed modification, let us consider a database with two features that will be used for clustering. At first, preprocessing of the data, including cleaning and scaling operations, is performed to extract effective data. In this context, the MinMax scaler [28] is used for scaling the data. By calculating the gap between the minimum and maximum values of both features, we can examine the scattering status of the data. Figure 2 shows the scattering of sample scaled data. In this example, the minimum value of X is 0.05 and the maximum value of X is 0.79, hence the gap between these values is (0.79 − 0.05) = 0.74. Similarly, the gap between the minimum and maximum values of Y is (0.3 − 0.05) = 0.25. Comparing these two gaps (0.74 > 0.25), we may conclude that the data are more scattered along the X-axis. So we will divide the data into k (the number of clusters) partitions along the X-axis. For doing this we need to divide the gap along the X-axis (i.e., 0.74) into k pieces. Then we get
Fig. 1 Flowchart of the proposed modification area for the K-means algorithm (Start → clean and scale data using MinMax scaler → set the cluster number K → partition the data space into K equal parts → set the middle point of each partition as a cluster center → calculate distances and assign data to the closest cluster → recalculate cluster centers while the clusters change → Stop)
0.05 to 0.29, 0.29 to 0.54, and 0.54 to 0.79 as three pieces (since we assume k = 3). Figure 3 shows these partitions. We have done our partitions. We need two values, X and Y, as coordinates for the cluster centroids. As a result, we take the middle point of each partition as the X coordinate and the middle Y value as the Y coordinate (Fig. 3). So the cluster centroids will be ((0.05 + 0.29)/2, (0.05 + 0.3)/2), ((0.29 + 0.54)/2, (0.05 + 0.3)/2), and ((0.54 + 0.79)/2, (0.05 + 0.3)/2).
Fig. 2 Scattering state of scaled data
Fig. 3 Making partitions of scattered data
3.2 Steps of Proposed Approach The proposed system focuses on cluster centroid initialization. For cleaning and scaling the data, we use the MinMax scaler [28]. For calculating the data space, the following equations are used:

$$Len(X) = j - i \quad (1)$$

where j = Max(X) [23], i = Min(X) [23], and X and Y are the two dimensions of the data space;

$$Len(Y) = m - n \quad (2)$$

where m = Max(Y) and n = Min(Y);

$$Mid(X) = (i + j)/2 \quad (3)$$

$$Mid(Y) = (m + n)/2 \quad (4)$$

$$Diff(X) = Len(X)/k \quad (5)$$

$$Diff(Y) = Len(Y)/k \quad (6)$$

where k is the number of clusters. Now, for initializing the cluster centroids $C_i$, $i = 0, \ldots, k-1$, the following equations are taken into consideration. If abs(Len(Y) − Len(X)) > λ, where λ is a tuning parameter (here λ = 0.05):

if Len(X) > Len(Y),

$$C_i = \left( \frac{(Min(X) + Diff(X) \cdot i) + (Min(X) + Diff(X) \cdot (i + 1))}{2},\; Mid(Y) \right) \quad (7)$$

if Len(Y) > Len(X),

$$C_i = \left( Mid(X),\; \frac{(Min(Y) + Diff(Y) \cdot i) + (Min(Y) + Diff(Y) \cdot (i + 1))}{2} \right) \quad (8)$$

otherwise,

$$C_i = \left( \frac{(Min(X) + Diff(X) \cdot i) + (Min(X) + Diff(X) \cdot (i + 1))}{2},\; \frac{(Min(Y) + Diff(Y) \cdot i) + (Min(Y) + Diff(Y) \cdot (i + 1))}{2} \right) \quad (9)$$

i.e., each coordinate of a centroid is the midpoint of the i-th partition along the corresponding axis.
3.3 Pseudo Code The following steps are required for the proposed approach:

Algorithm: Finding the initial cluster centers
Input: Number of clusters k, a data set D
Output: A set of initial k cluster centers
Step 1: User supplies the values of k and D
Step 2: Examine the data scattering in the data space using Eqs. (1) and (2)
Step 3: Calculate the middle point of each partition using Eqs. (3) and (4)
Step 4: Calculate the initial k cluster centroids using Eqs. (7), (8) and (9)
Step 5: End
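A hypothetical Python rendering of this pseudo code for two-dimensional, MinMax-scaled data is shown below; the function name and the example data are ours, while λ = 0.05 follows Sect. 3.2.

```python
# Sketch: partition-based initial cluster centers per Eqs. (1)-(9).
import numpy as np

def initial_centers(data, k, lam=0.05):
    x, y = data[:, 0], data[:, 1]
    len_x, len_y = x.max() - x.min(), y.max() - y.min()      # Eqs. (1)-(2)
    mid_x, mid_y = (x.min() + x.max()) / 2, (y.min() + y.max()) / 2  # (3)-(4)
    diff_x, diff_y = len_x / k, len_y / k                    # Eqs. (5)-(6)
    centers = []
    for i in range(k):
        cx = x.min() + diff_x * (2 * i + 1) / 2              # midpoint of slice i
        cy = y.min() + diff_y * (2 * i + 1) / 2
        if abs(len_y - len_x) > lam:
            if len_x > len_y:
                centers.append((cx, mid_y))                  # Eq. (7)
            else:
                centers.append((mid_x, cy))                  # Eq. (8)
        else:
            centers.append((cx, cy))                         # Eq. (9)
    return np.array(centers)

rng = np.random.default_rng(0)
print(initial_centers(rng.random((100, 2)), 3))              # k = 3 example
```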
4 Implementation and Result Comparison 4.1 Performance Comparison This study utilizes data from the University of California, Irvine (UCI) Machine Learning Repository [29]. For comparing the results among the clustering algorithms, the running time required by each algorithm and the inter-cluster distance are considered. This research employs Python 3.0 and Jupyter Notebook as implementation tools on a 64-bit Windows PC with an Intel(R) Core(TM) i3-8100 CPU @ 3.60 GHz and 8 GB RAM. From Table 1, it is clear that the proposed approach provides better outcomes than the original K-means clustering algorithm and the genetic-based K-means (GBKM) in terms of both run-time and inter-cluster distance. To find the inter-cluster distance, centroid-based Euclidean distances [24] are calculated, and the maximum distance between them is reported in Table 1. The reason is that GBKM consumes a huge amount of time in its fitness function calculation stage, and the original K-means clustering algorithm selects cluster centroids randomly.

$$d(C_i, C_j) = \sqrt{\sum_{p=0}^{k-1} \left(c_{ip} - c_{jp}\right)^2}, \quad i \neq j \quad (10)$$

Equation (10) is used to calculate the distances between cluster centers; the maximum of these distances is then chosen as the centroid-based inter-cluster distance.
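A short sketch of Eq. (10) is given below; the toy centers are illustrative, not the experimental centroids.

```python
# Sketch: centroid-based inter-cluster distance (maximum pairwise
# Euclidean distance between cluster centers), per Eq. (10).
import numpy as np
from itertools import combinations

def inter_cluster_distance(centers):
    return max(np.linalg.norm(a - b) for a, b in combinations(centers, 2))

centers = np.array([[0.17, 0.175], [0.415, 0.175], [0.665, 0.175]])
print(inter_cluster_distance(centers))   # 0.495 for these toy centers
```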
Table 1 Performance comparison

Model name                     Data set   Runtime (Seconds)   Inter-cluster distance
Proposed approach of K-means   Iris       0.0358              1.0360
                               Wine       0.0574              0.5747
Original K-means               Iris       1.8339              1.0455
                               Wine       1.4631              0.8850
GBKM [14]                      Iris       17.682              4.5553
                               Wine       19.443              6.9019
Fig. 4 Proposed Approach on iris testing dataset
4.2 Result Analysis The iris and wine datasets are split into training and testing datasets: 80% of the data are used for training the model and the remaining 20% are used to test the performance of the trained model. In the case of the iris dataset, the performance of all models is somewhat similar except for run-time and inter-cluster distance. Figures 4, 5, and 6 show the clustering results on the iris test data of the proposed approach, the original K-means model, and the GBKM [14] model, respectively. Since petal length and petal width are highly correlated, these two features are used in all of these cases. On the other hand, for the wine dataset, more than two features are highly correlated with each other. Hence, PCA [19] is applied to the same correlated features for all three models. As a result, the cluster centroids are calculated based on six correlated features, but the data are displayed based on only two features in a two-dimensional figure. Figures 7, 8, and 9 show the clustering results on the wine data of the proposed approach, the original K-means model, and the GBKM [14] model, respectively.
Fig. 5 Original K-means on iris dataset
Fig. 6 GBKM [14] on iris testing dataset
Fig. 7 Proposed approach on wine training dataset
Fig. 8 Original K-means on wine training dataset
Fig. 9 GBKM [14] on wine training dataset
5 Conclusion Clustering is an essential part of data analysis and machine learning applications. The accuracy and performance of data analysis and other intelligent systems heavily depend on clustering performance. That is why researchers are trying to increase the performance of clustering while keeping the balance between the accuracy and efficiency of the clustering algorithm. This study tries to increase the performance of the K-means clustering algorithm by proposing a new approach for selecting the initial cluster centroids. Comparisons among the original clustering algorithm, the GBKM model, and the proposed model have been shown earlier in this paper. The proposed approach exhibits better outcomes because it initializes the cluster centroids using the partitioning approach. In this work, the K-means model is improved by removing the impact of noise and optimizing the selection of initial points at the same time. The proposed model still needs to be optimized; for instance, this algorithm works well only
on non-spherical data, so it should be improved to work well on both spherical and non-spherical datasets. This paper studies the iris and wine datasets only, so in the future we plan to apply the proposed approach to other datasets and also to experiment with whether the proposed model works well on large datasets.
References 1. Na S, Xumin L, Yong G (2010) Research on k-means clustering algorithm: an improved k-means clustering algorithm. Third Int Symp Intell Inf Technol Secur Inform 2010:63–67. https://doi.org/10.1109/IITSI.2010.74 2. Xu H, Yao S, Li Q, Ye Z (2020) An improved K-means clustering algorithm. In: 2020 IEEE 5th international symposium on smart and wireless systems within the conferences on intelligent data acquisition and advanced computing systems (IDAACS-SWS), pp 1–5. https://doi.org/10.1109/IDAACS-SWS50031.2020.9297060 3. Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Disc 2:283–304 4. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25(2):103–114 5. Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. SIGMOD'98, Seattle, Washington, pp 73–84 6. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases. In: Proceedings of the 1996 international conference on knowledge discovery and data mining, pp 226–231 7. Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Record 28(2):49–60 8. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496 9. Sheikholeslami G, Chatterjee S, Zhang A (1998) WaveCluster: a multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 24th international conference on very large data bases (VLDB '98), pp 428–439 10. Begum M, Das BC, Hossain MZ, Saha A, Papry KA (2021) An improved Kohonen self-organizing map clustering algorithm for high-dimensional data sets. Indones J Electr Eng Comput Sci 24(1):600–610. ISSN: 2502-4752. https://doi.org/10.11591/ijeecs.v24.i1.pp600-610 11. Sun H, Chen Y, Lai J, Wang Y, Liu X, Identifying tourists and locals by K-means clustering method from mobile phone signaling data. J Transport Eng Part A: Syst 147(10):04021070 12. Hutagalung J, Ginantra NLSR, Bhawika GW, Parwita WGS, Wanto A, Panjaitan PD (2020) COVID-19 cases and deaths in Southeast Asia clustering using k-means algorithm. In: Annual Conference on Science and Technology Research (ACOSTER) 2020, 20–21 June 2020, Medan, Indonesia 13. Khorshidi N, Parsa M, Lentz DR, Sobhanverdi J (2021) Identification of heavy metal pollution sources and its associated risk assessment in an industrial town using the K-means clustering technique. Appl Geochem 135:105113. ISSN 0883-2927. https://doi.org/10.1016/j.apgeochem.2021.105113 14. Mardi M, Keyvanpour MR (2021) GBKM: a new genetic based k-means clustering algorithm. In: 2021 7th international conference on web research (ICWR), pp 222–226. https://doi.org/10.1109/ICWR51868.2021.9443113 15. Capó M, Pérez A, Lozano JA (2022) An efficient split-merge re-start for the K-means algorithm. IEEE Trans Knowl Data Eng 34(4):1618–1627. https://doi.org/10.1109/TKDE.2020.3002926
16. Rahman Z, Hossain MS, Hasan M, Imteaj A (2021) An enhanced method of initial cluster center selection for K-means algorithm. Innov Intell Syst Appl Conf (ASYU) 2021:1–6. https://doi.org/10.1109/ASYU52992.2021.9599017 17. Sen A, Pandey M, Chakravarty K (2020) Random centroid selection for K-means clustering: a proposed algorithm for improving clustering results. In: 2020 international conference on computer science, engineering and applications (ICCSEA), pp 1–4. https://doi.org/10.1109/ICCSEA49143.2020.9132921 18. Begum M, Akthar MN (2013) KSOMKM: an efficient approach for high dimensional dataset clustering. Int J Electr Energy 1(2):102–107. https://doi.org/10.12720/ijoee.1.2.102-107 19. Singh RV, Bhatia MPS (2011) Data clustering with modified K-means algorithm. Int Conf Recent Trends Inf Technol (ICRTIT) 2011:717–721. https://doi.org/10.1109/ICRTIT.2011.5972376 20. Tajunisha S, Saravanan V (2010) Performance analysis of k-means with different initialization methods for high dimensional data. Int J Artif Intell Appl (IJAIA) 1(4):44–52 21. Yuan C, Yang H (2019) Research on K-value selection method of K-means clustering algorithm. J 2(2):226–235. https://doi.org/10.3390/j2020016 22. Kandali K, Bennis L, Bennis H (2021) A new hybrid routing protocol using a modified K-means clustering algorithm and continuous hopfield network for VANET. IEEE Access 9:47169–47183. https://doi.org/10.1109/ACCESS.2021.3068074 23. Motwani M, Arora N, Gupta A (2019) A study on initial centroids selection for partitional clustering algorithms. In: Software engineering. Springer, pp 211–220 24. http://ijcsit.com/docs/Volume%205/vol5issue06/ijcsit2014050688.pdf 25. He Z, Yu C (2019) Clustering stability-based evolutionary K-means. Soft Comput 23:305–321. https://doi.org/10.1007/s00500-018-3280-0 26. Zhao Y, Zhou X (2021) K-means clustering algorithm and its improvement research. In: Journal of Physics: Conference Series, Volume 1873, 2021 2nd International Workshop on Electronic Communication and Artificial Intelligence (IWECAI 2021), 12–14 March 2021, Nanjing, China 27. Ghazal TM, Hussain MZ, Said RA, Nadeem A, Hasan MK, Ahmad M, Khan MA, Naseem MT, Intelligent automation and soft computing. 30(2):735–742. https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/81931, https://doi.org/10.32604/iasc.2021.019067 28. Patra GK, Sahu KK, Normalization: a preprocessing stage. https://doi.org/10.48550/arXiv.1503.06462 29. Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html 30. https://en.wikipedia.org/wiki/Maxima_and_minima
A Novel Ensemble Methodology to Validate Fuzzy Clusters of Big Data Tanvir Habib Sardar, Rashel Sarkar, Sheik Jamil Ahmed, and Anjan Bandyopadhyay
Abstract The clustering of datasets is a widely used technique in unsupervised machine learning. Cluster quality evaluation is a tricky problem because external validation is usually not possible for clustering, owing to the unavailability of external proof. Although many methods have been developed and tested to validate the results obtained from clustering, it is always preferable to use several cluster validity measures. Big data usually contain a high percentage of noise, making it necessary to incorporate additional techniques while clustering big data. In this work, we have clustered document big data using fuzzy logic-based enhancements of traditional K-Means and K-Medoids. This work suggests ensembling seven different cluster quality validity measures to determine the best quality of fuzzy clusters. The Reuters standard document dataset is clustered using different cluster numbers, and the proposed ensemble methodology is shown to determine the optimal number of clusters. Keywords Big data · Clustering · Cluster validation · Machine learning
T. H. Sardar Assistant Professor, Department of CSE, GITAM School of Technology, GITAM University, Bangalore, India e-mail: [email protected] R. Sarkar Associate Professor, Department of Computer Science, University of Science and Technology, Ri-Bhoi, Meghalaya, India S. J. Ahmed Assistant Professor, School of Computer Science and Engineering & Information Science, Presidency University, Bangalore, India e-mail: [email protected] A. Bandyopadhyay (B) Assistant Professor, Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_23
1 Introduction Clustering is a widely used unsupervised machine learning technique [1]. Clustering is used to group data objects by deriving the implicit similarity among them; the similarity is obtained without any explicit source of information. The most widely used clustering algorithms are centroid based [2]. Centroid-based clustering algorithms partition the dataset into K groups, where each group is associated with its centroid value. The centroid of a cluster can be considered similar to the center of gravity of a uniformly distributed body of mass. The best clustering results provide the minimum distance between the centroids and the objects of their cluster and a larger distance to the objects of other clusters [3]. K-Means is the most popular centroid-based clustering algorithm, where the centroid is calculated as the mean value of the objects for each of the K clusters. In K-Means, each object can be associated with a single cluster only [4]. There is a widely used fuzzy set-based modification of the K-Means algorithm, known as Fuzzy C-Means, in which each object belongs to each cluster to a certain level of belongingness ranging from 0 to 1. The belongingness of each data point to each of the centroids is represented by the membership matrix, and the centroid of each cluster is obtained using a fuzzy set-based technique. This strategy enhances the cluster quality of fuzzy K-Means beyond the plain K-Means algorithm, especially on noisy datasets. These algorithms group the dataset by deriving the similarity among the objects and keeping the most similar objects together [5]. The similarity is calculated based on measures such as Euclidean, Manhattan, squared, or cosine distance, which try to quantify the similarity among the data points. Typical objective functions in clustering formalize the goal of attaining high intra-cluster and low inter-cluster similarity [6]. Cluster quality evaluation is a key requirement in many circumstances, such as [7]: • To determine the goodness of a clustering job in the presence of noise. • To determine the comparative advantage of one clustering algorithm over another. • To determine the quality of two different sets of clustering output. • To determine the quality of two different clusters of a clustering outcome. • To discover non-random patterns in the dataset. • To discover the optimal number of clusters. There are numerous works in the literature that try to evaluate the correctness of clustering results, and many measures have been proposed over time to evaluate cluster quality. However, the quality of clustering results remains difficult to evaluate [8]. The main reasons that make it difficult to measure the quality of clustering results are [9]: (i) The performance of cluster validity measures varies based on the features of the clustered dataset, such as the dimensions of the data points, the distribution of the data points, the size of the dataset, and the presence of noise. Any particular cluster validity measure performs poorly in judging
the cluster quality under all combinations of the above dataset features. (ii) Absence of ground truth: Clustering is mostly conducted on new datasets, for which a test-result-based reference for cluster validity is absent. A researcher has only the features of the dataset and the results of one or more cluster quality measures to judge the quality of the clusters, and must otherwise rely on intuition. This popularizes the statement that "clusters lie in the eye of the beholder". (iii) In supervised machine learning, quality can be evaluated easily by deriving accuracy, precision, and recall. This is not possible for clustering, which is performed on datasets where data labels are absent. In this work, we have proposed a novel ensemble technique for evaluating fuzzy cluster quality. The ensemble technique uses seven measures: four designed specifically for fuzzy clustering and three designed for hard cluster evaluation. The Reuters benchmark dataset is used to evaluate the clustering work. The results show that the proposed ensemble technique generates a complete and qualitative view of cluster quality. The paper is organized as follows. Section 2 presents the proposed ensemble validation technique with its seven measures, the preprocessing stages performed on the document dataset, and brief descriptions of the algorithms used in the clustering process. Section 3 presents the results and their analysis. Section 4 concludes the work.
2 Methodology

A. The Dataset Preprocessing

The document dataset must be preprocessed before the clustering process starts. The clustering algorithms work on numerical datasets, making preprocessing mandatory for document datasets. Preprocessing outputs a meaningful representation of the document dataset whose foundations are the features (terms/words). These term features are represented by the numerical weight of the terms in the dataset. The following stages are conducted to preprocess the dataset. (i)
Tokenization: In this phase, the raw document strings of the dataset are split into the basic units of processing: words or terms. First, a bag-of-words model is created by converting strings into a dictionary of unique words. Then a cleansing process trims the words (removing empty spaces) and removes erroneously inserted control characters. These two steps provide a dictionary of features consisting of unique, cleansed terms. (ii) Stop Word Elimination: Every document dataset contains terms that occur very repetitively, because text in natural language is full of conjunctions (or, but, etc.) and pronouns (she, it, he, etc.). These stop words are removed as they have little or no
significance for document comprehensibility or the clustering process. Similarly, special characters and numbers are removed. The candidate stop words are listed by frequency of occurrence, and the terms with higher frequency are selected for removal. Terms with very low frequency can also be removed in this step; here we have taken the threshold as 6, so all terms with frequency less than 6 are removed. (iii) Stemming: Some terms share a root but carry added suffixes and prefixes. For example, consult is the root word whose derivatives include consultant, consulting, consultative, and consultants. In natural language processing, the significance lies in the root word, and stemming also reduces the number of terms in the dataset. Thus, the root term needs to be identified and used in the clustering process. We have used the Porter stemming algorithm for this purpose. (iv) Vector Space Transformation: After the above preprocessing steps, a vector space model-based transformation is performed. The vector space model transforms the data into vectors: each document becomes a vector whose dimensions correspond to the dictionary terms. This n-dimensional vector space allows us to perform the required operations in vector-vector and vector-scalar mode. The vector space model uses the product of term frequency (tf) and inverse document frequency (idf) to weight the dictionary terms by their significance. The tf is the count of a term's occurrences in a document. The idf is the logarithmically scaled inverse fraction of the documents containing the term, obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of that quotient.

B. The Fuzzy Algorithms

In this work, we have designed two fuzzy algorithms: fuzzy K-Means and fuzzy K-Medoids. The centroids output by these two algorithms are used in our proposed ensemble cluster validity method. Brief details of the two algorithms are provided below: (i) Fuzzy K-Means: In this algorithm, N data objects are assigned to all K clusters with some degree of proximity. It is an extension of the K-Means algorithm which uses fuzzy logic: a membership grade between 0 and 1 is assigned to each data object. The objective function is calculated based on the distance between the data objects and the cluster centers, weighted by their fuzzy memberships. The traditional fuzzy K-Means algorithm can be summarized as follows:

• Specify the number of clusters (i.e., centroids) k
• Repeat until the maximum number of iterations is reached, or until the algorithm has converged:
– For each point, compute its membership coefficient μi,j for each cluster using Eq. (1).
$$\mu_{i,j} = \frac{1}{\sum_{q=1}^{k} \left( \dfrac{\|d_i - C_j\|}{\|d_i - C_q\|} \right)^{2/(m-1)}} \qquad (1)$$
μi,j is the membership value between object di and centroid Cj.

• Compute the centroid of each cluster Cj using Eq. (2):

$$C_j = \frac{\sum_i \mu_{i,j}^{m}\, d_i}{\sum_i \mu_{i,j}^{m}} \qquad (2)$$
The μi,j values form a membership matrix that contains the degree of belongingness between each centroid and each vector. d is the set of objects, C is the set of centroids, and m is the fuzziness coefficient. The optimal value of m lies in [1.5, 2.5] for all practical applications and datasets; in our proposed fuzzy algorithms, m is taken as 2, the mid-value of this standard range.

(ii) Fuzzy K-Medoids: The fuzzy K-Medoids algorithm replaces each centroid calculated by fuzzy K-Means with the nearest object of the data space. That is, like fuzzy K-Means, it follows Eqs. (1) and (2), but an additional step replaces each recalculated centroid with its nearest object as the medoid. The list of medoids must be unique. The pseudocode below explains this process:

FOR each Cj ∈ C DO
  FOR each nearest di ∈ D DO
    IF dist = δ²(di, Cj) is MIN THEN
      Ĉj = di
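For concreteness, the update loop of Eqs. (1) and (2), together with the medoid-replacement step, can be sketched in Python roughly as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name, the fixed iteration cap, and the random initialization are all assumptions.

```python
import numpy as np

def fuzzy_kmeans(D, k, m=2.0, iters=100, medoids=False, seed=0):
    """Fuzzy K-Means per Eqs. (1)-(2); optionally snap each centroid
    to its nearest data object, as in fuzzy K-Medoids."""
    rng = np.random.default_rng(seed)
    C = D[rng.choice(len(D), size=k, replace=False)]   # initial centroids
    for _ in range(iters):
        # Eq. (1): membership of object i in cluster j
        dist = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U = 1.0 / np.sum((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1)),
                         axis=2)
        # Eq. (2): membership-weighted mean as the new centroid
        W = U ** m
        C = (W.T @ D) / W.sum(axis=0)[:, None]
        if medoids:
            # replace each centroid with its nearest object; this sketch
            # does not enforce the uniqueness of the medoid list
            C = D[np.argmin(np.linalg.norm(D[:, None, :] - C[None, :, :],
                                           axis=2), axis=0)]
    return U, C

# e.g., U, C = fuzzy_kmeans(tfidf_vectors, k=90, medoids=True)
```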
C. The Proposed Ensemble Validation Methodology

In this work, we propose an ensemble method for cluster validity that combines seven different cluster validity measures. Using seven measures ensures a better-justified assessment of cluster quality: it is observed that any single validity measure judges cluster quality well only for certain clustering techniques and certain dataset types, so a consensus-based ensemble certainly enhances the cluster quality analysis. The following cluster quality analysis measures are used:

(i) Silhouette measure: The silhouette of each data object x is calculated using Eq. (3):

$$s_x = \frac{n_x - m_x}{\max(m_x, n_x)} \qquad (3)$$
where mx is the mean distance between data object x and all other data objects in the same cluster, and nx is the smallest mean distance between x and the data objects of any other cluster. The mean silhouette value is the mean of sx over all data objects x in the dataset and lies between −1 and +1. A negative value means the clustering is performed incorrectly, and a value of 0 means that the inter-cluster distances are negligible and indifferent. A higher value, close to 1, means that the clusters are well separated from each other and the clustering job is well performed.

(ii) Dunn measure: The Dunn measure assesses the quality of a clustering job through the compactness and separation of the clusters: in a good clustering, the clusters are more separated from one another than the intra-cluster variance. The Dunn value is calculated using Eq. (4):

$$D = \min_{1 \le i \le k} \; \min_{i+1 \le j \le k} \left( \frac{\delta(i, j)}{\max_{1 \le l \le k} \Delta(l)} \right) \qquad (4)$$
where δ(i, j) is the minimum distance between any pair of data objects lying in clusters i and j, and Δ(l) is the diameter of cluster l, calculated as the maximum distance between any pair of data objects inside cluster l. The larger the Dunn value, the better the clustering job is considered.

(iii) Davies-Bouldin (DB) measure: The DB measure is the average likeness among the different clusters, where likeness is the ratio of intra-cluster to inter-cluster distances. Thus, a clustering that produces distinctly placed clusters results in a lower value of DB. The DB value is calculated using Eq. (5):
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{1 \le j \le k,\; j \ne i} \left( \frac{\Delta(i) + \Delta(j)}{\delta(i, j)} \right) \qquad (5)$$
The terms of Eq. (5) follow the notation of Eq. (4). (iv) Xie-Beni (XB) measure: This measure is specially designed for fuzzy clustering quality analysis. The separation (inter-cluster distance) of a clustering job is calculated as the minimum squared distance between the cluster centers. The compactness (intra-cluster distance) is calculated as the membership-weighted average distance between the data objects of a cluster and its centroid. XB values are calculated using Eq. (6):
$$XB = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} \mu_{i,j}^{2}\, \|O_i - c_j\|^{2}}{n \cdot \min_{i \ne j} \|c_i - c_j\|^{2}} \qquad (6)$$
The μ values are the membership values obtained as in Eq. (1). In Eq. (6), O refers to the objects and c refers to the centroids. The lower the XB value, the better the clustering quality. (v) Fukuyama-Sugeno (FS) measure: This measure is also specially designed for fuzzy clustering quality analysis. FS is calculated by subtracting the separation of the clusters from their compactness. The smaller the FS value, the better the quality of clustering is considered. FS values are calculated using Eq. (7):
$$FS = \sum_{i=1}^{n} \sum_{j=1}^{k} \mu_{i,j}^{2}\, \|O_i - c_j\|^{2} \;-\; \sum_{j=1}^{k} \|c_j - \bar{c}\|^{2} \qquad (7)$$
where c̄ is the mean of the centroids. (vi) Bezdek's Partition Coefficient (PC): For this fuzzy cluster validity measure, a higher PC value indicates better-quality clustering. PC is calculated as the mean of the squared membership values and is defined in Eq. (8):
$$PC = \frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n} \mu_{i,j}^{2} \qquad (8)$$
(vii) Bezdek's Partition Entropy (PE): For this fuzzy cluster validity measure, a lower PE value indicates better-quality clustering. PE is calculated as the negative average of the membership values multiplied by the base-10 logarithm of those values. PE is defined in Eq. (9):

$$PE = -\frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n} \mu_{i,j} \log \mu_{i,j} \qquad (9)$$
The above seven measures are widely used and known for their performance as internal measures of cluster quality. However, relying on any single quality measure is not advisable: some quality measures work well for certain datasets and clustering outcomes but fail to validate cluster quality for others. This has motivated us to design an ensemble methodology in which seven widely used and time-proven cluster validity measures are combined to obtain cluster validity. In this process, two terms are used to determine the cluster validity: (i) Absolute consensus, when a certain clustering result is agreed upon by all seven measures; and (ii) Majority consensus, when a minimum of four of the seven measures agree on a clustering quality while the other measures disagree. The flow of the work is shown in Fig. 1.

Fig. 1 The flow of the work
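The consensus rule itself is straightforward; the toy sketch below applies it to the per-measure winners that will be read from Figs. 3-9 in the next section. The vote table and the quorum of four mirror the majority-consensus definition above; the helper names are illustrative, not from the authors' implementation.

```python
from collections import Counter

# Per-measure winners (algorithm, K), as read from Figs. 3-9.
votes = {
    "Silhouette":            ("Fuzzy K-Medoids", 100),
    "Dunn":                  ("Fuzzy K-Means",   100),
    "Davies-Bouldin":        ("Fuzzy K-Medoids", 100),
    "Xie-Beni":              ("Fuzzy K-Medoids",  90),
    "Fukuyama-Sugeno":       ("Fuzzy K-Medoids",  90),
    "Partition Coefficient": ("Fuzzy K-Medoids",  90),
    "Partition Entropy":     ("Fuzzy K-Means",    90),
}

def consensus(votes, quorum=4):
    counts = Counter(votes.values())
    winner, n = counts.most_common(1)[0]
    if n == len(votes):
        return "absolute consensus", winner
    if n >= quorum:
        return "majority consensus", winner
    # otherwise vote on the algorithm and on K separately
    algo = Counter(a for a, _ in votes.values()).most_common(1)[0][0]
    k = Counter(k for _, k in votes.values()).most_common(1)[0][0]
    return "per-criterion majority", (algo, k)

print(consensus(votes))  # ('per-criterion majority', ('Fuzzy K-Medoids', 90))
```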
3 Result and Discussion

A. Preprocessing

The dataset used in this work is the Reuters document classification dataset, specifically the R90 subset of the popular Reuters-21578 dataset. The dataset is labeled with 90 classes over a total repository of 10,788 documents, including 7,769 training documents. In the training documents, the class-wise average term occurrence lies in the range of 93-1263. The dictionary obtained after constructing the bag-of-words model provides 35,247 terms. For better clustering, we considered terms that appear at least 5 times in the dataset, obtaining 20,768 terms. After preprocessing, each document is transformed into a vector of term-specific weights computed in the vector space model. Each weight indicates the importance of the term in the dataset. A sample dataset line is shown in Fig. 2.

Key:Reuters21578;len:20768;0:0.06947104298102921;1:0.00482019749275402;2:0.0293220928439213;3:0.01369200218539219; ……………;
Fig. 2 Sample line for vector space transformed data

In the sample line of Fig. 2, the "key" is the same for the whole dataset: it represents the root folder of the dataset with the folder name. The "len" field is the same for all lines and shows the 20,768 unique terms of the dataset, each assigned a vector space weight. The line-specific values such as 0, 1, etc., are the identifiers assigned to the unique terms
of the dataset; these are the keys of the terms. Each key is followed by a ":" and the corresponding vector space weight, i.e., the value assigned to that key.

B. Clustering Results

The fuzzy algorithms, fuzzy K-Means and fuzzy K-Medoids, were executed and the clusters obtained. Each cluster has a centroid. The value K = 90 is ideal for this dataset because the dataset is labeled with 90 classes; however, to experiment with the dataset, three K values were chosen: 80, 90, and 100. The values 80 and 100 are taken to check how the proposed validation methodology judges clustering with these non-optimal cluster counts.

C. Ensemble Validation Technique

The ensemble validation technique is used to evaluate the cluster quality. As explained in the methodology section, a total of seven different cluster validity measures are used, and majority and absolute consensus-based validation is performed. The results are provided in Table 1, where Shtt, DI, DBI, XB, FS, PC, and PE represent the Silhouette, Dunn index, Davies-Bouldin, Xie-Beni, Fukuyama-Sugeno, Partition Coefficient, and Partition Entropy measures of cluster validity. Table 1 shows that different numbers of centroids yield different values for each of the cluster quality measures. It is to be noted that not all cluster validity measures behave similarly for every dataset and clustering technique: for a certain dataset, one cluster validity measure may work well while another may not give a proper result. For example, it is observed that the Dunn index may fail to judge clustering results reliably if the dataset has noise. As we use an ensemble method of cluster validity, this skewness of the individual cluster validity measures is normalized. To analyze the results in Table 1, a separate scatter plot is drawn for each validity measure, in which blue and orange lines show the executions of fuzzy K-Means and fuzzy K-Medoids, respectively.

Table 1 The validity values using different measures
Algo              K    Shtt  DI     DBI    XB     FS      PC     PE
Fuzzy K-Means     80   0.71  0.57   1.739  0.332  14.67   0.36   1.30
                  90   0.87  0.609  1.723  0.312  13.219  0.371  1.32
                  100  0.90  0.62   1.741  0.326  14.40   0.40   1.37
Fuzzy K-Medoids   80   0.92  0.56   1.55   0.31   13.59   0.373  1.49
                  90   0.91  0.571  1.521  0.287  11.541  0.432  1.56
                  100  0.93  0.59   1.49   0.321  11.59   0.411  1.63
Figure 3 shows that the Silhouette values indicate that the best-quality clustering is obtained by fuzzy K-Medoids with 100 clusters. Figure 4 shows that the Dunn values indicate the best-quality clustering for fuzzy K-Means with 100 clusters. Figure 5 shows that the Davies-Bouldin values indicate the best-quality clustering for fuzzy K-Medoids with 100 clusters. Figure 6 shows that the Xie-Beni values indicate the best-quality clustering for fuzzy K-Medoids with 90 clusters. Figure 7 shows that the Fukuyama-Sugeno values indicate the best-quality clustering for fuzzy K-Medoids with 90 clusters. Figure 8 shows that the Partition Coefficient values indicate the best-quality clustering for fuzzy K-Medoids with 90 clusters. Figure 9 shows that the Partition Entropy values indicate the best-quality clustering for fuzzy K-Means with 90 clusters. The following observations can be made from Table 1 and Figs. 3, 4, 5, 6, 7, 8, and 9:

Fig. 3 Silhouette plotting
Fig. 4 Dunn plotting
Fig. 5 Davies-Bouldin plotting
Fig. 6 Xie-Beni plotting
Fig. 7 Fukuyama-Sugeno plotting
Fig. 8 Partition coefficient plotting
Fig. 9 Partition entropy plotting
• The majority voting shows that the optimal number of clusters that generates the best-quality clustering is 90. This is supported by Xie-Beni, Fukuyama-Sugeno, Partition Coefficient, and Partition Entropy.
• The majority voting shows that the better of the two fuzzy algorithms is fuzzy K-Medoids. The superiority of fuzzy K-Medoids over fuzzy K-Means is supported by Silhouette, Davies-Bouldin, Xie-Beni, Fukuyama-Sugeno, and Partition Coefficient.
• The above observations are obtained using majority voting; an absolute consensus is not obtained. This indicates that it is not feasible to rely on any particular cluster validity measure.
4 Conclusion

The quality analysis of a clustering job is a tedious process: the unavailability of ground truth makes it difficult to judge the quality of a clustering result. However, many internal cluster validity measures have been developed over time that validate cluster quality using only the generated clusters. This work validates the quality of fuzzy clusters generated by two popular fuzzy clustering algorithms, fuzzy K-Means and fuzzy K-Medoids. An intensive ensemble-based validation method is deployed using seven widely used validity measures: Silhouette, Dunn, Davies-Bouldin, Xie-Beni, Fukuyama-Sugeno, Partition Coefficient, and Partition Entropy. The results show that this majority-voting ensemble method correctly identified the optimal number of clusters, 90, for the well-known Reuters dataset, which has 90 classes of documents. It is also found from the majority voting that fuzzy K-Medoids generates better-quality clusters than fuzzy K-Means. The absence of an absolute consensus affirms that relying on one or a few cluster validity measures may not validate cluster quality accurately.
References

1. Sardar TH, Ansari Z (2022a) Distributed big data clustering using MapReduce-based fuzzy C-medoids. J Inst Eng India Ser B 103:73–82. https://doi.org/10.1007/s40031-021-00647-w
2. Sardar TH, Ansari Z (2022b) MapReduce-based fuzzy C-means algorithm for distributed document clustering. J Inst Eng India Ser B 103:131–142. https://doi.org/10.1007/s40031-021-00651-0
3. Sardar TH, Ansari Z (2020) An analysis of distributed document clustering using MapReduce based K-means algorithm. J Inst Eng India Ser B 101:641–650. https://doi.org/10.1007/s40031-020-00485-2
4. Ansari Z, Afzal A, Sardar TH (2019) Data categorization using Hadoop MapReduce-based parallel K-means clustering. J Inst Eng India Ser B 100:95–103. https://doi.org/10.1007/s40031-019-00388-x
5. Sardar TH, Ansari Z (2018a) An analysis of MapReduce efficiency in document clustering using parallel K-means algorithm. Future Comput Inform J 3(2):200–209
6. Sardar TH, Ansari Z (2018b) Partition based clustering of large datasets using MapReduce framework: an analysis of recent themes and directions. Future Comput Inform J 3(2):247–261
7. Marutho D, Handaka SH, Wijaya E (2018) The determination of cluster number at k-mean using elbow method and purity evaluation on headline news. In: 2018 international seminar on application for technology of information and communication. IEEE, pp 533–538
8. Nazari A, Dehghan A, Nejatian S et al (2019) A comprehensive study of clustering ensemble weighting based on cluster quality and diversity. Pattern Anal Appl 22:133–145. https://doi.org/10.1007/s10044-017-0676-x
9. Raskutti B, Leckie C (1999) An evaluation of criteria for measuring the quality of clusters. IJCAI 99:905–910
Model Analysis for Predicting Prostate Cancer Patient’s Survival: A SEER Case Study Md. Shohidul Islam Polash, Shazzad Hossen, and Aminul Haque
Abstract Prostate cancer is among the most common cancers and a principal cause of cancer death in the world. For effective treatment to decrease mortality, an accurate survival projection is essential: a treatment plan can be designed based on the predicted survival state. Machine Learning (ML) approaches have recently attracted significant attention, particularly in constructing data-driven prediction models, yet prostate cancer survival prediction has received little attention in research. In this article, we constructed models with the support of ML techniques to determine whether a patient with prostate cancer will survive or not. Feature impact analysis, a good amount of data, and a distinctive approach make our model's results better than previous research. The models were created using data from the SEER (Surveillance, Epidemiology, End Results) database; the SEER program collects and distributes cancer statistics to lessen the disease's impact. Using around twelve prediction models, we assessed the survival of prostate cancer patients. HGB, LGBM, XGBoost, Gradient Boosting, and AdaBoost are notable prediction models. Among them, XGBoost is the best contribution, with an accuracy of 89.57%, and is the fastest among the models. Keywords Prostate cancer · Survivability prediction · Correlation analysis · LGBM classifier · XGBoost classifier
Md. S. I. Polash (B) · S. Hossen · A. Haque Department of Computer Science and Engineering, Daffodil International University, Ashulia, Dhaka 1341, Bangladesh e-mail: [email protected] S. Hossen e-mail: [email protected] A. Haque e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_24
1 Introduction

Prostate cancer is regarded as the second most common kind of malignancy detected in males and the fifth most common cause of death on a global scale. Prostate cancer ranks top in preponderance, whereas fatality rates put it in third place [1]. This cancer is also the most prevalent in 105 countries [2]. While treating prostate cancer patients, if doctors could understand whether a patient will survive, they could formulate a better treatment plan. Sometimes a doctor tries to judge a patient's physical condition by comparing it with previous patients, but a doctor cannot diagnose that many patients in a limited lifetime. Therefore, we have developed a prediction model from the data of more than fifty thousand patients to determine the possibility of patient survival, which will help the doctor prepare the prescription. We accumulated all the relevant factors from the SEER program; they are justified by the AJCC (American Joint Committee on Cancer) [3]. Features with unique effects were found through correlation analysis and fed to the machine. Machine learning (ML) is widely used to automate medical services. We predict whether a patient will survive using ML methods. The target attribute has two classes, alive or dead, so this is a binary classification problem. We used popular ML classifiers: Hist Gradient Boosting (HGB), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and AdaBoost to predict the survivability of prostate cancer patients. Moreover, we used accuracy [1, 4], precision [1, 5, 6], sensitivity [2, 4], specificity [2, 4], AUC [5, 11], and ROC curves to interpret our results. The new aspect we have added to interpretability is how quickly our model predicts: the boosting algorithms showed competitive performance, so the best model was selected based on how fast an algorithm predicts. Among all the classifiers, XGBoost performs better than the others because it is faster than the HGB, LGBM, GBM, and AdaBoost classifiers, while the accuracy of all these classifiers is close to 89.5%. The novelty of our work:

1. A survival prediction model using the XGBoost technique for prostate cancer patients with 89.56% accuracy.
2. Identified features that have a similar impact on prostate cancer. These will be helpful in future research, such as determining the stage of prostate cancer patients.
3. Identified a fast prediction model for patient survival. This approach can be useful for further cancer research.

We used label encoding in feature engineering and built prediction models with massive data and attributes with unique effects; moreover, we tested how fast the models work on different platforms, which makes our prostate cancer survival prediction model superior to others. Furthermore, to the best of our knowledge, no other research has predicted the survivability of prostate cancer in our way. Wen [1] studied prostate cancer survival in different aspects, and their accuracy was 85.64%, whereas we achieved 89.57%. However, our concept of survival differs from others, as discussed in the comparison section.
2 Literature Review

Machine learning methods have been used in a lot of cancer diagnostic research. Wen [1] researched prostate cancer prognosis, applying standard preprocessing techniques, an ANN, and ML methods such as Naive Bayes (NB), Decision Trees (DT), K Nearest Neighbors (KNN), and Support Vector Machines (SVM); the target survival category was 60 months or greater. The ANN achieved the best result, with 85.6% accuracy. Taking the same prostate target attribute (five-year survivorship), Delen contributed a work [2] using data mining tools and SEER data in 2006; they used ANN, DT, and LR classifiers, and with an accuracy of 91.07%, the ANN beat all other classifiers. Data duplication removal and feature selection were not used in their work. Cirkovic [4], Bellaachia [6], and Endo [7] predicted breast cancer survival with a similar target approach. Cirkovic [4] developed an ML model to predict breast cancer patients' chances of survival and recurrence, but used only 146 records and twenty attributes to predict 5-year survivability; the NB classifier was chosen as the best model. Bellaachia [6] used three data mining techniques on the SEER dataset, NB, back-propagation neural networks, and DT (C4.5), where C4.5 performed better overall. By contrasting seven models (LR, ANN, NB, Bayes Net, DT, ID3, and J48), Endo [7] attempted to predict the five-year survival state; the logistic regression model showed the highest accuracy, 85.8%. In Montazeri [5] and Delen [8], survival indicates whether the patient will live or die. Montazeri [5] developed a rule-based classification approach for breast cancer on a dataset of 900 patients, just 24 of whom (2.7% of the whole patient population) were men. He applied traditional preprocessing techniques and ML algorithms including NB, DT, Random Forest (RF), KNN, AdaBoost, SVM, RBF Network, and Multilayer Perceptron, and assessed the models using accuracy, precision, sensitivity, specificity, area under the ROC curve, and 10-fold cross-validation. The RF, with an accuracy of 96%, was better than previous techniques. Delen [8] combined a widely utilized statistical technique, logistic regression, with decision trees to create prediction models for breast cancer; tenfold cross-validation was used for performance comparison, and the DT model exhibited the greatest performance with 93.6% accuracy. The authors of [1, 2] focused on predicting whether a prostate cancer patient will live for sixty months, or five years, referring to this as survival. Since our objective attribute is different, we attempted to predict whether a patient will live or die, known as the Vital Status Recode feature in the SEER database; this characteristic is used extensively to predict survival for other cancers [5, 8–10]. Montazeri [5], Agrawal [11], and Lundin [12] provide fewer records for predicting cancer survival. It appears that Wen [1], Delen [8], and Pradeep [10] used a small number (fewer than five) of algorithms for predicting breast, prostate, and lung cancer. Models for breast and lung cancer were constructed using a minimal set of characteristics in [7, 11, 12] by Agrawal, Lundin, and Endo. They also did not
mention the prediction time of any of their models, and there are differences in the number of attributes used; these gaps have allowed our efforts to go forward.
3 Methodology

The target prediction model follows the standard machine learning life cycle. The essential procedures include data collection, data preparation, correlation analysis, data partitioning into train and test sets, model creation, cross-validation, and model testing (Fig. 1).

Fig. 1 Procedural framework
3.1 Data Collection

For this research, we collected data from the SEER information repository, which aims to reduce the illness burden among the U.S. population. There are 187798 entries and 37 characteristics in the dataset. The AJCC has determined that these factors have a causal relationship with cancer and can inform machine reasoning, hence their inclusion in the SEER database [3]. Some of these characteristics, such as the patient's identification number, are not essential for our forecast. The first observation after obtaining the data was that every patient was male; hence, the sex characteristic was ignored. Based on domain analysis, survival months and contaminated years have been eliminated, and columns with only one class have been omitted. Moreover,
27 attributes have been included in the data processing. The remaining characteristics are: Age, ICD O 3 Hist or behav, Laterality, RX Summ Surg Prim Site, CS extension, CS lymph nodes, Total number of benign or borderline tumors for patient, Derived AJCC N, Histology recode broad groupings, Diagnostic Confirmation, RX Summ Scope Reg LN Sur, Reason no cancer directed surgery, Regional nodes examined, Site recode rare tumors, AYA site recode, RX Summ Surg Oth Reg or Dis, First malignant primary indicator, CS tumor size, CS mets at dx, Derived AJCC Stage Group, Derived AJCC M, Derived AJCC T, Race recode, Total number of in situ or malignant tumors for the patient, Grade, and Regional nodes positive. These attributes also appear in other works, from which many authors have drawn similar inspiration.
3.2 Experimental Setup

The experiments used Google Colaboratory's Intel Xeon CPU for model creation. The ML-based prediction models were constructed using scikit-learn, pandas, NumPy, and seaborn; overall, the Python programming language was used. The models were tested on several platforms to analyze prediction time. A serial process with a single core of each CPU was used to measure prediction time. The processors used are Intel Xeon, Intel Core i5-9300H, and AMD Ryzen 7 3700X.
3.3 Data Preparation

Handling of missing values and data duplication: The SEER statistics on prostate cancer are deficient in many aspects. Due to the advancement of medical knowledge, new characteristics have emerged that did not previously exist; consequently, prior patients' information on these current characteristics is unavailable. 10349 entries are entirely missing from our database, around 5.5% of all records, which is negligible compared to 187798 data points. Since prostate cancer patients expect accurate diagnostics for better treatment, imputing the missing information may lead to erroneous results; therefore, the missing records have been eliminated. There is also a possibility that the physiology of many patients is similar; the resulting data duplication might have led to inaccurate conclusions, so we eliminated the redundant data. There are 54731 records in total, excluding duplicates. Our target attribute contains 44379 records in the Alive class and 10352 records in the Dead class.

Feature Engineering: A prediction model can only be created by feeding the data in numerical format. Nineteen of the twenty-seven characteristics in the dataset are nominal, while eight are numerical. Therefore, nineteen attributes must be transformed into a machine-readable format. The Label encoding module from the
Python scikit-learn package was used to accomplish this goal. Label encoding takes textual labels and transforms them into numeric representations.

Correlation Analysis: Correlation coefficients show how connected attributes are. Strong correlation coefficients of +0.8 to +1 and −0.8 to −1 represent the same behavior [14]. Using Eq. (1), we calculated the correlation coefficient r:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}} \qquad (1)$$

Here xi is the value of one attribute and x̄ is the mean of its values; in the same manner, yi is the individual value of another attribute and ȳ is its average. From Fig. 2 we can see that four pairs of attributes behave similarly. "Histology recode broad groupings" and "ICD-O-3 Hist or behav" have the same impact because their correlation coefficient is 0.98. In the same way, "RX Summ Surg Prim Site" and "Reason no cancer directed surgery" have 0.9, "CS Mets at dx" and "Derived AJCC M" have 0.95, and "CS lymph nodes" and "Derived AJCC N" have 0.94. These pairs' impacts are the same in the dataset.

Fig. 2 Zoomed correlation heatmap

Models
will be created with one attribute from each of these pairs. 'ICD-O-3 Hist or behav', 'RX Summ Surg Prim Site', 'CS lymph nodes', and 'CS mets at dx' have therefore been omitted. Moreover, the rest of the attributes have their own individuality.
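As a sketch, the encoding and correlation-based pruning steps might look as follows in pandas/scikit-learn. The dataframe name and the exact SEER column strings are illustrative assumptions; the four dropped attributes are the ones identified above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# df: the raw SEER dataframe (name assumed); drop missing rows and duplicates.
df = df.dropna().drop_duplicates()

# Encode the nineteen nominal attributes as integer labels.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Drop one attribute from each strongly correlated pair (see Fig. 2).
redundant = ["ICD-O-3 Hist or behav", "RX Summ Surg Prim Site",
             "CS lymph nodes", "CS mets at dx"]
X = df.drop(columns=redundant + ["Vital Status Recode"])
y = df["Vital Status Recode"]          # target: alive or dead
```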
3.4 Data Splitting

We utilized different ratios of data for training and testing: 70:30, 75:25, and 80:20. The average accuracy difference between the ratios is less than 1%. The best results were obtained using 80% of the data for training and 20% for testing. The train_test_split module of scikit-learn is used for data separation.
3.5 Machine Learning Model Building

The performance of twelve ML algorithms was evaluated to predict survivorship. The ML methods include the LGBM Classifier, Random Forest [11], Extra Trees, Logistic Regression [2, 7, 8], SGD Classifier, HGB, Gradient Boosting, XGBoost [14], KNN, Decision Tree [1, 4, 8, 11], and AdaBoost classifiers [13]. Our best-performing prediction model is created using the XGBoost classifier.

XGBoost Classifier: XGBoost is a distributed gradient boosting library that has been tuned for efficiency, flexibility, and portability. It implements machine learning methods within the gradient boosting framework. XGBoost is a parallel tree boosting algorithm that solves many data science challenges quickly and precisely. The same code runs on major distributed environments and is capable of solving problems that exceed billions of instances.
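A minimal training sketch, covering the 80:20 split of Sect. 3.4 and the XGBoost model, is shown below. The paper does not report its hyperparameters, so the settings here (and the random seed) are assumptions.

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # 80:20 split

model = XGBClassifier(n_estimators=100, eval_metric="logloss")  # assumed settings
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```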
3.6 Performance Evaluation

Some tests are carried out after the creation of the machine learning models to understand their applicability. The accuracy, F1 score, precision, recall, cross-validation [14], and time interpretability of our models have been evaluated. To measure these parameters, the confusion matrix is a necessary tool. A confusion matrix, often called an error matrix, is a table used in machine learning to visualize the performance of a supervised learning system. From a confusion matrix we get True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). The equations needed to calculate the ML performance measurements are: Accuracy = (TP + TN)/(TP + TN + FP + FN), Recall = TP/(TP + FN), Precision = TP/(TP + FP), F1 Score = 2 * (Recall * Precision)/(Recall + Precision), Sensitivity = TP/(TP + FN), and Specificity = TN/(TN + FP). These equations generate the values in Table 1.
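Those formulas map directly onto the entries of a binary confusion matrix. A small sketch, continuing the assumed model and split from the previous block:

```python
from sklearn.metrics import confusion_matrix

# For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in this order.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
recall      = tp / (tp + fn)            # identical to sensitivity
precision   = tp / (tp + fp)
f1          = 2 * recall * precision / (recall + precision)
specificity = tn / (tn + fp)
```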
AUC: Area Under the Curve. It is used as an indicator of performance and shows how well the model distinguishes between classes. ROC Curve: The receiver operating characteristic (ROC) curve is a graphical representation of a classification model's performance across all classification thresholds.
3.7 Cross-Validation

Cross-validation prevents overfitting in prediction models. It is a statistical procedure that creates a predetermined number of data folds, analyzes each fold, and averages the overall error estimate. We generated ten folds of our data. We attempted a wide range of methodologies as part of our research and have described in detail the processes that led to achieving our objectives.
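In scikit-learn, the ten-fold procedure reduces to one call; this sketch assumes the model and data objects from the earlier blocks.

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print("average CV accuracy:", scores.mean())   # ~0.888 reported for XGBoost
```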
4 Results and Discussion

We used ML methods to build a model that predicts a prostate cancer patient's survival. Numerous predictive models have been developed; among these, we must choose the finest one. Table 1 shows the models' prediction performance. We can observe that the tree-based boosting techniques performed exceptionally well on our dataset, each with an accuracy a little above 89%. We must now choose the optimal algorithm from among these candidates. In Table 1, Hist Gradient Boosting performs best on accuracy, at 89.61%. The AUC scores in Table 1 and the ROC curves in Fig. 3 show that the best four algorithms, HGB, LGBM, XGBoost, and Gradient Boosting, have covered almost 92% of the data correctly. However, we must consider which algorithm identifies the two target classes more accurately.
Table 1 Performance measurements of the top 6 ML models

Algorithms                     Accuracy  F1 score  Precision  Recall  AUC     Avg. CV
HGB classifier                 89.6136   0.8187    0.8453     0.7984  0.9243  0.8874
LGBM classifier                89.5862   0.8176    0.8458     0.7964  0.9249  0.8872
XGB classifier                 89.5679   0.8143    0.8501     0.7891  0.923   0.8883
Gradient boosting classifier   89.5588   0.8164    0.8464     0.7942  0.9229  0.8881
Ada boost classifier           89.4857   0.8148    0.8454     0.7923  0.9192  0.8864
Logistic regression            87.7592   0.7748    0.8224     0.7459  0.8862  0.8717
Fig. 3 Sensitivity and specificity of models and ROC curve
Fig. 4 Prediction time for test data with best three models
The sensitivity values in Fig. 3 clearly distinguish the algorithms: XGBoost achieves a sensitivity of 0.7851, more than the others. Nevertheless, the sensitivity and specificity of the algorithms do not differ much, for which reason the prediction time of the algorithms has been calculated. Different sets of data were used to check the prediction time: test data, train data, and single-patient data. As the test records are unseen by the models, we report the prediction time on test data in Fig. 4. So far, the Hist Gradient Boosting, LGBM, and XGB classifiers are ahead in the accuracy race, but among them the XGB Classifier predicts in the shortest time. Across three different platforms, on average the XGB Classifier takes 42.03 ms, the LGBM Classifier takes 97.9 ms,
and the HGB classifier takes 164 ms. This indicates that the XGBoost prediction model is nearly two times and four times faster than the LGBM and HGB models, respectively. On other sets of data, we observed the same ratio of time differences. Although Fig. 5 shows that XGBoost identifies the dead class 0.02% less accurately than LGBM, this is a tiny difference, and the XGBoost model is two times faster than LGBM. Therefore, we conclude that the predictive model using the XGB Classifier is our proposed model.

Fig. 5 Normalised confusion matrix of LGBM and XGBoost classifier
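One plausible way to measure the per-model prediction time reported in Fig. 4 is shown below; the exact measurement protocol (beyond being single-core and serial) is not given in the paper, so this wall-clock sketch is an assumption.

```python
import time

def prediction_ms(model, X_test, repeats=10):
    """Average wall-clock time (ms) to predict the whole test set."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(X_test)
    return (time.perf_counter() - start) * 1000 / repeats
```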
Let us look at an in-depth analysis of the XGBoost model. XGBoost Classifier: Our accuracy score with the XGBoost Classifier is 89.56%, which is very close to the LGBM classifier (Table 1). The classification report also shows that the macro F1 score is 0.8143, the recall score is 0.7891, and the precision score is 0.8501, all near the LGBM classifier. The average cross-validation score was 88.83% over the 10-fold cross-validation, meaning this model performs better than LGBM (88.72%) and HGB (88.74%) on the full data. This model is not overfitted either, since the training error is within acceptable bounds. Overall, the XGBoost prediction model outperforms all models and is our recommended model.

Comparisons: Delen [2] and Wen [1] have researched prostate cancer survival, but with different target attributes: both treat survivability as whether the patient will live sixty months, i.e., five years. We instead predicted whether a prostate cancer patient will live or die. In the same way, Mourad [9] predicted the survival of thyroid cancer, Montazeri [5] and Delen [8] breast cancer, and Pradeep [10] lung cancer. To the best of our knowledge, no one has predicted prostate cancer survivability in the way we do, which is what Pradeep, Mourad, and Montazeri did for other cancers. Consequently, our suggested model is unique, and its outcomes are satisfactory. As a result, our XGBoost model contributes to prostate cancer survival prediction. Montazeri [5], Agrawal [11], and Thongkam [13] employed small datasets of 146, 900, and 951 entries, respectively, whereas we used a dataset of 187798 entries. We utilized 12 algorithms, while Wen [1] and Delen [8] employed only five and three algorithms, respectively.
For breast cancer survival, Montazeri [5] and Delen [8] achieved 96% and 93.6% accuracy; for thyroid and lung cancer, Mourad [9] and Pradeep [10] obtained 94.5% and 82.6% accuracy, respectively. Wen [1] studied the survivability of prostate cancer and attained an accuracy of 85.64%; in contrast, we forecast the survivorship of prostate cancer with an accuracy of 89.56%. On the other hand, we reported the time our model needs to produce a reliable prediction, which the other authors did not show for their models. We used cross-validation to verify the results and, after applying correlation analysis, used only the attributes with distinctive effects.
5 Conclusion

With the use of computational intelligence, this article determines the possibility of a prostate cancer patient's survival. Various techniques were evaluated using characteristics with unique effects. Our XGBoost classifier-based prediction model is proposed for predicting the survival of prostate cancer patients after analyzing how fast each algorithm can successfully predict, and we have shown the suitability of our model compared to others. Our system could play a revolutionary role in the digitalization of medical diagnosis for prostate cancer. The model has been evaluated using the data of 10946 individuals and has predicted patient survival with an accuracy of 89.56%. Physicians will be able to determine a patient's likelihood of survival with the assistance of artificial intelligence, which will allow them to devise a more promising treatment strategy. In the future, we will try to build a better model that predicts the survival state with lower error.
References

1. Wen H, Li S et al (2018) Comparison of four machine learning techniques for the prediction of prostate cancer survivability. In: 15th international computer conference on wavelet active media technology and information processing. IEEE, pp 112–116
2. Delen D, Patil N (2006) Knowledge extraction from prostate cancer data. In: 39th Annual Hawaii international conference on system sciences (HICSS'06). IEEE, pp 92b–92b
3. Lynch CM et al (2017) Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int J Med Inform 108:1–8
4. Cirkovic BRA et al (2015) Prediction models for estimation of survival rate and relapse for breast cancer patients. In: 15th international conference on bioinformatics and bioengineering (BIBE). IEEE, pp 1–6
5. Montazeri M et al (2016) Machine learning models in breast cancer survival prediction. Technol Health Care 24:31–42
6. Bellaachia A, Guven E (2006) Predicting breast cancer survivability using data mining techniques, pp 10–110
7. Endo A et al (2008) Comparison of seven algorithms to predict breast cancer survival. Int J Biomed Soft Comput Hum Sci 13:11–16
8. Delen D, Walker G, Kadam A (2005) Predicting breast cancer survivability: a comparison of three data mining methods. Artif Intell Med 34:113–127
9. Mourad M et al (2020) Machine learning and feature selection applied to SEER data to reliably assess thyroid cancer prognosis. Sci Rep 10:1–11
10. Pradeep K, Naveen N (2018) Lung cancer survivability prediction based on performance using classification techniques of support vector machines, C4.5 and Naive Bayes algorithms for healthcare analytics. Procedia Comput Sci 132:412–420
11. Agrawal A et al (2012) Lung cancer survival prediction using ensemble data mining on SEER data. Sci Program 20:29–42
12. Lundin M et al (1999) Artificial neural networks applied to survival prediction in breast cancer. Oncology 57:281–286
13. Thongkam J et al (2008) Breast cancer survivability via AdaBoost algorithms. In: Proceedings of the second Australasian workshop on health data and knowledge management, vol 80. Citeseer, pp 55–64
14. Polash MSI, Hossen S et al (2022) Functionality testing of machine learning algorithms to anticipate life expectancy of stomach cancer patients. In: 2022 international conference on advancement in electrical and electronic engineering (ICAEEE), pp 1–6
Quantum-Inspired Neural Network on Handwriting Datasets Manik Ratna Shah, Jay Sarraf, Prasant Kumar Pattnaik, and Anjan Bandyopadhyay
Abstract With the advancement of machine learning for classifying and categorizing ever larger sets of data, there is a growing need for computational power. Quantum computing in machine learning promises to solve such tasks in less time while consuming less energy and power. This paper presents a comparative analysis of traditional neural networks and Quantum Neural Networks using handwritten digits from a simplified version of the MNIST dataset.
1 Introduction

Handwriting recognition is considered one of the most challenging and fascinating tasks in numerous machine learning fields like pattern recognition and image processing [1]. Different people have different handwriting and different strokes, which not only lengthens processing time but also complicates recognition. Handwriting recognition has many useful applications, and the field has become one of the top research areas, with studies conducted over a long period. Handwriting recognition can be considered the process of detecting and recognizing characters in images and converting them into ASCII or an equivalent machine-readable form. Handwritten digit recognition is the capability of a computer to recognize human handwritten digits, which can differ from person to person in shape and size. Applications of handwritten digit recognition include vehicle number plate recognition, bank cheque processing, and postal address sorting. Modern computers have difficulty recognizing handwritten digits; however, Artificial Intelligence and Machine Learning techniques have rapidly reduced this difficulty. Many artificial neural network systems have already been developed that can recognize handwritten digits with an accuracy of more than 90%.
Neural networks are widely used in many machine learning applications such as speech recognition, image classification, and handwriting recognition. The size of neural networks on classical computers grows greatly as the demand for accuracy increases, which drives high requirements for computational power and hardware resources. Quantum computing, on the other hand, is advancing rapidly: the number of qubits has increased from 5 back in 2016 and is expected to reach 1121 in 2023. This has prompted the introduction of neural networks to quantum computing. Quantum Neural Networks are expected to achieve exponential speedup in comparison to classical neural networks [2]. Quantum computers provide a different computing environment from classical computers: they use principles of quantum mechanics, such as superposition and entanglement, that are unavailable to classical computers. Quantum computing is expected to solve problems that cannot be solved by even the most powerful supercomputers in the world [3]. The objective of this research is to predict handwritten digits with maximum accuracy and minimal training and classification time using a Quantum Neural Network. The dataset used is the MNIST dataset, considered one of the most widely used datasets for performance analysis of various neural network models and techniques [2, 4].
1.1 Traditional Computing Environment

Traditional computing, often referred to as conventional computing, is the classical paradigm of computing devices. In traditional computing, computers perform two main functions: storing data and information in memory, and processing that stored data and information using simple mathematical operations such as addition and subtraction. To perform more complex functions, the simple mathematical operations are organized into a series, which we refer to as an algorithm. The following are the main concepts of traditional computing [5]. Just like electric circuits, there are two states in traditional computing: 0 and 1. 0 represents the OFF state, while 1 represents the ON state. A bit, which takes the two states 0 and 1, is the basic building block of traditional computers; all information is represented by combinations of 0s and 1s. Logic gates, also known as circuits, are used to perform calculations; there are different logic gates for different calculations, and simpler gates are combined to make complex circuits. The main processing is done by the Central Processing Unit, which comprises the Arithmetic and Logic Unit, the Control Unit, and the Memory Unit.
When problems grow complex, the number of steps increases along with the time taken to solve them. As we advance, we find various complex problems that cannot be solved even by the most powerful supercomputers in the world. Due to this, scientists and researchers around the world are shifting their attention to quantum computing, which promises to solve these complex problems [3].
1.2 Quantum Computing Environment

Quantum computing processes information and solves issues that are too tough for classical computers by employing the laws of quantum mechanics [5]. A quantum is the smallest discrete unit of any physical property. A qubit is the fundamental unit of information in quantum computing. Quantum computers operate differently from classical computers: classical computers use bits, each either off or on to represent 0 or 1, whereas a qubit can represent a 0, a 1, or any proportion of the two values at the same time. This makes it possible to approach computations differently and allows quantum computing to answer many kinds of complex problems, including exponential ones [8]. Qubits are written as |0⟩ or |1⟩. There are three significant properties of qubits [7]:

• Superposition
According to this principle, a quantum particle can be in two states at once. Prior to our observation and measurement, these particles continue to oscillate between different values. Take a coin as an example: we flip classical bits and look to see whether the outcome is heads or tails, but in superposition the coin is simultaneously in all possible states, including heads, tails, and everything in between. Figure 1 shows superposition: classical bits are represented by either 0 or 1, but a qubit in superposition can be in any state or combination of states at a given time.

Fig. 1 Superposition
• Entanglement
Entanglement is the phenomenon in which the outcomes of two quantum particles are related to one another. When qubits are entangled, a new system emerges in which the qubits interact with one another; as a result, inferences about the output of one qubit can be used to support those of the other. This phenomenon enables us not only to process enormous amounts of information but also to solve a variety of difficult and complex problems. Figure 2 shows entanglement: two qubits are kept in a closed superposition and entangled with each other, and once entangled, changing one qubit can influence the other even if they are physically separated.

Fig. 2 Entanglement

• Quantum Interference
Interference is the behavior of a qubit collapsing one way or another due to the superposition imposed on it. It must be reduced to increase the accuracy of the results. Figure 3 shows interference: two qubits collapse into each other, causing their amplitudes to add up, which gives the expected output.

Fig. 3 Interference
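The paper gives no code for these properties, but they are easy to demonstrate on a simulator. The sketch below, written in Cirq as an assumed illustration (the paper does not name its tooling), prepares a Bell state: the Hadamard gate creates superposition on one qubit and the CNOT entangles it with a second, so measurements land almost entirely on 00 and 11.

```python
import cirq

q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(
    cirq.H(q0),               # superposition on q0
    cirq.CNOT(q0, q1),        # entangle q1 with q0
    cirq.measure(q0, q1, key="m"),
)
result = cirq.Simulator().run(circuit, repetitions=1000)
print(result.histogram(key="m"))   # counts concentrate on 0 (|00>) and 3 (|11>)
```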
1.3 Challenges

The advancement in image classification is due to advances in deep learning methods: the development of powerful models, the use of robust large-scale datasets, and the rapid growth of available resources. Still, there are various
challenges to overcome in image classification. Some of them are discussed below [6]:

• Making improvements in model generalization
One challenge is to build models capable of handling real-world predictions not introduced in the training phase. In most cases the test and training data belong to the same distribution, so slight variations in viewing angle, configuration, or scaling rapidly reduce accuracy.
• Meta-learning
Meta-learning refers to learning the learning process. This remains a challenge, as we still lack models and algorithms that can learn the learning process as complexity increases.
• Relational modeling
To better understand a scene, it is important to model the relationships and interactions between the objects present in it.
• Geometric reasoning
Image classification is mostly done on 2D objects. By introducing 3D information about the objects, we can analyze and predict more accurately, reducing unreasonable semantic layouts.
• Integrating deep learning with common sense
At present, deep learning is concerned only with the data fed to it; no outside knowledge is used. Since common knowledge about a subject provides a lot of information, the challenge is to integrate deep learning with common sense.
• Use of small- and ultra-large-scale data
Deep learning fails to provide good results on small-scale datasets, and recent studies show that algorithms also fail to work effectively on ultra-large-scale data.

In this study, we make use of the MNIST dataset, which contains 55,000 training examples made up of pixelated 28 × 28 handwritten digits labeled from 0 to 9. The main issue is that our classical simulator of a 17-qubit quantum computer with a single readout bit can only handle 16-bit data. We therefore employ a down-sampled version of the MNIST dataset, with 4 × 4 pixelated images, to resolve that. We use two digits at a time, since labeling ten digits with a single readout bit is challenging [12, 20].
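A hedged sketch of that down-sampling step, using TensorFlow and an assumed digit pair (3 vs. 6; the paper does not name the two digits here):

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Keep only two digit classes, since one readout bit labels two classes.
keep = (y_train == 3) | (y_train == 6)
x, y = x_train[keep], (y_train[keep] == 3)

# Down-sample 28x28 images to 4x4 so each image fits in 16 data bits.
x = tf.image.resize(x[..., tf.newaxis] / 255.0, (4, 4)).numpy()
print(x.shape)   # (n, 4, 4, 1)
```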
1.4 Need for Quantum Computing

Quantum computers can massively outperform traditional classical computers. For a complex computational problem such as finding the path out of a maze, a traditional computer goes through every sequence of paths to find the exit, whereas a quantum computer can try all the paths at once using the principle of superposition. Quantum computers have the potential to provide more accuracy and speed than traditional computers, which can accelerate various scientific discoveries and inventions, and they can revolutionize Artificial Intelligence in fields like Machine Learning and Deep Learning. The need for quantum computing can be summarized in the following points:

a. Advancement in technology: With the advancement of technology, various problems are arising that are far too complex to solve, and new problems with ever higher levels of complexity keep appearing. Quantum computing provides solutions to several such problems, like modeling a protein or modeling the behavior of an individual atom in a molecule. As the world's population and consumption rate increase, more complex problems, such as the optimization of energy usage, also increase. This calls for quantum computing.

b. Supercomputers can only solve linear problems: Supercomputers solve problems that are modeled with linear mathematics, but quantum computers can solve nonlinear problems because they harness the nonlinear properties of nature. Even supercomputers fail to solve some problems of very high complexity.

These are the possible areas where quantum computers can have a big impact [5]:

i. Quantum simulation: Quantum computing can be used in the simulation of complex systems. Quantum computers work well in modeling other quantum systems that are too complex and ambiguous for classical computers. Examples include the modeling of photosynthesis, complex molecular formations, and superconductivity.

ii. Cryptography: The RSA algorithm, widely used for securing data transmission, could be broken within days using quantum computers, whereas a classical computer would need thousands of years. This also allows us to develop even more secure cryptography algorithms that are impossible to break, because it is impossible to copy data encoded in a quantum state.
iii. Optimization: Optimization means finding the best solution to a given problem. By using quantum-based optimization algorithms, we can better manage complex systems such as logistics and transportation problems, traffic flows, and package deliveries.

iv. Quantum machine learning: Machine learning and deep learning need efficient ways to train and test their models on larger datasets. Quantum software can be implemented to speed up the process and reduce the high computational cost.

v. Materials science: Various fields in materials science, such as chemistry, automotive, and pharmaceuticals, need faster ways to model complex interactions. Using quantum solutions would be a better approach to solving these problems.
2 Literature Review

In 2016, Schaetti et al. [8] proposed a model that used Echo State Networks to classify digits from the MNIST dataset. Although Echo State Networks were outperformed by CNNs, they were found attractive because of their image processing properties. 4000 neurons were used to achieve the best performance, with an error rate of 0.93%. Tabik et al. proposed a study in 2017 [9] that provided a brief summary of image preprocessing for deep learning algorithms and CNNs. On the MNIST dataset, three types of CNNs were used: LeNet, Network3, and Drop Connect. It was discovered that combining elastic and rotation augmentation improved accuracy by 0.71%. Alom and colleagues [10] published a report comparing precision and time on the MNIST dataset. The comparison included SVM, CNN, KNN, and RFC machine learning and deep learning models; CNN outperformed the competition. Schott et al. suggested a robust classification model in 2018 [11] that could perform analysis by synthesis with learnt class-conditional data distributions, concluding that the proposed paradigm has the potential to lessen vulnerability to adversarial attacks. Majumder [12] presented a study in which the Proximal Support Vector Machine (PSVM) was utilized to recognize handwritten digits. PSVM outperformed an ANN in terms of performance, achieving 98.65% on 20,000 samples. Farhi [13] proposed a Quantum Neural Network that could be trained using supervised learning and represented by either quantum or classical data. Down-sampled pictures of two separate handwritten digits from the MNIST dataset were utilized in this QNN; each image had 16 data bits as well as a label. The QNN was created with a near-term quantum processor in mind. Because a 17-qubit error-free quantum computer was not available, a simulation was done using a classical
computer. Stochastic gradient descent was used to obtain parameters that label the data with modest error. Alejandro Baldominos published a study in 2019 [24] that summarized the top state-of-the-art contributions reported on the MNIST dataset. This paper distinguished itself from other published surveys by focusing on works employing some form of data augmentation; in addition, works involving the use of CNNs were reported separately. It was discovered that there were no substantial advancements in computer vision techniques over the course of two decades; however, integrating convolutional neural networks with some improved models produced remarkable results. Kadam [14] proposed a paper that dealt with the MNIST dataset to classify handwritten digits. The author reviewed the literature and identified the methods that were most accurate in classifying images. The CUDA implementation of deformable pattern recognition was found to be more accurate in classifying digits, with an error rate of 0.57%. Another method, based on Echo State Networks-based Reservoir Computing for MNIST handwritten digit recognition, had an error rate of about 0.93%. It was concluded that the accuracy increased with the deformation and rotation of the image. Kaziha [15] introduced a software comparative analysis of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) neural networks using the MNIST dataset. The analysis was based on the speed, size, accuracy, and complexity of the two models. The result was that CNN was more accurate and faster than LSTM: LSTM had an error rate of 0.78%, while CNN had 0.55%; the accuracy of CNN was above 99% and that of LSTM was near 99%. In 2020, [28] Priyansh Pandey proposed a paper in which he constructed an ANN that could read any handwritten digit with greater than 93% accuracy. The ANN was developed with 784 neurons in the input layer, 75 neurons in the hidden layer, and 10 neurons in the output layer. With a batch size of 20, there were 350 epochs, and the optimal learning rate was found to be 0.002. Adamuthe [14] introduced the application of a convolutional neural network for image classification. The CNN model was used with the MNIST and Fashion MNIST datasets; 5 different architectures were used with different sets of convolution layers, filter sizes, and fully connected layers. It was found that the selection of activation functions, dropout rates, and optimizers played an important role in determining the accuracy of results. The accuracy of all five architectures was found to be more than 99%, and it was also concluded that the training time increased with the number of convolutional layers. Iordanis Kerenidis [16] introduced a quantum algorithm that combined dimensionality reduction and classification into a quantum classifier, which was used for handwritten digit classification on the MNIST dataset; the accuracy achieved was 98.5%. Andrei Velichko [17] introduced a neural network that used filters based on logistic mapping (LogNNet) for the classification of handwritten digits from the MNIST-10 dataset. This network uses smaller weight arrays, and hence less memory, than comparable networks.
Sanghyeon An [18] introduced three different Convolutional Neural Network (CNN) models with 3 × 3, 5 × 5, and 7 × 7 kernel sizes in the convolution layer, respectively, each using batch normalization and ReLU activation functions. Each model provided high accuracy on the MNIST dataset, reaching up to 99.87% independently and 99.91% when ensembled together. In 2021, [19] Sagar Pande published a Python-based approach for handwritten digit identification using several machine learning methods, including Decision Trees, K-Nearest Neighbors, Support Vector Machines, Random Forest, and Multilayer Perceptron, to identify digits in the MNIST dataset. The SVM classifier achieved an accuracy of 95.88%. Uehara and Andreas Spanias [20] introduced a Quantum Neural Network (QNN) that operated on the MNIST dataset. The Python programming language and the Qiskit quantum SDK were used to perform quantum simulations on a classical computer. The takeaway from this project was performance improvements for photovoltaic fault detection using quantum machine learning. Qi et al. [21] introduced QTN-VQC, an end-to-end learning system with a trainable Quantum Tensor Network (QTN) for quantum embedding on a Variational Quantum Circuit (VQC). The MNIST dataset was utilized to demonstrate that QTN outperformed other quantum embedding approaches such as PCA-VQC and Dense-VQC. In 2022, [22] Zaiban Kaladgi proposed a handwritten character recognition system based on a Convolutional Neural Network. The system could predict characters accurately from user-provided images. The EMNIST dataset was used, which included some additional characters from the Hindi language; the accuracy achieved was around 80%. Kim et al. [23] proposed a quantum convolutional neural network for classical data classification that utilized 2 qubits throughout the procedure. On both the MNIST and Fashion MNIST datasets, the performance of several quantum convolutional neural network models with varying pre-processing methods, cost functions, quantum data encoding methods, parameterized quantum circuits, and optimizers was investigated. QCNN accuracy was determined to be over 99% for the MNIST dataset and around 94% for the Fashion MNIST dataset. It was also determined that QCNN outperformed CNN under similar training settings (Table 1).
3 Design and Implementation

The major focus is on the design of a Quantum Neural Network (QNN) and a traditional neural network that classify a simplified version of the MNIST dataset. The MNIST collection contains various handwritten digits and serves as a standard benchmark for evaluating artificial intelligence, machine learning, and data science methodologies and algorithms. LeCun et al. [2] established the Modified National Institute of Standards and Technology (MNIST) database in 1998.
Table 1 Focused area of the related works

| Authors | Publication year | Focus area |
|---|---|---|
| Schaetti et al. | 2016 | A model that classified digits from the MNIST dataset using Echo State Networks. The number of neurons used was 4000, and the error rate was 0.93% |
| Tabik et al. | 2017 | Three different types of CNNs were used on the MNIST dataset: LeNet, Network3, and Drop Connect |
| Alom et al. | 2017 | Comparison of precision and time on MNIST using various ML and deep learning models. CNN was best |
| Schott et al. | 2018 | For MNIST, a robust classification model capable of performing analysis by synthesis utilizing learnt class-conditional data distributions |
| Majumder | 2018 | Proximal Support Vector Machine (PSVM) was used for handwritten digit recognition. Accuracy was 98.65% |
| Farhi | 2018 | Quantum Neural Network using down-sampled images of two different handwritten digits from MNIST |
| Alejandro Baldominos | 2019 | A summary of the most recent state-of-the-art contributions reported on MNIST that make use of data augmentation |
| Kadam | 2019 | Literature review of some of the methods for classification of handwritten digits using MNIST. The CUDA implementation was more accurate |
| Kaziha | 2019 | Comparative analysis of CNN and LSTM neural networks using MNIST. CNN performed better |
| Pandey | 2020 | ANN for classification of handwritten digits with an accuracy of 93%. 350 epochs and batch size of 20. The best learning rate achieved was 0.002 |
| Adamuthe | 2020 | Convolutional neural network with 5 different architectures on MNIST and Fashion MNIST. Accuracy was more than 99% |
| Kerenidis | 2020 | Combination of dimensionality reduction and classification into a quantum classifier using MNIST. Accuracy was 98.5% |
| Velichko | 2020 | Neural network based on logistic mapping (LogNNet) for classification of handwritten digits on MNIST-10 |
| Sanghyeon An | 2020 | 3 CNN models with different kernel sizes using batch normalization and ReLU on MNIST. The accuracy achieved was 99.87% independently and 99.91% together |
| Pande | 2021 | Various machine learning methods for handwritten digit detection. SVM was best with an accuracy of 95.88% |
| Uehara et al. | 2021 | Quantum Neural Network on the MNIST dataset. Provided improvement for photovoltaic fault detection using quantum machine learning |
| Qi et al. | 2021 | End-to-end learning framework QTN-VQC on MNIST. QTN performance is better than other quantum embedding approaches such as PCA-VQC and Dense-VQC |
| Kaladgi | 2022 | Convolutional neural network for MNIST and EMNIST. The accuracy achieved was 80% |
| Kim et al. | 2022 | Quantum convolutional neural network using 2 qubits throughout the algorithm. Accuracy was found to be 99% for MNIST and 94% for Fashion MNIST |
Fig. 4 Sample of MNIST dataset
A total of 70,000 examples are included in the database, of which 60,000 are used for training and 10,000 for testing (Fig. 4) [2]. We employ Cirq, a Python toolkit for creating and manipulating quantum circuits, along with TensorFlow Quantum, a quantum machine learning framework for constructing hybrid quantum-classical ML models [27]; it is also used to execute those circuits on classical computers and quantum simulators [26]. The samples consist of 28 × 28 pixel images, which are down-sampled to 4 × 4 pixels because only a limited number of qubits are available. With a single readout qubit, the total number of qubits is 17. Two classes are used at a time; for example, the classes of 3s and 6s form a single classification task [20, 25]. Contradictory images, those assigned more than one label, are eliminated. The quantum neural network is trained over three epochs with a batch size of 32, for a total of 1080 iterations (Table 2). Here, training accuracy is reported as the average over each epoch, while validation accuracy is calculated at the end of every epoch.
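As an illustration of how such down-sampled images become quantum data, here is a hedged sketch in the style of the TensorFlow Quantum MNIST example; the per-pixel X-gate encoding is a common choice, and the function name is our own.

```python
# Encode one binarized 4x4 image as a Cirq circuit: each pixel with value 1
# gets an X gate on the corresponding data qubit; a 17th qubit (not shown)
# would serve as readout in the QNN itself.
import cirq
import tensorflow_quantum as tfq

def image_to_circuit(image_bits):
    qubits = cirq.GridQubit.rect(4, 4)      # 16 data qubits, one per pixel
    circuit = cirq.Circuit()
    for bit, qubit in zip(image_bits.flatten(), qubits):
        if bit:
            circuit.append(cirq.X(qubit))   # flip the qubit for a "1" pixel
    return circuit

# Circuits are then converted to tensors for a hybrid quantum-classical model:
# x_q = tfq.convert_to_tensor([image_to_circuit(img) for img in x_bits])
```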
Table 2 Results from quantum neural network

| Epoch | Training loss | Training accuracy | Validation loss | Validation accuracy |
|---|---|---|---|---|
| 1 | 0.6700 | 0.7846 | 0.3752 | 0.8720 |
| 2 | 0.3942 | 0.8287 | 0.8287 | 0.8287 |
| 3 | 0.3861 | 0.8617 | 0.3575 | 0.8886 |
After three epochs, the model achieves a training loss of 0.3861, a training accuracy of 86.17%, a validation loss of 0.3575, and a validation accuracy of 88.86%. For a fair comparison, a traditional model with 37 parameters is trained for 20 epochs with a batch size of 128, for a total of 1800 iterations (Table 3). After 20 epochs, this model achieves a training loss of 0.2224, a training accuracy of 88.03%, a validation loss of 0.2218, and a validation accuracy of 86.79% (Table 4). In summary, the traditional model is run for 20 epochs with a batch size of 128, totaling 1800 iterations, with a loss of 0.2218 and an accuracy of 86.79%; the quantum model, on the other hand, is run for 3 epochs with a batch size of 32, totaling 1080 iterations, with a loss of 0.3575 and an accuracy of 88.86% (Table 5).

Table 3 Results from fair traditional neural network (37 parameters)

| Epoch | Training loss | Training accuracy | Validation loss | Validation accuracy |
|---|---|---|---|---|
| 1 | 0.5973 | 0.7859 | 0.5527 | 0.7835 |
| 2 | 0.5324 | 0.8004 | 0.4955 | 0.7891 |
| 3 | 0.4726 | 0.8226 | 0.4408 | 0.8120 |
| 4 | 0.4180 | 0.8389 | 0.3926 | 0.8135 |
| 5 | 0.3725 | 0.8494 | 0.3537 | 0.8308 |
| 6 | 0.3370 | 0.8558 | 0.3238 | 0.8308 |
| 7 | 0.3101 | 0.8565 | 0.3015 | 0.8308 |
| 8 | 0.2901 | 0.8580 | 0.2843 | 0.8308 |
| 9 | 0.2747 | 0.8587 | 0.2709 | 0.8308 |
| 10 | 0.2628 | 0.8602 | 0.2602 | 0.8308 |
| 11 | 0.2535 | 0.8621 | 0.2522 | 0.8323 |
| 12 | 0.2461 | 0.8622 | 0.2450 | 0.8323 |
| 13 | 0.2403 | 0.8622 | 0.2398 | 0.8323 |
| 14 | 0.2357 | 0.8622 | 0.2350 | 0.8323 |
| 15 | 0.2320 | 0.8622 | 0.2315 | 0.8333 |
| 16 | 0.2291 | 0.8636 | 0.2285 | 0.8720 |
| 17 | 0.2268 | 0.8738 | 0.2263 | 0.8720 |
| 18 | 0.2250 | 0.8810 | 0.2245 | 0.8674 |
| 19 | 0.2235 | 0.8802 | 0.2230 | 0.8674 |
| 20 | 0.2224 | 0.8803 | 0.2218 | 0.8679 |
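For concreteness, the fair classical baseline can be sketched as follows, assuming TensorFlow; the paper does not specify its architecture, so the layer sizes below are our own assumption, chosen only because they yield exactly 37 trainable parameters on 4 × 4 inputs, and the loss choice is likewise illustrative.

```python
# A hedged sketch of a 37-parameter classical baseline on 4x4 binary inputs.
import tensorflow as tf

fair_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(4, 4, 1)),   # 16 binary pixels
    tf.keras.layers.Dense(2, activation="relu"),      # 16*2 + 2 = 34 params
    tf.keras.layers.Dense(1),                         # 2*1 + 1 = 3 params
])
fair_model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# fair_model.summary() reports 37 trainable parameters in total.
```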
Table 4 Comparison of traditional neural networks with quantum neural networks

| Model | Epochs | Batch size | Iterations | Loss | Accuracy % |
|---|---|---|---|---|---|
| Traditional | 20 | 128 | 1800 | 0.2218 | 86.79 |
| Quantum | 3 | 32 | 1080 | 0.3575 | 88.86 |

Table 5 Comparison of the execution time with the number of parameters

| Model | Number of parameters | Execution time (seconds) |
|---|---|---|
| Traditional | 37 | 5.748 |
| Quantum | 34 | 2208.147 |
Fig. 5 Comparison of quantum and traditional neural networks
The traditional model uses 37 parameters, whereas the quantum model uses 34. The traditional model took 5.748 s to execute, whereas the quantum model took 2208.147 s (Fig. 5).
4 Conclusion

A comparative analysis between the quantum-inspired neural network and the traditional neural network was carried out using the classical MNIST dataset. In a fair comparison, the performance was similar, with an accuracy of 88.86% for the QNN and 86.79% for the traditional neural network. The traditional model executed in around 6 s, whereas the quantum model took around 2209 s. The large difference in execution time is due to the use of classical data in the quantum model and to the quantum model being
a simulation of the actual process. Some issues remain in Quantum Neural Networks, but they are improving rapidly. Solving real-world classification and categorization problems is itself a huge success. With the advancement of quantum computing hardware and a noise-free environment, Quantum Neural Networks and Quantum Machine Learning promise to solve many problems that are beyond the capabilities of traditional models.
References

1. Purohit A, Chauhan SS (2016) A literature survey on handwritten character recognition. Int J Comput Sci Inf Technol (IJCSIT) 7(1):1–5
2. LeCun Y, Cortes C, Burges C (2010) The MNIST database of handwritten digits. Courant Institute, NYU; Google Labs, New York; Microsoft Research, Redmond. http://yann.lecun.com/exdb/mnist/
3. Zlokapa A, Neven H, Lloyd S (2021) A quantum algorithm for training wide and deep classical neural networks. arXiv:2107.09200
4. Kietzmann J, Demetis DS, Eriksson T, Dabirian A (2021) Hello quantum! How quantum computing will change the world. IT Prof 23(4):106–111
5. García-Martín D, Sierra G (2017) Five experimental tests on the 5-qubit IBM quantum computer. arXiv:1712.05642
6. Hevia JL, Peterssen G, Ebert C, Piattini M (2021) Quantum computing. IEEE Softw 38(5):7–15
7. Dai J, Lin S (2018) Image recognition: current challenges and emerging opportunities. Microsoft Research Lab, Asia (1 November 2018). www.microsoft.com/en-us/research/lab/microsoft-research-asia/articles/image-recognition-current-challenges-and-emerging-opportunities. Accessed 22 Dec 2019
8. Schaetti N, Salomon M, Couturier R (2016) Echo state networks-based reservoir computing for MNIST handwritten digits recognition. In: 2016 IEEE international conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC) and 15th international symposium on distributed computing and applications for business engineering (DCABES). IEEE, pp 484–491
9. Tabik S, Peralta D, Herrera-Poyatos A, Herrera Triguero F (2017) A snapshot of image preprocessing for convolutional neural networks: case study of MNIST
10. Alom MZ, Sidike P, Hasan M, Taha TM, Asari VK (2018) Handwritten Bangla character recognition using the state-of-the-art deep convolutional neural networks. Comput Intell Neurosci
11. Schott L, Rauber J, Bethge M, Brendel W (2018) Towards the first adversarially robust neural network model on MNIST. arXiv:1805.09190
12. Majumder S, von der Malsburg C, Richhariya A, Bhanot S (2018) Handwritten digit recognition by elastic matching. arXiv:1807.09324
13. Farhi E, Neven H (2018) Classification with quantum neural networks on near term processors. arXiv:1802.06002
14. Kadam SS, Adamuthe AC, Patil AB (2020) CNN model for image classification on MNIST and Fashion-MNIST dataset. J Sci Res 64(2):374–384
15. Kaziha O, Bonny T (2019) A comparison of quantized convolutional and LSTM recurrent neural network models using MNIST. In: 2019 international conference on electrical and computing technologies and applications (ICECTA). IEEE, pp 1–5
16. Kerenidis I, Luongo A (2020) Classification of the MNIST data set with quantum slow feature analysis. Phys Rev A 101(6):062327
17. Velichko A (2020) Neural network for low-memory IoT devices and MNIST image recognition using kernels based on logistic map. Electronics 9(9):1432
18. An S, Lee M, Park S, Yang H, So J (2020) An ensemble of simple convolutional neural network models for MNIST digit recognition. arXiv:2008.10400
19. Gope B, Pande S, Karale N, Dharmale S, Umekar P (2021) Handwritten digits identification using MNIST database via machine learning models. In: IOP conference series: materials science and engineering, vol 1022, no 1. IOP Publishing, p 012108
20. Dobson M, Uehara GS, Spanias A. Quantum neural network benchmarking with MNIST dataset
21. Qi J, Yang CHH, Chen PY (2021) QTN-VQC: an end-to-end learning framework for quantum neural networks. arXiv:2110.03861
22. Kaladgi Z, Gupta S, Jesawada M, Kandale V. Handwritten character recognition using CNN with extended MNIST dataset
23. Hur T, Kim L, Park DK (2022) Quantum convolutional neural network for classical data classification. Quantum Mach Intell 4(1):1–18
24. Baldominos A, Saez Y, Isasi P (2019) A survey of handwritten character recognition with MNIST and EMNIST. Appl Sci 9(15):3169
25. Wang M, Huang R, Tannu S, Nair P (2022) TQSim: a case for reuse-focused tree-based quantum circuit simulation. arXiv:2203.13892
26. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Zheng X (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283
27. Nagy M, Akl SG (2007) Quantum computing: beyond the limits of conventional computation. Int J Parallel Emergent Distrib Syst 22(2):123–135
28. Pandey P, Gupta R, Khan M, Iqbal S. Multi-digit number classification using MNIST and ANN
Network and Security
An Efficient and Secure Data Deduplication Scheme for Cloud Assisted Storage Systems with Access Control

Md. Nahiduzzaman, M. Shamim Kaiser, Muhammad R. Ahmed, and Marzia Hoque Tania
Abstract Cloud storage helps clients reduce the burden of data management and storage, attracting more data owners to keep their private information in the cloud. With the massive amounts of confidential information being created rapidly nowadays, storage efficiency has become a primary concern, making reliable data deduplication essential. Numerous deduplication algorithms have been developed to minimize storage requirements and network traffic, but those schemes are not efficient in real life. Besides, they also lack access control, as cloud service providers manage most of the operations. We propose a deduplication architecture that combines access control with convergent encryption to ensure the confidentiality of data uploaded to the remote server, along with the encryption keys. We use the SHA-256 hash function with Proof of Ownership (PoW) to achieve data integrity and authenticate users for secure deduplication.

Keywords Convergent encryption · Deduplication · Access control
1 Introduction

Digital data is increasing explosively in this big data era, and cloud computing has come into the scene to address this problem. With all of the advantages of cloud computing, cloud users are increasingly outsourcing their precious data to the cloud and sharing it with legitimate individuals. It considerably decreases the administrative
load of software and hardware overhead, enticing individuals and many businesses to store their sensitive data in far-away cloud storage. According to a recent report, data stored in the global data sphere will reach nearly 175 zettabytes by 2025, and a company can save 15% of its IT costs by migrating to the cloud. However, users lose direct control over outsourced data once it is uploaded to third-party providers, and Cloud Service Providers (CSPs) are not always honest and reliable [1]. So, the primary difficulty in cloud computing is determining how to effectively and securely verify whether the CSP is keeping its clients' data safe. Traditional schemes that use cryptography as the solution are not enough to provide efficiency and security at the same time [2]. To make matters worse, providers may occasionally ignore the value of private customer data and seldom delete rarely accessed records, particularly those of average consumers. Researchers are continually uncovering, and offering solutions to, data and storage security vulnerabilities in cloud computing [3, 4]. As cloud services become more accessible, more users tend to outsource their sensitive data to remote CSPs, which puts heavy pressure on the CSPs because storage is limited. This is where data deduplication [5] comes into the picture: the technique attempts to eliminate identical copies of the stored data. According to a recent EMC analysis, over 75% of contemporary digital material saved in the cloud is duplicated [6]. Storage costs can be reduced by more than 50%, and backup system costs by more than 95%, if deduplication is used [7]. When a cloud user attempts to upload a file (or block) that already resides on the server, the cloud system deletes the duplicate copy, reuses the existing file, and registers the user as one of the file's owners. Convergent encryption (CE) is a practical approach for protecting data privacy while achieving deduplication [8]. The main trick is as follows. By generating the hash value of the file's content, a user creates a convergent key that is then used with a symmetric key technique to encrypt and decrypt the data. To conserve storage space, the cloud user preserves the encryption key after encrypting the files and offloads the ciphertext to the remote cloud-based storage server. Since the hash of the same file is deterministic, the same file will produce the same convergent key, and thus the same ciphertext, allowing the cloud server to perform deduplication. Deduplication may significantly improve the efficiency of a storage system, but it is a challenging process [9]. The key challenges include randomization, which can slow down read operations; compression, which delays write operations; virtualization; and the cost of update operations. These issues, however, can be resolved with the assistance of a trustworthy third-party index server. This extra server keeps track of the randomized data placement and enables client-side deduplication. Additionally, access control is a crucial component of cloud computing that is commonly used in genuine cloud solutions [10]. Current techniques do not concurrently accomplish data confidentiality, access control, and resistance to brute-force attacks. Storage costs can be greatly reduced by both client-side and server-side deduplication, but uploading data to a cloud service provider that is not necessarily trustworthy can be a huge burden for cloud users.
If we can
design a deduplication scheme that is cost-effective and at the same time provides access control, it would be a great relief to cloud users. Our proposed system serves the following objectives:

– To propose a secure deduplication storage scheme that is efficient in computation overhead, saving energy while achieving security and reliability simultaneously.
– To develop a system that uses convergent encryption to provide authorization and access control for data outsourced to remote cloud-based storage servers.
– To enable block-level and file-level deduplication in a single scenario.

The remainder of the paper is organized as follows: related work is described in Sect. 2, the system model and proposed storage scheme are covered in Sects. 3 and 4 respectively, and the performance and security analyses are covered in Sect. 5. Section 6 concludes the paper.
2 Related Research Works

This section reviews related work on cloud deduplication. The introduction of Convergent Encryption (CE) made it possible to perform both file-level [11] and block-level [12] deduplication. Li et al. [13] implemented separate key servers to thwart brute-force attacks, making use of a Bloom filter-based PoW method, adopting a hierarchical storage architecture, and distributing unauthorized material to avoid paying for additional services. As the message length increases, cost efficiency decreases, but the communication overhead is the lowest in this scheme. The scheme can save space and improve query speed by applying a rate-limiting strategy, but the key servers can be vulnerable to attacks. Zhang et al. [14] used server-side MLE and allowed routine key server changes to free users from depending on a single key server to resist brute-force attacks; relying on a single key server's dependability would otherwise present a single point of failure. To achieve 80-bit security, DECKS bears lower communication and computation costs. This scheme does not need to trust a fixed group of key servers for lifelong data protection, but malicious users were not considered. Zhang et al. [15] proposed HealthDep, which exploits the low entropy of real-world eHealth systems as well as cross-patient duplicate EMRs arising when patients visit the same department, to avoid dependency on single key servers. It takes less than a minute to generate MLE keys for 100 patients, which shows an improvement in efficiency and reduces costs by more than 65%. Although denial-of-service (DoS) attacks are not addressed by this system, it is immune to brute-force attacks. In a cloud environment, user revocation is a serious problem that is not addressed in most schemes. Yuan et al. [16] proposed a site selection technique based on a deduplication approach with efficient re-encryption and a Bloom filter. This method ensures data integrity while using just 63.2% and 41.5% of the computation time for data re-encryption and re-decryption respectively, ensures confidentiality, and supports user revocation and user joining. However, the key servers are assumed to be
trusted, which may not be true. As the number of joining users increases, the existing scheme cannot adapt and is vulnerable to ownership revocation security flaws. Yuan et al. [17] proposed a dynamic user management scheme that ensures deduplication by exploiting re-encryption techniques, freeing users from a TPA. Due to ownership changes in the ownership list, this scheme takes more time in key generation but supports access control, user joining, and user revocation. To support fault tolerance, Singh et al. [18] dispersed the key's value among several servers by dividing it into random portions. The suggested system has a 25% deduplication rate, which yields somewhat better outcomes than competing designs, but the technique is prone to brute-force attacks. Fan et al. [19] introduced a scheme where users encrypt their data using semantically secure encryption and can access files on the server by presenting the appropriate permission set. Users with m privileges incur a cost of 64m + 32 bytes for data storage, which provides the first secure and reliable TEE-based key management. Despite not requiring a third party, this approach requires more computing time and is open to brute-force attacks. The majority of methods are unable to simultaneously ensure tag consistency, access control, privacy, and resistance to brute-force attacks. Yang et al. [20] let users pick the symmetric key at random rather than hashing the data directly, combining the Boneh-Goh-Nissim cryptosystem with the composite-order bilinear pairing technique. File-level deduplication is preferable to chunk-level deduplication in this case, and the communication overhead is the lowest, but message authentication and user revocation are not considered. The same data creates unique ciphertexts and integrity tags when encrypted with randomized keys and different user signing keys. By utilizing a homomorphic authenticator with a message-derived private key, Liu et al. [21] accomplish integrity tag deduplication. In this approach, the greater the number of duplicate chunks, the more efficient the storage. In the random oracle model, this system is secure against adaptive chosen-message attacks but works only on CE-based cloud storage.
3 System Model

The proposed system architecture, shown in Fig. 1, contains the full system workflow. Our proposed scheme consists of three entities:

Cloud Users (CU): Cloud users (also known as group members) are the authorized users of the cloud. When the system is set up for the first time, they need to register themselves with the trusted server. They have access control over uploading and downloading their own outsourced data.

Trusted Third-Party Server (TTPS): The Trusted Third-Party Server manages user registration, user revocation, and the generation of system parameters. This server stores the server id and memory location of each data block. Corresponding file authentication tags are also stored in the trusted server before the data is outsourced to cloud storage by a user.
Fig. 1 Proposed system model
At the same time, the TTPS saves a copy of each outsourced data block's SHA-256 hash root so that it can be used for duplicate checking later.

Cloud Storage Provider (CSP): The Cloud Service Provider manages the storage servers. Users upload their data to these remote servers and benefit from their elasticity without having to worry about maintaining local data storage.
4 Our Proposed Storage Scheme

In this section we outline the layout of our deduplication scheme, along with the features of each module. The proposed storage scheme is shown in Fig. 2.
4.1 Design

Our system uses convergent encryption, in which the hash value of the file serves as the encryption-decryption key. Our proposed scheme exploits this property to ensure that the same file always yields the same ciphertext, which makes deduplication possible; the encryption technique also helps ensure storage efficiency. To increase efficiency further, we implement both file-level and block-level deduplication. In our proposed storage scheme, data is first fragmented into k file blocks, and convergent encryption is then applied to those k fragmented blocks. In convergent encryption, identical data blocks generate identical ciphertexts.
314
Md. Nahiduzzaman et al.
Server 2
Server 1 C1
C1
Server N Cn
C2
C2
Ck
Cn-1
Cn
Convergent Encryption B1
B2
Bk-1
Bk
Fragmentation
File F
Fig. 2 Proposed system model
First, a hash value is calculated from each fragmented block, and these hash values are used as keys for encryption. The encrypted data is then dispersed to multiple storage servers.
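As a concrete illustration, here is a minimal sketch of this step for a single block, assuming pycryptodome (the library used later in the evaluation); the CTR mode with a fixed nonce is chosen only to make encryption deterministic for illustration, and the function names are our own.

```python
# Convergent encryption of one block: the key is the SHA-256 hash of the
# block itself, so identical blocks always produce identical ciphertexts.
from hashlib import sha256
from Crypto.Cipher import AES

BLOCK_SIZE = 256  # bytes per fragment, matching the smallest size evaluated later

def fragment(data: bytes, size: int = BLOCK_SIZE):
    """Split a file into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def convergent_encrypt(block: bytes):
    """Encrypt one block under a key derived from its own hash."""
    key = sha256(block).digest()            # 32-byte convergent key (AES-256)
    auth_tag = sha256(block).hexdigest()    # hash tag for duplicate checking
    # Deterministic mode so identical blocks yield identical ciphertexts;
    # the fixed nonce is for illustration only.
    cipher = AES.new(key, AES.MODE_CTR, nonce=b"\x00" * 8)
    return auth_tag, cipher.encrypt(block), key
```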
4.2 Data Processing

In data processing, a plaintext file F is taken as input and the output is multiple encrypted data blocks. The sequential operations are given below.

User Registration: The trusted third-party server operates as the group manager. Users must establish a connection with the trusted server in order to register with the system for the first time.

Key Generation: Members of the same group get a user id, which is employed in the generation of the convergent key.

Blocking and Encryption: A single file is fragmented into multiple fixed-size blocks, and the convergent key encrypts the fragmented file blocks. The same block always gives the same ciphertext.

HashTag Generation: After that, authentication tags for both the file and the data blocks are generated by hashing them, and they are stored in the trusted server. The encrypted blocks are then ready to upload.
The procedure is summarized in Algorithm 4.1.
Algorithm 4.1: Data Processing Algorithm

    input:  F                               /* plaintext file */
    output: <CT_i>                          /* cipher blocks */
    U_i <- UReg()                           /* user registration */
    K_c <- Hash(F)                          /* convergent key */
    <B_i> <- BlockCreation(F, n)
    CT <- Encryption(K_c, F)
    for each B_i in <B_i> do
        CT_i <- Encryption(K_c, B_i)        /* encrypted blocks */
    for each CT_i in <CT_i> do
        aTag_Ci <- authTagGen(CT_i)         /* block tags */
    DisperseToServers(<CT_i, K_c>, aTag_F)  /* upload */
4.3 Data Dispersal

During the upload phase, two scenarios can occur, as shown in Fig. 3: either the file is new and is being submitted for the first time, or it already exists on the server. If the file already exists, it is flagged as a duplicate and the server checks the Proof of Ownership (PoW) for it. The PoW of a file or block is confirmed by a challenge-response protocol between users and servers. If the PoW is valid for that file, the new copy is discarded and a block pointer is returned to the user, who is recorded as a data holder. If the proof of ownership is not valid, the request is treated as coming from a malicious adversary. If the file is not a duplicate, the server first checks whether its data blocks are duplicated. If a data block is duplicated, the PoW for the block is checked, and if it passes, a block pointer of ownership is returned to the user; otherwise, the block is ignored and reported as a malicious adversary. However, if a data block is not duplicated and has a correct PoW, it is uploaded to the server as new data; otherwise, it is likewise ignored and reported as a malicious adversary. After a block is uploaded, the index server keeps track of its server id and memory location. Our scheme can therefore apply deduplication at both file and block levels in a public cloud environment.
Fig. 3 Data dispersal flowchart
4.4 Data Restoration

The index server (IS) knows the server id and memory location of each related data block. Whenever a user requests a specific file, the index server issues a query using the relevant server ids and memory addresses. This retrieves the ciphertexts, which are then decrypted to recreate the file.
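A matching sketch of restoration, reusing the deterministic AES-CTR choice from the encryption example above; fetch_block and the entry format are our own assumptions.

```python
# Rebuild a file from its dispersed, convergent-encrypted blocks.
from Crypto.Cipher import AES

def restore_file(entries, convergent_key, fetch_block):
    """entries: ordered (server_id, memory_location) pairs from the index server."""
    blocks = []
    for server_id, location in entries:
        ciphertext = fetch_block(server_id, location)  # assumed network call
        cipher = AES.new(convergent_key, AES.MODE_CTR, nonce=b"\x00" * 8)
        blocks.append(cipher.decrypt(ciphertext))
    return b"".join(blocks)
```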
5 Security and Performance Analysis

5.1 Security Analysis

Confidentiality: On the cloud servers, information is kept in ciphertext form. Furthermore, the blocks are stored at random locations, making it impossible for a malicious attacker to obtain actual data from the storage servers without knowing the correct memory location sequence and the matching server ids. The index server is assumed to be honest but curious. Since only the index server is aware of the server ids and memory sequence, our design ensures confidentiality.

Access Control: Users lose direct control over their data when they outsource sensitive data to a faraway cloud. By implementing proof-of-ownership management and authentication tag generation, our approach can eliminate redundant copies without violating access control.
Table 1 Simulation environment

| Device name | RAM | OS | Upload speed | Download speed | Library |
|---|---|---|---|---|---|
| Asus K555L Laptop | 8 GB DDR3 | Windows 10 | 3 Mbps | 5 Mbps | Crypto++, Pycryptodome |
In other words, it still ensures that every authorized user can correctly decrypt the outsourced encrypted data, while illegitimate users cannot.

Integrity: We store the hash root of each file and data block, which effectively supports data integrity and public auditing.

Resistance to Brute-Force Attacks: Our system is immune to brute-force attacks because we implement convergent encryption with the AES-256 encryption algorithm. AES-256 is essentially impregnable to brute-force methods: while a 56-bit DES key can be cracked within a day using current computing power, AES would take billions of years to crack. Hackers would be unwise to undertake such an attack.

Fault Tolerance: Our system runs on 16 servers, and because data is randomly distributed and encrypted across them, it can withstand attacks well. In our design, it is always possible to recover a piece of the data unless all servers are under attack.
5.2 Implementation and Performance Evaluation

The performance of the system is assessed through simulation on a Windows laptop equipped with an Intel Core i5 processor and 8 GB of DDR3L RAM. Python's pycryptodome library is used to implement AES-256 encryption and decryption, and the remaining operations are implemented in C++. We use 16 storage servers to disperse our data (Table 1). Table 2 shows the performance analysis of our scheme. To evaluate the effectiveness of the system, we used a range of block sizes for a range of file sizes, with key generation time, upload time, and download time as performance metrics; the upload and download times include the duration of encryption and decryption. Table 2 demonstrates that the key generation time is almost constant for all files. A larger block size yields fewer blocks, while a smaller block size improves the effectiveness of deduplication. If the block size is fixed at 256 bytes instead of 1 KB, the scheme takes 75% less
Table 2 Execution time (s) of our scheme

| Block size | File size | Total blocks | Key generation | Upload | Download |
|---|---|---|---|---|---|
| 256 byte | 256 KB | 1024 | 0.014 | 7.6 | 7.7 |
| 256 byte | 512 KB | 2048 | 0.016 | 12.5 | 15.4 |
| 256 byte | 1 MB | 4096 | 0.016 | 26.6 | 27.0 |
| 512 byte | 256 KB | 512 | 0.015 | 3.2 | 3.8 |
| 512 byte | 512 KB | 1024 | 0.016 | 5.9 | 7.1 |
| 512 byte | 1 MB | 2048 | 0.015 | 13.2 | 15.0 |
| 1 KB | 256 KB | 256 | 0.012 | 1.8 | 1.8 |
| 1 KB | 512 KB | 512 | 0.016 | 3.3 | 3.8 |
| 1 KB | 1 MB | 1024 | 0.017 | 6.8 | 8.0 |
time for a 1 MB file. Additionally, we see that even for 512 KB files, fragmentation, encryption (or decryption), upload, and download take less than 76% of the total time if the block size is decreased by a fourth.
6 Conclusion

Cloud computing may reduce storage costs, but duplicate data can hinder the cloud's successful real-life implementation; effective data deduplication with access control can solve this problem. Convergent encryption with information dispersal can simultaneously ensure data confidentiality and reliability without the help of a hybrid cloud structure. Besides, this scheme achieves data integrity with a simple hash function, giving good service with the lowest computation overhead. The efficiency of our proposed method is demonstrated through chunk-level and file-level deduplication. In the future, we will try to incorporate redundant servers into our system.
References

1. Asif-Ur-Rahman M, Afsana F, Mahmud M, Kaiser MS, Ahmed MR, Kaiwartya O, James-Taylor A (2018) Toward a heterogeneous mist, fog, and cloud-based framework for the internet of healthcare things. IEEE Internet Things J 6(3):4049–4062
2. Khan M, Alanazi AS, Khan LS, Hussain I (2021) An efficient image encryption scheme based on fractal Tromino and Chebyshev polynomial. Complex Intell Syst 7(5):2751–2764
3. Nahiduzzaman M, Tasnim M, Newaz NT, Kaiser MS, Mahmud M (2020) Machine learning based early fall detection for elderly people with neurological disorder using multimodal data fusion. In: International conference on brain informatics. Springer, pp 204–214
4. Al-Amin S, Sharkar SR, Kaiser MS, Biswas M (2021) Towards a blockchain-based supply chain management for e-agro business system. In: Proceedings of international conference on trends in computational and cognitive engineering. Springer, pp 329–339
5. Meyer DT, Bolosky WJ (2012) A study of practical deduplication. ACM Trans Storage (TOS) 7(4):1–20
6. Fu Y, Jiang H, Xiao N, Tian L, Liu F, Xu L (2013) Application-aware local-global source deduplication for cloud backup services of personal storage. IEEE Trans Parallel Distrib Syst 25(5):1155–1165
7. Harnik D, Pinkas B, Shulman-Peleg A (2010) Side channels in cloud services: deduplication in cloud storage. IEEE Secur Priv 8(6):40–47
8. Li J, Chen X, Li M, Li J, Lee PPC, Lou W (2014) Secure deduplication with efficient and reliable convergent key management. IEEE Trans Parallel Distrib Syst 25(6):1615–1625
9. Islam T, Mistareehi H, Manivannan D (2019) SecReS: a secure and reliable storage scheme for cloud with client-side data deduplication. In: 2019 IEEE global communications conference (GLOBECOM). IEEE, pp 1–6
10. Zhou Y, Feng D, Hua Y, Xia W, Fu M, Huang F, Zhang Y (2018) A similarity-aware encrypted deduplication scheme with flexible access control in the cloud. Future Gener Comput Syst 84:177–189
11. Douceur JR, Adya A, Bolosky WJ, Simon P, Theimer M (2002) Reclaiming space from duplicate files in a serverless distributed file system. In: Proceedings 22nd international conference on distributed computing systems, pp 617–624
12. Quinlan S, Dorward S (2002) Venti: a new approach to archival data storage. In: Conference on file and storage technologies (FAST 02)
13. Li S, Xu C, Zhang Y (2019) CSED: client-side encrypted deduplication scheme based on proofs of ownership for cloud storage. J Inf Secur Appl 46:250–258
14. Zhang Y, Xu C, Cheng N, Shen X (2019) Secure encrypted data deduplication for cloud storage against compromised key servers. In: 2019 IEEE global communications conference (GLOBECOM)
15. Zhang Y, Xu C, Li H, Yang K, Zhou J, Lin X (2018) HealthDep: an efficient and secure deduplication scheme for cloud-assisted eHealth systems. IEEE Trans Ind Inf 14(9):4101–4112
16. Yuan H, Chen X, Li J, Jiang T, Wang J, Deng RH (2022) Secure cloud data deduplication with efficient re-encryption. IEEE Trans Serv Comput 15(1):442–456
17. Yuan H, Chen X, Jiang T, Zhang X, Yan Z, Xiang Y (2018) DedupDUM: secure and scalable data deduplication with dynamic user management. Inform Sci 456:159–173
18. Singh P, Agarwal N, Raman B (2018) Secure data deduplication using secret sharing schemes over cloud. Future Gener Comput Syst 88:156–167
19. Fan Y, Lin X, Liang W, Tan G, Nanda P (2019) A secure privacy preserving deduplication scheme for cloud computing. Future Gener Comput Syst 101:127–135
20. Yang X, Lu R, Shao J, Tang X, Ghorbani AA (2022) Achieving efficient secure deduplication with user-defined access control in cloud. IEEE Trans Dependable Secure Comput 19(1):591–606
21. Liu X, Sun W, Lou W, Pei Q, Zhang Y (2017) One-tag checker: message-locked integrity auditing on encrypted cloud deduplication storage. In: IEEE INFOCOM 2017, IEEE conference on computer communications, pp 1–9
Priority-Based Intelligent Reflecting Surface for Uplink 6G Communication

Binodon, Md. Rafatul Haque, Md. Amirul Hasan Shanto, Amit Karmaker, and Md. Abir Hossain
Abstract The sixth-generation (6G) wireless communication network will be implemented to launch an intelligent information system that is extensively digitized, driven by intelligence, and guided by global data. The 6G network can serve a large number of devices with extremely low network latency while also being more reliable and convenient, but the Line-of-Sight (LOS) blockage constraint is a major challenge for expanding coverage. The concept of the Intelligent Reflecting Surface (IRS) has recently drawn a great deal of interest as a promising approach for increasing wireless coverage, cost-effectiveness, and power efficiency in 6G wireless communication. To improve communication performance, the IRS is able to dynamically reconfigure wireless channels. The majority of recent publications on IRS consider frequency-flat channels and presuppose that the transmitter has perfect knowledge of the channel state information (CSI). A Priority-based Intelligent Reflecting Surface for uplink 6G communication is proposed in this paper for frequency-selective channels, together with a practical uplink subcarrier channel estimation. While determining the minimum subcarrier bandwidth, a stringent end-to-end latency requirement of 0.1 ms is maintained for different 6G network services, ensuring faster and more effective data transmission with a reliability of 99.9999%.

Keywords 6G · IRS · Subcarrier-frequency · Latency
1 Introduction

Communication over cellular networks is growing at a skyrocketing pace, and recent technological developments have significantly altered society. Fifth-Generation (5G) standardization has been accomplished, and deployment has begun in several countries all over
the world [1]. As of January 2022, 5G networks had been deployed in 72 countries, with roughly 1,947 cities having 5G coverage [2]. The International Telecommunication Union (ITU) anticipates that the flow of information will reach five zettabytes (ZB, 1 ZB = 10^21 bytes) per month by 2030 as a result of the inevitable increase in connected devices, and 5G will not be able to handle all of the needs of that time and beyond [3]. Beyond 5G (B5G)/6G wireless systems are anticipated to offer a massive rise in data throughput, extremely low latency, improved connectivity for a huge number of devices, and enhancements in network energy efficiency. The expected features of 6G communication technology include a high data rate (≥1 Tbps), a high operating frequency (≥1 THz), minimal end-to-end latency (≤1 ms), and high reliability (≥99.9999%) [4]. New enabling technologies such as cell-free communication, mobile edge computing, visible light communication, virtual reality, terahertz communication, backhaul networks, 3D networking, unmanned aerial vehicles, and holographic beamforming will be required for B5G/6G networks [5]. Designing a real-time reconfigurable propagation environment is becoming more popular as researchers look for network topologies that go beyond 5G (B5G)/6G. This may be accomplished by deploying special surfaces known as Intelligent Reflecting Surfaces (IRS). For B5G/6G wireless communication systems, the IRS has emerged as an interesting new strategy for achieving intelligent and dynamically configurable uplink wireless channels and radio propagation environments [6, 7]. The IRS is formed of many individual reflecting elements, each of which may autonomously reconfigure the amplitude and/or phase of the incident signal under the control of the IRS smart controller [8]. The IRS enhances coverage by establishing a virtual Line-of-Sight (LOS) connection between mobile stations and the base station/wireless access point, thereby bypassing the barrier between them. From a communication perspective, the development of IRS-assisted network systems is intriguing due to various novel constraints. The IRS enjoys competitive strengths over conventional active relays since it operates in a full-duplex (FD) manner, with no amplification of antenna noise and no self-interference. The use of the IRS as a supplementary capability to assist covert communications, unmanned aerial vehicle (UAV) communications, and wireless power transmission has also been explored [9]. IRSs are often low-profile and lightweight, and they may be easily installed on ceilings, walls, furniture, and behind paintings or decorations, as well as coated on building facades, lampposts, advertising boards, and even the surfaces of fast-moving vehicles. Figure 1 illustrates IRS installation on various objects. Furthermore, because IRSs are supplementary devices, they are accessible to mobile stations and can be quickly adopted into current wireless connections without changing physical-layer standards. As analyzed in [8, 10], IRS-assisted wireless systems and networks confront novel challenges such as channel estimation, reflection optimization, and IRS configuration strategy. Prior work on the transmit architecture of IRS-assisted wireless communication systems considered the single-user case [11, 12], wireless power transmission [14], and physical-layer security [9].
Furthermore, [13] considers scenarios with only one IRS, while [15, 16] consider scenarios with multiple IRSs.
Fig. 1 IRS installation for obstacle-based uplink communication
To overcome channel estimation issues, [17, 18] proposed various channel estimation schemes and argued that, for a successful OFDMA-based broadband communication system, the number of subcarriers should be larger than or equal to the number of users [18]. Existing IRS-assisted works are based on uplink transmit power and path loss for pre-defined subcarrier channels. In this paper, a Priority-based Intelligent Reflecting Surface for uplink 6G communication is proposed for frequency-selective channels, together with a practical uplink subcarrier channel estimation. We calculate the minimum number of subcarriers needed to achieve a transmission reliability of 99.9999% within a latency of 0.1 ms. The primary objective of this research is to use 6G network services for quicker and more efficient data transmission while maintaining a strict end-to-end latency of 0.1 ms. Packet duplication improves data transmission reliability, but to avoid intra-packet collisions we adopt 3- and 5-packet duplication.
Following [19, 20], we divide 6G services into five categories, namely Emergency services, Ubiquitous 6G services, Autonomous 6G services, Public 6G services, and Expectance services, to achieve the 6G standard.
2 System Model

Utilizing 6G, we propose a system with a low latency of 0.1 ms and 99.9999% reliability. Barrier-free communication is impractical when trying to establish line-of-sight links, so some obstacles were taken into account in our suggested model. These obstructions cause multipath fading and time-alignment problems. We used a modified intelligent reflecting surface to meet the 6G criterion. Figure 2 depicts an IRS-assisted multi-user high-speed wireless uplink communications network constructed using multiple customized IRSs.
Fig. 2 System model
By reducing latency and boosting reliability, these IRSs improve communication between the Base Station (BS) and the Mobile Stations (MS). Each MS carries different sorts of services (Emergency, Ubiquitous, Autonomous, Public, and Expectance services), and each service is given a priority. One mobile station can transmit only a single service during each transmission. For transmission without obstacles, the services are identified and detected at the BS, while for transmission with obstacles, identification and detection take place in our customized IRS. Based on their priority, the transmitted packets are further processed. In our proposed system, the system bandwidth B is distributed among the IRSs, so each IRS has bandwidth B_IRS. From B_IRS, following the bandwidth allocation model from [22, 23], we allot a service bandwidth B_ser_x to each service:

$$B_{ser_x} = \{B_{ser_1}, B_{ser_2}, B_{ser_3}, \ldots, B_{ser_n}\} \tag{1}$$
Each $B_{ser_x}$ contains a set of frequencies, and each data packet randomly occupies a subcarrier frequency $f_c$ from the $B_{ser_x}$ frequency set:

$$f_c \xleftarrow{R} B_{ser_x} \tag{2}$$
In our IRS-assisted uplink-communication scheme, it occasionally happens (during catastrophes, natural calamities, congregations, gatherings, etc.) that some of the assigned service bandwidths are idle due to infrequent use, while others are extremely busy due to frequent use. This decreases the reliability of the system, so reassigning the bandwidth in accordance with demand is a good solution. We adopted the page replacement algorithms LRU (Least Recently Used) and MRU (Most Recently Used) [24] for this situation to properly utilize and reassign the bandwidth B_IRS. The MRU policy finds the B_ser_x bandwidths that are used most frequently, whereas the LRU policy finds those that are used least frequently. To reassign B_ser_x, the bandwidth allocation algorithm from [22] is used. This increases the system's reliability, reduces latency, and makes efficient use of the resources.
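As an illustration of this reassignment step, the following Python sketch tracks per-service usage with an LRU/MRU ordering and shifts one subcarrier-sized slice of bandwidth from the least recently used service to the most recently used one. The class and function names, the 10 KHz slice size, and the one-slice-per-round policy are our own assumptions for illustration; the paper relies on the allocation algorithm of [22] for the actual reassignment.

```python
from collections import OrderedDict

class ServiceBandwidthTracker:
    """Keeps services ordered from least recently used (front) to most
    recently used (back), as in LRU/MRU page replacement."""

    def __init__(self, services):
        self.usage = OrderedDict((s, 0) for s in services)

    def record_use(self, service):
        # Re-inserting moves the service to the MRU end; the counter is
        # kept only for inspection.
        self.usage[service] = self.usage.pop(service) + 1

    def lru_service(self):
        return next(iter(self.usage))      # idle candidate (donor)

    def mru_service(self):
        return next(reversed(self.usage))  # busy candidate (receiver)

def reassign_bandwidth(bandwidth_khz, tracker, slice_khz=10):
    """Move one subcarrier-sized slice from the LRU to the MRU service."""
    donor, receiver = tracker.lru_service(), tracker.mru_service()
    if donor != receiver and bandwidth_khz[donor] >= slice_khz:
        bandwidth_khz[donor] -= slice_khz
        bandwidth_khz[receiver] += slice_khz
    return bandwidth_khz

services = ["Emergency", "Ubiquitous", "Autonomous", "Public", "Expectance"]
bandwidth_khz = {s: 100 for s in services}   # equal initial allocation
tracker = ServiceBandwidthTracker(services)
for s in ["Emergency", "Emergency", "Public", "Emergency"]:
    tracker.record_use(s)
print(reassign_bandwidth(bandwidth_khz, tracker))
# After these accesses, "Ubiquitous" (never used) donates 10 KHz to "Emergency".
```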
3 Result and Discussion

We assess the effectiveness of our suggested IRS-assisted uplink-communication system in this section. We compute the minimum number of subcarrier channels required to attain 99.9999% reliability, which supports the B5G/6G network standard. The simulation results are prepared using the MATLAB simulator, with the parameter values listed in Table 1.
Table 1 Simulation parameters

  Parameter             Value
  Mobile stations       100
  IRSs                  4
  Bandwidth             6 GHz
  Subcarrier bandwidth  10 KHz
  Link speed            1 Mbps
  Packet size           100 bits
  Packet duplication    3 and 5
  Arrival rate          500–10,000 pkt/s
  Slot duration         0.1 ms
  Simulation time       1000 s
Table 2 Comparison of various existing system models with the proposed protocol

  Ref        System model          Subcarrier  IRS     User  Distance (m)  Reliability (%)  Latency (ms)
  [25]       MIMO, LOS             512         Single  −     10            99.999           ≤10
  [26]       Multiple user, MISO   64          Single  3     1             99.999           ≤5
  [27]       Single antenna, SISO  128         Single  1     2             −                −
  [18]       Multi user            16          Single  −     1.5           99.999           ≤5
  [28]       Massive MIMO          −           Two     3     4             99.999           1
  [15]       Downlink, LOS         −           Three   6     −             −                −
  This work  Uplink, MIMO          40–336      Four    100   3–6           99.9999          0.1

'−' denotes that the work does not consider or mention the item. MIMO: Multiple Input Multiple Output; SISO: Single Input Single Output; LOS: Line of Sight
The proposed network considers 100 mobile stations, and each mobile station can send data for five different types of 6G services at various times to suit its needs. These MSs can transmit 3- and 5-packet duplications using a variety of subcarrier channels with varying arrival rates; random subcarrier selection helps maintain minimum latency. The Poisson arrival process is used to generate and transmit packets. The mobile stations send packets through the slotted-ALOHA system, and each slot duration is 0.1 ms. Table 2 presents the comparison of existing models with the proposed model. Most of the related works consider a fixed subcarrier frequency; therefore, to achieve diversity and better adaptability, we have varied the usage of carrier frequency with packet duplication. In Sect. 3.1, we determine the minimum number of subcarriers, and Sect. 3.2 shows the analysis of the reliability response.
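To make the evaluation setup concrete, the sketch below estimates delivery reliability by Monte Carlo simulation under stated assumptions: Poisson packet arrivals, slotted ALOHA with 0.1 ms slots, each packet duplicated over distinct randomly chosen subcarriers, and a copy counted as delivered only when no other copy lands on the same subcarrier in the same slot. This collision model and the slot count are our simplifying assumptions, not the paper's exact MATLAB setup.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def simulate_reliability(arrival_rate, n_subcarriers, duplication,
                         slot_s=1e-4, sim_slots=200_000):
    """Estimate the fraction of packets with at least one collision-free copy.

    arrival_rate  : aggregate Poisson arrival rate in pkt/s
    n_subcarriers : subcarrier channels available in each slot
    duplication   : copies sent per packet (3 or 5 in the paper)
    """
    delivered = attempted = 0
    lam = arrival_rate * slot_s              # mean new packets per slot
    for _ in range(sim_slots):
        n_packets = rng.poisson(lam)
        if n_packets == 0:
            continue
        # Each packet picks `duplication` distinct subcarriers at random
        picks = [rng.choice(n_subcarriers, size=duplication, replace=False)
                 for _ in range(n_packets)]
        load = Counter(sc for copies in picks for sc in copies)
        for copies in picks:
            attempted += 1
            # Success if any copy occupied a subcarrier alone in this slot
            if any(load[sc] == 1 for sc in copies):
                delivered += 1
    return delivered / attempted if attempted else 1.0

# Rough check of one operating point from Sect. 3.1: 5-packet duplication,
# 92 subcarriers, 5,000 pkt/s. Resolving six-nines reliability would need
# far more simulated packets (>= 1e7) than this quick run provides.
print(simulate_reliability(5_000, 92, 5))
```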
Fig. 3 Subcarrier channels required to attain a reliability of 99.9999% for 3- and 5-packet duplication (y-axis: sub-carrier channels, 0–400; x-axis: arrival rate, 0–10,000 pkt/s)
3.1 Subcarrier Channel

The minimum number of subcarrier channels is evaluated under different arrival conditions and multiple packet duplications with respect to attaining a reliability of 99.9999% in Fig. 3. This result is calculated for 3- and 5-packet duplication at various arrival rates. At a 10,000 pkt/s arrival rate, only 148 subcarrier channels are needed for 5-packet duplication to achieve 99.9999% reliability, whereas 3-packet duplication requires 336 subcarrier channels. In comparison to 5-packet duplication, 3-packet duplication thus requires the inclusion of more subcarrier channels in the system. At a 5,000 pkt/s arrival rate, 92 subcarrier channels are required for 5-packet duplication to achieve 99.9999% reliability, whereas 156 subcarrier channels are needed for 3-packet duplication. These results indicate that the proposed scheme requires a minimal number of channels to meet the B5G/6G network standard at low arrival rates; however, to attain 6G reliability and latency, we also have to consider higher arrival conditions. From the overall evaluation, we can say that 5-packet duplication reaches 99.9999% reliability with fewer subcarriers than 3-packet duplication across the possible arrival conditions, so 5-packet duplication performs better in our proposed system.
3.2 Reliability

For the reliability analysis, we considered a total of 100 subcarrier channels in the proposed system. Figure 4 shows the achieved reliability response under different arrival conditions using 3- and 5-packet duplication.
Fig. 4 Reliability achieved for 3- and 5-packet duplication (y-axis: reliability, 99.9988–99.9998%; x-axis: arrival rate, 0–10,000 pkt/s)
At the 10,000 pkt/s arrival rate, the proposed system attained 99.999677% reliability for 5-packet duplication, whereas 3-packet duplication achieved 99.998847% reliability. The performance of 5-packet duplication is better than that of 3-packet duplication at higher arrival conditions. At an arrival rate of 5,000 pkt/s, 5-packet duplication achieved 99.999967% reliability, whereas 3-packet duplication showed 99.999796% reliability. With 100 subcarrier channels, the proposed system thus offers a reliability that fulfills the B5G/6G network benchmark in most cases where the arrival rate is around 5,000 pkt/s.
4 Conclusion

In this article, we propose a Priority-based Intelligent Reflecting Surface for uplink 6G Communication. To this end, we categorize 6G services and consider them in a priority-based IRS-assisted transmission system. In this system, the minimum number of subcarrier channels is calculated for 3- and 5-packet duplication to satisfy the B5G/6G recommended reliability of 99.9999% and latency of 0.1 ms. The simulation results exhibit the efficacy of the proposed scheme. As future work, we plan to evaluate the impact of using a diverse number of IRSs on uplink communication, calculate the packet transmission power, and determine how many packets of the different 6G services are successfully transmitted from a fixed number of mobile stations.
References

1. Sejan MAS, Rahman MH, Shin BS, Oh JH, You YH, Song HK (2022) Machine learning for intelligent-reflecting-surface-based wireless communication towards 6G: a review. Sensors 22(14):5405
2. New 5G Cities in 2021. https://www.viavisolutions.com/en-us/news-releases/635-new-5g-cities-2021-1947-5g-cities-globally-according-viavi. Accessed 20 August 2022
3. Tariq F, Khandaker MR, Wong KK, Imran MA, Bennis M, Debbah M (2020) A speculative study on 6G. IEEE Wirel Commun 27(4):118–125
4. Hassan B, Baig S, Asif M (2021) Key technologies for ultra-reliable and low-latency communication in 6G. IEEE Commun Stand Mag 5(2):106–113
5. Liu Q, Sarfraz S, Wang S (2020) An overview of key technologies and challenges of 6G. In: International conference on machine learning for cyber security. Springer, Cham, pp 315–326
6. Wu Q, Zhang R (2018) Intelligent reflecting surface enhanced wireless network: joint active and passive beamforming design. In: 2018 IEEE global communications conference (GLOBECOM). IEEE, pp 1–6
7. Wu Q, Zhang R (2019) Towards smart and reconfigurable environment: intelligent reflecting surface aided wireless network. IEEE Commun Mag 58(1):106–112
8. Wu Q, Zhang S, Zheng B, You C, Zhang R (2021) Intelligent reflecting surface-aided wireless communications: a tutorial. IEEE Trans Commun 69(5):3313–3351
9. Cui M, Zhang G, Zhang R (2019) Secure wireless communication via intelligent reflecting surface. IEEE Wirel Commun Lett 8(5):1410–1414
10. Sun R, Wang W, Chen L, Wei G, Zhang W (2021) Diagnosis of intelligent reflecting surface in millimeter-wave communication systems. IEEE Trans Wirel Commun
11. Yu X, Xu D, Schober R (2019) MISO wireless communication systems via intelligent reflecting surfaces. In: 2019 IEEE/CIC international conference on communications in China (ICCC). IEEE, pp 735–740
12. Yang Y, Zheng B, Zhang S, Zhang R (2020) Intelligent reflecting surface meets OFDM: protocol design and rate maximization. IEEE Trans Commun 68(7):4522–4535
13. Guo H, Liang YC, Chen J, Larsson EG (2019) Weighted sum-rate maximization for intelligent reflecting surface enhanced wireless networks. In: 2019 IEEE global communications conference (GLOBECOM). IEEE, pp 1–6
14. Pan C, Ren H, Wang K, Elkashlan M, Nallanathan A, Wang J, Hanzo L (2020) Intelligent reflecting surface aided MIMO broadcasting for simultaneous wireless information and power transfer. IEEE J Select Areas Commun 38(8):1719–1734
15. Zhang Z, Cui Y, Yang F, Ding L (2019) Analysis and optimization of outage probability in multi-intelligent reflecting surface-assisted systems. arXiv preprint arXiv:1909.02193
16. Kim J, Hosseinalipour S, Kim T, Love DJ, Brinton CG (2021) Multi-IRS-assisted multi-cell uplink MIMO communications under imperfect CSI: a deep reinforcement learning approach. In: 2021 IEEE international conference on communications workshops (ICC Workshops). IEEE, pp 1–7
17. Zheng B, Zhang R (2019) Intelligent reflecting surface-enhanced OFDM: channel estimation and reflection optimization. IEEE Wirel Commun Lett 9(4):518–522
18. Zheng B, You C, Zhang R (2020) Intelligent reflecting surface assisted multi-user OFDMA: channel estimation and training design. IEEE Trans Wirel Commun 19(12):8315–8329
19. Zikria YB, Kim SW, Afzal MK, Wang H, Rehmani MH (2018) 5G mobile services and scenarios: challenges and solutions. Sustainability 10(10):3626
20. Yu H, Lee H, Jeon H (2017) What is 5G? Emerging 5G mobile services and network requirements. Sustainability 9(10):1848
21. Nayak S, Patgiri R (2020) A vision on intelligent medical service for emergency on 5G and 6G communication era. EAI Endors Trans Internet Things 6(22)
22. Shanto MAH, Karmaker A, Reza MM, Hossain MA (2022) Cluster-based transmission diversity optimization in ultra reliable low latency communication. Network 2(1):168–189
23. Hossain A, Pan Z, Saito M, Liu J, Shimamoto S (2020) Multiband massive channel random access in ultra-reliable low-latency communication. IEEE Access 8:81492–81505
24. O'Neil EJ, O'Neil PE, Weikum G (1993) The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD Record 22(2):297–306
25. Zhang S, Zhang R (2020) Capacity characterization for intelligent reflecting surface aided MIMO communication. IEEE J Select Areas Commun 38(8):1823–1838
26. Li H, Cai W, Liu Y, Li M, Liu Q, Wu Q (2021) Intelligent reflecting surface enhanced wideband MIMO-OFDM communications: from practical model to reflection optimization. IEEE Trans Commun 69(7):4807–4820
27. Cai W, Li H, Li M, Liu Q (2020) Practical modeling and beamforming for intelligent reflecting surface aided wideband systems. IEEE Commun Lett 24(7):1568–1571
28. Ning B, Chen Z, Chen W, Du Y, Fang J (2021) Terahertz multi-user massive MIMO with intelligent reflecting surface: beam training and hybrid beamforming. IEEE Trans Veh Technol 70(2):1376–1393
Identifying Duplicate Questions Leveraging Recurrent Neural Network

Maksuda Bilkis Baby, Bushra Ankhari, Md Shajalal, Md. Atabuzzaman, Fazle Rabbi, and Masud Ibn Afjal
Abstract Community Question Answering (CQA) forums are the predominant platforms where users can respond to others' questions and share their insights. The influx of new questions with linguistic variability and ambiguity leads to a haphazard collection of overlapping and unique questions. Hence arises the challenge of identifying equivalent questions so that users can be redirected to proper references. In this paper, we propose recurrent neural network-based architectures employing word-embeddings to assess whether a question-pair is duplicate or not. After a careful pre-processing step, we apply several pre-trained word-embedding models to represent questions semantically in a fixed-dimensional real-valued vector. We then apply two different RNN architectures, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM), to encode the underlying meaning of question-pairs. Finally, the introduced models predict whether the question-pair is duplicate or not. The experimental results on the benchmark dataset demonstrate that our best models yield competitive results with accuracies of 82 and 83% and contribute to the state of the art. In addition, our method is applicable to other textual similarity identification tasks.

Keywords Duplicate question-pair detection · Word-embedding · Recurrent neural network · LSTM · BiLSTM
1 Introduction

In community question answering (CQA) forums, people from different backgrounds can ask questions. Consequently, it is expected that multiple questions with different syntactic but similar semantic forms will be asked.

M. B. Baby, B. Ankhari—Both authors contributed equally.
M. B. Baby · B. Ankhari · Md Shajalal (B) · F. Rabbi · M. I. Afjal
Hajee Mohammad Danesh Science and Technology University, Dinajpur, Rangpur, Bangladesh
e-mail: [email protected]
Md. Atabuzzaman
University of Information Technology and Sciences, Dhaka, Bangladesh
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_28

The questions which contain
semantically the same intent and can be answered by one answer are considered duplicates [1]. By identifying duplicate questions automatically, a CQA forum can suggest the recommended answers of already asked duplicate questions directly to the user, so the user gets the recommended answer promptly. Currently, CQA forums depend on moderators and users to manually analyze and merge duplicate questions using a merge tool, which is both time-consuming and tedious. The rapid engagement of people in CQA platforms further raises the challenge of identifying duplicate questions. Let us consider two questions, "Q1: How to be competent in English?" and "Q2: What are the ways to be good in English?". If Q1 has already been asked on the CQA platform, the platform will not allow a question identical to Q1 to be asked. But there is a very high probability that users will ask the same question with a different sentence structure (i.e., Q2). As a result, many duplicate questions accumulate. For such questions, the CQA platform should try to automatically identify already asked duplicates and, if such questions exist, redirect their recommended/accepted answers. This reduces time and effort for both readers and writers, and eventually the forums can improve user satisfaction. On the other hand, a user who writes answers does not need to answer semantically similar questions in different threads and can hence reach more users with the previous answer [2]. With the objective of providing such a service to users and making the forum more efficient and helpful, we present our method for identifying duplicate questions. In recent years, artificial intelligence (AI), in particular machine learning (ML), has attracted many researchers to contribute to diverse fields and challenging research assignments such as anomaly detection and classification, supporting the detection and management of the COVID-19 pandemic, cyber security and trust management, smart healthcare service delivery, text and social media mining, understanding student engagement, and identifying duplicate questions [3–10]. Several studies introduced methods for identifying duplicate questions. Othman et al. [9] proposed an Attentive Siamese LSTM (ASLSTM) approach, extracting semantic features using pre-trained word-embeddings. Chali et al. [10] also employed a pre-trained word-embedding model and applied a combination of CNN and LSTM with an attention mechanism. However, these methods ignore words which do not exist in the pre-trained word-embedding models. To address this problem, we introduce a new architecture combining word-embeddings with deep learning techniques to classify duplicate and non-duplicate question-pairs. First, we pre-process the question-pairs, as they contain much inconsistent data such as stopwords, contractions, and misspelled words, to improve the performance of the model. Then we pass the pre-processed words of the question-pair through pre-trained word-embedding models to extract semantic features. We make use of two different combinations of word-embedding models for extracting features from question-pairs: one combines the pre-trained word2vec word-embedding with our locally trained word-embedding, and the other combines the pre-trained word2vec word-embedding with the pre-trained fastText word-embedding.
After representing the question-pairs semantically, we apply two different deep learning models based on LSTM and BiLSTM to identify whether the questions are duplicates.
We trained our introduced models on over 323,480 question-pairs and tested on over 40,435 records of a benchmark duplicate question identification dataset. The experimental results demonstrated the effectiveness of our methods. In summary, our contributions in this research are twofold:

1. We introduce two different combinations of word-embedding techniques, including our locally trained one, to supply the word-embedding vectors of every significant word.
2. Our proposed LSTM- and BiLSTM-based methods outperform known related works.
2 Literature Review

Different neural frameworks, including the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) [11, 12], have shown notable progress in real-world question similarity tasks. Conventional approaches to modeling the similarity of question-pairs concentrate on word-embedding techniques. Chawla et al. [13] conducted a comparative study among five word-embedding techniques, including TF-IDF, Word2vec, Doc2vec, BERT, and fastText individually, to find semantic matches. Their study concluded that the fastText model achieves an impressive accuracy due to its noise-tolerant nature. Meenakshi et al. [14] employed an ensemble-based model and applied a transformer-based contextual similarity detection (TCSD) embedding technique to their model. Xu et al. [15] proposed a semantic matching model (SMM) integrated with a multi-task transfer learning framework for multi-domain duplicate question detection. Dimitrov et al. [1] have shown that a deep learning approach with CNNs built on top of word-embeddings can detect duplicate questions from Quora and achieve competitive results; they tackled vocabulary mismatch problems by correcting spelling errors. A relatively unique approach was demonstrated by Rani et al. [16] for detecting duplicate questions in transliterated bilingual data on CQA: the authors make use of a hybrid model combining a siamese neural network with a capsule network and a decision tree classifier. Sakhrani et al. [17] proposed a question identification framework combining a bidirectional transformer encoder and convolutional neural networks. Most of the above-mentioned methods extracted features using the word2vec, GloVe, BERT, and fastText word-embedding approaches. But word2vec and GloVe might not provide a semantic vector for some words due to vocabulary mismatch or new words. As the duplicacy of a question-pair depends largely on the meanings of its words, omitting a word from a question might degrade the performance of the system. fastText works at the character level, and the use of character n-grams is more useful in morphologically rich languages such as Arabic, German, and Russian than for English [18]. Therefore, in this research work, we employ a combination
of two different word-embedding approaches for extracting features from question-pairs, which tends to reduce the chance of the vocabulary mismatch problem.
3 Our Approach

Let Q1 and Q2 be two questions, where Q1 is asked by a user and Q2 is an already existing question previously asked by another user. Since the users on CQA platforms come from different backgrounds, it is expected that their writing will have different styles and spelling errors. To tackle the vocabulary mismatch problem, we first apply pre-processing to improve the quality of the texts using classical NLP libraries. Next, semantic information of the questions is extracted employing different word-embedding models, which provide us with a real-valued high-dimensional vector for each vocabulary item. Finally, deep learning models use those question representations to identify whether the pair is duplicate or not.
3.1 Text Pre-processing

We apply the text pre-processing step to present questions in a form from which we can extract meaningful insight later. The Quora dataset unsurprisingly contains anomalies in the questions (e.g., non-ASCII characters). To eliminate different anomalies, several NLP techniques are applied, including expanding contractions, removing punctuation and special and accented characters, word correction, and lemmatization, with the help of the NLTK and Keras libraries. The tokenizer function from the Keras library is used to tokenize each question into a set of words.
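The following sketch illustrates such a pipeline under our assumptions: a small hand-made contraction map, ASCII filtering for accented characters, WordNet lemmatization via NLTK, and the Keras tokenizer; the exact rules and word-correction step used by the authors are not specified in the paper.

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.text import Tokenizer

nltk.download("wordnet", quiet=True)   # one-time resource download

# Tiny illustrative contraction map; a real map would be much larger
CONTRACTIONS = {"what's": "what is", "can't": "cannot",
                "won't": "will not", "i'm": "i am"}
lemmatizer = WordNetLemmatizer()

def clean_question(text: str) -> str:
    text = text.lower()
    for short, full in CONTRACTIONS.items():        # expand contractions
        text = text.replace(short, full)
    text = text.encode("ascii", "ignore").decode()  # drop accented/non-ASCII chars
    text = re.sub(r"[^a-z0-9\s]", " ", text)        # remove punctuation/specials
    return " ".join(lemmatizer.lemmatize(w) for w in text.split())

q1 = [clean_question("What's the best way to learn English?")]
q2 = [clean_question("How can I become good at English?")]

tokenizer = Tokenizer()                 # shared vocabulary over both questions
tokenizer.fit_on_texts(q1 + q2)
seq1 = tokenizer.texts_to_sequences(q1)
seq2 = tokenizer.texts_to_sequences(q2)
```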
3.2 Semantic Information Extraction

Semantic rather than syntactic information plays the key role in classifying two questions. Depending on the context, the same syntactic sentence can convey different semantic information, and different syntactic sentences might carry the same semantic information. For example, consider the question-pair "What is the step by step guide to invest in share market in India?" and "What is the step by step guide to invest in share market?". These two questions are almost identical syntactically, but semantically they bear completely different meanings and are not the same. Another question-pair, "What does the Quora website look like to members of Quora moderation?" and "How does Quora look to a moderator?", is syntactically different but semantically duplicate. Word-embedding techniques are widely employed to extract such semantic information from text.
Fig. 1 Architecture of our method with different layers (Input Layer → Embedding Layer → LSTM/BiLSTM Layer → Pooling Layer → Fully Connected Layer → Dropout Layer → Fully Connected Layer → Output Layer)
A word-embedding is a vector representation of a particular word that can capture its meaning in a particular context, which resolves the above phenomena. In this research, we apply two different word-embedding models: word2vec (trained on the Google News corpus) [19, 20] and fastText (trained on Common Crawl and Wikipedia) [18, 21].
3.3 Duplicate Question Classification Using RNN

To classify question-pairs, we design a neural network-based architecture based on two different recurrent neural networks (RNN), LSTM and BiLSTM. The structure of our neural network (NN) model with its different layers is illustrated in Fig. 1. The input layer passes every word of the question to the embedding layer, and the embedding layer transforms words into their corresponding word-embedding vectors.
3.3.1 Long Short-Term Memory (LSTM) Networks

Long Short-Term Memory (LSTM) [22] networks are a type of recurrent neural network that gained popularity for their capacity to capture long-term dependencies and model sequential data. In our case of duplicate question-pair detection, capturing sequential information and memorizing it for a long time is crucial to understanding the context of a question. An LSTM unit is a memory cell composed of four main components: a forget gate, an input gate, a memory cell state, and an output gate. Mathematically, the gates can be represented by the following equations:

$$
\begin{aligned}
f_t &= \mathrm{sigmoid}(W_f x_t + U_f h_{t-1} + b_f),\\
i_t &= \mathrm{sigmoid}(W_i x_t + U_i h_{t-1} + b_i),\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c),\\
c_t &= i_t \ast \tilde{c}_t + f_t \ast c_{t-1},\\
o_t &= \mathrm{sigmoid}(W_o x_t + U_o h_{t-1} + b_o),\\
h_t &= o_t \ast \tanh(c_t),
\end{aligned} \tag{1}
$$
where $f_t$, $i_t$, $c_t$, and $o_t$ are the forget gate, input gate, memory cell state, and output gate at time $t$, respectively. $W_k$ and $U_k$ are parameterized LSTM weight matrices, and $b_k$ represents the bias vector, for each $k \in \{i, f, c, o\}$.
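To make Eq. (1) concrete, here is a minimal NumPy implementation of a single LSTM time step; the toy dimensions and random weights are ours, purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step implementing Eq. (1)."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = i_t * c_hat + f_t * c_prev                          # memory cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                  # hidden state
    return h_t, c_t

# Toy dimensions: 4-dim input, 3-dim hidden state, random weights
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in "ifco"}
U = {k: rng.normal(size=(3, 3)) for k in "ifco"}
b = {k: np.zeros(3) for k in "ifco"}
h, c = np.zeros(3), np.zeros(3)
h, c = lstm_step(rng.normal(size=4), h, c, W, U, b)
```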
3.3.2 Bidirectional LSTM Networks

A Bidirectional LSTM, or BiLSTM [23], is another RNN-based sequence processing model that consists of two LSTMs (one taking the input in the forward direction and the other in the backward direction) with two separate hidden layers. It connects the two hidden layers to an identical output layer. The outputs $h_t$ of both the forward and backward layers are calculated using the standard LSTM updating equations of Eq. (1). The BiLSTM layer generates an output vector $Y_t = [Y_{t-n}, \ldots, Y_{t-1}]$, in which the last element $Y_{t-1}$ is the predicted output. Each element of the output is calculated by

$$Y_t = \sigma(\overrightarrow{h_t}, \overleftarrow{h_t}) \tag{2}$$

where the $\sigma$ function combines the forward output sequence $\overrightarrow{h_t}$ and the backward output sequence $\overleftarrow{h_t}$ obtained from the two LSTM layers. After the LSTM or BiLSTM layer, the pooling layer down-samples the input representation by taking the maximum value over the time dimension. A fully connected layer then connects all the activations of the previous layer and introduces non-linearity over the high-level features learnt in the previous layers. We then apply a dropout layer to avoid the overfitting problem. The final layer is the output layer, which predicts the class of the question-pair.
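A minimal Keras sketch of this pipeline is shown below. One natural way to wire the pair of questions is a shared encoder applied to both questions with the two encodings concatenated before the dense layers; the vocabulary size, sequence length, and layer widths are our assumptions, and the paper's exact wiring may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE, EMB_DIM, MAX_LEN = 50_000, 300, 30   # assumed hyperparameters

def build_encoder():
    """Shared question encoder: Embedding -> BiLSTM -> max-pool over time."""
    inp = layers.Input(shape=(MAX_LEN,))
    x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)   # pre-trained vectors go here
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.GlobalMaxPooling1D()(x)               # max over the time dimension
    return Model(inp, x)

encoder = build_encoder()
q1 = layers.Input(shape=(MAX_LEN,))
q2 = layers.Input(shape=(MAX_LEN,))
merged = layers.Concatenate()([encoder(q1), encoder(q2)])
h = layers.Dense(128, activation="relu")(merged)     # fully connected layer
h = layers.Dropout(0.5)(h)                           # avoid overfitting
out = layers.Dense(1, activation="sigmoid")(h)       # duplicate / not duplicate

model = Model([q1, q2], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```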
4 Experimental Results and Evaluation

4.1 Dataset

The CQA forum Quora first released its public dataset in January 2017 for identifying duplicate questions [24]. We carried out our experiments on this benchmark Quora dataset.¹ The dataset consists of over 404,352 potential question-pairs that are edited and grammatically well formed. Each record contains IDs for each question in the pair, the full text of each question, and a binary label that indicates whether the record contains a duplicate pair or not. The gold standard for each pair is a label supplied by human experts. Roughly 37% of the question-pairs in the dataset are duplicates, and the other pairs are not. We split the dataset randomly into train/dev/test sets with a proportion of 80:10:10.
¹ https://www.kaggle.com/datasets/sambit7/first-quora-dataset
4.2 Experimental Settings

We conducted experiments combining the different introduced techniques. In total, we have eight different setups, which are summarized in Table 1. Broadly, the experiments can be categorized into LSTM and BiLSTM groups. For both categories, we first applied the pre-trained word-embedding from the Google News corpus (word2vec), and then we applied combinations of two different embedding models. The word2vec word-embedding may be unable to produce vectors for words it did not encounter during its training. Generally, many words (typos, newly added words, and compound words) used on CQA websites are absent from the vocabulary of word2vec, and eliminating these words from question-pairs may affect the performance of the duplicate question-pair detection system. To tackle this problem, we employed a combination of word2vec and our own custom-trained word2vec word-embedding (trained on the Quora dataset corpus), and a combination of word2vec and the fastText word-embedding. The combination works as follows: the word is first sent to the word2vec word-embedding model for extracting semantic information; if the word does not exist in that vocabulary, the word is sent to the other embedding model (fastText or custom word2vec) (Table 1).
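This fallback lookup can be sketched as below; the gensim-downloader model names are stand-ins for the pre-trained models the paper uses, and the zero-vector last resort is our assumption.

```python
import numpy as np
import gensim.downloader as api

# Primary and fallback embedding models (gensim-downloader IDs, illustrative)
w2v = api.load("word2vec-google-news-300")
ft = api.load("fasttext-wiki-news-subwords-300")

def embed(word, dim=300):
    """Look the word up in word2vec first; fall back to fastText
    (or a custom word2vec model trained on the Quora corpus)."""
    if word in w2v:
        return w2v[word]
    if word in ft:
        return ft[word]
    return np.zeros(dim)   # last resort: zero vector
```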
4.3 Experimental Results

The performance of all settings in terms of the evaluation metrics (accuracy, precision, recall, and F1-score) is reported in Table 2. The BiLSTM model combining word2vec and fastText exhibits an F1-score of 81% and a recall of 81%, whereas the BiLSTM combining word2vec and custom word2vec shows an F1-score of 82% and a recall of 82%. We also observe that a combination of two word-embeddings with BiLSTM identifies duplicate entries more effectively than an individual word-embedding with the LSTM approach. The combination of two word-embedding methods provides better results because every significant word can be mapped to a semantic feature. The BiLSTM-based models achieve better performance in most cases because the input flows in both directions in a BiLSTM, which can utilize information from both sides; it is also a powerful tool for modeling the sequential dependencies between words and phrases in both directions of the sequence. In addition, the text pre-processing techniques are a plausible contributor to the improved performance.
Table 1 Summary of different experimental settings

  Experimental setup  Description
  W2v_LSTM            word2vec word-embedding with LSTM
  W2v_Cw2v_LSTM       LSTM combining word2vec and custom word2vec
  W2v_FT_LSTM         LSTM combining fastText and word2vec
  FT_LSTM             fastText word-embedding with LSTM
  W2v_BiLSTM          word2vec word-embedding with BiLSTM
  W2v_Cw2v_BiLSTM     BiLSTM combining word2vec and custom word2vec
  FT_BiLSTM           fastText word-embedding with BiLSTM
  W2v_FT_BiLSTM       BiLSTM combining word2vec and fastText
Table 2 Performance of our approaches in terms of evaluation metrics

  Model            Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
  W2v_LSTM         77            76             75          75
  W2v_FT_LSTM      79            83             73          80
  fastText_BiLSTM  79            77             79          78
  fastText_LSTM    80            78             78          78
  W2v_Cw2v_LSTM    80            84             75          81
  W2v_BiLSTM       80            78             79          78
  W2v_FT_BiLSTM    82            80             81          81
  W2v_Cw2v_BiLSTM  83            80             82          82
4.4 Comparison with Known Related Works

To validate the performance of our methods, we compared them with some known related works on the Quora dataset. Figure 2 summarizes the comparison among different known methods and our methods in terms of accuracy, with the bars in dark color representing our methods. The figure shows that our methods outperform recent state-of-the-art methods except for the GRU-based method [25]. Homma et al. [25] applied a Siamese Gated Recurrent Unit (GRU) neural network to encode each question-pair and used data augmentation in their model; in our approaches, we have not applied data augmentation. However, our model performed better than the
LSTM-based model introduced by Chali et al. [10] and the Siamese_CNN model introduced by Wang et al. [26].
Fig. 2 Performance comparison chart of our approaches with some known related work in terms of accuracy (methods compared: GRU [25], W2v_Cw2v_BiLSTM, W2v_FT_BiLSTM, LSTM [10], Siamese_CNN [26], Random Forest [27], KNN [27], Logistic Regression [27], SVM [27]; x-axis: accuracy, 0–100%)
We employed combinations of different word-embedding techniques and passed the semantic representations to the BiLSTM architecture. A BiLSTM can process the sequence of textual data in both the forward and backward directions, which we consider one of the plausible reasons for the better performance. In addition, our approaches perform better than several classical machine learning models, including Random Forest, SVM, KNN, and Logistic Regression, introduced by Viswanathan et al. [27]. For natural language processing tasks, it is evident that deep learning performs better than classical machine learning, since it learns multiple levels of representation of the text. Moreover, using deep learning, we can take advantage of word-embeddings to extract semantic information from text data. Hence, our method contributes to achieving state-of-the-art performance.
5 Conclusion and Future Work

In this paper, we apply two different combinations of word-embedding methods with two recurrent neural network models, LSTM and BiLSTM. Several pre-processing steps on the human-entered data are also carried out to make the duplicate question-pair identification task effective. We carried out our experiments employing the Quora benchmark dataset and conducted eight different experiments to identify syntactically or semantically duplicate questions. Among the eight settings, BiLSTM provides the best performance, with 83% accuracy on the test data. Furthermore, our methods surpass the performance of different machine learning and deep learning-based approaches. In the future, we would like to apply our methods to low-resourced languages including Bangla. It would also be interesting to apply hybrid neural networks with attention architectures.
References

1. Dimitrov Y (2020) Combining word embeddings and convolutional neural networks to detect duplicated questions. arXiv preprint arXiv:2006.04513
2. Imtiaz Z, Umer M, Ahmad M, Ullah S, Choi GS, Mehmood GS (2020) Duplicate questions pair detection using Siamese MaLSTM. IEEE Access 8:21932–21942
3. Farhin F, Kaiser MS, Mahmud M (2020) Towards secured service provisioning for the internet of healthcare things. In: Proceedings of AICT, pp 1–6
4. Mahmud M, Shamim Kaiser M (2021) Machine learning in fighting pandemics: a Covid-19 case study. In: COVID-19: prediction, decision-making, and its impacts, pp 77–81
5. Mahmud M, Shamim Kaiser M, Martin McGinnity T, Hussain A (2021) Deep learning in mining biological data. Cogn Comput 13(1):1–33
6. Paul A, Basu A, Mahmud M, Shamim Kaiser M, Sarkar R (2022) Inverted bell-curve-based ensemble of deep learning models for detection of Covid-19 from chest x-rays. Neural Comput Appl 1–15
7. Tahura S, Hasnat Samiul SM, Shamim Kaiser M, Mahmud M (2021) Anomaly detection in electroencephalography signal using deep learning model. In: Proceedings of TCCE, pp 205–217
8. Kaiser MS, Mahmud M, Binth Taj Noor M, Zerin Zenia N, Al Mamun S, Abir Mahmud KM, Azad S, Manjunath Aradhya VN, Stephan P, Stephan T et al (2021) iWorksafe: towards healthy workplaces during covid-19 with an intelligent phealth app for industrial settings. IEEE Access 9:13814–13828
9. Othman N, Faiz R, Smaïli K (2022) Learning English and Arabic question similarity with Siamese neural networks in community question answering services. Data Knowl Eng 138:101962
10. Chali Y, Islam R (2018) Question-question similarity in online forums. In: Proceedings of the 10th annual meeting of the forum for information retrieval evaluation, pp 21–28
11. Balla HAMN, Salvador ML, Delany SJ (2022) Arabic medical community question answering using LSTM and CNN. In: 2022 14th international conference on machine learning and computing (ICMLC), pp 298–307
12. Alshammari WT, AlHumoud S (2022) TAQS: an Arabic question similarity system using transfer learning of BERT with BiLSTM. IEEE Access 10:91509–91523
13. Chawla S, Aggarwal P, Kaur R (2022) Comparative analysis of semantic similarity word embedding techniques for paraphrase detection. In: Emerging technologies for computing, communication and smart cities. Springer, pp 15–29
14. Meenakshi D, Shanavas ARM (2022) Transformer induced enhanced feature engineering for contextual similarity detection in text. Bull Electr Eng Inform 11(4):2124–2130
15. Xu Z, Hua Y (2020) Forum duplicate question detection by domain adaptive semantic matching. IEEE Access 8:56029–56038
16. Rani S, Kumar A, Kumar N (2022) Eliminating data duplication in CQA platforms using deep neural model. Comput Intell Neurosci
17. Sakhrani H, Parekh S, Ratadiya P (2021) Contextualized embeddings based convolutional neural networks for duplicate question identification. arXiv preprint arXiv:2109.01560
18. Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
19. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
20. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, vol 26
21. Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018) Learning word vectors for 157 languages. In: Proceedings of the international conference on language resources and evaluation (LREC 2018)
22. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
23. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
24. Iyer S, Dandekar N, Csernai K et al (2017) First Quora dataset release: question pairs. data.quora.com
25. Homma Y, Sy S, Yeh C (2016) Detecting duplicate questions with deep learning. In: Proceedings of the international conference on neural information processing systems (NIPS)
26. Wang Z, Hamza W, Florian R (2017) Bilateral multi-perspective matching for natural language sentences. arXiv preprint arXiv:1702.03814
27. Viswanathan S, Damodaran N, Simon A, George A, Anand Kumar M, Soman KP (2019) Detection of duplicates in Quora and twitter corpus. In: Advances in big data and cloud computing. Springer, pp 519–528
StegoPix2Pix: Image Steganography Method via Pix2Pix Networks

Md. Min-ha-zul Abedin and Mohammad Abu Yousuf
Abstract Steganography is the strategy of concealing secret data inside images. With the quick advancement of deep learning-based methods in steganalysis, it has become extremely challenging to design a secure steganographic method. In this paper, we propose a steganographic technique based on a GAN, named StegoPix2Pix. In the proposed method, a U-Net accepts a cover image and secret message information as input to synthesize a stego-image. This method can effectively withstand steganalysis tools: steganalysis cannot distinguish our stego-images from other images generated using GANs or similar methods. In the meantime, our technique can generate stego-images of arbitrary size with 0.01 bpp, an improvement over other steganographic methods that can only embed fixed-length message information into a cover image. Experimental results show that StegoPix2Pix can accomplish security, reliability, and good visual quality.

Keywords GAN · Pix2Pix · CNN · StegoPix2Pix
1 Introduction

Cryptography and steganography are the two strategies utilized in secret data communication. Steganography conceals the existence of secret information; image steganography deals with hiding a secret message in an image. It hides information in such a way that the image looks normal to other people, while the receiver can retrieve the secret data from it. The carrier of the secret message is known as the cover image, while the output image embedded with the secret data is known as the stego-image.

Md. Min-ha-zul Abedin (B)
Department of ICT, Bangladesh University of Professionals, Dhaka, Bangladesh
e-mail: [email protected]
M. A. Yousuf
IIT, Jahangirnagar University, Savar, Dhaka, Bangladesh
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_29
Traditional image steganography hides information by embedding secret data into a source image through some modification of that image, which introduces distortion. Minimizing distortion of the source image while embedding is the primary goal of steganographic research. Mielikainen et al. introduced an improved technique for LSB matching, which can embed secret information into images like LSB matching but requires fewer modifications to the source image [9]. HUGO [12] is an embedding technique designed to minimize a distortion function. Holub and Fridrich presented WOW [6], which finds complex texture regions in which to insert data into the cover image. Adaptive steganographic algorithms have also been proposed that minimize distortion with the help of syndrome-trellis codes. However, there remains the possibility of being detected, mostly when the steganalysis method uses a deep learning-based architecture. Steganalysis algorithms determine whether an image is a normal image or a stego-image. Fridrich et al. proposed a technique that is quite consistent and accurate for detecting non-sequential LSB embedding in images [2]. Shi et al. presented a method that uses a Markov process for JPEG image steganalysis [16]. Fridrich and Kodovsky introduced a steganalysis technique in which a large number of diverse submodels are used in one model [1], and Goljan et al. extended this method to color image steganalysis [3]. Recent advancements in deep learning have led to more advanced steganalysis algorithms. Qian et al. presented an end-to-end CNN-based model for steganalysis [13]. Zeng et al. introduced a hybrid deep learning technique incorporating the domain knowledge of rich models for JPEG image steganalysis [19]. As deep learning becomes more and more capable [15], AI-based steganalysis algorithms are becoming more powerful and pose a key challenge to the security of embedding-based steganography. Currently, many researchers are attempting to introduce deep learning-based methods into steganography, such as HiDDeN [21] and SteGAN [5]. CNN-based methods learn without any domain knowledge [17, 18]. In deep learning-based methods, neural networks are used in place of distortion functions to locate pixels that are fit for concealing the secret message. Nevertheless, they are also susceptible to security issues. Additionally, using traditional deep learning-based steganographic algorithms, we do not have the freedom to choose the cover image, which leads to further security issues, and the capacity and size of stego-images cannot be changed after the model is trained. To improve on these circumstances, we explore a steganographic method based on deep convolutional networks that can overcome these shortcomings. The Generative Adversarial Network (GAN) [4] developed by Goodfellow consists of two models: a generative model and a discriminative model that compete against each other, so that the generative model learns to produce realistic samples while the discriminative model learns to discriminate between real and generated samples [10]. Several extensions of GAN have been proposed, such as DCGAN [14], cGAN [11], and Pix2Pix [8], which can produce realistic images. Pix2Pix is an image-to-image transformation framework; it hides source image information in the target images unnoticeably and can also reconstruct the source image from the generated image. We investigate whether this can be helpful for steganography. In our method we explore this case, in which a neural network can discover ways to encode message information in the procedure of image-to-image transformation.
Fig. 1 Proposed architecture of StegoPix2Pix
We therefore propose StegoPix2Pix (Fig. 1), an image steganographic method based on Pix2Pix [8]. StegoPix2Pix is a deep learning-based steganographic method that needs no special knowledge of steganography. It takes a content image and secret message information as input and generates a stego-image as output. We can send the stego-image via a public network, and the receiver can reconstruct the secret data from it. The main contributions of this paper include:

• We propose a new framework for steganography using Pix2Pix. Previous techniques have no control over cover image synthesis, but our framework provides the user with full control over cover image selection.
• The accuracy of secret data recovery is high.
• Our framework can resist detection of steganography by steganalysis algorithms.
2 Model

Pix2Pix is a variation of the GAN that works as an image-to-image translator. Our model consists of the following parts: a U-Net, responsible for image generation, and a Discriminator, responsible for adversarial training via discriminating real and generated samples, both part of the Pix2Pix GAN; an Extractor network, responsible for extracting information from the image; and a Security analyzer network, which checks the steganographic security of our U-Net. The U-Net is responsible for hiding the input data in the cover image, while the extractor network accepts the stego-image (generated by the U-Net) as input and reconstructs the secret information.
Image-to-image translation is a computer vision task that involves the controlled conversion of an input image to a target image. Pix2Pix is an approach to image-to-image translation based on the conditional GAN. In Pix2Pix, the target image is generated on the condition of the input image, and Pix2Pix modifies the loss function in order to generate images that follow the target data distribution while also being plausible translations of the source image:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \tag{1}$$

The objective function of the cGAN is shown in Eq. (1), where G tries to minimize and D tries to maximize the function. In our model we have used the L1 distance function, which makes our final objective

$$G^{*} = \arg\min_{G}\max_{D} \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1} \tag{2}$$

$$stego = cover = G(source\; image) \tag{3}$$
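As a sketch of how Eqs. (1)-(2) translate into training losses, the following TensorFlow/Keras snippet computes the adversarial and weighted L1 terms; the value λ = 100 is a common Pix2Pix default assumed here, not a figure taken from the paper.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
LAMBDA = 100.0   # weight of the L1 term (assumed Pix2Pix default)

def generator_loss(disc_fake_output, generated, target):
    """Adversarial term of Eq. (2) plus the weighted L1 distance."""
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    l1 = tf.reduce_mean(tf.abs(target - generated))
    return adv + LAMBDA * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    """D maximizes Eq. (1): real pairs -> 1, generated pairs -> 0."""
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake
```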
The generator model (G-model) is more complex than the other models used in the proposed architecture. The G-model is an encoder-decoder model based on the U-Net architecture. In this work we use a modified Pix2Pix image translation framework for the stego-image generator; the modification lies in the U-Net part of the framework, where the secret message information is added at the bottleneck of the network. The U-Net consists of three connected sub-parts: down sampling, bottleneck, and up sampling. In down sampling there are five convolutional layers. Each down-sampling layer is made of a Conv2D with a kernel size of four, a stride of two, same padding, a kernel initializer, and dropout with probability 0.5, along with InstanceNormalization and LeakyReLU with slope 0.2 as the activation function. The bottleneck is a one-dimensional feature vector extracted from the input image, which we concatenate with the encoded secret message information. In up sampling there are five up-sampling and four concatenation layers. Each up-sampling layer consists of UpSampling2D followed by a convolutional layer with a kernel size of four, a stride of two, same padding, a kernel initializer, and dropout with probability 0.5, along with InstanceNormalization and LeakyReLU with slope 0.2 as the activation function. After every up-sampling layer there is a concatenation with the output of the down-sampling layer of the same dimension. The G-model accepts a source image as input and produces a target image as output by encoding the source image down to the bottleneck layer and then decoding the bottleneck into the target image. In the G-model, skip-connections are added between the down-sampling layers and the corresponding up-sampling layers to form the U-shape shown in Fig. 2.
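A minimal Keras sketch of such a generator is given below, assuming 32 × 32 × 3 images and a 32-bit message. InstanceNormalization (available via TensorFlow Addons) is omitted for brevity, the bottleneck dense size and filter counts are our guesses, and the up-sampling convolutions use stride 1 so the shapes line up after UpSampling2D.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Down-sampling unit: Conv2D(k=4, s=2, same) + LeakyReLU(0.2) + Dropout
    x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    return layers.Dropout(0.5)(x)

def build_generator(msg_bits=32):
    img = layers.Input(shape=(32, 32, 3))
    msg = layers.Input(shape=(msg_bits,))

    # Encoder (down sampling): 16x16 -> 8x8 -> 4x4 -> 2x2 -> 1x1
    d1 = conv_block(img, 64)
    d2 = conv_block(d1, 128)
    d3 = conv_block(d2, 256)
    d4 = conv_block(d3, 512)
    d5 = conv_block(d4, 512)

    # Bottleneck: concatenate image features with the secret message
    z = layers.Concatenate()([layers.Flatten()(d5), msg])
    z = layers.Dense(512)(z)
    z = layers.Reshape((1, 1, 512))(z)

    # Decoder (up sampling) with skip connections to matching encoder layers
    def up_block(x, skip, filters):
        x = layers.UpSampling2D()(x)
        x = layers.Conv2D(filters, 4, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        return layers.Concatenate()([x, skip])

    u1 = up_block(z, d4, 512)    # 2x2
    u2 = up_block(u1, d3, 256)   # 4x4
    u3 = up_block(u2, d2, 128)   # 8x8
    u4 = up_block(u3, d1, 64)    # 16x16
    out = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh")(u4)  # 32x32x3 stego image
    return Model([img, msg], out)

generator = build_generator()
```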
The extractor network E is trained with a squared Euclidean distance loss (with z the embedded secret message information):

$$L(E) = \sum_{i=1}^{n} \left(z - E(stego)\right)^2 \tag{4}$$
Fig. 2 U-Net for proposed architecture
The discriminator (D-model) is a deep convolutional neural network that simply works as a binary classifier, more specifically a conditional image classifier. The job of the discriminator is to distinguish between real and generated data: if a sample is produced by the generator, the discriminator labels that data as "fake" and other data as "real". The discriminator takes the source image and target image as input and calculates the probability of the target image being real or fake, computing a cross-entropy loss for the source and target. The output of the discriminator is used as feedback in the backpropagation algorithm for both the discriminator and the generator. The discriminator is made of six convolutional layers, as shown in Fig. 3. All convolutional layers are Conv2D with a kernel size of four, a stride of two, same padding, a kernel initializer, and dropout with probability 0.5, along with InstanceNormalization and LeakyReLU with slope 0.2 as the activation function; in the output layer we have used Sigmoid as the activation function instead of LeakyReLU. The stego-image is the input to the Extractor network, which reconstructs the secret message M; the goal of the extractor network is to recover the secret message information from the stego-image, using the squared Euclidean distance of Eq. (4) as its loss function. The input of the extractor network is the 32 × 32 × 3 stego-image, and the output is the secret message information. The main building blocks of the extractor network are convolutional, dropout, activation, BatchNormalization, and pooling layers, so that the extractor network can learn nonlinear features. The extractor network consists of five convolutional layers followed by three fully connected layers, as shown in Fig. 4. Each convolutional layer consists of a Conv2D with a kernel size of three and same padding, LeakyReLU with slope 0.2 as the activation function, and BatchNormalization. We use fully connected layers after the feature extraction by the convolutional layers. The first fully connected layer consists of 512 nodes with
Fig. 3 Discriminator for adversarial training process
Fig. 4 Reveal network
the tanh activation function, the second layer consists of 256 nodes with tanh activation, and in the final layer we match the secret message embedding dimension, also with the tanh activation function.
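A Keras sketch of this reveal/extractor network is given below; the per-layer filter counts are not stated in the paper and are assumed here, and mean squared error stands in for the (summed) squared Euclidean distance of Eq. (4).

```python
from tensorflow.keras import layers, Model

def build_extractor(msg_bits=32):
    """Reveal network sketch: five conv layers then three dense layers;
    the output dimension matches the secret message embedding."""
    inp = layers.Input(shape=(32, 32, 3))
    x = inp
    for filters in (32, 64, 128, 256, 256):      # assumed filter counts
        x = layers.Conv2D(filters, 3, padding="same")(x)  # kernel size three
        x = layers.LeakyReLU(0.2)(x)
        x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="tanh")(x)
    x = layers.Dense(256, activation="tanh")(x)
    out = layers.Dense(msg_bits, activation="tanh")(x)    # recovered message
    return Model(inp, out)

extractor = build_extractor()
extractor.compile(optimizer="adam", loss="mse")  # stands in for Eq. (4)
```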
3 Experimental Result

In this work, we used the publicly available CIFAR-10 dataset as source images. This dataset comprises 60,000 color images of size 32 × 32 × 3 in 10 classes, with no class imbalance: each class has six thousand photos. The dataset has 50,000 training images and 10,000 test images.
The dataset is provided on the official CIFAR-10 website as well as in different deep learning frameworks and libraries. Each image is rescaled and shuffled as preprocessing. In the training process, the D-model is trained on real and generated images directly, while the G-model is trained through the D-model. In each iteration, the parameters of the G-model are updated so that the loss calculated by the D-model is minimized and the generated image can be marked as a real image. The G-model is also updated to minimize the L1 loss between the generated and target images; overall, the G-model is updated through a weighted sum of the adversarial loss and the L1 loss. For training purposes, we built a composite model stacking the G-model on top of the D-model: the source image is given as input to the G-model as well as to the D-model, and the output of the G-model is also given to the D-model as the corresponding target image. The D-model calculates the likelihood of the generated image being a plausible transformation of the source image. During the composite model update, the parameters of the D-model are kept non-trainable. The combined model is updated toward two goals: the cross-entropy loss computed as if the generated image were real, and the L1 loss. Training the D-model requires batches of real and generated images: we draw random images from the training dataset and label them as class "1" to indicate real images, and we use the G-model to predict a batch of generated images and label them as class "0" to indicate generated (fake) images. For generating images with the G-model, we use the batch of images prepared in the previous step labeled as "1". Typically, the training of a GAN does not converge; instead, an equilibrium is sought between the G-model and the D-model, so we cannot easily determine when training should stop, and there is no simple, reliable way to measure the performance of a GAN. Therefore, we periodically (every 10 epochs) saved the G-model, the D-model, and some generated images for later analysis. In our experiments, we implemented three different G-models; the variation lies in the bottleneck and the secret message embedding part. The first experiment hid an 8-bit secret message, the second 16 bits, and the third 32 bits. To evaluate the Reveal-Net and the accuracy of data recovery, we used 10,000 images from the test set; note that no test image appeared in the training set. The Reveal-Net can decode up to 99.8% in the case of 8-bit data hiding, and the accuracy of all Reveal-Nets is shown in Fig. 5. Though this is not perfect decoding, in practical applications 100% successful decoding can be achieved by introducing error-correction coding. Training of StegoPix2Pix is performed on the CIFAR-10 image dataset. We optimize Pix2Pix on the basis of the objective function shown in Eq. (1), with the learning rate kept at 0.0001 and the mini-batch size at 100; every model is trained for 100 epochs. During training, we can observe that the GAN loss oscillates because the G-model competes with the D-model: when the G-model performs better, the D-model receives a higher penalty and updates its parameters accordingly, which in turn worsens the G-model's performance. Figure 6 shows the training GAN loss of the three different G-models, and some output images from G-model1, G-model2, and G-model3 are shown in Figs. 7, 8, and 9, respectively.
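One training iteration of this composite setup can be sketched as follows; `generator`, `discriminator`, and `composite` are assumed to be compiled Keras models wired as described above (the composite stacks G on a frozen D and outputs both D's real/fake score and the generated image), and the function names are ours.

```python
import numpy as np

def train_step(src_batch, tgt_batch, msg_batch):
    n = len(src_batch)
    # 1) Train D on real target images (label 1) ...
    d_loss_real = discriminator.train_on_batch([src_batch, tgt_batch],
                                               np.ones((n, 1)))
    # ... and on generated images (label 0)
    fake = generator.predict([src_batch, msg_batch], verbose=0)
    d_loss_fake = discriminator.train_on_batch([src_batch, fake],
                                               np.zeros((n, 1)))
    # 2) Train G through the composite model: fool D (label 1) + match target
    g_loss = composite.train_on_batch([src_batch, msg_batch],
                                      [np.ones((n, 1)), tgt_batch])
    return d_loss_real, d_loss_fake, g_loss
```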
Fig. 5 Training accuracy for different reveal-net
Fig. 6 StegoPix2Pix training loss
$$\text{Relative capacity} = \frac{\text{Absolute capacity}}{\text{Size of image}} \tag{5}$$
Relative capacity indicates how much information can be concealed in the cover image. We designed three different G-models, using an 8-bit secret message in G-model1, a 16-bit secret message in G-model2, and a 32-bit secret message in G-model3. So, in our experiments, the relative capacity is 0.002, 0.005, and 0.01, respectively, for G-model1, G-model2, and G-model3 (Table 1).
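As a check of Eq. (5) against the 32 × 32 × 3 images used here (3072 pixel values), the G-model3 entry of Table 1 works out as

$$\text{Relative capacity} = \frac{32\ \text{bits}}{32 \times 32 \times 3} = \frac{32}{3072} \approx 0.0104 \approx 0.01,$$

and likewise 8/3072 ≈ 0.0026 and 16/3072 ≈ 0.0052, which the table rounds to 0.002 and 0.005.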
Fig. 7 Generated image through G-model1
Fig. 8 Generated image through G-model2
Fig. 9 Generated image through G-model3
Table 1 Steganography capacity of different methods

  Methods                      Absolute capacity  Image size   Relative capacity
  Method in Zheng et al. [20]  2.25               512 × 512    8.58e-6
  Method in Xu et al. []       64 × 64            800 × 800    6.4e-3
  Method in Hu et al. [7]      ≥37.5              64 × 64      9.16e-3
  G-model1 (ours)              8                  32 × 32 × 3  0.002
  G-model2 (ours)              16                 32 × 32 × 3  0.005
  G-model3 (ours)              32                 32 × 32 × 3  0.01
4 Security Analysis

Traditional steganographic methods embed secret-message information by changing specific pixels so as to minimize distortion of the statistical properties and appearance of the image. For steganalysis, it is assumed that the steganographic algorithm is known; using a large number of stego/cover image pairs as a training dataset, a model can be trained to detect whether an image contains secret-message information. Unlike traditional steganographic algorithms, our method generates a new image from a given image and the secret-message information. Because a new image is generated by the G-model, our method can significantly resist most steganalysis tools. The steganalysis task is then to distinguish between an image generated using secret-message information and images generated by DCGAN and Pix2Pix. We built a steganalysis model using a deep convolutional network in order to evaluate the security of the proposed method; it distinguishes between images generated by our method and images generated by DCGAN and Pix2Pix. The steganalysis model consists of six convolutional units. All convolutional layers are Conv2D layers with kernel size 4, stride 2, "same" padding, and a kernel initializer, followed by InstanceNormalization, dropout with probability 0.5, and a LeakyReLU activation with slope 0.2. The output layer uses a sigmoid activation to produce a binary output. To train the steganalysis model, we marked stego images as positive samples and normal generated images as negative samples. After training, we tested it on our G-models: three sets of data were generated by the three different G-models with different random seeds and labeled as positive. At the same time, we trained Pix2Pix and DCGAN to generate images from the CIFAR-10 dataset and from 100-dimensional random noise, respectively; these data were labeled as negative. The experimental results show that the model has remarkably high security. Figures 10, 11, and 12 show the predicted probability and the corresponding images generated through G-model1, G-model2, and G-model3, respectively. We can observe from the figures that the security-analysis network makes an essentially random choice between "True" and "False", and the stego images are not distinguishable from the other generated images. A sketch of such a steganalysis network follows.
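A minimal sketch of this steganalysis network, assuming TensorFlow/Keras with the tensorflow_addons package for InstanceNormalization; the filter counts per unit are an assumption, as the paper does not state them.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides InstanceNormalization

def build_steganalysis_model(input_shape=(32, 32, 3)):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # six convolutional units: Conv2D (kernel 4, stride 2, "same" padding),
    # InstanceNormalization, LeakyReLU(0.2), Dropout(0.5)
    for filters in (64, 128, 256, 512, 512, 512):  # assumed filter counts
        x = tf.keras.layers.Conv2D(filters, kernel_size=4,
                                   strides=2, padding="same")(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
        x = tf.keras.layers.Dropout(0.5)(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # stego vs. not
    return tf.keras.Model(inputs, outputs)
```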
Fig. 10 Predicted probability and corresponding image generated using G-model1
Fig. 11 Predicted probability and corresponding image generated using G-model2
Fig. 12 Predicted probability and corresponding image generated using G-model3
5 Reliability

Consider a threat scenario in which an opponent knows that StegoPix2Pix has been applied and attempts to reveal the secret-message data from the stego image. To assess our method's reliability, we suppose that the opponent knows the network structure, hyperparameters, and dataset, but not the random seed (which initializes the parameters and thereby affects the values of the final model parameters). We trained two separate StegoPix2Pix models: one embeds the secret information and generates stego images, while the other attempts to predict the secret information from those stego images. First, we trained the two models under identical training conditions, i.e., the same dataset and hyperparameters; the accuracy of secret-information retrieval can then reach 0.99. Second, in another experimental setup, we applied a different random seed while keeping the other training conditions unchanged; the decoding accuracy was then only 0.39 (0.5 corresponds to random guessing), so no useful information can be acquired. Hence, the results validate the high level of reliability offered by the proposed model: even with full knowledge of the algorithm and the stego image, an adversary is incapable of recovering the secret information. To achieve even stronger steganographic security, the secret information can be encrypted before it is hidden in the cover images. The neural network of our model automatically learns a unique steganographic strategy, and that strategy stays the same when the model is trained under the same conditions. In a real application scenario, transmitting the whole parameter set of the model directly through a public channel would be extremely risky; instead, the two communicating parties can share the training conditions rather than the model parameters, and each trains the model locally under those conditions. Moreover, a distinct steganographic system can be obtained simply by modifying the training conditions. A small sketch of this shared-conditions setup follows.
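For illustration, a minimal sketch of how both parties might fix the shared training conditions locally, assuming TensorFlow; the seed value shown is only a placeholder.

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42  # placeholder: the agreed, secretly shared training condition
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
# Both parties now train StegoPix2Pix locally on the agreed dataset with
# the agreed hyperparameters, obtaining identical models without ever
# transmitting model parameters over a public channel.
```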
6 Conclusion

A novel image steganographic method based on the Pix2Pix framework has been proposed in this paper. It is an end-to-end neural network requiring no steganographic domain knowledge; the secret-message information stays embedded within the style features of the stego image. The proposed method is capable of resisting steganalysis algorithms. Regarding capacity, StegoPix2Pix can produce stego images carrying 0.01 bits per pixel. The experimental results establish that the approach proposed in this paper can compete with the prevailing steganographic algorithms on security, capacity, reliability, and visual effect. Yet this novel model retains drawbacks similar to previous deep steganographic models: in comparison with conventional steganographic algorithms, additional expense is needed to train our neural network.
In forthcoming research, we would like to consider other domains, such as text, audio, and video, to analyze the compatibility of our proposed method.
References
1. Fridrich J, Kodovsky J (2012) Rich models for steganalysis of digital images. IEEE Xplore. https://ieeexplore.ieee.org/document/6197267. Accessed Apr. 21, 2022
2. Fridrich J et al (2001) Detecting LSB steganography in color, and gray-scale images. IEEE Xplore. https://ieeexplore.ieee.org/document/959097. Accessed Apr. 21, 2022
3. Goljan M et al (2014) Rich model for steganalysis of color images. IEEE Conference Publication, IEEE Xplore. https://ieeexplore.ieee.org/document/7084325. Accessed Apr. 21, 2022
4. Goodfellow IJ et al (2014) Generative adversarial networks. arXiv:1406.2661 [cs, stat]. http://arxiv.org/abs/1406.2661. Accessed Apr. 21, 2022
5. Hayes J, Danezis G (2017) Generating steganographic images via adversarial training. In: Advances in neural information processing systems, vol 30. https://papers.nips.cc/paper/2017/hash/fe2d010308a6b3799a3d9c728ee74244-Abstract.html. Accessed Apr. 21, 2022
6. Holub V, Fridrich J (2012) Designing steganographic distortion using directional filters. In: 2012 IEEE international workshop on information forensics and security (WIFS), pp 234–239. https://doi.org/10.1109/WIFS.2012.6412655
7. Hu D et al (2018) A novel image steganography method via deep convolutional generative adversarial networks. IEEE Xplore. https://ieeexplore.ieee.org/document/8403208. Accessed Apr. 23, 2022
8. Isola P, Zhu J-Y, Zhou T, Efros AA (2018) Image-to-image translation with conditional adversarial networks. arXiv:1611.07004 [cs]. http://arxiv.org/abs/1611.07004. Accessed Apr. 21, 2022
9. Mielikainen J (2006) LSB matching revisited. IEEE Signal Process Lett 13(5):285–287. https://doi.org/10.1109/LSP.2006.870357
10. Min-ha-zul Abedin Md, Ghosh T, Mehrub T, Yousuf MA (2022) Bangla printed character generation from handwritten character using GAN. In: Soft computing for data analytics, classification model and control, studies in fuzziness and soft computing book series, vol 413. Springer, pp 153–165
11. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784. https://arxiv.org/abs/1411.1784. Accessed Apr. 21, 2022
12. Pevný T et al (2010) Using high-dimensional image models to perform highly undetectable steganography. SpringerLink. https://link.springer.com/chapter/10.1007/978-3-642-16435-4_13. Accessed Apr. 20, 2022
13. Qian Y et al (2015) Deep learning for steganalysis via convolutional neural networks. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/9409/94090J/Deep-learning-for-steganalysis-via-convolutional-neural-networks/10.1117/12.2083479.full?SSO=1. Accessed Apr. 21, 2022
14. Radford A et al (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434. https://arxiv.org/abs/1511.06434. Accessed Apr. 21, 2022
15. Sazzad S, Rajbongshi A, Shakil R, Akter B, Kaiser MS (2022) RoseNet: rose leave dataset for the development of an automation system to recognize the diseases of rose. Data Brief 44:108497
16. Shi YQ et al (2007) A Markov process based approach to effective attacking JPEG steganography—New Jersey Institute of Technology. https://researchwith.njit.edu/en/publications/a-markov-process-based-approach-to-effective-attacking-jpeg-stega. Accessed Apr. 21, 2022
17. Ghosh T, Min-Ha-Zul Abedin Md, Chowdhury SM, Tasnim Z, Karim T, Salim Reza SM, Saika S, Yousuf MA (2020) Bangla handwritten character recognition using MobileNet V1 architecture. Bull Electr Eng Inform 9(6):2547–2554. https://doi.org/10.11591/eei.v9i6.2234
18. Ghosh T, Min-Ha-Zul Abedin Md, Hasan Al Banna Md, Mumenin N, Yousuf MA (2021) Performance analysis of state of the art convolutional neural network architectures in Bangla handwritten character recognition. Pattern Recognit Image Anal 31(1):60–71
19. Zeng J et al (2017) Large-scale JPEG steganalysis using hybrid deep-learning framework. arXiv:1611.03233. https://arxiv.org/abs/1611.03233. Accessed Apr. 21, 2022
20. Zheng S et al (2017) Coverless information hiding based on robust image hashing. springerprofessional.de. https://www.springerprofessional.de/en/coverless-information-hiding-based-on-robust-image-hashing/13319066. Accessed Apr. 23, 2022
21. Zhu J, Kaplan R, Johnson J, Fei-Fei L (2018) HiDDeN: hiding data with deep networks. arXiv:1807.09937 [cs]. http://arxiv.org/abs/1807.09937. Accessed Apr. 21, 2022
Low-Cost Energy Efficient Encryption Algorithm for Portable Device

Abrar Fahim Alam and M. Shamim Kaiser
Abstract The number of portable digital devices is escalating day by day with the advancement of technology. As the number of digital-device users expands, the security concern for the privacy of their shared information is also growing. Cyber criminals wait for opportunities to steal and tamper with the secret information shared by users. Protecting the cyber world from these criminals by ensuring a secure communication channel between users at an affordable cost is now a big challenge for researchers. Considering these concerns, the target of this research is to develop an encryption algorithm for portable devices which is less costly and energy proficient. For this purpose, a low-cost, energy-efficient algorithm has been proposed which needs two private keys, of the sender and the receiver, and one global public key. A shared symmetric key is generated by mixing those keys on both the sender and receiver sides; for different sets of public and private keys, the generated shared symmetric key is different. This symmetric key can be used for encrypting the plain text and decrypting the cipher text. Lastly, a performance analysis has been performed which shows that the algorithm is resilient against the brute force attack and takes less computational time to execute in comparison with the existing Diffie–Hellman algorithm.

Keywords Encryption · Security · Privacy · Portable
1 Introduction

1.1 Background

With the advancement of technology, the number of users in the cyber world is escalating. Every second, enormous amounts of data are transferred into cyberspace, but the security of the transmitted data is one of the major concerns. Hackers and intruders constantly try to steal and corrupt these data.
To overcome these challenges, researchers have come up with different solutions [4, 8–11, 14, 27]. Encryption algorithms are widely used in the healthcare sector: Li et al. (2018) generate the initial key for a chaotic system using electrocardiography (ECG) signals and divide a plain picture into blocks of sizes appropriate for further encryption using an autoblocking approach [18]. As different languages have different symbols, researchers have taken distinct strategies for encryption and decryption; Geetha et al. (2020) have proposed a hybridized approach for Tamil-language encryption which uses a symmetric key for both encryption and decryption [12]. Encryption and decryption are used not only for text messages but also for graphical files: Zhang et al. (2018) have developed an asymmetric image encryption algorithm with the help of an elliptic curve cryptosystem (ECC) and a chaotic system [28]. We are now moving toward the era of the fourth industrial revolution, and an enormous amount of data is produced every day; one of the major concerns is to ensure the safety of this huge volume of data. Researchers are working to find efficient encryption techniques to protect data security for the appliances used in advanced technology: Rajesh et al. (2019) have proposed a lightweight encryption algorithm that can provide data security for IoT devices [21]. Researchers working on improving the performance of symmetric key generation algorithms have noted that it is tough to ensure both high secrecy and the low computational time that provides energy efficiency.
1.2 Problem Statement

Though many techniques are available for encryption, it is laborious to develop an algorithm that is both energy efficient and affordable for portable devices. This study is designed to develop a low-cost, energy-efficient algorithm for portable devices.
1.3 Aim and Objective

The goal of this research work is to develop an encryption algorithm that fills this research gap. To develop the algorithm, the following objectives have been set:
– To study different types of energy-efficient encryption algorithms available in the literature
– To design and implement a low-cost, energy-efficient encryption algorithm
– To compare the performance of the proposed algorithm with the existing Diffie–Hellman algorithm.
1.4 Research Outline

The remainder of the article is arranged as follows. In Sect. 2, a literature review of relevant work is provided, along with explanations. Section 3 presents the proposed system model in detail. In Sect. 4, a performance analysis of the proposed system model is shown. Lastly, in Sect. 5, the conclusion and future work are presented.
2 Literature Review

2.1 Related Works

Symmetric-key and asymmetric-key cryptography are the two main families of cryptographic algorithms, distinguished by the number of keys used. Asymmetric-key cryptography uses two types of keys: a public key, available to all, and a private key, available only to the specific user. Because it uses both a public and a secret key, asymmetric-key cryptography is also known as public-key cryptography. Symmetric-key cryptography, on the other hand, uses a single secret key in conjunction with the encryption and decryption methods to safeguard the communication content. The main challenge of symmetric-key cryptography is the exchange of the secret key over an insecure communication channel, since anyone who obtains the symmetric secret key can easily decrypt the cipher text. Researchers have used different techniques for encryption and decryption. Some researchers have used biological data for cryptography: every human has unique characteristics, which researchers use to generate the symmetric key for encryption and decryption. González-Manzano et al. (2017) used ECG signals to generate a time-invariant symmetric key; it works well over a 24-hour period but has not been tested on large-scale datasets [13]. Kumari et al. (2018) developed symmetric-key generation techniques for body sensor networks which work well against brute force and known-plaintext attacks, but no key revocation method was implemented in that work [17]. Sarkar et al. (2018) proposed a model in which reversible fingerprints of the sender and receiver are used for encrypting and decrypting the plain text, but this method is yet to be tested against major cyberattacks [23]. Bio-inspired algorithms are also used for producing the secret key for encryption: Sindhuja et al. (2014) used a genetic algorithm for symmetric-key encryption. Several key operations are used in that work, and the modulo operation is analyzed to have the highest order of complexity; though the proposed algorithm is easy to implement, it could not ensure a high level of security due to its linear substitution methodology [24]. Chunka et al. developed a genetic-algorithm-based key generation technique which is resilient against frequency and brute force attacks, but it has not been tested against other attacks [6].
Researchers have proposed different algorithms for secure secret-key exchange. The Diffie–Hellman algorithm is one of the most popular key exchange protocols; it uses two public keys and two private keys, where one public key is a prime number and the other is a primitive root of that prime [7]. However, this algorithm relies on power and mod functions (modular exponentiation), which increases the computational time needed to run the program. There are two major areas where an algorithm can be improved: computational time complexity and resilience to various attacks. Many researchers have worked on improving the performance of existing algorithms in terms of security and time complexity. For reference, a toy sketch of the classic Diffie–Hellman exchange is shown below.
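A minimal Python sketch of the textbook Diffie–Hellman exchange; the tiny parameters are illustrative only, as real deployments use very large primes.

```python
# Toy Diffie–Hellman exchange (textbook version [7]).
p, g = 23, 5             # public: a prime and a primitive root of p
a, b = 6, 15             # private keys of A and B
A = pow(g, a, p)         # A transmits g^a mod p
B = pow(g, b, p)         # B transmits g^b mod p
# Both sides derive the same shared secret via modular exponentiation.
assert pow(B, a, p) == pow(A, b, p)   # here the shared secret is 2
```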
2.2 Comparison Among Different Models

Researchers have not only developed different types of algorithms but also analyzed existing models, compared them, and examined their limitations. Purohit et al. used the Diffie–Hellman algorithm for producing an encryption key and found that the existing algorithm is not resilient against the plaintext attack, as both cipher text and plain text are available to intruders [20]. Vyakaranal et al. made a comparative analysis among different symmetric-key algorithms and found that the Blowfish algorithm has lower time and space complexity than the DES and AES algorithms [25]. Security of shared data is a burning topic in today's world, and researchers have developed distinct methodologies for different divisions of technology. Kako et al. developed a symmetric-key cipher that converts digraphs (sequences of two letters) into single letters; it shows better results than the existing Playfair algorithm, but it is not resilient against the plaintext attack [15]. Kuppuswamy et al. (2020) developed an authentication method for e-commerce systems by hybridizing existing methods [1]. Several researchers have applied different methodologies, including optimization, to improve the performance of existing system models. Ali et al. (2020) suggested a modified Diffie–Hellman algorithm which performs better than the original Diffie–Hellman algorithm for wireless sensor networks in most cases, but the algorithm is not resilient against the plaintext attack [2]. Researchers have tried various techniques to improve the time and space complexity of existing algorithms: Chiang et al. (2011) came up with another method for exchanging secret keys, a four-party key exchange protocol with lower service time and queuing delay than similar key exchange protocols [5]. Some researchers developed algorithms to protect data on insecure wireless networks but did not test them in real-time scenarios. Sahin et al. (2016) proposed a symmetric-key generation algorithm to protect data on insecure wireless networks,
but its performance has not been tested in real-life communication [22]. Some researchers have proposed algorithms which work well for certain test cases but not in other scenarios: Li et al. (2017) proposed a shared-key generation algorithm for insecure wireless channels which focuses only on passive attacks and might not work well against active attacks [19]. Researchers have also compared different models and tried to hybridize them into an optimal one. Amalarethinam et al. made a comparative analysis among different symmetric-key encryption algorithms and observed that data stored in the cloud can be secured by using a proper encryption algorithm, but no performance-optimization techniques for the existing algorithms were introduced in that work [3]. Keserwani et al. developed an optimized symmetric-key encryption algorithm by hybridizing existing models; though it performed well against the brute force attack, it is not resilient against the plaintext attack [16].
2.3 Research Gap

Over time, with the evolution of technology, researchers have presented different encryption algorithms, but a research work focusing on affordability for the mass of portable-device users is missing. Keeping that in mind, we aimed to develop a low-cost, energy-efficient, highly secure encryption algorithm suitable for portable devices.
3 Proposed System Model

The goal of this research work is to develop an encryption algorithm that not only assures the security of the data but is also affordable in terms of cost, consuming little computational power, which makes it suitable for portable devices like smartphones and laptops. The proposed system model has two major parts. First, the encryption algorithm generates a shared symmetric key, as shown in Fig. 1. The system model needs two private keys and one public key. Both the sender and the receiver mix their private keys with the global public key; after that, they exchange the keys they have generated. Then each party mixes the received key with its own private key, and at that point the shared symmetric secret key is generated. The procedure for generating the shared secret key is shown in Algorithm 1. First, the sender and the receiver mix their private keys with the global public key. This mixing is done with the XOR operation, which is used because it reduces the time complexity significantly. After mixing with the global key, two keys are generated, denoted x and y in the algorithm. The parties then exchange these generated keys. After that, both parties again mix the received key with their private keys.
Fig. 1 Symmetric key generation (the private keys of A and B are mixed with the global public key, the generated keys are exchanged, and each side derives the shared secret key)
This second mixing is also done with the XOR operation. After that, the shared secret key is generated at both ends; these secret keys are identical to each other. A minimal Python sketch of Algorithm 1 follows it below.

Algorithm 1 Calculate shared secret key
1: a ⇐ private key of A
2: b ⇐ private key of B
3: c ⇐ global public key
4: Generated key x = a XOR c
5: Generated key y = b XOR c
6: A sends x to B
7: B sends y to A
8: Generated secret key for A = y XOR a
9: Generated secret key for B = x XOR b
10: A and B now share a symmetric key
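A minimal sketch of Algorithm 1 in Python, using randomly drawn 128-bit integers as illustrative keys.

```python
import secrets

c = secrets.randbits(128)   # global public key
a = secrets.randbits(128)   # private key of A
b = secrets.randbits(128)   # private key of B

x = a ^ c                   # A sends x to B
y = b ^ c                   # B sends y to A

key_A = y ^ a               # A derives the shared key
key_B = x ^ b               # B derives the shared key
assert key_A == key_B       # both equal a XOR b XOR c
```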
In the proposed algorithm, the mixing of the public and private keys is done with the XOR operation. If someone steals the keys that have been exchanged, he cannot generate the same symmetric key, as he does not know the private keys of the sender or receiver.
Fig. 2 Encryption and decryption technique (the plain text "Message" is encrypted to the cipher text "]uccqwu" and decrypted back)
Having generated the shared secret symmetric key in the previous steps (marked in black in Fig. 1), we now use the key to encrypt the plain text into cipher text and decrypt it back. First, on the sender's side, the plain text is mixed with the secret key, which yields the cipher text. The cipher text is then sent through the insecure communication channel; as it is an encrypted message, no one can read the plain text sent by the sender. When the receiver receives the message, he decrypts it with the same secret key generated earlier and thus obtains the plain text sent by the sender. For the plain text "Message", the generated cipher text is "]uccqwu". Figure 2 shows a visual representation of the encryption and decryption technique, and a small worked example is given below.
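For illustration, a minimal XOR cipher in Python. A one-byte key 0x10 is assumed here, chosen because it reproduces the "Message" to "]uccqwu" example above; in the actual scheme the 128-bit shared key would be used.

```python
def xor_crypt(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; XOR is its own inverse
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = bytes([0x10])                 # assumed illustrative key byte
cipher = xor_crypt(b"Message", key)
print(cipher)                       # b']uccqwu'
print(xor_crypt(cipher, key))       # b'Message'
```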
4 Performance Analysis

As the XOR operation between the private keys and the global public key is used to generate the shared secret key, the shared secret key differs for each set of private keys of different users. As the shared secret key differs, it generates different cipher texts for different plain texts.
4.1 Resilience Against Brute Force Attack

The brute force attack is one of the most common cyber attacks; it works on a trial-and-error basis, checking all possible keys to crack the secret key. The proposed algorithm was tested against the brute force attack to check how long it would take to guess the shared symmetric key. The performance of the algorithm against the brute force attack is shown in Table 1.
Table 1 Performance against brute force attack

The key length | 128 bits
Required no. of test keys | 2^128
Attack speed | 1 key per microsecond
Time required | 5.4 × 10^24 years
The shared symmetric key generated on both the sender and receiver sides is 128 bits long, so a brute force attack would need to test up to 2^128 keys to find the shared symmetric key, as it must try every possible key. If the attack can test 1 key per microsecond, about 5.4 × 10^24 years would be required to find the generated symmetric key; a quick verification of this figure is sketched below.
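A back-of-envelope check of the figure in Table 1, computing the expected time when, on average, half the keyspace must be searched.

```python
keys_to_test = 2**128 / 2                     # average case: half the keyspace
microseconds_per_year = 365.25 * 24 * 3600 * 1e6
print(keys_to_test / microseconds_per_year)   # ~5.4e24 years
```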
4.2 Computational Time

A comparison has been made between the existing Diffie–Hellman algorithm and the proposed algorithm, shown in Fig. 3. From the figure we can clearly see that the proposed algorithm outperforms the existing Diffie–Hellman algorithm in terms of computational time. Yuki et al. analyzed the relation between compilation time and energy efficiency and found that improving compilation speed also improves overall energy efficiency [26]. So we can say that overall energy consumption will also be better for the proposed algorithm.
Fig. 3 Computational time analysis (computational time in μs over 21 test runs, Diffie–Hellman vs. the proposed algorithm)
5 Conclusion and Future Work

The number of portable digital devices is growing daily as a result of technological improvement. As more people use digital devices, security worries about the confidentiality of their shared information are growing. The private information shared by users is secret, and cyber criminals wait for opportunities to steal and alter it. Safeguarding a protected communication channel between users at an affordable cost, in order to defend the cyber world from these criminals, is currently a major issue for researchers. In light of these issues, an energy-efficient and inexpensive encryption algorithm has been proposed for portable devices. This algorithm has been tested only against the brute force attack; in future work, it will be tested against other types of attack.

Acknowledgements This is to acknowledge that the Information and Communication Technology (ICT) Division, Government of the People's Republic of Bangladesh, has awarded a master's fellowship for this research work.
References
1. Kuppuswamy P, Shanmugasundaram JR (2020) A novel approach of designing e-commerce authentication scheme using hybrid cryptography based on simple symmetric key and extended linear block cipher algorithm, pp 1–6. https://doi.org/10.1109/ICCIT-144147971.2020.9213815
2. Ali S, Ashraf H, Khan I, Saqlain S, Ghani A, Jalil Z, Ramzan MS, Alzahrani B (2020) An efficient cryptographic technique using modified Diffie-Hellman in wireless sensor network. Int J Distrib Sens Netw 16:1–24. https://doi.org/10.1177/1550147720925772
3. Amalarethinam G, Leena H (2018) A comparative study on various symmetric key algorithms for enhancing data security in cloud environment. Int J Pure Appl Math 85–94
4. Biswas M et al (2021) Accu3rate: a mobile health application rating scale based on user reviews. PloS One 16(12):e0258050
5. Chiang WK, Chen JH (2011) Tw-keap: an efficient four-party key exchange protocol for end-to-end communications. In: Proceedings of the 4th international conference on security of information and networks, pp 167–174. SIN '11, Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2070425.2070452
6. Chunka C, Goswami R, Banerjee S (2019) A novel approach to generate symmetric key in cryptography using genetic algorithm (GA): proceedings of IEMIS 2018, vol 1, pp 713–724. https://doi.org/10.1007/978-981-13-1951-8_64
7. Diffie W, Hellman M (1976) New directions in cryptography. IEEE Trans Inform Theory 22(6):644–654. https://doi.org/10.1109/TIT.1976.1055638
8. Esha NH et al (2021) Trust IoHt: a trust management model for internet of healthcare things. In: Proceedings of the ICDSA, pp 47–57
9. Farhin F, Kaiser MS, Mahmud M (2020) Towards secured service provisioning for the internet of healthcare things. In: Proceedings of the AICT, pp 1–6
10. Farhin F, Kaiser MS, Mahmud M (2021) Secured smart healthcare system: blockchain and Bayesian inference based approach. In: Proceedings of the TCCE, pp 455–465
11. Farhin F, Sultana I, Islam N, Kaiser MS, Rahman MS, Mahmud M (2020) Attack detection in internet of things using software defined network and fuzzy neural network. In: Proceedings of the ICIEV and icIVPR, pp 1–6
12. Geetha R, Padmavathy T, Thilagam T, Lallithasree A (2020) Tamilian cryptography: an efficient hybrid symmetric key encryption algorithm. Wireless Personal Commun 112(1):21–36
13. González-Manzano L, de Fuentes JM, Peris-Lopez P, Camara C (2017) Encryption by heart (EBH)-using ECG for time-invariant symmetric key generation. Future Gener Comput Syst 77:136–148. https://doi.org/10.1016/j.future.2017.07.018
14. Kaiser MS et al (2021) 6G access network for intelligent internet of healthcare things: opportunity, challenges, and research directions. In: Proceedings of the TCCE, pp 317–328
15. Kako N, Sadeeq H, Abrahim A (2020) New symmetric key cipher capable of digraph to single letter conversion utilizing binary system. Indonesian J Electr Eng Comput Sci 18:1028. https://doi.org/10.11591/ijeecs.v18.i2.pp1028-1034
16. Keserwani P, Govil M (2020) A hybrid symmetric key cryptography method to provide secure data transmission, pp 461–474. https://doi.org/10.1007/978-981-15-6318-8_38
17. Kumari P, Anjali T (2018) Symmetric-key generation protocol (SGenP) for body sensor network. In: 2018 IEEE international conference on communications workshops (ICC workshops), pp 1–6. https://doi.org/10.1109/ICCW.2018.8403548
18. Li C, Lin D, Lü J, Hao F (2018) Cryptanalyzing an image encryption algorithm based on autoblocking and electrocardiography. IEEE MultiMedia 25(4):46–56. https://doi.org/10.1109/MMUL.2018.2873472
19. Li Z, Wang H, Fang H (2017) Group-based cooperation on symmetric key generation for wireless body area networks. IEEE Internet Things J 4(6):1955–1963. https://doi.org/10.1109/JIOT.2017.2761700
20. Purohit K, Kumar A, Upadhyay M, Kumar K (2020) Symmetric key generation and distribution using Diffie-Hellman algorithm, pp 135–141. https://doi.org/10.1007/978-981-15-4032-5_14
21. Rajesh S, Paul V, Menon VG, Khosravi MR (2019) A secure and efficient lightweight symmetric encryption scheme for transfer of text files between embedded IoT devices. Symmetry 11(2). https://doi.org/10.3390/sym11020293
22. Sahin C, Katz B, Dandekar KR (2016) Secure and robust symmetric key generation using physical layer techniques under various wireless environments. In: 2016 IEEE radio and wireless symposium (RWS), pp 211–214. https://doi.org/10.1109/RWS.2016.7444407
23. Sarkar A, Singh BK (2018) Cryptographic key generation from cancelable fingerprint templates. In: 2018 4th international conference on recent advances in information technology (RAIT), pp 1–6. https://doi.org/10.1109/RAIT.2018.8389007
24. Sindhuja K, Devi SP (2014) A symmetric key encryption technique using genetic algorithm. Int J Comput Sci Inform Technol 5(1):414–416
25. Vyakaranal S, Kengond S (2018) Performance analysis of symmetric key cryptographic algorithms. In: 2018 international conference on communication and signal processing (ICCSP), pp 0411–0415. https://doi.org/10.1109/ICCSP.2018.8524373
26. Yuki T, Rajopadhye S (2013) Folklore confirmed: compiling for speed: compiling for energy. In: International workshop on languages and compilers for parallel computing. Springer, pp 169–184
27. Zaman S et al (2021) Security threats and artificial intelligence based countermeasures for internet of things networks: a comprehensive survey. IEEE Access 9:94668–94690
28. Zhang X, Wang X (2018) Digital image encryption algorithm based on elliptic curve public cryptosystem. IEEE Access 6:70025–70034. https://doi.org/10.1109/ACCESS.2018.2879844
Detection of Dental Issues Using the Transfer Learning Methods

Famme Akter Meem, Jannatul Ferdus, William Ankan Sarkar, Md Imtiaz Ahmed, and Mohammad Shahidul Islam
Abstract Humans frequently experience dental problems, and as the population consumes more sugar and sweets, these problems will become more prevalent. The dentist traditionally locates the problems through physical examination and X-ray photos. Technology is advancing quickly across the health sciences, and deep learning modules known as transfer learning are highly helpful in recognizing patterns or individual pixels in an imagenet. Dental X-ray pictures can be employed in the transfer learning process as well as in the CNN approach. In this study, dental X-ray pictures are recognized using six transfer learning models: Resnet50, VGG16, InceptionV3, Xception, Densenet201, and EfficientnetB7. The Densenet201 offers the best accuracy of 0.98 with a waiting time of 12.91 min, while the InceptionV3 offers the best waiting time of 7.58 min with an accuracy of 0.93. Although its waiting time is longer than InceptionV3's, it can be argued that dental abnormalities can be diagnosed more effectively with the Densenet201.

Keywords Dental issues · Teeth root infection · Transfer learning · Densenet201 · InceptionV3
1 Introduction

Dentistry is the area of medicine that deals with the mouth, teeth, and gums. It entails the investigation, diagnosis, management, prevention, and treatment of oral illnesses, disorders, and conditions, with the dentition (the growth and placement of teeth) and oral mucosa receiving the majority of the attention [1]. Dentistry may also include the temporomandibular joint and other components of the craniofacial complex.
In the dental specialty known as oral and maxillofacial pathology, diseases of the mouth, face, and other maxillofacial structures are diagnosed and treated. The mouth is the initial part of the gastrointestinal system and one of the main organs in the human body; it is important for breathing, digestion, and communication. The oral cavity is a vital organ that serves a number of purposes in our bodies, and it is also prone to a number of medical and dental conditions. Dental radiography has been utilized to find even the smallest alterations in the mouth's internal bone structures. It aids in guiding dental therapy and tracking the course of illness, among other things, and oral cancer can also be detected with dental radiography. A human tooth consists of two main components: the clinically visible crown, and the root, which is lodged in the jaw and is not clinically visible. The impact of illness on the tooth may be found by examining X-ray pictures. One of the most prevalent dental diseases globally is dental caries [2], the medical term for the typical dental cavity or tooth decay. Dental caries may occur in several phases, but the goal here is the classification of the illness, not the progression of its stage. It is caused by acids acting on the enamel surface. Enamel caries, dentinal caries, and pulp caries are the three main types [3]. It is a disease that may be prevented on an individual basis; for this reason, dental caries should be found early, since once it reaches the pulp the treatment becomes more difficult. There are several ways to find dental caries [4–6]. Over the past 20 years, practical applications of image processing have grown in biometric and biological image processing. Different biomedical imaging modalities, including computed tomography (CT), X-ray, ultrasound imaging, magnetic resonance imaging (MRI), and countless others, are utilized in medical image processing for diagnosis and treatment planning in the therapeutic sector [7, 8]. In recent years, a variety of radiographic 2-D image processing techniques have been used successfully to detect oral diseases. A significant portion of the population worldwide is affected by dental diseases, therefore identifying dental caries is crucial for diagnosing the condition and planning a course of treatment [9, 10]. Medical imaging technologies like computed tomography (CT) and X-rays have helped in the treatment and diagnosis of several ailments in recent years [11]. Dental informatics is an emerging topic in dentistry that not only helps to improve the treatment and diagnosis process but also saves time and lessens stress in daily activities [12]. Thanks to high-resolution image sensors and biosensors, massive volumes of data may be processed by computer programs to assist dental practitioners in making decisions related to prevention, diagnosis, or treatment planning, among other things [13]. Radiographs are taken by passing X-rays from an X-ray generator through the oral cavity; some tissues absorb the radiation, while the rest passes through the patient and is absorbed by a detector. This technique, called projective radiography, creates two-dimensional pictures that depict the interior structures of the human body [14].
Dental radiographs fall into one of two categories: extraoral, where the patient is positioned between the X-ray source and the radiographic film, and intraoral, where the film is placed inside the buccal cavity. The most typical intraoral and extraoral dental X-rays are bitewing, periapical, and panoramic.
AI is a branch of computer science that aims to study and build intelligent systems, frequently instantiated as computer programs. Nowadays, healthcare most commonly uses a branch of AI called machine learning and, in turn, deep learning. Clinical applications of AI in dentistry include radiology, orthodontics, periodontics, endodontics, and oral pathology. Among the challenges of AI, the transparency of AI algorithms and data is a substantial issue. Our proposed models are Inception V3, ResNet-50, VGG16, Xception, Densenet 201, and EfficientnetB7. We use these models to attain our target of solving dental issues.
2 Literature Review

2.1 Previous Research on Dental Teeth Issues

Tooth decay and dental decay are other names for dental caries. The first approach is to use diagnostic imaging tools to find dental caries at its earliest stages; radiographically, caries is detectable only in the later stages of the disease, when the tooth structures have undergone sufficient decalcification. The shape and intensity fluctuation in dental X-ray images may make segmentation challenging. Radiographs are frequently used by dentists to detect cavities, bone loss, malignant or benign tumors, and concealed dental structures [15]. Dental decay affects 60–90% of children and nearly 100% of adults. QLF employs stains that have side effects, whereas optical image-based caries monitoring technology poses no health risks and is able to segment each tooth's caries lesion and track how it grows. Finding the precise location of caries lesions on afflicted teeth will aid dentists in providing better follow-up and diagnosis. Sensitivity, specificity, error rate, and accuracy are studied in comparison. Using RGB and other types of pictures, the method provides a comprehensive solution with over 93.5% accuracy for determining the shape and depth of the caries lesion, respectively. Every intermediate image, such as a mask, segmented teeth, or caries mask, is kept, and the overall processing time is 2.5 to 3 s [16]. Direct observation of the patient's diagnosis or determination is part of an image-processing method. The tools are freely available for studying and researching digital photographs, and they process a group of affected teeth, offering an alternative approach for the depiction and splitting of a large tooth head via region-based segmentation. The GLCM/SVM image is transformed into a 2-D format, changing the image's entropy, contrast, intensity, and noise levels. The CPU, memory, and other elements are factors that influence how quickly images are processed. The investigation's findings, the patients' medical history, and their bone fragments are all distinguished expertly [17]. Dental radiograph interpretation has made tremendous progress with deep-learning-based visual pattern recognition. Deep-learning-based image analysis in the
context of dental imaging has demonstrated significant (around 90%) accuracy in the segmentation, classification, and diagnosis of various prevalent dental disorders. The results open a window of opportunity for improved diagnosis and treatment planning in dental medicine, although standardization of data and generalization of AI-based solutions are just two of the many issues that still need to be resolved [18]. In 2-D dental radiography, dental image processing covers preprocessing, segmentation, and classification, which is useful for separating the background bone regions from the foreground teeth, e.g., via graph-cut segmentation of the oral-cavity tissues. A deep-learning-based convolutional neural network (CNN) achieved results with 97.07% accuracy. Calculating the histogram of a 2-D dental X-ray image effectively takes O(n²(mm + k)) time. For dental disease analysis, a neuron's output is computed as Net output = Σᵢ xᵢwᵢ + b, and energy, entropy, homogeneity, and correlation are used in the examination of dental features. Dental X-ray images have been segmented using the level-set approach, the watershed approach, the Niblack approach, fuzzy c-means, the Canny method, and the Sobel method [19]. Fourteen distinct dental conditions could manifest. Information about semantic segmentation is obtained using a CNN model, with several image-processing operations carried out, achieving instance segmentation with a mask region-based convolutional neural network and extracting features with ResNet-101. The neural networks must be trained and tuned for supervised learning. The disadvantage is that the approach only looks for teeth, ignoring other types of issues such as dentures and areas where teeth are absent [20]. A convolutional-neural-network-based algorithm, the Faster Regional Convolutional Neural Network (Faster R-CNN), proposes a novel method for detection and classification using the ideas of anchors and intersection over union. The algorithm's advantage is that it detects objects using bounding boxes rather than manually separating each tooth from the set of teeth. It was implemented for training and testing and compared with the ground truth; it provides more than 90% accuracy for detection and 99% accuracy for classification [21]. Convolutional neural networks (CNNs) are able to solve problems like image identification, segmentation, and classification with a high degree of accuracy; one study investigated how well a CNN could diagnose from a small labeled dental dataset. Accuracy is increased by the use of transfer learning: transfer learning with the pretrained VGG16 model achieved 88.46% accuracy, and three alternative CNN architectures were examined through experimentation. For the categorization task, the most prevalent disorders, such as dental caries, periapical infection, and periodontitis, were considered [22]. A deep learning model (Mask R-CNN) can detect and classify tooth decay on occlusal surfaces across the full 7-class ICDAS (International Caries Detection and Assessment System) scale; Mask R-CNN is an object-detection and instance-segmentation DNN. Transfer learning and data augmentation were used throughout the training of the model, and the reported accuracy in detecting caries damage on occlusal surfaces attained 86.3% [23]. Deep convolutional neural networks (CNNs) have also been used to identify tooth decay on periapical radiographs. A pretrained GoogLeNet Inception v3 CNN network
was used for preprocessing and transfer learning. The accuracies of the premolar, molar, and combined premolar-and-molar models were 89.0% (80.4–93.3), 88.0% (79.2–93.1), and 82.0% (75.5–87.1), respectively. The premolar model attained the highest AUC (P < 0.001). The deep CNN thus showed considerably good efficiency in detecting tooth decay in periapical radiographs [24].
2.2 Transfer Learning Models

Transfer learning is an optimization that enables faster progress or improved performance when modeling a second task [25].
(a) Resnet50: ResNet-50 is a convolutional neural network with 50 layers. The ImageNet database provides a pretrained version of the network, trained on more than a million images. The pretrained network can categorize photos into 1000 different object categories, including several animals, a keyboard, a mouse, and a pencil; the network has thereby acquired rich feature representations for a wide range of images. The network accepts images with a resolution of 224 by 224 [26].
(b) VGG16: VGG16 is a convolutional neural network (CNN) architecture. The 16 in VGG16 stands for 16 weighted layers: thirteen convolutional layers, five max-pooling layers, and three dense layers make up 21 layers in total, of which only 16 are weight layers, also known as learnable-parameter layers. VGG16 was employed to win the 2014 ILSVRC (ImageNet) competition and is regarded as one of the best vision model architectures created to date [27].
(c) InceptionV3: Inception v3, a module for GoogLeNet, is a convolutional neural network that aids in object detection and image analysis. It is the third iteration of Google's Inception convolutional neural network, first introduced during the ImageNet Recognition Challenge, and is a pretrained model with 48 layers trained on more than a million photos from the ImageNet database. The pretrained network can categorize photos into 1000 different object categories, including several animals, a keyboard, a mouse, and a pencil [28–30].
(d) Xception: The Xception model is a 71-layer deep CNN based on an extreme interpretation of Google's Inception model, by which it was inspired. Depth-wise separable convolutional layers are stacked throughout the architecture. A pretrained version of the network, trained on more than a million images, can be imported from the ImageNet database.
The pretrained network can categorize photos into 1000 different object categories, including several animals, a keyboard, a mouse, and a pencil. Xception is an extension of the Inception architecture that uses depth-wise separable convolutions in place of the regular Inception modules [31].
(e) Densenet201: DenseNet-201 is a convolutional neural network that is 201 layers deep. A version pretrained on more than a million images from the ImageNet database can be loaded [32]. The pretrained network can categorize images into 1000 object classes, such as keyboards, mice, pencils, and many animals [33, 34].
(f) EfficientNetB7: EfficientNet-B0 is the baseline network developed by AutoML MNAS, while EfficientNet-B1 to B7 are obtained by scaling up the baseline network. The total number of layers in EfficientNet-B0 is 237, and in EfficientNetB7 it comes to 813. The largest EfficientNet model, EfficientNet-B7, achieved state-of-the-art performance on the ImageNet and CIFAR-100 datasets: about 84.4% top-1 and 97.1% top-5 accuracy on ImageNet, while being 8.4 times smaller and 6.1 times faster than the prior best CNN model [35, 36].
A minimal usage sketch for such pretrained models is given below.
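For illustration, a minimal transfer-learning sketch assuming TensorFlow/Keras, with DenseNet-201 as a frozen feature extractor and a binary (sound vs. problematic tooth) output head; the input size and head design are assumptions, not details taken from this paper.

```python
import tensorflow as tf

base = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # reuse ImageNet features; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```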
3 Proposed System

The teeth are a major element of the human body, and tooth images are well suited to identifying dental issues or problems; however, human teeth cannot be usefully imaged with ordinary cameras, so X-rays are used instead. X-rays of the teeth are very common for dental issues, and in this research X-ray images of teeth were collected. The images are then fed into the proposed system, where image segmentation and augmentation are performed first. Once the images have been brought to the same structure, they are divided into training and test data. This research builds on transfer learning techniques, and a number of transfer learning modules are used to identify the best module for future use. In addition, the system training time, the run time of the epochs, and the total time for identifying the dental issues are observed. The popular transfer learning techniques called VGG16, Inception V3, Resnet50, Xception, Densenet201, and EfficientNetB7 are used in this research. After finding the accuracy of each module, a comparison is made to sort out the module with the best accuracy for identifying vulnerable teeth (Fig. 1).
4 Methodologies

The main aim of the research is to find vulnerable teeth using the deep learning technique called transfer learning. It will help dentists identify teeth issues without relying solely on expert judgment, as the X-ray images of teeth are compared against known normal and abnormal teeth, so the system can identify vulnerable or broken teeth easily. Section 3 describes the proposed system of the research, and based on the proposed system the methodologies are divided into some basic parts, explained below.

Fig. 1 Proposed model diagram
4.1 Data Collection

Teeth images are needed to identify teeth issues, and there must be a trained model through which the identification can be easily obtained. As ordinary photographs of teeth are rarely used, X-ray images of teeth were collected; in dentistry, X-ray images are the most common imaging modality, as they can reveal root or cavity issues of the teeth. A total of 120 X-ray images were collected for this research from a dentist, comprising both good and bad teeth samples. The X-ray images were inspected for blurring, and any blurred image was replaced with an appropriate one.
4.2 Image Preprocessing

Once the X-ray image collection was done, the images were uploaded to the system. In this research, the system is built on Python 3, and preprocessing starts just after the images are uploaded. The images differed in size and resolution; however, the teeth in the images can still be easily identified by the system. A sketch of this step follows.
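A possible preprocessing step, assuming TensorFlow; the target size of 224 × 224 is an assumption, as the paper does not state the resolution used.

```python
import tensorflow as tf

def load_and_preprocess(path, size=(224, 224)):
    img = tf.io.read_file(path)
    img = tf.image.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.resize(img, size)  # unify the differing resolutions
    return img / 255.0                # scale pixel values to [0, 1]
```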
4.3 Train Test Splitting

The preprocessed images are split into training and test images for training in the transfer learning techniques. Of the 120 images in total, 80 images are used for training purposes and 40 images are used for testing purposes. The test data contains images of problematic teeth as well as adequate teeth. Once the train-test splitting is done, the images are ready for training in the transfer learning modules; a possible split is sketched below.
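An illustrative split matching the stated counts, assuming scikit-learn and hypothetical `images` and `labels` arrays.

```python
from sklearn.model_selection import train_test_split

# 120 images in total: 80 for training, 40 for testing
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=40, stratify=labels, random_state=0)
```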
4.4 Transfer Learning Modules

The popular transfer learning techniques called VGG16, Inception V3, Resnet50, Xception, Densenet201, and EfficientNetB7 are invoked for the training and testing of the image data. As the system's main goal is to find the best model that can stand in for a dentist's manual identification of damaged teeth, accuracy measurements were needed to identify the best models. Each algorithm's accuracy, training loss, testing loss, and training and validation accuracy were measured to find the best transfer learning model. The result-and-discussion section below contains the accuracy figures and visualizations of the training loss, testing loss, validation accuracy, and performance accuracy. The waiting time for obtaining the result is also reported, so that researchers can figure out which algorithm to use for the identification, since everybody wants fast results alongside accurate measurement. One way to record this waiting time is sketched below.
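One way to record the "waiting time" reported below, assuming a compiled Keras `model` and `train_ds`/`test_ds` datasets prepared as above; all three names are placeholders.

```python
import time

start = time.time()
history = model.fit(train_ds, validation_data=test_ds, epochs=20)
elapsed_min = (time.time() - start) / 60
print(f"waiting time: {elapsed_min:.2f} min")
```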
5 Result and Analysis

As described in the methodology, the proposed system builds models using transfer learning techniques that allow a dentist to evaluate dental issues of patients from X-ray images in the system. In this research, X-ray images of dental issues and six transfer learning models are used. The accuracy measurements of the six algorithms must be taken into account, as the research focus was to find the best model for reducing the physical workload of dentists. A total of 20 epochs were used for each algorithm, and the training loss and training accuracy were measured. Ten epochs were used initially; however, the accuracy was not good enough for use in the research. Later, 20 epochs were used and the accuracy was found to be excellent. Each algorithm's average training accuracy, average training loss, average validation loss, and average validation accuracy are given in Table 1.
Table 1 Transfer learning models' waiting time, training loss, validation loss, training accuracy, and validation accuracy averages

Model name       Waiting time (m)   Training loss avg   Validation loss avg   Training accuracy avg   Validation accuracy avg
Resnet50         11.21              2.98                2.40                  0.22                    0.15
VGG16            31.41              0.27                0.24                  0.96                    0.96
InceptionV3      7.58               0.55                0.48                  0.90                    0.93
Xception         11.66              0.19                0.21                  0.95                    0.93
Densenet201      12.91              0.02                0.01                  0.98                    0.98
EfficientnetB7   27.78              35.57               33.13                 0.09                    0.08
Based on Table 1, Densenet201 gives the best accuracy and needs a total of 12.91 min to complete its epochs. It is also observed that InceptionV3 and Xception reach a similar validation accuracy of 0.93; however, InceptionV3's waiting time of 7.58 min is much lower than Xception's 11.66 min, so InceptionV3 is preferable to Xception. InceptionV3's waiting time is also more convenient than that required for Densenet201. This research therefore concludes that both Densenet201 and InceptionV3 are reasonable choices when accuracy and waiting time are considered together. In Figs. 3 and 4, the training and validation accuracy and loss are plotted for Densenet201 and InceptionV3, respectively; the remaining models are not good enough for further use, so they are excluded from these plots.
Fig. 2 Sample X-ray images for research purposes
Fig. 3 Densenet201 training and validation loss and accuracy
Fig. 4 InceptionV3 training and validation loss and accuracy
6 Conclusion In conclusion, it can be said that dentists can easily reduce their physical work by implementing the transfer learning procedure for detecting vulnerable human teeth. In this procedure, X-ray images are identified through comparison with previously trained data. If the system can do this automatically, meaning the machine itself provides the result, it will be more autonomous; in the future, we will try to integrate machine instruments so that reports can be produced instantly. The system produces a good result with Densenet201 at an accuracy of 98%, and InceptionV3 works effectively at 93%, where the time comparison shows that InceptionV3 needs less time than Densenet201. If more epochs could be used in the research, even better accuracy could be obtained, as people want the least time with the highest benefit or accuracy.
Healthcare Professionals Credential Verification Model Using Blockchain-Based Self-sovereign Identity Shubham Saha, Sifat Nawrin Nova, and Md. Ishtiaq Iqbal
Abstract In healthcare, fundamental trust and reliability depend on health professionals. However, there are several records worldwide of doctors practicing with forged licenses, which damages the profession's image, reduces trust, and puts patients' lives at stake. Professionals need to verify their identity to ensure that they are licensed to practice medicine. They also have to share their personal information to verify their certificates with the hospital where they intend to practice, or with the medical college where they pursue higher degrees; this process is not secure enough, as it carries a serious threat of misuse of their personal information. To overcome these two major problems, a system is required that authenticates a healthcare professional's digital credentials with minimal identity disclosure. In this research work, we propose a privacy-preserving Self-Sovereign Identity (SSI) model leveraging a decentralized Blockchain that verifies healthcare professionals' credentials to impede practicing with fraudulent licenses, discloses minimal information, provides data secrecy by proving an attribute from a credential without disclosing the actual value, and provides a secured end-to-end communication channel for data transmission. Keywords Healthcare · Hospitalist · Blockchain · Decentralized identifier · Identity management · Self-sovereign identity · Verifiable credentials
1 Introduction Verifying someone's identity is the most significant step in building trust, which enhances reliability. Consider, for instance, the relationship between a doctor and a patient. Patients heavily rely on doctors for wellness and health, but unfortunately the reality is often horrendous. Numerous reports from all over the world address the fact that many doctors practice medicine, treating patients without proper training S. Saha (B) · S. N. Nova · Md. I. Iqbal University of Information Technology and Sciences (UITS), Baridhara, Dhaka 1212, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_32
certification or licenses. For instance, in 2021, nearly a thousand practitioners around the state of Florida who were not fully qualified medical doctors were found by investigations. In another case, five doctors were investigated on October 14, 2015, in India; some held medical degrees, while two had not even passed the HSC exams. Due to the availability of sophisticated scanning and printing technologies today, credential fabrication has increased. As a result, it is becoming more crucial and challenging to validate certificates and confirm the identities of their holders [17]. The standard practice of verifying identity and running pre-employment checks has improved over time as healthcare service providers have followed new legislation, and numerous identity management models have been proposed. Nevertheless, until recently, no model was able to address concerns about the verification of a doctor's credentials along with the sovereignty of a doctor's personal and sensitive information. With the advent of Blockchain technology, issues with paper certificate verification have been addressed [9]. Satoshi Nakamoto first put forward the idea for the technology [15], which has four key components: it is decentralized, trustless, collectively maintained, and a reliable database [4]. Due to its tamper-proof and decentralized nature, Blockchain technology enables a user to manage their credentials via dispersed networks and gives a user the ability to control their records; a Blockchain-based certificate cannot be altered. A new identity management model called Self-Sovereign Identity (SSI) was launched with the advent of Blockchain. It aims to address all the aforementioned problems and provides a user with full sovereignty over their identity as well as credential verification over the accompanying private and confidential data. It keeps all private data in a user-owned and managed digital wallet and maintains identity ownership. The digital wallet works similarly to a physical wallet in that it stores all digital credentials as tangible objects, but these credentials are digitally signed and verified, and are far quicker to issue and verify than their physical counterparts [14, 20]. It is also a peer-to-peer model: there is no intermediary between the user and the organization. This chapter provides a conceptual model of a certificate issuing and verification method for doctors using Blockchain technology to combine various certifications and licenses as part of the issued credential, without eliciting doctors' personal sensitive information or sacrificing their control over it. This system model focuses on eliminating the tedious verification procedure, establishing a secure channel for data transmission, and maintaining all the principles of the privacy policy. This chapter is segmented into seven sections. Section 2 describes some of the basic concepts of Self-Sovereign Identity. Section 3 provides some literature surveys relevant to our study.
Section 4 describes the methodology of the proposed model. Section 5 discusses the prospective outcome of this model. Section 6 highlights some challenges regarding the model. Finally, Sect. 7 summarizes the chapter and examines potential areas for future development.
2 Basic Concepts of Self-sovereign Identity A partial identity in digital form is referred to as a digital identity. Any given entity may have one or more distinct or non-distinct digital identities. The expansion of the Internet and online services has sparked a demand for practical, secure, and privacy-preserving digital identity and access management infrastructures [5]. As a result, various identity management models were developed. Early models depended on a centralized identity system where the organization and user interacted within a core database. Afterward, an improved version called federated identity was deployed, which also maintained a core database with an identity provider. As an emerging technology, SSI eliminates the drawbacks of the previous identity management systems, including the need for any third-party identity provider. It allows direct contact between a user and an organization and removes the third-party system. It also offers a user full control over their identity-related personal and confidential data through the use of a digital wallet, resolving the key ownership issue. SSI has three crucial roles in its ecosystem: Issuer, Holder, and Verifier. Credentials are made and given to a holder by an issuer. When necessary, the holder can share the credentials with a verifier after receiving them from the issuer. A verifier accepts and verifies the credentials that a holder has presented [14]. SSI is implemented based on three pillars: Decentralized Identifiers (DID), Verifiable Credentials (VC), and Blockchain technology.
2.1 Decentralized Identifiers Self-Sovereign Identity depends on Decentralized Identifiers (DIDs). They enable two parties to establish individual, private, and reliable peer-to-peer connections. DIDs allow an individual to be authenticated in a way equivalent to a login system, but without depending on a trusted third-party organization, and they can be handled completely by their owners. In order to avoid control by a single authority, DIDs are frequently kept in distributed ledgers such as Blockchain ledgers [16].
2.2 Verifiable Credential Verifiable Credentials (VCs) are the collection of tamper-proof assertions utilized by Issuers, Holders, and Verifiers. The DIDs, Issuers’ DID documents, and credential schemes are frequently stored in a distributed ledger. Through a combination of public
key cryptography and secrecy strategies to restrict correlation, Verifiable Credentials essentially enable the electronic watermarking of claimed data. This has the effect of enabling third parties to instantly verify this data without having to contact the issuer, as well as enabling owners of certain credentials to provide particular information from this credential on a selective basis without revealing the actual data [21].
2.3 Blockchain The term "Blockchain Technology", or Distributed Ledger Technology (DLT), refers to the underlying technology of decentralized databases that gives entities authority over how information evolves among them through a peer-to-peer network, while utilizing consensus algorithms to ensure replication across the network's nodes. Self-Sovereign Identity uses Blockchain technology to establish trust between the parties and to guarantee the legitimacy of the data and attestations without actually storing any personal data on the Blockchain. This is essential because a distributed ledger is immutable, which means that anything added to it cannot be changed or removed [7].
3 Literature Survey Strong identity verification procedures must be maintained by healthcare providers to make sure that all of their professionals are who they say they are and have the knowledge and training necessary for the position. Unlicensed medical practice has unfortunately been reported in various parts of the world; this compromises patient safety and erodes confidence in the field as a whole. Recently, 3,000 doctors' credentials had to be checked by the UK General Medical Council after it was discovered that a fraudulent psychiatrist had been working for 23 years without the required credentials [6]. The improvement of healthcare professionals' credential verification has been the subject of numerous research investigations; this section discusses a few of them. In [13], the authors proposed a privacy-preserving decentralized peer-to-peer infrastructure for safe and secure hospital networks in healthcare that allows mutual authentication using decentralized identifiers and verifiable credentials. In [10], the authors propose a Blockchain-based DDSSI wallet to update the current identity management system, which will be utilized for identification as well as access control to provide validation and authorization of entities in a digital system. The researchers in [13] illustrate the four major Blockchain application features in healthcare, with a particular emphasis on security, interoperability, data sharing, and mobility. The main components of a Blockchain-based identity management system, such as accessibility, compliance, regulation, incorporation, and standardization, are described in [12]. It does not, however, handle matters pertaining to healthcare, such
as those involving physicians, patients, researchers, hospitals, clinics, and insurance companies. In [8], the researchers contrast many identity management technologies that are now in use, such as decentralized authentication, DNS, Blockchain architecture, and privacy protection, among others. In [1], the authors cover fraud detection and identity verification; their key focus in fraud detection was digital document verification, voting systems, and digital transactions, while in identity verification they focused on notarizing marriages, birth certificates, and business contracts.
4 Proposed Methodology This research incorporates an identity verification system for healthcare professionals based on Self-Sovereign Identity (SSI). The entire workflow of the system is illustrated in this section. A conceptual framework of the model is proposed in Fig. 1, where a healthcare professional maintains a digital wallet, an application that stores their credentials. This wallet establishes a connection with a trusted authority to receive all required credentials for verification. In this model, two entities are used as certificate issuing authorities, named Medical School and Medical Regulatory Authority, and one entity is used as a certificate verifier, named Practicing Hospital. 1. Medical School: In order to become a licensed doctor, a degree from a medical school is mandatory. For undergraduate students, there are degrees such as Bachelor of Medicine, Bachelor of Surgery (MBBS), and Bachelor of Dental Surgery (BDS). For graduate students, Doctor of Medicine (MD), Master of Surgery (MS), and similar degrees are offered. 2. Medical Regulatory Authority: A group that the government of a nation, state, province, etc. has authorized with the responsibility of licensing doctors, which enables them to engage in the practice of medicine, and of providing the guidelines for practice followed by licensed doctors in that country. 3. Practicing Hospital: Licensed physicians practice in a hospital where they provide care for a wide range of illnesses. Hospitalists treat patients who are admitted to the hospital for a range of ailments and/or wounds. First, a doctor will use their digital wallet to establish a connection with trusted authorities, such as the Medical College and the Medical Regulatory Authority. To establish a reliable communication channel, a peer DID is used because of its secure key exchange mechanism; it eliminates the engagement of a third-party key distributor, making the channel more private. The trusted authorities will present a QR code to be scanned through the digital wallet, whereby public DID documents will be exchanged and a secured peer DID communication channel will be established. Through this channel, the authorities will issue all requested certificates in the form of Verifiable Credentials (VC) by appending their public DID document to the credential. Moreover, the
Fig. 1 The proposed model
use of digital signatures in a VC directly addresses the privacy-preserving requirement because of the use of asymmetric cryptography. The most popular asymmetric keys are constructed using the Rivest-Shamir-Adleman (RSA) function [22]; a public key and a private key are the two keys that are always produced by this technique [11]. Consider the working algorithm for establishing a connection along with message encryption and decryption, defined in Algorithm 1. First, the DW uses the CIA's public key to encrypt the plaintext and signs the result using its own private key. The message is then sent to the CIA, which verifies it using the DW's public key and decrypts it using its own private key. The DID Documents for the CIA and DW include all the data necessary for this interaction [3]. A user's certificates are encrypted with the user's public key, so they can be decrypted only with the user's private key. After the verifiable credentials are issued by the authorities, the doctor stores them in their digital wallet; also, the information and the signature of the credential issuer are registered in the decentralized ledger. Afterward, when a practicing hospital wants to check the certifications of a doctor to prove their validity, the doctor responds to a proof request by using their digital wallet to create a cryptographic presentation, called a credential presentation, from one or more verifiable credentials.
Algorithm 1 DID Communication Between Digital Wallet (DW) and Certificate Issuing Authority (CIA) System
1: DW has a private key (ska) and a DID Document for CIA containing an endpoint (endpointCIA) and a public key (pkb).
2: CIA has a private key (skb), and a DID Document for DW containing its public key (pka).
3: DW encrypts plaintext message (m) using (pkb) and creates an encrypted message (eb).
4: DW signs eb using her private key (ska) and creates a signature (s).
5: DW sends (eb, s) to endpointCIA.
6: CIA receives the message from DW at endpointCIA.
7: if Verify(s, eb, pka) = 1 then
8:   CIA decrypts eb using skb.
9:   CIA reads the plaintext message (m) sent by DW.
10: end if

Fig. 2 Working procedure of the attestation and verification
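Algorithm 1 is not tied to a concrete library in the chapter; the following is an illustrative Python sketch of steps 3-9 using RSA (the asymmetric scheme the text names) via the cryptography package. The key names mirror the algorithm, while the padding choices (OAEP for encryption, PSS for signing) are our assumptions.

```python
# Illustrative sketch of Algorithm 1 with RSA; ska/pka belong to DW, skb/pkb to CIA.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

ska = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # DW's key
skb = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # CIA's key
pka, pkb = ska.public_key(), skb.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

# DW side: encrypt m under CIA's public key (step 3), sign the ciphertext (step 4).
m = b"credential request"
eb = pkb.encrypt(m, oaep)
s = ska.sign(eb, pss, hashes.SHA256())

# CIA side: verify the signature with DW's public key (step 7), then decrypt (8-9).
try:
    pka.verify(s, eb, pss, hashes.SHA256())
    print(skb.decrypt(eb, oaep))
except InvalidSignature:
    print("message rejected")
```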
After that, the doctor will generate a QR code from the wallet including the credential presentation and DID document. The practicing hospital will scan the QR code; after scanning, DID documents will be exchanged and a secured connection will be established. Here, the practicing hospital has zero knowledge of the doctor's data. To verify the doctor, instead of validating the actual data in the credentials, it checks the Blockchain ledger to verify the authenticity of the attestation signature and the issuer's information, such as the medical school, and thereby decides whether to validate the proof. If the signature and the issuer's information match the ledger, the proof is considered authentic. To make the framework clearer, Fig. 2 illustrates a scientific analysis of the proposed system based on the rudimentary working procedure of the attestation and verification scheme using a Distributed Ledger. Based on this concept, a use case is presented below, along with a sequence flow diagram, to elaborate the practical scenario of this model for assessing the forgery of any individual doctor: a doctor who needs to verify their credentials to prove their eligibility to practice medicine in a reputed hospital.
Fig. 3 Example of identity verification with minimal disclosure of information
Fig. 4 Sequence flow diagram for an individual doctor
To identify themselves, the doctor needs to present some identification documents to the credential verifier, the Practicing Hospital. Based on the SSI-based system, the doctor will be verified. Figure 3 shows an example of a doctor who receives transcripts from issuers and, at the time of identity verification, discloses minimal information. Figure 4 depicts the procedure of the use case.
1. First, the doctor needs issuers such as a medical college and a medical regulatory authority to provide the intended certificates. For instance, MC is a medical college and MRA is a medical regulatory authority. A doctor who has completed their preliminary medical studies and is eligible for a practicing license requests the credential issuers to issue their certificate and license as proof of being a verified healthcare professional.
2. Both MC and MRA maintain a wallet or web application to establish a secure connection with the doctor via their digital wallet. The issuers present a QR code for creating a connection.
3. The doctor scans the QR code with their own digital wallet and establishes a secure connection with the credential issuers.
4. The issuer then issues the required verifiable credentials to the doctor's digital wallet.
5. When a credential is received in the digital wallet, it is stored and can be retrieved whenever it is needed.
6. When the doctor wants to practice further in a reputed hospital (assume PH is a practicing hospital), they need to verify their identity to ensure that they are authorized and licensed to practice medicine.
7. The doctor then embeds their credentials into a QR code (see the sketch after this list) and presents it to the credential verifier, PH, to establish a connection and provide the credentials as proof of the doctor's authenticity.
8. PH also maintains a wallet or web application, through which it scans the QR code.
9. After establishing a connection with the doctor's digital wallet, PH cross-checks the validity of the doctor's credentials and notifies them with an acknowledgment.
10. Based on the credential verification, doctors with fraudulent licenses can be stopped without any hassle.
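As a small illustration of step 7, a QR code carrying the presentation payload could be generated with the qrcode package as sketched below; the payload structure and field names are purely hypothetical, since the chapter does not define a wire format.

```python
# Illustrative QR generation for step 7 (hypothetical payload structure).
import json
import qrcode

payload = json.dumps({
    "credential_presentation": "<signed presentation>",   # hypothetical field
    "did_document": "<doctor's public DID document>",     # hypothetical field
})
img = qrcode.make(payload)        # returns a PIL image of the QR code
img.save("presentation_qr.png")   # presented to the verifier for scanning
```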
5 Discussion Identity can be used for both beneficial and detrimental purposes. Our system addresses a number of identity and privacy criteria through ten major SSI principles [19]. Each SSI principle is mapped to the corresponding proposed model requirement below:
1. Existence: In the initial communication establishment between the doctor and the credential issuer, the doctor's identity is confirmed.
2. Control: The doctor is in charge of their identity data and keys on their own device, and they have the authority to alter identity attributes and decide how much to reveal.
3. Access: The doctor has access to all identifying details and keys via the wallet. The ledger should be available to them on demand.
4. Transparency: Anyone should be able to evaluate how the systems and algorithms function, because they are transparent.
5. Persistence: Each entity's DID is stored in the ledger, together with the agents' metadata, consent receipts, and public claims, which are permanent, accessible, and valid unless their owners specifically invalidate them.
6. Portability: The ability to transfer identity-related data and services to other services is required. Smartphones can be used to transfer client identity information. As an alternative, the client can decide to store their data on cloud-based services of their own choice.
7. Interoperability: It is always feasible to create a digital identity, and it is widely useful. This model relies on DIDs and Verifiable Credentials to ensure full interoperability among the entities.
8. Consent: The terms and conditions under which the identifying attributes are communicated to the credential issuer should be clearly stated by the doctor. A record of their approval should be made in the ledger for future audit purposes.
9. Minimization: When a healthcare professional's data is disclosed, only the minimum amount of data should be disclosed.
10. Protection: All applicable policies and obligations for each entity of the model should be made explicit.
Instead of verifying the user's credentials by disclosing the data, which takes time and risks human error during cross-checking, a Blockchain ledger is used to check whether the issuer's information and the attested signature registered when issuing the credentials remain tamper-proof. Complexity in the healthcare system does not truly promote innovation; a more structured approach to experimentation and implementation in relatively confined, risk-free situations is necessary. An SSI system only functions when it is widely adopted. This must be driven from a national perspective and demonstrate how the entire ecosystem benefits [2, 18].
6 Research Challenges This research proposes a conceptual model addressing a modern issue that needs to be resolved. Implementing the model while solving every corner case is a core challenge. Without a centralized authority, consensus can now be maintained across all nodes because of Blockchain technology. However, despite better algorithms that reduce transactional delay while updating entries on ledgers, fundamental problems such as a lack of real-time transaction settlement and scalability continue to be significant barriers to Blockchain deployment in daily life. A new Blockchain based on an enhanced version of Directed Acyclic Graphs (DAG), named TechPay, might mitigate the problem by overcoming the issues of running Blockchain platforms; it aims to address long-standing issues with current public distributed ledger technologies and sets itself apart from the conventional block-based storage architecture [2].
7 Conclusion and Future Scope This research proposes a model for a healthcare professionals' credential verification system based on the principles and three pillars of Self-Sovereign Identity. Various general identification systems are in the production phase; however, they have not yet been standardized for use in healthcare applications. The work in this research attempts to solve a variety of issues. Firstly, it solves the authentication and verification problem of physicians by imposing new angles on how the unauthorized practice of doctors should be prohibited. Secondly, it gives a doctor sole control over their personal information. It can make a favorable and desirable space in every authentication and verification system to improve the technology with the latest security standards. Also, to reduce time and enhance security, a QR code scan is used to establish a secured connection between credential issuers and credential owners. Without ever looking at the information on a doctor's certification, a healthcare organization could rely on its veracity. Future work on this research focuses on various possibilities. Initially, to extend the work, improving the system design for a secured and decentralized digital wallet so that it can be utilized in versatile scenarios of the healthcare industry by enabling real-time transactions is a prime factor. Also, the system will be designed for the revocation of a doctor's credentials based on the doctor's consent.
References 1. TECHPAY WHITEPAPER. https://www.techpay.io/home_page/assets/TECHPAY%20COIN %20WHITE%20PAPER.pdf (2022). Last accessed 10 Aug 2022 2. Abramson W, van Deursen NE, Buchanan WJ (2020) Trust-by-design: evaluating issues and perceptions within clinical passporting. arXiv preprint arXiv:2006.14864 3. Abramson W, Hall AJ, Papadopoulos P, Pitropakis N, Buchanan WJ (2020) A distributed trust framework for privacy-preserving machine learning. In: International conference on trust and privacy in digital business. Springer, pp 205–220 4. Crosby M, Pattanayak P, Verma S, Kalyanaraman V et al (2016) Blockchain technology: beyond bitcoin. Appl Innov 2(6–10):71 5. Dabrowski M, Pacyna P (2008) Generic and complete three-level identity management model. In: 2008 second international conference on emerging security information, systems and technologies. IEEE, pp 232–237 6. Dyer C (2018) Gmc checks 3000 doctors’ credentials after fraudulent psychiatrist practised for 23 years 7. Ferdous MS, Chowdhury F, Alassafi MO (2019) In search of self-sovereign identity leveraging blockchain technology. IEEE Access 7:103059–103079 8. Garg T, Kagalwalla N, Churi P, Pawar A, Deshmukh S (2020) A survey on security and privacy issues in IOV. Int J Electr Comput Eng (2088-8708) 10(5) 9. Grech A, Camilleri A (2017) Blockchain in education. JRC science for policy report. European Commission 10. Islam MT, Nasir MK, Hasan MM, Faruque MGG, Hossain MS, Azad MM (2021) Blockchainbased decentralized digital self-sovereign identity wallet for secure transaction. Adv Sci Technol Eng Syst J 6(2):977–983
11. Kaur R, Kaur A (2012) Digital signature. In: 2012 international conference on computing sciences. IEEE. pp 295–301 12. Kuperberg M (2019) Blockchain-based identity management: a survey from the enterprise and ecosystem perspective. IEEE Trans Eng Manage 67(4):1008–1027 13. McGhin T, Choo KKR, Liu CZ, He D (2019) Blockchain in healthcare applications: research challenges and opportunities. J Netw Comput Appl 135:62–75 14. Naik N, Jenkins P (2020) Self-sovereign identity specifications: govern your identity through your digital wallet using blockchain technology. In: 2020 8th IEEE international conference on mobile cloud computing, services, and engineering (MobileCloud). IEEE, pp 90–95 15. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Decentralized Bus Rev 16. Papadopoulos P, Abramson W, Hall AJ, Pitropakis N, Buchanan WJ (2021) Privacy and trust redefined in federated machine learning. Mach Learn Knowl Extr 3(2):333–356 17. San AM, Chotikakamthorn N, Sathitwiriyawong C (2019) Blockchain-based learning credential verification system with recipient privacy control. In: 2019 IEEE international conference on engineering, technology and education (TALE). IEEE, pp 1–5 18. Shuaib M, Alam S, Alam MS, Nasir MS (2021) Self-sovereign identity for healthcare using blockchain. Mater Today: Proceed 19. Soltani R, Nguyen UT, An A (2018) A new approach to client onboarding using self-sovereign identity and distributed ledger. In: 2018 IEEE international conference on internet of things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData). IEEE, pp 1129–1136 20. Tobin A, Reed D (2016) The inevitable rise of self-sovereign identity. The Sovrin Found 29(2016):18 21. W3.org: verifiable Credentials Data Model v1.1. https://www.w3.org/TR/vc-data-model/ (2022). Last accessed 1 Aug 2022 22. Yudistira R (2020) AES (advanced encryption standard) and RSA (rivest-shamir-adleman) encryption on digital signature document: a literature review. Int J Inf Technol Bus 2(2):26–29
Emerging Applications for Society
A Classified Mental Health Disorder (ADHD) Dataset Based on Ensemble Machine Learning from Social Media Platforms Sabrina Mostafij Mumu, Hasibul Hoque, and Nazmus Sakib
Abstract Machines in the modern world are becoming more intelligent. Recognizing patterns and categorizing enormous volumes of data into discrete values is one of the typical tasks performed by machine learning systems. Meanwhile, mental health illnesses are increasingly among the most prevalent medical conditions nowadays, and machines are being utilized in various ways to identify mental health issues. In this study, machine learning is used as the baseline to diagnose the mental health condition known as Attention Deficit Hyperactivity Disorder (ADHD). An unsupervised dataset for ADHD was converted into a classified dataset. To get better results, an ensemble machine learning model made up of five classifiers has been deployed. After that, linguistic models were employed to improve the machine's ability to predict. This newly created dataset offers the potential for future developments, such as chatbots that can be developed to provide people with psychological support and the application of natural language processing to identify the behavior patterns of ADHD patients. Keywords Machine learning · Ensemble machine learning · Mental health · ADHD
1 Introduction Mental health issues are increasing day by day. Mental health diseases come in many different forms, and ADHD is one of them. ADHD stands for Attention Deficit Hyperactivity Disorder. Typically, this condition affects the neurological S. M. Mumu (B) · H. Hoque · N. Sakib Ahsanullah University of Science and Technology, Tejgaon Dhaka, 1208, Bangladesh e-mail: [email protected] H. Hoque e-mail: [email protected] N. Sakib e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_33
system and the brains of people, leading to hyperactivity and difficulty focusing [4]. The inability to focus and to remain still are therefore symptoms of this ailment. Whether for a child or an adult, it has an impact on how they live their lives. ADHD affects people in three distinct ways: the hyperactive-impulsive type, the predominantly inattentive type, and a mix of these two kinds; researchers studying this illness have identified these three forms. Both children and adults who are hyperactive-impulsive find it difficult to sit still. As a result of their restlessness, they may speak brashly or have bodily twitches. They may seem to have endless energy, which would explain why they are unable to stop moving. The majority of persons with ADHD have this form; it is recognized as ADHD-PH in medical parlance. With the inattentive type, the patient is unable to maintain focus and easily becomes distracted; ADHD-PI is the name of this form. The third is the combined kind, which shows symptoms of both ADHD-PH and ADHD-PI [1]. The inability to pay attention, following instructions inadequately, making thoughtless mistakes, and acting disruptively are some of the most typical signs of ADHD. Sometimes, those who suffer from this condition have uncontrollable spending urges and a lack of concentration. They frequently overlook special arrangements, including parties, shopping, and other activities. Medications exist that can keep this condition under control. Unfortunately, there is very little awareness of ADHD in Bangladesh; it is possible that some people are afflicted with this illness but are unable to receive the right care because of a lack of understanding. Machines are becoming friendlier to humans these days. They are trained to help and mimic humans, and a vast number of machines are trained to behave intelligently using advanced machine learning techniques. For that purpose, a suitable dataset is required. A large dataset [8] was found, although it is unsupervised; it contains a large amount of data extracted from Reddit. Therefore, the aim is to make the dataset supervised. In this method, 100 data points were chosen as control data and expertly labeled. Three human annotators subsequently annotated 400 data points following the control data [2]. Then, the human-annotated data were utilized as training data, while the control data were used as testing data. The highest accuracy among the three annotators was 83.5%, which is used as the trustworthiness score [11]. The annotators annotated 4.5 thousand data points while maintaining accuracy at or above the trustworthiness score. Then, active learning was employed using ensemble machine learning [6]. To evaluate the outcomes of this study, a model is constructed that employs maximum voting to classify the input from the outputs of several machine learning models, including Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, and Gaussian Naive Bayes. This process improves the machine's ability to predict the class, as the combined classification gives more accuracy than a single classifier [12]. The class of the text data was automatically predicted by the computer, and, when necessary, the annotators verified and updated the class. Thus, 18 thousand data points were labeled. Additionally, deep learning models like BERT and LSTM were useful in classifying the data more accurately [18].
The contribution of this work can be summarized with the following points:
• The newly constructed model should perform well and be useful both for people who are seeking assistance online and for those who are unaware of their condition.
• The research community can use this dataset to classify mental health disorders more efficiently using machine learning and to train more transformer models.
• Chatbots can be created to provide people with psychological assistance.
The rest of the paper contains some related works in this field in Sect. 2. Section 3 describes the datasets that were used in this research. Section 4 describes the newly produced dataset (Classified ADHD) along with the annotation process. Sections 5 and 6 contain the future scope of this work and bring the study to a close.
2 Related Work Due to the lack of awareness of ADHD, relatively little research has been conducted in this area, though a number of studies on the interaction between ADHD and machine learning have been published in recent years. The majority of earlier publications have utilized various techniques or strategies for their intended purposes. To examine people's psychology and emotions, RoBERTa, a transformer-based architecture model, is employed [19]. The initiated model has been compared against two classifiers: Long Short-Term Memory (LSTM) and BERT. All the predictions were based on contextual information. The RoBERTa classifier outperformed the other suggested models, according to the author, who claimed that this is the first multi-class model to employ the transformer-based architectural model RoBERTa to examine people's emotions and psychology. Another study suggests a novel machine learning approach that analyzes student data and classifies ADHD [15]. The subjects were selected from surrounding schools and varied in age from 10 to 12 years old; there were 28 patients with ADHD (4 girls and 24 boys) and 22 healthy children (3 girls and 19 boys). The task was performed twice by 17 ADHD persons in the subgroup, once with and once without medication. They created a meta-learner and employed a voting ensemble classification technique to categorize a declassified dataset. This model has a sensitivity of 0.821, a specificity of 0.727, and an AUROC [9] of 0.856. A machine learning approach has also been proposed to classify whether children have ADHD or not [17]. The data used in that research were extracted from the 2018-2019 National Survey of Children's Health. Among 45,779 children, about 5218 had ADHD and the rest were healthy. A combination of oversampling and undersampling approaches was used to balance the class levels, as they were highly imbalanced. Logistic Regression was used to extract the significant factors. Eight machine learning classifiers were used, and among them Random Forest provided the best accuracy of 85.5%, a sensitivity of 84.4%, a specificity of 86.4%, and an AUC of 0.94. To analyze Reddit user postings, a method was presented that combines Natural Language Processing (NLP) methods with machine learning approaches. People's feelings are generally revealed through postings; as a result, the goal was
to discover the facts behind the depression of those Internet users [20]. The text features were extracted using a combination of n-gram, LDA, and LIWC features; term frequency-inverse document frequency (TF-IDF) was also combined with the n-grams. The model was built using Logistic Regression, Support Vector Machine, Random Forest, Adaptive Boosting, and Multilayer Perceptron classifiers. Bigrams were combined with SVM to improve the detection of depression; this approach achieved an accuracy of 80%. Multilayer Perceptron classifiers were used on the combined features and attained a greater accuracy of 91%. A system based on data extracted from Reddit has been proposed to diagnose people with long-term mental disorders such as Anxiety, Depression, Bipolar disorder, and ADHD (Attention Deficit Hyperactivity Disorder) [21]. This suggested model includes a semi-supervised learning strategy (co-training). Frequently used classifiers such as Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB) have been employed. The dataset was divided 80-20 percent into training and testing data, respectively; a total of 3922 postings were counted for this study. For feature extraction, TF-IDF was used, and pruning and chi-squared methods were used for feature selection. SVM with co-training yielded an F-measure of 0.84, Naive Bayes with co-training yielded 0.83, and Random Forest with co-training yielded 0.83. Based on behavioral differentiation, another study used machine learning models to identify autism and ADHD [14]. The dataset, stemming from a primarily autism-based collection, has a significant imbalance. A total of 65 questions were chosen as features, and each response was graded from 1 to 4. The complete dataset was broken down into ten stratified folds, which were then used for cross-validation in the machine-learning pipeline. SVC, LDA, Logistic Regression, Random Forest, Decision Tree, and Categorical Lasso were employed; five of the six algorithms performed nearly as well with five features or fewer, and four of the algorithms (SVC, LDA, Categorical Lasso, and Logistic Regression) performed with a similar accuracy of 0.962 to 0.965 using the same five items and are therefore considered the best models. A further study uses machine learning to classify ADHD into subtypes based on EEG measures, exploiting the distinct EEG measurements of ADHD patients compared to normal patients [22]. EEG power spectra are used here for the EEG measurements. 117 adult patients were tested, and 50 of them served as controls. Four different EEG conditions, including EC, EO, VCTP, and ECTP, were recorded, and the data obtained under these four conditions were measured independently. To choose the best features, the SVM classifier was employed in conjunction with a forward selection strategy, and the classifier was then trained. To determine the outcome, voting was used to combine the results from the different classifiers. As a result, the outcome was enhanced, with a 100% accuracy rate in subgroup differentiation.
Table 1 ADHD dataset (sample row)
Title: Not feeling affects of aderall.
Selftext: I've been on 30 aderall for a couple weeks now and it seems like the boost in my ability to pay attention has already faded
Score:
Id: kzux1
Created-utc: 1239041887
Created-datetime: 2009-05-31 17:08:19

Table 2 ADHD women dataset (sample row)
Body: You definitely need to talk to HR. Politely. It's illegal for them to deny you a job for a legitimate medical condition and prescribed medications
Score: 345
id: cy5o5j8
Created-utc: 1304183283
Created-datetime: 2011-04-30 17:08:03
3 Dataset Description The name of the dataset is "Reddit ADHD Dataset" [8], which was gathered from a website. There are four csv files in the dataset; two of them, named "ADHD" and "adhdwomen", were used in this work. The dataset corpus consists of eight columns, including the post, the title of the post, a score that reflects the total number of likes, the id of the user, the URL of the post, the number of comments, the created coordinated universal time, and the created date and time acquired from Reddit [7] (Table 1). The "ADHD" dataset contains around three lakh (300,000) records, whereas "adhdwomen" has around forty-four thousand. The datasets are unsupervised because they are not mapped to any output [10] (Table 2). A section of another sentiment dataset was also used, since most of the Reddit posts portray the lives of ADHD patients; this reduces overfitting of the model. The name of that dataset is "Emotions in Text", and its corpus consists of two columns, labeled text and emotion, each with around 25 thousand records [3]. Only texts from positive emotion classifications were included from this dataset, so that the machine is trained properly and is not biased (Table 3).
Table 3 Positive sentiment dataset (sample rows)
Text: I am ever feeling nostalgic about the fireplace i will know that it is still on the property | Emotion: Love
Text: I have been with petronas for years i feel that petronas has performed well and made a huge profit | Emotion: Happy

Table 4 Classified ADHD dataset (sample rows)
Sentence: I feel like all my struggles in the past and present make sense now. I always thought I was too dumb or lazy to do well in school, because it was so hard to focus on anything. I've dropped out of college twice. First time due to failing, second time I impulsively withdrew from all my classes the night before the semester started. Now that I know I have ADHD I am determined to learn how to manage it, so that I can return to school. Anyone else a college student? How do you manage? I have problems with studying for more than five minutes at a time | Label: Yes
Sentence: I enjoyed this semester and i enjoyed the challenges i got to face and overcome and i feel that i m really coming away with a lot of valuable experience out of this | Label: No
4 Classified ADHD Dataset Construction The name of our labeled dataset is “Classified ADHD Dataset”.
4.1 Corpus Description The corpus of this dataset consists of two columns, named sentence and label; each column contains around eighteen thousand records. The sentence column contains text data from the previously mentioned datasets: highly scored data from "ADHD" and "adhdwomen", and positive emotion texts from "Emotions in Text" [13]. The label column indicates whether or not the corresponding sentence reflects ADHD (Table 4). Around eighteen thousand data points were classified, of which around ten thousand are classified as "Yes" and around eight thousand as "No" (Fig. 1).
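As a hedged sketch of how such a corpus could be assembled with pandas, the snippet below concatenates high-scoring Reddit posts with positive-emotion texts. The file names, column names, and the score threshold of 50 are illustrative assumptions based on the table excerpts above, not the authors' exact pipeline.

```python
# Illustrative corpus assembly (assumed csv names, column names from
# Tables 1-3, and a score threshold of 50 for "highly scored" posts).
import pandas as pd

adhd = pd.read_csv("ADHD.csv")
women = pd.read_csv("adhdwomen.csv")
emotions = pd.read_csv("Emotions in Text.csv")

posts = pd.concat([
    adhd.loc[adhd["Score"] > 50, "Selftext"],
    women.loc[women["Score"] > 50, "Body"],
], ignore_index=True).dropna()

positives = emotions.loc[emotions["Emotion"].isin(["Love", "Happy"]), "Text"]

corpus = pd.DataFrame({
    "sentence": pd.concat([posts, positives], ignore_index=True),
    "label": "",   # filled in later by the annotation process
})
print(corpus.shape)
```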
Fig. 1 Pie chart
Fig. 2 Model diagram
First, an expert annotated one hundred data points, and a trustworthiness score of 83.50% was fixed. These one hundred data points were used to train the machine. Following that, some test data was given to the computer to predict the classification, and the accuracy was then calculated. Only the data that achieved the trustworthiness score were used to train the machine (Fig. 2).
4.2 Use of Active Learning Five Machine Learning models were applied in the text classification process: Logistic Regression, Naive Bayes Classifier, Support Vector Classifier, Random Forest Classifier, and Decision Tree Classifier. The activation function for logistic regression is sigmoid. The alpha value is 1, and the Gaussian Naive Bayes Classifier is
Fig. 3 ROC curve
employed. The "rbf" kernel is utilized for the SVM, since the hyper-plane would be non-linear given the large variation in the data. The Random Forest Classifier uses 100 estimators with the "gini" criterion and "sqrt" as the maximum features, and the kernel coefficient gamma is 0. The Decision Tree Classifier also uses the "gini" criterion. The predicted outputs from these five classifiers were ensembled using maximum voting to improve prediction accuracy. After manually labeling over three thousand data points, active learning was implemented [6]. The predicted outcomes of active learning were manually cross-checked. As a result, the dataset was transformed into a supervised dataset [10]. The ensemble model performed admirably in terms of accuracy (Fig. 3).
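A minimal sklearn sketch of the five-model, maximum-voting ensemble described above follows. The TF-IDF featurization and the densifying step are our assumptions (the paper does not state its feature extraction), while the classifier settings follow the text where given.

```python
# Sketch of the five-classifier hard-voting ensemble (assumed TF-IDF features).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("svm", SVC(kernel="rbf")),
        ("rf", RandomForestClassifier(n_estimators=100, criterion="gini",
                                      max_features="sqrt")),
        ("dt", DecisionTreeClassifier(criterion="gini")),
    ],
    voting="hard",  # maximum (majority) voting over the five predictions
)

# GaussianNB needs dense input, so the sparse TF-IDF matrix is densified first.
model = make_pipeline(
    TfidfVectorizer(),
    FunctionTransformer(lambda m: m.toarray()),
    ensemble,
)
# model.fit(train_sentences, train_labels)
# predicted = model.predict(unlabeled_sentences)  # candidates for manual checking
```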
4.3 Use of Language Model The supervised dataset, obtained from ensemble learning and active learning, was used to train deep learning language models such as BERT and an RNN to achieve greater accuracy [18].
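A hedged sketch of fine-tuning BERT on the binary Yes/No labels with Hugging Face Transformers follows; the checkpoint name, epoch count, and dataset objects (train_ds, test_ds) are illustrative assumptions, since the paper does not specify its training setup.

```python
# Illustrative BERT fine-tuning setup (hyperparameters are assumptions).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # two classes: Yes / No

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

# train_ds / test_ds are assumed Hugging Face Datasets built from the corpus:
# train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-adhd", num_train_epochs=3),
    # train_dataset=train_ds, eval_dataset=test_ds,
)
# trainer.train()
```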
4.4 Corpus Evaluation The quality of the newly annotated dataset is measured by this process.
4.4.1 Inter-Rater Reliability [5]
Inter-rater reliability assesses the degree of agreement between the subjective assessments of various raters, inspectors, judges, or appraisers. Inter-Rater Reliability was
employed in our study to label the dataset. Here, five machine learning models (Naive Bayes Classifier, Decision Tree Classifier, Support Vector Classifier, Random Forest Classifier, and Logistic Regression) have been used for classification, and the final label was chosen by majority voting over their predictions.
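Beyond majority voting, agreement between raters can also be quantified directly; a small sketch using Cohen's kappa from scikit-learn is shown below, with purely illustrative label vectors rather than the study's actual annotations.

```python
# Sketch: quantifying agreement between two raters with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

rater_a = ["Yes", "No", "Yes", "Yes", "No", "Yes"]  # illustrative labels
rater_b = ["Yes", "No", "No",  "Yes", "No", "Yes"]  # illustrative labels

kappa = cohen_kappa_score(rater_a, rater_b)  # 1.0 indicates perfect agreement
print(f"Cohen's kappa: {kappa:.2f}")
```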
4.4.2 Trustworthiness Score [11]
Our 100 labeled data points were approved by a mental health specialist; these are considered the control data. Three different annotators labeled 400 more data points using the control samples, and the accuracy of the newly labeled data was measured by keeping the control data as the test set. The trustworthiness score of this dataset is fixed at the maximum accuracy achieved among the three annotators, 83.50%. This dataset does not contain data that did not meet the trustworthiness threshold.
4.4.3 Confidence Score [16]
The dataset, Classified ADHD, is constructed through binary classification with two classes, “Yes” and “No”. Five Machine Learning models were applied to predict the output, and maximum voting was used to decide the final label or class. Of the eighteen thousand entries, 9952 are classified as “Yes” and 8048 as “No”. All five classifier models predicted 8016 entries as “Yes”, four classifier models agreed on 1247 entries, three on 689 entries, two on 1650 entries, and only one classifier model predicted 2932 entries as “Yes”. So, it can be said that 80.5% of the “Yes”-classified data was predicted as “Yes” by all five classifiers, whereas 36.4% of the “No”-classified data was predicted as “No” by five different models.
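The 80.5% agreement figure follows directly from these counts; a quick check:

```python
# Unanimous "Yes" votes as a share of all "Yes"-labeled entries.
print(round(100 * 8016 / 9952, 1))  # -> 80.5
```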
4.4.4 Dataset Discussion
The newly constructed Classified ADHD dataset has over 18,000 entries. With a split of 70% train data and 30% test data, accuracy increased to 94.92%, and with a split of 80% train data and 20% test data, ensemble learning increased accuracy to 95.05%. The language model LSTM achieved an accuracy of 94.1%, whereas BERT [23] achieved an accuracy of 89.83%.
5 Conclusion In classification, the content of the input is very important: if the input contains more information, the machine can learn properly. Machine Learning and Deep Learning have a great impact on the field of classification and can give good results when provided quality content as input. Moreover, combining the predictions of different Machine Learning models can yield better accuracy, and Deep Learning language models then provide further improvement. Combining the predictions of different Machine Learning models to gain accuracy and classify a little-known mental health disorder was the main work of this paper.
6 Future Work Our constructed dataset contains the daily life and emotions of ADHD patients as well as some positive emotions of people without the disorder, so this dataset can be used in sentiment analysis. The positively classified texts can be used for detecting multiclass mental disorders. Natural Language Processing can be applied to this dataset to detect words or patterns of ADHD patients and train machines for further research. Also, by recognizing patterns in the texts, it can be used to build chatbots that help people who need mental support.
References 1. About attention deficit hyperactivity disorder (2022) https://www.genome.gov/Genetic-Disorders/Attention-Deficit-Hyperactivity-Disorder 2. Data annotation: what is it? (2022) https://appen.com/blog/data-annotation/ 3. Emotions in text (2022) https://www.kaggle.com/datasets/ishantjuyal/emotions-in-text?resource=download 4. Information on ADHD (2022) https://www.psychiatry.org/patients-families/adhd/what-is-adhd 5. Inter-rater reliability: definition, examples & assessing (2022) https://statisticsbyjim.com/hypothesis-testing/inter-rater-reliability/ 6. ML: active learning (2022) https://www.geeksforgeeks.org/ml-active-learning/ 7. Reddit (2022) https://www.reddit.com/ 8. Reddit ADHD dataset (2022) https://www.kaggle.com/datasets/jerseyneo/reddit-adhd-dataset 9. ROC curves (2022) https://acutecaretesting.org/en/articles/roc-curves-what-are-they-and-how-are-they-used 10. Supervised versus unsupervised learning (2022) https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning 11. What is trustworthiness in qualitative research? (2022) https://www.statisticssolutions.com/what-is-trustworthiness-in-qualitative-research/ 12. Abbott D (1999) Combining models to improve classifier accuracy and robustness. Inf Fus (INFFUS) 13. Althnian A, AlSaeed D, Al-Baity H, Samha A, Dris A, Alzakari N, Abou Elwafa A, Kurdi H (2021) Impact of dataset size on classification performance: an empirical evaluation in the medical domain. Appl Sci 11:796. https://doi.org/10.3390/app11020796
14. Duda M, Ma R, Haber N, Wall D (2016) Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry 6(2):e732 15. Khanna S, Das W (2020) A novel application for the efficient and accessible diagnosis of ADHD using machine learning (extended abstract) 16. Mandelbaum A, Weinshall D (2017) Distance-based confidence score for neural network classifiers. arXiv:1709.09844 17. Maniruzzaman M, Shin J, Hasan MAM (2022) Predicting children with ADHD using behavioral activity: a machine learning analysis. Appl Sci 12(5):2737 18. Munikar M, Shakya S, Shrestha A (2019) Fine-grained sentiment classification using BERT 19. Murarka BRSRA (2020) Detection and classification of mental illnesses on social media using RoBERTa 20. Tadesse MM, Lin H, Xu B, Yang L (2019) Detection of depression-related posts in Reddit social media forum 21. Tariq S, Akhtar N, Afzal H, Khalid S, Mufti MR, Hussain S, Habib A, Ahmad G (2019) A novel co-training-based approach for the classification of mental illnesses using social media posts 22. Tenev A, Markovska-Simoska S, Kocarev L, Pop-Jordanov J, Müller A, Candrian G (2014) Machine learning approach for classification of ADHD adults. Int J Psychophysiol 93(1):162–166 23. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Language Identification in Multilingual Text Document Using Machine Learning Md. Mahmodul Hasan, A. S. M. Shafi, and Al-Imtiaz
Abstract Multilingual text documents have been used for decades to overcome language barriers. Multilingual documents in Bangladesh frequently include Bengali, English, and Arabic words. However, Optical Character Recognition (OCR) technologies struggle to correctly identify characters in documents written in multiple languages. Therefore, OCRs require a technique that helps to identify the languages of words in the documents. This study proposes a machine learning-based method for automatic language recognition and OCR-mask production from multilingual documents. By employing the masks, language-specific OCRs recognize the words in a document to generate machine-encoded text. This research utilizes hand-crafted feature extraction techniques with machine learning algorithms (support vector machine, random forest, decision tree, gradient boosting, and k-nearest neighbors) to identify words from different languages. This research also produces a new dataset for multilingual recognition tasks containing words from Arabic, Bengali, and English. Our suggested approach performs remarkably well at creating language masks for particular OCRs, facilitating multilingual text recognition challenges by simplifying OCR selection. Keywords Language Recognition · Multilingual Text Document · Bengali OCR · Language Specific Characteristics
Md. M. Hasan Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail-1902, Bangladesh A. S. M. Shafi · Al-Imtiaz (B) Department of Computer Science and Engineering, University of Information Technology & Sciences (UITS), Dhaka-1212 Gazipur, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_34
1 Introduction Automatic document processing facilitates the transformation of physical, real-world documents into digital text, which is incredibly useful for further processing, such as storing, retrieving, and indexing vast volumes of data. Optical Character Recognition (OCR) converts printed text/images into digital form. Multilingual papers, including government documents, forms, and other paperwork, are pervasive in daily life. For such multilingual documents, an OCR program built for a particular language will not function properly [1]. Much research has been conducted on OCR with satisfactory outcomes. However, most of it is language specific: an OCR algorithm designed for English will only work correctly with English [1]. OCR needs a training dataset or a unique algorithm for each language to recognize characters. The effectiveness of OCR is also influenced by the input data, noise, font family, and other factors. As a result, language recognition is crucial before running an individual OCR for a particular language in order to create a successful multilingual OCR [2]. For printed documents, language identification can be performed at the word level. Identifying the languages present is of primary importance for a multilingual document: documents can be classified based on their languages, and, once the language of the text in the image is known [2], the language-specific OCR system can be applied to extract textual information. The fundamental problem involved here is the classification of the textual regions by language. The overall language recognition operation may be broken down into three steps: pre-processing, language detection, and post-processing [3]. A reliable language recognition system can be designed by studying each language's distinctive characteristics. This research suggests a method for recognizing language based on identifying language-specific properties, specifically the projection profiles of words. We have utilized multiple machine learning classification algorithms to identify the languages using features extracted from images of words. This research has also created a new dataset for multilingual texts comprising Arabic, Bengali, and English written words. Individual classifiers were evaluated using samples of fonts of varying sizes and styles. The suggested technique performed remarkably well on multilingual printed materials, with enhanced performance by creating language masks for different OCRs. The rest of the paper is organized as follows: Sect. 2 focuses on related works. Section 3 discusses the system architecture of our proposed methodology. Section 4 gives the experimental findings, analysis, and discussion of the proposed model. Finally, Sect. 5 draws the conclusion of this work.
2 Literature Review Automatic language identification has been studied in many works for different languages. The authors of [4] categorized texts based on the mean, standard deviation, and skew of five component qualities. Using linear discriminant analysis of the retrieved characteristics, the documents were eventually classified with an average accuracy of 88%. The authors of [5] evaluated the difficulty of detecting written scripts in Indian official document images. Based on AAR (average accuracy rate) and MBT (model building time), the efficacy of several well-known classifiers was evaluated. The experiment was conducted on 459 images of paper material, and the logistic regression model had the highest AAR (98.9%) of all models. In addition, the experimental findings suggest that the BayesNet and random forest classifiers, with the lowest MBT of 0.09 s, had average accuracy rates of 96.7% and 98.2%, respectively. The authors of [6] developed a straightforward method for recognizing Kannada, English, and Hindi text from a printed document. The horizontal and vertical projection profiles served as the foundation for their language identification. Using their proposed method, the classification rates for Kannada, English, and Hindi were 98.25%, 99.25%, and 98.87%, respectively. Chaudhury et al. suggested two trainable classification approaches for identifying Indian scripts [7]. These approaches did not leverage projection profile characteristics to their full potential and were not evaluated on larger, more diverse datasets. Directional discrete cosine transforms (D-DCT) were used in [8] for word-level handwritten script identification. The authors classified words using LDA and k-nearest neighbor (k-NN) by calculating the mean and standard deviations of the left and right diagonals of DCT coefficients. The authors of [9] automatically identified Arabic and Latin words from handwritten or printed documents based on fractal analysis features. Their experimental results obtained classification rates of 96.64% and 98.72% using k-NN and RBF, respectively. Acharya et al. [10] presented a strategy for identifying Kannada, Malayalam, Telugu, Tamil, Gujarati, Hindi, and English words from printed texts by utilizing projection profiles and top pitch information. Padma et al. proposed a language identification system for Kannada, Hindi, and English text lines from printed documents based on the characteristic features of the top and bottom profiles of the input text lines and achieved an accuracy of 95.4% [3]. In this research, we have developed a method to recognize the languages of words in multilingual text documents containing Arabic, Bengali, and English.
3 System Architecture This section contains the proposed methodology of our work, including data collection, processing data for suitable representation, feature extraction, classification, and mask creation. The block diagram of our study is in Fig. 1.
Fig. 1 Block diagram of the proposed methodology
3.1 Data Collection We have collected data from several images of mixed text documents containing Bengali, English, and Arabic words. The dataset includes a total of nine thousand words, equally distributed across the three languages. This study has labeled them as 0, 1, and 2 for Arabic, Bengali, and English, respectively. The dataset is available at https://github.com/MahmodulHasan/OCR_Language_Recognition. Figure 2 presents a few samples of the dataset. We took 70% of the images for training and the remaining for testing and validation.
Fig. 2 Sample images from the dataset a Bengali b English and c Arabic image
Fig. 3 Illustration of pre-processing operation on word images a input image b grayscale image c binary image, and d inverted image
3.2 Preprocessing In real-life scenarios, printed words may have a variety of hues, font families, sizes, and orientations. Therefore, they must be converted into a uniform representation on which the feature extraction method operates. We first converted the images to grayscale, which reduces the difficulty of recognizing images from different color spaces. To further simplify categorization, we then converted the grayscale images to binary images and inverted their intensity values. Figure 3 depicts the outcomes of the different stages; a minimal sketch of this chain follows.
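The sketch below reproduces the described grayscale-to-binary-to-inverted chain, assuming OpenCV; the Otsu thresholding and the input path are our own illustrative choices, not details specified by the paper.

```python
# A minimal sketch of the pre-processing chain in Fig. 3.
import cv2

img = cv2.imread("word.png")                      # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # (a) -> (b) grayscale
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # (b) -> (c) binary
inverted = cv2.bitwise_not(binary)                # (c) -> (d) inverted intensities
```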
3.3 Feature Extraction This study uses hand-crafted feature extraction techniques to extract features from images. In addition, we have utilized language-specific knowledge of the properties of the features. Feature 1: Max-index: the position on the Y axis of the row with the maximum number of white pixels in a binary image. Figure 4 shows the visualization of this feature. Feature 2: Min-index: the position on the Y axis of the row with the minimum number of white pixels in a binary image. Figure 4 shows the visualization of this feature.
Fig. 4 Visualization of max and min index features
Feature 3: Absolute difference between the top-min-index and bottom-min-index. We divide each image into top and bottom parts for this feature (Fig. 5) and calculate the min index for each part; the absolute difference between the two min indexes is feature 3. Feature 4: The absolute difference between the top-min-index and middle-index. This feature is extracted similarly to feature 3, but is computed using the max index's absolute difference instead of the min index (Fig. 6). Feature 5: The absolute difference between the middle-index and bottom-min-index, demonstrated in Fig. 7. Feature 6: Absolute difference between the first and second top-max-index; this feature plays a vital role in classifying bold texts. Feature 7: Absolute difference between the first and second bottom-max-index; it has the same impact as feature 6. A code sketch of these projection-profile features follows the figure captions below.
Fig. 5 Visualization of top and bottom features
Fig. 6 Visualization of top max and bottom max features
Fig. 7 Visualization of middle and bottom min features
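As a hedged illustration of the projection-profile idea, the sketch below computes features 1-3 from a binary word image with NumPy; features 4-7 follow the same pattern with max indexes of the image halves. The function name and the exact tie-breaking of argmax/argmin are our assumptions.

```python
# A sketch of the horizontal projection-profile features described above,
# assuming `word_img` is a binary image with white (non-zero) text pixels.
import numpy as np

def profile_features(word_img):
    profile = (word_img > 0).sum(axis=1)          # white pixels per row
    max_idx = int(np.argmax(profile))             # Feature 1: max-index
    min_idx = int(np.argmin(profile))             # Feature 2: min-index
    mid = word_img.shape[0] // 2
    top, bottom = profile[:mid], profile[mid:]
    top_min = int(np.argmin(top))                 # min index of top half
    bottom_min = int(np.argmin(bottom)) + mid     # min index of bottom half
    feat3 = abs(top_min - bottom_min)             # Feature 3
    # Features 4-7 are built the same way from the halves' max indexes.
    return [max_idx, min_idx, feat3]
```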
3.4 Classification These features are extracted from each word of the text document and classified using classical machine learning algorithms. We used multiple algorithms: Support Vector Machine (SVM), Random Forest (RF) [11], Decision Tree (DT), Gradient Boosting (GB) [12], and K-Nearest Neighbors (KNN), and analyzed their performance in Sect. 4. Among these methods, GB achieved the best accuracy score. Classification results are analyzed using both train-test-validation and k-fold cross-validation settings. For train-test-validation, we split the dataset as 70% for training, 20% for testing, and the remaining 10% for the validation set. Cross-validation results are analyzed using fivefold cross-validation.
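As an illustration of this setup, the sketch below wires the 70/20/10 split and fivefold cross-validation around one of the named classifiers. The synthetic data is a stand-in for the seven extracted features and 0/1/2 language labels, and all hyperparameters are scikit-learn defaults rather than the paper's.

```python
# A sketch of the train-test-validation split and fivefold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.7)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=1/3)
# -> 70% train, 20% test, 10% validation

gb = GradientBoostingClassifier()                 # best performer in this study
gb.fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
print("5-fold CV:", cross_val_score(gb, X, y, cv=5).mean())
```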
4 Experimental Results and Discussion In our work, we have used a computer with an Intel(R) Core(TM) i5-10300 CPU @3.60 GHz and 8.00 GB RAM with an NVIDIA GeForce RTX 2070 Super to implement our proposed method. We have analyzed the performance of our proposed method with the help of different performance indexes such as precision, recall, and accuracy, shown in Table 1. Recall, also called sensitivity, measures the proportion of actual positives that are correctly identified as such. Precision is the proportion of predicted positive cases that are truly positive. F1-score is the weighted average of precision and recall. Accuracy is the sum of true positives and true negatives divided by the total number of evaluated cases (true positives, true negatives, false positives, and false negatives). Fivefold cross-validation (CV) has been employed to assess the method's performance. The language identification results are summarized in Table 2 based on the SVM, RF, GB, DT, and KNN classifiers. Here TP (True Positive) is the number of correct classifications of positive examples, TN (True Negative) the number of correct classifications of negative examples, FP (False Positive) the number of incorrect classifications of negative examples, and FN (False Negative) the number of incorrect classifications of positive examples.
Table 1 Evaluation metrics

Evaluation metric   Formula
Recall              TP / (TP + FN)
Precision           TP / (TP + FP)
F1-score            2 × (Precision × Recall) / (Precision + Recall)
Accuracy            (TP + TN) / (TP + TN + FP + FN)
Table 2 Performance analysis using fivefold validation (F1-F5 = folds 1-5)

Models  Class     Precision (%)               Recall (%)                  Avg Acc. (%)
                  F1   F2   F3   F4   F5      F1   F2   F3   F4   F5
SVM     Arabic     98   98   98  100   98     100  100  100  100  100      93.0
        Bengali    92   92   91   91   93      87   87   87   87   86
        English    89   89   90   88   88      92   92   91   91   93
RF      Arabic     97   97   98   98   97      99   99  100  100  100      94.2
        Bengali    97   96   97   97   97      88   87   87   87   88
        English    90   90   87   88   90      96   95   97   97   96
GB      Arabic    100  100   99  100   99     100   99  100  100  100      99.2
        Bengali    99   99   99   99   99      99   98   99   99   99
        English    99   99   99   99   99      99   99   99   99   99
DT      Arabic     99  100  100  100  100      99   99   94   94   94      94.7
        Bengali    96   95   92   91   91      89   93   90   90   93
        English    90   93   92   91   93      96   96   97   97   97
KNN     Arabic     96   99   99   99   99     100  100  100  100  100      98.4
        Bengali    99   99   99   99   99      95   97   96   96   96
        English    98   97   98   96   97      98   99   99   99   99
Figure 8 illustrates the precision results of several classifiers on the test dataset and demonstrates that GB has attained the highest level of language identification precision. Figure 9 depicts the recall of several classifiers on the test set and demonstrates that k-NN achieved the maximum recall for identifying Arabic, while GB obtained the highest scores for Bengali and English. Figure 10 shows the F1-score of several classifiers on the test dataset and validates that the GB classifier achieved the best F1-score in all languages. Table 2 shows that the performance of GB is notable in classifying the Arabic, Bengali, and English languages with an average accuracy of 99.2%, whereas k-NN reaches 98.4%. The DT classifier also shows promising results with an average identification accuracy of 94.7%. The worst performance, by the SVM classifier, is nearly 93%. Table 3 compares the results of relevant research on language detection from multilingual text documents. However, the vast majority of studies validated their methodologies on private datasets. Thus, we compared the performance on the same languages in our research against the performance of past studies.
Fig. 8 Precision of different classifiers

Fig. 9 Recall of multiple classifiers

Fig. 10 F1-score of different classifiers
Table 3 Comparative analysis of different studies in language recognition

Language  Ref               Precision  Recall
Arabic    Moussa et al.     97.3       97.3
          This study        99.7       99.9
Bengali   Chaudhury et al.  75.4       86.0
          This study        99.3       98.5
English   Padma et al.      97.2       95.6
          Chaudhury et al.  81.5       88.0
          Kumer et al.      98.6       99.2
          This study        98.5       99.2
5 Conclusion This research has developed a method for detecting the languages of words in multilingual texts and generating language-specific OCR masks. The study utilized hand-crafted features from word images that are intuitive for language identification, and the machine learning algorithms achieved notable identification performance. This enabled us to produce masks for the various OCRs that convert documents into machine-encoded text in different languages. In terms of performance measures, gradient boosting outperformed the other machine learning techniques. Compared to previous research published in the literature, the proposed approach for feature extraction delivers superior results. The work can be enhanced in the near future by employing deep learning techniques to recognize more languages.
References 1. Jayanthi N, Harsha H, Jain N, Dhingra IS (2020) Language detection of text document image. In: 2020 7th International Conference on Signal Processing and Integrated Networks, SPIN
2020. https://doi.org/10.1109/SPIN48934.2020.9071167 2. Babhulgaonkar A, Sonavane S (2020) Language identification for multilingual machine translation. In: Proceedings of the 2020 IEEE International Conference on Communication and Signal Processing, ICCSP 2020. https://doi.org/10.1109/ICCSP48568.2020.9182184 3. Padma MC, Vijaya PA, Nagabhushan P (2009) Language identification from an Indian multilingual document using profile features. In: Proceedings—2009 International Conference on Computer and Automation Engineering, ICCAE 2009. https://doi.org/10.1109/ICCAE.2009.35 4. Hochberg J, Bowers K, Kelly P, Cannon M (1999) Script and language identification for handwritten document images. Int J Doc Anal Recognit. https://doi.org/10.1007/s100320050036 5. Obaidullah SM, Mondal A, Das N, Roy K (2014) Script identification from printed Indian document images and performance evaluation using different classifiers. Appl Comput Intell Soft Comput. https://doi.org/10.1155/2014/896128 6. Kumar A (2012) Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system. Int J Comput Sci Eng Appl. https://doi.org/10.5121/ijcsea.2012.2214 7. Chaudhury S, Harit G, Madnani S, Shet RB (2000) Identification of scripts of Indian languages by combining trainable classifiers. In: Proc. of ICVGIP 8. Hangarge M, Santosh KC, Pardeshi R (2013) Directional discrete cosine transforms for handwritten script identification. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. https://doi.org/10.1109/ICDAR.2013.76 9. Moussa S, Ben Zahour A, Benabdelhafid A, Alimi AM (2008) Fractal-based system for Arabic/Latin, printed/handwritten script identification. In: Proceedings—International Conference on Pattern Recognition. https://doi.org/10.1109/icpr.2008.4761838 10. Acharya DU, Gopakumar R, Aithal PK (2010) Multi-script line identification system for Indian languages. J Comput 2(11):107–111 11. Breiman L (2001) Random forests. Mach Learn. https://doi.org/10.1023/A:1010933404324 12. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems
Sentiment Analysis of Restaurant Reviews Using Machine Learning M. Abdullah, Sajjad Waheed, and Sohag Hossain
Abstract Nowadays, everything is based on the Internet, and people love to express their opinions. People use different kinds of online platforms such as Google, Facebook, and Twitter to express their opinions about different kinds of things such as foods, products, services, and restaurants. Customer or user opinions are very important information for both customers and management to know customer sentiment and whether the customers like the service or not. However, evaluating individual customer opinions (unstructured texts) is difficult when the number of reviews is large. As a result, a technique is needed to automatically evaluate customer opinions or reviews and deliver the essential findings in a definite way. In this paper, we have created a dataset of 1000 reviews, applied different machine learning classification algorithms to those reviews, and observed their accuracy. We obtained the highest accuracy of 97.6% with the Support Vector Classifier (SVC) in determining reviews as positive, negative, or neutral. Keywords Sentiment analysis · Customer opinion · Support vector classifier · Machine learning
1 Introduction The number of Internet users worldwide is growing constantly, and people now spend more time online than offline. People now check the Internet before doing almost anything. So before going to any restaurant, a customer or user wants to
S. Waheed and S. Hossain contributed equally to this work. M. Abdullah (B) · S. Waheed · S. Hossain Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail 1902, Bangladesh e-mail: [email protected] S. Waheed e-mail: [email protected] S. Hossain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_35
know the status of the restaurant, the food quality, the ambiance, the service provided by the restaurant management, and what others think about the restaurant. Based on these things, the customer decides whether to go to that restaurant or not. So an analysis of customer reviews plays a vital role in the restaurant business. Management should know what customers think about their restaurant; if they know the sentiment of the customers, they can take business decisions easily. Sentiment analysis can be described as an automated process for determining human emotions within text. The emotions can be positive, negative, or neutral. It also includes the study of text data, natural language processing (NLP), and computational linguistics for scientifically identifying, studying, and extracting subjective or topic-based information from various textual data. The number of reviews is huge, and even though it is not impossible, it would take a customer an extremely long time to reach a final decision by analyzing them manually; automating this saves a huge amount of time for customers and restaurant management. A lot of research work has been done previously on sentiment analysis, each approach with its advantages and disadvantages. Some researchers perform sentiment analysis using opinion mining, some use a deep learning approach, and some use a machine learning approach. The focus of this research work is to classify the reviews into three classes: positive, neutral, and negative. NLP approaches are employed for data cleansing, namely tokenization, removal of stop words, and lemmatization. After data cleaning, different supervised learning algorithms such as K-Nearest Neighbor (KNN), Decision Tree, SVC, Naïve Bayes, Random Forest, and Logistic Regression are applied. Different measures were employed to assess the models' performance, including precision, recall, F1-score, accuracy, and the confusion matrix. The following are the paper's contributions: • First, we made a dataset of 1000 reviews and corresponding ratings collected from Google. From those reviews, we created different features such as food, service, room, decoration, pool, Wi-Fi, bathroom, parking, stay or not, and recommendations. • The text reviews were preprocessed by NLP techniques such as tokenization, punctuation removal, stop word removal, and lemmatization. • Feature extraction on text reviews was performed. • We have presented a comparative performance evaluation of six different machine learning algorithms. • The reviews were categorized into three different classes: negative, neutral, and positive. The remainder of the research work is arranged as follows: Sect. 2, related works; Sect. 3, methodology descriptions; Sect. 4, different experimental results; Sect. 5, conclusion.
2 Literature Review Sentiment analysis is a strategy using NLP techniques to identify human emotions within text. A lot of research has been done in this field, and some analyses have been carried out recently. The machine learning technique Naïve Bayes was used by Hamad et al. [1] on a dataset containing 1000 tweets of restaurant reviews collected from Kaggle. They simply preprocessed the data but did not use any feature extraction techniques such as CountVectorizer, Bag of Words (BOW), or the Term Frequency-Inverse Document Frequency (TFIDF) transformer, and achieved only 73% accuracy in categorizing reviews as positive or negative. The Support Vector Machine classification algorithm was used by Hasan et al. [2] to perform sentiment analysis. They performed feature extraction using BOW and a TFIDF feature extractor along with N-grams after preprocessing the data. The dataset was collected from Kaggle, which was their main drawback. They achieved a maximum accuracy of 91.53%, which can be further improved. Several machine learning algorithms such as Naïve Bayes, Logistic Regression, Support Vector Machine, and Random Forest were applied by Zahoor et al. [3] to carry out sentiment analysis and category classification of restaurant reviews on a dataset collected from the Facebook community in Karachi. They achieved a maximum accuracy of 95% with the Random Forest algorithm in classifying reviews as positive or negative, which can be improved. A comparative analysis of the evaluation metrics of each classification algorithm, such as precision, recall, F1-score, and accuracy, is presented. Similarly, Naïve Bayes and Support Vector Machine algorithms were used by Rajasekaran et al. [4] for sentiment analysis on a dataset of restaurant reviews from Twitter tweets. They did not include any feature extraction techniques and achieved a highest accuracy of 72% with the Naïve Bayes classifier, which can be improved a lot. Huda et al. [5] used NLP techniques to preprocess the data; Count Vector and TFIDF were used with BOW to perform feature extraction. Then different machine learning algorithms such as Naïve Bayes, Logistic Regression, Decision Tree, and the Support Vector Machine classifier were applied. The dataset contains only 700 reviews of different restaurants. They obtained the highest accuracy of 95% with SVM in categorizing reviews as satisfactory or poor, and classification model evaluation metrics are also presented. Krishna et al. [6] also applied different machine learning classification algorithms to perform sentiment analysis on a restaurant reviews dataset collected from Kaggle. They performed preprocessing in three steps, namely punctuation removal, NLP techniques, and stemming, and then built a BOW. They achieved a maximum accuracy of 94.56% with SVM in classifying reviews as positive or negative. Hemalatha et al. [7] proposed a method to classify Yelp reviews of local businesses as positive or negative. The dataset contains a total of 16,000 records. In their proposed method, data was preprocessed by NLP techniques such as tokenization and parts-of-speech tagging. After that, they performed feature extraction by mapping word frequency and then applied different classification algorithms.
They obtained a maximum accuracy of 78.44%, which can be improved. A recommendation system is proposed by Asani et al. In Matlatipov et al. [8], both machine learning and deep learning techniques such as Logistic Regression, SVM, RNN, and CNN are applied to perform sentiment analysis of restaurant reviews in the Uzbek language. NLP techniques were used to preprocess the data, and N-grams were used with the preprocessed data. They obtained the highest accuracy of 91% with logistic regression based on character n-grams. Hossain et al. [9] proposed a deep learning-based technique called Bi-directional Long Short-Term Memory (BiLSTM) to classify sentiment as positive or negative and got an accuracy of 91.35%. Their proposed framework consists of three modules: text-to-vector representation, the BiLSTM model consisting of three major blocks, and the classification model. Instead of using BiLSTM, Hossain et al. [10] proposed a technique called combined Convolutional Neural Network Long Short-Term Memory (CNN-LSTM) to categorize the sentiment of text as positive or negative and achieved the highest accuracy of 94.22%. From the above discussion, it is clear that all the authors only consider the positive and negative aspects of a review. But a review can also be neutral, that is, neither positive nor negative, and accuracy can be improved further. In this study, we conducted sentiment analysis and classified restaurant reviews into three classes, negative, neutral, and positive, with higher accuracy than the works described above. We have implemented NLP techniques and supervised machine learning algorithms to achieve this.
3 Methodology This section provides details of how we carried out the analysis. The steps we followed are illustrated in Fig. 1.
Fig. 1 Flowchart of working mechanism
3.1 Data Collection The data was gathered from Google. We collected 1000 reviews and corresponding ratings from different restaurants in Dhaka, Sylhet, and Sonargaon. From these reviews, we created different features such as whether the reviewer stayed or not, service, room, food quality, decoration, bathroom, pool, Wi-Fi, parking, and whether a recommendation was given.
3.2 Data Cleaning/Preprocessing Data preprocessing is very important for getting a good prediction model. The dataset can contain null values, unnecessary data, duplicate data, or missing values, and text data can contain different types of symbols, extra letters, words, links, or special characters. To properly analyze the sentiment of the text data, we need to remove those things from our dataset; a sketch of the pipeline follows this list. • Convert into lower-character text: all the text in the dataset must be in the same format, that is, lower case; otherwise the corpus may contain the same word more than once, because the machine treats capital letters and small letters differently. • Tokenization: breaking down raw text into words or sentences. In tokenization, sentences are broken down into words and a paragraph is broken down into sentences. For example, tokenizing the sentence 'Foods and services are good' outputs the list of words 'Foods', 'and', 'services', 'are', and 'good'. • Punctuation removal: symbols, extra letters, links, and special characters are removed from the text so they do not interfere with sentiment analysis. • Stop word removal: one of the most common preprocessing techniques used in NLP. Articles and pronouns are generally known as stop words, such as 'in', 'the', 'at', and 'to'; these words are removed from sentences. • Lemmatization: the technique of converting a word of a sentence to its base form. Although stemming and lemmatization appear to provide the same functionality, lemmatization differs significantly from stemming: lemmatization considers the context of the word before converting it to its base form, whereas stemming does not. For this reason, we use lemmatization in our work.
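The following is a minimal sketch of that pipeline, assuming NLTK (the paper names the steps but not the library); the example sentence comes from the text above.

```python
# A sketch of lower-casing, tokenization, punctuation/stop-word removal,
# and lemmatization for one review.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

def preprocess(review):
    review = review.lower().strip()                       # lower case, trim
    tokens = word_tokenize(review)                        # tokenization
    tokens = [t for t in tokens if t not in string.punctuation]
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stops]        # stop-word removal
    lemm = WordNetLemmatizer()
    return [lemm.lemmatize(t) for t in tokens]            # lemmatization

print(preprocess("Foods and services are good!"))  # ['food', 'service', 'good']
```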
3.3 Feature Extraction The machine can't understand text data, so the reviews that are in the form of text need to be converted into numerical features. We performed this process using the following techniques:
• CountVectorizer: a feature extraction technique that counts the number of words in a sentence or text and arranges them in a word-count vector. For example, consider the two sentences 'The quality of food is not good. Food quality needs to improve'. CountVectorizer counts each word in the sentences: 'The':1, 'quality':2, 'of':1, 'food':2, 'is':1, 'not':1, 'good':1, 'needs':1, 'to':1, and 'improve':1. • Term Frequency-Inverse Document Frequency (TF-IDF) Transformer: a process in which the words in a sentence or document are assigned a score based on the occurrence of that word. If a word occurs frequently in a document, it is considered important and given a high score; but if a word also occurs frequently across many documents, it is considered less important and given a low score. A sketch of both techniques follows.
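A minimal sketch of this two-step extraction with scikit-learn, using the example sentences from the text above:

```python
# Word counts followed by TF-IDF weighting.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

reviews = ["The quality of food is not good", "Food quality needs to improve"]
counts = CountVectorizer().fit_transform(reviews)   # word-count vectors
tfidf = TfidfTransformer().fit_transform(counts)    # TF-IDF weighted scores
print(tfidf.toarray())
```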
3.4 Training and Testing Split of Dataset We need to train the model before it can perform a particular task. To train and test the model, we partitioned our dataset into train and test sets using three different ratios: first the traditional 80:20, then 75:25, and finally 70:30.
3.5 Classification Model
We have performed the analysis on our training data using supervised learning algorithms: Naïve Bayes, Decision Tree, SVC, KNN, Random Forest, and Logistic Regression.
3.6 Model Evaluation To evaluate our model, we have used the following classification model evaluation techniques:
• Precision: It can be defined as

Precision = True Positive / (True Positive + False Positive)    (1)

• Recall: It can be described as

Recall = True Positive / (True Positive + False Negative)    (2)

• F1-score: It can be defined as

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (3)
• Accuracy: It is used to describe how well the model fits the dataset. It gives values in the range of 0 to 1; if the value is close to 1, the model is considered a good model. • Cross Validation: It is used to validate the model by splitting the training dataset into different folds, applying the algorithms, and observing their accuracy. In our research work, instead of plain k-fold cross-validation, we used stratified k-fold cross-validation because it takes the exact percentage of data from each class; a sketch follows this list.
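A minimal sketch of that stratified evaluation with scikit-learn; the synthetic data stands in for the extracted review features and labels, and the fold count matches the tenfold setting reported in the results.

```python
# Tenfold stratified cross-validation, preserving class proportions per fold.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=skf)
print("mean accuracy:", scores.mean())
```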
4 Results and Discussions The analysis was conducted in a Jupyter Notebook in the Python Anaconda environment, on a machine with an Intel Core i3 processor and 4 GB RAM. We obtained the maximum accuracy by splitting the data into training and test sets with a ratio of 75:25. The evaluation metrics of each algorithm are shown in Table 1. We applied our dataset to six different supervised learning algorithms and compared their precision, recall, F1-score, and accuracy (before and after cross-validation), as shown in Table 1. The SVC classifier provides the highest accuracy among them, 97.6%, in categorizing the reviews as negative, neutral, and positive after tenfold stratified cross-validation; its precision, recall, and F1-score for each category are close to 1, which is very good for our model. The decision tree algorithm provides the second-highest accuracy of 97% after tenfold stratified cross-validation, although the precision, recall, and F1-score of its neutral class are lower than those of SVC. Of the other four algorithms, after tenfold stratified cross-validation, KNN provides 96.1%, Naïve Bayes provides the lowest accuracy of 92.2%, Random Forest provides 95.80%, and Logistic Regression provides 96.70%. The Support Vector Classifier (SVC) is a supervised machine learning technique frequently used for classification; it supports multi-class classification through the one-versus-one approach, which divides the dataset into separate datasets for every class versus each other class. It separates the classes by mapping the data to a high-dimensional space and then locating the best hyperplane. It is C-support vector classification, and its implementation is dependent on libsvm. A linear kernel was used in SVC while training the model. The confusion matrix of the SVC model is shown in Fig. 2, where 0 indicates the negative, 1 the neutral, and 2 the positive class. We saw that among 33
Table 1 Accuracy and evaluation results of different algorithms

Algorithm            Class     Precision  Recall  F1-score  Accuracy (%)  Cross validation (%)
Decision tree        Positive  1.0        1.0     1.0       96.40         97.00
                     Negative  0.90       0.82    0.86
                     Neutral   0.78       0.88    0.82
SVC                  Positive  1.0        1.0     1.0       98.00         97.60
                     Negative  0.91       0.94    0.93
                     Neutral   0.91       0.88    0.89
KNN                  Positive  0.98       1.0     0.99      95.60         96.10
                     Negative  0.90       0.82    0.86
                     Neutral   0.79       0.79    0.79
Naïve Bayes          Positive  1.0        1.0     1.0       93.20         92.20
                     Negative  0.75       0.73    0.84
                     Neutral   0.64       0.67    0.65
Random Forest        Positive  0.98       1.0     0.99      94.40         95.80
                     Negative  0.84       0.79    0.81
                     Neutral   0.77       0.71    0.74
Logistic Regression  Positive  0.99       1.0     1.0       97.20         96.70
                     Negative  0.89       0.94    0.91
                     Neutral   0.90       0.79    0.84
Fig. 2 Confusion matrix of SVC classifier
negative reviews 31 of them are accurately classified as negative, among 24 neutral reviews 21 of them are accurately classified as neutral, and among 193 positive reviews, all of them are accurately classified as positive.
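As a hedged sketch, a matrix like the one in Fig. 2 can be produced as follows; the synthetic stand-in data and the 75:25 split replace the actual review features, and the 0/1/2 label encoding follows the description above.

```python
# Linear-kernel SVC and its confusion matrix on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

clf = SVC(kernel="linear").fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1, 2]))
```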
5 Conclusion In this paper, we performed sentiment analysis of reviews of different restaurants in Dhaka, Sylhet, and Sonargaon and classified them as positive, negative, or neutral using supervised machine learning techniques. NLP techniques such as tokenization, stop word removal, and lemmatization were used to preprocess the data. We applied different classification algorithms, namely K-Nearest Neighbor, Decision Tree, Random Forest, Support Vector Classifier, Naïve Bayes, and Logistic Regression, to the cleaned data and observed their precision, recall, F1-score, and accuracy. Among all of these algorithms, we obtained the highest accuracy of 97.6% with the Support Vector Classifier in classifying reviews as positive, neutral, or negative, as shown in Table 1. In the future, we want to collect a large number of reviews and apply deep learning and neural network approaches to perform sentiment analysis and get more accurate results.
References 1. Hamad MM, Salih MA, Jaleel RA (2021) Sentiment analysis of restaurant reviews in social media using Naïve Bayes. Appl Modell Simu 5:166–172 2. Hasan T, Matin A, Joy MSR (2020) Machine learning based automatic classification of customer sentiment. In: 2020 23rd international conference on computer and information technology (ICCIT). IEEE, pp 1–6 3. Zahoor K, Bawany NZ, Hamid S (2020) Sentiment analysis and classification of restaurant reviews using machine learning. In: 2020 21st international Arab conference on information technology (ACIT). IEEE, pp 1–6 4. Rajasekaran R, Kanumuri U, Siddhardha Kumar M, Ramasubbareddy S, Ashok S (2019) Sentiment analysis of restaurant reviews. In: Smart intelligent computing and applications. Springer, pp 383–390 5. Huda SA, Shoikot MM, Hossain MA, Ila IJ (2019) An effective machine learning approach for sentiment analysis on popular restaurant reviews in Bangladesh. In: 2019 1st international conference on artificial intelligence and data sciences (AiDAS). IEEE, pp 170–173 6. Krishna A, Akhilesh V, Aich A, Hegde C (2019) Sentiment analysis of restaurant reviews using machine learning techniques. In: Emerging research in electronics, computer science and technology. Springer, pp 687–696 7. Hemalatha S, Ramathmika R (2019) Sentiment analysis of yelp reviews by machine learning. In: 2019 international conference on intelligent computing and control systems (ICCS). IEEE, pp 700–704 8. Matlatipov S, Rahimboeva H, Rajabov J, Kuriyozov E (2022) Uzbek sentiment analysis based on local restaurant reviews. arXiv preprint arXiv:2205.15930
9. Hossain E, Sharif O, Hoque MM, Sarker IH (2020) SentiLSTM: a deep learning approach for sentiment analysis of restaurant reviews. In: International conference on hybrid intelligent systems. Springer, pp 193–203 10. Hossain N, Bhuiyan MR, Tumpa ZN, Hossain SA (2020) Sentiment analysis of restaurant reviews using combined CNN-LSTM. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–5
Transliteration from Banglish to Bengali Language Using Neural Machine Translation Shourov Kabiraj, Sajjad Waheed, and Zaber All Khaled
Abstract The popularity of social media has grown among people day by day, and nowadays people feel more comfortable sending messages to others on social media (SM). The rate of using the Banglish language (a blend of Bengali and English words) and Banglish short-term words for texting on social media is increasing immensely. In this paper, Banglish and Banglish short terms are transliterated into the Bengali language. Transliteration converts words from one language or alphabet into the corresponding words in another by replacing similar-sounding letters with different characters. For this task, we have used Neural Machine Translation (NMT). We chose NMT because phrase-based systems separate an input sentence into a collection of words and phrases, mapping each to a word or phrase in the destination language, whereas neural networks consider the entire input sentence at each stage when generating the output sentence. NMT was also chosen for its ability to work effectively with large datasets while requiring less supervision. Keywords Banglish · NMT · SM · Encoder · Decoder · Mapping · Transliteration
S. Waheed and Z. All Khaled contributed equally to this work. S. Kabiraj (B) · S. Waheed · Z. All Khaled Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail 1902, Bangladesh e-mail: [email protected] S. Waheed e-mail: [email protected] Z. All Khaled e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_36
1 Introduction Previously, letters were the main medium of communication, but these days the majority of people connect with others online. Almost everyone is comfortable sharing through social media, using platforms such as Facebook, Messenger, WhatsApp, and Instagram to send text messages to friends, relatives, or others. For sending messages, some people use short-term words or Banglish words (a blend of Bangla and English words). Short forms and Banglish are highly used these days; most teenagers use these simplified terms in their conversations. Bangladeshis in particular use them heavily because they are very easy to type and save time for the sender. But many people are not used to these short forms or Banglish words yet; many do not even realize the correct meaning of these Banglish words, so it is difficult for them to continue replying. By transliterating these Banglish words into Bengali words, people can understand the correct meaning of the words or sentences. Some examples of this transliteration of Banglish into Bengali words or sentences are tmr( ), jnne( ), khrp( ), lgse( ). Often, multiple people use multiple Banglish spellings for a single Bengali word, so we use multiple Banglish words corresponding to a single Bengali word. Generally, people use the avro keyboard on their PC and the ridmik keyboard on Android for typing; with these keyboards, people can type Bengali words by typing English letters. When we type Banglish words such as "Tumi" or "Khawni" with English letters, these keyboards automatically and correctly convert them into the Bengali words " " or " ". But these keyboards can't convert the shortest forms of English or Banglish words: if we type "Tmi" or "2me" or "Kawni" in avro or ridmik, the keyboard is unable to produce the correct form of these shortcut forms. So we have tried to transliterate such Banglish words, along with some English words and shortcut forms of Banglish and English words, into Bengali words.
2 Literature Review Though about 4.21% of people worldwide speak Bengali, many people use Banglish words in their regular conversations or while texting on social media. Hence, some research papers have been proposed on this topic, but there is still a lack of published work in this area, with only a few research analyses conducted recently. Suharto Das et al. [1] applied a deep learning study on understanding Banglish and abbreviated words used in social media. For converting Banglish words, they used computer vision instead of vector distance counts, because vector distance does not always
doesn’t give an approximate result. They represented each dataset word as a numeric value in place of a String. Feature extraction is used here for improving the accuracy of the learned model of the input data. To convert large and complex image data into smaller sets, here also feature extraction was used. In image recognition, convolutional neural networks provide great accuracy. By using convolutional neural network, they connected every neuron of one layer with the next layer. To gather images containing data here used computer vision, and then applied a convolutional neural network to predict words that closely match the dataset. About 71 words and their different types of representation have been used as a dataset here. By using those data and 12 epochs, they achieved an accuracy of 65–70%. Md. Faruquzzaman Akan [2] focused on the place of articulation of vowels and consonants that are used in making Banglish words. He explores the intricacy of transliteration and translation in the field of language and linguistics. Generally, Bengali vowels’ places of articulation are palatal, guttural labial cerebral, etc. On the other hand, Bengali consonants’ places of articulation are dental, velar, nasal as well as palatal and labial. The difference between the place of articulation of Bengali vowels, Bengali consonants, English vowels, and English consonants are the major reason for different spelling of Banglish and Bengali words. Some Bengali sounds are also found here which are always hidden by some consonants. Some syntax-related problems between Bengali and the English language which occur for Be verb, Have verb, Word order, Articles, Negations, etc. are also discussed here.
3 Methodology The dataset is the most important and essential part of any research activity, and we collected some unique data for our work. Datasets of this kind are quite scarce on the Internet. Though finding such datasets is comparatively difficult and time-consuming, we tried to collect different types of words and different combinations of every single word. A sample of our data is shown in Fig. 1.
Fig. 1 Different types of representation of a Banglish word
Fig. 2 Different types of representation of an English word
Fig. 3 Workflow diagram
Mainly, we worked on Banglish words and Banglish shortcut words; we also worked on a couple of English words that are often used in shortcut form in written conversations. Such words are very difficult to understand when used in shortcut form. A sample of this English data is shown in Fig. 2. Some English words in our dataset are a mixture of English letters and numerals; these are the shortest of all words in our dataset, for example, f9 (fine), n8 (night), r8 (right), and b4 (before). We have also drawn a workflow diagram of our work, given in Fig. 3.
3.1 Data Collection We created a CSV file for our dataset, with all the Banglish and Bengali words put in two columns. We collected some of the data, and the rest we created ourselves. The best places to get our data were often distinct Messenger or WhatsApp groups where numerous members chatted together. We first tried to find the Banglish words people use most for chatting, then we input different
Table 1 Expected result

Banglish          Bangla
2mi kmon acho     (corresponding Bengali sentence)
tumi kemon acho
tomi kamon aso
amra f9 asi       (corresponding Bengali sentence)
2mi r8 bolecho    (corresponding Bengali sentence)
gd n8             (corresponding Bengali sentence)
representations of that word. We created at least three to four representations for every particular Banglish word. From Table 1, we see that we created three representations for a single word. We expected any of those three Banglish sentences to show the same result.
3.2 Data Preprocessing Preprocessing data simply means putting the data into a form that is ready for the task. To obtain precise results, data preprocessing and data deduplication are very vital: any duplicates introduced while creating the dataset are removed. We assess the data to ensure accurate transliteration, though the dataset still needs some work; a sketch of the cleaning steps follows this list. • Lower casing: The dataset's entire text must be in lowercase, or else the same term could appear more than once. Text data can contain both lowercase and uppercase words in a sentence, but the computer treats the same word typed in various case types as distinct entities. For instance, the computer treats the terms "Amar" and "amar", which are the same Banglish term, as two separate words. • Tokenization: Tokenization breaks plain text into words and sentences. If we use a paragraph as input, tokenizing it breaks it down into small parts of the sentence, which always makes the task easier. • Punctuation mark removal: Eliminating punctuation from textual data is a very effective part of the text-processing method. To treat each text equally, punctuation is removed; for example, "Kemon" and "Kemon?" are treated identically once the punctuation has been removed. • Remove spaces at the beginning and end: Deleting additional space at the beginning and end of sentences avoids storing extra characters, keeps the data accurate, and looks better.
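A minimal sketch of those cleaning steps applied to one Banglish sentence; the regex-based punctuation removal is our own choice, not something the paper specifies.

```python
# Lower casing, punctuation removal, trimming, and word tokenization.
import re

def clean(sentence):
    sentence = sentence.lower()                   # lower casing
    sentence = re.sub(r"[^\w\s]", "", sentence)   # punctuation mark removal
    sentence = sentence.strip()                   # trim leading/trailing space
    return sentence.split()                       # tokenize into words

print(clean(" 2mi Kemon acho? "))  # ['2mi', 'kemon', 'acho']
```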
Fig. 4 Model architecture
3.3 Proposed Model

We used Neural Machine Translation (NMT) for transliteration. This method consists of an encoder that reads the input-language sentence and creates a neural representation of it. A decoder then creates the output sentence word by word while consulting the encoder's representation (Fig. 4).
3.4 Data Encoding and Decoding

At each time step, the encoder receives a single element of the input sequence, processes it, gathers information for that element, and propagates it forward; in this way the encoder converts the input string into a vector space. These word-vector representations are then passed through an attention mechanism, which chooses which source words to focus on in order to produce an output in the target language. Following preprocessing, we added the "START" and "END" symbols to each sentence, built a vocabulary of all the distinct Banglish and Bengali terms, and determined the vocabulary sizes and maximum sentence lengths for both languages. We divided the data into training and testing categories and built a function for generating training and testing batches. We trained the model for 30 epochs and made predictions on the test data after training. Finally, we generated the output on the training data; Fig. 5 shows that our model performs decently on the training data.
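A minimal sketch of the vocabulary-building step described above; the function name and the use of plain Python dictionaries are illustrative assumptions, not the paper's exact code:

```python
def build_vocab(sentences):
    """Wrap each sentence with START/END markers and build a word-to-index vocabulary."""
    vocab = {"START": 1, "END": 2}               # index 0 is reserved for padding
    max_len = 0
    for sentence in sentences:
        words = ["START"] + sentence.split() + ["END"]
        max_len = max(max_len, len(words))       # maximum sentence length for this language
        for word in words:
            vocab.setdefault(word, len(vocab) + 1)
    return vocab, max_len

banglish_vocab, banglish_max = build_vocab(["2mi kmon acho", "tumi kemon acho"])
print(len(banglish_vocab), banglish_max)         # vocabulary size and maximum length
```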
Fig. 5 Model parameter
Fig. 6 Accuracy diagram
4 Results and Discussions

For analyzing the task, we used Google Colab. As Fig. 6 shows, when we train on our dataset, the accuracy of our task increases as the number of epochs increases.
Several algorithms are commonly used in machine learning. Among them, the Support Vector Machine (SVM) is used in applications like face detection and email classification, the convolutional neural network (CNN) is used in image recognition and classification, and Random Forest (RF) is commonly used in remote sensing to classify data. These algorithms are designed for classification, so SVM, CNN, or RF cannot be used for transliteration. Here, we used Neural Machine Translation (NMT), which works using a recurrent neural network (RNN). We used "Sequence to Sequence with Attention" and "Sequence to Sequence without Attention," which are RNN-based systems. With "Sequence to Sequence with Attention" we obtained 80% accuracy, but with "Sequence to Sequence without Attention" we obtained below 60% accuracy. Google also uses NMT, based on RNNs, for Google Translate. During the training of our dataset, accuracy reached up to 80%. More training data gives greater test accuracy; we trained on 1000 sentences, and the accuracy of our task will improve if we enhance our dataset.
5 Conclusion

In this paper, we have analyzed how people shorten the words they use in messaging to make their conversations easier, even though such short forms are not widely understood. Natural language processing is essential here; without NLP, we would be unable to perform this analysis. The accuracy of the result provided by neural machine translation in this study is up to 80%. Occasionally, it performs with only mediocre precision, because the translator provides results based on predictions, and a test sentence may be unfamiliar to it. As a result, achieving 100% accuracy is unattainable. Gathering such datasets is challenging; we have trained on only 1000 sentences, and the accuracy will rise when the dataset is enriched further. People create new short words every day, so this is a continuous process: a computer is intelligent only when data is available, and it can do only what we teach or train it to do, so more advanced algorithms and datasets can be produced.
An Evaluation of BdSL 49 Dataset Using Transfer Learning Techniques: A Review Saqib Sizan Khan, Ashraful Haque, Nipa Khatun, Nasima Begum, Nusrat Jahan, and Tanjina Helaly
Abstract Sign language is used to communicate using hand movements rather than spoken or written words. Typically, this approach is useful for deaf or mute people, since their speech and hearing impairments prevent them from using other forms of communication. The majority of Bangladeshi people are unfamiliar with sign language; as a consequence, deaf or mute people are unable to communicate with general individuals. Therefore, to address this issue, computer vision and supervised learning techniques are utilized to recognize images of Bengali Signs generated with both hands. Thus, in this research, we propose a method to evaluate the performance of our own dataset, named BdSL 49, which contains 49 classes and 29,428 images. We use our dataset to train the latest transfer learning benchmark models, namely Xception, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, ResNet50V2, and ResNet101V2, which are used for sign image recognition. An experimental result analysis of the benchmark models has been carried out, and the performance of the models is satisfactory. Among the seven models, Xception, InceptionResNetV2, and MobileNet achieved the highest F1-scores of 93, 91, and 92%, respectively. Additionally, we compared our dataset with the state-of-the-art
dataset. On the basis of the model's performance, we can conclude that our dataset is pretty standard. Finally, the paper is concluded with some future directions.

Keywords Bengali sign language · Transfer learning · CNN · Computer vision · Supervised learning
1 Introduction

The population of Bangladesh is around 160 million, of which 16 million are mute or deaf [1]. Generally, oral and written communication are the most common forms of human interaction. However, there is another medium of communication, known as Sign Language. Individuals who are deaf or unable to speak can communicate through sign language, but they face great difficulties when attempting to interact with general people in their daily lives. The proper use of sign language helps to establish communication with them; however, the majority of us have little or no knowledge of sign language, and it is not taught in the standard curriculum. Thus, we need to reduce the communication gap between general people and mute or deaf people using an automated approach. In this paper, we propose a method to evaluate the performance of our developed dataset, BdSL 49 [2], using seven different transfer learning models. The dataset comprises nearly all Bengali alphabetic and numeric characters. CNN-based transfer learning models, namely Xception, InceptionV3 (IcepV3), InceptionResNetV2 (IRV2), MobileNet (MN), MobileNetV2 (MNV2), ResNet50V2 (RN50V2), and ResNet101V2 (RN101V2), are selected to evaluate the performance of the BdSL 49 dataset. We analyze the performance of each model based on accuracy, precision, recall, and F1-score. The remainder of this paper is organized as follows. The related work is described in Sect. 2. The overview of the dataset is given in Sect. 3. The proposed pipeline is presented in Sect. 4. Section 5 illustrates the experimental result analysis and performance evaluation. Lastly, Sect. 6 concludes this research work with future scopes.
2 Related Works

In Islam et al. [3], a comprehensive dataset named Ishara-Lipi is proposed, which is the first Bangla Sign Language (BdSL) dataset, containing 36 characters and a total of 1,800 images of two-hand gestures. Islam et al. [4] created an artificial interpreter that translated static images of BdSL into spoken language; they built a massive dataset with 7,052 samples of 10 numerals and 23,864 samples of alphabets. Podder et al. [5] produced a huge amount of data with 87 classes using video extraction and conversion to masked images. Jahid et al. [6] proposed a dataset that contains 30 classes and a total of 4,500 images using one-hand gestures. Rafi et al. [7] used a dataset with a total of 12,581 images for 38 characters.
They fine-tuned the VGG-19 model and achieved 89.6% accuracy. Angona et al. [8] utilized MobileNet for recognition of Bengali Sign Language using the Ishara-Lipi dataset [3]. The author in [9] reviewed the research approaches of all BdSL work from 2002 to 2021 and discussed each work's contributions and weaknesses. The research work in [10] uses a variety of deep learning models and angular loss functions to highlight the significance of generalization in finger-spelled BSL recognition. Due to a lack of diversity in the dataset, they achieved 55.93 and 47.81% test accuracy using the SphereFace loss function in the VGG-19 architecture. The majority of the existing works [3, 6] used datasets that have only 30–38 classes with few images. The authors of [4, 5] utilized one-hand gestures, which is not considered standard. Besides, the authors in [4, 5] applied some complex image pre-processing techniques, such as masking, gray scaling, and erasing backgrounds, which are computationally expensive, and they did not use RGB images at the time of model training. In contrast, our developed dataset contains 49 classes with an adequate number of samples using both hands. Our sign images are captured with different backgrounds, and we used RGB images for model training.
3 Dataset Acquisition

We developed BdSL 49 [2], which consists of 49 classes, including 37 alphabets, 10 numerals, and 2 special characters. Each class has around 300 images, and the dataset has a total of 29,428 images for the 49 different classes. Each image represents a single Bengali letter. The images were taken with the smartphone cameras of 14 distinct adults. Our dataset has two segments: recognition and detection. Each segment has several images taken in a variety of locations and lighting conditions. Each segment
has 14,714 images categorized into 49 classes. The recognition segment uses only the sign images of the hand gestures, as shown in Fig. 1. For model training purposes, we utilize the recognition segment, which is divided into two sections: training and testing. We placed 80% of the total images in the training set, comprising 11,774 images, while the remaining 20% form the test set.

Fig. 1 Overview of BdSL 49 dataset
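A minimal sketch of such a split with tf.keras; the directory layout ("bdsl49/recognition/<class>/...") and the 224 × 224 input size are illustrative assumptions, not details stated in the paper:

```python
import tensorflow as tf

# Load the recognition segment and carve out an 80/20 train/test split.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "bdsl49/recognition", validation_split=0.2, subset="training",
    seed=42, image_size=(224, 224), batch_size=58, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "bdsl49/recognition", validation_split=0.2, subset="validation",
    seed=42, image_size=(224, 224), batch_size=58, label_mode="categorical")
```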
4 Proposed Methodology

The proposed method evaluates the performance of several transfer learning models using our own developed dataset. The dataset is used to train the Xception, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, ResNet50V2, and ResNet101V2 models. We use the original ImageNet weights to reduce the training time. Since all the images are in RGB (Red-Green-Blue) format, the model input has three channels. Each channel's pixel values lie between 0 and 255; because models consistently provide greater precision with pixel values between 0 and 1, all pixel values are divided by 255. The pooling layer is set to max pooling. After the pre-trained model, three additional layers are added: the first two comprise 128 and 64 neurons, each with a ReLU activation function, and the third comprises 49 neurons, matching the total number of classes in the dataset, activated by a softmax activation function. To prevent over-fitting, dropout is applied before the final layer. Model compilation employs categorical cross-entropy as the loss function, and the Adam optimizer is used for optimization. A total of 30 epochs are needed to train each model with a batch size of 58 (Fig. 2).
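Below is a minimal Keras sketch of this head, shown for MobileNet as one of the seven backbones; the 224 × 224 input size and the dropout rate of 0.5 are illustrative assumptions, since the paper does not state them:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained backbone with ImageNet weights and max pooling, as described above.
base = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", pooling="max",
    input_shape=(224, 224, 3))

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),  # divide pixels by 255
    base,
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                       # dropout before the final layer
    layers.Dense(49, activation="softmax"),    # one neuron per BdSL 49 class
])

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=test_ds, epochs=30)
```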
Fig. 2 Overall flow diagram of the proposed methodology
5 Result Analysis and Performance Evaluation

This section describes the experimental results of the benchmark models: Xception, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, ResNet50V2, and ResNet101V2. The performance evaluation of these models is discussed based on the accuracy and loss graphs together with the F1-score. After training each transfer learning model on the BdSL 49 dataset [2], we generate various visualizations for evaluating the performance of each model. The graphical representations of the performance evaluation metrics are illustrated in Figs. 3, 4, 5, 6, 7, 8, and 9. Each graph contrasts training and test data and verifies the model's performance based on the behavior of the graph lines. In Table 1, the F1-score of each class and model is represented in the range of 0 to 1. The following Eqs. (1)–(3) are used to calculate the precision and recall in order to determine the F1-score:

$$\mathrm{Precision}(p) = \frac{TP}{TP + FP} \qquad (1)$$

$$\mathrm{Recall}(r) = \frac{TP}{TP + FN} \qquad (2)$$

$$F1\text{-}score = 2 \times \frac{p \times r}{p + r} \qquad (3)$$
In the above equations, TP stands for True Positives, TN for True Negatives, FP for False Positives, and FN for False Negatives. The graphical representation of the accuracy and loss for each of the models is shown below. Figure 3 illustrates the accuracy and loss graphs of the Xception model. The line structure of the graph is not excessively zigzag, which indicates high accuracy and little loss. The accuracy and loss graphs are nearly mirror images of each other, which implies very few errors in the data.
Fig. 3 Performance evaluation of Xception model
Fig. 4 Performance evaluation of InceptionV3 model
Fig. 5 Performance evaluation of InceptionResNetV2 model
The architecture of the Xception model is rather extensive and involves a vast number of trainable parameters. This model performs really well, with an accuracy of 93%. Figures 4 and 5 represent the accuracy and loss graphs of the Inception architectures on our dataset, BdSL 49. We utilized InceptionV3 and InceptionResNetV2 for performance evaluation. The performance of the InceptionV3 model is satisfactory, with an accuracy of 87%. However, the performance of the InceptionResNetV2 model is much better: it achieves an overall accuracy of 91%, since its architecture includes residual connections to prevent vanishing gradients. Besides, it has significantly more trainable parameters than the Xception and InceptionV3 models. The accuracy and loss graphs of InceptionResNetV2 are not approximately mirrored and include an excessive number of zigzags. On the other hand, InceptionV3's graphs have a curved line and are relatively mirrored. Therefore, we assume that the InceptionResNetV2 model does not perfectly fit our dataset.
Fig. 6 Performance evaluation of MobileNet model
Fig. 7 Performance evaluation of MobileNetV2 model
The performance of the two versions of the MobileNet model is illustrated in Figs. 6 and 7, respectively. MobileNet is an exceptionally lightweight architecture that achieves an accuracy of 92% with fewer trainable parameters than the other transfer learning models. MobileNet's performance is excellent, whereas MobileNetV2's performance is moderate, with the accuracy falling to 85%; MobileNetV2 has fewer trainable parameters than plain MobileNet, which decreases its accuracy. The performance of the ResNet models is shown in Figs. 8 and 9, respectively. Two variations of the ResNet architecture were implemented: ResNet50V2 and ResNet101V2 both show acceptable performance with an accuracy of 85%. ResNet101V2 features a very large architecture, similar to the InceptionResNetV2 architecture, with a large number of trainable parameters. As the version number of the ResNet model increases, the graph lines show more zigzags; therefore, a very large number of trainable parameters may sometimes hamper model performance.
Fig. 8 Performance evaluation of ResNet50V2 model
Fig. 9 Performance evaluation of ResNet101V2 model
Table 1 presents the F1-score of the different models. For almost every class, Xception, InceptionResNetV2, and MobileNet achieve an F1-score greater than 90%. The F1-score drops to 70% for some classes of the InceptionV3, MobileNetV2, ResNet50V2, and ResNet101V2 models, and the average accuracy of these four models is not up to standard. On the other hand, Xception, InceptionResNetV2, and MobileNet performed quite well on our dataset and achieved overall F1-scores of 93, 91, and 92%, respectively, which is satisfactory. Finally, we can conclude that BdSL 49 is quite a standard dataset for future studies in this domain.
Table 1 Comparison of F1-score for different models

Label      0     1     2     3     4     5     6     7     8     9
Xception   0.75  0.98  0.93  0.89  0.89  0.94  0.94  0.98  0.99  0.94
IcepV3     0.83  0.87  0.82  0.72  0.74  0.85  0.87  0.92  0.93  0.89
IRV2       0.91  0.91  0.92  0.71  0.79  0.86  0.93  0.92  0.97  0.95
MN         0.91  0.94  0.93  0.82  0.86  0.92  0.91  0.99  1     0.93
MNV2       0.75  0.78  0.82  0.6   0.71  0.81  0.86  0.96  0.94  0.92
RN50V2     0.86  0.85  0.8   0.64  0.69  0.76  0.88  0.91  0.91  0.94
RN101V2    0.77  0.87  0.77  0.67  0.75  0.81  0.66  0.74  0.88  0.86

Label      10    11    12    13    14    15    16    17    18    19
Xception   0.96  0.97  0.91  0.93  0.98  0.89  0.99  0.95  1     0.82
IcepV3     0.97  0.87  0.97  0.88  0.91  0.87  0.94  0.88  0.78  0.98
IRV2       0.96  0.91  0.92  0.89  0.92  0.89  0.93  0.93  0.91  1
MN         0.98  0.97  0.99  0.92  0.91  0.82  0.97  1     0.74  1
MNV2       0.87  0.95  0.96  0.8   0.85  0.94  0.82  0.87  0.71  0.9
RN50V2     0.93  0.93  0.88  0.88  0.9   0.84  0.85  0.79  0.79  0.87
RN101V2    0.9   0.86  0.91  0.87  0.82  0.92  0.94  0.91  0.67  0.91

Label      20    21    22    23    24    25    26    27    28    29
Xception   0.48  0.87  0.97  0.99  0.95  1     0.82  0.48  0.87  0.97
IcepV3     0.9   0.88  0.67  0.98  0.89  0.93  0.85  0.9   0.43  0.69
IRV2       0.94  0.84  0.88  0.97  0.88  0.97  0.89  0.93  0.75  0.84
MN         0.96  0.83  0.91  0.94  0.94  0.99  0.92  0.95  0.64  0.84
MNV2       0.84  0.87  0.76  0.83  0.87  0.93  0.94  0.9   0.88  0.79
RN50V2     0.9   0.84  0.78  0.95  0.91  0.91  0.82  0.9   0.71  0.72
RN101V2    0.94  0.81  0.83  0.9   0.85  0.88  0.88  0.93  0.77  0.81

Label      30    31    32    33    34    35    36    37    38    39
Xception   0.99  0.95  0.99  0.99  0.99  0.97  0.97  0.97  0.98  0.99
IcepV3     0.38  0.92  0.97  0.87  0.84  0.94  0.82  0.83  0.84  0.82
IRV2       0.83  0.99  0.82  0.87  0.8   0.88  0.95  0.9   0.89  0.91
MN         0.83  0.96  1     0.95  0.92  0.99  1     0.91  0.9   0.8
MNV2       0.85  0.87  0.98  0.7   0.85  0.84  0.96  0.84  0.85  0.89
RN50V2     0.79  0.9   0.97  0.78  0.87  0.92  0.95  0.8   0.72  0.79
RN101V2    0.85  0.83  0.99  0.87  0.93  0.94  0.85  0.83  0.76  0.79

Label      40    41    42    43    44    45    46    47    48    Avg.
Xception   0.96  0.98  0.9   0.98  0.9   0.98  0.97  1     1     0.93
IcepV3     0.85  0.93  0.86  0.79  0.79  0.93  0.96  0.9   0.9   0.87
IRV2       0.93  0.91  0.8   0.88  0.89  0.95  0.95  0.98  0.89  0.91
MN         0.95  0.97  0.85  0.9   0.89  1     0.99  1     0.94  0.92
MNV2       0.91  0.93  0.89  0.79  0.46  0.89  0.9   0.92  0.89  0.85
RN50V2     0.91  0.92  0.8   0.87  0.76  0.97  0.95  0.85  0.89  0.85
RN101V2    0.87  0.92  0.73  0.83  0.75  0.99  0.97  0.96  0.88  0.85
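Per-class precision, recall, and F1 values such as those in Table 1 can be obtained with scikit-learn; in this sketch, y_true and y_pred are hypothetical placeholder arrays standing in for the 49-class test labels and model predictions:

```python
from sklearn.metrics import classification_report

# y_true: ground-truth labels (0..48); y_pred: argmax of the model's softmax output.
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]
print(classification_report(y_true, y_pred, digits=2))  # per-class precision/recall/F1
```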
Table 2 represents the comparison between our developed dataset (BdSL 49) and the existing dataset (Ishara-Lipi) using some transfer learning models. Table 2 shows that our BdSL 49 dataset performs outstandingly compared to the state-of-the-art dataset.

Table 2 Comparison of our dataset with the state-of-the-art dataset

Dataset                  Xception (%)  IcepV3 (%)  IRV2 (%)  MN (%)  RN50V2 (%)
Ishara-Lipi [3]          80            31          76        83      63
Our dataset (BdSL 49)    93            87          91        92      85
6 Conclusion and Future Work

A good, standard dataset is an essential component for developing many automated systems. Thus, in this research, we developed a new Bengali Sign Language dataset named BdSL 49. The main contribution of this paper is the evaluation of the BdSL 49 dataset on different CNN-based transfer learning models to assess its standardization and usefulness. From the experiment, we found that Xception, InceptionResNetV2, and MobileNet are the best-performing models for our dataset, with accuracies greater than 90%. In the future, we will develop a customized deep learning model for Bengali Sign Language recognition and detection, along with a language model, to help reduce the communication gap for the deaf and mute community.

Acknowledgements This work is supported by the Institute of Energy, Environment, Research, and Development (IEERD), University of Asia Pacific (UAP), Bangladesh.
References 1. Disability in Bangladesh. https://en.wikipedia.org/wiki/Disability_in_Bangladesh. Accessed on September 1, 2022 2. Hasib A, Khan SS, Eva JF, Khatun M, Haque A, Shahrin N, Rahman R, Murad H, Islam M, Hussein MR et al (2022) Bdsl 49: a comprehensive dataset of Bangla sign language. arXiv preprint arXiv:2208.06827 3. Islam MS, Mousumi SSS, Jessan NA, Rabby ASA, Hossain SA (2018) Ishara-lipi: the first complete multipurpose open access dataset of isolated characters for Bangla sign language. In: 2018 international conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–4 4. Islalm MS, Rahman MM, Rahman MH, Arifuzzaman M, Sassi R, Aktaruzzaman M (2019) Recognition Bangla sign language using convolutional neural network. In: 2019 international conference on innovation and intelligence for informatics, computing, and technologies (3ICT). IEEE, pp 1–6
5. Podder KK, Chowdhury ME, Tahir AM, Mahbub ZB, Khandakar A, Hossain MS, Kadir MA (2022) Bangla sign language (bdsl) alphabets and numerals classification using a deep learning model. Sensors 22(2):574 6. Jim AAJ, Mendeley (2021) KU-BdSL: Khulna University Bengali sign language dataset. https://doi.org/10.17632/SCPVM2NBKM.1 7. Rafi AM, Nawal N, Bayev NSN, Nima L, Shahnaz C, Fattah SA (2019) Image-based Bengali sign language alphabet recognition for deaf and dumb community. In: 2019 IEEE global humanitarian technology conference (GHTC). IEEE, pp 1–7 8. Angona TM, Shaon AS, Niloy KTR, Karim T, Tasnim Z, Reza SS, Mahbub TN (2020) Automated Bangla sign language translation system for alphabets by means of mobilenet. TELKOMNIKA (Telecommun Comput Electron Control) 18(3):1292–1301 9. Khatun A, Shahriar MS, Hasan MH, Das K, Ahmed S, Islam MS (2021) A systematic review on the chronological development of Bangla sign language recognition systems. In: 2021 Joint 10th international conference on informatics, electronics & vision (ICIEV) and 2021 5th international conference on imaging, vision & pattern recognition (icIVPR). IEEE, pp 1–9 10. Youme SK, Chowdhury TA, Ahamed H, Abid MS, Chowdhury L, Mohammed N (2021) Generalization of Bangla sign language recognition using angular loss functions. IEEE Access 9:165351–165365
1D to 20D Tensors Like Dodecanions and Icosanions to Model Human Cognition as Morphogenesis in the Density of Primes

Sudeshna Pramanik, Pushpendra Singh, Pathik Sahoo, Kanad Ray, and Anirban Bandyopadhyay

Abstract From image processing to information retrieval from brain structure and signals, brain researchers find common geometric shapes in lower dimensional data in order to derive higher dimensional data and create the elements of the higher dimensional data. We have challenged this culture and argued to replace it with a practice of finding elements in the orthogonal space, which are conceptually invariants of the lower dimensions. At the same time, we have argued to replace space-time with space-time-topology-prime-based invariants under the self-operating material universe, SOMU, since the density of primes is a bias-free infinite source that delivers unique symmetries perpetually. Here we have derived the topology or morphogenesis from the density of primes and estimated the framework of maniflats and manifolds derived from the 1D to 20D tensors holding the within-and-above network of invariants as conscious thoughts of a human brain.

Keywords Space-time · Higher dimensional tensors · Manifold · Maniflats · Human brain · The density of primes
1 Introduction

Recent findings that the brain operates in 12 dimensions (12D, [1]) have triggered a serious debate on developing higher dimensional information processing for human cognition and consciousness. However, we think that if, by looking into more and more complex
structures in the brain, the decision is made that the brain is a 12D structure, then the approach is fundamentally wrong. There is a fundamental mathematical protocol for deriving a higher dimension that is not used in brain research; rather, a wrong notion of dimension is used, where deriving elements from a set with common dynamics is considered a new dimension. Right from the days of framing string theory to create a theory of everything, constructing everything starting from a vibrating string, we have come to a point where the string is replaced by a helical spring, a fourth circuit element, the Hinductor (Sahu et al. [2], JP-511630; US 9019685B2 (2015); EU patent EP2562776B1). While string theory bonded the strings to construct the universe bottom-up, we nested and twisted the spring to create everything, nesting the helical phases bottom-up. Once we fixed the fundamental element of the universe, the next step was to choose the metric. In 1916, Einstein introduced the concept of linking space and time together in one tensor, as if different constituents of space and time interact to create reality. In these tensors, three dimensions of space and one dimension of time were always taken into account, and the observer's frame of reference was key. We have changed the definition of a new dimension, suggesting that the elements of a new, higher dimension are orthogonal to the current dimension (e.g., the elements of the fifth dimension would be invariants of the fourth dimension), and we have given equal importance to space, time, topology, and prime (3 dimensions × 4 = 12 dimensions), resulting in the space-time-topology-prime metric (STts metric, [3, 4]). The reason for introducing two new variables, topology and prime, is that the instrument's measurement limits space and time. In contrast, topology, or morphogenesis, is a subject of study where the observer's frame of reference is not key; therefore, the geometric elements of topology can be found in wide ranges of spatio-temporal events. For example, the teardrop-to-ellipsoid formation is found when a star is born, in the brain-spinal cord combination, in protein folding, in sperm-ovum duality, etc. For prime, the density of primes has been fundamental to the universe, which has no bias; deriving the geometric elements of morphogenesis provides us with a metric where, instead of space-time, we take space-time-topology-prime as a fundamental element, while the infinite series of the density of primes create a self-operating mathematical universe, SOMU [3, 4]. Here, in this work, we have concentrated only on the density of primes and how geometric elements could be derived from it. At the same time, we have revisited the 1D to 20D tensors to find the geometric maniflats that would estimate the linking and interaction of elements in the space-time-topology-prime metric. First, we review historical efforts linking tensors of different dimensions. Of course, the journey of string theory has been that a 4D tensor, the quaternion, and an 8D tensor, the octonion, were coupled with a warp factor, and the sum of the two tensors, 8 + 4, was considered to estimate a 12D tensor, which we consider a very wrong approach. Dixon algebra [5] took a product of real numbers, complex numbers, octonions, and quaternions to build a grand unification theory, GUT, from elementary particles. Why do we take the product of vectors of different dimensions to invent a new algebra?
Because addition or subtraction does not take into account interdimensional interactions. Leonard Dickson showed in 1919 that octonions could be constructed as a combination of two-dimensional algebra whose elements are
quaternions [6]. We have recently generalized the composition algebra [7] for higher dimensional tensors. Instead of representing the composite tensor as a product, we replace elements of a tensor with other tensors. We have named it the decomposition of a tensor. We needed it to understand and analyze the maniflats, or the geometric structure where elements of different dimensions would link with each other and transmission between different elements could be made [3, 4].
2 Methods Summary

2.1 Calculation of Invariants and the Space-Time-Topology-Prime (STts) Metric Under SOMU

2.1.1 The Mathematical Operation That Invents an Invariant
A 3D clock assembly representing spatio-temporal events is a polar-decomposed structure, and hence could be rewritten as a positive-definite tensor $A_{ijk} = U e^f$, $f = \ln(P^T)$; $P$ is the deformation of the input $A'_{ijk}$ from the memorized $A_{ijk}$. Two 3D clock assemblies are compared: one memorized ($A_{ijk}$) and the other an unknown input ($A'_{ijk}$) resonating with memory. The deformation $\eta = (A'_{ijk} - A_{ijk})/A_{ijk}$ in the 3D clock assembly for the train and test datasets is taken as a differential signal $\partial\eta/\partial\phi$, or $x_i$, along three orthogonal axes $\phi_1, \phi_2, \phi_3$, in general $\phi_S$. The plot is $\mathbb{R}^3: (x, y, z) \to (\phi_1, \phi_2, \phi_3)$. The invariant condition is that the partial derivatives of $S$ with respect to $\eta$, $\frac{\partial S_{\phi_1}}{\partial\eta} : \frac{\partial S_{\phi_2}}{\partial\eta}$, vanish when $\phi_1 = \phi_2$, with $A : B = \mathrm{tr}(AB^T)$ ... (equation 3), where $T$ denotes transpose and $\mathrm{tr}$ the trace, delivered by $\phi_s^N * W_s^S$. When a seed molecule or supramolecule's resonance frequency set $\{R_i\}$ oscillates periodically, we get $A_{ijk}$. When the output product of one self-assembly is used as a seed for the next phase of self-assembly, the orthogonal projection to determine the invariant $\frac{\partial S_{\phi_1}}{\partial\eta} : \frac{\partial S_{\phi_2}}{\partial\eta}$ is satisfied. Both layers together generate the vortex that holds the geometric shape of an invariant [8]. This definition of dimension can be found in the solid-state physics literature from the 1960s.
2.1.2 The Space-Time-Topology-Prime Metric
We reported counting primes in a 3D clock arrangement to build an artificial brain [9], and the metric evolved over time [10]. Finally, we reported the operational protocols for the space-time-topology-prime metric (STts metric, Singh et al. [3, 4]). Here the metric measures distance in symmetry; for self-assembly of systems within and above, we could measure the differential space $ds^2(h_3)$, the differential time $dt^2(h_{1,2})$, the differential topology $dT^2(h_5)$ and, finally, the differential symmetry in the density of primes $dS^2(h_7)$.
Here the combinations of dual (either S or t), triple (either S, t, or T), and quad features (S, T, t, and s) are: space-time $st(h_3, h_{1,2})$, with 3 + (1 or 2) = 5 or 6 dimensions; space-topology $sT(h_3, h_5)$, with 3 + 5 = 8 dimensions; time-symmetry $tS(h_{1,2}, h_7)$, with (1 or 2) + 7 = 8 or 9 dimensions; topology-symmetry $TS(h_5, h_7)$, with 5 + 7 = 12 dimensions; $stST(h_3, h_{1,2}, h_5, h_7)$, with 3 + (1 or 2) + 5 + 7 = 16 or 17 dimensions; space-symmetry $sS(h_3, h_7)$, with 3 + 7 = 10 dimensions; time-topology $tT(h_{1,2}, h_5)$, with (1 or 2) + 5 = 6 or 7 dimensions; space-symmetry-time $sSt(h_3, h_7, h_{1,2})$, with 3 + 7 + (1 or 2) = 11 or 12 dimensions; space-time-topology $stT(h_3, h_{1,2}, h_5)$, with 3 + (1 or 2) + 5 = 9 or 10 dimensions; space-symmetry-topology $sST(h_3, h_7, h_5)$, with 3 + 7 + 5 = 15 dimensions; symmetry-time-topology $tST(h_{1,2}, h_7, h_5)$, with (1 or 2) + 7 + 5 = 13 or 14 dimensions. Since an icosahedron has 12 corners, for the "within-and-above" universe we would have a maximum of 12 dimensions; since an icosahedron has 20 planes, we would have 20 dimensions when a dimension means adding new dynamics. Hence, $stST$ would be confined to one imaginary world, i.e., it would represent the projected and feedback time crystal "to and from" the Phase Prime Metric, PPM [3, 4, 9, 10]. The metric representing the polyatomic time crystal in the $stST = S^2T^2$ universe of 12 nested worlds is given by

$$H = PPM_1\, st(h_3, h_{1,2}) + PPM_2\, sT(h_3, h_5) + PPM_3\, tS(h_{1,2}, h_7) + PPM_4\, TS(h_5, h_7) + PPM_5\, sS(h_3, h_7) + PPM_6\, tT(h_{1,2}, h_5) + PPM_7\, sST(h_3, h_7, h_5) + PPM_8\, stT(h_3, h_{1,2}, h_5) + PPM_9\, tST(h_{1,2}, h_7, h_5) + PPM_{10}\, sSt(h_3, h_7, h_{1,2}) + \mathrm{Project\text{-}Feedback}\ stST(h_3, h_{1,2}, h_5, h_7) \quad \ldots \ (\text{equation } 4)$$
3 Results and Discussion

3.1 A Composition of Five Fractal Patterns for the Density of Primes Is Key to Cognition

We have argued earlier that the basic concept of the number system in the Vedic religious scriptures is a periodic power cycle of 10. Therefore, in Fig. 1a, we have counted the number of primes at a gap of 10 integers, so in each loop the possible number of primes would be 0, 1, 2, 3, or 4; only five values are possible. In Fig. 1b, we compare how the prime number theorem predicts the number of primes with the actual density of primes. We can see a huge discrepancy between the real density of primes and the prime number theorem, so we rejected the prime number theorem and investigated whether there is any hidden periodicity or pattern in the density of primes. For that purpose, we calculated the frequency of finding a particular density of primes
in the integer space. Say the density of primes is 3: if we keep searching in the integer space, how frequently would we encounter a period of ten where the density of primes is 3? The gap between integers was normalized for all values 0, 1, 2, 3, and 4 so that we could bring the gap values for all five possible prime densities to similar values and compare the differences in their frequency patterns. In Fig. 1c–g we plotted half the integer gap above and below a central axis, and C2 symmetry reveals a set of 6 waves, or a triplet of wave pairs, for each possible density of primes. Thus, we get 5 × 6 = 30 waves that map the frequency pattern for the density of primes; the 30 clocks form an integrated architecture, as shown in Fig. 1i. From the fitting plot of the 30 waves, we discovered that the most dominant phase gap is 22.5° (Fig. 1i). This is a very important angle, and we see it everywhere in nature. For each set of ten consecutive integers, there could be 4, 3, 2, 1, or 0 primes, and for these five density values, if isolated from the integer series, we find three pairs of phase spheres for each of the five density values. This integrated network of 30 phase spheres, all with a specific phase gap and diameter, remains constant; it never evolves, and n → n + 1, or counting, takes place on the surface of this structure. It is the skeleton of SOMU.

Fig. 1 The thirty infinite series of thirty invariant phases in the density of primes define fifteen types of singularities and fifteen types of infinity. a Density of primes calculated by counting the number of primes at a gap of 10; we get five possible values of the density of primes, 0, 1, 2, 3, 4, and we find the integer gap of the gap of gaps (three higher levels) to find positive and negative diversions in the integer series. b All five densities for 0, 1, 2, 3, and 4 plotted together. We consider five densities, 0, 1, 2, 3, 4, and build five plots in panels c, d, e, f, and g. Three pairs of waves fit the density of primes (6 × 5 = 30), and the fitting line is colored. Each pair has an opposite phase but with a phase gap. i We plot the curves between numbers 0–4000 or 0–10,000

The data generation for Fig. 1c–i proceeds as follows. We search the prime numbers P1, P2, P3, …, Pn in every 10-number interval 0–10, 10–20, …, from 0 to 4000. Then we obtain the successive differences (Q1 = P2 − P1; Q2 = P3 − P2; …; Qn−1 = Pn − Pn−1) between P1, P2, P3, …, Pn, and frequently we find the fixed numbers Q1, Q2, Q3, Q4, and Q5. After that, we select the numbers N1, N2, N3, …, Nn where we get Q1, Q2, Q3, Q4, and Q5. For all Q, we obtain the number differences M1 = N2 − N1; M2 = N3 − N2; …; Mn−1 = Nn − Nn−1. To get the smallest numbers S1, S2, S3, …, Sn, we divide the numbers M1, M2, …, Mn by a suitable number X. After that, we take the next difference level of the smallest numbers: T1 = S2 − S1; T2 = S3 − S2; …; Tn−1 = Sn − Sn−1. We plot the curves between the numbers 0–4000 and T1, T2, …, Tn for each Q. Then we fit the curves and get 3 sets of periodic equations for each Q.
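A minimal Python sketch of the first steps of this pipeline (the windowed prime counts and the N- and M-series for each density value); the use of sympy for primality testing is our own choice, and the divisor X with the later S- and T-series is left out since the paper does not specify X:

```python
from sympy import isprime

def window_counts(n_max, width=10):
    """Density of primes: count primes in [0,10), [10,20), ... up to n_max."""
    return [sum(isprime(k) for k in range(i, i + width))
            for i in range(0, n_max, width)]

def positions_with_density(counts, q, width=10):
    """The numbers N1, N2, ... at which a given density value q occurs."""
    return [i * width for i, c in enumerate(counts) if c == q]

counts = window_counts(4000)
for q in range(5):                                    # the five possible densities 0..4
    pos = positions_with_density(counts, q)
    m_series = [b - a for a, b in zip(pos, pos[1:])]  # M1 = N2-N1, M2 = N3-N2, ...
    print(q, len(pos), m_series[:5])
```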
3.2 Implications of the Time-Crystal-Like 3D Architecture of Clocks Representing the Density of Primes

3.2.1 Synthesis of System Points and Singularities
One of the important factors for the clock-assembled architecture of the density of primes is that 30 = 2 × 3 × 5, 2 × 5 × 3, 3 × 5 × 2, 3 × 2 × 5, 5 × 3 × 2, 5 × 2 × 3; there are six topologies, a fusion of two triangles crossing over by facing each other, creating a singularity domain in the overlapping region. The density of primes has enormously more complexity than the brain, which primarily uses 12 as the basis of metric variations (12 = 2 × 2 × 3, 2 × 3 × 2, 3 × 2 × 2). In the ordered factor metric, 12 cannot generate a system point (the paired 2 is the reason), but 30 can (three separate integers, 2, 3, and 5). Most of the system points originate when we consider the density of primes, which continues when we move downstream from 12D to 1D. The density of primes intricately curves the probability of making a point undefined (ordered factor metric, [10]); the choices connecting lines for all integers from isolated discrete forms along the lines of integers, and the isolation, superposition, overlaps, and gaps are modulated by the density of primes. Connecting the nearest neighbors of the density-of-primes plot derives a C5 × C3 symmetry of paired waves and gives birth to the property of normalization, i.e., a nearly closed loop at 360°.
3.2.2 12° Solid Angle for 30 Wave Packets on a Phase Sphere
The density of primes has fifteen paired periodic waves, or thirty waves, with a phase gap of 12°, totaling 360°, which introduces the concept of a circle or sphere when the density plot is projected to infinity. However, nine of the fifteen density plots form a triplet-of-triplets group demonstrating 108° phase cover. All these phase values are
connected to the integers 12, 36, and 108, forming the first triplet of triplets, and the 12-point ordered factor metric forms the smallest triplet and smallest closed loop. Thus, the 360° value for a complete loop is not an accident; it originates from the ~12° (11.75°, or 23.5°/2) phase gap between paired waves, and there are 30 waves in the density-of-primes plot. We have shown in Fig. 4c that at N = 12 the first clockwise rotation completes. Therefore, a solid angle of 12° creates a loop.
3.2.3 Fifteen Vortices Generated by 15 Paired Waves
The density of primes has fifteen paired periodic waves engaging in a unique journey of changing the shape of the 3D surface area covered. Similar to the process by which we generate the phase prime metric's 3D surface area, we build the change in the gap between primes for a particular density (0, 1, 2, 3, 4). Now, as the integers increase, the distances separate, and the waveforms generate a curvature following $\frac{1}{r^n}$; in other words, a 3D solid angular twist by a pair of waves generates 15 conical cylinders, which are the evolution pathways for the 12 classes of metrics generated for SOMU.
3.2.4 Expanding Clock Architecture of the Density of Primes
The density of primes creates 30 waveforms, or 15 pairs of waveforms, whose periodicities in the integer space have five quantized levels, since five types of prime densities are feasible. These polyatomic time crystals expand inhomogeneously. We need to correlate the invariant structure derived from the density of primes, made of clocks in which integers flow instead of time. One of the prime features of this invariant structure is the expansion of the spheres representing the clocks. The reason is the continuously increasing separation between primes; yet the geometric feature of the invariant 3D clock assembly remains constant. Therefore, we find that the nearest neighbors in the choices-of-prime-arrangement plots (ordered factor metric) are unique. Interestingly, neighboring integers are not correlated, but the symmetry of prime positions brings choices of distantly located primes into one shape; neither the magnitude of a prime nor the number of primes constituting an integer is important. The nearest numbers suggest where the system point would move if symmetry breaks. Therefore, neighboring points in the ordered factor metric loops or curved lines map a homogeneous gradient of several symmetries. If one moves along these lines, symmetry will not change dramatically; there will be the least change in the system. Jumping between lines would be a phase transition, while moving through the lines would be symmetry-breaking.
3.3 108 Fundamental Constants Made of 17 Primes

The density of primes synthesizes and sets the condition for normalization. A triplet for each of the five density values of primes sets a triplet of three angular invariants. These angular invariants are conserved laws when the thirty system points generated by the paired waves of the density of primes jump from one isolated pattern of the metric space to another. The critical patterns of the density of primes generate curvatures as the thirty waveforms converge at common regions. Note that the 30 waveforms generated by connecting the density of primes do not start at a particular number, and by 109 all 30 waveforms acquire at least one point. For that reason, they do not converge to a singular point, but rather to a domain. These domains' generic $\frac{1}{r^n}$ feature contributes to all the geometric patterns generated by the invariants of all 12 dimensions. For this reason, the indices $C_i \sim \frac{1}{r^n}$ generate the fundamental constants $C_i$. Thus, we get 95 fundamental constants for 47 bases ($C_i \times 10^{-47}$ to $C_i \times 10^{47}$); for the SOMU, we envision that the ultimate universal engine would have a 17-prime base at 53, and then we would have universal constants ($C_i \times 10^{-53}$ to $C_i \times 10^{53}$) (Fig. 2).
3.4 Replotting the Density of Primes and Understanding the Differentials in Detail

3.4.1 The Importance of Differentials
In Fig. 1, we observed that the periodicity we forcefully apply to the system is an oversimplification of the true pattern that links the density-of-primes points. For that purpose, we have written an algorithm to find the integer gap between two points in the integer series where a common density of primes (say, two instances of zero) is found, and then the difference between the counts (C). We call it dC/dN. When we get a singular differential plot, from that we could plot dC/dN again; the derived plot would be $\frac{d^2C}{dN^2}$. We have continued this process six or seven times, in order to find whether there is a fractal pattern much more complex than the simple periodic waves that we determined in Fig. 1. In all the plots of Fig. 3, where the differential dataset of the density of primes is documented, we obtained negative values.
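A minimal sketch of this repeated differencing, assuming NumPy; the counts list stands for the windowed density series of Fig. 1, and the order of six differentials follows the text:

```python
import numpy as np

def repeated_differentials(counts, order=6):
    """Successive difference series dC/dN, d2C/dN2, ... of the density counts."""
    series = [np.asarray(counts, dtype=int)]
    for _ in range(order):
        series.append(np.diff(series[-1]))   # each pass differentiates once more
    return series[1:]

# Negative entries appear naturally wherever the density of primes decreases.
for k, d in enumerate(repeated_differentials([2, 3, 2, 1, 2, 2, 1], order=3), 1):
    print(k, d.tolist())
```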
Fig. 2 Density of primes and its six fundamental features: the first metric accounts for the properties born directly from the 30 waves of the density of primes described in Fig. 1. The density of primes gives birth to three types of normalizations applied on the ordered factor metric [1, Chap. 3]: Three types of normalization: e-pi-phi empty normalization (T), the density of primes: Prime contribution normalization, the density of primes induced (S): Polar plot makes clockwise and anti-clockwise rotation (R). Thus we get six features schematically summarized in six circles. Relative phases between 30 waves in the density of primes give rise to 23.5°, an angle responsible for multidimensional projections. It ensures the formation of a quadratic relation and, consequently, the fundamental constants [10]
3.4.2 The Natural Emergence of Negative Values
In Figs. 1 and 3, the reason we observe negative values is that the density of primes often decreases. Therefore, the negativity is not artificial but naturally embedded in the pattern of the density of primes.
3.4.3 Fitting the Density of Primes Differentials
Figure 4 shows only one example for each possible density of primes, with the nearest neighbors connected. There are some remarkable features in the plots. Irrespective of whether the differentials are of second or third order, the typical branching pattern remains constant for a particular density of primes. Also, for all possible densities of primes, we find the very particular branching out from a single point. Therefore, the density-of-primes plot of Fig. 1 delivers an oversimplified periodic feature made of 30 clocks.
Fig. 3 The density of primes plot and differential features: There are five rows, and each row presents a particular density; L0 means zero primes in a gap of ten, and the first digit where zero density appears is taken. Similarly, all five possible densities of primes (0, 1, 2, 3, 4) are taken into account: the first row is for density zero, L0; the second row for density one, L1; the third row for density two, L2; the fourth row for density three, L3; and the fifth row for density four, L4. The first plot in every row is the density C versus the integer N. From the second to the sixth are differentials: DS2 means $\frac{d^2C}{dN^2}$, DS3 means $\frac{d^3C}{dN^3}$, DS4 means $\frac{d^4C}{dN^4}$. For the first two rows, Nmax is 4 k (~4000); for the third row, Nmax = 10 k; for the fourth and fifth rows, Nmax = 100 k
However, when we consider the differential plots in much more detail, we find not only that a typical branching out from a single point is embedded in the integer space, but also that the more data we pack, the larger the size of the same fractal seed pattern. We draw two important conclusions. First, a within-and-above network of a "branching out from a point" structure should act as seed dynamics. Second, there is a periodicity once again, as in Fig. 1. The periodicity derived from the differential plots of Fig. 3 and fitted in Fig. 4 is nearly similar to that obtained from Fig. 1. This finding is a serious challenge to the prime number theorem, which ignores all these intricate patterns and embedded invariants in the density of primes.
Fig. 4 Nearest neighbor connection of the density of primes plots: We have taken one of the differentials for the five densities of primes as noted to the right of each plot (density of primes C is along the vertical axis and N is along the horizontal axis) and connected the neighbors to create a connection. L0 is for Nmax ~4 k, L1 is for Nmax ~10 k, L2 is for Nmax = 10 k, L3 is for Nmax = 100 k and L4 is for Nmax = 100 k. A self-repeating branching pattern is seen, and we have noted the periodicity of the self-repeating or fractal seed pattern that is common for all densities of primes. For L0, the periodicity of N is 1 k. For L1, the periodicity is 1 k; for L2, the periodicity is 2 k; for L3, the periodicity is 10 k; for L4, it is an integral multiple of 10 k, i.e., an expanding feature of the same seed pattern
3.5 Generalization of the Fractal Seed Pattern Observed in the Density of Primes

3.5.1 A Generic Model That Could Establish the Formation of a Typical Branching Pattern in Fig. 4
In Fig. 5, we have attempted to deconstruct the "branching out from a point" feature observed in Fig. 4. We start from a point in panel a of Fig. 5 and show that we could expand the branching using three possibilities. Note that there are several different possibilities for branching out, as observed in the various plots of Fig. 4. For the time being, we have included all possible variations and created seed rules to generate 15 different morphogeneses that repeat with C2 symmetry, or 30 different morphogenesis classes, observed in the variations of the branching-out pattern of the density of primes in Fig. 5, panel c.
Fig. 5 A journey from point to elementary transitions observed in the density of primes in Fig. 4: Here, we describe the fundamental philosophical argument for developing the SOMU. A point generates paths and then continues to produce several dynamic paths by interacting with more points. a There are only three possibilities for making a journey from a point; using an arrow, we describe three choices. b The second phase of panel a is shown in panel b. Three events are noted by combining the output of first-phase products. c The derivatives of the second phase described in panel b are combined to create third-generation progenies in panel c. In panel c, a new point is added as an attractor in all three generations
3.5.2 The Necessity of Structuring an Undefined Point: The Growth of a Point
Here we construct a philosophical basis for our SOMU model proposed earlier to incorporate our recent finding in Fig. 4. Within a point or outside, we cannot have space, mass, potential, field, or even force at the beginning. Moreover, constructing SOMU requires choosing mathematical elements and constructs that use minimum assumptions. If there are many points, we have to define how many, how they are correlated, and who links them. So, at the beginning of everything, we cannot take more than one point. We cannot go outside the point. The only possibility is going inside a point, exploring what we can do so that everything we do is normalized to 1 or unity and satisfy all criteria to be a point. If something starts moving from a point, it could go clockwise, anti-clockwise, and knots or a composition of both clockwise and anti-clockwise rotations. A point’s origin is not an entity, but a paradox, to exist or not exist. Instead of a point as a real physical entity, we would only have the
probability of a point. Since that question must not have a finite answer, probabilities would deliver the basic elements of SOMU. An undefined point is not a null set. If it is, then it will be defined. It is the sum of all possibilities, an endless chain that cannot be defined.
3.5.3 The Birth of SRT Follows a Fusion with a Point to Create 15 Morphogeneses
Now, at the second phase of the evolution of a point, when clockwise and anti-clockwise paths superpose, only the end path could exist, termed T (of SRT); both the clockwise and anti-clockwise paths could cross and survive with a loop, R of SRT; finally, both clockwise and anti-clockwise paths could coexist in a twisted path, S of SRT. Once we have the SRT configuration, circles, ellipsoids, and knots form. These are the origins of morphogenesis. SRT properties are defined at this stage very similarly: S = fill or expand the pattern by repeating, R = cross over and inflate, and T = minimize corners to make it suitable for bonding. Bigyan-Vikshu argued in the fourteenth century that, for several thousand years, different schools and a wide range of scholars have argued for redefining S, R, and T; however, the sense of the SRT triplet, S = projection, R = transformation, and T = bonding, was never changed. However, for morphogenesis, one requires adding a point and a bidirectional arrow to the circle, ellipsoid, and knot triplet. Why do we need to add a point and a bidirectional arrow? We have to insert a point inside a point; that is true. However, we need to deform the shape we have in hand so that the transformations continue. A bidirectional arrow refers to an expansion of the shape and thus incorporates a new pattern. We have outlined in the bottom row of Fig. 5 how adding a point to the circle creates a teardrop, a triangle, and a square. These three derived geometries combine to create the wide ranges of geometries we observe as constituents of morphogenesis. When an ellipsoidal knot (R) undergoes a point-and-arrow transformation, it builds several asymptotes and divergent geometries, which are used as basic elements for morphogenesis. Finally, when knots interact with the point and the arrow, several complex knots are generated. The three-phase evolution of a point into basic morphogenesis constituents is the key to SOMU.
3.6 Derivation of an Engine that Generates All Possible Branching-Out Patterns of Fig. 4

We observe in Figs. 4 and 5 that all journeys must begin from a point and branch out. Therefore, in Figs. 6, 7, 8, and 9, we have created a point and two concentric circles. The outer ring decides when the system returns to the initial point, and the periodicity of the fractal seed pattern observed in the density-of-primes plot of Fig. 4 is derived in the second ring of Figs. 6, 7, 8, and 9. We have chosen a triangular path and remained strict to the rule that our maximum journey would be limited to dimension 20, or D20, because we would not go beyond 12 dimensions: 12 dimensions could be written in the planes of the dodecahedron (12D) or along the corner diagonals of the icosahedron (20D for 20 planes). Therefore, under the current model of SOMU, where we limit ourselves to the minimum dimension needed to create a self-operation, we do not need to go beyond 20D. However, we have kept the possibilities open for a future SOMU (last figure of the book [10]).
4 Conclusion

The prime number theorem suggests a homogeneous law to explain the density of primes, which ignores a plethora of geometric invariants and laws of nature. We have created generic 20D engines, as generic alternatives to the Fano plane for octonions, to predict the products of all 1D to 20D vectors and, at the same time, to predict the dynamic behaviors of the branching-out fractal seed patterns in the density of primes. Our finding paves the way to analyzing the morphogenesis observed in nature using the density of primes, wherein morphogenesis could eventually generate all possible spatio-temporal events happening in nature. Therefore, we provide an essential tool for bias-free self-operating modules, from the density of primes to SOMU.
Fig. 6 Generic manifolds for N × N tensors where the density-of-primes paths form as one moves from the center of the maniflat architecture: We have drawn two concentric circles representing the product-determination and signature-determination routes. For taking the product of two vectors of a particular dimension, one should start from the center, which could be the point of Fig. 5, take the row value of the tensor, and move along the line to get the column value. While completing the triangle, the third value should be placed as the product at the coordinate (row, column). If the arrow direction is followed, the product is positive; in the opposite direction, it is negative. The dimension of the tensor is written as the N × N value
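The product rule in this caption generalizes the familiar Fano-plane rule for octonions. As a point of reference only, the sketch below reads products off oriented triples for the 8D (octonion) case, using the common e_i · e_(i+1) = e_(i+3) (mod 7) labeling convention; the triples, function name, and sign convention are illustrative assumptions, not the paper's 20D construction:

```python
# Oriented triples of the Fano plane under the e_i * e_(i+1) = e_(i+3) (mod 7) convention.
LINES = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2), (7, 1, 3)]

def product(i, j):
    """Return (sign, k) with e_i * e_j = sign * e_k for imaginary octonion units."""
    if i == j:
        return (-1, 0)                        # e_i * e_i = -1 (index 0 = real unit)
    for line in LINES:
        if i in line and j in line:
            k = next(x for x in line if x not in (i, j))
            # Moving forward along the oriented triple gives +, backward gives -.
            sign = 1 if (line.index(j) - line.index(i)) % 3 == 1 else -1
            return (sign, k)

print(product(1, 2))                          # (1, 4): e1 * e2 = e4
print(product(2, 1))                          # (-1, 4): e2 * e1 = -e4
```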
Fig. 7 This figure highlights the generic manifolds of tensors with dimensions of 11 × 11, 12 × 12, 13 × 13, and 14 × 14, where the density-of-primes paths form as one moves from the center of the maniflat architecture. The technical details are as presented in the caption of Fig. 6
Fig. 8 Generic manifolds of tensors with dimensions 15 × 15, 16 × 16, and 17 × 17, wherein a specific pattern in the distribution of prime numbers is observed. As one moves away from the center of the maniflat architecture, the density-of-primes paths form, implying an underlying structure or organization in the distribution of prime numbers. The technical details essential to understanding this phenomenon are presented in the caption of Fig. 6
Fig. 9 Prime-path density in the generic manifolds constructed for tensors with dimensions 18 × 18 and 19 × 19. The density of prime paths emerges as one moves away from the center of the maniflat architecture, indicating a structured distribution of prime numbers within the manifolds. The technical details essential to comprehending these figures are given in the caption of Fig. 6
Acknowledgements The authors acknowledge the Asian Office of Aerospace R&D (AOARD), a part of the United States Air Force (USAF), for Grant no. FA2386-16-1-0003 (2016–2019) on the electromagnetic resonance-based communication and intelligence of biomaterials.

Competing Interests Statement The authors declare that they have no competing financial interests.

Resources All algorithms used here to build the maniflats, the tensors, and the density-of-primes differentials are available through the joint NIMS–Amity University free resources on GitHub.
References

1. Reimann MW, Nolte M, Scolamiero M, Turner K, Perin R, Chindemi G, Dłotko P, Levi R, Hess K, Markram H (2017) Cliques of neurons bound into cavities provide a missing link between structure and function. Front Comput Neurosci 11:48. https://doi.org/10.3389/fncom.2017.00048. PMID: 28659782; PMCID: PMC5467434
2. Sahu S, Fujita D, Bandyopadhyay A (2010) Inductor made of arrayed capacitors. Japanese patent JP-511630, issued 20 August 2015 (world patent filed; this is the invention of the fourth circuit element); US patent 9019685B2, issued 28 April 2015
3. Singh P, Sahoo P, Saxena K, Ghosh S, Sahu S, Ray K, Fujita D, Bandyopadhyay A (2021) A space-time-topology-prime, stTS metric for a self-operating mathematical universe uses Dodecanion geometric algebra of 2–20 D complex vectors. Proc Int Conf Data Sci Appl 148:1–31
4. Singh P, Sahoo P, Saxena K, Ghosh S, Sahu S, Ray K, Fujita D, Bandyopadhyay A (2021) Quaternion, octonion to dodecanion manifold: stereographic projections from infinity lead to a self-operating mathematical universe. Proc Int Conf Trends Comput Cogn Eng 1169:55–77
5. Dixon GM (1994) Division algebras: octonions, quaternions, complex numbers and the algebraic design of physics. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-2315-1
6. Dickson LE (1919) On quaternions and their generalization and the history of the eight square theorem. Ann Math Second Series 20(3):155–171. https://doi.org/10.2307/1967865
7. Jacobson N (1958) Composition algebras and their automorphisms. Rendiconti Del Circolo Matematico Di Palermo 7:55–80. https://doi.org/10.1007/bf02854388
8. Ennis DB, Kindlmann G (2006) Orthogonal tensor invariants and the analysis of diffusion tensor magnetic resonance images. Magn Reson Med 55:136–146. https://doi.org/10.1002/mrm.20741
9. Reddy S, Sonker D, Singh P, Saxena K, Singh S, Chhajed R, Tiwari S, Karthik KV, Ghosh S, Ray K, Bandyopadhyay A (2018) A brain-like computer made of time crystal: could a metric of prime alone replace a user and alleviate programming forever? Soft Comput Appl 761:1–43
10. Bandyopadhyay A (2020) Nanobrain: the making of an artificial brain from a time crystal. CRC Press, Taylor and Francis. https://doi.org/10.1201/9780429107771
Metamaterials-Based Photonic Crystal Fiber (PCF) Design for Wireless Charging

Kisalaya Chakrabarti and Mayank Goswami
Abstract A metamaterial-based photonic crystal fiber (PCF) is tested in this work to handle intermittent recharging. Due to the metamaterial effect, the Electric field and the Magnetic field are isolated at resonance by the Toroidal field (evolved from the Toroidal moment of the metamaterial outer ring, in the form of a torus). The extracted Electric field is proposed for use in wireless charging.

Keywords Toroidal field · Poloidal currents · Metamaterial
K. Chakrabarti
Electronics and Communication Engineering, Haldia Institute of Technology, Hatiberia, Haldia 721657, India

M. Goswami (B)
Divyadrishti Laboratory, Department of Physics, IIT Roorkee, Roorkee, India
e-mail: [email protected]

1 Introduction

The process of automatic inductive charging resolves slower charging, supposedly permitting any vehicle to operate for an indefinite period. Slow charging costs the operational life of the driving electronics in both the device and the charger, and a relatively higher battery temperature damages not only the battery cells but the electronics as well. It has been shown that charging a Pixel 4 battery from 0 to 100% through a classic cable takes 14.26 W-hours, whereas the same charge by a wireless charger takes approximately 21.01 W-hours, an increase of about 47%. Imagine the significance of 3.5 billion smartphones getting charged every other day worldwide. Innovative advancement could lessen these transfer losses by using ultra-thin coils with optimized drive electronics; consequently, compact chargers and compact batteries could be achieved through minor changes. The proposed technology will provide charging times comparable to the wired counterpart and is therefore rapidly finding its way into mobile batteries. In the majority of materials, the response to an electrical stimulus is known to be orders of magnitude stronger than the response to a magnetic one. As expected, this disparity leads to the expression of electrical moments involved in the light-matter interaction.
It has been observed that for metamaterials comprising a Toroidal topology, the electromagnetic performance is mainly determined by the Toroidal moment. It occurs at a Gedanken torus due to current excitation rotating on the surface along its meridians, called Poloidal currents [5]. The radiation pattern of a Toroidal moment is identical to that of an electric dipole; when the two are excited in anti-phase, they form a non-radiating structure, the anapole [2]. In this paper, we have designed and analyzed a metamaterial-based photonic structure in which the Magnetic field ruptured out of the incident electromagnetic wave in the core is, at the resonance condition, cancelled totally by the Toroidal field (evolved from the Toroidal moment of the metamaterial outer ring); only the Electric field remains, as small discontinuous circles uniformly distributed on the thin Gold (Au) ring layer sandwiched between SiO2 and the metamaterial, as described in the following sections. To the best of our knowledge, we report this anti-coherent cancellation phenomenon for the first time. Two types of metamaterials have been reported [1]. The first is composed of "electric" metal molecules, which are planar conductive structures with two symmetric split loops. Here, a plane wave incident on it stimulates circular currents "j" and sets a circulating magnetic moment in motion, leading to a Toroidal field "T". An Electric field "E" is also generated in the metamaterial because of the central gap, and a Magnetic field "H" always accompanies the metamaterial substrate. The authors have also demonstrated that destructive interference between T, E, and H gives a unique effect: strong E-field localization inside the central gap. Further, [7] showed that destructive interference between the electric field generated by a central metallic scatterer and the Toroidal field creates a cloak inside a cluster of dielectric cylinders. For a plane-polarized incident wave parallel to the symmetry axis "m" of the computational space shown in Fig. 1, both Electric dipolar moments "P" and Toroidal dipolar moments "T̃" are brought about. The electric field of the wave drives a charge separation $\rho = \rho_0 e^{i\omega t}$ transversely to the waist of the computational space, causing an oscillating electric dipole moment given by [5]

$$\tilde{P}(t) = m\,\rho_0 e^{i\omega t} \tag{1}$$
These charge displacements furnish Poloidal-like counter-rotating currents, termed j(t), oscillating along the edges of the circular air holes of the structure shown below. These currents in turn generate a Toroidal moment T̃(t), oriented along the direction "m", given by

$$\tilde{T}(t) = m\,i\omega\rho_0 e^{i\omega t} \tag{2}$$
Now, at resonance, the magnetic moment H̃(t), generated from the metamaterial substrate due to anapole excitation, will be exactly equal and opposite along the direction "m" and is given by
Fig. 1 Structure of proposed metamaterial-based photonic structure. Arrow “m” represents the axis of its mirror symmetry
$$\tilde{H}(t) = (-m)\,i\omega\rho_0 e^{i\omega t} \tag{3}$$
Therefore, at the resonance condition, the Toroidal moment T̃(t) completely cancels the Magnetic moment H̃(t). In other words, the Toroidal field and the Magnetic field completely cancel each other at resonance due to destructive interference; we term this "Total Anti-Coherent Cancellation". The electric field generated by this method can be used for wireless charging, which is more efficient than direct charging, causing no heat production in contrast to conventional charging. It employs electromagnetic induction to supply electricity to portable devices and could also be used in power tools, electric toothbrushes, vehicles, and medical devices. At the specific resonance frequency, considerable distances between the sender and receiver can be accomplished. When the inductive charging system includes resonant inductive coupling, a capacitor can be added to each induction coil to construct two resonant circuits; the frequency of the received current is matched with the resonance frequency and is chosen based on the distance needed for maximum efficiency. Contemporary developments of this resonant structure comprise a movable transmission coil, which can be attached to a raising platform or support, and the use of materials such as silver-plated copper or aluminum in the receiver coil to reduce weight and resistance owing to the skin effect. However, this arrangement suffers from disadvantages such as slower charging, expensive hardware, the inefficiency of direct charging, and incompatible standards.
2 Metamaterial-Based Photonic Structure Description

In one earlier proposed structure [6], the radii of the core (d_c) and the small holes (d_1) are taken as 0.15 μm, whereas the radius of the larger holes (d_2) is taken as 0.3 μm; the width of the gold layer is 40 nm, while the metamaterial layer width is 0.98 μm. The pitches p_1 and p_2 are 2.0 μm and 1.0 μm, respectively, and the perfectly matched layer (PML) width is 3.5 μm. It is obvious that a shift in the peak resonance will be observed with a change in the refractive index of the metamaterial. The materials applied in the recommended structure are fused silica, gold, metamaterial (refractive index of −1.37), and air holes, as shown in Fig. 1. The mesh size is kept extra fine for good results. The parametric sweep of the wavelengths is between 0.6 μm and 0.9 μm in COMSOL™. The number of modes is set using the mode analysis option; we performed simulations for four values of modes: 40, 60, 80, and 100. To observe the metamaterial effect, we have investigated close to the refractive index of −1.45.
3 Results and Analysis

We calculate the Total fields (combined Electric (E) and Magnetic (H) fields) around the boundary of the SiO2 and the metamaterial substrate having a refractive index of −1.37, at the interface of the thin Au layer (in the shape of a ring), for three distinct wavelengths, 0.6 μm, 0.7 μm, and 0.8 μm, and for four values of modes, depicted as (i) Mode 40 (Fig. 2), (ii) Mode 60 (Fig. 3), (iii) Mode 80 (Fig. 4), and (iv) Mode 100 (Fig. 5). As the metamaterial has a negative refractive index [3], for an apparent reason we have investigated the fields close to the refractive index of −1.45. It is observed from the simulations that at the resonance condition, the Toroidal fields evolve from the Toroidal moments of the torus-shaped metamaterials, given by Dubovik and Cheshkov [4] as

$$\vec{T} = \frac{1}{10c}\int \left[\{\vec{r}\cdot\vec{j}(t)\}\,\vec{r} - 2r^{2}\vec{j}(t)\right] d^{3}r \tag{4}$$
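For readers who want to evaluate Eq. (4) numerically rather than analytically, a minimal discretization is sketched below; the function, its arguments, and the Gaussian-units speed of light are illustrative assumptions, not part of the authors' COMSOL workflow.

```python
import numpy as np

def toroidal_moment(points, currents, volumes, c=3e10):
    """Discrete Eq. (4): T = (1/10c) * sum over samples of
    [(r . j) r - 2 |r|^2 j] * dV, with r measured from the torus center.
    points: (N, 3), currents: (N, 3), volumes: (N,); c in Gaussian units."""
    r_dot_j = np.einsum("ni,ni->n", points, currents)  # r . j at each sample
    r_sq = np.einsum("ni,ni->n", points, points)       # |r|^2 at each sample
    integrand = r_dot_j[:, None] * points - 2.0 * r_sq[:, None] * currents
    return (integrand * volumes[:, None]).sum(axis=0) / (10.0 * c)
```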
Fig. 2 Resonance Electric fields distribution at Au ring when the number of modes = 40; for λ = 0.6 μm, 0.7 μm and 0.8 μm, respectively, for r.i. of metamaterial = −1.37
Fig. 3 Resonance Electric fields distribution at Au ring when the number of modes = 60; for λ = 0.6 μm, 0.7 μm and 0.8 μm, respectively, for r.i. of metamaterial = −1.37
Fig. 4 Resonance Electric fields distribution at Au ring when the number of modes = 80; for λ = 0.6 μm, 0.7 μm and 0.8 μm, respectively, for r.i. of metamaterial = −1.37
Fig. 5 Resonance Electric fields distribution at Au ring when the number of modes = 100; for λ = 0.6 μm, 0.7 μm and 0.8 μm, respectively, for r.i. of metamaterial = −1.37
where r is the coordinate vector with origin at the torus center and j(t) represents the counter-rotating (Poloidal-like) current. It is significant that the Poloidal current j(t) induces Toroidal fields excited in anti-phase to the ruptured magnetic fields originating from the incident electromagnetic waves [2]; these subsequently cancel out the Magnetic fields (H) of the EM waves, and only the Electric fields (V/m) remain, distributed throughout the Au ring in discontinuous form, as shown in Figs. 2–5. Therefore, when calculating the Total fields, we find only the distributed Electric fields; in other words, the distributed Electric fields are extracted from the incident electromagnetic fields impinged at the core situated at the center of the above-mentioned photonic structure. We refer to this phenomenon of H-field cancellation as Total Anti-Coherent Cancellation. Under this condition, from the geometry, the extracted electric far-field of a ring of charge on the axis of the ring can be found by superposing the point-charge fields of infinitesimal charge elements, as given below
$$E_z = \frac{kz}{\left(z^{2}+r^{2}\right)^{3/2}}\sum_{i=1}^{N} Q_i\,\hat{z} \tag{5}$$

where z is the axial distance at which the far-field is to be calculated and $\sum_{i=1}^{N} Q_i$ is the total charge originated from the impinged wave on the photonic structure.

Table 1 Value of effective refractive indices associated with the value of wavelength (λ) at four different mode numbers for r.i. of metamaterial = −1.37

Modes | 0.6 μm | 0.7 μm | 0.8 μm
40 | −1.431 + 0.0037103i | −1.416 + 0.0026598i | −1.3962 + 0.0019785i
60 | −1.4481 + 0.035952i | −1.4397 + 0.04666i | −1.4423 + 0.058501i
80 | −1.4317 + 0.036363i | −1.4397 + 0.046663i | −1.4423 + 0.058501i
100 | −1.4317 + 0.036387i | −1.4397 + 0.046663i | −1.4151 + 0.059591i
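A minimal numerical sketch of Eq. (5) follows; the ring radius, charge values, and element count are illustrative assumptions, chosen only to show the superposition of point-charge fields on the ring axis.

```python
import numpy as np

# Eq. (5): on-axis far field of a ring of N point charges,
# E_z = k z (sum of Q_i) / (z^2 + r^2)^(3/2). All values are illustrative.
k = 8.9875e9             # Coulomb constant, N m^2 / C^2
r = 20e-9                # ring radius, roughly the Au-ring scale (assumed)
Q = np.full(100, 1e-19)  # N = 100 equal charge elements (assumed)

def E_z(z):
    return k * z * Q.sum() / (z**2 + r**2) ** 1.5

print(E_z(1e-6))  # field one micrometre along the ring axis, in V/m
```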
Table 1 depicts the value of the effective refractive indices associated with the wavelength (λ) at four different mode numbers. It is observed that the real part is negative due to the metamaterial effect, and the imaginary part contributes to the phase of the refractive indices. The phase part contributes to the spatial orientation of the Electric fields on the Au ring, which are discrete in nature and spaced equidistantly all over the metal ring. It is also observed from Figs. 2–5 that the clarity of the Electric field distributions on the thin Au ring increases when we compute for a higher number of modes; for example, mode number 100 gives a better E-field distribution than mode 40. This extracted Electric field can be used for wireless charging, as the E-field is associated with the charge "Q" shown in Eq. 5. We have noticed from the above discussions that the number of modes = 100 is sufficient for analyzing the resonance Electric field distribution pattern, as it comprises all possible modes; so, for the last two cases in Figs. 6 and 7, we have considered only mode number 100. As in the previous case, here also we have investigated the fields close to the refractive index of −1.45. Figures 6 and 7 show the resonance Electric field distribution patterns at the Au ring when the number of modes = 100, for λ = 0.6 μm, 0.7 μm, and 0.8 μm, respectively, for a metamaterial r.i. of −1.5 and −2.5, respectively. It has been observed that for the metamaterial of refractive index −1.5 at 0.8 μm there is almost no separation of Electric fields, due to the absence of resonance in the Au layer. In the case of the metamaterial of refractive index −2.5, it has been observed that in all three cases, λ = 0.6 μm, 0.7 μm, and 0.8 μm, there is no resonance, because the field compensation does not occur at a higher negative refractive index, which is quite expected. So we should not approach higher negative refractive indices.
Fig. 6 Resonance Electric fields distribution at Au ring when the number of modes = 100; for λ = 0.6 μm, 0.7 μm, and 0.8 μm, respectively, for r.i. of metamaterial = −1.5
Fig. 7 Resonance Electric fields distribution at Au ring when the number of modes = 100; for λ = 0.6 μm, 0.7 μm, and 0.8 μm, respectively, for r.i. of metamaterial = -2.5
Table 2 Value of effective refractive indices associated with the value of wavelength (λ) at mode number = 100 for r.i. of metamaterial = −1.5 and −2.5, respectively

Modes | 0.6 μm | 0.7 μm | 0.8 μm
100 (for r.i. = −1.5) | −1.4543 + 0.004082i | −1.4758 + 0.0026658i | −1.4275 + 0.014135i
100 (for r.i. = −2.5) | −1.4755 + 0.039322i | −1.456 + 0.026558i | −1.4472 + 0.059379i
Table 2 depicts the value of the effective refractive indices associated with the wavelength (λ) at mode number = 100. As expected, this case also has a negative real part due to the metamaterial effect, and the imaginary part contributes to the phase of the refractive indices. As mentioned earlier, we should not get the clear, discrete Electric fields seen in the earlier cases; here we see a partial separation of Electric fields, or noisy Electric fields with outward fringes (Fig. 7), where the total cancellation of the Magnetic field by the Toroidal field does not happen.
4 Conclusion

We proposed a metamaterial-based photonic structure suitable for the extraction of electricity by total cancellation of the magnetic field. This phenomenon of H-field cancellation by the Toroidal field is termed "Total Anti-Coherent Cancellation". The Electric field extracted from the electromagnetic wave can be used for wireless charging, as the E-field is associated with the Electric charge "Q". The Electric field generated
by this method can be utilized for charging that is more efficient than direct charging, causing no heat production in contrast to all conventional charging.

Credit authorship contribution statement MG: Methodology, Investigation, Software, Writing, Visualization. KC: Data Processing, Investigation, Writing.
References

1. Basharin AA, Chuguevskiy V, Volsky N, Kafesaki M, Economou EN, Ustinov AV (2016) Extremely high Q-factor toroidal metamaterials. arXiv:1605.08779
2. Basharin AA, Chuguevsky V, Volsky N, Kafesaki M, Economou EN (2017) Extremely high Q-factor metamaterials due to anapole excitation. Phys Rev B 95(3)
3. Chakrabarti K, Mostufa S, Paul AK (2021) Design and analysis of position chirped metamaterial photonic crystal array for confinement of light pulse. J Opt 23(11). https://doi.org/10.1088/2040-8986/ac2164
4. Dubovik VM, Cheshkov AA (1974) Multipole expansion in classical and quantum field theory and radiation. Sov J Part Nucl Phys 5:318–364
5. Fedotov VA, Rogacheva AV, Savinov V, Tsai DP, Zheludev NI (2013) Resonant transparency and non-trivial non-radiating excitations in toroidal metamaterials. Sci Rep 3:2967. https://doi.org/10.1038/srep02967
6. Khare P, Goswami M (2021) AI algorithm for mode classification of PCF SPR sensor design. arXiv:2107.06184
7. Ospanova AK, Labate G, Matekovits L et al (2018) Multipolar passive cloaking by nonradiating anapole excitation. Sci Rep 8:12514. https://doi.org/10.1038/s41598-018-30935-3
A Ranking Model of Paddy Farmers for Their Welfare

Suneeta Mohanty, Shaswati Patra, Prabhat Ranjan Patra, and Prasant Kumar Pattnaik
Abstract Paddy is considered the major staple food of the Asian continent. Many stakeholders are involved in the process of rice harvesting, from storage to distribution to the final consumer. In this paper, we prioritize the farmers of a particular region based on their estimated time of harvest, which streamlines the process of paddy collection and distribution to avoid wastage. We give a ranking model for paddy farmers using AHP. The reduction of wastage will contribute toward the welfare of the farmers.

Keywords Paddy collection · Supply chain · Farmer's welfare · MCDM · Analytic Hierarchy Process (AHP)
Suneeta Mohanty, Shaswati Patra: These authors contributed equally to this work.

S. Mohanty (B) · S. Patra · P. K. Pattnaik
School of Computer Engineering, Kalinga Institute of Industrial Technology Deemed to Be University, Bhubaneswar 751024, India
e-mail: [email protected]

S. Patra
e-mail: [email protected]

P. R. Patra
College of Agriculture, OUAT, Bhubaneswar, India

1 Introduction

Sixty percent of the population of the world considers paddy their staple food, and the continent of Asia is mainly responsible for the production and consumption of rice. The paddy/rice produced has to be stored in a very safe area, considering criteria such as the selection of the site, the structure of storage, cleaning and disinfection of the storage site, drying and cleaning of grains, proper ventilation, and regular and thorough inspection. Many kinds of storage structures are present, to name a few: producer's storage, rural godowns, mandi godowns, state and central warehousing corporations, and co-operative storage. The majority of rice is transported to markets
within the same state or adjacent states. The rice makes its way from the paddy fields to the assembling stations (mandi) through the producer and then through several more market formalities on its way to the end consumer. Due to the unavailability of adequate storehouses, much of the paddy is wasted outside the mandi because of rain, theft, insects, mice, etc., if it is not picked up in proper time. The farmers are allotted a time slot for selling their paddy in the mandi, but during that time slot the paddy may not be ready for sale; it is not always possible for farmers to deliver their product to the mandi in the allotted time slot, due to delays in harvesting, lack of manpower, etc. They may have to wait for the next available slot, during which there is a risk in keeping the paddy outside the storehouse. In some cases, the farmers have made their paddy ready for selling but have not got their turn. Therefore, proper slot allocation based on the estimated time of production is required for an efficient supply of paddy to the mandi. In this paper, we prioritize the farmers of a particular region based on when they expect to harvest, in order to streamline the paddy collection process and prevent waste, which will benefit the farmers' welfare. We use AHP, one of the multi-criteria decision-making (MCDM) tools, to achieve this goal.
2 Literature Review

We have studied various research articles related to paddy harvesting, agriculture supply chain management, and the MCDM approach. Somashekhar and Raju have presented an extensive review of agriculture and food supply chain management, marketing, contract farming, and many other initiatives taken by the private sector in India [1]. Parwez has explored many problems with respect to food security, including insufficient storage and faulty supply chains, in the context of IT [2]. Any inefficiency present in the supply chain can be tracked using RFID to improve the traceability of the inventories [3, 4]. Kumar and Iyengar proposed a blockchain-based supply management for rice that ensures the safety of rice during the entire supply chain process [5]. Wang et al. have used the MCDM approach for evaluating and selecting suppliers for the rice supply chain [6]. Singh et al. [7] have discussed the importance of the MCDM method in decision-making and proposed a recommendation system for fertilizers.
3 Problem Definition

It is not always possible for farmers to reach the mandi in the available time slot, due to delays in harvesting, lack of manpower, etc. They may have to wait for the next available slot, during which there is a risk in keeping the paddy outside the storehouse.
Due to the lack of storage, most of the paddy kept outside the storehouse is wasted because of rain, theft, and paddy being eaten by insects, mice, etc. Our objective is to design an automated system where, in the middle of their cultivation, farmers fill in all the information about their cultivation: the methodology used, soil parameters, and the environmental condition of the area. Our model will analyze the farmers' data and produce a ranking of the farmers for selling their products in the mandi.
4 Proposed Model

Our objective is to minimize the waiting time of the farmers to get their turn to sell their paddy and to maximize the supply. There are many reasons for the wastage of paddy in the mandi due to improper storage availability: it may be that the farmers get their turn but their paddy is not ready for sale, or that a farmer is ready to sell their product but has not got their turn. If we are able to prioritize the farmers based on their time of production, we will be able to reduce the waiting time and increase the supply. Several parameters affect the time of production; for example, if a farmer sows the seeds earlier, the paddy will be produced earlier. Similarly, the type of seeds, whether the land is irrigated or not, the type of nursery, the method of production, and the technology used affect the production time of the paddy. If we can decide how much impact each of these factors has on the time of production, then we will be able to prioritize these factors and, with them, the farmers as well. We have used AHP to rank the farmers based on their production time, considering the different factors affecting it. We have categorized the factors into two types: internal factors and external factors. Among the internal factors, we have considered criteria such as the type of seeds, whether the land is irrigated or not, the type of nursery, the method of production, and the technology used. Among the external factors, we have considered criteria such as soil parameters, pest and disease control, natural disaster, environmental effect, physical damage, and the availability of required resources. Among all the factors, we have prioritized the criteria that have more impact on the time of production by making pair-wise comparisons between them. Each criterion also has some sub-criteria, which have different impacts on the production time. We have represented the internal and external factors, the criteria, and the sub-criteria in hierarchical form, as shown in Tables 1 and 2. Using the AHP framework, we prioritize the criteria and the sub-criteria and then find an overall ranking of the farmers. For each of the criteria, we consider the sub-criteria for a better estimation of production time. The internal factor of seed variety directly affects the time of production: some seed varieties take 105–110 days, 124 days, or 135–140 days for harvesting, which correspondingly affects the time of production. Similarly, the Nitrogen, Phosphorus, and Potassium (NPK) content of the soil affects the production time; based on the amount
Table 1 Internal criteria

Internal criteria | Abbreviations
Seed variety | SED
Irrigation system | IRR
Type of nursery | NUR
Financial economy | FE
Method of production | MET
Technology used | TEC

Table 2 External criteria

External criteria | Abbreviations
Soil parameter (NPK content) | SP
Pest and disease control | PDC
Environmental effect | EE
Natural disaster | ND
Repair physical damage | R-PHD
Resources status | RS
of NPK content, we divided the soil into high, medium, and low NPK-containing soil and considered the sub-criteria accordingly. The sub-criteria concerning the different internal criteria are given in Table 3, and the sub-criteria concerning the different external criteria are given in Table 4. We find the priority between the internal criteria using the MCDM method; Tables 5, 6, and 7 represent the steps of the method.

Table 3 Sub-criteria of each internal criterion

Seed variety | Irrigation system | Type of nursery | Financial economy | Methodology used | Technology used
105–110 days | Irrigated | Wet bed | Below poverty line (BPL) | Drilling | Sickles
124 days | Non-irrigated | Dry bed | Above poverty line (APL) | Transplantation | Harvesting by knife
135–140 days | — | Depog | — | Japanese | Threshing machine-driven
— | — | — | — | — | Threshing manually
Table 4 Sub-criteria of each external criterion

SP (NPK content) | PDC | EE (rainfall) | Natural disaster | Repair-PSD | Resource status
High | High | Heavy rain | Drought | Machine-related damage | Unavailability of resources
Medium | Medium | Moderate | Cyclone | Human-related damage | Poor-quality resources
Low | Low | Low | Flood | — | Good-quality resources
First, we make the pair-wise comparison matrix between the criteria and fill in the semantic values with their numerical equivalents 1, 3, 5, 7, and 9, according to whether a criterion has an extremely good, highly good, moderately good, better, or good impact on the time of production compared to the other criterion, with the reciprocal values in the transposed positions. Then we find the normalized matrix, the priorities of the criteria, and the consistency ratio, following the MCDM procedure given in Saaty [8]. If the consistency ratio is < 0.1, the judgment is consistent; otherwise, it is inconsistent, and we have to make it consistent by changing the values of the pair-wise comparison matrix and repeating the process. Here, we got a Consistency Index of 0.10 and a Consistency Ratio of 0.08 (a sketch implementing these steps is given after Table 7). Similarly, we calculated the priority of each external criterion using the MCDM method; the steps are represented in Tables 8, 9, and 10, and here too we got a Consistency Index of 0.10 and a Consistency Ratio of 0.08.

Table 5 Pair-wise comparison matrix of internal criteria

      | SED   | IRR   | NUR   | FE    | MET | TEC
SED   | 1     | 3     | 3     | 5     | 5   | 7
IRR   | 0.333 | 1     | 3     | 3     | 5   | 5
NUR   | 0.333 | 0.333 | 1     | 3     | 5   | 5
FE    | 0.2   | 0.333 | 0.333 | 1     | 3   | 5
MET   | 0.2   | 0.2   | 0.2   | 0.333 | 1   | 3
TEC   | 0.143 | 0.2   | 0.2   | 0.2   | 0.3 | 1
Table 6 Normalized matrix of internal criteria

      | SED   | IRR   | NUR   | FE    | MET   | TEC
SED   | 0.452 | 0.592 | 0.387 | 0.398 | 0.259 | 0.269
IRR   | 0.150 | 0.197 | 0.387 | 0.239 | 0.259 | 0.192
NUR   | 0.150 | 0.065 | 0.129 | 0.239 | 0.259 | 0.192
FE    | 0.090 | 0.065 | 0.043 | 0.079 | 0.155 | 0.192
MET   | 0.090 | 0.039 | 0.025 | 0.026 | 0.051 | 0.115
TEC   | 0.064 | 0.039 | 0.025 | 0.015 | 0.015 | 0.038
Table 7 Priority of internal criteria

         | SED   | IRR   | NUR   | FE    | MET   | TEC
Priority | 0.393 | 0.237 | 0.172 | 0.104 | 0.058 | 0.033
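As a check on Tables 5, 6, and 7, the following minimal sketch implements the steps just described: column-normalize the pair-wise comparison matrix, average the rows to obtain the priority vector, and estimate the consistency index and ratio (Saaty's random index for n = 6 is 1.24). It reproduces the reported priorities and the CI ≈ 0.10 and CR ≈ 0.08.

```python
import numpy as np

# Pair-wise comparison matrix of internal criteria (Table 5),
# in the order SED, IRR, NUR, FE, MET, TEC.
A = np.array([
    [1.0,   3.0,   3.0,   5.0,   5.0, 7.0],
    [0.333, 1.0,   3.0,   3.0,   5.0, 5.0],
    [0.333, 0.333, 1.0,   3.0,   5.0, 5.0],
    [0.2,   0.333, 0.333, 1.0,   3.0, 5.0],
    [0.2,   0.2,   0.2,   0.333, 1.0, 3.0],
    [0.143, 0.2,   0.2,   0.2,   0.3, 1.0],
])

norm = A / A.sum(axis=0)   # column-normalized matrix (Table 6)
w = norm.mean(axis=1)      # row averages: the priority vector (Table 7)

n = A.shape[0]
lam = (A @ w / w).mean()   # estimate of the principal eigenvalue
CI = (lam - n) / (n - 1)   # consistency index, ~0.10 here
CR = CI / 1.24             # Saaty's random index for n = 6; CR ~0.08
print(w.round(3), round(CI, 2), round(CR, 2))
```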
Table 8 Pair-wise comparison matrix of external criteria

      | SP    | PDC   | EE    | ND    | R-PHD | RS
SP    | 1     | 3     | 3     | 5     | 7     | 7
PDC   | 0.333 | 1     | 3     | 5     | 5     | 7
EE    | 0.333 | 0.333 | 1     | 3     | 5     | 7
ND    | 0.2   | 0.2   | 0.333 | 1     | 3     | 5
R-PHD | 0.142 | 0.2   | 0.333 | 1     | 3     | 5
RS    | 0.142 | 0.142 | 0.142 | 0.2   | 0.333 | 1
Table 9 Normalized matrix of external criteria

      | SP    | PDC   | EE    | ND    | R-PHD | RS
SP    | 0.465 | 0.615 | 0.398 | 0.344 | 0.328 | 0.233
PDC   | 0.154 | 0.205 | 0.390 | 0.344 | 0.234 | 0.233
EE    | 0.154 | 0.068 | 0.130 | 0.206 | 0.234 | 0.233
ND    | 0.093 | 0.041 | 0.043 | 0.068 | 0.140 | 0.166
R-PHD | 0.066 | 0.041 | 0.026 | 0.022 | 0.046 | 0.100
RS    | 0.066 | 0.029 | 0.018 | 0.013 | 0.015 | 0.033
Table 10 Priority of external criteria

         | SP    | PDC   | EE    | ND    | R-PHD | RS
Priority | 0.396 | 0.260 | 0.171 | 0.092 | 0.050 | 0.029
We make pair-wise comparisons between the sub-criteria of each internal criterion and prioritize the sub-criteria using the MCDM technique; the result is given in Table 11. Similarly, we apply the same method to prioritize the sub-criteria of the external parameters; that ranking is given in Table 12. Finally, we compared the impact of the internal parameters with that of the external parameters on the time of production and found that the impact of the internal parameters is 55% and that of the external parameters is 45%. We collected the data for all these criteria and sub-criteria from 3 different farmers. We use the priorities of each criterion and sub-criterion calculated by the MCDM method to find the priority of each farmer.
Table 11 Priority of all sub-criteria corresponding to each internal criterion

SED: 105–110 days 0.4 | 124 days 0.35 | 135–140 days 0.2
IRR: Irrigated 0.65 | Non-irrigated 0.35
NUR: Wet bed 0.5 | Dry bed 0.3 | Depog 0.2
FE: BPL 0.3 | APL 0.7
MET: Drilling 0.25 | Transplantation 0.35 | Japanese 0.4
TEC: Sickles 0.25 | Harvesting by knife 0.25 | Threshing machine-driven 0.35 | Threshing manually 0.15
Table 12 Priority of all sub-criteria corresponding to each external criterion

SP (NPK content): High 0.423 | Medium 0.313 | Low 0.267
PDC: High 0.412 | Medium 0.352 | Low 0.224
EE (rainfall): Heavy rain 0.251 | Moderate 0.480 | Low 0.321
ND: Drought 0.353 | Cyclone 0.457 | Flood 0.278
Repair-PSD: Machine-related damage 0.541 | Human-related damage 0.452
Resource status: Unavailability of resources 0.149 | Poor-quality resources 0.247 | Good-quality resources 0.601
The overall priority of a farmer is calculated as

$$\text{Overall priority} = 0.55 \sum_{k=1}^{6} \big(\text{priority of internal criterion } k \times \text{priority of the sub-criterion chosen by the farmer}\big) \;+\; 0.45 \sum_{k=1}^{6} \big(\text{priority of external criterion } k \times \text{priority of the sub-criterion chosen by the farmer}\big)$$
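A minimal sketch of this weighted sum, using Farmer 1's numbers from the worked example further below (criterion weights from Tables 7 and 10, chosen sub-criterion priorities as in the paper's calculation):

```python
# Farmer 1: criterion weights (Tables 7 and 10) and the priorities of the
# sub-criteria the farmer chose (Tables 11 and 12), in SED..TEC / SP..RS order.
w_int = [0.393, 0.237, 0.172, 0.104, 0.058, 0.033]
s_int = [0.4, 0.35, 0.5, 0.3, 0.35, 0.15]
w_ext = [0.396, 0.260, 0.171, 0.092, 0.050, 0.029]
s_ext = [0.313, 0.224, 0.480, 0.353, 0.452, 0.247]

def overall_priority(w_int, s_int, w_ext, s_ext):
    internal = sum(w * s for w, s in zip(w_int, s_int))
    external = sum(w * s for w, s in zip(w_ext, s_ext))
    return 0.55 * internal + 0.45 * external

# ~0.357; Table 16 lists 0.356 after the paper's intermediate rounding
print(round(overall_priority(w_int, s_int, w_ext, s_ext), 3))
```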
By considering all the criteria and sub-criteria, we found the overall ranking of the farmers based on the time of production of their paddy (Tables 13, 14, 15).

Table 13 Farmer 1's chosen sub-criteria corresponding to each internal and external criterion

Internal criteria | Sub-criteria | External criteria | Sub-criteria
SED | 135–140 days | SP | Medium
IRR | Non-irrigated | PDC | Low
NUR | Wet bed | EE | Moderate
FE | BPL | ND | Drought
MET | Transplantation | R-PHD | Human-related damage
TEC | Threshing manually | RS | Poor-quality resources
Table 14 Farmer 2's chosen sub-criteria corresponding to each internal and external criterion

Internal criteria | Sub-criteria | External criteria | Sub-criteria
SED | 105 days | SP | Low
IRR | Irrigated | PDC | Medium
NUR | Dry bed | EE | High
FE | APL | ND | Flood
MET | Transplantation | R-PHD | Human-related damage
TEC | Harvesting by sickles | RS | Good-quality resources
Table 15 Farmer 3's chosen sub-criteria corresponding to each internal and external criterion

Internal criteria | Sub-criteria | External criteria | Sub-criteria
SED | 124 days | SP | Medium
IRR | Non-irrigated | PDC | High
NUR | Dry bed | EE | Moderate
FE | BPL | ND | Cyclone
MET | Japanese | R-PHD | Human-related damage
TEC | Threshing machine-driven | RS | Poor-quality resources
Table 16 Ranking of farmers

Farmer | Priority | Rank
Farmer 1 | 0.356 | 2
Farmer 2 | 0.392 | 1
Farmer 3 | 0.350 | 3
Overall priority of Farmer 1 = 0.55 × (0.393 × 0.4 + 0.237 × 0.35 + 0.172 × 0.5 + 0.104 × 0.3 + 0.058 × 0.35 + 0.033 × 0.15) + 0.45 × (0.396 × 0.313 + 0.260 × 0.224 + 0.171 × 0.480 + 0.092 × 0.353 + 0.050 × 0.452 + 0.029 × 0.247) = 0.210 + 0.1469 = 0.356.

Overall priority of Farmer 2 = 0.55 × (0.393 × 0.4 + 0.237 × 0.65 + 0.172 × 0.3 + 0.104 × 0.7 + 0.058 × 0.35 + 0.033 × 0.25) + 0.45 × (0.396 × 0.267 + 0.260 × 0.352 + 0.171 × 0.251 + 0.092 × 0.278 + 0.050 × 0.452 + 0.029 × 0.601) = 0.255 + 0.137 = 0.392.

Overall priority of Farmer 3 = 0.55 × (0.393 × 0.35 + 0.237 × 0.35 + 0.172 × 0.3 + 0.104 × 0.3 + 0.058 × 0.4 + 0.033 × 0.35) + 0.45 × (0.396 × 0.313 + 0.260 × 0.412 + 0.171 × 0.480 + 0.092 × 0.278 + 0.050 × 0.452 + 0.029 × 0.247) = 0.185 + 0.1658 = 0.350.
5 Conclusion

In this paper, we identified the different internal and external criteria affecting the time of production of paddy and prioritized them using the MCDM method. We then ranked the farmers based on their tentative time of production, as given in Table 16. The farmers will get their turn for selling their product in the mandi as per this ranking, which will in turn reduce wastage and contribute toward their welfare.
References

1. Somashekhar C, Raju JK (2014) Agriculture supply chain management: a scenario in India. Res J Soc Sci Manag RJSSM 04(07):89–99
2. Parwez S (2014) Food supply chain management in Indian agriculture: issues, opportunities and further research. Acad J 8(14):572–581
3. King RP, Boehlje M, Cook M, Sonka ST (2010) Agribusiness economics and management. In: American journal of agricultural economics, special issue commemorating the centennial of the AAEA, vol 92, no 2
4. Brintrup A, Ranasinghe D, McFarlane D (2010) RFID opportunity analysis for leaner manufacturing. Int J Prod Res 48(9):2745–2764
5. Kumar MV, Iyengar NChSN (2017) A framework for blockchain technology in rice supply chain management. Adv Sci Technol Lett 146(FGCN 2017):125–130
6. Wang C, Nguyen VT, Duong DH, Do HT (2018) A hybrid fuzzy analytic network process (FANP) and data envelopment analysis (DEA) approach for supplier evaluation and selection in the rice supply chain. Symmetry 10:22
7. Singh S, Mohanty S, Pattnaik PK (2022) Agriculture fertilizer recommendation system. In: International conference on smart computing and cyber security: strategic foresight, security challenges and innovation. Springer, Singapore, pp 156–172
8. Saaty TL (1988) What is the analytic hierarchy process? Mathematical models for decision support. Springer, Berlin, Heidelberg, pp 109–121
A Real-Time Bangla Local Language Recognition from Voice

Mohammad Junayed Khan Noor, Fatema Tuj Johora, Md. Mahin, and Muhammad Aminur Rahaman
Abstract Voice is the most natural form of communication and interaction between people. Bangladesh has masses of people who cannot read or write but can speak, and typing in Bangla can be difficult, especially for those who cannot type fast. Therefore, there is increasing interest in speech-to-text conversion for speech-oriented human-machine interaction. Google developed its Speech Application Program Interface for speech-related work in mobile and computer applications; the API can recognize up to 120 languages and variants (including Bangla). The purpose of this research is to investigate Speech-to-Text (STT) conversion using the Google Speech-to-Text API for Promito (standard) Bangla words, as well as for local Bangla words, and to develop a system that detects a local Bangla word and generates the corresponding Promito Bangla word. For our research, we collected 105 Promito Bangla words, the 105 local words of Noakhali against them, and the 105 local words of Chittagong against them, for a total of 315 words. With our proposed system, we first tested all the Promito Bangla words, then all the trained local words of Noakhali, and then all the trained local words of Chittagong. Experimenting with the Promito Bangla words in Model A, the accuracy rate was 95.23%. Our analysis shows that increasing the amount of training data increases accuracy.

Keywords Speech Recognition · Testing Speech · Google Speech-to-text API · Real-Time Conversion · UNICODE
M. J. K. Noor · F. T. Johora (B) · M. A. Rahaman
Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh
e-mail: [email protected]

Md. Mahin
Department of Computer Science, University of Houston, Houston, TX, USA
1 Introduction

Speech is the most natural way for humans to communicate and interact: it is a versatile, strong, and well-known mode of communication. On the other hand, text and symbols are the most common form of interaction with computer systems. As a result, interest in speech-to-text conversion for speech-oriented human-computer interaction grows by the day. Of the two directions of conversion, Text-to-Speech (TTS) translation (known as speech synthesis) is easier than Speech-to-Text (STT) conversion; the motion, manner, and pronunciation of words are aspects of voice biometrics and make it difficult to recognize speech in order to convert it to text [5]. The numerous differences among spoken languages demand individual systematic and scientific effort for each language. Among the more than six thousand distinct languages of the world [6], most speech-related studies concern a few languages, of which English is the main one. In the past, contributions in this field were achieved using Microsoft SAPI, but it was very limited, being slow at recognizing continuous speech without proper word gaps [10]. We hope to achieve faster performance using the Google Speech-to-Text API. The Bengali language has fallen behind in this regard because it is a very tricky language, with many challenges to overcome to do it correctly [3]. Speech-to-text conversion is the process of dissecting the discrete syllables or phonemes of recorded vocal audio and converting them to their literal transliteration [9]. One of the main challenges previous works suffered in converting voice into text was developing the phonetic dictionary [11] used to match portions of the audio to their respective phonemes [2], which are merged into syllables to finally form the desired word. Google developed its Speech Application Program Interface for speech-related work in mobile and computer applications; the Google Speech-to-Text API is powered by machine learning to accurately predict and process language, vocabulary, and text. Therefore, speech-to-text conversion for other languages using the Google Speech-to-Text API is a highly dependable area of study. Bengali is one of the richest languages in the world: about 250 million natives and about 300 million people in total worldwide speak Bangla [8]. However, typing in Bengali can be difficult, especially for those who cannot type fast [7]. Bangla is an important language with a long history; UNESCO has designated February 21st as International Mother Language Day to honor the language martyrs who died for the language in Bangladesh in 1952. It is also spoken by Bangla-speaking people in Malawi, Nepal, Saudi Arabia, Singapore, Australia, the UAE, the UK, and the USA. About one-sixth of the world's population speaks Bangla [4]. The aim of this study is to investigate Speech-to-Text (STT) conversion using the Google Speech-to-Text API for Promito Bangla words and for local Bangla words, and to develop a system that detects a local Bangla word and generates the corresponding Promito Bangla word. In this research, we achieved great recognition success for Promito Bangla words: the word error rate for Promito Bangla words was very low. But the Google Speech-to-Text API could not recognize and detect local Bangla words. There are regional variations in
the Bengali language, called dialects. The style of pronunciation and accent differ from area to area, and some words are even pronounced differently by different speakers. So it is a big challenge for us to develop a system that detects a local Bangla word and generates the corresponding Promito Bangla word. The rest of the paper is organized as follows: Sect. 2 discusses some literature reviews, Sect. 5 discusses our proposed methodology, and Sect. 6 presents the result analysis, followed by the conclusion.
2 Literature Review

Much work has been done on recognition of formal Bangla words, but no significant work has been done on speech-to-text recognition of local Bangla words. There are many discrepancies between the regional words of one district and those of another; as a result, people in one district have difficulty understanding the words of another district, and the regional words of a district are unfamiliar to the people of another district, while all Bengalis are familiar with the formal Bengali words. The Speech Application Programming Interface, or SAPI, is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications; applications that use SAPI include Microsoft Office, Microsoft Agent, and Microsoft Speech Server. Sultana et al. developed a system that takes Bangla voice as input; the aim of that study was to investigate Speech-to-Text (STT) conversion for the Bangla language using SAPI [10]. In general, all versions of the API have been designed so that a software developer can write an application to perform speech recognition and synthesis using a standard set of interfaces, accessible from a variety of programming languages. In the training part, the researchers took voice input and made trained samples for different speakers. In their methodology, they took voice input, used SAPI in their system for voice-to-text conversion, and at the end the system generated text outputs. The result was a 78% match between the trained samples and the test samples, on a test set of almost 270 Bangla words; Bangla is a very rich language, and 270 words is very limited, which is the biggest drawback of that research. Humayun Kabir et al. presented a model to convert natural Bengali language to text that requires the open-source framework Sphinx 4, which is written in Java and provides the procedural coding tools required to develop an acoustic model for a custom language like Bengali [7]. Their main objective was to ensure that the system was adequately trained on a word-by-word basis from various speakers so that it could recognize new speakers fluently. They used a free digital audio workstation (DAW) called Audacity to manipulate the collected recording data via continuous frequency profiling techniques: reducing the signal-to-noise ratio (SNR), vocal leveling, normalization, and syllable splitting and merging, which ensures an error-free 1:1 word mapping of each utterance with its mirror transcription file text.
492
M. J. K. Noor et al.
Fig. 1 Structure of the system for evaluating three API performances [1]
In that work, the researchers designed a system to convert the Bangla language using an API named CMU Sphinx 4 for speech-to-text conversion, developed in Java by a group of researchers at Carnegie Mellon University. In their methodology, they took voice input, used the CMU Sphinx 4 engine for voice-to-text conversion, and at the end the system generated text outputs. Their result was a 71.7% match between the trained samples and the test samples; they did not ensure the implementation of their algorithm in real-time applications. In [1], the researchers performed performance tests of the Microsoft API, the Google API, and CMU Sphinx for speech recognition. All three are commercial speech recognition APIs and currently among the most powerful and useful APIs for speech recognition. In their proposed methodology, they designed a system integrating the Microsoft API, the Google API, and CMU Sphinx, trained it, inputted voice as audio files, and recognized each inputted sentence with all three APIs. In the end, the researchers calculated the word error rate for each API separately. It is a great work through which we can see the comparison between the three APIs and their performance. Figure 1 shows the structure of the designed system for evaluating the three APIs' performance in one system.
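The word error rate used in that comparison is conventionally the word-level Levenshtein (edit) distance between the reference transcript and the recognizer's hypothesis, divided by the reference length; a minimal sketch (the example strings are illustrative):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

print(word_error_rate("ami bhat khai", "ami vat khai"))  # one substitution: 1/3
```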
3 Problem Domain

Bengali is one of the richest languages in the world, and about 250 million people all over the world speak the language; about one-sixth of the world's population speaks Bengali, and it is ranked 7th based on the number of speakers. However, typing in Bengali can be difficult, especially for those who cannot type fast. Being able to convert the spoken word to text is the easiest form of typing in any language, and we want Bengali to be a part of that digital world. There are regional variations in the Bengali language called dialects: the style of pronunciation and accent differ from area to area, and some words are even pronounced differently by different speakers. Consider the following scenario of the problem domain: ami is a Promito Bangla word, and there exist many local words against ami, all with the same meaning. It is very difficult for a person to know all the local words and regional
Fig. 2 Problem domain of our proposed system to detect local Bangla words
variants. There are a couple of APIs with which we can do speech-to-text conversion of the Bangla language: the Microsoft API, CMU Sphinx 4 (written in Java), and the Google Cloud API. All of these APIs are able to do speech-to-text conversion of formal Bangla; we are working with the Google Cloud API, and we got sufficient accuracy for formal Bangla. Figure 2 shows the problem domain of our proposed system: the API can detect Promito Bangla words but cannot detect local Bangla words, and that is the problem we want to mitigate; it is the biggest challenge of our work. Our first objective is to develop a system that recognizes Bangla speech-to-text from voice in real time. Our second objective is to develop an enriched training set for identifying local Bangla words and giving the output as a Promito Bangla word. Our final objective is to develop a robust algorithm for identifying local Bangla words and giving the output as a Promito Bangla word.
4 Proposed Architecture

The system is designed for Bangla local language recognition from voice in real time. We divided our work into four phases. In the first phase, the system takes voice input in real time. In the second phase, the system converts the inputted voice into text; Fig. 3 shows the architecture we proposed, designed fully on the basis of an Automatic Speech Recognition (ASR) system, and in this stage we take help from the Google Cloud API. The third phase is the training phase, in which we store all the data obtained by recognizing local Bangla words; this is a very important part of getting better accuracy. The fourth and last phase generates the output: we use a search algorithm for matching the inputted local word, and after matching, the system generates the correspondent Promito Bangla word against that specific local Bangla word.
Fig. 3 An ASR-based proposed architecture of Bangla local language recognition
5 Methodology

Figure 4 shows the workflow diagram of our proposed system for Bangla local word recognition and detection. First of all, the system takes voice input in real time by setting a listener; then it calls cloud functions, which are needed for calling the Google Cloud API. When a cloud function calls the Google Cloud API, the API in turn calls some machine learning APIs for better recognition and returns the recognized text, which we used for creating the training set. Finally, we use a search algorithm for matching the inputted local word; after matching, the system generates the correspondent Promito Bangla word against that specific local Bangla word. A minimal sketch of the listen-and-recognize step is given below.
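The sketch below uses the open-source SpeechRecognition Python package as a stand-in front-end to Google's recognizer; the package and the "bn-BD" language code are assumptions for illustration, not necessarily the exact implementation used here.

```python
import speech_recognition as sr  # pip install SpeechRecognition pyaudio

recognizer = sr.Recognizer()

def listen_and_recognize():
    """Capture one utterance from the microphone and send it to Google STT."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
        audio = recognizer.listen(source)            # real-time voice input
    try:
        # "bn-BD" selects Bangla as spoken in Bangladesh (assumed setting)
        return recognizer.recognize_google(audio, language="bn-BD")
    except sr.UnknownValueError:
        return None  # the service could not recognize the utterance

if __name__ == "__main__":
    print(listen_and_recognize())
```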
Fig. 4 Workflow diagram of the proposed system for Bangla local word recognition and detection
Fig. 5 Process of system training and storing data
A. Proposed system training: The training process is very important for any training-based system. The flow chart in Fig. 5 shows how we trained our system for Bangla local words. Firstly, we inputted Bangla local words as voice and performed recognition with the Google Speech-to-Text API. After completing recognition, we stored every datum obtained as a separate item in a Python dictionary. Python dictionaries are based on hash-table technology, which is why lookups are very fast (see the sketch under C. Algorithm below). Figure 6 shows all the words for which we have trained the system: the ID numbers of the words are in the first column, the standard Bengali words in the second, the rural words of Chittagong in the third, and the rural words of Noakhali in the last column. We had to spend a lot of time collecting the words, because collecting them was essential to starting the system training; we have also set some words aside so that they can be used in later research. Next, we collected these words as spoken by several speakers: we collected the words in the voices of 13 different speakers, and with those voices we trained our system repeatedly to get better accuracy. Bangla is a vast and very rich language; here we listed 315 Bangla words and trained on them, but many more words exist in the Bangla language. We gathered audio recordings of different speakers for testing and training our system, because the accuracy of a speech recognition system depends on how much it is trained: a huge amount of training data is needed for better accuracy.

B. Data collection and analysis: To build a general speech recognizer, lots of data would have been required, which was not easy to collect in this short period of time. For our research, we collected 105 Promito Bangla words, the 105 local words of Noakhali against them, and the 105 local words of Chittagong against them; the sum of our collected words was 315. The first method of data collection that we attempted was to gather the Promito Bangla words and the local words of Noakhali and Chittagong
Fig. 6 Different speakers
against the Promito Bangla words; Fig. 7 shows a sample of this collection. The second method of data collection that we attempted was to gather audio recordings of different speakers for training and testing our system on the list of Bangla words we collected.

C. Algorithm: The algorithm is designed around dictionary iteration. It is our approach for mapping and matching local Bangla words and then returning the correspondent Promito Bangla word against each local Bangla word; a minimal sketch follows.
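The sketch below illustrates the dictionary-based mapping and matching just described; the entries are illustrative placeholders (hypothetical dialect transliterations), not the actual trained data.

```python
# Trained mapping: each recognized local variant keys its Promito Bangla word.
# The entries below are illustrative placeholders, not the stored training data.
local_to_promito = {
    "mui": "ami",  # hypothetical Noakhali variant of "ami"
    "aai": "ami",  # hypothetical Chittagong variant of "ami"
}

def to_promito(recognized_word):
    """Direct hash-table lookup: O(1) on average for a Python dict."""
    return local_to_promito.get(recognized_word)

def match_by_iteration(recognized_word):
    """Dictionary-iteration fallback: scan stored items for a close match."""
    for local_word, promito_word in local_to_promito.items():
        if recognized_word == local_word or recognized_word in local_word:
            return promito_word
    return None  # no trained local word matched

print(to_promito("mui"), match_by_iteration("aai"))  # ami ami
```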
Fig. 7 Sample of data collection
6 Performance Evaluation

We tried to develop the system in such a way that its error gradually decreases and its accuracy increases. For that purpose, we focused on training the system with different speakers and with recordings made in different environments. While testing for accuracy, we also provided recordings from unknown speakers to see how accurate the system can be for speakers it has never heard. The number of speakers, the accuracy, and other experimental details are reported in the following table and bar chart.
Table 1 Result data of the proposed system

Model      Number of words    Words recognized    Recognition rate (%)
Model A    105                100                 95.23
Model B    105                90                  85.71
Model C    105                75                  71.42
At first, we started with a few simple word recordings in order to set up the implementation properly. We first tested all Promito Bangla words, then all trained local words of Noakhali, and finally all trained local words of Chittagong. Although we faced difficulties, including several failed attempts to set up the environment on the computer, we eventually developed the system successfully. Because accuracy was initially very poor, our team then recorded more words; after a few attempts we obtained the expected output and proceeded to train the words into the system, recording 105 words per speaker. With further training we achieved much better accuracy than before. Finally, to obtain the best possible result, we increased both the number of speakers and the number of Bengali words used to train the system, and we also had speakers record some specific Bengali words. After all of these experiments, the accuracy of the system was clearly better than in the first experiment. We also found that if enough words are added in the training phase, the system can reach a surprisingly high accuracy rate. The following experimental results were obtained with our proposed system. Table 1 shows the recognition rates of the proposed system.

• Model A: dataset of 105 Promito Bangla words.
• Model B: dataset of 105 local words of Noakhali.
• Model C: dataset of 105 local words of Chittagong.

The recognition rate is calculated from Eq. 1:

Accuracy = (Number of Recognized Words / Number of Inputted Words) × 100   (1)

Figure 8 shows the accuracy chart of our proposed system, calculated by Eq. 1. As the chart shows, experimenting with the Promito Bangla words in Model A gave a very good accuracy rate of 95.23%. Testing the local words of Noakhali in the Model B dataset also gave good accuracy, 85.71%. With our third dataset, Model C, containing local words of Chittagong, we obtained an accuracy of 71.42%. Our analysis indicates that local words that are complicated to pronounce decrease accuracy because they are harder to recognize, and the local words of Chittagong are notably difficult to pronounce. The analysis also shows that increasing the amount of training data increases accuracy.
Fig. 8 Performance analysis of our proposed system
People all over the world pronounce the same word differently, so it stands to reason that if we could train our system with every spoken variation of each word, it would work much more accurately.
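As a minimal illustration of Eq. 1, the function below computes the recognition rate; the call reproduces the Model A row of Table 1.

# Recognition rate from Eq. 1: recognized / inputted * 100.
def recognition_rate(recognized_words, inputted_words):
    if inputted_words <= 0:
        raise ValueError("inputted_words must be positive")
    return recognized_words / inputted_words * 100

print(recognition_rate(100, 105))  # Model A: 95.238... (reported as 95.23 in Table 1)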
7 Conclusion

To conclude, this paper presents a model of a voice-to-text conversion method for the Bengali language. As already mentioned, the system would be very helpful for people from different regions of Bangladesh in understanding the local words of other districts. It would also be very helpful for people who are technically challenged or cannot type, it can be a game-changing tool for illiterate people, and it aims to assist deaf people by supporting the academic sector. The model is therefore not only a mechanism but also an effective and helpful tool for various purposes. This speaker-independent recognition system for continuous speech is built on the machine learning-based Google Speech-to-Text API and handles both male and female voices, whether pre-recorded or captured in real time through a microphone. With the help of a word-matching algorithm, the system matches each word and generates the corresponding formal Bangla word for the inputted local Bangla word. A lot of improvement is still required to make the system more accurate. In our experiments, 5 speakers recorded each word 3 times and 3 speakers recorded each word 2 times, and we obtained an accuracy of around 95.23% for Promito Bangla words, 85.71% for detecting local words of Noakhali, and 71.42% for detecting local words of Chittagong. Especially for Bangla local words, a lot of improvement is still required.
The recognition results produced by our system proved satisfactory. This implementation of a speech recognition system was built on a small, domain-specific dataset and trained with only 5 speakers, whereas a general speech recognition system needs at least 50 speakers. Its performance depends on the speaker, the environment, the microphone, the distance between speaker and microphone, and stress. It recognizes Bangla words better when they are spoken without noise, and it misrecognizes some particular words when they are spoken slowly or broken up. The proposed model has some limitations. The first and foremost is the accuracy of the system. While training and testing, we trained the system with 315 words in the first, second, and third attempts; at that point the system accuracy was around 95.23% for Promito Bangla words, 85.71% for detecting local words of Noakhali, and 71.42% for detecting local words of Chittagong. The accuracy could be raised above what we achieved by adding more speakers and more words; from the first attempt to the final one, we noticed that increasing the number of speakers raises accuracy. Moreover, we worked with the Google Speech-to-Text API in a Python environment, in which Bengali speech-to-text conversion has not yet been explored extensively. Because of that, we could not get any help from, or reuse, a previous dataset that might have reduced our system training time; we created the whole dataset from scratch with unique speakers and words. We also believe that if a previous dataset had been available, we could easily have increased the accuracy further.

A. Challenges of Bangla Local Word Recognition: Not enough research has been done on the recognition of Bengali local words, which is why we faced many obstacles in this work. Although many APIs and frameworks have been developed for speech recognition, and some of them are capable of recognizing the Bengali language, not a single one is fully capable of recognizing Bengali local words. Bangla is one of the richest languages in the world and has many pronunciation variants, which makes recognition more difficult. Some major challenges of Bangla local word recognition are:

• Different tones of voice can make recognition difficult.
• The same word is pronounced differently by different speakers.
• The Google Cloud API is continually being developed and retrained, but it is not yet sufficiently trained on Bangla local words.

Acknowledgements This work was supported in part by the Center for Research, Innovation, and Transformation of Green University of Bangladesh.
A Reviewer Recommender System for Scientific Articles Using a New Similarity Threshold Discovery Technique Saiful Azad, M. Ariful Hoque, Nahim Ahmed Rimon, M. Mahabub Sazid Habib, Mufti Mahmud, M. Shamim Kaiser, and M. Rezaul Karim
Abstract Among the tons of articles published every year, a considerable number of substandard articles also appear. One of the primary reasons for publishing these substandard articles is the application of ineffective and/or inefficient reviewer selection processes. To overcome this problem, several reviewer recommender systems have been proposed that do not depend on the intelligence of the human selector. However, most of these existing systems do not take the reviewer feedback score, or confidence score, into consideration during the recommendation process. Therefore, a new reviewer recommender system is proposed in this paper that recommends a set of reviewers for a set of manuscripts with the objective of attaining a high average confidence score while satisfying several constraints, including a fixed number of reviewers for a manuscript and a fixed number of manuscripts for a reviewer. The proposed system employs a new similarity threshold discovery technique to facilitate the reviewer recommendation process. Again, since hardly any dataset exists that satisfies the requirements of the proposed system, a new dataset was prepared by collecting data from various online sources. The proposed system is evaluated by incorporating several existing selection techniques.
S. Azad (B) · M. A. Hoque · N. A. Rimon · M. M. S. Habib Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh e-mail: [email protected] M. Mahmud Nottingham Trent University, Nottingham, UK e-mail: [email protected] M. S. Kaiser Jahangirnagar University, Savar Union, Bangladesh e-mail: [email protected] M. R. Karim Sonali Bank Limited, Dhaka, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. S. Kaiser et al. (eds.), Proceedings of the Fourth International Conference on Trends in Computational and Cognitive Engineering, Lecture Notes in Networks and Systems 618, https://doi.org/10.1007/978-981-19-9483-8_42
The experimental results demonstrate that despite employing various selection techniques, the proposed system can assign most of the articles to the prescribed number of reviewers. Keywords Automated system · Scientific articles recommendation · Keyphrase extraction · Similarity calculation · Threshold discovery
1 Introduction

Due to the overwhelming amount of information and the complexity of discovering appropriate knowledge from it, users need recommendations in order to respond efficiently to information, trade, and services [1]. A system that recommends or suggests a course of action to users based on many different factors is generally known as a recommender system, recommendation system, or recommendation engine [2]. Note that a recommender system is a subclass of information filtering system that recommends items after filtering the fundamental data out of a massive quantity of dynamically generated data by predicting the "rating" or "preference" that a consumer would give to an item [3]. In this era of digitization, recommender systems are employed in a wide range of applications, including the economy, education, scientific research, and others [3]. In this paper, the reviewer recommender system, a subclass of the scientific research recommender system, is considered for automatically selecting reviewers for research articles from a pool of reviewers. Generally, in a reviewer recommender system, a subset of reviewers is selected from a set or pool of reviewers based on their research interests to comment on the quality of submitted manuscripts, proposals, and the like. On many occasions, their recommendations of acceptance and/or scores are also acquired to facilitate the final selection process. It has been observed that an ineffective and/or inefficient reviewer selection process results in selecting substandard manuscripts and/or rejecting standard manuscripts, which is not at all desirable [4]. This phenomenon is observed more frequently in the case of the manual reviewer selection process, which depends substantially on the intelligence of the human selector; to date, many conferences and journals still exercise this process. To make the process efficient, a considerable number of semi-automated/automated reviewer recommender systems have been proposed in the literature, as discussed elaborately in Sect. 2. Among them, in [5–13], the reviewer recommendation problem is approached with retrieval-based methods, which mainly take the topic relationship between submissions and reviewer candidates into consideration. On the other hand, in [14, 15], it is approached with matching-based methods employing a bipartite graph between the submissions and the reviewer candidates. Again, this problem is also treated as an optimization problem in [16–18]. Furthermore, in [8, 19–21], it is approached with intelligent decision support methods.
Even though several similar systems have already been proposed in the literature, only a few of them utilize reviewer feedback data, including reviewer experience in the form of scores, in the selection process; this is taken into account in this paper. In addition, a new similarity threshold discovery technique is proposed to select a subset of reviewers from a set of candidate reviewers with the objective of attaining a considerably high average confidence score. In this process, this work makes several notable contributions, which can be summarized as follows:

– Identifying the relationship between a reviewer's experience score and his/her research interests.
– Proposing a new technique, called the similarity threshold discovery technique, to facilitate the reviewer selection process.
– Evaluating the proposed technique and reviewer recommender system against existing similar systems.

This paper is organized as follows. In Sect. 2, existing recommender systems similar to our proposed system are analyzed in brief detail. Section 3 presents the problem formulation of the work. The proposed similarity threshold discovery technique and the proposed system are discussed elaborately in Sects. 4 and 5. The experimental setup, including dataset construction, is elaborated in Sect. 6. Section 7 presents the experimental results and the associated discussion. Section 8 concludes this work.
2 Related Works

As argued in the earlier section, even though tons of articles are published every year, a considerable number of substandard articles are also published. Among various causes, the reviewer selection process is considered one of the primary ones. This phenomenon is more frequent when reviewers are selected manually, which depends substantially on the intelligence of the human selector; to date, many conferences and journals still exercise this process. To resolve this problem, many conferences and journals collect reviewers' interests and assign reviews accordingly. However, since they depend on the reviewers' own judgment, the efficiency of these systems relies on the accuracy of the data that the reviewers have provided. It has been observed that around 25% of the keyphrases that authors list for their articles do not actually occur in them [22], and the same could be the case here. Therefore, a number of reviewer recommender systems have been proposed in the literature. Among them, the reviewer recommendation problem is approached with retrieval-based methods in [5–13], where the topic relationships between submissions and reviewer candidates are mainly explored. Conversely, this problem is approached with matching-based methods employing a bipartite graph between
the submissions and the reviewer candidates in [14, 15]. Again, in [16–18], this problem is treated as an optimization problem, and in [8, 19–21], it is approached with intelligent decision support methods. Only a few of these existing systems utilize reviewer feedback scores or confidence scores during the recommendation process along with the other constraints considered in this paper. Hence, a new recommender system with a new similarity threshold discovery technique is proposed in this paper to resolve the reviewer recommendation problem.
3 Problem Formulation

This section includes the fundamental concepts that are essential to understand the recommendation problem, alongside the formulation of the problem and the solution to be identified. Assume a set of manuscripts, denoted as $M = \{m_1, m_2, m_3, \ldots, m_n\}$ where $0 < i \le n$, $i \in \mathbb{N}$, and $|M| = n$; and a set of reviewers $R = \{r_1, r_2, r_3, \ldots, r_m\}$ where $0 < j \le m$, $j \in \mathbb{N}$, and $|R| = m$. In the ideal case, any manuscript could be assigned to any number of reviewers and a reviewer could review any number of manuscripts, which is impractical. Generally, in peer-reviewed conferences or journals, two or more reviewers are assigned to review a manuscript and a reviewer reviews only a limited number of manuscripts. Hence, a subset of competent reviewers, denoted as $R'$ where $R' \subset R$, needs to be selected from the candidate reviewers $R$; and a reviewer can review only a fixed number of manuscripts, i.e., a subset of manuscripts, denoted as $M'$ where $M' \subset M$. These constraints are also considered while formulating the reviewer selection problem in this paper, which is much more challenging than selecting reviewer(s) for a single manuscript, as considered in many existing systems. In a nutshell, the proposed system must select $R'$ reviewers from $R$, where $\eta = |R'|$ is a fixed value, while keeping $|M'|$ within a prescribed threshold $\mu$ for every $r \in R'$.

Again, a subset of reviewers $R'$ can be selected in many different ways considering many different parameters, including keyphrases, citations, freshness of topics, and others. However, selecting $R'$ based on the keyphrase similarities between the manuscripts and the reviewers' research interests is a widely utilized technique, and it is hence also utilized in this paper. Assume the $p$ keyphrases of a manuscript $m_i$ are denoted as $\{K_1^{m_i}, K_2^{m_i}, K_3^{m_i}, \ldots, K_p^{m_i}\}$ and the $p$ keyphrases of a reviewer $r_i$ are denoted as $\{K_1^{r_i}, K_2^{r_i}, K_3^{r_i}, \ldots, K_p^{r_i}\}$; the latter are taken as the research interests of $r_i$ in this paper since they are extracted from the reviewer's published articles using machine learning-based keyphrase extraction techniques. Similarly, the keyphrases of a manuscript signify its meaning or main ideas and are extracted following the same process as for reviewers. Hence, the recommendation problem tackled in this paper can be defined as follows:

Definition 1 Given a set of manuscripts $M$ and their respective keyphrases, and given a set of reviewers $R$ and their respective keyphrases, the proposed reviewer
recommender system must find a subset of competent reviewers $R'$ for every $m_i$, ensuring $|R'| \le \eta$, in such a way that a considerably high average confidence score is attained. Here, a reviewer confidence score can be defined as follows:

Definition 2 A reviewer confidence score $\zeta$ expresses the degree to which a reviewer feels familiar with the domain of the reviewed manuscript, or feels confident about the judgment s/he is providing about a manuscript. Generally, this is given as a rating within a prescribed range.

It is an established fact that when a reviewer's $\zeta$ for a manuscript is higher, his/her judgment and comments will be considerably more accurate. Again, a confidence score is commonly influenced by the research interests (i.e., in this paper, the keyphrases) of a reviewer. Hence, it can be hypothesized that if the keyphrase similarity score between a manuscript and a reviewer is higher, it is very likely that the reviewer's confidence score will also be higher. This characteristic has scarcely been investigated in earlier works and is investigated extensively in this paper. Moreover, it is necessary to discover a similarity threshold to tackle this many-to-many, or M-to-R, recommendation problem. The system proposed in this paper considers the problem definition stated above during the design process, which is elaborated in Sect. 5.
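To make the constraint structure of Definition 1 concrete, the sketch below shows one naive greedy assignment that respects both bounds (η reviewers per manuscript, at most μ manuscripts per reviewer) by always picking the highest-similarity reviewer still available. This is only an illustration of the bookkeeping under assumed placeholder names; it is not the system proposed in this paper, whose selection procedure is described in Sect. 5.

# Illustrative greedy assignment under the constraints of Definition 1.
# similarity[(m, r)] holds a precomputed keyphrase-similarity score;
# all names and data here are hypothetical placeholders.
from collections import defaultdict

ETA = 2  # reviewers required per manuscript (eta)
MU = 3   # maximum manuscripts per reviewer (mu)

def greedy_assign(manuscripts, reviewers, similarity):
    load = defaultdict(int)                 # manuscripts already given to each reviewer
    assignment = {m: [] for m in manuscripts}
    for m in manuscripts:
        # Reviewers ranked by similarity to this manuscript, best first.
        ranked = sorted(reviewers, key=lambda r: similarity.get((m, r), 0.0), reverse=True)
        for r in ranked:
            if len(assignment[m]) == ETA:
                break
            if load[r] < MU:                # respect the per-reviewer cap
                assignment[m].append(r)
                load[r] += 1
    return assignment

Note that a reviewer is skipped once the per-reviewer cap μ is reached, so later manuscripts may receive lower-similarity reviewers; this is the kind of trade-off that motivates a principled similarity threshold.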
4 Proposed Similarity Threshold Discovery Technique

It can be observed from the problem definition in Sect. 3 that the proposed system must not only select $R'$ for a manuscript but also ensure quality reviews by attaining high confidence scores. The relationship between quality reviews and confidence scores was already discussed in Sect. 3. Again, there is a certain degree of relationship between the confidence scores and the keyphrase similarities between the reviewed manuscripts and the research interests (or keyphrases) of a researcher, which is demonstrated in Sect. 7.1. Hence, a similarity threshold may ensure a considerably high confidence score by recommending only those reviewers whose similarity scores exceed the threshold. Consequently, a new similarity threshold discovery technique is proposed in this paper, which can be divided into two main parts, namely similarity calculation and threshold discovery.
4.1 Similarity Calculation

To calculate the similarities between the reviewed manuscripts and the most relevant keyphrases of a researcher, two state-of-the-art measures are considered in this paper, namely Cosine Similarity and Jaccard Similarity. Cosine Similarity measures the similarity between two vectors of an inner product space, here comprised of the weights of shared keyphrases [23], and is computed as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \, \sqrt{\sum_{i=1}^{n} (B_i)^2}} \qquad (1)$$
Conversely, Jaccard Similarity measures the similarity of two finite sample sets comprised of keyphrases as the size of their intersection divided by the size of their union [24]:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \qquad (2)$$

All the similarity scores obtained from Eqs. 1 and 2 range from 0 to 1. Both measures are utilized while investigating the relationship between the reviewed manuscripts and the most relevant keyphrases of a researcher in Sect. 7.1. However, since Cosine Similarity exhibits the relationship in a more profound manner, it is selected for the threshold discovery and reviewer selection processes.
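A minimal sketch of both measures over keyphrase sets is given below. For the cosine case, keyphrase weights are assumed to be simple term counts; the paper does not specify its weighting scheme, so this is an illustrative assumption.

# Cosine (Eq. 1) and Jaccard (Eq. 2) similarity over keyphrase lists.
# Weighting for the cosine case is plain term frequency, an assumption,
# since the weighting scheme is not fixed here.
import math
from collections import Counter

def cosine_similarity(phrases_a, phrases_b):
    a, b = Counter(phrases_a), Counter(phrases_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(phrases_a, phrases_b):
    a, b = set(phrases_a), set(phrases_b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)

manuscript = ["deep learning", "speech recognition", "bangla nlp"]
reviewer = ["speech recognition", "bangla nlp", "signal processing"]
print(cosine_similarity(manuscript, reviewer))   # ~0.67
print(jaccard_similarity(manuscript, reviewer))  # 0.5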
4.2 Threshold Discovery

To attain a considerably high confidence score from a reviewer, a similarity threshold needs to be identified; it is later utilized to shortlist a reviewer before the other constraints are considered. In this process, the conditional probability of a similarity score $s$ given a certain confidence score $\zeta$ is calculated as

$$p(s \mid \zeta) = \frac{p(s, \zeta)}{p(\zeta)} \qquad (3)$$

where $p(\zeta)$ is the probability that confidence score $\zeta$ occurs, and $p(s, \zeta)$ is the joint probability that $s$ and $\zeta$ occur together. These probability scores are afterwards utilized in discovering a threshold value. For that, a target confidence score $\tau$ must be selected by the user. Then, to find the threshold $\vartheta$, the following equation is utilized: n p(s | τ ) ϑ = argmax i=τ 1 or R