English Pages 1285 [1271] Year 2021
Lecture Notes on Data Engineering and Communications Technologies 72
Faisal Saeed, Fathey Mohammed, Abdulaziz Al-Nahari (Editors)
Innovative Systems for Intelligent Health Informatics: Data Science, Health Informatics, Intelligent Systems, Smart Computing
Lecture Notes on Data Engineering and Communications Technologies Volume 72
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data technologies and communications. It will publish the latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems. The series will have a prominent applied focus on data technologies and communications, with the aim of promoting the bridge from fundamental research on data science and networking to data engineering and communications that lead to industry products, business knowledge and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/15362
Editors
Faisal Saeed: College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Fathey Mohammed: School of Computing, Information Systems Department, Universiti Utara Malaysia, Sintok, Malaysia
Abdulaziz Al-Nahari: Sana’a Community College, Sana’a, Yemen
ISSN 2367-4512
ISSN 2367-4520 (electronic)
Lecture Notes on Data Engineering and Communications Technologies
ISBN 978-3-030-70712-5
ISBN 978-3-030-70713-2 (eBook)
https://doi.org/10.1007/978-3-030-70713-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
We are honored to welcome you to the 5th International Conference of Reliable Information and Communication Technology 2020 (IRICT2020), which was held online on December 21–22, 2020. The conference was organized by the Yemeni Scientists Research Group (YSRG); the Information Service Systems and Innovation Research Group (ISSIRG) at Universiti Teknologi Malaysia (Malaysia); the Data Science Research Group in the College of Computer Science and Engineering at Taibah University (Kingdom of Saudi Arabia); the School of Science and Technology at Nottingham Trent University (UK); the College of Engineering, IT, and Environment at Charles Darwin University (Australia); and the Association for Information Systems—Malaysia Chapter (MyAIS). IRICT2020 is a forum for the presentation of technological advances in the field of information and communication technology. The main theme of the conference is “Innovative Systems for Intelligent Health Informatics.” IRICT2020 received 140 submissions from 29 countries: Algeria, Australia, China, Egypt, Fiji, Germany, India, Indonesia, Iraq, Iran, Jordan, Malaysia, Morocco, Myanmar, Nigeria, Oman, Pakistan, Saudi Arabia, Singapore, Somalia, South Africa, Sri Lanka, Sudan, Sweden, Taiwan, Tunisia, UK, USA and Yemen. Of these 140 submissions, 111 were selected for inclusion in this book. The book covers research topics including health informatics, bioinformatics, information retrieval, artificial intelligence, machine learning, data science, big data analytics, business intelligence, Internet of things (IoT), information security, intelligent communication systems, information systems theories and applications, computational vision and robotics technology, software engineering, and multimedia applications and services.

We would like to express our appreciation to all authors and keynote speakers for sharing their expertise with us. We would also like to thank the organizing committee for their great efforts in managing the conference. In addition, we would like to thank the technical committee for reviewing all the submitted papers; Prof. Dr. Janusz Kacprzyk, AISC series editor; Prof. Dr. Fatos Xhafa, book series editor; and Dr. Thomas Ditzinger from Springer. Finally, we thank all the participants of IRICT2020 and hope to see you all again at the next conference.
Organization
IRICT2020 Organizing Committee
Honorary Co-chairs
Rose Alinda Alias: Association for Information Systems—Malaysian Chapter; Head of the Information Service Systems and Innovation Research Group (ISSIRG), Universiti Teknologi Malaysia
Ahmad Lotfi: Computing and Technology, School of Science and Technology, Nottingham Trent University, UK
Abdullah Alsaeedi: College of Computer Science and Engineering, Taibah University, Kingdom of Saudi Arabia
Conference General Chair
Faisal Saeed: Yemeni Scientists Research Group (YSRG); Head of Data Science Research Group, Taibah University, Kingdom of Saudi Arabia
Program Committee Chair
Fathey Mohammed: Universiti Utara Malaysia (UUM), Malaysia
General Secretary
Nadhmi Gazem: Taibah University, Kingdom of Saudi Arabia
Technical Committee Chairs
Faisal Saeed: Taibah University, Kingdom of Saudi Arabia
Tawfik Al-Hadhrami: Nottingham Trent University, UK
Mamoun Alazab: Charles Darwin University, Australia
Publications Committee
Fathey Mohammed: Universiti Utara Malaysia
Abdulaziz Al-Nahari: Sana’a Community College
Publicity Committee
Abdullah Aysh Dahawi: Universiti Teknologi Malaysia
Maged Naeser Mohammed: Universiti Teknologi Malaysia
Omar Awadh Al-Shatari: Universiti Teknologi PETRONAS
Ali Ahmed Ali Salem: Universiti Tun Hussein Onn Malaysia
IT and Multimedia Committee
Fuad Abdeljalil Al-shamiri: Universiti Teknologi Malaysia
Mohammed Alsarem: Taibah University, KSA
Amer Alsaket: Sitecore, Malaysia
Sulaiman Mohammed Abdulrahman: Taibah University, KSA
Treasury Committee
Abdullah Aysh Dahawi: Universiti Teknologi Malaysia
Logistics Committee Chair
Wahid Al-Twaiti: Universiti Teknologi Malaysia (UTM)
Registration
Sameer Hasan Albakri: Universiti Teknologi Malaysia (UTM)
International Technical Committee
Abdelhadi Raihani: Hassan II University of Casablanca, Morocco
Abdelhamid Emara: Taibah University, KSA
Abdelkaher Ait Abdelouahad: Chouaib Doukkali University, Morocco
Abdualmajed Ahmed Ghaleb Al-Khulaidi: Sana’a University, Yemen
Abdullah Almogahed: Universiti Utara Malaysia, Malaysia
Abdullah B. Nasser: University Malaysia Pahang, Malaysia
Abdulrahman Alsewari: University Malaysia Pahang, Malaysia
Abubakar Elsafi: University of Jeddah, KSA
Aby Mathews Maluvelil: Independent Researcher, Canada
Ahmad Fadhil Yusof: Universiti Teknologi Malaysia, Malaysia
Ahmed Awad: King Abdulaziz University, KSA
Ahmed Majid: University of Information Technology and Communications, Iraq
Ahmed Mutahar: Management and Science University, Malaysia
Ahmed Rakha: Al-Azhar University, Egypt
Ahmed Talal: Al-Iraqia University, Iraq
Alaa Alomoush: Universiti Malaysia Pahang, Malaysia
Alaa Fareed Abdulateef: Universiti Utara Malaysia, Malaysia
Ali Ahmed: King Abdulaziz University, KSA
Ameen Ba Homaid: University Malaysia Pahang, Malaysia
Aminu Aminu Mu’Azu: Umaru Musa Yar’adua University Katsina, Nigeria
Amr Tolba: King Saud University, KSA
Amr Yassin: Ibb University, Yemen
Anton Satria Prabuwono: King Abdulaziz University, KSA
Arwa Aleryani: Independent Researcher, Canada
Auday Hashim Saeed Al-Wattar: University of Mosul, Iraq
Bakar Ba Qatyan: Universiti Utara Malaysia, Malaysia
Bander Al-Rimy: UNITAR, Malaysia
Bassam Al-Hameli: University Malaysia Pahang, Malaysia
Bouchaib Cherradi: Hassan II University, Morocco
Mohammed Gamal Alsamman: Universiti Utara Malaysia, Malaysia
Haitham Alali: Emirates College of Technology, UAE
Ehsan Othman: OVGU Magdeburg, Germany
Eissa Alshari: Ibb University, Yemen
Fadhl Hujainah: University Malaysia Pahang, Malaysia
Faisal Saeed: Taibah University, KSA
Fathey Mohammed: Universiti Utara Malaysia, Malaysia
Fatma Al-Balushi: Independent Researcher, Oman
Feras Zen Alden: Universiti Utara Malaysia, Malaysia
Fuad Ghaleb: Universiti Teknologi Malaysia, Malaysia
Hamzah Alaidaros: Al-Ahgaff University, Yemen
Hanan Aldowah: Universiti Sains Malaysia, Malaysia
Hapini Bin Awang: Universiti Utara Malaysia, Malaysia
Hassan Silkan: Université Chouaib Doukkali, Morocco
Hesham Alghodhaifi: University of Michigan, USA
Hiba Zuhair: Al-Nahrain University, Iraq
Hussein Abualrejal: Universiti Utara Malaysia, Malaysia
Insaf Bellamine: Chouaib Doukkali University, Morocco
Insaf Bellamine: FSDM Fès, Morocco
Jawad Alkhateeb: Taibah University, KSA
Kamal Alhendawi: Al-Quds Open University, Palestine
Khairul Shafee Kalid: Universiti Teknologi PETRONAS, Malaysia
Khaleel Bader Bataineh: Amman Arab University, Jordan
Khalili Tajeddine: Hassan II University of Casablanca, Morocco
Marwa Alhadi: Sana’a University, Yemen
Masud Hasan: Taibah University, KSA
Mohamad Ghozali Hassan: Universiti Utara Malaysia, Malaysia
Mohamed Abdel Fattah: Taibah University, KSA
Mohamed Elhamahmy: Egypt
Mohammed A. Al-Sharafi: University Malaysia Pahang, Malaysia
Mohammed Al-Mhiqani: Universiti Teknikal Malaysia Melaka, Malaysia
Mohammed Alsarem: Taibah University, KSA
Mohammed Azrag: University Malaysia Pahang, Malaysia
Mohammed Nahid: Hassan II University, Casablanca, Morocco
Mostafa Al-Emran: Buraimi University College, Oman
Motasem Al Smadi: Jordan University of Science and Technology, Jordan
Mustafa Ali Abuzaraida: Universiti Utara Malaysia, Malaysia
Mustafa Noori: Middle Technical University, Iraq
Nabil Al-Kumaim: Universiti Teknikal Malaysia Melaka, Malaysia
Nadhmi Gazem: Taibah University, KSA
Naseebah Maqtary: University of Science and Technology, Yemen
Nejood Hashim Al-Walidi: Cairo University, Egypt
Noor Akma: Universiti Malaysia Pahang, Malaysia
Omar Dakkak: UNIKA, Turkey
Omar Zahour: Hassan II University of Casablanca, Morocco
Osama Sayaydeh: University Malaysia Pahang, Malaysia
Qasim Alajmi: A’Sharqiyah University, Oman
Raed Aldhubhani: University of Hafr Al Batin, KSA
Raghed Esmaeel: University of Mosul, Iraq
Rajesh Kaluri: Vellore Institute of Technology, India
Salah Abdelmageid: Taibah University, KSA
Salwa Belaqziz: Ibn Zohr University, Morocco
Samar Ghazal: Universiti Sains Malaysia, Malaysia
Samar Salem Ahmed: International Islamic University, Malaysia
Sharaf J. Malebary: King Abdulaziz University, KSA
Sinan Salih: Dijlah University College, Iraq
Soufiane Hamida: Hassan II University of Casablanca, Morocco
Susan Abdulameer: Universiti Utara Malaysia, Malaysia
Syifak Izhar Hisham: Universiti Malaysia Pahang, Malaysia
Tawfik Al-Hadhrami: Nottingham Trent University, UK
Waleed A. Hammood: Universiti Malaysia Pahang, Malaysia
Waleed Ali: King Abdulaziz University, KSA
Waleed Alomoush: Imam Abdulrahman bin Faisal University, KSA
Waseem Alromimah: Taibah University, KSA
Wasef Mater: University of Petra, Jordan
Yaqoub Sulaiman: King Abdulaziz University, KSA
Yousef Fazea: Universiti Utara Malaysia, Malaysia
Yousif Abdullah AlRashidi: Al Yamamah University, KSA
Yousif Aftan Abdullah: University of Baghdad, Iraq
Yousif Munadhil Ibrahim: Universiti Utara Malaysia, Malaysia
Zainab Senan: Universiti Utara Malaysia, Malaysia
Contents
Intelligent Health Informatics
Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods (p. 3)
Abdulaziz Aborujilah, Rasheed Mohammad Nassr, Tawfik Al-Hadhrami, Mohd Nizam Husen, Nor Azlina Ali, Abdulaleem Al-Othmani, and Mustapha Hamdi
UPLX: Blockchain Platform for Integrated Health Data Management (p. 10)
Omar Musa, Lim Shu Yun, and Reza Ismail
Convolutional Neural Networks for Automatic Detection of Colon Adenocarcinoma Based on Histopathological Images (p. 19)
Yakoop Qasim, Habeb Al-Sameai, Osamah Ali, and Abdulelah Hassan
Intelligent Health Informatics with Personalisation in Weather-Based Healthcare Using Machine Learning (p. 29)
Radiah Haque, Sin-Ban Ho, Ian Chai, Chin-Wei Teoh, Adina Abdullah, Chuie-Hong Tan, and Khairi Shazwan Dollmat
A CNN-Based Model for Early Melanoma Detection (p. 41)
Amer Sallam, Abdulfattah E. Ba Alawi, and Ahmed Y. A. Saeed
SMARTS D4D Application Module for Dietary Adherence Self-monitoring Among Hemodialysis Patients (p. 52)
Hafzan Yusoff, Nur Intan Raihana Ruhaiyem, and Mohd Hakim Zakaria
Improved Multi-label Medical Text Classification Using Features Cooperation (p. 61)
Rim Chaib, Nabiha Azizi, Nawel Zemmal, Didier Schwab, and Samir Brahim Belhaouari
Image Modeling Through Augmented Reality for Skin Allergies Recognition (p. 72)
Nur Intan Raihana Ruhaiyem and Nur Amalina Mazlan
Hybridisation of Optimised Support Vector Machine and Artificial Neural Network for Diabetic Retinopathy Classification (p. 80)
Nur Izzati Ab Kader, Umi Kalsom Yusof, and Maziani Sabudin
A Habit-Change Support Web-Based System with Big Data Analytical Features for Hospitals (Doctive) (p. 91)
Cheryll Anne Augustine and Pantea Keikhosrokiani
An Architecture for Intelligent Diagnosing Diabetic Types and Complications Based on Symptoms (p. 102)
Gunasekar Thangarasu, P. D. D. Dominic, and Kayalvizhi Subramanian
An Advanced Encryption Cryptographically-Based Securing Applicative Protocols MQTT and CoAP to Optimize Medical-IOT Supervising Platforms (p. 111)
Sanaa El Aidi, Abderrahim Bajit, Anass Barodi, Habiba Chaoui, and Ahmed Tamtaoui
Pulmonary Nodule Classification Based on Three Convolutional Neural Networks Models (p. 122)
Enoumayri Elhoussaine and Belaqziz Salwa
A Comparative Study on Liver Tumor Detection Using CT Images (p. 129)
Abdulfattah E. Ba Alawi, Ahmed Y. A. Saeed, Borhan M. N. Radman, and Burhan T. Alzekri
Brain Tumor Diagnosis System Based on RM Images: A Comparative Study (p. 138)
Ahmed Y. A. Saeed, Abdulfattah E. Ba Alawi, and Borhan M. N. Radman
Diagnosis of COVID-19 Disease Using Convolutional Neural Network Models Based Transfer Learning (p. 148)
Hicham Moujahid, Bouchaib Cherradi, Mohammed Al-Sarem, and Lhoussain Bahatti
Early Diagnosos of Parkinson’s Using Dimensionality Reduction Techniques (p. 160)
Tariq Saeed Mian
Detection of Cardiovascular Disease Using Ensemble Machine Learning Techniques (p. 176)
Fizza Kashif and Umi Kalsom Yusof
Health Information Management
Hospital Information System for Motivating Patient Loyalty: A Systematic Literature Review (p. 189)
Saleh Nasser Rashid Alismaili, Mohana Shanmugam, Hairol Adenan Kasim, and Pritheega Magalingam
Context Ontology for Smart Healthcare Systems (p. 199)
Salisu Garba, Radziah Mohamad, and Nor Azizah Saadon
A Modified UTAUT Model for Hospital Information Systems Geared Towards Motivating Patient Loyalty (p. 207)
Saleh Nasser Rashid Alismaili, Mohana Shanmugam, Hairol Adenan Kasim, and Pritheega Magalingam
Teamwork Communication in Healthcare: An Instrument (Questionnaire) Validation Process (p. 217)
Wasef Matar and Monther Aldwair
Potential Benefits of Social Media to Healthcare: A Systematic Literature Review (p. 230)
Ghada Ahmad Abdelguiom and Noorminshah A. Iahad
Exploring the Influence of Human-Centered Design on User Experience in Health Informatics Sector: A Systematic Review (p. 242)
Lina Fatini Azmi and Norasnita Ahmad
An Emotional-Persuasive Habit-Change Support Mobile Application for Heart Disease Patients (BeHabit) (p. 252)
Bhavani Devi Ravichandran and Pantea Keikhosrokiani
A Systematic Review of the Integration of Motivational and Behavioural Theories in Game-Based Health Interventions (p. 263)
Abdulsalam S. Mustafa, Nor’ashikin Ali, and Jaspaljeet Singh Dhillon
Adopting React Personal Health Record (PHR) System in Yemen HealthCare Institutions (p. 279)
Ziad Saif Alrobieh, Dhiaa Faisal Alshamy, and Maged Nasser
Artificial Intelligence and Soft Computing
Application of Shuffled Frog-Leaping Algorithm for Optimal Software Project Scheduling and Staffing (p. 293)
Ahmed O. Ameen, Hammed A. Mojeed, Abdulazeez T. Bolariwa, Abdullateef O. Balogun, Modinat A. Mabayoje, Fatima E. Usman-Hamzah, and Muyideen Abdulraheem
A Long Short Term Memory and a Discrete Wavelet Transform to Predict the Stock Price (p. 304)
Mu’tasem Jarrah and Naomie Salim
Effective Web Service Classification Using a Hybrid of Ontology Generation and Machine Learning Algorithm (p. 314)
Murtoza Monzur, Radziah Mohamad, and Nor Azizah Saadon
Binary Cuckoo Optimisation Algorithm and Information Theory for Filter-Based Feature Selection (p. 324)
Ali Muhammad Usman, Umi Kalsom Yusof, and Syibrah Naim
Optimized Text Classification Using Correlated Based Improved Genetic Algorithm (p. 339)
Thabit Sabbah
Multi-objective NPO Minimizing the Total Cost and CO2 Emissions for a Stand-Alone Hybrid Energy System (p. 351)
Abbas Q. Mohammed, Kassim A. Al-Anbarri, and Rafid M. Hannun
A Real Time Flood Detection System Based on Machine Learning Algorithms (p. 364)
Abdirahman Osman Hashi, Abdullahi Ahmed Abdirahman, Mohamed Abdirahman Elmi, and Siti Zaiton Mohd Hashim
Extracting Semantic Concepts and Relations from Scientific Publications by Using Deep Learning (p. 374)
Fatima N. AL-Aswadi, Huah Yong Chan, and Keng Hoon Gan
Effectiveness of Convolutional Neural Network Models in Classifying Agricultural Threats (p. 384)
Sayem Rahman, Murtoza Monzur, and Nor Bahiah Ahmad
A Study on Emotion Identification from Music Lyrics (p. 396)
Affreen Ara and Raju Gopalakrishna
A Deep Neural Network Model with Multihop Self-attention Mechanism for Topic Segmentation of Texts (p. 407)
Fayçal Nouar and Hacene Belhadef
Data Science and Big Data Analytics
Big Data Interoperability Framework for Malaysian Public Open Data (p. 421)
Najhan Muhamad Ibrahim, Amir Aatieff Amir Hussin, Khairul Azmi Hassan, and Ciara Breathnach
The Digital Resources Objects Retrieval: Concepts and Figures (p. 430)
Wafa’ Za’al Alma’aitah, Abdullah Zawawi Talib, and Mohd Azam Osman
A Review of Graph-Based Extractive Text Summarization Models (p. 439)
Abdulkadir Abubakar Bichi, Ruhaidah Samsudin, Rohayanti Hassan, and Khalil Almekhlafi
Review on Emotion Recognition Using EEG Signals Based on Brain-Computer Interface System (p. 449)
Mona Algarni and Faisal Saeed
A New Multi-resource Deadlock Detection Algorithm Using Directed Graph Requests in Distributed Database Systems (p. 462)
Khalid Al-Hussaini, Nabeel A. Al-Amdi, and Fuaad Hasan Abdulrazzak
Big Data Analytics Model for Preventing the Spread of COVID-19 During Hajj Using the Proposed Smart Hajj Application (p. 475)
Ibtehal Nafea
Financial Time Series Forecasting Using Prophet (p. 485)
Umi Kalsom Yusof, Mohd Nor Akmal Khalid, Abir Hussain, and Haziqah Shamsudin
Facial Recognition to Identify Emotions: An Application of Deep Learning (p. 496)
Kenza Belhouchette
Text-Based Analysis to Detect Figure Plagiarism (p. 505)
Taiseer Abdalla Elfadil Eisa, Naomie Salim, and Salha Alzahrani
A Virtual Exploration of al-Masjid al-Nabawi Using Leap Motion Controller (p. 514)
Slim Kammoun and Hamza Ghandorh
Comparison of Data Analytic Techniques for a Spatial Opinion Mining in Literary Works: A Review Paper (p. 523)
Sea Yun Ying, Pantea Keikhosrokiani, and Moussa Pourya Asl
Open Data in Prediction Using Machine Learning: A Systematic Review (p. 536)
Norismiza Ismail and Umi Kalsom Yusof
Big Data Analytics Based Model for Red Chili Agriculture in Indonesia (p. 554)
Junita Juwita Siregar and Arif Imam Suroso
A Fusion-Based Feature Selection Framework for Microarray Data Classification (p. 565)
Talal Almutiri, Faisal Saeed, Manar Alassaf, and Essa Abdullah Hezzam
An Approach Based Natural Language Processing for DNA Sequences Encoding Using the Global Vectors for Word Representation (p. 577)
Brahim Matougui, Hacene Belhadef, and Ilham Kitouni
Short-Term CO2 Emissions Forecasting Using Multi-variable Grey Model and Artificial Bee Colony (ABC) Algorithm Approach (p. 586)
Ani Shabri, Ruhaidah Samsudin, and Essa Abdullah Hezzam
IoT and Intelligent Communication Systems
A Reliable Single Prediction Data Reduction Approach for WSNs Based on Kalman Filter (p. 601)
Zaid Yemeni, Haibin Wang, Waleed M. Ismael, Younis Ibrahim, and Peng Li
A Real-Time Groundwater Level Monitoring System Based on WSN, Taiz, Yemen (p. 612)
Asma’a K. Akershi, Ziad S. Arobieh, and Reayidh A. Ahmed
Design and Simulation of Multiband Circular Microstrip Patch Antenna with CSRR for WLAN and WiMAX Applications (p. 623)
Abdulguddoos S. A. Gaid, Amer A. Sallam, Mohamed H. M. Qasem, Maged S. G. Abbas, and Amjad M. H. Aoun
Reference Architectures for the IoT: A Survey (p. 635)
Raghdah Saemaldahr, Bijayita Thapa, Kristopher Maikoo, and Eduardo B. Fernandez
A Circular Multiband Microstrip Patch Antenna with DGS for WLAN/WiMAX/Bluetooth/UMTS/LTE (p. 647)
Abdulguddoos S. A. Gaid, Amer A. Sallam, Mohamed H. M. Qasem, Maged S. G. Abbas, and Amjad M. H. Aoun
Anomaly Intrusion Detection Systems in IoT Using Deep Learning Techniques: A Survey (p. 659)
Muaadh A. Alsoufi, Shukor Razak, Maheyzah Md Siraj, Abdulalem Ali, Maged Nasser, and Salah Abdo
Security and Threats in the Internet of Things Based Smart Home (p. 676)
Nor Fatimah Awang, Ahmad Fudhail Iyad Mohd Zainudin, Syahaneim Marzuki, Syed Nasir Alsagoff, Taniza Tajuddin, and Ahmad Dahari Jarno
Simulation and Control of Industrial Composition Process Over Wired and Wireless Networks (p. 685)
Hakim Qaid Abdullah Abdulrab, Fawnizu Azmadi Hussin, Panneer Selvam Arun, Azlan Awang, and Idris Ismail
Performance Degradation of Multi-class Classification Model Due to Continuous Evolving Data Streams (p. 696)
Abdul Sattar Palli, Jafreezal Jaafar, and Manzoor Ahmed Hashmani
Compact Wide-Bandwidth Microstrip Antenna for Millimeter Wave Applications (p. 707)
Osaid Abdulrahman Saeed, Moheeb Ali Ameer, and Mansour Noman Ghaleb
Dual-Band Rectangular Microstrip Patch Antenna with CSRR for 28/38 GHz Bands Applications (p. 717)
Abdulguddoos S. A. Gaid, Mohamed H. M. Qasem, Amer A. Sallam, and Ebrahim Q. M. Shayea
Dual Band Rectangular Microstrip Patch Antenna for 5G Millimeter-Wave Wireless Access and Backhaul Applications (p. 728)
Abdulguddoos S. A. Gaid, Amer A. Sallam, Amjad M. H. Aoun, Ahmed A. A. Saeed, and Osama Y. A. Saeed
Design of Wireless Local Multimedia Communication Network (WLMmCN) Based on Android Application Without Internet Connection (p. 739)
R. Q. Shaddad, F. A. Alqasemi, S. A. Alfaqih, M. F. Alsabahi, A. T. Fara, K. M. Nejad, and E. A. Albukhaiti
A Statistical Channel Propagation Analysis for 5G mmWave at 73 GHz in Urban Microcell (p. 748)
Zaid Ahmed Shamsan
Advances in Information Security
Robot Networks and Their Impact on Cyber Security and Protection from Attacks: A Review (p. 759)
Daniah Anwar Hasan and Linah Faisal Tasji
An Efficient Fog-Based Attack Detection Using Ensemble of MOA-WMA for Internet of Medical Things (p. 774)
Shilan S. Hameed, Wan Haslina Hassan, and Liza Abdul Latiff
A New DNA Based Encryption Algorithm for Internet of Things (p. 786)
Bassam Al-Shargabi and Mohammed Abbas Fadhil Al-Husainy
Watermarking Techniques for Mobile Application: A Review (p. 796)
Aqilah Abd. Ghani, Syifak Izhar Hisham, Nur Alya Afikah Usop, and Nor Bakiah Abd Warif
Analysis and Evaluation of Template Based Methods Against Geometric Attacks: A Survey (p. 807)
Tanya Koohpayeh Araghi, Ala Abdulsalam Alarood, and Sagheb Kohpayeh Araghi
Survey of File Carving Techniques (p. 815)
Nor Ika Shahirah Ramli, Syifak Izhar Hisham, and Mohd Faizal Abd Razak
Affecting Factors in Information Security Policy Compliance: Combine Organisational Factors and User Habits (p. 826)
Angraini, Rose Alinda Alias, and Okfalisa
Mitigation of Data Security Threats in Iraqi Dam Management Systems: A Case Study of Fallujah Dam Management System (p. 837)
Hussam J. Ali, Hiba Zuhair, and Talib M. Jawad
Advances in Information Systems
Development and Validation of a Classified Information Assurance Scale for Institutions of Higher Learning (p. 857)
Bello Ahmadu, Ab Razak Che Hussin, and Mahadi Bahari
Sustainable e-Learning Framework: Expert Views (p. 869)
Aidrina Binti Mohamed Sofiadin
Derivation of a Customer Loyalty Factors Based on Customers’ Changing Habits in E-Commerce Platform (p. 879)
Mira Afrina, Samsuryadi, Ab Razak Che Hussin, and Suraya Miskon
Analysis of Multimedia Elements Criteria Using AHP Method (p. 891)
Nadiah Mohamad Sofian, Ahmad Sobri Hashim, and Aliza Sarlan
The Development of a Criteria-Based Group Formation Systems for Student Group Work (p. 903)
Divya Gopal Mohan and Khairul Shafee Kalid
Trusted Factors of Social Commerce Product Review Video (p. 911)
Humaira Hairudin, Halina Mohamed Dahlan, and Ahmad Fadhil Yusof
Building Information Modelling Adoption: Systematic Literature Review (p. 920)
Hafiz Muhammad Faisal Shehzad, Roliana Binti Ibrahim, Ahmad Fadhil Yusof, Khairul Anwar Mohamed Khaidzir, Omayma Husain Abbas Hassan, and Samah Abdelsalam Abdalla
Adoption of Smart Cities Models in Developing Countries: Focusing in Strategy and Design in Sudan (p. 933)
Mohmmed S. Adrees, Abdelrahman E. Karrar, and Waleed I. Osman
Factors Affecting Customer Acceptance of Online Shopping Platforms in Malaysia: Conceptual Model and Preliminary Results (p. 945)
Nabil Hasan Al-kumaim, Gan Wong Sow, and Fathey Mohammed
Student Compliance Intention Model for Continued Usage of E-Learning in University (p. 960)
Ken Ditha Tania, Norris Syed Abdullah, Norasnita Ahmad, and Samsuryadi Sahmin
Digital Information and Communication Overload Among Youths in Malaysia: A Preliminary Review (p. 975)
Mohamad Ghozali Hassan, Muslim Diekola Akanmu, Hussein Mohammed Esmail Abualrejal, and Amal Abdulwahab Hasan Alamrani
The Effect of Using Social Networking Sites on Undergraduate Students’ Perception and Academic Performance at University of Taiz-Yemen (p. 987)
Maged Rfeqallah, Rozilah Kasim, Faisal A. M. Ali, and Yahya Abdul Ghaffar
Building Information Modelling Adoption Model for Malaysian Architecture, Engineering and Construction Industry (p. 999)
Hafiz Muhammad Faisal Shehzad, Roliana Binti Ibrahim, Ahmad Fadhil Yusof, Khairul Anwar Mohamed Khaidzir, Muhammad Mahboob Khurshid, and Farah Zeehan Othman
Digital Government Competency for Omani Public Sector Managers: A Conceptual Framework (p. 1009)
Juma Al-Mahrezi, Nur Azaliah Abu Bakar, and Nilam Nur Amir Sjarif

Computational Vision and Robotics

Landmark Localization in Occluded Faces Using Deep Learning Approach (p. 1023)
Zieb Rabie Alqahtani, Mohd Shahrizal Sunar, and Abdulaziz A. Alashbi
Contrast Image Quality Assessment Algorithm Based on Probability Density Functions Features (p. 1030)
Ismail Taha Ahmed, Soong Der Chen, Norziana Jamil, and Baraa Tareq Hammad
The Impact of Data Augmentation on Accuracy of COVID-19 Detection Based on X-ray Images (p. 1041)
Yakoop Qasim, Basheer Ahmed, Tawfeek Alhadad, Habeb Al-Sameai, and Osamah Ali
A Fusion Schema of Hand-Crafted Feature and Feature Learning for Kinship Verification (p. 1050)
Mohammed Ali Almuashi, Siti Zaiton Mohd Hashim, Nooraini Yusoff, and Khairul Nizar Syazwan
Lossless Audio Steganographic Method Using Companding Technique (p. 1064)
Ansam Osamah Abdulmajeed
Smart Traffic Light System Design Based on Single Shot MultiBox Detector (SSD) and Anylogic Simulation (p. 1075)
E. R. Salim, A. B. Pantjawati, D. Kuswardhana, A. Saripudin, N. D. Jayanto, Nurhidayatulloh, and L. A. Pratama
Learning Scope of Python Coding Using Immersive Virtual Reality (p. 1086)
Abdulrazak Yahya Saleh, Goh Suk Chin, Roselind Tei, Mohd Kamal Othman, Fitri Suraya Mohamad, and Chwen Jen Chen
Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale (p. 1101)
Gobiga Rajalingam, Janarthan Jeyachandran, M. S. M. Siriwardane, Tharshvini Pathmaseelan, R. K. N. D. Jayawardhane, and N. S. Weerakoon
A Comparison of CNN and Conventional Descriptors for Word Spotting Approach: Application to Handwritten Document Image Retrieval (p. 1115)
Ryma Benabdelaziz, Djamel Gaceb, and Mohammed Haddad
Handwritten Arabic Character Recognition: Comparison of Conventional Machine Learning and Deep Learning Approaches (p. 1127)
Faouci Soumia, Gaceb Djamel, and Mohammed Haddad
Document Image Edge Detection Based on a Local Hysteresis Thresholding and Automatic Setting Using PSO (p. 1139)
Mohamed Benkhettou, Nibel Nadjeh, and Djamel Gaceb
Fast I2SDBSCAN Based on Integral Volume of 3D Histogram: Application to Color Layer Separation in Document Images (p. 1151)
Zakia Kezzoula and Djamel Gaceb
Enhancing Daily Life Skills Learning for Children with ASD Through Augmented Reality (p. 1164)
Rahma Bouaziz, Maimounah Alhejaili, Raneem Al-Saedi, Abrar Mihdhar, and Jawaher Alsarrani

Recent Computing and Software Engineering

SpaceScience App: Development of a Mobile Application for School Children (p. 1177)
Wan Fatimah Wan Ahmad and Ain Fatihah Ahmad Harnaini
Research on Online Problem-Based Learning Among Undergraduate Students: A Systematic Review (p. 1187)
Amira Saif and Irfan Umar
Derivation of Factors in Dealing Negative E-WOM for Maintaining Online Reputation (p. 1198)
Rizka Dhini Kurnia, Halina Mohamed Dahlan, and Samsuryadi
A Terms Interrelationship Approach to Query Expansion Based on Terms Selection (p. 1209)
Nuhu Yusuf, Mohd Amin Mohd Yunus, Norfaradilla Wahid, Mohd Najib Mohd Salleh, and Aida Mustapha
Multi-domain Business Intelligence Model for Educational Enterprise Resource Planning Systems (p. 1218)
Hisham Abdullah, Azman Taa, and Fathey Mohammed
Measuring Risk Mitigation Techniques in Agile Global Software Development (p. 1225)
Adila Firdaus Arbain, Muhammad Akil Rafeek, Zuriyaninatasa Podari, and Cik Feresa Mohd Foozy
Risk Mitigation Framework for Agile Global Software Development (p. 1233)
Zuriyaninatasa Podari, Adila Firdaus Arbain, Noraini Ibrahim, and Endah Sudarmilah
Re-verification of the Improved Software Project Monitoring Task Model of the Agile Kanban (i-KAM) (p. 1247)
Hamzah Alaidaros, Mazni Omar, Rohaida Romli, and Adnan Hussein
Author Index (p. 1259)
Intelligent Health Informatics
Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods
Abdulaziz Aborujilah¹ (corresponding author), Rasheed Mohammad Nassr¹, Tawfik Al-Hadhrami², Mohd Nizam Husen¹, Nor Azlina Ali¹, Abdulaleem Al-Othmani¹, and Mustapha Hamdi³
¹ University Kuala Lumpur, 50250 Kuala Lumpur, Malaysia
² Nottingham Trent University, Nottingham NG1 4FQ, UK
³ Edge IA, IoT, Nottingham, UK
Abstract. Recently, there has been renewed interest in smart health systems that aim to deliver high-quality healthcare services. Prediction methods are essential to support these systems, and they mainly rely on datasets whose assumptions match reality. However, one of the greatest challenges for prediction methods is obtaining datasets that are normally distributed. This paper presents experimental work that implements SMOTE (Synthetic Minority Oversampling Technique) and bootstrapping to normalize datasets. It also measures the impact of both methods on the performance of different prediction methods, namely support vector machine (SVM), Naive Bayes, and neural network (NN). The results showed that bootstrapping generally yielded better prediction performance than SMOTE, with SVM combined with bootstrapping performing best. Keywords: Datasets normalization · Prediction systems · Dataset redistribution methods · SMOTE-Bootstrapping
1 Introduction
Healthcare systems demand accurate data to handle all aspects of healthcare, from policymaking to the delivery of end services [1]. Data normality is necessary for efficient healthcare systems. Data mining methods are widely used in healthcare applications such as disease prediction, and the performance of such applications is strongly influenced by the class distribution of the data. These methods typically assume balanced datasets; however, most real datasets are not balanced, which degrades the performance of prediction systems. Dataset balancing is therefore attracting increasing attention from researchers [1]. An imbalanced dataset strongly affects the sensitivity of prediction methods and biases prediction performance [2]. For example, misclassifying the minority class has serious consequences when detecting fraud, intrusions, and chronic diseases [1]. Imbalanced data makes classification algorithms more sensitive towards the majority classes: their performance is highly biased towards the majority classes while remaining poor on the minority classes. Reducing such biases requires data-balancing techniques such as cost-sensitive approaches, sampling-based techniques, algorithm modifications [3], and attribute ranking [4]. This paper demonstrates how sampling methods such as SMOTE and bootstrapping influence prediction performance. The remainder of this paper is organized as follows: Sect. 2 presents related studies, Sect. 3 describes the experiments, Sect. 4 discusses the results, and Sect. 5 presents the conclusion and future work.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 3–9, 2021. https://doi.org/10.1007/978-3-030-70713-2_1
A. Aborujilah et al.
Comparative Study of SMOTE and Bootstrapping Performance

2 Related Works
An imbalanced dataset directly affects the accuracy of prediction systems through its skewed class distribution, and it is relatively difficult to build accurate prediction systems with imbalanced classes [1]. At the same time, balanced datasets are hard to obtain in real-life situations because imbalance is ubiquitous in real-life data. Improving the predictive performance of classification algorithms and reducing the effect of class imbalance is therefore an active research issue [1]. An imbalanced dataset consists of data belonging to majority and minority classes. From this view, two main approaches to dealing with imbalanced data have been identified: under-sampling, which reduces the number of majority samples [6], and oversampling, which increases the number of minority samples [7]. Another sampling method is bagging [8], which draws random samples with replacement from the original dataset, trains a model on each, and combines the results of all the models by averaging (an ensemble model). This method relies on random sampling rather than on training results. It generalizes well but has no direct impact on imbalance unless other factors are included. In [9] and [10], a modification of oversampling was suggested in which the training sample is selected randomly. This improves the accuracy of the classification model but delays the processing of large datasets. Another sampling method, proposed in [11] and [12], generates new data from the training dataset. It keeps the basic characteristics of the original dataset but lowers the accuracy of the classification model because the original dataset maintains its integrity. Under-sampling, which reduces the majority class to balance it with the minority class, has gained more attention among academics [1]. The study [13] presented two under-sampling methods: random and informative. Informative under-sampling achieves balance by eliminating data from the training dataset based on predefined criteria.
Deep neural networks are used in different domains and are widely used in security applications. SMOTE was proposed by Chawla et al. [7]. Its core idea is to construct synthetic minority samples by interpolating between minority training samples and their k nearest neighbours [14]. SMOTE thus increases the number of minority class samples by generating data artificially, continuing until the dataset reaches an acceptable ratio at which the minority and majority classes are approximately equal [15]. SMOTE is an acceptable method for oversampling; however, it suffers from a high misclassification rate [16]. It uses the minority class to generate new synthetic examples based on randomly selected k nearest neighbours, and the number of generated samples depends on a predefined over-sampling ratio [17]. This pre-processing method is widely used to balance datasets, and work on extensions is ongoing [18], for example methods such as SPIDER [19] that generate and remove samples simultaneously based on their influence on model performance.
The bootstrapping technique resamples the training records with replacement, following the concept of the bagging algorithm [14]. Random forests extend bagging by also selecting the features used in training randomly [20]. Bootstrapping draws repeated samples with replacement from the actual dataset and computes a statistic of interest, such as the mean, on each, as shown in the following steps:
1. Given a dataset of size n, draw a sample of n items from it with replacement.
2. Repeat step 1 m times (e.g., m = 10,000).
3. For each sample produced by step 1, calculate the statistic of interest (e.g., the mean).
4. The result is a distribution of the statistic (e.g., if the statistic of interest is the mean, then with a suitable m and a large enough n, step 3 should result in an approximately normal distribution) [22].
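The four bootstrapping steps above can be sketched with the Python standard library. This is a minimal illustration, not the RapidMiner operator used in the experiments; the function name `bootstrap_distribution`, the default statistic (the mean), and the sample values are assumptions.

```python
import random
import statistics


def bootstrap_distribution(data, m=10_000, statistic=statistics.mean, seed=None):
    """Return m bootstrap replicates of `statistic` computed on `data`.

    Step 1: draw n items from the n-item dataset with replacement.
    Step 2: repeat that draw m times.
    Step 3: compute the statistic of interest on every resample.
    Step 4: the returned list is the bootstrap distribution of the statistic.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(m):
        resample = rng.choices(data, k=n)  # n draws with replacement
        replicates.append(statistic(resample))
    return replicates


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = [12.1, 14.3, 13.8, 15.0, 11.9, 16.2, 13.3, 14.8, 12.7, 15.5]
    reps = bootstrap_distribution(sample, m=5_000, seed=42)
    print(f"mean of bootstrap means: {statistics.mean(reps):.3f}")
    print(f"bootstrap standard error: {statistics.stdev(reps):.3f}")
```

The standard deviation of the replicates estimates the standard error of the statistic, and with a large m their histogram approximates the distribution described in step 4.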
3 Experiments
The goal of these experiments was to investigate how to handle the problem of imbalanced datasets by refining the training dataset. RapidMiner Studio 9.3 [1] was used to carry out the experiments on a well-known dataset, the Breast Cancer Wisconsin (Diagnostic) dataset [21]. It consists of 569 breast cancer patient records with 12 attributes; Table 1 shows the attributes used and their definitions. The experiments consist of three main phases: training, testing, and evaluation. In the training phase, two thirds of the dataset are extracted from a CSV file into the RapidMiner data store, and the diagnosis feature is selected as the class label. The SMOTE method is then used to normalize the dataset, and SVM, Naïve Bayes, and neural network classification models are created. In the testing phase, the remaining third of the dataset is extracted and the models are applied to it. The confusion matrix and the measurements are then calculated. The same steps are repeated with SMOTE replaced by the bootstrapping method, and finally the measurement values of both methods are compared. Figure 1 shows the steps of the experiments. The target attribute is diagnosis, which takes one of two values: M = malignant or B = benign. The M class contained 212 records (37%) and the B class 357 records (63%), so the dataset was not balanced. Two types of experiments were done: the first examined how the bootstrapping and SMOTE methods affect prediction performance, while the second evaluated how prediction performance is affected by using the oversampling method. Three prediction methods were used in this evaluation: SVM, neural networks, and Naïve Bayes. Table 2 shows the prediction performance obtained using bootstrapping and SMOTE with SVM, neural networks, and Naïve Bayes.
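The SMOTE interpolation described in Sect. 2 can be sketched in pure Python as follows. This is a minimal illustration after Chawla et al. [7], not RapidMiner's SMOTE operator; the function name `smote`, the Euclidean distance, the default k = 5, and the toy samples are assumptions. On the full dataset, raising the 212 malignant records to the 357 benign ones would require 357 − 212 = 145 synthetic samples (in the experiments SMOTE was applied to the training two thirds only, so the actual count differs).

```python
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic minority samples by SMOTE-style interpolation.

    Each synthetic sample is x + gap * (x_nn - x), where x is a random
    minority sample, x_nn is one of its k nearest minority neighbours
    (Euclidean distance), and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Indices of the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        x_nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic


if __name__ == "__main__":
    # Class sizes reported for the full dataset: 212 malignant, 357 benign.
    print("synthetic M samples to match B:", 357 - 212)  # 145
    # Hypothetical two-feature minority samples, for illustration only.
    toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
    print(smote(toy_minority, n_synthetic=3, k=2, seed=1))
```

Because every synthetic point lies on a segment between two minority samples, SMOTE never places samples outside the region spanned by the minority class, which is also why it can create overlap when majority samples sit between minority neighbours.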
Fig. 1. Experiments phases flow diagram
Table 1. Breast cancer Wisconsin dataset attributes

| No. | Feature | Data type | Meaning |
|---|---|---|---|
| 1 | ID number | Real | ID |
| 2 | Diagnosis | Real | M = malignant, B = benign |
| 3 | Radius | Real | Mean of distances from centre to points on the perimeter |
| 4 | Texture | Real | Standard deviation of gray-scale values |
| 5 | Perimeter | Real | Perimeter |
| 6 | Area | Real | Area |
| 7 | Smoothness | Real | Local variation in radius lengths |
| 8 | Compactness | Real | Perimeter² / area − 1.0 |
| 9 | Concavity | Real | Severity of concave portions of the contour |
| 10 | Concave points | Real | Number of concave portions of the contour |
| 11 | Symmetry | Real | Symmetry |
| 12 | Fractal dimension | Real | “Coastline approximation” − 1 |
Table 2 compares the prediction performance of the three methods (SVM, Naïve Bayes, and neural networks) using SMOTE and bootstrapping. The results showed that bootstrapping generally performed better than SMOTE. For example, SVM with bootstrapping achieved the best results compared with all the prediction methods that used SMOTE: its sensitivity, specificity, precision, negative predictive value, and accuracy reached 100%, its F1 score and Matthews correlation coefficient reached 1, and its false positive rate, false discovery rate, and false negative rate reached 0%.

Table 2. Performance comparison of SMOTE and Bootstrap methods

| Measurement | SMOTE + SVM | SMOTE + Neural networks | SMOTE + Naive Bayes | Bootstrapping + SVM | Bootstrapping + Neural networks | Bootstrapping + Naive Bayes |
|---|---|---|---|---|---|---|
| Sensitivity | 0.972 | 0.9813 | 0.8879 | 1 | 0.9848 | 0.8939 |
| Specificity | 0.9813 | 0.9813 | 0.972 | 1 | 0.9905 | 0.9714 |
| Precision | 0.9811 | 0.9813 | 0.9694 | 1 | 0.9848 | 0.9516 |
| Negative predictive value | 0.9722 | 0.9813 | 0.8966 | 1 | 0.9905 | 0.9358 |
| False positive rate | 0.0187 | 0.0187 | 0.028 | 0 | 0.0095 | 0.0286 |
| False discovery rate | 0.0189 | 0.0187 | 0.0306 | 0 | 0.0152 | 0.0484 |
| False negative rate | 0.028 | 0.0187 | 0.1121 | 0 | 0.0152 | 0.1061 |
| Accuracy | 0.9766 | 0.9813 | 0.9299 | 1 | 0.9883 | 0.9415 |
| F1 score | 0.9765 | 0.9813 | 0.9268 | 1 | 0.9848 | 0.9219 |
| Matthews correlation coefficient | 0.9533 | 0.9626 | 0.8629 | 1 | 0.9753 | 0.8763 |

In contrast, Naïve Bayes with bootstrapping did not achieve good prediction results compared with the SVM and neural network methods that used SMOTE. Its sensitivity, specificity, precision, negative predictive value, accuracy, F1 score, and Matthews correlation coefficient were lower than those of SVM and neural networks with SMOTE, and its false positive rate, false discovery rate, and false negative rate were higher.
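All ten measurements in Table 2 can be derived from a binary confusion matrix. The following minimal Python sketch shows the standard formulas; it assumes that the malignant class (M) is treated as positive, which the paper does not state, and that no denominator is zero. The function name `confusion_metrics` and the counts in the usage lines are hypothetical and are not the paper's confusion matrices.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Compute the Table 2 measurements from binary confusion-matrix counts.

    Assumes every denominator below is non-zero.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "negative_predictive_value": npv,
        "false_positive_rate": 1 - specificity,
        "false_discovery_rate": 1 - precision,
        "false_negative_rate": 1 - sensitivity,
        "accuracy": accuracy,
        "f1_score": f1,
        "matthews_corrcoef": mcc,
    }


if __name__ == "__main__":
    # Hypothetical test-set counts, for illustration only.
    for name, value in confusion_metrics(tp=105, fn=2, tn=105, fp=2).items():
        print(f"{name}: {value:.4f}")
```

Because the false positive, false discovery and false negative rates are the complements of specificity, precision and sensitivity, those three rows of Table 2 mirror the corresponding rows above them.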
4 Discussion
The objective of this study was to explore how the SMOTE and bootstrapping methods repair imbalanced data by comparing the performance of three prediction methods, SVM, Naïve Bayes, and neural networks, with SMOTE and with bootstrapping. Figure 2 shows that SMOTE generally performed worse than bootstrapping with the SVM and neural network methods. This is because SMOTE does not consider neighboring records that may belong to other classes, which increases class overlap and adds new noisy data. In contrast, bootstrapping is a straightforward way to redistribute data towards normality by resampling with replacement and calculating a statistic of interest such as the mean. SVM with bootstrapping reached the best performance owing to SVM's ability to avoid direct probability estimates and its reliance on the soft-margin classification concept.
Fig. 2. Performance comparison of SMOTE and Bootstrapping with prediction methods. (Bar chart; y-axis from 0 to 1.2. Measurements: sensitivity, specificity, precision, negative predictive value, false positive rate, false discovery rate, false negative rate, accuracy, F1 score, Matthews correlation coefficient. Series: SVM_with_SMOTE, Deep learning_with_SMOTE, Naive Bayes_with_SMOTE, SVM_with_bootstrapping, Deep learning_with_bootstrapping, Naive Bayes_with_bootstrapping.)
5 Conclusion
The purpose of the current study was to compare the impact of sample redistribution methods, namely SMOTE and bootstrapping, on the performance of prediction methods; the SVM, NN, and Naïve Bayes prediction methods were used. Overall, these results indicate that SMOTE generally performs worse than bootstrapping with the SVM and neural network methods. This is because SMOTE does not take into account neighboring records that may belong to other classes, which increases class overlap and adds new noisy data. This research has strengthened our understanding of how to optimise prediction processes on imbalanced datasets by employing sample normalization methods. The major limitation of this study is that the evaluation used only one health dataset; evaluating other datasets may improve the understanding of this problem. More work is needed to determine how to handle imbalanced datasets using other optimization methods, such as ad hoc heuristics-based methods. Further research should establish a comparative approach to the imbalanced dataset problem, including selecting the most efficient features and prediction methods.
References
1. Ebenuwa, S.H., Sharif, M.S., Alazab, M., Al-Nemrat, A.: Variance ranking attributes selection techniques for binary classification problem in imbalance data. IEEE Access 7, 24649–24666 (2019)
2. Longadge, R., Dongre, S.: Class imbalance problem in data mining review. arXiv preprint arXiv:1305.1707 (2013)
3. López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. (Ny) 250, 113–141 (2013)
4. Meidan, Y., et al.: N-BaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput. 17(3), 12–22 (2018)
5. Nguyen, G.H., Bouzerdoum, A., Phung, S.L.: Learning pattern classification tasks with imbalanced data sets. In: Pattern Recognition. IntechOpen (2009)
6. Luo, M., Wang, K., Cai, Z., Liu, A., Li, Y., Cheang, C.F.: Using imbalanced triangle synthetic data for machine learning anomaly detection. Comput. Mater. Contin. 58(1), 15–26 (2019)
7. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
8. Caminero, G., Lopez-Martin, M., Carro, B.: Adversarial environment reinforcement learning algorithm for intrusion detection. Comput. Netw. 159, 96–109 (2019)
9. Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J.: Improving software-quality predictions with data sampling and boosting. IEEE Trans. Syst. Man Cybern. Syst. Humans 39(6), 1283–1294 (2009)
10. Drummond, C., Holte, R.C.: C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on Learning from Imbalanced Datasets II, vol. 11, pp. 1–8 (2003)
11. Liu, A., Ghosh, J., Martin, C.E.: Generative oversampling for mining imbalanced datasets. In: DMIN, pp. 66–72 (2007)
12. Huda, S., Yearwood, J., Jelinek, H.F., Hassan, M.M., Fortino, G., Buckland, M.: A hybrid feature selection with ensemble classification for imbalanced healthcare data: a case study for brain tumor diagnosis. IEEE Access 4, 9145–9154 (2016)
13. Team, A.V.C.: Practical guide to deal with imbalanced classification problems in R. Analytics Vidhya (2016)
14. Wang, Q., Luo, Z., Huang, J., Feng, Y., Liu, Z.: A novel ensemble method for imbalanced data learning: bagging of extrapolation-SMOTE SVM. Comput. Intell. Neurosci. 2017 (2017)
15. Liu, R., Hall, L.O., Bowyer, K.W., Goldgof, D.B., Gatenby, R., Ben Ahmed, K.: Synthetic minority image over-sampling technique: how to improve AUC for glioblastoma patient survival prediction. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1357–1362 (2017)
16. Wijermans, N., Conrado, C., van Steen, M., Martella, C., Li, J.: A landscape of crowd-management support: an integrative approach. Saf. Sci. 86, 142–164 (2016)
17. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6(1), 20–29 (2004)
18. Maciejewski, T., Stefanowski, J.: Local neighbourhood extension of SMOTE for mining imbalanced data. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 104–111 (2011)
19. Stefanowski, J., Wilk, S.: Selective pre-processing of imbalanced data for improving classification performance. In: International Conference on Data Warehousing and Knowledge Discovery, pp. 283–292 (2008)
20. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)
21. Lavanya, D., Rani, D.K.U.: Analysis of feature selection with classification: breast cancer datasets. Indian J. Comput. Sci. Eng. 2(5), 756–763 (2011)
UPLX: Blockchain Platform for Integrated Health Data Management
Omar Musa¹ (corresponding author), Lim Shu Yun¹, and Reza Ismail²
¹ Faculty of Business and Technology, UNITAR International University, 47300 Petaling Jaya, Malaysia
{omarm,lim_sy}@unitar.my
² LedgerX International Sdn Bhd, T3-20-3A Icon City Trade Center, 47300 Petaling Jaya, Malaysia
Abstract. Health data management currently needs a technology refresh in order to provide accurate, reliable and verifiable data for doctors and researchers to decide on the best medications, and for the public to keep a dependable history of their own health information as they go about their daily lives. We propose UPLX (Unified Patient Ledger), a blockchain-based data platform to securely record, “anonymize” and store patient health data for medical, academic and pharmaceutical research. UPLX is designed to be interoperable and can be integrated with any hospital information system (HIS) through API (Application Programming Interface) technology. To this end, a Hyperledger Fabric implementation is described to demonstrate the feasibility of the proposal and its use in healthcare organizations. Successful implementation will accelerate the acceptance of blockchain technology for protecting recorded health data while increasing the efficiency of healthcare delivery. Keywords: Blockchain · Interoperable · Medical ledger · Hyperledger fabric
1 Introduction
The ongoing COVID-19 pandemic has exposed the need for a better data platform to manage patient health information [1]. The health care industry has transitioned from paper-based to digital record keeping through the introduction of modern Electronic Medical Record (EMR) and Electronic Health Record (EHR) systems. Together, these two digital record-keeping systems are able to chart a patient’s medical history and overall health. However, digital record keeping comes with its own advantages and disadvantages. While digital health records have proven to be cost-effective and efficient and have greatly improved the accessibility of health information, recent security breaches have also raised ethical and security issues in managing patient confidentiality and privacy [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 10–18, 2021. https://doi.org/10.1007/978-3-030-70713-2_2
UPLX: Blockchain Platform for Integrated Health Data Management
In 2019, records of 14,000 HIV-positive people in Singapore were leaked after a major cyberattack on Singapore’s health database [3]. In 2013, a former medical technician at Howard University Hospital in Washington, D.C., U.S., pleaded guilty to violating the Health Insurance Portability and Accountability Act (HIPAA) by selling identifiable patient information to third parties [4]. A HIPAA compliance study found that 73% of physicians text other physicians about work [5]. Text messages, although fast and efficient, can easily be accessed by third parties if the mobile device is not properly secured, or is misplaced, lost or stolen. According to a report from Business Insider Intelligence, health care was exposed to more cybersecurity breaches than any other industry in 2018, accounting for 25% of 750 reported hacks. The numbers were particularly high in the U.S., where health firms suffered a record 365 data breaches in 2018, compared with 2017’s high of 358. In these hacks, patients frequently lose their social security numbers, names and addresses; sometimes the information is more sensitive, such as health insurance information and medical histories [6]. UPLX, or Unified Patient Ledger, is a blockchain-powered health data network designed to securely record and store patient consultation, prescription and treatment data. Data recorded in UPLX is securely “anonymized” using cryptographic hashing, enabling unprecedented access to patient data without exposing patient identity or compromising confidentiality. UPLX is designed to be interoperable and easily integrated with any hospital system via APIs or a data entry process. This work extends and improves on related works such as [7–9]. In [7], the authors proposed adopting blockchain as a disruptive technology for health data management. In [8], the authors proposed secure electronic health record management on a public blockchain, together with the attendant consensus algorithm.
In addition, simulations, rather than an actual implementation, were performed to evaluate the scalability of their proposal. In [9], the authors proposed a private consortium blockchain model that assumes the presence of a cloud “Blockchain-as-a-Service (BaaS)” platform integrating off-chain and on-chain transactions. The remainder of this paper is structured as follows. Section 2 introduces the salient points of blockchain technology. Section 3 presents the Hyperledger Fabric blockchain model, which uses a permissioned blockchain network, and explains how it applies to our UPLX implementation. Section 4 outlines the UPLX interoperable architecture and how it can serve as the foundation for our use-case scenario of collaboration among relevant stakeholders. Section 5 presents the data structure design and how the read and write processes of patient transactions are captured and stored. Finally, Sect. 6 concludes the paper.
2 Blockchain Technology
The blockchain concept was first published in 2008 by a pseudonymous person or group using the name Satoshi Nakamoto, in a white paper describing a peer-to-peer electronic cash system called Bitcoin [10]. Blockchain is a distributed peer-to-peer (P2P) ledger technology that processes transactions in immutable blocks of data using cryptography. It is deployed as a distributed and decentralized network that autonomously processes, verifies and maintains (multiple copies of) its own data.
O. Musa et al.
The blockchain records transactions in the form of an immutable ledger. It is deployed via a distributed network of mutually untrusting peers, each maintaining a copy of the ledger [11]. Data is created in the blockchain in the form of a time-stamped ledger that cannot be changed, updated or deleted in any way, because the provisions to do so simply do not exist; this data is therefore termed immutable. The time-stamped ledger also enables provenance, allowing us to trace the origin of the data and its evolution over time. Programmable blockchains are able to execute smart contracts: code that is written and deployed into the blockchain with rules to execute and enforce itself [9]. Smart contracts are transparent, run autonomously and, once deployed, cannot be changed or manipulated. The blockchain architecture thus allows untrusting parties with common interests to co-create a permanent, unchangeable and transparent record of exchange and processing without relying on a central authority [12]. UPLX, or Unified Patient Ledger, is built on Hyperledger Fabric and designed as a blockchain platform to securely record and store patient data.
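The immutability and provenance described above come from each block storing the hash of its predecessor, so any later edit is detectable. A minimal single-node Python sketch under stated assumptions: the `Block` and `Ledger` names and the JSON/SHA-256 block encoding are illustrative choices, there is no networking or consensus, and this is not Hyperledger Fabric's block format.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    timestamp: float
    data: dict
    prev_hash: str
    block_hash: str = field(init=False)

    def __post_init__(self):
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Hash a canonical JSON encoding of everything except the hash itself.
        payload = json.dumps(
            {"index": self.index, "timestamp": self.timestamp,
             "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only, time-stamped chain of blocks: there is no update or delete method."""

    def __init__(self):
        self.chain = [Block(0, 0.0, {"genesis": True}, "0" * 64)]

    def append(self, data: dict) -> Block:
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), data, prev.block_hash)
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        for i, block in enumerate(self.chain):
            if block.block_hash != block.compute_hash():
                return False  # contents changed after the block was sealed
            if i > 0 and block.prev_hash != self.chain[i - 1].block_hash:
                return False  # link to the previous block is broken
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append({"patient": "hypothetical-id-1", "event": "consultation"})
    ledger.append({"patient": "hypothetical-id-1", "event": "prescription"})
    print(ledger.is_valid())  # True
    ledger.chain[1].data["event"] = "edited"  # simulate tampering
    print(ledger.is_valid())  # False
```

In this sketch tampering is detected rather than physically prevented; in a real deployment, the copies held by other peers are what make an altered ledger stand out.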
3 Hyperledger Fabric
Hyperledger Fabric is a modular and extensible open-source system for deploying permissioned blockchain networks [13]. Permissioned blockchain networks operate with a set of known, identified and verified participants. Public or permissionless blockchain networks such as Bitcoin or Ethereum allow anyone to participate without verifying their identity. Public blockchains are usually deployed for cryptocurrency and typically require a consensus algorithm, such as Proof of Work (PoW) in Bitcoin, coupled with fee-based incentives, so that transactions can be made without any centralized authority for verification. In such distributed consensus environments, transactions are processed through an Order-Execute architecture (Fig. 1): they are first ordered, then broadcast to all peers and executed sequentially [14].
Fig. 1. Public/Permissionless order-execute architecture
This results in some limitations:
• Consensus has to be hard-coded within the platform.
• Transactions need to be broadcast to all peers and executed sequentially.
• Smart contracts are programmed in a fixed, non-standard, domain-specific programming language and need to be run at all peers.
In a permissioned blockchain network such as Hyperledger Fabric, participants are organizations that, even though they do not fully trust each other, are able to exchange information and validate transactions because they share a common goal. Hyperledger Fabric processes transactions using a different, Execute-Order-Validate architecture (Fig. 2) [14].
Fig. 2. Permissioned Execute-Order-Validate Architecture
Transactions are first executed and endorsed, and only then ordered and validated in the chain. This resolves some of the limitations of an Order-Execute based public blockchain network:
• All peers validate transactions, but not all peers need to execute them.
• Endorsement policies can be created and customized to determine which peers execute smart contracts.
• Executing transactions before ordering them allows peers to execute transactions in parallel.
• Smart contracts can be written in general-purpose, potentially non-deterministic programming languages such as Go, Java and Node.js.
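The Execute-Order-Validate flow can be illustrated with a toy, single-process Python sketch. Everything here is a hypothetical simplification: `Proposal`, `simulate`, `endorse`, `validate_and_commit`, the counter-style chaincode, and the "at least N endorsers" policy are not the Hyperledger Fabric SDK or chaincode API, and real endorsements are signed by peers. The read set records the version of each key the simulation read, and validation rejects a transaction whose read versions changed after it was executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    tx_id: str
    key: str
    delta: int  # hypothetical chaincode: add `delta` to the counter stored at `key`


def simulate(state, proposal):
    """Execute: run the chaincode against the current state without writing to it."""
    value, version = state.get(proposal.key, (0, 0))
    read_set = {proposal.key: version}
    write_set = {proposal.key: value + proposal.delta}
    return read_set, write_set


def endorse(state, proposal, endorsers):
    """Each endorsing peer simulates the proposal; their results must agree.

    In this single-process sketch every endorser sees the same state, so the
    results always agree; with real peers, divergent (e.g. non-deterministic)
    results would leave the proposal unendorsed.
    """
    results = [simulate(state, proposal) for _ in endorsers]
    if not results or any(r != results[0] for r in results):
        return None
    return {"tx_id": proposal.tx_id, "rw": results[0], "endorsers": frozenset(endorsers)}


def validate_and_commit(state, ordered_txs, policy_min):
    """Validate: check the endorsement policy and read-set versions, then commit."""
    outcome = {}
    for tx in ordered_txs:  # order fixed by the ordering service
        read_set, write_set = tx["rw"]
        ok = len(tx["endorsers"]) >= policy_min and all(
            state.get(k, (0, 0))[1] == v for k, v in read_set.items()
        )
        if ok:
            for k, new_value in write_set.items():
                state[k] = (new_value, state.get(k, (0, 0))[1] + 1)
        outcome[tx["tx_id"]] = ok
    return outcome


if __name__ == "__main__":
    state = {}  # world state: key -> (value, version)
    # Both proposals are executed against the same snapshot, so they could run in parallel.
    e1 = endorse(state, Proposal("tx1", "visits:patient-A", 1), ["Org1", "Org2"])
    e2 = endorse(state, Proposal("tx2", "visits:patient-A", 1), ["Org1", "Org2"])
    print(validate_and_commit(state, [e1, e2], policy_min=2))  # {'tx1': True, 'tx2': False}
    print(state)  # {'visits:patient-A': (1, 1)}
```

After ordering, tx2 is invalidated because tx1 has already moved the key from version 0 to version 1; execution could be parallel precisely because conflicts are caught at validation time.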
4 UPLX Interoperable Architecture
UPLX is a blockchain-powered interoperable health data platform that can be integrated with any hospital information system (HIS) through API (Application Programming Interface) technology. To provide real-time data, UPLX can also be integrated with health tracking apps, wireless-enabled wearables or IoT devices. Blockchain-based data architecture is the leading candidate for enhancing interoperability, whether for existing application systems, IoT platforms or other smart devices, because it is able to ensure security, privacy and performance [14]. UPLX is divided into two phases: a Write Phase and a Read Phase. In the Write Phase, each medical or health institution participating in the UPLX network is represented as an “Organization” object with rights to create and endorse transactions. The Write Phase (Fig. 3) provides tools for organizations to record their patient data by integrating UPLX APIs into their information systems. Their data is encrypted and patient information is anonymized before being recorded into the blockchain. UPLX anonymizes patient identity by applying the SHA-256 cryptographic hash function to information such as the patient’s name and identity number combined with the organization’s private keys, creating a unique representation of that data (Fig. 4). Since the private keys are unique to each organization, other organizations within the UPLX network cannot reproduce these identifiers or read the patient information behind them.
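The paper does not specify how the SHA-256 hash is combined with the organization’s private key. A minimal sketch, assuming HMAC-SHA256 from the Python standard library as the keyed construction and a symmetric per-organization secret (the paper speaks of private keys); the function name `anonymize_patient`, the normalization rule (trim and upper-case the name), and all sample values are hypothetical.

```python
import hashlib
import hmac


def anonymize_patient(name: str, identity_number: str, org_secret: bytes) -> str:
    """Derive an organization-specific pseudonymous patient identifier.

    Assumed construction: HMAC-SHA256 keyed with a secret held only by the
    organization, over the normalized name and identity number.
    """
    normalized = f"{name.strip().upper()}|{identity_number.strip()}"
    return hmac.new(org_secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical patient details and organization secrets, for illustration only.
    org_a, org_b = b"org-A-secret-key", b"org-B-secret-key"
    id_a = anonymize_patient("Ali Ahmad", "900101-14-5555", org_a)
    id_b = anonymize_patient("Ali Ahmad", "900101-14-5555", org_b)
    print(id_a == anonymize_patient("ali ahmad", "900101-14-5555", org_a))  # True
    print(id_a == id_b)  # False
```

A keyed hash matters here because names and identity numbers have low entropy: with plain SHA-256, anyone could hash candidate names and identity numbers and compare the results, whereas without the organization’s secret the identifiers cannot be recomputed.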
O. Musa et al.
Fig. 3. UPLX write phase
Fig. 4. Organization based “Anonymized” patient asset
The Read Phase (Fig. 5) enables recorded data to be accessed by third parties to perform various actions such as big data analytics, machine learning and Artificial Intelligence (AI). Read access to the UPLX network allows access to readable data structures which store health records within UPLX.
Fig. 5. UPLX read phase
UPLX: Blockchain Platform for Integrated Health Data Management
These APIs can be integrated into various analytical systems, enabling access to data recorded in real time from the health organization's internal systems as the patient goes through the consultation process. Access to the data may be permissioned with time restrictions and limited to a subset of the data, depending on the use-case scenario. The validity and success of research studies, big data analytics and artificial intelligence activities depend heavily on the source data they use. For research and analysis to be valid and meaningful, the data should ideally come from its original source, unedited and untouched.
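A time-restricted, field-scoped read permission of the kind described above could look roughly like the following Python sketch. The `ReadGrant` type, the `read_record` helper and the record fields are hypothetical illustrations, not part of the published UPLX API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReadGrant:
    grantee: str           # e.g. a research team (hypothetical)
    fields: frozenset      # record fields the grantee may read
    expires_at: datetime   # end of the access window (UTC)


def read_record(record: dict, grant: ReadGrant, now: datetime) -> dict:
    """Return only the permitted subset of a record, if the grant is still valid."""
    if now >= grant.expires_at:
        raise PermissionError(f"read grant for {grant.grantee} has expired")
    return {key: value for key, value in record.items() if key in grant.fields}


# Hypothetical anonymized record and a 30-day grant covering two fields
record = {"patient_token": "token-123", "blood_type": "O+", "weight_kg": 70, "city": "Kuala Lumpur"}
start = datetime(2021, 1, 1, tzinfo=timezone.utc)
grant = ReadGrant("study-42", frozenset({"blood_type", "weight_kg"}), start + timedelta(days=30))

print(read_record(record, grant, start))  # {'blood_type': 'O+', 'weight_kg': 70}
```

Checking the expiry on every read, rather than only when the grant is issued, is what makes the time restriction enforceable.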
5 UPLX Data Structure
UPLX uses the Hyperledger Fabric model to map its health-based data structure design. The UPLX data structure has two main components: assets and transactions. In Hyperledger Fabric, an asset is defined as a collection of key-value pairs, while transactions invoke chaincode to change or modify the state of an asset [14]. An asset in UPLX can be generalized as a binary representation of a patient's identity, while transactions are the set of activities or actions that can be performed upon a patient. UPLX focuses on recording patient data and is designed specifically to integrate patient records and medical workflows into the blockchain (Fig. 6).
Fig. 6. UPLX data structure summary
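The asset/transaction split can be illustrated with a toy key-value store in Python: each asset is a dictionary of key-value pairs, and each transaction applies changes to one asset while being appended to a log that is never edited. This is only an analogy under our own assumptions; in Fabric chaincode the equivalent reads and writes go through the chaincode shim (for example `GetState` and `PutState`), and the transaction names and fields below are hypothetical.

```python
import json


class WorldState:
    """Toy stand-in for a ledger: current asset values plus an append-only log."""

    def __init__(self):
        self.state = {}  # asset key -> dict of key-value pairs (current version)
        self.log = []    # serialized transactions, oldest first; never edited

    def submit(self, tx_type: str, asset_key: str, changes: dict) -> dict:
        """Apply a transaction's changes to an asset and record the transaction."""
        asset = dict(self.state.get(asset_key, {}))  # copy, then update
        asset.update(changes)
        self.state[asset_key] = asset
        self.log.append(json.dumps(
            {"type": tx_type, "asset": asset_key, "changes": changes}, sort_keys=True))
        return asset


# Hypothetical transaction names and fields
ledger = WorldState()
ledger.submit("CreatePatient", "patient:token-123", {"blood_type": "O+"})
ledger.submit("Triage", "patient:token-123", {"weight_kg": 70})
print(ledger.state["patient:token-123"])  # {'blood_type': 'O+', 'weight_kg': 70}
print(len(ledger.log))                    # 2
```

The current state answers "what is true about this patient now", while the log preserves every change, which is what allows a lifelong health record to be reconstructed.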
UPLX transactions that directly affect the patient's status are categorized into four main types:
• Constant physical data (e.g. ethnicity, blood type)
• Variable physical data (e.g. height, weight)
• Location data (e.g. city, state)
• Social data (e.g. marital status, number of children)
These categories are then paired with eight medical workflows defined as transactions, that is, medical activities that affect or change the patient asset:
• Consultations
• Triage
• Issues
• Tests
• Outpatient/Inpatient
• Prescriptions
• Procedures
• History
Each transaction has its own set of data structures, which describe how it relates to and affects an asset (Fig. 7).
Fig. 7. Example - create patient data structure JSON API
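The fields of the create-patient JSON in Fig. 7 are not reproduced in the text, so the following Python sketch is purely hypothetical: it groups illustrative field names under the four patient-data categories listed above and serializes them as a JSON payload, rejecting fields it does not recognize.

```python
import json

# The four patient-data categories listed above; the field names are hypothetical.
CATEGORIES = {
    "constant_physical": {"ethnicity", "blood_type"},
    "variable_physical": {"height_cm", "weight_kg"},
    "location": {"city", "state"},
    "social": {"marital_status", "number_of_children"},
}
KNOWN_FIELDS = set().union(*CATEGORIES.values())


def build_create_patient(patient_token: str, org_id: str, data: dict) -> str:
    """Group raw patient fields by category into a JSON create-patient payload."""
    unknown = set(data) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unrecognised fields: {sorted(unknown)}")
    payload = {"transaction": "CreatePatient", "org": org_id, "patient": patient_token}
    for category, fields in CATEGORIES.items():
        payload[category] = {k: v for k, v in data.items() if k in fields}
    return json.dumps(payload, sort_keys=True)


message = build_create_patient(
    "token-123", "HospitalA", {"blood_type": "O+", "weight_kg": 70, "city": "Kuala Lumpur"})
print(json.loads(message)["variable_physical"])  # {'weight_kg': 70}
```

Validating fields against a fixed schema before submission keeps every participating organization's records structurally comparable.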
A combination of APIs allows any health and medical record system to integrate its workflows and participate in the UPLX blockchain platform. As in other blockchain platforms, each medical organization participating in UPLX generates its own private key, which is used to encrypt its patient data. UPLX's comprehensive data structure allows for a robust implementation of blockchain-based health data records. The data structures are designed to track a person's health records over their whole life, from simple medical consultations, inpatient treatments and health checkups to the disbursement and consumption of medications and specific drugs.
UPLX has great potential for use cases in drug research and traceability, clinical trials, disease tracking and even indirect processes such as health insurance claims.
6 Conclusion
As blockchain technology gains more exposure, its adoption in industry applications deserves closer study if its benefits are to be fully realized. Many blockchain projects are gathering momentum as researchers and practitioners continue to experiment with its capabilities and limits. At its core, a blockchain produces a set of secure, immutable and trusted data. Data is the foundation of research and analysis in many areas of industry and academia, and the validity, accuracy and reliability of any research stem from the trustworthiness of its data source. We believe the blockchain-powered UPLX platform described in this paper can function as a unified platform to securely consolidate, record and store patient medical data. It can operate as a trusted source of health data and distribute health records anonymously to improve the reliability, accuracy and validity of research and development of medications and vaccines, while protecting patients' confidentiality and privacy. In addition, because UPLX uses a permissioned blockchain architecture, it does not need a computationally expensive proof-of-work consensus algorithm, which increases its scalability and efficiency. UPLX is currently being prototyped and deployed to pilot institutions that have indicated willingness to participate. Besides fine-tuning the prototype based on feedback from the pilot, future work will focus on developing an improved dynamic and distributed trust model for blockchain-based self-sovereign identity management, which will strengthen the security and privacy of autonomous UPLX users.
Acknowledgment. The authors acknowledge the support of Malaysia's Fundamental Research Grant Scheme under FRGS/1/2018/ICT04/UNITAR/03/1.
References
1. Hamzah, F.A., et al.: CoronaTracker: Worldwide COVID-19 Outbreak Data Analysis and Prediction. World Health Organisation, Bull cE-publication (2020)
2. Ozair, F., Jamshed, N., Sharma, A., Aggarwal, P.: Ethical issues in electronic health records: a general overview. Perspect. Clin. Res. 6(2), 73 (2015)
3. Reuters. https://www.reuters.com/article/us-singapore-health/us-finds-american-guilty-insingapore-hiv-data-leak-case-idUSKCN1T709J. Accessed 20 June 2020
4. FBI. https://archives.fbi.gov/archives/washingtondc/press-releases/2012/former-howarduniversity-hospital-employee-pleads-guilty-to-selling-personal-information-about-patients. Accessed 20 June 2020
5. Greene, A.H.: HIPAA compliance for clinician texting. J. AHIMA 83(4), 34–36 (2012)
6. Healthcare Compliance Analytics-Protenus. https://www.protenus.com/
7. Magyar, G.: Blockchain: solving the privacy and research availability tradeoff for EHR data: a new disruptive technology in health data management. In: IEEE 30th Neumann Colloquium (NC), Budapest, pp. 000135–000140 (2017)
8. de Oliveira, M.T., et al.: Towards a blockchain-based secure electronic medical record for healthcare applications. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Shanghai, China, pp. 1–6 (2019)
9. Theodouli, A., et al.: On the design of a blockchain-based system to facilitate healthcare data sharing. In: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), New York, NY, pp. 1374–1379 (2018)
10. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system, 28 (2008)
11. Androulaki, E., Barger, A., Bortnikov, V., et al.: Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains (2018)
12. Levi, S.D., et al.: An Introduction to Smart Contracts and Their Potential and Inherent Limitations (2019)
13. IBM Research Group: Behind the Architecture of Hyperledger Fabric. https://www.ibm.com/blogs/research/2018/02/architecture-hyperledger-fabric. Accessed 20 June 2020
14. Dorri, A., Kanhere, S.S., Jurdak, R., Gauravaram, P.: Blockchain for IoT security and privacy: the case study of a smart home. In: IEEE International Conference on Pervasive Computing and Communications Workshops, pp. 618–623 (2017)
Convolutional Neural Networks for Automatic Detection of Colon Adenocarcinoma Based on Histopathological Images
Yakoop Qasim(B), Habeb Al-Sameai, Osamah Ali, and Abdulelah Hassan
Department of Mechatronics and Robotics Engineering, Taiz University, Taiz, Yemen
Abstract. Colorectal cancer is the second leading cause of cancer death and the third most common cancer in terms of number of cases. Because the disease has no symptoms in its early stages, several types of tests must be performed to detect it, and these tests are time-consuming, costly and require a specialized expert. In this paper, we propose a Convolutional Neural Network (CNN) model that diagnoses colon adenocarcinoma quickly and accurately with a small number of parameters. We focus on colon adenocarcinoma because it is the most common type of colorectal cancer, representing 95% of all colorectal cancer cases. The model is trained on a dataset of 10,000 histopathological images: 5000 images of colon adenocarcinoma and 5000 images of benign colonic tissue. The model consists of two paths, each of which creates 256 feature maps, to increase the number of features at different levels and so improve the accuracy and sensitivity of the classification. For comparison, a Visual Geometry Group (VGG16) model was trained on the same dataset. After training, the proposed model and VGG16 achieved accuracies of 99.6% and 96.2%, respectively. The proposed model also achieved a sensitivity of 99.6% and an Area Under the Curve (AUC) of 99.6%, which indicates its effectiveness in diagnosing colon adenocarcinoma.
Keywords: Deep learning · Colorectal cancer · Convolutional neural networks
1 Introduction
Colorectal cancer is cancer that arises in the colon or rectum; the colon is also known as the large intestine, and the rectum is its final section. According to the World Health Organization (WHO), colorectal cancer was the second leading cause of cancer death and the third most common cancer in 2018 [1]. There are many types of colorectal cancer, such as adenocarcinoma, carcinoid tumors, gastrointestinal stromal tumors and colorectal lymphoma; adenocarcinoma is the most common, representing about 95–98% of all colorectal cancer cases [2]. Symptoms of colorectal cancer include constipation, diarrhea, changes in stool color, blood in the stool and bleeding from the rectum. These symptoms often do not appear in the early stages, and therein lies the danger of colorectal cancer [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 19–28, 2021. https://doi.org/10.1007/978-3-030-70713-2_3
Y. Qasim et al.
To diagnose colorectal cancer, doctors perform a colonoscopy, using a colonoscope (a long, thin, flexible tube attached to a camera and screen) to view the inside of the colon and rectum. If doctors find any suspicious areas, surgical instruments can be passed through the colonoscope to take tissue samples (biopsies) [3]. However, histopathological diagnosis requires an expert pathologist who can assess the size of cell nuclei, the shape of cells and their position in the tissue, which increases diagnostic time and cost.
Artificial intelligence can make this process easier: deep learning algorithms can diagnose colorectal cancer from histopathological images faster and at lower cost, and there are many studies in this field. In [4], the authors used two transfer-learning approaches: (i) CNN as a fixed feature generator, in which a VGG16 model [5] extracts features that are then fed into a separate machine learning algorithm to complete the classification; and (ii) fine-tuning the CNN, in which the lower layers of VGG16 are frozen and the top layers are adapted to the new classification task. The authors trained and tested their models on a dataset of 13,500 whole-slide images of colorectal tissue spread over three classes (adenocarcinoma, tubulovillous adenoma and healthy tissue) and obtained an accuracy of 96%. In [6], the authors applied different transfer-learning strategies by varying the number of frozen layers. They trained and tested four popular models on a dataset of 1577 confocal laser microscopy images spread over four classes (healthy colon, malignant colon, healthy peritoneum and malignant peritoneum) and obtained an area under the curve of 97.1% for classifying metastases in the peritoneum.
One disadvantage of that study was that the dataset used was small and came from rats. In [7], a CNN model with 43 convolution layers was presented to classify endoscopy images into three classes (adenocarcinoma, adenomatous polyps and normal); when tested on 410 images, it achieved an accuracy of 94.4%. In [8], the authors compared a traditional approach with a transfer-learning approach. The traditional approach used five state-of-the-art feature extraction techniques followed by a Support Vector Machine (SVM) classifier, while the second approach used a CNN model, InceptionV3 [9], as both feature extractor and classifier. Both approaches were used to classify four classes (normal, hyperplastic polyps, tubular adenoma and carcinoma) on a dataset of 4000 histology images, 1000 per class. The best accuracy and sensitivity were 94.5% and 95.21%, respectively. The main contribution of this paper is a CNN model that diagnoses colon adenocarcinoma from histopathological images with high accuracy. The model was trained on a large dataset, which lends reliability to its performance, and it has a small number of parameters, so it needs little storage space and can be used on any platform or framework.
Convolutional Neural Networks for Automatic Detection
2 Methods and Dataset
2.1 Dataset
The dataset used in this work was extracted from the LC25000 dataset, which is available online [10]. Its colorectal cancer folder consists of two subfolders: colon_aca, with 5000 histopathological images of colon adenocarcinoma, and colon_n, with 5000 histopathological images of benign colonic tissue. Before training, we divided the dataset into 70% for training and 30% for validation.
2.2 Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a type of neural network specialized for image classification and computer vision tasks. A typical CNN architecture has: a convolution layer, which extracts features from the input array by applying different filters to its pixels, producing what is known as a convolved feature map [11]; a pooling layer, which reduces the size of the feature map to lower the computational cost; a Rectified Linear Unit (ReLU), an activation function defined as f(x) = max(0, x), which accelerates training and mitigates the vanishing gradient problem [12]; and a fully connected layer, which takes the outputs of the previous layers, flattens them into a single vector and gives a prediction for each class. A CNN architecture can thus be divided into two parts: the first, consisting of convolution, pooling and ReLU layers, extracts the features; the second, consisting of fully connected layers, performs the classification. Figure 1 shows how the convolution layer extracts features and creates feature maps, and how the max pooling layer reduces the size of the feature map.
Fig. 1. The process of extracting and reducing the feature maps.
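The three feature-extraction operations described above can be sketched in plain Python for a single-channel image. This is an illustrative toy under our own assumptions (a hand-picked 3 × 3 vertical-edge kernel instead of learned weights, one input channel, no bias); as in common deep learning libraries, the "convolution" is computed as cross-correlation.

```python
def conv2d_same(image, kernel):
    """2-D cross-correlation with stride 1 and zero 'same' padding (odd-sized kernel)."""
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # pixels outside the image count as 0
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(fmap):
    """Element-wise f(x) = max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                 fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
             for j in range(w)] for i in range(h)]


# A 4 x 4 image with a dark-to-bright vertical edge, and a vertical-edge kernel
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 0, 1]] * 3

fmap = relu(conv2d_same(image, kernel))  # 4 x 4 feature map highlighting the edge
pooled = max_pool2x2(fmap)               # 2 x 2 after pooling
print(pooled)  # [[3.0, 3.0], [3.0, 3.0]]
```

ReLU zeroes the negative responses on the far side of the edge, and pooling keeps the strongest response in each 2 × 2 window while halving the spatial size.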
2.3 Proposed Model
It is known that increasing the depth of a model improves classification accuracy [13]. However, this may not apply to medical diagnostics, as increasing the depth of the model may not improve its performance on biological images [14]. Instead, the number of feature maps should be increased to capture the details most important for diagnosis. We therefore present a model with two paths, designed to obtain both high-level and low-level features at a depth suited to medical images, and with a low number of parameters to reduce computational resources and training time.
Table 1. The architecture of the proposed model for each path.

| Layer | Input feature | Stride | Padding | Output feature | Parameters |
|---|---|---|---|---|---|
| Input layer | 50 × 50 × 3 | − | − | 50 × 50 × 3 | 0 |
| Conv1 | 50 × 50 × 3 | 1 | Same | 50 × 50 × 32 | 896 |
| Max pooling | 50 × 50 × 32 | 2 | − | 25 × 25 × 32 | 0 |
| Drop out | 25 × 25 × 32 | − | − | 25 × 25 × 32 | 0 |
| Conv2 | 25 × 25 × 32 | 1 | Same | 25 × 25 × 64 | 18,496 |
| Max pooling | 25 × 25 × 64 | 2 | − | 12 × 12 × 64 | 0 |
| Drop out | 12 × 12 × 64 | − | − | 12 × 12 × 64 | 0 |
| Conv3 | 12 × 12 × 64 | 1 | Same | 12 × 12 × 128 | 73,856 |
| Max pooling | 12 × 12 × 128 | 2 | − | 6 × 6 × 128 | 0 |
| Drop out | 6 × 6 × 128 | − | − | 6 × 6 × 128 | 0 |
| Conv4 | 6 × 6 × 128 | 1 | Same | 6 × 6 × 256 | 295,168 |
| Max pooling | 6 × 6 × 256 | 2 | − | 3 × 3 × 256 | 0 |
| Drop out | 3 × 3 × 256 | − | − | 3 × 3 × 256 | 0 |
| GAP | 3 × 3 × 256 | None | Valid | 256 | 0 |
| Dense1 | 256 | − | − | 1024 | 263,168 |
| Drop out | 1024 | − | − | 1024 | 0 |
| Dense2 | 1024 | − | − | 2 | 2050 |
As shown in Table 1 and Fig. 2, our model consists of two paths. Each path consists of four blocks, and each block combines a convolution layer with ReLU, a max pooling layer and a dropout layer [15], which helps prevent over-fitting [16]. The two paths are followed by Global Average Pooling (GAP), which reduces each feature map to a single average value [17]. The input layer of the proposed model has a fixed size of 50 × 50 × 3 pixels. The kernel size of all convolution layers is 3 × 3, because this is the smallest size that can capture features in the left/right and up/down directions, and in diagnosis based on histopathological images the smallest details are important to the success of the classification. The first, second, third and fourth convolution layers apply 32, 64, 128 and 256 filters, respectively. The output (dense) layer consists of two neurons with a softmax activation function. As Table 1 also shows, each path creates 256 feature maps at different levels. The total number of parameters is 653,634, which is small compared to popular models and reduces the time required for training. To evaluate the effectiveness of the proposed model, a VGG16 model was prepared and trained on the same dataset. VGG16 was chosen because it is one of the most commonly used models for image recognition and has a simple structure.
Fig. 2. The architecture of the proposed model
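The parameter counts in Table 1 follow the standard formulas: a convolution layer has (k × k × C_in + 1) × C_out parameters (weights plus one bias per filter), and a dense layer has (N_in + 1) × N_out. The short Python check below reproduces each entry; summing Table 1's entries once (the four convolutions plus Dense1 and Dense2) gives the 653,634 total quoted above. It also traces the spatial size through the four pooling stages and shows what GAP computes. The helper names are our own.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights (kernel x kernel x c_in per filter) plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return (n_in + 1) * n_out


def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (one per channel) to its mean value."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]


conv = [conv_params(3, 3, 32), conv_params(3, 32, 64),
        conv_params(3, 64, 128), conv_params(3, 128, 256)]
dense = [dense_params(256, 1024), dense_params(1024, 2)]
print(conv)                     # [896, 18496, 73856, 295168]
print(dense)                    # [263168, 2050]
print(sum(conv) + sum(dense))   # 653634

# Spatial size after each 'same' conv + 2x2, stride-2 max pooling (floor division)
sizes = [50]
for _ in range(4):
    sizes.append(sizes[-1] // 2)
print(sizes)                    # [50, 25, 12, 6, 3]
```

The floor division explains the 25 × 25 to 12 × 12 step in Table 1, and GAP then turns the final 3 × 3 × 256 tensor into 256 numbers, one per feature map, with no trainable parameters.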
2.4 Transfer Learning and Fine-Tuning
Transfer learning is a technique of taking weights learned on one problem and applying them to a new, similar problem [18], while fine-tuning is a way of applying transfer learning by adapting a pre-trained model to a new classification task. In our work, we used transfer learning to re-train the VGG16 model, keeping the weights of the lower layers or blocks fixed and fine-tuning the weights of the top layers. Table 2 shows the hyper-parameters for the two models.
Table 2. Hyper-parameters for the two models.

| Hyper-parameter | Value |
|---|---|
| Batch size | 32 |
| Epochs | 30 |
| Image size | 50 × 50 |
| Optimizer | Adam |
| Loss function | Categorical cross-entropy |
| Learning rate | 1e−3 |
3 Results
Figures 3 and 4 show the training and validation accuracy and loss curves obtained after training the proposed model.
Fig. 3. Accuracy curves for the proposed model
Fig. 4. Loss curves for the proposed model.
To evaluate the performance of our model and the comparison model, we calculated the confusion matrix and plotted the Receiver Operating Characteristic (ROC) curve to calculate the Area Under the Curve (AUC).
Confusion Matrix
The confusion matrix gives four quantities from which the evaluation metrics are calculated: True Positives (TP), colon adenocarcinoma cases correctly predicted as colon adenocarcinoma; False Positives (FP), benign colon cases classified as colon adenocarcinoma; True Negatives (TN), benign colon cases classified as benign colon; and False Negatives (FN), colon adenocarcinoma cases classified as benign colon. Fig. 5 shows the confusion matrices for the two models, and Table 3 gives the four quantities for each model. The evaluation metrics computed from these quantities are accuracy, sensitivity, specificity, precision and F-measure. Sensitivity is the percentage of colon adenocarcinoma cases that were correctly classified; specificity is the percentage of benign colon cases that were correctly classified; precision is the percentage of cases classified as colon adenocarcinoma that actually are colon adenocarcinoma; and the F-measure is the harmonic mean of sensitivity and precision. Table 4 shows the evaluation metrics for the two models.
Fig. 5. Confusion Matrix for the Proposed Model (left) and for the VGG16 Model (right).
Table 3. The four parameters for the two models.

| Model | TP | FP | TN | FN |
|---|---|---|---|---|
| VGG16 | 1435 | 49 | 1451 | 65 |
| The proposed | 1494 | 6 | 1494 | 6 |
Table 4. Evaluation metrics for the two models.

| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F-Measure (%) |
|---|---|---|---|---|---|
| VGG16 | 96.2 | 95.67 | 96.73 | 96.7 | 96.18 |
| The proposed | 99.6 | 99.6 | 99.6 | 99.6 | 99.6 |
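$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{F-Measure} = \frac{2 \times \text{Sensitivity} \times \text{Precision}}{\text{Sensitivity} + \text{Precision}} \tag{5}$$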
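Equations (1) to (5) translate directly into code. In the minimal Python sketch below (the function name is our own), the counts (TP, FP, TN, FN) = (1435, 49, 1451, 65) reproduce the VGG16 row of Table 4, and (1494, 6, 1494, 6) gives 99.6% for every metric, matching the proposed model's row.

```python
def evaluation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Equations (1)-(5): accuracy, sensitivity, specificity, precision, F-measure."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f_measure": 2 * sensitivity * precision / (sensitivity + precision),
    }


vgg16 = evaluation_metrics(tp=1435, fp=49, tn=1451, fn=65)
proposed = evaluation_metrics(tp=1494, fp=6, tn=1494, fn=6)
print({k: round(100 * v, 2) for k, v in vgg16.items()})
# {'accuracy': 96.2, 'sensitivity': 95.67, 'specificity': 96.73, 'precision': 96.7, 'f_measure': 96.18}
```

Both sets of counts sum to 3000, consistent with a 30% hold-out of the 10,000-image dataset.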
ROC and AUC
The ROC curve is a two-dimensional graph that shows the performance of a classification model at all classification thresholds [19]; it can also be viewed as showing the trade-off between sensitivity and specificity [20]. The ROC curve plots the True Positive Rate (TPR, i.e. sensitivity) against the False Positive Rate (FPR, i.e. 1 − specificity). We plotted the ROC curves for the two models to calculate the AUC, which measures the area under the ROC curve and ranges from zero to one: an AUC of one means the model separates the two classes perfectly, and an AUC of zero means its predictions are completely reversed. A higher AUC corresponds to fewer FP and FN cases, and in medical diagnosis an AUC close to one is required [22, 23]. Figure 6 shows the ROC curves; the AUC is 99.6% for the proposed model and 96.2% for the VGG16 model.
Fig. 6. ROC and AUC for the two models.
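The relationship between the ROC curve and the AUC can be illustrated with a short Python sketch on made-up scores (the labels and scores below are hypothetical, not the paper's data). It builds the (FPR, TPR) points by sweeping the threshold over every distinct score, integrates them with the trapezoidal rule, and checks the result against the equivalent rank-based definition: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting half. In practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = adenocarcinoma) and model scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(trapezoid_auc(roc_points(labels, scores)), 4))  # 0.8889
print(round(rank_auc(labels, scores), 4))                   # 0.8889
```

Here one positive (score 0.6) is outranked by one negative (score 0.7), so 8 of the 9 positive-negative pairs are ordered correctly, giving an AUC of 8/9.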
4 Discussion and Conclusion
In this work, we proposed a two-path deep learning model to diagnose colon adenocarcinoma from histopathological images, with a low number of parameters to reduce computational resources and training time. The proposed model was tested on 3000 images and performed exceptionally well, achieving 99.6% for all metrics (accuracy, sensitivity, specificity, precision and F-measure) and outperforming the VGG16 model. Table 4 shows that the sensitivity is very high, which means the model is very sensitive to images of colon adenocarcinoma and suitable for use in diagnosing colon cancer. Our model also achieved an AUC of 99.6%, a value considered excellent in medical diagnosis; a higher AUC means fewer FP and FN cases, and therefore better classification and diagnostic results. We conclude that models built in this way are very effective for medical diagnosis based on histopathological images. The model has one limitation: it was not trained on all types of colorectal cancer, as the focus has been on colon adenocarcinoma, which constitutes the vast majority of colorectal cancer cases. In the future, we aim to train the model on all types of colorectal cancer and create a diagnostic platform or framework. Table 5 compares the results of previous studies with ours.
Table 5. Comparison between the previous studies and our study.

| Study | Dataset used | Approaches / Models | Results |
|---|---|---|---|
| Francesco et al. [4] | 13,500 whole-slide images (WSIs) | (i) CNN as feature generator: VGG16 + SVM; (ii) fine-tuning: VGG16 | Accuracy = 96% |
| Nils et al. [6] | Confocal laser microscopy (CLM), 1577 images | InceptionV3, DenseNet121, SE-ResNeXt50 & VGG-16 | AUC = 97.1% |
| Hyun et al. [7] | 49,458 endoscopy images | CNN with 43 layers | Accuracy = 94.39% |
| Junaid et al. [8] | QU-AHLI, 4000 images | (i) Traditional approach: rLPQ + SVM, rLBP + SVM, uniform rLBP + SVM, Haralick + SVM, (rLPQ + rLBP) + SVM; (ii) Transfer learning approach: InceptionV3 | Accuracy = 94.4%, AUC = − |
| This study | LC25000 | The proposed model | Accuracy = 99.6%, AUC = 99.6% |
References
1. World Health Organization: Cancer. http://www.who.int/health-topics/cancer. Accessed 28 Jul 2020
2. Cancer.org: What is colorectal cancer. http://www.google.com/amp/s/amp.cancer.org/cancer/colon-rectal-cancer/about/what-is-colorectal-cancer.html. Accessed 2 Aug 2020
3. Chun, C.: Colorectal cancer: symptoms, treatment, risk factors, and causes. http://www.medicalnewstoday.com/articles/155598. Accessed 2 Aug 2020
4. Ponzio, F., Enrico, M., Elisa, F., Santa, D.: Colorectal cancer classification using deep convolutional networks. In: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies, vol. 2 (2018)
5. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
6. Streiner, D.L., John, C.: What's under the ROC? An introduction to receiver operating characteristics curves. Can. J. Psychiatry 52(2), 121–128 (2007)
7. Gessert, N., Marcel, B., Lukas, W., Daniel, D.: Deep transfer learning methods for colon cancer classification in confocal laser microscopy images. Int. J. Comput. Assist. Radiol. Surg. 14(11), 1837–1845 (2019)
8. Park, H., Yoon, K., Sang, L.: Adenocarcinoma recognition in endoscopy images using optimized convolutional neural networks. Appl. Sci. 10(5), 1650 (2020)
9. Malik, J., Serkan, K., Suchitra, K., Turker, I., Somaya, A., Ridha, H., Moncef, G.: Colorectal cancer diagnosis from histology images: a comparative study. arXiv preprint arXiv:1903.11210 (2019)
10. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
11. Borkowski, A., Marilyn, M., Brannon, T., Catherine, P., Lauren, A., Stephen, M.: Lung and colon cancer histopathological image dataset (LC25000). arXiv preprint arXiv:1912.12142 (2019)
12. Pinaya, W., Garcia-Dias, S., Mechelli, A.: Convolutional neural networks. https://doi.org/10.1016/B978-0-12-815739-8.00010-9
13. Ide, H., Takio, K.: Improvement of learning for CNN with ReLU activation by sparse regularization. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2684–2691. IEEE (2017)
14. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
15. Min, S., Byunghan, L., Sungroh, Y.: Deep learning in bioinformatics. Briefings Bioinf. 18(5), 851–869 (2017)
16. Hinton, G., Srivastava, N., Krizhevsky, A.: Improving neural networks by preventing co-adaptation of feature detectors [R/OL] (2015)
17. Hawkins, D.M.: The problem of overfitting. J. Chem. Inf. Comput. Sci. 44(1), 1–12 (2004)
18. Chris: What are max pooling, average pooling, global max pooling and global average pooling? https://www.machinecurve.com/index.php/2020/01/30/what-are-max-poolingaverage-pooling-global-max-pooling-and-global-average-pooling/. Accessed 27 Jun 2020
19. Cheng, B., Liu, M., Zhang, D., Munsell, B.C., Shen, D.: Domain transfer learning for MCI conversion prediction. IEEE Trans. Biomed. Eng. 62(7), 1805–1817 (2015)
20. Developers.google: Classification: ROC curve and AUC. https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc. Accessed 16 Aug 2020
21. Korsten, M.A.: Application of summary receiver operating characteristics (SROC) analysis to diagnostic clinical testing. In: 7th Reflections on the Future of Gastroenterology – Unmet Needs, vol. 52, p. 76 (2007)
22. Streiner, D.L., Cairney, J.: What's under the ROC? An introduction to receiver operating characteristics curves. Can. J. Psychiatry 52(2), 121–128 (2007)
23. Siddiqui, M.K., Morales-Menendez, R., Ahmad, S.: Application of receiver operating characteristics (ROC) on the prediction of obesity. Brazilian Arch. Biol. Technol. 63 (2020)
Intelligent Health Informatics with Personalisation in Weather-Based Healthcare Using Machine Learning
Radiah Haque1, Sin-Ban Ho1(B), Ian Chai1, Chin-Wei Teoh1, Adina Abdullah2, Chuie-Hong Tan3, and Khairi Shazwan Dollmat1
1 Faculty of Computing and Informatics, Multimedia University, 63100 Cyberjaya, Malaysia
{sbho,ianchai,shazwan.dollmat}@mmu.edu.my
2 Department of Primary Care Medicine, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
3 Faculty of Management, Multimedia University, 63100 Cyberjaya, Malaysia
Abstract. Enhancing personalisation is important for productive collaboration between humans and machines, because integrating human intelligence with cognitive computing can add value to healthcare. While well-being and human health can be profoundly affected by the weather, the effect of machine learning on personalised weather-based healthcare for self-management is unclear. This paper seeks to understand how machine learning use affects the personalisation of weather-based healthcare. Based on the Uses and Gratifications Theory (UGT), new constructs (demography, weather and effectiveness) are incorporated to propose a model for health science with machine learning use, weather-based healthcare and personalisation. Subsequently, this paper proposes building a system that can predict the symptoms of two diseases (asthma and eczema) based on weather triggers. The outcome of this paper will provide a deeper understanding of how personalisation is affected by machine learning usage and weather-based healthcare for individual patients' self-management and early prevention. The findings will also help machine learning facilitators design effective use policies for weather-based healthcare, contributing new fundamental knowledge on personalisation to the future of intelligent health informatics and artificial intelligence.
Keywords: Machine learning · Intelligent health informatics · Artificial intelligence · Weather-based healthcare · Mobile application
1 Introduction
Weather-based healthcare, which refers to the self-management of chronic diseases that are affected by the weather, helps patients change their lifestyle to avoid weather triggers that can worsen their symptoms. However, it is difficult for individual patients to change their lifestyle for self-management and early prevention, because many of them are unaware of which weather triggers they are vulnerable to, and this may depend on their demographic characteristics (e.g. age and disease severity level). Many weather-based healthcare systems for self-management fail to be widely adopted because they lack support for personalisation (i.e. providing feedback based on individual patients' demographic characteristics and weather triggers). Machine learning can address this problem, because it enables computers to learn from the relevant weather and demographic data of individual patients and provide personalised predictions.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 29–40, 2021. https://doi.org/10.1007/978-3-030-70713-2_4
R. Haque et al.
Machine learning is the scientific study of statistical models and algorithms that systems use to perform tasks without explicit instructions, relying on patterns and inference instead [1]. In other words, machine learning is a subset of data science and artificial intelligence in which computers learn, through algorithms, predictive relationships from a set of data [2]. Systems using machine learning can therefore discover and learn from patterns in data and use them to make independent decisions. The application of machine learning is driven by the availability of large amounts of data and lower-cost computation. With this technology, many more topics can be researched, producing results or decisions that are more accurate and useful to the community. One such topic is healthcare, where machine learning techniques have made advances, because healthcare data offers vast opportunities to learn the patterns linking individual patients' symptom triggers, medical history and demographic characteristics, which in turn supports process automation and personalised predictions.
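The idea of learning from one patient's weather data can be illustrated with a small sketch. The paper does not name a learning algorithm at this point, so, as one hedged illustration, the Python code below fits a per-patient logistic regression by batch gradient descent. The diary entries, the feature scaling and the learning rate are all hypothetical, and a real system would need far more data as well as the demographic features mentioned above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def features(humidity_pct: float, temp_c: float):
    # Hypothetical scaling that roughly centres both features around 0
    return [humidity_pct / 100.0 - 0.5, temp_c / 40.0 - 0.5]


# Hypothetical diary for ONE patient: (humidity %, temperature in C) -> symptom flare (1) or not (0)
diary = [((90, 30), 1), ((85, 28), 1), ((80, 18), 1),
         ((40, 30), 0), ((35, 25), 0), ((50, 20), 0)]
X = [features(h, t) for (h, t), _ in diary]
y = [label for _, label in diary]

w, b = train_logistic(X, y)
p_humid = predict_proba(w, b, features(88, 27))
p_dry = predict_proba(w, b, features(38, 27))
print(f"flare risk on a humid day: {p_humid:.2f}, on a dry day: {p_dry:.2f}")
```

Training one small model per patient is a simple way to make predictions personal: for this hypothetical patient, humidity rather than temperature separates flare days from calm days, so the learned humidity weight is positive.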
Meanwhile, well-being and human health can be profoundly affected by the weather. The weather may also be associated with allergies and respiratory diseases, possibly through its link with pollution levels and pollen concentrations [3]. However, the impact of machine learning on personalised weather-based healthcare is still unclear. Compared with the effectiveness of machine learning techniques in other commercial domains, such as face and image recognition, weather-based analytics and prediction for self-management still lag behind. Therefore, this paper further explores the relationship between weather conditions and chronic diseases that are affected by weather. Two diseases are considered in this study: asthma and eczema. Furthermore, a model of intelligent health informatics with personalised healthcare is proposed, using machine learning techniques to predict asthma attacks and the worsening of eczema for individual patients based on weather triggers. This paper aims to benefit the information and communication technologies and healthcare sectors. Its outcome is intended to assist machine learning developers and researchers to design effective use policies for weather-based healthcare, and to provide invaluable feedback to government healthcare systems.
2 Background Study
2.1 Influence of Weather on Asthma and Eczema
Weather and climate changes have a critical influence on patients with chronic asthma. Alharbi and Abdullah [4] state that asthma attacks are affected by changes in temperature and humidity. The Asthma and Allergy Foundation of America (AAFA) [5] further explains that increased atmospheric humidity and thunderstorms can trigger asthma attacks. This is because humidity helps dust mites thrive and increases the number of pollen particles in the air, both of which can aggravate asthma. Heavy rain and strong wind during thunderstorms can break pollen grains into smaller particles that travel more easily through the air, which has a critical effect when asthma patients inhale this pollen-laden air into their lungs. A significant incident reflecting the influence of humidity and thunderstorms on asthma is the 2016 event in Melbourne, Australia, when a thunderstorm affected thousands of asthma patients living in the city. Such episodes are known as “thunderstorm asthma” [6]. Moreover, an analysis of 15,678 asthma hospital admissions in Shanghai, China found that cold temperatures can trigger asthma attacks [7].
Eczema, often associated with skin allergies, is a skin barrier dysfunction that causes skin dryness, itchiness and irritation. Because the skin is directly exposed to the environment, the skin barrier is important for blocking allergens and germs in the air from penetrating the skin surface. Eczema causes the skin to lose its ability to adapt to climate changes [8]. Vocks et al. [9] found that patients with eczema suffer more itchiness and irritation in winter than in summer, and during thunderstorms, due to colder temperatures and windy conditions. Wind also increases the amount of pollen in the air, which comes into contact with the skin and can cause dryness and irritation. Respiratory and skin diseases affect the general population around the world, and their severity varies from patient to patient under different weather conditions. Weather is therefore an important factor for individual patients with asthma and eczema to monitor.
2.2 Methods for Patients to Self-monitor Asthma and Eczema
For personalised healthcare, patients need control tests that allow them to self-monitor the severity of their asthma or eczema on their own, wherever they are. The AAFA presents the Asthma Control Test (ACT), which is recommended by medical experts, as the standard test for monitoring asthma [5]. Asthma patients can use the ACT to identify the severity of their asthma, which in turn helps doctors and nurses determine the required treatment. The ACT has a scaling index that lets patients record the severity of their asthma easily. Meanwhile, Charman et al. [10] suggested the Patient-Oriented Eczema Measure (POEM) as the standard assessment for identifying the severity of eczema, and derived severity strata for interpreting its scores. The POEM assessment likewise has a scaling index that lets patients record the severity of their eczema easily. According to the Centre of Evidence Based Dermatology at the University of Nottingham [11], the National Institute for Health and Care Excellence (NICE) recommends the use of POEM in clinical guidelines. In eczema trials, the HOME (Harmonizing Outcome Measures for Eczema) initiative recommends POEM as the core instrument for measuring patient-reported eczema symptoms. Both methods therefore support patient self-monitoring of asthma and eczema, and the scores obtained help medical personnel classify severity and determine the required treatment efficiently. Building on the wide use of ACT and POEM for self-monitoring asthma and eczema respectively, this paper aims to relate them to the weather, which can allow patients to identify the triggers of their disease based on the weather conditions at their location.
2.3 Machine Learning in Weather-Based Healthcare
The effect of weather conditions on asthma and eczema calls for further research into how computers can support personalised healthcare awareness for self-management and assess the effects of temperature, exposure to allergens, changes in barometric pressure, humidity and wind. Information extraction and machine learning promise lower costs and can discover patterns in large amounts of data while dealing with uncertainties and probabilities. A machine learning application works differently from a conventional application: instead of following explicitly programmed rules, it learns its behaviour from data [12]. Figure 1 illustrates how machine learning fits with weather-based healthcare within data science [13]. In the context of information extraction and machine learning, regression analytics challenges need to be revisited for two reasons. Firstly, machine learning relies extensively on statistical methods, and the ultimate goal is to model a real-world weather-based healthcare system with mathematical relationships. Secondly, the lack of standards for measuring the relationships among variables makes it harder to choose the most effective method. Many regression methods assume uniform training data [14], which makes fitting regressions to non-uniform data challenging.
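To make the regression discussion concrete, the following is a minimal sketch, using only the Python standard library, that fits an ordinary least-squares line between one weather variable and one patient’s symptom score. The function name, the data and the choice of daily mean temperature against ACT score are hypothetical illustrations, not results from this study; a realistic model would be multivariate and would have to cope with the non-uniform data described above.

```python
from statistics import mean


def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ~ intercept + slope * x; returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical daily records for one patient (illustrative values only):
# daily mean temperature in degrees Celsius and the ACT score reported that day.
temps = [4.0, 8.0, 12.0, 16.0, 20.0]
act_scores = [14, 16, 18, 20, 22]

intercept, slope = fit_simple_ols(temps, act_scores)
print(f"ACT score ~ {intercept:.1f} + {slope:.2f} * temperature")
```

Because a higher ACT score means better asthma control, a positive slope on data like this would mean better control on warmer days, which is the pattern one would expect if cold temperatures act as a trigger [7].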
Fig. 1. Machine learning in weather-based healthcare.
In recent years, with the expansion of computer-assisted systems in health, attention has turned to developing tools such as mobile health (mHealth) applications that use machine learning to support self-management and early prevention. Table 1 summarises examples of machine learning use in weather-based healthcare for self-management of asthma and eczema, and highlights the limitations of these proposed models. The background study shows that implementing a weather-based healthcare system with personalisation support for self-management has proven to be a challenge. This is because it is difficult to identify the impact of weather on an individual patient’s symptoms, since weather attributes affect each patient differently depending on demographic characteristics and severity level. Machine learning algorithms, such as neural networks, can be developed to predict the impact of weather on an individual patient’s symptoms and provide personalised feedback for self-management and early prevention of asthma attacks and worsening eczema symptoms.
Table 1. Machine learning use in weather-based healthcare.
Ref. | Contribution | Limitation
[15] | Proposed a machine learning technique for an mHealth application that predicts asthma attacks and supports their early prevention. The model can provide real-time feedback to patients based on the weather conditions at the user’s location | The proposed model does not collect demographic data and does not provide personalised feedback to users. Moreover, the severity level of asthma still needs to be classified for individual patients
[16] | Developed a prototype mHealth application that uses machine learning for knowledge generation to sense the surrounding environment and weather conditions at the user’s location. The model can predict the weather triggers for asthmatic patients and provide feedback and early prevention of asthma attacks | The model does not identify the severity level of asthma for individual patients. Furthermore, it does not provide personalised predictions based on an individual user’s demographic characteristics and weather triggers
[17] | Developed a machine learning model to predict eczema severity on a daily basis. The model is designed using Bayesian inference to provide probabilistic predictions to patients with eczema for early prevention of worsening symptoms | The proposed model does not identify the impact of weather attributes on an individual patient’s eczema severity level and does not collect demographic data for personalised feedback
3 Methodology
This paper aims to propose a weather-based healthcare system that uses machine learning techniques to predict the chance of its users having an asthma attack triggered or their eczema symptoms worsened, based on daily weather forecasts for their location. The conceptual representation of the proposed system in Fig. 2 shows how a machine learning model can be used to provide recommendations to patients based on the current weather forecast [18, 19]. These machine learning algorithms are intended to predict how weather conditions affect a patient’s asthma or eczema. The algorithms and predictions [20] are based on the analysis of user-reported data and the weather forecast. As more data are reported, the system can continuously improve its prediction accuracy for these diseases and refine its algorithms through machine learning [21]. Typically, a machine learning process includes collecting data, preparing the data and applying algorithms to train and test models on the data [13]. To collect the data, a weather-based mHealth application with a user personalisation feature was developed. The mHealth application provides the latest weather forecast to help users with asthma and eczema estimate the chance of an asthma attack being triggered, or the severity level of their eczema, based on the day’s weather conditions at their location. Table 2 summarises the core functions of the weather-based mHealth application, while Fig. 3 illustrates the data flow between the user and the system. Figure 4 shows the Entity Relationship Diagram (ERD) of the weather-based mHealth application. The system comprises four entities: UserInfo, which stores user profile and demographic data; ReportDiseaseAsthma and ReportDiseaseEczema, which store users’ ACT and POEM results respectively; and ReportWeather, which stores weather forecast information at the time the user submits the ACT or POEM results.
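As a rough illustration of how the four ERD entities relate, they can be sketched as Python dataclasses in which each ACT or POEM submission is linked to a user and to the weather captured at submission time. Only the four entity names come from the paper; every field name, type and unit below is a hypothetical assumption, and the actual application stores its data in a Firebase database rather than in Python objects.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class UserInfo:
    """User profile and demographic data (fields are illustrative)."""
    user_id: str
    email: str
    age: int


@dataclass
class ReportWeather:
    """Weather snapshot stored when the user submits an ACT or POEM test."""
    temperature_c: float
    humidity_pct: float
    wind_speed_ms: float
    pressure_hpa: float
    rain_mm: float
    # Timestamp defaults to the moment the snapshot is created (UTC).
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ReportDiseaseAsthma:
    """One ACT submission linked to a user and a weather snapshot."""
    user_id: str            # refers to UserInfo.user_id
    act_answers: list[int]  # five item scores, each 1-5
    weather: ReportWeather


@dataclass
class ReportDiseaseEczema:
    """One POEM submission linked to a user and a weather snapshot."""
    user_id: str             # refers to UserInfo.user_id
    poem_answers: list[int]  # seven item scores, each 0-4
    weather: ReportWeather


# Hypothetical example: build one asthma report and flatten it for storage.
user = UserInfo(user_id="u001", email="patient@example.com", age=34)
snapshot = ReportWeather(temperature_c=6.5, humidity_pct=88.0,
                         wind_speed_ms=7.2, pressure_hpa=1003.0, rain_mm=4.1)
report = ReportDiseaseAsthma(user_id=user.user_id,
                             act_answers=[3, 2, 4, 3, 3], weather=snapshot)
record = asdict(report)  # nested dict; the datetime still needs isoformat() before JSON
print(record["weather"]["temperature_c"])
```

Keeping ReportWeather as its own entity mirrors the ERD and lets the same weather schema serve both disease reports.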
Fig. 2. Conceptual representation of the proposed system.
Table 2. Core functions of the weather-based healthcare mobile application.
Core function | Description
Sign in and register | Provide a function to allow the user to sign in and register with an email address and password
Location retrieval | Detect the user’s current location and retrieve the weather forecast information for that location
Weather conditions | Provide weather forecast information including temperature, wind speed, pressure, humidity and rain
Main forecast activity | Provide weather forecast information based on the user’s location
Daily forecast | Provide daily updated weather forecast information for the current day, the next day and the next 5 days
Hourly forecast | Provide hourly weather forecast information for the next 3 h
Report asthma | Provide the Asthma Control Test (ACT) questions for the user to answer and submit the results
Report eczema | Provide the Patient-Oriented Eczema Measure (POEM) questions for the user to answer and submit the results
Report weather | Provide a function to record the weather information with a timestamp when the user submits the ACT or POEM answers
Data storage | Store user answers with the timestamp in a real-time system database
Personalised settings | Provide application settings that allow the user to personalise preferences, such as weather forecast unit, display format and application theme
Graph activity | Provide a function to display the weather forecast information in graphical form
Air quality index (AQI) | Provide a function to display the AQI reading for the user’s location
Weather map activity | Provide a function that allows the user to interact with and view the weather forecast information on a weather map
System requirements | Provide a system that works on mobile devices with Android version 6.0.1 and above; the application’s functional modules must meet the system’s functional and non-functional requirements
Fig. 3. Context diagram of the system.
4 Results and Discussion
The proposed weather-based healthcare model is based on the Uses and Gratifications Theory (UGT) and incorporates weather attributes, demographic characteristics and personalisation effectiveness. UGT helps to identify the media and elements that meet users’ social needs [22], which can help sustain user engagement and encourage wide adoption. This model suggests developing an mHealth application (media) using machine learning techniques (elements) for self-management (social need). Smartphones and tablets offer lighter-weight operating systems and user interfaces with gesture-based interactions, which help in building an interactive mHealth application for weather-based healthcare. Figure 5 shows the forecast interface, which presents asthma and eczema precautions alongside the weather conditions. The application provides a reliable and easy-to-use interface for asthma and eczema patients to stay updated with the weather forecast in their location.
Fig. 4. Entity Relationship Diagram (ERD).
Fig. 5. The forecast interface with asthma and eczema precautions.
Figure 6 provides examples of the current hourly and daily weather forecasts extracted from two weather resources, namely Wunderground [18] and DarkSky [19]. These forecasts support an offline viewing feature for weather forecast reports, together with asthma and eczema precautions. Users can report and record the severity of their disease in the application using the ACT for asthma and POEM for eczema. The ACT is used mainly because it provides a numerical score to determine the severity level of asthma for individual patients. There are two versions of the ACT: one for those aged 12 years or older, and the Childhood ACT for those under 12 years old. Meanwhile, POEM is used to monitor the severity of atopic eczema. This weather-based healthcare mobile application runs on the Android platform and was developed with Android Studio and the Firebase database. Data were collected from users with asthma or eczema who reported their conditions through the ACT or POEM reporting interfaces by answering Multiple-Choice Questions (MCQs) (see Fig. 7). When the answers are submitted, a timestamp is created together with the weather forecast information for that day and time. This timestamp, along with the number pre-assigned to each MCQ answer, is stored in the database. The data collected through the application indicate that both asthma and eczema cause a variety of symptoms that can worsen under different weather conditions. The ACT and POEM results showed that a user may have no symptoms on some days but strong symptoms on others, depending on the weather conditions of those days. The analysis identified cold temperatures and thunderstorms as among the most common causes of triggered asthma and worsening eczema for the majority of users. This result agrees with findings in the literature showing that weather has an apparent effect on asthma and eczema. Such effects can be tracked for individual patients, who can then take the necessary precautions based on the predicted weather conditions for self-management and early prevention, which can lead to personalised healthcare.
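The paragraph above notes that each MCQ answer carries a pre-assigned number. As a hedged sketch of how those numbers could be turned into standard scores, the functions below sum the answers and map each total to a label, assuming the adult ACT (five items scored 1–5, total 5–25, where 19 or less suggests asthma may not be well controlled) and POEM (seven items scored 0–4, total 0–28, banded with the severity strata of Charman et al. [10]). The Childhood ACT uses a different scale and is not covered, and the function and constant names are hypothetical.

```python
# Severity strata for POEM totals (Charman et al. [10]), checked from the highest band down.
POEM_BANDS = [
    (25, "very severe"),
    (17, "severe"),
    (8, "moderate"),
    (3, "mild"),
    (0, "clear or almost clear"),
]


def act_score(answers: list[int]) -> tuple[int, str]:
    """Adult ACT: five items scored 1-5; higher totals mean better control."""
    if len(answers) != 5 or not all(1 <= a <= 5 for a in answers):
        raise ValueError("ACT expects five answers, each scored 1-5")
    total = sum(answers)
    status = "may not be well controlled" if total <= 19 else "well controlled"
    return total, status


def poem_score(answers: list[int]) -> tuple[int, str]:
    """POEM: seven items scored 0-4; higher totals mean more severe eczema."""
    if len(answers) != 7 or not all(0 <= a <= 4 for a in answers):
        raise ValueError("POEM expects seven answers, each scored 0-4")
    total = sum(answers)
    band = next(label for lower, label in POEM_BANDS if total >= lower)
    return total, band


print(act_score([3, 2, 4, 3, 3]))         # (15, 'may not be well controlled')
print(poem_score([2, 1, 3, 0, 2, 1, 1]))  # (10, 'moderate')
```

Stored next to the weather timestamp, such totals and labels would give a later model the labelled examples it needs.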
Fig. 6. Hourly and daily weather forecasts from two weather resources.
Fig. 7. ACT and POEM reporting interfaces with MCQs.
This weather-based healthcare mobile application can be useful from the perspectives of both the user and the machine learning developer. While the user obtains information and submits data through the application, the developer cleanses the data and applies machine learning algorithms that can predict each user’s asthma or eczema condition based on the daily weather forecast at the user’s location. This is important because the severity of these diseases differs among users, and the extent of the weather’s impact on each user’s condition also varies with location. In light of this, the ongoing investigation focuses on how machine learning can be used for personalisation in weather-based healthcare. The machine learning techniques for the personalised weather-based healthcare model include a regression technique to predict asthma attacks and eczema severity based on the daily weather forecast, and a recommendation technique that associates users’ activities or preferences with their situation and provides predictive feedback to them. This model recommends precautions to individual patients under certain weather conditions. A Recurrent Neural Network (RNN) [23] can serve both of these purposes and is therefore considered for integration into the mHealth application. Specifically, an RNN is suitable for modelling the personalised weather-based healthcare system because it can learn from sequences of observations with many variables for individual patients. To achieve personalisation in the proposed system, it is important to identify similar patterns and regular trends in each patient’s data over a period of time.
Consequently, a ‘many-to-one’ RNN is adopted in the proposed model, with multiple input neurons at the input layer, covering weather and demographic inputs, and one output neuron giving the chance rate (likelihood) of triggering an asthma attack or worsening eczema symptoms. Once the RNN has produced this rate for each patient, prediction results from the machine learning recommendation technique will be shown to individual users on the mHealth application’s forecast interface. Figure 5 illustrates an example of this output, in which a list of precautions is provided for self-management and early prevention of asthma attacks or worsening eczema symptoms based on weather triggers.
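A ‘many-to-one’ RNN reads a sequence of inputs and emits a single output. The forward-pass sketch below, written with the Python standard library only, illustrates that shape: each time step receives weather and demographic features, and a sigmoid output neuron produces a value in (0, 1) that plays the role of the chance rate. It is an untrained Elman-style network with randomly initialised weights, not the authors’ implementation; the feature choice, hidden size and scaling are assumptions, and training (e.g. backpropagation through time) is omitted. Because demographic inputs do not change from day to day, they are simply repeated at every time step.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def _random_matrix(rng: random.Random, rows: int, cols: int) -> list[list[float]]:
    """Small random weights in [-0.5, 0.5] for an untrained network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ManyToOneRNN:
    """Elman RNN: reads a sequence of feature vectors, returns one probability."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_xh = _random_matrix(rng, n_hidden, n_inputs)  # input -> hidden
        self.w_hh = _random_matrix(rng, n_hidden, n_hidden)  # hidden -> hidden
        self.b_h = [0.0] * n_hidden
        self.w_hy = _random_matrix(rng, 1, n_hidden)[0]      # hidden -> output
        self.b_y = 0.0

    def predict(self, sequence: list[list[float]]) -> float:
        h = [0.0] * len(self.b_h)  # hidden state starts at zero
        for x in sequence:
            # The new state depends on the current input and the previous state.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_xh[j], x))
                    + sum(w * hk for w, hk in zip(self.w_hh[j], h))
                    + self.b_h[j]
                )
                for j in range(len(h))
            ]
        # Only the final hidden state feeds the single output neuron.
        return sigmoid(sum(w * hj for w, hj in zip(self.w_hy, h)) + self.b_y)


# Hypothetical inputs for one patient over three days, each feature scaled to [0, 1]:
# [temperature, humidity, wind speed, age, baseline severity]
days = [
    [0.30, 0.80, 0.60, 0.45, 0.50],
    [0.25, 0.85, 0.70, 0.45, 0.50],
    [0.20, 0.90, 0.75, 0.45, 0.50],
]
model = ManyToOneRNN(n_inputs=5, n_hidden=4)
print(f"Untrained output for this sequence: {model.predict(days):.3f}")
```

Only after training on labelled records (for example, ACT or POEM results paired with the weather at submission time) would this output carry meaning as a per-patient risk.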
5 Conclusion
Machine learning has become a widespread and promising technology that can be researched and improved continuously, and it has the potential to contribute to many significant areas such as personalised weather-based healthcare for self-management and early prevention of worsening chronic disease symptoms. Personalisation is important in weather-based healthcare because it can offer prognostic information about a patient’s health condition based on the weather, facilitating self-monitoring. This paper identified that human health can be profoundly affected by the weather, which can trigger the symptoms of chronic diseases such as asthma and eczema. The effect of weather conditions on these diseases underlines the importance of personalised weather-based healthcare and calls for further research into how machine learning techniques can raise awareness of self-management and early prevention, and predict how temperature, exposure to allergens, barometric pressure changes, humidity and wind may trigger asthma attacks or worsen eczema symptoms. This paper proposes a meaningful way of collecting weather-based healthcare data by developing a mobile health (mHealth) application. The application provides daily weather forecast information for the user’s location. Its main goal is to allow patients to self-manage their condition and prevent it from getting worse based on the weather forecast for their location. To do this, users can complete the Asthma Control Test (ACT) to track asthma symptoms or the Patient-Oriented Eczema Measure (POEM) to monitor the severity of atopic eczema. The main limitation of the current version of the weather-based mHealth application is that it does not yet provide individual users with daily predictions of their asthma or eczema severity based on the weather forecast for their location; such predictions would facilitate personalisation.
Nevertheless, developing this application as a data collection mechanism for the machine learning process is a significant step, because it can connect different datasets into one self-adjusting training and testing dataset for Recurrent Neural Network (RNN) modelling. In further development, the current absence of machine-learning-based classification for prognosis needs to be addressed. With the annotated dataset, the intended results of the proposed RNN-based system can be properly measured and documented, assisting machine learning researchers to develop algorithms with appropriate machine learning frameworks and to design effective personalised weather-based healthcare systems for self-management and early prevention.
Acknowledgment. The authors appreciate the financial support given by the Fundamental Research Grant Scheme, FRGS/1/2019/SS06/MMU/02/4 and Multimedia University, Cyberjaya, Malaysia (Project ID: MMUE/190031).
References
1. Akhil, J., Samreen, S., Aluvalu, R.: The future of healthcare: machine learning. Int. J. Eng. Technol. (UAE) 7, 23–25 (2018)
2. Panch, T., Szolovits, P., Atun, R.: Artificial intelligence, machine learning and health systems. J. Glob. Health 8(2), 020303 (2018). https://doi.org/10.7189/jogh.08.020303
3. Lepeule, J., Litonjua, A.A., Gasparrini, A., Koutrakis, P., Sparrow, D., Vokonas, P.S., Schwartz, J.: Lung function association with outdoor temperature and relative humidity and its interaction with air pollution in the elderly. Environ. Res. 165, 110–117 (2018)
4. Alharbi, E., Abdullah, M.: Asthma attack prediction based on weather factors. Periodicals Eng. Nat. Sci. 7, 408–419 (2019). https://doi.org/10.21533/pen.v7i1.422
5. AAFA: Weather can trigger asthma. Asthma and Allergy Foundation of America (2017). https://www.aafa.org/weather-triggers-asthma. Accessed 6 Aug 2020
6. D’Amato, G., Pawankar, R., Vitale, C., Lanza, M., Molino, A., Stanziola, A., Sanduzzi, A., Vatrella, A., D’Amato, M.: Climate change and air pollution: effects on respiratory allergy. Allergy, Asthma Immunol. Res. 8(5), 391–395 (2016). https://doi.org/10.4168/aair.2016.8.5.391
7. Zhang, Y., Peng, L., Kan, H., Xu, J., Chen, R., Liu, Y., Wang, W.: Effects of meteorological factors on daily hospital admissions for asthma in adults: a time-series analysis. PLoS One 9(7), e102475 (2014). https://doi.org/10.1371/journal.pone.0102475
8. Balato, N., Megna, M., Ayala, F., Balato, A., Napolitano, M., Patruno, C.: Effects of climate changes on skin diseases. Expert Rev. Anti. Infect. Ther. 12, 171–181 (2014)
9. Vocks, E., Busch, R., Frölich, C., Borelli, S., Mayer, H., Ring, J.: Influence of weather and climate on subjective symptom intensity in atopic dermatitis. Int. J. Biometeorol. 45, 27–33 (2001)
10. Charman, C., Venn, A., Ravenscroft, J., Williams, H.: Translating patient-oriented eczema measure (POEM) scores into clinical practice by suggesting severity strata derived using anchor-based methods. Br. J. Dermatol. 169(6), 1326–1332 (2013)
11. POEM: Patient Oriented Eczema Measure (2020). https://www.nottingham.ac.uk/research/groups/cebd/resources/poem.aspx. Accessed 6 Aug 2020
12. Scarpino, M.: TensorFlow for Dummies, pp. 8–43, 201–224. Wiley, Hoboken (2018)
13. Kurata, J.: Understanding machine learning with Python. Pluralsight (2016). https://app.pluralsight.com. Accessed 30 Apr 2020
14. Bassi, S.: Python for Bioinformatics, 2nd edn., pp. 30–37, 158–208. CRC Press, Taylor & Francis Group, Boca Raton (2018)
15. Tsang, K., Pinnock, H., Wilson, A., Shah, S.: Application of machine learning to support self-management of asthma with mHealth. In: 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada (2020)
16. Gaynor, M., Schneider, D., Seltzer, M., Crannage, E., Barron, M.L., Waterman, J., Obersle, A.: A user-centered learning asthma smartphone application for patients and providers. Learn. Health Syst. 4(3), e10217 (2020)
17. Hurault, G., Domínguez-Hüttinger, E., Langan, S.M., Williams, H.C., Tanaka, R.J.: Personalized prediction of daily eczema severity scores using a mechanistic machine learning model. Clin. Exp. Allergy (2020). https://doi.org/10.1111/cea.13717
18. Wunderground: Weather Underground Application Programming Interface (API) (2020). https://www.wunderground.com/weather/api/d/pricing.html. Accessed 6 Aug 2020
19. DarkSky: DarkSky API (2020). https://darksky.net/dev/docs. Accessed 6 Aug 2020
20. VanderPlas, J.: Python Data Science Handbook. O’Reilly Media Inc., Sebastopol (2017)
21. Géron, A.: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O’Reilly Media Inc., Sebastopol (2017)
22. Hossain, M.: Effects of uses and gratifications on social media use. PSU Res. Rev. 3(1), 16–28 (2019). https://doi.org/10.1108/prr-07-2018-0023
23. Phan, D., Yang, N., Kuo, C., Chan, C.: Deep learning approaches for sleep disorder prediction in an asthma cohort. J. Asthma (2020). https://doi.org/10.1080/02770903.2020.1742352
A CNN-Based Model for Early Melanoma Detection
Amer Sallam1 (corresponding author), Abdulfattah E. Ba Alawi2, and Ahmed Y. A. Saeed2
1 Computer Network and Distributed Systems Department, Taiz University, Taiz, Yemen
2 Software Engineering Department, Taiz University, Taiz, Yemen
Abstract. Melanoma is a serious form of skin cancer that develops from pigment-producing cells known as melanocytes, which produce melanin, the pigment that gives skin its color. Early detection helps affected people to reduce their suffering and find appropriate treatment. That is why researchers have tried in many studies to provide technical solutions for the early detection of skin cancer. In this paper, a smart pre-trained model based on deep learning techniques is proposed for the early detection of melanoma and nevus. It is designed to extract features from the dermoscopic images of the ISIC dataset and classify them into two distinct classes of epidermal pathology: melanoma and nevus. AlexNet and GoogLeNet are used to classify each lesion type according to its profile features. The average classification accuracies of AlexNet and GoogLeNet were found to be 90.2% and 89%, respectively, which are plausible results compared with other existing models.
Keywords: Skin diseases · Dermoscopic · Dermatologist · CAD · Melanoma · GoogLeNet · AlexNet
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 41–51, 2021. https://doi.org/10.1007/978-3-030-70713-2_5
1 Introduction
Skin is the human body’s first line of defense. It plays a critical role in protecting the body from infections, injuries and harmful radiation such as UV, and in regulating temperature. This importance has attracted much research in computer science (e.g. data mining, computer vision and pattern recognition), and many studies continue to investigate skin diseases and develop tools for them. Malignant melanoma [1] is one of the leading causes of skin cancer deaths and seriously threatens people’s lives, because it can spread rapidly over time. The rapid growth in melanoma cases has made it a form of cancer that receives wide attention, and early diagnosis of melanoma is critical to increasing patients’ survival rates [2]. In this study, a CNN-based model is proposed to recognize melanoma in dermoscopic images, with GoogLeNet and AlexNet employed to make decisions in the recognition process. The remainder of this paper is organized as follows: Sect. 2 presents the background of the problem domain and briefly describes related works. The proposed methods are elaborated in Sect. 3. The obtained results and observations are discussed and analyzed in Sect. 4. Finally, Sect. 5 concludes the paper and briefly outlines directions for future work.
2 Background
Skin diseases in general, and skin cancer in particular, still cause suffering and death for millions of people, and many attempts have been made to reduce this suffering. Dermoscopy is a trustworthy screening tool for malignant melanoma. Skin experts or dermatologists can greatly reduce the risk of misdiagnosing malignant melanoma by integrating skin pathology with macroscopic and microscopic clinical dermatology. Expert dermatologists, however, show inter-observer variability and may reach different outcomes when presented with the same lesion [3].
2.1 Related Works
Pattern recognition is a popular research topic that has inspired many scholars to find solutions for real-world problems, including skin diseases. Manual inspection of a skin lesion generally depends on a qualified specialist and is not feasible at scale over time; therefore, researchers in this domain have proposed machine learning-based methods. Codella et al. [4] implemented an ensemble of deep learning CNN-based models that identifies lesions through three separate architectures, which are integrated to form a standard architecture of pre-trained models. Using 1279 images from the International Symposium on Biomedical Imaging (ISBI) 2016 challenge, the model was tested with a fully convolutional network, achieving 76% accuracy. Li and Shen [5] proposed a reliable deep learning (DL) approach that identifies lesions through segmentation, feature extraction and, finally, recognition, and is intended to address lesion extraction issues. Two fully convolutional residual networks are used for lesion segmentation and classification, and their results are refined by a lesion index calculation unit. On the ISIC 2017 dataset, the proposed approach achieved 91.2%. Adjed et al. [6] implemented a fusion of wavelet, curvelet and two local binary pattern descriptors to capture structural and textural characteristics.
The suggested treatment is carried out by using 200 dermoscopic photographs from the PH2 collection, including 160 non-melanoma and 40 melanoma photographs. The validated results were very promising a random cross-validation approach of Support Vector Machine (SVM) success at 78.93%, 93.25%, and 86.07% for the sensitivity, specificity, and accuracy respectively. Mukherjee et al. [7] designed a Deep Convolution Neural Networks DCNN method based on MEDNOE and Dermofit datasets in two phases. Another different performance is calculated at the early stage and later combined both datasets and achieves 83.07% in terms of accuracy. Mahbod et al. [8] proposed an automated model for recognizing skin lesions with deep features optimization. Three CNN models including ResNet18,
A CNN-Based Model for Early Melanoma Detection
AlexNet, and VGG are used to extract deep features, and an SVM classifier then achieved 83.33% melanoma recognition accuracy. Abbas and Celebi [9] introduced a model named DermoDeep that recognizes pigmented skin lesions. DermoDeep consists of five architectural layers and fuses visual features with deep neural network (DNN) features. The model was trained on 2800 images and achieved 93% sensitivity and 95% specificity. Despite what these methods have achieved, they still have shortcomings in generalization because dermoscopic scans vary and datasets have poor resolution. In addition, the set of most discriminative features they extract is not sufficient. Many computer-aided diagnosis (CAD) systems are also used to recognize skin lesions and to effectively assist dermatologists' clinical diagnosis [10]. It is therefore important to develop an efficient CAD system for melanoma classification. At present, the most commonly used diagnostic criteria for melanoma in CAD systems are the ABCD rule [11], the Menzies method [12], the seven-point checklist [13], and the CASH method [14]. These approaches focus primarily on global characteristics, such as color, texture, and shape, that distinguish melanomas; such characteristics struggle to separate distinct lesions that share nearly identical appearances (e.g. melanoma and nevus). This study proposes a method for diagnosing melanoma, and its contributions can be summarized in four points. Firstly, it proposes an effective algorithm for extracting features from dermoscopic images. Secondly, the end-to-end classification method can be applied without any prior feature-selection expertise.
Thirdly, the method treats all local regional features of the skin image equally rather than focusing on a specific portion of the image. Fourthly, it does not require any complex hardware, which makes it a cost-effective and robust tool that may help in diagnosing skin lesions. Two types of lesions are considered in this study: melanoma and nevus.
2.2 Melanoma and Nevus Lesions
Melanoma [15] is a type of cancer, a disease that starts when cells in nearly any region of the body grow out of control and can spread to other tissues. Melanoma is much less common than some other types of skin cancer; however, it is more serious because it is much more likely to spread to other parts of the body if it is not detected early. Melanoma [16] is a cancer that develops from melanocytes. Most melanoma cells still produce melanin, so melanoma tumors are usually brown or black; however, some melanomas do not produce melanin and can look pink, tan, or even white. Melanomas can develop anywhere on the skin; they are most likely to start on the trunk (chest and back) in men and on the legs in women, and the face and neck are other common sites. Nevus (plural: nevi) is the medical term for a mole [17, 18]. Nevi are very common. Common nevi are harmless collections of pigmented cells and usually appear as small gray, tan, or purple spots. A person may be born with or without moles; moles present at birth are called congenital nevi. Throughout puberty and adolescence, however,
A. Sallam et al.
most moles develop; these are known as acquired nevi. Moles can also appear later in life as a result of sun exposure. There are many other types of nevi; some are harmless, while others are more serious. As shown in Fig. 1, different skin lesions, especially melanoma and benign lesions, can be visually indistinguishable. Dermatologists use several visual methods to diagnose melanoma without a biopsy, but their accuracy is poor, at around 60% [19]. Because of the high similarity between nevus and melanoma, these lesions are difficult to classify from visual differences alone. The following figure shows samples of melanoma and nevus.
Fig. 1. Images A, B, C, and D are melanoma samples, whereas E, F, G, and H are nevus cases.
3 Methodology
The proposed model for recognizing melanoma at an early stage is depicted in Fig. 2.
Fig. 2. The proposed model for recognizing melanoma.
Fig. 2 depicts the proposed melanoma recognition model. The acquired images are pre-processed and then forwarded to the deep neural network. The deep convolutional layers extract the features of the data, and finally the classification result for the input image is reported.
3.1 Experimental Dataset
The dataset used is the ISIC dataset, downloaded from the ISIC 2019 Challenge [20–22]. In total, it includes about 10275 images of the melanoma and nevus classes: about 4275 melanoma images and 6000 nevus images. The following figure (Fig. 3) shows the main steps followed to build the proposed model.
Fig. 3. The steps followed to build the proposed model.
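Because the two classes in Sect. 3.1 are not balanced, overall accuracy is easier to interpret next to the accuracy of a trivial classifier. The short Python sketch below uses only the class counts reported above; the majority-class baseline it computes is an illustrative addition, not a figure reported by the authors.

```python
# Class counts reported for the ISIC 2019 subset used in this study (Sect. 3.1).
CLASS_COUNTS = {"melanoma": 4275, "nevus": 6000}


def class_summary(counts):
    """Return the total, each class's share, and the majority-class baseline accuracy."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    # Accuracy of a classifier that always predicts the largest class.
    baseline = max(counts.values()) / total
    return total, shares, baseline


total, shares, baseline = class_summary(CLASS_COUNTS)
print(total, round(shares["melanoma"], 3), round(baseline, 3))  # 10275 0.416 0.584
```

A model that labels every image as nevus would already reach about 58% accuracy on this data, so accuracies around 90% are best read alongside per-class measures such as sensitivity and specificity.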
Fig. 3 shows the steps followed to train and evaluate the proposed melanoma recognition model. Two classifiers, AlexNet and GoogLeNet, are trained in the training phase. In the pre-processing phase, the images must be made to fit the input layer of the pre-trained models, so they are resized to 224 × 224 for GoogLeNet and 227 × 227 for AlexNet. The features of each class are then extracted, and the trained classifiers are used in the testing phase.
3.2 AlexNet Pre-trained Model
AlexNet [23] is a well-known model that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, demonstrating the power of GPU-trained deep learning. AlexNet has eight learned layers (five convolutional and three fully connected); the 25-layer count used here also includes the input, activation, normalization, pooling, dropout, softmax, and output layers. It has been widely used for image classification tasks.
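The input-size requirement of the pre-processing step in Sect. 3.1 can be sketched as follows. This is a minimal, dependency-free Python illustration, assuming images are held as nested lists of pixel tuples; the model keys and helper names are hypothetical, and nearest-neighbour sampling is used only for brevity (practical pipelines usually apply bilinear or bicubic interpolation through an imaging library).

```python
# Input sizes from Sect. 3.1; the dictionary keys are illustrative names.
INPUT_SIZE = {"googlenet": 224, "alexnet": 227}


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image given as rows of pixel tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def prepare_for(model_name, image):
    """Resize an image to the square input expected by the given network."""
    side = INPUT_SIZE[model_name]
    return resize_nearest(image, side, side)


# Hypothetical 60 x 80 RGB image whose pixels encode their own coordinates.
image = [[(r, c, 0) for c in range(80)] for r in range(60)]
alex_in = prepare_for("alexnet", image)
google_in = prepare_for("googlenet", image)
print(len(alex_in), len(alex_in[0]), len(google_in), len(google_in[0]))  # 227 227 224 224
```

Resizing directly to a square changes the aspect ratio of non-square dermoscopic images; cropping a centred square before resizing is a common alternative, although the paper does not state which option was used.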
3.3 GoogLeNet Pre-trained Model
GoogLeNet [24] is another highly competitive pre-trained model; it set the state of the art for image classification and detection at ILSVRC 2014. The hallmark of this architecture is improved utilization of computing resources inside the network: its carefully engineered design allows the depth and width of the network to be increased while keeping the computational budget constant [25]. The architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing in order to optimize quality [24]. GoogLeNet is 22 layers deep (counting only layers with parameters), and its quality was assessed in the context of classification and detection.
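The claim that GoogLeNet grows in depth and width at a roughly constant computational budget rests largely on 1 × 1 "reduction" convolutions placed before the expensive 3 × 3 and 5 × 5 convolutions. The Python arithmetic below counts weights (biases ignored) for the 5 × 5 branch of the inception (3a) block, using the channel sizes listed in the GoogLeNet paper [24]: 192 input channels, a 1 × 1 reduction to 16 channels, and 32 output channels. The helper name is illustrative.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution layer (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


# 5 x 5 branch of inception (3a) applied directly: 192 input channels -> 32 output channels.
direct = conv_weights(5, 192, 32)
# Same branch with a 1 x 1 reduction to 16 channels before the 5 x 5 convolution.
reduced = conv_weights(1, 192, 16) + conv_weights(5, 16, 32)

print(direct, reduced, round(direct / reduced, 1))  # 153600 15872 9.7
```

For a stride-1 convolution the multiply-accumulate operations per spatial position scale in the same way as the weight count, so for this branch the reduction cuts both parameters and computation by close to an order of magnitude.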
4 Results and Discussion
The training accuracy obtained by the deep learning models is illustrated in Fig. 4.
Fig. 4. The obtained training accuracy.
As Fig. 4 shows, the training accuracy obtained with the AlexNet and GoogLeNet pre-trained models is 89.12% and 90%, respectively, even though a large number of images were used as the experimental dataset. The efficiency and robustness of both models can also be observed from the training loss in Fig. 5. The GoogLeNet pre-trained model showed very promising results, reaching a loss below 0.25, while AlexNet reached 0.3. The accuracy of the pre-trained models during the validation phase is shown in the following figure (Fig. 6).
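The paper does not state which loss function was used; assuming the standard cross-entropy loss for classification, a loss value can be translated into a more intuitive quantity. The Python sketch below computes mean cross-entropy from the probabilities assigned to each sample's true class, and inverts an average loss into the geometric-mean probability it implies. Both function names are illustrative.

```python
import math


def cross_entropy(p_true_class):
    """Mean cross-entropy, given the probability the model gave each sample's true class."""
    return -sum(math.log(p) for p in p_true_class) / len(p_true_class)


def implied_probability(mean_loss):
    """Geometric-mean probability on the true class that corresponds to a mean loss."""
    return math.exp(-mean_loss)


print(round(implied_probability(0.25), 3))  # 0.779 (GoogLeNet training loss)
print(round(implied_probability(0.30), 3))  # 0.741 (AlexNet training loss)
```

Under this assumption, losses of 0.25 and 0.3 correspond to the correct class receiving roughly 78% and 74% probability on average (geometric mean), a modest difference between the two models.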
Fig. 5. The obtained loss during the training phase.
Fig. 6. The obtained accuracy during the validation phase.
Again, GoogLeNet performs well and is competitive with the AlexNet pre-trained model in terms of validation accuracy. The validation losses of the pre-trained models are illustrated below (Fig. 7).
Fig. 7. Validation loss of the pre-trained models.
GoogLeNet achieved a better validation loss than AlexNet at epoch 10, reaching about 0.24 compared with 0.3. Moreover, in all conducted experiments GoogLeNet was observed to be very stable, suggesting that it can be used effectively to tackle the diagnosis of skin diseases. To verify this claim, the model was compared with other current models, and the outcome is given in the following table.
Table 1. Comparison between this work and previous works in this field of study
Author | Dataset used | Classifiers | Results
Lopez et al. [26] | ISBI 2016 Challenge [27] | Modified VGG16 | Acc = 81.33%
Abbas et al. [9] | 2800 images from Skin-EDRA, ISIC, DermNet, and PH2 | SVM | AUC = 0.880, Sensitivity = 88.2%, Specificity = 91.3%
Mukherjee et al. [7] | Dermofit/MED-NODE | CNN malignant lesion detection (CMLD) | Acc = 90.58% and 90.14%
Prathiba et al. [28] | Harvard Dataset | CNN | NA
Matsunaga et al. [29] | ISIC 2016 | CNN | Acc = 83.09%
Adjed et al. [6] | ISIC | SVM | Acc = 86.07%
Yu et al. [30] | ISIC 2016 | CNN | Acc = 85%
Our proposed model | 10275 images from ISIC 2019 | AlexNet, GoogLeNet | Acc = 90.2%
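The studies in Table 1 report different measures (accuracy, sensitivity, specificity, AUC), which are not directly comparable. For reference, the Python sketch below shows how the three threshold-based measures follow from a binary confusion matrix, taking melanoma as the positive class; the function name is illustrative and the counts in the example are hypothetical, not a reproduction of any result in the table.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy with melanoma as the positive class."""
    sensitivity = tp / (tp + fn)  # share of melanomas correctly detected
    specificity = tn / (tn + fp)  # share of non-melanomas correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical test set: 40 melanomas and 160 non-melanomas.
sens, spec, acc = binary_metrics(tp=30, fn=10, tn=150, fp=10)
print(sens, spec, acc)  # 0.75 0.9375 0.9
```

Accuracy equals the prevalence-weighted average of sensitivity and specificity (here 0.2 × 0.75 + 0.8 × 0.9375 = 0.9), so on a test set dominated by benign lesions it can stay high even when a quarter of the melanomas are missed.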
As shown in Table 1, the results obtained by this model are promising compared with those of other models. The proposed model was tested on a large dataset to evaluate its performance. Fig. 8 shows the results for randomly selected melanoma test samples.
Fig. 8. Randomly selected test samples.
As shown in Fig. 8, the system recognizes melanoma lesions efficiently. The number above each image, next to the recognized class name, is the confidence percentage of the prediction.
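The paper does not specify how the confidence percentage in Fig. 8 is computed; a common choice, assumed here, is the softmax probability of the predicted class. The Python sketch below converts a network's two output scores (logits) into a class label and a confidence percentage; the class order, function names, and logits in the example are hypothetical.

```python
import math

CLASSES = ("melanoma", "nevus")


def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return the predicted class name and its probability as a percentage."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], 100.0 * probs[best]


label, confidence = predict_with_confidence([2.0, 0.0])
print(label, round(confidence, 1))  # melanoma 88.1
```

A high softmax confidence need not be a well-calibrated probability of disease, so such values are best read together with the validation metrics.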
5 Conclusion
The proposed model classifies two types of skin lesions using pre-trained deep learning models. The method can be applied in health informatics to facilitate the diagnosis of melanoma. In addition, it can give dermatologists a clearer picture of skin lesions to help them decide on treatment. The system can also provide an effective and robust solution for the early prediction of melanoma skin cancer. Expanding the dataset with more disease classes and achieving more robust learning of the network parameters are left for future work. Pre-processing steps for the input images (e.g. color constancy) could also be considered. Finally, a segmentation phase could be added to obtain images registered to a common reference.
References
1. Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE Syst. J. 8(3), 965–979 (2013)
2. Silveira, M., Nascimento, J.C., Marques, J.S., Marcal, A.R.S., Mendonca, T., Yamauchi, S., Maeda, J., Rozeira, J.: Comparison of segmentation methods for melanoma diagnosis in dermoscopy images. IEEE J. Sel. Top. Sign. Process. 3(1), 35–45 (2009)
3. Ahn, E., Kim, J., Bi, L., Kumar, A., Li, C., Fulham, M., Feng, D.D.: Saliency-based lesion segmentation via background detection in dermoscopic images. IEEE J. Biomed. Health Inf. 21(6), 1685–1693 (2017)
4. Codella, N.C.F., Nguyen, Q.-B., Pankanti, S., Gutman, D.A., Helba, B., Halpern, A.C., Smith, J.R.: Deep learning ensembles for melanoma recognition in dermoscopy images. IBM J. Res. Dev. 61(4/5), 5:1–5:15 (2017)
5. Li, Y., Shen, L.: Skin lesion analysis towards melanoma detection using deep learning network. Sensors 18(2), 556 (2018)
6. Adjed, F., Gardezi, S.J.S., Ababsa, F., Faye, I., Dass, S.C.: Fusion of structural and textural features for melanoma recognition. IET Comput. Vis. 12(2), 185–195 (2017)
7. Mukherjee, S., Adhikari, A., Roy, M.: Malignant melanoma classification using cross-platform dataset with deep learning CNN architecture. In: Bhattacharyya, S., Pal, S.K., Pan, I., Das, A. (eds.) Recent Trends in Signal and Image Processing: Proceedings of ISSIP 2018, pp. 31–41. Springer, Singapore (2019)
8. Mahbod, A., Schaefer, G., Wang, C., Ecker, R., Ellinger, I.: Skin lesion classification using hybrid deep neural networks. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1229–1233. IEEE (2019)
9. Abbas, Q., Celebi, M.E.: DermoDeep: a classification of melanoma-nevus skin lesions using multi-feature fusion of visual features and deep neural network. Multimedia Tools Appl. 78(16), 23559–23580 (2019)
10. Pathan, S., Prabhu, K.G., Siddalingaswamy, P.C.: Techniques and algorithms for computer aided diagnosis of pigmented skin lesions: a review. Biomed. Sig. Process. Control 39, 237–262 (2018)
11. Stolz, W.: ABCD rule of dermatoscopy: a new practical method for early recognition of malignant melanoma. Eur. J. Dermatol. 4, 521–527 (1994)
12. Menzies, S.W., Ingvar, C., Crotty, K.A., McCarthy, W.H.: Frequency and morphologic characteristics of invasive melanomas lacking specific surface microscopic features. Arch. Dermatol. 132(10), 1178–1182 (1996)
13. Argenziano, G., Fabbrocini, G., Carli, P., De Giorgi, V., Sammarco, E., Delfino, M.: Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch. Dermatol. 134(12), 1563–1570 (1998)
14. Henning, J.S., Dusza, S.W., Wang, S.Q., Marghoob, A.A., Rabinovitz, H.S., Polsky, D., Kopf, A.W.: The CASH (color, architecture, symmetry, and homogeneity) algorithm for dermoscopy. J. Am. Acad. Dermatol. 56(1), 45–52 (2007)
15. Mitchell, T.C., Karakousis, G., Schuchter, L.: Melanoma. In: Abeloff's Clinical Oncology, pp. 1034–1051.e1032. Elsevier (2020)
16. What Is Melanoma Skin Cancer? https://www.cancer.org/cancer/melanoma-skin-cancer/about/what-is-melanoma.html (2019). Accessed 16 May 2020
17. Massi, G., LeBoit, P.E.: Common nevus. In: Massi, G., LeBoit, P.E. (eds.) Histological Diagnosis of Nevi and Melanoma, pp. 29–46. Springer, Berlin (2014)
18. Massi, G., LeBoit, P.E.: Histological Diagnosis of Nevi and Melanoma. Springer, Berlin (2013)
19. Kittler, H., Pehamberger, H., Wolff, K., Binder, M.: Diagnostic accuracy of dermoscopy. Lancet Oncol. 3(3), 159–165 (2002)
20. ISIC Dataset. https://challenge2019.isic-archive.com/ (2019). Accessed 1 May 2020
21. American Cancer Society: Cancer Facts & Figures 2019. https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/annual-cancer-facts-and-figures/2019/cancer-facts-and-figures-2019.pdf (2019). Accessed 30 May 2019
22. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. arXiv:1710.05006
23. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
24. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. arXiv 1409 (2014)
25. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. CoRR (2014)
26. Lopez, A.R., Giro-i-Nieto, X., Burdick, J., Marques, O.: Skin lesion classification from dermoscopic images using deep learning techniques. In: 2017 13th IASTED International Conference on Biomedical Engineering (BioMed), pp. 49–54. IEEE (2017)
27. Gutman, D., Codella, N.C., Celebi, E., Helba, B., Marchetti, M., Mishra, N., Halpern, A.: Skin lesion analysis toward melanoma detection: a challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC) (2016)
28. Prathiba, M., Jose, D., Saranya, R.: Automated melanoma recognition in dermoscopy images via very deep residual networks. In: IOP Conference Series: Materials Science and Engineering, vol. 1, p. 012107. IOP Publishing (2019)
29. Matsunaga, K., Hamada, A., Minagawa, A., Koga, H.: Image classification of melanoma, nevus and seborrheic keratosis by deep neural network ensemble (2017)
30. Yu, L., Chen, H., Dou, Q., Qin, J., Heng, P.-A.: Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans. Med. Imaging 36(4), 994–1004 (2016)
SMARTS D4D Application Module for Dietary Adherence Self-monitoring Among Hemodialysis Patients
Hafzan Yusoff1, Nur Intan Raihana Ruhaiyem2, and Mohd Hakim Zakaria1
1 School of Health Sciences, Universiti Sains Malaysia, 16150 Kota Bharu, Kelantan, Malaysia
2 School of Computer Sciences, Universiti Sains Malaysia, USM, 11800 Gelugor, Penang, Malaysia
Abstract. The mortality rate of hemodialysis patients is 6.3–8.2 times higher than that of the general population. Failure to adhere to dietary intake recommendations is one of the most significant factors affecting patient survival. Technology-mediated approaches, such as web and mobile applications, could be the most desirable approach nowadays. This paper presents the development of the SMARTS dual-application modules using the ADDIE model, beginning with a needs analysis, followed by content and face validation in the design phase, and finally the development of an application prototype. The application system was designed to enable seamless access, interaction, and monitoring among all involved users: patients, caretakers, and healthcare providers (HCPs). Twenty-five respondents were involved in the needs assessment and in the content and face validity testing. Most of them were dietitians from government hospitals (n = 16, 64%), university medical centers (n = 6, 24%), and private hospitals (n = 2, 8%), with ample experience managing hemodialysis patients. The majority rated the content (84%) and the purpose of the app as a new nutrition education tool (84%) as its most appealing properties, followed by its visual appeal (68%) and the variety of topics offered (40%). Suggested improvements concerned the comprehension and quality of the text, the inclusion of a nutrient tracker, the presentation of educational messages in video format, and the use of more visuals rather than text to enhance understanding. The SMARTS D4D module was well accepted and supportive of respondents' needs, and appropriate modifications were made based on their valuable feedback.
Keywords: Hemodialysis · Dietary plan · Kidney failure · Application module
1 Introduction
1.1 Study Background
End-stage renal disease (ESRD), defined as stage five chronic kidney disease (CKD), is a permanent loss of renal function, identified by a glomerular filtration rate of less than 15 ml/min, that requires hemodialysis treatment. In Malaysia, the number of ESRD patients
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 52–60, 2021. https://doi.org/10.1007/978-3-030-70713-2_6
SMARTS D4D Application Module for Dietary Adherence Self-monitoring
has been on an upward track for the past 20 years [1], fueled by an aging population and a wide range of chronic non-communicable diseases, especially diabetes mellitus and hypertension. In terms of survival, a previous study suggested that 9–13% of patients on hemodialysis die within one year [2]. Among the most important factors influencing patient survival is failure to adhere to treatment regimens [3], including the strict schedule of dialysis treatment and the medication, fluid, and dietary intake prescriptions. The standard dietary prescription for ESRD patients undergoing hemodialysis is 750 ml to 1 L of fluid and not more than 2 g of sodium, 2 g of potassium, and 1 g of phosphorus. For protein and energy, intakes of 1.2 g of protein and 35 kcal of energy per kg body weight per day are recommended [4]. Such complex food and fluid prescriptions, however, are difficult for patients to adopt without assistance and close monitoring by an HCP such as a dietitian. In Malaysia, dietitians commonly facilitate patients' dietary and fluid self-management through face-to-face consultations based on a referral system, and patients' dietary records are kept traditionally, without a proper monitoring system. This might lead to low dietary adherence among patients [5]. It is also challenging for dietitians to make information available at a time convenient for patients; to provide information tailored to individual needs, cultures, and food preferences; and to complement patient decision-making with useful feedback [6]. This application system was therefore developed to facilitate self-monitoring among patients and to help healthcare providers, especially dietitians, meet these challenges.
1.2 SMARTS D4D Module
The idea of developing the technology-mediated SMARTS D4D dietary module originated from a dietitian's perspective, based on years of experience in managing hemodialysis patients.
The acronym SMARTS D4D represents the specific dietary strategies (see Fig. 1). A dual-application system was then developed based on the module. The app is compatible with both iOS and Android platforms, and it allows for future updates, taking into account new methods, treatment regimens, or parameters that may be introduced in the future.
Fig. 1. SMARTS D4D dietary strategy
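Part of the standard prescription quoted in Sect. 1.1 scales with body weight (protein and energy), while the fluid, sodium, potassium, and phosphorus amounts are fixed. The Python sketch below turns that prescription into per-patient targets. The class and function names are hypothetical, and treating every listed amount as a per-day value, with the mineral amounts as upper limits, is our reading of the text rather than a specification of the SMARTS D4D app.

```python
from dataclasses import dataclass


@dataclass
class DailyTargets:
    protein_g: float
    energy_kcal: float
    fluid_ml_range: tuple  # (lower, upper) fluid allowance in ml
    sodium_max_g: float
    potassium_max_g: float
    phosphorus_max_g: float


def smarts_targets(body_weight_kg):
    """Hemodialysis prescription from Sect. 1.1, scaled by body weight where applicable."""
    return DailyTargets(
        protein_g=1.2 * body_weight_kg,   # 1.2 g protein per kg body weight per day
        energy_kcal=35 * body_weight_kg,  # 35 kcal per kg body weight per day
        fluid_ml_range=(750, 1000),       # 750 ml to 1 L of fluid
        sodium_max_g=2.0,
        potassium_max_g=2.0,
        phosphorus_max_g=1.0,
    )


targets = smarts_targets(60)  # hypothetical 60 kg patient
print(targets.protein_g, targets.energy_kcal)  # 72.0 2100
```

Keeping the targets in one structure means a tracker can compare logged intake against every limit in a single pass, and a dietitian's personalized adjustments only need to change this object.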
H. Yusoff et al.
2 Methodology
2.1 SMARTS D4D Module Development
The SMARTS D4D module was designed to empower hemodialysis patients to control their nutrient intake and to assist them with routine meal preparation, thus preventing unnecessary complications that might arise from non-adherence. It also acts as an enabler and support system for healthcare providers such as dietitians, helping them give precise recommendations and provide personalized diet plans based on anthropometric, physiological, clinical, and nutritional evaluations. The module was designed following the ADDIE instructional design model [7]. The acronym ADDIE represents five phases: Analysis, Design, Development, Implementation, and Evaluation (see Fig. 2). This model provides a systematic approach to designing and developing an effective application. We began by critically analyzing existing dietary apps, then designed the goals and objectives of our modules, and finally proceeded to the development phase by building the food database (back end) and the interface design (front end). We then conducted a pilot study to test the content and face validity of the modules. The evaluation phase has yet to be completed.
Fig. 2. ADDIE instructional design model
2.2 Analysis of Existing Application System
Currently, no existing system helps dialysis patients keep track of their diet. Before this system was conceived, healthcare professionals tracked their patients' diets manually by asking them at every appointment. That method is unreliable because patients tend not to keep track of their daily meals. A few applications are similar to the proposed idea, but they were not specifically developed for patients with end-stage renal disease (ESRD). Table 1 summarizes existing applications that target hemodialysis patients as potential users and compares their features.
HealthifyMe Mobile App. This application enables users to track their health and weight loss and to eat healthily with the help of its guidance. It also tracks the user's calorie intake based on food content. Its main purpose is to recommend a custom weight-loss diet plan for men and women with specific health goals [8]. CKD Care Mobile App. This application allows medical professionals to estimate kidney function using an eGFR calculator and provides care guidelines. It only estimates kidney function, i.e. whether or not a person is likely to have a kidney problem [9].
Table 1. Comparison of existing systems.
Features compared for each mobile application: (1) interaction between HCP and users; (2) nutrition and calorie calculator; (3) information about the ESRD diet; (4) remote monitoring of patient progress by the HCP; (5) nutritional value of foods and drinks; (6) personalized diet plan for a specific user.
HealthifyMe App. CKD Care App. SMARTS D4D App. √ √
√ √
√
√
√
√
√
√
√
2.3 Modules of Dual-Application Systems
The new system is a dual system consisting of web-based and mobile application platforms, as shown in Table 2. The admin and HCPs log in to the web application, while patients log in to the mobile application. All patient data are entered by the HCP in the web application. Patients can access the SMARTS dietary plan via the mobile application, which helps them keep track of their daily nutrient intake by prompting them to enter their daily food intake from the food library. The HCP can then analyze the patient's health condition based on this input and generate a report. The details of the modules are given in Table 3.
2.4 Content and Face Validation Testing
A quantitative online survey involving 25 dietitians was performed to assess their need for an app, their willingness to use an app during consultations with hemodialysis patients,
Table 2. The two main platforms.
Mobile application | Web-based application
User: patient | User: admin (healthcare provider, HCP)
Acts as an information-sharing platform | Acts as a platform to access the patient's information, including personal data, dietary intakes, anthropometry, biochemical tests, etc.
Dietary plan; dietary tracker
Table 3. The modules
Module | Mobile and web-based application
1. User (patient) management | Allows the admin to manage user accounts and view user information based on user input; allows users to register and manage their own account or information
2. Food library management | Allows the admin to create, update, view, and delete food data through the system; allows users to access the food list in the application and get food details
3. User report management | Allows the admin to enter user assessment report data through the application when examining a user/patient, and to view lists of user reports through the system; allows the admin and users to access the SMARTS dietary plan and the food list used in the diet plan tracker
4. Diet plan management | Allows users to enter their food intake from the existing food library and send it to the system; allows the admin to monitor users' daily intake through the system, analyze it against the SMARTS diet plan, and monitor user progress; allows users to view their daily food intake and monitor their nutritional progress
and the content and face validity of the module. Each respondent received a soft copy of the module and the proposed app visualization scheme by email. Only respondents who consented and fulfilled the inclusion criteria were recruited into this study: (a) physicians, dietitians, or dialysis nurses from health facilities in Malaysia; (b) aged between 18 and 50 years; and (c) willing to complete the study through an online survey. The respondents rated the drafted module and app visualization scheme (see
Fig. 3) on the appropriateness of the content (selection and variety of topics), the quality of the graphics used, and the comprehension of the content (scientific terms, fonts, and language). The survey assessment items were adapted from a previous study [10].
3 Module Design, Development and Findings
3.1 Technology Deployed and System Architecture
The development of the complete system involved various technologies to make it compatible with many platforms. The hardware had a 1.7 GHz processor and 6 GB of RAM, and the system was developed using HTML, CSS, JavaScript, PHP, and the Angular 5 framework, with a MySQL database. Other tools were also used: XAMPP as the localhost server, VS Code as the code editor, and Postman as the API testing tool. The system architecture is shown in Fig. 3.
Fig. 3. System architecture of SMARTS D4D application
The system is accessed through the website and the mobile application, and an internet connection is required to use it. It is divided into two parts: the website for the admin and the mobile application for users/patients. All data and information are stored in a MySQL database, which can be accessed through phpMyAdmin. The website, the app, and the database are connected through .php files and an API created using PHP.
3.2 Dual-System Implementation
The implementation strategy followed a bottom-up approach in which the core systems were built first. The low-level components, namely the backend server, the application programming interface (API), and the database, were built first because they are the main components of the system and act as its main communication channel.
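To make the data flow of the dual system more concrete, the sketch below shows how a backend service of this kind might total a patient's logged intake from the food library and return the result as JSON. It is written in Python for consistency with the other examples rather than in the PHP the authors used; the food identifiers, nutrient values, field names, and function name are hypothetical, and the limits are the sodium, potassium, and phosphorus amounts from Sect. 1.1 expressed in milligrams.

```python
import json

# Hypothetical food-library entries: nutrient content per serving (not real SMARTS D4D data).
FOOD_LIBRARY = {
    "food_a": {"energy_kcal": 200, "protein_g": 4, "sodium_mg": 300,
               "potassium_mg": 150, "phosphorus_mg": 80},
    "food_b": {"energy_kcal": 150, "protein_g": 12, "sodium_mg": 450,
               "potassium_mg": 300, "phosphorus_mg": 200},
}

# Upper limits from Sect. 1.1 (2 g sodium, 2 g potassium, 1 g phosphorus), in mg.
LIMITS_MG = {"sodium_mg": 2000, "potassium_mg": 2000, "phosphorus_mg": 1000}


def daily_summary(patient_id, entries):
    """Total one day's (food_id, servings) entries and flag nutrients above their limit."""
    totals = {}
    for food_id, servings in entries:
        for nutrient, amount in FOOD_LIBRARY[food_id].items():
            totals[nutrient] = totals.get(nutrient, 0) + amount * servings
    over_limit = {n: totals.get(n, 0) > limit for n, limit in LIMITS_MG.items()}
    return json.dumps(
        {"patient_id": patient_id, "totals": totals, "over_limit": over_limit},
        sort_keys=True,
    )


print(daily_summary("P001", [("food_a", 2), ("food_b", 4)]))
```

Returning a self-describing JSON document keeps the calculation on the server side, so the same summary can be consumed by both the patient's mobile app and the HCP's web dashboard.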
The backend mainly functions as a service provider that serves formatted JSON data and handles all back-end functionality, including communication with the database, handling and routing client requests over HTTP, and protection against unauthorized access. All requests are handled by controllers and middleware, each responsible for its own actions, such as authenticating users, handling data movement, and formatting the data sent to the client. The whole backend was built mostly using PHP and a PHP framework. With the bottom-up approach, we can build and test groups of subsystems that can easily be integrated with any top-level system (user interface). The top-level systems are built later, once the bottom-level systems are running acceptably. The top-level system (user interface) was also built using PHP and JavaScript. The top and bottom levels are separate but communicate through the API. Using the API as a middleman allows more scalability and dynamic data processing: the API endpoints can be consumed by other front-end systems, such as a desktop or mobile application, and serve data in an easy-to-manipulate form such as JSON. The bottom-up approach also allows the user interface to be designed and modified independently, without drastic changes to the low-level system, which makes development more efficient by enabling multiple programmers to build parts of the system independently.
3.3 Needs, Content and Face Validity Testing
The majority of the respondents were dietitians from government hospitals (n = 16, 64%), university medical centers (n = 6, 24%), and private hospitals (n = 2, 8%), with ample experience managing hemodialysis patients.
The majority rated the appropriateness of the content (84%) and the purpose of the app as a new nutrition education tool (84%) as its most appealing properties, followed by the graphic quality (68%) and the variety of topics offered (40%). However, most of them (72%) recommended improving the comprehension and quality of the text. In response to the item "How would you judge the comprehension of the module?", all respondents rated the module as either good (52%) or very good (48%). The detailed item analysis of the need, content, and face validity of the module is given in Table 4 and Table 5.
Table 4. Need assessment
Item | Yes (%) | No (%)
1. Do you think this module is helpful for healthcare providers to monitor and facilitate dialysis patients? | 100 | 0
2. Would you like to use this module as a nutrition education tool? | 100 | 0
3. Would you like to use this module in a mobile application form? | 100 | 0
SMARTS D4D Application Module for Dietary Adherence Self-monitoring
Table 5. Content and face validity assessment
Assessment item^a | Mean rating | Standard deviation
1. Appropriateness of the content | 4.28 | 0.66
2. Sufficiency of the content | 4.16 | 0.67
3. Quality of text | 4.08 | 0.56
4. Quality of graphics | 3.80 | 0.94
5. Acceptability of: | |
(a) Calorie management | 4.16 | 0.88
(b) Protein management | 4.24 | 0.86
(c) Potassium management | 4.24 | 0.81
(d) Phosphate management | 4.20 | 0.94
(e) Sodium management | 4.12 | 0.99
(f) Fluid management | 4.24 | 0.99
4 Conclusion
The SMARTS D4D application module was found to be highly satisfactory and supportive of the respondents’ needs. Appropriate modifications have been made to the module based on the valuable feedback given by the respondents. A dual system comprising a web service and a mobile application is currently under development. Once completed, the mobile application will enable patients to keep track of their nutritional intake, while HCPs can monitor them remotely through the web-based application, using one of the modules that features automatic calculation and recording of the patient’s dietary intake.
H. Yusoff et al.
Improved Multi-label Medical Text Classification Using Features Cooperation
Rim Chaib1,2(B), Nabiha Azizi1,2, Nawel Zemmal1,3, Didier Schwab4, and Samir Brahim Belhaouari5
1 Labged Laboratory of Electronic Document Management, Badji Mokhtar University, Annaba, Algeria
2 Computer Science Department, Badji Mokhtar University, 23000 Annaba, Algeria
3 Department of Mathematics and Computer Science, Mohamed Cherif Messaadia University, 41000 Souk-Ahras, Algeria
4 LIG-GETALP Laboratory, Grenoble Alpes University, Grenoble, France
5 College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
Abstract. Medical text categorization is a valuable area of text classification due to the massive growth in the amount of medical data, most of which is unstructured. Reading and understanding the information contained in millions of medical documents is a time-consuming process. Automatic text classification aims to assign text documents to one or more predefined categories according to several criteria, such as the type of output (multi-label or mono-label). Feature extraction plays an important role in text classification: extracting informative features greatly increases the performance of classification models and reduces computational complexity. Traditional feature extraction methods are based on handcrafted features, which mainly depend on prior knowledge, and using these features alone may yield a poorly informative representation. Doc2vec is a way to generate a vector of informative and essential features that are specific to a document. In this paper, the impact of combining handcrafted and doc2vec features in the multi-label document classification scenario is analyzed through a proposed system named MUL-MEDTEC. The one-versus-all classification strategy based on logistic regression is adopted to assign each medical text to one or several labels. Experimental results on the Ohsumed medical dataset are very encouraging, with a best classification accuracy (global precision) of 0.92.
Keywords: Text categorization · Multi-label classification · Medical text · Handcrafted features · Doc2vec
1 Introduction
With the rapid growth of medical text datasets, most of which are unstructured, it is practical to use machine-based algorithms to extract useful knowledge from these data [1]. Unstructured medical documents are complicated and very hard to handle, but they commonly contain detailed and valuable information about patients [2]. Automatic tools that classify these documents can help ease the difficult process of finding information.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 61–71, 2021. https://doi.org/10.1007/978-3-030-70713-2_7
Text classification can be used to solve a variety of problems [3, 4] and has gained considerable importance in the classification of medical text documents [5, 6]. The main task of automatic text classification is to assign electronic documents to one or more predefined classes (multi-label or mono-label) [7]. Unlike common text classification problems, medical data can be multi-labeled: a medical document describing a single patient’s health may mention one or more illnesses [8]. To mine knowledge from medical text effectively, the classification algorithm requires an appropriate set of features. Extracting informative features greatly increases the performance of the classification model and reduces computational complexity [4]. Traditional feature extraction methods are based on handcrafted features, which mainly depend on prior knowledge [9]. Using those features may yield a poorly informative representation. They can also generate redundant and irrelevant features in the description space of multi-label medical data, which could limit the performance of the resulting system. Recently, many studies have reported the effects of feature extraction based on document embedding using the doc2vec technique [10–12, 15]. Doc2vec is a way to generate a vector of informative and essential features that are specific to a document [11]. Doc2vec has shown promising classification performance, notably in the classification of medical documents [4, 12].
In the decision stage, many popular classification algorithms have been used for multi-label classification, among them logistic regression. Indeed, many works have adopted this classifier and obtained good results [12–14], which motivated us to use it for our classification problem.
The goal of this work is to analyze and evaluate the impact of combining handcrafted features and doc2vec features in the multi-label document classification scenario. The proposed system, MUL-MEDTEC (Multi-Label Medical Text Classification), has two main learning steps. In the first step, the doc2vec technique is used as an automatic feature extractor that generates feature vectors from text documents; to reinforce the classification model, other handcrafted features are extracted and integrated as additional features. The second step classifies the aggregated features using the one-versus-all strategy with a logistic regression model as the base (kernel) classifier. The remainder of this paper is organized as follows: Sect. 2 describes the basic concepts used in this work. Sect. 3 presents the proposed system in detail. Sect. 4 presents the experimental results and Sect. 5 discusses them. Finally, Sect. 6 summarizes the proposed MUL-MEDTEC and gives further research directions.
2 Preliminaries
In this section, we describe the basic concepts adopted in our system.
2.1 Feature Extraction
Feature extraction plays a major role in text classification, as it has a direct impact on classification accuracy [15]. It consists of extracting a list of words from text data and then transforming them into a set of features usable by a classifier. The feature extraction algorithm computes the weights of the words in the text and then creates a numeric vector that represents the text’s feature vector [15, 16]. Techniques for the vector representation of words can be divided into two categories:
• Traditional approaches, such as bag of words and TF-IDF.
• Word-embedding-based approaches, such as GloVe, word2vec, doc2vec, StarSpace, and ELMo.
2.2 Doc2Vec
The doc2vec technique, proposed by Le and Mikolov [17], can be considered an extension of the word2vec model [18]. It is an unsupervised technique that uses a shallow three-layer neural network to learn vector representations of documents, which makes it possible to measure the similarity of document contents. The Doc2Vec model is based on the same concepts as word2vec, with the addition of another vector (the paragraph ID) unique to each document: while the word vectors W are being trained, the document vector is trained as well. There are two main training methods for doc2vec: the distributed memory paragraph vector model (PV-DM) and the distributed bag-of-words paragraph vector (PV-DBOW) [14]. The architecture of the Doc2Vec model is illustrated in Fig. 1.
Fig. 1. The architecture of Doc2Vec model [19].
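Sect. 2.1 names TF-IDF as one of the traditional word-weighting schemes. The following is only a minimal plain-Python sketch, assuming the simple variant tf = count / document length and idf = log(N / df); libraries usually add smoothing or normalization, and MUL-MEDTEC itself relies on doc2vec rather than on this representation.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse {token: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    vectors: List[Dict[str, float]] = []
    for doc in docs:
        if not doc:
            vectors.append({})  # an empty document gets an empty vector
            continue
        counts = Counter(doc)
        vectors.append({
            # Length-normalized term frequency times inverse document frequency.
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors


docs = [["heart", "failure", "heart"], ["heart", "valve"], ["renal", "failure"]]
vectors = tf_idf(docs)
```

A token that occurs in every document gets weight 0, since log(N / N) = 0; this is how TF-IDF down-weights words that do not discriminate between documents.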
2.3 Multi-label Classification
Multi-label classification is a challenging problem in the field of natural language processing. It generalizes single-label classification: instead of a single label, a set of labels can be associated with each instance. Traditional single-label classification associates an instance X with a single label l from a finite label set L, so a single-label example is represented as (X, l). In multi-label problems, each instance X is associated with a subset of labels S ⊆ L, so a multi-label example is represented as (X, S) [20].
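Before training, the (X, S) representation is commonly encoded as a 0/1 indicator vector over the label set L. A minimal sketch, assuming labels are strings; the label codes below are illustrative, modeled on the Ohsumed category names mentioned in Sect. 4.1.

```python
from typing import List, Sequence, Set


def binarize(label_sets: Sequence[Set[str]], labels: Sequence[str]) -> List[List[int]]:
    """Encode each label subset S of L as a 0/1 indicator vector over L."""
    index = {label: j for j, label in enumerate(labels)}
    rows: List[List[int]] = []
    for s in label_sets:
        unknown = set(s) - set(index)
        if unknown:
            raise ValueError(f"labels not in L: {sorted(unknown)}")
        row = [0] * len(labels)
        for label in s:
            row[index[label]] = 1
        rows.append(row)
    return rows


LABELS = ["C2", "C4", "C9", "C14", "C23"]  # illustrative label set L
Y = binarize([{"C2", "C4", "C9", "C23"}, {"C14"}], LABELS)
```

A single-label example is simply the special case of a row containing exactly one 1.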
2.4 Multi-label Learning Approaches
Many multi-label learning algorithms have been proposed in the literature [21, 22]. Multi-label learning approaches can be grouped into three main families: (i) transformation approaches, which divide the multi-label problem into several mono-label problems; (ii) adaptation approaches, which adapt mono-label algorithms so that they can process multi-label data; and (iii) ensemble approaches, which use a set of classifiers from the first or second family. The techniques used in each family are presented in Fig. 2.
Fig. 2. Multi-label learning approaches.
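Binary relevance (one binary problem per label) and label powerset (one multi-class problem over the distinct label subsets) are two standard problem-transformation methods. We do not assume that Fig. 2 lists exactly these two, so treat the following as a generic sketch of the transformation family; the document IDs and label codes are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Example = Tuple[str, Set[str]]  # (instance X, label subset S)


def binary_relevance(data: Sequence[Example],
                     labels: Sequence[str]) -> Dict[str, List[Tuple[str, int]]]:
    """One binary data set per label: the target is 1 when the label belongs to S."""
    return {label: [(x, int(label in s)) for x, s in data] for label in labels}


def label_powerset(data: Sequence[Example]) -> List[Tuple[str, FrozenSet[str]]]:
    """One multi-class data set whose classes are the distinct label subsets."""
    return [(x, frozenset(s)) for x, s in data]


data = [("doc1", {"C2", "C4"}), ("doc2", {"C4"}), ("doc3", set())]
br = binary_relevance(data, ["C2", "C4"])
lp = label_powerset(data)
```

Binary relevance is the transformation underlying the one-versus-all strategy described in Sect. 3.3. Label powerset can capture label co-occurrence, but its number of classes grows with the number of distinct label combinations.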
3 Methodology
The objective of our approach is to build a robust multi-label text classification system for medical text reports. The main steps of our approach are described in Fig. 3.
Fig. 3. Main steps of the proposed MUL-MEDTEC system.
3.1 Medical Text Preprocessing
To evaluate the efficiency of the automatic feature representation for multi-label classification, a set of basic preprocessing operations was applied:
• Stop-word removal: stop words are the common words of the language, such as “a”, “an”, “is”, and “and”. They are removed because they are not considered representative and do not provide useful information to our system. Punctuation, special characters, hashtags, HTML, URLs, redundant phrases, and rarely used words were also removed from the dataset.
• Lowercasing: the input text is converted to lower case. This step avoids having multiple copies of the same word; for example, when counting words, “Diagnosis” and “diagnosis” would otherwise be treated as different words.
• Spelling correction: spell checking is a useful preprocessing step, as it also reduces multiple copies of words; for example, “disease” and “desease” would otherwise be treated as different words even though they are used in the same sense.
• Lemmatization: words are represented in their canonical form: each verb is replaced by its infinitive and each noun by its singular form.
• Tokenization: the text is transformed into a sequence of individual tokens, each representing a word, for example [“cardiovascular”, “hypertension”, “diagnostic”, “physiology”, “pathology”, …].
We also decided to delete rarely used words because they are considered unusual data; removing them reduces the size of the training data while keeping it informative. Figure 4 illustrates a textual data sample before and after the preprocessing steps.
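The cleaning steps listed above can be approximated with the Python standard library. This is a simplified sketch, not the authors’ pipeline: the stop-word list is a tiny illustrative sample, spelling correction and lemmatization are omitted because they need external dictionaries or lemmatizers, and the `min_count` threshold for rare words is an assumed parameter.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption); real systems use a full list.
STOP_WORDS = {"a", "an", "and", "is", "the", "of", "in", "to", "for", "with"}


def preprocess(docs: List[str], min_count: int = 2) -> List[List[str]]:
    """Clean, tokenize and filter a corpus; returns one token list per document."""
    tokenized: List[List[str]] = []
    for text in docs:
        text = text.lower()                                 # lowercasing
        text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
        text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
        text = re.sub(r"#\w+", " ", text)                   # hashtags
        text = re.sub(r"[^a-z0-9\s]", " ", text)            # punctuation, special characters
        tokens = text.split()                               # whitespace tokenization
        tokenized.append([t for t in tokens if t not in STOP_WORDS])  # stop words
    # Rare-word removal: drop tokens seen fewer than min_count times in the corpus.
    freq = Counter(t for tokens in tokenized for t in tokens)
    return [[t for t in tokens if freq[t] >= min_count] for tokens in tokenized]


corpus = [
    "The patient has <b>hypertension</b> and a history of HYPERTENSION.",
    "See https://example.org for hypertension #cardio notes",
]
cleaned = preprocess(corpus)
```

Rare-word removal is done last because it needs corpus-wide counts, whereas the other steps operate on one document at a time.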
3.2 Feature Extraction
As described in Sect. 2.1, feature extraction is important both for text representation and for improving the classification stage. In this work, two families of feature extraction paradigms are combined in order to analyze the impact of feature fusion. Classical text classification approaches rely on different types of manual feature extraction, such as statistical features and bag of words. More recently, with the growth of deep learning and vector representations, doc2vec has come to be considered one of the most effective techniques for generating a numerical vector that represents a text. This feature vector has the advantage of taking into account the statistical, syntactic, and semantic relations between the words of the text.
3.2.1 Handcrafted Features Used
The handcrafted features in this study are the following statistical features:
– The number of words in each text document.
– The average length of the words in each text.
– The number of stop words.
– And finally, the number of digits in a document text.
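The four statistical features can be computed directly from the raw text, before stop words and digits are removed. This sketch rests on our own assumptions: words are runs of word characters (so numbers count as words), “number of digits” counts digit characters, and feature cooperation is modeled as plain concatenation with the doc2vec vector. `fuse` and `FEATURE_ORDER` are hypothetical names.

```python
import re
from typing import Dict, List, Sequence, Set

# Fixed order in which the handcrafted features are appended to the embedding.
FEATURE_ORDER = ("n_words", "avg_word_len", "n_stop_words", "n_digits")


def statistical_features(text: str, stop_words: Set[str]) -> Dict[str, float]:
    """The four handcrafted features of Sect. 3.2.1, computed on raw text."""
    words = re.findall(r"\w+", text)  # \w also matches digits, so "140" is a word
    n_words = len(words)
    return {
        "n_words": n_words,
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "n_stop_words": sum(w.lower() in stop_words for w in words),
        "n_digits": sum(ch.isdigit() for ch in text),
    }


def fuse(embedding: Sequence[float], features: Dict[str, float]) -> List[float]:
    """Hypothetical cooperation step: append the handcrafted features to the doc2vec vector."""
    return list(embedding) + [float(features[k]) for k in FEATURE_ORDER]


feats = statistical_features("The 2 patients had a BP of 140", {"the", "a", "of"})
```

In practice, features on such different scales (counts versus embedding coordinates) are often standardized before being passed to a linear classifier.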
Fig. 4. An example of the preprocessing steps used.
3.2.2 Doc2vec Based Feature Extraction
As defined in Sect. 2.2, a doc2vec model is used in our approach to extract the best numerical features. These are selected through a series of empirical tests over several parameters, such as the size of the generated vector, the window size, the number of epochs, and the training strategy (the distributed memory paragraph vector or the distributed bag-of-words paragraph vector).
3.3 Multi-label Classification Stage
The main objective of our multi-label classification stage is to classify the aggregated features using the one-versus-all strategy. This strategy relies on a basic (kernel) classifier as the decision model; in our approach, the logistic regression model is adopted. The strategy consists of fitting one classifier per class, which in our case is a logistic regression model.
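The one-versus-all strategy with a logistic regression base classifier can be sketched in plain Python. This is a hedged illustration, not the authors’ implementation: the solver (batch gradient descent on the log-loss), the learning rate, the number of epochs and the 0.5 decision threshold are our assumptions, and in practice a library implementation would normally be used. The toy inputs stand in for the fused doc2vec and statistical feature vectors.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one binary classifier


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 1000) -> Model:
    """Binary logistic regression trained by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_one_vs_all(X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]], **kw) -> List[Model]:
    """Fit one logistic regression per label column of the 0/1 label matrix Y."""
    return [fit_logistic(X, [row[k] for row in Y], **kw) for k in range(len(Y[0]))]


def predict(models: List[Model], x: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Switch on every label whose classifier probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)
            for w, b in models]


# Toy data: label 0 depends on feature 0, label 1 on feature 1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
models = fit_one_vs_all(X, Y)
```

Because each label gets an independent classifier with its own threshold, an instance can receive zero, one, or several labels, which is what the multi-label setting requires.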
In order to evaluate the performance of our system, we use the accuracy evaluation criterion defined in Eq. (1):
$$CA = \frac{1}{m} \sum_{i=1}^{m} I\left(Y_i = h(x_i)\right) \qquad (1)$$
where Y_i represents the true labels of instance x_i, h(x_i) represents the predicted labels, m denotes the number of instances in the test dataset, and I(true) = 1, I(false) = 0.
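Reading I(Y_i = h(x_i)) in Eq. (1) as an exact match between the true and predicted label sets (often called subset accuracy), the criterion can be sketched as follows; the label sets are illustrative, not taken from the experiments.

```python
from typing import Sequence, Set


def subset_accuracy(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Eq. (1): share of test instances whose predicted label set equals the true set."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    hits = sum(set(t) == set(p) for t, p in zip(y_true, y_pred))  # I(Y_i = h(x_i))
    return hits / len(y_true)


y_true = [{"C2", "C4"}, {"C14"}, {"C23"}, {"C4"}]
y_pred = [{"C2", "C4"}, {"C14", "C23"}, {"C23"}, set()]
score = subset_accuracy(y_true, y_pred)
```

Under this reading, a prediction that gets some labels right but adds or misses one (the second and fourth instances) earns no credit, so the score here is 2 / 4.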
4 Experimental Results and Discussion
4.1 Used Data
In this work, the Ohsumed dataset is used to validate our approach. It consists of medical abstracts covering 23 categories of cardiovascular diseases. The task is to classify the abstracts into these categories, where a document can belong to several classes; for example, an abstract can belong to four disease types (classes C2, C4, C9, and C23). This data collection is available at [23].
4.2 Evaluation
In this study, we carried out several empirical tests to determine the impact of the different Doc2Vec parameters by varying the window size, the size of the feature vector, and the number of epochs.
Table 1. Obtained results using Doc2Vec based features.
Strategy | Window | Vector size | Epoch | Precision
DM | 6 | 150 | 50 | 0.80
DM | 6 | 150 | 100 | 0.82
DM | 6 | 200 | 50 | 0.83
DM | 6 | 200 | 100 | 0.83
DM | 10 | 150 | 50 | 0.81
DM | 10 | 150 | 100 | 0.82
DM | 10 | 200 | 50 | 0.86
DM | 10 | 200 | 100 | 0.85
DBOW | 6 | 150 | 50 | 0.83
DBOW | 6 | 150 | 100 | 0.86
DBOW | 6 | 200 | 50 | 0.91
DBOW | 6 | 200 | 100 | 0.92
DBOW | 10 | 150 | 50 | 0.88
DBOW | 10 | 150 | 100 | 0.90
DBOW | 10 | 200 | 50 | 0.89
DBOW | 10 | 200 | 100 | 0.89
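Sect. 4.2 varies the strategy, window size, vector size and number of epochs. The sketch below only enumerates that 2 × 2 × 2 × 2 grid with `itertools.product` and picks the configuration with the highest precision reported in Table 1; training and scoring a doc2vec model for each configuration (for example with a library such as gensim) is not reproduced here.

```python
from itertools import product

# Precision values copied from Table 1, in the table's row order.
TABLE1_PRECISION = [0.80, 0.82, 0.83, 0.83, 0.81, 0.82, 0.86, 0.85,
                    0.83, 0.86, 0.91, 0.92, 0.88, 0.90, 0.89, 0.89]

# Same nesting as Table 1: strategy > window > vector size > epochs.
GRID = list(product(["DM", "DBOW"], [6, 10], [150, 200], [50, 100]))

results = dict(zip(GRID, TABLE1_PRECISION))
best_config = max(results, key=results.get)  # (strategy, window, vector size, epochs)
```

The selected configuration, DBOW with window 6, vector size 200 and 100 epochs, matches the best setting reported in the text that follows Table 2.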
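The impact of feature cooperation is analyzed by comparing the multi-label classification system with and without the statistical features. Table 1 and Table 2 show the results of some of these tests.
Table 2. Obtained results using features cooperation.
Strategy | Window | Vector size | Epoch | Precision
DM | 6 | 150 | 50 | 0.79
DM | 6 | 150 | 100 | 0.80
DM | 6 | 200 | 50 | 0.81
DM | 6 | 200 | 100 | 0.82
DM | 10 | 150 | 50 | 0.80
DM | 10 | 150 | 100 | 0.80
DM | 10 | 200 | 50 | 0.84
DM | 10 | 200 | 100 | 0.84
DBOW | 6 | 150 | 50 | 0.81
DBOW | 6 | 150 | 100 | 0.80
DBOW | 6 | 200 | 50 | 0.89
DBOW | 6 | 200 | 100 | 0.90
DBOW | 10 | 150 | 50 | 0.86
DBOW | 10 | 150 | 100 | 0.89
DBOW | 10 | 200 | 50 | 0.87
DBOW | 10 | 200 | 100 | 0.89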
On the Ohsumed medical dataset, the best result of our approach is an accuracy of 92%, obtained with the following parameters: the DBOW strategy, window size = 6, vector size = 200, and 100 epochs, with the cooperation of statistical and automatic features.
5 Discussion
The results of our experiments show that the automatic feature representation based on Doc2Vec gives encouraging results when the parameters of the Doc2Vec model are tuned. After combining the features generated by Doc2Vec with the statistical features, our approach gave better results than the model using Doc2Vec only (0.92 and 0.90, respectively). To further analyze the robustness of our system, we used another performance measure, the micro-F1 score, and compared our system with existing works in the literature that use the same dataset (Ohsumed), as shown in Table 3.
Table 3. Comparison with some works that use the Ohsumed dataset.
Works | Feature extraction | Used measure
[24] | BOW | MicroF1 = 59.91
[25] | BOW | MicroF1 = 73.97
[26] | BOW, TF, TF-IDF | Accuracy = 0.72
[27] | TF-IDF | MicroF1 = 43.8
MUL-MEDTEC | Doc2Vec | MicroF1 = 86.51, Accuracy = 0.92
From Table 3, we can see that our system outperforms the other existing works. Based on our study, we can say that many approaches generate too many features, and the performance of the classification system depends on the choice of feature vectors, which makes the learning task even harder. Feature selection is a suitable way to overcome this problem.
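Micro-F1, used for most entries in Table 3, pools true positives, false positives and false negatives over all instances and labels before computing F1 = 2TP / (2TP + FP + FN). A minimal sketch on illustrative label sets (not the paper’s data):

```python
from typing import Sequence, Set


def micro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Pool TP, FP and FN over all instances and labels, then compute F1."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t, p = set(t), set(p)
        tp += len(t & p)  # labels predicted and present
        fp += len(p - t)  # labels predicted but absent
        fn += len(t - p)  # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [{"C2", "C4"}, {"C14"}, {"C23"}, {"C4"}]
y_pred = [{"C2", "C4"}, {"C14", "C23"}, {"C23"}, set()]
f1 = micro_f1(y_true, y_pred)
```

On these toy predictions micro-F1 is 0.8, while the exact-match accuracy of Eq. (1) would be 0.5, because micro-F1 gives partial credit for partially correct label sets; scores computed under different measures should therefore be compared with care.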
6 Conclusion
Classification of medical texts is a special case of text classification. In this paper, we have proposed a multi-label medical text classification system (MUL-MEDTEC) that can predict the different types of diseases in a medical record using the cooperation of two types of feature representation: features generated by doc2vec and statistical features that reinforce the learning of the prediction model. The logistic regression classifier is adopted as the base classifier of the one-versus-all strategy to classify the medical texts. The obtained results confirm the robustness of the proposed overall model (with feature cooperation), which reaches an accuracy of 0.92. As future work, we plan to use a meta-heuristic approach to find the optimal feature vector to better represent textual data.
References
1. Hughes, M., Li, I., Kotoulas, S., Suzumura, T.: Medical text classification using convolutional neural networks. Stud. Health Technol. Inform. 235, 246–250 (2017)
2. Lenivtceva, J., Slasten, E., Kashina, M., Kopanitsa, G.: Applicability of machine learning methods to multi-label medical text classification. In: Krzhizhanovskaya, V., et al. (eds.) Computational Science – ICCS 2020, pp. 509–522. Springer, Cham (2020)
3. Benzebouchi, N.E., Azizi, N., Hammami, N.E., Schwab, D., Khelaifia, M.C.E., Aldwairi, M.: Authors’ writing styles based authorship identification system using the text representation vector. In: 16th International Multi-Conference on Systems, Signals & Devices (SSD), Istanbul, Turkey, 21–24 March 2019, pp. 371–376. IEEE (2019)
4. Benzebouchi, N.E., Azizi, N., Aldwairi, M., Farah, N.: Multi-classifier system for authorship verification task using word embeddings. In: 2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP), Algiers, Algeria, 25–26 April 2018, pp. 1–6. IEEE (2018)
5. Qing, L., Linhong, W., Xuehai, D.: A novel neural network-based method for medical text classification. Future Internet 11, 255–268 (2019)
6. Alkhatib, W., Rensing, C., Silberbauer, J.: Multi-label text classification using semantic features and dimensionality reduction with autoencoders. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol. 10318, pp. 380–394. Springer, Cham (2017)
7. Lenc, L., Kral, P.: Word embeddings for multi-label document classification. In: Proceedings of Recent Advances in Natural Language Processing, Varna, Bulgaria, 4–6 September 2017, pp. 431–437 (2017)
8. Guo, Y., Chung, F., Li, G.: An ensemble embedded feature selection method for multi-label clinical text classification. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 823–826 (2016)
9. Azizi, N., Farah, N., Sellami, M.: Ensemble classifier construction for Arabic handwritten recognition. In: 7th International Workshop on Systems, Signal Processing and their Applications (WoSSPA), pp. 271–274 (2011)
10. Lee, H., Yoon, Y.: Engineering doc2vec for automatic classification of product descriptions on O2O applications. Electron. Commer. Res. 18(3), 433–456 (2017)
11. Kim, D., Seo, D., Cho, S., Kang, P.: Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477, 15–29 (2019)
12. Wan, S., Mak, M.-W., Kung, S.-Y.: mPLR-Loc: an adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal. Biochem. 473, 14–27 (2015)
13. Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Androutsopoulos, I.: Large-scale multi-label text classification on EU legislation. arXiv preprint arXiv:1906.02192 (2019)
14. Hoque, M.T., Islam, A., Ahmed, E., Mamun, K.A., Huda, M.N.: Analyzing performance of different machine learning approaches with Doc2vec for classifying sentiment of Bengali natural language. In: 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE (2019)
15. Dzisevic, R., Sesok, D.: Text classification using different feature extraction approaches. In: 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, pp. 1–4 (2019)
16. Resham, N.W., Thakare, A.D.: A review of feature extraction methods for text. International Journal of Advance Engineering and Research Development (2018)
17. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (2014)
18. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems (2013)
19. https://shuzhanfan.github.io/2018/08/understanding-word2vec-and-doc2vec/. Accessed 15 Aug 2020
20. You, X., Zhang, Y., Li, B., Lv, X., Han, J.: VDIF-M: multi-label classification of vehicle defect information collection based on Seq2seq model. In: Yin, Y., Li, Y., Gao, H., Zhang, J. (eds.) Mobile Computing, Applications, and Services. MobiCASE 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 290. Springer, Cham (2019)
21. Pant, P., Sabitha, A.S., Choudhury, T., Dhingra, P.: Multi-label classification trending challenges and approaches. In: Rathore, V., Worring, M., Mishra, D., Joshi, A., Maheshwari, S. (eds.) Emerging Trends in Expert Applications and Security. Advances in Intelligent Systems and Computing, vol. 841, pp. 433–444. Springer, Singapore (2019)
22. Ganda, D., Buch, R.: A survey on multi label classification. Recent Trends Program. Lang. 5(1), 19–23 (2018)
23. https://disi.unitn.it/moschitti/corpora.htm. Accessed 24 July 2020
24. Al-Salemi, B., Mohd Noah, S.A., Ab Aziz, M.J.: RFBoost: an improved multi-label boosting algorithm and its application to text categorization. Knowl.-Based Syst. 103, 104–117 (2016)
25. Al-Salemi, B., Masri, A., Noah, S.A.M.: Feature ranking for enhancing boosting-based multi-label text categorization. Expert Syst. Appl. 113, 531–543 (2018)
26. Parlak, B., Uysal, A.K.: The impact of feature selection on medical document classification. In: 2016 11th Iberian Conference on Information Systems and Technologies (CISTI). IEEE (2016)
27. Burkhardt, S., Kramer, S.: Online multi-label dependency topic models for text classification. Mach. Learn. 107(5), 859–886 (2018)
Image Modeling Through Augmented Reality for Skin Allergies Recognition
Nur Intan Raihana Ruhaiyem(B) and Nur Amalina Mazlan
School of Computer Sciences, Universiti Sains Malaysia, USM, 11800 Gelugor, Penang, Malaysia
Abstract. Skin rashes and allergies are common on the human body. Many skin care products are sold today, not only in pharmacies but also by individual businesses. However, not all products are suitable for all skin types. As laypeople, we sometimes do not know the type of rash or allergy we have. Meeting a dermatologist is not the first choice for many patients, especially given that the fees are expensive. Skin rashes can occur in anybody, and early recognition could prevent a rash from becoming worse. Seeking information online is often the first choice; however, patients are still quite likely to buy the wrong skin care products by mistake. Therefore, an augmented reality application for skin rash and allergy detection is expected to help solve this problem. With the help of dermatologists and healthcare professionals, the information in this application is well established and trustworthy. The advantages of this application include the ability to detect different types of skin rashes, the display of informative details on the detected skin rash to reduce wrong judgements about the allergy the patient has, and a reasonable processing speed on a mobile screen.
Keywords: Augmented reality · Skin rashes · Image processing · 3D modeling · Mobile application
1 Introduction
Augmented reality uses technology to integrate digital information with the user’s environment in real time. With augmented reality, an application can overlay new information on top of the existing environment. 3-Dimensional (3D) modeling is the main feature of an augmented reality application, as it allows the developer to attach 3D animations or digital information stored in the computer program to an augmented reality marker in the real world. When the augmented reality application recognizes a known marker, it executes the marker’s code and overlays the corresponding 3D model or animation. In this research project, augmented reality is used to address the problem of skin rashes. This work requires data on each type of skin rash and/or allergy, so that the application can differentiate the types of skin rashes, create 3D models of skin rashes and allergies, and display the information on the mobile screen.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 72–79, 2021. https://doi.org/10.1007/978-3-030-70713-2_8
To solve this problem, augmented reality technology is used to process an image of the patient’s skin problem through scanning. On the system interface, the target object pops up in the focus area of the screen. The user needs to point the camera at the targeted skin rash within this focus area. After a complete scan, the screen displays the expected 3D model of the skin rash type together with detailed information.
Skin rashes or allergies can make someone feel itchy and uncomfortable. Everyone from babies to the elderly is exposed to skin rashes. There are about 50 types of rashes, and in certain cases one type looks exactly like another type of rash or allergy. Although users often search for information, for example with Google Images, to identify the type of skin rash, mistakes may happen because of the user’s own perception. Helping users overcome skin rash problems in daily life has been the motivation for this research work and for the development of this application. The application is believed to help users who are too shy to ask about their skin problems, especially on their private parts.
The main objective of this work is to develop a complete augmented reality application for skin allergies that can be used by users or patients. The application is also equipped with a special feature that lets the user see the expected 3D model of the skin rash or allergy from the scan, which helps the user better understand what is happening to their skin and the type of skin rash. Moreover, this application is a new development that is expected to be used by many types of users.
Augmented reality (AR) is a technology that produces information, including processed 3D models, from the user’s real-time environment. In this work it is developed with Unity, software used for good visualization and interaction in mobile AR (Kim et al. 2014), which offers large programming toolsets (Eriksen et al. 2020) and can improve the user interface (Kim et al. 2014; Nuryono and Iswanto 2020). Image processing, 3D models, and a database are the features of the AR application. The application has four special features. First, the rotation speed of the 3D model, which makes it easier for users to explore beneath the skin rash (in video format). Second, the information text display, for example symptoms, which helps users differentiate the types of skin rashes. Third, the Vuforia Target Manager, through which the Vuforia SDK can detect and track image targets. Last, the 3D model of the skin rash, which increases users’ understanding of the expected outcome of the image processing (i.e. early diagnosis). The four modules of the system are listed in Table 1.
The application is expected to benefit end users: it helps patients get an early diagnosis at home if they are too shy to go out, helps parents identify their baby’s skin rashes, and is believed to save time for multiple users, such as patients, parents, and pharmacists. Its uniqueness is that no existing skin rash application uses augmented reality; here, the 3D model gives an indication of the likely rash type for early diagnosis using image processing, whereas websites generally provide only photos.
N. I. R. Ruhaiyem and N. A. Mazlan
Table 1. Description of system modules.
Module: Description
Image database: The images are stored in the database of the Vuforia Target Manager. In Unity, each image serves as the AR marker that triggers the display of the 3D model.
3D model of type skin rashes: The 3D model of each skin rash is created and built in Blender and then imported into Unity. Each 3D model is linked to the selected image from the database according to the type of skin rash.
Real-time skin allergy tracking: Skin allergies are detected in real time. The SDK detects the natural features found in the image itself and compares them against a known target resource database.
Real-time skin allergy result: The result is a 3D model of the type of skin rash or allergy. The expected 3D model is overlaid (as a layer) on the user's skin in the real world through the AR application, and information is also displayed as text.
1.1 Related Work
As mentioned earlier, there is no similar mobile application for skin rash detection through augmented reality technology. The mobile application closest to this project is the Doctor Mole Skin Cancer app, which uses the standard Asymmetry, Border, Color, Diameter and Risk (ABCDE) approach to determine and give instant risk feedback (Doctor Mole 2015). This app focuses on skin cancer and on detecting malignant lesions, so its detection and recognition techniques are different. Doctor Mole is a medical app that uses AR and the camera to scan and analyze a suspicious mole in real time. The captured photo is saved and can be used later to see how the mole changes over time. Another approach related to AR technology is tracking with fine object segmentation (TFOS). Its foundations go back to 1989, when the basic properties of three new variational problems suggested by applications to computer vision were introduced (Mumford and Shah 1989). In 2013, TFOS was introduced as a novel method for on-line, joint object tracking and segmentation (Konstantinos and Antonis 2013).
Image Modeling Through Augmented Reality
1.2 Proposed System
Besides diagnosing skin rashes or allergies in real time, the app can also be used to educate patients and users about skin rashes or allergies through 3D models. The main features of the app are: real-time detection or scanning, which allows the application to be used directly on the human body and displays a 3D model as the outcome of the diagnosis; offline capability; and responsiveness, since displaying the result quickly is one of the application's priorities and the user should not have to wait while the application loads.
2 System Analysis, Design, and Implementation
This application can be offered as a service or a product to the client or user. Its main features are, first, real-time detection, where augmented reality allows the application to be used directly on the human body or skin and displays a 3D model as the outcome of the detection, and second, offline capability, where the app can be used without an internet connection. The system capabilities include detection speed and 3D object modeling, both of which rely on the smartphone camera (at least 5 MP for better detection and results). Like any other application, this app has limitations: it cannot be used in the dark or in places with insufficient light, it has no social media sharing, and it has no sound. These limitations can be addressed in future work. The architecture diagram in Fig. 1 gives an overview of how the app works, from the detection phase to the production of results. For the image database, images of skin rashes were taken from trusted websites and from a collection of private photos. The images are stored in the database of the Vuforia Target Manager, whose features are listed in Table 2.
Table 2. Features of the Vuforia target manager.
Feature: Description
Rating: The rating is displayed in the Target Manager and ranges from 0 to 5 for any given image. The higher the rating of an image target, the stronger its tracking and detection ability. A rating of zero indicates that the image target is not tracked at all by the AR system, while a rating of five indicates that the image target is easily tracked by the AR system.
Add Target: More image targets can be added to the Target Manager database by uploading images with the Add Target button.
Download Database: The database is downloaded as a Unity package, which allows all image targets to be imported into Unity.
Fig. 1. System architecture of the AR application on skin rashes and allergies.
3 System Testing and Evaluation
In Unity, a play button renders the scene. Once it is turned on, it shows whether the app is working well or not. The system is considered to be working successfully when the 3D model appears after the camera targets the image, the 3D model can be rotated, and the text is displayed. Other scenarios can also occur, such as a different target image or different lighting settings, for example different hue/saturation values. In general, the test results are good, as tracking and detection are fast.
4 System Interface Design
Unity is the main software used to develop the app, and the images were created in it to serve as AR markers (Fig. 2). All settings, including camera and image position, are fine-tuned here. Blender is used to build the 3D models of the skin rashes that provide extra information to users. Blender can create not only static 3D images but also animated 3D graphics (Fig. 3). Since one of the objectives of this system is to produce an AR application that also serves as a learning tool on skin rashes, a 3D model is shown for learning purposes once a rash is detected and recognized (Fig. 4). Information about the skin rash, including tips on how to recognize it and how to heal it, also pops up on the screen. The evidence that the application meets its requirements is that the 3D model works, the database can be stored and imported into Unity as a package, the SDK can detect and track the image, and the 3D model rotation works, giving the user the opportunity to explore 360° under the skin.
Fig. 2. Unity interface showing the settings of AR camera (left side) and image position (right side) and other settings important for AR marker.
Fig. 3. Blender interface showing all available tools for 3D model development (left side) and render settings (right side).
What are the symptoms of Mosquito bite?
1. Puffy bump on the skin immediately after the bite
2. Reddish brown, itchy bumps
3. Dark spots or bruises caused by itching
4. Mild fever and body ache
Fig. 4. Interface of the working AR app: the 3D model and the information pop up after the skin rash is successfully detected (which shows that the AR marker is working well).
5 Conclusion
Using AR as a new technology approach for the medical field, one that many people can easily use with real-time interaction, is worth exploring. Educating users or displaying information through 3D models should also be more widely adopted, as it presents realistic scenarios that are easy to understand (Loke and Ruhaiyem 2020; Teh et al. 2020). Furthermore, this application provides the extra benefit of educating users about skin problems. An important finding is that Unity 3D can be used together with AR technology to create Android applications. Several 3D modeling packages (e.g. Maya, Blender, and 3D Studio Max) were tested before Blender was chosen to create the 3D models. In the future, other technologies could be explored to obtain better AR results for problem-solving applications.
References
Doctor Mole app 2015 Homepage. https://apkpure.com/doctor-mole-skin-cancer-app/com.revsoft.doctormole. Accessed 02 Sep 2020
Eriksen, K., Nielsen, B.E., Pittelkow, M.: Visualizing 3D molecular structures using an augmented reality app. J. Chem. Educ. 97(5), 1487–1490 (2020)
Kim, S.L., Suk, H.J., Kang, J.H., Jung, J.M., Laine, T., Westlin, J.: Using Unity 3D to facilitate mobile augmented reality game development. In: IEEE World Forum on Internet of Things (WF-IoT) (2014)
Konstantinos, E.P., Antonis, A.A.: Integrating tracking with fine object segmentation. Image Vis. Comput. 31(10), 771–785 (2013)
Loke, H.K., Ruhaiyem, N.I.R.: A conceptual perspective in mathematics through augmented reality and 3D image modeling application. In: 8th International Conference on Multidisciplinary Research on European Proceedings of Social and Behavioral Sciences, pp. 613–620. European Publisher (2020)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Nuryono, A.A., Iswanto, A.M.: Comparative analysis of path-finding algorithm unrestricted virtual object movable for augmented reality. Int. J. Sci. Technol. Res. 1(1) (2020)
Teh, Y.X., Ruhaiyem, N.I.R., Syed-Mohamad, S.M.: MYTOXAPP: A mobile system toxicology emergencies through image processing. In: 8th International Conference on Multidisciplinary Research on European Proceedings of Social and Behavioral Sciences, pp. 694–700. European Publisher (2020)
Hybridisation of Optimised Support Vector Machine and Artificial Neural Network for Diabetic Retinopathy Classification
Nur Izzati Ab Kader, Umi Kalsom Yusof (corresponding author), and Maziani Sabudin
School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia
{umiyusof,maziani}@usm.my
Abstract. Diabetic Retinopathy (DR) is a sight-threatening disease that can cause blindness in diabetic patients. With the increasing number of DR cases, diabetic eye screening is a challenging task for experts. Using machine learning to build a high-accuracy classifier can reduce the burden of diabetic eye screening. Therefore, this paper proposes a high-accuracy DR classifier based on clinical attributes. The study used nine clinical attributes of 385 diabetic patients who had already been labelled with respect to DR: 79 patients did not have DR (NODR), 161 had non-proliferative DR (NPDR), and 145 had proliferative DR (PDR). The data were then used to develop a DR classifier through a hybrid of an optimised Support Vector Machine (SVM) and an Artificial Neural Network (ANN). The experimental results showed that the hybrid classifier achieved a high accuracy of 94.55%, which was higher than that of a single classifier.
Keywords: Diabetic Retinopathy · Classification · Hybridisation · Support Vector Machine · Neural Network
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 80–90, 2021. https://doi.org/10.1007/978-3-030-70713-2_9
Hybridisation of Optimised Support Vector Machine
1 Introduction
Diabetic Retinopathy (DR) is one of the complications of Diabetes Mellitus (DM) and affects 1 in 3 people with DM. It is caused by damage to the retinal blood vessels and the light-sensitive tissue at the back of the eye. It may be asymptomatic at an early stage but can eventually cause permanent vision loss if it is not diagnosed and treated in time [1]. According to the WHO Global report, the number of adults living with diabetes has been increasing year by year and quadrupled from 108 million in 1980 to 422 million in 2016. The rise in Type 2 diabetes, together with the factors driving it, such as overweight and obesity, is the main contributor to this drastic increase [2]. With the increasing prevalence of diabetes, classifying abnormal retinas has become a challenging task, as ophthalmologists must diagnose a large number of retinal images every day. Screening and early detection of DR play a significant role in reducing the incidence of visual morbidity and blindness. The screening processes are done manually in most countries [3]. Usually, ophthalmologists differentiate normal healthy vessels from abnormal vessels using relative characteristics based on their experience, which can lead to inconsistencies during the grading process [4, 5]. The examination is carried out with an ophthalmoscope to inspect the fundus of the eye directly, after the pupil has been dilated [6]. The retina is evaluated on a qualitative scale, such as mild, moderate, severe, and extreme. Although such a scale is occasionally useful, it is not very effective: it can cause variability in grading, as the boundaries between the grades may differ between observers [7], and it may also be prone to errors [8]. The high prevalence of the disease is drawing attention from all parties to step up its prevention and treatment. Current technology makes collaboration between experts from different areas possible [9]. The application of computational techniques has already had a significant impact on the health sector. For instance, supervised machine learning is widely used to predict the presence or absence of a disease [10]. These methods play an essential role in improving the diagnosis and treatment of disease. Among the solutions proposed by previous researchers is DR classification that can assist ophthalmologists in the grading process. Various methods have been proposed for DR classification, such as retinal imaging, a technique that classifies DR based on the abnormalities found in retinal fundus images [11]. Although it facilitates early detection of DR, it requires additional equipment, which is often cost-prohibitive or unavailable, especially in rural areas. Alternatively, several DR classifiers have been developed using clinical variables instead of retinal imaging.
However, there is still room for improvement, especially in classifier accuracy. Therefore, this study classifies DR with the objective of finding classifiers with optimal or near-optimal performance metrics, using a hybrid of an optimised Support Vector Machine (SVM) and an Artificial Neural Network (ANN), known as SVM-NN. The rest of the paper is organised as follows. Sect. 2 reviews the literature on the problem domain. Sect. 3 describes the proposed work. Sect. 4 evaluates the results of the proposed work. Finally, Sect. 5 discusses the results and Sect. 6 concludes the paper.
2 Related Works
2.1 Support Vector Machine
Support Vector Machine (SVM) is an algorithm introduced by Corinna Cortes and Vladimir Vapnik in 1995 [12]. It is used both to classify input data received by a computing system and for regression tasks. SVM is a supervised machine learning method that classifies data points by constructing a hyperplane that separates two classes after the input data has been transformed into a high-dimensional space [13]. It works by fitting a boundary to homogeneous regions of the data. Once a boundary is fitted, a test sample is classified by checking on which side of the boundary it lies. A core set of points helps to identify and fix the boundary. These points are called support vectors because they support the boundary; they are called vectors because each data point is a vector, that is, a row of data containing values for a number of different attributes. A key strength of SVM is that it can efficiently perform non-linear classification using the kernel trick. The kernel function computes inner products in the high-dimensional feature space directly from the original inputs, so the algorithm can be constructed without explicitly computing the feature space [14].
N. I. Ab Kader et al.
2.2 Artificial Neural Network
Artificial Neural Network (ANN) is a computational technique inspired by neurophysiology. It is analogous to processing in the human brain, where synapses are reinforced or weakened [15]. The human body can receive, process, and send signals through nerve pathways. Neural cells, made up of nerve endings and the nucleus of the axon, transmit these signals chemically across synapses. ANN has been applied in various branches of science and technology since the 1980s [16]. For instance, in the medical sector, ANN has proven useful in analysing blood and urine samples of diabetic patients, classifying leukemia, diagnosing tuberculosis, and analysing complicated effusion samples [17]. The structure of an ANN has three layers, namely the input layer, the hidden layer, and the output layer. The number of neurons in each layer is determined by the complexity of the system being studied. Figure 1 shows the architecture of an Artificial Neural Network with these three layers. The input layer receives the information, which is processed in the hidden layer to produce the output.
Fig. 1. Architecture of Artificial Neural Network.
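The three-layer structure in Fig. 1 can be made concrete with a forward pass. This is a minimal sketch, not the authors' network: the hidden-layer size (5), the tanh activation, the softmax output, and the random weights are all assumptions; only the nine inputs (the clinical attributes of Sect. 3.1) and the three outputs (NODR, NPDR, PDR) come from this chapter.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)

def dense(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Weighted sum w . x + b for every neuron in one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z: List[float]) -> List[float]:
    """Turn output-layer scores into class probabilities."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (an assumed initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x: List[float], hidden: Layer, output: Layer) -> List[float]:
    """Input layer -> hidden layer (tanh) -> output layer (softmax)."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))

rng = random.Random(42)
hidden = init_layer(9, 5, rng)   # 9 clinical attributes in, 5 hidden neurons (assumed)
output = init_layer(5, 3, rng)   # 3 classes out: NODR, NPDR, PDR
x = [0.2] * 9                    # hypothetical, already-normalised patient record
probs = forward(x, hidden, output)
print([round(p, 3) for p in probs])
```

The weights here are untrained, so the probabilities are meaningless until the network has learned from examples.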
ANN is known for its ability to learn from examples, a significant trait of intelligence. Instead of following rules specified by human experts, an ANN learns from examples (such as input-output pairs), which makes it attractive. The learning process of an ANN can be understood as the problem of updating the network architecture and connection weights so that the network performs its task efficiently. The network learns the connection weights from the available training patterns.
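Learning connection weights from input-output examples can be illustrated with the smallest possible network, a single sigmoid neuron trained by gradient descent. This is an illustrative sketch, not the chapter's training procedure: the AND-gate data, the learning rate, the number of epochs, and the cross-entropy loss are assumptions chosen only to show weights being learned from examples.

```python
import math
from typing import List, Tuple

Example = Tuple[List[float], int]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w: List[float], b: float, x: List[float]) -> float:
    """Neuron output: sigmoid of the weighted input sum plus bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_neuron(data: List[Example], lr: float = 0.5,
                 epochs: int = 2000) -> Tuple[List[float], float]:
    """Learn weights and bias from input-output examples by gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y   # cross-entropy gradient w.r.t. the weighted sum
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

AND_EXAMPLES: List[Example] = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
w, b = train_neuron(AND_EXAMPLES)
preds = [int(predict(w, b, x) > 0.5) for x, _ in AND_EXAMPLES]
print(preds)  # [0, 0, 0, 1]
```

No rule for AND is written anywhere in the code; the behaviour comes entirely from the weights fitted to the four examples.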
2.3 Hybrid Machine Learning Algorithm
A hybrid algorithm combines two or more algorithms into a single algorithm [18]. It is motivated by the possibility that the new algorithm performs better than either individual algorithm. A hybrid algorithm is also known as a poly-algorithm when there is a choice at a high level between at least two distinct algorithms, each of which could solve the same problem. The aim is better execution performance, which depends on both the input/output data and the computing resources. A hybrid algorithm can be implemented using the divide, recursive, and conquer method. Divide means splitting a step of the first algorithm into smaller steps to find opportunities to create new parameters and fill in features from the sub-algorithm. Recursive means repeating the sub-algorithm, while conquer means that the main algorithm controls the sub-algorithm. The first algorithm in the hybrid is denoted A1 and the second algorithm A2. The hybrid algorithm, H, is developed by modifying the code of A1 to introduce a new parameter, n0, at which algorithm A2 takes over [19].
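The A1/A2/n0 pattern is easiest to see on a classic example from outside this chapter: merge sort (A1) modified with a cutoff n0 below which insertion sort (A2) takes over. This sketch only illustrates the hybrid pattern and is not the SVM-NN hybrid; the function names are hypothetical, and the comments map each step to divide, recursive, and conquer as described above.

```python
from typing import List

def insertion_sort(xs: List[int]) -> List[int]:
    """A2: a simple quadratic sort that is fast on very small inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def hybrid_sort(xs: List[int], n0: int = 8) -> List[int]:
    """H: merge sort (A1) with a new parameter n0 at which A2 fills in."""
    if len(xs) <= max(n0, 1):   # conquer: A1 hands small subproblems to A2
        return insertion_sort(xs)
    mid = len(xs) // 2          # divide: split the problem in two
    # recursive: apply the hybrid to each half, then merge the results
    return merge(hybrid_sort(xs[:mid], n0), hybrid_sort(xs[mid:], n0))

print(hybrid_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], n0=3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Tuning n0 trades A1's better asymptotic behaviour against A2's lower overhead on small inputs, which is the performance motivation given above.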
3 Proposed Work
3.1 Data from the Electronic Health Record
The dataset used in this study was provided by the Eye Clinic of the Sakarya University Educational and Research Hospital. It had previously been used in [20], which investigated DR prediction using Naive Bayes. It contains 385 diabetic patients who had already been labelled with respect to DR: 79 patients did not have DR (class NODR), 161 patients presented with NPDR, and 145 patients presented with PDR. NPDR is the moderate stage, while PDR is the most severe stage in the DR classification. All attributes in the data are numerical (Haemoglobin, Glycated Haemoglobin, Low-Density Lipoprotein, High-Density Lipoprotein, Diabetes Duration, Creatinine, Triglyceride, Glucose, and URE).
3.2 Performance Evaluation
The general purpose of classification is to predict the categorical class label of unseen data using a classification model built from the training data. A confusion matrix is a table that summarises the performance of a classification model on data whose true values are known. It contains information about the actual and predicted classifications made by a classification system [21]. The accuracy of an algorithm is the percentage of the test dataset that the algorithm classifies correctly. It measures the general performance of the algorithm and is computed from the confusion matrix using the correctly predicted True Positive (TP) and True Negative (TN) classifications, together with the False Positive (FP) and False Negative (FN) counts, as in Eqs. 1–3.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
Apart from that, sensitivity and specificity are computed from the confusion matrix to obtain more specific information on the performance of the algorithm. Sensitivity measures the proportion of actual positive cases that the algorithm identifies, while specificity measures the proportion of actual negative cases that it identifies. The terms sensitivity and specificity originated in screening tests for diseases. Sensitivity is the probability that the test says a person has the disease when, in fact, they do have it. In other words, it measures how likely the algorithm is to detect the disease in a person who has it. Specificity, on the other hand, is the probability that the algorithm says a person does not have the disease when, in fact, they are disease-free. It is also an important measure. An ideal algorithm should have both high sensitivity and high specificity [22]. Both are computed from the correctly predicted True Positive and True Negative classifications, as in Eqs. 4–5.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (5)$$
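Eqs. 1–5 are stated for two classes, while the DR data has three. A common way to apply them, assumed here, is one-vs-rest: for each class k, cases of class k are positives and all other cases are negatives, and overall accuracy generalises TP + TN to the diagonal of the confusion matrix. The chapter refers to a confusion matrix but does not print it, so the matrix below is hypothetical. We chose it to agree with the class sizes in Sect. 3.1, the per-class values in Table 1, and the single NODR patient labelled as NPDR mentioned in Sect. 4.1.

```python
from typing import List, Tuple

Matrix = List[List[int]]

CLASSES = ["NODR", "NPDR", "PDR"]
# Hypothetical confusion matrix (rows = actual class, columns = predicted class).
# Not published in the chapter; chosen to be consistent with Table 1.
CM: Matrix = [
    [78, 1, 0],
    [2, 155, 4],
    [1, 13, 131],
]

def one_vs_rest(cm: Matrix, k: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for class k, treating every other class as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actually k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    tn = total - tp - fn - fp
    return tp, fp, tn, fn

def accuracy(cm: Matrix) -> float:
    """Eq. (1) generalised: correctly classified cases (the diagonal) over all cases."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

def precision(cm: Matrix, k: int) -> float:
    tp, fp, _, _ = one_vs_rest(cm, k)
    return tp / (tp + fp)                 # Eq. (2)

def recall(cm: Matrix, k: int) -> float:
    tp, _, _, fn = one_vs_rest(cm, k)
    return tp / (tp + fn)                 # Eq. (3)

sensitivity = recall                      # Eq. (4) is the same ratio as Eq. (3)

def specificity(cm: Matrix, k: int) -> float:
    _, fp, tn, _ = one_vs_rest(cm, k)
    return tn / (tn + fp)                 # Eq. (5)

print(f"accuracy = {100 * accuracy(CM):.2f}%")
for k, name in enumerate(CLASSES):
    print(f"{name}: precision = {precision(CM, k):.4f}, "
          f"recall/sensitivity = {sensitivity(CM, k):.4f}, "
          f"specificity = {specificity(CM, k):.4f}")
```

Run as is, this prints an accuracy of 94.55% and reproduces the precision, sensitivity, and specificity columns of Table 1 to four decimal places.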
The F-measure is also used to evaluate the algorithms. It is the harmonic mean of precision (positive predictive value) and recall (sensitivity). According to Van Rijsbergen (1979), the F-measure combines recall (R) and precision (P) with equal weight, as in Eq. 6.
$$F = \frac{2PR}{P + R} \qquad (6)$$
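Eq. 6 can be checked against the per-class precision and recall of SVM-NN in Table 1. The sketch simply applies the formula; returning 0 when P + R = 0 is a convention we assume to avoid division by zero.

```python
from typing import Dict, Tuple

def f_measure(p: float, r: float) -> float:
    """Eq. (6): harmonic mean of precision p and recall r with equal weights."""
    if p + r == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * p * r / (p + r)

# Per-class (precision, recall) of SVM-NN as reported in Table 1.
TABLE1: Dict[str, Tuple[float, float]] = {
    "NODR": (0.9630, 0.9873),
    "NPDR": (0.9172, 0.9627),
    "PDR": (0.9704, 0.9034),
}

for name, (p, r) in TABLE1.items():
    print(f"{name}: F = {f_measure(p, r):.4f}")
```

This reproduces the F-measure column of Table 1 (0.9750, 0.9394, 0.9357). Because it is a harmonic mean, F is pulled toward the smaller of P and R: P = 0.5 and R = 1.0 give F ≈ 0.667 rather than the arithmetic mean 0.75.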
According to [23], precision can be understood as the probability that a randomly chosen predicted positive instance is relevant, while recall is how close we are to a specific target on average.
3.3 The Overall Flow Architecture of Hybrid Optimised Support Vector Machine and Artificial Neural Network
SVM-NN combines the prediction output of an optimised Support Vector Machine and an Artificial Neural Network. In SVM-NN, the primary algorithm (A1) is SVM, while the secondary algorithm (A2) is ANN. Figure 2 shows the overall flow architecture of SVM-NN. SVM-NN starts with input initialisation, which consists of the clinical features from the dataset described in Sect. 3.1 and the SVM parameters. The RBF kernel was used because of its higher efficiency, so two RBF hyperparameters were involved: the cost, C, and gamma, γ. Before the hybrid process, a hyperparameter optimisation phase was carried out to ensure that the SVM was optimised, that is, that it ran with the best hyperparameters. As a result, C was set to 64 and γ was set to 0.03. For the ANN, two essential hyperparameters were involved: the number of hidden layers and the number of neurons.
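The chapter does not say how C = 64 and γ = 0.03 were found. A common approach, assumed here, is an exhaustive grid search over exponentially spaced values of C and γ, scoring each pair by a validation metric such as cross-validated accuracy. The sketch keeps the scorer abstract. `toy_score` is a hypothetical stand-in whose peak we place at C = 2^6 = 64 and γ = 2^-5 ≈ 0.031 purely so that the example has a known answer; the exponent ranges are also assumptions.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple

def grid_search(score: Callable[[float, float], float],
                c_exponents: Iterable[int],
                gamma_exponents: Iterable[int]) -> Tuple[float, float, float]:
    """Exhaustive search over C = 2**i and gamma = 2**j; returns (C, gamma, score).

    `score(C, gamma)` should return a metric to maximise, for example the mean
    cross-validated accuracy of an RBF SVM trained with those values.
    """
    best_c, best_gamma, best_score = math.nan, math.nan, -math.inf
    for i, j in itertools.product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** i, 2.0 ** j
        s = score(c, gamma)
        if s > best_score:
            best_c, best_gamma, best_score = c, gamma, s
    return best_c, best_gamma, best_score

def toy_score(c: float, gamma: float) -> float:
    """Hypothetical smooth score surface peaking at C = 64, gamma = 2**-5."""
    return -(math.log2(c) - 6) ** 2 - (math.log2(gamma) + 5) ** 2

best = grid_search(toy_score, range(-5, 16), range(-15, 4))
print(best[:2])  # (64.0, 0.03125)
```

In practice each `score` call trains and validates a full SVM, so the grid size directly sets the cost of the optimisation phase.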
Fig. 2. The overall flow architecture of SVM-NN.
The input vectors were then propagated to the hidden layer for the training step. Training used the backpropagation algorithm, which is a special case of reverse-mode automatic differentiation. Backpropagation is commonly used to adjust the weights and biases of an ANN's neurons when the result is unsatisfactory. It refines the weights so that the ANN learns to map arbitrary inputs to outputs, with the network's output moving closer to the target output. In ANN, this is what is known as "learning". The cycle of forward propagation, error estimation, and weight modification was repeated until the full number of iterations was reached. Normally, the output is generated and sent to the output layer after the training phase has ended. In SVM-NN, however, the output layer was truncated and replaced with the SVM's RBF kernel, so the output of the hidden layer became the input for the SVM. Next, the SVM was trained with the RBF kernel on the processed data. Equation 7 gives the new RBF kernel formulation. It is nearly the same as the original RBF formula; the difference lies in the inputs. The original RBF kernel is computed on the original dataset, while Eq. 7 is computed on the representation derived from the hidden layer.
$$K(X_P, X_1) = e^{-\gamma \lVert X_P - X_1 \rVert^2} \qquad (7)$$
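The change in Eq. 7, where the RBF kernel is evaluated on hidden-layer outputs instead of on the raw attributes, can be sketched as follows. Assumptions: a single hidden layer with tanh activation, hypothetical untrained weights (in SVM-NN they would come from backpropagation), and three toy samples; only γ = 0.03 is taken from the chapter. The function builds the kernel (Gram) matrix that an SVM solver would consume. The solver itself is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]

def hidden_layer(x: Sequence[float], weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> Vector:
    """Hidden-layer activations tanh(W x + b); the ANN output layer is not used."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Eq. (7): exp(-gamma * ||u - v||^2), applied to hidden-layer outputs."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

def hybrid_gram_matrix(samples: Sequence[Sequence[float]],
                       weights: Sequence[Sequence[float]],
                       biases: Sequence[float], gamma: float) -> List[Vector]:
    """Kernel matrix an SVM would be trained on, computed in hidden-layer space."""
    hidden = [hidden_layer(x, weights, biases) for x in samples]
    return [[rbf_kernel(hi, hj, gamma) for hj in hidden] for hi in hidden]

# Hypothetical, untrained weights: 3 inputs -> 2 hidden neurons.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
B = [0.0, 0.1]
samples = [[1.0, 2.0, 0.5], [0.9, 2.1, 0.4], [3.0, -1.0, 2.0]]
K = hybrid_gram_matrix(samples, W, B, gamma=0.03)
for row in K:
    print([round(k, 4) for k in row])
```

Every diagonal entry is 1, the matrix is symmetric, and the two nearby samples get a kernel value closer to 1 than the distant one. This is the similarity structure the SVM then separates.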
Training was continued until the maximum iteration, maxit = 10, was reached. Upon reaching the maximum iteration, the model was tested using test data. Then, the results produced were analysed.
4 Results
Table 1 shows the performance results of the SVM-NN model. The first performance measure observed was the accuracy of the algorithm. With the nine inputs fed into and processed by the hidden layer and then trained with the kernel function, SVM-NN achieved a notable improvement in classification, with 94.55% accuracy. SVM-NN also performed well on the additional metrics. The average sensitivity and specificity over all classes were high, at 0.9511 and 0.9704, respectively. SVM-NN also showed a high F-measure, reflecting good precision and recall. The F-measure is the harmonic mean of precision and recall, and its best possible value is 1. The average F-measure of SVM-NN was close to 1, at 0.9500.
Table 1. Result of SVM-NN based on each class of Diabetic Retinopathy.
Algorithm | Accuracy (%) | Class | Sensitivity | Specificity | Precision | Recall | F-Measure
SVM-NN | 94.55 | NODR | 0.9873 | 0.9902 | 0.9630 | 0.9873 | 0.9750
 | | NPDR | 0.9627 | 0.9375 | 0.9172 | 0.9627 | 0.9394
 | | PDR | 0.9034 | 0.9833 | 0.9704 | 0.9034 | 0.9357
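The averages quoted above are unweighted (macro) means over the three classes. A quick check from the Table 1 values:

```python
from statistics import mean

# SVM-NN per-class values from Table 1, in the order NODR, NPDR, PDR.
sensitivity = [0.9873, 0.9627, 0.9034]
f_measure = [0.9750, 0.9394, 0.9357]

print(f"macro-average sensitivity = {mean(sensitivity):.4f}")  # 0.9511
print(f"macro-average F-measure   = {mean(f_measure):.4f}")    # 0.9500
```

A macro average gives each DR class equal weight regardless of how many patients it contains, so the smaller NODR class counts as much as NPDR.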
4.1 Comparison of Results Between SVM-NN, Optimised SVM, and Non-optimised SVM Algorithms
Table 2 compares the performance of SVM-NN with optimised SVM and non-optimised SVM for each class of DR. Optimised SVM is the SVM algorithm run with the optimal parameter setting, whereas non-optimised SVM is the SVM algorithm run with the default parameter setting. For the NODR class, every metric of SVM-NN was at least 0.9630. The highest metric obtained by SVM-NN in this class was specificity, at 0.9902. For this class, high specificity means that the algorithm rarely labels a patient with DR as NODR, i.e. it produces few false positives for NODR. The confusion matrix also showed that SVM-NN correctly identified all NODR patients except one, who was wrongly labelled as an NPDR patient. Compared with the optimised and non-optimised SVM, SVM-NN achieved the highest value for every performance metric. SVM-NN also showed a strong result in the NPDR class, with every metric at least 0.9172. The highest metric obtained by SVM-NN in this class was sensitivity, at 0.9627.
High sensitivity means that a person with the illness is identified as positive by the algorithm. Hence, the model successfully classified the patients who were at the NPDR level. Its sensitivity was 0.1677 higher than that of the non-optimised SVM. The lowest value of SVM-NN for this class was 0.9172, for precision. Nevertheless, this was still high compared with the optimised and non-optimised SVM.
Table 2. Result for Diabetic Retinopathy classification for SVM-NN compared with optimised SVM and non-optimised SVM.
Class | Technique | Accuracy (%) | Sensitivity | Specificity | Precision | Recall | F-Measure
NODR | SVM-NN | 94.55 | 0.9873 | 0.9902 | 0.9630 | 0.9873 | 0.9750
 | Optimised SVM | 85.45 | 0.9494 | 0.9804 | 0.9260 | 0.9494 | 0.9375
 | Non-optimised SVM | 76.62 | 0.8481 | 0.9771 | 0.9054 | 0.8481 | 0.8758
NPDR | SVM-NN | 94.55 | 0.9627 | 0.9375 | 0.9172 | 0.9627 | 0.9394
 | Optimised SVM | 85.45 | 0.8882 | 0.8438 | 0.8033 | 0.8882 | 0.8437
 | Non-optimised SVM | 76.62 | 0.7950 | 0.7545 | 0.6995 | 0.7950 | 0.7442
PDR | SVM-NN | 94.55 | 0.9034 | 0.9833 | 0.9704 | 0.9034 | 0.9357
 | Optimised SVM | 85.45 | 0.7655 | 0.9376 | 0.8810 | 0.7655 | 0.8192
 | Non-optimised SVM | 76.62 | 0.7813 | 0.7812 | 0.6897 | 0.7813 | 0.7327
SVM-NN maintained its performance in the PDR class, with every metric at least 0.9034. For this class, the specificity of SVM-NN was the highest of all its measures. This means that SVM-NN correctly recognises people who do not have PDR as not having PDR (true negatives). The precision of SVM-NN was also high for this class, at 0.9704. Precision is the proportion of patients predicted as PDR who actually have PDR, so high precision means that few patients are wrongly labelled as PDR.
[Fig. 3: bar chart of sensitivity (%, axis 0–120) for the NODR, NPDR, and PDR classes under SVM-NN, Optimised SVM, and SVM]
Fig. 3. Comparison of sensitivity between SVM-NN, Optimised SVM, SVM.
The sensitivity of the SVM-NN algorithm was also compared with that of optimised and non-optimised SVM. Figure 3 shows the comparison of sensitivity between SVM-NN, Optimised SVM, and SVM. SVM-NN had higher sensitivity than the other two algorithms for every class. This suggests that the hybridisation stage enhanced SVM's ability to identify NODR, NPDR, and PDR patients.
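The per-class gaps behind Fig. 3, including the 0.1677 NPDR improvement over the non-optimised SVM mentioned in Sect. 4.1, can be computed directly from the sensitivity column of Table 2. A small sketch; `gains` is a hypothetical helper name.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]

# Per-class sensitivity from Table 2.
SENSITIVITY: Table = {
    "NODR": {"SVM-NN": 0.9873, "Optimised SVM": 0.9494, "Non-optimised SVM": 0.8481},
    "NPDR": {"SVM-NN": 0.9627, "Optimised SVM": 0.8882, "Non-optimised SVM": 0.7950},
    "PDR": {"SVM-NN": 0.9034, "Optimised SVM": 0.7655, "Non-optimised SVM": 0.7813},
}

def gains(table: Table, model: str = "SVM-NN") -> Table:
    """Difference between `model` and every other technique, per class (4 d.p.)."""
    return {cls: {tech: round(row[model] - value, 4)
                  for tech, value in row.items() if tech != model}
            for cls, row in table.items()}

for cls, diffs in gains(SENSITIVITY).items():
    print(cls, diffs)
```

Every difference is positive, matching the statement that SVM-NN's sensitivity exceeds both baselines in all three classes.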
5 Discussion
In this research, SVM-NN, a hybrid of optimised SVM and ANN, was developed to improve DR classification. With the right combination of the two algorithms, SVM-NN gave strong results and improved on both the optimised and the non-optimised SVM. The hidden layer added to the optimised SVM was found to play a role in improving performance: in an experiment where the hidden layer was removed from SVM-NN, the accuracy was lower. This indicates that the hidden layer plays a significant part in SVM-NN. With the hidden layer included, the input vectors are processed before SVM training. During training, the error is propagated back through the hidden layer, and the operations in the hidden layer produce an encoding of what the network considers to be the important input features. The number of neurons (nodes) in the hidden layer must be chosen carefully, because too few neurons can cause underfitting and too many can cause overfitting. The number of neurons chosen in this study was appropriate for the SVM-NN model. Another important factor that supported the learning cycle in the hidden layer was the backpropagation algorithm, which adjusts the weights and biases of the neurons. Without an efficient backpropagation algorithm, the weights and biases of the nodes could not be optimised and a good output could not be produced.
6 Conclusions
In this paper, a hybrid of an optimised Support Vector Machine and an Artificial Neural Network was studied in detail. The results showed that SVM-NN gave the best performance among the compared algorithms, with 94.55% accuracy, and a better result than the current literature. With this performance, the proposed DR classifier could serve as an aid to experts in the diagnosis of DR. It can support experts' decision making and could become a standard guideline for diagnosis. In addition, classifying the severity of DR is essential for establishing adequate therapy. As the healthcare industry continually seeks to improve efficiency and throughput, this approach appears to be a satisfactory solution that can provide fast results and timely management of eye screening. Further studies should be conducted to improve the performance of these classification techniques using a larger dataset. Other performance measures, such as time complexity, could also be included.
Hybridisation of Optimised Support Vector Machine
Acknowledgement. The authors would like to thank Universiti Sains Malaysia for the assistance it has provided through the Fundamental Research Grant Scheme (203/PKOMP/6711802) to complete the current work.
N. I. Ab Kader et al.
A Habit-Change Support Web-Based System with Big Data Analytical Features for Hospitals (Doctive)
Cheryll Anne Augustine and Pantea Keikhosrokiani(B)
School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
Abstract. Despite advances in medical services, mortality around the world is still increasing, especially due to heart disease, which remains the number one cause of death globally. To protect their health, individuals need to adopt healthy eating and regular exercise, which means changing habits in their daily routine. Such healthy habit change protects not only against heart disease but also against other chronic diseases such as cancer and stroke. Therefore, a habit-change support web-based system with big data analytics and decision-making features, called Doctive, is developed in this study to lower the risk of heart disease. Doctive is targeted at hospital authorities, allowing them to monitor patients and their habits and to prescribe medication and advice based on patients' habits and gathered information. The system also provides emergency assistance for patients based on their current location. In addition, it helps collect and organize patients' information to ease access and speed up data entry and retrieval. The system was tested and evaluated by 5 people who were medically qualified or had knowledge and expertise in data analytics and visualization. Their responses were tabulated and analyzed to identify improvements and ideas for the future development of the system. Doctive can be useful for healthcare providers and developers.
Keywords: Habit-change · Medical information system · Web-based system · Big data analytics · Decision-making
1 Introduction
As the use of technology spreads rapidly in various fields, data gathering and processing in the medical field have not yet reached their full potential. This may be due to the immense amount of information sent to hospitals, where the gathered data are not optimally segregated and analyzed. Hospitals may also lack useful input and real-time data from patients. This is a warning sign, as mortality due to heart and chronic diseases is increasing according to the World Health Organization [1], and heart disease remains the leading cause of death around the world. The systems currently available in hospitals cannot monitor patients' bad habits. Therefore, a Habit-Change Support Web-Based System with Big Data Analytical Features for Hospitals (called "Doctive") was developed in June 2020 in Penang, Malaysia, to support patients based on their emotional and persuasive habit-change patterns. The proposed system collects and stores information related to patients' bad habits and helps qualified, authorized personnel give patients the prescriptions and supervision they need at the right time, based on their habits and progress. Most of the diseases that affect the community are largely driven by individuals' habits. Habits vary from one person to another, so monitoring and promoting change in unhealthy habits will help create a healthier generation. Patients who suffer from cardiovascular or other chronic diseases may be affected by unhealthy habits such as lack of exercise, unhealthy eating patterns, lack of sleep and smoking. Recent studies also show the importance of early detection and diagnosis of diseases through healthcare systems using Big Data [2]. Therefore, changing habits in a proper and healthy direction can help prevent detrimental health issues in the future. The developed web-based system works hand in hand with a mobile application that tracks patients' daily activities and habits and stores them in the Firebase cloud platform, so that the hospital authorities can collect, identify and analyze the data for further supervision, medical treatment, specific prescriptions and advice.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 91–101, 2021. https://doi.org/10.1007/978-3-030-70713-2_10
C. A. Augustine and P. Keikhosrokiani
The collected data will be very helpful because Big Data analytics, which is now widely used, helps provide precise and effective healthcare services by enabling data sharing and performing analytical calculations that support strategic planning and better decision making [3, 4]. The project would also help improve the hospital's patient management system and enable early care by reaching out to the community. This paper reviews existing healthcare systems and introduces the proposed habit-change support web-based system with big data analytical features for hospitals, called Doctive. The system requirements, design, development methodology, big data analytics, decision-making features, testing and evaluation are presented. The paper concludes with remarks on future directions.
2 Related Work
2.1 Existing Systems
Some existing systems are similar to the proposed system but lack some of its functionality. This paper introduces two existing devices that work closely with the cardiology department to track patients' heart rhythms. However, they do not record heart rhythms continuously for more than a month, nor do they monitor patients' habits, which play a vital role.
Holter Monitor
The Holter Monitor [5] is a device that continuously records the heart's rhythms. Patients wear it for about 24 to 48 h while performing their normal daily activities. The test is performed by sticking electrodes, which are small conducting patches, onto the patient's chest. These electrodes are connected by wires to a small battery-powered recording monitor that can be carried in a pouch or pocket. After 24 to 48 h, the monitor is returned to the doctor or health care provider, who reviews the collected records for abnormal heart rhythms. Abnormal heart rhythms may include various arrhythmias; certain changes may also indicate that the heart is not getting enough oxygen. In addition, this approach stresses the importance of patients recording their symptoms and activities separately and manually, so that the provider can match them with the Holter Monitor findings at follow-up sessions.
A Habit-Change Support Web-Based System
Cardiac Event Recorder
A cardiac event recorder [6] is a battery-powered portable device that patients can activate to record their heart's electrical activity, the electrocardiogram (ECG), when they have symptoms such as a fast or slow heartbeat, dizziness, or a feeling of fainting. It can also be used to monitor how patients respond to a certain type of medication. Some of these cardiac event monitors can store patients' ECG in memory. There are two types of event recorders: the loop memory monitor and the symptom event monitor.
• Loop Memory Monitor: a small device that can be programmed to record the ECG for a certain period of time, such as 5 min. When a button is pushed, the ECG from a period before and during the symptoms is stored.
• Symptom Event Monitor: a hand-held or wrist-worn device. When patients feel an irregular heartbeat, they place the monitor on their chest and press a recording button. This device records the ECG only while it is activated.
These devices can send the ECG recording by telephone to a specific unit in the hospital for the doctor to review.
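The loop memory monitor's behaviour, keeping a rolling window of recent ECG readings so that a button press can save data from before and during the symptoms, is essentially a ring buffer. The sketch below is a hypothetical illustration: the class `LoopRecorder`, its parameters and the integer "readings" are assumptions, not the design of any commercial device.

```python
from collections import deque


class LoopRecorder:
    """Hypothetical loop-memory recorder.

    Continuously keeps the most recent `pre_samples` readings; when the
    patient presses the button, it saves those readings plus the next
    `post_samples` readings as one event.
    """

    def __init__(self, pre_samples, post_samples):
        self.buffer = deque(maxlen=pre_samples)  # oldest readings drop off automatically
        self.post_samples = post_samples
        self.pending = None   # event currently being captured, if any
        self.remaining = 0    # post-trigger readings still to capture
        self.events = []      # completed recordings

    def add_sample(self, value):
        if self.pending is not None:
            self.pending.append(value)
            self.remaining -= 1
            if self.remaining == 0:
                self.events.append(self.pending)
                self.pending = None
        self.buffer.append(value)

    def trigger(self):
        """Button press: freeze the pre-symptom history and start capturing."""
        if self.pending is not None:
            return  # already capturing an event
        self.pending = list(self.buffer)
        self.remaining = self.post_samples
        if self.remaining == 0:
            self.events.append(self.pending)
            self.pending = None
```

With `pre_samples=3` and `post_samples=2`, pressing the button after readings 0 to 9 and then feeding readings 10 to 14 stores one event, `[7, 8, 9, 10, 11]`: three readings from before the press and two from after it. The fixed-length deque is what keeps memory bounded while the device records continuously.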
Our investigation also found existing healthcare management systems in Malaysia, such as Med-Pro Care, H-MagSys and My1HealthCare Solution [7], that include clinical and administrative functions but lack the health monitoring and big data analytics features used in Doctive, which provide decision support to help medical professionals make better judgements in improving patients' habits.
3 System Requirements and Analysis
3.1 Proposed Solutions
The proposed habit-change support web-based system aims to collect, organize and maintain patients' information to ease access to and retrieval of data. It also monitors patients and their habits. Habit-change performance is analyzed using the big data analytics feature, and a decision tree is built to support further prescription by medical experts. The main motivation for this system is to build a healthier community by reducing the rising prevalence of chronic diseases caused by unhealthy habits. Habits affect an individual's mood, and habits can be changed and improved over time. This led to the idea of a Habit-Change Support Web-Based System for Hospitals [8–12]. The main idea of the proposed system's life-cycle (Fig. 1) is to recognize and improve individuals' bad habits and health status.
Fig. 1. System idea life-cycle
Patients' data are obtained from the Firebase cloud storage platform, where they are classified, organized and analyzed by the system and turned into useful information that helps patients with their health status. Professional hospital authorities then analyze patients' habit-change performance through machine learning and data visualization tools and provide prescriptions or advice. Patients' habits are then closely monitored remotely using the Internet of Things (IoT), as in the IoT-based system of [13], so that new prescriptions or other medical alternatives can be suggested based on progress and changes in habits.
3.2 Development Methodology
The development of the Doctive system is based on the System Development Life Cycle (SDLC). This cycle consists of four main phases: modelling, assessment, design and prototyping. The modelling phase gathers the system requirements, including hardware and software. The second phase, assessment, is carried out to understand and gather user requirements for the new system. This phase is crucial because user buy-in is needed to complete the project with confidence and to avoid rejection and dissatisfaction in later stages; therefore, an interview was carried out to gather users' opinions. Next, the design phase covers translating requirements into a design, a feasibility study, and the analysis and design of the Doctive system. An architectural design specifies the hardware, software and environment of the new project. The fourth and last phase, prototyping, is completed by constructing, coding, testing and evaluating the new system.
3.3 System Architecture
The system architecture of Doctive is shown in Fig. 2. The Doctive system has two parts: it works together with the Behabit mobile application, which is connected to a smart watch. Behabit collects and tracks end users' demographic details, heart rate and habits, and tries to change patients' bad habits using emotional-persuasive features. The collected data are then stored in the Firebase cloud platform, from which the Doctive system obtains them. The Doctive system plays a major role in collecting, organizing and analyzing these data so that users can maintain a fit and healthy lifestyle, far from potential chronic diseases. It retrieves data from the cloud platform and applies machine learning tools and techniques to help doctors monitor progress and changes in patients' habits. The collected data are stored as a log file in Weka's Attribute-Relation File Format (ARFF), which is used as input to train the classifier. The data are also used to create visualizations in Tableau to assist doctors' analysis.
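The ARFF log file mentioned above is a plain-text format: a header of `@RELATION` and `@ATTRIBUTE` declarations followed by an `@DATA` section of comma-separated rows, where `?` marks a missing value. A minimal writer is sketched below. The attribute names and the nominal values for `exercise` are hypothetical (only light and medium exercise and the four mood labels appear in this paper), and the sketch does not quote values that contain spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Serialize records as a Weka ARFF string.

    attributes: list of (name, kind) pairs, where kind is "NUMERIC" or a
    list of allowed nominal values. rows: sequences in attribute order;
    None is written as "?", ARFF's missing-value marker.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"
```

For instance, `to_arff("habits", [("steps", "NUMERIC"), ("calories", "NUMERIC"), ("exercise", ["light", "medium", "heavy"]), ("mood", ["dull", "excited", "happy", "normal"])], rows)` produces text that could be written to a `.arff` file and opened in Weka.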
Fig. 2. Doctive system architecture
The algorithm used to help doctors or health care providers predict suitable prescriptions and advice for specific users or patients is the J48 decision tree, Weka's implementation of the C4.5 algorithm. For example, Weka provides the ArffViewer tool, which imports data from a comma-separated values (CSV) file and saves it in ARFF format. This file is then fed to Weka to train the classifier, which produces the decision tree together with its confusion matrix.
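J48 chooses its splits with the C4.5 gain-ratio criterion: the information gain of a split divided by the split information (the entropy of the partition sizes). The sketch below computes it for a binary threshold split on one numeric attribute, as in a rule such as "calories <= 36". It is a simplified illustration, not Weka's implementation; J48 adds pruning, missing-value handling and further refinements for numeric attributes, and the toy data in the example are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a numeric attribute at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Try each observed value (except the largest) as a threshold; keep the best."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t), default=None)
```

With calories `[20, 30, 36, 50, 60, 70]` labelled `dull, dull, dull, excited, excited, excited`, the threshold 36 separates the classes perfectly and has the highest gain ratio, 1.0.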
4 Test and Evaluation of the System
To test and evaluate the Doctive system, the testing strategy shown in Fig. 3 is used, which includes unit testing, integration testing, system testing and acceptance testing. Unit testing is a level of software testing in which individual units or components of a software system are tested. It aims to verify that each unit of the system performs its functions as designed. A unit is the smallest testable part of any software; it usually has one or more inputs and a single output. In the SDLC, unit testing is the first level of testing, done before integration testing. PHPUnit, created by Sebastian Bergmann, is a widely used unit testing framework for PHP code, but it involves writing and running tests manually, which takes more time.
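Because PHPUnit is PHP-specific, the sketch below illustrates the same unit-testing idea with Python's standard `unittest` module. The unit under test, `weekly_average_steps`, is a hypothetical helper (one input, one output) in the spirit of the week-to-week habit tracking described in this paper; it is not part of the Doctive code base.

```python
import unittest


def weekly_average_steps(daily_steps):
    """Hypothetical unit under test: mean daily step count over one week."""
    if len(daily_steps) != 7:
        raise ValueError("expected exactly 7 daily step counts")
    return sum(daily_steps) / 7


class WeeklyAverageStepsTest(unittest.TestCase):
    def test_constant_week(self):
        self.assertEqual(weekly_average_steps([5000] * 7), 5000)

    def test_mixed_week(self):
        self.assertAlmostEqual(weekly_average_steps([0] + [7000] * 6), 6000)

    def test_rejects_incomplete_week(self):
        with self.assertRaises(ValueError):
            weekly_average_steps([4000] * 6)
```

Saved as a module, these tests can be run with `python -m unittest`; each test method checks one behaviour of the unit in isolation, which is what distinguishes unit testing from the integration testing described next.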
Fig. 3. Proposed testing strategy for Doctive system
Integration testing is a software testing level in which individual units are combined and tested to verify that they function as intended when integrated. Its main purpose is to check the interfaces between modules and to identify defects in the interactions between integrated software modules. Integration testing is a systematic technique for assembling a software system while conducting tests to uncover interface-related errors. It can expose parameter, function and run-time exceptions as well as incompatibilities in object interactions. It is advisable to perform integration testing regularly, starting early in project development, to ensure that everything runs smoothly. System testing is a software testing level that tests the complete, integrated software. This test was performed at every phase to determine the system's compliance with the stated requirements. System testing helps reduce troubleshooting and service calls after delivery. When an error is found, prompt action must be taken to fine-tune the system. System testing was carried out for the habit-change and prescription modules of the Doctive system, to ensure that habit change can be observed from week to week and that doctors' prescriptions reach the designated patients. User acceptance testing of the Doctive system was carried out among a limited number of people, as it required individuals with some medical expertise to provide feedback and opinions on the system. Therefore, a questionnaire was created and distributed to 5 people who were either medically qualified or had knowledge or expertise in data analytics and visualization. Their responses were tabulated and analyzed, and will be taken into consideration for improvements and for the future development of the system.
4.1 Result
The system presents the results of the big data analysis of patients' habits in two forms: the decision tree and visualizations from Tableau. The decision tree shown in Fig. 4 helps doctors or health care providers predict a user's mood based on the level of exercise performed. For example, when a user performs light exercise and burns 36 kcal or less, their mood is dull. When a user performs medium exercise, burns more than 43 kcal and walks more than 4266 steps a day, their mood is excited. Therefore, doctors or health care providers can advise users or patients to regularly perform medium-level exercise and walk more than 4200 steps a day to maintain a happy and cheerful mood.
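Reading a decision tree amounts to following one root-to-leaf path of attribute tests. The two paths described above can be written as rules, as in the hypothetical sketch below. The function name and argument encoding are assumptions, and because the text does not describe the other branches of Fig. 4, the function returns `None` for them rather than guessing.

```python
def predict_mood(exercise, calories_burned, steps):
    """Hypothetical encoding of the two Fig. 4 paths described in the text.

    exercise: "light" or "medium"; calories_burned in kcal; steps per day.
    Returns None for inputs that fall on branches the text does not describe.
    """
    if exercise == "light" and calories_burned <= 36:
        return "dull"
    if exercise == "medium" and calories_burned > 43 and steps > 4266:
        return "excited"
    return None
```

For example, `predict_mood("medium", 50, 5000)` follows the second path and returns `"excited"`, while `predict_mood("medium", 40, 5000)` falls outside both described paths.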
Fig. 4. J48 decision tree model for Doctive system
Doctive also supports hospital administration activities such as registering and managing patients, storing and tracking patients' medical history, scheduling appointments and follow-up sessions and, most importantly, reaching out to patients in emergencies. These administrative patient data are stored in the hospital's centralized local database, managed through phpMyAdmin. Apart from analyzing data in Weka, Doctive also includes big data analytics and visualizations from Tableau that turn raw data into easily understandable formats. Figure 5 shows one month of data for an individual with a Body Mass Index (BMI) of 22.0, in the Normal category, and a Basal Metabolic Rate (BMR) of 1252 calories per day. This individual can therefore maintain a healthy weight by keeping their calorie burn consistent with their calorie intake. The definition and calculation of BMR are as follows:
Basal Metabolic Rate Equation by Mifflin-St Jeor. Basal metabolic rate (BMR) is the total number of calories that an individual needs to perform basic, life-sustaining functions. These basal functions include circulation, breathing, cell production, nutrient processing, protein synthesis, and ion transport [14]. For males (M) and females (F):
$$\mathrm{BMR}_{M} = 10 \times \text{weight (kg)} + 6.25 \times \text{height (cm)} - 5 \times \text{age (years)} + 5$$
$$\mathrm{BMR}_{F} = 10 \times \text{weight (kg)} + 6.25 \times \text{height (cm)} - 5 \times \text{age (years)} - 161$$
Multiplying the BMR by an activity factor that reflects the individual's activity level gives the number of calories that individual needs per day. The graph in Fig. 5 includes data such as the date by week over a month, average number of steps, activity level (Sedentary, Low Active, Moderately Active, Active, Highly Active), which is the standard classification provided by the World Health Organization (WHO), average calories, Behabit points (labelled on the graph as Very Bad, Bad, Average, Good, Excellent) and mood (Dull, Excited, Happy, Normal). Harvard Medical School notes a strong connection between good mental health and good physical health, and vice versa [15], and other studies examine the influence of mood on health and the relationship between them [16, 17]. Based on the analysis, the individual's mood can be clearly visualized against their weekly steps and calories over the month. The graph also helps doctors filter moods by their colored categories.
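The Mifflin-St Jeor equations, the activity-factor step and the BMI category quoted for Fig. 5 can be sketched as follows. The activity multipliers are commonly used values (1.2 for sedentary up to 1.9 for extra active) and are an assumption, since the paper does not list its factors; the BMI cut-offs are the WHO adult categories. The example inputs in the usage note are invented and do not describe the individual shown in Fig. 5.

```python
# Assumed activity multipliers (commonly used values; not given in the paper)
ACTIVITY_FACTORS = {
    "sedentary": 1.2,
    "lightly_active": 1.375,
    "moderately_active": 1.55,
    "very_active": 1.725,
    "extra_active": 1.9,
}


def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Basal metabolic rate in kcal/day (Mifflin-St Jeor)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if sex == "male":
        return base + 5
    if sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female'")


def daily_calorie_needs(bmr, activity_level):
    """Daily calorie requirement: BMR multiplied by an activity factor."""
    return bmr * ACTIVITY_FACTORS[activity_level]


def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """WHO adult BMI categories."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    return "Obese"
```

For a hypothetical 30-year-old weighing 70 kg at 175 cm, the equations give 1648.75 kcal/day for a male and 1482.75 kcal/day for a female; the 166 kcal gap is exactly the difference between the +5 and −161 constants.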
Fig. 5. Habit Change analysis and visualization in Tableau
When an individual's steps are at the Sedentary level, below 5000 steps, with low calorie burn, their mood ranges over Dull, Normal and Happy, with Normal (which also includes some Dull) covering the widest area. Their Behabit points in this period fall under Very Bad, Bad and Average. By comparison, at the Low Active level (5000 to 7499 steps), the individual's mood shows more excitement and happiness, with only a little Normal mood, and their Behabit points are labelled Good. Finally, when an individual reaches 7500 to 9999 steps (Moderately Active) and burns more calories, with an average of 125 calories, their mood is never Dull or Normal; it stays positive, between Happy and Excited, and they achieve an Excellent Behabit score. Based on the classification in Table 1, a doctor can conclude which step level is practical to advise an individual to keep them in a positive mood with a healthy lifestyle. For example, the data visualization shows that when the patient performs the Low Active level of steps, they are generally in a positive mood ranging from Normal to Happy and Excited. The patient can therefore be advised to keep walking at the Low Active level, 5000 to 7499 steps a day, with medium-level exercise for a month, to move them off the Sedentary lifestyle and change their habits progressively.
Table 1. Activity classification based on WHO
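The step-count bands used in this analysis can be expressed as a simple lookup. The Sedentary, Low Active and Moderately Active ranges come from the text; the cut-offs for Active (10000 to 12499) and Highly Active (12500 and above) are assumptions based on a commonly used pedometer classification, because the contents of Table 1 are not reproduced here.

```python
def step_activity_level(steps_per_day):
    """Map a daily step count to an activity level.

    Sedentary, Low Active and Moderately Active follow the ranges in the text;
    the Active / Highly Active cut-offs (10000 and 12500) are assumed.
    """
    if steps_per_day < 5000:
        return "Sedentary"
    if steps_per_day < 7500:
        return "Low Active"
    if steps_per_day < 10000:
        return "Moderately Active"
    if steps_per_day < 12500:  # assumed cut-off
        return "Active"
    return "Highly Active"  # assumed: 12500 steps and above
```

Using half-open ranges (`<` on each upper bound) ensures that boundary values such as 5000 or 7500 fall into exactly one level, matching the "5000 to 7499" phrasing above.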
5 Conclusion and Future Work
Doctive, a habit-change support web-based system with big data analytical features for hospitals, was developed, tested and evaluated successfully in June 2020 in Penang, Malaysia. Doctive was built to encourage healthy habit change and reduce the risk of heart disease, which has been the number one killer globally. The feedback from the expert evaluators was positive and constructive. The system was developed after careful consideration of end users' requirements and of ways to further ease their responsibilities, and the feedback and ideas obtained from the user acceptance evaluation will be taken into consideration for its further improvement. Important features that could be added in the future include features that help the community lead a healthy lifestyle for body and mind, such as predictive habit change. This is increasingly important as the number of people, especially teenagers, experiencing depression is rising. ECG monitoring features could also be added, storing readings in the system rather than printing long ECG records to be filed physically; the stored readings could then be retrieved easily as a reference for upcoming doctor visits. ECG is vital because it provides an accurate reading of the heart's rhythm. More features will also be added to provide a complete patient management system for large-scale hospitals in the near future.
Acknowledgment. The authors are thankful to the School of Computer Sciences and the Division of Research & Innovation, USM, for providing financial support through the Short Term Grant (304/PKOMP/6315435) granted to Dr Pantea Keikhosrokiani.
References
1. WHO: The top 10 causes of death. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death. Accessed 9 Oct 2019
2. Jagadeeswari, V., Subramaniyaswamy, V., Logesh, R., Vijayakumar, V.: A study on medical internet of things and big data in personalized healthcare system. Health Inf. Sci. Syst. 6(1), 14 (2018)
3. Madanian, S., Parry, D.: IoT, cloud computing and big data: integrated framework for healthcare in disasters. Stud. Health Technol. Inform. 264, 998–1002 (2019)
4. Kolasa, K., Goettsch, W., Petrova, G., Berler, A.: 'Without data, you're just another person with an opinion'. Expert Rev. Pharmacoecon. Outcomes Res. 20(2), 147–154 (2020)
5. Holter monitor (24 h): MedlinePlus Medical Encyclopedia. https://medlineplus.gov/ency/article/003877.htm. Accessed 3 Nov 2019
6. www.heart.org: Cardiac Event Recorder. https://www.heart.org/en/health-topics/arrhythmia/prevention--treatment-of-arrhythmia/cardiac-event-recorder. Accessed 3 Nov 2019
7. Hospital and Healthcare Management Systems for Hospitals, Medical Centres, Specialist Clinics and General Practitioners. http://www.my1healthcare.com/modules/web/index.php. Accessed 9 Oct 2020
8. Keikhosrokiani, P.: Emotional-persuasive and habit-change assessment of mobile medical information systems (mMIS). In: Keikhosrokiani, P. (ed.) Perspectives in the Development of Mobile Medical Information Systems, Academic Press, pp. 101–109 (2020)
9. Keikhosrokiani, P., Mustaffa, N., Zakaria, N., Baharudin, A.S.: User behavioral intention toward using mobile healthcare system. In: Consumer-Driven Technologies in Healthcare: Breakthroughs in Research and Practice, IGI Global, pp. 429–444 (2019)
10. Keikhosrokiani, P.: Behavioral intention to use of Mobile Medical Information System (mMIS). In: Keikhosrokiani, P. (ed.) Perspectives in the Development of Mobile Medical Information Systems, Academic Press, pp. 57–73 (2020)
11. Keikhosrokiani, P., Mustaffa, N., Zakaria, N.: Success factors in developing iHeart as a patient-centric healthcare system: a multi-group analysis. Telematics Inf. 35(4), 753–775 (2018)
12. Keikhosrokiani, P., et al.: Assessment of a medical information system: the mediating role of use and user satisfaction on the success of human interaction with the mobile healthcare system (iHeart). Cogn. Technol. Work 22(2), 281–305 (2020)
13. Fernández-Caramés, T.M., Froiz-Míguez, I., Blanco-Novoa, O., Fraga-Lamas, P.: Enabling the internet of mobile crowdsourcing health things: a mobile fog computing, blockchain and IoT based continuous glucose monitoring system for diabetes mellitus research and care. Sensors (Basel) 19(15), 3319 (2019)
14. Frey, M.: How to Change Your Basal Metabolic Rate for Weight Loss. https://www.verywellfit.com/what-is-bmr-or-basal-metabolic-rate-3495380. Accessed 5 Jun 2020
15. Harvard Health Publishing: Mind & Mood. https://www.health.harvard.edu/topics/mind-and-mood. Accessed 11 Oct 2020
16. Salovey, P., Birnbaum, D.: Influence of mood on health-relevant cognitions. J. Pers. Soc. Psychol. 57(3), 539–551 (1989)
17. Yates, J.A., Clare, L., Woods, R.T.: What is the relationship between health, mood, and mild cognitive impairment? J. Alzheimers Dis. 55(3), 1183–1193 (2017)
An Architecture for Intelligent Diagnosing Diabetic Types and Complications Based on Symptoms
Gunasekar Thangarasu1(B), P. D. D. Dominic2, and Kayalvizhi Subramanian3
1 Department of Professional Industry Driven Education, MAHSA University, Jenjarom, Malaysia
2 Department of Computer and Information Science, University Technology PETRONAS, Seri Iskandar, Malaysia
3 Department of Fundamental and Applied Sciences, University Technology PETRONAS, Seri Iskandar, Malaysia
Abstract. Information and communication technology can play a vital role in improving healthcare services by providing new and efficient ways of diagnosing diseases. Diabetes is recognized as the fastest-growing disease in the world. Due to insufficient diagnostic mechanisms, the number of undiagnosed diabetes cases has been increasing day by day, which leads to long-term complications such as neuropathy, nephropathy and foot gangrene. The objective of this study is to design an intelligent architecture for diagnosing diabetes effectively based on an individual's physical symptoms. The architecture combines neural networks, data clustering algorithms and fuzzy logic techniques. A prototype system was then developed to validate the diagnostic architecture in terms of the efficiency and accuracy of diagnosing diabetes and its types and complications. The overall findings of this study were very positive, with an accuracy of 94.50%.
Keywords: Diabetes · Complications · Neural networks · Fuzzy logic and clustering
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 102–110, 2021. https://doi.org/10.1007/978-3-030-70713-2_11
1 Introduction
Information and Communication Technology (ICT) can help in coping with the information explosion. Information refers to any communication or representation of knowledge, such as facts, data or opinions, in any medium, including textual, numerical, graphic, cartographic, narrative or audiovisual forms. Technology is the practical form of scientific knowledge, or the science of applying knowledge to practice. Today, ICT is used in a wide range of fields, especially healthcare, where information technology improves the healthcare system by saving costs, increasing patient safety and improving the quality of care. Digital technology will continue to be the catalyst for innovative initiatives in the healthcare sector. The computerized decision support systems used
in a hospital are very useful for complete and accurate decision making. Among the benefits that information technology brings to the healthcare industry is the record-keeping of patients and their information, such as symptoms, diagnosis and treatment [1]. Diagnosing diseases is an important and difficult task in medicine because it must account for various interrelated causes and functions. In general, a variety of factors and observations are taken into account when diagnosing real diseases [2, 3]. Doctors therefore typically apply their specialized expertise, experience and hypotheses about an illness to assess the patient and formulate the treatment in the most scientific way. Clinical databases also help in identifying patterns and collecting the know-how needed to improve the diagnosis and treatment of patients, and many health professionals use medical records in major medical evaluations. Statistical tools saw only limited use in medical research over the last 40 years; over the past 20 years, however, the use of neural networks, fuzzy logic and other methods has steadily grown [4]. In the last five decades, computer technology has advanced through the development of artificial intelligence, expert systems and decision-making systems that enable individuals to carry out specific tasks. Since their inception, Artificial Intelligence (AI), Expert Systems (ES) and Decision Support Systems (DSS) have been used in the health industry, and in medical diagnostic applications the use of fuzzy reasoning, case-based reasoning and neural networks has gradually increased [5]. In this research, the problem studied originates from the practical diagnosis of diabetes: few diabetes diagnosis studies have been performed, and their trends and methods remain insufficient. According to the World Health Organization (WHO), more than 50% of diabetes cases worldwide are still undiagnosed [6].
Clinical diagnosis of diabetes relies on blood test results that measure glycated haemoglobin (HbA1c), fasting plasma glucose and oral glucose tolerance. These diabetes-related blood tests involve drawing blood and transferring the sample from the healthcare professional's office to a clinical laboratory, since laboratory analysis is necessary to ensure correct test results. Random plasma glucose testing is also used in a number of diagnostic cases and in daily monitoring. All diabetes diagnostic examinations need a second, confirmatory measurement [7, 8]. Research is therefore required to diagnose diabetes types and complications faster, without going through a series of medical tests and waiting for a physician's appointment. The principal objective of this study is to design an intelligent diagnostic architecture that evaluates the types and complications of diabetes mellitus, which health professionals can use to improve the diagnosis of diabetes.
2 Review of Literature
Rahaman [9] developed a decision support system for diabetes diagnosis. In his study, signs, symptoms, risk factors for diabetic disease and the results of a physical examination were used for data collection, and the physician asked about the patient's symptoms. The risk of disease was measured and evaluated on the basis of system responses obtained from 55 Type-I diabetes patients and classified into levels such as low probability, medium chance and
moderate risk of diabetes, or no diabetes. The system achieved an accuracy of 98%, and intelligent agents were used to design and improve it. Matsumoto et al. [10] considered the information available to the doctor: clinically definitive signs, symptoms, remarks and laboratory findings. They proposed patient models and disease patterns, a structured diagnostic algorithm, and a practical system built with neural networks for medical use. As Gultepe et al. [11] pointed out, sepsis must be recognized early to prevent progression to its more severe phases, which have serious consequences in about one in four cases. Their Bayesian network was developed using systemic inflammatory response syndrome (SIRS) criteria, mean arterial pressure and the lactate levels of sepsis patients, and the resulting system revealed an association between lactate levels and sepsis. Dakua et al. [12] noted that cerebrovascular disorders are among the most prevalent and debilitating diseases in the adult population worldwide; rupture of cerebral vessels within the brain can lead to subarachnoid haemorrhage. They introduced a clinical workflow model to assist endovascular specialists in selecting the type of stent-based treatment for cerebral aneurysms, and their findings suggest a potential clinical benefit of the proposed computational workflow. Reiter et al. [13], on the other hand, used unobtrusive, basic sensor devices to track the well-being of individuals interested in a healthy lifestyle. Data gathered from body sensors and surveys are analysed to extract improvement plans and suggestions, which are delivered through feedback, coaching and motivation techniques that help individuals achieve their goals in daily life.
Their Take Care system is a customized, home-based framework that can be used over a couple of weeks. Vyssoulis et al. [14] studied the glycaemic profiles of non-diabetic Greek hypertensive men and women according to their family history of diabetes mellitus and their obesity status. Family history of diabetes, obesity markers, glycaemic criteria, insulin resistance and the prevalence of impaired glucose homeostasis (IGH) were found to be importantly associated.
3 Research Methods
This research proposes an architecture for intelligently diagnosing diabetes types and their complications based on physical symptoms. The methodology consists of two phases: first, designing the architecture for intelligently diagnosing diabetes types and complications using back-propagation neural networks, fuzzy logic and K-Means clustering; and second, developing an experimental prototype system with Visual Studio 2017 and SQL Server 2016 on the Windows 10 operating system in order to test and validate the proposed architecture. The back-propagation neural network is a mathematical model widely used for classification and diagnosis in various areas, such as decision making in medical fields and signal processing. Fuzzy logic expert systems are of great importance in medical examinations because they provide an exact evaluation of the medical data supplied to them; such systems offer a fast and simple method of medical examination. Cluster analysis has been applied for such varied objectives as finding a true topology, model fitting, prediction based on groups, hypothesis generation, hypothesis testing,
data exploration, data reduction and grouping similar entities into homogeneous classes, as reported in a series of different studies. The experimental prototype is intended to further verify the validity of the proposed diagnostic system for classifying diabetes and its complications; this approach was adopted because Karen [15] demonstrated only partial functions for some aspects of the system. The study's overall design is shown in Fig. 1.
[Figure: Clinical Database, Data Preprocess, Feature Selection, Neural Networks; Data Clustering (Type-I, Type-II, Gestational diabetes); Fuzzy Logics (Cardiovascular, Nephropathy, Neuropathy, Retinopathy, Gangrene); Result]
Fig. 1. An architecture for diagnosing diabetes types and complications
3.1 Back-Propagation Neural Networks
In recent years, several researchers have proposed back-propagation neural networks [16] as an effective method for data prediction in medical science. National and international research groups now publish numerous studies every day on neural-network training data, computational design and real-time applications such as robotic control. The McCulloch-Pitts neuron, developed by McCulloch and Pitts [17], was the first model of the neurons of the mammalian brain. The key characteristics of back-propagation neural networks are: (a) black-box construction, (b) link-based nodes, (c) a single network structure capable of performing a variety of tasks through data training, (e) no data noise and (f) no interference [17]. A single neuron has n real-valued inputs $x_1, x_2, \dots, x_n$ and a real-valued output $y$. Each input is associated with a real-valued weight $w_1, w_2, \dots, w_n$, and the output depends on the weighted sum of the inputs:
$$\sum_{i=1}^{n} w_i x_i$$
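The output is computed by a non-linear function known as the activation or threshold function. The most common activation functions are the Heaviside step function, defined for all $a \in \mathbb{R}$, and the sigmoid functions $S_\beta$ described by the formula
$$S_\beta(a) = \left(1 + e^{-\beta a}\right)^{-1} \tag{1}$$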
where $\beta$ is a positive constant, called the steepness parameter, whose value specifies a particular sigmoid function. The neuron output is then given by
$$y = S_\beta\left(\sum_{i=1}^{n} w_i x_i - \theta\right), \qquad \beta \in \mathbb{R}^{+} \tag{2}$$
where $\theta$ is the so-called neuronal bias. The bias determines the value of the weighted input sum around which the neuron's output is most sensitive to changes. For convenience, the bias $\theta$ is normally expressed as the weight $w_0 = \theta$ of an extra input $x_0 = -1$, so Eq. (2) can be replaced by the simpler formula
$$y = S_\beta\left(\sum_{i=0}^{n} w_i x_i\right) \tag{3}$$
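Equations (1) to (3) can be checked with a minimal sketch of a single sigmoid neuron. It assumes plain Python lists for inputs and weights; the function names and example values are hypothetical and not taken from the authors' prototype. It shows that folding the bias into an extra input $x_0 = -1$ with weight $w_0 = \theta$ gives the same output as subtracting $\theta$ explicitly.

```python
import math


def sigmoid(a, beta=1.0):
    """Eq. (1): S_beta(a) = (1 + exp(-beta * a))^-1, with steepness beta > 0."""
    return 1.0 / (1.0 + math.exp(-beta * a))


def neuron_output(x, w, theta, beta=1.0):
    """Eq. (2): y = S_beta(sum_{i=1..n} w_i * x_i - theta)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    a = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return sigmoid(a, beta)


def neuron_output_folded(x, w, theta, beta=1.0):
    """Eq. (3): prepend x_0 = -1 with weight w_0 = theta, then sum from i = 0."""
    x_ext = [-1.0] + list(x)
    w_ext = [theta] + list(w)
    a = sum(wi * xi for wi, xi in zip(w_ext, x_ext))
    return sigmoid(a, beta)


if __name__ == "__main__":
    x = [0.5, 1.0, -0.2]   # hypothetical input values
    w = [0.8, -0.4, 1.5]   # hypothetical weights
    print(neuron_output(x, w, theta=0.1, beta=2.0))
    print(neuron_output_folded(x, w, theta=0.1, beta=2.0))
```

Both calls print the same value up to floating-point rounding, because the extra term $w_0 x_0 = -\theta$ reproduces the subtraction in Eq. (2). With this form, back-propagation can update the bias with the same rule as any other weight; the training step itself is not shown here.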
3.2 Data Clustering Algorithms
Data clustering is a popular data-mining technique that has been applied successfully in relevant domains to identify trends. Data clustering algorithms are generally quick and simple. Four common clustering methods are (i) hierarchical clustering, (ii) k-means clustering, (iii) k-medoids clustering and (iv) fuzzy C-means (FCM) clustering. Fuzzy C-means clustering is used to identify diabetes types from clinical databases because it allows a data point to be assigned to more than one cluster. It is an extension of the k-means algorithm, in which each data point can be allocated to only a single cluster. Clustering is one of the initial components of data analysis: it assigns measured data to different classes [18].
3.3 Fuzzy Logic Techniques
Based on the professional knowledge of the medical profession and on clinical evaluation, fuzzy reasoning relationships between symptoms and risk factors for diabetes are established to classify the multiple complications caused by diabetes [19]. Fuzzy logic techniques are important for controlling complex, imprecise systems. Fuzzy reasoning does not require exact quantification to support effective decision-making; instead, fuzzy logic makes use of linguistic variables. This makes the production and operation of systems quicker and easier, and fuzzy logic can be a good decision-making tool for data management. However, fuzzy logic may require many iterations to discover a set of rules that provides a consistent solution in complicated
systems. When fuzzy logic techniques are fused with a neural network, the study of data clusters can cut the time needed to establish rules [20]. Fuzzy logic is one of the most common and widely applied artificial intelligence techniques, with applications including medical diagnostics, assessment, image processing, control systems and pattern recognition. Figure 2 shows the overall functions of a fuzzy inference system.
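Section 3.2 selects fuzzy C-means because a data point may belong to several clusters, and clustering is also credited with shortening rule discovery. The sketch below is a textbook fuzzy C-means in pure Python, assuming Euclidean distance, fuzzifier m = 2 and random initial memberships. It is not the authors' implementation, and the feature vectors in the usage example are hypothetical rather than clinical data.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, eps=1e-6, seed=0):
    """Minimal fuzzy C-means clustering on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][j] is the membership degree of point i
    in cluster j and every row of u sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Step 1: each centre is the mean of all points weighted by u_ij^m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            w_sum = sum(weights)
            centers.append(tuple(
                sum(weights[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)))

        # Step 2: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)), Euclidean d.
        new_u = []
        for i in range(n):
            dists = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5
                     for j in range(c)]
            on_centre = [j for j in range(c) if dists[j] == 0.0]
            if on_centre:
                # Point sits exactly on a centre: split membership among those centres.
                row = [1.0 / len(on_centre) if j in on_centre else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)

        # Stop when no membership changes by more than eps.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < eps:
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming two loose groups.
    data = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
    centers, u = fuzzy_c_means(data, c=2)
    for point, row in zip(data, u):
        print(point, [round(v, 3) for v in row])
```

Unlike k-means, each point ends with a membership in every cluster; taking the largest membership recovers a hard assignment when one is needed, while the graded values can feed the fuzzy rules described next.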
[Figure: Fuzzification; Decision Making Unit (Inference Engine) with fuzzy rules; Defuzzification; Output]
Fig. 2. Fuzzy inference system
The first step in fuzzy logic is to take the measured data and determine the degree of membership of these inputs in the associated fuzzy sets. This is done by evaluating the value of each variable with the membership functions of its fuzzy sets. Membership functions can take different shapes; the two most common are triangular and trapezoidal.
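The fuzzification step can be sketched with the two common membership shapes. This is a minimal illustration, assuming a hypothetical linguistic variable "symptom severity" on a 0 to 10 score with made-up set boundaries; the paper does not publish its actual fuzzy sets or breakpoints.

```python
def triangular(x, a, b, c):
    """Triangular membership (a < b < c): 0 outside (a, c), peak 1 at x = b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership (a < b <= c < d): rises on [a, b], 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def fuzzify(value, fuzzy_sets):
    """Fuzzification: degree of membership of one crisp input in each named fuzzy set."""
    return {name: mf(value) for name, mf in fuzzy_sets.items()}


if __name__ == "__main__":
    # Hypothetical linguistic variable "symptom severity" on a 0-10 score.
    severity = {
        "mild": lambda x: trapezoidal(x, -1.0, 0.0, 2.0, 4.0),
        "moderate": lambda x: triangular(x, 2.0, 5.0, 8.0),
        "severe": lambda x: trapezoidal(x, 6.0, 8.0, 10.0, 11.0),
    }
    print(fuzzify(3.0, severity))
```

With these illustrative sets, a score of 3.0 is partly "mild" (0.5) and partly "moderate" (about 0.33) and not "severe" at all; this overlap is what lets several fuzzy rules fire with graded strength in the inference engine.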
4 Result Analysis and Discussion
First, a questionnaire with 30 questions was designed to collect data for testing the proposed system. The questionnaire was verified and approved by three professional diabetes experts and distributed to 235 individuals, of whom 200 respondents' data were confirmed for experimental purposes. Table 1 summarizes the diagnosis of diabetes for the 200 respondents, 98 men and 102 women. The prototype diagnosed 62 respondents as diabetic (27 males and 35 females) and 138 as non-diabetic (71 males and 67 females). Table 2 shows the detailed prototype findings by age group.
Table 1. Summary of diabetes diagnosing results (prototype diagnosed results)
Respondents | Diabetes: Yes | Diabetes: No | Total
Male | 27 | 71 | 98
Female | 35 | 67 | 102
Total | 62 | 138 | 200
The prototype's findings for the diabetic respondents are given in detail in Table 2. A total of 62 respondents with diabetes were found: one was affected by Type-I, 57 by Type-II and 4 by gestational diabetes. Long-term complications can develop gradually over a decade in people with Type-I diabetes, so patients should adopt the daily medication and diet that will help decrease the risk of complications. Such results can help patients understand how the disease works, deal with emotional difficulties and make the required lifestyle improvements.
Table 2. Respondent's diabetes diagnosing results (Diabetes result: Yes, 62 respondents)
Age category | Nos.
Below 20 years | 0
21–30 years | 1
31–40 years | 2
41–50 years | 6
51–60 years | 25
Above 60 years | 28
Total | 62
Types | Nos. | Complications (Nos.)
Type-I | 1 | Cardiovascular 0, Retinopathy 1, Neuropathy 0, Nephropathy 0, Gangrene foot 0
Type-II | 57 | Cardiovascular 23, Retinopathy 37, Neuropathy 13, Nephropathy 16, Gangrene foot 15
Gestational | 4 | Not applicable
Total | 62 |
For predictive analysis, the authors used a table of confusion (confusion matrix), a table with two rows and two columns that reports the numbers of false positives, false negatives, true positives and true negatives. This allows a more detailed analysis than the mere proportion of correct classifications. Table 3 shows the results for diagnosing diabetes types and complications with the proposed system; it indicates an accuracy of 94.50% for the model.
Table 3. Table of confusion
TN | FN | TP | FP | Accuracy (%)
200 | 11 | 189 | 0 | 94.5
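As a worked check of the reported figure, a short sketch computes accuracy from a confusion table, $(TP + TN)/(TP + TN + FP + FN)$. With the values printed in Table 3, 94.5% is reproduced only if 189 is read as the number of correct diagnoses out of the 200 respondents (189 + 11 = 200); taking the first column literally as TN = 200 would give 389/400 = 97.25%. Which reading the authors intended is an assumption here, not something the source states.

```python
def accuracy_pct(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), expressed as a percentage."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion table")
    return 100 * (tp + tn) / total


if __name__ == "__main__":
    # Reading A: all 189 correct diagnoses in one count, 11 incorrect, 200 respondents.
    print(accuracy_pct(tp=189, tn=0, fp=0, fn=11))
    # Reading B: the printed header taken literally, with TN = 200.
    print(accuracy_pct(tp=189, tn=200, fp=0, fn=11))
```

The first call prints 94.5, matching Table 3, and the second prints 97.25.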
5 Conclusion
This research involved designing an intelligent diagnostic architecture for diabetes, developing an experimental prototype, and providing an improved solution for the health sector, in particular for diagnosing diabetes, its types and its symptomatic complications.
The architecture synthesizes a neural network trained with the back-propagation algorithm, a clustering algorithm and fuzzy logic techniques, each of which has unique capabilities. According to the literature review, neural network technology has been described in recent years as a powerful and highly accurate decision-making method. The clustering algorithm is employed as the second technology, for classifying diabetes types. Finally, the fuzzy logic approach is used to classify the different diabetes complications; it can be a good decision-making tool that is simple, versatile and precise. Integrating these three technologies into a single system combines their strengths. The findings were reported with an accuracy of 94.50%. The proposed system can help people diagnose their diabetes type and its complications at a very early stage and obtain medication in time to live a longer life.
References
1. Usama, M., Ahmad, B., Xiao, W., Hossain, S., Muhammad, G.: Self-attention based recurrent convolutional neural networks for disease prediction using healthcare data. Comput. Methods Programs Biomed. 190, 105–122 (2020)
2. Liu, L., Wang, L., Huang, Q., Zhou, L., Fu, X., Liu, L.: An efficient architecture for medical high-resolution images transmission in mobile telemedicine system. Comput. Methods Programs Biomed. 187, 88–101 (2020)
3. Sandhu, K.J., Verma, A., Rana, P.: An Expert Approach for data Flow Prediction: Case Study of Wireless Sensor Networks 112(325–352), 73–91 (2020)
4. Kamdar, J.H., Jeba Praba, J., John, J.: Artificial intelligence in medical diagnosis: methods, algorithms and applications. Learning and Analytics in Intelligent Systems book series LAIS 13, 27–37 (2020)
5. Uzoka, F.M.E., Osuji, J., Obot, O.: Clinical decision support system (DSS) in the diagnosis of malaria: a case comparison of two soft computing methodologies. Expert Syst. Appl. 38(1), 1537–1553 (2018)
6. IDF Diabetes Atlas, International Diabetes Federation: 9th Ed. (2019)
7. Medical News Today. https://www.medicalnewstoday.com/info/diabetes. Accessed 10 Feb 2020
8. Michael, B.: Inadequacies of current approaches to pre-diabetes and diabetes prevention. J. Endocrine 44(3), 623–633 (2018)
9. Rahaman, S.: Diabetes diagnosis decision support system based on symptoms, signs and risk factors using special computation algorithm by rule base. In: 15th International Conference on Computer and Information Technology, pp. 65–71, Chittagong (2016)
10. Matsumoto, T., Shimada, Y., Kawaji, S.: Clinical diagnosis support system based on symptoms and remarks by neural networks. In: IEEE Conference on Cybernetics and Intelligent Systems, pp. 1304–1307, Singapore (2018)
11. Gultepe, E., Hien, N., Albertson, T., Tagkopoulos, I.: A Bayesian network for early diagnosis of sepsis patients: a basis for a clinical decision support system. In: IEEE 2nd International Conference on Computational Advances in Bio and Medical Sciences, pp. 1–5, Las Vegas (2016)
12. Dakua, S.P., Navkar, N.V., Abi-Nahed, J., Groen, D., Bernabeu, M.O., Saghir, M.A.R., Kamel, H., Al-Ansari, A., Coveney, P.V.: Towards a computational system to support clinical treatment decisions for diagnosed cerebral aneurysms. In: Middle East Conference on Biomedical Engineering, pp. 281–284, Doha (2018)
13. Reiter, H., Naujokat, E., Pinter, R., Devot, S.: Take Care: a home-based sensor system for the management of cardiovascular risk factors primary prevention by monitoring vital body signs, analysing the data and closing the loop by feedback, coaching and motivation. In: 5th International Summer School and Symposium on Medical Devices and Biosensors, Hong Kong, pp. 186–189 (2018)
14. Vyssoulis, G.P., Liakos, C.I., Karpanou, E.A., Triantafyllou, A.I., Michaelides, A.P., Tzamou, V.E., Markou, M.I., Stefanadis, C.I.: Impaired glucose homeostasis in non-diabetic Greek hypertensives with diabetes family history, effect of the obesity status. J. Am. Soc. Hypertens. 7(4), 294–304 (2018)
15. Inke, M., Guy, C.: The role of institutional design and organizational practice for health financing performance and universal coverage. Health Policy 99(3), 183–192 (2016)
16. Carrin, G., Mathauer, I., Xu, K.: Universal coverage of Health Services: tailoring its Implementation. Bull. World Health Organ. 86(1), 09–24 (2018)
17. Sherrod, P.H.: DTREG Predictive Modeling Software Manual (2019)
18. Nathan, D.M.: Advances in diagnosis and treatment. Int. J. Med. 314(10), 1052–1062 (2019)
19. Suyash, S., Lokesh, S., Vijeta, S., Ajai, K., Hemant, D.: Prediction of diabetes using artificial neural network approach. Eng. Vibration, Comm. Inf. Process. 478(1), 679–687 (2018)
20. Mohamed, S., Baskar, S., Sarma, V.R., Mustafa, M.J.: Cloud-based framework for diagnosis of diabetes mellitus using K-means clustering. Health Inf. Sci. Syst. 16(6), 321–232 (2018)
An Advanced Encryption Cryptographically-Based Securing Applicative Protocols MQTT and CoAP to Optimize Medical-IOT Supervising Platforms
Sanaa El Aidi1(B), Abderrahim Bajit1, Anass Barodi1, Habiba Chaoui1, and Ahmed Tamtaoui2
1 Laboratory of Advanced Systems Engineering (ISA), National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
{barodi.anass,habiba.chaoui}@uit.ac.ma
2 National Institute of Posts and Telecommunications (INPT-Rabat), SC Department, Mohammed V University, Rabat, Morocco
Abstract. Our proposed platform detects persons and measures their temperature with a PIR IoT node, then verifies their identity by combining an RFID IoT node with facial recognition on a camera IoT node; if these tests are passed, the person can access the public area. We add a security layer to the CoAP (Constrained Application Protocol) and MQTT (Message Queuing Telemetry Transport) communication protocols and compare the two protocols in terms of execution time, RAM occupation and CPU consumption. The result is an intelligent and secure medical IoT platform designed to monitor citizens' access to this vast area in a more organized and secure manner, in order to reduce the severity of the pandemic.
Keywords: MQTT · MQTT Client IOT · CoAP Client IOT · Broker · CoAP SERVER IOT · AES encryption · CoAP · IoT · Artificial intelligence · Microcontroller · OpenCV
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 111–121, 2021. https://doi.org/10.1007/978-3-030-70713-2_12
1 Introduction
Given the spread of the coronavirus pandemic, we set out to create an intelligent and secure medical IoT platform that works in an existing environment (such as a hospital, company or other establishment) without the complexity of integrating it with the network or other existing infrastructure. The objective of this platform is to improve and optimize health precautions so that citizens who have COVID-19, or who have had contact with an infected person, do not mix with citizens who have never been infected with COVID-19. The Internet of Things allows interaction between the physical and digital worlds: the digital world interacts with the physical world through sensors and actuators. These
sensors collect information that must be stored and processed. Data processing can take place at the edge of the network, on a remote server or in the cloud. The storage and processing capacities of an IoT object are limited by its available resources, which are restricted in size, energy, power and computing capacity [1]. In our platform, we also used artificial intelligence, specifically facial recognition, for image detection and processing. The human face provides several social signals essential to public life [2]: it mediates person identification, attractiveness and facial communication. A.I. is a science that dates back some thirty years. Its purpose is to reconstruct intelligent reasoning and action using artificial means, almost always computers. The difficulties are a priori of two types:
• For most of our activities, we do not ourselves know how we perform them. We have no precise method (no algorithm, as computer scientists say today) to understand a written text, recognize a face, prove a theorem, establish a plan of action, solve a problem or learn.
• Computers are a priori very far from such a level of competence. They have to be programmed from the very beginning, and programming languages only allow very 'elementary' notions to be expressed.
From this double point of view, A.I. is an experimental science: experiments on computers make it possible to test and refine the models expressed in programs on many examples, while observations of humans (generally the researcher himself) help to discover these models and better understand how human intelligence works [3, 4]. IoT needs AI and vice versa, and the use of AI is beneficial for real-time processing [5, 6]. In our platform, IoT is used to connect medical devices to the internet, to collect data about each citizen, to process and analyse these data, and to grant or deny the citizen's access right accordingly.
Our platform applies 4 tests to monitor citizens' access to the public area in a more organized and secure manner. The first test checks the citizen's temperature using the PIR IoT node; if the temperature is less than or equal to 37 °C, we move to the second test, which identifies the citizen by RFID tag and verifies whether the person already has a negative PCR/serological test. The person must then confirm their identity through the artificial intelligence recognition system, which analyses the facial structure, compares it with the information in the database and identifies the detected person. We implemented the MQTT and CoAP protocols with an authentication server in the IoT platform, and used encryption algorithms to secure these exchanges by encrypting and decrypting messages. We therefore implemented secure versions of the CoAP and MQTT protocols using the AES encryption algorithm [7].
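The four-test access sequence amounts to a short-circuiting decision procedure. A minimal sketch follows, assuming the sanitary data linked to an RFID tag are available as a record and that the facial-recognition step is supplied as a callback; `CitizenRecord`, `access_decision` and the field names are hypothetical. The temperature limit is a parameter because this section states "less than or equal to 37 °C" while Sect. 2 phrases the first test as "below 37 °C".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CitizenRecord:
    """Hypothetical sanitary record that the RFID tag is linked to."""
    tag_id: str
    negative_test: bool          # negative PCR/serological test on file
    contact_with_positive: bool  # recorded contact with an infected person


def access_decision(temperature_c: float,
                    record: Optional[CitizenRecord],
                    face_matches: Callable[[CitizenRecord], bool],
                    max_temp_c: float = 37.0) -> Tuple[bool, str]:
    """Run the four tests in order and stop at the first one that fails."""
    # Test 1: temperature sampled by the PIR/temperature node.
    if temperature_c > max_temp_c:
        return False, "temperature above threshold"
    # RFID lookup: an unknown tag cannot be checked, so access is refused.
    if record is None:
        return False, "unknown RFID tag"
    # Test 2: a negative PCR/serological test must be on file.
    if not record.negative_test:
        return False, "no negative PCR/serological test"
    # Test 3: no recorded contact with a positive case.
    if record.contact_with_positive:
        return False, "contact with a positive case"
    # Test 4: the face seen by the camera node must match the tag holder.
    # Called last, so facial recognition only runs when tests 1-3 pass.
    if not face_matches(record):
        return False, "face does not match RFID identity"
    return True, "access authorized"


if __name__ == "__main__":
    citizen = CitizenRecord("TAG-001", negative_test=True, contact_with_positive=False)
    print(access_decision(36.6, citizen, lambda r: True))   # all four tests pass
    print(access_decision(36.6, citizen, lambda r: False))  # identity mismatch
    print(access_decision(38.2, citizen, lambda r: True))   # fails test 1
```

Ordering the checks this way also keeps the most expensive step, image recognition on the camera node, out of the path for anyone already rejected by a cheaper test.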
2 General Architecture
Our intelligent and secure medical IoT platform identifies citizens before granting them access to the public area; to do so, it applies 4 tests. We used 3 nodes: the first, the PIR and Temperature Client Node, is applied to detect the
presence of a citizen and samples their temperature; the second, the RFID Identification IoT Client Node, is deployed to identify the citizen and retrieve their sanitary information; and the third, the Image Recognition Camera IoT Client Node, is implemented to recognize the citizen's face and identity. In this medical IoT platform, we employed 2 IoT application communication protocols, Message Queuing Telemetry Transport (MQTT) and the Constrained Application Protocol (CoAP), in order to choose the better one in terms of execution time, RAM occupation and CPU consumption (Fig. 1).
Fig. 1. Intelligent and secure medical IOT Platform
Figure 1 presents the medical IoT platform, which is used to identify cases of coronavirus infection and to control access to a public place using an RFID card by performing 4 tests. The first tests whether the citizen's temperature is below 37 °C. The second verifies, using the data linked to the RFID identification tag, whether the citizen has already presented a PCR/serological test, and the third verifies whether the citizen has been in contact with a positive case. If the temperature is normal and the RFID data show that the citizen has a negative test and no contact with an infected person, the fourth test checks the citizen's identity using facial recognition based on artificial intelligence. If this test is valid, access is authorized; if the identity does not match the data associated with the RFID tag, access is denied, even if the person has a normal temperature, has tested negative and has not had contact with an infected person. For the MQTT protocol implementation, we employed 3 topics: 3 publishers (the IoT MQTT Client Nodes: PIR and temperature sensor, RFID identifier and CAM recognizer),
and one IoT MQTT Client subscriber (the Web Server IoT Client Node); a platform key is also set to encrypt and decrypt the transmitted data. For the CoAP implementation, we used 3 topics, with an IoT CoAP server and an IoT CoAP client; we also used AESLIB and PyCryptodome for encrypting and decrypting the transmitted data.
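On the CoAP side, client and server exchange compact binary messages, as Sect. 3.1 describes. To make that format concrete, here is a minimal encoder following the message layout of RFC 7252 (4-byte header, token, delta-encoded options, 0xFF payload marker). It covers encoding only, omits retransmission and option-value formats, and is not the CoAP library used on the platform; the resource path and message ID in the example are hypothetical.

```python
import struct

# Message types (RFC 7252): confirmable, non-confirmable, acknowledgement, reset.
CON, NON, ACK, RST = 0, 1, 2, 3
# Request codes are class 0: GET = 0.01, POST = 0.02, PUT = 0.03, DELETE = 0.04.
GET, POST, PUT, DELETE = 0x01, 0x02, 0x03, 0x04
URI_PATH = 11  # option number carrying one Uri-Path segment


def _ext(value: int):
    """Return (4-bit nibble, extended bytes) for an option delta or length."""
    if value < 13:
        return value, b""
    if value < 269:
        return 13, bytes([value - 13])
    if value <= 269 + 0xFFFF:
        return 14, struct.pack("!H", value - 269)
    raise ValueError("option delta/length too large")


def encode_coap(msg_type, code, message_id, token=b"", options=(), payload=b""):
    """Encode a CoAP message: header, token, options, then 0xFF and the payload."""
    if len(token) > 8:
        raise ValueError("token is at most 8 bytes")
    # Byte 0: version 1 (2 bits), type (2 bits), token length (4 bits).
    header = struct.pack("!BBH", (1 << 6) | (msg_type << 4) | len(token), code, message_id)
    out = bytearray(header + token)
    last = 0
    # Options go in ascending option-number order; each stores the delta from the previous one.
    # sorted() is stable, so repeated Uri-Path segments keep their order.
    for number, value in sorted(options, key=lambda o: o[0]):
        d_nib, d_ext = _ext(number - last)
        l_nib, l_ext = _ext(len(value))
        out.append((d_nib << 4) | l_nib)
        out += d_ext + l_ext
        out += value
        last = number
    if payload:
        out.append(0xFF)  # payload marker
        out += payload
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical confirmable GET for the resource /sensors/temp.
    msg = encode_coap(CON, GET, 0x1234, token=b"\x01",
                      options=[(URI_PATH, b"sensors"), (URI_PATH, b"temp")])
    print(msg.hex())
```

A confirmable (CON) message like this one expects an ACK, which is the reliable mode described in Sect. 3.1; sending the same request as NON would skip the acknowledgement. In the secured variant, the payload bytes would be AES-encrypted before encoding.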
3 Methodology
3.1 IoT Application Protocols
CoAP is an IoT application-layer communication and web transfer protocol based on Representational State Transfer (REST), used by resource-constrained devices operating in an IP network. Resource-constrained devices can be numerous, but they are often linked to each other by function or location, so group communication mechanisms can improve the efficiency and latency of communications and reduce the bandwidth required for a given request [8]. CoAP is primarily designed for constrained devices: clients may send GET, PUT, POST and DELETE requests for resources to the server. CoAP messages are encoded in a simple binary format, so packets are simple to generate and can be parsed in place without consuming much energy on constrained devices [9]. MQTT uses a publish/subscribe model, has low network overhead and can be implemented on low-power devices, such as the IoT node microcontrollers used in remote sensors in the Internet of Things. As such, the Mosquitto MQTT broker is intended for all cases where lightweight messaging is needed, especially on constrained devices with limited resources [10]. The primary difference between CoAP and MQTT is that the former runs over the User Datagram Protocol (UDP), while the latter runs over TCP. Since UDP is inherently unreliable, CoAP provides its own reliability mechanism with two modes, reliable and unreliable: reliable mode uses confirmable messages that require an ACK, while unreliable mode uses non-confirmable messages that do not require acknowledgement. Another difference is the availability of different QoS levels: MQTT defines 3 QoS levels, while CoAP does not offer differentiated quality of service [11]. AES is a symmetric key system in which the sender and recipient of a message share a common secret key, which is used both to encrypt and to decrypt the message.
AES supports key sizes of 128, 192 and 256 bits, with 10, 12 and 14 encryption iterations (also known as rounds), respectively. Each round mixes the data with a round key derived from the encryption key. Except for the last round, each round comprises four processing steps: SubBytes, ShiftRows, MixColumns and AddRoundKey [12].
• SubBytes is an invertible, nonlinear transformation that uses 16 identical 256-byte substitution tables (S-boxes) to map each byte of the data block individually to another byte. S-box entries are produced by calculating multiplicative inverses in the Galois field GF(2^8) and applying an affine transformation.
• ShiftRows performs a byte transposition by cyclically shifting the rows of the data block by predefined offsets: the second, third and fourth rows are shifted left by one, two and three bytes, respectively.
• MixColumns treats each column of the data block as a polynomial over GF(2^8) and multiplies it by a fixed polynomial modulo x^4 + 1. Instead of being computed separately, SubBytes and MixColumns can also be combined into large look-up tables (LUTs).
• AddRoundKey adds the data block to the round key, which the key schedule unit derives from the initial secret key. Each byte of the block is XORed with the corresponding byte of the round key [12].
Decryption applies the inverses of these operations. The main loop runs Nr − 1 full rounds, where Nr is set by the AES specification according to the key size, followed by a final round without MixColumns [13].
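The four round steps listed above, and the fact that decryption applies their inverses, can be illustrated with a minimal pure-Python sketch. It was written for this text and is not the AESLIB or PyCryptodome code used by the authors. What it shows:

- the S-box is built from GF(2^8) inverses followed by the affine map;
- ShiftRows, MixColumns and AddRoundKey are applied to a 16-byte state stored in FIPS-197 column-major order;
- each step is then undone in reverse order.

Two assumptions apply. The key schedule is omitted, so round keys are simply taken as given. The code is also neither constant-time nor optimized; real deployments should use a vetted library.

```python
# Round counts from FIPS-197: Nr = Nk + 6, with Nk the key length in 32-bit words.
ROUNDS = {bits: bits // 32 + 6 for bits in (128, 192, 256)}


def xtime(a):
    """Multiply a byte by x ({02}) in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """SubBytes table: multiplicative inverse in GF(2^8), then the affine transformation."""
    sbox = []
    for b in range(256):
        # Brute-force inverse; 0 has no inverse and is mapped to 0 by convention.
        inv = next((c for c in range(1, 256) if gf_mul(b, c) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index

# The state is 16 bytes in FIPS-197 column-major order: state[row + 4 * column].


def sub_bytes(state, box=SBOX):
    return [box[b] for b in state]


def shift_rows(state, direction=1):
    """Rotate row r left by r bytes; direction=-1 rotates right (InvShiftRows)."""
    return [state[r + 4 * ((c + direction * r) % 4)] for c in range(4) for r in range(4)]


def mix_column(col, m=(2, 3, 1, 1)):
    """Multiply one column by the circulant GF(2^8) matrix whose first row is m."""
    out = []
    for r in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(col[j], m[(j - r) % 4])
        out.append(acc)
    return out


def mix_columns(state, m=(2, 3, 1, 1)):
    return [b for c in range(4) for b in mix_column(state[4 * c:4 * c + 4], m)]


def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]


def aes_round(state, round_key):
    """One of the Nr - 1 full rounds (the final round omits MixColumns)."""
    return add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_key)


def inv_aes_round(state, round_key):
    """Undo aes_round by applying the inverse steps in reverse order."""
    state = add_round_key(state, round_key)        # XOR is its own inverse
    state = mix_columns(state, m=(14, 11, 13, 9))  # InvMixColumns
    state = shift_rows(state, direction=-1)        # InvShiftRows
    return sub_bytes(state, INV_SBOX)              # InvSubBytes
```

As a usage check, the following results are standard AES test values:

- `SBOX[0x53]` is `0xED`.
- `mix_column([0xDB, 0x13, 0x53, 0x45])` returns `[0x8E, 0x4D, 0xA1, 0xBC]`.
- `inv_aes_round(aes_round(s, k), k)` returns `s` for any 16-byte state and key.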
4 Related Works
4.1 Comparative Study of the Proposed IoT Protocols
In a CoAP environment, a solution integrating the DTLS and CoAP protocols for the IoT [14] has been developed so that applications can automatically access CoAP with DTLS security. The evaluation results show a significant gain in terms of power consumption, network response time and processing time. Current research on securing the IoT covers several aspects: it highlights security and proposes solutions that take into account the specific characteristics of application-layer protocols, such as the RSA-based security solution of [15]. That solution relies on RSA, the most widely used asymmetric cryptosystem, to achieve low overhead and high interoperability. The motivation is that the DTLS handshake consumes an amount of power that IoT devices cannot support. Another study analyzes two well-known security protocols that can be used to secure CoAP networks, DTLS and Internet Protocol Security (IPsec) [16]. Citing their drawbacks, it concludes that neither is the most optimized solution for CoAP security.
In an MQTT environment, a new approach has been proposed to secure MQTT-based platforms in order to guarantee the confidentiality and integrity of transmitted data [17]. Port 8883 is registered with IANA under the name "secure-mqtt" and is reserved exclusively for MQTT over TLS [18]. Security between the MQTT broker and its clients can therefore be provided by SSL/TLS [19]. However, TLS is not a cost-effective way to secure MQTT. Although the additional CPU usage is generally negligible for the broker, it can be a problem for highly constrained devices that are not designed for computationally intensive tasks [18]. Another solution uses a certificate authority (CA) to generate a private key and a certificate, which are distributed manually to certified clients [19].
This approach is not suited to an IoT environment, which may contain a wide range of nodes, because manual configuration is very difficult to achieve there. In addition, this security has a cost in terms of CPU usage and communication.
5 The Proposed Approach
Encryption is the process of converting the original plain text into an unreadable format. Various encryption techniques exist in cryptography, such as DES, Triple DES, AES, and RSA. AES has been widely used in many devices, especially in resource-constrained environments, because it is efficient, secure, and performs well in such settings. A major advantage of AES is that its keys are much smaller than those of asymmetric algorithms such as RSA. A smaller key reduces the computing resources required, conserves energy on all the IoT nodes, and extends the lifetime of the network.
Symmetric encryption uses the same key both to encrypt and to decrypt. This approach has several benefits, notably relatively high performance. A symmetric scheme has two aspects: the encryption algorithm and the key. The encryption algorithm is a sequence of transformations applied to the plain text using the key. At decryption time, the same process is followed in reverse with the same key. A strong algorithm should depend entirely on its key [20].
Our goal is to apply an AES-based encryption layer, so a security layer is added to both protocols. With this proposal, only authorized users can access the information, while those without permission to access the data see only encrypted text. As shown in Fig. 2, a security layer is added to the CoAP protocol and the data are encrypted end to end. Encryption is performed at the IoT CoAP client and decryption at the IoT web server, so only authorized persons can access the data. The proposed algorithm design is described as follows:
Fig. 2. Data transmission algorithm
We used the following components in our IoT-Medical platform, specifically for the CoAP protocol:
• CoAPthon server [21]: a CoAP server implementation in Python.
• ESP nodes: CoAP server and client implementation on the Wi-Fi microcontroller module.
• AESLIB: an AES implementation.
• PyCryptodome: an AES implementation in Python.
As shown in Fig. 3, a security layer is added to the MQTT protocol. The diagram has three principal elements: the MQTT broker and two MQTT clients, a publisher and a subscriber. The execution environment of the publisher can be a microcontroller node, for example an ESP8266 board or an ESP32-CAM. It includes physical sensors and a MicroPython subcomponent, which contains two parts: the "AESCipher Encrypt" code and the encryption key. "AESCipher Encrypt" is the code responsible for encrypting the collected data into a ciphertext, which is then sent to the broker. In our approach, the subscriber is a Linux web server that performs several tasks:
• managing the graphical user interface;
• processing the data decrypted via "AESCipher Decrypt";
• transmitting encrypted orders to other platform nodes via the MQTT broker;
• reading from and writing to the database (Fig. 4).
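AES works on 16-byte blocks. A sensor reading produced on the publisher must therefore be padded before encryption, and the binary ciphertext is typically base64-encoded so that it can travel through the broker as a text payload.

The source does not state which AES mode, padding or framing its "AESCipher Encrypt"/"AESCipher Decrypt" components use. The sketch below therefore assumes:

- a block mode with an IV (such as CBC);
- PKCS#7 padding;
- an IV-prefixed base64 envelope.

The cipher itself is passed in as a callable. `encrypt_cbc` and `decrypt_cbc` are hypothetical placeholders for the AES library on each side; with PyCryptodome, for instance, they could wrap a cipher created by `AES.new(key, AES.MODE_CBC, iv)`.

```python
import base64
import os

BLOCK = 16  # AES block size in bytes, for every key size


def pkcs7_pad(data):
    """Append n copies of byte n so the length becomes a multiple of 16 (n is 1..16)."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_unpad(data):
    """Strip and validate PKCS#7 padding."""
    n = data[-1] if data else 0
    if not 1 <= n <= BLOCK or len(data) % BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]


def seal(plaintext, encrypt_cbc, iv=None):
    """Publisher side: pad, encrypt with the shared key, prefix the IV, base64-encode."""
    if iv is None:
        iv = os.urandom(BLOCK)  # fresh random IV for every message
    ciphertext = encrypt_cbc(iv, pkcs7_pad(plaintext))
    return base64.b64encode(iv + ciphertext).decode("ascii")


def open_sealed(payload, decrypt_cbc):
    """Subscriber side: base64-decode, split off the IV, decrypt, remove the padding."""
    raw = base64.b64decode(payload)
    iv, ciphertext = raw[:BLOCK], raw[BLOCK:]
    return pkcs7_unpad(decrypt_cbc(iv, ciphertext))
```

Passing an identity function in place of the cipher makes `open_sealed(seal(msg, f), f)` return `msg`. This lets the framing be tested separately from the cryptography before the real AES calls are plugged in on the node and on the web server.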
Fig. 3. Diagram of deployment
Our platform was evaluated in a pure IoT environment, in which messages are exchanged in encrypted form via an MQTT broker between all the microcontroller nodes of the platform (Fig. 5).
Fig. 4. Encryption with AES
Fig. 5. Decryption with AES
6 Discussion
The objective of this work is to apply the AES algorithm to both protocols of our platform, because AES allows small keys to be used for encryption and decryption. Table 1 compares the unsecured and secured versions of the platform for the two protocols in terms of execution time, CPU consumption and RAM occupation. It covers three nodes: the PIR and temperature node (Node1), the RFID identification IoT node (Node2), and the image-recognition camera IoT node (Node3). According to Table 1, secure MQTT has a much higher consumption than secure CoAP. The table reports the two protocols both in unsecured mode and after the addition of the security layer. This serves two purposes: to choose the best secure protocol, and to show that the security layer did not affect our IoT platform in terms of time and power consumption.
According to the analyses in [22], encryption and decryption with ECC (Elliptic Curve Cryptography) perform better than with RSA. In this article, we conclude that encryption and decryption with AES also perform better than RSA. AES ensures effective communication between IoT nodes while providing confidentiality, integrity and authentication, and it is one of the best options for resource-constrained environments. In future work, we will apply ECC encryption to our Medical-IoT platform with both secure protocols in order to compare AES and ECC.
Table 1. The results of the implementation of the IoT protocols
Nodes | Protocol | Time (ms) | CPU (µs) | RAM (bytes)
Node1 | MQTT | 2320 | 534323.67 | 404
Node1 | CoAP | 3020 | 431575.3 | 584
Node2 | MQTT | 2165 | 15259729.7 | 328
Node2 | CoAP | 2785 | 5343170.7 | 504
Node3 | MQTT | 1606.29 | 520469 | 335872
Node3 | CoAP | 1510 | 435351 | 50217
Node1 | S-MQTT | 1390 | 404615.5 | 19820
Node1 | S-CoAP | 2163 | 326815.7 | 502
Node2 | S-MQTT | 1593.33 | 6043050.5 | 18620
Node2 | S-CoAP | 1876 | 4597250.5 | 376
Node3 | S-MQTT | 1270 | 508456 | 68000
Node3 | S-CoAP | 1220 | 402267 | 50217
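The Discussion argues that the security layer does not penalize execution time. One way to check that claim against Table 1 is to compute, for each node and protocol, the relative change from the unsecured row to its secured counterpart (MQTT to S-MQTT, CoAP to S-CoAP). The sketch below does this for the Time column only, with values copied from the table; the row pairing is the only assumption.

```python
# (unsecured, secured) execution times in ms, copied from the Time column of Table 1.
TIMES_MS = {
    ("Node1", "MQTT"): (2320, 1390),
    ("Node1", "CoAP"): (3020, 2163),
    ("Node2", "MQTT"): (2165, 1593.33),
    ("Node2", "CoAP"): (2785, 1876),
    ("Node3", "MQTT"): (1606.29, 1270),
    ("Node3", "CoAP"): (1510, 1220),
}


def relative_change(before, after):
    """Percentage change from the unsecured to the secured measurement."""
    return 100.0 * (after - before) / before


changes = {key: relative_change(*pair) for key, pair in TIMES_MS.items()}
for (node, protocol), pct in changes.items():
    print(f"{node} {protocol:<4} {pct:+.1f} %")
```

In all six pairs the secured variant reports a shorter execution time. For example, the change is about -40.1% for Node1 with MQTT (from 2320 ms to 1390 ms).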
7 Conclusion and Perspectives
In this work, we created an intelligent and secure medical IoT platform. We made it more efficient and secure by using our proposed protocols with the AES algorithm to encrypt and decrypt the transmitted data. The proposed approach was tested in a real-time environment, illustrating the exchange of data in encrypted form via an IoT server protocol between all the IoT nodes of the platform. In future work, we will implement ECC encryption on our platform to compare AES with ECC and choose the best encryption algorithm for our proposed protocols.
References
1. Garg, H., Dave, M.: Securing IoT devices and securely connecting the dots using REST API and middleware. IEEE (2019). https://doi.org/10.1109/IoT-SIU.2019.8777334
2. Pantic, M., Patras, I.: Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Trans. Syst. Man Cybern. https://doi.org/10.1109/TSMCB.2005.859075
3. Laurière, J.-L.: Intelligence artificielle : résolution de problèmes par l’homme et la machine. https://ulysse.univ-lorraine.fr/discovery/fulldiplay?vid=33UDL_INST:UDL&docid=alma991003670679705596&lang=fr&context=L&adaptor=Local%20Search%20Engine
4. Barodi, A., Bajit, A., Benbrahim, M., Tamtaoui, A.: Improving the transfer learning performances in the classification of the automotive traffic roads signs. In: E3S Web Conf. (2020)
5. Zhou, J., Wang, Y., Ota, K., Dong, M.: AAIoT: accelerating artificial intelligence in IoT systems. IEEE Wirel. Commun. Lett. 8(3), 825–828 (2019)
6. Barodi, A., Bajit, A., El Aidi, S., Benbrahim, M., Tamtaoui, A.: Applying real-time object shapes detection to automotive traffic roads signs. In: Proceedings of the 2020 International Symposium on Advanced Electrical and Communication Technologies (ISAECT), Kenitra, Morocco, pp. 1–6 (2020)
7. Bajit, K.A., Nahid, M., Tamtaoui, A., Benbrahim, M.: A psychovisual optimization of wavelet foveation-based image coding and quality assessment based on human quality criterions. Adv. Sci. Technol. Eng. Syst. J. 5(2), 225–234 (2020). https://doi.org/10.25046/aj050229
8. Rahman, A., Dijk, E.: Group Communication for the Constrained Application Protocol (CoAP). RFC 7390 (2014). https://www.hjp.at/doc/rfc/rfc7390.html
9. Kayal, P., Perros, H.: A comparison of IoT application layer protocols through a smart parking implementation. In: 2017 20th Conference on Innovations in Clouds, Internet and Networks (ICIN), Paris, pp. 331–336 (2017). https://doi.org/10.1109/ICIN.2017.7899436
10. Light, R.A.: Mosquitto: server and client implementation of the MQTT protocol. J. Open Source Softw. 2(13), 265 (2017). https://doi.org/10.21105/joss.00265
11. Thangavel, D., Ma, X., Valera, A., Tan, H., Tan, C.K.: Performance evaluation of MQTT and CoAP via a common middleware. In: 2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), Singapore, pp. 1–6 (2014). https://doi.org/10.1109/ISSNIP.2014.6827678
12. Tsai, K., Huang, Y., Leu, F., You, I., Huang, Y., Tsai, C.: AES-128 based secure low power communication for LoRaWAN IoT environments. IEEE Access 6, 45325–45334 (2018). https://doi.org/10.1109/ACCESS.2018.2852563
13. Lu, C.-C., Tseng, S.-Y.: Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter. In: Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors, San Jose, CA, USA, pp. 277–285 (2002). https://doi.org/10.1109/ASAP.2002.1030726
14. Raza, S., Shafagh, H., Hewage, K., et al.: Lithe: lightweight secure CoAP for the internet of things. IEEE Sensors J. 13(10), 3711–3720 (2013)
15. Kothmayr, T.: A security architecture for wireless sensor networks based on DTLS. Master’s thesis, Software Engineering Elite Graduate Program, University of Augsburg (2011)
16. Alghamdi, T.A., Lasebae, A., Aiash, M.: Security analysis of the constrained application protocol in the Internet of Things. In: Second International Conference on Future Generation Communication Technologies (FGCT 2013), pp. 163–168. IEEE (2013)
17. Silva, C., Toasa, R., Martinez, H.D., Veloz, J., Gallardo, C.: Secure push notification service based on MQTT protocol for mobile platforms. In: XII Jornadas Iberoamericanas de Ingeniería de Software e Ingeniería del Conocimiento (JIISIC 2017), held jointly with the Ecuadorian Conference on Software Engineering (CEIS 2017) and the Conference on Software Engineering Applied to Control and Automation Systems (ISASCA 2017), Latacunga, Ecuador (2017)
18. Mektoubi, A., Hassani, H.L., Belhadaoui, H., Rifi, M., Zakari, A.: New approach for securing communication over MQTT protocol A comparaison between RSA and Elliptic Curve. In: 2016 Third International Conference on Systems of Collaboration (SysCo) (2016). https://doi.org/10.1109/sysco.2016.7831326
19. Khamphroo, M., Kwankeo, N., Kaemarungsi, K., Fukawa, K.: MicroPython-based educational mobile robot for computer coding learning. In: 2017 8th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES) (2017). https://doi.org/10.1109/ictemsys.2017.7958781
20. Mishra, P., Agrawal, M.: A comparative survey on symmetric key encryption techniques. Int. J. Comput. Sci. Eng. (IJCSE) 4(05), 877 (2012). https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.433.2037&rep=rep1&type=pdf
21. Tanganelli, G., Vallati, C., Mingozzi, E.: CoAPthon: easy development of CoAP-based IoT applications with Python. In: 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT), pp. 63–68. IEEE (2015)
22. El Aidi, S., Bajit, A., Barodi, A., Chaoui, H.: An elliptic-curve based cryptographically optimized vehicular protocols applied to secured applicative protocols MQTT and CoAP. In: 2020 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS), Kenitra, pp. 1–6 (2020)
Pulmonary Nodule Classification Based on Three Convolutional Neural Networks Models
Enoumayri Elhoussaine and Belaqziz Salwa
LabSIV Laboratory, Department of Computer Science, Faculty of Science, Ibn Zohr University, BP 8106, 80000 Agadir, Morocco
Abstract. Lung cancer is the leading cause of cancer-related death worldwide. To plan effective treatment and to create financial and care plans, lung nodules must be diagnosed early in computed tomography (CT) chest scans. In this context, this paper addresses the classification of pulmonary nodules in CT scans as malignant or benign, which aims to automatically map 3D nodules to category labels. We propose an ensemble learning approach based on three convolutional neural networks (CNNs): a basic 3D CNN, a 3D model inspired by AlexNet, and a 3D model inspired by ResNet. The outputs of these CNNs are combined into a single result using a fully-connected layer with softmax activation. The CNNs are trained and evaluated on the public LIDC-IDRI dataset. The best result is obtained by the ensemble model with data augmentation. It achieves the largest area under the receiver operating characteristic curve (AUC, 84.66%) and a TPR (sensitivity) of 94.44%.
Keywords: Pulmonary nodule classification · LIDC-IDRI · Deep neural networks · 3D AlexNet · 3D ResNet
1 Introduction
Lung cancer is the cancer with the highest mortality worldwide, accounting for more deaths than cancers of the prostate, breast, colon, and pancreas combined [1]. Its mortality rate can be reduced by low-dose CT screening [2]. However, the subtle differences between benign and malignant pulmonary nodules make lung cancer diagnosis a difficult task even for human experts. Moreover, radiological diagnosis is highly subjective, which leads to considerable variation between radiologists' opinions. Computer-aided diagnosis (CAD) provides an objective, non-invasive solution for classifying pulmonary nodules in CT scans as malignant or benign. CAD can also be used to increase the radiologist's confidence in the diagnosis of a pulmonary nodule.
Current CAD systems fall into two categories. The first measures radiological traits (e.g., shape, nodule size, texture, location). In this approach, the features are extracted by hand or with predefined filters [3, 4], and a classifier is then trained to determine the malignancy status. The second category is based on automatic feature extraction with deep neural networks [5–7]. To handle the pulmonary nodule classification problem, these networks learn from the data through a general learning process, without the hand-crafted feature extraction of the traditional approach.
In [8], the authors propose four convolutional neural networks evaluated on the public LIDC-IDRI dataset. The best classification performance (0.9010 AUC and 86.84% accuracy) was achieved with a 3D multi-output DenseNet. In [9], the authors obtain their best result with a boosting classifier based on voting over several CNNs. Classification is carried out by modifying the weights of the training samples, and the classifiers are combined linearly to increase their performance. The model proposed in [10] uses a Gradient Boosting Machine (GBM). First, features are extracted with a convolutional layer; then a set of 3D dual-path blocks is employed to learn higher-level features. Finally, for malignant/benign classification, the authors apply 3D average pooling and binary logistic regression.
In this paper, we propose a new classification approach composed of an ensemble classifier using three 3D models. The first has a set of convolutional and fully-connected layers; the other two are inspired by the AlexNet [11] and ResNet [12] architectures, respectively. The proposed ensemble classifier takes the outputs of the three 3D models, and a fully-connected layer with softmax activation is trained on them to produce the final result. The public LIDC-IDRI dataset is used to train and evaluate the proposed approach.
This paper is structured as follows. Section 2 details the proposed method for pulmonary nodule classification. Section 3 presents the experiments and their results in three subsections: the first introduces the dataset, and the second and third give the details of the experimental settings and the results, respectively. Finally, Sect. 4 concludes by summarizing the proposed method and discussing perspectives.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 122–128, 2021. https://doi.org/10.1007/978-3-030-70713-2_13
2 Proposed Method
In this section, we describe our deep neural network models for classifying lung nodules in CT scans. In practice, radiologists check several slices of a lung nodule and consider its 3D information to make a diagnosis. Most previous approaches do not use the full 3D information of a pulmonary nodule; they simply use single-view or multi-view 2D images. The proposed method therefore discriminates malignant from benign lung nodules using as input a 3D CT chest scan together with the locations of the nodules. A typical CT scan consists of hundreds of 2D grayscale images of 512 × 512 pixels. The proposed networks are designed as follows:
Basic 3D CNN: Consists of four convolutional layers with 32, 32, 64, and 64 feature maps, respectively, each using 3 × 3 × 3 filters. Batch normalization and max-pooling layers are applied after every convolutional layer. The last max-pooling layer has a filter of size 1 × 1 × 1, and all others have a filter of size 2 × 2 × 2. The input of this model is a 32 × 32 × 32 volume dominated by the pulmonary nodule. The resulting feature maps are sent to a classifier, i.e., a fully-connected layer with softmax activation, to distinguish benign from malignant nodules.
3D AlexNet: This architecture, inspired by the 2D AlexNet architecture, is deeper than the previous CNN. It is composed of 6 convolutional layers with 32, 32, 64, 64, 128, and 128 feature maps, respectively, each using 3 × 3 × 3 filters. Batch normalization and max-pooling layers are applied after each convolutional layer. The max-pooling filters are of size 2 × 2 × 2, except for the last two, which have a filter of size 1 × 1 × 1. To distinguish benign from malignant nodules, the output of the last max-pooling layer is connected to a classifier, i.e., a fully-connected layer with softmax activation.
3D ResNet: Consists of several stages, each containing a convolution block and identity blocks. The identity block (Fig. 1) is used when the input and output activations have the same dimensions. Otherwise the convolution block is used, in which a convolutional layer is added to the shortcut path. The implemented model consists of two stages: the first with a 3D convolutional layer, and the second with a convolution block and two identity blocks. Each convolution block and identity block has three convolutional layers. Finally, an average pooling layer and a dense layer with softmax produce the classification output.
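The layer lists above determine how the 32 × 32 × 32 input shrinks through each model, and how many weights each convolution holds. A small tracer makes this explicit. The paper does not state the padding, so the sketch assumes:

- stride-1 convolutions with "same" padding (they keep the spatial size);
- non-overlapping max-pooling whose stride equals its size;
- a single-channel input.

It counts only convolution weights and biases, not batch-normalization or dense layers, so its totals are not meant to match the parameter counts reported in Sect. 3.3. The helper name `trace_stages` is hypothetical.

```python
def trace_stages(input_size, feature_maps, pool_sizes, kernel=3, in_channels=1):
    """Return (spatial size, channels, conv parameters) after each conv + BN + max-pool stage.

    Assumes a cubic input, stride-1 'same' convolutions and non-overlapping pooling.
    """
    size, c_in, stages = input_size, in_channels, []
    for c_out, pool in zip(feature_maps, pool_sizes):
        params = (kernel ** 3 * c_in + 1) * c_out  # 3D kernel weights + one bias per filter
        size //= pool                               # 'same' conv keeps size; pooling divides it
        stages.append((size, c_out, params))
        c_in = c_out
    return stages


basic = trace_stages(32, [32, 32, 64, 64], [2, 2, 2, 1])
alexnet = trace_stages(32, [32, 32, 64, 64, 128, 128], [2, 2, 2, 2, 1, 1])

for name, stages in (("Basic 3D CNN", basic), ("3D AlexNet", alexnet)):
    size, channels, _ = stages[-1]
    print(name, "->", f"{size}x{size}x{size}x{channels}",
          "flattened:", size ** 3 * channels,
          "conv params:", sum(p for *_, p in stages))
```

Under these assumptions, the basic model hands a 4 × 4 × 4 × 64 volume (4096 features) to its fully-connected classifier. The 3D AlexNet hands over 2 × 2 × 2 × 128 (1024 features). The 1 × 1 × 1 pooling layers leave the size unchanged.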
Fig. 1. ResNet identity block
Ensemble Model: A new model is created to better combine the predictions of the models above. First, each of the previous models classifies the input nodule individually. Their outputs are then concatenated into one vector and sent to a new fully-connected layer, which produces the final classification (Fig. 2).
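The ensemble head just described amounts to concatenation followed by a dense layer and a softmax. The sketch below shows that forward pass in plain Python. It makes two assumptions:

- each base model outputs a two-class (benign, malignant) probability vector;
- the weights are made-up numbers, whereas in the paper they are learned by training the new fully-connected layer.

Function names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully-connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def ensemble_predict(model_outputs, weights, bias):
    """Concatenate the per-model (benign, malignant) probabilities, then classify."""
    features = [p for out in model_outputs for p in out]  # 3 models x 2 classes = 6 inputs
    return softmax(dense(features, weights, bias))


# Hypothetical outputs of the basic 3D CNN, 3D AlexNet and 3D ResNet for one nodule.
outputs = [[0.30, 0.70], [0.55, 0.45], [0.20, 0.80]]
# Made-up weights for the 6 -> 2 fully-connected layer (learned during training in practice).
W = [[1.0, -1.0, 0.5, -0.5, 1.0, -1.0],
     [-1.0, 1.0, -0.5, 0.5, -1.0, 1.0]]
b = [0.0, 0.0]
probs = ensemble_predict(outputs, W, b)
print(f"benign={probs[0]:.3f} malignant={probs[1]:.3f}")
```

In this toy setting the dense layer gives the AlexNet outputs half the weight of the other two models. A plain average of the three probability vectors cannot express that kind of unequal, per-model weighting.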
Fig. 2. The proposed method
3 Experiments
3.1 Dataset and Preprocessing
The LIDC-IDRI dataset provides DICOM-format CT scans of 1010 patients with a uniform size of 512 × 512. Slice thickness varies from 0.5 to 5 mm, with 1, 1.25, and 2.5 mm being the most frequent. Each LIDC-IDRI case contains hundreds of images and an XML file describing the lung lesions found. The diameter of each observed lesion was measured with electronic calipers. Based on this measurement, lesions fall into three main groups: nodules (diameter 3–30 mm), non-nodules (diameter ≥30 mm), and micro-nodules (diameter ≤3 mm). One to four radiologists annotated each nodule and assigned a malignancy score from 1 to 5, where 1 and 5 denote the most benign and the most malignant extremes, respectively. We ignored the score zero, which means that no diagnosis is available [13].
3.2 Experiment Settings
The proposed approach is implemented with the Keras framework [14], using TensorFlow as a backend [15]. We chose binary cross-entropy as the loss function, since the classification problem is binary. To prevent over-fitting, the models are trained with data augmentation: horizontal flips, vertical flips, z-axis flips, and random orientation. We consider two training sets, DS3 and DS4, which contain the lung nodules diagnosed by at least three and at least four radiologists, respectively. We then compute the median of the annotated scores for each nodule:
• a median greater than three is labeled malignant;
• a median less than three is labeled benign;
• a median equal to three is excluded.
The proposed models were trained on both DS3 and DS4 (training set: 70%, test set: 30%) using the Adam optimizer, an L2 regularizer, and Xavier initialization of the model weights.
3.3 Experiment Results
The experimental results of the proposed networks on the DS3 and DS4 datasets are presented in Table 1 and Table 2, and their ROC curves are shown in Fig. 3 and Fig. 4.
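The nodule-labeling rule of Sect. 3.2 (ignore zero scores, require at least three or four radiologists, then threshold the median at three) can be written as a small function. Two details are assumptions rather than statements from the paper:

- zero scores are removed before counting the radiologists;
- for an even number of readers, the median is the mean of the two middle scores (the behavior of Python's `statistics.median`).

The function name `label_nodule` is hypothetical.

```python
from statistics import median


def label_nodule(scores, min_readers):
    """Return 1 (malignant), 0 (benign) or None (excluded) for one nodule.

    scores: malignancy ratings (1-5), one per radiologist; 0 means "no diagnosis".
    min_readers: 3 to build DS3, 4 to build DS4.
    """
    valid = [s for s in scores if s != 0]   # zero scores are ignored
    if len(valid) < min_readers:
        return None                         # not diagnosed by enough radiologists
    m = median(valid)                       # even count: mean of the two middle scores
    if m > 3:
        return 1
    if m < 3:
        return 0
    return None                             # a median of exactly 3 is excluded


nodules = {"a": [4, 5, 3], "b": [1, 2, 2, 3], "c": [3, 3, 4], "d": [5, 5, 0]}
ds3 = {k: label_nodule(v, 3) for k, v in nodules.items()}
ds4 = {k: label_nodule(v, 4) for k, v in nodules.items()}
print(ds3)  # {'a': 1, 'b': 0, 'c': None, 'd': None}
```

Because DS4 applies a stricter reader threshold, every nodule kept in DS4 is also kept in DS3. This is why DS4 is smaller but, as Sect. 3.3 argues, less exposed to the opinion of a single radiologist.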
Since our problem is imbalanced and we care equally about the negative and positive classes, we used the AUC metric rather than accuracy: a model can easily achieve high accuracy by simply labeling all observations as the majority class. The ensemble model obtains the highest AUC and TNR on both DS3 and DS4. Its best TPR, TNR, and AUC are obtained on DS4: a TPR of 94.44%, a TNR of 93.91%, and an AUC of 84.66%. These results indicate that the fully-connected layer learns to weight the outputs of the three models effectively in order to obtain the best performance. The advantage of a fully-connected layer is that it computes a learned weighted combination of the model outputs instead of a simple average.
One problem in the LIDC-IDRI dataset is the ambiguity in defining the malignancy score of a nodule, since the evaluation is very subjective. This results in disparities between ratings, because radiologists have different opinions when evaluating the same nodule. A model trained with nodules diagnosed by fewer radiologists has a higher chance of being biased than one trained with nodules diagnosed by more radiologists. This explains why the results on DS4 are better than those on DS3.
Regarding model size, the basic 3D CNN has 1,060,862 parameters, the 3D AlexNet 1,271,554 parameters, and the 3D ResNet 539,266 parameters. ResNet has a more complex and deeper architecture, with more layers than the basic 3D CNN and AlexNet, yet it has the fewest parameters. Its optimization also benefits from the shortcut connections, which help it reach better results.
Table 1. Performance on DS3 test set.
3D network | TPR | TNR | PPV | AUC
AlexNet | 0.9816 | 0.2822 | 0.5776 | 0.6319
ResNet | 0.6835 | 0.3805 | 0.5246 | 0.5320
Basic model | 0.9691 | 0.4546 | 0.6399 | 0.7119
Ensemble model | 0.7268 | 0.8026 | 0.7864 | 0.7647
Table 2. Performance on DS4 test set.
3D network | TPR | TNR | PPV | AUC
AlexNet | 0.6695 | 0.9337 | 0.9579 | 0.8200
ResNet | 0.9352 | 0.9167 | 0.7721 | 0.8296
Basic model | 0.7512 | 0.7891 | 0.9157 | 0.8410
Ensemble model | 0.9444 | 0.9391 | 0.7925 | 0.8466
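The metrics in Tables 1 and 2 are defined from the confusion matrix:

- TPR = TP / (TP + FN), the sensitivity;
- TNR = TN / (TN + FP), the specificity;
- PPV = TP / (TP + FP), the precision.

The AUC equals the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counting one half. The sketch below computes these quantities in plain Python and reproduces the accuracy pitfall mentioned above. The helper names are hypothetical, and the paper's own AUC values come from its ROC curves (Figs. 3 and 4).

```python
def confusion_rates(y_true, y_pred):
    """TPR, TNR, PPV and accuracy for binary labels (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,  # sensitivity
        "TNR": tn / (tn + fp) if tn + fp else 0.0,  # specificity
        "PPV": tp / (tp + fp) if tp + fp else 0.0,  # precision
        "ACC": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A toy, imbalanced test set: 8 benign (0) and 2 malignant (1) nodules.
y_true = [0] * 8 + [1] * 2
always_benign = confusion_rates(y_true, [0] * 10)
print(always_benign)             # accuracy 0.8, but TPR 0.0
print(auc(y_true, [0.5] * 10))   # a constant score gives AUC 0.5
```

A classifier that always predicts "benign" reaches 80% accuracy on this toy set while detecting no malignant nodule. Its AUC of 0.5 exposes it as no better than chance, which is why AUC is preferred for the imbalanced LIDC-IDRI subsets.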
Fig. 3. ROC curves using DS3 test set.
Fig. 4. Testing ROC curves using DS4 test set.
4 Conclusion
In this paper, we proposed an ensemble model using three 3D networks to classify pulmonary nodules in CT images as benign or malignant. Working on 3D images gives better results for lung nodule classification than using 2D images or multi-view approximations of 3D images. One limitation of the proposed networks is that they do not take the slice thickness of the CT scans into account during training, which could affect performance. In future work, we aim to improve performance by using normalized CT scans to avoid the thickness problem. Another direction is automatic pulmonary nodule detection and instance segmentation, which will relax the requirement for manual annotation of nodule locations.
References
1. Luís, G., Jorge, N., António, C., Aurélio, C.: Evaluation of the degree of malignancy of lung nodules in computed tomography images (2017)
2. National Lung Screening Trial Research Team et al.: Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 365, 395–409 (2011)
3. Senthilkumar, K., Ganesh, N., Umamaheswari, R.: Three-dimensional lung nodule segmentation and shape variance analysis to detect lung cancer with reduced false positives. In: Proceedings of the Institution of Mechanical Engineers, Journal of Engineering in Medicine, vol. 230, no. 1, pp. 58–70 (2016)
4. Ying, L., Yoganand, B., Thomas, A., Sanja, A., Qian, L., Ronald, C.W., Gary, S., Pierre, P.M., Matthew, B.S., Robert, J.G.: Radiological image traits predictive of cancer status in pulmonary nodules. In: Clinical Cancer Research, clincanres–3102 (2016)
5. Wei, S., Mu, Z., Feng, Y., Caiyun, Y., Jie, T.: Multi-scale convolutional neural networks for lung nodule classification. In: International Conference on Information Processing in Medical Imaging, pp. 588–599. Springer (2015)
6. Aiden, N., Zhen, H., Dennis, W.: Pulmonary nodule classification with deep residual networks. In: International Journal of Computer Assisted Radiology and Surgery, p. 10 (2017)
7. Kui, L., Guixia, K.: Multiview convolutional neural networks for lung nodule classification. Int. J. Imaging Syst. Technol. 27(1), 12–22 (2017)
8. Sarfaraz, H., Kunlin, C., Qi, S., Ulas, B.: Risk stratification of lung nodules using 3D CNN-based multi-task learning. In: International Conference on Information Processing in Medical Imaging, pp. 249–260. Springer (2017)
9. Hongtao, X., Dongbao, Y., Nannan, S., Zhineng, C., Yongdong, Z.: Automated pulmonary nodule detection in CT images using deep convolutional neural networks (2018)
10. Wentao, Z., Chaochun, L., Wei, F., Xiaohui, X.: DeepLung: deep 3D dual path nets for automated pulmonary nodule detection and classification. arXiv preprint arXiv:1709.05538 (2017)
11. Alex, K., Ilya, S., Geoffrey, E.H.: ImageNet classification with deep convolutional neural networks
12. Kaiming, H., Xiangyu, Z., Shaoqing, R., Jian, S.: Deep residual learning for image recognition
13. Anthony, P.R., Alberto, M.B.: The Lung Image Database Consortium (LIDC) nodule size report, October. https://www.via.cornell.edu/lidc/
14. Chollet, F., et al.: Keras (2015). https://github.com/keras-team/keras
15. Abadi, M., et al.: Large-scale machine learning on heterogeneous systems (2015). Software available from https://www.tensorflow.org/
A Comparative Study on Liver Tumor Detection Using CT Images
Abdulfattah E. Ba Alawi, Ahmed Y. A. Saeed, Borhan M. N. Radman, and Burhan T. Alzekri
Software Engineering Department, Taiz University, Taiz, Yemen
Abstract. Liver cancer (LC) is a global health issue. It is one of the most common cancers affecting human beings, and it is a fatal disease that is spreading especially in developing countries. Many algorithms, based on both traditional machine learning classifiers and deep learning classifiers, have been used to detect liver cancer. To analyze the performance of commonly used algorithms, this paper presents a comparative study on LC detection. The study covers both machine learning and deep learning techniques and applies several methods for liver and tumor detection from CT images. With the advances in artificial intelligence (AI) and convolutional neural network algorithms, the methods included in this comparative study achieved strong results. The best accuracy among the traditional machine learning classifiers, 90.46%, is obtained with a Support Vector Machine (RBF kernel). The pre-trained Inception V4 model obtained a testing accuracy of 93.15%, making it the best classifier among the deep learning models. The performance of deep learning models is very promising for supporting medical decisions.
Keywords: Liver tumor · CT scan · CNN · Pre-trained model · Deep learning
1 Introduction
Liver cancer is cancer that occurs in the liver, one of the major organs of the human body, which requires care and caution to keep it sound and healthy. The liver is situated below the right lung and under the ribcage. People who suffer from liver tumors often die because of inaccurate or late detection. There are several important diagnostic tests for liver cancer, such as CT scans and MRI. In general, a doctor asks the patient to obtain a CT scan to determine whether a liver tumor exists. If doctors find that the liver damage is old, they request an MRI to obtain detailed information about the tumor, since MRI provides a better view of the tumor location. Liver cancer is a common cause of death throughout the world; using computed tomography (CT) images, the cancerous tissue can be precisely identified [1]. Because many methods are used for detecting liver cancer, this paper investigates the performance of the machine learning and deep learning models commonly used for this purpose. These common classifiers have been used successfully to classify abnormal liver cancer features. In this paper, the effectiveness of the liver cancer prediction models is assessed in terms of precision, recall, and accuracy.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 129–137, 2021. https://doi.org/10.1007/978-3-030-70713-2_14
130
A. E. Ba Alawi et al.
The rest of this paper is organized as follows: Sect. 2 reviews the main works in the area of liver tumor detection. Section 3 briefly describes the methods applied in this study. The obtained results are analyzed and discussed in Sect. 4. Section 5 concludes the paper and recommends further work.
2 Related Works
Computer-aided diagnosis (CAD) systems are commonly combined with image processing techniques to identify liver cancer and assist the clinician in decision-making [2]. Many algorithms have been applied to identify liver tumors, including region-based approaches, watershed transformation, and machine learning approaches. An automated method using GLCM-based features within a CAD framework was reported to classify liver tumors successfully [3]. Huang et al. [4, 5] designed a computer-aided diagnosis procedure for the segmentation and classification of liver tumors in CT images, and later extended it to use auto-covariance texture features, classifying tumors with an accuracy of 81.7% [4]. Ji et al. [6] proposed an efficient computational model for the clinical diagnosis of hepatocellular carcinoma based on a particle swarm optimization (PSO) framework. A novel optimized approach based on Instance Optimization (IO) and SVM has been reported to identify liver cancer more accurately [7]. Li et al. [8] proposed a distance-regularized level set evolution approach, which was used to segment cysts, tumors, calculi, and normal liver tissue in CT images. A multi-channel fully convolutional network (MC-FCN) model that segments liver tumors in CT images with greater precision is proposed in [9]. The Gray Level Co-occurrence Matrix (GLCM) is used to extract liver tumor features effectively. The extracted statistical features are commonly used in machine learning approaches [10–13], such as support vector machines [15], back propagation [14], or a fuzzy clustering approach with a multi-SVM classifier [16]. Recent works have successfully applied deep learning techniques (DNNs) to a wide range of problems, especially liver tumor detection [9, 17]. Convolutional neural networks (CNNs) have been used in an automated system to segment lesions in CT images, achieving a Dice similarity coefficient of 80.06% [18]. Lu et al. [19] developed a deep learning algorithm with graph-cut refinement to segment CT scans automatically. Wu et al. [20] designed a deep learning system for the classification of focal liver lesions. In a recent survey, Hu et al. [14] reviewed deep learning strategies for cancer detection and diagnosis, such as convolutional neural networks, fully convolutional networks, auto-encoders, and deep belief networks. In [21], the liver was first isolated with a marker-controlled watershed segmentation method, and the cancerous lesion was then segmented with a Gaussian mixture model. Deep neural classifiers were then used to recognize three types of liver cancer: hemangioma, hepatocellular carcinoma, and metastatic carcinoma. In [22], Liver Function Test (LFT) data are used for computer-assisted liver disease screening. The authors proposed a densely connected deep neural network (DenseDNN) that uses 13 LFT indicators and demographic information of the screened subjects. On a dataset of 76,914 samples, the area under the curve (AUC) of DenseDNN reached 0.8919, compared with 0.8867 for a DNN, 0.8790 for random forest, and 0.7974 for logistic regression, so DenseDNN outperformed the plain DNN.
A Comparative Study on Liver Tumor Detection Using CT Images
This paper presents a comparative study on deep learning classification and the detection of liver tumor regions. Nineteen classifiers are used to recognize liver tumors in CT images. In the deep learning approach, thirteen classifiers are used: ResNet-50, DenseNet121, DenseNet201, GoogLeNet, InceptionV4, AlexNet, SqueezeNet1.0, SqueezeNet1.1, VGG11, VGG13, VGG16, VGG19, and Xception. In addition, six traditional machine learning classifiers are implemented, including a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel and K-Nearest Neighbors (KNN).
3 Methodology
To analyze the classifiers commonly used for clinical diagnosis and computer-aided decision systems, two artificial intelligence approaches are investigated in this comparative study. These approaches are summarized in Fig. 1.
Fig. 1. Steps commonly followed for detecting liver tumors.
As shown in Fig. 1, in the machine learning approach, features are extracted first and the classifiers (e.g., SVM, KNN) are then trained on the extracted features. The preprocessing operation resizes the images to 128 × 128 before features are extracted with the HOG descriptor. For the deep learning algorithms, in contrast, the images are resized to 224 × 224 to fit the dimensions of the first layer of the pre-trained models. Only Inception V4 needs input images of 299 × 299, because its input layer has these dimensions. Then, the pre-trained models are retrained on the liver tumor dataset.
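The size-selection rule described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, an image is assumed to be a grayscale list of pixel rows, and nearest-neighbour interpolation is an assumption (the paper does not state its interpolation method; an imaging library would normally do the resizing).

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values (assumption)


def target_size(model_name: str) -> Tuple[int, int]:
    """Return the (height, width) an image is resized to for a given pipeline.

    "hog" stands for the traditional ML pipeline (128 x 128 before HOG),
    Inception-V4 expects 299 x 299, and the other pre-trained models 224 x 224.
    """
    name = model_name.lower().replace("-", "").replace(" ", "")
    if name == "hog":
        return (128, 128)
    if name == "inceptionv4":
        return (299, 299)
    return (224, 224)


def resize_nearest(image: Image, size: Tuple[int, int]) -> Image:
    """Resize by nearest-neighbour sampling (hypothetical helper)."""
    out_h, out_w = size
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    ct_slice = [[(r * 7 + c) % 256 for c in range(512)] for r in range(512)]
    for model in ("hog", "ResNet-50", "Inception-V4"):
        resized = resize_nearest(ct_slice, target_size(model))
        print(model, len(resized), len(resized[0]))
```

Nearest-neighbour sampling keeps the sketch dependency-free; bilinear interpolation is the more usual choice for CT slices because it avoids blocky artefacts when downscaling.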
3.1 Data Collection
The dataset was downloaded from TCGA [23, 24], 3D-IRCADb-01 [25], and the CHAOS Challenge - Combined (CT-MR) Healthy Abdominal Organ Segmentation data [26]. After removing bad images and anomalies from the dataset, 735 images remain: 350 images in the normal class and 385 images in the abnormal class. The dataset was then preprocessed to analyze the performance of machine learning and deep learning classifiers in classifying liver CT scans as normal or abnormal. The dataset was divided into three partitions: 515 images for training, 147 images for validation, and 73 images for testing.
3.2 Data Augmentation
To prevent the model from overfitting and to ensure a balanced classification, the data were augmented. Augmentation operations such as salt-and-pepper noise and Gaussian noise were applied. In addition, all normal images were rotated by angles from 1° to 20°.
3.3 Machine Learning Techniques
The classification of liver images is performed using traditional machine learning classifiers. In this approach, the images are preprocessed and their features are extracted with the Histogram of Oriented Gradients (HOG) descriptor. The extracted features are then classified using six machine learning classifiers to analyze the performance of each one.
3.4 Deep Learning Techniques
Convolutional Neural Networks (CNNs) are used here because of their vital role in image classification tasks. The power of a CNN lies in the hidden layers between its input and output layers, and CNNs have shown high performance in classification tasks. Figure 2 illustrates the architecture of a CNN.
Fig. 2. Convolutional neural networks.
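The noise-based augmentation of Sect. 3.2 can be illustrated with a short pure-Python sketch. It is not the authors' code: the function names, the noise amount and standard deviation, and the representation of an image as rows of 8-bit integers are assumptions, and the 1° to 20° rotations are left to an imaging library.

```python
import random
from typing import List

Image = List[List[int]]  # 8-bit grayscale image, rows of ints in [0, 255] (assumption)


def salt_and_pepper(image: Image, amount: float, rng: random.Random) -> Image:
    """Replace a fraction `amount` of pixels with 0 (pepper) or 255 (salt)."""
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            if rng.random() < amount:
                new_row.append(255 if rng.random() < 0.5 else 0)
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


def gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clip the result to the 8-bit range."""
    return [
        [min(255, max(0, round(value + rng.gauss(0.0, sigma)))) for value in row]
        for row in image
    ]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the augmented set is reproducible
    image = [[128] * 8 for _ in range(8)]
    augmented = [salt_and_pepper(image, 0.05, rng), gaussian_noise(image, 10.0, rng)]
    print(len(augmented), "augmented copies")
```

Each call returns a new image, so one original CT slice can yield several augmented copies; seeding the generator keeps the augmented dataset identical across runs.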
3.5 Transfer Learning
Transfer learning is a technique that has become widely used in recent years. Its most important advantage is that it reduces the time and resources required for training. Instead of training from scratch, which takes more time, more GPU resources, and a large dataset of images, a pre-trained model (e.g., ResNet50, AlexNet, or GoogLeNet) is used to transfer its knowledge to the new task.
3.6 Deep Learning Pre-trained Model
In this study, thirteen pre-trained models are used: ResNet-50, DenseNet121, DenseNet201, GoogLeNet, InceptionV4, AlexNet, SqueezeNet1.0, SqueezeNet1.1, VGG11, VGG13, VGG16, VGG19, and Xception. These models are fine-tuned by replacing the last fully connected layer of each pre-trained model with a layer whose output size matches the number of classes. The steps of the training phase are summarized in Fig. 3.
Fig. 3. Training and testing steps in a deep learning approach.
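Replacing the last layer, as described in Sect. 3.6, can be written compactly. The formulation below is a sketch under stated assumptions: a backbone $f_\theta$ maps an input image $\mathbf{x}$ to a $d$-dimensional feature vector $\mathbf{h}$, a new fully connected layer produces scores for the $C = 2$ classes (normal, abnormal), and the output is a softmax trained with cross-entropy against a one-hot label $\mathbf{y}$. The paper does not state the loss function or whether the backbone weights $\theta$ are frozen or updated.

```latex
\begin{aligned}
\mathbf{h} &= f_{\theta}(\mathbf{x}) \in \mathbb{R}^{d}, \\
\mathbf{z} &= W\mathbf{h} + \mathbf{b}, \qquad W \in \mathbb{R}^{C \times d},\; \mathbf{b} \in \mathbb{R}^{C},\; C = 2, \\
p_c &= \frac{\exp(z_c)}{\sum_{k=1}^{C} \exp(z_k)}, \qquad
\mathcal{L} = -\sum_{c=1}^{C} y_c \log p_c .
\end{aligned}
```

Only $W$ and $\mathbf{b}$ are newly initialized. For example, ResNet-50 produces $d = 2048$ features, so its replacement layer has $2 \times 2048 + 2 = 4098$ trainable parameters.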
To evaluate the performance of the deep learning pre-trained models, the test images are preprocessed and passed to the obtained trained model, which recognizes whether or not each liver CT image contains a tumor.
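The evaluation metrics used throughout Sect. 4 can be computed from the test predictions as sketched below. The function names are hypothetical and the labels (1 = tumor, 0 = normal) are an assumed encoding; the paper does not say whether its precision and recall are computed for the tumor class only or averaged over both classes, so both variants are shown.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                  labels: Sequence[int] = (0, 1)) -> Dict[str, float]:
    """Unweighted mean of the per-class precision, recall and F1."""
    per_class = [binary_metrics(y_true, y_pred, positive=c) for c in labels]
    result = {"accuracy": per_class[0]["accuracy"]}
    for key in ("precision", "recall", "f1"):
        result[key] = sum(m[key] for m in per_class) / len(per_class)
    return result


if __name__ == "__main__":
    # Hypothetical labels for ten test images: 1 = tumor, 0 = normal.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
    print(macro_metrics(y_true, y_pred))
```

For these toy labels, the accuracy is 0.7 and the tumor-class precision and recall are 0.6 and 0.75. Note that a macro-averaged F1 is generally not equal to the harmonic mean of the macro-averaged precision and recall.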
4 Results and Discussion
The experiment was carried out on an HP laptop with 4 GB of RAM and a Core i5 processor. During training on the collected dataset, the loss of each phase is used as a performance metric, besides accuracy, precision, and recall. A batch size of 32 and 25 epochs are used to train the deep learning pre-trained models. The training loss of these models is depicted in Fig. 4.
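The training schedule above (batch size 32, 25 epochs) determines how many weight updates each model receives. The sketch below assumes the 515 training images of Sect. 3.1 (before any augmentation) and keeps the last, smaller batch, which is also the default behaviour of PyTorch's `DataLoader` (`drop_last=False`); the helper names are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence


def minibatches(indices: Sequence[int], batch_size: int,
                rng: random.Random) -> Iterator[List[int]]:
    """Yield one epoch of shuffled mini-batches; the last one may be smaller."""
    order = list(indices)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of mini-batches (weight updates) per epoch."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    n_train, batch_size, epochs = 515, 32, 25
    total_steps = 0
    for epoch in range(epochs):
        for batch in minibatches(range(n_train), batch_size, rng):
            total_steps += 1  # a real loop would run forward, backward and update here
    print(steps_per_epoch(n_train, batch_size), total_steps)  # prints: 17 425
```

With 515 images, each epoch has 16 full batches of 32 and one batch of 3, so 25 epochs amount to 425 updates.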
Fig. 4. Training loss of deep learning models.
The training losses of Inception-V4 and Xception were the best among the pre-trained models, at 0.04 and 0.035, respectively. The AlexNet pre-trained model had a training loss of 0.5, the worst of all pre-trained models during the training phase. The validation loss of the pre-trained models is shown in Fig. 5; in terms of validation loss, the DenseNet-201 and SqueezeNet-1.0 pre-trained models achieved the best performance. Table 1 summarizes the performance of all deep learning pre-trained models in detecting liver tumors during the testing phase. Inception-V4 performs best, reaching an accuracy of 93.15%. This suggests that pre-trained models with more layers tend to achieve better results than the others. However, some deeper models do not perform as well; for example, GoogLeNet reaches a low training loss but only 86.30% testing accuracy. For the machine learning approach, Table 2 shows the performance of the machine learning classifiers with cross-validation (k = 7). The SVM with a Radial Basis Function (RBF) kernel achieved the best performance, with an accuracy of 90.46%. In terms of accuracy and recall, the decision tree, random forest, and Naïve Bayes classifiers performed poorly. The experimental findings show the feasibility of using ML and DL techniques to diagnose liver tumors.
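Table 2 reports each classifier's mean and standard deviation over cross-validation (k = 7). A minimal sketch of that procedure follows. `train_and_score` is a hypothetical callback that fits a classifier on the training indices and returns its accuracy on the held-out fold; the indices are assumed to be shuffled beforehand, and the sample standard deviation is an assumption, because the paper does not say which variant it reports.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def cross_validate(n_samples: int, k: int,
                   train_and_score: Callable[[List[int], List[int]], float]
                   ) -> Tuple[float, float]:
    """Train on k-1 folds, score on the remaining fold, repeat for every fold."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return summarize(scores)


if __name__ == "__main__":
    # Hypothetical scorer; a real one would fit, e.g., an RBF-kernel SVM on train_idx.
    def dummy_scorer(train_idx: List[int], test_idx: List[int]) -> float:
        return len(train_idx) / (len(train_idx) + len(test_idx))

    mean, sd = cross_validate(735, 7, dummy_scorer)
    print(f"{mean:.2%} ± {sd:.2%}")
```

A fold-to-fold standard deviation such as ±2.83% (SVM) versus ±23.72% (decision tree recall) indicates how stable each classifier is across different subsets of the data.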
Fig. 5. The obtained validation loss for the pre-trained models.
Table 1. Performance metrics of the pre-trained models during the testing phase.
| Pre-trained model | Testing accuracy | Recall | Precision | F1 score |
|---|---|---|---|---|
| AlexNet | 90.41% | 90.86% | 90.49% | 90.41% |
| DenseNet-121 | 86.30% | 86.57% | 85.83% | 84.73% |
| DenseNet-201 | 87.67% | 88.87% | 88.57% | 87.00% |
| GoogLeNet | 86.30% | 87.17% | 88.11% | 87.21% |
| Inception-V4 | 93.15% | 91.84% | 92.26% | 91.70% |
| VGG-11 | 83.56% | 84.44% | 84.50% | 83.34% |
| VGG-13 | 89.04% | 88.67% | 89.91% | 88.20% |
| VGG-16 | 90.41% | 91.43% | 89.04% | 89.85% |
| VGG-19 | 90.41% | 89.73% | 91.73% | 89.48% |
| ResNet-50 | 84.93% | 84.94% | 84.85% | 84.84% |
| SqueezeNet-V1.0 | 87.67% | 88.52% | 89.93% | 87.14% |
| SqueezeNet-V1.1 | 89.04% | 89.75% | 89.99% | 89.76% |
| Xception | 89.25% | 89.14% | 88.86% | 89.04% |
Table 2. The performance of machine learning classifiers.
| ML classifier | Accuracy | Std. deviation | Precision | Std. deviation | Recall | Std. deviation |
|---|---|---|---|---|---|---|
| SVM (kernel RBF) | 90.46% | ± 2.83% | 91.63% | ± 6.95% | 90.41% | ± 5.77% |
| Linear Regression | 78.88% | ± 4.99% | 77.36% | ± 5.96% | 84.66% | ± 7.65% |
| KNN, k = 7 | 89.51% | ± 1.98% | 90.90% | ± 5.23% | 89.32% | ± 4.52% |
| KNN, k = 5 | 90.19% | ± 2.21% | 92.51% | ± 5.39% | 88.56% | ± 6.53% |
| Naïve Bayes | 76.29% | ± 4.94% | 81.32% | ± 9.39% | 72.16% | ± 7.17% |
| Random Forest | 79.84% | ± 3.84% | 93.55% | ± 4.06% | 66.33% | ± 9.26% |
| Decision Tree | 73.30% | ± 6.67% | 85.17% | ± 11.20% | 62.91% | ± 23.72% |
5 Conclusion and Future Work
The results obtained on CT images with the studied models are promising, with accuracies of around 90%, and are expected to have a considerable impact on the diagnostic process. Combining these techniques effectively, each targeting different properties of malignant liver tissue, could support the diagnosis of liver cancer. As discussed, this comparative study analyzes thirteen pre-trained models and six traditional machine learning classifiers. The best performance was achieved by the Inception V4 pre-trained model, with accuracy, precision, recall, and F1 measure of 93.15%, 91.84%, 92.26%, and 91.70%, respectively. Among the traditional ML classifiers, SVM (RBF) achieved the best accuracy, 90.46%. Applying ensemble learning and adding a segmentation step to the studied models are left for future work.
References
1. Bartolozzi, C., Ciatti, S., Lucarelli, E., Villari, N., de Dominicis, R.: Ultrasound and computer tomography in the evaluation of focal liver disease. Acta Radiologica. Diagnosis 22(5), 545–548 (1981)
2. Kononenko, I.: Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23(1), 89–109 (2001)
3. Chen, E.-L., Chung, P.-C., Chen, C.-L., Tsai, H.-M., Chang, C.-I.: An automatic diagnostic system for CT liver image classification. IEEE Trans. Biomed. Eng. 45(6), 783–794 (1998)
4. Huang, Y.-L., Chen, J.-H., Shen, W.-C.: Diagnosis of hepatic tumors with texture analysis in nonenhanced computed tomography images. Academic Radiol. 13(6), 713–720 (2006)
5. Huang, Y.-L., Chen, J.-H., Shen, W.-C.: Computer-aided diagnosis of liver tumors in nonenhanced CT images. Comput. Biol. Med. 9, 141–150 (2004)
6. Ji, Z., Wang, B.: Identifying potential clinical syndromes of hepatocellular carcinoma using PSO-based hierarchical feature selection algorithm. BioMed Res. Int. 2014, 1–12 (2014)
7. Jiang, H., Zheng, R., Yi, D., Zhao, D.J.: A novel multiinstance learning approach for liver cancer recognition on abdominal CT images based on CPSO-SVM and IO. Comput. Math. Methods Med. 2013, 1–10 (2013)
8. Li, C., Xu, C., Gui, C., Fox, M.D.: Distance regularized level set evolution and its application to image segmentation. IEEE Trans. Image Process. 19(12), 3243–3254 (2010)
9. Sun, C., Guo, S., Zhang, H., Li, J., Chen, M., Ma, S., Jin, L., Liu, X., Li, X., Qian, X.J.: Automatic segmentation of liver tumors from multiphase contrast-enhanced CT images based on FCNs. Artif. Intell. Med. 83, 58–66 (2017)
10. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3(6), 610–621 (1973)
11. Newell, D., Nie, K., Chen, J.-H., Hsu, C.-C., Hon, J.Y., Nalcioglu, O., Su, M.-Y.: Selection of diagnostic features on breast MRI to differentiate between malignant and benign lesions using computer-aided diagnosis: differences in lesions presenting as mass and non-mass-like enhancement. Eur. Radiol. 20(4), 771–781 (2010)
12. Nie, K., Chen, J.-H., Hon, J.Y., Chu, Y., Nalcioglu, O., Su, M.-Y.: Quantitative analysis of lesion morphology and texture features for diagnostic prediction in breast MRI. Acad. Radiol. 15(12), 1513–1525 (2008)
13. Moon, W.K., Shen, Y.-W., Huang, C.-S., Chiang, L.-R., Chang, R.-F.: Biology: Computer-aided diagnosis for the classification of breast masses in automated whole breast ultrasound images. 37(4), 539–548 (2011)
14. Hu, Z., Tang, J., Wang, Z., Zhang, K., Zhang, L., Sun, Q.J.P.R.: Deep learning for image-based cancer detection and diagnosis - a survey. 83, 134–149 (2018)
15. Devi, P., Dabas, P.: Liver tumor detection using artificial neural networks for medical images. IJIRST 2(3), 34–38 (2015)
16. Sakr, A.A., Fares, M.E., Ramadan, M.: Automated focal liver lesion staging classification based on Haralick texture features and multi-SVM. Int. J. Comput. Appl. 91(8), 0975–8887 (2014)
17. Ben-Cohen, A., Klang, E., Kerpel, A., Konen, E., Amitai, M.M., Greenspan, H.J.N.: Fully convolutional network and sparsity-based dictionary learning for liver lesion detection in CT examinations. 275, 1585–1594 (2018)
18. Li, C., Wang, X., Eberl, S., Fulham, M., Yin, Y., Chen, J., Feng, D.: A likelihood and local constraint level set model for liver tumor segmentation from CT volumes. 60(10), 2967–2977 (2013)
19. Lu, F., Wu, F., Hu, P., Peng, Z., Kong, D.: Surgery: Automatic 3D liver location and segmentation via convolutional neural network and graph cut. 12(2), 171–182 (2017)
20. Wu, K., Chen, X., Ding, M.J.O.: Deep learning based classification of focal liver lesions with contrast-enhanced ultrasound. 125(15), 4057–4063 (2014)
21. Das, A., Acharya, U.R., Panda, S.S., Sabut, S.: Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. 54, 165–175 (2019)
22. Yao, Z., Li, J., Guan, Z., Ye, Y., Chen, Y.J.N.N.: Liver disease screening based on densely connected deep neural networks. 123, 299–304 (2020)
23. NBIA Dataset. https://nbia.cancerimagingarchive.net/. Accessed 25 Jan 2020
24. Erickson, B., Kirk, S., Lee, Y., Bathe, O., Kearns, M., Gerdes, C., Rieger-Christ, K., Lemmerman, J.: Radiology Data from The Cancer Genome Atlas Liver Hepatocellular Carcinoma [TCGA-LIHC] collection (2016)
25. IRCAD France Dataset. https://www.ircad.fr/research/3d-ircadb-01/ (2020). Accessed 25 Jan 2020
26. Kavur, A.E., Gezer, N.S., Barış, M., Conze, P.-H., Groza, V., Pham, D.D., Chatterjee, S., Ernst, P., Özkan, S., Baydar, B.: CHAOS Challenge - Combined (CT-MR) Healthy Abdominal Organ Segmentation (2020)
Brain Tumor Diagnosis System Based on RM Images: A Comparative Study
Ahmed Y. A. Saeed (corresponding author), Abdulfattah E. Ba Alawi, and Borhan M. N. Radman
Software Engineering Department, Taiz University, Taiz, Yemen
Abstract. Cancers and tumors have serious effects on humans, especially when the cancer is located in an important organ such as the brain. Detecting cancer early can save many lives. Because cancer diagnosis is highly time-consuming and needs expensive tools, there is an immediate need for non-invasive, cost-effective, and efficient tools for brain cancer staging and detection. The brain scans commonly used are magnetic resonance imaging (MRI) and computed tomography (CT). In this paper, we study the common algorithms used for brain tumor detection with brain imaging modalities and automatic computer-assisted methods. The main objective of this paper is a comparative analysis of several methods for detecting tumors in the Central Nervous System (CNS). The results of the applied classifiers are compared and analyzed using different metrics, including accuracy, precision, and recall. The best accuracy among the machine learning algorithms, 85.65%, is obtained with Random Forest, while the best classifier among the applied deep learning algorithms is Inception V4, with 97.36%.
Keywords: Brain cancer · Central nervous system tumor · Pathophysiology · Deep learning
1 Introduction
The brain is the control hub of the central nervous system and enables the entire human body to carry out its operations. Brain tumors directly threaten people's lives, and patients are more likely to survive if the tumor is identified at an early stage. Doctors commonly use MR imaging to determine whether a tumor is present [1]. MR imaging has become an active field of research. Many researchers have sought to develop intelligent systems that classify brain tissue into various groups, such as normal and pathological, benign and malignant, or low-grade and high-grade. Primary brain tumors are considered among the worst cancers, not only because of their poor prognosis but also because of their strong effects on the loss of executive ability and on reduced life expectancy. Lymphomas and gliomas, which are responsible for nearly 80% of malignant tumors of the central nervous system [2], are the most prominent primary brain tumors in adults.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 138–147, 2021. https://doi.org/10.1007/978-3-030-70713-2_15
Brain Tumor Diagnosis System Based on RM Images
Brain cancer has various grades. Low-grade gliomas (LGGs) have heterogeneous prognoses, with an average 10-year survival rate of approximately 57% [3, 4]. Previous results indicate that the MRI characteristics of newly diagnosed brain tumors can be used to predict diagnosis and guide treatment strategies [5–7]. Feature selection is one of the most critical issues in brain tumor diagnosis and segmentation. Because of the importance of detecting brain tumors at an early stage and the many methods that researchers have used in this area, we performed a comparative study of the common methods to evaluate the performance of twenty-two classifiers. This paper is divided into five sections. Section 2 presents related works on brain tumor detection. Section 3 describes the methods of brain tumor detection and the techniques used, such as CNN pre-trained models. Section 4 presents a full description of the obtained results. Section 5 concludes the paper with a summary and recommendations.
2 Related Works
Various processing methods have been shown to improve both the diagnosis and the segmentation of brain tumors. For tumor segmentation, Huang et al. [8] obtained an average accuracy of 74.75% using a random feature subspace approach. Previous studies commonly adopted models such as the Support Vector Machine (SVM) and the Neural Network (NN), which demonstrated strong tumor classification results. However, these models usually require features to be extracted manually before classification. In brain tumor classification, Soltaninejad et al. [9] used 38 first-order and second-order statistical features with an SVM to grade tumors, obtaining accuracies above 80% on 21 patients with various grade combinations. Their approach uses segmented tumor slices as samples, with features carefully selected before training. Recent advances in deep learning, such as Convolutional Neural Networks (CNNs), have proven effective at classifying objects. By comparison, deep learning models learn the characteristics of objects automatically from the data. Le [10] reported a high-level feature detector, learned from large-scale data, that achieved 15.8% accuracy on ImageNet, a relative improvement of 70% over previous work. Previous studies on brain segmentation use either unsupervised [11–14] or supervised [15–19] learning strategies. The present study comparatively examines AI-based brain cancer diagnosis models built with deep learning and with machine learning approaches. Nine traditional machine learning classifiers are applied (e.g., Naïve Bayes, Logistic Regression, Decision Tree, Support Vector Machine, K-Nearest Neighbors, Random Forest), besides thirteen pre-trained models (ResNet18, ResNet50, ResNet101, ResNet152, ResNext50, ResNext101, SqueezeNet1_0, SqueezeNet1_1, AlexNet, DenseNet121, DenseNet201, GoogLeNet, Inception V4).
A. Y. A. Saeed et al.
3 Methods
This section discusses the basic methods for developing AI-based systems that recognize brain cancer from images. Figure 1 shows the basic methods for brain cancer classification.
Fig. 1. The basic methods for building brain tumor diagnosis models.
As shown in Fig. 1, machine learning and deep learning methods are commonly used to diagnose brain tumors. After image acquisition and preprocessing, the images are forwarded to deep neural networks in the deep learning approach, whereas in the traditional machine learning approach, image feature extraction and segmentation are applied.
3.1 Dataset Collection Phase
The dataset was collected from two public sources: around 3000 images from [20] and about 698 images from [21]. All collected images are MRI scans, and they were selected carefully. From all the collected images, only 160 tumor images and 216 non-tumor images were used. These images are divided into three partitions: 264 images for training, 75 images for validation, and 38 images for testing.
3.2 Data Selection
The downloaded images were selected carefully after discarding images with poor resolution.
3.3 Convert 3D MRI to JPEG Images
The dataset needs to be processed by a 2D CNN model. Therefore, a script was used to convert the 3D MRI data into 2D JPEG images; the output of this phase is a set of 2D images.
3.4 Deep Learning Method
In this approach, thirteen pre-trained models have been implemented using the following steps:
Preparation of the Dataset. The training images are placed in a specific folder to start the training process.
Image Pre-processing. During this phase, each image is resized to 224 × 224 to fit the input layer of the pre-trained models (e.g., ResNet-50). Only Inception-V4 requires images of 299 × 299, because the first layer of the Inception pre-trained model has this size.
Retraining the Pre-trained Model. Transfer learning is applied to transfer the knowledge of the pre-trained model (e.g., ResNet) to the new task of classifying the brain tumor dataset. This task requires retraining a model that was trained on a large dataset so that it classifies brain tumors.
Obtaining the Expertise Model. The output of the previous steps is a trained model that can recognize brain tumors. These steps are performed in Python using the PyCharm environment. The trained model is saved to a file with the .pt extension.
The steps of the training stage are summarized in Fig. 2.
Fig. 2. Training the pre-trained models (Input BT dataset → Preprocessing → Retrain the pre-trained model → Save trained model weights).
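Sect. 3.3 only states that a script converted the 3D MRI data into 2D images. One plausible version, sketched here under stated assumptions, slices the volume along its first axis and min-max scales each slice to the 0 to 255 range of an 8-bit JPEG. The function names, the nested-list volume layout, and per-slice scaling are assumptions; in practice a library such as nibabel (NIfTI) or pydicom (DICOM) would read the volume and Pillow would write the JPEG files, steps omitted here to keep the sketch self-contained.

```python
from typing import List

Volume = List[List[List[float]]]  # [slice][row][column] intensities (assumed layout)
Slice8 = List[List[int]]          # 8-bit slice ready to be saved as JPEG


def to_uint8(slice_2d: List[List[float]]) -> Slice8:
    """Min-max scale one 2D slice to integers in [0, 255]."""
    lo = min(min(row) for row in slice_2d)
    hi = max(max(row) for row in slice_2d)
    if hi == lo:  # constant slice: avoid division by zero
        return [[0 for _ in row] for row in slice_2d]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in slice_2d]


def volume_to_slices(volume: Volume) -> List[Slice8]:
    """Split a 3D MRI volume into 8-bit 2D slices along its first axis."""
    return [to_uint8(s) for s in volume]


if __name__ == "__main__":
    toy_volume = [[[10, 20], [30, 40]], [[5, 5], [5, 5]]]
    for i, s in enumerate(volume_to_slices(toy_volume)):
        print("slice", i, s)
```

Per-slice scaling maximizes contrast in each image but makes intensities non-comparable across slices; scaling with the volume-wide minimum and maximum is the alternative when that matters.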
3.5 Machine Learning Approach
In this approach, features are extracted with the Histogram of Oriented Gradients (HOG), and Principal Component Analysis (PCA) is applied to reduce the dimensionality of the feature vector. Nine traditional machine learning classifiers are then applied (e.g., Naïve Bayes, Logistic Regression, Decision Tree, Support Vector Machine, K-Nearest Neighbors, Random Forest).
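The HOG descriptor builds, for each small cell of the image, a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes one such cell histogram and the length of the full descriptor. The configuration (9 unsigned orientation bins, 8 × 8-pixel cells, 2 × 2-cell blocks moved one cell at a time) follows the classic Dalal and Triggs setup and is an assumption, since the paper does not give its HOG parameters. It is simplified: gradients are taken on interior pixels only, each pixel votes into a single bin, and block normalization and the PCA step are omitted. Full implementations exist as `skimage.feature.hog` and `sklearn.decomposition.PCA`.

```python
import math
from typing import List

Matrix = List[List[float]]


def hog_cell_histogram(cell: Matrix, n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    Gradients use the centred [-1, 0, 1] filter on interior pixels only, and each
    pixel votes into a single bin (no interpolation between neighbouring bins).
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), n_bins - 1)  # guard the 180-degree edge
            hist[index] += magnitude
    return hist


def hog_feature_length(height: int, width: int, cell: int = 8,
                       block: int = 2, n_bins: int = 9) -> int:
    """Length of a HOG vector whose overlapping blocks move one cell at a time."""
    cells_y, cells_x = height // cell, width // cell
    blocks_y, blocks_x = cells_y - block + 1, cells_x - block + 1
    return blocks_y * blocks_x * block * block * n_bins


if __name__ == "__main__":
    vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
    print(hog_cell_histogram(vertical_edge))  # all weight falls in the 0-degree bin
    print(hog_feature_length(128, 128))       # 8100 for the assumed configuration
```

For a 128 × 128 image (the size used in the liver study) this configuration yields 15 × 15 blocks of 36 values, that is, 8100 features, which is why a dimensionality-reduction step such as PCA is attractive before classification.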
3.6 Deep Learning Approach
The CNN is a popular supervised learning technique that evolved greatly at the end of the 20th century. It is loosely inspired by the functioning of the human brain. CNN-based algorithms (e.g., LeNet-5, ResNet, and DenseNet) have shown strong success in 2D data classification; LeNet-5, for instance, achieved a test error rate below 1% on handwritten digit recognition. CNNs are now commonly used in image processing.
Fig. 3. Convolutional Neural Networks.
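The core operations of the convolutional layers shown in Fig. 3 can be sketched in a few lines. This toy version is an illustration, not part of the paper: it uses a hand-picked 2 × 2 kernel where a CNN would learn its kernels during training, and chains a stride-1 "valid" convolution (computed, as in common deep learning libraries, as a cross-correlation without flipping the kernel), a ReLU activation, and 2 × 2 max pooling.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 'valid' 2D cross-correlation, as computed by a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise max(0, x) activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map: Matrix) -> Matrix:
    """2 x 2 max pooling with stride 2 (an odd trailing row or column is dropped)."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[0, 1], [1, 0]]
    print(relu(conv2d_valid(image, kernel)))  # [[6, 8], [12, 14]]
```

Stacking many such convolution, activation, and pooling stages, followed by fully connected layers, gives the hidden layers that the pre-trained models in this study reuse.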
3.7 Transfer Learning
In deep learning, it is common to start training a new model from an advanced pre-trained model rather than initializing its parameters randomly. The pre-trained model has a similar or identical architecture to the new model but was trained on a different dataset for a separate task. Transfer learning reuses the visual representations that the pre-trained model learned from a massive dataset of millions of samples, which is beneficial in several ways: shorter training time for the new model, further improvement of the new model, and less training data required from the new task domain. Transfer learning is routinely used across different tasks and datasets, for example, scene recognition [22], object recognition in natural images, and the classification of interstitial lung diseases in CT images [23, 24].
3.8 Experimental Setup
The experiment is performed on an HP laptop with 8 GB of RAM. The Colab environment and RapidMiner tools are used, with a batch size of 32 and 25 epochs for the deep learning models. For the deep learning classifiers, the dataset was divided into training, validation, and testing sets of 70%, 20%, and 10%, respectively. For the machine learning classifiers, 10-fold cross-validation was applied instead.
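The 70/20/10 partition described above can be reproduced with a seeded shuffle. In this sketch the function name and seed are hypothetical, and each partition size is floored with the remainder going to the test set; with the 376 images of Sect. 3.1 this gives 263/75/38 images, and other rounding rules shift an image between partitions. Stratifying by class, which the paper does not mention, would additionally keep the tumor/non-tumor ratio equal across partitions.

```python
import random
from typing import List, Sequence, Tuple


def train_val_test_split(items: Sequence, fractions: Tuple[float, float, float] = (0.7, 0.2, 0.1),
                         seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle once with a fixed seed, then cut into train/validation/test partitions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file identifiers standing in for the 160 tumor and 216 non-tumor images.
    images = [("tumor", i) for i in range(160)] + [("normal", i) for i in range(216)]
    train, val, test = train_val_test_split(images)
    print(len(train), len(val), len(test))
```

Fixing the seed makes the partition reproducible, so every pre-trained model is trained and tested on exactly the same images.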
4 Results and Discussion
In this study, 22 different classifiers are used for cancer detection: nine traditional machine learning classifiers and thirteen pre-trained models. For the machine learning approach, Naïve Bayes, Logistic Regression, Decision Tree, Random Forest, Gradient Boosted Trees, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and Radial Basis Function (RBF) classifiers are used. Among the traditional ML classifiers, Random Forest achieved the best accuracy, 85.65%, whereas Logistic Regression (81.36%) and KNN (81.37%) showed the lowest accuracies. The accuracy, precision, and recall of the machine learning classifiers are shown in Table 1.
Table 1. The performance of machine learning models.
| Classifier | Accuracy | Std. deviation | Precision | Std. deviation | Recall | Std. deviation |
|---|---|---|---|---|---|---|
| Naïve Bayes | 83.50% | 0.0563 | 83.44% | 0.0584 | 83.55% | 0.0601 |
| Logistic Regression | 81.36% | 0.0348 | 81.42% | 0.0396 | 81.61% | 0.0437 |
| Decision Tree | 83.53% | 0.0473 | 83.62% | 0.0479 | 84.10% | 0.0491 |
| Random Forest | 85.65% | 0.073 | 85.70% | 0.0479 | 86.15% | 0.0364 |
| Gradient Boosted Trees | 82.39% | 0.0614 | 82.44% | 0.0592 | 81.95% | 0.0609 |
| Support Vector Machine (SVM) | 82.43% | 0.0294 | 82.84% | 0.0315 | 82.43% | 0.0345 |
| K-Nearest Neighbors (KNN), k = 5 | 81.37% | 0.028 | 81.50% | 0.0479 | 81.43% | 0.0467 |
| Multilayer Perceptron (MLP) | 82.45% | 0.0489 | 82.25% | 0.0485 | 82.45% | 0.0486 |
| Radial Basis Function (RBF) | 82.72% | 0.055 | 90.62% | 0.0508 | 66.24% | 0.1312 |
For deep learning, thirteen pre-trained models are used: AlexNet, DenseNet-121, DenseNet-201, GoogLeNet, Inception-V4, ResNet-18, ResNet-50, ResNet-101, ResNet-152, ResNeXt-50, ResNeXt-101, SqueezeNet-1_0, and SqueezeNet-1_1. Figure 4 shows the training loss of all pre-trained models over 25 epochs; the best training loss is obtained by Inception V4. Figure 5 shows the validation loss of the CNN-based models.
Fig. 4. The training loss of the pre-trained models.
Fig. 5. The validation loss of the pre-trained models.
As Fig. 5 shows, AlexNet reached the best validation loss, around 0.2 at epochs 11 and 15, but its performance was inconsistent. Table 2 compares the pre-trained models in terms of test accuracy, precision, and recall.
Table 2. The performance of deep learning models.
| Classifier | Test accuracy | Precision | Recall |
|---|---|---|---|
| DenseNet-121 | 89.47% | 78.47% | 78.12% |
| DenseNet-201 | 86.84% | 92.26% | 92.05% |
| ResNet-50 | 86.84% | 76.64% | 76.56% |
| ResNet-101 | 86.84% | 81.22% | 88.72% |
| ResNet-18 | 94.73% | 97.49% | 96.42% |
| ResNet-152 | 86.84% | 88.65% | 81.15% |
| ResNeXt-50 | 84.21% | 83.92% | 85.93% |
| ResNeXt-101 | 78.94% | 80.63% | 82.44% |
| AlexNet | 84.21% | 89.04% | 79.68% |
| GoogLeNet | 89.47% | 86.86% | 89.18% |
| SqueezeNet-v1.0 | 81.57% | 83.33% | 84.37% |
| SqueezeNet-v1.1 | 92.10% | 95.03% | 95.44% |
| Inception-V4 | 97.36% | 98.61% | 98.33% |
As Table 2 shows, Inception-V4 achieved the best test accuracy (97.36%), whereas ResNeXt-101 was the worst model, with an accuracy of 78.94%. Overall, the experimental findings show that Random Forest outperformed the other traditional machine learning classifiers, while Inception-V4 achieved the best performance among the deep learning models.
5 Conclusion and Recommendations
This study focused on identifying common deep learning and machine learning algorithms for diagnosing brain tumors. The findings indicate that the best machine learning classifier was Random Forest, with 85.6% accuracy. Among the pre-trained deep learning models, Inception-V4 performed the task best, achieving 97.36% test accuracy. Pre-trained models with deeper architectures achieved better results than the others. This study recommends applying these classifiers to different brain tumor datasets. Future work should use a larger dataset to make the results more generalizable.
References
1. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
2. Schwartzbaum, J.A., Fisher, J.L., Aldape, K.D., Wrensch, M.: Epidemiology and molecular pathology of glioma. Nature Clin. Practice Neurol. 2(9), 494–503 (2006)
3. Ramakrishna, R., Hebb, A., Barber, J., Rostomily, R., Silbergeld, D.J.N.: Outcomes in reoperated low-grade gliomas. 77(2), 175–184 (2015)
4. Jacob, C.P.: Post prandial hypertriglyceridemia in patients with CAD and without CAD: A comparative study. Sree Mookambika Institute of Medical Sciences, Kulasekharam (2018)
5. Mazzara, G.P., Velthuizen, R.P., Pearlman, J.L., Greenberg, H.M., Wagner, H.: Brain tumor target volume determination for radiation treatment planning through automated MRI segmentation. Int. J. Radiat. Oncol. Biol. Phys. 59(1), 300–312 (2004)
6. Yamahara, T., Numa, Y., Oishi, T., Kawaguchi, T., Seno, T., Asai, A., Kawamoto, K.: Morphological and flow cytometric analysis of cell infiltration in glioblastoma: a comparison of autopsy brain and neuroimaging. Brain Tumor Pathol. 27(2), 81–87 (2010)
7. Bauer, S., Wiest, R., Nolte, L.-P., Reyes, M.: Biology: a survey of MRI-based medical image analysis for brain tumor studies. Phys Med Biol. 58(13), R97 (2013)
8. Huang, W., Yang, Y., Lin, Z., Huang, G.-B., Zhou, J., Duan, Y., Xiong, W.: Random feature subspace ensemble based extreme learning machine for liver tumor detection and segmentation. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2014, pp. 4675–4678. IEEE (2014)
9. Soltaninejad, M., Ye, X., Yang, G., Allinson, N., Lambrou, T.: Brain tumour grading in different MRI protocols using SVM on statistical features (2014)
10. Le, Q.V.: Building high-level features using large scale unsupervised learning. In: 2013 IEEE international conference on acoustics, speech and signal processing 2013, pp. 8595–8598. IEEE (2013)
11. Szilagyi, L., Lefkovits, L., Benyo, B.: Automatic brain tumor segmentation in multispectral MRI volumes using a fuzzy c-means cascade algorithm. In: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) 2015, pp. 285–291. IEEE (2015)
12. Mei, P.A., de Carvalho Carneiro, C., Fraser, S.J., Min, L.L., Reis, F.: Analysis of neoplastic lesions in magnetic resonance imaging using self-organizing maps. J. Neurol. Sci. 359(1–2), 78–83 (2015)
13. Juan-Albarracin, J., Fuster-Garcia, E., Manjon, J.V., Robles, M., Aparici, F., Martí-Bonmatí, L., Garcia-Gomez, J.M.: Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification. Plos One 10(5) (2015)
14. Rajendran, A., Dhanasekaran, R.J.P.E.: Fuzzy clustering and deformable model for tumor segmentation on MRI brain image: a combined approach. 30, 327–333 (2012)
15. Wu, W., Chen, A.Y., Zhao, L., Corso, J.J.: Surgery: brain tumor detection and segmentation in a CRF (conditional random fields) framework with pixel-pairwise affinity and superpixel-level features. 9(2), 241–253 (2014)
16. Pinto, A., Pereira, S., Correia, H., Oliveira, J., Rasteiro, D.M., Silva, C.A.: Brain tumour segmentation based on extremely randomized forest with high-level features. In: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2015, pp. 3037–3040. IEEE (2015)
17. Soltaninejad, M., Yang, G., Lambrou, T., Allinson, N., Jones, T.L., Barrick, T.R., Howe, F.A., Ye, X.: Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI. Int. J. Comput. Assist. Radiol. Surg. 12(2), 183–203 (2017)
18. Jafari, M., Kasaei, S.J.: Automatic brain tissue detection in MRI images using seeded region growing segmentation and neural network classification. Australian J. Basic Appl. Sci. 5(8), 1066–1079 (2011)
19. Subbanna, N., Precup, D., Arbel, T.: Iterative multilevel MRF leveraging context and voxel information for brain tumour segmentation in MRI. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014, pp. 400–405 (2014)
20. Navoneel, C.: Brain MRI Images for Brain Tumor Detection (2020). https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection. Accessed 16 May 2020
21. NBIA Cancer Imaging Archive (2020). https://nbia.cancerimagingarchive.net/nbia-search/
22. Yu, W., Yang, K., Bai, Y., Xiao, T., Yao, H., Rui, Y.: Visualizing and comparing AlexNet and VGG using deconvolutional layers. In: Proceedings of the 33rd International Conference on Machine Learning (2016)
23. Agostinelli, F., Hoffman, M., Sadowski, P., Baldi, P.: Learning activation functions to improve deep neural networks (2014)
24. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Diagnosis of COVID-19 Disease Using Convolutional Neural Network Models Based Transfer Learning
Hicham Moujahid¹(B), Bouchaib Cherradi¹,², Mohammed Al-Sarem³, and Lhoussain Bahatti¹
¹ SSDIA Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, 28820 Mohammedia, Morocco
² STIE Team, CRMEF Casablanca-Settat, Provincial Section of El Jadida, 24000 El Jadida, Morocco
³ Information Systems Department, Taibah University, Al-Madinah Al-Monawarah, Kingdom of Saudi Arabia
Abstract. COVID-19 is similar to ordinary pneumonia caused by bacteria or other viruses. As a result, manually classifying lung diseases is very difficult, particularly distinguishing COVID-19 from non-COVID-19 disease. COVID-19 causes infections in one or both lungs, which appear as inflammation across lung cells. This can lead to dangerous complications that might cause death, particularly in patients who have or develop an immune disease. The problem with COVID-19 is that its symptoms resemble those of conventional respiratory diseases such as flu: chest pain while breathing or coughing, a cough that produces mucus, high fever, loss of appetite, abdominal pain, vomiting, and diarrhea. In most cases, a careful manual analysis of a chest X-ray or computed tomography (CT) image can lead to a reliable diagnosis of COVID-19. In the remaining cases, manual analysis is not sufficient to distinguish between pneumonia and COVID-19. Specialists then need additional, expensive tools to confirm their initial hypothesis or diagnosis, such as the real-time polymerase chain reaction (RT-PCR) test or MRI imaging. Moreover, traditional diagnosis of COVID-19 or other pneumonia takes a lot of specialists' time. Time is a critical factor during a pandemic, when large numbers of patients are overloading hospital services. In such a case, an automatic method for analyzing chest X-ray images is needed. This work therefore proposes a convolutional neural network (CNN) method for COVID-19 and pneumonia classification. X-ray imaging was chosen as the diagnostic modality because it is widely available in hospitals and is cheap compared to other imaging technologies. Three CNN models based on VGG-16, VGG-19, and MobileNet were trained using the zero-shot transfer learning technique. The best results were obtained with the VGG-19-based model: 96.97% accuracy, 100% precision, 100% F1-score, and 99% recall.
Keywords: Convolutional neural network · Transfer learning · COVID-19 · Pneumonia · X-ray images
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 148–159, 2021. https://doi.org/10.1007/978-3-030-70713-2_16
1 Introduction
The novel coronavirus SARS-CoV-2, which causes COVID-19, first appeared in Wuhan, China. The virus passed from animals to humans in December 2019. It spreads between people mainly through the respiratory system. Droplets from an infected person can contain the virus and reach others through coughing, sneezing, or even talking. Cases kept increasing until, on 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic threatening every human life. By August 9, 2020, the virus had reached 20 million cases worldwide, with 730,000 deaths¹. Pneumonia has almost the same symptoms as COVID-19, which makes the two diseases hard to diagnose and tell apart. Pneumonia is generally an inflammation of the small air sacs in the lungs. It can be caused by many germs, such as viruses, bacteria, and fungi. Therefore, specialists need additional tools such as blood tests and detailed X-ray imaging. Because the number of COVID-19 patients has increased remarkably, manual diagnosis is no longer sufficient, and an automatic, rapid means of diagnosis is needed. X-rays are electromagnetic radiation that can create images of the internal body and organs in different shades of gray. The gray level depends on how much radiation the tissue absorbs. For example, calcium in bones absorbs X-rays the most, so bones appear close to white rather than gray or black. Air absorbs very little, so the lungs appear close to black. Recent works show that many diseases can be diagnosed automatically. For example, brain tumor segmentation can be achieved by convolutional neural networks applied to 3D MRI images [1].
To predict type 2 diabetes mellitus, [2] applied and evaluated four machine learning algorithms: decision tree, K-nearest neighbors, artificial neural network, and deep neural network. Heart diseases such as atherosclerosis [3] can be diagnosed early with an artificial neural network (ANN) and K-nearest neighbors (KNN), preventing many health complications. CNN-based algorithms can help detect abnormal lungs and diagnose pneumonia [4] by processing and analyzing thoracic X-ray images. X-ray radiation can be used to diagnose chest diseases such as pneumonia and, more recently, COVID-19. It exposes patients only to a normal dose of radiation that does not put them at significant risk. On such images, most of a normal lung looks black because it is full of air. Infected parts of the lung appear closer to white because they lack air and contain other types of tissue and pus. CT images can also be exploited for COVID-19 diagnosis [5, 6], but CT scanners are less widely available in hospitals. In the proposed method, diagnosis is automatic, using the power of convolutional neural networks for image processing. Such automatic methods have shown good results compared with manual diagnosis and also provide a probability for each prediction [7]. The method deeply analyzes patients' chest X-ray images to classify diseases and to differentiate normal lungs from abnormal ones [8, 9]. For abnormal lungs, our proposed method can distinguish traditional pneumonia from COVID-19. The convolutional nature of CNNs makes this methodology efficient for image processing. For tasks involving huge amounts of data, a parallel architecture is recommended to reduce time and resource costs [10].
The rest of this paper is organized as follows. Section 2 reviews recent related works. Section 3 presents the sources used to build the dataset and describes the proposed method. Section 4 presents and discusses the results in terms of model performance. Section 5 concludes the paper and gives some future perspectives.

¹ https://www.worldometers.info/coronavirus.
2 Related Works
The current COVID-19 pandemic has led most researchers to focus their efforts on two goals: finding a medical treatment, and finding a rapid and appropriate way to diagnose the disease at an early stage. Many recent works have exploited machine learning algorithms to help diagnose coronavirus disease by analyzing clinical resources. This section presents publications related to our work that apply machine learning to the COVID-19 problem.
In [11], the authors proposed a convolutional neural network model that combines the Xception and ResNet50V2 networks. The model was trained on a dataset of 11302 X-ray images to classify images into three classes: pneumonia, normal, and COVID-19 cases. The dataset was unbalanced, with only 180 COVID-19 samples against 6054 pneumonia and 8851 normal cases. On the validation set, the average accuracy was 99.51% for detecting COVID-19 cases and 91.4% for the other classes. However, the model was not evaluated on a separate test set, so these figures may not reflect its true performance.
Another work [12] proposed a new automatic system for diagnosing COVID-19 from X-ray findings. The system employs a hybrid deep learning technique in which a long short-term memory (LSTM) network is concatenated with a CNN. The CNN part extracts features, whilst the LSTM is used in the detection phase. As in [11], the dataset used for training was very small: only 421 X-ray images, of which 141 were COVID-19 cases, 140 were normal cases, and 140 were pneumonia cases. The proposed method achieved 97% accuracy. However, a closer analysis of the results reveals two weaknesses. First, the data augmentation technique generated highly correlated data. Second, the authors reported the model's performance only on the validation set and omitted the test set.
A new CNN-based model for detecting COVID-19 in the human body was designed and described in [13]. The model was trained on a dataset of three classes (normal, pneumonia, and COVID-19 X-ray images) taken from the Kaggle dataset repository². The dataset contains 234 normal images, 390 pneumonia images, and only 94 coronavirus images. It is divided into a training set, a validation set, and a testing set. The best accuracy achieved by the model was 87.4%. However, a closer analysis of the results shows significant over-fitting, caused by weak image preprocessing.
In [14], the authors proposed a model combining three models: the autoregressive integrated moving average (ARIMA) model, the Prophet algorithm, and LSTM. The model used a dataset of only 128 X-ray images, including 28 healthy cases and 70 COVID-19 cases³. The authors used an augmentation technique to increase the dataset to 1000 X-ray images. The model was then trained and tested to predict which areas will be most infected over the next seven days, in terms of new cases, recovered cases, and reported deaths.
The study in [15] clearly shows the difference between using a pre-trained model and a modified CNN model. The authors used two different datasets, one of CT images and one of X-ray images. The pre-trained AlexNet model yielded better results than the modified CNN model, with 98% and 94.1% accuracy, respectively.
Table 1 summarizes the methods used in the aforementioned related works and the validation accuracies obtained by training the models on their respective datasets.

² https://github.com/ieee8023/covid-chestxray-dataset.

Table 1. Overview of the related works results.
Authors/Reference | Used method | Images type | COVID-19 images number | Accuracy
Rahimzadeh et al. [11] | Combined Xception & ResNet50V2 | X-ray | 180 | 99.51%
Islam et al. [12] | Concatenated LSTM & CNN | X-ray | 141 | 97%
Gonesh et al. [13] | CNN model | X-ray | 94 | 87.4%
Alazab et al. [14] | Combined LSTM & PA & ARIMA | X-ray | 70 | 99.94%
3 Materials and Methods
The next subsections introduce the artificial intelligence concepts involved, the convolutional neural network methodology used, and the process of collecting sufficient data from public resources to build a valid dataset.
3.1 Dataset Assembling
Few public datasets yet cover the new COVID-19 pandemic, which makes collecting sufficient data for our work difficult. In particular, the data collected and selected from public resources need additional filtering and preprocessing. The final dataset contains three classes of images: normal cases, pneumonia cases, and COVID-19 cases. Figure 1 shows a sample of each class.

³ https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images.
Fig. 1. Example of samples from the used classes in dataset: (a) Normal case, (b) Pneumonia case, (c) COVID-19 case.
Our work requires a dataset of chest X-ray findings. For normal and pneumonia images, we used the COVID-19 Radiography Database, a collection of chest X-ray images assembled by a team of researchers from several universities. The database contains 1341 normal X-ray images and 1345 pneumonia X-ray images. It also contains 219 COVID-19-positive X-ray images. These are combined with images from other sources to build the final dataset, as described in Table 2.

Table 2. Different sources of COVID-19 datasets.
Dataset | COVID-19 images | Valid images
Joseph Paul Cohen (ieee8023)ᵃ | 661 | 456
Covid-19 chest x-ray dataset initiativeᵇ | 56 | 35
ACTUALMED COVID-19 chest x-ray datasetᶜ | 239 | 58
COVID-19 Radiography Databaseᵈ | 224 | 224
Dataset-01 Chest x-raysᵉ | 76 | 68
ᵃ https://github.com/ieee8023/covid-chestxray-dataset.
ᵇ https://github.com/ieee8023/covid-chestxray-dataset.
ᶜ https://github.com/agchung/Actualmed-COVID-chestxray-dataset.
ᵈ https://kaggle.com/tawsifurrahman/covid19-radiography-database.
ᵉ https://github.com/zeeshannisar/COVID-19.
In total, 840 valid COVID-19 chest X-ray images were collected from these sources. The dataset was then divided into three subsets: 70% for training, 15% for validation, and 15% for testing. Table 3 shows how the samples are distributed.
Table 3. Description of dataset distribution.
Subset | Normal | Pneumonia | COVID-19
Training set | 939 | 941 | 588
Validation set | 201 | 202 | 126
Testing set | 201 | 202 | 126
Total | 1341 | 1345 | 840
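The 70/15/15 split behind Table 3 can be sketched as a stratified split, where each class is shuffled and cut separately so that every subset keeps the class proportions. This is a minimal sketch, not the authors' procedure. The function name, the fixed seed, and the use of round() are assumptions, and the paper does not state how fractional counts were rounded. The counts this produces may therefore differ by one image from Table 3.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Split (item, label) pairs into train/val/test, class by class.

    Hypothetical helper: each class is shuffled and cut separately, so each
    subset keeps roughly the same class proportions. Whatever remains after
    the training and validation cuts goes to the test subset.
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, items in sorted(by_label.items()):
        items = items[:]  # copy so the caller's list is not shuffled in place
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits
```

For example, with hypothetical file names and the class totals from Table 3: `data = [(f"{lbl}_{i}.png", lbl) for lbl, n in {"normal": 1341, "pneumonia": 1345, "covid19": 840}.items() for i in range(n)]`, followed by `stratified_split(data)`.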
3.2 Proposed CNN Based Methodology
A CNN is a deep learning model composed of many artificial neurons connected to each other mathematically by specific functions. CNNs are used intensively for extracting distinctive features from visual images. A CNN is based on the mathematical convolution operation, applied in at least one convolutional layer [16]. For complex-valued functions f and g defined on the set of integers ℤ, the discrete convolution of f and g is given by:

$$(f * g)[n] = \sum_{m=-\infty}^{+\infty} f[m]\, g[n - m] \qquad (1)$$
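For finite sequences treated as zero outside their index range, the infinite sum in Eq. (1) reduces to a finite one. The short sketch below, with a hypothetical function name, evaluates it directly. In the mathematical definition g is reversed (the index n − m). Deep learning libraries usually compute cross-correlation instead, which differs only by flipping the kernel, and the learned kernel absorbs that flip.

```python
def discrete_convolution(f, g):
    """Full discrete convolution of two finite sequences, following Eq. (1).

    f and g are taken to be zero outside indices 0..len-1, so the result
    has len(f) + len(g) - 1 entries.
    """
    out = [0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        total = 0
        for m in range(len(f)):
            k = n - m  # index into g, as in g[n - m]
            if 0 <= k < len(g):
                total += f[m] * g[k]
        out[n] = total
    return out


print(discrete_convolution([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4.0, 1.5]
```

Because convolution is commutative, swapping the two arguments gives the same result.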
A traditional CNN consists of an input layer, followed by multiple hidden layers and an output layer that acts as a classifier. The hidden layers are a combination of convolutional layers, pooling layers, and fully connected layers. In a CNN architecture, the input is a tensor whose shape depends on the dimensions of the input image. As the tensor passes through a convolutional layer, the image is abstracted into a feature map with a defined shape. When a convolutional layer is created, several hyper-parameters are set: the kernel width and height, the number of input and output channels, the depth of the convolution filter, and the activation function. The output of each layer becomes the input of the next, so the input and output shapes of consecutive layers are tightly linked. Figure 2 presents the general architecture of a convolutional neural network, with the input layer, two hidden layers, and the output layer.
Fig. 2. Convolutional neural network architecture.
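How the hyper-parameters above fix the output shape, and how a convolutional layer reuses one kernel at every position, can be illustrated with a single-channel sketch. Both function names are hypothetical. As in most deep learning libraries, the kernel is not flipped (cross-correlation). ReLU is used as the activation, which is an assumption. The shape formula is the usual one for a convolution with a given kernel size, stride, and zero padding.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv2d_single_channel(image, kernel, stride=1):
    """'Valid' 2D convolution of one channel followed by ReLU.

    The same kernel weights are applied at every position (weight sharing),
    and each output depends only on a local window of the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh, stride)
    out_w = conv_output_size(len(image[0]), kw, stride)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            out[i][j] = max(0.0, acc)  # ReLU activation
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(conv2d_single_channel(img, [[0, 1], [1, 0]]))
print(conv_output_size(224, kernel=3, stride=1, padding=1))  # 224
```

With a 3 × 3 kernel, stride 1, and padding 1, the spatial size is preserved. A stride of 2 roughly halves it.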
Convolutional Layers
Inside a convolutional layer, each neuron is connected only to a spatially local subset of the neurons in the previous layer. The weights of these connections are shared by all neurons of the same feature map [17, 18]. The main purpose of the convolutional layer is to detect local features at all positions of the input feature maps with learnable kernels (the connection weights between feature map i at layer n − 1 and feature map j at layer n).
Pooling Layers
The pooling operation slides a two-dimensional filter across the feature map produced by the previous convolutional layer and summarizes the features it covers. The main goal of a pooling layer is to reduce the feature size, preserving only the important information and discarding the rest. There are several types of pooling operations. Among them, max pooling, average pooling, and global pooling are the most commonly used in CNNs [17, 19]. Figure 3 presents a typical max-pooling operation. The output element y of a max-pooling layer is defined as follows:

$$y = \max_{(i,j) \in R} x_{ij} \qquad (2)$$

where x_ij represents an element covered by the filter region R.
Fig. 3. The max-pooling operation.
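Eq. (2) can be applied over a whole feature map with a short sketch. The 2 × 2 window with stride 2 is an assumed, common choice. The paper does not give the window size used in Fig. 3. The function name and the example feature map are illustrative.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling, Eq. (2): each output is the max over the window R it covers."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + u][j * stride + v]
                for u in range(size)
                for v in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fm = [[1, 3, 2, 1],
      [4, 6, 5, 0],
      [7, 2, 9, 8],
      [1, 0, 3, 4]]
print(max_pool2d(fm))  # [[6, 5], [7, 9]]
```

The 4 × 4 map shrinks to 2 × 2, keeping only the strongest response in each window. This is the size reduction the pooling paragraph describes.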
Fully Connected Layers (FC)
Fully connected layers (also called dense layers) work like convolutional layers, except that every unit is connected to every unit of the previous and next layers rather than to a local region. These layers are followed by an activation function, generally the rectified linear unit (ReLU) [18].
Transfer Learning (TL)
A trained convolutional neural network model can be transferred and exploited for another prediction task [20]. This process is known as transfer learning. There are many variants of transfer learning, depending on the task. The one used in this paper is zero-shot transfer learning, in which all layers except the original output layer are retrained. The COVID-19 and pneumonia classification task is completely different from the original task of the pre-trained CNN model. A large dataset and substantial computing power are therefore needed. New layers must also be added at the end of the network, ending with an appropriate classifier.
Every machine learning model must be tested and evaluated before it is used for a real-world task. This is done by calculating evaluation metrics, of which there are many. In the experimental part, three models are trained with the transfer learning approach, using the VGG-16, VGG-19, and MobileNetV2 architectures.
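The transfer-learning setup described above can be sketched in Keras. This is a minimal sketch under stated assumptions, not the authors' code. The 224 × 224 × 3 input size, the global-average-pooling layer, the 256-unit ReLU dense layer, the Adam learning rate, and the function name build_vgg19_classifier are all illustrative choices. The sketch loads the VGG-19 convolutional base without its original ImageNet classifier (include_top=False). It keeps the base layers trainable so they are retrained, and adds a new fully connected head that ends in a three-way softmax for normal, pneumonia, and COVID-19.

```python
import tensorflow as tf

NUM_CLASSES = 3  # normal, pneumonia, COVID-19 (the classes in Table 3)


def build_vgg19_classifier(weights="imagenet", input_shape=(224, 224, 3),
                           num_classes=NUM_CLASSES):
    """Pre-trained VGG-19 base plus a new classification head (sketch)."""
    # Convolutional base without the original 1000-class ImageNet classifier.
    base = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = True  # the pre-trained layers are retrained too

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)       # feature maps -> vector
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # fully connected layer
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Training would then call `model.fit(train_ds, validation_data=val_ds, ...)` on hypothetical `train_ds` and `val_ds` datasets. Swapping in `tf.keras.applications.VGG16` or `tf.keras.applications.MobileNetV2` gives the other two base architectures.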
4 Results and Discussion
A CNN model gives different results depending on the depth of the network, the type of layers, its hyper-parameters, and its architecture. In this paper, the VGG-16, VGG-19, and MobileNetV2 architectures are used as the base of the proposed models.
4.1 Training Results
As shown in Fig. 4, both the VGG-16 and VGG-19 models keep improving roughly linearly in validation accuracy and loss until the 10th epoch. After that, the training performance begins to stabilize and the improvement slows down. In contrast, the MobileNetV2-based model improves well until the 13th epoch and then more slowly. Its main problem is that its validation performance diverges from its training performance. To avoid over-fitting, a callback is used to choose the number of epochs that yields the maximum possible accuracy. The models are then trained on the dataset described in Table 3. The overall accuracy and loss results are presented in Fig. 4 and Fig. 5, which show how these metrics vary across epochs. The three models also converged at different rates. The VGG-19-based model converged fastest, needing only 18 epochs to reach its maximum accuracy. The VGG-16-based model needed 25 epochs and the MobileNetV2-based model needed 47.
4.2 Testing Results
The training step produces a model ready for testing. In this experiment, the three models were tested on a dataset completely independent of those used for training and validation. The models were then evaluated on the test results with the metrics shown in Table 4. VGG-19 and VGG-16 achieved the best accuracies, 96.97% and 96.22% respectively, against 95.84% for MobileNetV2. For the COVID-19 class, the VGG-19 and VGG-16 predictions were 100% correct.
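The paper does not specify its callback. A common choice is patience-based early stopping on the validation loss, which Keras provides as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`. The framework-free sketch below shows the assumed rule. Training stops once the validation loss has not improved for `patience` consecutive epochs, and the best epoch is the one whose weights would be kept. The loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-based, for patience-based stopping.

    An epoch counts as an improvement only if its validation loss is lower
    than the best so far by more than min_delta. If training never runs out
    of patience, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses, not taken from the paper.
losses = [0.9, 0.7, 0.5, 0.45, 0.46, 0.47, 0.44, 0.48, 0.49, 0.50, 0.51, 0.52]
print(early_stopping_epoch(losses, patience=3))  # (7, 10)
```

Here the loss last improves at epoch 7, so training stops three epochs later, at epoch 10, and the epoch-7 weights are the ones to keep.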
Fig. 4. Accuracy metric of trained models: (a) VGG-16, (b) VGG-19, and (c) MobileNetV2.
Fig. 5. Loss metric of trained models: (a) VGG-16, (b) VGG-19, and (c) MobileNetV2.
Table 4. Detailed metrics values for each trained model Metric Precision Recall F1-score
VGG-16 VGG-19 MobileNetV2 1.00 0.97 0.98
1.00
0.95
0.99
1.00
1.00
0.98
Accuracy 96.22%
96.97% 95.84%
Loss
17.40%
17.33%
14.66%
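Table 4 reports precision, recall, and F1-score without stating how they are computed. The sketch below derives per-class precision, recall, and F1, plus overall accuracy, from a confusion matrix. The function name and the confusion matrix in the example are hypothetical, not taken from the paper. The paper also does not state whether its figures are for one class (e.g., COVID-19) or averaged over classes. A macro average would simply be the mean of the per-class values.

```python
def classification_metrics(confusion, labels):
    """Per-class (precision, recall, F1) and overall accuracy.

    confusion[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    per_class = {}
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[labels[c]] = (precision, recall, f1)
    return per_class, correct / total


labels = ["normal", "pneumonia", "covid19"]
cm = [[8, 2, 0],   # true normal
      [1, 9, 0],   # true pneumonia
      [0, 1, 9]]   # true COVID-19
per_class, acc = classification_metrics(cm, labels)
print(per_class["covid19"])  # precision 1.0, recall 0.9, F1 about 0.947
```

In this example, no sample is wrongly predicted as COVID-19, so COVID-19 precision is 1.0. One COVID-19 case is missed, so recall is 0.9.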
4.3 Comparison with Related Works
Compared with the existing methods listed in the related works section, our proposed model achieved 100% precision and 100% F1-score. In terms of accuracy, our model is competitive, although the methods were evaluated on different datasets. Table 5 gives more details.

Table 5. Comparison of our results with other similar works.
Method | Recall | Precision | F1-score | Accuracy
Combined Xception & ResNet50V2 [11] | 80.53% | 35.27% | NA | 99.51%
Concatenated LSTM & CNN [12] | 100% | NA | 100% | 97%
CNN model [13] | NA | NA | NA | 87.4%
Combined LSTM & PA & ARIMA [14] | NA | NA | NA | 99.94%
This work: VGG-19 based TL model | 99% | 100% | 100% | 96.97%
5 Conclusion and Perspectives
This paper reported a COVID-19 detection methodology based on customizing and retraining three CNN models (VGG-19, VGG-16, and MobileNetV2). A specific dataset of 3526 X-ray images was assembled from many sources. It contains 840 COVID-19 cases, 1345 pneumonia cases, and 1341 normal cases. The models were tested on a test set containing 15% of the dataset. Their results and performance were then analyzed with essential metrics (recall, precision, F1-score, and accuracy). The best model, based on VGG-19, reached 99% recall, 100% precision, and 100% F1-score. The results also demonstrate the value of CNN architectures for predicting COVID-19 from X-ray images. The behavior of a convolutional neural network on a classification task is hard to predict in advance. This leads us to think that no single model can be assumed in advance to be the best for a given field of study.
In future work, we therefore plan to train and test as many models as possible. Moreover, many tasks in the medical field remain to be addressed with deep learning methodologies. The availability and accuracy of datasets in each specific field is an important factor that must be taken into consideration.
Acknowledgements. This work is a part of a project supported by co-financing from the CNRST (Centre National pour la Recherche Scientifique et Technique) and the Hassan II University of Casablanca, Morocco. The project is selected in the context of a call for projects entitled "Scientific and Technological Research Support Program in Link with COVID-19" launched in April 2020 (Reference: Letter to the Director of "Ecole Normale Supérieure de l'Enseignement Technique de Mohammedia" dated 10 June 2020).
References
1. Moujahid, H., Cherradi, B., Bahatti, L.: Convolutional neural networks for multimodal brain MRI images segmentation: a comparative study, pp. 329–338 (2020)
2. Daanouni, O., Cherradi, B., Tmiri, A.: Type 2 diabetes mellitus prediction model based on machine learning approach. In: The Proceedings of the Third International Conference on Smart City Applications, pp. 454–469 (2019)
3. Terrada, O., Cherradi, B., Raihani, A., Bouattane, O.: Classification and Prediction of atherosclerosis diseases using machine learning algorithms. In: 2019 5th International Conference on Optimization and Applications (ICOA), pp. 1–5 (2019)
4. Moujahid, H., Cherradi, B., Gannour, L., Bahatti, O.T., Hamida, S.: Convolutional Neural Network Based Classification of Patients with Pneumonia using X-ray Lung Images, vol. 5, no. 5, p. 9 (2020)
5. Singh, D., Kumar, V., Vaishali, Kaur, M.: Classification of COVID-19 patients from chest CT images using multi-objective differential evolution–based convolutional neural networks. Eur. J. Clin. Microbiol. Infect. Dis. 39(7), 1379–1389 (2020). https://doi.org/10.1007/s10096-020-03901-z
6. Wang, S., et al.: A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19), medRxiv, p. 2020.02.14.20023028, April 2020. https://doi.org/10.1101/2020.02.14.20023028
7. Zhang, Q., Liu, Y., Liu, G., Zhao, G., Qu, Z., Yang, W.: An automatic diagnostic system based on deep learning, to diagnose hyperlipidemia. Diabetes Metab. Syndr. Obes. Targets Ther. 12, 637–645 (2019). https://doi.org/10.2147/DMSO.S198547
8. Heidari, M., Mirniaharikandehei, S., Khuzani, A.Z., Danala, G., Qiu, Y., Zheng, B.: Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. Int. J. Med. Inf. 144, 104284 (2020). https://doi.org/10.1016/j.ijmedinf.2020.104284
9. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Rajendra Acharya, U.: Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 121, 103792 (2020). https://doi.org/10.1016/j.compbiomed.2020.103792
10. Bouattane, O., Cherradi, B., Youssfi, M., Bensalah, M.O.: Parallel c-means algorithm for image segmentation on a reconfigurable mesh computer. Parallel Comput. 37(4), 230–243 (2011). https://doi.org/10.1016/j.parco.2011.03.001
11. Rahimzadeh, M., Attar, A.: A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2. Inform. Med. Unlocked 19, 100360 (2020). https://doi.org/10.1016/j.imu.2020.100360
12. Islam, M., Islam, M., Asraf, A.: A Combined Deep CNN-LSTM Network for the Detection of Novel Coronavirus (COVID-19) Using X-ray Images (2020)
13. Gonesh, C., Ganie, I., Rajendran, G., Nathalia, D.: CNN Analysis for the detection of SARS-CoV-2 in Human Body, pp. 2369–2374, June 2020
14. Alazab, M., Awajan, A., Mesleh, A., Abraham, A., Jatana, V., Alhyari, S.: COVID-19 Prediction and detection using deep learning. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 12, 168–181 (2020)
15. Maghdid, H.S., Asaad, A.T., Ghafoor, K.Z., Sadiq, A.S., Khan, M.K.: Diagnosing COVID-19 pneumonia from X-Ray and CT Images using Deep Learning and Transfer Learning Algorithms, p. 8
16. Tajbakhsh, N., Shin, J.Y., Hurst, R.T., Kendall, C.B., Liang, J.: Chapter 5 - automatic interpretation of carotid intima–media thickness videos using convolutional neural networks. In: Zhou, S.K., Greenspan, H., Shen, D. (eds.) Deep Learning for Medical Image Analysis, pp. 105–131. Academic Press (2017)
17. Wanda, P., Jie, H.: RunPool: a dynamic pooling layer for convolution neural network. Int. J. Comput. Intell. Syst. 13, January 2020. https://doi.org/10.2991/ijcis.d.200120.002
18. Srinivas, S., Sarvadevabhatla, R.K., Mopuri, K.R., Prabhu, N., Kruthiventi, S.S.S., Babu, R.V.: Chapter 2 - an introduction to deep convolutional neural nets for computer vision. In: Zhou, S.K., Greenspan, H., Shen, D. (eds.) Deep Learning for Medical Image Analysis, pp. 25–52. Academic Press (2017)
19. Alsaeedi, A., Al-Sarem, M.: Detecting rumors on social media based on a CNN deep learning technique. Arab. J. Sci. Eng. (2020). https://doi.org/10.1007/s13369-020-04839-2
20. Sewak, M., Karim, M.R., Pujari, P.: Practical Convolutional Neural Networks: Implement Advanced Deep Learning Models Using Python. Packt Publishing Ltd. (2018)
Early Diagnosis of Parkinson’s Using Dimensionality Reduction Techniques
Tariq Saeed Mian
Department of IS, College of Computer Science and Engineering, Taibah University, Madinah Almunwarah, Saudi Arabia
[email protected]
Abstract. Correct and early diagnosis of Parkinson’s Disease (PD) is vital, as it enables the patient to receive the treatment required for the current stage of the disease. Early diagnosis is crucial because certain treatments, such as levodopa and carbidopa, have been shown to be more effective when given in the early stages of PD. At present, the diagnosis of PD is based solely on the clinical assessment of a patient’s motor symptoms. By this stage, however, PD has progressed so far that irreversible neurological damage has already occurred, and the patient has no chance of recovering. By applying machine learning to the assessment of a potential PD patient, the disease can be detected and diagnosed at a much earlier stage, allowing swift intervention and increasing the chance that PD does not progress to such damaging levels. Machine learning is a subfield of artificial intelligence that provides different techniques for scientists, clinicians and patients to detect diseases such as PD at an early stage. A key symptom of PD is vocal impairment, which distinguishes PD patients from healthy individuals. In this study, we used a vocal-based PD dataset with 755 features. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used to reduce the dimensionality of the Parkinson’s dataset to 8 optimal features. The study used four supervised machine learning algorithms: two ensemble techniques, Random Forest and AdaBoost, as well as Support Vector Machine and Logistic Regression. The Random Forest model achieved the highest accuracy, 0.948 with LDA and 0.840 with PCA.
Keywords: Parkinson’s disease · Early detection · Machine learning · Linear Discriminant Analysis · Dimensionality reduction · Principal Component Analysis · Ensemble methods · Random forest · AdaBoost · Support Vector Machine · Logistic regression
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 160–175, 2021. https://doi.org/10.1007/978-3-030-70713-2_17

1 Introduction
Bioinformatics has been widely used in recent times for the diagnosis and detection of fatal neural diseases. Machine learning is the subfield of artificial intelligence that is being used in Parkinson’s disease diagnosis. The disease is mostly found in people over 65 years of age [1]; however, symptoms of PD can arise between the ages of 30 and 50. PD is a chronic, neurodegenerative disease that mainly affects the motor system of the body. PD progresses very slowly; however, as the condition worsens, non-motor symptoms appear, such as hallucinations, mood disorders, delusions and other cognitive and behavioral changes. Environmental and genetic factors have a significant influence on the risk of developing Parkinson’s; however, the origin of the disease remains unknown [2]. PD is caused by the loss of brain cells through the degeneration of dopamine-producing neurons. Dopamine is a chemical produced in the substantia nigra that is responsible for transmitting signals within the brain. As dopamine production falls, movement disorders begin to develop in the patient. Parkinson’s symptoms can be divided into motor and non-motor symptoms. Motor symptoms are movement-related problems and are more perceptible than non-motor symptoms [3]; the patient experiences rigidity, tremor and slowness of movement. Non-motor symptoms of PD include speech problems, sleep disorders and olfactory issues. General symptoms that PD patients face are difficulty walking, mental disorders and shaking (tremors). PD patients also suffer from depression and anxiety, and as the condition worsens, dementia develops. With regard to gender, the disease is more common in males than in females [4]. There is currently no cure for Parkinson’s disease: available treatments only help control the symptoms rather than treat the disease itself. Different practitioners adopt different methods for diagnosing PD; physicians prefer medical history and neurological examination to assess the condition of PD patients [5].
Medical imaging, such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), is also useful for the diagnosis of PD, with MRI being the more effective of the two methods [6]. Diffusion MRI is effective in distinguishing PD from atypical parkinsonian (Parkinson-plus) syndromes [7]. As information technology progresses and new computational systems are introduced, more and more clinicians are interested in using intelligent models to improve not only the accuracy but also the quality of diagnosis. In recent years, easy access to storage and communication tools has made enormous amounts of data available for domestic and industrial use; examples of applications driving further research include flight simulators, weather forecasting and earth simulators [8]. Cognitive capabilities can be achieved through a wide range of computer science branches such as Artificial Intelligence, Machine Learning, Natural Language Processing and Computer Vision [9]. Signs of Parkinson’s can now be identified through smartphone applications, which aim to detect PD and monitor its rate of progression. Most PD diagnosis models focus on overt symptoms that can easily be identified with medical equipment. PD can, however, also be detected from a person’s vocal patterns, and the vocal disorders of PD patients can be identified in the early stages of the disease. The large amount of available data about PD and the clinical need for an intelligent system have led to the development of computational models for the early detection of PD [10]. Vocal features are passed to machine learning models to extract potential insights from the data, and the automatic classification of PD is based on its severity. PD becomes life-threatening when it is diagnosed late, whereas early diagnosis increases the chance that the patient’s condition does not deteriorate further and become more severe.

Researchers have used different speech signal processing techniques to obtain clinically relevant features of PD, and these extracted features are then passed to machine learning models to classify the disease. Support Vector Machine (SVM) [11], Artificial Neural Network [12], Random Forest [13] and K-Nearest Neighbor (KNN) [14] are the most commonly used algorithms to classify PD. These algorithms rely on feature selection and take an optimal number of features as input. Vocal data have intrinsic properties, and manual selection of features is a difficult task. A novel approach to diagnosing PD is therefore needed that is simpler, less expensive and more reliable. In this study we used a vocal feature dataset that is publicly available on the Kaggle website; the dataset originates from the University of California, Irvine (UCI) Machine Learning Repository and consists of 754 attributes. We used Principal Component Analysis and Linear Discriminant Analysis to reduce the dimensionality of the dataset, and used only 8 features as input to the four supervised machine learning algorithms, two of which are the ensemble techniques Random Forest and AdaBoost. As in other health studies, the dataset used in this paper is imbalanced: the number of instances of one class is much larger than the number of instances of the other class. An imbalanced dataset degrades the classification performance of an ML algorithm because the algorithm becomes biased towards the majority class; accuracy alone may therefore be misleading, since most predictions may fall in the majority class. To evaluate the proposed models, we used accuracy, the confusion matrix, precision, recall, F1-score and the Receiver Operating Characteristic (ROC) as performance metrics. The contributions of the proposed approach are:
• We used PCA (an unsupervised technique) and LDA (a supervised technique) to reduce the dimensionality of the data, and show that PCA and LDA give better results than commonly used feature selection techniques.
• We used two ensemble-based and two simple machine learning algorithms, and show that the ensemble models provide better results than the simple supervised machine learning algorithms.
• We used accuracy, precision, recall and F1-score to investigate the performance of LDA and PCA with the ML algorithms.
The paper is organized as follows: Sect. 2 provides a literature review, Sect. 3 discusses and explains the methodology, Sect. 4 presents the predictions of the proposed algorithms through tables and graphs, and Sect. 5 is devoted to the conclusion and future directions.
2 Literature Review
In this section we discuss existing machine learning techniques used to diagnose Parkinson’s disease, focusing on intelligent methods powered by machine learning and deep learning for the classification of PD. The authors of [15] discuss a feed-forward neural network used for Parkinson’s prediction and analyze the prediction error of the model. The output of the neural network is projected into a rule-based system. Data that the model fails to learn are gathered separately during training and fed to the model in the next training batch; this approach also performs well on imbalanced datasets. Data mining techniques are mostly used on structured data to make predictions. Three different data mining techniques (tree-based, statistical and support vector machine) have been used to classify affected individuals, with prediction performance measured in terms of accuracy [16]. In [17], an Artificial Neural Network (ANN) and SVM are used to classify individuals affected by PD; these approaches help medical practitioners diagnose individuals with PD at lower cost. Various machine learning algorithms have been applied to vocal recordings of PD patients to learn a decision boundary between the target classes. The ensemble random forest model, with candidate features selected using the minimum Redundancy Maximum Relevance (mRMR) technique, outperforms the benchmark models [18]. The authors of [19] presented a classification and prediction approach for Parkinson’s diagnosis in which data preprocessing, cross-validation and machine learning algorithms are used to find hidden patterns in the data; tremor and neurological data features were analyzed to predict symptoms. Machine learning provides very good results but still has some shortcomings in PD detection and sensitivity rate. Mostafa et al. [20] proposed a multi-agent data analysis technique that evaluates vocal disorders. The vocal recordings of affected individuals were considered important features for disease detection, and Reinforcement Learning, Naïve Bayes, Random Forest and Decision Tree models were used to analyze the vocal variations. The dataset used in this study was collected from the Tel Aviv Sourasky Medical Centre.
However, the work had limitations in dealing with real-time issues. The authors of [21] presented a novel approach for detecting PD symptoms from a vocal dataset, using Naïve Bayes and SVM to make predictions. The approach aimed at more accurate predictions, but it is limited by the dataset features. The authors of [22] proposed an information gain analysis technique for PD detection on benchmark datasets, combining different machine learning and information gain techniques. This strategy gave good results in PD diagnosis, but its results were weaker than those of deep learning techniques used for PD diagnosis. Seppi et al. [23] presented a new approach to Parkinson’s treatment focusing on non-motor symptoms, providing information and updating treatment methods for the future; the work collected evidence from different treatments and provided valuable suggestions. PD classification results depend on the feature selection and machine learning methods used. Many researchers have used the publicly available dataset [24], which consists of 195 sound recordings from 31 individuals. Parisi et al. [25] proposed a hybrid intelligence-based classifier for PD diagnosis, using a dataset from the University of California, Irvine (UCI) ML repository. They trained a multilayer perceptron (MLP) with a custom cost function on the training instances and then used the hybrid MLP_LSVM to predict the diagnosis of PD, obtaining a 100% accuracy rate. The authors of [26] presented a new technique for PD detection with vocal features, using feature selection techniques such as the Least Absolute Shrinkage and Selection Operator (LASSO) and Minimum Redundancy Maximum Relevance (mRMR) to filter 10 optimal features with high relevance scores. These optimal features were then passed to Random Forest and Support Vector Machine models, which reached a precision rate of 98% with the following features: shimmer, HNR and vocal fold excitation. The authors of [27] presented a novel approach for detecting PD using the vocal features jitter, shimmer, pitch and HNR. Different selection techniques, such as Fisher’s discriminant ratio, correlation rates, the t-test and ROC, were used to obtain high-ranking features, and the optimal features were determined through a wrapper method that used an SVM model to plot a feature performance curve. KNN, SVM and discriminant-based classifiers were then used with the optimal features, and accuracy, error rate, sensitivity and specificity were used to validate their performance. The KNN model had the highest accuracy, 93.82%. The authors of [28] used advanced methods for the clinical treatment of PD and showed that supportive care, including rehabilitative and physical interventions, nursing care and speech therapy, is the main process for improving recovery in PD. The authors of [29] proposed an SVM model with a Gaussian radial basis function kernel for PD diagnosis, using a dataset taken from the UCI machine learning repository. The authors of [30] used a non-linear model based on Dirichlet mixtures for the classification of PD. The authors of [31] used mutual information gain for feature selection together with SVM. This technique obtained high classification accuracy, yet tele-diagnosis of PD needs a better method with higher classification performance.
3 Methodology
3.1 Data Set
In this study, a vocal-based dataset is used that contains vocal recordings of healthy and affected individuals. The dataset was accessed from the University of California, Irvine (UCI) Machine Learning Repository. It includes 188 patients (81 female and 107 male participants) aged 33 to 87. The healthy group has 64 individuals (23 male and 41 female) aged 41 to 82. The final version of the dataset contains 756 instances (three recordings for each of the 252 individuals) and 754 attributes [33, 34] (Table 1).

Table 1. Dataset description
Detail | Source information
Dataset property | University of California, Irvine Machine Learning
Dataset name | Parkinson’s Disease
Dataset attributes | 754
Dataset records | 756
Target variable | (0-control, 1-PD). Binary Class Problem
Task | Binary classification
Our analysis also shows that the class distribution of the dataset is skewed, i.e., the dataset is imbalanced, so classification accuracy will tend towards the majority class. To handle this imbalance, we used an upsampling technique that makes the distributions of classes 0 and 1 equal: upsampling increases the number of instances in the minority class until both classes are equally represented. Fig. 1 shows the class distribution of the target variable without upsampling.
Fig. 1. Class distribution of the target value without upsampling
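The paper does not specify which upsampling method was used. The sketch below assumes plain random oversampling with replacement, implemented with scikit-learn’s `sklearn.utils.resample`; the helper name `upsample_minority` and the toy labels are hypothetical. In practice such upsampling would normally be applied to the training portion only, after the train/test split, so that duplicated minority samples cannot appear in both the training and the test data.

```python
import numpy as np
from sklearn.utils import resample


def upsample_minority(X, y, random_state=0):
    """Randomly oversample the minority class (with replacement) until both
    classes of a binary problem have the same number of instances."""
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() == counts.max():
        return X.copy(), y.copy()  # already balanced
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]

    X_min, y_min = X[y == minority], y[y == minority]
    X_maj, y_maj = X[y == majority], y[y == majority]

    # Draw minority rows with replacement until they match the majority count.
    X_min_up, y_min_up = resample(
        X_min, y_min, replace=True, n_samples=len(y_maj), random_state=random_state
    )
    return np.vstack([X_maj, X_min_up]), np.concatenate([y_maj, y_min_up])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_toy = rng.normal(size=(100, 5))
    y_toy = np.array([1] * 75 + [0] * 25)  # skewed toy labels, not the PD data
    X_bal, y_bal = upsample_minority(X_toy, y_toy)
    print(np.bincount(y_bal))  # [75 75]
```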
3.2 Proposed Model
In the proposed approach, we first preprocess the vocal dataset: we check for duplicate values, obtain statistical information from the dataset, and perform exploratory data analysis to gain insight into its hidden patterns. The dataset contains highly correlated features, so we set a correlation threshold of 80% and remove features whose correlation with another feature exceeds 80%. We then split the dataset into training and test sets in a 70:30 ratio. Five-fold cross-validation is used to assess the generalizability of the models. Because the dataset has a large number of attributes, we applied the dimensionality reduction techniques PCA and LDA to retain the most important dimensions. We then trained the supervised machine learning algorithms and evaluated their performance in terms of accuracy, precision, recall, F-score and AUC (Fig. 2).

3.3 Dimensionality Reduction
Dimensionality reduction is the process of converting high-dimensional data into low-dimensional data. In this paper, we used two dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA).

Linear Discriminant Analysis (LDA)
LDA is a prevalent technique for reducing the dimension of data in machine learning and data mining applications [39]. The goal of LDA is to project a large number of features onto a reduced number of features with good class separability, which also reduces computational costs. LDA maximizes the ratio of between-class variance to within-class variance, thereby maximizing the separation between classes. The main purpose of LDA is to project a d-dimensional feature space onto a reduced subspace of dimension k (where k ≤ C − 1 for C classes) without losing class information.
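The LDA procedure listed in this subsection (class mean vectors, scatter matrices, eigen-decomposition, projection Y = XW) can be sketched in NumPy as below. This is an illustrative implementation on assumed toy data, not the author’s code, and the function name `lda_fit_transform` is hypothetical. Because the between-class scatter matrix S_B has rank at most C − 1, at most C − 1 eigenvalues of S_W^{-1} S_B are non-zero, so a two-class problem has a single discriminant direction; scikit-learn’s `LinearDiscriminantAnalysis` likewise limits `n_components` to min(n_features, n_classes − 1).

```python
import numpy as np


def lda_fit_transform(X, y, k):
    """Fisher LDA: return X projected onto the top-k discriminants and the
    eigenvalues of S_W^{-1} S_B sorted in descending order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)

    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)                 # step 1: d-dimensional class mean
        centered = X_c - mean_c
        S_W += centered.T @ centered              # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)

    # Step 3: eigenpairs of S_W^{-1} S_B (pinv tolerates a singular S_W).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    eigvals, eigvecs = eigvals.real, eigvecs.real  # drop round-off imaginary parts

    # Step 4: sort by eigenvalue (descending) and keep k eigenvectors as W (d x k).
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]

    # Step 5: Y = X W.
    return X @ W, eigvals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(3.0, 1.0, (50, 3))])
    y = np.array([0] * 50 + [1] * 50)
    Y, eigvals = lda_fit_transform(X, y, k=1)
    print(Y.shape)               # (100, 1)
    print(np.round(eigvals, 6))  # one clearly positive eigenvalue; the others are ~0
```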
Fig. 2. Proposed algorithm
The steps of LDA are:
• A d-dimensional mean vector is calculated for every class of the dataset.
• The within-class and between-class scatter matrices are computed.
• The eigenvectors (E1, E2, E3, …, Ed) and corresponding eigenvalues (ψ1, ψ2, ψ3, …, ψd) of the scatter matrices (specifically, of S_W^{-1} S_B, where S_W and S_B are the within-class and between-class scatter matrices) are calculated.
• The eigenvectors are sorted in descending order of eigenvalue, and the k eigenvectors with the largest eigenvalues are chosen to form a d × k matrix W.
• The d × k matrix W of eigenvectors is used to transform the input samples into the new subspace: Y = X × W.

Principal Component Analysis (PCA)
PCA is a statistical technique that applies an orthogonal transformation to the data, transforming a group of correlated features into a group of uncorrelated features [38]. The basic functions of PCA are to reduce the dimensionality of data and to support exploratory data analysis; PCA can also be used to determine the relationships among variables. Let the dataset have features X = (x1, x2, x3, …, xn), where n denotes the input dimension. PCA reduces the n-dimensional data to k dimensions (k < n). First, the raw data are standardized to zero mean and unit variance:
$$x_i^{(j)} \leftarrow \frac{x_i^{(j)} - \bar{x}^{(j)}}{\sigma^{(j)}} \tag{1}$$
where x_i^{(j)} is the j-th feature of sample i, and \bar{x}^{(j)} and σ^{(j)} are the mean and standard deviation of feature j. Next, the covariance matrix of the m standardized samples is calculated:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x_i x_i^{\top}, \qquad \Sigma \in \mathbb{R}^{n \times n} \tag{2}$$
The eigenvalues λ and eigenvectors μ of the covariance matrix are then computed:
$$\Sigma \mu = \lambda \mu \tag{3}$$
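Stacking the eigenvectors as columns gives
$$U = \begin{bmatrix} | & | & & | \\ \mu_1 & \mu_2 & \cdots & \mu_n \\ | & | & & | \end{bmatrix}, \qquad \mu_i \in \mathbb{R}^n \tag{4}$$
We then choose the top k eigenvectors of the covariance matrix (those with the largest eigenvalues) and project the raw data onto this k-dimensional subspace, which yields a new representation of the original dataset. PCA is thus a method of converting raw n-dimensional data into a reduced k-dimensional representation:
$$x_i^{\text{new}} = \begin{bmatrix} \mu_1^{\top} x_i \\ \mu_2^{\top} x_i \\ \vdots \\ \mu_k^{\top} x_i \end{bmatrix} \in \mathbb{R}^k \tag{5}$$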
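Eqs. (1)–(5) translate almost line for line into NumPy. The sketch below is an illustration on assumed random data (the function name `pca_fit_transform` is hypothetical); it uses `numpy.linalg.eigh`, which is intended for symmetric matrices such as the covariance matrix. The projected data should agree with scikit-learn’s `PCA` applied to the standardized data up to the sign of each component, since an eigenvector is only defined up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_fit_transform(X, k):
    """Project X onto its top-k principal components following Eqs. (1)-(5)."""
    X = np.asarray(X, dtype=float)
    # Eq. (1): standardize every feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    m = X_std.shape[0]
    # Eq. (2): covariance matrix (1/m) * sum_i x_i x_i^T, shape n x n.
    cov = (X_std.T @ X_std) / m
    # Eqs. (3)-(4): eigenpairs of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]   # largest eigenvalues first
    U_k = eigvecs[:, order[:k]]         # top-k eigenvectors as columns
    # Eq. (5): x_new = [mu_1^T x, ..., mu_k^T x] for every sample (row).
    return X_std @ U_k, eigvals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.normal(size=(150, 5)) @ rng.normal(size=(5, 5))  # correlated toy features
    Z, eigvals = pca_fit_transform(X, k=2)

    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    Z_sklearn = PCA(n_components=2).fit_transform(X_std)
    print(np.allclose(np.abs(Z), np.abs(Z_sklearn), atol=1e-6))
```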
3.4 Machine Learning Algorithms
Random Forest
Random Forest is an ensemble technique based on decision trees, which use a set of splitting rules to build a model that predicts the value of the target variable. Random Forest improves the classification accuracy of a single-tree classifier by adding randomization and bootstrap aggregating (bagging) to the selection of data and nodes during decision tree construction [78]. A decision tree with M leaves divides the feature space into M regions R_m, 1 ≤ m ≤ M. The prediction function f(x) of each tree can be defined as:
$$f(x) = \sum_{m=1}^{M} c_m \, I(x, R_m) \tag{6}$$
In Eq. (6), M denotes the number of regions in the feature space, R_m is the region corresponding to m, c_m is a constant associated with m, and I(x, R_m) is the indicator function:
$$I(x, R_m) = \begin{cases} 1, & \text{if } x \in R_m \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

K-Nearest Neighbor
The K-Nearest Neighbor (KNN) model was introduced by Fix and Hodges in 1951. KNN is a simple yet powerful distance-based, non-parametric, lazy learning algorithm that can be used for both classification and regression tasks. It stores all available cases and classifies new cases based on a similarity score. KNN was used in pattern recognition and statistical estimation before 1970. Given n training instances and an unknown query point q, KNN proceeds as follows:
1. Store the training samples in an array of data points arr[], where every element of the array is a tuple (a, b).
2. For i = 1 to n, compute the Euclidean distance d(arr[i], q).
3. Obtain the set S of the K training points with the smallest distances to q.
4. Return the majority class label among the points in S.

Logistic Regression
Logistic Regression was introduced by David Cox in 1958. It is a classification model that uses a probability threshold of 0.5 to predict the target. The logistic regression model can use observed numerical and categorical values, and its optimal decisions are based on the posterior class probabilities p(y|x). For binary classification, the outcome of logistic regression can be represented as follows:
$$y = 1 \quad \text{if} \quad \log\frac{p(y=1 \mid x)}{p(y=0 \mid x)} > 0 \tag{8}$$
$$y = 0 \quad \text{if} \quad \log\frac{p(y=1 \mid x)}{p(y=0 \mid x)} < 0 \tag{9}$$
fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg | Resting electrocardiographic results (Value 0: normal; Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria)
thalach | Maximum heart rate achieved
exang | Exercise induced angina (1 = yes; 0 = no)
oldpeak | ST depression induced by exercise relative to rest
slope | The slope of the peak exercise ST segment (Value 1: upsloping; Value 2: flat; Value 3: downsloping)
ca | Number of major vessels (0–3) colored by fluoroscopy
thal | 3 = normal; 6 = fixed defect; 7 = reversible defect
num | Diagnosis of heart disease (angiographic disease status) (Value 0: < 50% diameter narrowing; Value 1: > 50% diameter narrowing)
Hardware and Software
The experiments in this paper were implemented in the Python programming language using Jupyter Notebook from Anaconda. They were run on an Intel® Core™ i7 CPU at 2.60 GHz with 8.00 GB of RAM under the 64-bit Windows 10 Enterprise operating system.
F. Kashif and U. K. Yusof
3.2 Classification Algorithms
Random Forest
The base learners in this ensemble model are individual decision trees that together form a forest, hence the name random forest [12]. Random forest can be applied to both regression and classification problems. The method uses bootstrap aggregating (bagging) to learn from the data. For a training set X = x1, x2, …, xn with responses Y = y1, y2, …, yn, bagging is repeated B times, each time drawing a random sample with replacement [12]. A classification or regression tree f_b is trained on each of these samples. The trained trees then predict unseen samples by averaging their outputs for regression and by majority vote for classification. The random forest predictions for an unseen sample x′ in regression and classification tasks are given in Eqs. (1) and (2), respectively:
$$\hat{f}(x') = \frac{1}{B}\sum_{b=1}^{B} f_b(x') \tag{1}$$
$$\hat{f}(x') = \operatorname{MajorityVote}\{f_b(x')\}_{b=1}^{B} \tag{2}$$
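Eqs. (1) and (2) describe bagging followed by aggregation. A from-scratch sketch for the classification case is given below, assuming scikit-learn’s `DecisionTreeClassifier` as the base learner; the helper names and the synthetic data are hypothetical. Setting `max_features="sqrt"` makes each tree consider a random subset of features at every split, which is the extra randomization that distinguishes a random forest from plain bagging. scikit-learn’s `RandomForestClassifier` implements the same idea, although it aggregates by averaging the trees’ class probabilities rather than by a hard majority vote as in Eq. (2).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, B=500, random_state=0):
    """Train B trees, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    trees = []
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # bootstrap indices (with replacement)
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority_vote(trees, X):
    """Eq. (2): every tree f_b votes and the most frequent label wins.
    Labels are assumed to be non-negative integers (e.g. 0/1)."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)  # (B, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    trees = fit_bagged_trees(X_train, y_train, B=50)
    y_pred = predict_majority_vote(trees, X_test)
    print("test accuracy:", (y_pred == y_test).mean())
```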
Bagging methods work best when the base learners are unstable or when the accuracy of the model needs to be improved [11]. Random forests generally prove more stable in classification tasks than in regression tasks. In this paper, the random forest algorithm is employed with 500 base learners to achieve higher classification accuracy than a decision tree alone.

Support Vector Machine
SVM is a classification algorithm that typically achieves high accuracy. It is primarily used for dichotomous response variables, such as binary or logical ones [13]. The algorithm provides linear and non-linear separators to find a better fit for the data. One of its main drawbacks is that it is highly sensitive to noise, and a little bias can dramatically affect the overall performance. SVM is a discriminant-based method, so in classification tasks the priority is not to evaluate posterior probabilities but to estimate the decision boundaries [12]. For a binary classification problem with class labels −1 and +1, consider the sample X = {x^t, r^t}, where r^t = +1 if x^t ∈ C1 and r^t = −1 if x^t ∈ C2. Classification in SVM is based on weights: the weights w and w0 are found such that
$$w^{\top} x^t + w_0 \geq +1 \quad \text{for } r^t = +1 \tag{3}$$
$$w^{\top} x^t + w_0 \leq -1 \quad \text{for } r^t = -1 \tag{4}$$
Detection of Cardiovascular Disease Using Ensemble Machine Learning Techniques
In theory, many hyperplanes may separate the classes, but the optimal hyperplane is the one that maximizes the margin, i.e., the distance between the hyperplane and the closest data points of either class [12]. SVMs are powerful even for data that cannot be separated by a straight line. Transforming the data into a higher dimension can separate the data points, but in practice these transformations can become overly complicated. The kernel trick uses a function that takes vectors from the original space as input and returns their dot product in the feature space. Mathematically, given a map φ : X → R^N, an input x = (x1, x2) ∈ X is mapped as
$$x = (x_1, x_2) \mapsto \phi(x) = \left(\phi_1(x), \phi_2(x), \ldots, \phi_N(x)\right) \tag{5}$$
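To illustrate the kernel trick numerically, the sketch below uses a degree-2 polynomial kernel, chosen only because its feature map φ can be written out explicitly for 2-D inputs; it is not the RBF kernel used in the proposed model, and the function names are hypothetical. The kernel value (x · z)² equals the dot product φ(x) · φ(z) in the 3-D feature space, so the mapping never has to be computed explicitly.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input (illustrative)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def poly2_kernel(x, z):
    """Kernel trick: equals phi(x) . phi(z) without constructing phi."""
    return float(np.dot(x, z)) ** 2


if __name__ == "__main__":
    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])
    explicit = np.dot(phi(x), phi(z))  # dot product computed in the 3-D feature space
    implicit = poly2_kernel(x, z)      # same value computed in the original 2-D space
    print(explicit, implicit, np.isclose(explicit, implicit))
```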
Several kernel functions can be chosen to handle the data; in this paper, the radial basis function (RBF) kernel is used in the proposed ensemble model. For two samples x1, x2 in the original space, the RBF kernel is defined as
$$K(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right) \tag{6}$$
where σ is a free parameter that is not learned by the model and cannot be predicted accurately in advance; it can be estimated experimentally. This kernel function defines the feature space in which the ensemble model is built for this classification task.

Ensemble Random Forest and Support Vector Machine
Ensemble models use weak or base learners as their foundation and can generate more accurate results. In this paper, an ensemble model is created from the random forest model and the support vector machine using the Vote method, in which each instance is assigned to a class based on the majority vote of the base learners [12]. As described in the previous sections, random forest and SVM are powerful classification techniques, and combining them in one model yielded better results than either model produced individually. The models are merged with the Vote method using an unweighted approach, i.e., both algorithms are given equal importance when selecting the final class for every new instance. The base learners for this ensemble model were selected based on their individual ability to handle classification tasks, and each is also compared individually with the proposed ensemble model.

Hyperparameters of the Proposed Ensemble Model
The proposed model is an ensemble that combines the strengths of the Random Forest and Support Vector Machine algorithms. The Random Forest model was created using 500 decision trees with a maximum depth of 5. The splitting criterion is the Gini impurity, which measures how often a randomly chosen instance in a node would be misclassified if it were labeled randomly according to the node’s class distribution [12].
The hyperparameters of the SVM model were tuned, with the radial basis function as the kernel. Gamma was set to “scale” for training this base learner; this parameter determines how far the influence of a single training instance reaches [12]. When gamma is set to “scale”, it is calculated using the following formula:
$$\gamma = \frac{1}{n_{\text{features}} \cdot \operatorname{Var}(X)} \tag{7}$$
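A sketch of the described configuration in scikit-learn is given below. It assumes `VotingClassifier` with `voting="hard"` as a stand-in for the paper’s Vote method (the paper does not name the library); synthetic data replace the Cleveland dataset, and the helper names are hypothetical. scikit-learn’s `gamma="scale"` corresponds to Eq. (7), and its RBF kernel is written as exp(−γ‖x1 − x2‖²), i.e., Eq. (6) with γ = 1/(2σ²). One caveat of unweighted hard voting with only two base learners is that every disagreement is a tie; scikit-learn resolves ties by ascending class order, so such instances go to the lower class label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def scale_gamma(X):
    """Eq. (7): gamma = 1 / (number of features * variance of X)."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())


def build_rf_svm_ensemble(random_state=0):
    """RF (500 trees, depth 5, Gini) + RBF SVM (gamma='scale'), unweighted hard vote."""
    rf = RandomForestClassifier(
        n_estimators=500, max_depth=5, criterion="gini", random_state=random_state
    )
    svm = SVC(kernel="rbf", gamma="scale", random_state=random_state)
    return VotingClassifier(estimators=[("rf", rf), ("svm", svm)], voting="hard")


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=13, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    model = build_rf_svm_ensemble().fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))

    # Eq. (6) with gamma = 1 / (2 sigma^2): manual RBF value vs scikit-learn's.
    g = scale_gamma(X_train)
    a, b = X_train[:1], X_train[1:2]
    manual = np.exp(-g * np.sum((a - b) ** 2))
    print(np.isclose(manual, rbf_kernel(a, b, gamma=g)[0, 0]))
```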
3.3 Evaluation Metrics
To build the models, 80% of the data was designated as the training set and 20% was used for testing. The models are evaluated with the confusion matrix and its associated metrics. The confusion matrix summarizes the actual and predicted outcomes in tabular form in terms of false positives, false negatives, true positives and true negatives; ideally, the off-diagonal entries, i.e., the false positives and false negatives, should be zero [12]. The evaluation metrics used in this paper are:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$
$$\text{Error} = \frac{FN + FP}{TP + FP + FN + TN} \tag{10}$$
$$F_1\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$
where T stands for true, F for false, P for positive and N for negative. The precision of a model is the fraction of predicted positives that are truly positive, and recall is the fraction of actual positives that the classifier predicts correctly. The decision threshold determines the balance between precision and recall [13].
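Eqs. (8)–(11) can be checked numerically. The sketch below computes them from hypothetical confusion-matrix counts for an imbalanced test set (10 positives, 90 negatives) and cross-checks them against scikit-learn’s metric functions; the helper names are illustrative. With these counts the error rate is only (2 + 2)/100 = 0.04, while recall is 8/10 = 0.8, which shows why error rate or accuracy alone can hide weaker performance on the minority class.

```python
from sklearn.metrics import f1_score, precision_score, recall_score


def classification_metrics(tp, fp, fn, tn):
    """Eqs. (8)-(11) from raw confusion-matrix counts (non-zero denominators assumed)."""
    precision = tp / (tp + fp)                          # Eq. (8)
    recall = tp / (tp + fn)                             # Eq. (9)
    error = (fn + fp) / (tp + fp + fn + tn)             # Eq. (10)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (11)
    return {"precision": precision, "recall": recall, "error": error, "f1": f1}


def labels_from_counts(tp, fp, fn, tn):
    """Build label vectors (1 = positive) that reproduce the given counts."""
    y_true = [1] * tp + [0] * fp + [1] * fn + [0] * tn
    y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * tn
    return y_true, y_pred


if __name__ == "__main__":
    counts = dict(tp=8, fp=2, fn=2, tn=88)  # hypothetical counts
    print(classification_metrics(**counts))
    y_true, y_pred = labels_from_counts(**counts)
    print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```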
4 Results and Evaluation
This research used the Heart Disease dataset from the UCI repository to predict the class label. The best performance was achieved by the ensemble of random forest and support vector machine, which reached an overall accuracy of 0.89. This model combines a random forest of 500 decision trees with an SVM using the radial basis kernel function. The comparison of the base learners with the proposed ensemble model is shown in Table 2, and a detailed analysis of the proposed model is shown in Table 3. Table 2 shows that the ensemble model performs better than either of the individual base models.

Table 2. Comparison of base learners with proposed ensemble model
Algorithm | Accuracy | Precision | Recall | F-1 score
SVM | 0.81 | 0.88 | 0.85 | 0.87
Random forest | 0.85 | 0.86 | 0.84 | 0.85
Proposed ensemble model | 0.89 | 0.89 | 0.89 | 0.88
Table 3. Class-wise analysis for proposed ensemble model
Class label | Precision | Recall | F-1 score | Error rate
0 | 0.88 | 0.85 | 0.87 | 0.14
1 | 0.89 | 0.91 | 0.90 | 0.08
Average | 0.89 | 0.89 | 0.88 | 0.11
As Table 3 shows, the accuracy, recall and precision of the model are close to each other, which implies that the model is stable and can predict both classes well. The proposed model achieved higher accuracy than the other algorithms. For further comparison, several other algorithms were trained and tested on the same dataset; their comparison with the proposed model is shown in Table 4. The proposed model outperforms the other algorithms and can be used to accurately detect heart disease in patients.

Table 4. Accuracy comparison of various algorithms
Algorithm | Accuracy | Precision | Recall | F-1 score
Decision tree | 0.77 | 0.77 | 0.77 | 0.77
Voting | 0.84 | 0.84 | 0.83 | 0.83
Bagging | 0.82 | 0.82 | 0.81 | 0.81
AdaBoost | 0.72 | 0.72 | 0.72 | 0.72
Proposed ensemble model | 0.89 | 0.89 | 0.89 | 0.88
AdaBoost gives the lowest accuracy among the compared algorithms. This ensemble was built with a decision tree as the base classifier, a learning rate of 0.07, and 500 classifiers. There is an inherent trade-off between these two values, and further experiments showed that tuning them causes some fluctuation in this classifier's classification report. The problem in this instance is that the model in this form overfits the data and therefore performs poorly on the test set. This dataset has also been used by other researchers to predict heart disease diagnosis in patients. Table 5 compares the results from [6, 14] and [15] with the proposed model; the complete evaluation metrics for [6] are not reported in that paper. The researchers in [6] proposed a Vote ensemble, and their algorithm outperforms the Vote ensemble implemented in this paper. This can be attributed to two factors: different base learners and feature selection. In [6], the base learners for Vote are Logistic Regression and Naïve Bayes, whereas in our proposed model the base learners are Random Forest and SVM. Additionally,
the difference in accuracy can be due to the feature selection step: we used all the features in the Cleveland dataset, whereas [6] selected 9 of them.

Table 5. Comparison of proposed model with existing literature

Best algorithm              Accuracy  Precision  Recall  F-1 score
Vote [6]                    0.874     –          –       –
HRFLM [14]                  0.884     0.901      0.928   0.90
Logistic regression [15]    0.8625    0.89       0.86    0.86
Proposed ensemble model     0.89      0.89       0.89    0.88
The results show that the proposed model improves the overall accuracy of heart disease detection. The model predicts both classes accurately, so it can be considered robust for either class. It does not favor one class over the other, which is a desirable property in classification tasks. The comparison in Table 5 shows that no other model is as consistent across the four metrics as the one proposed in this paper. This stability makes the model a good candidate for classifying patients in this domain.
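To show how the AdaBoost settings discussed above interact, here is a textbook discrete AdaBoost sketch in plain Python. It assumes depth-1 decision trees (stumps) as base learners, since the tree depth used in the experiments is not stated, and labels in {-1, +1}. The `learning_rate` multiplies each estimator's vote, so a small value such as 0.07 generally needs many more rounds (`n_estimators`) to reach the same fit, which is the trade-off mentioned in the text. All names are illustrative; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (weighted error, feature, threshold, polarity)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Stump:
    """Exhaustively choose the depth-1 tree with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[f] >= thr else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump: Stump, row: Sequence[float]) -> int:
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity


def adaboost_fit(X: Sequence[Sequence[float]], y: Sequence[int], n_estimators: int = 50,
                 learning_rate: float = 1.0) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; learning_rate shrinks every estimator's vote (alpha)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)   # guard against log(1/0) for a perfect stump
        if err >= 0.5:               # no better than chance: stop boosting
            break
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (label * prediction = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * label * stump_predict(stump, row))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble: List[Tuple[float, Stump]], row: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: class +1 at both ends, -1 in the middle, so no single stump fits it.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_estimators=3, learning_rate=1.0)
print([adaboost_predict(model, row) for row in X])  # reproduces y after three rounds
```

If the experiments used scikit-learn (the section does not say), the two settings correspond to the `n_estimators` and `learning_rate` arguments of `AdaBoostClassifier`.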
5 Conclusion
Cardiovascular diseases are highly prevalent today, and early, accurate diagnosis can help people achieve a higher quality of life. In this paper, an ensemble of random forest and support vector machine was used to predict heart disease in patients from medical records. The 303 records of the Cleveland Heart Disease dataset were used to build the classification models. The ensemble random forest and support vector machine model outperformed the other algorithms, with an overall accuracy of 0.89 and an F-1 score of 0.88, and its results were compared with several other techniques. This research can be extended to higher-dimensional and real-world data. In this paper, all 13 attributes were used to build the model without any feature selection; in future work, we intend to perform comprehensive feature selection to choose appropriate attributes, since feature selection can affect the model's accuracy. Additionally, the proposed method combines random forest with an SVM using the radial basis function kernel; other combinations of algorithms could be tried to improve accuracy, and different preexisting or custom kernels could be applied to the data to improve prediction accuracy.
Acknowledgements. The authors would like to thank Universiti Sains Malaysia (USM) for the support and encouragement to conduct this research through the Research University Grant (RUI) (1001/PKOMP/8014084).
References
1. Alić, B., Gurbeta, L., Badnjević, A.: Machine learning techniques for classification of diabetes and cardiovascular diseases. In: 6th Mediterranean Conference on Embedded Computing (MECO). IEEE (2017)
2. Alpaydın, E.: Introduction to Machine Learning. MIT Press (2015)
3. Amin, M.S., Chiam, Y.K., Varathan, K.D.: Identification of significant features and data mining techniques in predicting heart disease. Telematics Inf. 36, 82–93 (2019)
4. Arji, G., Safdari, R., Rezaeizadeh, H., Abbassian, A., Mokhtaran, M., Ayati, M.H.: A systematic literature review and classification of knowledge discovery in traditional medicine. Comput. Methods Programs Biomed. 168, 39–57 (2019)
5. Durairaj, G., Oommen, A.T., Pillai, G.: Correlation between BMI, Hba1c and fasting lipid profile in patients presenting with acute coronary syndrome and their relationship with CVD risk. J. Cardiovascular Disease Res. 2, 10 (2019)
6. Islam, S., Jahan, N., Khatun, M.E.: Cardiovascular disease forecast using machine learning paradigms. In: 10th International Conference on Computing Methodologies and Communication (ICCMC), pp. 487–490. IEEE (2020)
7. Kumar, S., Sahoo, G.: Enhanced decision tree algorithm using genetic algorithm for heart disease prediction. Int. J. Bioinf. Res. Appl. 14(1–2), 49–69 (2018)
8. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019)
9. Nowbar, A.N., Gitto, M., Howard, J.P., Francis, D.P., Al-Lamee, R.: Mortality from ischemic heart disease: analysis of data from the World Health Organization and coronary artery disease risk factors from NCD Risk Factor Collaboration. Circ. Cardiovasc. Qual. Outcomes 12(6) (2019)
10. World Health Organization: Cardiovascular Disease. https://www.who.int/health-topics/cardiovascular-diseases
11. UCI Machine Learning Repository: Heart Disease Dataset. https://archive.ics.uci.edu/ml/datasets/heart+disease
12. Sisodia, D., Sisodia, D.S.: Prediction of diabetes using classification algorithms. Procedia Comput. Sci. 132, 1578–1585 (2018)
13. Sutton, C.D.: Classification and regression trees, bagging, and boosting. Handbook Stat. 24, 303–329 (2005)
14. Xu, G., Liu, M., Jiang, Z., Söffker, D., Shen, W.: Bearing fault diagnosis method based on deep convolutional neural network and random forest ensemble learning. Sensors 19(5), 1088 (2019)
15. Zumel, N., Mount, J., Porzak, J.: Practical Data Science with R. Manning Publications, Shelter Island (2014)
Health Information Management
Hospital Information System for Motivating Patient Loyalty: A Systematic Literature Review

Saleh Nasser Rashid Alismaili (1, 2), Mohana Shanmugam (1), Hairol Adenan Kasim (1), and Pritheega Magalingam (3)

1 College of Informatics and Computing, Universiti Tenaga Nasional, Kajang, Malaysia
2 Directorate of Information Technology, Ministry of Health, Muscat, Sultanate of Oman
3 Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Skudai, Malaysia
Abstract. Healthcare service institutions (HSIs) seeking to motivate patient loyalty have identified Hospital Information Systems (HIS) as a potential solution for gathering, measuring, and analyzing the healthcare data necessary for this goal. The purpose of this systematic literature review is to reveal how prevalent the use of HIS is for motivating patient loyalty and to investigate how effective HIS is in doing so. Published empirical studies and conference papers from the past five years were compiled from the following online databases: Scopus, ACM Digital Library, IEEE Xplore, ScienceDirect, and Emerald Insight. The search results indicate that, while the use of HIS to motivate patient loyalty is rare relative to other topics within the general field of HIS, HIS use has a significant positive impact on patient satisfaction, which the literature understands to be directly related to patient loyalty. There remains a gap in empirical studies on the direct application of HIS for increasing patient loyalty. Future research may be required on the development of an HIS focused on motivating patient loyalty, which can be empirically tested in a real-world HSI setting.
Keywords: Hospital Information Systems · Patient loyalty · Patient satisfaction
1 Introduction
Due to cheap travel costs and rising healthcare standards in developing countries, more patients across the globe are choosing to go overseas for healthcare. In response, local governments, hospital administrators, and other stakeholders are seeking ways to retain their patients and motivate loyalty to their healthcare service institutions (HSIs). There is a clear consensus on the relationship between service quality, patient satisfaction, and patient loyalty: service quality variables influence patient satisfaction, and patient satisfaction, in turn, influences patient loyalty [1–3]. Thus, it has been argued in the literature that motivating patient loyalty requires motivating patient satisfaction by meeting the service quality variables laid out by [4], namely tangibility, reliability, responsiveness, assurance, and empathy [5, 6].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 189–198, 2021.
https://doi.org/10.1007/978-3-030-70713-2_19
A significant complication is that groups differ not only in the relative importance they place on each variable but also in how they conceive of the variables in the first place. For instance, [7] found that younger patients valued empathy and tangibles more than older patients did, while single patients valued reliability more than their married peers. Another example is the high value Ghanaian patients place on individual attention from HSI staff [8], which contrasts markedly with Japanese patients, who were most satisfied when HSI staff treated them the same as other patients [9]. Thus, HSIs that wish to motivate patient loyalty would likely achieve the best results with data gathered from their own patient base rather than from other HSIs that may cater to other segments of the population. Meeting all service quality variables requires HSIs to gather, measure, and analyze massive amounts of data in a timely manner. Not only do HSIs have to handle large sets of healthcare data, they must do so while respecting patients’ confidentiality and keeping their trust [10]. The emergence of Hospital Information Systems (HIS) may help accomplish this task. HIS can be understood simply as the methods with which a hospital manages the information within its organization; it has been defined as a “comprehensive, integrated Information System (IS) designed to manage the administrative, financial, and clinical aspects of a hospital” [11]. HIS is intended to make sense of the large amounts of information that HSIs collect every day and present it in a way that HSI administrators and stakeholders can use easily. However, HSI investments in HIS have focused mainly on improving performance from the perspective of administrators rather than that of patients [12, 13].
The purpose of this systematic literature review is to determine how prevalent the use of HIS is for motivating patient loyalty, as opposed to addressing administrative concerns. An additional purpose is to reveal how effective HIS has been in motivating patient loyalty in the studies that use it for that goal. To this end, the following research questions were formulated:
• RQ1. How prevalent is the use of Hospital Information Systems to motivate patient loyalty?
• RQ2. How effective are Hospital Information Systems in motivating patient loyalty?
By reviewing the current literature on both the prevalence and the efficacy of HIS with respect to motivating patient loyalty, this review aims to reveal how commonly HSIs use HIS to increase patient loyalty and how effective such use has been. The remainder of the review is organized into four sections: the first discusses the methods used for the systematic review, the second outlines the results, the third discusses the obtained results, and the fourth concludes the study and suggests possible directions for future research.
2 Methods
2.1 Literature Search
Five online research databases were searched to compile relevant articles for the review: ACM Digital Library, Emerald Insight, IEEE Xplore, ScienceDirect, and Scopus. The search was conducted from late February to early May 2020, and the search results were reconfirmed in June 2020. The following search string was used: “hospital information system” AND (“patient loyalty” OR “patient satisfaction”). Initially, the search string was simply “hospital information system” AND “patient loyalty”, but it returned too few results. Given the robust evidence linking patient satisfaction and patient loyalty, “patient satisfaction” was added to the search string to generate more results.
2.2 Inclusion and Exclusion Criteria
As the review focuses on the actual use of HIS by HSIs with the express purpose of motivating or improving patient loyalty, only articles with the following characteristics were included: (1) the article is published in English and available in full text, and (2) the article is published in a peer-reviewed academic journal within the last five years, (3) is set to be published in a peer-reviewed academic journal this year, or (4) is a peer-reviewed article from an international computer science conference. Only empirical studies that focus on the use or effect of HIS on patient loyalty or satisfaction were included, as the primary concern is the actual practice of HSIs with respect to HIS and patient loyalty. The inclusion criteria were intended to capture the current state of HIS use for motivating patient loyalty in HSIs. Due to rapid advancements in IT, extending the inclusion range beyond five years before the review might introduce obsolete data.
Because the research questions of the review pertain to the actual use of HIS by HSI staff, only empirical studies were included. Articles were excluded if they were published before 2015, were not peer-reviewed, used HIS for a purpose other than motivating patient loyalty, or did not conduct an empirical study. Previous systematic literature reviews were also excluded, as their findings may already be obsolete.
2.3 Data Extraction
Applying the inclusion and exclusion criteria outlined above yielded a list of candidate articles, which was then refined to build the final list. Articles retrieved from more than one database were de-duplicated.
To determine whether an article was to be included in the review, the titles of the results returned by the search string were scanned first; results whose titles did not mention HIS, patient loyalty, or patient satisfaction were excluded. Next, the abstracts of the remaining results were examined, and results whose abstracts did not report an empirical study on HIS, patient loyalty, or patient satisfaction were excluded. Finally, the full texts of the remaining results were read to determine their relevance to the review. The first three authors handled the initial search and abstract reviews, and all four authors contributed to the full-text readings and the final selection of articles. The full process is outlined in Fig. 1.
Fig. 1. Data extraction process
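The mechanical steps of the search and screening procedure in Sect. 2 can be sketched as follows. The sketch approximates the Boolean search string with case-insensitive phrase matching, removes duplicates retrieved from several databases (keyed on DOI when available, which is an assumption), applies the inclusion and exclusion criteria of Sect. 2.2, and performs the title scan. Abstract and full-text screening were manual judgments by the authors and are not modeled. The `Record` fields and function names are hypothetical, and real databases apply their own field and stemming rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One database hit with the attributes screened in Sect. 2.2 (fields are illustrative)."""
    title: str
    abstract: str
    year: int
    doi: Optional[str]
    english_full_text: bool
    peer_reviewed: bool
    empirical: bool
    is_review: bool


def matches_search_string(text: str) -> bool:
    """Approximates: "hospital information system" AND ("patient loyalty" OR "patient satisfaction")."""
    t = text.lower()
    return "hospital information system" in t and (
        "patient loyalty" in t or "patient satisfaction" in t)


def dedupe(records: List[Record]) -> List[Record]:
    """Keep the first copy of an article found in several databases (DOI key, else normalized title)."""
    seen, unique = set(), []
    for r in records:
        key = r.doi.lower() if r.doi else " ".join(r.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def meets_criteria(r: Record, min_year: int = 2015) -> bool:
    """English full text, peer-reviewed, published 2015 or later, empirical, and not a review."""
    return (r.english_full_text and r.peer_reviewed and r.year >= min_year
            and r.empirical and not r.is_review)


def title_screen(r: Record) -> bool:
    """The title must mention HIS, patient loyalty, or patient satisfaction."""
    t = r.title.lower()
    phrases = ("hospital information system", "patient loyalty", "patient satisfaction")
    return re.search(r"\bHIS\b", r.title) is not None or any(p in t for p in phrases)


def screen(records: List[Record]) -> List[Record]:
    hits = [r for r in records if matches_search_string(r.title + " " + r.abstract)]
    eligible = [r for r in dedupe(hits) if meets_criteria(r)]
    # Abstract and full-text screening were manual and are not modeled here.
    return [r for r in eligible if title_screen(r)]


records = [
    Record("Hospital information system use and patient satisfaction", "Outpatient survey.",
           2017, "10.1000/a", True, True, True, False),
    Record("Hospital Information System Use and Patient Satisfaction", "Outpatient survey.",
           2017, "10.1000/A", True, True, True, False),  # same DOI from another database
    Record("HIS adoption and patient loyalty", "A hospital information system case.",
           2013, "10.1000/b", True, True, True, False),  # published before 2015
]
print([r.doi for r in screen(records)])  # ['10.1000/a']
```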
3 Results
This section presents a summary of the reviewed studies.
3.1 Search Results
Table 1. Data extraction results

Liang, Gu, Tao, Jain, Zhao and Ding [16] (2015)
Study location: Large hospital in East China
Purpose of the study: To examine the influence of HIS on doctor-patient relationships and patient satisfaction through the lens of service fairness
Relevant results: Patient-accessible HIS increases patients’ perception of service fairness, which in turn improves both doctor-patient relationships and patient satisfaction
Limitations: Data from a large hospital were used; data from smaller HSIs may lead to different results. Furthermore, because this is a Chinese hospital, Liang et al. (2015) noted that cultural factors specific to China may cause perceptions of service fairness to differ from those in other cultures
Implications: There is a power imbalance between physicians and patients in health care, which can cause tension when patients feel that their concerns are unappreciated or ignored. HIS that gives patients more access to pertinent medical and administrative information about themselves may help remedy this imbalance and increase patient satisfaction and loyalty

Yoo, Jung, Kim, Kim, Lee, Chung and Hwang [18] (2016)
Study location: A public tertiary general hospital in South Korea
Purpose of the study: To evaluate an HIS that addresses outpatients’ difficulties regarding the search for HSIs, keeping up with treatment regimens, and accessing tailored medical and administrative information
Relevant results: The authors surveyed satisfaction with the HIS (n = 43: 23 outpatients and 20 of their guardians). Participants reported a satisfaction level of roughly 4.0 on a 5-point Likert scale
Limitations: The outpatients used an Android-based mobile app, and the study did not discuss the HIS used by the hospital. The results may therefore apply only to HIS initiatives focused solely on patients, not to HSI-wide efforts to use HIS
Implications: Outpatients value easy access to pertinent medical and administrative information at a glance. Visiting an HSI can be stressful, particularly for older patients. If they can access information readily without having to ask a staff member, they may feel more empowered and thus more satisfied, and the increased satisfaction may motivate loyalty to the HSI

Khalifa [15] (2017)
Study location: Four hospitals in Saudi Arabia (2 private, 2 public)
Purpose of the study: To reveal the perceived benefits of HIS and electronic medical records (EMR) from the patients’ point of view
Relevant results: Based on 153 valid survey responses, patients perceived the following benefits of HIS and EMR: (1) improved information access, (2) increased healthcare professionals’ productivity, (3) improved efficiency and accuracy of coding and billing, (4) improved quality of healthcare, (5) improved clinical management (diagnosis and treatment), (6) reduced expenses associated with paper medical records, (7) reduced medical errors, (8) improved patient safety, (9) improved patient outcomes, and (10) improved patient satisfaction
Limitations: HIS was examined solely from the patients’ point of view. The HSI’s perspective was left out, so there are limited data on whether the HIS would be financially viable for the HSI
Implications: Patients are more satisfied when they feel empowered and can exercise informed autonomy in HSI interactions. HIS that increases information availability and convenience will likely lead to greater patient satisfaction, as well as greater patient loyalty

Meyerhoefer, Sherer, Deily, Chou, Guo, Chen, Sheinberg and Levick [17] (2018)
Study location: An obstetrics and gynecology practice in eastern Pennsylvania
Purpose of the study: To examine the impact of installing an EHR system at OB/GYN practices
Relevant results: HSI staff, especially physicians, were dissatisfied with the EHR system. Patient satisfaction decreased after the EHR system was installed
Limitations: Only OB/GYN practices were considered; results may not apply to other branches of healthcare
Implications: The negative impact of EHR on patient satisfaction may be due to HSI staff’s dissatisfaction with the system, which may have affected staff compliance. HIS tasked with motivating patient loyalty must have the support of HSI staff to be viable

Asagbra, Burke, and Liang [14] (2019)
Study location: Acute care hospitals in the United States
Purpose of the study: To examine the relationship between HIS functionalities and the HSI’s quality of care
Relevant results: The more comprehensive the coverage of the different HIS functionalities, the higher the patients’ satisfaction. The number of functionalities also correlated negatively with readmission rates for myocardial infarction, heart failure, and pneumonia
Limitations: Asagbra et al. (2019) used only secondary data, all of it from surveys
Implications: HIS that meets patient needs leads to greater patient satisfaction, which may, in turn, lead to more robust patient loyalty
4 Discussion
4.1 Prevalence of HIS Tasked with Motivating Patient Loyalty
In contrast to the thousands of results one obtains by searching for “hospital information system” in online research databases, combining “hospital information system” with “patient loyalty” drastically reduced the number of results. To remedy this, the term “patient satisfaction” was added to broaden the search. The final data extraction yielded just five studies, which indicates a clear lack of studies on HIS with the specific aim of motivating patient loyalty.
4.2 Efficacy of HIS Tasked with Motivating Patient Loyalty
The reviewed studies revealed that HIS has significant positive effects on patient satisfaction, which in turn motivates patient loyalty; this supports the findings of previous literature. Four of the five studies found that the increased information availability afforded by HIS led to greater patient satisfaction. In [14], secondary data from acute care hospitals were analyzed to determine whether the HSIs’ HIS functionalities predicted patient satisfaction rates. The more HIS functionalities an HSI had, the more likely its patients were to report satisfaction. A potential contributor to higher patient satisfaction is the lower readmission rate among HSIs with more HIS functionalities. Because an HIS can present pertinent medical data tailored to patients’ own needs, patients are better able to follow their treatment regimens, which leads to better patient outcomes; patients may, in turn, attribute these improved outcomes to better information availability. Similar results were found in [15, 16, 18]. A common thread among the four studies that revealed the relationship between information availability and patient satisfaction [14–16, 18] is that their subjects apparently prioritized information availability.
A potential reason for this was illuminated by [16],
who noted that, in China, there is a wide gulf in power between patients and medical staff, especially physicians. Patients often feel powerless in the face of illness; if medical staff cannot empower patients by quickly providing pertinent information, patients feel less powerful. This may contribute to their dissatisfaction, as patients can feel confused and uncertain when they are uninformed about the specifics of their treatment or care. The outlier was the result obtained by [17], which examined the effect of HIS on an OB/GYN practice. The authors found that patients were more satisfied before the HIS was implemented in the practice than during and after the implementation. However, this result may have been influenced by the medical staff’s dissatisfaction with the HIS, particularly among physicians. Their dissatisfaction could have impaired their performance, which in turn could have reduced the satisfaction reported by patients. Overall, the use of HIS appears to have positive effects with respect to increasing patient loyalty. However, the effect revealed in the reviewed studies is indirect; that is, HIS positively affects patient satisfaction, which can then be inferred to improve patient loyalty.
4.3 Gaps in the Literature
There appears to be a need to examine the use of HIS for the specific goal of motivating patient loyalty. Based on the short list of results for HIS and patient loyalty, much of the scholarly focus on HIS appears to be centered on other issues; many of the more common search results for HIS concern technology adoption or conceptual frameworks. Traditionally, the healthcare industry has used HIS for administrative concerns, such as streamlining billing procedures and storing medical records or patient data [12].
It is mostly assumed that these administrative benefits will result in greater patient satisfaction and loyalty because of the efficiencies brought about by HIS. Despite the surge in patient-centered HIS in recent years, there is still a gap in empirical studies on the effect of such HIS on patient loyalty; instead, much of the focus is on the conceptual development, implementation, and adoption of patient-centered HIS. One potential explanation for this focus is that most researchers regard patient loyalty as a byproduct of the improvements brought about by HIS rather than as its primary goal. The short list of empirical studies on HIS and patient loyalty indicates that, when the impact of HIS on patient loyalty is examined, the results are, in most cases, significantly positive. It is noteworthy, however, that the reviewed articles examined patient loyalty only indirectly: they did not investigate the impact of HIS on patient loyalty specifically, and the connection could only be inferred from the positive impact of HIS on patient satisfaction. There appears to be a gap in the empirical testing of an HIS that treats patient loyalty as its primary goal rather than as an incidental effect of improving HSI services.
4.4 Limitations
This review is limited by the short list of articles generated by the search terms used. The articles were also restricted by publication date to the last five years; it is possible that much of the empirical testing of the effects of HIS on patient loyalty was conducted before this period. The search results may also have been constrained by the selection of research databases, which was limited by the resources available through the school library.
5 Conclusions and Directions for Future Research
Scholarly research on HIS yields a large number of results; however, when the research is limited to HIS in relation to generating patient loyalty, the number shrinks significantly. The review revealed that the prevalence of HIS for the express purpose of motivating patient loyalty is low. However, in the few studies that investigated the impact of HIS, a majority found that HIS had a significant positive impact on patient satisfaction. Given the robust literature on the relationship between patient satisfaction and patient loyalty, it is reasonable to infer that HIS is likely effective in motivating patient loyalty. However, no study directly examining an HIS geared toward increasing patient loyalty was found. Future research may be directed toward developing an HIS designed to motivate patient loyalty and empirically testing it in a real-world HSI setting.
References
1. Mohebifar, R., Hasani, H., Barikani, A., Rafiei, S.: Evaluating service quality for patients’ perceptions: application of importance-performance analysis method. Osong Public Health Res. Perspect. 7(4), 233–238 (2016)
2. Shabbir, A., Malik, S.A., Malik, S.A.: Measuring patients’ healthcare service quality perceptions, satisfaction, and loyalty in public and private sector hospitals in Pakistan. Int. J. Q. Reliab. Manag. 33(5), 538–557 (2016)
3. Tosyali, H., Sütcü, C.S., Tosyali, F.: Patient loyalty in the hospital patient relationship: the mediating role of social media. Erciyes İletişim Dergisi 6(1), 783–804 (2019)
4. Parasuraman, A., Zeithaml, V.A., Berry, L.L.: SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality. J. Retail. 64(1), 12 (1988)
5. Asnawi, A., Awang, Z., Afthanorhan, A., Mohamad, M., Karim, F.: The influence of hospital image and service quality on patients’ satisfaction and loyalty. Manag. Sci. Lett. 9(6), 911–920 (2019)
6. Kulsum, U., Yanuar, T., Syah, R.: The effect of service quality on loyalty with mediation of patient satisfaction. Int. J. Bus. Manag. Invention 6(3), 41–50 (2017)
7. Ahmed, S., Tarique, K.M., Arif, I.: Service quality, patient satisfaction and loyalty in the Bangladesh healthcare sector. Int. J. Health Care Qual. Assur. 30(5), 477–488 (2017)
8. Nkrumah, S., Yeboah, F., Adiwokor, E.: Client satisfaction with service delivery in the health sector: the case of Agogo Presbyterian Hospital. Int. J. Bus. Adm. 6(4), 64–78 (2015)
9. Elleuch, A.: Patient satisfaction in Japan. Int. J. Health Care Qual. Assur. 12(7), 692–705 (2008)
10. Bayer, R., Santelli, J., Klitzman, R.: New challenges for electronic health records: confidentiality and access to sensitive health information about parents and adolescents. J. Am. Med. Assoc. 313(1), 29–30 (2015)
11. Ahmadi, H., Nilashi, M., Shahmoradi, L., Ibrahim, O.: Hospital information system adoption: expert perspectives on an adoption framework for Malaysian public hospitals. Comput. Hum. Behav. 67, 161–189 (2017)
12. Sheikh, A., Sood, H.S., Bates, D.W.: Leveraging health information technology to achieve the “triple aim” of healthcare reform. J. Am. Med. Inform. Assoc. 22(4), 849–856 (2015)
13. Wang, Y., Kung, L., Byrd, T.A.: Big data analytics: understanding its capabilities and potential benefits for healthcare organizations. Technol. Forecast. Soc. Chang. 126, 3–13 (2018)
14. Asagbra, O.E., Burke, D., Liang, H.: The association between patient engagement HIT functionalities and quality of care: does more mean better? Int. J. Med. Inf. 130, 103893 (2019)
15. Khalifa, M.: Perceived benefits of implementing and using hospital information systems and electronic medical records. In: ICIMTH, pp. 165–168. IOS Press (2017)
16. Liang, C., Gu, D., Tao, F., Jain, H.K., Zhao, Y., Ding, B.: Influence of mechanism of patient-accessible hospital information system implementation on doctor–patient relationships: a service fairness perspective. Inf. Manag. 54(1), 57–72 (2017)
17. Meyerhoefer, C.D., Sherer, S.A., Deily, M.E., Chou, S.Y., Guo, X., Chen, J., Sheinberg, M., Levick, D.: Provider and patient satisfaction with the integration of ambulatory and hospital EHR systems. J. Am. Med. Inform. Assoc. 25(8), 1054–1063 (2018)
18. Yoo, S., Jung, S.Y., Kim, S., Kim, E., Lee, K.H., Chung, E., Hwang, H.: A personalized mobile patient guide system for a patient-centered smart hospital: lessons learned from a usability test and satisfaction survey in a tertiary university hospital. Int. J. Med. Inform. 91, 20–30 (2016)
Context Ontology for Smart Healthcare Systems

Salisu Garba, Radziah Mohamad, and Nor Azizah Saadon

School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, UTM, 81310 Skudai, Johor, Malaysia
{radziahm,azizahsaadon}@utm.my
Abstract. This paper proposes an improved context ontology for smart healthcare systems. The main contribution of this work is a smart healthcare context representation that is simple, sufficiently expressive, and extensible: only three contextual classes are required, compared with the several classes used in related context ontologies. This is achieved by adapting the feature-oriented domain analysis (FODA) technique of software product line (SPL) engineering for domain analysis and then using the lightweight Unified Process for ONtology building (UPON Lite) for ontology development. The applicability of the proposed context ontology is validated using the sustAGE smart healthcare case study. The proposed context ontology is found to support sensing, reasoning about, and inferring context information across various users, environments, and smart healthcare services. It is useful for healthcare service designers and developers who require a simple, consolidated ontology for complex context representation, and it will benefit smart healthcare service developers and service requesters as well as other researchers in the ontology-based context modeling domain.
Keywords: Ontology · Smart healthcare · Context ontology · Healthcare service
1 Introduction
A smart healthcare system is an intelligent system that uses modern technologies such as IoT, Big Data, and advanced analytics with deep learning to improve disease diagnosis, patient treatment, and quality of life, with the aid of vital components (mHealth and eHealth) for efficient and effective communication between individuals and health service providers [1]. Wireless Sensor Networks (WSNs) serve as the enabling technology for the transformation of healthcare applications. A WSN consists of an array of sensors that, in smart healthcare, can monitor physiological parameters such as body temperature, blood pressure, heart rate, blood glucose, and respiration rate [2]. Healthcare systems require high transmission rates and low delay, which compels mobile network operators to shift from network monitoring to service and device monitoring [3]. This generates massive amounts of data that require proper representation and intelligent analytics to enable appropriate decisions in smart healthcare systems [4].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 199–206, 2021. https://doi.org/10.1007/978-3-030-70713-2_20
Among context modeling approaches such as context space, query preference, vector space, object-oriented, and ontology-based modeling, the object-oriented and ontology-based approaches are the most widely employed [5]. Ontology is the most viable method to organize, represent, and generate inferences from this massive context information [6]. An ontology is a precise, machine-readable representation of a rigorous conceptual schema (relevant entities, properties, relations, and rules), derived from consensus, that captures meaning (semantics) within the domain of discourse [7]. Despite the extensibility and expressive power of ontologies, their application to context representation in smart healthcare systems faces several challenges, such as constantly changing context, heterogeneous smart devices, and the need to build ontologies from scratch [8]. Existing context ontologies are mostly narrow in scope and somewhat fragmented, even though smart healthcare context depends on the user, the service, and the environment. This may be due to the impression that smart healthcare context is similar to that of traditional healthcare systems. Given these shortcomings, this paper proposes an improved context ontology for smart healthcare systems. Unlike other approaches thus far, which rely on object-oriented modeling or on creating ontologies from scratch for semantic reasoning in smart healthcare systems, the proposed ontology builds on previously defined context ontologies, integrated using UPON. The key benefits of the proposed context ontology are expressiveness and extensibility. The rest of this paper is organized as follows: related work is discussed in Sect. 2; the analysis of smart healthcare context properties is presented in Sect. 3; the proposed context ontology is described in Sect. 4; the applicability and validity of the context ontology are illustrated through a typical case study in Sect. 5; and the conclusions are presented in Sect. 6.
2 Related Work
Several authors have recognized the importance of ontologies in organizing and representing the massive context information in the smart healthcare domain to achieve interoperability among smart healthcare systems. For instance, [9] proposed an ontology that represents the context needed to provide healthcare monitoring services in pervasive healthcare systems. The ontology describes four fundamental concepts (personal data, sensor data, services, and host) and the relationships between them; physician and developer rules are then used for context reasoning. The authors in [10] developed an upper-level context ontology to model the daily activities of elderly people in a smart home. Context information such as user, activity, location, sensors, physical objects, and temporal information forms the upper-level ontology, which provides a homogeneous view over heterogeneous data and generates semantics for activity modeling and context representation. Other contributions, such as ontology-based real-time data modeling and knowledge representation for a smart healthcare system [11] and a medical ontology for the effective management of healthcare systems during emergencies in a dynamic environment [12], focus more on medical information than on the user context necessary for providing personalized smart healthcare services.
Although the related studies have contributed immensely to context ontologies for the ubiquitous healthcare domain, the usefulness of other ontologies should not be ignored. The evaluation of the applicability and validity of context ontologies should be extended with real data in real case studies, which will improve the quality of smart healthcare systems and thereby make it possible to provide the required services to healthy individuals, elderly people, and patients with chronic diseases [13].
3 Smart Healthcare Context Properties Analysis
To analyze the context properties in the smart healthcare domain, the Feature-Oriented Domain Analysis (FODA) technique of Software Product Line (SPL) engineering is adapted to capture the commonality and variability of the smart healthcare domain. This involves domain planning, feature identification, feature extraction based on six case studies from healthy-living and assisted-living smart healthcare, and commonality and variability analysis. The activities in the FODA technique are shown in Fig. 1. The six case studies identified for feature extraction are as follows:
1. Healthy-living case study: sustAGE, smart healthcare for the sustainable well-being of employees in EU industries
2. Healthy-living case study: sports performance monitoring and injury prevention framework
3. Healthy-living case study: food and activity tracking for a disease prevention system
4. Assisted-living case study: SMART BEAR, a smart living and healthcare system for the elderly
5. Assisted-living case study: xVLEPSIS, a smart non-invasive healthcare monitoring system for infants
6. Assisted-living case study: smart heart disease monitoring system
Fig. 1. The Feature-oriented domain analysis (FODA) technique.
The existing commonality and variability in the smart healthcare domain are identified based on the fundamental concepts proposed in [14] and [15]. With the identified commonality and variability within the context properties of smart healthcare, the context ontology can be developed with flexibility and reusability.
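The commonality and variability analysis described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tooling: the features assigned to each case study are hypothetical placeholders, since the extracted features are not listed in the text. Features present in every case study are treated as common (mandatory, in FODA terms), and all others as variable.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical features per case study: placeholders, not the paper's data.
CASE_STUDY_FEATURES: dict[str, set[str]] = {
    "sustAGE": {"user_profile", "location", "wearable_sensor", "alert"},
    "sports_monitoring": {"user_profile", "location", "wearable_sensor", "activity"},
    "food_activity_tracking": {"user_profile", "activity", "alert"},
    "SMART_BEAR": {"user_profile", "location", "home_sensor", "alert"},
    "xVLEPSIS": {"user_profile", "camera", "alert"},
    "heart_monitoring": {"user_profile", "wearable_sensor", "alert"},
}


def commonality_and_variability(
    features_by_case: dict[str, set[str]],
) -> tuple[set[str], set[str]]:
    """Split features into common (in every case study) and variable (in some)."""
    feature_sets = list(features_by_case.values())
    common = set.intersection(*feature_sets)
    variable = set.union(*feature_sets) - common
    return common, variable


def feature_frequency(features_by_case: dict[str, set[str]]) -> Counter:
    """Count how many case studies exhibit each feature."""
    return Counter(f for features in features_by_case.values() for f in features)


if __name__ == "__main__":
    common, variable = commonality_and_variability(CASE_STUDY_FEATURES)
    print("common:", sorted(common))  # common: ['user_profile']
    print("variable:", sorted(variable))
    print(feature_frequency(CASE_STUDY_FEATURES).most_common(2))
    # [('user_profile', 6), ('alert', 5)]
```

The frequency count is a simple aid for deciding which variable features are near-universal (candidates for optional-but-default) and which are case-specific.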
4 Context Ontology for Mobile Service
Instead of reinventing the wheel in developing the proposed context ontology for mobile web services in smart healthcare, this work adopts the Unified Process for ONtology building (UPON Lite) methodology [16], which is a lightweight extension of the NeOn (Networked Ontologies) methodology [17]. The six steps of UPON Lite are shown in Fig. 2.
Fig. 2. The six steps of the UPON Lite methodology.
The first two steps of the UPON Lite methodology, which concern the domain, were carried out in Sect. 3; the remaining steps, which concern the ontology itself, are discussed in this section. Classes are used to concretely represent ontology concepts during ontology construction. The main classes are the “User”, “Services”, and “Environment” classes. Figure 3 presents the taxonomy (hierarchical representation of the concepts) of the smart healthcare context ontology. Relationships such as “hasCharacteristics” and “isLocatedIn” express the interactions between ontology concepts, while attributes describe the concepts themselves. Figure 4 shows an overview of the context ontology's classes, object properties, data properties, and individuals in Protégé.
Fig. 3. The taxonomy of the smart healthcare context ontology
Fig. 4. An overview of the context ontology classes and properties in Protégé
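To make the three-class structure concrete, the following Python sketch encodes a tiny fragment of such an ontology as subject, predicate, object triples and runs a simplified consistency check, loosely in the spirit of what a reasoner such as Pellet does. It is an illustration under stated assumptions, not the authors' ontology: the subclasses (Worker, Patient, CareMessage, Location, Device), the individuals, the domain and range of isLocatedIn, and the disjointness of the three top-level classes are all assumptions. Note also that in OWL, domain and range axioms normally cause types to be inferred rather than violations to be reported; this sketch deliberately treats them as closed-world constraints to keep the check simple.

```python
from __future__ import annotations

# Three top-level context classes from the paper. Treating them as pairwise
# disjoint is an assumption made for this sketch.
TOP_LEVEL_CLASSES = {"User", "Services", "Environment"}

# Hypothetical subclasses for illustration (child -> parent).
SUBCLASS_OF = {
    "Worker": "User",
    "Patient": "User",
    "CareMessage": "Services",
    "Location": "Environment",
    "Device": "Environment",
}

# Assumed (domain, range) of the isLocatedIn object property.
PROPERTY_SIGNATURES = {"isLocatedIn": ("User", "Environment")}

Triple = tuple[str, str, str]

# Hypothetical individuals and assertions.
TRIPLES: list[Triple] = [
    ("worker1", "type", "Worker"),
    ("assemblyLine", "type", "Location"),
    ("worker1", "isLocatedIn", "assemblyLine"),
]


def superclasses(cls: str) -> set[str]:
    """Return cls plus all of its ancestors in the taxonomy."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def types_of(individual: str, triples: list[Triple]) -> set[str]:
    """Asserted types of an individual, closed under subclass-of."""
    inferred: set[str] = set()
    for s, p, o in triples:
        if s == individual and p == "type":
            inferred |= superclasses(o)
    return inferred


def consistency_violations(triples: list[Triple]) -> list[str]:
    """Report disjointness clashes and domain/range mismatches (closed-world)."""
    problems: list[str] = []
    individuals = {s for s, p, _ in triples if p == "type"}
    for ind in sorted(individuals):
        top = types_of(ind, triples) & TOP_LEVEL_CLASSES
        if len(top) > 1:
            problems.append(f"{ind} is in disjoint classes {sorted(top)}")
    for s, p, o in triples:
        if p in PROPERTY_SIGNATURES:
            domain, rng = PROPERTY_SIGNATURES[p]
            if domain not in types_of(s, triples):
                problems.append(f"{s} violates domain {domain} of {p}")
            if rng not in types_of(o, triples):
                problems.append(f"{o} violates range {rng} of {p}")
    return problems


if __name__ == "__main__":
    print(sorted(types_of("worker1", TRIPLES)))  # ['User', 'Worker']
    bad = TRIPLES + [("assemblyLine", "isLocatedIn", "worker1")]
    for problem in consistency_violations(bad):
        print(problem)
```

Keeping only three disjoint top-level classes means each individual needs a single class assertion for the check to place it, which reflects the simplicity argument made for the ontology.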
5 The Evaluation of the Proposed Context Ontology
To evaluate the proposed context ontology and demonstrate its applicability, the sustAGE case study is used. Figure 5 shows the system architecture of the sustAGE case study discussed in [18].
Fig. 5. The sustAGE system architecture
The context spectrums and context situations of the sustAGE case study are organized according to the proposed context ontology, so that all the available context can be represented for inference generation in the smart healthcare system, using Protégé together with the Pellet reasoner. Protégé was chosen because it is the most widely used tool for expressive, fast, and flexible ontology development [19]. Figure 6 shows the description and property assertions of an assembly line worker in the sustAGE case study.
Fig. 6. The description and property assertion of an assembly line worker
For example, the sustAGE system can infer that the CareMessage service should alert users (e.g., an assembly line worker) via SMS, alarm, etc., depending on the user's preferences, the devices in the environment, and environmental conditions such as proximity to hazardous conditions. The inference rule for the CareMessage service, for the case in which the proximity to hazardous conditions is 100 m, is as follows:
• Worker(?w) ^ isLocatedIn(?w, ?l) ^ Location(?l) ^ locationType(?t, "hazardous conditions") ^ proximity(?p, 100) -> AlertWorker(?w)
The above rule is written in the Semantic Web Rule Language (SWRL), one of the methods used to create rules that govern inference generation and reasoning in smart healthcare systems. No inconsistency was detected among the axioms, classes, object properties, datatype properties, and individuals of the proposed context ontology.
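The CareMessage rule can be illustrated with a small forward-chaining evaluation in Python. This is a sketch under stated assumptions, not how Protégé or Pellet evaluates SWRL: as printed, the rule leaves ?t and ?p unconnected to the worker's location, so the sketch assumes both refer to ?l and matches the 100 m proximity value literally. The individuals and the prefersChannel property used to pick an alert channel are hypothetical.

```python
from __future__ import annotations

# Hypothetical class assertions: (class, individual).
CLASS_FACTS = {
    ("Worker", "worker1"),
    ("Worker", "worker2"),
    ("Location", "pressArea"),
    ("Location", "canteen"),
}

# Hypothetical property assertions: (property, subject, value).
PROPERTY_FACTS = {
    ("isLocatedIn", "worker1", "pressArea"),
    ("isLocatedIn", "worker2", "canteen"),
    ("locationType", "pressArea", "hazardous conditions"),
    ("locationType", "canteen", "rest area"),
    ("proximity", "pressArea", 100),       # metres, per the rule's case
    ("prefersChannel", "worker1", "SMS"),  # hypothetical property
}


def values(prop: str, subject: str) -> list:
    """All values asserted for the given property and subject."""
    return [v for p, s, v in PROPERTY_FACTS if p == prop and s == subject]


def care_message_alerts() -> set[str]:
    """Bind ?w and ?l, test every body atom, and collect AlertWorker(?w)."""
    alerted: set[str] = set()
    for cls, w in CLASS_FACTS:                    # Worker(?w)
        if cls != "Worker":
            continue
        for loc in values("isLocatedIn", w):      # isLocatedIn(?w, ?l)
            if (
                ("Location", loc) in CLASS_FACTS  # Location(?l)
                and "hazardous conditions" in values("locationType", loc)
                and 100 in values("proximity", loc)  # assumed proximity(?l, 100)
            ):
                alerted.add(w)                    # AlertWorker(?w)
    return alerted


def alert_channel(worker: str, default: str = "alarm") -> str:
    """Pick the worker's preferred channel, falling back to a default."""
    preferred = values("prefersChannel", worker)
    return preferred[0] if preferred else default


if __name__ == "__main__":
    for worker in sorted(care_message_alerts()):
        print(f"CareMessage to {worker} via {alert_channel(worker)}")
```

A single pass is enough here because the rule's head (AlertWorker) never appears in its own body; rule sets that chain into each other would need iteration until no new facts are derived.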
6 Conclusion and Future Work
This paper proposes an improved context ontology for smart healthcare systems. The FODA technique of SPL engineering is adapted for domain analysis, complementing the domain experts' knowledge captured in existing ontologies. The UPON Lite methodology is used to develop the proposed ontology, and its applicability is validated using the sustAGE smart healthcare case study. Overall, the results indicate that the proposed context ontology is consistent, expressive, and extensible, and that it can be used for context reasoning in smart healthcare systems. Future investigations will consider further smart system case studies and the development of smart healthcare systems based on the proposed context ontology to further validate the conclusions drawn from this study.
Acknowledgments. We would like to thank the Ministry of Education (MOE) Malaysia for sponsoring the research through the Fundamental Research Grant Scheme (FRGS) with vote number 5F080 and Universiti Teknologi Malaysia for providing the facilities and supporting the research. In addition, we would like to extend our gratitude to the lab members of the Software Engineering Research Group (SERG), School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia for their invaluable ideas and support throughout this study.
References
1. Park, S.J., et al.: Development of the elderly healthcare monitoring system with IoT. In: Advances in Human Factors and Ergonomics in Healthcare, vol. 482, pp. 309–315. Springer (2017)
2. Khalaf, O.I., Sabbar, B.M.: An overview on wireless sensor networks and finding optimal location of nodes. Periodicals Eng. Natural Sci. 7(3), 1096–1101 (2019)
3. Salman, A.D., Khalaf, O.I., Abdulsahib, G.M.: An adaptive intelligent alarm system for wireless sensor network. Indonesian J. Electr. Eng. Comput. Sci. 15(1), 142–147 (2019)
4. Khalaf, O.I., Abdulsahib, G.M., Kasmaei, H.D., Ogudo, K.A.: A new algorithm on application of blockchain technology in live stream video transmissions and telecommunications. Int. J. e-Collaboration 16(1), 16–32 (2020)
5. Cabrera, O., Franch, X., Marco, J.: Ontology-based context modeling in service-oriented computing: a systematic mapping. Data Knowl. Eng. 110(May), 24–53 (2017)
6. Munir, K., Sheraz Anjum, M.: The use of ontologies for effective knowledge modelling and information retrieval. Appl. Comput. Inf. 14(2), 116–126 (2018)
7. Pradeep, P., Krishnamoorthy, S.: The MOM of context-aware systems: a survey. Comput. Commun. 137(January), 44–69 (2019)
8. Bagtharia, P., Bohra, M.H.: An optimal approach for web service selection. In: Proceedings of the 3rd International Symposium on Computer Vision and the Internet - VisionNet 2016, pp. 121–125 (2016)
9. HameurLaine, A., Abdelaziz, K., Roose, P., Kholladi, M.-K.: Ontology and rules-based model to reason on useful contextual information for providing appropriate services in U-healthcare systems. In: Intelligent Distributed Computing VIII, pp. 301–310. Springer (2015)
10. Ni, Q., García Hernando, A.B., De La Cruz, I.P.: A context-aware system infrastructure for monitoring activities of daily living in smart home. J. Sensors 2016, 1–9 (2016)
11. Abatal, A., Khallouki, H., Bahaj, M.: A smart interconnected healthcare system using cloud computing. In: ACM International Conference Proceeding Series (2018)
12. Zeshan, F., Mohamad, R.: Medical ontology in the dynamic healthcare environment. Procedia Comput. Sci. 10, 340–348 (2012)
13. Gubert, L.C., da Costa, C.A., da Rosa Righi, R.: Context awareness in healthcare: a systematic literature review. Universal Access in the Information Society (2019)
14. Aguilar, J., Jerez, M., Rodríguez, T.: CAMeOnto: context awareness meta ontology modeling. Appl. Comput. Inf. 14(2), 202–213 (2018)
15. Lu, Z.J., Li, G.Y., Pan, Y.: A method of meta-context ontology modeling and uncertainty reasoning in SWoT. In: Proceedings - 2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2016, pp. 128–135 (2017)
16. De Nicola, A., Missikoff, M.: A lightweight methodology for rapid ontology engineering. Commun. ACM 59(3), 79–86 (2016)
17. Suárez-Figueroa, M.C., Gómez-Pérez, A., Fernández-López, M., Benjamins, V.R.: The NeOn methodology for ontology engineering. In: Ontology Engineering in a Networked World, pp. 9–34. Springer (2012)
18. Pateraki, M., et al.: Biosensors and Internet of Things in smart healthcare applications: challenges and opportunities. Wearable Implantable Med. Devices 5, 25–53 (2020)
19. Musen, M.A.: The protégé project. AI Matters 1(4), 4–12 (2015)
A Modified UTAUT Model for Hospital Information Systems Geared Towards Motivating Patient Loyalty
Saleh Nasser Rashid Alismaili1,2(B), Mohana Shanmugam1, Hairol Adenan Kasim1, and Pritheega Magalingam3
1 College of Informatics and Computing, Universiti Tenaga Nasional, Kajang, Malaysia
2 Directorate of Information Technology, Ministry of Health, Muscat, Sultanate of Oman
3 Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Skudai, Malaysia
Abstract. Healthcare service institutions (HSIs) have sought ways to motivate patient loyalty in response to surging rates of medical tourism. Previous research indicates that Hospital Information Systems (HIS) are essential for HSIs to gather, measure, and analyze the massive amounts of data required to generate patient loyalty. There is currently no consensus on the factors that comprise an HIS specifically geared towards motivating patient loyalty (HISPL). Furthermore, an HIS must be fully adopted by HSI staff to be effective. Thus, to avoid wasting HSI resources, it is necessary to predict whether a given HISPL is likely to be adopted. The purpose of this study is to reveal the factors that comprise HISPL and to modify the Unified Theory of Acceptance and Use of Technology (UTAUT) model to help predict the likelihood of an HISPL being fully adopted by HSI staff. The results revealed that the pertinent HISPL factors are capability, configurability, ease of use/help desk availability and competence (EU), and accessibility/shareability (AS). Using these factors, the UTAUT model was modified to fit the specific needs of HISPL. The modifications are theoretical and will have to be validated in future empirical studies.
Keywords: Hospital Information System · Patient loyalty · UTAUT
1 Introduction
Healthcare service institutions (HSIs) seeking to motivate patient loyalty have turned to Hospital Information Systems (HIS) as a means of achieving their organizational goals. An HIS is a “comprehensive, integrated Information System (IS) designed to manage the administrative, financial, and clinical aspects of a hospital” [1]. When implemented correctly, an HIS can greatly improve healthcare quality and efficiency, as well as improve patient outcomes and reduce medical errors by freeing medical staff to focus solely on their jobs [2]. An HIS designed with the express purpose of motivating patient loyalty will be referred to in this study as an HISPL. The primary goal of an effective HIS is to meet the needs of patients [3]. However, because patients do not interact with an HIS directly, they can perceive its effects only indirectly, that is, through the improved performance of the HSI staff it empowers [4]. An HIS can motivate patient loyalty by empowering HSI staff, which improves patients' perceived service quality; the higher patients' perceptions of service quality, the more likely they are to report being satisfied with their care, which in turn motivates patient loyalty [5–10]. Prior research indicates that the effectiveness of an HIS depends largely on its full-scale adoption by its end users, the HSI staff [11, 12]. An HIS may be well designed and fully functional, but if HSI staff do not use it across the board, it will be no better than a malfunctioning HIS [11, 12]. It is therefore necessary to ensure that an HIS meets the needs of HSI staff if it is to be effective in a real-world context [13]. To avoid wasting resources, it is essential to predict whether HSI staff are likely to adopt an HIS before an HSI deploys it. To predict the likelihood of HIS adoption, this study uses the Unified Theory of Acceptance and Use of Technology (UTAUT) model. UTAUT was selected because it is the culmination of previous technology acceptance models in the field [14, 15] and is the most widely cited model of individual technology acceptance and use [15]. However, the original UTAUT model is not sufficient to predict HIS adoption, as it is a general model that must be revised for the context in which it is used [14]; indeed, the popularity of UTAUT is due in no small part to its adaptability to different contexts. The original UTAUT model was developed outside the healthcare industry and focuses on individual adoptive behaviors [14].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 207–216, 2021. https://doi.org/10.1007/978-3-030-70713-2_21
As HSIs are large organizations, the UTAUT model must therefore be modified if it is to be used in this capacity [16]. There is currently no research on modifying the UTAUT model to predict the adoption of HISPL. Doing so will require expanding the original model's focus from individuals to entire HSIs, and the modified model will also have to account for the needs of HSI staff. Given the paucity of research on HISPL and the lack of consensus on its factors, a literature review is necessary to modify the UTAUT model for predicting HISPL adoption. The objective of this study is to reveal the relevant factors of HISPL according to the literature and to modify the original UTAUT model to accommodate them. To that end, the following research questions were formulated:
1. What are the factors of HISPL revealed in the literature?
2. How can the original UTAUT model be modified to address the HISPL factors revealed in the literature?
The rest of this study is organized as follows: Sect. 2 reviews the literature on the factors that comprise HISPL, and Sect. 3 modifies the UTAUT model to account for the HISPL factors revealed in the review.
2 Literature Review
This section presents the pertinent findings of the review, which reveal the factors of HISPL. The findings are divided into the four factors found to have a significant impact on HISPL, namely capability, configurability, ease of use/help desk availability and competence, and accessibility/shareability.
2.1 Capability
Capability refers to the technical aspects of the HIS, or the ability of the HIS to accomplish healthcare goals. This includes the HIS's ability to collect pertinent medical data and to facilitate communication between different individuals and departments within an HSI, an appropriate system architecture, and fast response times, all while keeping the collected data confidential and secure from outsiders [17]. Collecting, storing, and analyzing data is a necessary component of any effective HSI, especially as the volume of healthcare data grows worldwide [18]. Although an HIS eases billing and payment procedures, some scholars have argued that past IT investments by HSIs have focused on this issue and have so far failed to capitalize on the other benefits IT offers for the clinical needs of HSIs, especially from the patients' perspective [2, 19]. This lack of investment in easing the medical process for patients manifests itself in how difficult it is to obtain medical treatment in different parts of the world, compared with how similarly information-rich industries function [19]. Thus, it is fundamental for an HIS to be designed as patient-centered, so that it helps HSIs meet the needs of patients and not just their own needs.
2.2 Configurability
Configurability refers to the ability of an HIS to be modified depending on the needs of the medical personnel using it. Systems and work practices must be able to adapt to each other for an HIS to be effective [11].
Work practices refer to the “practices, procedures, and norms” at a given HSI [20]. Because each HSI can have different areas of focus and requirements, an effective HIS must be easily configurable so that it can adapt to any foreseeable situation [21, 22]. In short, given the multitude of contexts and situations of individual HSIs, an effective HIS must be an open, generic system that prioritizes flexibility so that it can be configured for each HSI's specific needs [20]. An added complication is that IT vendors may routinely ignore configuration requests without explanation, leaving staff with formal templates in the IT solutions that they may feel fail to address the core tasks that necessitated the solutions in the first place. HIS developers must endeavor to make changes to the system intuitive for most HSI IT departments, and IT vendors must assist if IT departments need help configuring the HIS to their own specifications. Similarly, if end users have issues with the HIS, their IT departments must be able to resolve them, with the assistance of IT vendors if needed [23, 24].
2.3 Ease of Use/Help Desk Availability and Competence (EU)
While younger medical staff have been observed to be more likely to engage proactively with IT solutions in healthcare, older medical staff may have the same desire but lack the competence, which could easily be acquired if IT training were presented properly [24]. Thus, an IT solution for HSIs must accommodate staff members who are not fully competent in the new technology by being user-friendly, to encourage full compliance and support their learning process. If medical staff still have difficulty adapting, a help desk tasked with resolving issues arising from the IT solution must be ready to assist competently and promptly, in a manner that does not dissuade staff from asking for help [23, 24]. The aforementioned difficulty with older medical staff can be compounded in HSIs in developing countries, where computer literacy, technological competence, and willingness to learn among non-IT staff may be low. While newer IT technologies tend to be more streamlined and thus much simpler to learn and use, older staff may prefer older systems with which they are already familiar [21]. EU is important in assisting the adoption of new systems, whether through on-site technical support or remotely, e.g., via a call center [21]. This is because an HIS's ease of use can shorten the learning curve, while help desk availability ensures that any difficulties learners face are addressed promptly, thus easing the fears of some medical staff that learning a new system would slow them down and potentially jeopardize their patients' outcomes.
2.4 Accessibility/Shareability
For an HIS to be effective, the information it handles must be easily and conveniently accessible to the many different departments and individuals within an HSI, to avoid delays, inconveniences, and potential harm [22]. Granting relevant medical staff efficient access to medical information not only shortens service time but also improves service quality by allowing different medical teams to coordinate with each other without losing time [25, 26]. For example, vascular surgery outpatient appointments have become more difficult to manage in recent years because of missing information [27]. As this information is essential for the success of the operation, medical staff must do additional work to track it down, time that could be spent more productively. Sources of this information include patient notes, referral letters from general practitioners or the patients' original doctors, and recent test results and scans [27]. The information must also be shareable among staff because inpatient and outpatient departments in HSIs tend to be structured independently of each other, and records from one may have to be collected again once the patient enters the other, wasting time and resources [28]. This is especially important when a patient, over the course of an illness, passes through several phases of treatment located in different departments within the HSI. For example, a person who enters the emergency room with a broken leg may be admitted to the inpatient department for further observation and then transferred to home care for rehabilitation. Each of these departments may not have access to information obtained in the others, necessitating the collection of redundant information and causing delays and a lack of transparency, both of which can affect the quality of healthcare as well as patients' perceptions of it [28]. Furthermore, reliable staff access to medical data depends on a robust computer network with minimal delays. The accessibility/shareability of medical information is essential to the pursuit of quality healthcare [25, 26]: the more comprehensive a patient's medical record is, the more likely medical staff are to make well-informed decisions about the patient's care. Computerized medical records can drastically improve the quality and efficiency of HSIs because they are much easier to share across healthcare providers than paper records [19]. The increased shareability of patient records and other important medical information could help shorten waiting periods and make medical interactions between departments and HSIs more efficient [25].
3 Modified UTAUT Model for HISPL
The Unified Theory of Acceptance and Use of Technology (UTAUT) model was developed by [14, 15] to consolidate disparate views on technology acceptance into a single coherent model. UTAUT theorized that such models share four core constructs that motivate behavioral intention (performance expectancy, effort expectancy, social influence, and facilitating conditions), and that these constructs are, in turn, moderated by individual characteristics, namely age, gender, voluntariness, and experience [14]. The original UTAUT model is shown in Fig. 1. Performance expectancy refers to the degree to which an individual perceives that adopting a technology will help their job performance. Effort expectancy refers to the degree to which an individual perceives a technology as easy to adopt. Social influence refers to the degree to which an individual perceives that others believe they should adopt a technology. Facilitating conditions refer to the degree to which an individual perceives the technical and organizational support they receive in adopting a technology [14]. HISPL requires full adoption to be effective [12]: without full compliance by HSI staff, even an excellent HISPL will fail to achieve its goals [11]. Thus, to ensure that HSI resources are not wasted needlessly in developing HISPL, a technology adoption model is needed to assess whether HSI staff are likely to adopt the HISPL. The choice of UTAUT as the study's base technology adoption model is justified by its current status as the most cited model of individual technology acceptance and use [16]. Within healthcare, UTAUT has been used most often to predict the adoption of electronic medical records [29–31], but it has also been used to predict the acceptance of Information Systems (IS) among healthcare professionals [14, 32, 33]. To modify the UTAUT model, the results of the literature review were used to reveal the factors that influence the adoption and use of HISPL. The revealed factors were capability, configurability, EU, and AS.
S. N. R. Alismaili et al.
Fig. 1. Original UTAUT model [14–16]
UTAUT was modified by eliminating the original UTAUT variables that, according to the literature review, have no relation to HISPL. This meant eliminating social influence and facilitating conditions, and splitting performance expectancy and effort expectancy each into two distinct constructs, mirroring HSI professionals' distinction between capability and configurability and between EU and AS. Figure 2 shows the modified UTAUT model, based on the HISPL factors identified in the literature review. The choice to simplify the original model by reducing the moderating factors to one, namely age, is supported by [16], a study that tailored the UTAUT model to electronic health records. The modification is grounded in the close correspondence between the modified constructs and the original UTAUT constructs, which have already been validated in previous studies. Assimilating the HISPL factors identified by the literature review into the original UTAUT model to create an HISPL-specific UTAUT model is justified in [14], and the same method has been used in other studies of UTAUT in healthcare contexts [34, 35]. A further reason for keeping age as the sole moderating factor is the strong evidence of age's significant effect on technology adoption in developing countries such as Cameroon [36], Ghana [35, 37], and Brazil [38]. Age may play a larger role than the other moderating factors because it affects HSI staff's ability to learn a new system [21, 24], whereas the effects of gender, voluntariness, and experience are superseded by the organizational requirement to adopt new technologies such as HIS. The modified UTAUT model presented in this study is theoretical and will have to be validated in a future empirical study in a real-life healthcare context.
Based on the preceding literature review and discussion, it can be hypothesized that:
H1: Capability has a significant positive effect on HISPL adoption among HSI staff.
H2: Configurability has a significant positive effect on HISPL adoption among HSI staff.
H3: Ease of use/help desk availability and competence has a significant positive effect on HISPL adoption among HSI staff.
H4: Accessibility/Shareability has a significant positive effect on HISPL adoption among HSI staff.
A Modified UTAUT Model for Hospital Information Systems
Fig. 2. Modified UTAUT model for HISPL
H5: The influence of capability on HISPL adoption among HSI staff is moderated by age.
H6: The influence of configurability on HISPL adoption among HSI staff is moderated by age.
H7: The influence of ease of use/help desk availability and competence on HISPL adoption among HSI staff is moderated by age.
H8: The influence of accessibility/shareability on HISPL adoption among HSI staff is moderated by age.
The modified UTAUT model is part of an ongoing study on patient loyalty. The potential effect of HIS on patient loyalty is illustrated in Fig. 3:
Fig. 3. Relationship between HIS and Patient Loyalty [5–10]
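Hypotheses H1–H8 correspond to a moderated regression: adoption is regressed on the four HISPL factors (H1–H4), on age, and on the product of each factor with age (H5–H8), so that a non-zero interaction coefficient is what "moderated by age" would mean empirically. The paper leaves this test to future work and does not name an estimator, so the Python sketch below is only an illustration under stated assumptions: construct scores are averaged 1–5 Likert responses, age is mean-centered, the model is fitted by ordinary least squares on synthetic, noise-free data, and every name (`fit_moderated_model`, `FACTORS`, the dictionary keys) is hypothetical. A real analysis would also need standard errors or a structural equation model, which the sketch omits.

```python
import random

# Hypothetical construct keys: "eu" = ease of use/help desk, "as" = accessibility/shareability.
FACTORS = ["capability", "configurability", "eu", "as"]


def design_row(obs, mean_age):
    """One regressor row: intercept, H1-H4 main effects, centered age, H5-H8 interactions."""
    age_c = obs["age"] - mean_age
    row = [1.0]
    row += [obs[f] for f in FACTORS]          # main effects (H1-H4)
    row.append(age_c)                         # age main effect
    row += [obs[f] * age_c for f in FACTORS]  # age moderation terms (H5-H8)
    return row


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_moderated_model(data):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    mean_age = sum(o["age"] for o in data) / len(data)
    X = [design_row(o, mean_age) for o in data]
    y = [o["adoption"] for o in data]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    names = ["intercept"] + FACTORS + ["age"] + [f + "_x_age" for f in FACTORS]
    return dict(zip(names, solve(xtx, xty)))


# Synthetic, noise-free data with made-up effects, used only to check the fit.
random.seed(1)
ages = [random.uniform(22, 60) for _ in range(200)]
mean_age = sum(ages) / len(ages)
data = []
for age in ages:
    obs = {f: random.uniform(1, 5) for f in FACTORS}  # 1-5 construct scores
    obs["age"] = age
    age_c = age - mean_age
    obs["adoption"] = (0.5 + 0.4 * obs["capability"] + 0.3 * obs["configurability"]
                       + 0.2 * obs["eu"] + 0.1 * obs["as"] - 0.01 * age_c
                       - 0.02 * obs["capability"] * age_c)
    data.append(obs)

coef = fit_moderated_model(data)
for name, value in coef.items():
    print(f"{name:>22}: {value:+.3f}")
```

Because the synthetic data were generated with a capability-by-age effect of -0.02 and no other interactions, the fitted `capability_x_age` coefficient comes out at about -0.02 and the other interaction terms at about zero; with real survey data, the same coefficients would be judged by their significance levels.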
4 Conclusions and Directions for Future Work
The study conducted a literature review on the factors that make up a Hospital Information System specifically geared towards motivating patient loyalty (HISPL) in health service institutions (HSIs). HISPL is distinct from HIS in general, and no HISPL currently exists. While HIS can and does motivate patient loyalty indirectly by improving patient satisfaction, such effects are often secondary to its primary purpose of meeting administrative, financial, and clinical needs. For HISPL, attaining patient loyalty is the primary purpose. To that end, the literature review revealed that capability, configurability, ease of use/help desk availability and competence (EU), and accessibility/shareability (AS) are possible factors of HISPL. Given that the efficacy of HIS in general depends largely on its full-scale adoption by HSI staff, the UTAUT technology adoption model was modified to fit the HISPL factors identified in the literature review. In the modified UTAUT model, facilitating conditions and social influence were dropped because of the lack of support found in the literature review. Performance expectancy was divided into two distinct components to mirror the distinction found in the literature between capability and configurability, and effort expectancy was similarly divided into EU and AS. Only age was retained as a moderating factor, because it received strong support in UTAUT studies in the healthcare field (especially in developing countries, where HIS can bring the greatest improvements), whereas the other moderators received mixed evidence. Future research will test the variables of the modified UTAUT model for validity, with the ultimate goal of developing an HISPL for public HSIs in Oman.
References
1. Ahmadi, H., Nilashi, M., Ibrahim, O.: Organizational decision to adopt hospital information system: an empirical investigation in the case of Malaysian public hospitals. Int. J. Med. Inform. 84(3), 166–188 (2015)
2. Zakaria, N., Yusof, S.A.M.: Understanding technology and people issues in hospital information system (HIS) adoption: case study of a tertiary hospital in Malaysia. J. Infect. Public Health 9(6), 774–780 (2016)
3. Cantiello, J., Kitsantas, P., Moncada, S., Abdul, S.: The evolution of quality improvement in healthcare: patient-centered care and health information technology applications. J. Hosp. Adm. 5(2), 62–68 (2016)
4. Wijaya, E., Sulistyowati, N.: The effect of application of hospital information systems on operational performance through user satisfaction. Eur. J. Bus. Manag. 11(36), 71–78 (2019)
5. Juhana, D., Manik, E., Febrinella, C., Sidharta, I.: Empirical study on patient satisfaction and patient loyalty on public hospital in Bandung, Indonesia. Int. J. Appl. Bus. Econ. Res. 13(6), 4305–4326 (2015)
6. Lubis, A.N., Lumbanraja, P., Lubis, R.R., Hasibuan, B.K.: A study of service quality, corporate social responsibility, hospital image, and hospital value creation in Medan. Eur. Res. Stud. 20(4B), 125–133 (2017)
7. Meesala, A., Paul, J.: Service quality, consumer satisfaction and loyalty in hospitals: thinking for the future. J. Retail. Consum. Serv. 40, 261–269 (2018)
8. Asnawi, A., Awang, Z., Afthanorhan, A., Mohamad, M., Karim, F.: The influence of hospital image and service quality on patients’ satisfaction and loyalty. Manag. Sci. Lett. 9(6), 911–920 (2019)
9. Rahmadita, A., Yanuar, F., Devianto, D.: The construction of patient loyalty model using Bayesian structural equation modeling approach. CAUCHY 5(2), 73–79 (2018)
10. Tosyali, H., Sütcü, C.S., Tosyali, F.: Patient loyalty in the hospital patient relationship: the mediating role of social media. Erciyes İletişim Dergisi 6(1), 783–804 (2019)
11. Handayani, P.W., Hidayanto, A.N., Pinem, A.A., Hapsari, I.C., Sandhyaduhita, P.I., Budi, I.: Acceptance model of a hospital information system. Int. J. Med. Inform. 99, 11–28 (2017)
12. Narattharaksa, K.C., Speece, M.: Vendor relations and implementation of health IT projects. https://www.researchgate.net/profile/Mark_Speece2/publication/292988984_Vendor_relations_and_implementation_of_health_IT_projects/links/56b49ad008ae922e6c020216.pdf. Accessed 16 Aug 2020
13. Shahzad, K., Jianqiu, Z., Zia, M.A., Shaheen, A., Sardar, T.: Essential factors for adoption hospital information system: a case study from Pakistan. Int. J. Comput. Appl. 1–12 (2018)
14. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information technology: toward a unified view. MIS Q. 27, 425–478 (2003)
15. Venkatesh, V., Thong, J., Xu, X.: Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS Q. 36(1), 157–178 (2012)
16. Venkatesh, V., Sykes, T.A., Zhang, X.: ‘Just what the doctor ordered’: a revised UTAUT for EMR system adoption and use by doctors. In: 2011 44th Hawaii International Conference on System Sciences, p. 10. IEEE, January 2011
17. Farzandipour, M., Meidani, Z., Gilasi, H., Dehghan, R.: Evaluation of key capabilities for hospital information system: a milestone for meaningful use of information technology. Ann. Trop. Med. Public Health 10(6), 1579 (2017)
18. Roesems-Kerremans, G.: Big data in healthcare. J. Healthcare Commun. 1(4), 33 (2016)
19. Wang, Y., Kung, L., Byrd, T.A.: Big data analytics: understanding its capabilities and potential benefits for healthcare organizations. Technol. Forecast. Soc. Chang. 126, 3–13 (2018)
20. Hertzum, M., Simonsen, J.: Configuring information systems and work practices for each other: what competences are needed locally? Int. J. Hum. Comput. Stud. 122, 242–255 (2019)
21. Bawack, R.E., Kamdjoug, J.R.K.: Adequacy of UTAUT in clinician adoption of health information systems in developing countries: the case of Cameroon. Int. J. Med. Inform. 109, 15–22 (2018)
22. Malik, S.A., Nordin, A., Al-Ehaidib, R.N.: Requirements engineering (RE) process for the adaptation of the hospital information system (HIS). Int. J. Adv. Sci. Eng. Inf. Technol. 9(1), 8–17 (2019)
23. Bezboruah, K.C., Hamann, D.: Health IT adoption in nursing homes: the role of IT vendors. Int. J. Innov. Technol. Manag. 15(01), 1850001 (2018)
24. Abramson, E.L., Edwards, A., Silver, M., Kaushal, R.: Trending health information technology adoption among New York nursing homes. Am. J. Manag. Care 20(11 Spec No. 17), eSP53-9 (2014)
25. Holmgren, A.J., Patel, V., Charles, D., Adler-Milstein, J.: US hospital engagement in core domains of interoperability. Am. J. Manag. Care 22(12), e395–e402 (2016)
26. Liebe, J.D., Esdar, M., Hübner, U.: Measuring the availability of electronic patient data across the hospital and throughout selected clinical workflows. Stud. Health Technol. Inform. 253, 99–103 (2018)
27. Hurst, K., Kreckler, S., Handa, A.: Improving information availability in vascular surgical clinics. A service evaluation and improvement project. BMJ Open Qual. 5(1), u210012.w4177 (2016)
28. Kranz, A.M., Dalton, S., Damberg, C., Timbie, J.W.: Using health IT to coordinate care and improve quality in safety-net clinics. Joint Comm. J. Qual. Patient Saf. 44(12), 731–740 (2018)
29. Jewer, J.: Patients intention to use online postings of ED wait times: a modified UTAUT model. Int. J. Med. Inform. 112, 34–39 (2018)
30. Alam, M.Z., Hu, W., Barua, Z.: Using the UTAUT model to determine factors affecting acceptance and use of mobile health (mHealth) services in Bangladesh. J. Stud. Soc. Sci. 17(2), 137–172 (2018)
31. Cimperman, M., Brenčič, M.M., Trkman, P.: Analyzing older users’ home telehealth services acceptance behavior—applying an extended UTAUT model. Int. J. Med. Inform. 90, 22–31 (2016)
32. Sharifian, R., Askarian, F., Nematolahi, M., Farhadi, P.: Factors influencing nurses’ acceptance of hospital information systems in Iran: application of the unified theory of acceptance and use of technology. Health Inf. Manag. J. 43(3), 23–28 (2014)
33. Williams, M.D., Rana, N., Dwivedi, Y.K.: The unified theory of acceptance and use of technology (UTAUT): a literature review. J. Enterp. Inf. Manag. 28(3), 443–488 (2015)
34. Ahlan, A.R., Ahmad, B.I.E.: User acceptance of health information technology (HIT) in developing countries: a conceptual model. Procedia Technol. 16, 1287–1296 (2014)
35. Zhou, L.L., Owusu-Marfo, J., Antwi, H.A., Antwi, M.O., Kachie, A.D.T., Ampon-Wireko, S.: Assessment of the social influence and facilitating conditions that support nurses’ adoption of hospital electronic information management systems (HEIMS) in Ghana using the unified theory of acceptance and use of technology (UTAUT) model. BMC Med. Inform. Decis. Mak. 19(1), 230 (2019)
36. Ahlan, A.R., Ahmad, B.I.E.: An overview of patient acceptance of health information technology in developing countries: a review and conceptual model. Int. J. Inf. Syst. Project Manag. 3(1), 29–48 (2015)
37. Antwi, H.A., Yiranbon, E., Lulin, Z., Maxwell, B.A., Agebase, A.J., Yaw, N.E., Vakalalabure, T.T.: Innovation diffusion among healthcare workforce: analysis of adoption and use of medical ICT in Ghanaian tertiary hospitals. Int. J. Acad. Res. Bus. Soc. Sci. 4(7), 63 (2014)
38. Duarte, J.G., Azevedo, R.S.: Electronic health record in the internal medicine clinic of a Brazilian university hospital: expectations and satisfaction of physicians and patients. Int. J. Med. Inform. 102, 80–86 (2017)
Teamwork Communication in Healthcare: An Instrument (Questionnaire) Validation Process
Wasef Matar (1, corresponding author) and Monther Aldwair (2)
(1) University of Petra, Amman, Jordan
(2) College of Technological Innovation, Zayed University, Abu Dhabi, UAE
Abstract. Healthcare faces many problems, one of which lies in teamwork communication: current health information systems (HISs) lack teamwork communication tools. This study introduces a teamwork communication instrument (questionnaire) for healthcare, where teamwork communication plays a key role in the health information systems area. The research model proposed in this study was examined with a quantitative approach using a survey method. To formulate the problem, preliminary data were collected through a survey to test and introduce a validated instrument (questionnaire). This study proposes and validates an instrument (questionnaire) for use in studies of healthcare teamwork communication. The findings contribute to research on teamwork communication in healthcare and can serve as a reference for related healthcare communication studies. This study is the first of its kind in Jordan and adds a new dimension to the study of teamwork communication in healthcare.
Keywords: Teamwork communication · Clinical Pathways · Instrument (questionnaire) · Communication tools
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 217–229, 2021. https://doi.org/10.1007/978-3-030-70713-2_22
W. Matar and M. Aldwair
1 Introduction
Teamwork communication in healthcare is an important process for preventing medical errors. According to a report of the Institute of Medicine (IOM), 70% of medical errors are related to teamwork communication and 30% to other factors. Current Health Information Systems (HISs) lack support for teamwork communication among medical staff. The phrase "To err is human" has been applied to medical errors arising in diagnosis, treatment, prevention, and other areas. In the United States, 30%–45% of patients do not receive appropriate health care [1]. Effective use of HIS has improved the quality of treatment, patient safety, team climate, and clinical outcomes. Clinical Pathways (CP) may improve teamwork communication [2, 3]; CP is a system developed to apply a patient-centered approach. Clinical Pathways are defined as "A complex intervention for the mutual decision making and organization of predictable care for a well-defined group of patients during a well-defined period." (E-P-A, www.E-P-A.org). Clinical Pathways support teamwork communication, improve healthcare quality, and reduce cost [4–6]; thus, this system plays a key role in the quality of healthcare. By implementing CP in Jordan, healthcare teamwork communication can be enhanced and improved [4].
Teamwork communication has become a critical issue in the healthcare sector, and there is a pressing need to investigate it. Based on the literature and to the best of the researchers' knowledge, there is a lack of validated instruments (questionnaires) to measure teamwork communication in healthcare. Therefore, this paper has two aims: first, to develop an instrument (questionnaire) based on a model for implementing electronic Clinical Pathways; second, to validate the developed instrument so that it can serve as a reference in research on teamwork communication in healthcare. The proposed and validated instrument is based on the model in Fig. 3. The proposed model and the instrument were tested in the Jordanian healthcare sector in the context of implementing a Clinical Pathways system. The study was conducted in two university hospitals that use computerized healthcare systems. The researchers' inspections of these two hospitals found that neither applies Clinical Pathways. Moreover, staff in both hospitals still communicate by mobile phone and email rather than through the HIS. Compared with a computerized information system, these two channels have disadvantages and are less able to provide information and support communication. Thus, it is necessary to implement an effective communication system such as Clinical Pathways to improve teamwork communication based on the proposed model.
Teamwork Communication in Healthcare: An Instrument (Questionnaire)
2 Literature Review
Related work on Information Systems (IS) theories and teamwork communication models was reviewed and investigated in order to develop an instrument (questionnaire) that can serve as a reference for teamwork communication in healthcare. This study attempts to validate the proposed CP model and an instrument (questionnaire) for healthcare teamwork communication. Many HISs have failed around the world, and very few models integrate electronic clinical pathways into HISs [7, 8]. In addition, there is a trend towards improving communication and decision making in HISs, a goal that has not yet been achieved [9]. Therefore, there is a need for a model of successful communication and decision making, and for a valid instrument (questionnaire) to measure that model in healthcare communication. Current HISs were designed and developed for administrative purposes, so there is a need to differentiate between administrative and clinical processes. Most HISs in hospitals and medical centers support administrative processes and embrace, only to a limited extent, the clinical processes that can support teamwork communication. Consequently, treatment processes and teamwork communication and coordination are not supported by the HISs in use [10–13]. There are two approaches in HISs: patient-centered and disease-centered. In the disease-centered approach, each disease is treated individually, without considering the patient's other conditions and treatments. To develop and improve current HISs, it is necessary to switch from the disease-centered to the patient-centered approach, which treats the patient's case as a whole instead of as isolated diseases [10, 11, 14, 15]. The disease-centered approach supports administrative processes but is poor at supporting clinical processes, so a system that supports clinical processes is needed. Teamwork activities, especially communication, are essential processes that the disease-centered approach lacks, and it also lacks information on the flow of the treatment process. In contrast, in the patient-centered approach, patients are treated by considering all of their diseases as a whole, not in isolation from one another. The aim of the patient-centered approach is to improve healthcare quality and reduce medical errors. The patient-centered approach supports teamwork activities, of which communication and coordination are the main ones; its two key requirements are teamwork communication and care coordination [10, 11, 14, 16, 17]. This study reviewed information system studies and teamwork communication studies based on implementing Clinical Pathways as a clinical process. There is a lack of studies on Clinical Pathways from an information systems perspective, given the problematic history of HISs [18] and the lack of specific details on failures in HIS implementations; we surveyed the issues related to HISs and Clinical Pathways [7, 19]. Previous studies have not identified the importance of Clinical Pathways as a communication tool [2, 3]. In addition, there is no validated healthcare communication instrument (questionnaire) to test the model in Fig. 3 [10].
3 Model
As mentioned before, the aim of this study is to develop and validate an instrument (questionnaire) for teamwork communication in healthcare. To this end, socio-technical theory and the Donabedian model were integrated to support the proposed model for enhancing teamwork communication. A valid instrument (questionnaire) is needed to test the proposed model and to serve as a reference in the healthcare communication discipline for implementing electronic Clinical Pathways. Social and technical aspects are the two subsystems of socio-technical theory, depicted in Fig. 1; the two subsystems interact, and each supports the other through their interrelationship. The Donabedian model, depicted in Fig. 2, has three dimensions (structure, process, and outcomes). Its main objective is to enhance the quality of healthcare systems in general, and to serve as a reference for improving healthcare models and frameworks in particular [20]. In this study, we develop an instrument (questionnaire) to be used in healthcare to enhance teamwork communication. Socio-technical theory is recommended by many healthcare researchers for addressing the requirements of the 21st century [21–23]. Healthcare has a set of characteristics that are seen in three dimensions, and the nature of work is one of these characteristics viewed through socio-technical theory. Based on the literature, Clinical Pathways used as a tool to support teamwork communication can be viewed from the perspective of socio-technical theory [24]. Integrating human factors and healthcare quality models can improve performance, healthcare quality, and patient safety [25]. Socio-technical theory has two aspects (subsystems) but no factors or dimensions. Accordingly, no related questions or validated instrument (questionnaire) were found to test the model. Socio-technical theory is also known as the socio-technical systems (STS) approach, which holds that technical and social factors should be considered together in organizational systems design [26, 27]. The STS approach facilitates a better understanding of how social factors affect the use of technical systems. Figure 1 presents socio-technical theory.
[Figure: Social System (Structure, People) and Technical System (Technology, Process), with the label "MIS (Direct)"]
Fig. 1. Socio-technical theory
The Donabedian model provides a set of factors that are useful in designing and implementing HIS [20]. It addresses the structure and process of an interdisciplinary teamwork model and provides factors that support teamwork structure and teamwork process, which in turn support the design and implementation of HIS. Moreover, the model offers solutions to teamwork communication issues such as the lack of information systems that support treatment flow and shared-care requirements. Figure 2 presents the Donabedian model. The model has previously been tested with a qualitative method; in this paper, it is tested with a quantitative approach.
[Figure: Structure → Process → Outcome]
Fig. 2. Donabedian model
Validating and testing the instrument (questionnaire) are the main objectives of this study, in the context of teamwork communication in healthcare in Jordan and using socio-technical theory. Previous studies of healthcare communication considered communication between physicians and nurses using SBAR (Situation, Background, Assessment, and Recommendation), but did not consider communication among physicians, among nurses, and between the two groups based on a Clinical Pathways system [28]. The current instrument (questionnaire) is therefore based on these aspects. This study provides an instrument (questionnaire) based on electronic Clinical Pathways (CP). The model used to test and validate the instrument developed for teamwork communication in the healthcare sector is shown below.
[Figure: Social Factor (Communication Protocols, Internal Communication, External Communication, Teamwork Structure) and Technical Factor (Care Planning, Disease Planning, Discharge Planning, Information Sharing, Information Exchange) leading to Enhanced Teamwork Communication in Healthcare]
Fig. 3. Research model
4 Discussion
The main objective of this study is to propose and validate an instrument (questionnaire) that can serve as a reference for healthcare teamwork communication; the instrument is based on socio-technical theory and validated through the model shown in Fig. 3. The main contribution of this study is a validated instrument (questionnaire) developed from a review of the literature on teamwork communication in healthcare. The instrument includes 48 items measuring 10 factors. The only items excluded from the developed instrument were CP1 and CP2 (communication protocols), EC1 (external communication), and IE3 and IE6 (information exchange). This research contributes to the area by developing and validating the instrument using a sample of physicians and nurses in Jordanian hospitals. For HIS in Jordanian hospitals to be successful, its functions need to be developed to support teamwork communication among medical staff; without a validated instrument and responses from HIS users, this process may yield misleading results. This study therefore addresses the shortcomings of existing instruments for HIS by validating the instrument and model with responses from nurses and physicians in two hospitals in Jordan. Despite its limitations, this paper highlights several directions for future work, such as using the instrument to advance research on healthcare teamwork communication, both on communication through electronic Clinical Pathways and on teamwork communication in general.
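The Discussion reports which items were excluded (CP1, CP2, EC1, IE3, IE6) but, in this section, not the statistic behind that decision. One widely used internal-consistency check for Likert-type instruments is Cronbach's alpha, α = k/(k − 1) · (1 − Σσᵢ² / σ_T²), where k is the number of items in a construct, σᵢ² the variance of item i, and σ_T² the variance of respondents' total scores, often paired with an "alpha if item deleted" diagnostic. The Python sketch below assumes that kind of check; the function names, the item labels X1–X4, and the responses are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one construct; `items` holds one list of 1-5 responses
    per item, every list ordered by the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(answers) for answers in zip(*items)]  # each respondent's construct total
    item_var = sum(variance(col) for col in items)      # sum of item sample variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_deleted(items, labels):
    """Alpha recomputed with each item left out in turn."""
    return {
        label: cronbach_alpha(items[:i] + items[i + 1:])
        for i, label in enumerate(labels)
    }


# Hypothetical answers from six respondents to a four-item construct (not study data).
labels = ["X1", "X2", "X3", "X4"]
items = [
    [1, 2, 3, 4, 5, 5],
    [1, 2, 3, 4, 4, 5],
    [2, 2, 3, 4, 5, 5],
    [5, 1, 4, 2, 3, 1],  # X4 does not track the other three items
]

print(round(cronbach_alpha(items), 2))                  # 0.52
print(round(alpha_if_deleted(items, labels)["X4"], 2))  # 0.98
```

Dropping X4 raises alpha from about 0.52 to about 0.98, the pattern that would flag an item for removal under this criterion; other checks often reported when validating such instruments, such as factor loadings, are not shown.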
5 Conclusion
The instrument (questionnaire) developed in this work opens the door for researchers to explore teamwork communication in healthcare. In addition, it is a first building block that can contribute to teamwork communication in healthcare and other domains. To generalize the findings of this research, further studies are encouraged in other hospitals in Jordan and in developing countries. HIS in Jordan is a new technology that emerged from the needs of healthcare organizations to better serve their patients, improve healthcare quality, and prevent medical errors by enhancing teamwork communication between physicians and nurses. A limitation of this study is that it considers only physicians and nurses as the main users of Clinical Pathways. Future research can consider top management and communication between medical staff and patients.
6 Appendix A
Each construct is listed with the construct it comes from, its reference, the model or theory in which it was used, and its response scale, followed by its items.

Internal communication (coming from: Process planning; reference: [29]; used in: Process-oriented model; scale: 1–5)
Using information technology will improve internal communication and coordination
Using information technology will strengthen care processes planning
Using information technology will enable your hospital to adopt new organizational structures
Using information technology will improve decision-making
Using information technology will streamline care process

External communication (coming from: External communication; reference: [30]; used in: Self-categorization theory; scale: 1–5)
Using information technology facilitates members to communicate to other members during off time hours
Using information technology facilitates to get all information from other team members during off time hours
Using information technology makes it easy for members to communicate during off time hours
Using information technology facilitates communication with other agencies
Using information technology facilitates streamlining of clinical processes during off time hours

Communication protocols (coming from: Communication, reference [31], used in TeamSTEPPS framework; and Communication protocols, reference [32], used in this study; scale: 1–5)
Using specific terminologies in communication will improve patient care
Staff follows a standardized method of sharing information when handing off patients
Team members document and verify information that they receive from one another
Using briefing (surgical checklist) will improve patient care
Using debriefing (document treatment method) will improve patient care
Using method of SBAR (Situation, Background, Assessment, and Recommendation) will improve patient care

Teamwork structure (coming from: Formalization; reference: [33, 34]; used in: TeamSTEPPS framework; scale: 1–5)
Hospital rules and regulations affect teamwork structure composition
Using information technology will improve the structure and composition of teamwork
Using information technology will improve leadership in teamwork structure composition
Using information technology will improve care coordination in teamwork structure composition
Using information technology will improve communication in teamwork structure composition

Information sharing (coming from: Information sharing; reference: [35]; used in: Socio-technical theory, emergency system; scale: 1–5)
Using information sharing enables the hospital to work with other agencies cooperatively
Using information sharing enables better allocation of the resources
Using information sharing among physicians should be timely
Using information sharing during off time hours should be timely

Care planning (coming from: Care planning; reference: [36, 20]; used in: Donabedian model; scale: 1–5)
Hospital's policies have positive effect on process of care planning
Standardized care processes will improve care planning
Clinical guidelines will improve the structure of care planning
Using information technology will improve documentation care planning
Information technology will improve administrative procedures

Information exchange (coming from: Information exchange; reference: [37, 38]; used in: Donabedian model; scale: 1–5)
Physicians benefit from exchanging and combining ideas with one another
Physicians believe that by exchanging ideas they can improve healthcare quality
Nurses believe that by exchanging ideas they can improve healthcare quality
Hospital policies facilitate information exchange
My hospital has a standardized system for information exchange

Discharge planning (coming from: Discharge planning; reference: [20, 39]; used in: Donabedian model; scale: 1–5)
Using information technology facilitates pre-discharge instruction
Using information technology facilitates discharge planning process
Using information technology will improve coordination with some governmental agencies
Using information technology facilitates patients to reintegrate into the community

Disease planning (coming from: Disease planning; reference: [40]; used in: Donabedian model; scale: 1–5)
Using information technology will improve disease planning process
Using information technology will improve evaluation of disease planning process
Using information technology will improve physiological sessions that are needed to reduce potential disease
Other therapies are needed to manage diseases and it should be planned

Teamwork communication enhancement (coming from: Effective emergency system; reference: [35]; used in: this study; scale: 1–5)
Using information technology will improve teamwork communication quality
Using information technology will provide communication in a timely manner
Using information technology will streamline the care plan
Using information technology will improve quality of care
Acknowledgements. This work was supported in part by Zayed University Research Office, Research Cluster Award #R18054.
References
1. Gaddis, G.M., Greenwald, P., Huckson, S.: Toward improved implementation of evidence-based clinical algorithms: clinical practice guidelines, clinical decision rules, and clinical pathways. Acad. Emerg. Med. J. 14(11), 1015–1022 (2007)
2. Abu-Rish Blakeney, E., et al.: Purposeful interprofessional team intervention improves relational coordination among advanced heart failure care teams. J. Interprof. Care 33(5), 481–489 (2019)
3. Lawal, A.K., et al.: Development of a program theory for clinical pathways in hospitals: protocol for a realist review. Syst. Rev. 8(1), 1–7 (2019)
4. Deneckere, S., et al.: Care pathways lead to better teamwork: results of a systematic review. Soc. Sci. Med. 75(2), 264–268 (2012)
5. Adeyemi, S., Demir, E., Chaussalet, T.: Towards an evidence-based decision making healthcare system management: modelling patient pathways to improve clinical outcomes. Decis. Support Syst. 55(1), 117–125 (2013)
6. Shoji, F., et al.: Assessing a clinical pathway to improve the quality of care in pulmonary resections. Surg. Today 41(6), 787–790 (2011)
7. Logan, J.: Electronic health information system implementation models – a review. Stud. Health Technol. Inform. 178, 117–123 (2012)
8. Sherifi, D.: FITT Model, in Portable Health Records in a Mobile Society, pp. 199–208. Springer (2019)
9. Sacchi, L., et al.: Personalization and patient involvement in decision support systems: current trends. Yearb. Med. Inform. 10(1), 106 (2015)
10. ALsalamah, H.: Supporting integrated care pathways with workflow technology. Cardiff University (2012)
11. Ward, M.M., et al.: Clinical information system availability and use in urban and rural hospitals. J. Med. Syst. 30(6), 429–438 (2006)
12. Lenz, R., Reichert, M.: IT support for healthcare processes – premises, challenges, perspectives. Data Knowl. Eng. 61(1), 39–58 (2007)
13. Swenson, K.D., Palmer, N., Silver, B.: Taming the Unpredictable: Real World Adaptive Case Management: Case Studies and Practical Guidance. Future Strategies Inc. (2011)
14. Executive, N.: The NHS Plan: a plan for investment, a plan for reform, in London: Department of Health. Colegate (2000)
15. Marjanovic, S., et al.: Innovating for improved healthcare: sociotechnical and innovation systems perspectives and lessons from the NHS. Sci. Public Policy 47(2), 283–297 (2020)
16. Majidi, M., Mahdavi, H., Siamian, H.: Patients’ information needs in affiliated hospitals of Tehran University of Medical Sciences, vol. 723 (2012)
17. Luxford, K., Safran, D.G., Delbanco, T.: Promoting patient-centered care: a qualitative study of facilitators and barriers in healthcare organizations with a reputation for improving the patient experience. Int. J. Qual. Health Care 23(5), 510–515 (2011)
18. Ibrahim, W.M.R.: Situation analysis for clinical pathways and teamwork communication in healthcare. Indian J. Sci. Technol. 9, 28 (2016)
19. Robert, G., et al.: Organisational factors influencing technology adoption and assimilation in the NHS: a systematic literature review. Report for the National Institute for Health Research Service Delivery and Organisation programme (2009)
20. Kuziemsky, C.E., et al.: An interdisciplinary team communication framework and its application to healthcare ‘e-teams’ systems design. BMC Med. Inform. Decis. Mak. 9(1), 43 (2009)
21. Sittig, D.F.: A socio-technical model of health information technology-related e-iatrogenesis. In: AMIA Annual Symposium Proceedings (2008)
22. Nøhr, C., Aarts, J.E.: Information Technology in Health Care: Socio-Technical Approaches 2010: From Safe Systems to Patient Safety, vol. 157. IOS Press, Amsterdam (2010)
23. Aarts, J., et al.: Information technology in health care: socio-technical approaches. Int. J. Med. Inform. 79(6), 389–390 (2010)
24. Berg, M.: Patient care information systems and health care work: a sociotechnical approach. Int. J. Med. Inform. 55(2), 87–101 (1999)
25. Carayon, P., et al.: Work system design for patient safety: the SEIPS model. Qual. Saf. Health Care 15(suppl 1), i50–i58 (2006)
26. Baxter, G., Sommerville, I.: Socio-technical systems: from design methods to systems engineering. Interact. Comput. 23(1), 4–17 (2011)
27. Bostrom, R.P., Heinen, J.S.: MIS problems and failures: a socio-technical perspective, Part II: the application of socio-technical theory. MIS Q. 1, 11–28 (1977)
28. Thomas, C.M., Bertram, E., Johnson, D.: The SBAR communication technique: teaching nursing students professional communication skills. Nurse Educ. 34(4), 176–180 (2009)
29. Van Der Vegt, G.S., Bunderson, J.S.: Learning and performance in multidisciplinary teams: the importance of collective team identification. Acad. Manag. J. 48(3), 532–547 (2005)
30. Tallon, P.P., Kraemer, K.L., Gurbaxani, V.: Executives’ perceptions of the business value of information technology: a process-oriented approach. J. Manag. Inf. Syst. 16, 145–173 (2000)
31. Battles, J., King, H.B.: TeamSTEPPS Teamwork Perceptions Questionnaire (T-TPQ) Manual. American Institute for Research, Washington, DC (2010)
32. Lisha Lo, M.: Teamwork and Communication in Healthcare, p. 68. Canadian Patient Safety Institute Edmonton (2011)
33. Hirst, G., et al.: How does bureaucracy impact individual creativity? A cross-level investigation of team contextual influences on goal orientation–creativity relationships. Acad. Manag. J. 54(3), 624–641 (2011)
34. Hamilton, H., Chou, W.-y.S.: The Routledge Handbook of Language and Health Communication, 1st edn. Routledge, New York (2014)
35. Kim, M., et al.: Assessing roles of people, technology and structure in emergency management systems: a public sector perspective. Behav. Inf. Technol. 31(12), 1147–1160 (2012)
36. Segars, A.H., Grover, V.: Profiles of strategic information systems planning. Inf. Syst. Res. 10(3), 199–232 (1999)
37. Collins, C.J., Smith, K.G.: Knowledge exchange and combination: the role of human resource practices in the performance of high-technology firms. Acad. Manag. J. 49(3), 544–560 (2006)
38. Gold, A.H., Malhotra, A., Segars, A.H.: Knowledge management: an organizational capabilities perspective. J. Manag. Inf. Syst. 18(1), 185–214 (2001)
39. Dai, Y.T., et al.: Effectiveness of a pilot project of discharge planning in Taiwan. Res. Nurs. Health 26(1), 53–63 (2003)
40. Glasgow, R.E., et al.: The chronic illness resources survey: cross-validation and sensitivity to intervention. Health Educ. Res. 20(4), 402–409 (2005)
Potential Benefits of Social Media to Healthcare: A Systematic Literature Review
Ghada Ahmad Abdelguiom¹ (corresponding author) and Noorminshah A. Iahad²
¹ School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
² Azman Hashim International Business School (Information Systems), Universiti Teknologi Malaysia, Johor Bahru, Malaysia
Abstract. Social media offers a rich online experience, dynamic content, usability, and knowledge that attract more users, and its use in the health sector is attracting increasing attention. Over the last ten years, researchers have investigated various health-related topics involving social media, and their work has benefited the healthcare domain. A study that identifies the potential benefits of social media to healthcare is therefore needed. This paper surveys research papers on social media platforms in the healthcare domain published between 2014 and 2020. The primary objective of this study is to review the range, nature, and extent of current research activity on the role of social media in healthcare. Accordingly, this paper outlines recent approaches that use social media to address health-related issues and discusses the role of social media in promoting healthcare services. The study identifies the key issues addressed in recent research, gives an overview of their shortcomings and limitations, and finally outlines opportunities for future research.
Keywords: Health care · Social networks · Digital communication · Social media · e-Health · Web 2.0
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 230–241, 2021. https://doi.org/10.1007/978-3-030-70713-2_23
Potential Benefits of Social Media to Healthcare
1 Introduction
Social Media (SM) platforms can be described as Internet-based applications that build on Web 2.0 and enable the creation and sharing of user-generated content (Carr and Hayes 2015). The new mode of interaction they provide also promotes the discussion and identification of medical problems (Naslund et al. 2016). Because SM offers members immediate, direct feedback, it can play an engaging role in delivering relevant content that promotes a healthier lifestyle (Chung et al. 2017). Healthcare providers and public health practitioners have used SM to share information during training, practice, surveys, and emergencies (Vaterlaus et al. 2015). The emergence of the Internet of Things has also driven developments in healthcare. However, the available resources and studies do little to raise people's awareness of how SM can be used to interact with medical professionals and obtain better treatment.
One of the main features of SM is that it serves members who cannot freely contact their doctors about certain medical problems and prefer to use web applications instead. SM platforms can also potentially be used to promote medication adherence and improve compliance with medication, training, and patient care programs, including those addressing sexual health, alcohol, and drug addiction. They can further be used to encourage institutional patient aid and patient support groups, foster institutional loyalty, increase overall physician–patient interaction, and enhance communication among physicians (Lee et al. 2016; Sarker et al. 2016). Targeted use of SM may improve the quality of patient education, helping patients better understand the likely course of their disease and their treatment options, while also highlighting the latest treatments and facilities that can be used when needed (Glover et al. 2015). Responsive medical providers need to recognize the value of engaging, cooperating with, and encouraging patients with chronic diseases such as diabetes through different SM platforms (Knight et al. 2015; Santoro et al. 2015), which are seen as promising tools that can help health professionals deliver adequate service and practice.
A narrative synthesis framework was used to evaluate the current evidence (McCaughey et al. 2014; Merolli et al. 2015; Panahi et al. 2016; Xu et al. 2016). We searched the indexed scientific literature using healthcare and SM keywords, and included articles that analyzed applications of SM in healthcare.
We analyzed the studies against criteria such as user retention, acceptance, level of participation, and findings on the efficacy of SM in the health sector. Public engagement with SM has risen dramatically over the last nine years (Allington et al. 2020). In the USA, the share of adults using SM has risen from 8% in 2005 to 72% (Depoux et al. 2020; Tursunbayeva et al. 2017). In 2012, Facebook had more than one billion users worldwide, about one-seventh of the world's population (Wyche and Baumer 2017). SM use spans all age groups and occupations across the globe. Moreover, 100 million active Twitter users send over 65 million tweets every day, and two billion YouTube videos are viewed daily (Alhabash and Ma 2017). SM has been linked to major political events such as the Arab Spring, as well as to broader cultural developments, including the decline of public attention and of print media (AlSayyad and Guvenc 2015). This review addresses the following Research Questions (RQs):
RQ1: What existing research articles present the application of SM in healthcare?
RQ2: How can Healthcare Professionals (HCP) and non-HCP users be classified?
RQ3: What opportunities are available to connect HCP and non-HCP users?
G. A. Abdelguiom and N. A. Iahad
The objective of this study is to extract the literature related to the role of SM in the healthcare domain. In addition, it presents the available methods for connecting HCP and non-HCP users. This paper is organized as follows. The first section discusses the benefits of SM and its applications in the healthcare domain. The second section describes the methods used to review the literature, including the decisions on which studies to consider. The third and fourth sections present and discuss the results of this review. Finally, the review concludes with directions for future research.
2 Methodology
A narrative synthesis approach was used to review and analyze the existing evidence in the literature on the use of SM platforms for health-related issues. The databases searched were Medline, Web of Science, Emerald, and the Applied Social Sciences Index and Abstracts, using the keywords Social Media, Digital Media, Web 2.0, Facebook, Internet, Blog, Twitter, Forum, content marketing, Wiki, Email, Health, medication, and Medical. Table 1 presents the inclusion and exclusion criteria employed in the study. We first screened the literature by title to rule out irrelevant papers, and removed studies in which medical professionals used social networking as an intervention or within closed support groups. Non-English literature was excluded. Google Scholar's search features were used to restrict the publication years to 2014–2020. We used the following combination of keywords to extract studies:
("Social media" OR "Social software" OR "Social network" OR "Facebook" OR "Twitter" OR "YouTube" OR "Digital media" OR "Social tagging" OR Wiki) AND (Healthcare domain)
The included papers were those that investigated SM and different health themes in depth and reported either the results of an intervention or evidence-based plans. Accordingly, general papers about social networking and health were excluded.
Table 1. Inclusion and exclusion criteria
Inclusion | Exclusion
Complete research | Incomplete or partial research content
Published between 2014 and 2020 | Published before 2014
Articles related to the objective | Irrelevant articles
Research articles published in English | Research articles in languages other than English
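Boolean search strings like the one above are easy to mistype, as the mismatched quotes in the extracted version show. The following is a minimal Python sketch that assembles the string from two term lists so that quoting stays consistent; the function names `quote` and `build_query` are hypothetical, and the sketch assumes exact-phrase quoting for every term (the string above leaves Wiki and the domain term unquoted). Each real database (Medline, Web of Science, Google Scholar) has its own query syntax, so the output would still need adapting per source.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes so it is matched as an exact phrase."""
    return f'"{term}"'


def build_query(sm_terms: list[str], domain_terms: list[str]) -> str:
    """Combine two term groups as (A1 OR A2 ...) AND (B1 OR B2 ...)."""
    sm_block = " OR ".join(quote(t) for t in sm_terms)
    domain_block = " OR ".join(quote(t) for t in domain_terms)
    return f"({sm_block}) AND ({domain_block})"


# Terms taken from the keyword combination described in the Methodology.
SM_TERMS = ["Social media", "Social software", "Social network", "Facebook",
            "Twitter", "YouTube", "Digital media", "Social tagging", "Wiki"]
DOMAIN_TERMS = ["Healthcare domain"]

if __name__ == "__main__":
    print(build_query(SM_TERMS, DOMAIN_TERMS))
```

Keeping the term groups as data makes it simple to add synonyms (for example, the other keywords listed in the Methodology) without re-balancing parentheses by hand.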
The included studies focused on SM and/or health conditions, often on online forums where various health issues are discussed. These studies monitor network activity over a given period, compile user interaction scenarios, and analyze trending content. Their statistics suggest that participants seek out social support networks and find solace among people with similar health problems. Sharing experiences with others on SM can lead to interactions that are less judgmental than in other social arenas, and individuals are willing to discuss very sensitive or embarrassing conditions openly, for example on an online eating disorder forum for men (Moessner et al. 2018), a site where the health problem is of a rather personal nature. We also examined how users' online networking activity could lead to specific social changes, and how these activities formed dimensions that capture campaigning and crowdfunding. RQ1 is addressed in Table 2, which illustrates how the research studies used in this review were selected.
Table 2. Selection process to identify research articles
Steps | Number of papers
Extraction of studies from digital platforms | 120
Removal of duplicates using EndNote | 70
Manual selection of studies using title and abstract | 50
Manual selection of studies based on the content of the research | 49
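Table 2 reports one count per screening step. Reading each count as the number of papers remaining after that step (an assumption; the table does not say so explicitly), the number removed at each stage follows by subtraction, as in a PRISMA-style flow diagram. A small Python sketch of that bookkeeping; `STAGES` and `exclusions` are illustrative names, not part of the authors' method.

```python
# Screening steps and counts from Table 2.
# Assumption: each count is the number of papers remaining after that step.
STAGES = [
    ("Extraction of studies from digital platforms", 120),
    ("Removal of duplicates using EndNote", 70),
    ("Manual selection of studies using title and abstract", 50),
    ("Manual selection of studies based on the content of the research", 49),
]


def exclusions(stages):
    """Return (step name, papers removed at that step) for every step after the first."""
    return [
        (name, prev_count - count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for name, removed in exclusions(STAGES):
        print(f"{name}: {removed} removed")
    total = STAGES[0][1] - STAGES[-1][1]
    print(f"Total excluded: {total} of {STAGES[0][1]}")
```

Under this reading, 50 papers were removed as duplicates, 20 at title and abstract screening, and 1 at full-content review, so 71 of the 120 retrieved papers were excluded and 49 were retained.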
3 Results
As a general observation, the number of articles published on the use of SM in the health domain grew markedly over the period covered by this review. Most of the literature addressed the topic from the perspective of either healthcare providers or patients.
3.1 User Classification
To address RQ2, we classified SM users, who play an important role in today's environment. People all over the world communicate with the support of SM, whose purpose is to provide a platform for users to discuss common or specific issues. In this study, we classified SM users into two major categories: HealthCare Professionals (HCP) and non-HCP. The reason for classifying users is to categorize the benefits of SM by type of user (Fig. 1).
HCP. An HCP provides health care treatment and advice based on formal training and experience. The category includes physicians, surgeons, physician assistants, nurses, physiotherapists, dentists, midwives, psychologists, psychiatrists, pharmacists, and those who perform services in allied health professions. A health professional may also be a public health or community health practitioner. The rest of this section describes the benefits of SM for each type of user in the HCP category.
Fig. 1. Classification of users
i. Doctors: Doctors and other medical professionals, especially in the Western world, connect with their patients and peers through SM, including Twitter and Facebook. This allows them not only to answer health-related questions but also to engage with patients individually and share ideas with peers.
ii. Nurses: Nurses use SM to interact at the personal and social level and to monitor events related to public health. They rely on SM in all fields of practice to communicate and exchange knowledge with colleagues.
iii. Counselors and Volunteers: In recent years, SM platforms have permeated many facets of social and cultural life. These tools, and Web 2.0 in general, enable interactivity, interoperability, and collaboration, and encourage users to create and share health-related content in support of patients and lay users. Openness and participation are perceived as the underlying logic of these technologies. A growing body of knowledge shows that SM provides significant opportunities for the charitable and community sectors to improve participation.
iv. Psychologists: One of the most attractive facets of SM is that it offers psychologists an unobtrusive way to analyze individual behavior. Recent research indicates that people on SM tend to present themselves much as they are, rather than as an idealized version of themselves, which makes the information obtained from these sites more reliable than one might expect. Nevertheless, certain forms of study, such as experimental research, are not well suited to SM sites.
v. Trainers and Trainees: Trainers have begun to use SM channels as part of the training process. Trainees are encouraged to use unique hashtags on Twitter or to join groups to participate in training sessions, which makes training more immersive and enjoyable. These strategies give trainees a central place to ask questions and get prompt responses, and SM allows learners to give their trainers direct feedback on training sessions.
Non-HCP. A non-HCP is a recipient of health care delivered by medical practitioners. A non-HCP is most often ill or injured and requires treatment from a physician, nurse, psychologist, or dentist. Each type of non-HCP uses SM as follows:
i. Patients: Patients are becoming more active with the introduction of SM and with access to online health records. Patients' use of SM has strengthened their interaction with their health professionals.
ii. Health Seekers: Health seekers are mindful of the factors that move an individual toward better or worse health. Health seeking is the natural pursuit of a proper well-being balance, a continuous step toward one's own core, and an appreciation of "normal" health.
iii. Lay users: Lay users are the ultimate beneficiaries of developments in SM in everyday settings. SM makes it easy for them to communicate with health service officials and improve their health.
4 Discussion
The proliferation of social networks such as Facebook makes it much harder to keep a distance between doctor and patient (Paul et al. 2016), even though SM can also strengthen the professional bond between them. RQ3 raises the question of whether healthcare professionals should accept a patient as a social network friend and, if not, whether a refusal might be misunderstood by the patient and create problems in the relationship. Many members of the medical community agree that it is unwise for healthcare professionals to communicate with patients on social networking platforms, for various reasons but primarily to avoid blurring the distinction between personal and professional life. SM members are encouraged to connect with, find, and understand each other (Jordan et al. 2019). "Sharing" means updating one's online profile with health-related medical history, including disease progression, symptoms, and supporting therapy. Participants use their online health accounts to monitor their progress and may choose to share the details with others or even with their physicians. "Search" refers to the website's search engine, which allows a member to locate people with the same medical conditions. Using the medical profiles, searches can be narrowed by diagnosis, geographical location, age, and gender, not just by illness or symptoms. Because social networking sites (SNSs) inform patient decision-making, appropriate assessment methods should be used to determine the quality of the healthcare interactions and opinions they host. Responsive healthcare providers have also started to see the value of engaging and collaborating with patients through various SM platforms to foster self-management support, achieve positive health outcomes, and empower patients with chronic diseases such as diabetes (De Martino et al. 2017).
There is empirical support for the claim that using SM in health-related support groups yields noteworthy results, particularly for people in need of emotional support. SM can connect healthcare providers with patients' perspectives and needs, enabling and empowering patients to develop strategies for coping with their health concerns (Grajales III et al. 2014). Interestingly, SM has proved highly effective in reaching hard-to-reach clinical populations, such as patients with rare diseases, for whom it can offer information on their conditions (Davies 2016; Jacobs et al. 2016). The most striking finding is that SM plays a vital role in supporting public health practice by helping to identify and reach the target population for an intervention during disease outbreaks. It can also support the investigation, analysis, management, and surveillance of disease outbreaks (Hossain et al. 2016; McGough et al. 2017). There is a consensus that SM can serve as a data source for public health surveillance and can provide complementary information on health situations, not to mention its considerable impact on the detection of epidemic diseases through the information users share (Aiello et al. 2020). A large number of publications emphasize that SM can offer health authorities new instruments to support decision-making at the global, national, regional, and corporate levels (Lewin et al. 2015). This channel may have a profound impact on health-related issues such as healthy eating habits, physical activity, and the prevention of overweight (Gamache-OLeary and Grant 2017).
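The "Search" feature discussed above narrows results by diagnosis, geographical location, age, and gender. The kind of conjunctive filtering this implies can be sketched in a few lines of Python, assuming a simple in-memory list of profiles. The `MedicalProfile` fields and the `search_profiles` function are hypothetical and do not describe any particular platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicalProfile:
    """Hypothetical member profile on a health-oriented SM site."""
    member_id: str
    diagnosis: str
    location: str
    age: int
    gender: str


def search_profiles(
    profiles: list[MedicalProfile],
    diagnosis: Optional[str] = None,
    location: Optional[str] = None,
    age_range: Optional[tuple[int, int]] = None,
    gender: Optional[str] = None,
) -> list[MedicalProfile]:
    """Return profiles that satisfy every criterion supplied; None means 'any'."""
    matches = []
    for p in profiles:
        if diagnosis is not None and p.diagnosis != diagnosis:
            continue
        if location is not None and p.location != location:
            continue
        # age_range is inclusive at both ends
        if age_range is not None and not (age_range[0] <= p.age <= age_range[1]):
            continue
        if gender is not None and p.gender != gender:
            continue
        matches.append(p)
    return matches


if __name__ == "__main__":
    members = [
        MedicalProfile("m1", "type 2 diabetes", "Johor Bahru", 54, "F"),
        MedicalProfile("m2", "type 2 diabetes", "Kuala Lumpur", 61, "M"),
        MedicalProfile("m3", "asthma", "Johor Bahru", 33, "F"),
    ]
    # Members with the same diagnosis, aged 50 to 65 inclusive.
    for p in search_profiles(members, diagnosis="type 2 diabetes", age_range=(50, 65)):
        print(p.member_id)
```

Treating an omitted criterion as "any" mirrors the idea that searches can combine several profile attributes rather than relying on the illness alone. A real platform would also need the privacy safeguards raised in the discussion of data misuse.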
The review found that self-management has become simpler for people who want to share their medical information, because health information on these platforms is clearly organized around actual data. People were often drawn by the opportunity to locate other patients who matched them both demographically and medically, in order to seek advice or reassurance about treatment or symptoms. While patients may feel empowered by SM websites because these sites help them quantify and understand the everyday decisions they make about their health, some of these tools are backed by commercial operations that sell patients' data to private companies whose goals may not match those of patients (Myrick et al. 2016). Experts caution that even if participants state they do not care who has access to their medical data or how public it is, users remain vulnerable to unintentional and intentional security threats and misuse (Omary 2018), which can lead to discrimination in health care, employment, and other areas.
5 Conclusion
The aim of this systematic literature review was to examine, analyze, and survey current research on the influence of SM on the health domain; to summarize the current evidence on the use of SM in health; to show how these platforms can promote the delivery of health services; and to describe the main topics discussed in the included articles. It also reviews the shortcomings of previous research and recommends directions for future research.
Prior studies found that SM has generally opened new opportunities for strengthening the empowerment model and participatory healthcare, and that it allows shared decision-making between patients and healthcare providers, which in turn can lead to autonomy, improved communication, greater self-efficacy, and patient satisfaction. One of the more significant findings to emerge from this study is that SM use has significant potential to enhance the well-being of patients with long- and short-term ailments. Research should therefore move toward the adoption of SM platforms in the health domain, because they are acceptable and promising, and should use this potentially powerful innovation as a health knowledge transfer mechanism. The significance of SM in healthcare is supported by the current finding that people in poor health are more likely to share health information from online sources. Researchers have highlighted evidence that health care services productively use SM as a primary tool to deliver health information to people living in economically overburdened (low-income) countries. In those countries, individuals often lack immediate access to health care, and patients may not have health insurance. A greater focus on SM in countries that lack state-of-the-art healthcare systems could produce interesting findings on how SM may facilitate patient-centric healthcare by involving patients in fulfilling their personal healthcare needs. The findings of this research provide insights into how these platforms can promote health awareness and empower patients on sensitive health issues, such as those concerning homosexuality.
Furthermore, SM can contribute effectively to the control and prevention of non-communicable diseases that result from a lack of awareness and that burden the health economies of countries already facing financial risk. Future research should focus on how health institutions in developing and low-income countries can take advantage of the proliferation and growth of SM platforms, harnessing them to help establish healthy behaviors and lifestyles among citizens. The findings of this study have several important implications for future practice, given that SM opens new opportunities for the empowerment model and for participatory healthcare. Using SM in health can improve health literacy and self-management of health at the individual level and increase the efficiency of health service provision at the institutional level.
References
Alhabash, S., Ma, M.: A tale of four platforms: motivations and uses of Facebook, Twitter, Instagram, and Snapchat among college students? Soc. Media + Soc. 3(1) (2017). https://doi.org/10.1177/2056305117691544
Allington, D., Duffy, B., Wessely, S., Dhavan, N., Rubin, J.: Health-protective behaviour, social media usage and conspiracy belief during the COVID-19 public health emergency. Psychol. Med. 1–7 (2020)
AlSayyad, N., Guvenc, M.: Virtual uprisings: on the interaction of new social media, traditional media coverage and urban space during the ‘Arab Spring.’ Urban Stud. 52(11), 2018–2034 (2015)
Aschbrenner, K.A., Naslund, J.A., Shevenell, M., Kinney, E., Bartels, S.J.: A pilot study of a peer-group lifestyle intervention enhanced with mHealth technology and social media for adults with serious mental illness. J. Nerv. Ment. Dis. 204(6), 483 (2016)
Carr, C.T., Hayes, R.A.: Social media: defining, developing, and divining. Atlantic J. Commun. 23(1), 46–65 (2015)
Chung, A.E., Skinner, A.C., Hasty, S.E., Perrin, E.M.: Tweeting to health: a novel mHealth intervention using Fitbits and Twitter to foster healthy lifestyles. Clin. Pediatr. 56(1), 26–32 (2017)
Dadgar, M., Joshi, K.D.: The role of information and communication technology in self-management of chronic diseases: an empirical investigation through value sensitive design. J. Assoc. Inf. Syst. 19(2), 2 (2018)
Dai, X., Bikdash, M., Meyer, B.: From social media to public health surveillance: word embedding based clustering method for Twitter classification. Paper presented at the SoutheastCon 2017 (2017)
de Almeida Marques-Toledo, C., Degener, C.M., Vinhal, L., Coelho, G., Meira, W., Codeço, C.T., Teixeira, M.M.: Dengue prediction by the web: tweets are a useful tool for estimating and forecasting dengue at country and city level. PLoS Negl. Trop. Dis. 11(7), e0005729 (2017)
De Martino, I., D’Apolito, R., McLawhorn, A.S., Fehring, K.A., Sculco, P.K., Gasparini, G.: Social media for patients: benefits and drawbacks. Curr. Rev. Musculoskelet. Med. 10(1), 141–145 (2017)
Dekker, R., Engbersen, G., Klaver, J., Vonk, H.: Smart refugees: how Syrian asylum migrants use social media information in migration decision-making. Soc. Media + Soc. 4(1) (2018). https://doi.org/10.1177/2056305118764439
Deng, Z., Liu, S.: Understanding consumer health information-seeking behavior from the perspective of the risk perception attitude framework and social support in mobile social media websites. Int. J. Med. Inform. 105, 98–109 (2017)
Depoux, A., Martin, S., Karafillakis, E., Preet, R., Wilder-Smith, A., Larson, H.: The pandemic of social media panic travels faster than the COVID-19 outbreak. Oxford University Press (2020)
Fernández-Luque, L., Bau, T.: Health and social media: perfect storm of information. Healthc. Inform. Res. 21(2), 67–73 (2015)
Fung, I.C.-H., Hao, Y., Cai, J., Ying, Y., Schaible, B.J., Yu, C.M., Tse, Z.T.H., Fu, K.-W.: Chinese social media reaction to information about 42 notifiable infectious diseases. PLoS ONE 10(5), e0126092 (2015)
Fung, I.C.-H., Tse, Z.T.H., Fu, K.-W.: The use of social media in public health surveillance. West. Pac. Surveill. Response J. WPSAR 6(2), 3 (2015)
Gamache-OLeary, V., Grant, G.: Social media in healthcare. Paper presented at the Proceedings of the 50th Hawaii International Conference on System Sciences (2017)
Glover, M., Khalilzadeh, O., Choy, G., Prabhakar, A.M., Pandharipande, P.V., Gazelle, G.S.: Hospital evaluations by social media: a comparative analysis of Facebook ratings among performance outliers. J. Gen. Intern. Med. 30(10), 1440–1446 (2015)
Gore, R.J., Diallo, S., Padilla, J.: You are what you tweet: connecting the geographic variation in America’s obesity rate to Twitter content. PLoS ONE 10(9), e0133505 (2015)
Grajales III, F.J., Sheps, S., Ho, K., Novak-Lauscher, H., Eysenbach, G.: Social media: a review and tutorial of applications in medicine and health care. J. Med. Internet Res. 16(2), e13 (2014)
Gruver, R.S., Bishop-Gilyard, C.T., Lieberman, A., Gerdes, M., Virudachalam, S., Suh, A.W., Kalra, G.K., Magge, S.N., Shults, J., Schreiner, M.S.: A social media peer group intervention for mothers to prevent obesity and promote healthy growth from infancy: development and pilot trial. JMIR Res. Protoc. 5(3), e159 (2016)
Hossain, L., Kam, D., Kong, F., Wigand, R., Bossomaier, T.: Social media in Ebola outbreak. Epidemiol. Infect. 144(10), 2136–2143 (2016)
Potential Benefits of Social Media to Healthcare
G. A. Abdelguiom and N. A. Iahad
Exploring the Influence of Human-Centered Design on User Experience in Health Informatics Sector: A Systematic Review
Lina Fatini Azmi1 and Norasnita Ahmad2
1 UTM Research Computing, Department of Deputy Vice-Chancellor (Research and Innovation), Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
2 Azman Hashim International Business School, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Abstract. Integrating a human-centered design (HCD) approach into healthcare informatics solutions is changing the landscape of e-services and e-satisfaction among users. The evolution of informatics systems in healthcare organizations has turned design into a key element centered on users' ability to improve their e-services. Many studies now incorporate human elements into user experience-based design to enhance user satisfaction and maximize benefits to users. This paper is built on a systematic literature review of academic papers that explores the influence of the human-centered design approach on user experience in the health informatics sector. Using the PRISMA process, 64 studies were selected. The results highlight the relation between the human-centered design approach and user experience. This study also presents a human-centered design process flow, adapted from the selected healthcare studies, as an approach to developing user-friendly informatics systems that bridge the user experience gap. Keywords: Systematic review · Human-centered design · Health informatics system · Health informatics sector
1 Introduction
User experience now plays a major role in the use of health informatics systems, driving a trend toward optimizing user experience through human-centered design. Human-centered design plays a central role by helping system developers build human-centric information systems based on listening to and understanding user experiences, needs, and expectations [1]. It is an innovative approach that corporations first used to create new products and services, and it is now receiving intense attention in the healthcare sector [2]. Human-centered design begins with how individuals understand their own needs and then designs from their viewpoint [3]. However, user satisfaction is difficult to measure. User experience is therefore crucial in designing human-centric informatics. This is illustrated by the large volume of exploratory research conducted to gather evidence [4] and by the various guidelines developed to meet varying levels of user satisfaction. Moreover, the study by [5] showed that user experience is a somewhat neglected area in the healthcare technology sector. Human-centered design in the healthcare sector is currently widely discussed. According to previous work, a human-centered design approach usually starts with brainstorming among team members on how to design an informatics system that works closer to human beings. Prototypes are then developed as a tool to gather user feedback. This review therefore explores the influence of system design on user experience feedback for human-centric health informatics systems. It also seeks to show that system design strongly influences whether users gain a positive experience.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 242–251, 2021. https://doi.org/10.1007/978-3-030-70713-2_24
2 Previous Works
Several previous reviews in the healthcare domain were examined. The review by [6] also used the PRISMA model to locate, select, and include studies. It assessed the adequacy of human-computer interaction (HCI) and user-centered design in the development of e-mental health interventions. The systematic review by [7] followed the Cochrane Handbook and the PRISMA methods and guidelines. Its goal was to establish a measure of the user-centeredness of development processes and to define optimal practices. The review by [8] also used the PRISMA process for cross-study analysis and synthesis. It details the problems with the wearable components of ventricular assist device (VAD) systems and recommends closing the gap through human-centered design.
3 Method
This study aims to systematically review the latest qualitative literature on the user experience of human-centered design within health informatics systems, analyze the current evidence, and identify further research issues. The protocol for this systematic review is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [9]. The co-authors developed and reviewed the protocol until they reached consensus on the research questions and methods. According to [10], the PRISMA statement is used to promote high-quality reporting of systematic reviews and meta-analyses published in the healthcare field. In this study, user experience was defined as users' feedback on the implementation of human-centered design in a health informatics system. Human-centered design was defined as an innovation that humanizes the design of informatics systems to make them more useful for people. Finally, a health informatics system was defined as an e-services platform provided by healthcare providers. The protocol for this study is arranged as follows: (i) developing the research questions, (ii) outlining the search strategy, (iii) describing the inclusion and exclusion criteria, (iv) developing the quality assessment and data extraction, and (v) analyzing the data.
L. F. Azmi and N. Ahmad
3.1 Research Questions
The review was guided by two specific questions: 1. What is the relation between human-centered design and user experience? 2. Does a human-centered design approach influence user experience in health informatics systems, and if so, how?
3.2 Search Strategy
Five databases were searched using a broad search strategy: ScienceDirect, Scopus, Web of Science, SpringerLink, and Emerald Insight. The search terms were composed of variations of the following key terms: 1) user experience, satisfaction, or perspective; 2) user feedback; 3) human-centered design; 4) health informatics sector; and 5) health informatics system or platform. A brief grey literature search in Google Scholar was also conducted to find additional relevant articles. Mendeley Desktop was used to store the results and citations from each database and to remove duplicate studies.
3.3 Inclusion and Exclusion Criteria
The flow of study selection is shown in Fig. 1. Full-text screening of all articles used predefined inclusion criteria. Studies were eligible for inclusion if they (1) were in the health informatics sector, (2) had full-text articles in English, (3) assessed user experience within a health informatics system, (4) involved human-centered design of a health informatics system, (5) were published between 2015–2021, and (6) were published only in journals or conference proceedings. Studies were excluded if they (1) were in other sectors, (2) had no full text available or were written in another language, (3) did not mention user experience, (4) did not implement human-centered design in a health informatics system, or (5) were not published within the defined period.
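The search and screening steps above can be sketched in code. This is a minimal illustration and not the authors' tooling, which consisted of the database interfaces and Mendeley Desktop. It makes two assumptions. First, the key terms from Sect. 3.2 are arranged into a hypothetical grouping, with synonyms OR-ed within a group and the groups AND-ed together. Second, inclusion criteria (1)–(6) are encoded as a single predicate over a hypothetical `Study` record. Scopus, Web of Science, and the other databases each use their own field codes, so the generated string would need adapting before use.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical grouping of the Sect. 3.2 key terms: synonyms are OR-ed
# within a group and the groups are AND-ed together.
TERM_GROUPS = [
    ["user experience", "user satisfaction", "user perspective", "user feedback"],
    ["human-centered design"],
    ["health informatics sector", "health informatics system",
     "health informatics platform"],
]


def build_query(groups: list[list[str]]) -> str:
    """Quote each phrase, join a group with OR, and join groups with AND."""
    clauses = ["(" + " OR ".join(f'"{term}"' for term in group) + ")"
               for group in groups]
    return " AND ".join(clauses)


@dataclass
class Study:
    # Hypothetical screening record: one field per inclusion criterion.
    sector: str
    language: str
    full_text: bool
    reports_user_experience: bool
    applies_hcd: bool
    year: int
    venue: str  # "journal" or "conference"


def meets_criteria(s: Study) -> bool:
    """Return True only if the study satisfies inclusion criteria (1)-(6)."""
    return (
        s.sector == "health informatics"             # (1) sector
        and s.full_text and s.language == "en"       # (2) full text in English
        and s.reports_user_experience                # (3) assesses user experience
        and s.applies_hcd                            # (4) involves HCD
        and 2015 <= s.year <= 2021                   # (5) period as stated in Sect. 3.3
        and s.venue in {"journal", "conference"}     # (6) publication venue
    )


print(build_query(TERM_GROUPS))
```

Because each criterion is a separate boolean term, a study that fails any single criterion is excluded. This mirrors how the exclusion list in Sect. 3.3 is the negation of the inclusion list.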
3.4 Quality Assessment and Data Extraction
Before data extraction and analysis, the selected studies were assessed for quality to ensure that the final list of studies could answer the research questions. The following details were extracted from each selected study: authors, title, publication year, source, methodology, issues/topics, theories, results/findings, and future work. The data were tracked, gathered, and recorded using Mendeley Desktop and Microsoft Excel.
3.5 Data Analysis
The abstract of each study was read line by line to identify its aims, methodology, and findings. In the identification phase, database searching was combined with backward and forward searching. After duplicates were removed, this yielded 495 records, which were screened against the inclusion and exclusion criteria to remove irrelevant articles. After screening, 105 studies were retained for full reading, and 41 were removed in the eligibility phase following the quality assessment. In the final inclusion phase, 64 studies were selected for this review. Fig. 1 shows the flow of data analysis.
Identification: Records identified through database searching (n = 511); Additional records identified through other sources (n = 22); Records after duplicates removed (n = 495)
Screening: Records screened (n = 105); Records excluded (n = 390)
Eligibility: Full-text articles assessed for eligibility (n = 105); Full-text articles excluded, with reasons (n = 41)
Included: Studies included in SLR (n = 64)
Fig. 1. PRISMA flow chart
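As a consistency check on Fig. 1, each stage count can be derived from the stage before it. The Python sketch below recomputes the flow from the box values in the figure. The function name `prisma_flow` is illustrative and not part of any PRISMA tool. The figure does not print the number of duplicates (38); that value is only implied by 511 + 22 − 495.

```python
def prisma_flow(identified_db: int, identified_other: int, after_dedup: int,
                excluded_screening: int, excluded_full_text: int) -> dict:
    """Derive downstream PRISMA stage counts from the boxes of a flow chart."""
    identified = identified_db + identified_other
    full_text_assessed = after_dedup - excluded_screening
    return {
        "identified": identified,
        "duplicates_removed": identified - after_dedup,
        "full_text_assessed": full_text_assessed,
        "included": full_text_assessed - excluded_full_text,
    }


# Box values as printed in Fig. 1.
flow = prisma_flow(511, 22, 495, 390, 41)
print(flow)
```

The derived counts of 105 full-text articles assessed and 64 included studies match the values in the figure.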
4 Results
4.1 Year Published
About 83% of the papers (n = 53) that reported on user experience using human-centered design in the health informatics sector were published in the most recent years of the chosen period (2018 = 6; 2019 = 24; 2020 = 23). The remaining papers (n = 11) were published in the earlier years of the period (2015 = 4; 2016 = 2; 2017 = 5). This shows that the topic has been discussed much more widely in recent years.
4.2 Methodology
Only 10% of the selected studies used mixed methods, and none used a purely quantitative method. Most of the selected studies used qualitative methods in various forms, including case studies, focus groups, interviews, and observation. This strongly suggests that qualitative methods are best suited to human-centered design studies. Because human-centered design is an approach to developing informatics systems around their users, interviews are the main data collection technique in this area.
4.3 Publication Sources
Most of the selected studies (2015–2020) were published in journals (73%), followed by conference papers (23%) and research studies (4%).
4.4 Context of Study
The contexts of the reported studies were: general health issues (n = 25), people with mental illness (n = 6), chronic health conditions (n = 4), older adults (n = 3), rehabilitation treatment (n = 3), children (n = 3), hypospadias surgery (n = 2), HIV (n = 2), people with cancer (n = 2), people with asthma (n = 2), women (n = 2), glucose patients (n = 1), people undergoing dialysis treatment (n = 1), psychotherapies (n = 1), pelvic exams (n = 1), spine surgery (n = 1), people with dementia (n = 1), people with hearing impairment (n = 1), people with arthritis (n = 1), radiology (n = 1), and pulmonary embolism (n = 1).
4.5 Discussion of SLR
RQ1: What is the relation between human-centered design (HCD) and user experience (UX)?
To show the influence of human-centered design on user experience, evidence of their relation is needed.
Various studies in the health informatics sector report evidence of a relation between human-centered design and user experience. Table 1 lists the evidence highlighted in the selected studies. It indicates that HCD contributes to a positive user experience and can help improve healthcare systems.
Table 1. Relation between HCD and UX.
Author | Relation
[11] | “Stakeholders in healthcare system started realizing the importance of user experience …. required involving human-centered design and engineering to structured healthcare design paradigm”
[12] | “Applies HCD thinking to solve critical healthcare problems that is more focused on the patient and the patient experience”
[8] | “The user experience resulting from the design of the wearable system … by positioning human-centered design opportunities at the intersection of human factors and user experience”
[13] | “Human-centered design research methodologies put user experience and needs at the forefront of the development process … result in the next generation of patient-friendly healthcare”
[14] | “HCD make systems usable and useful by focusing more on the design analysis to improve user experience”
[15] | “The team strived to design usable interface using HCD approach to support learnability …, while providing a pleasant user experience …”
[16] | “Suggesting the need for new design approaches for population of low user experience smoking cessation app”
[17] | “.. human-centered design approach combines with design thinking methodology focuses on the end-user experience to generate innovation …”
[18] | “Incorporating HCD methodology into this problem allows for the consideration of user experiences when making design decisions”
[19, 20] | “HCD able to addresses the whole user experience, including the context in which the user finds his/herself”
[21] | “…. requires thoughtful design of user-friendly interfaces that consider user experience and present data in personalized ways …”
[22] | “Design processes such as HCD, ….. can be crucial in ensuring that the product meets the needs …, in terms of safety and user experience.”
[7] | “HCD is a highly iterative method for optimizing the user experience and the effectiveness of the system, service or product”
[6] | “… understanding HCI and HCD is an important factor in developing successful computer user experience”
[23] | “… apps/wearable … should be designed to leverage and further improve the user experience …”
[24] | “.. to improve designer’s abilities and tools to search for user experience, active participation of users to design process and sharing their experiences with designers … is critical to develop more inclusive environments..”
[25] | “… an appropriate selection of game-design principles, … may improve the usability and user experience of a system”
[26] | “In contrast, HCD focuses primarily on individual user experiences.”
[1] | “… using HCD to optimize user experience at a tertiary academic medical center.”
[27] | “The human-centered design thinking methodology … identifying and defining the problem… a deep understanding of user experience”
[28] | “A review of the design … reducing the burden of cognitive strain experience”
RQ2: Does a human-centered design approach influence user experience in health informatics systems, and if so, how?
Yes. Various studies that used a human-centered design approach as their methodology reported that it helped improve their services and user experience. Many of the selected studies explained how they implemented the human-centered design approach in the development phase of their health informatics systems. Fig. 2 summarizes the process flow of the human-centered design method used by the selected studies.
Fig. 2. Flow of the human-centered design process (self-adapted). Human-centered design: Inspiration (complaining, sharing information, listening and feedback, consulting and advising) → Ideation (identify needs, research and analysis, design concept) → Implementation (develop prototype, pilot testing, user experience evaluation) → Delivery → Positive user experience.
As reported in various studies, human-centered design (HCD) consists of three main phases: inspiration, ideation, and implementation. The system is delivered at the end of these phases. The first phase, inspiration, generates ideas by listening to users' complaints, collecting the information users share, gathering user feedback, and consulting experts on how to improve products or services. The second phase, ideation, consolidates these ideas around the needs of users. The results are then researched and analyzed and shaped into the design concept of the HCD. The third phase, implementation, realizes the ideas by developing a prototype, pilot testing it, and evaluating the user experience. The three phases are repeated until the users' final evaluations show that the goals have been reached. Finally, the product is delivered to users, resulting in a positive user experience.
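The iterative three-phase flow described above can be expressed as a simple loop. The sketch below is a toy model under stated assumptions and not a method taken from the selected studies. The phase and activity names come from Fig. 2. The numeric UX score, the `target_score` goal, the `max_iterations` bound, and the `evaluate` callback are all hypothetical stand-ins for the users' final evaluation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Phases and their activities, as listed in Fig. 2.
PHASES = {
    "inspiration": ["complaining", "sharing information",
                    "listening & feedback", "consulting & advising"],
    "ideation": ["identify needs", "research & analysis", "design concept"],
    "implementation": ["develop prototype", "pilot testing",
                       "user experience evaluation"],
}


@dataclass
class DesignCycle:
    target_score: float       # hypothetical UX goal agreed with users (0..1)
    max_iterations: int = 10  # hypothetical bound so the loop always ends
    log: list = field(default_factory=list)

    def run(self, evaluate: Callable[[int], float]) -> bool:
        """Repeat inspiration -> ideation -> implementation until the user
        experience evaluation reaches the target, then record delivery.

        `evaluate(iteration)` is a hypothetical callback returning the UX
        score measured at the end of the implementation phase.
        """
        for iteration in range(1, self.max_iterations + 1):
            for phase, activities in PHASES.items():
                # Record which phase ran and how many activities it covers.
                self.log.append(f"{iteration}:{phase}:{len(activities)}")
            if evaluate(iteration) >= self.target_score:
                self.log.append("delivery")
                return True
        return False  # goals not met within the iteration budget


scores = [0.55, 0.7, 0.85]
cycle = DesignCycle(target_score=0.8)
print(cycle.run(lambda i: scores[i - 1]), len(cycle.log))
```

With the illustrative scores 0.55, 0.7, and 0.85 and a target of 0.8, the cycle runs three times before delivery. The iteration bound is a design choice that keeps the loop finite if the users' goals are never reached.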
5 Discussion
The results show that applying human-centered design principles during the development of informatics systems can improve service productivity. All the selected studies share the same aim: developing products with human-centered design as their methodology in order to improve user experience and satisfaction. Technological evolution is pushing many sectors to move from paper-based to system-based ways of working. Designing such systems is challenging because developers must consider users' needs, expectations, and experiences. Moreover, this research area is still actively debated in previous research. Further study of user experience in human-centered design implementations is therefore essential for improving e-services in the healthcare sector. In addition, this study provides evidence of the influence of human-centered design (HCD) on positive user experience (UX).
6 Conclusion
This systematic literature review covered studies in the health informatics sector published between 2015 and 2020. Although this area was widely discussed in earlier studies, only a few recent studies focus specifically on applying a human-centered design approach in the health informatics sector. Most of these studies aim to improve e-services in the health sector using a human-centered design approach. Of all the publications within the selected period, only 64 studies met the inclusion criteria for the review. This study also identifies the relation between the human-centered design approach and user experience. It further describes the process through which human-centered design implementation influences informatics systems.
References
1. Vagal, A., Wahab, S., et al.: Optimizing patient experience using human-centered design. J. Am. Coll. Radiol. 17(5), 668–672 (2020). https://doi.org/10.1016/j.jacr.2019.11.020
2. Kim, S.H., Myers, C.G., Allen, L.: Health care providers can use design thinking to improve patient experiences. Harv. Bus. Rev. 95, 222–229 (2017)
3. Searl, M.M., Borgi, L., Chemali, Z.: It is time to talk about people: a human-centered healthcare system. Health Res. Policy Syst. 8, 35 (2010)
4. Matheson, G.O., et al.: Leveraging human-centered design in chronic disease prevention. Am. J. Prev. Med. 48(4), 472–479 (2015). https://doi.org/10.1016/j.amepre.2014.10.014
5. Trauzettel, F., Minge, M.: Usability in the lifecycle of medical software development. Curr. Dir. Biomed. Eng. 2(1), 583–586 (2016). https://doi.org/10.1515/cdbme-2016-0129
6. Søgaard Neilsen, A., Wilson, R.L.: Combining e-mental health intervention development with human computer interaction (HCI) design to enhance technology-facilitated recovery for people with depression and/or anxiety conditions: an integrative literature review. Int. J. Ment. Health Nurs. 28(1), 22–39 (2019). https://doi.org/10.1111/inm.12527
7. Witteman, H.O., et al.: User-centered design and the development of patient decision aids: protocol for a systematic review. Syst. Rev. 4(1) (2015). https://doi.org/10.1186/2046-4053-4-11
8. Dunn, J.L., et al.: Human factors and user experience issues with ventricular assist device wearable components: a systematic review. Ann. Biomed. Eng. 2431–2488 (2019). https://doi.org/10.1007/s10439-019-02303-3
9. Shamseer, L., et al.: Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ (Online) g7647 (2015). https://doi.org/10.1136/bmj.g7647
10. Panic, N., et al.: Evaluation of the endorsement of the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement on the quality of published systematic review and meta-analyses. PLoS ONE (2013). https://doi.org/10.1371/journal.pone.0083138
11. Dey, N., Rautray, P., Soni, M.: Patient-centered design in a connected healthcare world: a case study. Smart Innovation, Systems and Technologies, pp. 967–976. Springer Science and Business Media Deutschland GmbH (2019). https://doi.org/10.1007/978-981-13-5974-3_83
12. Gomes, N., Patwardhan, V.: Applying human-centered design and human-machine integration techniques to solve key healthcare problems. Advances in Intelligent Systems and Computing, pp. 3–9. Springer (2019). https://doi.org/10.1007/978-3-030-02053-8_1
13. Park, T., et al.: Living profiles: an example of user-centered design in developing a teen-oriented personal health record. Pers. Ubiquit. Comput. 19(1), 69–77 (2015). https://doi.org/10.1007/s00779-014-0812-1
14. Khakurel, J., et al.: Human-centered design components in spiral model to improve mobility of older adults, pp. 83–104. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-93491-4_5
15. Sebillo, M., et al.: Human-centered design of a personal medication assistant - putting polypharmacy management into patient’s hand! Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 685–699. Springer (2017). https://doi.org/10.1007/978-3-319-57186-7_49
16. Vilardaga, R., et al.: User-centered design of learn to quit, a smoking cessation smartphone app for people with serious mental illness. J. Med. Internet Res. 20(1) (2018). https://doi.org/10.2196/games.8881
17. Carter, J., Bababekov, Y.J., Majmudar, M.D.: Training for our digital future: a human-centered design approach to graduate medical education for aspiring clinician-innovators. npj Digital Med. 1(1), 26 (2018). https://doi.org/10.1038/s41746-018-0034-4
18. Taylor, G.A., McDonagh, D., Hansen, M.J.: Improving the pelvic exam experience: a human-centered design study. Des. J. 20(sup1), S2348–S2362 (2017). https://doi.org/10.1080/14606925.2017.1352750
19. Holeman, I., Kane, D.: Human-centered design for global health equity. Inf. Technol. Dev. 26(3), 477–505 (2020). https://doi.org/10.1080/02681102.2019.1667289
20. Persson, J.: A review of the design and development processes of simulation for training in healthcare – a technology-centered versus a human-centered perspective. Appl. Ergon. 314–326 (2017). https://doi.org/10.1016/j.apergo.2016.07.007
21. Hartzler, A.L., et al.: Integrating patient-reported outcomes into spine surgical care through visual dashboards: lessons learned from human-centered design. eGEMs (Generating Evid. Methods Improve Patient Outcomes) 3(2), 2 (2015). https://doi.org/10.13063/2327-9214.1133
22. Harte, R., et al.: Human-centered design study: enhancing the usability of a mobile phone app in an integrated falls risk detection system for use by older adult users. JMIR mHealth uHealth 5(5), e71 (2017). https://doi.org/10.2196/mhealth.7046
23. Wulfovich, S., et al.: “I must try harder”: design implications for mobile apps and wearables contributing to self-efficacy of patients with chronic conditions. Front. Psychol. 10 (2019). https://doi.org/10.3389/fpsyg.2019.02388
24. Özten Anay, M.: Design thinking to familiarize hearing-impaired architectural drafting students with human-centered design concept. ICONARP Int. J. Archit. Plan. 8(1), 62–87 (2020). https://doi.org/10.15320/iconarp.2020.105
25. Schulz, R., Martinez, S., Hara, T.: Towards a game-design framework for evidence-based clinical procedure libraries. In: 2019 IEEE 7th International Conference on Serious Games and Applications for Health, SeGAH 2019. Institute of Electrical and Electronics Engineers Inc. (2019). https://doi.org/10.1109/SeGAH.2019.8882474
26. Chen, E., et al.: Enhancing community-based participatory research through human-centered design strategies. Health Promot. Pract. 21(1), 37–48 (2020). https://doi.org/10.1177/1524839919850557
27. Vagal, A., Wahab, S.A., et al.: Human-centered design thinking in radiology. J. Am. Coll. Radiol. 17(5), 662–667 (2020). https://doi.org/10.1016/j.jacr.2019.11.019
28. Faiola, A., Srinivas, P., Duke, J.: Supporting clinical cognition: a human-centered approach to a novel ICU information visualization dashboard. In: AMIA ... Annual Symposium Proceedings, AMIA Symposium, pp. 560–569 (2015). https://pubmed.ncbi.nlm.nih.gov/26958190/. Accessed 9 Sep 2020
An Emotional-Persuasive Habit-Change Support Mobile Application for Heart Disease Patients (BeHabit)
Bhavani Devi Ravichandran and Pantea Keikhosrokiani
School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
Abstract. Heart disease is one of the world's biggest killers. Its risk factors stem from bad habits such as being overweight, an unhealthy diet, smoking, and alcohol consumption. Nevertheless, patients can live a healthy lifestyle if they have the proper guidance of technologies with persuasive and emotional features. In line with this, this study develops an emotional-persuasive habit-change support mobile application called BeHabit to improve the lifestyles of heart disease patients. Persuasive and emotional features, two different types of features, are integrated into BeHabit to distinguish it from existing applications. The proposed system was designed, implemented, and tested, and was evaluated by 10 users. In conclusion, the users were satisfied with using BeHabit to change their bad habits, and the emotional and persuasive features integrated into BeHabit are key to helping patients change their bad habits. BeHabit and its integrated features can serve as a guideline for healthcare developers and providers to improve mHealth services.
Keywords: Heart disease · mHealth · Habit-change · Persuasive · Emotional features · Mood · Medical information system
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 252–262, 2021. https://doi.org/10.1007/978-3-030-70713-2_25
1 Introduction
According to the press release statistics on causes of death in Malaysia [1], Malaysians are affected by coronary illness at the very young age of 41, compared with 65 in Thailand, 63 in China, 66 in Western countries, and 68 in Canada. Heart disease patients in Malaysia therefore require more attention. Mobile health technologies, for instance, can impact the health of chronic disease patients [2]. Bad habits related to diet, sleep, smoking, exercise, and so on may be a main cause of many diseases, including heart disease. Many existing mobile applications provide a platform for heart disease patients, mostly to record the patient's activities; however, no current mobile application uses persuasive and emotional features to help heart disease patients change their bad habits. The lifestyle of a heart disease patient is very different from that of a healthy individual: the patient cannot perform vigorous physical activity, and their heart rate should not exceed a certain rate. Therefore, this study aims to develop a mobile application called BeHabit
to assist heart disease patients by implementing persuasive and emotional features that motivate them to form a new, healthy lifestyle. BeHabit guides patients to follow healthy daily routines and keeps track of the patient's moods and symptoms after each activity. It is programmed to suggest light physical activities, as recommended by medical experts [3]. BeHabit also provides a daily summary to motivate patients emotionally so that they remain enthusiastic about continuing a healthy lifestyle. In addition, the application helps doctors monitor their patients in real time and provide prescriptions to them. This paper first introduces some existing mobile health (mHealth) applications, followed by the proposed emotional-persuasive habit-change support system with a mobile application called BeHabit. BeHabit was developed in June 2020 to assist heart disease patients in changing their bad exercise and mood habits in order to reduce the further development of heart disease. The system design, methodology, implementation, testing, and user acceptance evaluation are summarized in this paper, followed by concluding remarks and future work.
1.1 Background
The use of mHealth has been a massive game-changer for the healthcare industry, as patients have access to the latest reliable medical resources, treatments, good communication with doctors, and more. Emerging mobile technology provides a platform for patients to be more aware of self-care and can reduce hospitalization and mortality rates by 21% and 20%, respectively [4]. The rate of heart disease is relentlessly rising, and it has become one of the biggest killers. Nonetheless, heart disease can be prevented in many ways by following good habits, such as exercising 30 min a day on most days of the week, eating a healthy diet, maintaining an ideal weight, and reducing stress, among many other measures.
Persuasive and emotional features are two different technologies, each researched individually. The study of computers as persuasive technologies is known as captology: the study of interactive technology that helps change the user's habits. Fogg [5] presented five perspectives on computers and persuasion, which form the primary research for understanding persuasive computing. Emotional technology, in turn, is an emerging area of artificial intelligence that measures biometric information to compute emotions for different computer applications, and demand for its application in various fields has recently increased. In this scenario, emotionally featured technology is key to guiding patients' emotions in a way that helps them change their bad habits. The health habits of heart disease patients play a major role in mortality. Patients are advised to exercise regularly; however, according to medical experts, exercise is different for heart disease patients, who should only do light exercise [3]. Hence, heart disease patients should be guided differently based on their lifestyle. The Instant Heart Rate mobile application is designed to measure pulse and heartbeat zone, serving as a heart rate and health monitor after sleeping or during workouts and training. Instant Heart Rate does not require a heart rate strap. It monitors blood circulation in the manner of a heart health monitor (such as an ECG or EKG). It functions similarly to pulse
oximeters, detecting changes in the user's finger to provide accurate heartbeat measurements, and it can measure the instant heart rate in less than 10 s. Most of its functions require in-app purchases, and the free version has limited functionality [6, 7]. Cardiio measures the user's pulse using the phone camera. It helps users gain insight into how heart rate relates to fitness and endurance, and it aims to improve the user's fitness through high-intensity circuit training exercises that take around 7 min to complete. It keeps personal dashboards with daily, weekly, and monthly history. This application is only available on iOS, and most of its functionality is limited to in-app purchases for regular users [8]. The iCardio application helps users keep track of runs, rides, cardio activities at the gym, daily step count, and activity, all in one application. Users can add heart rate data for more accurate calorie counts. The main motivation of this application is to help users lose weight, and it can track workouts both indoors and outdoors [9, 10].
2 Proposed Solution
BeHabit is a specialized solution that helps heart disease patients improve their lifestyle and stay healthy. The system connects to Samsung Health via a Samsung smartwatch and retrieves data from the Samsung Health cloud. The smartwatch provides more accurate user data, such as heart rate, calories, distance traveled, and step count. The Samsung Health platform is a useful tool, as it provides a centralized database for developers, and all information is retrievable from a single Samsung account. For heart disease patients, BeHabit analyses the user's heart rate in real time and can identify abnormal changes. When an abnormality is detected, the system automatically sends the user's abnormal maximum heart rate to the doctor and alerts the user in the application. The user can also communicate with the doctor by sending messages and can retrieve prescriptions directly from their doctor. Moreover, the system implements persuasive and emotional features to persuade patients to change their bad habits. As a persuasive feature, the system applies a point-collecting system known as BeHabit points, a technique for measuring the user's activeness, as presented in Table 1. This feature persuades users to improve their health by increasing their step count or carrying out more activities. The system also sends users persuasive messages to complete their targeted achievements. Another persuasive feature lets the user choose an activity from a list, with a target duration of at most 30 min per activity; at the end of the activity, the system praises the user. Further persuasive features are defined in [11]; BeHabit uses primary task support, dialogue support, and credibility support in the persuasion context. In short, an application should be appealing, pleasurable, memorable, and effective.
In the same spirit, BeHabit implements similar methods to affect the user emotionally. The system keeps track of the user's mood before and after an activity, and it also records any symptoms after an activity. According to [12], the features of mood-tracking mobile apps can be mapped into stages of mood tracking. Table 2 shows the stages implemented in the BeHabit system. For example, when the system prompts the user with the mood tracking form, this is the preparation stage, which provides fundamental information on how to conduct mood tracking. Next, the system shows a range of emoticons, pictures, and texts for the user to describe how they feel; this is the collection stage.

Table 1. Habit-points calculation
Habit-points range | Remarks | Types of activity achieved
0 to 20 | Very bad | 20% of the targeted step count; preference of user mood
21 to 40 | Bad | 40% of the targeted step count; preference of user mood
41 to 60 | Average | 60% of the targeted step count; preference of user mood
61 to 80 | Good | 80% of the targeted step count; preference of user mood
81 to 100 | Excellent | 100% or more of the targeted step count; preference of user mood

Table 2. Stages of mood tracking
The system also sends users motivational messages to improve their emotional state; this feature also belongs to the collection stage. Moreover, the system displays a summary of the user's activity. The main purpose of this feature is to motivate the user emotionally to keep up the work or to remind them to work out more, which corresponds to the reflection stage in Table 2. Finally, the system exports user data to the doctor for reference, and the user receives a prescription to improve their health.
2.1 System Design
The BeHabit system architecture is illustrated in Fig. 1. End users use Android mobile devices that connect to the Internet via an access point. Through this Internet connection, the mobile application accesses the online databases, in this case Firebase and the Samsung Health cloud. However, the user's health data retrieved from Samsung Health are not stored in any database, to ensure the user's confidentiality and data access control.
Fig. 1. System architecture diagram
The application architecture is divided into three layers: (1) the view layer, (2) the domain layer, and (3) the data layer. Each layer plays a significant role in the overall architecture of the project. The view layer is responsible for the interaction between the users and the system. The mobile application needs to be installed on a smartphone with a stable cellular network connection and a GPS sensor, and it connects to the Internet via the gateway. The domain layer is responsible for processing the data obtained from the view layer and passing it to the APIs for data operations. The APIs used here are Google Firebase services, Samsung Health services, and the YouTube API. Finally, the data layer is responsible for storing and managing the data used in the application. The main database in this system is the Firebase Realtime Database, a flexible, scalable NoSQL cloud database that stores and syncs data for the client. The Firebase database serves as online storage from which the application retrieves basic user information. Firebase Cloud Messaging (FCM) is also used to send push notifications. The components in every layer need to work together seamlessly to ensure the application performs at a desirable level. In addition, the Samsung Health Data Store and Tracker Service provide a platform for sharing health data from users to Android phones. The health data and services are retrieved in real time by the BeHabit system, and health data are shared only with the user's knowledge.
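To make the three-layer split concrete, the following minimal Python sketch routes one operation through the view, domain, and data layers. It illustrates the idea under stated assumptions and is not BeHabit's code. An in-memory dictionary stands in for the Firebase Realtime Database. The doctor-messaging flow described later in this paper, in which empty messages are rejected, serves as the example. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataLayer:
    """Stores and returns records; an in-memory stand-in for Firebase (assumption)."""
    _store: dict = field(default_factory=dict)

    def push(self, path: str, record: dict) -> None:
        # Append the record under a path, similar to pushing to a database node.
        self._store.setdefault(path, []).append(record)

    def read(self, path: str) -> list:
        # Return a copy so callers cannot mutate the stored list directly.
        return list(self._store.get(path, []))


class DomainLayer:
    """Validates and processes data coming from the view layer."""

    def __init__(self, data: DataLayer) -> None:
        self.data = data

    def send_message(self, user_id: str, doctor_id: str, text: str) -> bool:
        # Reject empty messages, mirroring the sendMessage() behaviour in the paper.
        if not text.strip():
            return False
        self.data.push(f"messages/{doctor_id}", {"from": user_id, "text": text.strip()})
        return True


class ViewLayer:
    """Handles user interaction and turns domain results into user feedback."""

    def __init__(self, domain: DomainLayer) -> None:
        self.domain = domain

    def on_send_tapped(self, user_id: str, doctor_id: str, text: str) -> str:
        ok = self.domain.send_message(user_id, doctor_id, text)
        return "Message sent" if ok else "Please enter a message"


if __name__ == "__main__":
    data = DataLayer()
    view = ViewLayer(DomainLayer(data))
    print(view.on_send_tapped("patient-1", "doctor-7", "   "))  # Please enter a message
    print(view.on_send_tapped("patient-1", "doctor-7", "Question about my prescription"))
    print(data.read("messages/doctor-7"))
```

Keeping validation in the domain layer means the view only translates results into feedback. The storage backend could then be swapped for a real database without changing the other two layers.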
3 System Development Methodology
The development methodology for this application is the Software Development Life Cycle (SDLC), following [13–17]. The SDLC is a framework that defines the various phases or procedures of development. The steps involved in the SDLC are modeling, assessment, design, and prototyping, as shown in Fig. 2. The SDLC can be applied to both hardware and software to deliver high-quality products or services, helping organizations run smoothly.
Fig. 2. SDLC methodology of the project [13]
3.1 System Implementation
System Requirement
There are functional and non-functional requirements for developing BeHabit. Under the functional requirements, the system should be able to retrieve real-time Samsung Health data and display it in an understandable user interface. It must provide an option for the user to view a history of health data. Furthermore, the user should be able to receive a push notification containing a daily quote of encouragement. The user can send messages to and receive messages from the doctor in real time. The system provides a platform for users to track and view their mood. The system checks the user's heart rate if the user has a device capable of measuring it. The system displays helpful, encouraging tips to motivate users as a persuasive feature. It should recommend types of activity to carry out based on heart rate data. The user can also share their weekly and monthly achievements. Finally, it must notify the doctor and the user in real time in case of any abnormality in the user's heart rate.
As for non-functional requirements, the mobile application is designed with a minimum SDK version of API 24 (Android 7.0 Nougat), so it runs on devices with Android 7.0 or later. Every user is required to sign in to access the functionalities of the mobile application, in order to protect personal data. The user's health data are retrieved directly from the Samsung Health API and are not stored anywhere on the device, to ensure the user's confidentiality. The mobile application should have a consistent, user-friendly graphical user interface. The system should always perform at an optimal level unless there is no Internet connection or the online database fails; this requirement ensures system availability.
Algorithms, Pseudocodes, APIs
This section discusses only the operations with more sophisticated procedures that use a specific library or API. Straightforward operations such as user sign-in, displaying encouraging tips, and managing the user profile are self-explanatory and are therefore not discussed here.
Recommend Activity
The activity recommendation implements a persuasive feature: the application retrieves the user's binned heart rate data from Samsung Health. BeHabit has built-in methods to calculate the user's estimated heart rate at the vigorous level and to compare the user's current heart rates with that value. These methods, onCalculateEstimatedHR() and onCompareHR(), are called when the user requests an activity recommendation. The logic starts by converting the binned heart rate values into an array and obtaining the user's age from the Firebase database. The method then uses the age to estimate the vigorous-level heart rate and compares the heart rate array with this estimate. If every heart rate value in the array is below the estimated value, BeHabit recommends regular activities. However, if any heart rate value in the array exceeds the estimated value, BeHabit immediately recommends calming activities. The user can carry out an activity by tapping its icon, and the application links directly to the Samsung Health activity tracker.
Alert User
The primary objective of the Alert User activity is to alert the user when their heart rate during exercise exceeds the estimated heart rate value. BeHabit uses a table of heart rate zones as a guide for detecting abnormal heart rates. Firstly, the user's age is retrieved from the Firebase database.
Secondly, BeHabit receives the binned heart rate data and converts it into an array to check whether any heart rate exceeds the vigorous-intensity or maximum heart rate zone. The method implemented here, checkExerciseHeartRate(), returns either true or false. When the method returns true, the notificationAlert() method is triggered, which sends a notification to inform the user immediately. Furthermore, the exceeded heart rate value is pushed to the Firebase database, so that the user can later view the heart rate values by date in a list view.
Send/Receive Message to/from Doctor
BeHabit allows communication between users (patients) and their respective doctors. To send a message to the doctor, the user selects the Send message button, which triggers the sendMessage() method. This method opens another layout in which the user writes the message. When done, the user taps the send button. If the message is empty, the system prompts the user to enter a message; otherwise, the message is pushed to the Firebase database. The message is later received on the doctor's side and displayed in the application. The system will also display
messages received from the doctor, which are pushed to the Firebase database, in the application.
Retrieve Current and Historical Health Data
The BeHabit application depends primarily on the Samsung Health SDK, which provides data and services through an Android API. The Samsung Health Data Store syncs data with the user's Samsung account. The health data are retrieved from the Samsung Health server, which implements a REST API. For first-time users, BeHabit requests REST API OAuth 2.0 authorization from the Samsung Health Server SDK, and users must approve the authentication to fully utilize BeHabit's functionalities. First, the health data service must be initialized and the health data store connection established. Next, listeners must be set to retrieve the required health data. In this system, daily step count, heart rate, and exercise health data are retrieved. This activity implements a persuasive feature, as it motivates the user to be more active.
Receive Daily Quotes
In this part, the system sends the user inspirational daily quotes as a persuasive feature. Every day at 8 am, the user receives a motivational quote via Firebase Cloud Messaging. The system randomly selects a quote from a long list and triggers the quoteFCM() method, which starts the service and sends the daily quote to the user. The user can also view the quote in the application and share it on other social media platforms.
Mood and Symptoms Tracker
The system tracks the user's mood and symptoms daily as emotional features. The user selects a date on which to make a mood or symptom entry, and the system triggers moodTrack() and symptomsTrack(). The user must select one mood or symptom and tap the submit button. The submitted information is then pushed to the Firebase Realtime Database, and the user can view the submitted data in the application.
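The heart rate comparisons described under Recommend Activity and Alert User can be sketched in Python as follows. The paper does not state how the estimated vigorous-level heart rate is derived. The sketch therefore assumes the common 220 - age estimate of maximum heart rate and a vigorous-intensity threshold of 77% of that maximum. Both values are assumptions and should be replaced by whatever BeHabit actually uses. The sketch also checks only the vigorous threshold, not a separate maximum zone. The function names are hypothetical stand-ins for onCalculateEstimatedHR(), onCompareHR(), and checkExerciseHeartRate().

```python
from typing import List, Sequence

# Assumed lower bound of the vigorous-intensity zone, as a fraction of the
# estimated maximum heart rate (not specified in the paper).
VIGOROUS_FRACTION = 0.77


def estimated_vigorous_hr(age: int, fraction: float = VIGOROUS_FRACTION) -> float:
    """Estimate the heart rate (bpm) at which activity becomes vigorous."""
    max_hr = 220 - age  # assumed age-based estimate of maximum heart rate
    return max_hr * fraction


def exceeded_values(binned_hr: Sequence[float], age: int) -> List[float]:
    """Return the binned heart rate values that lie above the vigorous threshold."""
    threshold = estimated_vigorous_hr(age)
    return [hr for hr in binned_hr if hr > threshold]


def recommend_activity(binned_hr: Sequence[float], age: int) -> str:
    """Recommend calming activities if any value exceeds the threshold, else regular ones."""
    return "calming" if exceeded_values(binned_hr, age) else "regular"


def check_exercise_heart_rate(binned_hr: Sequence[float], age: int) -> bool:
    """Return True when an exercise heart rate alert should be raised."""
    return len(exceeded_values(binned_hr, age)) > 0


if __name__ == "__main__":
    age = 60  # estimated threshold: (220 - 60) * 0.77, about 123 bpm
    session = [92.0, 110.0, 126.0, 118.0]
    print(recommend_activity(session, age))         # calming
    print(check_exercise_heart_rate(session, age))  # True
    print(exceeded_values(session, age))            # [126.0]
```

In the application, a True result from check_exercise_heart_rate() would correspond to triggering notificationAlert(). The values returned by exceeded_values() are the ones that would be pushed to Firebase for the later list view.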
Calculate BeHabit Points
The main purpose of calculating BeHabit points is to assess the user's activeness and mental state. The points depend on both the total steps taken by the user and the mood of the day. The system retrieves the health data and reads the user's input for symptoms. The implementation of BeHabit points applies persuasive features. BeHabit points are calculated as follows:
$$\text{BeHabit points} = \left(\frac{\text{Total steps}}{\text{Target steps}} \times 0.5\right) + \left(\frac{\text{Mood value for the day}}{\text{Number of moods}} \times 0.5\right) \tag{1}$$
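Eq. (1) and Table 1 can be checked together with a short Python sketch. As written, Eq. (1) yields at most 1 when the step count does not exceed the target, whereas Table 1 uses a 0 to 100 range. The sketch therefore assumes that the result is multiplied by 100. It makes three further assumptions:

- the step ratio is capped at 1, because Table 1 places "100% or more" of the target in the top band;
- the mood value is a rating from 1 to the number of moods;
- each Table 1 upper bound is inclusive.

The function names are hypothetical.

```python
def behabit_points(total_steps: int, target_steps: int,
                   mood_value: int, num_moods: int) -> float:
    """Eq. (1) scaled to 0-100 (the x100 scaling and step cap are assumptions)."""
    if target_steps <= 0 or num_moods <= 0:
        raise ValueError("target_steps and num_moods must be positive")
    step_ratio = min(total_steps / target_steps, 1.0)  # assumed cap at 100% of target
    mood_ratio = mood_value / num_moods                # assumed mood rating 1..num_moods
    return 100 * (step_ratio * 0.5 + mood_ratio * 0.5)


def remark(points: float) -> str:
    """Map points onto the Table 1 remarks, treating each upper bound as inclusive."""
    for upper, label in ((20, "Very bad"), (40, "Bad"), (60, "Average"), (80, "Good")):
        if points <= upper:
            return label
    return "Excellent"


if __name__ == "__main__":
    # 6000 of 8000 target steps and mood 2 on a 4-level scale.
    pts = behabit_points(total_steps=6000, target_steps=8000, mood_value=2, num_moods=4)
    print(pts, remark(pts))  # 62.5 Good
```

Because both terms are weighted equally, a user who fully meets the step target but reports the lowest mood still cannot reach the Excellent band under these assumptions.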
4 Testing and Evaluation
4.1 Unit Testing
Unit testing checks the functionality of the smallest units of code; its purpose is to verify that each unit of the software performs as designed. Unit testing is very
important because it helps ensure that the system runs seamlessly. In this Android application, which uses Android Studio as the development environment, unit testing is carried out at the method level: each method's functionality is tested individually to minimize errors when methods are integrated into a subsystem.
4.2 Integration Testing
Integration testing combines units of code and tests them together to verify that the combined functions work correctly. The purpose of this level of testing is to expose faults in the interaction between integrated units. Integration testing provides a systematic technique for assembling a software system while conducting tests to uncover errors associated with interfacing, such as parameter, function, and run-time exception incompatibilities in the interaction between objects. In this application, integration between the subsystems is important because the work was divided among different team members. Integration testing needs to be carried out regularly from the start of development, and it is advisable to carry it out whenever a subsystem's functionality is complete, to ensure efficient implementation of the application.
4.3 System Testing and User Acceptance Evaluation
System testing is a level of software testing in which the complete, integrated software is tested. We performed system testing at the end of each iteration to ensure compliance with the specified requirements; if any error was found, immediate action was taken to fine-tune the system. In this project, for example, system testing verified that the application can retrieve the user's real-time health data and process it to determine whether the user is active. User acceptance testing (UAT) is generally carried out when the system is integrated and complete.
It is carried out with 10 randomly selected users. This testing
Fig. 3. User interface designs of the BeHabit application
is carried out by first introducing all the available features to the respondents to make sure they understand how each implemented function is used. After that, each respondent is given about 20 min with the application. Next, the respondents are asked to fill in a questionnaire of 7 questions measuring user satisfaction with and acceptance of BeHabit. After all the responses are gathered, the UAT results are analyzed. Lastly, the UAT results will be taken into consideration in future enhancement or development of the project. Figure 3 shows the user interfaces of the application.
5 Conclusion and Future Work
The BeHabit system and application have been developed successfully, meeting the requirements for a platform that improves heart disease patients' lifestyles. BeHabit is equipped with push notifications that alert the user when their heart rate increases, and the user can receive prescriptions from the doctor. At the same time, the application benefits patients through its emotional and persuasive features. The user can keep track of their daily mood over time, as well as any symptoms. All this information is sent to the doctor's side, where the doctor can observe it and provide prescriptions for the individual user. The user can also send messages to the doctor to ask questions or give updates. The application provides a visual presentation of the user's health data. Moreover, the system can recommend activities to users according to their health. BeHabit was tested and evaluated by 10 users, who were satisfied with using the app. Emotional and persuasive features are very important for changing bad habits. In the future, communication with the doctor should be improved. The system can be extended to track the user's other unhealthy habits, such as smoking and diet. Furthermore, a food intake tracking feature can be added to track the user's calorie intake. Finally, the BeHabit point calculation can be improved.
Acknowledgment. The authors are thankful to the School of Computer Sciences and the Division of Research & Innovation, USM, for providing financial support from the Short Term Grant (304/PKOMP/6315435) granted to Dr Pantea Keikhosrokiani.
References
1. Department of Statistics Malaysia: Press Release Statistics on Causes of Death, Malaysia (2019). https://dosm.gov.my/v1/index.php?r=column/pdfPrev&id=RUxlSDNkcnRVazJnakNCNVN2VGgrdz09. Accessed 30 Nov 2019
2. Nilsen, W., et al.: Advancing the science of mHealth. J. Health Commun. 17(sup1), 5–10 (2012)
3. Fuezeki, E., Engeroff, T., Banzer, W.: Health benefits of light-intensity physical activity: a systematic review of accelerometer data of the National Health and Nutrition Examination Survey (NHANES). Sports Med. 47(9), 1769–1793 (2017)
4. Clark, R.A., et al.: Telemonitoring or structured telephone support programmes for patients with chronic heart failure: systematic review and meta-analysis. BMJ 334(7600), 942 (2007)
5. Fogg, B.J.: Persuasive computers: perspectives and research directions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (1998)
6. Instant Heart Rate: HR Monitor & Pulse Checker. Apps on Google Play, Google (2019). https://play.google.com/store/apps/details?id=si.modula.android.instantheartrate&hl=en. Accessed 28 Oct 2019
7. Azumio Inc.: Instant Heart Rate: HR Monitor. App Store (2019). https://apps.apple.com/us/app/instant-heart-rate-hr-monitor/id409625068. Accessed 28 Oct 2019
8. Cardiio, Inc.: Cardiio: Heart Rate Monitor. App Store (2019). https://apps.apple.com/us/app/cardiio-heart-rate-monitor/id542891434. Accessed 28 Oct 2019
9. iCardio Workout Tracker & Heart Rate Trainer. Apps on Google Play, Google (2019). https://play.google.com/store/apps/details?id=com.fitdigits.icardio.app&hl=en. Accessed 28 Oct 2019
10. Fitdigits Inc.: iCardio Workout Tracker. App Store (2019). https://apps.apple.com/us/app/icardio-workout-tracker/id314841648. Accessed 28 Oct 2019
11. Lehto, T., Oinas-Kukkonen, H.: Persuasive features in web-based alcohol and smoking interventions: a systematic review of the literature. J. Med. Internet Res. 13(3), e46 (2011)
12. Caldeira, C., et al.: Mobile apps for mood tracking: an analysis of features and user reviews. In: AMIA Annual Symposium Proceedings. American Medical Informatics Association (2017)
13. Keikhosrokiani, P.: Perspectives in the Development of Mobile Medical Information Systems: Life Cycle, Management, Methodological Approach, and Application. Academic Press, Cambridge (2019)
14. Keikhosrokiani, P.: Chapter 6 - Emotional-persuasive and habit-change assessment of mobile medical information systems (mMIS). In: Keikhosrokiani, P. (ed.) Perspectives in the Development of Mobile Medical Information Systems, pp. 101–109. Academic Press (2020)
15. Keikhosrokiani, P., et al.: User behavioral intention toward using mobile healthcare system. In: Consumer-Driven Technologies in Healthcare: Breakthroughs in Research and Practice, pp. 429–444. IGI Global (2019)
16. Keikhosrokiani, P.: Chapter 4 - Behavioral intention to use of mobile medical information system (mMIS). In: Keikhosrokiani, P. (ed.) Perspectives in the Development of Mobile Medical Information Systems, pp. 57–73. Academic Press (2020)
17. Keikhosrokiani, P., Mustaffa, N., Zakaria, N.: Success factors in developing iHeart as a patient-centric healthcare system: a multi-group analysis. Telematics Inform. 35(4), 753–775 (2018)
18. Keikhosrokiani, P., et al.: Assessment of a medical information system: the mediating role of use and user satisfaction on the success of human interaction with the mobile healthcare system (iHeart). Cogn. Technol. Work 22(2), 281–305 (2020)
A Systematic Review of the Integration of Motivational and Behavioural Theories in Game-Based Health Interventions
Abdulsalam S. Mustafa, Nor'ashikin Ali, and Jaspaljeet Singh Dhillon
Universiti Tenaga Nasional, Selangor, Malaysia
{nora.ali08,jaspaljeet}@uniten.edu.my
Abstract. M-Health interventions designed for healthcare can potentially increase participation and improve behavioural outcomes. However, interventions need to incorporate a theoretical perspective on behavioural change to enhance their perceived efficacy. Although behavioural outcome theories have gained interest in the health and fitness literature, the implementation of theoretical integration remains largely under-studied. Therefore, we reviewed the efficacy of gamified behavioural interventions based on integrated theories in various contexts, such as healthcare and fitness. Studies were included if an intervention based on integrated theories was implemented to change behaviour in a specific context. The review aims to uncover the effectiveness of integrated theories in predicting behavioural outcomes in interventions. Across the 39 included studies, Self-Determination Theory (n = 19) and the Theory of Planned Behaviour (n = 16) were used more often than other theories in integrated models. Overall, 77% of studies showed evidence that behaviour change interventions based on integrated theories can be successful in the short term, with only a few studies testing these interventions' long-term effects. We discuss the implications of our findings and propose potential future directions.
Keywords: Integrated theories · Hybrid · Gamification · Intervention · Behaviour change · Health and fitness
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 263–278, 2021. https://doi.org/10.1007/978-3-030-70713-2_26
1 Introduction
Essentially all lifestyle-related health risks, such as non-communicable diseases, are significantly affected by an individual's health behaviours, such as physical activity and food intake. Globally, approximately 80% of adolescents have been reported to be physically inactive [1]. Evidence indicates that our well-being can be regulated by individual behaviours [2]. Thus, considering the potential risks of sedentary behaviour and physical inactivity [3], behavioural improvement becomes critical to maintaining a healthy lifestyle. One of the critical drivers of behavioural change in an individual is motivation. A key challenge in the health and fitness context is keeping users interested, inspired, and engaged in maintaining physical activity. Gamification, which refers to the use of game elements in a non-gaming context, has been proposed to have a positive impact on both health
264
A. S. Mustafa et al.
behaviour change and adherence [4, 5]. The concept of gamification has been applied in several context, such as education and online learning [6, 7], workplace [8] and travel [9], through game elements such as leaderboards, badges, and challenges. Gamification seeks to make activities more engaging, exciting, and promote a long-term behaviour outcome. Examples of apps and services that use gamification for user engagement include Fitbit, Duolingo, Foursquare, Khan Academy, Coursera, and Grab. However, for gamification to be effective in these conditions, the platform needs to encourage users to adhere to the interventions and remain engaged. This refers to continued usage behaviour or post-adoption behaviour, which is the repeated usage activity of users after the system is adopted, often measured by the frequency of use. Recent research, however, affirmed that the strengthening of the continuous use behaviour is related to maintaining long-term behaviour outcome, thus promoting the long-term stability and overall efficacy of the intervention [10, 11]. In total, interventions aim to increase motivation for sustained use of apps towards performing certain behaviours [12]. Nevertheless, to achieve this, it has become critical to provide a theoretical understanding of predicting behaviour. Therefore, theory application is essential in identifying the causal determinants of change and the effects of the intervention [13]. Indeed, theoretical approaches that enable theories to be tested to determine the most effective and best fit are now widely used in the health and fitness-related literature [14–16]. In this regard, Schoeppe and colleagues [17] noted varying degrees of effectiveness in behavioural change theories to explain behavioural health improvements and these theories’ efficacy. According to [18], addressing multiple theories helps develop more extensive interventions by leveraging each theory’s strength. 
Identifying the integrated theories used in the literature can thus strengthen the use of relevant theory and the objectivity with which it is implemented. It can also improve understanding of how multiple theories have been revised and combined over time, and of their usefulness in predicting behaviour through theory integration. This paper is organised as follows: Sect. 2 outlines and discusses theory integration in behaviour change interventions. Section 3 gives an overview of our review procedure. Section 4 summarises the key findings of our study. Section 5 discusses the results of our review. We end with a brief conclusion, identifying limitations and potential future directions, in Sect. 6.
2 Integration of Theories
Theory integration refers to combining the variables or constructs of two or more theories to form an integrated, hybrid or modified model. Integrated theories describe the key psychological factors and processes that help predict and explain behavioural change in hybrid-based health interventions, eliminating redundancy while leveraging each theory's strengths [19, 20]. Some authors have extended existing theories by adding useful constructs from one model to another [21, 22]; this can be referred to as theory advancement.
A Systematic Review of the Integration of Motivational and Behavioural Theories
While the literature shows that constructs from multiple theories can work together to predict behaviour, these studies share a common limitation: they test only one or a few constructs of each theory instead of all its variables [19]. To address this, Noar and Zimmerman [19] suggest integrating theories to expand our understanding of the constructs' influence, while evaluating individual theories to help guide their integration. This allows individual theoretical models to be compared with integrated ones and their benefits examined. The main goal is thus to provide a contemporary view of applied theoretical research in the health domain so that these processes are better understood, which may in turn lead to improvements in health behaviour and related outcomes. Integration is intended to provide a simplified yet detailed view of the factors that influence health behaviour, thereby closing theoretical gaps, reducing complexity and increasing parsimony [23]. However, although theoretical integration is feasible, further studies are needed before clear conclusions can be drawn. The first step is an analysis of the existing literature.
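The definition of theory integration given at the start of this section (combining constructs from several theories while eliminating redundancy) can be illustrated with a deliberately simplified Python sketch. The construct lists are an assumption made for illustration only: they reduce TPB and TAM to a few well-known constructs, and `integrate` is a hypothetical helper. Real integration must also specify how the merged constructs relate to one another (for example, which constructs predict intention), which this sketch ignores.

```python
from typing import Set, Tuple

# Simplified construct lists: an illustrative assumption, not full theories.
TPB = {"attitude", "subjective norm", "perceived behavioural control", "intention"}
TAM = {"perceived usefulness", "perceived ease of use", "attitude", "intention"}


def integrate(*theories: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Merge the constructs of several theories into one hybrid model.

    Returns (merged, redundant): the merged construct set, in which a
    construct shared by several theories appears only once, and the set
    of constructs that were duplicated across theories.
    """
    merged: Set[str] = set().union(*theories)
    seen: Set[str] = set()
    redundant: Set[str] = set()
    for constructs in theories:
        redundant |= seen & constructs  # already contributed by an earlier theory
        seen |= constructs
    return merged, redundant


model, shared = integrate(TPB, TAM)
print(sorted(shared))  # ['attitude', 'intention']
print(len(model))      # 6
```

Here `attitude` and `intention` appear in both simplified theories, so the hybrid model keeps a single copy of each; that is the redundancy elimination the definition refers to, while the remaining constructs are pooled from both sources.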
3 Review Procedure
3.1 Search Strategy
A systematic search of the scholarly literature was performed using DOAJ, Emerald Insight, Google Scholar, IEEE Xplore, GSPI, PubMed, ScienceDirect, Scopus, Web of Science and Wiley from April to July 2020. To locate relevant papers on integrated theories in behavioural interventions, we used combinations of the following search terms: “integrated”, “theories”, “integ”, “comb”, “models”, “behaviour” and “gamification”. We reviewed 88 potentially relevant records, of which only 22 met the study criteria. Specifically, we identified 76 records via database searches and 12 additional records from other sources, and excluded 44 articles with a different focus. A forward and backward search identified a further 17 related articles. In total, we selected 39 articles published between June 1999 and June 2020 for this review (see Fig. 1 and Table 4).
3.2 Data Analysis
Studies were categorised as quantitative, qualitative or mixed-method, using either subjective or objective indicators based on the concepts of [24]. We were especially interested in how the studies integrated theories into the behavioural change process, including which theoretical approaches were used and how effective they were. An intervention's success was judged on its reported outcome, which was categorised as (i) positive, (ii) negative or (iii) neutral (no effect). For the classification of theories, each study had to include at least one motivational or behavioural theory. Finally, the analysis framework included the following categories: study sample, study length, data collection, analysis, study location and outcome.
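The coding framework described in Sect. 3.2 can be pictured as one record per included study. The Python sketch below is an illustration under stated assumptions, not the authors' extraction instrument: the class and field names are hypothetical, and the only rule encoded is that an integrated model draws on at least two distinct theories. The example record uses only details this review reports for study [38] (SDT + SET, 46 participants, experimental and control groups, neutral outcome); fields the review does not report are left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    """Outcome categories used to judge whether an intervention succeeded."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"  # no effect


@dataclass
class StudyRecord:
    """One included study coded against the framework (hypothetical field names)."""
    citation: str
    theories: List[str]
    sample_size: Optional[int]
    study_length: Optional[str]
    data_collection: Optional[str]
    analysis: Optional[str]  # quantitative, qualitative or mixed-method
    location: Optional[str]
    outcome: Outcome

    def is_integrated(self) -> bool:
        # An integrated model draws on two or more distinct theories.
        return len(set(self.theories)) >= 2


# Values taken from what this review reports about study [38];
# anything not reported here is left as None.
study_38 = StudyRecord(
    citation="[38]",
    theories=["SDT", "SET"],
    sample_size=46,
    study_length=None,
    data_collection="experimental and control groups",
    analysis=None,
    location=None,
    outcome=Outcome.NEUTRAL,
)

print(study_38.is_integrated(), study_38.outcome.value)  # True neutral
```

Keeping the outcome as an enum rather than free text makes the positive/negative/neutral tally in Sect. 4 a simple count over records.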
Fig. 1. Flow diagram of the literature review process.
4 Analysis
The interventions were categorised by population size, study groups, research time frame, study location and study outcome. The selected articles predominantly used quantitative methods (77%); the remaining articles employed a mixed-methods approach. Most studies adopted the survey method for data collection. The number of participants ranged from 46 [38] to 8840 [50], with participants aged between 17 and 82. The intervention strategies were delivered over timescales ranging from two weeks to one year. We found more publications in journals (n = 36) than in conference proceedings (n = 2) and dissertations (n = 1). This differs markedly from [6], which found that most papers in its analysis were conference proceedings. We then examined the number of publications over time and found that the studies were conducted between 1999 and 2020. Notably, seventeen papers were published between 2017 and 2020: three in 2017, four in 2018, five in 2019 and five in 2020. This confirms a growing trend in the number of publications on hybrid theory-based interventions. Most studies were conducted in China (n = 7), followed by Taiwan (n = 5) and the USA (n = 4) (see Fig. 2).

Fig. 2. Number of studies per country.

4.1 What Theories Were Targeted?
This subsection summarises the main properties of the behaviour change interventions from the analysis. We identified 21 different theories featured in the studies (see Table 1). Self-Determination Theory (SDT) was the most widely used, appearing in 49% (n = 19) of the studies, followed by the Theory of Planned Behaviour (TPB) in 41% (n = 16) and the Technology Acceptance Model (TAM) in 28% (n = 11). Notably, most studies that integrated any of these three theories (SDT, TPB or TAM) reported positive results. This finding provides additional support for SDT's suitability for integration with other theories.

Table 1. Frequency of theories used in the studies.

| Theories | No. of studies | % of studies |
| --- | --- | --- |
| Self-Determination Theory (SDT) | 19 | 49% |
| Theory of Planned Behaviour (TPB) | 16 | 41% |
| Technology Acceptance Model (TAM) | 11 | 28% |
| Task Technology Fit (TTF) | 9 | 23% |
| Expectation Confirmation Model (ECM) | 5 | 13% |
| Self-Efficacy Theory (SET) | 4 | 10% |
| Social Cognitive Theory (SCT) | 3 | 8% |
| Unified Theory of Acceptance and Use of Technology (UTAUT); Flow theory; Social capital theory | 2 (each) | 5% (each) |
| Other theories (see Table 4) | 1 (each) | 3% (each) |
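The percentages in Table 1 are each theory's study count divided by the 39 included studies. Because every study integrates at least two theories, the column sums to well over 100%. The Python sketch below recomputes the shares from the Table 1 counts; it assumes, following the "Other theories" row and the 21 theories identified in Sect. 4.1, that each of the remaining 11 theories appears in exactly one study. The variable names are illustrative.

```python
# Study counts per theory, as listed in Table 1 (39 included studies).
TOTAL_STUDIES = 39
TOTAL_THEORIES = 21  # distinct theories identified in the review
theory_counts = {
    "SDT": 19, "TPB": 16, "TAM": 11, "TTF": 9, "ECM": 5,
    "SET": 4, "SCT": 3, "UTAUT": 2, "Flow theory": 2,
    "Social capital theory": 2,
}


def share(count: int, total: int = TOTAL_STUDIES) -> int:
    """Percentage of included studies, rounded to a whole number."""
    return round(100 * count / total)


percentages = {name: share(n) for name, n in theory_counts.items()}

# Each of the remaining (unnamed) theories appears in exactly one study.
other_theories = TOTAL_THEORIES - len(theory_counts)
mentions = sum(theory_counts.values()) + other_theories * 1

for name, pct in percentages.items():
    print(f"{name:<22}{theory_counts[name]:>3}{pct:>5}%")
print(f"Other theories: {other_theories} x 1 study ({share(1)}% each)")
print(f"Theories per study: {mentions / TOTAL_STUDIES:.2f}")
```

The 84 theory mentions across 39 studies give about 2.15 theories per study, consistent with most studies combining two theories and a handful combining more.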
4.2 What Behaviours Were Targeted?
Table 2 shows the behaviours targeted by the studies. Cugelman [25] underlines that gamification's long-term effect is more significant than its short-term impact. Against this background, we found that seven health behaviours were targeted (n = 14), with mainly positive outcomes. Studies that did not target health-related behaviour outcomes focused on online learning, mobile shopping and banking, information systems, entrepreneurship, the workplace, social networking and gaming. As shown in Table 2, the most frequently studied context is health, fitness and physical activity (n = 14). Within this context (Table 3), the primary behavioural outcome is increased physical activity. The remaining health papers focused on physical education and leisure, myopia prevention, influenza prevention, exercise and diet, post-cardiac rehabilitation and blood donation, highlighting the growing number of studies in this domain.

Table 2. Targeted behaviours in the studies.

| Behavioural contexts targeted | No. of studies (%) | Studies |
| --- | --- | --- |
| Health, fitness & physical activity | 14 (36%) | [14, 15, 18, 32–36, 39, 40, 42, 50, 51, 57] |
| Online learning | 10 (26%) | [28, 43, 45, 46, 52, 54–56, 58, 59] |
| Mobile shopping & banking | 5 (13%) | [16, 31, 44, 49, 60] |
| ICT, information system & KMS | 2 (5%) | [48, 53] |
| Entrepreneurship | 2 (5%) | [26, 27] |
| Workplace | 2 (5%) | [30, 37] |
| Social networking | 2 (5%) | [29, 47] |
| Gameplay & online | 2 (5%) | [38, 41] |

Table 3. Targeted health behaviours in the studies.

| Health behaviour targeted | No. of studies (%) | Studies |
| --- | --- | --- |
| Physical activity (PA) | 5 (36%) | [14, 15, 18, 34, 40] |
| Exercise and diet | 3 (21%) | [39, 42, 50] |
| Physical education (PE) and leisure | 2 (14%) | [32, 33] |
| Seasonal influenza prevention | 1 (7%) | [36] |
| Myopia prevention | 1 (7%) | [35] |
| Post-cardiac rehabilitation | 1 (7%) | [51] |
| Blood donation | 1 (7%) | [57] |
4.3 What Groups Were Targeted?
We observed that the target groups of the featured interventions varied. Most studies addressed a specific group, such as university students, programme analysts, online players, e-learning users, instructors, m-banking users or employees. Only three studies targeted healthcare patients [34, 35, 51]. In our view, this suggests that hybrid theory-based gamification studies may be easier to implement in non-health contexts. The sampling designs used in the studies included cohort groups, simple random sampling and convenience sampling. Most studies used an online survey to recruit participants. In specific cases, however, the target group was predefined by the behaviour being changed, as in cardiac rehabilitation [51]. Other studies recruited individuals interested in the interventions through online networks. Most studies were cross-sectional; only six were longitudinal. More longitudinal studies are therefore needed to determine the interventions' long-term impact.
Our results show that 30 (77%) of the papers report significant benefits of integrated theory-based interventions, while three (8%) show adverse effects and six (15%) show neutral outcomes or no effect. This predominance of positive findings suggests that progress has been made in hybrid-based interventions for behavioural change. However, only three papers [38, 42, 55] tested both gamification and theory integration. Two of these [38, 42] used an experimental design (experimental and control groups) and integrated SDT with other theories. Two reported positive outcomes [42, 55] and one a neutral outcome [38]. Theory-based gamified approaches can therefore promote behavioural outcomes, but further experimental and longitudinal research is needed to understand the results better.
5 Discussion
The majority of studies (77%) show significant beneficial outcomes. The analysis indicates that an integrated model can increase engagement and related outcomes, such as intention and behavioural change. We observed growing interest in combining more than two theories (five studies), whereas earlier studies combined only two. Our study also reveals an unequal distribution in how often theories are used, with SDT and TPB the most frequently implemented. This may be because these theories have a clear conceptual framework, tend to produce positive outcomes or satisfy other acceptance criteria. This may not always be the case, however, as some studies have shown negative or neutral effects of theory integration (SDT + TPB and SDT + SET). Our findings also indicate that integrated theory-driven gamification interventions for behaviour change are underutilised. Still, the evidence demonstrates the potential for hybrid theory-based and game-based interventions to fit together to successfully predict behaviour outcomes [22] (Table 4).
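The theory frequencies in Table 1, the 21 distinct theories and the five studies that combined more than two theories can all be re-derived from the "Theories integrated" and "Studies" columns of Table 4. The Python sketch below transcribes those two columns as (number of studies, theories) pairs and counts them; the entry for [32] is read as SDT + TPB + HMIM. The data structure and names are illustrative, not part of the review's method.

```python
from collections import Counter

# (number of studies, theories integrated), transcribed from Table 4.
TABLE4_GROUPS = [
    (10, ("SDT", "TPB")),                          # [26, 27, 34-36, 39, 40, 42, 48, 57]
    (4, ("SDT", "SET")),                           # [18, 38, 51, 53]
    (3, ("TAM", "TTF")),                           # [37, 55, 58]
    (1, ("TPB", "EPPM")),                          # [14]
    (1, ("TPB", "BPN")),                           # [15]
    (1, ("SCT", "TAM", "Social capital theory")),  # [16]
    (1, ("Flow theory", "ECM", "ISSM")),           # [28]
    (1, ("TAM", "U&G")),                           # [29]
    (1, ("SDT", "Social exchange theory")),        # [30]
    (1, ("TAM", "Trust theory")),                  # [31]
    (1, ("SDT", "TPB", "HMIM")),                   # [32]
    (1, ("SDT", "Goal orientation theory")),       # [33]
    (1, ("SCT", "EDT")),                           # [41]
    (1, ("SDT", "TTF")),                           # [43]
    (1, ("ECM", "TPB")),                           # [44]
    (1, ("PAM", "TTF")),                           # [45]
    (1, ("ECM", "TPB", "TAM", "Flow theory")),     # [46]
    (1, ("TTF", "Social capital theory")),         # [47]
    (1, ("TAM", "ECM")),                           # [49]
    (1, ("HBM", "UTAUT")),                         # [50]
    (1, ("SDT", "TAM")),                           # [52]
    (1, ("TAM", "UTAUT")),                         # [54]
    (1, ("SCT", "TTF")),                           # [56]
    (1, ("TPB", "TTF")),                           # [59]
    (1, ("TAM", "TTF", "ECM")),                    # [60]
]

theory_freq: Counter = Counter()
total_studies = 0
multi_theory_studies = 0
for n_studies, theories in TABLE4_GROUPS:
    total_studies += n_studies
    for theory in theories:
        theory_freq[theory] += n_studies  # every study in the group uses the theory
    if len(theories) > 2:
        multi_theory_studies += n_studies

print(total_studies, len(theory_freq), multi_theory_studies)
print(theory_freq.most_common(3))
```

Running it prints `39 21 5` followed by SDT (19), TPB (16) and TAM (11), matching the counts reported in Sect. 4.1; keeping the table as data rather than prose makes such cross-checks between tables straightforward.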
Table 4. Characteristics of included studies

| Intervention domains | Theories integrated | Studies | Total sample sizes & characteristics | Outcomes |
| --- | --- | --- | --- | --- |
| Entrepreneurship; healthcare; exercise & physical activity; information system | SDT + TPB | [26, 27, 34–36, 39, 40, 42, 48, 57] | N = 3670 (total); university students, office workers, parents; Australia [40, 57], Belgium [42], China [35, 48], Hong Kong [36], Malaysia [26], Yemen [27], USA [34] | SDT and TPB support explanations of the motivation of entrepreneurial behaviour [26]; satisfaction of SDT motivational factors significant in improving students' intention [27]; SDT explains more variance in TPB variables than TPB explains for SDT [34]; weak relationship between intentions and behaviour [34]; integrated SDT + TPB model can be used to explain myopia-preventive behaviours [35]; intention significantly predicted reading distance [35]; facemask use positively related to intentions, mediated by subjective norm, attitude and PBC [36]; indirect effect on exercise behaviour, both direct and indirect effects on diet behaviour [39]; PA intentions strongest determinant of behaviour, intention fully mediated by TPB variables [40]; PA behaviour positively predicted by intentions towards PA [42]; intervention intensity positively predicted desired changes in fat intake [42]; AM strongly influenced IS discontinuation [48]; autonomous motivation predicted intention, with no effect of external regulation [57] |
| Gameplay; physical activity; ICT training | SDT + SET | [18, 38, 51, 53] | N = 567 (total); cardiac-rehabilitation patients [51]; Canada [18, 51], Thailand [53] | Individual and integrated models supported; SDT + SET more favourable than either SDT or SET alone [18]; all psychological needs predicted self-determined motivation in the integrated model [18]; no major difference among the groups in engagement and performance [38]; SDT and SET partially supported but unable to predict physical activity change [51]; higher self-determined motivation predicts self-efficacy, satisfaction and usage intention [53] |
| Workplace; online learning | TAM + TTF | [37, 55, 58] | N = 731 (total); programme analysts, MOOC users & instructors; USA [37], China [58], North Cyprus [55] | TAM + TTF explained more variance in IT utilisation than TAM or TTF alone [37]; PU, PEOU, TTF, social recognition, social influence and attitudes significantly influence continuance intention [55]; PU, PEOU, TTF, social recognition, reputation, attitude and social influence strongly predicted continuance intention [58] |
| Physical activity | TPB + EPPM | [14] | N = 336; students; Australia | Integrated model enhanced explanatory power for predicting exercise behaviour compared with TPB alone |
| Physical activity | TPB + BPN | [15] | N = 462; athletes; Australia | Integrated model increased explanatory power for predicting exercise behaviour; intentions and PBC predicted intention to continue sports |
| Healthcare | SCT + TAM + social capital theory | [16] | N = 365; students; Taiwan | Over 80% of all relationships proposed in the integrated model were supported; integrated model demonstrated excellent fit and is useful for predicting behavioural intention |
| Online learning | Flow theory + ECM + ISSM | [28] | N = 515; college students; UAE | PU, PE and PEOU have the strongest effect on continuous intention to use |
| Social media | TAM + U&G theory | [29] | N = 372; students; UAE | The integrated model strongly predicted users' intention |
| Workplace | SDT + social exchange theory | [30] | N = 453; office workers; China | Need satisfaction mediated the relationship between overall justice and intrinsic motivation |
| Mobile banking | TAM + trust theory | [31] | N = 219; m-banking users; Ethiopia | Attitude and trust jointly explain 50% of the variance in continuance intention to use m-banking |
| Physical education | SDT + TPB + HMIM | [32] | N = 274; college students; Greece | Three basic psychological need satisfaction variables exclusively predicted autonomous motivation in PE |
| Physical activity | SDT + goal orientation theory | [33] | N = 723; 28 schools; Hungary | Self-determined forms of behavioural regulation were the main predictor of intention |
| Online internet | SCT + EDT | [41] | N = 235; internet users; Taiwan | Continuance significantly related to SCT + EDT constructs; continuance intention predicted by satisfaction, internet self-efficacy and outcome expectations |
| Online learning | SDT + TTF | [43] | N = 414; Pakistan | TTF positively influenced behavioural intentions; PC, perceived relatedness and social recognition strongly influence behavioural intentions |
| Mobile data service | ECM + TPB | [44] | N = 207; graduate students; South Korea | Integrated model has stronger explanatory power for continuance than ECM or TPB alone; subjective norm, PBC, user satisfaction, PF, PU and PE have a significant impact on continuance intention |
| Online learning | PAM + TTF | [45] | N = 135; Norway | Both TTF and PAM variables explain continuance intention; users' satisfaction influences the development of strong intention about IS continuance |
| Online learning | ECM + TPB + TAM + flow theory | [46] | N = 363; e-learning students; Taiwan | Continuance intention strongly affected by satisfaction, with PBC a lesser predictor; PU, subjective norms, concentration and attitude moderately affect continuance intention |
| Social network | TTF + social capital theory | [47] | N = 315; students, workers; Taiwan | The fit between social characteristics and technology characteristics affects users' intentions |
| Mobile shopping | TAM + ECM | [49] | N = 203; mobile shoppers; China | PU did not motivate all user groups; satisfaction and PEOU significantly influence different user groups |
| Healthcare | HBM + UTAUT | [50] | N = 8840; students, workers; China | Users' risk perception negatively affected actual usage behaviour; actual usage behaviour positively affected weight-loss intention and behavioural intention |
| Online learning | SDT + TAM | [52] | N = 174; UN staff; USA | Three basic psychological needs have significant indirect effects on continuance intention; PU has a stronger influence on continuance intention than perceived playfulness |
| Online learning | TAM + UTAUT | [54] | N = 305; university students; The Netherlands | The UTAUT model strongly predicts the perceived acceptance of MOOCs; attitude strongly affects behavioural intention |
| KMS | SCT + TTF | [56] | N = 192; KMS users; Taiwan | Integrated model explains about 50% of the variance in KMS usage; task interdependence, TTF, self-efficacy and personal outcome significantly affect KMS usage |
| Online learning | TPB + TTF | [59] | N = 870; students; Taiwan | User attitudes towards subjective norms and PBC indirectly affect utilisation; some features of utilisation are determined by user perceptions and behavioural intentions |
| M-banking | TAM + TTF + ECM | [60] | N = 43; telecom utilities; China | Satisfaction, TTF, PU and perceived risk are strong predictors of continuance intention |

SDT: Self-Determination Theory, TAM: Technology Acceptance Model, TTF: Task Technology Fit, U&G: Uses and Gratifications Theory, TPB: Theory of Planned Behaviour, SET: Self-Efficacy Theory, ECM: Expectation Confirmation Model, PAM: Post-Acceptance Model, UTAUT: Unified Theory of Acceptance and Use of Technology, ISSM: Information System Success Model, HBM: Health Belief Model, SCT: Social Cognitive Theory, EDT: Expectancy Disconfirmation Theory, EPPM: Extended Parallel Process Model, BPN: Basic Psychological Needs, HMIM: Hierarchical Model of Intrinsic Motivation, KMS: Knowledge Management System, PA: physical activity, PBC: perceived behavioural control, PU: perceived usefulness, PEOU: perceived ease of use
6 Conclusion
In this paper we reviewed studies that integrated several related theories in behaviour change interventions. To our knowledge, this is the first review of integrated theory-based interventions used to predict health behaviour outcomes. The results revealed that SDT is the most implemented theory, followed by TPB. Most of the papers reviewed showed a positive effect on behavioural outcomes, but these effects were short-term. The findings generally support the use of behaviour change interventions based on theory integration. However, relatively few studies have tested integrated theory-driven behaviour change interventions, so more intervention research is needed, in line with [21]. This highlights some limitations of theory-based research in the behavioural sciences and the alternatives that theory integration offers. Since positive behavioural outcomes are often short-term, further integrated theory-based intervention research is required to predict and sustain long-term behavioural outcomes in the health context.
Acknowledgement. The authors would like to acknowledge the financial support of Universiti Tenaga Nasional under the Bold Research Grant (RJO10517844/012) and the Innovative Research Management Center (iRMC) UNITEN.
References
1. Trigueros, R., Aguilar-Parra, J.M., Cangas, A.J., Lopez-Liria, R., Alvarez, J.F.: Influence of physical education teachers on motivation, embarrassment and the intention of being physically active during adolescence. Int. J. Environ. Res. Public Health 16(13), 2295 (2019)
2. Sola, D., Couturier, J., Voyer, B.: Unlocking patient activation in chronic disease care. Br. J. Healthc. Manag. 21(5), 220–225 (2015)
3. Wu, X.Y., Han, L.H., Zhang, J.H., Luo, S., Hu, J.W., Sun, K.: The influence of physical activity, sedentary behavior on health-related quality of life among the general population of children and adolescents: a systematic review. PloS One 12(11), e0187668 (2017)
4. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness: defining “gamification”. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, pp. 9–15, September 2011
5. Landers, R.N., Armstrong, M.B., Collmus, A.B.: How to use game elements to enhance learning: applications of the theory of gamified learning. In: Serious Games and Edutainment Applications, pp. 457–483. Springer, Cham (2017)
6. Khalil, M., Wong, J., de Koning, B., Ebner, M., Paas, F.: Gamification in MOOCs: a review of the state of the art. In: 2018 IEEE Global Engineering Education Conference (EDUCON), pp. 1629–1638. IEEE, April 2018
7. Antonaci, A., Klemke, R., Specht, M.: The effects of gamification in online learning environments: a systematic literature review. Informatics 6(3), 32 (2019)
8. Suh, A., Cheung, C.M., Ahuja, M., Wagner, C.: Gamification in the workplace: the central role of the aesthetic experience. J. Manag. IS 34(1), 268–305 (2017)
9. Yen, B.T., Mulley, C., Burke, M.: Gamification in transport interventions: another way to improve travel behavioural change. Cities 85, 140–149 (2019)
10. Bhattacherjee, A.: Understanding information systems continuance: an expectation-confirmation model. MIS Q. 25, 351–370 (2001)
11. Sailer, M., Hense, J., Mandl, H., Klevers, M.: Fostering development of work competencies and motivation via gamification. In: Competence-Based Vocational and Professional Education, pp. 795–818. Springer, Cham (2017)
12. Greaves, C.J., Sheppard, K.E., Abraham, C., Hardeman, W., Roden, M., Evans, P.H., Schwarz, P.: Systematic review of reviews of intervention components associated with increased effectiveness in dietary and physical activity interventions. BMC Public Health 11(1), 1–12 (2011)
13. Davis, R., Campbell, R., Hildon, Z., Hobbs, L., Michie, S.: Theories of behaviour and behaviour change across the social and behavioural sciences: a scoping review. Health Psychol. Rev. 9(3), 323–344 (2015)
14. Richards, J.A., Johnson, M.P.: A case for theoretical integration: combining constructs from the theory of planned behavior and the extended parallel process model to predict exercise intentions. SAGE Open 4(2) (2014). https://doi.org/10.1177/2158244014534830
15. Gucciardi, D.F., Jackson, B.: Understanding sport continuation: an integration of the theories of planned behaviour and basic psychological needs. J. Sci. Med. Sport 18(1), 31–36 (2015)
16. Tsai, C.H.: Integrating social capital theory, social cognitive theory, and the technology acceptance model to explore a behavioral model of telehealth systems. Int. J. Environ. Res. Public Health 11(5), 4905–4925 (2014)
17. Schoeppe, S., Alley, S., Van Lippevelde, W., Bray, N.A., Williams, S.L., Duncan, M.J., Vandelanotte, C.: Efficacy of interventions that use apps to improve diet, physical activity and sedentary behaviour: a systematic review. Int. J. Behav. Nutr. Phys. Act. 13(1), 127 (2016)
18. Sweet, S.N., Fortier, M.S., Strachan, S.M., Blanchard, C.M.: Testing and integrating self-determination theory and self-efficacy theory in a physical activity context. Can. Psychol./Psychologie Canadienne 53(4), 319 (2012)
19. Noar, S.M., Zimmerman, R.S.: Health behavior theory and cumulative knowledge regarding health behaviors: are we moving in the right direction? Health Educ. Res. 20(3), 275–290 (2005)
20. Marteau, T., Dieppe, P., Foy, R., Kinmonth, A.L., Schneiderman, N.: Behavioural medicine: changing our behaviour. BMJ J. 332(7539), 437–438 (2006)
21. Hagger, M.S., Hamilton, K.: Changing behaviour using integrated theories. In: The Handbook of Behavior Change (2020)
22. Liebana-Cabanillas, F., Munoz-Leiva, F., Sanchez-Fernandez, J.: A global approach to the analysis of user behavior in mobile payment systems in the new electronic environment. Serv. Bus. 12(1), 25–64 (2018)
23. Hagger, M.S.: Theoretical integration in health psychology: unifying ideas and complementary explanations. Br. J. Health. Psychol. 14(2), 189–194 (2009)
24. David, M., Sutton, C.D.: Social Research: The Basics, vol. 74, no. 3. Sage, Thousand Oaks (2004)
25. Cugelman, B.: Gamification: what it is and why it matters to digital health behavior change developers. JMIR Serious Games 1(1), e3 (2013)
26. Al-Jubari, I., Hassan, A., Liñán, F.: Entrepreneurial intention among university students in Malaysia: integrating self-determination theory and the theory of planned behavior. Int. Entrep. Manag. J. 15(4), 1323–1342 (2019)
27. Al-Jubari, I.: College students’ entrepreneurial intention: testing an integrated model of SDT and TPB. Sage Open 9(2), 2158244019853467 (2019)
28. Al-Maroof, R.S., Salloum, S.A.: An Integrated model of continuous intention to use of google classroom. In: Recent Advances in Intelligent Systems and Smart Applications, pp. 311–335. Springer, Cham (2020)
29. Al-Maroof, R.S., Salloum, S.A., AlHamadand, A.Q.M., Shaalan, K.: A unified model for the use and acceptance of stickers in social media messaging. In: International Conference on Advanced Intelligent Systems and Informatics, pp. 370–381. Springer, October 2019
30. Aryee, S., Walumbwa, F.O., Mondejar, R., Chu, C.W.: Accounting for the influence of overall justice on job performance: integrating self-determination and social exchange theories. J. Manag. Stud. 52(2), 231–252 (2015)
31. Asnakew, Z.S.: Customers’ continuance intention to use mobile banking: development and testing of an integrated model. Rev. Socionetwork Strateg. 14, 123–146 (2020)
32. Barkoukis, V., Hagger, M.S., Lambropoulos, G., Tsorbatzoudis, H.: Extending the trans-contextual model in physical education and leisure-time contexts: examining the role of basic psychological need satisfaction. Br. J. Educ. Psychol. 80(4), 647–670 (2010)
33. Biddle, S., Soos, I., Chatzisarantis, N.: Predicting physical activity intentions using goal perspectives and self-determination theory approaches. Eur. Psychol. 4(2), 83 (1999)
34. Brooks, J.M., Iwanaga, K., Chiu, C.Y., Cotton, B.P., Deiches, J., Morrison, B., Moser, E., Chan, F.: Relationships between self-determination theory and theory of planned behavior applied to physical activity and exercise behavior in chronic pain. Psychol. Health Med. 22(7), 814–822 (2017)
35. Chan, D.K.C., Fung, Y.K., Xing, S., Hagger, M.S.: Myopia prevention, near work, and visual acuity of college students: integrating the theory of planned behavior and self-determination theory. J. Behav. Med. 37(3), 369–380 (2014)
36. Chung, P.K., Zhang, C.Q., Liu, J.D., Chan, D.K., Si, G., Hagger, M.S.: The process by which perceived autonomy support predicts motivation, intention, and behavior for seasonal influenza prevention in Hong Kong older adults. BMC Public Health 18(1), 1–9 (2018)
37. Dishaw, M.T., Strong, D.M.: Extending the technology acceptance model with task–technology fit constructs. Inf. Manag. 36(1), 9–21 (1999)
38. Jamshidifarsani, H., Tamayo-Serrano, P., Garbaya, S., Lim, T., Blazevic, P.: Integrating self-determination and self-efficacy in game design. In: International Conference on Games and Learning Alliance, pp. 178–190. Springer, Cham, December 2018
39. Hagger, M.S., Chatzisarantis, N.L., Harris, J.: From psychological need satisfaction to intentional behavior: testing a motivational sequence in two behavioral contexts. Pers. Soc. Psychol. Bull. 32(2), 131–148 (2006)
40. Hamilton, K., Cox, S., White, K.M.: Testing a model of physical activity among mothers and fathers of young children: integrating self-determined motivation, planning, and the theory of planned behavior. J. Sport Exerc. Psychol. 34(1), 124–145 (2012)
41. Hsu, M.H., Chiu, C.M., Ju, T.L.: Determinants of continued use of the WWW: an integration of two theoretical models. Ind. Manag. Data Syst. 104, 766–775 (2004)
42. Jacobs, N., Hagger, M.S., Streukens, S., De Bourdeaudhuij, I., Claes, N.: Testing an integrated model of the theory of planned behaviour and self-determination theory for different energy balance-related behaviours and intervention intensities. Br. J. Health. Psychol. 16(1), 113–134 (2011)
43. Khan, I.U., Hameed, Z., Yu, Y., Islam, T., Sheikh, Z., Khan, S.U.: Predicting the acceptance of MOOCs in a developing country: application of task-technology fit model, social motivation, and self-determination theory. Telematics Inform. 35(4), 964–978 (2018)
44. Kim, B.: An empirical investigation of mobile data service continuance: incorporating the theory of planned behavior into the expectation–confirmation model. Expert Syst. Appl. 37(10), 7033–7039 (2010)
45. Larsen, T.J., Sorebo, A.M., Sorebo, O.: The role of task-technology fit as users’ motivation to continue information system use. Comput. Hum. Behav. 25(3), 778–784 (2009)
46. Lee, M.C.: Explaining and predicting users’ continuance intention toward e-learning: an extension of the expectation–confirmation model. Comput. Educ. 54(2), 506–516 (2010)
47. Lu, H.P., Yang, Y.W.: Toward an understanding of the behavioral intention to use a social networking site: an extension of task-technology fit to social-technology fit. Comput. Hum. Behav. 34, 323–332 (2014)
A. S. Mustafa et al.
48. Luqman, A., Masood, A., Ali, A.: An SDT and TPB-based integrated approach to explore the role of autonomous and controlled motivations in “SNS discontinuance intention.” Comput. Hum. Behav. 85, 298–307 (2018) 49. Shang, D., Wu, W.: Understanding mobile shopping consumers’ continuance intention. Ind. Manag. Data Syst. 117, 213–227 (2017) 50. Wei, J., Vinnikova, A., Lu, L., Xu, J.: Understanding and predicting the adoption of fitness mobile apps: evidence from China. Health Commun. 1–12 (2020) 51. Sweet, S.N., Fortier, M.S., Strachan, S.M., Blanchard, C.M., Boulay, P.: Testing a longitudinal integrated self-efficacy and self-determination theory model for physical activity post-cardiac rehabilitation. Health Psychol. Res. 2(1), 1008 (2014) 52. Roca, J.C., Gagne, M.: Understanding e-learning continuance intention in the workplace: a self-determination theory perspective. Comput. Hum. Behav. 24(4), 1585–1604 (2008) 53. Techatassanasoontorn, A.A., Tanvisuth, A.: The integrated self-determination and self-efficacy theories of ICT training and use: the case of the socio-economically disadvantaged. GlobDev 2008, p. 19 (2008) 54. Kamp, C.V.D.: Acceptance of MOOCs by Dutch university students. Extending the unified theory of acceptance and use of technology (UTAUT) model with the technology acceptance model (TAM) (2019) 55. Vanduhe, V.Z., Nat, M., Hasan, H.F.: Continuance intentions to use gamification for training in higher education: integrating the technology acceptance model (TAM), social motivation, and task technology fit (TTF). IEEE Access 8, 21473–21484 (2020) 56. Lin, T.C., Huang, C.C.: Understanding knowledge management system usage antecedents: an integration of social cognitive theory and task technology fit. Inf. Manag. 45(6), 410–417 (2008) 57. Williams, L.A., Sun, J., Masser, B.: Integrating self-determination theory and the theory of planned behaviour to predict intention to donate blood. Transfus. Med. 29, 59–64 (2019)
58. Wu, B., Chen, X.: Continuance intention to use MOOCs: integrating the technology acceptance model (TAM) and task technology fit (TTF) model. Comput. Hum. Behav. 67, 221–232 (2017) 59. Yu, T.K., Yu, T.Y.: Modelling the factors that affect individuals’ utilisation of online learning systems: an empirical study combining the task technology fit model with the theory of planned behaviour. Br. J. Educ. Technol. 41(6), 1003–1017 (2010) 60. Yuan, S., Liu, Y., Yao, R., Liu, J.: An investigation of users’ continuance intention towards mobile banking in China. Inf. Dev. 32(1), 20–34 (2016)
Adopting React Personal Health Record (PHR) System in Yemen HealthCare Institutions
Ziad Saif Alrobieh1 (corresponding author), Dhiaa Faisal Alshamy2, and Maged Nasser3
1 Department of Communication and Computer Engineering, Alsaeed Faculty for Engineering and Information Technology, Taiz University, Taiz, Yemen
2 Department of Networking and Distributed Systems, Faculty of Information Technology and Engineering, Taiz University, Taiz, Yemen
3 School of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
Abstract. Health care is a critical sector of society that requires continuous improvement in the quality of its services, and information technology (IT) systems have a great impact on improving that quality. Unfortunately, despite the importance of information culture, little is known about its effect on how healthcare providers in developing countries implement information systems. Many Yemeni healthcare facilities already use information systems to digitize the management of healthcare procedures. Even so, patients' health information, including disease history and prescriptions, is not fully recorded. In addition, no Personal Health Record (PHR) systems have been implemented that would let patients access and control their health records remotely; instead, records are stored locally in healthcare providers' databases. The existing electronic health record systems are limited and do not exploit the available technology solutions and services. To explore the advantages of PHR systems, multiple studies were reviewed, and their conclusions support the use of such systems. A survey was also conducted to assess people's intention to use PHR systems and to find out what they need the system to provide and what interests them. The web-based system was designed after reviewing the available platforms, so that the most suitable solutions could be chosen and the system would meet its requirements. Adopting innovative, modern technological solutions such as PHR web applications is a good way to improve patient safety and quality of care, increase efficiency, support decision-making, and increase patient and health worker satisfaction.
The proposed solution ensures patients' satisfaction and safety by giving them access to their health records whenever and wherever they are, through their device's browser. It also helps doctors make the right decisions and speeds up the delivery of care, which reduces the damage caused by current systems, saves human lives, and prevents serious health issues.
Keywords: Health records systems · PHR systems · Healthcare · Technological solutions
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 279–289, 2021. https://doi.org/10.1007/978-3-030-70713-2_27
Z. S. Alrobieh et al.
1 Introduction
Healthcare is a remarkable sector in which we must keep applying every technological solution or invention we reach, and we must try hard to find better and more innovative ways of solving its problems. Technical growth, and more specifically the digital revolution, has radically changed the way healthcare systems are managed [1]. Broadly, new innovations allow governments to deliver value-added services to people and bring a range of developments in technology, first-hand health monitoring, and medical care [16]. Despite the remarkable development of the healthcare sector worldwide, Yemen still faces many deprivations and obstacles in this sector. For Yemeni public health, it has been stated that information has still not been recognized as a culture, and the potential of IT in the health sector is still not properly utilized [3]. A study conducted four years ago on the quality of handwritten prescriptions in Sana'a, Yemen [2] covered 2178 prescriptions from 23 randomly selected pharmacies in different geographic locations. It found that 99.12% of the analyzed prescriptions were of low quality. They contained writing errors in physician and patient information and in the prescribed medications, and spelling and instructions of use were the most common errors. Thus, the current healthcare systems in Yemen, as in many other countries, do not exploit the technology revolution or the available software technologies (e.g., APIs, frameworks, and services). If implemented, these technologies would have a major impact on the treatment process and caregiving procedures. To meet most of these requirements, such systems must use web-based solutions that support data sharing.
Easy access and availability can be achieved with appropriate platforms. These platforms improve interactivity, make the web app more user-friendly and responsive, and allow developers to create web-based, mobile-friendly applications [17]. Significant benefits can also be gained from specific APIs (e.g., symptom APIs and drug APIs containing information on a large number of drugs and their interactions). Such APIs make it easier, faster, and safer to choose the right medicine, provide suggestions automatically, and help in making more accurate decisions. The main aim of this paper is to encourage healthcare providers in Yemen to adopt PHR systems by showing them the benefits that can be gained. The paper also guides them with a design proposal and the required technologies. The design uses new, innovative technologies, supported for instance by healthcare APIs that let healthcare providers access and use electronic applications and data in more innovative ways [18] than current EHRs do. Unfortunately, in our country and many others, healthcare institutions still use desktop-based healthcare applications. Even where web-based systems exist, users (patients) have no access to their health data. The complexity and time cost of fully integrating healthcare applications into care procedures increase when user experience and ease of use are not considered from the start of building the system. For example, doctors cannot use these systems to store patients' symptoms and prescriptions, because doing so takes more time or requires full attention. Speech-recognition input methods also require specific conditions. These problems can be solved if the system designer makes the application easier, simpler, and more time-efficient to use. The system must benefit doctors, help them make good decisions, and prevent common errors, and it must speed up the process rather than slow it down. Modern mobile-friendly applications achieve this easily, because health data must be available wherever and whenever it is needed and such applications can be used from any device and from multiple places. Users can simply use the browser on their phones, tablets, or any portable device with a decent internet connection. This allows individuals to access their PHRs via the Internet at any time and from any location, using state-of-the-art security and privacy controls [12]. Recent web-based technologies have opened the door to completely new possibilities for creating medical information systems. Web-based applications offer competitive benefits over old-style software-based systems: they permit businesses to consolidate and streamline their systems and processes and to decrease costs [10]. Technology keeps advancing. We can now use IoT devices and sensors to measure and monitor human health metrics, send them to mobile or web applications, and access them through our portable devices and computers. The Personal Health Record (PHR) is an Internet-based set of tools that allows people to add, maintain, access, and coordinate their lifelong health information [12–15]. It also lets them make appropriate parts of their medical and health-related information available to those who need it (specialists, doctors, nurses, family members, etc.) [12]. By contrast, an electronic medical record (EMR) holds a patient's health information inside a specific medical institution and is not shared with other institutions. An electronic health record (EHR) has the same meaning as an EMR but is additionally shared among more than one institution. In both EMRs and EHRs, however, all operations are performed by medical professionals or institution staff, and the patient has no way to manage or control the record.
A PHR extends the EHR concept by making the information more flexible and giving the patient complete power to manage, control, and grant access to it. With PHRs, patients are more comfortable adding information to their health records and can review all records at any time. Patients also gain a sense of privacy, because they decide exactly who can access what in their records, regardless of geographical distance. A PHR gathers everything that patients and health providers need to know and makes it available from any place at any time. It makes this information easy to share while preserving privacy and giving patients full rights over their health information.
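As described above, the practical difference between an EHR and a PHR lies mainly in who controls access. The following is a minimal sketch of a record whose owner, the patient, grants and revokes read access per category of information; in an EMR or EHR, institution staff would make that decision instead. The `PhrRecord` class, the category names, and the identifiers are hypothetical illustrations rather than part of the proposed system. The store is in memory, and authentication and encryption are omitted.

```typescript
// Hypothetical record categories, loosely mirroring the PHR components described in this paper.
type Category = "prescriptions" | "history" | "visits" | "reports";

class PhrRecord {
  readonly ownerId: string;
  // granteeId -> categories that grantee may read (hypothetical in-memory store)
  private grants = new Map<string, Set<Category>>();

  constructor(ownerId: string) {
    this.ownerId = ownerId;
  }

  // Only the patient (record owner) may decide who sees what.
  grant(actorId: string, granteeId: string, categories: Category[]): void {
    this.requireOwner(actorId);
    const allowed = this.grants.get(granteeId) ?? new Set<Category>();
    for (const c of categories) allowed.add(c);
    this.grants.set(granteeId, allowed);
  }

  revoke(actorId: string, granteeId: string): void {
    this.requireOwner(actorId);
    this.grants.delete(granteeId);
  }

  // The owner can always read; anyone else needs an explicit grant for that category.
  canRead(requesterId: string, category: Category): boolean {
    if (requesterId === this.ownerId) return true;
    return this.grants.get(requesterId)?.has(category) ?? false;
  }

  private requireOwner(actorId: string): void {
    if (actorId !== this.ownerId) {
      throw new Error("only the record owner can change access");
    }
  }
}

// Usage: the patient shares prescriptions and visits, but not the illness history, with a doctor.
const record = new PhrRecord("patient-1");
record.grant("patient-1", "doctor-7", ["prescriptions", "visits"]);
console.log(record.canRead("doctor-7", "prescriptions")); // true
console.log(record.canRead("doctor-7", "history")); // false
```

Keeping the grant and revoke checks inside the record type means no code path can change access on the patient's behalf. That property is what distinguishes a PHR from an institution-managed EHR.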
2 Related Works
This section covers related work on persuading healthcare providers to adopt web-based applications, as well as proposals for similar systems. This paper aims to encourage healthcare providers to adopt web-based PHR systems and to present a proposed system design together with the technologies it would use. The main idea of the proposed work is to allow patients in Yemen to hold their own medical records and health data. We therefore reviewed multiple studies and applications that discussed the implementation of PHR systems, including the advantages and disadvantages of health information access for both patients and doctors. As a motivation for healthcare providers, implementing good electronic health services is an effective step toward increasing patient satisfaction. The quality of healthcare services in Yemen from the patient's perspective has been studied and
researched by Bashar Mohammed Al-Sofyani [8] in his master's thesis in public health. He concluded that patients who are satisfied with the quality of care they receive are more likely to comply with treatment and continue to use services, which improves utilization and ultimately leads to better general health indicators. Albokai et al. [4] studied how information technology systems can improve the quality of healthcare in the hospitals of Yemen. They showed that such systems improve patient safety and quality of care, increase efficiency, support decision-making, and increase patient and health worker satisfaction. However, multiple healthcare providers in Yemen have already implemented EHR systems, but with limited services and without supporting information access for patients. Collecting information about patients is important and critical for constructing a proper treatment plan, yet it is one of the most important difficulties facing general medical staff and private doctors [4]. The adoption of PHRs, and of EHRs that support patient access, should be considered: "Late adopters of the electronic health record should move now" [9]. A study proposing a PHR architecture for Saudi Arabian health services concluded that such a system, once approved for adoption in Saudi Arabia, will improve health services and assist in disease prevention and emergency treatment intervention. The authors also hypothesized that increased patient engagement in healthcare can improve the quality of the provided services and the patients' health lifestyle [7]. Another study aimed to elaborate the functional specifications of a PHR for pregnant women and to create and propose a prototype. Although the study followed some functional principles, i.e.
(promoting information sharing among women, health professionals, hospitals, and diagnostic services, and promoting documentation of care), it did not provide a specific design technology [12]. P. Thummavet and S. Vasupongayya [13] proposed a novel scheme for handling access to PHR information in emergencies. They focused on giving emergency staff access to PHR information even when the owner is unable to give consent, using a threshold cryptosystem based on the owner's PHR policy. The system has three confidentiality levels (security, restriction, and exclusiveness), and the PHR owner assigns a level to each record before it is uploaded to the PHR server. Emergency staff can then gain access in different ways: either through the encryption key obtained via a service provider (EmS), or immediately if they are trusted users pre-selected by the PHR owner. Although this scheme is strong in terms of security, it is somewhat complex for ordinary users. Muhammad H. Aboelfotoh et al. [14] proposed a mobile-based system architecture that lets patients use the online PHR systems they subscribe to. At the same time, patients use their portable devices to give physicians direct data access through an authenticated, integrated back-end infrastructure, without fully interconnecting healthcare system networks. However, their proposal requires an existing online PHR system along with additional components (a smart health card and a healthcare provider (HCP) terminal), which increases cost and system complexity. Yeong-Tae Song et al. [15] proposed a PHR system that uses standards such as SNOMED CT and HL7 CDA to achieve interoperability between different EHR and PHR systems. In their system, a mobile application collects medical data and stores it in HL7 CDA format. Their model consists of four main modules: Clinical Data Collection
Module: the mobile application collects medical data and generates CDA files that can be uploaded to a cloud-based management system. Cloud File Manager Module: stores the CDA files for each individual. CDA Query Module: uses an XML parsing program to search nodes and extract codes and other values, which the Diagnosis Module uses as input. Diagnosis Module: uses the extracted codes to build the clinical decision logic, matching symptoms in the personal medical data against diagnosis rules with the rule-based system CLIPS.
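The CDA Query and Diagnosis modules described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical CDA-like XML fragment and extracts `code` attributes with a regular expression, which is a shortcut and not a real XML parser. It then fires every hypothetical if-then rule whose required codes are all present. This is neither CLIPS nor the rule base of [15], and the codes are placeholders rather than real SNOMED CT concepts.

```typescript
// Simplified CDA-like fragment (hypothetical; real HL7 CDA documents are much richer).
// The codes are placeholders, not real SNOMED CT concepts.
const cdaFragment = `
<observation><value code="SYM-FEVER" codeSystemName="DEMO"/></observation>
<observation><value code="SYM-COUGH" codeSystemName="DEMO"/></observation>
`;

// "CDA Query" step (sketch): collect every code="..." attribute value.
// A real system should use a proper XML parser rather than a regular expression.
function extractCodes(xml: string): Set<string> {
  const codes = new Set<string>();
  const pattern = /\bcode="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    codes.add(match[1]);
  }
  return codes;
}

// A rule fires when all of its required codes are present (hypothetical rule base).
interface Rule {
  conclusion: string;
  requires: string[];
}

const rules: Rule[] = [
  { conclusion: "flag for review: fever with cough", requires: ["SYM-FEVER", "SYM-COUGH"] },
  { conclusion: "flag for review: headache", requires: ["SYM-HEADACHE"] },
];

// "Diagnosis" step (sketch): return the conclusions of all rules that fire.
function matchRules(codes: Set<string>, ruleBase: Rule[]): string[] {
  return ruleBase
    .filter((rule) => rule.requires.every((code) => codes.has(code)))
    .map((rule) => rule.conclusion);
}

// Only the fever-with-cough rule fires for this fragment.
console.log(matchRules(extractCodes(cdaFragment), rules));
```

A production rule engine such as CLIPS adds pattern matching over structured facts and conflict resolution between rules; this sketch only shows the basic "all conditions present, then conclude" step.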
3 Advantages of Using and Improving PHR Systems
Patients' medical documents include advisory opinions, lab results, prescriptions, and MR, CT, and ultrasound images, in various formats and forms. As a result, a patient's medical information is stored in different places, depending on which institutions the patient visits. Patients therefore need to be able to access this information themselves and share it with others as they move between places, especially if they travel a lot. Such access saves the effort, time, and cost of repeated diagnoses and examinations. It also avoids the risk of taking incompatible medicines, which can end a patient's life. The need for personal health records (PHRs) is therefore clear, but at the same time we must ensure high performance, in both functionality and speed, and take care of security. These advantages can be summarized as follows:
• PHR provides continuous monitoring of patients' health status and access to all necessary health information (medical history, medical examination data, physiological parameters, healthy lifestyle, etc.) at any time, anywhere, and from any platform [8].
• Educating patients: in two randomized controlled trials, patient-accessible medical records improved recall and understanding of medical information by objective measurement. Among medical outpatients, smokers who received a copy of their most recent progress note were significantly more likely to identify smoking as a problem 2 weeks after their appointment, and this trend persisted at 6 months. Older patients with chronic medical conditions also showed significant increases in their recall of medical problems and of treatment plans that did not involve medications.
• Empowering patients.
• Improving doctor-patient communication.
• Improving patient satisfaction.
• Patient-accessible medical records are particularly helpful for patients who are concerned about what might be hidden in the chart.
• Facilitating correction of errors: in many of the studies, patients found inaccuracies in the medical record. A descriptive study of medical inpatients found that half of the patients “made some addition or correction on a point of fact.”
• Effects on documentation: both patients and staff had the impression that patient access to the records changed documentation patterns, but objective analysis identified little change, although staff became more accurate in what they wrote.
4 Methodology and System Design
Several hospitals and healthcare institutions were visited to explore and identify existing EHR systems. Many hospital staff members were questioned, and their level of satisfaction was
observed with the current systems. In this work, each patient will have a health profile that is accessible at all times across mobile and other digital devices. This profile supports existing healthcare systems with productivity, time savings, and the possibility of data sharing. First, we measure participants' willingness to use an online PHR system, which parts of the system they need most, and which features would most encourage them to use it. Second, we review multiple proposed PHR systems to collect ideas about what the system should implement. This information helps us propose the most suitable plan for constructing a PHR system for our society.
4.1 Determining Level of Interest
Both patients and doctors find it hard to collect health data. We therefore conducted a questionnaire on the effectiveness of the proposed system idea. The survey contained a group of questions that respondents had to answer, and it targeted individuals older than 15. A total of 131 randomly selected individuals were given a link to a Google Form to be submitted within 3 days. The questions aimed to measure respondents' excitement about and interest in having such a capability, how important they consider the idea, and how useful they expect it to be. The questions were divided as follows:
• Basic information (age, gender, education level, and whether the respondent has a chronic disease): people of different ages have different interests in, and needs for, the parts of a PHR system, and the same applies to gender and education level. Having a chronic disease may also affect willingness to use the system. Table 1 shows the level of interest by gender and health state, and overall more people support the idea and are interested in using a PHR system.
• Online activities related to healthcare:
  ◦ Searching for information about diseases: seeking information about a health issue or disease is critical, while some sources may be tricky or misleading.
  ◦ Searching for information about a doctor, or contacting a doctor by email or through a specific application.
  ◦ Having used a PHR or a health-related application before: prior experience will have a different impact on the level of participation.
  ◦ Having received a notification about a test result, or having browsed one's medical prescriptions on a mobile phone.
• Feelings about the idea of having a PHR: four levels of interest were provided (highly interested, interested, not interested, and highly not interested, i.e., thinking the idea will have a negative impact). As Table 1 shows, most participants are interested in having and using a PHR system regardless of their health conditions.
• Prioritizing system services: respondents rated the system's services from most to least important, indicating which services they most wish the system to provide and which they consider less important. Figure 1 shows the priority of health services from the participants' perspectives.
• Concerns and limitations: what are the negative sides of implementing this system, what concerns respondents and might prevent them from using it, and what limitations exist?
Table 1. Level of interest based on gender and chronic disease

| Basic information | Filter | Highly interested, n (%) | Interested, n (%) | Not interested, n (%) |
|---|---|---|---|---|
| Gender | M | 31 (41.9%) | 28 (37.8%) | 15 (20.27%) |
| Gender | F | 17 (28.8%) | 27 (45.76%) | 15 (25.42%) |
| Have chronic disease | Yes | 5 (55.5%) | 3 (33.3%) | 1 (11.1%) |
| Have chronic disease | No | 39 (34.2%) | 49 (42.98%) | 26 (22.8%) |
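Each percentage in Table 1 is a share of its own row, so every row sums to about 100%. The short sketch below recomputes the percentages from the counts. The counts are copied from Table 1, and rounding to two decimals is our choice. For example, the male row totals 74 respondents, and 31/74 is about 41.89%, which matches the 41.9% reported.

```typescript
// Counts copied from Table 1: [highly interested, interested, not interested].
const table1: Record<string, [number, number, number]> = {
  "Gender M": [31, 28, 15],
  "Gender F": [17, 27, 15],
  "Chronic disease: yes": [5, 3, 1],
  "Chronic disease: no": [39, 49, 26],
};

// Row percentage = count / row total * 100, rounded to two decimals.
function rowPercentages(counts: number[]): number[] {
  const total = counts.reduce((sum, n) => sum + n, 0);
  return counts.map((n) => Math.round((n / total) * 10000) / 100);
}

for (const [row, counts] of Object.entries(table1)) {
  const total = counts.reduce((sum, n) => sum + n, 0);
  console.log(row, `n=${total}`, rowPercentages(counts));
}
```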
[Figure: bar chart with three rating categories per service (strictly important, important, not important) and axis values from 0 to 160. Services: sharing health information with healthcare providers; browsing health instructions and information; children's health monitoring; reminders of medical test results; controlling and managing family health information; doctor visit reservation; contacting doctors; adding new prescriptions for each visit; browsing test results; information from a trustworthy source.]
Fig. 1. Health services priority for individuals
In addition, to gain better insight into the positive impact on both patients and doctors and to ensure the effectiveness of the proposed system, we went over multiple
research papers and articles that demonstrated the usefulness of PHR systems in general and of web-based systems in particular.
4.2 System Technology Selection
Health data must be available regardless of where or when it is needed, so the application must be accessible from every user device and from multiple places. Mobile-friendly web applications achieve this easily: users can simply use the browser on their phones, tablets, or any portable device with a decent internet connection.
React-NodeJS Web Application
The system designer must make the application easy, time-efficient, and interactive to use. React's capabilities help in designing a simpler user interface that makes the application easier to use and enhances the user experience. This section explains React.js in more detail. React is a component-based library used to develop fast, interactive user interfaces (UIs). It is currently one of the most popular JavaScript front-end libraries, with a strong foundation and a large supporting community. The advantages of React.js, and the reasons it was chosen, can be summarized as follows:
• Easy creation of dynamic applications: React requires less code and offers more functionality than plain JavaScript, where code often becomes complex very quickly.
• Improved performance: React uses a Virtual DOM (VDOM), which makes web applications faster. React compares the components' previous and new states and updates only the elements of the real DOM that changed, instead of re-rendering every component as conventional web applications do.
• Reusable components: components are the building blocks of any React application, and a single app usually consists of multiple components.
These components have their own logic and controls and can be reused throughout the application, which in turn dramatically reduces development time.
• Unidirectional data flow: in a React app, developers often nest child components within parent components, and data flows from parent to child through props. Because data flows in a single direction, it is easier to debug errors and to locate where a problem occurs in the application.
• Gentle learning curve: React is easy to learn, as it mostly combines basic HTML and JavaScript concepts with some beneficial additions. Still, as with other tools and frameworks, developers need to spend some time to gain a proper understanding of the React library.
• JSX: JSX stands for JavaScript XML; it is an XML/HTML-like syntax extension used by React.
• Virtual DOM: manipulating the real DOM is much slower than manipulating the VDOM, because changes to the VDOM are not drawn on the screen. When the state of an object changes, only that object is updated in the real DOM instead of all objects.
• Performance: because React uses the VDOM, web applications built with it can run faster than those developed with some alternative front-end frameworks. React also breaks a complex user interface into individual components, so multiple developers can work on different components simultaneously, which speeds up development.
• Extensions: React goes beyond simple UI design and has many extensions that support a complete application architecture. It supports server-side rendering, which renders a normally client-side-only web application on the server and then sends a fully rendered page to the client. It is also widely used with Flux and Redux in web application development.
• One-way data binding: this is the unidirectional data flow explained previously.
• Debugging: React applications are easy to test, helped by the tooling and support of a large developer community.
4.3 System Framework and Users' Interface
Auto-complete recommendations generated from multiple APIs (e.g., for medicine names, symptoms, prescriptions, and medical instructions) make the system easy and efficient to use. They also aid decision-making and reduce the time medical staff spend entering this information. In addition, APIs can serve as a source of valuable information about diseases and medicines; they thus help provide correct information to patients and prevent misleading, confusing,
This is one of the most important parts of the system. The prescriptions for each visit will be recorded together with valuable information and guidance that help the patient understand the prescribed medicines, such as when to take them, reminders, and the duration of the prescription. It will also provide important instructions that the patient has to consider.
Will contain a list of the diseases and medical conditions the patient has had, along with specific treatment dates and the duration of each illness. Multiple helpful properties will be provided. For example, a property called (Illness State) will indicate the state of that illness and show whether the patient has fully recovered, and other properties will depend on the illness type.
Every time the patient visits the doctor, the visit details will be recorded, such as medical tests ordered by the doctor, the medical diagnosis, and the doctor's notes about the patient's condition. The visit record will also contain the list of prescriptions given by the doctor and, if the patient has a medical history of a specific illness, the illness associated with the current condition.
Fig. 2. Main PHR system components
The medical reports and test results the patient has received during any treatment session, including the doctors' reports on the patient's visits and medical state. The reports will also include any examination files, such as pictures or files in other formats.
and sometimes wrong information from untrusted websites. The system offers several advantages. First, after leaving the clinic, the patient can use a browser anywhere to access all recent treatment procedures and browse medical reports and prescriptions. This gives the patient any information, instructions, or medicines he needs, along with the other parts shown in Fig. 2. Second, if the patient visits another doctor, or develops another medical condition and is transferred to another health provider, the medical staff can easily find the information they need to construct a proper treatment plan.
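The auto-complete recommendations described in this section can be sketched as a ranked filter over a term list fetched from a medicine or symptom API. This is a minimal illustration under stated assumptions. The term list is a hard-coded, hypothetical stand-in for an API response, and no real API or endpoint is implied. Matching is case-insensitive, and exact-prefix matches are ranked before other substring matches.

```typescript
// Hypothetical stand-in for terms returned by a medicine-name API.
const medicineNames: string[] = [
  "Paracetamol",
  "Amoxicillin",
  "Amlodipine",
  "Metformin",
  "Omeprazole",
];

// Return up to `limit` suggestions: prefix matches first, then other substring matches,
// each group sorted alphabetically.
function suggest(query: string, terms: string[], limit = 5): string[] {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return [];
  const prefix: string[] = [];
  const contains: string[] = [];
  for (const term of terms) {
    const t = term.toLowerCase();
    if (t.startsWith(q)) prefix.push(term);
    else if (t.includes(q)) contains.push(term);
  }
  return prefix.sort().concat(contains.sort()).slice(0, limit);
}

// Prefix hits Amlodipine and Amoxicillin come first, then the substring hit Paracetamol.
console.log(suggest("am", medicineNames));
```

In a React front end, `suggest` would typically run as the user types. Debouncing the input and caching API results would keep requests and re-renders low; these details are omitted here.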
5 Conclusion
Patients' access to their health information is not merely something requested from doctors; it is already a legal right in many countries [6]. Adopting a PHR system is urgently needed for the sake of better healthcare for every individual in our country. The reason is not only that people in other countries have such systems, but that such a system would increase people's safety in our country and give them greater satisfaction with caregivers' services. Our study shows that Yemeni people are no different from people in other countries: a high percentage feel that they need a system that helps them achieve better health. They want to get any information they need easily, without worrying whether the medications in their prescriptions are wrong or dangerous to take. They also want to share their health information without going back to every healthcare facility they have visited, and they are ready to keep using a service that satisfies their needs. Thus, caregivers in our country must move quickly toward implementing PHR systems. Our study proposed the system components they should keep in mind when they decide to implement a PHR system, because these are what people need and want. People cannot afford the high cost of expensive PHR systems, and they fear that they may not be able to use a complex one. The application therefore has to be simple and clear, and it must be easy to use and interactive; that is why we recommend React for designing the system. Our future work will focus on privacy and security when decentralized databases are used and blockchain technology based on Hyperledger Fabric is implemented. Interoperability is another issue to address. We plan to share healthcare data between multiple systems using standardized record systems and to design the system to support global standards such as HL7 and HL7 CDA.
References
1. Laurenza, E., et al.: The effect of digital technologies adoption in healthcare industry: a case based analysis. Bus. Process Manag. J. 24, 1124–1144 (2018)
2. Mohammed Al-Worafi, Y., Patel, R.P., Zaidi, S.T.R., et al.: Completeness and legibility of handwritten prescriptions in Sana’a, Yemen. Med. Princ. Pract. 27(3), 290–292 (2018). https://doi.org/10.1159/000487307
3. Mukred, A., Singh, D., Safie, N.: Investigating the impact of information culture on the adoption of information system in public health sector of developing countries. Int. J. Bus. Inf. Syst. 24(3), 261–284 (2017)
Adopting React Personal Health Record (PHR) System
4. Albokai, N., Liu, L., Alragawi, A., Albokai, A.: Improving the quality of healthcare by using information technology system in the hospitals of Yemen. Open J. Bus. Manag. 07, 728–754 (2019). https://doi.org/10.4236/ojbm.2019.72049
5. The impact of patient characteristics and the Internet usage on potential Personal Health Record (PHR) adoption in Primary Care
6. van Mens, H.J.T., Duijm, R.D., Nienhuis, R., de Keizer, N.F., Cornet, R.: Determinants and outcomes of patient access to medical records: systematic review of systematic reviews. Int. J. Med. Inform. 129, 226–233 (2019). https://doi.org/10.1016/j.ijmedinf.2019.05.014. Medline: 31445260
7. Mafawez, A., Qawqzeh, Y.: Proposed PHR architecture for Saudi Arabia health services. J. Eng. Appl. Sci. 4(1), 26–31 (2017)
8. Parkhomenko, A., Tyshchenko, I.: Research and Development of the API for Personal Health Record. CMIS (2019)
9. Rumball-Smith, J., Ross, K., Bates, D.W.: Late adopters of the electronic health record should move now. BMJ Qual. Saf. 29, 238–240 (2020)
10. Lazakidou, A.: Web-based applications in healthcare. In: Lazakidou, A. (ed.) Web-Based Applications in Healthcare and Biomedicine. Annals of Information Systems, vol. 7. Springer, Boston (2010). https://doi.org/10.1007/978-1-4419-1274-9_9
11. Ariani, A., Koesoema, A.P., Soegijoko, S.: Innovative healthcare applications of ICT for developing countries. In: Qudrat-Ullah, H., Tsasis, P. (eds.) Innovative Healthcare Systems for the 21st Century. Understanding Complex Systems. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55774-8_2
12. Duran, A., Galuscan, A., Muntean, C.: Proposed structure of personal health records for pregnant women. Med. Evol. XVI(1) (2010). Timișoara
13. Thummavet, P., Vasupongayya, S.: A novel personal health record system for handling emergency situations. In: 2013 International Computer Science and Engineering Conference (ICSEC), Nakorn Pathom, pp. 266–271 (2013). https://doi.org/10.1109/ICSEC.2013.6694791
14. Aboelfotoh, M.H., Martin, P., Hassanein, H.S.: A mobile-based architecture for integrating personal health record data. In: 2014 IEEE 16th International Conference on e-Health Networking, Applications and Services (Healthcom), Natal, pp. 269–274 (2014). https://doi.org/10.1109/HealthCom.2014.7001853
15. Song, Y., Hong, S., Pak, J.: Empowering patients using cloud based personal health record system. In: 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Takamatsu, pp. 1–6 (2015). https://doi.org/10.1109/SNPD.2015.7176216
16. Aceto, G., Persico, V., Pescapé, A.: The role of information and communication technologies in healthcare: taxonomies, perspectives, and challenges. J. Netw. Comput. Appl. 107, 125–154 (2018)
17. Shahzad, F.: Modern and responsive mobile-enabled web applications. Procedia Comput. Sci. 110, 410–415 (2017)
18. Zayas-Cabán, T., Chaney, K.J., Rucker, D.W.: National health information technology priorities for research: a policy and development agenda. J. Am. Med. Inform. Assoc. 27(4), 652–657 (2020)
Artificial Intelligence and Soft Computing
Application of Shuffled Frog-Leaping Algorithm for Optimal Software Project Scheduling and Staffing Ahmed O. Ameen1(B) , Hammed A. Mojeed1 , Abdulazeez T. Bolariwa1 , Abdullateef O. Balogun1,2 , Modinat A. Mabayoje1 , Fatima E. Usman-Hamzah1 , and Muyideen Abdulraheem1 1 Department of Computer Science, University of Ilorin, PMB 1515, Ilorin, Nigeria
{aminamed,mojeed.ha,balogun.ao1,mabayoje.ma,usman-hamza.fa, muyideen}@unilorin.edu.ng, [email protected] 2 Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610 Bandar Seri Iskandar, Perak, Malaysia
Abstract. The Software Project Scheduling Problem is one of the most crucial issues in software development because it includes resource planning, cost estimation, staffing and cost control, which, if not properly planned, affect the timely completion of the software project. Software project scheduling is the problem of scheduling tasks (work packages) and employees so that the overall project completion time is minimized without violating dependency constraints (task dependencies) while remaining consistent with resource constraints. This study adopts a Search Based Software Engineering approach that focuses on multi-objective optimization for software project planning using the Shuffled Frog Leaping Algorithm (SFLA), a memetic meta-heuristic algorithm. The objectives are the optimal ordering of work packages without dependency violations and the allocation of staff to work packages such that only employees with the required competences are allotted to a given work package. The study was carried out in four stages, namely: frog (solution) representation, definition of the fitness function, implementation of the SFLA, and evaluation with a randomly generated Software Project Scheduling Problem. The study concludes that an efficient solution to a Software Project Scheduling Problem can be found more readily by implementing the SFLA than by traditional computing means, which are tedious, error-prone and costly. Keywords: Shuffled Frog-Leaping Algorithm · Software Project Scheduling Problem · Software project planning · Search Based Software Engineering
1 Introduction Software development for organizations is a very complex task, as it deals with managing people, technologies and business processes [1]. In the software development process, effective planning is important because failure to plan and/or poor planning can result in © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 293–303, 2021. https://doi.org/10.1007/978-3-030-70713-2_28
A. O. Ameen et al.
unnecessary delays and overhead costs [2]. Because of the uncertainty involved in planning software projects, the given time and budget constraints often prove unrealistic, which in turn leads to business-critical failures. Software development companies often struggle to deliver projects on time, within budget and with the required quality. Possible causes of this problem are poor project scheduling and ineffective team staffing [3]. Therefore, software engineering projects require good project management techniques to ensure that projects are completed on schedule and within budget [4]. To achieve proper planning and management of a software project, tasks need to be optimally scheduled and resources effectively allocated. Scheduling is setting a sequence of time-dependent functions to execute a set of dependent tasks that constitute a project [5]. The dependency of tasks in terms of priority and precedence is crucial to software project scheduling. Besides the precedence constraints between the tasks of a project, there may be further constraints between tasks arising from resource allocation [5]. Apart from respecting priority and precedence limitations, scheduling should therefore be carried out consistently with resource constraints. Good allocations (team staffing) are crucial for software projects, since humans are their main resources [8, 20]. Effective software project scheduling is especially important when managing the development of medium- to large-scale projects, as it is required to meet deadlines and budgets [8]. The Software Project Scheduling Problem (SPSP) is an optimization problem that seeks an optimal schedule for a software project such that the precedence and resource constraints are satisfied and the project cost and duration are minimized [3]. This problem has been shown to be Non-deterministic Polynomial-time (NP)-hard [9, 19].
To solve this problem, meta-heuristic evolutionary algorithms such as the Genetic Algorithm [10], Ant Colony Optimization [9], the Shuffled Frog Leaping (SFL) algorithm [4] and Differential Evolution [5, 6] have been successfully applied. The majority of these studies, however, consider only task scheduling in their formulation of the problem [21–23]. Studies are therefore needed that combine task scheduling and staffing (allocation of jobs to developers) in the software development project planning problem. In this work, a memetic approach based on the Shuffled Frog-Leaping Algorithm (SFLA) is presented for optimal project scheduling and staffing with the two objectives combined. 1.1 The SFLA Algorithm The Shuffled Frog Leaping Algorithm (SFLA) is a memetic meta-heuristic first proposed by Eusuff and Lansey [11] for solving combinatorial optimization problems; it was first used to solve a water distribution network design problem [12]. The SFLA was designed as a meta-heuristic that performs an informed search using a heuristic function [13]. It combines deterministic and random approaches. The deterministic strategy allows the algorithm to use response surface information effectively to guide the heuristic search, as in Particle Swarm Optimization (PSO). The random approach ensures the flexibility and robustness of the search pattern. The SFLA does not specify the individuals belonging to its population; rather, it uses an abstract model called a virtual population [13].
Application of Shuffled Frog-Leaping Algorithm
The search begins with a randomly selected population of P frogs (i.e. solutions). The population is partitioned into m memeplexes (parallel communities) that can evolve independently to search the solution space in different directions. The individual frogs contain ideas (memes) that can be influenced by the ideas of other frogs within each memeplex and evolve through an optimization process referred to as memetic evolution [14]. Memetic evolution enhances the quality of the worst frog Xw and guides its performance towards a goal. To keep the evolution process competitive, frogs with better memes (ideas) are required to contribute more to the development of new ideas than frogs with poor ideas. During this evolution step, the frogs may change their memes using information from the memeplex best frog Xb or the global best frog Xg of the entire population [13]. Accordingly, the position of the frog with the worst fitness is adjusted using Eqs. (1)–(2).
Change in frog position:
$$D_i = \mathrm{rand}() \cdot (X_b - X_w) \tag{1}$$
New position:
$$X_{new} = X_w + D_i, \qquad -D_{max} \le D_i \le D_{max} \tag{2}$$
where rand() is a random number between 0 and 1, and Dmax is the maximum allowed change in a frog's position. If this process yields a better frog (solution), it replaces the worst frog. Otherwise, the calculations in Eqs. (1) and (2) are repeated with respect to the global best frog (that is, Xg replaces Xb). If no improvement is possible in this latter case either, a new solution (frog) with arbitrary fitness is randomly generated to replace the worst frog [14]. The calculations then continue for a specified number of evolutionary iterations within each memeplex. After a number of memetic evolutionary steps, ideas are passed among memeplexes in a shuffling process (global search). The local search and shuffling processes continue until the convergence criteria are satisfied. The algorithm has been tested on several combinatorial problems and found to be efficient in finding global solutions [14]. The core parameters of SFLA are the population size P, the number of memeplexes m, and the number of evolutionary iterations in each memeplex q [3].
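The local search and shuffling steps described above can be sketched in Python on a continuous toy problem. This is a minimal illustration, not the authors' implementation (their frogs are discrete WP orderings adjusted with Eqs. 3–7): the sphere objective, the default parameter values, dealing frog k of the sorted population to memeplex k mod m, and clipping each component of Di to [−Dmax, Dmax] are all assumptions.

```python
import random


def sphere(x):
    """Toy objective to minimize (an assumption for illustration only)."""
    return sum(v * v for v in x)


def sfla(f, dim, lo, hi, P=30, m=5, q=10, shuffles=50, d_max=1.0, seed=0):
    """Minimal continuous SFLA: memeplex local search with Eqs. (1)-(2), then shuffling."""
    rng = random.Random(seed)

    def new_frog():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    frogs = [new_frog() for _ in range(P)]
    for _ in range(shuffles):
        frogs.sort(key=f)                      # best (lowest f) first
        xg = frogs[0]                          # global best frog Xg
        # Deal the sorted frogs into m memeplexes: frog k goes to memeplex k mod m.
        memeplexes = [frogs[k::m] for k in range(m)]
        for mem in memeplexes:
            for _ in range(q):
                mem.sort(key=f)
                xb, xw = mem[0], mem[-1]       # memeplex best Xb and worst Xw
                for leader in (xb, xg):        # try Xb first, then Xg
                    r = rng.random()
                    # Eq. (1), with each component clipped to [-d_max, d_max]
                    d = [max(-d_max, min(d_max, r * (b - w))) for b, w in zip(leader, xw)]
                    cand = [w + di for w, di in zip(xw, d)]   # Eq. (2)
                    if f(cand) < f(xw):
                        mem[-1] = cand         # better frog replaces Xw
                        break
                else:
                    mem[-1] = new_frog()       # no improvement: a random frog replaces Xw
        # Shuffle: merge the memeplexes back into one population.
        frogs = [frog for mem in memeplexes for frog in mem]
    return min(frogs, key=f)


best = sfla(sphere, dim=3, lo=-5.0, hi=5.0)
print(best, sphere(best))
```

Because only the worst frog of a memeplex is ever replaced, the best frog found so far is never lost, so the returned solution is at least as good as the best frog of the initial population.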
2 Related Works Considering the application of SFLA, Elbeltagi, Hegazy and Grierson [14] compared the search mechanism of the Genetic Algorithm (GA) with that of the SFLA, and their experimental results show that the SFLA performs better than the GA on some problems involving continuous functions. Their work also proposed an improved SFLA that introduces a new parameter, the search-acceleration factor (C), into the original SFLA formulation; they analyzed the positive role of this parameter and solved discrete and continuous optimization problems with it. Nejad, Jahani and Sarlak [15] applied SFLA to the Economic Load Dispatch (ELD) problem in power systems. Their objective was to find the optimal combination of power generations that minimizes the total generation cost while satisfying an equality constraint and inequality constraints. Two
representative systems (IEEE 30-bus and 57-bus) were used to test their SFLA in comparison with a GA-based method for the ELD problem, and the results showed that the SFLA technique was faster than the GA technique. Liping, Weiwei, Yefeng and Yixian [16] also introduced the SFLA to solve an uncapacitated Single Level Lot Sizing (SLLS) problem and reported ideal results. Gerasimou et al. [17] investigated the application of a Particle Swarm Optimization (PSO) algorithm to software project scheduling and effective team staffing. The study aimed to create optimal project schedules by specifying the best sequence for executing a project's tasks so as to minimize the total project duration, and to form skillful and productive working teams with the best utilization of developer skills. A combination of the Constriction-PSO and Binary-PSO variants was employed to solve the problem. Results from empirical experiments showed that PSO generated feasible solutions with a feasibility rate of approximately 100% and a hit rate of virtually 100% in all the problems considered. However, as the complexity and size of the problems increased, these percentages progressively decreased, reaching as low as 30%. This shows that the employed algorithm still has difficulty producing optimal solutions as project complexity increases. Chen and Zhang [18] developed an approach based on an event-based scheduler (EBS) and an ant colony optimization (ACO) algorithm for optimal project scheduling and staffing. The model employed the event-based scheduler to address the restricted flexibility of human resource allocation. The project plan was modeled as a task list and an employee allocation matrix, and the ACO algorithm was then applied to solve the problem.
Experimental results showed that the representation scheme with the EBS is effective, and the proposed algorithm yields better plans with lower costs and more stable workload assignments than other existing approaches such as the Tabu Search (TS) algorithm for the multiskill scheduling problem, the knowledge-based GA (KGA) and the time-line-based GA. The study, however, did not consider employee experience in its formulation. Stylianou and Andreou [7] proposed a procedure to support software project managers in their project scheduling and team staffing activities, adopting a genetic algorithm as an optimisation technique to construct a project's optimal schedule and to assign the most experienced employees to tasks. Their experimental results revealed that the genetic algorithm is capable of finding optimal solutions for projects of varying sizes when either one of the objective functions is used. However, when the objective functions were combined, the genetic algorithm had difficulty reaching optimal solutions, especially when preference was given to assigning the most experienced employees over minimizing the project's duration. This study presents SFLA as a memetic meta-heuristic algorithm to tackle this shortcoming. Recent works have focused on combining task scheduling and team allocation/resource assignment based on multiple skills (as also adopted in this study) using different optimization approaches. Lin, Zhu and Gao [24] proposed a genetic programming hyper-heuristic algorithm for minimizing the makespan in the multi-skill resource constrained project scheduling problem (MS-RCPSP). Comparisons with existing algorithms such
as HACO, GRASP and DEGR showed that the proposed algorithm performed considerably better with regard to solution quality and convergence rate. The same multi-skill formulation was also employed by Li et al. [25], with a focus on skill evolution and cooperation effectiveness in project scheduling. Van Den Eeckhout, Maenhout and Vanhoucke [26] applied a heuristic procedure based on iterated local search to an integrated formulation of the personnel staffing and project scheduling problems, such that the demand for staff and the scheduling of resources are determined simultaneously, as proposed in this study. However, their objective is to determine the personnel budget that minimizes project cost, rather than combining the objectives of minimized completion time and cost. Recently, an optimization procedure based on cooperative coevolution was proposed by Shen, Guo and Li [27] for the large-scale resource-constrained multi-objective project scheduling problem. Duration and cost are considered together as objectives, along with employee satisfaction. Experimental results on 15 randomly generated large-scale instances with up to 2048 decision variables indicated the high scalability of the proposed approach with regard to convergence ability.
3 Methodology To model the problem, a Design Structure Matrix (DSM), which enforces the dependencies among tasks, was used. It is represented as a two-dimensional jagged array in which row indices represent WP ids. Using a hypothetical software project consisting of seven WPs, an example of a modeled DSM is shown in Fig. 1. The DSM indicates that WP1 does not depend on any task before it can start, WP1 must finish before WP2 can start, WP1 and WP2 must finish before WP3 and WP4 can start, WP1, WP2, WP3 and WP4 must finish before WP5 can start, WP4 and WP5 must finish before WP6 can start, and WP5 must finish before WP7 can start. For a software project scheduling problem, the number of WPs is usually less than or equal to 2^n − 1, with n representing the number of employees required to complete the project.
Fig. 1. DSM model representation of dependency constraints (predecessors of each WP)
WP 1: none
WP 2: 1
WP 3: 1, 2
WP 4: 1, 2
WP 5: 1, 2, 3, 4
WP 6: 4, 5
WP 7: 5
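A minimal Python sketch of the Fig. 1 DSM as a jagged list, together with one way of counting dependency violations for a WP ordering. Storing WP i+1 at index i, the function name count_violations, and counting each predecessor scheduled after its dependent WP as one violation are assumptions for illustration; the recoverable text does not spell out how V is counted.

```python
# DSM of Fig. 1 as a jagged list: DSM[i] holds the predecessors of WP i+1.
DSM = [
    [],            # WP1: no predecessors
    [1],           # WP2
    [1, 2],        # WP3
    [1, 2],        # WP4
    [1, 2, 3, 4],  # WP5
    [4, 5],        # WP6
    [5],           # WP7
]


def count_violations(order, dsm):
    """Count (predecessor, WP) pairs in which the predecessor is scheduled after the WP."""
    position = {wp: pos for pos, wp in enumerate(order)}
    violations = 0
    for index, predecessors in enumerate(dsm):
        wp = index + 1
        for pred in predecessors:
            if position[pred] > position[wp]:
                violations += 1
    return violations


print(sum(len(p) for p in DSM))                      # 12 dependencies in Fig. 1
print(count_violations([1, 2, 4, 3, 5, 7, 6], DSM))  # 0: a feasible ordering
print(count_violations([2, 1, 3, 4, 5, 6, 7], DSM))  # 1: WP2 is placed before WP1
```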
The staff allocation is modeled using the binary representation of an integer x in the interval 1 to 2^n − 1, where n is the number of employees involved in the project. Each bit of the binary equivalent denotes an employee's involvement in the current task. A value of 1 means the corresponding employee is
allocated for the given WP, and 0 means the employee is not allocated. Starting from the left, the first bit denotes employee 1's involvement in the task, the next bit represents employee 2's involvement, and so on. Assuming that four employees are available for the project represented by the DSM in Fig. 1, the employee assignment of any of the WPs is the binary equivalent of a number between 1 and 15. An example of employee assignment under this representation is presented in Table 1. Each employee is associated with a skill set represented as a linear array of skill types. Likewise, the required competence(s) of each WP are defined as an n-array of skill types. The competence that a team brings to a WP is the sum of the skill sets possessed by the employees assigned to it.
Table 1. Employee assignment representation
Work package | Employee assignment | Binary equivalence | Remarks
1 | 2 | 0010 | Task assigned to only employee 3
2 | 11 | 1011 | Task assigned to employees 1, 3 and 4
3 | 15 | 1111 | Task assigned to employees 1, 2, 3 and 4
4 | 5 | 0101 | Task assigned to employees 2 and 4
5 | 4 | 0100 | Task assigned to only employee 2
6 | 13 | 1101 | Task assigned to employees 1, 2 and 4
7 | 7 | 0111 | Task assigned to employees 2, 3 and 4
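The bit encoding of Table 1 can be decoded in a few lines of Python, and the decoded team can then be checked against a WP's required skills. In this sketch the names decode_assignment and skill_mismatches are hypothetical, the skill data are invented, and counting each uncovered required skill as one mismatch is an assumed definition of M, not necessarily the authors'.

```python
def decode_assignment(x, n):
    """Return the employees (1-based) encoded by integer x; the leftmost of n bits is employee 1."""
    if not 1 <= x <= 2 ** n - 1:
        raise ValueError(f"assignment must lie in 1..{2 ** n - 1}")
    bits = format(x, f"0{n}b")  # e.g. 11 with n=4 -> "1011"
    return [i + 1 for i, bit in enumerate(bits) if bit == "1"]


def skill_mismatches(assignment, required, skills, n):
    """Count required skills not covered by the team assigned to each WP (assumed definition).

    assignment: WP id -> integer encoding of the team
    required:   WP id -> set of required skill types
    skills:     employee id -> set of skill types
    """
    total = 0
    for wp, x in assignment.items():
        team_skills = set()
        for employee in decode_assignment(x, n):
            team_skills |= skills[employee]
        total += len(required[wp] - team_skills)
    return total


# Hypothetical skill data for four employees (not from the paper).
skills = {1: {"java"}, 2: {"db"}, 3: {"ui"}, 4: {"test"}}
required = {1: {"ui"}, 2: {"java", "test"}}
print(decode_assignment(11, 4))                             # [1, 3, 4], as in Table 1
print(skill_mismatches({1: 2, 2: 5}, required, skills, 4))  # 1: team {2, 4} lacks "java"
```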
3.1 Frog Representation A frog represents a feasible solution to the project scheduling and staffing problem. It is encoded as an n × 2 array in which each row consists of a WP id and an integer representing the employee assignment. The row index indicates the position of the WP in the WP ordering. For example, row index 0 indicates position (POS) 1, and the associated WP starts before any other WP. A typical frog schema is shown in Fig. 2.
Fig. 2. A frog representation
WP ordering: 1 2 4 3 5 7 6
Employee assignment: 1 5 6 7 2 3 7
3.2 SFLA Design The Shuffled Frog Leaping Algorithm (SFLA) generally works as follows. First, a virtual or random population of P frogs is created (where P is the population size). Next, the fitness of each frog is evaluated, and the frogs are sorted in descending order of fitness (from the fittest to the worst). The frogs are then partitioned into m memeplexes, and a local search is performed within each memeplex. During each intra-memeplex local search, the best and worst frogs are identified as Xb and Xw respectively, and the global best frog is identified as Xg. Only the worst frog is then improved; the other frogs are left unchanged. In this approach, the position of the worst frog (Xw) is adjusted using Eqs. 3–7.
$$chunkLength = 0.5 \times frog\_size \tag{3}$$
$$start = \mathrm{rand}() \times (frog\_size - chunkLength) \tag{4}$$
I = chunkLength + start; start = 0
(8)
Where: V = number of dependency violations, M = number of skill mismatches and n = number of WPs for frog A.
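Eqs. (5)–(7), which complete the adjustment of Xw, are not recoverable here, so the sketch below is only one hypothetical reading of Eqs. (3)–(4), not the authors' method. It copies a chunk of chunkLength rows beginning at start from a leader frog (Xb or Xg) into the worst frog, and fills the remaining positions with the worst frog's other WPs in their original order (in the style of an order crossover), so that every WP still appears exactly once. The function name chunk_move and truncation to integers are assumptions.

```python
import random


def chunk_move(worst, leader, rng):
    """Hypothetical chunk transfer from a leader frog (Xb or Xg) into the worst frog Xw.

    Frogs are lists of (wp_id, assignment) rows, one row per WP position.
    """
    size = len(worst)
    chunk_len = int(0.5 * size)                     # Eq. (3), truncated to an integer
    start = int(rng.random() * (size - chunk_len))  # Eq. (4), truncated to an integer
    chunk = leader[start:start + chunk_len]
    chunk_wps = {wp for wp, _ in chunk}
    # Keep Xw's remaining rows in their original relative order.
    rest = [row for row in worst if row[0] not in chunk_wps]
    # Re-insert the chunk at the positions it occupies in the leader.
    return rest[:start] + chunk + rest[start:]


# The Fig. 2 frog as Xw, and a leader using the Table 1 assignments (illustrative only).
worst = [(1, 1), (2, 5), (4, 6), (3, 7), (5, 2), (7, 3), (6, 7)]
leader = [(1, 2), (2, 11), (3, 15), (4, 5), (5, 4), (6, 13), (7, 7)]
print(chunk_move(worst, leader, random.Random(1)))
```

As with Eqs. (1)–(2), the resulting candidate would replace Xw only if its fitness is better.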
4 Results and Discussion The modified SFLA was tested on a randomly generated software project scheduling and staffing problem consisting of seven (7) WPs, twelve (12) dependencies and five (5) employees. The population size was varied as 100, 150, 200 and 300, and the number of memeplexes as 5 and 10 for each population size. This variation is necessary because there are no generally accepted criteria for choosing the population size and number of memeplexes for a given problem. These parameters, together with the number of evolutionary iterations, greatly influence the performance of the algorithm. The number of evolutionary iterations per memeplex was set to 2N, where N is the number of frogs in each memeplex, as proposed in [4]. Because SFLA starts from a virtual (randomly generated) population and improves the frogs until the convergence criteria are met, its results, and how well the frogs are improved, vary from run to run. Hence, the algorithm needs to be run several times and the results averaged, to obtain a better evaluation of its performance on a given problem and of how well the feasible solutions (frogs) are improved before the best solution is finally selected. Table 2 presents the results of experiments carried out on the random problem using SFLA with varied population size and number of memeplexes, and using pure random search.
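As a worked check of the 2N rule, assuming the P frogs are split evenly across the m memeplexes (N = P/m), the tested configurations give the iteration counts below; the helper name is hypothetical.

```python
def iterations_per_memeplex(population_size, memeplexes):
    """2N evolutionary iterations, where N frogs sit in each memeplex (even split assumed)."""
    frogs_per_memeplex = population_size // memeplexes
    return 2 * frogs_per_memeplex


for P in (100, 150, 200, 300):
    for m in (5, 10):
        print(f"P={P}, m={m}: N={P // m}, iterations={iterations_per_memeplex(P, m)}")
```

For example, P = 100 with m = 5 gives N = 20 frogs per memeplex and therefore 40 iterations, while P = 300 with m = 10 gives N = 30 and 60 iterations.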
A total of twenty (20) independent runs, as proposed in [28], were performed for each case of the variation, and the results were averaged. The same experiments were also carried out with pure random search for comparison with the proposed approach. All algorithms were implemented in Java.
Table 2. Experimental results of SFLA and random search (average fitness)
Population size | SFLA, 5 memeplexes | SFLA, 10 memeplexes | Random search
100 | 0.27 | 0.49 | 4.12
150 | 0.23 | 0.32 | 5.11
200 | 0.16 | 0.28 | 4.79
300 | 0.15 | 0.22 | 5.14
It can be deduced that the algorithm worked better with a larger population size and that, for the same population size, the lower the number of memeplexes, the better the improvement of the whole population, of the average fitness of the individual frogs in the population, and of the selected best solution. The SFLA approach was also compared with a pure random approach that generates feasible solutions (frogs) subject to a set threshold (a maximum of 3 dependency violations), and the proposed SFLA performed better than the random search approach. Figure 4 presents this comparison in a line graph.
Fig. 4. Average fitness comparison of SFLA and random search [line graph; x-axis: population size (100, 150, 200, 300); y-axis: average fitness; series: random search, SFLA]
Fig. 4 shows that SFLA outperformed random search at every population size, with a difference in average fitness of up to 4.92. This result shows the effectiveness of SFLA in handling the project scheduling and staffing problem under our formulation.
5 Conclusion In this work, a data structure that enforces dependency constraints among Work Packages (WPs) was successfully adopted. The study also found a mathematical representation of staff allocation that is easy to implement. Together, these enable a frog (solution) representation that caters for both work package ordering and staff allocation. The study uses SFLA to find near-optimal solutions for a randomly generated Software Project Scheduling Problem (SPSP), and a comparison was made with a purely random approach. The SFLA approach offers a new, effective and efficient perspective on software project scheduling, and the result analysis shows that it performs reasonably well. In the future, we plan to include more objectives, carry out empirical studies with real-world standard project scheduling problem instances, and compare the results with existing studies.
References
1. Kang, K., Hahn, J.: Learning and forgetting curves in software development: does type of knowledge matter? In: ICIS 2009 Proceedings, p. 194 (2009)
2. Mojeed, H.A., Bajeh, A.O., Balogun, A.O., Adeleke, H.O.: Memetic approach for multiobjective overtime planning in software engineering projects. J. Eng. Sci. Technol. 14(6), 3213–3233 (2019)
3. Patil, N., Sawanti, K., Warade, P., Shinde, Y.: Survey paper for software project scheduling and staffing problem. Int. J. Adv. Res. Comput. Commun. Eng. 7, 5675–5677 (2014)
4. Oladele, R.O., Mojeed, H.A.: A shuffled frog-leaping algorithm for optimal software project planning. Afr. J. Comput. ICT 7(1), 147–152 (2014)
5. Amiri, M., Barbin, J.P.: New approach for solving software project scheduling problem using differential evolution algorithm. Int. J. Found. Comput. Sci. Technol. 5(1), 1–5 (2015)
6. Eshraghi, A.: A new approach for solving resource constrained project scheduling problems using differential evolution algorithm. Int. J. Ind. Eng. Comput. 7(2), 205–216 (2016)
7. Stylianou, C.S., Andreou, A.S.: Intelligent software project scheduling and team staffing with genetic algorithm. In: IFIP Advances in Information and Communication Technology (IFIPAICT), vol. 364. Springer, Heidelberg (2011)
8. Shen, X., Minku, L.L., Bahsoon, R., Yao, X.: Dynamic software project scheduling through a proactive-rescheduling method. IEEE Trans. Softw. Eng. 42(7), 658–686 (2016)
9. Vitekar, K.N., Dhanawe, S.A., Hanchate, D.B.: Review of solving software project scheduling problem with ant colony optimization. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(4), 1177–1186 (2013)
10. Karova, M., Petkova, J., Smarkov, V.: A genetic algorithm for project planning problem. In: Proceedings International Scientific Conference Computer Science 2008, pp. 647–651 (2008)
11. Eusuff, M.M., Lansey, K.E.: Optimization of water distribution network design using the shuffled frog leaping algorithm. J. Water Resour. Plan. Manag. 129(3), 210–225 (2003)
12. Mai, G., Li, Y.: An improved shuffled frog leaping algorithm and its application. In: Proceedings of International Conference on Advances in Mechanical Engineering and Industrial Informatics, China (2015)
13. Eusuff, M., Lansey, K., Pasha, F.: Shuffled frog leaping algorithm: a memetic meta-heuristic for discrete optimization. Eng. Optim. 38(2), 129–154 (2006)
14. Elbeltagi, E., Hegazy, T., Grierson, D.: A modified shuffled frog-leaping optimization algorithm: applications to project management. Struct. Infrastruct. Eng. 3(1), 53–60 (2007)
15. Nejad, H.C., Jahani, R., Sarlak, G.: Applying shuffled frog-leaping algorithm for economic load dispatch of power system. Am. J. Sci. Res. 20, 82–89 (2011)
16. Liping, Z., Weiwei, W., Yefeng, X., Yixian, C.: Application of shuffled frog leaping algorithm to uncapacitated SLLS problem. AASRI Procedia 1, 226–231 (2012)
17. Gerasimou, S., Stylianou, C., Andreou, A.S.: An investigation of optimal project scheduling and team staffing in software development using particle swarm optimization. ICEIS 2, 168–171 (2012)
18. Chen, W.N., Zhang, J.: Ant colony optimization for software project scheduling and staffing with an event-based scheduler. IEEE Trans. Softw. Eng. 39(1), 1–17 (2013)
19. Weisstein, E.W.: NP-Hard Problem (2017). https://mathworld.wolfram.com/NP-HardProblem.html
20. Wysocki, R.K.: Effective Project Management: Traditional, Agile, Extreme, 5th edn., pp. 167–171. Wiley Publishing, Indianapolis (2009)
21. Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Struct. Multidiscip. Optim. 26, 369–395 (2004)
22. Krasnogor, N., Aragon, A., Pacheco, J.: Metaheuristic procedures for training neural networks. Operations Research/Computer Science Interfaces Series, vol. 36, pp. 225–248 (2006)
23. Rezende, A.V., Silva, L., Britto, A., Amaral, R.: Software project scheduling problem in the context of search-based software engineering: a systematic review. J. Syst. Softw. 155, 43–56 (2019)
24. Lin, J., Zhu, L., Gao, K.: A genetic programming hyper-heuristic approach for the multi-skill resource constrained project scheduling problem. Expert Syst. Appl. 140, 112915 (2020)
25. Li, Q., Sun, Q., Tao, S., Gao, X.: Multi-skill project scheduling with skill evolution and cooperation effectiveness. Eng. Constr. Archit. Manag. 27, 2023–2045 (2019)
26. Van Den Eeckhout, M., Maenhout, B., Vanhoucke, M.: A heuristic procedure to solve the project staffing problem with discrete time/resource trade-offs and personnel scheduling constraints. Comput. Oper. Res. 101, 144–161 (2019)
27. Shen, X., Guo, Y., Li, A.: Cooperative coevolution with an improved resource allocation for large-scale multi-objective software project scheduling. Appl. Soft Comput. 88, 106059 (2020)
28. Harman, M., Mansouri, S.A., Zhang, Y.: Search based software engineering: a comprehensive analysis and review of trends, techniques and applications. Technical report TR-09-03, Department of Computer Science, King's College London (2009)
A Long Short Term Memory and a Discrete Wavelet Transform to Predict the Stock Price Mu’tasem Jarrah1(B) and Naomie Salim2 1 King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
[email protected] 2 Universiti Teknologi Malaysia - UTM, Johor Bahru, Malaysia
[email protected]
Abstract. Financial analysis is a challenging task in the present-day world, where investment value and quality are paramount. This research introduces a prediction technique that combines the Discrete Wavelet Transform (DWT) and Long Short-Term Memory (LSTM) to predict stock prices in the Saudi stock market for the subsequent seven days. A time series model is used that comprises the historical closing values of several stocks listed on the Saudi stock exchange. This model, called Discrete Long Short-Term Memory (DLSTM), comprises memory elements that preserve data for extended periods. The model was applied to the historical closing prices of the stock market and compared with the Autoregressive Integrated Moving Average (ARIMA) model. The DLSTM-based experimental model had a prediction accuracy of 97.54%, while that of ARIMA was 97.29%. The results indicate that DLSTM is an effective tool for predicting prices in the stock market, and they highlight the importance of deep learning and of the concurrent use of several information sources to predict stock price levels. Keywords: Long Short Term Memory · Deep learning · Prediction · Stock market
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 304–313, 2021. https://doi.org/10.1007/978-3-030-70713-2_29
1 Introduction
Stock price forecasting has been an area of significant interest for several decades [1]; however, such predictions are challenging because of the dynamic and complex nature of the market environment [2]. Forecasting stock market trends and prices is considered crucial in the financial and investment domains. Researchers have studied and proposed techniques to forecast market prices, and thereby profit from trading, using methods such as statistical analysis and technical analysis. Stock trends are difficult to predict because many uncertain factors and much noise affect prices. Numerous factors may influence the market price on any given day: changes in the national economy, sentiment, the value of the product, political factors, and the weather are some examples [3]. Researchers have assessed stock price trends to understand which factors have the most significant effect on prices. This study uses LSTM and DWT to predict stock prices for the subsequent seven days.
Customer sentiment also affects stock markets. Financial developments and the sentiment of stock buyers, whose views of the products or services offered by a firm can significantly affect price volatility, will be addressed in future studies. The present research compares a traditional method, ARIMA, with the proposed DLSTM model. Several modelling options were evaluated, and the selected model was assessed under different configurations. The proposed model combines the Discrete Wavelet Transform (DWT) with the Long Short-Term Memory (LSTM) deep-learning framework; the DWT removes noise from the financial time series in an unsupervised manner.
The paper is structured as follows: Sect. 2 reviews work on stock market forecasting; Sect. 3 describes the proposed model and its characteristics; Sect. 4 describes the experimental setup; Sect. 5 presents and discusses the results; and Sect. 6 concludes the research and gives recommendations for further research.
2 Literature Survey
The model proposed in this research is an improved version of the model used in [4]; the present study builds on that model to make regression analysis of the stock market more accurate. Data collection is the first step of the model, and the second step consists of data cleansing and transformation, which are needed to obtain the data required for analysis. Labelling is the critical third step, in which the polarity of each individual opinion is determined as positive, negative, or neutral. The fourth step is classification, in which stock patterns are identified using hybrid Naïve Bayes Classifiers (HNBCs). The last step evaluates model performance. HNBCs are the machine learning technique suggested for classifying stock market sentiment, and the technique achieved an accuracy of 90.38%. The results are important for firms, investors, and academics, who may use them to plan further action according to the sentiment of the individuals associated with the stock market. Batra & Daudpota (2018) proposed a machine learning technique for sentiment analysis that determines the perspective (neutral, positive, or negative) expressed in a piece of text about a person, product, enterprise, or other entity. Because sentiment analysis captures the mood of individuals whose views can affect stock prices, it can help predict real-world stock movements [5].
In a related study, Fischer and Krauss (2018) used long short-term memory (LSTM) networks, an advanced technique for sequence learning. Such networks had been less commonly applied to financial time series prediction, although they are intrinsically suited to this domain. The authors implemented LSTM networks to forecast the out-of-sample directional movements of the stocks listed on the S&P 500. They also highlighted the characteristics that determine profitability, providing insight into the inner workings of artificial neural networks. One such pattern is that the stocks selected for trading exhibit high volatility and a short-term reversal profile [6]. Minh, Sadeghi-Niaraki, Huy, Min, and Moon (2018) proposed a new technique to forecast stock price direction using a sentiment dictionary and financial news. Financial news is an established driver of changes in stock prices. Nevertheless, previous research has emphasised superficial features while disregarding the structural relationships between the words in a sentence. Numerous sentiment analysis studies have attempted to determine the correlation between news events and investor reactions. However, the sentiment dictionaries were typically built from general linguistic datasets that are unrelated to the financial domain, which led to inadequate performance [7]. Chou and Nguyen (2018) proposed an intelligent time series prediction method that uses sliding-window metaheuristic optimisation to forecast stock prices one step ahead. The proposed system is a standalone application with a graphical user interface. The hybrid system demonstrated excellent forecasting performance, leading to higher investment returns. The framework provides a powerful prediction method for highly non-linear time series data, in which traditional models may be unable to detect patterns accurately [8].
3 Prediction Using DWT and LSTM
3.1 Discrete Wavelet Transform (DWT)
The DWT has powerful feature extraction capability and is therefore employed in domains such as signal processing and financial time series analysis. The main advantage of the wavelet transform over the Fourier transform is that it analyses the time and frequency components of a financial time series simultaneously. The wavelet transform therefore gives a better understanding of financial time series that contain significant irregularities. This study uses the Haar function as the wavelet basis because it decomposes the financial time series into its time and frequency components and significantly reduces processing time [9]: the Haar wavelet transform has a time complexity of O(n), where n is the length of the time series. The continuous wavelet transform (CWT) is based on the family of scaled and translated wavelets
$$\phi_{a,\tau}(t) = \frac{1}{\sqrt{a}}\,\phi\left(\frac{t-\tau}{a}\right) \qquad (1)$$
where a and τ denote the scale and translation factors, respectively, and φ(t) is the basis wavelet function [10].
3.2 Long Short-Term Memory
A long short-term memory (LSTM) element or network [11] is a sophisticated variant of the basic recurrent neural network (RNN) and can serve as a building block for improved sequence analysis with recurrent networks. An LSTM block is fundamentally recurrent, since it contains recurrent connections like those of traditional recurrent networks, but it is designed to capture long-term dependencies more accurately than traditional RNNs. According to [12], combining LSTM with RNNs has shown better performance than simple RNNs and Deep Neural Networks (DNNs) in tasks such as speech recognition and stock price movement prediction. Conventional DNNs are restricted in that they can model only a fixed-size sliding window and have no dependency on previously processed time steps; hence, they are not the first choice for modelling stock data (see Fig. 1).
The study uses data collected from the Saudi Stock Market (TADAWUL) to train and test the proposed model. After the data are gathered, they are normalised and split into training and testing sets. The training set is used to train the proposed DLSTM framework to predict the stock data for the subsequent seven days. After training, the model is tested on data from several companies, and its outputs are compared with the actual stock prices using graphical plots. The overall architecture is depicted in Fig. 2.
Fig. 1. Long short term memory cell
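Section 3.1 states that a Haar basis gives an O(n) decomposition that is used to remove noise from the price series, but the paper does not specify the number of decomposition levels or the thresholding rule. The following NumPy sketch therefore assumes a single-level Haar transform with soft thresholding of the detail coefficients; the function names and the threshold value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT returning (approximation, detail); O(n) time."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("series length must be even (pad it first)")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-frequency (trend) part
    detail = (even - odd) / np.sqrt(2.0)   # high-frequency (noise-like) part
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds a series of length 2 * len(approx)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_denoise(x, threshold):
    """Hypothetical denoiser: soft-threshold the detail coefficients, then invert."""
    approx, detail = haar_dwt(x)
    shrunk = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    return haar_idwt(approx, shrunk)

# First six actual closing prices of Saudi Arabian Mining Co. (Table 3)
prices = np.array([30.16, 29.86, 29.50, 29.07, 28.67, 28.54])
a, d = haar_dwt(prices)
smoothed = haar_denoise(prices, threshold=0.1)
```

Because the Haar basis is orthonormal, a threshold of 0 reconstructs the input exactly, while a very large threshold replaces each pair of prices with its mean.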
4 Experimental Setup
This section comprises two sub-sections: the first describes the characteristics of the dataset used for experimentation, and the second discusses the prediction procedure and accuracy measurement. The experimental results are presented in Sect. 5.
Table 1. Formula for each component at time step t.
Component | Formula | Purpose
Input gate | $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ | Control level of cell state update
Forget gate | $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ | Control level of cell state reset (forget)
Cell candidate | $g_t = \tanh(W_g \cdot [h_{t-1}, x_t] + b_g)$ | Add information to cell state
Output gate | $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ | Control level of cell state added to hidden state
Cell state | $C_t = f_t * C_{t-1} + i_t * g_t$ | Transfer data from one step to the next
Hidden state | $h_t = o_t * \tanh(C_t)$ | Used for predictions (output)
Here σ denotes the logistic sigmoid gate activation, tanh denotes the state activation function [13], and * denotes element-wise multiplication.
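The equations in Table 1 can be traced through one forward step of a single LSTM cell. This is a minimal NumPy sketch that assumes the concatenated-input form [h_{t−1}, x_t], a logistic sigmoid for the gates and tanh for the cell candidate (the standard formulation, consistent with tanh being the state activation function). The five hidden units echo the best configuration in Table 2, while the random weights and the normalised toy inputs are illustrative assumptions, not the trained DLSTM.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Table 1.

    W maps gate name -> weights of shape (hidden, hidden + inputs);
    b maps gate name -> bias of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    g_t = np.tanh(W["g"] @ z + b["g"])       # cell candidate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_t = f_t * c_prev + i_t * g_t           # cell state
    h_t = o_t * np.tanh(c_t)                 # hidden state (output)
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, n_in = 5, 1                          # 5 units, one closing price per step
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for k in "ifgo"}
b = {k: np.zeros(hidden) for k in "ifgo"}

h, c = np.zeros(hidden), np.zeros(hidden)
for price in [0.20, 0.35, 0.50]:             # already-normalised toy inputs
    h, c = lstm_step(np.array([price]), h, c, W, b)
```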
Fig. 2. Architecture of the proposed model (DLSTM). Diagram blocks: Tadawul finance data, DWT, data processing, split data, build model, testing, validation, prediction (next 7 days).
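The data-processing and split stages shown in Fig. 2 are not specified in detail. The following NumPy sketch assumes min-max scaling fitted on the training portion only, a chronological (unshuffled) split, and a hypothetical look-back window of `lag` past closing prices per input sample; all function names are illustrative.

```python
import numpy as np

def chronological_split(series, n_train):
    """Split without shuffling: the first n_train points train, the rest test."""
    series = np.asarray(series, dtype=float)
    return series[:n_train], series[n_train:]

def fit_minmax(train):
    """Scaling bounds taken from the training portion only (avoids test leakage)."""
    return float(np.min(train)), float(np.max(train))

def scale(x, lo, hi):
    """Map values to [0, 1] relative to the training range."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def unscale(x, lo, hi):
    """Invert scale() so predictions can be compared with real prices."""
    return np.asarray(x, dtype=float) * (hi - lo) + lo

def make_windows(series, lag):
    """Supervised pairs: each row of X holds `lag` past values, y the next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

# Toy input: the seven actual Yanbu Cement Co. closing prices from Table 3
closes = [49.15, 49.14, 49.11, 49.07, 49.02, 49.00, 48.98]
train, test = chronological_split(closes, n_train=5)
lo, hi = fit_minmax(train)
X, y = make_windows(scale(train, lo, hi), lag=2)
```

Fitting the bounds on the training data only means that scaled test values may fall slightly outside [0, 1], which is the price paid for not leaking future information into training.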
4.1 Data Description
The present study uses historical data on the stocks listed on the Saudi stock market (Tadawul). Tadawul is approved to operate in Saudi Arabia as the Securities Exchange (also referred to as the Exchange) and records the daily open, low, high, and close prices and the volume of all stocks traded on the market. The Tadawul data comprise 1300 records for each of the 146 stocks listed between 2011/01/01 and 2016/03/31. Of the 190,000 data points gathered from the database, 130,000 were used to train the model and the remaining 60,000 to validate it. For the proposed model, the closing prices of six companies from different industries were used as input. All six companies showed noteworthy deviation in their outcomes; their data had a marginal error rate and were therefore considered error-free.
4.2 Prediction Procedure
Numerous tests were run to tune the model parameters. The number of training epochs is the first parameter of the LSTM model to be tuned. The second is the batch size, which determines how frequently the network weights are updated. The third is the number of neurons, which determines the learning capacity of the network. The LSTM is trained with the Adam optimiser, which builds on the stochastic gradient descent algorithm and is widely accepted in the deep learning domain. A higher number of neurons typically gives the network a greater capacity to learn the problem structure, at the cost of longer training; however, this higher learning capacity can also lead to overfitting of the training data. The test parameters and the averaged error measures (MAE, MSE, RMSE) for all cases are listed in Table 2.
Table 2. Experiment details
Experiment | Epochs | Batch size | Neurons | MAE | MSE | RMSE
1 | 100 | 8 | 4 | 0.46 | 0.26 | 0.507
2 | 150 | 4 | 4 | 0.28 | 0.09 | 0.301
3 | 150 | 4 | 5 | 2.13 | 5.79 | 2.406
4 | 200 | 4 | 5 | 0.24 | 0.19 | 0.424
5 | 200 | 4 | 6 | 2.29 | 5.86 | 2.421
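The Adam optimiser mentioned in Sect. 4.2 keeps running estimates of the mean and the uncentred variance of each gradient and scales every update by their ratio. The paper does not report its Adam settings, so the following NumPy sketch assumes the defaults proposed by Kingma and Ba (β1 = 0.9, β2 = 0.999, ε = 1e-8), a learning rate of 0.01, and a toy quadratic objective in place of the LSTM loss; it illustrates the update rule, not the authors' training code.

```python
import numpy as np

def adam_minimise(grad, theta0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=3000):
    """Minimise a differentiable function with the Adam update rule."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # first-moment (mean) estimate of the gradient
    v = np.zeros_like(theta)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy objective f(x) = (x - 3)^2, whose gradient is 2(x - 3)
x_min = adam_minimise(lambda x: 2.0 * (x - 3.0), theta0=[0.0])
```

In a Keras model the same rule is applied to every network weight, with the gradient of the training loss taking the place of the toy gradient.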
Table 2 shows that test 4 produces the best results in terms of MAE (0.24), although test 2 attains a lower MSE (0.09) and RMSE (0.301). The chosen sample comprises four companies from the Saudi market and the S&P 500. Additionally, historical stock data were obtained from https://macrotrends.dpdcart.com/cart/deliver?purchase_id=12487241&salt=4526fab69067075ba5560b21f1850513b192ef77. The data were processed with the ARIMA and DLSTM models. The results for the Saudi companies and the S&P 500 are given in Table 3 and Table 4 and depicted in Figs. 3–6 and Fig. 7, respectively. The present study used a sample of six randomly selected companies. The stock closing price was considered the key parameter for prediction because it is associated with the opening price of the following day. For each firm, the Saudi stock market data were split into a training set and a testing set: 1306 entries form the training set, and the remaining entries are used for model testing. The S&P 500 dataset is split in the same way, with the first 1313 records used for training and the rest for testing. Models are built on the training set, and the testing set is used to validate them by predicting the outputs. The forecast moves forward one time step at a time: the model predicts the next time step, and the actual value from the testing set is then fed to the model to help predict the following step. This setting mirrors the real-world scenario in which updated stock market data become available every day and are used to forecast the following day.
Finally, the predictions made on the testing set are collected, and the error is calculated to assess the predictive power of the model.
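The one-step-ahead (walk-forward) evaluation described above can be sketched in plain Python. The `fit` and `predict_next` callables are hypothetical placeholders for the DLSTM or ARIMA model, and the sketch assumes the model is refitted at every step (a trained network could instead simply receive the updated input window). A naive persistence forecaster, which predicts that tomorrow's close equals today's, stands in so that the loop runs without a deep-learning library.

```python
from typing import Callable, List, Sequence

def walk_forward(
    train: Sequence[float],
    test: Sequence[float],
    fit: Callable[[List[float]], object],
    predict_next: Callable[[object, List[float]], float],
) -> List[float]:
    """One-step-ahead forecasts over the test set, revealing each actual value afterwards."""
    history = list(train)
    predictions: List[float] = []
    for actual in test:
        model = fit(history)                      # (re)fit on everything observed so far
        predictions.append(predict_next(model, history))
        history.append(actual)                    # the true value becomes known next day
    return predictions


def persistence_fit(history: List[float]) -> None:
    """Placeholder 'model': persistence needs no fitting."""
    return None


def persistence_predict(model: object, history: List[float]) -> float:
    """Naive forecast: tomorrow's close equals today's close."""
    return history[-1]


preds = walk_forward([1.0, 2.0], [3.0, 4.0, 5.0], persistence_fit, persistence_predict)
```

The collected `preds` can then be compared with the testing values using the error measures of Sect. 4.2.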
The Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) are the metrics used to tune the model: configurations with higher errors are adjusted so that the predictions better match the real data.
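The three error measures can be computed directly from the actual and predicted series. In this minimal NumPy sketch, applied to the seven-day DLSTM predictions for Saudi Arabian Mining Co. in Table 3, the function gives MAE ≈ 0.130, MSE ≈ 0.0222 and RMSE ≈ 0.149, close to the reported 0.131, 0.022 and 0.150; the small differences presumably come from the two-decimal rounding of the published prices.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mae = float(np.mean(np.abs(err)))   # average absolute deviation
    mse = float(np.mean(err ** 2))      # penalises large deviations more heavily
    rmse = float(np.sqrt(mse))          # same units as the prices
    return mae, mse, rmse

# Seven-day actual and DLSTM values for Saudi Arabian Mining Co. (Table 3)
actual = [30.16, 29.86, 29.50, 29.07, 28.67, 28.54, 28.52]
dlstm = [30.16, 30.06, 29.66, 29.28, 28.85, 28.45, 28.45]
mae, mse, rmse = regression_errors(actual, dlstm)
```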
5 Results
The researchers used Python 3.6 on the Windows 10 operating system to validate the model, together with several Python libraries. First, NumPy, whose primary object is the homogeneous multidimensional array, i.e., a table of elements (typically numbers) of the same type. Second, the pandas package, which provides fast, flexible, and expressive data structures designed to make working with "labelled" and "relational" data easy. Third, scikit-learn (sklearn), which provides simple and efficient tools for data analysis. Fourth, the Keras library, which can run on top of Theano, TensorFlow, or the Microsoft Cognitive Toolkit, supports fast experimentation with deep neural networks, and focuses on extensibility, modularity, and user-friendliness. Finally, Matplotlib, the Python plotting library, which works with its numerical mathematics extension NumPy. Table 3, together with Figs. 3–6, presents the sample stocks on which the model was tested over a seven-day horizon. Table 3 gives the actual and predicted stock prices together with the error indicators MAE, MSE, and RMSE. Fig. 7 summarises the accuracy obtained with the proposed DLSTM framework.
Table 3. Prediction results for the next 7 days
Company | Series | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | MAE | MSE | RMSE
Saudi Arabian Mining Co. | Actual | 30.16 | 29.86 | 29.50 | 29.07 | 28.67 | 28.54 | 28.52 | | |
Saudi Arabian Mining Co. | DLSTM | 30.16 | 30.06 | 29.66 | 29.28 | 28.85 | 28.45 | 28.45 | 0.131 | 0.022 | 0.150
Saudi Arabian Mining Co. | ARIMA | 30.24 | 30.09 | 29.65 | 29.32 | 28.83 | 28.45 | 28.52 | 0.137 | 0.025 | 0.159
Yanbu Cement Co. | Actual | 49.15 | 49.14 | 49.11 | 49.07 | 49.02 | 49.00 | 48.98 | | |
Yanbu Cement Co. | DLSTM | 49.12 | 49.22 | 49.14 | 49.08 | 49.05 | 49.00 | 48.99 | 0.017 | 0.000 | 0.021
Yanbu Cement Co. | ARIMA | 49.14 | 49.16 | 49.13 | 49.11 | 49.06 | 49.00 | 48.99 | 0.026 | 0.001 | 0.035
Sabic | Actual | 77.27 | 76.86 | 76.29 | 75.60 | 74.91 | 74.73 | 74.75 | | |
Sabic | DLSTM | 77.29 | 77.34 | 76.56 | 75.90 | 75.12 | 74.44 | 74.58 | 0.205 | 0.059 | 0.243
Sabic | ARIMA | 77.33 | 77.24 | 76.44 | 75.99 | 75.09 | 74.48 | 74.79 | 0.249 | 0.079 | 0.281
Saudi Indian | Actual | 11.01 | 11.76 | 12.70 | 13.76 | 14.81 | 15.16 | 15.21 | | |
Saudi Indian | DLSTM | 10.93 | 11.06 | 12.12 | 13.26 | 14.45 | 15.52 | 15.54 | 0.413 | 0.205 | 0.453
Saudi Indian | ARIMA | 10.91 | 11.04 | 11.94 | 12.94 | 14.04 | 15.09 | 15.31 | 0.475 | 0.338 | 0.581
Fig. 3. Prediction for Saudi Mining Co.
Fig. 4. Prediction for Yanbu Cement Co.
Fig. 5. Prediction for Sabic.
Fig. 6. Prediction for Saudi Indian.
In the final test, the ARIMA and DLSTM models are applied to the S&P 500 dataset. Fig. 7 and Table 4 present the experimental outcomes for the seven-day window.
Table 4. Prediction results for the next 7 days (S&P 500 index)
Index | Series | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | MAE | MSE | RMSE
S&P 500 Index | Actual | 2049.80 | 2036.71 | 2035.94 | 2037.05 | 2055.01 | 2063.95 | 2059.74 | | |
S&P 500 Index | DLSTM | 2050.96 | 2038.52 | 2037.74 | 2038.86 | 2055.85 | 2064.23 | 2060.29 | 1.18 | 1.75 | 1.32
S&P 500 Index | ARIMA | 2051.65 | 2048.93 | 2036.36 | 2035.95 | 2038.19 | 2055.79 | 2065.44 | 6.61 | 76.58 | 8.75
Fig. 7. Summary of predictions for the next 7 days (S&P index)
6 Conclusions and Future Work
Researchers working on stock market price prediction have created two kinds of prediction mechanism. The first forecasts the direction of movement of the stock market and of individual stock prices, while the second predicts future values. The predictions generated by such frameworks help users make better-informed decisions about stock market investments. This study proposes a new hybrid framework based on the combination of LSTM and DWT, which relies on a technical dataset for stock price prediction. The proposed framework integrates the data collected from the stock market using the DWT and LSTM and then performs a simple optimisation process. The integrated technique may also be used to create better techniques that handle risk more effectively and provide better assistance to investors. Researchers pursuing such studies are encouraged to consider additional factors to enhance prediction accuracy. For example, Umrah, the Hajj period, and the month of Ramadan are major events in Saudi Arabia; such factors could be studied for any potential effect on the prediction accuracy of stock price movements in the Saudi stock market.
Abbreviations
LSTM: Long Short Term Memory
DWT: Discrete Wavelet Transform
RNN: Recurrent Neural Networks
DLSTM: Discrete Long Short Term Memory
References
1. Li, R., Fu, D., Zheng, Z.: An analysis of the correlation between internet public opinion and stock market. Paper presented at the 2017 4th International Conference on Information Science and Control Engineering (ICISCE) (2017)
2. Nabipour, M., Nayyeri, P., Jabani, H., Mosavi, A., Salwana, E.: Deep learning for stock market prediction. Entropy 22(8), 840 (2020)
3. Jarrah, M., Salim, N.: A recurrent neural network and a discrete wavelet transform to predict the Saudi stock price trends. Int. J. Adv. Comput. Sci. Appl. 10(4), 155–162 (2019)
4. Batra, R., Daudpota, S.M.: Integrating stocktwits with sentiment analysis for better prediction of stock price movement. Paper presented at the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET) (2018)
5. Bruce, L.M., Koger, C.H., Li, J.: Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction. IEEE Trans. Geosci. Remote Sens. 40(10), 2331–2338 (2002)
6. Chou, J.-S., Nguyen, T.-K.: Forward forecast of stock price using sliding-window metaheuristic-optimized machine-learning regression. IEEE Trans. Industr. Inf. 14(7), 3132–3142 (2018)
7. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 270(2), 654–669 (2018)
8. Li, R., Fu, D., Zheng, Z.: An analysis of the correlation between internet public opinion and stock market. Paper presented at the 2017 4th International Conference on Information Science and Control Engineering (ICISCE) (2017)
9. Minh, D.L., Sadeghi-Niaraki, A., Huy, H.D., Min, K., Moon, H.: Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access 6, 55392–55404 (2018)
10. Mithani, F., Machchhar, S., Jasdanwala, F.: A modified BPN approach for stock market prediction. Paper presented at the 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (2016)
11. Zhu, L.-F., Ke, L.-L., Zhu, X.-Q., Xiang, Y., Wang, Y.-S.: Crack identification of functionally graded beams using continuous wavelet transform. Compos. Struct. 210, 473–485 (2019)
Effective Web Service Classification Using a Hybrid of Ontology Generation and Machine Learning Algorithm
Murtoza Monzur (corresponding author), Radziah Mohamad, and Nor Azizah Saadon
School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia UTM, 81310 Skudai, Johor, Malaysia
Abstract. Efficient and fast service discovery has become an extremely challenging task due to the proliferation and availability of functionally similar web services. Service classification, or service grouping, is a popular and widely applied technique that classifies services into groups according to their similarity in order to simplify and expedite the discovery process. Existing research uses a variety of techniques, approaches and frameworks for web service classification. This study focuses on a hybrid service classification approach that combines ontology generation with machine learning algorithms to gain speed and accuracy during the classification process. Ontology generation is applied to capture the similarity between complicated words. Then, two machine learning classification algorithms, namely Support Vector Machines (SVMs) and Naive Bayes (NB), are applied to classify services according to their functionality. The experimental results showed significant improvement in terms of accuracy, precision and recall. The hybrid approach of ontology generation and the NB algorithm achieved an accuracy of 94.50%, a precision of 93.00% and a recall of 95.00%. Therefore, a hybrid approach of ontology generation and NB has the potential to pave the way for efficient and accurate service classification and discovery.
Keywords: Web service discovery · Web service description language (WSDL) · Service classification · Ontology · Machine learning · Support Vector Machines (SVMs) · Naive Bayes (NB)
1 Introduction
With the expansion of service-oriented architectures, web services have become a prominent technology for providing interoperability between various types of systems. Web services are collections of related application functions and loosely coupled software components that can be distributed and used on the web [1]. They enable applications from various sources to communicate with one another in real time using standard protocols, with minimal human intervention.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 314–323, 2021. https://doi.org/10.1007/978-3-030-70713-2_30
The fundamental advantage of using web services is interoperability, which makes web services increasingly popular compared with related technologies [2]. Web services have brought significant improvement to the distributed computing paradigm and are currently considered the most suitable way to publish and describe business processes. The popularity of web service applications has increased the number of web services on the web. The proliferation of functionally similar web services makes it difficult to identify which context needs to be considered, and how services should be classified during the discovery process, to achieve overall user satisfaction [3]. Overall, web service discovery has become a very significant and challenging task, and classifying web services to identify functionally similar ones has become a major approach to the efficient discovery of suitable services. Web service classification is the process of distributing web services into several classes according to their functions and contexts, such that the similarity between services within one class is high while the similarity between services in different classes is low [4]. Domain experts typically perform this classification manually. With the exponential growth of web services on the internet, however, arranging, classifying, and handling web services manually has become impractical, as it requires intense human effort. Moreover, because of the vast number of categories in web service registries, manual classification is error-prone [5]. Classification accuracy can be improved by using an ontology that captures the similarity between conceptually complicated words, and combining machine learning algorithms with the ontology can be an effective approach to classifying web services. Therefore, this work proposes a web service classification approach based on ontology generation and machine learning algorithms that captures similarity and classifies functionally similar services.
The remainder of this paper is organized as follows. Section 2 reviews previous work on web service classification techniques, approaches and frameworks. Section 3 describes the proposed approach in detail. Section 4 explains the experimental results. Finally, Sect. 5 concludes the paper and discusses potential future work.
2 Related Work
Web service classification has recently gained significant attention owing to the popularity of web services and the potential advantages of automated classification. Existing approaches use several techniques, approaches and frameworks to compute the similarity between web services. These include Specificity Aware Ontology Generation [6]; a search space reduction approach that applies the Modified Negative Selection Algorithm (M-NSA) [4]; the Hybrid Term Similarity (HTS) and Context-Aware Similarity (CAS) methods, which use Support Vector Machines (SVMs) for similarity calculations [1]; and a K-means++ method that extracts the feature vector of the service function description from the WSDL file [7]. Most existing approaches consider functional properties when classifying web services, and most of them prefer a hybrid approach. The difficulty of RESTful web service discovery stems from the absence of WSDL-like documents that provide a standard definition or description of a RESTful web service [4]. The work in [4] focused on improving classification accuracy prior to discovery to achieve lower computation time and more accurate solutions; it did not consider any ontology generation method. The work in [7] used users' QoS records for user classification and features extracted from the WSDL file for web service classification. However, the number of extracted features was relatively low, even though a WSDL file contains a large amount of information about a web service, and considering only a few features can reduce the quality of the classes. The works in [1, 6, 8, 9] considered several ontology generation methods for capturing the similarity between web services by extracting several features from the WSDL file, including domain-related information that can be very useful for accurate classification. The problem, however, is that all of these works used agglomerative algorithms to identify the classes, and agglomerative algorithms have several drawbacks [10]:
• They can never undo a previous step. For example, if the algorithm groups two points and the grouping later turns out to be a poor one, it provides no option to undo that step.
• Their time complexity can lead to long computation times.
• Determining the correct number of classes from the dendrogram is very difficult for large datasets.
A combination of the SLS and SVMs classification algorithms improves accuracy significantly compared with the SVMs classification algorithm alone [11]. SVMs and NB are among the most popular machine learning algorithms for classification, as they provide an embedded feature selection method.
Feature selection is the automatic selection of the most appropriate subset of features for a problem from an original feature set, to be included in a model. The embedded feature selection method works primarily within learning algorithms: feature selection is performed during model creation, without splitting the data into training and testing sets. Combining an ontology generation method with machine learning algorithms such as SVMs or NB can classify web services more effectively, owing to the fast computation and accuracy of the machine learning algorithm and the ability of the ontology generation method to capture the similarity among complex terms; such a combination can achieve superior accuracy. The work in [12] describes an ontology as a set of representational primitives and specifications with existing interrelationships for a particular domain. A web ontology is used to describe complex items on the web and can define rich concepts and knowledge for interpreting information. It comprises hierarchical definitions of the principal concepts in a domain, along with descriptions of the properties of each concept [13]. Depending on the hierarchy, concepts in ontology construction can be modelled as classes or sub-classes. According to Table 1, most of the works consider a hybrid approach for classifying web services, and the ontology generation method is also widely used to capture the hidden semantic meanings of complicated words.
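To make the NB classifier discussed above concrete, the following plain-Python sketch implements a multinomial Naive Bayes with Laplace smoothing over tokenised service descriptions. The four training descriptions and their labels are invented toy examples (the category names echo two of the domains used in this study), and the sketch omits the ontology-generation step and the WEKA pipeline; it is an illustration of the algorithm, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)        # class -> token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)      # log prior
        for tok in tokens:
            if tok not in vocab:
                continue                      # ignore words never seen in training
            # Laplace-smoothed log-likelihood of the token given the class
            score += math.log((word_counts[label][tok] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    "payment gateway credit card transaction".split(),
    "stock quote currency exchange rate".split(),
    "flight booking hotel reservation".split(),
    "travel itinerary airline ticket".split(),
]
labels = ["finance", "finance", "travel", "travel"]
model = train_nb(docs, labels)
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing constant `alpha` keeps a token that is absent from one class from driving that class's probability to zero.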
Table 1. Existing works on service classification.
Work | Algorithm | Result
[4] | K-means, Modified NSA | Accuracy: 90.1%
[6] | Ontology learning, Agglomerative | Precision: 92.62%; Recall: 92.82%; F-measure: 92.63%
[9] | Ontology learning, Agglomerative | Precision: 92.75%; Recall: 94.12%; F-measure: 93.43%
[1] | Ontology learning, Agglomerative | Precision: 89.61%; Recall: 80.23%; F-measure: 84.66%
[8] | Ontology learning, Agglomerative | Average Precision: 90.8%; Average Recall: 91.22%; Average F-Measure: 90.6%
[11] | SVMs and SLS | Accuracy: 84.86%
[14] | Naive Bayes | Accuracy: 90%
3 Proposed Classification Approach
The proposed classification approach consists of a sequence of steps, each of which plays a significant role in the approach. Figure 1 illustrates the workflow of the proposed approach.
Fig. 1. Overview of the proposed approach.
3.1 Dataset Collection
Data collection and pre-processing are among the most critical activities for any machine learning model. Several web service repositories are available on the internet, such as webservicelist.com, woogle.com, seekda.com, and programmableWeb.com. Among these repositories, ProgrammableWeb.com provides all the necessary information about RESTful web services in various categories. A web crawler tool was used to acquire the important features of each web service, such as serviceAPI name, serviceAPIHref, serviceTags, serviceDescription and serviceCategory. In total, details of 12,920 web services were collected from particular domains, including agriculture, entertainment, communication, finance, education, food, healthcare, simulation, travel, security and media.
M. Monzur et al.
3.2 Data Pre-processing
Data pre-processing is also important for better performance. An important pre-processing step is cleaning the data and replacing all null values with accurate or approximate information. The WEKA® machine learning workbench was used for feature selection and pre-processing. Feature selection is the process of removing irrelevant or redundant features without losing important information. Its purpose is to improve the performance of an algorithm by minimizing redundancy and retaining relevant data; it also reduces the required storage space and processing time.
3.3 Designing Context Ontology
The most critical part of ontology generation is identifying the semantically meaningful concepts and the relationships that exist among them. After pre-processing, the TF–IDF values of all tokenized words were computed and the words were ordered by TF–IDF value, with the highest rank assigned to the word with the highest TF–IDF value. A threshold value T is then defined. Similarity scores are calculated using similarity filters referred to as proper equivalent, feature equivalent, feature-&-feature equivalent, joint equivalent, relative equivalent and annex equivalent, together with Eq. (1), Eq. (2) and Table 2 [8]. For a proper equivalent, the similarity score is 1; otherwise it is computed as:

$$\mathrm{Sim}(C_i, C_j) = W_m + W_e \times \mathrm{ESim}(C_i, C_j) \qquad (1)$$

$$\mathrm{ESim}(C_i, C_j) = -\log\left(\frac{d(C_i, C_j)}{2D}\right) \qquad (2)$$
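The term ranking and Eqs. (1) and (2) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation. Several details are assumptions: the TF–IDF variant (length-normalised term frequency times log(N / df), keeping each term's highest score), the natural logarithm in Eq. (2), reading d(Ci, Cj) as a distance between two concepts and D as the ontology depth (d must be positive), and the filter keys in `FILTER_WEIGHTS`, which are hypothetical names paired with the weights read from Table 2.

```python
import math
from collections import Counter

# Hypothetical filter keys mapped to (Wm, We), as read from Table 2 [8].
FILTER_WEIGHTS = {
    "feature": (0.89, 0.11),
    "feature-&-feature": (0.86, 0.14),
    "joint": (0.80, 0.20),
    "relative": (0.75, 0.25),
    "annex": (0.63, 0.37),
}


def rank_by_tfidf(docs):
    """Rank terms by TF-IDF, highest first.

    TF is the term count divided by document length, IDF is log(N / df), and
    each term keeps its highest score over all documents (illustrative choice).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def select_terms(ranked, threshold):
    """Keep the terms whose TF-IDF value reaches the threshold T."""
    return [term for term, score in ranked if score >= threshold]


def esim(d, depth):
    """Eq. (2): ESim(Ci, Cj) = -log(d(Ci, Cj) / 2D), natural log assumed."""
    return -math.log(d / (2 * depth))


def similarity(match_filter, d, depth):
    """Eq. (1): Sim(Ci, Cj) = Wm + We * ESim(Ci, Cj); a proper equivalent scores 1."""
    if match_filter == "proper":
        return 1.0
    wm, we = FILTER_WEIGHTS[match_filter]
    return wm + we * esim(d, depth)


docs = [["weather", "api", "farm"], ["payment", "api", "bank"]]
ranked = rank_by_tfidf(docs)
print(select_terms(ranked, threshold=0.1))  # "api" occurs in every document, so it is dropped
print(similarity("joint", d=2, depth=4))
```

Because a term that appears in every service description gets an IDF of zero, the threshold step naturally discards generic words such as "api" before the similarity filters are applied.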
3.4 A Hybrid Approach of Ontology and SVMs Classifier
The Support Vector Machines (SVMs) classification algorithm offers the kernel trick, which is used during classification to handle nonlinear data. The algorithm is designed so that the hyperplane separating the data points of two classes always maximizes the margin between them. The hyperplane was constructed iteratively, with the primary goal of minimizing the classification error. The SVMs kernel transforms the data points into the required form, for example by mapping low-dimensional input data into a higher-dimensional space (Fig. 2).
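The kernel trick can be illustrated without any SVM library. The sketch below uses a degree-2 polynomial kernel, chosen only for illustration since the paper does not state which kernel was used, and shows that evaluating the kernel on 2-D inputs gives the same value as an explicit mapping into 3-D followed by a dot product.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2, computed in the input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))    # kernel value, no mapping needed
print(dot(phi(x), phi(z)))   # same value (up to rounding) via the explicit 3-D mapping
```

The SVM only ever needs kernel values, so it can separate data in the higher-dimensional space without building the mapped vectors.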
Table 2. Assigned values for Wm and We [8].

| Matching filter | Weight Wm | Weight We |
|---|---|---|
| Feature equivalent | 0.89 | 0.11 |
| Feature-&-feature equivalent | 0.86 | 0.14 |
| Joint equivalent | 0.80 | 0.20 |
| Relative equivalent | 0.75 | 0.25 |
| Annex equivalent | 0.63 | 0.37 |
Fig. 2. The algorithm of a hybrid approach using SVMs classifier.
3.5 A Hybrid Approach of Ontology and NB Classifier
The Naive Bayes (NB) classification algorithm uses Bayes' theorem to classify data into several classes. NB is considered one of the simplest and fastest classification algorithms and is also suitable for large datasets. During classification, each feature is assumed to be independent of the others given the class; this assumption is called class conditional independence. It simplifies the computation, so that the classification process becomes fast, accurate and reliable (Fig. 3).
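The role of class conditional independence can be seen in a small multinomial NB text classifier. This is a sketch under stated assumptions: it uses word-count likelihoods with Laplace smoothing, whereas Section 4 reports that the authors used a Gaussian NB classifier, and the tokens and domain labels below are hypothetical examples, not data from the collected dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_priors, log_likelihoods


def predict_nb(model, tokens):
    """Return the class maximising log P(c) + sum_i log P(token_i | c)."""
    log_priors, log_likelihoods = model
    scores = {}
    for c, prior in log_priors.items():
        # Class conditional independence: per-token log-likelihoods simply add up.
        # Tokens never seen in training are ignored.
        scores[c] = prior + sum(
            log_likelihoods[c][w] for w in tokens if w in log_likelihoods[c]
        )
    return max(scores, key=scores.get)


docs = [
    ["weather", "forecast", "farm"],
    ["crop", "farm", "soil"],
    ["payment", "bank", "card"],
    ["bank", "loan", "interest"],
]
labels = ["agriculture", "agriculture", "finance", "finance"]
model = train_nb(docs, labels)
print(predict_nb(model, ["farm", "soil", "weather"]))
print(predict_nb(model, ["bank", "card", "unknown"]))
```

Working in log space turns the product of per-feature probabilities into a sum, which is why training and prediction stay cheap even for large vocabularies.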
Fig. 3. The algorithm of a hybrid approach using NB classifier.
3.6 Training and Testing
Training and testing procedures are used for evaluation: they are where the accuracy, data quality and required output of the model are assessed. Of the collected data, 80% is used for training and the remaining 20% for testing.
3.7 Evaluation Metrics
Performance measurement is used to assess the usefulness of the proposed approach and to analyze its significance. This work used accuracy, precision and recall as the performance measurement criteria. Precision (P) is defined as the number of true positives (TP) divided by the sum of true positives and false positives (FP) [15]:

$$P(\%) = \frac{TP}{TP + FP} \times 100 \qquad (3)$$

Recall (R) is defined as the number of true positives divided by the sum of true positives and false negatives (FN) [15]:

$$R(\%) = \frac{TP}{TP + FN} \times 100 \qquad (4)$$

Accuracy (A) is defined as the overall proportion of instances that are classified correctly, where TN denotes the true negatives [15]:

$$A(\%) = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \qquad (5)$$
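Eqs. (3) to (5) can be computed directly from predictions. The sketch below assumes a one-vs-rest view, in which one domain is treated as the positive class at a time; the paper does not say how per-class values were averaged, and the labels used here are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN and TN when `positive` is treated as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Eq. (3): P(%) = TP / (TP + FP) x 100."""
    return tp / (tp + fp) * 100


def recall(tp, fn):
    """Eq. (4): R(%) = TP / (TP + FN) x 100."""
    return tp / (tp + fn) * 100


def accuracy(tp, fp, fn, tn):
    """Eq. (5): A(%) = (TP + TN) / (TP + FP + FN + TN) x 100."""
    return (tp + tn) / (tp + fp + fn + tn) * 100


y_true = ["finance", "travel", "finance", "food"]
y_pred = ["finance", "finance", "finance", "food"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="finance")
print(precision(tp, fp), recall(tp, fn), accuracy(tp, fp, fn, tn))
```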
4 Result Analysis
Particular domains were selected for classification because of the high degree of relatedness between them. The features selected for classification were serviceAPI name, serviceAPIHref, serviceTags and serviceDescription, and web services were classified based on these features. Among them, serviceTags and serviceDescription were considered the most important features, as they describe the functionality provided by a service (Fig. 4).
Fig. 4. Feature importance between several features (Service Description, Service Tags, Service Href and Service Name; importance axis from 0% to 70%).
Table 3 presents the results obtained using the proposed hybrid classification approach.

Table 3. The comparison of the proposed approach with similar approaches.

| Algorithm | Accuracy | Precision | Recall |
|---|---|---|---|
| Ontology learning, Agglomerative [6] | Not presented | 92.62% | 92.82% |
| K-means, Modified NSA [4] | 90.1% | Not presented | Not presented |
| Ontology learning, Agglomerative [9] | Not presented | 92.75% | 94.12% |
| Ontology learning, Agglomerative [1] | Not presented | 89.61% | 80.23% |
| Ontology learning, Agglomerative [8] | Not presented | 90.8% | 91.22% |
| SVMs and SLS [11] | 84.86% | Not presented | Not presented |
| Naive Bayes [14] | 90% | Not presented | Not presented |
| Ontology generation, SVMs (proposed) | 90.59% | 89.75% | 91.23% |
| Ontology generation, NB (proposed) | 94.50% | 93.00% | 95.00% |
The results obtained with the proposed hybrid classification approach show higher accuracy than the similar hybrid approaches in prior work. Higher accuracy helps to deal with the proliferation of functionally similar services by classifying them according to their domain and functions. Several factors contribute to this improvement. The service description played a significant role during classification. The proposed hybrid approach used an ontology generation method to capture the similarities between complex terms; consequently, most similar words were reduced to their base form and classified under the same class. Moreover, most of the information and important features were in natural-language form, and the algorithms perform better when these details are converted into a machine-readable format. The Label Encoder function is used inside the algorithm to fit and transform the natural language into a machine-readable format. Although the hybrid of ontology generation and the SVMs classification algorithm provided slightly better accuracy than the other hybrid approaches, its precision and recall values were not up to the mark. This is attributed to the longer computational time: the SVMs classifier took more time to train, and as a result its performance decreased slightly. The hybrid of ontology generation and the NB classification algorithm performed better on all three comparison criteria (accuracy, precision and recall). This improvement is attributed to the use of the Gaussian Naive Bayes classifier, which is suitable for large datasets and provided high accuracy and speed on the large dataset used here. Thus, the NB classifier performed better for text analysis, with low computation time and better accuracy, precision and recall values.
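Label encoding, as mentioned above, replaces each distinct string with an integer code. The sketch below is a stdlib stand-in that mirrors the behaviour of scikit-learn's `LabelEncoder` (classes sorted, codes assigned from 0); the function names and category values are hypothetical. Note that the codes are arbitrary identifiers and carry no notion of meaning or similarity between labels.

```python
def fit_label_encoder(values):
    """Map each distinct string to an integer code, with classes sorted alphabetically."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


def transform(encoder, values):
    """Replace each string with its integer code."""
    return [encoder[v] for v in values]


def inverse_transform(encoder, codes):
    """Recover the original strings from integer codes."""
    decoder = {code: label for label, code in encoder.items()}
    return [decoder[c] for c in codes]


categories = ["travel", "finance", "food", "finance"]
encoder = fit_label_encoder(categories)
codes = transform(encoder, categories)
print(encoder, codes)
print(inverse_transform(encoder, codes))
```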
5 Future Work
This study only considered service classification, not service discovery. However, the proposed hybrid approach can be studied further and extended towards service discovery. More domains can also be considered, rather than only a few. Furthermore, Artificial Neural Networks (ANNs) and Deep Learning (DL) can be applied to the dataset to classify web services; this is expected to improve accuracy and make the approach more robust for use in a framework.
Acknowledgements. We would like to thank the Ministry of Education (MOE) Malaysia for sponsoring the research through the Fundamental Research Grant Scheme (FRGS) with vote number 5F080 and Universiti Teknologi Malaysia for providing the facilities and supporting the research. In addition, we would like to extend our gratitude to the lab members of Software Engineering Research Group (SERG), School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia for their invaluable ideas and support throughout this study.
References
1. Rupasingha, R.A.H.M., Paik, I., Kumara, B.T.G.S.: Calculating web service similarity using ontology learning with machine learning. In: 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Madurai, pp. 1–8 (2015)
2. Sambasivam, G., Amudhavel, J., Vengattaraman, T., Dhavachelvan, P.: An QoS based multifaceted matchmaking framework for web services discovery. Future Comput. Inform. J. 3, 371–383 (2018)
3. Cao, Z., Liu, H., Zhang, X.: An efficient algorithm of context-clustered microservice discovery. In: CASE 2018 Proceedings of the 2nd International Conference on Computer Science and Application Engineering, Hohhot, China (2018)
4. Garba, S., Mohamad, R., Saadon, N.A.: Search space reduction approach for self-adaptive web service discovery in dynamic mobile environment. In: Saeed, F., Mohammed, F., Gazem, N. (eds.) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol. 1073. Springer, Cham (2020)
5. Raj, M., Pragasam, S.: QoS based classification using K-Nearest Neighbor algorithm for effective web service selection. In: IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, pp. 1–4 (2015)
6. Rupasingha, R.A.H.M., Paik, I., Kumara, B.T.G.S.: Specificity-aware ontology generation for improving web service clustering. IEICE Trans. Inf. Syst. E101-D(8), 2035–2043 (2018)
7. Wen, T., Bao, J., Ding, F.: QoS-aware web service recommendation model based on users and services clustering. In: ICITEE 2018 Proceedings of the International Conference on Information Technology and Electrical Engineering 2018, Xiamen, Fujian, China (2018)
8. Kumara, B.T.G.S., Paik, I., Chen, W.: Web-service clustering with a hybrid of ontology learning and information-retrieval-based term similarity. In: 2013 IEEE 20th International Conference on Web Service, Santa Clara, CA, pp. 340–347 (2013)
9. Rupasingha, R.A.H.M., Paik, I., Kumara, B.T.G.S., Siriweera, T.H.A.S.: Domain-aware web service clustering based on ontology generation by text mining. In: 2016 IEEE 7th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, pp. 1–7 (2016)
10. Sasirekha, K., Baby, P.: Agglomerative hierarchical clustering algorithm- a review. Int. J. Sci. Res. Publ. 3(3), 1 (2013)
11. Laachemi, A., Boughaci, D.: A stochastic local search combined with support vector machine for web services classification. In: International Conference on Advanced Aspects of Software Engineering (ICAASE), Constantine, pp. 9–16 (2016)
12. Mohd-Hamka, N., Mohamad, R.: OntoUji–ontology to evaluate domain ontology for semantic web services description. Jurnal Teknologi. 69(6), 21–26 (2014)
13. Mohamad, R., Zeshan, F.: Medical ontology in the dynamic healthcare environment. Procedia Comput. Sci. 10, 340–348 (2012)
14. Liu, J., Tian, Z., Liu, P., Jiang, J., Li, Z.: An approach of semantic web service classification based on Naive Bayes. In: IEEE International Conference on Services Computing (SCC), San Francisco, CA, pp. 356–362 (2016)
15. Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45(4), 427–437 (2009)
Binary Cuckoo Optimisation Algorithm and Information Theory for Filter-Based Feature Selection
Ali Muhammad Usman1,2(B), Umi Kalsom Yusof1, and Syibrah Naim3
1 School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia
2 Department of Computer Sciences, Federal College of Education (Technical), Gombe, Nigeria
3 Technology Department, Endicott College of International Studies (ECIS), Woosong University, Daejeon, Korea
Abstract. Dimensionality reduction is a data mining process used to reduce noise and the complexity of the features. Feature selection (FS) is a typical dimensionality reduction technique used to remove unwanted features from datasets. FS methods can be either filter-based or wrapper-based. Filters do not account for interaction among the selected features, which in turn affects the classification performance of the chosen feature subsets. This study uses two information-theoretic measures, entropy (E) and mutual information (MI), each combined with the binary cuckoo optimisation algorithm (BCOA) to form BCOA-E and BCOA-MI, in order to reduce both the error rate and the computational complexity on four different datasets. A support vector machine classifier was used to measure the error rates. The results favour BCOA-E in terms of accuracy, whereas BCOA-MI is computationally faster than BCOA-E. Comparison with other approaches in the literature shows that the proposed methods performed better in terms of accuracy, number of selected features and execution time.
Keywords: Feature selection · Filter-based · Binary Cuckoo optimisation · Information theory
1 Introduction
In various fields, including health care, online education, bioinformatics and social media, data have become abundant, and the exponential growth of data has made effective data management a significant problem. Data mining and machine learning approaches must therefore be applied to uncover hidden information in these vast data pools [13, 14]. Classification is one of the data mining methods; it assigns each instance to one of a set of groups. A large feature space is one problem that degrades a classifier's efficiency: unless the best features are known beforehand, it is difficult to find the most useful and appropriate features, especially when the number of features is large [23]. Feature selection (FS) was therefore introduced to pick the most essential and appropriate features from these enormous volumes of data. The two main problems of FS are how to search for the best feature subsets and how to evaluate the generated subsets [14, 31]. Most current algorithms cannot properly explore the enormous FS search space without getting stuck in local optima [6, 10, 29]. Evolutionary algorithms (EAs) are now used as search methods for FS problems; nevertheless, several of them still suffer from premature convergence. The Cuckoo Optimization Algorithm (COA), introduced by [22], is one of the EAs mentioned in [13, 14, 26] that have effective search operators: it can locate the most promising areas of the search space and converges more rapidly than many other EAs. How the generated feature subsets are evaluated depends on the FS method. To determine the accuracy or error rate of a chosen subset of features, the wrapper method uses a classification algorithm and selects the subsets with better accuracy. However, this process is computationally expensive, particularly on high-dimensional datasets [20, 25]. Filter methods, alternatively, are computationally fast and scale readily to high-dimensional datasets. One of their significant downsides is that they ignore the dependence or relationship between the selected features [13, 31]. Therefore, this study addresses the issue of feature dependency among the selected subsets of features. Information theory is a practical approach that can measure the relevance between two or more features and their class label for feature ranking; the most frequently used measures are mutual information (MI) and entropy [4]. Researchers now combine MI and entropy with various EAs to assess the relevance and redundancy of the selected features.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 324–338, 2021. https://doi.org/10.1007/978-3-030-70713-2_31
For instance, [3] used both entropy and MI as fitness evaluation measures in Binary Particle Swarm Optimisation (BPSO). In the work of Mlakar et al. [17], MI is used along with Particle Swarm Optimisation (PSO), and PSO is used to enhance crowding features and clustering to obtain the best subset of features. Recently, [9] used differential evolution (DE) for feature ranking with the help of MI, Relief-F and Fisher scores, and the results obtained surpassed the single- and multi-objective approaches compared. All of these previous works show that information theory is successful in addressing FS problems. Thus, this paper proposes the Binary COA (BCOA) developed by [16], an enhanced version of the COA suited to FS, as the search technique, together with MI and information-gain-based entropy as the filter evaluation measures. The remainder of the paper is organised as follows: Sect. 2 describes BCOA, MI and entropy. Sect. 3 presents the proposed filter-based BCOA variants (BCOA-MI and BCOA-E) along with the experimental design. Sect. 4 presents the results and discussion. Finally, Sect. 5 offers conclusions and further research directions.
2 Background
This section describes the components used in this study: the BCOA, MI and information-gain-based entropy.
A. M. Usman et al.
2.1 Binary Cuckoo Optimisation Algorithm
The binary COA (BCOA) was proposed in [16] because the original COA is designed only for continuous optimisation problems; BCOA is therefore better suited to solving FS problems than the original COA. In the COA [22], a new habitat X_NH is computed from the current habitat X_CP and the goal habitat X_G as:

$$X_{NH} = X_{CP} + \mathrm{rand} \times (X_G - X_{CP}) \qquad (1)$$

To make the new habitat X_NH suitable for discrete binary problems, the sigmoid function (Sig) in Eq. 2 is used to map X_NH into the range [0, 1]. Eq. 3 then sets each value in the habitat to 0 or 1, where rand is a random number in [0, 1].

$$\mathrm{Sig} = \frac{1}{1 + e^{-X_{NH}}} \qquad (2)$$

$$X_{NH} = \begin{cases} 1 & \text{if } \mathrm{Sig} > \mathrm{rand} \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
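Eqs. (1) to (3) can be combined into a single binary move, sketched below. Assumptions not stated in the paper: a habitat is a list of 0/1 values where 1 means the feature is selected, and a fresh random number is drawn for every dimension in both Eq. (1) and Eq. (3). The function name `binary_habitat` is hypothetical.

```python
import math
import random


def sigmoid(x):
    """Eq. (2): map a real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_habitat(x_cp, x_g, rng):
    """One binary move of a habitat towards the goal habitat.

    x_cp, x_g: lists of 0/1 values (1 = feature selected, an assumed encoding).
    A fresh random number is drawn per dimension (an assumption).
    """
    new_habitat = []
    for cp, g in zip(x_cp, x_g):
        x_nh = cp + rng.random() * (g - cp)  # Eq. (1): continuous move towards the goal
        new_habitat.append(1 if sigmoid(x_nh) > rng.random() else 0)  # Eq. (3)
    return new_habitat


rng = random.Random(42)
current = [1, 0, 1, 0, 0, 1]
goal = [1, 1, 0, 0, 1, 1]
print(binary_habitat(current, goal, rng))
```

In this sketch X_NH stays within [0, 1] because both habitats are binary, so Sig lies between 0.5 and about 0.73 and each bit is set to 1 with a probability in that range.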
2.2 Information Gain Based Entropy
The information gain entropy is calculated using Eq. 4. A higher entropy means that the values of the variable occur with similar probabilities, whereas a lower entropy means that some values are much more probable than others.

$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i) \qquad (4)$$

where X is the random variable and P(x_i) = Pr{X = x_i}, x_i ∈ X, is the probability mass function of X.
2.3 Mutual Information
Mutual information (MI) measures the relationship or dependence between two random variables and thus provides a means to assess the relevance of a subset of features. The MI between two features X and Y is defined as [9, 27]:

$$I(X; Y) = \sum_{i,j} P(x_i, y_j) \log_2 \frac{P(x_i, y_j)}{P(x_i)\,P(y_j)} \qquad (5)$$

Equation 5 shows that I(X; Y) is large if the two features X and Y are strongly related, and I(X; Y) = 0 if X and Y are independent.
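Eqs. (4) and (5) can be estimated from discrete samples by replacing probabilities with relative frequencies. This is a minimal sketch under that assumption (plug-in frequency estimates, base-2 logarithms); the sample sequences are illustrative only.

```python
import math
from collections import Counter


def entropy(xs):
    """Eq. (4): H(X) = -sum_i P(x_i) log2 P(x_i), with P estimated by frequency."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Eq. (5): sum over (x, y) of P(x, y) log2[P(x, y) / (P(x) P(y))]."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


x = [0, 0, 1, 1]
print(entropy(x))                          # two equally likely values
print(mutual_information(x, x))            # a feature is fully related to itself
print(mutual_information(x, [0, 1, 0, 1]))  # independent features
```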
2.4 Some Related Works
Freeman et al. [8] presented the performance of K-nearest neighbour (KNN) and SVM classifiers using existing filters. Their results showed that MI can evolve better and useful feature subsets for both SVM and KNN. Peng et al. [21] presented the idea of maximum relevance and minimum redundancy (mRMR) based on MI, whose objective is to find a feature subset with reduced redundancy and improved relevance to the class label. Building on it, researchers now use MI to obtain the relationship or dependency between pairs of features. However, because this approach uses sequential search, it can easily get trapped in local optima. Estevez et al. [5] used a genetic algorithm (GA) to remedy this constraint of sequential search. In addition, a normalised MI-based FS method (NMIFS) was proposed because MI is biased towards features with more values. NMIFS improves on the MIFS method of Battiti [2], MIFS-U and mRMR. However, it is also limited to pairs of features, so a non-optimal set of features is likely to be chosen. This has motivated many researchers to use other optimisation algorithms that can search for the optimal subset of features with the best classification performance. For example, Cervante et al. [3] used binary PSO with entropy and MI as evaluation criteria. Their results on four datasets showed that BPSO with MI evolved subsets with fewer features, whereas BPSO with entropy achieved higher classification accuracy with a decision tree (DT). Moghadasian and Hosseini [18] used MI and entropy as evaluation criteria on six high-dimensional datasets, with cuckoo search as the search technique and an artificial neural network to measure classification accuracy. Their experimental results showed that around 90% of the original features were removed while better classification accuracy was achieved than with the full feature set. In the work of Mlakar et al. [17], MI is used along with PSO, and PSO is used to enhance crowding features and clustering to obtain the best subset of features. Recently, Huda et al. [11] used a group-based PSO that updates pbest along with gbest to select relevant features while ignoring redundant ones. Moslehi et al. [19] proposed a hybrid filter-wrapper FS method combining GA and PSO with an artificial neural network on five datasets. Liu et al. [14] presented a survey of optimisation algorithms used for FS and concluded that there is still room to apply algorithms that have not been fully explored in the FS domain. Usman et al. [28] recently presented a comparative analysis of nature-inspired algorithms for feature selection on medical datasets. Their results showed that the binary flower pollination algorithm (BPFA) performed better than the standard flower pollination algorithm in terms of both the number of selected features and classification accuracy; moreover, BPFA outperformed harmony search with rough sets and particle swarm optimisation with quick reduct. Usman et al. [30] also used BCOA for filter-based FS, but only with gain-ratio-based entropy.
Other optimisation algorithms are also becoming popular for FS problems. For example, Mafarja et al. [15] hybridised the Whale Optimisation Algorithm (WOA) with Simulated Annealing (SA) to solve FS problems. Their datasets coincide with those used in this study, so their method is used for comparison even though it is a wrapper-based approach. Similarly, Samy et al. [24] introduced a new binary WOA for FS based on whales' behaviour, using the Optimum-Path Forest technique as the objective function. It was tested on five colour image datasets and found to be much faster than the other classification techniques. Arora et al. [1] presented two binary variants of the Butterfly Optimization Algorithm (BOA) that use two transfer functions to map the continuous search space to a discrete one; experiments on twenty-one datasets showed the superior performance of the proposed binary variants. Jain et al. [12] presented an enhanced binary version of the Gravitational Search Algorithm (GSA), which is based on the law of gravity and the attraction of masses, to address feature selection in medical data. It combines the speed of a random forest classifier with the optimisation behaviour of GSA and recorded a substantial improvement in prediction accuracy. Hichem et al. [10] presented a new binary Grasshopper Optimisation Algorithm for FS, in which binarisation transforms the continuous values of the search space into binary values 0 or 1. Tahir et al. [25] presented a novel Binary Chaotic GA for FS in healthcare. Finally, Fahad et al. [6] introduced a symmetric-uncertainty-based Ant Colony Optimisation Algorithm for streaming FS in high-dimensional medical datasets.
The review of related works shows that optimisation algorithms are becoming more relevant for different kinds of FS problems, where they are used explicitly as search techniques to find the most relevant feature subsets. Meanwhile, information theory plays a vital role in providing filter evaluation measures, specifically in filter-based approaches.
3 Proposed Approaches
In this section, the two filter evaluation measures are combined with BCOA to form BCOA-MI and BCOA-E, as detailed below.
3.1 BCOA Based MI for FS
MI is used to measure the relationship between pairs of features and between each feature and the target class. As such, it measures the relevance and redundancy of pairs of features through the interaction between them. On this basis, BCOA-MI is proposed, with a fitness evaluation measure that contains both relevance and redundancy and guides the BCOA in searching for a feature subset, as given in Eq. 6:

$$F_{mi} = -\beta_1 (Rel_{mi} + Red_{mi}) - Red_{mi} \qquad (6)$$

$$Rel_{mi}(X; C) = \max \sum_{i} I(x_i; c), \qquad Red_{mi}(X; Y) = \min \frac{1}{|m|} \sum_{i,j} I(x_i; y_j)$$
Here c and X represent the target class and the discrete binary feature subset, respectively. Rel_mi uses a pairwise method to calculate the MI between every selected feature and the target class, which determines the relevance of the chosen feature subset to the target class. Red_mi evaluates the MI shared by each pair of selected features, which indicates the redundancy within the selected features. Thus, F_mi in Eq. 6 is a maximisation function, because it maximises the relevance Rel_mi and simultaneously minimises the redundancy Red_mi of the selected features.
3.2 BCOA Based Information Gain Entropy for FS
Unlike F_mi, which considers only two-way relevance and redundancy, feature interactions in FS may involve more than two features, that is, a group of features may interact. Therefore, BCOA-E is proposed to consider a group of features during feature interaction. Its fitness function is defined in Eq. 7:

$$F_E = -\beta_2 (Rel_E + Red_E) - Red_E \qquad (7)$$

$$Rel_E(X; C) = \max IG(c \mid X), \qquad Red_E(X; Y) = \min \frac{1}{|m|} \sum_{i,j} IG(x \mid \{X \setminus x\})$$
Rel_E evaluates the information gain of c given the features in X, which indicates the relevance between the selected feature subset and the target class. Red_E assesses the combined entropy of all the features in X, which indicates the redundancy within the chosen feature subset. Therefore, F_E in Eq. 7 is also a maximisation function that maximises the relevance Rel_E and concurrently minimises the redundancy Red_E among the selected features.
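The relevance and redundancy terms behind Eqs. (6) and (7) can be sketched for a candidate binary mask as below. This is an interpretation, not the authors' code: the BCOA-MI terms follow Eq. (6) with |m| assumed to be the number of selected features and redundancy summed over unordered feature pairs, while the BCOA-E terms follow the prose description (information gain of c given the selected features, and their joint entropy) rather than the printed form of Eq. (7). How β then combines the two terms is left to Eqs. (6) and (7). The helper names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(xs):
    """Shannon entropy (base 2) of a discrete sequence, as in Eq. (4)."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), equivalent to Eq. (5)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def selected_columns(data, mask):
    """Columns of `data` (a list of rows) whose bit in the binary mask is 1."""
    return [[row[j] for row in data] for j, bit in enumerate(mask) if bit]


def rel_red_mi(data, target, mask):
    """Pairwise relevance and redundancy terms of Eq. (6)."""
    cols = selected_columns(data, mask)
    if not cols:
        return 0.0, 0.0
    rel = sum(mutual_information(col, target) for col in cols)
    red = sum(mutual_information(a, b) for a, b in combinations(cols, 2))
    return rel, red / len(cols)  # assumption: |m| = number of selected features


def rel_red_e(data, target, mask):
    """Group-wise terms for BCOA-E, following the prose around Eq. (7)."""
    cols = selected_columns(data, mask)
    if not cols:
        return 0.0, 0.0
    joint = list(zip(*cols))  # the selected values of each row form one joint symbol
    h_c_given_x = entropy(list(zip(joint, target))) - entropy(joint)
    rel = entropy(target) - h_c_given_x  # information gain IG(c | X)
    red = entropy(joint)                 # combined (joint) entropy of the selected features
    return rel, red


data = [[0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
target = [0, 0, 1, 1]
print(rel_red_mi(data, target, [1, 0, 0]), rel_red_mi(data, target, [0, 1, 1]))
print(rel_red_e(data, target, [1, 0, 0]), rel_red_e(data, target, [0, 1, 1]))
```

In the toy data, feature 0 copies the class, while features 1 and 2 are complements of each other and unrelated to the class, so the mask [0, 1, 1] scores zero relevance and positive redundancy under both measures.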
3.3 Relevance and Redundancy Weighted Values in BCOA-MI and BCOA-E
Eq. 6 and Eq. 7 contain the weights β1 and β2, respectively. The β values are used to determine which weighting most improves relevance and consequently reduces redundancy. To this end, the relevance and redundancy are summed and multiplied by the β value, and the redundancy is then subtracted from the result. The reason is that relevance matters more than redundancy for obtaining the optimal result, as reported by Hancer et al. [10]. The weight values used by Cervante et al. [3] are adopted in this study.
3.4 Experimental Design
Table 1 lists the datasets used in this study, which are available from Frank and Asuncion [7]. Four datasets are used in the experiments, with WaveformEW having the highest number of features and instances and Lymphography the lowest. The initial and maximum population sizes of the BCOA are set to twenty and thirty, respectively, and thirty independent runs are performed. SVM is used to measure the classification accuracy. Each dataset is divided into a training set (70%) and a testing set (30%), and ten-fold cross-validation is also used on each dataset.

Table 1. Experimental datasets.

| S/N | Dataset | Features | Instances |
|---|---|---|---|
| 1 | Lymphography | 18 | 148 |
| 2 | SpectEW | 22 | 267 |
| 3 | KrvskpEW | 36 | 3196 |
| 4 | WaveformEW | 40 | 5000 |
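The 70/30 split and ten-fold cross-validation can be sketched with index lists as below. The paper does not say how the two are combined or whether stratification is used, so this sketch assumes cross-validation runs on the training part, uses a seeded unstratified shuffle, and introduces the hypothetical helpers `train_test_split` and `k_fold`.

```python
import random


def train_test_split(indices, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed and split them into training and testing parts."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(indices)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Lymphography has 148 instances (Table 1).
train, test = train_test_split(range(148))
print(len(train), len(test))
for fold_train, fold_val in k_fold(train, k=10):
    pass  # fit the classifier on fold_train, score it on fold_val
```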
4 Discussion of Results
Tables 2, 3, 4 and 5 show the results of the proposed methods. First, the BCOA-MI and BCOA-E results are displayed in Table 2, where "Ave Size", "Ave Acc", "Best Acc", "Time" and "All" represent the average number of selected features, average accuracy, best accuracy, time, and all features, respectively.
4.1 Results of BCOA-MI and BCOA-E
Table 2 shows the results of BCOA-MI and BCOA-E without any weight function. BCOA-MI selected fewer features on average than BCOA-E on all the datasets, removing around 75% of the total features. In contrast, BCOA-E performed better in terms of accuracy. BCOA-MI also required less computational time, because it deals with pairs of features whereas BCOA-E uses groups of features. The results show that both BCOA-MI and BCOA-E substantially reduce the number of features; compared with using all features, accuracy improves on SpectEW and KrvskpEW, is slightly lower on Lymphography, and is lower on WaveformEW, markedly so for BCOA-MI.
Table 2. Experimental results of the proposed algorithms (BCOA-MI) and (BCOA-E).

| Dataset | Approach | Ave-Size | Ave-Acc (Best Acc) | Time |
|---|---|---|---|---|
| Lymphography | All | 18 | 0.875 | – |
| | BCOA-MI | 3 | 0.840 (0.850) | 1.68 |
| | BCOA-E | 4.8 | 0.855 (0.859) | 52.08 |
| SpectEW | All | 22 | 0.851 | – |
| | BCOA-MI | 4 | 0.881 (0.884) | 1.85 |
| | BCOA-E | 4.2 | 0.888 (0.904) | 54.21 |
| KrvskpEW | All | 36 | 0.892 | – |
| | BCOA-MI | 4.2 | 0.920 (0.945) | 56.11 |
| | BCOA-E | 13.9 | 0.980 (0.984) | 1649.60 |
| WaveformEW | All | 40 | 0.771 | – |
| | BCOA-MI | 17.5 | 0.660 (0.660) | 172.62 |
| | BCOA-E | 20.2 | 0.760 (0.760) | 5100.90 |
4.2 Results of BCOA-MI and BCOA-E with β Weighted Values
Table 3 shows that the higher the β1 value in BCOA-MI, the better the accuracy on all datasets. Relevance is therefore more significant than redundancy, which leads to higher accuracy for higher values of β1. On the WaveformEW dataset, however, the difference between the best accuracies for β1 = 0.9 and β1 = 0.8 is small (0.778 and 0.779, respectively). Moreover, the number of selected features is reduced by around 70%. On the other hand, Table 4 shows that the higher the β2 value in BCOA-E, the more features are selected; the number of features is reduced by almost 40% compared with the full feature set. The accuracy also increases with β2 on the majority of the datasets. Comparing BCOA-MI with β1 (Table 3) and BCOA-E with β2 (Table 4) shows that (i) BCOA-MI with β1 is less accurate than BCOA-E with β2, (ii) BCOA-E with β2 selects more features than BCOA-MI with β1, and (iii) BCOA-MI with β1 is computationally less expensive than BCOA-E with β2. Employing the β1 and β2 weights within the filter evaluation measures can significantly reduce the number of features while obtaining appropriate classification accuracy compared with using the full feature set. "Std" in Tables 3 and 4 denotes the standard deviation over the thirty independent runs.
4.3 Average Fitness of BCOA-MI and BCOA-E
Table 5 shows that the proposed BCOA-E converged to a lower average fitness value than BCOA-MI on all four datasets. Although BCOA-MI recorded the higher fitness value, it mostly obtained better results in terms of the number of selected features and computational time than its BCOA-E counterpart, as shown in Tables 2, 3 and 4. The standard deviations in Table 5 are also small (at most 0.002) on all datasets.
A. M. Usman et al.
Table 3. Results of the BCOA-MI with different weights of β1.
Dataset       β1   Ave-size  Ave-Acc (Best Acc)  Std    Time
Lymphography  0.9  7.8       0.860 (0.888)       0.013  1.69
              0.8  5.2       0.840 (0.850)       0.013  1.69
              0.7  4.9       0.834 (0.834)       0.000  1.69
              0.6  4.1       0.800 (0.800)       0.000  1.68
              0.5  3         0.780 (0.799)       0.001  1.68
SpectEW       0.9  9.2       0.888 (0.894)       0.012  1.87
              0.8  7.8       0.871 (0.885)       0.012  1.87
              0.7  5.6       0.844 (0.855)       0.011  1.86
              0.6  4.2       0.833 (0.840)       0.011  1.86
              0.5  4         0.830 (0.830)       0.000  1.85
KrvskpEW      0.9  17.2      0.942 (0.946)       0.001  59.55
              0.8  16.7      0.935 (0.940)       0.002  57.45
              0.7  15.2      0.930 (0.937)       0.002  57.11
              0.6  14.2      0.924 (0.925)       0.001  56.13
              0.5  12.2      0.920 (0.923)       0.001  56.11
WaveformEW    0.9  21.4      0.775 (0.778)       0.001  179.9
              0.8  20.2      0.770 (0.779)       0.004  175.7
              0.7  19.2      0.760 (0.774)       0.003  174.0
              0.6  18.4      0.688 (0.727)       0.000  172.6
              0.5  17.5      0.660 (0.660)       0.000  172.6
4.4 Convergence Trends of BCOA-MI and BCOA-E
Figure 1 shows the convergence of the proposed BCOA-MI and BCOA-E. The name of the dataset appears at the top of each chart, while the fitness and the number of iterations are represented on the x-axis and y-axis, respectively. The curves in each graph show that BCOA-E lies below BCOA-MI, which means that BCOA-E converges to a better fitness than BCOA-MI. This may be due to the interaction among a group of features in BCOA-E. BCOA-MI, on the other hand, has limited feature interaction, since it considers only pairs of features at a time.
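A convergence curve such as those in Fig. 1 is usually drawn from the best fitness found so far at each iteration, and curves over many runs are averaged point by point. The sketch below assumes, as the discussion of Table 5 suggests, that lower fitness is better; the function names and the per-iteration values are made up for illustration and are not results from this chapter.

```python
from itertools import accumulate


def convergence_curve(iteration_best):
    """Best-so-far fitness after each iteration, assuming lower fitness is better."""
    return list(accumulate(iteration_best, min))


def mean_curve(runs):
    """Average the best-so-far curves of several independent runs, iteration by iteration."""
    curves = [convergence_curve(run) for run in runs]
    return [sum(values) / len(values) for values in zip(*curves)]


# Hypothetical per-iteration best fitness values from two short runs (not chapter data).
runs = [
    [0.31, 0.29, 0.30, 0.27, 0.28],
    [0.33, 0.30, 0.28, 0.29, 0.26],
]
print(convergence_curve(runs[0]))  # [0.31, 0.29, 0.29, 0.27, 0.27]
print(mean_curve(runs))
```

Taking the running minimum makes every curve non-increasing, so a curve that lies lower throughout (as BCOA-E does in Fig. 1) has found better solutions at every iteration.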
Table 4. Results of the BCOA-E with different weights of β2.
Dataset       β2   Ave-size  Ave-Acc (Best Acc)  Std    Time
Lymphography  0.9  12.6      0.890 (0.890)       0.000  52.39
              0.8  10.5      0.880 (0.888)       0.000  52.39
              0.7  8.9       0.874 (0.879)       0.001  52.39
              0.6  6.4       0.860 (0.872)       0.001  52.08
              0.5  5.1       0.855 (0.859)       0.001  51.46
SpectEW       0.9  10.2      0.899 (0.914)       0.004  54.79
              0.8  8.7       0.891 (0.895)       0.001  54.79
              0.7  6.8       0.884 (0.889)       0.001  54.5
              0.6  5.2       0.871 (0.880)       0.002  54.5
              0.5  5         0.862 (0.869)       0.001  54.21
KrvskpEW      0.9  19.2      0.972 (0.976)       0.001  1750.8
              0.8  18.4      0.965 (0.980)       0.005  1689
              0.7  16.3      0.950 (0.977)       0.005  1679
              0.6  15.4      0.944 (0.945)       0.001  1650.2
              0.5  13.9      0.929 (0.933)       0.001  1649.6
WaveformEW    0.9  26.5      0.822 (0.888)       0.004  5315.2
              0.8  24.3      0.790 (0.819)       0.003  5190.5
              0.7  21.2      0.770 (0.785)       0.003  5140.2
              0.6  20.1      0.768 (0.769)       0.001  5100.9
              0.5  19.2      0.760 (0.760)       0.000  5100.9
Table 5. Average fitness for BCOA-MI and BCOA-E.
Dataset       BCOA-MI Fitness  BCOA-MI StdDev  BCOA-E Fitness  BCOA-E StdDev
Lymphography  0.158            0.001           0.131           0.001
SpectEW       0.179            0.001           0.137           0.002
KrvskpEW      0.064            0.000           0.055           0.002
WaveformEW    0.279            0.000           0.274           0.000
Fig. 1. Convergence trends of BCOA-MI and BCOA-E.
4.5 Comparison with Other Existing Approaches
The results obtained are compared with existing work on the same datasets, namely BPSO-MI and BPSO-E by Cervante et al. [3] and WOA-SA by Mafarja and Mirjalili [16]. The detailed comparison is given in Table 6, in terms of the number of selected features, the classification accuracy, and the time taken to finish execution on each dataset over the thirty independent runs. In all aspects, the proposed methods performed better than BPSO-MI and BPSO-E. In terms of computational time, our approaches performed better than WOA-SA except on the Lymphography dataset, where WOA-SA recorded the least time. In terms of accuracy, WOA-SA achieved the best accuracy on two of the datasets, while the proposed methods achieved the best accuracy on the remaining two. The proposed methods selected the fewest features on all datasets compared with the other approaches. Overall, the proposed methods performed better than the existing works in terms of the number of selected features and computational time, while remaining competitive in classification accuracy.
Table 6. Comparison of the proposed algorithms with other existing approaches. Dataset
Approach
Ave-Size
Ave-Acc (Best Acc)
Lymphography
All
18
0.875
BCOA-MI
3
0.780 (0.799)
0.001
1.66
BCOA-E
5.1
0.855 (0.859)
0.001
52.08
All
18
0.755
BPSO-MI
3
0.711 (0.711)
0.000
3.89
BPSO-E
6.3
0.740 (0.778)
0.017
61.45
WOA-SA
7.2
0.890
All
22
0.851
BCOA-MI
4
0.830 (0.830)
0.000
1.85
BCOA-E
4.2
0.862 (0.869)
0.001
54.21 2.13
SpectEW
KrvskpEW
WaveformEW
Std-Acc
Time
1.66
All
22
0.809
BPSO-MI
3.1
0.783 (0.794)
0.002
BPSO-E
4.5
0.812 (0.828)
0.010
WOA-SA
6
0.880
All
36
0.892
BCOA-MI
4.2
0.920 (0.945)
0.001
56.11
BCOA-E
13.9
0.980 (0.984)
0.001
649.60
All
36
0.985
BPSO-MI
4.7
0.797 (0.902)
0.027
76.23
BPSO-E
15.7
0.970 (0.977)
0.011
203.67
WOA-SA
12.8
0.980
641.0
641.01
All
40
0.771
BCOA-MI
17.5
0.660 (0.660)
0.000
172.62
BCOA-E
20.2
0.760 (0.760)
0.000
5100.90
62.89 313.38
All
40
0.696
BPSO-MI
19.4
0.620 (0.649)
0.011
1497.9
BPSO-E
20.9
0.688 (0.698)
0.002
6102.76
WOA-SA
20.6
0.770
1770.48
5 Conclusions and Future Works
The aim of this paper has been achieved by developing two filter-based evaluation measures based on entropy and MI, together with BCOA. The results demonstrated that BCOA-MI is capable of evaluating the relevance and redundancy of pairs of features, whereas BCOA-E shows its advantage in assessing both relevance and redundancy when dealing with a group of features. In both cases, weight values are employed, and higher weights lead to a larger number of selected features and higher accuracy. BCOA-MI recorded lower accuracy than BCOA-E, possibly because BCOA-E captures the interaction among a group of features. On the other hand, BCOA-E is computationally more expensive than BCOA-MI, which considers only pairs of features and is therefore faster. Apart from applying newer optimisation algorithms to similar problems for competitive results, future work will investigate the use of the non-dominated sorting mechanism together with BCOA to resolve the conflicting objectives in FS instead of using weight values.
Acknowledgement. This work is the result of a research project funded by Universiti Sains Malaysia via the Research University Grant (RUI) (1001/PKOMP/8014084), together with Woosong University, Korea.
List of Acronyms
FS        Feature Selection
MI        Mutual Information
COA       Cuckoo Optimisation Algorithm
BCOA      Binary Cuckoo Optimisation Algorithm
BCOA-MI   Binary Cuckoo Optimisation Algorithm Mutual Information
BCOA-E    Binary Cuckoo Optimisation Algorithm Entropy
WOA-SA    Whale Optimisation Algorithm Simulated Annealing
BPSO-MI   Binary Particle Swarm Optimisation Mutual Information
BPSO-E    Binary Particle Swarm Optimisation Entropy
References
1. Arora, S., Anand, P.: Binary butterfly optimisation approaches for feature selection. Expert Syst. Appl. 116, 147–160 (2019)
2. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5(4), 537–550 (1994)
3. Cervante, L., Xue, B., Zhang, M., Shang, L.: Binary particle swarm optimisation for feature selection: a filter based approach. In: 2012 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2012)
4. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(3), 131–156 (1997)
5. Estevez, P.A., Tesmer, M., Perez, C.A., Zurada, J.M.: Normalised mutual information feature selection. IEEE Trans. Neural Netw. 20(2), 189–201 (2009)
6. Fahad, L.G., Tahir, S.F., Shahzad, W., Hassan, M., Alquhayz, H., Hassan, R.: Ant colony optimisation-based streaming feature selection: an application to the medical image diagnosis. Sci. Program. 2020 (2020)
7. Frank, A., Asuncion, A.: UCI Machine Learning Repository, vol. 213, p. 2. School of Information and Computer Science, University of California, Irvine, CA (2010). https://archive.ics.uci.edu/ml
8. Freeman, C., Kulić, D., Basir, O.: An evaluation of classifier-specific filter measure performance for feature selection. Pattern Recogn. 48(5), 1812–1826 (2015)
9. Hancer, E., Xue, B., Zhang, M.: Differential evolution for filter feature selection based on information theory and feature ranking. Knowl. Based Syst. 140, 103–119 (2018)
10. Hancer, E., Xue, B., Zhang, M., Karaboga, D., Akay, B.: Pareto front feature selection based on artificial bee colony optimisation. Inf. Sci. 422, 462–479 (2018)
11. Hichem, H., Elkamel, M., Rafik, M., Mesaaoud, M.T., Ouahiba, C.: A new binary grasshopper optimisation algorithm for feature selection problem. J. King Saud Univ. Comput. Inf. Sci. (2019)
12. Huda, R.K., Banka, H.: A group evaluation based binary PSO algorithm for feature selection in high dimensional data. Evol. Intell., 15 (2020)
13. Jain, R., Sawhney, R., Mathur, P.: Feature selection for cryotherapy and immunotherapy treatment methods based on gravitational search algorithm. In: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), pp. 1–7. IEEE (2018)
14. Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 94 (2017)
15. Liu, W., Wang, J.: A brief survey on nature-inspired metaheuristics for feature selection in classification in this decade. In: 2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC), pp. 424–429. IEEE (2019)
16. Mafarja, M.M., Mirjalili, S.: Hybrid whale optimisation algorithm with simulated annealing for feature selection. Neurocomputing 260, 302–312 (2017)
17. Mahmoudi, S., Rajabioun, R., Lotfi, S.: Binary cuckoo optimisation algorithm. Nature, pp. 1–7 (2013)
18. Mlakar, U., Fister, I., Brest, J.: Hybrid multi-objective PSO for filter-based feature selection. In: 23rd International Conference on Soft Computing, pp. 113–123. Springer, Cham (2017)
19. Moghadasian, M., Hosseini, S.P.: Binary cuckoo optimisation algorithm for feature selection in high-dimensional datasets. In: International Conference on Innovative Engineering Technologies, ICIET2014, pp. 18–21 (2014)
20. Moslehi, F., Haeri, A.: A novel hybrid wrapper–filter approach based on genetic algorithm, particle swarm optimisation for feature subset selection. J. Ambient Intell. Humanized Comput. 11(3), 1105–1127 (2020)
21. Nogueira, S., Sechidis, K., Brown, G.: On the stability of feature selection algorithms. J. Mach. Learn. Res. 18(1), 6345–6398 (2017)
22. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
23. Rajabioun, R.: Cuckoo optimisation algorithm. Appl. Soft Comput. 11(8), 5508–5518 (2011)
24. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education Limited, Malaysia (2016)
25. Samy, A., Hosny, K.M., Zaied, A.N.H.: An efficient binary whale optimisation algorithm with optimum path forest for feature selection. Int. J. Comput. Appl. Technol. 63(1–2), 41–54 (2020)
26. Tahir, M., Tubaishat, A., Al-Obeidat, F., Shah, B., Halim, Z., Waqas, M.: A novel binary chaotic genetic algorithm for feature selection and its utility in affective computing and healthcare. Neural Comput. Appl., 1–22 (2020)
27. Tavana, M., Shahdi-Pashaki, S., Teymourian, E., Santos-Arteaga, F.J., Komaki, M.: A discrete cuckoo optimisation algorithm for consolidation in cloud computing. Comput. Ind. Eng. 115, 495–511 (2018)
28. Tsanas, A., Little, M.A., McSharry, P.E.: A simple filter benchmark for feature selection. J. Mach. Learn. Res. 1, 1–24 (2010)
29. Usman, A.M., Abdullah, A.U., Adamu, A., Ahmed, M.M.: Comparative evaluation of nature-based optimisation algorithms for feature selection on some medical datasets. i-manager’s J. Image Process. 5(4), 9 (2018)
30. Usman, A.M., Yusof, U.K., Naim, S.: Cuckoo inspired algorithms for feature selection in heart disease prediction. Int. J. Adv. Intell. Inf. 4(2), 95–106 (2018)
31. Usman, A.M., Yusof, U.K., Naim, S.: Filter-based multi-objective feature selection using NSGA III and Cuckoo optimisation algorithm. IEEE Access 8, 76333–76356 (2020)
32. Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)
Optimized Text Classification Using Correlated Based Improved Genetic Algorithm
Thabit Sabbah
Al Quds Open University (QOU), Ramallah, Palestine
Abstract. Text Classification (TC) is one of the basic processes in many Information Retrieval systems. However, TC performance is still open to improvement, and many approaches have been proposed to achieve this aim. This work proposes an Improved Genetic Algorithm (IGA) inspired by genetic engineering to enhance TC performance. In IGA, the chromosome generation process is redesigned to diminish the effect of correlated genes. The Support Vector Machine (SVM) classifier was applied to the popular “Sport Text” dataset to evaluate the proposed approach. Empirically, IGA improved the classification results compared with standard GA optimization, raising the TC correct rate by 1.39% on average.
Keywords: Text classification · Improved Genetic Algorithm · Genetic engineering · Feature correlation
1 Introduction
1.1 Text Classification
Text Classification and Text Categorization are common terms in the field of text analysis; the two differ slightly in their output, while the process is almost the same. Text Classification places each portion of text, whether a tweet, a paragraph, or an entire document, into a predefined class, also known as a label [1]. In Text Categorization, by contrast, the classes are usually not predefined, and the classifier is responsible for determining how many categories to generate and what they are, as well as how the text portions are distributed among them [2]. In Text Classification, an initial preprocessing stage is usually applied, and the documents are then represented numerically in the Vector Space Model, on which classification algorithms such as the Support Vector Machine can complete the task. Many different methods can be applied to represent text numerically; most of them assign a numerical value (known as a weight) to each term in the text [3].
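To make the idea of term weights concrete, the sketch below builds a small Vector Space Model with TF-IDF weighting. TF-IDF is only one widely used scheme; the chapter does not say which weighting produced the features of its dataset, so this is an illustration with hypothetical names and toy documents, not the chapter's representation.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Represent tokenized documents as TF-IDF weighted vectors over a shared vocabulary."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vocabulary, vectors


docs = [["goal", "match", "team"], ["team", "win", "win"], ["match", "report"]]
vocab, vecs = tfidf_vectors(docs)
print(vocab)
for vec in vecs:
    print([round(w, 3) for w in vec])
```

Each document becomes one row of the same length as the vocabulary; terms that appear in many documents receive lower weights, and terms absent from a document receive zero.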
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 339–350, 2021. https://doi.org/10.1007/978-3-030-70713-2_32
T. Sabbah
The performance of TC is an open research problem: although many approaches have been proposed over the last few decades, no optimal solution has been found. This research deals with the Text Classification problem in which the output classes are predefined. Although the aim is to enhance TC performance, the major contribution of this work is the proposal and use of an Improved correlation-based Genetic Algorithm (IGA); hence, the classical text classification approach is applied. Applying the proposed IGA to text classification requires fitting the classification process into the general approach of genetic optimization, which is based on the natural selection of effective features that lead to higher classification performance, as discussed in the following subsection.
1.2 Genetic Optimization
The Genetic Algorithm (GA) is an optimization method introduced by [4]. GA is inspired by the process of natural selection and motivated by the saying “survival of the fittest”. Over the last few decades, GA has been employed to solve optimization problems [5] in numerous applications and research domains such as classification, clustering, and spam detection [6–8]. GA imitates natural evolution by guiding a random search through the solution space to find the optimal solution of the optimization problem. The search is guided by maintaining and merging the “effective” factors of good solutions to yield improved solutions. GA commonly consists of several stages, each comprising several steps; the general GA stages and steps depicted in Fig. 1 are briefly described below.
Fig. 1. General stages of genetic algorithm.
Initialization
The GA optimization process starts with the random generation of a population, where a population consists of individuals, also known as chromosomes, that represent a group of possible solutions to the problem. In GA, a solution can be represented in various ways, such as a binary string or numerical values [9]. This work applies the binary representation, a string of zeros and ones, in which a binary one indicates that
Optimized Text Classification Using Correlated Based IGA
the corresponding feature is involved in the solution. The binary values (zeros and ones) that form the chromosome are usually referred to as genes.
Fitness Evaluation and Ranking
In this stage, the generated population is assessed by calculating the fitness of each individual in the current population. The fitness value of an individual is determined by the fitness function, which measures the quality of the solution (i.e. how close the solution is to the best solution of the problem); the individuals are then ranked by their fitness values. In a classification problem, the accuracy (correct rate) or other performance measures can act as the fitness function; this work uses the classification correct rate as the fitness value. After ranking, the algorithm terminates if the termination condition is satisfied; otherwise, it iterates, generating successive populations through the Selection and Sequent Population Generation stages, the latter comprising elitism, crossover, and mutation.
Sequent Generation Production
If a new population (i.e. the next generation) is to be generated after the Selection process, GA usually divides the individuals into three major parts: first, the individuals that survive into the new generation as they are (i.e. without modification); second, the individuals that take part in breeding to generate new individuals; and third, the individuals that will be mutated. The first part is determined by “elitism”, the elitist selection strategy [10], in which the individuals with the highest fitness values are selected to survive into the next generation, guaranteeing that the new generation is at least as effective as the current one.
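The initialization, ranking and elitism steps just described can be sketched for the binary representation used in this work. The fitness function is passed in as a parameter; the toy fitness in the usage lines (fraction of selected features) is only a placeholder for the classification correct rate, and all names here are hypothetical.

```python
import random


def init_population(pop_size, n_genes, rng):
    """Random binary chromosomes; gene i == 1 means feature i is selected."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


def rank_population(population, fitness):
    """Return (fitness, chromosome) pairs sorted from best to worst (higher fitness is better)."""
    scored = [(fitness(ch), ch) for ch in population]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on fitness only, never on genes
    return scored


def elite(ranked, n_elite):
    """Copy the n_elite best chromosomes unchanged into the next generation."""
    return [list(ch) for _, ch in ranked[:n_elite]]


rng = random.Random(42)
population = init_population(pop_size=6, n_genes=8, rng=rng)
# Placeholder fitness for illustration only; the chapter uses the SVM correct rate.
ranked = rank_population(population, fitness=lambda ch: sum(ch) / len(ch))
next_generation = elite(ranked, n_elite=2)
print(next_generation)
```

Copying the elite chromosomes (rather than reusing the same list objects) keeps later crossover or mutation steps from silently altering the survivors.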
The remaining individuals of the current population form the second part and breed through the crossover process, while a random selection process determines the third part, whose individuals are mutated. In the crossover procedure, two individuals (known as parents) are chosen to produce the offspring. Frequently used techniques for parent selection include roulette wheel, tournament, and stochastic uniform selection [11, 12]. Moreover, several common techniques can be used to perform the crossover operation [12, 13], such as the single-point, two-point, and scattered methods. Figure 2 illustrates these methods.
Fig. 2. Popular crossover and mutation methods.
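The parent-selection and crossover operators described above (Fig. 2, crossover column) can be written compactly for binary chromosomes. This is an illustrative sketch with hypothetical function names: roulette-wheel selection here is plain fitness-proportionate selection, without the selection-pressure parameter listed later in the experimental settings.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate (roulette-wheel) selection; fitnesses must be non-negative."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def single_point(p1, p2, point):
    """Genes before `point` from p1, genes from `point` onwards from p2."""
    return p1[:point] + p2[point:]


def two_point(p1, p2, a, b):
    """Genes between a and b (a < b) from p2, the rest from p1."""
    return p1[:a] + p2[a:b] + p1[b:]


def scattered(p1, p2, mask):
    """Scattered (uniform) crossover: p1's gene where the mask is 1, p2's gene where it is 0."""
    return [g1 if m == 1 else g2 for g1, g2, m in zip(p1, p2, mask)]


rng = random.Random(7)
p1, p2 = [1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0]
point = rng.randrange(1, len(p1))                 # crossover point strictly inside the chromosome
a, b = sorted(rng.sample(range(1, len(p1)), 2))   # two distinct crossover points
mask = [rng.randint(0, 1) for _ in p1]            # random binary vector for scattered crossover
print(single_point(p1, p2, point), two_point(p1, p2, a, b), scattered(p1, p2, mask))
```

Using all-ones and all-zeros parents makes it easy to see which positions each operator takes from which parent.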
In the single-point and two-point methods (Fig. 2(c-a) and (c-b), respectively), crossover points (or lines) are placed on the parental chromosomes. In the single-point method, the genes located before the crossover point in the first parent are concatenated with the genes located after the point in the second parent to form the new offspring. In the two-point method, two random crossover points are placed on the parental chromosomes; the genes located between the two points are taken from the second parent, and the genes located before the first point and after the second point are taken from the first parent (and vice versa) to produce the new offspring. In the scattered crossover (Fig. 2(c-c)), also known as uniform crossover, a random binary vector of the same length as the parental chromosomes is generated. The genes of the first parent at the positions of the ones in this vector, and the genes of the second parent at the positions of the zeros, are then selected to produce the new offspring.
Mutation is another method for producing new chromosomes, in which new individuals are produced by changing the genes of current individuals. Mutation aims to provide genetic diversity in the sequent population and enables GA to explore a wider search space, with the expectation of increasing the chance of producing new individuals with the best fitness and of preventing GA from getting trapped in a local optimum. Various procedures are frequently used to perform mutation, such as the Gaussian, flip bit, and interchange methods shown in Fig. 2 (mutation methods column). Generally, the parents' genes are inverted (binary 1 turned into 0, or 0 into 1) to yield the new offspring. In the interchange mutation, shown in Fig. 2(m-a), a single gene of the chosen individual is selected at random and its value is inverted to produce the new offspring. Similarly, the flip bit mutation in Fig. 2(m-b) selects multiple genes to invert based on a randomly generated mutation vector of the same length as the parent chromosome: the genes of the chosen individual at the positions of the 1s in the mutation vector are inverted to produce the new offspring. In the Gaussian mutation in Fig. 2(m-c), by contrast, a random number drawn from a Gaussian distribution is added to the values of all genes of the selected individual to produce the new offspring. Note that Gaussian mutation can be used with numerical chromosome representations, while the flip bit and interchange mutations are used with binary representations. After the elitism, crossover, and mutation operations, the new chromosomes are gathered to form the new population (sequent generation), and the fitness evaluation and ranking process (described under Fitness Evaluation and Ranking) is performed before checking for termination. At termination, the top-ranked individual of the last population is taken as the optimized solution. The proposed IGA adds a new mutation method, applied to all newly generated chromosomes during the sequent population generation stage, in which genes are inverted according to their correlation, so that they are injected into or ejected from the chromosome to diminish the effect of correlated features appearing in a chromosome.
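The three mutation operators of Fig. 2 can be sketched as follows, using the text's own definitions: "interchange" inverts one randomly chosen gene (elsewhere in the GA literature "interchange" often means swapping two genes instead), flip bit inverts every gene marked by a random mutation vector, and Gaussian mutation adds a single Gaussian draw to all genes of a numeric chromosome. The standard deviation `sigma`, the per-position `rate`, and all function names are assumptions for illustration.

```python
import random


def interchange_mutation(chromosome, rng):
    """As defined in the text: invert one randomly chosen gene."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def random_mutation_vector(n_genes, rate, rng):
    """Mark each position for inversion independently with probability `rate`."""
    return [1 if rng.random() < rate else 0 for _ in range(n_genes)]


def flip_bit_mutation(chromosome, mutation_vector):
    """Invert every gene whose position holds a 1 in the mutation vector."""
    return [1 - g if m == 1 else g for g, m in zip(chromosome, mutation_vector)]


def gaussian_mutation(chromosome, rng, sigma=0.1):
    """For numeric chromosomes: add one Gaussian draw to every gene, as the text describes."""
    delta = rng.gauss(0.0, sigma)
    return [g + delta for g in chromosome]


rng = random.Random(3)
parent = [1, 0, 1, 1, 0, 0, 1, 0]
print(interchange_mutation(parent, rng))
print(flip_bit_mutation(parent, random_mutation_vector(len(parent), rate=0.25, rng=rng)))
print(gaussian_mutation([0.2, 0.5, 0.9], rng))
```

All three operators return a new list, so the parent chromosome is left unchanged for any later use in the same generation.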
2 Related Works
Since its proposal in 1975 as a nature-inspired approach, GA has been used for optimization in various domains, and many works have been proposed over the years to enhance it. This work presents an improved GA in which the improvement is made in the step of new offspring generation. Table 1 reviews some of the GA enhancements proposed in the last few years.
Table 1. Review of major GA enhancement proposals
Ref.  Proposed enhancement
[14]  Arithmetic crossover and mutation operators with variable-length chromosomes
[7]   Mutation based on clustering points of extremism to overcome the limitations of the k-means algorithm
[15]  Enhanced memory updating based on environment reaction schemes for constrained knapsack problems in dynamic environments
[16]  Multi-hop path-finding fitness function proposed to extend the network's lifetime
[17]  Multi-coefficient weighting-based elitism
[18]  Solution vector representation based on feature (sentence) index for text summarization
[19]  Decision trees based on random forest generated for each population; the best-fitness tree is selected as the classifier
[6]   Parent chromosome selection for the crossover operation based on cumulative term weights to generate new offspring
[20]  Recovery from uniformity (i.e. when GA fails to produce better fitness values) by a migration test and step
[8]   Mature convergence approach using a Metropolis simulated annealing process after the classical crossover and mutation operations
[21]  Controlling the size of feature subsets in the fitness function
The approaches listed in Table 1 vary in several aspects, such as the application domain, the stage at which the enhancement is proposed, the representation of features, and the methods used in sequent population generation (i.e. elitism, crossover, and mutation). Although some of these approaches are domain-specific, few of them target the TC domain. Within TC, GA has been proposed in many works to play different roles in the TC process. [6] used GA for TC in the feature selection stage, such that the crossover operation was based on term and document frequencies, while the mutation operation was based on the performance of the classifier on the original parents. The idea behind this work is to perform these operations on useful information instead of selecting features at random.
Earlier, GA was used by [22] for informative feature subspace selection, where GA was applied to reduce the dimensionality of the sub-feature space selected by the Information Gain (IG) feature selection method. Similarly to [22], [23–25] employed GA for sub-feature space selection on top of the feature space generated by different traditional feature selection methods such as IG, DF, MI, and CHI. Later, [26] employed GA as a learning technique to improve text categorization through the automatic generation of categorization rules.
3 Proposed Approach
3.1 Proposed IGA
As mentioned earlier, the proposed IGA involves an extra mutation step applied to all newly generated chromosomes, in which genes are inverted according to their correlation so that they are injected into or ejected from the chromosome, diminishing the effect of the presence or absence of correlated features in a chromosome. This approach is motivated by genetic engineering, in which the genes responsible for a particular trait are identified and then manipulated to enhance new generations. In this work, the correlated features (genes) are identified and then either injected into or discharged from the chromosome to enhance performance. The hypothesis behind this work is that correlated features which end up in a chromosome, whether through the random selection of features or through the (also randomized) crossover and mutation processes during chromosome generation, affect the performance of these chromosomes and consequently of the GA. Therefore, this work proposes an extra mutation process, based on the correlation between features (genes), in which highly positively or negatively correlated features in a chromosome are first identified and then manipulated. Figure 3 shows the steps of the proposed Improved Genetic Algorithm (IGA).
Fig. 3. Improved Genetic Algorithm (IGA).
Figure 3 shows the extra step, named “Correlation Based Mutation (CBM)”, located before the “Evaluation” and “Selection” operations. CBM requires the “Correlation Matrix” of the feature space, which is calculated only once by the “Correlation Matrix Calculation” process. CBM performs a controlled mutation: if two highly correlated variables (features) both appear in a chromosome, one of them is ejected (removed); if neither of the two highly correlated variables is included in the chromosome, one of them is injected (planted) into it. Ejection or injection is achieved by inverting the binary value that represents the feature. The example in Fig. 4 illustrates CBM.
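A minimal sketch of CBM as described above is given below. It assumes Pearson correlation for the "Correlation Matrix Calculation" step and, as in the experiment of Sect. 4, treats only the most positively and the most negatively correlated pairs. The chapter does not state which gene of a pair is flipped, so the random choice here is an assumption, as are all function names.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def extreme_pairs(columns):
    """Computed once: the feature pairs with the highest positive and highest negative correlation."""
    corr = {(i, j): pearson(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}
    max_pos = max(corr, key=corr.get)
    max_neg = min(corr, key=corr.get)
    return [max_pos, max_neg]


def correlation_based_mutation(chromosome, pairs, rng):
    """Eject one gene if both features of a pair are present; inject one if both are absent."""
    child = list(chromosome)
    for i, j in pairs:
        if child[i] == child[j]:  # both 1 (eject one) or both 0 (inject one)
            k = rng.choice((i, j))
            child[k] = 1 - child[k]
    return child


columns = [
    [1, 2, 3, 4],  # f0
    [1, 2, 3, 5],  # f1: strongly positively correlated with f0
    [1, 0, 1, 0],  # f2
    [0, 1, 0, 1],  # f3: perfectly negatively correlated with f2
]
pairs = extreme_pairs(columns)
print(pairs)  # [(0, 1), (2, 3)]
print(correlation_based_mutation([1, 1, 0, 0], pairs, random.Random(0)))
```

After the mutation exactly one feature of each treated pair is present, mirroring the Before/After chromosomes of Fig. 4; a chromosome that already contains exactly one feature of each pair is left unchanged.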
Before CBM: V1 = 1, V2 = 1, V3 = 0, V4 = 0, ..., Vn = 0
After CBM:  V1 = 1, V2 = 0, V3 = 0, V4 = 0, ..., Vn = 1
(V1, V2): variables of highest positive correlation; (V4, Vn): variables of highest negative correlation
Fig. 4. CBM operation illustration
Consider the calculated correlation matrix between feature space variables as in Fig. 4(a), in which the correlation value between V1 and V2 equals 0.5, while the correlation between V4 and Vn equals to −1. Now for any generated chromosome, for example the (Before CBM) chromosome in Fig. 4(b) if it includes both V1 and V2 , then one of them will be ejected as a result of CBM, while if both are absent from the chromosome, one of them will be injected. Moreover, if the chromosome includes V4 and Vn , then one of them will be ejected, while if both are absent from the chromosome, one of them will be injected. As showed in Fig. 4(b), the binary representation of (Before CBM) chromosome shows that V1 and V2 are included in this generated chromosome, however, these two features have the highest positive correlation. As a result, the CBM operation will eject one of these genes (features) by turning one of their binary representation 1 into 0 as showed in (After CBM) chromosome. Moreover, the example shows that none of V4 and Vn is included in the (Before CBM) chromosome; therefore, CBM injected one of these features into the chromosome by turning its binary representation 0 into 1. The shaded cells in the (After CBM) chromosome are the inverted binary values. 3.2 IGA Based Text Classification The employment of IGA for text classification follows the general TC classification approach, in which the Vector Space Model [27] of the dataset is split into two parts (i.e. training and testing), the IGA is applied as the feature selection method during the training phase where the classification correct rate is the fitness value. At the end
T. Sabbah
of the training phase, the features included in the chromosome that achieved the highest fitness value are used in the testing phase.
4 Experimental Environment
To test the performance of the proposed approach, the “Sports articles” dataset [28] is employed. Table 2 shows the specifications of this dataset, which was downloaded from the Machine Learning Repository known as UCI¹.
Table 2. Dataset specifications
Specification | Value
Dataset characteristics | Multivariate, text
Number of labels | 2 (subjective, objective)
Number of instances | 1000 (365 subjective, 635 objective)
Attribute characteristics | Integer
Number of attributes | 57
Based on the Yarpiz² “Implementation of Binary GA in MATLAB”, the proposed IGA was implemented and run in MATLAB R2016a under 64-bit Windows 10 on a laptop computer with a Core i7 2.1 GHz processor and 16 GB of RAM. The Support Vector Machine (SVM) classifier was employed with its default configuration, and performance was measured by the classification correct rate. The experiment was conducted 10 times, and the average of the best scores in each iteration was recorded. Finally, the results were benchmarked against the performance of the basic GA under the same experimental parameters. Table 3 shows the IGA experimental parameters.
Table 3. Experimental parameters
Parameter | Value
Population size | 20
Number of generations (iterations) | 50
Crossover percentage | 0.8
Selection | Roulette wheel (selection pressure = 8)
Mutation rate (mu) | 0.02
¹ https://archive.ics.uci.edu/ml/index.php.
² www.yarpiz.com.
Optimized Text Classification Using Correlated Based IGA
In this experiment, only the most strongly correlated pairs were treated: the two variables with the maximum positive correlation and the two variables with the maximum negative correlation, so that only one variable of each pair is enforced to appear in each generated chromosome.
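To show where CBM sits in the GA cycle, the following Python sketch runs a binary GA with the Table 3 settings (population 20, 50 iterations, crossover percentage 0.8, roulette-wheel selection with pressure 8, mutation rate 0.02) and applies CBM to every chromosome before it is evaluated, including the initial population. Several details are assumptions rather than the author's code: single-point crossover, bit-flip mutation, keeping the best `npop` of parents plus offspring, and roulette weights of the form exp(−β·cost/worst cost) with cost = 1 − correct rate. The SVM correct rate is replaced by a caller-supplied `fitness` function, and the toy fitness in the usage lines is hypothetical.

```python
import math
import random


def cbm(ch, pairs, rng):
    """Correlation-based mutation: leave exactly one gene of each (disjoint) pair."""
    ch = list(ch)
    for i, j in pairs:
        if ch[i] == ch[j]:          # both present or both absent
            k = rng.choice((i, j))  # assumption: a random member is flipped
            ch[k] = 1 - ch[k]
    return ch


def roulette_select(chroms, costs, beta, rng):
    """Roulette wheel; assumed weights exp(-beta * cost / worst_cost)."""
    worst = max(costs)
    if worst == 0:
        weights = [1.0] * len(chroms)
    else:
        weights = [math.exp(-beta * c / worst) for c in costs]
    return rng.choices(chroms, weights=weights, k=1)[0]


def single_point_crossover(a, b, rng):
    cut = rng.randint(1, len(a) - 1)  # assumption: single-point crossover
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip(ch, mu, rng):
    """Flip each gene independently with probability mu."""
    return [1 - g if rng.random() < mu else g for g in ch]


def iga(fitness, n_genes, pairs, npop=20, max_it=50, pc=0.8, beta=8, mu=0.02, seed=0):
    """Binary GA with CBM applied before every evaluation.
    fitness(chromosome) -> correct rate in [0, 1]; higher is better."""
    rng = random.Random(seed)

    def repair_and_evaluate(ch):
        ch = cbm(ch, pairs, rng)
        return ch, fitness(ch)

    pop = [repair_and_evaluate([rng.randint(0, 1) for _ in range(n_genes)])
           for _ in range(npop)]
    n_pairs = round(pc * npop / 2)  # crossover pairs per generation
    history = []
    for _ in range(max_it):
        chroms = [c for c, _ in pop]
        costs = [1.0 - f for _, f in pop]  # error rate used as cost
        offspring = []
        for _ in range(n_pairs):
            p1 = roulette_select(chroms, costs, beta, rng)
            p2 = roulette_select(chroms, costs, beta, rng)
            for child in single_point_crossover(p1, p2, rng):
                offspring.append(repair_and_evaluate(bit_flip(child, mu, rng)))
        # assumption: keep the npop best of parents + offspring (elitist)
        pop = sorted(pop + offspring, key=lambda cf: cf[1], reverse=True)[:npop]
        history.append(pop[0][1])
    return pop[0], history


# Usage with a hypothetical fitness: share of genes matching a target mask.
target = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]


def toy_fitness(ch):
    return sum(g == t for g, t in zip(ch, target)) / len(target)


(best, best_fit), history = iga(toy_fitness, n_genes=10, pairs=[(0, 1), (2, 3)])
print(best, best_fit)
```

Because survivors are drawn from parents and offspring together, the best fitness in `history` never decreases, and every surviving chromosome satisfies the one-gene-per-pair rule that CBM enforces.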
5 Results and Discussions
Figure 5 plots the results of the proposed IGA against the performance of the basic GA on the “Sports articles” dataset. Figure 5 shows that the correct rate of IGA-based classification generally increases with the number of iterations and outperforms the basic GA classification. The correct rate of IGA-based classification improved from 84.7% to 86.2% over the iterations, while that of the baseline GA improved from 84.4% to 86.1%. The IGA-based results show an average improvement of 1.39% compared with the baseline GA. Moreover, Fig. 5 shows that the initial population of the IGA also outperforms the corresponding population of the baseline GA, which indicates that the proposed improvement is already effective at the initialization stage.
[Figure: correct rate (84.0% to 86.5%) versus iterations (0 to 50) for Baseline (GA) and IGA]
Fig. 5. Comparison of IGA and Basic GA results on “Sports articles” dataset
Table 4 shows the classification results over the iterations and the improvement percentages. The results in Fig. 5 and Table 4 support the hypothesis behind this work, namely that the appearance of correlated features in chromosomes affects the performance of the GA. The proposed technique ensures that a generated chromosome includes only one feature of each pair of highly positively or negatively correlated features, which explains the improvements achieved by the proposed IGA.
Table 4. Classification results of IGA and Basic GA.
Iteration | Baseline (GA) (%) | IGA (%) | Improvement (%)
1 | 84.4 | 84.7 | 0.32
2 | 84.6 | 84.8 | 0.4
3 | 84.7 | 84.8 | 0.42
4 | 84.8 | 84.9 | 0.56
5 | 84.8 | 84.9 | 0.56
6 | 84.9 | 84.9 | 0.56
7 | 84.9 | 84.9 | 0.56
8 | 84.9 | 85 | 0.64
9 | 84.9 | 85 | 0.68
10 | 84.9 | 85.1 | 0.76
11 | 84.9 | 85.3 | 1.01
12 | 84.9 | 85.5 | 1.25
13 | 85 | 85.5 | 1.29
14 | 85 | 85.5 | 1.29
15 | 85.2 | 85.5 | 1.29
16 | 85.2 | 85.5 | 1.29
17 | 85.2 | 85.5 | 1.29
18 | 85.2 | 85.5 | 1.29
19 | 85.3 | 85.5 | 1.31
20 | 85.4 | 85.6 | 1.45
21 | 85.4 | 85.6 | 1.45
22 | 85.4 | 85.6 | 1.45
23 | 85.4 | 85.6 | 1.45
24 | 85.4 | 85.6 | 1.45
25 | 85.4 | 85.7 | 1.53
26 | 85.4 | 85.7 | 1.53
27 | 85.4 | 85.7 | 1.53
28 | 85.5 | 85.8 | 1.61
29 | 85.6 | 85.8 | 1.61
30 | 85.6 | 85.8 | 1.61
31 | 85.7 | 85.8 | 1.61
32 | 85.7 | 85.8 | 1.61
33 | 85.7 | 85.8 | 1.61
34 | 85.7 | 85.8 | 1.61
35 | 85.7 | 85.8 | 1.61
36 | 85.7 | 85.8 | 1.61
37 | 85.8 | 85.8 | 1.65
38 | 85.8 | 85.8 | 1.65
39 | 85.8 | 85.8 | 1.65
40 | 85.8 | 85.9 | 1.72
41 | 86 | 85.9 | 1.72
42 | 86 | 85.9 | 1.72
43 | 86 | 85.9 | 1.76
44 | 86 | 86.1 | 2
45 | 86 | 86.1 | 2
46 | 86 | 86.1 | 2
47 | 86 | 86.2 | 2.05
48 | 86.1 | 86.2 | 2.11
49 | 86.1 | 86.2 | 2.13
50 | 86.1 | 86.2 | 2.13
Average | 85.4 | 85.6 | 1.39
6 Conclusions
This work presented an improvement of the GA, named IGA, inspired by the domain of Genetic Engineering. The improvement adds a controlled mutation step that is applied to all generated chromosomes after the initialization stage and after each subsequent population generation stage. The controlled mutation treats highly correlated genes (features) of a chromosome, so that only one gene of each highly correlated pair is enforced to appear in the chromosome. The SVM-based classification correct rate on the popular “Sports articles” dataset showed that the proposed method improves on the baseline GA results, which supports the initial hypothesis behind this work and motivates further research on the topic in various directions.
References
1. Jiang, S., Pang, G., Wu, M., Kuang, L.: An improved K-nearest-neighbor algorithm for text categorization. Exp. Syst. Appl. 39(1), 1503–1509 (2012). https://doi.org/10.1016/j.eswa.2011.08.040
2. Hartmann, J., Huppertz, J., Schamp, C., Heitmann, M.: Comparing automated text classification methods. Int. J. Res. Mark. 36(1), 20–38 (2019). https://doi.org/10.1016/j.ijresmar.2018.09.009
3. Sabbah, T., et al.: Modified frequency-based term weighting schemes for text classification. Appl. Soft Comput. 58, 193–206 (2017). https://doi.org/10.1016/j.asoc.2017.04.069
4. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor (1975)
5. Thengade, A., Dondal, R.: Genetic algorithm-survey paper. In: MPGI National Multi Conference 2012, pp. 7–8. Citeseer (2012)
6. Ghareb, A.S., Bakar, A.A., Hamdan, A.R.: Hybrid feature selection based on enhanced genetic algorithm for text categorization. Exp. Syst. Appl. 49, 31–47 (2016). https://doi.org/10.1016/j.eswa.2015.12.004
7. El-Shorbagy, M.A., Ayoub, A.Y., Mousa, A.A., El-Desoky, I.M.: An enhanced genetic algorithm with new mutation for cluster analysis. Comput. Stat. 34(3), 1355–1392 (2019). https://doi.org/10.1007/s00180-019-00871-5
8. Salehi, S., Selamat, A., Bostanian, M.: Enhanced genetic algorithm for spam detection in email. In: 2011 IEEE 2nd International Conference on Software Engineering and Service Science, 15–17 July 2011, pp. 594–597 (2011)
9. Garzelli, A., Capobianco, L., Nencini, F.: Fusion of multispectral and panchromatic images as an optimisation problem. In: Stathaki, T. (ed.) Image Fusion, pp. 223–250. Academic Press, Oxford (2008)
10. Baluja, S., Caruana, R.: Removing the Genetics from the Standard Genetic Algorithm. In: Prieditis, A., Russell, S. (eds.) Machine Learning Proceedings 1995, pp. 38–46. Morgan Kaufmann, San Francisco (CA) (1995)
11. Yadav, S.L., Sohal, A.: Comparative study of different selection techniques in genetic algorithm. Int. J. Eng. Sci. Math. 6(3), 174–180 (2017)
12. Mirjalili, S.: Genetic algorithm. In: Evolutionary Algorithms and Neural Networks: Theory and Applications, pp. 43–55. Springer, Cham (2019)
13. Sivanandam, S., Deepa, S.: Introduction to Genetic Algorithms. Springer, Heidelberg (2007)
14. Nazarahari, M., Khanmirza, E., Doostie, S.: Multi-objective multi-robot path planning in continuous environment using an enhanced genetic algorithm. Exp. Syst. Appl. 115, 106–120 (2019). https://doi.org/10.1016/j.eswa.2018.08.008
15. Qian, S., Liu, Y., Ye, Y., Xu, G.: An enhanced genetic algorithm for constrained knapsack problems in dynamic environments. Nat. Comput. 18(4), 913–932 (2019). https://doi.org/10.1007/s11047-018-09725-3
16. Al-Shalabi, M., Anbar, M., Wan, T.-C., Alqattan, Z.: Energy efficient multi-hop path in wireless sensor networks using an enhanced genetic algorithm. Inf. Sci. 500, 259–273 (2019). https://doi.org/10.1016/j.ins.2019.05.094
17. Wan, J., Chu, P., Jiao, Y., Li, Y.: Improvement of machine learning enhanced genetic algorithm for nonlinear beam dynamics optimization. Nucl. Instrum. Meth. Phys. Res. Sect. A 946, 162683 (2019). https://doi.org/10.1016/j.nima.2019.162683
18. Anh, B.T.M., My, N.T., Trang, N.T.T.: Enhanced genetic algorithm for single document extractive summarization. In: 10th International Symposium on Information and Communication Technology, Hanoi, Ha Long Bay, Viet Nam 2019, pp. 370–376. Association for Computing Machinery (2019)
19. Saidi, R., Bouaguel, W., Essoussi, N.: Hybrid feature selection method based on the Genetic Algorithm and Pearson Correlation coefficient. In: Hassanien, A.E. (ed.) Machine Learning Paradigms: Theory and Application, pp. 3–24. Springer, Cham (2019)
20. Tsai, C.-F., Chen, Z.-Y., Ke, S.-W.: Evolutionary instance selection for text classification. J. Syst. Softw. 90, 104–113 (2014). https://doi.org/10.1016/j.jss.2013.12.034
21. Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft. Comput. 12(2), 111–120 (2008). https://doi.org/10.1007/s00500-007-0193-8
22. Uğuz, H.: A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl. Based Syst. 24(7), 1024–1032 (2011). https://doi.org/10.1016/j.knosys.2011.04.014
23. Gunal, S.: Hybrid feature selection for text classification. Turkish J. Electr. Eng. Comput. Sci. 20(2), 1296–1311 (2012)
24. Lei, S.: A feature selection method based on information gain and genetic algorithm. In: 2012 International Conference on Computer Science and Electronics Engineering 2012, pp. 355–358. IEEE (2012)
25. Uysal, A.K., Gunal, S.: A novel probabilistic feature selection method for text classification. Knowl. Based Syst. 36, 226–235 (2012)
26. Afif, M.H., Ghareb, A.S., Saif, A., Bakar, A., Bazighifan, O.: Genetic algorithm rule based categorization method for textual data mining. Decis. Sci. Lett. 9(1), 37–50 (2020)
27. Sabbah, T., Selamat, A., Selamat, M.H., Ibrahim, R., Fujita, H.: Hybridized term-weighting method for Dark Web classification. Neurocomputing 173(Part 3), 1908–1926 (2016). https://doi.org/10.1016/j.neucom.2015.09.063
28. Dua, D., Graff, C.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA, USA (2019)
Multi-objective NPO Minimizing the Total Cost and CO2 Emissions for a Stand-Alone Hybrid Energy System
Abbas Q. Mohammed¹,²(✉), Kassim A. Al-Anbarri¹, and Rafid M. Hannun³
¹ Faculty of Engineering, Electrical Engineering Department, Mustansiriyah University, Baghdad, Iraq
² Construction and Projects Department, University of Thi-Qar, Nassriyah, Thi-Qar, Iraq
³ Mechanical Engineering, College of Engineering, University of Thi-Qar, Nassriyah, Thi-Qar, Iraq
Abstract. This article proposes the use of a new algorithm, the Nomadic People Optimizer (NPO), to find the optimal sizing of a hybrid energy system (HES) consisting of photovoltaic (PV) modules, battery storage (BS), and a diesel generator (DG). The HES supplies electricity to an academic building in Thi-Qar Province, southern Iraq, at latitude 31.06º and longitude 46.26º. The first objective of the algorithm is to reduce the total cost over the life cycle of the project, an economic aspect that in turn reduces the cost of energy; the second objective is to reduce carbon dioxide emissions. Throughout the 25-year life cycle of the project, the electrical load must be supplied continuously. The results show that the optimal sizing of the HES is achieved with 1875 PV modules, 687 BS units, and a single DG.
Keywords: Renewable energy · Solar energy · Nomadic people optimizer · Optimization
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 351–363, 2021. https://doi.org/10.1007/978-3-030-70713-2_33
1 Introduction
Electric power is one of the most sought-after commodities of the human race. More than 70% of the world’s energy demand is met by burning fossil fuels, such as crude oil, coal, carbon gas, and natural gas [1]. As economies and world populations expand, energy demand rises, resulting in greater fossil fuel usage. Conventional fuel supplies, however, are limited and are being depleted quickly. Fossil fuels are also responsible for pollution, including the greenhouse gases (GHGs) that lead to global warming [2]. Global energy demand is projected to grow by 56% from 2010 to 2040, raising CO2 emissions from 31.2 billion tons in 2010 to 45.5 billion tons in 2040. Moreover, fossil-based oil, coal, and gas reserves will deplete rapidly in the coming decades [3]. These current and expected circumstances push scientists toward a strategy of improving energy-efficient systems [4, 5] and replacing fossil-fuel power generation units with units that use renewable energy sources (RES) [6]. It is therefore important
A. Q. Mohammed et al.
to use environmentally friendly energy sources for environmental improvement [7]. Clean energy sources such as solar, wind, hydro, and geothermal energy are important in this sense because they are environmentally friendly [8]. Solar power, in particular, may be an ideal choice for the future for several reasons. First, solar energy is the most abundant renewable source for producing electricity [9]; studies have shown that global energy demand can be met satisfactorily with solar energy, since it is plentiful, readily available, and has a lower cost of energy [10]. Second, it is a promising global energy source because it is inexhaustible and offers higher production efficiencies than other energy sources [11]. Although RESs are appealing as environmentally friendly and easily replenished solutions to the energy crisis, their output is intermittent and cannot be predicted reliably. This issue can be alleviated by combining several RESs into a hybrid renewable energy system (HRES) and integrating them with storage, such as batteries, and with DGs to form an HES [12]. The design of an HES is a very complex problem with a large number of parameters, so classic design techniques can produce unsatisfactory results [13]. Some studies use the HOMER software to determine the HES configuration. HOMER optimizes the HES with an enumerative search for the optimal design; this guarantees the optimal solution, but it can require a very long processing time. In recent years, the academic community and industry have paid increasing attention to optimization algorithms, which have been applied to several problems with very good results [14].
However, these algorithms often struggle to balance local and global search, and they also have control parameters that make them more complicated to use. This study therefore employs a new algorithm, the “Nomadic People Optimizer (NPO)”, which is based on the life pattern of nomads. The NPO simulates how nomads search for sources of life (such as grass and water for their animals), how they have lived for many years, and how they continuously migrate from place to place in search of comfort. The algorithm is designed to strike the right balance between exploration and exploitation, and it does not rely on any control parameters to steer the search process [15–18].
2 The Problem of Optimal Sizing
2.1 Construction of the Proposed HES
The HES in this paper consists of four components: PV, BS, DG, and an inverter that converts DC power to AC power. The PV and BS are connected to the DC bus, while the DG and the AC load are connected to the AC bus. Fig. 1 shows a block diagram of the proposed HES.
Multi-objective NPO Minimizing the Total Cost and CO2 Emissions
Fig. 1. Block diagram of the proposed HES
2.2 Data Collection
The output power of the PV depends on the solar radiation and the temperature, so these data were collected from the weather forecast for Thi-Qar for every hour of the year. Thi-Qar Province is characterized by high solar radiation: the average annual solar radiation that fell on Thi-Qar Province over the years 1961–1991 is 4.92 kWh/m²/day [19]. As an illustration, Fig. 2 and Fig. 3 show the solar radiation and the temperature for the first ten days of July. The load data were collected from the Thi-Qar Electricity Distribution Directorate; the peak load is 180 kW and the average load is 96.2688 kW. Fig. 4 shows the load demand for the first ten days of July.
Fig. 2. The solar radiation for the first ten days of (July)
Fig. 3. The air temperature for the first ten days of (July)
Fig. 4. The load demand for the first ten days of (July)
2.3 Modeling of the Proposed HES
PV System Model
The output power of the PV at time t can be calculated using Eq. 1 [20]:
$$P_{PV}(t) = P_{rat} \times DF \times \frac{S}{S_{STC}} \times \left(1 + K\,(T_{PV} - T_{PV-STC})\right) \qquad (1)$$
Where: P_PV(t) is the output power of the PV at time t [W], P_rat is the rated power of the PV [W], S is the solar radiation at time t [W/m²], S_STC is the solar radiation at standard test conditions (STC) [W/m²], T_PV is the cell temperature [°C], T_PV−STC is the cell temperature of the PV at STC [°C], K is the temperature coefficient of the maximum power of the PV, and DF is the module derating factor.
The cell temperature can be determined using Eq. 2 [21]:
$$T_{PV} = T_{air} + S \times \frac{NOCT - 20}{800} \qquad (2)$$
Where: NOCT is the nominal operating cell temperature of the PV [°C] and T_air is the air temperature [°C]. The datasheet of the PV module used in this study is shown in Table 1.
Table 1. PV datasheet
Parameter | Value
Type | Poly Crystalline
S_STC | 1000 W/m²
P_rat | 355 W
T_PV−STC | 25 °C
K | −0.38%/°C
Life cycle | 25 years
DF | 0.94%
Maintenance cost | 0 $
NOCT | 45 °C
Capital cost | 220 $
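Equations 1 and 2 can be evaluated with the Table 1 values in a few lines of Python. This is a minimal sketch for a single module with hypothetical function names. It assumes K is −0.38% per °C (written as −0.0038) and reads the derating factor listed as “0.94%” as a factor of 0.94; both readings are assumptions.

```python
def cell_temperature(t_air, s, noct=45.0):
    """Eq. 2: cell temperature (deg C) from air temperature (deg C) and irradiance S (W/m^2)."""
    return t_air + s * (noct - 20.0) / 800.0


def pv_power(s, t_air, p_rat=355.0, df=0.94, s_stc=1000.0, k=-0.0038,
             t_stc=25.0, noct=45.0):
    """Eq. 1: output power (W) of one PV module at irradiance s and air temperature t_air.
    Assumed readings of Table 1: k = -0.38 %/deg C, df = 0.94 (a factor, not 0.94 %)."""
    t_pv = cell_temperature(t_air, s, noct)
    return p_rat * df * (s / s_stc) * (1.0 + k * (t_pv - t_stc))


# One module at 1000 W/m^2 and 20 deg C air: the cell runs at 51.25 deg C,
# so the temperature term trims the derated 333.7 W by roughly 10%.
print(round(pv_power(1000.0, 20.0), 2))
```

For the whole array the module output would be multiplied by n1, in the same way as the n1 × E_PV(t) term of Eq. 14.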
BS System Model
The BS is charged only when the energy generated by the RESs is in surplus (E_EX/DE(t) > 0) and the level of charge (LOC) of the BS is below its maximum (LOC(t) < LOC_max). During charging, the LOC of the BS at time t is given by Eq. 3 [12], where the surplus or deficit E_EX/DE(t) of the RESs is given by Eq. 4:
$$LOC(t) = LOC(t-1) \times (1 - S_{disc}) + \left(E_{ren}(t) - \frac{ED(t)}{eff_{INV}}\right) \times eff_{BS} \qquad (3)$$
$$E_{EX/DE}(t) = E_{ren}(t) - \frac{ED(t)}{eff_{INV}} \qquad (4)$$
Where: LOC(t) is the level of charge of the BS in the present hour (Wh), LOC(t − 1) is the level of charge of the BS in the previous hour (Wh), S_disc is the hourly self-discharge rate of the BS (%), E_ren(t) is the energy generated by the RESs at time t (Wh), eff_BS is the efficiency of the BS (%), eff_INV is the efficiency of the inverter (%), LOC_max is the maximum level of charge of the BS (Wh), ED(t) is the energy demand of the load at time t (Wh), and E_EX/DE(t) is the surplus or deficit of the RESs at time t (Wh).
The BS discharges only when the energy generated by the RESs is in deficit (E_EX/DE(t) < 0) and the LOC of the BS is above its minimum (LOC(t) > LOC_min). During discharging, the LOC of the BS at time t is given by Eq. 5 [12]:
$$LOC(t) = LOC(t-1) \times (1 - S_{disc}) - \left(\frac{ED(t)}{eff_{INV}} - E_{ren}(t)\right) \qquad (5)$$
Where: LOC_min is the minimum level of charge of the BS (Wh).
Table 2. BS datasheet
Parameter | Value
Type | 12 V Monoblock
DOD | 0.4 × C_Max
C_Max | 2400 Wh
S_disc | 0.01488%/h
eff_BS | ≥ 90%
Maintenance cost | 0 $
LOC_Max | 2400 Wh
Capital cost | 250 $
LOC_min | 960 Wh
The datasheet of the BS used in this study is shown in Table 2, where C_Max is the nominal capacity of the BS (Wh) and DOD is the maximum allowable depth of discharge of the BS (%). The life cycle of the BS depends on the DOD; Table 3 shows the relation between the DOD of the BS and its life cycle.
Table 3. The life cycle of the BS
DOD | Life cycle
At 30% | 3600 cycles
At 40% | 2600 cycles
At 50% | 2000 cycles
At 60% | 1500 cycles
Ideal float condition | 10 years
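The charge and discharge rules of Eqs. 3–5 amount to a simple hourly update. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes a single battery unit with the Table 2 limits, takes eff_BS and eff_INV as exactly 0.9 (the tables give ≥ 90%), and reads Eq. 5 as a decrease of the LOC during discharge. Clamping the LOC to [LOC_min, LOC_max] and returning the deficit the battery cannot cover (which the DG would then supply) are our own assumptions.

```python
def battery_step(loc_prev, e_ren, e_d, loc_min=960.0, loc_max=2400.0,
                 s_disc=0.0001488, eff_bs=0.9, eff_inv=0.9):
    """One hourly update of the battery level of charge (Wh).

    Returns (loc, unmet): the new LOC and the deficit (Wh, same basis as
    E_EX/DE) that the battery cannot cover and is left for the DG.
    Assumption: the LOC is clamped to [loc_min, loc_max].
    """
    surplus = e_ren - e_d / eff_inv          # Eq. 4: E_EX/DE(t)
    loc = loc_prev * (1.0 - s_disc)          # hourly self-discharge
    if surplus > 0 and loc_prev < loc_max:   # charging, Eq. 3
        return min(loc + surplus * eff_bs, loc_max), 0.0
    if surplus < 0 and loc_prev > loc_min:   # discharging, Eq. 5
        new_loc = loc + surplus              # same as loc - (ED/eff_INV - E_ren)
        if new_loc >= loc_min:
            return new_loc, 0.0
        return loc_min, loc_min - new_loc    # battery stops at LOC_min
    # battery full (surplus dumped) or at LOC_min (whole deficit goes to the DG)
    return loc, max(-surplus, 0.0)


print(battery_step(1000.0, 2000.0, 900.0))   # surplus hour: the battery charges
print(battery_step(1000.0, 0.0, 900.0))      # night hour: the battery hits LOC_min
```

For a bank of n2 units, the limits would scale with n2; this scaling is also an assumption of the sketch rather than something stated in the chapter.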
Inverter Model
The number of inverters required for the HES is calculated using Eq. 6 [22]:
$$N_{INV} = \frac{P_{G-Max}}{P_{INV-Max}} \qquad (6)$$
Where: P_G−Max is the maximum power generated by the components connected to the inverter (W), and P_INV−Max is the maximum power of the inverter (INV) (W). The datasheet of the INV used in this study is shown in Table 4.
DG System Model
The DG is needed to supply the load continuously when the energy provided by the RESs cannot meet the demand and the LOC is at its minimum. The DG supplies power independently of the climate, but its operation has harmful effects, such as CO2 emissions that pollute the environment, and it also has high maintenance costs. The fuel consumption F(t) of the DG (l/h) depends on its rated power and on its average output power in the hour, and can be calculated using Eq. 7 [12]:
$$F(t) = (0.246 \times P_{ADG}) + (0.08415 \times P_{RDG}) \qquad (7)$$
Table 4. INV datasheet
Parameter | Value
Type | Bi-directional inverter
Life span | 25 years
P_INV−Max | 10000 W
Maintenance cost | 20 $/year
eff_INV | ≥ 90%
Capital cost | 3367 $
Where: F(t) is the fuel consumption at time t (l/h), P_RDG is the rated power of the DG (kW), P_ADG is the average output power of the DG at time t (kW), and 0.246 and 0.08415 are constant factors in l/kWh. In this study, the DG covers the deficit of the RESs but does not charge the BS. The datasheet of the DG used in this study is shown in Table 5.
Table 5. DG datasheet
Parameter | Value
Type | Perkins
Life span | 15000 h
P_RDG (kW) | 200 kW
Maintenance cost | 0.309 $/h
P_RDG (kVA) | 250 kVA
Capital cost | 29200 $
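Equations 6 and 7, with the Table 4 and Table 5 values, can be sketched in Python as below. Two details are our assumptions: the inverter count is rounded up to a whole number, and an idle DG (zero output) is taken to burn no fuel, since Eq. 7 applies only while the DG is running. The function names are hypothetical.

```python
import math


def inverters_needed(p_gen_max_w, p_inv_max_w=10_000.0):
    """Eq. 6, rounded up (assumption): inverters needed for the peak power (W)
    of the components connected to the inverters."""
    return math.ceil(p_gen_max_w / p_inv_max_w)


def dg_fuel(p_adg_kw, p_rdg_kw=200.0):
    """Eq. 7: fuel use F(t) in l/h for an average DG output p_adg_kw (kW)."""
    if p_adg_kw <= 0:
        return 0.0  # assumption: the DG is off and burns no fuel
    return 0.246 * p_adg_kw + 0.08415 * p_rdg_kw


# The 200 kW DG covering the 180 kW peak load for one hour:
print(dg_fuel(180.0))              # l/h
print(inverters_needed(455_000))   # 45.5 is rounded up to 46 inverters
```

The constant 0.08415 × P_RDG term means a lightly loaded DG still burns about 16.8 l/h, which is one reason the dispatch strategy only runs it when the battery is at LOC_min.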
3 Nomadic People Optimizer (NPO)
In this study, a new swarm-based metaheuristic, the Nomadic People Optimizer (NPO), is used; it simulates the lifestyle of nomads as they travel in search of sources of life, such as water and grass for their livestock. The NPO was inspired by the Bedouins and their lifestyle. A nomadic clan consists of the Sheik's family and the remaining families, which are considered normal families. The Sheik, as clan leader, decides where and when the families travel to ensure their safety, and also decides the pattern in which the normal families are placed around the Sheik's house; the family tents are typically distributed in a semicircle around the Sheik's tent. The Sheik selects families to search for a new, appropriate place; the selected families travel randomly in various directions and over various distances in search of the best place to move to. The nomads spend all their time traveling with their animals to find better places to sustain their life [15].
3.1 The Objective of the Proposed Algorithm
The main purpose of the optimization is to find the optimal sizing of a stand-alone HES (PV system, DG system, and BS system) that minimizes the total system cost (C_T), an economic objective that in turn reduces the cost of energy (COE). The second objective is to minimize the total CO2 emissions (E_CO2^T), while the load is supplied continuously with electricity (reliability as the constraint) throughout the 25-year life cycle of the project. A system configuration N is a row vector of three elements (n1 to n3), where each element is the required number of components of one subsystem of the HES. The row vector N is given by Eq. 8:
$$N = [n_1 \; n_2 \; n_3] \qquad (8)$$
Where: n1 is the number of PV modules, n2 is the number of BS modules, and n3 is the number of DGs. The two objectives are explained below.
4 Total System Cost (C_T)
C_T is one of the NPO’s objectives; it represents the total system cost over 25 years. The C_T implemented in this study includes the total capital cost of the system components (PV system, BS system, INV system, and DG system), the total replacement cost of the components over the same period, the total maintenance cost of the components over 25 years, and the total fuel cost of the DG over 25 years. C_T is calculated using Eq. 9:
$$C_T = C_{TCap} + C_{TRep} + C_{TMaint} + C_{TFuel} \qquad (9)$$
Where: C_TCap is the total capital cost of the system components ($), C_TRep is the total replacement cost of the system components ($), C_TMaint is the total maintenance cost of the system components ($), and C_TFuel is the total fuel cost of the DG ($), all over the 25-year life cycle of the project. Minimizing C_T minimizes the cost of energy (COE), which is calculated using Eq. 10 [23]:
$$COE = \frac{C_T}{\sum_{t=1}^{n} ED(t)} \qquad (10)$$
Where n is the life cycle of this study (219000 h) and ED(t) is the load demand at time t (Wh; expressed in kWh when COE is given in $/kWh).
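As a worked check of Eq. 10, the Python sketch below divides the C_T reported in Table 6 by the total demand over n = 219000 h. It approximates each hourly ED(t) by the average load of 96.2688 kW given in Sect. 2.2, which is a simplification: the actual calculation sums the hourly demand values. Because C_T in Table 6 is rounded to five significant figures, the result matches the reported COE only to about six decimal places. The function names are hypothetical.

```python
def total_cost(c_cap, c_rep, c_maint, c_fuel):
    """Eq. 9: total system cost C_T ($) over the project life."""
    return c_cap + c_rep + c_maint + c_fuel


def cost_of_energy(c_t, hourly_demand_kwh):
    """Eq. 10: COE ($/kWh) = C_T / sum of the hourly load demand (kWh)."""
    return c_t / sum(hourly_demand_kwh)


n_hours = 25 * 8760                      # 219000 h, the project life cycle
demand = [96.2688] * n_hours             # simplification: average load for every hour
coe = cost_of_energy(1.9559e6, demand)   # C_T from Table 6
print(f"{coe:.6f} $/kWh")                # compare with 0.092771926 in Table 6
```

The close agreement suggests that the reported COE is C_T divided by the total energy served, roughly 21.08 GWh over 25 years.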
5 Total CO2 Emissions (E_CO2^T)
E_CO2^T is the second objective of the NPO. Emissions occur when the RESs are in deficit and the BS is at its minimum level of charge (LOC(t) = LOC_min), so that the DG must supply the load. This is an undesirable condition because it increases CO2 emissions, which lead to global warming. The emissions are related to the DG's fuel consumption; the CO2 emission at time t is calculated using Eq. 11 [12]:
$$E_{CO2}(t) = SE_{CO2}\,(\mathrm{kg/l}) \times F(t)\,(\mathrm{l/h}) \qquad (11)$$
Where: SE_CO2 is the specific CO2 emission per liter of fuel, given as 2.7 kg/l, and E_CO2(t) is the CO2 emission of the DG at time t (kg). The CO2 emission of the DG over the 25-year lifetime of the project is the sum of all hourly emissions, given by Eq. 12:
$$E_{CO2}^{T} = \sum_{t=1}^{n} E_{CO2}(t) \qquad (12)$$
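Equations 11 and 12 reduce to scaling the DG's hourly fuel use by 2.7 kg/l and summing over the project life. In this minimal Python sketch (hypothetical names, toy fuel values), the last line also inverts the relation for the E_CO2^T of Table 6: 3.4887 × 10^6 kg corresponds to roughly 1.29 × 10^6 liters of diesel over 25 years. That fuel figure is derived here by arithmetic, not reported by the authors.

```python
SE_CO2 = 2.7  # kg of CO2 per liter of fuel


def co2_emission(fuel_l_per_h):
    """Eq. 11: CO2 emitted (kg) during one hour with fuel use F(t) in l/h."""
    return SE_CO2 * fuel_l_per_h


def total_co2(hourly_fuel):
    """Eq. 12: total CO2 (kg) over all n hours of the project life."""
    return sum(co2_emission(f) for f in hourly_fuel)


print(total_co2([61.11, 0.0, 0.0, 40.0]))   # toy data: DG runs in hours 1 and 4 only
print(3.4887e6 / SE_CO2)                    # fuel (l) implied by Table 6
```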
5.1 The Constraints
The constraint in this study is reliability, meaning that the load must be supplied with electricity continuously throughout the life cycle of the project; it is given by Eq. 13:
$$E_T(t) \ge \frac{ED(t)}{eff_{INV}} \qquad (13)$$
Where E_T(t) is the total output energy of the system at time t (Wh), given by Eq. 14:
$$E_T(t) = n_1 \times E_{PV}(t) + n_2 \times E_{BS}(t) + n_3 \times E_{DG}(t) \times eff_{INV} \qquad (14)$$
Where: E_PV(t) is the output energy of the PV at time t (Wh), E_BS(t) is the output energy of the BS at time t (Wh), and E_DG(t) is the output energy of the DG at time t (Wh). This study uses the weighted-sum (multi-objective) method to minimize C_T and E_CO2^T together, as in Eq. 15:
$$Q(N_i) = C_T(N_i) \times W_1 + E_{CO2}^{T}(N_i) \times W_2 \times COP \qquad (15)$$
Where Q(N_i) is the objective function that combines C_T and E_CO2^T, and W_1 and W_2 are the weights of C_T and E_CO2^T used to minimize the two objectives together. C_T(N_i) and E_CO2^T(N_i) are computed for each family (configuration) N_i; the best families are those with the minimum C_T and E_CO2^T. COP is the penalty factor that converts CO2 emissions to a monetary value. This study uses Sweden's carbon tax of $150/ton; Sweden is among the countries with the highest carbon taxes. The flow chart of the optimization process is shown in Fig. 5, and the energy-source selection block is shown in Fig. 6.
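The weighted sum of Eq. 15 can be sketched in Python as follows. The chapter does not report W_1 and W_2, so the sketch uses W_1 = W_2 = 1 purely as placeholders. It converts E_CO2^T from kilograms (the unit of Table 6) to metric tons so that COP = $150/ton yields dollars, and it discards configurations that violate the reliability constraint of Eq. 13 before ranking. Apart from the reported [1875 687 1], the candidate configurations and the function names are hypothetical.

```python
def weighted_objective(c_t, e_co2_kg, w1=1.0, w2=1.0, cop_usd_per_ton=150.0):
    """Eq. 15: Q(N_i) = C_T(N_i) * W1 + E_CO2^T(N_i) * W2 * COP, in dollars.
    W1 = W2 = 1 are placeholders; the chapter does not report the weights."""
    return c_t * w1 + (e_co2_kg / 1000.0) * w2 * cop_usd_per_ton


def best_family(families, **weights):
    """Return the configuration N with the lowest Q among reliable families.
    families: iterable of (N, c_t, e_co2_kg, meets_constraint) tuples."""
    feasible = [f for f in families if f[3]]
    return min(feasible, key=lambda f: weighted_objective(f[1], f[2], **weights))[0]


families = [
    ([1875, 687, 1], 1.9559e6, 3.4887e6, True),   # values reported in Table 6
    ([1000, 300, 1], 1.5e6, 9.0e6, True),         # hypothetical candidate
    ([500, 100, 1], 1.0e6, 1.0e6, False),         # hypothetical, fails Eq. 13
]
print(weighted_objective(1.9559e6, 3.4887e6))     # Q of the reported configuration
print(best_family(families))
```

Changing the weights can change the winner: with W_2 = 0 the cheaper but dirtier hypothetical candidate would be preferred, which is why the choice of W_1, W_2 and COP matters for the reported optimum.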
6 Results and Discussions
This study uses the NPO to determine the optimal size of the system, consisting of the energy supply subsystems (PV system, DG system, and BS system), with the number of inverters depending on the maximum power produced by the HES components connected to the INV and on the maximum capacity of the INV, so as to reduce C_T and E_CO2^T while supplying the load continuously for 25 years. In this study, the number of iterations was 50, LOC_Max was 2400 Wh, LOC_min was 0.4 × LOC_Max, and DOD was 0.6 × LOC_Max.
Fig. 5. Flow chart of the optimization process
Fig. 6. Block Energy Sources Selection
6.1 Optimal Configuration
The optimal sizing found by the algorithm using all the components (PV-BS-DG) is N = [1875 687 1], comprising 1875 PV modules, 687 BS modules, and a single large DG. This configuration was implemented with 46 inverters. In this configuration, the DG runs only when the LOC is at its minimum and E_EX/DE < 0, and it does not charge the batteries. The behavior of the system for the first ten days of the summer season (July) is shown in Fig. 7. Table 6 shows the performance of the PV-BS-DG configuration over 25 years.
Table 6. PV-BS-DG configuration performance over 25 years
Quantity | Value
Configuration (N) | [1875 687 1]
C_T ($) | 1.9559 × 10^6
E_CO2^T (kg) | 3.4887 × 10^6
COE ($/kWh) | 0.092771926
Table 6 shows that the NPO succeeded in minimizing the multi-objective function, which combines the total system cost C_T and the total CO2 emissions E_CO2^T.
Fig. 7. The behavior of a PV-BS-DG system configuration for the first ten days of (July)
7 Conclusions
This study presented a multi-objective optimization model that uses a new algorithm, the Nomadic People Optimizer (NPO), to find the optimal sizing of an HES comprising PV systems, a BS system, and a DG by minimizing a multi-objective function. The first objective is C_T and the second is E_CO2^T, with continuous supply of the load demand throughout the 25-year life cycle of the project as a constraint. Based on the results of this study, the following conclusions are drawn:
1. The optimal HES configuration is a PV-BS-DG system. The optimum sizing, which gives the minimum C_T and the minimum E_CO2^T, is 1875 PV modules, 687 BS modules, and a single large DG.
2. The NPO can accept different inputs, such as the air temperature, solar irradiation, and user load-demand data, to develop an optimal sizing of the HES. The NPO also succeeded in minimizing the objectives (C_T and E_CO2^T) while supplying the load continuously for 25 years.
Future studies should add other RESs, such as wind, to the HES and should also use the DG to charge the BS.
A Real Time Flood Detection System Based on Machine Learning Algorithms
Abdirahman Osman Hashi1,3(B), Abdullahi Ahmed Abdirahman1, Mohamed Abdirahman Elmi1, and Siti Zaiton Mohd Hashim2
1 Faculty of Computing, SIMAD University, Mogadishu, Somalia
{aaayare,m.abdirahman}@simad.edu.so
2 Faculty of Computing, Department of Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, 16100 Pengkalan Chepa, Kelantan, Malaysia
[email protected]
3 Faculty of Informatics, Department of Computer Science, Istanbul Teknik Üniversitesi, 34469 Maslak, İstanbul, Turkey
Abstract. A flood is defined as water overflowing onto normally dry land, or a rise in water level, that significantly affects human life. It is one of the most common natural phenomena, causing severe financial losses to goods and property as well as harming human lives. Detecting such floods in advance would give inhabitants sufficient time to evacuate areas where floods may occur before they actually happen. To address floods, many scholars have proposed solutions such as developing prediction models and building proper infrastructure. However, these solutions are not economically viable here in Somalia. Therefore, the key objective of this paper is to propose a robust real-time flood detection system based on machine learning algorithms (Random Forest, Naïve Bayes and J48) that can detect water levels and measure floods with possible humanitarian consequences before they occur. The proposed method aims to address the aforementioned problems, and the paper investigates how water levels can be detected with a hybrid model based on an Arduino with GSM modems. In the analysis, the Random Forest algorithm outperformed the other machine learning methods in terms of accuracy, achieving 98.7% compared with 88.4% for Naïve Bayes and 84.2% for J48. The proposed method contributes to the fields of artificial intelligence and data mining by introducing a new way of preventing flood damage. Keywords: Machine learning · Naive Bayes · Random Forest · Artificial intelligence · Data mining
1 Introduction
It is well known that natural disasters cannot be escaped. However, early-warning systems and proper management can mitigate their severity. In most developed countries, the meteorological department has a flood-monitoring cell. That cell may not be appropriately equipped with an intelligent, scalable flood alarm system, and some countries, including ours, Somalia, may not have such a department at all. As a result, people living in flood-prone areas deal with the consequences of floods every year [1]. In Somalia, the dangerous flash floods that occurred last year in Beledweyne town in the Hiiran region were reported to have displaced over 100,000 people [2]. In addition, according to OCHA, river flooding has so far affected an estimated 620,000 people in Somalia. According to the UNHCR-led Protection [3], around 213,800 of these people have been displaced from their homes. This displacement is a consequence of heavy rains in Ethiopia, whose water is received across the country and especially in the southern regions, including Hiiran. Increased rainfall since the beginning of May last year (2019) resulted in a sharp rise in water levels in the Jubba and Shabelle rivers, which may lead to severe flooding in central and southern Somalia. According to the UNHCR-led Protection, the flood that occurred in Beledweyne last year reached the highest water levels ever recorded in that region, and indeed in the whole country. Moreover, data collected by humanitarian partners indicate that the current flood levels exceed a 50-year return period. The floods have affected more than 427,000 people, of whom nearly 174,000 have fled their homes as a result of flash and river flooding in Hirshabelle state [2]. According to the CMHC, these floods can occur at any time of the year. They are most often caused by heavy rainfall in Ethiopia, which raises the levels of the Shabelle and Jubba rivers.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 364–373, 2021. https://doi.org/10.1007/978-3-030-70713-2_34
As a consequence, many people have to evacuate and lose their houses. In recent years, communication technologies have advanced rapidly. As a result, Global Positioning System (GPS) devices equipped with wireless modules and GSM modems have been broadly deployed at various public and private locations. These devices generate huge amounts of data that can be used to measure water levels, locations and so on, for fleet management [4]. Machine learning applications can offer valuable solutions for predicting and detecting floods. Moreover, resisting flood devastation also requires a way to inform the population living around the affected area appropriately and in real time [1, 8]. To date, sensor technologies have been widely used to detect variations in water levels across flood zones and to share these data with inhabitants [9]. The purpose of this paper is to simulate a real-time flood detection system based on machine learning that can detect water levels and measure floods with possible humanitarian consequences before they occur. The paper is structured in five sections. Section 2 provides the background and related work on flood detection methods. Section 3 describes the methodology by which the framework is implemented, together with its experimental design. Section 4 presents and discusses the results, and Sect. 5 concludes the paper.
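The "50-year return period" quoted in the introduction has a standard probabilistic reading in hydrology. A flood with a return period of T years has an annual exceedance probability of 1/T. Assuming that exceedances in different years are independent, the chance of at least one such flood in n years is 1 - (1 - 1/T)^n. This is a general definition, not a figure taken from the cited reports. A minimal sketch:

```python
def annual_exceedance_probability(return_period_years: float) -> float:
    """Probability that the T-year flood level is reached or exceeded in a given year."""
    if return_period_years < 1:
        raise ValueError("return period must be at least one year")
    return 1.0 / return_period_years


def probability_at_least_once(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one exceedance within the horizon, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


if __name__ == "__main__":
    print(annual_exceedance_probability(50))            # 0.02
    print(round(probability_at_least_once(50, 25), 3))  # 0.397
```

So a "50-year" flood is not a once-in-50-years certainty. Under these assumptions, it has roughly a 40% chance of occurring at least once in any 25-year period.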
2 Background and Related Work
There are many natural disasters around the globe. However, floods are known to be the most critical, causing huge damage to human life, infrastructure and agriculture [2, 4]. This motivates the use of machine learning algorithms. Machine learning is one of the prominent fields of artificial intelligence. It grew out of the development of self-learning algorithms that acquire knowledge from data in order to make forecasts. Nowadays data are abundant, and machine learning algorithms can convert them into knowledge [5]. Machine learning offers an effective way to turn data into knowledge, progressively improving the performance of forecast models and supporting decisions based on that data. In the context of this research, suppose we want to forecast the river level at a particular place. We can train a suitable ML algorithm on past data, and if it successfully learns the underlying pattern, it will better predict future water levels [8]. Artificial neural networks and neuro-fuzzy systems are among the numerous ML algorithms reported to be effective for both short- and long-term flood prediction. The following subsections explain each of these algorithms.
2.1 Artificial Neural Networks (ANNs)
Artificial neural networks are numerical models capable of efficient parallel processing. This enables them to imitate the interaction between neural units in a biological neural network [5]. Among all ML methods, ANNs are the most popular learning algorithms. They are known to be flexible and effective in modeling complex flood processes, and they have high fault tolerance and provide accurate approximations [6]. Compared with conventional statistical models, ANNs have achieved greater prediction accuracy.
Since their first use in the 1990s, ANNs have been the most prevalent method for flood prediction [7].
2.2 Adaptive Neuro-Fuzzy Inference System (ANFIS)
The fuzzy logic of Zadeh [4] is a soft computing technique that models problems qualitatively using natural language. Fuzzy logic is also known as a basic mathematical model for computation. It works by incorporating expert knowledge into a fuzzy inference system (FIS), which enables the classification of different data. An FIS mimics human learning through a prediction function with low computational complexity. This gives it a good capability for nonlinear modeling of extreme hydrological events [6], especially floods.
2.3 Decision Tree (DT) and Ensemble Prediction Systems (EPSs)
The decision tree is a predictive modeling approach in machine learning with wide application in flood simulation [8]. A decision tree passes through branches of decisions to reach the leaves, which hold the target values, and it can achieve high precision. In classification trees (CT), the target variable takes a discrete set of values: leaves stand for class labels, and branches represent conjunctions of features that lead to those labels [10, 15]. Meanwhile, many alternative machine learning methods have been shown to produce robust flood simulation models [9, 12]. Hence, there is a growing trend to move from single prediction models to ensembles of models, whose suitability depends on the application, cost and dataset [13, 16].
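Section 2.3 describes decision trees only in words. J48, one of the three classifiers evaluated later, is Weka's implementation of C4.5. C4.5 chooses splits by the gain ratio, which is the information gain normalized by the split information. The following is a minimal sketch of these quantities for one candidate split on a water-level threshold. It is not Weka's code, and the readings and labels are invented for illustration.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_gain(levels: Sequence[float], labels: Sequence[str],
               threshold: float) -> Tuple[float, float]:
    """Information gain and gain ratio for the binary split `level <= threshold`."""
    left = [y for x, y in zip(levels, labels) if x <= threshold]
    right = [y for x, y in zip(levels, labels) if x > threshold]
    total = len(labels)
    parts = [p for p in (left, right) if p]
    weighted = sum(len(p) / total * entropy(p) for p in parts)
    gain = entropy(labels) - weighted
    # Split information penalises splits that fragment the data into many pieces.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total) for p in parts)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


if __name__ == "__main__":
    levels = [1.0, 1.2, 1.5, 2.8, 3.1, 3.4]  # hypothetical river levels (m)
    labels = ["normal", "normal", "normal", "flood", "flood", "flood"]
    print(split_gain(levels, labels, threshold=2.0))  # (1.0, 1.0): a perfect split
```

The tree builder evaluates such candidate thresholds for every attribute and keeps the best one. It then repeats the process on each branch until the leaves are (nearly) pure.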
3 Proposed Methodology
The research methodology gives a structured overview of the sequence of steps followed. The overall framework of this research work consists of different phases, covering both hardware and software development. A successful early forecasting and flood warning system benefits the population, because it acts as the first stage of action for potential victims, reducing human suffering and damage to infrastructure. SMS is an appropriate alert tool for distributing information to flood victims within a particular area. Hence, the first phase is to find and select scholarly information in order to acquire the knowledge required to carry out this research. The main sources of information and knowledge in this phase are observing the river and gathering data from the riverside community. For example, the riverside community can be asked which place in the river is best for installing the proposed architecture. The second phase is the implementation phase. A water level sensor is placed in the river to obtain dynamic real-time data and send it to the flood-control device for data mining. The sensor detects whether the water level is normal, above normal or in a dangerous condition. After the data are collected and converted from analog to digital, machine learning algorithms are trained to decide whether or not there is a critical condition. Random Forest, Naive Bayes and J48 are the machine learning algorithms that perform the classification, and they are compared on their accuracy. The last phase is the data sharing phase. The data mined by the different algorithms are transmitted to the core control unit (a PIC microcontroller), which retains the output of the classifier with the highest accuracy.
Once the most accurate results are obtained, the data can be monitored and controlled from anywhere in the region where GSM service is available.
3.1 Experimental Design
The proposed framework architecture is designed as a hybrid model that combines machine learning algorithms with hardware devices able to detect water levels. Data collected from the water sensor are transmitted to the main controller, a PIC microcontroller.
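The methodology converts each sensor reading from analog to digital and then labels it as normal, above normal or dangerous. The following is a minimal sketch of that mapping under several assumptions. It assumes a 10-bit converter, since the PIC16F877A's on-chip ADC is 10-bit and gives codes 0 to 1023. It also assumes a 5 V reference and a sensor whose output voltage rises linearly with water depth. The calibration constant and both thresholds are hypothetical placeholders that would have to be set for the actual river site. This rule-based mapping only illustrates the three states; it is not the trained classifiers the paper uses for the final decision.

```python
ADC_MAX = 1023          # 10-bit converter: codes 0..1023
V_REF = 5.0             # assumed ADC reference voltage (volts)
METRES_PER_VOLT = 0.8   # hypothetical sensor calibration constant

# Hypothetical alert thresholds in metres; real values depend on the site.
ABOVE_NORMAL_M = 2.0
DANGEROUS_M = 3.0


def adc_to_metres(code: int) -> float:
    """Convert a raw ADC code to an estimated water depth in metres."""
    if not 0 <= code <= ADC_MAX:
        raise ValueError(f"ADC code out of range: {code}")
    volts = code * V_REF / ADC_MAX
    return volts * METRES_PER_VOLT


def classify_level(depth_m: float) -> str:
    """Map a depth to the three states named in the methodology."""
    if depth_m >= DANGEROUS_M:
        return "dangerous"
    if depth_m >= ABOVE_NORMAL_M:
        return "above normal"
    return "normal"


if __name__ == "__main__":
    for code in (200, 600, 900):
        depth = adc_to_metres(code)
        print(code, round(depth, 2), classify_level(depth))
```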
Software
To develop the aforementioned system, we used the Java programming language with the Arduino platform, and Weka for data mining. Java is a general-purpose programming language designed to have as few implementation dependencies as possible, letting application developers write once, run anywhere. The Arduino Software was used as our IDE. It is a cross-platform application written in Java itself. It is used to write and upload programs to Arduino-compatible boards and also, with the help of third-party cores, to other vendors' development boards.
Hardware
A number of hardware devices were used to implement the system. The first is a GSM modem, a specialized type of modem that accepts a SIM card and operates over a subscription to a mobile operator, just like a mobile phone. When a GSM modem is connected to a computer, the computer can use it to communicate over the mobile network. We used it mainly for sending and receiving SMS messages. The second is a water sensor, an electronic device designed to detect the presence of water, for example to provide an alert in time to prevent water damage. The PIC16F877A is an integrated circuit (IC) on a single chip and acts as a voltage level converter. It is capable of converting 5 V TTL logic levels to TIA/EIA-232-F levels and can take inputs of up to ±30 V. It is normally used for communication between a microcontroller and a laptop/PC.
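The GSM modem is usually driven over a serial link with the standard SMS AT commands of 3GPP TS 27.005, which SIM900-class modems support. `AT+CMGF=1` selects text mode. `AT+CMGS="<number>"` starts a message, and the message body is terminated with Ctrl+Z (byte 0x1A). The sketch below only builds the command byte sequence. Several steps are assumed and not shown: writing the bytes to the modem's serial port (for example with the third-party pyserial package), and waiting for the modem's `OK` and `>` responses between steps. The phone number is a placeholder.

```python
from typing import List

CTRL_Z = b"\x1a"  # terminates the SMS body in text mode


def sms_commands(phone_number: str, message: str) -> List[bytes]:
    """Build the AT command sequence that sends one SMS in text mode.

    A real driver must wait for "OK" after each command and for the ">"
    prompt after AT+CMGS before writing the message body.
    """
    if not message:
        raise ValueError("empty message")
    return [
        b"AT\r",                                        # check that the modem responds
        b"AT+CMGF=1\r",                                 # select SMS text mode
        f'AT+CMGS="{phone_number}"\r'.encode("ascii"),  # start a message to this number
        message.encode("ascii") + CTRL_Z,               # body, then Ctrl+Z to send
    ]


if __name__ == "__main__":
    for cmd in sms_commands("+000000000", "FLOOD WARNING: water level dangerous"):
        print(cmd)
```

Restricting the body to ASCII keeps the sketch simple. Real deployments may need the GSM 7-bit alphabet or UCS2 for other characters.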
Fig. 1. Proposed framework for flood detection
Figure 1 shows the overall process for the software and hardware development of the proposed design architecture. One water level sensor was deployed to provide real-time information to the flood control center for subsequent processing. The sensor has a specific task. It detects the water level signal and transfers the data to the microcontroller, after which the machine learning algorithms make the decision on these data. Finally, if there is a dangerous situation, an SMS message is sent through the GSM SIM900 modem.
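The flow in Fig. 1 can be sketched as a small control loop. A reading is taken, the model classifies it, and an SMS goes out only when the situation is dangerous. In this sketch, `classify` and `send_alert` are hypothetical hooks that stand in for the trained model and the GSM modem, and `threshold_rule` is an invented stand-in classifier. One behavior is an added assumption that the paper does not describe: the loop alerts only on the transition into the dangerous state, so that the same event does not trigger repeated SMS.

```python
from typing import Callable, Iterable, List, Optional


def monitor(readings: Iterable[float],
            classify: Callable[[float], str],
            send_alert: Callable[[str], None]) -> List[str]:
    """Classify each water-level reading and raise an alert on entering 'dangerous'.

    Alerting only on the transition into 'dangerous', rather than on every
    dangerous reading, is an assumption made here to avoid repeated SMS.
    """
    states: List[str] = []
    previous: Optional[str] = None
    for level in readings:
        state = classify(level)
        if state == "dangerous" and previous != "dangerous":
            send_alert(f"Flood alert: water level {level:.2f} m is dangerous")
        states.append(state)
        previous = state
    return states


def threshold_rule(level_m: float) -> str:
    """Hypothetical stand-in for the trained classifier (fixed thresholds in metres)."""
    if level_m >= 3.0:
        return "dangerous"
    if level_m >= 2.0:
        return "above normal"
    return "normal"


if __name__ == "__main__":
    sent: List[str] = []
    print(monitor([1.0, 2.5, 3.2, 3.4, 1.8, 3.1], threshold_rule, sent.append))
    print(sent)  # two alerts: at 3.20 m and at 3.10 m
```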
4 Results and Discussion
The proposed model has been implemented. It detects the water level and trains three machine learning algorithms (Random Forest, NaiveBayes and J48) to classify it. As mentioned, this research has two key parts. The first uses the Arduino and GSM devices to detect the water level in the river in real time and to collect these readings as a dataset. The second mines the collected data and trains the three selected algorithms to measure their accuracy, with the aim of improving the accuracy of flood detection methods. The rest of this section discusses the experimental results and analysis covering all the aforementioned key components. The Arduino with GSM has proved its essential role as a good data collection tool. With proper water level parameter settings, it achieves better accuracy than ordinary solutions such as the Sendo sensor. The following figures and tables show the results obtained from the algorithms, each followed by an additional description.
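Section 3 states that the three classifiers are compared and the most accurate one is kept. The following is a minimal, library-free sketch of that selection step on a held-out test set. The two threshold functions are hypothetical stand-ins for trained models (the paper trains its Random Forest, NaiveBayes and J48 models in Weka), and the test data are invented.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], str]


def accuracy(model: Model, test_set: List[Tuple[float, str]]) -> float:
    """Fraction of test instances the model labels correctly."""
    correct = sum(model(x) == y for x, y in test_set)
    return correct / len(test_set)


def select_best(models: Dict[str, Model],
                test_set: List[Tuple[float, str]]) -> Tuple[str, float]:
    """Return (name, accuracy) of the most accurate model on the test set."""
    scores = {name: accuracy(m, test_set) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    test_set = [(0.8, "normal"), (1.9, "normal"), (2.6, "flood"), (3.3, "flood")]
    models = {  # hypothetical threshold models standing in for trained classifiers
        "model_a": lambda x: "flood" if x > 2.0 else "normal",
        "model_b": lambda x: "flood" if x > 1.5 else "normal",
    }
    print(select_best(models, test_set))  # ('model_a', 1.0)
```

The test set must be kept separate from the training data. Otherwise the comparison rewards overfitting rather than real detection ability.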
Fig. 2. Normal Water Level
As Fig. 2 demonstrates, the water level is normal. However, it is also essential to note that the flood warning is increasing (red), while the normal water level, in contrast, decreases dramatically (blue). Meanwhile, the three algorithms do not differ much in terms of correctly and incorrectly classified instances while the water level is normal.
Fig. 3. Flood Water Level
However, as Fig. 3 shows, when the water level increased rapidly, the three algorithms produced different values in terms of correctly and incorrectly classified instances. Based on this analysis, the three machine learning algorithms show different variations. Table 1 gives the classifier output for each of them.
Table 1. Detected water level in terms of the three algorithms
Parameters                         Random Forest   NaiveBayes   J48
Correctly classified instances     98.7%           88.4%        84.2%
Incorrectly classified instances   22.8%           2.8%         2.9%
Root mean squared error            0.0904          0.1387       0.1970
Total number of instances          1000            1000         1000
Table 1 shows that our three experiments were conducted with imbalanced data, namely the actual data obtained from a real flood through the Arduino device. Random Forest correctly classified 98.7% of instances, whereas 22.8% were incorrectly classified. Secondly, the NaiveBayes algorithm gave better results than the J48 algorithm: it correctly classified 88.4% of instances, while incorrectly classified instances for this algorithm were 2.8%. Moreover, J48 correctly classified 84.2% of instances, whereas 6.9% were incorrectly classified. Random Forest achieved the best result compared with the other classifiers, with 98.7% correctly classified instances, while incorrectly classified instances amounted to only 22.8%, as mentioned before.
Table 2. Detected water level in terms of accuracy by class
Method          True positive   True negative   Recall
Random Forest   0.989           0.004           0.976
NaiveBayes      0.886           0.014           0.888
J48             0.842           0.021           0.842
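Tables 1 and 2 report Weka-style summary figures. For reference, the following sketch shows how these quantities follow from a confusion matrix. It computes the overall correctly and incorrectly classified percentages, plus the per-class true positive rate, false positive rate, precision and recall. The 2x2 matrix is invented for illustration and is not the paper's data.

```python
from typing import Dict, List


def summary(matrix: List[List[int]]) -> Dict[str, float]:
    """Overall percentages from a square confusion matrix (rows = actual, cols = predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return {"correct_pct": 100.0 * correct / total,
            "incorrect_pct": 100.0 * (total - correct) / total}


def per_class(matrix: List[List[int]], k: int) -> Dict[str, float]:
    """TP rate (identical to recall), FP rate and precision for class index k."""
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n)) - tp   # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(n)) - tp   # predicted k, actually other
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    return {"tp_rate": tp_rate,
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp_rate}


if __name__ == "__main__":
    # rows/cols: [normal, flood]; invented counts for 100 instances
    cm = [[60, 2],
          [3, 35]]
    print(summary(cm))  # {'correct_pct': 95.0, 'incorrect_pct': 5.0}
    print(per_class(cm, 1))
```

By construction, the correctly and incorrectly classified percentages always sum to 100%.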
Table 2 shows the accuracy by class of the classifiers. To extract more information from the water-level data, we trained several machine learning methods to find the technique that produces the best performance and accuracy. The Random Forest algorithm achieved higher performance than the other methods. It obtained the highest true positive value (0.989), whereas J48 obtained the lowest (0.842). Meanwhile, in terms of true negatives, Random Forest achieved the lowest value (0.004), whereas J48 achieved the highest (0.021). Random Forest is a supervised learning algorithm that combines a large number of decision tree models. This helps it avoid overfitting and keeps its parameters easy to set. Using this model can help and support people living around river areas, who frequently face hazards coming from the rivers, such as flooding.
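The Random Forest idea described above can be sketched with one-split trees (decision stumps) for brevity. Each tree is trained on a bootstrap sample with a random choice of feature, and the trees are combined by majority vote. A full random forest differs from this sketch: it grows deeper, unpruned trees and draws a random feature subset at every split. The two features (water level and rate of rise) and all values below are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float]  # (feature index, threshold); predicts 1 when value > threshold


def train_stump(rows: Sequence[Sequence[float]], labels: Sequence[int],
                feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    values = sorted({r[feature] for r in rows})
    # Candidates: below the minimum (always 1), midpoints, above the maximum (always 0).
    candidates = ([values[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1.0])
    best_t, best_err = candidates[0], len(rows) + 1
    for t in candidates:
        err = sum((r[feature] > t) != bool(y) for r, y in zip(rows, labels))
        if err < best_err:
            best_t, best_err = t, err
    return feature, best_t


def train_forest(rows: Sequence[Sequence[float]], labels: Sequence[int],
                 n_trees: int = 15, seed: int = 0) -> List[Stump]:
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Majority vote over all stumps (an odd n_trees avoids ties)."""
    votes = Counter(int(row[f] > t) for f, t in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (water level in cm, rise in cm/hour); label 1 = flood.
    rows = [(10, 0), (15, 1), (20, 1), (25, 2), (30, 2), (35, 3), (40, 3), (12, 1),
            (60, 7), (65, 8), (70, 8), (75, 9), (80, 9), (85, 10), (90, 10), (62, 7)]
    labels = [0] * 8 + [1] * 8
    forest = train_forest(rows, labels)
    print(predict(forest, (100, 12)), predict(forest, (0, 0)))
```

Averaging many trees that each see slightly different data is what reduces variance, and this is the overfitting protection the paragraph refers to.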
5 Conclusions
Flood detection systems have been developed to enable an immediate response by the authorities before a flood happens. The proposed system reports the current water level using an Arduino sensor network. It then sends an SMS notification through a GSM modem if there is a dangerous situation. Three machine learning algorithms were tested to classify the data. It is undeniable that the Random Forest algorithm outperformed the other machine learning methods in terms of accuracy, achieving 98.7% compared with 88.4% for the NaiveBayes algorithm. Furthermore, J48 achieved 84.2% accuracy, close to but slightly lower than NaiveBayes. The proposed method can be further improved with more advanced technologies and data mining applications in the next phase of research. For future enhancement, the proposed architecture can be extended by adding video surveillance and a GPS module to track the equipment installed in different areas. Finally, clustering algorithms can be applied alongside the machine learning algorithms to improve the results of the proposed model.
Acknowledgments. The authors would like to express their cordial thanks to SIMAD University for Research University Grant no. 15. The authors would also like to gratefully acknowledge the SIMAD Research Center for their support in making this research a success.
References
1. Khalaf, M., Hussain, A.J., Al-Jumeily, D., Fergus, P., Idowu, I.O.: Advance flood detection and notification system based on sensor technology and machine learning algorithm. In: 2015 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 105–108. IEEE (2015)
2. Halbeeg: 100,000 People Displaced by Floods in Beledweyne, 27 April 2017. https://en.halbeeg.com/2018/04/27/100000-people-displaced-by-floods-in-beledweyneofficial/
3. OCHA: Floods: Response plan. Humanitarian Country Team and partners (2018, unpublished)
4. Baydargil, H.B., Serdaroglu, S., Park, J.S., Park, K.H., Shin, H.S.: Flood detection and control using deep convolutional encoder-decoder architecture. In: 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT), pp. 1–3. IEEE (September 2018)
5. Mosavi, A., Rabczuk, T., Varkonyi-Koczy, A.R.: Reviewing the novel machine learning tools for materials design. In: Luca, D., Sirghi, L., Costin, C. (eds.) Recent Advances in Technology Research and Education, pp. 50–58. Springer International Publishing, Cham (2018)
6. Li, L., Xu, H., Chen, X., Simonovic, S.: Streamflow forecast and reservoir operation performance assessment under climate change. Water Resour. Manag. 24, 83 (2010)
7. Wu, C., Chau, K.-W.: Data-driven models for monthly streamflow time series prediction. Eng. Appl. Artif. Intell. 23, 1350–1367 (2010)
8. Zadeh, L.A.: Soft computing and fuzzy logic. In: Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers by Lotfi A Zadeh, pp. 796–804. World Scientific, Singapore (1996)
9. Choubin, B., Khalighi-Sigaroodi, S., Malekian, A., Ahmad, S., Attarod, P.: Drought forecasting in a semi-arid watershed using climate signals: a neuro-fuzzy modeling approach. J. Mt. Sci. 11, 1593–1605 (2014)
10. Choubin, B., Khalighi-Sigaroodi, S., Malekian, A., Kişi, Ö.: Multiple linear regression, multi-layer perceptron network and adaptive neuro-fuzzy inference system for forecasting precipitation based on large-scale climate signals. Hydrol. Sci. J. 61, 1001–1009 (2016)
11. Dineva, A., Várkonyi-Kóczy, A.R., Tar, J.K.: Fuzzy expert system for automatic wavelet shrinkage procedure selection for noise suppression. In: Proceedings of the 2014 IEEE 18th International Conference on Intelligent Engineering Systems (INES), Tihany, Hungary, 3–5 July 2014, pp. 163–168 (2014)
12. Hashi, A.O., Hashim, S.Z.M., Anwar, T., Ahmed, A.: A robust hybrid model based on Kalman-SVM for bus arrival time prediction. In: Saeed, F., Mohammed, F., Gazem, N. (eds.) Emerging Trends in Intelligent Computing and Informatics: Data Science, Intelligent Information Systems and Smart Computing, pp. 511–519. Springer International Publishing, Cham (2020)
13. Tiwari, M.K., Chatterjee, C.: Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 394, 458–470 (2010)
14. Mosavi, A., Chau, K.-W.: Review flood prediction using machine learning models. Water 2018, 1–41 (2018)
15. Hameed, S.S., et al.: Filter-wrapper combination and embedded feature selection for gene expression data. Int. J. Adv. Soft Comput. Appl. 10(1), 90–105 (2018)
16. Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., Pradhan, B.: A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 644, 954–962 (2018)
Extracting Semantic Concepts and Relations from Scientific Publications by Using Deep Learning
Fatima N. AL-Aswadi1,2, Huah Yong Chan1(B), and Keng Hoon Gan1
1 School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia
[email protected], {hychan,khgan}@usm.my
2 Faculty of Computer Sciences and Engineering, Hodeidah University, Hodeidah, Yemen
Abstract. The large volume of unstructured data on the web increases constantly, and with it the motivation to represent the knowledge in this data in a machine-understandable form. Ontology is one of the major cornerstones for representing information in a more meaningful way on the Semantic Web. Current ontology repositories are quite limited in either their scope or how up to date they are. In addition, current ontology extraction systems have many shortcomings and drawbacks. These include using small datasets, depending on a large number of predefined patterns to extract semantic relations, and extracting very few types of relations. The aim of this paper is to introduce a proposal for automatically extracting semantic concepts and relations from scientific publications. This paper introduces a novel relevance measurement for concepts and suggests new types of semantic relations. It also points out the use of deep learning (DL) models for semantic relation extraction. Keywords: Concept extraction · Deep learning · Ontology construction · Relevance measurements · Semantic relation discovery
1 Introduction
The substantial growth of unstructured data makes manual ontology construction a hard, laborious and time-consuming task. This unstructured data contains much useful knowledge. Unfortunately, that knowledge is only in a human-understandable form, not a machine-understandable one [1, 2]. Therefore, constructing ontologies is considered an important task for making this data machine-understandable as well as human-understandable. An ontology is a data model that represents a set of concepts, and the relationships among those concepts, within a domain [1]. Many applications, such as automated fraud detection, semantic search, decision support and question-answering (QA) systems, are built on ontologies [3–6]. Most recent research directs its efforts towards using ontologies because these systems rely on the results of knowledge modelling [7]. Thus, by using ontologies, query or scenario construction, as well as inferencing, are enriched [5, 7].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 374–383, 2021. https://doi.org/10.1007/978-3-030-70713-2_35
There are many existing ontology repositories and tools that construct, or seek to construct, ontologies manually, cooperatively or automatically. For example, WordNet is considered one of the oldest and most popular ontology repositories. It is a high-accuracy resource that was manually constructed by linguists. However, WordNet progresses quite slowly compared with the data streaming across the Web, and it lacks many modern terms, such as Covid-19, cloud computing, deep learning or even netbook [8]. Another example of an ontology repository is YAGO (Yet Another Great Ontology) [9], an ontology built on top of both WordNet and Wikipedia. YAGO uses Wikipedia category pages, rather than information extraction methods, to leverage the knowledge of Wikipedia. However, Wikipedia categories are often quite fuzzy and irregular [8]: they do not follow an expected pattern and are open for anyone to edit. This is considered one of the disadvantages of the YAGO repository. Also, YAGO uses structured data to build the ontology, which may waste space if not all arguments of n-ary facts are known. In addition, YAGO is of relatively little help if WordNet does not contain the related concepts [8]. Many other ontology repositories, such as Freebase, BabelNet and DBpedia, extract their ontologies from the structured contents of Wikipedia pages. On the other hand, many ontology extraction tools try to extract and construct ontologies either cooperatively or automatically. Examples include Text-to-Onto [10], SYNDIKATE (SYnthesis of DIstributed Knowledge Acquired from TExts) [11, 12], CRCTOL (Concept-Relation-Concept Tuple based Ontology Learning) [13], and ProMine [14].
Some of them using structured or semi-structured data as input to extract and construct the ontologies such as ProMine and Text-to-Onto, while others using unstructured data to extract the ontologies such as SYNDIKATE and CRCTOL. However, most of the existing ontology extraction tools have many shortcomings and drawbacks. For example, some of them depend on human intervention in the whole of their tasks such as Text-to-Onto. In addition, most of them, such as Text-to-Onto and CRCTOL, depend on predefined templates for relation extraction that lead to very low recall results [2, 15, 16]. Moreover, some of these tools use small dataset such as Text-to-Onto which used only 21 web articles as the input dataset. Nowadays, many researches that try to extract ontologies from scientific publications have begun to emerge, such as in [17, 18]. The [17] study is an association rules-based approach for enriching the domain ontology rather than extracting new domain ontology. This study depends partially on lexical similarity measures, but in many cases, there is no correlation between the lexical similarity of concept names and the semantic concept similarity because of the high complexity of language or the uncoordinated ontology development. An example of this shortcoming, the concepts pair (table, stable) has lexical similarity while there is not semantically matching. In the [18] study, the authors defined NTNU system that aims to extract the keyphrases and relations from scientific publications using multiple conditional random fields
F. N. AL-Aswadi et al.
(CRFs). This study has many limitations, as the authors themselves state. The main one is that it extracts only two types of relations: synonym and hyponym. In addition, the authors state that their multiple CRF models, aided by rules, improved performance on the development set but performed worse on the test set. Beyond all of the above, with continuous scientific development, new fields and terms constantly appear, such as Covid-19 in the medical domain or deep learning in the IT domain. There is therefore a serious need for a new technique that can automatically extract and construct ontologies that represent this knowledge. This paper proposes automatically extracting semantic concepts and relations from scientific publications using DL. The rest of this paper is organized as follows: Section 2 looks at the challenges of ontology construction. Section 3 explores the Deep Belief Network (DBN), while Sect. 4 presents the proposed work. Finally, Sect. 5 concludes the paper.
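The (table, stable) example can be made concrete with a character-level measure. The sketch below uses normalized Levenshtein edit distance as a stand-in for "lexical similarity"; this is an assumption for illustration, since [17] does not necessarily use this particular measure, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Normalize the edit distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Lexically close but semantically unrelated (one inserted character):
    print(lexical_similarity("table", "stable"))  # ≈ 0.833
    # Synonyms (the Equal relation) that share few characters score low:
    print(lexical_similarity("data", "information"))
```

A high score for (table, stable) and a low one for (data, information) is exactly the mismatch between lexical and semantic similarity described above.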
2 Ontology Construction Challenges
The ontology construction process may be conducted in one of three ways: manual construction (fully performed by experts), cooperative construction (most or all ontology construction tasks are supervised by experts), and automatic construction (performed automatically with limited intervention by users or experts). The two main tasks of the ontology construction process are extracting the concepts, and extracting and mapping the relationships between these concepts. Achieving high precision and recall for the extracted relationships means achieving high precision and reliability for the constructed ontology. The four main shortcomings of most existing ontology construction research, which are considered the main challenges for automatic ontology construction, are:
1. Most of them do not use an efficient relevance measurement to avoid noisy data, such as [10, 13].
2. Most of them depend on a large number of predefined patterns, such as [8, 13, 17].
3. Many of them use a small dataset for constructing the ontologies, such as [10, 13].
4. Most of them extract very limited relations that rarely go beyond synonym, hyponym, hypernym, meronym and/or holonym relations, such as [8, 13, 14, 18].
The validity of these challenges is discussed and evidenced in our previous work [19], which presents and discusses in detail the approaches and prominent systems of ontology construction and their challenges.
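Because the quality of a constructed ontology is tied to the precision and recall of its extracted relationships, a small scoring sketch may help. It assumes that relations are represented as (concept, relation type, concept) triples and compared against a hypothetical gold-standard set; the names and example triples are illustrative, not part of the proposed system.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (concept_1, relation_type, concept_2)


def precision_recall_f1(extracted: Iterable[Triple], gold: Iterable[Triple]):
    """Score extracted relation triples against a gold-standard set:
    precision = |extracted ∩ gold| / |extracted|, recall = |extracted ∩ gold| / |gold|."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # correctly extracted relations
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("bubble sort", "Is_A", "sorting algorithm"),
            ("data", "Equal", "information"),
            ("red blood cells", "Part_of", "blood")}
    # A hypothetical extractor with only Is_A patterns: precise but incomplete.
    extracted = {("bubble sort", "Is_A", "sorting algorithm")}
    print(precision_recall_f1(extracted, gold))  # ≈ (1.0, 0.333, 0.5)
```

The toy run mirrors challenge 2: a pattern-only extractor can reach perfect precision while missing most relations.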
3 Deep Belief Network
DL is a branch of neural networks (NNs); the difference between traditional NNs and DL lies in their architectures. Traditional NNs have shallow architectures (one hidden layer), while DL has deep architectures (more than one hidden layer), and every hidden layer learns
Extracting Semantic Concepts and Relations from Scientific Publications
new features (concepts or relations) extracted from the previous layer. Shallow architectures can effectively solve many simple or well-constrained problems, but their modelling and representational power is limited [20]. Hence, for more complicated real-world applications, such as human speech and natural language understanding, where there are not enough predefined patterns or no clear perception of the problem, deep architectures handle these problems better than shallow architectures [20]. DL can also handle large amounts of data effectively and efficiently. The Deep Belief Network (DBN), with its variations, is one of the milestone models in DL [21–23]. It is a multi-layer, unsupervised or supervised, feedforward architecture. A DBN is a generative graphical model that consists of a stack of Restricted Boltzmann Machines (RBMs) [20, 24–26]. An RBM is a symmetrical graph consisting of two layers, a layer of visible nodes and a layer of hidden nodes; each visible node is connected to each hidden node, and there are no connections within the same layer [20, 24, 27]. Figure 1 shows an example of a DBN. Each layer in a DBN has a double role: it serves as the hidden layer for the nodes that come before it and as the visible layer for the nodes that come after it. DBN training can be discriminative, for inference or classification problems, or generative, to generate training data [21, 28].
Fig. 1. An example of a DBN stacked from 3 RBMs
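To make the RBM and stacking description concrete, here is a minimal NumPy sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1), the standard rule used in greedy DBN pre-training [26], followed by greedy layer-wise stacking of three RBMs as in Fig. 1. The layer sizes, learning rate and epoch count are arbitrary illustration values, the class and function names are hypothetical, and supervised (discriminative) fine-tuning is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM: a visible and a hidden layer, every visible unit connected
    to every hidden unit through one symmetric weight matrix W, and no
    connections inside a layer."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)  # same W, other direction

    def cd1_step(self, v0, lr):
        """One contrastive-divergence (CD-1) update on a mini-batch v0."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        pv1 = self.visible_probs(h0)                             # reconstruction
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(((v0 - pv1) ** 2).mean())  # reconstruction error


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: the hidden activations of each trained
    RBM become the visible data of the next one, so each layer is 'hidden'
    for the layer below and 'visible' for the layer above."""
    rng = np.random.default_rng(seed)
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms


def dbn_features(rbms, data):
    """Propagate data up the stack; the top activations could feed a classifier
    (e.g. a softmax layer) for discriminative fine-tuning."""
    for rbm in rbms:
        data = rbm.hidden_probs(data)
    return data


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = (rng.random((100, 20)) < 0.3).astype(float)  # toy binary feature vectors
    dbn = pretrain_dbn(X, layer_sizes=[16, 12, 8])   # three stacked RBMs
    print(dbn_features(dbn, X).shape)                 # (100, 8)
```

Binary inputs suit the Bernoulli RBM used here, which matches the binary feature vectors the proposal builds in Sect. 4.3.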
4 Methodology
Our proposed work aims to handle the above shortcomings. It suggests an enhanced concept relevance measure to handle the first shortcoming, six additional relation types to handle the fourth, and DL techniques to handle the second and third. DL is chosen because it can handle large amounts of data efficiently and effectively, and because predefined patterns can give reasonable precision but very low recall, since any relation that does not fit the predefined patterns cannot be detected. DL, in contrast, learns its features from the data rather than relying on predefined patterns [28]. We proposed these suggestions based on our knowledge and the literature that we presented and discussed in [19].
4.1 Concept Relevance Measurements
In existing studies on ontology construction, the concept relevance measurements used are Term Frequency-Inverse Document Frequency (TF-IDF), shown in Eqs. (1), (2) and (3); Domain Consensus (DC), shown in Eqs. (4) and (5); Domain Relevance (DR), shown in Eq. (6); and Domain Relevance Measure (DRM), shown in Eqs. (7) and (8).

$$tf = \frac{\text{count of concept } c \text{ in } d}{\text{total number of concepts in } d} \tag{1}$$

$$idf = \log_2 \frac{\text{size of } d}{\text{count of documents where concept } c \text{ appears}} \tag{2}$$

$$tf\text{-}idf = tf \times idf \tag{3}$$

$$DC = \sum_{d_j \in D} p(c, d_j) \log_2 \frac{1}{p(c, d_j)} \tag{4}$$

$$p(c, d_j) = \frac{freq(c \in d_j)}{\sum_{d_k \in D} freq(c \in d_k)} \tag{5}$$

$$DR(c, D_j) = \frac{freq(c \in D_j)}{\sum_{k=1}^{n} freq(c \in D_k)} \tag{6}$$

$$DRM = \frac{tf(c)}{\max(tf)} \times \frac{|\log \lambda(c)| - \min|\log \lambda|}{\max|\log \lambda| - \min|\log \lambda|} \times \frac{df(c)}{\min(df)} \tag{7}$$

$$\lambda(c) = \frac{\max_{p}\, p^{k_1}(1-p)^{n_1-k_1}\, p^{k_2}(1-p)^{n_2-k_2}}{\max_{p_1, p_2}\, p_1^{k_1}(1-p_1)^{n_1-k_1}\, p_2^{k_2}(1-p_2)^{n_2-k_2}}, \quad \lambda \in [0, 1] \tag{8}$$
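The measures in Eqs. (1)–(8) can be sketched in a few functions. This is an illustrative reading, not the authors' code. It assumes that each document is a list of already-extracted concepts and that tf is computed within one document while idf is computed over the document collection (the usual TF-IDF reading). It also assumes that the maxima in Eq. (8) are taken at the maximum-likelihood estimates p = (k1+k2)/(n1+n2), p1 = k1/n1 and p2 = k2/n2. The zero-frequency guards and the function names are hypothetical additions.

```python
import math


def tf_idf(concept, doc, corpus):
    """Eqs. (1)-(3); doc is a list of extracted concepts, corpus a list of docs."""
    tf = doc.count(concept) / len(doc)
    df = sum(1 for d in corpus if concept in d)
    return tf * math.log2(len(corpus) / df) if df else 0.0


def domain_consensus(concept, domain_docs):
    """Eqs. (4)-(5): base-2 entropy of the concept's distribution over the
    documents of one domain; higher means more evenly spread usage."""
    freqs = [d.count(concept) for d in domain_docs]
    total = sum(freqs)
    return sum((f / total) * math.log2(total / f) for f in freqs if f) if total else 0.0


def domain_relevance(concept, domains, j):
    """Eq. (6): share of the concept's frequency that falls in collection D_j."""
    freqs = [sum(d.count(concept) for d in docs) for docs in domains]
    total = sum(freqs)
    return freqs[j] / total if total else 0.0


def _binom_loglik(p, k, n):
    """k*log p + (n-k)*log(1-p), with the convention 0*log 0 = 0."""
    out = k * math.log(p) if k else 0.0
    return out + ((n - k) * math.log(1 - p) if n - k else 0.0)


def log_lambda(k1, n1, k2, n2):
    """log of Eq. (8), using the closed-form maximizers, so log(lambda) <= 0."""
    p = (k1 + k2) / (n1 + n2)
    return (_binom_loglik(p, k1, n1) + _binom_loglik(p, k2, n2)
            - _binom_loglik(k1 / n1, k1, n1) - _binom_loglik(k2 / n2, k2, n2))


def drm(stats):
    """Eq. (7) for all concepts at once (it needs corpus-wide min/max values).
    stats maps concept -> (tf, df, k1, n1, k2, n2)."""
    abs_ll = {c: abs(log_lambda(*s[2:])) for c, s in stats.items()}
    max_tf = max(s[0] for s in stats.values())
    min_df = min(s[1] for s in stats.values())
    lo, hi = min(abs_ll.values()), max(abs_ll.values())
    span = (hi - lo) or 1.0  # guard: every concept equally discriminative
    return {c: (s[0] / max_tf) * ((abs_ll[c] - lo) / span) * (s[1] / min_df)
            for c, s in stats.items()}
```

Note that drm needs statistics for every candidate concept before it can score any of them, because Eq. (7) normalizes by corpus-wide minima and maxima.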
In Eqs. (1)–(8), c refers to the concept, d to the corpus, D to the domain corpora, λ to the likelihood ratio, and n to the number of document collections; k1 and k2 are the frequencies of concept c in the domain and in the contrasting domain, and n1 and n2 are the total numbers of concepts in the domain and in the contrasting domain, respectively; p is the conditional probability of concept c, and p1 and p2 are the probabilities of concept c in the domain and in the contrasting domain, respectively; and df is the document frequency of concept c. These measurements are good in some respects and weak in others. To name a few, TF-IDF is used in some studies, such as [10, 29–31], to select the relevant concepts or objects for a domain. However, it is not suitable for identifying the significant concepts of a corpus [13]: it is a trustworthy measure for identifying important keywords in individual documents, but not across the whole corpus. The DRM measure suggested and used in [13] is a good measure for identifying the significant concepts of a small corpus. However, according to our experiments, it is not efficient for big corpora because it estimates the likelihood ratio for each term, which causes an extremely high computing cost. DC and DR are used together in [32] to identify the significant concepts of the selected domain. However, the DR value depends merely on the concept's frequency in the target domain corpus and the contrasting corpora, so if the size of one of these corpora is adjusted, the DR result changes greatly [13]. Beyond all this, none of the previous measurements takes the time factor into account when identifying the significant concepts of a domain. To explain why the time factor matters, suppose that we have corpora in the IT domain. Programming language is a concept in the IT domain and, at the same time, a sub-domain of it with its own related concepts. Now suppose that concept extraction yields the following instances: Basic language, Pascal language, C language, C# language, Java language, Python language and XXX language (a hypothetical new language that has just appeared). Without a time factor in the relevance measure, the newly appeared XXX language might receive a low relevance value and not be identified as significant, while Basic language is identified as significant. Using the time factor is therefore essential for identifying relevant concepts: with it, XXX language would receive a better relevance value and be identified as significant.
In contrast, Basic language would receive a low relevance value, be identified as “old” (an old programming language), and be less significant. Based on all the above, we suggest a new concept relevance measurement that aims to solve these problems: the Domain Time Relevance (DTR) measure. It estimates the rate of repetition of the concept across all time periods of the domain corpora, so that the more the concept's frequency grows as time progresses, the greater its degree of relevance. The domain time relevance of a concept c in domain D is given by Eq. (9):

$$DTR = \sum_{i=1}^{n-1} \left[ DC\left(c, D_{t_{i+1}}\right) - DC\left(c, D_{t_i}\right) \right] \tag{9}$$
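The DTR idea can be illustrated with a small sketch. It assumes that Eq. (9) reads DTR = sum over i = 1..n−1 of [DC(c, D_{t_{i+1}}) − DC(c, D_{t_i})], that each period's corpus is a list of documents already reduced to extracted concepts, and it reuses the DC of Eqs. (4)–(5). The function names and toy corpora, including the hypothetical "XXX language", are illustrative and not the authors' implementation.

```python
import math


def domain_consensus(concept, docs):
    """Eqs. (4)-(5): base-2 entropy of the concept's spread over the documents."""
    freqs = [d.count(concept) for d in docs if concept in d]
    total = sum(freqs)
    return sum((f / total) * math.log2(total / f) for f in freqs) if total else 0.0


def domain_time_relevance(concept, corpora_by_period):
    """Sum of the changes in DC between consecutive time periods.
    corpora_by_period = [D_t1, ..., D_tn], oldest period first."""
    dc = [domain_consensus(concept, docs) for docs in corpora_by_period]
    return sum(dc[i + 1] - dc[i] for i in range(len(dc) - 1))


if __name__ == "__main__":
    # Hypothetical IT corpora for three periods, each document reduced to its concepts.
    periods = [
        [["basic language"], ["basic language"], ["pascal language"]],  # t1
        [["basic language"], ["xxx language"], ["c language"]],         # t2
        [["xxx language"], ["xxx language"], ["python language"]],      # t3
    ]
    print(domain_time_relevance("basic language", periods))  # -1.0 (fading)
    print(domain_time_relevance("xxx language", periods))    # 1.0 (emerging)
```

Under this reading, the sum telescopes to DC(c, D_{t_n}) − DC(c, D_{t_1}), so the score reflects how the concept's spread changed between the first and last periods.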
In Eq. (9), c refers to the concept, t_i to the i-th time period, D_{t_i} to the domain corpus for time period t_i, and n to the number of time periods.
4.2 Semantic Relations
Most existing studies extract very limited relations that rarely go beyond synonym, hyponym, hypernym, meronym and/or holonym relations, such as [8, 13, 14, 18]. Based on our knowledge and an initial analysis of around 1000 scientific articles collected from the Scopus search engine, we define the types of semantic relations between concepts in scientific domains as shown in Table 1. We suggest six additional relation types (homonyms, usage, result, comparison, model and dependence) to handle the semantic relations shortcoming.
Table 1. Semantic relation types

Relation type       | Example                                      | Linguistic relation
Equal               | Data, Information                            | Synonyms
Is_A                | Bubble Sort, Sorting Algorithm               | Hyponyms / Hypernyms
Has_A               | Algorithm, Performance                       | Holonyms
Different_of        | Plant, Plant; or CNN^a, CNN^b                | Homonyms
Part_of             | Red Blood Cells, Blood                       | Meronyms
Used_to / Used_by   | Technology, Waste Food / Technology, Human   | Usage
Result_of           | Reliable Ontology, Precision Relation        | Result
Compared_to         | Bubble Sort, Merge Sort                      | Comparison
Use_A               | Image Classification, Machine Learning       | Model
Depend_On           | Performance, Data Size                       | Dependence

a Name of a TV news channel.
b Abbreviation of Convolutional Neural Network.
4.3 Extracting Semantic Concepts and Relations
Figure 2 shows the workflow of the proposed work for extracting semantic concepts and relations from scientific publications. In our previous work [19], we presented and discussed in detail the literature on ontology construction, OL, DL, and DL for OL. In this paper, we introduce a proposal for extracting semantic concepts and relations using a DBN. The DBN is used to classify the extracted concepts and to extract the relations between concepts. It is also used to classify the extracted semantic relations under the main relation types defined in Table 1. First, the text is pre-processed, the concepts are extracted using n-grams and other concept extraction methods, and the proposed relevance measure is applied; then the term and tag bags are built in binary representation. The system then assigns part-of-speech (POS) and syntactic tags to each individual term, also in binary representation. Combining these builds the feature vectors (the training file for the DBN). After an appropriate DBN is built and trained, it can be used to classify the concepts and to extract the relations. Concept classification using the DBN proceeds in two processes. The first is the detection process, in which the DBN has only two target outputs for each input vector, “yes” and “no”: “yes” means that c1 ⊆ c2, while “no” means that c1 ⊄ c2 (ci refers to concept i). The second is the classification process, in which the second and third upper levels of the k most relevant concepts are used as the DBN target outputs; the k most relevant concepts are identified using the proposed measures. The same processes are applied to semantic relation detection and classification, except that the target outputs of the classification process are the relation types identified in Table 1.
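The feature-vector step can be sketched as follows. This sketch assumes that a concept pair is encoded as the concatenation of a binary term bag for each concept and a binary bag of the POS/syntactic tags assigned to their terms. It also assumes that the combined Used_to / Used_by row of Table 1 is split into two separate classification targets. The vocabularies, tag set and function names are hypothetical; the text does not specify the exact layout.

```python
RELATION_TYPES = ["Equal", "Is_A", "Has_A", "Different_of", "Part_of", "Used_to",
                  "Used_by", "Result_of", "Compared_to", "Use_A", "Depend_On"]  # Table 1


def binary_bag(items, vocab):
    """Binary (0/1) bag: set position vocab[item] for every known item."""
    vec = [0] * len(vocab)
    for item in items:
        if item in vocab:
            vec[vocab[item]] = 1
    return vec


def pair_vector(c1, c2, tags, term_vocab, tag_vocab):
    """DBN input for a concept pair: term bag of c1, term bag of c2,
    and a bag of the POS/syntactic tags assigned to their terms."""
    return (binary_bag(c1.lower().split(), term_vocab)
            + binary_bag(c2.lower().split(), term_vocab)
            + binary_bag(tags, tag_vocab))


def detection_target(c1_subsumed_by_c2):
    """Detection process: 1 for "yes" (c1 ⊆ c2), 0 for "no"."""
    return 1 if c1_subsumed_by_c2 else 0


def relation_target(relation):
    """Relation classification process: index of the Table 1 relation type."""
    return RELATION_TYPES.index(relation)


if __name__ == "__main__":
    term_vocab = {"bubble": 0, "sort": 1, "sorting": 2, "algorithm": 3}
    tag_vocab = {"NN": 0, "JJ": 1, "VB": 2}
    x = pair_vector("Bubble Sort", "Sorting Algorithm", ["NN"], term_vocab, tag_vocab)
    print(x, detection_target(True), relation_target("Is_A"))
    # [1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0] 1 1
```

Keeping every position binary matches the binary representation described above and the Bernoulli visible units of a standard RBM.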
Fig. 2. Workflow to extract semantic concepts and relations from scientific publications.
5 Conclusion
In this paper, we introduced a new relevance measurement for concepts, as well as six new types of semantic relations: homonyms, usage, result, comparison, model and dependence. Furthermore, we presented a proposal for automatically extracting semantic concepts and relations from scientific publications using a DBN. This proposal aims to address the four main shortcomings of current ontology extraction systems pointed out above. In future work, we will illustrate in detail, with experimental results, how the proposed work performs and how it enhances the ontology construction process.
References
1. Mishra, S., Jain, S.: A study of various approaches and tools on ontology. In: 2015 IEEE International Conference on Computational Intelligence & Communication Technology (CICT), pp. 57–61 (2015)
2. Maimon, O., Browarnik, A.: Ontology learning from text: why the ontology learning layer cake is not viable. Int. J. Signs Semiot. Syst. 4(2), 1–4 (2015)
3. Zou, X.: A survey on application of knowledge graph. J. Phys. Conf. Ser. 1487, 012016 (2020)
4. Ergeta, M.: Introduction to Knowledge Graphs and their Applications (2019). https://medium.com/analytics-vidhya/introduction-to-knowledge-graphs-and-their-applications-fb5b12da2a8b. Accessed 30 Sept 2020
5. da Silva, J.W.F., Venceslau, A.D.P., Sales, J.E., Maia, J.G.R., Pinheiro, V.C.M., Vidal, V.M.P.: A short survey on end-to-end simple question answering systems. Artif. Intell. Rev. 53(7), 5429–5453 (2020)
6. Al-Ghuribi, S.M., Noah, S.A.M.: Multi-criteria review-based recommender system-the state of the art. IEEE Access 7, 169446–169468 (2019)
7. Franco, W., Avila, C.V.S., Oliveira, A., Maia, G., Brayner, A., Vidal, V.M.P., et al.: Ontology-based question answering systems over knowledge bases: a survey. In: ICEIS, vol. 1, pp. 532–539 (2020)
8. Arnold, P., Rahm, E.: Extracting semantic concept relations from Wikipedia. In: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics, WIMS14, Thessaloniki, Greece, pp. 1–11. ACM (2014)
9. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada, pp. 697–706. ACM (2007)
10. Maedche, A., Volz, R.: The text-to-onto ontology extraction and maintenance environment. In: Proceedings of the ICDM-Workshop on Integrating Data Mining and Knowledge Management, San Jose, California, USA (2001)
11. Hahn, U., Romacker, M.: The SYNDIKATE text knowledge base generator. In: Proceedings of the 1st International Conference on Human Language Technology Research, San Diego, pp. 1–6. Association for Computational Linguistics (2001)
12. Hahn, U., Marko, K.G.: Ontology and lexicon evolution by text understanding. In: Proceedings of the ECAI 2002 Workshop on Machine Learning and Natural Language Processing for Ontology Engineering, OLT 2002, Lyon, France (2002)
13. Jiang, X., Tan, A.H.: CRCTOL: a semantic-based domain ontology learning system. J. Am. Soc. Inform. Sci. Technol. 61(1), 150–168 (2010)
14. Gillani Andleeb, S.: From text mining to knowledge mining: an integrated framework of concept extraction and categorization for domain ontology, p. 146. Department of Information Systems, Budapesti Corvinus Egyetem, Budapest (2015)
15. Al-Ghuribi, S.M., Alshomrani, S.: Bi-languages mining algorithm for extraction useful web contents (BiLEx). Arab. J. Sci. Eng. 40(2), 501–518 (2015)
16. Wong, W., Liu, W., Bennamoun, M.: Ontology learning from text: A look back and into the future. ACM Comput. Surv. (CSUR) 44(4), 20 (2012)
17. Paiva, L., Costa, R., Figueiras, P., Lima, C.: Discovering semantic relations from unstructured data for ontology enrichment: association rules based approach. In: 2014 9th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6 (2014)
18. Lee, L.-H., Lee, K.-C., Tseng, Y.-H.: The NTNU system at SemEval-2017 task 10: extracting keyphrases and relations from scientific publications using multiple conditional random fields. In: Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, Canada, pp. 951–955. Association for Computational Linguistics (2017)
19. Al-Aswadi, F.N., Chan, H.Y., Gan, K.H.: Automatic ontology construction from text: a review from shallow to deep learning trend. Artif. Intell. Rev. 53(6), 3901–3928 (2020)
20. Deng, L.: Three classes of deep learning architectures and their applications: a tutorial survey. APSIPA Trans. Sig. Inf. Process. (2012)
21. Mo, D.: A survey on deep learning: one small step toward AI. Department of Computer Science, University of New Mexico, USA (2012)
22. Arel, I., Rose, D.C., Karnowski, T.P.: Deep machine learning - a new frontier in artificial intelligence research [research frontier]. IEEE Comput. Intell. Mag. 5(4), 13–18 (2010)
23. Chen, X.-W., Lin, X.: Big data deep learning: challenges and perspectives. IEEE Access 2, 514–525 (2014)
24. Deng, L., Yu, D.: Deep learning. Sig. Process. 7, 3–4 (2014)
25. Hinton, G.E.: Deep belief networks. Scholarpedia 4(5), 5947 (2009)
26. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
27. Fischer, A., Igel, C.: An introduction to restricted Boltzmann machines. In: Iberoamerican Congress on Pattern Recognition, pp. 14–36. Springer, Heidelberg (2012)
28. Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Reyes, M.P., et al.: A survey on deep learning: algorithms, techniques, and applications. ACM Comput. Surv. 51(5), Article 92 (2018)
29. Dahab, M.Y., Hassan, H.A., Rafea, A.: TextOntoEx: automatic ontology construction from natural English text. Exp. Syst. Appl. 34(2), 1474–1480 (2008)
30. Cimiano, P., Völker, J.: Text2Onto. In: Montoyo, A., Muñoz, R., Métais, E. (eds.) Natural Language Processing and Information Systems: Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, 15–17 June 2005, pp. 227–238. Springer, Heidelberg (2005)
31. Kolbe, N., Vandenbussche, P.-Y., Kubler, S., Traon, Y.L.: LOVBench: ontology ranking benchmark. In: Proceedings of the Web Conference 2020, Taipei, Taiwan, pp. 1750–1760. Association for Computing Machinery (2020)
32. Missikoff, M., Velardi, P., Fabriani, P.: Text mining techniques to automatically enrich a domain ontology. Appl. Intell. 18(3), 323–340 (2003)
Effectiveness of Convolutional Neural Network Models in Classifying Agricultural Threats
Sayem Rahman (corresponding author), Murtoza Monzur, and Nor Bahiah Ahmad
Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia UTM, 81310 Skudai, Johor, Malaysia
{rsayem2,muamurtoza2}@graduate.utm.my
Abstract. Smart farming has recently been gaining traction as a way to make farming more productive and effective. However, pests such as monkeys and birds are a constant threat to agricultural goods, primarily because they destroy and feed on crops. Traditional ways of deterring these threats are no longer effective. Highly effective deep learning models can pave a new way for the growth of smart farming. This study investigates how deep learning convolutional neural network (CNN) models can be applied to classify birds and monkeys in agricultural environments, and evaluates the performance of these models. Four CNN variants, namely VGG16, VGG19, InceptionV3 and ResNet50, were used, and experiments were conducted on two datasets. The experimental results demonstrate that all the models are capable of performing classification in different situations. Data quality, model parameters and the hardware used during the experiments also influence the performance of the considered models. It was also found that the convolutional layers of the models play a vital role in classification performance. The experimental results will assist smart farming by opening new possibilities that may help a country's agriculture industry, where efficient classification and detection of threats are of potential importance.
Keywords: Smart farming · Convolutional neural network (CNN) · Deep learning · Computer vision · Image processing
1 Introduction
With the rise of civilization and modernization, mankind has greatly benefited from many aspects of science and technology, but the shortcomings have also been increasing. The available resources have not kept pace with the need to feed the world, whose population is larger than ever before. Consequently, hunger remains a problem, and the threats to agricultural goods cannot be ignored. Among the threats to agricultural farms is the pest problem. Birds and monkeys are a prominent threat to this sector and directly affect the productivity of any farm. Recent studies have shown that millions of dollars are lost because animals and birds destroy crops. The traditional ways of preventing this are failing
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 384–395, 2021. https://doi.org/10.1007/978-3-030-70713-2_36
Effectiveness of Convolutional Neural Network Models
[1], as they are not sufficiently effective; it is therefore time to find a sustainable solution that minimizes crop damage as much as possible. Moreover, animals are becoming smarter, which calls for more technical and up-to-date solutions. Using deep learning and image processing to classify and detect these pests could be a possible solution to this problem. Reduced production efficiency due to bird and animal attacks on agricultural fields, orchards and ponds is both a long-standing and a high-cost problem [2]. There are millions of cases worldwide and billions of dollars of losses every year due to this under-considered yet highly damaging problem. Traditional techniques are not effective enough to control bird and animal attacks on farms. At the same time, farming is the most important source of gross domestic product (GDP), contributing around 22% of a country's overall GDP [1]. Traditional techniques, such as scarecrows or other violent ways of scaring away pests, are not long-term solutions; this modern era needs a long-term and effective way to scare away birds and animals. Previous research mostly focused on bird detection and producing specific sounds, using the Viola-Jones algorithm, image pre-processing, feature extraction and template matching [1–3]. However, the long-term effects of previous works remain questionable because of the smart nature of some animals, such as monkeys. Therefore, this study addresses both birds and animals, specifically monkey classification, using the Wild Bird Image [4] and 10 Monkey Species [5] datasets. When it comes to classifying an intruder, images are the best option, as they enable fast classification and recognition of the target object. The application of deep learning convolutional neural networks (CNNs) in this sector is becoming the norm.
Numerous CNN models are being used successfully for object classification and detection. Using CNN models to classify intruders through image classification allows farmers to be notified of possible threats to their farms. If implemented successfully, this could help grow a country's economy. This study applied several existing CNN variants to two datasets to investigate the performance of these models. The model variants considered are VGG16, VGG19, InceptionV3 and ResNet50. The winner of the ILSVRC 2014 contest was GoogLeNet, also known as the Inception model, and VGGNet was the runner-up. The other two models also achieve very high accuracy in image classification. The remainder of this paper is organized as follows. Section 2 provides an overview of previous work in this domain. Section 3 explains the architecture of the convolutional neural network (CNN) models applied. Section 4 presents the proposed methodology. The results and performance of the CNN models are discussed in Sect. 5. Finally, Sect. 6 concludes this paper and discusses potential future work.
2 Related Work
A survey by the USDA's National Agricultural Statistics Service (NASS) in 1994 shows that nearly half of field crop producers face losses due to wildlife invasion, estimated at around $316 million [2]. Another survey by USDA NASS (1999) in
S. Rahman et al.
seven states and two crops reported losses of tens of millions of dollars and a reduction in product quality each year caused directly by birds [6]. Deterrents such as scarecrows, gas cannons, sensor-triggered devices, laser beams and flashlights remain effective only for a short time [2, 6, 7]. Classifying and detecting the actual threat is always challenging; deep learning-based computer vision solutions tend to achieve better performance here because they classify objects accurately. Deep learning is used in many applications, such as image classification, speech recognition and natural language processing [8]. Deep learning-based convolutional neural network (CNN) models are currently used to achieve high precision and effectiveness when dealing with large amounts of image data. CNNs are feedforward neural networks trained with backpropagation, and they comprise many hierarchical layers, such as feature map layers and classification layers [9]. A proposed project named “Smart Scarecrow” [3] used image processing with feature extraction and template matching to detect birds, and then used specific sounds to deter them; it achieved a satisfactory accuracy of around 90.47%. Another research team used Context-SVM for object classification and detection on the VOC 2007 and 2010 datasets, achieving the highest mean average precision (mAP) of 73.0 and winning a competition on the VOC datasets [10]. Image processing and template matching were used to successfully detect and analyze the motion of birds using the Wild Birds in a Wind Farm dataset [1]. Deep learning models were used in [11] to locate monkey features in their natural habitat; that 2018 study used a dataset of 6000 high-resolution monkey images from the Primate Research Institute [11].
Another research group successfully identified bird species in their natural habitats by extracting features such as the bird's colour, wings and beak. Using a CNN, that study achieved an 80% success rate in predicting the bird's class from images of the Caltech-UCSD Birds 200 (CUB-200-2011) dataset [12]. The work in [13] involved a complete system for classifying and detecting birds and other objects, using pre-trained CNN models such as Inception v3, ResNet v2, NASNet Mobile and MobileNet. This study achieved 100% accuracy in some cases, indicating the potential and high success rate of trained CNN models [13].
3 The Architecture of Convolutional Neural Network (CNN)
A CNN is capable of extracting features from the input data: the neurons in each layer extract higher-level abstract features from the features extracted by the previous layers. The architecture of deep learning CNN models has four main types of building blocks, namely Convolution Layers, Non-Linearity (Rectified Linear Unit), Max-Pooling, and Fully Connected or Classification Layers [14]. Figure 1 illustrates the basic CNN architecture and shows the different layers involved.
Fig. 1. Basic convolutional neural network architecture showing different layers [14].
3.1 Convnet Layers
First Layer: Convolutional Layer. A CNN model has at least one convolutional layer and one or more fully connected layers. Identifying features in the dataset with the help of feature maps is known as the convolution operation, and convolutional layers handle this operation.
Pooling Layer. The max-pooling layer down-samples each feature map: it extracts the maximum value within each pooling window and places it in the pooled feature map, which serves as the input for the next layer. This reduces noise and the size of the representation, which helps to reduce overfitting. Other common pooling operations are average pooling, stochastic pooling, spectral pooling, spatial pyramid pooling and multiscale orderless pooling.
Fully Connected Layer. The fully connected layers, positioned after the convolutional layers, are responsible for producing the classification output of the classifier. High-level reasoning decisions are made in the fully connected layers.
Loss Layer. Besides the four main layers, there is also a loss layer, which takes the output of the final fully connected layer and computes the loss, or error: the penalty for the discrepancy between the actual and the desired output. Softmax loss, used to predict a single class out of a set of mutually exclusive classes, is a widely used loss function.
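The layer operations above can be illustrated with a minimal NumPy sketch of a single-channel convolution, ReLU, 2 x 2 max pooling and the softmax loss. It is a teaching sketch under simplifying assumptions (one channel, stride 1, no padding, explicit loops); real frameworks such as those behind VGG16 or ResNet50 use far more efficient, multi-channel implementations. The function names are illustrative.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D convolution as computed in CNN layers: slide the kernel and
    take the elementwise-product sum (cross-correlation, kernel not flipped)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(0.0, x)


def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size window
    (trailing rows/columns that do not fill a window are dropped)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    windows = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return windows.max(axis=(1, 3))


def softmax_cross_entropy(logits, target):
    """Softmax loss for one sample: -log of the softmax probability of target."""
    shifted = logits - logits.max()  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4)
    kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])  # toy 2x2 filter
    fmap = relu(conv2d(image, kernel))             # 3x3 feature map, all 5.0
    print(fmap.shape)                              # (3, 3)
    print(max_pool(image))                         # window maxima 5, 7, 13, 15
    print(softmax_cross_entropy(np.array([2.0, 0.5, -1.0]), target=0))
```

Stacking several conv2d/relu/max_pool stages before a fully connected layer and the softmax loss gives the layer sequence shown in Fig. 1.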
3.2 Activation Functions
Activation functions are another important part of a neural network. Each neuron in the network has an activation function attached to it, which takes a single number (the neuron's weighted input) and, through a fixed mathematical computation, determines the neuron's output; in this way it decides whether, and how strongly, the neuron is activated. Commonly used functions are Sigmoid, Tanh, ReLU (Rectified Linear Unit) and Leaky ReLU.
Sigmoid. The Sigmoid activation function takes any real-valued input and maps it to an output in the range 0 to 1, as shown in Eq. (1) [14].
$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
Tanh. Tanh is a scaled and shifted version of sigmoid, $\tanh(x) = 2\sigma(2x) - 1$ with $\sigma$ as in Eq. (1), and is given in Eq. (2). It generates output values ranging from −1 to 1. Tanh is preferred over Sigmoid because it has fewer drawbacks: Tanh is zero-centred, so gradients do not oscillate between positive and negative values [14].
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$
With Tanh, strongly negative, neutral and strongly positive inputs are mapped to strongly negative, near-zero and strongly positive outputs, respectively.
ReLU. ReLU applies a zero threshold, as shown in Eq. (3): it is linear (the identity) for positive inputs and zero otherwise. Because it is computationally efficient, ReLU lets the network converge very quickly [14].
$$\mathrm{rect}(x) = \max(0, x) \tag{3}$$
Leaky ReLU. Leaky ReLU is a slightly modified version of ReLU. Instead of being zero in the negative region, it has a small positive slope there; otherwise, it is the same as ReLU, as shown in Eq. (4) [14].
$$f(x) = \begin{cases} x, & x > 0 \\ 0.01x, & \text{otherwise} \end{cases} \tag{4}$$
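The four activation functions above can be written in a few lines of NumPy. This is an illustrative sketch rather than code from the study; the Leaky ReLU slope of 0.01 follows the definition above, and the last line checks the relation tanh(x) = 2σ(2x) − 1 between Tanh and Sigmoid.

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # zero-centred output in (-1, 1)
    return np.tanh(x)

def relu(x):
    # zero threshold: identity for positive inputs, 0 otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for negative inputs instead of a flat zero
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))

print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```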
3.3 Regularization
Overfitting is a major problem for deep convnets, and regularization techniques are used to address it. The two most common regularization techniques are Dropout and DropConnect [14]. A CNN is not usually trained from scratch: when the dataset is small, transfer learning is used to initialize the network weights from a pre-trained network, and retraining only the classifier or the last few layers reduces the overall training time. A good model can be obtained by carefully selecting parameters such as the learning rate, number of iterations, number of training and testing samples, and batch size, and the same network can be trained with different parameter settings to achieve more accurate results. Weights are updated with backpropagation and gradient descent. Saturating activations such as sigmoid or tanh can cause poor gradient flow (vanishing gradients) when training the lower layers; ReLU handles this type of problem more effectively in convnets. The convnet's classifier is the final layer, where a softmax classifier is mostly used to obtain the probability of each class [14].
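Dropout, the first of the two regularization techniques named above, can be sketched in NumPy as follows. This uses the common "inverted dropout" formulation, which is an assumption rather than a detail given in [14], and `dropout` is a hypothetical helper. DropConnect differs in that it masks individual weights rather than whole activations.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 100))
out = dropout(a, rate=0.5, training=True, rng=rng)
print((out == 0).mean())  # roughly half of the units are dropped
print(out.mean())         # close to 1.0 thanks to the rescaling
```

Rescaling at training time means the network can be used unchanged at inference, when every unit is kept.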
Effectiveness of Convolutional Neural Network Models
Fig. 2. The overall workflow of the study.
4 Research Methodology
The purpose of this study is to analyze how CNN models can help to classify intruders, especially birds and monkeys, in the agriculture sector. Figure 2 illustrates the overall workflow of the study. After the datasets are collected, the pre-processed data is used as the input for the CNN models.
4.1 Dataset Collection
Two datasets are used to evaluate the performance of various CNN models [4, 5]. Both datasets are relevant to the agricultural sector, as their images have natural, real-world backgrounds; they were therefore selected to investigate the performance of CNN models in the agricultural sector. The Wild Birds in a Wind Farm Image dataset was compiled at the Naemura Laboratory of The University of Tokyo and is publicly available for research purposes. This dataset has four classes: undefined birds, hawks, crows and non-birds. The experiment was done on the last three classes. A total of 32,973 bird images are available in the dataset. It also includes 4,911 images that look similar to birds but are not birds, and 1,907 unclear images. All the images in the dataset were preprocessed. The second dataset is the 10 Monkey Species dataset from Kaggle, which contains 1,370 high-resolution images of 10 monkey species. The experiment was done on the entire dataset.
4.2 Data Pre-processing
After the datasets are collected, the data must be prepared for the models. The data in this study are images, and both datasets already provide pre-processed images. The images were organized to fit the input requirements of the models, checked, and uploaded to Google Colab. The data for each dataset were split into 4:1 portions for training and testing, respectively. The details are presented in Table 1.

Table 1. Images split for the datasets.

| Dataset | Total images | Number of training images | Number of validation images |
|---|---|---|---|
| 10 Monkey species | 1370 | 1098 | 272 |
| Wild birds in a wind farm | 24027 | 21146 | 2881 |
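A split with no overlap between training and testing data, as required in Sect. 4.4, can be sketched in plain Python by shuffling once and cutting the list. The function name, file names and seed below are hypothetical. Exact counts depend on how the split is made (for example, per class), so they need not match Table 1.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle once and cut, so no item can land in both portions.
    test_fraction=0.2 corresponds to a 4:1 train:test ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

image_ids = [f"img_{i:05d}.jpg" for i in range(1370)]  # hypothetical file names
train, test = train_test_split(image_ids)
print(len(train), len(test))  # 1096 274
```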
4.3 Applying the Convolutional Neural Network (CNN) Models
After pre-processing, the CNN models were loaded and applied. The Python-based TensorFlow® library with Keras® was used in the Anaconda® environment to apply the CNN models to the datasets, which were imported from local storage. Google Colaboratory was used for the VGG19 model. The data was then normalized, and the CNN models were declared and adapted to the datasets. The Adam optimizer was used with a learning rate of 0.0001. The VGG16, VGG19, InceptionV3 and ResNet50 CNN models were applied, and their behavior was observed thoroughly for further comparison and analysis.
4.4 Training and Testing
The next phase involved training the CNN models on the datasets and then testing them. Training allowed each model to learn each class of the dataset, and the test portion was used to make predictions with the trained model and to validate the labels it had learned. The performance of each model is measured on these predictions, which determine its accuracy; the models should predict as many test samples correctly as possible. Figure 3 illustrates the overall testing method for this study. It is worth mentioning that the training data does not overlap with the testing data.
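The optimizer setting above corresponds to the Adam update rule with a learning rate of 0.0001. The NumPy sketch below shows a single Adam step; β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the defaults from the original Adam paper (Keras's `tf.keras.optimizers.Adam(learning_rate=1e-4)` uses the same β values but ε = 1e-7 by default). `adam_step` is a hypothetical helper, not a library function.

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grads          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grads ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v

w = np.array([0.5, -1.0, 2.0])
g = np.array([0.3, -4.0, 0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w_new, m, v = adam_step(w, g, m, v, t=1)
print(w_new - w)  # about [-1e-4, 1e-4, 0]
```

Because of the bias correction, the first step moves each parameter with a nonzero gradient by roughly the learning rate, regardless of the gradient's magnitude.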
Fig. 3. Overall testing method.
5 Result Analysis
During the experiments, four CNN models were applied using Keras®. Of the eight experiments, seven were run on a personal computer and one on Google Colab, a cloud-based platform that provides GPU-backed TensorFlow® components. All experiments used a GPU to reduce processing time. Table 2 lists, for each model and dataset, the final accuracy, validation accuracy, validation loss and training time.

Table 2. Experimental results of the datasets with different CNN models.

| Model name | Dataset used | Final accuracy (%) | Validation accuracy (%) | Validation loss | Time taken to train the model |
|---|---|---|---|---|---|
| InceptionV3 | 10 Monkey species | 99.63 | 97.06 | 0.0765 | 55 min |
| InceptionV3 | Wild birds in a wind farm | 98.08 | 97.42 | 0.0845 | 9 h 29 min |
| ResNet50 | 10 Monkey species | 99.12 | 95.96 | 0.0934 | 49 min |
| ResNet50 | Wild birds in a wind farm | 98.50 | 97.17 | 0.0887 | 6 h 43 min |
| VGG19 | 10 Monkey species | 49.74 | 46.32 | 1.6269 | 48 min |
| VGG19 | Wild birds in a wind farm | 97.14 | 96.77 | 0.1987 | 4 h 32 min |
| VGG16 | 10 Monkey species | 98.25 | 95.22 | 0.1635 | 1 h 15 min |
| VGG16 | Wild birds in a wind farm | 97.57 | 95.72 | 0.1335 | 1 h 10 min |
The experimental results show that InceptionV3 and ResNet50 performed well in terms of both accuracy and validation accuracy, with low loss. Both models achieved more than 99.00% accuracy on the 10 Monkey Species dataset, and training on this dataset took less than 1 h. Training on the Wild Birds in a Wind Farm dataset took considerably longer, as it contains far more images than the 10 Monkey Species dataset. Even so, both models achieved more than 98.00% accuracy on it.
Fig. 4. Model accuracy comparison among the used models.
The overall results are summarized in Fig. 4. VGG19 did not perform well on the 10 Monkey Species dataset: after 10 epochs, it achieved only 49.74% accuracy. Its parameters may need to be studied more thoroughly to obtain better results, and the number of epochs could be increased, as training took only 48 min. The birds dataset, in contrast, was trained with VGG19 on Google Colaboratory with a GPU and reached 97.14% accuracy in 4 h 32 min, which is shorter than the training time of InceptionV3 and ResNet50 on that dataset. VGG16 achieved accuracies of 98.25% on the 10 Monkey Species dataset and 97.57% on the birds dataset, with training times of 1 h 15 min and 1 h 10 min, respectively. Figure 5 shows accuracy versus epoch for the two datasets. Overall, the highest accuracies on the 10 Monkey Species dataset were achieved by InceptionV3 (99.63%) and ResNet50 (99.12%), and the same two models performed best on the birds dataset, with 98.08% (InceptionV3) and 98.50% (ResNet50). These results can be explained by the architecture and characteristics of the two models. ResNet won ILSVRC 2015, and InceptionV3 was the runner-up. InceptionV3 has 42 layers and about 24 million parameters, and it achieves a low error rate on image data. Its computational cost is slightly higher than that of earlier Inception versions, but it remains more efficient than many other models. ResNet50 has 50 layers and a lower error rate; its residual architecture gave strong results on both datasets, and it outperformed InceptionV3 on the birds dataset, which suggests that this model may benefit more from additional training data.

Fig. 5. Accuracy Vs Epoch of the models for the two datasets.
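The residual (shortcut) connection that distinguishes ResNet can be illustrated with a simplified, fully connected NumPy block computing y = ReLU(F(x) + x). This is a sketch under strong simplifying assumptions: the real ResNet50 uses convolutional bottleneck blocks with batch normalization, which are omitted here, and `residual_block` is a hypothetical helper. The final check shows that when the residual branch outputs zero, the block simply passes its input through the ReLU, one reason residual networks remain trainable at large depth.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Simplified dense residual block: y = relu(F(x) + x),
    where F is two linear maps with a ReLU in between."""
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(f + x)      # identity shortcut adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 8))
w2 = np.zeros((8, 8))       # with F(x) = 0, the block reduces to relu(x)
print(np.allclose(residual_block(x, w1, w2), relu(x)))  # True
```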
6 Future Work
This study analyzed the performance of several CNN models to check whether they can perform classification in different situations, mainly in an agricultural environment. The CNN models were trained and validated using two datasets. On the birds dataset, all the models achieved over 97% accuracy, and on the monkey dataset, every model except one (VGG19) achieved over 98% accuracy, which indicates that CNN models are capable of classifying objects in different situations. The experiments indicate that animals and birds can be classified, and hence detected, although there is room for improvement. Experiments on more datasets covering all the possible threats in agriculture are needed, and these datasets should be of high quality, which increases the likelihood of more accurate results. Better GPU-based hardware may also improve performance, and combining IoT-based devices with CNN models may open a new dimension in the smart farming industry. This study used only four pre-trained CNN model variants in Keras to perform classification on the two datasets. For detection, however, the YOLO (Darknet) model performs better; it was not considered here due to resource constraints. As future work, models trained on these datasets can be applied in real agricultural environments for real-time threat detection.
References
1. Gowsalya, K., Priyanka, S., Vanitha, N.: Birds scaring system for agricultural field using image processing. Int. J. Intellect. Adv. Res. Eng. Comput. 06(1), 184–187 (2018)
2. Marsh, R., Erickson, W., Salmon, T.: Scarecrows and predator models for frightening birds from specific areas. In: 15th Vertebrate Pest Conference, University of California, Davis (1992)
3. Pornpanomchai, C., Homnan, M., Pramuksan, N., Rakyindee, W.: Smart scarecrow. In: 2011 3rd International Conference on Measuring Technology and Mechatronics Automation, Shanghai, China, pp. 294–297 (2011)
4. The 10 Monkey Species from Kaggle. https://www.kaggle.com/slothkong/10-monkey-species. Accessed 11 Aug 2020
5. Wild Birds in a Wind Farm Dataset. https://bird.nae-lab.org/dataset/. Accessed 11 Aug 2020
6. Anderson, A., Lindell, C., Moxcey, K., Siemer, W., Linz, G., Curtis, P., Carroll, J., Burrows, C., Boulanger, J., Steensma, K.: Bird damage to select fruit crops: the cost of damage and the benefits of control in five states. Crop Prot. 52, 103–109 (2013)
7. Akula, A., Deshmane, P., Thorat, S., Bagbande, A., Pawar, M.: A comparative study and analysis of approaches towards agricultural supervision. In: 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), Jalgaon, pp. 510–515 (2016)
8. Bilal, J., Haleem, F., Murad, K., Muhammad, I., Ihtesham, I., Awais, A., Shaukat, A., Gwanggil, J.: Deep learning in big data analytics: a comparative study. Comput. Electr. Eng. 75, 275–287 (2019)
9. Zainudin, Z., Shamsuddin, S.M., Hasan, S.: Deep learning for image processing in WEKA environment. Int. J. Adv. Soft. Compu. Appl. 11(1), 1–20 (2019)
10. Chen, Q., Song, Z., Dong, J., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. IEEE Trans. Pattern Anal. Mach. Intell. 37(1), 13–27 (2015)
11. Labuguen, R., Gaurav, V., Negrete, S., Matsumoto, J., Inoue, K., Shibata, T.: Monkey features location identification using convolutional neural networks. In: The 28th Annual Conference of the Japanese Neural Network Society, Japan (2018)
12. Gavali, P., Mhetre, P.A., Patil, N.C., Bamane, N.S., Buva, H.D.: Bird species identification using deep learning. Int. J. Eng. Res. Technol. (IJERT) 8(04), 6030–6033 (2019)
13. Lee, S., Lee, M., Jeon, H., Smith, A.: Bird detection in agriculture environment using image processing and neural network. In: 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, pp. 1658–1663 (2019)
14. Aloysius, N., Geetha, M.: A review on deep convolutional neural networks. In: 2017 International Conference on Communication and Signal Processing (ICCSP), Chennai, pp. 0588–0592 (2017)
15. Lim, L.J., Sambas, H., MarcusGoh, N.C., Kawada, T., JosephNg, P.S.: ScareDuino: smartfarming with IoT. Int. J. Sci. Eng. Technol. 6(6), 207–210 (2017)
A Study on Emotion Identification from Music Lyrics
Affreen Ara¹ (corresponding author) and Raju Gopalakrishna²
¹ Department of Computer Science and Engineering, Christ (Deemed to be University), Bengaluru, India
² Department of Data Science, Christ (Deemed to be University), Lavasa Campus, Pune, India
Abstract. The widespread availability of digital music on the internet has led to the development of intelligent tools for browsing and searching music databases. Music emotion recognition (MER) is gaining significant attention in the scientific community. Emotion analysis of music lyrics involves analyzing a piece of text to determine the meaning or thought behind a song. This paper focuses on emotion recognition from music lyrics through text processing. The fundamental concepts in emotion analysis from music lyrics (text) are described, and an overview of the emotion models, music features, and datasets used in different studies is given. The features of ANEW, a widely used corpus in emotion analysis, are highlighted and related to music emotion analysis. A comprehensive review of some of the prominent work in emotion analysis from music lyrics is also included.
Keywords: Music lyrics · Emotion analysis · Affective norms · Music emotion models
1 Introduction
Digital music is an integral part of the lives of a large share of the human population. According to a 2019 study by the International Federation of the Phonographic Industry, people spend around 18 h a week listening to music, compared to 17.8 h a year earlier [1]. The study also illustrates the tremendous growth of the music industry. There is an increased demand for better music selection tools, and one approach to music selection is based on the emotion in a song's lyrics. Lyrics can evoke different emotions and play a significant role in the overall experience of listening to music. Emotion recognition from text is an evolving area in Natural Language Processing; applied to music lyrics, it analyzes the lyrics to determine the meaning or thought behind them.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 396–406, 2021. https://doi.org/10.1007/978-3-030-70713-2_37
Music catalogs have become huge, and Music Recommendation Systems (MRS) help audiences by listing music based on listener needs and preferences. MRS is an active and challenging area in music research, and recommending music using its lyrics is a distinct area within it. A music lyric recommendation system processes lyrics and is firmly grounded in natural language processing and text mining concepts. It allows listeners to select music based on the emotion portrayed in the lyrics and, more broadly, helps music composers to select lyrics appropriate to a given situation. Lyrics, the audio signal, and album reviews are widely used in music analysis. The writing style and word choice in lyrics can identify the genre, song class, and publication time [2]. Lyrics add semantic content to music and indicate the melodic, structural, and rhythmic properties of the audio signal [2]. For these reasons, lyrics are used in music classification, including multimodal music classification [3]. Music analysis and classification support tasks such as music discovery, retrieval, recommendation, and playlist generation. Emotion identified from music lyrics is an effective attribute for the automatic analysis and classification of music [4]. Machine identification of emotion from music lyrics, part of Music Emotion Recognition (MER), improves the user experience in computerized music selection systems. A major factor in the study of MER is its multidisciplinary nature, so a researcher intending to work in this domain needs insight into the basic structure and concepts of MER. Machine identification of emotions from text (lyrics) is an emerging area with many challenges, largely due to different conceptualization schemes and structures. Compared to the progress in music analysis using audio signals, lyric-based work is relatively scarce.
There is ample scope for contributions to lyric analysis by employing emerging text mining, natural language processing, and machine learning concepts. This paper presents a concise introduction to the conceptualization and taxonomy of MER and a comprehensive review of related research.
2 Emotion Models
Music can evoke or suggest emotions in the listener, and the emotion model plays a pivotal role in music emotion analysis. The major classes of theoretical emotion models are discrete, dimensional, and miscellaneous. The discrete model is based on the theory that all emotions can be derived from innate basic emotions such as happiness, anger, disgust, fear, and sadness [5]. The dimensional model describes emotion using two dimensions, valence and arousal, which are depicted as orthogonal axes in the affective space [5]; a typical example is the well-known Russell model. Miscellaneous models are an incongruous mixture of emotion concepts such as intensity, similarity, and preference [5]. In addition, models that focus only on emotions directly relevant to music have been proposed. These models overlap, and studies often employ multiple models of emotion. Recently, two-dimensional models of emotion have been gaining popularity. Three emotion models are described below.
The Hevner model: a basic emotion model applied to music, conceived by K. Hevner [6]. Considering various musical features, she generated eight emotional clusters and distributed 67 adjectives among them. Hevner's adjective circle is shown in Fig. 1 [6].
The Russell model: according to this two-dimensional circumplex model, all affective states originate from valence (a pleasure-displeasure continuum) and arousal (activation-deactivation) [7]. Different emotions can be traced to varying degrees of both valence and arousal [8]. A visual description of the Russell model is depicted in Fig. 2a [9].
The Thayer model: an energy-stress model that defines valence as combinations of energetic arousal and tense arousal [10]. According to this model, music emotions form four groups: contentment (low energy, low stress), depression (low energy, high stress), anxiety (high energy, high stress), and exuberance (high energy, low stress). A visual description of the model is given in Fig. 2b.
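Thayer's four groups can be read off two scores. The sketch below assumes energy and stress scores centred on 0, with 0 as the boundary between low and high; both the scale and the threshold are illustrative assumptions, and `thayer_group` is a hypothetical helper.

```python
def thayer_group(energy, stress):
    """Map (energy, stress) scores centred on 0 to Thayer's four groups.
    The zero thresholds are an illustrative assumption."""
    if energy >= 0:  # high energy
        return "exuberance" if stress < 0 else "anxiety"
    # low energy
    return "contentment" if stress < 0 else "depression"

print(thayer_group(0.7, -0.4))  # exuberance
print(thayer_group(-0.2, 0.9))  # depression
```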
Fig. 1. Hevner's adjective circle [6]
Fig. 2. a. The Russell model [9]. b. The Thayer model [10]
3 Affective Lexicon
An affective lexicon is a selected subset of the words in a language that refer to affective (positive or negative) conditions; most of these words refer to emotions [10]. Affective lexicons are based on either a dimensional model or a categorical model. This section describes two affective lexicons: Affective Norms for English Words (ANEW) and WordNet Affect of Emotion (WNA).
Affective Norms for English Words (ANEW): ANEW is a collection of 1034 English words rated on three dimensions: Valence, Arousal, and Dominance (VAD) [11, 12]. Valence defines the extent to which a stimulus results in a positive or negative emotion (“pleasant to unpleasant”). Arousal relates to the level of energy in the emotion (“calm to excited”). Dominance identifies the degree of “perceived control” over the response to the stimulus (“control to out-of-control”) [13]. To illustrate, “joy” is more positive than “sad”, “nervous” indicates more arousal than “lazy”, and “fight” shows more dominance than “delicateness.” ANEW uses a rating scale of 1–9 and was later augmented to 2477 words. In the domain of affective evaluation of words, ANEW is the most widely used corpus. Its major limitations are its small size and the fact that context is not considered. Warriner, Kuperman, and Brysbaert (2013) built an affective norms corpus of about 14,000 words [14], with ratings gathered from participants recruited through Mechanical Turk. Later, a combined set of the ANEW and Warriner lexicons was used to increase the lexicon size: for each word in the combined set, synonyms of the word's most common meaning were found using WordNet, and the word's affect ratings were extended to those synonyms. A corpus of 22,756 words was created and validated in this way [15]. The same authors also used all synonyms and hyponyms of a source word to generate expansion words and assign scores, resulting in a set of 109,752 words.
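Lexicon-based scoring with ANEW-style ratings, including the synonym-expansion step described above, can be sketched as follows. The ratings and synonym lists are made-up placeholders (they are not ANEW or Warriner values), the synonym map stands in for a WordNet lookup, and averaging the VAD ratings of matched words is one simple scoring choice among many.

```python
# Hypothetical (valence, arousal, dominance) ratings on a 1-9 scale.
# These numbers are invented for illustration; they are NOT ANEW values.
LEXICON = {
    "joy": (8.0, 6.0, 6.5),
    "sad": (2.0, 4.0, 3.5),
    "nervous": (3.0, 6.5, 3.0),
    "lazy": (4.5, 2.5, 5.0),
}
# Hypothetical synonym map standing in for a WordNet lookup.
SYNONYMS = {"joy": ["delight"], "sad": ["unhappy", "sorrowful"]}

def expand(lexicon, synonyms):
    """Copy each word's ratings to its synonyms, never overwriting rated words."""
    expanded = dict(lexicon)
    for word, syns in synonyms.items():
        for s in syns:
            expanded.setdefault(s, lexicon[word])
    return expanded

def score_text(text, lexicon):
    """Average VAD over the words found in the lexicon; None if none match."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

lex = expand(LEXICON, SYNONYMS)
print(score_text("so unhappy and nervous tonight", lex))  # (2.5, 5.25, 3.25)
```

Without the expansion step, "unhappy" would be ignored and the score would rest on "nervous" alone, which illustrates why the small size of ANEW is a practical limitation.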
WordNet Affect of Emotion (WNA) is a lexicon for the categorical model. WNA is derived from WordNet, and its labeling is based on Ekman's basic emotions [13]. It includes six emotions: fear, anger, sadness, disgust, joy, and surprise.
4 Features from Music Lyrics
Recognition of emotion from music lyrics is an active area of research, closely related to the principles and practices of natural language processing and text mining. Affective lexicons are one source of features for emotion recognition, and many other text features are defined and extracted for emotion analysis. The last decade has also seen several works based on deep learning architectures. In this section, some lyric-based features are discussed. They are only representative; several NLP and text mining features can be re-engineered for emotion identification from lyrics. Bag of words (BOW), Term Frequency–Inverse Document Frequency (TF-IDF) and word vectors are very popular text features in NLP, but their effectiveness in emotion identification is relatively limited.
An n-gram model is used for lyrics, which identifies words and collocations representing an output class [2]. These n-grams are ranked according to their TF-IDF for the class. Next, artist-specific n-grams are identified and eliminated, and the remaining n-grams are re-ranked, which reduces the influence of individual artists' vocabulary preferences. Finally, the top m n-grams (for n ≤ k, for a chosen value of k) are represented in the feature vector.
VOCABULARY [2]: These features are defined based on the richness of the vocabulary and the use of slang and uncommon words. Words that appear in the Urban Dictionary [16] but not in Wiktionary [17] are classified as slang. A word is defined as uncommon if it is absent from Wiktionary. The logarithmic frequency of slang words and the ratio of uncommon words to all words are calculated as feature values [2].
STYLE [2]: POS and chunk tag distributions capture syntactic structure. “Lines per song,” “tokens per song,” and “tokens per line” are used as length features. A rhyme detection tool [18] is used to capture the rhyme structure.
Repetitions of letters (“riiiiise”) or words (“money, money”) and in-line rhymes (“burning turning,” “where were we”) are captured as echoism.
SEMANTICS [2]: The Regressive Imagery Dictionary (RID) helps to identify dominant concepts in a text by assigning words to “conceptual thought” (abstract, logical, reality-oriented), “primordial thought” (associative, concrete, fantasy), and “emotion.” RID is found to capture not only what is said but also how it is said. The dominant imagery of each text can be computed and encoded as feature values.
ORIENTATION [2]: These features consider the temporal dimension and the degree of egocentricity. The fraction of past-tense verb forms among all verb forms indicates whether a song mainly recounts past experiences or present/future ones, and can therefore be used as a feature. The occurrences of first-, second-, and third-person pronouns, singular and plural, are taken as the degree of egocentricity. Further, the proportion of self-referencing to non-self-referencing pronouns and the ratio of first-person singular to second-person pronouns also indicate orientation.
SONG STRUCTURE [2]: The recurrence of identical or similar blocks of text represents the chorus. The overall similarity between two lines can be computed as a weighted sum of their lexical and structural similarities, in terms of word overlap and “POS tag bigram” overlap. This helps to encode whether the song contains a chorus and whether the title appears in the song text.
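The n-gram feature of [2] can be approximated with a small, dependency-free TF-IDF ranking. As simplifying assumptions, all lyrics of a class are pooled into one document, IDF is computed over classes, and the artist-specific filtering step is omitted; the lyric snippets and the function names are invented for illustration.

```python
import math
from collections import Counter

def ngrams(text, n):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams_per_class(lyrics_by_class, max_n=2, m=3):
    """Rank 1..max_n-grams by TF-IDF, treating each class as one document."""
    counts = {}
    for label, lyrics in lyrics_by_class.items():
        c = Counter()
        for text in lyrics:
            for n in range(1, max_n + 1):
                c.update(ngrams(text, n))
        counts[label] = c
    n_classes = len(counts)
    # number of classes in which each n-gram occurs
    doc_freq = Counter(g for c in counts.values() for g in c)
    ranking = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores = {g: (k / total) * math.log(n_classes / doc_freq[g])
                  for g, k in c.items()}
        # highest score first; ties broken alphabetically
        ranking[label] = sorted(scores, key=lambda g: (-scores[g], g))[:m]
    return ranking

lyrics = {
    "happy": ["dance all night", "sunshine dance with me"],
    "sad": ["cry all night", "tears fall cry alone"],
}
for label, top in top_ngrams_per_class(lyrics).items():
    print(label, top)
# happy ['dance', 'dance all', 'dance with']
# sad ['cry', 'alone', 'cry all']
```

N-grams shared by every class, such as "all night" here, get an IDF of zero, so only class-specific words and collocations rise to the top.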
5 Lyrics Datasets for Emotion Identification
A vast collection of digital repositories of English music lyrics is available; AllMusic, Genius, and Last.fm are popular sites. Identifying emotions in songs using supervised machine learning requires annotated repositories, but the availability of benchmark datasets of emotion-annotated music lyrics is limited, and several studies developed their own datasets for analysis. The conventional method of annotation by experts is time-consuming and impractical for creating large datasets. As an alternative, researchers started using crowdsourcing for lyric annotation.
CROWDSOURCING LYRIC EMOTIONS: Crowdsourcing is a work paradigm that can scale up the annotation process. Crowdsourcing platforms like InnoCentive target specialized participants to solve R&D problems in specific topics. On the other hand, platforms like Amazon Mechanical Turk (MTurk) offer repetitive and straightforward tasks that any participant with Internet access can do, and participants are paid by the publisher of the work. This makes MTurk suitable for annotating lyrics at a large scale. MTurk participants can be used to label lyrics and to validate labels assigned by other methods [19]. Along the same lines, online games such as MajorMiner and TagATune are used to associate emotions with songs and lyrics [19]. Social tags are also used to create datasets; tagging is ubiquitous in online media, and Last.fm tags are a good resource for researchers [19]. In one study, AllMusic tags were used to build a ground truth dataset of song emotions [20]. Based on the tags and their ANEW norms, the authors classified each piece into one of the valence-arousal quadrants of Russell's model. After validation by three experts, a dataset of 771 songs was created [20].
The need for annotated songs has led to various strategies for creating ground truth datasets. Desirable but hard-to-achieve characteristics of such datasets are (i) a sufficiently large collection, (ii) use of a well-accepted emotion model, (iii) polarized annotations, and (iv) public availability [19]. As an illustration, consider the steps followed in [21] to create a ground truth dataset. To select songs, the authors considered several musical genres and eras and ensured that the pieces were distributed uniformly across the four quadrants of the Russell model. They also confirmed that each song belongs predominantly to one of the quadrants. Popular sites such as lyrics.com, ChartLyrics, and MaxiLyrics were used to extract the lyrics. The lyrics were preprocessed by correcting orthographic errors and removing common patterns, text that is not part of the song, very short songs, and songs with non-English lyrics. The songs were then annotated by 39 participants, who marked each lyric's dominant emotion and assigned a value between −4 and 4 for valence and arousal. The average valence and arousal values were assigned to each song, and songs with large variations in the V-A values given by the participants were removed. Finally, a dataset of 180 songs was created. Annotation of songs based on the valence-arousal model by MTurk workers is reported in [22]; that dataset includes 744 clips and their feature entries and is publicly available. A dataset of 500 western songs was created using a traditional survey of paid participants and made public [23]. In [24], the authors obtained emotion labels from MTurk workers and created a small dataset of 100 songs. MoodyLyrics [25] is a dataset of 2595 lyrics labeled as happy, angry, sad, or relaxed; this work contributes to the available datasets and provides a comprehensive account of how a dataset can be created. In [19], the authors describe the creation of two datasets: MoodyLyrics4Q, a dataset of 2,000 songs, and MoodyLyricsPN, a collection of 5,000 songs labeled only as positive or negative. Both datasets and the descriptions of their creation are valuable resources. Music4All is a new database that incorporates attributes such as metadata, tags, genre information, and lyrics [30]. The authors describe different applications of the database, and it could also be adapted for emotion analysis. The creation of ground truth datasets for emotion identification from music lyrics remains a challenging and dynamic area, and the scope for extending these concepts and approaches to other languages makes it an active research topic.
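The aggregation step of [21] (averaging valence and arousal ratings in [−4, 4] and discarding songs on which annotators disagree) can be sketched as below, with the Russell quadrant assigned from the mean values. The disagreement measure (population standard deviation), the threshold `max_std=1.5` and the helper name `aggregate_song` are hypothetical choices, not values reported in [21].

```python
from statistics import mean, pstdev

def aggregate_song(ratings, max_std=1.5):
    """ratings: list of (valence, arousal) pairs, each in [-4, 4].
    Returns (mean_valence, mean_arousal, quadrant), or None when the
    annotators disagree too much. max_std is a hypothetical threshold."""
    vals = [v for v, _ in ratings]
    aros = [a for _, a in ratings]
    if pstdev(vals) > max_std or pstdev(aros) > max_std:
        return None  # discard songs with large V-A disagreement
    v, a = mean(vals), mean(aros)
    # Russell quadrants: Q1 (+V, +A), Q2 (-V, +A), Q3 (-V, -A), Q4 (+V, -A)
    if v >= 0:
        quadrant = "Q1" if a >= 0 else "Q4"
    else:
        quadrant = "Q2" if a >= 0 else "Q3"
    return v, a, quadrant

print(aggregate_song([(3, 2), (2, 3), (3, 2)]))  # mean V-A in Q1
print(aggregate_song([(4, 1), (-4, 1)]))         # None: annotators disagree
```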
6 A Review of Emotion Identification from Music Lyrics
This review focuses on emotion identification from music lyrics using machine learning techniques. The number of recently published works in this area is relatively small, whereas several contributions have been reported on audio-based and mixed-mode (audio and text) emotion identification. This section presents a concise review of selected research papers on emotion detection from lyrics. The review is not exhaustive but includes some promising and representative recent works.
In [4], the authors used a Naive Bayes classifier for emotion detection from music lyrics. Data were collected by crawling Baidu Music, a popular music site that provides music as well as emotion labels: “sad, passionate, quiet, comfortable, sweet, inspirational, lonely, miss, romantic, yearning, joyful, soulful, happy, nostalgic, relaxed.” Using the Thayer model as the basis, these labels were then mapped to “contentment, depression and exuberance.” Word segmentation and the identification of emotion words using an emotion dictionary constitute the feature extraction phase. The authors used a collection of 271 (contentment), 2091 (depression), and 954 (exuberance) pieces, all in Chinese, though they included a few English words in one experiment. Further, the songs were classified into positive (“Quiet, Yearning, Romantic, Sweet, Healing, Passionate, Inspirational, Happy, Joyful”) and negative (“Waiting, Sad, Frustrated”) categories. Though the reported accuracy is relatively low, the methods adopted are useful for the domain.
Music emotion detection based on a lyric-audio combination is reported in [13]. The inclusion of an “emotional corpus” in feature extraction is one of the highlights of the work.
Psycholinguistic features are obtained with the help of two emotional corpora, the General Inquirer (GI) and CBE (a combined model of ANEW and WNA), whereas interjection words are used to define stylistic features. The paper describes an auto-tagging method for dealing with missing tags. The dataset used is the MIREX dataset, which is based on the Russell model and is then mapped to the Thayer model. In addition to the lyric features, audio features are also extracted. “Support Vector Machine” (SVM), “Random Forest,” and “Naive Bayes” are used for classification. The best F-measure reported for combined audio and lyrics is
A Study on Emotion Identification from Music Lyrics
.0568 (Random Forest); for stylistic features it is 0.456 (Naïve Bayes), and for psycholinguistic features 0.354. These relatively poor results show that there is scope for further research in the domain.
A study on emotionally relevant features for the classification of music lyrics was carried out by Malheiro et al. [20]. A dataset of 180 lyrics was created as part of the study, and the authors present an elaborate discussion of its creation and validation. The authors considered a large collection of features taken from previous works and proposed a few novel features. The feature set includes “content-based”, “stylistic-based”, “song-structure-based”, and “semantic-based” features. These features are classified into groups, and extensive experiments on classification and regression were carried out. The approaches adopted include classification by “quadrants”, by “arousal hemispheres”, and by “valence meridian”. The results show that the new features improve the results by 5% to 10% across the different classification schemes. The authors include a detailed analysis of the impact of the various features and provide directions for future research.
In [26], the authors used deep neural networks for music mood detection combining audio and lyrics. For lyrics, the authors used models such as bag-of-words, a single Gated Recurrent Unit (GRU), a single Long Short-Term Memory (LSTM), a biLSTM, and a combination of a convolutional layer and an LSTM. Further, a DNN model that fuses audio and lyrics is also proposed. A dataset is built from the Million Song Dataset. The authors compared the performance of the deep learning models with models based on feature engineering. They concluded that deep architectures perform better and that fusion models outperform unimodal approaches.
A CNN-LSTM model for emotion detection from audio and lyrics, which addresses the limitations of single-network models, is proposed in [27].
The model combines two-dimensional features obtained from “CNN-LSTM” and one-dimensional features obtained with deep neural networks. The lyric features fall into two categories: word embeddings (2D) and word frequency vectors (1D). Similarly, both 2D and 1D features are used for audio, which motivates the proposed model. To accommodate multimodal emotional information fusion, the authors propose multimodal stacking-based ensemble learning. In this scheme, the original features are converted into labels, which solves the problem of feature heterogeneity. The Million Song Dataset is used for the experiments, and the authors report 74% classification accuracy for lyrics and 78% for fusion.
In [28], a neural network-based system for the classification of music is presented. The work extends a previous multi-layer perceptron (MLP) model trained on the publicly available MediaEval database. The authors introduce a preprocessing phase to address problems related to quadrant classification based on valence and arousal. In the experiments, the MLP is found to outperform SVM and Random Forest. The authors report an average F-measure of 50% for the four-quadrant classification schema. Further, two binary classification approaches are also implemented: one-vs-rest (OvR) over the four quadrants, and separate binary classifiers for valence and for arousal. The average F-measures are 69%, 73%, and 69% for OvR, valence, and arousal, respectively. Finally, based on experiments with the temporal annotation data of the MediaEval database, the authors conclude that
A. Ara and R. Gopalakrishna
the F-measures in the four quadrants are practically constant, regardless of the duration of the time window.
Yang and Lee published a study on emotion identification from lyrics [29]. The authors used a psychological model of emotion that is extended to 23 specific emotion categories. The authors claim that the results obtained are comprehensible to humans and in tune with intuitions about specific emotions.
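The Naive Bayes approach of [4] classifies lyrics from the emotion words they contain. The sketch below is a minimal multinomial Naive Bayes with Laplace smoothing. It assumes the lyrics have already been segmented into words and filtered down to emotion words. The toy vocabulary and training examples are hypothetical, and [4] may differ in its exact features and smoothing.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes with Laplace (add-alpha) smoothing.

    docs   -- list of token lists (e.g. emotion words found in each lyric)
    labels -- class label for each document
    Returns (log priors, log likelihoods per class and word).
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)          # class -> word frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return priors, likelihoods


def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    priors, likelihoods = model
    scores = {c: priors[c] + sum(likelihoods[c][w] for w in tokens if w in likelihoods[c])
              for c in priors}
    return max(scores, key=scores.get)


# Hypothetical toy training data: emotion words extracted from six lyrics
DOCS = [["tears", "alone", "cry"], ["dance", "joy", "shine"], ["calm", "sea", "breeze"],
        ["cry", "lonely"], ["joy", "party"], ["calm", "quiet"]]
LABELS = ["depression", "exuberance", "contentment",
          "depression", "exuberance", "contentment"]

model = train_nb(DOCS, LABELS)
print(predict_nb(model, ["lonely", "tears"]))   # depression
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow for long lyrics.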
7 Discussion
Emotion recognition from music lyrics is still in its infancy. Researchers face several challenges:
- the multidisciplinary nature of the problem;
- different approaches to the conceptualization of emotions;
- different language models for emotion classification;
- issues in dealing with semi-structured text;
- dependency on the author;
- changes in language constructs and vocabulary across eras;
- variations in how different readers interpret emotions.
Designing algorithms for the automatic recognition of emotion from lyrics requires knowledge of various fields, including emotion models, emotion dictionaries, the creation and validation of music lyric datasets, text processing, and machine learning. This paper points to the different directions to be explored in mastering the area. MER from lyrics raises research questions ranging from emotion modeling to the machine classification of lyrics.
Most of the successful music emotion recognition works are audio-based. Several hybrid models, which consider both lyrics and audio, have also been reported. The purpose of this paper is to strengthen research in emotion recognition from lyrics alone. Better models for emotion recognition from lyrics may in turn lead to better hybrid models. Extracting features from lyrics and classifying them is a relatively difficult task because of the very nature of emotion analysis. Compared with many other text processing and NLP applications, success in emotion analysis from lyrics has been limited. With the availability of several deep learning models, there is considerable scope for research in this area.
8 Conclusion
In this paper, a review of the basic concepts and implementation aspects of emotion recognition from music lyrics is presented. Emotion models, affective lexicons, feature extraction from music lyrics, music lyric databases, and the process of database creation are described concisely. A few representative papers in the area are reviewed to give an idea of the research trends in emotion analysis from lyrics.
References
1. IFPI Global Report. https://www.ifpi.org/news/IFPI-GLOBAL-MUSIC-REPORT-2019. Accessed 29 Jun 2020
2. Fell, M., Sporleder, C.: Lyrics-based analysis and classification of music. In: Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014, pp. 620–633 (2014)
3. Hu, X., Downie, J.S., Ehmann, A.F.: Lyric text mining in music mood classification. In: Proceedings of the 10th International Society for Music Information Retrieval Conference, ISMIR 2009, pp. 411–416 (2009)
4. Yunjung, A., Shutao, S., Shujuan, W.: Naive Bayes classifiers for music emotion classification based on lyrics. In: Proceedings of the 16th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2017, vol. 1, pp. 635–638 (2017)
5. Eerola, T., Vuoskoski, J.K.: A review of music and emotion studies: approaches, emotion models, and stimuli. Music Percept. Interdisc. J. 30(3), 307–340 (2012)
6. Posner, J., Russell, J.A., Peterson, B.S.: The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. 17(3), 715–734 (2005)
7. Eerola, T., Vuoskoski, J.K.: A comparison of the discrete and dimensional models of emotion in music. Psychol. Music 39(1), 18–49 (2011)
8. Jamdar, A., Abraham, J., Khanna, K., Dubey, R.: Emotion analysis of songs based on lyrical and audio features. Int. J. Artif. Intell. Appl. (IJAIA) 6(3), 35–50 (2015)
9. Thayer, R.E.: The Biopsychology of Mood and Arousal. Oxford University Press, New York (1989)
10. Clore, G.L., Ortony, A., Foss, M.A.: The psychological foundations of the affective lexicon. J. Pers. Soc. Psychol. 53(4), 751–766 (1987)
11. Bradley, M.M., Lang, P.J.: Affective Norms for English Words (ANEW): Instruction Manual and Affective Ratings. Technical Report 1. The Center for Research in Psychophysiology, University of Florida (1999)
12. Bradley, M.M., Lang, P.J.: Affective Norms for English Text (ANET): Affective Ratings of Text and Instruction Manual. Technical Report D-1, University of Florida (2007)
13. Rachman, F.H., Sarno, R., Fatichah, C.: Music Emotion Classification based on lyrics-audio using corpus based emotion. Int. J. Electr. Comput. Eng. 8(3), 1720–1730 (2018)
14. Warriner, A.B., Kuperman, V., Brysbaert, M.: Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav. Res. Meth. 45, 1191–1207 (2013)
15. Shaikh, S., Cho, K., Strzalkowski, T., Feldman, L., Lien, J., Liu, T., Broadwell, G.A.: ANEW+: automatic expansion and validation of affective norms of words lexicons in multiple languages. In: Proceedings of the 10th International Conference on Language Resources and Evaluation, pp. 1127–1132 (2016)
16. Wiktionary. https://en.wiktionary.org. Accessed 11 Sept 2020
17. Urbandictionary. https://www.urbandictionary.com. Accessed 11 Sept 2020
18. Hirjee, H., Brown, D.G.: Using automated rhyme detection to characterize rhyming style in Rap music. Empirical Musicology Rev. 5(4), 121–145 (2010)
19. Çano, E.: Text-based Sentiment Analysis and Music Emotion Recognition. Doctoral Thesis (2018)
20. Malheiro, R., Panda, R., Gomes, P., Paiva, R.P.: Emotionally-relevant features for classification and regression of music lyrics. IEEE Trans. Affect. Comput. 9(2), 240–254 (2016)
21. Malheiro, R.: Emotion-based Analysis and Classification of Music Lyrics. Doctoral Thesis (2016)
22. Soleymani, M., Caro, M.N., Schmidt, E.M., Sha, C.Y., Yang, Y.H.: 1000 songs for emotional analysis of music. In: Proceedings of the 2nd ACM International Workshop on Crowdsourcing for Multimedia, pp. 1–6. ACM (2013)
23. Turnbull, D., Barrington, L., Torres, D., Lanckriet, G.: Semantic annotation and retrieval of music and sound effects. IEEE Trans. Audio Speech Lang. Process. 16(2), 467–476 (2008)
24. Mihalcea, R., Strapparava, C.: Lyrics, music, and emotions. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 590–599 (2012)
25. Çano, E., Morisio, M.: MoodyLyrics: a sentiment annotated lyrics dataset. In: 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence, Hong Kong, pp. 118–124 (2017)
26. Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., Moussallam, M.: Music mood detection based on audio and lyrics with Deep Neural Net. In: ISMIR. arXiv:1809.07276v1 [cs.IR] (2018)
27. Chen, C., Li, Q.: A multimodal music emotion classification method based on multifeature combined network classifier. Math. Probl. Eng. 2020, 11 (2020). Article ID 4606027
28. Medina, Y.O., Beltrán, J.R., Baldassarri, S.: Emotional classification of music using neural networks with the MediaEval dataset. Pers. Ubiquit. Comput. (2020). https://doi.org/10.1007/s00779-020-01393-4
29. Yang, D., Lee, W.S.: Music emotion identification from lyrics. In: 2009 11th IEEE International Symposium on Multimedia, vol. 1, pp. 624–629 (2009)
30. Domingues, M.A., Santana, I.A.P.: Music4All: a new music database and its applications. In: 27th International Conference on Systems, Signals and Image Processing, IWSSIP 2020, Brazil (2020). https://doi.org/10.1109/IWSSIP48289.2020.9145170
A Deep Neural Network Model with Multihop Self-attention Mechanism for Topic Segmentation of Texts
Fayçal Nouar¹ (✉) and Hacene Belhadef²
¹ Management Sciences Department, Guelma University, Guelma, Algeria
² NTIC Faculty, University of Constantine 2 – Abdelhamid Mehri, Constantine, Algeria
Abstract. Topic segmentation is an important task in the field of natural language processing (NLP), with applications such as information retrieval, text summarization, and e-learning. Current neural methods for topic segmentation represent a sentence by a single feature vector that carries a single aspect of semantic information. However, the dependencies between different parts of a sentence rely on more complex semantic information, which cannot be learned by a single-vector representation. To address this issue, we present a deep neural model, named MHOPSA-SEG, that uses a multihop self-attention mechanism to capture multi-aspect semantic information for topic segmentation of texts. At each attention step, the model assigns different weights to words depending on the previous memory weights, which lets it capture multiple semantic vector representations of a sentence. We conduct experiments on four datasets, including written texts and lecture transcripts, and the experimental results show that MHOPSA-SEG outperforms the state-of-the-art models.
Keywords: Topic segmentation · BERT · RNN · Highway network · Multihop self-attention · NLP
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 407–417, 2021. https://doi.org/10.1007/978-3-030-70713-2_38
1 Introduction
Topic segmentation of texts refers to the process of dividing a text or a recording into shorter, contiguous, non-overlapping, meaningful, and topically coherent blocks. It is an important task in the field of natural language processing (NLP), with applications in information retrieval [1], summarization [2], information extraction [3], and e-learning [4]. For example, topic segmentation models can be employed to generate e-learning courses by dividing long, unstructured documents into coherent and properly structured blocks according to e-learning standards [4]. The efficiency of segmentation algorithms varies with the type of document (written texts, monologues, lectures, multi-party meeting transcripts). Topic segmentation of written documents is often considered less important, since a text is usually segmented on the basis of its structure (chapters, paragraphs) [5]. By contrast, identifying topic boundaries in lecture transcripts is more complicated. Lecture transcripts often exhibit a complex structure, and topic boundaries
F. Nouar and H. Belhadef
tend to be fuzzier because of spontaneous speech, varied forms of expression, and the diversity of instructional methods.
Traditional approaches to topic segmentation rely on hand-crafted features. Manually designing salient features is cumbersome and depends mainly on the designers' prior knowledge, which makes these approaches inefficient and impractical. Recently, deep learning models, mainly based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been applied to the topic segmentation task and have achieved promising results. Among other advantages, they can automatically learn effective features from a given dataset with multiple levels of representation. CNNs extract features from local, consecutive words. This can be a drawback for segmentation, because important words are often spread across long sentences with complex structures, with long intervals between them. On the other hand, RNNs with Long Short-Term Memory (LSTM) units can learn the long-term dependencies of texts. However, an RNN must process the words of a sentence in sequential order, which can lead to a loss of contextual information for long texts [6]. To address this issue, attention mechanisms are employed in conjunction with RNNs. Attention mechanisms make it possible to model dependencies regardless of their distance in the sentence and to focus on the most important parts of the text. Even though neural models have improved the performance of many topic segmentation algorithms, they still represent a sentence by a single feature vector that carries a single aspect of semantic information. However, the dependencies between different parts of a sentence rely on more complex semantic information that cannot be learned by a single-vector representation. Furthermore, most of these methods do not make use of recent developments in deep learning, such as deep contextualized representations (e.g.
BERT [7]), Highway Networks [8], and the multihop self-attention mechanism [9], which have shown great success in various NLP tasks.
In this paper, the problem of topic segmentation is formulated as binary classification. We introduce a novel deep neural model (MHOPSA-SEG) with a multihop self-attention mechanism that captures different semantic information from multiple representations of the sentence. MHOPSA-SEG includes a deep contextualized input layer (BERT), a bidirectional long short-term memory (BiLSTM) layer, a Highway network layer, and a multihop self-attention layer. To the best of our knowledge, this is the first work that empirically investigates the capability of highway networks and multihop self-attention for the topic segmentation task. We validate MHOPSA-SEG on four datasets, including written texts and lecture transcripts. Compared with the state-of-the-art models, MHOPSA-SEG achieves excellent performance on these datasets, obtaining the best results on all four.
The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the MHOPSA-SEG architecture. Experiments are reported in Sect. 4. Results and discussion are given in Sect. 5. Finally, in Sect. 6, we conclude and outline future work.
2 Related Works
Research on topic segmentation of texts first appeared in 1997 [10]. Since then, a multitude of works have been published on the subject. Traditional works can
A Deep Neural Network Model with Multihop Self-attention Mechanism
broadly be categorized into three approaches:
- methods based on lexical cohesion [10–12];
- methods that use features in either a supervised or an unsupervised way [13–15];
- methods that use topics obtained from topic models (e.g. the LDA topic model [16]) to capture semantic information [17].
These approaches rely heavily on handcrafted features and external resources, which makes them difficult to generalize to a broader variety of datasets. Moreover, they are far from an adequate representation of context.
In recent years, most state-of-the-art methods for segmenting different types of texts have been based on deep learning, and these methods have been shown to outperform earlier techniques. Yu et al. [18] employed a Hidden Markov Model (HMM) framework, in which a deep neural network (DNN) estimates the posterior probability of topics given the bag-of-words in the local context. Wang et al. [19] used a bidirectional long short-term memory (BiLSTM) model combined with a CNN to segment web documents. Sehikh et al. [20] measured lexical cohesion with a bidirectional RNN that captures the context of both the preceding and the following words. Li et al. [21] proposed a generic end-to-end segmentation model. It encodes the input text sequence with a bidirectional RNN and identifies segment boundaries with another RNN combined with a pointer network. Koshorek et al. [22] formulated text segmentation as a supervised learning problem. They used a hierarchy of two LSTM-based sub-networks: a BiLSTM that generates sentence representations, followed by the segmentation prediction network. Badjatiya et al.
[23] proposed an attention-based bidirectional LSTM model in which CNNs learn sentence embeddings and a stacked BiLSTM layer captures the context information of each sentence. The BiLSTM outputs are fed into an attention layer to improve performance. In this paper, we investigate a multihop self-attention-based deep neural model for the topic segmentation task, which provides a multi-aspect semantic representation of sentences.
3 Model Architecture
In this section, we explain the proposed model in detail. As shown in Fig. 1, MHOPSA-SEG mainly consists of the following parts: a deep contextualized representation layer, a bidirectional long short-term memory (BiLSTM) layer, a highway network layer, and a multihop self-attention layer. The deep contextualized word representation is employed to produce comprehensive sentence representations for the BiLSTM layer. The highway network extracts more comprehensive semantic features from the BiLSTM output. The multihop self-attention focuses on important features and captures the multi-aspect semantic information of the highway network output. The details of each layer are presented below.
3.1 Input Representation Layer
We use BERT [7] (Bidirectional Encoder Representations from Transformers) to produce context representations. The BERT model consists of 12 encoder layers, each
encoder being a Transformer block. The output of each encoder layer for a given token can be used as a feature representing that token. The model's inputs are the words of the sentence. Given a sentence of n words $s = \{x_1, x_2, \ldots, x_n\}$, we obtain $E_i$, the input representation of $x_i$, by summing the corresponding token, segment, and position embeddings. At training time, the hidden-state outputs are used to generate new embeddings for each text input, and these embeddings are then fed into the next layer of the model. We use a pre-trained BERT model from Hugging Face¹, with 12 attention heads and 768-dimensional word-piece embeddings.
Fig. 1. The overall architecture of our proposed topic segmentation model.
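The input representation of Sect. 3.1, $E_i$ = token + segment + position embedding, is easy to see on a toy example. The tiny 3-dimensional integer lookup tables below are hypothetical stand-ins for BERT's learned 768-dimensional embeddings. In practice these tables come from the pre-trained model and the ids from its word-piece tokenizer, not from hand-written values.

```python
def input_representation(token_ids, segment_ids, token_emb, segment_emb, position_emb):
    """E_i = token embedding + segment embedding + position embedding, for each position i."""
    reps = []
    for i, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        reps.append([t + s + p for t, s, p in
                     zip(token_emb[tok], segment_emb[seg], position_emb[i])])
    return reps


# Hypothetical 3-d tables (BERT-base uses learned 768-d ones)
TOKEN_EMB = [[1, 2, 3], [4, 5, 6]]                      # indexed by word-piece id
SEGMENT_EMB = [[0, 0, 0], [10, 10, 10]]                 # segment A / segment B
POSITION_EMB = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]  # positions 0, 1, 2

print(input_representation([1, 0, 1], [0, 0, 1], TOKEN_EMB, SEGMENT_EMB, POSITION_EMB))
# [[104, 5, 6], [1, 102, 3], [14, 15, 116]]
```

The same word-piece id (1) yields different inputs at positions 0 and 2 because its position and segment embeddings differ.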
¹ https://huggingface.co/models.
3.2 Bidirectional LSTM (BiLSTM) Layer
RNNs are a class of neural networks designed to process sequential data of arbitrary length. Their recurrent structure allows them to preserve information about past time steps. This is particularly beneficial for sequential data such as texts, where each word depends on the previous ones. In theory, RNNs can use information from sequences of arbitrary length, but in practice they can look back only a few steps because of the vanishing and exploding gradient problems. To overcome this problem, Hochreiter et al. [24] proposed Long Short-Term Memory (LSTM) networks as a variant of RNN. LSTMs can learn long-term dependencies through an adaptive gating mechanism that regulates which information is retained and which is discarded at each time step. Specifically, we distinguish the ‘input gate’ i, which determines which
information is to be updated at each time step, a ‘forget gate’ f, which decides what information to discard, and an ‘output gate’ o, which determines the output. At each time step t, given the word vector $x_t$, the previous hidden state $h_{t-1}$, and the previous cell state $c_{t-1}$, the current cell state $c_t$ and hidden state $h_t$ are calculated as follows:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \qquad (1)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \qquad (2)$$
$$g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g) \qquad (3)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \qquad (4)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \qquad (5)$$
$$h_t = o_t \odot \tanh(c_t) \qquad (6)$$
where σ and tanh denote the sigmoid and hyperbolic tangent functions, respectively. Juxtaposed terms such as $W_f x_t$ denote matrix-vector multiplication, and ⊙ denotes element-wise multiplication. $W_*$ are the weight matrices for the input vector $x_t$, $U_*$ are the weight matrices for the previous hidden state $h_{t-1}$, and $b_*$ denote bias terms. The bidirectional LSTM (BiLSTM) reads the word vectors in both the forward and reverse directions to obtain information about the whole sequence and thus capture more comprehensive features. At each time step, the final output combines the forward and backward output vectors by element-wise sum:
$$h_i = \overrightarrow{h_i} \oplus \overleftarrow{h_i} \qquad (7)$$
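Eqs. (1)-(7) can be traced with a minimal pure-Python sketch. It performs one LSTM step using the standard form $h_t = o_t \odot \tanh(c_t)$, then a bidirectional pass that sums the forward and backward outputs element-wise, as in Eq. (7). The toy dimensions, random weights, and helper names (`lstm_step`, `bilstm`, `random_params`) are illustrative assumptions. A practical model would instead use a deep learning framework's built-in LSTM layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p holds W_*, U_*, b_* for the gates f, o, g, i."""
    def pre(gate):  # W x_t + U h_{t-1} + b for one gate
        wx, uh = matvec(p["W" + gate], x), matvec(p["U" + gate], h_prev)
        return [a + b + c for a, b, c in zip(wx, uh, p["b" + gate])]

    f = [sigmoid(z) for z in pre("f")]       # forget gate, Eq. (1)
    o = [sigmoid(z) for z in pre("o")]       # output gate, Eq. (2)
    g = [math.tanh(z) for z in pre("g")]     # candidate values, Eq. (3)
    i = [sigmoid(z) for z in pre("i")]       # input gate, Eq. (4)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # Eq. (5)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                   # Eq. (6)
    return h, c


def run_lstm(xs, p, hidden):
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(xs, p_fwd, p_bwd, hidden):
    """Eq. (7): element-wise sum of forward and (re-aligned) backward outputs."""
    fwd = run_lstm(xs, p_fwd, hidden)
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]
    return [[a + b for a, b in zip(hf, hb)] for hf, hb in zip(fwd, bwd)]


def random_params(n_in, hidden, rng):
    """Hypothetical small random weights, for illustration only."""
    p = {}
    for gate in "fogi":
        p["W" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        p["U" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


rng = random.Random(0)
words = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]   # three toy 2-d "word vectors"
out = bilstm(words, random_params(2, 3, rng), random_params(2, 3, rng), hidden=3)
print(len(out), len(out[0]))   # 3 3
```

Summing keeps the output width equal to the hidden size. Concatenating the two directions, the other common choice, would double it.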
3.3 Highway Network Layer
In this paper, the highway network is used to control the information flow, which alleviates gradient problems, and to capture more semantic features of sentences. The output h of the BiLSTM is passed to the highway network layer, whose output is calculated as follows:
$$t_g = \sigma(W_g h + b_g) \qquad (8)$$
$$z = t_g \otimes f(W_h h + b_h) + (1 - t_g) \otimes h \qquad (9)$$
where σ is the element-wise sigmoid function, ⊗ is the element-wise product, and f is the rectified linear unit (ReLU). $W_g$ and $W_h$ are weight matrices, and $b_g$ and $b_h$ are bias vectors. $t_g$ denotes the transform gate, which controls how much information is transformed and passed to the next layer. $(1 - t_g)$ is the carry gate, which allows the input to be passed directly to the next layer. The highway network input h and output z have the same shape.
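Eqs. (8)-(9) can be sketched in a few lines for a single input vector, with f taken as the ReLU. The identity and zero weights in the usage lines are hypothetical values chosen to show the two extremes of the gate. A strongly negative gate bias makes the layer carry its input almost unchanged, while a strongly positive one passes on the transformed value.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def highway(h, W_g, b_g, W_h, b_h):
    """One highway layer: transform gate (Eq. 8) mixing f(W_h h + b_h) with h (Eq. 9)."""
    t = [sigmoid(a + b) for a, b in zip(matvec(W_g, h), b_g)]           # transform gate t_g
    fh = [relu(a + b) for a, b in zip(matvec(W_h, h), b_h)]             # f = ReLU
    return [tk * fk + (1.0 - tk) * hk for tk, fk, hk in zip(t, fh, h)]  # same shape as h


def highway_stack(h, layers):
    """Apply several highway layers in turn (Sect. 4.5 uses two)."""
    for W_g, b_g, W_h, b_h in layers:
        h = highway(h, W_g, b_g, W_h, b_h)
    return h


# Hypothetical weights chosen to show the two extremes of the gate
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
h = [1.0, -2.0]
print(highway(h, Z2, [-20.0, -20.0], I2, [0.0, 0.0]))  # gate near 0: input carried through
print(highway(h, Z2, [20.0, 20.0], I2, [0.0, 0.0]))    # gate near 1: ReLU(h) passed on
```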
3.4 Multihop Self-attention Layer
The multihop self-attention mechanism [9] is an iterative process that applies multiple self-attention steps. Each attention step assigns different weights to the words depending on the previous memory weights, which makes it possible to capture multiple vector representations of a sentence. In this section, we apply multihop self-attention to the topic segmentation task. The output of the highway layer is a matrix $Z = \{z_1, z_2, \ldots, z_n\}$, where n is the length of the input sentence. At each step k, the weighted sentence representation is calculated as follows:
$$S^k = \tanh(W_h^k Z) \odot \tanh(W_m^k m^{k-1}) \qquad (10)$$
$$\beta^k = \mathrm{softmax}(w_S^k S^k) \qquad (11)$$
where the W terms are attention weight matrices and $w_S^k$ is an attention weight vector. $m^k$ denotes a memory vector that guides the next attention step. It is initialized from the vectors $z_i$ produced by the highway layer and recursively updated:
$$m^0 = \frac{1}{n} \sum_{i} z_i, \qquad m^k = m^{k-1} + u^k \qquad (12)$$
At each step, the sentence is represented by a vector $u^k$ that focuses on specific aspects of the sentence. It is obtained as the sum of the highway outputs $z_t$ weighted by $\beta^k$:
$$u^k = \sum_{t} \beta_t^k z_t \qquad (13)$$
After K steps of self-attention, K vector representations of the sentence have been calculated, each a weighted sum of the highway layer outputs Z. A classification probability is calculated from each $u^k$, and the K results are averaged to obtain the final classification:
$$R^k = \mathrm{softmax}(u^k) \qquad (14)$$
$$R = \frac{1}{K} \sum_{k=1}^{K} R^k \qquad (15)$$
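The multihop self-attention of Eqs. (10)-(15) can be followed on one sentence with a minimal pure-Python sketch. It makes two assumptions. First, when scoring the words it reads the memory left by the previous hop, $m^{k-1}$. Second, a softmax of $u^k$ alone gives class probabilities only if $u^k$ has one entry per class, so the sketch adds a hypothetical output projection V standing in for the fully connected layer of Sect. 3.5. All weights are toy values, not trained parameters, and the helper names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def multihop_attention(Z, hops, V):
    """Sketch of Eqs. (10)-(15) for one sentence.

    Z    -- n highway outputs z_t, each a list of d floats
    hops -- K tuples (W_h, W_m, w_s): per-step weights of shapes d x d, d x d, d
    V    -- hypothetical output projection (classes x d) applied before the
            softmax of Eq. (14); an assumption, not part of the equations
    Returns the averaged class distribution R and the attention weights of each hop.
    """
    n, d = len(Z), len(Z[0])
    m = [sum(z[j] for z in Z) / n for j in range(d)]              # m^0, Eq. (12)
    R_sum, betas = [0.0] * len(V), []
    for W_h, W_m, w_s in hops:
        mem = [math.tanh(x) for x in matvec(W_m, m)]               # tanh(W_m^k m^{k-1})
        S = [[math.tanh(a) * b for a, b in zip(matvec(W_h, z), mem)] for z in Z]  # Eq. (10)
        beta = softmax([sum(w * s for w, s in zip(w_s, s_t)) for s_t in S])      # Eq. (11)
        u = [sum(b * z[j] for b, z in zip(beta, Z)) for j in range(d)]           # Eq. (13)
        m = [a + b for a, b in zip(m, u)]                          # m^k = m^{k-1} + u^k
        R_k = softmax(matvec(V, u))                                # Eq. (14), via V
        R_sum = [a + b for a, b in zip(R_sum, R_k)]
        betas.append(beta)
    return [r / len(hops) for r in R_sum], betas                  # Eq. (15)


# Toy inputs: three 2-d highway outputs, two hops, two classes (all values hypothetical)
Z_TOY = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
HOPS = [([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]], [1.0, -1.0]),
        ([[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [-1.0, 1.0])]
V_TOY = [[1.0, -1.0], [-1.0, 1.0]]

R, betas = multihop_attention(Z_TOY, HOPS, V_TOY)
print("R =", R)
for k, beta in enumerate(betas, start=1):
    print(f"hop {k} weights:", beta)
```

Each hop produces its own distribution over the words, so different hops can emphasize different parts of the sentence before the results are averaged.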
3.5 Output and Training
Once the sentence representation is obtained, we predict the class via a fully connected network that extracts key features, with the softmax function as the activation function. With $a_i$ denoting the i-th input to the softmax, the function is:
$$S_i = \frac{e^{a_i}}{\sum_j e^{a_j}} \qquad (16)$$
where $S_i$ represents the i-th output value of the softmax function. The class with the highest probability is taken as the prediction. The cost function of the model is the cross-entropy between the true class labels y and the predictions:
$$C = -\sum_i y_i \ln S_i \qquad (17)$$
where $y_i$ is the true label and $S_i$ is the predicted probability. The parameters are trained by minimizing this loss function.
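Eqs. (16)-(17) are sketched below for a single sentence. Topic segmentation is treated here as binary classification, and the two classes are assumed to be "boundary" and "no boundary". The logits are hypothetical. Subtracting the maximum logit before exponentiating is a standard numerical-stability step that leaves the softmax output unchanged.

```python
import math


def softmax(logits):
    """Eq. (16); subtracting the maximum logit avoids overflow without changing the result."""
    top = max(logits)
    exps = [math.exp(a - top) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, probs):
    """Eq. (17): C = -sum_i y_i * ln(S_i); terms with y_i = 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(y_true, probs) if y > 0)


# Hypothetical logits for the classes [boundary, no boundary] of one sentence
probs = softmax([2.0, 0.5])
prediction = max(range(len(probs)), key=lambda i: probs[i])   # most probable class
loss = cross_entropy([1, 0], probs)                           # true class: boundary
print(probs, prediction, loss)
```

With a one-hot label, the loss reduces to the negative log-probability of the true class.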
4 Experiments
4.1 Datasets
We evaluate our model on four datasets of different types². Statistics of the datasets are presented in Table 1.
Table 1. Datasets statistics
Dataset | #Docs | #Segments | #Sentences | Avg. segments per doc | Avg. segment length (sentences) | Avg. document length (sentences)
Wiki. Train | 270 | 3358 | 49197 | 12.43 | 14.65 | 182.21
Wiki. Test | 30 | 244 | 6047 | 8.13 | 24.78 | 201.57
Clinical | 227 | 909 | 31868 | 4.00 | 35.06 | 140.39
Physics | 23 | 129 | 10370 | 5.60 | 80.38 | 450.86
Ai | 22 | 269 | 11304 | 12.22 | 42.02 | 513.82
Wikipedia (Test). The first dataset is a randomly selected collection of 30 English Wikipedia articles used in [23]. For each article, the gold-standard segment boundaries correspond to the section breaks as they appear in its table of contents.
Clinical Dataset [12]. The second dataset contains 227 documents collected from a medical book, where each document is a chapter. The documents are divided into segments as indicated by the author.
Artificial Intelligence Dataset [11]. The third dataset consists of 22 manually transcribed and segmented lectures from a graduate Artificial Intelligence class. For each lecture, the segmentation was provided by the lecturer.
Physics Dataset. The last dataset contains 23 transcribed and segmented lectures from a Physics class, collected from MIT OpenCourseWare³. In this dataset, the gold-standard segment boundaries correspond to the section breaks specified by the lecturer.
² https://github.com/noufay/TopSeg_datasets.
³ https://ocw.mit.edu.
4.2 Evaluation Metrics
To evaluate the performance of the segmentation models, we use the Pk [25] and WindowDiff (Wd) [26] measures. Both metrics move a window over the document and determine whether the sentences at the beginning and at the end of the window are correctly segmented with respect to each other. The window width is generally set to half the average segment length in the reference segmentation. Pk checks whether or not the two sentences are in the same segment. Wd additionally requires that the number of segment boundaries between the two sentences be identical in the hypothesized and reference segmentations. Pk and Wd are penalty metrics: lower values indicate better performance.
4.3 Baselines
We compare the performance of MHOPSA-SEG with a traditional model and a recent attention-based model: BayesSeg⁴ [12] and Att-CNN-BiLSTM⁵ [23], respectively. BayesSeg is an unsupervised model in which lexical cohesion is placed in a Bayesian framework. Its authors marginalize over all possible language models, which improves segmentation accuracy, and use dynamic programming to search the space of segmentations. The Att-CNN-BiLSTM model is described in Sect. 2. The BayesSeg and Att-CNN-BiLSTM implementations are publicly available, and we select their parameters using the scripts included with those distributions.
4.4 Preprocessing
We apply standard preprocessing steps to the texts: removing punctuation, digits, and non-content words, using the list employed by several competitive systems [12].
4.5 Training and Parameters
We train all the deep learning models on 270 documents from the Wikipedia dataset used in [23], which yields about 49k training sentences (see Table 1). We split the dataset into 0.8/0.2 train/validation sets and train the model in batches of size 64 for 15 iterations, the setting that gave the best segmentation results. The BiLSTM layers have 100 neurons.
In the experiments, the number of highway network layers is set to 2 and the number of self-attention steps k to 2. To reduce overfitting on the training dataset, we apply a dropout rate of 0.3 to the input and recurrent gates of the recurrent layers, after the multihop self-attention layer, and after the dense fully connected layers. For the Att-CNN-BiLSTM model, we keep the same parameters as described in [23].
⁴ https://github.com/jacobeisenstein/bayes-seg.
⁵ https://github.com/pinkeshbadjatiya/neuralTextSegmentation.
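The Pk and WindowDiff computations described in Sect. 4.2 can be sketched as follows. In this sketch a segmentation is encoded as a list of boundary flags, where 1 means a topic boundary follows that sentence. The window defaults to half the average reference segment length, rounded with Python's `round()`. Published implementations differ in details such as rounding and edge handling, so values may not match other toolkits exactly.

```python
def segment_ids(boundaries):
    """Turn boundary flags (1 = a topic boundary follows sentence i) into segment ids."""
    ids, current = [0], 0
    for flag in boundaries:
        current += flag
        ids.append(current)
    return ids


def window_size(ref):
    """Half the average reference segment length, rounded (at least 1)."""
    n_sentences = len(ref) + 1
    n_segments = sum(ref) + 1
    return max(1, round(n_sentences / (2 * n_segments)))


def pk(ref, hyp, k=None):
    """Pk: share of windows where ref and hyp disagree on 'same segment or not'."""
    k = k or window_size(ref)
    r, h = segment_ids(ref), segment_ids(hyp)
    n = len(r)
    errors = sum((r[i] == r[i + k]) != (h[i] == h[i + k]) for i in range(n - k))
    return errors / (n - k)


def window_diff(ref, hyp, k=None):
    """WindowDiff: share of windows whose boundary counts differ between ref and hyp."""
    k = k or window_size(ref)
    n = len(ref) + 1
    errors = sum(sum(ref[i:i + k]) != sum(hyp[i:i + k]) for i in range(n - k))
    return errors / (n - k)


# Hypothetical 9-sentence document: the reference has three segments of three sentences;
# the hypothesis places its first boundary one sentence too early.
REF = [0, 0, 1, 0, 0, 1, 0, 0]
HYP = [0, 1, 0, 0, 0, 1, 0, 0]
print(pk(REF, HYP), window_diff(REF, HYP))   # both equal 2/7
```

Pk only asks whether the two window ends share a segment. WindowDiff compares boundary counts, so it also penalizes a window that contains too many or too few boundaries.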
5 Results and Discussion Table 2 presents the test set results (Wd and Pk error rates) of MHOPSA-SEG and the baseline models on the four datasets. Note that the BayesSeg model requires the number of segments as an input parameter, which is not the case for the neural models. This is a practical advantage of the neural models, because such a parameter is generally unavailable in practice. Compared with the Att-CNN-BiLSTM model, MHOPSA-SEG reduces (Wd, Pk) by (1.9%, 3.1%), (0.8%, 0.3%), (3.2%, 3.2%) and (1.9%, 1.6%) on the Wikipedia, Clinical, Physics and Ai datasets, respectively. Compared with the BayesSeg model, the reductions are (4.3%, 7.4%), (5.4%, 4.3%), (4.2%, 6.2%) and (4.8%, 4.1%), respectively. In general, the performance of MHOPSA-SEG differs markedly between the written-text datasets and the lecture-transcript datasets. This is expected, since the datasets are of different types and exhibit different characteristics, which indicates the sensitivity of topic segmentation to the type of document. Transcribed lectures are less cohesive and have a more complex structure; therefore, their topic boundaries tend to be harder to identify.
Table 2. Test set results for baselines and MHOPSA-SEG
Model             Wikipedia       Clinical        Physics         Ai
                  Wd     Pk       Wd     Pk       Wd     Pk       Wd     Pk
BayesSeg          0.349  0.392    0.340  0.358    0.391  0.422    0.424  0.439
Att-CNN-BiLSTM    0.325  0.349    0.294  0.318    0.381  0.392    0.395  0.414
MHOPSA-SEG        0.306  0.318    0.286  0.315    0.349  0.360    0.376  0.398
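The (Wd, Pk) improvements quoted in the text are absolute differences, in percentage points, between Table 2 entries (for example, 0.325 - 0.306 = 0.019 for Wikipedia Wd against Att-CNN-BiLSTM). The short Python check below recomputes them from the table values; the dictionary layout and the helper name improvement are illustrative choices, not part of the original study.

```python
# Table 2 error rates as (Wd, Pk) per dataset; lower is better.
TABLE2 = {
    "BayesSeg": {"Wikipedia": (0.349, 0.392), "Clinical": (0.340, 0.358),
                 "Physics": (0.391, 0.422), "Ai": (0.424, 0.439)},
    "Att-CNN-BiLSTM": {"Wikipedia": (0.325, 0.349), "Clinical": (0.294, 0.318),
                       "Physics": (0.381, 0.392), "Ai": (0.395, 0.414)},
    "MHOPSA-SEG": {"Wikipedia": (0.306, 0.318), "Clinical": (0.286, 0.315),
                   "Physics": (0.349, 0.360), "Ai": (0.376, 0.398)},
}


def improvement(baseline: str, model: str = "MHOPSA-SEG") -> dict:
    """Absolute error reduction (baseline - model) in percentage points, per dataset."""
    return {
        dataset: tuple(round((b - m) * 100, 1)
                       for b, m in zip(TABLE2[baseline][dataset], scores))
        for dataset, scores in TABLE2[model].items()
    }


for baseline in ("Att-CNN-BiLSTM", "BayesSeg"):
    print(baseline, improvement(baseline))
```

Rounding to one decimal after scaling removes floating-point noise, since every table entry has exactly three decimals. Computing the differences directly from the table is a convenient way to check the figures quoted in the text.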
5.1 Comparison with the BayesSeg Model MHOPSA-SEG and BayesSeg were compared on the four datasets above. The deep neural model outperforms the traditional model on all datasets. This suggests that deep learning models and mechanisms can effectively learn semantic representations of texts and can therefore improve text segmentation performance. Compared with traditional models, neural networks can extract richer semantic features. 5.2 Comparison with the Att-CNN-BiLSTM Model MHOPSA-SEG and the attention-based method were compared on the four datasets above. By using an attention mechanism, Att-CNN-BiLSTM significantly improved segmentation accuracy. In Att-CNN-BiLSTM, the attention mechanism is applied to an RNN to form a strong segmentation model, which performs better than most of the
F. Nouar and H. Belhadef
existing segmentation models (see Ref. [23]). However, the performance of Att-CNN-BiLSTM is limited by its single feature-vector representation. Topic segmentation requires capturing more semantic information from the texts; the Att-CNN-BiLSTM model probably loses some features, which may explain why it performs worse than MHOPSA-SEG.
6 Conclusion In this paper, we introduced a novel neural network model for topic segmentation of texts. The model combines deep contextualized representations, BiLSTMs, highway networks, and a multihop self-attention mechanism to improve segmentation accuracy. It was evaluated on four datasets of different types, and the experimental results show that MHOPSA-SEG achieves lower Wd and Pk error rates than both baselines on all four. In the future, we plan to use other deep neural models to further optimize MHOPSA-SEG.
References 1. Shtekh, G., Kazakova, P., Nikitinsky, N., Skachkov, N.: Applying topic segmentation to document-level information retrieval. In: Proceedings of the 14th Central and Eastern European Software Engineering Conference, Moscow, Russia, pp. 1–6. ACM (2018) 2. Oyedotun, O.K., Khashman, A.: Document segmentation using textural features summarization and feedforward neural network. Appl. Intell. 45(1), 198–212 (2016) 3. Boufaden, N., Lapalme, G., Bengio, Y.: Topic segmentation: a first stage to dialog-based information extraction. In: Natural Language Processing Pacific Rim Symposium, NLPRS 2001, pp. 273–279, Tokyo, Japan. Citeseer (2001) 4. Özmen, C., Streicher, A., Zielinski, A.: Using text segmentation algorithms for the automatic generation of e-learning courses. In: Proceedings of the 3rd Joint Conference on Lexical and Computational Semantics, *SEM@COLING 2014, pp. 132–140. Association for Computational Linguistics and Dublin City University, Dublin, Ireland (2014) 5. Purver, M.: Topic segmentation. In: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, pp. 291–317. Wiley, New York (2011) 6. Liu, P., Qiu, X., Chen, X., Wu, S., Huang, X.-J.: Multi-timescale long short-term memory neural network for modelling sentences and documents. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 2326–2335. Association for Computational Linguistics (2015) 7. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, USA, vol. 1, pp. 4171–4186. Association for Computational Linguistics (2019) 8. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. 
In: Proceedings of the 28th International Conference on Neural Information Processing Systems, Quebec, Canada, pp. 2377–2385. MIT Press (2015) 9. Tran, N.K., Niederée, C.: Multihop attention networks for question answer matching. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, Michigan, USA, pp. 325–334. Association for Computing Machinery (2018)
10. Hearst, M.A.: TextTiling: segmenting text into multi-paragraph subtopic passages. Comput. Linguist. 23(1), 33–64 (1997) 11. Malioutov, I., Barzilay, R.: Minimum cut model for spoken lecture segmentation. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, pp. 25–32. Association for Computational Linguistics (2006) 12. Eisenstein, J., Barzilay, R.: Bayesian unsupervised topic segmentation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii, pp. 334–343. Association for Computational Linguistics (2008) 13. Tür, G., Hakkani-Tür, D., Stolcke, A., Shriberg, E.: Integrating prosodic and lexical cues for automatic topic segmentation. Comput. Linguist. 27(1), 31–57 (2001) 14. Galley, M., McKeown, K., Fosler-Lussier, E., Jing, H.: Discourse segmentation of multi-party conversation. In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 562–569. Association for Computational Linguistics (2003) 15. Hsueh, P.-Y., Moore, J.D.: Automatic topic segmentation and labeling in multiparty dialogue. In: 2006 IEEE Spoken Language Technology Workshop, pp. 98–101. IEEE (2006) 16. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993– 1022 (2003) 17. Riedl, M., Biemann, C.: TopicTiling: a text segmentation algorithm based on LDA. In: Proceedings of ACL 2012 Student Research Workshop, Jeju Island, Korea, pp. 37–42. Association for Computational Linguistics (2012) 18. Yu, J., Xiao, X., Xie, L., Chng, E.S., Li, H.: A DNN-HMM approach to story segmentation. In: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, San Francisco, CA, USA, pp. 1527–1531. International Speech Communication Association (2016) 19. 
Wang, L., Li, S., Xiao, X., Lyu, Y.: Topic segmentation of web documents with automatic cue phrase identification and BLSTM-CNN. In: Lin, Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds.) Natural Language Understanding and Intelligent Applications: 5th CCF Conference on Natural Language Processing and Chinese Computing, vol. 10102, pp. 177–188. Springer, Cham (2016) 20. Sehikh, I., Fohr, D., Illina, I.: Topic segmentation in ASR transcripts using bidirectional rnns for change detection. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, pp. 512–518. IEEE (2017) 21. Li, J., Sun, A., Joty, S.R.: SegBot: a generic neural text segmentation model with pointer network. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 4166–4172. AAAI Press (2018) 22. Koshorek, O., Cohen, A., Mor, N., Rotman, M., Berant, J.: Text segmentation as a supervised learning task. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, USA, Volume 2 (Short Papers), pp. 469–473. Association for Computational Linguistics (2018) 23. Badjatiya, P., Kurisinkel, L.J., Gupta, M., Varma, V.: Attention-based neural text segmentation. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) Advances in Information Retrieval, ECIR 2018. Lecture Notes in Computer Science, vol. 10772, pp. 180–193. Springer, Cham (2018) 24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 25. Beeferman, D., Berger, A., Lafferty, J.: Statistical models for text segmentation. Mach. Learn. 34(1–3), 177–210 (1999) 26. Pevzner, L., Hearst, M.: A critique and improvement of an evaluation metric for text segmentation. Comput. Linguist. 28(1), 19–36 (2002)
Data Science and Big Data Analytics
Big Data Interoperability Framework for Malaysian Public Open Data
Najhan Muhamad Ibrahim1(B), Amir Aatieff Amir Hussin2, Khairul Azmi Hassan3, and Ciara Breathnach3
1 Department of Information System, International Islamic University Malaysia, 53100 Jalan Gombak, Selangor, Malaysia
2 Department of Computer Sciences, International Islamic University Malaysia, 53100 Jalan Gombak, Selangor, Malaysia
3 Health Research Institutes, University of Limerick, Limerick V94 T9PX, Ireland
Abstract. Massive quantities of Malaysian Open Data are available in the public domain, for example through data.gov.my. However, most of the available datasets are not integrated; depending on their source, some are structured and others unstructured. As a result, the datasets cannot interconnect, i.e., they are not 'interoperable' with one another, which leads to a Big Data (BD) problem. Advances in database management systems and in linked-data techniques for interconnecting database systems provide extraordinary opportunities to create relationships between distributed datasets for a particular objective. The fast growth of computing technologies has driven digitization, which in turn makes it possible to query various open datasets. Public Open Data come from varying sources and in varying sizes and formats. These big and small datasets pose various integration problems for Information Technology frameworks. To generate meaningful linked data that supports the purposes of our study, the relationships between these disparate datasets need to be identified and integrated. This paper proposes a BD interoperability framework to integrate Malaysian public health open data. The main goal is to enable potential applications that use current technologies to extract knowledge from, and discover patterns in, Public Open Data. This could reduce the overall cost of healthcare by allowing better prevention mechanisms to be put in place at the right time. With a public open big data framework for health, it may be possible to predict patterns of future disease that could otherwise take several years to understand. Keywords: Big data · Public open data · Interoperability framework
1 Introduction Big data was originally associated with three key characteristics: volume, variety, and velocity. Big data also includes structured, semi-structured, and unstructured data [1]. This research paper focuses on variety as the key concept of big data with multiple © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 421–429, 2021. https://doi.org/10.1007/978-3-030-70713-2_39
N. M. Ibrahim et al.
types of data. Graham, Milligan, and Weingart (2017) define Big Data (BD) as 'the data that you are unable to use conventional technology to handle or, even more inclusively, the data that requires special computing intervention to manipulate' [2, 3]. Public open data, such as civil registration (CR) data, which meets the definition identified by Graham et al., will be the main source of data. In the last few years, CR data has been recorded in multiple formats, such as Microsoft Excel and Microsoft Word files and conventional database systems such as Microsoft Access and MySQL, across geographically distributed locations. Once all of these records are combined, they will amount to about a million registrations. BD can also be described as datasets or databases that common computing applications, such as spreadsheet software like Microsoft Excel, are incapable of operating on, managing, and manipulating. In addition to size, descriptions of BD also incorporate the use of data analysis techniques to examine datasets and to derive information and new knowledge from the patterns they reveal. According to the literature, existing work focuses on big data quality frameworks, and there is no concrete big data interoperability framework for Malaysian public open data [11]. It is therefore important to develop a holistic big data interoperability framework for public open data in Malaysia. Gullo [7] argued for the importance of finding new solutions and methods capable of handling the 'fast-growing numbers of digitization of data'. Without a proper solution, the integration of multiple types of data would remain unmanageable. This paper therefore argues that BD interoperability is essential for generating useful patterns from multiple types of datasets and has the potential to enable the discovery of meaningful knowledge in the future.
Fig. 1. KDD processes [8]
Knowledge discovery in databases (KDD) is a well-known method for data pattern processing, data mining, information discovery, and knowledge extraction that finds useful information in datasets. As shown in Fig. 1, the KDD workflow consists of several processes that identify the data source, select the target data, preprocess it, transform the
Big Data Interoperability Framework for Malaysian Public Open Data
data, generate patterns, and produce knowledge. These workflows facilitate the data analysis process and provide a useful method for discovering meaningful information and patterns of new knowledge. KDD has grown into a well-known standard that influences several fields of research, such as machine learning, database analysis, artificial intelligence, and knowledge discovery. Recently, a growing number of research articles and technical reports have focused on data mining for BD analysis to generate information patterns [9]. In this work, the basic concepts of KDD have been adopted in the proposed framework. BD analysis can examine huge volumes of data drawn from different datasets and identify significant patterns, unexplored potential, data trends, and other meaningful information. Public Open Data (POD) is not only huge in size but also varied in format, and its content can be difficult to understand. POD can also be characterized by particular attributes of its information, being 'unstructured, incomplete, multi-dimension, and heterogeneous' [5], which may require a new methodology to identify valuable patterns. Fast-growing technologies enable us to collect multiple types of data in search of meaningful patterns of information; nevertheless, generating more data also means generating more uncertain or unusual data. It is therefore important to develop a holistic method for integrating heterogeneous datasets [6]. Moreover, the advent of BD has created the problem of data overflow in and for computer systems. There are several comprehensive mutual agreements between industry, academia, and government sectors to collaborate in research and development. While multiple data types give rise to interoperability issues, they also offer the potential for creative solutions with wider applications.
There is also wide discussion of the ability of BD to go beyond traditional data records. However, the growing number of data types and the increasing volume, speed, and complexity of data may increase the number of interoperability issues. It is therefore important to find common ground for integrating these datasets [6]. To address the interoperability of existing BD and small datasets in POD, we propose a BD interoperability framework that accommodates the integration of different types of available data records through a semantic-preserving transformation approach. The proposed conceptual framework is a reference process, i.e., a general guideline for developing a big data interoperability framework and data analysis model. It also focuses on the general integration requirements that enable integration between the available open data. The proposed conceptual framework is based on the KDD methodology and holds that different types of data should each have their own local data model.
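The idea that each data type keeps its own local data model, which is then mapped into a shared representation without losing meaning, can be illustrated with a small Python sketch. Everything here is hypothetical: the shared Record type, the column names, and the date formats of the two sources are invented for illustration and are not taken from the Malaysian datasets or from the framework's actual data model. Each adapter normalizes only the representation (date format, capitalization) and records provenance, so the original local model can still be traced.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical shared schema that every local data model maps into."""
    person_id: str
    event: str
    event_date: date
    state: str
    source: str  # provenance: which local model the record came from


def from_civil_registration(row: dict) -> Record:
    # Hypothetical CR export: day-first dates such as "05/03/1990".
    return Record(
        person_id=row["reg_no"].strip(),
        event=row["event_type"].strip().lower(),
        event_date=datetime.strptime(row["date"], "%d/%m/%Y").date(),
        state=row["state"].strip().title(),
        source="civil_registration",
    )


def from_census(row: dict) -> Record:
    # Hypothetical census extract: ISO dates and upper-case state names.
    return Record(
        person_id=row["id"].strip(),
        event="census",
        event_date=date.fromisoformat(row["enumeration_date"]),
        state=row["STATE"].strip().title(),
        source="census",
    )


ADAPTERS = {"civil_registration": from_civil_registration, "census": from_census}


def integrate(sources: dict) -> list:
    """sources maps a source name to CSV text in that source's local format."""
    records = []
    for name, text in sources.items():
        adapter = ADAPTERS[name]
        records.extend(adapter(row) for row in csv.DictReader(io.StringIO(text)))
    return sorted(records, key=lambda r: (r.person_id, r.event_date))


cr_csv = "reg_no,event_type,date,state\nP001,Birth,05/03/1990,selangor\n"
census_csv = "id,enumeration_date,STATE\nP001,2010-07-06,SELANGOR\n"
records = integrate({"civil_registration": cr_csv, "census": census_csv})
```

Adding a new source in this design means writing one more adapter, while the shared schema and downstream analysis stay unchanged.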
2 The Proposed Big Data Interoperability Framework This research is based on a case study of Malaysian public open data in health, drawn from Data.gov.my. Malaysian public open data was chosen because the available data is large, publicly available, and covers multiple data types over several years. The research consists of two main phases. The first phase, a literature review that identifies the significant data sources, investigates currently available research in sources such as conference proceedings, book chapters, and journals. This examination requires an understanding of the research issues in BD interoperability for Public Open Data; its purpose is to identify and validate the gaps in current research that the proposed framework should address. The second phase is the development of the data model that serves as the main engine of the proposed BD interoperability framework and will be based on the Hadoop ecosystem. Apache Hadoop is an open-source system capable of solving
Fig. 2. BD interoperability framework
big data problems using a network of many computers. It distributes the storage and processing of huge datasets through the Hadoop Distributed File System (HDFS) and the MapReduce programming model [8]. As presented in Fig. 2, the proposed BD interoperability framework draws on four main data sources: Civil Registration Data, Census Data, National Statistics Data, and Miscellaneous Data. These data types can be further categorized: Civil Registration Data as life course, Census Data as life grid, National Statistics as life event, and Miscellaneous Data as verification data. These data will be generated in each local data model to develop a Big Data Analysis System (BDAS) using the Hadoop ecosystem. Then, the knowledge-based system is illustrated. The BDAS is the main engine of the interoperability framework: it facilitates the overall integration of multiple types of local data models into one ontology that reflects each local model. Finally, a knowledge-based data model will be developed to show the relationships between the different data sources and to generate new knowledge patterns.
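The MapReduce programming model mentioned above splits a job into a map phase over distributed input blocks, a shuffle that groups intermediate (key, value) pairs by key, and a reduce phase per key. The single-process Python sketch below only simulates that data flow to show the idea; it does not use Hadoop or HDFS, and the record fields state and disease, as well as the partitions standing in for HDFS blocks, are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one ((state, disease), 1) pair per input record."""
    yield (record["state"], record["disease"]), 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(partitions):
    """Map every record of every partition, then shuffle and reduce."""
    mapped = chain.from_iterable(map_phase(r) for part in partitions for r in part)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Two hypothetical partitions, standing in for blocks stored on different nodes.
partitions = [
    [{"state": "Johor", "disease": "dengue"}, {"state": "Kedah", "disease": "dengue"}],
    [{"state": "Johor", "disease": "dengue"}, {"state": "Johor", "disease": "tuberculosis"}],
]
counts = map_reduce(partitions)
```

In a real Hadoop deployment the map and reduce functions run in parallel on the nodes that hold the data, which is what lets the same logic scale to large registration datasets.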
3 Prototype In this section, we discuss the design of the prototype, a preliminary big data analysis method (simulation), for the reference KDD workflow, i.e., the KDD processing workflow to be adapted to each data format, with a corresponding processing pipeline for each of our 4 data sources; its top-level model is shown in Fig. 3. We use DIME, the DyWA Integrated Modeling Environment [10], a domain-specific modeling (DSM) tool for developing web-based software applications that process, via workflows, data that is normally stored in databases. DIME differs from traditional programming tools in its methodical use of a domain-specific modeling language to represent the various aspects of an application. Domain-specific modeling languages are supported in DIME by a graphical design approach that does not require any programming knowledge to develop the software application [10]. In fact, DIME provides graphical modeling tools to support the different aspects needed in developing web-based software applications. Users design data models, data queries, and workflow logic by means of various types of process models. DIME provides process models for the workflows, data models for the metadata schemas, GUI models for the web application presentation layer, and access models for access control definitions (e.g., via user groups). A web application is designed in the DIME graphical interface, which uses a 'drag and drop' concept to develop and manipulate the processing logic (workflow) and the data manipulation of the web application under design. Once the models are ready, a running web application is fully generated and deployed on a standard web stack by DIME's code generator. A central design principle of the DIME platform is to make its use straightforward and simple to manage. It also automates model validation, model transformation, and code generation [10, 11]. DIME itself is a Cinco product.
Fig. 3. Top level reference model of the KDD process in DIME
The DIME model in Fig. 3 corresponds to the reference KDD workflow of Fig. 2. The high-level workflow for processing data through the selection, preprocessing, transformation, data mining, and interpretation phases suggested by the KDD blueprint is easily recognizable. Additionally, there is a starting point for the execution and two termination points. A successful execution path generates intermediate data and products at each phase, leads to the next phase, and terminates in the Success endpoint. At each phase there is also simple error handling, which leads to a web page displaying an error message and then to the Failure termination endpoint. This reference process will be instantiated and adapted to the needs of each of the 4 data sources. The green arrows indicate the logical flow of the computation (here, the
data transformation and analysis). Each phase is itself a process model, which will be designed and implemented for each data source. The dotted lines make the data flow explicit, and the Data context on the left contains all the data collections and intermediate data sets produced and used along the process. If an error occurs in any phase, control passes to the ErrorHandling process. Finally, the pattern of data is evaluated and interpreted to discover knowledge. Concretely, the integration process requires identifying a domain-specific language (DSL) based on the API of the software framework that implements each dataset's format, e.g., actions to access Microsoft Excel and to import and export tables, rows, and columns. A DSL model is expected to be needed for each type of data source to be integrated (e.g., Excel, Access). Once this DSL is available, the code for creating, managing, and interpreting those data components is generated automatically by DIME. The impact of such a DSL model becomes clear when considering the standardization of the process model used to develop the application system: once Access is integrated as a technology with its own DSL, any Access data set becomes easily accessible. Consequently, we argue that DIME, with its underlying DyWA data integration model, can support the development of advanced applications and realize reusable DSLs for POD. The first part of the KDD workflow requires a web application user interface for adding data to the databases. We have designed this user interface in DIME through a GUI model that uses the DSL of existing predefined GUI elements, such as forms, buttons, and text input fields.
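The control flow of the reference KDD model in Fig. 3 (a start point, five phases that each produce intermediate data, a Success endpoint, and an error path to a Failure endpoint) can be approximated in plain Python. This is only an illustrative sketch under stated assumptions, not DIME's generated code: the phase functions and the yearly case-count records are hypothetical, and the context dictionary merely plays the role of the Data context.

```python
from collections import defaultdict


def run_kdd(data, phases):
    """Run the KDD phases in order, each receiving the previous phase's output.

    The context dict keeps the input and every intermediate result. Any
    exception goes to a simplified error handler and ends in "Failure";
    otherwise the run ends in "Success".
    """
    context = {"input": data}
    current = data
    for name, phase in phases:
        try:
            current = phase(current)
        except Exception as exc:  # simplified ErrorHandling step
            context["error"] = f"{name} failed: {exc}"
            return "Failure", context
        context[name] = current
    return "Success", context


def total_by_state(pairs):
    """Data mining step: aggregate case counts per state."""
    totals = defaultdict(int)
    for state, cases in pairs:
        totals[state] += cases
    return dict(totals)


# Hypothetical phases for one data source: yearly case counts per state.
PHASES = [
    ("selection", lambda rows: [r for r in rows if r["year"] >= 2015]),
    ("preprocessing", lambda rows: [r for r in rows if r["cases"] is not None]),
    ("transformation", lambda rows: [(r["state"], r["cases"]) for r in rows]),
    ("data_mining", total_by_state),
    ("interpretation", lambda totals: max(totals, key=totals.get)),
]

rows = [
    {"state": "Johor", "year": 2016, "cases": 120},
    {"state": "Kedah", "year": 2014, "cases": 80},
    {"state": "Johor", "year": 2017, "cases": None},
    {"state": "Kedah", "year": 2018, "cases": 40},
]
status, context = run_kdd(rows, PHASES)
```

Adapting the reference process to another data source then amounts to supplying a different list of phases, while the success and failure handling stays the same.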
Fig. 4. GUI model in DIME: GUI design for a simple form to receive input data
Figure 4 shows how a web page form element (right) connects to the fields of the User data type (left) as defined in the database. The Submit button on the GUI triggers storing the data in the database. The code of the web application is generated
from that GUI model, so the user does not need to know any advanced programming language to design and implement the application. Designing the database of an application system generally requires at least basic knowledge of SQL; in DIME, no SQL knowledge is needed. With the library packages for GUI design and the other actions supported by DIME, the model- and DSL-based approach enables the easy and flexible generation of data models and their integration into web applications. This is a generalization of the computational workflow approach successfully applied to scientific workflows in the past [11] and a further step towards the bigger aim of knowledge management for IT inclusivity advocated in.
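To make concrete what the generated code spares the user from writing, the following hand-written Python/sqlite3 sketch shows roughly what a Submit handler for a form bound to a User data type amounts to. It is an illustration under assumptions, not DIME's generated code: the text does not list the fields of the User type in Fig. 4, so the fields name and email and the table name users are hypothetical.

```python
import sqlite3

# Hypothetical User data type: the actual fields shown in Fig. 4 are not listed.
USER_FIELDS = ("name", "email")


def create_user_table(conn: sqlite3.Connection) -> None:
    """Table a generator might derive from the User data type."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
    )


def on_submit(conn: sqlite3.Connection, form: dict) -> int:
    """Submit handler: bind each form field to the matching User column and insert."""
    columns = ", ".join(USER_FIELDS)
    placeholders = ", ".join("?" for _ in USER_FIELDS)
    values = tuple(form[field] for field in USER_FIELDS)
    with conn:  # commits on success, rolls back if the insert fails
        cursor = conn.execute(
            f"INSERT INTO users ({columns}) VALUES ({placeholders})", values
        )
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
create_user_table(conn)
new_id = on_submit(conn, {"name": "Aminah", "email": "aminah@example.org"})
```

The column list is built only from the fixed field names, while user-entered values are passed as query parameters, which is the usual way to avoid SQL injection when binding form input.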
4 Conclusion The rapid development of computing technologies enables us to generate and capture a variety of data types from the available data sources. BD interoperability has therefore become an important research domain for ensuring the integration process. The lack of interoperability awareness in Public Open Data in Malaysia motivated this research to focus on the development of a holistic BD interoperability framework. The aim of this research is to facilitate the integration processes among the multiple types of available public open data and to support flexible BD integration between different POD sources. This paper provided a conceptual view of the DIME application, which can be considered a model-driven development environment, for developing a data integration process for POD. The main objective is to explore promising applications of BD analysis for POD. Future work will concentrate on data modeling for the local data models and the data aggregation model of the proposed framework, and on the DSLs needed for the technical integration. A comprehensive data model should make it possible to produce a final data model that can discover and generate new patterns of information (knowledge).
References 1. Trandabat, D., Gifu, D.: Social media and the web of linked data. In: ACM/IEEE Joint Conference on Digital Libraries (JCDL) (2017) 2. Hammad, S., Telfah, A., Ezzeldien, M., Morsi, H.: Current developments in biomedical research. Int. J. Adv. Biomed. 1, 1–3 (2016). ISSN 2357-0490. https://doi.org/10.18576/ab/010101 3. Graham, S., Milligan, I., Weingart, S.: Big Digital History: Exploring Big Data through a Historian’s Macroscope. Imperial College Press, London (2015) 4. Kitchin, R.: Big data, new epistemologies and paradigm shifts. Big Data Soc. 1, 1–12 (2014). https://doi.org/10.1177/2053951714528481 5. Breathnach, C., Ibrahim, N.M., Clancy, S., Margaria, T.: Towards model checking product lines in the digital humanities: an application to historical data. In: From Software Engineering to Formal Methods and Tools, and Back, pp. 338–364 (2019). https://doi.org/10.1007/978-3-030-30985-5_20 6. National Institute of Standards and Technology (NIST), U.S. Department of Commerce, Big Data Interoperability Framework: Volume 1, Definitions (2018)
7. Gullo, F.: From patterns in data to knowledge discovery: what data mining can do. Phys. Procedia 62, 18–22 (2015). 3rd International Conference Frontiers in Diagnostic Technologies 8. Singh, R.K.: Taxonomy of Big Data analytics: methodology, algorithms and tools. Int. J. Fut. Revolution Comput. Sci. Commun. Eng. 4(12), 101–104 (2018). ISSN 2454-4248 9. Gyamfi, N.K., Appiah, P., Sarpong, K.A., Gah, S.K., Katsriku, F., Abdulai, J.: Big Data analytics: survey paper. In: Conference Proceeding: Dialogue on Sustainability and Environmental Management, Accra, Ghana, 15–16 February (2017) 10. Sun, A.Y., Scanlon, B.R.: How can Big Data and machine learning benefit environment and water management: a survey of methods, applications, and future directions. Environ. Res. Lett. 14, 073001 (2019). https://doi.org/10.1088/1748-9326/ab1b7d 11. Ijab, M.T., Ahmad, A., Kadir, R., Hamid, A.: Towards Big Data quality framework for Malaysia’s public sector open data initiative. In: International Visual Informatics Conference, IVIC 2017. Advances in Visual Informatics, pp. 79–87. Springer, Cham (2017)
The Digital Resources Objects Retrieval: Concepts and Figures
Wafa’ Za’al Alma’aitah1,2(B), Abdullah Zawawi Talib2, and Mohd Azam Osman2
1 Department of Basic Sciences, The Hashemite University, Zarqa, Jordan
2 School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Pulau Pinang, Malaysia
{azht,azam}@usm.my
Abstract. The rapid growth of digital resource objects (DROs) and of the valuable content they hold has increased the availability of these resources to users. Enhancing the accessibility of these resources requires catering to the needs of users and providing search results that are closer to their requests. Recently, researchers have shifted DRO search from the data retrieval (DR) approach to the information retrieval (IR) approach. Various DRO retrieval systems have been built to facilitate access to DRO contents, and the effectiveness of such systems needs to be evaluated. This paper presents the characteristics of the collections that should be available in DRO test collections. It also reviews the computational evaluations and statistical tests used to evaluate the performance of DRO retrieval. Keywords: Digital resource objects · Evaluations · Test collections · Statistical tests
1 Introduction A digital resource object (DRO) is structured information that describes, explains, and eases the retrieval, use, and management of information resources [1, 2]. Apart from storing content, DROs offer platforms to seek, retrieve, and organize content from databases. The standardized description of resources aids in discovering and retrieving information resources in digital format by describing individual files, single objects, or complete collections [3]. An essential part of DROs is the digital cultural heritage collection. Many cultural heritage organizations, such as galleries, libraries, archives, and museums, have moved towards massive digitization of information to secure the long-term preservation of valuable archived materials [4]. Unlike traditional objects held in memory institutions, DRO content can be shared, combined, and aggregated online, and the content of digital files can be easily modified. These features provide many benefits for users of digitized content: they enhance access to digital collections and allow their reuse for research, learning, and the development of new commercial content [5–8]. Recently, DRO handling has shifted to information retrieval (IR) © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 430–438, 2021. https://doi.org/10.1007/978-3-030-70713-2_40
The Digital Resources Objects Retrieval: Concepts and Figures
431
instead of data retrieval (DR) [9–11]. To evaluate the effectiveness of the IR performance in DROs, many researchers have adopted the traditional evaluation measurements in the IR communities [12–15]. The main issue is how to determine the best collection, and also the most efficient and appropriate evaluation methods. Hence, this paper covers reviews on the following entities related to DROs: 1. Characteristics of test collections with some statistics on the common DRO test collections. 2. Performance measurements in DRO retrievals. 3. Statistical tests in DRO performance evaluations. The research methodology is briefly presented in Fig. 1.
Fig. 1. Research methodology (diagram boxes: Digital Resource Objects (DROs); Digital Cultural Heritage Collection; Evaluations; Performance Measurements; Statistical Tests)
The remainder of the paper is structured as follows. Section 2 gives an overview of digital cultural heritage collections. Section 3 presents performance measurements for DRO retrieval, and Sect. 4 presents statistical tests in DROs. Section 5 presents the findings, and Sect. 6 summarizes and concludes the paper.
2 Digital Cultural Heritage Collections
Cultural heritage refers to the legacy inherited from the past that shapes how people live in the present and that is passed on to future generations. Cultural heritage comprises two categories [16]:
i. Tangible cultural heritage: objects that are (a) movable (e.g. paintings, antiquities and artefacts) or (b) immovable (e.g. buildings, monuments and archaeological sites).
ii. Intangible cultural heritage: heritage that cannot be touched but can be perceived through other senses, such as what is seen during a play or dance performance, or heard when stories are read or music is played.
W. Z. Alma’aitah et al.
Digitization of tangible and intangible heritage objects has created a new form of cultural heritage known as digital cultural heritage, or cultural heritage information resources [17]. These resources include a vast range of objects, contents and artefacts. The European Commission has asserted that the cultural memory of Europe comprises prints (newspapers, journals, books, etc.), photographs, museum objects, archival documents, sound and audio-visual materials, monuments and archaeological sites. Three fundamental activities are integral to generating and using digital cultural heritage: digitization, access and preservation [18]. The first, digitization, is the conversion of objects from analogue to digital form; born-digital objects, which have no analogue form, skip this step, and object creation takes its place. The second, access, lets users view digital heritage objects and gives them intuitive and efficient tools to search for resources. The third, preservation, ensures that a digital object remains available and functional now and in the future. Unlike published library materials, copies of which can be found elsewhere, cultural heritage objects are unique, and their accessibility and use in their original physical form are limited. Because digitized materials have no geographic boundaries, however, they bridge cultural gaps and are valuable for education. Once cultural heritage objects are digitized, metadata play a significant role: they enhance the usability and efficacy of search systems by offering a range of access points, preserving both context and semantics, and linking materials that exist in multiple versions with those from similar collections [19].
Metadata provide both detailed frameworks for a specific community and general frameworks for resource search across communities. A metadata unit also includes rights management and preservation information. It has been suggested that cultural heritage institutions adjust their digital collection planning to suit the nature of the listed objects and the needs of users. Cultural collections are commonly rich in unique material, including physical objects, written texts, maps, photographs, sound recordings and, in some cases, original digital objects [19]. It is therefore common to encounter collections with rich semantics and intricate structures. Composite objects in cultural heritage collections (e.g. traditional costumes, or photographs with text) are better separated into parts according to their structure, so that each part can be described with suitable metadata elements. Indeed, cultural heritage institutions regard digitization as the best way to preserve vulnerable and rare objects. Digital collections must therefore be compiled according to established digitization practices, in particular by recording all data about the digitization devices and processes used. CHiC2013 is a collection extracted from the Europeana website (www.europeana.com). It has been available since 2013 to help IR researchers evaluate the effectiveness of information access to cultural heritage materials. It also provided feedback to the cultural heritage community on improving document representation, so that these materials can be accessed by people anywhere in the world, for any purpose.
The collection consists of sub-collections, each containing metadata records that describe cultural heritage objects, such as the scanned version of a manuscript, an image of a painting or sculpture, or an audio or video recording. Roughly 62% of the metadata records describe images, 35% describe texts, 2% describe audio and the remaining 1% describe video recordings. Each metadata unit is mapped to a single XML format in which the metadata consist of different elements (title, keywords, description, date, provider, etc.) that briefly describe the object [20]. The collection is divided into 14 sub-collections according to the language of the record's content provider, as listed in Table 1. Table 2 presents some statistics of the CHiC2013 test collection. Many authors chose the CHiC2013 collection for the following reasons:
i. It was the only DRO collection available with test queries and a set of relevance judgments indicating which documents and metadata in the collection are relevant to each query.
ii. It has been used extensively in previous studies and serves as a good baseline [13, 21, 22]. The availability of these baseline results makes it easy to compare existing and proposed methods and to verify the objectivity and accuracy of such comparisons [23, 24].
Table 1. CHiC collections by language and media type [20].
iii. The CHiC2013 collection exhibits several issues typical of DROs, such as poor-quality metadata content, difficulty in accessing and retrieving it, and heterogeneous metadata content, as reported in [9, 11, 25–27].
iv. The collection provides a large number of documents, each consisting of a large number of metadata units. Working with this collection is therefore equivalent to working with several collections, as each document is a collection in itself.
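To make the record structure described above concrete, the following is a minimal sketch that parses one XML metadata record into its elements and joins them into a single indexable text. The element names and the sample record are hypothetical, modelled only on the fields listed in the text (title, keywords, description, date, provider); the actual CHiC2013 schema should be checked against the collection documentation [20].

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag names mirror the elements listed in the text,
# not the official CHiC2013/Europeana schema.
SAMPLE_RECORD = """
<record id="hypothetical-0001">
  <title>Portrait of a lady</title>
  <keywords>painting; portrait; oil on canvas</keywords>
  <description>Oil painting, seventeenth century.</description>
  <date>1650</date>
  <provider>Example Museum</provider>
</record>
"""

FIELDS = ("title", "keywords", "description", "date", "provider")


def parse_record(xml_text: str) -> dict:
    """Map each known element to its text; missing or empty elements become ''."""
    root = ET.fromstring(xml_text.strip())
    record = {"id": root.get("id", "")}
    for field in FIELDS:
        element = root.find(field)
        # Heterogeneous metadata often lacks some elements entirely.
        record[field] = (element.text or "").strip() if element is not None else ""
    return record


def to_index_text(record: dict) -> str:
    """Join the non-empty fields into one string, e.g. for a bag-of-words index."""
    return " ".join(record[field] for field in FIELDS if record[field])


if __name__ == "__main__":
    record = parse_record(SAMPLE_RECORD)
    print(record["title"], "|", record["date"])
    print(to_index_text(record))
```

Returning an empty string for absent elements, rather than failing, reflects the incomplete and heterogeneous metadata that the collection is known for.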
Table 2. Statistics of the CHiC2013 test collection
Parameter name | Value
Number of documents | 1107
Number of testing queries based on document retrieval | 22
Number of testing queries based on metadata unit retrieval | 17
Average number of query terms | 1.6
3 Evaluation Performance Measurement in DRO Retrievals
Typically, the performance of IR systems is assessed by measuring effectiveness, i.e. how well a system separates relevant from non-relevant documents for a given user query [28, 29]. Effectiveness is evaluated with standard metrics: mean average precision (MAP), precision at the top N documents (P@N, commonly P@10) and the precision-recall curve [30, 31]. These measures reflect two main goals: 1) retrieving as many relevant results as possible and 2) placing the results closest to the query at the top of the ranked list. Several works have evaluated ranked DRO document retrieval with these traditional IR evaluation methods [9, 12, 32–34].
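The sketch below shows how these metrics are typically computed from a ranked result list and a set of relevance judgments (qrels). The document identifiers and judgments are made up for illustration. In practice, tools such as trec_eval are normally used, and their conventions (for example, tie handling) may differ from this simplified version.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_n(ranking: Sequence[str], relevant: Set[str], n: int = 10) -> float:
    """P@N: fraction of the top-n positions holding a relevant document.
    Missing positions (fewer than n results) count as non-relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """AP: sum of the precision at the rank of each relevant document retrieved,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP: mean of per-query AP over all judged queries (a missing run scores 0)."""
    aps = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps)


def pr_points(ranking: Sequence[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """(recall, precision) after each rank: the raw points of a precision-recall curve."""
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points


if __name__ == "__main__":
    qrels = {"q1": {"d1", "d3", "d6"}, "q2": {"d1"}}  # hypothetical judgments
    runs = {"q1": ["d1", "d2", "d3", "d4", "d5"], "q2": ["d2", "d1"]}
    print(precision_at_n(runs["q1"], qrels["q1"], n=5))  # 2 relevant in top 5 -> 0.4
    print(average_precision(runs["q1"], qrels["q1"]))    # (1/1 + 2/3) / 3
    print(mean_average_precision(runs, qrels))            # (5/9 + 1/2) / 2
```

AP rewards both goals named above: every relevant document found adds to the sum, and finding it earlier adds more.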
4 Statistical Test in DROs
The next step in the evaluation is to compare the values of the evaluation metrics obtained by the different methods [35, 36]. This makes it possible to determine whether differences in the results are meaningful or merely due to chance, a distinction that requires a statistical test [37]. Several statistical tests have been applied in IR tasks, such as the independent t-test, the paired t-test and ANOVA [38, 39]. The paired t-test is the most widely used test in IR; it is a parametric test, as reported in [40–43]. Likewise, many works on DRO retrieval have adopted the paired t-test [12, 14, 15, 21, 44, 45]. Looking up the value of t in the t-distribution gives the P-value, i.e., the probability of observing results at least as extreme as the sample results, assuming that the null hypothesis is true.
The P-value is compared with a predetermined significance level α to decide whether the null hypothesis should be rejected; a significance level of 0.05 is commonly used [46, 47]. The t-statistic has the following form:
$$ t = \frac{m_1 - m_2}{\sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}} \qquad (1) $$
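where m1 and m2 are the mean values of the first and second algorithms, SD1 and SD2 their standard deviations, and n1 and n2 their sample sizes. Equation (1) is the pooled-variance statistic for two independent samples, with n1 + n2 − 2 degrees of freedom. The paired t-test instead works on the per-query differences d_i between the two systems: t = d̄ / (s_d / √n), with n − 1 degrees of freedom, where d̄ and s_d are the mean and standard deviation of the differences and n is the number of queries. The null hypothesis is rejected if the P-value of the t-test is smaller than α, which is typically set to 0.05 (95% confidence). A smaller P-value indicates stronger evidence against the null hypothesis; it does not by itself measure how large the difference is.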
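The sketch below computes both the pooled statistic of Eq. (1) and the paired statistic on the same per-query scores, and derives a two-sided P-value using only the Python standard library, by numerically integrating the Student t density with Simpson's rule. The scores are made-up AP values for two hypothetical systems. In practice a statistics library would normally be used (for example SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel`).

```python
import math
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 10_000) -> float:
    """P(|T| >= |t|) = 1 - 2 * (integral of the density over [0, |t|]), via Simpson's rule."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def mean_sd(values: Sequence[float]) -> Tuple[float, float, int]:
    """Sample mean, sample standard deviation (n - 1 denominator) and sample size."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd, n


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Eq. (1): independent two-sample t statistic with pooled variance."""
    m1, sd1, n1 = mean_sd(a)
    m2, sd2, n2 = mean_sd(b)
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return t, n1 + n2 - 2


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic on per-query differences: mean(d) / (sd(d) / sqrt(n))."""
    if len(a) != len(b):
        raise ValueError("a paired test needs one score per query for each system")
    m_d, sd_d, n = mean_sd([x - y for x, y in zip(a, b)])
    return m_d / (sd_d / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Made-up per-query AP scores of two systems on the same four queries.
    system_a = [0.5, 0.6, 0.7, 0.8]
    system_b = [0.4, 0.5, 0.5, 0.7]
    alpha = 0.05
    for name, test in (("pooled, Eq. (1)", pooled_t), ("paired", paired_t)):
        t, df = test(system_a, system_b)
        p = two_sided_p(t, df)
        print(f"{name}: t = {t:.3f}, df = {df}, p = {p:.4f}, reject H0: {p < alpha}")
```

With these made-up scores, the paired test rejects the null hypothesis at α = 0.05 while the pooled statistic does not. Pairing removes the variation between queries, which is why per-query comparisons in IR usually call for the paired form.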
5 Findings
Most studies have used cultural heritage (CH) collections as test collections for DROs [8]. This indicates the importance of CH collections and shows that access to them deserves more research and development. The evaluation measures applied in previous studies fall into two kinds: 1. Performance measurement: the traditional IR measures are MAP, P@N and the precision-recall curve [29], and they are also effective for evaluating DRO retrieval [30, 32]. 2. Statistical testing: the most suitable statistical test for DRO retrieval is the paired t-test [11, 42, 43].
6 Summary and Conclusion
The aim of this paper is to determine which test collections and evaluation methods are most appropriate for DROs. The paper presents an overview of DRO test collections, their characteristics and their availability. It also reviews studies on performance evaluation in both traditional IR and DROs, particularly on the CHiC2013 collection. Finally, it describes in detail the statistical tests applied in DRO retrieval evaluation. The findings are that CH collections are preferable as test collections for DROs, and that traditional IR evaluation measures are also effective for evaluating DRO retrieval. In future work, IR performance will be evaluated with standard computational measures, and users' views and experiences will be explored as an additional aid to evaluating IR performance.
References 1. Witten, I.H., Bainbridge, D., Paynter, G., Boddie, S.: Importing documents and metadata into digital libraries: requirements analysis and an extensible architecture. In: Agosti, M., Thanos, C. (eds.) Research and Advanced Technology for Digital Libraries, pp. 390–405. Springer, Heidelberg (2002) 2. Wang, J.: Massive information management system of digital library based on deep learning algorithm in the background of big data. Behav. Inf. Technol., 1–9 (2020) 3. Kalisdha, A., Suresh, C.: Digital libraries: definitions, issues and challenges. Int. J. Sci. Humanit. 3(1), 95–10 (2017) 4. Pattuelli, M.C.: Modeling a domain ontology for cultural heritage resources: a user-centered approach. J. Am. Soc. Inform. Sci. Technol. 62(2), 314–342 (2011) 5. Manžuch, Z.: Ethical issues in digitization of cultural heritage. J. Contemp. Arch. Stud. 4(2), 4 (2017) 6. Hassett, B.R.: The ethical challenge of digital bioarchaeological data. Archaeologies 14(2), 185–188 (2018) 7. Er, F.D.H., Bulgun, E.Y., Adanır, E.Ö.: Preservation of a textile culture through a digital cultural heritage. Int. J. Sci. Technol. Soc. 6(2), 25 (2018) 8. LeClere, E.: Breaking rules for good? How archivists manage privacy in large-scale digitisation projects. Arch. Manuscr. 46(3), 289–308 (2018) 9. Alma’aitah, W.Z., Talib, A.Z., Osman, M.A.: Opportunities and challenges in enhancing access to metadata of cultural heritage collections: a survey. Artif. Intell. Rev. 53(5), 3621–3646 (2020) 10. Ogawa, K., Murahashi, T., Taguchi, H., Nakajima, K., Takehara, M., Tamura, S., Hayamizu, S.: Spoken document retrieval using neighboring documents and extended language models for query likelihood model. In: NTCIR 2016, pp. 186–190 (2016) 11. Kando, N., Adachi, J.: Cultural heritage online: information access across heterogeneous cultural heritage in Japan.
In: Electronic Proceedings of International Symposium on Digital Libraries and Knowledge Communities in Networked Information Society, DLKC 2004 (2004) 12. Almasri, M., Tan, K., Berrut, C., Chevallet, J.-P., Mulhem, P.: Integrating semantic term relations into information retrieval systems based on language models. In: Asia Information Retrieval Symposium 2014, pp. 136–147. Springer, Cham (2014) 13. Akasereh, M.: A quantitative evaluation of query expansion in domain specific information retrieval. Proc. Am. Soc. Inf. Sci. Technol. 50(1), 1–7 (2013) 14. Alma’aitah, W.Z., Talib, A.Z., Osman, M.A.: Document expansion method for digital resource objects. In: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), pp. 256–260 (2019) 15. Alma’aitah, W.Z., Zawawi Talib, A., Osman, M.: Structured dirichlet smoothing model for digital resource objects. Int. J. Eng. Adv. Technol. 9(1), 4 (2019) 16. Abd Manaf, Z.: The state of digitisation initiatives by cultural institutions in Malaysia: an exploratory survey. Libr. Rev. 56(1), 45–60 (2007) 17. Lor, P.J., Britz, J.J.: An ethical perspective on political-economic issues in the long-term preservation of digital heritage. J. Am. Soc. Inform. Sci. Technol. 63(11), 2153–2164 (2012) 18. Alvey, E.: Cultural heritage information: access and management. Aust. Acad. Res. Libr. 47(2), 120–121 (2016). https://doi.org/10.1080/00048623.2016.1207275 19. Schlötterer, J., Seifert, C., Granitzer, M.: Web-based just-in-time retrieval for cultural content. In: Proceedings of the 7th International ACM Workshop on Personalized Access to Cultural Heritage, pp. 1–5 (2014)
20. Petras, V., Bogers, T., Toms, E., Hall, M., Savoy, J., Malak, P., Pawłowski, A., Ferro, N., Masiero, I.: Cultural heritage in CLEF (CHiC) 2013. In: International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 192–211. Springer, Heidelberg (2013) 21. Tan: Extended language model in cultural heritage collection (2015) 22. Almasri, M., Berrut, C., Chevallet, J.-P.: Wikipedia-based semantic query enrichment. In: Proceedings of the 6th International Workshop on Exploiting Semantic Annotations in Information Retrieval, pp. 5–8. ACM (2013) 23. Qin, T., Liu, T.-Y., Xu, J., Li, H.: LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf. Retrieval 13(4), 346–374 (2010). https://doi.org/10.1007/s10791-009-9123-y 24. Scholer, F., Kelly, D., Carterette, B.: Information retrieval evaluation using test collections. Inf. Retrieval J. 19(3), 225–229 (2016) 25. Signore, O.: The Semantic Web and cultural heritage: ontologies and technologies help in accessing Museum information. In: Proceedings of the Information Technology for the Virtual Museum, pp. 1–31 (2008) 26. Kanhabua, N., Kemkes, P., Nejdl, W., Nguyen, T.N., Reis, F., Tran, N.K.: How to search the internet archive without indexing it. In: International Conference on Theory and Practice of Digital Libraries, pp. 147–160. Springer, Cham (2016) 27. Seifert, C., Bailer, W., Orgel, T., Gantner, L., Kern, R., Ziak, H., Petit, A., Schlötterer, J., Zwicklbauer, S., Granitzer, M.: Ubiquitous access to digital cultural heritage. J. Comput. Cult. Herit. (JOCCH) 10(1), 4 (2017) 28. Al-Maskari, A., Sanderson, M., Clough, P., Airio, E.: The good and the bad system: does the test collection predict users’ effectiveness? In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 59–66. ACM (2008) 29. Kanoulas, E., Carterette, B., Clough, P.D., Sanderson, M.: Evaluating multi-query sessions.
In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1053–1062. ACM (2011) 30. Kontagora, I.U., Hamid, I.R.A.: Comparative studies of information retrieval approaches in user-centered health information system. In: International Conference on Soft Computing and Data Mining, pp. 171–180. Springer, Cham (2018) 31. Xiong, L., Xiong, C., Li, Y., Tang, K.-F., Liu, J., Bennett, P., Ahmed, J., Overwijk, A.: Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808 (2020) 32. Maitah, W., Al-Rababaa, M., Kannan, G.: Improving the effectiveness of information retrieval system using adaptive genetic algorithm. Int. J. Comput. Sci. Inf. Technol. 5(5), 91 (2013) 33. Alma’aitah, W.Z., Talib, A.Z., Osman, M.A.: Language model for digital recourse objects retrieval. J. Theor. Appl. Inf. Technol. 97(11), 2871–2881 (2019) 34. Tan, K.L., Lim, C.K.: Language model: extension to solve inconsistency, incompleteness, and short query in cultural heritage collection. In: AIP Conference Proceedings, vol. 1, p. 020138. AIP Publishing (2017) 35. Urbano, J., Marrero, M., Martín, D.: A comparison of the optimality of statistical significance tests for information retrieval evaluation. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 925–928. ACM (2013) 36. Carterette, B.A.: Multiple testing in statistical analysis of systems-based information retrieval experiments. ACM Trans. Inf. Syst. (TOIS) 30(1), 1–34 (2012) 37. Dyckman, T.R., Zeff, S.A.: Important issues in statistical testing and recommended improvements in accounting research. Econometrics 7(2), 18 (2019)
38. Sakai, T.: Statistical reform in information retrieval? In: ACM SIGIR Forum, vol. 1, pp. 3–12. ACM (2014) 39. Carterette, B.: Statistical significance testing in information retrieval: theory and practice. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1387–1389 (2017) 40. Klink, S., Hust, A., Junker, M., Dengel, A.: Improving document retrieval by automatic query expansion using collaborative learning of term-based concepts. In: Document Analysis Systems V, pp. 376–387. Springer, Heidelberg (2002) 41. Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the 16th ACM Conference on Conference on Information and Knowledge Management, pp. 623–632 (2007) 42. Urbano, J., Lima, H., Hanjalic, A.: Statistical significance testing in information retrieval: an empirical analysis of type i, type ii and type iii errors. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 505–514 (2019) 43. Hogan, M.D., Carnahan, L.J., Carpenter, R.J., Flater, D.W., Fowler, J.E., Frechette, S.P., Gray, M.M., Johnson, L.A., McCabe, R.M., Montgomery, D.: Information technology measurement and testing activities at NIST. J. Res. Nat. Inst. Stand. Technol. 106(1), 341 (2001) 44. Alma’aitah, W.Z., Zawawi Talib, A., Osman, M.: Information retrieval framework for digital resource objects. Int. J. Adv. Trends Comput. Sci. Eng. 8(1), 6 (2019) 45. Almasri, M.: Semantic query structuring to enhance precision of an information retrieval system: application to the medical domain. In: CORIA 2013, pp. 293–298 (2013) 46. Paranto, S., Zhang, L., Neumann, H.: Management information systems: using a simulated testing package to assess student performance. Res. High. Educ. J. 7, 1 (2010) 47. Dahiru, T.: P-value, a true test of statistical significance? A cautionary note. Ann. 
Ib Postgrad. Med. 6(1), 21–26 (2008)
A Review of Graph-Based Extractive Text Summarization Models
Abdulkadir Abubakar Bichi1 (corresponding author), Ruhaidah Samsudin1, Rohayanti Hassan1, and Khalil Almekhlafi2
1 School of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia {ruhaidah,rohayanti}@utm.my
2 MIS Department, College of Business Administration - Yanbu, Taibah University, Yanbu 42353, Saudi Arabia
Abstract. The amount of text data is continuously increasing in both online and offline storage, making it difficult for people to read through it and find the desired information in the time available. This calls for techniques such as automatic text summarization. A text summary is a shorter form of the original text in which the principal message of the document is preserved. Many approaches and algorithms have been proposed for automatic text summarization, including supervised machine learning, clustering, graph-based and lexical chain methods, among others. This paper presents a review of various graph-based automatic text summarization models. Keywords: Natural language processing · Text mining · Graph approaches
1 Introduction
The volume of documents available today, both on the internet and in offline storage, makes it difficult and time-consuming to read through them and find the required information. This calls for computational methods, and automatic text summarization (ATS) has been found to be the most promising option [1]. A text summary is a shorter form of the original text in which the principal message of the document is preserved [2]. ATS can be classified by several criteria: the number of input files, the generated output, the purpose and the context. By the number of input files, ATS is either single-document or multi-document. Single-document (mono-document) summarization generates a separate summary for each document, whereas multi-document summarization generates one summary for many related documents [3]. By generated output, ATS is either extractive or abstractive. Extractive summarization selects the most important and informative sentences of the document and arranges them in their original order [4]. Abstractive summarization, on the other hand, involves intensive content reformulation, paraphrasing and rewriting of the text in entirely different words [5]; the process is more complex and challenging because it requires deep analysis of linguistic features [6]. By purpose, ATS is either query-focused or generic. In query-focused summarization, the summary is generated according to the user's interest [7]; usually, the system considers the query words or phrases when scoring document sentences. In contrast, a generic summary covers all of the document's subtopics [8] and is unbiased regardless of user preference. Finally, by context, ATS is either indicative or informative. An indicative summary is less detailed and contains only the key outlines of the source document [9], whereas an informative summary covers all topics of the original text in depth and, in most cases, suffices for analysis without referring to the original source [10].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 439–448, 2021. https://doi.org/10.1007/978-3-030-70713-2_41
A. A. Bichi et al.
2 Extractive Text Summarization
An extractive summary is generated by selecting the salient sentences of a document and assembling them. Formally, for a given document D, let S(D) denote the set of all sentences in D, and let L be an integer such that L ≤ |S(D)|/2. An extractive summary is a subset M ⊂ S(D) with |M| ≤ L, where |M| is the number of sentences in M, as shown in Fig. 1. In other words, the summary is formed by removing redundant and unnecessary sentences, whose absence does not affect the fundamental concepts of the document.
Fig. 1. Extractive summarization
The first extractive ATS model was proposed more than 60 years ago [11], but the field remains one of the most challenging research areas in NLP [12]. Early ATS techniques relied on heuristic text features such as term frequency [11], sentence position [13] and title words [14], among others. Later, other techniques were used for extractive ATS, including clustering, graph-based methods, machine learning and lexical chains.
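As a minimal sketch of the formal definition in Sect. 2 combined with the earliest heuristic (term frequency), the code below scores each sentence by the average in-document frequency of its content words, keeps at most L ≤ |S(D)|/2 sentences, and restores their original order. The naive sentence splitter, the tiny stop-word list and the averaging rule are illustrative assumptions, not the exact method of [11].

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int) -> List[str]:
    """Return a subset M of S(D) with |M| <= L, where L <= |S(D)| / 2,
    keeping the chosen sentences in their original order."""
    sentences = split_sentences(text)
    limit = min(max_sentences, len(sentences) // 2)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        # Average frequency, over the whole document, of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Highest score first; ties broken by earlier position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:limit])]


if __name__ == "__main__":
    doc = ("Graphs model text. Graphs rank sentences in text. "
           "Cats sleep. Graph ranking selects sentences.")
    print(extractive_summary(doc, max_sentences=3))
```

Because the limit is capped at half the sentences, asking for three sentences from this four-sentence document still yields at most two.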
3 Graph-Based ATS Models
Graph-based ATS models build on mathematical graph theory: a node is created for each sentence in the document, and an edge is drawn between any two sentences that share some similarity, as shown in Fig. 2.
Fig. 2. Example of text graph [15]
In graph-based ATS, sentences "recommend" other similar sentences, and the importance of a sentence depends on the importance of the sentences that recommend it. Mihalcea and Tarau [15] outline the following general steps for graph-based extractive text summarization, which apply regardless of the specific algorithm or model used:
1. Choose the text units to be represented as graph vertices.
2. Determine the relations between text units and use them to draw the graph edges.
3. Iterate the ranking algorithm until convergence.
4. Sort the vertices by score and select the top-ranked ones as the summary.
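The first two steps can be sketched as follows for sentence-level units. The edge weight is the normalised word-overlap measure described for TextRank [15]: the number of shared words divided by the sum of the logarithms of the two sentence lengths. The regex tokenizer, with no stemming or stop-word removal, is a simplifying assumption.

```python
import math
import re
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lower-cased alphanumeric tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(s1: str, s2: str) -> float:
    """Shared distinct words divided by log|S1| + log|S2| (TextRank's overlap measure)."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    denominator = math.log(len(t1)) + math.log(len(t2))
    if denominator <= 0:  # both sentences are single words: log(1) + log(1) = 0
        return 0.0
    return len(set(t1) & set(t2)) / denominator


def build_sentence_graph(sentences: List[str]) -> List[List[float]]:
    """Steps 1-2: one vertex per sentence, undirected weighted edges (0.0 = no edge)."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = overlap_similarity(sentences[i], sentences[j])
    return weights


if __name__ == "__main__":
    graph = build_sentence_graph(["The cat sat", "The cat ran", "Dogs bark loudly"])
    for row in graph:
        print([round(w, 3) for w in row])
```

The logarithmic normalisation keeps long sentences from collecting high similarity simply because they contain many words.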
Various models have been proposed for graph-based ATS, as discussed in the following subsections.
3.1 Static Graph-Based Model
The static graph-based models build on earlier graph ranking algorithms developed for other applications, such as the Hyperlink-Induced Topic Search (HITS) algorithm [16], the Positional Power Function algorithm [17] and the PageRank algorithm [18]. TextRank [15] was the first graph-based ATS algorithm and is based on PageRank. It represents text sentences as graph vertices and draws an edge between any two sentences that share some similarity, measured by word overlap. Unlike the original PageRank algorithm for web analysis, which uses a directed graph, TextRank uses an undirected graph and introduces a weight w_ij indicating the strength of the relation between sentences i and j. TextRank ranks sentences with a modified PageRank, as shown in Eq. 1:
$$ WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j) \qquad (1) $$
where d is the damping factor (set to 0.85 in [15]), In(V_i) is the set of vertices linking to V_i, and Out(V_j) is the set of vertices that V_j links to; in an undirected graph both are simply the neighbours of the vertex.
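Steps 3 and 4 can be sketched by iterating Eq. (1) directly on a symmetric weight matrix, such as one holding sentence similarities. The damping factor d = 0.85 follows the TextRank setting. The convergence tolerance, the iteration cap and the small star-shaped example graph are illustrative assumptions.

```python
from typing import List, Sequence


def textrank_scores(weights: Sequence[Sequence[float]], d: float = 0.85,
                    tol: float = 1e-6, max_iter: int = 200) -> List[float]:
    """Step 3: iterate Eq. (1) until no score changes by tol or more."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]  # sum over V_k in Out(V_j) of w_jk
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = [
            (1 - d) + d * sum(weights[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if weights[j][i] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Step 4: indices of the k highest-scoring vertices, returned in document order."""
    best = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    return sorted(best)


if __name__ == "__main__":
    # Hypothetical star graph: sentence 0 is similar to all others, which share nothing.
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    scores = textrank_scores(star)
    print([round(s, 3) for s in scores], top_k(scores, 1))
```

The central sentence, recommended by all the others, receives the highest score. This is the "voting" intuition behind the graph-based approach.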
At almost the same time, the LexRank algorithm [19] was proposed by a different group. It serves the same purpose as TextRank but measures sentence similarity with the cosine similarity of tf-idf vectors (an idf-modified cosine), as shown in Eq. 2:
$$ P(u) = \frac{d}{N} + (1 - d) \sum_{v \in adj[u]} \frac{\text{idf-modified-cosine}(u, v)}{\sum_{z \in adj[v]} \text{idf-modified-cosine}(z, v)} \, P(v) \qquad (2) $$
where d is the damping factor, N is the number of sentences and adj[u] is the set of sentences adjacent to u. LexRank supports multi-document summarization and uses other text features, such as sentence length and position, for scoring sentences. Mallick, Das [20] modified TextRank by using a cosine similarity based on inverse sentence frequency (isf). Similarly, Elbarougy, Behery [21] modified TextRank for Arabic ATS by using the number of nouns in each sentence as an additional sentence score. In the same way, Sikder, Hossain [22] modified PageRank for summarizing Bengali text by including other sentence features, such as position and length, in the ranking. Woloszyn, Machado [23] combined cosine similarity with keyword similarity for sentence scoring, and the graph-based ATS algorithm of Natesh, Balekuttira [24] scores sentences by noun position, using the inverse of the distance between two nouns in a sentence to determine the sentence weight. Alzuhair and Al-Dhelaan [25] proposed a hybrid graph-based ATS ranking algorithm that combines PageRank and HITS using the harmonic mean. Barrios, López [26] combined TextRank with the BM25 ranking function for efficient sentence ranking. Mussina, Aubakirov [27] proposed symmetric ranking in graph-based ATS, where a sentence is ranked symmetrically using the length of the longest common substring in the sentence.
3.2 Dynamic Graph-Based Model
The previously discussed ATS algorithms, such as TextRank and LexRank, work on a static graph model.
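As a concrete example of such a static model, the sketch below implements the idf-modified cosine and the iteration of Eq. 2 from LexRank [19]. The following choices are assumptions made for illustration: treating each sentence as a "document" when computing idf, excluding self-similarity, and using d = 0.15. Note that in Eq. 2 the factor d weights the uniform jump term d/N, whereas in Eq. 1 d weights the link term.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def idf_modified_cosine(x: Counter, y: Counter, idf: Dict[str, float]) -> float:
    """Cosine of tf-idf vectors, written with idf squared in the numerator."""
    numerator = sum(x[w] * y[w] * idf[w] ** 2 for w in x.keys() & y.keys())
    norm_x = math.sqrt(sum((x[w] * idf[w]) ** 2 for w in x))
    norm_y = math.sqrt(sum((y[w] * idf[w]) ** 2 for w in y))
    return numerator / (norm_x * norm_y) if norm_x > 0 and norm_y > 0 else 0.0


def lexrank(sentences: List[str], d: float = 0.15, tol: float = 1e-10,
            max_iter: int = 500) -> List[float]:
    """Iterate Eq. (2): P(u) = d/N + (1 - d) * sum over v in adj[u] of
    sim(u, v) / (sum over z in adj[v] of sim(z, v)) * P(v)."""
    tf = [Counter(tokenize(s)) for s in sentences]
    n = len(tf)
    df = Counter(w for counts in tf for w in counts)  # each sentence counts once per word
    idf = {w: math.log(n / c) for w, c in df.items()}
    # Self-similarity is excluded (an assumption); sim[u][v] > 0 means v is in adj[u].
    sim = [[0.0 if u == v else idf_modified_cosine(tf[u], tf[v], idf) for v in range(n)]
           for u in range(n)]
    column_sum = [sum(sim[z][v] for z in range(n)) for v in range(n)]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Isolated sentences receive only the d/N jump probability.
        new_p = [d / n + (1 - d) * sum(sim[u][v] / column_sum[v] * p[v]
                                       for v in range(n) if sim[u][v] > 0)
                 for u in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    docs = ["graph ranking of sentences", "ranking sentences with a graph",
            "graph based sentence ranking works", "the weather is sunny today"]
    print([round(score, 4) for score in lexrank(docs)])
```

The off-topic last sentence shares no words with the others, so it keeps only the jump probability and ranks lowest.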
Ziheng [28] proposed an evolutionary graph model for ATS that takes into account the arrival of sentences into the documents. Sentences are arranged in chronological order, from first to last, and modelled using a directed graph. The author ranks sentences by considering both their similarity to other sentences in the cluster and their similarity to previously selected sentences, using a modified MMR re-ranker equation [29], as shown in Eq. 3:
$$ MMR_{mod2} = \arg\max_{s_i \in R \setminus S} \Big[ \lambda \cdot Score(s_i) + (1 - \lambda) \cdot sim(s_i, Q) - \delta \sum_{s_k \in S} sim(s_i, s_k) - \gamma \sum_{s_j \in P} sim(s_i, s_j) \Big] \qquad (3) $$
where Score(s_i) is the score of sentence s_i, and the similarity terms weighted by δ and γ act as penalties that control redundancy. Gallo, Popelínský [30] enhanced the timestamped graph with time abstraction using a signal function. The method further improves scoring quality by selecting the best pattern and discarding irrelevant edges, as shown in Eq. 4:
$$ Score(s) = Score_{single}(s) + Score_{multi}(s) \qquad (4) $$
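The redundancy-penalty idea behind MMR-style re-rankers can be sketched with the classic greedy maximal marginal relevance criterion of Carbonell and Goldstein, λ·rel(s) − (1 − λ)·max over t ∈ S of sim(s, t). This is deliberately not the modified re-ranker of Eq. 3, whose sets R, S and P are not fully specified here. The relevance scores and similarity matrix in the example are made up.

```python
from typing import Callable, List, Sequence


def mmr_select(relevance: Sequence[float], sim: Callable[[int, int], float],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy classic MMR: repeatedly pick the candidate maximising
    lam * relevance[i] - (1 - lam) * max_{j in selected} sim(i, j)."""
    selected: List[int] = []
    remaining = list(range(len(relevance)))

    def mmr_score(i: int) -> float:
        # Redundancy is the similarity to the closest already-selected sentence.
        redundancy = max((sim(i, j) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Made-up data: sentences 0 and 1 are near-duplicates.
    relevance = [0.9, 0.85, 0.5]
    similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
    print(mmr_select(relevance, lambda i, j: similarity[i][j], k=2, lam=0.5))  # [0, 2]
```

Lowering λ shifts weight from relevance to novelty. With λ = 0.5 the near-duplicate sentence 1 is skipped in favour of sentence 2; with λ = 1 selection is by relevance alone.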
3.3 Graph Pruning-Based Model
Graph pruning-based models of extractive summarization reduce the number of graph nodes and edges by pruning unnecessary vertices and edges, thus reducing graph search time. Patil and Brazdil [31] modified LexRank by pruning the graph before applying the ranking algorithm. Miranda-Jiménez, Gelbukh [32] developed a single-document summarization model using graph pruning based on the HITS ranking algorithm. Similarly, Al-Khassawneh, Salim [33] used a graph triangle method to prune the graph in extractive text summarization. Hark and Karcı [34] introduced the Karcı method, a graph entropy algorithm that filters out irrelevant vertices and selects the most informative sentences in each paragraph for multi-document summarization. Likewise, Uçkan and Karcı [35] proposed using the maximum independent set method to filter out less relevant nodes. Pruning reduces graph search time but adds the time of the pruning step itself, so the overall processing time is not improved; however, ranking and selection are more accurate on the smaller graphs.
3.4 Hypergraph-Based Model
A hypergraph allows a single edge, called a hyperedge, to connect more than two vertices, which enables richer relations between vertices. Wang, Wei [36] and Wang, Li [37] proposed query-focused text summarization models based on hypergraphs. The hypergraph model was extended to multi-document ATS using a vertex-reinforced random walk [38]. Similarly, Lierde and Chow [39] applied clustering to the hypergraph model for query-focused text summarization, first grouping the document into clusters and then constructing a hypergraph for each cluster.
3.5 Affinity Graph-Based Model
The concept of an affinity graph involves grouping nodes that represent similar objects from different graphs.
Wan and Yang [40] used the affinity graph for multi-document summarization, exploiting both inter-document and intra-document diversity to determine the similarity between sentences. Another study applied an affinity-preserving random walk to affinity graph-based ATS [41]. Similarly, Hu, He [42] proposed an affinity model with manifold ranking. Kanitha, Mubarak [43] scored sentences by the sum of their affinity weights for extractive ATS of the Malayalam language.

3.6 Semantic Graph-Based Model
The semantic graph-based model uses a semantic similarity measure to determine relations between document sentences. Ullah and Al Islam [44] applied the semantic graph to extractive text summarization. Their approach first extracts the Predicate Argument Structure (PAS) of each sentence and then measures the semantic similarity between sentences using their PAS. The graph vertices are ranked with the PageRank algorithm and reranked with the MMR algorithm to minimize redundancy. Sevilla, Fernández-Isabel [45] proposed a hybrid approach for the semantic similarity graph that uses both a knowledge source and linguistic features. Similarly, Han, Lv [46] used FrameNet and word embeddings to measure semantic similarity in a semantic graph model for extractive text summarization. Mohamed and Oussalah [47] introduced a semantic graph-based ATS framework that supports both single-document and multi-document generic summarization. In this framework, semantic similarity is determined using both semantic role labeling (SRL) and Wikipedia knowledge.

3.7 Multigraph-Based Model
A multigraph allows more than one edge between two adjacent vertices. The number of edges indicates the strength of the connection and is regarded as its weight. AlZahir, Fatima [48] used a multigraph model to represent text for extractive text summarization. In their model, an edge is drawn for every pair of similar words in adjacent sentences, and the graph is then represented by a symmetric matrix.
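The following sketches the multigraph representation of AlZahir, Fatima [48]. It adds one parallel edge per pair of matching words in adjacent sentences and stores the edge counts in a symmetric matrix. Two choices here are assumptions made for illustration: treating "similar" as "identical after lower-casing", and the helper name `multigraph_matrix`. The original method may use a different word-similarity test.

```python
from typing import List


def multigraph_matrix(sentences: List[str]) -> List[List[int]]:
    """Symmetric matrix of edge counts between adjacent sentences (sketch).

    Assumption: two words count as 'similar' when their lower-cased tokens are
    identical; one parallel edge is added for every matching word pair between
    sentence i and sentence i + 1.
    """
    tokens = [s.lower().split() for s in sentences]
    n = len(sentences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        edges = sum(1 for a in tokens[i] for b in tokens[i + 1] if a == b)
        # Store the multiplicity on both sides so the matrix stays symmetric.
        matrix[i][i + 1] = matrix[i + 1][i] = edges
    return matrix


m = multigraph_matrix(["The cat sat", "the cat ran", "dogs bark"])
print(m)  # [[0, 2, 0], [2, 0, 0], [0, 0, 0]]
```

Summing a row of this matrix is one hypothetical way to turn the edge counts into a per-sentence weight before ranking.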
4 Discussion
The graph-based approach uses the graph structure to determine relations among the document's sentences and to rank them. The most common way to determine the strength of the relation between two sentences in this approach is a similarity measure. The technique has been applied to diverse types of summarization, including single-document, multi-document, generic, and query-specific summarization. As a typical unsupervised technique, it does not require training on annotated data, so it is less expensive to implement. Most graph-based ATS algorithms do not depend on the semantic meaning of words, so they can easily be applied to many languages. The final ranking considers the relation of each sentence to all other sentences in the document, regardless of position, so the generated summaries are readable and coherent. Like the heuristic feature-based and clustering methods, graph-based algorithms are simple to implement.

This review bases its taxonomy on graph structure and classifies the models into seven groups: static graph-based, dynamic graph-based, graph pruning-based, hypergraph-based, affinity graph-based, semantic graph-based, and multigraph-based models. The static graph-based models were the first to appear and are still effective and the most commonly used. In these models, an undirected weighted graph represents the text, and a similarity measure determines the weights of the graph edges. The most common similarity measure is the cosine similarity of tf-idf vectors. Some algorithms combine more than one similarity measure using either the arithmetic mean or the harmonic mean. Such combinations slow the models down but give more accurate scores. Some models, such as LexRank, combine the similarity measures with other sentence features for scoring, but such features have no significant effect. The efficiency of an algorithm in this model depends largely on the accuracy of the similarity calculation and of the ranking function.
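The static pipeline described above can be sketched in plain Python: tf-idf vectors, a cosine-weighted undirected graph, and graph-based ranking. Ranking uses a weighted PageRank-style power iteration in the spirit of TextRank [15] and LexRank [19]. The following are all assumptions chosen for the example, not details of any specific reviewed algorithm:

- whitespace tokenization
- the smoothed idf formula
- the damping factor 0.85
- Jaccard overlap as the second similarity measure
- all function names

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    """One sparse tf-idf vector per sentence (whitespace tokens, smoothed idf)."""
    counts = [Counter(s.lower().split()) for s in sentences]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # sentence frequency of each term
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def harmonic_mean(x: float, y: float) -> float:
    return 2 * x * y / (x + y) if x + y > 0 else 0.0


def rank_sentences(sentences: List[str], combine: bool = False,
                   damping: float = 0.85, iters: int = 50) -> List[float]:
    """PageRank-style scores over an undirected weighted sentence graph.

    Edge weight = cosine similarity of tf-idf vectors, or (combine=True) the
    harmonic mean of that cosine and word-level Jaccard overlap.
    """
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vecs[i], vecs[j])
            if combine:
                sim = harmonic_mean(sim, jaccard(sentences[i], sentences[j]))
            w[i][j] = w[j][i] = sim  # symmetric: the graph is undirected
    strength = [sum(row) for row in w]  # total edge weight at each vertex
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each vertex passes its score to neighbours in proportion to edge weight;
        # isolated vertices keep only the (1 - damping) / n teleport term.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / strength[j] * scores[j]
                            for j in range(n) if strength[j] > 0)
            for i in range(n)
        ]
    return scores


docs = ["graph based summarization ranks sentences",
        "graph ranking selects central sentences",
        "the weather is sunny today"]
print([round(x, 3) for x in rank_sentences(docs)])  # [0.333, 0.333, 0.05]
```

Setting `combine=True` illustrates combining two similarity measures with a harmonic mean. It costs an extra pass over every sentence pair, which is consistent with the observation above that such combinations slow the model down.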
The static graph-based models are popular for their simplicity, ease of implementation, and fast computation. They have been successfully applied to both single-document and multi-document summarization and make efficient use of resources. The dynamic graph-based model, in contrast, considers the time at which sentences arrive in the document when modelling the graph. It therefore uses a directed graph to represent the text sentences. Dynamic graph-based models generate summaries with good readability, but they usually lead to a slow and complex graph representation. Like the static graph-based model, the approach suits both single-document and multi-document summarization.

Graph pruning methods such as triangle counting and graph entropy reduce the number of graph nodes, which improves the efficiency and accuracy of the graph search. However, they add the time complexity of pruning the graph. The model is well suited to generic extractive text summarization: the smaller number of graph vertices makes the scoring and selection of sentences more efficient. The generated summaries also contain less redundancy. Resource utilization in this approach can be reduced with implementation techniques such as dynamic programming.

The affinity graph-based model improves the quality of the generated summary by drawing information from other documents. However, it has higher computing time and resource utilization than the original static graph-based model. It exploits global voting and recommendation by considering each sentence's resemblance to sentences from other documents on similar topics, which makes sentence ranking more accurate. The model is especially suited to multi-document extractive summarization, in which many documents are involved in ranking and selection, and it produces highly informative summaries.

The semantic graph-based models compute similarity more accurately, but their use of external databases makes them slower and language dependent. Their semantic similarity measures require the linguistic tools and grammar of a particular language, which makes an algorithm proposed for one language very difficult to adapt to another.

The hypergraph-based model has limited application, as it is used only for query-focused summarization. Its similarity computation is nevertheless powerful, because a single hyperedge can group more than two sentences. The features of the different graph-based models for extractive text summarization are compared in Table 1.

Table 1. Comparison of various ATS graph-based models

| Model | Similarity measure | Language dependency | Strengths | Weaknesses |
|---|---|---|---|---|
| Static graph-based | Lexical | No | Simple implementation, fast computation, language independent | Less readability |
| Dynamic graph-based | Lexical | No | Coherence, good readability, language independent | Additional computing time |
| Graph pruning-based | Lexical | No | More accurate scoring due to the small size of the graph, language independent | Additional computing time |
| Hypergraph-based | Lexical | No | More accurate similarity calculation, language independent | Applied only to query-focused summarization |
| Affinity graph-based | Lexical | No | High coverage, language independent | Slow computation, poor readability |
| Semantic graph-based | Semantic | Yes | Good similarity scoring | Requires an external knowledge source, language dependent |
| Multigraph-based | Lexical | No | Fast computation, language independent | Less accurate scoring |
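The following illustrates triangle-based pruning (Sect. 3.3, [33]) on a small scale. The sketch keeps only the sentences that lie on at least one triangle of a thresholded similarity graph and discards the rest before ranking. The threshold value, the keep-if-on-any-triangle rule, and the function name are assumptions; the cited work may define its pruning criterion differently.

```python
from itertools import combinations
from typing import List, Set


def triangle_vertices(sim: List[List[float]], threshold: float = 0.1) -> Set[int]:
    """Return the sentences that lie on at least one triangle of the similarity graph.

    Assumptions: `sim` is a symmetric sentence-similarity matrix, and an edge exists
    when similarity >= `threshold`; vertices on no triangle are pruned.
    """
    n = len(sim)
    adj = [{j for j in range(n) if j != i and sim[i][j] >= threshold} for i in range(n)]
    keep: Set[int] = set()
    for i, j, k in combinations(range(n), 3):  # O(n^3): acceptable for a small sketch
        if j in adj[i] and k in adj[i] and k in adj[j]:
            keep.update((i, j, k))
    return keep


sim = [[1.0, 0.5, 0.5, 0.5],
       [0.5, 1.0, 0.5, 0.0],
       [0.5, 0.5, 1.0, 0.0],
       [0.5, 0.0, 0.0, 1.0]]
print(sorted(triangle_vertices(sim)))  # [0, 1, 2]: sentence 3 touches only sentence 0
```

The ranking step would then run only on the retained vertices. This is the trade-off noted above: an extra pruning pass in exchange for a smaller graph to score.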
5 Conclusion
The field of ATS has been studied for more than 60 years, but it remains one of the most challenging areas in natural language processing and information retrieval. There are many approaches to ATS, but graph-based approaches are preferred by many for their low cost and language independence. The graph-based models are classified into static graph-based, dynamic graph-based, graph pruning-based, hypergraph-based, affinity graph-based, semantic graph-based, and multigraph-based models. Each model has its pros and cons, and the choice of model depends on the language and the domain of summarization.
References
1. Aries, A., Zegour, D.E., Hidouci, W.K.: Automatic text summarization: what has been done and what has to be done. arXiv:1904.00688v1 [cs.CL] (2019)
2. Narayan, S., Cohen, S.B., Lapata, M.: Ranking sentences for extractive summarization with reinforcement learning. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 1747–1759 (2018)
3. Cai, X., Li, W.: Ranking through clustering: an integrated approach to multi-document summarization. IEEE Trans. Audio Speech Lang. Process. 21(7), 1424–1433 (2013)
4. Aker, A.: Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities. PhD thesis, Department of Computer Science, University of Sheffield (2013)
5. Wan, X.: Using only cross-document relationships for both generic and topic-focused multi-document summarizations. Inf. Retrieval 11(1), 25–49 (2008)
6. Khan, A., Salim, N.: A review on abstractive summarization methods. J. Theor. Appl. Inform. Technol. 59(1), 64–72 (2014)
7. Zhong, S.-h., et al.: Query-oriented unsupervised multi-document summarization via deep learning model. Expert Syst. Appl. 42(21), 8146–8155 (2015)
8. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001)
9. Narayan, S., Cohen, S.B., Lapata, M.: What is this article about? Extreme summarization with topic-aware convolutional neural networks. J. Artif. Intell. Res. 66, 243–278 (2019)
10. Vollmer, M., et al.: Informative summarization of numeric data. In: 31st International Conference on Scientific and Statistical Database Management (SSDBM 2019), Santa Cruz, CA, USA (2019)
11. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
12. Rezaei, H., et al.: Features in Extractive Supervised Single-Document Summarization: Case of Persian News. arXiv:1909.02776v2 [cs.CL] (2019)
13. Baxendale, P.B.: Machine-made index for technical literature: an experiment. IBM J. Res. Dev. 2(4), 354–361 (1958)
14. Edmundson, H.P.: New methods in automatic extracting. J. ACM 16(2), 264–285 (1969)
15. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
16. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
17. Herings, P.J., Van der Laan, G., Talman, D.: Measuring the power of nodes in digraphs. Technical report, Tinbergen Institute (2001)
18. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
19. Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
20. Mallick, C., et al.: Graph-based text summarization using modified TextRank. In: Soft Computing in Data Analytics, Advances in Intelligent Systems and Computing (2018)
21. Elbarougy, R., Behery, G., Khatib, A.E.: Extractive Arabic text summarization using modified PageRank algorithm. Egypt. Inform. J. (2019)
22. Sikder, R., Hossain, M.M., Robi, F.M.R.H.: Automatic text summarization for Bengali language including grammatical analysis. Int. J. Sci. Technol. Res. 8(6), 288–292 (2019)
23. Woloszyn, V., et al.: Modeling Comprehending and Summarizing Textual Content by Graphs. arXiv:1807.00303v1 [cs.CL] (2018)
24. Natesh, A.A., Balekuttira, S.T., Patil, A.P.: Graph based approach for automatic text summarization. Int. J. Adv. Res. Comput. Commun. Eng. 5(2), 6–9 (2016)
25. Alzuhair, A., Al-Dhelaan, M.: An approach for combining multiple weighting schemes and ranking methods in graph-based multi-document summarization. IEEE Access 7, 120375–120386 (2019)
26. Barrios, F., et al.: Variations of the Similarity Function of TextRank for Automated Summarization. arXiv:1602.03606 [cs.CL], pp. 65–72 (2016)
27. Mussina, A., Aubakirov, S., Trigo, P.: Automatic document summarization based on statistical information. In: 7th International Conference on Data Science, Technology and Applications (DATA 2018) (2018)
28. Ziheng, L.: Graph-based methods for automatic text summarization. School of Computing, National University of Singapore (2007)
29. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998)
30. Gallo, M., Popelínský, L., Vaculík, K.: To text summarization by dynamic graph mining. CEUR Workshop Proc. 2203, 28–34 (2018)
31. Patil, K., Brazdil, P.: Text summarization: using centrality in the pathfinder network. In: IADIS International Conference Applied Computing (2007)
32. Miranda-Jiménez, S., Gelbukh, A., Sidorov, G.: Summarizing conceptual graphs for automatic summarization task. In: Pfeiffer, H.D., et al. (eds.) Conceptual Structures for STEM Research and Education. ICCS 2013. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg (2013)
33. Al-Khassawneh, Y.A., Salim, N., Jarrah, M.: Improving triangle-graph based text summarization using hybrid similarity function. Indian J. Sci. Technol. 10(8) (2017)
34. Hark, C., Karci, A.: Karci summarization: a simple and effective approach for automatic text summarization using Karci entropy. Inform. Process. Manag. 57(3), 102187 (2020)
35. Uçkan, T., Karci, A.: Extractive multi-document text summarization based on graph independent sets. Egypt. Inform. J. 21(3), 145–157 (2020)
36. Wang, W., et al.: HyperSum: hypergraph based semi-supervised sentence ranking for query-oriented summarization. In: 18th ACM Conference on Information and Knowledge Management. ACM (2009)
37. Wang, W., et al.: Exploring hypergraph-based semi-supervised ranking for query-oriented summarization. Inf. Sci. 237, 271–286 (2013)
38. Xiong, S., Ji, D.: Query-focused multi-document summarization using hypergraph-based ranking. Inform. Process. Manag. 52(4), 670–681 (2016)
39. Lierde, H.V., Chow, T.W.S.: Query-oriented text summarization based on hypergraph transversals. Inform. Process. Manag. 56(4), 1317–1338 (2019)
40. Wan, X., Yang, J.: Improved affinity graph based multi-document summarization. In: Human Language Technology Conference of NAACL (2006)
41. Wang, K., et al.: Affinity-preserving random walk for multi-document summarization. In: 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. Association for Computational Linguistics (2017)
42. Hu, P., He, J., Zhang, Y.: Graph-based query-focused multi-document summarization using improved affinity graph. In: Zhang, W.M., Zhang, S. (eds.) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science. Springer, Cham (2015)
43. Kanitha, D.K., Mubarak, D.M.N., Shanavas, S.A.: Malayalam text summarization using graph based method. Int. J. Comput. Sci. Inform. Technol. 9(2), 40–44 (2018)
44. Ullah, S., Al Islam, A.B.M.A.: A framework for extractive text summarization using semantic graph based approach. In: ACM International Conference Proceeding Series (2019)
45. Sevilla, A.F.G., Fernández-Isabel, A., Díaz, A.: Enriched semantic graphs for extractive text summarization. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 217–226 (2016)
46. Han, X., et al.: Text summarization using FrameNet-based semantic graph model. Scientific Programming. Hindawi Publishing Corporation (2016)
47. Mohamed, M., Oussalah, M.: SRL-ESA-TextSum: a text summarization approach based on semantic role labeling and explicit semantic analysis. Inform. Process. Manag. 56(4), 1356–1372 (2019)
48. AlZahir, S., Fatima, Q., Cenek, M.: New graph-based text summarization method. In: IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2015)
Review on Emotion Recognition Using EEG Signals Based on Brain-Computer Interface System
Mona Algarni and Faisal Saeed
College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Abstract. Deep learning is closely related to theories of brain development. The Brain-Computer Interface (BCI) is the latest development in human–computer interaction (HCI). A BCI reads signals from different areas of the human brain and translates them into commands that computer applications can execute. BCI technology has proven effective for recognizing human emotions from EEG signals with high accuracy. When brain signals are collected and analyzed with deep learning algorithms, they help diagnose diseases and distinguish between physical and psychological diseases, which supports correct medical decisions. Combining feature selection methods with classification algorithms allows emotions to be recognized more accurately from EEG signals. Each of these algorithms has its own accuracy and characteristics. In this paper, we review and discuss related studies on BCI technology that focus on classifying emotions from EEG signals. We also review the methods for collecting signals and extracting features from EEG datasets, and we discuss the main challenges of emotion recognition using EEG. The recent studies reviewed are classified by the techniques used in the emotion recognition process. The results show a clear increase in research on emotion recognition as an important area of investigation, as well as a diversity of techniques for extracting and classifying features. Given the state of technological development, we conclude that the interconnection between technology and medicine will generate a large volume of applied solutions in the future, contributing to research on health informatics systems.
A comparison of recent studies in this field shows a wide variety of techniques for detecting emotion and increasingly accurate results.
Keywords: Brain-Computer Interface · Classification methods · Deep learning · EEG · Electroencephalography · Emotion recognition · Feature extraction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 449–461, 2021. https://doi.org/10.1007/978-3-030-70713-2_42
1 Introduction
Recent studies on brain–computer interfaces demonstrate the control of robotic systems through mental processes. A Brain-Computer Interface (BCI) is a direct connection between a human brain and an external device. These interfaces are used to improve brain function, and they enable information to be transmitted to and from the brain in the form of electrical signals. Electroencephalography (EEG) represents an important direction in the development of BCI systems. It detects electrical activity in the human brain and displays the results as waveforms [1]. BCI technology has a real impact on human life. It can be implemented with effective deep learning techniques, in particular to reveal emotions and to detect the activity of people who are unable to express their feelings. It is used in applications such as medicine, marketing, and entertainment [2]. BCI technology has progressed in medical applications, where it helps patients overcome physical and psychological disabilities by diagnosing diseases and recognizing behavioral disorders [3]. BCI technology can analyze brain signals and produce genuine, accurate outputs because a person cannot directly control the output of their brain. BCI systems are also currently used to develop computer-based entertainment through interactive games. BCI can efficiently categorize emotional states such as reflection, engagement, frustration, excitement, and stress [4]. While the applications of BCI vary, this paper focuses on techniques for detecting emotions from EEG. EEG signals differ from one person to another, because people have different emotional responses to the same stimuli.
Several methods are used to extract and classify emotions from EEG signals, and these methods differ in the accuracy and validity of their results. This makes achieving high accuracy in signal analysis a great challenge. The main problems in applied BCI research on emotion classification are obtaining precise classification results and the long time required to classify the signals [4]. This paper is organized as follows:
- Section 2 provides background on BCI and its applications.
- Section 3 summarizes the most important datasets used in the relevant studies.
- Section 4 reviews proposed methods for extracting features from EEG signals.
- Section 5 reviews classification methods for identifying emotions from EEG signals.
- Section 6 discusses the applied methods.
- Finally, the conclusion highlights the most important findings and recommendations.
2 Brain-Computer Interface (BCI)
A Brain–Computer Interface (BCI) is a technology aimed at improving the quality of interaction between humans and computers, and it has many applications. Good human–machine interaction helps the machine understand human emotions and provide solutions to several problems. In addition, a precise emotion recognition system can respond to emotions in real time.
Researchers in computer science, medical science, and psychology are collaborating to create systems that can effectively interpret human emotion. A BCI consists of software and a device that recognize brain activity and use it to control external devices. The primary goal of BCI research is to provide communication services to patients with neuromuscular disorders. A BCI recognizes emotion patterns in brain signals through three main stages: signal acquisition, feature extraction, and classification. These three stages are discussed in the following sections.

The field of BCI-based emotion recognition has developed considerably in recent years. Emotion recognition is a complex process, especially for people with emotional disorders. Earlier studies tried to detect emotions with traditional methods based on text, speech, and facial images. However, their results are biased and not accurate enough, because a person can consciously control these outward expressions. For this reason, researchers have detected emotions using biological signals such as EEG or ECG signals, temperature, heart rate variability (HRV), and eye blinking. These methods are more efficient and accurate than traditional methods [3, 5].

When recognizing emotions from brain signals, most studies classify arousal and valence with the model in Fig. 1, given its clarity and comprehensiveness. High arousal and positive valence indicate happiness, whereas low arousal and negative valence indicate sadness. Figure 1 illustrates the distribution of emotions in this two-dimensional model. The vertical dimension represents valence and the horizontal dimension represents arousal, each ranging from negative to positive. For example, calmness is an emotion with positive valence and low arousal. This model shows that emotions are related to physiological activation, which can be classified through EEG signals [6].
Fig. 1. Model of emotions according to valence and arousal
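The following is a minimal sketch of how ratings in the two-dimensional valence–arousal model can be turned into four classes. It assumes valence and arousal are rated on a 1–9 scale and split at the midpoint 5. This is a common choice for self-assessment data such as DEAP, but it is an assumption here. The example emotion per quadrant follows the usual circumplex convention: happiness at positive valence and high arousal, calmness at positive valence and low arousal, and sadness at negative valence and low arousal. The name for the negative-valence, high-arousal quadrant is an illustrative choice.

```python
# Illustrative example emotion for each quadrant (assumed labels).
EXAMPLE_EMOTION = {
    "HVHA": "happiness",  # positive valence, high arousal
    "HVLA": "calmness",   # positive valence, low arousal
    "LVLA": "sadness",    # negative valence, low arousal
    "LVHA": "stress",     # negative valence, high arousal (illustrative choice)
}


def va_quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Map a (valence, arousal) rating pair to one of four quadrant codes.

    Assumes ratings on a 1-9 scale split at `midpoint`; ratings equal to the
    midpoint are counted as 'high'.
    """
    v = "HV" if valence >= midpoint else "LV"
    a = "HA" if arousal >= midpoint else "LA"
    return v + a


label = va_quadrant(7.2, 2.5)
print(label, EXAMPLE_EMOTION[label])  # HVLA calmness
```

Many EEG studies train binary classifiers (high versus low) separately for valence and arousal. Those two binary outputs combine into exactly these four quadrant codes.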
The main applications of BCI in different fields are described in the following subsections.
2.1 Medical Field
Several studies focus on developing BCIs for a wide range of clinical applications. Examples of medical BCI applications include rehabilitation devices in health centers, which help patients recover after traffic accidents and injuries [3]. Some BCI applications aim to replace organs and senses lost or weakened by disease, such as a lost limb. However, some articles show that the performance of these systems is not yet good enough to solve the problem of physical disability. Medical BCI applications also require high accuracy and efficiency, because errors can have disastrous consequences for patients. For example, an incorrect response from a BCI-controlled robot or wheelchair can cause serious injuries. One study [5] investigated how medical applications need not rely only on metal electrodes placed on the scalp: they can also communicate with sensors implanted directly in the brain to monitor critical cases. These implants give more accurate signals, especially for patients with complete paralysis, who cannot move any part of their body. This technique is a good way to alert healthcare providers, based on EEG signals, that a patient needs help. A review of articles in this field reveals that the time required to make these systems widely available remains unknown [6].

2.2 Entertainment Field
Bontchev [7] found BCI games to be very popular with people interested in interactive games. One of the most important recreational uses of BCI technology is game controllers that support interaction between the player and the game environment. Many toys measure EEG signals with metal electrodes placed around the head. These systems can be fun because they predict player behavior and arouse enthusiasm.
In addition, game developers offer comfortable, easy-to-operate EEG devices with easy-to-wear electrodes [8]. The literature on entertainment uses of BCI indicates that future BCI game development will prioritize "ease of play" and "platform development" as essential elements for the success and spread of a game.
3 EEG Signals Acquisition
In this stage, signals are captured from the participants. The stage also includes all the pre-processing steps that improve signal quality and reduce noise. Signal acquisition methods vary: some studies use freely available ready-made datasets, while others record the desired signals with physiological equipment. Because the captured signals are often weak, they must be amplified and then converted into digital form for use by computer applications [10]. The signal is then pre-processed to prepare it for feature extraction: noise is removed so that the features can be revealed clearly, and various types of filtering can be used in this step [2]. The data sources of the reviewed articles vary. Some of the studies we reviewed collected their own EEG signals, while others used public datasets. We conclude that public datasets offer an additional advantage: they do not require specific recording systems and are inexpensive, free, reliable, and always available. Table 1 compares three datasets commonly used in EEG emotion research.

Table 1. Benchmark EEG emotional databases

| Dataset | Participants | Stimulation | Emotion | Supplementary files | References |
|---|---|---|---|---|---|
| DEAP | 32 | 40 video clips | Emotions rated according to the valence–arousal model | Face video recordings | [1, 11, 15, 16, 24, 27] |
| SEED | 15 | 15 video clips | Positive, neutral, and negative emotions | Face video recordings | [10] |
| AMIGOS | 40 | 16 short video clips and four long video clips, shown to individuals and groups | Valence, arousal, control, familiarity, and liking/disliking | Full-body and depth video recordings | [12] |
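The filtering mentioned in the pre-processing step is often a band-pass filter. The following sketch applies a zero-phase Butterworth band-pass filter with SciPy (`scipy.signal.butter` with `output="sos"`, then `scipy.signal.sosfiltfilt`) to a synthetic signal. The following are assumptions for illustration, not settings reported by the reviewed studies:

- the 4–45 Hz pass band
- the filter order
- the 128 Hz sampling rate
- the 60 Hz interference component

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_eeg(eeg: np.ndarray, fs: float, low: float = 4.0, high: float = 45.0,
                 order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass filter applied along the last axis.

    The 4-45 Hz pass band and order 4 are illustrative defaults, not values
    taken from the reviewed studies.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering avoids phase distortion of the EEG waveform.
    return sosfiltfilt(sos, eeg, axis=-1)


# Synthetic single-channel "EEG": a 10 Hz rhythm plus 60 Hz interference.
fs = 128.0
t = np.arange(512) / fs
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = bandpass_eeg(raw, fs)
```

Second-order sections (`output="sos"`) are used instead of transfer-function coefficients because they are numerically more stable for band-pass designs. Because filtering runs along the last axis, a multichannel array of shape (channels, samples) can be passed in directly.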
4 Feature Extraction
This stage extracts features that capture discriminative information in the recorded brain signals by identifying their functional and distinctive characteristics. The feature vector should be low-dimensional, and overlapping features are excluded to reduce the complexity of this stage. At the feature extraction stage, researchers use specific methods that extract the signals' features and convert them into commands that fulfill the user's intent. Feature extraction methods operate in the frequency domain, the time domain, or the time-frequency domain, and the algorithms can be linear or nonlinear [13, 14]. Table 2 shows some of the effective feature extraction methods:
M. Algarni and F. Saeed Table 2. Comparison of feature extraction methods
Reference
Feature extraction methods
Description
[10, 15]
Discrete Wavelet Transform (DWT)
This method relies on taking separate samples of the wave signal to analyze it separately and extract the features. This method is characterized by temporal accuracy when simultaneously extracting the position and frequency information from the waves, and it is sensitive to the alignment of the EEG signal at the correct time, which increases the accuracy. It is effective in frequency separation
[11]
Discrete Cosine Transform (DCT)
This method is used to process frequency domain of signals and compress data with better efficiency. It is based on the sequence of data points in the specified signal using Cosine calculations that are expressed at different frequencies. DCT method is similar to DWT method in that it fulfills the process of converting signals from spatial domain to frequency domain
[9, 12]
Scale-Invariant Feature Transform (SIFT)
This is a feature extraction algorithm to describe local features in images and signals. It is not affected by noise or slight changes in EEG signals. SIFT key points are extracted from the input signals and stored in a database, and the feature is then identified by comparing each feature of the new signal with the database and finding matching features based on the Euclidean distance of their feature vectors. The accuracy of the results is ensured by considering the accuracy of the fit and the number of potential false matches
[16, 17]
Short-Term Fourier Transform (STFT)
It is a method for analyzing and detecting the frequency contents of the EEG signals at all time points to provide control and perform a multitude of tasks. The STFT method works to divide the EEG signal into several separate segments of short time signals by switching the time window with some interference between the signals, so that each part has a stable signal. This method is characterized by the use of less time and more discrimination in frequency
(continued)
Review on Emotion Recognition Using EEG Signals
455
Table 2. (continued) Reference
Feature extraction methods
Description
[18, 19]
Convolutional Neural Network (CNN)
It is used to extract complex features from the data. A CNN is a multilayer neural network; unlike a fully connected network, in which every neuron in one layer is connected to all neurons in the next layer, its convolutional layers connect each neuron only to a local region of the previous layer. CNNs are used to recognize and distinguish images and signals. They rely on a shared-weights architecture and on translation invariance in the feature extraction process.
[4, 9, 20]
Power Spectral Density (PSD)
This method is commonly used in the frequency domain to extract features from EEG signals. Features are extracted with the Fast Fourier Transform (FFT), an efficient algorithm for computing the discrete Fourier transform and its inverse, which converts data from the time domain to the frequency domain and vice versa. Using the PSD approach, the EEG signal is analyzed and divided into four frequency bands: theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–40 Hz).
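The PSD band features described above, and the overlapping windows used by STFT, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method of any reviewed study: it uses a rectangular window, a direct DFT in place of the FFT, a hypothetical 128 Hz sampling rate, and a synthetic sine wave standing in for a real EEG channel. The band edges are the ones given in the text, and the function names (`periodogram`, `band_powers`, `stft_band_powers`) are illustrative.

```python
import cmath
import math

# Frequency bands from the text (Hz); lower edge inclusive, upper exclusive.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 40)}


def periodogram(signal, fs):
    """One-sided periodogram estimate of the PSD of `signal` sampled at fs Hz.

    A direct O(N^2) DFT keeps the sketch dependency-free; an FFT computes
    the same coefficients much faster.
    """
    n = len(signal)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power = abs(coeff) ** 2 / (fs * n)
        if 0 < k < n / 2:  # fold the negative frequencies into the one-sided PSD
            power *= 2
        freqs.append(k * fs / n)
        psd.append(power)
    return freqs, psd


def band_powers(signal, fs):
    """Integrate the PSD over each band, giving one feature per band."""
    freqs, psd = periodogram(signal, fs)
    df = fs / len(signal)  # spacing between DFT frequency bins
    return {name: sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df
            for name, (lo, hi) in BANDS.items()}


def stft_band_powers(signal, fs, win_len, hop):
    """STFT-style features: band powers of overlapping rectangular windows.

    Consecutive windows overlap by win_len - hop samples.
    """
    return [band_powers(signal[start:start + win_len], fs)
            for start in range(0, len(signal) - win_len + 1, hop)]


if __name__ == "__main__":
    fs = 128  # hypothetical sampling rate (Hz)
    # Synthetic 2 s test signal: a pure 10 Hz (alpha-band) sine wave.
    eeg = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
    print(band_powers(eeg, fs))
    frames = stft_band_powers(eeg, fs, win_len=fs, hop=fs // 2)
    print([max(f, key=f.get) for f in frames])  # ['alpha', 'alpha', 'alpha']
```

With a 50% overlap (hop of half a window), each one-second window yields its own set of band features. In practice a tapered window such as Hann, or Welch averaging, is usually preferred to reduce spectral leakage.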
5 Classification Methods
In the classification stage, the extracted features are mapped to output classes. Once the signals have been classified, the result is processed and converted into output [9], which enables effective emotion recognition. Previous studies chose classification methods according to the quality of the classifiers in order to obtain the best results. Table 3 compares the most important classification methods applied to emotion recognition using EEG signals, and their impact.
Table 3. Comparison of classifiers
Reference
Classifier
Description
[2, 21, 22]
Support Vector Machine (SVM)
It is the most widely used classification method. It can handle unstructured data such as signals, text, or images, classifying them statistically or performing regression analysis on them, and it is well suited to high-dimensional data.
M. Algarni and F. Saeed Table 3. (continued)
[14]
Linear Discriminant Analysis (LDA)
LDA can be used directly as a linear classifier, or to reduce the dimensionality of the features before a subsequent classification step. It models the distribution of the predictors separately in each response class and then uses Bayes' theorem to estimate the class probabilities. LDA is related to analysis of variance and to regression analysis: all of these methods attempt to express one dependent variable as a linear combination of other features or measurements.
[4, 6, 23]
Artificial Neural Networks (ANN)
ANN is a nonlinear classifier characterized by accurate results, but it faces challenges such as long training time and high computational cost. The network is trained with the backpropagation algorithm over multiple training inputs. It simulates human neurons and consists of multiple layers: an input layer, a hidden layer, and an output layer.
[16, 19]
Convolutional Neural Network (CNN)
CNNs are also used in the classification stage, where neurons communicate through successive layers. A CNN divides the signal into several sub-regions and takes the average or maximum value of each (pooling), reducing each region to a single value; this makes the CNN more powerful and stable in classification. It also provides automatic checking of signal quality.
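To make the LDA entry in Table 3 concrete, here is a minimal two-class sketch in plain Python. It assumes 2-D feature vectors (for example, hypothetical alpha- and beta-band powers), equal class priors, and one covariance matrix shared by both classes, which is the standard LDA assumption. The data points and the names `fit_lda` and `predict` are invented for illustration and are not taken from any of the reviewed studies.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of 2-D vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]


def fit_lda(class0: List[Vector], class1: List[Vector]) -> Tuple[List[float], float]:
    """Two-class LDA on 2-D features; returns (weights, threshold)."""
    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class covariance: LDA assumes one covariance shared by classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for data, m in ((class0, m0), (class1, m1)):
        for x in data:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class0) + len(class1) - 2
    a, b, c, e = (v / dof for v in (s[0][0], s[0][1], s[1][0], s[1][1]))
    det = a * e - b * c
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w = [(e * d0 - b * d1) / det, (-c * d0 + a * d1) / det]
    # Equal priors: Bayes' rule puts the boundary halfway between the
    # projected class means.
    threshold = sum(wi * (p + q) / 2 for wi, p, q in zip(w, m0, m1))
    return w, threshold


def predict(model: Tuple[List[float], float], x: Vector) -> int:
    """Return 1 if x projects beyond the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


if __name__ == "__main__":
    # Hypothetical (feature 1, feature 2) pairs for two emotion classes.
    class0 = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 2.1)]
    class1 = [(3.0, 4.1), (3.4, 3.7), (2.9, 4.4), (3.3, 4.0)]
    model = fit_lda(class0, class1)
    print(predict(model, (1.1, 2.0)), predict(model, (3.2, 4.2)))  # 0 1
```

The same projection vector `w` can also serve the dimensionality-reduction role mentioned in the table: projecting each feature vector onto `w` yields a single discriminative feature for a later classifier.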
6 Discussion of the Results of Emotion Recognition Methods Using EEG Signals
In this section, we compare the results of some studies that used deep learning algorithms to recognize emotions, identify the most important challenges facing emotion recognition, and then summarize our findings. Uddin et al. [24] classified valence and arousal using SVM, and used DCT and box-and-whisker plots to determine the correct features on the DEAP dataset. The researchers concluded that the statistical properties of the Fast Fourier Transform (FFT) gave a higher emotion detection accuracy of 92%. This method is therefore superior to the method used in [1] in terms of accuracy. In another study, Deng et al. [20] identified affective disorders rather than the emotions themselves, in order to determine the risk of emotional disturbance. The researchers recorded the EEG signals themselves, selecting 31 participants (9 males and 22 females) for data collection. They used two types of feature extraction algorithms, PSD and CSP, and SVM, one of the most popular classifiers, for classification. This study achieved a high accuracy of 95.20% and showed that the frontal cortex of the brain plays a vital role in identifying emotional distress. The accuracy of the CNN classifier is illustrated by [15], which classified emotions in terms of valence and arousal using the DEAP database of 32 subjects. Three feature extraction methods were used: DWT, DCT, and DFT. The main advantage of the CNN classifier is that it discovers the important features without human supervision. This study reached an accuracy of 94.75% for valence and 95.75% for arousal. We conclude that this method outperforms the others, owing to its high accuracy in simultaneously extracting position and frequency information from the waves and to the use of DWT to reduce the noise of physiological signals. Also using CNNs, Yildirim et al. [18] aimed to detect depressive disorder by analyzing EEG signals from a depression dataset obtained from the two hemispheres (left and right) of patients' brains. They developed a method based on convolutional neural networks (CNNs) and long short-term memory (LSTM), which provides sequence learning, to detect depressive disorder from EEG signals. The model achieved accuracies of 99.12% and 97.66% on EEG signals from the left and right hemispheres, respectively. These results show the accuracy, sophistication, and speed of the CNN-LSTM model in diagnosing depression from EEG signals. In [12], the researchers used a deep convolutional neural network on the physiological signals of the AMIGOS database to detect emotions by linking these signals to arousal and valence data.
In this study, SIFT was adopted to extract signal features, and a deep convolutional neural network (DCNN) automatically classified the features extracted from the signals. The study achieved 90% accuracy in detecting emotions from physiological signals. Zeng et al. [17] likewise used the SIFT method for feature extraction. They recorded the EEG signals of 30 participants while they watched 18 Chinese movie clips, with the aim of classifying six emotions: happiness, neutrality, sadness, disgust, anger, and fear. An SVM classifier was used to classify the signals, with an accuracy of 87.3%. In this study, results from EEG signals collected by electrodes distributed over the two lobes of the brain showed that the frontal lobe performs outstandingly in distinguishing between emotions, in agreement with the study of Deng et al. [20]. Table 4 summarizes the results of some studies, the techniques they used to extract and classify emotions from the datasets, and the results the researchers reached. We conclude from this table that the differences in the results are due to the algorithms used, and that pre-processing the signals contributes to more accurate results with less classification time, as it removes noise caused by measuring instruments, electromagnetic interference, and movement [23]. By reviewing the studies in this field, we found that human–computer communication and emotion discovery form a wide-ranging and complex area that needs to be explored through further studies and applications. The use of EEG signals in emotion recognition is a new and significant field of research.

Table 4. Comparison of the results of current studies
Reference | Dataset | Feature extraction | Classification method | Accuracy
(Enas et al.) [12] | AMIGOS | SIFT | DCNN | 90%
(Liu et al.) [2] | DEAP | DWT | RF | Valence = 74.3%, Arousal = 77.2%
(Zeng et al.) [17] | Recorded EEG data | STFT | SVM | 87.3%
(Girardi et al.) [25] | DEAP | PSD, CSP | SVM | Valence = 56%, Arousal = 60.4%
(Wang et al.) [16] | DEAP | STFT | CNN | 83.88%
(Deng et al.) [20] | Recorded EEG data | PSD, CSP | SVM | 95.20%
(Taali et al.) [18] | Depression dataset | CNN | LSTM | Valence = 97%, Arousal = 99%
(Uddin et al.) [24] | DEAP | DCT | SVM | 92%
(Mehndi et al.) [15] | DEAP | DWT, DCT, DFT | CNN | Valence = 94%, Arousal = 95%
(Natraj et al.) [10] | SEED-IV, DEAP | DWT | SVM | Valence = 74%, Arousal = 86%

We concluded that there are many challenges affecting the accuracy of results and the time consumed to classify emotions. We have therefore grouped the challenges of recognizing emotions using EEG into two main categories: the first relates to technology and the second to users.
6.1 Technology Challenges
There are some technology challenges related to the use of EEG devices and the handling of these sensitive devices. Some researchers have found it difficult to handle the output of EEG recordings and to convert the signals into data correctly [26]. In addition, the accuracy of EEG devices in capturing brain signals is a great challenge, given the need to avoid erroneous results. Girardi et al. [25] showed that noise is a major obstacle to signal clarity, which requires additional time to remove the noise through pre-processing. The long time spent extracting and classifying features is also an issue for researchers. Looking at the challenges facing feature extraction and classification, we found that enhancing the accuracy of the results is one of the most important, along with the time consumed and efficiency [27, 28]. Accordingly, researchers and scientists must contribute to developing modern
algorithms that give more accurate results. Improving and regularly updating the databases will also contribute to the diversity of future studies.
6.2 User Challenges
Some researchers do not know the condition of the subjects being tested with EEG, especially their medical history, which may affect the final results. In addition, some people undergoing the test are not prepared for the recording to be taken correctly; for example, the metal electrodes may be placed on the head incorrectly or at random. Another challenge is the long training required to use EEG devices to record signals [28]. Recording the signals usually requires the participation of technicians and members of a trained medical team to collect the signals correctly. Cooperation is often needed among the medical team, such as brain and nerve specialists, psychiatrists, technicians trained in the use of the EEG device, and researchers in the deep learning field [18].
6.3 Discussion
Recently, the number of studies on applications of BCI systems has increased. Various search methodologies and different algorithms for extracting and modeling features have been used, which has produced a greater variety of results. Research results depend on several factors, such as the datasets used, the deep learning algorithms employed to extract features, and the classifiers chosen for emotional analysis according to the research objectives. From the studies reviewed, we conclude that research in the field of BCI shows that these systems support adaptive activities and advance the application of analytical methods. BCI applications are growing rapidly and serve several disciplines, for example medicine, neurosurgery, psychology, mathematics, computer science, physics, and bioengineering.
BCI applications in the field of emotion recognition have demonstrated high accuracy, reaching 99% in [18]. This is evidence of the development of applications and technologies that adapt to changes in the user's condition, and of improvements in emotion recognition and in communication between individuals and machines.
7 Conclusions
The results of the studies reviewed in this paper suggest that BCI applications may increase the effectiveness of rapid communication between computers and the human brain, and help to achieve a higher degree of accuracy that allows users to make correct decisions. Emotion recognition using EEG warrants further experiments to produce high-quality results [3]. It is important to note that although a BCI cannot read or control a person's inner thoughts, it can predict a person's behavior based on the emotion extracted from the brain signals. Emotion recognition from EEG data still requires a lot of work to achieve more accurate results, especially when dealing with low-quality signals. Researchers in the field of BCI must pay attention to feedback, because it plays a major role in improving systems and obtaining classification results in real time,
and they must also pay attention to the quality of the signals used and the methods of collecting them, so that these systems can be used in real-world scenarios. Multiple studies are currently being conducted to develop BCI functions that deliver accurate results and improve performance at a cost suitable for people and patients. In the coming years, emotion recognition using EEG signals is expected to have many applications, especially in the medical field. Future enhancements of BCI technology include responding to several commands simultaneously from a single signal, using deep learning techniques effectively to improve BCI performance, and adapting decoding algorithms to brain signals.
References
1. Pandey, P., Seeja, K.R.: Subject independent emotion recognition from EEG using VMD and deep learning. Journal of King Saud University-Computer and Information Sciences (2019). https://doi.org/10.1016/j.jksuci.2019.11.003
2. Liu, J., Meng, H., Li, M., Zhang, F., Qin, R., Nandi, A.K.: Emotion detection from EEG recordings based on supervised and unsupervised dimension reduction. Concurr. Comput. 30(23), 1–13 (2018). https://doi.org/10.1002/cpe.4446
3. Korde, K.S., Paikrao, P.L., Jadhav, N.S.: Analysis of EEG signals and biomedical changes due to meditation on brain by using ICA for feature extraction. In: 2018 Second International Conferences on Intelligent Computing Control System. Iciccs, pp. 1479–1484 (2018)
4. Thammasan, N., Thammasan, K., Moriyama, K., Fukui, K., Numao, M.: Familiarity effects in EEG-based emotion recognition. Brain Inform. 4(1), 39–50 (2017). https://doi.org/10.1007/s40708-016-0051-5
5. Khalili Ardali, M., Rana, A., Purmohammad, M., Birbaumer, N., Chaudhary, U.: Semantic and BCI-performance in completely paralyzed patients: possibility of language attrition in completely locked in syndrome. Brain Lang. 194(8), 93–97 (2019). https://doi.org/10.1016/j.bandl.2019.05.004
6. Mohammadpour, M., Hashemi, S.M.R., Houshmand, N.: Classification of EEG-based emotion for BCI applications. In: 7th Conferences Artificial Intelligence Robotics IRANOPEN 2017, pp. 127–131 (2017). https://doi.org/10.1109/rios.2017.7956455
7. Bontchev, B.: Adaptation in affective video games: a literature review. Cybern. Inf. Technol. 16(3), 3–34 (2016). https://doi.org/10.1515/cait-2016-0032
8. Abbasi-Asl, R., Keshavarzi, M., Chan, D.Y.: Brain-Computer interface in virtual reality. In: International IEEE/EMBS Conference on Neural Engineering NER, vol. 2019, pp. 1220–1224 (2019). https://doi.org/10.1109/ner.2019.8717158
9. Al-Nafjan, A., Hosny, M., Al-Wabil, A., Al-Ohali, Y.: Classification of human emotions from Electroencephalogram (EEG) signal using deep neural network. Int. J. Adv. Comput. Sci. Appl. 8(9), 419–425 (2017). https://doi.org/10.14569/ijacsa.2017.080955
10. Thejaswini, S., Ravikumar, K.M., Jhenkar, L., Natraj, A., Abhay, K.K.: Analysis of EEG based emotion detection for DEAP and SEED-IV databases using SVM 208 II. Lit. Rev. 1, 207–211 (2019)
11. Ullah, H., Uzair, M., Mahmood, A., Ullah, M., Khan, S.D., Cheikh, F.A.: Internal emotion classification using EEG signal with sparse discriminative ensemble. IEEE Access 7(3), 40144–40153 (2019). https://doi.org/10.1109/ACCESS.2019.2904400
12. Santamaria-Granados, L., Munoz-Organero, M., Ramirez-Gonzalez, G., Abdulhay, E., Arunkumar, N.: Using deep convolutional neural network for emotion detection on a physiological signals dataset (AMIGOS). IEEE Access 7, 57–67 (2019). https://doi.org/10.1109/ACCESS.2018.2883213
13. Alarcão, S.M., Fonseca, M.J.: Emotions recognition using EEG signals: a survey. IEEE Trans. Affect. Comput. 10(3), 374–393 (2019). https://doi.org/10.1109/TAFFC.2017.2714671
14. Wei, Y., Wu, Y., Tudor, J.: A real-time wearable emotion detection headband based on EEG measurement. Sens. Actuators Phys. 263, 614–621 (2017). https://doi.org/10.1016/j.sna.2017.07.012
15. Mehndi, S.H.: Emotion Recognition using EEG Signal and Deep Learning Approach (8) (2019)
16. Wang, K.Y., Ho, Y.L., De Huang, Y., Fang, W.C.: Design of intelligent EEG system for human emotion recognition with convolutional neural network. In: Proceedings 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems AICAS 2019, pp. 142–145 (2019). https://doi.org/10.1109/aicas.2019.8771581
17. Zhuang, N., Zeng, Y., Yang, K., Zhang, C., Tong, L., Yan, B.: Investigating patterns for self-induced emotion recognition from EEG signals. Sens. (Switzerland) 18(3), 1–22 (2018). https://doi.org/10.3390/s18030841
18. Ay, B., et al.: Automated depression detection using deep representation and sequence learning with EEG signals. J. Med. Syst. 43(7), 1–12 (2019). https://doi.org/10.1007/s10916-019-1345-y
19. Gonzalez, H.A., Yoo, J., Elfadel, I.A.M.: EEG-based emotion detection using unsupervised transfer learning. In: Proceedings Annual International Conference of the IEEE Engineering in Medicine and Biology Society EMBS, pp. 694–697 (2019). https://doi.org/10.1109/embc.2019.8857248
20. Deng, Y., Wu, F., Du, L., Zhou, R., Cao, L.: EEG-based identification of latent emotional disorder using the machine learning approach. In: Proceedings 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference ITNEC 2019, pp. 2642–2648 (2019). https://doi.org/10.1109/itnec.2019.8729424
21. Thejaswini, S., Ravi Kumar, K.M., Rupali, S., Abijith, V.: EEG based emotion recognition using wavelets and neural networks classifier. In: SpringerBriefs Applications of Science and Technology, no. 9789811066979, pp. 101–112 (2018). https://doi.org/10.1007/978-981-10-6698-6_10
22. Liu, S., et al.: Improve the generalization of the cross-task emotion classifier using EEG based on feature selection and SVR. In: 2019 IEEE 10th International Conference on Awareness Science and Technology iCAST 2019, pp. 1–5 (2019). https://doi.org/10.1109/icawst.2019.8923256
23. Mert, A., Akan, A.: Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal. Appl. 21(1), 81–89 (2018). https://doi.org/10.1007/s10044-016-0567-6
24. George, F.P., Shaikat, I.M., Ferdawoos Hossain, P.S., Parvez, M.Z., Uddin, J.: Recognition of emotional states using EEG signals based on time-frequency analysis and SVM classifier. Int. J. Electr. Comput. Eng. 9(2), 1012–1020 (2019). https://doi.org/10.11591/ijece.v9i2
25. Girardi, D., Lanubile, F., Novielli, N.: Emotion detection using noninvasive low cost sensors. In: 2017 7th International Conference on Affective Computing and Intelligent Interaction ACII 2017, vol. 2018, no. 1, pp. 125–130 (2018). https://doi.org/10.1109/acii.2017.8273589
26. Zamanian, H., Farsi, H.: A new feature extraction method to improve emotion detection using EEG signals. Electron. Lett. Comput. Vis. Image Anal. 17(1), 29–44 (2018). https://doi.org/10.5565/rev/elcvia.1045
27. Ozdemir, M.A., Degirmenci, M., Guren, O., Akan, A.: EEG based emotional state estimation using 2-D deep learning technique. In: TIPTEKNO 2019 Tip Teknol. Kongresi, pp. 1–4 (2019). https://doi.org/10.1109/tiptekno.2019.8895158
28. Bota, P.J., Wang, C., Fred, A.L.N., Placido Da Silva, H.: A review, current challenges, and future possibilities on emotion recognition using machine learning and physiological signals. IEEE Access 7, 140990–141020 (2019). https://doi.org/10.1109/ACCESS.2019.2944001
A New Multi-resource Deadlock Detection Algorithm Using Directed Graph Requests in Distributed Database Systems
Khalid Al-Hussaini¹ (✉), Nabeel A. Al-Amdi², and Fuaad Hasan Abdulrazzak²
¹ Faculty of Computer Science and Information Systems, Thamar University, Genius University for Sciences and Technology, Dhamar, Yemen
² Faculty of Computer Science and Information Systems, Thamar University, Dhamar, Yemen
Abstract. In a distributed system, a single database that is physically spread across computers at multiple locations is called a distributed database. One of the most serious problems in distributed databases is deadlock, a state of the system in which transactions wait for one another indefinitely. This paper presents a new algorithm that detects multi-resource deadlocks using a directed graph. The proposed algorithm improves on the algorithms of Brian M. Johnston and Himanshi Grover, which have no criterion for deciding which transaction should be aborted early to reduce repeated detections. The proposed algorithm makes this decision using the incoming and outgoing requests of the transactions in the graph as the criterion for choosing the transaction to be detected and aborted early. It ensures that only one transaction detects a deadlock cycle. All true deadlocks are detected in finite time, no false deadlocks are reported, and no deadlock goes undetected.
Keywords: Distributed database · Multi-resource deadlock · Deadlock detection · Wait-For-Graph
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 462–474, 2021.
https://doi.org/10.1007/978-3-030-70713-2_43
1 Introduction
A Distributed Database (DDB) is a set of multiple logically interrelated databases distributed over a computer network [1]. A DDB typically appears to applications as a single database managed by a Distributed Database Management System (DDBMS) [1]. A DDBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users [2]. The term Distributed Database System (DDBS) is sometimes used to refer to the DDB and the DDBMS together. A DDBS is a collection of several related databases that are physically distributed across different sites of a computer network [3]. Concurrency Control (CC) of transactions is one of the problems in a DDBMS [4]. The main objective of CC is to prevent the database updates made by users' transactions from interfering with one another, which may leave the database in an inconsistent state [5]. Users interact with the database via transactions; each transaction is a set of instructions, which can be 'read, write, lock, or unlock operations' [6]. If the actions of a transaction involve data at a single site, the transaction is called local, while a distributed transaction is one that accesses and updates data on two or more networked computer systems, or at several sites [6]. Deadlock is a state of the system in which two or more transactions wait for one another indefinitely for resources to be released; none of them can complete its execution or change its state, and the resources they hold remain unavailable to any other transaction [7, 8]. Distributed deadlocks can occur when distributed transactions and concurrency control are used; a system is deadlocked if and only if there exists a cycle in which the transactions wait forever for one another [9]. Depending on the distributed application, a system allows different kinds of resource requests. Based on the underlying resource-request model [10, 11], a transaction can make different types of requests, as in the Single Resource (SR) model and the multi-resource (AND) model. The SR model is the simplest request model in a distributed environment: a transaction has at most one outstanding resource request at a time, so in the directed graph each transaction has at most one outgoing edge (request). In the AND model, a transaction can request multiple resources and remains blocked until it has acquired all of them. If there is a cycle in the Wait-For Graph (WFG), one transaction is chosen as the victim and aborted to break the deadlock. Sometimes there may be more than one deadlock (cycle); in this case, at least one victim from each cycle has to be aborted [12].
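Because a system under the AND model is deadlocked if and only if its WFG contains a cycle, the core check can be sketched as a depth-first search for a back edge. This is a centralized illustration on a single in-memory graph (a dict from each transaction to the transactions it waits for), not the distributed, message-based detection developed in this paper; the function name `find_cycle` and the transaction names are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph, or None if it is acyclic.

    wfg maps each transaction to the transactions it waits for. Under the AND
    model a transaction is blocked until all of them release their resources,
    so any cycle means deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {t: WHITE for t in wfg}
    for succs in wfg.values():
        for s in succs:
            color.setdefault(s, WHITE)
    path: List[str] = []

    def dfs(t: str) -> Optional[List[str]]:
        color[t] = GREY
        path.append(t)
        for s in wfg.get(t, []):
            if color[s] == GREY:  # back edge: s is already on the path
                return path[path.index(s):]
            if color[s] == WHITE:
                cycle = dfs(s)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(color):
        if color[t] == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # T1 makes an AND request for resources held by T2 and T4.
    wfg = {"T1": ["T2", "T4"], "T2": ["T3"], "T3": ["T1"], "T4": []}
    print(find_cycle(wfg))                       # ['T1', 'T2', 'T3']
    print(find_cycle({"T1": ["T2"], "T2": []}))  # None
```

The search is recursive, so very deep graphs would need an explicit stack; for the small examples used to explain deadlock this keeps the code short.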
Multiple cycles are a problem for the previous algorithms. Each of them ensures that only one transaction in each deadlock cycle detects it, so after the transaction chosen as the victim is aborted, the deadlock detection algorithm must be invoked again, or continued, to find out whether another deadlock cycle exists. This leads to repeated deadlock resolution, in which the same transaction may be aborted more than once or additional transactions are aborted. In this paper, the proposed algorithm uses an update message; one of its functions is to identify the transaction that exists in all cycles and requires few resources, so that it can be aborted early to reduce the frequent detection of deadlocks and to break all cycles.
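The idea of aborting one transaction that lies in every cycle can be illustrated as a set intersection. In this minimal sketch the cycles are assumed to be already known, and `required` is a hypothetical map from each transaction to the number of resources it requires. The function names are illustrative: the paper gathers this information with its update message rather than from a global view.

```python
from typing import Dict, List, Optional, Set


def common_transactions(cycles: List[List[str]]) -> Set[str]:
    """Transactions that appear in every detected cycle."""
    if not cycles:
        return set()
    common = set(cycles[0])
    for cycle in cycles[1:]:
        common &= set(cycle)
    return common


def choose_early_victim(cycles: List[List[str]],
                        required: Dict[str, int]) -> Optional[str]:
    """Among the transactions shared by all cycles, pick the one requiring the
    fewest resources (ties broken by T-ID, only for determinism)."""
    candidates = common_transactions(cycles)
    if not candidates:
        return None  # no single abort breaks every cycle
    return min(candidates, key=lambda t: (required[t], t))


if __name__ == "__main__":
    cycles = [["T1", "T2", "T3"], ["T2", "T3", "T4"]]
    required = {"T1": 2, "T2": 3, "T3": 1, "T4": 2}  # hypothetical counts
    print(choose_early_victim(cycles, required))  # T3
```

Aborting a single transaction in the intersection breaks every cycle at once, which is what avoids the repeated detect-abort rounds described above. When the intersection is empty, this shortcut does not apply and each cycle still needs its own victim.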
2 Related Work
Different distributed deadlock detection algorithms have been proposed in the literature. This section discusses the contributions of other researchers and the algorithms they have used for dealing with deadlocks. The algorithm in [13] introduced deadlock detection and resolution based on the priorities of the transactions. In this algorithm, the transaction that starts the cycle has the highest priority and is stored in a priority table, which is used to resolve the deadlocks. The B. M. Alom algorithm maintains a list of all the transactions in a table called the transaction table; whenever a deadlock cycle is detected, the priorities of the transactions in the priority-based table are checked to choose the victim transaction. The transaction with the least priority is chosen as the victim and aborted, so that the resources it holds can be released to the waiting transactions. The main problem of this algorithm is that it fails to detect deadlocks if the order of the priorities changes.
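A minimal sketch of this priority-based resolution follows. It assumes the priority table is simply a dict of numeric priorities (larger means higher) and the WFG is a dict of wait-for edges. How [13] assigns priorities beyond the cycle's starting transaction is not modelled, and the name `resolve_by_priority` is hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_by_priority(wfg: Dict[str, List[str]], cycle: List[str],
                        priority: Dict[str, int]) -> Tuple[str, Dict[str, List[str]]]:
    """Abort the least-priority transaction in `cycle` and return
    (victim, wait-for graph after its resources are released)."""
    victim = min(cycle, key=lambda t: priority[t])
    # Aborting the victim frees the resources it holds, so its node and every
    # wait-for edge pointing at it disappear from the graph.
    remaining = {t: [s for s in succs if s != victim]
                 for t, succs in wfg.items() if t != victim}
    return victim, remaining


if __name__ == "__main__":
    wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
    priority = {"T1": 3, "T2": 1, "T3": 2}  # hypothetical; larger = higher
    print(resolve_by_priority(wfg, ["T1", "T2", "T3"], priority))
    # ('T2', {'T1': [], 'T3': ['T1']})
```

The sketch also shows why the scheme is fragile: the victim is whatever the table says at that moment, so if priorities are reordered between detections, a different transaction may be chosen each time.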
The Transaction Wait-For Graph (TWFG) algorithm [14] presented a new approach that develops the B. M. Alom algorithm to resolve the problem of priority changes by creating two structures: the Local Transaction Structure (LTS), which is used to detect local deadlocks, and the Distributed Transaction Structure (DTS), which is used to detect distributed deadlocks. The TWFG algorithm ensures that distributed deadlock detection does not depend on local deadlock detection. It eliminates the dependency between the two structures of the B. M. Alom algorithm and assigns a unique timestamp to each transaction to resolve the failure of deadlock detection. The main problem of this algorithm is starvation. In [15], Rashid presented an analysis of algorithms such as those of B. M. Alom [13], Himanshi Grover [18], and Swati Gupta [14]. All of these algorithms detect and resolve deadlocks in distributed databases, but each has drawbacks related to priority, the lack of standard criteria, and starvation. Therefore, Rashid suggested a new technique that combines the B. M. Alom algorithm, to detect deadlocks, with the Himanshi Grover algorithm, to determine the victim transaction using timestamps, in order to solve the problem of priority changes. In this technique, each transaction has an associated variable called flag. Initially all flags are zero (flag = 0), meaning that the transaction has not been aborted yet. When the youngest transaction is identified according to the Himanshi Grover algorithm, its flag is checked: if flag = 0, the technique sets its flag to 1; a transaction whose flag is 1 may not be aborted again as the youngest transaction, so the transaction with the next earlier timestamp can be aborted instead. The degree-based deadlock resolution technique in [16] works after a deadlock has been detected by checking the WFG: it resolves the deadlock by aborting the transaction with the highest sum of out-degree and in-degree.
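The degree criterion of [16] can be sketched as follows, assuming the WFG is given as a list of (waiting, holding) request pairs and that degrees are counted over the whole graph rather than over the cycle alone. The function name is hypothetical, and the tie-break by T-ID is added here only to make the result deterministic; it is not part of [16].

```python
from collections import Counter
from typing import Iterable, List, Tuple


def choose_victim_by_degree(edges: Iterable[Tuple[str, str]], cycle: List[str]) -> str:
    """Return the cycle member with the highest out-degree + in-degree.

    `edges` are wait-for requests (waiting, holding) over the whole WFG.
    Ties go to the larger T-ID, purely to keep the result deterministic.
    """
    edges = list(edges)
    out_deg = Counter(waiting for waiting, _ in edges)  # requests it makes
    in_deg = Counter(holding for _, holding in edges)   # requests it blocks
    return max(cycle, key=lambda t: (out_deg[t] + in_deg[t], t))


if __name__ == "__main__":
    edges = [("T1", "T2"), ("T2", "T3"), ("T3", "T1"), ("T4", "T2"), ("T5", "T2")]
    print(choose_victim_by_degree(edges, ["T1", "T2", "T3"]))  # T2
```

Here T2 blocks three transactions and waits for one, so aborting it unblocks the most requests. This is the intuition behind reusing in-degree and out-degree as a victim criterion.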
The work in [17] presented a simple algorithm to detect the presence of deadlock. It uses an update message with two functions: first, to modify the Wait-for variables, and second, to check for the occurrence of deadlock. Compared with many recent deadlock detection algorithms, the Brian M. Johnston algorithm can detect repeated deadlocks with minimal message passing. However, it has no priority criterion for deciding which transaction needs to be aborted, which means that deadlocks will be detected repeatedly when there are multiple cycles. In [18], Himanshi Grover presented an improvement over the algorithm of Brian M. Johnston. Because [17] has no priority criterion for deciding which transaction must be aborted, the Himanshi Grover algorithm uses the timestamps of the transactions as the priority criterion after a deadlock is detected: the youngest transaction is aborted. The algorithm ensures that only one transaction in a deadlock cycle detects it. All true deadlocks are detected in finite time, because propagation continues until all null queues are found, and no false deadlocks are reported. However, this algorithm has no criterion for determining the transaction that should be aborted early to reduce the repeated detection of deadlocks and the abort cost.
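The combination of message propagation and timestamp-based victim choice can be sketched as follows. This is a generic edge-chasing probe, used as a stand-in: it is not the exact update message of [17] or [18], and it runs on one in-memory graph instead of across sites. The names `probe_detect` and `youngest`, and the integer timestamps, are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def probe_detect(wfg: Dict[str, List[str]], initiator: str) -> Optional[List[str]]:
    """Edge-chasing sketch: forward a message along wait-for edges from
    `initiator`; if it arrives back, return the path (a deadlock cycle)."""
    queue = deque([[initiator]])
    visited = {initiator}
    while queue:
        path = queue.popleft()
        for nxt in wfg.get(path[-1], []):
            if nxt == initiator:
                return path  # the message returned: deadlock
            if nxt not in visited:  # forward at most once per transaction
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def youngest(cycle: List[str], timestamp: Dict[str, int]) -> str:
    """Timestamp criterion: the youngest transaction has the latest timestamp."""
    return max(cycle, key=lambda t: timestamp[t])


if __name__ == "__main__":
    wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
    cycle = probe_detect(wfg, "T1")
    print(cycle)                                          # ['T1', 'T2', 'T3']
    print(youngest(cycle, {"T1": 5, "T2": 9, "T3": 7}))  # T2
    print(probe_detect(wfg, "T4"))                        # None
```

T4 waits on the cycle but is not part of it, so its probe never returns and T4 is never reported as deadlocked. The timestamp rule always picks the same victim for a given cycle, but it ignores how many cycles that victim belongs to, which is the gap the proposed algorithm targets.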
3 The Proposed Algorithm: Multi Resource Deadlock Detection (MRDD)
In the proposed algorithm, we develop the algorithms presented in [17, 18] by adding a new function to the update message. The new function serves as a criterion for deciding which transaction needs to be aborted early to reduce frequent deadlock detections, based on the incoming and outgoing request degrees of the transactions; the algorithms in [17, 18] lack this criterion. Step 5 in Sect. 3 shows the new function (the new criterion). In addition, the MRDD algorithm avoids the problem of undetected deadlocks by swapping transactions, as in step 2 in Sect. 3. In the computer network, each site, transaction, and resource (data object) has a unique identifier to prevent conflicts [17, 18]: the identifier of a site is called Site-ID, that of a transaction T-ID, and that of a resource R-ID. Each site has a certain portion of the database, some data objects (resources), and a few transactions. Every resource controlled by a site has a variable called Locked-by, which records the current state of the resource: if the resource is not locked by any transaction, Locked-by stores null; otherwise, it stores the identifier of the locking transaction [17, 18]. Each transaction Ti at site Si has a data structure similar to an adjacency matrix, in table form, as shown in Table 1.
Table 1. Structure of each transaction.
Transactions | T-ID | Wait-for | Held-by | Request-Q | Out-degree | In-degree
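The per-transaction record of Table 1 and the per-resource Locked-by variable can be sketched as plain data classes. In this minimal sketch, Held-by is kept as a list because under the AND model Out-degree can exceed one. The helper `request_lock` is a hypothetical name: it applies only the bookkeeping the text specifies for a blocked request (Held-by, Request-Q, Out-degree, In-degree, and the Wait-for rule) on a single site. It does not implement the update message, the FD array, abort_list, or flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Transaction:
    """Per-transaction record with the columns of Table 1."""
    t_id: str
    wait_for: Optional[str] = None  # None stands for "null"
    held_by: List[str] = field(default_factory=list)    # who holds what Ti wants
    request_q: List[str] = field(default_factory=list)  # who waits for Ti
    out_degree: int = 0
    in_degree: int = 0


@dataclass
class Site:
    """A site and its resources; locked_by maps R-ID -> T-ID or None (null)."""
    site_id: str
    locked_by: Dict[str, Optional[str]] = field(default_factory=dict)


def refresh_wait_for(t: Transaction) -> None:
    # Rule from the text: Wait-for(Ti) = Ti when Ti has entries in both
    # Request-Q and Held-by; otherwise it stores no transaction.
    t.wait_for = t.t_id if (t.held_by and t.request_q) else None


def request_lock(site: Site, txs: Dict[str, Transaction], ti: str, r_id: str) -> bool:
    """Hypothetical helper: Ti requests r_id; returns True if the lock is granted."""
    holder = site.locked_by.get(r_id)
    if holder is None:  # Locked-by is null: grant the lock
        site.locked_by[r_id] = ti
        return True
    if holder == ti:  # Ti already holds the resource
        return True
    waiter, owner = txs[ti], txs[holder]
    waiter.held_by.append(holder)  # Held-by(Ti) gains Tj
    waiter.out_degree += 1         # Out-degree(Ti) + 1
    owner.request_q.append(ti)     # Request-Q(Tj) gains Ti
    owner.in_degree += 1           # In-degree(Tj) + 1
    refresh_wait_for(waiter)
    refresh_wait_for(owner)
    return False


if __name__ == "__main__":
    site = Site("S1", {"R1": None, "R2": None})
    txs = {"T1": Transaction("T1"), "T2": Transaction("T2")}
    request_lock(site, txs, "T1", "R1")  # granted
    request_lock(site, txs, "T2", "R1")  # blocked: T2 waits for T1
    print(txs["T1"].in_degree, txs["T2"].out_degree)  # 1 1
```

In the usage above, a second blocked request from T1 for a resource held by T2 would give both transactions non-empty Held-by and Request-Q entries, so their Wait-for variables become set. That is the two-transaction cycle the detection steps must then find.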
Each transaction has a T-ID, which is its identifier, and the following main variables. Held-by(Ti): if the resource requested by Ti is not locked, Held-by(Ti) is set to null; otherwise, it stores the transaction holding the resource required by Ti. Request-Q(Ti): stores all the transactions that request a resource locked by Ti; it is the waiting queue of Ti. Wait-for(Ti): set to Ti if Ti has transactions in its Request-Q and its Held-by; otherwise, it stores no transaction (initially, all Wait-for variables are null). The MRDD algorithm adds two variables from [16], Out-degree(Ti) and In-degree(Ti), for selecting the victim transaction. When transaction Ti requests a resource Rj held by transaction Tj, the MRDD algorithm computes the out-degree and in-degree in the wait-for graph (WFG) as follows:
K. Al-Hussaini et al.
In-degree(Tj): when Tj holds the resource required by Ti, the MRDD algorithm increases In-degree(Tj) by one and adds Ti to the waiting queue of Tj (Request-Q(Tj)). This variable therefore stores the number of transactions waiting for resources held by Tj. Out-degree(Ti): when Ti requests a resource held by Tj, the MRDD algorithm increases Out-degree(Ti) by one and adds Tj to Held-by(Ti). This variable therefore stores the number of transactions holding resources required by Ti. In addition to the out-degree and in-degree, MRDD uses an array called First Detection (FD) and two variables called abort_list and flag. The FD array stores the identifiers of the transactions that the update message reaches during the first detection of the deadlock; it helps determine the transaction at the intersection of all cycles, which should be aborted early to reduce repeated deadlock detections. The abort_list variable (initially null) stores the identifier of the transaction that should be aborted early for the same reason. The flag variable (initially 0) indicates whether a deadlock is present: if flag = 1, the system is deadlock-free; otherwise, the deadlock still exists or has not yet occurred. Suppose a transaction Ti makes a lock request for a resource Rj. In this case, the MRDD algorithm performs the following steps:
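The per-transaction record of Table 1 and the degree bookkeeping for a blocked lock request can be sketched in Python. This is a minimal single-process sketch, assuming in-memory dictionaries stand in for the messages exchanged between sites; `Transaction` and `request_lock` are hypothetical names, and Held-by is modelled as a list because a transaction may wait for several holders (as in Table 4).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Transaction:
    """One row of Table 1 (field names follow the paper's variables)."""
    t_id: str
    wait_for: Optional[str] = None
    held_by: List[str] = field(default_factory=list)
    request_q: List[str] = field(default_factory=list)
    out_degree: int = 0
    in_degree: int = 0


def request_lock(ti: str, rj: str, locked_by: Dict[str, Optional[str]],
                 txns: Dict[str, Transaction]) -> bool:
    """Grant Rj to Ti if it is free; otherwise record Ti as blocked on the holder.

    Returns True when the lock is granted.
    """
    holder = locked_by.get(rj)
    if holder is None:
        locked_by[rj] = ti                # Locked-by(Rj) = Ti
        return True
    requester, owner = txns[ti], txns[holder]
    requester.out_degree += 1             # Out-degree(Ti) += 1
    owner.in_degree += 1                  # In-degree(Tj) += 1
    requester.held_by.append(holder)      # add Tj to Held-by(Ti)
    owner.request_q.append(ti)            # Enqueue(Ti, Request-Q(Tj))
    return False


# Example: T3 locks R1, then T1 and T2 request it and are queued behind T3.
txns = {t: Transaction(t) for t in ("T1", "T2", "T3")}
locked_by: Dict[str, Optional[str]] = {"R1": None}
request_lock("T3", "R1", locked_by, txns)
request_lock("T1", "R1", locked_by, txns)
request_lock("T2", "R1", locked_by, txns)
print(txns["T3"].request_q, txns["T3"].in_degree)  # ['T1', 'T2'] 2
```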
1. Transaction (Ti) Makes a Lock Request for a Resource (Rj)
Send lock-request(Ti) to Rj;
Wait for granted or not granted message;
If (granted) then
  Locked-by(Rj) = Ti;
  Held-by(Ti) = null;
Else /* suppose Rj is being used by Tj */
  Out-degree(Ti) += 1;
  In-degree(Tj) += 1;
  Held-by(Ti) = Tj;
  Abort_list = null;
  Flag = 0;
  Enqueue(Ti, Request-Q(Tj));
  Temp = 0;
  /* Avoiding undetected deadlock */
  If (In-degree(Tj) > 1 AND Out-degree(Ti) = 1 AND the first transaction (Tf) in Request-Q(Tj) has Out-degree > 1) then
    Go to step 2 to call Swap(Tj, Ti);
  End if
  If (Held-by(Tj) != null OR Out-degree(Tj) > 0) then
    Abort_list = null;
    Flag = 0;
    Temp = 0;
    Wait-for(Ti) = Ti;
    FD = null;
    FD = FD ∪ Ti;
    Go to step 3 to call function Overall_Cycle(Ti, Ti);
    Go to step 4 to call Update(Wait-for(Ti), Request-Q(Ti), Ti);
  End if;
End if;
2. Transaction Tk Receiving Swap(Tk, Tn)
For every transaction in Request-Q(Tk)
  /* Tx is the first transaction in Request-Q(Tk) */
  Tx = Request-Q(Tk);
  If (Tx != Tn) then
    Enqueue(Dequeue(Request-Q(Tk)), Request-Q(Tk));
    Tx = 0;
  Else
    exit();
  End if;
3. Storing the Identifiers of Transactions in FD through Overall_Cycle(Tk, Ts)
For every transaction in Request-Q(Ts)
  Tx = Request-Q(Ts);
  If (In-degree(Tx) = 0) then
    Nothing;
  Else if (Tx belongs to FD OR Tk = Tx) then
    Exit;
  Else if (In-degree(Tx) > 0) then
    FD = FD ∪ Tx;
    Overall_Cycle(Tk, Tx);
End;
4. Transaction (Tj) Receiving Update Message
If abort_list = null then
  /* Call Request_Degree of Tj before Ti to detect the victim */
  Abort_list = Request_Degree(Tj);
If abort_list = null and Temp = 0 then
  /* If Temp = 1, the transaction that propagates the update message cannot be checked by Request_Degree */
  Abort_list = Request_Degree(Ti);
  Temp = 1;
Else if abort_list != null and Temp = 0 then
  Temp = 1;
If Wait-for(Tj) != Wait-for(Ti) then
  Wait-for(Tj) = Wait-for(Ti);
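Step 3 (Overall_Cycle) can be sketched as a recursive traversal over the Request-Q lists. This minimal Python sketch rests on one interpretive assumption: the "Exit" branch is read as "skip this transaction and continue with the next one", a reading under which the worked example of Sect. 4.1 yields FD = {T6, T5, T7, T8} from the Table 3 data. The function name `overall_cycle` is a hypothetical transliteration.

```python
from typing import Dict, List, Set


def overall_cycle(tk: str, ts: str, request_q: Dict[str, List[str]],
                  in_degree: Dict[str, int], fd: Set[str]) -> Set[str]:
    """Collect into FD the transactions reachable from Ts through Request-Q.

    Tk is the initiating transaction. "Exit" in step 3 is read here as
    "skip this transaction" (an assumption, see the lead-in).
    """
    for tx in request_q.get(ts, []):
        if in_degree.get(tx, 0) == 0:
            continue                      # "Nothing": no transaction waits on Tx
        if tx in fd or tx == tk:
            continue                      # already recorded, or back at the initiator
        fd.add(tx)                        # FD = FD ∪ Tx
        overall_cycle(tk, tx, request_q, in_degree, fd)
    return fd


# Request-Q and In-degree columns of Table 3 (Site 2).
request_q = {"T5": ["T7"], "T6": ["T5"], "T7": ["T6", "T8"], "T8": ["T6"]}
in_degree = {"T5": 1, "T6": 1, "T7": 2, "T8": 1}
fd = overall_cycle("T6", "T6", request_q, in_degree, {"T6"})  # FD starts as {Ti}
print(sorted(fd))  # ['T5', 'T6', 'T7', 'T8']
```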
Now, a check for deadlock is performed as follows:
If flag = 0 then
  If Wait-for(Tj) ∩ Request-Q(Tj) = null then
    Update(Wait-for(Ti), Request-Q(Tj), Ti);
  Else if Wait-for(Tj) ∩ Request-Q(Tj) != null then
    If Abort_list != null then
      Go to step 6 to call Resolution_Deadlock();
5. Request_Degree(Ti) Receiving Transaction (Ti)
If Out-degree(Ti) = 1 AND In-degree(Held-by(Ti)) = 1 AND the first transaction in Request-Q(Ti) has Out-degree = 1 then
  Return Ti;
Else if Out-degree(Ti) > 1 AND In-degree(Ti) > 1 AND Held-by(Ti) = (Tx, Ty) then
  If Held-by(Tx) ∩ Request-Q(Ty) = Tz AND Out-degree(Tz) = 1 AND the first transaction in Request-Q(Tz) has Out-degree = 1 AND all the transactions in Held-by(Ti), Held-by(Tx) and Held-by(Ty) belong to FD then
    Return Tz;
  Else if some of the transactions in Held-by(Ti), Held-by(Tx) or Held-by(Ty) do not belong to FD OR Held-by(Ti) ∩ Request-Q(Ti) != null then
    Return Ti;
  Else if at least one transaction in Request-Q(Ti) holds the resource required by a transaction in Held-by(Ti), Request-Q(Tx) or Request-Q(Ty) that is in circular wait or belongs to FD then
    Return Ti;
  Else if at most one transaction (Tm) in Request-Q(Ti) is in circular wait (belongs to FD) AND Out-degree(Tm) = 1 AND In-degree(Tm) ≥ 1 AND the first transaction in Request-Q(Tm) has Out-degree = 1 then
    Return Tm;
6. Resolution_Deadlock()
Let Tx be the transaction in abort_list.
Send clear(Tx, Held-by(Tx));
If flag = 1 then
  Allocate each resource Ri held by Tx to the first requester Tk in Request-Q(Tx);
  For every transaction Ti in Request-Q(Tx) requesting a resource Ri held by Tx
    Enqueue(Ti, Request-Q(Tk));
7. Transaction Tk Receiving a Clear(Tj, Tk) Message
Tk purges the tuple having Tj as the requesting transaction from Request-Q(Tk);
In-degree(Tk) = In-degree(Tk) - 1;
Out-degree(Tj) = Out-degree(Tj) - 1;
If Out-degree(Tj) = 0 then flag = 1;
Else flag = 0;
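The detection part of step 4 (Wait-for propagation and the Wait-for(Tj) ∩ Request-Q(Tj) test) can be sketched in Python. This is a minimal sketch under stated assumptions: a breadth-first queue stands in for asynchronous update messages, the flag, Temp and abort_list variables are omitted, and `propagate_update` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional


def propagate_update(initiator: str, request_q: Dict[str, List[str]],
                     wait_for: Dict[str, Optional[str]]) -> Optional[str]:
    """Flood an update message from the initiator along Request-Q queues.

    Each receiver Tj copies the initiator's label into Wait-for(Tj); if that
    label is also in Request-Q(Tj), the intersection is non-empty and Tj has
    found a deadlock. Returns the detecting transaction, or None.
    """
    wait_for[initiator] = initiator          # Wait-for(Ti) = Ti
    visited = {initiator}
    pending = deque(request_q.get(initiator, []))
    while pending:
        tj = pending.popleft()
        if tj in visited:
            continue
        visited.add(tj)
        wait_for[tj] = initiator             # Wait-for(Tj) = Wait-for(Ti)
        if wait_for[tj] in request_q.get(tj, []):
            return tj                        # Wait-for(Tj) ∩ Request-Q(Tj) != null
        pending.extend(request_q.get(tj, []))  # Update(..., Request-Q(Tj), ...)
    return None


# Cycle T1 -> T2 -> T3 -> T1 (each waits for the next); T1 initiates.
queues = {"T1": ["T3"], "T2": ["T1"], "T3": ["T2"]}
print(propagate_update("T1", queues, {}))  # T2
```

Because Request-Q(Tj) lists the transactions waiting for Tj, finding the initiator's label there means the initiator waits for Tj while Tj waits, transitively, for the initiator.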
MRDD is divided into three phases:
1. The first phase is new. It provides the criterion for identifying, before the deadlock is detected, the transaction that should be aborted early to reduce repeated deadlock detections. As described in step 5, once abort_list is not null, the victim transaction has been identified and the search for a victim stops.
2. The second phase deals with deadlock detection; MRDD ensures that only one transaction in a deadlock cycle detects it. A deadlock is detected if the victim transaction has been identified and flag = 0. When flag = 0, this phase is similar to the first phase of the algorithms presented in [17, 18].
3. The third phase deals with deadlock resolution and ensures that the transaction in abort_list is aborted, because it exists in all cycles.
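The resolution phase (steps 6 and 7) can be sketched in Python. In this minimal sketch, assuming single-process dictionaries in place of clear messages, `receive_clear` and `resolve_deadlock` are hypothetical names. The text does not say how the degrees and Held-by entries of the re-queued requesters change, so the sketch leaves them untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Txn:
    held_by: List[str] = field(default_factory=list)
    request_q: List[str] = field(default_factory=list)
    out_degree: int = 0
    in_degree: int = 0


def receive_clear(txns: Dict[str, Txn], tj: str, tk: str) -> int:
    """Step 7: holder Tk removes aborted Tj from Request-Q(Tk); returns flag."""
    txns[tk].request_q.remove(tj)
    txns[tk].in_degree -= 1
    txns[tj].out_degree -= 1
    return 1 if txns[tj].out_degree == 0 else 0


def resolve_deadlock(txns: Dict[str, Txn], victim: str) -> int:
    """Step 6: abort the victim, then hand its resources to its first requester."""
    flag = 0
    for holder in list(txns[victim].held_by):   # send clear(Tx, Held-by(Tx))
        flag = receive_clear(txns, victim, holder)
    txns[victim].held_by.clear()
    if flag == 1 and txns[victim].request_q:
        first, *others = txns[victim].request_q
        # The remaining requesters now queue behind the first requester Tk.
        txns[first].request_q.extend(others)
        txns[victim].request_q = []
    return flag


# Victim V waits on H, and A and B wait on V.
txns = {"H": Txn(request_q=["V"], in_degree=1),
        "V": Txn(held_by=["H"], request_q=["A", "B"], out_degree=1, in_degree=2),
        "A": Txn(held_by=["V"], out_degree=1),
        "B": Txn(held_by=["V"], out_degree=1)}
print(resolve_deadlock(txns, "V"), txns["H"].request_q, txns["A"].request_q)  # 1 [] ['B']
```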
4 Analytical Model Analysis
This section applies MRDD to multi-resource deadlock detection in a distributed database system and analyzes its behaviour.
4.1 Multi Resource Deadlock Detection in Two Sites with Eight Transactions
Consider a distributed database system (DDS) consisting of two sites, Site 1 and Site 2. Site 1 maintains four transactions, T1, T2, T3 and T4, and Site 2 maintains four transactions, T5, T6, T7 and T8, as shown in Fig. 1.
Fig. 1. Directed graph of distributed environment having 2 sites.
Table 2. Transaction structure for Site 1 of Fig. 1.
T-ID | Wait-for | Held-by | Request-Q | Out-degree | In-degree
T1 | T1 | T3 | T2,T4 | 1 | 2
T2 | T1 | T1,T3 | Null | 2 | 0
T3 | Null | Null | T1,T2 | 0 | 2
T4 | T1 | T1 | Null | 1 | 0
Table 3. Transaction structure for Site 2 of Fig. 1.
T-ID | Wait-for | Held-by | Request-Q | Out-degree | In-degree
T5 | T6 | T6 | T7 | 1 | 1
T6 | T6 | T7,T8 | T5 | 2 | 1
T7 | T6 | T5 | T6,T8 | 1 | 2
T8 | T6 | T7 | T6 | 1 | 1
Table 4. Transaction structure for Site 1 and Site 2 of Fig. 1.
T-ID | Wait-for | Held-by | Request-Q | Out-degree | In-degree
T1 | T1 | T3 | T2,T4 | 1 | 2
T2 | T1 | T1,T3 | Null | 2 | 0
T3 | T1 | T5 | T1,T2 | 1 | 2
T4 | T1 | T1 | T6 | 1 | 1
T5 | T6 | T6 | T3,T7 | 1 | 2
T6 | T6 | T4,T7,T8 | T5 | 3 | 1
T7 | T6 | T5 | T6,T8 | 1 | 2
T8 | T6 | T7 | T6 | 1 | 1
In Site 1, Table 2 shows no deadlock, because Wait-for(Ti) and Request-Q(Ti) do not intersect for any transaction. In Site 2, the cycles created by T6 in Table 3 give FD = {T6, T5, T7, T8}. When T6 propagates the update message to T5, T5 satisfies the condition of step 5 in Sect. 3, so abort_list = T5, and the search for the victim transaction stops because abort_list is no longer null. When T8 receives the update message from T7, the deadlock is detected: according to Table 3, Wait-for(T8) ∩ Request-Q(T8) = T6. Therefore, T5 is aborted. If a transaction in Site 1 then requests a resource from Site 2, or an update message propagates from Site 2 to Site 1 (Table 4 and Fig. 1), no deadlock is detected, because aborting T5 has broken all cycles. Hence the deadlock is detected only once and only one transaction is aborted. Figure 2 shows the resulting deadlock-free state.
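The selection of T5 can be checked against the first branch of step 5 with a short Python sketch. Only that first branch is implemented, because the remaining branches depend on circular-wait tests that the text states informally; `request_degree_first_rule` is a hypothetical name, and the inputs are the columns of Table 3.

```python
from typing import Dict, List, Optional


def request_degree_first_rule(ti: str, out_degree: Dict[str, int],
                              in_degree: Dict[str, int],
                              held_by: Dict[str, List[str]],
                              request_q: Dict[str, List[str]]) -> Optional[str]:
    """First branch of step 5: return Ti as the victim, or None if it does not apply."""
    holders = held_by.get(ti, [])
    queue = request_q.get(ti, [])
    if (out_degree[ti] == 1 and len(holders) == 1
            and in_degree[holders[0]] == 1              # In-degree(Held-by(Ti)) = 1
            and queue and out_degree[queue[0]] == 1):   # first requester has Out-degree 1
        return ti
    return None


# Columns of Table 3 (Site 2).
out_degree = {"T5": 1, "T6": 2, "T7": 1, "T8": 1}
in_degree = {"T5": 1, "T6": 1, "T7": 2, "T8": 1}
held_by = {"T5": ["T6"], "T6": ["T7", "T8"], "T7": ["T5"], "T8": ["T7"]}
request_q = {"T5": ["T7"], "T6": ["T5"], "T7": ["T6", "T8"], "T8": ["T6"]}
victims = [t for t in ("T5", "T6", "T7", "T8")
           if request_degree_first_rule(t, out_degree, in_degree, held_by, request_q)]
print(victims)  # ['T5']
```

T6 fails on its out-degree of 2, T7 because its first requester T6 has out-degree 2, and T8 because its holder T7 has in-degree 2, which leaves T5 as in the worked example.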
Fig. 2. Directed graph of Fig. 1 showing the deadlock-free state after MRDD.
MRDD differs from the previous algorithms in the following respects:
1. It calls the update message and modifies Wait-for when transaction T6 creates cycles in Site 2 of Fig. 1.
2. If T2 requests the resource held by T3 before T1 requests it, then in the previous algorithms, aborting T3, or allocating the resource released by T3 to the first requester T2 and enqueueing T1 in Request-Q(T2), causes the undetected-deadlock problem shown in Fig. 3. MRDD avoids this problem by swapping transactions in Request-Q(T3), as described in Sect. 3.
Fig. 3. Problem of undetected deadlock.
3. It identifies T5, which requires few resources and exists in all cycles, as the victim before the deadlock is detected (Fig. 1).
4. Aborting T5 (Fig. 2) prevents T7 from detecting the deadlock even though Wait-for(T7) ∩ Request-Q(T7) is not null, because flag equals one, which means the system is deadlock-free.
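The swap of step 2, which handles the Fig. 3 scenario, amounts to rotating a queue until the new requester is at the front. Here is a minimal Python sketch using `collections.deque`; the function name `swap` mirrors the pseudocode and is otherwise hypothetical.

```python
from collections import deque
from typing import Deque


def swap(request_q: Deque[str], tn: str) -> Deque[str]:
    """Step 2: rotate Request-Q(Tk) until Tn is at the front.

    Each rotation dequeues the first transaction and enqueues it at the back.
    If Tn is absent, a full pass restores the original order.
    """
    for _ in range(len(request_q)):
        if request_q[0] == tn:
            break
        request_q.rotate(-1)      # Enqueue(Dequeue(Request-Q(Tk)), Request-Q(Tk))
    return request_q


# Fig. 3 scenario: T2 queued on T3 before T1; the swap moves T1 to the front.
print(list(swap(deque(["T2", "T1"]), "T1")))  # ['T1', 'T2']
```

The result matches Request-Q(T3) = T1,T2 in Table 2, so the resource released by T3 goes to T1 first.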
5 Performance Analysis
The performance of MRDD is compared with that of existing algorithms in terms of the number of detected deadlocks and the number of aborted transactions. Table 5 and Fig. 4 give the results of MRDD and the existing algorithms [15–18].
Table 5. Performance comparison of MRDD and existing algorithms in Fig. 1.
Algorithm | Number of detected deadlocks | Number of aborted transactions
Brian M. Johnston (1991) | 3 | 3
Degree based resolution (2013) | 3 | 1
Himanshi Grover (2013) | 3 | 3
Abdullah Mohammed Rashid (2015) | 2 | 2
MRDD | 1 | 1
Fig. 4. Comparison of the number of detections and the number of aborted transactions in Table 5.
When the aborted transaction requires few resources and is responsible for creating many cycles, the algorithm that aborts it incurs a low cost, and the number of aborted transactions (one) equals the number of detected deadlocks. If the victim transaction has more than one request, or does not exist in all cycles, the number of detected deadlocks is greater than or equal to the number of aborted transactions. These results show that, in this scenario, MRDD performs better than the algorithms in [15–18] because it reduces the detection and abort costs. Based on Fig. 2 and Fig. 4, the improvement percentages of the MRDD algorithm over the Brian M. Johnston and Himanshi Grover algorithms are given in Table 6.
Table 6. Improvement percentage of MRDD Algorithm of Fig. 1.
Improvement factor | Improvement percentage
Detections of deadlocks | 66.7%
Aborts | 66.7%
Propagations of update messages | 50%
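Table 6 does not state how its percentages are computed. Assuming the usual relative reduction with respect to the baseline count in Table 5, the detection and abort figures follow from the Johnston and Grover values (3) and the MRDD value (1):

```latex
\text{Improvement} = \frac{N_{\text{baseline}} - N_{\text{MRDD}}}{N_{\text{baseline}}} \times 100\%
= \frac{3 - 1}{3} \times 100\% \approx 66.7\%
```

The 50% figure for update-message propagations cannot be checked in the same way, because the underlying message counts are not reported.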
6 Conclusion
A new algorithm for deadlock detection in distributed systems has been presented. The proposed algorithm uses a new update message with three functions: first, to detect the victim transaction; second, to modify the Wait-for variables; and third, to check for the occurrence of deadlock. Along with this update message, we use the request degrees of transactions as a better criterion for deciding which transaction exists in all cycles and may require few resources; this transaction is detected and aborted early, because aborting it breaks all cycles and reduces the frequent detection of deadlocks. We have analyzed the performance of MRDD and compared it with algorithms presented in the literature, and we observe that MRDD resolves deadlock in less time than the existing algorithms.
References
1. Jadhav, S., Gawande, A.: An overview of distributed database systems advantages and its problem areas. Int. J. Innov. Res. Comput. Commun. Eng. 5, 2579 (2017)
2. Gupta, N.M., Gore, N.R.: Concurrency control and security issue in distributed database system. Int. J. Eng. Dev. Res. 4, 177–181 (2016)
3. Akintola, A.A., Aderounmu, G.A., Osakwe, A.U.: Performing modeling of an enhanced optimistic locking architecture for concurrency control in a distributed database system. ACM 37, 1265–1267 (2005)
4. Singh, B., Singh, A., Singh, N., Singh, P.: Concurrency control in distributed database management systems. Dealing unfair transactions at higher access classes (DUTHAC). Int. J. Sci. Res. Rev. 7, 1619 (2019)
5. Kaur, M., Kaur, H.: Concurrency control in distributed database system. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3, 1445 (2013)
6. Gupta, S.: Approaches for deadlock detection and prevention in distributed database system. Int. J. Softw. Web Sci. 9, 86 (2014)
7. Ghodrati, M., Harounabadi, A.: Provide a new mapping for deadlock detection and resolution modeling of distributed database to colored petri net. Int. J. Comput. Appl. 95, 1 (2014)
8. Yadav, R., Makkar, K.: Analysis for deadlock detection and resolution techniques in distributed database. Int. J. Innov. Technol. 1, 813 (2014)
9. Bassan, N., Singh, R., Kaur, S.: Approaches for deadlock detection for distributed systems. Int. J. Emerg. Res. Manag. Technol. 4, 136 (2015)
10. Singh, S., Tyagi, S.S.: A review of distributed deadlock detection techniques based on diffusion computation approach. Int. J. Comput. Appl. 48(0975–888), 28–29 (2012)
11. Sanchez, C., Sipma, H., Manna, Z., Subramonian, V., Gill, C.D.: On efficient distributed deadlock avoidance for realtime and embedded systems. In: Proceedings of the Twentieth IEEE International Parallel and Distributed Processing Symposium, pp. 133–136. IEEE Computer Society Press (2006)
12. Rahimi, S.K., Haug, F.S.: Distributed Database Management Systems: A Practical Approach. Wiley-IEEE Computer Society Pr, Hoboken (2010)
13. Henkens, F., Alom, B.M., Hannaford, M.: Deadlock detection views of distributed database. In: 6th International conference on Information Technology and New Generation, pp. 732–735 (2009)
14. Gupta, S.: Deadlock detection techniques in distributed database system. Int. J. Comput. Appl. 74, 42–45 (2013)
15. Rashid, A.M., Ali, N.: Deadlock detection and resolution in distributed database environment. Int. J. Sci. Res. Publ. 5, 1–9 (2015)
16. Chahar, P., Dalal, S.: Deadlock resolution techniques. Int. J. Sci. Res. Publ. 3, 3 (2013)
17. Johnston, B.M., Javagal, R.D., Datta, A.K., Ghosh, S.: A distributed algorithm for resource deadlock detection. In: Department of Computer Science, IEEE Transactions, vol. 11, pp. 252–256 (1991)
18. Grover, H.: A distributed algorithm for resource deadlock detection using time stamping. Int. J. Eng. Res. Technol. 2, 4124–4127 (2013)
Big Data Analytics Model for Preventing the Spread of COVID-19 During Hajj Using the Proposed Smart Hajj Application
Ibtehal Nafea(B)
Taibah University, AlMadinah Almonwara, Medina, Saudi Arabia
Abstract. Following the declaration of COVID-19 as a global pandemic, the Hajj is one of the events affected. The desire of the faithful to observe their religious practices calls for collaborative efforts, and the Kingdom of Saudi Arabia can adopt smart technology to help fight the spread of the disease. The pandemic has created new challenges for the healthcare sector during Hajj: for the Saudi Ministry of Health (MOH), the Hajj is a major challenge in detecting infection early and controlling it at large gatherings of people. As seen in countries such as Australia, Bahrain, and China, smart technology enhances contact tracing, isolation, and monitoring. The proposed Smart Hajj application will facilitate the pilgrimage while collecting data on the pilgrims, from the start of the trip to its end. The application thus allows a solution to be generated from different data resources for the prevention of COVID-19, enhancing the pilgrimage experience. Keywords: Hajj · Big data · COVID19 · Cloud computing · Saudi Arabia
1 Introduction
According to the Kingdom of Saudi Arabia (KSA), there were about 7,457,663 pilgrims in the 2019 Hajj pilgrimage. As the Ministry of Hajj and Umrah observed, the pilgrims came from 180 countries, and the total included foreign workers and citizens in Saudi Arabia (about 37% of the KSA's population) as well as citizens of visa-free countries such as the Gulf Cooperation Council states. Table 1 shows the total number of pilgrims (Saudis and non-Saudis) in 2019. While the Hajj is vital, the Saudi government has maintained its concerns over public health and has committed significant funds to establishing comprehensive and integrated services for the pilgrims. In particular, the KSA seeks to cater to the pilgrims' healthcare needs during their stay and during their movement to and from the sacred places. Because of the recent global pandemic, public gatherings are especially detrimental to the wellbeing of society, and the Hajj, being a religious mass gathering, is one of these occasions. Although it is a time-defined meeting at a specific location, its purpose creates healthcare concerns: given the diversity of nationalities and their geographic distribution, the pilgrimage carries critical risks of widespread exposure.
Table 1. Total number of pilgrims 2019 [1]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 475–484, 2021. https://doi.org/10.1007/978-3-030-70713-2_44
The first COVID-19 case in KSA was reported on 2 March. Although the Kingdom had developed mechanisms to combat the infection, these efforts, as in other countries, proved inadequate. The reported case involved a Saudi national who had recently travelled from Iran to the Kingdom through Bahrain. By 7 April, the number of cases had risen to an estimated 2,795 [2]. To curb the spread through mass gatherings, Saudi Arabia suspended entry for persons seeking to perform the Umrah pilgrimage in Makkah or to visit the Prophet's Mosque in Medina. The Saudi government has also allocated considerable resources to protecting the health of pilgrims, reinforcing healthcare efforts by providing qualified personnel, health facilities, logistical support and personal protective equipment (PPE). Under one of the government's directives, the Ministry of Health (MoH) deployed about 30,000 of its employees to assist in the 2019 Hajj. The ministry also equipped 25 hospitals, with a capacity of 5,000 beds, around Makkah, Madinah and the Holy Sites, and about 142 primary healthcare centres and 140 mobile medical clinics were established close to the pilgrimage sites. Emergency service points were also established at the Jamarat Bridge and the Al Mashaaer Al Mugaddassah Metro. Because the Hajj is an international event, the ministry further introduced language translation devices covering 12 languages and direct translation through the 937 Service Centre. These efforts have additionally
been complemented by the installation of an Electronic Medical Record (EMR) system at 55 healthcare centres and all hospitals of the Holy Sites [3]. In 2018, the ministry introduced an automated "robot" technology for medical consultations, intended to give any hospital in Mina access to consultants in subspecialties. The activation, in cooperation with the Ministry of Interior, of technology that identifies otherwise unidentifiable persons through their fingerprints further advanced the digitization of healthcare for better service delivery [4]. In 2019, the MoH also prepared comprehensive preventive plans designed to counter and alleviate epidemics. Through these plans, the ministry aimed to prevent infectious diseases that are subject to the International Health Regulations, to help in the early detection of cases of infectious diseases, and to set out precautionary and preventive measures for them. The plans also included the capacity to limit and contain outbreaks of epidemic and quarantinable diseases during the Hajj, and to maintain control over the epidemiological situation of infectious diseases during and after the Hajj season. Beyond these preventive measures, all pilgrims are subject to predetermined health requirements, applied mainly to pilgrims coming from outside the Kingdom in cooperation with the Ministry of Interior. These are considerable steps by the MoH during the Hajj. In supporting huge numbers of pilgrims, the ministry diagnoses and treats hundreds of conditions. In the process, a large body of data is created that, once stored, could be managed, analysed and used to better understand disease spread and prevention.
These data could also enable the MoH to improve its early disease warnings and, therefore, its response to outbreaks [5]. Crowdsourcing enhanced the effective sharing of resources during the Hajj among the Ministry of Hajj, the Ministry of Health and the Ministry of Interior (MOI). This simplified the use of data on a huge scale, which is a major concern, although it demanded significant networking, processing and an extensive database. According to data from the Communications and Information Technology Commission of Saudi Arabia, data consumption in Makkah and the Holy Sites on Arafa during Hajj 2019 averaged about 1,718 terabytes, equivalent to watching more than 704,000 hours of 1080p HD video [9]. According to Zhou and colleagues, the novel coronavirus outbreak that began in 2019 has caused the deaths of thousands of people [10]. Reporting shortly after the disease was declared a global pandemic, they observed that about 100,000 people had been infected and thousands had died across the world [10]. While medical professionals and drug manufacturers have been at the forefront of efforts to cure the disease, Geographic Information Systems (GIS) are critical for contact tracing and for monitoring suspected cases. The ease with which the infection spreads calls for a regional approach. GIS technology's role in prevention and control relies on platform construction and map production. In this regard, Jung and Shin note that, since COVID-19 is expected to last indefinitely, the analysis of big data on people's movements and interactions, together with automated testing programs, has become a response of choice in many countries [11].
Given that the World Health Organization has issued public guidelines for enhanced safety, the Kingdom of Saudi Arabia stands at a critical point in the fight against the infection. In particular, the central place of the KSA in the Islamic faith increases the potential for considerable infections. Healthcare functions call for the adoption of new technologies based on the Internet of Things (IoT). According to Vaishya and colleagues, artificial intelligence can be adopted for early detection and diagnosis, monitoring treatment, contact tracing, drug development, and even reducing the workload of healthcare workers [12]. In the proposed application, IoT is used to gather user data as pilgrims perform their pilgrimage. Comparative analytics allows the Smart Hajj application to weigh patient reports against the data collected by automatic thermal guns and overhead drones. Storing and using these data creates a pool of information that, through ultra-fast sorting, can provide predictions that facilitate prevention and control. Therefore, the KSA, while investing in the medical management of the outbreak, is responsible for the safety of the pilgrims and ought to adopt such IoT techniques. Big data connects large data sets from different ministries, and analytics draws out the patterns and links that are crucial for controlling the spread of diseases such as COVID-19 during the Hajj. In big data, data volume matters, but how the ministry uses the data matters more [13]. Because the vast amount of data in this case is collected at several different sites, cloud storage is the natural option, mainly because storing such quantities on local servers is difficult.
As this paper illustrates, big data can be analysed for insights that lead to better decision making. The paper also shows that the output is vital for planning disease-control strategies that demand substantial computing resources and must meet the needs of different end users. The paper then proposes an architecture combining Big Data technology and cloud computing to link the three main KSA ministries, and it outlines the methodology used and the data representation in the architecture. Finally, the conclusion discusses the future direction of the concept. The history of disease outbreaks shows that creating a unified central command is crucial to the eventual success of the fight. Reviewing the Taiwanese case, Wang, Ng, and Brook reveal that the region's preparedness was informed by the 2004 SARS outbreak [14]. As a result, the management of COVID-19 there involved screening travelers at the points of entry. As the KSA seeks to ensure that the Hajj continues in a controlled environment, checking on the pilgrims demands regular screening, isolating pilgrims as needed, and ensuring the availability of health facilities in the region. Monitoring the pilgrims gives the government individual health records and ensures that the pilgrimage proceeds in line with the guidelines of the World Health Organization.
2 Related Works
Since COVID-19 was declared a global pandemic, new challenges have emerged worldwide, and many mobile applications have been developed to control its spread. Through cloud computing, smart applications connect with networked systems, enabling improved prediction methods [15]. For instance, the Australian government launched the COVIDSafe app for contact tracing. Such applications enable tracking and monitoring, and similar systems help to reduce infection rates. They also make it easier for officials to isolate infected persons or those likely to be infected. GPS data from phones can be used to detect potential hotspots and thus identify people who have been exposed. Aware of such measures, governments across the world have sought to use phone tracking and monitoring among their citizens. However, GPS is not accurate enough to measure short distances between two phones. Instead, many governments are developing applications that exchange low-power Bluetooth radio signals. Each phone generates a random digital ID, and each exchange of IDs with a neighbouring phone is recorded as a Bluetooth 'handshake'. If the user develops symptoms or tests positive, they can send notifications to the phones that were close to them [6]. Bahrain launched an electronic bracelet to track active COVID-19 cases; through a Bluetooth connection with the GPS-enabled bracelets, the government tracked the movements and geographic positions of infected persons [7]. The fight against infectious diseases is also rooted in a region's past experiences. As Lin and Hou show, the 2015 MERS outbreak in South Korea resulted in aggressive testing.
This experience later informed the country's adoption, for COVID-19, of a smart management system using data from credit card records, car and cellphone GPS, and security camera footage. Containing the pandemic calls not only for communicating social distancing measures but also for enforcing them. While individuals may unintentionally fail to observe the stipulations, smart technology allows the inter-ministerial team to determine the pilgrims' movements. By analysing big data collected at the entry points and stations within the event area, subjects can be monitored without necessarily requiring quarantine. In China, police officers can measure people's temperatures with specially designed smart helmets that scan pedestrians' body temperature through built-in infrared cameras; the information is recorded in real time and appears on a virtual reality screen in front of the wearer's eyes [12]. The technology can measure the temperatures of about a hundred people in two minutes on average [8]. According to the World Health Organization, increased body temperature is an essential sign of infection with COVID-19 [10]. Thermography is therefore considered well suited to checking large groups of people: it is contactless and gives accurate results. Several ministries and companies in Saudi Arabia have used this technology for the early detection of COVID-19. For instance, the Municipality of the Al-Madinah Al-Munawarah region began adopting thermography techniques using drones equipped
with thermal cameras at the central market and public sites to combat the novel coronavirus. Following a directive by the Ministry of Hajj and Umrah, the 2020 pilgrimage would feature "very limited numbers" of pilgrims, restricted to people of various nationalities already in the Kingdom. With the continued spread of the pandemic around the world, various jurisdictions restricted the cross-border movement of people to combat it. Given that the media have been critical to the growth of internet communication technology, sourcing information through the cloud becomes instrumental, which creates the need to adopt cloud computing for storage. As Jung and Shin observe, adopting artificial intelligence in the data mining process requires structuring the data [11]. Contact tracing without technology can be tedious; the application speeds it up because the big data management tools adopted in the proposed program provide analytical functions. These instances illustrate the critical role technology is playing in the fight against the coronavirus. The Smart Hajj application can therefore be a considerable tool for the pilgrims. Because mass gatherings increase the possibility of pilgrims becoming infected, the application is especially important for sick and immunocompromised pilgrims. The application will use wireless connectivity provided by Saudi Telecom Company in more than 400 holy places.
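The Bluetooth 'handshake' approach to contact tracing described in this section can be illustrated with a minimal Python sketch. It assumes a decentralized design in which a phone that reports a positive test publishes the random IDs it broadcast, and every other phone checks them against the IDs it recorded; the class and function names are hypothetical, and real deployments differ, for example in how often IDs rotate and whether matching happens on the phone or on a central server.

```python
import secrets
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Phone:
    broadcast_ids: List[str] = field(default_factory=list)  # IDs this phone sent
    heard_ids: Set[str] = field(default_factory=set)        # IDs received nearby

    def next_id(self) -> str:
        """Generate a fresh random ID that is not linked to the owner's identity."""
        rid = secrets.token_hex(16)
        self.broadcast_ids.append(rid)
        return rid


def handshake(a: Phone, b: Phone) -> None:
    """Two nearby phones exchange random IDs and each records the other's."""
    id_a, id_b = a.next_id(), b.next_id()
    a.heard_ids.add(id_b)
    b.heard_ids.add(id_a)


def was_exposed(phone: Phone, reported_ids: Iterable[str]) -> bool:
    """True if the phone heard any ID published by a positive user."""
    return not phone.heard_ids.isdisjoint(reported_ids)


pilgrim_a, pilgrim_b, pilgrim_c = Phone(), Phone(), Phone()
handshake(pilgrim_a, pilgrim_b)               # A and B were close to each other
positive = pilgrim_a.broadcast_ids            # A tests positive and publishes its IDs
print(was_exposed(pilgrim_b, positive), was_exposed(pilgrim_c, positive))  # True False
```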
3 Methodology
The methodology comprises five steps for analysing the data, as presented in Fig. 1.
Fig. 1. Methodology steps
3.1 First Step (Research Question)
The research is driven by the desire to address the following questions:
1. What type of personal data is available?
2. How relevant is the data for healthcare management?
Big Data Analytics Model for Preventing the Spread of COVID-19
3. How does COVID-19 relate to the movement of people? How can the data gathered help control the spread of COVID-19 during the 2020 Hajj?
3.2 Second Step (Data Collection)
Data analysis begins with data collected from multiple sources within the Ministry of Hajj, the Ministry of Health, and the Ministry of Interior. A formal request was submitted to the ministries together with a presentation of the application's proposition, and the data were then gathered and differentiated by type. To protect citizens' privacy, the Ministry of Interior required that all identifying data be reduced. However, because personal identification is needed, written consent from the subjects would be required. The data included:
• Pilgrim's information: Passport_ID, entry visa, and fingerprints checked by the passport department at the entry port. These data will be collected from the Ministry of Interior. Because there will be no pilgrims from outside the Kingdom in the 2020 Hajj, the residence number is needed for non-Saudis.
• Pilgrim's health information, or Electronic Medical Record (EMR), together with the classification of diseases according to the International Classification of Diseases (ICD). These data were collected from the Ministry of Health (MOH).
• Crowdsourcing: crowd movement management schedules, and the companies and campaigns that take care of pilgrims throughout the journey from their countries to Hajj. This information can be collected from crowd control and crowd management centres in cooperation with the Ministry of Hajj.
Through cloud computing, the data will be integrated and sorted into a standardized format [14].
3.3 Third Step (Investigation)
In preparing for data analysis, it was deemed vital to categorize the variables and the subjects and to determine how to access related and representative data.
This step is crucial because, for example, the majority of the pilgrims are elderly and are therefore a vulnerable population at high risk of contracting the disease. All parties associated with the Hajj consequently need coordination and strict measures to mitigate the spread of COVID-19. The general risks of a mass gathering such as Hajj include the speed of COVID-19 transmission within the limited area of the Holy Places and the expected burden on the Ministry of Health of controlling these risks.
3.4 Fourth Step (Research)
After the data investigation step, we need to research possible solutions to prevent the spread of COVID-19. Notably, the adoption of different tracking and monitoring technologies is beneficial, and through cloud computing, different ministries will be able to work collaboratively.
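The collection and integration steps above can be sketched as a keyed-hash pseudonymization followed by a join. In those steps, records from the Ministries of Interior, Health and Hajj are merged in the cloud into one standardized view while identifying data are reduced. The paper does not specify how identifiers are reduced or how records are linked. In this minimal Python sketch, the field names, the secret key and the choice of HMAC-SHA256 are therefore all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical secret held by the integrating centre; in practice it would come
# from a key-management service, never from source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a Passport_ID or residence number to a stable pseudonym (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def integrate(interior: List[dict], health: List[dict],
              hajj: List[dict]) -> Dict[str, dict]:
    """Join per-ministry records on a pseudonymous key.

    Each input record is assumed (hypothetically) to carry an 'id' field with the
    passport or residence number. That field is dropped and replaced by the
    pseudonym, so the merged view holds no raw identifier.
    """
    merged: Dict[str, dict] = {}
    sources = (("interior", interior), ("health", health), ("hajj", hajj))
    for source, records in sources:
        for rec in records:
            key = pseudonymize(rec["id"])
            fields = {k: v for k, v in rec.items() if k != "id"}
            merged.setdefault(key, {})[source] = fields
    return merged


if __name__ == "__main__":
    view = integrate(
        interior=[{"id": "2345678901", "visa": "HJ-01"}],
        health=[{"id": "2345678901", "icd": "J45"}],
        hajj=[{"id": "2345678901", "campaign": "C-17"}],
    )
    for pseudonym, record in view.items():
        print(pseudonym[:12], record)
```

A keyed hash is used rather than a plain hash because national ID numbers have a small, guessable range. Without the key, an unkeyed hash could be reversed by enumeration.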
3.5 Fifth Step (Results/Knowledge)
To formalize the results and knowledge from the previous steps, this paper introduces a model that helps the different ministries working during Hajj to prevent the spread of COVID-19, thus providing a roadmap to follow during the Hajj season.
4 System Architecture
Internet technologies have recently developed rapidly, as has the use of various Internet services such as social networking applications. These applications and the huge data volumes involved require a large network and fast data processing capacity [11]. Cloud computing has therefore emerged as an essential technology offering large storage capacity. A case in point is data consumption during the peak of the Hajj season, which, as mentioned earlier, reached 1718 thousand terabytes, 34% more than in the previous year [9]. Digital technology allows people to create a paperless network that enhances analytical processes, and the emergence of artificial intelligence is one element facilitating data analysis [16]. Through Smart Hajj, every pilgrim will be able to manage their trip from beginning to end. Users will receive digital assistance with obtaining a visa, with travel, and with performing the rituals of Hajj easily while maintaining good health practices. By design, the data entered by each user are relayed into a common database specific to the pilgrims. As Bragazzi et al. observe, such a design is meant to ensure that the data can be correlated to draw out patterns [17]. Specifically, each user's movements will be logged relative to other users. Any instance of infection then allows the path travelled to be mapped, which facilitates contact tracing. In addition, each pilgrim can regularly evaluate their health condition through a self-health assessment. If they feel feverish, the application will check their real-time temperature using the thermal screening devices distributed across the Holy Sites. The millions of reported cases across the world have also resulted in high media traffic [18].
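The contact-tracing step can be sketched as a co-location query over movement logs. In that step, each user's movements are logged, and when an infection is reported the path travelled is mapped to find other users who were nearby. This Python sketch assumes hypothetical log records of (pilgrim ID, site, minute). It treats two pilgrims as contacts if they were logged at the same site within a time window. The 15-minute default is an assumption, not a parameter given in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical movement-log record: (pilgrim_id, site_id, minute since start of day)
LogEntry = Tuple[str, str, int]


def trace_contacts(log: Iterable[LogEntry], infected_id: str,
                   window_min: int = 15) -> Set[str]:
    """Return pilgrims logged at the same site as `infected_id` within
    +/- `window_min` minutes of any of the infected pilgrim's visits."""
    by_site: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    infected_visits: List[Tuple[str, int]] = []
    for pid, site, minute in log:
        by_site[site].append((pid, minute))
        if pid == infected_id:
            infected_visits.append((site, minute))

    contacts: Set[str] = set()
    for site, t0 in infected_visits:
        for pid, t in by_site[site]:
            if pid != infected_id and abs(t - t0) <= window_min:
                contacts.add(pid)
    return contacts


if __name__ == "__main__":
    log = [
        ("P1", "Mina-A", 100), ("P2", "Mina-A", 110), ("P3", "Mina-A", 200),
        ("P4", "Arafat-B", 105), ("P1", "Arafat-B", 300), ("P3", "Arafat-B", 310),
    ]
    print(sorted(trace_contacts(log, "P1")))  # ['P2', 'P3']
```

Grouping the log by site first means each infected visit is compared only with entries at the same site rather than with the whole log.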
Given this intensity of focus, all available technologies should be adopted to enhance social distancing. GPS tracking will notify app users of their proximity to others, while cameras in the pilgrimage zones will facilitate movement monitoring, further enhancing social distancing measures. The surveillance cameras can also be used to check that masks are being worn. All the data will be exchanged and integrated via the Hajj Epidemic general centre at the inter-ministerial Hajj centre, as seen in Fig. 2. The Hajj Epidemic general centre collects real-time data from multiple databases within the ministries. It combines thermal screening of pilgrims' temperatures, the social distancing system and the daily self-health assessment. In this way, the centre can help detect COVID-19 early, if present, and isolate infected people, who can be located via their mobiles. In the process, the big data gathered creates a pool for managing the pandemic that has been ravaging the world since December 2019 [19].
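The GPS-based proximity notification can be sketched as a great-circle (haversine) distance check between users' reported positions. The 1.5 m threshold and the function names below are assumptions made for illustration. Consumer GPS fixes are typically accurate only to several metres, so a real system would have to treat such alerts as approximate or combine them with other signals, such as the camera-based monitoring described above.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres
MIN_DISTANCE_M = 1.5          # hypothetical social-distancing threshold

Position = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def too_close(p: Position, q: Position, threshold_m: float = MIN_DISTANCE_M) -> bool:
    """True if two positions are closer than the threshold."""
    return haversine_m(p[0], p[1], q[0], q[1]) < threshold_m


def proximity_alerts(positions: Dict[str, Position],
                     threshold_m: float = MIN_DISTANCE_M) -> List[Tuple[str, str]]:
    """All unordered pairs of user IDs closer than the threshold (O(n^2) sketch)."""
    ids = sorted(positions)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if too_close(positions[a], positions[b], threshold_m)]


if __name__ == "__main__":
    users = {"A": (21.4225, 39.8262), "B": (21.42251, 39.8262), "C": (21.4235, 39.8262)}
    print(proximity_alerts(users))  # [('A', 'B')]
```

The pairwise loop is fine for a sketch. At the scale of a Hajj crowd, a spatial index (for example, grid cells of the threshold size) would restrict comparisons to nearby users.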
Fig. 2. System architecture
5 Conclusion
The ministerial uptake of the idea is rooted in the need to allow pilgrims to perform their rituals freely while remaining mindful of healthcare guidelines, by following their digital footprints and the data they enter on their health conditions. Fundamentally, the pandemic can only be managed effectively through collaborative efforts across the globe. While pharmaceutical companies are dedicated to the search for a drug, technology companies should direct their efforts towards creating people-centric management approaches. Big data has played an important role in preventing the spread of diseases such as COVID-19. This paper has presented a model to manage and integrate the big data of pilgrims and of the different ministries working during Hajj. The integrated data can then be analysed and used to control the spread of COVID-19 during the Hajj while supporting quality healthcare. The proposed system links large data sets from different ministries using the Smart Hajj mobile application. It is overseen by the Hajj Epidemic general centre, which monitors and minimises the spread of diseases during Hajj. The healthcare sector during Hajj faces many challenges that should be considered. In the future, we can consider the data visualization techniques needed to link the different ministries and to improve healthcare during Hajj. This will not only improve our work but also make the system more flexible and reliable.
References
1. General Authority for Statistics, Kingdom of Saudi Arabia. https://www.stats.gov.sa/en
2. http://saudigazette.com.sa/article/591598/SAUDI-ARABIA/Saudi-corona-cases-stand-at2795-recoveries-now-615. Accessed 03 May 2020
3. Ministry of Health, Saudi Arabia. https://www.moh.gov.sa/en/Ministry/MediaCenter/News/Pages/News-2019-08-25-016.aspx
4. Saudi Press News Agency. www.spa.gov.sa/1958256
5. Nafea, I.: Mobile health application running on public cloud during Hajj. In: Younas, M., Awan, I., Holubova, I. (eds.) Mobile Web and Intelligent Information Systems. MobiWIS 2017. Lecture Notes in Computer Science, vol. 10486. Springer, Cham (2017)
6. https://www.sciencemag.org/news/2020/05/countries-around-world-are-rolling-out-contacttracing-apps-contain-coronavirus-how
7. McArthur, R.: Bahrain launches electronic bracelets to keep track of active COVID-19 cases. MobiHealthNews, 08 April 2020. https://www.mobihealthnews.com/news/europe/bahrain-launches-electronic-bracelets-keep-track-active-covid-19-cases
8. South China Morning Post: Chinese police now have AI helmets for temperature screening. Abacus, published 4:17 pm, 28 February 2020. https://www.scmp.com/tech/article/3052879/chinese-police-now-have-ai-helmets-temperature-screening
9. Communication and Information Technology Commission. Publish date: 10/12/1440. https://www.citc.gov.sa/en/mediacenter/pressreleases/Pages/2019081101.aspx. Accessed 14 June 2020
10. Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., Song, C.: COVID-19: challenges to GIS with big data. Geogr. Sustain. 1(1), 77–87 (2020)
11. Jung, J.H., Shin, J.I.: Big data analysis of media reports related to COVID-19. Int. J. Environ. Res. Public Health 17(16), 5688 (2020)
12. Vaishya, R., Javaid, M., Khan, I.H., Haleem, A.: Artificial Intelligence (AI) applications for COVID-19 pandemic. Diab. Metab. Syndr. Clin. Res. Rev. 14(4), 337–339 (2020)
13. Hua, J., Shaw, R.: Corona virus (Covid-19) “infodemic” and emerging issues through a data lens: the case of China. Int. J. Environ. Res. Public Health 17(7), 2309 (2020)
14. Wang, C.J., Ng, C.Y., Brook, R.H.: Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing. JAMA 323(14), 1341–1342 (2020)
15. Ienca, M., Vayena, E.: On the responsible use of digital data to tackle the COVID-19 pandemic. Nat. Med. 26(4), 463–464 (2020)
16. Pham, Q.V., Nguyen, D.C., Hwang, W.J., Pathirana, P.N.: Artificial Intelligence (AI) and Big Data for Coronavirus (COVID-19) Pandemic: A Survey on the State-of-the-Arts (2020)
17. Ting, D.S.W., Carin, L., Dzau, V., Wong, T.Y.: Digital technology and COVID-19. Nat. Med. 26(4), 459–461 (2020)
18. Bragazzi, N.L., Dai, H., Damiani, G., Behzadifar, M., Martini, M., Wu, J.: How big data and artificial intelligence can help better manage the COVID-19 pandemic. Int. J. Environ. Res. Public Health 17(9), 3176 (2020)
19. Lin, L., Hou, Z.: Combat COVID-19 with artificial intelligence and big data. J. Travel Med. 27(5), taaa080 (2020)
Financial Time Series Forecasting Using Prophet
Umi Kalsom Yusof1(✉), Mohd Nor Akmal Khalid2, Abir Hussain3, and Haziqah Shamsudin1
1 School of Computer Sciences, USM, 11800 Georgetown, Pulau Pinang, Malaysia
2 School of Information Science, JAIST, Nomi 923-1211, Japan
3 Department of Computer Science, LJMU, Liverpool L3 3AF, UK
Abstract. Forecasting financial time series has long been a difficult endeavor for both academia and businesses. Financial time series forecasting has moved from traditional techniques to automated and intelligent techniques based on machine learning and deep learning. However, many automatic forecasting methods have been tailored to specific types of time series. In this paper, the recently introduced Prophet model, which is based on time series decomposition, is adopted with variants of its input parameters. It is applied to six financial time series data sets: the Standard & Poor's 500 index (SP500), the Dow Jones Industrial Average index (DJIA), the China Securities Index (CSI300), Malaysia's Kuala Lumpur Composite Index (KLCI), the Hong Kong Hang Seng 300 index (HS300) and Tokyo's Nihon Keizai Shinbun index (Nikkei). The results show that the Prophet model is competitive in modeling actual market movement simply by adopting appropriate parameters, with a Mean Absolute Percentage Error (MAPE) of at most 6%. In addition, the forecasting errors are comparable to those of much more complex forecasting models from the literature.
Keywords: Financial time series · Prophet · Time series forecasting
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 485–495, 2021. https://doi.org/10.1007/978-3-030-70713-2_45
1 Introduction
Forecasting is a process that produces a set of outputs from a given set of historical input variables [1]. The outputs are future occurrences that are based, at least in part, on presently observable patterns or past events that continue into the future. Past relationships can be discovered through study and observation. The fundamental idea of forecasting is to estimate the mapping between input and output data in order to discover the implicit rules governing the observed movements [1]. Forecasting the financial market robustly has always been a major challenge in both academia and business [1, 2]. Financial time series are among the most difficult signals to forecast because of their noisy nature, market-influencing factors that have
complex interactions, a large amount of random day-to-day variation, abrupt shifts (their statistical properties do not remain constant over time), and unknown random processes (other sudden changes in the influencing factors or unexpected news). These factors naturally lead to debate about market predictability among academics, market practitioners and investors [2]. Investors perform two types of analysis before investing in a stock. The first is fundamental analysis. Here investors look at the performance of the industry and the economy, the political climate, the intrinsic value of stocks, etc. [3] to decide whether to invest. For example, some time series react to business cycles or slow-down periods, announcements of firm-specific news, fiscal measures, employment reports and political events [4]. In addition, because of its unpredictable behavior, investing in the stock market carries some risk [1]. The second, technical analysis, evaluates stocks from market activity, such as past prices and volumes, by studying statistics generated from them [3]. Rather than measuring a security's intrinsic value, technical analysts use stock charts to identify trends and patterns that may suggest the future direction of a stock. The trend of a stock or stock price index may be predicted by applying appropriate algorithms to efficiently pre-processed information obtained from stock prices. This paper focuses on the second type of market analysis, specifically on forecasting stock prices from their past values. The recently introduced Prophet model [5] was inspired by the nature of the time series forecast at Facebook and has only recently been applied to hydrometeorological time series [6]. Because it can handle seasonality and non-normality in time series, it is a good candidate for the challenging nature of financial time series.
In this paper, variants of the Prophet model, which is based on time series decomposition, are applied to financial market forecasting. The Prophet model is adopted with variants of its input parameters to effectively model the nature of financial time series data. These models are applied to six financial time series data sets, and their performance is evaluated using the Mean Absolute Percentage Error (MAPE). The paper is organized as follows. Section 2 reviews recent literature on the financial time series forecasting problem and current practices for solving it. Section 3 then gives the methodology adopted to model the time series. In Sect. 4, the results of the proposed approach are evaluated and compared with other results reported in the literature. Finally, Sect. 5 concludes the paper.
2 Literature Review and Current Practices
Creating an intelligent system that can predict stock prices with high efficiency and accuracy has always been of great interest to many investors and financial analysts. Some traditional approaches involve statistical models such as moving average, exponential smoothing, and autoregressive integrated moving average (ARIMA), which are linear in their predictions of future values [7]. In recent years, however, most approaches to financial time series forecasting have shifted towards automated and intelligent solutions.
Among the most prominent financial time series forecasting approaches are machine learning (ML) approaches, in particular support vector regression (SVR), random forest, artificial neural networks (ANN) and several ANN variants. [8] proposed a multi-agent system of ANNs with the bat algorithm to predict the German (DAX) stock price. To predict stock prices efficiently, principal component analysis (PCA) has been integrated with a stochastic time effective function neural network (STNN), with PCA extracting the principal components of four financial time series data sets [9]. [7] proposed a fusion of multi-stage ML approaches that combines SVR in the first stage with three other ML approaches in the second stage. A nonlinear autoregressive exogenous (NARX) model offers another perspective on time series forecasting. It predicts the current value of a time series from its previous values as well as the current and past values of multiple driving (exogenous) series. Based on the NARX model, [10] proposed a dual-stage attention-based recurrent neural network (DA-RNN). It selects the relevant driving series and captures long-term temporal dependencies to make predictions. Other approaches have explored complex time-sensitive non-linear models and the integration of various ML techniques and factors into the time series itself. [11] proposed an intelligent hybrid weighted fuzzy (IHWF) time series model for financial markets to improve forecasting accuracy. The model is enhanced by an adaptive sine-cosine human learning optimization (ASCHLO) algorithm and neighborhood volatility direction (NVD), which determine the intervals, the weights of the fuzzy system, and the effective universe of discourse. Another school of approaches uses a more recently popular subcategory of ML, namely deep learning (DL).
[12] presented a novel deep learning framework that combines wavelet transforms (WT), stacked autoencoders (SAEs) and long short-term memory (LSTM), and demonstrated its performance in forecasting six market indices. To accumulate the ultimate rewards in an unknown environment, [13] combined a DL model with a reinforcement learning (RL) framework. The DL model captures dynamic market conditions for informative feature learning, and the RL framework interacts with the DL representations and makes trading decisions. The approach was verified under broad testing conditions on both stock and commodity futures markets. [14] adopted an LSTM neural network and extracted investor sentiment from forum posts using a Naïve Bayes approach, outperforming other benchmark models. However, many methods of automatic forecasting have been tailored to specific types of time series [5]. The recently introduced Prophet model adopted in this paper was motivated by the challenges of forecasting at scale and by the nature of Facebook's time series (such as multiple seasonalities, floating holidays, and piecewise trends). The motivation of this paper is twofold. First, an automatic forecasting technique such as Prophet is seen as a possible solution for analysts with little knowledge of time series forecasting. Second, this paper demonstrates the accessibility of Prophet as an automatic forecasting technique for different forecasting problems with potentially distinctive features.
3 Prophet Model and Research Methodology
Prophet was proposed by [5] as a model capable of capturing the common characteristics of business time series (it was specifically optimized for business forecasting tasks at Facebook). In this context, a typical business time series has any of the following characteristics:
• At least a few months (preferably a year) of historical records of hourly, daily, or weekly observations.
• Strong multiple human-scale seasonalities: day of week and time of year.
• Important holidays that are known in advance and occur at irregular intervals (e.g. the Super Bowl).
• A reasonable number of large outliers or missing observations.
• Historical trend changes, for instance due to logging changes or product launches.
• Trends that are non-linear growth curves, where the trend hits a natural limit.
Prophet's default settings produce forecasts that are often as accurate as those produced by skilled forecasters, with much less effort. In addition, Prophet introduces the "analyst-in-the-loop" concept, in which the results are not fixed by the automatic procedure. Using a variety of easily interpretable parameters, analysts without a background in time series can improve the forecasts. By combining automatic forecasting with analyst-in-the-loop forecasts for special cases, it is possible to cover a large variety of business use cases. Prophet uses a modular, additive regression model with interpretable parameters. The time series is decomposed into three main components (trend, seasonality, and holidays), which are combined as in Eq. 1:
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \qquad (1)$$
where g(t) is the piecewise linear or logistic growth curve for modelling non-periodic changes in the time series, and s(t) represents periodic changes (e.g. weekly or yearly seasonality). h(t) represents the effects of holidays (provided by the user) with irregular schedules. $\epsilon_t$ is the error term, which accounts for any unusual changes not accommodated by the model. This specification is similar to a generalized additive model (GAM), a class of regression models with potentially nonlinear smoothers applied to the regressors. In Prophet, time is the only regressor, and the components are composed of several linear and nonlinear functions of time. A GAM formulation decomposes easily and can accommodate new components as necessary [5]. It frames the forecasting problem as a curve-fitting exercise. This differs from previously proposed time series models in the literature, which explicitly account for the temporal dependence structure in the data. Some important inferential advantages of a generative model such as ARIMA are sacrificed, but this formulation offers a few practical advantages:
• Flexible: seasonality with multiple periods is easily accommodated, and analysts can make different assumptions about trends.
• Robust: unlike ARIMA models, the data points do not need to be regularly spaced, and missing values do not need to be interpolated (e.g. after removing outliers).
• Adjustable: since fitting is very fast, the analyst can interactively explore many different model specifications. The analyst can also impose certain assumptions on the forecast, because the model's parameters are easily interpretable and changeable.
In this paper, two Prophet parameters related to the changepoint selection of the time series were manipulated: the range changepoint and the prior changepoint. The range changepoint is the proportion of the historical data points in which trend changepoints will be estimated. The prior changepoint is the parameter modulating the strength of the seasonality model: larger values allow the model to fit larger seasonal fluctuations, and smaller values dampen the seasonality. The default values of the range changepoint and prior changepoint are 0.8 (80%) and 0.05 (5%), respectively. These parameters are varied to observe their effect on the evaluation measure of the forecasts. The analyst can also specify changepoints to determine how much the growth rate is allowed to change in terms of the range and size of the data points. This is typically done using the known dates of product launches and other growth-altering events. The range changepoint and prior changepoint are varied over [0.05, 1.00] and [0.55, 1.00], respectively, in steps of 0.05; the settings evaluated are listed in Table 1.
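Eq. 1 and the range changepoint can be made concrete with the pure-Python sketch below. It builds a piecewise-linear trend g(t) and a Fourier-series seasonality s(t) in the form described by the Prophet authors [5]. It places potential changepoints uniformly within the first `changepoint_range` fraction of the history, which the Prophet documentation describes for its defaults (25 potential changepoints in the first 80% of the series). This is an illustration, not Prophet's implementation: it generates components from given coefficients instead of fitting them, and it omits h(t).

In Prophet's Python API, we assume the two settings studied here correspond to the constructor arguments `changepoint_range` (default 0.8) and `changepoint_prior_scale` (default 0.05). The latter matches the stated default of 0.05. However, the Prophet documentation describes `changepoint_prior_scale` as controlling trend flexibility. The wording used for the prior changepoint above ("strength of the seasonality model") matches `seasonality_prior_scale` (default 10.0) instead.

```python
import math
from typing import List, Sequence, Tuple


def changepoint_locations(t: Sequence[float], changepoint_range: float = 0.8,
                          n_changepoints: int = 25) -> List[float]:
    """Spread potential changepoints uniformly over the first
    `changepoint_range` fraction of the (sorted) time index `t`."""
    hist_size = int(math.floor(len(t) * changepoint_range))
    n = min(n_changepoints, hist_size - 1)  # cannot place more than the data allow
    if n <= 0:
        return []
    step = (hist_size - 1) / n
    # n + 1 evenly spaced indices over 0..hist_size-1; the first (index 0) is dropped
    return [t[round(i * step)] for i in range(1, n + 1)]


def piecewise_linear_trend(t: Sequence[float], k: float, m: float,
                           changepoints: Sequence[float],
                           deltas: Sequence[float]) -> List[float]:
    """g(t): base slope k and offset m; each passed changepoint s_j adds delta_j
    to the slope and -s_j * delta_j to the offset, keeping g continuous at s_j."""
    out = []
    for x in t:
        slope, offset = k, m
        for s_j, d_j in zip(changepoints, deltas):
            if x >= s_j:
                slope += d_j
                offset -= s_j * d_j
        out.append(slope * x + offset)
    return out


def fourier_seasonality(t: Sequence[float], period: float,
                        coeffs: Sequence[Tuple[float, float]]) -> List[float]:
    """s(t) = sum over n of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P)."""
    return [sum(a * math.cos(2 * math.pi * n * x / period)
                + b * math.sin(2 * math.pi * n * x / period)
                for n, (a, b) in enumerate(coeffs, start=1))
            for x in t]


if __name__ == "__main__":
    t = list(range(100))
    cps = changepoint_locations(t, changepoint_range=0.8)
    print(len(cps), cps[0], cps[-1])  # 25 3 79
    g = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[50], deltas=[-2.0])
    s = fourier_seasonality(t, period=7, coeffs=[(0.5, 0.0)])
    y = [gi + si for gi, si in zip(g, s)]  # Eq. 1 without h(t) and the error term
```

Raising `changepoint_range` towards 1.0 lets changepoints appear later in the history. This is consistent with the observation in Sect. 4 that larger range values help when large fluctuations occur near the end of the series.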
4 Experimental Results
4.1 Data Set and Evaluation Measure
To evaluate the forecasting performance of the Prophet models, daily data were selected from six indices: the Standard & Poor's 500 index (SP500), the Dow Jones Industrial Average index (DJIA), the China Securities Index (CSI300), Malaysia's Kuala Lumpur Composite Index (KLCI), the Hong Kong Hang Seng 300 index (HS300) and Tokyo's Nihon Keizai Shinbun index (Nikkei). Since the Prophet model can only use one feature at a time, this paper considers only the closing price and discards the other features of the data. The SP500 and DJIA indices, traded on the New York stock exchange, are commonly considered to represent the most developed financial market in the world [12]. In contrast, markets such as CSI300 in China and KLCI in Malaysia are often classified as new markets, representing developing markets. HS300 in Hong Kong and Nikkei in Tokyo represent market conditions that fall between developed and developing markets. These six stock indices therefore provide a natural setting to test the Prophet model under different market conditions. The SP500, DJIA, CSI300, Nikkei, and KLCI data cover the period from 01/09/2008 to 30/09/2016, amounting to 4160, 4160, 4022, 4046 and 3170 data points, respectively. The HS300 data cover the period from 02/09/2008 to 03/09/2016, amounting to 4076 data points. Non-trading periods are treated as frozen, so that only trading periods are used.
Mean absolute percentage error (MAPE) is used to evaluate the performance of the prediction models. Its formula is given in Eq. 2, where $A_t$ is the actual value, $F_t$ is the forecast value, and N is the number of forecast points. MAPE measures the deviation between the predicted values ($F_t$) and the actual values ($A_t$). It is non-negative and, with the factor of 100, is expressed as a percentage; smaller values indicate better prediction performance. MAPE was chosen because it is relatively more stable than other evaluation criteria [9].
$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N} \frac{|A_t - F_t|}{A_t} \times 100 \qquad (2)$$
4.2 Results
The forecasting results of the Prophet model with the respective parameter settings are reported in Table 1. The forecasts improve as both parameters approach 1. With the default Prophet settings, the results were acceptable, but the errors varied widely across indices (up to 6%). This suggests that most of the time series considered in this paper contain many seasonal data points and repeated fluctuations, as shown in Figs. 1 to 6. A higher range changepoint value allows a larger proportion of the historical data points to be used to estimate the changepoints. A higher prior changepoint value also enables the Prophet model to fit the larger seasonal fluctuations that occur and to model different seasonality along the time series. Figures 1 to 6 show the forecasting results obtained with the best Prophet parameters in Table 1. The HS300 market shows large variation in 2008, 2010, 2012 and 2015 (Fig. 1). During these periods of large fluctuation, the forecasts obtained with the best Prophet parameters are relatively inaccurate. The DJIA also has large fluctuation periods in 2009 and 2015 (Fig. 2), which severely affect the forecasting result. Large fluctuation periods also appear in the SP500 in 2009 and 2016 (Fig. 3), the CSI300 in 2009 and 2015 (Fig. 4), the Nikkei in 2009, 2013, 2015 and 2016 (Fig. 5), and the KLCI in 2010, 2012, 2013 and 2015 (Fig. 6). When the market is relatively stable, however, the forecasts are closer to the actual values.
The results also show the importance of the range changepoint and prior changepoint parameters in capturing the time series pattern of the data. A larger range changepoint value means that more data points are used to assist Prophet's automated forecasting when large fluctuation periods occur within the time series. Similarly, a larger prior changepoint value reduces the effect of a large fluctuation period on the overall performance of the forecast. Fluctuations that persist for longer periods can then be modelled along the time series, bringing the forecasts closer to the actual values. This makes the forecasts more accurate when the stock market fluctuates erratically during certain periods. In addition, an advantage of the Prophet model in the context of this paper is that no data normalization is required. Normalization may be needed to handle outliers, which are sometimes removed during data preprocessing, but doing so also discards valuable information in the time series. Adjusting the range changepoint and prior changepoint of the Prophet model addresses this issue. The daily stock price and the relative
Table 1. Forecasting results of the Prophet model with their respective changepoint parameter settings, based on the MAPE evaluation measure (index columns give MAPE*)

| Range | Prior | HS300 | SP500 | DJIA | CSI300 | NIKKEI | KLCI |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.55 | 9.3672 | 5.0886 | 4.3403 | 17.0119 | 13.0508 | 4.7439 |
| 0.10 | 0.60 | 6.8592 | 4.9801 | 4.0486 | 15.4705 | 13.0425 | 4.6589 |
| 0.15 | 0.65 | 6.2481 | 4.8422 | 3.6453 | 15.1522 | 12.3517 | 4.6150 |
| 0.20 | 0.70 | 6.2301 | 4.5050 | 3.5603 | 14.2003 | 11.4234 | 4.4729 |
| 0.25 | 0.75 | 6.1719 | 4.3957 | 3.5350 | 13.5843 | 10.1540 | 3.9775 |
| 0.30 | 0.80 | 6.0746 | 4.2732 | 3.4336 | 12.8234 | 9.3697 | 3.5994 |
| 0.35 | 0.85 | 5.5840 | 3.8521 | 3.3146 | 11.4079 | 8.1247 | 2.9920 |
| 0.40 | 0.90 | 5.3824 | 3.6927 | 3.2353 | 10.0263 | 7.6163 | 2.5663 |
| 0.45 | 0.95 | 4.9664 | 3.6159 | 3.1466 | 9.5715 | 7.5270 | 1.9686 |
| 0.50 | 1.00 | 4.5371 | 3.3260 | 2.8075 | 9.0283 | 7.2186 | 1.6032 |
| 0.55 | 0.55 | 4.1199 | 3.1287 | 2.6640 | 8.7612 | 5.7767 | 1.3575 |
| 0.60 | 0.60 | 4.2575 | 2.7835 | 2.4083 | 8.4545 | 5.1098 | 1.3023 |
| 0.65 | 0.65 | 3.9170 | 2.4605 | 2.2646 | 8.3540 | 5.0927 | 1.1599 |
| 0.70 | 0.70 | 3.7380 | 2.3719 | 2.2441 | 8.2303 | 4.9505 | 1.2044 |
| 0.75 | 0.75 | 3.5163 | 2.2123 | 2.1032 | 6.4960 | 4.3819 | 1.1770 |
| 0.80 | 0.80 | 3.1773 | 2.1973 | 2.1161 | 4.5398 | 3.2439 | 1.1653 |
| 0.85 | 0.85 | 3.0869 | 2.2605 | 2.0848 | 4.4292 | 3.2235 | 1.0382 |
| 0.90 | 0.90 | 2.8157 | 2.2017 | 2.0499 | 3.8761 | 3.2466 | 1.0478 |
| 0.95 | 0.95 | 2.6879 | 2.1837 | 2.0501 | 3.6400 | 3.1799 | 1.1109 |
| 1.00 | 1.00 | 2.6846 | 2.1304 | 2.0088 | 2.6543 | 3.8384 | 1.1094 |
| Default Prophet | | 4.6399 | 2.5568 | 2.4272 | 3.8463 | 6.0639 | 1.3703 |

*refer Eq. 2
errors of forecasted results from the Prophet model have shown certain fluctuation trend, where small fluctuation leads to relatively small errors and the large fluctuation leads to relatively large errors. Furthermore, the Prophet model is accessible to analyst that have limited capabilities and expertise in time series where its parameter is easily adjustable to meet specific use cases. Although its performance may need to be further improved in the context of this paper, Prophet model presented with a fairly competitive forecasting result (Table 2). Note that the forecast performance obtained by the Prophet model also does not utilizes any feature engineering and technical indicators as input. As such, Prophet provide a suitable platform to test and benchmark different methods of forecasting the financial time series with multiple data sets.
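To make the parameter sweep of Table 1 concrete, the sketch below computes MAPE and rebuilds the (range, prior) grid read off the table, then picks the best setting for the KLCI column using the published values. Two assumptions: Eq. 2 (not reproduced in this excerpt) is taken to be the standard MAPE in percent, and the paper's "range" and "prior" are assumed to correspond to the `changepoint_range` and `changepoint_prior_scale` arguments of the Python `prophet` package's `Prophet(...)` constructor. The helper names `mape` and `table1_grid` are hypothetical; no Prophet model is fitted here.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed form of Eq. 2)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def table1_grid():
    """(changepoint range, changepoint prior) pairs in the row order of Table 1."""
    grid = []
    for k in range(1, 21):
        rng = round(0.05 * k, 2)
        # Rows 1-10 pair each range with prior = range + 0.50; rows 11-20 use prior = range.
        prior = round(rng + 0.50, 2) if k <= 10 else rng
        grid.append((rng, prior))
    return grid


# KLCI column of Table 1 (MAPE), one value per grid row.
klci_mape = [4.7439, 4.6589, 4.6150, 4.4729, 3.9775, 3.5994, 2.9920, 2.5663,
             1.9686, 1.6032, 1.3575, 1.3023, 1.1599, 1.2044, 1.1770, 1.1653,
             1.0382, 1.0478, 1.1109, 1.1094]

# min() compares the MAPE first, so this returns the lowest-error setting.
best_mape, best_params = min(zip(klci_mape, table1_grid()))
print(best_params, best_mape)  # (0.85, 0.85) 1.0382
```

In an actual sweep, each pair would be passed as `Prophet(changepoint_range=rng, changepoint_prior_scale=prior)`, fitted on a `ds`/`y` data frame, and the `yhat` column of `predict` scored with `mape`.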
U. K. Yusof et al.
Fig. 1. The forecasting result of the Prophet model applied on HS300 data
Fig. 2. The forecasting result of the Prophet model applied on DJIA data
Fig. 3. The forecasting result of the Prophet model applied on SP500 data
Fig. 4. The forecasting result of the Prophet model applied on CSI300 data
Fig. 5. The forecasting result of the Prophet model applied on Nikkei data
Fig. 6. The forecasting result of the Prophet model applied on KLCI data
Table 2. Forecasting results of the Prophet model compared with the approaches in [9] (MAPE*)
Approach          HS300    SP500    DJIA
BPNN              1.3015   1.8607   2.0348
STNN              1.2924   1.6725   1.8193
PCA-BPNN          1.2256   1.2820   1.7404
PCA-STNN          1.1557   1.1872   1.5183
SVM               1.6779   1.7722   2.2677
Original Prophet  4.1337   2.7178   2.4091
Revised Prophet†  2.8025   2.0320   1.8977
*Refer to Eq. 2. †Prophet with the best forecast parameters.
5 Conclusion
In the present paper, the Prophet model was used to forecast the HS300, SP500, DJIA, CSI300, Nikkei and KLCI indexes. The forecasting results of the Prophet model were also compared with other forecasting models from the literature. Empirical examination of the forecasting accuracy for the price time series (by comparing the MAPE evaluation measure) shows that the Prophet model is competitive in modelling actual market movement simply by adopting appropriate parameters. In addition, with these parameters the relative errors of the forecasts are brought closer to those of much more complex forecasting models. Future work could involve incorporating technical indicators and macroeconomic variables as inputs to increase accuracy. The Prophet model could also use different seasonality models to address the fluctuating trend of financial time series data, and it could be enhanced with popular evolutionary algorithms and machine learning techniques. The work in this paper can also help analysts and practitioners to rapidly apply and benchmark different forecasting models for financial time series and similar domains.
Acknowledgement. The authors wish to thank Universiti Sains Malaysia (USM) for the support it extended in the completion of the present research through the Research University Grant (RUI) (1001/PKOMP/8014084).
References
1. Tarsauliya, A., Kant, S., Kala, R., Tiwari, R., Shukla, A.: Analysis of artificial neural network for financial time series forecasting. Analysis 9(2) (2010)
2. Başoğlu Kabran, F., Demirberk Ünlü, K.: A two-step machine learning approach to predict S&P 500 bubbles. J. Appl. Stat., 1–19 (2020)
3. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 42(1), 259–268 (2015)
4. Aznarte, J.L., Alcalá-Fdez, J., Arauzo-Azofra, A., Benítez, J.M.: Financial time series forecasting with a bio-inspired fuzzy model. Expert Syst. Appl. 39(16), 12302–12309 (2012)
5. Taylor, S.J., Letham, B.: Forecasting at scale. Am. Stat. 72(1), 37–45 (2018)
6. Tyralis, H., Papacharalampous, G.A.: Large-scale assessment of Prophet for multi-step ahead forecasting of monthly streamflow. Adv. Geosci. 45, 147–153 (2018)
7. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 42(4), 2162–2172 (2015)
8. Hafezi, R., Shahrabi, J., Hadavandi, E.: A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price. Appl. Soft Comput. 29, 196–210 (2015)
9. Wang, J., Wang, J.: Forecasting stock market indexes using principle component analysis and stochastic time effective neural networks. Neurocomputing 156, 68–78 (2015)
10. Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., Cottrell, G.: A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971 (2017)
11. Yang, R., He, J., Xu, M., Ni, H., Jones, P., Samatova, N.: An intelligent and hybrid weighted fuzzy time series model based on empirical mode decomposition for financial markets forecasting. In: Industrial Conference on Data Mining, pp. 104–118. Springer (2018)
12. Bao, W., Yue, J., Rao, Y.: A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 12(7) (2017)
13. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 653–664 (2017)
14. Li, J., Bu, H., Wu, J.: Sentiment-aware stock market prediction: a deep learning method. In: 2017 International Conference on Service Systems and Service Management (ICSSSM), pp. 1–6. IEEE (2017)
Facial Recognition to Identify Emotions: An Application of Deep Learning
Kenza Belhouchette(B)
Research Laboratory on Computer Science's Complex Systems, Larbi Ben M'Hidi University, El Bouaghi, Algeria
Abstract. Deep learning is not a recent approach, but its use in the field of emotion recognition is a very important and recent subject because of its power in classification. In this work we used convolutional neural networks to recognize basic emotions (joy, sadness, anger, disgust, surprise, fear and neutral). Our proposed work is an intelligent emotion recognition system, accompanied by an explanation of the mathematical foundations of convolutional neural networks. To evaluate our recognition system we used two evaluation metrics: the rate of good classification (tbcs) and the error rate. The recognition rate achieved is very satisfactory: our recognition system was able to recognize more than 90% of emotions.
Keywords: Emotion · Convolutional neural network · Facial expression
1 Introduction
The concept of deep learning emerged in the early 2010s with the rediscovery of multilayer artificial neural networks. Convolutional neural networks (CNNs) are currently among the most effective classifiers for emotion recognition. They are divided into two parts. The first is the convolutional part, which works as a feature extractor for images: each image passes through a set of successive filters, and at each stage new images called convolution maps are created. These maps are finally reduced to a feature vector called the CNN code, which is the input to the second part. The second part consists of fully connected layers (a multilayer perceptron), whose main purpose is to combine the features of the CNN code to classify the image. In this paper, we exploit the classification ability of CNNs to classify the basic emotions: joy, sadness, anger, disgust, surprise and fear. An emotion recognition system generally involves three steps: face extraction, facial feature extraction and classification. The advantage of CNNs is that they eliminate these separate steps, which reduces costs in both computing time and resources.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 496–504, 2021. https://doi.org/10.1007/978-3-030-70713-2_46
2 Background and Related Works
A deep learning algorithm is based on artificial neural networks with several hidden layers, which are inspired by biological neural networks. Although deep learning is currently in "fashion", the
Facial Recognition to Identify Emotions
origin of deep learning dates back to the middle of the 20th century, and its concepts were developed over the following decades. Several works have used convolutional neural networks for emotion recognition. Prudhvi Raj Dachapally [1] proposed two methods for this task: the first represents each emotion with autoencoders, while the second is an 8-layer convolutional neural network (CNN). Arushi and Vivek [2] used a pretrained VGG16 network for this task. Xie and Hu [3] proposed a different type of CNN structure built from convolutional modules. To reduce the redundancy of learned features, each module considers the mutual information between filters of the same layer and passes the best set of features to the next layer. Shruti Jaiswal and G. C. Nandi [4] built a system that predicts human emotions from real-time images using a convolutional neural network. André et al. [5] proposed a facial expression recognition solution based on a convolutional neural network combined with preprocessing applied to the image. The CNN we propose is shown in Fig. 1.
3 Proposed Recognition Approach
Fig. 1. Proposed CNNs to recognize basic emotions
3.1 The Convolution Step
Convolution is the heart of our convolutional neural network. It is a mathematical operation (the convolution product) widely used in image processing, because applying a suitable filter to an input image highlights and extracts its features (Fig. 2).
K. Belhouchette
Fig. 2. Convolution step
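The convolution step can be illustrated with a small sketch in plain Python. It assumes stride 1 and no padding ("valid" convolution), and, as in most deep learning libraries, slides the kernel without flipping it (strictly, cross-correlation). The function name, the toy image and the edge-detecting kernel are hypothetical and are not taken from the proposed network.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products.

    The kernel is not flipped, so strictly this is cross-correlation, as in most
    CNN libraries; the output shrinks to (H - kh + 1) x (W - kw + 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale patch with a vertical edge between columns 1 and 2.
image = [[0, 0, 9, 9] for _ in range(4)]
# Hypothetical 2x2 vertical-edge filter: right column minus left column.
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d_valid(image, kernel))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The resulting feature map responds only where the intensity jumps, which is the sense in which a filter "highlights" a feature of the input.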
An activation function is applied after each convolution; it determines the output of the current neuron. In our work we used Softmax, because it gives very good results when classifying inputs into multiple categories, which is exactly our goal of classifying images into several emotions. Softmax manages several classes at once, whereas other activation functions handle only one: it exponentiates the outputs, normalizes them to values between 0 and 1 by dividing by their sum, and thus gives the probability that the input belongs to each specific class:
$$f_i(x) = \frac{e^{x_i}}{\sum_{k} e^{x_k}} \qquad (1)$$
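Equation 1 can be checked numerically with the short sketch below. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result of Eq. 1 unchanged. The raw scores are hypothetical outputs, not values produced by the network described here.

```python
import math


def softmax(scores):
    """Eq. 1: exponentiate each score and divide by the sum of the exponentials."""
    m = max(scores)  # subtracting the max avoids overflow; it cancels in the ratio
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


emotions = ["joy", "sadness", "anger", "disgust", "surprise", "fear", "neutral"]
scores = [2.0, 0.5, 0.1, 0.1, 1.0, 0.2, 1.5]  # hypothetical raw network outputs
probs = softmax(scores)
predicted = emotions[probs.index(max(probs))]
print(predicted, round(sum(probs), 6))  # joy 1.0
```

Each output lies between 0 and 1 and the outputs sum to 1, so the class with the largest probability is taken as the predicted emotion.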
3.2 The Pooling Step
The goal of pooling is to reduce the dimensionality of the feature maps. In max pooling, the maximum value within each window (sub-matrix) of the feature map is kept (Fig. 3).
3.3 Flattening
Flattening simply puts together all the feature maps to form a single vector: pixels are read row by row and appended to the final vector (Fig. 4).
Fig. 3. Max pooling
Fig. 4. Flattening
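Pooling and flattening, as described above, can be sketched as follows. The 2×2 window with stride 2 is a common choice but is an assumption here, since the window size of the proposed network is not stated; the function names and the feature map values are hypothetical.

```python
def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window of a feature map."""
    pooled = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def flatten(maps):
    """Concatenate feature maps into one vector, reading each map row by row."""
    return [v for fmap in maps for row in fmap for v in row]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 6]]
pooled = max_pool(fmap)
print(pooled)             # [[4, 5], [3, 7]]
print(flatten([pooled]))  # [4, 5, 3, 7]
```

Pooling halves each spatial dimension here, and flattening turns the pooled maps into the vector that feeds the fully connected layers.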
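3.4 Optimization for Deep Learning
Gradient descent is an optimization algorithm often used to find the weights, or coefficients, of machine learning algorithms. This step can be explained as follows:
1. The neural network is a composition of L layers h_1, …, h_L with weights w_1, …, w_L:
$$a_L(x; w_1, \dots, w_L) = h_L\big(h_{L-1}(\dots h_1(x, w_1) \dots, w_{L-1}), w_L\big) \qquad (2)$$
2. Learning consists of minimizing the empirical error over the training set, where γ is the loss function:
$$W^{*} \leftarrow \arg\min_{w} \sum_{(x,y) \in (X,Y)} \gamma\big(y, a_L(x; w_1, \dots, w_L)\big) \qquad (3)$$
3. The minimization is performed with a gradient-descent-based approach, where η_t is the learning rate at step t:
$$W_{t+1} = W_t - \eta_t \nabla_{W} \gamma \qquad (4)$$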
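The three steps above can be illustrated on a deliberately tiny model. The sketch assumes a one-parameter "network" a(x; w) = w·x and a mean squared error as the loss γ, so that the gradient can be written by hand; it then applies the update of Eq. 4 with a fixed learning rate. All names and data are hypothetical.

```python
def empirical_loss_and_grad(w, data):
    """Mean squared error of a(x; w) = w * x over the data set, and its gradient in w."""
    n = len(data)
    loss = sum((y - w * x) ** 2 for x, y in data) / n
    grad = sum(-2.0 * x * (y - w * x) for x, y in data) / n
    return loss, grad


def gradient_descent(data, w0=0.0, lr=0.05, steps=200):
    """Repeatedly apply Eq. 4: W_{t+1} = W_t - eta_t * grad (here eta_t is constant)."""
    w = w0
    for _ in range(steps):
        _, g = empirical_loss_and_grad(w, data)
        w = w - lr * g
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical samples from y = 2x
w_star = gradient_descent(data)
print(round(w_star, 4))  # 2.0
```

A real CNN follows the same loop, except that the gradient with respect to all layer weights is obtained by backpropagation through the composition of Eq. 2.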
In our case, we used the Adagrad algorithm [6], an optimization algorithm based on gradient descent that adapts the learning rate to each parameter: it performs smaller updates for parameters associated with frequently occurring features and larger updates for parameters associated with infrequent features. Dean et al. [7] found that Adagrad greatly improved the robustness of SGD and used it to train large-scale neural networks at Google, which, among other things, learned to recognize cats in YouTube videos [8]. The update is
$$r_j = \sum_{\tau=1}^{t} g_{\tau,j}^{2}, \qquad W_{t+1,j} = W_{t,j} - \frac{\eta_t}{\sqrt{r_j + \varepsilon}}\, g_{t,j} \qquad (5)$$
where g_{τ,j} is the j-th component of the gradient ∇_W γ at step τ, and ε is a small number that prevents division by zero.
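The per-parameter step size of Eq. 5 can be seen on a badly scaled quadratic loss. The sketch below is a plain-Python Adagrad under the assumption of a constant base learning rate η; the loss γ(w) = ½(100·w₀² + w₁²) and all names are hypothetical. Because each step is divided by the root of that parameter's accumulated squared gradients, both coordinates shrink at the same rate despite a 100× difference in curvature, whereas plain gradient descent with the same step size 0.5 would diverge along the first coordinate.

```python
import math


def adagrad(grad_fn, w0, lr=0.5, eps=1e-8, steps=500):
    """Eq. 5: r_j accumulates squared gradients; each step is lr / sqrt(r_j + eps) * g_j."""
    w = list(w0)
    r = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        for j in range(len(w)):
            r[j] += g[j] ** 2
            w[j] -= lr / math.sqrt(r[j] + eps) * g[j]
    return w


def grad(w):
    """Gradient of the hypothetical loss 0.5 * (100 * w0**2 + w1**2)."""
    return [100.0 * w[0], w[1]]


print([round(v, 3) for v in adagrad(grad, [1.0, 1.0], steps=3)])  # [0.156, 0.156]
print(adagrad(grad, [1.0, 1.0]))  # both coordinates very close to 0
```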
4 Test and Discussion
4.1 Test Data
In the test phase, we used two well-known databases: the Cohn-Kanade (CK) database [9] and the MMI database [10] (available online). The first has more than 500 ordered sequences, each decomposed into a set of images going from a neutral state up to the maximum expression. The second contains 1200 images representing the basic emotions (Fig. 5 and Fig. 6).
Fig. 5. MMI dataset
Fig. 6. CK dataset
4.2 Results
In this paper we present an intelligent system for recognizing basic emotions that saves processing time. An image can contain more than 40,000 pixels; manipulating this amount of information gives any emotion recognition system a high execution time, whereas CNNs optimize all of the points mentioned above. Our results were very satisfactory, as presented below. The CNN applied to the Cohn-Kanade dataset correctly predicted 78 out of 80 images. The confusion matrices for these results are given in Tables 1 and 2.
Table 1. Total confusion matrix of the CNN classifier on the Cohn-Kanade database
Table 2. Total confusion matrix of the CNN classifier on the MMI database
Abbreviations: JO: Joy, SA: Sadness, FE: Fear, SU: Surprise, AN: Anger, DI: Disgust, NE: Neutral, RT: Recognition rate (Fig. 7).
Fig. 7. Confusion table chart (Joy, Sadness, Fear, Surprise, Disgust, Neutral)
After examining our results, we identified several sources of imprecision. First, the boundary between joy, neutral and disgust in the structure of the face is quite thin; therefore, most of the classification errors for joy were confusions with neutral and sadness, and vice versa. Second, when an image reflects a negative emotion, all the top predictions of the CNN tend to be negative emotions (sadness, fear, disgust, anger). Likewise, if the given image shows no emotion, its top predicted emotions tend to be negative.
4.3 Evaluation Measure
The rate of good classification (tbcs) is
$$\text{tbcs} = \frac{\text{Number of correctly identified elements}}{\text{Total number of elements}} \qquad (6)$$
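Eq. 6 can be computed directly from a confusion matrix, whose diagonal holds the correctly identified elements; the error rate is its complement. In the sketch below the 3-class matrix is hypothetical, while the 78-out-of-80 figure is the Cohn-Kanade result reported in Sect. 4.2. The function name is an assumption for illustration.

```python
def tbcs_and_tes(confusion):
    """Eq. 6 and the error rate: correct elements are on the diagonal (rows = true class)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    tbcs = correct / total
    return tbcs, 1.0 - tbcs


toy = [[8, 1, 1],   # hypothetical 3-class confusion matrix
       [0, 9, 1],
       [0, 0, 10]]
print([round(v, 3) for v in tbcs_and_tes(toy)])  # [0.9, 0.1]

# Reported Cohn-Kanade result: 78 of 80 test images correctly classified.
tbcs_ck = 78 / 80
print(round(tbcs_ck, 3), round(1 - tbcs_ck, 3))  # 0.975 0.025
```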
The error rate is
$$\text{tes} = 1 - \text{tbcs}$$
According to the results, classification precision increases with the number of input images; when precision drops, more input images are needed to train the network. The performance obtained is acceptable considering the small size of the image database and the fact that the images are grayscale. Indeed, a CNN
works better with large databases. However, considering the size of our database, this result is still acceptable, although we would have liked to have many more images.
Table 3. Good classification rate (tbcs) and error rate (tes)
4.4 Discussion
Generally speaking, a convolutional neural network works well, and the performance of our network degrades if a convolutional layer is removed, so depth is essential to achieve good results. Our results improved as we deepened our network and increased the number of epochs. The training set is also a determining element for convolutional neural networks: a large training set is necessary to achieve better results. In summary, the number of epochs, the size of the training set and the depth of the network are important factors in obtaining better results.
5 Conclusion
In this work we used a convolutional neural network. The model obtained was tested on two databases, MMI and CK, and the results, which were very good, allowed us to compare it with some existing works. The main strength of this approach is that most of its recent advances are not the result of more powerful hardware, larger datasets and larger models, but mainly a consequence of new ideas, algorithms, improved mathematical foundations and network architectures.
References
1. Dachapally, P.R.: Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units. School of Informatics and Computing
2. Raghuvanshi, A., Choksi, V.: Facial Expression Recognition with Convolutional Neural Network. CS231n Course Projects, Winter (2016)
3. Xie, S., Hu, H.: Facial expression recognition with FRR-CNN. Electron. Lett. 53(4), 235–237 (2017)
4. Jaiswal, S., Nandi, G.C.: Robust real-time emotion detection system using CNN architecture. Neural Comput. Appl. 32, 11253–11262 (2020)
5. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recogn. 61, 610–628 (2017)
6. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7), 2121–2159 (2011)
7. Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., …, Ng, A.Y.: Large scale distributed deep networks. Adv. Neural Inf. Process. Syst., 1223–1231 (2012)
8. Clark, L.: Google's artificial brain learns to find cat videos. Wired UK, www.wired. (2012)
9. Kanade, T., Cohn, J.F., Tian, Y.: Comprehensive database for facial expression analysis. In: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, pp. 46–53 (2000)
10. MMI Facial Expression Database. http://mmifacedb.eu/
Text-Based Analysis to Detect Figure Plagiarism
Taiseer Abdalla Elfadil Eisa1(B), Naomie Salim2, and Salha Alzahrani3
1 College of Science and Arts - Girls Section, King Khalid University, Mahayil, Asir, Saudi Arabia
2 Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
3 Department of Computer Science, Taif University, Taif, Saudi Arabia
Abstract. Plagiarism, the process of copying someone else's text or data without due acknowledgement of the source, is a serious academic offence. Many techniques have been proposed for detecting plagiarism in texts, but only a few exist for detecting figure plagiarism. The main problem with existing techniques is that they are not applicable to the non-textual elements of figures in research publications. This paper addresses the problem of figure plagiarism in scientific articles and proposes a solution to detect cases where an exact copy or a modified figure retains the essential data of the original figure. We propose a deep figure analysis to detect all possible types of figure plagiarism, ranging from simple copy-and-paste to strong modification of the content of the source figure. Unlike existing figure plagiarism detection methods, which compare figures based on surface features, the proposed method represents each component of a figure and provides information about the text inside the component and its relationships with other components, in order to capture the meaning of the figure. It uses component-based comparison and improves on existing methods, which cannot extract enough information from figures to detect plagiarism. The results obtained suggest that the proposed method is a promising research solution for figure plagiarism detection.
Keywords: Plagiarism detection · Figure plagiarism detection · Similarity detection · Image plagiarism detection · Semantic similarity · Figures text detection · Figure text analysis
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 505–513, 2021. https://doi.org/10.1007/978-3-030-70713-2_47
1 Introduction
The Internet has become the major source of information in recent times, allowing a large number of people to conveniently search and access required information within seconds. In academia, more articles are published every day and most publishing organisations have begun to index articles in online repositories, which has made it quite easy to detect authors who copy content. Using others' work without proper citation is known as plagiarism. In the academic sphere, plagiarism detection is mostly used to identify students or lecturers who have cheated in academic exercises. To address
T. A. E. Eisa et al.
this problem, several plagiarism detection systems and techniques have been proposed in the literature, but none is without limitations [1]. Modern plagiarism detection tools attempt to prove a suspicious author guilty by indicating the source from which the plagiarised texts were taken and the percentage similarity score. Institutions, research centres and publishing organisations often set a plagiarism score benchmark for their employees or prospective authors, in order to avoid embarrassment and breaches of copyright agreements among publishers [2, 3]. Existing plagiarism detection techniques proposed in several studies are based on textual features and text/sentence similarity measures [4–7]. In a scientific publication, however, quantitative information, experimental results, frameworks and statistical facts are often represented in infographic form, such as figures, charts and tables, rather than as text, and less attention has been paid to detecting plagiarism in these non-textual elements. Only a few recent studies have addressed plagiarism in non-textual elements [8–12]. Identifying the similarities between figures based only on the types of shapes is not effective for detecting a plagiarised figure: similar shapes do not necessarily mean similar figures. In contrast, similar text in two figures may indicate similar figures that need further investigation to confirm plagiarism. Therefore, rich information needs to be extracted from the graphics of a figure. The proposed method measures the semantic similarity between figures, that is, it extracts and compares the features that the figures have in common.
The objective of this paper is to propose a new method to detect plagiarism of scientific figures in the presence of text and structure modifications to the original figure. The method computes the similarity score between the suspicious and source figures based on text, taking text and structure modifications into account. Figures were represented using a shape-feature-based representation, which represents each component of a figure and provides information about the text inside the component and its relationships with other components, in order to capture the meaning of the figure. Fig. 1 shows an example of this representation; for more details about feature extraction, refer to [13]. The proposed method detects plagiarism based on the figure content using component-based comparison; it improves on existing methods, which cannot extract enough information from figures to detect plagiarism and compare figures based on surface features only. The proposed method compares the components of figures through a detailed analysis in order to identify their actual content. Figures can be plagiarised through three kinds of modification, with different degrees. In text modification, the plagiarised version keeps the structure of the figure unchanged and modifies the text inside or outside the figure. In structure modification, the plagiarised version keeps the text in its exact form and modifies the structure of the figure. Finally, in strong modification, both the text and the structure of the plagiarised version are modified. In this paper, we therefore focus on figure plagiarism with text and structure modifications. The paper is organized as follows:
Text-Based Analysis to Detect Figure Plagiarism
Sect. 2 presents the proposed detection method, Sect. 3 the experimental design and dataset, Sect. 4 the results and evaluation, and Sect. 5 the conclusions and future work.
Fig. 1. An example of the image of the figure and its extracted features
2 Methods
2.1 Similarity Score Based on Text Comparison
This method compares figures based on the text inside their shapes. In figures, text is used to describe processes, and these processes are presented in a certain order. Therefore, the order of the text inside a figure is very important both for describing the figure and for comparing it with other figures, and this method takes the text order into consideration. To perform this task, two of the shape features of a figure are used to compare figures, as shown in Fig. 2: the component text and the component flow. The component flow feature is used to learn the sequence and order of the processes. The main steps are given in Algorithm (1).
Step 1: Text Ordering
To detect the text-based similarity score between the suspicious and source figures, the relation between text and structure in figures is considered. Researchers usually use text in a figure to describe processes, and the order of these processes is represented by arrows. The order of the text is therefore an important feature for understanding the processes and for detecting the extent of similarity between figures. With this method, first, text similarity between figures can be detected; second, the order and flow of the text can also be detected. In other words, two types of plagiarism can be detected, namely text plagiarism and text with same order plagiarism.
Text Plagiarism: refers to the case in which the text similarity score between the source and suspicious figures is greater than or equal to the threshold value, that is, most of the text in the suspicious components overlaps with the source components.
Fig. 2. Features used to detect figure plagiarism based on texts
However, the text in the suspicious figure is ordered differently from the text in the source figure, so the sequence and flow of text in the suspicious figure differ from those in the source figure.
Text with Same Order Plagiarism: refers to the case in which the text similarity score between the source and suspicious figures is greater than or equal to the threshold value, and the text in the suspicious figure is presented in the same order as in the source figure.
To represent the order of the text in a figure, the component flow features are used. They provide information about how the components are connected and how the text flows through the figure, so the texts in a figure can be ordered based on the logic of the flow. Suppose we have shape "i"; to decide the order of the text in the next shape "i + 1", three types of order are considered, namely next order, same order and before order. These three types are used to classify the order of the text, as explained in Algorithm (2).
Step 2: Generate Predicate Form
The algorithm compares figures based on the text inside a shape and the text order. Each component in the source and suspicious figures is presented with its text as the predicate and its order as the argument, for use in calculating the similarity score later. Thus, components are presented in the form text(order).
Step 3: Similarity Calculation
For each component in the suspicious figure, the overlap (Jaccard similarity) between its text predicate and the text predicates of the components in the source figure is computed, as shown in Eq. 1. The component text similarity CTextSim is regarded as a plagiarism case if most of the text in the suspicious component overlaps with the source components.
The algorithm takes the maximum similarity score between the suspicious component and the source components. This method succeeds in reporting exact or near-exact copies of text, as well as cases where texts are restructured and words are reordered. Based on the value of the component text similarity, the component is classified as suspicious or not, as shown in Eq. 2.
Algorithm (1). General steps for obtaining a similarity score based on figure text
Input: Structured textual format of source and suspicious figures
Output: The source of the suspicious figure and the similarity score of the suspicious figure
Processes:
For each source and suspicious figure
  Step 1: Order text based on the text ordering algorithm
  Step 2: Generate predicate form
  Step 3: Calculate similarity values between predicates:
    For each component in suspicious figure Do
      Compute text overlap (suspicious, source)
      If text overlap >= threshold value then
        Compute order similarity
        Compute text with order similarity
      Else ignore component
  Step 4: Similarity report and classification
    Compute overall text similarity as the sum of the text similarities of the components normalized by the total number of components
    Compute overall text with order similarity as the sum of the text with order similarities of the components normalized by the total number of components
    Compute overall similarity of the suspicious figure as the average of the text similarity and the text with order similarity
    If overall text with order similarity >= threshold value then
      Plagiarism type = text with same order plagiarism
    Else if overall text similarity >= threshold value then
      Plagiarism type = text plagiarism
    Else
      Plagiarism free
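$$\mathrm{CTextSim}(SusText, SoText) = \frac{|SusText \cap SoText|}{|SusText \cup SoText|} \qquad (1)$$
$$SusCom = \begin{cases} \text{Suspicious} & \text{if } \mathrm{CTextSim} \ge Threshold \\ \text{Free} & \text{otherwise} \end{cases} \qquad (2)$$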
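Equations 1 and 2 can be sketched as follows, taking the best-matching source component as described in Step 3. The paper does not specify how component text is tokenized or which threshold is used, so the lower-cased whitespace tokenizer and the threshold of 0.5 below are hypothetical, as are the function names and example texts.

```python
def tokens(text):
    """Hypothetical tokenizer: the set of lower-cased, whitespace-separated words."""
    return set(text.lower().split())


def ctext_sim(sus_text, so_text):
    """Eq. 1: Jaccard overlap between the word sets of two component texts."""
    a, b = tokens(sus_text), tokens(so_text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_component(sus_text, source_texts, threshold=0.5):
    """Eq. 2 applied to the best-matching source component (threshold is hypothetical)."""
    best = max(ctext_sim(sus_text, s) for s in source_texts)
    return ("Suspicious" if best >= threshold else "Free"), best


source = ["Extract figure features", "Compute similarity score", "Report plagiarism"]
print(classify_component("compute the similarity score", source))  # ('Suspicious', 0.75)
```

Because Jaccard compares word sets, reordering the words inside a component does not change its score; order between components is handled separately by Eq. 3.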
Next, if a component is classified as suspicious, further investigation is performed to compare the arguments of the source and suspicious predicates. Eq. 3 compares the order of the text in the suspicious figure with the order of the same text in the source figure:
$$\mathrm{OrderSim} = \begin{cases} 1 & \text{if same order} \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
For each component in the suspicious figure, the algorithm then computes the text and order similarity as the average of the component text similarity and the similarity of its order, as shown in Eq. 4:
$$\mathrm{CTextwithOrderSim} = \frac{\mathrm{CTextSim} + \mathrm{OrderSim}}{2} \qquad (4)$$
Algorithm (2). Text ordering using the logic of flow.
Data: Structured textual format of source and suspicious figure
Output: Text ordering
Processes:
Read component text
Read component flow information
If flow type is Line || Next_TB || Next_LR || Double Arrow then
  Set next text in next order (i+1)
Else if flow type is Decision Link || Multi Flow || Multi Line then
  Set next text in same order
Else if flow type is Next_RL || Next_BT then
  Set next text in before order (i-1)
Print text; print order
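Algorithm (2), together with the predicate form text(order) of Step 2, can be sketched as below. The input encoding is an assumption: each component is given as a (text, flow type) pair in reading order, where the flow type describes the connection to the next component and `None` marks the last one. The flow-type strings mirror the names in Algorithm (2); the function name and example components are hypothetical.

```python
NEXT_ORDER = {"Line", "Next_TB", "Next_LR", "Double Arrow"}
SAME_ORDER = {"Decision Link", "Multi Flow", "Multi Line"}
BEFORE_ORDER = {"Next_RL", "Next_BT"}


def order_components(components):
    """Assign an order to each component text following the logic of flow.

    components: list of (text, flow_type) pairs in reading order; flow_type is the
    connection from this component to the next one (None for the last component).
    Returns (text, order) pairs, i.e. the predicate form text(order).
    """
    predicates = []
    order = 1
    for text, flow in components:
        predicates.append((text, order))
        if flow in NEXT_ORDER:
            order += 1          # next text in next order (i + 1)
        elif flow in BEFORE_ORDER:
            order -= 1          # next text in before order (i - 1)
        elif flow in SAME_ORDER or flow is None:
            pass                # next text keeps the same order
        else:
            raise ValueError(f"unknown flow type: {flow!r}")
    return predicates


components = [("Input figure", "Next_TB"), ("Extract text", "Decision Link"),
              ("Extract shapes", "Next_TB"), ("Compare figures", None)]
print([f"{t}({o})" for t, o in order_components(components)])
# ['Input figure(1)', 'Extract text(2)', 'Extract shapes(2)', 'Compare figures(3)']
```

A decision link places both branches at the same order, so two parallel steps are treated as interchangeable when orders are compared later.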
This step calculates the degree of similarity between the source and suspicious figures. Three values are computed for each suspicious figure. The Overall Text Similarity (OTS) between the source and suspicious figures is the sum of the text similarities of the components normalized by the total number of components N, as shown in Eq. 5:
$$\mathrm{OTS} = \frac{\sum \mathrm{CTextSim}}{N} \qquad (5)$$
The Overall Text with oRder Similarity (OTRS) between the source and suspicious figures is the sum of the text with order similarities of the components normalized by N, as shown in Eq. 6:
$$\mathrm{OTRS} = \frac{\sum \mathrm{CTextwithOrderSim}}{N} \qquad (6)$$
The Overall Figure Text Similarity (OFTS) score of the suspicious figure is the average of the text similarity and the text with order similarity, as shown in Eq. 7:
$$\mathrm{OFTS} = \frac{\mathrm{OTS} + \mathrm{OTRS}}{2} \qquad (7)$$
Based on the OFTS score, the suspicious figure is classified as plagiarised or not, as shown in Eq. 8. In addition, the type of plagiarism is decided from the similarity values, as shown in Eq. 9.
$$\text{Plagiarism case} = \begin{cases} 1 & \text{if } \mathrm{OFTS} \ge Threshold \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$
Text-Based Analysis to Detect Figure Plagiarism
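$$\text{Plagiarism type} = \begin{cases} \text{Text plagiarism} & \text{if } \mathrm{OTS} \ge \text{Threshold} \\ \text{Text with order plagiarism} & \text{if } \mathrm{OTRS} \ge \text{Threshold} \end{cases} \tag{9}$$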
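Equations 3 to 9 can be combined into a short scoring sketch. It assumes a hypothetical input of one `(ctext_sim, src_order, susp_order)` tuple per suspicious component, with `ctext_sim` already computed, and it reads Eq. 9 as allowing both plagiarism types to apply at once; both are assumptions rather than details given in the paper.

```python
def order_sim(src_order, susp_order):
    """Eq. 3: 1 if the text keeps the same order in both figures, else 0."""
    return 1.0 if src_order == susp_order else 0.0


def figure_scores(components):
    """Compute OTS, OTRS and OFTS (Eqs. 4-7) for one suspicious figure.

    `components` holds one (ctext_sim, src_order, susp_order) tuple per
    component of the suspicious figure (an assumed input format).
    """
    n = len(components)
    if n == 0:
        raise ValueError("a figure needs at least one component")
    text_sims = [c for c, _, _ in components]
    with_order = [(c + order_sim(s, t)) / 2 for c, s, t in components]  # Eq. 4
    ots = sum(text_sims) / n    # Eq. 5
    otrs = sum(with_order) / n  # Eq. 6
    ofts = (ots + otrs) / 2     # Eq. 7
    return ots, otrs, ofts


def classify(ots, otrs, ofts, threshold=0.5):
    """Eq. 8 (plagiarism case) and Eq. 9 (plagiarism type)."""
    case = 1 if ofts >= threshold else 0
    types = []
    if ots >= threshold:
        types.append("Text Plagiarism")
    if otrs >= threshold:
        types.append("Text With Order Plagiarism")
    return case, types
```

With components `[(1.0, 1, 1), (0.8, 2, 3), (0.6, 3, 3)]`, OTS is 0.8, OTRS is about 0.733 (the second component loses half its score because its order changed), and OFTS is about 0.767, so at a threshold of 0.5 the figure is flagged with both plagiarism types.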
3 Experimental Design and Dataset
This experiment considered the number of plagiarised figures detected from the source figures. It was carried out using the shape-feature-based figure plagiarism detection corpus, which was constructed for the purpose of this research. Each figure in the corpus is assigned a textual description file as metadata. The advantage of this corpus is that it was developed and simulated by humans rather than by programs, so the plagiarisers' behaviour scenarios are more natural. The experiment was performed on 1230 figures (748 source figures, 449 plagiarised figures and 33 figures free of plagiarism). The plagiarised figures had different types of modification at different degrees: some suspicious figures are exact copies, while others had text modification or text plus structure modifications. Based on the obfuscation strategies used, the suspicious figures were classified into three sets:
1. Exact copy: this set concentrates on copy-and-paste plagiarism from the source figures.
2. Text modification: this set contains figures whose source text was modified either simply, by replacing words with their synonyms and making a few grammatical changes, or strongly, by major rewriting with restructuring and paraphrasing.
3. Text plus structural modifications: this set combines text and structural modification, such as changing the shape of the components or the flow between them.
The input to the experiment was a set of source figures and a set of suspicious figures with different degrees of modification (simple, medium and strong). The task was to return the source figure for each suspicious figure and display the similarity score based on textual and structural comparisons. The proposed methods were run on each set, and the results were calculated per set.
The experiments showed that the proposed method achieved good results in terms of the evaluation measures.
4 Results and Evaluation
Three general evaluation measures commonly used in information retrieval, namely precision, recall and F-measure, were adopted [14]. The results were calculated at three threshold values (0.4, 0.5 and 0.6). Table 1 reports precision, recall and F-measure for the figure text-based comparison method at the different threshold values and across the different sets of the dataset. The text-based comparison method can detect text plagiarism, especially when the plagiariser hides the plagiarism by changing or converting the shapes or the type of flow between figure components; in such cases, structure-based comparison between figures did not perform well. Neither method performed well when texts were heavily modified or paraphrased, because the original expressions were completely replaced by other words.
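The threshold sweep described above can be sketched with the standard set-based definitions of precision, recall and F-measure. This is a simplification under stated assumptions: detections are hypothetical `(suspicious_id, source_id)` pairs scored by OFTS, and the character-level, macro-averaged measures of the evaluation framework in [14] are not reproduced here.

```python
def precision_recall_f(detected, relevant):
    """Set-based precision, recall and F1 over (suspicious, source) pairs."""
    tp = len(detected & relevant)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_at_thresholds(scores, relevant, thresholds=(0.4, 0.5, 0.6)):
    """`scores` maps (suspicious_id, source_id) -> OFTS score (hypothetical data)."""
    results = {}
    for t in thresholds:
        detected = {pair for pair, s in scores.items() if s >= t}
        results[t] = precision_recall_f(detected, relevant)
    return results
```

For instance, with scores `{("s1", "a"): 0.9, ("s2", "b"): 0.55, ("s3", "c"): 0.45, ("s4", "d"): 0.3}` and true pairs `{("s1", "a"), ("s2", "b"), ("s4", "d")}`, raising the threshold from 0.4 to 0.6 lifts precision from 2/3 to 1.0 while recall falls from 2/3 to 1/3, which is the usual trade-off a threshold table exposes.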
T. A. E. Eisa et al.
Table 1. Precision, recall and F-measure for the figure text-based comparison method at different threshold values and across different sets of the dataset.
5 Conclusions and Future Work
Many techniques have been proposed for plagiarism detection; in contrast, techniques for detecting figure plagiarism are scarce, and there is still a gap in the methods implemented to detect it. Previous works represented figures without understanding their content and ignored the text inside the figures during detection. There is therefore still a need for figure plagiarism detection methods capable of handling text and structural features, which motivated this method. The proposed method, figure plagiarism detection based on a shape-feature representation, compares figures component by component: each component in the suspicious figure is compared with the components of the source figure. The method builds predicates from the attributes of each component. The similarity score from the pairwise comparison of the predicate arguments of the suspicious and source figures is computed using the Jaccard similarity measure to determine the degree of similarity between figures, including exactly plagiarised ones. Similarity detection compares suspicious and source figures based on the figure text while also considering the order of the text inside the figure. The results suggest that the proposed method is a promising research solution for figure plagiarism detection.
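The Jaccard comparison mentioned above can be illustrated on token sets. This sketch assumes that component texts (or predicate arguments) are already tokenized into lists, and that each suspicious component is paired with its best-matching source component; the pairing strategy and the `best_match_scores` helper are hypothetical, not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections.

    Two empty collections are treated as identical (a convention chosen here).
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match_scores(suspicious, source):
    """For each suspicious component, the highest Jaccard score against any
    source component (a hypothetical pairing strategy)."""
    return [max((jaccard(s, t) for t in source), default=0.0) for s in suspicious]
```

For example, `jaccard(["read", "user", "input"], ["read", "the", "input"])` is 2/4 = 0.5, since the sets share two of four distinct tokens.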
References 1. Eisa, T.A.E., Salim, N., Alzahrani, S.: Existing plagiarism detection techniques: a systematic mapping of the scholarly literature. Online Inf. Rev. 39(3), 383–400 (2015) 2. Kumar, P.M., et al.: Knowing and avoiding plagiarism during scientific writing. Ann. Med. Health Sci. Res. 4(3), 193–198 (2014) 3. Zhang, Y.-H.H., et al.: Be careful! avoiding duplication: a case study. J. Zhejiang Univ. Sci. 14(4), 355 (2013) 4. Altheneyan, A.S., Menai, M.E.B.: Automatic plagiarism detection in obfuscated text. Pattern Anal. Appl. (2020)
5. Foltýnek, T., et al.: Detecting machine-obfuscated plagiarism. International Conference on Information. Springer, Berlin (2020) 6. Chakrabarty, A., Roy, S.: An efficient context-aware agglomerative fuzzy clustering framework for plagiarism detection. Int. J. Data Min. Model. Manag. 10(2), 188–208 (2018) 7. Mukherjee, I., et al.: Plagiarism detection based on semantic analysis. Int. J. Knowl. Learn. 12(3), 242–254 (2018) 8. Ahuja, L., Gupta, V., Kumar, R.: A new hybrid technique for detection of plagiarism from text documents. Arab. J. Sci. Eng. (2020) 9. Meuschke, N., et al.: An adaptive image-based plagiarism detection approach. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. ACM: Fort Worth, Texas, USA, pp. 131–140 (2018) 10. Meuschke, N., et al.: HyPlag: a hybrid approach to academic plagiarism detection. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (2018) 11. Kuruvila, J.S., et al.: Flowchart plagiarism detection system: an image processing approach. Procedia Comput. Sci. 115, 533–540 (2017) 12. Iwanowski, M., Cacko, A., Sarwas, G.: Comparing images for document plagiarism detection. International Conference on Computer Vision and Graphics. Springer, Heidelberg (2016) 13. Eisa, T.A.E., Salim, N., Abdelmaboud, A.: Content-based scientific figure plagiarism detection using semantic mapping. International Conference of Reliable Information and Communication Technology. Springer, Heidelberg (2019) 14. Potthast, M., et al.: An evaluation framework for plagiarism detection. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics: Beijing, China, pp. 997–1005 (2010)
A Virtual Exploration of al-Masjid al-Nabawi Using Leap Motion Controller
Slim Kammoun1,3(B) and Hamza Ghandorh2
1 Information System Department, Taibah University, Madina, Saudi Arabia
2 Computer Science Department, Taibah University, Madina, Saudi Arabia
3 Research Laboratory of Technologies of Information and Communication & Electrical Engineering, University of Tunis, Tunis, Tunisia
Abstract. Religious tourism is gaining popularity, and more and more people want to visit religious places. Al-Masjid Al-Nabawi holds historical priority for 1.5 billion visitors of different languages, cultures, and ethnicities. As part of the Saudi leadership's governance of the affairs of Al-Masjid Al-Nabawi, many expansions have been planned and performed over the years to provide better services for Saudi Arabia's guests and visitors. Large and ongoing expansions of the holy sites in Saudi Arabia may affect the ease of mobility of its guests, especially during high seasons (i.e., Hajj and Umrah). Virtual Reality technology has started to play a critical role in the tourism industry by virtually exposing users to a certain place. The primary purpose of this paper is to visualize the Al-Masjid Al-Nabawi site in an interactive and easy-to-use style for guests. Interaction with the 3D environment is provided via the Leap Motion Controller in a simple way that requires no special training. We designed and implemented a virtual reality-based touring guide prototype and conducted an initial user validation. The results are promising for future investigation.
Keywords: Virtual reality · Leap motion controller · Unreal engine · Al-Masjid Al Nabawi · Tourism
1 Introduction
Virtual Reality (VR) is an emerging and highly specialized technology that allows the creation and display of high-resolution three-dimensional objects with which the user interacts in real time, experiencing a sensation of immersion and presence in the reconstructed environments [1]. This technology has been applied in many fields, such as cultural heritage and museums [2] or medicine [3], to give a unique, immersive and engaging experience. VR technology was developed to make experiences in virtual environments more realistic and has been widely used in several fields over the last three decades [4]. Using VR technology, it is possible to virtually explore a computer-generated environment as a different reality and to immerse oneself in the past or in other virtual environments without leaving the current real-life situation [5]. VR is also increasingly used for virtual locations to enhance a visitor's experience by providing access to additional features [6]. Al-Masjid Al-Nabawi, depicted in Fig. 1, is the second largest mosque in the Muslim world. Al-Masjid Al-Nabawi holds historical priority for 1.5 billion visitors of different languages, cultures, and ethnicities. As part of the Saudi leadership's governance of the affairs of Al-Masjid Al-Nabawi, many expansions have been planned and performed over the years to provide better services for Saudi Arabia's guests and visitors.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 514–522, 2021. https://doi.org/10.1007/978-3-030-70713-2_48
Fig. 1. Al-Masjid Al-Nabawi
Due to the large and ongoing expansions of the holy sites in Saudi Arabia, including Al-Masjid Al-Nabawi, many guests would appreciate being able to visit Al-Masjid Al-Nabawi's landmarks, access the services provided, and explore in-and-out routes to their residence locations. However, guests might not be aware of the rapid changes at Al-Masjid Al-Nabawi, which can lead to bad experiences, especially during high seasons (i.e., Hajj and Umrah). A new trend in the tourism industry lets users rely on immersive technology to virtually visit, or have a "sensation" of, a certain place in virtual three-dimensional (3D) spaces as if they were inside it, before being at the actual site. Virtual Reality (VR) is one of the key players in this domain. We hypothesize that guests will gain better awareness of Al-Masjid Al-Nabawi's site and proposed services through the application of VR technology. We introduce VR-based guide tour software to explore Al-Masjid Al-Nabawi in a highly interactive way. The primary purposes of this work are to:
• Create a 3D representation of Al-Masjid Al-Nabawi
• Propose a new way to interact with this environment via the LMC (Leap Motion Controller)
The outline of this paper is as follows: Sect. 2 provides brief background and related work regarding 3D environments. Section 3 describes our methodology for developing the software and integrating the LMC. Section 4 presents the current prototype and the initial user evaluation, and Sect. 5 concludes the paper.
2 Related Works
VR is a visualization technology intended to immerse users in a created virtual world in which they can gain new insights and live new experiences with configurable settings [3]. VR immerses users in synthetic settings in which they cannot perceive their real surroundings [7]. Augmented Reality (AR) is another visualization technology, intended to enhance users' perception of reality by overlaying computer-generated information on physical objects [7, 8]. AR facilitates users' direct interaction with the physical world while maintaining their perception of the surrounding world and underlying events. VR and AR both leverage a mixture of technologies that enable users to navigate and manipulate objects for a particular purpose in the virtual world through its own rules and coordinate system [8]. Both technologies engage users in conventional or new experiences that are not commonly expected, a concept that could revolutionize people's perception and actions in many fields. The primary goal of VR is to completely replace users' reality with a video-like experience, rather than supplement their reality with augmented 3D virtual objects, as in the case of AR technology [9]. Milgram et al. [10] described how reality and virtuality coexist through their proposed Reality-Virtuality Continuum. There are several VR-based applications designed to explore holy sites (e.g., mosques) around the world, yet they were designed for a certain purpose with narrow specifications. To our knowledge, there is no VR-based touring guide for Al-Masjid Al-Nabawi that integrates the Leap Motion controller. Istana Bukit Zaharah [11] is VR-based simulation software that lets users experience the royal dinner in the palace during the 1850s.
The Great Mosque of Banten [12] exploration software is VR-based educational software that allows users to visualize and explore the site as it was in the late 1800s. Abidin and Razak [13] presented an overview of the data acquisition and 3D reconstruction process of Al-Masjid Al-Nabawi, and of transferring the model into a 3D space for real-time navigation using a VR headset.
3 Methodology
This section describes our development approach for the VR-based guide tour software to explore Al-Masjid Al-Nabawi. The Unreal Engine platform was used to generate the 3D environment, and the Leap Motion controller was used to navigate the virtual scene.
3.1 Building the 3D Space
We used the Unreal Engine (UE) platform (Version 4) [14], the Leap Motion Software Development Kit (SDK) [15], and a local machine with VR-ready capabilities running the Microsoft Windows 8 operating system. UE is a complete suite of game development resources (UE Modeler, Unreal Development Kit (UDK), 3D functionality libraries) for constructing two-dimensional, mobile, game console, and VR-based games. The UE platform offers a variety of 3D shading renderers, multiple static lighting scenarios, and built-in support for advanced graphics cards. An overview of the UE platform is depicted in Fig. 2.
Fig. 2. An example of unreal engine 4 platform
Fig. 3. The outdoor representation of Al-Masjid Al-Nabawi
The virtual representation of Al-Masjid Al-Nabawi presented in this paper shows only a small part of the available space and features cataloged and stored in the database. Thanks to the Leap Motion device, the user is able to look around in every direction, approach any space, and move freely in the virtual environment. Figure 3 shows a three-dimensional representation of Al-Masjid Al-Nabawi from the outside, Fig. 4 shows the outdoor spaces and doors, and Fig. 5 is a 3D representation of the interior of Al-Masjid Al-Nabawi.
Fig. 4. 3D representation of outdoor spaces and doors of Al-Masjid Al-Nabawi
Fig. 5. Indoor 3D representation of Al-Masjid Al-Nabawi
The integration of UE and the Leap Motion controller allowed us to build our prototype, to design and render 3D representations of Al-Masjid Al-Nabawi's landmarks and the surrounding sites, to provide users with functionality to interact with these 3D representations, and to overlay related alphanumeric labels and symbols within the virtual scenes.
3.2 Interaction Technique
The Leap Motion controller [15] (see Fig. 6) is a gesture recognition device that detects the translation and rotation of the user's hands in 3D space to perform certain functions. The controller offers three states: 1) Initialize state, 2) Right-hand state, and 3) Left-hand state. The Initialize state is used to synchronize the Leap Motion controller with the functionality of the 3D space, the Right-hand state is used for the user's translation within the 3D space, and the Left-hand state is used for the user's rotation within the 3D space. The Leap Motion controller acts as the input device for the software. Figure 7 shows an overview of the underlying implementation of the integration between UE and the Leap Motion controller. UE requires the Leap Motion SDK as a plug-and-play component to deploy navigation functionalities. As presented in Fig. 7, when the user begins to move his/her hand in front of the controller, the software opens the map of Al-Masjid Al-Nabawi. The user can then press, pick, or adjust a variety of options on his/her screen.
Fig. 6. The leap motion sensor
There are three main components: users, the interaction unit, and the virtual space (i.e., UE). The user interacts with the system through the Leap Motion controller, which sends the corresponding instructions to the virtual space. The virtual space then sends the output information to the user's screen.
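The three controller states and the user, interaction unit and virtual space pipeline can be sketched in plain Python. This is a hypothetical model, not the Leap Motion SDK or the Unreal Engine API: `HandFrame`, `Camera` and `InteractionUnit` are illustrative names, palm positions are simplified `(x, y, z)` tuples, and the gain values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HandFrame:
    """Hypothetical, simplified hand sample (not the Leap Motion SDK type)."""
    side: str    # "left" or "right"
    palm: tuple  # palm position (x, y, z) relative to the sensor


@dataclass
class Camera:
    """Stand-in for the viewpoint in the virtual space."""
    position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    yaw_deg: float = 0.0


class InteractionUnit:
    """Maps hand frames to camera updates: right hand translates, left hand rotates."""

    def __init__(self, move_gain=0.01, turn_gain=0.1):
        self.synced = False
        self.reference = {}
        self.move_gain = move_gain
        self.turn_gain = turn_gain

    def initialize(self, frames):
        # Initialize state: record a neutral palm position for each visible hand.
        for f in frames:
            self.reference[f.side] = f.palm
        self.synced = True

    def update(self, camera, frames):
        if not self.synced:
            self.initialize(frames)
            return camera
        for f in frames:
            ref = self.reference.get(f.side)
            if ref is None:
                continue  # this hand was not seen during initialization
            dx, dy, dz = (p - r for p, r in zip(f.palm, ref))
            if f.side == "right":
                # Right-hand state: palm offset from neutral moves the viewpoint.
                camera.position = [c + self.move_gain * d
                                   for c, d in zip(camera.position, (dx, dy, dz))]
            elif f.side == "left":
                # Left-hand state: sideways palm offset turns the viewpoint.
                camera.yaw_deg = (camera.yaw_deg + self.turn_gain * dx) % 360
        return camera  # the virtual space would render this to the user's screen
```

The first `update` call only calibrates; afterwards, a right palm held 100 mm to the side and 50 mm toward the user moves the camera by `[1.0, 0.0, -0.5]` per frame, and a left palm held 30 mm to the side turns it by 3 degrees per frame. Applying the offset on every frame gives rate control (the hand acts like a joystick), which is one plausible design choice among several.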
4 Results and Discussion
We developed VR-based guide tour software to explore Al-Masjid Al-Nabawi by integrating VR technology with the interactive Leap Motion Controller, as depicted in Fig. 7. Figure 8 shows a variety of Al-Masjid Al-Nabawi's landmarks and the surrounding areas. As Fig. 8 illustrates, guests will be able to virtually access in-and-out routes of Al-Masjid Al-Nabawi's landmarks from their own sites, regardless of their current location. To validate our prototype, a user acceptance test was performed. Ten students gave their written informed consent to participate in this study. They were aged between 18 and 22 (mean: 20 years). Participants included 66.7% male and 33.3% female students from Taibah University's main campus. During a session, each participant was seated in front of the screen with the LMC in front of them (see Fig. 8) and was able to move within the virtual environment through gestures using the LMC. Each participant explored Al-Masjid Al-Nabawi and all the implemented features, as seen in Figs. 3, 4 and 5. We asked the participants to use the software for a short training session and then to use it for 30 min before filling in a usability questionnaire. Figure 9 shows the participants' responses about the ease of use of the software: 60% of the participants gave positive responses, 30% gave neutral responses, and 10% had difficulty with non-typical translation in the 3D space. Figure 9 also shows the participants' responses about their level of enjoyment after the experiment: 66.7% of the participants reported an enjoyable experience, while 33.3% had a neutral experience.
Fig. 7. An overview of LMC implementation as input device with UE4
Fig. 8. An overview of the VR-based touring guide software for Al-Masjid Al-Nabawi.
Fig. 9. Users' responses about the ease of use and level of enjoyment
5 Conclusion and Future Work
We proposed a VR-based guide tour prototype to explore Al-Masjid Al-Nabawi by integrating VR technology, UE, and the Leap Motion controller. The initial user validation indicated good potential for more sophisticated implementations and for future investigation of user performance against other exploration techniques, for example, an AR immersion mode. The current prototype could be extended to provide multiple languages and assistance features for visually impaired visitors, and to support future expansions for administrators and construction workers of Al-Masjid Al-Nabawi or other similar historical places. In future work, we plan to explore Al-Masjid Al-Nabawi's landmarks by integrating VR technology with wearable technology for exploration and navigation.
References 1. Jerald, J.: The VR Book: Human-Centered Design for Virtual Reality. Association for Computing Machinery and Morgan & Claypool (2016) 2. Hammady, R., Ma, M., Powell, A.: User experience of markerless augmented reality applications in cultural heritage museums: ‘Museumeye’ as a case study Ramy. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-95282-6 3. Schirmer, C.M., Mocco, J., E.J.: Evolving virtual reality simulation in neurosurgery. Neurosurgery. 73, 127–37 4. Mazuryk, T., Gervautz, M.: Virtual reality history, applications, technology and future. IEEE International Symposium Industrial Electronics, pp. 1013–1018 (2009). https://doi.org/10.1109/ISIE.2009.5221998 5. Kersten, T.P., Tschirschwitz, F., Deggim, S., Lindstaedt, M.: Virtual reality for cultural heritage monuments – from 3D data recording to immersive visualisation. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 11197 LNCS, pp. 74–83 (2018). https://doi.org/10.1007/978-3-030-01765-1_9 6. Ghani, I., Rafi, A., Woods, P.: The effect of immersion towards place presence in virtual heritage environments. Pers. Ubiquitous Comput. (2019). https://doi.org/10.1007/s00779-019-01352-8 7. Azuma, R.T.: A survey of augmented reality. Presence Teleoperators Virtual Environ. 6, 355–385 (1997)
8. Carmigniani, J., Furht, B.: Handbook of Augmented Reality (2011). https://doi.org/10.1007/978-1-4614-0064-6 9. Pandya, A.: Medical augmented reality system for image-guided and robotic surgery: development and surgeon factors analysis (2004) 10. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: a class of displays on the reality-virtuality continuum. Telemanipulator Telepresence Technol. 2351, 282–292 (1995). https://doi.org/10.1117/12.197321 11. Ali, K.: Virtual reconstruction of heritage building istanabukit zaharah 12. Subali, M., Andriansyah, M., Saptono, D., Purwanto, I., Antonius, I.S., Rahmadi, H., Sudjono, L.A.L.: Development of banten e-heritage using virtual reality technology on mobile device. In: 2018 Third International Conference on Informatics and Computing (ICIC), pp. 1–5 (2018) 13. Abidin, M.I.Z., Razak, A.A.: Modelling of the Prophet mosque in virtual reality. In: ACM International Conference Proceeding Series Part F1479, pp. 320–324 (2019). https://doi.org/10.1145/3316615.3320117 14. Epic Games, Inc. Unreal Engine 4, docs.unrealengine.com/en-US/index.html 15. Ultraleap Ltd. Leap Motion Controller SDK, developer.leapmotion.com/
Comparison of Data Analytic Techniques for a Spatial Opinion Mining in Literary Works: A Review Paper
Sea Yun Ying1, Pantea Keikhosrokiani1(B), and Moussa Pourya Asl2
1 School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
2 School of Humanities, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
Abstract. Opinion mining is the use of analytic methods to extract subjective information. A study was conducted to apply spatial opinion mining to literary works in order to examine writers' opinions about how matters of space and place are experienced. To support it, this paper conducts a review to identify and compare different analytical techniques for opinion mining in fictional writing. The review focuses on sentiment analysis and topic modeling as the two main techniques for spatial opinion mining in literary works. The comparison results are reported, and the limitations of the different techniques are discussed. The results of this study can assist researchers in the fields of opinion and text mining.
Keywords: Big data analytics · Opinion mining · Text mining · Sentiment analysis · Topic modeling · Literary works
1 Introduction
Opinion Mining (OM) is a technique used to detect and extract the prevalent opinion about entities. It uses text mining to detect the sentiment orientation of a text, which can be positive, negative or neutral. It can be described as a field of knowledge discovery and data mining (KDD) that applies natural language processing (NLP) and statistical machine learning (ML) techniques to distinguish opinionated text from factual text. The OM task therefore involves opinion identification, opinion classification, target identification, source identification and opinion summarization, as stated by [1]. Text analytics, or text mining, is the methodology and process that allows machines to derive quality information and insights from textual data. This process uses NLP, information retrieval and ML techniques to parse unstructured text data into more structured forms and to extract patterns and information from such data that may benefit the end user [2]. Human analysis of textual information is subject to prejudice and bias because people tend to give opinions that are consistent with their preferences. It is commonly believed that individuals make decisions based on a rational analysis of the available alternatives. However, emotions exert a profound impact on the decisions that humans make in reality; emotion is a main ingredient that cannot be neglected in any decision-making process. Furthermore, humans have difficulty producing consistent results when the amount of information to be processed is huge. Thus, automated text mining and summarization systems are required to overcome subjective biases and human limitations with an objective sentiment analysis system [3]. For this reason, a study was conducted to propose a computerized method to examine literary writers' opinions on how matters of space and spatiality are addressed in their fictional works [4, 5]. A review study was therefore required to compare the data analytic techniques that can be used to find how spatial experiences are portrayed in certain literary texts by diasporic writers [6, 7]. Accordingly, this paper focuses on the review and comparison of data analytic techniques for opinion mining in literary writing.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 523–535, 2021. https://doi.org/10.1007/978-3-030-70713-2_49
2 Methods for a Spatial Opinion Mining in Literary Works
2.1 Natural Language Processing and Text Analytics
The emergence of big data technologies and artificial intelligence has transformed the way research in various disciplines is conducted [8–10]. Natural Language Processing (NLP) can be defined as a subfield of data science and Artificial Intelligence (AI) with roots in computational linguistics. The field is closely related to AI and was used in the automatic translation efforts of the 1950s [11]. It is mainly concerned with designing and constructing applications and systems that enable interaction between machines and the natural languages that humans use [2]. NLP techniques enable computers to understand and manipulate natural language text or speech to produce useful outputs [12]. Text analytics, also referred to as text mining, is the methodology and process that allows machines to derive quality information and insights from textual data. Text analytics can be defined as an application of text mining techniques to sort out data sets. This process uses NLP, information retrieval and machine learning (ML) techniques to parse unstructured text data into more structured forms and to extract patterns and information from such data that may benefit the end user [2]. Text mining has become popular due to the development of big data platforms and deep learning algorithms that are able to analyze massive sets of unstructured data. There are two sub-domains for mining knowledge from user-generated discourse (subjectivity analysis): opinion mining and sentiment analysis [1]. Some researchers use these terms interchangeably [13], while others consider sentiment analysis to be a subfield of OM [14].
2.2 Opinion Mining
Opinion Mining (OM) can be defined as the science of using text mining to detect the sentiment orientation of a text, which can be positive, negative or neutral.
The term OM was first used in 2003 by Dave et al., who described it as the analysis of reviews about an entity and presented a model for classifying document polarity as recommended or not recommended. OM is a process used to extract information or opinions about entities. It can be defined as a field of knowledge discovery and data mining (KDD) that applies NLP and statistical ML techniques to distinguish opinionated text from factual text. The OM task therefore involves opinion identification, opinion classification, target identification, source identification and opinion summarization [1]. The main concern raised by Khan et al. is how to automatically detect opinion components in unstructured text data and summarize the opinion about an entity from a large volume of such data.
2.3 Topic Modelling
Topic modelling is an unsupervised technique used to perform text clustering in large document collections. It is a statistical model that helps to find a group of keywords or topics for a text. It assumes that each document consists of a group of topics or keywords, and that each topic or keyword consists of a collection of words. It is a form of opinion mining that is able to obtain recurring patterns of words in a textual document. Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) are two of the most widely used topic modelling techniques today. Probabilistic topic models such as LDA are generative models usually used for discovering hidden semantic structures in a body of text. Topic modelling is also used for annotating documents based on their topics, and these annotations can be used to organize, search and summarize whole texts. In short, topic modelling can handle tasks such as tag recommendation, text categorization, keyword extraction, and similarity search. It can be described as an approach for finding, in a large document collection, the set of words or topics that best describes the information in the collection. All topic models share the basic assumption that each document contains a mixture of topics/keywords and each topic/keyword contains a collection of words [15].
2.4 Sentiment Analysis (SA) Sentiment Analysis (SA) is one of the recent techniques used to extract and analyze emotional and sentiment statements in a text [16–18] It can be referred to as emotional polarity computation as it is used to detect the sentiments and categorize them based on their polarity. The used polarities can be positive, neutral or negative. In this case, traditional machine learning techniques on n-grams, parts of speech, and other bag of words features able to be applied when the data is labelled. Knowledge-based method that was introduced by [19] is another method in using labelled data. Both of these methods rely on crowdsourcing. Sentiment analysis (SA) is related to the extraction and analysis of emotional and sentimental statements in a text. The aim of SA is to detect opinions, identify the sentiments they express, and then categorize them based on their polarity. It uses polarities such as positive or negative or a scale of ratings (e.g., 1–5) and relies on features about emotion, affect, review and subjectivity. SA is a classification process which is divided into three main levels: documentlevel [20], sentence-level [21], and aspect-level [22]. The purpose of document-level SA is to categorize an opinion document as expressing a positive or negative opinion. In this case, the whole document is considered as a basic information unit (topic). The aim of sentence-level SA is to classify sentiment expressed in every sentence. It is
S. Y. Ying et al.
necessary to determine whether a sentence is subjective or objective before determining whether it expresses a positive or negative opinion. However, document-level and sentence-level SA do not reveal exactly what people liked or disliked. Aspect-level SA performs a finer-grained analysis: its goal is to determine the sentiment with respect to specific aspects of entities. Aspect-level SA looks directly at the opinion itself rather than at language constructs such as documents, paragraphs, sentences, clauses or phrases. Sentiment analysis is generally carried out with one of three common approaches: the lexicon-based approach, the learn-based (machine learning) approach and the hybrid approach.
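To illustrate how aspect-level SA differs from document- and sentence-level SA, the sketch below assigns a polarity to each aspect term using only the opinion words within a small window around it. It is a deliberately simple, hypothetical example: the lexicon, the aspect terms and the two-token window are assumptions, and practical aspect-level systems rely on richer syntactic or probabilistic models such as the one in [22].

```python
import re

# Hypothetical polarity lexicon: +1 positive, -1 negative.
LEXICON = {"great": 1, "friendly": 1, "delicious": 1, "slow": -1, "rude": -1, "cold": -1}


def aspect_sentiment(text, aspects, window=2):
    """Return {aspect: score}, summing opinion words within `window` tokens of each aspect mention."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for i, tok in enumerate(tokens):
        if tok in aspects:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            nearby = tokens[lo:i] + tokens[i + 1:hi]  # context around the aspect term
            scores[tok] = scores.get(tok, 0) + sum(LEXICON.get(w, 0) for w in nearby)
    return scores


def label(score):
    """Map a numeric score to a polarity label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


review = "The food was delicious but the service was slow and the staff rude."
result = aspect_sentiment(review, {"food", "service", "staff"})
print({a: label(s) for a, s in result.items()})
# → {'food': 'positive', 'service': 'negative', 'staff': 'negative'}
```

Summing the same lexicon over the whole sentence gives -1 (negative), which hides the positive opinion about the food; the aspect-level view keeps the food positive and the service and staff negative.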
3 Comparison and Discussion Between Sentiment Analysis and Topic Modelling Approaches

3.1 Comparison and Discussion on Different Sentiment Analysis Approaches

Performing SA with different approaches produces different results, and each approach has its own advantages and disadvantages. Table 1 compares the two main approaches used in SA.

Table 1. Comparison of two approaches

| Criteria | Lexicon-based | Learn-based |
|---|---|---|
| Classification | Unsupervised learning | Supervised, semi-supervised and unsupervised learning |
| Advantages | Domain independent; does not need labelled data or a learning procedure; fast to produce results | Dictionary is not necessary; high classification accuracy; high precision and adaptability; does not need maintenance |
| Disadvantages | Needs maintenance of the corpus/corpora; requires strong linguistic resources, which are not always available; needs dictionaries that cover plenty of opinion words; low accuracy compared with the learn-based approach | Domain dependent, so a classifier trained on texts from one domain does not work in other domains; needs labelled data and a learning procedure; slow to produce results |
Lexicon-based Approach: The lexicon-based approach is also referred to as a rule-based approach because dictionaries are applied according to certain rules. It analyzes a text using an opinion lexicon, that is, a sentiment lexicon: a collection of known, precompiled sentiment terms. The lexicon-based approach is one of the methods for document-level SA, and it can be considered an unsupervised approach, as noted by [13].
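A lexicon-based scorer of the kind described here can be sketched in a few lines: look up each token in a precompiled sentiment lexicon, apply a simple rule (here, flipping the polarity of a word that directly follows a negator), and map the summed score to a polarity. The lexicon entries, the negator list and the one-token negation rule are hypothetical; real lexicons such as SentiWordNet or VADER, which appear in the studies surveyed in Table 2, are much larger and use more elaborate rules.

```python
import re

# Hypothetical precompiled sentiment lexicon (word -> prior polarity).
LEXICON = {"good": 1.0, "excellent": 2.0, "happy": 1.0, "bad": -1.0, "terrible": -2.0, "sad": -1.0}
NEGATORS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum lexicon polarities, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0.0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # rule: "not good" counts as negative
        score += polarity
    return score


def classify(text, threshold=0.0):
    """Map the summed score to positive / neutral / negative."""
    s = lexicon_score(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


print(classify("The ending was not good, it was terrible"))  # negative (score -3.0)
print(classify("An excellent and happy book"))                # positive (score 3.0)
```

No training data is involved: everything the method "knows" lives in the lexicon and the rules, which is why it is fast and domain independent but needs the lexicon to be maintained.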
Comparison of Data Analytic Techniques for a Spatial Opinion
Lexicon-based methods can be grouped into the dictionary-based, corpus-based and manual approaches. The dictionary-based method starts from a set of opinion seed words and then searches a dictionary for their synonyms and antonyms; it can use existing dictionaries such as SentiWordNet. The corpus-based method starts from a seed list of opinion words and then looks for other opinion words in a large corpus, which helps to find opinion words with context-specific orientations. This approach requires expensive manual annotation effort because it involves a large corpus, as noted by [23]. The corpus-based method on its own is less effective than the dictionary-based approach because it is difficult to prepare a corpus large enough to cover all English words [24]. The manual approach is very time-consuming, so it is usually combined with the dictionary-based and corpus-based approaches to correct the mistakes they produce.

Recent Studies of Lexicon-based Approach in SA: The lexicon-based approach is unsupervised in that it does not need prior training to mine data. Most researchers create their own lexicon in order to improve performance. Several recent studies of the lexicon-based approach in SA are summarized in Table 2.

Learn-based Approach: The learn-based, or ML, approach uses well-known ML algorithms and linguistic features. A classification algorithm is trained on labelled documents from a corpus so that it learns the features needed to classify sentiment (Giannakopoulos et al., 2012). This approach can be supervised, semi-supervised or unsupervised. Supervised methods need a large amount of labelled training data, which makes them expensive, whereas unsupervised methods do not need labelled data and are therefore easy to apply to unlabeled data. Several recent SA studies using the learn-based approach are summarized in Table 3.
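The dictionary-based method described above grows a small set of seed words by following synonym links (same polarity) and antonym links (opposite polarity). The sketch below performs that expansion breadth-first over a toy thesaurus. The thesaurus, the seed words and the `max_depth` limit are hypothetical stand-ins for a real resource such as WordNet; a real system would also need a policy for words reached with conflicting polarities (here the first assignment simply wins).

```python
from collections import deque

# Hypothetical thesaurus: word -> (synonyms, antonyms).
THESAURUS = {
    "good": ({"fine", "nice"}, {"bad"}),
    "nice": ({"pleasant"}, {"nasty"}),
    "bad": ({"poor", "awful"}, {"good"}),
    "awful": ({"dreadful"}, set()),
}


def expand_seeds(seeds, thesaurus, max_depth=2):
    """Propagate polarity from seed words: synonyms keep it, antonyms flip it.

    seeds: dict word -> +1 or -1. Returns dict word -> polarity.
    The first polarity assigned to a word is kept (breadth-first order).
    """
    lexicon = dict(seeds)
    queue = deque((w, p, 0) for w, p in seeds.items())
    while queue:
        word, polarity, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding beyond the depth limit
        synonyms, antonyms = thesaurus.get(word, (set(), set()))
        links = [(s, 1) for s in sorted(synonyms)] + [(a, -1) for a in sorted(antonyms)]
        for neighbour, sign in links:
            if neighbour not in lexicon:
                lexicon[neighbour] = polarity * sign
                queue.append((neighbour, polarity * sign, depth + 1))
    return lexicon


print(expand_seeds({"good": 1}, THESAURUS))
```

Starting from the single seed "good", the expansion labels "nice" and "pleasant" positive and "bad", "poor", "awful" and "nasty" negative; raising `max_depth` reaches further words such as "dreadful".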
Table 4 shows the advantages and disadvantages of the most commonly used machine learning methods and can serve as a guideline for deciding which method to use.

Hybrid Approach: The hybrid approach combines the lexicon-based and learn-based approaches. Sentiment lexicons are common in this approach and play a crucial role in most of its methods. The lexicon-based part assigns sentiment scores to documents, and these scored documents then serve as training data for the learn-based part. According to [29], the hybrid approach is widely adopted because of the stability and high accuracy it inherits from the lexicon-based approach and the ML approach, respectively. It uses a lexicon/learning symbiosis to attain the best of both worlds: stability and readability from a carefully designed lexicon, and high accuracy from a powerful supervised algorithm.
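The hybrid pipeline described above, in which lexicon scores label the documents that then train the learn-based model, can be sketched as follows. Assumptions: a tiny hypothetical lexicon produces the pseudo-labels, documents the lexicon cannot decide (score 0) are dropped, and a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing stands in for the learn-based part; any supervised method from Table 4 could take its place.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sentiment lexicon used only to create pseudo-labels.
LEXICON = {"love": 1, "great": 1, "wonderful": 1, "hate": -1, "boring": -1, "awful": -1}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def lexicon_label(tokens):
    """Step 1 (lexicon-based): return 'pos', 'neg', or None when the score is 0."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return "pos" if score > 0 else "neg" if score < 0 else None


class NaiveBayes:
    """Step 2 (learn-based): multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = defaultdict(Counter)
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are ignored.
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens if t in self.vocab
            )
        return max(self.classes, key=log_posterior)


def hybrid_train(texts):
    """Pseudo-label texts with the lexicon, keep the decided ones, and train NB on them."""
    docs, labels = [], []
    for text in texts:
        tokens = tokenize(text)
        lab = lexicon_label(tokens)
        if lab is not None:
            docs.append(tokens)
            labels.append(lab)
    return NaiveBayes().fit(docs, labels)


corpus = [
    "I love this moving novel", "a great and moving story", "wonderful moving characters",
    "I hate the tedious plot", "boring and tedious chapters", "awful tedious ending",
]
model = hybrid_train(corpus)
print(model.predict(tokenize("a moving book")))  # pos: no lexicon word, NB still decides
print(model.predict(tokenize("tedious pages")))  # neg
```

The point of the second stage is that the trained classifier picks up words such as "moving" and "tedious" that are not in the lexicon, so it can label documents the lexicon alone would leave undecided.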
3.2 Comparison of Topic Modelling: Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA)

LDA and LSA are two common topic modelling techniques in use today. Both can find hidden topics in a set of documents without labelled training data. Discovering hidden topics is useful for purposes such as clustering documents and
Table 2. Comparative study of lexicon-based approach in SA

| Author | Dataset | Techniques | Tool | Accuracy |
|---|---|---|---|---|
| [25] | Newspaper articles (a set of 1 292 quotes) | WordNet, lexicon-based | WordNet Affect, SentiWordNet, MicroWNOp, JRC Tonality | 82%; improves the baseline by 21% |
| [26] | 40 216 tweets from Stanford Twitter Sentiment and 3 269 tweets from the Obama–McCain Debate | General Inquirer LBM, MPQA, K-means, ONMTF, Moodlens, CFMS and ESSA | General Inquirer | A novel framework using ESSA, with improvements of 21.4% and 17.87% on the two datasets |
| [27] | Online customer reviews (spam and fake reviews) | Combined lexicon and shallow dependency parser | SentiWordNet and MPQA lexicon | 85.7% for the sentiment method vs. 76.7% for the word-counting approach |
| [28] | 1 600 Facebook messages | n-grams (unigrams and bigrams) | Hu and Liu (HL) lexicon and MPQA lexicon | 70% |
| [29] | Three datasets (training, test and verified sets) of 1 000, 5 000 and 10 000 | Enhanced BOW model | New lexicon | 83.5% |
| [30] | 335 022 restaurant reviews | Multilevel model | AFINN sentiment lexicons | Found relations between the words |
| [31] | 3 000 tweets (movie tweets) | Lexicon-based technique | TextBlob, SentiWordNet and WSD | TextBlob results were relatively better |
| [32] | 6 250 tweets (political views) | Lexicon-based technique, with an ML approach to check the accuracy | TextBlob, SentiWordNet and W-WSD | 62.67% (TextBlob); 53.33% (SentiWordNet); 62.33% (WSD) |
| [33] | 11 861 sentence-level snippets (movie reviews) | Lexicon-based technique | VADER, TextBlob and NLTK | 77% (VADER); 74% (TextBlob); 62% (NLTK) |
| [34] | 1 828 patients’ opinions in healthcare | Lexicon-based technique | VADER and TextBlob | 71.9% (VADER); 73% (TextBlob) |
Table 3. Comparative study of learn-based approach in SA

| Author | Dataset | Techniques | Tool | Accuracy |
|---|---|---|---|---|
| [35] | 70 103 hotel reviews | Naïve Bayes | Natural Language Toolkit (NLTK) | Found a relationship between sentiment variables and the driving number of reviews |
| [36] | 185 English song lyrics (textual) | Naïve Bayes, KNN, SVM | RapidMiner | SVM-based classifiers show promising results |
| [37] | 1 200 electronic product reviews | Naïve Bayes, SVM, Maximum Entropy and ensemble classifier | Matlab simulator | SVM and MEM have equal accuracy (90%); NB has 89.5% but better precision |
| [38] | 56 483 restaurant reviews | Naïve Bayes algorithm | Ictcla50 | 74% |
| [39] | 24 000 sentences of multi-domain sentiment data (12 domains) from 2 000 reviews | Naïve Bayes, Binarized Multinomial Naïve Bayes (BMNB), Multinomial Naïve Bayes, SVM and J48 | NA | BMNB performs best in six of the twelve domains, followed by SVM in four; the best feature selection is Information Gain |
| [40] | 1 346 545 business reviews | SVM, Naïve Bayes, Logistic Regression, SGD | Natural Language Toolkit (NLTK) | Linear SVC and SGD reach 94.4%; NB and Logistic Regression give slightly worse results |
| [41] | Online hotel reviews: 19 650 in Chinese and 2 008 in English | Multinomial Naïve Bayes | Natural Language Toolkit (NLTK) | Found a generally consistent interrelationship between structured and unstructured UGC |
| [42] | Hotel reviews of 3 hotels located in Astana | Language processing, SVM, MEM | NLTK | Extended model expresses sentiment polarity more accurately |
| [43] | 13 541 tweets from E-Twitter and 479 tweets from Twitter-Sanders | SVM, Decision Tree and Naïve Bayes | WEKA | SVM gives more accurate results than DT and NB |
| [44] | 4 datasets: 1 600 000; 888; 10 729; and 99 989 tweets | Naïve Bayes, Random Forest, SVM, Logistic Regression, Majority Voting and a proposed ensemble | NA | The proposed ensemble classifier performs better than the other classifiers |
Table 4. Comparison of different machine learning methods

| Methods | Advantages | Disadvantages |
|---|---|---|
| SVM | Handles a high-dimensional input space; few features are irrelevant; document vectors are sparse | A huge training data set is needed; the data collection process is tedious |
| NB | Simple and intuitive approach; efficient, with reasonable accuracy | Mainly used for small training sets; always assumes conditional independence among the linguistic features |
| MEM | Does not assume independent features as NB does; able to handle large amounts of data | Simplicity is hard to achieve |
| KNN | Computationally efficient; an instance is classified like the instances closest to it in the vector space | Needs large storage; computationally intensive recall |
organizing available online texts for information retrieval, and it can also support accurate recommendations. LSA and LDA are two common text-analysis algorithms that have individually gained much attention in topic extraction studies, but they have rarely been used for document classification or compared with each other [45]. Anaya's study reports that the accuracy of both LDA and LSA at the high level of abstraction is 84%, while
the accuracy of LDA and LSA is 64% and 67%, respectively, at the lower level of abstraction. [46] and [47] summarize LDA and LSA as given in Table 5.

Table 5. Comparison of LDA and LSA

| Criteria | LDA | LSA |
|---|---|---|
| Characteristics | A generative probabilistic model; stop words must be removed manually to increase the performance of the model; words are categorized into topics and can exist in more than one topic | Examines words in a document that have the same or similar meanings; builds a low-dimensional representation of documents and words by applying SVD; creates a latent semantic space |
| Advantages | Works well with a huge corpus; avoids overfitting; can be embedded in other, more complicated models; noise reduction is possible; easier to implement than LSA; best at learning descriptive topics | Able to cluster words and documents in the latent space; error reduction through dimension reduction; best at creating a compact semantic representation of the documents and words in a corpus |
| Disadvantages | The number of topics must be set in advance; topics are uncorrelated | Unable to capture the multiple meanings of words; does not provide well-defined probabilities; lacks interpretable embeddings; less efficient representation; offers lower accuracy than LDA |
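Table 5 describes LSA as building a low-dimensional representation of documents and words by applying SVD. A minimal NumPy sketch of that step, assuming a raw term-document count matrix (real pipelines usually apply tf-idf weighting and stop-word removal first) and an arbitrary choice of k = 2 latent dimensions; the terms and counts are hypothetical.

```python
import numpy as np


def lsa(term_doc, k):
    """Truncated SVD of a term-document matrix.

    term_doc: array of shape (n_terms, n_docs).
    Returns (term_vectors, doc_vectors), each with k latent dimensions.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values: the latent semantic space.
    term_vectors = U[:, :k] * s[:k]
    doc_vectors = Vt[:k, :].T * s[:k]
    return term_vectors, doc_vectors


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


terms = ["city", "street", "war", "border"]
# Columns are four toy documents; entries are term counts.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

term_vecs, doc_vecs = lsa(X, k=2)
print(cosine(doc_vecs[0], doc_vecs[1]))  # two "city" documents: close to 1
print(cosine(doc_vecs[0], doc_vecs[2]))  # "city" vs "war" document: close to 0
```

The SVD itself yields orthogonal latent axes rather than probabilities, which is the source of the "does not provide well-defined probabilities" and interpretability entries in Table 5.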
The primary considerations when using a topic modelling technique are the degree to which the learned topics match human judgments and help humans differentiate between ideas, as suggested by [46]. Evaluation of topic models has been ad hoc and application-specific, ranging from fully automated intrinsic evaluations to manually crafted extrinsic evaluations. Extrinsic evaluations are normally hand-constructed and often expensive to perform for domain-specific topics, whereas intrinsic evaluations can easily measure the amount of information encoded by the topics. Perplexity, suggested by [48], is a common intrinsic measure of topic model performance. However, [49] found that lower perplexity does not necessarily yield human-interpretable topics. As a result, researchers have introduced topic coherence measures, which aim to score the interpretability of a topic automatically [34, 50–52]. A number of these measures have been combined into a framework for evaluating the coherence of the topics inferred by a model. Topic coherence measures quantify the degree of semantic similarity between the words of a topic [53]. The higher the topic coherence score, the more semantically meaningful the generated topic [46].
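As a concrete instance of a coherence measure, the sketch below computes the UMass score, one of the measures covered by the framework of Röder et al. [52]. For a topic's top words w_1, ..., w_N (ordered from most to least probable), it sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with j < i, where D(·) is the number of documents containing the given word(s); scores closer to zero indicate that the words co-occur more often. The toy documents are hypothetical, and every top word is assumed to occur in at least one document (otherwise D(w_j) would be zero).

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's top words, most probable first.
    documents: iterable of token collections (each converted to a set).
    Assumes every top word appears in at least one document.
    """
    doc_sets = [set(d) for d in documents]

    def df(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() preserves list order, so w_j is always the higher-ranked word.
    for w_j, w_i in combinations(top_words, 2):
        score += math.log((df(w_i, w_j) + 1) / df(w_j))
    return score


docs = [["city", "street", "house"], ["city", "street"], ["war", "border"], ["war", "city"]]
print(umass_coherence(["city", "street"], docs))  # 0.0: the words co-occur often
print(umass_coherence(["city", "border"], docs))  # about -1.099: they never co-occur
```

Because UMass only needs document co-occurrence counts from the training corpus, it is an intrinsic measure and needs no external reference corpus or human judges.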
4 Conclusion

This study reviewed sentiment analysis and topic modelling techniques suitable for spatial opinion mining in literary works. Based on this review, LDA, an unsupervised learning approach, is applied in this study because labelled training data are lacking. Unsupervised learning is normally used to find hidden patterns in data that can improve a model's performance; in other words, it may be combined with a supervised learning approach, rather than used alone, to obtain a higher-quality model. An unsupervised learning approach can be used when (1) the training data have no labels; (2) the data cannot be labelled manually, or doing so is expensive; or (3) the underlying distribution of the data makes most supervised learning algorithms fit poorly. In this study, the first and second criteria hold, so LDA is used. However, it would not be the first choice if a large, good-quality labelled training data set were available. The lexicon-based approach is used for the sentiment analysis task in this project. Because it is unsupervised and domain independent, it is designed to work in all domains; in short, it can achieve more robust performance across domains than the learn-based approach. Its limitation, however, is that sentiment lexicons must be maintained for different domains. The lexicon-based method is faster than the learn-based method, but its accuracy is generally lower. Solving these problems requires a large, good-quality labelled data set. For future studies, the machine learning approach is recommended, mainly because of the limitations of unsupervised learning and because the lexicon-based approach to sentiment analysis is no longer needed once an ML approach is applied. A labelled training data set must, however, be provided.
A lack of data, or a lack of good data, will produce a poor model. Most available machine learning algorithms require large amounts of data before they can build a model, and good data enables the model to capture good features from the training set so that the algorithm performs well.

Acknowledgment. The authors are thankful to the School of Computer Sciences and the School of Humanities, Universiti Sains Malaysia, for their unlimited support in completing this project. In addition, the authors are grateful to the Division of Research & Innovation, USM, for financial support from the Short Term Grant (304/PHUMANITI/6315300) granted to Dr Moussa Pourya Asl.
References
1. Khan, K., et al.: Mining opinion components from unstructured reviews: a review. J. King Saud Univ. – Comput. Inf. Sci. 26(3), 258–275 (2014)
2. Sarkar, D.: Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data. Apress, New York (2016)
3. Lum, K.: Limitations of mitigating judicial bias with machine learning. Nat. Hum. Behav. 1(7), 0141 (2017)
4. Asl, M.P.: The politics of space: Vietnam as a communist heterotopia in Viet Thanh Nguyen’s The Refugees. Lang. Linguist. Lit. 26(1), 156–170 (2020)
5. Asl, M.P.: Micro-physics of discipline: spaces of the self in Middle Eastern women life writings. Int. J. Arabic-English Studies 20(2), 223 (2020)
6. Asl, M.P.: Leisure as a space of political practice in Middle East women life writings. GEMA Online® J. Lang. Stud. 19(3), 43–56 (2019)
7. Asl, M.P.: Practices of counter-conduct as a mode of resistance in Middle East women’s life writings. Lang. Linguist. Lit.® 24(2), 195–205 (2018)
8. Keikhosrokiani, P.: Chapter 1: Introduction to mobile medical information system (mMIS) development. In: Keikhosrokiani, P. (ed.) Perspectives in the Development of Mobile Medical Information Systems, pp. 1–22. Academic Press (2020)
9. Keikhosrokiani, P.: Perspectives in the Development of Mobile Medical Information Systems: Life Cycle, Management, Methodological Approach and Application. Academic Press, Cambridge (2019)
10. Abdelrahman, O., Keikhosrokiani, P.: Assembly line anomaly detection and root cause analysis using machine learning. IEEE Access 8, 189661–189672 (2020)
11. Hilborg, P.H., Nygaard, E.B.: Viability of sentiment analysis in business. Copenhagen Business School (2015). http://studenttheses.cbs.dk
12. Chowdhary, K.R.: Natural language processing. In: Chowdhary, K.R. (ed.) Fundamentals of Artificial Intelligence, pp. 603–649. Springer India, New Delhi (2020)
13. Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5(1), 1–167 (2012)
14. Tang, H., Tan, S., Cheng, X.: A survey on sentiment detection of reviews. Expert Syst. Appl. 36(7), 10760–10773 (2009)
15. Kumar, S.A., et al.: Computational intelligence for data analytics. In: Recent Advances in Computational Intelligence, pp. 27–43. Springer (2019)
16. Bakshi, R.K., et al.: Opinion mining and sentiment analysis. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). IEEE (2016)
17. Ravi, K., Ravi, V.: A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl.-Based Syst. 89, 14–46 (2015)
18. Li, N., Wu, D.D.: Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis. Supp. Syst. 48(2), 354–368 (2010)
19. Andreevskaia, A., Bergler, S.: CLaC and CLaC-NB: knowledge-based and corpus-based approaches to sentiment tagging. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007) (2007)
20. Yessenalina, A., Yue, Y., Cardie, C.: Multi-level structured models for document-level sentiment classification. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (2010)
21. Farra, N., et al.: Sentence-level and document-level sentiment mining for Arabic texts. In: 2010 IEEE International Conference on Data Mining Workshops (2010)
22. Zhou, H., Song, F.: Aspect-level sentiment analysis based on a generalized probabilistic topic and syntax model (2015)
23. He, Y., Zhou, D.: Self-training from labeled features for sentiment analysis. Inf. Process. Manag. 47(4), 606–616 (2011)
24. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5(4), 1093–1113 (2014)
25. Balahur, A., et al.: Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013)
26. Hu, X., et al.: Unsupervised sentiment analysis with emotional signals. In: Proceedings of the 22nd International Conference on World Wide Web (2013)
27. Peng, Q., Zhong, M.: Detecting spam review through sentiment analysis. JSW 9(8), 2065–2072 (2014)
28. Flekova, L., Preoţiuc-Pietro, D., Ruppert, E.: Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words. In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2015)
29. El Alaoui, I., et al.: A novel adaptable approach for sentiment analysis on big social data. J. Big Data 5(1), 12 (2018)
30. Gan, Q., et al.: A text mining and multidimensional sentiment analysis of online restaurant reviews. J. Qual. Assur. Hosp. Tourism 18(4), 465–492 (2017)
31. Gupta, M., Sharma, P.: Sentimental analysis of movies tweets with different analyzer
32. Hasan, A., et al.: Machine learning-based sentiment analysis for twitter accounts. Math. Comput. Appl. 23(1), 11 (2018)
33. Bonta, V., Janardhan, N., Kumaresh, N.: A comprehensive study on lexicon based approaches for sentiment analysis. Asian J. Comput. Sci. Technol. 8(S2), 1–6 (2019)
34. RamyaSri, V., et al.: Sentiment analysis of patients’ opinions in healthcare using lexicon-based method
35. Duan, W., et al.: Mining online user-generated content: using sentiment analysis technique to study hotel service quality. In: 2013 46th Hawaii International Conference on System Sciences (2013)
36. Kumar, V., Minz, S.: Mood classifiaction of lyrics using SentiWordNet. In: 2013 International Conference on Computer Communication and Informatics (2013)
37. Neethu, M.S., Rajasree, R.: Sentiment analysis in twitter using machine learning techniques. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT) (2013)
38. Chen, R.Y., Guo, J.Y., Deng, X.L.: Detecting fake reviews of hype about restaurants by sentiment analysis. In: Web-Age Information Management. Springer International Publishing, Cham (2014)
39. Saad, F.: Baseline evaluation: an empirical study of the performance of machine learning algorithms in short snippet sentiment analysis. In: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business (2014)
40. Salinca, A.: Business reviews classification using sentiment analysis. In: 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) (2015)
41. Zhang, X., et al.: Sentimental interplay between structured and unstructured user-generated contents: an empirical study on online hotel reviews. Online Inf. Rev. 40(1), 119–145 (2016)
42. Yergesh, B., Bekmanova, G., Sharipbay, A.: Sentiment analysis on the hotel reviews in the Kazakh language. In: 2017 International Conference on Computer Science and Engineering (UBMK) (2017)
43. Mathur, R.: Analyzing sentiment of twitter data using machine learning algorithm. GADL J. Invent. Comput. Sci. Commun. Technol. (JICSCT) 4(2), 1–7 (2018)
44. Saleena, A.N.: An ensemble classification system for twitter sentiment analysis. Procedia Comput. Sci. 132, 937–946 (2018)
45. Anaya, L.H.: Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as classifiers. ERIC (2011)
46. Stevens, K., et al.: Exploring topic coherence over many models and many topics. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (2012)
47. George, M., Soundarabai, P.B., Krishnamurthi, K.: Impact of topic modelling methods and text classification techniques in text mining: a survey. Int. J. Adv. Electron. Comput. Sci. 4(3) (2017)
48. Wallach, H.M., et al.: Evaluation methods for topic models. In: Proceedings of the 26th Annual International Conference on Machine Learning (2009)
49. Chang, J., et al.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems (2009)
50. Aletras, N., Stevenson, M.: Evaluating topic coherence using distributional semantics. In: Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), Long Papers (2013)
51. Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (2014)
52. Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (2015)
53. Korenčić, D., Ristov, S., Šnajder, J.: Document-based topic coherence measures for news media text. Expert Syst. Appl. 114, 357–373 (2018)
Open Data in Prediction Using Machine Learning: A Systematic Review
Norismiza Ismail¹,²(✉) and Umi Kalsom Yusof¹
¹ School of Computer Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
² Digital Management and Development Centre, Universiti Malaysia Perlis, 02600 Arau, Perlis, Malaysia
Abstract. This study discusses the determinants of open data (OD) in prediction using machine learning (ML) by reviewing the current research landscape. As open government data (OGD) and social networking services (SNSs) have grown rapidly, OD has come to be considered the most significant trend helping users to enhance their decision-making processes. The purpose of the study was to identify the proliferation of OD in ML approaches for generating decisions through a systematic literature review (SLR), and to map the outcomes as trends. In this systematic mapping study (SMS), articles published between 2011 and 2020 in major online scientific databases, including IEEE Xplore, Scopus, ACM, Science Direct and EbscoHost, were identified and analyzed. A total of 576 articles were found, but only 72 were included after the SLR selection stages. The results were presented and mapped according to the designed research questions (RQs). In addition, awareness of the current trends in the OD setting can have a real impact on the computing community by showing the latest developments and the need for future research, especially for those dealing with the OD and ML revolution.
Keywords: Systematic literature review · Systematic mapping study · Open data · Prediction · Machine learning
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 536–553, 2021. https://doi.org/10.1007/978-3-030-70713-2_50

1 Introduction

1.1 Background

The proliferation of open data (OD) has driven a new generation of reusable, accessible, sustainable and interoperable transparent datasets, and has prompted exploration of how OD principles can be implemented so that modules, systems and organizations worldwide can operate together [1–3]. The open government data (OGD) movement has grown exponentially since 2009, when the United States government committed to the principle of transparency by publishing thousands of its datasets [4]. The European Commission, Mexico and Singapore later opened their own spigots of readily available public data [5–8]. As of 10 September 2020, Malaysia's OD portal
itself had 13,168 datasets in 18 clusters, which citizens have been able to access since 2014 [9]. The OGD platform allows stakeholders in the social, economic and environmental sectors, among others, to benefit from accessing, using and exchanging datasets. Moreover, the emergence of OGD has benefited both the sector and academic discussion, especially in the service context [10]. Different types of Web 2.0-based technologies have been used, including raw data download, transparent APIs (application programming interfaces) and LOD (linked open data) [11]. Currently, in addition to OGD, the growing amount of user-generated content distributed through social networking services (SNSs), such as reviews, commentaries and accounts of past experiences, has made much information accessible; this is also referred to as OD [12]. The effect of SNSs on word-of-mouth communication and decision-making processes is well documented [13]. Digital marketers are aware that they need to increase the usefulness of SNSs by providing value-added services that effectively attract and exploit users' interest [14]. As a result, SNSs are expanding their capabilities by providing a diverse portfolio of built-in applications (apps) to meet social media users' need for new experiences [15], namely personalised, topic-specific virtual spaces that better serve user-generated content (UGC) (e.g. Facebook, YouTube, Twitter and LinkedIn), including feedback, updates on past experiences and suggestions for potential content [16]. Several studies have explored how OD can be used to predict certain attitudes or behaviours in decision-making processes. For example, OD have been used to support tourists' decision-making by profiling the characteristics of tourist attractions and destinations worldwide with a machine learning (ML) technique, the Random Forest method [12].

1.2 Problem Description

Several studies present predictions made with ML, but few relate specifically to the field of OD.
As OGD and SNSs have grown rapidly, OD has come to be considered the most significant trend helping users to enhance their decision-making processes. This information is freely accessible online, and many studies have explored the effect of online reviews on users' decisions. However, further analysis should specifically investigate the degree to which OD studies can predict users' reactions to a specific interest by using ML. From a theoretical point of view, by applying an SLR and an SMS, this research draws attention to the immense potential for using OD sources to manipulate perceptions and behaviours. In practice, ML tools can contribute significantly to prediction and to the efficient positioning of any decision-making process through the various ML techniques and algorithms that can be applied to the datasets.
2 Methodology

A systematic literature review (SLR) was used as the instrument in this study to assess the number of candidate papers and to understand the literature, which can be classified as a systematic mapping study (SMS) across various research streams [17, 18]. In the context of OD in prediction, an SMS is useful for investigating and displaying the whole picture of a research field, showing the amount of available evidence and establishing the research facts. The results of the mapping study help to classify research priorities within the
subject based on the studies needed. In brief, the review process proceeds as shown in Fig. 1 [19, 20]. Following this process, the results were established once the analysis was complete, and all findings were then published.
Fig. 1. The study selection processes
To ensure consistency when classifying articles as included or excluded, the publications were reviewed twice, following Budgen et al. [21]. The first round classified the basic topic of each study using its title, abstract and keywords [22], and articles unrelated to the research questions (RQs) were excluded. In the second round, the full texts of the articles were reviewed and unrelated articles were removed. Any new and useful information related to the RQs was then collected. The remaining articles were checked for correct classification and refined as needed. Finally, the mapping process was performed to show the present state and trends of the sample.

2.1 Research Questions (RQs)

The primary goal of the SMS is to describe all studies relevant to the RQs. At the beginning of the study process, the RQs and findings were framed around the following questions: “What research questions in OD and prediction in ML are being addressed?”; “What original research exists in the study?”; and “Which areas of research show novelty?”. The RQs were then developed and categorized into two main sets: Bibliometric Research Questions (BRQs) and Content Research Questions (CRQs) [23].

Bibliometric Research Questions (BRQs). The selected papers were examined to answer the following RQs:
1. How many prediction publications in the field of OD have been published, and how has their development changed over the years?
2. Which scientific databases and journals have the highest number of publications on OD in prediction?
3. Which countries have been actively researching and contributing to the publications?

Content Research Questions (CRQs). After the BRQs were determined, a more detailed analysis of the full texts of the articles was required to answer the following RQs:
1. Which ML techniques have been implemented by existing research on OD in prediction?
Open Data in Prediction Using Machine Learning
2. What types of datasets have been used?
2.2 Data Collection
The keywords and digital databases used in the search strongly influence the results of a literature review [17]. By implementing the search strategy, articles were collected from the selected databases to answer the designed RQs.
Selection of Database and Search Queries. The analysis started with a preliminary search covering the essence of OD and prediction, using Google Scholar to identify keywords and to gain an idea of the accessible and relevant articles. Google Scholar was selected because it is a widely accessible web search engine that indexes the full text or metadata of scholarly literature (articles, patents, citations, etc.) across most peer-reviewed online journals and provides a citation-tracking tool [24]. The quick search results are shown in Table 1.
Table 1. Quick search results of publications found by using Google Scholar
Google Scholar search keywords | Number of articles
Open Data and Prediction | 4,710,000
“Open Data” AND “Prediction” | 36,600
“Open Data” AND (“Prediction” OR “Predict”) | 49,600
The quick search showed that the keyword “Predict” is used as a synonym of “Prediction” in some of the literature; this emerged after certain keywords were merged and searched over multiple iterations. Because literature from various organisations and working groups is widely distributed, the search was restricted to Computer Science (CS) and Information Technology (IT). In summary, the search strategy was defined by the following criteria:
• Keywords: “Open Data” AND (“Prediction” OR “Predict”)
• Searched within: Article or Document Title AND Abstract
• Field of research: Computer Science (CS) and Information Technology (IT)
• Publication years: 2011 to 2020
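The search criteria listed above can be expressed as a simple filter. The sketch below is a minimal illustration in Python, not the tooling used in the study: the `Record` fields, the sample titles and the choice to evaluate the query over title and abstract together are assumptions, and real databases apply their own tokenisation and stemming.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical search hit; field names are illustrative, not a database schema."""
    title: str
    abstract: str
    year: int
    field: str  # research field label, e.g. "CS" or "IT"

# "Open Data" AND ("Prediction" OR "Predict"), matched as whole words, case-insensitive.
OPEN_DATA = re.compile(r"\bopen data\b", re.IGNORECASE)
PREDICT = re.compile(r"\bpredict(?:ion)?\b", re.IGNORECASE)
FIELDS = {"CS", "IT"}

def meets_criteria(rec: Record, start: int = 2011, end: int = 2020) -> bool:
    # Assumption: the keyword query is evaluated over title and abstract together.
    text = f"{rec.title} {rec.abstract}"
    return (OPEN_DATA.search(text) is not None
            and PREDICT.search(text) is not None
            and start <= rec.year <= end
            and rec.field in FIELDS)

records = [
    Record("Open data for traffic prediction", "A case study.", 2016, "CS"),
    Record("Predict crop yield", "We use open data portals.", 2019, "IT"),
    Record("Open data portals", "A usability study.", 2018, "CS"),     # no prediction term
    Record("Open data to predict crime", "A case study.", 2009, "CS"),  # outside 2011-2020
]
selected = [r.title for r in records if meets_criteria(r)]
print(selected)
```

Whole-word matching means a term such as “predicted” is not counted, which mirrors the literal keywords rather than any stemming a database might perform.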
Prior reports [18] identify IEEE, ACM and Science Direct as the most useful databases in CS and IT. IEEE was selected because it is a major organisation for innovative technology excellence [25], and ACM remains the largest CS database in the world [26]. Scopus was selected because it is the largest abstract and citation database of peer-reviewed literature and gives a comprehensive overview of the world's research output [27], and EbscoHost was added as an additional digital database. Table 2 shows how the 576 prediction-related OD articles retrieved are distributed across these digital databases.
N. Ismail and U. K. Yusof
Table 2. Distribution of publications (n = 576)
Scientific databases | Number of articles
Scopus (https://scopus.com) | 340
Science Direct (https://www.sciencedirect.com) | 73
IEEE Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp) | 72
EbscoHost (https://search.ebscohost.com/) | 58
ACM (https://dl.acm.org) | 33
Study Selection Criteria. The search returned a large number of articles (n = 576); after duplicate articles were identified and removed, 181 articles remained. The abstracts of these articles were then checked to exclude those unrelated to the subject, based on the inclusion and exclusion criteria in Table 3, leaving 105 articles.
Table 3. Inclusion and exclusion criteria
Inclusion criteria
• Include primary studies related to the RQs
• Research article or journal issue closely related to the topic of the RQs
• Articles explaining “open data” AND “prediction”
• Studies or research conducted in industry, government or any academic setting
• The full text of the publication is available
Exclusion criteria
• A duplicate copy of the same research study
• Publications in which OD and prediction are not defined
• Papers written in languages other than English
• Business articles (general business issues)
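The selection funnel described in the text (576 articles retrieved, 181 after removing duplicates, 105 after abstract screening against Table 3, and finally 72 after full-text review) can be sketched as successive filters. This is a hypothetical illustration: it assumes titles are the de-duplication key and that the reviewer's judgements are stored as boolean flags with invented names; the sample records are made up.

```python
import re

def title_key(title: str) -> str:
    """Normalise a title so copies exported from different databases compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first copy of each title (the same paper is often found in several databases)."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        key = title_key(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def passes_abstract_screen(rec: dict) -> bool:
    """Hypothetical reviewer flags mirroring the exclusion criteria of Table 3."""
    return (rec["language"] == "English"
            and rec["defines_od_and_prediction"]
            and not rec["general_business"])

def funnel(records: list[dict]) -> tuple[int, int, int, int]:
    unique = deduplicate(records)
    screened = [r for r in unique if passes_abstract_screen(r)]
    included = [r for r in screened if r["full_text_available"]]
    return len(records), len(unique), len(screened), len(included)

def rec(title, language="English", defines=True, business=False, full_text=True):
    return {"title": title, "language": language, "defines_od_and_prediction": defines,
            "general_business": business, "full_text_available": full_text}

sample = [
    rec("Open Data for Crime Prediction"),
    rec("Open data for crime prediction."),  # same paper exported from another database
    rec("Predicting retail sales with open data", business=True),
    rec("Vorhersage mit offenen Daten", language="German"),
    rec("Open data and dengue prediction", full_text=False),
]
print(funnel(sample))  # (5, 4, 2, 1)
```

In practice a DOI, when present, is a more reliable de-duplication key than a normalised title; the title key is used here only to keep the sketch short.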
2.3 Results Included
The articles to be included in the analysis were first limited on the basis of their title, abstract and availability. A paper was accepted only after its full text had been read and systematically mapped to the present study. After the final review, 72 articles were included across all the processes, as shown in Fig. 2. Table 4 shows the distribution of these articles by scientific database; IEEE Xplore has the highest number, with 34 articles.
Fig. 2. The final articles included (n = 72)
2.4 Data Extraction
In this process, the full texts of the articles were reviewed in detail and classified, from the data collected by the search strategy, to answer each RQ. The detailed analysis of the papers was based on the elicitation methods developed from the systematic examination of empirical evidence [28, 29]. All the articles were reviewed against the identified BRQs and CRQs to explain the findings, and the frequency of the publications was then determined.
Table 4. Distribution of included publications (n = 72)
Scientific databases | Number of articles included
IEEE Xplore | 34
Science Direct | 14
ACM | 12
EbscoHost | 7
Scopus | 5
2.5 Data Extraction
Bibliometric Research Questions (BRQs). In this part, the bibliographic data of the papers were checked and gathered to address the RQs.
BRQs1 - Articles published range: whether the number of articles has been rising or declining by year.
BRQs2 - Journal database: the databases and journals with the greatest number of articles among the selected studies.
BRQs3 - Active countries: the publication-producing countries of the included articles, examined to see how their degree of contribution and activity in OD and prediction research has changed over the past 10 years. Although some articles involve authors from more than one country, only the first author's country was recorded. If a study's country was not stated, the author's affiliation was used.
Content Research Questions (CRQs). In this part, the content of the papers was studied and the information required for the RQs was gathered.
CRQs1 - Research method: ML approaches have recently been widely used for prediction, and data mining supports many decisions [30]. The literature shows that ML methods can be divided into supervised, unsupervised and semi-supervised techniques, each with its own algorithms, as shown in Fig. 3 [31–33]. The research method of each article was extracted as stated or, where it was not stated, determined from the research design by analysing the information used in the article.
CRQs2 - Datasets type: the type of dataset used for ML prediction in each article.
Fig. 3. The overview of machine learning techniques classification
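The extraction rules above (record only the first author's country, fall back to the affiliation, and classify each article's method as supervised, unsupervised or semi-supervised) can be captured in a small record type. The sketch below assumes a hypothetical `ExtractionForm` with invented field names; the algorithm-to-category table holds a few illustrative entries, not a transcription of Fig. 3, and the sample articles are made up.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative algorithm-to-category mapping (example entries only, not a copy of Fig. 3).
TAXONOMY = {
    "logistic regression": "supervised",
    "naive bayes": "supervised",
    "random forest": "supervised",
    "svm": "supervised",
    "linear regression": "supervised",
    "k-means": "unsupervised",
    "k-medoids": "unsupervised",
    "self-training": "semi-supervised",
}

@dataclass
class ExtractionForm:
    """Hypothetical per-article record collecting the BRQ and CRQ data items."""
    year: int
    database: str
    venue: str
    author_countries: list[str]  # countries in author order; "" when not stated
    affiliation_country: Optional[str] = None
    algorithms: list[str] = field(default_factory=list)
    dataset_type: str = "unknown"

    def country(self) -> Optional[str]:
        # BRQs3: record only the first author's country; fall back to the affiliation.
        if self.author_countries and self.author_countries[0]:
            return self.author_countries[0]
        return self.affiliation_country

    def ml_categories(self) -> set[str]:
        # CRQs1: map each reported algorithm to a technique category.
        return {TAXONOMY.get(a.lower(), "unclassified") for a in self.algorithms}

a = ExtractionForm(2019, "IEEE Xplore", "IEEE BigComp", ["Greece", "Italy"],
                   algorithms=["K-means", "Random Forest"], dataset_type="government")
b = ExtractionForm(2017, "ACM", "SAC", [""], affiliation_country="Taiwan",
                   algorithms=["Naive Bayes"], dataset_type="healthcare")
print(a.country(), sorted(a.ml_categories()))  # Greece ['supervised', 'unsupervised']
print(b.country(), sorted(b.ml_categories()))  # Taiwan ['supervised']
```

An article whose categories set has more than one member corresponds to the “multiple ML approach” studies discussed later.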
3 Results
The findings from the SLR were mapped to better understand the trend of the studies, and the results are presented for each RQ.
3.1 Bibliometric Research Questions (BRQs)
BRQs1 - Articles Published Range. A quantitative analysis of the OD and prediction papers was carried out in this section to give an overview of whether there has been a rising or declining trend each year. Fig. 4 shows the distribution of all 72 listed publications from 2011 to 2020.
Fig. 4. The publications distributed by years (n = 72)
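The yearly counts behind Fig. 4 can be read from the “No. of articles” row of Table 6. As a quick consistency check (a sketch, not part of the original analysis), summing them by period reproduces the n = 13 of Table 7 and the n = 59 of Table 8, and locates the peak year; note that 2020 covers publications only up to September.

```python
# Articles per year, read from the "No. of articles" row of Table 6 (2020: to September).
ARTICLES = {2011: 0, 2012: 1, 2013: 1, 2014: 5, 2015: 6,
            2016: 9, 2017: 10, 2018: 11, 2019: 19, 2020: 10}

def period_total(counts: dict[int, int], start: int, end: int) -> int:
    """Number of articles published between start and end, inclusive."""
    return sum(n for year, n in counts.items() if start <= year <= end)

def cumulative(counts: dict[int, int]) -> list[tuple[int, int]]:
    """Running total by year, e.g. for plotting cumulative growth."""
    total, out = 0, []
    for year in sorted(counts):
        total += counts[year]
        out.append((year, total))
    return out

early = period_total(ARTICLES, 2011, 2015)  # n of Table 7
late = period_total(ARTICLES, 2016, 2020)   # n of Table 8
peak = max(ARTICLES, key=ARTICLES.get)
print(early, late, peak, cumulative(ARTICLES)[-1])  # 13 59 2019 (2020, 72)
```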
As Fig. 4 shows, no publications were found for 2011 and only two relevant articles were identified before 2014. Nevertheless, the research area has grown rapidly since 2014. This growth appears to be driven by encouragement and motivation from governments and from practitioners themselves. For example, the revised Public Sector Information Directive, adopted in 2013 [34], extended the public data freely available from European public institutions to include detailed data on cultural heritage. Furthermore, in May 2013, the White House directed federal agencies to make open and machine-readable data the default for government information and to provide means, such as public APIs, through which government and private developers can access it [35, 36]. In addition, German policy makers, public authorities, the private sector and researchers adopted the Dresden Agreement in response to users' interest in convenient, organised and user-friendly access to the Open Government Platform [37].
BRQs2 - Scientific Journal Name. The 72 selected articles were published in 65 different scientific journals and conference proceedings between 2011 and 2020; the top six are shown in Table 5. IEEE venues contributed 23.6% (17 articles) of the total, followed by CoDIT, CIKM, Molecules, Renewable and Sustainable Energy Reviews and SAC with two articles each.
BRQs3 - Active Countries. Table 6 lists the countries by their degree of contribution to the use of OD for prediction. China, Italy, the United Kingdom and the United States are the four leading countries, with proportions of 15.28%, 9.72%, 9.72% and 8.33%, respectively, together accounting for 43.05% of all studies.
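The proportions in Table 6 are each country's first-author count divided by the 72 included articles. The sketch below recomputes the four leading shares from the counts in Table 6, grouping the remaining 41 articles under a hypothetical “Other” label for brevity. Computed from the raw counts, the combined share is 31/72 ≈ 43.06%; the 43.05% quoted in the text is the sum of the already-rounded proportions.

```python
from collections import Counter

# First-author countries of the 72 included articles (counts from Table 6);
# the other 26 countries (41 articles) are lumped into "Other" for brevity.
countries = ["China"] * 11 + ["Italy"] * 7 + ["UK"] * 7 + ["USA"] * 6 + ["Other"] * 41

freq = Counter(countries)
total = sum(freq.values())
top4 = [c for c, _ in freq.most_common() if c != "Other"][:4]  # ties keep insertion order
shares = {c: round(100 * freq[c] / total, 2) for c in top4}

combined_exact = round(100 * sum(freq[c] for c in top4) / total, 2)  # from raw counts
combined_rounded = round(sum(shares.values()), 2)                     # sum of rounded shares
print(shares, combined_exact, combined_rounded)
```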
Italy has been the only country to publish on an ongoing basis since 2012, while Chinese scholars, the most prolific producers, have been most actively publishing in 2019 and 2020. The highest numbers of participating countries and articles were recorded in 2019, with 13 countries and 19 articles, respectively. Moreover, to examine patterns in the cumulative contribution of countries, a more detailed analysis was carried out for the first five years, from 2011 to 2015
Table 5. Top 6 scientific journals with the greatest number of articles published
Journal names | No. of articles included
IEEE (IEEE Access, IEEE BigComp, IEEE BigDataService, ICCCS, CCWC, etc.) | 17
CoDIT (Conference on Control, Decision and Information Technologies) | 2
CIKM (ACM International Conference on Information and Knowledge Management) | 2
Molecules | 2
Renewable and Sustainable Energy Reviews | 2
SAC (ACM/SIGAPP Symposium on Applied Computing) | 2
as shown in Table 7, and for the remaining years, from 2016 to September 2020, as shown in Table 8. Tables 7 and 8 reveal some remarkable contrasting patterns. The number of countries studying OD increased significantly, from 9 countries (2011–2015) to 26 countries (2016–2020), nearly a threefold increase. Government memorandums [35, 38] and guidelines on OD principles [39] may have motivated this growth in publication-producing countries over the last five years. Over the whole 10 years, China has stayed consistently at the top of the list. However, Italy shows the most dramatic change between the two tables, rising from a single article in 2011–2015 to six articles, second only to China, in 2016–2020.
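As a worked check of the change in participating countries between the two periods (9 countries in Table 7, 26 countries in Table 8), the relative increase is:

```latex
\Delta = \frac{26 - 9}{9} \times 100\% = \frac{17}{9} \times 100\% \approx 188.9\%
```

That is, the number of publication-producing countries roughly tripled ($26/9 \approx 2.89$).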
3.2 Content Research Questions (CRQs)
CRQs1 - Research Method. The research methods of the papers were extracted and classified into the three categories of Machine Learning (ML) techniques, as shown in Fig. 5. Over the last 10 years, supervised ML techniques have been the most widely published, with 59 publications (81.9%), and have contributed most to the development of prediction using OD.
Fig. 5. Machine learning techniques classified from the articles included (n = 72)
Table 6. The frequency of countries contributed to OD in prediction (n = 72)
No. | Countries | Freq | Prop (%)
1 | China | 11 | 15.28
2 | Italy | 7 | 9.72
3 | UK | 7 | 9.72
4 | USA | 6 | 8.33
5 | Australia | 3 | 4.17
6 | India | 3 | 4.17
7 | Taiwan | 3 | 4.17
8 | South Korea | 3 | 4.17
9 | Canada | 2 | 2.78
10 | Germany | 2 | 2.78
11 | Indonesia | 2 | 2.78
12 | Japan | 2 | 2.78
13 | Malaysia | 2 | 2.78
14 | Netherlands | 2 | 2.78
15 | Russia | 2 | 2.78
16 | Austria | 1 | 1.39
17 | Cameroon | 1 | 1.39
18 | Colombia | 1 | 1.39
19 | Czech Rep | 1 | 1.39
20 | Finland | 1 | 1.39
21 | France | 1 | 1.39
22 | Greece | 1 | 1.39
23 | Hong Kong | 1 | 1.39
24 | Ireland | 1 | 1.39
25 | Peru | 1 | 1.39
26 | Philippines | 1 | 1.39
27 | Poland | 1 | 1.39
28 | Saudi Arabia | 1 | 1.39
29 | Spain | 1 | 1.39
30 | Turkey | 1 | 1.39
Total | | 72 | 100.00
Year | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020
No. of participating countries | 0 | 1 | 1 | 4 | 5 | 7 | 9 | 10 | 13 | 9
No. of articles | 0 | 1 | 1 | 5 | 6 | 9 | 10 | 11 | 19 | 10
* For all publications with more than one country involved, the first author (country) was mentioned.
Table 7. Open Data in prediction publications-producing countries 2011–2015 (n = 13)
Rank no. | Countries | Total | %
1 | China | 3 | 23.1
1 | UK | 3 | 23.1
2 | Italy | 1 | 7.7
2 | India | 1 | 7.7
2 | Indonesia | 1 | 7.7
2 | Czech Republic | 1 | 7.7
2 | France | 1 | 7.7
2 | Ireland | 1 | 7.7
2 | Spain | 1 | 7.7
Table 8. Open Data in prediction publications-producing Countries 2016–2020 (n = 59)
Rank no. | Countries | Total | %
1 | China | 8 | 13.6
2 | Italy | 6 | 10.2
2 | USA | 6 | 10.2
3 | UK | 4 | 6.8
4 | Australia | 3 | 5.1
4 | Taiwan | 3 | 5.1
4 | South Korea | 3 | 5.1
5 | India | 2 | 3.4
5 | Canada | 2 | 3.4
5 | Germany | 2 | 3.4
5 | Japan | 2 | 3.4
5 | Malaysia | 2 | 3.4
5 | Netherlands | 2 | 3.4
5 | Russia | 2 | 3.4
6 | Indonesia | 1 | 1.7
6 | Austria | 1 | 1.7
6 | Cameroon | 1 | 1.7
6 | Colombia | 1 | 1.7
6 | Finland | 1 | 1.7
6 | Greece | 1 | 1.7
6 | Hong Kong | 1 | 1.7
6 | Peru | 1 | 1.7
6 | Philippines | 1 | 1.7
6 | Poland | 1 | 1.7
6 | Saudi Arabia | 1 | 1.7
6 | Turkey | 1 | 1.7
CRQs2 - Datasets Type. It is equally relevant to examine the types of datasets that have been used for prediction. Fig. 6 shows the distribution of dataset types: government datasets were the largest group, used in 18 papers (25%). The governmental data covered government policies, laws, social, socio-economic and community data; specifically, the datasets were directly related to crime, pandemics, security and terrorism. Environmental data were second, with 11 publications (15.3%), covering electricity, power plants, water, topography and meteorology. Healthcare and biomedicine followed in third place with ten publications (13.9%), using data generated as part of scientific research in areas such as biology, pharmacology, chemistry, medicine and the life sciences. A total of 9 publications (12.5%) used education data related to educational environments, including students' performance, learning activities and curricula. Transportation, finance, ICT, and tourism and entertainment datasets were the least common.
Fig. 6. Proportions of the dataset types employed in the studies (n = 72)
Fig. 7. The trend of Machine Learning Techniques distribution by year (n = 72)
4 Discussion
Mapping the ML techniques used for prediction from 2011 to 2020 shows that the use of supervised ML techniques has grown dramatically since 2012 and is expected to increase further in the coming years, as shown in Fig. 7. Among the selected articles, the classification approach in supervised ML is more popular, with 33 articles, than the regression approach. The literature shows that classification, which predicts a discrete output value, can be implemented with several algorithms, such as Logistic Regression [36, 40–42], Naive Bayes [43–48], Random Forest [49–51], Support Vector Machine (SVM) [52–54], Neural Network [55–58], Gradient Boosting Decision Tree (GBDT) [59] and the k-Nearest Neighbor algorithm (kNN) [60]. Regression, which predicts a continuous output value, was implemented with algorithms such as Linear Regression [44, 61–63], Decision Tree Regressor [64–66], Artificial Neural Network (ANN) [67–70] and Support Vector Regressor [71]. Even though unsupervised ML techniques are less common in the literature, it is interesting that several clustering approaches have been used, such as K-Medoids [72], Fuzzy clustering [73] and K-means [55]. This approach is useful for data with no pre-existing labels: with minimal human oversight, it searches for previously undetected patterns. Several articles combine multiple ML approaches to improve prediction, using more than one algorithm from the supervised, unsupervised and semi-supervised ML techniques [46, 47, 74–76]. Furthermore, modern ML tools such as WEKA (Waikato Environment for Knowledge Analysis) have been used to test ML algorithms such as SVM, Decision Trees, ANN and Linear Regression [29, 55, 66]. WEKA includes a range of algorithms and visualisation tools for data analysis and predictive modelling, together with graphical user interfaces for easy access to these features. Another tool presented was Mathematica™, in which ML algorithms such as Random Forest can be applied to datasets [12].
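The distinction drawn above between classification (discrete output) and regression (continuous output) can be illustrated with kNN, one of the algorithms cited [60]. The sketch below is a plain-Python illustration on made-up two-feature data, not code from the reviewed studies: the same neighbour search yields a class label by majority vote or a numeric prediction by averaging the neighbours' targets (kNN regression is added here only for comparison).

```python
import math
from collections import Counter

Point = tuple[float, float]

def k_nearest(train, query: Point, k: int):
    """Return the k (features, target) pairs closest to `query` by Euclidean distance."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query: Point, k: int = 3) -> str:
    # Classification: a majority vote among the neighbours gives a discrete label.
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query: Point, k: int = 3) -> float:
    # Regression: the mean of the neighbours' targets gives a continuous value.
    neighbours = k_nearest(train, query, k)
    return sum(y for _, y in neighbours) / len(neighbours)

# Made-up samples: the same inputs carry either a class label or a numeric target.
labelled = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
            ((5.0, 5.0), "high"), ((6.0, 5.5), "high"), ((5.5, 6.0), "high")]
numeric = [((1.0, 1.0), 10.0), ((1.5, 2.0), 12.0),
           ((5.0, 5.0), 40.0), ((6.0, 5.5), 44.0), ((5.5, 6.0), 45.0)]

print(knn_classify(labelled, (5.2, 5.1)))  # high
print(knn_regress(numeric, (5.2, 5.1)))    # 43.0
```

The choice of k and the distance metric are free parameters; tools such as WEKA expose them as settings rather than requiring hand-written code.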
5 Conclusion and Research Opportunities
This research draws attention to the immense potential of OD sources to influence users' attitudes and behaviours. In practice, ML prediction tools can contribute significantly to supporting users' decision-making in many fields. By providing knowledge about current trends in the OD setting, the latest developments and the need for future research, the results of this study could assist organisations, software developers and researchers. In addition, the results of this study may have a real impact on the computing community, especially for those dealing with the OD and ML revolution. It is hoped that the results and findings of this research will be widely disseminated in order to obtain more feedback for further growth.
Acknowledgement. The authors wish to thank Universiti Sains Malaysia (USM) and Universiti Malaysia Perlis (UniMAP) for the support they extended in the completion of the present research.
References
1. Open Knowledge Foundation: What is open data? (2014). https://okfn.org/opendata/. Accessed 1 Apr 2019
2. Open Data Handbook: What is open data? (2012). https://opendatahandbook.org/en/what-is-open-data/index.html. Accessed 1 Apr 2019
3. W3C (e-Gov): eGovernment at W3C: improving access to government through better use of the web (2009). https://www.w3.org/2007/eGov/. Accessed 1 Apr 2019
4. Obama, B.: Transparency and open government. Memorandum for the heads of executive departments and agencies (2009)
5. Foulonneau, M., Martin, S., Turki, S.: How open data are turned into services? In: International Conference on Exploring Services Science, pp. 31–39. Springer, Cham (2014)
6. Office of Management and Budget (OMB): Memorandum M-10-06, open government directive (2013). https://goo.gl/LcxbZE. Accessed 1 Apr 2019
7. Directive 2013/37/EU of the European Parliament and of the Council: Amending directive 2003/98/EC on the re-use of public sector information known as the “PSI directive” (2013). https://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp207en.pdf. Accessed 1 Apr 2019
8. Insights; Publications: What executives should know about open data (2014). https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/what-executives-should-know-about-open-data. Accessed 1 Apr 2019
9. MAMPU: Our open data policy (2017). https://data.gov.my. Accessed 13 Sept 2019
10. Lindman, J., Kinnari, T., Rossi, M.: Industrial open data: case studies of early open data entrepreneurs. In: 2014 47th Hawaii International Conference on System Sciences, pp. 739–748. IEEE (2014)
11. Song, S.H., Kim, T.D.: A study on the open platform modeling for linked open data ecosystem in public sector. In: 2013 15th International Conference on Advanced Communications Technology (ICACT), pp. 730–734. IEEE (2013)
12. Pantano, E., Priporas, C.V., Stylos, N.: ‘You will like it!’ using open data to predict tourists’ response to a tourist attraction. Tourism Manage. 60, 430–438 (2017)
13. Chu, S.C., Kim, Y.: Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. Int. J. Advert. 30(1), 47–75 (2011)
14. Diffley, S., Kearns, J., Bennett, W., Kawalek, P.: Consumer behaviour in social networking sites: implications for marketers. Irish J. Manage. (2011)
15. Jai, T.M.C., Burns, L.D.: Attributes of apparel tablet catalogs: value proposition comparisons. J. Fashion Mark. Manage. (2014)
16. Turban, E., King, D., Lee, J.K., Liang, T.P., Turban, D.C.: Social commerce: foundations, social marketing, and advertising. In: Electronic Commerce, pp. 309–364. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-10091-3_7
17. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)
18. Bizer, C., Heath, T., Berners-Lee, T.: Linked data: the story so far. In: Semantic Services, Interoperability and Web Applications: Emerging Concepts, pp. 205–227. IGI Global (2011)
19. Davis, A., Dieste, O., Hickey, A., Juristo, N., Moreno, A.M.: Effectiveness of requirements elicitation techniques: empirical results derived from a systematic review. In: 14th IEEE International Requirements Engineering Conference (RE 2006), pp. 179–188. IEEE (2006)
20. Maglyas, A., Nikula, U., Smolander, K.: What do we know about software product management? - A systematic mapping study. In: 2011 Fifth International Workshop on Software Product Management (IWSPM), pp. 26–35. IEEE (2011)
21. Budgen, D., Burn, A.J., Brereton, O.P., Kitchenham, B.A., Pretorius, R.: Empirical evidence about the UML: a systematic literature review. Softw. Pract. Experience 41(4), 363–392 (2011)
22. Yin, R.K.: Validity and generalization in future case study evaluations. Evaluation 19(3), 321–332 (2013)
23. Sadoughi, F., Behmanesh, A., Sayfouri, N.: Internet of things in medicine: a systematic mapping study. J. Biomed. Inform. 103, 103383 (2020)
24. Halevi, G., Moed, H., Bar-Ilan, J.: Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—review of the literature. J. Informetrics 11(3), 823–834 (2017)
25. Madarash-Hill, C., Hill, J.B.: Enhancing access to IEEE conference proceedings: a case study in the application of IEEE Xplore full text and table of contents enhancements. Sci. Technol. Libr. 24(3–4), 389–399 (2004)
26. Zelevinsky, V., Wang, J., Tunkelang, D.: Supporting exploratory search for the ACM digital library. In: Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2008), pp. 85–88 (2008)
27. Boyle, F., Sherman, D.: Scopus™: The product and its development. Serials Librarian 49(3), 147–153 (2006)
28. Lindman, J., Rossi, M., Tuunainen, V.K.: Open data services: Research agenda. In: 2013 46th Hawaii International Conference on System Sciences, pp. 1239–1246. IEEE (2013)
29. Derguech, W., Bruke, E., Curry, E.: An autonomic approach to real-time predictive analytics using open data and internet of things. In: 2014 IEEE 11th International Conference on Ubiquitous Intelligence and Computing and 2014 IEEE 11th International Conference on Autonomic and Trusted Computing and 2014 IEEE 14th International Conference on Scalable Computing and Communications and Its Associated Workshops, pp. 204–211. IEEE (2014)
30. Alyahyan, E., Düştegör, D.: Predicting academic success in higher education: literature review and best practices. Int. J. Educ. Technol. High. Educ. 17(1), 3 (2020)
31. Castañón, J.: 10 machine learning methods that every data scientist should know. Accessed 16 Oct 2019
32. Kononenko, I., Kukar, M.: Machine learning basics. Mach. Learn. Data Min. 59–105 (2007)
33. Zawacki-Richter, O., Marín, V.I., Bond, M., Gouverneur, F.: Systematic review of research on artificial intelligence applications in higher education–where are the educators? Int. J. Educ. Technol. High. Educ. 16(1), 39 (2019)
34. Schultz, M., Shatter, A.: Directive 2013/37/EU of the European Parliament and of the council of 26 June 2013 amending directive 2003/98/EC on the re-use of public sector information. Official J. Eur. Union Brussels (2013)
35. Obama, B.: Executive order--making open and machine readable the new default for government information. The White House (2013)
36. Weerakkody, V., Sivarajah, U., Mahroof, K., Maruyama, T., Lu, S.: Influencing subjective well-being for business and sustainable development using big data and predictive regression analysis. J. Bus. Res. (2020)
37. Hunnius, S., Krieger, B., Schuppan, T.: Providing, guarding, shielding: open government data in Spain and Germany. In: European Group for Public Administration Annual Conference, Speyer, Germany (2014)
38. Wright, F.: Data Gov. pp. 77–82 (2014)
39. Nugroho, R.P., Zuiderwijk, A., Janssen, M., de Jong, M.: A comparison of national open data policies: lessons learned. Transforming Government: People, Process and Policy (2015)
40. Xue, J.: Financial risk prediction and evaluation model of P2P network loan platform. In: 2020 12th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 1060–1064. IEEE (2020)
41. Alloghani, M., Aljaaf, A.J., Al-Jumeily, D., Hussain, A., Mallucci, C., Mustafina, J.: Data science to improve patient management system. In: 2018 11th International Conference on Developments in eSystems Engineering (DeSE), pp. 27–30. IEEE (2018)
42. Sarker, F., Tiropanis, T., Davis, H.C.: Linked data, data mining and external open data for better prediction of at-risk students. In: 2014 International Conference on Control, Decision and Information Technologies (CoDIT), pp. 652–657. IEEE (2014)
43. Capariño, E.T., Sison, A.M., Medina, R.P.: Application of the modified imputation method to missing data to increase classification performance. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 134–139. IEEE (2019)
44. Rao, A.R., Clarke, D.: A comparison of models to predict medical procedure costs from open public healthcare data. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2018)
45. Tuke, J., Nguyen, A., Nasim, M., Mellor, D., Wickramasinghe, A., Bean, N., Mitchell, L.: Pachinko prediction: a Bayesian method for event prediction from social media data. Inf. Process. Manage. 57(2), 102147 (2020)
46. Zhang, Y., Siriarya, P., Kawai, Y., Jatowt, A.: Automatic latent street type discovery from web open data. Inf. Syst. 101536 (2020)
47. Tarasova, O., Poroikov, V.: HIV resistance prediction to reverse transcriptase inhibitors: focus on open data. Molecules 23(4), 956 (2018)
48. Noymanee, J., Nikitin, N.O., Kalyuzhnaya, A.V.: Urban pluvial flood forecasting using open data with machine learning techniques in Pattani basin. Procedia Comput. Sci. 119, 288–297 (2017)
49. Rocca, G.B., Castillo-Cara, M., Levano, R.A., Herrera, J.V., Orozco-Barbosa, L.: Citizen security using machine learning algorithms through open data. In: 2016 8th IEEE Latin-American Conference on Communications (LATINCOM), pp. 1–6. IEEE (2016)
50. Dias, G.M., Bellalta, B., Oechsner, S.: Predicting occupancy trends in Barcelona’s bicycle service stations using open data. In: 2015 SAI Intelligent Systems Conference (IntelliSys), pp. 439–445. IEEE (2015)
51. Montanari, F., Zdrazil, B.: How open data shapes in silico transporter modeling. Molecules 22(3), 422 (2017)
52. Chen, Y.Y., Lv, Y., Li, Z., Wang, F.Y.: Long short-term memory model for traffic congestion prediction with online open data. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 132–137. IEEE (2016)
53. Asat, A.N., Mahat, A.F., Hassan, R., Ahmed, A.S.: Development of dengue detection and prevention system (Deng-E) based upon open data in Malaysia. In: 2017 6th International Conference on Electrical Engineering and Informatics (ICEEI), pp. 1–6. IEEE (2017)
54. Nechaev, Y., Corcoglioniti, F., Giuliano, C.: Type prediction combining linked open data and social media. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1033–1042 (2018)
55. Li, R., Xiong, H., Zhao, H.: More than address: pre-identify your income with the open data. In: 2015 International Conference on Cloud Computing and Big Data (CCBD), pp. 193–200. IEEE (2015)
56. Qiao, C., Hu, X.: A joint neural network model for combining heterogeneous user data sources: an example of at-risk student prediction. J. Am. Soc. Inf. Sci. 71(10), 1192–1204 (2020)
57. Gutierrez-Osorio, C., Pedraza, C.: Modern data sources and techniques for analysis and forecast of road accidents: a review. J. Traffic Transp. Eng. (English edition) (2020)
58. Panda, M.: Learning crisis management information system from open crisis data using hybrid soft computing. Int. J. Hybrid Intell. Syst. 12(3), 145–156 (2015)
59. Chen, S., Wang, Q., Liu, S.: Credit risk prediction in peer-to-peer lending with ensemble learning framework. In: 2019 Chinese Control and Decision Conference (CCDC), pp. 4373–4377. IEEE (2019)
60. Chen, H., Hu, Q., He, L.: Clairvoyant: an early prediction system for video hits. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 2054–2056 (2014)
61. Pohjankukka, J., Riihimäki, H., Nevalainen, P., Pahikkala, T., Ala-Ilomäki, J., Hyvönen, E., Heikkonen, J.: Predictability of boreal forest soil bearing capacity by machine learning. J. Terramech. 68, 1–8 (2016)
62. Lubis, F.F., Rosmansyah, Y., Supangkat, S.H.: Gradient descent and normal equations on cost function minimization for online predictive using linear regression with multiple variables. In: 2014 International Conference on ICT for Smart Society (ICISS), pp. 202–205. IEEE (2014)
63. Lin, B.H., Tseng, S.F.: A predictive analysis of citizen hotlines 1999 and traffic accidents: a case study of Taoyuan city. In: 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 374–376. IEEE (2017)
64. Wu, C.H., Kao, S.C., Kan, M.H.: Knowledge discovery in open data of dengue epidemic. In: Proceedings of the 4th Multidisciplinary International Social Networks Conference, pp. 1–8 (2017)
65. Grzegorowski, M.: Massively parallel feature extraction framework application in predicting dangerous seismic events. In: 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 225–229. IEEE (2016)
66. Sarker, F., Tiropanis, T., Davis, H.C.: Students’ performance prediction by using institutional internal and external open data sources (2013)
67. Prabakar, A., Wu, L., Zwanepol, L., Van Velzen, N., Djairam, D.: Applying machine learning to study the relationship between electricity consumption and weather variables using open data. In: 2018 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), pp. 1–6. IEEE (2018)
68. Goldstein, E.B., Coco, G., Plant, N.G.: A review of machine learning applications to coastal sediment transport and morphodynamics. Earth Sci. Rev. 194, 97–108 (2019)
69. Lee, J., Park, G.L.: Temporal data stream analysis for EV charging infrastructure in Jeju. In: Proceedings of the International Conference on Research in Adaptive and Convergent Systems, pp. 36–39 (2017)
70. Cecconi, F.R., Moretti, N., Tagliabue, L.C.: Application of artificial neutral network and geographic information system to evaluate retrofit potential in public school buildings. Renew. Sustain. Energy Rev. 110, 266–277 (2019)
71. Petrlik, J., Sekanina, L.: Towards robust and accurate traffic prediction using parallel multiobjective genetic algorithms and support vector regression. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pp. 2231–2236. IEEE (2015)
72. Shen, S.K., Liu, W., Zhang, T.: Load pattern recognition and prediction based on DTW K-mediods clustering and Markov model. In: 2019 IEEE International Conference on Energy Internet (ICEI), pp. 403–408. IEEE (2019)
73. Shan, S., Cao, B.: Forecasting the degree of crowding in urban public open space upon multisource data. In: 2016 9th International Symposium on Computational Intelligence and Design (ISCID), vol. 2, pp. 69–74. IEEE (2016)
74. Violos, J., Pelekis, S., Berdelis, A., Tsanakas, S., Tserpes, K., Varvarigou, T.: Predicting visitor distribution for large events in smart cities. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 1–8. IEEE (2019)
75. Goel, M., Sharma, N., Gurve, M.K.: Analysis of global terrorism dataset using open source data mining tools. In: 2019 International Conference on Computing, Power and Communication Technologies (GUCON), pp. 165–170. IEEE (2019)
76. Pradhan, I., Potika, K., Eirinaki, M., Potikas, P.: Exploratory data analysis and crime prediction for smart cities. In: Proceedings of the 23rd International Database Applications and Engineering Symposium, pp. 1–9 (2019)
Big Data Analytics Based Model for Red Chili Agriculture in Indonesia
Junita Juwita Siregar1 and Arif Imam Suroso2(B)
1 Computer Science Department, School of Computer Science, Bina Nusantara University, West Jakarta 11480, Indonesia
2 School of Business, IPB University, Bogor, West Java, Indonesia
Abstract. Chili, a horticultural food crop, plays a significant role in the Indonesian macroeconomy. Chili production increases in certain months, and domestic demand for chili follows. Because chili production centers are concentrated in a few provinces, production is sometimes unable to meet consumption demand, which causes severe increases in chili prices in certain months. Disparities in chili prices between areas can affect farmers. The purpose of this paper is to develop a model for implementing big data analytics (BDA) in the red chili horticultural agro-industry, applying BDA techniques to develop a predictive model. The research used a qualitative content analysis approach. The result of this study is a proposed BDA-based model that is applicable to the agribusiness of red chili plants in Indonesia. The model could support farmers' decisions in planning an optimal chili production schedule and in planning logistics and distribution chains to several regions, so that farmers can reduce production costs and increase their profit.
Keywords: Machine learning · Big data analytics · Red chili agriculture
1 Introduction
1.1 Background
The chili commodity in Indonesia comprises several variants: large chilies, such as big red chilies and curly red chilies, and cayenne chilies, comprising green and red cayenne peppers [4]. Among these variants, curly red chilies are the most commonly consumed by Indonesians. Cultivating curly chilies requires considerable investment; however, if the price of curly chilies is high at harvest, the profits are also high [26]. Indonesia's chili production centers are located on the island of Java, which contributed 58.3 percent of national chili production in 2009 [8]. The inter-regional disparity in chili prices in February 2019 decreased when viewed by resident card, while the monthly inter-regional price disparity reached 36.56% for red chilies and 37.30% for cayenne pepper compared with January 2019 [26]. Considered by city, fluctuations in red chili prices differ between regions [5]. On the supply side, chili supply, covering both production and distribution, is not fully controlled by farmers. The main factor causing the disparities is that small-scale chili farmers make production decisions that are not supported by good forecasts of production and prices. Extreme price fluctuations and the commodity distribution chain have been a subject of controversy year after year. When the supply of chilies is lower than consumption, the price increases, and vice versa. Domestic production often cannot meet the high demand for chili in a particular region, so dependence on supplies from other regions is inevitable (Fig. 1).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 554–564, 2021.
https://doi.org/10.1007/978-3-030-70713-2_51
Fig. 1. Production of red chili in Indonesia (2014–2018). Source: BPS RI
Based on the problems described above, the aim of this study is to apply big data technology, which is being widely adopted in agriculture, to chili plants in order to achieve modern Agriculture 4.0. Big data is characterized by five Vs: volume, variety, velocity, veracity, and value [14, 24]. Volume refers to the size of the data, which grows as the capacity of data storage devices increases [14, 24]. Variety reflects structured and unstructured data formats [14, 24]. Velocity relates to the speed at which data are transferred, whether in real time, as streams, or in batches [14, 24]. Veracity concerns the quality of the data and the level of confidence in their sources [14, 24]. Value relates to the process of “revealing underexploited value” from big data to support decision making [14, 24]. In this case, big data analytics [22, 27] is used to predict optimum prices and future chili production, helping farmers make more effective production plans by allocating production resources, for example by deciding what volume and what quality to produce to match market demand. Other factors to consider include distribution systems, prices, the implications of inflation, and the determination of ideal commodity prices for farmers. Further problems include the limited land area for chilies, government import interventions during the dry season, imbalance between supply and demand, poor infrastructure that delays the distribution of supplies to the regions, and farmers' lack of capital.
2 Methods
This research conducts a literature review using a qualitative content analysis approach. The authors identified the problems in the chili crop agro-industry using the fishbone (cause-and-effect) [1] methodology (Fig. 2).
Fig. 2. Fishbone [1] analysis of the chili crop agro-industry
Based on the problem analysis, this study reviewed the literature on BDA models and techniques developed for the red chili agro-industry in Indonesia, focusing on big data analytics and its methods of implementation for red chili agriculture. The findings identify the areas in which BDA has been applied to red chili agriculture. The literature review addresses the following questions:
• In what areas is BDA applied to red chili agriculture?
• What BDA techniques and models were used to develop these models?
• What model can accurately increase chili production by predicting future chili prices, supporting farmers in planning logistics and distribution chains to several regions to cover the chili supply gap?
2.1 Review on Big Data Analytics
This review determines the extent to which big data analytics [22, 27] is used to support the decision-making process and identifies the types of red chili farming problems being resolved. We examined BDA research that focuses on predicting the yields of various crops and found six papers analyzing BDA applications. Most studies in the literature examine the application of big data analytics [27] to specific supply chain functions [37]. These studies address various supply chain issues related to sustainability and risk management; in the demand forecasting literature, estimating current and future demand is among the most common predictions when applying big data analytics [27]. The key question is which supply chain problems in chili farming would provide the largest gains if solved. This study therefore examined the extent to which BDA was used to support the decision-making process. Most of the papers reviewed focused on prescriptive applications of BDA [3], optimization [31], machine learning, and data mining; thus, prescriptive analytics [18, 24] that adopts optimization [7] and machine learning modeling seems a reasonable way to support decision making. At the predictive analytics level, classification is the most widely used BDA model in the supply chain management (SCM) [3, 31] context. Classification assigns a large quantity of data objects to predetermined categories, producing highly accurate predictions that feed the prescriptive analytics [18, 24] of BDA through optimization [31] and simulation modeling to support decision making. The distribution of big data analytics across each field of study is shown in Table 1, which classifies the examined literature by main application topic.

Table 1. Classification of the examined literature by BDA application topic, model and technique

BDA topic | BDA model | BDA technique | Reference
Operational | Regression analysis | Machine learning | [9]
Security | Classification | Naïve Bayes | [10]
Framework | Selection | KNN | [11]
Social learning | Clustering | K-means | [12]
Project management | Linear programming | Optimization | [16]
Inventory | Forecasting | Statistical | [17]
Medical | Selection | Hidden Markov | [18]
Finance/banking | Clustering | Genetic algorithm | [19]
SCM | Clustering; regression analysis | Data mining; optimization | [20, 30, 36, 38, 39, 41]
Transportation | Simulation | Deep learning | [21]
Business management | Clustering | Sentiment analysis | [23]
Innovation | Classification | Neural network | [25]
Cancer, cardiovascular | Clustering | Text mining | [28, 29]
Detection system | Classification | SVM | [32]
Crop yield | Forecasting | Machine learning | [33]
Harvesting | Clustering | Time-series | [35]
Product manufacture | Classification | SVM | [36]
Vehicle routing | Clustering | Genetic algorithm | [42]
Logistics | Classification | Machine learning | [43]
2.2 Proposed BDA Model
Fig. 3 shows the proposed Big Data Analytics [22, 27] model approach, which will be implemented for BDA in red chili agriculture.
Fig. 3. Big data analytics model [22] for red chili agriculture
The following explains each big data analytics (BDA) model shown in Fig. 3.
Descriptive Analytics. Descriptive analytics [13, 15] uses two main methods, data aggregation and data mining (also known as data discovery), to analyze historical data. Data aggregation is the process of collecting and organizing data to create manageable data sets. Descriptive analytics [13, 15] can identify areas that need improvement or change. Examples of descriptive analysis include:
• A summary of past events, such as chili sales and marketing distribution data in each province.
• A report on chili sales trends and on demand and price trends, published on a website.
• A compilation of survey results.
In this research, the data collected for descriptive analytics comprise chili production area, chili consumption, and chili production data for Indonesia. For example, chili production data for all Indonesian provinces are available from an official website of the Indonesian Ministry of Agriculture.
Predictive Analytics. Predictive analytics [30] focuses on predicting and understanding what will happen in the future. The analysis examines patterns and trends in the past 10 years of historical data on the chili production area, the amount of chili production, chili supply and demand, and the chili price in each province in Indonesia. The analysis is then applied to determine which factors affect the disparity in chili prices across regions. From these data, a clustering model is developed to support predictions of the future. This informs many aspects of the business, including setting realistic goals, effective planning, managing performance expectations, and avoiding the risk of crop failure. For example, chili consumption data can be retrieved from an official website of the Indonesian Ministry of Agriculture.
Prescriptive Analytics. Prescriptive analytics [19, 25, 35] builds on what descriptive and predictive analysis [34] have produced about the future, providing valuable insights for making the best data-driven decisions to optimize business performance. However, like predictive analytics, this methodology requires a large amount of data, which is not always available, to produce useful results. In addition, the machine learning [2, 40] algorithms on which this analysis is based cannot always account for all external variables. In this research plan, the analysis will yield an optimized value for the production cost of chili crops, making it easier for farmers to plan planting and post-harvest periods. Optimization [31] of chili distribution will be based on classifying regions as high, medium, or low chili producers. This distribution plan can help match chili availability to consumption demand, keeping chili prices stable so that they neither rise too high nor fall too low. Thus, prescriptive analytics [19, 25, 35] will provide policy makers with insights on how to maintain national chili price stability.
Proposed BDA Technique. This research proposes a BDA technique that builds models with the support vector machine (SVM) [15] and long short-term memory (LSTM) [6] machine learning algorithms.
The proposed model aims to show how modified SVM and LSTM algorithms can predict chili price, chili consumption, chili demand, and chili production over a 10-year horizon more accurately than the current model, producing optimization values that the government can use in making policy on national chili food security in Indonesia (Fig. 4).
Data Collection. The data collected for each province comprise the chili planting area, total chili production, total chili consumption per region, the quantity of chili demand and supply in each region, and the chili price in each area, using 2010–2020 historical data from the official websites of the Central Statistics Agency and the Directorate General of Horticulture.
Preprocessing Data. The preprocessing stage is divided into two stages:
Cleaning Data. This stage checks the historical data, such as chili planting area, production, and consumption, for duplicate records, fixes data errors, and identifies missing values, which are set to 0 (zero).
Data Transformation. This stage transforms the numerical data on chili prices, production costs, and chili production into nominal data. The resulting production yield parameter is then used as a classification parameter. The output of this stage is a training dataset.
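The two preprocessing stages above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: records are plain dictionaries with hypothetical field names, missing values are represented as `None`, and numeric values are discretized into "low", "medium", and "high" classes using tertile cut points, a rule the paper does not specify.

```python
from statistics import quantiles  # Python 3.8+


def clean(rows):
    """Cleaning stage: set missing values (None) to 0, then drop duplicate records."""
    seen = set()
    cleaned = []
    for row in rows:
        filled = {key: (0 if value is None else value) for key, value in row.items()}
        signature = tuple(sorted(filled.items()))  # hashable form of the record
        if signature in seen:
            continue  # an identical record was already kept
        seen.add(signature)
        cleaned.append(filled)
    return cleaned


def to_nominal(values):
    """Transformation stage: map numbers to 'low'/'medium'/'high' using tertile
    cut points (an assumed discretization rule; needs at least two values)."""
    low_cut, high_cut = quantiles(values, n=3)
    labels = []
    for value in values:
        if value <= low_cut:
            labels.append("low")
        elif value <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical province-level records (field names and values are illustrative)
rows = [
    {"province": "A", "year": 2019, "production": 10.0},
    {"province": "A", "year": 2019, "production": 10.0},  # duplicate record
    {"province": "B", "year": 2019, "production": None},  # missing value
]
print(clean(rows))
print(to_nominal([10, 20, 30, 40, 50, 60]))
# ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Missing values are filled before duplicates are checked, so a record with a missing value and an otherwise identical record containing 0 collapse into one. Setting missing values to 0 follows the text, although it can bias averages for price or production series.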
Fig. 4. Proposed BDA technique [6, 15] model development
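Because an LSTM learns from ordered sequences, the 2010–2020 history gathered in the Data Collection stage would first have to be framed as input windows paired with next-step targets. The Python sketch below shows one common way to do this; the window length, the hypothetical price series, and the helper names `make_windows` and `chronological_split` are assumptions, and the LSTM itself is not shown.

```python
def make_windows(series, window):
    """Frame a univariate series as (input window, next value) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # `window` past observations
        targets.append(series[start + window])       # the value that follows them
    return inputs, targets


def chronological_split(inputs, targets, test_size):
    """Hold out the last `test_size` samples for testing, without shuffling,
    so the model is never trained on observations later than its test data."""
    if not 0 < test_size < len(inputs):
        raise ValueError("test_size must be between 1 and len(inputs) - 1")
    cut = len(inputs) - test_size
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Hypothetical monthly chili prices (illustrative values only)
monthly_price = [30, 32, 35, 40, 38, 36, 45, 50]
X, y = make_windows(monthly_price, window=3)
train, test = chronological_split(X, y, test_size=1)
print(X[0], y[0])  # [30, 32, 35] 40
print(test)        # ([[38, 36, 45]], [50])
```

A chronological rather than random split is used because shuffling time-series windows would let future prices leak into training.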
Model Development. At this stage, the system model is built using the SVM [15] and LSTM [6] algorithms. Machine learning [2] prediction models are developed for the amount of chili production, the price of chili, and the quantity of chili demand and supply, and classification techniques are used to assign provinces to high, medium, or low classes of chili production, chili consumption, and regional price disparity.
Model Validation. Validation measures and evaluates the performance of the model to determine its accuracy in predicting chili production. The classification model tested at this stage is the final output of training the SVM [15] and LSTM [6] algorithms. The validated model is then expected to serve as a recommendation model for predicting chili production.
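The Model Validation stage can be illustrated by comparing predicted and actual high/medium/low classes on held-out provinces. The sketch below computes overall accuracy and a confusion matrix; the labels are hypothetical, and the paper does not state which metrics it will use, so accuracy here is an assumption.

```python
from collections import Counter

CLASSES = ("low", "medium", "high")


def accuracy(actual, predicted):
    """Fraction of provinces whose predicted class matches the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes and columns are predicted classes, in `classes` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical held-out labels for five provinces
actual = ["high", "medium", "low", "high", "low"]
predicted = ["high", "low", "low", "high", "medium"]
print(accuracy(actual, predicted))  # 0.6
for row in confusion_matrix(actual, predicted):
    print(row)
```

The confusion matrix shows which classes are confused with each other (here, medium and low), which a single accuracy figure hides.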
3 Conclusion
This literature review applied a qualitative content analysis method to address several questions and to provide an overview of how big data analytics can be applied to the chili agribusiness, as well as of methods that will support decision making in the red chili agribusiness. Implementing big data-driven Agriculture 4.0 in red chili agriculture requires each chili production area to have a distribution center that supplies chili consumption areas, thus ensuring balanced supply and stable chili prices. Big data analytics [22] models for red chili agriculture that use machine learning [2, 8] to predict chili production can strengthen the supply chain of the chili agro-industry. Chili crop production areas, consumption, price disparity, chili exports, prices, and supply and demand can be predicted from historical data. The chili agribusiness management [26] process starts upstream and moves downstream, with farmers planning chili cropping patterns according to market needs, selecting chili seed varieties, and managing dry land, fertilization, irrigation, and post-harvest handling. Big data analytics [22, 27] will support network management, that is, an organizational structure and technology network that facilitates coordination and process management by actors at the stakeholder network layer. In this case, corporate horticulture, business management, and marketing management are the key actors in the supply chain [37] of an effective chili agribusiness. The application of big data supports:
• Farmers in building cropping patterns based on market needs.
• Wholesaler coordination with inter-regional chili distribution networks.
• Government purchasing system policy to manage the disparity in chili prices.
• Governmental decision-making on chili food security through the development of post-harvest chili processing industry technology.
3.1 Research Limitations and Future Research
This research has several limitations. We were unable to estimate the costs of applying big data in agriculture, particularly for chili crops, or of the architecture for big data systems. In addition, big data has not yet been applied in the chili agro-industry by private businesses in Indonesia. Further in-depth studies are needed to examine:
• Technology components focused on the information infrastructure that supports the data chain.
• Human resources/organizational components focused on governance and business models.
• Network management of the organizational structure and technology in the network that facilitates the coordination and management of processes carried out by actors at the stakeholder network layer.
In future research, it is necessary to investigate factors that could potentially reduce the validity of big data analysis, especially in relation to the chili agro-industry. Research is also needed to measure the success rate of the proposed Big Data Analytics model approach.
References
1. American Society of Brewing Chemists: Fishbone References for Applied Brewing Scientists: An Essential Troubleshooting and Diagnostic Tool (2019)
2. Müller, A.C., Guido, S.: Introduction to Machine Learning with Python: A Guide for Data Scientists. O’Reilly Media (2016). ISBN-13: 978-1449369413
3. Barbosa, M.W., Vicente, A.D.L.C., Ladeira, M.B., Oliveira, M.P.V.D.: Managing supply chain resources with big data analytics: a systematic review. Int. J. Logistics Res. Appl. 21(3), 177–200 (2018)
4. Biro Pusat Statistik Indonesia (BPS): Distribusi Perdagangan Komoditas Cabai Merah Indonesia. Jakarta (2018)
5. Badan Penelitian dan Pengembangan Pertanian: Grand Design Lumbung Pangan Dunia: Roadmap Dukungan Teknologi pada Pengembangan Cabai, 2016–2045. Jakarta (ID) (2016)
6. Bianchi, F.M., Maiorino, E., Rizzi, A., Jenssen, R.: Recurrent Neural Networks for Short-Term Load Forecasting: An Overview and Comparative Analysis. SpringerBriefs in Computer Science (2017). ISBN 978-3-319-70337-4. https://doi.org/10.1007/978-3-319-70338-1
7. Brouer, B.D., Karsten, C.V., Pisinger, D.: Big data optimization in maritime logistics. In: Emrouznejad, A. (ed.) Big Data Optimization: Recent Developments and Challenges. Studies in Big Data, vol. 18, pp. 319–344. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-30265-2_14
8. Ratner, B.: Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, 2nd edn. CRC Press (2011). ISBN-10: 1439860912
9. Chae, B., Yang, C., et al.: The impact of advanced analytics and data accuracy on operational performance: a contingent resource based theory (RBT) perspective. Decis. Support Syst. 59, 119–126 (2014)
10. Cyber-Security: The Use of Big Data Analytic Model for Network Intrusion Detection Classification. Computer Engineering and Intelligent Systems, vol. 10(7) (2019)
11. Shah, P., Chaudhary, S.: Big data analytics framework for spatial. In: 6th International Conference BDA, Proceedings, Warangal, India (2018)
12. Dascalu, M., Tesila, B., Radu, I.C.: A big data analytics tool for social learning management systems. In: Proceedings of the 14th International Scientific Conference eLearning and Software for Education: eLearning Challenges and New Horizons, vol. 2 (2018)
13. Deshpande, A., Kumar, M.: Artificial Intelligence for Big Data, Solution using Artificial Intelligence Techniques. Packt Publishing Ltd., Birmingham, UK (2018). ISBN 978-1-78847-217-3
14. Deng, N., Tian, Y., Zhang, C.: Support Vector Machine: Optimization Based Theory, Algorithm and Extensions. CRC Press, Taylor and Francis Group, LLC (2013)
15. Duan, L., Xiong, Y.: Big data analytics and business analytics. J. Manage. Anal. 2(1), 1–21 (2015)
16. Dutta, D., Bose, I.: Managing a big data project: the case of Ramco Cements Limited. Int. J. Prod. Econ. 165, 293–306 (2015)
17. Downing, M., Chipulu, M., Ojiako, U., Kaparis, D.: Advanced inventory planning and forecasting solutions: a case study of the UKTLCS Chinook maintenance programme. Prod. Plan. Control 25(1), 73–90 (2014)
18. Ganesh, S., Talukder, A.K.: Formal methods, artificial intelligence, big-data analytics, and knowledge engineering in medical care to reduce disease burden and health disparities. In: 6th International Conference BDA Proceedings, Warangal, India (2018)
19. Guha, A., Veeranjaneyulu, N.: Prediction of bankruptcy using big data analytic based on fuzzy C-means algorithm. IAES Int. J. Artif. Intell. 8(2), 168 (2019)
20. Hahn, G.J., et al.: A perspective on applications of in-memory analytics in supply chain management. Decis. Support Syst. 76, 45–52 (2015)
21. IBM: Big data and analytics in travel and transportation, beyond the hype: solutions that deliver big value. White paper (2014)
22. John Wiley & Sons, Inc.: Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, IN (2015). www.wiley.com
23. Kwon, K., et al.: A real-time process management system using RFID data mining. Comput. Ind. 65, 721–732 (2014)
24. Lepenioti, K., Bousdekis, A., Apostolou, D., Mentzas, G.: Prescriptive analytics: literature review and research challenges. Int. J. Inf. Manage. 50, 57–70 (2020)
25. Manyika, J., et al.: Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute, New York City (2011)
26. Ministry of Agriculture: Outlook for Agricultural Commodity, Horticulture Sub-Sector: Chilli. Ministry of Agriculture Republic of Indonesia, Jakarta (2016). (in Indonesian)
27. Nguyen, T., Zhou, L., Ieromonachou, P., Lin, Y.: Big data analytics in supply chain management: a state-of-the-art literature review. Comput. Oper. Res. (2017). https://doi.org/10.1016/j.cor.2017.07.004
28. Raghupathi, V., Zhou, Y., Raghupathi, W.: Exploring big data analytic approaches to cancer blog text analysis. Int. J. Healthc. Inf. Syst. Inf. (2019)
29. Rumsfeld, J.S., Joynt, K.E., Maddox, T.M.: Big data analytics to improve cardiovascular care: promise and challenges. Nat. Rev. Cardiol. 13(6), 350 (2016)
30. Schoenherr, T., Speier-Pero, C.: Data science, predictive analytics, and big data in supply chain management: current state and future potential. J. Bus. Logist. 36(1), 120–132 (2015)
31. Srinivas, S., Ravindran, A.R.: Optimizing outpatient appointment system using machine learning algorithms and scheduling rules: a prescriptive analytics framework. Expert Syst. Appl. 102, 245–261 (2018)
32. Sulaiman, N.S., Aziz, N.S., Samsudin, N., Mohamed, W.A.: Big data analytic of intrusion detection system. Int. J. Adv. Trends Comput. Sci. Eng. (2020)
33. Surianarayanan, C., Palanivel, K.: An approach for prediction of crop yield using machine learning and big data techniques. Int. J. Comput. Eng. Technol. 10(3), 110–118 (2019)
34. Soltanpoor, R., Sellis, T.: Prescriptive analytics for big data. In: Cheema, M.A., Zhang, W., Chang, L. (eds.) 27th Australasian Database Conference: ADC 2016. Databases Theory and Applications. LNCS, vol. 9877, pp. 245–325, Sydney, NSW. Springer International Publishing (2016)
35. Tan, K.H., et al.: Harvesting big data to enhance supply chain innovation capabilities: an analytic infrastructure based on deduction graph. Int. J. Prod. Econ. 165, 223–233 (2015)
36. Tao, F., Cheng, J., Qi, Q., Zhang, M., Zhang, H., Sui, F.: Digital twin-driven product design, manufacturing and service with big data. Int. J. Adv. Manuf. Technol. 94(9–12), 3563–3576 (2018)
37. Tiwari, S., Wee, H.M., Daryanto, Y.: Big data analytics in supply chain management between 2010 and 2016: insights to industries. Comput. Ind. Eng. 115, 319–330 (2018)
38. Waller, M.A., Fawcett, S.E.: Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. J. Bus. Logist. 34, 77–84 (2013)
39. Wang, G., et al.: Big data analytics in logistics and supply chain management: certain investigations for research and applications. Int. J. Prod. Econ. 176, 98–110 (2016)
40. Wang, C.H., Cheng, H.Y., Deng, Y.T.: Using Bayesian belief network and time series model to conduct prescriptive and predictive analytics for computer industries. Comput. Ind. Eng. 115, 486–494 (2018)
41. Zhao, R., Liu, Y., Zhang, N., Huang, T.: An optimization model for green supply chain management by using a big data analytic approach. J. Cleaner Prod. 142, 1085–1097 (2017)
42. Zheng, S.: Solving Vehicle Routing Problem: A Big Data Analytic Approach (2019)
43. Zhong, R.Y., et al.: A big data approach for logistics trajectory discovery from RFID-enabled production data. Int. J. Prod. Econ. 165, 260–272 (2015)
A Fusion-Based Feature Selection Framework for Microarray Data Classification
Talal Almutiri1(B), Faisal Saeed2, Manar Alassaf3, and Essa Abdullah Hezzam2
1 Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
2 Department of Information Systems, College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
{fsaeed,ehezzam}@taibahu.edu.sa
3 Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Abstract. Gene expression profiling uses microarray techniques to discover patterns of genes when they are expressed. This helps to draw a picture of how the cell performs its function and to determine whether there are any mutations. However, microarrays generate a huge amount of data, which makes the analysis process computationally costly and time-consuming. Feature selection is one solution for reducing the dimensionality of microarray datasets by choosing important genes and eliminating redundant and irrelevant features. In this study, a fusion-based feature selection framework is proposed that applies multiple feature selection methods and combines them using ensemble methods. The framework consists of three layers. In the first layer, three feature selection methods work independently to rank genes, each assigning a score to every gene. In the second layer, a threshold is used to filter genes according to their calculated scores. In the last layer, the final decision about which genes are important is made using a voting strategy, either majority or consensus. The proposed framework improved both classification accuracy and dimensionality reduction compared with previous methods.
Keywords: Cancer classification · Gene expression · Feature selection · Fusion · Microarray data
1 Introduction
Deoxyribonucleic acid (DNA) is the genetic molecule that holds all the important information about how an organism's cells are built and maintained [1]. DNA molecules are packaged into structures called chromosomes, which protect and organize the DNA during cell division [2]. Each human cell contains 46 chromosomes organized in 23 pairs. These chromosomes carry the hereditary material known as genes [1]. Genes are series of instructions that determine the appearance of the organism and how it lives and functions. They are also responsible for producing the proteins that help build the human body [3]. There are about 20,000 to 23,000 genes in human cells, and each gene has a specific job as part of the process of cell development [3]. Genes are expressed to produce proteins inside cells; according to the central dogma of biology, this entire process is known as gene expression. The central dogma describes DNA replication, the conversion of DNA into mRNA (transcription), and the translation of mRNA into proteins [4, 5]; gene expression itself consists of transcription and translation. Even though the DNA is the same in all of an organism's cell types, each cell expresses only a subset of its genes at any given time. The information produced when genes are expressed can be measured through a process called gene expression profiling. Gene expression profiling is a method for discovering the pattern of expressed genes and measuring the level of DNA transcription in a cell at a specific time [6]. The output of this process therefore helps to draw a picture of how the cell is performing its function and to determine whether mutations are present when genes are expressed. A mutation is an alteration in the DNA sequence that makes a gene differ from the corresponding gene in healthy individuals, and it can be used as an indicator of some diseases. One of the techniques used to discover mutations or analyze gene expression is the microarray [7]. A DNA microarray, or DNA chip, can be defined as a set of microscopic DNA spots, organized in rows and columns, that are attached to a solid surface. Each DNA spot contains many thousands of copies of a particular DNA sequence. Microarrays produce massive amounts of data that are computationally complex to analyze. Therefore, microarray data suffer from the “curse of dimensionality”, which here means that a dataset has a large number of features (columns) but a small number of samples (rows).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 565–576, 2021.
https://doi.org/10.1007/978-3-030-70713-2_52
High dimensionality leads to high computational effort for processing, wasted storage space, poor visualization when exploring data, and a negative effect on prediction results [8]. Feature selection is one of the techniques that help reduce dimensionality. The terms feature selection, dimension reduction, and variable selection are widely used in statistics and machine learning. Feature selection is the process of eliminating unnecessary features by applying statistical or mathematical techniques [9, 10]. Applying a single feature selection method has achieved notable results in many studies. On the other hand, a fusion-based technique, which combines two or more feature selection methods in an ensemble and applies a decision strategy such as majority voting to obtain the selected features, could improve the selection of important features [11, 12]. Gene expression microarray datasets suffer from high dimensionality, which affects classification results. Therefore, this paper proposes a fusion-based feature selection framework to reduce the dimensionality of gene expression microarray datasets and enhance classification results. This framework consists of three layers: a feature selection methods layer, a threshold selection layer, and a decision layer. The three-layer framework is discussed in more detail in Sect. 3. The proposed solution was applied to five cancer microarray datasets, namely brain, breast, central nervous system (CNS), colon, and prostate. In the classification phase, support vector machine (SVM), random forest, and gradient boosting classifiers were used.
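The three-layer framework described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each of the three feature selection methods has already produced one score per gene (higher meaning more important), applies a hypothetical per-method score threshold in the second layer, and then uses majority voting (a gene kept by at least two of the three methods) or consensus (kept by all three) in the decision layer. Method names, scores, and thresholds are hypothetical.

```python
def select_by_threshold(scores, threshold):
    """Layer 2: keep the genes whose score from one method meets its threshold."""
    return {gene for gene, score in scores.items() if score >= threshold}


def fuse(selections, strategy="majority"):
    """Layer 3: combine per-method gene sets by majority or consensus voting."""
    votes = {}
    for selected in selections:
        for gene in selected:
            votes[gene] = votes.get(gene, 0) + 1
    n_methods = len(selections)
    if strategy == "majority":
        needed = n_methods // 2 + 1  # more than half of the methods
    elif strategy == "consensus":
        needed = n_methods           # every method must agree
    else:
        raise ValueError("strategy must be 'majority' or 'consensus'")
    return sorted(gene for gene, count in votes.items() if count >= needed)


# Layer 1 output: hypothetical scores from three independent ranking methods
method_scores = {
    "method_a": {"g1": 0.9, "g2": 0.4, "g3": 0.8, "g4": 0.1},
    "method_b": {"g1": 0.7, "g2": 0.6, "g3": 0.2, "g4": 0.3},
    "method_c": {"g1": 0.8, "g2": 0.9, "g3": 0.7, "g4": 0.2},
}
thresholds = {"method_a": 0.5, "method_b": 0.5, "method_c": 0.5}

selections = [select_by_threshold(s, thresholds[m]) for m, s in method_scores.items()]
print(fuse(selections, "majority"))   # ['g1', 'g2', 'g3']
print(fuse(selections, "consensus"))  # ['g1']
```

Consensus keeps fewer genes than majority voting, so it trades some potentially informative genes for stronger dimensionality reduction.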
A Fusion-Based Feature Selection Framework
2 Related Studies
This section presents studies that apply single feature selection methods and fusion-based selection methods to gene expression datasets. Ke et al. [13] employed a fusion method called score-based criteria fusion (SCF). SCF aggregates the scores of two criteria, symmetrical uncertainty (SU) and ReliefF, and then sorts the features by their values in the final score vector. Momenzadeh et al. [14] integrated five feature selection (FS) ranking methods, namely Bhattacharyya distance, entropy, receiver operating characteristic curve, t-test, and Wilcoxon, using a hidden Markov model (HMM) to select informative genes. The smaller standard deviation of their results reflected the stability and robustness of the proposed method compared with other FS approaches. Lin et al. [15] presented a modified hybrid feature selection method called MSVM-RFE-OA, which combines support vector machine recursive feature elimination (SVM-RFE) with the overlapping ratio area. Their approach was applied to eight microarray datasets and achieved better results than the original SVM-RFE and SVM-RFE-OA. Athilakshmi et al. [16] introduced a fusion feature selection method that combines Correlation Feature Selection (CFS) as a filter method and Velocity Clamping Particle Swarm Optimization (VCPSO) as a wrapper method. CFS-VCPSO improved the classification accuracy on microarray datasets, exceeding 90% accuracy on all datasets. Seijo-Pardo et al. [17] proposed a fusion-based method that combines four filter methods: Chi-Square, Information Gain, Minimum Redundancy Maximum Relevance (mRMR), and ReliefF. Each filter method works independently, and the results of all methods are then combined. Finally, the Fisher discriminant ratio is used as a threshold to select the final subset of informative genes. Morovvat et al. [18] presented an ensemble of filter and wrapper methods for gene expression microarray datasets. They applied five feature selection methods: the filter methods were applied first to reduce the complexity of the wrapper, which was applied afterwards. Three classifiers were combined in an ensemble, in which the final predicted class was selected using decision-making strategies such as majority voting or consensus patterns.
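Score-level fusion of the kind used by SCF can be pictured with a short sketch. It is an illustration of score fusion in general, not Ke et al.'s exact procedure: it assumes min-max normalization to put the two criteria on a common scale, and the score values are made up.

```python
import numpy as np


def min_max(scores) -> np.ndarray:
    """Rescale one criterion's scores to [0, 1] so different criteria can be added."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # constant scores carry no ranking information
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)


def fuse_scores(score_a, score_b) -> np.ndarray:
    """Sum the normalized scores and return gene indices, most relevant first."""
    fused = min_max(score_a) + min_max(score_b)
    return np.argsort(-fused, kind="stable")


# Hypothetical scores for five genes under two criteria (e.g. SU and ReliefF).
su = np.array([0.10, 0.40, 0.05, 0.30, 0.20])
relief = np.array([0.02, 0.01, 0.00, 0.05, 0.04])
ranking = fuse_scores(su, relief)
print(ranking)  # genes ordered 3, 4, 1, 0, 2
```

Normalizing before summing matters: without it, whichever criterion has the larger numeric range would dominate the fused ranking.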
3 Methods
This section presents the proposed framework for fusion-based feature selection, which consists of three layers, each with a specific task. The first layer applies multiple feature selection methods to calculate a score for each gene. In the second layer, a threshold is used to evaluate each gene through a flag called support: when a gene's score crosses the threshold, its support flag is set to true. The third layer makes the final decision using a strategy such as majority voting or consensus, based on the support flags assigned to every gene by all feature selection methods. The proposed fusion-based feature selection framework is shown in Fig. 1.
T. Almutiri et al.
Fig. 1. The proposed fusion-based feature selection framework
3.1 Fusion-Based Feature Selection
Feature Selection Methods Layer. In this layer, three feature selection methods were implemented to calculate a coefficient or feature importance for each gene: linear regression, decision tree, and least absolute shrinkage and selection operator (LASSO). Each method assigns a score to every feature (gene) according to the relevance of the feature to the target class. A brief description of each method follows.
LASSO. LASSO is a regularization and variable selection method for statistical models. It minimizes the sum of squared errors subject to an upper bound on the sum of the absolute values of the model coefficients. The LASSO estimate is defined by the solution to the l1-constrained optimization problem [19]:
$$\min_{\beta} \; \lVert Y - X\beta \rVert_2^2 \quad \text{subject to} \quad \lVert \beta \rVert_1 = \sum_{j=1}^{n} \lvert \beta_j \rvert \le t \tag{1}$$
where t is the upper bound on the sum of the absolute values of the coefficients. When LASSO is used as an FS method, features whose coefficients equal zero are considered irrelevant and are excluded.
Linear Regression. Linear regression is a type of regression analysis that models a linear relationship between the independent variables (X) and the dependent, or outcome, variable (Y). With more than one feature it is called multiple linear regression, and each predicted response is given by
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{2}$$
Linear regression thus fits a linear model whose feature coefficients minimize the mean squared error between the predicted and actual values. To employ linear regression as an FS method, the best-fitting coefficients are computed, and a feature is excluded if its coefficient is less than the mean of all coefficients.
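The LASSO and linear-regression rules above can be sketched with scikit-learn's SelectFromModel, which keeps features whose importance is at least a threshold. This is a minimal sketch under stated assumptions: synthetic data from make_classification stand in for a microarray matrix, alpha=0.01 is an arbitrary choice (the regularization strength is not reported here), and SelectFromModel compares absolute coefficient values, so "less than the mean" is interpreted as |coefficient| below the mean |coefficient|.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic stand-in for a microarray matrix: few samples (rows), many genes (columns).
X, y = make_classification(n_samples=60, n_features=500, n_informative=10, random_state=0)

# LASSO: with no explicit threshold, SelectFromModel uses 1e-5 for Lasso, so genes
# whose |coefficient| is below 1e-5 (in practice the zeroed ones) are not supported.
# alpha=0.01 is an illustrative value, not one taken from the paper.
lasso_selector = SelectFromModel(Lasso(alpha=0.01, max_iter=10_000)).fit(X, y)
lasso_support = lasso_selector.get_support()

# Linear regression: a gene is supported when its |coefficient| is at least the mean |coefficient|.
lr_selector = SelectFromModel(LinearRegression(), threshold="mean").fit(X, y)
lr_support = lr_selector.get_support()

print("LASSO keeps", int(lasso_support.sum()), "genes;",
      "linear regression keeps", int(lr_support.sum()), "genes")
```

The boolean masks returned by get_support() play the role of the per-method support flags. A decision tree could be plugged in the same way with DecisionTreeClassifier and threshold="mean", in which case SelectFromModel reads the tree's Gini-based feature_importances_.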
Decision Tree. A decision tree used for feature selection provides feature importance information that can be used to rank features (variables). In a decision tree, feature importance is calculated as the Gini importance, also called Mean Decrease in Impurity (MDI): the impurity reductions are summed over all tree nodes where a split was made on the feature, each reduction weighted by the size of the node, and genes are ranked by this sum. The Gini impurity is calculated as [20, 21]
$$GI = \sum_{i=1}^{C} P(i)\,\bigl(1 - P(i)\bigr) \tag{3}$$
where C is the number of classes and P(i) is the probability of selecting a data point of class i.
Threshold Layer. The threshold layer evaluates each gene based on the score computed in the previous layer. When the coefficient or importance of a gene passes the threshold, the gene is flagged as supported. The proposed framework uses a self-evaluation threshold, meaning that each FS method has its own threshold or criterion for evaluating genes and selecting them as informative features. For example, the threshold for LASSO is 1e-5, following the scikit-learn library for Python.
Decision Layer. The proposed framework applies either majority voting or consensus to make the final decision for every gene; the support flags of each FS method are its votes in this layer. Under the majority voting rule, a gene is selected as important when it receives more than half of the votes; when a tie occurs, the gene is flagged as not supported. Under the consensus rule, a gene is selected as important only if all FS methods vote for it unanimously.
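The decision layer can be sketched as a function over a boolean support matrix whose rows are FS methods and whose columns are genes. The function name fuse_support and the toy flags are hypothetical; the majority rule follows the text (strictly more than half of the votes, ties not supported), and the consensus rule requires unanimous support.

```python
import numpy as np


def fuse_support(support, rule: str = "majority") -> np.ndarray:
    """Combine support flags from several FS methods (rows) over genes (columns).

    majority : keep a gene when strictly more than half of the methods support it;
               an exact tie counts as "not supported".
    consensus: keep a gene only when every method supports it.
    """
    support = np.asarray(support, dtype=bool)
    n_methods = support.shape[0]
    votes = support.sum(axis=0)
    if rule == "majority":
        return votes * 2 > n_methods
    if rule == "consensus":
        return votes == n_methods
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical support flags for five genes.
flags = np.array([
    [1, 1, 0, 1, 0],  # LASSO
    [1, 0, 0, 1, 1],  # linear regression
    [0, 1, 0, 1, 0],  # decision tree
])
print(fuse_support(flags, "majority"))      # keeps genes 0, 1 and 3
print(fuse_support(flags, "consensus"))     # keeps gene 3 only
print(fuse_support(flags[:2], "majority"))  # keeps genes 0 and 3; genes 1 and 4 are 1-1 ties
```

With three methods a tie cannot occur. With two methods, majority voting reduces to consensus, because a 1-1 split is a tie and is not supported.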
3.2 Classification Phase
This section briefly describes the three classifiers used in the proposed framework.
Support Vector Machine. SVM is one of the most powerful classification algorithms. It is based on statistical learning theory, and it is trained by solving a quadratic optimization problem [22]. In the training phase, SVM constructs the optimal hyperplane (decision surface) that separates the data with the maximum generalization ability.
Random Forest. A random forest is a tree-based ensemble in which each tree depends on a collection of random variables. The random forest classifier builds a number of decision trees on samples of the data and obtains a prediction from each tree. Finally, the prediction is selected by voting. Random forests have achieved excellent results in the microarray data classification task [23].
Gradient Boosting. Gradient boosting is a powerful ensemble technique in which the predictors are built sequentially, each learning from the mistakes of the previous predictors. Because new predictors learn from the errors of earlier ones, fewer iterations are needed to get close to the actual values. However, the stopping criterion should be selected carefully to avoid overfitting the training data. The predictors used in gradient boosting are usually decision trees. Gradient boosting is therefore considered strong enough to make an outstanding contribution on gene expression microarray datasets [24].
Hyper-parameters Tuning. Hyper-parameter tuning (optimization) is the process of choosing the settings of a learning algorithm's parameters before the algorithm is run. It is used to find the optimal settings for a specific algorithm, which can improve the accuracy and performance of the learning process [24]. In this study, selected hyper-parameters of each classifier were tuned over a specified range of values. Table 1 lists the tuned hyper-parameters for each classifier.
Table 1. The hyper-parameters and values used in the three classifiers
Classifier | Hyper-parameter | Values | Role
SVM | C | [1, 0.1, 0.01, 0.001, 10, 100] | The strength of the regularization
SVM | Gamma | [1, 0.1, 0.01, 0.001, 0.0001, 10, 100] | Kernel coefficient for ‘rbf’ and ‘sigmoid’
SVM | Kernel | [“sigmoid”, “linear”, “rbf”] | The type of kernel used in the SVM
Random Forest | n_estimators | [200, 300, 400, 500] | The number of trees in the forest
Random Forest | max_features | [‘auto’, ‘sqrt’, ‘log2’] | The number of features to consider when looking for the best split
Random Forest | criterion | [‘gini’, ‘entropy’] | The function to measure the quality of a split
Random Forest | max_depth | [4–8] | The maximum depth of the tree
Gradient Boosting | learning_rate | [0.01, 0.025, 0.05, 0.075, 0.1] | Shrinks the contribution of each tree by the learning rate
Gradient Boosting | max_depth | [3, 5, 8] | Maximum depth of the individual regression estimators
Gradient Boosting | max_features | [“log2”, “sqrt”] | The number of features to consider when looking for the best split
Gradient Boosting | criterion | [“friedman_mse”, “mae”] | The function to measure the quality of a split
Gradient Boosting | subsample | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | The fraction of samples to be used for fitting the individual base learners
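The SVM grid of Table 1 can be searched exhaustively with scikit-learn's GridSearchCV. This sketch assumes 5-fold stratified cross-validation with accuracy scoring (the tuning protocol is not stated here) and runs on synthetic data standing in for a reduced gene matrix. Only the SVM grid is shown: recent scikit-learn releases no longer accept max_features='auto' for random forests or criterion='mae' for gradient boosting, so those two grids would need adjusting before use.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# SVM hyper-parameter grid copied from Table 1.
svm_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "gamma": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
    "kernel": ["sigmoid", "linear", "rbf"],
}

# Synthetic stand-in for a gene matrix after feature selection (e.g. ~20 genes).
X, y = make_classification(n_samples=60, n_features=20, n_informative=5, random_state=0)

search = GridSearchCV(
    SVC(),
    svm_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # assumed CV protocol
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The linear kernel ignores gamma, so many of the 126 grid points are redundant for it; a list of per-kernel grids would avoid those extra fits.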
4 Experimental Design
This section presents the five datasets used in this paper and the implementation of the proposed fusion-based feature selection framework. The experiments comprised six scenarios that demonstrate the framework options for each of the proposed layers.
4.1 Datasets
The framework was tested on five gene expression microarray datasets for different kinds of cancer: Brain, Breast, Central Nervous System (CNS), Colon, and Prostate, as presented in Table 2.
Table 2. The description of the five datasets
Dataset | # Features | # Instances | # Classes (instances per class)
Brain | 5597 | 42 | 5 (10, 10, 10, 4, 8)
Breast | 24481 | 97 | 2 (46, 51)
CNS | 7129 | 60 | 2 (21, 39)
Colon | 2000 | 62 | 2 (40, 22)
Prostate | 6033 | 102 | 2 (50, 52)
Source: [25]
4.2 The Implemented Scenarios of the Proposed Framework
In this study, the six scenarios shown in Table 3 were used to demonstrate the framework options for each of the proposed layers.
Table 3. The six implemented scenarios of the proposed framework.
Name | FS methods layer | Threshold layer | Decision layer
Base | Baseline (without FS) | – | –
Maj (All) | Linear regression, LASSO, and decision tree | Self-evaluation* | Majority
Cons (All) | Linear regression, LASSO, and decision tree | Self-evaluation* | Consensus
Cons (LASSO, LR) | LASSO and linear regression | Self-evaluation* | Consensus
Cons (LASSO, DT) | LASSO and decision tree | Self-evaluation* | Consensus
Cons (LR, DT) | Linear regression and decision tree | Self-evaluation* | Consensus
* Self-evaluation: for linear regression and decision tree the threshold is the mean, and for LASSO it is 1e-5.
4.3 Performance Evaluation Measure
Classification accuracy was used in this study to evaluate the performance of the proposed framework. Classification accuracy is the number of samples that a classifier predicted correctly divided by the total number of predictions:
$$\text{Accuracy} = \frac{\text{Number of correctly predicted samples}}{\text{Total number of samples}} \tag{4}$$
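Equation (4) is a single ratio. The sketch below computes it as a percentage, the form used in the result tables, and checks it against scikit-learn's accuracy_score; the labels are made up for illustration.

```python
from sklearn.metrics import accuracy_score


def accuracy(y_true, y_pred) -> float:
    """Equation (4): correctly predicted samples / all samples, as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical labels for 20 test samples, 19 of them predicted correctly.
y_true = [0] * 10 + [1] * 10
y_pred = [0] * 10 + [1] * 9 + [0]
print(accuracy(y_true, y_pred))  # 95.0
print(100 * accuracy_score(y_true, y_pred))
```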
5 Results and Discussion
This section presents and discusses the results of the experiments for all scenarios and compares them with the results of other studies. The proposed fusion-based FS method was tested on five microarray datasets under six scenarios. Table 4 shows the number of features selected in each scenario, and Tables 5, 6, and 7 show the classification accuracy of the SVM, random forest, and gradient boosting classifiers, respectively, across the five datasets.
Table 4. The number of selected genes in each conducted scenario
Dataset | Base | Maj (All) | Cons (All) | Cons (LASSO, LR) | Cons (LASSO, DT) | Cons (LR, DT)
Brain | 5597 | 22 | 0 | 22 | 0 | 0
Breast | 24481 | 24 | 1 | 22 | 1 | 3
CNS | 7129 | 1 | 0 | 0 | 0 | 1
Colon | 2000 | 29 | 1 | 27 | 1 | 3
Prostate | 6033 | 18 | 1 | 18 | 1 | 1
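The counts in Table 4 are internally consistent in a way worth making explicit. With three FS methods no tie is possible, and a gene wins a majority exactly when at least two methods support it, i.e. when it lies in at least one pairwise consensus set. Any two pairwise consensus sets intersect in the three-way consensus set, so inclusion-exclusion gives |Maj (All)| = |Cons (LASSO, LR)| + |Cons (LASSO, DT)| + |Cons (LR, DT)| - 2|Cons (All)|. The check below applies this identity to the reported counts; it is an added illustration, not part of the original analysis.

```python
# Gene counts copied from Table 4:
# (Cons(LASSO, LR), Cons(LASSO, DT), Cons(LR, DT), Cons(All), Maj(All))
table4 = {
    "Brain": (22, 0, 0, 0, 22),
    "Breast": (22, 1, 3, 1, 24),
    "CNS": (0, 0, 1, 0, 1),
    "Colon": (27, 1, 3, 1, 29),
    "Prostate": (18, 1, 1, 1, 18),
}


def majority_from_consensus(lasso_lr: int, lasso_dt: int, lr_dt: int, cons_all: int) -> int:
    """Size of the union of the three pairwise consensus sets.

    Every pairwise intersection, and the triple intersection, equals the three-way
    consensus set, so inclusion-exclusion gives a + b + c - 3*all + all.
    """
    return lasso_lr + lasso_dt + lr_dt - 2 * cons_all


for name, (a, b, c, all3, maj) in table4.items():
    predicted = majority_from_consensus(a, b, c, all3)
    print(f"{name:<9} predicted={predicted:>3} reported={maj:>3}")
    assert predicted == maj, name
```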
Table 5. The accuracy results (%) of SVM in each conducted scenario
Dataset | Base | Maj (All) | Cons (LASSO, LR)
Brain | 88.50 | 95.00 | 95.00
Breast | 63.89 | 75.11 | 74.44
CNS | 65.00 | 75.00 | –
Colon | 90.00 | 95.00 | 93.33
Prostate | 92.09 | 97.00 | 97.00
Table 6. The accuracy results (%) of random forest in each conducted scenario
Dataset | Base | Maj (All) | Cons (LASSO, LR)
Brain | 86.50 | 88.67 | 88.67
Breast | 86.23 | 84.65 | 78.52
CNS | 65.05 | 76.48 | –
Colon | 88.57 | 85.24 | 88.57
Prostate | 92.00 | 95.00 | 95.00
Table 7. The accuracy results (%) of Gradient Boosting in each conducted scenario
Dataset | Base | Maj (All) | Cons (LASSO, LR)
Brain | 88.50 | 85.00 | 85.00
Breast | 70.03 | 80.72 | 80.70
CNS | 68.14 | 81.48 | –
Colon | 82.14 | 82.14 | 83.81
Prostate | 91.09 | 93.09 | 93.09
As presented in Tables 4, 5, 6, and 7, some general observations can be summarized as follows:
• All proposed framework scenarios greatly reduced the number of features compared with the original number of features.
• With SVM, all voting scenarios achieved considerably better results than the baseline.
• The majority voting decision achieved the highest SVM results across most datasets.
• The consensus scenarios were stricter than the majority voting scenario in selecting features. As a result, some consensus scenarios selected very few or no features, so they were excluded from the classification phase.
• For random forest and gradient boosting, the improvements over the baseline were smaller and less consistent than for SVM. On the other hand, random forest and gradient boosting worked better than SVM on the datasets with the largest numbers of genes, Breast and CNS: random forest obtained 84.65% on Breast, and gradient boosting achieved 81.48% on CNS.
• Some datasets, such as Brain and Prostate, obtained the same results under both decision methods for all classifiers.
The comparison between the best results obtained in this study (using the Maj (All) scenario) and those reported by other researchers is shown in Table 8. According to Table 8, the proposed framework shows better results in terms of classification accuracy and number of genes for most datasets compared with related studies. Compared with [27], the proposed framework achieved better accuracy on the Colon dataset, whereas [27] obtained better accuracy on the Brain and Prostate datasets, as they applied wrapper methods after filter methods. However, the proposed framework selected fewer genes for Prostate than [27]. On the Breast and CNS datasets, the accuracy was not better than that reported in [27]; however, the number of genes selected by the proposed method was considerably lower than the number chosen by their approach [27].
Table 8. Comparison of the accuracy (%) and number of selected genes (#F) of the proposed framework with other studies
Dataset | [26] #F | [26] Acc | [27] #F | [27] Acc | [18] #F | [18] Acc | [15] #F | [15] Acc | Current study #F | Current study Acc
Brain | – | – | 13 | 97.62 | – | – | Avg (121.68) | 81.98 | 22 | 95.00
Breast | – | – | 41 | 90.72 | – | – | – | – | 24 | 84.65
CNS | – | – | 39 | 98.33 | – | – | – | – | 1 | 81.48
Colon | 3 | 83.00 | 25 | 91.94 | 11 | 90.32 | Avg (52.36) | 88.61 | 29 | 95.00
Prostate | – | – | 33 | 97.06 | – | – | Avg (57.50) | 92.24 | 18 | 97.00
6 Conclusion
In this study, a fusion-based feature selection framework was proposed to address the high dimensionality of gene expression microarray datasets and to improve classification performance. The framework consists of three layers: the FS methods layer, the threshold layer, and the decision layer. It also offers different scenarios for deciding which genes are important for the classification task across the FS methods, namely majority voting and consensus decisions. The scenarios were evaluated on five microarray datasets with the SVM, random forest, and gradient boosting classifiers. The experimental results suggest that the proposed fusion-based FS framework achieves promising results compared with other studies. In future work, more options should be added to the framework for tackling high dimensionality, such as a uniform threshold (for example, the mean or median) for all included methods. Investigating the impact of applying wrapper methods within the framework would also be useful. Moreover, an automated search for the best of the framework's scenarios might help when applying it to different datasets.
References
1. Miko, I., LeJeune, L.: Essentials of genetics. Cambridge NPG Education (2009)
2. Khurana, S.P.: Biotechnology: Principles and Process. Studium (2015)
3. Matilainen, M.: Identification and characterization of target genes of the nuclear receptors VDR and PPARs (2007)
4. Crick, F.: Central dogma of molecular biology. Nature 227, 561–563 (1970)
5. Alberts, B., Bray, D., Hopkin, K., Johnson, A.D., Lewis, J., Raff, M., Roberts, K., Walter, P.: Essential cell biology. Garland Science (2013)
6. Vlachakis, D.: Gene Expression Profiling in Cancer. Intechopen (2019). https://doi.org/10.5772/intechopen.78451
7. Bustin, S.A., Benes, V., Garson, J.A., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M.W., Shipley, G.L.: The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments (2009)
8. Chattopadhyay, A., Lu, T.-P.: Gene-gene interaction: the curse of dimensionality. Ann. Transl. Med. 7, 813–817 (2019)
9. Xue, Y., Xue, B., Zhang, M.: Self-adaptive particle swarm optimization for large-scale feature selection in classification. ACM Trans. Knowl. Discov. from Data. 13, 1–27 (2019)
10. Dash, R.: A two stage grading approach for feature selection and classification of microarray data using Pareto based feature ranking techniques: a case study. J. King Saud Univ. Inf. Sci. 32, 232–247 (2020)
11. Tsai, C.-F., Sung, Y.-T.: Ensemble feature selection in high dimension, low sample size datasets: parallel and serial combination approaches. Knowledge-Based Syst. 106097 (2020)
12. Jesus, J., Araújo, D., Canuto, A.: Fusion approaches of feature selection algorithms for classification problems. In: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), pp. 379–384. IEEE (2016)
13. Ke, W., Wu, C., Wu, Y., Xiong, N.N.: A new filter feature selection based on criteria fusion for gene microarray data. IEEE Access 6, 61065–61076 (2018). https://doi.org/10.1109/ACCESS.2018.2873634
14. Momenzadeh, M., Sehhati, M., Rabbani, H.: A novel feature selection method for microarray data classification based on hidden Markov model. J. Biomed. Inform. 95, 1–8 (2019). https://doi.org/10.1016/j.jbi.2019.103213
15. Lin, X., Li, C., Zhang, Y., Su, B., Fan, M., Wei, H.: Selecting feature subsets based on SVM-RFE and the overlapping ratio with applications in bioinformatics. Molecules 23, 52 (2018)
16. Athilakshmi, R., Rajavel, R., Jacob, S.G.: Fusion Feature selection: new insights into feature subset detection in biological data mining. Stud. Inform. Control. 28, 327–336 (2019)
17. Seijo-Pardo, B., Bolón-Canedo, V., Alonso-Betanzos, A.: Using a feature selection ensemble on DNA microarray datasets. In: ESANN (2016)
18. Morovvat, M., Osareh, A.: An ensemble of filters and wrappers for microarray data classification. Mach. Learn. Appl. An Int. J. 3, 1–7 (2016)
19. Bühlmann, P., van de Geer, S.: Statistics for high-dimensional data: Methods, Theory and Applications. Springer Science and Business Media (2011). https://doi.org/10.1080/02664763.2012.694258
20. Kazemitabar, J., Amini, A., Bloniarz, A., Talwalkar, A.S.: Variable importance using decision trees. In: Advances in Neural Information Processing Systems, pp. 426–435 (2017)
21. Xia, F., Zhang, W., Li, F., Yang, Y.: Ranking with decision tree. Knowl. Inf. Syst. 17, 381–395 (2008)
22. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995). https://doi.org/10.1007/bf00994018
23. Aydadenta, H., Adiwijaya: A clustering approach for feature selection in microarray data classification using random forest. J. Inf. Process. Syst. 14, 1167–1175 (2018). https://doi.org/10.3745/JIPS.04.0087
24. Probst, P., Boulesteix, A.-L., Bischl, B.: Tunability: importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 20, 1–32 (2019)
25. Zhu, Z., Ong, Y.-S., Dash, M.: Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognit. 40, 3236–3248 (2007)
26. Sun, L., Zhang, X., Qian, Y., Xu, J., Zhang, S.: Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf. Sci. (Ny) 502, 18–41 (2019). https://doi.org/10.1016/j.ins.2019.05.072
27. Hameed, S.S., Muhammad, F.F., Hassan, R., Saeed, F.: Gene selection and classification in microarray datasets using a hybrid approach of PCC-BPSO/GA with multi classifiers. J. Comput. Sci. 14, 868–880 (2018)
An Approach Based Natural Language Processing for DNA Sequences Encoding Using the Global Vectors for Word Representation
Brahim Matougui 1,2 (corresponding author), Hacene Belhadef 1, and Ilham Kitouni 1
1 University of Constantine 2 - Abedelhamid Mehri, 25016 Constantine, Algeria
2 National Center for Biotechnology Research, 25016 Constantine, Algeria
Abstract. A DNA sequence has several representations; one of them is to split it into k-mer components. In this work, we explore the high similarity between natural language and the “genomic sequence language”, both of which are character-based languages, to represent DNA sequences. In the natural language processing context, k-mers can be considered as words; in this representation, we therefore process a DNA sequence as a set of overlapping k-mer word embeddings obtained with the Global Vectors representation. The embedding representation of k-mers helps overcome the curse of dimensionality, one of the main issues of traditional methods that encode k-mer occurrences as one-hot vectors. Experiments on the first Critical Assessment of Metagenome Interpretation (CAMI) dataset demonstrated that our method is an efficient way to cluster metagenomics reads and predict their taxonomy. This method could be used as a first step for downstream metagenomics analysis.
Keywords: Global vectors representation · Word embeddings · DNA sequence representation · Natural language processing
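To make the k-mers-as-words view concrete, the sketch below splits a read into overlapping k-mers (a stride of 1 is an assumption) and computes the size of a one-hot vocabulary over {A, C, G, T}, which grows as 4^k, in contrast to the fixed embedding size d used by the proposed method. The function names are hypothetical, and the GloVe training step itself is not shown.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1); each k-mer acts as a 'word'."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot_dimension(k: int, alphabet_size: int = 4) -> int:
    """Length of a one-hot vector able to represent every possible k-mer."""
    return alphabet_size ** k


print(kmers("ACGTAC", 3))     # ['ACG', 'CGT', 'GTA', 'TAC']
print(one_hot_dimension(12))  # 16777216
```

The tokenized k-mers would form the corpus from which word embeddings are learned, so every k-mer ends up as a d-dimensional vector regardless of k.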
1 Introduction
DNA sequencing has become central to modern genomics and epigenomics. As a result, the amount of DNA sequence data produced by Next-Generation Sequencing (NGS) technologies has exploded (see Table 1), and the analysis of these sequences has become a bottleneck. Although the k-mer representation is widely used to analyze DNA sequences, it suffers from high dimensionality because each k-mer is encoded directly as a one-hot vector, which becomes impractical for larger values of k. For example, representing 12-mers requires a vector of 4^12 = 16,777,216 dimensions. Yet small values of k cannot extract useful information from DNA sequences, and research suggests that good accuracy can only be achieved with larger values of k [1].
Word embedding is a method for representing text in which each word is represented by a real-valued vector, and words with similar meanings have similar vector representations. The Global Vectors for word representation (GloVe) [2], word2vec [3], and fastText [4, 5] are the three main word embedding models in Natural Language Processing (NLP).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 577–585, 2021. https://doi.org/10.1007/978-3-030-70713-2_53
B. Matougui et al.
In this paper, to deal with the problem of dimensionality, we propose a new DNA sequence representation method based on word embeddings encoded with the Global Vectors for word representation [2]. The proposed method allows a variable-length k-mer representation without penalizing machine performance, because whatever the value of k, all k-mers are represented in a continuous vector space of size d (d = 50). The main contributions of this work include:
• DNA sequence encoding using Global Vectors for Word Representation.
• The variable-length k-mers embedding model: the user can introduce different word k length values ( klow BIM adoption
2.562506
0.005342 *
Relative advantage -> BIM adoption | 5.200519 | 0.000000 *
Financial constraints -> BIM adoption | 1.655987 | 0.049176 *
Competitive pressure -> BIM adoption | 0.531837 | 0.297537
Compatibility -> BIM adoption | 0.365801 | 0.357334
Organizational readiness -> BIM adoption | 4.100428 | 0.00045 *
Notes: * P < 0.05; significant
5 Discussion and Conclusions
This study found that the relative advantage and interoperability of BIM are influencing factors of BIM adoption in the Malaysian AEC industry. AEC stakeholders consider BIM a beneficial technology that helps in managing business operations and construction activities. This finding is consistent with [34, 44–47]. Surprisingly, compatibility was found to have no effect on BIM adoption, in line with [34, 44], who also found it insignificant. This suggests that compatibility does not contribute to BIM adoption; one possible explanation is that Malaysian AEC firms consider BIM incompatible with their existing work procedures and practices. The analysis results show that organizational readiness is an important indicator of BIM adoption. Organizations with sufficient IT infrastructure to implement BIM and available internal expertise to use it are more likely to adopt BIM. Internal competency also allows organizations to try the software before actual implementation, which boosts confidence in the adoption decision. These findings are consistent with [34, 44, 48]. Financial constraints comprise the cost of BIM adoption, ongoing costs, and implementation costs. This study found a negative association between cost and BIM adoption, which is consistent with [25, 34]. The analysis found no significant effect of competitive pressure, which suggests that there is little pressure from competing organizations. Another possibility is that AEC stakeholders are waiting to see specific gains and benefits from early BIM adopters before adopting BIM in their own organizations. This finding is consistent with previous literature [34, 40, 44]. This study also found regulatory support to be a significant driving factor for BIM adoption. These findings are consistent with [25, 45], indicating that BIM is demanded and supported by regulatory bodies. In conclusion, the objective of this study was to determine the effect of these factors on BIM adoption in the Malaysian AEC industry.
The factors were categorized according to the Technology-Organization-Environment (TOE) framework. Finally, this study presents a BIM adoption model for the Malaysian AEC industry, and the model is validated through data collection and statistical analysis. BIM is a promising field of research because of its applicability in AEC and related disciplines. Although this study provides a holistic view of BIM adoption factors in Malaysia, it is not without limitations. The first limitation is the selection of participants, as only two major cities of Malaysia
H. M. F. Shehzad et al.
were considered for data collection. Future studies should consider a larger sample comprising other parts of Malaysia, such as East Malaysia. The second limitation is the use of a single technology adoption theory. Future studies are encouraged to combine factors from multiple models and theories for a more comprehensive analysis of the adoption phenomenon. Future work should also consider moderators in combination with existing technology acceptance models to analyze BIM adoption comprehensively. In addition, future studies could analyze the effect of interoperability factors on BIM adoption. This paper can help researchers interested in technology adoption to carry out further research in the BIM adoption domain. It can also help AEC organizations and practitioners address the identified factors to assess and promote BIM adoption in Malaysia.
References
1. Juan, Y.-K., Lai, W.-Y., Shih, S.-G.: Building information modeling acceptance and readiness assessment in Taiwanese architectural firms. J. Civ. Eng. Manag. 23, 356–367 (2017). https://doi.org/10.3846/13923730.2015.1128480
2. Mahamadu, A., Mahdjoubi, L., Booth, C.: Determinants of building information modelling (bim) acceptance for supplier integration: a conceptual model. In: Proceedings 30th Annual ARCOM Conference, pp. 723–732. Association of Researchers in Construction Management, Portsmouth (2014)
3. Malaysia Department of Statistics: Department of Statistics Malaysia Official Portal. https://www.dosm.gov.my/v1/index.php?r=column/pdfPrev&id=RmpwV3lyVVVtemdIdHYyKzdZT2dvQT09
4. MPC Malaysia: Malaysia Productivity Corporation (MPC). https://www.mpc.gov.my/
5. CIDB Malaysia: Malaysia Building information modeling report 2016. https://www.cidb.gov.my/images/content/penerbitan-IBS/BIM-REPORT.pdf
6. Hatem, W.A.: Motivation factors for adopting building information modeling (BIM) in Iraq. Eng. Technol. Appl. Sci. Res. 8, 2668–2672 (2018). https://doi.org/10.5281/ZENODO.1257505
7. Jongsung, W., Ghang, L.: Where to focus for successful adoption of building information modeling within organization. J. Constr. Eng. Manag. 139, 51–58 (2013). https://doi.org/10.1061/(ASCE)CO.1943-7862
8. Latiffi, A.A., Brahim, J., Fathi, M.S.: Transformation of Malaysian construction industry with building information modelling (BIM). MATEC Web Conf. 66, 00022 (2016). https://doi.org/10.1051/matecconf/20166600022
9. Ghaffarianhoseini, A., Tookey, J., Ghaffarianhoseini, A., Naismith, N., Azhar, S., Efimova, O., Raahemifar, K.: Building Information Modelling (BIM) uptake: clear benefits, understanding its implementation, risks and challenges. Renew. Sustain. Energy Rev. 75, 1046–1053 (2017). https://doi.org/10.1016/j.rser.2016.11.083
10. Ahmed, A.L., Kawalek, J.P., Kassem, M.: A comprehensive identification and categorisation of drivers, factors, and determinants for BIM adoption: a systematic literature review. Comput. Civil Eng. 2017, 220–227 (2017)
11. Herr, C.M., Fischer, T.: BIM adoption across the Chinese AEC industries: an extended BIM adoption model. J. Comput. Des. Eng. 6(2), 173–178 (2018). https://doi.org/10.1016/j.jcde.2018.06.001
12. Walasek, D., Barszcz, A.: Analysis of the adoption rate of building information modeling [BIM] and its return on investment [ROI]. Procedia Eng. 172, 1227–1234 (2017). https://doi.org/10.1016/j.proeng.2017.02.144
Building Information Modelling Adoption Model for Malaysian Architecture
13. Ngowtanasawan, G.: A causal model of BIM adoption in the Thai architectural and engineering design industry. In: Procedia Engineering, pp. 793–803 (2017)
14. Bosch-Sijtsema, P., Isaksson, A., Lennartsson, M., Linderoth, H.C.J.: Barriers and facilitators for BIM use among Swedish medium-sized contractors - "We wait until someone tells us to use it." Vis. Eng. 5, 1–12 (2017). https://doi.org/10.1186/s40327-017-0040-7
15. Mustaffa, N.E., Salleh, R.M., Ariffin, H.L.B.T.: Experiences of building information modelling (BIM) adoption in various countries. In: 2017 International Conference on Research and Innovation in Information Systems (ICRIIS), pp. 1–7. IEEE (2017)
16. Yusuf, B.Y., Embi, M.R., Ali, K.N.: Academic readiness for building information modelling (BIM) integration to Higher Education Institutions (HEIs) in Malaysia. In: International Conference on Research and Innovation in Information Systems, ICRIIS, pp. 1–6 (2017)
17. Ministry of International Trade and Industry: Ministry of International Trade and Industry. https://www.miti.gov.my/index.php/pages/view/industry4.0?mid=559
18. Baker, J.: Information Systems Theory (2012)
19. Jeyaraj, A., Rottman, J.W., Lacity, M.C.: A review of the predictors, linkages, and biases in IT innovation adoption research. J. Inf. Technol. 21, 1–23 (2006). https://doi.org/10.1057/palgrave.jit.2000056
20. Rogers, E.M.: Diffusion of Innovations, p. 551. Free Press, New York (2003)
21. Takim, R., Harris, M., Nawawi, A.H.: Building information modeling (BIM): a new paradigm for quality of life within architectural, engineering and construction (AEC) industry. Procedia Soc. Behav. Sci. 101, 23–32 (2013). https://doi.org/10.1016/j.sbspro.2013.07.175
22. Ding, L., Xu, X.: Application of cloud storage on BIM life-cycle management. Int. J. Adv. Robot. Syst. 11, 129 (2014). https://doi.org/10.5772/58443
23. Rogers, E.M.: Diffusion of Innovations, 4th edn. The Free Press, New York (1995)
24. Gao, J., Li, M., Tan, C.Y.: A concept model for innovation diffusion in construction industry. In: International Conference Innovations of Engineering and Technology, pp. 262–266 (2013)
25. Xu, H., Feng, J., Li, S.: Users-orientated evaluation of building information model in the Chinese construction industry. Autom. Constr. 39, 32–46 (2014). https://doi.org/10.1016/j.autcon.2013.12.004
26. Seed, L.S.: The Dynamics of BIM Adoption: A Mixed Methods Study of BIM as an Innovation within the United Kingdom Construction Industry. Thesis 1 (2015)
27. Son, H., Lee, S., Kim, C.: What drives the adoption of building information modeling in design organizations? An empirical investigation of the antecedents affecting architects' behavioral intentions. Autom. Constr. 49, 92–99 (2015). https://doi.org/10.1016/j.autcon.2014.10.012
28. Euisoon, A., Kim, M.: BIM awareness and acceptance by architecture students in Asia. J. Asian Archit. Build. Eng. 15, 419–424 (2016)
29. Parasuraman, A., Colby, C.L.: An updated and streamlined technology readiness index: TRI 2.0. J. Serv. Res. 18, 59–74 (2015). https://doi.org/10.1177/1094670514539730
30. Hanafi, M.H., Sing, G.G., Abdullah, S., Ismail, R.: Organisational readiness of building information modelling implementation: architectural practices. J. Teknol. 78, 121–126 (2016). https://doi.org/10.11113/jt.v78.8265
31. Ding, L., Zhou, Y., Akinci, B.: Building information modeling (BIM) application framework: the process of expanding from 3D to computable nD. Autom. Constr. 46, 82–93 (2014). https://doi.org/10.1016/j.autcon.2014.04.009
32. Oraee, M., Hosseini, M.R., Banihashemi Namini, S., Merschbrock, C.: Where the gaps lie: ten years of research into collaboration on BIM-enabled construction projects. Constr. Econ. Build. 17, 121 (2017). https://doi.org/10.5130/AJCEB.v17i1.5270
33. Merschbrock, C., Nordahl-Rolfsen, C.: BIM technology acceptance among reinforcement workers - the case of Oslo Airport's Terminal 2. J. Inf. Technol. Constr. 21, 1–2 (2016)
H. M. F. Shehzad et al.
34. Ahuja, R., Jain, M., Sawhney, A., Arif, M.: Adoption of BIM by architectural firms in India: technology–organization–environment perspective. Archit. Eng. Des. Manag. 12, 311–330 (2016). https://doi.org/10.1080/17452007.2016.1186589
35. Koo, B., Shin, B., Krijnen, T.F.: Employing outlier and novelty detection for checking the integrity of BIM to IFC entity associations. In: ISARC 2017 - Proceedings of the 34th International Symposium on Automation and Robotics in Construction, pp. 14–21 (2017)
36. Zhang, L., Wang, G., Liu, H.: The development trend and government policies of open BIM in China. In: Proceedings of the 17th International Symposium on Advancement of Construction Management and Real Estate, pp. 981–993 (2014)
37. European Commission: New European Interoperability Framework
38. Pauwels, P., Zhang, S., Lee, Y.C.: Semantic web technologies in AEC industry: a literature overview. Autom. Constr. 73, 145–165 (2017). https://doi.org/10.1016/j.autcon.2016.10.003
39. Boone, J.: Competitive pressure: the effects on investments in product and process innovation. RAND J. Econ. 31, 549 (2006). https://doi.org/10.2307/2601000
40. Cao, D., Li, H., Wang, G.: Impacts of isomorphic pressures on BIM adoption in construction projects. J. Constr. Eng. Manag. 140, 04014056 (2014). https://doi.org/10.1016/j.chemgeo.2003.12.009
41. Babič, N.Č., Rebolj, D.: Culture change in construction industry: from 2D toward BIM based construction. J. Inf. Technol. Constr. 21, 86–99 (2016). https://doi.org/10.18632/oncotarget.5527
42. Desbien, A.L.: Using BIM capabilities to improve existing building energy modelling practices. Eng. Constr. Archit. Manag. 21, 16–33 (2017). https://doi.org/10.3130/jaabe.15.279
43. Hair, J.F., Jr., Hult, G.T.M., Ringle, C., Sarstedt, M.: A Primer on Partial Least Squares Structural Equation Modeling. SAGE Publications Ltd., Thousand Oaks (2013)
44. Chen, Y., Yin, Y., Browne, G.J., Li, D.: Adoption of building information modeling in Chinese construction industry: the technology-organization-environment framework. Eng. Constr. Archit. Manag. (2019). https://doi.org/10.1108/ECAM-11-2017-0246
45. Ahuja, R., Sawhney, A., Jain, M., Arif, M., Rakshit, S.: Factors influencing BIM adoption in emerging markets – the case of India. Int. J. Constr. Manag. 3599, 1–2 (2018). https://doi.org/10.1080/15623599.2018.1462445
46. Son, H., Lee, S., Hwang, N., Kim, C.: The adoption of building information modeling in the design organization: an empirical study of architects in Korean design firms. In: 31st International Symposium Automation Robotics and Construction Mining (ISARC 2014), pp. 194–201 (2014)
47. Tsai, M.-C., Lai, K.-H., Hsu, W.-C.: A study of the institutional forces influencing the adoption intention of RFID by suppliers. Inf. Manag. 50, 59–65 (2013)
48. Ding, Z.: Key factors for the BIM adoption by architects: a China study. Eng. Constr. Archit. Manag. (2015). https://doi.org/10.1108/ECAM-04-2015-0053
Digital Government Competency for Omani Public Sector Managers: A Conceptual Framework
Juma Al-Mahrezi (corresponding author), Nur Azaliah Abu Bakar, and Nilam Nur Amir Sjarif
Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
{azaliah,nilamnur}@utm.my
Abstract. Governments invest in a transformation initiative called Digital Government, previously known as e-Government. Digital Government requires technology, people, and processes, along with a set of strategies. For Digital Government to succeed, government employees must be equipped with appropriate digital skills, a topic that has not yet been rigorously studied. In Oman, studies on digital government started in 2004; however, most of them focus on strategy, process, and technology and neglect the people aspect. There is also a lack of studies on Digital Government Competency (DGC) for public sector managers, and studies on how to retain employees with advanced ICT competency in different public sector organisations are still needed. Therefore, this study aims to develop a conceptual framework that assesses the relationship between public sector managers' digital government competency and their digital leadership skills, data protection skills, soft skills, digital literacy, management skills, and digital creativity and innovation. The framework is grounded in Human Capital Theory (HCT) and the Technology-Organisation-Environment (TOE) theory. Previous literature indicates a relationship between the success of digital government initiatives and employees' competency. This study therefore proposes a framework for researchers and governments that demonstrates the value of DGC to the Omani government and can help increase the success rate of Omani Digital Government initiatives.
Keywords: Competency · Digital government · Human capital theory · Public sector · Technology organization environment theory
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1009–1020, 2021. https://doi.org/10.1007/978-3-030-70713-2_90
1 Introduction
The skills required for successful digital transformation today differ from those required in previous years. Enterprise managers therefore have to define the skills that workers in technology departments, as distinct from other departments, need in order to succeed in digital transformation efforts today and in the coming years. According to a Global CIO survey by Kane et al. [1], creativity, learning skills, and emotional intelligence are among the most necessary technological skills, and demand for them is expected to increase over the next three years. According to Half [2], the main challenge in executing a digital initiative successfully is finding technology professionals with the right combination of relevant skills to complement the digital project team. A study of Australian industry by Gekara et al. [3] found that three occupational classes (supervisors, technicians, and merchants) show a great need for digital skills and lack a broader appreciation of the digital environment, which is a major shortcoming in an evolving digital economy. Emerging technology is likely to shape the future of work: workers will need to learn particular technologies in specific contexts and to acquire a higher degree of overall digital skill [3]. At the same time, all management levels in organisations are engaging more intensively with digital systems, which requires a higher degree of technical competence [3]. Managers in public administrations are particularly challenged, as they must possess a digital mindset to rethink all processes digitally [4].
Likewise, the Digital Government initiative in Oman may face the same issues related to digital government competencies and skills at the management level. Changes in organisational processes and in traditional ways of operating and delivering goods and services have influenced workforce needs, knowledge, skills, and competencies. To confirm that these digital competency issues are also relevant to the Omani public sector, the researchers interviewed five experts involved in the Omani Digital Government initiative. The experts agreed that the initiative is affected by the digital skills of both ICT and non-ICT personnel.
All five experts agreed that the most common problem affecting Digital Government is the lack of skills and expertise among IT professionals and public sector workers. These preliminary interviews reveal the need to examine the IT skills that Omani government workers require for Digital Government initiatives. The lack of the various digital skills required to integrate e-government or to digitise government is the main reason for digitisation failure in most developing countries. Conversely, digital skills enable governments to implement e-government or digitisation successfully, yet the status and level of digital skills among government staff remain unknown [5]. Furthermore, Al-Kalbani [7] pointed out that the critical factors for effective compliance with information security in the Digital Government of Oman are government loyalty, awareness and training, organisation assurance, audit and monitoring, legal and social forces, technology capability, system integration, and technology compatibility and reliability. Previous research has given attention to various contexts of digital government but less attention to digital government competencies. The review of the problem background therefore shows a lack of studies on the digital government competencies of public sector managers. Previous studies on digital government competencies took an organisational rather than an employee perspective. Moreover, no studies of digital government competencies have been conducted in the Omani public sector, and no digital government competency framework exists for public sector managers. The problem addressed by this study is thus that, although government employees must be equipped with relevant digital skills for Digital Government initiatives to succeed, this topic has not been rigorously studied. This study aims to fill this gap by investigating public employees' digital government competencies and proposing a digital government competency framework from the perspective of public sector managers.
This research is significant because it bridges the gap between digital proficiencies and Digital Government initiatives in Omani government organisations. The findings will be useful because the Omani government needs to improve the performance of its Digital Government. The government can use the proposed framework to update its plans and strategies for improving public sector managers' digital government competencies. Measuring and identifying the relevant competencies is very important in any government, including Oman's. The digital government competency framework will help organisations explain priorities and set performance goals across government agencies, and it will guide executives and staff in understanding the behaviours and abilities expected of them and how to achieve them. This research also supports Oman Vision 2040, whose ten main indicators include national competencies with dynamic capabilities and skills to compete locally and internationally. The following sections review previous Digital Government studies and competencies and develop the related hypotheses.
2 Related Work
Competency is defined as a grouping of skills, attributes, and behaviours directly related to successful performance on the job; competencies are essential for all staff, regardless of occupation, function, or level [9]. A competency framework, in turn, covers the preferred competencies for a specific task and may also include descriptions of individual competencies and indicators for measuring performance and outcomes [10]. In this paper, Digital Government Competencies are defined as government employees' skills and ability to adapt to and successfully implement the Digital Government initiative.
2.1 Digital Government and Competencies
The lack of various digital skills is one of the factors hindering successful digital e-government or government transformation. However, the status and level of digital skills among government employees remain unknown [5]. In light of this, enterprise managers must define the skills that workers in technology departments need to succeed in digital transformation efforts today and in the coming years. Kane et al. [1] stated that about 77% of the Chief Information Officers interviewed found it difficult to find new digital skills. In addition, 38% of enterprises indicated that digital transformation initiatives failed to meet expectations because of a lack of talent within those companies, and about 35% of interviewees said there is a lack of transformation-specific leadership. Similarly, the Technology Benchmarking Survey showed that 77% of IT hiring decision-makers in North America had difficulty finding candidates with up-to-date digital skills [2]. According to Butschan, Heidenreich, Weber and Kraemer [11], empirical evidence on competencies for digital transformation is minimal. To reduce this gap, they developed and tested competency frameworks for digital transformation and found that cognitive skills are crucial for successful digital transformation. Most developing countries build and execute digital government service projects using existing frameworks from developed countries. However, these frameworks are often not relevant because the criteria and context differ. There is also a lack of public administration research in the Arab Gulf States, as most of these countries borrow Western best practices [6]. Moreover, knowledge of which competencies and skills are of particular importance for digital transformation is limited [8]. Consequently, a framework is needed that captures the current requirements and needs of developing countries, particularly with regard to public sector digital government competencies.
2.2 Related Theories on Digital Competency
Technology-Organisation-Environment (TOE) Theory: TOE holds that the adoption and implementation of technological innovations in organisations are shaped by the technological, organisational, and environmental contexts surrounding their operations [12]. These three sets of factors present both constraints and opportunities for technological innovation [12]. The TOE framework, as initially presented and later adapted in IT adoption studies, provides a useful analytical framework with potential application to IS innovation domains. However, the specific factors identified within the three contexts may vary across studies [7].
Human Capital Theory (HCT): In defining Human Capital Theory, Baker [13] stated that human capital includes the knowledge and skills embodied in an individual that can be generated, developed, gathered, advanced, managed, and retained. Its value can be established at all levels of society.
Florida [14] noted that the essential feature of this theory is people's talent and its impact on economic growth and development. The theory also highlights that formal education is highly instrumental and necessary for improving the productive capacity of a population. However, Human Capital Theory studies usually assume that experience is translated into knowledge and skills (Table 1).
2.3 The Gap in the Existing DGC Study
A review of theories and frameworks from academic and administrative perspectives shows that not only the hard technical skills associated with digital technology but also more general, 'soft' interpersonal skills are considered crucial digital government skills [1, 3, 5, 8]. The Framework on Digital Government Competency is therefore proposed on the basis of this literature review. The researchers collected digital competency skills and compared DGC across these different perspectives, as shown in Table 2 and Fig. 1. Table 2 shows that government organisations in Australia, Malaysia, Singapore, and the U.K. have different perspectives on DGC. Information security competency, digital creativity and innovation, soft skills, digital literacy, and management competency are the most frequently mentioned DGC in the literature.
Table 1. Studies that used the related theories
Theory | Author (year) | Topic
TOE | Al-Kalbani (2017) [7] | A compliance-based framework for information security in e-government in Oman
TOE | Al-Balushi, Bahari and Rahman (2016) [15] | Technology Organizational and Environmental (TOE) Factors Influencing Enterprise Application Integration (EAI) Implementation in Omani Government Organisations
HCT | Nam (2018) [16] | Examining the anti-corruption effect of e-government and the moderating effect of national culture: A cross-country study
HCT | Elena (2018) [17] | Digital economy and a new paradigm of the labour market
Fig. 1. Mapping of the research
3 Methodology
3.1 Research Design
The research methodology involved three phases. The first phase, investigation, covered defining the research problem and the knowledge gap. The second phase extracted Digital Government competencies and identified related theories. The third phase was devoted to developing an integrated framework: the competencies were classified and the framework was then constructed, as shown in Fig. 2.
Fig. 2. Research methodology. Investigation: •Defining research problem •Defining knowledge gap •Preliminary interview with experts. Data collection & analysis: •Review of literature •Defining key concepts & related theories. Propose Framework: •Developing hypotheses •Developing a conceptual framework.
3.2 Data Collection
The researchers searched various databases for articles published from 2015 onwards, which yielded about 100 articles related to the research topic. They then screened the titles, abstracts, and keywords of these articles, kept 30 that addressed the research topic, read them, manually traced their sources, and performed a conventional literature review. The aim was to locate further relevant articles beyond the results of the online search. The researchers also sought similar studies by the same authors or research groups, as well as papers citing previously discovered publications. The databases searched were www.sciencedirect.com, ieeexplore.ieee.org, and www.scopus.com, together with other databases provided by Universiti Teknologi Malaysia (UTM) and Google Scholar. The search keywords were e-government in Oman, digital government, digital skills, government competencies in Oman, digital government competency framework, competency, public sector, human capital theory, and technology organisation theory. In total, the researchers reviewed 90 articles and governmental reports.
3.3 Data Analysis
The second phase of this study involved analysing the collected data, namely the articles and organisational reports. The first step was to extract Digital Government competencies and identify related theories; the competencies were then classified, and finally the framework was constructed. Competency extraction drew on the perspectives of academic researchers and organisations and on past frameworks, as presented in Table 2.
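The title, abstract, and keyword screening in Sect. 3.2 was carried out manually by the researchers. Purely to illustrate its logic, the Python sketch below removes duplicate hits returned by several databases and keeps records published from 2015 onwards that mention at least one search term. The `Record` type, the `screen` function, the shortened term list, and the sample records are all hypothetical, and substring matching is a deliberate simplification of reading each article, not the authors' procedure.

```python
from dataclasses import dataclass

# Hypothetical, shortened subset of the search keywords listed above.
SEARCH_TERMS = ("digital government", "e-government", "digital skills",
                "competency", "public sector")


@dataclass(frozen=True)
class Record:
    """One search hit exported from a bibliographic database (hypothetical)."""
    title: str
    abstract: str
    keywords: tuple
    year: int
    database: str


def screen(records, terms=SEARCH_TERMS, min_year=2015):
    """Drop duplicate titles, then keep records published in or after
    min_year whose title, abstract or keywords mention any search term."""
    seen_titles = set()
    kept = []
    for rec in records:
        key = " ".join(rec.title.lower().split())  # normalise case and spacing
        if key in seen_titles:
            continue  # same article already returned by another database
        seen_titles.add(key)
        if rec.year < min_year:
            continue
        text = " ".join((rec.title, rec.abstract, *rec.keywords)).lower()
        if any(term in text for term in terms):
            kept.append(rec)
    return kept


# Invented example records, for illustration only.
records = [
    Record("Digital skills of civil servants", "A review.", ("digital skills",), 2019, "Scopus"),
    Record("Digital Skills of  Civil Servants", "A review.", ("digital skills",), 2019, "IEEE Xplore"),
    Record("Innovation processes", "An early study.", ("innovation",), 1990, "Google Scholar"),
    Record("Cloud storage for BIM", "Life-cycle management.", ("BIM",), 2018, "ScienceDirect"),
]
kept = screen(records)
print([r.database for r in kept])  # ['Scopus']
```

The second record is dropped as a duplicate of the first, the third is too old, and the fourth matches no term; in the actual study this judgement was made by reading, which substring matching only approximates.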
4 Research Conceptual Framework
This paper aims to define a competency framework as a tool that supports organisations in improving Digital Government competency in the Omani public sector. Based on the review of past studies and government reports, this study outlines the six most important competencies of public sector managers that influence digital government competency (Fig. 3).
Table 2. Comparative analysis of DGC skills from previous studies, by country and existing DGC framework
Previous studies, countries and existing DGC frameworks compared: UK (Civil Service Workforce Plan 2016 to 2018) [20]; Australia (Workforce Digital Skills Framework) [3, 21]; Malaysia (Malaysia's Digital Government Capability Framework) [18]; Singapore (Singapore Capability Framework) [19]; Albalushi, Zaidan, Khadir and Yusof (2019) [22]; Osmundsen (2020) [8]; Prifti, Knigge, Kienegger and Krcmar (2017) [23]; Gekara et al. (2019) [3]; Andriole (2018) [24]; Malanda (2019) [5]; the European e-Competence Framework [25]; the European Digital Competence Framework for Citizens [25]; SFIA [26].
TOTAL: 4, 7, 9, 6, 5, 3
C1 - Digital Leadership Competency; C2 - Soft Skills; C3 - Management Competency; C4 - Information Security Competency; C5 - Digital Literacy; C6 - Digital Creativity and Innovation
Fig. 3. Proposed research framework. HCT: C3 Management competency (H1), C2 Soft skills (H2), C5 Digital literacy (H3), C6 Digital creativity and innovation (H4). TOE: C4 Information security competency (H5), C1 Digital leadership competency (H6). Each construct is linked to Digital Government competency.
4.1 Management Competency
Management skills refer to managing people and other organisational resources, including managing projects, supporting staff, finding solutions, and planning. As new digital technologies are introduced into organisations, new management skills, such as re-engineering business processes, are needed to enable significant business improvements. Managing the organisation's human capital and competencies is therefore essential for successful digital transformation [8]. Moreover, project management standards and communications were found to have a significant impact on ICT implementation: the self-efficacy factor showed the highest correlation with ICT success, followed by top management support, whereas resistance to change showed the lowest correlation [27]. The nature of management practices also influences the organisation's success and its methods of controlling workplace outcomes. Top management support ensures that sufficient resources are provided and distributed efficiently, and senior management's approval is needed for employees to work harder and be innovative [27]. Additionally, agile program management is required for digital transformation, alongside other skills and competencies [24], and agile project management can be a remedy for projects that progress slowly [8]. Project management standards are therefore an essential means of improving and enhancing project management [27]. Likewise, DGC strategy and architecture skills such as IT Governance, IT Strategy and Preparation, Information Management, Information Systems Integration, Information Security, Information, Verification, Analytics, and Publishing Information Content are crucial for planning, adopting, and implementing new technologies in any organisation, including in the Digital Government initiative. Thus, based on the above argument, the hypothesis is developed and proposed as follows: H1: Management competency will positively influence DGC.
4.2 Soft Skills Competency
A mixture of hard and soft skills is required, as employees are expected to be able to select knowledge from different sources and then apply it both professionally and in their personal lives [28]. Digital skills must also not be viewed in isolation from other skills such as soft skills [29]. Examples of soft skills are digital problem-solving and digital communication and collaboration. Digital problem-solving is defined as troubleshooting and solving problems using digital technologies, recognising workplace problems and needs in the digital environment, and proposing innovative solutions [3]. Problem-solving is part of several digital skill frameworks, consistent with other (non-digital) skill frameworks [3]. A lack of ICT skills affects several industries globally, and obtaining appropriate information and communication skills remains a challenging task worldwide and in Southern Africa [5]. In addition, project management standards and communication have been found to influence ICT implementation significantly [27].
So, based on the above argument, the hypothesis is developed and proposed as follows: H2: Soft skills competency will positively influence DGC.
4.3 Digital Literacy
This paper defines digital literacy as the hard technical skills needed to operate digital devices, software, and systems. It also comprises the cognitive skills required to work in an increasingly data- and information-intensive environment that encompasses a wide range of information and data sources and types, as well as safety-related ethical skills and strategic skills for troubleshooting and addressing work-related issues. The growth of the Internet in the 1990s prompted new thinking about the types of skills needed to function in a networked online media environment [29]. Technical, operational, and non-technical skills, such as finding, evaluating, and managing increasing amounts of information on the Internet, have become of utmost importance [3]. This change is reflected in terms such as 'information literacy' and 'digital media literacy'. Furthermore, several studies have identified digital skills gaps as critical challenges to the growth of Australia's digital economy and a significant obstacle that needs to be overcome through innovative policies [3]. Another notable finding is that, while digital skills were initially seen as primarily the domain of highly technical ICT professionals, it is becoming increasingly evident that the general workforce, including those in low-skilled occupations, needs digital skills to navigate highly digitalised and mechanised workplaces [3]. Thus, based on the above discussion, the hypothesis is developed and proposed as follows: H3: Digital literacy will positively influence DGC.
4.4 Digital Creativity and Innovation
For economic improvement and sustainable growth, knowledge alone is not enough. Innovation and creativity are essential requirements for turning knowledge into ideas and products that add value for the organisation, people, or individuals. The use of experience can be considered a skill [28].
So, based on the above argument, the hypothesis is developed and proposed as follows: H4: Digital creativity and innovation will positively influence DGC.
4.5 Information Security Competency
The researchers define information security competency as the ability to browse the Internet and use email in line with security instructions and data protection supervision. Digital security is defined as complying with organisational policies to protect hardware, software, information, and systems, and analysing digital risks to identify cybersecurity threats and vulnerabilities [3]. Security is also a crucial feature of e-services, particularly government services that handle sensitive personal data. The essential security requirement is that Digital Government applications be capable of protecting citizens' sensitive data from loss and unauthorised access [30]. This also increases the reliability and credibility of government services. In other words, information security and privacy are critical factors in the success of e-Government [30]. Thus, based on the above discussions, the hypothesis is developed and proposed as follows: H5: Information security competency will positively influence DGC.
4.6 Digital Leadership Competency
Digital leadership competency is defined as the ability to lead people and manage their success. Project management standards and communications were found to have a significant impact on ICT implementation, with self-efficacy showing the highest correlation with ICT factors, followed by top management support [27]. The same study [27] identified four new dimensions that can stimulate successful ICT implementation in the Omani context: acceptance, effective leadership, ICT experts, and situational awareness. Leadership style is accordingly considered significant for achieving an appropriate culture and communication. Staff management and leadership at the centre are administered through supervisory instruction, which is consistent with other public sector departments operating under specific legislation and mandates [30]. Thus, based on the above argument, the hypothesis is developed and proposed as follows: H6: Digital leadership competency will positively influence DGC.
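The six hypotheses and the theory groupings of the proposed framework can be written down as a small data structure, which makes explicit that every hypothesis predicts a positive path from a competency to DGC. The Python sketch below is a hypothetical illustration: the grouping of constructs under HCT and TOE is read from the framework figure, while `PathEstimate`, `supported`, the 0.05 threshold, and the example estimates are assumptions, since the paper proposes the framework but does not yet test it.

```python
from dataclasses import dataclass

# The framework as data: hypothesis -> (construct code, construct, grounding theory).
# Every hypothesis states that its construct positively influences DGC.
FRAMEWORK = {
    "H1": ("C3", "Management competency", "HCT"),
    "H2": ("C2", "Soft skills", "HCT"),
    "H3": ("C5", "Digital literacy", "HCT"),
    "H4": ("C6", "Digital creativity and innovation", "HCT"),
    "H5": ("C4", "Information security competency", "TOE"),
    "H6": ("C1", "Digital leadership competency", "TOE"),
}


@dataclass(frozen=True)
class PathEstimate:
    """Hypothetical output of a future empirical test of one path."""
    coefficient: float  # estimated effect of the construct on DGC
    p_value: float


def hypotheses_for(theory):
    """Hypotheses whose construct is grounded in the given theory."""
    return [h for h, (_, _, t) in FRAMEWORK.items() if t == theory]


def supported(estimates, alpha=0.05):
    """Hypotheses whose path has the predicted (positive) sign and p < alpha."""
    return sorted(
        h for h, est in estimates.items()
        if h in FRAMEWORK and est.coefficient > 0 and est.p_value < alpha
    )


# Invented numbers, for illustration only; the paper reports no estimates.
example = {
    "H1": PathEstimate(0.31, 0.001),
    "H2": PathEstimate(0.12, 0.20),   # positive but not significant
    "H5": PathEstimate(-0.05, 0.01),  # significant but opposite sign
}
print(hypotheses_for("TOE"))  # ['H5', 'H6']
print(supported(example))     # ['H1']
```

Keeping the construct-to-hypothesis mapping in one table also guards against labelling two hypotheses with the same identifier.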
5 Conclusion and Future Work
In conclusion, the digital government competency framework will help organisations explain priorities and set performance criteria throughout government organisations. Measuring and identifying competencies is very important in any government organisation. The government can use the proposed framework to update its plans and strategies for improving public sector managers' digital government competencies. The framework will guide workers and managers in understanding the abilities required of them and what they should aspire to accomplish. For government organisations to deliver digital services fully and successfully, decision-makers and government managers need to know their needs and focus on the competencies required to execute digital government successfully. The investigation of competencies and skills will therefore help to develop an efficient digital government competency framework capable of addressing employees' needs and achieving common objectives. The next phase is to update the proposed digital government competency framework for public sector managers. Additionally, another competency will be studied and considered when updating the framework.
Acknowledgement. This research is financially supported by Universiti Teknologi Malaysia TDR Grant under Vot Number Q.K130000.3556.06G26.
References
1. Kane, G.C., et al.: Aligning the organisation for its digital future (2016). https://www2.deloitte.com/us/en/insights/topics/emerging-technologies/mit-smr-deloitte-digital-transformation-strategy.html
2. Half, R.: Staffing Digital Projects: Not as Straightforward as it Sounds (2017). https://www.roberthalf.com/blog/management-tips/staffing-digital-projects-not-as-straightforward-as-it-sounds
Digital Government Competency for Omani Public Sector Managers
3. Gekara, V., et al.: Skilling the Australian workforce for the digital economy. Research Report (2019) 4. Mergel, I.: Competencies for the digital transformation of public administrations (2020). https://www.co-val.eu/blog/2020/04/08/digital-transformation-of-public-administrations/ 5. Malanda, D.: Digital skills in the public sector: A systematic literature review. In: Digital Innovation and Transformation Conference (2019) 6. Okoth, S.: A review of state of the art of public administration in western academia: lessons for the Gulf States. Middle East Rev. Public Adm. (MERPA) 1(1), 331 (2015) 7. Al-Kalbani, A.: A compliance-based framework for information security in e-government in Oman (2017) 8. Osmundsen, K.: Competencies for digital transformation: insights from the Norwegian energy sector. In: Proceedings of the 53rd Hawaii International Conference on System Sciences (2020) 9. UN, U.N., UN Competency Development (2010) 10. Lee, M.-X., Lee, Y.-C., Chou, C.: Essential implications of the digital transformation in industry 4.0 (2017) 11. Butschan, J., et al.: Tackling hurdles to digital transformation—the role of competencies for successful industrial Internet of things (IIoT) implementation. Int. J. Innov. Manag. 23(04), 1950036 (2017) 12. Tornatzky, L.G., Fleischer, M., Chakrabarti, A.K.: Processes of Technological Innovation. Lexington Books (1990) 13. Baker, J.: The Technology–Organisation–Environment Framework, pp. 231–245 (2011) 14. Florida, R.: The economic geography of talent. Ann. Assoc. Am. Geogr. 92(4), 743–755 (2002) 15. Al-Balushi, F., Bahari, M., Rahman, A.: Technology organizational and environmental (TOE) factors influencing enterprise application integration (EAI) implementation in Omani government organizations. Ind. J. Sci. Technol. 9(46), 1–5 (2016) 16. Nam, T.: Examining the anti-corruption effect of e-government and the moderating effect of national culture: a cross-country study. Gov. Inf. Q. 35(2), 273–282 (2018) 17. Elena, S.: Digital economy and a new paradigm of the labor market. Mirovaya ekonomika i mezhdunarodnye otnosheniya 62(12), 35–45 (2018) 18. Mohtar Mohd, A.R.: Get Digital Right… through Digital Capability (2017) 19. Singapore, G.: Newly-launched GovTech to Transform Public Service Delivery with Citizen-centric Digital Services and Products (2016). https://www.tech.gov.sg/media/media-releases/newly-launched-govtech-to-transform-public-service-delivery-with-citizen-centric-digital-services-and-products 20. GOV.UK. Government Transformation Strategy: people, skills and culture (2017). https://www.gov.uk/government/publications/government-transformation-strategy-2017-to-2020/government-transformation-strategy-people-skills-and-culture 21. Australian Government, D.T.A. Digital Service Platforms Strategy. Transform our culture, skills and capabilities (2020). https://www.dta.gov.au/our-projects/digital-service-platforms-strategy/six-keys-success/2-transform-our-culture-skills-and-capabilities 22. Albalushi, A., Zaidan, A., Khadir, F.A.B.A.: Competency identification of officials in Omani civil service for improving government performance. Int. J. Bus. Social Sci. 10(11) (2019) 23. Prifti, L., et al.: A Competency Model for "Industrie 4.0" Employees (2017) 24. Andriole, S.J.: Skills and competencies for digital transformation. IT Prof. 20(6), 78–81 (2018) 25. Union, E.: The European e-Competence Framework 3.0, in A common European Framework for ICT Professionals in all industry sectors (2014) 26. Burrows, M.: SFIA6: The Complete Reference Guide. SFIA Foundation, London (2015)
J. Al-Mahrezi et al.
27. Al-Lamki, Z.S.: The Influence of Culture on the Successful Implementation of ICT Projects in Omani E-government (2018) 28. Malama, H., Mawela, T.: Digital literacy in social media and the factors affecting a knowledge-based economy. In: Digital Innovation and Transformation Conference (2019) 29. ECORYS. Digital skills for the UK economy (2016) 30. Al-Mamari, Q.: E-Government adoption and implementation in Oman: a government perspective (2013)
Computational Vision and Robotics
Landmark Localization in Occluded Faces Using Deep Learning Approach
Zieb Rabie Alqahtani1,2,3(B), Mohd Shahrizal Sunar1,2, and Abdulaziz A. Alashbi1,2
1 Media and Game Innovation Centre of Excellence, Institute of Human Centered Engineering,
Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
2 School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
3 Deputyship of Communication and Digital Infrastructure, Ministry of Communication and Information Technology, Riyadh, Kingdom of Saudi Arabia
Abstract. Detecting and localizing facial landmarks in occluded faces is a challenging problem in computer vision. The challenge becomes more difficult when occlusion is high, i.e. when most of the face is veiled. Landmark localization in highly occluded faces is an open research gap that motivates more accurate and efficient solutions. This paper presents a review of recent advances in facial landmark detection and localization, discusses available datasets, and investigates the influence of occlusion on the accuracy, performance, and robustness of landmark detection. It also outlines existing challenges in dealing with and controlling occlusion. Keywords: Facial landmark detection · Face detection · Computer vision · Face recognition · Deep neural networks
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1023–1029, 2021. https://doi.org/10.1007/978-3-030-70713-2_91
1 Introduction Facial landmarks, also known as facial key-points, generally specify the areas of the nose, eyes, and mouth of a face [1, 2]. The main goal of a facial landmark detection system is to automatically detect and identify the locations of the facial key-points within a face in digital images or videos [3]. The resulting key-points are local points describing the unique location of a facial component, such as the eye, nose, and mouth corners. In general, these landmark points give the shape of the face [4]. Face landmark detection is a core ingredient of many face analysis tasks [5–7], such as facial attribute inference [8]. Many face applications in different scenarios use facial landmarks as a key step, such as emotion detection [9], 3D face modeling [10], and social media apps such as Snapchat, a face app with many funny and cute filters that are applied and augmented on the target face once it is detected and its facial landmarks are identified. The entire process relies mainly on the facial landmark localization and recognition system. Once the app detects the face
Z. R. Alqahtani et al.
and localizes its landmarks, many filters can easily be applied to the detected face, such as putting sunglasses on the eyes, adding a beard or moustache, or overlaying earrings on the ears. However, if the target face is occluded or covered, the Snapchat app fails dramatically to detect the face or localize its facial landmarks, and as a result it cannot apply any filter to that face. More recently, research has increasingly focused on challenging "in-the-wild" conditions, in which facial images can exhibit arbitrary facial expressions, head poses, illumination, facial occlusions, and other variations. It is still difficult to obtain facial landmark locations from images in such scenarios, where the face appears occluded or under extreme lighting variations [11]. In general, a robust method that can handle all of these variations is still lacking. For example, the Snapchat app can easily detect an unoccluded face and localize its facial landmarks in real time. However, applying augmented filters becomes a real challenge if the face is even partially occluded. The reason is that Snapchat's built-in face detection, which is the first step, fails to detect the occluded face, so the landmark localization system cannot find facial landmarks to pass to the Snapchat filter system. Despite the numerous studies proposed for landmark localization and detection of human faces in constrained conditions, the task remains difficult in unconstrained conditions, for example when the face is heavily occluded for religious reasons, as with the niqab [12], or by medical masks worn for health reasons, as is currently the case due to the COVID-19 coronavirus pandemic that has tragically struck the whole world [13].
The challenge usually arises when the face covering is partial or severe; in severe cases more than 50% of the face is occluded and only a few facial landmarks, such as the two eyes, are visible. There is increasing demand to improve face detectors for occlusion because the task is becoming more complex and difficult [6]. In this paper, we present a review of landmark localization and detection in unconstrained environments and discuss the available datasets. We tested one state-of-the-art face landmark detection algorithm, MTCNN, on two different datasets and compared its localization of facial landmarks on conventional faces without occlusion and on faces with a high degree of occlusion.
2 Related Studies 2.1 Models for Facial Landmark Localization Landmark detection and localization of human faces in digital images has received considerable attention from computer vision researchers for more than ten years. Previous work in this field was summarized by [4, 14], which divide it into two main approaches: parametric shape model-based methods and non-parametric shape model-based methods. The Active Appearance Model (AAM), proposed by [15, 16], is the classical example of a holistic model-based method. It is a statistical model that captures the correlations among facial key-points by matching shape and texture simultaneously; instead of tracking a single instance of a deformable object, it fits a model that can represent a whole class of objects [16]. [17] proposed extensions that fit more landmarks than needed and use
Landmark Localization in Occluded Faces Using Deep Learning
two-dimensional landmark templates instead of one-dimensional ones. Cascaded regression methods have recently become some of the best-known state-of-the-art methods for face alignment because of the accuracy and speed they achieve [18]. These methods learn a regression function that maps image appearance to the target output, the face shape. Deep convolutional neural networks (CNNs) have become the dominant models for face-related applications, including landmark localization and detection. [4] divided deep learning methods into those that model nonlinear shape variations and those that learn a nonlinear mapping from appearance to shape. Examples of the nonlinear shape category are [19] and [20]; the latter proposed a hierarchical probabilistic model to address variations in facial expression and pose. [5] classified deep CNN learning methods into pure learning and hybrid learning methods. In pure learning methods, landmark locations are predicted directly by the CNN model. [21] used a cascade of four convolutional layers to predict five facial landmarks from the face bounding box of an input image; each landmark point is then refined by a shallow network, and modelling each point with a CNN improves accuracy. [8, 22] proposed the Tasks-Constrained Deep Convolutional Network (TCDCN) to jointly predict a diverse set of related tasks, including gender, pose and facial expression, along with facial landmark points. In [11], a similar CNN model was proposed to jointly perform face detection, landmark localization, pose estimation, and gender recognition; in this model, features from multiple layers are shared to exploit both low-level and high-level feature representations. Extending the cascaded CNN framework, [23] proposed a similar approach that gradually predicts 68 facial landmark points instead of 5.
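The cascaded regression idea described above can be summarized by a generic update rule. The notation below (S^t, R^t, φ, T) is illustrative and not taken from [18]: starting from an initial shape S^0, typically a mean shape placed inside the face bounding box, each stage t adds a shape increment that a learned regressor R^t predicts from shape-indexed appearance features φ(I, S^{t-1}) extracted around the current landmark estimates in image I.

$$
S^{t} = S^{t-1} + R^{t}\big(\phi(I, S^{t-1})\big), \qquad t = 1, \dots, T
$$

After T stages, S^T is the predicted face shape; each R^t is trained to reduce the residual between its input estimate and the ground-truth shape, which is why a few fast stages can reach high accuracy.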
In [31], the authors proposed the multi-task cascaded CNN framework (MTCNN), composed of three stages as illustrated in Fig. 1, which simultaneously performs face detection and localization of five landmarks. In the first stage, a CNN called P-Net, inspired by [25], acts as a region proposal network that proposes candidate regions with bounding boxes; overlapping bounding boxes are then eliminated with the non-maximum suppression (NMS) technique [26]. The output of P-Net is fed to the second-stage network, R-Net, which further filters out false-positive candidates and again applies bounding-box calibration and NMS. The final stage, O-Net, takes the output of R-Net and outputs the face bounding box along with five facial landmark points. Despite the numerous studies on landmark localization and detection of human faces in constrained conditions, the task remains difficult in unconstrained conditions, for example when faces are heavily occluded for personal, medical or other reasons, such as the restriction guidelines imposed during pandemics like COVID-19, which has tragically affected activities around the world. The challenge usually arises when the face covering is partial or severe; in severe cases more than 50% of the face is occluded and only a few facial landmarks, such as the two eyes, are visible. There is increasing demand to improve facial landmark detection under occlusion, because localizing and detecting more facial landmarks is becoming more complex and difficult. Very few studies have reported on or investigated landmark localization and detection under occlusion, and occlusion remains one of the main obstacles to locating facial landmarks accurately.
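The NMS step applied between the MTCNN stages can be illustrated with the standard greedy variant: keep the highest-scoring box, discard every remaining box whose intersection-over-union (IoU) with it exceeds a threshold, and repeat. This is a minimal sketch, not the message-passing formulation of [26] and not MTCNN's own code; the function names, the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for illustration, since implementations use different thresholds at each stage.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes kept, highest score first."""
    # Candidate indices sorted by descending confidence score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining candidates that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two heavily overlapping candidates for one face and one separate face
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box survives as a separate detection.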
2.2 Dataset for Landmark Detection Many face landmark datasets have been proposed, along with their annotations, for training and evaluation; they can be categorized by the challenge they were designed to address. The Annotated Facial Landmarks in the Wild (AFLW) dataset [27] contains 25,000 images, annotated with up to 21 landmark points per face according to their visibility. The Helen database [28] contains 2,330 high-resolution images annotated with 194 dense facial landmarks. The Annotated Faces in the Wild (AFW) database [29] contains 205 images with relatively larger pose variations than the other "in-the-wild" databases and provides 6 facial landmark annotations. The iBUG dataset [30] from the 300 Faces in-the-Wild (300-W) database is the most challenging database so far, with significant variations; it contains only 135 images, annotated with 68 landmarks.
3 Methodology and Implementation We used the in-house Niqab dataset, collected in previous work [12] to address the problem of detecting heavily occluded faces. It consists of approximately 12k photos containing 14k heavily occluded faces. The dataset was annotated and labelled for use in deep learning training. An annotation tool was used to record the (x, y) coordinates of the five landmark points for each face within the image, and these coordinates were saved in a text file named after the image. We used the Multi-task Cascaded Convolutional Networks (MTCNN) model [31] because of its reported high accuracy and near-real-time performance, with the default stage thresholds [0.5, 0.5, 0.7] and a minimum face size of 24 pixels, as set by the author. The MTCNN architecture is shown in Fig. 1.
Fig. 1. MTCNN model architecture
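The minimum face size of 24 pixels mainly controls MTCNN's image pyramid: P-Net scans fixed 12 × 12 windows, so the image is first rescaled by 12/24 and then repeatedly shrunk until its shorter side would fall below 12 pixels. The sketch below computes those scales. The scale factor 0.709 is an assumption (a default commonly used in public MTCNN implementations), and `pyramid_scales` is a hypothetical helper, not part of any library.

```python
def pyramid_scales(height, width, min_face_size=24, factor=0.709, net_input=12):
    """Scales at which P-Net scans an image (sketch of the MTCNN image pyramid).

    The first scale maps a face of `min_face_size` pixels onto P-Net's
    `net_input` x `net_input` window; each further level shrinks the image by
    `factor` until its shorter side would be smaller than the window.
    """
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    scales = []
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


# A 640 x 480 image with the minimum face size of 24 pixels used in this study
print(len(pyramid_scales(480, 640)))  # 9
```

A larger minimum face size yields fewer pyramid levels (9 here, versus 11 with a 12-pixel minimum), which speeds up detection at the cost of missing smaller faces.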
4 Experimental Results A pilot study was conducted to evaluate the performance of current facial landmark detection algorithms on heavily occluded faces and to compare it with their performance on faces without occlusion, using the AFW and Niqab [12] datasets. The AFW dataset contains 205 images with 472 faces in normal appearance, without occlusion apart from some simple occlusions such as faces wearing glasses. The Niqab dataset [12], on
the other hand, is a highly occluded dataset. MTCNN was evaluated on both datasets, with the Niqab dataset described in the previous section used to test its performance under occlusion. A sample of 466 photos was randomly chosen from the Niqab dataset for testing. MTCNN obtained high accuracy on AFW: out of the 472 unoccluded faces, it correctly detected 463, with only 10 false positives and 9 false negatives. A precision of 97% and a recall of 98% indicate the high performance of the algorithm on a conventional face dataset. However, performance degraded dramatically in the presence of a high degree of occlusion: MTCNN achieved an accuracy of only 20%, indicating very poor performance on the highly occluded dataset. Table 1 summarizes the performance of MTCNN on the AFW and Niqab datasets. Heavily covered faces are clearly more challenging for current face landmark detection algorithms.
Table 1. Performance result of MTCNN on AFW and Niqab dataset
No. | Dataset | TP | FP | FN | Precision | Recall | F-measure | Accuracy
1 | AFW | 463 | 10 | 9 | 97% | 98% | 98% | 98%
2 | Niqab | 97 | 12 | 355 | 88.9% | 21% | 33.9% | 20%
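The precision, recall and F-measure columns of Table 1 follow from the TP, FP and FN counts with the standard definitions P = TP/(TP + FP), R = TP/(TP + FN) and F = 2PR/(P + R). The sketch below recomputes them as a worked check; `detection_metrics` is a hypothetical helper, and accuracy is omitted because the text does not state how it was computed.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F-measure (harmonic mean) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR / (P + R)
    return precision, recall, f_measure


# (TP, FP, FN) counts taken from Table 1
table1 = {"AFW": (463, 10, 9), "Niqab": (97, 12, 355)}
for name, counts in table1.items():
    p, r, f = detection_metrics(*counts)
    print(f"{name}: P={p:.1%} R={r:.1%} F={f:.1%}")
# AFW: P=97.9% R=98.1% F=98.0%
# Niqab: P=89.0% R=21.5% F=34.6%
```

All recomputed values agree with Table 1 to within one percentage point. The small differences, for example F = 34.6% versus 33.9% for Niqab, are consistent with the table having been computed from truncated intermediate values (88.9% and 21%).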
5 Conclusion and Future Work Facial landmark detection and localization on occluded faces remains a challenge for recent face landmark detection algorithms. In this paper, we reviewed current face landmark detection models and datasets, and compared the performance of one current face landmark detection algorithm (MTCNN) on a normal face dataset and on a highly occluded dataset. We believe that the poor performance on highly occluded faces is due to occlusion hiding most of the facial features, which makes it difficult for detectors to localize facial landmarks. In future work, we are training a facial landmark detection and localization algorithm that can accurately detect and localize the facial landmark points of occluded faces, such as the two eyes, the nose and the mouth corners.
References 1. Wu, Y., Shah, S.K., Kakadiaris, I.A.: GoDP: globally optimized dual pathway deep network architecture for facial landmark localization in-the-wild. Image Vis. Comput. 73, 1–6 (2018) 2. Feng, Z.-H., et al.: Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018) 3. Dong, X., et al.: Style aggregated network for facial landmark detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
4. Wang, N., et al.: Facial feature point detection: a comprehensive survey. Neurocomputing 275, 50–65 (2018) 5. Wu, Y., Ji, Q.: Facial landmark detection: a literature survey. Int. J. Comput. Vis. 127(2), 115–142 (2019) 6. Zhu, M., et al.: Robust facial landmark detection via occlusion-adaptive deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019) 7. Fan, H., Zhou, E.: Approaching human level facial landmark localization by deep learning. Image Vis. Comput. 47, 27–35 (2016) 8. Zhang, Z., et al.: Facial landmark detection by deep multi-task learning. In: European Conference on Computer Vision. Springer, Cham (2014) 9. Bargal, S.A., et al.: Emotion recognition in the wild from videos using images. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction (2016) 10. Ansari, A.-N., Abdel-Mottaleb, M.: 3D face modeling using two views and a generic face model with application to 3D face recognition. In: Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2003. IEEE (2003) 11. Ranjan, R., Patel, V.M., Chellappa, R.: Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 121–135 (2017) 12. Alashbi, A.A.S., Sunar, M.S.: Occluded face detection, face in Niqab dataset. In: International Conference of Reliable Information and Communication Technology. Springer, Cham (2019) 13. Wang, C., et al.: A novel coronavirus outbreak of global health concern. The Lancet 395(10223), 470–473 (2020) 14. Kowalski, M.: Localization and tracking of facial landmarks in images and video sequences. The Institute of Radioelectronics and Multimedia Technology (2018) 15. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. In: European Conference on Computer Vision. Springer, Heidelberg (1998) 16. 
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001) 17. Milborrow, S., Nicolls, F.: Locating facial features with an extended active shape model. In: European Conference on Computer Vision. Springer, Heidelberg (2008) 18. Cao, X., et al.: Face alignment by explicit shape regression. Int. J. Comput. Vis. 107(2), 177–190 (2014) 19. Wu, Y., Wang, Z., Ji, Q.: Facial feature tracking under varying facial expressions and face poses based on restricted Boltzmann machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013) 20. Wu, Y., Wang, Z., Ji, Q.: A hierarchical probabilistic model for facial feature detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014) 21. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013) 22. Zhang, Z., et al.: Learning deep representation for face alignment with auxiliary attributes. IEEE Trans. Pattern Anal. Mach. Intell. 38(5), 918–930 (2015) 23. Zhou, E., et al.: Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In: Proceedings of the IEEE International Conference on Computer Vision Workshops (2013) 24. Xia, Y., Zhang, B., Coenen, F.: Face occlusion detection using deep convolutional neural networks. Int. J. Pattern Recognit. Artif. Intell. 30(09), 1660010 (2016) 25. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (2015) 26. Rothe, R., Guillaumin, M., Van Gool, L.: Non-maximum suppression for object detection by passing messages between windows. In: Asian Conference on Computer Vision. Springer, Cham (2014)
27. Koestinger, M., et al.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE (2011) 28. Le, V., et al.: Interactive facial feature localization. In: European Conference on Computer Vision. Springer, Heidelberg (2012) 29. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012) 30. Sagonas, C., et al.: A semi-automatic methodology for facial landmark annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2013) 31. Zhang, K., et al.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Contrast Image Quality Assessment Algorithm Based on Probability Density Functions Features
Ismail Taha Ahmed1(B), Soong Der Chen2, Norziana Jamil3, and Baraa Tareq Hammad1
1 College of Computer Sciences and Information Technology, University of Anbar, Anbar, Iraq
{ismail.taha,baraa.tareq}@uoanbar.edu.iq 2 College of Graduate Studies, Universiti Tenaga Nasional, Kajang, Malaysia
3 College of Computing and Informatics, Universiti Tenaga Nasional, Kajang, Malaysia
Abstract. Most existing image quality assessment algorithms (IQAs) focus on images distorted by compression, noise and blurring. The Reduced-reference Image Quality Metric for Contrast-changed images (RIQMC) and the No-Reference Image Quality Assessment (NR-IQA) for Contrast-Distorted Images (NR-IQA-CDI) have been created for contrast-distorted images (CDI). For each of the five global features used in NR-IQA-CDI, the statistical model, i.e. the Probability Density Function (PDF), was determined using the SUN2012 database, which contains a wide variety of natural scene images. NR-IQA-CDI showed poor performance on two of the three image databases, where the Pearson linear correlation coefficient (PLCC) was only 0.5739 on TID2013 and 0.7623 on CSIQ. For this reason, we present NR-IQA-CDI based on Monotonic Probability Density Functions (NR-IQA-CDI-MPCF) to address the problem that the existing bell-curve-like PDFs of the contrast features cannot reflect the monotonic relation between contrast feature values and perceptual image quality. The findings indicate that NR-IQA-CDI-MPCF outperforms the current NR-IQA-CDI, especially on the TID2013 database. Keywords: NR-IQA-CDI · Bell-curve · Monotonic relation · NR-IQA-CDI based on Monotonic Probability Density Functions (PDFs) (NR-IQA-CDI-MPCF)
1 Introduction Various kinds of distortion, such as noise, blurring, fast fading, blocking artifacts and contrast change, which may be introduced by certain processes applied to an image, can degrade image quality. Contrast distortion is one of the most common forms of distortion [1, 2]. Figure 1 shows a good-contrast and a poor-contrast (contrast-distorted) grayscale image together with their histograms. As the figure shows, a poor-contrast image tends to have a clustered histogram, while a good-contrast image tends to have a well-spread histogram. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1030–1040, 2021. https://doi.org/10.1007/978-3-030-70713-2_92
Contrast Image Quality Assessment Algorithm
The goal of an objective image quality assessment (IQA) algorithm is to predict the quality of an image in a way that is consistent with subjective assessment by humans. In general, objective IQA algorithms (IQAs) can be classified into three categories according to the availability of a reference image: full-reference (FR), reduced-reference (RR) and no-reference (NR) [3–5].
Fig. 1. First row: a good-contrast and a poor-contrast image (My Love.bmp); second row: their histograms.
Existing IQA methods for CDI include: (1) RIQMC [6], based on entropies and order statistics of the image histogram; (2) two metrics [7], histogram flatness (HFM) and histogram spread (HS), which assess contrast quality; (3) the Reduced-reference Contrast-changed Image Quality index (RCIQM) [8], which incorporates both bottom-up and top-down strategies; and (4) NR-IQA-CDI [9], which is built on NSS principles, namely that natural scene images exhibit certain statistical regularities that may be missing from the statistics of distorted images [10, 11]. The NSS features used in NR-IQA-CDI are the mean, standard deviation, entropy, kurtosis and skewness. The statistical model, i.e. the Probability Density Function (PDF), of each feature was determined using the SUN2012 database, which contains a wide variety of natural scene images. The degree of distortion was estimated from the likelihood, under these PDFs, that the image is a natural scene image, and the image quality is estimated from this feature set. For regression, support vector regression (SVR) is adopted to learn the mapping between the feature set and the perceptual quality score, and only 10-fold leave-one-out cross-validation is used. Regrettably, NR-IQA-CDI performs poorly on some test image databases: on TID2013 and CSIQ, the PLCC is only around 0.57 and 0.76, respectively. This paper therefore enhances the existing NR-IQA-CDI. The optimization of contrast features in NR-IQA-CDI is discussed in Sect. 2, Sect. 3 presents the experimental results and discussion, and Sect. 4 concludes the paper.
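To make the five NSS features concrete, the sketch below computes them for a gray-level image given as a flat list of pixel values, with the standard deviation and entropy defined as in Eqs. 2 and 3. It assumes population (biased) moments and non-excess kurtosis (a normal distribution gives 3); NR-IQA-CDI [9] may use different conventions, and `global_features` is a hypothetical helper. The two synthetic images illustrate Fig. 1: squeezing the gray levels into a narrower range (a clustered histogram) lowers both the standard deviation and the entropy while leaving the mean unchanged.

```python
import math
from collections import Counter


def global_features(pixels):
    """Mean, standard deviation, entropy (bits), skewness and kurtosis of gray levels.

    Uses population moments; kurtosis is non-excess (normal = 3). These
    conventions are assumptions, not taken from NR-IQA-CDI.
    """
    xs = list(pixels)
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # Entropy of the normalized gray-level histogram p_j
    counts = Counter(xs)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "entropy": entropy,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


# Hypothetical 1024-pixel image whose histogram is spread over all 256 levels
wide = list(range(256)) * 4
# Contrast-reduced version: gray levels squeezed into 96..159 (clustered histogram)
narrow = [96 + p // 4 for p in wide]

hi, lo = global_features(wide), global_features(narrow)
print(f"std {hi['std']:.2f} -> {lo['std']:.2f}")              # std 73.90 -> 18.47
print(f"entropy {hi['entropy']:.2f} -> {lo['entropy']:.2f}")  # entropy 8.00 -> 6.00
```

Real contrast distortions are rarely this uniform, but the same direction of change in standard deviation and entropy is what makes these two features sensitive to contrast.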
I. T. Ahmed et al.
2 Optimization of Contrast Features in NR-IQA-CDI Here, we present how the monotonic PDF of contrast features is applied in NR-IQA-CDI. We begin by addressing the problems of the current NR-IQA-CDI, then describe the application of the monotonic PDF of contrast features and the prediction of image quality using these features. 2.1 Addressing the Problems of the Existing NR-IQA-CDI Following [12], we briefly describe the problem and how it was addressed. After a series of experiments, the authors of [12] found that the normalized values of contrast-related features, such as standard deviation and entropy, increase monotonically with increasing perceptual image quality, whereas the other features (kurtosis, skewness and mean) do not change significantly. However, as Fig. 2 shows, a bell-curve-like PDF does not reflect this monotonic relation well. This is particularly true for the right half of the curve (red line, f > f_o), where the slope is negative, so an increase in the contrast feature value indicates poorer image quality.
[Fig. 2 axes: probability p(f), with levels p_max and 2p_max marked, versus feature value f, with f_o marked.]
Fig. 2. The monotonic relation of Bell-curve-like Probability Density Functions pdf.
The authors then addressed the problem of the existing NR-IQA-CDI, namely that the bell-curve-like PDF of contrast-related features such as standard deviation and entropy does not correlate well with the monotonic relation between these features and the perceived contrast level. For more details, see [12]. The next section therefore presents the application of the monotonic PDF of contrast features in NR-IQA-CDI. 2.2 Application of Monotonic PDF of Contrast Feature in NR-IQA-CDI For each of the five global features used in NR-IQA-CDI [9], the statistical model, i.e. the Probability Density Function (PDF), was determined using the SUN2012 database, which contains a wide variety of natural scene images. The global mean, global kurtosis and global skewness are calculated as in the original NR-IQA-CDI [9]. However, to maintain the monotonic relation between the contrast features and image quality, it is
proposed to modify the right half of the PDF by flipping it about the horizontal line p(f) = p_max, as shown by the green dashed line in Fig. 2, where f_o is the feature value at which p(f) reaches its maximum p_max. The modified function p′(f) is defined by Eq. 1, and the PDFs of the contrast features (standard deviation and entropy) are modified accordingly. Note that the modified PDF no longer integrates to 1, so it is not a PDF in the strict sense; it is still called a PDF to highlight the source of the modification.

$$
p'(f) = \begin{cases} p(f), & f \le f_o \\ 2p_{\max} - p(f), & f_o < f \le 2f_o \\ 2p_{\max}, & f > 2f_o \end{cases} \tag{1}
$$

The following subsections present the details of predicting image quality in NR-IQA-CDI based on Monotonic Probability Density Functions (NR-IQA-CDI-MPCF). The steps for predicting image quality are shown in Fig. 3.
Fig. 3. Flowchart for predicting image quality.
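The piecewise modification of Eq. 1 can be sketched directly: below the peak f_o the fitted PDF is kept, between f_o and 2f_o it is mirrored about p = p_max, and beyond 2f_o it is held at 2p_max, so the result never decreases as the contrast feature grows. In this minimal sketch a Gaussian stands in for the fitted bell-curve PDF purely as an assumption (the paper fits a Generalized Extreme Value distribution and a non-parametric distribution), and `gaussian_pdf` and `monotonic_pdf` are hypothetical names.

```python
import math


def gaussian_pdf(mu, sigma):
    """Hypothetical stand-in for a fitted bell-curve-like feature PDF p(f)."""
    def p(f):
        z = (f - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return p


def monotonic_pdf(p, f0):
    """Eq. 1: keep p(f) up to the peak f0, mirror it about p = p_max on
    (f0, 2*f0], and hold it at 2*p_max beyond 2*f0."""
    p_max = p(f0)

    def p_mod(f):
        if f <= f0:
            return p(f)
        if f <= 2 * f0:
            return 2 * p_max - p(f)
        return 2 * p_max

    return p_mod


p = gaussian_pdf(mu=1.0, sigma=0.2)  # peak at f0 = 1.0
p_mod = monotonic_pdf(p, f0=1.0)
grid = [i * 0.05 for i in range(61)]  # feature values 0.0 .. 3.0
values = [p_mod(f) for f in grid]
print(all(a <= b for a, b in zip(values, values[1:])))  # True
```

With the original bell curve, a feature value of 1.5 receives a lower probability than 1.0 even though it reflects higher contrast; after the modification it receives a higher value, which is the monotonic behaviour the method aims for.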
Step 1: Modelling the Natural Scene Statistics: To model the Natural Scene Statistics (NSS), the probability density function (PDF) of each feature must be estimated before regression is performed to learn the relation between the image quality rating and the feature probabilities. The two sub-steps are described below as Step 1.1 and Step 1.2.
Step 1.1: Estimate the PDF of each Contrast Feature: Estimating the PDF of an NSS feature requires a wide variety of natural scene images. The SUN2012 database [13], consisting of 16,873 natural scene images covering a large variety of image content, is used for this purpose. The details of estimating the PDF are described in Steps 1.1.1 and 1.1.2.
Step 1.1.1: Compute the Contrast Features of each image in the SUN2012 database: For each image I in the SUN2012 database [13], we calculate the contrast features. Let µ
I. T. Ahmed et al.
denote the sample mean operator. Then, for each image, the contrast features are calculated: the sample standard deviation std(I) and the entropy ent(I).

$$\mathrm{std}(I) = \sqrt{\mu\left((I - \mu(I))^2\right)} \tag{2}$$

$$\mathrm{ent}(I) = -\sum_{j} p_j(I)\,\log_2 p_j(I) \tag{3}$$
where I^h denotes the histogram of image I, and p_j(I) denotes the probability of the j-th gray level in image I, estimated from I^h. The logarithm has base two.
Step 1.1.2: Estimate the PDF of each contrast feature by fitting a distribution to its empirical distribution. Only the PDFs of the contrast features, standard deviation and entropy, are modified according to Eq. 1. Several parametric and non-parametric distributions are fitted to the empirical distribution (histogram) of each contrast feature (Eqs. 2 and 3) over the SUN2012 images. The best-fit distribution is the one that visually matches the empirical distribution best. Figs. 4 and 5 show, for each feature, the empirical distribution (bar chart) and the best-fit distribution (red curve). For the standard deviation σ̃, the best fit is the Generalized Extreme Value distribution. For the entropy H̄, it is a non-parametric distribution. The MATLAB© function dfittool() was used to implement this step. The final features used for predicting image quality are the NSS of the contrast features. These are the probabilities of occurrence of the contrast-feature values under their respective best-fit distributions.
Step 1.2: Perform feature normalization and regression. Each feature value f_i is normalized by the feature mean μ_f and standard deviation σ_f, as defined in Eq. 4. The normalized values Z_i then have zero mean and unit standard deviation. Normalization is important before any machine learning such as regression because it tends to increase accuracy.

$$Z_i = \frac{f_i - \mu_f}{\sigma_f} \tag{4}$$
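Eqs. 2 to 4 translate directly into code. The sketch below assumes a grayscale image flattened to a list of integer gray levels. `z_normalize` uses the population standard deviation, which the text does not specify. It is illustrative only, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def sample_mean(values: Sequence[float]) -> float:
    """The sample mean operator mu(.) used in Eq. 2."""
    return sum(values) / len(values)


def contrast_std(pixels: Sequence[float]) -> float:
    """Eq. 2: std(I) = sqrt(mu((I - mu(I))^2))."""
    m = sample_mean(pixels)
    return math.sqrt(sample_mean([(x - m) ** 2 for x in pixels]))


def contrast_entropy(pixels: Sequence[int]) -> float:
    """Eq. 3: ent(I) = -sum_j p_j(I) log2 p_j(I), p_j taken from the gray-level histogram."""
    n = len(pixels)
    histogram = Counter(pixels)  # I^h; empty bins are skipped since 0 * log 0 is taken as 0
    return -sum((c / n) * math.log2(c / n) for c in histogram.values())


def z_normalize(values: Sequence[float]) -> List[float]:
    """Eq. 4: Z_i = (f_i - mu_f) / sigma_f, using the population standard deviation."""
    mu_f = sample_mean(values)
    sigma_f = math.sqrt(sample_mean([(v - mu_f) ** 2 for v in values]))
    return [(v - mu_f) / sigma_f for v in values]


if __name__ == "__main__":
    image = [0, 0, 255, 255]  # hypothetical 2x2 image, flattened
    print(contrast_std(image), contrast_entropy(image))
```

A half-black, half-white image gives the largest standard deviation for its mean and exactly one bit of entropy. A constant image gives zero for both.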
Regression aims to find a mapping function, known as the regression function, that maps the independent variables to the dependent variable. Here, the independent variables are the features and the dependent variable is the subjective mean opinion score (MOS). Regression removes non-linearity and so improves the linear correlation between the features and the MOS. Support Vector Regression (SVR), via the LIBSVM-3.12 package [14], is used to find the regression function. This matches the current NR-IQA-CDI, for a fair comparison. In SVR, the regression function is determined by supervised machine learning. A training set of feature values and their target outputs is provided for the SVR algorithm to “learn” from. The algorithm then repeatedly and strategically adjusts the parameters of the function to progressively
Fig. 4. Histogram and fitted curve of the standard deviation feature on the SUN2012 image database [13].
Fig. 5. Histogram and fitted curve of the entropy feature on the SUN2012 image database [13].
reduce the error between the predicted and target outputs until it is minimized. Once determined, the regression function computes image quality by predicting the MOS from the normalized features, as explained in Step 2.
Step 2: Computing image quality. The steps to compute image quality, i.e. to predict the MOS, are:
Step 2.1: Compute the contrast features of the input image.
Step 2.2: Compute the probability of each contrast feature using its PDF from Step 1.1.2.
Step 2.3: Compute the final image quality using the normalization and regression function obtained in Step 1.2.
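Steps 2.2 and 2.3 can be sketched as a small pipeline. The fitted PDFs, the training means and standard deviations of the probabilities, and the trained regressor are passed in as callables and lists. The GEV density uses the common (mu, sigma, xi) location/scale/shape parameterization. The parameter values and the linear regressor in the usage are hypothetical; they are neither the fitted SUN2012 model nor the LIBSVM SVR.

```python
import math
from typing import Callable, Sequence


def gev_pdf(x: float, mu: float, sigma: float, xi: float) -> float:
    """Generalized Extreme Value density with location mu, scale sigma > 0, shape xi."""
    s = (x - mu) / sigma
    if xi == 0.0:
        t = math.exp(-s)  # Gumbel limit
    else:
        base = 1.0 + xi * s
        if base <= 0.0:
            return 0.0  # x lies outside the support
        t = base ** (-1.0 / xi)
    return (t ** (xi + 1.0)) * math.exp(-t) / sigma


def predict_quality(
    features: Sequence[float],
    pdfs: Sequence[Callable[[float], float]],
    mu_f: Sequence[float],
    sigma_f: Sequence[float],
    regressor: Callable[[Sequence[float]], float],
) -> float:
    """Steps 2.2-2.3: feature probabilities -> z-scores (Eq. 4) -> regression."""
    probs = [pdf(f) for pdf, f in zip(pdfs, features)]  # Step 2.2
    z = [(p - m) / s for p, m, s in zip(probs, mu_f, sigma_f)]  # Eq. 4 with training stats
    return regressor(z)  # Step 2.3, e.g. a trained SVR


if __name__ == "__main__":
    std_pdf = lambda f: gev_pdf(f, mu=0.2, sigma=0.05, xi=-0.1)  # hypothetical fit
    score = predict_quality([0.25], [std_pdf], [3.0], [1.5], lambda z: 50.0 + 10.0 * z[0])
    print(round(score, 2))
```

The normalization statistics must come from the training set, not the test image, so they are explicit arguments here.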
3 Experimental Results and Discussions
This section evaluates the performance of NR-IQA-CDI based on Monotonic Probability Density Functions (PDFs) of contrast features (NR-IQA-CDI-MPCF). It describes the evaluation procedure and then discusses the results and conclusions.
3.1 Evaluation Methodology
The proposed method was evaluated on the same test image databases used by NR-IQA-CDI. Only the contrast-distorted images were chosen: 116, 250, and 400 images from the CSIQ [15], TID2013 [16], and CID2013 [6] databases, respectively. The subjective scores are given as mean opinion scores (MOS) or differential mean opinion scores (DMOS). Regression is a learning algorithm that requires training. K-fold cross-validation (CV) was therefore used to evaluate IQA accuracy, i.e. how well the IQA generalizes to independent data, while reducing bias. Figs. 6 and 7 show the flowchart of the performance evaluation. For 10-fold CV, each database was split randomly into 10 subgroups, as illustrated in Fig. 7. Each subgroup in turn was held out for testing, and the K assessment results were averaged. To reduce variability, repeated rounds of cross-validation (k = 2 to 10) were carried out on different partitions, as shown in Fig. 6. The whole cross-validation was repeated one hundred times to prevent bias. Tables 1 and 2 show the average results. Three metrics were computed between the estimated objective scores and the subjective MOS: (1) the Spearman rank-order correlation coefficient (SROCC), (2) the Pearson linear correlation coefficient (PLCC), and (3) the root mean square error (RMSE). Performance correlates well with human perception when SROCC ≈ 1, PLCC ≈ 1, and RMSE ≈ 0 [12, 17].
3.2 NR-IQA-CDI-MPCF Evaluation
Tables 1 and 2 list the three performance metrics of the current NR-IQA-CDI and the proposed NR-IQA-CDI-MPCF on the three test image databases. The results are for k-fold cross-validation with k ranging from 2 to 10.
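The three criteria named in Sect. 3.1 can be computed with the following minimal pure-Python sketch. It assumes the metrics are computed directly on the regression output and the MOS. It does not apply the nonlinear logistic mapping that some IQA studies use before PLCC and RMSE. Tied values receive their average rank.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


def rmse(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Root mean square error between predicted scores and MOS."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, mos)) / len(pred))
```

SROCC only checks that the ranking of images is preserved, so any monotonic prediction scores 1. PLCC and RMSE also depend on linearity and scale.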
Figure 8 is a bar chart comparing the average value of each performance metric for the two NR-IQA-CDI methods on each of the three databases. At a glance, NR-IQA-CDI-MPCF outperformed the existing NR-IQA-CDI on all three metrics for TID2013. For CSIQ and CID2013, however, the metric values differed little. The next section determines whether the performance differences between NR-IQA-CDI and NR-IQA-CDI-MPCF are significant.
3.3 Statistical Performance Analysis
Percentage of Difference: For each k in each database, the difference between the two methods' values of a performance metric is calculated as

$$d_i = \mathrm{MPCF}c_i - c_i \tag{5}$$
Table 1. PLCC, SROCC, and RMSE across 100 train-test iterations of NR-IQA-CDI [9].
Table 2. PLCC, SROCC, and RMSE across 100 train-test iterations of NR-IQA-CDI-MPCF (proposed).
where c_i is the metric value without the monotonic PDF features (NR-IQA-CDI), and MPCFc_i is the metric value with them (NR-IQA-CDI-MPCF). The average percentage difference is then calculated over all k values and databases. Each difference is divided by the absolute value of the corresponding c_i:

$$d_p = \frac{1}{n}\sum_{i=1}^{n} \frac{d_i}{\lvert c_i \rvert} \tag{6}$$

where n is the total number of k values across all databases. The absolute value preserves the sign (increase or decrease) of the difference in performance [12, 18]. The percentage differences are shown in Table 3.
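Eqs. 5 and 6 amount to the short computation below. In this sketch, `baseline` holds the c_i values and `proposed` the MPCFc_i values, paired by k and database; the sample numbers in the checks are hypothetical. For RMSE, a negative result means the proposed method reduced the error.

```python
from typing import Sequence


def percentage_difference(baseline: Sequence[float], proposed: Sequence[float]) -> float:
    """Eqs. 5-6: mean of (MPCFc_i - c_i) / |c_i|, returned in percent."""
    if len(baseline) != len(proposed) or not baseline:
        raise ValueError("need two equally long, non-empty lists")
    diffs = [p - c for c, p in zip(baseline, proposed)]  # Eq. 5
    return 100.0 * sum(d / abs(c) for d, c in zip(diffs, baseline)) / len(baseline)  # Eq. 6
```

Dividing by |c_i| rather than c_i keeps the sign of d_i, so an increase stays positive even when c_i is negative.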
Fig. 8. Comparison of SROCC, PLCC, and RMSE of NR-IQA-CDI and NR-IQA-CDI-MPCF on the CSIQ, TID2013, and CID2013 databases (a, b, and c).
Table 3. Percentage difference results for NR-IQA-CDI-MPCF – NR-IQA-CDI.
Image DB    PLCC      SROCC     RMSE
TID2013     19.29%    19.62%    -11.69%
CID2013      0.62%     0.55%     -2.18%
CSIQ        -1.18%    -1.05%      1.51%
All DB       6.27%     6.38%     -4.11%
Statistical Significance: A paired t-test [12, 18] is applied to the performance metric values of NR-IQA-CDI [9] and NR-IQA-CDI-MPCF, giving the p-values shown in Table 4. In this work, a p-value below 0.05 indicates a statistically significant difference [19]. The results in Tables 3 and 4 are discussed as follows:
1. Table 3 shows that the results on TID2013, the main target for enhancement, improved. PLCC and SROCC increased by 19.29% and 19.62%, respectively, and RMSE decreased noticeably, by 11.69%. All three p-values for TID2013 were below 0.05, so the differences in these three performance measures are significant (see Table 4).
Table 4. P-values of the differences between NR-IQA-CDI and NR-IQA-CDI-MPCF. If p-value ≤ 0.05, the observed difference is “significant”.
Image database   P-values of differences                          Observed difference
                 PLCC            SROCC           RMSE             PLCC          SROCC         RMSE
TID2013          8.53 × 10^-11   1.64 × 10^-9    4.43 × 10^-12    Significant   Significant   Significant
CID2013          3.08 × 10^-6    3.08 × 10^-5    5.44 × 10^-7     Significant   Significant   Significant
CSIQ             5.23 × 10^-3    3.31 × 10^-3    1.12 × 10^-3     Significant   Significant   Significant
All DB           1.11 × 10^-3    1.17 × 10^-3    3.30 × 10^-4     Significant   Significant   Significant
2. For CID2013, PLCC and SROCC increased only marginally, by 0.62% and 0.55%, respectively, and RMSE decreased marginally, by 2.18%. The three p-values for CID2013 indicate that the differences in these three measures were not statistically significant (see Table 4).
3. For CSIQ, PLCC and SROCC decreased slightly, by 1.18% and 1.05%, respectively, and RMSE increased slightly, by 1.51%. The three p-values for CSIQ were below 0.05, indicating statistically significant differences in these three measures (see Table 4).
4. Averaged over the three databases, PLCC and SROCC increased moderately, by 6.27% and 6.38%, respectively, and RMSE decreased moderately, by 4.11%. The three p-values for all databases were below 0.05, indicating statistically significant differences in these three performance metrics (see Table 4).
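The paired t-test behind Table 4 can be sketched as follows. Which values are paired (for example, the metric values obtained for each k) is an assumption here. The function returns only the t statistic and the degrees of freedom. The two-sided p-value then comes from Student's t distribution with n − 1 degrees of freedom. `scipy.stats.ttest_rel(a, b)` computes the whole test directly. It uses the differences a − b, so its t has the opposite sign to this sketch when called as `ttest_rel(proposed, baseline)` is not used.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(baseline: Sequence[float], proposed: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic on d_i = proposed_i - baseline_i, with n - 1 degrees of freedom."""
    d = [p - c for c, p in zip(baseline, proposed)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return mean(d) / se, n - 1
```

The test works on the per-pair differences, so it removes the variation shared by both methods (e.g. an easy fold helps both). That is why small but consistent differences can still give very small p-values.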
4 Conclusion
In this paper, the existing NR-IQA-CDI was enhanced with monotonic PDFs of contrast features. The drawback of the existing bell-curve-like PDF is that it cannot reflect the monotonic relation between contrast feature values and perceived image quality. Hence, it was proposed to amend the PDFs of the two contrast features, standard deviation and entropy, so that they become monotonic functions. The performance evaluation showed that NR-IQA-CDI-MPCF outperforms the current NR-IQA-CDI on TID2013, the primary target for improvement in this work. There was little performance difference on CID2013 and CSIQ. The main difference from the work of Fang et al. [9] is that Fang's work used bell-curved PDFs of the contrast features, whereas this paper uses monotonic PDFs. The results on the CID2013 database still need improvement, for which more powerful methods are recommended.
Acknowledgements. This research is supported by Uniten iRMC Research Publication Fund 2021.
References
1. Gonzalez, R.C., Woods, R.E.: Digital image processing (2012)
2. Arici, T., Dikbas, S., Altunbasak, Y.: A histogram modification framework and its application for image contrast enhancement. IEEE Trans. Image Process. 18, 1921–1935 (2009)
3. Ahmed, I.T., Der, C.S., Hammad, B.T.: A survey of recent approaches on no-reference image quality assessment with multiscale geometric analysis transforms. Int. J. Sci. Eng. Res. 7, 1146–1156 (2016)
4. Ece, C., Mullana, M.M.U.: Image quality assessment techniques pn spatial domain. IJCST 2, 177 (2011)
5. Ahmed, I.T., Der, C.S., Hammad, B.T.: Recent approaches on no-reference image quality assessment for contrast distortion images with multiscale geometric analysis transforms: a survey. J. Theor. Appl. Inf. Technol. 95 (2017)
6. Gu, K., Zhai, G., Yang, X., Zhang, W., Liu, M.: Subjective and objective quality assessment for images with contrast change. In: Image Processing (ICIP), 2013 20th IEEE International Conference on, pp. 383–387 (2013)
7. Tripathi, A.K., Mukhopadhyay, S., Dhara, A.K.: Performance metrics for image contrast. In: Image Information Processing (ICIIP), 2011 International Conference on, pp. 1–4 (2011)
8. Liu, M., Gu, K., Zhai, G., Le Callet, P., Zhang, W.: Perceptual reduced-reference visual quality assessment for contrast alteration. IEEE Trans. Broadcast. 63, 71–81 (2016)
9. Fang, Y., et al.: No-reference quality assessment of contrast-distorted images based on natural scene statistics. IEEE Signal Process. Lett. 22, 838–842 (2015)
10. Simoncelli, E.P., Olshausen, B.A.: Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001)
11. Geisler, W.S.: Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192 (2008)
12. Ahmed, I.T., Der, C.S., Jamil, N., Hammad, B.T.: Analysis of probability density functions in existing no-reference image quality assessment algorithm for contrast-distorted images. In: 2019 IEEE 10th Control and System Graduate Research Colloquium (ICSGRC), pp. 133–137 (2019)
13. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from abbey to zoo. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3485–3492 (2010)
14. Chang, C., Lin, C.: LIBSVM: a Library for Support Vector Machines (Version 2.3) (2001)
15. Larson, E.C., Chandler, D.M.: Categorical image quality (CSIQ) database (2010)
16. Ponomarenko, N., et al.: Color image database TID2013: peculiarities and preliminary results. In: Visual Information Processing (EUVIP), 2013 4th European Workshop on, pp. 106–111 (2013)
17. Ahmed, I.T., Der, C.S., Jamil, N., Mohamed, M.A.: Improve of contrast-distorted image quality assessment based on convolutional neural networks. Int. J. Electr. Comput. Eng. 9, 5604–5614 (2019)
18. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press (2003)
19. Fu, Y., Wang, S.: A no reference image quality assessment metric based on visual perception. Algorithms 9, 87 (2016)
The Impact of Data Augmentation on Accuracy of COVID-19 Detection Based on X-ray Images
Yakoop Qasim, Basheer Ahmed, Tawfeek Alhadad, Habeb Al-Sameai, and Osamah Ali
Department of Mechatronics and Robotics Engineering, Taiz University, Taiz, Yemen
Abstract. COVID-19 is the most widespread epidemic that attacks the immune system, and it has caused the deaths of more than 700 thousand people. Many studies have focused on diagnosing COVID-19 with deep learning. In this paper, we present a convolutional neural network based on the VGG-16 architecture to diagnose COVID-19 from X-ray images. Data Augmentation was used to increase the number of COVID-19 images from 219 to 1000. The proposed model was trained on 2100 images of three classes (COVID-19, Normal, and Viral pneumonia) and evaluated on 900 images. It obtained an overall accuracy of 96.3%, higher than the 94.4% achieved without Data Augmentation and higher than the results of other studies. We conclude that Data Augmentation is very effective with X-ray images and clearly improved the model's performance.
Keywords: Deep learning · Data augmentation technique · Convolutional neural networks · COVID-19 · Transfer learning
1 Introduction
Since ancient times, humanity has faced many epidemics that threatened its existence [1]. Currently, the most widespread epidemic is coronavirus disease 2019 (COVID-19). The first case was recorded in Wuhan, China, in December 2019 [2]. Because of the rapid spread of COVID-19, the World Health Organization (WHO) declared a public health emergency of international concern in January 2020 [3]. WHO later declared COVID-19 a pandemic and, in coordination and cooperation with governments, set precautionary measures to limit its spread [4]. By August 2020, more than 20 million confirmed cases, 0.7 million deaths, and 12.5 million recoveries had been recorded [5]. Fever, fatigue, and a dry cough are the most common symptoms of COVID-19. After symptoms appear, a case can be confirmed by detecting the viral RNA in a sample taken from the person's throat or nose. The sample is examined with Reverse Transcription-Polymerase Chain Reaction (RT-PCR), a technique with very high accuracy [6]. However, RT-PCR has drawbacks such as high cost and long processing time [7, 8], which may limit its use in resource-poor countries.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1041–1049, 2021. https://doi.org/10.1007/978-3-030-70713-2_93
Y. Qasim et al.
As a result, some suspected cases cannot be examined and confirmed, so other methods are needed to diagnose suspected cases quickly and at lower cost. Computed tomography (CT) scans and X-ray images can reveal damage to the body caused by disease or accidents [9]. COVID-19 mostly targets the respiratory system and damages the lungs [10]. CT and X-ray imaging can therefore detect lung damage caused by the virus or by other diseases [11, 12]. However, there is a shortage of medical diagnostic experts. One remedy is to combine the diagnostic process with artificial intelligence techniques such as deep learning. These techniques produce accurate, efficient, and sensitive results in a short time, with effectiveness comparable to RT-PCR. We rely on X-ray images because they are inexpensive. X-ray devices are also available in most health facilities, unlike RT-PCR and CT equipment. The main goal of this paper is to increase COVID-19 classification accuracy with Data Augmentation. It also aims to show the importance of this technique when few training images are available for the CNN model.
2 Related Work
Researchers have developed artificial-intelligence systems for diagnosing many diseases, especially COVID-19. The aim is faster and more accurate interpretation of radiographs by differentiating and classifying the effects of diseases. Several studies on detecting COVID-19 have been reviewed. In [13], the authors concatenated the ResNet50V2 [14] and Xception [15] models into one model. Transfer learning with pre-trained ImageNet weights was used to train it on 15,085 X-ray images of three classes: COVID-19, normal, and pneumonia. The model achieved an overall accuracy of 91.4% over all classes. In [16], the authors presented a new CNN model named DarkCovidNet, based on the DarkNet [17] model. It was trained on 1127 X-ray images for binary and multi-class classification. It achieved accuracies of 98.08% for binary and 87.02% for multi-class classification. In [18], the authors used decompose, transfer, and compose steps to split the input data into sub-classes. They applied transfer learning to a pre-trained ResNet [19] model and adapted the CNN architecture using class decomposition. The resulting model is called Decompose, Transfer and Compose (DeTraC). It was built to detect COVID-19 from 80, 105, and 11 X-ray images of normal, COVID-19, and SARS cases, respectively. It achieved 95.12% accuracy, 97.91% sensitivity, and 91.87% specificity in distinguishing COVID-19 X-ray images from other cases. In [20], the authors used the VGG-16 [21] deep learning model to design two separate models. The first performs binary classification of healthy and pneumonia chest X-ray images.
It used 3520 X-ray images of healthy people and 3003 X-ray images of pneumonia, including those with
The Impact of Data Augmentation on Accuracy of COVID-19 Detection
COVID-19. The second model performs binary classification of pneumonia and COVID-19 chest X-ray images. It used 250 X-ray images of COVID-19 and 2753 X-ray images of pneumonia. In detecting COVID-19 X-ray images among other cases, the first model achieved a sensitivity of 0.96, specificity of 0.98, and accuracy of 0.96. The second achieved a sensitivity of 0.87, specificity of 0.94, and accuracy of 0.98.
3 Methodology and Dataset
3.1 Dataset
This study used the COVID-19 Radiography Database [22], available on the Kaggle website. It contains images of three classes: 219 X-ray images of COVID-19 cases, 1345 of viral pneumonia cases, and 1341 of normal cases. All COVID-19 images were used, and 1000 X-ray images were taken from each of the other classes.
3.2 Convolutional Neural Networks
Convolutional neural networks (CNNs) are a type of neural network used in computer vision for image and video processing. They have become widely used for image classification and object recognition. A CNN model is built from four basic types of layers.
Convolution. Extracts features from the images by applying filters, also known as kernels. Each filter computes dot products with image patches to create a feature map.
Rectified Linear Unit (ReLU). A function that sets values less than zero in the feature maps to zero, which keeps the computation simple.
Pooling. Reduces the dimensions of the feature maps created by the convolution layers while preserving their most important information.
Fully Connected Layers. Layers whose neurons are connected to all neurons of the previous layer. They take the feature maps produced by the convolution layers, convert them into a vector, and perform the classification.
3.3 Data Augmentation
Data augmentation is used in deep learning to prevent overfitting when no large dataset is available to train the CNN model. In this study, it was used to increase the number of COVID-19 images from 219 to 1000.
Vertical and horizontal flips and zoom were used for this augmentation, giving a total of 3000 images, 1000 per class. The dataset was divided into 2100 images for training and 900 images for validation.
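The augmentation and split described above can be sketched in pure Python. This is an illustration under stated assumptions, not the authors' pipeline. Images are small 2-D lists of gray levels, and the zoom factor range (1.1 to 1.3) is a hypothetical choice. The 70/30 split is assumed to be done per class, which matches the 2100/900 totals (700/300 per class).

```python
import random
from typing import Callable, List, Tuple

Image = List[List[int]]


def hflip(img: Image) -> Image:
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Vertical flip: reverse the row order."""
    return img[::-1]


def zoom(img: Image, factor: float) -> Image:
    """Centre zoom-in (factor >= 1) with nearest-neighbour sampling; output keeps the input size."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        sy = min(max(int((y - h / 2) / factor + h / 2), 0), h - 1)
        out.append([img[sy][min(max(int((x - w / 2) / factor + w / 2), 0), w - 1)] for x in range(w)])
    return out


def augment_to(images: List[Image], target: int, rng: random.Random) -> List[Image]:
    """Keep all originals, then add randomly transformed copies until `target` images exist."""
    transforms: List[Callable[[Image], Image]] = [
        hflip,
        vflip,
        lambda im: zoom(im, rng.uniform(1.1, 1.3)),  # hypothetical zoom range
    ]
    out = list(images)
    while len(out) < target:
        out.append(rng.choice(transforms)(rng.choice(images)))
    return out


def split_class(images: List[Image], train_frac: float, rng: random.Random) -> Tuple[List[Image], List[Image]]:
    """Shuffle one class and split it into training and validation parts."""
    shuffled = images[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

For the COVID-19 class, `augment_to(covid_images, 1000, rng)` keeps the 219 originals and adds 781 transformed copies. `split_class(..., 0.7, rng)` then yields 700 training and 300 validation images. Augmented copies of the same original can land in both splits, so splitting before augmenting is the safer order when leakage matters.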
3.4 Proposed Model
In this study, we present a CNN model based on the VGG-16 [21] architecture. VGG-16 has 13 convolution layers and three fully connected layers. Trained on the ImageNet dataset, it achieved 92.7% top-5 test accuracy. Its convolution layers use 3 × 3 filters with a stride of 1 pixel and padding of 1 pixel. The three original Dense layers, which classify 1000 classes, were removed and replaced with two layers. The first has 1024 nodes and the second has 3 nodes, with a Dropout layer of rate 0.1 between them. Transfer learning was used to train the model on the new COVID-19 dataset. In transfer learning, the weights of a model pre-trained on a large dataset are reused to reduce the time and cost of training on a new dataset. Only the last convolution layer was retrained; all other convolution layers were frozen. First, features are extracted by passing the X-ray images through the model. Then the weights of the Deep Neural Network (DNN) layers and of the last convolution layer of VGG-16 are adapted to classify the images into three classes based on these features. In summary, VGG-16 extracts features from the X-ray images, and the DNN layers with a softmax function classify them as COVID-19, Normal, or Viral pneumonia, as shown in Fig. 1.
Fig. 1. Architecture of the proposed deep convolutional neural network.
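The four layer types of Sect. 3.2 can be illustrated with the toy pure-Python operations below; they are not VGG-16 itself. The sketch shows that a 3 × 3 convolution with stride 1 and padding 1 keeps the spatial size, as in VGG-16. It also uses 2 × 2 max pooling with stride 2, the pooling VGG-16 applies between its convolution blocks. The `dense` and `softmax` helpers mirror the final classification layers.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(img: Matrix, kernel: Matrix, padding: int = 0) -> Matrix:
    """Stride-1 2-D convolution (cross-correlation, as in CNN libraries) with zero padding."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    hp, wp = h + 2 * padding, w + 2 * padding
    padded = [[0.0] * wp for _ in range(hp)]
    for y in range(h):
        for x in range(w):
            padded[y + padding][x + padding] = img[y][x]
    return [
        [sum(padded[y + i][x + j] * kernel[i][j] for i in range(k) for j in range(k))
         for x in range(wp - k + 1)]
        for y in range(hp - k + 1)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


def dense(vec: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Turn class scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

Chaining `conv2d` → `relu` → `max_pool2x2`, flattening, and then `dense` → `softmax` gives the same kind of pipeline as Fig. 1, at toy scale and with untrained weights.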
4 Results
To evaluate the proposed model, we created a confusion matrix. Its rows represent the actual classes and its columns the predicted classes. From the confusion matrix, four quantities are found for each class: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). Table 1 shows these quantities for the three classes. From them, the evaluation metrics can be computed; these are an important standard for judging the validity of the model in diagnosing COVID-19. The higher these values are, the more effective
the model is. Since the aim of this paper is to detect and diagnose COVID-19, we focus on the evaluation metrics related to COVID-19. These are overall accuracy, per-class accuracy, sensitivity, specificity, precision, and F-score, defined as follows:
Sensitivity: the percentage of actual COVID-19 cases that are correctly classified as COVID-19.
Specificity: the percentage of cases without COVID-19 that are correctly classified as not COVID-19.
Precision: the percentage of cases classified as COVID-19 that are actually infected with COVID-19.
F-score: the harmonic mean of sensitivity and precision.
Fig. 2. Confusion Matrix for the VGG16 model without DA.
Overall Accuracy = correct predictions / total predictions
Accuracy for each class = (TP + TN)/(TP + FP + TN + FN)
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
Precision = TP/(TP + FP)
F-Measure = 2 × Sensitivity × Precision/(Precision + Sensitivity)
The four quantities were calculated for each class from the confusion matrices in Figs. 2 and 3 and are presented in Tables 1 and 2.
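The formulas above can be checked with the following short sketch. It computes the per-class metrics as percentages from the TP, FP, TN, and FN counts of Table 1; the function names are illustrative.

```python
from typing import Dict, Sequence


def class_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Per-class metrics from the formulas above, returned as percentages."""
    sens = tp / (tp + fn)
    prec = tp / (tp + fp)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * sens,
        "specificity": 100 * tn / (tn + fp),
        "precision": 100 * prec,
        "f_score": 100 * 2 * sens * prec / (sens + prec),
    }


def overall_accuracy(true_positives: Sequence[int], total: int) -> float:
    """Correct predictions (the TP of every class) over all predictions, in percent."""
    return 100 * sum(true_positives) / total
```

For the COVID-19 class without DA, `class_metrics(57, 8, 592, 8)` gives a precision and F-score of about 87.7%. With DA, `class_metrics(296, 8, 592, 4)` gives a sensitivity of 296/300 ≈ 98.7%. `overall_accuracy([57, 294, 277], 665)` reproduces the 94.4% reported without DA.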
Fig. 3. Confusion Matrix for the proposed model with DA.
Table 1. Four values extracted from the confusion matrices.
Proposed model   Class       TP    FP   TN    FN
Without DA       COVID-19    57    8    592   8
                 Normal      294   18   347   6
                 Pneumonia   277   11   354   23
With DA          COVID-19    296   8    592   4
                 Normal      294   23   577   6
                 Pneumonia   273   6    594   27
5 Discussion
COVID-19 detection is the main goal of this study, so we focus on the results related to COVID-19. As shown in Table 2, the overall accuracy was 94.4% without DA and 96.3% with DA. With DA, the sensitivity and precision for detecting COVID-19 were 98.7% and 97.7%, respectively. These results indicate that the proposed model is very effective for COVID-19 detection.
Table 2. Evaluation metrics (%).
Metric                   Without DA   With DA
Overall accuracy         94.4         96.3
COVID-19 accuracy        97.6         98.8
Normal accuracy          96.4         97.11
Pneumonia accuracy       94.9         96.8
COVID-19 sensitivity     87.7         98.7
Normal sensitivity       97.3         98
Pneumonia sensitivity    92.3         93
COVID-19 specificity     98.7         98.8
Normal specificity       97           95
Pneumonia specificity    97           98.7
COVID-19 precision       87.7         97.7
Normal precision         94.2         94.2
Pneumonia precision      96.2         97.2
COVID-19 F-score         87.7         98.2
Normal F-score           96           95.7
Pneumonia F-score        94.2         95
Table 3. Comparison with the results of some previous studies.
Study                         Model                          Accuracy (%)
Xu et al. [25]                ResNet + Location attention    86.7
Tulin et al. [16]             DarkCovidNet                   87.02
Mohammed and Abolfazl [13]    Xception and ResNet50V2        91.4
Wang and Wong [24]            CovidNet                       92.4
Asmaa et al. [18]             ResNet-18                      92.5
Sohaib et al. [26]            Inception V3                   93
Ioannis et al. [23]           VGG-19                         93.48
Proposed model                VGG-16                         96.3
Using DA increased the COVID-19 detection accuracy from 97.6% to 98.8% (Table 2). The proposed model can therefore detect COVID-19 with a small error rate of 1.2%. Table 3 compares our results with those of previous studies.
6 Conclusion
In this paper, we presented a CNN model based on the VGG-16 architecture. It extracts features from the images to generate feature maps and then classifies them with deep neural network layers. Data Augmentation was used to increase the number of COVID-19 images from 219 to 1000. The results in Table 2 show that DA is very effective with X-ray images: the overall accuracy rose from 94.4% without DA to 96.3% with DA. Using DA also produced a notable change in the precision of the COVID-19 class. The results with DA were better than those without DA, except for the specificity, precision, and F-score of the Normal class. Because the whole world is facing the COVID-19 epidemic, we focused on the COVID-19 results to help find effective solutions and limit its spread. Obtaining a large COVID-19 dataset is difficult. To overcome this and ensure optimal results, we hope that research centers or specialized authorities will archive large datasets of COVID-19 cases and make them freely available online. This would allow more effective models to be developed and better results to be obtained. We conclude that using DA and Transfer Learning to train a CNN model is very effective, especially when the features of the classes are highly similar.
References
1. History.com editors: Pandemics That Changed History. https://www.history.com/topics/middle-ages/pandemics-timeline. Accessed Aug 2020
2. Huang, C., et al.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet 395, 497–506 (2020). https://doi.org/10.1016/S0140-6736(20)30183-5
3. World Health Organization: Emergency Committee regarding the outbreak of novel coronavirus. https://www.who.int/news-room/detail/30-01-2020-statement-on-the-second-meeting-of-the-international-health-regulations-(2005)-emergency-committee-regarding-the-outbreak-of-novel-coronavirus-(2019-ncov). Accessed Aug 2020
4. World Health Organization: WHO Director-General’s opening remarks at the media briefing on COVID-19 - 26 October (2020). https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---26-october2020. Accessed Aug 2020
5. World Health Organization: Coronavirus disease. https://covid19.who.int/. Accessed Aug 2020
6. Jawerth, N.: How is the COVID-19 virus detected using real time RT–PCR? (2020). https://www.iaea.org/sites/default/files/6120811.pdf
7. Farcas, G.A., Soeller, R., Zhong, K., Zahirieh, A., Kain, K.C.: Real-time polymerase chain reaction assay for the rapid detection and characterization of chloroquine-resistant plasmodium falciparum malaria in returned travelers. Clin. Infect. Dis. Official Publ. Infect. Dis. Soc. Am. 42(5), 622–627 (2006). https://doi.org/10.1086/500134
8. Bustin, S.A.: Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J. Mol. Endocr. 25, 169–193 (2000)
9. Kitchener, P.: Vision X-Ray Group Blog (2013). https://www.xray.com.au/importance-ofmedical-imaging/. Accessed Aug 2020
The Impact of Data Augmentation on Accuracy of COVID-19 Detection
1049
10. Di Gennaro, F., Pizzol, D., Marotta, C., Antunes, M., Racalbuto, V., Veronese, N., Smith, L.: Coronavirus diseases (COVID-19) current status and future perspectives: a narrative review. Int. J. Environ. Res. Public Health 2020, 17, 2690 (2020). https://doi.org/10.3390/ijerph170 82690 11. Godet, C., Elsendoorn, A., Roblot, F.: Benefit of CT scanning for assessing pulmonary disease in the immunodepressed patient. 93(6), 425–430 (2012). https://doi.org/10.1016/j.diii.2012. 04.001 12. Wielpütz, M.O., Heußel, C.P., Herth, F.J., Kauczor, H.U.: Radiological diagnosis in lung disease: factoring treatment options into the choice of diagnostic modality. Deutsches Arzteblatt Int. 111(11), 181–187 (2014). https://doi.org/10.3238/arztebl.2014.0181 13. Rahimzadeh, M., A.: A new modified deep convolutional neural network for detecting COVID-19 from X-ray images. Inf. Med. Unlocked 19(2020), 100360 (2020) 14. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision, pp. 630–645. Springer, Cham (2016) 15. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) 16. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Acharya, U.R.: Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 121(2020), 103792 (2020) 17. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271 (2017) 18. Abbas, A., Abdelsamea, M.M., Gaber, M.M.: Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. arXiv preprint arXiv:2003.138 15v3 (2020) 19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90 20. Brunese, L., Mercaldo, F., Reginelli, A., Santone, A.: Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays. Comput. Methods Programs Biomed. 196(2020), 105608 (2020) 21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2020) 22. Rahman, T., Chowdhury, M.E.H., Khandakar, A.: COVID-19 Radiography database. https:// www.kaggle.com/tawsifurrahman/covid10-radiography-database 23. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 43, 635– 640 (2020). https://doi.org/10.1007/s13246-020-00865-4 24. Wang, L., Wong, A.: COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest Radiography Images. arXiv preprint arXiv: 2003.09871. 25. Xu, X., Jiang, X., Ma, C., Du, P., Li, X., Lv, S., et al.: Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia. arXiv preprint arXiv:200209334 (2020) 26. Asif, S., Wenhui, Y., Jin, H., Tao, Y., Jinhai, S.: Classification of COVID-19 from Chest X-ray images using Deep Convolutional Neural Networks. medrxiv. https://doi.org/https://doi.org/ 10.1101/2020.05.01.20088211 (2020)
A Fusion Schema of Hand-Crafted Feature and Feature Learning for Kinship Verification
Mohammed Ali Almuashi^1,2 (corresponding author), Siti Zaiton Mohd Hashim^3, Nooraini Yusoff^3, and Khairul Nizar Syazwan^3
1 Universiti Teknologi Malaysia, Johor Bahru, Malaysia
2 Jeddah University, Jeddah, Kingdom of Saudi Arabia
3 Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Pengkalan Chepa, 16100 Kota Bharu, Kelantan, Malaysia
{sitizaiton,nooraini.y,nizar.w}@umk.edu.my
Abstract. Technology has progressed rapidly and has become widespread in forms such as social networks, smartphones, and high-definition cameras. In this context, kinship analysis from facial images is a new research topic in computer vision that has attracted dramatically more attention in recent years. In this paper, we try to detect the relationship between the people shown in a pair of face images, which is a verification problem: given a pair of face images, infer whether the two people are kin or non-kin. To this end, we propose a fusion scheme that combines feature learning (high-level features) and hand-crafted features (low-level features), followed by the absolute value of the feature difference for each face pair. For the hand-crafted features we apply the histogram of oriented gradients (HOG) descriptor, while a convolutional neural network (CNN) provides the learned features. To validate the proposed method, we follow the restricted protocol setting. The method is evaluated on the benchmark databases KinFaceW-I and KinFaceW-II, achieving verification accuracies of 68.6% and 73.5%, respectively.
Keywords: Kinship verification · Hand-crafted feature · Feature learning · Fusion
1 Introduction
Humans can identify one another quickly and easily from their faces, and this skill carries over well to recognizing people in images. It is also quite robust to significant changes in facial appearance, occlusion, pose, hairstyle, expression, and aging. With technological advances such as high-quality digital cameras, mobile devices, and the Internet, the human face has become an increasingly important and widely used cue to identity, emotional expression, age, visual speech, gender, and kinship in real-world applications. One of the most interesting aspects of photos of people is the relationship between the people in them. Identifying people and their relationships from images has significant social and business value in many real-world fields, for example, searching for missing children, social media, genealogical studies, and forensics [1, 2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1050–1063, 2021. https://doi.org/10.1007/978-3-030-70713-2_94
In computer vision, kinship verification is the task of training a machine to distinguish genetic kin from non-kin based on features extracted from digital images [3]. In general, kinship is a genetic relationship between two family members, including parent-child and sibling-sibling relations. Kinship analysis from facial images has attracted increasing attention in recent years [4–13]. The choice of features extracted from the facial images is critical for achieving high kinship verification performance. However, most existing kinship verification methods adopt a single feature descriptor, which cannot describe face images well under changing conditions. Moreover, most methods developed for kinship verification rely heavily on low-level hand-crafted features [14]. Relying solely on this type of feature limits how well the face images, and the underlying information about kin relationships, can be represented. It is therefore necessary to find methods that exploit the complementarity of different features. Previous studies have found that combining multiple types of features plays a key role in improving models and obtaining the best performance [10, 15, 16]. Therefore, in this paper, to detect the relationships between pairs of face images, we propose a fusion scheme composed of hand-crafted features and feature learning. In particular, we use the histogram of oriented gradients (HOG) as the hand-crafted feature and deep convolutional neural networks (CNN) for feature learning. To the best of our knowledge, very few studies have combined hand-crafted features with feature learning for kinship verification.
The remainder of the paper is organized as follows. Section 2 summarizes related work on kinship verification. Section 3 presents the proposed method. The experiments and the kinship verification results are provided in Sects. 4 and 5. Finally, Sect. 6 gives the conclusion and future work.
2 Related Works
At first glance, kinship recognition and face identity recognition may seem similar. However, identity recognition (is this the same person?) checks whether a pair of images shows the same person, that is, it compares features from one person, whereas kinship verification (are they relatives?) checks whether a pair of images shows kin, that is, it compares features from different persons. Findings from psychology drew researchers' attention to the mechanisms by which genetic kinship can be identified from the face, which may help in developing automated systems that verify whether individuals are related or not [17–22]. Most current methods that distinguish positive (kin) pairs from negative pairs fall into three groups: 1) feature-based [23–26], 2) learning-based [5, 27–30], and 3) deep learning [4, 13, 31–34]. To the best of our knowledge, the first study of kinship verification using facial images was presented by Fang et al. [35]. They proposed a method based on a set of low-level features extracted from the face,
such as eye color, skin color, the mouth, the distance between the eyes, and the distance between eye and nose. Since 2010, attention has increasingly turned to kinship verification, and the encouraging results of these studies have stimulated further research in this area. In general, selecting and engineering features is considered extremely complex, especially for kinship verification [36–39], because the distinctive information relevant to kinship must be identified to achieve high performance. Due to the drawbacks of hand-crafted features, or feature engineering [40], many recently proposed algorithms turn to deep learning [41]. Deep learning has outperformed the shallow, hand-designed features employed by most earlier studies, learning and extracting discriminative information from facial features without requiring expert knowledge and much effort. Moreover, these techniques have attracted the attention of companies such as Facebook, Google, and Microsoft [42, 43] and have achieved remarkable results in various applications such as face recognition, information retrieval, handwriting recognition, medical image analysis, and person re-identification [44–48]. Unlike most previous kinship verification methods, which use a single feature, particularly a hand-crafted one, in this paper we propose a fusion scheme composed of hand-crafted features and deep feature learning to detect the relationships between pairs of face images, represent the underlying information indicating kin relations, and better capture discriminative features. For the hand-crafted features we apply the histogram of oriented gradients (HOG) descriptor, while a convolutional neural network (CNN) provides the learned features.
Once all features have been extracted and concatenated for every facial image, the final step of our method before verification is to calculate the difference between the feature vectors of each image pair; we call this step features subtracting absolute value. Our preliminary experimental results on the benchmark kinship databases KinFaceW-I and KinFaceW-II show encouraging performance compared with other methods.
3 Proposed Method
The proposed method fuses hand-crafted features and feature learning, and then takes the absolute value of the feature difference for each face pair, to obtain a description suitable for kinship verification. The pipeline of the proposed method is shown in Fig. 1. Given a pair of images as input, we first extract two kinds of features: a hand-crafted feature (HOG) and a learned feature (CNN). For the CNN, we extract a high-level 4096-dimensional vector for each facial image using the VGG model. The HOG and CNN features are then fused. Next, for each pair, we calculate the absolute difference of the fused feature vectors. Finally, these difference vectors are fed to an SVM classifier, which verifies whether the input pair is kin or non-kin.
Fig. 1. The pipeline of the proposed kinship verification method: a parent image and a child image pass through pre-processing, extraction of the hand-crafted feature (HOG) and deep feature learning (CNN), fusion of all features, features subtracting absolute value, and classification.
3.1 Hand-Crafted: Histogram of Oriented Gradients (HOG)
In general, strong features make a classifier considerably better. The general approach to kinship verification is to use global and/or local face features. In computer vision,
a histogram of oriented gradients (HOG) is a feature used for object detection and in many other fields [49, 50]. The main advantage of HOG is that it describes the image globally, providing a large amount of information to describe the image as completely as possible. In our case, each facial image is represented as a single high-dimensional vector whose length depends on the cell size and the image size. To compute the HOG features, we resize all facial images to 100 × 100 pixels and follow the setting in [49]. The image is divided into non-overlapping cells, neighbouring cells are grouped into blocks, and a histogram of gradient orientations is computed for each cell. Specifically, each cell covers 8 × 8 pixels and each block contains 2 × 2 cells, which yields a single 4356-dimensional feature vector for kinship representation.
3.2 Feature Learning: Convolutional Neural Network (CNN)
Recently, deep convolutional neural networks (CNNs) have become a very powerful technique for automatically learning highly distinctive features from large volumes of data and for constructing efficient classifiers. The simplest CNN architecture is composed of a set of layers: convolutional layers, pooling layers with activation functions, and fully connected layers [51]. Many deep CNN architectures, including GoogLeNet [52], AlexNet [53], and VGG [54], have recently shown remarkably high performance on computer vision and visual recognition tasks and have been made available for research purposes. In the proposed method, we focus on the VGG architecture, proposed in [54], which has shown impressive performance on object classification. In addition, a CNN with transfer learning is an effective solution for kinship verification, because training a CNN from scratch requires a large amount of data [55, 56].
However, large-scale kinship databases are difficult to obtain. We therefore use a pre-trained VGG network, which is likely to be more reliable and practical than training a model from scratch, to capture discriminative features for kinship. The resulting feature is a 4096-dimensional vector for every image.
The VGG network is very deep: its first layer receives RGB images, followed by a large number of convolutional layers interleaved with max-pooling layers, leading to fully connected layers. The convolutional filters are 3 × 3 with a stride of 1, and 2 × 2 max-pooling halves the spatial resolution, which reduces the number of parameters in the later layers. The final feature vector obtained from VGG is 4096-dimensional. The VGG network is expensive to train, requiring long computation times and a large amount of memory. VGG achieves 92.7% top-5 test accuracy on ImageNet [57]. The VGG architecture is illustrated in Fig. 2.
Fig. 2. The architecture of VGG convolutional neural network model [58].
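The size of the 4096-dimensional feature and the training cost mentioned above follow from the layer arithmetic of VGG. The sketch below assumes the 16-layer configuration (VGG-16), a 224 × 224 × 3 input, and 3 × 3 convolutions padded so that they keep the spatial size; it tracks the feature-map size through the five max-pooling stages and counts weights and biases. It illustrates the published architecture and is not the authors' code; the helper name is hypothetical.

```python
# Hypothetical helper: layer arithmetic for VGG-16 (assumed variant).
# Integers are output channels of 3x3, stride-1, padded convolutions;
# "M" marks a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_UNITS = [4096, 4096, 1000]  # two 4096-unit layers, then 1000 ImageNet classes


def vgg16_arithmetic(input_size=224, in_channels=3):
    size, channels, conv_params = input_size, in_channels, 0
    for layer in VGG16_CFG:
        if layer == "M":
            size //= 2  # pooling halves height and width
        else:
            conv_params += 3 * 3 * channels * layer + layer  # weights + biases
            channels = layer  # padded stride-1 conv keeps the spatial size
    flat = size * size * channels  # input length of the first FC layer
    fc_params, prev = 0, flat
    for units in FC_UNITS:
        fc_params += prev * units + units
        prev = units
    return size, channels, flat, conv_params, conv_params + fc_params


size, channels, flat, conv_params, total_params = vgg16_arithmetic()
print(size, channels, flat)       # 7 512 25088
print(conv_params, total_params)  # 14714688 138357544
```

The first fully connected layer alone holds about 102.8 million of the roughly 138 million parameters, which is one reason the network is costly to train from scratch on a small kinship database; the 4096-dimensional descriptor comes from one of the two 4096-unit layers.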
To extract the CNN features, we first resize all images in the database to 224 × 224 × 3 as a pre-processing step. We then use the VGG network to extract features from a fully connected layer, which yields a 4096-dimensional vector for every image. Once the HOG and CNN features have been extracted for all facial images, they are fused to describe the faces, as described in the next section.
3.3 Fusion of Hand-Crafted Feature and Feature Learning
After the HOG and CNN features have been extracted for all facial images, we generate fused features by concatenating the hand-crafted feature (HOG) with the learned feature (CNN). This produces a large feature matrix in which each image is described by an 8452-dimensional feature vector (4356 HOG values plus 4096 CNN values).
3.4 Features Subtracting Absolute Value
After obtaining the feature matrix of all images in the previous step, and before classification, the next step of our method is to calculate the difference between the feature vectors of each pair. The choice of subtraction was based on extensive experiments, which showed that it is promising for the kinship problem. The result of this procedure is a new feature matrix that is fed to the classifier. Suppose we have two
subsets, positive and negative kin pairs, for one class of relationship (for example, father-son) in the feature matrix:
$$X = \begin{cases} P = \{(x_{v_a}, x_{v_b})\} \\ N = \{(x_{v_a}, x_{v_b})\} \end{cases} \tag{1}$$
where the subset $P$ contains the true kin pairs and the subset $N$ the false (non-kin) pairs of the original feature matrix $X$, and $x_{v_a}$ and $x_{v_b}$ denote the feature vectors of the two images of a pair. The features $D_{v_a v_b}$, obtained by subtracting the absolute values of $x_{v_a}$ and $x_{v_b}$, are computed element-wise as
$$D_{v_a v_b} = \left| x_{v_a} - x_{v_b} \right| \tag{2}$$
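Sections 3.1 to 3.4 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the authors' Matlab/Keras implementation. The HOG function works on a grayscale image and uses unsigned gradients, 9 orientation bins, hard (non-interpolated) binning, 8 × 8-pixel cells, and blocks of 2 × 2 cells that slide one cell at a time with L2 normalisation. These parameters are assumptions, chosen because they reproduce the 4356-dimensional length reported for a 100 × 100 image (11 × 11 blocks × 2 × 2 cells × 9 bins). The nearest-neighbour resize is also an assumption, and random vectors stand in for the 4096-dimensional VGG features. All function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, size=100):
    """Nearest-neighbour resize of a 2-D image to size x size (assumed method)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def hog_features(img, cell=8, block=2, bins=9, eps=1e-6):
    """Simplified HOG: unsigned gradients, hard orientation binning,
    blocks sliding by one cell, L2 normalisation per block."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # central differences; borders stay 0
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            win = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            # Magnitude-weighted orientation histogram of one cell.
            hist[cy, cx] = np.bincount(bin_idx[win].ravel(),
                                       weights=mag[win].ravel(), minlength=bins)

    blocks = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)


def fused_features(img, cnn_vec):
    """Sect. 3.3: concatenate the HOG vector and the CNN vector of one image."""
    return np.concatenate([hog_features(resize_nearest(img)), cnn_vec])


def pair_difference(x_va, x_vb):
    """Sect. 3.4, Eq. (2): element-wise absolute difference of a pair."""
    return np.abs(x_va - x_vb)


rng = np.random.default_rng(0)
parent, child = rng.random((64, 64)), rng.random((64, 64))  # stand-in 64x64 faces
cnn_parent, cnn_child = rng.random(4096), rng.random(4096)  # stand-in VGG features
d = pair_difference(fused_features(parent, cnn_parent), fused_features(child, cnn_child))
print(d.shape)  # (8452,)
```

Stacking one such vector per image pair gives the matrix that is passed to the SVM classifier; that classification step is not shown here.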
4 Experiments and Evaluation
In this section, we conduct kinship verification experiments on benchmark kinship databases to show the performance and efficacy of the proposed method. We also describe the databases, settings, evaluation, and discussion of results on which the experiments rely.
4.1 KinFaceW-I and KinFaceW-II Databases
All experiments are based on two standard databases, KinFaceW-I and KinFaceW-II, collected by [5], which are commonly used for kinship verification. The face images in these databases were collected from the Internet under uncontrolled conditions, with no restrictions on lighting, background, pose, age, expression, or race. Both databases contain four kinship relations: father-son (F-S), father-daughter (F-D), mother-son (M-S), and mother-daughter (M-D). KinFaceW-I contains 134, 156, 127, and 116 pairs of facial images for these relations, respectively (1,066 images in total), with the two images of each pair acquired from different photos. KinFaceW-II contains 250 pairs of facial images for each relation (2,000 images in total), with the two images of each pair acquired from the same photo. In both databases, all facial images are color images of 64 × 64 pixels. Example images from these databases are shown in Fig. 3.
4.2 Experimental Setting
To validate the proposed method for kinship verification, we follow the restricted protocol setting described in [5, 59]; most previous kinship studies were also conducted under this setting. Under the restricted protocol, only the kin relation labels (kin or non-kin) of the given image pairs are used for training.
Therefore, it is not allowed to generate additional image pairs from the available images, or to use external data or any other information, to increase the size of the training set, as is permitted under the unrestricted protocol.
Fig. 3. Example image pairs from the two databases for the four kinship relations father-son (F-S), father-daughter (F-D), mother-son (M-S), and mother-daughter (M-D). The left images are from KinFaceW-I and the right images from KinFaceW-II; both positive and negative kinship pairs are shown.
Table 1. Index ranges of the five folds of the F-S, F-D, M-S, and M-D facial pairs on the KinFaceW-I and KinFaceW-II databases.

| Fold | KinFaceW-I F-S | KinFaceW-I F-D | KinFaceW-I M-S | KinFaceW-I M-D | KinFaceW-II (all relations) |
|---|---|---|---|---|---|
| 1 | 1–27 | 1–31 | 1–25 | 1–23 | 1–50 |
| 2 | 28–54 | 32–64 | 26–50 | 24–46 | 51–100 |
| 3 | 55–81 | 65–96 | 51–75 | 47–69 | 101–150 |
| 4 | 82–108 | 97–124 | 76–101 | 70–92 | 151–200 |
| 5 | 109–134 | 125–156 | 102–127 | 93–116 | 201–250 |

To allow a fair comparison between approaches, Table 1 gives the index ranges of the five folds of F-S, F-D, M-S, and M-D facial pairs for the two databases. We apply five-fold cross-validation on both KinFaceW-I and KinFaceW-II. Each relation subset (F-S, F-D, M-S, and M-D) is built and analysed separately and split into five folds, in both databases. Each fold contains roughly the same number of image pairs, and within each fold the number of positive pairs (a parent with his/her true child) equals the number of negative pairs (a parent with a child who is not his/her own).
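The fold assignment in Table 1 can be expressed as a lookup of index ranges plus a leave-one-fold-out loop. The sketch below is a hypothetical helper, not part of the KinFaceW distribution: it uses the 1-based, inclusive ranges of Table 1 for the positive pairs of one relation and yields the train and test indices of each of the five rounds. How the matching negative pairs are formed is not shown.

```python
# Fold boundaries from Table 1 (1-based, inclusive pair indices).
FOLDS = {
    # KinFaceW-I, one entry per relation
    "F-S": [(1, 27), (28, 54), (55, 81), (82, 108), (109, 134)],
    "F-D": [(1, 31), (32, 64), (65, 96), (97, 124), (125, 156)],
    "M-S": [(1, 25), (26, 50), (51, 75), (76, 101), (102, 127)],
    "M-D": [(1, 23), (24, 46), (47, 69), (70, 92), (93, 116)],
    # KinFaceW-II uses the same ranges for every relation
    "KinFaceW-II": [(1, 50), (51, 100), (101, 150), (151, 200), (201, 250)],
}


def five_fold_splits(relation):
    """Yield (fold number, train indices, test indices) for each round."""
    folds = [list(range(lo, hi + 1)) for lo, hi in FOLDS[relation]]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield k + 1, train, test


for fold, train, test in five_fold_splits("F-S"):
    print(fold, len(train), len(test))
# 1 107 27
# 2 107 27
# 3 107 27
# 4 107 27
# 5 108 26
```

Keeping the ranges fixed, rather than shuffling, is what allows results from different papers to be compared fold by fold.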
5 Results and Discussion
This section presents the kinship verification experiments on the databases under the restricted protocol setting introduced in Sect. 4.2.
We applied our method with two feature types, HOG and CNN, extracted from each face image. For the HOG feature, all color facial images in both databases are resized to 100 × 100 pixels, and each image is described by a 4356-dimensional feature vector. We selected HOG because it has shown reasonably good performance in different domains (e.g., [60, 61]). For the CNN feature, all color images in both databases are resized to 224 × 224 pixels, and the VGG network computes a 4096-dimensional vector from a fully connected layer for every image. The HOG (hand-crafted) and CNN (learned) feature vectors are then concatenated into a single 8452-dimensional vector that represents both descriptors. Kinship verification is a binary classification problem; therefore, to verify whether a pair of facial images is related, we use a support vector machine (SVM) with a polynomial kernel, since SVMs have shown remarkable performance on kinship verification. For deep CNN feature extraction we used the Keras deep learning library running on top of Theano for Python [62], and Matlab for the remaining tasks. The experiments were run on a computer with an Intel Core i5-4460 CPU at 3.20 GHz and 16 GB of DDR3 memory.
5.1 Experimental Results on KinFaceW-I and KinFaceW-II Databases
The mean verification accuracies of the proposed method on the KinFaceW-I and KinFaceW-II databases are listed in Table 2.
Table 2. The mean verification accuracy (%) under the restricted setting on the KinFaceW-I and KinFaceW-II databases.

| Relation | KinFaceW-I HOG | KinFaceW-I CNN | KinFaceW-I HOG+CNN | KinFaceW-II HOG | KinFaceW-II CNN | KinFaceW-II HOG+CNN |
|---|---|---|---|---|---|---|
| F-S | 55.46 | 66.96 | 74.0 | 56.40 | 70.00 | 81.0 |
| F-D | 54.12 | 61.99 | 70.2 | 56.60 | 65.00 | 72.8 |
| M-S | 51.70 | 65.91 | 63.3 | 52.60 | 68.00 | 71.2 |
| M-D | 51.80 | 71.70 | 66.9 | 51.97 | 68.80 | 69.0 |
| Mean | 53.27 | 66.64 | 68.6 | 54.39 | 67.95 | 73.5 |
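The Mean row of Table 2 is the unweighted average of the four relation accuracies in each column. The short check below recomputes it from the table entries; the column labels are shorthand introduced here. It also makes visible that, on KinFaceW-I, the fused features score below the CNN feature alone for M-S and M-D, so the advantage of fusion holds for the mean rather than for every relation.

```python
# Per-relation accuracies (%) from Table 2, in the column order below.
COLUMNS = ["I-HOG", "I-CNN", "I-HOG+CNN", "II-HOG", "II-CNN", "II-HOG+CNN"]
TABLE2 = {
    "F-S": [55.46, 66.96, 74.0, 56.40, 70.00, 81.0],
    "F-D": [54.12, 61.99, 70.2, 56.60, 65.00, 72.8],
    "M-S": [51.70, 65.91, 63.3, 52.60, 68.00, 71.2],
    "M-D": [51.80, 71.70, 66.9, 51.97, 68.80, 69.0],
}

# Unweighted mean over the four relations, per column.
means = {col: sum(row[i] for row in TABLE2.values()) / len(TABLE2)
         for i, col in enumerate(COLUMNS)}
for col, m in means.items():
    print(f"{col:11s} {m:6.2f}")

# Relations where fusion does not beat the CNN feature alone (KinFaceW-I).
print([rel for rel, row in TABLE2.items() if row[2] < row[1]])  # ['M-S', 'M-D']
```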
The results in Table 2 and Fig. 4 indicate that fusing different types of feature representation gives better mean verification results (68.60% and 73.50%) than either single representation, HOG (53.27% and 54.39%) or CNN (66.64% and 67.95%), on both databases, although on KinFaceW-I the CNN feature alone scores higher than the fusion for the M-S and M-D relations. We attribute the increase in mean accuracy to feature fusion describing the image more fully and providing discriminative and relevant features. The mean verification rate over all four relations on KinFaceW-II (73.50%) is higher than that on KinFaceW-I (68.60%), an improvement of 4.90 percentage points. In addition, the deep CNN feature gives higher results than the hand-crafted HOG feature. This may be because the deep CNN can find several signals that can be exploited to determine kinship, such as lighting, background, and clothing, which boosts accuracy. Further, on both databases, the F-S relation obtains the highest verification rate of the four relation subsets with the fused features.
Fig. 4. The mean verification accuracy (%) of the four relation subsets (F-S, F-D, M-S, M-D) under the restricted setting on the KinFaceW-I (a) and KinFaceW-II (b) databases.
The observations derived from the results in Table 2 can be summarized as follows:
• Different descriptors can complement each other, so using them together can lead to better outcomes. This means that different descriptors can provide different cues for detecting kinship. In this regard, kinship models built on a compact set of prominent features extracted from different types of descriptors can achieve the desired accuracy.
• In KinFaceW-II, the two images of each kin pair come from the same photo, and the database contains more images. For these reasons, it is plausible that, with the fused features, KinFaceW-II yields better verification accuracy than KinFaceW-I for all four relation subsets.
• Our method performs worse than some other methods. High-level CNN features are known to require a large number of kinship pair instances to capture kin relationships and attain promising verification performance. We therefore consider the main reason to be that the current KinFaceW databases contain a very small number of face pairs, which prevents the extraction of effective and discriminative features.
Table 3 and Table 4 compare the results of the proposed method with previous studies on the KinFaceW-I and KinFaceW-II databases.
Table 3. The comparison of mean verification accuracy (%) between different methods on the KinFaceW-I database.

| Author | Method | Result (%) |
|---|---|---|
| [11] | Local binary pattern and the pyramid multi-level (LBP-PML) | 76.36 |
| [12] | ALEXNET-SVM | 64.25 |
| [63] | Deep kinship verification (DKV) | 66.9 |
| [64] | Ensemble similarity learning (ESL) | 74.1 |
| [65] | Neighbourhood repulsed correlation metric learning (NRCML) | 65.8 |
| [56] | Fusion LBP-LPQ-BSIF-CNN | 68.4 |
| Proposed method | Fusion HOG+CNN | 68.6 |
Table 4. The comparison of mean verification accuracy (%) between different methods on the KinFaceW-II database.

| Author | Method | Result (%) |
|---|---|---|
| [11] | Local binary pattern and the pyramid multi-level (LBP-PML) | 76.63 |
| [12] | ALEXNET-SVM | 77.2 |
| [63] | Deep kinship verification (DKV) | 69.5 |
| [64] | Ensemble similarity learning (ESL) | 74.3 |
| [65] | Neighbourhood repulsed correlation metric learning (NRCML) | 65.8 |
| [56] | Fusion LBP-LPQ-BSIF-CNN | 66.5 |
| Proposed method | Fusion HOG+CNN | 73.5 |
Our proposed method, which is based on fusing hand-crafted features and feature learning, yields a mean accuracy of 68.6% on KinFaceW-I. This is higher than DKV (66.9%), NRCML (65.8%), fusion LBP-LPQ-BSIF-CNN (68.4%), and ALEXNET-SVM (64.25%), gains of 1.7, 2.8, 0.2, and 4.35 percentage points, respectively, but lower than ESL (74.1%) and LBP-PML (76.36%). Similarly, on KinFaceW-II the proposed method records a mean accuracy of 73.5%, which is higher than DKV (69.5%), NRCML (65.8%), and fusion LBP-LPQ-BSIF-CNN (66.5%), gains of 4.0, 7.7, and 7.0 percentage points, respectively. On the other hand, it performs worse than ESL (74.3%), LBP-PML (76.63%), and ALEXNET-SVM (77.2%).
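The gains quoted above are plain differences, in percentage points, between the proposed method's mean accuracy and each competing result in Tables 3 and 4. The sketch below recomputes them from the table values; the dictionary layout and the short method keys are choices made here for illustration. A positive value means the proposed method is ahead, a negative value that it trails.

```python
# Mean accuracies (%) of the compared methods, from Tables 3 and 4.
RESULTS = {
    "KinFaceW-I": {"LBP-PML": 76.36, "ALEXNET-SVM": 64.25, "DKV": 66.9,
                   "ESL": 74.1, "NRCML": 65.8, "LBP-LPQ-BSIF-CNN": 68.4},
    "KinFaceW-II": {"LBP-PML": 76.63, "ALEXNET-SVM": 77.2, "DKV": 69.5,
                    "ESL": 74.3, "NRCML": 65.8, "LBP-LPQ-BSIF-CNN": 66.5},
}
PROPOSED = {"KinFaceW-I": 68.6, "KinFaceW-II": 73.5}  # Fusion HOG+CNN

# Difference in percentage points, rounded to hide floating-point noise.
gains = {db: {method: round(PROPOSED[db] - acc, 2) for method, acc in methods.items()}
         for db, methods in RESULTS.items()}
for db, by_method in gains.items():
    print(db, by_method)
```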
6 Conclusion and Future Works
Most current kinship verification works rely on a single feature descriptor, especially shallow hand-crafted features, which limits their performance. In this paper, to detect the relationships between pairs of face images, we introduced a fusion scheme composed of hand-crafted features (low-level) and feature learning (high-level). The experimental results show that the proposed method can be used to address the kinship verification problem. Improving kinship verification remains an ongoing effort. In future work, we plan to develop a new, reliable database that better reflects real-world conditions and overcomes the shortcomings of the existing databases. We also plan to improve the feature descriptors to better represent face images and hence improve kinship verification.
References
1. Robinson, J.P., Shao, M., Wu, Y., Liu, H., Gillis, T., Fu, Y.: Visual kinship recognition of families in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 40(11), 2624–2637 (2018)
2. Chen, X., An, L., Yang, S., Wu, W.: Kinship verification in multi-linear coherent spaces. Multimedia Tools Appl. 76(3), 4105–4122 (2015)
3. Yan, H., Lu, J.: Facial Kinship Verification: A Machine Learning Approach. Springer, Singapore (2017)
4. Zhang, K., Huang, Y., Song, C., Wu, H., Wang, L.: Kinship verification with deep convolutional neural networks (2015)
5. Lu, J., Zhou, X., Tan, Y.P., Shang, Y., Jie, Z.: Neighborhood repulsed metric learning for kinship verification. IEEE Trans. Pattern Anal. Mach. Intell. 36(2), 331–345 (2013)
6. Kou, L., Zhou, X., Xu, M., Shang, Y.: Learning a genetic measure for kinship verification using facial images. Math. Probl. Eng. 2015, 5 (2015)
7. Lu, J., Hu, J., Zhou, X., Zhou, J., Castrillón-Santana, M., Lorenzo-Navarro, J., Kou, L., Shang, Y., Bottino, A., Vieira, T.F.: Kinship verification in the wild: the first kinship verification competition. In: IEEE International Joint Conference on Biometrics, pp. 1–6. IEEE (2014)
8. Lu, J., Hu, J., Tan, Y.: Discriminative deep metric learning for face and kinship verification. IEEE Trans. Image Process. 26(9), 4269–4282 (2017)
9. Liu, Q., Puthenputhussery, A., Liu, C.: A novel inheritable color space with application to kinship verification. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE (2016)
10. Dornaika, F., Arganda-Carreras, I., Serradilla, O.: Transfer learning and feature fusion for kinship verification. Neural Comput. Appl. 32, 7139–7151 (2019)
11. Chergui, A., Ouchtati, S., Sequeira, J., Bekhouche, S.E., Bougourzi, F.: Kinship verification using BSIF and LBP. In: 2018 International Conference on Signal, Image, Vision and their Applications (SIVA), pp. 1–5. IEEE (2018)
12. Rehman, A., Khalid, Z., Asghar, M.A., Khan, M.J.: Kinship verification using deep neural network models. In: 2019 International Symposium on Recent Advances in Electrical Engineering (RAEE), vol. 4, pp. 1–6. IEEE (2019)
13. Chergui, A., Ouchtati, S., Mavromatis, S., Bekhouche, S.E., Sequeira, J.: Investigating deep CNNs models applied in kinship verification through facial images. In: 2019 5th International Conference on Frontiers of Signal Processing (ICFSP), pp. 82–87. IEEE (2019)
14. Qin, X., Liu, D., Wang, D.: A literature survey on kinship verification through facial images. Neurocomputing 377, 213–224 (2020)
15. Wang, X., Kambhamettu, C.: Leveraging appearance and geometry for kinship verification. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 5017–5021. IEEE (2014)
16. Boutellaa, E., López, M.B., Ait-Aoudia, S., Feng, X., Hadid, A.: Kinship verification from videos using spatio-temporal texture features and deep learning (2017)
17. Dal Martello, M.F., Maloney, L.T.: Where are kin recognition signals in the human face? J. Vis. 6(12), 2 (2006)
18. DeBruine, L.M., Smith, F.G., Jones, B.C., Roberts, S.C., Petrie, M., Spector, T.D.: Kin recognition signals in adult faces. Vis. Res. 49(1), 38–43 (2009)
19. Froelich, A.G., Nettleton, D.: Does my baby really look like me? Using tests for resemblance between parent and child to teach topics in categorical data analysis. J. Stat. Educ. 21(2), 1–19 (2013)
20. Kaminski, G., Dridi, S., Graff, C., Gentaz, E.: Human ability to detect kinship in strangers' faces: effects of the degree of relatedness. Proc. Biol. Sci. 276(1670), 3193–3200 (2009)
21. Maloney, L.T., Dal Martello, M.F.: Kin recognition and the perceived facial similarity of children. J. Vis. 6(10), 4 (2006)
22. Park, J.H., Schaller, M., Van Vugt, M.: Psychology of human kin recognition: heuristic cues, erroneous inferences, and their implications. Rev. Gen. Psychol. 12(3), 215–235 (2008)
23. Duan, X., Tan, Z.-H.: A feature subtraction method for image based kinship verification under uncontrolled environments. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 1573–1577. IEEE (2015)
24. Guo, G., Wang, X.: Kinship measurement on salient facial features. IEEE Trans. Instrum. Meas. 61(8), 2322–2325 (2012)
25. Zhou, X., Hu, J., Lu, J., Shang, Y., Guan, Y.: Kinship verification from facial images under uncontrolled conditions. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 953–956 (2011)
26. Zhou, X., Lu, J., Hu, J., Shang, Y.: Gabor-based gradient orientation pyramid for kinship verification under uncontrolled environments. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 725–728 (2011)
27. Xia, S., Shao, M., Luo, J., Fu, Y.: Understanding kin relationships in a photo. IEEE Trans. Multimedia 14(4), 1046–1056 (2012)
28. Hu, J., Lu, J., Tan, Y., Yuan, J., Zhou, J.: Local large-margin multi-metric learning for face and kinship verification. IEEE Trans. Circ. Syst. Video Technol. 28(8), 1875–1891 (2017)
29. Fang, Y., Yan, Y., Chen, S., Wang, H., Shu, C.: Sparse similarity metric learning for kinship verification. In: 2016 Visual Communications and Image Processing (VCIP), pp. 1–4. IEEE (2016)
30. Liang, J., Hu, Q., Dang, C., Zuo, W.: Weighted graph embedding-based metric learning for kinship verification. IEEE Trans. Image Process. 28(3), 1149–1162 (2018)
M. A. Almuashi et al.
31. Patil, H.Y., Chandra, A.: Deep learning based kinship verification on KinFaceW-I dataset. In: 2019 IEEE Region 10 Conference (TENCON), TENCON 2019, pp. 2529–2532. IEEE (2019)
32. Chergui, A., Ouchtati, S., Sequeira, J., Bekhouche, S.E., Bougourzi, F., Telli, H.: Deep features for kinship verification from facial images. In: 2019 International Conference on Advanced Systems and Emergent Technologies (IC ASET), pp. 64–67. IEEE (2019)
33. Yang, Y., Wu, Q.: A novel kinship verification method based on deep transfer learning and feature nonlinear mapping. In: AIEA 2017 (2017)
34. Kohli, N., Vatsa, M., Singh, R., Noore, A., Majumdar, A.: Hierarchical representation learning for kinship verification. IEEE Trans. Image Process. 26(1), 289–302 (2016)
35. Fang, R., Tang, K.D., Snavely, N., Chen, T.: Towards computational models of kinship verification. In: 2010 IEEE International Conference on Image Processing, pp. 1577–1580. IEEE (2010)
36. Bottino, A., De Simone, M., Laurentini, A., Vieira, T.: A new problem in face image analysis: finding kinship clues for siblings pairs. In: ICPRAM, vol. 2, pp. 405–410 (2010)
37. Vieira, T.F., Bottino, A., Islam, I.U.: Automatic verification of parent-child pairs from face images. In: Iberoamerican Congress on Pattern Recognition, pp. 326–333. Springer, Heidelberg (2013)
38. Xia, S., Shao, M., Fu, Y.: Toward kinship verification using visual attributes. In: Proceedings of the 21st International Conference on Pattern Recognition, ICPR2012, pp. 549–552. IEEE (2012)
39. Almuashi, M., Mohd Hashim, S.Z., Mohamad, D., Alkawaz, M.H., Ali, A.: Automated kinship verification and identification through human facial images: a survey. Multimed. Tools Appl. 76(1), 265–307 (2017)
40. Hu, G., Yang, Y., Yi, D., Kittler, J., Christmas, W., Li, S.Z., Hospedales, T.: When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 142–150 (2015)
41. Bengio, Y.: Learning Deep Architectures for AI. Now Publishers Inc. (2009)
42. Jung, K., Zhang, B.-T., Mitra, P.: Deep learning for the web. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1525–1526 (2015)
43. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678 (2014)
44. Deng, L.: Three classes of deep learning architectures and their applications: a tutorial survey (2012)
45. Deng, L.: A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans. Sig. Inf. Process. 3, E2 (2014)
46. Chen, Y., Zhu, X., Gong, S.: Person re-identification by deep learning multi-scale representations. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2590–2600 (2017)
47. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
48. Pelin, G., Simsek, A.: Face recognition via deep stacked denoising sparse autoencoders (DSDA). Appl. Math. Comput. 355, 325–342 (2019)
49. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
50. Wei, X., Guo, G., Wang, H., Wan, H.: A multiscale method for HOG-based face recognition. In: International Conference on Intelligent Robotics and Applications, pp. 535–545. Springer, Cham (2015)
A Fusion Schema of Hand-Crafted Feature and Feature Learning
51. Khan, A., Sohail, A., Zahoora, U., Qureshi, A.S.: A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53, 1–62 (2020)
52. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
53. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
54. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014)
55. Alshazly, H., Linse, C., Barth, E., Martinetz, T.: Ensembles of deep learning models and transfer learning for ear recognition. Sensors 19(19), 4139 (2019)
56. Lopez, M.B., Hadid, A., Boutellaa, E., Goncalves, J., Kostakos, V., Hoiso, S.: Kinship verification from facial images and videos: human versus machine. Mach. Vis. Appl. 29(5), 873–890 (2018)
57. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
58. Nash, W., Drummond, T., Birbilis, N.: A review of deep learning in the study of materials degradation. npj Mater. Degrad. 2(1), 1–12 (2018)
59. Duan, X., Tan, Z.-H.: Neighbors based discriminative feature difference learning for kinship verification. In: International Symposium on Visual Computing, pp. 258–267. Springer, Cham (2015)
60. Dong, J., Ao, X., Su, S., Li, S.: Kinship classification based on discriminative facial patches. In: 2014 IEEE Visual Communications and Image Processing Conference, pp. 157–160. IEEE (2014)
61. Kobayashi, T., Hidaka, A., Kurita, T.: Selection of histograms of oriented gradients features for pedestrian detection. In: International Conference on Neural Information Processing, pp. 598–607. Springer, Heidelberg (2008)
62. Chollet, F.: Keras: deep learning library for theano and tensorflow, vol. 7, no. 8, p. T1 (2015)
63. Wang, M., Li, Z., Xiangbo Shu, J., Tang, J.: Deep kinship verification. In: 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6 (2015)
64. Zhou, X., Shang, Y., Yan, H., Guo, G.: Ensemble similarity learning for kinship verification from facial images in the wild. Inf. Fus. 32, 40–48 (2016)
65. Yan, H.: Kinship verification using neighborhood repulsed correlation metric learning. Image Vis. Comput. 60, 91–97 (2017)
Lossless Audio Steganographic Method Using Companding Technique
Ansam Osamah Abdulmajeed(B)
College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq
Abstract. The objective of the work presented here was to implement a lossless steganographic method for audio files in the frequency domain. The main contribution was the use of the companding technique in audio files to preserve the reversibility of the secret data with little influence on the signal. This was achieved by making the largest changes in the less significant coefficients. In this work, secret bits were hidden in the detail components of the first level of the integer wavelet transform using the companding technique. A location map was created to prevent sample overflow/underflow resulting from the companding technique, and it was losslessly compressed using a proposed compression method. Subsequently, it was embedded in the approximation components of the same level using LSB replacement. Prior to embedding the compressed location map, the proposed method used a Fredkin gate, controlled by a long secret key, to jumble both the original LSBs and the compressed location map. This process was used to increase security and preserve reversibility. Results showed that the proposed method kept good stego-audio quality (SNR above 30 dB) and restored the cover audio without any loss. In addition, the proposed compression method for the location map achieved an acceptable compression ratio. Furthermore, the security level was increased by the use of the Fredkin gate. In conclusion, the companding technique can be used to achieve lossless data hiding with a negligible effect on audio quality when it is applied to the less significant coefficients.
Keywords: Lossless data hiding · Steganography · Integer wavelet transform · Companding technique · Fredkin gate
1 Introduction
Security is one of the most important issues to consider when secret data are transferred between parties. Steganography is a technique for embedding secret data in other media, called cover media, in a way that makes it difficult for third parties to detect their existence [1]. Cover media can be images, audio, or video. Steganographic methods are classified by domain into two main categories: transform-domain and time-domain steganography. Transform-domain steganography transforms the cover media into the frequency domain and embeds the secret data in the coefficients, whereas time-domain steganography embeds the data directly in the samples [1]. According to whether the cover media can be restored, steganographic methods can be classified as lossless or lossy. Lossless methods retrieve the secret data and also restore the cover media to its original form without any loss; lossy methods cannot restore the cover media. In some critical and sensitive applications, such as military, medical image processing, remote sensing, and law enforcement, any change in the cover media may confuse the analysis [2]. Therefore, lossless steganography algorithms have been investigated in several studies. Most studies used images as cover media and applied methods such as difference expansion [3–6], contrast mapping [7], prediction error expansion and histogram modification [8, 9], and the companding technique [10, 11]. These algorithms, however, have been applied to audio files far less often than to image files.
Yan and Wang [12] proposed a lossless data hiding method for audio files using prediction error expansion, taking advantage of the correlation between three adjacent samples. The bits were embedded in the prediction error. A location map marked the locations that did not fulfill an expandability condition; it was compressed using run-length and Huffman encoding and then embedded in the audio file. Wang et al. [13] used a differential evolution algorithm to determine the most appropriate linear prediction coefficients; the secret data were inserted using histogram shifting and prediction error expansion, and the location map was compressed using run-length and Huffman encoding. Unlike previous expansion methods that expand by 2, Nishimura [14] performed expansion by α (where 1 < α ≤ 2) to control the payload and the quality of the audio file; the secret bits were added to or subtracted from the rounded value resulting from the expansion. Huang et al. [15] hid secret data by expanding integer DCT coefficients and employed this reversible data hiding technique in a tampering detection and localization algorithm for audio files.
To the best of the author's knowledge, the companding technique and its impact on lossless data hiding in audio files have not yet been tested. Therefore, the objective of this work was to implement a lossless steganographic method for WAV audio files based on applying the companding technique in the frequency domain. Secret bits were hidden in the detail components of the first level of the integer wavelet transform using the companding technique. A location map was created to handle the overflow/underflow of samples. The location map was compressed using a proposed lossless compression method and embedded in the approximation components of the same level using LSB replacement. To increase security while preserving reversibility, the proposed method used a Fredkin gate, controlled by a long secret key, to jumble both the original LSBs and the compressed location map prior to embedding.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1064–1074, 2021.
https://doi.org/10.1007/978-3-030-70713-2_95
2 Materials and Methods
2.1 Integer Wavelet Transform
The Discrete Wavelet Transform (DWT) is one of the most widely used transforms in digital signal processing. It represents a signal in terms of scaled and shifted versions of a wavelet function and decomposes it into approximation components (low frequency) and detail components (high frequency) [16]. The resulting DWT coefficients are floating-point values. When the DWT is applied to integer signals, the floating-point format can lead to errors in secret data extraction because of the rounding of the coefficients. The integer wavelet transform, an integer version of the DWT, is adopted to solve this issue [17]. Because it is computed with the lifting scheme, the integer wavelet transform is also called the lifting wavelet transform (LWT). In addition to producing integer coefficients, the LWT has other advantages over the DWT: its in-place computation reduces computation time and memory use [18]. In the current work, the CDF (2,2) wavelet was adopted according to the equations below [19], where ⌊·⌋ denotes the floor function and each lifting step overwrites the variable on its left-hand side.
Forward LWT:
Splitting the signal x:
$$s_i = x_{2i}, \qquad d_i = x_{2i+1} \tag{1}$$
Prediction:
$$d_i = d_i - \left\lfloor \tfrac{1}{2}\left(s_i + s_{i+1}\right) + \tfrac{1}{2} \right\rfloor \tag{2}$$
Update:
$$s_i = s_i + \left\lfloor \tfrac{1}{4}\left(d_{i-1} + d_i\right) + \tfrac{1}{2} \right\rfloor \tag{3}$$
Backward LWT:
Inverse update:
$$s_i = s_i - \left\lfloor \tfrac{1}{4}\left(d_{i-1} + d_i\right) + \tfrac{1}{2} \right\rfloor \tag{4}$$
Inverse prediction:
$$d_i = d_i + \left\lfloor \tfrac{1}{2}\left(s_i + s_{i+1}\right) + \tfrac{1}{2} \right\rfloor \tag{5}$$
Merge:
$$x_{2i} = s_i, \qquad x_{2i+1} = d_i \tag{6}$$
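The lifting steps above can be sketched in Python to show why the transform is exactly invertible on integers: every floor term subtracted in the forward pass is recomputed from unchanged values and added back in the backward pass. This is a minimal sketch, not the author's implementation. The function names are hypothetical, and two details the text does not specify are assumptions: the signal length is even, and the borders use symmetric extension (s[n] := s[n-1], d[-1] := d[0]).

```python
def lwt53_forward(x):
    """One level of the integer CDF(2,2) lifting transform, eqs. (1)-(3).

    Assumptions (not stated in the paper): len(x) is even, and borders use
    symmetric extension, i.e. s[n] := s[n-1] and d[-1] := d[0].
    Returns (s, d): approximation and detail coefficients, all integers.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = list(x[0::2])  # split, eq. (1): even-indexed samples
    d = list(x[1::2])  # split, eq. (1): odd-indexed samples
    n = len(s)
    # Prediction, eq. (2): floor((s_i + s_{i+1})/2 + 1/2) == (s_i + s_{i+1} + 1) // 2
    for i in range(n):
        s_next = s[i + 1] if i + 1 < n else s[i]
        d[i] -= (s[i] + s_next + 1) // 2
    # Update, eq. (3): floor((d_{i-1} + d_i)/4 + 1/2) == (d_{i-1} + d_i + 2) // 4
    for i in range(n):
        d_prev = d[i - 1] if i > 0 else d[i]
        s[i] += (d_prev + d[i] + 2) // 4
    return s, d


def lwt53_inverse(s, d):
    """Backward transform, eqs. (4)-(6): undo the lifting steps in reverse order."""
    s, d = list(s), list(d)
    n = len(s)
    # Inverse update, eq. (4): d is unchanged here, so the same floor terms are removed
    for i in range(n):
        d_prev = d[i - 1] if i > 0 else d[i]
        s[i] -= (d_prev + d[i] + 2) // 4
    # Inverse prediction, eq. (5): s now holds the original even samples again
    for i in range(n):
        s_next = s[i + 1] if i + 1 < n else s[i]
        d[i] += (s[i] + s_next + 1) // 2
    # Merge, eq. (6): interleave even and odd samples
    x = [0] * (2 * n)
    x[0::2] = s
    x[1::2] = d
    return x
```

For example, `lwt53_forward([5] * 8)` returns `([5, 5, 5, 5], [0, 0, 0, 0])`: a constant signal has no detail energy. `lwt53_inverse(*lwt53_forward(x))` returns `x` for any even-length integer list. Python's `//` floors toward negative infinity, which matches the ⌊·⌋ in the equations for negative values too.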
2.2 Companding Techniques
The companding technique can be applied in reversible data hiding if the following relationship is satisfied:
$$E(C(x)) = x \tag{7}$$
where x is the signal, and C(·) and E(·) are the compression and expansion functions, respectively. The simplest form of this technique used for data hiding is the following [20]:
1. Compression: apply the compression function to the original signal x, such that $x' = C(x) = 2x$.
2. Embedding: insert the secret bit b into the LSB of x', such that $x'' = 2x + b$.
3. Extraction: extract the secret bit b from the LSB of the received x'', i.e., $b = x'' \bmod 2$.
4. Expansion: apply the expansion function E to restore the signal x, such that $x = (x'' - b)/2$.
2.3 Fredkin Gate
The Fredkin gate is a reversible logic gate. In its simplest form it has three inputs and three outputs, and one input controls the outputs of the other two. It acts as a controlled exchange (swap) gate: if a, b, and c are the inputs, where c is the control input, the Fredkin gate maps the inputs to the outputs as follows [21, 22]:
$$F(a, b, c) = (a', b', c') = \begin{cases} (a,\ b,\ c), & \text{if } c = 0 \\ (b,\ a,\ c), & \text{if } c = 1 \end{cases} \tag{8}$$
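The following minimal Python sketch implements eq. (8) and uses it to jumble two equal-length bit streams under the binary representation of a decimal key k. The function names are hypothetical. One detail is an assumption: the key bits are reused cyclically when the streams are longer than the key, because the paper does not say how a roughly 50-bit key covers longer streams. The gate only keeps or swaps its inputs, so applying it twice with the same control bit restores them. This is what makes the jumbling reversible.

```python
def fredkin(a, b, c):
    """Fredkin (controlled-swap) gate, eq. (8): swap a and b iff c == 1."""
    return (b, a, c) if c == 1 else (a, b, c)


def key_bits(k):
    """Binary representation of the decimal key k, most significant bit first."""
    return [int(ch) for ch in bin(k)[2:]]


def jumble(x_bits, y_bits, k):
    """Pass each bit pair (x_i, y_i) through a Fredkin gate controlled by a key bit.

    Assumption: key bits are reused cyclically for streams longer than the key.
    Calling jumble again on the outputs with the same key undoes the jumbling.
    """
    if len(x_bits) != len(y_bits):
        raise ValueError("bit streams must have equal length")
    kb = key_bits(k)
    out_x, out_y = [], []
    for i, (a, b) in enumerate(zip(x_bits, y_bits)):
        a2, b2, _ = fredkin(a, b, kb[i % len(kb)])
        out_x.append(a2)
        out_y.append(b2)
    return out_x, out_y
```

Usage: with `lsbs` and `cmap` as the original LSBs and the compressed location map, the sender computes `j_lsbs, j_map = jumble(lsbs, cmap, 829419002466184)`. The recipient recovers both streams with `jumble(j_lsbs, j_map, 829419002466184)`.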
2.4 The Proposed Method
At the sender side (see Fig. 1), the LWT was applied to the cover audio, and the secret bits were hidden in the detail components using the companding technique. Companding would cause overflow/underflow on some detail coefficients. To avoid this, the proposed method hid secret bits only in the small-magnitude detail coefficients, i.e., those satisfying –T ≤ coeff. ≤ T for a threshold T. Here, the adopted threshold was T = 511. This condition necessitated a location map that records whether each coefficient satisfies the condition (location map bit = 1) or not (location map bit = 0). Coefficients that satisfy the condition, and coefficients that do not, are often located close to each other. As a result, the location map has long runs of consecutive 1s and 0s interrupted by shorter runs of 0s or 1s. The current work therefore proposed a lossless compression method (see Algorithm 1) to compress this location map. The proposed compression method encodes each run of eight consecutive 0s as two 0s and each run of eight consecutive 1s as two 1s. It encodes each individual 0 as 0 followed by 1, and each individual 1 as 1 followed by 0.
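The embedding rule and the location-map compression described above can be sketched in Python as follows. This is a minimal illustration, not the author's code. All function names are hypothetical, and two points are assumptions. First, unused capacity is padded with 0s so that every flagged coefficient is expanded. Second, `compress_map` is one plausible greedy reading of Algorithm 1, which is not reproduced in this text. The location map is built from the original coefficients; after expansion a coefficient's magnitude changes, so the recipient cannot rebuild the map and it must be transmitted.

```python
T = 511  # threshold from Sect. 2.4: only coefficients with -T <= c <= T carry bits


def location_map(detail, t=T):
    """1 where a detail coefficient may be companded, 0 otherwise."""
    return [1 if -t <= c <= t else 0 for c in detail]


def companding_embed(detail, bits, t=T):
    """Embed one bit per eligible coefficient as c'' = 2c + b (steps 1-2, Sect. 2.2).

    Assumption: if fewer secret bits than eligible coefficients are given,
    the rest are padded with 0s so every flagged coefficient is expanded.
    """
    lmap = location_map(detail, t)
    capacity = sum(lmap)
    if len(bits) > capacity:
        raise ValueError(f"payload exceeds capacity of {capacity} bits")
    padded = iter(list(bits) + [0] * (capacity - len(bits)))
    stego = [2 * c + next(padded) if flag else c for c, flag in zip(detail, lmap)]
    return stego, lmap


def companding_extract(stego, lmap):
    """Steps 3-4: b = c'' mod 2 and c = (c'' - b) / 2 for every flagged coefficient."""
    bits, restored = [], []
    for c, flag in zip(stego, lmap):
        if flag:
            b = c % 2  # Python's % is non-negative, so this also works for c'' < 0
            bits.append(b)
            restored.append((c - b) // 2)
        else:
            restored.append(c)
    return bits, restored


def compress_map(bits):
    """Hypothetical greedy reading of Algorithm 1: eight equal bits -> 00 or 11;
    any other single bit b -> b followed by 1 - b (i.e. 0 -> 01, 1 -> 10)."""
    out, i = [], 0
    while i < len(bits):
        run = bits[i:i + 8]
        if len(run) == 8 and len(set(run)) == 1:
            out += [run[0], run[0]]
            i += 8
        else:
            out += [bits[i], 1 - bits[i]]
            i += 1
    return out


def decompress_map(code):
    """Read 2-bit codes: 00 -> eight 0s, 11 -> eight 1s, 01 -> 0, 10 -> 1."""
    out = []
    for i in range(0, len(code), 2):
        a, b = code[i], code[i + 1]
        out += [a] * 8 if a == b else [a]
    return out
```

Every code word is two bits long. A run of eight equal bits therefore shrinks fourfold, while an isolated bit doubles in length. The ratio this scheme achieves depends on how long the runs in the location map are.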
To add a further level of security while retaining the reversibility of the proposed method, the original LSBs of the approximation coefficients were set aside. They were fed into the Fredkin gate together with the compressed location map bits, under the control of a long secret key. The secret key was the binary representation of a decimal integer k (k₁₀) with 15 digits. The jumbled LSBs produced by the Fredkin gate were sent to the recipient through another covert channel. The jumbled compressed location map was embedded in the high-magnitude approximation coefficients by replacing their three LSBs, under the condition coeff. ≤ –512 or coeff. ≥ 512. Finally, the inverse LWT was applied to obtain the stego-audio.
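The 3-LSB replacement on the high-magnitude approximation coefficients can be sketched as below. Several details are assumptions not stated in the paper. Bits are written into the magnitude |c| and the sign is kept. The final group of three is zero-padded. The recipient knows the number of embedded bits. Because 512 is a multiple of 8, clearing the three LSBs of any |c| ≥ 512 leaves |c| ≥ 512, so the recipient can locate the same carrier coefficients in the stego signal. The function names are hypothetical.

```python
LIMIT = 512  # carriers: approximation coefficients with c <= -512 or c >= 512


def _set_low3(c, group):
    """Write 3 bits (most significant first) into the 3 LSBs of |c|, keeping c's sign."""
    mag = (abs(c) & ~7) | (group[0] << 2) | (group[1] << 1) | group[2]
    return mag if c > 0 else -mag


def embed_3lsb(approx, bits, limit=LIMIT):
    """Return (stego coefficients, original 3-LSB groups that were overwritten)."""
    out, saved, pos = list(approx), [], 0
    for i, c in enumerate(out):
        if pos >= len(bits):
            break
        if abs(c) >= limit:
            saved += [(abs(c) >> s) & 1 for s in (2, 1, 0)]  # keep original LSBs
            group = (list(bits[pos:pos + 3]) + [0, 0, 0])[:3]  # zero-pad last group
            out[i] = _set_low3(c, group)
            pos += 3
    if pos < len(bits):
        raise ValueError("not enough coefficients with |c| >= limit")
    return out, saved


def extract_3lsb(stego, n_bits, limit=LIMIT):
    """Read back the first n_bits hidden bits from the carrier coefficients."""
    bits = []
    for c in stego:
        if len(bits) >= n_bits:
            break
        if abs(c) >= limit:
            bits += [(abs(c) >> s) & 1 for s in (2, 1, 0)]
    return bits[:n_bits]


def restore_3lsb(stego, saved, limit=LIMIT):
    """Write the saved original LSBs back, recovering the original coefficients."""
    out, pos = list(stego), 0
    for i, c in enumerate(out):
        if pos >= len(saved):
            break
        if abs(c) >= limit:
            out[i] = _set_low3(c, saved[pos:pos + 3])
            pos += 3
    return out
```

In the proposed method, `saved` corresponds to the original LSBs that are jumbled with the compressed location map and sent over the separate channel. `restore_3lsb` corresponds to the recipient's final LSB replacement.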
[Block diagram: original audio → LWT → detail component (secret data hidden using the companding technique according to the location map; the location map is compressed) and approximation component (original LSBs and compressed location map jumbled by the Fredkin gate under the secret key k₁₀; scrambled compressed location map hidden by LSB replacement in the 3 LSBs of the original coefficients; scrambled LSBs sent to the recipient) → modified coefficients → ILWT → stego audio]
Fig. 1. The proposed method at sender side
At the recipient side (see Fig. 2), the LWT was applied to the received stego-audio. The three LSBs were extracted from the approximation coefficients and fed into the Fredkin gate together with the received LSBs, under the control of the secret key. The first output, the original LSBs, was written back into the three LSBs of those approximation coefficients using LSB replacement. The other output, the compressed location map, was decompressed and used to extract the data from the detail coefficients using the companding technique. Then, the inverse LWT was applied to retrieve an identical copy of the original audio.
[Block diagram: received audio → LWT → approximation component (3 LSBs extracted from the coefficients and, together with the received LSBs, passed through the Fredkin gate under the secret key k₁₀; original LSBs restored by LSB replacement; compressed location map decompressed to the original location map) and detail component (data extracted using the companding technique according to the location map) → original coefficients → ILWT → original audio and extracted data]
Fig. 2. The proposed method at recipient side
2.5 Performance Evaluation Metrics
The most common metric used to evaluate data hiding is the Signal-to-Noise Ratio (SNR), which measures the amount of noise in the signal. SNR can be calculated as follows:
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{L} x(i)^2}{\mathrm{MSE}} \tag{9}$$
where x, x′ and L are the original signal, the stego signal, and the length of the signal, respectively. MSE, referred to here as the Mean Square Error, is the accumulated squared difference between the original and stego signals [23]:
$$\mathrm{MSE} = \sum_{i=1}^{L} \left( x(i) - x'(i) \right)^2 \tag{10}$$
An SNR below 20 dB indicates a noisy signal, while an SNR of 30 dB or above denotes a good-quality signal. The Segmental Signal-to-Noise Ratio (SegSNR) is the average of the SNR values of all stego-audio segments (frames) [24]:
$$\mathrm{SegSNR} = \frac{10}{N} \sum_{i=1}^{N} \log_{10} \frac{\sum_{j=1}^{r} x(j)^2}{\sum_{j=1}^{r} \left( x(j) - x'(j) \right)^2} \tag{11}$$
where N is the number of segments of the signal, r is the number of samples in each segment, and the inner sums run over the r samples of the i-th segment. The fidelity of the retrieved secret data is measured by the bit error rate (BER) [24]:
$$\mathrm{BER} = \frac{\text{number of falsely retrieved bits}}{\text{total number of secret bits}} \tag{12}$$
The amount of compression is measured by the compression ratio [25]:
$$\text{compression ratio} = \frac{\text{length of uncompressed data}}{\text{length of compressed data}} \tag{13}$$
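The SNR, SegSNR, BER and compression-ratio formulas above can be sketched with plain Python lists. Several choices are assumptions, because the paper does not say how such cases are handled. Following eq. (10), the "MSE" term is the summed (not averaged) squared error. Segments are non-overlapping blocks of r samples, and a trailing partial segment is dropped. Segments with zero error or zero energy are skipped in SegSNR. The function names are hypothetical.

```python
import math


def snr_db(x, y):
    """SNR per eq. (9), with the paper's 'MSE' taken as the summed squared error."""
    signal = sum(v * v for v in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def segsnr_db(x, y, r):
    """Average of per-segment SNRs (in dB) over non-overlapping segments of r samples."""
    values = []
    for start in range(0, len(x) - r + 1, r):
        xs, ys = x[start:start + r], y[start:start + r]
        signal = sum(v * v for v in xs)
        noise = sum((a - b) ** 2 for a, b in zip(xs, ys))
        if signal > 0 and noise > 0:  # assumption: undefined segments are skipped
            values.append(10 * math.log10(signal / noise))
    return sum(values) / len(values) if values else math.inf


def ber(sent, received):
    """Fraction of secret bits that were retrieved incorrectly."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)


def compression_ratio(uncompressed_len, compressed_len):
    """Length of uncompressed data divided by length of compressed data."""
    return uncompressed_len / compressed_len
```

For example, `round(compression_ratio(36560, 31246), 4)` gives 1.1701, the Handel row of Table 3. A restored signal identical to the original gives `snr_db(x, x) == math.inf`, consistent with MSE = 0.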
3 Results and Discussion
Several mono WAV files with 16 bits/sample and different sampling rates (44.1 kHz, 8.192 kHz, and 22.05 kHz) were used to evaluate the proposed method at maximum embedding capacity (the secret data, a random string of 0s and 1s, plus the location map). The SNR, SegSNR, and embedding capacity of the proposed method are listed in Table 1.
Table 1. Evaluation of the proposed method
| Audio file | No. of samples | Max. capacity (bits) | Capacity (bits/sample) | SegSNR (dB) | SNR (dB) |
|---|---|---|---|---|---|
| Chirp | 13,136 | 7,353 | 0.6 | 21.8357 | 39.6229 |
| Handel | 73,120 | 34,520 | 0.5 | 34.1767 | 41.8266 |
| Speech_dft | 110,040 | 85,998 | 0.8 | 28.9838 | 33.6742 |
| Ringtone | 190,128 | 71,615 | 0.4 | 26.5492 | 46.0582 |
| Buzyphone | 98,344 | 58,038 | 0.6 | 24.4470 | 39.8561 |
The proposed method preserved good audio quality (see SNR and SegSNR in Table 1) with good payloads. The reason is that it made larger changes to the less significant coefficients and smaller changes to the more significant ones. That is, LSB replacement was applied to the most significant coefficients within the approximation components. The companding technique, which can cause considerable audio degradation, was applied only to the less significant coefficients within the detail components. This reasoning may also explain why the quality depended on the threshold T (see Fig. 3): the quality decreased as T increased.
Fig. 3. The effect of threshold on audio quality, SNR increased as T decreased.
Fig. 4 presents the original, stego, and restored signals of “Speech_dft.wav” in both the time and frequency domains, and indicates low degradation in audio quality with the proposed method. Table 2 compares hiding with the companding technique proposed in this work against hiding with prediction error expansion as proposed in [12] and [13]. The values for [12] and [13] are those reported in their experiments on their own test audio files (16-bit mono WAV files), so the test files differ between the methods. For the proposed method, the table gives the mean SNR, SegSNR, and capacity over the files in Table 1.
Fig. 4. “Speech_DFT.wav” signal in the frequency and time domains. (a) original audio, (b) stego audio, (c) restored audio
Table 2. Proposed method compared to [12] and [13] according to their experiments and their tested audio files (“/” = not reported)
| Metric | Proposed work | [12] | [13] |
|---|---|---|---|
| SNR (dB) | 40.2076 | / | 23.36 |
| SegSNR (dB) | 27.1985 | 18.3464 | / |
| ER (embedding rate, bits/sample) | 0.58 | 0.95 | 0.99 |
The mean SNR and SegSNR of the proposed method are higher than the values reported in [12] and [13]. Its mean embedding rate of 0.58 bits/sample is lower than their 0.95 and 0.99 bits/sample. The payload of the companding approach increases as the threshold T increases, but at the cost of audio quality (see Fig. 3). For all tested signals, the hidden data were retrieved without any loss (BER = 0). The cover audio was also completely recovered to its original form after the hidden data were extracted (MSE = 0). The difference between the original and restored signals of “Speech_dft.wav” in both the time and frequency domains is shown in Fig. 5.
Fig. 5. The difference between the original and restored signals in the time and frequency domains (MSE = 0)
The proposed compression method reduced the number of embedded location map bits for every tested file and achieved acceptable compression ratios, ranging from 1.0474 (Speech_dft) to 2.1104 (Ringtone) (see Table 3). This is because most of the coefficients that satisfy the hiding condition are located close to each other.
Table 3. The compression ratio of the proposed compression method
| Audio file | Location map length (bits) | Compressed location map (bits) | Compression ratio |
|---|---|---|---|
| Chirp | 6,568 | 3,910 | 1.6798 |
| Handel | 36,560 | 31,246 | 1.1701 |
| Speech_dft | 55,020 | 52,528 | 1.0474 |
| Ringtone | 95,064 | 45,046 | 2.1104 |
| Buzyphone | 49,172 | 33,090 | 1.4860 |
The secret key was chosen to be long enough to make a brute-force attack impractical and thus preserve the security of the proposed method. Key sensitivity was tested on one of the audio files listed in Table 1 using a secret key differing by one bit. The key 829419002466184 was used to hide secret data in “handel.wav”, whose original location map is 36,560 bits long. After compression, the location map was reduced to 31,246 bits, which were hidden in the audio using that key. Changing one bit of the key changed the retrieved compressed location map; consequently, the decompressed location map was completely different. Using the key 829419002466185 instead of 829419002466184, the BER of the retrieved compressed location map was 0.0047 (147 false bits out of 31,246). These false bits produced a decompressed location map of 37,001 bits instead of 36,560 bits, completely different from the actual location map. An incorrect secret key, even one differing by a single bit, therefore stopped the proposed method and notified the recipient to use the correct key. If the decompressed location map happened to have the original length, the method would instead retrieve wrong secret data and a wrong cover.
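The key-sensitivity test can be checked with plain integer arithmetic. The sketch below uses only Python built-ins. It confirms that the two keys in the test differ in exactly one bit of their binary representations, and it shows the raw size of the key. A 15-digit decimal key is 50 bits long, and there are 9 × 10^14 such integers, roughly 2^49.7. Whether that is large enough against brute force depends on the attacker's resources; the numbers below are only the raw count.

```python
import math

k_used = 829419002466184   # key used to hide data in "handel.wav"
k_wrong = 829419002466185  # key used in the one-bit-difference test

# Hamming distance between the two keys' binary representations
hamming = bin(k_used ^ k_wrong).count("1")

# Length of the binary representation that controls the Fredkin gates
key_len = k_used.bit_length()

# Number of 15-digit decimal integers (10**14 .. 10**15 - 1), expressed in bits
key_space_bits = math.log2(10**15 - 10**14)

print(hamming, key_len, round(key_space_bits, 2))  # 1 50 49.68
```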
4 Conclusions
The companding technique can be used to achieve lossless data hiding with a negligible effect on audio quality when it is applied to the less significant coefficients of the detail components of the LWT. The experiments also showed that the proposed compression method for the location map achieved acceptable compression ratios. Furthermore, jumbling the compressed location map with the original LSBs using a Fredkin gate prior to embedding helped preserve reversibility and increase the security level of the proposed method. Future work could apply the companding technique to video steganography.
References
1. Shivaram, H., Acharya, D.U., Adige, R.: Wavelet transform based steganography technique to hide audio signals in image. Procedia Comput. Sci. 47, 272–281 (2015)
2. Muhammad, N., Bibi, N., Mahmood, Z., Akram, T., Naqvi, S.R.: Reversible integer wavelet transform for blind image hiding method. PLoS ONE 12(5), 1–7 (2017)
3. Tian, J.: Reversible watermarking by difference expansion. In: Proceedings of Multimedia and Security Workshop, pp. 19–22. ACM Multimedia, Juan-Les-Pins, France (2002)
4. Tian, J.: Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 13(8), 890–896 (2003)
5. Alattar, A.M.: Reversible watermarking using the difference expansion of a generalized integer transform. IEEE Trans. Image Process. 13(8), 1147–1156 (2004)
6. Maity, H.K., Maity, S.P.: Intelligent modified difference expansion for reversible watermarking. Int. J. Multimedia Appl. 4(4), 83–95 (2012)
7. Coltuc, D., Chassery, J.: Very fast watermarking by reversible contrast mapping. IEEE Signal Process. Lett. 14(4), 255–258 (2007)
8. Hu, X., Zhang, W., Li, X., Yu, N.: Minimum rate prediction and optimized histograms modification for reversible data hiding. IEEE Trans. Inf. Forensics Secur. 10(3), 653–664 (2015)
9. Li, X., Zhang, W., Gui, X., Yang, B.: Efficient reversible data hiding based on multiple histograms modification. IEEE Trans. Inf. Forensics Secur. 10(9), 2016–2027 (2015)
10. Xuan, G., Yang, C., Zhen, Y., Shi, Y.Q., Ni, Z.: Reversible data hiding using integer wavelet transform and companding technique. In: Proceedings of the Third International Workshop on Digital Watermarking, pp. 115–124, Seoul, South Korea (2004)
11. Weng, S., Zhao, Y., Pan, J., Ni, R.: A novel reversible watermarking based on an integer transform. In: Proceedings of IEEE International Conference on Image Processing, pp. 241–244, San Antonio, Texas, USA (2007)
12. Yan, D., Wang, R.: Reversible data hiding for audio based on prediction error expansion. In: Proceedings of International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 249–252, Harbin, China (2008)
13. Wang, F., Xie, Z., Chen, Z.: High capacity reversible watermarking for audio by histogram shifting and predicted error expansion. Sci. World J. 2014, 1–7 (2014)
14. Nishimura, A.: Reversible audio data hiding based on variable error expansion of linear prediction for segmental audio and G.711 speech. IEICE Trans. Inf. Syst. E99-D(1), 83–91 (2016)
15. Huang, X., Ono, N., Nishimura, A., Echizen, I.: Reversible audio information hiding for tampering detection and localization using sample scanning method. J. Inf. Process. 25, 469–476 (2017)
16. Hemalatha, S., Acharya, U.D., Renuka, A., Deepthi, S., Jyothi, U.K.: Audio steganography in discrete wavelet transform domain. Int. J. Appl. Eng. Res. 10(16), 36639–36644 (2015)
17. Deepthi, S., Renuka, A., Hemalatha, S.: Data hiding in audio signals using wavelet transform with enhanced security. Comput. Sci. Inf. Technol. (CS & IT) 3(9), 137–146 (2013)
18. Lei, B., Soon, I.Y., Zhou, F., Li, Z., Lei, H.: A robust audio watermarking scheme based on lifting wavelet transform and singular value decomposition. Signal Process. 92(9), 1985–2001 (2012)
19. Weng, S., Zhao, Y., Pan, J., Ni, R.: Reversible data hiding using the companding technique and improved DE method. Circuits Syst. Signal Process. 27(2), 229–245 (2008)
20. Memon, N.A.: A novel reversible watermarking method based on adaptive thresholding and companding technique. World Acad. Sci. Eng. Technol. Int. J. Comput. Inf. Eng. 5(7), 525–529 (2011)
21. Lee, J., Huang, X., Zhu, Q.: Decomposing Fredkin gate into simple reversible elements with memory. Int. J. Digital Content Technol. Appl. 4(5), 153–158 (2010)
22. Al-Shafi, M.A.: Analysis of Fredkin logic circuit in nanotechnology: an efficient approach. Int. J. Hybrid Inf. Technol. 9(2), 371–380 (2016)
23. Alyousuf, F.Q.A., Din, R., Qasim, A.J.: Analysis review on spatial and transform domain technique in digital steganography. Bull. Electr. Eng. Inf. 9(2), 573–581 (2020)
24. Naidu, T.R.K., Kumar, G.P., Prasad, T.G.: Overview of digital audio steganography techniques. Int. J. Emerg. Technol. Eng. (IJETE) 3(7), 62–66 (2016)
25. Hameed, M.E., Ibrahim, M.M., Manap, N.A., Mohammed, A.A.: An enhanced lossless compression with cryptography hybrid mechanism for ECG biomedical signal monitoring. Int. J. Electr. Comput. Eng. 10(3), 3235–3243 (2020)
Smart Traffic Light System Design Based on Single Shot MultiBox Detector (SSD) and Anylogic Simulation
E. R. Salim, A. B. Pantjawati(B), D. Kuswardhana, A. Saripudin, N. D. Jayanto, Nurhidayatulloh, and L. A. Pratama
Department of Electrical Engineering Education, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudhi 207, Bandung 40154, Indonesia
Abstract. Traffic lights, which can optimize vehicle flow rates, relieve congestion, and reduce accidents, are found on roads throughout cities. At some intersections, the traffic light phase durations are still set manually without taking the number of vehicles into account, resulting in longer vehicle queues. This paper proposes a smart traffic light system that addresses this problem. The system is divided into two main parts: object detection and determination of the traffic light duration. For object detection, the system uses computer vision through the Single Shot MultiBox Detector (SSD) algorithm to detect the number and average speed of passing vehicles. These data then serve as input to an Anylogic simulation that determines the optimal green light duration. After obtaining the optimal duration, the system simulates the vehicle flow rate. In the trial stage, the system gave good results: the average number of vehicles increased by about 156, and the average travel time was about 6 s shorter.
Keywords: Anylogic · Single board computer · Smart traffic light · Single Shot MultiBox Detector
1 Introduction
Traffic jams have become commonplace in big cities. Congestion usually occurs at particular times, especially during peak hours, and the volume of vehicles entering a road is difficult to predict [1]. Many traffic lights also still run on fixed timers set in advance, which ignore vehicle density [2]. A smart traffic light system is therefore needed. Such a system sets the traffic light timer according to the number of queuing vehicles [3], which can optimize the traffic light duration [4]. The system also uses machine learning to provide automatic traffic engineering based on camera input analysed by image processing [5].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1075–1085, 2021. https://doi.org/10.1007/978-3-030-70713-2_96
Traffic congestion has become so serious that many researchers now study transportation systems [3]. One line of work predicts traffic flow by monitoring traffic activity to detect congestion [6]. This requires better vehicle detection for traffic monitoring [7]. Various computer vision technologies for intelligent transportation systems have been developed with different algorithms. Several authors have built congestion detection systems that use background subtraction to calculate traffic flow over a given period [6, 8]. Others determine the number of vehicles with blob detection [5, 9]. These studies have three gaps: they do not use highly accurate detection algorithms, they are not integrated into a system that also measures speed, and they do not count vehicles to obtain traffic density [10]. Other studies use AnyLogic to calculate vehicle flow, but not from vehicle counts obtained by image processing [15].

In this work, the main hardware components are an NVIDIA Jetson Nano single board computer (SBC), a camera, a Wi-Fi module, and a liquid crystal display (LCD). The Single Shot MultiBox Detector (SSD) algorithm detects vehicles, estimates their speed, and counts them over a given time span. These counts give the vehicle density, which is used to make the traffic light duration more efficient [11]. This study also uses AnyLogic simulation software to simulate traffic monitoring [12]. For the trial, we chose the Dago intersection in Bandung, Indonesia, which has relatively heavy traffic.
2 Methods
The system was developed with an experimental method in two stages: design and testing. The design stage defines the vehicle detection model. The hardware is the NVIDIA Jetson Nano, and the software uses the Single Shot MultiBox Detector (SSD) algorithm for detection [13]. The SSD algorithm is shown in Fig. 1 [13]. Two-stage detectors first propose candidate regions and then classify them. SSD instead applies a single neural network to the entire image [4]. The network divides the image into grids of cells at several scales and predicts bounding boxes for a set of default boxes at each cell. For each box it computes class probabilities, which decide whether the box contains an object [2].
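SSD predicts many overlapping boxes for the same vehicle. The standard SSD pipeline keeps only the most confident of these with non-maximum suppression (NMS). The paper does not describe its post-processing, so what follows is a minimal sketch of generic greedy NMS in plain Python. It assumes boxes are `(x_min, y_min, x_max, y_max)` pixel tuples. The function names and the 0.5 score and 0.45 IoU thresholds are illustrative assumptions, not values taken from the authors' system.

```python
from typing import List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union (Jaccard overlap) of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        score_threshold: float = 0.5,
                        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    candidates = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                        key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two detections of the same car plus one detection of another car.
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))  # [0, 2]
```

The two boxes around the first car overlap with an IoU of about 0.82, so only the more confident one survives. The kept boxes are what a counting stage would then consume.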
Fig. 1. SSD algorithm
After the SSD detects a vehicle, the system counts the vehicles crossing the test area. The data obtained are processed with AnyLogic simulation to determine the most efficient green-light duration at the intersection [12, 13]. The next stage is the trial. It was carried out at the Dago intersection with 30-min recordings at several times of day: morning, afternoon, and evening [4]. Computer vision was then used on these recordings to calculate the number and average speed of vehicles. Finally, AnyLogic simulation processed these data to obtain the optimal green-light duration [16]. The process is shown in the flowchart in Fig. 2.
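The paper does not say how crossings and speeds are derived from the SSD detections. One common approach is to track each vehicle's box centroid across frames and to place two virtual lines a known road distance apart, which forms a simple speed trap. The sketch below follows that approach. It assumes a tracker (not shown, and not described in the source) gives each vehicle a stable ID. The line positions, the 10 m distance in the example, the downward direction of travel in the image, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tracker output: vehicle ID -> time-ordered (time_s, centroid_y_px) samples.
Track = List[Tuple[float, float]]


def crossing_time(track: Track, line_y: float) -> Optional[float]:
    """First time the centroid crosses line_y moving down the image (linear interpolation)."""
    for (t0, y0), (t1, y1) in zip(track, track[1:]):
        if y0 < line_y <= y1:
            return t0 + (line_y - y0) / (y1 - y0) * (t1 - t0)
    return None


def count_and_mean_speed(tracks: Dict[int, Track], line_a_y: float,
                         line_b_y: float, distance_m: float) -> Tuple[int, float]:
    """Count vehicles that cross both lines and return (count, mean speed in km/h).

    line_a_y and line_b_y are image rows of two virtual lines that are
    distance_m metres apart on the road (a simple speed trap).
    """
    speeds_kmh: List[float] = []
    for track in tracks.values():
        t_a = crossing_time(track, line_a_y)
        t_b = crossing_time(track, line_b_y)
        if t_a is not None and t_b is not None and t_b > t_a:
            speeds_kmh.append(distance_m / (t_b - t_a) * 3.6)  # m/s -> km/h
    mean = sum(speeds_kmh) / len(speeds_kmh) if speeds_kmh else 0.0
    return len(speeds_kmh), mean


if __name__ == "__main__":
    tracks = {
        1: [(0.0, 100.0), (1.0, 300.0), (2.0, 500.0)],
        2: [(0.0, 100.0), (2.0, 500.0)],
        3: [(0.0, 50.0), (1.0, 80.0)],  # never reaches the lines
    }
    print(count_and_mean_speed(tracks, 200.0, 400.0, distance_m=10.0))  # (2, 36.0)
```

Interpolating between frames keeps the timing from being quantized to the frame rate. The distance between the lines must be calibrated on site, because pixel rows do not map linearly to road distance under perspective.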
Fig. 2. System flowchart
3 Results and Discussion
3.1 Design
The tool is built around an NVIDIA Jetson Nano single board computer (SBC) fitted with a camera. The SSD object detection program runs on the Jetson Nano [14]: the camera captures objects (vehicles), and the Jetson Nano detects and counts them. The Jetson Nano delivers 472 giga floating-point operations per second (GFLOPS) because it processes the camera images on a graphics processing unit (GPU). A 12 V, 5 A power supply is used so that the board runs at full performance. The hardware is shown in Fig. 3.
Fig. 3. Vehicle detection hardware
3.2 Trial
The trial was carried out at the Dago intersection by recording three half-hour videos. The recordings were made in the morning, afternoon, and evening to capture differences in vehicle density. The data collection is shown in Fig. 4 and Fig. 5.
Fig. 4. Evening data collection
Fig. 5. Morning and afternoon data collection
3.3 Data Processing
Data processing was carried out after the number of vehicles and their average speed had been obtained for each period. AnyLogic simulation software was used to determine the optimal traffic light duration [10]. For the morning, afternoon, and evening data, the green-light duration of each road was allowed to range from 35 to 120 s.
Morning: The morning simulation results are shown in Fig. 6. P1, P2, P3, and P4 denote the roads entering the intersection. The best green-light durations for these roads were 35, 110, 40, and 45 s, respectively, as shown in Fig. 7. The best value is the green-light duration that AnyLogic returns on the basis of the number of vehicles entering the intersection [17].
Fig. 6. Morning simulation results
Fig. 7. Optimization results of the morning traffic light duration
Besides determining the optimal green-light duration, this stage also simulated the impact of the change in traffic light duration [4], as shown in Fig. 8 and Fig. 9. Before optimization, the simulation recorded 2,205 vehicles with an average travel time of 129 s. After optimization, it recorded 2,394 vehicles with an average travel time of 122 s.
Afternoon: The afternoon simulation results are shown in Fig. 10. The best green-light durations for the four road segments were 100, 110, 75, and 120 s, as shown in Fig. 11. In the impact simulation, 2,168 vehicles were recorded before optimization, with an average travel time of 128 s. After optimization, 2,291 vehicles were recorded, with an average travel time of 123 s, as shown in Fig. 12 and Fig. 13.
Fig. 8. Morning data before optimization
Fig. 9. Morning data after optimization
Fig. 10. Afternoon simulation results
Fig. 11. Optimization results of the afternoon traffic light duration
Fig. 12. Afternoon data before optimization
Fig. 13. Afternoon data after optimization
Evening: The evening simulation results are shown in Fig. 14. The best green-light durations for the four road segments were 60, 115, 45, and 45 s, as shown in Fig. 15.
Fig. 14. Evening simulation results
Fig. 15. Optimization results of the evening traffic light duration
The simulations before and after optimization are shown in Fig. 16 and Fig. 17. Before optimization, the simulation recorded 2,247 vehicles with an average travel time of 122 s. After optimization, it recorded 2,404 vehicles with an average travel time of 117 s.
Fig. 16. Evening data before optimization
Fig. 17. Evening data after optimization
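Section 3.3 uses AnyLogic to choose each road's green-light duration within a 35–120 s range. AnyLogic normally does this with an optimization experiment that reruns the simulation model for different parameter values, and its internals are not shown here. To make the search idea concrete, the sketch below swaps in a much simpler objective: Webster's uniform-delay term for a fixed-time signal, searched exhaustively on a 5 s grid. Several parts are assumptions rather than the authors' method: the objective, the saturation flow of 1,800 veh/h, the 4 s lost time per phase, the grid step, and the function names. It is not the paper's model.

```python
import itertools
from typing import Callable, Sequence, Tuple

Greens = Tuple[int, ...]  # one green duration (s) per approach road


def uniform_delay(greens: Greens, flows_vph: Sequence[float],
                  saturation_vph: float = 1800.0, lost_s: float = 4.0) -> float:
    """Flow-weighted mean of Webster's uniform-delay term (s/veh), a stand-in objective.

    Assumes one signal phase per approach, so the cycle is C = sum(greens) plus
    lost_s per phase. For each approach, d1 = C * (1 - g/C)**2 / (2 * (1 - q/s)).
    Returns infinity if any approach is over capacity (q >= s * g / C).
    """
    cycle = sum(greens) + lost_s * len(greens)
    weighted = 0.0
    for g, q in zip(greens, flows_vph):
        green_ratio = g / cycle
        if q >= saturation_vph * green_ratio:
            return float("inf")
        d1 = cycle * (1 - green_ratio) ** 2 / (2 * (1 - q / saturation_vph))
        weighted += q * d1
    return weighted / sum(flows_vph)


def best_greens(objective: Callable[[Greens], float], n_roads: int,
                lo: int = 35, hi: int = 120, step: int = 5) -> Greens:
    """Exhaustive grid search for the green durations that minimize objective."""
    grid = range(lo, hi + 1, step)
    best = min(itertools.product(grid, repeat=n_roads), key=objective)
    if objective(best) == float("inf"):
        raise ValueError("no combination in the grid gives every approach enough capacity")
    return best


if __name__ == "__main__":
    flows = [300, 600, 200, 250]  # hypothetical approach flows (veh/h), not the paper's data
    print(best_greens(lambda g: uniform_delay(g, flows), n_roads=4))
```

The uniform-delay term alone rewards short cycles, so this surrogate tends to pick the shortest green times that still give every approach enough capacity. It illustrates the search loop, not the optimized values reported above. A simulation model such as the authors' AnyLogic model captures queue dynamics that this closed-form term ignores.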
3.4 Analysis
Table 1 compares the optimal green-light durations for each period, and the durations differ considerably between roads. In the morning and evening, the second road segment (P2) gets the longest green-light duration. In the afternoon, the fourth road segment (P4) gets the longest. These roads carry the greatest number of vehicles at the lowest average speed, so the optimization gives them a longer green-light duration to let the queue clear.
Table 1. The comparison of the optimal green traffic light duration
Time | Road segment | Number of vehicles | Optimal duration (s)
Morning | P1 | 201 | 35
Morning | P2 | 1224 | 110
Morning | P3 | 332 | 40
Morning | P4 | 469 | 45
Afternoon | P1 | 320 | 100
Afternoon | P2 | 465 | 110
Afternoon | P3 | 205 | 75
Afternoon | P4 | 1225 | 120
Evening | P1 | 524 | 60
Evening | P2 | 1225 | 115
Evening | P3 | 335 | 45
Evening | P4 | 205 | 45
At this stage, the data before and after optimization are also compared, and the accuracy of the object detection algorithm is analyzed. The comparison is shown in Table 2.
Table 2. The comparison of data optimization
Time | Before optimization: Number | Avg. travel time (s) | Conf. | Dev. | After optimization: Number | Avg. travel time (s) | Conf. | Dev.
Morning (7–8 AM) | 2205 | 129 | 6.478 | 155.21 | 2394 | 122 | 5.067 | 292.73
Afternoon (0–1 PM) | 2168 | 128 | 6.479 | 153.91 | 2291 | 123 | 5.639 | 137.707
Evening (5–6 PM) | 2247 | 122 | 6.229 | 150.639 | 2404 | 117 | 5.056 | 126.479
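The Conf. and Dev. columns of Table 2 are not defined in the text. They appear to be the mean confidence (half-width of a 95% confidence interval) and the standard deviation of travel time, as AnyLogic's statistics report them. Under that assumption, Conf. ≈ 1.96 · Dev / √Number. The short check below recomputes Conf. from the other two columns. It reproduces five of the six reported values to three decimals. The morning after-optimization row gives about 11.73 s instead of 5.067, so that Dev. entry may contain a transcription error. The function name is illustrative.

```python
import math

# (row label, Number, Conf., Dev.) as reported in Table 2
TABLE2_ROWS = [
    ("Morning, before", 2205, 6.478, 155.21),
    ("Morning, after", 2394, 5.067, 292.73),
    ("Afternoon, before", 2168, 6.479, 153.91),
    ("Afternoon, after", 2291, 5.639, 137.707),
    ("Evening, before", 2247, 6.229, 150.639),
    ("Evening, after", 2404, 5.056, 126.479),
]


def ci_half_width(std_dev: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval for a mean."""
    return z * std_dev / math.sqrt(n)


if __name__ == "__main__":
    for label, n, conf, dev in TABLE2_ROWS:
        print(f"{label:18s} reported {conf:7.3f}  recomputed {ci_half_width(dev, n):7.3f}")
```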
Table 2 shows that the average travel time improves when the optimal traffic light durations are used. In the morning, the number of vehicles increases by 189 and the average travel time is 7 s shorter. In the afternoon, the number of vehicles increases by 123 and the average travel time is 5 s shorter. In the evening, the number of vehicles increases by 157 and the average travel time is 5 s shorter.
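The per-period differences and their averages can be recomputed directly from Table 2. The averages match the figures given in the abstract: about 156 more vehicles and about 6 s shorter travel time. The variable names are illustrative.

```python
# (vehicles before, avg travel time before in s, vehicles after, avg travel time after in s)
TABLE2 = {
    "Morning": (2205, 129, 2394, 122),
    "Afternoon": (2168, 128, 2291, 123),
    "Evening": (2247, 122, 2404, 117),
}

# Per period: (extra vehicles, seconds saved in average travel time)
gains = {period: (after_n - before_n, before_t - after_t)
         for period, (before_n, before_t, after_n, after_t) in TABLE2.items()}
mean_extra_vehicles = sum(v for v, _ in gains.values()) / len(gains)
mean_seconds_saved = sum(s for _, s in gains.values()) / len(gains)

print(gains)  # {'Morning': (189, 7), 'Afternoon': (123, 5), 'Evening': (157, 5)}
print(round(mean_extra_vehicles, 1), round(mean_seconds_saved, 1))  # 156.3 5.7
```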
4 Conclusion
The computer-vision-based smart traffic light system and the AnyLogic simulation were developed successfully as planned, as the trials at the Dago intersection show. The system gave good results: on average, about 156 more vehicles passed through, and the average travel time was about 6 s shorter.
References
1. Razavi, M., Hamidkhani, M., Sadeghi, R.: Smart traffic light scheduling in smart city using image and video processing. In: Proceedings of the 3rd International Conference on Internet of Things and Applications, IoT 2019, pp. 1–4 (2019)
2. Lalangui, G., et al.: Framework comparison of neural networks for automated counting of vehicles and pedestrians. In: Communications in Computer and Information Science, vol. 1096 CCIS, pp. 16–28 (2019)
3. Khushi: Smart control of traffic light system using image processing. In: International Conference on Current Trends in Computer, Electrical, Electronics and Communication, CTCEEC 2017, pp. 99–103 (2018)
4. Diaz, N., Guerra, J., Nicola, J.: Smart traffic light control system. In: 2018 IEEE 3rd Ecuador Technical Chapters Meeting, ETCM 2018 (2018)
5. Cao, Y., Lei, Z., Huang, X., Zhang, Z., Zhong, T.: A vehicle detection algorithm based on compressive sensing and background subtraction. AASRI Procedia 1, 480–485 (2012)
6. Trnovszký, T., Sýkora, P., Hudec, R.: Comparison of background subtraction methods on near infra-red spectrum video sequences. Procedia Eng. 192, 887–892 (2017)
7. Arinaldi, A., Pradana, J.A., Gurusinga, A.A.: Detection and classification of vehicles for traffic video analytics. Procedia Comput. Sci. 144, 259–268 (2018)
8. Brutzer, S., Höferlin, B., Heidemann, G.: Evaluation of background subtraction techniques for video surveillance. In: CVPR, pp. 1937–1944. IEEE (2011)
9. Bhaskar, P.K., Yong, S.P.: Image processing based vehicle detection and tracking method. In: 2014 International Conference on Computer and Information Sciences (ICCOINS), pp. 1–5. IEEE (2014)
10. Gil Jiménez, P., Bascón, S.M., Moreno, H.G., Arroyo, S.L., Ferreras, F.L.: Traffic sign shape classification and localization based on the normalized FFT of the signature of blobs and 2D homographies. Signal Process. 88(12), 2943–2955 (2008)
11. Heredia, A., Barros-Gavilanes, G.: Video processing inside embedded devices using SSDMobilenet to count mobility actors. In: 2019 IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2019 - Proceedings, pp. 1–6 (2019)
12. Muravev, D., Hu, H., Rakhmangulov, A., Mishkurov, P.: Multi-agent optimization of the intermodal terminal main parameters by using AnyLogic simulation platform: case study on the Ningbo-Zhoushan Port. Int. J. Inf. Manage. 57, 102133 (2020)
13. Pop, M.D.: Traffic lights management using optimization tool. Procedia-Soc. Behav. Sci. 238, 323–330 (2018)
14. Chen, Q., Huang, N., Zhou, J., Tan, Z.: An SSD algorithm based on vehicle counting method. In: Chinese Control Conference CCC, vol. 2018-July, pp. 7673–7677 (2018)
15. Pop, M.D.: Decision making in road traffic coordination methods: a travel time reduction perspective. In: Proceedings – 2020 International Conference Engineering Technologies and Computer Science, EnT 2020, pp. 42–46 (2020)
16. Antonova, V.M., Grechishkina, N.A., Kuznetsov, N.A.: Analysis of the modeling results for passenger traffic at an underground station using AnyLogic. J. Commun. Technol. Electron. 65(6), 712–715 (2020)
17. Shamlitskiy, Y.I., Mironenko, S.N., Kovbasa, N.V., Bezrukova, N.V., Tynchenko, V.S., Kukartsev, V.V.: Evaluation of the effectiveness of traffic control algorithms based on a simulation model in the AnyLogic. J. Phys. Conf. Ser. 1353(1), 012101 (2019)
Learning Scope of Python Coding Using Immersive Virtual Reality
Abdulrazak Yahya Saleh(B), Goh Suk Chin, Roselind Tei, Mohd Kamal Othman, Fitri Suraya Mohamad, and Chwen Jen Chen
FSKPM Faculty, University Malaysia Sarawak (UNIMAS), 94300 Kota Samarahan, Sarawak, Malaysia
Abstract. Programming is a highly sought-after technical skill in the job market, but there are few avenues for training competent and proficient programmers. This research evaluates an immersive virtual reality (VR) application for learning Python whose interaction techniques and user interface let novices engage in VR learning. Thirty participants were recruited and divided into two groups of 15, one for Experiment I and one for Experiment II. In Experiment I, participants completed a questionnaire evaluating the user interface. In Experiment II, they completed a questionnaire evaluating novices’ acceptance of the VR application. Interviews were also conducted to collect detailed feedback from all participants. The results indicate that the interaction designs implemented in the application are adequate. However, more interaction techniques could be integrated to deepen the user’s sense of immersion. The interface is also considered adequate and reasonable, although usability and user experience can still be improved. Novices’ acceptance of the proposed learning method is low. This might be due to fear of change, a normal human reaction to new things. A larger sample is therefore proposed to investigate novices’ acceptance of the new learning method further, using an improved version of the VR application.
Keywords: Immersive virtual reality · Education · Programming learning
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1086–1100, 2021. https://doi.org/10.1007/978-3-030-70713-2_97
1 Introduction
Virtual reality (VR) has become a popular medium in various fields: entertainment [1, 2]; education [3–5]; medical training [7–9]; mental health monitoring and development [10, 12]; and military training [12–14]. Educational VR tools are designed to help students acquire academic concepts in a virtual reality environment. User experience is measured by the psychological feeling of being in the VR world, by realism, and by the level of reality [15]. Realism is the user’s expected response to the stimuli and the perceived experience of the virtual environment (VE) [16, 17]. It varies with how realistic the interaction with the virtual components is.
Programming has been identified as a crucial skill for career success in many disciplines and sectors, but mastering programming languages is a daunting task. According to the IEEE Spectrum ranking, the top five programming languages are Python, Java, C, C++, and R [18]. Students’ motivation to learn programming depends greatly on the availability of effective tools to overcome these difficulties [19]. A major reason students struggle is insufficient understanding of how programs execute. An inability to grasp the fundamentals of writing programs soon leads to discouragement. Negative attitudes may then set in, and students may become demotivated towards programming. According to Edori [20], students’ enthusiasm is a top learning factor because it directly affects their perseverance and dedication in completing their goals. The interactions provided in the VR application should therefore contain interesting elements that raise students’ enthusiasm for learning Python. Table 1 lists advantages and disadvantages of implementing VR in education. Sherman and Craig [21] distinguish two kinds of immersion. Mental immersion is the deeply engaged state within the VE. Sensory immersion covers the user’s movement, vision, auditory, and haptic sensations while engaged in the scene changes of the VE. These sensory immersion responses are essential for successful and enjoyable interaction in the VE.
Table 1. Advantages and disadvantages of VR.
Advantages of VR | Disadvantages of VR
Highly motivates the user [22] | Costly [23]
Encourages active participation rather than passivity [23] | Safety effect [23]
Learner-centered [24] | Possible reluctance in dealing with VR [23]
Promotes higher order thinking skills [24] | Spatial awareness
Ideally, all five basic human senses should be involved for a complete feeling of immersion in a virtual environment. However, sight is the most important sense and the one most closely allied with reason [25], so VR environments naturally focus on sight and hearing only. An electroencephalogram (EEG)-based evaluation of the effects of VR on brain function and user experience shows that the virtual environment influences the motor, cognitive, and other functions of the brain [26]. The graphics and content of the VE change these brain processes and thereby influence the user experience. An immersive VR system, which isolates the user from the real world, provides the highest level of immersion and increases task efficiency. It is also the most expensive of the three types of VR system (non-immersive, semi-immersive, and immersive) [27]. Since VR can be applied to interdisciplinary education systems, immersive VR is employed in this research. The VR learning environment is further extended with animation and multimedia for a richer experience. VR is becoming an increasingly popular and powerful medium for students in schools [28]. Huang et al. [29] studied user acceptance of 3D VR in learning in terms of perceived usefulness and perceived ease of use, as rated by the users. The results show that 3D VR encourages positive learning attitudes when learners perceive the system as both a useful learning tool and easy to use. Figure 1 shows the final model of the learner’s attitude towards 3D VR learning.
Fig. 1. Final model of learner’s attitude toward 3D VR learning [29].
According to Salis and Pantelidis [30], VR supports constructivist learning because it provides a highly interactive environment for the user. Constructivists emphasise that individuals construct knowledge from their prior experiences and beliefs, which they then use to analyse events [31]. Over the past decade, a few researchers have investigated VR as a way of learning programming. Grivokostopolou [32] presents an innovative 3D VR model for teaching search algorithms and evaluates student performance through experiments with control groups. The results show that VR visualisation improves students’ comprehension and learning efficiency. The study’s questionnaire had a Cronbach’s alpha of 0.79, indicating acceptable reliability. Its responses indicate that the VR environment increased students’ interest, motivation, and knowledge construction. A weakness of that research is that it assumes students already know programming basics such as sequence, iteration, and bifurcation. Pierre et al. [33] conducted an empirical study of the effects of VR games on higher education. Their games focus on learning computer programming, specifically computer algorithms. Their results show that VR games can strongly motivate the learning attitudes of students in higher education. However, there is insufficient evidence that learning through VR games is more efficient, and there is little empirical data on students’ acceptance of the proposed games. In summary, a variety of VR learning applications have been built recently. However, studies of their effectiveness in terms of the interaction techniques used, and of the performance of the developed applications, have yet to appear in the literature. This study is concerned with user experience and users’ acceptance of VR learning methods. To fill some of these knowledge gaps, this research evaluates a developed immersive VR system that provides an interactive environment for learning programming. The language taught is Python, a high-level language that was one of the most popular programming languages in 2020 [34]. The main objectives of this study are: to evaluate the types of interaction techniques used in the application; to identify the degree of user acceptance of the developed application; and to evaluate the performance of the developed VR application.
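Cronbach’s alpha, cited above for the questionnaire in [32], measures the internal consistency of a set of questionnaire items. It is defined as α = k/(k − 1) · (1 − Σσᵢ² / σₜ²). Here k is the number of items, σᵢ² is the variance of item i, and σₜ² is the variance of the respondents’ total scores. The sketch below computes it for a respondents × items matrix of Likert scores. The function name and example answers are hypothetical. The same statistic could be applied to the questionnaires used in this study, although the authors do not state that they did so.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of questionnaire scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
    Population variances are used for both terms; the ratio is the same with
    sample variances, because the n vs n - 1 factor cancels.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


if __name__ == "__main__":
    # Hypothetical Likert answers: 3 respondents x 2 items.
    print(round(cronbach_alpha([[2, 3], [4, 4], [3, 5]]), 4))  # 0.6667
```

Values of about 0.7 or higher are conventionally read as acceptable reliability. Alpha says nothing about whether the VR environment itself is effective.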
2 Materials and Methods
2.1 Architecture of VR Learning Apps
This study presents an interactive Python learning application based on the immersive VR concept. To hone his or her skills, the user must engineer an escape route from the maze by applying the Python knowledge learnt earlier in the application. An advanced-technology theme is embedded in the design, mainly to increase students’ curiosity and motivation to learn. The second to fifth scenes are learning scenarios. In them, the user is presented with learning materials on four Python topics, namely strings, tuples, if-else statements, and for loops, before proceeding to the tutorial rooms. Learning materials are placed in the rooms, and the user triggers them by collision. The main virtual scene is divided into learning and tutorial sessions, as shown in Figs. 2–6. Figure 2 shows the welcome scene, which has two aims. The first is to give the user a clear idea of the application. The second is to brief the user with instructions for handling the app. The User Guide Manual, which describes the interaction techniques used, is displayed in this scene.
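The four topics covered by the learning scenes are standard introductory Python material. The snippet below illustrates each of them in a few lines. It is an illustrative example only, not content taken from the application, and its variable names are hypothetical.

```python
# Strings: indexing, slicing, and methods
language = "Python"
first, last, middle = language[0], language[-1], language[1:4]  # "P", "n", "yth"
shout = language.upper()                                         # "PYTHON"

# Tuples: immutable sequences that can be unpacked
checkpoint = (3, "tuple room")
number, room = checkpoint

# if-else statement
if number > 2:
    status = "past halfway"
else:
    status = "just started"

# for loop: count the vowels in the string
vowels = 0
for ch in language.lower():
    if ch in "aeiou":
        vowels += 1

print(first, last, middle, shout, number, room, status, vowels)
# P n yth PYTHON 3 tuple room past halfway 1
```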
Fig. 2. Welcome scene of the application
A mini-map with checkpoints, shown in Fig. 3, is projected in the top right-hand corner. It shows the user his or her exact real-time location and helps the user complete the Python learning. Some code involving tuples is also demonstrated as an example to help the user learn quickly. The interaction techniques used in this scene are Mini-map, Gaze Interaction, and System Control.
Fig. 3. Learning scene
A tutorial video is added to the scene to give the user a tutor-like learning experience similar to that in a physical environment, as shown in Fig. 4. The interaction techniques used in this scene are Sonic Interaction and Trigger Technique.
Fig. 4. Video style learning in learning scene
Fig. 5. Python tutorial scene (Maze)
The final scene is the Python tutorial shown in Fig. 5 and Fig. 6, in which the user must escape from the maze by choosing the correct path. This maze tutorial aims to create a fun and interesting learning environment. The goal is to increase the user’s motivation to learn and to deepen his or her understanding of and experience with Python. If the user chooses an incorrect path, a syntax error is shown together with a brief explanation. The interaction technique used in this scene is Walking-In-Place. An end note, shown in Fig. 6, is included in this scene
to indicate that the user has completed the Python learning and tutorial. The interaction techniques used in this scene are User Guide Manual, Gaze Interaction, and System Control.
Fig. 6. Welcome and end note in Python tutorial scene
2.2 Types of Interaction Techniques Used
The types of interaction techniques used in this application are: a) Gaze Interaction Technique; b) Wayfinding; c) System Control (2D menu); d) Trigger technique; e) Scene Load; f) Sonic Interaction; g) Walking-in-place.
a) In the learning scene shown in Fig. 3, the user clicks a button to trigger an event with the gaze cursor, an indirect selection method also called the gaze interaction technique. The target key is selected when the user gazes at it for a predetermined time, that is, when the cursor rests on the target key for that long. The cursor moves according to head movement data collected by the head tracker [35]. In this application, the Google Cardboard HMD makes the view follow the direction in which the user is looking. A button is clicked when a static gaze of more than 2 s is detected on it.
b) Wayfinding techniques are also used. They allow users to orient themselves in the virtual space and get from one place to another [36], which helps them navigate an unfamiliar environment. With the mini-map provided in the application, the user can work out by himself or herself which way to take. The wayfinding techniques in this virtual scene are user-centred, that is, based on human perception: the user navigates the scene by his or her own perception instead of depending on the virtual world. The number of wayfinding quests depends mostly on the user’s field of view and search strategies.
A miniature map showing the player’s current position and the checkpoints is placed in the top right corner of the user interface, giving the player an efficient way of finding the route. According to Darken and Sibert [37], maps are an invaluable tool for acquiring and maintaining orientation and position in a virtual environment. In Fig. 3 and Fig. 4, the red dot indicates the player and the yellow numbers represent checkpoints.
c) The system control in this application uses 2D menus in VR, such as the ‘home’ and ‘quit’ buttons shown in Fig. 6. Because 2D graphical menus occlude the environment of the scene, they appear on screen only infrequently.
d) The trigger technique, also known as the event trigger technique, is widely used in this application. For example, tuple information is displayed when the user triggers a particular point. Similarly, a video tutorial pops up and plays when the user travels to a certain spot in the VE and triggers the video action. It is a reactive approach: an event is triggered when a triggering condition based on the current measurement is violated [38].
e) When the application changes from one scene to another, all game object instances, scripts, and UI elements belonging to the current scene are destroyed, and those of the new scene are loaded. The application includes scene loading as well as transitions between scenes.
f) Immersive sound is embedded to create ecologically valid, interactive multisensory experiences. Sonic interaction occurs when information is triggered: a voice explaining the relevant information is played. Immersive sound can be delivered through loudspeakers, but sound rendering through headphones is preferred. Headphones are the audio hardware available with state-of-the-art consumer HMDs, and they are easily used with mobile devices. Headphone-based sound rendering also makes it possible to control completely the sound arriving at each ear [39].
g) Walking-In-Place (WIP) is a VR interaction technique in which the user performs virtual locomotion by making step-like movements while remaining stationary [40]. In this application, angle detection controls the movement of the avatar in the scene. The user can pitch and yaw the head to explore the 360-degree environment, and the avatar moves forward when the head is pitched down by 30 degrees. This approach is used to implement WIP because Google Cardboard provides no dedicated motion sensor or input for walking.
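The gaze (dwell-time) selection in a) and the pitch-based Walking-In-Place in g) each reduce to a simple per-frame rule. The sketch below expresses both as engine-agnostic Python. The 2 s dwell time and the 30° pitch threshold come from the text. Everything else is an assumption: the class and function names, the per-frame `update` interface, the walking speed, the fire-once-per-fixation behaviour, and the convention that positive pitch means looking down with yaw 0 facing +z.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class GazeDwellSelector:
    """Clicks a target once the gaze cursor has rested on it for dwell_s seconds."""
    dwell_s: float = 2.0
    _target: Optional[str] = field(default=None, init=False)
    _elapsed: float = field(default=0.0, init=False)
    _fired: bool = field(default=False, init=False)

    def update(self, target: Optional[str], dt: float) -> Optional[str]:
        """Call once per frame with the object under the cursor (or None).

        Returns the target's name on the frame the dwell time is reached, else None.
        """
        if target != self._target:  # gaze moved to something else: restart timer
            self._target, self._elapsed, self._fired = target, 0.0, False
            return None
        if target is None or self._fired:  # nothing under cursor, or already clicked
            return None
        self._elapsed += dt
        if self._elapsed >= self.dwell_s:
            self._fired = True  # fire once per fixation
            return target
        return None


def walk_in_place_step(pitch_deg: float, yaw_deg: float, speed_mps: float,
                       dt: float, threshold_deg: float = 30.0) -> Tuple[float, float]:
    """Ground-plane displacement (dx, dz) for one frame of pitch-triggered walking.

    Assumes positive pitch means looking down and yaw 0 faces +z; the avatar
    moves along the gaze heading only while the head is pitched down by at
    least threshold_deg.
    """
    if pitch_deg < threshold_deg:
        return 0.0, 0.0
    yaw = math.radians(yaw_deg)
    step = speed_mps * dt
    return step * math.sin(yaw), step * math.cos(yaw)


if __name__ == "__main__":
    selector = GazeDwellSelector(dwell_s=2.0)
    frames = [selector.update("home", 0.5) for _ in range(6)]
    print(frames)  # [None, None, None, None, 'home', None]
    print(walk_in_place_step(pitch_deg=10, yaw_deg=0, speed_mps=1.5, dt=0.1))  # (0.0, 0.0)
```

Both rules rely only on head orientation, which fits a Cardboard setup with no controller or motion sensor beyond the phone's own orientation tracking.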
2.3 Implementation Details According to Dahlstrom et al. [41], in the year 2015, 92% of undergraduates each owned a smartphone, and it exceeded the 91% of undergraduate that possessed a laptop each. Based on this statistic, it is more sensible and practical to implement a VR educational app on smartphones. Moreover, mobile VR headsets are affordable and hence more suitable for the implementation of student-centred pedagogies [42]. The only drawback of mobile VR app is the issue of pixels and frame rates, as the “screen door effect” (SDE) may occur [43]. This means the viewer can see the fine lines separating the pixels because the HMD uses lenses to magnify the screen’s pixels across a much wider field of view. Notwithstanding the minor weakness of the mobile VR app, Google Cardboard is chosen as the HMD of this research. This is because it is low-cost and has been widely used in previous studies such as the following works: a) Smartphone-based Virtual Reality Systems in Classroom Teaching [44]. b) Experiential Learning VR System for Studying Computer Architecture [45]. c) Google Cardboard for a K-12 Social Studies Module [46].
d) Assessing Google Cardboard virtual reality as a content delivery system in business classrooms [47]. The researchers of these studies agreed that leveraging smartphones and the low-cost Cardboard makes learning with virtual reality affordable. The main feature differences between traditional HMDs and mobile-based HMDs lie in resolution, field of view, and frame rate [48]. Despite these differences, Google Cardboard can be a good tool for building an immersive VR app.
2.4 Experiments and Evaluation
To validate the application, experiments were performed in two phases. The first-phase experiment (Experiment I) was conducted with users who had prior knowledge of Python, namely graduates of Cognitive Science from Universiti Malaysia Sarawak. The second-phase experiment (Experiment II) was conducted with the target users, namely undergraduates of Cognitive Science at Universiti Malaysia Sarawak. The demographics of the 30 participants are illustrated in Fig. 7 and Fig. 8 below.
Fig. 7. Gender of 30 participants
Fig. 8. Ages of 30 participants
Participants in both experiments were asked to install the application on their smartphones a day before the session. In a 90-min lab session, participants were instructed to use the application with a Google Cardboard and earphones. One questionnaire was given to participants of Experiment I, and a different questionnaire to participants of Experiment II. A 10-min interview was conducted with each respondent so that he or she could further describe the VR experience and justify the ratings given in the questionnaire. The interview is a loosely structured, qualitative, in-depth conversation with the participant: a one-to-one dialogue of questions and answers through which data are collected methodically. A semi-structured interview is usually conducted face-to-face, allowing researchers to gain insights, ask questions, and evaluate phenomena from various perspectives [49]. In this case, however, the participants were interviewed via video call due to the government-imposed restrictions to contain the COVID-19 pandemic. In the interview sessions, the respondents could raise any matters concerning the VR app tested; some comments were ordinary, while others revealed unexpected issues. All these pieces of information are useful to the researcher and can be used to improve the app. Some of the questions asked during the interview are as follows: a) Please describe your feeling when you enter the first scene (instruction scene).
b) What do you feel when you are in the second and third topics (‘Tuple’ and ‘For’ loop)? c) Give your overall comment on this learning experience.
Experiment I: In Experiment I, a group of 15 graduates with previous knowledge of and experience in Python programming were recruited; they are referred to as “experts”. The focus of this experiment is to evaluate the design of the application based on the participants’ perception, so that they can compare their previous Python learning experience with the Python learning proposed in this experiment. Because the developed application is still at an early stage, the experience and comments of the “experts” can serve as inputs for future improvements. These participants were asked to choose between their previously used learning method and the proposed VR learning method, and to justify their choice by answering two questions in the questionnaire: a) “Why do you prefer this learning method?” and b) “What do you like about this learning method?” The 5-point Likert-scale questionnaire on interface evaluation used in this experiment was adapted from [50] to suit this experiment; its ratings range from 1 = “strongly disagree” to 5 = “strongly agree”. Before the experiment, the methods previously used by the “experts” to learn Python were surveyed, and the results show that online hands-on learning is the most common.
Experiment II: In Experiment II, a group of 15 undergraduates with no experience in learning Python were recruited, mainly to evaluate user acceptance of the newly proposed learning method. These participants are referred to as novices of the Python language. Novices were chosen because the majority of the target users of this application will be students with no prior knowledge of Python programming.
The application covers basic Python knowledge and concepts, and is therefore unsuitable for skilled Python programmers. The usability of the proposed application was tested by the novice users, who were asked to describe their experiences in a questionnaire. The 5-point Likert-scale questionnaire on the degree of acceptance used in this experiment was adapted from [51] and amended to suit this experiment; its ratings likewise range from 1 = “strongly disagree” to 5 = “strongly agree”.
3 Results and Discussion
3.1 Experiment I
The results obtained through the questionnaire are tabulated in Table 2. The mean score for each of the eight items is 3.000, which suggests that the performance of the application is merely adequate and that its design and interface still require improvement before it can be put into actual use. This uniform midpoint result may relate to the study in [51], which reports that respondents are more likely to adopt a midpoint response when asked to rate an item; consequently, the accuracy of the overall assessment may have been adversely affected.
Table 2. Results of Experiment I
Description | Mean
Selecting actions using gaze interaction is simple and natural. | 3.000
The difficulty of the contents in the application is appropriate | 3.000
The number of information or content at each topic is appropriate | 3.000
The environment has a nice look | 3.000
The elements of the environment are not intrusive | 3.000
The layout of the menus of the interface is correct | 3.000
The minimap is useful to solve some levels | 3.000
Transitions between scenes are good | 3.000
According to the “experts”, the instructional design of the application requires improvement so that users can easily understand its operating procedures. Regarding the amount of information in each topic, the majority felt that the content in each scene is insufficient to achieve a high level of understanding. However, since using a VR app for long hours may cause cyber-sickness, more concise content is preferable in each scene to optimise both time usage and the quality of learning. When the “experts” were asked to choose between their previously used learning method and the VR method, the majority chose their previously used method. Their reasoning was that the VR application has low graphic quality, since not everyone owns a smartphone with strong graphics capability. Furthermore, latency was noticed, which decreased the degree of immersion. These results reflect the challenges of mobile VR apps mentioned in Sect. 2.3. To create a truly immersive mobile VR app, a balance of high resolution, high pixel fill density, high frame rate, and high screen refresh rate is necessary; however, the display capability of today’s smartphones does not meet these prerequisites for a great VR experience. Out of the 15 “experts”, 10 used a lower-specification smartphone during this experiment. Given these technical issues and the current application design, the challenge of using a mobile VR app for fully immersive learning is yet to be resolved. The preferred learning methods of the “experts” are illustrated in the pie chart in Fig. 9.
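A per-item mean alone cannot show whether a score of 3.000 comes from uniform midpoint answers or from polarised answers that cancel out. Reporting the share of midpoint responses alongside the mean therefore helps test the midpoint-response explanation. The Python sketch below is illustrative only. The function name `summarise_likert` and the response lists are hypothetical, because the raw responses are not reported in the text.

```python
from statistics import mean
from typing import Dict, List


def summarise_likert(responses: Dict[str, List[int]],
                     scale_max: int = 5) -> Dict[str, Dict[str, float]]:
    """Return the mean and the share of midpoint answers for each item
    of a 1..scale_max Likert questionnaire."""
    midpoint = (scale_max + 1) / 2  # 3 on a 5-point scale, 4 on a 7-point scale
    summary = {}
    for item, scores in responses.items():
        if not scores or any(not 1 <= s <= scale_max for s in scores):
            raise ValueError(f"invalid responses for item {item!r}")
        summary[item] = {
            "mean": round(float(mean(scores)), 3),
            "midpoint_share": sum(s == midpoint for s in scores) / len(scores),
        }
    return summary


# Hypothetical answers from 15 respondents: both items have a mean of 3.0.
example = {
    "all midpoint": [3] * 15,
    "polarised": [1, 5] * 7 + [3],
}
for item, stats in summarise_likert(example).items():
    print(item, stats)
```

Both hypothetical items report a mean of 3.0. However, the midpoint share is 1.0 for the first and only 1/15 (about 0.067) for the second, so the two response patterns can be told apart.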
The interview results for this experiment show that, generally, the instruction scene is sufficient for users to understand the basic functions of the application and the procedures required to achieve the desired learning outcomes. However, 73% of the “experts” reported that the instruction scene could be made more engaging and immersive. Most of the “experts” concluded that the content is too general in terms of procedural details and the information on each topic. This is a reasonable comment, since all the “experts” have already acquired the fundamentals of Python. A good solution would therefore be to integrate more comprehensive and advanced Python programming techniques into the application.
Fig. 9. Preference of the experts on the learning method
However, the content mix should not be skewed so far towards advanced topics that it undermines the primary objective of the VR application, as the target users are mostly beginners in Python. The overall comment from the “experts” is that more work should be done to provide a richer experience for advanced users. The interaction techniques implemented in this application are good but insufficient to provide an excellent user experience. For example, gaze interaction and WIP are implemented to eliminate the need for a controller; however, these methods increase the chance of cyber-sickness because users are required to position their heads for motion capture. According to the “experts” and the literature, scene loading can enhance the user experience, but a low refresh rate and low graphic quality of the application or smartphone reduce it. The system control and minimap facilitate ease of use and provide a better visual and cognitive experience for the user, but they can still be improved.
3.2 Experiment II
The results obtained through the questionnaires are tabulated in Table 3 below. A mean score of 1.800 was obtained for both the first and the last questions. Dizziness while using the VR application recorded the highest mean value, 3.867. The overall results of this experiment suggest that the users are not ready to accept the new learning technology. The novices reported that Google Cardboard is a less comfortable HMD because they need to hold it all the time, which tires the hand muscles. Furthermore, they did not enjoy the learning sessions because they felt dizzy after some time, and some of them felt nauseous after the sessions.
Table 3. Results of Experiment II
Description | Mean
The headset is comfortable | 1.800
I like the look of the game | 1.867
It did not take me long to understand what I need to do | 2.667
I understand the purpose of this application | 2.733
I understand the contents in each topic | 2.067
The content designed in this application is clear and easy to understand | 2.067
I got dizzy when I was playing | 3.867
I wish this application was used in my university | 2.000
I think programming like this can be fun | 2.000
I had a great time playing and I would like to tell my friends | 1.800
Cyber-sickness is a well-known form of motion sickness that may affect some people during or after VR use, with symptoms such as dizziness, nausea, and imbalance [52]. The magnification of the screen’s pixels may have worsened the motion sickness; higher-resolution displays are therefore needed to create a better user experience. Apart from the questionnaires, the interview results show that the novices were mostly confused, as this was their first attempt to learn programming, and that beginners need more time to become familiar with the syntax and commands used. There is also evidence that the content should be improved by including more advanced Python knowledge, similar to the findings of Experiment I. However, as mentioned before, the Python programming content and the duration of usage must be optimised to avoid cyber-sickness. Furthermore, the results show that novices are less likely to attempt a new learning method, in contrast with the study by Pierre et al. [33] mentioned in Sect. 1. According to the novices, the sonic and visual interactions of the application are satisfactory, but long hours of usage involving sight, touch, and hearing cause tiredness. These results are consistent with the study by Robertson et al. [25]: incorporating the five basic human senses in a VR app can enhance the user experience, but doing so has practical limits. Sight and hearing are therefore the foremost senses to include in enriching the VR experience. Taken together, the results of Experiments I and II show that the novices’ acceptance level of this VR application is consistent with the review from the experts. The mediocre level of acceptance might be due to a fear of change; by and large, people tend to follow traditional ways of doing things and are reluctant to try new approaches. In general, the novices’ acceptance of the proposed learning method is lower than expected. Nevertheless, on the whole, the VR application is considered acceptable for learning Python programming. The VR learning method is convenient and offers a modern tool for learning anywhere. It provides a good experience and an interesting way for users to start learning; the approach could be used not only for learning programming but in other fields as well, and may help beginners acquire knowledge and skills in other subjects.
4 Conclusion
This study presents a VR learning application, a new learning approach intended to help students better understand the subject they are learning by providing an immersive learning experience. The interaction techniques used in the VR learning application are as follows: Gaze Interaction Technique, Wayfinding, System
Control with 2D menu, Trigger technique, Scene Load, Sonic Interaction, and Walking-in-place. These techniques aim to increase the interaction between the user and the virtual environment. However, the results show that they are still insufficient to provide an excellent user experience. Furthermore, with the existing interaction techniques, the issues of delivering a fully immersive VR experience through a mobile VR app remain unsolved; future improvements are therefore needed and are discussed below. The proposed VR learning application was evaluated via Experiments I and II. The main findings of this study are as follows: a) The interaction designs used in this application are acceptable, but more interaction techniques can be included to enhance its interactivity. b) The interface of the application is merely adequate; improvement is needed to make the application more user-friendly and to provide a better user experience. c) The novices’ acceptance of the newly proposed way of learning is low; a larger sample could be used in the future to evaluate novices’ acceptance of the second version of this application. The development of this learning system is at an early stage, and further improvements are essential before it can be officially introduced to users. Future work can include the following: expanding the learning content to cover all the basic knowledge of the Python language; adding more user instructions within the application; designing an intelligent tutoring module to further enhance the interaction between the application and the user; shortening learning sessions to reduce dizziness; and finally, designing a questionnaire with a 7-point Likert scale to avoid midpoint responses.
Overall, the proposed application requires improvements in content quality, usability, and other technical features. It is nevertheless an acceptable educational system and a good approach for beginners to learn the Python language in a fun and enjoyable manner, although the user experience could be heightened by further improvements in design and by the use of higher-specification devices. Further research is needed to develop the application into a more sophisticated but user-friendly system before it can be fully implemented.
Acknowledgement. This work was supported and funded by Universiti Malaysia Sarawak (UNIMAS), under the Scholarship of Teaching and Learning Grants (SoTL/FSKPM/2019(1)/001).
References
1. Zyda, M.: From visual simulation to virtual reality to games. Computer 38, 25–32 (2005)
2. Shafer, D.M., Carbonara, C.P., Korpi, M.F.: Factors affecting enjoyment of virtual reality games: a comparison involving consumer-grade virtual reality technology. Games Health 8, 15–23 (2019)
3. Bell, J.T., Fogler, H.S.: The application of virtual reality to chemical engineering education. Simul. Ser. 29(2) (1997)
4. Bogusevschi, D., Muntean, C., Muntean, G.-M.: Teaching and learning physics using 3D virtual learning environment: a case study of combined virtual reality and virtual laboratory in secondary school. J. Comput. Math. Sci. Teach. 39, 5–18 (2020)
5. Vesisenaho, M., Juntunen, M., Hakkinen, P., Poysa-Tarhonen, J.: Virtual reality in education: focus on the role of emotions and physiological reactivity. J. Virtual Worlds Res. 12 (2019)
6. Gallagher, A.G., Ritter, E.M., Champion, H., Higgins, G., Fried, M.P., Moses, G., Satava, R.M.: Virtual reality simulation for the operating room. Ann. Surg. 241(2), 364–372 (2005)
7. Tieri, G., Morone, G., Paolucci, S., Iosa, M.: Virtual reality in cognitive and motor rehabilitation: facts, fiction, and fallacies. Exp. Rev. Med. Dev. 15, 1–11 (2018)
8. McKnight, R.R., Pean, C.A., Buck, J.S., Hwang, J.S., Hsu, J.R., Pierrie, S.N.: Virtual Reality and Augmented Reality-Translating Surgical Training into Surgical Technique, pp. 1–12. Springer, New York (2020)
9. Makled, E., Yassien, A., Elagroudy, P., Magdy, M., Abdennadher, S., Hamdi, N.: PathoGenius VR: VR medical training. In: Proceedings - Pervasive Displays, pp. 1–2 (2019)
10. Triegaardt, J., Han, T.S., Sada, C., Sharma, S., Sharma, P.: The role of virtual reality on outcomes in rehabilitation of Parkinson’s disease. Neurol. Sci. 41, 529–536 (2020)
11. Yin, J., Yuan, J., Arfaei, N., Catalano, P.J., Allen, J.G., Spengler, D.: Effects of biophilic indoor environment on stress and anxiety recovery. Environmental 136, 105247 (2020)
12. Alexander, T., Westhoven, M., Conradi, J.: Virtual Environments for Competency-Oriented Education and Training, pp. 23–29 (2017)
13. Kupin, A., Moeller, B., Jiang, Y., Banerjee, N.K., Banerjee, S.: Task-Driven Biometric Authentication of Users in Virtual Reality (VR) Environments, pp. 55–67 (2019)
14. Ahir, K., Govani, K., Gajera, R., Shah, M.: Application on virtual reality for enhanced education learning, military training and sports. Augment. Hum. Res. 5, 1–9 (2020)
15. Cipresso, P., Giglioli, I.A.C., Raya, M.A., Riva, G.: The past, present, and future of virtual and augmented reality research: a network and cluster analysis of the literature. Front. Psychol. 9, 1–20 (2018). https://doi.org/10.3389/fpsyg.2018.02086
16. Baños, R., Botella, C., García-Palacios, A., Villa, H., Perpiñá, C., Gallardo, M.: Environments: the roles of absorption and dissociation. CyberPsychol. Behav. 2(2), 143–148 (1999)
17. Radianti, J., Majchrzak, T.A., Fromm, J., Wohlgenannt, I.: A systematic review of immersive virtual reality applications for higher education: design elements, lessons learned, and research agenda. Comput. Educ. 147 (2020)
18. Cass, S.: The Top Programming Languages 2019. IEEE Spectrum (2019)
19. Milne, I., Rowe, G.: Difficulties in learning and teaching programming: views of students and tutors. Educ. Inf. Technol. 7(1), 55–66 (2002)
20. Edori, P.G.: Students’ Motivation and the Challenges Instructors Face Incorporating ICT Based Instructional Materials (2014)
21. Sherman, W.R., Craig, A.B.: Understanding Virtual Reality: Interface, Application, and Design. Morgan Kaufmann, Cambridge (2019)
22. Mikropoulos, T., Chalkidis, A., Katskikis, A., Emvalotis, A.: Students’ attitudes towards educational virtual environments. Educ. Inf. Technol. 3(2), 137–148 (1998)
23. Pantelidis, V.S.: Reasons to Use Virtual Reality in Education and Training Courses and a Model to Determine When to Use Virtual Reality, pp. 59–70
24. Al-bataineh, A., Brooks, L.: Challenges, Advantages, and Disadvantages of Instructional Technology in the Community College Classroom (2016)
25. Robertson, G.G., Card, S.K., Mackinlay, J.: Three views of virtual reality: non-immersive virtual reality. Computer 26(2), 81 (1993)
26. Baka, E., Stavroulia, K.E., Thalmann, N.M., Lanitis, A.: An EEG-based Evaluation for Comparing the Sense of Presence between Virtual and Physical Environments, pp. 107–116 (2018)
27. Bowman, D.A., McMahan, R.P.: Virtual Reality: How Much Immersion Is Enough? (2007)
28. Roussou, M.: Learning by doing and learning through play: an exploration of interactivity in virtual environments for children. Comput. Entertain. 2(1), 1–23 (2004)
29. Huang, H., Liaw, S., Lai, C.: Exploring learner acceptance of the use of virtual reality in medical education: a case study of desktop and projection-based display systems. Interact. Learn. Environ. 24, 1–17 (2013)
30. Salis, C., Pantelidis, V.S.: Designing virtual environments for instruction: concepts and considerations. VR Sch. 2, 6–10 (1997)
31. Mergel, B.: Instructional Design and Learning Theory (1998)
32. Grivokostopoulou, F.: An Innovative Educational Environment Based on Virtual Reality and Gamification for Learning Search Algorithms, pp. 1–6 (2016)
33. Pierre, F., Zhao, F., Koufakou, A.: Learning Programming in Virtual Reality Environments. In: HCI in Games, pp. 448–457 (2020)
34. Kamaruzzaman, M.: Top 10 In-Demand Programming Languages to Learn in 2020 (2020)
35. Choe, M., Choi, Y., Park, J., Kim, H.K.: Comparison of gaze cursor input methods for virtual reality devices. Int. J. Hum.–Comput. Interact. 35, 1–10 (2018)
36. Control, P.M.: Navigation in Virtual Reality Comparison of Gaze-Directed and Pointing Motion Control, pp. 18–20 (2016)
37. Darken, R.P., Sibert, J.L.: Wayfinding strategies and behaviors in large virtual worlds. In: Human Factors in Computing Systems: Common Ground (1996)
38. Heemels, W., Johansson, K., Tabuada, P.: An introduction to event-triggered and self-triggered control. In: IEEE Conference on Decision and Control (CDC) (2012)
39. Serafin, S., Geronazzo, M., Erkut, C., Nilsson, N.C., Nordahl, R.: Sonic interactions in virtual reality: state of the art, current challenges, and future directions. IEEE Comput. Graphics Appl. 38(2), 31–43 (2018)
40. Boletsis, C.: The new era of virtual reality locomotion: a systematic literature review of techniques and a proposed typology. Multimodal Technol. Interact. 1(4), 24 (2017)
41. Dahlstrom, E., Brooks, D.C., Grajek, S., Reeves, J.: ECAR Study of Students and Information Technology (2015)
42. Cochrane, T.: Mobile VR in education: from the fringe to the mainstream. Int. J. Mobile Hum. Comput. Interact. 8(4), 44–60 (2016)
43. Bingham, D.: To Reach Its Full Potential, Mobile VR Needs More Pixels, But There’s No Free Lunch (2018)
44. Deb, S., Ray, A.B.: Smartphone Based Virtual Reality Systems in Classroom Teaching: A Study on The Effects of Learning Outcome, pp. 68–71 (2016)
45. Dascalu, M., Bagis, S., Nitu, M., Ferche, M., Dragos, A., Moldoveanu, B.: Experiential learning VR system for studying computer architecture. 10(3), 197–215 (2017)
46. Yap, M.C.: Google Cardboard for a K-12 Social Studies Module, pp. 1–30 (2016)
47. Lee, S.H., Sergueeva, K., Catangui, M., Kandaurova, M.: Assessing Google Cardboard virtual reality as a content delivery system in business classrooms. J. Educ. Bus. 92(4), 153–160 (2017)
48. Papachristos, N.M., Vrellis, I., Mikropoulos, T.A.: A comparison between Oculus Rift and a low-cost smartphone VR headset: immersive user experience and learning. In: Proceedings - IEEE 17th International Conference on Advanced Learning Technologies, ICALT 2017, pp. 477–481 (2017)
49. Sileyew, K.J.: Research Design and Methodology. In: Text Mining: Analysis, Programming and Application (2019)
50. Segura, R.J., Del Pino, F.J., Ogáyar, C.J., Rueda, A.J.: VR-OCKS: a virtual reality game for learning the basic concepts of programming. Comput. Appl. Eng. Educ. 28(1), 31–41 (2020)
51. Kieruj, N.D., Moors, G.: Variations in response style behavior by response scale format in attitude research. Int. J. Public Opin. Res. 22(3), 320–342 (2010)
52. Humar, I., Krebl, M., Orel, M., Lu, H.: Virtual Reality Sickness and Challenges Behind Different Technology and Content Settings. Mobile Networks and Applications (2019)
Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale
Gobiga Rajalingam, Janarthan Jeyachandran (corresponding author), M. S. M. Siriwardane, Tharshvini Pathmaseelan, R. K. N. D. Jayawardhane, and N. S. Weerakoon
Rajarata University of Sri Lanka, Mihinthale, Sri Lanka
Abstract. Fake news, hate speech, crude language, ethnic and racial slurs, and more spread widely every day, yet in Sri Lanka there is no definite solution to protect society from such profanity. The method we propose detects racist, sexist, and cursing objectionable content in Sinhala, Tamil, and English. To selectively filter out potentially objectionable audio content, the input audio is first preprocessed and converted into text, and objectionable content is then detected with a machine learning filtering mechanism. To determine whether the content is offensive, a preliminary filtering model takes the converted sentences as input and classifies them through binary classification. When a sentence is classified as offensive, secondary filtering is carried out with a separate multi-class text classification model that classifies each word in the sentence into sexist, racist, cursing, and non-offensive categories. The preliminary filtering models combine a Term Frequency–Inverse Document Frequency (TF-IDF) vectorizer with a Support Vector Machine (SVM) algorithm with varying hyperparameters. For the multi-class classification model, the combination of Logistic Regression (LR) and CountVectorizer was used for Sinhala, while Multinomial Naive Bayes with a TF-IDF vectorizer was found suitable for Tamil; for English, LR with CountVectorizer was chosen. The system achieves detection accuracies of 89% and 77% for Sinhala and Tamil, respectively. Finally, the detected objectionable content is replaced in the audio with a predetermined audio input.
Keywords: Natural Language Processing · Machine learning · Speech recognition
1 Introduction
The media has become one of the factors that influence virtually every aspect of our lives. According to a 2011 study [1] on profanity in the media, exposure to profanity across multiple forms of media among adolescents is positively associated with engagement in physical and relational aggression. This suggests the serious consequences of the continued uncensored presence of objectionable content in the media. Several ethnic riots have broken out in Sri Lanka in the recent past, and, beyond the actual root causes of those riots, a primary reason for the conflicts was the inability to control and regulate racial profanity in the media.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1101–1114, 2021. https://doi.org/10.1007/978-3-030-70713-2_98
During those conflicts, the government banned social media in Sri Lanka rather than moderating the media content, because there was no proper mechanism to specifically filter and replace objectionable content being spread via media. Some English-speaking countries have implemented various forms of solutions to this issue, mainly focusing on automatic objectionable content detection and replacement in audio, automatic objectionable content detection in text, and libraries and other implementations. Yet there is no solution for Sri Lankans to remove objectionable content from media in the local languages, Sinhala and Tamil. The few research works on detecting objectionable content for the Sri Lankan locale focus mainly on textual content in social media. Our system aims to alleviate the aforementioned problem of profanity in the media in the Sri Lankan context. It automatically detects objectionable content in an input audio under three main categories, namely sexist, racist, and cursing, and then replaces that content with a predetermined audio clip. It is designed to work for the Sri Lankan local languages, Sinhala and Tamil, as well as for English. For a developing country, the physical and mental wellbeing of each citizen, especially the youth, is vital for the nation to become sustainable in the coming years, and we believe this work can have a positive impact towards this cause.
2 Related Work
From a computer science point of view, research on detecting objectionable content in audio and on automatic audio replacement has mainly focused on the English language. The related work can be grouped into three main categories as follows.
2.1 Automatic Objectionable Content Detection and Replacement in Audio
Anthony Edward Stuart et al. [2] have proposed a hardware-based system for detecting and replacing objectionable content in real-time audio streams. Praveen S. Nair [3] has produced a client-server-based system to filter certain portions out of a multimedia stream, while Gene Fein et al. [4] have invented a Communication Device Language Filter focused on cellular conversations. V. Vanjani [5] has proposed a system for filtering media content containing profanity in songs.
2.2 Automatic Objectionable Content Detection in Text
Paula Fortuna and Sergio Nunes [6] have conducted a survey on automatic detection of hate speech in text. Mukul Anand and R. Eswari [7] have experimented with an automatic approach for classifying abusive comments in social media using deep learning. Dulan Dias, Madushi Welikala, and N.G.J. Dias [8] have proposed a system for identifying racist social media comments in the Sinhala language using text analytics models with machine learning; this is one of the significant research works on the Sri Lankan locale. Nemanja Djuric et al. [9] have proposed a
method for hate speech detection with comment embeddings. Furthermore, research on hate speech detection for the Italian language on Facebook has been done by Fabio Del Vigna, Andrea Cimino, Marinella Petrocchi, and Maurizio Tesconi [10]. Anna Schmidt and Michael Wiegand [11] have surveyed hate speech detection using Natural Language Processing.
2.3 Libraries and Other Implementations
Edward Loper and Steven Bird [12] created the Natural Language Toolkit (NLTK), whose core concepts can be used to implement Natural Language Processing (NLP) modules that analyze the text converted from the input audio and filter objectionable content. ‘Think DSP’ by Allen Downey [13] is an important Python resource for Digital Signal Processing. Moreover, Tomas Pranckevicius and Virginijus Marcinkevicius [14] have investigated Naïve Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, an in-memory computing platform. However, to the best of our knowledge, there is no existing research work on automatic audio replacement of objectionable content for the Sri Lankan locale, which makes the proposed system significant.
3 System Overview
The main workflow of the system consists of four interdependent submodules:
- the Digital Signal Processing (DSP) module;
- the Speech Recognition module;
- the Natural Language Processing (NLP) module;
- the Audio Replacing module.
3.1 DSP Module
The DSP module accepts the user's input audio. If the audio is in another format, the module converts it to a .WAV (Waveform Audio File Format) file. It then reduces noise using a series of filters. As its final task, the module amplifies the cleaned audio samples.
3.2 Speech Recognition Module
Since objectionable content is classified in textual form, speech recognition is needed to convert the audio into text. The Google Speech Recognizer was used for this task.
3.3 NLP Module
To assess whether content is offensive in its context, a preliminary filtering model takes the transcribed sentences as input. It classifies each sentence as offensive or non-offensive through binary classification. If a sentence is classified as offensive, a secondary filtering step applies a separate multi-class text classification model. This model classifies each word in the sentence as sexist, racist, cursing, or non-offensive. The classifier for each model was chosen by comparing the results of the candidate pipelines.
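The two-stage filtering of the NLP module can be sketched as a cascade. The binary model screens each sentence, and only sentences flagged as offensive are passed word by word to the multi-class model. This is a minimal sketch that assumes each trained model can be wrapped as a plain callable:
- `binary_model` returns True for an offensive sentence.
- `word_model` returns a category name for a single word.

Both names, and the keyword-lookup stand-ins in the demo, are hypothetical placeholders for the paper's scikit-learn classifiers.

```python
from typing import Callable, List, Tuple

NON_OFFENSIVE = "non-offensive"  # label the word model uses for clean words


def filter_sentence(
    sentence: str,
    binary_model: Callable[[str], bool],
    word_model: Callable[[str], str],
) -> List[Tuple[int, str, str]]:
    """Return (word_index, word, category) for every word flagged as objectionable.

    Stage 1: the binary model decides whether the sentence is offensive at all.
    Stage 2: only offensive sentences are classified word by word.
    """
    if not binary_model(sentence):
        return []  # non-offensive sentences skip the second stage entirely
    flagged = []
    for index, word in enumerate(sentence.split()):
        category = word_model(word)
        if category != NON_OFFENSIVE:
            flagged.append((index, word, category))
    return flagged


if __name__ == "__main__":
    # Hypothetical stand-in models: keyword lookups instead of trained classifiers.
    lexicon = {"idiot": "cursing"}
    is_offensive = lambda s: any(w in lexicon for w in s.lower().split())
    classify_word = lambda w: lexicon.get(w.lower(), NON_OFFENSIVE)
    print(filter_sentence("You are an idiot", is_offensive, classify_word))
    # [(3, 'idiot', 'cursing')]
```

The word indices returned here are what a later stage would map to word timestamps from the speech recognizer.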
G. Rajalingam et al.
3.4 Audio Replacing Module
The Audio Replacing module receives the timestamps of the detected objectionable content from the NLP module. The audio is first split into two portions at the start time of the first objectionable word. The second portion is then split again at the end time of that word, and the tail after the word is stored separately. Applying this procedure to every detected objectionable word yields a series of audio clips stored in a folder. Finally, the clips are merged according to the user's requirements, with predetermined replacement audio inserted in place of the objectionable words. The interdependency of these modules is depicted in Fig. 1.
Fig. 1. Interdependency of the system modules, showing how the modules are connected.
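The split-and-merge procedure of the Audio Replacing module can be sketched on a list of samples. The sketch keeps the clean clips between objectionable words and splices a replacement clip of equal length in place of each word. It rests on two assumptions:
- The "predetermined replacement audio" is a 1 kHz sine beep.
- Timestamps arrive as (start, end) pairs in seconds.

It works in memory rather than writing clips to a folder. Reading and writing actual WAV files (for example with Python's `wave` module) is left out.

```python
import math
from typing import List, Sequence, Tuple


def make_beep(num_samples: int, sample_rate: int,
              freq_hz: float = 1000.0, amplitude: float = 0.5) -> List[float]:
    """Sine tone used here as the (assumed) predetermined replacement audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def replace_segments(samples: Sequence[float], sample_rate: int,
                     segments: List[Tuple[float, float]]) -> List[float]:
    """Replace each (start_s, end_s) span, given in seconds, with a beep.

    Clean clips and beeps are concatenated in order, so the output has the
    same length as the input.
    """
    output: List[float] = []
    cursor = 0  # index of the first input sample not yet copied
    for start_s, end_s in sorted(segments):
        # Convert seconds to sample indices, clamped to the signal and to the
        # cursor so that overlapping segments do not duplicate audio.
        start = max(cursor, min(len(samples), round(start_s * sample_rate)))
        end = max(start, min(len(samples), round(end_s * sample_rate)))
        output.extend(samples[cursor:start])                # clip before the word
        output.extend(make_beep(end - start, sample_rate))  # replacement audio
        cursor = end
    output.extend(samples[cursor:])  # tail after the last objectionable word
    return output


if __name__ == "__main__":
    rate = 16_000
    audio = [0.1] * rate  # one second of placeholder audio
    cleaned = replace_segments(audio, rate, [(0.20, 0.35), (0.60, 0.80)])
    print(len(cleaned) == len(audio))  # True
```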
4 Methodology
4.1 Data Corpus Collection
The project concerns Natural Language Processing and needed data with heavy use of colloquial language, so social media posts and comments were chosen as the source of raw data. This ensures that profanity is detected whether the audio contains formal or casual speech. A list of offensive keywords was compiled for each of the three languages. These lists were used to manually collect social media posts and comments, mainly from Twitter and Facebook. The keywords were chosen by carefully considering their intended meaning, and each language's list was adjusted because the cultures differ. Since caste discrimination is considered offensive in Tamil culture, derogatory caste names were also included under the "Racist" category. For the "Non-Offensive" category, random, ordinary text was collected from social media. Each data entry was restricted to between 4 and 20 words. The datasets were stored in CSV (Comma-Separated Values) format with UTF-8 (8-bit Unicode Transformation Format) encoding. Deciding whether an entry was offensive was sometimes ambiguous, because it depends on the intention behind the profanity in that context. Under the "Racist" category, the intentions behind the following examples were compared.
: Non-Racist intention
: Racist intention
However, regardless of the speaker's intention, offensive derogatory words are used in both examples, which makes both eligible for the Racist category. The corpora for binary classification were balanced to avoid suboptimal model performance [15]. Comments were collected from social media posts and assigned a class based on the presence of offensive keywords.

Data for the multi-class classification model was collected differently, because this model predicts the offensive nature of each word in a sentence. Its corpus was therefore built from short snippets of social media comments, balanced across classes:
- 900 instances per class for Sinhala;
- 430 instances per class for Tamil;
- 500 instances per class for English.
4.2 Preprocessing of Data
The data corpus was preprocessed by removing stopwords, numbers, URLs, punctuation, special characters, and duplicates. The stopword lists for Sinhala and Tamil were modelled on NLTK's English stopword list and also include colloquial stopwords.
4.3 Building and Evaluating the Model
Each corpus was split randomly: 70% was used for training and the remaining 30% as the test set for model evaluation. The preprocessed datasets were used to generate features for the Machine Learning algorithms. Vectorization and classification were carried out with the Scikit-Learn library. For feature extraction, the datasets were tested with both the TF-IDF vectorizer and the Count vectorizer, combined with the classifiers [16]. The SVM classifier has been reported to outperform classifiers such as k-Nearest Neighbors and Naive Bayes in text classification [17]. Hence, for the binary classification models,
SVM was chosen. The binary classification reports for the Sinhala, Tamil, and English models using the SVM classifier with the TF-IDF vectorizer are shown in Table 1.
Table 1. Binary classification reports for Sinhala, Tamil and English models.
Language | Class | Precision | Recall | F1 score
Sinhala  | 0     | 0.92      | 0.88   | 0.90
         | 1     | 0.87      | 0.91   | 0.89
Tamil    | 0     | 0.80      | 0.89   | 0.84
         | 1     | 0.85      | 0.75   | 0.80
English  | 0     | 0.92      | 0.96   | 0.94
         | 1     | 0.96      | 0.93   | 0.94
Classes: 0 = Non-offensive, 1 = Offensive.
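The preprocessing of Sect. 4.2, together with the 4-to-20-word constraint and the UTF-8 CSV storage of Sect. 4.1, can be sketched in plain Python. This is an illustrative sketch only:
- The stopword set passed in is a hypothetical placeholder; the paper's Sinhala and Tamil lists are not reproduced here.
- The punctuation set and the CSV column names `text`/`label` are assumptions.
- Applying the length check to the raw entry, before cleaning, is also an assumption.

```python
import csv
import re
import string
from typing import Iterable, List, Sequence, Set, Tuple

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")  # in Python 3 str patterns this also matches Unicode digits
# ASCII punctuation plus a few typographic symbols; Sinhala and Tamil letters are untouched.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation + "“”‘’…") + "]")


def clean_text(text: str, stopwords: Set[str]) -> str:
    """Remove URLs, numbers, punctuation/special characters and stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = DIGIT_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return " ".join(w for w in text.split() if w not in stopwords)


def preprocess_corpus(entries: Iterable[str], stopwords: Set[str],
                      min_words: int = 4, max_words: int = 20) -> List[str]:
    """Keep raw entries of min_words..max_words words, clean them, drop duplicates and empties."""
    seen: Set[str] = set()
    result: List[str] = []
    for entry in entries:
        if not (min_words <= len(entry.split()) <= max_words):
            continue
        cleaned = clean_text(entry, stopwords)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def write_csv(path: str, rows: Sequence[Tuple[str, str]]) -> None:
    """Store (text, label) rows as UTF-8 CSV (column names are hypothetical)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(rows)
```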
The secondary filtering model identifies the offensive nature of each word in a sentence. Its dataset consists of short text snippets rather than long sentences. In this setting, SVM has been reported to perform worse than Naive Bayes (NB) variants [18]. Therefore, several combinations of classifiers and vectorizers were evaluated to find the most suitable multi-class model:
- Classifiers: Decision Tree Classifier (DTC), Multinomial Naive Bayes (MNB), Logistic Regression (LR), and Support Vector Machine (SVM).
- Vectorizers: the Countvectorizer and the Term Frequency–Inverse Document Frequency vectorizer (TF-IDF vectorizer).

Table 2 compares the multi-class classification models for the Tamil, Sinhala, and English languages. For Sinhala, the highest accuracy in Table 2, 0.7983, was achieved by LR with the Countvectorizer. For the Tamil multi-class model, LR with the Countvectorizer reaches an accuracy of 0.7182, while MNB with TF-IDF reaches 0.7166. For English, LR with the Countvectorizer was chosen because it achieved the highest accuracy. For Tamil, LR with the Countvectorizer and MNB with TF-IDF achieve nearly identical accuracies, so their normalized confusion matrices were compared. As shown in Table 3, the MNB-TF-IDF model makes more accurate predictions in the 'Non-offensive' category than the LR-Countvectorizer model. The project aims to draw a clear distinction between non-offensive and offensive words, so it is vital to reduce false-positive predictions in the 'Non-offensive' category. Therefore, the MNB-TF-IDF model was chosen for Tamil multi-class classification.
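The classifier/vectorizer comparison behind Table 2 can be sketched with scikit-learn pipelines on a 70/30 split. This is a minimal sketch under several assumptions:
- `SVC(kernel="linear")` stands in for the unspecified SVM variant.
- `max_iter=1000` for LR, `random_state` values and the stratified split are my choices, not the paper's.
- The toy snippets in the demo are hypothetical.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Factories, so every pipeline gets fresh, unfitted estimators.
VECTORIZERS = {
    "Countvectorizer": CountVectorizer,
    "TF-IDF": TfidfVectorizer,
}
CLASSIFIERS = {
    "DTC": lambda: DecisionTreeClassifier(random_state=0),
    "MNB": MultinomialNB,
    "LR": lambda: LogisticRegression(max_iter=1000),
    "SVM": lambda: SVC(kernel="linear"),
}


def compare_models(texts, labels, test_size=0.3, seed=42):
    """Fit every classifier/vectorizer pair on one split and return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels)
    scores = {}
    for (clf_name, make_clf), (vec_name, make_vec) in product(
            CLASSIFIERS.items(), VECTORIZERS.items()):
        model = make_pipeline(make_vec(), make_clf())
        model.fit(X_train, y_train)
        scores[(clf_name, vec_name)] = model.score(X_test, y_test)  # accuracy
    return scores


if __name__ == "__main__":
    # Hypothetical toy snippets with one distinctive token per class.
    classes = ["neutral", "sexist", "racist", "cursing"]
    texts = [f"{c}term sample snippet number{i}" for c in classes for i in range(10)]
    labels = [c for c in classes for _ in range(10)]
    ranked = sorted(compare_models(texts, labels).items(), key=lambda kv: -kv[1])
    for (clf, vec), acc in ranked:
        print(f"{clf:4s} {vec:16s} {acc:.4f}")
```

Because each vectorizer is fitted inside its own pipeline, the vocabulary is learned from the training portion only.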
0.6278
TF-IDF
LR
MNB
0.6385
Countvectorizer
DTC
Accuracy average
0.7182
0.7166
TF-IDF
Countvectorizer
0.7029
Countvectorizer
Tamil
Vectorizer
Classifier
0.7983
0.7553
0.7677
0.7628
0.7975
Sinhala
0.935
0.8772
0.8739
0.9220
0.9336
English
0.63 0.76 0.77
1 2 3
0.69
3 0.74
0.74
2
0
0.75
1
0.66
3 0.69
0.72
2
0
0.76
1
0.67
3 0.68
0.64
2
0
0.64
1
0.66
3 0.59
0.64
2
0
0.63 0.63
0
Tamil
Precision
1
Classes (0:Non-offensive, 1:Sexist, 2:Racist, 3:Cursing)
0.78
0.88
0.82
0.71
0.72
0.88
0.76
0.68
0.72
0.88
0.79
0.70
0.79
0.87
0.70
0.66
0.79
0.91
0.80
0.69
Sinhala
0.92
0.96
0.99
0.88
0.87
0.86
0.93
0.84
0.83
0.88
0.91
0.88
0.90
0.98
0.98
0.84
0.92
0.99
0.98
0.86
English
0.72
0.69
0.80
0.68
0.66
0.74
0.68
0.75
0.66
0.75
0.66
0.72
0.67
0.57
0.66
0.63
0.66
0.59
0.62
0.67
Tamil
Recall
0.83
0.89
0.71
0.76
0.83
0.81
0.65
0.74
0.83
0.84
0.66
0.73
0.83
0.90
0.71
0.59
0.86
0.87
0.70
0.76
Sinhala
0.90
0.90
0.99
0.95
0.86
0.87
0.93
0.85
0.87
0.87
0.98
0.78
0.88
0.90
0.97
0.94
0.89
0.91
0.97
0.95
English
Table 2. Multi-class classification reports for the Tamil, Sinhala and English models.
0.74
0.72
0.70
0.71
0.68
0.74
0.72
0.72
0.66
0.74
0.71
0.70
0.67
0.60
0.65
0.61
0.66
0.62
0.63
0.65
Tamil
F1 score
0.81
0.89
0.76
0.73
0.77
0.84
0.70
0.71
0.77
0.86
0.72
0.72
0.81
0.89
0.71
0.62
0.82
0.89
0.75
0.72
Sinhala
(continued)
0.91
0.93
0.99
0.92
0.87
0.87
0.93
0.85
0.85
0.88
0.94
0.82
0.89
0.94
0.97
0.89
0.91
0.95
0.98
0.90
English
SVM
Classifier
Accuracy average
0.6753
0.7029
TF-IDF
0.7151
Tamil
Countvectorizer
TF-IDF
Vectorizer
0.7859
0.7842
0.7842
Sinhala
0.9237
0.8772
0.9237
English
0.78 0.75 0.72
1 2 3
0.76
3 0.64
0.77
2
0
0.60
1
0.68
3 0.65
0.74
2
0
0.73 0.70
0
Tamil
Precision
1
Classes (0:Non-offensive, 1:Sexist, 2:Racist, 3:Cursing)
0.77
0.90
0.82
0.66
0.79
0.87
0.78
0.68
0.77
0.88
0.83
0.68
Sinhala
Table 2. (continued)
0.96
0.96
0.99
0.82
0.89
0.98
0.99
0.73
0.92
0.94
0.98
0.87
English
0.61
0.69
0.61
0.82
0.54
0.63
0.76
0.73
0.69
0.73
0.69
0.74
Tamil
Recall
0.86
0.86
0.66
0.75
0.84
0.89
0.67
0.74
0.82
0.88
0.66
0.78
Sinhala
0.85
0.90
0.94
1.00
0.84
0.82
0.92
0.93
0.87
0.89
0.96
0.97
English
0.66
0.72
0.69
0.72
0.63
0.69
0.67
0.6
0.69
0.74
0.69
0.73
Tamil
F1 score
0.82
0.88
0.73
0.71
0.81
0.88
0.72
0.71
0.79
0.88
0.74
0.72
Sinhala
0.90
0.93
0.96
0.90
0.87
0.89
0.95
0.81
0.90
0.91
0.97
0.91
English
0.75
0.13
0.18
0.19
Non-offensive
Sexist
Racist
Cursing
0.08
0.05
0.68
0.06
0.05
0.74
0.05
0.12
0.66
0.02
0.12
0.06
0.08
0.16
0.1
0.68
Predicted class Non-offensive
Cursing
Non-offensive
Racist
Predicted class
Actual class Sexist
LR-CountVectorizer
MNB-TFIDFvectorizer
0.15
0.07
0.79
0.17
Sexist
Table 3. Normalized confusion matrix for MNB-TFIDFvectorizer and LR-CountVectorizer.
0.04
0.69
0.04
0.1
Racist
0.72
0.06
0.06
0.03
Cursing
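Tables 3 and 6 report row-normalized confusion matrices. Each row (actual class) is divided by its total, so the diagonal entry is the recall of that class. The off-diagonal entries of the 'Non-offensive' row are exactly the false positives that motivated choosing MNB-TF-IDF for Tamil. A minimal sketch of this normalization follows, assuming labels are plain strings. scikit-learn's `confusion_matrix(y_true, y_pred, normalize="true")` should give the same row normalization.

```python
from typing import List, Sequence


def normalized_confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                                labels: Sequence[str]) -> List[List[float]]:
    """Entry [i][j] is the fraction of samples of actual class i predicted as class j."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        counts[index[actual]][index[predicted]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows of classes absent from y_true are reported as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```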
5 Evaluation
After the models were finalized, the hyperparameters of the classifiers and vectorizers were tuned for optimum accuracy using the pipelining method in Scikit-learn. The classification reports of the final models chosen for Sinhala, Tamil and English are presented in Table 4.
Table 4. Binary classification models for Sinhala, Tamil and English models.
Language | Average accuracy | Class | Precision | Recall | F1 score
Sinhala  | 0.89             | 0     | 0.92      | 0.88   | 0.90
         |                  | 1     | 0.87      | 0.91   | 0.89
Tamil    | 0.82             | 0     | 0.80      | 0.89   | 0.84
         |                  | 1     | 0.85      | 0.75   | 0.80
English  | 0.94             | 0     | 0.92      | 0.96   | 0.94
         |                  | 1     | 0.96      | 0.93   | 0.94
Classes: 0 = Non-offensive, 1 = Offensive.
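The pipeline-based tuning described above can be sketched with scikit-learn's `Pipeline` and `GridSearchCV`. Here the vectorizer and classifier hyperparameters are searched jointly, and each parameter is addressed as `<step name>__<parameter>`. The TF-IDF plus linear-kernel SVC pairing mirrors the binary models. However, the step names `vect`/`clf`, the grid values and `cv=3` are assumptions, since the paper does not list the hyperparameters it searched.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def tune_binary_model(texts, labels, cv=3):
    """Grid-search vectorizer and classifier hyperparameters jointly via a Pipeline."""
    pipeline = Pipeline([
        ("vect", TfidfVectorizer()),
        ("clf", SVC(kernel="linear")),
    ])
    # Hypothetical search space; parameters are addressed as "<step>__<param>".
    param_grid = {
        "vect__ngram_range": [(1, 1), (1, 2)],
        "vect__sublinear_tf": [False, True],
        "clf__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(texts, labels)
    # best_estimator_ is refit on all the data with the best parameters (refit=True).
    return search.best_estimator_, search.best_params_, search.best_score_
```

Keeping the vectorizer inside the pipeline means it is refitted on each training fold, so vocabulary and IDF weights never leak from the validation folds.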
The performance of the finalized binary classification models was evaluated using the Receiver Operating Characteristic (ROC) curve. This curve plots the true positive rate against the false positive rate across classification thresholds. The Area Under the Curve (AUC) of each classifier is above 0.5, the value expected of a classifier with no discriminative power (see Fig. 2).
Fig. 2. Receiver Operating Characteristic curves of the models for the Sinhala, Tamil and English languages. The high AUC values indicate good separability between the classes.
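The AUC summarized in Fig. 2 has a simple probabilistic reading. It is the probability that a randomly chosen offensive example is scored higher than a randomly chosen non-offensive one, with ties counting one half. A value of 0.5 corresponds to a no-skill classifier. A minimal sketch of that rank-based computation follows, assuming binary labels 0/1 and real-valued classifier scores. It is O(n·m) for clarity; scikit-learn's `roc_auc_score` should return the same value.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```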
The Precision vs Recall curve can be used to assess the quality of the classifiers' output. A high area under this curve indicates both high precision and high recall. In other words, the classifier returns accurate results and also returns the majority of all positive results (see Fig. 3).
Fig. 3. Precision vs Recall curves of the models for all three target languages. The flat line denotes a "No Skill" classifier and the curve denotes the trained classifier, which lies well above the flat line.
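The curve and baseline in Fig. 3 can be computed directly. The threshold is lowered one distinct score at a time and (recall, precision) is recorded at each step. The "No Skill" line sits at a precision equal to the fraction of positive examples. The sketch below assumes binary labels 0/1 and summarizes the curve as average precision, AP = Σ (R_n − R_{n−1}) · P_n. This is the step-wise definition documented for scikit-learn's `average_precision_score`, but the code is an independent illustration.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(y_true: Sequence[int],
                            scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) pairs, lowering the threshold one distinct score at a time."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score are admitted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: sum of (R_n - R_{n-1}) * P_n."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(y_true, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def no_skill_precision(y_true: Sequence[int]) -> float:
    """Height of the flat 'No Skill' line: the fraction of positive examples."""
    return sum(y_true) / len(y_true)
```

A classifier whose AP is well above `no_skill_precision` lies above the flat line, as in Fig. 3.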
The classification reports of the final multi-class models chosen for Sinhala, Tamil and English are shown in Table 5.
Table 5. Multi-class classification models for Sinhala, Tamil and English models.
Language | Model used                 | Average accuracy | Class | Precision | Recall | F1 score
Sinhala  | LR with Countvectorizer    | 0.89             | 0     | 0.83      | 0.87   | 0.85
         |                            |                  | 1     | 0.92      | 0.92   | 0.92
         |                            |                  | 2     | 0.95      | 0.91   | 0.93
         |                            |                  | 3     | 0.88      | 0.89   | 0.89
Tamil    | MNB with TF-IDF vectorizer | 0.77             | 0     | 0.75      | 0.69   | 0.72
         |                            |                  | 1     | 0.84      | 0.81   | 0.82
         |                            |                  | 2     | 0.80      | 0.78   | 0.79
         |                            |                  | 3     | 0.69      | 0.80   | 0.74
English  | LR with Countvectorizer    | 0.89             | 0     | 0.92      | 0.89   | 0.83
         |                            |                  | 1     | 0.92      | 0.95   | 0.94
         |                            |                  | 2     | 0.90      | 0.84   | 0.87
         |                            |                  | 3     | 0.86      | 0.82   | 0.79
Classes: 0 = Non-offensive, 1 = Sexist, 2 = Racist, 3 = Cursing.
The normalized confusion matrices of the finalized multi-class models for all three languages are shown in Table 6.
Table 6. Normalized confusion matrix for the finalized Multi-class model for Sinhala, Tamil, English
Sinhala Actual class
Predicted class Non-offensive Sexist Racist Cursing
Tamil
Non-offensive 0.86
0.04
0.01
0.07
Sexist
0.91
0.01
0.04
0.03
Racist
0.04
0.01
0.9
0.02
Cursing
0.06
0.02
0.02
0.89
Actual class
Predicted class Non-offensive Sexist Racist Cursing
Non-offensive 0.75
0.06
Sexist
0.13
Racist
0.17
Cursing
0.02
English Actual class
0.01
0.06
0.69
0.04
0.012
0.05
0.73
0.03
0.09
0.06
0.64
Predicted class Non-offensive Sexist Racist Cursing
Non-offensive 0.87
0.0
0
0.01
Sexist
0.07
0.9
0
0.01
Racist
0.05
0.03
0.89
0.01
Cursing
0.03
0.04
0.02
0.9
6 Conclusion
Between 2018 and 2019, the Sri Lankan government had to temporarily ban several social media platforms on multiple occasions. The bans were meant to contain inter-community violence fuelled by hate speech and racist fake news spreading through social media. Yet there is no proper mechanism to specifically filter and replace objectionable content in audio for the Sri Lankan locale. To address this issue, we proposed a system for the Sri Lankan locale that automatically detects and replaces objectionable content in audio. The system works in three steps:
- The input audio is preprocessed and converted into text, so that potentially objectionable content can be filtered selectively.
- The filtering mechanism detects racist, cursing, and sexist content along with its location and timestamps.
- The detected objectionable content is seamlessly replaced with predetermined audio input.

The model was tested against a Tamil dataset of 1720 instances and a Sinhala dataset of 3950 instances. Our proposed system can be seen as a complementary attempt to alleviate the problem
of profanity in media in the Sri Lankan scenario, with a test accuracy of 89% for Sinhala and 77% for Tamil. Nevertheless, the system is not yet a perfect model for this purpose, and it currently has a few limitations. The main limitation is that the software requires an internet connection to run. Incorporating customized Speech-to-Text models for the Sri Lankan languages would remove this requirement. In addition, collecting and preprocessing the Sinhala and Tamil data corpora proved demanding. Because the Tamil corpus was insufficient, the Tamil model performed comparatively worse.

As future work, we aim to create a voice cloning model that replaces detected objectionable content with a replica of the speaker's voice. We also plan to extend the system to real-time use and to all media types. In addition, we plan to develop it as a plugin that social media platforms can use to automatically detect and replace objectionable audio content in shared video clips. Human moderators cannot monitor the large number of audio files spread across the country, and no mechanism currently exists for automatic audio replacement of objectionable content for the Sri Lankan locale. We therefore believe this attempt represents a viable solution to the identified problem.
References
1. Coyne, S.M., Stockdale, L.A., Nelson, D.A., Fraser, A.: Profanity in media associated with attitudes and behavior regarding profanity use and aggression. Pediatrics 128(5), 867–872 (2011). https://doi.org/10.1542/peds.2011-1062
2. Stuart, A.E., et al.: Automatic Replacement of Objectionable Audio Content from Audio Signals. Patent US 20090055189A1, 26 Feb 2009
3. Nair, P.: Filtering some Portions of a Multimedia Stream. Patent US 20140129225A1, 08 May 2014
4. Fein, G., Merritt, E.: Communication Device Language Filter. Patent US 20100280828A1, 04 Nov 2010
5. Vanjan, V.: Systems and Methods for Filtering Objectionable Content. Patent US 20150205574A1, 23 July 2015
6. Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Comput. Surv. 81(05), 1–30 (2018)
7. Anand, M., Eswari, R.: Classification of abusive comments in social media using deep learning. In: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 974–977 (2019)
8. Dias, D., Welikala, M., Dias, N.G.J.: Identifying racist social media comments in Sinhala language using text analytics models with machine learning. In: 2018 18th International Conference on Advances in ICT for Emerging Regions, pp. 1–6. Colombo, Sri Lanka (Sept 2018). https://doi.org/10.1109/ICTER.2018.8615492
9. Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., Bhamidipati, N.: Hate speech detection with comment embeddings. In: WWW 2015 Companion: Proceedings of the 24th International Conference on World Wide Web, p. 29. Association for Computing Machinery, New York, May 2015. https://doi.org/10.1145/2740908.2742760
10. Del Vigna, F., Cimino, A., Dell'Orletta, F., Petrocchi, M.: Hate me, hate me not: hate speech detection on Facebook. In: Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), Venice, Italy, Jan 2017
11. Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10. Association for Computational Linguistics, Jan 2017
12. Loper, E., Bird, S.: NLTK: The Natural Language Toolkit. Presented at computer research repository, 1, 63–70 (2002). https://doi.org/10.3115/1118108.1118117
13. Downey, A.B.: Think DSP: Digital Signal Processing in Python, 1st edn. O'Reilly Media Inc., USA (2014)
14. Pranckevičius, T., Marcinkevičius, V.: Comparison of naïve Bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification. Baltic J. Modern Computing 5(2), 221–232 (2017)
15. Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsl. 6(1) (2004). https://doi.org/10.1145/1007730.1007733
16. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: Machine Learning in Python (2011)
17. Colas, F., Brazdil, P.: Comparison of SVM and some older classification algorithms in text classification tasks. Artificial Intelligence in Theory and Practice, 217 (2006). https://doi.org/10.1007/978-0-387-34747-9_18
18. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 90–94, Jeju, Republic of Korea, 8–14 July 2012 (2012)
A Comparison of CNN and Conventional Descriptors for Word Spotting Approach: Application to Handwritten Document Image Retrieval
Ryma Benabdelaziz1(B), Djamel Gaceb1, and Mohammed Haddad2
1 LIMOSE Laboratory, University M'Hamed Bougara of Boumerdes, Boumerdes, Algeria
{r.benabdelaziz,d.gaceb}@univ-boumerdes.dz
2 LIRIS Laboratory, Claude Bernard Lyon 1 University, Villeurbanne, France
Abstract. Natural images are easier to represent in feature space than textual images because of their lower complexity, and thus do not require as much learning capacity. Visual information representation is an important step in content-based image retrieval (CBIR) systems, which search for relevant visual information in large image datasets. Discriminant features can be extracted using two approaches. The manual approach (conventional CBIR) is based on preselected features such as colors, shapes, or textures. The automatic approach (modern CBIR) is based on features extracted automatically by deep learning models. The second approach copes better with the complexity of textual images, which require a deep representation that reaches the semantics of the text in the image. DIRS (Document Image Retrieval Systems) are CBIR systems dedicated to document images. They offer a set of efficient word-spotting techniques, such as interest-point-based techniques, which provide an effective local image representation. This paper presents an overview of existing word retrieval techniques and compares our two proposed word-spotting approaches (interest points and CNN description) on handwritten documents. The results obtained on the degraded, historical Bentham datasets are compared with those in the literature.
Keywords: Interest points · CNN network · Local and global features · Word retrieval · Word-Spotting · CBIR
1 Introduction
Word retrieval in old handwritten document images is challenging for several reasons:
- the difficulties encountered in document analysis tasks;
- various degradations;
- the poor quality of documents (noise, stains, ink degradation, etc.);
- variations in writing style and size.

It is therefore necessary to develop techniques that perform well and overcome these limitations. Several word retrieval systems have been proposed in the literature; they offer a local image description so as to target the relevant regions of the image.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1115–1126, 2021. https://doi.org/10.1007/978-3-030-70713-2_99
R. Benabdelaziz et al.
These systems perform retrieval in three major steps:
1- Interest point detection (corners, contours, or salient regions).
2- Description (construction of descriptor vectors).
3- Matching of these feature vectors.

Several methods have been proposed in this context, but they are limited when dealing with handwritten document images compared with printed images or other document components. The specific characteristics of handwritten text make such images difficult to process: age, writing style, text orientation, stains, transparency, crumpled paper, etc. The most effective techniques in the literature offer powerful image descriptors but are not efficient. They require long computation times and a lot of memory, depending on the number of detected points and the size of the descriptors. The current trend is toward systems based on artificial neural networks, more precisely Convolutional Neural Networks (CNNs), which generally reduce computation time and offer better performance. CNNs have also shown promising results in many document analysis fields (classification, binarisation, object detection, etc.).

CBIR techniques can be divided into two families: conventional (based on preselected features) and modern (based on automatically extracted features). The first approach lacks specialization. It extracts preselected visual features from images either without taking their spatial location into account, or while preserving some location information by dividing the image into blocks. The second approach, most often based on a CNN architecture, extracts features automatically and represents the semantics of the image in depth. It also makes it possible to exploit the knowledge contained in several image databases at different levels, from generic to specific. A close look at the visual features manipulated by CNNs reveals color, shape, and texture features.
These kinds of features are also common in conventional CBIR systems (the CBIR architecture is shown in Fig. 1). Both approaches produce descriptor vectors that describe images globally. Yet they perform poorly on textual images compared with natural images, on which they have shown good results.
[Figure: online step: user query image → content features extraction → global description → matching step → retrieved images; offline step: images collection → content features extraction → global description → images collection indexation]
Fig. 1. CBIR architecture with the steps of on-line and off-line modes.
CNN-based systems are designed to classify large image databases. They have shown encouraging results in handling most of the features used in CBIR, but in a more structured way, by introducing automatic, deep, transfer, and incremental learning. This allows a system to learn, progressively and incrementally, different image
A Comparison of CNN and Conventional Descriptors
features, so that it can recognize them later while preserving their spatial relationships in the image across the convolutional layers. At the end, a global descriptor vector can be extracted for each image. The rest of the paper is organized as follows:
- Section 2 reviews word retrieval methods, focusing on supervised-learning and supervised-learning-free methods.
- Section 3 presents our two proposed word-spotting approaches, based on interest point descriptors and on CNN-based descriptors.
- Section 4 compares and discusses the performance of the two approaches.
- Section 5 draws conclusions and gives some directions for future work.
2 Related Work
In this section, we review the literature on query-based image retrieval, focusing on supervised-learning and supervised-learning-free query-by-example methods (Table 1 gives a comparative study of some word-spotting techniques). Query-by-string methods are outside the scope of this paper. Most of the reviewed methods are either segmentation-based or segmentation-free. Here, segmentation refers to the retrieved entity. For example, suppose words or logos are retrieved from whole document pages, and the document is segmented only into larger entities (lines, blocks, etc.). The technique is then considered segmentation-free with respect to words, logos, etc. We use this principle to classify the reviewed methods.
2.1 Supervised-Learning-Free Word-Spotting Techniques
Segmentation-Based Techniques: In [1], the authors proposed a handwritten word-spotting technique using a global word image description. They present several ways of describing image features:
- the horizontal word profile;
- upper/middle/lower word profiles;
- background-to-ink transitions;
- grayscale variance;
- horizontal/vertical Gaussian derivatives;
- the image resulting from Gaussian smoothing.

The authors evaluated and compared these features using the Dynamic Time Warping (DTW) distance to measure the similarity between words. Others [2] chose a shape descriptor for signature indexing, computing a global descriptor from descriptions of different local parts of the signature skeleton. These global representations give too much importance to the pixel-level representation of word shapes. As a result, a word is retrieved correctly only if its profile or signature is very close to that of the query. They are therefore not well suited to handwritten databases, which include many morphological word variations.
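The DTW comparison of word profiles used in [1] can be made concrete with a small sketch. It turns a binary word image into a sequence of per-column features and aligns two such sequences with the classic DTW recurrence. Several assumptions apply:
- The features (upper profile, lower profile and ink density) are only a small subset of those listed above.
- Blank columns use a made-up convention.
- The local cost is Euclidean, and the DTW total is not normalized by path length.

Images are lists of rows with 1 for ink.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # binary word image, 1 = ink


def column_profiles(image: Image) -> List[List[float]]:
    """Per-column features: upper profile, lower profile, ink density (divided by height)."""
    height, width = len(image), len(image[0])
    features: List[List[float]] = []
    for x in range(width):
        ink_rows = [y for y in range(height) if image[y][x]]
        if ink_rows:
            upper, lower = ink_rows[0] / height, ink_rows[-1] / height
        else:
            upper, lower = 1.0, 0.0  # assumed convention for blank columns
        features.append([upper, lower, len(ink_rows) / height])
    return features


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic O(len(a) * len(b)) DTW with a Euclidean local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

A word written wider (columns repeated) still aligns at zero cost, which is why DTW tolerates horizontal stretching but not the deeper morphological variations criticized above. Typical usage is `dtw_distance(column_profiles(query_img), column_profiles(word_img))`.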
In [2], the authors extract local visual features using a sliding window that scans the word images and computes a histogram of gradient orientations for each image cell. This technique, inspired by the SIFT method, works at an interesting local level of description. It produces many descriptor vectors per image, one for each window location. However, each descriptor covers an entire image region, which may include many background degradations. The authors of [3] propose a technique for logo retrieval in the Tobacco-800 image dataset. In the logo description step, they detect points on contours extracted with the Canny detector, and describe them with
a shape context description based on log-polar coordinates. Shape-context-based description has been shown to be robust to noise and distortions, and has also been used in handwritten word retrieval and logo recognition. A histogram in log-polar coordinates is constructed for each point. However, describing each point locally makes matching very expensive on a large image dataset. The authors therefore construct a single histogram for the whole logo from the histograms of all its points, without considering the logo's semantics.

The authors of [4] use graphs to represent the interest points of local visual features. This technique uses topological information (the word skeleton) and morphological information (the word contours). Structural points extracted from the word skeleton form the graph vertices, and the word contours are used to create the graph edges. This graph representation is interesting, especially for handwritten images. However, it requires a good graph matching technique, which is usually very complex and time-consuming.

Paper [5] proposes a handwritten word-spotting technique that uses interest points extracted from word images. These local features are based on gradient information (magnitudes and orientations). This produces very robust and descriptive points located at high frequencies, i.e., on the writing. The detected points are then filtered to reduce their number. The description step analyzes different (dynamic) windows around the detected points, which makes the points scale-invariant.
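The log-polar shape context used in [3] can be sketched as follows. For one contour point, the positions of all other points are binned by the logarithm of their normalized distance and by their angle. A single histogram for a whole shape can then be obtained by summing the per-point histograms. Summation is my assumption, since the text only says the logo histogram is built from the per-point ones. The 5 radial × 12 angular bins, the 1/8-to-2 radial range and mean-distance normalization are common choices in the shape-context literature, assumed here rather than taken from [3].

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def shape_context(points: Sequence[Point], index: int,
                  n_radial: int = 5, n_angular: int = 12,
                  r_inner: float = 0.125, r_outer: float = 2.0) -> List[List[int]]:
    """Log-polar histogram of where the other points lie relative to points[index].

    Distances are divided by their mean (scale normalization); radial bins are
    uniform in log(r) over [r_inner, r_outer), angular bins uniform in [0, 2*pi).
    """
    cx, cy = points[index]
    others = [p for k, p in enumerate(points) if k != index]
    dists = [math.hypot(x - cx, y - cy) for x, y in others]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    hist = [[0] * n_angular for _ in range(n_radial)]
    if mean_dist == 0.0:
        return hist
    log_min, log_max = math.log(r_inner), math.log(r_outer)
    step = (log_max - log_min) / n_radial
    for (x, y), d in zip(others, dists):
        if d == 0.0:
            continue  # coincident point: log(0) is undefined
        log_r = math.log(d / mean_dist)
        if not (log_min <= log_r < log_max):
            continue  # outside the histogram's radial range
        r_bin = min(int((log_r - log_min) / step), n_radial - 1)
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a_bin = min(int(theta / (2 * math.pi / n_angular)), n_angular - 1)
        hist[r_bin][a_bin] += 1
    return hist


def global_shape_histogram(points: Sequence[Point], **kwargs) -> List[List[int]]:
    """One histogram for the whole shape: the sum of all per-point histograms (assumed)."""
    total: Optional[List[List[int]]] = None
    for i in range(len(points)):
        h = shape_context(points, i, **kwargs)
        total = h if total is None else [
            [a + b for a, b in zip(row_t, row_h)] for row_t, row_h in zip(total, h)]
    return total or []
```

Collapsing all points into one histogram makes matching a single vector comparison per logo, at the cost of losing which point each count came from.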
In the matching step, a k-nearest spatial neighbors search (KNN: K-Nearest Neighbors) is used. The search space is limited to the k spatial neighbors in the query image and in the word image, and the average Euclidean distance between the points found in these two areas is computed. If no point is found, the point is ignored; otherwise, the result is accumulated until all the points have been processed. The tests were performed on three different handwritten image datasets and produced good results. This technique uses a filtering step that reduces the number of interest points in the detection step to decrease computation time, but this step also deletes some relevant points. To compute the similarity between images, the authors take the minimum Euclidean distances computed between the search areas (candidate and query areas) and average these minima. They do not check whether the nearest morphological point exists among these neighbors.
Segmentation-Free Techniques: The authors of [6] developed a word-spotting technique for printed documents. They focused on the matching step rather than on the detection and description steps. For interest point detection and description, they used the SIFT method. A first matching step then locates the k points in the document image most similar to those of the query image. Candidate areas are built around these selected points, and a second matching step is performed on these more targeted points. The authors of [7] proposed a segmentation-free word-spotting technique that reuses their previous interest point description techniques, based on gradient orientations and magnitudes, to perform a first word localization in the document images. The first step consists of matching the gravity center
A Comparison of CNN and Conventional Descriptors
(descriptor vector) of the query image with all interest points of the document using a matching threshold, which localizes the points that are morphologically similar to this gravity center. A second matching step then draws circles (whose size depends on the query size) around each point retained by the first step, and matches all interest points inside each new, larger circle until a fixed size is reached. Finally, for each retained point, the maximum similarity found between the query and the circles surrounding this point is kept, and the most similar circles (representing the possible words) are displayed. These two interesting segmentation-free techniques rely on two matching levels: the first searches globally for probable word locations and reduces retrieval time, and the second searches more locally for word similarities. One drawback of this procedure is that it depends directly on the quality of the first matching (the matching threshold), which is not always exact.
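The two matching levels described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation of [7]: interest points are ((x, y), descriptor) pairs, the first level keeps points within a distance `threshold` of the query's gravity-center descriptor, and the circle score `region_similarity` is a hypothetical choice.

```python
import math


def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def first_level(center_desc, doc_points, threshold):
    """Level 1: keep document points whose descriptor is close to the query's
    gravity-center descriptor. doc_points is a list of ((x, y), descriptor)."""
    return [p for p in doc_points if euclid(center_desc, p[1]) <= threshold]


def region_similarity(query_descs, region_descs):
    """Hypothetical score in (0, 1]: mean over query descriptors of
    1 / (1 + distance to the nearest descriptor in the region)."""
    if not region_descs:
        return 0.0
    total = 0.0
    for q in query_descs:
        d = min(euclid(q, r) for r in region_descs)
        total += 1.0 / (1.0 + d)
    return total / len(query_descs)


def two_level_spotting(query_descs, center_desc, doc_points, threshold, radii):
    """Level 2: around each retained point, grow circles of the given radii,
    score each circle against the query, and keep the best circle per point.
    Returns (center, best_radius, score) tuples sorted by decreasing score."""
    results = []
    for (cx, cy), _ in first_level(center_desc, doc_points, threshold):
        best_score, best_radius = 0.0, None
        for r in radii:
            inside = [d for (x, y), d in doc_points if math.hypot(x - cx, y - cy) <= r]
            score = region_similarity(query_descs, inside)
            if score > best_score:
                best_score, best_radius = score, r
        results.append(((cx, cy), best_radius, best_score))
    results.sort(key=lambda t: t[2], reverse=True)
    return results
```

The sketch also makes the drawback noted above visible: a true word location whose points all fail the first-level threshold never reaches the second level.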
2.2 Supervised-Learning Techniques
Segmentation-Based Techniques: The method proposed in [8] uses deep CNNs to perform word-spotting. The authors modified the CNN structure by adding a PHOC (Pyramidal Histogram of Characters) layer and replacing the softmax output with a sigmoid activation suited to the PHOC. This type of layer allows the CNN to accept images of different sizes while producing an output of constant size, which is essential for training the network. Thus, whatever the size of the input image, the network can predict its corresponding PHOC representation. The proposed method surpassed the results of networks pre-trained on ImageNet images and of networks retrained on word images. CNNs using PHOC have proven effective in many word-spotting works on several datasets, for both query-by-example and query-by-string word-spotting. The technique proposed in [9] is also based on CNNs; it learns not only the word descriptors but also the similarity score between descriptors directly from word images, by combining classification and regression. In addition, it includes a dataset expansion technique that uses localization jittering to balance similar and dissimilar image pairs.
Segmentation-Free Techniques: There are few segmentation-free, learning-based word-spotting approaches that use neural networks to describe handwritten images, but some works include a learning step, for example to refine their model. The authors of [10] propose a technique for word retrieval in handwritten document pages that estimates a statistical model from the query image. They extract visual features from the document pages using the dense SIFT method, and then use a clustering technique to build a codebook (Bag of Features) from these interest points.
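The codebook step of [10] can be sketched as follows, assuming k-means (Lloyd's algorithm), the usual way of building a Bag-of-Features codebook. The Python code uses toy 2-D vectors in place of dense SIFT descriptors and a deliberately simple initialization; the function names are hypothetical. Each local descriptor is mapped to its nearest codeword (visual word), and counting the visual words gives a Bag-of-Features histogram.

```python
def nearest(v, centers):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))


def kmeans(descriptors, k, iterations=10):
    """Plain Lloyd's algorithm. The first k descriptors are the initial centers
    (a simplification; k-means++ or random restarts are more common)."""
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def quantize(descriptors, centers):
    """Map each local descriptor to its visual-word index."""
    return [nearest(d, centers) for d in descriptors]


def bof_histogram(words, k):
    """Count how often each of the k visual words occurs."""
    hist = [0] * k
    for w in words:
        hist[w] += 1
    return hist
```

How [10] turns these visual words into the observations of its HMM is specific to that work and is not reproduced here.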
The goal is to estimate a Bag-of-Features HMM that encodes the visual appearance of the query image as a sequence of probabilities. A training technique is then applied to refine this model, and the query is encoded using the probability sequence. Other works apply a segmentation that is not targeted at the retrieved entity, such as [11], which proposes a technique for logo retrieval in administrative documents from the Tobacco-800 image dataset. In the online stage, they segment
R. Benabdelaziz et al.
the document into regions of interest. In parallel, they trained a Siamese network that uses the AlexNet model as its CNN part. In this work, the Siamese network serves as a visual feature extractor: it produces two feature vectors for each pair of images and then calculates the distance between them. The authors compared the execution times of two scenarios: (a) using the Siamese network only as a feature extractor and computing the Euclidean distance outside the network; (b) computing the distance inside the Siamese network. Scenario (a) was faster and more efficient. Learning the similarity between images is a recent trend that aims to decide whether two images are similar or not. It has been used in word-spotting techniques by showing the network many pairs or triplets of similar and dissimilar images, and then using one of the last layers of the network to compute similarity distances. Both families of techniques, classical (supervised-learning-free) and automatic (supervised-learning-based), are effective. The first family does not need prior knowledge of the dataset, so it can work directly on any new dataset without a learning phase. Its image descriptors can be either global, describing the image as a whole without considering its semantics, or local, describing different image regions (interest points or other image parts); local descriptors, however, need a good matching strategy that respects the positions of the interest points to produce good word retrieval results. The second family needs to see and learn a portion of the dataset in order to provide powerful learned features and better descriptors.
This family of techniques is therefore specific to the training dataset and cannot be applied immediately to a new dataset, but it is efficient: its global description can be as effective as a local one, and the matching step needs little attention (a simple Euclidean distance can give good results).
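Learning similarity from pairs or triplets of images, as described above, is commonly trained with a contrastive loss (pairs) or a triplet loss (triplets). The Python sketch below shows these two standard formulations on plain embedding vectors; it is illustrative only and is not claimed to be the loss used in [9] or [11]. The margin value is an assumption. At retrieval time, as in scenario (a) of [11], only the Euclidean distance between the two embeddings is needed.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(f1, f2, similar, margin=1.0):
    """Pair loss: pull similar embeddings together, push dissimilar ones
    at least `margin` apart (no penalty once they are farther than that)."""
    d = euclidean(f1, f2)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the positive must be closer to the anchor than the
    negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```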
3 Methodology
3.1 Interest Points-Based Approach
Interest Points Detection and Description: Our approach [12] is a segmentation-free, supervised-learning-free word-spotting technique that uses textural features in their spatial context. It improves on the technique of [5] and is organized as follows. To improve feature extraction on handwritten images, we first apply a preprocessing step. Interest points are then detected using the horizontal and vertical gradient vectors. Scale-invariant descriptors result from combining gradient magnitudes and gradient orientations through local analysis, which provides a more comprehensive local description of the region surrounding each detected interest point.
Matching Step: We assume that each image is represented by a set of descriptors. We propose a new matching technique that uses brute force to perform a selective nearest neighbor search. This technique yields a good similarity score by matching the descriptor sets of two images (I1 and I2, represented respectively by their interest point sets NI1 and NI2) in a bispace (textural and Cartesian). Figure 2 depicts our matching method.
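Fig. 2 suggests how the two spaces interact: a texture (descriptor) space in which the best texture matches are found, and a 2-D Cartesian space in which the 7 spatial best matches are kept. The Python sketch below is one hypothetical reading of that combination, not the exact algorithm of [12]. It assumes coordinates normalized by image size, counts a query point as matched when some candidate point is among both its spatial and its texture nearest neighbors, and uses an illustrative `k_texture=3`; only `k_spatial=7` comes from Fig. 2.

```python
import math


def dist(a, b):
    """Euclidean distance, used both in texture (descriptor) and Cartesian space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def bispace_similarity(query, candidate, k_spatial=7, k_texture=3):
    """query, candidate: lists of (xy, descriptor) with xy normalized to [0, 1].

    Brute force: for every query point, rank all candidate points once by
    Cartesian distance and once by descriptor distance. The point counts as
    matched if some candidate point is among both its k_spatial spatial and its
    k_texture texture nearest neighbors. Returns the fraction of matched points.
    """
    if not query or not candidate:
        return 0.0
    idx = range(len(candidate))
    matched = 0
    for q_xy, q_desc in query:
        spatial = set(sorted(idx, key=lambda i: dist(q_xy, candidate[i][0]))[:k_spatial])
        texture = set(sorted(idx, key=lambda i: dist(q_desc, candidate[i][1]))[:k_texture])
        if spatial & texture:
            matched += 1
    return matched / len(query)
```

Because every query point is compared with every candidate point in both spaces, the cost grows with the product of the two point counts, which is consistent with the computation-time concerns discussed in Sect. 4.2.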
3.2 CNN Based Approach
One of the most interesting advantages of neural networks is that they can be fully or partially reused to improve model accuracy on a new dataset and to accelerate training. Transfer learning consists of reusing the knowledge of a pre-trained model to train a new model on a new problem with comparatively little training data. The approach we proposed in [13] is mainly based on a deep transfer learning architecture for word-spotting applications. Our method has three stages: (1) the transfer learning stage (a pre-trained network is fine-tuned), (2) the image description stage, and (3) the matching stage (see Fig. 3). In our work, we used the CNN as a feature extractor by taking the flattened output of the last fully connected layer as the new output of the model, which gives a global image description (see [13] for more details). We also introduced a rotation-based data augmentation step during training, in order to increase the performance of the model and make it invariant to image rotation. To match the resulting global image descriptor vectors, a simple distance such as the Euclidean distance can be used.
3.3 Handwritten Dataset
Fig. 2. Illustration of the matching process of our approach [12]. (Diagram content: the query image and a database image with their similar keypoints; the 100-dimensional texture space with the top texture best matches; the 2-dimensional Cartesian space with the 7 spatial best matches; and the combination of texture and spatial best matches in the database image.)
In both techniques, we used the evaluation protocol proposed in the ICDAR15 word-spotting competition on handwritten documents [14], together with the handwritten Bentham dataset, which contains word images extracted from 10 document pages by the British philosopher Jeremy Bentham (1748–1832). In the ICDAR15 competition, this dataset was divided into two word-image sets: 3234 candidate images and 95 query images corresponding to 20 different word classes. We chose this old handwritten database because it contains different word forms (styles and sizes) and many writing distortions, which make the images complex and difficult to handle.
Fig. 3. A synoptic diagram representing our word-spotting approach based on transfer deep learning on handwritten images [13]. (Diagram content: transfer deep learning step, in which a network pre-trained on a large dataset (ImageNet) transfers its knowledge (pre-trained weights) to a network fine-tuned on a small dataset of query images with data augmentation, keeping some weights frozen; word image description step, in which features of the query and candidate words of the test dataset are extracted with the pre-trained or fine-tuned network, either (1) from the CB output or (2) using a partially fine-tuned CB; matching step using a distance metric.)
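The matching step at the end of the pipeline in Fig. 3 ranks candidate words by a distance between global descriptors. The Python sketch below implements the three measures compared in Sect. 4.1 in commonly used forms; the exact variants used in [13] are not specified here, so the 1/2 factor in the chi-square distance and the normalization and smoothing (`eps`) in the KL divergence are assumptions. Chi-square and KL assume non-negative descriptors, as produced, for example, after a ReLU.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi2(a, b, eps=1e-10):
    """Chi-square distance between non-negative vectors (variant with a 1/2 factor)."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def kl_divergence(a, b, eps=1e-10):
    """KL(p || q) after smoothing both non-negative vectors and normalizing them
    to sum to 1. Note that KL is asymmetric."""
    sa, sb = sum(a) + eps * len(a), sum(b) + eps * len(b)
    p = [(x + eps) / sa for x in a]
    q = [(y + eps) / sb for y in b]
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))


def rank_candidates(query, candidates, metric=euclidean):
    """Candidate indices sorted from most to least similar (smallest distance first)."""
    return sorted(range(len(candidates)), key=lambda i: metric(query, candidates[i]))
```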
4 Results and Discussion
4.1 Results
Interest Points Based Approach: We used a handwritten dataset (the Bentham dataset) to evaluate our word-spotting technique [12] and compared our results with works that used the same dataset and evaluation protocol, using the mAP (mean average precision) metric (see examples of query matching results in Table 1 and the evaluation results in Table 2).
CNN Based Approach: In our proposed approach [13], we chose the VGG16 model because it is known for its performance in the document field, especially on handwriting. To train our model, the query image set is split into classes of image samples (each class representing the same word), and a 21st class is added. This class contains word samples taken from the candidate images of the Bentham dataset; it does not include any sample of the query words, in order to differentiate between the query images and other images and to improve the similarity measurement. To improve our results and overcome the overfitting observed with our small image base, we enlarged the training dataset using random rotations of different degrees (3°, 5°, 10°, 15°, 20°, and 30°). The results of our CNN-based word-spotting with and without data augmentation are given in Table 2. In the matching stage, we used three similarity measures between the resulting global image vectors (Euclidean distance, Chi2 distance, and Kullback-Leibler divergence); the metrics giving the best results are reported in Table 2.
4.2 Discussion
The proposed interest points approach [12] is scale-invariant but not rotation-invariant. According to the literature, rotation in textual images, especially handwritten ones, is generally not taken into account. We chose not to address this point; still, the results were very good compared to other works that use the same
dataset. However, our technique requires a lot of computation time, owing to the complexity of the interest point description stage and of the matching step, whose cost is directly related to the number of interest points detected in the image. The generated descriptors are strong because gradient information is well suited to image description: it captures textural information and deals directly with the location and direction of the writing (the points lie directly on the handwriting and are separated into four directions based on their gradient orientations). The bispace matching is also efficient but computationally expensive; its performance depends on the number K of nearest neighbors chosen around each point, and its cost is bounded by the number of points detected in each image. Basic CNNs are neither scale- nor rotation-invariant. We first used a fine-tuned CNN to study the effect of learned global features in handwritten word-spotting; the best result was obtained with the Kullback-Leibler divergence. When we augmented the dataset with rotations of several degrees, the best results were obtained with the Chi2 distance, which suggests that data augmentation can compensate for the lack of rotation invariance. Finally, we noticed that document images can be more or less affected by rotation, and that data augmentation can help overcome scale, rotation, and other image changes. We compared the computation time of each description technique (global and local); the results are shown in Table 3. As expected, the global description is much faster than the local one.
Table 1. Example results of our word-spotting based on interest points using Bentham dataset.
Columns: Query | Retrieved images (1st, 2nd, 3rd, 4th, 5th)
Segmentation is relevant when the document is recent, printed, and of good quality, but it leads to errors when documents are old, handwritten, and of poor quality. Many techniques in the literature rely on local description, using either segmented words or segmentation-free approaches. Although segmentation-free techniques require a huge computation time, they remain feasible and give interesting results. However, there is no truly segmentation-free word-spotting technique that uses CNN-based description: to be trained, CNNs need previously segmented word samples, and obtaining these samples requires applying a segmentation to the document (so the approach is not truly segmentation-free).
Table 2. Experimental results of our two word-spotting approaches using Bentham dataset.
Method | Year | Supervised learning | Image description | Evaluation protocol | mAP
Our CNN approach using data augmentation | 2020 | Based | Global | ICDAR15 | 0.709
Our CNN approach without data augmentation [13] | 2020 | Based | Global | ICDAR15 | 0.674
Our interest points approach | 2019 | Free | Local | ICDAR15 | 0.612
Zagoris et al. [5] | 2017 | Free | Local | ICDAR15 | 0.440
Zagoris et al. [7] | 2017 | Free | Local | ICFHR14 | 0.600
Retsinas et al. [15] | 2016 | Free | Global | ICFHR14 | 0.577
Sfikas et al. [16] | 2016 | Based | Global | ICFHR14 | 0.536
PRG group [17] | 2015 | Based | Global | ICDAR15 | 0.424
CVC group [18] | 2015 | Free | Global | ICDAR15 | 0.300
Zagoris et al. [19] | 2014 | Free | Local | ICDAR15 | 0.217
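The mAP column summarizes ranked retrieval lists. As a sketch of the usual definition (the exact ICDAR15 protocol [14], for instance how queries without relevant items are handled, may differ), average precision averages the precision at each rank where a relevant word image appears, and mAP averages it over all queries. The function names are illustrative.

```python
def average_precision(ranked_relevance, n_relevant=None):
    """ranked_relevance: 0/1 flags for the ranked retrieval list (1 = same word
    as the query). n_relevant: total number of relevant items; defaults to the
    number of relevant items present in the list."""
    if n_relevant is None:
        n_relevant = sum(ranked_relevance)
    if n_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / n_relevant


def mean_average_precision(all_rankings):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)
```

For a query whose ranked list is relevant, irrelevant, relevant, irrelevant, AP = (1/1 + 2/3) / 2 ≈ 0.833.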
Table 3. Computation time of supervised learning-free and supervised learning based techniques.
Method | Query set (95 images) | Candidate set (3234 images)
Mean time of detection and description (interest points based approach) | 11.33 s | 10.23 s
Mean time of global description (CNN based approach) | 0.011 s | 0.012 s
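Assuming the times in Table 3 are mean per-image values (the table does not say so explicitly) and ignoring the matching cost, a quick calculation shows what the gap means for the full candidate set:

```latex
\begin{aligned}
T_{\text{local}}  &\approx 3234 \times 10.23\ \text{s} = 33\,083.82\ \text{s} \approx 9.19\ \text{h},\\
T_{\text{global}} &\approx 3234 \times 0.012\ \text{s} = 38.808\ \text{s} \approx 38.81\ \text{s},\\
\frac{T_{\text{local}}}{T_{\text{global}}} &\approx \frac{10.23}{0.012} = 852.5.
\end{aligned}
```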
5 Conclusion
In this paper, we have presented an overview of existing word retrieval techniques, both supervised-learning-based and supervised-learning-free. We then presented our two word-spotting approaches, based on interest point description and on CNN description, and highlighted the strengths and limitations of each. Interest point based techniques can be applied to segmentation-free word-spotting but require high computation time and large memory space. Conversely, the supervised learning-based technique is very efficient for image description but cannot be applied to segmentation-free word-spotting, because it needs pre-segmented word samples for training. A good compromise would be to develop techniques that combine the advantages of both. This is the subject of our future work.
References
1. Rath, T.M., Manmatha, R.: Features for word spotting in historical manuscripts. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), pp. 218–222. IEEE Computer Society, Edinburgh, UK (2003). https://doi.org/10.1109/ICDAR.2003.1227662
2. Rodriguez, J.A., Perronnin, F.: Local gradient histogram features for word spotting in unconstrained handwritten documents. 1st ICFHR, p. 6 (2008)
3. Rusiñol, M., Lladós, J.: Efficient logo retrieval through hashing shape context descriptors. In: Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (DAS'10), pp. 215–222. ACM Press, Boston, Massachusetts (2010). https://doi.org/10.1145/1815330.1815358
4. Wang, P.: Historical Handwriting Representation Model Dedicated to Word Spotting Application (2014)
5. Zagoris, K., Pratikakis, I., Gatos, B.: Unsupervised word spotting in historical handwritten document images using document-oriented local features. IEEE Trans. Image Process. 26, 4032–4041 (2017). https://doi.org/10.1109/TIP.2017.2700721
6. Konidaris, T., Kesidis, A.L., Gatos, B.: A segmentation-free word spotting method for historical printed documents. Pattern Anal. Appl. 19, 963–976 (2016). https://doi.org/10.1007/s10044-015-0476-0
7. Zagoris, K., Pratikakis, I., Gatos, B.: Unsupervised word spotting in historical handwritten document images using document-oriented local features. IEEE Trans. Image Process. 26, 4032–4041 (2017). https://doi.org/10.1109/TIP.2017.2700721
8. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282. IEEE, Shenzhen, China (2016). https://doi.org/10.1109/ICFHR.2016.0060
9. Zhong, Z., Pan, W., Jin, L., Mouchere, H., Viard-Gaudin, C.: SpottingNet: learning the similarity of word images with convolutional neural network for word spotting in handwritten historical documents. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 295–300. IEEE, Shenzhen, China (2016). https://doi.org/10.1109/ICFHR.2016.0063
10. Rothacker, L., Rusiñol, M., Fink, G.A.: Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1305–1309. IEEE, Washington, DC, USA (2013). https://doi.org/10.1109/ICDAR.2013.264
11. Wiggers, K.L., Britto, A.S., Heutte, L., Koerich, A.L., Oliveira, L.S.: Image retrieval and pattern spotting using siamese neural network. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Budapest, Hungary (2019). https://doi.org/10.1109/IJCNN.2019.8852197
12. Benabdelaziz, R., Gaceb, D., Haddad, M.: Word spotting based on bispace similarity for visual information retrieval in handwritten document images. Int. J. Comput. Vis. Image Process. 9, 38–58 (2019). https://doi.org/10.4018/IJCVIP.2019070103
13. Benabdelaziz, R., Gaceb, D., Haddad, M.: Word-spotting approach using transfer deep learning of a CNN network. In: 2020 1st International Conference on Communications, Control Systems and Signal Processing (CCSSP), pp. 219–224. IEEE, El Oued, Algeria (2020). https://doi.org/10.1109/CCSSP49278.2020.9151583
14. Puigcerver, J., Toselli, A.H., Vidal, E.: ICDAR2015 competition on keyword spotting for handwritten documents. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1176–1180. IEEE, Tunis, Tunisia (2015). https://doi.org/10.1109/ICDAR.2015.7333946
15. Retsinas, G., Louloudis, G., Stamatopoulos, N., Gatos, B.: Keyword spotting in handwritten documents using projections of oriented gradients. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 411–416. IEEE, Santorini, Greece (2016). https://doi.org/10.1109/DAS.2016.61
16. Sfikas, G., Retsinas, G., Gatos, B.: Zoning aggregated hypercolumns for keyword spotting. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 283–288. IEEE, Shenzhen, China (2016). https://doi.org/10.1109/ICFHR.2016.0061
17. Sudholt, S., Rothacker, L., Fink, G.A.: Learning local image descriptors for word spotting. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 651–655. IEEE, Tunis, Tunisia (2015). https://doi.org/10.1109/ICDAR.2015.7333842
18. Ghosh, S.K., Valveny, E.: A sliding window framework for word spotting based on word attributes. In: Paredes, R., Cardoso, J.S., Pardo, X.M. (eds.) Pattern Recognition and Image Analysis. Lecture Notes in Computer Science, vol. 9117, pp. 652–661. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19390-8_73
19. Zagoris, K., Pratikakis, I., Gatos, B.: Segmentation-based historical handwritten word spotting using document-specific local features. In: 2014 14th International Conference on Frontiers in Handwriting Recognition, pp. 9–14. IEEE, Greece (2014). https://doi.org/10.1109/ICFHR.2014.10
Handwritten Arabic Character Recognition: Comparison of Conventional Machine Learning and Deep Learning Approaches
Faouci Soumia1(B), Gaceb Djamel1, and Mohammed Haddad2
1 Laboratory of Computer Science, Modeling, Optimization, and Electronic Systems (LIMOSE), FS, M'Hamed Bouguerra University of Boumerdès, Boumerdès, Algeria
{s.faouci,d.gaceb}@univ-boumerdes.dz
2 Lab LIRIS, UMR CNRS 5205, University of Claude Bernard Lyon 1, 69622 Villeurbanne, France
Abstract. Over the last decades, automatic handwriting recognition has received a lot of attention, as it is a crucial component of many applications in various fields. Research on this problem has focused on handwriting recognition in Latin-script languages, and fewer studies have been dedicated to Arabic. In this paper, we propose and compare two approaches to classifying Arabic characters. The first is based on conventional machine learning, using an SVM classifier and comparing different sets of features commonly used in the pattern recognition field. The second is based on deep learning, testing different CNN (convolutional neural network) architectures, which learn the features of Arabic characters automatically. In this context, a new fast and simplified CNN architecture is proposed. We also test different transfer learning strategies on two versions of the OIHACDB dataset and on the AIA9K dataset proposed in the literature. In the experimental section, we show that the proposed CNN model achieves accuracies of 94.7%, 98.3%, and 95.2% on the test sets of the OIHACDB-28, OIHACDB-40, and AIA9K databases, respectively. Our experiments extend the tests already carried out on these datasets and show good results compared with the literature.
Keywords: CNN · Deep learning · Arabic handwritten character recognition · Transfer learning · Feature extractor (FE) · Fine-tuning (FT)
1 Introduction
Handwriting recognition has become one of the challenges of active research in the field of optical character recognition (OCR), as shown by its increasing use in many applications across different domains, such as automatic check processing in banks, signature verification, postal address recognition, writer identification, optical form reading, and computer-assisted transcription of ancient manuscripts [1, 2]. Fig. 1 shows the areas of use of handwritten text recognition.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1127–1138, 2021. https://doi.org/10.1007/978-3-030-70713-2_100
F. Soumia et al.
Fig. 1. Areas of use of handwriting recognition.
A handwriting recognition system recognizes human handwriting in any language, either from a scanned image of handwriting (offline recognition) or from handwriting produced in real time with a stylus on an electronic device (online recognition). An offline handwriting recognition system includes the following steps: image acquisition, preprocessing, segmentation, feature extraction, and classification. First, the handwriting to be recognized is digitized with scanners or cameras. The preprocessing step then removes noise and distortions from the scanned image so that it can be used for further processing. Next, the document image is segmented into lines, words, and individual characters; this is an important step for handwriting, since features are then extracted from each character image during the feature extraction step. Finally, these features are used for classification in the last step.
Handwriting is characterized by variations in the writing and style of different people. The main difficulties in recognizing handwriting are related to distortions and pattern variability. Several solutions have been proposed for the handwriting recognition problem, such as support vector machines (SVM) [3], K-nearest neighbors (KNN), neural networks (NN), and, more recently, convolutional neural networks (CNN) [4, 5]. However, previous research in this field has focused on handwriting recognition in Latin-script languages; Arabic has received less attention, as most existing systems are devoted to Latin scripts. One difficulty in recognizing the Arabic alphabet is that several characters share a similar shape (main part) but differ in the number or position of their dots, as shown in Fig. 2(a). Arabic script is also characterized by ligatures, formed by combining two or more letters, for example, Alif-Laam.
Besides, writers may join diacritical dots, use dashes instead of dots, or change the shape of characters. An example of this situation is shown in Fig. 2(b).
Fig. 2. (a) Examples of Arabic characters which have the same main part and which differ in the diacritic position; (b) Writers and intra-class variability, three samples of the letter “thaa”.
Handwritten Arabic Character Recognition
Deep learning (DL) is a machine learning approach that is most often based on neural networks. DL algorithms have taken first place in the field of object recognition. Among neural network types, the most widely used are convolutional neural networks (CNN) [6]. CNNs contain different types of layers: convolutional layers, also called feature extractor layers [7], pooling layers [8], and fully connected layers [7, 8]. CNNs also employ normalization and regularization techniques, such as batch normalization [9, 10]. CNN models can be used in three different ways: 1) training the CNN from scratch; 2) transfer learning, which reuses the features of a model pre-trained on another (larger or smaller) dataset; and 3) transfer learning combined with refining the weights of the CNN architecture (partial or full fine-tuning). In this article, we propose a simple and fast CNN model for offline Arabic handwriting recognition. The proposed model is evaluated on two datasets, OIHACDB [1] and AIA9K [21], and then compared to models based on conventional machine learning, using an SVM classifier and a selection of different sets of features. In addition, we developed and compared several CNN models based on transfer learning using different existing architectures (VGG, ResNet, etc.). We then present the results obtained from the different tests.
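The three ways of using a CNN listed above differ only in which weights are updated during training. The Python sketch below makes that explicit with a hypothetical, framework-agnostic helper, `trainable_layers`, that returns one flag per pre-trained layer; it assumes the new classification head is always trained and is not counted among the layers.

```python
def trainable_layers(n_layers, strategy, n_finetune=0):
    """Return one boolean per layer (True = weights updated during training).

    strategy: "scratch"   - every layer trained from random initialization;
              "extractor" - pre-trained layers frozen, only the new head trained;
              "finetune"  - the last `n_finetune` pre-trained layers are also
                            updated (n_finetune == n_layers: full fine-tuning).
    """
    if strategy == "scratch":
        return [True] * n_layers
    if strategy == "extractor":
        return [False] * n_layers
    if strategy == "finetune":
        if not 0 <= n_finetune <= n_layers:
            raise ValueError("n_finetune must be between 0 and n_layers")
        return [False] * (n_layers - n_finetune) + [True] * n_finetune
    raise ValueError(f"unknown strategy: {strategy}")
```

In Keras, such flags would be assigned to each layer's `trainable` attribute; in PyTorch, to the `requires_grad` attribute of each layer's parameters.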
2 Overview of the Arabic Handwriting Recognition Systems
Only a limited number of datasets have been proposed in the literature for Arabic handwriting recognition. The best-known are the Kharma/Ahmed/Ward database [11], IFHCDB [12], AHDB [13], CENPARMI [14], IFN/ENIT [15], ADBase [16], HACDB [17], AHCD [18], OIHACDB [1, 19], CMATERDB 3.3.1 [20], and AIA9K [21]. Many efforts have been made to solve the Arabic handwriting recognition problem. In [20], a set of 88 features was developed for Arabic handwritten digit recognition and used with a multi-layer perceptron (MLP) classifier trained by the back-propagation (BP) algorithm. The image dataset used is CMATERDB 3.3.1 [20], and the proposed model achieved an average accuracy of 94.93%. Two methods were presented in [22] to improve the work described in [20]: the first introduced changes to the existing MLP-based model, and the second uses a CNN trained with back-propagation. The MLP-based approaches reached an accuracy of 95.8%, while the CNN obtained 97.4%. In [24], an accuracy of 98.59% was obtained on the CMATERDB dataset using a Restricted Boltzmann Machine (RBM) [23]. CNNs were also used in [25] to recognize handwritten digits from several languages; the overall accuracy on the combined multilingual datasets was 99.26%. Recently, Ashiquzzaman et al. [26] improved the work proposed in [21]. Their changes consist of augmenting the data of the CMATERDB 3.3.1 dataset and using ELU instead of ReLU as the activation function; the accuracy of the proposed method reaches 99.4%. H. Miled worked at the level of PAWs (Parts of Arabic Words) [27]: the method first removes the diacritics, groups the PAWs belonging to different words, segments each pseudo-word into graphemes, extracts primitives from these graphemes, and then builds hidden Markov models (HMMs) of words.
In [28], the authors constructed an Arabic dataset containing 6,090 characters and 1,080 words and then used time-delay neural networks. The method achieves an accuracy
of 98.50% for letters and 96.90% for words. Maalej et al. [29] proposed a CNN-BLSTM hybrid model, in which a CNN is used as an automatic feature extractor from raw images, followed by a bidirectional long short-term memory (BLSTM) network and a connectionist temporal classification (CTC) layer for sequence labeling. This hybrid model achieves a recognition rate of 92.21% on the IFN/ENIT dataset. Another CNN-based method is presented in [7]: the authors proposed a CNN model with two convolutional layers, which achieved 94.9% accuracy on the AHCD test data. To recognize historical Arabic manuscript text, Alaasam et al. [30] proposed a CNN model with two convolutional layers, two fully connected layers, and a final fully connected layer; by combining the handwritten text image dataset with a synthesized text image dataset, they achieved an accuracy of 85%. For the recognition of Arabic handwritten alphanumeric characters, Mudhsh et al. [31] designed a deep alphanumeric neural network based on the VGG network, using regularization and data augmentation. They tested it on two datasets and obtained accuracies of 99.57% on ADBase and 97.32% on HACDB. The authors of the AIA9K dataset (isolated handwritten characters) [21] tested several features (GIST, HOG, LBP, SIFT, and SURF) with several classifiers (ANN, linear and non-linear SVM). The best rates obtained on this dataset are 94.28% with a non-linear SVM (SIFT) and 93.29% with an ANN (SIFT). Younis [10] proposed a CNN model with three convolutional layers followed by a fully connected layer; this method achieved accuracies of 94.7% and 94.8% on the AHCD [18] and AIA9K [21] datasets, respectively. Boufenar et al. proposed two versions of the OIHACDB database (OIHACDB-28 and OIHACDB-40) in [1, 19].
Two approaches were tested. The first is based on an artificial immune system (AIRS) [19], tested on the OIHACDB-28 dataset using various features such as zoning, the number of connected components, and loops; it obtained a recognition rate of 93.2%. The second is based on deep learning with two strategies: a convolutional neural network (CNN) based on the AlexNet architecture used as a feature extractor (FE), and a CNN architecture with fine-tuning [1]. Recently, Khayyat et al. [32] introduced a new approach to classifying images of Arabic manuscripts: they developed a DL model using the pre-trained MobileNet architecture to classify and predict the handwriting style (six styles) of Arabic manuscript images. Considering the great variability, complexity, challenges, and richness of the OIHACDB and AIA9K datasets, we chose to extend the experiments on them by developing and comparing two types of approaches: conventional machine learning using different features widely used in the pattern recognition field (feature selection), and deep learning using different CNN architectures (automatic feature extraction).
3 Proposed Methods
In this section, we present the methods that we proposed and tested on the OIHACDB [1, 19] and AIA9K [21] datasets. The first method, which we tested only on the OIHACDB-40 database, is based on conventional machine learning using an SVM classifier and different selected features. For the second method, we used both the OIHACDB and AIA9K databases, on which several CNN models (the proposed CNN, ResNet, Inception V3, and VGG16) were applied with different strategies.
Handwritten Arabic Character Recognition
1131
3.1 Method Based on Conventional Machine Learning
The choice of discriminating and relevant features is a key step for this kind of method. To this end, we evaluated and compared the discriminating power of various features on images of Arabic handwritten characters. These features fall into three categories: 1) statistical, 2) structural, and 3) global transformations. First, we evaluated two statistical features: zoning and hierarchical centroid. For zoning, the image of the shape is split into n zones, and the density of black pixels is calculated for each zone. The hierarchical centroid features are extracted by computing the centroid of the image and then recursively dividing the image into two sub-images at the centroid, alternating between the two axes (by transposing the image) at each level. Second, among structural features, we chose to test a feature based on line segments extracted from the different types of lines in the character shape. For the global transformation features, we tested primitives extracted from Gabor filters (mean amplitude and mean energy) and Zernike moments. Other features were also tested, such as histograms of oriented gradients (HOG) [33], SURF, and SIFT [33], which can belong to all three categories. Combinations of different features were also tested. For the classification module, we used a linear SVM classifier.

3.2 Method Based on Deep Learning Using CNN Models
The second method uses a CNN and a deep learning process to automatically extract the features needed to recognize Arabic handwritten characters. Fig. 3 shows the simplified CNN architecture that we developed and compared with other existing architectures. It consists of five convolutional blocks, each of which applies a ReLU activation function followed by a batch normalization operation. The normalization is followed by a max pooling layer with a 2 × 2 window.
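As a rough illustration of the block structure, the NumPy sketch below runs a forward pass through five blocks of convolution, ReLU, batch normalization, and 2 × 2 max pooling. It is not the authors’ Keras implementation: the 3 × 3 kernel size, the 64 × 64 input size, the random weights, and the use of batch statistics in batch normalization are assumptions, while the filter counts (32, 64, 64, 128, 128) are the ones this section reports for AIA9K.

```python
import numpy as np

def conv2d_same(x, w, b):
    """'Same'-padded 2-D convolution (cross-correlation, as in CNN libraries).
    x: (N, H, W, C_in), w: (kh, kw, C_in, C_out) with odd kh, kw, b: (C_out,)."""
    kh, kw = w.shape[:2]
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))  # zero padding
    n, h, wd, _ = x.shape
    out = np.zeros((n, h, wd, w.shape[3]))
    for i in range(kh):
        for j in range(kw):
            # shifted input window (N, H, W, C_in) @ (C_in, C_out) -> (N, H, W, C_out)
            out += xp[:, i:i + h, j:j + wd, :] @ w[i, j]
    return out + b

def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalize each channel with statistics over the batch and spatial axes."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 (an odd last row/column is dropped)."""
    n, h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:, :2 * h2, :2 * w2, :]
    return x.reshape(n, h2, 2, w2, 2, c).max(axis=(2, 4))

def conv_block(x, w, b, gamma, beta):
    """One block as described: convolution -> ReLU -> batch norm -> 2 x 2 max pool."""
    z = np.maximum(conv2d_same(x, w, b), 0.0)  # ReLU
    z = batch_norm(z, gamma, beta)             # BN applied after the activation
    return max_pool_2x2(z)

rng = np.random.default_rng(0)
x = rng.random((2, 64, 64, 1))  # two hypothetical 64 x 64 grayscale images
for c_out in (32, 64, 64, 128, 128):
    w = rng.normal(0.0, 0.1, (3, 3, x.shape[-1], c_out))
    x = conv_block(x, w, np.zeros(c_out), np.ones(c_out), np.zeros(c_out))
print(x.shape)  # (2, 2, 2, 128): 64 -> 32 -> 16 -> 8 -> 4 -> 2
```

Because each block halves the spatial resolution, five blocks reduce a 64 × 64 input to 2 × 2 before the fully connected layers; a different input size changes only these intermediate shapes.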
After the convolutional layers, our architecture includes a fully connected (FC) layer with an output of size 512, a regularization operation (dropout with a “keep probability” parameter of 20%), and finally an output layer, which is a 40-class softmax layer for the OIHACDB-40 dataset and a 28-class softmax layer for the OIHACDB-28 and AIA9K datasets. For the AIA9K dataset, we only changed the number of filters in the convolutional layers (32, 64, 64, 128, and 128 filters). To update the weights during training, we used categorical cross-entropy as the cost function, which is the appropriate cost function for multi-class classification problems. To optimize the CNN model, we tested two methods: stochastic gradient descent (SGD) and Adam. Adam was chosen because it gave better results, with a learning rate lr = 0.001. To compare our architecture with other CNN models based on the VGG16, ResNet, Inception V3, and AlexNet architectures, we tested two strategies: (1) learning from scratch and (2) transfer learning. For the first strategy, we used the VGG16 and AlexNet architectures initialized with random weights and trained for a number of epochs. The second strategy focused on fine-tuning, comparing the pre-trained models VGG16, Inception V3, and ResNet50. These architectures were exploited in two ways: 1) using the pre-trained models as feature extractors, and 2) keeping the transfer learning strategy while also refining (fine-tuning) the weights of the CNN architecture.
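The cost function and optimizer named above can be sketched in NumPy as follows. This is not the Keras implementation: it trains only a hypothetical final 28-class softmax layer on random 512-dimensional activations (standing in for the FC output), and the Adam hyperparameters other than lr = 0.001 (β1 = 0.9, β2 = 0.999, ε = 1e-7) are assumed common defaults.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean over the batch of -sum_c y_c * log(p_c)."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

class Adam:
    """Adam with bias-corrected first and second moment estimates."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Train only a hypothetical 28-class softmax layer on top of 512 FC activations.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 512))                   # random stand-in activations
y = np.eye(28)[rng.integers(0, 28, size=64)]     # one-hot labels
W = np.zeros((512, 28))
opt = Adam(lr=0.001)
loss0 = categorical_cross_entropy(y, softmax(X @ W))   # ln(28) with zero weights
for _ in range(50):
    p = softmax(X @ W)
    grad = X.T @ (p - y) / len(X)   # gradient of the mean cross-entropy w.r.t. W
    W = opt.step(W, grad)
loss1 = categorical_cross_entropy(y, softmax(X @ W))
print(f"{loss0:.4f} -> {loss1:.4f}")
```

With zero weights every class gets probability 1/28, so the initial loss is ln 28; in the full model the same gradient would be back-propagated through the FC and convolutional layers.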
1132
F. Soumia et al.
Fig. 3. Simplified architecture of the proposed CNN (CB: convolutional base, FC: fully connected layers).
4 Evaluation and Results
The proposed models are implemented in the Python programming language using the TensorFlow and Keras libraries. We also used Matlab and C++Builder for feature extraction and Tanagra for the SVM classifier. To evaluate the performance of the proposed models, we calculate the prediction accuracy. Our simplified, fast CNN model was trained for at most 50 epochs. We applied the proposed models to the OIHACDB and AIA9K (8737 images) databases. OIHACDB has two versions: OIHACDB-28, which includes 28 classes of Arabic handwritten characters (5600 images, 200 examples per class), and OIHACDB-40 (40 classes, 30000 images, 750 examples per class). The OIHACDB-28 and OIHACDB-40 databases are each divided into two groups: 75% of the images are used for training and 25% for testing. The AIA9K image dataset [21] is divided into three groups: 70% for training, 15% for testing, and 15% for validation. Table 1 summarizes the different feature extraction approaches that we tested (with the SVM classifier) on the OIHACDB-40 database, together with the results obtained. Table 1 shows that combining certain features improves system performance considerably. The best recognition rate (97%) is achieved by the combination of line segments, Gabor filter, hierarchical centroid, HOG, and densities. These features were selected using the SBS (sequential backward selection) method. The observed confusions are often due to strong deformations and to the shape similarity of certain different letters (e.g., Thaa and Taa).
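The two statistical descriptors of Sect. 3.1 that appear in the best combination, zoning densities and the hierarchical centroid, can be sketched as follows. This is a simplified illustration, not the authors’ Matlab code: the 4 × 4 zoning grid, the recursion depth of 3, the convention that ink pixels equal 1, and the normalization of each centroid to [0, 1] are assumptions. Forming a combination by concatenating the per-feature vectors (np.concatenate) is likewise an assumption about how the combinations in Table 1 were built.

```python
import numpy as np

def zoning_densities(img, rows=4, cols=4):
    """Fraction of ink pixels (value 1) in each cell of a rows x cols grid."""
    h, w = img.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            zone = img[r * h // rows:(r + 1) * h // rows,
                       c * w // cols:(c + 1) * w // cols]
            feats.append(zone.mean() if zone.size else 0.0)
    return np.array(feats)

def hierarchical_centroid(img, depth=3):
    """Recursively split the image at its ink centroid, alternating vertical and
    horizontal cuts, and record each region's normalized centroid (row, col).
    Returns 2 * (2**depth - 1) values."""
    feats = []

    def recurse(sub, level, axis):
        if level == depth:
            return
        h, w = sub.shape
        if sub.size and sub.any():
            ys, xs = np.nonzero(sub)
            cy, cx = ys.mean(), xs.mean()
            feats.extend([cy / max(h - 1, 1), cx / max(w - 1, 1)])
        else:
            cy, cx = (h - 1) / 2, (w - 1) / 2
            feats.extend([0.5, 0.5])  # no ink: use the normalized geometric center
        size, c = (w, cx) if axis == 1 else (h, cy)
        k = min(max(int(round(c)), 1), max(size - 1, 1))  # cut index
        halves = (sub[:, :k], sub[:, k:]) if axis == 1 else (sub[:k, :], sub[k:, :])
        for half in halves:
            recurse(half, level + 1, 1 - axis)

    recurse(img, 0, 1)  # start with a vertical cut (axis 1 = columns)
    return np.array(feats)

img = np.zeros((32, 32), dtype=int)
img[4:28, 10:14] = 1   # a hypothetical vertical stroke
img[24:28, 10:26] = 1  # and a horizontal base
features = np.concatenate([zoning_densities(img), hierarchical_centroid(img)])
print(features.shape)  # (30,) = 16 zoning values + 14 centroid values
```

Empty or ink-free regions still contribute a fixed (0.5, 0.5) pair, so every image yields a vector of the same length, which the SVM requires.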
Table 1. Recognition rate obtained by the different features on the OIHACDB-40 dataset.
Features used | Recognition rate using SVM
Line segments | 61%
Gabor | 66%
Hierarchical centroid | 55%
Pixel density | 46%
Zernike moments | 74%
HOG | 74%
SIFT | 69%
SURF | 74%
Gabor + Line segments | 82%
Gabor + HOG | 90%
Gabor + Hierarchical centroid | 79%
Gabor + SURF | 85%
Line segments, Gabor, Hierarchical centroid | 86%
Line segments, Gabor, HOG | 93%
Line segments, Gabor, Hierarchical centroid, HOG | 94%
Line segments, Gabor, Hierarchical centroid, HOG, densities | 97%
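Sequential backward selection, which was used to pick the best combination above, can be sketched as a greedy loop over feature groups. In the paper the score would be the SVM recognition rate (computed with Tanagra, not reproduced here); the `toy_score` function below uses made-up additive weights purely for illustration and does not reproduce the values of Table 1.

```python
def sequential_backward_selection(groups, score, min_size=1):
    """Greedy SBS: start from all feature groups, repeatedly remove the group whose
    removal yields the highest score, and return the best subset seen overall."""
    current = list(groups)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_size:
        # evaluate every subset obtained by dropping exactly one group
        candidates = [(score([g for g in current if g != drop]),
                       [g for g in current if g != drop]) for drop in current]
        cand_score, current = max(candidates, key=lambda t: t[0])
        if cand_score > best_score:
            best_score, best_subset = cand_score, list(current)
    return best_subset, best_score

# Hypothetical per-group contributions (illustrative only, not measured values).
WEIGHTS = {"Line segments": 0.20, "Gabor": 0.30, "Hierarchical centroid": 0.10,
           "HOG": 0.25, "Densities": 0.05, "Zernike": -0.02, "SIFT": -0.05}

def toy_score(subset):
    return sum(WEIGHTS[g] for g in subset)

best, value = sequential_backward_selection(list(WEIGHTS), toy_score)
print(best, round(value, 2))
```

SBS needs only one score evaluation per remaining group at each step, which is far cheaper than testing all subsets; the price is that a greedy removal can miss the globally best combination.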
Table 2 shows the results obtained by applying the CNN models with the two strategies and the different architectures (second method) on the test sets of the AIA9K, OIHACDB-28, and OIHACDB-40 databases. It can be seen that the proposed CNN model achieves accuracies of 94.7% and 98.3% on the test sets of OIHACDB-28 and OIHACDB-40, respectively. Additionally, on these two databases, the VGG16 model used as a feature extractor gives the highest test accuracy (99%). This is higher than the results obtained by Boufenar et al. [1] with the same strategy using AlexNet as the pre-trained model (88.08% for OIHACDB-28 and 82.38% for OIHACDB-40). On the AIA9K image dataset, the highest accuracy (95.2%) was obtained by our proposed CNN model. This result is better than those reported in related works [10, 21] (94.8% and 94.28%, respectively). On this dataset, the models used in transfer learning mode did not improve accuracy.
Table 2. Results of CNN models using different architectures with the two strategies on the AIA9K, OIHACDB-28, and OIHACDB-40 datasets, and comparison with other approaches from the literature, including approaches without deep learning. FT: fine-tuning; FE: feature extraction mode.
Methods | Accuracy OIHACDB-28 | Accuracy OIHACDB-40 | Accuracy AIA9K
Simplified CNN model | 94.7% | 98.3% | 95.2%
VGG from scratch | – | 85.7% | –
AlexNet from scratch | 96% | 92% | 91.5%
ResNet50 architecture (FT) | 87% | 91% | 74.1%
InceptionV3 architecture (FT) | 93.1% | 90.4% | 88.5%
VGG16 architecture (FE) | 99% | 99% | 92.1%
AlexNet (FE) [1] | 88.08% | 82.38% | –
AlexNet (FT) [1] | 98.12% | 98.12% | –
CNN model [10] | – | – | 94.8%
Other methods:
AIRS [19] | 93.25% | – | –
Decision tree [19] | 91.66% | – | –
Naive Bayes [19] | 90.67% | – | –
KNN [19] | 76.91% | – | –
RF (Random forest) [19] | 95.6% | – | –
We also varied the number of convolutional layers in the proposed CNN model and noticed that using different numbers of convolutional layers with different numbers of filters could help achieve better accuracy. Note that the system’s classification errors are caused by characters of similar morphology, such as “Daal” versus “Raa” and “Zaay”, or by characters that differ by diacritics, such as “Raa” and “Zaay”. Fig. 4 shows the training and validation accuracy and loss curves of the simplified CNN model on the OIHACDB-28, OIHACDB-40, and AIA9K datasets.
Fig. 4. Training and validation accuracy and loss of the simplified CNN model during training: (a) AIA9K dataset, (b) OIHACDB-40 dataset, (c) OIHACDB-28 dataset.
5 Conclusion
This paper deals with the recognition of Arabic handwritten characters, a problem made difficult by the complexity of Arabic script, the great variety of shapes of the same character, intra- and inter-writer variability, etc. In this context, two approaches were developed and compared: a method based on conventional machine learning using an SVM classifier with several sets of features (SURF, SIFT, Zernike, Gabor, etc.), and a method based on deep learning using five different CNN architectures (simplified CNN, ResNet, Inception V3, AlexNet, and VGG16). The experiments complement, extend, and enrich existing work on three image databases, OIHACDB-28, OIHACDB-40, and AIA9K, which were built to present challenges for recognition systems. The results show the performance of the developed approaches and their efficiency compared with those of the literature. They also show the superiority of deep learning approaches over traditional learning approaches. These findings provide useful directions for developing effective Arabic handwriting recognition systems. In future work, we plan to use more databases. We also want to train the ResNet and Inception V3 architectures from scratch, instead of using transfer learning from these pre-trained models, and compare the results. Future work might also include testing more pre-trained models, such as VGG19 and Inception V4. Another possibility would be to combine a CNN with an SVM, which may help achieve better results.
References
1. Chaouki, B., Adlen, K., Mohamed, B.: Investigation on deep learning for off-line handwritten Arabic character recognition. Cogn. Syst. Res. 50, 180–195 (August 2018)
2. Rashid, S., Schambach, M., Rottland, J., Null, S.: Low resolution Arabic recognition with multidimensional recurrent neural networks. In: Proceedings of the 4th International Workshop on Multilingual OCR, New York, p. 6 (2013)
3. Ait Aider, M., Hammouche, K., Gaceb, D.: Recognition of handwritten characters based on wavelet transform and SVM classifier. Int. Arab J. Inf. Technol. 15(6), 1082–1087 (2018)
4. Baldominos, A., Saez, Y., Isasi, P.: A survey of handwritten character recognition with MNIST and EMNIST. Appl. Sci. 9(15), 3169 (2019). https://doi.org/10.3390/app9153169
5. Ramzan, M., Khan, H.U., Awan, S.M., Akhtar, W., Ilyas, M., Mahmood, A., Zamir, A.: A survey on using neural network based algorithms for handwritten digit recognition. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 9(9), 519–528 (2018)
6. Dhillon, A., Verma, G.K.: Convolutional neural network: a review of models, methodologies and applications to object detection. Prog. Artif. Intell. 9, 85–112 (2020). https://doi.org/10.1007/s13748-019-00203-0
7. El-Sawy, A., Loey, M., Hazem, E.: Arabic handwritten characters recognition using convolutional neural network. WSEAS Trans. Comput. Res. 5(1), 11–19 (2017)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), vol. 1, pp. 1097–1105. Curran Associates Inc., Red Hook, NY, USA (2012)
9. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML), France, vol. 37, pp. 448–456 (2015)
10. Younis, K.: Arabic handwritten character recognition based on deep convolutional neural networks. Jordanian J. Comput. Inf. Technol. (JJCIT) 3(3), 186–200 (2018)
11. Kharma, N., Ahmed, M., Ward, R.: A new comprehensive database of handwritten Arabic words, numbers, and signatures used for OCR testing. In: IEEE Canadian Conference on Electrical and Computer Engineering 1999, Canada, vol. 2, pp. 766–768. IEEE (1999)
12. Saeed, M., Karim, F., Farhad, F., Majid, Z., Mohamad, G.: A comprehensive isolated Farsi/Arabic character database for handwritten OCR research. In: 10th International Workshop on Frontiers in Handwriting Recognition, Université de Rennes 1, La Baule, France, p. 5 (October 2006)
13. Somaya, A., Dave, E., Colin, H.: A database for Arabic handwritten text recognition research. Int. Arab J. Inf. Technol. 1, 117–121 (2004)
14. Huda, A., Javad, S., Ching, Y., Suen, N.: A novel comprehensive database for Arabic offline handwriting recognition. In: Computer Science and Software Engineering Department, Concordia University, 1455 de Maisonneuve Blvd. West, Montreal, Quebec, Canada, p. 6 (2008)
15. Mario, P., Samia, S., Volker, M., Ellouze, N., Hamid, A.: IFN/ENIT - database of handwritten Arabic words. In: Francophone International Conference on Writing and Document, Tunis, pp. 1–8 (2002)
16. Sherif, A., Ezzat, E.: Arabic handwritten digit recognition. Doc. Anal. Recogn. 11(3), 127–141 (2008)
17. Lawgali, A., Angelova, M., Bouridane, A.: HACDB: Handwritten Arabic characters database for automatic character recognition. In: EUVIP 2013: Proceedings of the 4th European Workshop on Visual Information Processing, pp. 255–259. IEEE, Piscataway, NJ (2013)
18. El-Sawy, A., Loey, M., El-bakry, H.: Arabic handwritten characters recognition using convolutional neural network. WSEAS Trans. Comput. Res. 5, 11–19 (2017)
19. Chaouki, B., Mohamed, B., Marc, S.: An artificial immune system for offline isolated handwritten Arabic character recognition. Evol. Syst. 9, 25–41 (2018). https://doi.org/10.1007/s12530-016-9169-1 (2016)
20. Das, N., Ayatullah, F., Sudip, S., Syed, S.: Handwritten Arabic numeral recognition using a multi-layer perceptron. In: Proceedings of the National Conference on Recent Trends in Information Systems, pp. 200–203 (2006)
21. Torki, M., Hussein, M.E., Elsallamy, A., Fayyaz, M., Yaser, S.: Window-based descriptors for Arabic handwritten alphabet recognition: a comparative study on a novel dataset. arXiv:1411.3519 (2014)
22. Ashiquzzaman, A., Tushar, A.K.: Handwritten Arabic numeral recognition using deep learning neural networks. In: IEEE International Conference on Imaging, Vision and Pattern Recognition (2017), Bangladesh, pp. 1–4. IEEE (2017)
23. Zhang, N., Ding, S., Zhang, J., Xue, Y.: An overview on restricted Boltzmann machines. Neurocomputing 275, 1186–1199 (2018)
24. Alani, A.: Arabic handwritten digit recognition based on restricted Boltzmann machine and convolutional neural networks. Information 8(4), 142 (2017)
25. Latif, G., Alghazo, J., Alzubaidi, L., Naseer, M.M., Alghazo, Y.: Deep convolutional neural network for recognition of unified multi-language handwritten numerals. In: 2nd IEEE International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), London, pp. 90–95. IEEE (2018)
26. Ashiquzzaman, A., Tushar, A.K., Rahman, A., Mohsin, F.: An efficient recognition method for handwritten Arabic numerals using CNN with data augmentation and dropout. In: Balas, V.E., Sharma, N., Chakrabarti, A. (eds.) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, pp. 299–30914. Springer, Singapore (2019)
27. Miled, H., Olivier, C., Cheriet, M., Romeo-Pakker, K.: Une méthode rapide de reconnaissance de l’écriture arabe manuscrite [A fast method for recognizing handwritten Arabic script]. In: 16th GRETSI Colloquium, Grenoble, pp. 857–860 (1997)
28. Mars, A., Antoniadis, G.: Arabic online handwriting recognition using neural network. Int. J. Artif. Intell. Appl. (IJAIA) 7(5) (September 2016)
29. Maalej, R., Kherallah, M.: Convolutional neural network and BLSTM for offline Arabic handwriting recognition. In: International Arab Conference on Information Technology (ACIT) 2018, Werdanye, Lebanon, pp. 1–6 (2018)
30. Alaasam, R., Kurar, B., Kassis, M., El-Sana, J.: Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text. In: 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, pp. 124–128 (2017)
31. Mudhsh, M.A., Almodfer, R.: Arabic handwritten alphanumeric character recognition using very deep neural network. Information 8(3), 105 (2017)
32. Khayyat, M., Elrefaei, L.: A deep learning based prediction of Arabic manuscripts handwriting style. Int. Arab J. Inf. Technol. 17(5), 1–10 (2020)
33. Sidheswar, R., Arun, K., Chandrabhanu, M.: Analysis of various image feature extraction methods against noisy image: SIFT, SURF and HOG. In: 2nd International Conference on Electrical, Computer and Communication Technologies (ICECCT), India, pp. 1–5 (2017)
Document Image Edge Detection Based on a Local Hysteresis Thresholding and Automatic Setting Using PSO
Mohamed Benkhettou¹ (corresponding author), Nibel Nadjeh², and Djamel Gaceb¹
¹ LIMOSE Laboratory, M’hamed Bougara University, Boumerdès, Algeria
² LMCS Laboratory, Higher National School of Computer Science – ESI, Algiers, Algeria
Abstract. Image segmentation is a persistent problem in the field of computer vision. Year after year, new trends bring the latest research advances, in the hope of reaching the goal of optimal, ideal image segmentation. In this article, we implement an optimization mechanism based on the PSO algorithm, which provides the key elements for a sound analysis of the impact of parameterization on segmentation quality. We propose an improvement of Pratt’s metric that makes it symmetric, the use of a distance map to reduce computation time, a locally adaptive hysteresis thresholding approach that appears promising, and another approach based on Hossain’s work. Our experiments were performed on a new image bank that we built by merging the document image datasets presented during the DIBCO competitions; it sets up a multitude of different challenges, ranging from images of printed text to degraded manuscripts.
Keywords: Image segmentation · Edge detection · Quality evaluation · Combinatorial optimization · PSO
1 Introduction
Image segmentation is the technique of dividing a digital image into multiple segments in order to simplify it. It is used to detect, extract, and recognize any type of information carried by an image, and it is considered one of the most important steps in the image analysis process. Poor segmentation can cause the entire computer vision process to fail, especially in tasks that mimic human vision, such as facial recognition [1], character recognition and handwritten word search in documents [2], various detection tasks (tumors in medical imaging [3], intrusions and dangerous objects in surveillance), and recognition or identification of objects/persons [4]. The constant quest for better performance makes segmentation essential. However, optimal and generic segmentation has still not been achieved and remains an open problem after half a century of effort and thousands of papers and communications. Existing approaches include region-based approaches (region growing, split and merge, etc.) [5], pixel classification approaches (e.g., K-means, Fisher, histogram mode separation, mean shift)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1139–1150, 2021. https://doi.org/10.1007/978-3-030-70713-2_101
1140
M. Benkhettou et al.
[6], edge detection approaches [7], and their hybridizations [8]. The evolution of artificial intelligence has opened new possibilities, and neural methods (ANNs, CNNs, etc.) have appeared; we classify them as follows: a) low-level objective methods, which specialize in performing a low-level task (edge detection, region detection, etc.); b) high-level objective methods, which directly target the goal without explicitly using low-level processing (facial detection and recognition, object detection, etc.); and c) hybrid (multi-objective) methods, which combine both, semantic segmentation methods being a good example [9]. Image edge detection has been one of the most studied problems since the beginning of work on image analysis, on access to the visual content of scanned documents, and on pattern recognition (OCR, OMR, document classification, postal sorting, signature recognition, etc.). This is largely due to the very intuitive nature of edges, which naturally appear as the ideal visual cue in most situations. There is currently no complete and general process that can extract all types of edges in all situations (blur, degradation, noise, poor lighting, etc.); thus edge detection, like image segmentation, remains an unsolved problem. This paper is organized as follows: related work on image segmentation is presented in Sect. 1, our proposed approach in Sect. 2, experiments and results in Sect. 3, and a global discussion in Sect. 4; a conclusion is given in the last section.

1.1 Overview of Edge Detection
Although perfect segmentation has never existed and still does not exist today, some of its techniques, including first-order derivative methods, have stood out for their simplicity and their correct results in certain applications. This has attracted many researchers, even today, after the emergence and development of deep learning techniques [1–4].
The most classical edge detectors, such as Roberts, Prewitt, Sobel, Compass, Kirsch, and Robinson [4, 11, 12], give fairly good results on good-quality images but have limitations on degraded images (such as images of old documents). Second-order derivative methods, based on detecting the zero-crossings of the second derivative, have been proposed in [12]; they are more effective on blurred or low-contrast images but are not recommended for degraded or noisy images because of their high sensitivity to variations. Other, more elaborate approaches have been proposed to detect edges optimally by defining in advance the criteria to be met as well as possible. According to Canny [7, 11–13], optimal edge detection is based on three criteria: a) good detection (a strong response even at weak edges); b) good localization (exact localization of the edges); and c) uniqueness of response (a single response for each edge, without duplicates). These criteria are expressed as functions and form the basis of evaluation methods for contour segmentation. Canny aimed to best satisfy these criteria with the following steps: 1) image smoothing, 2) gradient calculation with the Sobel operator, 3) extraction of local maxima, and 4) hysteresis thresholding of the resulting map [14]. Several other improvements were presented in [7]; we also cite the work of Deriche [15] and of Shen and Castan [16]. Other approaches based on information content analysis have been proposed, such as statistical contours [17], Pb, which uses local luminance, color, and texture [18], gPb [19], etc. Recently, conventional and deep neural network approaches (mainly based on U-Net architectures) have emerged, such as the Boosted Edge Learning (BEL) approach [20], based on supervised
Document Image Edge Detection Based on a Local Hysteresis
1141
learning of contours and boundaries, multiscale or deep descriptors [21], which have been applied to edge detection in natural images, Sketch Tokens [22], and structured contours learned from textures using random forests [23]. With the advent and development of CNNs, the importance of hierarchical (deep) feature extraction has been emphasized, with methods such as DeepContour [24], DeepEdge [25], and CSCNN [26]; many other references are cited in [27]. These methods, which are more complex, generally provide better robustness to noise and image degradation. Metaheuristic approaches have also been used to optimize edge detection in order to improve handwriting recognition [28]. Current research combines different learning techniques and architectures, aiming to reduce training and detection times and to increase the relevance of detections [27]. The field nevertheless remains open to experimentation, especially with GPU parallelism and the latest transfer learning techniques. We note that the problem has shifted over time. For our part, we study the genericity of edge detection parameter settings in the face of the great variability of document images in terms of quality and complexity. Segmentation quality evaluation methods fall into three main types: supervised (with a reference image serving as ground truth), unsupervised (using only the information present in the image), and algorithm analysis [10].
2 Proposed Approach
Our approach consists of a quantitative analysis of the impact of the various parameters involved throughout the entire segmentation process, with the aim of achieving optimal segmentation. We therefore propose a purely supervised segmentation architecture driven by the Particle Swarm Optimization (PSO) algorithm, which uses the conventional evaluation metrics of contour segmentation, characterizing the criteria of good segmentation, as objective functions to be optimized. The result is a near-optimal segmentation of an image for a given operator. This allows us to analyze how the parameterization varies and how it affects the complete segmentation process. The proposed architecture is shown in Fig. 1. We use Deriche’s detector, as it offers a good compromise between segmentation quality, execution speed, and the number of parameters to vary, in particular the parameter α that controls the strength of the smoothing. The resulting gradient map is thresholded by hysteresis, which introduces a low and a high threshold; three parameters are therefore involved. As long as the metric is not optimized, the parameters are adjusted. If the smoothing parameter has changed, a new iteration starts from the first segmentation step; if only the thresholds have changed, it is not necessary to go back through the first two steps. Once the metric is optimized, we analyze the impact of the optimal set of parameters on the segmentation according to two criteria, good localization and good detection; the criterion of uniqueness of response is guaranteed by edge thinning during the extraction of local maxima. Good detection is the basis for classifying pixels after segmentation: the result is good where the contours and the background are correctly detected, and bad where there has been over-detection or under-detection.
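The two criteria can be measured on binary edge maps. The sketch below gives the F-measure and PSNR (good detection) and the symmetric Pratt variant of Eq. (2) in Sect. 2.2 (good localization), using a 3-4 chamfer distance map divided by 3 as the fast approximation of the Euclidean distance. The 3-4 weights, the peak value of 1 for PSNR on binary maps, and the function names are assumptions made for illustration, not the authors’ code.

```python
import numpy as np

def chamfer_distance_map(edges):
    """3-4 chamfer distance from every pixel to the nearest edge pixel, computed
    with one forward and one backward raster pass, divided by 3 so that one
    horizontal/vertical step counts as 1."""
    h, w = edges.shape
    big = 10 ** 9
    d = np.where(edges, 0, big).astype(np.int64)
    fwd = ((0, -1, 3), (-1, -1, 4), (-1, 0, 3), (-1, 1, 4))  # left and upper neighbours
    bwd = ((0, 1, 3), (1, 1, 4), (1, 0, 3), (1, -1, 4))      # right and lower neighbours
    for mask, ys, xs in ((fwd, range(h), range(w)),
                         (bwd, range(h - 1, -1, -1), range(w - 1, -1, -1))):
        for y in ys:
            for x in xs:
                for dy, dx, cost in mask:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and d[yy, xx] + cost < d[y, x]:
                        d[y, x] = d[yy, xx] + cost
    return d / 3.0

def f_measure(pred, gt):
    """Harmonic mean of precision and recall over edge pixels."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def psnr(pred, gt):
    """PSNR between binary maps with peak value 1."""
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(1.0 / mse))

def symmetric_pratt(pred, gt):
    """Eq. (2): mean over I_Cont ∪ I_gt of 1 / (1 + (d(k, I_gt) - d(k, I_Cont))**2)."""
    union = pred | gt
    if not union.any():
        return 1.0
    d_gt = chamfer_distance_map(gt)[union]      # distance to nearest ground-truth edge
    d_cont = chamfer_distance_map(pred)[union]  # distance to nearest detected edge
    return float(np.mean(1.0 / (1.0 + (d_gt - d_cont) ** 2)))

gt = np.zeros((5, 5), dtype=bool)
gt[:, 2] = True     # vertical ground-truth edge
pred = np.zeros((5, 5), dtype=bool)
pred[:, 3] = True   # the same edge shifted by one pixel
print(f_measure(pred, gt), symmetric_pratt(pred, gt), round(psnr(pred, gt), 2))
# 0.0 0.5 3.98: the shift is fully penalized by the F-measure but only partly by Pratt
```

The example illustrates the complementarity motivating the bi-objective Pratt/F-measure optimization: a one-pixel shift zeroes the F-measure, while the distance-based term still credits the edge as nearby.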
Good localization consists of determining whether a pixel correctly assigned to its class (contour/non-contour) is well located and
Fig. 1. The global schema of our architecture.
does not suffer any displacement due to smoothing. Note that incorrect pixel placement can result in a displaced edge at a given point. The magnitude of this type of anomaly varies with the field of application: in some applications, where the exact location has little impact on the goal of the segmentation, it is useful to preserve the contours even if they are shifted by a few pixels. For this reason, an evaluation allowing some flexibility is desirable, which is possible using distance calculations.

2.1 Parameter Optimization Using PSO
The three parameters involved in this segmentation form a three-dimensional search space, and adjusting them is a combinatorial optimization problem [29]. We set α ∈ [0.5, 7.0] with a step of 0.1, the low threshold ∈ [5, 30], and the high threshold ∈ [10, 80], the last two with a step of 1. We use PSO, a robust, fast algorithm that is relatively easy to parameterize and offers a good compromise between result quality and search time [30]. It is a classical and popular swarm intelligence algorithm, based on collective intelligence and simulating the social behaviour of birds. Each individual is called a particle, and the search space is the area that all the particles can explore. A particle at position X represents a set of admissible parameters for evaluating the objective function; it is therefore a candidate solution. At time t, each particle has a position X, a velocity V, and a memory of the best position it has ever visited, Pb (the one that gave it the best value of the objective function). Over the iterations, each particle updates its parameters using its best position Pb (the individual component),
the best position ever visited by the whole swarm, Gb (the social component), and its current position and velocity, according to the following formulas [30]:

$$ V_i^t = c_1 V_i^{t-1} + c_2 \left( Pb_i^t - X_i^t \right) + c_3 \left( Gb^t - X_i^t \right), \qquad X_i^{t+1} = X_i^t + V_i^t \qquad (1) $$

where c1 is the inertia weight of the particle, c2 the importance given to the individual component, and c3 the importance given to the social component. We use PSO in both its mono-objective and multi-objective versions [30, 31], with the settings recommended in [32].

Evaluation Metrics and Objective Functions. Using evaluation metrics as objective functions allows quality-driven control of the segmentation. For this reason, we use the metrics best suited to our chosen criteria. For good localization, we use Pratt’s figure of merit (FOM) [25], as it tolerates the displacement that can result from excessive smoothing. For good detection, we use the F-measure, which is very strict for this task, and PSNR [11, 12], which is more concerned with signal levels but is nevertheless a fairly good estimator when combined with other metrics. In addition, we perform a bi-objective optimization with Pratt/F-measure, aiming to show the correlation between the two metrics representing the two criteria.
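Equation (1) can be sketched as a mono-objective PSO over the three parameters (α, low threshold, high threshold), with each position snapped onto the grid defined at the start of Sect. 2.1. Several elements are assumptions, not taken from the paper: the random factors r2 and r3 that standard PSO multiplies into the individual and social terms (Eq. (1) as printed omits them), the coefficient values, the swarm size and iteration count, and `toy_objective`, a hypothetical stand-in for running the Deriche detector with hysteresis and scoring the result against the ground truth.

```python
import numpy as np

LOWER = np.array([0.5, 5.0, 10.0])   # alpha, low threshold, high threshold
UPPER = np.array([7.0, 30.0, 80.0])
STEP = np.array([0.1, 1.0, 1.0])
GRID_SIZE = int(np.prod(np.round((UPPER - LOWER) / STEP) + 1))  # 66 * 26 * 71 points

def snap(x):
    """Clip positions to the bounds and round them onto the parameter grid."""
    x = np.clip(x, LOWER, UPPER)
    return LOWER + np.round((x - LOWER) / STEP) * STEP

def pso_maximize(objective, n_particles=20, n_iter=60, c1=0.7, c2=1.5, c3=1.5, seed=0):
    rng = np.random.default_rng(seed)
    X = snap(rng.uniform(LOWER, UPPER, size=(n_particles, 3)))
    V = np.zeros_like(X)
    pb = X.copy()                                   # best position of each particle
    pb_val = np.array([objective(x) for x in X])
    gb = pb[np.argmax(pb_val)].copy()               # best position of the swarm
    for _ in range(n_iter):
        r2, r3 = rng.random(X.shape), rng.random(X.shape)
        V = c1 * V + c2 * r2 * (pb - X) + c3 * r3 * (gb - X)   # velocity, Eq. (1)
        X = snap(X + V)                                        # position, Eq. (1)
        vals = np.array([objective(x) for x in X])
        improved = vals > pb_val
        pb[improved] = X[improved]
        pb_val[improved] = vals[improved]
        gb = pb[np.argmax(pb_val)].copy()
    return gb, float(pb_val.max())

def toy_objective(p):
    """Hypothetical smooth score peaking at alpha=2.0, low=15, high=40."""
    alpha, low, high = p
    if low >= high:                 # invalid hysteresis setting
        return -1e9
    return -((alpha - 2.0) ** 2 + ((low - 15) / 10) ** 2 + ((high - 40) / 10) ** 2)

best, value = pso_maximize(toy_objective)
print(GRID_SIZE, best.round(1), round(value, 3))
```

The full grid holds about 1.2 × 10^5 parameter triples, while this run evaluates the objective 20 × 61 = 1220 times; with a real segmentation pipeline, caching the smoothed gradient per α value (as described in Sect. 2) would make threshold-only moves cheaper still.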
2.2 Our Proposed Version of Pratt’s Formula
The much-emphasized problem with Pratt’s formula is its lack of symmetry: it is weak when facing under-detection errors. For this reason, we propose comparing the ground truth to the contour map and the contour map to the ground truth, by computing two distinct distance maps, one for each, using the following formula:

$$ P = \frac{1}{M_P} \sum_{k \in I_{Cont} \cup I_{gt}} \frac{1}{1 + \left( d(k, I_{gt}) - d(k, I_{Cont}) \right)^2}, \qquad M_P = \mathrm{Card}\left( I_{Cont} \cup I_{gt} \right) \qquad (2) $$

where d(k, I_gt) is the distance between pixel k of the contour map and the nearest pixel of the reference (ground truth) map I_gt; d(k, I_Cont) is, by the same logic, the distance between pixel k of the reference map and its supposed position (the nearest pixel) in the contour map I_Cont; and k is the pixel concerned in the calculation. Thus, in the case of over-detection, the second distance is equal to 0, which gives the classic Pratt formula (with a different normalization). For correctly detected pixels, both distances vanish. Under-detection, on the other hand, is only partially penalized by Pratt; in our case, we compute the distance so that it is evaluated in the same way as the other cases. Another problem with distance-based metrics is execution time. We propose using a chamfer distance map, which can replace the Euclidean distance in our case with similar results. The chamfer distance [33] provides results very close to the Euclidean distance in a much shorter time: with only two passes over the image, it gives the minimum distance at every point of the image. We will perform a bi-objective optimization with our
M. Benkhettou et al.
variant of Pratt/F-measure, to verify the effectiveness of our variant with respect to under-detections and its correlation with the F-measure.

2.3 Toward an Unsupervised Segmentation Method Using Adaptive Thresholding
We focus on automating the thresholding, since the thresholds alone account for 2 of the 3 parameters. We therefore propose two methods.
Our Variation of Hossain's Method. The method proposed in [14] tends to produce over-detections on degraded or very noisy images. This is caused by large intensity fluctuations in such images, which increase the standard deviation and make the value $k_b$ negative. Consequently, the low threshold $T_b$ becomes zero, leaving only the high threshold. To remedy this problem, we reformulate the equation of $T_b$ as follows:

$$T_b = \max\bigl(k_b,\; c_3 \cdot k_h\bigr) \tag{3}$$

with $c_3$ a constant between 0 and 1 (empirically set to 0.6).

Our Local Hysteresis Thresholding Method. Global approaches are very fast because they compute a single threshold. However, they adapt poorly to local variations in image quality. Lighting or degradations distributed non-uniformly over the image cause over- or under-segmentation of certain areas and degrade the overall segmentation, and thresholding is an irreversible operation. To reduce this effect, we propose a new approach that computes the hysteresis thresholds $T_h$ and $T_b$ locally and adaptively. It is inspired by a variant of Sauvola's local thresholding presented in [34]. At each pixel (x, y) of the image of the local maxima of
Document Image Edge Detection Based on a Local Hysteresis
the gradient, the thresholds $T_h$ and $T_b$ are computed locally (in a window of radius $r$) using Eq. (4):

$$T_h(x, y) = \mu(x, y)\left(1 - k_1 + \sigma(x, y)\,\frac{k_1}{k_2}\right) \qquad\text{and}\qquad T_b(x, y) = \frac{1}{k_3}\,T_h(x, y) \tag{4}$$

The parameters are as follows:
- $k_2$: a positive constant (we recommend 130) that controls the dynamics of the standard deviation in the processing window.
- $k_1$: another positive constant (empirically fixed at 0.5).
- $k_3$: a constant greater than 2 (fixed at 6.5).
- $\mu(x, y)$ and $\sigma(x, y)$: the mean and standard deviation of the grey levels in the sliding window of size $(2r+1) \times (2r+1)$, centred at the pixel (x, y) whose luminance is $I(x, y)$:

$$\mu(x, y) = \frac{1}{(2r+1)^2}\sum_{k=-r}^{+r}\sum_{l=-r}^{+r} I(x+k,\, y+l) \tag{5}$$

$$\sigma(x, y) = \sqrt{\frac{1}{(2r+1)^2}\sum_{k=-r}^{+r}\sum_{l=-r}^{+r}\bigl(I(x+k,\, y+l) - \mu(x, y)\bigr)^2} \tag{6}$$
To speed up the method, make it independent of the window size and better suit it to high-resolution images, we use integral images as in [34]. This makes our local approach very competitive in computation time with global approaches, while being more robust. This property is important because a 35 × 35 processing window then has the same time cost as a 3 × 3 window. Larger windows, which give noticeably better results, therefore come at a constant cost per pixel.
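Eqs. (4)-(6) and the integral-image speed-up can be sketched as follows. Eq. (4) is Sauvola's formula $\mu(1 + k_1(\sigma/k_2 - 1))$ rewritten, with $k_1$ in the role of Sauvola's $k$ and $k_2$ in the role of $R$. This is a minimal sketch under our own assumptions, not the authors' code:
- The function names are hypothetical.
- Windows are clipped at the image border, so border pixels are averaged over fewer than $(2r+1)^2$ pixels.
- The statistics are computed on whatever image `img` is passed in.
- The final `hysteresis` step is the usual 8-connected linking. It is applied to a gradient map assumed to be already reduced to its local maxima (non-maximum suppression is not shown).

```python
from collections import deque

import numpy as np


def integral_image(a):
    """Summed-area table with a leading row and column of zeros."""
    s = np.zeros((a.shape[0] + 1, a.shape[1] + 1), dtype=np.float64)
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return s


def window_sums(s, r):
    """Sum and pixel count of the (2r+1)x(2r+1) window around each pixel.

    Each sum needs four table lookups whatever r is; windows are clipped
    at the image border.
    """
    h, w = s.shape[0] - 1, s.shape[1] - 1
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    total = (s[np.ix_(y1, x1)] - s[np.ix_(y0, x1)]
             - s[np.ix_(y1, x0)] + s[np.ix_(y0, x0)])
    count = np.outer(y1 - y0, x1 - x0)
    return total, count


def local_thresholds(img, r=17, k1=0.5, k2=130.0, k3=6.5):
    """Local high/low thresholds of Eq. (4); r = 17 gives a 35x35 window."""
    img = np.asarray(img, dtype=np.float64)
    total, n = window_sums(integral_image(img), r)
    total_sq, _ = window_sums(integral_image(img ** 2), r)
    mu = total / n                                            # Eq. (5)
    sigma = np.sqrt(np.maximum(total_sq / n - mu ** 2, 0.0))  # Eq. (6), round-off guarded
    th = mu * (1.0 - k1 + sigma * k1 / k2)                    # Eq. (4), high threshold
    tb = th / k3                                              # Eq. (4), low threshold
    return th, tb


def hysteresis(grad, th, tb):
    """Keep pixels with grad >= tb that are 8-connected to a pixel with grad >= th."""
    grad = np.asarray(grad, dtype=np.float64)
    strong = grad >= th
    weak = grad >= tb
    out = strong.copy()
    h, w = grad.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not out[ny, nx]:
                    out[ny, nx] = True
                    queue.append((ny, nx))
    return out
```

A typical call would be `th, tb = local_thresholds(gray, r=17)` followed by `edges = hysteresis(grad_nms, th, tb)`. Here `gray` and `grad_nms` are hypothetical names for the grey-level image and the non-maximum-suppressed gradient magnitude. Because the integral images are built once, the per-pixel cost of $\mu$ and $\sigma$ does not depend on $r$.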
3 Experimentation and Results
To cover as many challenges as possible, we conducted our experiments on a dataset compiled from the DIBCO image banks [35], spanning 2009 to 2019.

3.1 Evaluation of Conventional Edge Detectors' Genericity
Our first run used four operators (Sobel, Canny, Deriche and Shen-Castan) on images with very varied characteristics, in order to study the performance of each operator with its optimal settings. Table 1 shows the results, evaluated with the F-measure. No operator is better than the others on every image, and no operator generalizes across varied images. In practice, each image requires the right operator with the right set of parameters, because of specific characteristics that depend on the image's initial quality. The Deriche detector gives relatively stable results across images. Shen-Castan, by contrast, obtains better average results but is less stable.
Table 1. Conventional edge detectors' performance (F-measure) on images of different quality and complexity.

| Image | Sobel | Canny | Deriche | Shen-Castan |
|---|---|---|---|---|
| 1 | 0.4485 | 0.4813 | 0.4844 | 0.5188 |
| 2 | 0.1756 | 0.2817 | 0.3978 | 0.2991 |
| 3 | 0.4829 | 0.5164 | 0.5222 | 0.5667 |
| 4 | 0.1580 | 0.2297 | 0.3254 | 0.2895 |
| 5 | 0.5186 | 0.5419 | 0.5476 | 0.6154 |
| 6 | 0.2465 | 0.3109 | 0.3892 | 0.3204 |
| 7 | 0.4792 | 0.5110 | 0.5324 | 0.6195 |
| 8 | 0.4472 | 0.4472 | 0.4472 | 0.4472 |
| 9 | 0.5319 | 0.4549 | 0.5234 | 0.5555 |
| 10 | 0.4895 | 0.4191 | 0.4865 | 0.5179 |
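The stability claim made about Table 1 can be checked directly from the table. The short script below is a sketch that transcribes the ten F-measure values per operator and computes each operator's mean and population standard deviation. The names `TABLE_1` and `summarize` are ours.

```python
import statistics

# F-measure values transcribed from Table 1 (images 1 to 10).
TABLE_1 = {
    "Sobel":       [0.4485, 0.1756, 0.4829, 0.1580, 0.5186, 0.2465, 0.4792, 0.4472, 0.5319, 0.4895],
    "Canny":       [0.4813, 0.2817, 0.5164, 0.2297, 0.5419, 0.3109, 0.5110, 0.4472, 0.4549, 0.4191],
    "Deriche":     [0.4844, 0.3978, 0.5222, 0.3254, 0.5476, 0.3892, 0.5324, 0.4472, 0.5234, 0.4865],
    "Shen-Castan": [0.5188, 0.2991, 0.5667, 0.2895, 0.6154, 0.3204, 0.6195, 0.4472, 0.5555, 0.5179],
}


def summarize(table):
    """Return {operator: (mean, population standard deviation)} over the images."""
    return {op: (statistics.mean(v), statistics.pstdev(v)) for op, v in table.items()}


if __name__ == "__main__":
    for op, (mean, std) in summarize(TABLE_1).items():
        print(f"{op:<12} mean={mean:.3f} std={std:.3f}")
```

Shen-Castan has the highest mean (about 0.475), while Deriche has the smallest standard deviation (about 0.070). This matches the observation in Sect. 3.1.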
Table 2. Average quality evaluation per metric optimization.

| Quality evaluation | F-measure | Pratt | Bi-objective | Ours (Pratt) |
|---|---|---|---|---|
| Average F-measure | 0.61 | 0.54 | 0.6 | 0.59 |
| Average PSNR | 15.32 | 14.19 | 15.13 | 15.29 |
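The two quality scores averaged in Table 2 can be computed for binary edge maps as sketched below. This sketch assumes pixel-exact matching between the detected map and the ground truth (no distance tolerance), and binary images with a peak value of 1 for PSNR. The paper does not state these details, and the function names are ours.

```python
import numpy as np


def f_measure(pred, gt):
    """Pixel-exact F-measure of a binary edge map against a binary ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # detected pixels that are true edges
    fp = np.count_nonzero(pred & ~gt)   # over-detections
    fn = np.count_nonzero(~pred & gt)   # under-detections
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred, gt, peak=1.0):
    """PSNR in dB between two images whose values lie in [0, peak]."""
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For binary maps, the mean squared error is the fraction of mismatched pixels. PSNR therefore rewards a low total error, whereas the F-measure balances over- and under-detections explicitly.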
3.2 Evaluation of Our Improved Version of Pratt's Metric
We run optimizations with the different evaluation metrics as objective functions and evaluate the results with the strictest metric (F-measure). The results in Table 2 are averages over each optimization. The bi-objective optimization (F-measure + Pratt) brings a considerable gain over Pratt's metric alone (average F-measure 0.6 vs 0.54). The best results, however, are obtained with the F-measure itself. This shows that good detection dominates good localization, which explains why our metric performs better than Pratt's metric. Evaluating Pratt's metric with the chamfer distance map (Table 3) shows a difference of about 0.06. This difference is negligible: the bias is the same for all images, so it does not affect the benchmarking.

Table 3. Chamfer distance vs Euclidean distance in the calculation of Pratt's formula.

|  | Chamfer distance | Euclidean distance | Resolution | Nb. of contour points |
|---|---|---|---|---|
| Avg. | 0.255 s | 9373.30 s | 1398 × 623 | 23415 |
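The proposed metric and the chamfer distance of [33] can be illustrated with a minimal pure-NumPy sketch. It rests on several assumptions:
- Eq. (2) is read as the mean, over the union of contour and ground-truth pixels, of $1/(1 + (d_1 - d_2)^2)$, without the scaling constant of the classic FOM.
- The 3-4 chamfer mask (axial step 3, diagonal step 4, result divided by 3) is our choice, since the text does not say which mask was used.
- The function names are illustrative.

The Python loops are slow on full pages. The sketch shows the two-pass logic, not the timings of Table 3.

```python
import numpy as np

BIG = 10 ** 9  # stands in for "infinitely far" before propagation


def chamfer_34(mask):
    """Two-pass 3-4 chamfer distance from every pixel to the nearest True pixel.

    Axial steps cost 3 and diagonal steps cost 4; dividing by 3 approximates
    the Euclidean distance in pixel units.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    d = np.where(mask, 0, BIG).astype(np.int64)
    # Forward pass (top-left to bottom-right): neighbours above and to the left.
    for y in range(h):
        for x in range(w):
            best = d[y, x]
            if x > 0:
                best = min(best, d[y, x - 1] + 3)
            if y > 0:
                best = min(best, d[y - 1, x] + 3)
                if x > 0:
                    best = min(best, d[y - 1, x - 1] + 4)
                if x < w - 1:
                    best = min(best, d[y - 1, x + 1] + 4)
            d[y, x] = best
    # Backward pass (bottom-right to top-left): neighbours below and to the right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            best = d[y, x]
            if x < w - 1:
                best = min(best, d[y, x + 1] + 3)
            if y < h - 1:
                best = min(best, d[y + 1, x] + 3)
                if x < w - 1:
                    best = min(best, d[y + 1, x + 1] + 4)
                if x > 0:
                    best = min(best, d[y + 1, x - 1] + 4)
            d[y, x] = best
    return d / 3.0


def symmetric_fom(cont, gt):
    """Eq. (2) as read here: mean over I_Cont ∪ I_gt of 1 / (1 + (d1 - d2)^2)."""
    cont = np.asarray(cont, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = cont | gt
    if not union.any():
        return 1.0  # assumption: two empty maps agree perfectly
    d_to_gt = chamfer_34(gt)[union]      # distance from k to the nearest ground-truth pixel
    d_to_cont = chamfer_34(cont)[union]  # distance from k to the nearest contour pixel
    return float(np.mean(1.0 / (1.0 + (d_to_gt - d_to_cont) ** 2)))
```

Consider a vertical ground-truth line and a contour map with the same line shifted by one pixel: `symmetric_fom` gives 0.5. Now suppose the ground truth has a second line, two columns away, that the contour map misses. Each missed pixel contributes $1/(1 + 2^2) = 0.2$, so the under-detection is penalized in the same way as an over-detection. For a vectorized alternative, SciPy's `scipy.ndimage.distance_transform_cdt` provides chamfer-type transforms. Note that it measures the distance to the nearest zero element, so the edge mask would have to be inverted.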
3.3 Evaluation of Our Proposed Methods
Here, we present the results of our optimization mechanism (Best_PSO) and of our two proposed methods. We compare them with the approaches of Fang (2009) and Hossain (2016) [14]:
[Fig. 2: bar chart; y-axis: Average F-measure; bars: F_Best (Global), Fang 2009 (Global), Hossain 2016 (Global), Ours (Global), Ours (Local).]
Fig. 2. Average F-measure according to each segmentation method.
The box plots in Fig. 3 summarize the variation of the F-measure over all the images (the red line marks the median of each approach).
Fig. 3. Box plot of each compared approach
Our local and global segmentation methods both come close to the PSO approach in performance and stability, despite the diversity of the tested dataset. This is reflected in the small dispersion around the red (median) lines. Some segmented images are shown in Fig. 4.
Fig. 4. Example of original images (a) and (c) from our dataset and their corresponding segmentations (b) and (d) obtained using our version of Pratt’s formula in our architecture.
4 Discussion and Further Analysis
The architecture in Fig. 1 provides the best segmentation achievable in reasonable time. However, it can only be used in a supervised context, since it requires a ground-truth image. We found that no operator is better than the others (see Table 1), which was the first point we wanted to highlight with this architecture. We then pushed the analysis further by varying the evaluation metrics (summarized in Table 2). This revealed a compromise between good detection and good localization of the edges: smoothing reduces noise but also erases weakly marked edges. In addition, excessive smoothing, even on well-marked edges, causes a displacement that affects the good-detection metrics. The right compromise therefore depends on the field of application and its needs. We found that the best way to keep the edges as intact as possible was to process the whole image adaptively and locally, adjusting to each situation the detector faces. Figure 2 shows that our local approach, in a purely unsupervised context, performs very close to our proposed architecture. The approaches of Fang and Hossain share a common basis with both of our approaches. Comparing with them (see Figs. 2 and 3) was therefore necessary to show the impact of our changes. Figure 4 shows the quality level that we aim to reach with both of our approaches.
5 Conclusion
In this paper, we have presented global and local thresholding methods. We have also presented a mechanism that uses PSO as an optimization metaheuristic to set the parameters fully automatically. This mechanism is used in a supervised context, with objective functions that measure segmentation quality: the F-measure, Pratt's FOM, their combination (in bi-objective mode) and our improved version of FOM. Our improved FOM offers a good compromise between good localization and good detection. Our goal of moving to a purely unsupervised context led us to propose two segmentation methods. Their results are very close to those of the PSO mechanism, which requires a reference image (ground truth).
References
1. Qian, T., Zhang, F., Khan, S.U.: Facial expression recognition based on edge computing. In: 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), Shenzhen, China, pp. 410–415 (2019)
2. Putro, R.A.P., Putri, F.P., Prasetiyowati, M.I.: A combined edge detection analysis and clustering based approach for real time text detection. In: 2019 5th International Conference on New Media Studies (CONMEDIA), Bali, Indonesia, pp. 59–62 (2019)
3. Hamad, Y.A., Simonov, K., Naeem, M.B.: Brain's Tumor Edge Detection on Low Contrast Medical Images, pp. 45–50. AiCIS, Fallujah, Iraq (2018)
4. Eetha, S., Agrawal, S., Neelam, S.: Zynq FPGA based system design for video surveillance with sobel edge detection. In: 2018 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), Hyderabad, India, pp. 76–79 (2018)
5. Chandra, J.N., Supraja, B.S., Bhavana, V.: A Survey on Advanced Segmentation Techniques in Image Processing Applications. ICCIC, Coimbatore, pp. 1–5 (2017)
6. Lebourgeois, F., Drira, F., Gaceb, D., Duong, J.: Fast integral meanshift: application to color segmentation of document images. ICDAR, USA, pp. 52–56 (2013)
7. Pandey, A., Shrivastava, S.K.: A survey paper on calcaneus bone tumor detection using different improved canny edge detector. In: ICSCA, Pondicherry, pp. 1–5 (2018)
8. Chen, H., Ding, H., He, X., Zhuang, H.: Color image segmentation based on seeded region growing with Canny edge detection. In: 2014 12th International Conference on Signal Processing (ICSP), Hangzhou, pp. 683–686 (2014)
9. Sevak, J.S., et al.: Survey on semantic image segmentation techniques. In: International Conference on Intelligent Sustainable Systems (ICISS), Palladam, pp. 306–313 (2017)
10. Chabrier, S., Laurent, H., Rosenberger, C., Emile, B.: Comparative study of contour detection evaluation criteria based on dissimilarity measures. EURASIP J. Image Video Process. (1), 693053 (2008)
11. Cocquerez, J., Philipp, S.: Analyse d'images: Filtrage et segmentation. Editions Masson (1995)
12. Bres, S., Jolion, J.M., Lebourgeois, F.: Traitement et analyse des images numériques. Hermes, p. 412 (2003)
13. Yuan, L., Xu, X.: Adaptive image edge detection algorithm based on canny operator. In: International Conference AITS, Harbin, pp. 28–31 (2015)
14. Hossain, F., Asaduzzaman, M., Abu Yousuf, M., Rahman, M.A.: Dynamic thresholding based adaptive canny edge detection. Int. J. Comput. Appl. 135(4), 37–41 (2016)
15. Deriche, R.: Fast algorithms for low-level vision. PAMI 12(1), 78–87 (1990)
16. Shen, J., Castan, S.: An optimal linear operator for step edge detection. CVGIP: Graphical Models and Image Processing 54(2), 112–133 (1992)
17. Konishi, S., Yuille, A.L., Coughlan, J.M., Zhu, S.C.: Statistical edge detection: learning and evaluating edge cues. PAMI 25(1), 57–74 (2003)
18. Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE PAMI 26(5), 530–549 (2004)
19. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE PAMI 33(5), 898–916 (2011)
20. Dollar, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE PAMI 37(8), 1558–1570 (2015)
21. Ren, X.: Multi-scale improves boundary detection in natural images. In: International Conference ECCV, pp. 533–545 (2008)
22. Lim, J.J., Zitnick, C.L., Dollar, P.: Sketch tokens: a learned mid-level representation for contour and object detection. In: IEEE International Conference CVPR, Portland, pp. 3158–3165 (2013)
23. Dollar, P., Tu, Z., Belongie, S.: Supervised learning of edges and object boundaries. In: IEEE International Conference CVPR, New York, USA, pp. 1964–1971 (2006)
24. Shen, W., Wang, X., Wang, Y., Bai, X., Zhang, Z.: DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection (draft version). In: IEEE International Conference CVPR, Boston, pp. 3982–3991 (2015)
25. Bertasius, G., Shi, J., Torresani, L.: DeepEdge: a multi-scale bifurcated deep network for top-down contour detection. In: IEEE International Conference CVPR, Boston, pp. 4380–4389 (2015)
26. Hwang, J.J., Liu, T.L.: Pixel-wise deep learning for contour detection. ICLR (2015)
27. Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., Terzopoulos, D.: Image Segmentation Using Deep Learning: A Survey (2020). arXiv:2001.05566v4 [cs.CV]
28. Chaudhary, R., Patel, A., Kumar, S., Tomar, S.: Edge detection using particle swarm optimization technique. In: International Conference ICCCA, Greater Noida, pp. 363–367 (2017)
29. Bose, A., Mali, K.: Fuzzy-based artificial bee colony optimization for gray image segmentation. Signal Image Video Process. 10, 1089–1096 (2016)
30. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization. Swarm Intelligence (2007)
31. Parsopoulos, K.E., Vrahatis, M.N.: Multi-objective Optimization in Computational Intelligence: Theory and Practice. IGI, p. 496 (2008)
32. Ashuri, B., Tavakolan, M.: Fuzzy enabled hybrid genetic algorithm–particle swarm optimization approach to solve TCRO problems in construction project planning. J. Constr. Eng. Manag. 138(9), 1065–1074 (2012)
33. Thiel, E.: Les distances de chanfrein en analyse d'images: fondements et applications. Doctoral thesis, Institut IMAG, France, 177 (1994)
34. Gaceb, D., Lebourgeois, F., Duong, J.: Adaptative smart-binarization method: for images of business documents. In: IEEE International Conference ICDAR, Washington, pp. 118–122 (2013)
35. Pratikakis, I., et al.: ICDAR 2019 Competition on Document Image Binarization (DIBCO 2019). In: International Conference ICDAR, Sydney, Australia, pp. 1547–1556 (2019)
Fast I2SDBSCAN Based on Integral Volume of 3D Histogram: Application to Color Layer Separation in Document Images
Zakia Kezzoula and Djamel Gaceb
LIMOSE Laboratory, University M'Hamed Bougara of Boumerdes, Boumerdes, Algeria
{z.kezzoula,d.gaceb}@univ-boumerdes.dz
Abstract. The optical reading of administrative documents by automatic analysis and recognition is very demanding in terms of document quality, if effective recognition of their content is to be guaranteed. Now that color is widely used in administrative documents, digital experts have recognized how much color support facilitates access to the content of scanned documents. This is especially true in the presence of quality degradation, stamps, handwritten notes and marks on the text. To meet this need, we propose a new color layer segmentation method for document images. It simplifies the separation of, and access to, certain information that is very complex or impossible to extract from the image without color processing. The method is a new variant of the original DBSCAN approach, called I2SDBSCAN (integral double space DBSCAN), adapted to pixel clustering of document images guided by color densities. Using an integral volume over the 3D color histogram, and coupling the Cartesian and colorimetric spaces, considerably reduces the computation time. Experiments demonstrate the effectiveness of the proposed method.
Keywords: Color image preprocessing · Clustering · Document image segmentation · Fast I2SDBSCAN · 3D color histogram · Integral volume
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
F. Saeed et al. (Eds.): IRICT 2020, LNDECT 72, pp. 1151–1163, 2021.
https://doi.org/10.1007/978-3-030-70713-2_102

1 Introduction
Administrative, historical and medical documents play a very important role in organizations of all types because they carry rich information. They must therefore be protected from the various degradations that threaten them. Dematerializing and digitizing these documents has become essential to preserve them and to extract and fully exploit the information they contain, using systems dedicated to the automatic reading of administrative document images. In administrative documents, designers usually use colors to highlight informative areas (the total amount of an invoice, the area to be completed on a form, a seal, etc.). These colors can be used to guide automatic reading systems. Separating document images into color layers improves and facilitates the extraction of, and access to, superimposed information layers (stamps on text, handwritten notes on printed text,
Fig. 1. Example of superimposed color layers on document images.
watermarked text, the front showing through on the back, etc.) and handles degraded documents (see Fig. 1).

This task requires an unsupervised colorimetric segmentation (pixel classification or clustering, superpixel generation and region detection) to better separate the elements that make up the physical document layout. This segmentation must be robust and discriminative, with low computational complexity. The number of color layers differs from one image to another and is therefore unknown in advance. An unsupervised clustering approach that identifies the number of clusters automatically is therefore needed. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most common unsupervised clustering algorithms and has undergone various improvements and adaptations. It is robust to noise and can handle clusters of various shapes and sizes. The original method has two main drawbacks: its computational complexity, particularly for pixel clustering, and the difficulty of setting its parameters to obtain satisfactory results on images of variable complexity. In this paper, we propose a new variant of DBSCAN, called I2SDBSCAN (integral double space DBSCAN). It is fast and adapted to pixel clustering of document images guided by color densities.

The paper is organized as follows. Section 2 describes existing segmentation methods. Section 3 introduces our classification and segmentation methods. Finally, Sect. 4 presents and discusses the experimental results.
2 Existing Segmentation Methods
Many image segmentation approaches exist, spanning different fields of application. They can be grouped into the following categories:
2.1 Segmentation Methods Based on Edge/Region Detection or Binarization
Edge Detection. Its objective is the correct detection and localization of inter-region transitions [1]. This category contains several types of methods. Derivative-based methods are very sensitive to noise [2]. The deformable models introduced by Kass, also called "snakes" or "active contours", involve smoothing of the derivatives [2]. The drawbacks of these methods are their sensitivity to initialization and noise, and the difficulty of adjusting their various parameters.

Binarization (Thresholding). It consists of automatically separating a document image (or any other image) into two layers: foreground and background. The goal of binarization is to speed up and simplify segmentation. Segmenting a color image can be very expensive, whereas dividing it into two classes is much easier. The literature distinguishes three families of thresholding approaches: global, local (adaptive) [3] and mixed [4]. Global methods are fast. However, when the lighting of the image is non-uniform, the global threshold does not adapt to the change and gives very poor results. Local methods adapt better to local changes, but at a higher computational cost. Mixed methods combine the advantages of both approaches.

Region-based Segmentation. It divides the image into a set of regions according to predefined criteria. Region-based segmentation mainly includes region growing, split/merge, and XY-Cut, which is a top-down method [5]. The best-known approach is region growing, whose drawback lies in the difficult choice of the homogeneity predicate. The second approach is split/merge, using a quadtree and a Voronoi diagram [6]. The region boundaries obtained by these methods are usually imprecise and do not exactly coincide with the boundaries of objects in the image. There are also vertical and horizontal XY-Cut approaches.
However, this method does not adapt well to images with varied structures or skew, such as ancient Arabic documents [7].

Region-contour Cooperation. It exploits the advantages of both region- and contour-based segmentation to achieve a more precise and faithful result than either technique alone. The two types of segmentation can be integrated at different levels. There are three types of cooperation: sequential, mutual, and cooperation of results. Other approaches are based on learning with artificial neural networks (ANN).
2.2 Segmentation Based on Element Classification (or Clustering)
Classification- and clustering-based techniques segment the image into classes or clusters of pixels with similar features. They are divided into supervised and unsupervised techniques, according to whether a priori knowledge about the classes to be obtained is available.
Supervised Classification. Here, the number of classes is known, and a set of already labeled elements (pixels, superpixels, CCs, regions or blocks) serves as a training set. The task is then to associate each new element with the most suitable class using the labeled elements. Many classification algorithms exist in the literature, such as SVM, k-NN [8], ANN [9] and CNN [10]. Neelima et al. [9] employed artificial neural networks (ANNs) for classification, extracting ad-hoc features such as shape structures and visual impressions.

In [10], the authors present an approach to the automatic high-level segmentation of interesting elements from paper documents (i.e., stamps, logos, printed blocks of text, signatures and tables). This approach first pre-segments the pixels into objects and then classifies the objects with a convolutional neural network (CNN). CNNs are widely used in pattern recognition and computer vision [11]. Li et al. [12] used CNNs to extract the features of connected components (CCs). They also incorporated conditional random fields into their framework to model the relations between neighboring CCs. However, most of these CC-level separation methods start from CC extraction results. Their final performance therefore depends heavily on the CC extraction method employed, which often fails to handle overlapping cases.

The authors of [13] propose a combined local-global approach for document binarization. Their model has a global branch and a local branch. The global branch takes patches from the down-sampled image, while the local branch takes patches cropped from the source image. The final binary prediction combines the results of the two branches. The method in [14] separates handwritten and machine-printed components that are mixed and overlapped in documents, using a CNN model for pixel-level classification.
Unsupervised Pixel Classification. The objective is to automatically group pixels considered similar into the same class. These approaches fall into two categories. Some require the number of classes in advance (e.g., k-means [15, 16], C-means, Fisher); others do not (DBSCAN). To separate documents into color layers, we are interested in the second category, because the number of color layers is unknown (it varies from one image to another).

Several studies focus on the segmentation of color document images. In [16], the authors built on the k-means classification algorithm to create an adaptive segmentation system for analyzing color document images. The method serializes the k-means (or dynamic clouds) algorithm, applying it sequentially to the image in a sliding window. However, the algorithm can become very computationally intensive depending on the window size and the dimension of the feature space. In [17], the authors focused on show-through from the back to the front, a consequence of the paper's properties, the chemical quality of the ink or the scanning conditions. This degradation leaves marks that reduce the readability of the document image. Their study addresses the removal of these marks to improve the readability of digital documents. It is an unsupervised recursive segmentation method for decorrelated data, based on the
use of the logarithmic histogram to guide the recursive segmentation, and on principal component analysis and the "dynamic swarm" algorithm. In [18], another approach is proposed for segmenting images of ancient Arabic manuscript-type color documents. The method operates directly on the luminance. A multiscale analysis separates the background from the foreground. Statistical features are then extracted from the resulting foreground and used by the fuzzy c-means classification algorithm for text/graphic segmentation of the foreground.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points in high-density regions into clusters and marks points in low-density regions as noise. It requires two input parameters: ε (Eps, the radius of the neighborhood to study) and MinPts (the minimum number of points that the neighborhood must contain). Its principle is to select an arbitrary point p and retrieve all the points in its ε-neighborhood, $N_\varepsilon(p) = \{q \in D \mid \mathrm{dist}(p, q) \le \varepsilon\}$. If $N_\varepsilon(p)$ contains at least MinPts points, a new cluster is identified. This cluster is then extended until it contains all the points density-reachable from p. But if the number of points in $N_\varepsilon(p)$ is not sufficient (