Lecture Notes in Networks and Systems Volume 614
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Sajid Anwar · Abrar Ullah · Álvaro Rocha · Maria José Sousa Editors
Proceedings of International Conference on Information Technology and Applications ICITA 2022
Editors
Sajid Anwar, Institute of Management Sciences, Peshawar, Pakistan
Abrar Ullah, School of Mathematical and Computer Science, Heriot-Watt University, Dubai, United Arab Emirates
Álvaro Rocha, University of Lisbon, Lisbon, Portugal
Maria José Sousa, University Institute of Lisbon (ISCTE), Lisbon, Portugal
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-9330-5 ISBN 978-981-19-9331-2 (eBook) https://doi.org/10.1007/978-981-19-9331-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Conference Organization
Honorary Chair
David Tien, Senior Lecturer, Charles Sturt University and Vice Chairman, IEEE Computer Chapter, NSW, Australia
Prof. Álvaro Rocha, Professor, University of Lisbon, Portugal, President of AISTI (Iberian Association for Information Systems and Technologies), Chair of IEEE SMC Portugal Section Society Chapter

General Chair
Dr. Abrar Ullah, Associate Professor, School of Mathematical and Computer Sciences, Heriot-Watt University, Dubai, United Arab Emirates
Dr. Maria José Sousa, Pro-Rector for Distance Education, University Institute of Lisbon, Portugal

General Co-chair
Dr. Ryad Soobhany, Assistant Professor, School of Mathematical and Computer Sciences, Heriot-Watt University, Dubai, United Arab Emirates
Dr. Imran Razzak, Senior Lecturer, School of Information Technology, Deakin University, Victoria, Australia
Dr. Pedro Sebastião, Assistant Professor, University Institute of Lisbon, Portugal

International Chair
Dr. Sajid Anwar, Associate Professor, Institute of Management Sciences, Peshawar, Pakistan
Dr. Anthony Chan, Charles Sturt University, Australia

Workshop Chair
Dr. Teresa Guarda, Director of the CIST Research and Innovation Center, Faculty of Systems and Telecommunications, UPSE, Ecuador
Dr. B. B. Gupta, Assistant Professor, National Institute of Technology Kurukshetra, India
Special Session Chair
Prof. Fernando Moreira, Professor Catedrático, Diretor do Departamento de Ciência e Tecnologia, Universidade Portucalense, Porto, Portugal
Dr. Shah Nazir, Assistant Professor, University of Swabi, Pakistan

Poster Chair
Dr. Isabel Alexandre, Assistant Professor, University Institute of Lisbon, Portugal
Joana Martinho da Costa, Invited Assistant Professor, University Institute of Lisbon, Portugal

Program Committee Chair
Dr. Sérgio Moro, Associate Professor, University Institute of Lisbon, Portugal
Dr. Babar Shah, Associate Professor, Zayed University, Abu Dhabi, UAE
Preface
This conference addresses the importance for IT professionals, academics, and researchers of reaching beyond narrowly defined subject areas and constantly acquiring a global technical and social perspective. ICITA 2022 offers such an opportunity by facilitating cross-disciplinary and social gatherings. Due to the breadth and depth of the topics, it is challenging to classify them into specific categories; however, for the convenience of readers, the conference covers a wide range of topics broadly split into Software Engineering, Machine Learning, Network Security, and Digital Media and Education.

The need for novel software engineering (SE) tools and techniques that are highly reliable and robust is the order of the day. There is a growing understanding that the design and evolution of software systems and tools must be "smart" if they are to remain efficient and effective. The artifacts produced during the construction of software systems, from specifications through to delivery, can be very convoluted and difficult to manage, and a software engineer cannot uncover all their intricacies by examining them manually. Automated tools and techniques are required to reason over business knowledge and identify what is missing or could usefully be changed while producing and evolving these artifacts. There is an agreed belief among researchers that SE provides an ideal platform to apply and test recent advances in artificial intelligence (AI) tools and techniques, and more and more SE problems are now resolved through the application of AI, such as tool automation and machine learning algorithms.

Machine learning is a broad subfield of computational intelligence concerned with the development of techniques that allow computers to "learn". With the increased and effective use of machine learning techniques, there has been a rising demand for this approach in different fields of life, and machine learning is now applied widely across domains of computer science including e-commerce, software engineering, robotics, digital media and education, and computer security. Given the opportunities and challenges of emerging machine learning applications, this area has great potential for further investigation.
The growth of data has revolutionized the production of knowledge within and beyond science by creating efficient ways to plan, conduct, disseminate, and assess high-quality novel research. The past decade has witnessed the creation of innovative approaches to produce, store, and analyze data, culminating in the emergence of the field of data science, which brings together computational, algorithmic, statistical, and mathematical techniques for extracting knowledge from ever-growing data sources. This area of research is continuously growing and attracting a lot of interest.

Computer security is the process of protecting computer software, hardware, and networks against harm. Its application has a wide scope, covering hardware, software, and network security. In the wake of rising security threats, it is imperative to improve security postures, and this remains an active research area that attracts considerable interest from researchers and practitioners.

With the advent of the Internet and technology, traditional teaching and learning have largely been transformed into digital education. Teachers and students rely significantly on digital media in face-to-face classrooms and in remote online learning. The adoption of digital media profoundly modifies the landscape of education, particularly with regard to online learning, e-learning, blended learning, and face-to-face digitally assisted learning, offering new possibilities but also challenges that need to be explored and assessed.

The International Conference on Information Technology and Applications (ICITA) is an initiative to address the above-mentioned considerations and challenges. Besides the above topics, the International Workshop on Information and Knowledge in the Internet of Things (IKIT) 2022 was run in conjunction with ICITA 2022 with a focus on the Internet of Things (IoT), and the 1st Workshop on Exponential Technologies in Business and Economics (Wetbe) was run with a focus on exponential technologies.

ICITA 2022 attracted 138 submissions from 28 different countries across the world. From these 138 submissions, we accepted 62, which represents an acceptance rate of 44%. IKIT 2022 received 22 submissions with 10 accepted papers, and Wetbe 2022 received 11 submissions with six accepted papers. Out of all submissions, 61 papers were selected to be published in this volume. The accepted papers were categorized under four themes: Software Engineering; Machine Learning and Data Science; Network Security, Internet of Things, and Smart Technology; and Digital Media and Education. Each submission was reviewed by at least two to three reviewers who are experts in the area of the submitted paper. The evaluation criteria covered several aspects, such as correctness, originality, technical strength, significance, quality of presentation, interest, and relevance to the conference scope.

This volume is published in the Lecture Notes in Networks and Systems series by Springer, which has a high SJR impact. We would like to thank all Program Committee members as well as the additional reviewers for their effort in reviewing the papers. We hope that the topics covered in the ICITA proceedings will help readers to understand the intricacies involving the
methods and tools of software engineering that have become an important element of nearly every branch of computer science. We would like to extend our special thanks to the keynote speakers: Helga Hambrock, Senior Instructional Designer and Adjunct Professor in Educational Technology and Instructional Design at Concordia University, Chicago, USA; Anthony Lewis Brooks, Associate Professor, Department of Architecture, Design and Media Technology, Aalborg University, Denmark; José Manuel Machado, Director of Centro ALGORITMI and Director of the Doctoral Program in Biomedical Engineering, Department of Informatics/Centro ALGORITMI, School of Engineering, University of Minho, Portugal; and Ronnie Figueiredo, School of Technology and Management, Centre of Applied Research in Management and Economics (CARME), Polytechnic of Leiria, Portugal.

Abrar Ullah, Ph.D. (Dubai, United Arab Emirates)
Sajid Anwar, Ph.D. (Peshawar, Pakistan)
Contents
Machine Learning and Data Science
Intelligent Sepsis Detector Using Vital Signs Through Long Short-Term Memory Network . . . . . . . . . . 3 Farman Hassan, Auliya Ur Rahman, and Muhammad Hamza Mehmood
Implementation of Big Data and Blockchain for Health Data Management in Patient Health Records . . . . . . . . . . 17 António Pesqueira, Maria José Sousa, and Sama Bolog
Ambient PM2.5 Prediction Based on Prophet Forecasting Model in Anhui Province, China . . . . . . . . . . 27 Ahmad Hasnain, Muhammad Zaffar Hashmi, Basit Nadeem, Mir Muhammad Nizamani, and Sibghat Ullah Bazai
Potato Leaf Disease Classification Using K-means Cluster Segmentation and Effective Deep Learning Networks . . . . . . . . . . 35 Md. Ashiqur Rahaman Nishad, Meherabin Akter Mitu, and Nusrat Jahan
Diagnosis of Polycystic Ovarian Syndrome (PCOS) Using Deep Learning . . . . . . . . . . 47 Banuki Nagodavithana and Abrar Ullah
CataractEyeNet: A Novel Deep Learning Approach to Detect Eye Cataract Disorder . . . . . . . . . . 63 Amir Sohail, Huma Qayyum, Farman Hassan, and Auliya Ur Rahman
DarkSiL Detector for Facial Emotion Recognition . . . . . . . . . . 77 Tarim Dar and Ali Javed
Review and Enhancement of Discrete Cosine Transform (DCT) for Medical Image Fusion . . . . . . . . . . 89 Emadalden Alhatami, Uzair Aslam Bhatti, MengXing Huang, and SiLing Feng
Early Courier Behavior and Churn Prediction Using Machine Learning in E-Commerce Logistics . . . . . . . . . . 99 Barış Bayram, Eyüp Tolunay Küp, Coşkun Özenç Bilgili, and Nergiz Coşkun
Combining Different Data Sources for IIoT-Based Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Rodrigo Gomes, Vasco Amaral, and Fernando Brito e Abreu Comparative Analysis of Machine Learning Algorithms for Author Age and Gender Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Zarah Zainab, Feras Al-Obeidat, Fernando Moreira, Haji Gul, and Adnan Amin Prioritizing Educational Website Resource Adaptations: Data Analysis Supported by the k-Means Algorithm . . . . . . . . . . . . . . . . . . . . . . . 139 Luciano Azevedo de Souza, Michelle Merlino Lins Campos Ramos, and Helder Gomes Costa Voice Operated Fall Detection System Through Novel Acoustic Std-LTP Features and Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . 151 Usama Zafar, Farman Hassan, Muhammad Hamza Mehmood, Abdul Wahab, and Ali Javed Impact of COVID-19 on Predicting 2020 US Presidential Elections on Social Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Asif Khan, Huaping Zhang, Nada Boudjellal, Bashir Hayat, Lin Dai, Arshad Ahmad, and Ahmed Al-Hamed Health Mention Classification from User-Generated Reviews Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Romieo John, V. S. Anoop, and S. Asharaf Using Standard Machine Learning Language for Efficient Construction of Machine Learning Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . 189 Srinath Chiranjeevi and Bharat Reddy Machine Learning Approaches for Detecting Signs of Depression from Social Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Sarin Jickson, V. S. Anoop, and S. Asharaf Extremist Views Detection: Definition, Annotated Corpus, and Baseline Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Muhammad Anwar Hussain, Khurram Shahzad, and Sarina Sulaiman Chicken Disease Multiclass Classification Using Deep Learning . . . . . . . . 225 Mahendra Kumar Gourisaria, Aakarsh Arora, Saurabh Bilgaiyan, and Manoj Sahni
Deepfakes Catcher: A Novel Fused Truncated DenseNet Model for Deepfakes Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 Fatima Khalid, Ali Javed, Aun Irtaza, and Khalid Mahmood Malik Benchmarking Innovation in Countries: A Multimethodology Approach Using K-Means and DEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Edilvando Pereira Eufrazio and Helder Gomes Costa Line of Work on Visible and Near-Infrared Spectrum Imaging for Vegetation Index Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Shendry Rosero Modeling and Predicting Daily COVID-19 (SARS-CoV-2) Mortality in Portugal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 Alexandre Arriaga and Carlos J. Costa Software Engineering Digital Policies and Innovation: Contributions to Redefining Online Learning of Health Professionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289 Andreia de Bem Machado, Maria José Sousa, and Gertrudes Aparecida Dandolini Reference Framework for the Enterprise Architecture for National Organizations for Official Statistics: Literature Review . . . . . . . . . . . . . . . 299 Arlindo Nhabomba, Bráulio Alturas, and Isabel Alexandre Assessing the Impact of Process Awareness in Industry 4.0 . . . . . . . . . . . . 311 Pedro Estrela de Moura, Vasco Amaral, and Fernando Brito e Abreu An Overview on the Identification of Software Birthmarks for Software Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Shah Nazir and Habib Ullah Khan The Mayan Culture Video Game—“La Casa Maya” . . . . . . . . . . . . . . . . . . 331 Daniel Rodríguez-Orozco, Amílcar Pérez-Canto, Francisco Madera-Ramírez, and Víctor H. Menéndez-Domínguez Impact of Decentralized and Agile Digital Transformational Programs on the Pharmaceutical Industry, Including an Assessment of Digital Activity Metrics and Commercial Digital Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 António Pesqueira, Sama Bolog, Maria José Sousa, and Dora Almeida Sprinting from Waterfall: The Transformation of University Teaching of Project Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 Anthony Chan, David Miller, Gopi Akella, and David Tien
Versioning: Representing Cultural Heritage Evidences on CIDOC-CRM via a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 Ariele Câmara, Ana de Almeida, and João Oliveira Toward a Route Optimization Modular System . . . . . . . . . . . . . . . . . . . . . . 373 José Pinto, Manuel Filipe Santos, and Filipe Portela Intellectual Capital and Information Systems (Technology): What Does Some Literature Review Say? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Óscar Teixeira Ramada REST, GraphQL, and GraphQL Wrapper APIs Evaluation. A Computational Laboratory Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 Antonio Quiña-Mera, Cathy Guevara-Vega, José Caiza, José Mise, and Pablo Landeta Design Science in Information Systems and Computing . . . . . . . . . . . . . . . 409 Joao Tiago Aparicio, Manuela Aparicio, and Carlos J. Costa Organizational e-Learning Systems’ Success in Industry . . . . . . . . . . . . . . 421 Clemens Julius Hannen and Manuela Aparicio Network Security Smart Pilot Decontamination Strategy for High and Low Contaminated Users in Massive MIMO-5G Network . . . . . . . . . . . . . . . . . . 435 Khalid Khan, Farhad Banoori, Muhammad Adnan, Rizwan Zahoor, Tarique Khan, Felix Obite, Nobel John William, Arshad Ahmad, Fawad Qayum, and Shah Nazir Cluster-Based Interference-Aware TDMA Scheduling in Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Gohar Ali Messaging Application Using Bluetooth Low Energy . . . . . . . . . . . . . . . . . . 459 Nikhil Venkat Kumsetty, Sarvesh V. Sawant, and Bhawana Rudra Scalable and Reliable Orchestration for Balancing the Workload Among SDN Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 José Moura A Digital Steganography Technique Using Hybrid Encryption Methods for Secure Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 Sharan Preet Kaur and Surender Singh Utilizing Blockchain Technology to Enhance Smart Home Security and Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Rehmat Ullah, Sibghat Ullah Bazai, Uzair Aslam, and Syed Ali Asghar Shah
Quality of Service Improvement of 2D-OCDMA Network Based on Two Half of ZCC Code Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 Mohanad Alayedi The Impact of 5G Networks on Organizations . . . . . . . . . . . . . . . . . . . . . . . . 511 Anthony Caiche, Teresa Guarda, Isidro Salinas, and Cindy Suarez Internet of Things and Smart Technology The Fast Health Interoperability Resources (FHIR) and Integrated Care, a Scoping Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 João Pavão, Rute Bastardo, and Nelson Pacheco Rocha Blockchain Based Secure Interoperable Framework for the Internet of Medical Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533 Wajid Rafique, Babar Shah, Saqib Hakak, Maqbool Khan, and Sajid Anwar WaterCrypt: Joint Watermarking and Encryption Scheme for Secure Privacy-Preserving Data Aggregation in Smart Metering Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547 Farzana Kabir, David Megías, and Tanya Koohpayeh Araghi Technological Applications for Smart Cities: Mapping Solutions . . . . . . . 557 Bruno Santos Cezario and André Luis Azevedo Guedes Duty—Cycling Based Energy-Efficient Framework for Smart Healthcare System (SHS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Bharti Rana and Yashwant Singh Quality 4.0 and Smart Product Development . . . . . . . . . . . . . . . . . . . . . . . . . 581 Sergio Salimbeni and Andrés Redchuk Embedded Vision System Controlled by Dual Multi-frequency Tones . . . 593 I. J. Orlando Guerrero, Ulises Ruiz, Loeza Corte, and Z. J. Hernadez Paxtian Determinants of City Mobile Applications Usage and Success . . . . . . . . . . 605 Rita d’Orey Pape, Carlos J. Costa, Manuela Aparicio, and Miguel de Castro Neto Sustainable Digital Transformation Canvas: Design Science Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615 Reihaneh Hajishirzi The Role of Community Pharmacies in Smart Cities: A Brief Systematic Review and a Conceptual Framework . . . . . . . . . . . . . . . . . . . . . 629 Carla Pires and Maria José Sousa
Digital Media and Education Education in the Post-covid Era: Educational Strategies in Smart and Sustainable Cities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645 Andreia de Bem Machado, João Rodrigues dos Santos, António Sacavém, Marc François Richter, and Maria José Sousa Digital Health and Wellbeing: The Case for Broadening the EU DigComp Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 Anícia Rebelo Trindade, Debbie Holley, and Célio Gonçalo Marques An Information Systems Architecture Proposal for the Thermalism Sector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671 Frederico Branco, Catarina Gonçalves, Ramiro Gonçalves, Fernando Moreira, Manuel Au-Yong-Oliveira, and José Martins Technologies for Inclusive Wellbeing: IT and Transdisciplinary Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683 Anthony L. Brooks Should the Colors Used in the Popular Products and Promotional Products Be Integrated? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693 Takumi Kato Impact of Teacher Training on Student Achievement . . . . . . . . . . . . . . . . . 703 Miguel Sangurima Educational Data Mining: A Predictive Model to Reduce Student Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713 Carlos Redroban, Jorge Saavedra, Marcelo Leon, Sergio Nuñez, and Fabricio Echeverria Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
Editors and Contributors
About the Editors Dr. Sajid Anwar is an Associate Professor at the Center of Excellence in Information Technology Institute of Management Sciences (IMSciences), Peshawar, Pakistan. He received his MS (Computer Science, 2007) and Ph.D. degrees (Software Engineering, 2011) from NUCES-FAST, Islamabad. Previously, he was head of Undergraduate Program in Software Engineering at IMSciences. Dr. Sajid Anwar is leading expert in Software architecture engineering and Software maintenance prediction. His research interests are cross-disciplinary and industry focused and includes: Search based Software Engineering; Prudent based Expert Systems; Customer Analytics, Active Learning and applying Data Mining and Machine Learning techniques to solve real world problems. Dr. Sajid Anwar is Associate editor of Expert Systems Journal Wiley. He has been a Guest Editor of numerous journals, such as Neural Computing and Applications, Cluster Computing Journal Springer, Grid Computing Journal Springer, Expert Systems Journal Wiley, Transactions on Emerging Telecommunications Technologies Wiley, and Computational and Mathematical Organization Theory Journal Springer. He is also Member Board Committee Institute of Creative Advanced Technologies, Science and Engineering, Korea (iCatse.org). He has supervised to completion many M.S. research students. He has conducted and led collaborative research with Government organizations and academia and has published over 50 research articles in prestigious conferences and journals. Dr. Abrar Ullah is an Associate Professor and Director of Postgraduate Studies at the School of Mathematical and Computer Science, Heriot Watt University, Dubai Campus. Abrar received the M.Sc. (Computer Science, 2000) from University of Peshawar. Abrar received the Ph.D. (Security and Usability) from University of Hertfordshire, UK. Abrar has been working in industry and academia for over 20 years. He has vast experience in teaching and development of enterprise systems. Abrar started his teaching career in 2002 as lecturer at the University of Peshawar and Provincial Health Services Academy Peshawar. In 2008, Abrar joined the ABMU
NHS UK as Lead Developer and contributed to a number of key systems in the NHS. In 2011, Abrar joined professional services at Cardiff University as “Team Lead and Senior Systems Analyst” and led a number of successful strategic and national level projects. In the same period, besides his professional role, he also worked as lecturer of “Digital Media Design” for School of Medicine, Cardiff University. In 2017, Abrar held the role of lecturer at school of management and computer science, Cardiff Metropolitan University. He also held the role of “Lead Developer” at the NHS—Health Education and Improvement Wales (HEIW) until 2019. Abrar is General Chair of the 16th ICITA conference to be held in Lisbon, Portugal on 20– 22 October 2022. His research interests are cross-disciplinary and industry focused. Abrar has research interest in Security Engineering, Information Security, Usability, Usable Security, Online Examinations and Collusion Detection, Applying Machine Learning techniques to solve real world security problems. Abrar has published over 16 research articles in prestigious conferences and journals. Dr. Álvaro Rocha holds the title of Honorary Professor, and holds a D.Sc. in Information Science, Ph.D. in Information Systems and Technologies, M.Sc. in Information Management, and B.Sc in Computer Science. He is a Professor of Information Systems at the ISEG—Lisbon School of Economics and Management, University of Lisbon. He is also President of AISTI (the Iberian Association for Information Systems and Technologies), Chair of the IEEE Portugal Section Systems, Man, and Cybernetics Society Chapter, and Editor-in-Chief of both JISEM (Journal of Information Systems Engineering and Management) and RISTI (Iberian Journal of Information Systems and Technologies). Moreover, he has served as Vice-Chair of Experts for the European Commission’s Horizon 2020 program, and as an Expert at the COST—Intergovernmental Framework for European Cooperation in Science and Technology, at the Government of Italy’s Ministry of Education, Universities and Research, at the Government of Latvia’s Ministry of Finance, at the Government of Mexico’s National Council of Science and Technology, and at the Government of Polish’s National Science Centre. Dr. Maria José Sousa (Ph.D. in Industrial Management) is Pro-Rector for Distance Learning Development and a University Professor at ISCTE. She is also a research fellow at Business Research Unit, and has assumed a Post-Doc position from 20162018 in digital learning and digital skills, researching those fields, with several publications in journals with high impact factor (Journal of Business Research, Journal of Grid Computing, Future Generation Computer Systems, and others). And is collaborating as an expert in digital skills, with Delloite (Brussels) for a request of the European Commission in the creation of a new category regarding digital skills to be integrated with the European Innovation Scoreboard (EIS). She was a member of the Coordinator Committee of the Ph.D. in Management at Universidade Europeia. She was also a Senior Researcher at GEE (Research Office) in the Portuguese Ministry of Economy, responsible for Innovation, Research, and Entrepreneurship Policies, and a Knowledge and Competencies Manager at AMA, IP, Public Reform Agency (Ministry of the Presidency and the Ministers Council). She was also a Project
Manager at the Ministry of Labor and Employment, responsible for Innovation, and Evaluation and Development of the Qualifications Projects. Her research interests currently are public policies, health policies, innovation, and information science. She has developed major research in the innovation policies with articles published in high-level journals (as the European Planning Studies, Information Systems Frontiers, Systems Research, and Behavioral Science, Computational and Mathematical Organization Theory, Future Generation Computer Systems, and others). She is also the guest-editor of more than 5 Special Issues from Springer and Elsevier. She has participated in European projects of innovation transfer (for example, as Ambassador of EUWIN, and Co-coordinating an Erasmus+ project with ATO—Chamber of Commerce of Ankara about entrepreneurship) and is also an External Expert of COST Association—European Cooperation in Science and Technology, and former President of the ISO/TC 260—Human Resources Management, representing Portugal in the International Organization for Standardization.
Contributors Fernando Brito e Abreu ISTAR-IUL & ISCTE-Instituto Universitário de Lisboa, Lisboa, Portugal Muhammad Adnan Department of Computer Science, University of Swabi, Swabi, Pakistan Arshad Ahmad Institute of Software Systems Engineering, Johannes Kepler University, Linz, Austria; Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology, Haripur, Pakistan Gopi Akella Charles Sturt University, Wagga Wagga, New South Wales, Australia Ahmed Al-Hamed School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China Feras Al-Obeidat College of Technological Innovation, Zayed University, Abu Dhabi, UAE Mohanad Alayedi Department of Electronics, Ferhat Abbas University of Setif 1, Setif, Algeria Isabel Alexandre Iscte – Instituto Universitário de Lisboa, Lisboa, Portugal Emadalden Alhatami School of Information and Communication Engineering, Hainan University, Haikou, China Gohar Ali Department of Information Systems and Technology, Sur University College, Sur, Oman Dora Almeida Independent Researcher, Lisboa, Portugal
Bráulio Alturas Iscte – Instituto Universitário de Lisboa, Lisboa, Portugal Vasco Amaral NOVA LINCS & NOVA School of Science and Technology, Caparica, Portugal Adnan Amin Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan V. S. Anoop Kerala Blockchain Academy, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India; School of Digital Sciences, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India Sajid Anwar College of Information Technology, Zayed University, Academic, UAE Joao Tiago Aparicio INESC-ID, Instituto, Superior Técnico, University of Lisbon, Lisbon, Portugal Manuela Aparicio NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Lisbon, Portugal Tanya Koohpayeh Araghi Internet Interdiscipinary Institute (IN3), Center for Cybersecurity Research of Catalonia (CYBERCAT), Universitat Oberta de Catalunya, Barcelona, Spain Aakarsh Arora School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India Alexandre Arriaga ISEG (Lisbon School of Economics and Management), Universidade de Lisboa, Lisbon, Portugal S. Asharaf Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India Uzair Aslam People’s Primary Healthcare Initiative (PPHI) Sindh, Karachi, Pakistan Manuel Au-Yong-Oliveira INESC TEC, GOVCOPP, Department of Economics, Management, Industrial Engineering and Tourism, University of Aveiro, Aveiro, Portugal Farhad Banoori South China University of Technology (SCUT), Guangzhou, China Rute Bastardo UNIDCOM, Science and Technology School, University of Trásos-Montes and Alto Douro, Vila Real, Portugal Barı¸s Bayram HepsiJET, ˙Istanbul, Turkey Sibghat Ullah Bazai College of Information and Communication Technology, BUITEMS, Quetta, Pakistan; Department of Computer Engineering, BUITEMS, Quetta, Pakistan
Uzair Aslam Bhatti School of Information and Communication Engineering, Hainan University, Haikou, China Saurabh Bilgaiyan School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India Co¸skun Özenç Bilgili HepsiJET, ˙Istanbul, Turkey Sama Bolog University of Basel, Basel, Switzerland Nada Boudjellal School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; The Faculty of New Information and Communication Technologies, University Abdelhamid Mehri Constantine 2, Constantine, Algeria Frederico Branco Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal; INESC TEC, Porto, Portugal Anthony L. Brooks CREATE, Aalborg University, Aalborg, Denmark Anthony Caiche Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador; CIST—Centro de Investigación en Sistemas y Telecomunicaciones, La Libertad, Ecuador José Caiza Universidad de Las Fuerzas Armadas ESPE, Latacunga, Ecuador Ariele Câmara Centro de Investigaçao em Ciências da Informaçao, Tecnologias e Arquitetura, Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal Bruno Santos Cezario Centro Universitario Augusto Motta, Rio de Janeiro, Brasil Anthony Chan Charles Sturt University, Wagga Wagga, New South Wales, Australia Srinath Chiranjeevi Vellore Institute of Technology, Bhopal, India Loeza Corte Universidad de la Cañada, Teotitlán de Flores Magón, Oax, México Carlos J. Costa Advance/ISEG—Lisbon School of Economics and Management, Universidade de Lisboa, Lisbon, Portugal Helder Gomes Costa Universidade Federal Fluminense, Niterói, RJ, Brazil Nergiz Co¸skun HepsiJET, ˙Istanbul, Turkey Lin Dai School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China Gertrudes Aparecida Dandolini Engineering and Knowledge Management Department, Federal University of Santa Catarina, Santa Catarina, Brazil
Tarim Dar University of Engineering and Technology-Taxila, Department of Software Engineering, Taxila, Pakistan Ana de Almeida Centro de Investigaçao em Ciências da Informaçao, Tecnologias e Arquitetura, Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal; Centre for Informatics and Systems of the University of Coimbra (CISUC), Coimbra, Portugal Miguel de Castro Neto NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Lisbon, Portugal Pedro Estrela de Moura NOVA School of Science and Technology, Caparica, Portugal Luciano Azevedo de Souza Universidade Federal Fluminense, Niterói, RJ, Brazil João Rodrigues dos Santos Economics and Business Department, Universidade Europeia/IADE, Lisbon, Portugal Rita d’Orey Pape EIT InnoEnergy SE, Eindhoven, The Netherlands; ISCTE-IUL, Lisboa, Portugal Fabricio Echeverria Universidad ECOTEC, Samborondón, Ecuador Edilvando Pereira Eufrazio Universidade Federal Fluminense, Niterói, Brazil SiLing Feng School of Information and Communication Engineering, Hainan University, Haikou, China Rodrigo Gomes NOVA School of Science and Technology, Caparica, Portugal Catarina Gonçalves AquaValor – Centro de Valorização e Transferência de Tecnologia da Água, Chaves, Portugal Ramiro Gonçalves Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal; INESC TEC, Porto, Portugal; AquaValor – Centro de Valorização e Transferência de Tecnologia da Água, Chaves, Portugal Mahendra Kumar Gourisaria School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India Teresa Guarda Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador; CIST—Centro de Investigación en Sistemas y Telecomunicaciones, La Libertad, Ecuador André Luis Azevedo Guedes Centro Universitario Augusto Motta, Rio de Janeiro, Brasil Cathy Guevara-Vega Universidad Técnica del Norte, Ibarra, Ecuador; eCIER Research Group, Universidad Técnica del Norte, Ibarra, Ecuador
Haji Gul Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan Reihaneh Hajishirzi Advance/ISEG (Lisbon School of Economics & Management), Universidade de Lisboa, Lisbon, Portugal Saqib Hakak Faculty of Computer Science, Canadian Institute for Cybersecurity, University of New Brunswick, Fredericton, Canada Clemens Julius Hannen NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Lisbon, Portugal; aboDeinauto, Berlin, Germany Muhammad Zaffar Hashmi Department of Chemistry, COMSATS University Islamabad, Islamabad, Pakistan Ahmad Hasnain Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing, China; School of Geography, Nanjing Normal University, Nanjing, China; Jiangsu Center for Collaborative Innovation in Geographical Information, Resource Development and Application, Nanjing, China Farman Hassan University of Engineering and Technology, Taxila, Punjab, Pakistan Bashir Hayat Institute of Management Sciences Peshawar, Peshawar, Pakistan Z. J. Hernadez Paxtian Universidad de la Cañada, Teotitlán de Flores Magón, Oax, México Debbie Holley Department of Nursing Sciences, Bournemouth University, Poole, England MengXing Huang School of Information and Communication Engineering, Hainan University, Haikou, China Muhammad Anwar Hussain Department of Computer Science, University of Technology Malaysia, Johor Bahru, Malaysia Aun Irtaza Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan Nusrat Jahan Department of CSE, Daffodil International University, Dhaka, Bangladesh Ali Javed Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan Sarin Jickson Kerala Blockchain Academy, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India Romieo John Kerala Blockchain Academy, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India
Farzana Kabir Internet Interdiscipinary Institute (IN3), Center for Cybersecurity Research of Catalonia (CYBERCAT), Universitat Oberta de Catalunya, Barcelona, Spain Takumi Kato Meiji University, Tokyo, Japan Sharan Preet Kaur Chandigarh University, Chandigarh, India Fatima Khalid Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan Asif Khan School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China Habib Ullah Khan Department of Accounting & Information Systems, College of Business & Economics, Qatar University, Doha, Qatar Khalid Khan Beijing University of Posts and Telecommunications, Beijing, China Maqbool Khan Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, Pakistan; Software Competence Center Hagenberg, Vienna, Austria Tarique Khan University of Politechnico Delle Marche, Ancona, Italy Nikhil Venkat Kumsetty National Institute of Technology Karnataka, Surathkal, India Eyüp Tolunay Küp HepsiJET, ˙Istanbul, Turkey Pablo Landeta Universidad Técnica del Norte, Ibarra, Ecuador Marcelo Leon Universidad ECOTEC, Samborondón, Ecuador Andreia de Bem Machado Engineering and Knowledge Management Department, Federal University of Santa Catarina, Santa Catarina, Brazil Francisco Madera-Ramírez Universidad Autónoma de Yucatán, Mérida, México Khalid Mahmood Malik Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA Célio Gonçalo Marques Polytechnic Institute of Tomar, Tomar, Portugal; Laboratory of Pedagogical, Innovation and Distance Learning (LIED.IPT), Tomar, Portugal José Martins INESC TEC, Porto, Portugal; AquaValor – Centro de Valorização e Transferência de Tecnologia da Água, Chaves, Portugal David Megías Internet Interdiscipinary Institute (IN3), Center for Cybersecurity Research of Catalonia (CYBERCAT), Universitat Oberta de Catalunya, Barcelona, Spain
Muhammad Hamza Mehmood University of Engineering and Technology, Taxila, Pakistan Víctor H. Menéndez-Domínguez Universidad Autónoma de Yucatán, Mérida, México David Miller Charles Sturt University, Wagga Wagga, New South Wales, Australia José Mise Universidad de Las Fuerzas Armadas ESPE, Latacunga, Ecuador Meherabin Akter Mitu Department of CSE, Daffodil International University, Dhaka, Bangladesh Fernando Moreira REMIT, IJP, Universidade Portucalense, Porto, Portugal; IEETA, Universidade de Aveiro, Aveiro, Portugal José Moura Instituto de Telecomunicações (IT), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal Basit Nadeem Department of Geography, Bahauddin Zakariya University, Multan, Pakistan Banuki Nagodavithana Heriot-Watt University, Dubai, UAE Shah Nazir Department of Computer Science, University of Swabi, Swabi, Pakistan Arlindo Nhabomba Iscte – Instituto Universitário de Lisboa, Lisboa, Portugal Md. Ashiqur Rahaman Nishad Department of CSE, Daffodil International University, Dhaka, Bangladesh Mir Muhammad Nizamani School of Ecology, Hainan University, Haikou, China Sergio Nuñez Universidad del Pacifico, Guayaquil, Ecuador Felix Obite Department of Physics, Ahmadu Bello University, Zaria, Nigeria João Oliveira Centro de Investigaçao em Ciências da Informaçao, Tecnologias e Arquitetura, Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal; Instituto de Telecomunicações, Lisboa, Portugal I. J. Orlando Guerrero Universidad de la Cañada, Teotitlán de Flores Magón, Oax, México João Pavão INESC-TEC, Science and Technology School, University of Trás-osMontes and Alto Douro, Vila Real, Portugal Amílcar Pérez-Canto Universidad Autónoma de Yucatán, Mérida, México António Pesqueira ISCTE-Instituto Universitário de Lisboa, Lisbon, Portugal José Pinto Algoritmi Research Centre, University of Minho, Guimarães, Portugal
Carla Pires CBIOS - Universidade Lusófona’s Research Center for Biosciences and Health Technologies, Lisbon, Portugal Filipe Portela Algoritmi Research Centre, University of Minho, Guimarães, Portugal; IOTECH—Innovation on Technology, Trofa, Portugal Fawad Qayum Department of Computer Science, IT University of Malakand, Totakan, Pakistan Huma Qayyum UET Taxila, Punjab, Pakistan Antonio Quiña-Mera Universidad Técnica del Norte, Ibarra, Ecuador; eCIER Research Group, Universidad Técnica del Norte, Ibarra, Ecuador Wajid Rafique Department of Computer Science and Operations Research, University of Montreal, Quebec, Canada Auliya Ur Rahman University of Engineering and Technology, Taxila, Punjab, Pakistan Óscar Teixeira Ramada ISCE - Douro - Instituto Superior de Ciências Educativas do Douro, Porto, Portugal Michelle Merlino Lins Campos Ramos Universidade Niterói, RJ, Brazil
Federal Fluminense,
Bharti Rana Department of Computer Science and Information Technology, Central University of Jammu, Samba, Jammu & Kashmir, India Andrés Redchuk Universidad Rey Juan Carlos, Madrid, España Bharat Reddy National Institute of Technology, Calicut, India Carlos Redroban Universidad ECOTEC, Samborondón, Ecuador Marc François Richter Postgraduate Program in Environment and Sustainability (PPGAS), Universidade Estadual do Rio Grande do Sul, Porto Alegre, Brazil Nelson Pacheco Rocha IEETA, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal Daniel Rodríguez-Orozco Universidad Autónoma de Yucatán, Mérida, México Shendry Rosero Universidad Estatal Península de Santa Elena, La Libertad, Ecuador Bhawana Rudra National Institute of Technology Karnataka, Surathkal, India Ulises Ruiz Instituto nacional de astrofísica óptica y electrónica. Sta María Tonantzintla, San Andrés Cholula, Pue, México Jorge Saavedra Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador
António Sacavém Economics and Business Department, Universidade Europeia/IADE, Lisbon, Portugal
Manoj Sahni Department of Mathematics, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India Sergio Salimbeni Universidad del Salvador, Buenos Aires, Argentina Isidro Salinas Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador; CIST—Centro de Investigación en Sistemas y Telecomunicaciones, La Libertad, Ecuador Miguel Sangurima Universidad Católica Andrés Bello, Caracas, Venezuela Manuel Filipe Santos Algoritmi Guimarães, Portugal
Research Centre, University of Minho,
Sarvesh V. Sawant National Institute of Technology Karnataka, Surathkal, India Babar Shah Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan Syed Ali Asghar Shah Department of Computer Engineering, BUITEMS, Quetta, Pakistan Khurram Shahzad Department of Data Science, University of the Punjab, Lahore, Pakistan Surender Singh Chandigarh University, Chandigarh, India Yashwant Singh Department of Computer Science and Information Technology, Central University of Jammu, Samba, Jammu & Kashmir, India Amir Sohail UET Taxila, Punjab, Pakistan Maria José Sousa University Institute of Lisbon (ISCTE), Lisbon, Portugal Cindy Suarez Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador; CIST—Centro de Investigación en Sistemas y Telecomunicaciones, La Libertad, Ecuador Sarina Sulaiman Department of Computer Science, University of Technology Malaysia, Johor Bahru, Malaysia David Tien Charles Sturt University, Wagga Wagga, New South Wales, Australia Anícia Rebelo Trindade Polytechnic Institute of Tomar, Tomar, Portugal; Educational Technology Laboratory (LabTE), University of Coimbra, Coimbra, Portugal Abrar Ullah Heriot-Watt University, Dubai, UAE Rehmat Ullah Department of Computer Engineering, BUITEMS, Quetta, Pakistan Abdul Wahab University of Engineering and Technology, Taxila, Pakistan
Nobel John William University of Valencia, Valencia, Spain Usama Zafar University of Engineering and Technology, Taxila, Pakistan Rizwan Zahoor University of Campania Luigi Vanvitelli, Caserta, Italy Zarah Zainab City University of Science and Information Technology, Peshawar, Pakistan Huaping Zhang School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Machine Learning and Data Science
Intelligent Sepsis Detector Using Vital Signs Through Long Short-Term Memory Network
Farman Hassan, Auliya Ur Rahman, and Muhammad Hamza Mehmood
F. Hassan · A. U. Rahman (B) · M. H. Mehmood, University of Engineering and Technology, Taxila, Pakistan; e-mail: [email protected]
Abstract Sepsis has become a primary source of mortality among patients treated in intensive care units. Timely detection of sepsis helps decrease the mortality rate, as the patient becomes difficult to treat once the symptoms worsen. The primary objective of this work is to detect sepsis patients early by utilizing a deep learning model and then to compare the proposed system with other modern techniques to analyze its performance. In this work, we employed a long short-term memory model on a sepsis patient dataset. Three performance metrics are used to evaluate the proposed system: accuracy, specificity, and AUROC. Results were obtained for three different windows after the patient was admitted to the intensive care unit, namely 4, 8, and 12 h window sizes. The proposed system achieved accuracy, specificity, and AUROC of 77, 75, and 91%, respectively. The comparison of the proposed system with other state-of-the-art techniques on the basis of the above-mentioned performance metrics demonstrates its significance and shows that it is reliable enough to implement in real-time environments.

Keywords Sepsis · Deep learning · Long short-term memory · ICUs
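As a rough, illustrative sketch of the kind of model the abstract describes, the following Python code builds a small LSTM classifier over hourly vital-sign windows and tracks accuracy and AUROC during training. The window length, number of features, layer sizes, and training settings are assumptions made for illustration only; they are not the authors' actual configuration.

```python
# Minimal sketch (not the authors' implementation): an LSTM classifier over
# hourly vital-sign windows, evaluated with accuracy and AUROC as in the paper.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW_HOURS = 12   # assumed observation window (the paper reports 4, 8, and 12 h)
NUM_FEATURES = 8    # assumed number of vital-sign features recorded per hour

def build_sepsis_lstm() -> keras.Model:
    """Binary sepsis classifier over a (hours, features) sequence."""
    model = keras.Sequential([
        layers.Input(shape=(WINDOW_HOURS, NUM_FEATURES)),
        layers.Masking(mask_value=0.0),        # ignore padded/missing hours
        layers.LSTM(64),                       # assumed hidden size
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"), # probability of sepsis
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auroc")],
    )
    return model

if __name__ == "__main__":
    # Toy stand-in data; real experiments would use the clinical records.
    x = np.random.rand(256, WINDOW_HOURS, NUM_FEATURES).astype("float32")
    y = np.random.randint(0, 2, size=(256, 1)).astype("float32")
    model = build_sepsis_lstm()
    model.fit(x, y, epochs=2, batch_size=32, validation_split=0.2, verbose=0)
    print(model.evaluate(x, y, verbose=0))  # [loss, accuracy, auroc]
```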
1 Introduction
Sepsis is a major topic in medical research, and different definitions have been used for it: a disorder of the internal organs of the human body [1], a critical condition caused by an overwhelming immune response to an infection [2], and a syndrome without a criterion standard diagnosis [3]. During sepsis, the patient's immune system becomes unbalanced: when a disease infects the body, the body spreads a special fluid against the disease to minimize its effect, and the
infection enters the human body through the lungs, which receive deoxygenated blood from the heart and supply oxygenated blood back to it; once infected, the lungs pass the sepsis infection into the blood vessels, which causes disorder of multiple human organs and ultimately results in the death of the patient [3]. Figure 1 shows the sepsis life cycle affecting human organs.
Fig. 1 Life cycle of sepsis
According to a World Health Organization (WHO) report, annually about 1.7 million people are affected by sepsis and about 0.27 million patients die of the disease in the USA, while about 30 million people are affected by sepsis worldwide and about 20% of them, i.e., 6 million patients, die of it [4]. The diagnosis and treatment of sepsis are costly, consuming a large share of the annual budget, about 24 billion U.S. dollars [4]. Early sepsis detection through artificial intelligence techniques would reduce this extra cost and increase the chances of survival, since patients located far from diagnostic centers would also be able to check for the presence of sepsis using detection devices and start treatment as soon as possible [4]. Sepsis diagnosis is a big challenge even for sepsis experts and for doctors dealing with sepsis patients on a daily basis. SIRS criteria are mostly used to define, predict, and detect sepsis; they are based on body temperature and some other symptoms, including
cold, cough, etc. [5, 6]. SOFA, qSOFA, and MEWS are also used as rule-based criteria for sepsis [7]. The SIRS criteria mentioned in [7] are listed as follows:
• White blood cell count – >12 × 10⁹ cells/L
• Heart rate – >92 beats/min
• Breathing rate – >22 breaths/min
Recent research in sepsis detection has focused on patients with positive sepsis conditions, while some methods focus on patients in the ICU and utilize their health records for early detection and prediction of sepsis [7]. K-nearest neighbors (KNN), recurrent neural networks (RNN), gated recurrent units (GRU), and long short-term memory (LSTM) units have been utilized for sepsis detection with remarkable accuracy and other evaluation parameters [8]. Among all the algorithms for the detection of sepsis, neural networks are the leading ones in every aspect. Specifically, researchers have employed neural network-based methods using the standard dataset available at physionet.org. The dataset contains tabular data with hourly records of about 40,000 patients, including vital signs as well as other clinical values. Researchers have applied pre-processing techniques such as forward filling, backward filling, and mean imputation, and because the datasets are unbalanced, random forests, SVMs, and neural networks of various types have been used to handle such data and avoid biased results [7–9]. Some researchers have evaluated their algorithms on the basis of accuracy, some on the basis of sensitivity and specificity, and some on the basis of accuracy, sensitivity, specificity, area under the ROC curve, precision, and recall [4, 10]. Various techniques have been applied to the sepsis detection problem, and researchers have obtained different evaluation results with them, using both machine learning and deep learning algorithms.
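To make the pre-processing step described above more concrete, here is a minimal Python/pandas sketch of one plausible pipeline: hourly vital-sign records are forward filled, backward filled, and then mean-imputed before the patients are split into training and test sets. The column names, the 80/20 ratio, and the toy data are assumptions for illustration only and do not reproduce the exact setup of any study cited here.

```python
# Minimal sketch (assumed column names and split ratio): imputing hourly
# vital-sign records with forward fill, backward fill, and mean substitution,
# then splitting the patients into training and test sets.
import pandas as pd
from sklearn.model_selection import train_test_split

VITALS = ["HR", "Temp", "SBP", "Resp"]  # assumed vital-sign columns

def impute_patient(records: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps in one patient's hourly record (forward, then backward)."""
    filled = records.sort_values("hour").copy()
    filled[VITALS] = filled[VITALS].ffill()   # carry last observation forward
    filled[VITALS] = filled[VITALS].bfill()   # fill leading gaps backward
    return filled

def preprocess(df: pd.DataFrame):
    """Impute per patient, then split patients 80/20 into train/test sets."""
    parts = [impute_patient(grp) for _, grp in df.groupby("patient_id")]
    imputed = pd.concat(parts, ignore_index=True)
    # Any value still missing (a vital never measured for a patient) gets the dataset mean.
    imputed[VITALS] = imputed[VITALS].fillna(imputed[VITALS].mean())
    patient_ids = imputed["patient_id"].unique()
    train_ids, test_ids = train_test_split(patient_ids, test_size=0.2, random_state=42)
    train = imputed[imputed["patient_id"].isin(train_ids)]
    test = imputed[imputed["patient_id"].isin(test_ids)]
    return train, test

if __name__ == "__main__":
    toy = pd.DataFrame({
        "patient_id": [1, 1, 1, 2, 2, 2],
        "hour":       [0, 1, 2, 0, 1, 2],
        "HR":   [80, None, 90, 70, 72, None],
        "Temp": [36.8, 37.0, None, None, 38.1, 38.4],
        "SBP":  [120, 118, 115, None, None, None],
        "Resp": [18, 19, 20, 22, None, 24],
    })
    train_set, test_set = preprocess(toy)
    print(train_set.head())
```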
set and a 20% testing set. For classification, the KNN classifier was utilized, and for better evaluation, they applied the same method by selecting set A with 20,336 patient records as a training set and set B with 20,000 patient records as a testing set. A training accuracy of 99.7% and a testing accuracy of 99.6% were attained using the 80% and 20% training and testing ratios, respectively. Yao et al. [13] applied three classifiers, namely, decision tree, random forest, and logistic regression (LR). Furthermore, the authors used different autoencoders, namely, spatial, temporal, spatial–temporal, and temporal plus spatial autoencoders, and compared the results of all three classifiers with them. The decision tree obtained an accuracy of 67% using a temporal autoencoder, the random forest obtained an accuracy of 72.2% using a temporal plus spatial autoencoder, and the logistic regression obtained an accuracy of 60.4% using a temporal autoencoder. Eskandari et al. [14] used biomarkers with machine learning algorithms for sepsis detection and used the physiological data [6] for training and testing. Pre-processing was applied by sorting the data; after that, several algorithms were applied to the dataset and their results compared: the accuracy for KNN, SOFA, qSOFA, random forest, and multi-layer perceptron was 99, 64.5, 88.5, 98, and 98%, respectively. Rodríguez et al. [15] applied supervised machine learning algorithms for sepsis detection, using data collected from the ICUs of three high-level university hospitals (HPTU, HPTU, and IPS University), all located in Colombia. The data were from patients above 18 years of age, incomplete entries were skipped, and SIRS criteria were utilized for sepsis identification. Several classifiers were applied for sepsis detection and their results compared: the accuracy for random forest, support vector machine (SVM-ANOVA), SVM-dot, and neural network (NN) was 62.4, 61.7, 61.4, and 62.5%, respectively. Chen and Hernández [16] designed a model that performed data pre-processing, feature engineering, model tuning, and analysis for the final stage of implementation. The dataset was imbalanced, and to resolve this issue a random forest, which is well suited to imbalanced datasets, was applied; the accuracy for the full model was 81.88% and for the compact model 78.62%. Chicco and Oneto [17] used the clinical records of about 364 patients of the Methodist Medical Center and Proctor Hospital; the dataset contained 189 men and 175 women aged between 20 and 86 years, with records of each patient who stayed between 1 and 48 days at the hospital. Various algorithms were applied and their results compared; the accuracy of random forest, MLP, LR, DL, NB, SVM (linear), KNN, SVM (kernel), and DT was 32, 31, 31, 30, 27, 26, 23, 22, and 18%, respectively. Researchers have also utilized deep learning approaches for the detection of sepsis. Al-Mualemi and Lu [7] used electronic health records to detect septic shock and severe sepsis conditions. For training, patient records with severe sepsis conditions were utilized and the eight initial vital signs were used for sepsis prediction; the vital signs used included HR, Temp, SBP, and MAP. SIRS criteria were used for the definition of septic shock, and for classification a deep learning algorithm was used. The results of RNN-LSTM, SVM with quadratic kernel,
and adaptive-CNN were compared: the training accuracy of RNN-LSTM, SVM with quadratic kernel, and adaptive-CNN was 92.72, 78.00, and 93.84%, respectively, while the testing accuracy was 91.10, 68.00, and 93.18%, respectively. Alvi et al. [18] used deep neural networks for the early detection of neonatal sepsis, a sepsis condition concerning the mother and newborn baby [4, 19]. Two datasets were used, namely, Lopez Martinez and AJ Masino; both datasets were obtained from different fields of study and thus gave a variety of data for training and testing. The Lopez Martinez dataset contained about 33% sepsis cases and 66% non-sepsis cases, giving roughly a 1:2 imbalance, and contained 50 columns including labels; if the labels are removed, a 7 × 7 matrix can easily be formed that resembles the handwritten-digit dataset MNIST. Moreover, artificial neural networks (ANN), CNN, and LSTM-RNN were applied and their results compared; the LSTM-RNN gave the highest accuracy of 99.40%, while the accuracy of ANN and CNN was 98.20 and 97.21%, respectively. Kok et al. [9, 20] applied a temporal convolutional network for sepsis detection and used Gaussian process regression (GPR) to predict the distribution of possible values for the missing values of all entries. The temporal convolutional network was trained on the training dataset and obtained a training accuracy of 95.5% and a testing accuracy of 80.0%, while on the basis of time-step metrics the accuracy was 98.8%. Fu et al. [21] applied a convolutional neural network for sepsis detection; missing values were replaced with 0 for the CNN and −1 for the RNN. To remove the effects of missing values, a feature selection technique was applied, selecting 11 features out of 40, and both the CNN and the RNN-LSTM were bagged. By averaging the outcomes of the CNN and RNN-LSTM, an ensemble model was obtained and applied to the testing dataset. The performances of the CNN, RNN, and ensemble were compared; the accuracy for CNN, RNN, and ensemble was 89.4, 87.5, and 92.7%, respectively. Wickramaratne and Shaad Mahmud [8] shifted the labels of the dataset 6 h ahead for early prediction of sepsis and used the initial eight vital signs. Labels were one-hot encoded as 1 and 0 for sepsis and normal cases before being fed into the network. They applied a GRU using only vital signs as well as vital signs with the laboratory values; the overall accuracy of the GRU was 99.8%, the accuracy with only vital signs was 96.7%, and with vital signs and laboratory values the accuracy was 98.1%. Baral et al. [22] used a bi-directional LSTM for sepsis prediction. They applied a feature extraction technique using MLP and LSTM; the dataset was unbalanced, so they used the Synthetic Minority Over-sampling Technique (SMOTE) to resolve the imbalance, and to resolve the issue of irregular time series they applied a bucketing technique. After that, they applied the bi-directional LSTM for classification and compared the results of the state-of-the-art algorithm and the proposed one: the accuracy of the state of the art was 85.7% and that of the proposed solution was 92.5%. Rafiei et al. [23] applied a fully connected LSTM-CNN model for sepsis detection, utilizing the dataset available at Physionet.org (Barton et al.
2021). They used the dataset in two modes: the first with vital signs and demographic values of the patient, and the second with clinical values of the patient. In the first mode, they used an LSTM, while in the second mode they used a GRU
for sepsis prediction. In the first mode, the accuracy for 4, 8, and 12 h was 75, 69, and 72%, respectively, while in the second mode the accuracy for 4, 8, and 12 h was 68, 66, and 62%, respectively. Van Steenkiste et al. [24] applied a BiLSTM neural network for sepsis detection, using the health records of the ICU department of Ghent University Hospital in Belgium, containing records of 2177 patients; they utilized SVM, ANN, KNN, and LR for comparison with their algorithm. The accuracy of BiLSTM, ANN, SVM, KNN, and LR was 0.83, 0.56, 0.55, 0.35, and 0.54, respectively. Liu et al. [25] applied a bi-directional long short-term memory network and a medical-grade wearable multisensor system for sepsis prediction. They used electronic health records from the MIMIC-III database for training and testing of their model, selecting records of 5699 patients aged over 18 years and applying a forward filling approach to fill up the 30% empty values. The selected records contained 2748 non-sepsis cases and 2130 sepsis cases, and 5-fold cross-validation was used for training. They compared their algorithm with CNN, LSTM, XGBoost, MLP, and random forest; the accuracy of TOP-Net, CNN, LSTM, XGBoost, MLP, and random forest was 90.1, 89.3, 89.8, 88.3, 87.9, and 87.3%, respectively. da Silva et al. [26] used DeepSigns for sepsis prediction, applying a deep learning-based LSTM algorithm to electronic health records of patients with time series. The APACHE II algorithm used for comparison relies on 16 attributes including age, while DeepSigns used 15 attributes, excluding age. They trained with 10, 15, 20, and 25 epochs and then evaluated their algorithm using accuracy, obtaining a value of 81.50%.
The above literature has shown improvement in the detection of sepsis; however, none of the work has focused on smaller windows, for example, 4, 8, and 12 h. There is a need for a sepsis detection system that better monitors the condition of patients with sepsis shortly after they are admitted to the ICU. Therefore, we propose an LSTM-based sepsis detector for the early detection of sepsis in smaller windows, namely, 4, 8, and 12 h. Our major contributions in this work are as follows:
• We designed a novel LSTM-based approach for early detection of sepsis.
• We evaluated the performance of our approach for brief periods of time, particularly 4, 8, and 12 h.
• The proposed technique successfully detected sepsis and non-sepsis cases with a better detection rate.
• For validation of our approach, rigorous experimentation was conducted using clinical data.
• Comparative assessment with existing approaches indicates that our approach has the capability to detect sepsis and can be employed in emergency centers.
We organized the remaining manuscript as follows: Sect. 2 discusses the proposed working mechanism, Sect. 3 gives details of the experimental evaluation, and finally, Sect. 4 concludes this work.
2 Proposed Methodology

The main purpose of this work is to detect sepsis patients using the clinical data available at PhysioNet (Barton et al. 2021). Our approach comprises three main stages, namely, pre-processing, feature extraction, and finally, classification. In the initial stage, we employed two techniques, namely, forward filling and backward filling of the MAP and DBP attributes. Next, in the second stage of this work, we selected those patients that have more than one observation and whose clinical data are available for regular hours. Furthermore, we employed a filter for patients whose prediction falls in the range of 3–24 h. Next, in the classification stage, we selected only the vital signs from both the training and testing sets. The LSTM was trained using the training set, while the testing set was utilized for evaluation purposes. Finally, our approach detects whether a person has sepsis or is healthy. The detailed working procedure of our approach is shown in Fig. 2.
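As a rough illustration of this patient-selection step, the following pandas sketch filters hourly patient records. The column names (SepsisLabel, ICULOS) follow the dataset description in Sect. 3.1, while the record layout (one DataFrame per patient) and the exact interpretation of the 3–24 h filter are assumptions rather than the authors' code.

```python
import pandas as pd

def select_patients(records: dict) -> dict:
    """Keep patients with more than one hourly observation whose (possible)
    sepsis onset lies in the 3-24 h range after ICU admission."""
    selected = {}
    for pid, df in records.items():
        if len(df) <= 1:                               # require repeated observations
            continue
        onset = df.loc[df["SepsisLabel"] == 1, "ICULOS"]
        # keep non-sepsis controls, and sepsis cases whose onset falls in 3-24 h
        if onset.empty or 3 <= onset.iloc[0] <= 24:
            selected[pid] = df
    return selected
```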
2.1 Pre-processing

Pre-processing was performed on the PhysioNet dataset in order to remove its flaws and make it usable. The utilized dataset was obtained from Barton et al. (2021) and contains two classes of data, named A and B. These classes consist of PSV files that incorporate hourly records of patients. There are more than
Fig. 2 Working system
40,000 hourly records present in both of these classes, obtained from both sepsis and non-sepsis patients. The dataset has 31% missing values, which were filled using two data filling techniques, i.e., backward filling and forward filling [27]. Some MAP and DBP values were still missing after this step and were filled using the DBP and MAP formula. Forward filling fills a missing value with its preceding value in the CSV file, whereas backward filling is applied to initial rows that cannot be filled by forward filling; these missing values are filled with the values of the next row.
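A minimal pandas sketch of this filling procedure is given below; the ffill/bfill calls are standard pandas operations, while the column names and the MAP ≈ DBP + (SBP − DBP)/3 approximation (the usual clinical formula, which the text does not spell out) are assumptions.

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing hourly values with forward then backward filling, and
    derive any remaining missing MAP values from SBP and DBP."""
    df = df.sort_values("ICULOS")   # keep the hourly order intact
    df = df.ffill()                 # forward filling: propagate preceding values
    df = df.bfill()                 # backward filling: handles leading rows
    # standard approximation MAP ~ DBP + (SBP - DBP)/3 for values still missing
    still_missing = df["MAP"].isna()
    df.loc[still_missing, "MAP"] = df["DBP"] + (df["SBP"] - df["DBP"]) / 3
    return df
```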
2.2 Intelligent Sepsis Detector

We propose a novel intelligent sepsis detector based on the LSTM network [28–30]. LSTM is an important type of recurrent neural network (RNN) that is capable of storing information for a long time; the vanishing gradient problem of standard RNNs is resolved in the LSTM. The memory cells and gates present in the hidden recurrent layer of the LSTM enable it to store dependencies over long intervals of time. A single LSTM memory cell is shown in Fig. 3. Each memory cell maintains a cell state vector c_t, and at each time step the cell can be read, written, or reset through an explicit gating mechanism. Each memory cell has four gates, i.e., an input gate i_t, a modulation gate g_t, an output gate o_t, and a forget gate f_t. The x_t, h_t, and t in Fig. 3 represent the input, hidden state, and time, respectively. The input gate i_t controls whether the memory cell is updated, the modulation gate g_t generates the candidate information written to the internal cell state c_t, the forget gate f_t controls whether the memory cell is reset to zero, and the output gate o_t controls whether the information of the current cell state is made visible. The input, forget, and output gates have a sigmoid activation ranging from 0 to 1, while the modulation gate uses a tanh activation. These gates are calculated using the following formulas.
Fig. 3 LSTM memory cell
i_t = σ(W_i x_t + V_i h_{t−1} + b_i)    (1)

f_t = σ(W_f x_t + V_f h_{t−1} + b_f)    (2)

o_t = σ(W_o x_t + V_o h_{t−1} + b_o)    (3)

g_t = tanh(W_g x_t + V_g h_{t−1} + b_g)    (4)

c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t    (5)

h_t = o_t ⊙ tanh(c_t)    (6)
The tanh activation of the modulation gate g_t keeps the gradient well distributed and prevents it from vanishing, which allows information, and hence dependencies, to flow over long intervals of time. In this study, the TensorFlow library of Python is used to implement the LSTM model. We utilized the following configuration of the LSTM for the detection of sepsis: 5 LSTM layers with 200 hidden units each, ReLU activation, same padding, a mini-batch size of 64, the Adam optimizer, and a maximum of 600 epochs, with the 5 LSTM layers followed by a fully connected layer and a softmax output. Furthermore, we tried various other configurations; however, we obtained the best detection performance with the above-mentioned settings.
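A minimal Keras sketch of the configuration listed above is shown below; the layer count, unit size, activation, optimizer, batch size, and epoch budget follow the text, while the input shape, loss function, and two-class softmax output layer are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sepsis_lstm(timesteps: int, n_features: int) -> tf.keras.Model:
    """Five stacked LSTM layers with 200 ReLU units, followed by a fully
    connected softmax output for the sepsis / non-sepsis decision."""
    model = models.Sequential()
    model.add(layers.Input(shape=(timesteps, n_features)))
    for _ in range(4):
        model.add(layers.LSTM(200, activation="relu", return_sequences=True))
    model.add(layers.LSTM(200, activation="relu"))       # fifth LSTM layer
    model.add(layers.Dense(2, activation="softmax"))     # fully connected + softmax
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model.fit(X_train, y_train, batch_size=64, epochs=600)
```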
3 Experimental Results and Discussions

This section presents the experimental details. Our approach is evaluated using three performance parameters, namely, accuracy, specificity, and AUROC. The details are discussed in the subsequent sections.
3.1 Dataset

We used the dataset that is publicly available at PhysioNet (Barton et al. 2021). PhysioNet provides a complete research guide in the form of related research papers, including conference papers as well as journal papers (Barton et al. 2021). PhysioNet also provides a platform for research teams to work on its annual challenge; teams from all over the world research the topic and submit their papers on this platform. The dataset used is in the form of PSV files, which we converted to CSV format for further processing. This dataset comprises two training sets,
Training Set A and Training Set B. Each of them has clinical values of approximately 20,000 patients, recorded for each hour of a roughly 24-h hospital stay, with about 31.63% NaN values. The clinical values include the attributes Resp, EtCO2, HR, O2Sat, MAP, DBP, Temp, SBP, BaseExcess, Gender, ICULOS, Unit1, Unit2, and SepsisLabel. The initial eight are known as the vital signs, the next five are demographic values of the patient, SepsisLabel is the sepsis label, and the remaining attributes of the dataset are clinical laboratory values. The NaN values were not measured at the time the dataset was organized (Barton et al. 2021).
3.2 Results and Discussions

To analyze the performance of our approach, we split the dataset into training and test sets. For this purpose, we randomly allocated 90 and 10% of the records to the training and testing sets, respectively. To make the results less biased toward the selected sets, we further applied stratified ten-fold cross-validation, in which the data are randomly partitioned into ten equal-sized folds (sets) with approximately the same percentage of each sepsis label. A single fold acts as the test set, while the remaining nine folds are used as the training set. The cross-validation process is repeated ten times, with each of the ten folds used precisely once as the test set. The results are then averaged to produce a single estimate. Furthermore, we used window-slicing and noise-injection data augmentation techniques to handle the class imbalance issue. The window-slicing augmentation method randomly extracts continuous slices from the original EHRs. For noise injection, we randomly applied Gaussian noise to the measured vital signs. Deep architectures are prone to overfitting; therefore, two regularization techniques were used in our model: l2 weight decay, which penalizes the l2 norm of the model's weights, and dropout, which stochastically omits units in each layer during training. In the training phase, the network weights are updated through mini-batch Stochastic Gradient Descent (SGD) over a shuffled batch size of 64. The Nesterov acceleration technique is used to optimize the Mean Squared Error (MSE) loss function. We trained our model for 600 epochs. The model is implemented in Python using the Keras framework 2.2.4 with TensorFlow 1.14.0 as the backend. As an illustrative case, during the thirty-seventh hour of one patient's record, unusual changes in the heart and respiratory rates occurred; the body temperature then rose slightly, and within just a few hours the patient met the Sepsis-3 definition. This research work has developed a system for the timely identification of sepsis in the human body. The proposed system utilizes the LSTM model, which is trained and evaluated on the PhysioNet dataset (Barton et al. 2021). The dataset comprises computerized health reports of multiple patients admitted to ICUs. In this work, we utilized the performance metrics AUROC, specificity, and accuracy to evaluate the proposed system. These performance metrics are calculated for three different window sizes, i.e., a 4-h window (4 h), an 8-h window (8 h), and a 12-h window (12 h), after the patient is admitted to the ICU.
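Before turning to the results in Table 1, the evaluation protocol described above can be sketched as follows. The stratified ten-fold split uses scikit-learn, while the toy array shapes and the augmentation parameters (slice length, noise scale) are assumptions rather than the authors' exact values.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)

# Toy stand-ins: 100 patients, 24 hourly rows, 8 vital signs (shapes assumed)
X = rng.normal(size=(100, 24, 8))
y = rng.integers(0, 2, size=100)            # 1 = sepsis, 0 = non-sepsis

def augment(record, slice_len=12, noise_std=0.05):
    """Window slicing plus Gaussian noise injection for one patient record."""
    start = rng.integers(0, record.shape[0] - slice_len + 1)
    sliced = record[start:start + slice_len]
    return sliced + rng.normal(0.0, noise_std, size=sliced.shape)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X.reshape(len(X), -1), y):
    # each fold keeps roughly the same sepsis/non-sepsis proportion;
    # augmentation is applied to the minority class of the training folds only
    extra = [augment(X[i]) for i in train_idx if y[i] == 1]
```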
Table 1 Performance using LSTM

Performance metrics | 4 h           | 8 h           | 12 h
AUROC%              | 91.19 ± 0.005 | 89.21 ± 0.007 | 87.34 ± 0.021
Specificity%        | 75.24 ± 0.014 | 73.45 ± 0.015 | 70.32 ± 0.018
Accuracy%           | 76.32 ± 0.012 | 77.67 ± 0.014 | 71.42 ± 0.016
The AUROC is a metric used to evaluate the classification performance of the model depending upon some threshold value (GreatLearning, Understanding ROC, 2020). The AUROC results calculated for the 4, 8, and 12 h windows are 91.19, 89.21, and 87.34%, respectively, as given in Table 1. The specificity metric reflects the false positive rate of a classifier: it is the ratio of true negatives to the sum of true negatives and false positives [31]. The specificity achieved by the proposed system is 75.24% in the 4 h window, 73.45% in the 8 h window, and 70.32% in the 12 h window, as shown in Table 1. The accuracy metric is widely used for measuring performance in most deep learning models. It is the ratio of correctly determined predictions to the total number of predictions made by the model [32]. The accuracies achieved by the proposed system are 76.32, 77.67, and 71.42% for the 4, 8, and 12 h windows, respectively, as given in Table 1. These results demonstrate that the proposed system can efficiently identify sepsis patients using the patient data provided to the system.
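The three metrics can be computed as in the short scikit-learn sketch below (a minimal illustration with made-up labels and scores, not the study's data); specificity is derived from the confusion matrix as TN / (TN + FP).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

def evaluate(y_true, y_prob, threshold=0.5):
    """AUROC from predicted probabilities; accuracy and specificity from
    labels obtained by thresholding those probabilities."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUROC": roc_auc_score(y_true, y_prob),
        "Specificity": tn / (tn + fp),     # TN / (TN + FP)
        "Accuracy": accuracy_score(y_true, y_pred),
    }

print(evaluate([0, 0, 1, 1, 0, 1], [0.2, 0.4, 0.9, 0.3, 0.1, 0.8]))
```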
3.3 Performance Comparison with Other Techniques

In this section, a comparative analysis is performed with other state-of-the-art approaches to verify the significance of the proposed system. To conduct this comparison, the results achieved in other studies are compared with the results of the proposed system. Nemati et al. [33] developed an artificial intelligence sepsis expert (AISE) system for the detection of sepsis in patients. The various features obtained from patient records were processed by a machine learning model, i.e., a modified Weibull-Cox proportional hazards model. That study calculated results in 4, 6, 8, and 12 h windows after the patient was admitted to the ICU and achieved an AUROC of 85%, a specificity of 67%, and an accuracy of 67%. In [34], multiple features from EMR, entropy, and socio-demographic information were merged to develop a model for the detection of sepsis. Blood pressure (BP) and heart rate (HR) features were shown to be significant predictors for detecting sepsis in that study, and the results were calculated for a 4 h window size: the AUROC value obtained was 78%, with a specificity of 55% and an accuracy of 61%. The detailed results in terms of AUROC, accuracy, and specificity of the proposed and other models are given in Table 2. These obtained results demonstrate that our
Table 2 Performance comparison with other techniques

Reference paper         | Method                                            | Accuracy% | Specificity% | AUROC%
Nemati et al. [33]      | Modified Weibull-Cox proportional hazards model   | 67        | 67           | 85
Shashikumar et al. [34] | Entropy + EMR + socio-demographic patient history | 61        | 55           | 78
Proposed model          | LSTM model                                        | 77        | 75           | 91
proposed LSTM-based system provides better results compared to other techniques. Therefore, based on this comparison, we conclude that our proposed system can be utilized in real-time environments for the detection of sepsis.
4 Conclusion

This study proposed an intelligent sepsis detection system for the early detection of sepsis. Sepsis is a life-threatening disease, and millions of people die every year due to negligence. Therefore, it is necessary to develop an automated detection system for sepsis to prevent the loss of precious lives. In this work, we proposed a system that employs a deep learning model, the LSTM network, trained on the PhysioNet dataset. The results show that the proposed LSTM-based system can detect sepsis patients with a very low false rate and at early stages. Additionally, the proposed system can be implemented in real-time scenarios such as the ICU. In future work, we aim to explore more state-of-the-art deep learning frameworks to enhance the performance of timely sepsis detection.
References

1. Nemati S et al (2018) An interpretable machine learning model for accurate prediction of sepsis in the ICU. Crit Care Med 46(4):547–553
2. Li X, Kang Y, Jia X, Wang J, Xie G (2019) TASP: a time-phased model for sepsis prediction. In: 2019 computing in cardiology (CinC). IEEE, p 1
3. Delahanty RJ, Alvarez JoAnn, Flynn LM, Sherwin RL, Jones SS (2019) Development and evaluation of a machine learning model for the early identification of patients at risk for sepsis. Ann Emerg Med 73(4):334–344
4. Reyna M, Shashikumar SP, Moody B, Gu P, Sharma A, Nemati S, Clifford G (2019) Early prediction of sepsis from clinical data: the PhysioNet/computing in cardiology challenge 2019. In: 2019 computing in cardiology conference (CinC), vol 45, pp 10–13. https://doi.org/10.22489/cinc.2019.412
5. Dellinger RP et al (2013) Surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Med 39(2):165–228
6. Giannini HM et al (2019) A machine learning algorithm to predict severe sepsis and septic shock: development, implementation, and impact on clinical practice. Crit Care Med 47(11):1485–1492
7. Al-Mualemi BY, Lu L (2020) A deep learning-based sepsis estimation scheme. IEEE Access 9:5442–5452
8. Wickramaratne SD, Shaad Mahmud MD (2020) Bi-directional gated recurrent unit based ensemble model for the early detection of sepsis. In: 2020 42nd annual international conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, pp 70–73
9. Kok C, Jahmunah V, Oh SL, Zhou X, Guruajan R (2020) Automated prediction of sepsis using temporal convolutional network. J Comput Biol Med 127
10. Li X, André Ng G, Schlindwein FS (2019) Convolutional and recurrent neural networks for early detection of sepsis using hourly physiological data from patients in intensive care unit. In: 2019 computing in cardiology (CinC). IEEE, p 1
11. Barton C et al (2019) Evaluation of a machine learning algorithm for up to 48-hour advance prediction of sepsis using six vital signs. Comput Biol Med 109:79–84. https://physionet.org
12. Biglarbeigi P, McLaughlin D, Rjoob K, Abdullah A, McCallan N, Jasinska-Piadlo A, Bond R et al (2019) Early prediction of sepsis considering early warning scoring systems. In: 2019 computing in cardiology (CinC). IEEE, p 1
13. Yao J, Ong ML, Mun KK, Liu S, Motani M (2019) Hybrid feature learning using autoencoders for early prediction of sepsis. In: 2019 computing in cardiology (CinC). IEEE, p 1
14. Eskandari MA, Moridani MK, Mohammadi S (2021) Detection of sepsis patients using biomarkers based on machine learning
15. Rodríguez A, Mendoza D, Ascuntar J, Jaimes F (2021) Supervised classification techniques for prediction of mortality in adult patients with sepsis. Am J Emerg Med 45:392–397
16. Chen M, Hernández A (2021) Towards an explainable model for Sepsis detection based on sensitivity analysis. IRBM
17. Chicco D, Oneto L (2021) Data analytics and clinical feature ranking of medical records of patients with sepsis. BioData Mining 14(1):1–22
18. Alvi RH, Rahman MH, Khan AAS, Rahman RM (2020) Deep learning approach on tabular data to predict early-onset neonatal sepsis. J Inf Telecommun 1–21
19. Reyna MA, Josef C, Seyedi S, Jeter R, Shashikumar SP, Brandon Westover M, Sharma A, Nemati S, Clifford GD (2019) Early prediction of sepsis from clinical data: the PhysioNet/computing in cardiology challenge 2019. In: 2019 computing in cardiology (CinC). IEEE, p 1
20. Kok C, Jahmunah V, Oh SL, Zhou X, Gururajan R, Tao X, Cheong KH, Gururajan R, Molinari F, Rajendra Acharya U (2020) Automated prediction of sepsis using temporal convolutional network. Comput Biol Med 127:103957
21. Fu J, Li W, Jiao Du, Xiao B (2020) Multimodal medical image fusion via Laplacian pyramid and convolutional neural network reconstruction with local gradient energy strategy. Comput Biol Med 126:104048
22. Baral S, Alsadoon A, Prasad PWC, Al Aloussi S, Alsadoon OH (2021) A novel solution of using deep learning for early prediction cardiac arrest in Sepsis patient: enhanced bidirectional long short-term memory (LSTM). Multimed Tools Appl 1–26
23. Rafiei A, Rezaee A, Hajati F, Gheisari S, Golzan M (2021) SSP: early prediction of sepsis using fully connected LSTM-CNN model. Comput Biol Med 128:104110
24. Van Steenkiste T, Ruyssinck J, De Baets L, Decruyenaere J, De Turck F, Ongenae F, Dhaene T (2019) Accurate prediction of blood culture outcome in the intensive care unit using long short-term memory neural networks. Artif Intell Med 97:38–43
25. Liu X, Liu T, Zhang Z, Kuo P-C, Xu H, Yang Z, Lan K et al (2021) TOP-net prediction model using bidirectional long short-term memory and medical-grade wearable multisensor system for tachycardia onset: algorithm development study. JMIR Med Informatics 9(4):e18803
26. da Silva DB, Schmidt D, da Costa CA, da Rosa Righi R, Eskofier B (2021) DeepSigns: a predictive model based on Deep Learning for the early detection of patient health deterioration. Exp Syst Appl 165:113905
27. Ullah A et al (2022) Comparison of machine learning algorithms for sepsis detection. Sepsis
28. Qadir G et al (2022) Voice spoofing countermeasure based on spectral features to detect synthetic attacks through LSTM. Int J Innov Sci Technol 3:153–165
29. Dawood H et al (2022) A robust voice spoofing detection system using novel CLS-LBP features and LSTM. J King Saud Univ Comput Inf Sci
30. Hassan F, Javed A (2021) Voice spoofing countermeasure for synthetic speech detection. In: 2021 international conference on artificial intelligence (ICAI). IEEE
31. Kumar A (2018) ML metrics: sensitivity vs. specificity - DZone AI, dzone.com. [Online]. https://dzone.com/articles/mlmetricssensitivityvsspecificitydifference#:~:text=What%20Is%20Specificity%3F,be%20termed%20as%20false%20positives. Accessed 13-Mar-2022
32. How to check the accuracy of your machine learning model, Deepchecks, 09-Feb-2022. [Online]. https://deepchecks.com/how-to-check-the-accuracy-of-your-machine-learning-model/. Accessed 13-Mar-2022
33. Nemati S, Holder A, Razmi F, Stanley MD, Clifford GD, Buchman TG (2018) An interpretable machine learning model for accurate prediction of sepsis in the ICU. Crit Care Med 46(4):547–553. https://doi.org/10.1097/CCM.0000000000002936
34. Shashikumar SP, Stanley MD, Sadiq I, Li Q, Holder A, Clifford GD, Nemati S (2017) Early sepsis detection in critical care patients using multiscale blood pressure and heart rate dynamics. J Electrocardiol 50(6):739–743
Implementation of Big Data and Blockchain for Health Data Management in Patient Health Records António Pesqueira , Maria José Sousa , and Sama Bolog
Abstract Blockchain Technology (BT) and Big Data (BD)-based data management solutions can be used for storing and processing sensitive patient data efficiently in the healthcare field. While many institutions and industries have recognized the significance of both technologies, few have implemented and executed them in the health sector when it comes to the management of patients’ medical records. By leveraging Patients’ Health Records (PHR) data, the purpose of this paper is to develop a practical application with an architecture built on BT and BD technologies and to help organizations manage data requirements and enhance data security, provenance, traceability, availability, and effective identity management. For that purpose, a case study was developed, which covers the BT and BD key considerations, as well as key issues such as policies, smart contracts, consent, and provision of secure identities, so that records are properly managed and controlled. Hence, the purpose of this study is to summarize the key characteristics of a practical EHR implementation, emphasizing security measures and technologies to decipher the effectiveness of the included technological components, such as decentralized identification, consent management, and private BT management. According to the results of the case study research, the presented solution has high accuracy and is capable of managing PHR effectively. Additionally, it has been shown to have a high practical value when it comes to meeting the accuracy and real-time requirements of BT and BD applications. Keywords Blockchain Technology · Big Data · Patients’ Health Records
A. Pesqueira · M. J. Sousa (B) ISCTE-Instituto Universitário de Lisboa, Lisbon, Portugal e-mail: [email protected] A. Pesqueira e-mail: [email protected] S. Bolog (B) University of Basel, Basel, Switzerland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_2
1 Introduction

Global healthcare organizations (HCOs) manage vast amounts of clinical, medical, and administrative data, from pharmaceutical supply chains to Patient Health Records (PHRs) and claims management. As different HCO data management security procedures become more common, an entirely new ecosystem of information is becoming available, increasing the volume of collected data exponentially [1]. The ability to link currently siloed information and serve as the “single source of truth” makes Blockchain Technology (BT) and Big Data (BD) extremely valuable technologies for improving healthcare-related clinical and operational data management solutions. This work proposes an immutable, secure, scalable, and interoperable architecture that enables patients and hospitals to be more transparent and secure while collecting sensitive patient data from a variety of integrated, connected, but independently managed healthcare systems. Hence, a BT- and BD-based architecture is applied as a framework for a future real-world PHR system that ensures controlled data access and integrity and provides a transparent and trustable system framework for patient information for various stakeholders in the healthcare sector. By incorporating more advanced encryption methods into an immutable design audit trail, this study shows how the presented architectural design enhances the privacy, confidentiality, and secrecy of patient data compared to existing solutions. Using the proposed architecture, it was possible to create an immutable, secure, scalable, and interoperable platform that empowers patients and hospitals with greater transparency, privacy, and security while collecting sensitive patient data from a variety of integrated, connected, but independently managed healthcare systems.
2 Methodology

By leveraging Patients' Health Records (PHR), the purpose of this paper is to develop a practical application with an architecture built on BT and BD technologies and to help HCOs and patients manage data requirements and enhance data control, provenance, traceability, availability, and effective identity management. Within the scope of the project, PHRs are clearly defined by the possibility of connecting wearable and sensor devices, but also by allowing self-managed patient monitoring and connections to different HCO providers to achieve a fully secure and trustworthy healthcare data management system. The designed solution, in a private and protected Hyperledger Fabric (HF) environment, was based on transactions in a proposed private BT, which represent exchanges of information and documents, as well as cryptographic hash files that represent single words used for Master Data Management (MDM) purposes and high-resolution medical images.
One key objective of this proposed architecture was the design of a permissionless system in which patients can be anonymous or pseudonymous and every patient can add new blocks to the ledger. On the other hand, in the developed permissioned BT, the identity of each patient is controlled by an identity provider or by hospital administrative access. With decentralized identity and other privacy mechanisms, blockchain and distributed ledger technologies offer users novel possibilities for protecting their data. By enabling users to own and control their data, these systems provide users with greater sovereignty. This case study involved an exhaustive consultation with eight hospital management and operating members from three hospitals located in Germany, Portugal, and Spain, which requested that the identity aspects of the research paper be anonymized for reasons of confidentiality and data protection. This consultation with hospital management and operating members allowed a better understanding of all the involved requirements and the necessary technical system architecture. One of the main concerns when interviewing the hospital professionals was to address several trust issues, such as patient identification, patient consent, and hospital-patient user authentication. The involved hospital staff also stressed the importance of allowing patients to add consent statements at any stage of their inpatient care journey or medical consultations, with a trust mechanism whereby the BT holds them securely. In the process of collecting requirements from the healthcare professionals, a requirement arose that was directly related to the ability to act upon the directives and restrictions of the patients: a system must be in place that can interpret them as access control decisions and provide assurance that the system adheres to patient directives. The ability of healthcare providers to use a consistent, rules-based system for accessing patient data that can be permissioned to selected health organizations was essential. In addition, having different interconnected systems makes it easier to integrate PHR systems by utilizing a system of on-chain and off-chain data storage, where the designed architecture needed to ensure full compliance. Access to the on-chain resources needs to be made available immediately to anyone who has permission to view the BT, while the off-chain data are stored in a designed SAP HANA configuration controlled by consent based on patient data from the EHR/PHR system. Different healthcare companies have tested SAP HANA for predictive analysis, spatial data analysis, graph data analysis, and other advanced analytical operations, which led to the selection of this architectural approach. Personalized care is provided for all patients based on their biological, clinical, and lifestyle information [2].
2.1 Architecture

One initial consideration was the possibility of collecting data through web-based and mobile applications in the future, in addition to the existing well-being and care
sensor technologies in the involved hospitals in different settings, and integrating them using REST (representational state transfer) application programming interfaces (APIs). The scalability option is critical in the future to ensure that the designed architecture can become the backbone for future PHRs. It should incorporate data from both patient-based and EHR-based technologies to provide a robust and comprehensive pool of data that can be accessed by authorized users such as healthcare providers and patients. Among the practical resolutions was the integration of Ethereum smart contracts written in the Solidity language, embedded in the distributed BT network. A smart contract is a self-enforcing, immutable, autonomous, and accurate contract written in code, which is the building block of Ethereum applications. Furthermore, there is the security assurance that once smart contracts are deployed and transactions are completed, the code becomes immutable and, as a result, the transactions and information become irreversible [3]. As a result of HF's connection with the Ethereum platform, smart contracts were developed to deploy complex business logic into the network validation nodes as well as to test future scenarios for exchanging medical images between Externally Owned Accounts and Contract Accounts (CAs). By combining Public Key Infrastructure (PKI) and decentralization/consensus, identity authorization processors can transform non-permissioned BTs into permissioned BTs where entities may register for long-term credentials or enrollment certificates commensurate with their types. Credentials issued by the Transaction Certificate Authority (TCA) to patients and medical doctors are used to issue pseudonymous credentials, which are used to authorize submitted transactions. Thus, certificates for healthcare transactions persist on the BT and can be used by authorized auditors to group otherwise unconnected transactions. With HF, the architected design allowed modularity, speed, smart contract integration, privacy, and security, among other benefits. With the code lines referenced below, it is possible to understand how the EHR and PHR APIs were separated, and also how patient data are retrieved from the ledger component [4]. Furthermore, in those examples, the role of the hospital data administrator is validated from the request header so that the Fabric gateway connection can be initiated, with a subsequent smart contract invoking function, which in turn deploys the chaincode package containing the smart contract using the chaincode lifecycle, then queries and approves the installed chaincode for all three hospitals, and then commits it. In addition to the API checks, it was also necessary to verify from an end-user interface perspective that the BT calls initiating transactions were subsequently implemented in the smart contract. Medical transactions, for example, are sensitive information outside the circle of the patient and the doctor who receives authorization, and the HF core solution offers the opportunity to create private channels for members of the network to exchange sensitive information [5]. Furthermore, from a security perspective, the solution builds on the underlying security principles of HF and a key feature for providing additional hardware-based digital
signature security, as well as the ability to manage identities through a Hardware Security Module (HSM). In the designed architecture, the Fabric SDK runs in JavaScript, Node.js is used in the backend nodes, and these are used in conjunction with an interface defined in Angular within an original sandbox test environment. A smart contract is used primarily in this paper to automate the logic of the medical record and to store all the data in a dedicated ledger that can be viewed by patients and doctors based on the defined access rules. The use of smart contracts enabled the architecture to move medical records from one hospital to another while maintaining the necessary security and encryption. An important component of the architecture is based on HF chaincode with JavaScript and Node.js, to which Ethereum is connected and where smart contracts are created using Solidity [6]. As can be seen in Fig. 1, the architecture ensures that scalability and data security principles are followed, as well as interoperability with another critical component, SAP HANA, which performs the necessary business intelligence analysis and data curation for a value-managed architecture with the data privacy and security procedures fully considered. Before granting access to the authorized EHR platform, identity management validations were critical to verify the credentials of the doctors and prove that the physicians held valid medical licenses.
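The original code listing referred to in the text is not reproduced in this extract. The sketch below is a hypothetical Python (Flask) rendering of the described behaviour, whereas the actual system used the JavaScript Fabric SDK; the route paths, header name, role strings, channel name, and the invoke_chaincode placeholder are all illustrative assumptions rather than the authors' API.

```python
from flask import Flask, request, jsonify, abort

app = Flask(__name__)

def invoke_chaincode(channel: str, contract: str, fn: str, *args) -> dict:
    """Placeholder for the Fabric gateway call made by the real Node.js SDK:
    connect to the hospital channel and evaluate the named smart-contract
    function against the ledger."""
    raise NotImplementedError  # hypothetical: wire the HF gateway here

@app.route("/ehr/patients/<patient_id>", methods=["GET"])
def get_ehr_record(patient_id: str):
    # the hospital data administrator role is validated from the request
    # header before any gateway connection is initiated
    if request.headers.get("X-Role") != "hospital-data-admin":
        abort(403)
    return jsonify(invoke_chaincode("hospital-channel", "ehr",
                                    "ReadPatientRecord", patient_id))

@app.route("/phr/patients/<patient_id>", methods=["GET"])
def get_phr_record(patient_id: str):
    # the PHR API is kept on a separate path and checked against consent roles
    if request.headers.get("X-Role") not in ("patient", "medical-doctor"):
        abort(403)
    return jsonify(invoke_chaincode("hospital-channel", "phr",
                                    "ReadPatientRecord", patient_id))
```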
Fig. 1 Architecture overview with all involved components
Fig. 2 Actions and activities from the different process and system profiles
A crucial step was the creation of the hospital group channel to define access control rules governing access for existing hospitals, new hospitals, future HCOs, and future hospital departments or organizations. Access control was also granted over network administrative changes, including network access control, where HF was fundamental in enabling patient and health data policies to be associated with different record data management protocols and with the access control rules defined for the network. Thus, by studying Fig. 2, we can see how the hospital data management users or system administrators provide information and actions to the medical doctors and parts of the EHR module, and how the medical doctors and patients are then involved through the PHRs. As a key component of the overall architecture, AngularJS was used to connect with the Fabric docker cloud through an SDK node, which then connected to the entire HF ecosystem architecture. In this case study, SAP HANA was further integrated as an additional connection enabling the BD component. The decision to use SAP HANA for MDM and BD analytical purposes was based mainly on the capability of combining additional tools and services with HANA, such as data intelligence, and on using HANA Cloud to collect and analyze the future unstructured, high-frequency data from the designed EHR and PHR platform. Connecting SAP HANA was part of an effort to collect BD from wearables, fitness trackers, and other sources of quantifiable lifestyle information that can be used to better understand behavior patterns or create baselines for understanding health concerns without requiring patients to sign up for studies or focus groups.
As part of the defined data schema application, participants (such as MDs, patients, or hospital administrators), assets (e.g., patients' medical records), transactions (such as prescriptions, diagnoses, and medical consultations), as well as events (such as capturing symptoms) were defined to efficiently drive the necessary decisions in terms of the database, orders, and certificates. In terms of credentialing physicians, verification of primary sources, privacy-preserving authentication, and the management of digital identities were crucial to the security of the established architecture, which grants access to resources and different stakeholders in an information system. In this architecture, one of the key mechanisms was the Primary Source Verification (PSV) required to verify a medical doctor's license, certification, or registration to practice, providing a solution in which PSV is completed by the system rather than by the licensed individual.
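Purely as an illustration of this schema split (the real definitions live in the HF chaincode, and these field names are assumptions), the participant/asset/transaction/event structure could be expressed as follows:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Participant:                # e.g. medical doctor, patient, hospital admin
    participant_id: str
    role: str

@dataclass
class Transaction:                # e.g. prescription, diagnosis, consultation
    tx_id: str
    actor: Participant
    payload_hash: str             # only the hash is kept on-chain
    timestamp: datetime

@dataclass
class MedicalRecord:              # the asset exchanged between hospitals
    record_id: str
    owner: Participant
    transactions: List[Transaction] = field(default_factory=list)
    events: List[str] = field(default_factory=list)   # e.g. captured symptoms
```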
2.2 PHR User Interface

The last part of the case study was the development and implementation of the user interface (UI) system for the electronic health record and personal health records, where the primary goal was to develop a simple yet trustworthy design. We show below the dashboard for the PHR UI, where the following areas were developed: medical records belonging to the patient, personal data, treatments, schedules, laboratory results, diagnosis documents, payments, and other settings, as shown in Figs. 3 and 4.
Fig. 3 Patient health records dashboard from the user interface
Fig. 4 Patient registration form with patient personal information, treatment, and payment information
The corresponding table and clinical notes are illustrated in Fig. 5 as part of the payment process of the administrative system, with a representative table view for the hospital department, case number, and payment information.
Fig. 5 Representative table for hospital department, case number, and payment information
3 Conclusion

During this case study research, it was found that the presented solution is highly accurate and capable of managing PHR data effectively. Future research should involve a larger number of patients and healthcare organizations, where specific tests of the technology and API connections, including pressure tests, can also be leveraged and maximized. Additionally, the solution has been shown to have high practical value when it comes to meeting the accuracy and real-time requirements of BT and BD applications. One of the key objectives of this proposed architecture was the creation of a permissionless system, in which patients could be anonymous or pseudonymous, and in which every patient could add a new block to the ledger. In this work, we proposed an immutable, secure, scalable, and interoperable architecture that can support a variety of connected, integrated, and independently managed healthcare systems in being more transparent and secure in the collection, management, and analysis of sensitive patient data. Due to the immaturity of Hyperledger, there were a few disadvantages, but fortunately, partway through the project, a new HF version and Composer were released, and the system was upgraded to take advantage of the numerous bug fixes and enhancements included in these releases. HF is a promising BT framework that comes with policies, smart contracts, and secure identities, allowing possible access to additional add-ins like SAP HANA or even future advanced decentralized identification management systems via different connections such as docker. Interoperability between multiple hospital organizations provided a framework for developing a private and closed blockchain scenario. This approach provides reliable and secure solutions for managing medical records. Yet the most important remaining task is to resolve security challenges and improve the source code to provide a scalable and pluggable solution, with effective implementation of a powerful ordering service on a large-scale fabric network, updated consortium policies, and implementation of the patient's module functionality.
References

1. Abdelhak M, Grostick S, Hanken MA (2014) Health information e-book: management of a strategic resource. Elsevier Health Sciences
2. Mathew PS, Pillai AS (2015) Big data solutions in healthcare: problems and perspectives. In: 2015 International conference on innovations in information, embedded and communication systems (ICIIECS). IEEE, pp 1–6
3. Pierro GA (2021) A user-centered perspective for blockchain development
4. Miglani A, Kumar N, Chamola V, Zeadally S (2020) Blockchain for the internet of energy management: review, solutions, and challenges. Comput Commun 151:395–418
5. Yuchao W, Ying Z, Liao Z (2021) Health privacy information self-disclosure in the online health community. Front Public Health 8:602792
6. Bai P, Kumar S, Aggarwal G, Mahmud M, Kaiwartya O, Lloret J (2022) Self-sovereignty identity management model for smart healthcare system. Sensors 22(13):4714
Ambient PM2.5 Prediction Based on Prophet Forecasting Model in Anhui Province, China Ahmad Hasnain, Muhammad Zaffar Hashmi, Basit Nadeem, Mir Muhammad Nizamani, and Sibghat Ullah Bazai
Abstract Due to recent development in different sectors such as industrialization, transportation, and the global economy, air pollution is one of the major issues in the twenty-first century. In this work, we aimed to predict ambient PM2.5 concentration using the prophet forecasting model (PFM) in Anhui Province, China. The data were collected from 68 air quality monitoring stations to forecast both short-term and long-term PM2.5 concentrations. The determination coefficient (R2), root mean squared error (RMSE), and mean absolute error (MAE) were used to determine the accuracy of the model. According to the obtained results, the predicted R, RMSE, and MAE values by PFM for PM2.5 were 0.63, 15.52 μg/m3, and 10.62 μg/m3, respectively. The results indicate that the actual and predicted values were significantly fitted and PFM accurately predicts PM2.5 concentration. These findings are supportive and helpful for local bodies and policymakers to deal with and mitigate air pollution problems in the future. Keywords Prophet forecasting model · Time series analysis · PM2.5 · Anhui province · China A. Hasnain Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China School of Geography, Nanjing Normal University, Nanjing 210023, China Jiangsu Center for Collaborative Innovation in Geographical Information, Resource Development and Application, Nanjing 210023, China M. Z. Hashmi (B) Department of Chemistry, COMSATS University Islamabad, Islamabad, Pakistan e-mail: [email protected] B. Nadeem Department of Geography, Bahauddin Zakariya University, Multan, Pakistan M. M. Nizamani School of Ecology, Hainan University, Haikou, China S. U. Bazai College of Information and Communication Technology, BUITEMS, Quetta, Pakistan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_3
1 Introduction

Due to recent developments in different sectors such as industrialization, transportation, and the global economy, air pollution is one of the most widespread environmental issues of the twenty-first century. The WHO has previously reported that air pollution levels have increased in many Asian countries such as China, Bangladesh, Pakistan, and India [1, 8]. China is the largest emerging country in the world, with a large population, industry, and transportation. In the last three decades, many cities and areas of the country have experienced serious air pollution issues (Zhao et al. 2020). In the last few years, the Government of China has taken serious steps to control the level of air pollution in the country, which has resulted in a slight decline in air pollution, but there is still a need to adopt strict preventive measures to protect the environment at a significant level [13, 14]. Particulate matter with a diameter of 2.5 μm or less is called PM2.5, and it has been proven to have harmful health impacts [7]. PM2.5 has a more significant impact on human health than PM10. PM2.5 contains components such as lipopolysaccharide and polycyclic aromatic hydrocarbons, which severely degrade the human respiratory system [4]. Due to strict restrictions and the Air Pollution Prevention and Control Action Plan implemented by the government in September 2013, a slight drop in the concentration of PM2.5 has been observed in China. However, heavy haze events still occur occasionally in many cities and regions of the country [11, 16]. Because of its harmful effects and impact on the environment, air pollution has attracted widespread attention from researchers and scholars [13]. In recent years, many scholars have used time series analysis to predict the concentrations of air pollutants (Zhao et al. 2020); [13]. Bhatti et al. [3] used the SARIMA model together with a factor analysis approach to forecast air pollution. Kaminska [10] used the random forest model to study the short-term effects of air pollution; random forest is a popular approach due to its ability to capture non-linear patterns. Garcia et al. [6] developed generalized linear models (GLMs) to predict the concentration of PM10 and to determine the relationship between PM10 and meteorological variables. He et al. [9] presented linear and non-linear methods to predict PM2.5 concentration in their study. Against this background, in our work, we used the prophet forecasting model (PFM), developed by Facebook, to predict both short-term and long-term PM2.5 concentrations in Anhui province, China. The model has a unique ability to forecast accurately and performs well even when the data contain numerous outliers and missing values. Compared with other models, such as the autoregressive integrated moving average (ARIMA) and the seasonal autoregressive integrated moving average (SARIMA), PFM takes approximately 10 times less time to fit and has been successfully established [12]. In this research, we aimed to predict one of the more critical air pollutants (PM2.5) using PFM in Anhui province, China. The results of this research will be supportive and helpful for local bodies and policymakers to deal with and mitigate air pollution problems in the future.
Fig. 1 The geographical location and the air quality monitoring stations in Anhui Province
2 Proposed Methodology

2.1 Study Area

Anhui province is located near the sea and is one of the core areas of the Yangtze River Delta (YRD). Anhui province is crossed by the Yangtze River, the Huai River, and the Xin'an River, which makes it a significant region of the country. As of 2020, the province has 16 provincial cities and 9 county-level cities. Anhui is rich in several major economic sectors such as industry and transportation. Figure 1 shows the geographical location and the air quality monitoring stations of Anhui province.
2.2 Data

The daily average concentration of PM2.5 was used in this research, with data collected between 1 January 2018 and 31 December 2021. The data were downloaded from the website of historical air quality data in China and originate from the China National Environmental Monitoring Centre (CNEMC 2019). In Anhui Province,
68 monitoring stations are working to collect and record air pollution data and their location is shown in Fig. 1.
2.3 Proposed Model
For time series analysis and prediction, PFM is a powerful tool that requires very little time to fit. The model is specified by the following formula:

y(t) = g(t) + s(t) + h(t) + ε_t    (1)
Equation (1) defines the model, where y(t) represents the actual values; g(t) represents the trend; s(t) represents seasonality; h(t) accounts for holiday effects and outliers; and ε_t is the unexpected error term. The model has a number of parameters, and the trend can be assumed to be linear or logistic. The model adopts a Bayesian-based curve-fitting technique to forecast and predict time series data, which is one of its significant features and makes it more attractive than other forecasting methods. Change points are a significant feature of the PFM, and the fitting scale can be quantified; the model showed better results with a higher number of change points. To determine the change points, the model first places a large number of candidates and then uses L1 regularization to pick out the few points that are actually used, which avoids overfitting:

$$L(x, y) = \sum_{i=1}^{n} \bigl(y_i - h_\theta(x_i)\bigr)^2 + \lambda \sum_{i=1}^{n} |\theta_i| \quad (2)$$

Equation (2) represents the L1-regularized objective, where x and y are the coordinates of the n change points. The term $\sum_{i=1}^{n} (y_i - h_\theta(x_i))^2$ measures the squared difference between the observed and predicted values. The purpose of $\lambda \sum_{i=1}^{n} |\theta_i|$ is to keep the weights small in order to avoid overfitting, where λ controls how strongly the weights are penalized; the model selects the value of λ based on the number of estimators. To determine the model performance, the actual and predicted values were compared over different time frames.
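As an illustration only (the paper does not provide code), the following is a minimal sketch of fitting the PFM to daily PM2.5 data with the Prophet library; the input file name, column names, and forecast horizon are assumptions, not the authors' exact setup.

```python
# Minimal Prophet (PFM) sketch for daily PM2.5 forecasting (hypothetical data file).
import pandas as pd
from prophet import Prophet  # pip install prophet

df = pd.read_csv("anhui_pm25_daily.csv")              # hypothetical input file
df = df.rename(columns={"date": "ds", "pm25": "y"})   # Prophet expects 'ds' and 'y'

model = Prophet(growth="linear",        # linear trend g(t), as assumed in the text
                yearly_seasonality=True,
                weekly_seasonality=True)
model.fit(df)

# Forecast roughly the next 1.5 years of daily values (cf. Fig. 3).
future = model.make_future_dataframe(periods=548, freq="D")
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```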
2.4 Statistical Analysis In this work, we used determination coefficient (R2 ), root mean squared error (RMSE), and mean absolute error (MAE) to evaluate the model’s performance. The following formulas are used for these metrics:
$$R^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (3)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|^2} \quad (4)$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i| \quad (5)$$
where xi and yi are used for actual and predicted values and n is the number of samples.
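As a small illustration, the three metrics defined in Eqs. (3)-(5) can be computed with NumPy as sketched below; the numeric values are dummy placeholders, not monitoring data, and the R² expression follows Eq. (3) as written rather than the usual coefficient-of-determination formula.

```python
# Evaluation metrics of Eqs. (3)-(5): x = actual values, y = predicted values.
import numpy as np

def evaluate(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r2 = np.sum((y - x.mean()) ** 2) / np.sum((x - x.mean()) ** 2)  # Eq. (3)
    rmse = np.sqrt(np.mean((x - y) ** 2))                           # Eq. (4)
    mae = np.mean(np.abs(x - y))                                    # Eq. (5)
    return r2, rmse, mae

# Dummy example values:
print(evaluate([40.0, 55.0, 62.0], [42.0, 50.0, 60.0]))
```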
3 Results and Discussion
To specify the features of the model, a linear trend was used and the L1 regularization technique was applied for the error and change points. The actual and predicted values were compared for both short-term and long-term prediction. Over the entire period, the PFM showed superior performance. Figures 2 and 3 show the predicted results of ambient PM2.5 in Anhui Province.
The results indicate that during the entire period the actual and predicted values fitted well, and the predicted R, RMSE, and MAE values for the PM2.5 concentration by PFM were 0.63, 15.52 μg/m3, and 10.62 μg/m3, respectively (Fig. 2). Deters et al. [5] predicted PM2.5 concentration using a machine learning approach; the performance of the PFM in the current work was considerably better than that of the mentioned study.
In the 1-year prediction, the predicted R, RMSE, and MAE values for PM2.5 were 0.58, 13.38 μg/m3, and 9.38 μg/m3, respectively. The R value of the model over the entire period was higher than for the yearly prediction, while the RMSE and MAE values were lower in the yearly prediction than in the entire-period forecast of ambient PM2.5 in Anhui province. The actual and predicted values showed good agreement during both periods, with small differences. Previously, [15] used ARIMA and prophet methods to forecast air pollutants; that study reported lower accuracy than the current work.
Moreover, for the 6-month horizon, the PFM provided superior performance according to all statistical indicators (Fig. 2). The predicted R, RMSE, and MAE values by PFM for ambient PM2.5 were 0.66, 12.43 μg/m3, and 8.64 μg/m3, respectively. It should be noted that during this window of time, the predicted R, RMSE, and MAE values all improved. A significant agreement was observed between the actual and predicted values in the 6-month prediction. For the 3-month prediction, the model predicted the concentration of PM2.5 with R = 0.48, RMSE = 16.70, and MAE = 12.69 in Anhui province. This suggests that the model performs better for long-term prediction than for short-term prediction. Figure 3 shows the predicted ambient PM2.5 for the upcoming 1.5 years.
Fig. 2 Scatterplots of ambient PM2.5 results; a entire dataset, b yearly prediction, c 6-month prediction, and d 3-month prediction
4 Conclusion
In the current study, the PFM was used to predict both short-term and long-term ambient PM2.5 concentrations using daily average data in Anhui Province. According to the results obtained, the model is able to predict the concentration of PM2.5 accurately, and the actual and predicted values fitted well over the different windows of time. The model can be used in other regions and fields as a prediction method to obtain new findings. The results of the current research will be supportive and helpful for local bodies and policymakers in controlling and mitigating air pollution problems in the upcoming years.
Fig. 3 PM2.5 (μg/m3 ) forecasting in Anhui Province
References 1. Air Visual (2019) Airvisual–air quality monitor and information you can trust. Available at: https://www.airvisual.com/. Accessed 26 Aug 2019 2. Bhatti UA, Wu G, Bazai SU, Nawaz SA, Baryalai M, Bhatti MA, Nizamani MM (2022) A pre-to post-COVID-19 change of air quality patterns in anhui province using path analysis and regression. Pol J Environ Stud. https://doi.org/10.1007/s11356-020-08948-1 3. Bhatti UA, Yan Y, Zhou M, Ali S, Hussain A, Qingsong H et al (2021) Time series analysis and forecasting of air pollution particulate matter (PM2.5): an SARIMA and factor analysis approach. IEEE Access 9:41019–41031. https://doi.org/10.1109/access.2021.3060744 4. Bilal M, Mhawish A, Nichol JE, Qiu Z, Nazeer M, Ali MA et al (2021) Air pollution scenario over pakistan: characterization and ranking of extremely polluted cities using long-term concentrations of aerosols and trace gases. Remote Sens Environ 264:112617. https://doi.org/10.1016/ j.rse.2021.112617 5. Deters JK, Zalakeviciute R, Gonzalez M, Rybarczyk Y (2017) Modeling PM2.5 urban pollution using machine learning and selected meteorological parameters. J Electr Comput Eng 1–14. https://doi.org/10.1155/2017/5106045 6. Garcia JM, Teodoro F, Cerdeira R, Coelho LMR, Kumar P, Carvalho MG (2016) Developing a methodology to predict Pm10 concentrations in urban areas using generalized linear models. Environ Technol 37(18):2316–2325. https://doi.org/10.1080/09593330.2016.1149228 7. Hasnain A, Hashmi MZ, Bhatti UA, Nadeem B, Wei G, Zha Y, Sheng Y (2021) Assessment of air pollution before, during and after the COVID-19 Pandemic Lockdown in Nanjing, China. Atmosphere 12:743. https://doi.org/10.3390/atmos12060743 8. Hasnain A, Sheng Y, Hashmi MZ, Bhatti UA, Hussain A, Hameed M, Marjan S, Bazai SU, Hossain MA, Sahabuddin M, Wagan RA, Zha Y (2022) Time series analysis and forecasting of air pollutants based on prophet forecasting model in Jiangsu Province, China. Front Environ Sci 10:945628. https://doi.org/10.3389/fenvs.2022.945628 9. He B, Heal MR, Reis S (2018) Land-use regression modelling of intraurban air pollution variation in China: current status and future needs. Atmosphere 9(4):134 10. Kami´nska JA (2018) The use of random forests in modelling short-term air pollution effects based on traffic and meteorological conditions: a case study in wrocław. J Environ Manage 217:164–174. https://doi.org/10.1016/j.jenvman.2018.03.094 11. Liu N, Zhou S, Liu C, Guo J (2019) Synoptic circulation pattern and boundary layer structure associated with PM2.5 during wintertime haze pollution episodes in Shanghai. Atmos Res 228:186–195. https://doi.org/10.1016/j.atmosres.2019.06.001 12. Taylor SJ, Letham B (2017) Forecasting at scale. Am Stat 72(1):37–45. https://doi.org/10.1080/ 00031305.2017.1380080
13. Wang J, He L, Lu X, Zhou L, Tang H, Yan Y et al (2022) A full-coverage estimation of PM2.5 concentrations using a hybrid XGBoost-WD model and WRF-simulated meteorological fields in the Yangtze River Delta Urban agglomeration, China. Environ Res 203:111799. https://doi. org/10.1016/j.envres.2021.111799 14. Wu X, Guo J, Wei G, Zou Y (2020) Economic losses and willingness to pay for haze: the data analysis based on 1123 residential families in Jiangsu Province, China. Environ Sci Pollut Res 27:17864–17877. https://doi.org/10.1007/s11356-020-08301-6 15. Ye Z (2019) Air pollutants prediction in shenzhen based on arima and prophet method. E3S Web Conf 136:05001. https://doi.org/10.1051/e3sconf/201913605001 16. Zhai S, Jacob DJ, Wang X, Shen L, Li K, Zhang Y et al (2019) Fine particulate matter (PM2.5) trends in China, 2013-2018: separating contributions from anthropogenic emissions and meteorology. Atmos Chem Phys 19:11031–11041. https://doi.org/10.5194/acp-19-110312019
Potato Leaf Disease Classification Using K-means Cluster Segmentation and Effective Deep Learning Networks Md. Ashiqur Rahaman Nishad, Meherabin Akter Mitu, and Nusrat Jahan
Abstract Potatoes are among the most frequently consumed vegetables in many countries throughout the year, and Bangladesh is one of them. Plant diseases and harmful insects pose a significant agricultural hazard and now substantially impact Bangladesh's economy. This paper proposes a real-time technique for detecting potato leaf disease based on a deep convolutional neural network. Segmentation is the partitioning of an image into several regions, and we have used the K-means clustering algorithm for this step. In addition, to increase the model's efficacy, several data augmentation procedures have been applied to the training data. A convolutional neural network is a deep learning network suited to structured, grid-like data such as images. We have used a novel CNN approach, VGG16, and ResNet50. Using VGG16, the novel CNN, and ResNet50, the proposed technique was able to classify potato leaves into three groups with 96, 93, and 67% accuracy, respectively. The proposed method outperforms existing methodologies when the models are compared on the relevant metrics.
Keywords Potato disease · Deep learning · VGG16 · Image segmentation · K-means clustering · Data augmentation
1 Introduction
Agriculture is commonly known as soil culture. It is considered the backbone of the economy in developing nations. In Bangladesh, agriculture is vital
for people's subsistence and for its contribution to GDP. In 2020, agriculture accounted for 12.92% of Bangladesh's GDP [1]. Recently, the potato has been the third most consumed food in Bangladesh. On the other hand, 56 diseases have been recorded in potato fields in Bangladesh [2], and the loss of annual potato yield due to late blight is estimated at 25–57% [3]. Late blight is the most common and highly detrimental parasitic disease of potatoes. Therefore, it would benefit the agriculture and economy of Bangladesh if potato production losses due to these diseases could be reduced.
K-means clustering is an unsupervised learning algorithm for solving clustering problems in machine learning and data science. However, other approaches, such as contour detection and edge detection, are also helpful for segmentation. For example, image contour detection is crucial to numerous image analysis applications, including image segmentation, object recognition, and classification [4].
Deep learning has been an effective tool over the past few decades for handling large amounts of data. The interest in utilizing hidden layers has surpassed traditional methods, particularly in pattern recognition. One of the most well-known deep neural networks is the Convolutional Neural Network (CNN). Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and the revival of deep convolutional neural networks (CNNs) [5]. CNNs are the dominant method in deep learning [6].
As previously mentioned, 25–57% of the potato yield is lost yearly due to late blight. If this loss rate could be reduced to 10%, it would have a huge impact on the economy of the country. For this reason, we think that more research needs to be done in this field, and it offers good research scope. Finally, a deep learning-based system was proposed to predict potato leaf disease in our study, as illustrated in Fig. 1. It is time to motivate ourselves toward agricultural development, because this could be a way to protect our world from various disasters. The contributions of this study are listed as follows:
• We proposed a preprocessing step on the PlantVillage potato leaf dataset.
• The processed images are segmented by K-means clustering.
• Finally, the dataset is classified into its respective classes (early blight, late blight, and healthy leaf) using different networks including VGG16, ResNet50, and 2D-CNN.
2 Literature Review To provide a better solution for potato leaf disease, detection and classification were our main aim. However, researchers have already proposed different techniques for detecting potato leaf diseases. A summary of those approaches is highlighted in this section. A CNN model was proposed by Mohit et al. [7] where three max-pooling layers were followed by two fully-connected layers, which gave them efficiency over the
Fig. 1 Deep learning-based smart system to predict potato leaf disease
pre-trained models. Overall, they got 91.2% accuracy on the plantVillage dataset of 10 classes (9 diseases and one healthy class). Chlorosis, often known as yellowing disease, is a plant disease that affects black gram plants. Changjian et al. [8] restructured residual dense network, a hybrid deep learning model that encompasses the upper hand of deep residual and dense networks, reducing the training process, and considered the Tomato leaf dataset from AI Challenger. Vaibhav et al. [9] used a hybrid dataset collected from four different sources. They have followed five-fold cross-validation and testing on unseen data for extreme evaluation. The model gained a cross-validation accuracy of 99.585% and average test accuracy of 99.199% on the unseen images. Divyansh et al. [11] initiated a pre-trained model to extract significant features from the potato PlantVillage dataset, logistic regression provided 97.8%. Amreen et al. [12] and Anam et al. [13] suggested a deep learning method. They segmented the images and trained the CNN model with those images. They achieved the highest accuracy on the GitHub dataset by utilizing DenseNet121 10-Fold. Md. Tarek et al. [14] applied the k-means clustering segmentation method on the fruit’s images, and SVM provided 94.9% accuracy. Yonghua et al. [15] designed an AISA-based GrabCut algorithm to remove the background information. On the other hand, using the same dataset, Sumita et al. [16] presented a CNN for recognizing corn leaf disease and got 98.88% accuracy. Huiqun et al. [17] applied transfer learning to reduce the size of train data, computational time, and model complexity. Five deep network structures were used, while Densenet_Xception offered the highest accuracy. Rangarajan et al. [18] proposed a pre-trained VGG16 algorithm for identifying eggplant disease. The highest accuracy for datasets created with RGB and YCbCr images in field conditions was 99.4%. Parul et al. [19] have utilized a CNN method
to identify diseases in plants. They created a dataset combining the open-source PlantVillage Dataset and images from the field and the Internet and got 93% accuracy. After observing several previous research works, we have summarized a few recent papers, as shown in Table 1.

Table 1 Summary of recent papers on plant disease prediction

Author | Algorithm | Dataset | Classes | Accuracy (%)
Zhou et al. (2021) [26] | Restructured residual dense network, Deep-CNN, ResNet50, DenseNet121 | Tomato AI Challenger dataset (13,185 images) | 9 classes of tomato leaf diseases | 95
Tiwari et al. (2021) [24] | SVM, ANN, KNN, DenseNet 121, DenseNet 201, MobileNet-v2 | Hybrid dataset (25,493 images) | 27 classes of 6 different crops' diseases | 99.58
Tiwari et al. (2020) [25] | VGG19, Inception V3, Logistic Regression, VGG16 | Potato PlantVillage (2152 images) | 3 classes of potato leaf diseases | 97.8
Umamageswari et al. (2021) [10] | FCM, CSA, Fast GLSM model, PNAS-Progressive Neural Architecture | Mendeley's leaf disease dataset (61,485 images) | 8 classes of 7 different crops' diseases | 97.43
Abbas et al. (2021) [27] | C-GAN, DenseNet121 | Tomato PlantVillage (16,012 images) | 10 classes of tomato leaf diseases | 99.51
3 Data Preparation
3.1 Data Collection
Data is one of the major parts of any machine learning algorithm. In this study, infected and healthy potato leaf images were collected from the PlantVillage potato leaf disease dataset. We considered two common potato diseases, early and late blight, giving a total of three classes including the healthy leaf class. To train and test our proposed networks, the dataset was divided in an 80:20 ratio. Table 2 presents the exact data volume for each class, and Fig. 2 shows sample data.
Table 2 Dataset

Serial No. | Class | Number of samples | Training sample | Test sample
1 | Healthy | 152 | 122 | 30
2 | Late blight | 1000 | 800 | 200
3 | Early blight | 1000 | 800 | 200
Total |  | 2152 | 1722 | 430
Fig. 2 Example of PlantVillage dataset. a Potato early blight, b Potato late blight, and c Potato healthy
3.2 Augmentation
Different data augmentation techniques have been applied to the training data to enhance the model's efficiency. The computational cost is reduced considerably by using smaller pixel values, and therefore we used a scale transformation mapping pixel values to the range 0 to 1 (1/255). A shear angle of 0.2 is applied to the training images, in which one axis is fixed while the other is stretched to a specific angle. We applied a zoom range of 0.2 to zoom the images and a horizontal flip to mirror the image left to right. The augmentation techniques applied in this study are listed as follows:
• Resize (rescale 1/255)
• shear_range 0.2
• zoom_range 0.2
• Horizontal flip
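As an illustration, a minimal sketch of this augmentation pipeline with the Keras ImageDataGenerator is given below; the directory path, image size, and batch size are assumptions, not values from the paper.

```python
# Augmentation pipeline: rescale, shear, zoom, and horizontal flip.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    shear_range=0.2,         # shear angle of 0.2
    zoom_range=0.2,          # zoom by up to 20%
    horizontal_flip=True)    # mirror images left to right

train_generator = train_datagen.flow_from_directory(
    "data/train",            # hypothetical folder with one subfolder per class
    target_size=(224, 224),  # assumed input size
    batch_size=32,
    class_mode="categorical")  # early blight, late blight, healthy
```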
3.3 Segmentation The primary purpose of segmentation is to normalize and alter the visualization of an image that would be easier to analyze. We chose the k-means clustering method
Fig. 3 Pseudo code to describe K-means clustering
and selected multiple K values: 3, 5, and 7. Among these, we observed that K = 3 produced the best output, which is why we finally chose the value of K as 3. Figure 3 presents the pseudo code of k-means clustering. K-means clustering aims to minimize the sum of squared distances between all points and their cluster centres, as shown in Eq. (1):

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \quad (1)$$
here J is the objective function, k the number of clusters, n the number of cases, x_i case i, c_j the centroid of cluster j, and ||x_i^(j) − c_j|| the distance function. After applying the k-means clustering algorithm to our dataset, we obtained the segmented data; the output of k-means is shown in Fig. 4.
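For illustration, a minimal sketch of K-means colour segmentation with K = 3 using OpenCV is shown below, following the objective in Eq. (1); the file names are hypothetical and this is not the authors' exact script.

```python
# K-means (K = 3) colour segmentation of a leaf image with OpenCV.
import cv2
import numpy as np

img = cv2.imread("potato_leaf.jpg")                      # hypothetical input image
pixels = img.reshape(-1, 3).astype(np.float32)           # each pixel is one sample x_i

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)
k = 3
_, labels, centers = cv2.kmeans(pixels, k, None, criteria,
                                10, cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel by its cluster centre c_j to obtain the segmented image.
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("potato_leaf_segmented.jpg", segmented)
```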
3.4 Proposed Network
In this section, we discuss three different network models. Our prepared dataset performed best with VGG16. Figure 5 illustrates the basic block diagram of our work.
4 Experimental Result Analysis Several sets of experiments have been carried out for plant leaf disease classification and detection research. We used k-means clustering here, a common image segmentation approach, to segment the image [20]. To anticipate the classes of the leaf
Fig. 4 Dataset after segmentation
Fig. 5 Block diagram of our study
photos, we used three classification approaches: CNN, ResNet50, and VGG16. VGG16 and ResNet50 are pre-trained models. Each model was trained for 50 epochs on the training set. Table 3 reports the performance measures of our models. We employed performance measures such as accuracy, precision, recall, F1-score, and the confusion matrix to evaluate the suggested approach's performance:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F1\text{-}score = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$

here TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

Table 3 Performances of different approaches

Approach | Algorithm | ACC | PR | Recall | F1-score
Before augmentation | VGG16 | 0.954 | 0.954 | 0.957 | 0.955
Before augmentation | Novel 2D-CNN | 0.776 | 0.774 | 0.775 | 0.775
Before augmentation | ResNet50 | 0.643 | 0.635 | 0.655 | 0.645
After augmentation | VGG16 | 0.959 | 0.959 | 0.945 | 0.952
After augmentation | Novel 2D-CNN | 0.815 | 0.814 | 0.804 | 0.808
After augmentation | ResNet50 | 0.63 | 0.63 | 0.63 | 0.63
After segmentation (K-means) + augmentation | VGG16 | 0.963 | 0.963 | 0.965 | 0.964
After segmentation (K-means) + augmentation | Novel 2D-CNN | 0.93 | 0.93 | 0.91 | 0.92
After segmentation (K-means) + augmentation | ResNet50 | 0.67 | 0.66 | 0.68 | 0.67

After augmentation and segmentation, we acquired VGG16 as the best model for our dataset, as it achieved 96% accuracy. The other two models achieved 93 and 67% accuracy on our dataset. We present the ROC curve for VGG16 in Fig. 6.
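As an aside, scores such as those in Table 3 can be computed directly from model predictions with scikit-learn, as sketched below; the labels and predictions are dummy placeholders and the macro averaging is an assumption, since the paper does not state the averaging scheme.

```python
# Computing accuracy, precision, recall, F1-score, and the confusion matrix.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 1, 2, 2, 1, 0]   # hypothetical class labels (3 leaf classes)
y_pred = [0, 1, 2, 1, 1, 0]   # hypothetical model outputs

print("ACC :", accuracy_score(y_true, y_pred))
print("PR  :", precision_score(y_true, y_pred, average="macro"))
print("Rec :", recall_score(y_true, y_pred, average="macro"))
print("F1  :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))
```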
4.1 Performance Comparison The proposed VGG16 model is compared to previously proposed networks such as VGG19, Novel CNN, PDDCNN, MCD (Minimum–maximum distance), and SVM. All of the models were trained on the original PlantVillage dataset before being applied to the augmented dataset; some of the models included segmentation. Table 4 shows that the presented VGG16 model outperformed all other proposed models for “augmented + segmented” dataset, with 96% accuracy.
5 Conclusion and Future Work Deep learning-based approaches have appeared as a great solution to produce promising outcomes in plant disease detection and recognition. This study has
Fig. 6 Training and validation results VGG16
proposed a deep learning-based method to classify potato leaf disease; here, we also used the k-means segmentation approach to generate better results. In this paper, we used three different deep learning-based algorithms. After completing our experiments on the original Kaggle PlantVillage potato leaf dataset, the Convolutional Neural Network (CNN) achieved 93% accuracy, ResNet50 provided 67% accuracy, and VGG16 obtained 96% accuracy. We used k-means clustering for image segmentation, followed by four types of data augmentation on the training set. Therefore, we can summarize the study as follows:
• We applied a k-means clustering segmentation approach.
• We prepared the dataset using different augmentation methods.
• VGG16 was proposed as the best model for our experiment.
In future work, we will develop an application to predict the class of a leaf disease and apply other algorithms to improve model performance. As a result, farmers will be able to identify specific diseases at an early stage, which will help them take the necessary steps.
Table 4 Comparison of previous work with our proposed model

Reference | CNN model | Segmentation | Augmentation | Dataset | Accuracy (%)
Rizqi et al. [21] | VGG16, VGG19 | N/A | Yes (translations, rotation, shearing, vertical and horizontal flips) | PlantVillage | 91
Javed et al. [22] | Novel CNN, PDDCNN | YOLOv5 | Yes (scale transformation, rotation, shearing, vertical flips, zoom) | PlantVillage, PLD | 48.9
Ungsumalee and Aekapop [23] | MCD | K-means clustering | N/A | PlantVillage | 91.7
Proposed | VGG16, Novel 2D-CNN, ResNet50 | K-means clustering | Yes (rescale, horizontal flip, shear, zoom) | PlantVillage | 96
However, we have a few limitations:
• A larger amount of data may improve the results.
• It is possible to experiment with other segmentation methods.
• Finally, a better application for crop fields could be provided.
References 1. O’Neill A (2022) Share of economic sectors in the GDP in Bangladesh 2020. https://www.sta tista.com/statistics/438359/share-of-economic-sectors-in-the-gdp-in-bangladesh/. Accessed 21 June 2022 2. Naher N, Mohammad H, Bashar MA (2013) Survey on the incidence and severity of common scab of potato in Bangladesh. J Asiatic Soc Bangladesh, Sci 39(1):35–41 3. Huib H, Joost Van U (2017) Geodata to control potato late blight in Bangladesh (GEOPOTATO). https://www.fao.org/e-agriculture/news/geodata-control-potato-late-blightbangladesh-geopotato. Accessed 3 June 2022 4. Catanzaro B et al (2009) Efficient, high-quality image contour detection. In: 12th international conference on computer vision. IEEE 5. Shin H et al (2020) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298 6. Sun Y et al (2020) Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans Cybernet 50(9):3840–3854 7. Mohit A et al (2020) ToLeD: tomato leaf disease detection using convolution neural network. Procedia Comput Sci 167:293–301 8. Changjian Z et al (2021) Tomato leaf disease identification by restructured deep residual dense network. IEEE Access 9:28822–28831
9. Vaibhav T et al (2021) Dense convolutional neural networks based multiclass plant disease detection and classification using leaf images. Eco Inform 63:101289 10. Umamageswari A et al (2021) A novel fuzzy C-means based chameleon swarm algorithm for segmentation and progressive neural architecture search for plant disease classification. ICT Express 11. Divyansh T et al (2020) Potato leaf disease detection using deep learning. In: 4th international conference on intelligent computing and control systems (ICICCS) 12. Amreen A et al (2021) Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput Electron Agric 187:106279 13. Anam I et al (2021) Rice leaf disease recognition using local threshold based segmentation and deep CNN. Int J Intell Syst Appl 13(5) 14. Md Tarek H et al (2021) An explorative analysis on the machine-vision-based disease recognition of three available fruits of Bangladesh. Vietnam J Comput Sci 1–20 15. Yonghua X et al (2020) Identification of cash crop diseases using automatic image segmentation algorithm and deep learning with expanded dataset”. Comput Electron Agric 177:105712 16. Sumita M et al (2020) Deep convolutional neural network based detection system for real-time corn plant disease recognition. Procedia Comput Sci 167:2003–2010 17. Huiqun H et al (2020) Tomato disease detection and classification by deep learning. In: International conference on big data, artificial intelligence and internet of things engineering (ICBAIE) 18. Rangarajan K et al (2020) Disease classification in eggplant using pre-trained VGG16 and MSVM. Sci Rep 10(1):1–11 19. Parul S et al (2018) KrishiMitr (Farmer’s Friend): using machine learning to identify diseases in plants. In: IEEE international conference on internet of things and intelligence system (IOTAIS) 20. Nameirakpam D et al (2015) Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput Sci 54:764–771 21. Rizqi AS et al (2020) Potato leaf disease classification using deep learning approach. In: International electronics symposium (IES). IEEE 22. Javed R et al (2021) Multi-level deep learning model for potato leaf disease recognition. Electronics 10(17):2064 23. Ungsumalee S, Aekapop B (2019) Potato leaf disease classification based on distinct color and texture feature extraction. In: International symposium on communications and information technologies (ISCIT). IEEE 24. Tiwari V et al (2021). Dense convolutional neural networks based multiclass plant disease detection and classification using leaf images. Ecol Inf 63(2021): 101289. https://doi.org/10. 1016/j.ecoinf.2021.101289 25. Tiwari D et al (2020) Potato leaf diseases detection using deep learning. In: 4th International Conference on Intelligent Computing and Control Systems (ICICCS) (pp 41–466). IEEE 26. Zhou C et al (2021) Tomato leaf disease identification by restructured deep residual dense network. IEEE Access 9(2021): 28822–28831 27. Abbas, Amreen, et al. (2021) Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput Electron Agric 187(2021):106279
Diagnosis of Polycystic Ovarian Syndrome (PCOS) Using Deep Learning Banuki Nagodavithana and Abrar Ullah
Abstract Polycystic Ovarian Syndrome (PCOS) is a silent disorder that causes women to have weight gain, infertility, hair loss, and irregular menstrual cycles. It is a complex health issue, and one of the methods to diagnose patients with PCOS is to count the number of follicles in the ovaries. The issue with the traditional method is that it is time-consuming and prone to human error, as it can be challenging for medical professionals to distinguish between healthy ovaries and polycystic ovaries. The concept was to create and use various deep learning models, such as a CNN, Custom VGG-16, ResNet-50, and Custom ResNet-50, to obtain high-accuracy results that distinguish between healthy and polycystic ovaries. From the results and evaluation obtained, the CNN model achieved 99% accuracy, the VGG-16 model 58%, the ResNet-50 model 58%, and the Custom ResNet-50 model 96.7%.
1 Introduction
Polycystic ovary syndrome (PCOS) is a silent disorder with serious side effects that has affected women globally, causing them to suffer from different types of health issues such as irregular menstrual cycles, weight gain, infertility, hair loss, and diabetes (Fig. 1). Since it is a complex health issue, the traditional method of diagnosing a patient with PCOS requires a medical professional to confirm at least two of three criteria: high androgen levels (male sex hormones), irregular menstrual cycles, and a high number of follicles in the ovaries. The regular process of detecting polycystic ovaries is to use a transabdominal scan of the ovaries. After the medical professional receives the scan, they would have to count the number
Fig. 1 Difference between a normal ovary and a polycystic ovary. Source [21]
of follicles (cysts) in the ovaries. If more than twelve follicles with a diameter of 2–10 mm are found within the ovary and the ovarian volume exceeds 10 cm3, the patient is most likely to have polycystic ovaries [14]. However, the traditional method is prone to human error and can be time-consuming. It is quite difficult to distinguish between a normal ovary and a polycystic ovary, as the characteristics can sometimes be similar. In a study called "Delayed Diagnosis and a Lack of Information Associated With Dissatisfaction in Women With Polycystic Ovary Syndrome" by Gibson et al., a large number of women reported delayed diagnosis and vague information given by doctors [8]. Since this is an underlying problem, there is a need for a system that detects the disease quickly, provides high-accuracy results, and, most importantly, provides a platform where women can have a better patient experience. The goal was to create, implement, and train various deep learning models that would capture the disease's identity, patterns, and characteristics to give optimal results. The outcome would benefit healthcare professionals and patients, as it would reduce the time to diagnosis and provide accurate results for a complex disease. This is a challenge, as there is a lack of research on detecting polycystic ovaries using modern technology.
2 Background
PCOS is a hormonal disorder that affects women of reproductive age, in which the ovaries deliver aberrant amounts of androgens (male sex hormones) [13], causing women to have irregular menstrual cycles, hair loss, weight gain, and infertility [26]. During ovulation, a mature egg is released from the ovary so that it can be fertilized by a male sperm; if the egg is not fertilized, it is expelled from the body during menstruation. Occasionally, a woman does not develop
the right amount of hormones needed to ovulate; when ovulation does not take place, the ovaries can start to develop small follicles. These tiny follicles make hormones called androgens. Women with high levels of androgens often have PCOS, and this is an issue because it can affect a woman's menstrual cycle [32]. Although studies suggest that women of different ages may experience different effects of PCOS, a study conducted by [30] indicated that adolescents may experience other symptoms of PCOS in relation to their living habits, such as changes in weight, acne, hirsutism, and irregular menstrual cycles [30]. Hailes [9] further states that PCOS also has an impact on the mental and physical health of women, for example excess hair growth and psychological disorders such as depression, anxiety, and bipolar disorder [2].
Different diagnostic methods have emerged over the years from medical professionals, which highlights the difficulty and struggle of diagnosing women with PCOS, as the methods have kept changing. In 1990, the National Institutes of Health (NIH) criteria defined the features of PCOS diagnosis based on the existence of clinical or biochemical hyperandrogenism and oligo/amenorrhea anovulation [16]. Biochemical hyperandrogenism is when the level of androgens in the blood is elevated [16]. For clinical hyperandrogenism, medical professionals look for physical signs, such as acne, hair loss, and increased body hair, that indicate boosted androgen levels. A woman who does not have PCOS usually has 3–8 follicles per ovary [5]. In unusual cases where women have a larger number of follicles in their ovary, polycystic ovarian morphology (PCOM) is used as a test. PCOM was established by the Rotterdam Criteria in 2003 to diagnose patients with PCOS using polycystic ovarian morphology on the ultrasound, along with the clinical or biochemical hyperandrogenism and oligo/amenorrhea anovulation included by the NIH [2].
professionals use a procedure called a transvaginal ultrasound scan, a type of pelvic ultrasound used to analyse a female's reproductive organs such as the ovaries, uterus, or cervix [10]. It is one of the recommended methods as it shows the internal structure of the ovary, which can remain visible even in obese patients. Another method, called transabdominal ultrasound, can be used [23]; it is a method to visualize the organs in the abdomen. However, transvaginal ultrasound imaging is preferred as it is more reliable for detecting the appearance of polycystic ovaries. Since the transvaginal ultrasound includes a 3D ultrasound, it is easy for medical professionals to view and analyse the image needed to diagnose the patient with PCOS. The medical expert can count the number of cysts and calculate the ovarian volume using the simplified formula 0.5 × length × height × width [4]. These precautions are taken to reduce the likelihood of an error in the ultrasound image. However, it is still important to understand that the cysts can appear in large or small sizes and that the ovarian volume can be miscalculated due to human error. A study conducted by [17] examined the levels of agreement between observers using ultrasonographic features of polycystic ovaries. The focus was to identify and quantify polycystic ovaries, and the method was to investigate transvaginal ultrasound scans of 30 women with PCOS by observers trained in Radiology and Reproductive Endocrinology. The scans recorded the number of follicles greater than or equal to 2 mm, ovarian volume, largest follicle diameter, follicle distribution pattern, and presence of corpus luteum [17]. The research concluded that agreement among the observers in evaluating the ultrasonographic features of polycystic ovaries was "moderate to poor". Therefore, further training has been recommended for medical experts in the industry to analyse PCOM on ultrasonography [17]. A study about the "Pitfalls and Controversies" of the diagnostic criteria for Polycystic Ovary Syndrome suggests that the judgement of ultrasound images of polycystic cysts in the ovaries can be subjective. An investigation was conducted on 54 scans of polycystic ovaries that were duplicated and randomized for assessment by four observers [1]. The observers agreed with each other on a diagnosis of PCOS 51% of the time and agreed with themselves 69% of the time. In this study, a polycystic ovary was defined as having 10 or more follicles of 2–8 mm and an ovarian volume greater than or equal to 12 cm3. During the discussion, the observers found the criteria either "too subjective" or the measurements "too insensitive" for an agreement [18]. Therefore, it is important to develop an automated system that accurately distinguishes between polycystic and normal ovaries and helps medical professionals detect and diagnose PCOS easily; this can be done using deep learning. Deep learning is a machine learning technique that allows models or systems to perform certain tasks to give an outcome. The model is fed a large amount of data and has a unique architecture that contains different features and layers performing different duties to give a better result; this shows that the models can achieve results beyond human-level performance [15].
Vikas et al. used deep learning to detect polycystic ovaries. The idea was to compare different deep learning techniques such as convolutional neural networks, data augmentation, and transfer learning. The images were collected and divided into training, validation, and test sets. In this study, data augmentation was applied to the training set as it boosts performance on the data. In addition, transfer learning was implemented so that knowledge gained on one task can be re-used in a similar task to enhance the performance of the model [31], whereas a Convolutional Neural Network (CNN) uses image recognition to detect various types of images; it is essentially used for classifying images, collecting comparisons, and achieving object recognition. The transfer learning fine-tuning model with data augmentation achieved the highest accuracy of 98%. In a study on the classification of polycystic ovaries based on ultrasound images using the competitive neural network architecture, ultrasound images were used as the data and were evaluated through pre-processing. The team proceeded to use segmentation to separate the object from the background; the objects in this study are the follicles of the polycystic ovaries, which are then detected, labelled, and cropped for the next step, feature extraction. The feature extraction takes the information from the newly cropped follicle image to differentiate it from other objects, and the classification process then assigns these images to classes indicating whether the patient has PCOS or not [7]. The training process trains on the dataset and allocates the weights randomly, and the testing process uses a hyperplane to decide for each follicle whether it indicates PCOS or not. The weights transform the input data in the hidden layer [6]. Using the machine learning approach and a competitive neural network with a 32-feature vector, which took a processing time of 60.64 seconds, the best accuracy of 80.84% was obtained [7]. Kokila et al. [11] developed brain tumour detection using deep learning techniques. The brain tumour is detected and identified by a CNN model, which is commonly used to provide a high accuracy rate on image data [28]. The model was able to achieve an accuracy of 92%, and the tumour identification model was additionally analysed using the Dice coefficient [11].
3 Methodology and Implementation The requirements mainly focused on the models to accept ultrasound images of ovaries and predict with prime accuracy results. Throughout this course of implementing the project, different models were used to train and test on the data. Additionally, many changes have been performed to enhance the model’s performance.
3.1 Implementing the Deep Learning Models The following section evaluates the process of collecting and pre-processing the data and developing and training the deep learning models.
3.1.1 Collecting Data
The models were trained and tested on a Kaggle dataset of ultrasound images of normal and polycystic ovaries. The data contains 1697 polycystic ovary images and 2361 normal images. Additionally, the ultrasound scans were validated with the help of a medical expert to avoid any conflict of interest. Link to the Kaggle dataset: https://www.kaggle.com/datasets/anaghachoudhari/pcos-detection-using-ultrasound-images.
3.1.2 Splitting the Data
The data was split into training, test, and validation sets with a 60:20:20 ratio. Splitting the data is important for analysing the performance of the deep learning models. During training, the polycystic ovary labels are set to 0 and the normal ovary labels are set to 1.
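For illustration, a minimal sketch of a 60:20:20 split with scikit-learn is given below; the arrays are random placeholders standing in for the loaded scans, and stratification and the random seed are assumptions.

```python
# 60:20:20 train/validation/test split on placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 224, 224, 1)    # placeholder ultrasound images
y = np.random.randint(0, 2, size=100)   # 0 = polycystic, 1 = normal

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)             # 60% training
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)  # 20% / 20%
print(len(X_train), len(X_val), len(X_test))
```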
3.1.3 Data Pre-processing
Before training the models with the dataset, it is necessary to pre-process the data to eliminate anything that would hinder the models. Data normalization and resizing are applied so that the models can consume the data easily.
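A minimal sketch of such a pre-processing step is shown below; the target size, grayscale reading, and file name are assumptions.

```python
# Resize each ultrasound scan and normalize pixel values to [0, 1].
import cv2
import numpy as np

def preprocess(path, size=(224, 224)):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # read the scan as grayscale
    img = cv2.resize(img, size)                    # bring all scans to one size
    img = img.astype(np.float32) / 255.0           # normalization
    return img[..., np.newaxis]                    # add a channel dimension

sample = preprocess("scan_0001.png")               # hypothetical file name
print(sample.shape)                                # (224, 224, 1)
```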
3.1.4 Data Augmentation
Data augmentation is performed as it generates more training samples, which boosts the model's performance and leads to better results. It helps keep the model from overfitting and from being affected negatively by imbalanced data. Different data augmentation methods were used, such as rotation, zoom range, width shift range, height shift range, and horizontal flip. Using the Keras ImageDataGenerator enables images with different characteristics to be generated, and this data was then fed to the deep learning models.
3.2 Methodology As the data has been explored now, the next section presents the different models that were implemented and dives into each model’s architecture and implementation. For each model, an in-depth analysis was provided to demonstrate the complexity of the model. The proposed models are CNN model, VGG-16 model, Custom VGG-16 model, ResNet-50 model, and Custom ResNet-50 model.
3.2.1 CNN Model
The CNN model that was implemented is a simple architecture that contains five Conv2D layers, five MaxPool layers, Batch Normalization layers between the Convolution layers, and four Dropout layers. The last layer is a Dense layer with a Sigmoid activation. Additionally, the RMSprop optimizer was used at a learning rate of 2.7e−05. Due to the dataset containing only over 2000 images, the model would be able to pick up on small details of the polycystic ovaries at the very first few layers, and deeper into the architecture at later layers, it would detect precise details of the disease [29].
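As an illustration of the architecture described above, a sketch in Keras is given below; the filter counts, dropout rates, and input size are assumptions rather than the authors' exact values.

```python
# CNN sketch: five Conv2D/MaxPool blocks, batch normalization, four dropout
# layers, a sigmoid output, and RMSprop at a learning rate of 2.7e-05.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([keras.Input(shape=(224, 224, 1))])
for i, filters in enumerate([32, 64, 128, 128, 256]):   # assumed filter counts
    model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    if i > 0:                                            # four dropout layers in total
        model.add(layers.Dropout(0.25))

model.add(layers.Flatten())
model.add(layers.Dense(1, activation="sigmoid"))         # polycystic vs normal

model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=2.7e-05),
              loss="binary_crossentropy", metrics=["accuracy"])
```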
3.2.2 VGG-16 Model
The VGG-16 is a convolutional neural network architecture with 16 weight layers (13 convolutional and 3 fully connected). The hyperparameter choices of the model are consistent, as it uses only 3 × 3 convolution kernels with a growing number of filters (Keras, n.d.). This model is one of the most popular architectures among deep learning models and is a common choice for extracting features from images. The positioning of the layers is uniform throughout the structure: convolution layers use 3 × 3 filters with stride 1 and same padding, followed by max-pooling layers with a 2 × 2 filter and stride 2 (Mohan [22]). As the data passes through the model, the number of filters increases from 64 to 512. The final stages of the VGG-16 model end with three Dense layers. Implementing the model was simple and straightforward, as it follows a chain of repeated layers. After importing the necessary libraries, a Sequential model object must be defined using Keras. The next step is to add the stack of layers. The first block contains two consecutive convolution layers with 64 filters of size 3 × 3, accompanied by a 2 × 2 max-pooling layer with stride 2; the input image size is 224 × 224 × 1. Following that, the rest of the layers are added according to the architecture. After implementing the stacks, the last step is to add the fully connected layers. Before the first fully connected layer, a Flatten layer must be added. Lastly, the final layer is the output layer with a SoftMax activation (Mohan [22]).
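A condensed sketch of this block-by-block construction is shown below; it follows the standard VGG-16 configuration with the 224 × 224 × 1 input mentioned in the text, and the dense-layer sizes are assumptions.

```python
# VGG-16-style stack built block by block in Keras.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([keras.Input(shape=(224, 224, 1))])
for n_convs, filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2), strides=2))

model.add(layers.Flatten())
model.add(layers.Dense(4096, activation="relu"))
model.add(layers.Dense(4096, activation="relu"))
model.add(layers.Dense(2, activation="softmax"))   # output layer with SoftMax
model.summary()
```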
3.2.3 ResNet-50 Model
The ResNet-50 architecture is a deep learning model known for image recognition, object detection, and image segmentation. Thanks to its framework, the network can be trained on more than a million images, resulting in great performance (Mohan [21]). The architecture has a feature called the skip connection, which lets the gradient be back-propagated directly to earlier layers and so enables a very deep network. For the implementation, a pretrained model from Keras was used. The training uses early stopping, as it can be challenging for developers to decide how many epochs a model should be trained for: too many epochs can raise the issue of overfitting, while too few can result in underfitting (Mohan [21]). To address this dilemma, Early Stopping is a technique that allows training for a large number of epochs and stops once the model's performance no longer improves on the validation dataset (Mohan [21]). In addition, a Model Checkpoint is used to save the best-performing model. Because the model at the early-stopping point might not be the best one, the model checkpoint saves the best model seen during training according to the monitored parameter.
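A sketch of this setup is given below; the head layers, monitored metrics, epoch budget, and file names are assumptions, and grayscale scans would need to be repeated across three channels to use the ImageNet weights.

```python
# Pretrained ResNet-50 base with EarlyStopping and ModelCheckpoint callbacks.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                     # use the pretrained features as-is

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ModelCheckpoint("best_resnet50.h5", monitor="val_accuracy", save_best_only=True),
]
# model.fit(train_data, validation_data=val_data, epochs=100, callbacks=callbacks)
```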
3.2.4 Custom ResNet-50 Model
The main goal of designing this model was to implement various features that strengthen the model's performance. The model is similar to the previous ResNet-50 model; however, it contains more filters. In addition, batch normalization layers are added between the convolution layers. A separable convolution layer is added to the model; it is similar to a convolution layer but can be considered a hybrid version, since it divides a single convolution into two or more convolutions that produce the same output. This is an advantage as the model uses fewer parameters and therefore needs less training time, which makes the process faster. The filters were changed so that the model's architecture is less complex, and the number of trainable features was reduced, since the regular ResNet-50 has many filters relative to the roughly two thousand training images; this was done to avoid overfitting and to obtain better validation accuracy. Additionally, batch normalization was used to improve the training time and accuracy of the neural network. The activation functions used are ReLU, which is applied between the layers, and Softmax, which is used to separate the classes at the output. Furthermore, a Cyclical Learning Rate (CLR) was implemented, as it varies the global learning rate during training and removes the need for numerous experiments to find good values without additional computation [25]. Additionally, a learning rate finder function is implemented, as it compares a series of learning
rates on one epoch. The optimizer used for the model training is Root Mean Squared Propagation (RMSProp). It speeds up the optimization process by reducing the number of function evaluations required to reach the optimum and find the desired result (Brownlee 2021). During the implementation, the optimizer was switched between RMSProp and SGD. Additionally, different learning rates were used to evaluate the changes in the model's performance.
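For illustration, a minimal triangular cyclical learning rate in the spirit of the CLR policy mentioned above can be written as a small Keras callback, as sketched below; the base rate, maximum rate, and step size are assumptions.

```python
# Triangular cyclical learning rate (CLR) as a Keras callback.
import numpy as np
import tensorflow as tf

class CyclicalLR(tf.keras.callbacks.Callback):
    def __init__(self, base_lr=1e-5, max_lr=1e-3, step_size=2000):
        super().__init__()
        self.base_lr, self.max_lr, self.step_size = base_lr, max_lr, step_size
        self.iteration = 0

    def on_train_batch_begin(self, batch, logs=None):
        # Triangular schedule: the learning rate rises and falls between
        # base_lr and max_lr every 2 * step_size iterations.
        cycle = np.floor(1 + self.iteration / (2 * self.step_size))
        x = np.abs(self.iteration / self.step_size - 2 * cycle + 1)
        lr = self.base_lr + (self.max_lr - self.base_lr) * max(0.0, 1 - x)
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, lr)
        self.iteration += 1

# Usage sketch: model.fit(..., callbacks=[CyclicalLR()])
```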
4 Results and Evaluation This section examines the performance evaluation that has been done on all the deep learning models that were introduced. Each model’s results include the Accuracy, Precision, Recall, F-1 score, Confusion Matrix, and ROC Curve.
4.1 CNN Model's Performance
The first experiment was done on the CNN model and the results are as follows:
• Accuracy: 99%
• Precision: Infected: 100% and Not Infected: 100%
• Recall: Infected: 100% and Not Infected: 100%
• F-1 Score: Infected: 100% and Not Infected: 100%
• Confusion Matrix:

 | Infected | Not infected
Infected | 340 | 0
Not infected | 0 | 473
• ROC Curve (Fig. 2).
4.2 Custom VGG-16 Model's Performance
The second model that was implemented is the VGG-16 model and the results are as follows:
• Accuracy: 58%
• Precision: Infected: 0% and Not Infected: 58%
• Recall: Infected: 0% and Not Infected: 100%
Fig. 2 VGG-16 model ROC curve
• F-1 Score: Infected: 0% and Not Infected: 74%
• Confusion Matrix:

 | Infected | Not infected
Infected | 0 | 340
Not infected | 0 | 473
• ROC Curve (Fig. 3).
Fig. 3 Custom VGG-16 model ROC curve
Fig. 4 ResNet-50 model ROC curve
4.3 ResNet-50 Model
The third model that was implemented is the ResNet-50 model; it obtained the same results as the VGG-16 model that was implemented earlier:
• Accuracy: 58%
• Precision: Infected: 0% and Not Infected: 58%
• Recall: Infected: 0% and Not Infected: 100%
• F-1 Score: Infected: 0% and Not Infected: 74%
• Confusion Matrix:

 | Infected | Not infected
Infected | 0 | 340
Not infected | 0 | 473
• ROC Curve (Fig. 4).
4.4 Custom ResNet-50 Model
The last model that was implemented is the Custom ResNet-50 model and after running the experiment, the results are as follows:
• Accuracy: 96.7%
• Precision: Infected: 100% and Not Infected: 91%
• Recall: Infected: 86% and Not Infected: 100%
• F-1 Score: Infected: 92% and Not Infected: 95%
• Confusion Matrix:
Fig. 5 Custom ResNet-50 model ROC curve
 | Infected | Not infected
Infected | 292 | 48
Not infected | 0 | 473
• ROC Curve (Fig. 5).
4.5 Discussion
The models that obtained high accuracy are the CNN model and the Custom ResNet-50 model: the CNN model obtained an accuracy of 99% and the Custom ResNet-50 an accuracy of 96.7%. Both the Custom VGG-16 model and the ResNet-50 model obtained an accuracy of 58%. The CNN model achieved a precision, recall, and F-1 score of 100% for both polycystic and normal ovaries, showing that the algorithm returns relevant results; its confusion matrix likewise shows accurate predictions. In the runner-up position, the Custom ResNet-50 model achieved a precision of 100% for polycystic ovaries and 91% for normal ovaries. The recall was 86% for polycystic ovaries and 100% for normal ovaries. Lastly, the F-1 score was 92% for polycystic ovaries and 95% for normal ovaries. Additionally, its confusion matrix is strongly diagonal, with only 48 misclassified results between "infected" and "not infected". The Custom VGG-16 model and the ResNet-50 model did not match the results of the models mentioned above. The Custom VGG-16 model's precision is 0% for polycystic ovaries and 58% for normal ovaries, recall is 0% for polycystic ovaries and 100% for normal ovaries, and the F-1 score is 0% for polycystic ovaries and 74% for normal ovaries. The confusion matrix also highlights that the model
did not perform well: all 340 polycystic samples were classified as normal, and only the 473 normal samples were predicted correctly. Moving forward, the ResNet-50 model's precision is 0% for polycystic ovaries and 58% for normal ovaries, recall is 0% for polycystic ovaries and 100% for normal ovaries, and the F-1 score is 0% for polycystic ovaries and 74% for normal ovaries. Its confusion matrix shows the same pattern, with 340 polycystic samples predicted as "not infected" and 473 normal samples predicted correctly. The CNN model and the Custom ResNet-50 model succeeded in providing high-accuracy results, thus making them reliable. However, it would be ideal to experiment further on the CNN model to rule out signs of overfitting. The Custom VGG-16 model and the ResNet-50 model did not perform well, which could be due to several reasons, including a lack of data and class imbalance problems. During the process, one limitation was that it was difficult to find varied ultrasound scans of normal and polycystic ovaries; therefore, it was important to apply data augmentation to enhance the models' performance. An additional limitation was that the scripts ran on Google Colab; since it is a free service, there are restrictions on running multiple scripts or executing the models simultaneously.
5 Conclusion
This research was conducted to highlight the importance of women's health. PCOS is a silent syndrome that many women face, and it should be taken seriously in the medical industry as it leads to the many health issues stated in this paper. Women are diagnosed with PCOS through different, extensive methods, which points out how complex this health issue is. The main goal was to find deep learning methods that help detect polycystic ovaries from ultrasound scans and provide high-accuracy results, as this would speed up the process and make it easier for medical professionals to diagnose patients with PCOS. During the experiment, there was a lack of research connecting polycystic ovaries and this technology; therefore, it was strenuous to find information that would help with the procedure of the project. Previous work conducted by other researchers provided information that helped with the execution of the deep learning models and with diving deeper into the experiments to obtain optimum results. From the results obtained, the most reliable models are the CNN model and the Custom ResNet-50 model: the CNN model obtained an accuracy of 99% and the Custom ResNet-50 an accuracy of 96.7%. In contrast, the ResNet-50 model obtained an accuracy of 58%, which shows that this model is not suitable for distinguishing normal and polycystic ovaries. As a future direction, additional deep learning models will be implemented and evaluated to further improve the accuracy of detecting the disorder. With a deep
learning model that accurately predicts the disease while avoiding overfitting, an automated interface programme will be developed to take in ultrasound scans and return results automatically, helping doctors diagnose patients with PCOS. We believe this will reduce the exhaustion that women go through during their PCOS journey and provide a better patient experience.
CataractEyeNet: A Novel Deep Learning Approach to Detect Eye Cataract Disorder Amir Sohail, Huma Qayyum, Farman Hassan, and Auliya Ur Rahman
Abstract Humans perceive the world around them through their eyes. Currently, visual impairment and blindness have become serious health problems. Even though advanced technologies are emerging rapidly, blindness and visual impairment still remain significant challenges for healthcare systems around the globe. In particular, cataract is among the disorders that result in poor vision and may also cause falls as well as depression. Historically, it was mostly older people who suffered from it; however, childhood cataracts are also common and result in severe blindness and visual impairment in children. It is therefore essential to develop an automated system for the detection of cataracts. To this end, this research presents a novel deep learning-based approach, CataractEyeNet, to detect cataract disorder using lens images. More specifically, we customized the pre-trained VGG-19 model and added 20 more layers to enhance the detection performance. CataractEyeNet obtained an accuracy of 96.78% and a precision, recall, and F1-score of 97%, 97%, and 97%, respectively. The experimental outcomes show that our system can accurately detect cataract disorders. Keywords Eye cataract · Deep learning · VGG-19 · Medical imaging
1 Introduction The eye is the organ through which we observe the world around us, and eye-related diseases are increasing day by day. Cataract is an eye-related problem that can cause weak sightedness and blurriness. A cataract is the clouding of the lens of the eye, which results in decreased vision. Cataracts are of different types, such as nuclear, cortical, posterior, and congenital cataracts, classified according to how they develop and where in the eye they are located. A cataract develops gradually, and over time the vision of one or both eyes decreases. There are numerous symptoms of a
cataract, namely, double vision in the affected eye, halos surrounding lights, dim colors, etc. [1]. It develops with age, giving rise to blurry vision and sensitivity to brightness. Additionally, certain conditions and exposures, namely, diabetes, ultraviolet rays, and trauma, can cause cataract disorder, and other factors can act as catalysts, namely, heavy alcohol use, smoking, high blood pressure, and exposure to radiation from X-rays [2]. The problem of visual impairment is increasing worldwide, and nearly 62.5 million cases of visual impairment and blindness are reported around the globe [3]. Cataract is considered one of the main causes of these visual impairments, yet a significant number of cataract disorders remain undiagnosed [4]. The research community has investigated how many people have undetected eye diseases (UED), and a considerable number of UED cases were found [5, 6]. Classification is the process of assigning data to different categories; in this work, we have two classes, namely, cataract and normal. For classification, pre-processing is performed first, followed by feature extraction, and finally, images are classified based on the extracted features [7, 8]. Earlier, cataract disorder was detected through fundus image analysis, in which a fundus camera was used. Numerous feature extraction methods have been developed, namely, wavelet, acoustical, texture, sketch, color, spectral-parameter, and deep learning-based methods [9, 10]. Pre-trained models, namely, AlexNet, GoogleNet, ResNet, etc., are also employed for cataract detection and classification. These models are built on convolutional neural networks and are trained on the ImageNet dataset; employing such pre-trained models for new problems is known as transfer learning [11]. Early detection of cataract patients is necessary to avoid blindness; pre-trained models therefore play a significant role in saving time and providing better classification performance [9].
2 Literature Review There have been efforts by the research community to employ machine learning-based methods [12–21] for the detection of eye cataract disorders. In [12], support vector machines (SVM) and back-propagation neural networks were utilized for the detection of cataract disorders using numerous image types, namely, fundus images and ultrasound images. In [13], various image features, namely, edge pixel count, big ring area, and small ring area were fed into an SVM classifier for the classification of normal, cataract, and post-cataract images. The method obtained an accuracy, sensitivity, and specificity of 94%, 90%, and 93.75%, respectively. In [21], an automated system based on retro-illumination images was developed to grade cortical and posterior subcapsular cataracts (PSC). Numerous features, namely, intensity, homogeneity, and texture were utilized to specify the geometric structure as well as the photometric appearance of cortical cataracts, and support vector regression was employed to classify cortical and PSC cataracts. The
system has the benefit of avoiding under-detection for clear lenses and over-detection for high-opacity lenses. In [14], nuclear cataracts were detected and graded through a regression model; the system comprised four steps, namely, feature selection, parameter selection, training, and validation. In [15], texture features based on retro-illumination image characteristics and cataract grading expertise were used to train a linear discriminant analysis (LDA) classifier to separate cataract from normal images, obtaining an accuracy of 84.8%. In [16], two types of cataracts, namely, nuclear and cortical cataracts, were detected sequentially by two different grading systems: in one system, the lens structure was used for feature extraction followed by SVM regression for classification, while opacity in cortical cataract grading was detected with region growing [18]. In [17], two tasks, namely cataract detection and grading, were performed using fundus image analysis. Both temporal- and spatial-domain features were extracted, SVM was employed to classify the images as cataract or normal, and a radial basis function network was used to grade the cataracts as mild or severe. The method obtained a sensitivity and specificity of 90% and 93.33%, respectively. Similarly, in [16, 18], an active shape model was developed for cataract detection and grading, and an SVM regression classifier was utilized to distinguish nuclear cataracts from normal eyes, obtaining an accuracy of 95%. In [19], an automated cataract detection system was designed using a gray-level co-occurrence matrix (GLCM) for feature extraction and k-nearest neighbor (KNN) to classify normal eyes versus cataracts. GLCM was employed to obtain the values of uniformity, dissimilarity, and contrast in the pupil of the eyes. The method obtained an accuracy of 94.5%; however, it utilized a very small number of images for training and testing. In [20], two methods, namely, wavelet-transform and sketch-based features, were used for feature extraction, and multi-class Fisher discriminant analysis was performed on both. The research community has also worked on deep learning-based techniques [9, 22–29] for the detection and grading of eye cataracts. Most deep learning-based methods are based on convolutional neural networks (CNNs). In [9], a deep convolutional neural network (DCNN) was used for the detection of cataract disorder with a cross-validation approach, obtaining an accuracy of 93.52% for cataract detection and 86.69% for cataract grading; however, this method suffers from the vanishing-gradient problem. In [22], a DCNN-based system was designed for the detection of cataract disease using fundus images and obtained an accuracy of 97.04%. In [23], a system based on discrete state transition and ResNet was developed to detect cataract disorder; the residual connections were used to avoid the vanishing-gradient problem. In [24], a CNN-based deconvolutional network was used to design a cataract disorder detection system, and it was found that the vascular information lost during multi-layer convolution plays a significant role in the grading of eye cataract.
The cataract detection performance was enhanced by designing a hybrid global–local features representation model. In [25], a transfer learning-based approach was designed for the
detection of eye cataract disorder. Similarly, the work in [26] developed a cataract detection and grading system using a hybrid model combining two neural networks, namely, a recurrent neural network (RNN) and a CNN; the model was capable of learning relationships among the inner feature maps [27]. In [28], a cataract disorder detection system was designed by employing Caffe to extract features from the fundus images and maximum entropy for pre-processing, with SVM followed by SoftMax employed for classification. In [29], a three-stage system, comprising pre-processing, feature extraction, and classification, was designed for the detection of cataract disorder. To improve image quality, top–bottom hat transformation and trilateral filters were used, and a two-layered back-propagation neural network model was employed.
2.1 Convolutional Neural Network A CNN is a deep learning algorithm that takes an image as input, assigns learnable weights and biases to the various objects present in the image, and thereby becomes capable of distinguishing them from each other [30, 31]. Furthermore, CNN-based models require far less pre-processing than other algorithms employed for classification. During training, the filters of the CNN learn the relevant characteristics. CNN-based algorithms resemble the connectivity pattern of neurons in the human brain and were inspired by the organization of the visual cortex. Neurons react to stimuli in a limited region of the visual field called the receptive field, and a set of these fields overlaps to cover the entire visual area. CNN-based algorithms have become extremely popular because of their improved performance on image classification tasks. CNNs consist of numerous blocks of layers, namely, convolutional, pooling (maximum, minimum, or average), flatten, dense, and dropout layers. Most importantly, temporal- and spatial-domain features are extracted from the images using the convolutional layers and their filters, and the computational effort is significantly reduced through weight sharing [32, 33]. CNNs are feedforward artificial neural networks characterized by shared weights and by filters whose neurons are connected only to surrounding patches. A typical CNN model consists of three building blocks, namely, a convolutional layer, a max-pooling layer, and a fully connected layer that enables the network to classify images [34].
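As an illustration of the layer types just described, the following minimal Keras sketch assembles a small CNN from convolutional, pooling, flatten, dense, and dropout layers for a binary cataract-versus-normal task; the filter counts, input size, and other hyperparameters are assumptions rather than the architecture proposed in this chapter.

```python
# Minimal CNN sketch using the layer types described above (conv, pooling,
# flatten, dense, dropout). All sizes here are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # cataract vs. normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```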
2.2 Pre-trained Models The CNNs have superior performance on big datasets; however, these models suffer from an overfitting problem on a small dataset [35, 36]. Transfer learning is utilized
to save training time and is particularly beneficial for image classification problems. In transfer learning, models pre-trained on a large dataset, namely, ImageNet [37, 38], can be utilized for applications that have relatively limited datasets. CNNs have been employed in numerous applications, namely, manufacturing, medical fields, and baggage screening [39–41]. Transfer learning is favored because it reduces the lengthy training time and the need for a big dataset; designing a deep learning-based model from scratch requires both [42]. Hence, in this work, we also used an existing pre-trained model, namely, VGG-19 [43], but added 20 additional layers for better cataract disorder detection performance. The details are given in the subsequent sections.
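The transfer-learning recipe described here can be sketched as follows: an ImageNet-pretrained VGG-19 backbone is loaded from keras.applications, frozen, and topped with a small trainable head. This is only an illustrative sketch of the general idea; the head, input size, and training settings are assumptions and do not reproduce the 20-layer customization introduced later.

```python
# Transfer-learning sketch: reuse ImageNet-pretrained VGG-19 as a frozen
# feature extractor and train only a small new head (illustrative only).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional blocks

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # cataract vs. normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```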
2.3 VGG-19 VGG was originally designed by the Visual Geometry Group at Oxford, from which it takes its name. VGG builds on ideas from its predecessors such as AlexNet and uses deeper stacks of convolutional layers to improve accuracy; since AlexNet [44] improved on conventional CNN models, VGG is considered a successor of AlexNet. VGG-19 comprises 19 weight layers, and its detailed parameter configuration can be found in [44]. VGG-19 is a variant of the original VGG model; it has 16 convolutional layers and 3 fully connected layers, interleaved with 5 max-pooling layers and followed by a SoftMax layer. VGG has other variants as well; however, we have employed VGG-19 in this work. The above literature shows significant contributions to the detection of cataract disorders; however, the existing methods still have limitations that need to be addressed. Therefore, we developed a novel deep learning-based approach, CataractEyeNet, for the detection of cataract disorder. The main contributions of this research work are as follows. • We developed a novel deep learning-based approach named CataractEyeNet by customizing VGG-19 to detect cataract disorder. • CataractEyeNet is capable of distinguishing cataract disorder images from normal ones. • We observed that the non-customized VGG-19 performs worse than our proposed CataractEyeNet. • For the validation of our approach, we performed extensive experimentation on the ODIR-5K dataset. The remainder of the manuscript is organized as follows: Sect. 3 discusses the proposed methodology in detail, Sect. 4 presents the experimental results, and Sect. 5 concludes the research work.
3 Proposed Methodology This section provides details of the proposed methodology for cataract detection. CataractEyeNet is based on the pre-trained VGG-19 model, which we modified by adding 20 layers, including convolutional layers, max-pooling layers, a flatten layer, and a dense layer. For experimentation, we used the ODIR-5K dataset, with 80% of the data for training CataractEyeNet and 20% for evaluating it. The detailed working mechanism is illustrated in Fig. 1.
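A minimal sketch of the 80/20 train/test split described above is given below; the 'odir5k' directory layout with one subfolder per class and the image size are assumptions, not the authors' exact data pipeline.

```python
# Illustrative 80/20 split, assuming ODIR-5K images are sorted into
# 'odir5k/cataract' and 'odir5k/normal' folders (hypothetical paths).
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "odir5k", validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "odir5k", validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=32)  # held-out 20% used for evaluation
```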
3.1 Customization In this work, we have customized the pre-trained VGG-19 model, the details of which are discussed in Sect. 2.3. The model has 19 deep layers, and we added 20 more layers to enhance the performance for accurate detection of cataract disorder. We observed from the experimental findings that the customized algorithm has superior performance for cataract detection. The benefits of using a pre-trained model are the considerable training time saved and the improved classification performance. Moreover, the model is capable of capturing both spatial and temporal dependencies in the image using the relevant filters. The customized architecture performs well due to the reduced number of parameters and the reusability of the learned weights; specifically, the proposed network can be trained to better capture the complexity of the images.
Fig. 1 Proposed working mechanism
3.2 Proposed CataractEyeNet Model In this research work, we have proposed a novel architecture, CataractEyeNet, that is based on the VGG-19 model for the detection of eye cataract disorder. The VGG-19 model comprises 19 deep layers and is discussed in detail in Sect. 2.3. We further added 20 layers, including convolutional layers, max-pooling layers, a flatten layer, and a dense layer at the end. These added layers form 5 blocks containing 13 convolutional layers and 5 max-pooling layers, followed by 1 flatten layer and 1 dense layer. The first block has two convolutional layers with 64 filters, a 3 × 3 kernel, 'same' padding, and a ReLU activation function. The second block has 2 convolutional layers with 128 filters, a 3 × 3 kernel, 'same' padding, and a ReLU activation function. The third block comprises three convolutional layers with 256 filters, a 3 × 3 kernel, 'same' padding, and a ReLU activation function. The fourth and fifth blocks each consist of three convolutional layers with 512 filters, a 3 × 3 kernel, 'same' padding, and ReLU activation. We added one max-pooling layer after each block, with a pool size of 2 × 2 and a stride of 2 × 2. Finally, we added a flatten layer and a dense layer with a sigmoid activation function. More specifically, the proposed CataractEyeNet comprises 39 layers in total. We performed experiments using the standard ODIR-5K dataset for the detection of eye cataract disorder, and the proposed system accurately detected patients with cataracts.
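The following Keras sketch renders one reading of the 20 added layers described in this section: five blocks of 3 × 3 'same'-padded ReLU convolutions with (2, 2, 3, 3, 3) layers of 64, 128, 256, 512, and 512 filters, a 2 × 2/stride-2 max-pooling layer after each block, then a flatten layer and a sigmoid dense layer. The input resolution and the exact way these layers attach to the VGG-19 backbone are assumptions.

```python
# Sketch of the 20 added layers as described in this section (illustrative).
from tensorflow.keras import layers, models

def conv_block(model, n_convs, filters):
    # n_convs 3x3 'same'-padded ReLU convolutions followed by 2x2/stride-2 pooling
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

cataract_eye_net_head = models.Sequential([layers.Input(shape=(224, 224, 3))])
for n_convs, filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
    conv_block(cataract_eye_net_head, n_convs, filters)
cataract_eye_net_head.add(layers.Flatten())
cataract_eye_net_head.add(layers.Dense(1, activation="sigmoid"))  # cataract vs. normal
```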
3.3 Dataset In this research work, a publicly available dataset, Ocular Disease Intelligent Recognition (ODIR-5K), is utilized for experimentation. The dataset contains 5000 multi-labeled color fundus images of both right and left eyes. Additionally, each image is accompanied by the doctor's descriptive diagnosis, provided both for the individual eyes and for the patient. The data were originally collected from numerous hospitals in China and compiled by Shanggong Medical Technology Co., Ltd. The images, captured by high-resolution cameras, initially include unnecessary features, namely, eyelashes, freckles, etc. Most importantly, the diseases were annotated by expert ophthalmologists. Further details of the dataset are given in [45].
4 Experimental Setup and Results This section provides the detailed experimental setup and discussion. In this work, we used the following performance parameters: accuracy, precision, recall, and F1-score. The detailed experimental findings of the proposed CataractEyeNet, the confusion matrix analysis, and the performance comparison are discussed in the subsequent sections.
4.1 Performance of the CataractEyeNet This experiment aims to evaluate the performance of the proposed CataractEyeNet in detecting cataract disorder. To achieve this aim, we split the ODIR-5K dataset into two sets, namely, a training set and a testing set, and used two classes, cataract and normal, since our purpose was to detect cataract disorder. Moreover, we customized the pre-trained VGG-19 by introducing additional convolutional and pooling layers; the best classification results were obtained by adding 20 layers to the VGG-19 model, and we named the resulting network CataractEyeNet. The detailed results of CataractEyeNet are reported in Fig. 2. From Fig. 2, CataractEyeNet obtained an accuracy of 96.78%, with an overall precision, recall, and F1-score of 97% each. For the normal class, the precision is 98%, recall is 95%, and F1-score is 96%, while for the cataract class the precision is 96%, recall is 98%, and F1-score is 97%. As discussed, we tried different combinations of added layers to enhance the performance of VGG-19, but all configurations performed poorly except the addition of 20 layers. The loss of the proposed CataractEyeNet was 63.36%. The experimental outcomes obtained by CataractEyeNet are promising, and based on the results in Fig. 2, we claim that this technique can be employed by ophthalmologists for the accurate detection of cataract disorder to avoid blindness.
4.2 Confusion Matrix In this section, we explain the detailed classification performance of CataractEyeNet. Accuracy alone is not enough to assess an algorithm on a classification problem; other performance parameters, namely, precision, recall, and F1-score, cannot be ignored and are essential for a complete picture of performance. Therefore, we also carried out a confusion (error) matrix analysis to compute these metrics for CataractEyeNet. The resulting confusion matrix is given in Table 1.
Fig. 2 Performance of the CataractEyeNet (accuracy, precision, recall, and F1-score, in %)

Table 1 Confusion matrix
                   Predicted class
Actual class       Normal        Cataract
Normal             93            5
Cataract           2             118
It can be observed that CataractEyeNet correctly classified 93 normal eyes and 118 cataract disorders. Moreover, it misclassified 5 normal eyes as cataract disorder and 2 cataract disorders as normal. The low misclassification rate of CataractEyeNet indicates that our method can detect cataract disorder accurately, and the proposed system could be adopted in hospitals to save time and accurately identify patients so as to avoid blindness.
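For transparency, the per-class metrics reported in Sect. 4.1 can be recomputed directly from the Table 1 counts with a few lines of NumPy (assuming, as in Table 1, that rows correspond to the actual class and columns to the predicted class):

```python
# Recomputing the reported metrics from the Table 1 counts.
import numpy as np

cm = np.array([[93, 5],      # actual normal:   predicted normal, predicted cataract
               [2, 118]])    # actual cataract: predicted normal, predicted cataract

for i, name in enumerate(["normal", "cataract"]):
    precision = cm[i, i] / cm[:, i].sum()
    recall = cm[i, i] / cm[i, :].sum()
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")

accuracy = np.trace(cm) / cm.sum()
print(f"accuracy={accuracy:.4f}")  # ~0.9679, matching the reported 96.78%
```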
4.3 Performance Comparison Against Other Methods This experiment is conducted to compare the performance of the proposed CataractEyeNet against existing techniques [21, 23, 29, 46–48]. To achieve this goal, we take the experimental results directly from the respective papers without re-implementing their methods. The accuracies of the other methods [21, 23, 29, 46–48] are given in Table 2. The results in Table 2 indicate that [47] obtained the lowest accuracy of 61.9%, the worst performance for cataract disorder detection among all the techniques [21, 23, 29, 46–48]. Furthermore, [48] obtained 94.83%, making it the second-best performing technique, while our method, CataractEyeNet, achieves a superior accuracy of 96.78%. We also report the improvements of our method over the other methods: the proposed CataractEyeNet gains 3.98%, 5.92%, 34.88%, 4.78%, 2.78%, and 1.95% in accuracy over [46], [29], [47], [21], [23], and [48], respectively.
Table 2 Performance comparison with other approaches
Authors                      Accuracy%     Improvement of our method w.r.t other technique
Xiong et al. [46]            92.8          3.98
Yang et al. [29]             90.86         5.92
Abdul-Rahman et al. [47]     61.9          34.88
Cao et al. [21]              92            4.78
Zhou et al. [23]             94            2.78
Lvchen Cao [48]              94.83         1.95
Proposed                     96.78         –
Moreover, pre-processing isn't required, spatial padding is used to preserve the spatial resolution of the images, and the ReLU activation function introduces non-linearity, helping the model classify the images better; ReLU was also preferred over tanh, which increased the computational time and degraded the detection performance. Based on these reasons, the experimental findings, and the comparative analysis, we conclude that CataractEyeNet is the better choice for the detection of cataract disorder, and the comparison against existing techniques illustrates that it can accurately detect cataract disorder patients.
5 Conclusion In this work, we addressed the problem of detecting cataract disorder. Detecting cataracts has become a challenging task that needs to be addressed, as people of all ages suffer from blindness and visual impairment. To this end, we designed a novel cataract disorder detection system, CataractEyeNet, with improved classification performance. The proposed CataractEyeNet obtained a good accuracy of 96.78% and a precision, recall, and F1-score of 97%, 97%, and 97%, respectively. From the experimental findings, we conclude that CataractEyeNet has superior performance and can be adopted by medical experts in hospitals for the detection of cataract disorders. In the near future, we aim to apply the same method to the detection of different grades of cataracts.
References 1. Access on 8-20-2021. https://www.healthline.com/health/cataract 2. Liu YC, Wilkins M, Kim T, Malyugin B, Mehta JS (2017) Cataracts. Lancet 390(10094):600– 612 3. Flaxman SR, Bourne RRA, Resnikoff S et al (2017) Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet Global Health 5:e1221–e1234 4. Chua J, Lim B, Fenwick EK et al (2017) Prevalence, risk factors, and impact of undiagnosed visually significant cataract: the Singapore epidemiology of eye diseases study. PLoS One 12:e0170804 5. Varma R, Mohanty SA, Deneen J, Wu J, Azen SP (2008) Burden and predictors of undetected eye disease in Mexican Americans: the Los Angeles latino eye study. Med Care 46:497–506 6. Keel S, McGuiness MB, Foreman J, Taylor HR, Dirani M (2019) The prevalence of visually significant cataract in the Australian national eye health survey. Eye (Lond) 33:957–964 7. Sahana G (2019) Identification and classification of cataract stages in maturity individuals’ victimization deep learning formula 2770. Int J Innov Technol Explor Eng (IJITEE) 8(10) 8. Soares JVB, Leandro JJG, Cesar RM, Jr, Jelinek HF, Cree MJ (2006) Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans Med Imaging 25(9):1214–1222 9. Zhang L, et al (2017) Automatic cataract detection and grading victimization deep convolutional neural network. In: IEEE Ordinal International Conference on Networking, Sensing and Management (ICNSC), Calabria 10. Zhang Q, Qiao Z, Dong Y, Yang J-J (2017) Classification of cataract structure pictures supported deep learning. In: IEEE International Conference on Imaging Systems and Techniques, Beijing, China, pp 1–5 11. Patton EW, Qian X, Xing Q, Swaney J, Zeng TH (2018) Machine learning on cataracts classification using SqueezeNet. In: 4th International Conference on Universal Village, Boston, USA, pp 1–3, ISBN-978-1-5386-5197-1 12. Yang JJ, Li J, Shen R, Zeng Y, He J, Bi J, Li Y, Zhang Q, Peng L, Wang Q (2016) Exploiting ensemble learning for automatic cataract detection and grading. Comput Methods Programs Biomed 124:45–57 13. Nayak J (2013) Automated classification of normal, cataract and post cataract optical eye images using SVM classifier. In: Proceedings of the world congress on engineering and computer science, vol 1, pp 23–25 14. Xu Y, Gao X, Lin S, Wong DWK, Liu J, Xu D, Cheng CY, Cheung CY, Wong TY (2013) Automatic grading of nuclear cataracts from slit-lamp lens images using group sparsity regression. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, Heidelberg, pp 468–475 15. Gao X, Li H, Lim JH, Wong TY (2011) Computer-aided cataract detection using enhanced texture features on retro-illumination lens images. In: 2011 18th IEEE international conference on image processing. IEEE, pp 1565–1568 16. Li H, Lim JH, Liu J, Wong DWK, Tan NM, Lu S, Zhang Z, Wong TY (2009b) Computerized systems for cataract grading. In: 2009 2nd international conference on biomedical engineering and informatics. IEEE, pp 1–4 17. Harini V, Bhanumathi V (2016) Automatic cataract classification system. In: 2016 international conference on communication and signal processing (ICCSP). IEEE, pp 0815–0819 18. Li, H., Lim, J.H., Liu, J., Wong, D.W.K., Tan, N.M., Lu, S., Zhang, Z., Wong, T.Y., 2009a. An automatic diagnosis system of nuclear cataract using slit-lamp images, in: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EEE. 
pp. 3693–3696. 19. Fuadah YN, Setiawan AW, Mengko T (2015) Performing high accuracy of the system for cataract detection using statistical texture analysis and k-nearest neighbor. In: 2015 international seminar on intelligent technology and its applications (ISITIA). IEEE, pp 85–88
20. Li T, Zhu S, Ogihara M (2006) Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10(4):453–472 21. Cao L, Li H, Zhang Y, Zhang L, Xu L (2020) Hierarchical method for cataract grading based on retinal images using improved Haar wavelet. Information Fusion 53:196–208 22. Ran J, Niu K, He Z, Zhang H, Song H (2018) Cataract detection and grading based on combination of deep convolutional neural network and random forests. In: 2018 international conference on network infrastructure and digital content (IC-NIDC). IEEE, pp. 155–159 23. Zhou Y, Li G, Li H (2019) Automatic cataract classification using deep neural network with discrete state transition. IEEE Trans Med Imaging 39(2):436–446 24. Xu X, Zhang L, Li J, Guan Y, Zhang L (2019) A hybrid global-local representation CNN model for automatic cataract grading. IEEE J Biomed Health Inform 24(2):556–567 25. Yusuf M, Theophilous S, Adejoke J, Hassan AB (2019) Web-based cataract detection system using deep convolutional neural network. In: 2019 2nd international conference of the IEEE Nigeria computer chapter (NigeriaComputConf). IEEE, pp 1–7 26. Jiang J, Liu X, Liu L, Wang S, Long E, Yang H, Yuan F, Yu D, Zhang K, Wang L, Liu Z (2018) Predicting the progression of ophthalmic disease based on slit-lamp images using a deep temporal sequence network. PLoS ONE 13(7):e0201142 27. Gao X, Lin S, Wong TY (2015) Automatic feature learning to grade nuclear cataracts based on deep learning. IEEE Trans Biomed Eng 62:2693–2701 28. Qiao Z, Zhang Q, Dong Y, Yang JJ (2017) Application of SVM based on genetic algorithm in classification of cataract fundus images. In: 2017 IEEE international conference on imaging systems and techniques (IST). IEEE, pp 1–5 29. Yang M, Yang JJ, Zhang Q, Niu Y, Li J (2013) Classification of retinal image for automatic cataract detection. In: 2013 IEEE 15th international conference on e-health networking, applications and services (Healthcom 2013). IEEE, pp 674–679 30. Albahli S, et al (2022) Pandemic analysis and prediction of COVID-19 using gaussian doubling times. Comput Mater Contin 833–849 31. Hassan F et al (2022) A robust framework for epidemic analysis, prediction and detection of COVID-19. Front Public Health 10 32. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 international conference on engineering and technology (ICET). IEEE, pp 1–6 33. Goyal M, Goyal R, Lall B (2019) Learning activation functions: a new paradigm for understanding neural networks. arXiv:1906.09529 34. Bailer C, Habtegebrial T, Stricker D (2018) Fast feature extraction with CNNs with pooling layers. arXiv:1805.03096 35. Yaqoob M, Qayoom H, Hassan F (2021) Covid-19 detection based on the fine-tuned MobileNetv2 through lung X-rays. In: 2021 4th international symposium on advanced electrical and communication technologies (ISAECT). IEEE 36. Ullah, MS, Qayoom H, Hassan F (2021) Viral pneumonia detection using modified GoogleNet through lung X-rays. In: 2021 4th international symposium on advanced electrical and communication technologies (ISAECT). IEEE 37. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255 38. Wang SH, Xie S, Chen X, Guttery DS, Tang C, Sun J, Zhang YD (2019) Alcoholism identification based on an AlexNet transfer learning model. Front Psych 10:205 39. 
Christodoulidis S, Anthimopoulos M, Ebner L, Christe A, Mougiakakou S (2016) Multisource transfer learning with convolutional neural networks for lung pattern analysis. IEEE J Biomed Health Inform 21(1):76–84 40. Yang H, Mei S, Song K, Tao B, Yin Z (2017) Transfer-learning-based online Mura defect classification. IEEE Trans Semicond Manuf 31(1):116–123 41. Akçay S, Kundegorski ME, Devereux M, Breckon TP (2016) Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery. In: 2016 IEEE international conference on image processing (ICIP). IEEE, pp 1057–1061
42. Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345– 1359 43. Manzoor S, et al, Melanoma detection using a deep learning approach 44. Access on 6.6.2022, https://iq.opengenus.org/vgg19-architecture/ 45. Access on 6 May 2022. https://academictorrents.com/details/cf3b8d5ecdd4284eb9b3a80fcfe 9b1d621548f72 46. Xiong L, Li H, Xu L (2017) An approach to evaluate blurriness in retinal images with vitreous opacity for cataract diagnosis. J Healthc Eng 47. Abdul-Rahman AM, Molteno T, Molteno AC (2008) Fourier analysis of digital retinal images in estimation of cataract severity. Clin Experiment Ophthalmol 36(7):637–645 48. Gao X, Wong DWK, Ng TT, Cheung CYL, Cheng CY, Wong TY (2012) Automatic grading of cortical and PSC cataracts using retroillumination lens images. In: Asian conference on computer vision. Springer, Berlin, Heidelberg, pp 256–267 49. Lvchen Cao LZ, Li H, Zhang Y, Xu L (2019) Hierarchical method for cataract grading based on retinal images using improved Haar wavelet. arXiv:1904.01261 50. Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, Liang J (2016) Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imaging 35(5):1299–1312
DarkSiL Detector for Facial Emotion Recognition Tarim Dar and Ali Javed
Abstract Facial emotion recognition (FER) is a significant research domain in computer vision. FER is considered a challenging task due to emotion-related differences such as the heterogeneity of human faces and differences in images due to lighting conditions, angled faces, head poses, different background settings, etc. Moreover, there is also a need for a generalized and efficient model for emotion identification. This paper therefore presents a novel, efficient, and generalized DarkSiL (DS) detector for FER that is robust to variation in illumination conditions, face orientation, gender, ethnicity, and background settings. We introduce a low-cost, smooth, bounded-below, and unbounded-above Sigmoid-weighted Linear Unit (SiLU) function in our model to improve efficiency as well as accuracy. The performance of the proposed model is evaluated on four diverse datasets, CK+, FER-2013, JAFFE, and KDEF, achieving accuracies of 99.6%, 64.9%, 92.9%, and 91%, respectively. We also performed a cross-dataset evaluation to show the generalizability of our DS detector. Experimental results prove the effectiveness of the proposed framework for the reliable identification of seven different classes of emotions. Keywords DarkSiL (DS) emotion detector · Deep learning · Facial emotion recognition · SiLU activation
1 Introduction Automatic facial emotion recognition is an important research area in the fields of artificial intelligence (AI) and human psychological emotion analysis. Facial emotion recognition (FER) is the technology of analysing the facial expression of a person from images and videos to obtain information about that individual's emotional state. FER is a challenging research domain because everyone expresses their
emotions differently. Furthermore, several challenges and obstacles exist in this area which make emotion analysis quite difficult. Nowadays, researchers are focusing on improving the interaction between humans and computers. One way of doing that is to make computers intelligent enough to understand human emotions and interact with people in a better way. Automatic FER systems have the ability to improve our quality of life: they can help in the rehabilitation of patients with facial paralysis, they aid in gathering customer feedback on products [1], and robotic teachers that understand students' feelings can offer an improved learning experience. In short, FER systems have extensive applications in various domains, e.g., medicine, deepfake detection, e-learning, identification of drivers' emotions while driving, entertainment, cyber security, image processing, virtual reality applications [2], face authentication systems, etc. Early research in facial emotion identification focused on appearance- and geometry-based feature extraction methods. For example, the local binary pattern (LBP)-based model presented in [3] introduced the concept of an adaptive window for feature extraction; the approach [3] was validated on the Cohn-Kanade (CK) and Japanese Female Facial Expression (JAFFE) datasets against six and seven emotions. Niu et al. [4] proposed a fused feature extraction method combining LBP and Oriented FAST and Rotated BRIEF (ORB) descriptors, after which a support vector machine (SVM) classifier was used to identify the emotions; this method [4] was evaluated on three datasets, i.e., CK+, MMI, and JAFFE. LBP approaches have the limitation of producing long histograms, which slows down model performance on large datasets. Many convolutional neural network (CNN)-based methods have been developed in the past few decades that achieve good classification results for FER. For instance, Liu et al. [5] developed a CNN-based approach by concatenating three different subnets, each a separately trained CNN model; a fully connected layer concatenated the features extracted by these subnets, and a softmax layer then classified the emotion. The approach [5] was validated only on the Facial Expression Recognition (FER-2013) dataset and obtained an overall accuracy of 65.03%. Similarly, Ramdhani et al. [1] presented a CNN-based facial emotion recognition system intended to gauge customer satisfaction with a product; it was tested on a custom dataset and the FER-2013 dataset, but evaluated against only four emotions. Moreover, Jain et al. [6] proposed a deep network (DNN) consisting of convolution layers and deep residual modules for emotion identification and tested the method on the JAFFE and Extended Cohn-Kanade (CK+) datasets. However, these methods still have many limitations: existing models are not generalized or perform poorly under certain conditions, i.e., variation in face angles, people belonging to different ethnic groups, high computational complexity, variations in lighting conditions and background settings, gender, skin diseases, heterogeneity in faces, and differences in how emotions are expressed from person to person. In this paper, we present a robust and effective deep learning model that can automatically detect and classify seven types of facial emotions
(happy, surprise, disgust, fear, sad, anger, and neutral) from frontal and angled static face images more accurately. In the proposed work, we customize the basic block of the Darknet-53 architecture and introduce the Sigmoid-weighted Linear Unit (SiLU) activation function (a special form of the swish function) for the classification of facial emotions. SiLU simply multiplies the input value by its sigmoid. This activation function allows a narrow range of negative values, which helps the network recognize patterns in the data, and its smooth curve aids in optimizing the model towards convergence with minimum loss. Furthermore, using SiLU activation in the Darknet-53 architecture optimizes the model's performance and makes it computationally efficient. The main contributions of this research work are as follows: • We propose an effective and efficient DarkSiL (DS) emotion detector with the SiLU activation function to automatically detect seven diverse facial emotions. • The proposed model is robust to variations in gender and race, lighting conditions, background settings, and orientation of the face at five different angles. • We performed extensive experimentation on four diverse datasets containing images of spontaneous as well as non-spontaneous facial emotions and conducted a cross-corpora evaluation to show the generalizability of the proposed model.
2 Proposed Methodology A CNN is a network containing multiple layers that extracts features from images better than other feature extraction methods [7]. Deep convolutional neural networks continue to be developed to improve image recognition accuracy. In this study, we present a customized Darknet-53 model, an improved and deeper version of the Darknet-19 architecture. The input size requirement of Darknet-53 is 256 × 256 × 3. The overall architecture of our customized proposed model is shown in Fig. 1.
Fig. 1 Architecture of the proposed method
2.1 Datasets for Emotion Detection To evaluate the performance of our model, we have selected four diverse datasets, i.e., Extended Cohn-Kanade (CK+) [8], Japanese Female Facial Expression (JAFFE) [9], Karolinska Directed Emotional Faces (KDEF) [11], and Facial Expression Recognition 2013 (FER-2013) [10]. JAFFE [9] consists of 213 posed images of ten Japanese models at 256 × 256 resolution. All of the facial images were taken under strictly controlled conditions with similar lighting and no occlusions such as hair or glasses. The CK+ [8] database is generally considered the most frequently used laboratory-controlled facial expression classification dataset. Both non-spontaneous (posed) and spontaneous (non-posed) expressions of individuals belonging to different ethnicities (Asians or Latinos, African Americans, etc.) were captured under various lighting conditions in this dataset. The resolution of images in the CK+ dataset is 640 × 490. KDEF [11] is a publicly accessible dataset of 4900 images of resolution 562 × 762 taken from five different angles: straight, half left, full left, half right, and full right. This dataset is difficult to analyze because only one eye and one ear of the face are visible in the full right and full left profile views, making FER more challenging. FER-2013 [10] contains 35,685 real-world grayscale images of 48 × 48 resolution. As this dataset contains occlusion, images with text, non-face images, very low contrast, and half faces, the FER-2013 dataset is more diversified and complex than other existing datasets. A few sample images of all four datasets are presented in Fig. 2.
Fig. 2 Sample images of datasets
2.2 Data Processing In the pre-processing step, images of each dataset are resized to the model's required 256 × 256 resolution with three channels. After pre-processing, the images are fed to our customized model, which extracts reliable features and then classifies the emotions into seven different categories, as shown in Fig. 1.
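A minimal pre-processing sketch for this resizing step is shown below (in Python/TensorFlow for illustration; the file path and the [0, 1] scaling are assumptions, and the authors' pipeline itself is implemented in MATLAB):

```python
# Minimal pre-processing sketch: resize an image to the 256 x 256 x 3 input
# expected by the customized Darknet-53 model.
import tensorflow as tf

def preprocess(path):
    image = tf.io.decode_image(tf.io.read_file(path), channels=3,
                               expand_animations=False)
    image = tf.image.resize(image, (256, 256))     # model input resolution
    return tf.cast(image, tf.float32) / 255.0      # assumed [0, 1] scaling

x = preprocess("kdef/sample_face.jpg")  # hypothetical file path
```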
2.3 DarkSiL Architecture The smallest component of our customized DarkSiL architecture is composed of a convolutional layer, a Batch Normalization (BN) layer, and a SiLU activation layer, described as follows: (1) Convolutional layer—Convolutional layers are the main components of convolutional neural networks. A CNN applies filters (kernels) of varied sizes to the input to generate feature maps that summarize the presence of detected features. The Darknet-53 architecture contains 53 convolution layers. (2) Batch Normalization layer—BN normalizes the layer output to a common distribution based on the mean and variance of the current batch. Placed after the convolutional layer, it can accelerate network convergence and help prevent over-fitting. (3) SiLU activation layer—SiLU is the special case of the Swish activation function that occurs at β = 1. Unlike ReLU (and other commonly used activation units such as sigmoid and tanh), the SiLU activation does not increase monotonically. This non-monotonicity improves gradient flow and provides robustness to varying learning rates. One excellent property of the SiLU is its ability to self-stabilize [19]. Moreover, SiLU is a smooth activation function that is unbounded above and bounded below: unboundedness above helps avoid saturation, the bounded-below property produces strong regularization effects, and smoothness helps in obtaining a generalized and optimal model. The SiLU activation can be computed as
f(x) = x × sigmoid(βx)    (1)
where x is the input value and β = 1. The smallest component of the Darknet model is repeated 53 times, which means the architecture contains 53 convolution and 53 batch normalization layers; accordingly, 53 SiLU layers are introduced in our customized architecture. We also used a transfer learning approach to train our model on the seven output classes of emotions: the feature extraction layers are initialized from the pre-trained Darknet-53 architecture, whereas the last three layers after global average pooling, i.e., fc8 (a convolution layer with output size 1000), the softmax layer, and the classification layer, are replaced to adapt the model.
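The Conv–BN–SiLU unit and the replaced classification head can be sketched as follows (a Python/Keras illustration of Eq. (1) and of the block structure described above; the authors' implementation is in MATLAB, and the filter counts and toy model shape here are assumptions):

```python
# Illustrative Conv-BN-SiLU unit and a 7-class head (sketch, not the authors' code).
import tensorflow as tf
from tensorflow.keras import layers

def silu(x, beta=1.0):
    """SiLU / swish, Eq. (1): f(x) = x * sigmoid(beta * x); beta = 1 gives SiLU."""
    return x * tf.sigmoid(beta * x)

def conv_bn_silu(x, filters, kernel_size=3, strides=1):
    # Smallest DarkSiL component: convolution -> batch normalization -> SiLU
    x = layers.Conv2D(filters, kernel_size, strides=strides,
                      padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(silu)(x)

inputs = layers.Input(shape=(256, 256, 3))        # Darknet-53 input size
x = conv_bn_silu(inputs, 32)                      # assumed filter counts
x = conv_bn_silu(x, 64, strides=2)                # downsampling unit
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(7, activation="softmax")(x)  # seven emotion classes
toy_model = tf.keras.Model(inputs, outputs)
```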
In the Darknet-53 model, a global average pooling (GAP) layer is used instead of a fully connected layer. The GAP layer computes the average of each feature map and feeds the resulting vector into the next layer. GAP has several advantages over fully connected layers. First, it enforces a correspondence between extracted feature maps and categories, which helps interpret the feature maps as confidence maps for the classes. Second, over-fitting is reduced because no parameters need to be optimized in the GAP layer. Moreover, GAP aggregates the spatial information, making the model more robust to spatial translations. In the softmax layer, the values of the input vector are converted into values between 0 and 1, which the model interprets as probabilities; the softmax function is a generalization of logistic regression and is applied for multi-class classification. A classification layer then calculates the cross-entropy loss for mutually exclusive categories, with the output size of the preceding layer determining the number of categories. In our case, the output size is seven classes of emotions, and each input image is classified into one of these categories.
3 Experimental Setup and Results For all experiments, each dataset is split into training (60%), validation (20%), and testing (20%) sets. The parameters used for model training in each experiment are: epochs: 20, shuffle: every epoch, learning rate: 4 × 10^-4, batch size: 32, validation frequency: every epoch, and optimizer: Adam. All experiments are carried out in MATLAB 2021a on a machine with the following specifications: AMD Ryzen 9 5900X 12-core 3.70 GHz processor, 32 GB RAM, 4.5 TB hard disk, and Windows 10 Pro. We employed the standard metrics of accuracy, precision, and recall for the evaluation of our model, as these metrics are also used by contemporary FER methods.
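The stated training configuration corresponds to a compile/fit call of roughly the following form (a Python/Keras analogue given for illustration only, since the experiments were run in MATLAB; 'model', 'x_train', 'y_train', 'x_val', and 'y_val' are assumed to be defined elsewhere):

```python
# Hyperparameters as stated above: Adam, lr = 4e-4, 20 epochs, batch size 32,
# shuffling every epoch, validation evaluated every epoch.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=4e-4),
              loss="categorical_crossentropy",   # seven emotion classes
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),        # the held-out 20% validation set
          epochs=20, batch_size=32, shuffle=True)
```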
3.1 Performance Evaluation of the Proposed Method We designed four-stage experiments to show the effectiveness of the proposed model on KDEF [11], JAFFE [9], FER-2013 [10], and CK + [8] datasets. In the first stage, we performed an experiment on the JAFFE dataset to investigate the performance of the proposed model on a small posed dataset. After training and validation, the proposed model is tested on the test set and the results are mentioned in Table 1. It is worth noticing that our model has achieved an accuracy of 92.9% on the JAFFE dataset, a mean precision of 93.5%, and a mean recall of 92.8%. Results above 90% on the biased JAFFE dataset with mislabeled class problems show the effectiveness of the proposed model for FER.
Table 1 Results of the proposed model on different datasets
Dataset      Accuracy (%)   Mean precision (%)   Mean recall (%)
JAFFE        92.9           93.5                 92.8
CK+          99.6           99.1                 99.2
KDEF         91.0           93.4                 93.0
FER-2013     64.9           65.3                 61.1
In the second stage, we conducted an experiment to show the efficacy of the proposed model on a dataset of individuals belonging to different regions, races, and genders. For this purpose, we chose the lab-controlled CK+ dataset, which contains spontaneous and non-spontaneous facial expressions of people under varying lighting conditions. Table 1 demonstrates the remarkable performance of the proposed model on the CK+ dataset: accuracy, precision, and recall close to 100% show that our model can accurately distinguish seven different types of facial expressions in frontal face images of people from different geographical regions of the world. In the third stage, to check the robustness of the proposed model on angled facial images, we designed an experiment on the KDEF dataset, as it comprises facial images taken from five different viewpoints. Our proposed model obtained an overall accuracy of 91% and a mean precision and mean recall of 93.4% and 93%, respectively, as shown in Table 1. These results demonstrate that the proposed model not only identifies emotions from frontal face images with high accuracy but also predicts facial emotions well in images with faces tilted at an angle. In the fourth stage, we implemented an experiment to examine the effectiveness of the proposed method on the real-world FER-2013 dataset, which covers the challenging scenarios of intra-class variation and class imbalance. This dataset is originally split into training, validation (public test), and private test sets. Furthermore, the FER-2013 dataset contains non-face, low-contrast, occluded, differently illuminated, pose-varied, text-containing, half-rotated, tilted, and age- and gender-varied images, which make the classification process more difficult. As reported in Table 1, our model achieved an accuracy of 64.9%, which is good in the presence of such variation on this challenging dataset. Moreover, the accuracy achieved on this dataset, i.e., 64.9% ≈ 65%, is very close to the human-level accuracy of 65 ± 5% [10].
3.2 Comparison with Contemporary Methods To show the effectiveness of our model for facial emotion recognition on multiple diverse datasets, we compared the performance of our method against the existing
state-of-the-art (SOTA) FER methods. In the first stage, we compared the performance of our method with the contemporary methods [12–14] on the JAFFE dataset, and the results are provided in Table 2. From Table 2, it is clearly observed that our model achieved an average gain of 12.2% over the existing SOTA. Our proposed model also has a higher discriminative ability than existing works. In the second stage, we compared the results of our method on the CK+ dataset with the existing methods [6] and [17]. The results in Table 2 show that our model has a 9–10% better recognition rate in FER classification and performs better than the comparative methods on the CK+ dataset. In the third stage, we compared the performance of our method with the state-of-the-art methods [12, 15], and [16] on the KDEF dataset. As shown in Table 2, the accuracy of our model is higher than all of the existing works [12, 15, 16] on the KDEF dataset. The second best-performing method [15] obtained an accuracy of 88%, which is 3% lower than our proposed model. The results indicate that the proposed method can classify images taken from five angles (0°, -45°, 45°, -90°, and 90°) more accurately than the SOTA methods. In the last stage, we compared our model's performance with the contemporary approaches of [1, 5], and [18] on the FER-2013 dataset, and the results in terms of accuracy are provided in Table 2. It can be seen that the accuracy of the proposed model on the FER-2013 dataset is very close to that of the best-performing model [5], with a slight difference of 0.13%. This means that our proposed model can detect facial emotions with high accuracy in challenging real-world scenarios.
3.3 Cross-Corpora Evaluation Previous works on FER have paid little attention to model generalizability across the seven classes of emotions. To overcome this limitation, we conducted a cross-corpora evaluation in which four different datasets are used to demonstrate the generalizability of our model. Previous studies have used one or two datasets for training and performed testing on other datasets, and have also used only a few types of emotions in cross-corpora experiments. In this study, we include a wide range of datasets, from small posed and lab-controlled ones to real-world and spontaneous expression datasets, and from frontal-face to varied-angle face image datasets, in our cross-dataset experiments. The results of the cross-corpora evaluation are displayed in Table 3. Despite the very good performance of the proposed model on the individual datasets, it could not perform as well in the cross-dataset experiments. A possible reason for the degradation of accuracy in these experiments is that there exist many dissimilarities among these datasets. The datasets are collected under distinct illumination conditions and with varying background settings in different environments. The equipment used for capturing the images differs, and the images are taken from varying distances to the camera. Furthermore, the subjects involved in the preparation of these datasets do not belong to the same geographical regions and are of
Table 2 Comparison of DS detector (proposed model) with SOTA

Model                                 Dataset     Accuracy (%)
Sun et al. [12]                       JAFFE       61.68
Kola et al. [3]                       JAFFE       88.3
LBP + ORB [4]                         JAFFE       92.4
Proposed Model                        JAFFE       92.9
DTAN [17]                             CK+         91.44
DTGN [17]                             CK+         92.35
DTAGN (Weighted Sum) [17]             CK+         96.94
DTAGN (Joint) [17]                    CK+         97.25
Jain et al. [6]                       CK+         93.24
Proposed Model                        CK+         99.6
Williams et al. [16]                  KDEF        76.5
Sun et al. [12]                       KDEF        77.9
VGG-16 Face [15]                      KDEF        88.0
Proposed Model                        KDEF        91.0
Talegaonkar et al. [18]               FER-2013    60.12
Ramdhani et al. [1], batch size 8     FER-2013    58.20
Ramdhani et al. [1], batch size 128   FER-2013    62.33
Liu et al. [5]                        FER-2013    65.03
Proposed Model                        FER-2013    64.9

Table 3 Results of the cross-corpora evaluation

Training dataset   Testing dataset   Accuracy (%)
FER-2013           JAFFE             31.0
FER-2013           KDEF              25.8
FER-2013           CK+               67.0
CK+                JAFFE             21.4
CK+                KDEF              12.2
CK+                FER-2013          28.7
KDEF               JAFFE             35.7
KDEF               CK+               40.2
JAFFE              KDEF              15.9
JAFFE              FER-2013          14.3
JAFFE              CK+               24.9
different genders, ages, and races. There also exist dissimilarities among the morphological characteristics of the individuals involved in the making of these datasets. Moreover, people belonging to different ethnicities differ in how they express their emotions: Easterners, in contrast to Westerners, show low-arousal emotions, and Japanese (Eastern) subjects, in contrast to European and American (Western) subjects, tend to show fewer physiological emotions [13]. Datasets available in the FER domain are also biased: KDEF is ethnicity-biased (only European people), and JAFFE is a lab-controlled and highly biased dataset with respect to gender (only females) and ethnicity (only Japanese models), with ambiguous expression annotations [14]. Images in the original datasets also differ from each other in terms of resolution (FER-2013: 48 × 48, JAFFE: 256 × 256, etc.) and image type (grayscale and RGB). Although we upscale and downscale them to the same resolution according to our customized model requirements, this may also affect the results of the cross-corpora evaluation. Despite all these factors, it can be observed from the results in Table 3 that our proposed model, when trained on the FER-2013 dataset, obtained a cross-dataset accuracy of up to 67%, which is good in the presence of such diversity. Also, the model trained on the KDEF dataset is able to achieve an accuracy of 40.2%. In Table 3, four of the cross-corpora results exceed 30%.
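The cross-corpora protocol itself is straightforward: train on one corpus and test on a different one, for every ordered pair of datasets. The sketch below illustrates the loop with toy stand-ins for the corpora and a placeholder classifier; it is not the DarkSiL model or the actual data pipeline of the paper:

```python
import numpy as np
from itertools import permutations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy stand-ins for the four corpora: in the paper each entry would hold the
# pre-processed face images (resized to a common resolution) and their labels.
def make_corpus(n):
    return rng.normal(size=(n, 64)), rng.integers(0, 7, size=n)

corpora = {name: make_corpus(300) for name in ["JAFFE", "CK+", "KDEF", "FER-2013"]}

# Cross-corpora protocol: train on one dataset, test on a different one.
for train_name, test_name in permutations(corpora, 2):
    X_tr, y_tr = corpora[train_name]
    X_te, y_te = corpora[test_name]
    clf = LogisticRegression(max_iter=200).fit(X_tr, y_tr)  # placeholder model
    print(f"train={train_name:8s} test={test_name:8s} acc={clf.score(X_te, y_te):.1%}")
```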
4 Discussion In this study, we conducted different experiments on four diverse datasets covering scenarios of frontal and varied-angle face images, people belonging to different cultures with different skin tones and genders (males, females, and children), variations in lighting conditions, different background settings and races, and a real-world challenging dataset. Our proposed model obtained accuracies greater than 90% on all datasets except FER-2013. By closely observing the FER-2013 dataset, we found the following possible reasons for the degradation of accuracy on this dataset. There exists a similarity in the face morphology of the anger, surprise, and disgust classes of emotions in this dataset. Additionally, there are more images of the happy emotion compared to the other classes, which leads to insufficient learning of traits for those classes. Moreover, the FER-2013 dataset contains images with non-faces, occlusions, half-rotated and tilted faces, and variations in facial pose, age, and gender, which affect the recognition rate of the model. However, in the presence of such challenges, our proposed model is still able to achieve the human-level accuracy of approximately 65% for this dataset [10]. Table 1 summarizes the performance of the proposed model on all these datasets. The strong results of our model on varied and diverse datasets, including challenging scenarios, show that our model is effective and robust in recognizing facial emotions. Moreover, the addition of the SiLU activation in the Darknet architecture not only increases the model's efficiency but also improves accuracy.
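The SiLU activation referred to here is the sigmoid-weighted linear unit of [19], which can be defined in a few lines (a self-contained definition, independent of the authors' Darknet implementation):

```python
import numpy as np

def silu(x):
    """Sigmoid-weighted Linear Unit: silu(x) = x * sigmoid(x) [19]."""
    return x / (1.0 + np.exp(-x))

# Smooth and non-monotonic for negative inputs, unlike ReLU
print(silu(np.array([-2.0, 0.0, 2.0])))
```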
We also performed cross-corpora experiments to show the generalizability of our approach. From the results, we can say that our model addresses most of the limitations of the existing methods and performs better than the comparative approaches.
5 Conclusion In this research, we have introduced a novel model for facial emotion recognition that is efficient, cost-effective, and robust to variations in gender, race, lighting conditions, background settings, and orientation of the face at five different angles. The presented model was tested on four different datasets and achieved remarkable performance on all of them. The proposed model not only effectively classified emotions from frontal face images but also outperformed existing methods on face images with five distinct orientations. We also performed a cross-corpora evaluation of the proposed model to demonstrate its generalizability. In future work, we plan to create a custom FER dataset to test the performance of our method in real time and to further improve the performance of the cross-corpora evaluation. Acknowledgements This work was supported by the Multimedia Signal Processing Research Lab at the University of Engineering and Technology, Taxila, Pakistan.
References

1. Ramdhani B, Djamal EC, Ilyas R (2018) Convolutional neural networks models for facial expression recognition. In: 2018 International Symposium on Advanced Intelligent Informatics (SAIN), IEEE, pp 96–101
2. Mehta D, Siddiqui MFH, Javaid AY (2018) Facial emotion recognition: a survey and real-world user experiences in mixed reality. Sensors 18(2):416
3. Kola DGR, Samayamantula SK (2021) A novel approach for facial expression recognition using local binary pattern with adaptive window. Multimed Tools Appl 80(2):2243–2262
4. Niu B, Gao Z, Guo B (2021) Facial expression recognition with LBP and ORB features. Comput Intell Neurosci
5. Liu K, Zhang M, Pan Z (2016) Facial expression recognition with CNN ensemble. In: 2016 International Conference on Cyberworlds (CW), IEEE, pp 163–166
6. Jain DK, Shamsolmoali P, Sehdev P (2019) Extended deep neural network for facial emotion recognition. Pattern Recogn Lett 120:69–74
7. Wang H, Zhang F, Wang L (2020) Fruit classification model based on improved Darknet53 convolutional neural network. In: 2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), IEEE, pp 881–884
8. Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, pp 94–101
9. Lyons MJ, Kamachi M, Gyoba J (2020) Coding facial expressions with Gabor wavelets (IVC special issue). arXiv preprint arXiv:2009.05938
10. Goodfellow IJ, Erhan D, Carrier PL, Courville A, Mirza M, Hamner B, Bengio Y (2013) Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing, Springer, Berlin, Heidelberg, pp 117–124
11. Lundqvist D, Flykt A, Öhman A (1998) Karolinska directed emotional faces. Cogn Emot
12. Sun Z, Hu ZP, Wang M, Zhao SH (2017) Individual-free representation-based classification for facial expression recognition. SIViP 11(4):597–604
13. Lim N (2016) Cultural differences in emotion: differences in emotional arousal level between the East and the West. Integr Med Res 5(2):105–109
14. Liew CF, Yairi T (2015) Facial expression recognition and analysis: a comparison study of feature descriptors. IPSJ Trans Comput Vis Appl 7:104–120
15. Hussain SA, Al Balushi ASA (2020) A real time face emotion classification and recognition using deep learning model. In: Journal of Physics: Conference Series 1432(1), p 012087. IOP Publishing
16. Williams T, Li R (2018) Wavelet pooling for convolutional neural networks. In: International Conference on Learning Representations
17. Jung H, Lee S, Yim J, Park S, Kim J (2015) Joint fine-tuning in deep neural networks for facial expression recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2983–2991
18. Talegaonkar I, Joshi K, Valunj S, Kohok R, Kulkarni A (2019) Real time facial expression recognition using deep learning. In: Proceedings of International Conference on Communication and Information Processing (ICCIP)
19. Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11
Review and Enhancement of Discrete Cosine Transform (DCT) for Medical Image Fusion Emadalden Alhatami, Uzair Aslam Bhatti, MengXing Huang, and SiLing Feng
Abstract Image fusion is a process that combines the necessary or useful information from a set of different or similar input images into a single output image, using a specific algorithm, such that the resulting image is more accurate, informative, and complete than any of the input images. Image enhancement is a process used to improve the quality of an image and increases the applicability of these input images, which is helpful in different fields of science such as medical imaging, microscopic imaging, remote sensing, computer vision, robotics, etc. In this paper, we describe the primary function of image fusion, which is to improve image quality by evaluating sharpness. We then give an overview of multi-modal medical image fusion methods, emphasizing how the DCT method can be used for medical image fusion. The fused image provides more accurate information about the real world, which is helpful for human vision and machine perception and for any further image processing tasks in the future. Keywords Image fusion · Discrete Cosine Transform (DCT) · Medical image fusion · CT image · MR image
E. Alhatami (B) · U. A. Bhatti · M. Huang · S. Feng School of Information and Communication Engineering, Hainan University, Haikou 570100, China e-mail: [email protected] U. A. Bhatti e-mail: [email protected] M. Huang e-mail: [email protected] S. Feng e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_8
1 Introduction Image fusion is one of the essential techniques used in the field of digital image processing. The image fusion process combines the necessary information from two or more images and produces a single output image that contains more of the required information than any input image [1]. It is the process of joining two or more similar images to form a new image using wavelet theory [2]. It operates in various fields of science such as medical imaging, remote sensing, ocean surveillance, artificial neural networks, etc. This process acquires the features from the different input images and puts them into a single image that is more accurate, complete, and informative than any input image [1]. The input images can be of many types, such as multi-sensor, multi-modal, multi-focal, and multi-temporal. Image enhancement is the process of improving the quality of an image, in which different images are registered to form a single output image that has good quality and is appropriate for human and machine interpretation [3]. Image fusion methods can be of two types: spatial domain fusion methods and transform domain fusion methods [4]. A spatial domain fusion method deals directly with the pixels of the input images, where the pixel is the smallest unit of graphics. In a transform domain fusion method, images are first transformed into the frequency domain; this also helps in evaluating the sharpness of the image. Nowadays, image fusion algorithms based on wavelet transform theory work faster than in the previous decade. The Discrete Cosine Transform has good time-frequency properties and can be applied successfully in the image processing field [5]. The process of image fusion can be performed at three levels: pixel, feature, and decision level. The image fusion technique is used to obtain a more informative, accurate, complete, and high-quality image from two or more pictures. The objectives of image fusion are to reduce the data lost during the fusion process because of physical parameters, such as pixel intensity, echo, and repetition time, that increase the complexity of the pictures, and to enhance the quality of an image in terms of sharpness, as shown in Fig. 1. Medical image fusion is the process of fusing multiple images from multiple imaging modalities to obtain a fused image with a large amount of information for
Fig. 1 a Input image 1 b Input image 2 c Fused image
increasing the clinical applicability of medical images [6]. With the advancements in the field of medical science and technology, medical imaging can provide various modes of imagery information, and different medical images have some specific characteristics which require simultaneous monitoring for clinical diagnosis [6]. Hence multimodality image fusion is performed to combine the attributes of various image sensors into a single image. The medical images obtained from different sensors are fused to enhance the diagnostic quality of the imaging modality [7].
2 Image Fusion Objective
• Image fusion techniques have broad applications in image processing and computer vision to improve the visual ability of human and machine vision.
• The fused image is more suitable for human perception than any individual input image.
• General properties of medical image fusion: image overlay for displays, image sharpening for operator clarity, and image enhancement through noise reduction.
• Representative salient features are extracted from source images of the same scene, and then the salient features are integrated into a single image by a proper fusion method.
• The fused image should not change, affect, or damage the quality of the original images. A fused image should be imperceptible, such that humans cannot find the difference between the original and the fused image [7].
3 Fusion Classification Image fusion technology is widely used in pattern recognition and image processing. Image fusion technology can involve various stages of image processing. Therefore, according to the image processing and analysis, the integration technology is divided into three levels: image fusion algorithm based on a pixel level, image fusion algorithm based on feature level method, and image fusion algorithm based on decision level [7].
3.1 Pixel Level In pixel-level classification, image fusion is implemented between individual pixel values. This level is measured in dots and pixels per inch, which sometimes have different meanings, especially for printer devices, where dpi is a measure of the printer's dot density, while the number of pixels in an input image is called its resolution.
The benefits of image fusion at the pixel level are that the actual quantities are directly included in the image fusion process [8].
3.2 Feature Level In feature-level classification, image fusion is implemented between segmented portions of the input images by examining the properties of the pictures. The feature level considers various features such as edges, lines, and texture parameters [8]. This level is also used in image pre-processing for image splitting or to change the perception.
3.3 Decision Level In decision-level classification, image fusion is implemented between segmented portions of the input images by examining the initial object perception and their grouping. At the decision level, when the results calculated from different algorithms are expressed as confidences rather than decisions, the fusion is called soft fusion; otherwise, it is called hard fusion. The input images can be processed individually, which helps in information extraction [7, 8]. The decision-level methods can be categorized as voting methods, statistical methods, and fuzzy logic-based methods. Decision-level fusion methods are also used in the field of artificial intelligence, e.g., Bayesian inference and the Dempster-Shafer method [9].
4 Acquired Input Image Images are acquired and fused in different ways, such as multi-sensor, multi-temporal, multi-focal, multi-modal, and multi-view [10].
• Multi-sensor image fusion: fuses source images captured by various sensors.
• Multi-temporal image fusion: combines images taken under various conditions with the specific goal of fusing accurate images of objects that were not captured within the expected time.
• Multi-focal image fusion: combines image scenes of different focal lengths brought about by repetition, where complementary information from the source images is fused.
• Multi-modal image fusion: fuses supplementary and complementary information from the source images.
• Multi-view image fusion: combines images of a similar modality taken from different angles simultaneously.
5 Research Methodology 5.1 Discrete Cosine Transform (DCT) The Discrete Cosine Transform (DCT) plays an essential role in the compression of images in formats such as those of the Moving Picture Experts Group (MPEG) and the Joint Video Team (JVT). The DCT is used to transform a spatial domain image into a frequency domain image [10]. The coefficients of the image are represented by the alternating current (AC) values and the direct current (DC) value. A red, green, blue (RGB) image can be divided into blocks of 8 × 8 pixels; the image is then separated into its red, green, and blue matrices and transformed into a grayscale image [10]. The discrete cosine transform plays a crucial role in digital image processing. In DCT-based fusion, images are divided into non-overlapping blocks of size N × N, the DCT coefficients are calculated for each block, and then the fusion rules are applied to obtain a higher-quality fused image. These techniques do not perform well when using block sizes smaller than 8 × 8, or when the block size is equivalent to the image size itself. The advantage of DCT is that it is a straightforward algorithm and can be used for real-time transformation applications [11] (Fig. 2).
Fig. 2 Image fusion diagram using DCT for medical image
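As an illustration of the block-wise transform described above, the sketch below computes the 2-D DCT of non-overlapping 8 × 8 blocks of a grayscale image and inverts it; the image is synthetic and the code is only a minimal example, not the pipeline of Fig. 2:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D type-II DCT with orthonormal scaling
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

# Synthetic grayscale image whose size is a multiple of the 8x8 block size
img = np.random.rand(64, 64)
coeffs = np.zeros_like(img)
for i in range(0, img.shape[0], 8):
    for j in range(0, img.shape[1], 8):
        # The top-left coefficient of each block is the DC value, the rest are AC values
        coeffs[i:i+8, j:j+8] = dct2(img[i:i+8, j:j+8])

reconstructed = np.zeros_like(img)
for i in range(0, img.shape[0], 8):
    for j in range(0, img.shape[1], 8):
        reconstructed[i:i+8, j:j+8] = idct2(coeffs[i:i+8, j:j+8])
print(np.allclose(img, reconstructed))  # True: the block transform is invertible
```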
Fig. 3 Image fusion flowchart using DCT
5.2 DCT for Image Fusion For fusing multimodality images such as CT and MRI, the input images are first decomposed into base and detail images using the fourth-order differential equations method. The final detail image is obtained by a weighted average of the principal components of the detail images. Next, the base images are given as input for CT and MRI decomposition. The corresponding four sub-band coefficients are processed using DCT, which is used to extract significant details of the sub-band coefficients. The spatial frequency of each coefficient is calculated to improve the extracted features. Finally, the fusion rule is used to fuse the DCT coefficients based on the spatial frequency value. The final base image is obtained by applying the inverse DCT (IDCT), as shown in Fig. 3. A final fused image is generated by combining the above final detail and base images linearly [12].
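A heavily simplified version of a DCT-based fusion rule is sketched below: co-located 8 × 8 blocks of two pre-registered source images are transformed, the block whose AC coefficients carry more energy (a crude stand-in for the spatial-frequency criterion) is kept, and the fused image is recovered with the inverse DCT. This is an illustrative approximation, not the exact method of [12] or Fig. 3:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")
def idct2(b): return idct(idct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

def fuse_dct(img_a, img_b, bs=8):
    """Block-wise DCT fusion: keep the block with the higher AC energy."""
    fused = np.zeros_like(img_a, dtype=float)
    for i in range(0, img_a.shape[0], bs):
        for j in range(0, img_a.shape[1], bs):
            ca = dct2(img_a[i:i+bs, j:j+bs].astype(float))
            cb = dct2(img_b[i:i+bs, j:j+bs].astype(float))
            # AC energy = total coefficient energy minus the DC (top-left) term
            ea = (ca ** 2).sum() - ca[0, 0] ** 2
            eb = (cb ** 2).sum() - cb[0, 0] ** 2
            fused[i:i+bs, j:j+bs] = idct2(ca if ea >= eb else cb)
    return fused

# Synthetic stand-ins for registered CT and MRI slices of the same size
ct, mri = np.random.rand(64, 64), np.random.rand(64, 64)
print(fuse_dct(ct, mri).shape)
```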
6 Discussion MRI, also known as Magnetic Resonance Imaging, provides information on the soft tissue structure of the brain without functional information. The density of protons in the nervous system, fat, soft tissue, and articular cartilage lesions is high, so the image is clear and does not produce artifacts. It has a high spatial resolution, causes no radiation damage to the human body, and provides rich information, which gives it an essential position in clinical diagnosis [13]. The density of protons in bone is very low, so the bone image of MRI is not clear. The CT image is produced by Computed Tomography imaging, in which X-rays are used to scan the human body. The high absorption rate of bone tissue relative to soft tissue makes the bone tissue in the CT image particularly clear. The low permeability of X-rays in soft tissue leads to a low absorption rate, so CT images show less cartilage information and mainly represent anatomical information. Example MRI and CT images are shown in Fig. 4. Figure 5 shows an example of the use of DCT-based image fusion in medical diagnosis by fusing CT and MRI. CT is used for capturing bone structures with high spatial resolution, and MRI is used to capture soft tissue structures like the heart,
Fig. 4 a CT source images, b MRI source images
eyes, and brain. CT and MRI can be used collectively with image fusion techniques to enhance accuracy and practical medical applicability [14]. MRI and CT fusion combines the advantages of the clear bone information in CT images and the clear soft tissue information in MRI images to compensate for the lack of information in a single imaging modality [15]. Figures 4 and 5 illustrate the fusion of MRI and CT images; in this example, the fusion of the images is achieved by the guided filtering-based technique with image statistics.
Fig. 5 MRI-CT medical image fusion
7 Conclusion The fusion of medical images from various modalities is an important topic of study for researchers due to its importance and usefulness for the health sector, enabling better diagnosis with merged images containing quality information. Merged images should contain more comprehensive information than any input image, even if redundant information is present. For typical MRI and CT images, the number of decomposition levels affects the image fusion result. The DCT method has real potential to compress and decompress images and to perform the transformation process at the pixel level; the DCT method is suitable for real-time applications, as it can also obtain a reasonable compression ratio, which is beneficial for transmitting and storing data.
References 1. Bhatti UA, Yu Z, Chanussot J, Zeeshan Z, Yuan L, Luo W, Mehmood A (2021) Local similaritybased spatial-spectral fusion hyperspectral image classification with deep CNN and Gabor filtering. IEEE Trans Geosci Remote Sens 60:1–15 2. Shahdoosti HR, Mehrabi A (2017) MRI and PET image fusion using structure tensor and dual ripplet-II transform. Multimedia Tools Appl 77:22649–22670 3. Jing W, Li X, Zhang Y, Zhang X (2018) Adaptive decomposition method for multi-modal medical image fusion. IET Image Process 12(8):1403–1412 4. Ravi P, Krishnan J (2018) Image enhancement with medical image fusion using multiresolution discrete cosine transform. In: International conference on processing of materials, minerals and energy, vol 5, pp 1936–1942 5. Kumar S (2015) Image fusion based on pixel significance using cross bilateral filter. SIViP 9(5):1193–1204 6. Du, Li W, Lu K, Xiao B (2016) An overview of multi-modal medical image fusion. Neurocomputing 215:3–20 7. Li T, Li J, Liu J, Huang M, Chen YW, Bhatti UA (2022) Robust watermarking algorithm for medical images based on log-polar transform. EURASIP J Wireless Commun Netw 1–11 8. Kaur, Saini KS, Singh D, Kaur M (2021) A comprehensive study on computational pansharpening techniques for remote sensing images. Arch Comput Methods Eng 1–18 9. Balakrishnan A, Zhao MR, Sabuncu JG, Dalca AV (2020) VoxelMorph a learning framework for deformable medical image registration. IEEE Trans Med Imaging 38(8):1788–1800 10. Bhatti UA, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterprise Inform Syst 13(3):329–351 11. Amiri E, Roozbakhsh Z, Amiri S, Asadi MH (2020) Detection of topographic images of keratoconus disease using machine vision. Int J Eng Sci Appl 4(4):145–150 12. Bavirisetti DP, Kollu V, Gang X, Dhuli R (2017) Fusion of MRI and CT images using guided image filter and image statistics. Int J Imaging Syst Technol 27(3):227–237 13. Bhatti UA, Yu Z, Li J, Nawaz SA, Mehmood A, Zhang K, Yuan L (2020) Hybrid watermarking algorithm using Clifford algebra with Arnold scrambling and chaotic encryption. IEEE Access 8:76386–76398
14. Yang C, Li J, Bhatti UA, Liu J, Ma J, Huang M (2021) Robust zero watermarking algorithm for medical images based on Zernike-DCT. Secur Commun Netw 15. Zeng C, Liu J, Li J, Cheng J, Zhou J, Nawaz SA, Bhatti UA (2022) Multi-watermarking algorithm for medical image based on KAZE-DCT. J Ambient Intell Human Comput 1–9
Early Courier Behavior and Churn Prediction Using Machine Learning in E-Commerce Logistics Barı¸s Bayram, Eyüp Tolunay Küp, Co¸skun Özenç Bilgili, and Nergiz Co¸skun
Abstract With the surge in competitive e-commerce demands occurring mainly due to the COVID-19 outbreak, most logistics companies have been compelled to create more efficient and successful delivery organizations, and new logistics companies which provide different opportunities to employees have entered the market and led to a boost in competition. In this work, an approach to early employee churn prediction is developed for couriers of a private logistics company using real delivery behaviors and demographic information of the courier. The churn scores of the couriers are computed regarding the delivery performances of the couriers for each day. Also, using the historical delivery data of the couriers, a regression model is employed for the prediction of the delivery behaviors for the next week to be utilized for churn prediction. Based on the churn scores, the couriers are clustered into a number of groups in a weekly manner. In the experiments, the Gradient Boosting Trees (GBTs) based binary classification and regression algorithms achieved the best performances in courier behavior prediction in terms of R2 -scores (up to 86.2%) and error values, and churn prediction in terms of ROC curves with AUC scores (up to 85.6%) and F1-scores (up to 68.4%). Keywords Transportation · e-commerce logistics · Early employee churn prediction · Behavior prediction · Gradient boosting trees
B. Bayram (B) · E. T. Küp · C. Ö. Bilgili · N. Co¸skun HepsiJET, ˙Istanbul, Turkey e-mail: [email protected] E. T. Küp e-mail: [email protected] C. Ö. Bilgili e-mail: [email protected] N. Co¸skun e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_9
1 Introduction The global COVID-19 pandemic has accelerated the expanding usage of e-commerce sites which also led to a rapid and noticeable increase in online shopping [12]. In the first half of 2020, the e-commerce sales in Turkey reached 91.7 billion Turkish Lira climbing 64% over the previous year, which significantly affected the e-commerce logistics industry. Since this increase has been accelerating further with the pandemic, logistics companies are constantly fostering competition both in online marketing management and delivery service. Therefore, the companies need to adopt a reliable crowd-sourced delivery system by ensuring a high-quality courier service in this competitive environment. In various sectors, the knowledge, capabilities, skills, and experiences of employees become the growth factor of the companies. It is a crucial and challenging process to retain valuable employees, since training and recruiting new employees to fill vacant positions requires a lot of time and resources [3]. In the competitive environment of the e-commerce logistics industry especially during the COVID19 pandemic period, several companies have emerged that offer relatively different opportunities for payment per delivery, location, or workload. The logistics industry has a significant role in creating employment since the need for additional couriers substantially increased due to the attrition and the boosted delivery workload. Also, this competitive environment causes an increase in the rate of voluntary turnover, which negatively impacts sustaining the workforce and risks losing the competitive advantage of the companies. Retaining hard-working, communicative, financially and morally satisfied couriers who have been an employee of the company for a long time, but are not motivated, is also an essential strategy for logistics companies. However, in this period, the attrition in a working place that means a decrease in the labor force is an important issue. Therefore, the companies need to detect the attrition intend to retain the employees and prevent turnovers to enhance the efficiency of human resources and competitiveness of the logistics companies. In the logistics sector, courier turnover is an important problem, which causes noticeable gaps in the operations of shipments during the pandemic period. To retain human resources in logistics, the churn prediction problem has not been investigated yet. Due to financial reasons, tremendous distrust arises between couriers and managers while fairly distributing the packages and resources among the couriers. In the real world, human resources data have many problems such as missing, inconsistent and noisy information, etc. which deteriorate the development of a churn prediction ability. For various prediction and analysis tasks, the use of machine learning algorithms is a key also to human resources problems, because the department has extensive data on the employees about wages, benefits, career evaluations, annual evaluations of performance, recruitment, exit interviews, opinions about other employees and seniors, etc. The prediction of customer churn has widely been studied, and many machine learning-based approaches have been proposed in recent years. The attrition of the employees has been investigated in several studies, but in the logistics area, the churn prediction has not been addressed yet.
Several state-of-the-art machine learning algorithms have been employed for the prediction of churners. Most of the works have focused on churn prediction of customers in different industries like cargo and logistics [4, 10], banking [8], telecommunication [5], and aviation [7]. For customer turnover prediction, various machine learning algorithms have been employed such as Gradient Boosting Trees (GBT) [11], Decision Tree (DT) [11], etc. Abiad and Ionescu present a logistic regression-based approach for the analysis of customer churn behaviors [1]. In the cargo and logistics industry, only a few works have focused on the churn prediction of customers, but for employees, there are no studies to detect the attrition of couriers. In the study [2], several state-of-the-art machine learning algorithms for the prediction of employee turnover were investigated and compared, and it is observed that Extreme Gradient Boosting Trees (XGBoost) presented the best prediction performance. Besides dynamic and behavioral features, for employee churn prediction, the static features which are personal information of the couriers such as age, the distance between cross-dock and courier’s living place, mandatory military service status, gender, absenteeism, etc. are discussed to analyze the impacts of the information on the attrition [9]. The use of machine learning algorithms for churn prediction of employees is difficult for researchers due to the confidentiality and the lack of human resources and real churn data of the employees, which affect the deep analysis of the problem and the generalization of a churn prediction solution. In this work, a machine learning-based churn prediction approach is presented using real delivery data of a private company in the logistics sector to predict potential churner couriers using static demographic information and dynamic delivery performances. Also, courier behaviors in the future are predicted to be used for churn prediction. The approach is composed of a binary classification-based churn prediction model and a clustering method to categorize the couriers into different types of working profiles using the combination of the churn scores of the binary model with the delivery rate of the couriers. The early churn prediction model computes the churn scores for all the existing couriers on each day of the previous and next weeks using various features of the couriers. The features include the daily aggregated delivery data, calendar features (e.g. day of week, day of month, month of year), and demographic information (e.g. age, education, gender). Moreover, a cross-dock-based analysis can be conducted according to the predictions of churners in the same cross-docks. The proposed approach is evaluated using various algorithms with 24-month long real delivery data of hepsiJET company. The main aim of the proposed approach is to provide information on possible churners to the cross-dock managers to take an action to retain the couriers. The problems due to the manager, working districts, the other employees, etc. may be detected regarding the prediction of churners in the same cross-dock. 
The main contributions of this study are: (i) developing an advanced feature engineering process for courier churn prediction on real logistics data, (ii) conducting the first national work in the logistics industry for future performance prediction and churn prediction of the couriers, (iii) developing one of the first machine learning-based methods for churn prediction on predicted delivery performances
of the couriers, and (iv) updating the regression and churn prediction algorithms weekly to adapt to abnormal conditions in special situations.
2 Proposed Approach The proposed approach (Fig. 1) for early courier churn prediction on the streaming data is composed of the following steps: (1) data preparation, (2) extraction of behavioral aggregated and demographic features for the behavior and churn prediction models, (3) construction of the training sets for both models, (4) generation of the models, (5) prediction of the couriers' behaviors for the next week, (6) daily churn prediction to compute churn scores for each courier's feature vector on each day of the previous and next weeks, and (7) clustering of the daily scores and delivery number rates of the couriers to categorize the couriers into four courier profiles of performance regarding the churn. The churn prediction model computes the probability of churn for each courier C on each day D of the last and next weeks, which is used as the daily churn score: score_{X_C} = P(class = churner | X_C), where X_C ∈ {X_C^{D_prev}, X_C^{D_next}} represents the features of the courier C, in which X_C^{D_prev} is the feature vector from the delivery data of the courier C on the day D_prev of the previous week, and X_C^{D_next} is the vector of the courier's predicted behaviors, including the delivery statistics, on the day D_next of the next week. The courier behavior prediction and churn prediction using various features of the couriers in the previous and next weeks may also help to carry out layoffs to reduce the workforce in case of dramatic decreases in the number of deliveries, the rate of delayed delivery, and working hours. The reliable and efficient prediction capability will give prior information about the couriers' future performance, which can be used to take related actions to improve the performance or to predict the churn possibility in advance.
Fig. 1 The overview of the proposed churn prediction approach
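As a rough illustration of how such daily churn scores can be obtained from a trained binary classifier, the sketch below uses synthetic data and a scikit-learn gradient boosting model; it is not the exact pipeline or feature set used in the paper:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the daily courier feature vectors (delivery counts,
# rates, demographics, ...); labels: 1 = "churn", 0 = "non-churn".
X_train = rng.normal(size=(500, 10))
y_train = rng.integers(0, 2, size=500)
X_days = rng.normal(size=(7, 10))   # one feature vector per day of a week

model = GradientBoostingClassifier().fit(X_train, y_train)

# Daily churn score for a courier C on day D: P(class = churner | X_C)
churn_scores = model.predict_proba(X_days)[:, 1]
print(churn_scores)
```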
2.1 Data Preparation In the data preparation step, pre-processing methods are employed, such as the removal of irrelevant and redundant data and the imputation of missing values with the most frequent values; the missing values occur mostly in demographic information such as birth date, marital status, and mandatory military service status. In addition, for the training set preparation, the last 5, 15, 30, or 60 days of the churners are annotated as the "churn" class, and the remaining days of the churners and all days of the non-churners are annotated as "non-churn". The different annotation windows are evaluated in terms of churn prediction performance to estimate the most discriminative annotation of the churners.
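A minimal sketch of this annotation rule is given below (pandas-based; the column names and toy records are illustrative, not the company's actual schema):

```python
import pandas as pd

# Illustrative daily courier records: one row per courier per working day
df = pd.DataFrame({
    "courier_id": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 2),
    "is_churner": [True, True, True, False, False, False],  # courier 1 eventually left
})

window_days = 30  # other windows (e.g. 5, 15, 60 days) are evaluated in the paper
last_day = df.groupby("courier_id")["date"].transform("max")

# Label the last `window_days` days of churner couriers as "churn";
# everything else (and all days of non-churners) stays "non-churn".
df["label"] = "non-churn"
mask = df["is_churner"] & (df["date"] > last_day - pd.Timedelta(days=window_days))
df.loc[mask, "label"] = "churn"
print(df)
```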
2.2 Feature Extraction and Selection The raw data of deliveries and demographic information of couriers is utilized as the input of the proposed churn prediction approach. The delivery details which cover the attributes for each delivery are as follows: the ids of delivery, courier, cross-dock, district, address, and city, number of attempts for the delivery, its payload, and the promised and delivered dates of the delivery, and courier based attributes for each courier are as follows: total work hours of a day, total absent days in a week, and total working days since the first day, and the demographic details which are age, education, military service status, gender, and marital status. The feature extraction step is performed in the training and prediction steps for courier behavior prediction and churn prediction using the raw data. The raw data of the delivery performances is daily aggregated to extract the dynamic features of the couriers’ delivery behaviors. Also, the delivery counts for all the working couriers are predicted for each day in the next week, and the same aggregated features are extracted using the predicted future behaviors of the couriers. The step is individually performed for each task of the behavior prediction and churn prediction. Also, the calendar time features which are day of week, day of month, day of year, week of year, month of year, and year are extracted. The extracted features are listed in Table 1. The aggregated features are combined with demographic and time features to be used in the feature selection step. It is important to estimate the most distinctive features from the turnover behavior of the couriers to improve the churn prediction performance. Also, the most useful features are selected for the regression model to efficiently predict future behaviors. The prediction of behaviors that lead to turnovers
Table 1 The aggregated features

Feature                          Description
delivery_count_today/yest        Delivery made today/yesterday
delivery_rate_today/yest         Cross-dock wise rate of delivery made today/yesterday
mean_delv_num_3d/1w/2w/1m        Mean delivery made in the last 3 days/week/two weeks/month
std_delv_num_3d/1w/2w/1m         Std delivery made in the last 3 days/week/two weeks/month
max_delv_num_3d/1w/2w/1m         Maximum delivery made in the last 3 days/week/two weeks/month
min_delv_num_3d/1w/2w/1m         Minimum delivery made in the last 3 days/week/two weeks/month
mean_delv_rate_3d/1w/2w/1m       Mean cross-dock wise rate of delivery made in the last 3 days/week/two weeks/month
depending on the delivery performance, working time, and working area can be useful for retaining the couriers.
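As an illustration, a few of the rolling aggregates in Table 1 can be derived from raw daily delivery counts roughly as follows (a pandas sketch; the column names follow Table 1, but the data and grouping logic are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "courier_id": np.repeat([1, 2], 30),
    "date": list(pd.date_range("2022-01-01", periods=30)) * 2,
    "delivery_count_today": rng.integers(0, 60, size=60),
})
df = df.sort_values(["courier_id", "date"]).reset_index(drop=True)

g = df.groupby("courier_id")["delivery_count_today"]
df["delivery_count_yest"] = g.shift(1)  # yesterday's count per courier
for days, tag in [(3, "3d"), (7, "1w"), (14, "2w"), (30, "1m")]:
    roll = g.rolling(days, min_periods=1)
    df[f"mean_delv_num_{tag}"] = roll.mean().reset_index(level=0, drop=True)
    df[f"std_delv_num_{tag}"] = roll.std().reset_index(level=0, drop=True)
print(df.tail())
```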
2.3 Generation Behavior and Churn Prediction Models For the tasks of courier behavior prediction and churn prediction, the steps of training set construction, model selection, and training of the model are individually examined. Churn prediction model. To estimate the best churn prediction model, various state-of-the-art machine learning algorithms are evaluated in a binary classification manner, which are XGBoost, Light Gradient Boosting Machine (LightGBM), Random Forest (RF), and Multilayer Perceptron (MLP). Delivery number prediction model. The most suitable regression model is selected using the selected features. For the prediction of delivery capacity, several regression models investigated in different problems [6] are employed, which are XGBoost regressor, LightGBM regressor, Linear Regression (LR), RF Regressor (RFR), MLP Regression, and Support Vector Regressor (SVR) in the model selection process. The algorithms are used with their best set of hyperparameters which is estimated also using the selected set.
2.4 Early Courier Churn Prediction Based on the extracted features from the delivery behaviors of the previous and the predicted behaviors for the next week, the binary classification-based churn prediction model produces churn scores. Using the scores, the couriers are clustered into four types of couriers ((i) screening group, (ii) open-for-improvement group, (iii) average performance group, (iv) high-performance group) to monitor the performance changes of the couriers, and the churn prediction is achieved depending on the couriers in two clusters with higher scores than the others.
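A minimal illustration of this grouping step using k-means (assuming the inputs per courier are a churn score and a delivery-rate value; the data below is synthetic and the cluster-to-profile mapping is only indicative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Each row: (mean churn score, mean cross-dock-wise delivery rate) of a courier
courier_profiles = np.column_stack([rng.random(200), rng.random(200)])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(courier_profiles)
labels = kmeans.labels_  # 0..3: screening / open-for-improvement / average / high performance

# Couriers in the cluster with the highest mean churn score are the churn candidates
order = np.argsort(kmeans.cluster_centers_[:, 0])[::-1]
screening_cluster = order[0]
print((labels == screening_cluster).sum(), "couriers flagged for screening")
```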
3 Experiments The delivery behaviors of couriers have been utilized for future behavior and churn predictions. The binary classification and regression models are evaluated using the real delivery data from hepsiJET which is a logistics company in Turkey.
3.1 Experimental Setup The dataset of couriers includes the delivery behaviors in a period of two years with the pandemic era from February 2020 to May 2022. In the dataset, there are 37 categorical and numeric attributes composed of 7 raw features including demographic information and ids of couriers and cross-docks in which they are working, 6 calendar features, and 24 features extracted from the raw delivery count. The initial training set covers the delivery data of the couriers until 2022, and the test set is composed of the data in 2022. Using the training and test sets, the built-in feature importance outputs of the XGBoost, LightGBM, and Random Forest models and the best features based on the p-value significant levels (less than 0.05) are used to select the features. For churn prediction, the delivery numbers and cross-dock wide rate of delivery numbers made today and the mean values of the counts and rates in 3 days/1 week/2 weeks with age, education, and working times are selected. Also, for behavior prediction, all the aggregated features and working times are selected for the experiments. In the experiments, every Sunday, using the models generated with the selected features of the historical data, the churn prediction is performed on the data of the previous week, and the behavior and churn prediction models are employed for the next week.
Table 2 The average of overall R2 values and RMSE of the regression algorithms

Algorithms                        Avg. R2 values   Avg. RMSE
XGBoost Regression                0.853            27.60
LightGBM Regression               0.862            26.71
Random Forest Regressor           0.830            27.98
Linear Regression                 0.836            27.21
Multilayer Perceptron Regressor   0.814            29.06
3.2 Evaluation Metrics The evaluation for prediction of future delivery behaviors is carried out in terms of R2 -score and Root Mean Square Error (RMSE). Also, for the churn prediction, the empirical performances of traditional machine learning algorithms have been evaluated using F1-scores and ROC curves with AUC scores. However, accuracy is not a reliable evaluation metric for such binary classification problems with imbalanced data, therefore, F1-scores are computed by taking into account that the couriers in the screening group are the possible churners in the weeks.
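These metrics correspond to standard library routines, e.g. (with synthetic predictions purely for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, f1_score, roc_auc_score

rng = np.random.default_rng(4)

# Regression task: predicted vs. actual next-week delivery counts
y_true = rng.integers(0, 80, size=100).astype(float)
y_pred = y_true + rng.normal(0, 10, size=100)
print("R2:", r2_score(y_true, y_pred), "RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)

# Classification task: churn labels vs. predicted churn scores
labels = np.repeat([0, 1], 50)
scores = rng.random(100)
print("F1:", f1_score(labels, scores > 0.5), "AUC:", roc_auc_score(labels, scores))
```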
3.3 Results of Courier Behavior Prediction Experiment The average R2 values and RMSEs of the regression algorithms are given in Table 2. The best performance was obtained by the LightGBM algorithm for each month in the test set, and the XGBoost regression model presented comparable performance. In contrast, the MLP and Random Forest regressors provided the worst prediction performances. The daily total RMSE of the predicted future behaviors of the couriers is shown in Fig. 2. Therefore, for the churn prediction using the delivery behaviors of the couriers predicted for the next weeks, the predictions of the XGBoost and LightGBM regression models are combined.
3.4 Results of Churn Prediction Experiment In the early churn prediction experiments, the best number of days for annotating the churner couriers is first estimated among the candidates 3, 7, 10, 14, 30, and 60. Figure 3 shows, for each candidate number of days, the ROC curve of the best algorithm with the highest AUC score; the best churn prediction performance was obtained by XGBoost when annotating the daily performances of the churners in the last month as the "churn" class. Table 3a and b list the churn prediction performances obtained using the previous and next weeks, respectively.
Fig. 2 Daily total RMSE values of the predicted behaviors of the couriers by the algorithms
Fig. 3 The number of days used for annotation in terms of the ROC curves with AUC scores of the best algorithms
According to the results, using the courier data of the previous and next weeks, the XGBoost algorithm presented the best performances for the early churn prediction of couriers: an average F1-score of 0.684 and AUC score of 0.856 based on the data of previous weeks, and an average F1-score of 0.662 and AUC score of 0.805 based on the data of next weeks. Figure 4a, b show the ROC curves with AUC scores obtained using the previous weeks and next weeks, respectively, in which the most suitable prediction performances were attained by XGBoost. LightGBM and Random Forest also provided satisfying performances. The binary MLP model presented suitable prediction performance for the early churn prediction with the delivery data of the previous weeks. However, the Decision Tree algorithm had the worst prediction performance in each week of the test set. The best monthly performance was achieved in April by the XGBoost algorithm, and the worst was obtained in February.
Table 3 The F1-score and AUC score of the churn prediction models with (a) the features of couriers in the previous weeks, and (b) the features of couriers in the next weeks

(a) Previous weeks
Algorithm        F1-score   AUC-score
XGBoost          0.684      0.856
LightGBM         0.671      0.827
Random Forest    0.657      0.831
MLP              0.660      0.803
Decision Tree    0.614      0.741

(b) Next weeks
Algorithm        F1-score   AUC-score
XGBoost          0.662      0.805
LightGBM         0.640      0.791
Random Forest    0.644      0.790
MLP              0.624      0.777
Decision Tree    0.541      0.707
Fig. 4 The ROC curves with AUC scores of the algorithms for churn prediction with a the data of the previous weeks, and b the data of the next weeks
4 Conclusion and Discussion It is a fact in the logistics sector that hiring a new courier to replace a resigned courier is a costly and time-consuming problem, especially if the new courier is not familiar with the delivery intensity of the districts and with the area- and company-specific requirements and circumstances. To reduce the costs associated with courier churn, an early churn prediction approach is deployed that can reliably predict the couriers who are about to leave. For the predicted possible churners, strategies should be adopted to retain as many valuable couriers as possible. To improve the performance of the churn prediction and delivery number prediction, various features aggregated from the delivery performances are employed and analyzed in the experiments. The features, including behavioral features daily aggregated from the deliveries and demographic features, were evaluated with XGBoost, LightGBM, and Random Forest, and the most useful features were selected for the regression model of courier behavior prediction and for the binary classification-based churn prediction. The experiments demonstrated that LightGBM and XGBoost provided the best performances, with higher R2 values and higher F1 and AUC scores than the other algorithms, for behavior prediction and churn prediction, respectively.
References 1. Abiad M, Ionescu S (2020) Customer churn analysis using binary logistic regression model. BAU J Sci Technol 1(2):7 2. Ajit P (2016) Prediction of employee turnover in organizations using machine learning algorithms. Algorithms 4(5):C5 3. Boushey H, Glynn S (2012) There are significant business costs to replacing employees. Center Am Progress 16:1–9 4. Chen K, Hu YH, Hsieh YC (2015) Predicting customer churn from valuable B2B customers in the logistics industry: a case study. IseB 13(3):475–494 5. Dahiya K, Bhatia S (2015). Customer churn analysis in telecom industry. In: 2015 4th international conference on reliability, infocom technologies and optimization (ICRITO) (trends and future directions), pp 1–6 6. Le L, Nguyen H, Zhou J, Dou J, Moayedi H et al (2019) Estimating the heating load of buildings for smart city planning using a novel artificial intelligence technique PSO-XGBoost. Appl Sci 9(13):2714 7. Li Y, Wei J, Kang K, Wu Z (2019) An efficient noise-filtered ensemble model for customer churn analysis in aviation industry. J Intell Fuzzy Syst 37(2):2575–2585 8. Karvana K, Yazid S, Syalim A, Mursanto P (2019) Customer churn analysis and prediction using data mining models in banking industry. In: 2019 international workshop on big data and information security (IWBIS), pp 33–38 9. Nagadevara V, Srinivasan V, Valk R (2008) Establishing a link between employee turnover and withdrawal behaviours: application of data mining techniques 10. Sahinkaya G, Erek D, Yaman H, Aktas M (2021) On the data analysis workflow for predicting customer churn behavior in cargo and logistics sectors: case study. In: 2021 international conference on electrical, communication, and computer engineering (ICECCE), pp 1–6 11. Sharma T, Gupta P, Nigam V, Goel M (2020) Customer churn prediction in telecommunications using gradient boosted trees. In: International conference on innovative computing and communications, pp 235–246 12. Viu-Roig M, Alvarez-Palau E (2020) The impact of E-commerce-related last-mile logistics on cities: a systematic literature review. Sustainability 12(16):6492
Combining Different Data Sources for IIoT-Based Process Monitoring Rodrigo Gomes, Vasco Amaral , and Fernando Brito e Abreu
Abstract Motivation—Industrial internet of things (IIoT) refers to interconnected sensors, instruments, and other devices networked together with computers’ industrial applications, including manufacturing and energy management. This connectivity allows for data collection, exchange, and analysis, potentially facilitating improvements in productivity and efficiency, as well as other economic benefits. IIoT provides more automation by using cloud computing to refine and optimize process controls. Problem—Detection and classification of events inside industrial settings for process monitoring often rely on input channels of various types (e.g. energy consumption, occupation data or noise) that are typically imprecise. However, the proper identification of events is fundamental for automatic monitoring processes in the industrial setting, allowing simulation and forecast for decision support. Methods—We have built a framework where process events are being collected in a classic cars restoration shop to detect the usage of equipment such as paint booths, sanders and polishers, using energy monitoring, temperature, humidity and vibration IoT sensors connected to a Wifi network. For that purpose, BLE beacons are used to locate cars being repaired within the shop floor plan. The InfluxDB is used for monitoring sensor data, and a server is used to perform operations on it, as well as run machine learning algorithms. Results—By combining location data and equipment being used, we are able to infer, using ML algorithms, some steps of the restoration process each classic car is going through. This detection contributes to the ability of car owners to remotely follow the restore process, thus reducing the carbon footprint and making the whole process more transparent.
R. Gomes (B) NOVA School of Science and Technology, Caparica, Portugal e-mail: [email protected] V. Amaral NOVA LINCS & NOVA School of Science and Technology, Caparica, Portugal F. B. Abreu ISTAR-IUL & ISCTE-Instituto Universitário de Lisboa, Lisboa, Portugal © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_10
Keywords Process activity recognition · IIoT · IoT sensors · Intrusive load monitoring · Machine learning · Indoor location · Classic cars restoration · Charter of Turin
1 Introduction The historical importance, the aesthetics, the build quality, and the rarity are characteristic features that individually or collectively define a car as a classic. Due to the many admirers, classic cars are highly valued, sentimentally, and monetarily. Keeping the authenticity of those masterpieces, i.e., maintaining them as close as possible to when they left the factory, requires expert restoration services. Guidelines for the restoration of classic cars were proposed by FIVA.1 They may be used for the certification of classic cars by accredited certification bodies such as the ACP.2 Monitoring the classic car restoration process, so that pieces of evidence are recorded, is an important matter, both for managing the shop floor plan, for certification purposes, and for allowing classic car owners to follow the restoration process remotely, reducing the carbon footprint and making the whole process more transparent. Our work aims to create and implement an IoT monitoring system to recognize the tools used and infer restoration tasks (e.g., mineral blasting, bodywork restoration, painting, painting drying, bodywork finishing stage) that a classic car is going through. We intend to use Intrusive Load Monitoring (ILM) techniques by installing energy meters in the workshop outlets and use its data in a supervised Machine Learning (ML) model for detecting the various tools used by the workers in the restoration of each classic car and combining with its location, to automatically recognize the ongoing restoration task. This presents some challenges as classic car restoration is a complex process [1]. In the same workshop many cars may be under restoration, often each at a different stage in that process and different tools being applied. To further make detection challenging, the same tool may be shared across adjacent cars without unplugging, making power consumption-based detection imprecise. The current work is a continuation of the one reported in [2], where a Raspberry Pi-based edge computer equipped with several sensors was attached (using magnets) to each car body on the plant shop floor, allowing to capture data on the vibrations produced by different restoring tools, as well as temperature and humidity conditions where the cars went through. Estimote BLE3 beacons attached to the walls of the plant shop floor were also detected by the edge computer, to allow the indoor location of the car body under restoration. The raw data captured by the edge computers was then sent to the AWS4 cloud-based platform where a ML algorithm, combining detected 1
1 Fédération Internationale des Véhicules Anciens (FIVA), https://fiva.org.
2 Automóvel Club de Portugal (ACP), https://www.acp.pt/classicos.
3 Bluetooth Low Energy.
4 Amazon Web Services.
tools and detected position, allowed us to identify some of the tasks of the restoration process. A web application was also built to monitor the state of operation of all edge computers and beacons. Although this work presented significant progress in using IoT techniques in an industrial context (aka IIoT) for process monitoring purposes, the developed system lacked precision in detecting the operation of some tools, as well as in indoor location. In Sect. 2, we present previously developed projects with objectives similar to ours. Then, the proposed description and architecture of our work are detailed in Sect. 3. In Sect. 4, we evaluate and discuss the obtained results. Finally, in Sect. 5, we present some conclusions.
2 Related Work 2.1 Intrusive Load Monitoring (ILM) Most works on this topic were developed in smart-home contexts. One hundred power consumption samples from three houses and the same appliances were used in [3] for feature extraction to serve as input to an Artificial Neural Network (ANN) classification algorithm. Results showed a positive overall accuracy of 95%. A second test, using the ANN trained with the previous data, was executed with data from a new house, but worse results emerged, even after reducing the number of features. An attempt to identify the different states of each appliance is described in [4]. A pre-processing step with z-normalization was used for feature selection. The classification process applied a Hidden Markov Model (HMM) algorithm. Positive results were achieved, with an accuracy of 94% for the test with appliances in the training set, and 74% for other appliances. An app to visualize the recognized appliance characteristics in real time is also reported. A prototype for collecting load data with an Arduino and an energy sensor is described in [5]. For classification purposes, different algorithms were tested, namely K-Nearest Neighbour (KNN), Random Forest (RF), Decision Tree (DT), and the Long Short-Term Memory neural network. RF presented the best results. An experimental test was also carried out to obtain the best sliding window size, i.e. the one to be used in feature extraction and classification algorithms. An IoT architecture for ILM is presented in [6]. The features were the same as those described in [3], and three supervised learning (SL) algorithms were tried for classification. The Feed Forward Neural Network (FNNN) obtained the best accuracy results for seen data (90%).
2.2 Indoor Location Systems Several techniques are used for this purpose. In some cases, the reader is linked to the object to track, and many tags are dispersed through the space [7], while in others a tag is linked to the moving object, and the reader(s) is(are) fixed. In both cases, distances are calculated based on RSSI.5 Trilateration can then be used for detecting the position of the moving object. For instance, Wifi-based indoor location can be performed through the trilateration of the RSSI corresponding to access points (APs) detected on a mobile phone [8]. BLE-based location technology is similar, but beacon transmitters are used instead of APs, such as in [9], where RSSI trilateration and fingerprinting are used. ML algorithms can be used for improved fingerprinting, such as in [10], where an average estimation error of 50 cm is reported.
3 Proposed System Overview System description The detection of tools used by the workers includes the three main steps of ILM, i.e. data collection, feature extraction, and classification. Regarding data collection, smart energy meters are installed between the tools and the workshop outlets to capture the plugged tools' energy loads. Two smart meter types (the Nedis Smart Plug and the Shelly Plug S) were tested in the workshop with many of the available tools. They capture the electrical power in Watts (W) in real time with a frequency of one measurement per second. Both have an API to get the measured data and use Wifi to send this data to the internet. We chose the Shelly because its API is more straightforward, its plug is smaller (i.e. physically less intrusive), and its power range is enough for the tools used in the workshop (up to 2500 W, compared to Nedis' 3650 W). For feature extraction, we used the technique described in [3, 6]. A script is always running, taking the energy sensor data as input; when some non-zero power value arrives, the algorithm takes the next 100 data entries (the sliding window size) and calculates all features regarding power levels and power variations. Nine features were chosen based on previous works: Maximum power value; Minimum power value; Mean power for nonzero values; Number of samples with power less than or equal to 30 W; Number of samples with power between 30 and 400 W; Number of samples with power between 400 and 1000 W; Number of samples with power greater than 1000 W; Number of power transitions between 10 and 100 W; Number of power transitions greater than 1000 W. The group of features regarding each data window serves as input to a supervised ML model. In the ML training phase, these features are labeled with the ground truth, i.e. the tool (target) being measured. Different algorithms are then trained with the same labelled data to find the one with the best predictions when providing unlabeled
5 Received Signal Strength Indicator is a measurement of the power present in a received radio signal.
data (in the ML estimation stage). After the three ILM phases, the electrical tools used by the workers at any time of the day in the workshop are registered and available. To complete this process of recognizing the tools, the remaining part we need to tackle is to determine which car was under intervention by those same tools. As a result of a literature review about indoor location systems, some possible solutions emerged. In [2], each sensor box has an associated car, and in a real system, each vehicle would have a sensor box attached that goes with it throughout the whole workshop process. So one solution is to use a location system to track down each energy sensor and then, as the sensor box location is available, find the shortest distance between the two and link them. Another solution is to use the timestamps from the energy sensors' data and match them with the restoration steps detected by the sensor boxes. With the awareness of the tools used on each car, a combination with the information provided by the sensor boxes is made. In addition to the Process Identification Algorithm developed in [2], more robustness and reliability are obtained by combining all data. A web application is needed for system users to get feedback about the developed system and make simple changes. Some features that should be available are, for example, the list of all activated and deactivated smart plugs in the workshop, the list of all electrical tools belonging to the workshop, and the registration of more smart plugs in the system. We decided to implement an indoor localization system for the sensor box using a ML-based BLE fingerprinting technique. The latter encompasses two phases. First is a training phase in which RSSI samples are captured throughout the entire area, and the corresponding locations are used to train several ML location estimation models. Second is a validation phase in which a target moves around and the estimates produced by the models (a pair of x, y coordinates) are compared with ground-truth measurements to assess their accuracy and choose the best one. In our experimental setup, many BLE beacons were distributed throughout the workshop. Measurement points were distributed across the workshop floor plan, about 3 m apart from each other, and reference points were defined and their coordinates inside the workshop obtained. All these points were determined through the workshop plan, as can be seen in detail in the Fig. 1 diagram. To obtain the coordinates of each point, a cartesian plot was placed over the floor plan of the workshop, with the axes in the same measurement scale as the available floor plan. Then, going through the measurement points with a laser distance meter, the distances to three visible reference points were recorded, as well as the beacons' RSSI values detected at that point and their ids. Then, for each measurement point, a trilateration algorithm is used to obtain its coordinates based on the distances to the reference points and their coordinates. Finally, a ML model is trained using the coordinates of each measurement point as the target and its detected beacons' RSSI values as input. An example is shown in Table 1. During normal operation, the trained ML model, running in each sensor box/car, takes as input the detected RSSI values and predicts its most likely location within the workshop.
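As a rough illustration of the trilateration step just described, the sketch below recovers a point's (x, y) coordinates from its measured distances to three reference points by solving a small least-squares system. It is a generic Python sketch: the reference-point coordinates and distances are made-up values, not the ones measured in the workshop.

```python
import numpy as np

def trilaterate(ref_points, distances):
    """Estimate (x, y) from distances to three (or more) reference points (least squares)."""
    ref = np.asarray(ref_points, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = ref[0]
    # Subtracting the first circle equation from the others yields a linear system A @ [x, y] = b.
    A = 2.0 * (ref[1:] - ref[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(ref[1:] ** 2, axis=1) - (x0 ** 2 + y0 ** 2))
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xy

# Fabricated reference points (RPs) and laser-measured distances for one measurement point.
rps = [(0.0, 0.0), (12.0, 0.0), (6.0, 8.0)]
dists = [10.27, 3.28, 6.72]
print(trilaterate(rps, dists))  # -> approximately [9.95, 2.56]
```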
Fig. 1 Workshop floor plan with the identification of beacon’s locations, reference points, and measurement points
Table 1 Example of a row of the data acquired in the sensor box location method to serve as input to the ML model
Beacon id1 (RSSI) | Beacon id2 (RSSI) | Beacon id3 (RSSI) | Beacon id4 (RSSI) | (x, y)
90.5 | 80.6 | 30.5 | 70.0 | (9.95, 2.56)
System architecture Since we receive power consumption data from the sensors every second, we chose the open-source InfluxDB time series database. We installed it in a virtual machine hosted by an OpenStack platform operated by INCD. A bucket receives electric sensors’ data in the ILM part of the work, as shown in Fig. 2. The ingestion uses a Telegraf agent that asks the energy sensor API every second for its measurements. A Python script performs feature extraction upon a 100 s sliding window of power consumption data values retrieved from the InfluxDB database. The results serve as input to the ML model that returns the tool predictions (i.e. which tools most likely were in use in the sliding window). Every tool prediction is then saved with its timestamp in another bucket.
Fig. 2 Architecture of the system's electrical data and feature extraction (energy sensors → InfluxDB bucket → feature extraction script (.py) → machine learning model (.pkl) → tool predictions bucket, with reads and writes on the server done through the InfluxDB Python library)
To deploy the ML model after it is manually trained, we save on our server a Pickle file6 with the trained model, which is accessed every time a prediction is required. For the new location system, the beacons' data are also saved in an InfluxDB bucket, and the location ML model, after being trained, is also deployed on our server using a pickle file, which is used in the Process Identifier Algorithm. The Process Identifier Algorithm was originally developed as an AWS Lambda function. Still, as we want to reduce as much as possible the use of proprietary services that can later be charged for, we decided to transfer the function to a Python script running on our server and to move the sensor boxes' data of [2] to InfluxDB. This way, the script queries InfluxDB for all the data needed to run the algorithm for identifying restoration processes, now with the help of the tools identified by the ILM module. The Web Application front-end was implemented in [2] with React and communicates with the back-end via the Amazon API Gateway, so it is necessary to update and expand the front end so it can show feedback to the users about the new system features related to the predictions and smart plugs. This is an IoT system, so its architecture layers can be defined. We use a five-layer structure for our workshop problem and define the layers as follows: Physical Things—the electrical tools available in the workshop; Perception—the electrical sensors plugged into the workshop outlets; Communication Network—WiFi, as this is the channel the sensors use to communicate with the cloud; Middleware—all the data storage services and algorithms implemented over InfluxDB that interact with the sensors' data; Application—the Web Application where the interaction with users happens.
6 Pickle is a useful Python tool that allows saving trained ML models for later use.
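A minimal sketch of this deployment path is given below: the pickled model is loaded once, the features of the latest window are passed to it, and the prediction is stored with a timestamp. The file and bucket names are illustrative, the model is assumed to be a scikit-learn style estimator, and the InfluxDB write is replaced by a placeholder.

```python
import pickle
from datetime import datetime, timezone

# Load the manually trained classifier that was serialized with pickle (illustrative file name).
with open("tool_classifier.pkl", "rb") as fh:
    model = pickle.load(fh)

def predict_tool(feature_row):
    """feature_row: the nine window features, in the order used at training time."""
    return model.predict([feature_row])[0]   # assumes a scikit-learn style estimator

def store_prediction(tool, bucket="tool_predictions"):
    """Placeholder for the InfluxDB write of the prediction and its timestamp."""
    print(f"{datetime.now(timezone.utc).isoformat()} -> {tool} ({bucket})")

features = [1250.0, 0.0, 840.5, 12, 30, 40, 18, 7, 2]   # fabricated window features
store_prediction(predict_tool(features))
```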
4 Results and Discussion To verify whether predictions can be made with the electrical data and to choose the best ML model, we considered the more accurate models used in the previous works detailed in Sect. 2. Six different supervised ML algorithms were implemented, tested, and compared: Random Forest (RF), K-Nearest Neighbour (KNN), Decision Tree (DT), Gaussian Naive Bayes (GNB), Gradient Boosting (GB), and the Feed Forward Neural Network (FNNN). The energy data used to test the ML models was recorded in the workshop. The electrical tools at work in the workshop, a drill, two electrical sanders, two polishers, an angle grinder, and a hot air blower, were measured for an entire afternoon with the Shelly plug. After having the feature data available, we divided it into a training set and a test set. For this, we randomly chose 70% of the entries related to each tool for training and 30% for the test set. We did the split per tool so that both the training set and the test set would contain data from every available tool. In the implementation, we manually ran every algorithm on a local machine, taking advantage of the open-source libraries available online for ML development. For the RF, KNN, DT, GNB, and GB algorithms, we used the Sklearn library. For the FNNN, the Keras library was used. For all the algorithms, the results with and without data normalization were compared, and different parameters and hyper-parameters were used to get the best of each algorithm for a more meaningful comparison between them. However, none of the algorithms presented better results with data normalization. A feature reduction was also made and tested in the models. The most important features were the maximum power value and the minimum power value. Testing with just these two features, none of the algorithms gave better results, so all the features were used to compare the algorithms. For the FNNN model, nine input nodes were used, as this is the number of features, together with two hidden layers and an output layer with six nodes, equal to the number of different tools to predict. Different numbers of nodes in the hidden layers were then tested to reach the best results. As we can see in Table 2, the results are very positive, as we were able to achieve 100% accuracy and maximum F1-score with two algorithms, Random Forest and Gradient Boosting. Also, the minimum accuracy value was 63% for Gaussian Naive Bayes, which still correctly predicts more than half of the tools given to the model. Given these results, the algorithm that should be deployed to the cloud to enter the system is either Random Forest or Gradient Boosting. To test the feasibility of the sensor box location system, we took a small workshop zone to measure some data and test an ML algorithm. As described in Sect. 3, to get the measurement points' (Ms) coordinates, we first needed to define the cartesian coordinates, relative to the workshop floor plan, of the reference points (RPs) 8, 6, 7. The resulting coordinates can be seen in Fig. 3. Then all the distances
Table 2 Accuracy and F1-score of the tested ML algorithms
Algorithm | Accuracy (%) | F1-score (%)
RF | 100.0 | 100.0
KNN | 81.8 | 78.8
DT | 81.8 | 81.8
GNB | 63.6 | 59.1
GB | 100.0 | 100.0
FNNN | 72.73 | 82.0
from each measurement point to the RPs were recorded. The distances of M1 are also shown in Fig. 3. Having the distances from each measuring point, a trilateration algorithm was used to obtain their coordinates. The coordinates obtained were: M1 (5.5, 2.3), M2 (9.95, 2.56), M3 (14.73, 2.53). The RSSI values detected at each M were also recorded. As this location test is just the first, superficial phase of the testing procedure that must be done, just one algorithm was chosen, so we could verify whether predictions could be made with our acquired data. The ML algorithm chosen to predict the sensor box spot was K-Nearest Neighbour (KNN), as it is one of the most considered in fingerprinting-related works. The set of RSSI values and the coordinates of each measurement point were used to train the ML model. Some RSSI values were obtained around measurement points M1, M2, and M3 and given to the model as a test set so that the spots could be predicted. Despite the small amount of training and test data, the KNN achieved an accuracy of 90%. With the results obtained, we can conclude that the tracking system is feasible and could be expanded to the entire workshop, since even with just a few data points received in a small zone, the ML model showed positive results. However, we must obtain more data regarding the beacons' RSSI values, do tests in all the spaces of the workshop, and compare different ML algorithms. Only then can we guarantee the correct functioning of the box location system.
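For reference, a minimal version of this fingerprinting test could look like the sketch below, where a K-Nearest Neighbour classifier maps a vector of detected beacon RSSI values to the closest measurement point. The RSSI rows and labels are fabricated for illustration and are not the values collected in the workshop.

```python
from sklearn.neighbors import KNeighborsClassifier

# Fabricated fingerprints: one row of beacon RSSI readings (dBm) per observation,
# labelled with the measurement point where it was captured.
X_train = [[-62, -71, -85, -90], [-60, -70, -83, -88],   # around M1
           [-75, -58, -72, -86], [-77, -60, -70, -84],   # around M2
           [-88, -74, -59, -70], [-86, -72, -61, -69]]   # around M3
y_train = ["M1", "M1", "M2", "M2", "M3", "M3"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# A new scan taken near M2 should be mapped to that spot.
print(knn.predict([[-76, -59, -71, -85]]))   # -> ['M2']
```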
Fig. 3 Partial plan of the workshop, identifying beacons location (blue), reference points (red) and measurement points (green)
5 Conclusion and Future Work This work presented an ILM approach for tool recognition in a workshop context and a location solution for the cars being restored. With the ML algorithms tested, the results observed regarding the ILM approach demonstrate that it can clearly predict the power tools used, missing only the identification of the car on which they were used. However, the data acquired to train and test the model was only for testing purposes. More data should be acquired over several days for a completely reliable model. The new sensor box location method should also be expanded to the whole workshop, so every sensor box can be located precisely anywhere on the floor plan. Also, the merging with the work done in [2] should be finished by completing the web application and testing it, so the restoration processes can be identified and made available in the application. After completing the system, future work will be to create a real-time view of the workshop floor plan where all the detected events would be marked in the exact place where they happened, making it possible to see, in an interactive way, all the events detected by the developed IoT system. Acknowledgements This work was produced with the support of INCD funded by FCT and FEDER under the project 01/SAICT/2016 nº 022153, and partially supported by NOVA LINCS (FCT UIDB/04516/2020).
References 1. Gibbins K (2018) Charter of Turin handbook. Tech. rep., Fédération Internationale des Véhicules Anciens (FIVA). https://fiva.org/download/turin-charter-handbook-updated-2019english-version/ 2. Pereira D (2022) An automated system for monitoring and control classic cars’ restorations: an IoT-based approach. Master’s thesis, Monte da Caparica, Portugal. http://hdl.handle.net/ 10362/138798 3. Paradiso F, Paganelli F, Luchetta A, Giuli D, Castrogiovanni P (2013) ANNbased appliance recognition from low-frequency energy monitoring data. In: Proceedings of the 14th international symposium on a world of wireless, mobile and multimedia networks, WoWMoM 2013. IEEE. https://doi.org/10.1109/WoWMoM.2013.6583496 4. Ridi A, Gisler C, Hennebert J (2014) Appliance and state recognition using Hidden Markov Models. In: Proceedings of the 2014 international conference on data science and advanced analytics (DSAA 2014), pp 270–276. IEEE. https://doi.org/10.1109/DSAA.2014.7058084 5. Mihailescu RC, Hurtig D, Olsson C (2020) End-to-end anytime solution for appliance recognition based on high-resolution current sensing with few-shot learning. Internet of Things (Netherlands) 11. https://doi.org/10.1016/j.iot.2020.100263 6. Franco P, Martinez J, Kim YC, Ahmed M (2021) Iot based approach for load monitoring and activity recognition in smart homes. IEEE Access 9:45325–45339. https://doi.org/10.1109/ ACCESS.2021.3067029 7. Saab S, Nakad Z (2011) A standalone RFID indoor positioning system using passive tags. IEEE Trans Ind Electron 58(5):1961–1970. https://doi.org/10.1109/TIE.2010.2055774
8. Khelifi F, Bradai A, Benslimane A, Rawat P, Atri M (2019) A survey of localization systems in internet of things. Mobile Netw Appl 24(3):761–785. https://doi.org/10.1007/s11036-0181090-3 9. Cabarkapa D, Grujic I, Pavlovic P (2015) Comparative analysis of the Bluetooth low-energy indoor positioning systems. In: 2015 12th international conference on telecommunications in modern satellite, cable and broadcasting services, TEL-SIKS 2015, pp 76–79. https://doi.org/ 10.1109/TELSKS.2015.7357741 10. Sthapit P, Gang HS, Pyun JY (2018) Bluetooth based indoor positioning using machine learning algorithms. In: 2018 IEEE international conference on consumer electronics—Asia (ICCEAsia), pp 206–212. https://doi.org/10.1109/ICCE-ASIA.2018.8552138
Comparative Analysis of Machine Learning Algorithms for Author Age and Gender Identification Zarah Zainab, Feras Al-Obeidat, Fernando Moreira, Haji Gul, and Adnan Amin
Abstract Author profiling is part of information retrieval in which different perspectives of the author are observed by considering various characteristics like native language, gender, and age. Different techniques are used to extract the required information using text analysis, like author identification on social media and in short text messages (SMS). Author profiling helps in security and blogs for identification purposes by capturing authors' writing behaviors through messages, posts, blogs, comments, and chat logs. Most of the work in this area has been done in English and other native languages. On the other hand, Roman Urdu is also getting attention for the author profiling task, but it needs to be converted to English to extract important features like Named Entity Recognition (NER) and other linguistic features. The conversion may lose important information, given the limitations of converting one language to another. This research explores machine learning techniques that can be used for all languages to overcome the conversion limitation. The Vector Space Model (VSM) and Query Likelihood (Q.L.) are used to identify the author's age and gender. Experimental results revealed that Q.L. produces better results in terms of accuracy.
Z. Zainab City University of Science and Information Technology, Peshawar, Pakistan F. Al-Obeidat College of Technological Innovation, Zayed University, Abu Dhabi, UAE e-mail: [email protected] F. Moreira (B) REMIT, IJP, Universidade Portucalense, Porto, Portugal e-mail: [email protected] IEETA, Universidade de Aveiro, Aveiro, Portugal H. Gul · A. Amin Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_11
Keywords Vector space model · Query likelihood model · Information retrieval (I.R.) · Text mining · Author profiling
1 Introduction In recent years, social media platforms like Facebook, Twitter, Myspace, Hyves, Bebo, and Net-log have expanded impressively and have enabled millions of users of all ages to develop and support personal and professional relationships [24]. They can also be used as tools for advertising, marketing, online business, and social media activities where users keep their personal information. According to [7], social media platforms have attracted massive user traffic, such as Facebook, which had 1.65 billion monthly active users in the first quarter of 2016. Most people tend to provide fake names, ages, genders, and locations to conceal their real identities [4]. To catch internet predators, law enforcement agencies and social network moderators confront two incredibly important issues:
• Investigation of the substantial number of profiles and communications on social networks is quite challenging.
• Internet predators typically provide fake identities and act like young people to establish a link with their victims.
Furthermore, identifying the gender and age of customers according to their social media comments helps companies recognize who their customers are. Therefore, they can make decisions to improve their services in the future [22, 30]. This indicates the need to develop automatic tools and techniques for detecting fake author profiles in different types of texts, like Facebook posts/comments, Twitter comments, blog posts, and other analytical perspectives [17]. Author profiling can be defined as the task of identifying, from a set of texts, the author's age group, gender, profession, education, native language, personality traits, etc., which is a challenging problem for researchers [4]. For automatic detection, prediction, and forecasting, machine learning (ML) techniques play an important role [14]. Therefore, this research focuses on ML techniques for predicting the author's age and gender using Query Likelihood (Q.L.) and the Vector Space Model (VSM). These techniques are applied using Stylistic Features (S.F.s) [12, 35] to identify the author's age and gender. The outcomes for gender analysis considering S.F. show an accuracy of 70% for Q.L. and 66% for VSM. Considering S.F. while making spaces between tokens, the accuracy is 70% for Q.L. and 44% for VSM. By removing S.F., the accuracy is 70% for Q.L. and 46% for VSM. The outcomes for age analysis considering S.F. show an accuracy of 62% for Q.L. and 66% for VSM. Considering S.F. while making spaces between tokens, the accuracy is 64% for Q.L. and 56% for VSM. Without considering S.F., the accuracy is 66% for Q.L. and 64% for VSM. When age and gender are combined, the accuracy outcomes considering S.F. are 76% for Q.L. and 66% for VSM, while making spaces between tokens gives 67% for Q.L. and 50% for VSM, and without considering S.F., it is 68% for Q.L. and 55% for VSM.
Researchers have only recently started work on Roman Urdu analysis, and very limited machine learning methodology has been proposed for author profiling in Roman Urdu. Due to the limited literature on Roman Urdu profiling systems, a few machine learning algorithms have been used in this paper to identify the author's gender and age. People use Roman Urdu to comment and express their opinions, and they try to convey their messages using shorthand, emojis, and so on, as newer generations adopt informal, slang-like writing in place of the standard language in order to type easily and freely. As the world becomes more automated, machines need to learn such languages to make different decisions. State-of-the-art techniques already work well on languages such as English, so building comparable technology for Roman Urdu is now a hot topic. We chose these models to compare their results, to see which yields better results and to point out a new path toward improvement. In this paper, Sect. 1 contains the introduction, while Sect. 2 covers work related to the problem. Section 3 contains a detailed discussion of the methodology and a step-by-step description of how the framework works. Finally, we present the conclusion of the work, followed by the references.
2 Related Work In recent years, researchers have made progress in this area by developing benchmark corpora and techniques for author profiling tasks. The most prominent effort in this regard is the series of PAN competitions on author profiling [16, 23, 26–29]. Corpora have been developed in various genres, for example, fiction and non-fiction texts [28], chat logs [36], customer reviews [27], emails [10], blogs and social media [12, 25, 34], and comments [13]. Author identification is one of the tasks of authorship analysis whose objective is to identify the traits of an author (age, gender, language, etc.) by analyzing his or her written behavior [27]. It also helps to reduce the misuse of social media and gain the trust of users. Most of these corpora are in the English language; however, some work has been done in European languages as well, like Dutch, Italian, and Spanish [38]. One of the tasks that currently attracts the attention of researchers is predicting the age and gender of the author through an analytical and critical analysis of the author's written behavior. The authors of [15] investigated the problem of gender and genre identification on an English corpus of 604 documents taken from the BNC corpus, tagged with fiction vs. non-fiction genre and gender. The corpus has an equal number of female and male authors in each genre (123 in fiction and 179 in non-fiction). Blog data corpora have also been highly targeted for author profiling experimentation. J. Schler et al. [34] developed a corpus of 71,493 English blogs to analyze stylistic and content-based features for identifying the author's age and gender. S. Rosenthal et al. [31] have a corpus of 24,500 English
blogs for age prediction using three different features, including lexical stylistics, lexical content, and online behavior. G. K. Mikros et al. [20] investigated author profiling in the Greek language using blogs. The GBC (Greek Blog Corpus) was built by taking 50 blog entries from 20 bloggers. L. Wanner et al. [38] contributed blog corpora in the Spanish, Dutch, French, German, and Catalan languages for gender and language identification using stylistic features. S. Mechti et al. [18] developed a corpus of health forums for age and gender identification. It contains 84,518 profiles. These profiles are categorized into the age groups 12–17, 18–29, 30–49, 50–64, and 65+, in which the female gender class was dominant. These days, social networks like Twitter and Facebook have grabbed the attention of data analysts and researchers, who use different machine learning and text mining techniques for performing such tasks with improved accuracy. W. Zhang et al. [39] collected 40,000 posts from Chinese social media (Sina Weibo) users for the age prediction of authors, considering four different age groups. G. Guglielmi et al. [13] explored the author profiling task by collecting comments from Twitter in 13 different languages for gender identification. The corpus contained 4,102,434 comments from 184,000 authors, with a division of 55% female and 45% male authors. Similarly, Nguyen et al. (2013) used a corpus of Twitter comments in the Dutch language for age prediction. J. S. Alowibdi et al. [3] also performed an analysis by considering a Twitter comments corpus in the Dutch language with 53,326 profiles, of which 30,898 were male and 22,428 were female. F. Rangel et al. [27] developed a Spanish corpus of 1200 Facebook comments to investigate how human emotions correlate with gender. Schler et al. (2015) have a Facebook corpus of 75 million words from 75,000 users (with consent) to predict gender, age, and personality traits as a function of the words they use in their Facebook statuses. B. Plank et al. [37] experimented on the personality assessment of an author using a corpus of 66,000 Facebook users of the same applications. Another corpus, in Vietnamese, consisting of 6831 forum posts collected from 104 authors, was developed by [8] for the identification of the same traits as used by (Pham et al. 2009), employing stylistic and content features. M. Ciot et al. [8] built a corpus of 8618 Twitter users in four languages, French, Indonesian, Turkish, and Japanese, for gender prediction. M. Sap et al. [33] developed an age and gender prediction lexicon from a corpus of 75,394 Facebook users of My Personality 8, a third-party Facebook application. Verhoeven et al. [37] developed a Twitter-based corpus covering six different languages (Dutch, German, Italian, French, Portuguese, and Spanish) for gender and personality identification. The corpora based on social media texts are mostly generated for English and other European languages using publicly available data. Also, profiles in these corpora contain text in one single language. This research contributes a multilingual (Roman Urdu, Urdu, and English) corpus of simple SMS text messages, which contains both public and private messages typed by the users themselves.
The Roman Urdu script is also gaining attention in research trends. S. Mukund et al. [21] performed sentiment analysis on Urdu blog data using structural correspondence learning for Roman Urdu. M. Fatima et al. [11] extended this work by adding bilingual (Roman Urdu and English) lexicons. M. Bilal et al. [5] investigated the behavior of multilingual opinions (Roman Urdu and English) extracted from a blog. M. Daud et al. [9] also worked on multilingual opinion mining (Roman Urdu and English). K. Mehmood et al. [19] performed spam classification on comment data based on the English and Roman Urdu languages. According to Safdar [32], Multilingual Information Retrieval (MLIR) accepts queries in numerous languages and retrieves the results in the language demanded by the users. A questionnaire-based web survey was designed and directed to internet users through a survey link. 110 participants responded, and the researchers found that the majority of them use the internet daily. English was identified as the most popular language used for searching for information. Participants can understand English but use Roman Urdu for socializing and for retrieving information, which includes audio, video, etc. Author area identification is a component of author profiling that aims to pinpoint the author's location based on the text [1]. Author area identification may enhance content recommendation, security, and the lowering of cybercrime due to its numerous uses in fake profile detection, content recommendation, sales and marketing, and forensic linguistics. Numerous author profiling tasks have received much attention in English, Arabic, and other European languages, but author region identification has received less attention. Urdu is a morphologically rich language used by over 100 million people worldwide [2], yet the dearth of corpora is a major reason for the lack of attention and advancement in research. Roman-Urdu-Parl is the first-ever large-scale publicly available Urdu parallel corpus, with 6.37 million sentence pairs. It has been built to ensure that it captures the morphological and linguistic features of the language. The study in [6] presents a user study conducted on students at a local university in Pakistan, for which a corpus of Roman Urdu text messages was collected. It quantitatively showed that several words are written using more than one spelling, and that most participants were not comfortable in English and hence chose to write their text messages in Roman Urdu.
3 Methodology This section discusses the main framework of the selected algorithms and strategies to compare the performance of algorithms upon different criteria. It consists of seven phases, as given in Fig. 1. Details of each phase of the framework are enumerated below.
Fig. 1 Proposed model for author profiling using machine learning algorithms
3.1 Preprocessing Phases First, all the text messages of the authors, in the form of .txt files, are inserted into the system. Then preprocessing is performed, which comprises three phases, each enumerated below:
• Separating Files: In this phase, the system reads the training and testing files and separates the individual files inside the collection.
• Separating Sentences: In this phase, the text messages inside the individual training and testing files are separated into sentences.
• Tokenization: In this phase, the sentences are tokenized using the split function in Python, which separates the tokens by identifying the spaces between them (a minimal sketch of these phases follows this list).
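The sketch below is a small Python illustration of the three preprocessing phases. The folder name, file layout and the naive sentence delimiter are assumptions for illustration, since the corpus structure is not detailed here.

```python
from pathlib import Path

def load_corpus(folder):
    """Phase 1: read every author file (.txt) in the given folder."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}

def split_sentences(text):
    """Phase 2: split a file's text messages into sentences (naive delimiter handling)."""
    return [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]

def tokenize(sentence):
    """Phase 3: whitespace tokenization with Python's split()."""
    return sentence.split()

# Example run over an assumed folder of training files.
for name, text in load_corpus("training_files").items():
    tokens = [tok for sent in split_sentences(text) for tok in tokenize(sent)]
    print(name, tokens[:10])
```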
3.2 Selecting Strategy In the first experiment, three different strategies were compared to predict the author’s age and gender by analyzing the author’s writing behavior using two learning models, i.e., the Query Likelihood Model and Vector Space Model. The strategies are as follows: • Considering Stylistic Features: In the first strategy, the results are generated without removing stylistic features, i.e., emojis, digits, punctuations, special characters, and abbreviations from the author’s text messages, as shown in Fig. 2.
Fig. 2 Strategy 1—considering stylistic expressions
Fig. 3 Strategy 2—adding extra information, i.e., whitespaces between words having attached stylistic expressions
• Considering Stylistic Features making spaces between tokens: In the second strategy, extra information, i.e., white spaces, is generated between the word tokens and the stylistic expressions of the author's messages. For example, messages between friends sometimes include emojis or other stylistic expressions attached to tokens, e.g., Hi!:) (token1). In strategy one this was considered a single token, as shown in Fig. 2, which acts as a search parameter for finding similar tokens in text messages. In this scenario, however, such writing expressions are separated from the word tokens by adding white spaces between them, as shown in Fig. 3. Adding spaces increases the number of search parameters as compared to strategy 1.
• Removing Stylistic Features: In the third strategy, results are generated by removing all the stylistic expressions, i.e., emojis, digits, punctuation, special characters, and abbreviations, from the author's text messages. All text messages are filtered, and noise is completely removed from them (a small sketch of the three strategies follows this list).
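The sketch below is a minimal Python illustration of the three strategies. The regular expression used to decide what counts as a stylistic expression (emojis, digits, punctuation, special characters) and the example message are our own simplifications, not the exact rules used in the experiments.

```python
import re

# Anything that is not a letter or whitespace is treated here as a "stylistic
# expression" (emojis, digits, punctuation, special characters) -- a simplification.
STYLISTIC = re.compile(r"[^A-Za-z\s]+")

def strategy1(text):
    """Keep stylistic features attached to the tokens (e.g. 'Hi!:)' stays one token)."""
    return text.split()

def strategy2(text):
    """Insert white space between word tokens and stylistic expressions."""
    spaced = STYLISTIC.sub(lambda m: f" {m.group(0)} ", text)
    return spaced.split()

def strategy3(text):
    """Remove stylistic expressions altogether."""
    return STYLISTIC.sub(" ", text).split()

msg = "Hi!:) kal milte hain 7 baje"        # fabricated Roman Urdu example
print(strategy1(msg))  # ['Hi!:)', 'kal', 'milte', 'hain', '7', 'baje']
print(strategy2(msg))  # ['Hi', '!:)', 'kal', 'milte', 'hain', '7', 'baje']
print(strategy3(msg))  # ['Hi', 'kal', 'milte', 'hain', 'baje']
```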
3.3 Applying Algorithms In this part, two algorithms, i.e., the Vector Space Model and the Query Likelihood Model, are applied for a comparative analysis based on the strategies listed in Sect. 3.2. The algorithmic steps of both algorithms are as follows: Algorithm 1: Vector Space Model This model is used for finding the similarity angle between the testing files and the training files of the author's text messages. For each
document, a vector is derived. The set of documents in a collection is then viewed as a set of vectors in a vector space. Each term will have its axis. The formula is shown in Eq. 1.

Cos(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{q}{|q|} \cdot \frac{d}{|d|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}    (1)
In Equation 1, q_i is the TF-IDF weight of term i in the test files, and d_i is the TF-IDF weight of term i in the training files. The steps to compute the VSM formula are as follows: Term Frequency (T.F.): TF measures the number of times a term (word) occurs in a document. Normalization of Document: Documents are of different sizes, so in a large document the frequency of the terms will be much higher than in smaller ones. Hence, we need to normalize each document based on its size by dividing the term frequency by the total number of terms. Inverse Document Frequency (IDF): The main purpose of searching is to find relevant documents matching the query. IDF is used to weigh down the effect of too frequently occurring terms, since terms that occur in fewer documents can be more relevant; conversely, it weighs up the effect of less frequently occurring terms. The formula is shown in Eq. 2.
IDF(x) = \log \frac{N}{df_x}    (2)
where N is the total number of training files and df_x is the number of training documents containing term x of the testing files. Algorithm 2: Query Likelihood Model: In the Query Likelihood Model, the documents are ranked by P(d | q), i.e., the probability of a document is interpreted as the likelihood that a training document (d) is relevant to the test file (q). The ranking function, with Jelinek–Mercer smoothing, is shown in Eq. 3.

f_{JM}(q, d) = \sum_{w \in q \cap d} c(w, q)\,\log\!\left(1 + \frac{(1 - \lambda)\, c(w, d)}{\lambda\, |d|\, p(w|C)}\right)    (3)
Term Frequency in testing files (q) (c(w, q)): In this step, for each test file individually, find how many times each particular word occurs in that test file.
Term Frequency in training files (d) (c(w, d)): For each training file individually, find how many times each word of the test file occurs in that training file. Length of Training files (d) (|d|): Find the length of each training file. Frequency of Term in full collection: Find how many times each word of the test file (q) occurs in the full collection, i.e., across all training files. Ranking score of test files: Rank the training documents from the highest to the lowest score generated against each test file by both algorithms, as shown in Fig. 1. Voting Against the Ranked Test Files: Select the top 5 ranked files; the class with the higher number of votes among them is assigned to the test file. Output from System: The output from the system is the predicted class of gender and age for the test files (a small sketch of this scoring and voting procedure follows).
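The sketch below pulls these steps together: each training file is scored against a test file with the smoothed formula of Eq. 3, the training files are ranked, and a vote is taken among the top-ranked labels. The toy corpus, labels and λ value are purely illustrative.

```python
import math
from collections import Counter

def ql_score(test_tokens, train_tokens, collection_counts, collection_len, lam=0.5):
    """Eq. 3: sum over shared words of c(w,q) * log(1 + (1-lam)*c(w,d) / (lam*|d|*p(w|C)))."""
    q, d = Counter(test_tokens), Counter(train_tokens)
    score = 0.0
    for w, cwq in q.items():
        cwd = d.get(w, 0)
        if cwd == 0:
            continue                      # only words present in both q and d contribute
        p_wc = collection_counts[w] / collection_len
        score += cwq * math.log(1 + (1 - lam) * cwd / (lam * len(train_tokens) * p_wc))
    return score

def predict_label(test_tokens, training, labels, top_k=5):
    """Rank training files by score and vote among the top_k labels."""
    collection = [w for doc in training for w in doc]
    coll_counts, coll_len = Counter(collection), len(collection)
    scores = [(ql_score(test_tokens, doc, coll_counts, coll_len), lab)
              for doc, lab in zip(training, labels)]
    top = sorted(scores, reverse=True)[:top_k]
    return Counter(lab for _, lab in top).most_common(1)[0][0]

# Toy example with fabricated tokens and labels.
training = [["kal", "milte", "hain"], ["acha", "theek", "hai"], ["kal", "party", "hai"]]
labels = ["Male", "Female", "Male"]
print(predict_label(["kal", "hai", "party"], training, labels, top_k=3))  # -> 'Male'
```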
3.4 Datasets The corpus for this experiment was used with students' permission at COMSATS University Lahore and has 350 files in the Roman Urdu language. The names and gender identifications of the authors of all 350 files have been examined for this study. Different authors write in different languages, but over time, newer generations have started using informal, slang-like writing that ignores standard structure and rules: people use Roman Urdu to comment and express their opinions, trying to convey their messages using shorthand, emojis, and so on. We therefore focused especially on Roman Urdu, observing the structures and rules that make it easier for machines to learn, and we tried to test and train models on the Roman Urdu language that already work on other languages such as English. Hence, it is now a hot topic to introduce or build state-of-the-art techniques for learning Roman Urdu. We selected these models to compare their results, to see which gives better results and to show a new direction toward improvement. Out of the 350 files, 300 files containing the students' text messages are used as the training dataset, and the remaining 50 files are used as the testing dataset. Scores are generated against each test file individually. The experiment was conducted to detect an author's gender (male or female) and age group (15–19, 20–24, 25–xx) by analyzing their writing behavior. Due to the importance of Roman Urdu, in this work we have only worked on this single language. In Experimental Setup 1, two learning models were compared, i.e., the Vector Space Model and Query Likelihood. The Vector Space Model shows the angle
between the test and the training document, and query likelihood gives the likelihood score between the test and training document. In Experimental Setup 2, three different techniques for predicting the author’s age and gender were compared using two learning models, i.e., the Vector space model and Query likelihood.
4 Results and Discussions In the first part, three different strategies are analyzed using the Vector Space and Query Likelihood models. In the second part, the performance of the algorithms is analyzed based on the strategies. Table 1 shows that the average accuracy of the first strategy is 69%. It has been discovered that some writing styles (emojis, digits, punctuation, special characters, and abbreviations) provide unique information that identifies the writing behavior of males and females, as shown in Table 2, which lists the topmost ranked training message documents against testing file no. 1; since the ranked files for testing file no. 1 carry a greater number of male labels, this testing file is classified as male, indicating that it shares more of this unique information. In the second strategy, extra information, i.e., white space, is added between the tokens and the stylistic expressions, as shown in Fig. 3. The addition of white space as extra information gives an average accuracy of 57%, as shown in Table 1. It decreases the average accuracy by 12 percentage points compared to strategy 1 (see Table 3), which reveals that adding extra information to the author's text messages can disturb the accuracy and the semantic structure of the text messages by changing the writing behavior of the authors. In the third strategy, the stylistic features of the author's writing behavior are removed from the text messages without adding extra information. The average accuracy of this strategy is 58%, as shown in Table 1. The result reveals that not adding extra information may keep the accuracy more stable, but removing stylistic expressions still disturbs it: as shown in Table 1, removing stylistic expressions decreases the accuracy by 11 percentage points compared to strategy 1. This means that by removing the stylistic expressions, one may lose an important source of information from the author's text messages (see Tables 2 and 3).

Table 1 Average accuracy of gender identification of author based on various strategies
Strategies | Query likelihood model (%) | Vector space model (%) | Average accuracy (%)
Considering stylistic features | 72 | 66 | 69
Considering stylistic features making spaces between tokens | 70 | 44 | 57
Removing stylistic features | 70 | 45 | 58

Table 2 Results of strategy 1
Testing file | Ranked files | Score | Author's gender
Test file 1 | File no 62 | 222.25193705047192 | Male
Test file 1 | File no 8 | 219.17079365448933 | Male
Test file 1 | File no 121 | 212.6895614378466 | Female
Test file 1 | File no 220 | 211.38446480688157 | Male
Test file 1 | File no 160 | 210.76168901099146 | Male

Table 3 Result of strategy 2
Testing file | Ranked files | Score | Author's gender
Test file 1 | File no 8 | 271.22134336941154 | Male
Test file 1 | File no 121 | 267.43985328126604 | Female
Test file 1 | File no 214 | 265.3586276760159 | Male
Test file 1 | File no 267 | 262.44556127925523 | Male
Test file 1 | File no 160 | 262.42980757929917 | Male
4.1 Age Analysis Three different age groups are analyzed in the experiment to identify the author's age group from the author's writing behavior in text messages. The age groups are as follows: Group 1: 15–19, Group 2: 20–24, Group 3: 25 onwards. In strategy 1, the age group of authors is analyzed by considering stylistic features, i.e., emojis, digits, punctuation, special characters, and abbreviations in the author's text messages. First, the system was trained on emojis by using a list of predefined emojis. Analyzing strategy 1 (see Tables 4 and 5) shows an average accuracy of 64%, which reveals that some common stylistic expressions may be shared across age groups; this may disturb the accuracy of predicting the correct age group for an author from the writing behavior in the author's text messages. In the second strategy, extra information, i.e., white space, is added between the tokens and the stylistic expressions. The addition of white space as extra information gives an average accuracy of 60%, decreasing the average accuracy by 4% compared to strategy 1. As shown in Table 5, this reveals that adding extra information to the author's text messages can disturb the accuracy and the semantic structure of the text messages by changing the writing behavior of authors of the same age group. In the third strategy, the stylistic features of the author's writing behavior are removed from the text messages without adding extra information. The average accuracy of this strategy is 65%, as shown in Table 5; removing the stylistic features increases the accuracy by 1%. The result reveals that not adding extra information may keep the accuracy stable, while removing stylistic expressions may even improve it. As Table 5 suggests, some common stylistic expressions are used by different age groups, which means stylistic expressions may not provide an important source of information in the case of the author's age (Tables 6, 7 and 8).

Table 4 Result of strategy 2
Testing file | Ranked files | Score | Author's gender
Test file 1 | File no 8 | 212.59566984485463 | Male
Test file 1 | File no 62 | 212.32997557500443 | Male
Test file 1 | File no 121 | 207.80014623237886 | Female
Test file 1 | File no 160 | 204.5628158047853 | Male
Test file 1 | File no 214 | 202.06059226005428 | Male

Table 5 Average accuracy of age identification based on different strategies
Features | Query likelihood model (%) | Vector space model (%) | KNN | Average accuracy (%)
Considering stylistic features | 62 | 66 | 22 | 50
With stylistic features, spaces between tokens | 64 | 56 | 26 | 48
Without stylistic features | 66 | 64 | 24 | 51.33
4.2 Age and Gender Combined Analysis The average accuracy for gender and age combined, for both algorithms, i.e., the Vector Space Model and Query Likelihood, is discussed based on all three considered strategies, as listed in Table 9. The average results of the listed strategies in Table 9 show that the Query Likelihood model is better, i.e., 67.33%, compared to the Vector Space Model, i.e., 57%, and KNN, i.e., 29%. The accuracy of the Query Likelihood model is 10.33% better than the Vector Space Model. The limitation of the Vector Space Model is the zero-multiplication problem: the term frequency (T.F.) score for a search token is zero whenever a matching word is not found in the author's text messages, which affects the similarity score. In the Query Likelihood model, the smoothing technique overcomes this limitation of the Vector Space Model. Due to the smoothing technique, the accuracy of the Query Likelihood Model is far better than that of the Vector Space Model in all the strategies listed in Table 9.

Table 6 Result of age strategy 1
Testing file | Ranked files | Score | Author's age group
Test file 1 | File no 62 | 219.17079365448933 | 20–24
Test file 1 | File no 8 | 212.6895614378466 | 20–24
Test file 1 | File no 121 | 211.38446480688157 | 20–24
Test file 1 | File no 220 | 211.38446480688157 | 20–24
Test file 1 | File no 160 | 210.76168901099146 | 25–xx

Table 7 Result of age strategy 2
Testing file | Ranked files | Score | Author's age group
Test file 1 | File no 8 | 271.22134336941154 | 20–24
Test file 1 | File no 121 | 267.43985328126604 | 20–24
Test file 1 | File no 214 | 265.3586276760159 | 25–xx
Test file 1 | File no 267 | 262.44556127925523 | 15–19
Test file 1 | File no 160 | 262.42980757929917 | 25–xx

Table 8 Result of age strategy 3
Testing file | Ranked files | Score | Author's age group
Test file 1 | File no 8 | 271.22134336941154 | 20–24
Test file 1 | File no 121 | 267.43985328126604 | 20–24
Test file 1 | File no 214 | 265.3586276760159 | 20–24
Test file 1 | File no 267 | 262.44556127925523 | 25–xx
Test file 1 | File no 160 | 262.42980757929917 | 25–xx
Table 9 Average accuracy for gender and age identification using the query likelihood model and vector space model
Features | Query likelihood model (%) | Vector space model (%) | KNN (%)
Considering stylistic features | 67 | 66 | 21
With stylistic features, spaces between tokens | 67 | 50 | 31
Without stylistic features | 68 | 55 | 36
Average accuracy | 67.33 | 57 | 29

5 Conclusion The Query Likelihood model outperformed the other approaches, with an average accuracy of 67.33% compared to VSM and KNN. VSM performed poorly because of the zero-multiplication problem discussed in Sect. 3: the term frequency (T.F.) score for a search token is zero whenever a matching word is not found in the author's text messages, which affects the similarity score. KNN, in turn, is a lazy learner, i.e., it does not learn anything from the training data and simply uses the training data itself for classification. To predict the label of a new instance, the KNN algorithm finds the K closest neighbors to the new instance in the training data, and the predicted class label is then set as the most common label among those K closest neighboring points. Further, changing K can change the resulting predicted class label.
References 1. Akram Chughtai R (2021) Author region identification for the Urdu language (Doc. dissertation, Dep. of Computer science, COMSATS University Lahore) 2. Alam M, Hussain SU (2022) Roman-Urdu-Parl: Roman-Urdu and Urdu parallel corpus for Urdu language understanding. Trans Asian Low-Resour Lang Inf Process 21(1):1–20 3. Alowibdi JS, Buy UA, Yu P (2013) Language independent gender classification on Twitter. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, pp 739–743 4. Ameer I, Sidorov G, Nawab RMA (2019) Author profiling for age and gender using combinations of features of various types. J Intell Fuzzy Syst 36:4833–4843 5. Bilal M, Israr H, Shahid M, Khan A (2016) Sentiment classification of roman-Urdu opinions using näıve Bayesian, decision tree, and KNN classification techniques. J King Saud UnivComput Inf Sci 28:330–344 6. Bilal A, Rextin A, Kakakhel A, Nasim M (2017) Roman-txt: forms and functions of roman Urdu texting. In: Proceedings of the 19th international conference on HCI with mobile devices and services, pp 1–9 7. Biswas B, Bhadra S, Sanyal MK, Das S (2018) Cloud adoption: a future road map for Indian SMEs. In: Intelligent engineering informatics. Springer, pp 513–521 8. Ciot M, Sonderegger M, Ruths D (2013) Gender inference of Twitter users in non-English contexts. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1136–1145 9. Daud M, Khan R, Daud A et al (2015) Roman Urdu opinion mining system (rooms). arXiv preprint arXiv:1501.01386 10. Estival D, Gaustad T, Pham SB, Radford W, Hutchinson B (2007) Author profiling for English emails. In: Proceedings of the 10th conference of the Pacific Association for computational linguistics, pp 263–272
11. Fatima M, Anwar S, Naveed A, Arshad W, Nawab RMA, Iqbal M, Masood A (2018) Multilingual SMS-based author profiling: data and methods. Nat Lang Eng 24:695–724 12. Fatima M, Hasan K, Anwar S, Nawab RMA (2017) Multilingual author profiling on Facebook. Inform Process Manag 53:886–904 13. Guglielmi G, De Terlizzi F, Torrente I, Mingarelli R, Dallapiccola B (2005) Quantitative ultrasound of the hand phalanges in a cohort of monozygotic twins: influence of genetic and environmental factors. Skele-Tal Radiol 34:727–735 14. Khan S, Ullah R, Khan A, Wahab N, Bilal M, Ahmed M (2016) Analysis of dengue infection based on Raman spectroscopy and support vector machine (SVM). Biomed Opt Express 7:2249–2256 15. Koppel M, Argamon S, Shimoni AR (2002) Automatically categorizing written texts by author gender. Lit Linguist Comput 17:401–412 16. Krenek J, Kuca K, Blazek P, Krejcar O, Jun D (2016) Application of artificial neural networks in condition-based predictive maintenance. Recent developments in intelligent information and database systems, pp 75–86 17. Kurochkin I, Saevskiy A (2016) Boinc forks, issues, and directions of de-development. Procedia Comput Sci 101:369–378 18. Mechti S, Jaoua M, Faiz R, Bouhamed H, Belguith LH (2016) Author profiling: age prediction based on advanced Bayesian networks. Res Comput Sci 110:129–137 19. Mehmood K, Afzal H, Majeed A, Latif H (2015) Contributions to the study of bi-lingual roman Urdu SMS spam filtering. In: 2015 National software engineering conference (NSEC). IEEE, pp 42–47 20. Mikros GK (2012) Authorship attribution and gender identification in Greek blogs. Methods Appl Quant Linguist 21:21–32 21. Mukund S, Srihari RK (2012) Analyzing urdu social media for sentiments using transfer learning with controlled translations. In: Proceedings of the second workshop on language in social media, pp 1–8 22. Nemati A (2018) Gender and age prediction multilingual author profiles based on comments. In: FIRE (Working Notes), pp 232–239 23. Ogaltsov A, Romanov A (2017) Language variety and gender classification for author profiling in pan 2017. In: CLEF (Working notes) 24. Peersman C, Daelemans W, Van Vaerenbergh L (2011) Predicting age and gender in online social networks. In: Proceedings of the 3rd international workshop on search and mining user-generated contents, pp 37–44 25. Plank B, Hovy D (2015) Personality traits on Twitter—or—how to get 1,500 personality tests in a week. In: Proceedings of the 6th workshop on computational approaches to subjectivity, sentiment, and social media analysis, pp 92–98 26. Quirk GJ, Mueller D (2008) Neural mechanisms of extinction learning and retrieval. Neuropsychopharmacology 33:56–72 27. Rangel F, Herna´ndez I, Rosso P, Reyes A (2014) Emotions and irony per gender in Facebook. In: Proceedings of workshop ES3LOD, LREC, pp 1–6 28. Rangel F, Rosso P, Koppel M, Stamatatos E, Inches G (2013) Overview of the author profiling task at pan 2013. In: CLEF conference on multilingual and multimodal information access evaluation, CELCT, pp 352–365 29. Rangel F, Rosso P, Potthast M, Stein B, Daelemans W (2015) Overview of the 3rd author profiling task at pan. In: Poceedings of CLEF, sn. p. 30. Rao D, Yarowsky D, Shreevats A, Gupta M (2010) Classifying latent user attributes in Twitter. In: Proceedings of the 2nd international workshop on search and mining user-generated contents, pp 37–44 31. Rosenthal S, McKeown K (2011) Age prediction in blogs: a study of style, content, and online behavior in pre-and post-social media generations. 
In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 763–772 32. Safdar Z, Bajwa RS, Hussain S, Abdullah HB, Safdar K, Draz U (2020) The role of Roman Urdu in multilingual information retrieval: a regional study. J Acad Librariansh 46(6):102258
33. Sap M, Park G, Eichstaedt J, Kern M, Stillwell D, Kosinski M, Ungar L, Schwartz HA (2014) Developing age and gender predictive lexica over social media. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1146–1151 34. Schler J, Koppel M, Argamon S, Pennebaker JW (2006) Effects of age and gender on blogging. In: AAAI spring symposium: computational approaches to analyzing weblogs, pp 199–205 35. Sittar A, Ameer I (2018) Multilingual author profiling using stylistic features. In: FIRE (Working Notes), pp 240–246 36. Tudisca S, Di Trapani AM, Sgroi F, Testa R (2013) Marketing strategies for Mediterranean wineries competitiveness in the case of Pantelleria. Calitatea 14:101 37. Verhoeven B, Plank B, Daelemans W (2016) Multilingual personality profiling on twitter. In: To be presented at DHBenelux 2016 38. Wanner L et al (2017) On the relevance of syntactic and discourse features for author profiling and identification. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 2, short papers, pp 681–687 39. Zhang W, Caines A, Alikaniotis D, Buttery P (2016) Predicting author age from Weibo microblog posts. In: Proceedings of the tenth international conference on language resources and evaluation, pp 2990–2997
Prioritizing Educational Website Resource Adaptations: Data Analysis Supported by the k-Means Algorithm Luciano Azevedo de Souza, Michelle Merlino Lins Campos Ramos, and Helder Gomes Costa
Abstract As part of the COVID-19 pandemic control measures, with the rapid shift from face-to-face classroom systems to remote models, virtual learning environments and academic administration websites have become crucial. The difficulty of changing them in a smart and nimble manner arises when assessing the major needs of their users. Accordingly, our article attempted to reveal these needs using a survey answered by 36 of the 80 students enrolled in a specific MBA course. The data were examined using clustering methods and statistical analysis. The primary findings were that iOS had worse performance than Android, and users who chose desktop computers reported greater usability than those who preferred mobile devices. The suggested activity prioritization considered responsiveness on iOS as a priority, following the declared relevance order and the inverted usability order. Keywords k-means algorithm · Covid-19 · Education · Responsivity · Clustering
L. A. de Souza (B) · M. M. L. C. Ramos · H. G. Costa, Universidade Federal Fluminense, Niterói, RJ 24210-240, Brazil; e-mail: [email protected]
1 Introduction The COVID-19 pandemic changed higher education in a variety of ways, ranging from the learning tools and models that institutions are adopting to the needs and expectations of the current and future workforce. Therefore, higher education will most likely never be the same [13]. One of the changes that most affected the learning process was the wholesale adoption of distance learning in place of face-to-face learning. Distance learning is a unique solution to keep the education system running during critical times [1]. Modern education prepares students for
effective activities by emphasizing knowledge and the ability to apply it [12]. Because students are no longer restricted to the traditional classroom, mobile technology in education has an impact on learning [3]. The use of mobile devices in education provides both opportunities and challenges [7]: the accessibility and opportunities provided by this technology demonstrate the benefits of m-learning, while also exposing its main issues. The main pedagogical challenge is determining what works better in the classroom, what should be learned outside it, and how both can coexist [2, 4, 8, 11, 15]. The students' use of advanced mobile platforms running operating systems such as iOS or Android in the educational system has created new challenges and increased the opportunities to exploit these devices in education, given the characteristics and features that a mobile phone offers and the new experience it provides to students in the classroom environment [14]. This study focused on mapping the features of the institution's website and on surveying the usability and relevance of such features, considering access via both notebooks and mobile devices, covering the iOS and Android operating systems. The k-means algorithm supported the identification of critical features to improve the website's usability.
2 Proposed Methodology The methodological procedures used in this work are shown in Fig. 1. As an initial step, the existing site was consulted to map its active resources. The mapped functionalities are described in Table 1, with the respective acronyms adopted for representation in this work. The data collection instrument was organized in a Google form. First, the respondent was asked about the general usability of the website and the usability of each feature, with Likert scale options 1-terrible, 2-poor, 3-regular, 4-good, 5-excellent. Next, the respondent was asked about the relevance of each feature; the alternatives offered for this question were 1-very low, 2-low, 3-average, 4-high, and 5-very high. Next, questions were asked about how often each resource was used from mobile devices such as tablets and smartphones.
Fig. 1 Methodological procedures: mapping of existing features; construction of the data collection instrument; data collection; general data analysis; cluster analysis (k-means); clustered data analysis; final considerations
Table 1 Mapped functionalities

Description | Acronym
Calendar of Classes | CALEND
Test scores in the subjects studied | TSCORES
Communication with course coordination | COORD
Communication with classmates | CLASSMATES
Communication with MBA professors | PROFESSORS
Delivering homework and evaluative activities | WORKDELIVERY
Participation in subject discussion forums | FORUMS
Information about other courses | OCOURSES
Read Texts | TEXTS
Get learning materials (Download) | LMATERIALS
The response options followed the scale 1-never, 2-rarely, 3-sometimes, 4-always, 5-often. The participant was also asked which operating system they use on their mobile device. The survey was conducted to capture the opinion of the 80 students of an active specialization course in a Brazilian educational organization. The Google questionnaire was made available in the WhatsApp group in which these students participated, between May 1st and June 4th, 2020, a period when restrictive measures to control COVID-19 implied an abrupt migration to the remote learning system. We obtained 36 voluntary contributions. For data analysis, Minitab (version 17.1.0) and R (R-4.2.1) with RStudio (version 2022.02.3 Build 492) were used. In R, the packages "ggplot2", "likert", "cluster", and "factoextra" were used, and the packages "openxlsx" and "writexl" were used for loading and saving data with MS Excel. The procedures were carried out on a PC with a 64-bit Windows 10 operating system, 8 GB RAM, a 2.80 GHz Intel(R) Core(TM) i5-8400 CPU, and the RStudio 2022.02.3 Build 492 (R 4.1.3) environment.
3 Experimental Results Figure 2 shows the respondents' evaluations of the overall usability of the existing website. We observed that the general evaluation did not include any "terrible" score and had values concentrated between "regular" and "good". As for the mobile operating system, Android prevails with 66.7%; only 22% of respondents prefer to access the website through mobile devices.
Fig. 2 Preferred device, and mobile Op. System
3.1 Likert Evaluation The features of the existing website were evaluated by the respondents and the result of the survey is shown in Fig. 3. The analyzed variables are ordered by decreasing balance, comparing the share of higher scale values (4, 5) against the lower values (1, 2), with the value 3 centered. The resources with the worst usability results among the consulted users were "access to teaching materials", "homework delivery", and "contact with the teacher". Figure 4 shows the results for the relevance of each feature.
Fig. 3 Usability of features in existing website, obtained by running R package Likert
Fig. 4 Relevance of features in existing website, obtained by running R package Likert
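The "balance" used to order the features in Figs. 3 and 4 is produced by the R likert package; purely as an illustration, a small pandas sketch computing an equivalent top-box minus bottom-box balance is shown below. The column names follow the Table 1 acronyms, the response values are made-up placeholders rather than the survey data, and treating the balance as (share of ratings 4-5) minus (share of ratings 1-2) is an assumption about the plotted quantity.

```python
import pandas as pd

# Illustrative 1-5 Likert responses; columns named after Table 1 acronyms.
responses = pd.DataFrame({
    "LMATERIALS":   [2, 3, 2, 4, 1, 3, 2, 5],
    "WORKDELIVERY": [3, 2, 3, 4, 2, 3, 3, 4],
    "CALEND":       [4, 5, 4, 3, 4, 5, 4, 3],
})

def likert_balance(col):
    """Share of high ratings (4-5) minus share of low ratings (1-2), in percent."""
    high = col.isin([4, 5]).mean()
    low = col.isin([1, 2]).mean()
    return 100 * (high - low)

balance = responses.apply(likert_balance).sort_values()
print(balance)  # features with the most negative balance have the worst usability
```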
The ranking of relevance shows more pronounced differences between features than the prior (usability) evaluation, which was more balanced. "Access to learning materials" and "homework delivery" were two of the most relevant aspects that also had the lowest usability, followed by access to evaluation scores and text reading. Table 2 was organized so that the features are shown according to the ordered usability balance, then the inverted order of usability (since we must prioritize development of the features with the worst perception by users), and then the relevance in direct order. The product of the relevance rank and the inverted usability rank indicates a prioritization weighting of both dimensions, and the results are ordered in the last column. From this tabulation we obtain an order of the functionalities to be adapted as a priority (see the sketch after Table 2). However, it does not take into consideration other information provided by the survey participants, such as the preferred device for access and the operating system of the mobile devices they use. The respondents also answered how often they used each feature; comparing the frequency declared for mobile devices and for desktops produced the variable "preference of device". The stratification of the overall evaluation by preferred device type is shown in Fig. 5.

Table 2 Usability x relevance

Feature | Usability of existing website | Inverted usability | Relevance of features | Inverted usability x Relevance | Prioritizing order
Contact coordinator | 1 | 10 | 5 | 50 | 3
Contact classmates | 2 | 9 | 10 | 90 | 1
Text reading | 3 | 8 | 4 | 36 | 5
Participate in discussion forums | 4 | 7 | 9 | 63 | 2
Consult evaluation grades | 5 | 6 | 3 | 18 | 7
Information about other courses | 6 | 5 | 8 | 40 | 4
Consult events calendar | 7 | 4 | 7 | 28 | 6
Access learning materials | 8 | 3 | 1 | 3 | 10
Delivery homework | 9 | 2 | 2 | 4 | 9
Contact with professor | 10 | 1 | 6 | 6 | 8
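As referenced above, the Table 2 prioritization is a rank product: invert the usability rank, multiply it by the relevance rank, and sort in descending order. A minimal sketch using the ranks from Table 2 (it reproduces the prioritizing order listed later in Table 4):

```python
# Feature: (usability rank, relevance rank), as in Table 2.
ranks = {
    "Contact coordinator": (1, 5),
    "Contact classmates": (2, 10),
    "Text reading": (3, 4),
    "Participate in discussion forums": (4, 9),
    "Consult evaluation grades": (5, 3),
    "Information about other courses": (6, 8),
    "Consult events calendar": (7, 7),
    "Access learning materials": (8, 1),
    "Delivery homework": (9, 2),
    "Contact with professor": (10, 6),
}

n = len(ranks)
# Inverted usability rank x relevance rank.
scores = {feat: (n + 1 - usab) * rel for feat, (usab, rel) in ranks.items()}
priority = sorted(scores, key=scores.get, reverse=True)
for order, feature in enumerate(priority, start=1):
    print(f"{order}. {feature}")
```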
Fig. 5 General evaluation of usability by preference of device
To go further in prioritizing features, we have broken down the evaluation of each feature by the device most often used for access. This result is represented in Fig. 6. Except for viewing the calendar of activities, participating in discussion forums, and contacting classmates and the coordination, it is noticeable that the usability of the features is rated worse on mobile devices. To better understand the reasons, we analyzed the overall evaluation stratified by the operating system used for access. Figures 7 and 8 show the general evaluation of these aspects. The overall information is ambiguous: on the one hand, there is a concentration of responses rating the usability of the existing website as 2 (poor) for iOS users, with a higher concentration of responses 3 (regular) and 4 (good) for Android users, but the rating 5 (excellent) accounts for 25% of the responses from iOS users. To refine the prioritization, we then applied clustering techniques for a more robust analysis of the data.
3.2 K-Means Clustering As shown in Figs. 3 and 4, the relevance ratings were more differentiated than the usability ratings, which were balanced across features. Thus, we took as the database for cluster identification the set of answers that indicated relevance. The purpose of the k-means method is to classify data by structuring the set into subsets whose elements show intra-group similarities and differences with respect to the other groups [5, 6, 9]. We utilized three ways to determine the number of groups needed to segregate the data: Elbow (Fig. 9a), Gap Stat (Fig. 9b), and Silhouette (Fig. 9c).
Fig. 6 Individual feature usability evaluation by preference of device
Fig. 7 General evaluation of usability by Mobile Op. System
The Elbow method [5] indicated k = 3, the Gap Stat method [16] suggested k = 7, and the Silhouette method [10] indicated k = 10. To define the analysis, we did a visual comparison of the data with k ranging from 2 to 5, as shown in Fig. 10. We chose to classify the data into three clusters because of the proximity within groups and the separation between groups. Figure 11 depicts a visualization of the clustering of the observations. We separated clusters with the following numbers of observations: Cluster 1 with 10 elements, Cluster 2 with 16 elements, and Cluster 3 with 10 elements. Again, we plot the overall usability evaluation by cluster, which can be seen in Fig. 12. We verify that cluster 1 (with 10 respondents) attributes less relevance to all features. Cluster 2, with 16 members, considers the features "Visualization of the activity calendar", "Access to evaluation results", "Access to teaching materials", and "Homework delivery" as highly relevant. To investigate the possible relationship with the preferred device and operating system, we organized the data by cluster in Table 3. We expected greater differentiation in device and operating system preferences in cluster 2, particularly for the iOS system. However, among the 12 iOS users, only 1 (in cluster 2) preferred the mobile device. Thus, our suggested priority list, Table 4, focuses on relevance versus inverted usability. The focus should be on responsiveness, making it possible to use the mobile device with better adaptability (Fig. 13).
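The clustering itself was run in R (packages cluster and factoextra); purely as an illustration, a minimal scikit-learn sketch of the same workflow is given below. The relevance matrix here is a random placeholder with the survey's dimensions (36 respondents x 10 features), and inertia/silhouette stand in for the Elbow and Silhouette criteria discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Placeholder relevance matrix: 36 respondents x 10 features, Likert scores 1-5.
X = rng.integers(1, 6, size=(36, 10)).astype(float)

# Compare candidate numbers of clusters (k = 2..5, as in Fig. 10).
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))

# Final model with k = 3, as chosen in the paper.
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km3.labels_))
```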
Fig. 8 Individual feature usability evaluation by Mobile Op. System
Fig. 9 Optimal number of clusters
Fig. 10 Visual representations of clustering with k = 2 to k = 5
Fig. 11 Clustering data using k-means (k = 3)
Fig. 12 General evaluation of usability by cluster
Table 3 Preferred device and Op. System by cluster

Preferred device and Op. System | Cluster 1 | Cluster 2 | Cluster 3 | Total
DESKTOP | 4 | 9 | 7 | 20
Android | 3 | 6 | 5 | 14
iOS | 1 | 3 | 2 | 6
EQUAL | 5 | 2 | 1 | 8
Android | 2 | 1 | 0 | 3
iOS | 3 | 1 | 1 | 5
MOBILE | 1 | 5 | 2 | 8
Android | 1 | 4 | 2 | 7
iOS | 0 | 1 | 0 | 1
Table 4 Prioritizing order

Feature | Prioritizing order of development
Contact classmates | 1st
Participate in discussion forums | 2nd
Contact coordinator | 3rd
Information about other courses | 4th
Text reading | 5th
Consult events calendar | 6th
Consult evaluation grades | 7th
Contact with professor | 8th
Delivery homework | 9th
Access learning materials | 10th
Fig. 13 Individual usability evaluation of feature by cluster
4 Final considerations The goal of this work was to prioritize the features of the existing educational website in view of the rapidly changing access profile during the COVID-19 pandemic, when users were forced to shift abruptly to a remote model. Statistical analysis approaches, such as Likert analysis using the R likert package and clustering with the k-means algorithm, were used. Usability on iOS was the worst for all features, the cluster analysis showed that desktop users were the most satisfied with the features, and the development priority list was built considering relevance and inverted usability. We suggest extending this research to mass-education sites. Acknowledgements This research was partially supported by: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Brazil), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil),
Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ, Brazil)
References 1. Azhari B, Fajri I (2021) Distance learning during the COVID-19 pandemic: School closure in Indonesia. Int J Math Educ Sci Technol. https://doi.org/10.1080/0020739X.2021.1875072 2. Belle LJ (2019) An evaluation of a key innovation: mobile learning. Acad J Interdiscip Stud 8(2):39–45. https://doi.org/10.2478/ajis-2019-0014 3. Bleustein-Blanchet M (2016) Lead the change. Train Ind Mag, 16–41 4. Criollo-C S, Guerrero-Arias A, Jaramillo-Alcázar Á, Luján-Mora S (2021) Mobile learning technologies for education: Benefits and pending issues. Appl Sci (Switz) 11(9). https://doi.org/10.3390/app11094111 5. Cuevas A, Febrero M, Fraiman R (2000) Estimating the number of clusters. Can J Stat 28:2 6. de Souza LA, Costa HG (2022) Managing the conditions for project success: an approach using k-means clustering. In: Lect Notes Netw Syst 420 LNNS. https://doi.org/10.1007/978-3-030-96305-7_37 7. de Oliveira CF, Sobral SR, Ferreira MJ, Moreira F (2021) How does learning analytics contribute to prevent students' dropout in higher education: A systematic literature review. Big Data Cogn Comput 5(4):64. https://doi.org/10.3390/bdcc5040064 8. Ramos MMLC, Costa HG, Azevedo G da C (2021) Information and communication technologies in the educational process. IGI Global, pp 329–363. https://doi.org/10.4018/978-1-7998-8816-1.ch016 9. Jain AK (2009) Data clustering: 50 years beyond K-means. https://doi.org/10.1016/j.patrec.2009.09.011 10. Kaufman L, Rousseeuw PJ (2005) Finding groups in data: an introduction to cluster analysis 11. Mierlus-Mazilu I (2010) M-learning objects. In: ICEIE 2010 – 2010 International Conference on Electronics and Information Engineering, Proceedings, 1. https://doi.org/10.1109/ICEIE.2010.5559908 12. Noskova T, Pavlova T, Yakovleva O (2021) A study of students' preferences in the information resources of the digital learning environment. J Effic Responsib Educ Sci 14(1):53–65. https://doi.org/10.7160/eriesj.2021.140105 13. Pelletier K, McCormack M, Reeves J, Robert J, Arbino N, Al-Freih M, Dickson-Deane C, Guevara C, Koster L, Sánchez-Mendiola M, Skallerup Bessette L, Stine J (2022) 2022 EDUCAUSE Horizon Report® Teaching and Learning Edition. https://www.educause.edu/horizon-report-teaching-and-learning-2022 14. Salinas-Sagbay P, Sarango-Lapo CP, Barba R (2020) Design of a mobile application for access to the remote laboratory. Commun Comput Inf Sci 1195 CCIS, 391–402. https://doi.org/10.1007/978-3-030-42531-9_31 15. Shuja A, Qureshi IA, Schaeffer DM, Zareen M (2019) Effect of m-learning on students' academic performance mediated by facilitation discourse and flexibility. Knowl Manag E-Learn 11(2):158–200. https://doi.org/10.34105/J.KMEL.2019.11.009 16. Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B: Stat Methodol 63(2):411–423. https://doi.org/10.1111/1467-9868.00293
Voice Operated Fall Detection System Through Novel Acoustic Std-LTP Features and Support Vector Machine Usama Zafar, Farman Hassan, Muhammad Hamza Mehmood, Abdul Wahab, and Ali Javed
Abstract The ever-growing old age population in the last two decades has introduced new challenges for elderly people such as accidental falls. An accidental fall in elderly persons results in lifelong injury, which has extremely severe consequences for the remaining life. Furthermore, continued delay in the treatment of elderly persons after accidental fall increases the chances of death. Therefore, early detection of fall incidents is crucial to provide first aid and avoid the expenses of hospitalization. The major aim of this research work is to provide a better solution for the detection of accidental fall incidents. Most automatic fall event detection systems are designed for specific devices that decrease the flexibility of the systems. In this paper, we propose an automated framework that detects in-door fall events of elderly people in the real-time environment using a novel standard deviation local ternary pattern (Std-LTP). The proposed Std-LTP features are able to capture the most discriminatory characteristics from the sounds of fall events. For classification purposes, we employed the support vector machine (SVM) to distinguish the indoor fall occurrences from the non-fall occurrences. Moreover, we have developed our fall detection dataset that is diverse in terms of speakers, gender, environments, sample length, etc. Our method achieved an accuracy of 93%, precision of 95.74%, recall of 90%, and F1-score of 91.78%. The experimental results demonstrate that the proposed system successfully identified both the fall and non-fall events in various indoor environments. Subsequently, the proposed system can be implemented on various devices that can efficiently be used to monitor a large group of people. Moreover, the proposed system can be deployed in daycare centers, old homes, and for patients in hospitals to get immediate assistance after a fall incident occurs. Keywords Fall event · Machine learning · Non-fall event · Std-LTP · SVM
U. Zafar · F. Hassan · M. H. Mehmood · A. Wahab · A. Javed (B) University of Engineering and Technology, Taxila, Pakistan; e-mail: [email protected]
1 Introduction The population of aged people around the world is increasing at a rapid pace because of the advancements made in the medical field. As reported by the United Nations World Population Ageing survey (United Nations, 2020), there were around 727 million people aged 65 years or more in 2020, and it is expected that this figure will double by 2050. One of the most prevalent causes of injuries is an accidental fall. Old-age people are most affected by accidental falls, which happen for various reasons such as unstable furniture, slippery floors, poor lighting, obstacles, etc. It is very common in many countries that elderly persons live alone in their homes without the presence of relatives or nurses. The most devastating effect of a fall incident on elderly people is that they may lie on the floor unattended for a long interval of time; as a result, they develop long-lasting injuries that in some cases can even lead to death. A research study finds that these fall incidents of old people cost millions of Euros to the UK government [1]. The risk factors of accidental falling increase with old age, as do the costs of treatment and care [2]. According to one report [3], people aged 65 years or older are likely to fall once a year, and some of them may fall more than once. These statistics demand reliable automated fall detection systems built with modern-day technology to help reduce the after-effects of fall incidents and to provide immediate first-aid support to the concerned person. The research community has explored motion-sensor and acoustic-sensor-based fall detection systems for elderly people using several techniques implemented in wearable devices, i.e., smart watches, smart shoes, smart belts, smart bracelets, and smart rings. In Yacchirema et al. [4], 6LoWPAN, sensor networks, and cloud computing were used for the detection of fall events. Four machine learning classifiers, i.e., logistic regression, ensemble, deepnets, and decision trees, were employed for classification purposes, and the ensemble performed best using SisFall, sliding windows, Signal Magnitude Area (SMA), and Motion-DT features. The healthcare professional receives the notification through a secure and lightweight protocol. In Giansanti et al. [5], a mono-axial accelerometer was utilized to calculate the acceleration of different parts of the body with the aim of detecting any mishap. Acceleration is one of the important parameters that can be used to observe the motion of the body; a mono-axial accelerometer measures the vertical acceleration of a person's body to detect the fall event. However, elderly people need to wear the accelerometer all the time, which greatly affects their daily activities and routine lives. In a study by Muheidat [6], sensor pads were placed under the carpet to monitor aged persons, and computational intelligence techniques, i.e., convex hull and heuristics, were used to detect fall events. In a study by [7], instead of wearable devices, falls were detected using smart textiles with the help of a non-linear support vector machine; to analyze the audio signal, the Gabor transform was applied in the time and frequency domains to derive new features known as wavelet energy. In the study by [8], a fall detection system was developed using an acoustic sensor placed on the z-axis to detect the pitch of the audio. However, this method has a
limitation, as only a single person is allowed in the locality. Moreover, elderly people are unable to carry the sensors all the time. This concern was addressed in [9], where two types of sensors were used, i.e., a body sensor and a fixed sensor at home. Both the body sensor and the fixed sensor were used at the same time; at home, the fixed sensors can also work independently if a person is unable to carry body sensors. A mixed positioning algorithm was used to determine the position of the person, which is used to decide the fall event. The research community has also explored vision-based techniques for fall detection. In visual surveillance, applying background subtraction is a quite common approach to discriminate moving objects. In the study by Yu et al. [10], vision-based fall detection was proposed in which background subtraction was applied for extraction of the human body silhouette. The extracted silhouettes were fed into a CNN for detecting both fall occurrences and non-fall occurrences. In the study by [11], a Gaussian mixture model was utilized for observing the appearance of a person during the video sequence; this approach detects the fall event in case of any deformation found in the shape of the concerned person. In the study by Cai et al. [12], a vision-based fall detection system was proposed in which hourglass residual units were introduced to extract multiscale features, and a SoftMax classifier was used for the categorization of both fall and non-fall events. In the study by Zhang et al. [13], the YOLACT network was applied to the video stream to distinguish different human activities and postures, and a convolutional neural network (CNN) was designed for the classification of fall vs non-fall events. Vision-based fall detection systems are widely used; however, these systems have certain limitations, i.e., privacy issues, difficulty detecting real-life falls because of the dataset, high cost because high-resolution cameras are required to cover the entire room, and computational complexity due to the processing of millions of video frames. The research community has also explored various machine learning and spectral features-based fall detection systems for the detection of fall occurrences and non-fall occurrences [14–17]. In the study by [14], the Hidden Markov model was used to determine the fall event. The Mel frequency cepstral coefficients (MFCC) features are capable of extracting prominent information from audio signals and are used in different research works [15, 18–22]. In the study by [22], MFCC features were used to train a Nearest Neighbor (NN) classifier for the categorization of fall occurrences and non-fall occurrences. In the study by [23], MFCC, Gammatone cepstral coefficients (GTCC), and skew-spectral features were used for extracting features, and a decision tree was used for the classification of fall and non-fall events. In the study by Shaukat et al. [21], MFCC and Linear Predictive Coding (LPC) were utilized for the voice recognition of elderly persons, and an ensemble classifier was employed for classification purposes on the daily sound recognition (AudioSet 2021) and RWCP (Open SLR 2021) datasets. In the study by [15], MFCC features were used with a one-class support vector machine method (OCSVM) for the classification of fall and non-fall sounds. In our prior work [15], we proposed an acoustic-LTP features-based approach with the SVM classifier for fall event detection. This method was more effective than MFCC in terms of computational cost and was also rotationally invariant. Although the above-mentioned sensor-based, acoustic-based,
and computer vision-based fall detection systems achieve good detection of fall events, different restrictions are still present in modern methods: some fall detection systems can be implemented merely in wearable devices, some frameworks are only sensor-based, which makes it difficult for elderly people to carry the body sensors all the time, and computer vision-based fall detection systems have privacy concerns, high computational costs, and fail to detect falls if the server fails in a client-server architecture. So, there is a need to develop automated fall event detection systems that are robust to the above-mentioned limitations. The major contributions of our study are as follows:
• We present a novel audio feature, Std-LTP, that is capable of extracting the most discriminative characteristics from the input audio.
• We present an effective voice-operated fall detection system that can reliably be utilized for determining fall occurrences.
• We created our own in-house audio fall event dataset that is diverse in terms of speakers, speaker gender, environment, etc.
The remaining paper is organized as follows. In Sect. 2, we discuss the proposed methodology. In Sect. 3, the experimental results are discussed, whereas we conclude our work in Sect. 4.
2 Proposed Methodology The main goal of the designed system is to identify fall occurrences and non-fall occurrences from audio clips. Feature extraction and audio classification are the two steps involved in the proposed system. Initially, we extract our 20-dimensional Std-LTP features from the audio input and then use all 20 dimensions to classify fall occurrences and non-fall occurrences. For classification purposes, we employ the SVM. The flow diagram of the designed system is given in Fig. 1.
2.1 Feature Extraction The extraction of features is critical for designing an efficient categorization system. The process of feature extraction of the proposed method is explained in the following section.
Fig. 1 Proposed system
2.2 Std-LTP Computation In the proposed work, we present the Std-LTP feature descriptor to extract the characteristics of fall occurrences and non-fall occurrences from the audio. We obtain the 20-dimensional Std-LTP features from the audio signal y[n]. To extract features using Std-LTP, the audio signal is divided into multiple windows (Wc). We compute the Std-LTP by encoding each Wc of the audio signal y[n]. The total number of windows is obtained by dividing the number of samples by 9. Each Wc comprises nine samples, which are used to generate the ternary codes. Initially, we compute the threshold value of each window Wc. In the prior study [22], acoustic-LTP utilized a static threshold value for each Wc, which does not take into account the local statistics of the samples of each Wc. In this paper, we compute the threshold using the local statistics of the samples around the central sample c in each Wc. We compute the threshold by calculating the standard deviation of each Wc and multiplying it by a scaling factor α, so the threshold varies for each Wc. The standard deviation of each Wc is computed as:

\sigma = \sqrt{\frac{\sum_{i=0}^{8} (q_i - \mu)^2}{N}}   (1)

where σ is the standard deviation, q_i is the value of each sample in the Wc, μ is the mean of the nine values of that Wc, and N is the number of samples, which is nine. The threshold is calculated as:

th = \sigma \cdot \alpha   (2)

where α is the scaling factor and 0 < α ≤ 1. We used α = 0.5 in our work because we achieved the best results with this setting. We compare c with the corresponding neighboring values. To achieve this, we quantify the magnitude difference between c and the neighboring samples. Values of samples greater than c + th are set to 1, those smaller than c − th are set to −1, whereas values between c − th and c + th are set to 0. Hence, we obtain the ternary codes as:

f(q_i, c, t) = \begin{cases} +1, & q_i \ge (c + \sigma\alpha) \\ 0, & (c - \sigma\alpha) < q_i < (c + \sigma\alpha) \\ -1, & q_i \le (c - \sigma\alpha) \end{cases}   (3)

where f(q_i, c, t) is the function representing the ternary codes. For instance, consider a frame having 9 samples as shown in Fig. 2. We compute the standard deviation of the Wc, which is σ ≈ 6 in this case. Next, we multiply the standard deviation by the scaling factor of 0.5 to get the threshold value, σ · α = 6 · 0.5 = 3. So, values greater than 33 are set to +1, values less than 27 are set to −1, and values between 27 and 33 are set to 0. In this way, the ternary code of the vector having nine values is generated.

Fig. 2 Feature extraction

Next, we compute the upper and lower binary codes. For the upper codes, we set the value to 1 where the ternary code is +1, and values of 0 and −1 are set to zero:

f_u(q_i, c, t) = \begin{cases} 1, & f(q_i, c, t) = +1 \\ 0, & \text{otherwise} \end{cases}   (4)

For the lower codes, we set all values of −1 to 1, and 0 and +1 to 0:

f_l(q_i, c, t) = \begin{cases} 1, & f(q_i, c, t) = -1 \\ 0, & \text{otherwise} \end{cases}   (5)

We transform these upper and lower codes into decimal values as:

T_{up} = \sum_{i=0}^{7} f_u^{uni}(q_i, c, t) \cdot 2^i   (6)

T_{lp} = \sum_{i=0}^{7} f_l^{uni}(q_i, c, t) \cdot 2^i   (7)

Histograms are calculated for the upper and the lower codes as follows:

h_u(k) = \sum_{w=1}^{W} \delta(T_{w(u)}, k)   (8)

h_l(k) = \sum_{w=1}^{W} \delta(T_{w(l)}, k)   (9)

where k denotes the histogram bins. We used ten patterns for the upper and lower binary codes to capture the characteristics of the sounds involving fall and non-fall events, as our experiments provided the best results with ten patterns from both groups. We combine the ten upper and ten lower codes to form the 20-dimensional Std-LTP descriptor:

\text{Std-LTP} = [h_u \,\|\, h_l]   (10)
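As an illustration of Eqs. 1-10, a minimal NumPy sketch of the Std-LTP descriptor is given below, using 9-sample windows and α = 0.5 as in the paper. The input signal is a random placeholder, and keeping the ten most populated histogram bins per code is an assumption, since the text does not specify how the ten retained patterns are selected.

```python
import numpy as np

def std_ltp(signal, alpha=0.5, n_patterns=10):
    """Sketch of the Std-LTP descriptor: per-window ternary coding with a
    standard-deviation threshold, followed by upper/lower code histograms."""
    # Split the signal into non-overlapping windows of 9 samples.
    n_win = len(signal) // 9
    windows = np.reshape(signal[: n_win * 9], (n_win, 9))

    upper_codes, lower_codes = [], []
    for w in windows:
        c = w[4]                       # central sample of the window
        th = alpha * np.std(w)         # adaptive threshold (Eqs. 1-2)
        ternary = np.zeros(9, dtype=int)
        ternary[w >= c + th] = 1       # Eq. 3
        ternary[w <= c - th] = -1
        neigh = np.delete(ternary, 4)  # the 8 neighbours of the central sample
        weights = 2 ** np.arange(8)
        upper_codes.append(int(np.sum((neigh == 1) * weights)))   # Eqs. 4, 6
        lower_codes.append(int(np.sum((neigh == -1) * weights)))  # Eqs. 5, 7

    # Histogram the codes (Eqs. 8-9). Keeping the 10 most populated bins per
    # histogram is an assumption: the paper only states that ten patterns are
    # retained for each of the upper and lower codes.
    h_u = np.sort(np.bincount(upper_codes, minlength=256))[::-1][:n_patterns]
    h_l = np.sort(np.bincount(lower_codes, minlength=256))[::-1][:n_patterns]
    return np.concatenate([h_u, h_l]).astype(float)   # 20-dim descriptor (Eq. 10)

# Example: descriptor for one second of 16 kHz audio (random placeholder signal).
rng = np.random.default_rng(0)
features = std_ltp(rng.standard_normal(16000))
print(features.shape)   # (20,)
```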
2.3 Classification Binary classification problems can be easily resolved by using an SVM; therefore, we utilized the SVM in our work for performing classification. The Std-LTP features are utilized to train the SVM for categorizing fall occurrences and non-fall occurrences. We tuned different parameters for the SVM and set the following values: box constraint of 100, kernel scale of 1, Gaussian kernel, and outlier fraction of 0.05.
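The reported classifier settings (box constraint 100, kernel scale 1, Gaussian kernel, outlier fraction 0.05) read like MATLAB-style SVM options; a rough scikit-learn approximation is sketched below, where C plays the role of the box constraint and gamma ≈ 1/kernel_scale². The outlier fraction has no direct SVC counterpart and is omitted here, and the data are placeholders, not the authors' descriptors.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 20-dim Std-LTP descriptors with fall (1) / non-fall (0) labels.
rng = np.random.default_rng(0)
X = rng.random((508, 20))
y = rng.integers(0, 2, 508)

# RBF-kernel SVM approximating the reported configuration.
clf = make_pipeline(StandardScaler(), SVC(C=100, kernel="rbf", gamma=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))
```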
Table 1 Details of fall and non-fall dataset

No of samples | No of fall samples | No of non-fall samples | Training samples | Testing samples
508 | 234 | 274 | 408 | 100
3 Experimental Setup and Results Discussion 3.1 Dataset We developed our fall detection dataset comprising audio clips of fall occurrences and non-fall occurrences recorded with two devices, i.e., a Lenovo K6 Note and an Infinix Note 10 Pro. The dataset is specifically designed to detect fall events. We recorded the voices of different speakers for fall and non-fall incidents in various environments and locations, i.e., home, guest room, washroom, etc. The duration of the sound clips varies from 3 to 7 s. Sound clips of fall occurrences comprise intense painful audio, while the sound clips of non-fall occurrences consist of inaudible audio, conversations, a TV playing in the background, etc. The dataset has 508 audio samples, comprising 234 samples of fall events and 274 samples of non-fall events. The audio categorization of the dataset is reported in Table 1.
3.2 Performance Evaluation of the Proposed System This experiment is performed to evaluate the efficacy of the developed system for the detection of possible fall occurrences on our in-house fall detection dataset. For this experiment, we utilized 80% of the data (408 samples) to train the model and 20% of the data (100 samples) for testing purposes. More specifically, we used 234 fall event audios and 274 non-fall audios. We obtained the 20-dim Std-LTP features of all the sound clips and trained the SVM on them for the categorization of fall occurrences and non-fall occurrences. We obtained an accuracy of 93%, precision of 95.74%, recall of 90%, and F1-score of 91.78%. These results demonstrate the effectiveness of the developed system for fall detection.
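A sketch of this evaluation protocol (80/20 split plus accuracy, precision, recall, and F1) with scikit-learn; the features and labels are placeholders standing in for the in-house dataset, so the printed numbers will not match the reported results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder 20-dim Std-LTP features for 508 clips (234 fall, 274 non-fall).
rng = np.random.default_rng(1)
X = rng.random((508, 20))
y = np.array([1] * 234 + [0] * 274)

# 80/20 split (approximately 408 training / 100 test samples).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = SVC(C=100, kernel="rbf", gamma=1.0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
for name, metric in [("accuracy", accuracy_score), ("precision", precision_score),
                     ("recall", recall_score), ("f1", f1_score)]:
    print(name, round(metric(y_te, y_pred), 4))
```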
3.3 Performance Comparison of Std-LTP Features on Multiple Classifiers We conducted an experiment to measure the significance of the SVM with our Std-LTP features for fall detection. For this, we selected different machine learning classifiers, i.e., Logistic regression (LR), Naïve Bayes (NB), K-nearest neighbour
Table 2 Performance comparison of multiple classifiers

Method | Kernel | Accuracy% | Precision% | Recall% | F1-score%
Std-LTP + LDA | Linear | 90 | 90 | 91.83 | 90.90
Std-LTP + NB | Kernel NB | 82 | 88 | 78.57 | 83.01
Std-LTP + KNN | Fine | 92 | 92 | 92 | 92
Std-LTP + Ensemble | Subspace KNN | 92 | 94 | 90.38 | 92.15
Std-LTP + DT | Coarse | 78 | 58 | 95.45 | 72.15
Std-LTP + SVM | Fine Gaussian | 93 | 95.74 | 90 | 91.78
(KNN), ensemble, and Decision tree (DT), along with the SVM, trained them using the proposed features, and report the results in Table 2. We can observe that Std-LTP performed best with the SVM, achieving an accuracy of 93%. The Std-LTP with the KNN and with the ensemble subspace KNN achieved the second-best results with an accuracy of 92%. The Std-LTP with DT was the worst and achieved an accuracy of 78%. We can conclude from this experiment that the proposed Std-LTP features with the SVM outperform all comparative classifiers for fall event detection.
4 Conclusion In this research work, we have presented an improved accidental fall detection approach for elderly persons to help provide first aid. Many elderly people live alone at home and need continuous monitoring and special care. Therefore, in this work, we designed a novel approach based on the proposed acoustic Std-LTP features to obtain the prominent attributes from the screams of accidental falls. Moreover, we developed a diverse in-door fall incidents dataset that contains voice samples of screams and pain voices. We used our in-door fall events dataset for experimentation purposes and obtained the proposed 20-dimensional Std-LTP features from the voice clips. We fed the extracted 20-dim Std-LTP features into the SVM for distinguishing between fall occurrences and non-fall occurrences. Experimental results show that the proposed method efficiently identifies fall occurrences with 93% accuracy and the lowest false alarm rate. Furthermore, it is possible to implement this system in actual environments, i.e., in hospitals, old houses, nursing homes, etc. In the future, we aim to use the proposed Std-LTP features on other fall event datasets to check the effectiveness and generalizability of the proposed system. We also aim to send the location of monitored persons to caretakers when a fall is detected.
References 1. Scuffham P, Chaplin S, Legood R (2003) Incidence and costs of unintentional falls in older people in the United Kingdom. J Epidemiol Community Health 57(9):740–744 2. Tinetti ME, Speechley M, Ginter SF (1988) Risk factors for falls among elderly persons living in the community 319(26):1701–1707 3. Voermans NC, Snijders AH, Schoon Y, Bloem BR (2007) Why old people fall (and how to stop them). Pract Neurol 7(3):158–171 4. Yacchirema D, de Puga JS, Palau C, Esteve M (2019) Fall detection system for elderly people using IoT and ensemble machine learning algorithm. Pers Ubiquit Comput 23(5):801–817 5. Giansanti D, Maccioni G, Macellari V (2005) The development and test of a device for the reconstruction of 3-D position and orientation by means of a kinematic sensor assembly with rate gyroscopes and accelerometers 52(7):1271–1277 6. Muheidat F, Tawalbeh L, Tyrer H (2018) Context-aware, accurate, and real time fall detection system for elderly people. In: 2018 IEEE 12th international conference on semantic computing (ICSC). IEEE 7. Mezghani N, Ouakrim Y, Islam MR, Yared R, Abdulrazak B (2017) Context aware adaptable approach for fall detection bases on smart textile. In: 2017 IEEE EMBS international conference on biomedical & health informatics (BHI). IEEE, pp 473–476 8. Popescu M, Li Y, Skubic M, Rantz M (2008) An acoustic fall detector system that uses sound height information to reduce the false alarm rate. In: 2008 30th annual international conference of the IEEE engineering in medicine and biology society. IEEE, pp 4628–4631 9. Yan H, Huo H, Xu Y, Gidlund M (2010) Wireless sensor network based E-health systemimplementation and experimental results. IEEE Trans Consum Electron 56(4):2288–2295 10. Yu M, Gong L, Kollias S (2017) Computer vision based fall detection by a convolutional neural network. In: Proceedings of the 19th ACM international conference on multimodal interaction 11. Rougier C, Meunier J, St-Arnaud A, Rousseau J (2011) Robust video surveillance for fall detection based on human shape deformation. IEEE Trans Circuits Syst Video Technol 21(5):611–622 12. Cai X, Li S, Liu X, Han G (2020) Vision-based fall detection with multi-task hourglass convolutional auto-encoder. IEEE Access 8:44493–44502 13. Zhang L, Fang C, Zhu M (2020) A computer vision-based dual network approach for indoor fall detection. Int J Innov Sci Res Technol 5:939–943 14. Tong L, Song Q, Ge Y, Liu M (2013) HMM-based human fall detection and prediction method using tri-axial accelerometer. IEEE Sens J 13(5):1849–1856 15. Khan MS, Yu M, Feng P, Wang L, Chambers J (2015) An unsupervised acoustic fall detection system using source separation for sound interference suppression. Signal Process 110:199–210 16. Younis B, Javed A, Hassan F (2021) Fall detection system using novel median deviated ternary patterns and SVM. In: 2021 4th international symposium on advanced electrical and communication technologies (ISAECT). IEEE 17. Banjar A et al (2022) Fall event detection using the mean absolute deviated local ternary patterns and BiLSTM. Appl Acoust 192:108725 18. Qadir G et al (2022) Voice spoofing countermeasure based on spectral features to detect synthetic attacks through LSTM. Int J Innov Sci Technol 3:153–165 19. Hassan F, Javed A (2021) Voice spoofing countermeasure for synthetic speech detection. In: 2021 International conference on artificial intelligence (ICAI). IEEE, pp 209–212 20. 
Zeeshan M, Qayoom H, Hassan F (2021) Robust speech emotion recognition system through novel ER-CNN and spectral features. In: 2021 4th international symposium on advanced electrical and communication technologies (ISAECT). IEEE 21. Shaukat A, Ahsan M, Hassan A, Riaz F (2014) Daily sound recognition for elderly people using ensemble methods. In 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD). IEEE, pp 418–423
22. Li Y, Ho KC, Popescu M (2012) A microphone array system for automatic fall detection. IEEE Trans Biomed Eng 59(5):1291–1301 23. Hassan F, Mehmood MH, Younis B, Mehmood N, Imran T, Zafar U (2022) Comparative analysis of machine learning algorithms for classification of environmental sounds and fall detection. Int J Innov Sci Technol 4(1):163–174
Impact of COVID-19 on Predicting 2020 US Presidential Elections on Social Media Asif Khan, Huaping Zhang, Nada Boudjellal, Bashir Hayat, Lin Dai, Arshad Ahmad, and Ahmed Al-Hamed
Abstract By the beginning of 2020, the world woke up to a global pandemic that changed people's everyday lives and restrained their physical contact. During those times, Social Media Platforms (SMPs) were almost the only means of individual-to-individual and government-to-individual communication. Therefore, people's opinions were expressed more on SM. On the other hand, election candidates used SM to promote themselves and engage with voters. In this study, we investigate how COVID-19 affected voters' opinions through the months of the US presidential campaign and eventually predict the 2020 US Presidential Election results using Twitter data. Mainly two types of experiments were conducted and compared: (i) transformer-based and (ii) rule-based sentiment analysis (SA). In addition, vote shares for the presidential candidates were predicted using both approaches. The results show that the rule-based approach nearly predicts the right winner, Joe Biden, with an MAE of 2.1, outperforming the predicted results from CNBC, Economist/YouGov, and the transformer-based (BERTweet) approach, and trailing only RCP (MAE 1.55). Keywords Twitter · Sentiment Analysis · Rule-based · Transformers · COVID-19 · Election Prediction · USA Presidential Election
A. Khan · H. Zhang (B) · N. Boudjellal · L. Dai · A. Al-Hamed School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; e-mail: [email protected] N. Boudjellal The Faculty of New Information and Communication Technologies, University Abdelhamid Mehri Constantine 2, 25000 Constantine, Algeria B. Hayat Institute of Management Sciences Peshawar, Peshawar, Pakistan A. Ahmad Institute of Software Systems Engineering, Johannes Kepler University, 4040 Linz, Austria; Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology, Mang Khanpur Road, Haripur 22620, Pakistan
1 Introduction The outbreak of the COVID-19 pandemic shook the whole world and disturbed the daily life of people by locking them in their dwellings. During those quarantine times, the role of social media platforms was much appreciated. All sectors of life were affected, including politics, with at least 80 countries and territories postponing their elections. On the other hand, at least 160 countries decided to hold their elections despite the risks posed by the pandemic [2]. Among these was the US election. The US 2020 Presidential Election campaign fundamentally changed amid the coronavirus pandemic, with candidates straining to reach voters virtually. Twitter, with the US as the top country by number of users, is widely used by politicians to reach their audience, and it is a platform where voters can freely express their opinions about candidates and their programs. Therefore, Twitter data is a mine that, when exploited well, can lead to meaningful predictions and insights, including election result prediction. Several studies have analyzed elections on SMPs [1, 3, 4, 11, 12, 18]. Many researchers investigated elections on SMPs using SA approaches [4, 10, 13, 16], and very few studies employed BERT-based models for SA to predict elections on SMPs [6, 20]. The impact of COVID-19 on elections has been investigated by [5, 8, 15, 19]; nevertheless, these studies used data from different surveys. To the best of our knowledge, no study has investigated the impact of a pandemic on an election using social media data. In this study, we analyze the effect of COVID-19 on the US election by mining people's opinions about the candidates and eventually predicting the 2020 US election results. We studied the tweets about COVID-19 and the two final US presidential candidates: Donald Trump and Joe Biden. The main contributions of this study are:
1. Twitter mining of tweets related to COVID-19, Joe Biden, and Donald Trump.
2. Prediction of the 2020 US Presidential Election using COVID-19 data.
3. Analysis and comparison of two SA approaches: VADER for the rule-based approach and BERTweet for the transformer-based approach.
4. Comparison of our predictions with three well-known polls' results as well as the actual 2020 US Presidential Election outcome.
The rest of the paper is organized as follows: Sect. 2 provides an overview of the related literature, followed by the proposed methodology in Sect. 3. Section 4 discusses the experimental results. Afterwards, the study is concluded in Sect. 5.
2 Related Studies Social media data have consistently attracted countless researchers to explore diverse events including election predictions [1, 3, 4, 11, 12, 18]. These researchers endeavoured to predict elections on SMPs by utilizing different features, factors,
and approaches. There are mainly three types of approaches for predicting elections on SMPs: (i) sentiment analysis, (ii) social network analysis, and (iii) volumetric/counting [12]. A majority of the studies showed the effectiveness of SA approaches in election prediction using Twitter data [4, 10, 13, 16]. The authors of [14] analyzed and forecasted the US 2020 Presidential Election using lexicon-based SA. They analyzed tweets in each state for the candidates and classified the results into Solid Democratic, Solid Republican, Tossup, Lean Democratic, and Lean Republican. The authors of [21] conducted SA experiments on the Sanders Twitter benchmark dataset and concluded that the Multi-layer Perceptron classifier performs best. The authors of [16] investigated the Japanese House of Councilors election (held in 2019) using the replies to candidates and the sentiment values of these replies. The paper [3] investigated the 2018 Brazilian Presidential Election and the 2016 US Presidential Election by employing social media data and machine learning techniques; the authors collected posts from candidates' profiles, including traditional posts, and then used an artificial neural network to predict the vote share for the presidential candidates. Few studies have used BERT-based SA to predict elections. The study [6] analyzed the US 2020 Presidential Election using LSTM and BERT-based models for Twitter SA; the authors concluded that the BERT model indicated Biden as the winner. Likewise, the study [20] analyzed the 2020 US Presidential Election using SA, comparing four deep learning and machine learning methods: naïve Bayes, TextBlob, BERT, and support vector machine. They found that BERT shows better performance than the other three methods. Some studies investigated the impact of COVID-19 on elections. For instance, the authors of [9] analyzed and predicted corona cases in the USA and observed the presidential election together with the confirmed corona cases and deaths, using ARIMA, a time series algorithm. In another study [7], the authors investigated the impact of COVID-19 on Trump during the 2020 US Presidential Election; they used multivariate statistical analyses on national survey data gathered before and after the election, and showed that the pandemic harmed Trump's image, which left a very narrow path to winning the election. Other studies, such as [5, 8, 15, 19], also investigated the impact of the pandemic on elections; however, these studies analyzed data such as surveys and questionnaires. All these studies examined elections either through sentiment analysis of general public opinions about candidates, elections, and parties on social media, or through the impact of COVID-19 on elections using surveys. This led us to investigate the impact of a pandemic on elections using social media. We studied the impact of COVID-19 on the 2020 US Presidential Election using rule-based and transformer-based SA.
3 Proposed Methodology This section presents our proposed methodology in detail. Figure 1 demonstrates the proposed methodology for predicting elections using tweets related to COVID-19.
Fig. 1 Framework
Table 1 Data collection (keywords)

Joe Biden | @JoeBiden
Donald J. Trump | @realDonaldTrump
COVID-19 | Coronavirus, covid-19, covid, corona, pandemic, epidemic, virus
3.1 Tweets In this study, we use Twitter data to predict the 2020 US Presidential Election. We employ the Python library Tweepy to mine tweets from Twitter. Tweets mentioning the two candidates running for President (see Table 1) were collected between 1st August 2020 and 30th November 2020. Finally, the collected tweets that include keywords related to coronavirus (see Table 1) were selected for this study.
3.2 Data Pre-processing The collected raw tweets contain a large amount of meaningless data and superfluous noise, so we preprocessed all the tweets to clean the data. In this study, we use tweets written in English only. Further, we remove unnecessary noise such as stopwords, hashtags (#), mentions (@), IPs, URLs, and emoticons. The tweets are then converted to lower case and tokenized.
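A minimal sketch of these cleaning steps using regular expressions and NLTK; the exact patterns and the example tweet are illustrative assumptions rather than the authors' pipeline.

```python
import re
from nltk.corpus import stopwords          # requires nltk.download('stopwords')
from nltk.tokenize import word_tokenize    # requires nltk.download('punkt')

STOPWORDS = set(stopwords.words("english"))

def preprocess(tweet):
    # Strip URLs, mentions, hashtags, IP addresses and non-alphabetic symbols (incl. emoticons).
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)
    tweet = re.sub(r"[@#]\w+", " ", tweet)
    tweet = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", " ", tweet)
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet)
    tokens = word_tokenize(tweet.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("Covid cases rising again! https://t.co/x #covid19 @JoeBiden :("))
```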
3.3 Sentiment Analysis Sentiment analysis (SA) analyses the subjective information in a statement; it is also referred to as opinion mining. It uses NLP techniques to classify tweets (statements) into positive, negative, and neutral. SA plays a very vital role in the domain of election prediction, as it partly portrays the intentions and attitudes of voters toward political entities such as politicians and political parties. In this study, we employed two SA approaches to predict the winner of the 2020 US Presidential Election: (i) a rule-based SA approach, and (ii) a transformer-based SA approach. In the first approach we employed the Valence Aware Dictionary and sEntiment Reasoner (VADER). It is particularly attuned to opinions expressed on SM, and many researchers use it extensively in domains such as Twitter. The latter approach uses BERTweet, a language model pre-trained for English tweets following the RoBERTa pre-training procedure. The corpus used for BERTweet comprises 850 million tweets, including 5 million COVID-19 related tweets.
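For the rule-based branch, VADER ships with NLTK; the sketch below buckets a tweet into positive/negative/neutral using the conventional ±0.05 compound-score cutoff, which is an assumption since the paper does not state its threshold.

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()

def vader_label(text, threshold=0.05):
    compound = sia.polarity_scores(text)["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(vader_label("The pandemic response was handled really well"))
print(vader_label("Terrible handling of covid, so many lives lost"))
```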
3.4 Predicting the President We employed Eq. 1 (Donald Trump) and Eq. 2 (Joe Biden) to predict the winner of the 2020 US Presidential Election. We discard the neutral sentiments, focusing on positive and negative sentiments. Furthermore, we calculate the Mean Absolute Error (MAE) using Eq. 3 to evaluate our two methods and observe how far our predictions deviate from the actual election outcomes.

\text{Vote-share}_{\text{Trump}} = \frac{(\text{Pos. Trump} + \text{Neg. Biden}) \times 100}{\text{Pos. Trump} + \text{Neg. Trump} + \text{Pos. Biden} + \text{Neg. Biden}}   (1)

\text{Vote-share}_{\text{Biden}} = \frac{(\text{Pos. Biden} + \text{Neg. Trump}) \times 100}{\text{Pos. Trump} + \text{Neg. Trump} + \text{Pos. Biden} + \text{Neg. Biden}}   (2)

\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\text{Predicted}_i - \text{Actual}_i|   (3)
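Equations 1-3 amount to a few lines of arithmetic; the sketch below applies them to made-up sentiment counts (not the study's counts) and compares the resulting shares against the approximate official popular-vote shares.

```python
def vote_shares(pos_trump, neg_trump, pos_biden, neg_biden):
    total = pos_trump + neg_trump + pos_biden + neg_biden
    trump = (pos_trump + neg_biden) * 100 / total   # Eq. 1
    biden = (pos_biden + neg_trump) * 100 / total   # Eq. 2
    return trump, biden

def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)  # Eq. 3

# Illustrative counts only.
trump_share, biden_share = vote_shares(pos_trump=4000, neg_trump=6000,
                                       pos_biden=5500, neg_biden=4500)
print(round(trump_share, 2), round(biden_share, 2))            # the two shares sum to 100
# Compare against roughly the official 2020 popular-vote shares (Trump ~46.9%, Biden ~51.3%).
print(round(mae([trump_share, biden_share], [46.9, 51.3]), 2))
```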
4 Experimental Results and Discussion This section presents the experimental results and discussion of our study. We performed extensive experiments to forecast the winner of the election using COVID-19 tweets. We used the Python library Tweepy to mine the tweets from Twitter. We mined tweets mentioning Donald Trump and Joe Biden between 1 Aug 2020 and 30 Nov 2020. Next, we chose the tweets that contain keywords related to COVID-19 (see Table 1) and considered these tweets for this study. Table 2 shows the dataset used in this research study. Figure 2 shows the distribution of collected tweets over time. The number of tweets related to COVID-19 dropped as the elections were getting closer, especially for Joe Biden in September 2020 and for Donald Trump in November 2020. The experiments are based on two approaches, rule-based (VADER) and transformer-based (BERTweet). The tweets were preprocessed by employing the Natural Language Toolkit (NLTK), which comes with a built-in sentiment analyzer, VADER, used in this study. For the transformer-based SA approach (BERTweet), we used the library "pysentimiento (A Python Toolkit for Sentiment Analysis and SocialNLP tasks)" [17]. All experiments were conducted in a Jupyter Notebook (Python 3.7.4) environment on a PC with 64-bit Windows 11 OS, an Intel(R) Core(TM) i7-8750H CPU, and 16 GB RAM.
4.1 Sentiment Analysis We have conducted extensive experiments to analyze the sentiments of people towards Donald Trump and Joe Biden during the pandemic (and election campaigns).

Table 2 Number of tweets
Candidate     | Data collection | COVID-19 only
Joe Biden     | 1,385,065       | 42,642
Donald Trump  | 681,408         | 21,702

Fig. 2 Timeline—Tweets collection (monthly counts of the collected COVID-19 tweets for Joe Biden and Donald Trump, August–November 2020)
Fig. 3 Sentiment analysis using BERTweet (stacked monthly percentages of Biden Pos/Neg/Neu and Trump Pos/Neg/Neu, August–November 2020)
We have applied two different methods and compared them. The results of both methods (rule-based and transformer-based SA) are shown and discussed below.
4.1.1 Results of BERTweet
Figure 3 shows the sentiment analysis using BERTweet (transformer-based) for Donald Trump and Joe Biden. It combines the sentiment percentages for the two political leaders on a 200% scale: the first slot from left to right (0–100%) shows the sentiments for Joe Biden, and the second slot (100–200%) shows the sentiments for Donald Trump. It can be seen in Fig. 3 that people’s sentiments towards both leaders show a small percentage of positive compared to neutral and negative. It is interesting to notice the sentiment shift for both leaders. The percentage of negative sentiment for Joe Biden became lower as the elections drew closer; the negative sentiments shifted slightly towards positive and more towards neutral. The sentiments towards Donald Trump stayed nearly the same, with slight shifts during the months before the elections. Nonetheless, we believe that the drop in his negative percentage from 51% in October to 39% in November is due to the sharp decrease in the number of tweets. The results show that, on average, the attitude of voters (Twitter users) toward Joe Biden was more supportive than toward Donald Trump.
4.1.2 Results of VADER
Figure 4 illustrates the sentiment analysis using VADER (the rule-based SA approach) for Donald Trump and Joe Biden during the election campaigns using COVID-19 tweets. Like Fig. 3, it combines the sentiment percentages for the two political leaders on a 200% scale: the first slot from left to right (0–100%) shows the sentiments for Joe Biden, and the second slot (100–200%) shows the sentiments for Donald Trump. It is interesting to notice that the sentiments shift for both leaders, especially for Joe Biden (see Fig. 4).
Fig. 4 Sentiment analysis using VADER (stacked monthly percentages of Biden Pos/Neg/Neu and Trump Pos/Neg/Neu, August–November 2020)
The positive sentiment percentage for Biden increased from 18 to 44%, and the negative sentiment percentage decreased from 57 to 17%, as the elections drew closer. The neutral percentage remained almost the same, with trivial changes. This shows that the attitude of people towards Joe Biden was becoming more positive over time, which can be considered a leading factor towards winning the election. On the other hand, there is no colossal shift in the positive sentiment toward Donald Trump, except in October 2020 when it increased by nearly 9%. Moreover, his negative percentage decreases with time.
4.2 Predicting Vote Share Figure 5 represents the vote-share for Donald Trump and Joe Biden using the VADER (RB) and BERTweet (TB) approaches. We employed Eqs. 1 and 2 to predict the vote shares for Donald Trump and Joe Biden, respectively. The vote-shares are presented in percentages on a monthly basis (from August 2020 to November 2020) followed by the average vote-shares. In Fig. 5, “TB” represents the transformer-based (BERTweet) approach and “RB” represents the rule-based (VADER) approach. Surprisingly, the results from TB show Donald Trump as a clear winner (avg. vote-share of 62.18% for Trump and 37.82% for Joe Biden). In contrast, the results from VADER show that Biden is the winner with a negligible lead (50.15% for Biden and 49.85% for Trump).
Fig. 5 Vote shares for Biden and Trump using BERTweet and VADER (monthly and average vote-shares, August–November 2020; the average values are 37.82%/62.18% for TB Biden/Trump and 50.15%/49.85% for RB Biden/Trump)
Table 3 Predicted results and the MAE
              | Predicted results                                              | Final results
              | RCP   | Economist/YouGov | CNBC  | BERTweet | VADER | 2020 US Presidential Election
Joe Biden     | 51.2  | 53               | 52    | 37.82    | 50.15 | 51.40
Donald Trump  | 44    | 43               | 42    | 62.18    | 49.85 | 46.90
MAE           | 1.55  | 2.75             | 2.75  | 14.43    | 2.1   | –
Table 3 shows our predicted vote shares using BERTweet and VADER for Donald Trump and Joe Biden, along with the actual 2020 US Presidential Election results as well as the values predicted by three polls (RCP, Economist/YouGov, and CNBC). In addition, Table 3 shows the MAE for the polls and for our predictions. The results using VADER are quite impressive, as it outperformed BERTweet and two of the three polls, with only RCP achieving a lower error. In contrast, BERTweet has the highest error (MAE = 14.43). The results show that a pandemic can affect events and help us in predicting an event such as an election.
5 Conclusion and Future Work In this study, we investigated the effects of a pandemic on an election. We analyzed the 2020 US Presidential Election during COVID-19 using Twitter data (from 1st August 2020 to 30th November 2020). We studied tweets that mention Donald Trump and Joe Biden and contain COVID-19 keywords. Conspicuously, the tweets for Joe Biden (66.3%) outnumbered those for Donald Trump (33.7%). Two types of experiments were conducted and compared: transformer-based SA and rule-based SA. In addition, vote shares for the presidential candidates were predicted using both approaches. The results are quite interesting; the rule-based approach led us to predict the right winner (Joe Biden) with an MAE of 2.1, outperforming the predicted results from CNBC, Economist/YouGov, and the transformer-based (BERTweet) approach, though not RCP (MAE 1.55). This study has some limitations, which need to be investigated in future work. The first COVID-19 cases in the USA were diagnosed in January 2020, whereas the tweets considered in this study cover only a short period (August–November), which affects the outcomes of the predictions. In addition, only tweets mentioning the candidates are considered, while tweets using hashtags are ignored. Investigating a larger number of tweets would improve the predictions and give us better insight into the elections. Moreover, analyzing correlations with the real COVID-19 statistics (the numbers of infected, recovered/discharged, and deceased) is needed.
References
1. Ali H, Farman H, Yar H et al (2021) Deep learning-based election results prediction using Twitter activity. Soft Comput. https://doi.org/10.1007/s00500-021-06569-5
2. Asplund E (2022) Global overview of COVID-19: impact on elections | International IDEA. In: Int. IDEA. https://www.idea.int/news-media/multimedia-reports/global-overview-covid19-impact-elections
3. Brito KDS, Adeodato PJL (2020) Predicting Brazilian and U.S. elections with machine learning and social media data. In: Proceedings of the international joint conference on neural networks
4. Budiharto W, Meiliana M (2018) Prediction and analysis of Indonesia Presidential election from Twitter using sentiment analysis. J Big Data 5:1–10. https://doi.org/10.1186/s40537-018-0164-1
5. Cassan G, Sangnier M (2022) The impact of 2020 French municipal elections on the spread of COVID-19. J Popul Econ 35:963–988. https://doi.org/10.1007/s00148-022-00887-0
6. Chandra R, Saini R (2021) Biden vs Trump: modeling US general elections using BERT language model. In: IEEE access, pp 128494–128505
7. Clarke H, Stewart MC, Ho K (2021) Did covid-19 kill Trump politically? The pandemic and voting in the 2020 presidential election. Soc Sci Q 102:2194–2209. https://doi.org/10.1111/ssqu.12992
8. Dauda M (2020) The impact of covid-19 on election campaign in selected states of Nigeria 14–15
9. Dhanya MG, Megha M, Kannath M et al (2021) Explorative predictive analysis of Covid-19 in US and its impact on US Presidential Election. In: 2021 4th international conference on signal processing and information security, ICSPIS 2021, pp 61–64
10. Ibrahim M, Abdillah O, Wicaksono AF, Adriani M (2016) Buzzer detection and sentiment analysis for predicting presidential election results in a Twitter nation. In: Proceedings of the 15th IEEE international conference on data mining workshop (ICDMW), pp 1348–1353. https://doi.org/10.1109/ICDMW.2015.113
11. Jaidka K, Ahmed S, Skoric M, Hilbert M (2019) Predicting elections from social media: a three-country, three-method comparative study. Asian J Commun 29:252–273. https://doi.org/10.1080/01292986.2018.1453849
12. Khan A, Zhang H, Boudjellal N et al (2021) Election prediction on twitter: a systematic mapping study. Complexity 2021:1–27. https://doi.org/10.1155/2021/5565434
13. Khan A, Zhang H, Shang J et al (2020) Predicting politician’s supporters’ network on twitter using social network analysis and semantic analysis. Sci Program. https://doi.org/10.1155/2020/9353120
14. Nugroho DK (2021) US presidential election 2020 prediction based on Twitter data using lexicon-based sentiment analysis. In: Proceedings of the confluence 2021: 11th international conference on cloud computing, data science and engineering, pp 136–141
15. Nurjaman A, Hertanto H (2022) Social media and election under covid-19 pandemic in Malang regency Indonesia. Int J Commun 4:1–11
16. Okimoto Y, Hosokawa Y, Zhang J, Li L (2021) Japanese election prediction based on sentiment analysis of twitter replies to candidates. In: 2021 international conference on asian language processing, IALP 2021, pp 322–327
17. Pérez JM, Giudici JC, Luque F (2021) pysentimiento: a python toolkit for sentiment analysis and SocialNLP tasks. http://arxiv.org/abs/2106.09462
18. Salem H, Stephany F (2021) Wikipedia: a challenger’s best friend? Utilizing information-seeking behaviour patterns to predict US congressional elections. Inf Commun Soc. https://doi.org/10.1080/1369118X.2021.1942953
19. Shino E, Smith DA (2021) Pandemic politics: COVID-19, health concerns, and vote choice in the 2020 general election. J Elections, Public Opin Parties 31:191–205. https://doi.org/10.1080/17457289.2021.1924734
20. Singh A, Kumar A, Dua N et al (2021) Predicting elections results using social media activity a case study: USA presidential election 2020. In: 2021 7th international conference on advanced computing and communication systems, ICACCS 2021, pp 314–319
21. Xia E, Yue H, Liu H (2021) Tweet sentiment analysis of the 2020 U.S. presidential election. In: The web conference 2021—companion of the world wide web conference, WWW 2021, pp 367–371
Health Mention Classification from User-Generated Reviews Using Machine Learning Techniques Romieo John, V. S. Anoop, and S. Asharaf
Abstract The advancements in information and communication technologies contributed greatly to the development of social media and other platforms where people express their opinions and experiences. There are several platforms such as drugs.com where people rate pharmaceutical drugs and also give comments and reviews on the drugs they use and their side effects. It is important to analyze such reviews to find out the sentiment, opinions, drug efficacy, and most importantly, adverse drug reactions. Health mention classification deals with classifying such user-generated text into different classes of health mentions such as obesity, anxiety, and more. This work uses machine learning approaches for classifying health mentions from the publicly available health-mention dataset. Both the shallow machine learning algorithms and deep learning approaches with pre-trained embeddings have been implemented and the performances were compared with respect to the precision, recall, and f1-score. The experimental results show that machine learning approaches will be a good choice for automatically classifying health mentions from the large amount of user-generated drug reviews that may help different stakeholders of the healthcare industry to better understand the market and consumers. Keywords Health mention classification · Machine learning · Transformers · Deep learning · Natural language processing · Computational social sciences R. John · V. S. Anoop (B) Kerala Blockchain Academy, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India e-mail: [email protected] R. John e-mail: [email protected] V. S. Anoop School of Digital Sciences, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India S. Asharaf Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_15
1 Introduction The digital revolution has caused the proliferation of internet services at an exponential rate. The recent innovations in information and communication technologies fueled the growth of internet-based applications and services such as social networks and online forums. People use these services to share their reviews and opinions on different products and services and to deliberate in discussions (Zhang et al. 2022) [13, 15]. As these reviews and opinions contain much useful but latent information for various stakeholders, it is important to analyze them [5, 10, 11, 16, 19]. Online forums such as drugs.com (https://www.drugs.com) are also used for expressing opinions, but specifically about drugs and medications. Such platforms contain reviews mentioning important information such as the names of drugs, the health condition for which the drug was used, user experiences, and adverse drug reactions. Identifying this information and classifying the user post into one of the health mention classes is of great importance for healthcare researchers and other stakeholders [1, 6, 17, 18]. Manual analysis of such platforms to find useful pieces of information in user-generated content would be a time-consuming task and may often result in poor quality. Machine learning algorithms that can handle large quantities of data and classify them into different categories may find application in this context. In the recent past, many such approaches have been reported in the machine learning and natural language processing literature with varying degrees of success. With the introduction of sophisticated deep learning algorithms such as Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory (BiLSTM), state-of-the-art results are being obtained in classification tasks, specifically health mention classification from user-generated posts. The proposed work employs different machine learning algorithms to classify health mentions from publicly available user-generated health reviews. The major contributions of this paper are summarized as follows: (a) it discusses the relevance of health mention classification from user-generated health reviews, which are unstructured text; (b) it implements different machine learning algorithms, in both the shallow learning and deep learning categories, to classify health mentions; and (c) it reports and discusses the classification performance of the different algorithms used in the proposed approach. The remainder of this manuscript is organized as follows: Sect. 2 briefly discusses some of the very recent and prominent works on health mention classification using machine learning techniques. Section 3 presents the proposed approach, and Sect. 4 discusses the experiment conducted. The results are presented in Sect. 5 along with a detailed discussion, and in Sect. 6 the authors present the conclusions.
2 Related Studies The recent advancements in natural language processing techniques, such as the development of large language models, have led to several works on health mention classification being reported in the machine learning literature. This section discusses some of the very recent and prominent works that use machine learning approaches for health mention classification. Pervaiz et al. [9] have done a performance comparison of transformer-based models on Twitter health mention classification. They have chosen nine widely used transformer methods for comparison and reported that RoBERTa outperformed all other models by achieving an f1-score of 93%. Usman et al. proposed an approach for the identification of disease and symptom terms from Reddit to improve the health mention classification problem [14]. The authors have released a new dataset that manually classifies the Reddit posts into four labels, namely personal health mentions, non-personal health mentions, figurative health mentions, and hyperbolic health mentions. Experimental results demonstrated that their approach outperformed state-of-the-art methods with an F1-score of 0.75. A recent approach that attempted to identify COVID-19 personal health mentions from tweets using a masked attention model was also reported [12]. They have built a COVID-19 personal health mention dataset containing tweets annotated with four types of health conditions: self-mention, other mention, awareness, and non-health. This approach obtained promising results when compared with some of the state-of-the-art approaches. Fries et al. proposed an ontology-driven weak supervision approach for clinical entity classification from electronic health records [3]. Their model, named Trove, used medical ontologies and expert-generated rules for the classification task. They have evaluated the performance of their proposed framework on six benchmark tasks and real-life experiments. Kayastha et al. proposed a BERT-based adverse drug effect tweet classification [7]. The authors have reported that their best-performing model utilizes BERTweet followed by a single layer of BiLSTM. The system achieved an F-score of 0.45 on the test set without the use of any auxiliary resources such as Part-of-Speech tags, dependency tags, or knowledge from medical dictionaries [7]. Biddle et al. developed an approach that leverages sentiment distributions to distinguish figurative from literal health reports on Twitter [2]. For the experiments, the authors have modified a benchmark dataset and added nearly 14,000 tweets that are manually annotated. The proposed classifier outperformed state-of-the-art approaches in detecting health-related and figurative tweets. Khan et al. incorporated a permutation-based contextual word representation for health mention classification [8]. The performance of the classifier is improved by capturing the context of disease words efficiently, and the experiments conducted with the benchmark dataset showed better accuracy for the proposed approach.
The proposed approach uses different machine learning approaches (both shallow learning and deep learning) for classifying health mentions from user-generated social media text. This work employs machine learning algorithms such as Random Forest, Logistic Regression, Naive Bayes, Support Vector Machine, Light Gradient Boosting Machine, Bi-directional Long Short-Term Memory, Convolutional Neural Network, and Transformers for building classifier models. We also use pre-trained embeddings such as BERT and SciBERT for better feature extraction and, in turn, better classification. Section 3 discusses the proposed approach for health mention classification in detail.
3 Proposed Approach This section discusses the details of the proposed approach. The overall workflow of the proposed method is shown in Fig. 1.
Random Forest (RF): The Random Forest approach has been used to examine drug datasets in several studies, as it is able to analyze the data and make an informed prediction. The dataset employed in this study is balanced in nature. Because random forest separates data into branches to form a tree, we infer that random forest is not well suited to providing prognostic options for unbalanced problems.
Fig. 1 Overall workflow of the proposed approach
Logistic Regression (LR): Logistic regression is a technique that employs a set of continuous, discrete, or mixed characteristics together with a binary target. This approach is popular since it is simple to apply and produces decent results.
Naive Bayes (NB): Bayesian classifiers are well known for their computational efficiency and for handling missing data naturally and efficiently. Past work has shown that this model can achieve high prediction accuracy.
Support Vector Machines (SVM): The SVM has a distinct edge when it comes to tackling classification tasks that need high generalization. The strategy aims to reduce error by minimizing structural risk. This technique is widely utilized in medical diagnostics.
Light Gradient Boosting Machine (LGBM): LGBM is a gradient boosting framework built on decision trees that may be applied to a range of machine learning tasks, including classification and ranking.
The proposed approach also implements the following deep learning classifiers for health mention classification.
Convolutional Neural Networks (CNN): The CNN architecture for classification includes convolutional layers, max-pooling layers, and fully connected layers. Convolution and max-pooling layers are used for feature extraction: convolution layers perform feature detection, while max-pooling layers perform feature selection.
Long Short-Term Memory (LSTM): LSTM is a kind of recurrent neural network that effectively stores past information in memory and mitigates the vanishing gradient problem of RNNs. Like RNNs, LSTMs work with sequential data, even when the temporal distance between relevant elements is large.
Transformers: Transformers are a class of deep neural network models widely used in natural language processing that adopt the mechanism of self-attention, differentially weighting the significance of each part of the input data.
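To make the two deep architectures concrete, the following is a minimal Keras sketch of a CNN and a BiLSTM text classifier of the kind described above. The vocabulary size, embedding dimension, layer widths, and number of classes are illustrative assumptions, not the authors' settings.

```python
import tensorflow as tf

VOCAB_SIZE, NUM_CLASSES = 20000, 14   # illustrative values only

def build_cnn():
    # Embedding -> convolution (feature detection) -> max pooling (feature selection) -> dense
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 100),
        tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_bilstm():
    # Embedding -> bidirectional LSTM (sequence memory in both directions) -> dense
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 100),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_bilstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(padded_sequences, integer_labels, validation_split=0.1, epochs=5)  # inputs prepared as in Sect. 4.2
```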
4 Experiment 4.1 Dataset For this experiment, we use the publicly available drug review dataset from the UCI Machine Learning Repository. The dataset is available at https://archive.ics.uci.edu/ml/machine-learning-databases/00462/ [4]. The dataset contains a total of 215,063 patient reviews on specific drugs along with the related conditions. There are a total of six attributes in the dataset, namely drugName—the name of the drug, condition—the name of the condition, review—the patient review, rating—the numerical rating, date—the date of the review, and usefulCount—the number of users who found the corresponding review useful. A snapshot of the data is shown in Table 1.
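A hedged loading sketch is shown below. The file name assumes the tab-separated training split commonly distributed on the UCI page; the exact file layout used by the authors is not stated in the paper.

```python
import pandas as pd

# Assumption: the raw TSV training split from the UCI archive linked above.
df = pd.read_csv("drugsComTrain_raw.tsv", sep="\t")

print(df.shape)                                   # one split of the 215,063 reviews mentioned above
print(df.columns.tolist())                        # includes drugName, condition, review, rating, date, usefulCount
print(df["condition"].value_counts().head(10))    # most frequent health conditions
```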
Table 1 A snapshot of the dataset used
Unique ID | Drug name | Condition | Review | Rating | Date | Useful count
4907 | Belviq | Weight Loss | "This is a waste of money. Did not curb my appetite nor did it makes me feel full" | 1 | 23-Sep-14 | 57
151,674 | Chantix | Smoking Cessation | "Took it for one week and that was it. I didn't think it was possible for me to quit. It has been 6 years now. Great Product" | 10 | 14-Feb-15 | 26
30,401 | Klonopin | Bipolar Disorder | "This medication helped me sleep. But eventually it became ineffective as a sleep aid. It also helps me calm down when in severe stress, anxiety, or panic" | 6 | 14-July-09 | 24
103,401 | Celecoxib | Osteoarthritis | "Celebrex did nothing for my pain" | 1 | 12-Feb-09 | 35
4.2 Experimental Setup This section describes the experimental setup we have used for our proposed approach. All the methods described in this paper were implemented in Python 3.8. The experiments were run on a server configured with an Intel(R) Core(TM) i5-10300H processor and 8 GB of main memory. Firstly, the dataset was pre-processed to remove stopwords, URLs, and other special characters.
The demoji library, available at https://pypi.org/project/demoji/, was used for converting the emojis into textual forms. We have also used an English contractions list for better text normalization. During the analysis, we found that the dataset is unbalanced, and to balance it we performed down-sampling or up-sampling. The feature engineering stage deals with computing different features such as CountVectorization, word embeddings, and TF-IDF after word tokenization. This work used the Keras texts-to-sequences tokenizer (https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer), and BERT tokenization from https://huggingface.co/docs/transformers/main_classes/tokenizer is used for the transformers. Once the experiment-ready dataset is obtained, we split it into train and test partitions and then used it with the algorithms listed in Sect. 3.
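The sketch below illustrates one possible version of this cleaning and feature-extraction step, reusing the `df` frame from the loading sketch in Sect. 4.1. The regular expressions, TF-IDF parameters, and 80/20 split ratio are assumptions for illustration; the paper does not report its exact settings.

```python
import re
import demoji
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def clean_review(text):
    text = demoji.replace_with_desc(text, sep=" ")   # emojis -> textual descriptions (demoji >= 0.4)
    text = re.sub(r"http\S+", " ", text)             # URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)         # special characters
    return text.lower()

reviews = df["review"].astype(str).map(clean_review)   # df as loaded in the Sect. 4.1 sketch
labels = df["condition"]

# TF-IDF features over unigrams and bigrams; max_features is an illustrative cap
vectorizer = TfidfVectorizer(stop_words="english", max_features=20000, ngram_range=(1, 2))
X = vectorizer.fit_transform(reviews)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42)
```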
5 Results and Discussions This section details the results obtained from the experiment conducted with the dataset described in Sect. 4.1. The results obtained for the shallow machine learning algorithms and the related discussions are given in Sect. 5.1, and Sect. 5.2 discusses the results of the Bi-directional Long Short-Term Memory and Convolutional Neural Network classifiers. The results and discussions for health mention classification using BERT are given in Sect. 5.3, and the results for SciBERT are given in Sect. 5.4.
5.1 Health Mention Classification Using Shallow Machine Learning Algorithms Five shallow machine learning algorithms were implemented as discussed in the proposed approach, namely Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM). The precision, recall, and f1-score for these algorithms are shown in Table 2. LR scored a precision of 88%, recall of 79%, and an f1-score of 83%. The LGBM classifier recorded 69%, 90%, and 78% for precision, recall, and f1-score respectively. While NB scored 92% precision, 73% recall, and 81% f1-score, the RF classifier obtained 68%, 61%, and 65% for precision, recall, and f1-score. The SVM algorithm recorded 73% precision, 63% recall, and 66% f1-score for the experiment conducted.
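A hedged sketch of this shallow-learning comparison is given below, using the TF-IDF features from the Sect. 4.2 sketch. Default hyperparameters, a linear SVM, and multinomial Naive Bayes are assumed choices; the paper does not specify its configurations.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from lightgbm import LGBMClassifier
from sklearn.metrics import classification_report

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Light Gradient Boosting Machine": LGBMClassifier(),
    "Naive Bayes": MultinomialNB(),          # suits the non-negative TF-IDF features
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Support Vector Machine": LinearSVC(),   # a linear SVM is a common choice for sparse text
}

for name, model in models.items():
    model.fit(X_train, y_train)              # features/labels from the Sect. 4.2 sketch
    y_pred = model.predict(X_test)
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
```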
Table 2 Classification report for the Logistic Regression, LGBM, Naive Bayes, Random Forest, and SVM classifiers
Algorithm                        | Precision | Recall | F1-score
Logistic Regression              | 0.88      | 0.79   | 0.83
Light Gradient Boosting Machine  | 0.69      | 0.90   | 0.78
Naive Bayes                      | 0.92      | 0.73   | 0.81
Random Forest                    | 0.68      | 0.61   | 0.65
Support Vector Machine           | 0.73      | 0.63   | 0.66
5.2 Health Mention Classification Using Bi-Directional Long Short-Term Memory (BiLSTM) and Convolutional Neural Network (CNN) We have implemented CNN and BiLSTM algorithms for the classification of health mentions, and the results obtained are shown in Tables 3 and 4 respectively. Table 3 presents the precision, recall, and f1-score for the 14 selected health conditions. The Convolutional Neural Network recorded a weighted average of 89% for precision, 88% for recall, and 88% for f1-score (Fig. 2).
Table 3 Classification report for the convolutional neural network model
Health condition          | Precision | Recall | F1-score
ADHD                      | 0.92      | 0.89   | 0.91
Acne                      | 0.95      | 0.88   | 0.92
Anxiety                   | 0.83      | 0.72   | 0.77
Bipolar disorder          | 0.83      | 0.76   | 0.79
Birth control             | 0.96      | 0.98   | 0.97
Depression                | 0.74      | 0.85   | 0.79
Diabetes (type 2)         | 0.91      | 0.89   | 0.90
Emergency contraception   | 0.99      | 0.93   | 0.96
High blood pressure       | 0.93      | 0.83   | 0.88
Insomnia                  | 0.88      | 0.85   | 0.87
Obesity                   | 0.70      | 0.61   | 0.66
Pain                      | 0.87      | 0.95   | 0.91
Vaginal yeast infection   | 0.97      | 0.94   | 0.95
Weight loss               | 0.68      | 0.74   | 0.71
Macro average             | 0.87      | 0.84   | 0.86
Weighted average          | 0.89      | 0.88   | 0.88
Accuracy                  | 0.88
Table 4 Classification report for BiLSTM model
Health condition          | Precision | Recall | F1-score
ADHD                      | 0.92      | 0.87   | 0.89
Acne                      | 0.94      | 0.89   | 0.91
Anxiety                   | 0.80      | 0.74   | 0.77
Bipolar disorder          | 0.78      | 0.75   | 0.76
Birth control             | 0.97      | 0.98   | 0.97
Depression                | 0.74      | 0.79   | 0.76
Diabetes (type 2)         | 0.85      | 0.88   | 0.86
Emergency contraception   | 0.97      | 0.95   | 0.96
High blood pressure       | 0.82      | 0.83   | 0.82
Insomnia                  | 0.84      | 0.87   | 0.85
Obesity                   | 0.55      | 0.75   | 0.63
Pain                      | 0.93      | 0.91   | 0.92
Vaginal yeast infection   | 0.93      | 0.96   | 0.94
Weight loss               | 0.70      | 0.44   | 0.54
Macro average             | 0.84      | 0.83   | 0.83
Weighted average          | 0.87      | 0.87   | 0.87
Accuracy                  | 0.87
Fig. 2 Classification report for the Logistic Regression, LGBM, Naive Bayes, Random Forest, and SVM classifiers
The classification report for the BiLSTM model is shown in Table 4 with 14 diseases. This model has recorded a weighted average of 87% for precision, recall, and f1-score (Figs. 3, 4).
Fig. 3 Precision, recall, F1-score comparisons for different health conditions for convolutional neural network
Fig. 4 Precision, recall, f1-score comparisons for different health conditions for bidirectional long short-term memory
5.3 Health Mention Classification Using Bidirectional Encoder Representations from Transformers For the BERT implementation, the classification report is shown in Table 5 for the top six health conditions, namely Birth Control, Depression, Pain, Anxiety, Acne, and Bipolar Disorder. BERT recorded 91% for precision, recall, and f1-score (weighted average). The comparison shows that BERT outperformed the other models in terms of f1-score (Fig. 5).
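A hedged fine-tuning sketch with the Hugging Face transformers library is shown below. The "bert-base-uncased" checkpoint, the toy example texts/labels, and the training hyperparameters are assumptions; the paper does not report the exact variant or settings it used.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"   # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=6)

# Toy examples only; in practice these are the cleaned reviews and integer-encoded conditions.
texts = ["This pill finally cleared my skin", "Helped with my panic attacks"]
labels = [0, 1]

class ReviewDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="bert-hmc", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ReviewDataset(texts, labels)).train()
```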
Table 5 Classification report for BERT model
Health condition   | Precision | Recall | F1-score
Birth control      | 0.98      | 0.98   | 0.98
Depression         | 0.80      | 0.82   | 0.81
Pain               | 0.92      | 0.95   | 0.94
Anxiety            | 0.75      | 0.81   | 0.78
Acne               | 0.93      | 0.98   | 0.91
Bipolar disorder   | 0.87      | 0.69   | 0.77
Accuracy           | 0.91
Macro average      | 0.88      | 0.86   | 0.87
Weighted average   | 0.91      | 0.91   | 0.91
Fig. 5 Precision, recall, F1-score comparisons for different health conditions for bidirectional encoder representations from transformers
5.4 Health Mention Classification Using Pre-Trained BERT-Based Language Model for Scientific Text SciBERT works similarly to the BERT model, but it was pre-trained on a scientific/medical corpus built from publicly accessible data such as PubMed and PMC. For implementing SciBERT on our dataset, we manually labelled 2000 training samples and 1000 test samples using only the top 40 health conditions. The classification report for the SciBERT model for the top 17 health conditions is shown in Table 6; the weighted average precision is 87%, recall is 89%, and f1-score is 86%. On closer observation, the precision, recall, and f1-score for individual health conditions are not satisfactory, and we believe this is due to the limited data samples used for training; this needs to be further investigated (Fig. 6).
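For illustration, the publicly released SciBERT checkpoint can be swapped into the BERT sketch above; treating "allenai/scibert_scivocab_uncased" as the model used here is an assumption, since the paper does not name the exact checkpoint.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the public SciBERT release; fine-tuning then proceeds exactly as in the BERT sketch.
scibert_tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
scibert = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=40)  # top 40 health conditions
```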
Table 6 Classification report for Sci-BERT model
Health condition           | Precision | Recall | F1-score
ADHD                       | 0.42      | 0.39   | 0.41
GERD                       | 0.70      | 0.45   | 0.41
Abnormal uterine bleeding  | 0.50      | 0.05   | 0.09
Acne                       | 0.45      | 0.37   | 0.40
Birth control              | 0.56      | 0.43   | 0.49
Depression                 | 0.50      | 0.36   | 0.42
Emergency contraception    | 0.47      | 0.37   | 0.42
Fibromyalgia               | 0.77      | 0.19   | 0.31
Insomnia                   | 0.45      | 0.30   | 0.36
Irritable bowel syndrome   | 0.50      | 0.20   | 0.29
Migraine                   | 0.49      | 0.63   | 0.55
Muscle spasm               | 0.57      | 0.22   | 0.32
Sinusitis                  | 0.41      | 0.23   | 0.29
Smoking cessation          | 0.74      | 0.51   | 0.60
Urinary tract infection    | 0.43      | 0.29   | 0.35
Vaginal yeast infection    | 0.50      | 0.37   | 0.43
Weight loss                | 0.49      | 0.23   | 0.31
Macro average              | 0.44      | 0.16   | 0.20
Weighted average           | 0.87      | 0.89   | 0.86
Accuracy                   | 0.89
Fig. 6 Precision, recall, F1-score comparisons for different health conditions for SciBERT—a pre-trained BERT-based language model for scientific text (SciBERT)
6 Conclusions Analyzing user-generated text from social media for health mentions has several use cases, such as understanding user reviews and sentiments on medications, their efficacy, and adverse drug reactions. As manual analysis is very cumbersome, machine learning approaches can be very handy and effective for automating the analysis. This work proposed machine learning-based approaches for health mention classification from social media posts. Shallow learning algorithms such as Logistic Regression, Light Gradient Boosting Machine, Naive Bayes, Random Forest, and Support Vector Machine were implemented along with deep learning algorithms such as BiLSTM, CNN, and Transformers. The end results are promising and show that machine learning will be a great choice for automating health mention classification from user-generated content.
References 1. Abualigah, L., Alfar, H. E., Shehab, M., & Hussein, A. M. A. (2020). Sentiment analysis in healthcare: a brief review. Recent Advances in NLP: The Case of Arabic Language, 129–141. 2. Biddle, R., Joshi, A., Liu, S., Paris, C., & Xu, G. (2020, April). Leveraging sentiment distributions to distinguish figurative from literal health reports on Twitter. In Proceedings of The Web Conference 2020 (pp. 1217–1227).
3. Fries JA, Steinberg E, Khattar S, Fleming SL, Posada J, Callahan A, Shah NH (2021) Ontologydriven weak supervision for clinical entity classification in electronic health records. Nat Commun 12(1):1–11 4. Gräßer, F., Kallumadi, S., Malberg, H., & Zaunseder, S. (2018). Aspect-based sentiment analysis of drug reviews applying cross-domain and cross-data learning. In Proceedings of the 2018 International Conference on Digital Health (pp. 121–125). 5. Hajibabaee, P., Malekzadeh, M., Ahmadi, M., Heidari, M., Esmaeilzadeh, A., Abdolazimi, R., & James Jr, H. (2022). Offensive language detection on social media based on text classification. In 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 0092–0098). IEEE. 6. Jothi N, Husain W (2015) Data mining in healthcare–a review. Procedia computer science 72:306–313 7. Kayastha, T., Gupta, P., & Bhattacharyya, P. (2021). BERT based Adverse Drug Effect Tweet Classification. In Proceedings of the Sixth Social Media Mining for Health (\#SMM4H) Workshop and Shared Task (pp. 88–90). 8. Khan, P. I., Razzak, I., Dengel, A., & Ahmed, S. (2020). Improving personal health mention detection on twitter using permutation based word representation learning. In International Conference on Neural Information Processing (pp. 776–785). Springer, Cham. 9. Khan, P. I., Razzak, I., Dengel, A., & Ahmed, S. (2022). Performance comparison of transformer-based models on twitter health mention classification. IEEE Transactions on Computational Social Systems. 10. Lekshmi, S., & Anoop, V. S. (2022). Sentiment Analysis on COVID-19 News Videos Using Machine Learning Techniques. In Proceedings of International Conference on Frontiers in Computing and Systems (pp. 551–560). Springer, Singapore. 11. Liu J, Wang X, Tan Y, Huang L, Wang Y (2022) An Attention-Based Multi-Representational Fusion Method for Social-Media-Based Text Classification. Information 13(4):171 12. Luo, L., Wang, Y., & Mo, D. Y. (2022). Identifying COVID-19 Personal Health Mentions from Tweets Using Masked Attention Model. IEEE Access. 13. Messaoudi, C., Guessoum, Z., & Ben Romdhane, L. (2022). Opinion mining in online social media: a survey. Social Network Analysis and Mining, 12(1), 1-18 14. Naseem, U., Kim, J., Khushi, M., & Dunn, A. G. (2022). Identification of disease or symptom terms in reddit to improve health mention classification. In Proceedings of the ACM Web Conference 2022 (pp. 2573–2581). 15. Reveilhac, M., Steinmetz, S., & Morselli, D. (2022). A systematic literature review of how and whether social media data can complement traditional survey data to study public opinion. Multimedia Tools and Applications, 1–36. 16. Salas-Zárate, R., Alor-Hernández, G., Salas-Zárate, M. D. P., Paredes-Valverde, M. A., BustosLópez, M., & Sánchez-Cervantes, J. L. (2022). Detecting depression signs on social media: a systematic literature review. In Healthcare (Vol. 10, No. 2, p. 291). MDPI. 17. Shiju, A., & He, Z. (2021). Classifying Drug Ratings Using User Reviews with TransformerBased Language Models. MedRxiv. 18. Thoomkuzhy, A. M. (2020). Drug Reviews: Cross-condition and Cross-source Analysis by Review Quantification Using Regional CNN-LSTM Models. 19. Varghese, M., & Anoop, V. S. (2022). Deep Learning-Based Sentiment Analysis on COVID19 News Videos. In Proceedings of International Conference on Information Technology and Applications (pp. 229–238). Springer, Singapore.
Using Standard Machine Learning Language for Efficient Construction of Machine Learning Pipelines Srinath Chiranjeevi and Bharat Reddy
Abstract We use Standard Machine Learning Language (SML) to streamline the synthesis of machine learning pipelines in this research. The overarching goal of SML is to ease the production of machine learning pipeline by providing a level of abstraction which makes it possible for individuals in industry and academia to use machine learning to tackle challenges across a variety of fields without having to deal with low level details involved in creating a machine learning pipeline. We further probe into how a wide range of interfaces can be instrumental in interacting with SML. Lines of comparison are further drawn to analyze the efficiency of SML in practical use cases versus traditional approaches. As an outcome, we developed SML a query like language which serves as an abstraction from writing a lot of code. Our findings show how SML is competent in solving the problems that utilize machine learning. Keywords Machine learning pipelines · Standard machine learning language · Problem solving using machine learning
1 Introduction Machine Learning has simplified the process of solving a vast number of problems in a variety of fields by learning from data. In most cases, machine learning has become more attractive than manually creating programs to address these same issues. However, there are a multitude of nuances involved in developing machine learning pipelines [2]. If these nuances are not taken into consideration, one may not receive satisfactory results. A domain expert utilizing machine learning to solve
S. Chiranjeevi (B) Vellore Institute of Technology, Bhopal, India e-mail: [email protected] B. Reddy National Institute of Technology, Calicut, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_16
Fig. 1 Example of a SML query performing classification
problems may not want or have the time to deal with these complexities. To combat these issues, we introduce the Standard Machine Learning Language (SML). The overall objective of SML is to provide a level of abstraction which simplifies the development process of machine learning pipelines [8]. Consequently, this enables students, researchers, and industry professionals without a background in machine learning to solve problems in different domains with machine learning. We developed SML as a query-like language which serves as an abstraction from writing a lot of code (see Fig. 1 for an example). In the subsequent sections, related works are discussed, followed by defining the grammar used to create queries for SML [3]. The architecture of SML is then described, and lastly SML is applied to use cases to demonstrate how it reduces the complexity of solving problems that utilize machine learning.
2 Related Work There are related works that also attempt to provide a level of abstraction for writing machine learning code. TPOT [5] is a tool implemented in Python that creates and optimizes machine learning pipelines using genetic programming. Given cleaned data, TPOT performs feature selection, preprocessing, and construction. Given the task (classification, regression, or clustering), it uses the best features to determine the most optimal model to use. Lastly, it performs optimization of the parameters for the selected model. What differentiates SML from TPOT is that, in addition to feature, model, and parameter selection/optimization, a framework is in place to apply these models to different datasets and construct visualizations for different metrics with each algorithm. LBJava [6] is a tool based on a programming paradigm called Learning Based Programming (D. Roth 2010), which is an extension of conventional programming that creates functions using data-driven approaches. LBJava follows the principles of Learning Based Programming by abstracting the details of common machine learning processes. What separates SML from LBJava and TPOT is that it offers a higher level of abstraction by providing a query-like language which allows people who are not experienced programmers to use SML.
3 Grammar The SML language is a domain specific language with grammar implemented in Backus-Naur form (BNF). Each expression has a rule and can be expanded into other terms. Figure 1 is an example of how one would perform classification on a dataset using SML. The query in Fig. 1 reads from a dataset, performs an 80/20 split of training and testing data respectively, and performs classification on the 5th column of the hypothetical dataset using columns 1, 2, 3, and 4 as predictors. In the subsequent subsections SML’s grammar in BNF form is defined in addition to the keywords [1].
3.1 Grammar Structure This subsection is dedicated to defining the grammar of SML in terms of BNF. A Query can be defined by a delimited list of actions where the delimiter is an AND statement; with BNF syntax this is defined as: < Query >::=< Action > | < Action > AND < Query >
(1)
An Action in (1) follows one of the following structures defined in (2) where a Keyword is required followed by an Argument and/or OptionList. < Action >::=< Keyword > \ < Argument > ” | < Keyword > \ < Argument > ”\(” < OptionList > \)”
(2)
| < Keyword > \(” < OptionList > \)” A Keyword is a predefined term associating an Action with a particular string. An Argument generally is a single string surrounded by quotes that specifies a path to a file. Lastly, an Argument can have a multitude of options (3) where an Option consist of an OptionName with either an OptionValue or OptionValueList. An OptionName, and OptionValue consist of a single string, an OptionList (4) consist of a comma delimited list of options and an OptionValueList (5) consist of a comma delimited list of OptionValues. < Option >::=< OptionName > \ = ” < Option Value > | < OptionName > \ = ”\ ” < OptionValueList > \ ” < OptionList >::=< Option > | < Option > ” < OptionList > < OptionValueList >::=< OptionValue > | < OptionValue > \, ” < OptionValueList >
(3)
(4) (5)
192
S. Chiranjeevi and B. Reddy
Fig. 2 Here the example Query on the top was defined in Fig. 1 and the bottom Query is in BNF format. For the example Query the first Keyword is READ followed by an Argument that specifies the path to the dataset, next an OptionValueList containing information about the delimiter of the dataset and the header. We then include the AND delimiter to specify an additional Keyword SPLIT with an OptionValueList that tells us the size of the training and testing partitions for the dataset specified with the READ Keyword. Lastly, the AND delimiter is used to specify another Keyword CLASSIFY which performs classification using the training and testing data from the result of the SPLIT Keyword followed by an OptionValueList which provides information to SML about the features to use (columns 1–4), the label we want to predict (column 5), and the algorithm to use for classification
To put the grammar into perspective the example Query in Fig. 1 has been transcribed into BNF format and can be found in Fig. 2. The next subsection describes the functionality for all Keywords of SML.
3.2 Keywords Currently there are 8 Keywords in SML. These Keywords can be chained together to perform a variety of actions. In the subsequent subsections we describe the functionality of each Keyword.
3.2.1
Reading Datasets
When reading data from SML one must use the READ Keyword Followed by an Argument containing a path to the dataset. READ also accepts a variety of Options. The first Query in Fig. 3 consist of only a Keyword and Argument. This Query reads in data from”/path/to/dataset”. The second Query includes an OptionValueList in addition to reading data from the specified path; the OptionValueList specifies that the dataset is delimited with semicolons and does not include a header row.
Fig. 3 Example using the READ Keyword in SML
Using Standard Machine Learning Language for Efficient Construction …
193
Fig. 4 An example utilizing the REPLACE Keyword in SML
Fig. 5 Example using the SPLIT Keyword in SML
3.2.2
Cleaning Data
When NaNs, NAs and/or other troublesome values are present in the dataset we clean these values in SML by using the REPLACE Keyword. Figure 4 shows an example of the REPLACE Keyword being used. In this Query we use the REPLACE Keyword in conjugation with the READ Keyword. SML reads from a comma delimited dataset with no header from the path”/path/to/dataset”. Then we replace any instance of” NaN” with the mode of that column in the dataset.
3.2.3
Partitioning Datasets
It is often useful to split a dataset into training and testing datasets for most tasks involving machine learning. This can be achieved in SML by using the SPLIT Keyword. Figure 5 shows an example of a SML Query performing an 80/20 split for training and testing data respectively by utilizing the SPLIT Keyword after reading in data.
3.2.4
Using Classification Algorithms
To use a classification algorithm in SML one would use the CLASSIFY Keyword. SML has the following classification:
3.2.5
Algorithms Implemented
Support Vector Machines, Naive Bayes, Random Forest, Logistic Regression, and K-Nearest Neighbors. Figure 6 demonstrates how to use the CLASSIFY Keyword in a Query.
194
S. Chiranjeevi and B. Reddy
Fig. 6 Example using the CLASSIFY Keyword in SML. Here we read in data and create training and testing datasets using the READ and SPLIT Keywords respectively. We then use CLASSIFY Keyword with the first 4 columns as features and the 5th column to perform classification using a support vector machine
Fig. 7 Example using the CLUSTER Keyword in SML. Here we read in data and create training and testing datasets using the READ and SPLIT Keywords respectively. We then use CLUSTER Keyword with the first 7 columns as features and perform unsupervised clustering with the K-Means algorithm
Fig. 8 Example using the REGRESS Keyword in SML. Here we read in data and create training and testing datasets using the READ and SPLIT Keywords respectively. We then use REGRESS Keyword with the first 9 columns as features and the 10th column to perform regression on using ridge regression
3.2.6
Using Clustering Algorithms
Clustering algorithms can be invoked by using the CLUSTER Keyword. SML currently has K-Means clustering implemented. Figure 7 demonstrates how to use the CLUSTER Keyword in a Query.
3.2.7
Using Regression Algorithms
Regression algorithms use the REGRESS Keyword. SML currently has the following regression algorithms implemented: Simple Linear Regression, Ridge Regression, Lasso Regression, and Elastic Net Regression. Figure 8 demonstrates how to use the REGRESS Keyword in a Query.
3.2.8
Saving/Loading Models
It is possible to save models and reuse them later. To save a model in SML one would use the SAVE Keyword in a Query. To load an existing model from SML one
Using Standard Machine Learning Language for Efficient Construction …
195
Fig. 9 Example using the LOAD and SAVE Keywords in SML
Fig. 10 Example using the PLOT Keyword in SML
would use the LOAD Keyword in a Query. Figure 9 shows how the syntax required save and load a model using SML. With any of the existing queries using REGRESS, CLUSTER, or CLASSIFY Keywords attaching SAVE to the Query will save the model.
3.2.9
Visualizing Datasets and Metrics of Algorithms
When using SML it is possible to visualize datasets or metrics of algorithms (such as learning curves, or ROC curves). To do this the PLOT Keyword must be specified in a Query. Figure 10 shows can example of how to use the PLOT Keyword in a Query. We apply the same operations to perform clustering in Fig. 7, however we utilize the PLOT Keyword.
4 SML’s Architecture With SML’s grammar defined enough information has been presented to dive into SML’s architecture. When SML is given a Query in the form of a string, it is passed to the parser. The high-level implementation of the grammar is then used to parse through the string to determine the actions to perform. The actions are stored in a dictionary and given to one of the following phases of SML: Model Phase, Apply Phase, or Metrics Phase. Figure 11 shows a block diagram of this process. The model phase is generally for constructing a model. The Keywords that generally invoke the model phase are: READ, REP LACE, CLASSIFY, REGRESS, CLUSTER, and SAVE. The apply phase is generally for applying a preexisting model to new data. The Keyword that generally invokes the apply phase is LOAD. It is often useful to visualize the data that one works with and beneficial to see performance metrics of a machine learning model. By default, if you specify the PLOT Keyword in a Query, SML will execute the metrics phase. The last significant component of SML’s architecture is the connector. The connector connects drivers from different libraries and languages to achieve an action a user wants during a particular phase (see Fig. 12). If one considers applying linear regression on a dataset, during the model phase SML calls the connector to retrieve the linear regression library in this
196
S. Chiranjeevi and B. Reddy
Fig. 11 Block Diagram of SML’s Architecture
Fig. 12 Block diagram of SML’s connector
case SML uses sci-kit learn’s implementation however, if we wanted to use an algorithm not available in sci-kit learn such as a Hidden Markov Model (HMM) SML will use the connector to call another library that supports HMM.
5 Interface There are multiple interfaces available for working with SML. We have developed a web tool that is publicly available which allows users to write queries and get results back from SML through a web interface (see Fig. 13). There is also a REPL environment available that allows the user to interactively write queries and displays results from the appropriate phases of SML. Lastly, users have the option to import SML into an existing pipeline to simplify the development process of applying machine learning to problems.
Using Standard Machine Learning Language for Efficient Construction …
197
Fig. 13 Interface of SML’s website. Currently users can read instructions and examples of how to use SML are on the left pane. In the middle pane users can type an SML Query and then hit the execute button. The results of running the Query through SML are then displayed on the right pane
6 Use Cases We tested SML’s framework against ten popular machine learning problems with publicly available data sets. We applied SML to the following datasets: Iris Dataset,1 Auto-MPG Dataset,2 Seeds Dataset,3 Computer Hardware Dataset,4 Boston Housing Dataset,5 Wine Recognition Dataset,6 US Census Dataset,7 chronic kidney disease,8 Spam Detection9 which were taken from UCI’s Machine Learning Repository (M. [4]. We also applied SML to the Titanic Dataset.10 In this paper we discuss in detail the process of applying SML to the Iris Dataset and the Auto-MPG dataset.
6.1 Iris Dataset Figure 14 shows all the code required to perform classification on the Iris dataset using SML in Python. In Fig. 14 data is read in from a specified path of a file called” iris.csv” of a subdirectory called “data” in the parent directory, performs an 80/20 split, uses the first 4 columns to predict the 5th column, uses support vector machines as the algorithm to perform classification and finally plots distributions of our dataset 1
https://archive.ics.uci.edu/ml/datasets/Iris. https://archive.ics.uci.edu/ml/datasets/Auto+MPG. 3 https://archive.ics.uci.edu/ml/datasets/seeds. 4 https://archive.ics.uci.edu/ml/datasets/Computer+Hardware. 5 https://archive.ics.uci.edu/ml/datasets/Housing. 6 https://archive.ics.uci.edu/ml/datasets/wine. 7 https://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990). 8 https://archive.ics.uci.edu/ml/datasets/ChronicKidneyDisease. 9 https://archive.ics.uci.edu/ml/datasets/Spambase. 10 https://www.kaggle.com/c/titanic. 2
Fig. 14 SML Query that performs classification on the iris dataset using support vector machines. The purpose of this figure is to highlight the level of complexity relative to an SML query
Fig. 15 The SML Query in Fig. 14 produces these results. The subgraph on the left is a lattice plot showing the density estimates of each feature used. The graph on the right shows the ROC curves for each class of the iris dataset
and metrics of our algorithm. The Query in Fig. 14 uses the same 3rd party libraries implicitly or explicitly. The complexities required to produce such results with and without SML are outlined. The result for both snippets of code is the same and can be seen in Fig. 15.
6.2 Auto-Mpg Dataset Figure 16 shows the SML Query required to perform regression on the Auto-MPG dataset in Python. In Fig. 16 we read data from a specified path, the dataset is separated by fixed width spaces, and we choose not to provide a header for the dataset. Next, we perform an 80/20 split, replace all occurrences of “?” with the mode of the column. We then perform linear regression using columns 2–8 to predict the label. Lastly, we visualize distributions of our dataset and metrics of our algorithm. The outcome of both processes is the same and can be seen in Fig. 17.
Fig. 16 SML Query that performs classification on the Auto-MPG dataset using support vector machines
Fig. 17 The SML Query in Fig. 16 produce these results. The subgraph on the left is a lattice plot showing the density estimates of each feature used. The top right graph shows the learning curve of the model and the graph on lower right shows the validation curve
7 Discussion For the Iris and Auto-MPG use cases, the same libraries and programming language were used to perform classification and regression. The amount of work required to perform a task and produce the results shown in Figs. 15 and 17 decreases significantly when SML is utilized. Constructing each SML query required fewer than 10 lines of code; however, implementing the same procedures without SML, using the same programming language and libraries, needed 70+ lines of code [9]. This demonstrates that SML simplifies the development process of solving problems with machine learning and opens a realm of possibility to rapidly develop machine learning pipelines, which would be an attractive aspect for researchers [10].
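For context, a condensed sketch of what the manual (non-SML) version of the Iris use case involves is given below, using pandas and scikit-learn. This is not the authors' exact 70+ line implementation; the file path follows the description in Sect. 6.1, and the assumption here is that the CSV has a header row.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Manual equivalent of the Iris SML query: read, 80/20 split, SVM on columns 1-4 vs column 5.
df = pd.read_csv("../data/iris.csv")
X, y = df.iloc[:, 0:4], df.iloc[:, 4]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)

clf = SVC(probability=True).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
# The full manual version also builds the lattice/density plots and per-class ROC curves,
# which is where most of the extra lines of code come from.
```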
8 Conclusion To summarize we introduced an agnostic framework that integrates a query-like language to simplify the development of machine learning pipelines. We provided a high-level overview of its architecture and grammar. We then applied SML to
machine learning problems and demonstrated how the complexity of the code one must write decreases significantly when SML is used. In the future, we plan to extend the connector to support more machine learning libraries and additional languages. We also plan to expand the web application to make SML easier to use for a lay user. If we want researchers from other domain areas to utilize machine learning without understanding the complexities required for machine learning, a tool like SML is needed. The concepts presented in this paper are sound. The details may change, but the core principles will remain the same. Abstracting the complexities of machine learning from users is appealing because this will increase the use of machine learning by researchers in different disciplines.
References
1. AlBadani B, Shi R, Dong J (2022) A novel machine learning approach for sentiment analysis on twitter incorporating the universal language model fine-tuning and SVM. Appl Syst Innov 5(1):13
2. Domingos P (2012) A few useful things to know about machine learning. 55:78–87, New York, NY, USA, ACM
3. Kaczmarek I, Iwaniak A, Swietlicka A, Piwowarczyk M, Nadolny A (2022) A machine learning approach for integration of spatial development plans based on natural language processing. Sustain Cities Soc 76:103479
4. Lichman M (2013) UCI machine learning repository
5. Olson RS, Bartley N, Urbanowicz RJ, Moore JH (2016) Evaluation of a tree-based pipeline optimization tool for automating data science. CoRR, abs/1603.06212
6. Rizzolo N, Roth D (2010) Learning based java for rapid development of NLP systems. In: LREC, Valletta, Malta, p 5
7. Roth D (2005) Learning based programming. Innovations in machine learning: theory and applications, pp 73–95
8. Stoleru C-A, Dulf EH, Ciobanu L (2022) Automated detection of celiac disease using machine learning algorithms. Sci Rep 12(1):1–19
9. Wu X, Chen C, Li P, Zhong M, Wang J, Qian Q, Ding P, Yao J, Guo Y (2022) FTAP: feature transferring autonomous machine learning pipeline. Inf Sci 593:385–397
10. Yang Y, Li S, Zhang P (2022) Data-driven accident consequence assessment on urban gas pipeline network based on machine learning. Reliab Eng Syst Saf 219:108216
Machine Learning Approaches for Detecting Signs of Depression from Social Media Sarin Jickson, V. S. Anoop, and S. Asharaf
Abstract Depression is considered to be one of the most severe mental health issues globally; in many cases, depression may lead to suicide. According to a recent report by the World Health Organization (WHO), depression is a common illness worldwide and approximately 280 million people in the world are depressed. Timely identification of depression would be helpful to avoid suicides and save the life of an individual. Due to the widespread adoption of social network applications, people often express their mental state and concerns on such platforms. The COVID-19 pandemic has been a catalyst for this situation, as the mobility and physical social connections of individuals have been limited. This caused more and more people to express their mental health concerns on such platforms. This work attempts to detect signs of depression from unstructured social media posts using machine learning techniques. Advanced deep learning approaches such as transformers are used for classifying social media posts, which will help in the early detection of any signs of depression in individuals. The experimental results show that machine learning approaches may be efficiently used for detecting depression from user-generated unstructured social media posts. Keywords Depression detection · Machine learning · Social media · Transformers · Deep learning · Natural language processing · Computational social sciences
S. Jickson · V. S. Anoop Kerala Blockchain Academy, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India e-mail: [email protected] S. Asharaf Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India e-mail: [email protected] V. S. Anoop (B) School of Digital Sciences, Kerala University of Digital Sciences, Innovation and Technology, Thiruvananthapuram, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_17
1 Introduction Depression is one of the most dreadful mental disorders that affect millions of people around the world. Recent statistics by the World Health Organization (WHO) report that the number of individuals with depression is increasing day by day [7]. To be more specific, it is estimated that approximately 280 million people around the world are affected by depression, 5.0% among adults and 5.7% among adults older than 60 years [7]. These rates are highly alarming considering the fact that, in many cases, individuals commit suicide even though there are effective treatments available for mild, moderate, and severe depression [12]. The reduced mobility and limited or no social interactions imposed by the COVID-19 pandemic have fueled the rate of depression-related disorders around the globe [16]. People tended to stay at home due to the lockdown and other travel restrictions imposed by governments and other administrations, which affected the mental states of individuals in a negative manner and often led to depression in the vast majority. Depression usually does not last long, but for it to be diagnosed as a disorder, the symptoms must persist for two weeks or more. So, the timely identification of depression is very important to bring the individual back to their normal life.

The advancements in the internet and the competition between various internet service providers have made low-cost internet services a standard in many countries. This has not only increased the rate of internet penetration, but also accelerated the growth of internet-based applications and services such as social networks [1, 21]. According to some recent statistics (January 2022), more than half of the world's population uses social media, which amounts to approximately 4.62 billion people around the world [9]. It is also estimated that internet users worldwide spend an average of 2 h and 27 min per day on social media, and in the future, the amount of time spent on social networks will likely stay steady. So, connecting the statistics of the global depression reports and internet penetration, it is highly relevant that people use social media as a platform for sharing their opinions, anxieties, and other mental states. This has become a new normal due to the COVID-19 pandemic, where physical meetings and social networking opportunities were limited.

Social network analysis deals with collecting, organizing, and processing social media posts generated by users to unearth latent themes from them. This technique has proven to be efficient in understanding user intentions and patterns that may help the key stakeholders to take proactive decisions [12, 20]. As manual analysis of social media posts would be inefficient considering the large number of messages, recent approaches use machine learning, which can process a large amount of data with near-human accuracy. People with depression often post about it, along with indications of several related symptoms, and early detection is possible by analyzing such posts. Very recently, several studies incorporating machine learning approaches for depression detection from social media have been reported in the literature [13, 14, 23, 25] with varying degrees of success. The proposed work uses different machine learning algorithms (both shallow learning and deep learning) for identifying the severity of depression from social media posts represented as
unstructured text. This work also makes use of transformers for severity classification of posts into no depression, moderate, and severe. The main contributions of this paper are summarized as follows:
(a) Discusses depression, one of the most challenging mental disorders, and the role of social media in depression detection.
(b) Implements different machine learning algorithms in both shallow and deep learning categories and compares their performance.
(c) Reports the classification performance of different algorithms in classifying the severity of depression from social media posts.
The remainder of this paper is organized as follows. Section 2 discusses some of the recent related works reported in the literature that use machine learning approaches for depression detection. In Sect. 3, the authors present the proposed approach for classifying the severity of depression-related social media posts. Section 4 details the experiment conducted and the dataset used, and in Sect. 5, the results are presented and discussed in a detailed fashion. Section 6 concludes this work.
2 Related Studies Depression detection has garnered a lot of interest among social network and healthcare researchers in recent times. Several approaches reported in the recent past attempt to detect depression from social media posts, specifically from unstructured text. This section discusses some of the recent and prominent approaches reported in the machine learning and social network analysis literature that are highly related to the proposed approach. Xiaohui Tao et al. developed a prototype to illustrate their approach's mechanism and its potential social effects using a depressive sentiment vocabulary [19]. They compared the data with this vocabulary and classified the social media posts. Mandar Deshpande et al. used natural language processing to classify Twitter data and applied SVM and Naive Bayes algorithms for the classification of depression-related posts [6]. Guangyao Shen et al. proposed a method that uses a multi-modal dictionary learning solution [18]. Faisal Muhammad Shah et al. developed a method that uses a hybrid model to detect depression by analyzing users' textual posts [17]. Deep learning algorithms were trained using the training data, and performance was then evaluated on the Reddit data published for the pilot piece of work. In particular, the authors proposed a Bidirectional Long Short-Term Memory (BiLSTM) with different word embedding techniques and metadata features that gave comparatively better performance. Chiong et al. proposed a textual-based featuring approach for depression detection from social media using machine learning classifiers [3]. They used two publicly available labeled datasets to train and test the machine learning models and three other non-Twitter datasets to evaluate the performance of their proposed model. The experimental results showed that their proposed approach effectively
detected depression from social media data. Zogan et al. developed DepressionNet, a depression detection model for social media with user post summarization and multi-modalities [24]. They proposed a novel framework for extractive and abstractive post summarization and used deep learning algorithms such as CNN and GRU for classification. Titla-Tlatelpa et al. proposed an approach for depression detection from social media using a profile-based sentiment-aware approach [5]. This approach explored the use of the user's characteristics and the sentiments expressed in the messages as context insights. The authors proposed a new approach for the classification of user profiles, and their experiments on the benchmark datasets showed better results. Another approach that uses sentiment lexicons and content-based features was reported by Chiong et al. They proposed 90 unique features as input to the machine learning classifier framework for depression detection from social media. Their approach resulted in more than 96% accuracy for all the classifiers, with the highest being 98% with the gradient boosting algorithm [4]. Lara et al. presented DeepBoSE, a deep bag of sub-emotions for depression detection from social media [11]. The proposed approach computed a bag-of-features representation that uses emotional information and is further trained under the transfer learning paradigm. The authors performed their experiments on the eRisk17 and eRisk18 datasets for the depression detection task and achieved better F1-scores on both. An approach for early detection of stress and depression from social media using a mental state knowledge-aware and contrastive network was reported by Yang et al. The authors tested the proposed methods on a depression detection dataset, Depression-Mixed, with 3165 Reddit and blog posts, a stress detection dataset, Dreaddit, with 3553 Reddit posts, and a stress factors recognition dataset, SAD, with 6850 SMS-like messages. Their proposed approach achieved new state-of-the-art results on all the datasets used [23]. Angskun et al. presented a big data analytics approach on social media for the real-time detection of depression [2]. They used Twitter data collected over a period of two months and implemented machine learning algorithms including deep learning approaches. They reported that the Random Forest classifier showcased better results and that their model could capture the depressive moods of depression sufferers. This proposed work implements different machine learning algorithms, including transformers, to classify the severity of depression from online social media posts. The comparison results for all the machine learning approaches are also reported on publicly available depression severity labeled datasets.
3 Proposed Approach This section discusses the proposed approach for depression severity classification from social media text using machine learning approaches. The overall workflow of the proposed approach is shown in Fig. 1. The first step deals with the preprocessing of social media posts, such as data cleaning and normalization. As social media posts are user-generated, they may contain noise and unwanted content such
as URLs, special characters, and emojis. As these elements may not convey any useful features, they should be removed from the dataset. Then normalization techniques are applied that change the values of numerical columns in the dataset to a common scale without losing information. This step is crucial for improving the performance and training stability of the model. In the feature extraction stage, various features relevant to training the machine learning classifiers, such as counts of words/phrases, term frequency-inverse document frequency (TF-IDF), and embeddings from pre-trained models such as BERT (Bidirectional Encoder Representations from Transformers), are extracted. The collected features are used for training machine learning algorithms in both the shallow learning (SVM, Logistic Regression, Naive Bayes, etc.) and deep learning (ANN, CNN, and Transformers) categories. After the feature extraction stage, the dataset was split into train and test sets, used for training and testing the model, respectively. In our case, the dataset contains three labels (not depression, moderate, and severe depression), and the data points were imbalanced. Down-sampling and up-sampling techniques were applied to create a balanced, experiment-ready version of the final dataset. The proposed approach implemented the following shallow-learning algorithms: Logistic Regression (LR): Logistic regression is a technique that employs a set of continuous, discrete, or mixed characteristics, together with a binary target. This approach is popular since it is simple to apply and produces decent results.
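A minimal sketch of the cleaning, class-balancing, and TF-IDF feature-extraction steps described above is given below; the file path, column names, and vectorizer settings are illustrative assumptions, not the authors' exact implementation.

# Hypothetical preprocessing and feature extraction; "Text"/"Label" column
# names, the file path, and all parameter values are assumptions.
import re
import pandas as pd
from sklearn.utils import resample
from sklearn.feature_extraction.text import TfidfVectorizer

def clean_post(text):
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)        # remove special characters/emojis
    return re.sub(r"\s+", " ", text).strip().lower()

def balance(df, label_col="Label"):
    # Down-sample every class to the size of the smallest class
    n_min = df[label_col].value_counts().min()
    parts = [resample(group, replace=False, n_samples=n_min, random_state=0)
             for _, group in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=0)

df = pd.read_csv("train.tsv", sep="\t")             # path is illustrative
df["Text"] = df["Text"].astype(str).map(clean_post)
df = balance(df)

vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
X = vectorizer.fit_transform(df["Text"])
y = df["Label"]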
Fig. 1 Overall workflow of the proposed approach
Support Vector Machines (SVM): The SVM has a distinct edge when it comes to tackling classification tasks that need high generalization. The strategy aims to reduce mistakes by focusing on structural risk minimization. This technique is widely utilized in medical diagnostics. Multinomial Naïve Bayes (NB): This is a popular classification algorithm used for the analysis of categorical text data. The algorithm is based on the Bayes theorem and predicts the tag of a text by computing the probability of each tag for a given sample, giving the tag with the highest probability as output. Random Forest (RF): The Random Forest approach has been applied in several studies. Because random forest separates data into branches to form decision trees, it can struggle to address unbalanced classification problems; the dataset employed in this study was therefore balanced, as described above. The proposed approach implements the following neural network/deep learning algorithms (with pre-trained embeddings) on the social network dataset. Artificial Neural Network (ANN): An artificial neural network is a group of interconnected nodes inspired by how the human brain works. An ANN tries to find the relationships between features in a dataset and classifies samples according to a specific architecture. Convolutional Neural Networks (CNN): The CNN architecture for classification includes convolutional layers, max-pooling layers, and fully connected layers. Convolution and max-pooling layers are used for feature extraction: while convolution layers are meant for feature detection, max-pooling layers are meant for feature selection. Transformers: Transformers are a class of deep neural network models extensively used in natural language processing applications; they adopt the mechanism of self-attention, differentially weighting the significance of each part of the input data.
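The following is an illustrative sketch of training the shallow classifiers listed above on the TF-IDF features from the previous snippet; hyperparameters are library defaults, not the authors' settings.

# Illustrative training of the shallow classifiers on the TF-IDF matrix X and
# labels y from the preceding sketch; all hyperparameters are assumptions.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": LinearSVC(),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test), digits=2))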
4 Experiment This section discusses the experiment conducted using the proposed approach discussed in Sect. 3. A detailed explanation of the dataset used and the experimental testbed is provided in this section.
Table 1 A snapshot of the dataset used

Posting ID     | Text                                                                                                           | Label
train_pid_8231 | Words can't describe how bad I feel right now: I just want to fall asleep forever                              | Severe
train_pid_1675 | I just tried to cut myself and couldn't do it. I need someone to talk to                                       | Moderate
train_pid_6982 | Didn't think I would have lived this long to see 2020: Don't even know if this is considered an accomplishment | Not depression
4.1 Dataset This experiment uses a publicly available dataset from the shared task on Detecting Signs of Depression from Social Media Text at the Second Workshop on Language Technology for Equality, Diversity, and Inclusion (LT-EDI-2022) at ACL 2022 [10]. The dataset consists of training, development, and test sets, and the files are in tab-separated format with three columns: Posting ID, Text, and Label. A snapshot of the dataset is shown in Table 1. In the dataset, Not Depression indicates that the user does not show a sign of depression in their social media texts, the Moderate label denotes that the user shows some signs of depression, and Severe indicates that the user shows clear signs of depression through their social media texts. The dataset contains a total of 13,387 data points, out of which 3801 belong to the Not Depression class, 8325 to the Moderate class, and 1261 to the Severe class.
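A hedged sketch of loading the shared-task files follows; the file name below is an assumption, while the tab-separated, three-column layout and the label distribution follow the description above.

# Illustrative loading of the tab-separated shared-task data; the file name and
# exact column headers are assumptions (columns as in Table 1).
import pandas as pd

train = pd.read_csv("train.tsv", sep="\t")
print(train.shape)
print(train["Label"].value_counts())   # expected: Moderate 8325, Not Depression 3801, Severe 1261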
4.2 Experimental Setup This section describes the experimental setup used for our proposed approach. All the methods described in this paper were implemented in Python 3.8. The experiments were run on a server configured with an Intel Core i5-10300H CPU @ 2.50 GHz and 8 GB of main memory. Firstly, the dataset was pre-processed to remove stop words, URLs, and other special characters. The demoji library available at https://pypi.org/project/demoji/ was used for converting emojis into textual form. We also used an English contractions list for better text enhancement. During the analysis, we found that the dataset is unbalanced, and to balance it, we performed down-sampling or up-sampling. The feature engineering stage deals with computing different features such as count vectorization, word embeddings, and TF-IDF after word tokenization. This work used texts
to sequences in Keras (https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer) and BERT tokenization from https://huggingface.co/docs/transformers/main_classes/tokenizer for the transformers. Once the experiment-ready dataset is obtained, we split the dataset into train and test splits and then used the algorithms listed in Sect. 3.
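The following sketch illustrates the two tokenization paths mentioned above, the Keras texts-to-sequences route for the ANN/CNN models and a Hugging Face BERT tokenizer for the transformers; the vocabulary size, sequence length, and checkpoint name are illustrative choices, not the authors' exact settings.

# Illustrative tokenization for the neural models; num_words, maxlen, and the
# "bert-base-cased" checkpoint are assumptions.
import demoji
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from transformers import AutoTokenizer

texts = [demoji.replace_with_desc(t, sep=" ") for t in df["Text"]]

# Keras path: integer sequences for the ANN/CNN models
keras_tok = Tokenizer(num_words=20000, oov_token="<unk>")
keras_tok.fit_on_texts(texts)
sequences = pad_sequences(keras_tok.texts_to_sequences(texts), maxlen=128)

# Hugging Face path: subword encodings for the BERT-based transformers
bert_tok = AutoTokenizer.from_pretrained("bert-base-cased")
encodings = bert_tok(texts, padding=True, truncation=True, max_length=128, return_tensors="tf")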
5 Results and Discussions This section details the results obtained from the experimental setup explained in Sect. 4, which implemented the proposed approach discussed in Sect. 3. Different machine learning algorithms were implemented on the dataset mentioned in Sect. 4.1 and the results are compared. Tables 2, 3, 4 and 5 show the precision, recall, and F1-score for the Logistic Regression, Support Vector Machine, Naive Bayes, and Random Forest classification algorithms. For the Logistic Regression algorithm, the precision, recall, and F1-score for the Not Depression class are found to be 87%, 89%, and 88%, respectively, and for the Moderate class, the values were found to be 90%, 84%, and 87%. The Severe class has recorded a precision of 95%, a recall of 99%, and an F1-score of 97%. For the Support Vector Machine (SVM), the Not Depression class has attained a precision of 85%, a recall of 89%, and an F1-score of 87%, and for the Moderate class, the values are 89%, 82%, and 85%. The Severe class has recorded 95%, 98%, and 97% for precision, recall, and F1-score, and the SVM algorithm has shown a weighted average of 90% for all three classes.

Table 2 Classification report for the logistic regression algorithm

                 | Precision | Recall | F1-score
Not Depression   | 0.87      | 0.89   | 0.88
Moderate         | 0.90      | 0.84   | 0.87
Severe           | 0.95      | 0.99   | 0.97
Accuracy         |           |        | 0.91
Macro average    | 0.91      | 0.91   | 0.91
Weighted average | 0.91      | 0.91   | 0.91

Table 3 Classification report for the support vector machine algorithm

                 | Precision | Recall | F1-score
Not Depression   | 0.85      | 0.89   | 0.87
Moderate         | 0.89      | 0.82   | 0.85
Severe           | 0.95      | 0.98   | 0.97
Accuracy         |           |        | 0.90
Macro average    | 0.90      | 0.90   | 0.90
Weighted average | 0.90      | 0.90   | 0.90
Table 4 Classification report for the naïve bayes algorithm

                 | Precision | Recall | F1-score
Not Depression   | 0.88      | 0.81   | 0.85
Moderate         | 0.86      | 0.83   | 0.84
Severe           | 0.89      | 0.99   | 0.93
Accuracy         |           |        | 0.87
Macro average    | 0.87      | 0.88   | 0.87
Weighted average | 0.87      | 0.87   | 0.87

Table 5 Classification report for the random forest algorithm

                 | Precision | Recall | F1-score
Not Depression   | 0.82      | 0.86   | 0.84
Moderate         | 0.84      | 0.82   | 0.83
Severe           | 0.96      | 0.93   | 0.95
Accuracy         |           |        | 0.87
Macro average    | 0.87      | 0.87   | 0.87
Weighted average | 0.87      | 0.87   | 0.87
The Naive Bayes classifier has recorded 88% for precision, 81% for recall, and 85% for F1-score for the Not Depression class, and 86%, 83%, and 84% for precision, recall, and F1-score for the Moderate class. For the Severe class, the recorded values were 89%, 99%, and 93% for precision, recall, and F1-score, respectively. For the Not Depression class, the Random Forest classification algorithm has scored 82%, 86%, and 84% for precision, recall, and F1-score, respectively, and for the Moderate class, the values were found to be 84%, 82%, and 83%, respectively. The Severe class has recorded a precision of 96%, a recall of 93%, and an F1-score of 95%. Graphs representing the precision, recall, and F1-score comparison for Logistic Regression, Support Vector Machines, Naive Bayes, and Random Forest are shown in Fig. 2. The classification reports for the Transformer with the BERT-Base-Cased and BERT-Base-Uncased models are shown in Tables 6 and 7, respectively. For BERT-Base-Cased, a precision of 87%, a recall of 83%, and an F1-score of 85% were recorded for the Not Depression class, and for the Moderate class, the corresponding values were 77%, 87%, and 81%, respectively. For the Severe class, the precision was 87%, the recall was 87%, and the F1-score was also 87%. On the other hand, the BERT-Base-Uncased model performed poorly and recorded a precision of 73%, a recall of 67%, and an F1-score of 70% for the Not Depression class. The Moderate and Severe classes attained precision, recall, and F1-score of 77%, 87%, and 81%, and 80%, 71%, and 76%, respectively. Figure 3 shows the precision, recall, and accuracy comparison for the transformer model with BERT-Base-Cased and BERT-Base-Uncased pre-trained embeddings.
(a) The precision, recall, and F1-score comparison for Logistic Regression
(b) The precision, recall, and F1-score comparison for Support Vector Machine
(c) The precision, recall, and F1-score comparison for Naive Bayes
(d) The precision, recall, and F1-score comparison for Random Forest
Fig. 2 The precision, recall, and accuracy comparison for logistic regression, support vector machine, naïve bayes, and random forest algorithms
Table 6 Classification report for transformer with BERT-Base-Cased model

                 | Precision | Recall | F1-score
Not Depression   | 0.87      | 0.83   | 0.85
Moderate         | 0.77      | 0.87   | 0.81
Severe           | 0.87      | 0.87   | 0.87
Accuracy         |           |        | 0.84
Macro average    | 0.85      | 0.84   | 0.85
Weighted average | 0.85      | 0.84   | 0.85

Table 7 Classification report for transformer with BERT-Base-Uncased model

                 | Precision | Recall | F1-score
Not Depression   | 0.73      | 0.67   | 0.70
Moderate         | 0.77      | 0.87   | 0.81
Severe           | 0.80      | 0.71   | 0.76
Accuracy         |           |        | 0.81
Macro average    | 0.79      | 0.76   | 0.77
Weighted average | 0.80      | 0.81   | 0.80
(a) Precision, recall, accuracy for the Transformer with BERT-Base-Cased model
(b) Precision, recall, accuracy for the Transformer with BERT-Base-Uncased model
Fig. 3 Precision, Recall, and Accuracy comparison for the transformer with BERT-Base-Cased and BERT-Base-Uncased models
Table 8 Classification report for artificial neural network model

                 | Precision | Recall | F1-score
Not Depression   | 0.82      | 0.80   | 0.81
Moderate         | 0.74      | 0.81   | 0.77
Severe           | 0.89      | 0.82   | 0.86
Accuracy         |           |        | 0.81
Macro average    | 0.82      | 0.81   | 0.81
Weighted average | 0.82      | 0.81   | 0.81
The classification reports for the Artificial Neural Network (ANN) and Convolutional Neural Network (CNN) are shown in Tables 8 and 9, respectively. The ANN has scored a precision of 82%, a recall of 80%, and an F1-score of 81% for the Not Depression class, and a precision of 74%, a recall of 81%, and an F1-score of 77% for the Moderate class. The Severe class has scored 89% for precision, 82% for recall, and 86% for F1-score with the ANN. The Convolutional Neural Network has attained 88% precision, 79% recall, and 83% F1-score for the Not Depression class, 69% precision, 90% recall, and 78% F1-score for the Moderate class, and 92% precision, 73% recall, and 81% F1-score for the Severe class. Figure 4 shows the precision, recall, and accuracy comparison of the ANN and CNN models.
Table 9 Classification report for convolutional neural network

                 | Precision | Recall | F1-score
Not Depression   | 0.88      | 0.79   | 0.83
Moderate         | 0.69      | 0.90   | 0.78
Severe           | 0.92      | 0.73   | 0.81
Accuracy         |           |        | 0.80
Macro average    | 0.83      | 0.80   | 0.81
Weighted average | 0.83      | 0.80   | 0.81
(a) Precision, recall, accuracy for the ANN model
(b) Precision, recall, accuracy for the CNN model
Fig. 4 Precision, Recall, and Accuracy comparison for the ANN and CNN models
Table 10 Summary of the precision, recall, and accuracy of all the models

Model                           | Precision | Recall | F1-score | Accuracy
Logistic Regression             | 0.91      | 0.91   | 0.91     | 0.91
Support Vector Machine          | 0.90      | 0.90   | 0.90     | 0.90
Naïve Bayes                     | 0.87      | 0.87   | 0.87     | 0.87
Random Forest                   | 0.87      | 0.87   | 0.87     | 0.87
Transformer (BERT-Base-Cased)   | 0.85      | 0.84   | 0.85     | 0.84
Transformer (BERT-Base-Uncased) | 0.80      | 0.81   | 0.80     | 0.81
Artificial Neural Network       | 0.82      | 0.81   | 0.81     | 0.81
Convolutional Neural Network    | 0.83      | 0.80   | 0.81     | 0.80
The precision, recall, and F1-score comparison for all the models considered in the proposed approach is given in Table 10, and the corresponding graphical comparison is shown in Fig. 5. From Table 10 and Fig. 5, it is evident that for the considered dataset, the shallow machine learning approaches showcased better precision, recall, and F1-score, but the results for the Transformer models also look promising. This indicates that more analysis should be done using transformers with other pre-trained models to confirm the potential for better classification.
6 Conclusions Depression, one of the most severe mental disorders, should be identified during its initial stages to give proper medical attention to the affected individual. The number of people who share their mental states on online social media has grown exponentially due to several factors, such as limited mobility and social activities during recent times. So, it is highly evident that machine learning approaches need to be developed and
Fig. 5 The precision, recall, f1-score, and accuracy summary for all the models
implemented for the early detection of depression-related information. This work attempted to implement different machine learning algorithms to classify the severity of depression-related social media posts. The experimental results show that machine learning may be highly useful in identifying the signs of depression from social media. As the initial results look promising, the authors may continue implementing more machine learning algorithms for depression detection and analysis in the future.
References
1. Aggarwal K, Singh SK, Chopra M, Kumar S (2022) Role of social media in the COVID-19 pandemic: a literature review. Data Min Approaches Big Data Sentim Anal Soc Media, 91–115
2. Angskun J, Tipprasert S, Angskun T (2022) Big data analytics on social networks for real-time depression detection. J Big Data 9(1):1–15
3. Chiong R, Budhi GS, Dhakal S (2021) Combining sentiment lexicons and content-based features for depression detection. IEEE Intell Syst 36(6):99–105
4. Chiong R, Budhi GS, Dhakal S, Chiong F (2021) A textual-based featuring approach for depression detection using machine learning classifiers and social media texts. Comput Biol Med 135:104499
5. de Jesús Titla-Tlatelpa J, Ortega-Mendoza RM, Montes-y-Gómez M, Villaseñor-Pineda L (2021) A profile-based sentiment-aware approach for depression detection in social media. EPJ Data Sci 10(1):54
6. Deshpande M, Rao V (2017) Depression detection using emotion artificial intelligence. In: 2017 International Conference on Intelligent Sustainable Systems (ICISS), IEEE, pp 858–862
7. Evans-Lacko S, Aguilar-Gaxiola S, Al-Hamzawi A, Alonso J, Benjet C, Bruffaerts R, Thornicroft G (2018) Socio-economic variations in the mental health treatment gap for people with anxiety, mood, and substance use disorders: results from the WHO World Mental Health (WMH) surveys. Psychol Med 48(9):1560–1571
8. Funk M (2012) Global burden of mental disorders and the need for a comprehensive, coordinated response from health and social sectors at the country level
9. Hall JA, Liu D (2022) Social media use, social displacement, and well-being. Curr Opin Psychol, 101339
10. Kayalvizhi S, Durairaj T, Chakravarthi BR (2022) Findings of the shared task on detecting signs of depression from social media. In: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, pp 331–338
11. Lara JS, Aragón ME, González FA, Montes-y-Gómez M (2021) Deep bag-of-sub-emotions for depression detection in social media. In: International Conference on Text, Speech, and Dialogue, pp 60–72. Springer, Cham
12. Lekshmi S, Anoop VS (2022) Sentiment analysis on COVID-19 news videos using machine learning techniques. In: Proceedings of International Conference on Frontiers in Computing and Systems, pp 551–560. Springer, Singapore
13. Liu D, Feng XL, Ahmed F, Shahid M, Guo J (2022) Detecting and measuring depression on social media using a machine learning approach: systematic review. JMIR Ment Health 9(3):e27244
14. Ortega-Mendoza RM, Hernández-Farías DI, Montes-y-Gómez M, Villaseñor-Pineda L (2022) Revealing traces of depression through personal statements analysis in social media. Artif Intell Med 123:102202
15. Ren L, Lin H, Xu B, Zhang S, Yang L, Sun S (2021) Depression detection on reddit with an emotion-based attention network: algorithm development and validation. JMIR Med Inform 9(7):e28754
16. Renaud-Charest O, Lui LM, Eskander S, Ceban F, Ho R, Di Vincenzo JD, McIntyre RS (2021) Onset and frequency of depression in post-COVID-19 syndrome: a systematic review. J Psychiatr Res 144:129–137
17. Shah FM, Ahmed F, Joy SKS, Ahmed S, Sadek S, Shil R, Kabir MH (2020) Early depression detection from social network using deep learning techniques. In: 2020 IEEE Region 10 Symposium (TENSYMP), IEEE, pp 823–826
18. Shen G, Jia J, Nie L, Feng F, Zhang C, Hu T, Zhu W (2017) Depression detection via harvesting social media: a multimodal dictionary learning solution. In: IJCAI, pp 3838–3844
19. Tao X, Zhou X, Zhang J, Yong J (2016) Sentiment analysis for depression detection on social networks. In: International Conference on Advanced Data Mining and Applications, pp 807–810. Springer, Cham
20. Varghese M, Anoop VS (2022) Deep learning-based sentiment analysis on COVID-19 news videos. In: Proceedings of International Conference on Information Technology and Applications, pp 229–238. Springer, Singapore
21. Xiong F, Zang L, Gao Y (2022) Internet penetration as national innovation capacity: worldwide evidence on the impact of ICTs on innovation development. Inf Technol Dev 28(1):39–55
22. Yang K, Zhang T, Ananiadou S (2022) A mental state knowledge-aware and contrastive network for early stress and depression detection on social media. Inf Process Manage 59(4):102961
23. Yang K, Zhang T, Ananiadou S (2022) A mental state knowledge-aware and contrastive network for early stress and depression detection on social media. Inf Process Manag 59(4):102961
24. Zogan H, Razzak I, Jameel S, Xu G (2021) DepressionNet: learning multi-modalities with user post summarization for depression detection on social media. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 133–142
25. Zogan H, Razzak I, Wang X, Jameel S, Xu G (2022) Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media. World Wide Web 25(1):281–304
Extremist Views Detection: Definition, Annotated Corpus, and Baseline Results Muhammad Anwar Hussain, Khurram Shahzad, and Sarina Sulaiman
Abstract Extremist view detection in social networks is an emerging area of research. Several attempts have been made at extremist view detection on social media. However, there is a scarcity of publicly available annotated corpora that can be used for learning and prediction. Also, there is no consensus on what should be recognized as an extremist view. In the absence of such a description, the accurate annotation of extremist views becomes a formidable task. To that end, this study has made three key contributions. Firstly, we have developed a clear understanding of extremist views by synthesizing their definitions and descriptions in the academic literature, as well as in practice. Secondly, a benchmark extremist view detection corpus (XtremeView-22) is developed. Finally, baseline experiments are performed using six machine learning techniques to evaluate their effectiveness for extremist view detection. The results show that bigrams are the most effective feature and Naive Bayes is the most effective technique to identify extremist views in social media text. Keywords Extremism · Extremist view detection · Machine learning · Classification · Social media listening · Twitter
M. A. Hussain (B) · S. Sulaiman Department of Computer Science, University of Technology Malaysia, Johor Bahru, Malaysia e-mail: [email protected]
S. Sulaiman e-mail: [email protected]
K. Shahzad Department of Data Science, University of the Punjab, Lahore, Pakistan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_18

1 Introduction Rising evidence has revealed that social media play a crucial role in unrest creation activities [24]. Researchers and policymakers have also reached a broad agreement on the link between social media use and the active role of extremist organizations,
such as the Islamic State of Iraq and Al-Sham (ISIS) [9, 10, 27]. One reason is that social media platforms, such as Twitter, provide unrestricted access where individuals, interest groups, and organizations can engage in discussions of their choice, including extremist discussions and recruitment, without fear of repercussions. Furthermore, social media content has the potential to reach millions of users in a short span of time. It is widely recognized that extremist groups use social media for spreading their ideology, fundraising, recruiting, attracting innocent young people, and using them for their cynical causes. For instance, the growth of the Islamic State in Iraq and Syria (ISIS) to tens of thousands of people has been partly attributed to its increased use of social media for propaganda and recruiting purposes. Recognizing the challenge, online extremism, propaganda proliferation, and radicalization detection in social media have received attention from researchers during the last decade [8, 13, 18]. Developing automated techniques for the detection of extremist viewpoints is a challenging undertaking because there are differences in understanding how the notion of extremism should be described. This implies that, depending on the definition of extremist views, some communication may be judged as extremist by one segment of society and not by others. To that end, this study has made the following key contributions.

Conceptualized extremism. We have gathered the existing definitions and descriptions of extremism from diverse sources, including popular dictionaries, academic literature, as well as the descriptions of regulatory bodies. Subsequently, these details are synthesized to clearly conceptualize the notion of extremism. To the best of our knowledge, this is the first-ever attempt to develop a clear understanding of the notion of extremism before developing any corpus.

Development of extremism detection corpus. A literature search is performed to identify extremism detection benchmark corpora. The identified corpora are examined and the research gap is established. Subsequently, we have developed an extremist views detection (XtremeView-22) corpus based on the developed understanding. The corpus is readily available for extremist view detection in social media.

Evaluation of supervised learning techniques. Finally, baseline experiments are performed to evaluate the effectiveness of machine learning techniques for extremist view detection in social media. The baseline results and the generated corpus will be useful for fostering research on extremism detection in social media.

The rest of this study is outlined as follows. Section 2 discusses the definitions and descriptions of extremist views in the literature. Section 3 presents an overview of the existing corpora and the details of our newly developed XtremeView-22 corpus. Section 4 presents the experimental setup and the baseline results of the experiments. Finally, Sect. 5 concludes the paper.
2 Conceptualizing Extremism There are multiple definitions and descriptions of the term extremism. However, there is no widely accepted academic definition, nor is there a global description of the term extremism [23]. Therefore, to conceptualize the term extremism, this study has used three types of sources for collecting descriptions of the term: glossaries or dictionaries, scientific literature, and real-world practice as presented in the policies and regulations of governments. The details of all three types of sources are presented below.
2.1 Extremism in Dictionaries As a starting point, we have identified the definitions of extremism as presented in established dictionaries. These include printed dictionaries of the English language, online dictionaries, and the encyclopedia. In particular, the notable glossaries used in the study are the Advanced American Dictionary, the Oxford English Dictionary, The Oxford Essential Dictionary of the U.S. Military, and the Oxford Learner's Dictionary of Academic English. Table 1 presents the definitions of extremism as presented in these sources. It can be observed from the table that most dictionaries define it as a noun and that a majority of the definitions focus on political and religious views to refer to extremism. In contrast, the other key facets, such as economic or social views, are not considered extremist in any dictionary. Furthermore, some dictionaries present a brief and high-level definition of the term extremism, whereas others are more specific in defining the term. Besides being specific, these dictionaries present a broader scope of the notion of extremism by including views, conspiracies, actions, and measures of extreme nature in defining extremism.

Table 1 Definitions and descriptions of extremism in dictionaries

Refs. | Definition
[6]   | "Extremism as a noun is the political, religious, etc., ideas or actions that are extreme and not normal, reasonable, or acceptable to most people"
[5]   | Oxford Learner's Dictionary of Academic English defines extremism as a noun that is "the holding of extreme political or religious views; fanaticism"
[19]  | "A person who holds extreme political or religious views, especially one who advocates illegal, violent, or other extreme action"
[7]   | "Supporting beliefs that are extreme"
[15]  | "Chiefly derogatory: a person who holds extreme or fanatical political or religious views, especially one who resorts to or advocates extreme action: political extremists and extremist conspiracy"
Table 2 Descriptions of extremism in the scientific literature

Refs. | Description
[25]  | "Extremism in religion is studied extensively and has led to associate it with a particular religion"
[11]  | "Extremism usually refers to the ideology that may be religious or political, that is unacceptable to the general perception of the society"
[28]  | The ideology of extremism is an ideology of intolerance toward enemies, justifying their suppression, assuming the existence of dissident citizens, and recognizing only its own monopoly on the truth, regardless of legal attitudes (therefore, extremist activity is almost always an unconstitutional activity)
[20]  | "An ideological movement, contrary to the democratic and ethical values of a society, that uses different methods, including violence (physical or verbal) to achieve its objectives"
[4]   | Extremism is "the quality or state of being extreme"
[14]  | "Violent extremism refers to the action through which one adopts political, social, and religious ideation that leads to the initiation of violent acts"
[3]   | "Extremism is also defined as a set of activities (beliefs, attitudes, feelings, actions, strategies) of a character far removed from the ordinary"
[16]  | "Language which attacks or demeans a group based on race, ethnic origin, religion, disability, gender, age, disability, or sexual orientation/gender identity"
[26]  | Online extremism "as Internet activism that is related to, engaged in, or perpetrated by groups or individuals that hold views considered to be doctrinally extremist"
2.2 Extremism in the Scientific Literature This study has performed a comprehensive search of academic literature in the quest for understanding extremism from a scientific literature perspective. Table 2 presents the notable studies that have attempted to describe extremism. It can be observed from the literature that, similar to the dictionary definitions, most of the scientific literature has associated extremism with religion and political ideology. However, in contrast to the dictionary definitions, a few scientific studies have also included the ethical and social values of society in the scope of extremism. Also, these studies have emphasized intolerance and the use of violence in defining the notion of extremism. A few other studies have defined extremism as language which attacks or demeans a group based on its characteristics, or as statements that convey the message of an intolerant ideology toward an out-group, immigrants, or enemies.
2.3 Extremism in Practice The third type of source considered for conceptualizing extremism is based on the descriptions used by government agencies and regulatory bodies to combat extremism. Table 3 presents a summary of the descriptions as presented in these sources.
Table 3 Descriptions of extremism in practice

Refs. | Description
[17]  | "All conduct publicly inciting to violence or hatred directed against a group of persons or a member of such a group defined by reference to race, color, religion, descent or national or ethnic origin" (code of conduct between the EU and companies)
[12]  | The UK Government characterizes extremism as "opposition to fundamental values, including democracy, the rule of law, individual liberty, and respect and tolerance for different faiths and beliefs"
It can be observed from the table that, in essence, the constituents of extremism are inciting violence or hatred against an individual or community on the basis of race, color, religion, and national or ethnic affiliations. Another notable observation is that the concept of extremism is mostly discussed in association with liberalism, freedom, and the fundamental values of society. More specifically, extremism is an actively voiced opposition to fundamental values, tolerance, and respect for different beliefs and faiths. It is an idea hostile to liberal norms such as democracy, freedom, gender equality, human rights, and freedom of expression, and it promotes discrimination, sectarianism, and the segregation of a person, people, or group. The EU, UK, and US have their own counter-extremism strategies to combat this evil in their societies and ensure the security of their citizens. The UN has also developed global counter-terrorism policies for its member states. In summary, although there are several differences between the three types of sources discussed above, there are also some commonalities in defining extremism. For instance, extremism encompasses political, social, or religious views. That is, all stakeholders agree that hateful behavior targeted at an individual on the basis of religion, race, color, nationality, freedom, or gender equality, and violence against social values, political beliefs, and religious views of certain specifications, can be recognized as extremism. Furthermore, extremism promotes an ideology of asymmetric social groups, well-defined by race, ethnicity, or nationality, as well as an authoritarian concept of society.
3 Extremism Detection Corpus This section focuses on the second contribution, the development of extremism detection (XtremeView-22) corpus. In particular, firstly an overview of the existing datasets and the limitations of these datasets are discussed. Subsequently, the process of developing the proposed corpus and the specifications of the XtremeView-22 corpus are presented.
Table 4 Summary of the extremism detection datasets

Refs. | Size    | Annotations                        | Extremism
[21]  | 17,000  | Not available                      | Religious
[2]   | 122,000 | Not available                      | Religious
[2]   | 122,619 | Not available                      | Religious
[2]   | 17,391  | Not available                      | Religious
[1]   | 10,000  | Extreme 3001, non-extreme 6999     | Religious
[22]  | 2684    | Support 788, refute 46, empty 1850 | Religious
3.1 Extremism Detection Corpora A literature search is performed to identify the studies that focus on an NLP-based approach for extremism detection. An overview of the identified studies is presented in Table 4. It can be observed from the table that six extremism detection corpora are available. The second observation is that the benchmark data annotations that define whether a given sentence is extremist or not are available for merely two datasets. Consequently, the remaining four datasets can neither be used to reproduce the existing results, nor are these datasets readily available for generating new results. A further examination of the two datasets revealed that the benchmark annotations of one ISIS-Religious dataset [22] are only partially available. That is, out of the 2684 sentences, the annotations of merely 834 sentences are available, whereas the annotations of the remaining 1850 are not. Therefore, the ISIS-Religious datasets are not readily usable. Finally, it can be observed that most of the extremism detection datasets focus on the religious perspective, which is contrary to our understanding of the notion of extremism. That is, the synthesis of various definitions and descriptions presented in the preceding section concluded that extreme political and social views should also be considered as extremist views.
3.2 Development of XtremeView-22 This study has developed an extremism detection (XtremeView-22) corpus by using a seed corpus, ISIS-Religious. As a starting point for the development, the raw tweets were examined. It was observed that the tweets included residue and garbage values that do not play any role in the identification of extremism; these include hashtags, images, URLs, emoticons, smileys, etc. The tweets were cleaned by removing these contents using a Python script. Also, prior to the data annotation, duplicate tweets were omitted. Furthermore, text samples that were comprised of multiple tweets were eliminated to ensure that message replies are not interpreted without the context of the original message.
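The cleaning script itself is not included in the paper; the following is a hypothetical Python sketch of the kind of regex-based cleaning and de-duplication described above, with the file path, column name, and patterns as assumptions.

# Hypothetical cleaning pass over the raw seed-corpus tweets; the path,
# "tweet" column name, and regex patterns are assumptions.
import re
import pandas as pd

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\w+")                 # mentions and hashtags
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # emojis, smileys, symbols

def clean_tweet(text):
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

tweets = pd.read_csv("isis_religious_seed.csv")   # seed corpus, path assumed
tweets["tweet"] = tweets["tweet"].astype(str).map(clean_tweet)
tweets = tweets[tweets["tweet"] != ""].drop_duplicates(subset="tweet")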
Table 5 Specifications of the XtremeView-22 corpus

Item                | No. of tweets
Extremist views     | 2413
Non-extremist views | 215
Total               | 2629
For the data annotation, two researchers reviewed a random sample of the tweets and annotated them as either an extremist view or a non-extremist view. Note that both researchers took into consideration the understanding of the concept of extremism based on the findings presented in the preceding section. The results were merged and the conflicts were resolved. The process was repeated a few times to develop a consistent understanding of the concept of extremism. Finally, one researcher performed all the annotations and the other researcher verified them. Accordingly, we developed the XtremeView-22 corpus, which is composed of 2684 tweets, where every tweet is marked as either an Extremist or a Non-extremist view. A key feature of the corpus is that all the annotations are complete and they are freely and publicly available for use by the research community. The specification of the established corpus is presented in Table 5. We contend that this substantial amount of extremist views represents the existence of a threat that needs to be detected and eradicated. On the other hand, the imbalance in the developed corpus presents a challenge for machine learning techniques to learn and predict extremist views. We contend that the imbalance in the corpus provides an opportunity for the interested research community to develop techniques for the detection of extremist views and to enhance the corpus for handling the imbalance problem in the context of extremism detection.
4 Effectiveness of ML Techniques This section presents the baseline experiments that are performed to evaluate the effectiveness of supervised machine learning techniques for extremist view detection. Experiments are performed using six classical techniques. The choice of the techniques is based on the diversity of their underlying mechanisms for the text classification task. They include Support Vector Machine (SVM), Decision Trees (DT), K-Nearest Neighbor (KNN), Naive Bayes (NB), Random Forest (RF), and Logistic Regression (LR). These techniques are fed with two types of features: unigrams and bigrams. Note that there are other state-of-the-art deep learning techniques that are found to be more effective for various NLP tasks. However, these techniques require a large amount of annotated data for learning and prediction, which is not available for the task of extremism detection. Therefore, in this study, experiments are not performed using deep learning techniques.
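As an illustration of the two feature types, the sketch below builds unigram and bigram count matrices with scikit-learn; the vectorizer settings and column names are assumptions rather than the exact configuration used in the experiments.

# Illustrative unigram and bigram feature extraction; column names and
# vectorizer parameters are assumptions.
from sklearn.feature_extraction.text import CountVectorizer

texts = tweets["tweet"].tolist()
labels = tweets["label"].tolist()          # Extremist / Non-extremist, column name assumed

unigram_vectorizer = CountVectorizer(ngram_range=(1, 1), lowercase=True)
bigram_vectorizer = CountVectorizer(ngram_range=(2, 2), lowercase=True)

X_unigrams = unigram_vectorizer.fit_transform(texts)
X_bigrams = bigram_vectorizer.fit_transform(texts)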
Table 6 Summary results of experiments

Techniques             | Unigram P | Unigram R | Unigram F1 | Bigram P | Bigram R | Bigram F1
Naïve Bayes            | 0.863     | 0.738     | 0.743      | 0.868    | 0.853    | 0.859
Random Forest          | 0.862     | 0.738     | 0.790      | 0.868    | 0.850    | 0.857
Decision Tree          | 0.862     | 0.737     | 0.789      | 0.868    | 0.853    | 0.858
K-Nearest Neighbor     | 0.862     | 0.738     | 0.789      | 0.865    | 0.852    | 0.857
Logistic Regression    | 0.865     | 0.742     | 0.793      | 0.870    | 0.850    | 0.858
Support Vector Machine | 0.860     | 0.736     | 0.787      | 0.862    | 0.849    | 0.853
For the reliability of results, tenfold cross-validation is performed, and Precision, Recall, and F1 scores are calculated. Finally, the macro average of the tenfold results is computed. For the experiments, TensorFlow and Scikit-learn are used. Prior to the experimentation, pre-processing, including tokenization, punctuation removal, and lemmatization, is also performed. Table 6 presents the Precision, Recall, and F1 scores of the machine learning techniques. It can be observed from the table that all the techniques achieved a reasonable F1 score of at least 0.790, which indicates that all the techniques can detect extremist views in text to some extent. It can also be observed from the results that Naive Bayes achieved the highest F1 score of 0.859 using bigram features. However, from the comparison of the results of all the techniques, it can be observed that all the other techniques achieved comparable F1 scores. A similar observation can be made about the effectiveness of all the techniques when unigram features were fed to them. These results indicate that all the techniques are roughly equally effective for the detection of extremist views. It can be observed from the table that the Precision scores are higher than the Recall scores, which indicates that most of the sentences predicted as extremist are actually extremist, whereas some extremist views are not detected by the techniques. From the comparison of the results of unigram and bigram features, it can be observed that all the techniques achieved a higher F1 score when bigram features were used. This indicates that bigram features have a higher ability to discriminate between Extremist and Non-extremist views.
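A hedged sketch of this evaluation protocol, tenfold cross-validation with macro-averaged Precision, Recall, and F1 for the Naive Bayes baseline on bigram features, is shown below; the pipeline details are illustrative, not the exact code used in the experiments.

# Illustrative tenfold cross-validation with macro-averaged metrics;
# the exact pipeline and parameters used in the paper may differ.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_validate

pipeline = make_pipeline(CountVectorizer(ngram_range=(2, 2)), MultinomialNB())
scores = cross_validate(pipeline, texts, labels, cv=10,
                        scoring=["precision_macro", "recall_macro", "f1_macro"])
print("Precision:", scores["test_precision_macro"].mean())
print("Recall:   ", scores["test_recall_macro"].mean())
print("F1:       ", scores["test_f1_macro"].mean())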
5 Conclusion Extremists use social outlets to reach enormous audiences, distribute propaganda, and recruit members for their cynical causes. Several attempts have been made at online extremism detection; however, there is a scarcity of publicly accessible extremism detection datasets. Also, the existing datasets are confined to religious extremism, whereas no attempt has been made to detect socially and politically extreme views. Furthermore, there is no consensus on what should be recognized
as an extremist view. To that end, this is the first-ever study that has synthesized the definitions and descriptions of extremist views from dictionaries, academic literature, and practice, and used them to conceptualize the notion of extremism. Subsequently, the developed understanding is used to manually develop a corpus of 2640 English tweets. Finally, experiments are performed to evaluate the effectiveness of machine learning techniques. The results conclude that bigrams are the most effective features for extremist view detection and Naive Bayes is the most effective technique. In the future, we aim to scale the size of the dataset so that it can be used by deep learning techniques. Also, the effectiveness of various types of features will be evaluated.
References
1. Aaied A (2020) ISIS Twitter. https://www.kaggle.com/datasets/aliaaied/isis-twitter
2. Activegalaxy (2019) Tweets targeting Isis. https://www.kaggle.com/datasets/activegalaxy/isis-related-tweets
3. Asif M, Ishtiaq A, Ahmad H, Aljuaid H, Shah JJT, Informatics (2020) Sentiment analysis of extremism in social media from textual information 48:101345
4. Berger JM (2018) Extremism. MIT Press
5. Dictionary O (2000) Oxford advanced learner's dictionary. Oxford University Press, Oxford
6. Dictionary OAA (2022) Oxford Advanced American Dictionary. https://www.oxfordlearnersdictionaries.com/definition/american_english/
7. Dictionary TJRA (2012) The free dictionary 17
8. Frissen T (2021) Internet, the great radicalizer? Exploring relationships between seeking for online extremist materials and cognitive radicalization in young adults. Comput Hum Behav 114:106549
9. Hassan G, Brouillette-Alarie S, Alava S, Frau-Meigs D, Lavoie L, Fetiu A, … Rousseau C (2018) Exposure to extremist online content could lead to violent radicalization: a systematic review of empirical evidence. Int J Dev Sci 12(1–2):71–88
10. Hollewell GF, Longpre N (2022) Radicalization in the social media era: understanding the relationship between self-radicalization and the internet. Int J Offender Ther Comp Criminol 66(8):896–913. https://doi.org/10.1177/0306624X211028771
11. Lipset SMJTBJoS (1959) Social stratification and 'right-wing extremism' 10(4):346–382
12. Lowe DJSiC, Terrorism (2017) Prevent strategies: the problems associated in defining extremism: the case of the United Kingdom 40(11):917–933
13. Matusitz JJCSQ (2022) Islamic radicalization: a conceptual examination (38)
14. Misiak B, Samochowiec J, Bhui K, Schouler-Ocak M, Demunter H, Kuey L, … Dom GJEP (2019) A systematic review on the relationship between mental health, radicalization and mass violence 56(1):51–59
15. Nicholson O (2018) The Oxford dictionary of late Antiquity. Oxford University Press
16. Nobata C, Tetreault J, Thomas A, Mehdad Y, Chang Y (2016) Abusive language detection in online user content. Paper presented at the Proceedings of the 25th international conference on world wide web
17. Quintel T, Ullrich C (2020) Self-regulation of fundamental rights? The EU Code of Conduct on Hate Speech, related initiatives and beyond. In: Fundamental rights protection online. Edward Elgar Publishing, pp 197–229
18. Rea SC (2022) Teaching and confronting digital extremism: contexts, challenges and opportunities. Inf Learn Sci
19. Stevenson A (2010) Oxford dictionary of English. Oxford University Press, USA
20. Torregrosa J, Bello-Orgaz G, Martínez-Cámara E, Ser JD, Camacho DJJoAI, Computing H (2022) A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges 1–37 21. Tribe F (2019) How ISIS uses Twitter. https://www.kaggle.com/datasets/fifthtribe/how-isisuses-twitter 22. Tribe F (2019) Religious texts used by ISIS. https://www.kaggle.com/datasets/fifthtribe/isisreligious-texts 23. Trip S, Bora CH, Marian M, Halmajan A, Drugas MI (2019) Psychological mechanisms involved in radicalization and extremism. A rational emotive behavioral conceptualization. Front Psychol 10:437 24. Whittaker J (2022) Online radicalisation: the use of the internet by Islamic State terrorists in the US (2012–2018). Leiden University 25. Wibisono S, Louis WR, Jetten JJFip (2019) A multidimensional analysis of religious extremism 10:2560 26. Winter C, Neumann P, Meleagrou-Hitchens A, Ranstorp M, Vidino L, Fürst JJIJoC, Violence (2020) Online extremism: research trends in internet activism, radicalization, and counterstrategies 14:1–20 27. Youngblood M (2020) Extremist ideology as a complex contagion: the spread of far-right radicalization in the United States between 2005 and 2017. Humanit Social Sci Commun 7(1):1–10 28. Zhaksylyk K, Batyrkhan O, Shynar M (2021) Review of violent extremism detection techniques on social media. Paper presented at the 2021 16th international conference on electronics computer and computation (ICECCO)
Chicken Disease Multiclass Classification Using Deep Learning Mahendra Kumar Gourisaria, Aakarsh Arora, Saurabh Bilgaiyan, and Manoj Sahni
Abstract The consumption of poultry, especially chicken, has grown to hundreds of billions of birds around the globe. With such large consumption, a high percentage of humans are affected by diseases transmitted by chickens, such as bird flu, which can cause serious illness or death. A high mortality rate among chickens also adversely affects poultry farmers, as disease spreads to other batches of chickens. The poultry market is huge and, due to the rise in demand for human consumption, an intelligent system for the early identification of various diseases in chickens is needed. The aim of this paper is to detect diseases in chickens at an early stage using deep learning techniques, preventing mortality in chickens and farmers' losses due to that mortality, and ultimately keeping us healthy too. In this paper, various types of CNN models were implemented for the categorical classification of “Salmonella”, “Coccidiosis”, “Healthy” and “New Castle Disease”, and the best model was selected on the basis of efficiency with respect to the ratio of Maximum Validation Accuracy (MVA) to Least Validation Loss (LVL). A total of 7 CNN models and 5 Transfer Learning models were used for the detection of chicken disease, and the proposed ChicNetV6 model showed the best results, gaining an efficiency score of 2.8198 and an accuracy score of 0.9449 with a total training time of 1125 s. Keywords Chicken disease · Deep learning · Poultry market · Multiclass classification · Chicken mortality
M. K. Gourisaria (B) · A. Arora · S. Bilgaiyan School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha 751024, India e-mail: [email protected] M. Sahni Department of Mathematics, Pandit Deendayal Energy University, Gandhinagar, Gujarat 382426, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_19
1 Introduction Poultry and its products are among the most popular in the food industry. As the industry grows, the development of diseases among chickens and other animals results in potential harm to humans and the environment. Widespread disease causes large economic and environmental damage. A rapid rise in common poultry diseases such as colibacillosis, salmonellosis, Newcastle disease, chronic respiratory disorder and coccidiosis, followed by several bursal diseases, fowl cholera, nutritional deficiency and fowlpox, would result in even more responsibilities that developing nations are unprepared to handle. Hence, the welfare of broiler animals, especially chickens, is critical not only for human consumption but also for productivity and economic benefit. An early detection technique is therefore required to prevent the further spread of disease by treating the animals. Salmonella is a bacterial pathogen belonging to the genus Salmonella, which resides in the intestines and causes disease in both poultry and humans. Salmonella typhimurium (ST) and Salmonella enteritidis (SE) strains are linked to human illnesses spread through the poultry and broiler product food chain [1]. Salmonellosis can worsen mortality and performance losses in young birds due to overcrowding, starvation, and other stressful situations, as well as filthy surroundings [2]. Coccidiosis is caused by the apicomplexan protozoan Eimeria and is the most severe parasitic disease in chickens. Infected animals' development and feed consumption are severely hampered by coccidiosis, resulting in a loss of productivity [3]. Two popular diagnostic approaches are counting the number of oocysts (oocysts per gram [opg]) in the droppings or checking the digestive system to determine lesion scores. Although measures like management and biosecurity could prevent Eimeria from breaching the farms, in practice they are insufficient to prevent coccidiosis outbreaks [4]. Newcastle disease is spread worldwide by virulent Newcastle disease virus (NDV) strains that infect avian species. Because of the low contact rate, NDV spreads relatively slowly within and across village poultry populations. The faecal-oral route seems to be the most common method of transmission [5]. Deep Learning and Machine Learning are becoming the epicentre of technology, advancing many fields like health care, engineering and medicine. Some of the contributions include diabetes mellitus diagnosis [6], where the K-Nearest Neighbors algorithm performed the best, liver disease detection [7] and maize leaf disease detection [8]. In this research article, we have implemented 12 state-of-the-art architectures, where 7 proposed CNN and 5 transfer learning models were trained and evaluated on various performance metrics such as F1-score, precision, efficiency ratio, AUC and recall. The rest of the paper is organized as follows: Sect. 2 Related Work, Sect. 3 Data Preparation, Sect. 4 Technology Used, Sect. 5 Implementation and Results, and Sect. 6 Conclusion and Future Work.
2 Related Work As mentioned, poultry farming, especially chicken, is one of the fastest-growing industries and serious measures need to be taken to prevent them from various hazards and diseases. A more feasible approach is for early detection of the disease in chickens using the Deep Learning approach. Classical Machine Learning (ML) and Deep Learning (DL) approaches have been implemented by many researchers for the diagnosis of diseases in chickens. SVM was used by [9] to detect unhealthy broilers infected with avian flu. Their research developed an algorithm for classifying isolated infected broilers based on the examined structures and attributes, which was validated on test data and found to be 99% accurate. In another paper by [10], they used a deep learning approach for the detection of sick broilers and proposed Feature Fusion Single Shot MultiBox Detector (FSSD) to enhance the Single Shot MultiBox Detector (SSD) model using the InceptionV3 model as a base. They achieved a mean average precision (mAP) of 99.7%. Yoo et al. [11] proposed a continuous risk prediction framework for Highly pathogenic avian influenza (HPAI) disease and used ML algorithms like eXtreme Gradient Boosting Machine (GBM) and Random Forest. The model’s predictions for high risk were 8–10 out of 19 and the Gradient Boost algorithm performed well with an AUC curve of 0.88. Using Deep learning techniques, Akomolafe and Medeiros [12] performed a classification of Newcastle disease and Avian flu. The CNN models used gained accuracy of 95% and 98%, respectively. Wang et al. [13] proposed an auto-mated broiler digestive disorder detector that categorizes fine-grained aberrant broiler droppings photos as abnormal or normal using a deep Convolutional Neural Network model. For comparison, Faster R-CNN and YOLO-V3 were also constructed. Faster R-CNN gained recall and mAP at 99.1% and 93.3%, whereas YOLO-V3 attained 88.7% and 84.3%, respectively. In the study of [14], a machine vision-based monitoring system was presented for the detection of the Newcastle disease virus. The data was collected from live broilers as they walked and features were extracted using 2D shape posture shape descriptors and walk speed. From the used ML models, RBF- Support Vector Machine (SVM) gave the best results of 0.975 and 0.978 accuracies. Cuan et al. [15] presented a Deep Chicken Vocalization Network (DPVN) based on broiler vocals for early diagnosis of Newcastle Disease. They used sound technology for the extraction of poultry vocalizations and used it in the DL models. The best model achieved accuracy, F1-Score and recall of 98.50%, 97.33% and 96.60%, respectively. All of the above-mentioned implementations for the detection of chicken disease were good, but there were a few drawbacks, such as the fact that few papers concentrated on transfer learning models, while others focused on identifying a specific type of disease. A specific sickness cannot be identified via sound observation and chicken posture. Additionally, using the sound discrimination method in a group context is very difficult. Any variation in chicken droppings like colour, shape and texture can be detected in real-time, as birds defecate 12 times a day. Hence, disease detection through faecal images is the most efficient way.
Table 1 Class distribution

Class name            Number of images
Salmonella            2625
Coccidiosis           2476
New castle disease    562
Healthy               2404
3 Data Preparation 3.1 Dataset Used The dataset used was taken from Kaggle, where it was retrieved from UCI and the dataset was uploaded by Alland Clive [16]. The dataset contained 8067 image files along with a “.csv” file containing four classes which can be seen in Table 1.
3.2 Dataset Preparation Feature engineering and data augmentation were critical in balancing the unbalanced dataset during dataset creation. In our approach, we have used various data augmentation techniques like Zoom range, Horizontal flip, Rescale, Shear, Height shift range and Width shift range. In this paper, we have used shear, zoom, rescale, horizontal flip and rotation for the training image dataset and rescale feature for the test and validation dataset using Keras Image Data Generator (Fig. 1).
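The following is a sketch of how the described augmentation could be configured with the Keras ImageDataGenerator; the directory paths and parameter values are illustrative assumptions, not the exact settings used in the paper.

```python
# Illustrative augmentation setup with Keras ImageDataGenerator (paths/values assumed).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # rescale pixel values
    shear_range=0.2,          # shear
    zoom_range=0.2,           # zoom range
    rotation_range=20,        # rotation
    width_shift_range=0.1,    # width shift range
    height_shift_range=0.1,   # height shift range
    horizontal_flip=True)     # horizontal flip

test_datagen = ImageDataGenerator(rescale=1.0 / 255)   # only rescaling for test/validation

train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="categorical")
val_gen = test_datagen.flow_from_directory(
    "data/val", target_size=(224, 224), batch_size=32, class_mode="categorical")
```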
Fig. 1 Sample images of chicken faeces
3.3 Splitting Dataset, Hardware and Software Used The dataset was first split into two parts, with 70% as a training set and 30% as a testing set. The testing set was later divided into two equal halves, a test set and a validation set, in a ratio of 50%. All machine learning algorithms were implemented and analyzed using Python 3.7 and libraries such as scikit-learn, TensorFlow and Keras on a Google Colaboratory notebook. The workstation is equipped with an Intel i7 9th generation processor and 8 GB of RAM.
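A minimal sketch of the described 70/30 split followed by halving the held-out portion into test and validation sets is given below; the lists of image paths and labels are assumed inputs built from the dataset directory.

```python
# Sketch of the 70/15/15 split described above (image_paths and labels are assumed inputs).
from sklearn.model_selection import train_test_split

# image_paths, labels: lists built from the dataset folder (assumed to exist already)
train_files, holdout_files, train_labels, holdout_labels = train_test_split(
    image_paths, labels, test_size=0.30, random_state=42, stratify=labels)

# split the held-out 30% into two equal halves: test and validation
test_files, val_files, test_labels, val_labels = train_test_split(
    holdout_files, holdout_labels, test_size=0.50, random_state=42, stratify=holdout_labels)
```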
4 Technology Used 4.1 Convolutional Neural Network The Convolutional Neural Network (CNN/ConvNet) is a Deep Learning algorithm that plays a major role in the field of computer vision. To distinguish one feature from another and build a spatial relationship between them, the algorithm assigns biases and weights to distinct characteristics of the input image. A ConvNet requires much less pre-processing compared to other classification algorithms. A CNN consists of layers called Convolutional layers; these layers operate on the principle of the convolution theorem and go through the same forward-feed and backward propagation procedure. In the Visual Cortex region of the human brain, responses to stimuli are produced by individual neurons; each neuron responds only to stimuli in its receptive field, a limited portion of the visual field. ConvNets are basically constructed from four types of layers: Convolutional, Maxpool, Flattening, and Full-Connection. The translation invariance property of a Convolutional Neural Network can be defined as in the following Eq. 1:
x(y(n)) = y(x(n))    (1)
A ConvNet accurately captures the spatial and temporal interactions in an image by using appropriate filters. The architecture achieves superior fitting to the picture dataset due to the reduced number of parameters and reusability of weights. In the new function, the properties of the old function may be readily described and changed. When images are treated as discrete objects, Convolutional Neural Network may be represented as shown in Eq. 2.
(f ∗ g)[n] = Σ_{m=−M}^{+M} f[n − m] g[m]    (2)
where f and g represent the input image and the kernel function, respectively. The kernel g is convolved over f, and the result is passed through the Rectified Linear Unit (ReLu) activation function to obtain the output features.
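For concreteness, the discrete convolution in Eq. 2 followed by the ReLu activation can be written directly in NumPy; this is only a didactic 1-D illustration of the formula, not part of the paper's pipeline.

```python
# Didactic NumPy illustration of Eq. 2: (f * g)[n] = sum_m f[n - m] g[m], followed by ReLu.
import numpy as np

def conv_relu(f, g):
    M = len(g) // 2
    out = np.zeros_like(f, dtype=float)
    for n in range(len(f)):
        for m in range(-M, M + 1):
            if 0 <= n - m < len(f):        # zero-padding outside the signal
                out[n] += f[n - m] * g[m + M]
    return np.maximum(out, 0.0)            # ReLu activation

signal = np.array([1.0, 2.0, -1.0, 3.0, 0.5])
kernel = np.array([0.25, 0.5, 0.25])
print(conv_relu(signal, kernel))
```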
4.2 Transfer Learning Transfer learning is a method where we can use a model which is already trained on a dataset and solve a new problem. In transfer learning, a computer leverages information from a previous dataset to improve prediction about a new task. Neural networks in computer vision are used to identify edges in the first layer, shapes in the second layer and task-specific properties in the third layer. The early and core layers are used in transfer learning, whereas the following layers are simply retrained. Because the model has already been trained, transfer learning can help one develop an effective machine learning model with less training data.
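As a hedged illustration of this idea, the snippet below loads an ImageNet-pretrained DenseNet121 in Keras, freezes its early layers, and retrains only the later layers with a new classification head; the cut-off point and head sizes are assumptions made for the sketch, not the paper's exact configuration.

```python
# Sketch of transfer learning with a pretrained backbone (cut-off and head are assumptions).
import tensorflow as tf

base = tf.keras.applications.DenseNet121(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

for layer in base.layers[:-50]:     # freeze early layers, keep the last ~50 trainable (assumed cut-off)
    layer.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)   # 4 chicken-disease classes

model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```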
5 Implementation and Results In this section, we focus on all the CNN architectures implemented and the performance metrics. For the different architectures, we have used several parameters and layers, such as a different number of Convolutional and Artificial layers, kernel sizes, activation functions and optimizers. The input image size was set to (224 × 224). In this paper, we have implemented 7 CNN architectures from scratch and 5 Transfer Learning models. Each of the models was trained for 15 epochs. For proposing an efficient architecture, the efficiency ratio is considered to be the most important factor. So, the best CNN architecture was selected after analyzing and comparing the Efficiency score, Training Time and metrics like AUC, F1-Score and Recall. Equation (3) shows the mathematical formula for the calculation of the efficiency score:
Efficiency = Maximum Validation Accuracy / (Least Validation Loss + Normalised Training Time)    (3)
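A small sketch of how the efficiency score of Eq. 3 can be computed is shown below, assuming the normalised training time is obtained by min–max scaling the training times across all evaluated models (which appears consistent with the NT values reported later).

```python
# Efficiency = MVA / (LVL + normalised training time); min-max scaling of TT is an assumption.
def efficiency(mva, lvl, tt, tt_min, tt_max):
    nt = (tt - tt_min) / (tt_max - tt_min)   # normalised training time in [0, 1]
    return mva / (lvl + nt)

# Example with the values reported for ChicNetV6 (TT range taken from Tables 3 and 4)
print(round(efficiency(0.9424, 0.3342, 1125, tt_min=1125, tt_max=1621), 4))  # ~2.8198
```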
5.1 Experimentation and Analysis The creation of all CNN models was based on the input image size and the number of Convolutional, Maxpool and Dense layers. The input image size was set to 224 × 224 × 3 as the default for all the architectures. This was done to obtain precise and accurate results. The abbreviations used in Tables 3, 4, 5, 6 and 7 are defined in Table 2. After data exploration, we implemented all the CNN models with the different parameters mentioned above. All the models were executed with a random state set to 42 and were trained on the training and validation data with a batch size of 32. The best model was selected by using metrics calculated from the elements of the confusion matrix (TP, FP, TN, FN), such as Precision, F1-Score, Recall and Accuracy. All the information on the various CNN models used in this paper, such as the number of Convolutional, Maxpool and Artificial layers used, the filters, the kernel initializers and the optimizer used to reduce the cost function, is given in Tables 3 and 4. From Table 3, we can see that the ChicNetV6 model gained the Maximum Validation Accuracy (MVA) of 0.9424 and the least Training Time of 1125 s, whereas ChicNetV1 gained the minimum Least Validation Loss (LVL) of 0.3270. As we can see from Table 4, the Transfer Learning model Xception performed best compared to the other models, obtaining the maximum MVA of 0.9608 and the least LVL of 0.2767. On the other hand, VGG16 showed the weakest scores, with the least MVA of 0.4040 and the highest LVL of 1.6513.
Table 2 Abbreviations used

Notation    Meaning
CL          Convolutional layer
AL          Artificial layer
ML          MaxPool layer
FD          Feature detection
KS          Kernel size
KI          Kernel initializer
PS          Pool size
LVL         Least validation loss
MVA         Maximum validation accuracy
OP          Optimizer
TP          True positive
FP          False positive
TN          True negative
FN          False negative
TT          Training time (in seconds)
NT          Normalized time
Table 3 Structure and performance of CNN models

Model      CL  AL  ML  FD               KS       KI              PS       LVL     MVA     OP       TT
ChicNetV1  4   5   4   {128,64,32,32}   3,3      Uniform         2        0.3270  0.9383  Adam     1520
ChicNetV2  2   2   2   {32,64}          3,9      Uniform         2,4      0.3718  0.9364  RMSProp  1248
ChicNetV3  2   2   2   {32,64}          3,9      Uniform         2,4      0.3471  0.9310  Adam     1258
ChicNetV4  4   1   2   {128,64,64,32}   3        Glorot uniform  2,2      0.4044  0.9285  Adam     1621
ChicNetV5  4   2   2   {128,64,64,32}   3        Glorot uniform  2,2      0.3865  0.9323  Adam     1391
ChicNetV6  4   5   5   {128,64,32,32}   3        Glorot uniform  2,2      0.3342  0.9424  Adam     1125
ChicNetV7  4   2   4   {128,128,64,32}  3,9,9,3  Uniform         2,4,4,2  0.3808  0.9358  Adam     1508
Table 4 Structure and performance of transfer learning models

Model              AL  KI              LVL     MVA     OP       TT
InceptionResNetV2  2   Glorot uniform  0.2885  0.9587  Adam     1553
VGG19              4   Glorot uniform  0.4114  0.8911  RMSProp  1607
VGG16              4   Glorot uniform  1.6513  0.4040  RMSProp  1526
Xception           2   Glorot uniform  0.2767  0.9608  Adam     1476
InceptionV3        2   Glorot uniform  1.3719  0.9499  Adam     1234
From the 7 CNN and Transfer Learning models, we have selected the best from them and compared them in the following section. Information from Table 5 shows the performance of the CNN models in various metrics and we can notice that the selected models have performed magnificently well in all the domains (Fig. 2 and Table 6).
5.2 Comparison of Selected Models' Results From all implemented CNN models, we have selected the ChicNetV6 model and the transfer learning model Xception as the best models compared to the other models for an input image of 224 × 224. This section focuses on the comparison of the two selected models and on finding the best model after evaluation based on metrics like Efficiency, Precision, AUC, Accuracy, F1-Score, Recall and Training Time, all computed from the elements of the confusion matrix and the training statistics. From Table 7, we can observe that the proposed model ChicNetV6 performed much better compared to the Transfer Learning model Xception, with the maximum number of True Positives, the highest Efficiency ratio and AUC, and the least Training Time. However, the Xception model outperformed our model in terms of accuracy, F1-score and Recall gaining scores of 0.9523, 0.9041 and 0.8775, respectively.
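For reference, such metrics follow from the confusion-matrix counts in the standard way; the helper below is a generic sketch and not the authors' exact (possibly per-class averaged) computation.

```python
# Generic confusion-matrix metrics (micro/binary form); the paper's per-class averaging may differ.
def cm_metrics(tp, fp, fn, tn):
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(cm_metrics(tp=347, fp=32, fn=57, tn=1180))   # ChicNetV6 counts reported in Table 7
```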
6 Conclusion and Future Work Poultry farming is a huge industry, and any anomaly in this sector, whether in breeding, disease or feed, affects us both economically and in terms of our well-being. The effort of this paper is to use Deep Learning and Transfer Learning techniques to find an optimal CNN architecture for the detection of diseases in chickens such as “Salmonella”, “Coccidiosis” and “New Castle Disease”, and also to detect whether a chicken is healthy or not. Based on different metric evaluations like Efficiency, Accuracy, F1-Score, Recall and Training Time for the input image size of 224 × 224, we can conclude that our proposed model ChicNetV6 has performed outstandingly well in all the above-mentioned metrics, with the highest
Table 5 Performance of CNN Models on various metrics

Model      Accuracy  Precision  Recall  AUC     F1-score  TP   FP  FN  TN    Efficiency  LVL     NT
ChicNetV1  0.9486    0.9147     0.8762  0.9863  0.7768    354  33  50  1179  0.8353      0.3270  0.7963
ChicNetV2  0.9127    0.8363     0.8094  0.9633  0.7608    327  64  77  1148  1.5110      0.3718  0.2479
ChicNetV3  0.9270    0.8865     0.8119  0.9691  0.6964    328  42  76  1170  1.5133      0.3471  0.2681
ChicNetV4  0.9301    0.8880     0.8243  0.9777  0.7411    333  42  71  1170  0.6611      0.4044  1.0000
ChicNetV5  0.9375    0.8915     0.8540  0.9837  0.7248    345  42  59  1170  1.0104      0.3865  0.5362
ChicNetV6  0.9449    0.9156     0.8589  0.9861  0.7993    347  32  57  1180  2.8198      0.3342  0.0000
ChicNetV7  0.9319    0.8889     0.8317  0.9766  0.6962    336  42  68  1170  0.8116      0.3808  0.7721
Fig. 2 Metrics curve of ChicNetV6 model
efficiency ratio of 2.8198 and the least training time of 1125 s, reducing the computational cost. Future work on the classification of faecal images could be made more efficient by using the Generative Adversarial Network (GAN) technique: GANs can be used to produce more data instead of relying on data augmentation. The Batch Normalization method could also be applied to all the CNN architectures for more precise results.
Table 6 Performance of transfer learning models on various metrics

Model              Accuracy  Precision  Recall  AUC     F1-score  TP   FP   FN   TN    Efficiency  LVL     NT
InceptionResNetV2  0.9684    0.9514     0.9208  0.9928  0.9241    372  19   32   1193  0.8326      0.2885  0.8629
VGG19              0.9022    0.7971     0.8168  0.9108  0.7810    330  84   74   1128  0.6442      0.4114  0.9717
VGG16              0.3746    0.2144     0.5636  0.3562  0.0409    219  847  185  115   0.1642      1.6513  0.8084
Xception           0.9589    0.9273     0.9066  0.9859  0.8776    365  34   39   1178  0.9761      0.2767  0.7076
InceptionV3        0.9523    0.9049     0.9041  0.9533  0.8775    365  59   59   1094  0.5968      1.3719  0.2197
Table 7 Selected CNN model metric comparison

Model      Accuracy  Precision  Recall  AUC     F1-score  TP   FP  FN  TN    Efficiency  LVL     TT
ChicNetV6  0.9449    0.9156     0.8589  0.9861  0.7993    347  32  57  1180  2.8198      0.3342  1125
Xception   0.9523    0.9049     0.9041  0.9533  0.8775    365  34  39  1178  0.9761      0.2767  1476
References 1. Desin T, Koster W, Potter A (2013) Salmonella vaccines: past, present and future. Expert Rev Vaccines 12:87–96 2. Waltman WD, Gast RK, Mallinson ET (2008) Salmonellosis. Isolation and identification of avian pathogens, 5th edn. American Association of Avian Pathologists, Jackson-ville, FL, pp 3–9 3. Dalloul RA, Lillehoj HS (2006) Poultry coccidiosis: recent advancements in control measures and vaccine development. Expert Rev Vaccines 5(1):143–163 4. Grilli G, Borgonovo F, Tullo E, Fontana I, Guarino M, Ferrante V (2018) A pilot study to detect coccidiosis in poultry farms at early stage from air analysis. Biosyst Eng 2 5. Awan MA, Otte MJ, James AD (1994) The epidemiology of Newcastle disease in rural poultry: a review. Avian Pathol 23(3):405–423 6. Gourisaria MK, Jee G, Harshvardhan GM, Singh V, Singh PK, Work-neh TC (2022) Data science appositeness in diabetes mellitus diagnosis for healthcare systems of developing nations. IET Commun 7. Singh V, Gourisaria MK, Das H (2021) Performance analysis of machine learning algorithms for prediction of liver disease. In: 2021 IEEE 4th international conference on computing, power and communication technologies (GUCON). IEEE, pp 1–7 8. Panigrahi KP, Das H, Sahoo AK, Moharana SC (2021) Maize leaf disease detection and classification using machine learning algorithms. In: Progress in computing, analytics, and networking. Springer, Singapore, pp 659–669 9. Zhuang X, Bi M, Guo J, Wu S, Zhang T (2018) Development of an early warning algo-rithm to detect sick broilers. Comput Electron Agric 144:102–113 10. Zhuang X, Zhang T (2019) Detection of sick broilers by digital image processing and deep learning. Biosys Eng 179:106–116 11. Yoo DS, Song YH, Choi DW, Lim JS, Lee K, Kang T (2021) Machine Learning-driven dynamic risk prediction for highly pathogenic avian influenza at poultry farms. Republic of Korea: daily risk estimation for individual premises. Transboundary Emerg Dis 12. Akomolafe OP, Medeiros FB (2021) Image detection and classification of new-castle and avian flu diseases infected poultry using machine learning techniques. Univ Ibadan J Sci Logics ICT Res 6(1 and 2):121–131 13. Wang J, Shen M, Liu L, Xu Y, Okinda C (2019) Recognition and classification of broiler droppings based on deep convolutional neural network. J Sens 2019:10. https://doi.org/10. 1155/2019/3823515 14. Okinda C, Lu M, Liu L, Nyalala I, Muneri C, Wang J, Shen M (2019) A machine vision system for early detection and prediction of sick birds: a broiler chicken model. Biosys Eng 188:229–242 15. Cuan K, Zhang T, Li Z, Huang J, Ding Y, Fang C (2022) Automatic newcastle disease detection using sound technology and deep learning method. Comput Electron Agric 194:106740 16. Clive A (2022) Chicken disease image classification, Version 3. Retrieved from https://www. kaggle.com/datasets/allandclive/chicken-disease-1
Deepfakes Catcher: A Novel Fused Truncated DenseNet Model for Deepfakes Detection Fatima Khalid, Ali Javed, Aun Irtaza, and Khalid Mahmood Malik
Abstract In recent years, we have witnessed a tremendous evolution in generative adversarial networks resulting in the creation of much realistic fake multimedia content termed deepfakes. The deepfakes are created by superimposing one person’s real facial features, expressions, or lip movements onto another one. Apart from the benefits of deepfakes, it has been largely misused to propagate disinformation about influential persons like celebrities, politicians, etc. Since the deepfakes are created using different generative algorithms and involve much realism, thus it is a challenging task to detect them. Existing deepfakes detection methods have shown lower performance on forged videos that are generated using different algorithms, as well as videos that are of low resolution, compressed, or computationally more complex. To counter these issues, we propose a novel fused truncated DenseNet121 model for deepfakes videos detection. We employ transfer learning to reduce the resources and improve effectiveness, truncation to reduce the parameters and model size, and feature fusion to strengthen the representation by capturing more distinct traits of the input video. Our fused truncated DenseNet model lowers the DenseNet121 parameters count from 8.5 to 0.5 million. This makes our model more effective and lightweight that can be deployed in portable devices for real-time deepfakes detection. Our proposed model can reliably detect various types of deepfakes as well as deepfakes of different generative methods. We evaluated our model on two diverse datasets: a large-scale FaceForensics (FF)++ dataset and the World Leaders (WL) dataset. Our model achieves a remarkable accuracy of 99.03% on the WL dataset and 87.76% on the FF++ which shows the effectiveness of our method for deepfakes detection.
F. Khalid · A. Irtaza Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan A. Javed (B) Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan e-mail: [email protected] K. M. Malik Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_20
Keywords Deepfakes detection · DenseNet121 · FaceForensics++ · Fused truncated DenseNet · World leaders dataset
1 Introduction The evolution of deep learning-based algorithms such as autoencoders [12] and Generative Adversarial Networks (GANs) [9] have led to the generation of many realistic image and video-based deepfakes. Deepfakes represent the synthesized multimedia content based on artificial intelligence which mainly falls in the categories of FaceSwap, Lip-Sync, and Puppet mastery. FaceSwap deepfakes are centered on identity manipulation, where the original identity is swapped with the targeted one. Lip-syncing is a technique for modifying a video where the mouth area fits the arbitrary audio, whereas the puppet-mastery approach is concerned with the modification of facial expressions including the head and eye movement of the person. Deepfakes videos have some useful applications such as creating videos of a deceased person by using his single photo, changing the aging and de-aging of people, etc. Both applications can also be used to create realistic videos of live and deceased actors in the entertainment industry. Deepfakes have the potential not only to influence our view of reality, but can also be used for retaliation and deception purposes by targeting politicians and famous leaders and spreading disinformation to take political revenge. Existing literature on face-swapping and puppet-mastery has explored different end-to-end deep learning (DL)-based approaches. Various studies [3, 5, 10, 11] have focused on the application of DL-based methods for face swap deepfakes detection. In Bonettini et al. [3] ensemble model of EfficientNet and average voting was proposed. The model was evaluated only in intra-dataset settings, thus the generalization capability of this method cannot be guaranteed in an inter-dataset setup. In Rossler et al. [11], CNN was used in conjunction with the SVM for real and face swap detection. This approach was unable to perform well on the compressed videos. In Nirkin et al. [10] confidence score was computed from cropped faces, which were later fed into the deep learner to identify the identity manipulation. This model does not generalize well on unseen data. In de Lima et al. [5], VGG-11 was used to determine frame level features, which were then fed to various models like ResNet, R3D, and I3D to detect the real and forged videos. This technique is computationally more costly. Research approaches [1, 4, 6, 14] have also been presented for puppet mastery deepfakes detection by employing the DL-based methods. In Guo et al. [6], feature maps generated from convolutional layers were subtracted from the original images. The method removes unnecessary details from the image, allowing the RNN to concentrate on the important details. This method requires more samples for training to obtain satisfactory performance. In Zhao et al. [14], pairwise learning was used to extract source features from CNN, which were later used for classification. However, the performance of the model decreases on images that have consistent
features. In Chintha et al. [4], temporal discrepancies in deepfakes videos were identified by combining XceptionNet CNN which extracted the facial features using bidirectional LSTM. The architecture performed well on multiple datasets; however, the performance degrades on compressed samples. In Agarwal et al. [1], a combination of VGG-16 and encoder-decoder network was applied for detection by computing the facial and behavioral attributes. This method does not apply well to unseen deepfake videos. According to the literature, existing approaches, notably [1, 10], don’t have the generalization ability on the unseen data. Rossler et al. [11], Chintha et al. [4] performs well on high-quality videos, but their performance degrades on compressed videos. Although [5] outperforms other state-of-the-art (SOTA) techniques, but is computationally complex. To better address the challenges, we present a novel fused truncated DenseNet framework that works effectively on unseen data and induces modifications to further reduce the computational cost and optimization efforts while achieving higher accuracy. Specifically, this paper makes a significant contribution based on the following: 1. We present a novel fused truncated DenseNet model that is robust to different types of deepfakes (face swap, puppet mastery, imposter, and lip-sync) and to different generative methods (Face2Face, NeuralTextures, Deepfakes, and FaceShifter). 2. We present an efficient deepfakes detection method by employing the GeLu activation function in our proposed method to reduce the complexity of the model. 3. We introduce a series of layers including global average pooling and dense layers combined with the regularization technique to prevent overfitting. 4. To evaluate the generalizability of our proposed model, we performed extensive experiments on two different deepfakes datasets including the cross-set examination.
2 Proposed Methodology This section explains the proposed workflow employed for deepfakes detection. The architecture of our proposed framework is depicted in Fig. 1.
2.1 Facial Frames Extraction The initial stage is to identify and extract faces from the video frames since the facial landmarks are the most manipulated part in deepfakes videos. For this purpose, we used the Multi-task Cascaded Convolutional Networks (MTCNN) [15] face detector to extract the facial region of 300 × 300 from the input video during pre-processing. This approach recognizes the facial landmarks such as the eyes, nose, and mouth,
Fig. 1 Architecture of proposed method
from coarse to fine details. We chose this method as it detects faces accurately even in the presence of occlusion and variable light, unlike other face detectors such as Haar Cascade and Viola Jones framework [13].
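A rough sketch of this face-cropping step is shown below, using one publicly available MTCNN implementation (the `mtcnn` package) together with OpenCV frame reading; the frame-sampling stride and output size are assumptions for illustration, not the authors' exact settings.

```python
# Sketch of face extraction with the 'mtcnn' package and OpenCV (stride/size are assumptions).
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def extract_faces(video_path, size=300, frame_stride=10):
    """Return face crops (size x size) from every `frame_stride`-th frame of the video."""
    faces, cap, idx = [], cv2.VideoCapture(video_path), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for det in detector.detect_faces(rgb):      # each detection has 'box' = [x, y, w, h]
                x, y, w, h = det["box"]
                crop = rgb[max(y, 0):y + h, max(x, 0):x + w]
                if crop.size:
                    faces.append(cv2.resize(crop, (size, size)))
        idx += 1
    cap.release()
    return faces
```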
2.2 Fused Truncated DenseNet We extract the frames having frontal face exposure after detecting faces in the input video. The frames were then resized to 224 × 224 resolution and fed to our fused truncated DenseNet121. We introduce truncation modifications that help in parameter and model size minimization; as well as feature fusion, which merges the correlated feature values produced by our algorithm. As a result, an effective and lightweight model for the detection of real and deepfake videos is created. The use of pre-trained frameworks is inspired by the fact that these models have been trained on enormous publicly available datasets like ImageNet, and hence can learn the essential feature points. DenseNet121 is a ResNet architectural extension. The training technique faces vanishing gradient issues as the network’s depth grows. Both the ResNet and DenseNet models are intended to address this issue. The DenseNet design is built on all layer’s connectivity, with each layer receiving input from all previous layers and passing the output to all the subsequent layers. As a result, the resultant connections are dense, which enhances the efficiency with fewer parameters. The goal of having a DenseNet121 model is to give a perfect transmission of features throughout the whole network without performance degradation, even with considerable depth. DenseNet also handles parameter inflation utilizing a concatenation instead of layer additions. Our proposed method includes two DenseNet121 architectures. Model A is partially trained on our dataset, its early layers are frozen to preserve the ImageNet features, and the remaining layers are retrained on our data. Whereas model B is entirely retrained on our dataset. Figure 1 shows the proposed fused truncated DenseNet model, which is composed of 7 × 7 Convolution layer, proceeded by the Batch Normalization (BN), Gaussian Error Linear Unit (GeLu), and 3 × 3 Max
Pooling layer. Next, a pair of dense blocks with a BN, GeLu, and 1 × 1 Convolution layer is followed by another BN, GeLu, and 1 × 1 Convolution layer. Unlike ResNet and other deep networks that rely on feature summation to generate large parameters, the DenseNet model employs a dense block with ‘n’ rate of growth that is appended to all the network layers. This approach evolves into an efficient endto-end transfer of features from preceding layers to succeeding layers. The proposed design produces a rich gradient quality even at deeper depths while lowering the parameter count makes it very useful for detection purposes. To avoid depletion of resources during the features extraction, the DenseNet model needs a transition layer that down-samples the feature maps by using 1 × 1 Convolution layer and 2 × 2 Average Pooling layer. Layer Truncation Although DenseNet has much lesser parameters than other DL-based models, the proposed approach aims to further minimize the parameters without compromising its effectiveness. DenseNet121 has around 8.5 million parameters. The base DenseNet model is suitable for large datasets such as the ImageNet, which has over 14 million images and 1000 categories, training and replicating this model can be time-consuming. Furthermore, with such a small dataset, employing the complete model’s architecture merely adds complexity and uses enormous resources. As a result, most of the models’ layers are eliminated through a proposed truncation from its complete network, lowering the number of parameters and reducing the end-to-end flow of features. The proposed fused truncated DenseNet with only six dense blocks followed by a transition layer connecting to another set of three dense blocks is shown in Fig. 1. The proposed methodology reduces the DenseNet121 model’s parameter count by a significant factor of 93.5%. More specifically, truncated DenseNet decreases the parameters from the initial 8.5 million to only around half a million. Activation Function is used in a multilayer neural network to express the connection between the output values of neurons in the preceding layer and the input values of those in the following layer. It determines whether a neuron should be activated or not. We used the Gaussian Error Linear Units (GeLu) [7] function in our method. As sigmoid and ReLu faces the gradient vanishing issue, along with this, ReLu also creates the dead ReLu issue. To address these issues of ReLu, probabilistic regularization techniques such as dropout are widely used after the activation functions to improve accuracy. GeLu is presented to combine stochastic regularization with an activation function. It is a conventional Gaussian distribution function that puts nonlinearity to the output of a neuron depending on their values, rather than using the input value as in ReLu. Model concatenation and Prediction The smaller size of the truncated DenseNet network results in a lower parameter value. On the contrary, adding more depth to the layers will make the truncation approach useless. To overcome this problem, we employed the model concatenation method, which improves the accuracy of our model with fewer parameters. Model concatenation and feature fusion broadened the model instead of increasing its depth, enabling the required fast end-to-end feature extraction for training and validation. To better process the features produced by the fusion of both models, the proposed method incorporates a new set of layers
consisting of Global Average Pooling, a dense layer, and the dropout connected to another dense layer activated by the classifier. These additional layers attempt to increase efficiency and regularization, hence preventing overfitting problems.
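The following is a speculative sketch of how such a fused, truncated DenseNet121 could be assembled in Keras. The truncation point ('pool3_pool'), the layer-renaming workaround used to combine two backbones, the freeze depth, and the head sizes are all assumptions made for illustration; they are not the authors' exact configuration.

```python
# Speculative sketch of a fused truncated DenseNet121 (truncation point and head are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def truncated_densenet(name, cut_layer="pool3_pool"):
    base = DenseNet121(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    # rename layers so two backbones can coexist in one model without name clashes
    for layer in base.layers:
        layer._name = f"{name}_{layer.name}"
    return Model(base.input, base.get_layer(f"{name}_{cut_layer}").output, name=name)

branch_a = truncated_densenet("model_a")       # partially trained: early layers frozen
for layer in branch_a.layers[:100]:            # assumed freeze depth
    layer.trainable = False
branch_b = truncated_densenet("model_b")       # fully retrained on the target data

inputs = layers.Input(shape=(224, 224, 3))
fused = layers.Concatenate()([
    layers.GlobalAveragePooling2D()(branch_a(inputs)),
    layers.GlobalAveragePooling2D()(branch_b(inputs)),
])
x = layers.Dense(128, activation=tf.nn.gelu)(fused)   # GeLu activation as described
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(2, activation="softmax")(x)    # real vs deepfake

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```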
3 Experiment Setup and Results 3.1 Dataset We evaluated the performance of the proposed method using two datasets: FaceForensics++ [11] and the World Leaders Dataset [2]. FF++ is an extensive face manipulation dataset created with automated, modern video editing techniques. Two traditional computer graphics methods, Face2Face (F2F) and FaceSwap (FS) are used in conjunction with two learning-based methods, DeepFakes (DF) and NeuralTextures (NT). Each video has an individual with a non-occlusion face, although it is difficult due to differences in the skin tone of various people, lighting conditions, the presence of facial accessories, and the loss of information due to low video resolution. The YouTube videos of world-famous politicians (Clinton, Obama, Warren, and others) with their original, comical imposters (Imp), face swap (FS), lip-sync (LS), and puppet master subsets made up the WL dataset. Politicians are speaking throughout the videos; each video has only one person’s face and the camera is static with minimal variations in zooming. We divided both datasets into 80:20 splits with 80% of the videos for training and the rest 20% for testing.
3.2 Performance Evaluation of the Proposed Method We designed an experiment to analyze the performance of our method on the original and fake sets of FF++ and WL datasets to demonstrate its effectiveness for deepfakes detection. For this purpose, we employed our model to classify the real and fake videos of each subset of FF++ separately. On FF++, we tested the real samples with the fake samples from FS, DF, F2F, NT, and FaceShifter (FSh) sets, and the results are presented in Table 1. It can be noticed that the FF++-FS set has the highest accuracy of 95.73% and 0.99 AUC among all other sets. FS videos are generated by using the 3D blending method. These remarkable results on the FS set indicate that our model can better capture these traits to identify the identity changes and static textures. Whereas FSh achieved an accuracy of only 60.90% and AUC of 0.67 because the generative method of this set is very complex due to the fusion of two complex GAN’s architecture [8]. This makes it extremely challenging to reliably capture the distinctive traits of the texture used in the FSh, which limits the accuracy of our model.
Table 1 Performance evaluation of proposed method on FF++ dataset

          FS     DF    F2F   NT    FSh
Accuracy  95.73  93.9  92.6  83.5  60.90
PR        0.99   0.97  0.97  0.90  0.63
AUC       0.99   0.98  0.97  0.92  0.67
Table 2 Performance evaluation of proposed method on WL dataset

Leaders  Subsets  Accuracy  PR    AUC
Obama    FS       94.57     0.96  0.97
         Imp      58.57     0.60  0.63
         LS       62.36     0.65  0.68
JB       FS       89.68     0.91  0.94
         Imp      95.65     0.97  0.96
Clinton  FS       84.13     0.87  0.86
         Imp      91.43     0.92  0.94
Warren   FS       93.14     0.93  0.95
         Imp      93.12     0.93  0.95
Sander   FS       89.59     0.91  0.90
         Imp      78.88     0.80  0.82
Trump    Imp      99.70     1.00  1.00
For WL, each leader’s deepfakes type (FS, Imp, and LS) is tested with the original samples. Table 2 shows that the FS of Obama has shown the best accuracy of 94.57% and 0.97 AUC. Whereas Imp set of Trump has shown an accuracy of 99.70% among all the leaders. The results of this experiment revealed that our proposed model performed remarkably on both datasets. These results are due to the GeLu’s nonlinear behavior and its combinative property of dropout, zoneout, and ReLu. GeLu solves the dying ReLu problem by providing a gradient in the negative axes to prevent neurons from dying and is also capable of differentiating each datapoint of the input image.
3.3 Ablation Study In this experiment, an ablation study is conducted to demonstrate the performance of various activation functions on the FaceSwap set of the FaceForensics++ dataset. Table 3 illustrates the performance of different activation functions. The results show that our method employing the GeLu activation provided the best performance as compared to other activation functions. The disparity in findings is mainly due to the GeLu’s combinative property of dropout and zone out as well as its non-convex,
Table 3 Performance evaluation on different activation functions

Activation functions   ReLu  SeLu  TRelu  ELU    GeLu
Testing on FF++ (FS)   94.5  90.6  92.3   95.09  95.73
non-monotonic, and nonlinear nature with curvature present in all directions. On the other hand, convex and monotonic activations like ReLu, ELU, and SeLu are linear in the positive axes and lack curvature. As a result, GeLu outperforms other activation functions.
3.4 Performance Evaluation of Proposed Method on Cross-Set In this experiment, we designed a cross-set evaluation to inspect the generalizability of the proposed method among the intra-sets of the datasets. For the FF++ dataset, we conducted an experiment where each trained set is tested on all the other sets, like FS trained set is tested on all the other sets, respectively. Similarly, for the WL dataset, we conducted the same experiment within each leader’s intra-set, like Obama’s FS trained set is tested on the Imp and LS sets, respectively. The results displayed in Table 4 are slightly encouraging as both the datasets contain different deepfakes types and generative methods, but still our proposed method can differentiate the modifications of identity change, expression change, and neural rendering. Table 4 shows that, on the FF++ dataset, the sets having the same generative method achieved better results as compared to others. In comparison to the FF++ dataset, our proposed model has shown better results on the WL dataset, it has easily detected the FS and Imp of most of the leaders with good accuracies, as both the types have the same generative methods, so our model generalizes well on the same generative methods. LS of Obama has shown the lowest accuracies among all because this set contains spatiotemporal glitches. DL-based models (CNNs along with RNNs) can extract the features in both the spatial and temporal domains. In our method, we used a fused truncated DenseNet-based CNN model to identify the artifacts in the spatial domain only, which reduces the accuracy of this set. We conducted another cross-set evaluation experiment for the WL dataset, where the FS and Imp trained model of one leader is tested with the FS and Imp of another leader, respectively. The motive behind this experiment was to check the robustness of the same forgery type on different leaders. The results shown in Table 5 are relatively good, which shows that the proposed model can distinguish the same forgery on different individuals even in the presence of challenging conditions such as variations in skin tones, facial occlusions, lightning conditions, and facial artifacts.
Table 4 Performance evaluation on cross-sets of FF++ and WL dataset Test set Train set
FF++
WL
Subsets
FS
FS
–
DF
51.9
F2F
51.4
FSh NT
F2F
FSh
NT
Imp
LS
48.6
67.0
52.9
49.2
–
–
–
54.8
58.1
68.9
–
–
54.7
–
50.2
57.0
–
–
56.0
56.0
51.8
–
48.1
–
–
55.2
55.2
50.2
48.3
–
–
–
FS
–
–
–
–
–
62.1
46.9
Imp
48.0
–
–
–
–
–
32.2
LS
35.3
–
–
–
–
41.2
–
JB
FS
–
–
–
–
–
76.0
0.94
Imp
79.2
–
–
–
–
–
–
Clinton
FS
–
–
–
–
–
84.8
–
Imp
83.2
–
–
–
–
–
–
FS
–
–
–
–
–
82.1
–
Imp
92.0
–
–
–
–
–
–
FS
–
–
–
–
–
76
–
Imp
91.0
–
–
–
–
–
–
Obama
Warren Sander
DF
Table 5 Performance evaluation on cross-set of WL dataset Test set
Train set Obama JB
Obama
JB
Fs
Imp
Fs
–
–
66.3 60.3 71.3
69.3 53.6 –
Imp –
Clinton
Warren
Sander
Trump
Fs
Imp
Fs
Fs
Imp
55.1
53.8 50.1 50.3 79.3 71.1
84.1
69.4 71.2 87.3 75.2 69.3
37.6
Imp
Imp
Clinton 65.1 59.1 81.0 70.2 –
–
60.4 62.8 81.1 61.2 55.2
Warren
78.4 48.3 83.1 61.1 71.3
80.1
–
Sander
65.1 42.2 79.6 65.0 84.3
61.2
51.4 49.1 –
–
Trump
–
68.2
–
82.1 –
51.3 –
75.2 –
–
91.2 70.1 69.4
55.5 –
79.2
3.5 Comparative Analysis with Contemporary Methods The key purpose of this experiment is to validate the efficacy of the proposed model over existing methods. The performance of our method on the FF++ with existing methods is shown in Table 6. The accuracy of our model for FS and NT has increased by 5.44 and 2.9%, respectively. Whereas, for F2F and DF, our method has achieved higher accuracies as compared to most of the methods. It is difficult to obtain good
Table 6 Performance comparison against existing methods on FF++ dataset

Model           FS     DF     F2F    NT    FSh   Combined
XeceptionNet    70.87  74.5   75.9   73.3  –     62.40
Steg. Features  68.93  73.6   73.7   63.3  –     51.80
ResidualNet     73.79  85.4   67.8   78.0  –     55.20
CNN             56.31  85.4   64.2   60.0  –     58.10
MesoNet         61.17  87.2   56.2   40.6  –     66.00
XeceptionNet    90.29  96.3   86.8   80.6  –     70.10
Classification  54.07  52.3   92.77  –     –     83.71
Segmentation    34.04  70.37  90.27  –     –     93.01
Meso-4          –      96.9   95.3   –     –     –
MesoInception   –      98.4   95.3   –     –     –
Proposed        95.73  93.9   92.6   83.5  60.9  87.76
Table 7 Performance comparison against existing methods on WL dataset

Paper               Subset  Obama  Clinton  Warren  Sander  Trump  JB    Combined
Agarwal et al. [2]  FS      0.95   0.98     0.94    1.00    –      0.93  –
                    Imp     0.95   0.96     0.93    0.94    –      0.94  –
                    LS      0.83   –        –       –       –
Agarwal et al. [1]  –       –      –        –       –       –      –     0.94
Proposed            FS      0.97   0.86     0.95    0.90    –      0.94  0.97
                    Imp     0.63   0.94     0.95    0.82    1.00   0.96
                    LS      0.68   –        –       –       –      –
detection results on all subsets of the FF++ dataset, especially in the presence of challenging conditions like non-facial frames, varying illumination conditions, people of different races, and the presence of facial accessories. Our method outperforms most methods since it achieves good identification results across all subsets and can discriminate between real and fake videos generated using different manipulation techniques. We compared the performance of our method on the WL dataset with existing methods using the AUC score. Table 7 shows when all the dataset’s leaders are combined, our method outperforms the existing techniques.
4 Conclusion In this paper, we have presented a fused truncated DenseNet model to better distinguish between real and deepfakes videos. Our proposed system is lightweight and
resilient with a shorter end-to-end architecture and fewer parameter sizes. In comparison to other SOTA models with greater parameter sizes, our truncated model trains quicker and performs well on a large and diverse dataset. Our model performed well, regardless of the distinct occlusion settings, variations in skin tones of people, and the presence of facial artifacts in both datasets. We performed an intra-set evaluation on both datasets and get better results on the sets having the same type of generative method. This shows that our model can detect the deepfakes on the unseen samples of any dataset using similar generative methods for deepfake creation. We intend to increase the generalizability of our methodology in the future to improve the cross-corpus assessment. Acknowledgements This work was supported by the grant of the Punjab Higher Education Commission of Pakistan with Award No. (PHEC/ARA/PIRCA/20527/21).
References 1. Agarwal S, Farid H, El-Gaaly T, Lim S-N (2020) Detecting deep-fake videos from appearance and behavior. In: 2020 IEEE international workshop on information forensics and security (WIFS) 2. Agarwal S, Farid H, Gu Y, He M, Nagano K, Li H (2019) Protecting world leaders against deep fakes. CVPR workshops 3. Bonettini N, Cannas ED, Mandelli S, Bondi L, Bestagini P, Tubaro S (2021) Video face manipulation detection through ensemble of cnns. In: 2020 25th international conference on pattern recognition (ICPR) 4. Chintha A, Thai B, Sohrawardi SJ, Bhatt K, Hickerson A, Wright M, Ptucha R (2020) Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J Sel Top Sign Proces 14(5):1024–1037 5. de Lima O, Franklin S, Basu S, Karwoski B, George A (2020) Deepfake detection using spatiotemporal convolutional networks. arXiv:2006.14749 6. Guo Z, Yang G, Chen J, Sun X (2021) Fake face detection via adaptive manipulation traces extraction network. Comput Vis Image Underst 204:103170 7. Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus). arXiv:1606.08415 8. Li L, Bao J, Yang H, Chen D, Wen F (2019) Faceshifter: towards high fidelity and occlusion aware face swapping. arXiv:1912.13457 9. Liu M-Y, Huang X, Yu J, Wang T-C, Mallya A (2021) Generative adversarial networks for image and video synthesis: algorithms and applications. Proc IEEE 109(5):839–862 10. Nirkin Y, Wolf L, Keller Y, Hassner T (2021) DeepFake detection based on discrepancies between faces and their context. IEEE Trans Pattern Anal Mach Intell 11. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: learning to detect manipulated facial images. In: Proceedings of the IEEE/CVF international conference on computer vision 12. Tewari A, Zollhoefer M, Bernard F, Garrido P, Kim H, Perez P, Theobalt C (2018) High-fidelity monocular face reconstruction based on an unsupervised model-based face autoencoder. IEEE Trans Pattern Anal Mach Intell 42(2):357–370 13. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001
14. Zhao T, Xu X, Xu M, Ding H, Xiong Y, Xia W (2021) Learning self-consistency for deepfake detection. In: Proceedings of the IEEE/cvf international conference on computer vision, pp 15023–15033 15. Xiang J, Zhu G (2017) Joint face detection and facial expression recognition with MTCNN. In: 2017 4th international conference on information science and control engineering (ICISCE). IEEE, pp 424–427
Benchmarking Innovation in Countries: A Multimethodology Approach Using K-Means and DEA Edilvando Pereira Eufrazio and Helder Gomes Costa
Abstract This article addresses the comparison of innovation between countries using the data from the Global Innovation Index (GII), using a Data Envelopment Analysis (DEA) based approach. A problem that occurs when using DEA is the distortions caused by heterogeneity in the data. In this proposal, this problem is avoided by using two-stage modelling. The first stage consists of the grouping of countries in clusters using K-means, and in the second stage, data from inputs and outputs are brought by GII. This stage is followed by an analysis of the benchmarks of each of these clusters using the classic DEA model considering constant returns of scale and the identification of anti-benchmarks through the inverted frontier. As an innovation to the GII report, this article brings a two-stage analysis that makes the comparison between countries that belong to the same cluster fairer, mitigating potential distortions that should appear because of heterogeneity in the data. Keywords Innovation · DEA · K-means · Benchmarking · Clustering
1 Introduction This article brings a comparison of innovation between countries, using data from the Global Innovation Index (GII) and a Data Envelopment Analysis (DEA) based approach. Innovation permeates various social sectors and is present at various levels of the global production system; thus, it is sometimes difficult to obtain metrics capable of measuring innovation and of justifying the investments made in its promotion. Studying innovation at the country level, considering a National Innovation System (NIS), is very important for the development of countries, because, through
251
252
E. P. Eufrazio and H. G. Costa
innovation, it is possible to reduce the unemployment rate [1], and innovation works as a driver of economic development [2]. Even though various approaches for measuring innovation efficiency have been proposed, two important elements are often missing, at least in combination: (1) accounting the diversity of national innovation system (NIS), which makes benchmarking or ranking countries a hard task. (2) evaluating the responsiveness of innovation outputs to innovation-related investments [3]. Among the possible indices to measure innovation, the Global Innovation Index (GII) stands out. The index was created in 2007, and in 2017, it encompassed 127 countries, representing 92.5% of the world population and 97.6% of GDP. The GII seeks to establish metrics capable of better capturing the multiple facets of innovation and revealing its advantages to society [4]. The index is a reference on innovation in the world. GII analyze not only traditional measures of innovation, but also evaluate unconventional factors encompassed in innovation. Envisioned to capture as complete a picture of innovation as possible, the Index comprises 80 indicators for 2020. Thus, this work seeks to identify within the group of countries for which the GII is calculated a way to identify countries like each other in terms of investment and results in innovation. To search for these similar groups, K-means [5] was used; so that 5 groups of countries were found. After the separation into groups, a classical Data Envelopment Analysis (DEA) model (CCR) [6] was applied to identify the relative efficiencies of each country in its group. And still identifying through the study of frontiers and inverted frontiers which would be the benchmarks and anti-benchmark of each cluster. Considering this proposal, a search was made in the literature for works that dealt with the use of DEA and efficiency in technological innovation at the national level. Some works were found considering other indexes but with different approaches. [7– 9]. These differences range from methodological terms to differences in approach in terms of geographic limitations of the analyses [10–13]. In our work, we apply this idea to fill a gap in a two-stage analysis that seeks to understand the economies compared to those that share levels of investment in innovation in a similar way and to understand within each of these groups what leads certain countries to become stand out. We believe that this approach can be expanded and used in other public development policies. What is quite in line with the theme of GII 2020: Who will Finance Innovation? [14].
2 Background
2.1 The Global Innovation Index (GII)
The GII is co-published by Cornell University, INSEAD, and the World Intellectual Property Organization (WIPO). The GII is composed of three indices: the overall
GII, the Innovation Input Sub-Index, and the Innovation Output Sub-Index. The overall GII score is the average of the scores of the Input and Output Sub-Indices. The Innovation Input Sub-Index is comprised of five pillars that capture elements of the national economy that enable innovative activities: (1) Institutions, (2) Human capital and research, (3) Infrastructure, (4) Market sophistication, and (5) Business sophistication. The Innovation Output Sub-Index provides information about outputs that are the result of the innovative activities of economies. There are two output pillars: (6) Knowledge and technology outputs and (7) Creative outputs. Each pillar has three sub-pillars, and each sub-pillar is composed of individual indicators, totaling 80 for 2020. (Global Innovation n.d.). In this article, we use the data from the 2020 GII edition, as it is the newest one available at the time the research was done.
2.2 K-Means Method
The K-means method basically allocates each sample element to the cluster whose centroid (sample mean vector) is closest to the observed value vector for the respective element [5]. This method consists of four steps [15]:
1. First choose k centroids, called "similar" or "prototypes", to start the partition process.
2. Each element of the dataset is then compared to each initial centroid by a distance measure, which is usually the Euclidean distance. The element is allocated to the group whose distance is the shortest.
3. After applying step 2 for each of the sample elements, recalculate the centroid values for each new group formed, and repeat step 2, considering the centroids of these new groups.
4. Steps 2 and 3 should be repeated until all sample elements are "well allocated" in their groups, i.e., until no element reallocation is required.
To decide on the number of clusters, we use the "elbow method", which is a cluster analysis consistency interpretation and validation method designed to help find the appropriate number of clusters in a dataset [16].
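A minimal sketch of this procedure with scikit-learn is shown below; it is not part of the original paper, and the country scores are made-up placeholders. The inertia curve plays the role of the elbow criterion described above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical matrix: one row per country, columns = aggregated GII input and output scores
X = np.array([[62.0, 55.1], [35.2, 24.8], [48.9, 40.3], [30.1, 22.5], [70.4, 60.2]])

# Elbow method: inspect the within-cluster sum of squares (inertia) for a range of k
# and look for the point where the curve flattens.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, min(6, len(X)) + 1)}

k = 2  # chosen from the elbow; the paper settles on 5 clusters for the full dataset
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
print(inertias, labels)
```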
2.3 Data Envelopment Analysis (DEA)
Data Envelopment Analysis is a methodology based on mathematical programming whose aim is to measure the efficiency of a set of productive units, called Decision Making Units (DMUs), which consume multiple inputs to produce multiple outputs [6]. In the original model of Charnes et al. [6], efficiency is represented by the ratio of weighted outputs to weighted inputs, which generalizes the Farrell efficiency of a single input and a single output. An important feature of DEA is its ability to provide efficiency scores while taking into account both multiple inputs and multiple outputs [17].
The DEA procedure optimizes the measured performance of each DMU in relation to all other DMUs in a production system that transforms multiple inputs into multiple outputs. It uses Linear Programming (LP), solving a set of interrelated LP problems, one for each DMU, in order to determine the relative efficiency of each of them [18]. In this work, we use the CCR model, originally presented by Charnes et al. [6], which builds a piecewise linear, non-parametric surface enveloping the data. It works with constant returns to scale, that is, any variation in the inputs produces a proportional variation in the outputs; for this reason, it is also known as the CRS (Constant Returns to Scale) model. In terms of orientation, the CCR model can be oriented towards outputs or inputs. Each orientation could be detailed to familiarize the reader with the concepts; however, in this work, the orientation adopted is output. The mathematical structure of these models allows a DMU to be considered efficient with multiple sets of weights. Zero weights can be assigned to some input or output, which means that this variable was disregarded in the assessment [19].
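To make the formulation concrete, the sketch below solves the CCR multiplier model with SciPy. This is an illustrative implementation, not the software used by the authors, and the toy input/output matrix is invented. For DMU o it maximizes the weighted output sum subject to a unit weighted input for o and to no DMU exceeding a ratio of one; under constant returns to scale, the output-oriented expansion factor is the reciprocal of this score, so both orientations identify the same efficient DMUs.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """CCR multiplier model for DMU o. X: (n, m) inputs, Y: (n, s) outputs.
    Returns the CRS efficiency score in (0, 1]."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])          # maximize u . y_o
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)
    b_eq = [1.0]                                      # v . x_o = 1
    A_ub = np.hstack([Y, -X])                         # u . y_j - v . x_j <= 0 for all j
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun

# Toy data: 4 DMUs, 2 inputs, 1 output (illustrative values only)
X = np.array([[4.0, 3.0], [7.0, 3.0], [8.0, 1.0], [4.0, 2.0]])
Y = np.array([[1.0], [1.0], [1.0], [1.0]])
print([round(ccr_efficiency(X, Y, o), 3) for o in range(4)])
```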
2.4 The CCR Model Inverted Frontier
The inverted frontier can be seen as a pessimistic assessment of DMUs. This method assesses the inefficiency of a DMU by building a frontier consisting of the units with the worst management practices, called the inefficient frontier. Projections of DMUs on the inverted frontier indicate an anti-target, which is a linear combination of anti-benchmarks. To calculate the inefficiency frontier, the inputs and outputs of the original DEA model are exchanged [20]. The inverted frontier assessment can be used to avoid the problem of low discrimination in DEA and to order DMUs. For that, we use an aggregated index (composite efficiency), which consists of the arithmetic mean between the efficiency in relation to the original frontier and the inefficiency in relation to the inverted frontier. Thus, for a DMU to reach maximum composite efficiency, it needs to perform well at the standard frontier and not perform well at the inverted frontier. This implies that such a DMU is good at the characteristics where it performs well and not so bad at those where its performance is not the best [21].
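As a small illustration of this aggregation step (again a sketch, not the authors' code):

```python
def composite_efficiency(standard_eff, inverted_eff):
    """Arithmetic mean of the efficiency against the standard frontier and the
    inefficiency (1 - efficiency) against the inverted frontier."""
    return (standard_eff + (1.0 - inverted_eff)) / 2.0

# Example with the kind of values reported later for Jamaica (1.00 and 0.49):
print(composite_efficiency(1.00, 0.49))  # 0.755, consistent with the rounded 0.75 in Table 1
```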
3 Methods and Results
First, the flow of the methodology is presented to familiarize the reader; in the next subsections, each topic is briefly detailed and the relevant results are shown. The adopted methodology is grounded on a sequence of three steps: (1) use the GII composite indices to perform a K-means cluster analysis; (2) apply the DEA CCR model
oriented to output to each cluster obtained in step 1, using the sub-pillars provided in the GII report as inputs and outputs to calculate a composite efficiency; and (3) identify benchmarks and anti-benchmarks considering a standard frontier and an inverted frontier.
Fig. 1 Final cluster structure
3.1 K-Means Cluster Analysis of GII
Here we used the GII aggregated input and output scores to divide the data into country clusters. A sensitivity analysis of the number of clusters, in conjunction with the elbow method, supported the choice of the number of clusters. Figure 1 shows the final structure of the five clusters obtained through K-means. These are the data that served as the basis for the rest of the analysis; from this point on, each cluster is examined in isolation.
3.2 DEA CCR Model Output-Oriented
In this subsection, we used the DEA CCR model (constant returns to scale). This approach is suitable given the combined use with K-means, which generates homogeneity within groups and heterogeneity between groups. Therefore, it was decided to work with constant returns to scale rather than with variable returns, as in the BCC model, a choice justified by the preprocessing obtained by clustering the data. In our analysis context, each country is considered a DMU, and for the set of inputs and outputs we consider the sub-pillars described in the GII methodology.
Thus, the model had 5 input variables (Institutions, Human Capital and Research, Infrastructure, Market Sophistication, and Business Sophistication) and 2 output variables (Knowledge and Technology Outputs and Creative Outputs). Despite the use of sub-pillars, which are aggregations of other indicators, we understand that the fact that the scores are standardized and the same rules are applied to all DMUs alleviates the possible unwanted effect of using indices instead of original variables, as described in Dyson et al. [22]. Output orientation was chosen because one of our goals is to see which countries are benchmarks in terms of optimizing spending on investments linked to innovation. In other words, when orienting the model to outputs, we consider that the investments remain constant, which allows us to observe the DMUs that are more efficient. An analysis of the inverted frontier is also carried out, which seeks to identify the DMUs that have produced little in terms of outputs even with high inputs. Finally, the composite efficiency index is calculated to identify DMUs that present a balance between the two assessments.
Table 1 shows the results compiled for the five clusters in terms of calculated efficiencies. The table shows the first three and the last three countries of each cluster, considering an ordering based on composite efficiency.
Analyzing the clusters according to their efficiencies, it can be seen that cluster 1 includes some countries in Latin and Central America, with Jamaica and Colombia standing out; it also includes countries in Africa and Central Asia. With respect to the worst-performing countries, Oman (in the Arabian Peninsula), Brunei, and Peru stand out.
Cluster 2 includes countries that have a higher level of investment due to the high GDP of these nations; it contains countries such as the USA, Singapore, China, and the United Kingdom. In the analysis, we see that Switzerland achieved the greatest composite efficiency and is also efficient in terms of standard efficiency, together with Ireland and the United Kingdom. At the other extreme of the cluster, Singapore, Canada, and China are less efficient. It should be noted that these numbers are limited to an analysis within the cluster, which in absolute terms has the highest level of inputs and outputs.
Cluster 3 includes countries from Eastern Europe and Latin America (Brazil, Mexico, and Chile) as well as some Asian countries. It can be said that the cluster in general brings together developing economies, containing practically all the BRICS except China. Bulgaria, Vietnam, and Slovakia stand out positively. On the other end, we have South Africa, Brazil, and Russia, countries that, compared to the others in the cluster, invest considerable resources but do not have outputs consistent with the investment.
Cluster 4 has 36 countries, mostly poor countries on the African continent, with a low index of inputs, which translates into a low rate of investment and consequently a low rate of outputs. Côte d'Ivoire, Madagascar, and Pakistan are worth highlighting positively in this cluster; these countries are not necessarily those with the highest output scores, but they have a high relative efficiency, balancing outputs with inputs. A negative highlight should be placed on the last three countries, Mozambique, Zambia,
and Benin, which are poor countries that, compared to the rest of the cluster, invest more but still have not reaped proportional outputs.
Cluster 5 includes countries from the Iberian Peninsula, some countries from southern Europe, and Oceania. It brings together countries with consolidated economies that are just below cluster 2, which contains the countries with the highest levels of inputs and outputs. Within cluster 5, Malta, Iceland, and Luxembourg are worth mentioning. Considering the inverted frontier, the following countries stand out: Australia, Slovenia, and the United Arab Emirates.

Table 1 Efficiency compilation

Countries | Standard efficiency | Inverted efficiency | Composite efficiency
Cluster 1
Jamaica | 1.00 | 0.49 | 0.75
Colombia | 1.00 | 0.49 | 0.75
Morocco | 1.00 | 0.52 | 0.74
Last countries in cluster 1
Oman | 0.57 | 1.00 | 0.29
Peru | 0.48 | 0.95 | 0.27
Brunei | 0.46 | 1.00 | 0.23
Cluster 2
Switzerland | 1.00 | 0.63 | 0.68
Ireland | 1.00 | 0.65 | 0.67
United Kingdom (the) | 1.00 | 0.68 | 0.66
Last countries in cluster 2
Singapore | 0.75 | 0.86 | 0.44
Canada | 0.68 | 1.00 | 0.34
China | 0.67 | 1.00 | 0.33
Cluster 3
Bulgaria | 1.00 | 0.63 | 0.69
Viet Nam | 1.00 | 0.65 | 0.67
Slovakia | 1.00 | 0.67 | 0.67
Last countries in cluster 3
Brazil | 0.66 | 1.00 | 0.33
South Africa | 0.64 | 1.00 | 0.32
Costa Rica | 0.59 | 1.00 | 0.29
Cluster 4
Côte d'Ivoire | 1.00 | 0.48 | 0.76
Madagascar | 1.00 | 0.52 | 0.74
Pakistan | 1.00 | 0.54 | 0.73
Last countries in cluster 4
Mozambique | 0.58 | 1.00 | 0.29
Zambia | 0.50 | 1.00 | 0.25
Benin | 0.43 | 1.00 | 0.21
Cluster 5
Malta | 1.00 | 0.73 | 0.63
Iceland | 1.00 | 0.75 | 0.62
Luxembourg | 1.00 | 0.78 | 0.61
Estonia | 1.00 | 0.84 | 0.58
Last countries in cluster 5
Australia | 0.84 | 1.00 | 0.42
Slovenia | 0.84 | 1.00 | 0.42
United Arab Emirates (the) | 0.71 | 1.00 | 0.35
3.3 Benchmarks and Anti-benchmarks
In cluster 1, we identified Colombia, Jamaica, Panama, and Morocco as benchmarks for most DMUs. This cluster is made up of economies that are, in general, developing and that do not have a high level of investment in innovation. However, it is understood that, within the paradigm of this cluster, these DMUs present practices that can be observed by the other members of the cluster. At the other pole, we identified Brunei Darussalam, Uzbekistan, and Rwanda as the most frequent anti-benchmarks, countries with little efficiency in terms of balancing inputs and outputs.
For cluster 2, we see that the most frequent benchmarks are Hong Kong, Germany, and the Netherlands. This cluster, as previously mentioned, is the one with the highest values of outputs; the countries belonging to this group have high values in terms of both outputs and inputs. Considering the anti-benchmarks, Canada and China are the most frequent countries, but it is worth noting that some countries, such as China and Hong Kong, are benchmarks for Canada, for example, while being anti-benchmarks for other countries such as the Netherlands.
In cluster 3, benchmarks and anti-benchmarks concentrate on developing countries that already have slightly higher values in terms of inputs and outputs. The most frequent benchmarks are Bulgaria, Armenia, and Iran (Islamic Republic). Considering the anti-benchmarks, Brazil, Costa Rica, and Greece are the most frequent. These countries even have relatively high inputs within
the cluster but do not have proportional outputs, which ends up compromising their efficiency.
Cluster 4 contains the countries with the lowest indexes in terms of inputs and outputs, mostly countries of the African continent that do not invest much in technological innovation. Among the benchmarks, Egypt stands out with a good balance between inputs and outputs; there are also benchmarks such as Zimbabwe, which, despite being a benchmark, has lower score levels compared to Egypt. Regarding anti-benchmarks, we highlight Mozambique, Guinea, and Benin.
Cluster 5 shows the countries of Eastern Europe, Oceania, and the Iberian Peninsula, in general consolidated economies with high levels of outputs and inputs. Estonia and the Czech Republic stand out as cluster 5 benchmarks. It should be noted that this cluster is the one with the highest number of efficient DMUs, showing a balanced level between inputs and outputs in comparative terms. Regarding anti-benchmarks, we highlight Belgium, Slovenia, and the United Arab Emirates.
4 Conclusion
The article presented a hybrid application combining K-means and DEA applied to GII data. We understand that, in methodological terms, the approach can be extended to other fields, providing an effective way to separate the DMUs before applying the DEA models. Considering the results, we understand that the objectives of the article were achieved, since clusters were formed and benchmarks and anti-benchmarks were identified for each of the five clusters found. The applied methodology and the results found have the potential to aid decision-making in terms of public policies and are in line with the 2020 theme of the GII ("Who will finance innovation?").
As limitations, we point out the use of index numbers for the input and output variables. Working with input and output data from the same year, rather than a panel analysis, can also be a limitation. As proposals for future work, working with panel data that considers different years for inputs and outputs can provide valuable insights, since investments in innovation take time to take effect. Another proposal would be to understand how much the use of index numbers affects the analysis of benchmarks.
Acknowledgements This research was partially supported by:
• Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES).
• Conselho Nacional de Desenvolvimento Científico e Tecnológico.
• Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro.
References 1. Richardson A, Audretsch DB, Aldridge T, Nadella VK (2016) Radical and incremental innovation and the role of university scientist. Springer, Cham, pp 131–207. https://doi.org/10.1007/ 978-3-319-26677-0_5 2. Rinne T, Steel GD, Fairweather J (2012) Hofstede and Shane Revisited. Cross-Cult Res 46(2):91–108. https://doi.org/10.1177/1069397111423898 3. Tziogkidis P, Philippas D, Leontitsis A, Sickles RC (2020) A data envelopment analysis and local partial least squares approach for identifying the optimal innovation policy direction. Eur J Oper Res 285(3):1011–1024. https://doi.org/10.1016/j.ejor.2020.02.02 4. Lapa MSS, Ximenes E (2020) Ensaio Sobre a Relação de Pernambuco com o Indicador Produtos Criativos Adotado no Índice Global de Inovação. Braz J Dev 6(11), 92639–92650. https://doi. org/10.34117/bjdv6n11-613 5. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. The Regents of the University of California. https://projecteuclid.org/euclid.bsmsp/120051 2992 6. Charnes A, Cooper WW, Rhodes E (1978) Measuring the efficiency of decision making units. Eur J Oper Res 2(6):429–444. https://doi.org/10.1016/0377-2217(78)90138-8 7. Guan J, Chen K (2012) Modeling the relative efficiency of national innovation systems. Res Policy 41(1):102–115. https://doi.org/10.1016/j.respol.2011.07.001 8. Matei MM, Aldea A (2012) Ranking national innovation systems according to their technical efficiency. Procedia Soc Behav Sci 62:968–974. https://doi.org/10.1016/j.sbspro.2012.09.165 9. Min S, Kim J, Sawng YW (2020) The effect of innovation network size and public R&D investment on regional innovation efficiency. Technol Forecast Soc Chang 155:119998. https:// doi.org/10.1016/j.techfore.2020.119998 10. Chen K, Guan J (2012) Measuring the efficiency of China’s regional innovation systems: application of network data envelopment analysis (DEA). Reg Stud 46(3):355–377. https:// doi.org/10.1080/00343404.2010.497479 11. Crespo NF, Crespo CF (2016) Global innovation index: moving beyond the absolute value of ranking with a fuzzy-set analysis. J Bus Res 69(11):5265–5271. https://doi.org/10.1016/j.jbu sres.2016.04.123 12. Pan TW, Hung SW, Lu WM (2010) Dea performance measurement of the national innovation system in Asia and Europe. Asia-Pac J Oper Res 27(3):369–392. https://doi.org/10.1142/S02 17595910002752 13. Salas-Velasco M (2019) Competitiveness and production efficiency across OECD countries. Compet Rev 29(2):160–180. https://doi.org/10.1108/CR-07-2017-0043 14. Cornell University, Insead, and Wipo (2020) The Global Innovation Index 2020: Who Will Finance Innovation? 15. Mingoti SA (2005) Análise de dados através de métodos de Estatística Multivariada: Uma abordagem aplicada 297 16. Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: An analysis and critique. Strateg Manag J 17(6):441–458. https://doi.org/10.1002/ (sici)1097-0266(199606)17:6%3c441::aid-smj819%3e3.0.co;2-g 17. Farrell MJ (1957) The measurement of productive efficiency. J Roy Stat Soc 120(3):253–281 18. Sueyoshi T, Goto M (2018) Environmental assessment on energy and sustainability by data envelopment analysis. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118979259 19. Cooper WW, Seiford LM, Tone K (2007) Data envelopment analysis: a comprehensive text with models, applications, references and DEA-solver software: Second edition. 
Springer US. https://doi.org/10.1007/978-0-387-45283-8 20. da Silveira JQ, Meza LA, de Mello JCCBS (2012) Use of DEA and inverted frontier for airlines benchmarking and anti-benchmarking identification. Producao 22(4):788–795. https://doi.org/10.1590/S0103-65132011005000004
21. Mello JCCBS, Gomes EG, Meza LA, Leta FR (2008) DEA advanced models for geometric evaluation of used lathes. WSEAS Trans Syst 7(5):510–520 22. Dyson RG, Allen R, Camanho AS, Podinovski VV, Sarrico CS, Shale EA (2001) Pitfalls and protocols in DEA. Eur J Oper Res 132(2):245–259. https://doi.org/10.1016/S0377-2217(00)00149-1
Line of Work on Visible and Near-Infrared Spectrum Imaging for Vegetation Index Calculation Shendry Rosero
Abstract This study proposes a basic line of work for calculating vegetation indices from image bands of different sources for the same scene. To this end, captures from a NIR (near-infrared) sensor and captures from an RGB sensor were processed as if they formed a single image. Although there are vegetation-index applications that could serve as a commercial alternative for this type of work, this proposal is intended to generate a basic image processing model for academic purposes, while also becoming a low-cost alternative for farmers whose plantations need solutions on a limited budget. Hence, two techniques were used: geometric transformations and processes based on enhanced correlation techniques. Keywords Image registration · K-means · NIR · Rectification · Multispectral
1 Introduction
Obtaining 3D information from aerial images depends on the quality and quantity of the (2D) input information that can be preprocessed; in practical cases, it is common to follow a flow composed of calibration, rectification, and image registration. Calibration consists of obtaining as much intrinsic and extrinsic information as possible from the camera, which makes it possible to eliminate distortions that are typical of the lens and its configuration, or defects of the particular lens used, such as radial and barrel distortions, to mention the most common ones. In the case of aerial photography, because of the constant movements of the vehicle caused by stabilization effects, it is not enough to correct the camera's distortions; since it is not always possible to obtain a homothetic photograph of the terrain, it is also necessary to rectify the images, which consists of transforming an image into a scale projection
of the terrain; the rectification then corrects the possible displacements existing in the aerial capture due to constant stabilization movements [1, 2]. The result is a rectified image that must contain the characteristics of an orthogonal projection of the photographed object on the study plane at a given scale. The conditions of this technique require that the initial captures do not exceed an angle of inclination of 3° in any direction of the captured plane and that the terrain be considered a flat terrain; this last condition can be obtained by varying the capture height of the images. The third element of the process is image registration, which consists of transforming the data set obtained from each photograph into a set of coordinates that allows the data obtained to be compared or integrated. The present study proposes to simplify this process, reducing the number of processes for image calibration and image registration, because the low altitude of the drones would eliminate the need for image rectification over terrains with certain deformation. The calibration and registration methods used are presented in Sect. 2, the comparative methods in Sect. 3, and the results obtained in Sect. 4, which includes a grouping process for the comparison of the registration techniques used. Under this context, the overall result will be a short academic treatise on image processing whose direct beneficiaries could be local farmers by obtaining a simple method of plant health assessment.
2 Literature Review
2.1 Calibration
Among the important factors to consider when correcting camera distortion are, in general, the radial and tangential distortions. The mathematical treatment of the distortion factors usually yields the parameters necessary to correct the distortion; for reasons of simplification and didactics, this procedure is beyond the scope of this research. In general, the calibration process used [3] is based on obtaining images or videos of a known pattern through the camera to be calibrated; each new pattern found represents a new equation, and the solution depends on the system of equations formed by the N captured patterns.
2.2 Image Registration and Vegetation Index Calculation
Image registration and alignment is a process that allows, under a common coordinate system, transforming image data to obtain different parameters to be evaluated or simply to improve the characteristics of the processed images. Registration methods can be classified into two major groups, equally effective depending on the quality of the images used and the purposes of the subsequent measurement; hence the
methods based on intensity measurement and feature-based methods are highlighted; the present study makes a brief analysis of the two.
Image registration based on improved correlation coefficients
The ECC (Enhanced Correlation Coefficient) was proposed in 2008 by Georgios D. Evangelidis and Emmanouil Z. Psarakis [4]. Its purpose was to use a new similarity measure, the enhanced correlation coefficient, to estimate image parameters through a motion model classified into four types: translation, Euclidean, affine, and homography. The models differ in how many parameters must be estimated for the image to be aligned with respect to the fixed image; this geometric variation makes it possible to distinguish a change in angles, in line parallelism, or a complete distortion of the images [4–6].
Image registration based on feature detection
Feature-based methods search for points of interest between each pair of images. The best-known techniques employ descriptors such as SIFT (Scale Invariant Feature Transform, Lowe, 1999) or SURF (Speeded-Up Robust Features, proposed by Bay et al. in 2008) to determine the points of interest between images. One of the advantages of descriptors is that they do not require user intervention; on the other hand, in the tests performed, it was determined that both SIFT and SURF have problems when comparing images that lack salient features, making it difficult to find points of interest to match. Alternatives are shown in "Image registration by local approximation methods" (Goshtasby, 1988) [7]; a similar proposal by Goshtasby was made years earlier [8].
Vegetation index calculation
One of the most common applications for photographs captured in the near-infrared spectrum is the calculation of vegetation indices, the best known being the Normalized Difference Vegetation Index, or NDVI (Rouse et al., 1974), which, among other indices, reflects the particular radiometric behavior of vegetation in relation to its photosynthetic activity and the structure of the plants themselves (leaf structure), allowing in this specific case the plant health of the element under examination to be determined. This is due to the amount of energy that plants absorb or reflect in different portions of the electromagnetic spectrum, especially the red and near-infrared bands. Thus, the spectral response of healthy vegetation contrasts between the visible spectrum and the near infrared (NIR), a behavior influenced, among other factors, by the amount of water absorbed by the plant: while in the visible spectrum the plant pigments absorb most of the energy received, in the NIR they reflect most of it; on the contrary, in diseased vegetation (for various reasons), the amount of reflection in the NIR spectrum is severely reduced. In general, the calculation of the indices is reduced to operations on the bands of the visible spectrum with respect to the NIR spectrum. Table 1 shows a brief example of the different calculations that can be obtained.
Table 1 Simplified vegetation indices obtained from the processing of visible-spectrum bands with near-infrared spectrum bands

Index | Calculation | Feature
Normalized difference vegetation index (NDVI) | NDVI = (NIR − RED) / (NIR + RED) | Scale from −1 to 1, with zero representing absence of plants or presence of other elements; non-zero values indicate different levels of plant health
Transformed vegetation index (TVI) | TVI = √(NDVI + 0.5) | The 0.5 is a correction factor that avoids negative results, while the square root tries to correct values that approximate a Poisson distribution
An example of how to interpret the values obtained depends on the fluctuation of the calculation made, for example for the NDVI case (−1 to 1), the studies determine that those negative values correspond mainly to cloud-like formations, water, and snow. Values close to zero correspond mainly to rocks and soils or simply areas devoid of vegetation. Values below 0.1 correspond to rocky areas, sand, or snow. Moderate values (0.2–0.3) represent shrub and grasslands. High values indicate temperate and tropical forests (0.6–0.8). Of course, this interpretation will be subject to the type of place where the capture is made, considering the different shades of the photographed elements.
3 Methodology
The proposal consists of the geometric calibration of the cameras, an optical calibration, and the registration of the images. For the calibration of the cameras used to generate the registered photos, two digital cameras were used: a Ricoh GR digital 4 and a MAPIR Survey 2 (with a near-infrared spectrum). The two cameras were subjected to a calibration process using the method proposed by Zhang [3], through a chessboard-like pattern. In the case of the MAPIR Survey 2, an additional optical calibration method proposed by the supplier was also used.
3.1 Calibration
Zhang's method uses a checkerboard-like pattern of squares; in this case, an asymmetric pattern of 10 × 7 black/white squares was used. The process begins with the capture of images through the camera to be calibrated; here, tests were made for the MAPIR and the Ricoh GR digital cameras. The photos were captured at distances of 30 and 60 cm in order to verify the best results both visually and mathematically.
Fig. 1 Result of image calibration using the checkerboard pattern (Zhang procedure). Image a shows an uncorrected photograph, while image b shows the rectified photograph (slight black rectification border)
That is, the aim was to obtain a distortion coefficient between 0 and 1, or at least values for this coefficient that are not far from 1. Twenty photographs were captured per camera, and the implementation is an adaptation of the algorithm proposed at http://docs.opencv.org/3.1.0/d4/d94/tutorial_camera_calibration.html, using Visual Studio with C++ and OpenCV. From the results obtained, the author believes that, given the way the captures were obtained (sensor types and brands), the calibration values did not affect the results (Fig. 1); however, this section is maintained for academic purposes related to the normal data flow in image processing.
Image correction
The adaptation of the algorithm starts by reading the general settings from an XML file, which holds the initial parameters such as the number of internal corners widthwise, the number of internal corners heightwise, and the size in millimeters of the checkerboard squares. In parallel, it is necessary to define another XML file containing the path to each of the images captured by the camera to be processed; this file, its name, and its location must also be defined in the "default.xml" file under the corresponding tags. With these mathematical results, we proceeded to rectify the Ricoh images. As a precaution, it is recommended to insert a function that calculates the size of the image and verifies that this size corresponds to the size of the calibration patterns used; otherwise, the effect may be counterproductive, and the resulting images may show greater distortion than that added by the camera lens. An example of the process is shown in Fig. 1. The same process was applied to the MAPIR Survey 2 camera.
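For readers who prefer Python, a minimal sketch of the same flow with OpenCV is shown below. This is not the authors' C++ implementation; the folder name, the square size, and the 9 × 6 interior-corner count implied by a 10 × 7 squares board are assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)        # interior corners of a 10 x 7 squares board (assumed)
square_mm = 24.0        # hypothetical square size in millimeters

# 3D coordinates of the corners in the board plane (Z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):            # hypothetical folder of ~20 shots
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

ret, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points,
                                         gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread("photo.jpg"), K, dist)  # rectified output
```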
MAPIR Survey calibration
The photographs coming from the MAPIR Survey 2 camera were subjected to the same calibration process as the Ricoh ones, plus an additional optical calibration process proposed by the manufacturer. Obtaining the distortion matrix of the MAPIR camera does not require a previous optical correction, but it is recommended in order to avoid problems in the detection of the control points used to calculate the distortion coefficients. For the cases in which the images had very short exposure times and did not allow control-point detection, an additional preprocessing step was performed, consisting of channel separation (R, B, G). Channel separation was necessary to obtain an image with a higher contrast difference, allowing the calibration algorithm to determine the control points. From the separation of channels, it was determined that the blue channel had the highest contrast and, therefore, the greatest chance of detecting the necessary control points; after this, a contrast improvement was made through its histogram, followed finally by the calibration process.
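A small OpenCV sketch of this preprocessing step (the file name and corner count are placeholders, not the authors' values):

```python
import cv2

nir_bgr = cv2.imread("mapir_capture.jpg")     # hypothetical MAPIR capture
blue, green, red = cv2.split(nir_bgr)         # OpenCV loads images in B, G, R order

# The blue channel showed the highest contrast in the tests described above, so it
# is enhanced (here via histogram equalization) before corner detection.
blue_eq = cv2.equalizeHist(blue)
found, corners = cv2.findChessboardCorners(blue_eq, (9, 6))
```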
3.2 Image Registration
Two forms of registration were performed, the first by means of geometric transformation techniques and the second through improved correlation coefficients [4].
Image registration through geometric transformations
The process consists of reading two previously calibrated images, an RGB image and a NIR (near-infrared) image. The NIR image from the MAPIR Survey 2 camera has three channels, red, green, and blue, of which the green and blue channels do not contain significant information; these two channels could therefore be eliminated to speed up the calculations, since it was observed that keeping them brings no major benefit to the results. The geometric transformations technique is included in the Matlab Computer Vision Toolbox and allows control points to be created manually, as shown in Fig. 2. The technique recommends at least four points, but satisfactory results were achieved with at least 11 geometrically distant control points. It is worth pointing out that what is sought is a coordinate system; therefore, the position accuracy is not dictated by intensity values or similar scenes but only by coordinates, which makes the technique robust for images with a diversity of objects as reference and control points, and interesting for GPS control points. Once the control points have been loaded, a transformation function based on geometric transformations [5, 6] must be invoked, and the size of the aligned image is set with respect to the reference image size. The result of this process is an aligned image, shown in Fig. 3.
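An equivalent sketch in Python/OpenCV is shown below; the paper itself uses the Matlab Computer Vision Toolbox, and the pixel coordinates here are placeholders standing in for manually picked control points.

```python
import cv2
import numpy as np

rgb = cv2.imread("rgb.jpg")
nir = cv2.imread("nir.jpg", cv2.IMREAD_GRAYSCALE)

# Manually selected control-point pairs (placeholder coordinates); the paper reports
# good results with at least 11 geometrically distant pairs.
pts_nir = np.float32([[120, 85], [840, 90], [860, 610], [130, 600], [480, 350]])
pts_rgb = np.float32([[134, 98], [852, 101], [871, 622], [142, 611], [494, 361]])

# Estimate a projective (homography) transform from the pairs and warp the NIR
# image onto the RGB geometry.
H, _ = cv2.findHomography(pts_nir, pts_rgb, method=0)
nir_aligned = cv2.warpPerspective(nir, H, (rgb.shape[1], rgb.shape[0]))

# Stack the aligned NIR band with the RGB bands: the four bands needed later
stacked = np.dstack([rgb, nir_aligned])
```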
Fig. 2 Control point selection process, the technique recommends at least four geometrically distant control points
Fig. 3 Image resulting from the alignment using geometric transformation techniques, the figure shows a fusion between the NIR image and the RGB image
After this process, the two images can be concatenated (a parallel concatenation) to obtain the four bands needed for the measurement of the various vegetation indices.
Image registration using improved correlation coefficients
The improved-correlation-coefficient image alignment algorithm is based on the proposal of Evangelidis and Psarakis [4], which consists of estimating the coefficients of a motion model. The advantage of this technique is that it does not need control points and, additionally, unlike other similarity measurement methods, it is invariant to photometric distortions in contrast/brightness levels.
Fig. 4 Image resulting from the alignment using ECC techniques, the figure shows the aligned image without channel separation
For the application of this technique, we used Python with OpenCV 3; part of the algorithm can be found at https://www.learnopencv.com/image-alignment-ecc-in-opencv-c-python/ (Mallick 2015). The result can be seen in Fig. 4.
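The core of that alignment is sketched below for the homography motion model; the file names are placeholders, and, depending on the OpenCV version, the trailing mask and Gaussian-filter arguments of findTransformECC may have to be omitted.

```python
import cv2
import numpy as np

rgb_gray = cv2.imread("rgb.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
nir_gray = cv2.imread("nir.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

warp_mode = cv2.MOTION_HOMOGRAPHY        # one of the four motion models
warp = np.eye(3, dtype=np.float32)       # use a 2x3 matrix for the other modes
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 500, 1e-6)

# Maximize the enhanced correlation coefficient between template and input image
cc, warp = cv2.findTransformECC(rgb_gray, nir_gray, warp, warp_mode, criteria,
                                None, 5)

aligned = cv2.warpPerspective(nir_gray, warp,
                              (rgb_gray.shape[1], rgb_gray.shape[0]),
                              flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```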
3.3 Vegetation Index Calculation
Once the images are registered, the next step is to calculate the various vegetation indices based on the band transformations, as shown in Table 1. The NDVI index was selected for experimentation purposes of the proposed method. The calculation of the vegetation index depends on the operations performed with the near-infrared spectrum and the red channel of the images to be analyzed, as shown in Eq. 1:

NDVI = (NIR − RED) / (NIR + RED)    (1)
The results of the application of geometric transformations for NIR image registration and NDVI calculation are shown in Figs. 5 and 6.
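A per-pixel NumPy version of Eq. 1 is shown below as a sketch; the small epsilon and the band variable names carried over from the registration sketches are assumptions.

```python
import numpy as np

def ndvi(nir_band, red_band):
    """Per-pixel NDVI from Eq. 1; the epsilon avoids division by zero."""
    nir = nir_band.astype(np.float32)
    red = red_band.astype(np.float32)
    return (nir - red) / (nir + red + 1e-6)

# Example with the registered bands from the previous step (names assumed):
# ndvi_map = ndvi(nir_aligned, rgb[:, :, 2])   # index 2 = red channel in BGR order
```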
4 Results
According to the images presented, the behavior of the vegetation index in the NDVI image coincides with the healthy appearance seen in the visual contrast of the RGB image. However, in order to estimate a grouping value, it is necessary to determine around which values the index fluctuations are grouped.
Fig. 5 Comparison of the index result, the left image shows the original RGB image, in which intense green areas can be seen referring to areas that could be considered with better health, and the right image shows the result of the index whose value range goes from −1 to 1, according to this, the clear areas (high range) correspond to areas of better plant health
Fig. 6 The right image shows the result of obtaining the vegetation index using ECC, slight changes can be observed with respect to Fig. 5
Therefore, a grouping based on K-means with three groups and five iterations was performed, whose result for the stadium image is shown in Figs. 7 and 8.
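A sketch of that grouping step is shown below; the NDVI map here is a random stand-in for the real one computed above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the per-pixel NDVI map computed in Sect. 3.3
ndvi_map = np.random.uniform(-0.2, 0.8, size=(240, 320)).astype(np.float32)

# Three groups, five iterations, matching the setting described above
km = KMeans(n_clusters=3, max_iter=5, n_init=1, random_state=0)
labels = km.fit_predict(ndvi_map.reshape(-1, 1)).reshape(ndvi_map.shape)
print(np.sort(km.cluster_centers_.ravel()))  # three NDVI levels (e.g. bare / low / high vigor)
```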
5 Discussion
Much of the current work on biomass calculation [9, 10], water body quality [11–13], and vegetation is based on the processing of hyperspectral images [14, 15] obtained from satellites. One of the many problems with this type of imagery is that the study area is not always covered or, even when the specific study region is available, the amount of information recovered from processing these images is minimal and corresponds to large areas of land. One could think of commissioning captures of specific areas, but this would increase the cost of the study, in addition to climate-related constraints. One of the low-cost alternatives available today is precision agriculture [1, 16].
Fig. 7 Grouping for stadium image values with ECC registration
Fig. 8 Clustering for stadium image values with registration through geometric transformations
In precision agriculture, the captured images come from unmanned vehicles that fly over the study terrain at low altitude, which allows obtaining a greater amount of information per pixel and better resolutions and, depending on the altitude, radiometric and reflectivity calibrations can be avoided. The processing of this type of image is reduced to the analysis of multispectral images [17–19] coming from one or more cameras with the appropriate filters. When two or more cameras are used, an image registration process must be included, given the difficulty of capturing images of the same area while controlling factors such as focal length and exposure times so that the captures fall within the same projection area. The present study proposes a simplified method for calculating vegetation indices that goes from camera calibration [3] up to image registration and whose results can be compared with more complex techniques. The most complex part relies on image registration, for which two techniques were evaluated: geometric transformations [7, 8] and those based on correlation coefficients [4]. From the tests carried out, it was shown
that, except for the execution times due to the higher computational cost of the ECC technique, the results of the two approaches do not differ significantly. Although we leave open the possibility of comparing the results of this study with commercial solutions in future work, we were able to establish an academic working model for image processing and vegetation index calculation on a solid theoretical basis and at low cost, which in the short term can be used by local farmers as an analysis tool.
References 1. Marcovecchio DG, Costa LF, Delrieux CA (2014) Ortomosaicos utilizando Imágenes Aéreas tomadas por Drones y su aplicación en la Agricultura de Precisión, pp 1–7 2. Igamberdiev RM, Grenzdoerffer G, Bill R, Schubert H, Bachmann M, Lennartz B (2011) International journal of applied earth observation and geoinformation determination of chlorophyll content of small water bodies (kettle holes) using hyperspectral airborne data. Int J Appl Earth Obs Geoinf 13(6):912–921 3. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22(11):1330–1334 4. Evangelidis GD, Psarakis EZ (2008) Parametric image alignment using enhanced correlation coefficient maximization. 30(10):1–8 5. Szeliski R (2006) Image alignment and stitching, pp 273–292 6. Baker S, Matthews I (2004) Lucas-Kanade 20 years on : a unifying framework. 56(3):221–255 7. Goshtasby A (1988) Image registration by local approximation methods. Image Vis Comput 6(4):255–261 8. Goshtasby A (1986) Piecewise linear mapping functions for image registration. Pattern Recogn 19(6):459–466 9. Garcia A (2009) Estimación de biomasa residual mediante imágenes de satélite y trabajo de campo. Modelización del potencial energético de los bosques turolenses, p 519 10. Peña P (2007) Estimación de biomasa en viñedos mediante imágenes satelitales y aéreas en Mendoza, Argentina, pp 51–58 11. Gao B (1996) NDWI a normalized difference water index for remote sensing of vegetation liquid water from space. 266(April):257–266 12. De E (2010) Evaluación de imágenes WorldView2 para el estudio de la calidad del agua, p 2009 13. Ledesma C (1980) Calidad del agua en el embalse Río Tercero ( Argentina ) utilizando sistemas de información geográfica y modelos lineales de regresión Controle da qualidade da água no reservatório de Rio Terceiro (Argentina ) usando sistemas de informação geográfica e m, no 12 14. Koponen S, Pulliainen J, Kallio K, Hallikainen M (2002) Lake water quality classification with airborne hyperspectral spectrometer and simulated MERIS data. 79:51–59 15. District ML, Thiemann S, Kaufmann H (2002) Lake water quality monitoring using hyperspectral airborne data—a semiempirical multisensor and multitemporal approach for the Mecklenburg Lake District, Germany. 81:228–237 16. García-cervigón D, José J (2015) Estudio de Índices de vegetación a partir de imágenes aéreas tomadas desde UAS/RPAS y aplicaciones de estos a la agricultura de precisión
17. Firmenich D, Brown M, Susstrunk S (2011), Multispectral interest points for RGB-NIR image registration, pp 4–7 18. Valencia UPDE (2010) Análisis de la clorofila a en el agua a partir de una imagen multiespectral Quickbird en la zona costera de Gandia 19. Lillo-saavedra MF, Gonzalo C (2008) Aplicación de la Metodología de Fusión de Imágenes Multidirección-Multiresolución (MDMR) a la Estimación de la Turbidez en Lagos. 19(5):137– 146
Modeling and Predicting Daily COVID-19 (SARS-CoV-2) Mortality in Portugal The Impact of the Daily Cases, Vaccination, and Daily Temperatures Alexandre Arriaga and Carlos J. Costa
Abstract The COVID-19 pandemic is one of the biggest health crises of the twenty-first century; it has completely affected society's daily life and has impacted populations worldwide, both economically and socially. The use of machine learning algorithms to study data from the COVID-19 pandemic has been quite frequent in the most varied articles published in recent times. In this paper, we analyze the impact of several variables (number of cases, temperature, people vaccinated, people fully vaccinated, number of vaccinations, and boosters) on the number of deaths caused by COVID-19 or SARS-CoV-2 in Portugal and find the most appropriate predictive model. Various algorithms were used, such as OLS, Ridge, LASSO, MLP, Gradient Boosting, and Random Forest. The method used for data processing was the Cross-Industry Standard Process for Data Mining (CRISP-DM). The data were obtained from an open-access database.
Keywords COVID-19 · Deaths · Cases · Vaccination · Temperature · Machine learning · Portugal · Python
1 Introduction An outbreak of a disease caused by a virus is considered a pandemic when it affects a wide geographic area and has a high level of infection which can lead to many deaths [1]. Throughout the history of humanity, there have been several pandemics, some with more mortality rates than others, such as the Spanish flu (1918), the Asian flu A. Arriaga (B) ISEG (Lisbon School of Economics and Management), Universidade de Lisboa, 1200-109 Lisbon, Portugal e-mail: [email protected] C. J. Costa Advance/ISEG (Lisbon School of Economics and Management), Universidade de Lisboa, 1200-109 Lisbon, Portugal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_23
275
276
A. Arriaga and C. J. Costa
(1957), the Hong Kong flu (1968), and the Swine flu (2009) [2]. The most impactful pandemic of this century is the COVID-19 pandemic. COVID-19 is a respiratory disease caused by the SARS-CoV-2 virus [1] that affects all age groups but has more serious consequences in older individuals and/or people who have pre-existing medical conditions [3]. The first recorded cases date back to December 31, 2019, in Wuhan City, China [4]. This disease spread quite fast all over the world; in Portugal, the first case was registered on March 2, 2020 [5]. Anyone who tests positive for this disease may be symptomatic or asymptomatic. Symptoms of COVID-19 may include fever, tiredness, cough, and, in more severe cases, shortness of breath and lung problems [2]. The study of the impact of vaccination, the number of registered cases, and temperatures on the number of deaths caused by the SARS-CoV-2 virus has been quite frequent in recent times. The objective of this paper is to find an appropriate model to estimate the number of daily deaths by SARS-CoV-2 and then find the algorithm with the best predictive power. For this purpose, several variables were used, for both vaccination and the number of cases. The main target is to use several machine learning algorithms to predict daily mortality.
2 Background
COVID-19 mortality data can be predicted by various methods, such as machine learning or statistical forecasting algorithms [1]. Besides machine learning algorithms, several studies have used ARIMA and SARIMA models, considering the seasonal behavior present in mortality [6]. In this paper, only machine learning algorithms were used in data modeling and prediction. According to [7], "The premise of machine learning is that a computer program can learn and adapt to new data without the need for human intervention". In machine learning, there is no single algorithm that predicts every type of data with the smallest error [8]; that is, for each type of data, some algorithms are more suitable than others for predicting future data. Choosing the best algorithm also depends on the problem we are facing and on the number of variables used in the model [8]. There are several types of machine learning algorithms, such as unsupervised, supervised, semi-supervised, and reinforcement learning. The supervised type performs a mapping between the dependent and independent variables in order to predict unknown future values of the dependent variable [9]. The semi-supervised type uses unlabeled data (which needs no human intervention) together with labeled data (which needs human intervention) to predict future data; these algorithms can be more efficient, as they need much less human intervention in building the models [10]. Reinforcement learning algorithms produce a series of actions considering the environment in which they are inserted in order to maximize "the future rewards it receives (or minimizes the punishments) over its lifetime" [11]. Last but not least, in unsupervised learning algorithms, the input data are entered "but obtains neither supervised target outputs nor rewards from its environment" [11]; an example of this type of algorithm is K-means.
Fig. 1 Machine learning algorithms (supervised type)
In this paper, only supervised learning algorithms were discussed, namely OLS, LASSO, Ridge, Gradient Boosting, MLP, and Random Forest (Fig. 1). OLS, or Linear Regression, is one of the simplest machine learning algorithms to understand. Linear Regression can be simple (when only one independent variable is used in the model) or multiple (when two or more variables are used to predict the dependent variable) [12]. The structural model of Linear Regression is

Y = β_0 + β_1 X_1 + · · · + β_m X_m + ε    (1)

where Y represents the dependent variable and X_1, ..., X_m are the independent variables. The β parameters are the coefficients estimated by the regression model, and ε is the error associated with the estimated model. Ridge Regression is an algorithm that is used when there are multicollinearity problems between the predictor variables of the model [12]. Multicollinearity is a condition that exists when one or more independent variables of the model can predict another independent variable relatively well. The Least Absolute Shrinkage and Selection Operator (LASSO) is an algorithm that improves the accuracy of the model through variable selection and regularization. This process is called variable shrinkage, and its objective is to reduce the number of predictive variables present in the model [12]. Gradient Boosting (GB) can be used for both classification and regression purposes. It is an ensemble algorithm that started out being used in the optimization of a cost function and has been applied in various areas, such as the detection of energy theft [13]. This method has not been used much in studies concerning the COVID-19 pandemic [14]. GB is an algorithm that, through several iterations, combines a series of models with a learning rate to minimize prediction errors; in each of the models resulting from the iterations, the weakest predictors are discarded and the strongest ones are chosen [14]. The GB additive model can be represented as follows:

F_m(x) = F_{m−1}(x) + ρ_m h_m(x)    (2)
where F_{m−1} is the previous model and h_m is the learning rate, which is used to minimize prediction errors [13]. ρ_m is a multiplier that can be represented as follows:

ρ_m = argmin_ρ Σ_{i=1}^{n} L(y_i, F_{m−1}(x_i) + ρ h_m(x_i))    (3)
where y_i is the target class label [13]. These machine learning algorithms have already been used in several articles on the topic discussed in this paper, such as [13, 14]. Multilayer Perceptron (MLP) is a machine learning method that uses artificial neural networks. As shown in [15], "The experience of the network is stored by the synaptic weights between neurons and its performance is evaluated, for example, by the ability to generalize behaviors, recognize patterns, fix errors or execute predictions". This algorithm associates several neurons, forming neural networks that perform various functions to improve the prediction [15]. MLP can use supervised or unsupervised learning; in this paper, we only focus on supervised learning. Random Forest (RF) is another ensemble algorithm, like Gradient Boosting, that uses decision trees in the background. The decision trees are created based on random samples of the training data [16]. The difference between RF and GB is that RF does not use a learning rate; it uses the average of the results of all generated trees [17]. The choice of these six algorithms was made to understand how linear algorithms, ensemble algorithms, and neural network algorithms behave in predicting the referenced data, measuring the accuracy of each of them through the measures described further down in the paper.
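For concreteness, the six supervised regressors can be set up in scikit-learn roughly as follows. This is a sketch only; the hyperparameter values shown are placeholders, since the paper tunes them later by cross-validation.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor

# Placeholder hyperparameters; the actual values are chosen by cross-validation later
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=0.01),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=0),
}
```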
3 Method
To predict COVID-19 mortality data, we used the number of daily cases and all the vaccination data present in the "Our World in Data" database [18]. The temperature data were obtained from the database of the National Centers for Environmental Information and refer to the mean temperature registered at the LISBOA GEOFISICA station [19]. The following figures show the graphs of all variables from March 2, 2020 to February 28, 2022 (Figs. 2, 3, 4 and 5). The process described below was designed using the CRISP-DM method [20]. We used vaccination data to predict mortality given the impact that vaccination has had, since its inception, on the number of both deaths and cases of COVID-19 infection [21]. We also used the daily number of new cases to study the impact that this variable had on the number of deaths before and after the start of the vaccination process and, lastly, we also used temperature because of the seasonal pattern present in the data. For the daily new cases, with the help of Python [22], we created two dummy variables, the first named before_vaccination and the second named after_vaccination.
Fig. 2 Daily number of deaths from SARS-CoV-2 in Portugal
Fig. 3 Total people vaccinated and fully vaccinated in Portugal
Fig. 4 Total people with vaccine boosters and daily number of vaccinations in Portugal
Fig. 5 Daily number of deaths from SARS-CoV-2 and the daily average temperature in Portugal
The before_vaccination variable is equal to 1 if the day in question is before the vaccination process started and 0 otherwise, while the after_vaccination variable follows the opposite rule. Next, we created two new variables: new_cases_before_vaccination (new_cases × before_vaccination) and new_cases_after_vaccination (new_cases × after_vaccination). The next step was to decide how to use the vaccination data. As we know from previous studies, the impact of vaccination is not immediate [23], so we decided to create lags for all vaccination variables, the first set with a lag of one month and the second with a lag of two weeks. For the temperature data, the daily average values were used. As there were missing data in the database, we decided to fill them in two ways: the initial missing values were replaced with zero, and the remaining missing values were filled through Python's interpolate method [22]. Then, as there were sometimes no data on weekends, we removed all weekends from the database. Finally, so that all the variables are on the same scale and to measure which variables have the most impact on the model, the data were standardized through the StandardScaler function of the scikit-learn module [24].
After all the data were prepared, we started to build the model that would be used in the regressions. The first step was to insert all the variables present in the database into a linear regression model (OLS) through the statsmodels module [25]; then the Variance Inflation Factor (VIF) of the model was tested. If there were one or more variables with a VIF greater than 5, the variable with the lowest correlation with the dependent variable was removed. Finally, the p-values of the t-test statistics [26] were observed; the non-significant variables could have been removed, but as the adjusted R2 decreased, it was decided to keep them, leaving the following variables: new_cases_before_vaccination, temperature, people_vaccinated, new_vaccinations, new_vaccinations_lag1M, new_cases_after_vaccination, and boosters_lag_1M.
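A condensed sketch of this preparation in pandas/scikit-learn is shown below; the file and column names, as well as the exact vaccination start date, are assumptions and not the authors' code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed daily dataset with columns: new_deaths, new_cases, temperature,
# people_vaccinated, new_vaccinations, boosters (names are placeholders).
df = pd.read_csv("covid_portugal.csv", parse_dates=["date"], index_col="date")

vaccination_start = pd.Timestamp("2020-12-27")   # assumed start of vaccination
df["before_vaccination"] = (df.index < vaccination_start).astype(int)
df["after_vaccination"] = 1 - df["before_vaccination"]
df["new_cases_before_vaccination"] = df["new_cases"] * df["before_vaccination"]
df["new_cases_after_vaccination"] = df["new_cases"] * df["after_vaccination"]

# One-month and two-week lags of the vaccination variables (daily data)
df["new_vaccinations_lag1M"] = df["new_vaccinations"].shift(30)
df["new_vaccinations_lag2W"] = df["new_vaccinations"].shift(14)
df["boosters_lag_1M"] = df["boosters"].shift(30)

df = df.interpolate()              # fill internal gaps
df = df.fillna(0)                  # leading gaps (e.g. pre-vaccination) set to zero
df = df[df.index.dayofweek < 5]    # drop weekends

features = ["new_cases_before_vaccination", "temperature", "people_vaccinated",
            "new_vaccinations", "new_vaccinations_lag1M",
            "new_cases_after_vaccination", "boosters_lag_1M"]
X = StandardScaler().fit_transform(df[features])
y = StandardScaler().fit_transform(df[["new_deaths"]]).ravel()
```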
The next step was to divide the data into training and test samples and to parameterize the algorithms. The parameterization of Ridge, LASSO, Gradient Boosting, MLP, and Random Forest was done by testing randomly generated parameter values over many iterations, through cross-validation methods [24], to find the optimal parameters. After all predictions had been made, a Durbin–Watson test [27] was performed on the residuals of each algorithm's predictions to test for autocorrelation in the residuals. We also calculated the average of the residuals to check whether it was close to 0 [28]. Finally, a comparison of measures such as Mean Absolute Error (MAE), Mean Square Error (MSE), Median Absolute Error (MdAE), Explained Variance Score (EVS), and the predicted R2 was made for all algorithms [24].
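A compact sketch of this evaluation step is given below, continuing the previous snippet (it assumes the X and y produced there). The search space, test split, and number of iterations are illustrative assumptions, and only Gradient Boosting is shown; the same pattern applies to the other algorithms.

```python
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, explained_variance_score, r2_score)
from statsmodels.stats.stattools import durbin_watson

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

# Randomized hyperparameter search with cross-validation (illustrative ranges).
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={"n_estimators": range(100, 1000, 100),
                         "learning_rate": np.linspace(0.01, 0.3, 10),
                         "max_depth": range(2, 6)},
    n_iter=50, cv=5, random_state=0)
search.fit(X_train, y_train)

pred = search.predict(X_test)
residuals = y_test - pred
print("MAE :", mean_absolute_error(y_test, pred))
print("MSE :", mean_squared_error(y_test, pred))
print("MdAE:", median_absolute_error(y_test, pred))
print("EVS :", explained_variance_score(y_test, pred))
print("R2  :", r2_score(y_test, pred))
print("Durbin-Watson:", durbin_watson(residuals))
print("Mean residual:", residuals.mean())
```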
4 Results As mentioned above, the first step was the estimation of the linear regression model (OLS), as shown in Fig. 6. From the output in Fig. 6, the daily number of COVID-19 cases has a positive impact on mortality, both before and after the vaccination process, with a greater impact before vaccination (higher coefficient in the model). The vaccination variables all have a negative impact on mortality, except for the one-month lag of the daily number of administered vaccines, a result that goes against what would be expected.
Fig. 6 OLS Regression Model
Table 1 Algorithm quality measures

Model               MAE     MSE     MdAE    EVS     R2
OLS                 0.386   0.388   0.288   0.504   0.503
Ridge               0.386   0.387   0.289   0.504   0.503
LASSO               0.385   0.387   0.287   0.505   0.504
Gradient boosting   0.136   0.055   0.070   0.930   0.930
MLP                 0.190   0.133   0.087   0.829   0.830
Random forest       0.150   0.075   0.075   0.904   0.904
Finally, it can be inferred that temperature has a negative impact on the number of deaths, which is in line with the data that we can observe in the graphs of both variables. For the estimation of the remaining models, we used a hyperparametrization of the models. Moving on to the identification of the model with the best predictive power, we can observe Table 1 with the information referring to each model. By observing Table 1, we can infer that Gradient Boosting was the best predictive algorithm, obtaining the best scores in all measures. Random Forest and MLP also obtained good results, with RF being superior to MLP in all score measures. This indicates that these three algorithms may be candidates for making a good future prediction of daily COVID-19 mortality data. In Fig. 7, we can observe the importance of each of the predictors, given by the Gradient Boosting algorithm. The people vaccinated and the temperature are the most important variables for predicting COVID-19 deaths, contrary to what happened in the OLS, in which the number of daily cases was the variable with the highest coefficient. Strangely, the GB assigns less weight to the number of cases before vaccination compared to the number of cases after vaccination. It is also noteworthy that the variables that were not significant in the OLS are the two with the least importance in the GB. Bearing in mind that this was the algorithm with the greatest predictive power, and considering the coefficients given by the OLS, we can say that the average temperature and the people vaccinated played a leading role in reducing deaths from SARS-CoV-2. Finally, we can observe in Table 2 the results of the Durbin–Watson test and the average of the residuals, used to test their quality. The values in Table 2 show that the residuals are not correlated (test statistic within 2 ± 0.5) and that their average is close to 0 for all algorithms, with the worst results being in the first three [27]. We can say that all models adequately capture the information present in the data [28, 29].
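The predictor importances reported in Fig. 7 can be read directly from the fitted gradient boosting model. A short sketch, reusing the search object and X_train from the earlier snippet, could look like this.

```python
import pandas as pd

# Relative importance of each predictor in the tuned gradient boosting model (cf. Fig. 7).
best_gb = search.best_estimator_
importances = pd.Series(best_gb.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))
```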
Fig. 7 Gradient boosting—predictors importance
Table 2 Durbin–Watson test results and mean value of residuals

Model               Durbin–Watson test   Mean of residuals
OLS                 2.379                −0.030
Ridge               2.379                −0.030
LASSO               2.379                −0.030
Gradient Boosting   2.134                0.017
MLP                 2.260                0.001
Random Forest       1.861                0.016
5 Conclusions The objective of this paper was to infer the impact of vaccination, temperature, and the number of cases on SARS-CoV-2 mortality in Portugal. Various vaccination data and their lags were used, as well as a "division" of the number of daily cases registered before and after vaccination, and the daily average temperature. The initial model was built using the OLS method and then replicated in other algorithms. There was a positive correlation between the dependent variable and the number of cases, as expected, but the difference in the coefficient before and after vaccination was very clear, while almost all the vaccination data present in the model had a negative coefficient, as expected, except the daily number of vaccinations lagged by a month. The results of Gradient Boosting, MLP, and Random Forest were satisfactory, while in OLS, Ridge, and LASSO, the model fit values were below expectations, which may mean that the relation between the predictors and the dependent variable is not linear. The objectives of the paper were achieved, as the algorithm with the greatest predictive power was identified, which consists of
an ensemble algorithm, Gradient Boosting, and it was shown that vaccination is a good preventive measure against deaths from SARS-CoV-2 and that temperature has a negative impact on the number of deaths. Acknowledgements We gratefully acknowledge financial support from FCT—Fundação para a Ciência e a Tecnologia (Portugal), national funding through research grant UIDB/04521/2020.
References 1. Almalki A, Gokaraju B, Acquaah Y, Turlapaty A (2022) Regression analysis for COVID-19 infections and deaths based on food access and health issues. Healthcare 10(2):324. https:// doi.org/10.3390/healthcare10020324 2. Rustagi V, Bajaj M, Tanvi, Singh P, Aggarwal R, AlAjmi MF, Hussain A, Hassan MdI, Singh A, Singh IK (2022) Analyzing the effect of vaccination over COVID cases and deaths in Asian countries using machine learning models. Front Cell Infect Microbiol 11. https://doi.org/10. 3389/fcimb.2021.806265 3. Sarirete A (2021) A bibliometric analysis of COVID-19 vaccines and sentiment analysis. Proc Comput Sci 194:280–287. https://doi.org/10.1016/j.procs.2021.10.083 4. Sohrabi C, Alsafi Z, O’Neill N, Khan M, Kerwan A, Al-Jabir A, Iosifidis C, Agha R (2020) World Health Organization declares global emergency: a review of the 2019 novel coronavirus (COVID-19). Int J Surg 76:71–76. https://doi.org/10.1016/j.ijsu.2020.02.034 5. Milhinhos A, Costa PM (2020) On the progression of COVID-19 in Portugal: a comparative analysis of active cases using non-linear regression. Front Public Health 8. https://doi.org/10. 3389/fpubh.2020.00495 6. Perone G (2022) Using the SARIMA model to forecast the fourth global wave of cumulative deaths from COVID-19: evidence from 12 hard-hit big countries. Econometrics 10:18. https:// doi.org/10.3390/econometrics10020018 7. Aparicio JT, Romao M, Costa CJ (2022) Predicting bitcoin prices: the effect of interest rate, search on the internet, and energy prices. 17th Iberian conference on information systems and technologies (CISTI), Madrid, Spain, pp. 1–5. https://doi.org/10.23919/CISTI54924.2022.982 0085 8. Aparicio JT, Salema de Sequeira, JT and Costa CJ (2021) Emotion analysis of Portuguese Political Parties Communication over the covid-19 Pandemic, 16th Iberian conference on information systems and technologies (CISTI), Chaves, Portugal, pp. 1–6. https://doi.org/10.23919/ CISTI52073.2021.9476557 9. Cord M, Cunningham P (2008) Machine learning techniques for multimedia: case studies on organization and retrieval. Springer Science & Business Media 10. Zhu X (Jerry) (2005) Semi-supervised learning literature survey. University of WisconsinMadison, Department of Computer Sciences 11. Mendelson S, Smola AJ (eds) (2003) Advanced lectures on machine learning: machine learning summer school 2002, Canberra, Australia, February 11–22, 2002: revised lectures. Springer, Berlin, New York 12. Saleh H, Layous J (2022) Machine learning—regression Thesis for: 4th year seminar higher institute for applied sciences and technology 13. Gumaei A, Al-Rakhami M, Mahmoud Al Rahhal M, Raddah H, Albogamy F, Al Maghayreh E, AlSalman H (2020) Prediction of COVID-19 confirmed cases using gradient boosting regression method. Computers, Materials & Continua, 66(1):315–329. https://doi.org/10.32604/cmc. 2020.012045
14. Shrivastav LK, Jha SK (2021) A gradient boosting machine learning approach in modeling the impact of temperature and humidity on the transmission rate of COVID-19 in India. Appl Intell 51, 2727–2739 (2021). https://doi.org/10.1007/s10489-020-01997-6 15. Borghi PH, Zakordonets O, Teixeira JP (2021) A COVID-19 time series forecasting model based on MLP ANN. Proc Comput Sci 181:940–947. https://doi.org/10.1016/j.procs.2021. 01.250 16. Gupta KV, Gupta A, Kumar D, Sardana A (2021) Prediction of COVID-19 confirmed, death, and cured cases in India using random forest model in big data mining and analytics, 4(2):116–123. https://doi.org/10.26599/BDMA.2020.9020016.4 17. Ye¸silkanat CM (2020) Spatio-temporal estimation of the daily cases of COVID-19 in worldwide using random forest machine learning algorithm. Chaos Solitons Fractals 140:110210. https:// doi.org/10.1016/j.chaos.2020.110210 18. COVID-19 Data Explorer. https://ourworldindata.org/coronavirus-data-explorer. Accessed 2022/07/05 19. Menne MJ, Durre I, Korzeniewski B, McNeill S, Thomas K, Yin X, Anthony S, Ray R, Vose RS, Gleason BE, Houston TG (2012) Global historical climatology network—daily (GHCN-Daily), Version 3. https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/ gov.noaa.ncdc:C00861/html 20. Costa C, Aparício JT (2020) POST-DS: a methodology to boost data science, 15th Iberian conference on information systems and technologies (CISTI), Seville, Spain, pp. 1–6. https:// doi.org/10.23919/CISTI49556.2020.9140932 21. Haas EJ, McLaughlin JM, Khan F, Angulo FJ, Anis E, Lipsitch M, Singer SR, Mircus G, Brooks N, Smaja M, Pan K, Southern J, Swerdlow DL, Jodar L, Levy Y, Alroy-Preis S (2022) Infections, hospitalisations, and deaths averted via a nationwide vaccination campaign using the Pfizer–BioNTech BNT162b2 mRNA COVID-19 vaccine in Israel: a retrospective surveillance study. Lancet Infect Dis 22:357–366. https://doi.org/10.1016/S1473-3099(21)00566-1 22. Albon C (2018) Machine learning with Python cookbook: practical solutions from preprocessing to deep learning. O’Reilly Media, Inc 23. Dyer O (2021) Covid-19: Moderna and Pfizer vaccines prevent infections as well as symptoms, CDC study finds. BMJ n888. https://doi.org/10.1136/bmj.n888 24. Avila J, Hauck T (2017) Scikit-learn cookbook: over 80 recipes for machine learning in Python with scikit-learn. Packt Publishing Ltd 25. Seabold S, Perktold J (2010) Statsmodels: econometric and statistical modeling with python. Proceedings of the 9th Python in science conference (SciPy 2010) Austin, Texas. https://doi. org/10.25080/Majora-92bf1922-011 26. Kim TK (2015) T test as a parametric statistic. Korean J Anesthesiol 68:540–546. https://doi. org/10.4097/kjae.2015.68.6.540 27. Mckinney W, Perktold J, Seabold S (2011) Time series analysis in Python with statsmodels Proceedings of the 10th Python in science conference (SciPy 2011). https://doi.org/10.25080/ Majora-ebaa42b7-012 28. Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice. 2nd edition, OTexts: Melbourne, Australia 29. Akossou A, Palm R (2013) Impact of data structure on the estimators R-square and adjusted R-square in linear regression. Int J Math Comput 20:84–93
Software Engineering
Digital Policies and Innovation: Contributions to Redefining Online Learning of Health Professionals Andreia de Bem Machado, Maria José Sousa, and Gertrudes Aparecida Dandolini
Abstract Social, economic, and cultural transformations are interconnected with each other by the need for changes in the educational scenario regarding the teaching–learning process. With this in mind, this research analyzes, in the light of a bibliometric review, which policies allied to digital innovation can contribute to redefining the online learning of health professionals. It also presents the results of a bibliometric review that was conducted in the Web of Science (WoS) database. The results point to a pedagogy that provides interaction between teachers and students with the effective use of technology in order to promote knowledge for the formation of future professionals' competencies. The purpose of this study is to see if technologies and technological practices can help health professionals learn more effectively online. Furthermore, it was found that digital education technologies and instructional methodologies are critical tools for facilitating fair and inclusive access to education, removing barriers to learning, and broadening teachers' perspectives in order to improve the learning process of health students. Keywords Digital innovation · Online learning · Health professionals
A. de B. Machado (B) · G. A. Dandolini Engineering and Knowledge Management Department, Federal University of Santa Catarina, Santa Catarina, Brazil e-mail: [email protected] G. A. Dandolini e-mail: [email protected] M. J. Sousa University Institute of Lisbon (ISCTE), Lisbon, Portugal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_24
1 Introduction The socioeconomic, political, cultural, scientific, and technological changes that have occurred throughout the twenty-first century, such as the globalization of the
economy and information, have driven the digital revolution [1]. As a result, there has been a substantial increase in the usage of ICT in society, resulting in major and quick changes in the way humans relate and communicate. In this context, most students use the internet as a primary learning need [2], so these data have policy implications for education during the digital transformation that is occurring in the 4.0 industrial revolution in this millennium [6]. In this technological scenario of changes in the way of receiving information and communicating, there have been changes in the educational space, especially regarding the teaching and learning process. These modifications were intended to provide students with critical thinking, focused on a collaborative construction of knowledge, in order to make it meaningful [8]. Proper policies allied with digital innovation [9] can contribute to redefining the online learning of health professionals through active learning methods. The use of these methods and the early insertion of students in the daily life of services favor meaningful learning, and the construction of knowledge, in addition to developing skills and attitudes, with autonomy and responsibility. Changing the learning process from face-to-face to distance learning is a decision that must be made by educational institutions so that the educational objectives can be implemented effectively and efficiently. The usage of internet networks in the learning process is known as online learning, and it provides students with the flexibility to learn whenever and wherever they want [7]. Learning can be understood as a path to transformation of the person and the reality, through which the student and the teacher become subjects of the teaching– learning process, transforming their pedagogical and professional practices, and building freedom with responsibility. And currently, with the changes in educational methods and tools, it becomes possible for those subjects to reflect critically on their practice and their learning mediated by digital innovation. Thus, this research aims to analyze, in the light of a bibliometric review, which policies allied to digital innovation can contribute to redefining the online learning of health professionals. The study was developed by conducting a bibliometric search in the Web of Science database. In addition to this introductory section, the next section describes the methodology, the results, and the analysis of the resulting bibliometric scenario of scientific publications. The third section brings the final considerations.
2 Proposed Methodology In order to measure, analyze, and increase knowledge about and confidence in the subject of how policies combined with digital innovation can contribute to redefining the online learning of health professionals, as present in scientific literature publications, a bibliometric analysis was performed, starting with a search in the Web of Science, a database currently maintained by Clarivate Analytics. The study was developed using a strategy consisting of three phases: execution plan, data collection, and bibliometrics. The Bibliometrix program was used to evaluate the bibliometric data because it is the most compatible with the Web of Science database. Biblioshiny,
an R Bibliometrix package, has the most comprehensive and appropriate collection of techniques among the tools investigated for bibliometric analysis [5]. These data allowed the organization of relevant information in a bibliometric analysis, such as temporal distribution; main authors, institutions, and countries; type of publication in the area; main keywords; and the most referenced papers. Scientific mapping allows one to investigate and draw a global picture of scientific knowledge from a statistical perspective. It mainly uses the three knowledge structures to present the structural and dynamic aspects of scientific research.
2.1 Methodological Approach This study is characterized as exploratory-descriptive since it aims to describe the subject and increase the researchers' familiarity with it. A systematic search in an online database was employed for the literature search, which was followed by a bibliometric analysis of the results. Bibliometrics is a method used in the information sciences to map documents from bibliographic records contained in databases using mathematical and statistical methodologies [3]. Bibliometry allows for relevant findings such as the number of publications by region; the temporality of publications; the organization of research by area of knowledge; the count of citations of the studies found in the researched documents; and the identification of the impact factor of a scientific publication, among others, which contribute to the systematization of research results and the minimization of biases when analyzing data. The study was divided into three parts for the bibliometric analysis: planning, collecting, and findings. These steps all came together to answer the study's guiding question, which was: How can digital policies and innovation contribute to redefining the online learning of healthcare professionals? Planning began in November and ended in December 2021, when the research was carried out. During planning, some criteria were defined, such as the limitation of the search to electronic databases, not contemplating physical catalogs in libraries, due to the number of documents considered sufficient in the web search bases. In the planning scope, the Web of Science database was stipulated as the most suitable for the domain of this research due to the relevance of that database in the academic environment and its interdisciplinary character, which is the focus of research in this area, and also because it is one of the largest databases of abstracts and bibliographic references of peer-reviewed scientific literature and it undergoes constant updating. Considering the research problem, the search terms were defined in the planning phase, namely: "policy" and "online learning" or "online education" and "digital innovation" and "health professionals". The use of the Boolean operator OR aimed to include the largest possible number of studies that address the subject of interest of this research. The truncator "*" was used to enhance the results by searching for "policies coupled with digital innovation can help redefine online learning for health professionals" and its writing variations presented in the literature.
Table 1 Bibliometric data: main information about the data collected

Timespan: 1998–2021
Sources (journals, books, etc.): 608
Documents: 897
Document types
  Article: 566
  Article; data paper: 4
  Article; early access: 30
  Article; paper from proceedings: 10
  Editorial material: 5
  Paper from proceedings: 261
  Review: 21
  Review; early access: 1
Document contents
  Keywords Plus® (ID): 986
  Author's Keywords (DE): 2635
Authors
  Authors: 3739
  Author appearances: 4431
  Authors of single-authored documents: 87
  Authors of multi-authored documents: 3652
Collaboration between authors
  Single-authored documents: 87
  Documents per author: 0.241
  Authors per document: 4.15
It is considered that the variations of the expressions used in the search are presented in a larger context, within the same purpose, because a concept depends on the context to which it is related. Finally, when planning the search, it was decided to use the terms defined in the title, abstract, and keyword fields, without restricting the time, language, or any other criteria that might limit the results. The data collection retrieved a total of 897 indexed papers, from 1998, the year of the first publication, until 2021. The collection revealed that these papers were written by 3,739 authors and linked to 427 institutions from 103 different countries. A total of 986 keywords were used. Table 1 shows the results of this data collection in a general bibliometric analysis. The eligible articles in the Web of Science database were published between 1998 and 2021. Of the 897 papers, there is a varied list of authors, institutions, and countries that stand out in the research on policies allied to digital innovation that can contribute to redefining the online learning of health professionals.
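The descriptive counts above were obtained with Biblioshiny; for readers who prefer a scripted route, a rough Python equivalent over an exported record set could look like the sketch below. The file name and field tags (PY for publication year, DE for author keywords) are assumptions about a standard tab-delimited Web of Science export, not part of the authors' workflow.

```python
import pandas as pd
from collections import Counter

# Assumed: a tab-delimited Web of Science export with standard field tags.
records = pd.read_csv("wos_export.txt", sep="\t", index_col=False)

# Publications per year (timespan and temporal distribution).
per_year = records["PY"].value_counts().sort_index()

# Author-keyword frequencies (keywords are ';'-separated in the DE field).
keyword_counts = Counter(
    kw.strip().lower()
    for cell in records["DE"].dropna()
    for kw in cell.split(";")
    if kw.strip())

print("Documents:", len(records))
print(per_year.tail())
print(keyword_counts.most_common(10))
```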
When analyzing the country that has the most publications in the area, one can see that the USA stands out with 43% of total publications, a total of 4,026 papers. In second place is China with 24% of the publications, as shown in Fig. 1, which shows the 20 most cited countries. Another analysis performed is related to the identification of authors. The most relevant authors on digital policies and innovation, online learning, and health professionals are Derong Liu, with 14 publications, and Yu Zhang, with 12 published documents, as shown in Fig. 2
Fig. 1 Proposed research methodology
Fig. 2 Most relevant authors
Fig. 3 Most globally cited documents
The twenty most globally cited documents are shown in Fig. 3. The paper that gained the most prominence, with 774 citations, is Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems, by Derong Liu and Qinglai Wei, published in 2017, as shown in Table 2. From the general survey data, it was also possible to analyze the hierarchy of the sub-branches of this research on how policies allied to digital innovation can contribute to redefining the online learning of health professionals. The set of rectangles represented in the TreeMap shown in Fig. 4 shows the hierarchy of the subbranches of the research in a proportional way. It can be seen that themes such as education, model, performance, online, and technologies appear with some relevance and are related to policy, digital innovation, and online learning of health professionals. Also, from the bibliometric analysis, 986 keywords chosen by the authors were retrieved. Thus, the tag cloud shown in Fig. 5 was elaborated based on those retrieved words. The highlight was “education” with a total of 42 occurrences and, in second place, “model”. When looking at which country has the most publications in the region, the United States comes out on top with 43% of all publications (4,026). China is in second place, accounting for 24% of all publications. Derong Liu, who has 14 publications, is the most relevant author on the theme of policies and digital innovation, online learning, and health professionals, followed by Yu Zhang, who has 12 publications. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems by Derong Liu and Qinglai Wei, published in 2017, was the work that stood out with 774 citations.
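A tag cloud like the one in Fig. 5 can be rebuilt from keyword frequencies such as those computed in the previous sketch; the rendering below, using the optional wordcloud and matplotlib packages, is purely an illustration of the technique.

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# 'keyword_counts' comes from the previous sketch (a word -> frequency mapping).
cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(keyword_counts)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```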
Table 2 Articles and total citations

Paper                                                                          Total citations
Liu D, 2014, IEEE Trans Neural Netw Learn Syst-A                               322
Xiang Y, 2015, 2015 IEEE International Conference on Computer Vision (ICCV)    248
Shea P, 2009, Comput Educ                                                      237
Jaksch T, 2010, J Mach Learn Res                                               193
Liu D, 2014, IEEE Trans Neural Netw Learn Syst                                 192
Modares H, 2014, IEEE Trans Autom Control                                      185
Xu J, 2017, IEEE Trans Cogn Commun Netw                                        182
Gai Y, 2012, IEEE-ACM Trans Netw                                               150
Wang S, 2018, IEEE Trans Cogn Commun Netw                                      148
Xu X, 2007, IEEE Trans Neural Netw                                             144
Aristovnik A, 2020, Sustainability                                             142
Zhang W, 2020, J Risk Financ Manag                                             125
Endo G, 2008, Int J Robot Res                                                  122
Jiang Y, 2015, IEEE Trans Autom Control                                        116
Kelly M, 2009, Nurse Educ Today                                                113
Ivankova NV, 2007, Res High Educ                                               101
Geng T, 2006, Int J Robot Res                                                  98
Dinh Thai Hoang DTH, 2014, IEEE J Sel Areas Commun                             95
Jiang Y, 2012, IEEE Trans Circuits Syst II-Express Briefs                      94
Sundarasen S, 2020, Int J Environ Res Public Health                            86
Fig. 4 Tree map
Fig. 5 Tag cloud
3 Conclusion It was found that the policies allied to digital innovation that can contribute to redefining the online learning of health professionals are those that tend to stimulate the development of a teaching–learning process that is creative, meaningful for the student, and committed to the local and regional health needs, encouraging autonomy and self-management of one’s own learning. The health system is part of the scenario for practice, providing an opportunity for the health field in a real situation, dynamic, and in action. Thus, the organization of health services, their practices, their management, and the formulation and implementation of policies are fundamental to the education process of health professionals. So, with the increasing globalization and the emergence of digital education, policies for online learning of health professionals have to enable educational strategies based on active methodologies carried out through research projects that provide open and direct feedback. As limitations, the method presented here is not able to qualitatively identify the theme of the policies allied to digital innovation that redefine the online learning of health professionals, and therefore, it is recommended the realization of integrative literature reviews that allow broadening and deepening the analysis performed here.
References 1. de Bem Machado A, Sousa MJ, Dandolini GA (2022) Digital learning technologies in higher education: A bibliometric study. Em Lect Notes Netw Syst, p 697–705. Singapore: Springer Nature Singapore 2. de Bem Machado A, Secinaro S, Calandra D, Lanzalonga F (2021b) Knowledge management and digital transformation for Industry 4.0: a structured literature review. Knowl Manag Res & Pract, 1–19. https://doi.org/10.1080/14778238.2021.2015261 3. Linnenluecke MK, Marrone M, Singh AK (2019). Conducting systematic literature reviews and bibliometric analyses. Aust J Manag, 031289621987767. 4. Liu D, Wei Q (2014) Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst 25(3):621–634. https://doi.org/10.1109/ TNNLS.2013.2281663 5. Moral-Muñoz JA, Herrera- E, Santisteban-Espejo A, Cobo MJ (2020) Software tools for conducting bibliometric analysis in science: An up-to-date review. El profesional de la información 29(1):e290103 6. Rusli R, Rahman A, Abdullah H (2020) Student perception data on online learning using heutagogy approach in the Faculty of Mathematics and Natural Sciences of Universitas Negeri Makassar. Indonesia. Data in Brief 29(105152):105152. https://doi.org/10.1016/j.dib.2020. 105152 7. Sari MP, Sipayung YR, Wibawa KCS, Wijaya WS (2021) The effect of online learning policy toward Indonesian students’ mental health during Covid-19 pandemic. Pak J Med Health Sci 15(6):1575–1577. https://doi.org/10.53350/pjmhs211561575 8. Sousa MJ, Marôco AL, Gonçalves SP, Machado AdB (2022) Digital learning is an educational format towards sustainable education. Sustain, 14, 1140. https://doi.org/10.3390/su14031140 9. Ueno T, Maeda S-I, Kawanabe M, Ishii S (2009). Optimal online learning procedures for modelfree policy evaluation. Em Mach Learn Knowl Discov Databases, p 473–488. Berlin, Heidelberg: Springer Berlin Heidelberg
Reference Framework for the Enterprise Architecture for National Organizations for Official Statistics: Literature Review Arlindo Nhabomba, Bráulio Alturas, and Isabel Alexandre
Abstract Enterprise Architecture Frameworks (EAF) play a crucial role in organizations by providing a means to ensure that the standards for creating the information environment exist and they are properly integrated, thus enabling the creation of Enterprise Architectures (EA) that represent the structure of components, their relationships, principles, and guidelines with the main purpose of supporting business. The increase in the variety and number of Information Technology Systems (ITS) in organizations, increasing their complexity and cost, while decreasing the chances of obtaining real value from these systems, makes the need for an EA even greater. This issue is very critical in organizations whose final product is information, such as the National Organizations for Official Statistics (NOOS), whose mission is to produce and disseminate official statistical information of the respective countries. Historically, NOOS have individually developed business processes and similar products using ITS that are not similar, thus making it difficult to produce consistent statistics in all areas of information. In addition, over the years, the NOOS adopted a business and technological structure and model that entails high maintenance costs that are becoming increasingly impractical and the delivery model inexcusable, and the current EAF are not properly optimized to deal with these problems. NOOS are being increasingly challenged to respond quickly to these emerging information needs. We carried out this research through a literature review and a body of information pertinent on the topic was collected, which allowed us to demonstrate that, in order to respond to these challenges, it is necessary to have a holistic view of ITS through the definition of an EA using a reference EAF among the current ones or a new one, built from scratch. Keywords Enterprise architecture · IS architecture · IT systems · Official statistics
A. Nhabomba (B) · B. Alturas · I. Alexandre Iscte – Instituto Universitário de Lisboa, Avenida das Forças Armadas, 1649-026 Lisboa, Portugal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_25
1 Introduction Undoubtedly, the cost and complexity of ITS have increased exponentially in recent times, while the chances of getting real value from these systems have drastically decreased, requiring EA (also more complex) to satisfy the information needs of organizations. This situation introduces an additional degree of complexity in the practice of managing and maintaining ITS to ensure its alignment with the organizations’ business, a factor that continues to be seen as of vital importance by IT professionals and business managers in maximizing the contribution of Information Systems (IS) investments. The NOOS, the governing bodies of national statistical systems, are not immune to this problem. Over the years, through many iterations and technological changes, they built their organizational structure and production processes, and consequently their statistical and technological infrastructure. Meanwhile, the cost of maintaining this business model and associated asset bases (process, statistics, technology) is becoming insurmountable and the delivery model unsustainable [1]. For most NOOS, the underlying model for statistical production is based on sample surveys, but increasingly, organizations need to use administrative data or data from alternative sources to deliver efficiencies, reduce provider burden, and make richer use of existing information sources [1]. This requires significant new EA features that are not available on the vast majority of NOOS. The absence of these resources makes it difficult to produce consistent statistics in all domains of information. NOOS are being increasingly challenged to respond quickly to these emerging information needs. The advent of EAF over the past few decades has given rise to a view that business value and agility can best be realized through a holistic approach to EA that explicitly examines every important issue from every important perspective [2]. Similarly, Zachman, early in the EA field, stated that the costs involved and the success of the business, which increasingly depend on its IS, require a disciplined approach to managing these systems [3]. The need for an architectural vision for the IS of the NOOS, which allows a holistic conceptualization of their reality and which allows dealing with each situation in particular regardless of the IS solutions implemented in it, is thus justified by the need for tools that allow not only the representation of its reality, in order to understand the whole, as well as to examine how its constituent parts interact to form this whole. It is from this evidence of the facts that the need for a new EAF for the NOOS can be understood.
2 The Official Statistical Information Official statistical information (or official statistics) provides the quantitative basis for the development and monitoring of government social and economic policies [4]. This information is essential for economic, demographic, social, and environmental development and for mutual knowledge and trade between states and peoples of the world [5]. For this purpose, official statistics that pass the practical utility test
must be compiled and made available impartially by NOOS to honor citizens' right to public information [6]. There are many examples where good quality data are essential for decision-making, such as participation and performance in education, health statistics (morbidity, mortality rates, etc.), crime and incarceration rates, and tax information. Statistical data are almost invariably representative at the national level, because they are obtained from complete censuses or large-scale national sample surveys, and generally seek to present definitive information in accordance with international definitions and classifications or other well-established conventions [6]. However, building a capacity to systematically produce relevant, current, reliable, comprehensive, and internationally comparable statistics is a challenge for all countries. In this context, institutions involved in the production of statistics must rely on the use of international standards, without which the comparability of data produced by different NOOS, within a country and between countries, would be impossible. Their practical implementation is strongly aligned with and supported by the Generic Statistical Business Process Model (GSBPM) [7].
3 Standards for Supporting Official Statistics Production During the last decades, official statistical production has been undergoing a process of modernization and industrialization conducted internationally. In this regard, the most distinctive initiative is the activities of the High-Level Group for the Modernization of Official Statistics (HLG-MOS) of the United Nations Economic Commission for Europe [8]; being responsible for the development of the following reference models: GSBPM [9], Generic Statistical Information Model (GSIM) [10], and the Common Statistical Production Architecture (CSPA) [11]. To these models is also added the Generic Activity Model for Statistical Organizations (GAMSO) [12]. The GSBPM describes and defines the set of business processes required to produce official statistics and provides a standard framework and harmonized terminology to help statistical organizations modernize their production processes as well as share methods and components [9]. Figure 1 shows the phases of the GSBPM. In addition to the processes, we also have the information that flows between them (data, metadata, rules, parameters, etc.). The GSIM aims to define and describe these information objects in a harmonized way [13]. GSIM and GSBPM are complementary models for the production and management of statistical information. The GSBPM models statistical business processes and identifies the activities carried out by the producers of official statistics resulting in information outputs [10]. These activities are divided into sub-processes, such as “edit and impute” and “calculate aggregates”. As shown in Fig. 2, GSIM helps to describe the GSBPM sub-processes by defining the information objects that flow between them, that are created in them, and that are used by them to produce official statistics [10]. The CSPA is a set of design principles that allow NOOS to develop components and services for the statistical production process, in a way that allows these
Fig. 1 Phases of the GSBPM
Fig. 2 Relationship between GSIM and GSBPM [10]
components and services to be easily combined and shared between organizations, regardless of the underlying technology platforms [14]. In this way, CSPA aims to provide “industry architecture” for official statistics. In addition to the achievements made with the GSBPM, GSIM, and CSPA standards, to support greater collaboration between NOOS, it is equally important to cover all typical activities of a statistical organization to improve communication within and between these organizations, introducing a common and standard terminology. This is the task of GAMSO [15]. The following diagram shows the position of GAMSO in relation to other standards for international collaboration (Fig. 3). All of these are measures to industrialize the statistical production process by proposing standard tools for the many aspects of this process. In general, they all follow a top-down approach, through which generic proposals are made that do not take into account specific methodological details of production [16]. As an immediate positive consequence, NOOS can find a direct adaptation of these standards to their particular processes and the statistical production is more easily comparable in the international domain and, therefore, susceptible to standardization to a certain extent. However, NOOS have been developing their own business processes and ITS to create statistical products [1]. While products and processes are conceptually very similar, individual solutions are not; each technical solution was built for a very specific
Fig. 3 Relationship between GAMSO, GSBPM, GSIM, and CSPA [15]
Fig. 4 NOOS status quo
purpose, with little regard for the ability to share information with other adjacent applications in the statistical cycle, and with limited ability to handle similar but slightly different processes and tasks [1]. Gjaltema [1] calls this an "accidental architecture", as the processes and solutions were not conceived from a holistic view (Fig. 4). In Fig. 4, two entities producing official statistics (NOOS, local delegations, or delegated bodies) have totally different technological concepts in the same stages of the statistical production process (e.g., using the GSBPM). As a result, outputs 1 and 2 will never be comparable, which jeopardizes the quality of the information produced and, consequently, the decisions taken, not only nationally, but also internationally. In terms of cost, columnist Bob Lewis has shown that in these situations, during initial IT implementations, the managed architecture is slightly more expensive than the rugged one, but over time, the cost of the latter increases exponentially compared to the former [17] (see Fig. 5). This same idea is shared by Sjöström et al. [18] (see Fig. 6). This means that NOOS find it difficult to produce and share data across systems in line with modern standards (e.g., Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX)), even with new production support
Fig. 5 Total IT functionality delivered to enterprise [17]
Fig. 6 Architecture cost [18]
standards (GSBPM, GSIM, CSPA, and GAMSO). In short, the status quo of NOOS is characterized by:
• complex and costly systems;
• difficulty in keeping those increasingly expensive systems aligned with NOOS's needs;
• rigid processes and methods;
• inflexible, aging technology environments.
4 Enterprise Architecture Frameworks EAF define how to organize the structure and perspective associated with EA [19]. EA, in turn, represents the architecture in which the system in question is the entire company, especially the company’s business processes, technologies, and IS [20]. These components are EA artifacts that can be defined as specific documents, reports, analyses, models, or other tangible items that contribute to an architectural description [20], i.e., providing a holistic view for developing solutions. Thus, an EAF collects tools, techniques, artifact descriptions, process models, reference models, and guidance used in the production of specific artifacts. This includes innovations in an organization’s structure, the centralization of business processes, the quality and timeliness of business information, or ensuring that the money spent on IT investments can be justified [19]. Over the past three decades, many EAF have emerged (and others have disappeared) to deal with two major problems: the increasing complexity
of IT systems and the increasing difficulty in getting real value out of these systems [20]. As we can imagine, these problems are related. The more complex a system is, the less likely it is to deliver the expected value to the business. By better managing complexity, an organization increases the chances of adding real value to the business. Current literature highlights the following EAF: Zachman's Framework, The Open Group Architecture Framework (TOGAF), Federal Enterprise Architecture (FEA), Value Realization Framework (VRF) along with Simple Iterative Partitions (SIP) or VRF/SIP, Department of Defense Architecture Framework (DoDAF), Integrated Architecture Framework (IAF), and two techniques developed in the academic context, which are Enterprise Knowledge Development (EKD) and Resources, Events, and Agents (REA) [20, 21]. Nowadays, the criteria for the selection of the main EAF are based on two perspectives:
• widely used and highly rated EAF; and
• EAF that support mobile IT/cloud computing and web service elements, which are crucial requirements of current IS.
According to research in the Journal of Enterprise Architecture by Cameron and McMillan [22], from the perspective of the "widely used" criterion, TOGAF, Zachman, Gartner, FEA, and DoDAF are the most widely used, and TOGAF, FEA, and DoDAF were considered "highly rated". Sessions [2] also, in his study, states that Zachman, TOGAF, FEA, and Gartner are the most commonly used EAF. From this last list, Moscoso-Zea et al. [23] replace Gartner with DoDAF for the same perspective (widely used). In the second criterion for the selection of the EAF, "integration with the basic structure of mobile IT/cloud computing and services", Gill et al. [24] argued that FEA, TOGAF, Zachman, and the Adaptive Enterprise Architecture Framework are adequate. Given these facts, we found that only the frameworks of Zachman, TOGAF, and FEA stand out in the two perspectives considered. For this reason, they are of interest to our study. The Zachman Framework provides a means of ensuring that the standards for creating the information environment exist and are properly integrated [25]. It is a taxonomy for organizing architectural artifacts that takes into account both who the artifact is aimed at and the specific problem being addressed [20]. These two dimensions allow the classification of the resulting artifacts, allowing any organization to obtain all types of possible artifacts. However, Zachman alone is not a complete solution. There are many issues that will be critical to the company's success that Zachman doesn't address. For example, Zachman doesn't give us a step-by-step process for creating a new architecture, and it doesn't even help us decide if the future architecture we're creating is the best possible one [20]. Further, Zachman doesn't give us an approach to show the need for a future architecture [20]. For these and other questions, we'll need to look at other frameworks. TOGAF describes itself as a "framework", but its most important part is the Architecture Development Model (ADM), which is a process to create an architecture [20]. Since ADM is the most visible part of TOGAF, we categorized it as an architectural process rather than an architectural framework like Zachman. Viewed as an architectural process, TOGAF complements Zachman, which is a taxonomy. It should be noted, however, that TOGAF is not linked to government organizations
[26, 27]. As for the FEA, it was developed specifically for the federal government and offers a comprehensive approach to the development and use of architectural endeavors in federal government agencies [28] and is recognized as a standard for state institutions [29], unlike Zachman and TOGAF which are generic. FEA is the most complete of the three frameworks under discussion, i.e., it has a comprehensive taxonomy, like Zachman, and it also allows for the development of these artifacts, providing a process for doing so, like TOGAF [20]. There is, however, an important criticism of the FEA. In 2010, the Federal Council of CIO (Chief Information Officers) raised some problems in relation to FEA, such as [30]:
• lack of a common and shared understanding;
• confusion about what EA is;
• issues associated with FEA compliance reports.
Participants recognized that it was time for a change. And, in general, a series of constraints in the implementation of EA are pointed out, such as the lack of clarity of its functions, ineffective communication, low maturity, and commitment of EA and its tools [31]. These challenges were attributed to three root causes: the ambiguity of the EA concept, the difficult terminology, and the complexity of EA frameworks.
5 Framework for NOOS As soon as we have briefly described the three most important frameworks, and presented their limitations, we will now discuss the essential characteristics of the solution proposed for NOOS. From the description of the EAF above, we concluded that with any of the three frameworks (Zachman, TOGAF, and FEA) it is possible to define how to create and implement an EA, providing the principles and practices for its description. Recognition of this reality is important because it allows any organization, public or private, including NOOS, to be aware of the specific needs that EAs must support, as well as to alert to the need for a sustained development of ITS. However, for NOOS, special attention must be considered, taking into account their public nature, which at the same time requires a lot of rigor in the execution of statistical surveys, respecting all phases of the GSBPM. For this type of organization, it is crucial to define a global EA, integrating all the entities involved in the statistical processes and, for that, it is necessary to adopt an EAF that supports this business model, which includes the implementation of ITS solutions in multiple statistical cycles while the process is performed by multiple entities. As we saw earlier, Zachman and TOGAF have some limitations to build an EA (although they can be used together in a blended approach) especially in NOOS (they are not linked to government organizations). In a comparative study carried out by Sessions and DeVadoss [20] between Zachman, TOGAF, FEA, and VRF/SIP considered the most important in that study and using criteria such as information availability, business focus, governance orientation, reference model orientation, prescriptive catalog, maturity model, among others, and in particular the criterion of the maturity model,
FEA was considered the best [20]. The maturity model refers to how much guidance the framework provides to assess the effectiveness and maturity of different organizations in the use of EA [20]. This feature is important for NOOS since different entities are involved in the statistical production process and it is interesting to assess their effectiveness and maturity in the use of EA [32]. Furthermore, FEA is the most complete of the three most important EAF (it has mechanisms not only to classify artifacts, but also to develop them), as we saw earlier. It was also seen that FEA is a standard framework for state organizations, which NOOS fall under. By presenting all these characteristics, which are favorable to NOOS, FEA seems suitable for NOOS, despite being tainted by the problems raised by the Federal Council of CIO [30]. Therefore, to take advantage of these potentials, we recommend, as the first option, the creation of a reference EAF based on FEA, with better approaches in the following fields: common and shared understanding of the EA, compliance reports, clarity of the EA concept, and simplification of terminology and structures of EA. This approach, for official statistics, is also supported by Alturas, Isabel, and Nhabomba [32]. The second option that we propose is the creation of a new EAF, from scratch, and specific to the official statistics industry. This option would somewhat use a blended approach, consisting of fragments of each of the EAF that provide the most value in their specific areas. These fragments would be obtained by rating the EAF taking into account the criteria considered important on a case-by-case basis. This approach can be explained in the following Fig. 7. Here, it is recommended that the criterion “maturity model” and the three frameworks (Zachman, TOGAF, and FEA) are always present in the evaluation; the maturity model for being characteristic of NOOS and the three frameworks for being the most important. At the end of this exercise, the result will be a new EAF that consists of fragments from each of the top-rated frameworks. This will be the most suitable framework for NOOS and its implementation will require a broad perspective on all selected frameworks and expertise in helping companies create a framework that works better, given the specific needs and political realities of that company. Fig. 7 Criteria and ratings for each framework
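One way to make the rating exercise of Fig. 7 concrete is a simple weighted-scoring matrix over the candidate frameworks. The sketch below only illustrates this blended approach: the criteria, weights, and ratings are placeholder values chosen for the example, not scores proposed in this paper or taken from the cited studies.

```python
# Placeholder criteria, weights, and ratings; a NOOS would set these case by case,
# always keeping the maturity model criterion and the three main frameworks.
criteria_weights = {"maturity model": 3, "business focus": 2,
                    "process guidance": 2, "taxonomy completeness": 1}

ratings = {  # rating of each framework per criterion, e.g., on a 1-4 scale
    "Zachman": {"maturity model": 1, "business focus": 1,
                "process guidance": 1, "taxonomy completeness": 4},
    "TOGAF":   {"maturity model": 2, "business focus": 2,
                "process guidance": 4, "taxonomy completeness": 2},
    "FEA":     {"maturity model": 4, "business focus": 1,
                "process guidance": 3, "taxonomy completeness": 3},
}

def best_fragment(criterion):
    """Framework rated highest on a criterion; its fragment goes into the new EAF."""
    return max(ratings, key=lambda fw: ratings[fw][criterion])

def overall_score(framework):
    return sum(w * ratings[framework][c] for c, w in criteria_weights.items())

for criterion in criteria_weights:
    print(f"{criterion}: reuse the fragment from {best_fragment(criterion)}")
print({fw: overall_score(fw) for fw in ratings})
```

The fragments chosen per criterion would then be assembled into the new EAF, with the overall scores serving only as a sanity check on the selection.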
6 Conclusions In this article, we demonstrated the need for a new enterprise architecture framework for NOOS, as these organizations have historically developed technical solutions without any holistic perspective, i.e., solutions developed individually for very specific purposes, with little consideration for the ability to share information, resulting in accidental architectures, which lead to complex and costly systems that are difficult to keep aligned with NOOS's needs, rigid processes and methods, and inflexible, aging technology environments. To address these problems, two possible solutions were proposed. Before conceiving these solutions, we first selected the best frameworks, based on two criteria, which are "widely used and highly rated EAF" and "EAF that support mobile IT/cloud computing and web service elements", and three of them (Zachman, TOGAF, and FEA) proved to be the best. Then we presented the first solution, which is a reference framework based on FEA, to take advantage of its potential related to the fact that it is more complete than the other two frameworks, it is a standard for state organizations, and it works better with the maturity model, an important feature for NOOS. We recommended that this first option should have better approaches in the fields related to a common and shared understanding of EA, compliance reporting, clarity of the EA concept, and simplification of the terminology and structures of EA. The second solution is a new EAF, resulting from a blended approach, consisting of fragments of each of the EAF that provide the most value in their specific areas. In this second option, we recommended that the criterion "maturity model" and the three most important frameworks must always be present in the evaluation; the first for being peculiar to NOOS and the three frameworks for being the most important. In the future, we will continue this research, providing a concrete proposal for a new EAF for NOOS, following one of the suggested solutions.
References
1. Gjaltema T (2021) Common statistical production architecture [Online]. https://statswiki.unece.org/display/CSPA/I.++CSPA+2.0+The+Problem+Statement
2. Sessions R (2007) A comparison of the top four methodologies, pp 1–34 [Online]. http://www.citeulike.org/group/4795/article/4619058
3. Zachman J (1987) A framework for information systems architecture. IBM Syst J 26(3)
4. Janssen T, Forbes S (2014) The use of official statistics in evidence based policy making in New Zealand. Sustain Stat Educ Proc Ninth Int Conf Teach Stat ICOTS9
5. Divisão Estatística das Nações Unidas (2003) Handbook of statistical organization, third edition: the operation and organization of a statistical agency
6. Feijó C, Valente E (2005) As estatísticas oficiais e o interesse público. Bahia Análise & Dados 15:43–54
7. UNECE (2012) Mapping the generic statistical business process model (GSBPM) to the fundamental principles of official statistics
8. High-Level Group for the Modernisation of Statistical Production (2011) Strategic vision of the high-level group for strategic developments in business architecture in statistics. In: Conference of European statisticians (24) [Online]. file:///C:/Users/user/Documents/1Doc 2021/ISCTE/PROJETO/Artigos 2022/Fontes/Strategic Vision.pdf
9. Choi I (2020) Generic statistical business process model [Online]. https://statswiki.unece.org/display/GSBPM/I.+Introduction#footnote2
10. Choi I (2021) Generic statistical information model (GSIM): communication paper for a general statistical audience. Mod Stats [Online]. https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper
11. Gjaltema T (2021) CSPA 2.0 common statistical production architecture. https://statswiki.unece.org/pages/viewpage.action?pageId=247302723
12. Gjaltema T (2021) Generic activity model for statistical organizations [Online]. https://statswiki.unece.org/display/GAMSO/I.+Introduction
13. Lalor T, Vale S, Gregory A (2013) Generic statistical information model (GSIM). North Am Data Doc Initiat Conf (NADDI 2013), Univ Kansas, Lawrence, Kansas (December 2013)
14. Nações Unidas (2015) Implementation guidelines United Nations. United Nations Fundam Princ Off Stat, pp 1–117
15. UNECE (2015) Generic activity model for statistical organisations, pp 1–11 (March)
16. Salgado D, de la Castellana P (2016) A modern vision of official statistical production, pp 1–40 [Online]. https://ine.es/ss/Satellite?L=es_ES&c=INEDocTrabajo_C&cid=1259949865043&p=1254735839320&pagename=MetodologiaYEstandares%2FINELayout
17. Lewis B (2021) Technical architecture: what IT does for a living. https://www.cio.com/article/189320/technical-architecture-what-it-does-for-a-living.html. Accessed 10 June 2022
18. Sjöström H, Lönnström H, Engdahl J, Ahlén P (2018) Architecture recommendations (566)
19. Galinec D, Luic L (2011) The impact of organisational structure on enterprise architecture deployment. In: Proceedings of the 22nd Central European conference on information and intelligent systems, Varaždin, Croatia, 21–23 September 2011, vol 16, no 1, pp 2–19. https://doi.org/10.1108/JSIT-04-2013-0010
20. Sessions R, DeVadoss J (2014) A comparison of the top four enterprise architecture approaches in 2014. Microsoft Dev Netw Archit Cent 57
21. Bernaert M, Poels G, Snoeck M, De Backer M (2014) Enterprise architecture for small and medium-sized enterprises: a starting point for bringing EA to SMEs, based on adoption models, pp 67–96. https://doi.org/10.1007/978-3-642-38244-4_4
22. Cameron B, McMillan E (2013) Analyzing the current trends in enterprise architecture frameworks. J Enterp Archit 60–71
23. Moscoso-Zea O, Paredes-Gualtor J, Luján-Mora S (2019) Enterprise architecture, an enabler of change and knowledge management. Enfoque UTE 10(1):247–257. https://doi.org/10.29019/enfoqueute.v10n1.459
24. Gill AQ, Smith S, Beydoun G, Sugumaran V (2014) Agile enterprise architecture: a case of a cloud technology-enabled government enterprise transformation. Proc—Pacific Asia Conf Inf Syst PACIS 2014, pp 1–11
25. Rocha Á, Santos P (2010) Introdução ao Framework de Zachman (January 2010), p 19
26. Masuda Y, Viswanathan M (2019) Enterprise architecture for global companies in a digital IT era: adaptive integrated digital architecture framework (AIDAF). Springer, Tokyo
27. Gill AQ (2015) Adaptive cloud enterprise architecture. Intelligent information systems. University of Technology, Australia
28. Defriani M, Resmi MG (2019) E-government architectural planning using federal enterprise architecture framework in Purwakarta districts government. Proc 2019 4th Int Conf Inf Comput ICIC 2019 (April 2020). https://doi.org/10.1109/ICIC47613.2019.8985819
29. LeanIX (2022) FEAF—Federal enterprise architecture framework. https://www.leanix.net/en/wiki/ea/feaf-federal-enterprise-architecture-framework
30. Gaver SB (2010) Why doesn't the federal enterprise architecture work?, p 114 [Online]. https://www.ech-bpm.ch/sites/default/files/articles/why_doesnt_the_federal_enterprise_architecture_work.pdf
31. Olsen DH (2017) Enterprise architecture management challenges in the Norwegian health sector. Procedia Comput Sci 121:637–645. https://doi.org/10.1016/j.procs.2017.11.084
32. Nhabomba ABP, Alexandre IM, Alturas B (2021) Framework de Arquitetura de Sistemas de Informação para as Organizações Nacionais de Estatísticas Oficiais. In: 16.ª Conferência Ibérica de Sistemas e Tecnologias de Informação (CISTI), pp 1–5. https://doi.org/10.23919/CISTI52073.2021.9476481
Assessing the Impact of Process Awareness in Industry 4.0 Pedro Estrela de Moura, Vasco Amaral , and Fernando Brito e Abreu
Abstract The historical (and market) value of classic cars depends on their authenticity, which can be ruined by careless restoration processes. This paper reports on our ongoing research on monitoring the progress of such processes. We developed a process monitoring platform that combines data gathered from IoT sensors with input provided by a plant shop manager, using a process-aware GUI. The underlying process complies with the best practices expressed in FIVA’s Charter of Turin. Evidence (e.g., photos, documents, and short movies) can be attached to each task during process instantiation. Furthermore, car owners can remotely control cameras and a car rotisserie to monitor critical steps of the restoration process. The benefits are manifold for all involved stakeholders. Restoration workshops increase their transparency and credibility while getting a better grasp on work assignments. Car owners can better assure the authenticity of their cars to third parties (potential buyers and certification bodies) while reducing their financial and scheduling overhead and carbon footprint. Keywords Classic car documentation · Auto repair shop software · Business process · BPMN · DMN · Internet of things · Industry 4.0 · Process monitoring · GUI · Process awareness · Process mining
P. E. de Moura (B) NOVA School of Science and Technology, Caparica, Portugal e-mail: [email protected] V. Amaral NOVA School of Science and Technology & NOVA LINCS, Caparica, Portugal e-mail: [email protected] F. B. e Abreu ISCTE - Instituto Universitário de Lisboa & ISTAR-IUL, Lisboa, Portugal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_26
1 Introduction Classic cars are collectible items, sometimes worth millions of euros [1], closer to pieces of art than to regular vehicles. To attain the recognized historic status required to reach these price-tag levels, classic cars should go through a rigorous certification process. This means that, during preservation or restoration procedures, strict guidelines should be followed to preserve their status; otherwise, authenticity can be jeopardized, hindering the chances for certification. Such guidelines are published in FIVA’s1 “Charter of Turin Handbook” [3]. Since the expertise required for matching those guidelines is scarce and very expensive, car owners often choose restoration workshops far away from their residences, sometimes even overseas. This means that, to follow the work done, long-range travels are required, with corresponding cost and time overheads and an increase in carbon footprint. We are tackling this issue by creating a platform that allows classic car owners to follow the work being done while reducing the need for manual input by workshop workers. This is accomplished by creating a digital twin that mirrors the work done at the workshop. In this paper, we describe a process-aware platform with a model-based GUI that is used by both workshop workers and car owners. We use a BPMN + DMN model described in [7] that is inspired by the “Charter of Turin Handbook” guidelines. This is the first attempt we are aware of at modeling this charter, and it was a great starting point for this research. Other sources, including local experts, provided the information required to fill in the blanks during the modeling process, because the charter is vague in certain procedures or open to subjective interpretation due to being written in natural language. During execution, process instances (cars under preservation or restoration) progress from task to task, either due to automatic detection with ML algorithms that take as input IoT sensors’ data collected by an edge computer attached to each car, or due to manual intervention by the workshop manager. During the preservation or restoration process, the latter can attach evidence (photos, scanned documents, and short videos) to each task instance (a task performed upon a given car). That evidence is used to automatically generate, using a LaTeX template, a report that car owners can use to warrant the authenticity of the restoration and/or preservation their classic cars went through to certification bodies and/or potential buyers. Our platform also allows holding meetings remotely with car owners, granting them complete control of a set of pan, tilt, and zoom operations upon a set of IP cameras at the workshop pointed at their car in a specific showroom. This feature reduces car owners’ financial and scheduling overhead and their carbon footprint. Both features (evidence collection and online interaction) increase the transparency of the restoration and preservation processes. We adopted an Action Research methodology, as interpreted by Susman and Evered in [12], where five stages of work are continuously iterated: Diagnosing, Action Planning, Action Taking, Evaluating, and Specifying Learning. By choosing
Fédération Internationale des Véhicules Anciens (FIVA), https://fiva.org.
this methodology, we aim to constantly receive feedback from platform users on the features being implemented, allowing an agile and quality-in-use development roadmap [4]. We claim two major contributions of this ongoing applied research work: (i) the positive impact of the proposed digital transformation in this Industry 4.0 context, and (ii) the assessment of the feasibility of process-aware/model-based GUIs, a topic we could not find addressed in the literature. This paper is organized as follows: Sect. 2 presents related work along three axes that intersect our work, Sect. 3 describes the proposed platform, and Sect. 4 presents the corresponding validation efforts; finally, in Sect. 5, we draw our conclusions and prospects for future work.
2 Related Work 2.1 Car Workshop Systems Several commercial systems can be found under the general designation of “Auto Repair Shop Software”. Besides documenting the history of ongoing repairs, they are usually concerned with financial management (invoicing), scheduling, workforce management, inventory, and management of interactions with customers (with some features found in CRM systems) and suppliers (e.g., paints and spare parts). An example that covers these aspects is Shopmonkey,2 advertised as a “Smart & simple repair shop management software that does it all”. Software systems specially designed for classic cars are scarce. One such example is Collector Car Companion.3 It is a platform targeting classic car owners and restoration shops that allows documenting cars and their restoration processes, including photographic evidence. Additionally, it can be used to catalog parts and track costs and suppliers. We could not find any model-based solution for managing classic car restoration and preservation processes. For examples of such systems, we had to look at other industries.
2.2 Business Process Models in Industry 4.0 Kady et al. [5] created a platform aimed at beekeepers to help them manage their beehives. This was achieved by using sensors to continuously measure the weight of beehives and other discrete measurements at regular intervals. Additionally, they 2 3
https://www.shopmonkey.io. https://collectorcarcompanion.com/.
built BPMN models with the help of beekeepers, based on apicultural business rules. The patterns of the measurements collected are identified for data labeling and BPMN events association. These events trigger automated business rules on the workflow model. The process monitoring realized in this work is executed in a very similar way to ours. The differences occur in the way it is presented in the GUI. Instead of offering the visualization directly on the BPMN model itself, they added trigger events to the model that send notifications to the beekeeper’s mobile phone. Schinle et al. [10] proposed a solution to monitor processes within a chaincode by translating them into BPMN 2.0 models using process mining techniques. These models could then be used as graphical representations of the business processes. The authors claim to use process monitoring and process mining techniques, but it is unclear how the model-based GUI includes the monitoring aspects, as the only representation of a model shown is the one obtained after process mining, without process monitoring elements. Makke and Gusikhin [8] developed an adaptive parking information system that used cameras and sensors to track parking space occupancy, by implementing Petri Nets as digital twins for the parking space. In their representation, tokens were used to represent vehicles, places to represent areas or parking spots, and transitions to represent entrances and exits of the parking areas. Petri nets were also used as a way to represent the routes that individual vehicles took while in the parking space. The authors used a model-based GUI monitoring approach, but the models are hidden from the final users. This differs from our solution, as we present BPMN models in the GUI used by final users. Pinna et al. [9] developed a graphical run-time visualization interface to monitor Smart-Documents scenarios. Their solution consisted of an interface with a workflow abstraction of the BPEL models that highlighted the services already performed and the ones being performed. The decision to use BPEL abstraction models over the BPEL models themselves was because the BPEL workflow contained too many components, which made the scenario unreadable for human users, such as control activities and variables updating. Their abstraction used an icon-based GUI, instead of the usual text-based, for representing activities. It is unclear why this decision was made, as it seems that this annotation makes it harder to follow the process for an unaccustomed user. To mitigate this problem, by mousing over the icons, some additional information about the activity can be obtained. This publication does not describe the validation of the proposed approach. Most of the articles that use BPMs in Industry 4.0 contexts adopted BPMN, as confirmed by the secondary study titled “IoT-aware Business Process: a comprehensive survey, discussion and challenges” [2]. Our choice of using BPMN is then aligned with current practice. However, the main conclusion we draw from our literature review is that using a process-aware model-based GUI in Industry 4.0 is still an unexplored niche. The closest example we found of using this untapped combination of technologies is [8], but still, it seemed to only be used as an intermediary analysis tool.
3 Proposed Platform 3.1 Requirements Our platform can be divided into two separate subsystems, each with its own set of use cases. The first is the Plant Shop Floor Subsystem. This is the main part of our system where the Charter of Turin-based models can be viewed and interacted with. The operations that the different users can do in this subsystem are identified in the use case diagram in Fig. 1. The Experimental Hub Subsystem manages the live camera feeds to be used during scheduled meetings with car owners. The possible operations done in this subsystem are identified in the use case diagram in Fig. 2. In the diagrams, the Plant Shop Manager actor represents the workshop staff members that will control the day-to-day updates done to each vehicle and update the system accordingly. The Administrative Manager actor represents the workshop staff members who will have the control to create and delete restoration and preservation
Fig. 1 Use case diagram of the plant shop subsystem
Fig. 2 Use case diagram of the experimental hub subsystem
processes, as well as some administrative tasks, like registering new users to the system and sending credentials to be used to access the Experimental Hub Subsystem. Lastly, the Classic Car Owner actor stands for the owners themselves.
3.2 Architecture In the original architecture proposed in [7], Camunda’s Workflow Engine was used (and still is) to execute the Charter of Turin-based process, i.e., allowing its instances to be created, progress through existing activities, and deleted. The data stored in this platform were obtained through REST calls by a component designated as Connector. This component used BPMN.io to display the BPMN models on a web page to be interacted with by the workshop manager, indicating the path taken during the restoration process. This component also included a REST API that allows the retrieval of information about each instance. This API was used by a component developed with the ERPNext platform to allow owners to see the progress applied to their car as a list of completed tasks, while also providing some CRM functionalities for the workshop manager. We decided to discard the use of the ERPNext platform because, albeit it is open-source, implementing new features within this platform was laborious and inefficient, due to scarce documentation and lack of feedback from its development team. An overview of the current system’s architecture is depicted in the component diagram in Fig. 3. The Workflow Editor component is used to design the BPMN and DMN diagrams, while the Workflow Engine component stores the latter and allows for the execution of their workflows.
Fig. 3 Component diagram of the system
The Charter of Turin Monitor is the component that integrates the features formerly existing in the Connector component with some CRM features equivalent to those reused from ERPNext. It serves as the GUI that workshop employees use to interact with the BPMN process instances and use the CRM features to convey information to the owners. It also serves as the GUI used by classic car owners to check the progress and details of the restoration/preservation processes. One of these details is a direct link to the secret Pinterest board that holds the photographic evidence taken by the workshop staff members. These boards are divided into sections, each representing an activity with photos attached. Lastly, there are several IP cameras mounted in what we called the Experimental Hub, a dedicated room on the workshop premises. Classic car owners can remotely access and control these cameras through their web browsers, using the Camera Hub component. During their meeting, this access will only be available for a limited time, assigned by the workshop manager within the Charter of Turin Monitor component. The Camera Hub component also calls an API implemented in the Charter of Turin Monitor component to upload photos and videos taken during the meetings directly to the corresponding Pinterest board.
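To illustrate how the Charter of Turin Monitor keeps its process-aware GUI in sync with the Workflow Engine, the following is a minimal sketch of retrieving a car’s progress over REST. It assumes a Camunda Platform 7 engine exposed at the default /engine-rest path; the host, port, and instance id are placeholders, and the actual platform performs equivalent calls from its ASP.NET back-end rather than from Python.

```python
import requests

# Placeholder base URL; the real host/port depend on the deployment.
CAMUNDA = "http://localhost:8080/engine-rest"

def restoration_progress(process_instance_id: str) -> dict:
    """Return completed and currently open activities for one car's process instance."""
    # Historical activity instances that have already finished (completed tasks).
    finished = requests.get(
        f"{CAMUNDA}/history/activity-instance",
        params={"processInstanceId": process_instance_id, "finished": "true"},
        timeout=10,
    ).json()

    # User tasks still open, i.e., where the car currently is in the process.
    open_tasks = requests.get(
        f"{CAMUNDA}/task",
        params={"processInstanceId": process_instance_id},
        timeout=10,
    ).json()

    return {
        "completed": [a["activityId"] for a in finished],
        "active": [t["taskDefinitionKey"] for t in open_tasks],
    }

if __name__ == "__main__":
    print(restoration_progress("example-instance-id"))
```

The returned activity identifiers can then be mapped onto the BPMN diagram rendered in the GUI, for instance to highlight completed tasks and the task currently in progress.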
3.3 Technologies To model and deploy the BPMN and DMN models, we chose two Camunda products: Camunda’s Modeler for process modeling and Camunda’s Workflow Engine for process execution. Camunda software is widely used by household name companies, which stands for its reliability. The choice was also due to the two products being freeware, offering good tutorials and manuals.
For our back-end, we chose the ASP.NET framework, primarily due to the plethora of integration alternatives it offers, matching our envisaged current and future needs. The back-end was deployed on a Docker container in a Linux server running in a virtual machine hosted by an OpenStack platform operated by INCD (see the acknowledgment section). The database software we chose was MongoDB, as there is plenty of documentation on integrating it with .NET applications and deploying it with Docker. For our front-end, we decided not to use the default .NET framework Razor, but instead use Angular. Even though this framework does not offer integration as simple as Razor, .NET provides a template that integrates the two, while providing highly dynamic and functional pages with many libraries and extensions. Within our front-end, we integrated BPMN.io’s viewer bpmn-js. This viewer was developed with the exact purpose of working with Camunda and offers a simple way to embed a BPMN viewer within any web page. Finally, we chose Pinterest to store the photographic evidence collected. The option of storing these directly in the database or on another platform was considered, but Pinterest was ultimately chosen because it offers an API that allows all needed functionalities to be performed automatically, without the need for manual effort. Also, it provides good support, in the form of widgets and add-ons, for integrating its GUI within other web pages, in case there is a later need for this feature. All the code and models used in this project can be found on GitHub.4
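As an illustration of the evidence flow just described, the sketch below pins a photo to the section of a car’s secret board. It assumes Pinterest’s v5 REST endpoint POST /pins and an OAuth token with write access; the payload fields should be checked against the current Pinterest API reference, all identifiers are placeholders, and the real platform issues this call from its ASP.NET back-end rather than from Python.

```python
import requests

PINTEREST_API = "https://api.pinterest.com/v5"  # assumed v5 base URL

def attach_evidence(access_token: str, board_id: str, section_id: str,
                    image_url: str, task_name: str) -> dict:
    """Create a pin holding one piece of photographic evidence for a task.

    board_id and section_id identify the car's secret board and the section
    corresponding to the BPMN activity; both are placeholders here.
    """
    payload = {
        "board_id": board_id,
        "board_section_id": section_id,
        "title": task_name,
        "media_source": {"source_type": "image_url", "url": image_url},
    }
    response = requests.post(
        f"{PINTEREST_API}/pins",
        json=payload,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```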
4 Validation This work has two main parts requiring validation, the DMN and BPMN models based on the process described in the “Charter of Turin Handbook” and the GUI used to represent them.
4.1 Model Validation To validate the models, we asked for feedback from the classic car workshop experts before deployment. This allowed for the more abstract parts of the “Charter of Turin Handbook”, which is fully described in natural language, to be complemented with the actual process followed in the workshop. A continuous improvement is now in place since the platform was already deployed in the workshop. Whenever any inconsistencies are found, the appropriate changes are swiftly made to allow for a fast redeployment.
https://github.com/PedroMMoura.
4.2 GUI Validation For GUI validation, we required analysis from the viewpoint of both the workshop workers that directly interact with the Charter of Turin-inspired model and the car owners, who use the platform to follow the process. To validate the workshop workers’ interaction, we resorted to using an expert panel [6]. The selected members for this panel needed prior knowledge of the models, or at least of the general process, being used. This meant that we were limited to people that work directly for workshop companies that do restoration and preservation on classic cars and to engineers at certification companies. Once the experts had been chosen (see Table 1), we conducted meetings with them, showing the platform and requesting feedback with a small interview. In the interim between interviews, we kept updating the platform based on the feedback received, checking how satisfaction with it evolved. Upon completion of all the interviews, all data are aggregated and used to evaluate the results. This is still an ongoing task. From the interviews done so far, the feedback received has been mostly positive, with great interest in the project being developed. Among the suggested features that were already implemented are coloring the tasks that require evidence gathering according to FIVA requirements, a text field for each task that allows for any additional information to be added when necessary, and a few other usability improvements. To validate the owners’ interaction with the system, we decided to use an interview approach [11]. These interviews will be performed with any classic car owner willing to participate, not requiring prior knowledge. As a result, we should get a good idea of new users’ overall satisfaction levels while using our platform. After being informed of our work, several classic car owners and longtime customers of the workshop showed great interest in working with us to test the platform. As of the writing of this document, these interviews have not yet been conducted, because priority was given to finishing the validation of the workshop workers’ interaction before starting the validation of the owners’ interaction. This choice was made because, while the workers’ interaction directly affects the owners’ GUI, the owners’ interaction barely affects the workers’ experience.
Table 1 Expert panel characterization
Profession | Expertise | Field of work | Years of experience
Manager | Plant shop floor works | Car body restore shop | 20
Secretary | CRM | Car body restore shop | 15
Manager | HR management | Car body restore shop | 15
Engineer | Classic cars certification | Certification body | 25
Researcher | BPM modeling | R&D | 25
5 Conclusion and Future Work In this paper, we described our ongoing effort to develop and validate a platform for monitoring the progress of classic cars’ restoration process and recording evidence to allow documenting it in future certification initiatives. We took FIVA’s Charter of Turin’s guidelines as inspiration for producing a BPMN process model that is used as the backbone of our process-aware graphical user interface. The validation feedback received until now has been very positive. This work has gathered interest from several players in the classic car restoration industry, from classic car owners to workshops and certification bodies, which will be very helpful in improving the developed platform and in future validation steps. As future work, we plan to use process mining techniques to validate the models, based on data that is already being collected. Since each classic vehicle is just a process instance, we will have to wait until a considerable number of them complete the restoration process, since process mining ideally requires a large amount of data to produce adequate results. Acknowledgements This work was produced with the support of INCD funded by FCT and FEDER under the project 01/SAICT/2016 nº 022153, and partially supported by NOVA LINCS (FCT UIDB/04516/2020).
References 1. Autocar: The 13 most expensive cars ever sold (2018). https://www.autocar.co.uk/car-news/ industry/12-most-expensive-cars-sold-auction 2. Fattouch N, Lahmar IB, Boukadi K (2020) IoT-aware business process: comprehensive survey, discussion and challenges. In: 29th internernational conference on enabling technologies: infrastructure for collaborative enterprises (WETICE). IEEE, pp 100–105 3. Gibbins K (2018) Charter of Turin handbook. Tech. rep., Fédération Internationale des Véhicules Anciens (FIVA). https://fiva.org/download/turin-charter-handbook-updated-2019english-version/ 4. ISO Central Secretary (2011) Systems and software engineering—Systems and software Quality Requirements and Evaluation (SQuaRE)—system and software quality models. Standard ISO/IEC 25010:2011, International Organization for Standardization, Geneva, CH. https:// www.iso.org/standard/35733.html 5. Kady C, Chedid AM, Kortbawi I, Yaacoub C, Akl A, Daclin N, Trousset F, Pfister F, Zacharewicz G (2021) IoT-driven workflows for risk management and control of beehives. Diversity 13(7):296 6. Li M, Smidts CS (2003) A ranking of software engineering measures based on expert opinion. IEEE Trans Software Eng 29(9):811–824 7. Lívio D (2022) Process-based monitoring in industrial context: the case of classic cars restoration. Master’s thesis, Monte da Caparica, Portugal. http://hdl.hadle.net/10362/141079 8. Makke O, Gusikhin O (2020) Robust IoT based parking information system. In: Smart cities, green technologies, and intelligent transport systems. Springer, pp 204–227 9. Pinna D (2008) Real-time business processes visualization in document processing systems. Master’s thesis, Torino, Italia
10. Schinle M, Erler C, Andris PN, Stork W (2020) Integration, execution and monitoring of business processes with chaincode. In: 2nd conference on blockchain research & applications for innovative networks and services (BRAINS). IEEE, pp 63–70 11. Seidman I (2006) Interviewing as qualitative research: a guide for researchers in education and the social sciences. Teachers college press 12. Susman GI, Evered RD (1978) An assessment of the scientific merits of action research. Adm Sci Q 582–603
An Overview on the Identification of Software Birthmarks for Software Protection Shah Nazir and Habib Ullah Khan
Abstract Software birthmarks were created in order to identify instances of software piracy. The perception of a software birthmark was established in response to the limitations of watermarks, fingerprints, and digital signatures, which make it challenging to determine the identity of software. Software programs can be compared based on their extracted properties and birthmarks to determine who owns the software. Birthmarks are used to identify a specific programming language’s executable and source code. Researchers and practitioners can create new methods and processes for software birthmarks on the basis of which piracy is effectively identified by using the analysis of software birthmarks from various viewpoints. The goal of the current study is to comprehend the specifics of software birthmarks in order to gather and evaluate the information provided in the literature now in existence and to facilitate the advancement of future studies in the field. Numerous notable software birthmarks and current techniques have been uncovered by the study. Various sorts of analyses were conducted in accordance with the stated study topics. According to the study, more research needs to be done on software birthmarks to make accurate and reliable systems that can quickly and accurately find stolen software and stop software piracy. Keywords Software security · Birthmark · Software birthmark · Software measurements
S. Nazir (B) Department of Computer Science, University of Swabi, Swabi, Pakistan e-mail: [email protected] H. U. Khan Department of Accounting & Information Systems, College of Business & Economics, Qatar University, Doha, Qatar © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_27
1 Introduction Software piracy is a major issue for the software business, which suffers severe business losses as a result of this infringement. Software piracy is the unlicensed, illegal use of software. The prevention of software piracy is crucial for the expanding software industry’s economy. Researchers are attempting to develop methods and tools to stop software piracy and outlaw the use of illegally obtained software. Pirated software has a number of drawbacks: it excludes users from the advantages of software upgrades, constant technical support, assurance of virus-free software, thorough program documentation, and quality assurance. Different strategies are in use to stop software piracy. Such techniques include fingerprints [1], watermarks [2–6], and software birthmarks [7–16]. The disadvantage of a watermark is that it is detachable using code obfuscation and transformation techniques that preserve semantics. Software fingerprints have the same problems. To get around these restrictions, the birthmark, a well-known and widely acknowledged technique for preventing software piracy, was created. Software birthmarks are fundamental characteristics of software that can be utilized to determine the distinct identity of software and later be used as proof of software theft. The concept of a software birthmark was developed in response to the limitations of watermarks, fingerprints, and digital signatures, which make it challenging to determine the identity of the software. Birthmarks were first thought about long before they were legally recognized in 2004. Software birthmarks are typically used for Windows API and software theft detection. If a piece of software has more intrinsic features, it should be regarded as having a strong birthmark. In the end, the birthmark’s durability will enable quick and accurate identification of software uniqueness. The birthmark of software is based on the reliability and toughness of the software [7]. The suggested research makes a contribution by presenting a comprehensive, in-depth investigation of software birthmarks, which are employed for a variety of applications but primarily for the detection of software piracy.
2 Identification of Software Birthmark to Prevent Piracy A software program is, in general, a pool of several software features of a particular kind. If a birthmark has additional characteristics, it is referred to as a strong birthmark. One example is the birthmark designed by Nazir et al. [17], which is considered strong because it consists of more software features. This birthmark mostly has four characteristics. The pre-condition feature category was omitted after doing the initial analysis because it is included in almost all types of software. The remaining three feature categories were then utilized. Sub-categories were then created for each of the three main categories. Program contents, program flow, internal data structure, configurable terminologies, control flow, interface description, program size, program responses, restriction, naming, functions, thorough documentation, limitation and constraints, user interface, statements in the program, internal quality, and global data structure are the 17 features that were taken into consideration for the input category. Automation, scalability, ease of use, applicability, friendliness, robustness, portability, scope, interface connections, standard, reliance, and external
Table 1 Different forms of software birthmarks

S. no | Refs. | Technique of birthmarks
1 | [9] | DKISB
2 | [11] | JSBiRTH
3 | [15] | Dynamic K-Gram
4 | [16] | K-gram
5 | [19] | Dynamic key instruction sequences
6 | [18] | Birthmark-based features
7 | [20] | System call dependence graph
8 | [21] | Optimized grouping value
9 | [22] | Static major-path birthmarks
10 | [23] | Thread-aware birthmarks
11 | [24] | System call-based birthmarks
12 | [25] | Method-based static software birthmarks
13 | [26] | Static object trace birthmark
14 | [27] | Static instruction trace birthmark
15 | [28] | Static API trace birthmark
16 | [29] | A dynamic birthmark for Java
17 | [30] | Dynamic opcode n-gram
18 | [31] | API call structure
19 | [32] | CVFV, IS, SMC, and UC
20 | [33] | Whole program path birthmarks
quality are the 12 subfeatures that make up the non-functional components. Functional specification, data and control process, behavior, and functionality are the four subfeatures that make up the functional components. Understanding these aspects and how they are logically organized makes it easier to understand the code [18]. Table 1 identifies the different forms of software birthmarks. Table 2 shows the techniques used for software birthmark.
3 Analysis of the Existing Approaches Software birthmarking is regarded as a reliable and effective method for detecting software theft and preventing piracy. Resilience and believability are two factors that can be used to gauge how comparable two birthmarks are. For comparing software based on birthmarks, various metrics are employed. Most frequently, two birthmarks are compared using the cosine distance. Other common set-based metrics, including the Dice coefficient [14, 27] and the Jaccard index [29], are used for assessing the similarity of dynamic birthmarks. Diverse approaches are used to work with byte code and with source code-based birthmarks.
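As a concrete illustration of these measures, the sketch below extracts k-gram birthmarks from two opcode (or API call) sequences and compares them with the Jaccard index, the Dice coefficient, and a cosine similarity over k-gram frequencies. This is a generic example of the metrics cited above, not the exact procedure of any particular technique surveyed here.

```python
from collections import Counter
from math import sqrt

def kgram_birthmark(sequence, k=4):
    """Multiset of k-grams extracted from an opcode or API call sequence."""
    return Counter(tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1))

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def dice(a, b):
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if (sa or sb) else 0.0

def cosine(a, b):
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

if __name__ == "__main__":
    original = ["load", "store", "add", "load", "mul", "store", "ret"]
    suspect = ["load", "store", "add", "load", "mul", "add", "ret"]
    ba, bb = kgram_birthmark(original), kgram_birthmark(suspect)
    print(jaccard(ba, bb), dice(ba, bb), cosine(ba, bb))
```

Two programs would then be flagged as suspicious when the chosen similarity exceeds a threshold fixed by the detection scheme.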
Table 2 Approaches of software birthmark

R. no | Technique
[34] | Class invocation graph and state flow chart-based analysis of repackaging detection of mobile apps
[35] | State diagram and call tree-based comparison
[19] | Jaccard index, dice coefficient, cosine distance, and containment similarity metrics
[18] | Comparison through mining of semantic and syntactic software features
[36] | Estimating birthmark of software based on fuzzy rules
[37] | Dynamic birthmark based on API
[38] | Cosine similarity metrics
[39] | k-gram comparisons
[40] | Control flow graphs
[41] | Cosine similarity metric
Various famous digital libraries were searched to survey the literature on software birthmarks. Figure 1 shows the libraries and the number of total publications. These numbers indicate that most articles were published in the Springer library, followed by ScienceDirect, and so on.
Fig. 1 Libraries and number of publications (ACM, IEEE, ScienceDirect, Springer, Taylor and Francis, Wiley Online Library)
Fig. 2 Article type and number of publications
Fig. 3 Filtering process for identification of related papers
Figure 2 represents the article type and the number of publications. The figure depicts that most articles were published as conference papers, followed by journal articles, and so on. The reason behind this representation was to show the increase or decrease of research work in the area. Figure 3 shows the filtering process for the identification of related papers: papers were first screened by title, then by abstract, and finally by full contents.
4 Conclusions Software birthmarks are inherent qualities that can be utilized to spot software theft. Software birthmarks were created with the intention of identifying software piracy. Software piracy might be complete, partial, or minor. The owners of
the companies that produce software suffer enormous losses as a result of software piracy. Software programs’ extracted traits, generally referred to as birthmarks, can be compared to assess the ownership of the software. Birthmarks are used to identify a specific programming language’s executable and source code. By analyzing software birthmarks from various angles, researchers and practitioners can create new birthmark methods and processes on the basis of which piracy is effectively identified. The goal of the proposed study is to advance future research in the field by gaining an understanding of the specifics of software birthmarks, based on evidence gathered from the literature and the expertise supplied within. Conflict of Interest The authors declare no conflict of interest.
References 1. Gottschlich C (2012) Curved-region-based ridge frequency estimation and curved Gabor filters for fingerprint image enhancement. IEEE Trans Image Process 21(4):220–227 2. Thabit R, Khoo BE (2014) Robust reversible watermarking scheme using Slantlet transform matrix. J Syst Softw 88:74–86 3. Venkatesan R, Vazirani V, Sinha S (2001) A graph theoretic approach to software watermarking. In: 4th international information hiding workshop, Pittsburgh, PA, pp 157–168 4. Stern JP, Hachez GE, Koeune FC, Quisquater J-J (2000) Robust object watermarking: application to code. In: Information hiding, vol 1768, lecture notes in computer science. Springer Berlin Heidelberg, pp 368–378 5. Monden A, Iida H, Matsumoto K-I, Inoue K, Torii K (2000) A practical method for watermarking java programs. In: Compsac2000, 24th computer software and applications conference, pp 191–197 6. Cousot P, Cousot R (2004) An abstract interpretation-based framework for software watermarking. In: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on principles of programming languages, vol 39, no 1, pp 173–185 7. Nazir S et al (2019) Birthmark based identification of software piracy using Haar wavelet. Math Comput Simul 166:144–154 8. Kim D et al (2014) A birthmark-based method for intellectual software asset management. In: Presented at the 8th international conference on ubiquitous information management and communication, Siem Reap, Cambodia 9. Tian Z, Zheng Q, Liu T, Fan M (2013) DKISB: dynamic key instruction sequence birthmark for software plagiarism detection. In: High performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing (HPCC_EUC), IEEE 10th international conference on 2013, pp 619–627 10. Ma L, Wang Y, Liu F, Chen L (2012) Instruction-words based software birthmark. Presented at the proceedings of the 2012 fourth international conference on multimedia information networking and security 11. Chan PPF, Hui LCK, You SM (2011) JSBiRTH: dynamic javascript birthmark based on the run-time heap. Presented at the proceedings of the 2011 IEEE 35th annual computer software and applications conference 12. Lim H, Park H, Choi S, Han T (2009) A static java birthmark based on control flow edges. Presented at the proceedings of the 2009 33rd annual IEEE international computer software and applications conference, vol 01
13. Zhou X, Sun X, Sun G, Yang Y (2008) A combined static and dynamic software birthmark based on component dependence graph. Presented at the international conference on intelligent information hiding and multimedia signal processing 14. Park H, Choi S, Lim H-I, Han T (2008) Detecting Java class theft using static API trace birthmark (in Korean). J KIISE: Comput Practices Lett 14(9):911–915 15. Bai Y, Sun X, Sun G, Deng X, Zhou X (2008) Dynamic K-gram based software birthmark. Presented at the proceedings of the 19th Australian conference on software engineering 16. Myles G, Collberg C (2005) K-gram based software birthmarks. Presented at the proceedings of the 2005 ACM symposium on applied computing, Santa Fe, New Mexico 17. Nazir S, Shahzad S, Nizamani QUA, Amin R, Shah MA, Keerio A (2015) Identifying software features as birthmark. Sindh Univ Res J (Sci Ser) 47(3):535–540 18. Nazir S, Shahzad S, Nizamani QUA, Amin R, Shah MA, Keerio A (2015) Identifying software features as birthmark. Sindh Univ Res J (Sci Ser) 47(3):535–540 19. Tian Z, Zheng Q, Liu T, Fan M, Zhuang E, Yang Z (2015) Software Plagiarism detection with birthmarks based on dynamic key instruction sequences. IEEE Trans Softw Eng 41(12):1217– 1235 20. Liu K, Zheng T, Wei L (2014) A software birthmark based on system call and program data dependence. Presented at the proceedings of the 2014 11th web information system and application conference 21. Park D, Park Y, Kim J, Hong J (2014) The optimized grouping value for precise similarity comparison of dynamic birthmark. Presented at the proceedings of the 2014 conference on research in adaptive and convergent systems, Towson, Maryland 22. Park S, Kim H, Kim J, Han H (2014) Detecting binary theft via static major-path birthmarks. Presented at the proceedings of the 2014 conference on research in adaptive and convergent systems, Towson, Maryland 23. Tian Z, Zheng Q, Liu T, Fan M, Zhang X, Yang Z (2014) Plagiarism detection for multithreaded software based on thread-aware software birthmarks. Presented at the proceedings of the 22nd international conference on program comprehension, Hyderabad, India 24. Wang X, Jhi Y-C, Zhu S, Liu P (2009) Detecting software theft via system call based birthmarks. In: Computer security applications conference. ACSAC ’09. Annual, pp 149–158 25. Mahmood Y, Sarwar S, Pervez Z, Ahmed HF (2009) Method based static software birthmarks: a new approach to derogate software piracy. In: Computer, control and communication. IC4 2009. 2nd international conference on 2009, pp 1–6 26. Park H, Lim H-I, Choi S, Han T (2009) Detecting common modules in java packages based on static object trace birthmark (in English). Comput J 54(1):108–124 27. Park H, Choi S, Lim H, Han T (2008) Detecting code theft via a static instruction trace birthmark for Java methods. In: 2008 6th IEEE international conference on industrial informatics, pp 551–556 28. Park H, Choi S, Lim H, Han T (2008) Detecting java theft based on static API trace birthmark. In: Advances in information and computer security Kagawa, Japan. Springer-Verlag 29. Schuler D, Dallmeier V, Lindig C (2007) A dynamic birthmark for java. Presented at the proceedings of the twenty-second IEEE/ACM international conference on automated software engineering, Atlanta, Georgia, USA 30. Lu B, Liu F, Ge X, Liu B, Luo X (2007) A software birthmark based on dynamic opcode n-gram. Presented at the proceedings of the international conference on semantic computing 31. 
Choi S, Park H, Lim H-I, Han T (2007) A static birthmark of binary executables based on API call structure. Presented at the proceedings of the 12th Asian computing science conference on advances in computer science: computer and network security, Doha, Qatar 32. Tamada H, Nakamura M, Monden A (2004) Design and evaluation of birthmarks for detecting theft of java programs. In IASTED international conference on software engineering, pp 17–19 33. Myles G, Collberg C (2004) Detecting software theft via whole program path birthmarks. In: Zhang K, Zheng Y (eds) Information security: 7th international conference, ISC 2004, Palo Alto, CA, USA, 27–29 Sept 2004. Proceedings. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 404–415
34. Guan Q, Huang H, Luo W, Zhu S (2016) Semantics-based repackaging detection for mobile apps. In: Caballero J, Bodden E, Athanasopoulos E (eds) Engineering secure software and systems: 8th international symposium, ESSoS 2016, London, UK, April 6–8, 2016. Proceedings. Springer International Publishing, Cham, pp 89–105 35. Anjali V, Swapna TR, Jayaramanb B (2015) Plagiarism detection for java programs without source codes. Procedia Comput Sci 46:749–758 36. Nazir S, Shahzad S, Khan SA, Ilya NB, Anwar S (2015) A novel rules based approach for estimating software birthmark. Sci World J 2015 37. Daeshin P, Hyunho J, Youngsu P, JiMan H (2014) Efficient similarity measurement technique of windows software using dynamic birthmark based on API (in Korean). Smart Media J 4(2):34–45 38. Kim D et al (2013) Measuring similarity of windows applications using static and dynamic birthmarks. Presented at the proceedings of the 28th annual ACM symposium on applied computing, Coimbra, Portugal 39. Jang M, Kim D (2013) Filtering illegal Android application based on feature information. Presented at the proceedings of the 2013 research in adaptive and convergent systems, Montreal, Quebec, Canada 40. Jang J, Jung J, Kim B, Cho Y, Hong J (2013) Resilient structural comparison scheme for executable objects. In: Communication and computing (ARTCom 2013), fifth international conference on advances in recent technologies in 2013, pp 1–5 41. Chae D-K, Ha J, Kim S-W, Kang B, Im EG (2013) Software plagiarism detection: a graphbased approach. Presented at the proceedings of the 22nd ACM international conference on information & knowledge management, San Francisco, California, USA
The Mayan Culture Video Game—“La Casa Maya” Daniel Rodríguez-Orozco, Amílcar Pérez-Canto, Francisco Madera-Ramírez, and Víctor H. Menéndez-Domínguez
Abstract One of the most important cultures in Central America is the Mayan Culture, whose preservation is essential to keep its traditions alive. In this work, a video game with a Mayan environment as its scenario is proposed. The objective is for the player to acquire the knowledge needed to delve into this beautiful culture and to obtain new experiences as a Mayan individual. The video game consists of a tour of several places with Mayan traditions where the user can collect coins to earn points and learn important information about the civilization. We want the user to learn through entertainment technology and feel engaged by the Mayan culture. Keywords Mayan culture · Educational video game · Computer graphics · Gamification
D. Rodríguez-Orozco · A. Pérez-Canto · F. Madera-Ramírez · V. H. Menéndez-Domínguez (B) Universidad Autónoma de Yucatán, Mérida, México e-mail: [email protected] D. Rodríguez-Orozco e-mail: [email protected] A. Pérez-Canto e-mail: [email protected] F. Madera-Ramírez e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_28
1 Introduction Currently, the Mayan Culture in the Yucatan Peninsula (Mexico) has been losing much of its presence over the years, mainly in the cities, which has motivated several studies to identify the possible causes [20]. In this sense, the role that education plays in promoting and preserving this culture is undeniable. Teaching methods proposed in many schools can be improved, since reading, the use of books, and extensive research tasks are not a habit in Mexican society; students are thus losing interest in knowing their roots and cultural aspects and are failing to transmit the customs and traditions that we have inherited from the Mayan culture. Nevertheless, technological innovations, such as virtual scenarios, wireless mobile devices, digital teaching platforms, and virtual and augmented reality, increase students’ interest and motivation, as well as their learning experience [17, 21]. In this sense, due to technological innovations in recent years, video games have gained great importance among the young population, as can be seen in some conferences (https://gdconf.com, https://www.siggraph.org/). We decided to promote a playful and vibrant strategy using the latest generation of software to make a cultural video game, as we believe it is an effective way to connect with people who are interested in living new experiences through our culture, and it will allow the transmission of the wonderful customs and traditions offered in Yucatan, a Mexican state. So why is it important to promote and preserve the Mayan culture? Because it represents the link and the teachings that our ancestors left us to maintain our identity, since many aspects related to the Mayan language, traditions, and customs explain much of our personality and allow us to share new ideas with other cultures. For this purpose, the steps of the construction of the video game “La Casa Maya” are described, covering important aspects such as the architecture of Mayan buildings, the geographic location, the distribution of the Solar Maya, and some important utensils created by the culture.
2 Development of the Mayan Culture Video Game 2.1 Preliminary Investigation The Mayan culture contains many important aspects that reflect its multiculturalism, and for many people, this will be their first contact with the Mayan culture; thus, it was decided to explain in a simple way many of the aspects that make up the Mayan culture. We placed small posters containing the necessary information that allows the player to understand the meaning, in such a way that the user can enjoy the video game without feeling pressure to learn everything immediately. Some of the main concepts considered are the following: “The Huano Houses” that are structures where the Mayas used to take refuge, “The Pib Oven” where food is prepared, or “The Hamacas” that are used to rest. This project revolves around how the Maya managed to survive the threatening nature found in Yucatan, so it was decided to focus on the portions of land called “solares”, which are related to a house where the Mayas feel safe (Fig. 1).
2.1.1 Background Information
We collected information about the existence of video game titles that tell stories about the Mayan culture as follows.
Fig. 1 Screenshots of the gameplay, the house’s indoor (left), information poster in the ground (right)
• Age of Empires: This is a classic video game that lets you rebuild the power of the Mexica from scratch: you will be the power behind the throne, in charge of building the city, and you will also define the speed of its development as a society [26]. As mentioned, the video game deals only marginally with the Mayan culture, so it does not touch on important topics such as the traditions and experiences of the native Mayans in Yucatan, and its focus is not purely educational. • Mictlan: An Ancient Mythical Tale: Mictlan is set in a fantastic world heavily influenced by the pre-Columbian Mesoamerican cultures. Users immerse themselves in a dark and varied world, exploring detailed locations with incredible narrative depth while experiencing the rich atmosphere of a hidden past [16]. The approach of the game is not entirely based on real facts, and its audience will be reduced due to the amount of violence the game conveys through its conquest-related themes. The main purposes of our video game are to form an educational and intuitive environment that allows us to reach all types of audiences and that offers truthful and useful information as a support resource to preserve the Mayan culture. • Naktan: This is a 3D adventure game that aims to recreate scenarios, characters, and part of the Mayan culture, with Akbal as the main character, a 12-year-old boy who has to search for his family and, on the way, learns to become a warrior, understanding all the mystery that surrounds him [13]. Naktan would probably have been a great video game reflecting the Mayan culture, but unfortunately it could not be finished due to the lack of the economic resources required.
2.1.2 The Mayan Solar
Traditionally, the Mayan Solar is the property or plot of land, between 250 and 1,000 m2 (see Fig. 2), where most of the activities of the Maya family in Yucatan take place, separated from the outside by an albarrada (a wall of stones). It consists of the Casa Maya, some small buildings, and an open area that is delimited by the albarrada [4]. The Mayan Solar on the Yucatan peninsula can be divided into two zones: intensive use and extensive use [10]. The intensive use zone includes the space near
Fig. 2 Photo of a Mayan Solar [3]
the house, where the laundry room, kitchen, water tanks, and animal housing (e.g., chicken coops) are located. Fruit trees, vegetables, and ornamental and medicinal plants are also cultivated [10]. The extensively used area includes secondary vegetation that is utilized as firewood, construction material, and organic fertilizer [5, 10, 18]. There are many buildings on each Solar; the main building (traditionally located in the center of the Solar) is the dwelling house (naj, in the Maya language), which includes the Maya house and kitchen (k’oben, in the Maya language), and its roof is built with guano palms. Surrounding the main building are a seedbed (k’anché), an elevated structure for storing corn cobs and other medicinal plants, a well (ch’e’en) for storing water in places where there are no cenotes, and a batea (nukulíp’o’o) [4].
2.2 General Idea About the Subject of the Video Game The video game is classified as a serious game genre since teachers can engage their students with educational content, so that students learn while having fun [9]. The video game is aimed at all audiences who wish to learn about Mayan Culture, but especially at students between the ages of 8 and 25 years old. It is developed in first person view to feel a close experience of being a Mayan traveler looking for knowledge. The protagonist is Itzamná, a wise and all-powerful Mayan god who created the world and all that inhabits it. He has promised to share his knowledge of the culture through a tour he has planned for the player. At the beginning of the game, Itzamná creates coins over the map to guide the player to interesting places with valuable information that will complement the knowledge to complete the tour. The video game is developed to be entertaining and eye-catching, so that players stay as long as possible learning about the Mayan culture; immersion is necessary, as it incites the player to know the game, commits him to play it constantly, gives him a fun experience, and gives him the ability to concentrate on the game [1]. One way
Fig. 3 Two map perspective visualization, from a top-view camera
to achieve this immersion is with “presence”, defined as the feeling of “being there”, feeling a virtual space as if it were real [14]. To achieve this feeling, the player can explore the entire map freely and can observe the details of each element, in such a way that they become familiar and stop perceiving the game as a virtual space and feel that they are in an authentic Mayan house (Fig. 3).
2.3 3D Meshes and Models 3D models were created in Blender (https://www.blender.org/) and Cinema 4D (https://www.maxon.net/es/cinema-4d), using tools that facilitate the texturing and deformation of the models. The workload of the creation of each mesh, the workload was divided into two segments, the modeling of small meshes that we would call “props”, and the elaboration of larger meshes (from 538 to 28,128 polygons approximately) that would form the architectural base of the map.
2.3.1 Props Models
3D models in video games are known as “props” and contain a small number of polygons compared to the architectural models. Their function is generally aesthetic and provides a more natural and splendid environment to the map, and they take the player to an immersion that enhances the gaming experience. Models were made with the original scale of each product using different visual references to carry out their modeling (Fig. 4). • Dishes and Ceramics: For the Mayan culture, utensils are more than containers to place food, some of them are used to measure the amount of food, while others are side dishes [15]. • Metate: A rectangular-shaped stone tool for grinding food, especially harvested grains such as maize [24]. The symbolic value of this element is very great since maize is important for the Mayans.
Fig. 4 Some of the 12 props employed in the game
• Portable Furnace: Also called “comal”, it is a small oven where the firewood is introduced through a hole and on whose top foods such as tortillas are cooked. It is useful to take into the rainforest for outdoor cooking [15]. • Tables: There is a great variety of tables, and large stones are employed to support them. • Wooden Chair: At mealtime, Mayans would gather in the kitchen, which also served as a social area, and sit on wooden logs. These wooden trunks were also used by the women who prepared the maize dough, as it was a very long and tiring process.
2.3.2 Architectural Models
The architecture of the video game map is the spatial or cartographic distribution of the buildings, trees, and all kinds of objects that captivate the player. The project focuses mainly on the ecosystem that the state of Yucatan provides, so through many images and references, we were able to create a similar environment. For the setting of the map, 10 structures were modeled to make the game experience more realistic (Fig. 5). • The albarrada is a wall made of stones that delimits a plot of land, usually used in the house lots to determine its size. It serves as the limit of the map. • The huano house is one of the indispensable structures of the Mayas, since it is here where one sleeps, cooks, washes, lives, and worships. They have a measure
Fig. 5 Some architectural models utilized in the video game
called “vara”, which the Mayas used to take measurements related to the head of the family [6]. Living in a Mayan house gives a sensation of relief: because of the nature of its location, this type of structure provides coolness and shade that counteract the heat of the Yucatan Peninsula, and the surrounding vegetation supplies the materials for this type of construction, which are found in the region [23]. • Hamaca: Of Caribbean origin, the “hamaca” (from hamack, which means tree in Haitian) is the place where the Mayas rested and slept in their homes. Originally made from the bark of the hamack tree, they were later made from mescal or henequen, as these provided greater comfort [22]. • The Mayan “Comal” consists of three stones (tenamascles) and a clay disc on which the food is placed, and a fire is lit below. This is where the tortillas are cooked [8].
Level Map Design
Many articles on the Yucatan ecosystem were reviewed, covering the topography and vegetation of the area. No animals were included, but there are plans to incorporate endemic species. Yucatan has a dry and semi-dry climate, which implies temperatures of approximately 36 °C (96.8 °F) [11]. This type of climate allows the planting and harvesting of various crops, such as beans, corn, oranges, and henequen. It is no coincidence that the Mayan culture was dedicated to harvesting such crops for its daily diet.
3 Experimentation and Results

3.1 Usability Analysis of the Video Game

In this section, the first usability test of the video game is presented. It is important to note that the video game is still in the development phase and that this is the first approach with a real audience, which tested the initial demo version of the game. By usability, we refer to a quality attribute that establishes how easy and useful a system is to use, assessing whether users interact with the system in the easiest, most comfortable, and most intuitive way [7]. The aim of the experiment was to record the opinion of a group of people about the video game interface in relation to its usefulness and ease of use, employing the SUS (System Usability Scale) survey [2]. The SUS tool has been used in system and application usability studies in both industry and academia to compare systems [25], supporting its effectiveness as a tool for measuring the perceived usability of a system. According to Tullis and Stetson, with a small sample group it is possible to obtain reliable results on the perceived ease of use of a system [25], and it is possible to find 85% of the usability problems of a product with only 5 people, so the feedback from a test with a sample of 5 is enough to identify software problems to fix [12].

The participant group consisted of a representative sample of 5 male Mexican students from the Faculty of Mathematics at UADY (Universidad Autonoma de Yucatan), whose training area is Engineering and Technology and whose ages ranged between 19 and 23 years. These students have maintained basic contact with the Mayan culture in the state of Yucatan, and they have between 5 and 10 years of experience in the use of video games. The experiment was conducted online. First, each participant responded to a survey (https://forms.gle/f7ow3VpbTm33PCXY9) with general information about their background and experience with Mayan culture and video games. Next, a link was sent to each participant to download and install the video game demo, and finally a brief opinion was requested about possible improvements to the game and the experience obtained.

The SUS survey is a questionnaire with 10 items that users score according to their level of acceptance, using a Likert scale from 1 (strongly disagree) to 5 (strongly agree). The algorithm described by Brooke [2] in SUS: A quick and dirty usability scale was used to obtain a total score from 0 to 100. A score above 70 categorizes the usability of the system as Acceptable, above 85 as Excellent, and equal to 100 as the best imaginable. A graph with the scores obtained is shown in Fig. 6. The average of the evaluation results is 78.5, an acceptable score because it is above 70, which the SUS classifies as "good" since it is above the average of 50 [2]. From the results obtained, we can conclude that it is necessary to improve aspects related to the gameplay and user interface, as well as certain technical details for the proper functioning of the game. Some of the participants' comments highlight that the video game is quite good and that, after playing, they can differentiate the essential aspects of the Mayan Culture. However, they would like to see more mechanics implemented to make the game more challenging. We consider that the results are good for the video game demo.
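For readers who wish to reproduce the scoring step, the following minimal sketch implements Brooke's SUS rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5). The five answer sheets are hypothetical placeholders, not the study's raw responses, so the printed mean will not match the 78.5 reported above.

```python
# A minimal sketch of Brooke's SUS scoring rule [2]; the answer sheets below are
# hypothetical placeholders, not the study's raw data.

def sus_score(responses):
    """Compute one SUS score (0-100) from ten Likert responses on a 1-5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # indices 0, 2, ... are items 1, 3, ...
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

participants = [                      # hypothetical answer sheets for five users
    [4, 2, 4, 1, 5, 2, 4, 2, 4, 2],
    [5, 1, 4, 2, 4, 2, 5, 1, 4, 2],
    [4, 2, 3, 2, 4, 3, 4, 2, 4, 2],
    [3, 2, 4, 2, 4, 2, 4, 3, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 2, 5, 1],
]
scores = [sus_score(p) for p in participants]
print(scores, "mean:", sum(scores) / len(scores))
```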
Fig. 6 SUS test results of the experiment carried out
Even so, we will continue working to increase the video game quality and to improve the project by implementing more missions and map variations to cover more themes of the Mayan culture. If the reader wishes to play the final product, we cordially invite them to download the video game "The Mayan Solar—(El Solar Maya)" at the following link: https://bit.ly/3mi2gTR. The minimum hardware requirements are listed at the link. If your computer does not meet the minimum requirements, we also invite you to watch the video presentation of the final product, "The Solar Maya Gameplay Walkthrough", at https://bit.ly/3xoVUZp.
4 Conclusion

The purpose of this article was to present the creation of the video game "The Mayan Solar" to attract people who want to learn about the customs and traditions of the Mayan culture. The Mayan culture plays a very important role in the personality of many Mexicans because of the great expansion and importance this culture had in the south of the Mexican Republic and in neighboring countries of Central America. Great vestiges and traditions can still be found in the populations close to the towns that remain in contact with the native people. We will continue working on updates to the video game so that it can compete with other products on the market and thus support the transmission of culture and the preservation of traditions and customs. Further experiments must also be carried out with more participants from different backgrounds.
References 1. Armenteros M, Fernández M (2011) Inmersión, presencia y fow. Contratexto 0(019):165–177 2. Bangor A, Kortum PT, Miller JT (2009) Determining what individual SUS scores mean: adding an adjective rating scale. J Usability Stud 4(3):114–123; Brooke J (2004) SUS—a quick and dirty usability scale. Usability Eval Ind 3. Brown A (2009) Flickr, Solar Maya (recovered on june 6, 2022). https://www.flickr.com/pho tos/28203667@N03/4225464260 4. Cabrera Pacheco AJ (2014) Estrategias de sustentabilidad en el solar maya Yucateco en Mérida, México. https://rua.ua.es/dspace/bitstream/10045/34792/1/ana-cabrera.pdf 5. Castaneda Navarrete J, Lope Alzina D, Ordoñez MJ (2018). Los huertos familiares en la península Yucatán. Atlas biocultural de huertos familiares México (recovered on june 6, 2022). https://www.researchgate.net/publication/328103004_Los_huertos_familiares_en_la_ peninsula_de_Yucatan 6. Chavez ONC, Vázquez AR (2014) Modelo Praxeológico Extendido una Herramienta para Analizar las Matemáticas en la Práctica: el caso de la vivienda Maya y levantamiento y trazo topográfico. Bolema: Boletim de Educação Matemática 28(48):128–148 7. Dumas JS, Reddish JC (1999) A practical guide to usability testing. Intellect Rev(1) 8. Escobar Davalos I (2004) Propuesta para mejorar el nivel de aceptación de las preparaciones culinarias tradicionales de la sierra ecuatoriana, aplicadas a la nueva cocina profesional. Universidad Tecnológica Equinoccial
9. Fuerte K (2018) ¿Qué son los Serious Games? Instituto para el Futuro de la Educación, Tecnológico de Monterrey (recovered on august 27, 2022). https://observatorio.tec.mx/edunews/que-son-los-serious-games 10. Herrera Castro ND (1994) Los huertos familiares mayas en el oriente de Yucatán. Etnoflora yucatanense, fascículo 9. Universidad Autónoma de Yucatán 11. INEGI (2018) Información por entidad, Yucatán, Territorio, Relieve (recovered on june 6, 2022). https://cuentame.inegi.org.mx/default.aspx 12. Lewis JR (2014) Usability: lessons learned … and yet to be learned. Int J Hum-Comput Interact 30(9):663–684 13. MartinPixel (2017) Naktan, un videojuego desarrollado en México que busca difundir la cultura maya, (recovered on june 6, 2022) https://www.xataka.com.mx/videojuegos/naktan-un-videoj uego-desarrollado-en-mexico-que-busca-difundir-la-cultura-maya 14. Mcmahan A (2003) Immersion, engagement, and presence: A method for analyzing 3-D video games. Video Game Theory Reader 67–86 15. Mexico Documents (2015). Utencilios Mayas. vdocuments.mx (recovered on june 6, 2022). https://vdocuments.mx/utensilios-mayas1.html 16. Mictlan: An Ancient Mythical Tale (2022). Steam, indie videogames, mictlan: an ancient mythical tale (recovered on june 6, 2022). https://store.steampowered.com/app/1411900/Mic tlan_An_Ancient_Mythical_Tale/?l=spanish 17. Nincarean D, Alia MB, Halim NDA, Rahman MHA (2013) Mobile augmented reality: the potential for education. Procedia Soc Behav Sci 103:657–664 18. Ordóñez Díaz MDJE (2018) Atlas biocultural de huertos familiares en México: Chiapas, Hidalgo, Oaxaca, Veracruz y península de Yucatán 19. Osalde A (2022) Las Albarradas: el Legado de Apilar Piedra Sobre Piedra, Yucatán Today, (recovered on march 15, 2023). https://yucatantoday.com/las-albarradas-el-legado-de-apilarpiedra-sobre-piedra/ 20. Ramírez Carrillo LA (2006) Impacto de la globalización en los mayas yucatecos. Estudios de cultura maya 27:73–97 21. Roussos M, Johnson A, Moher T, Leigh J, Vasilakis C, Barnes C (1999) Learning and building together in an immersive virtual world. Presence 8(3):247–263 22. Sánchez ARP, Contreras PT (2018) Hamacas artesanales como producto de exportación. Jóvenes En La Ciencia 4(1):1272–1277 23. Sánchez Suárez A (2006) La casa maya contemporánea: Usos, costumbres y configuración espacial. Península 1(2):81–105 24. Searcy MT (2011) The life-giving stone. University of Arizona Press, Ethnoarchaeology of Maya metates 25. Tullis T, Albert W (2016) Measuring the user experience: collecting, analyzing, and presenting usability metrics. Morgan Kaufmann, Amsterdam 26. Xiu (2020). Matador network, 6 videojuegos basados en la época prehispánica (recovered on june 6, 2022). https://matadornetwork.com/es/videojuegos-sobre-la-epoca-prehispanica/
Impact of Decentralized and Agile Digital Transformational Programs on the Pharmaceutical Industry, Including an Assessment of Digital Activity Metrics and Commercial Digital Activities

António Pesqueira, Sama Bolog, Maria José Sousa, and Dora Almeida

Abstract Developing digital transformational and measurement capabilities within the pharmaceutical industry is considered one of the most important factors for delivering commercial excellence and business innovation. Digital transformational programs (DTP) are examined and evaluated from different perspectives, including how they are used and how they generate value for pharmaceutical companies. From March 2nd through April 18th, 2022, 315 pharmaceutical professionals and leaders were surveyed on the impact of decentralized and agile digital transformational programs on the pharmaceutical industry via an online structured questionnaire with closed questions, including an assessment of digital activity metrics and commercial digital activities. This paper conducted assessments with various assumptions about innovation, the relevance of decentralized and agile initiatives, and the impact on commercial excellence, to gain insight into the complexity of these assumptions and to evaluate the overall value of digital empowerment and knowledge increase. The results and the comparable questionnaire analyses show the importance of using new decentralized digital technologies, metric understanding, and new DTP that enhance the ability of industry professionals to make more effective diagnoses, perform better digital procedures, and access appropriate information. The statistical analysis indicates that the findings relate to the impact and innovation created by DTP on product launch strategies, as well as the overall impact on innovation generation within companies.
A. Pesqueira (B) · M. J. Sousa University Institute of Lisbon (ISCTE), Lisbon, Portugal e-mail: [email protected] M. J. Sousa e-mail: [email protected] S. Bolog University of Basel, Basel, Switzerland e-mail: [email protected] D. Almeida Independent Researcher, Lisboa, Portugal © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_29
Keywords Digital research findings · Digital metrics · Commercial excellence · Metrics insights · Innovation
1 Introduction

Despite not being a new topic in the pharmaceutical context, digital transformational programs (DTP) can bring benefits to different stakeholders in the commercial field, although not all internal operations and functions benefit from them directly. The pharmaceutical industry is constantly creating new applications for analyzing and displaying the big data available to all stakeholders in the health system in a powerful way, which automatically creates opportunities for DTP. These applications can be a driving force of change in the sector, particularly with the use of digital mobile data [2, 10].

Different stakeholders within pharma are responsible for different data components: healthcare providers for delivering better healthcare services, researchers and developers for new products aimed at improving quality of life, and other stakeholders for health-related processes and the sharing of digital healthcare data [7]. Personal health data are sensitive; therefore, ethical and legal questions must be considered when analyzing these data, especially with the use of DTP. The use of decentralized technologies like blockchain and agile digital strategies helps to optimize decision-making and business strategy execution, but the most effective and efficient methods are not always utilized [1].

In this study, we examine how digital transformation is impacting and influencing the pharmaceutical industry, with a key focus on commercial functions. Part of the selected methodology is also a better understanding of DTP innovations and of the factors that influence digital adoption by pharmaceutical companies, in an attempt to answer the research questions below. This study introduces new research areas, such as assessing the impact and influence of decentralized and agile DTP on the pharmaceutical industry, and also provides a better understanding of the key metrics, learning initiatives, and digital activities that are deemed relevant. The primary research questions are as follows: Question 1—Do new decentralized and agile digital transformational programs impact brand strategy, commercial execution, and new product launches? Question 2—Which are the most important factors that facilitate digital transformations? Question 3—What are the relevant metrics and digital activities that are part of digital transformation?
2 Literature Review

Managing and optimizing digital channels is now a prerequisite for pharmaceutical companies, as is focusing on substantial investments clearly connected with
DTP. Additionally, COVID-19 has caused significant changes in how pharmaceutical companies interact with their market and stakeholders, altered internal team dynamics, and forced several companies to change their customer experience strategies [6]. The pharmaceutical industry is developing different DTP to implement decentralized or web-based solutions, as well as applying agile models and working concepts to engage with different stakeholders [3]. Research and development in pharmaceuticals is utilizing artificial intelligence to discover new drugs, aid in clinical trials, and improve supply chain management. The use of digital communications to disseminate educational materials and wellness advice is increasing among pharmaceutical companies seeking to better engage their patients [11].

Nevertheless, compliance oversight continues to be a time-consuming process for most large pharmaceutical companies. It is further complicated by the fact that sales teams must be trained in an engaging and informative manner, reports must be produced for the board of directors and senior management, interactions with healthcare professionals (HCPs) should be monitored and audited, and, when necessary, violations should be investigated and remedied [9]. The coordination of compliance across legal, human resources, sales, and marketing departments is a critical aspect of this process [8]. Technology solutions such as knowledge management software, content management, workflow tools, and other tools can help with this digital transformation effort.

The purpose of DTP is to modernize different operational programs, such as commercial programs, using technology to streamline operations, improve efficiency, automate repetitive tasks, and engage relevant stakeholders more proactively. Among the areas for which automated workflows and timely completion will prove beneficial are employee onboarding, sales training, and the monitoring of product launches [5]. However, the pharmaceutical industry is in most cases still missing the wider picture, considering that digital transformation is still being redefined across different companies. Healthcare providers, key opinion leaders, regulators, and product or public decision-makers all have unique needs, biases, and preferences. A successful DTP content strategy, analytics, and digital experiences can only be accomplished when there is a focus on managing and optimizing digital channels and on concrete investments with a clear return on investment [3].

Thus, DTP are often regarded as having vast benefits, yet they are sometimes not fully understood by a broader audience. Among their benefits are automation and the improvement of the quality levels of various operations. Those programs should also be a means of increasing agility, scaling different processes and results, and reducing costs, without neglecting the integration of systems and business processes [4]. As part of the implementation process, different pharmaceutical companies are looking for every opportunity to blend processes and data consistently, avoiding information silos and manual intervention [4].

The ability of pharmaceutical sales and marketing to identify the target audience and address their unmet needs through the building of local networks and relationships with HCPs is quite important. To meet this need, different companies
are forced to develop a more effective marketing mix that exploits the features and benefits of each new channel and go-to-market strategy [9]. Technology has driven rapid changes over the last few years, and taking advantage of these advancements to promote shared networks and external partnerships has become critical. Decentralized digital technologies are early-stage technologies that allow the storage and exchange of information in a decentralized and secure manner. They can minimize friction, reduce corruption, increase trust, and empower the users of different systems by providing an effective tool for tracking and transactions. Many decentralized digital technologies, such as blockchain, are still nascent, but they have the potential to transform and disrupt the healthcare and life sciences industries and to create new wealth. Key challenges, however, include the lack of interoperability, security threats, centralization of power, and a reluctance to experiment due to recent overhype [9].
3 Methodological Approach

Part of this research work was designing an online questionnaire containing questions on all proposed topics. A survey sent to 315 pharmaceutical leaders and professionals of different ranks asked which digital activities they considered relevant and which influence and impact factors should be considered to enhance the value of an overall DTP and to implement decentralized and agile digital initiatives. In this section, we describe the methodology used, together with other key metrics that helped draw meaningful conclusions and better understand the relationships between key variables during the questionnaire data analysis.

Since the purpose of the study is to assess the impact and influence of decentralized and agile DTP on brand strategy, innovation, and commercial excellence, a quantitative method was deemed appropriate. The selected methodology also includes a better understanding of digital innovations and of the factors that influence digital adoption by pharmaceutical companies, in an attempt to answer the stated research questions. It was the most appropriate strategy for the investigation mainly because of its ability to assess and understand different digital transformation characteristics, as well as to provide a more detailed understanding of the factors influencing digital transformation in the pharmaceutical industry.
3.1 Questionnaire Design and Variables Selection

After the literature review, we created a structured questionnaire following the methodological approach already described.
From March 2nd to April 18th, 2022, an online survey was conducted; the total number of respondents across all global regions was 315, all working in the pharmaceutical sector. In addition, a preliminary survey was administered to external consultants and experts to gauge their views on the designed questions and research methodology. The study used a self-administered survey developed by the authors in Google Forms (Google LLC, Mountain View, CA, USA), based on previous studies and translated and validated by an expert committee consisting of two specialists, a statistician, and the authors. The questionnaire was administered in English and included demographic questions as well as questions about digital transformation impact, investments, current applications, and future applications.

The research strategy was to distribute a link to the questionnaire by e-mail and phone messages, describing the purpose of the study and inviting additional respondents drawn from the networks of the initially identified respondents. Respondents were asked to complete the questionnaire anonymously, to respect their right to privacy, and their affiliated organizations were not identified in the database. To validate the consistency of the questionnaire, the responses of ten respondents, a representative sample of the study population, were analyzed before the survey was distributed. We selected individuals from companies around the globe with a confidence level of 95% (and p = q = 0.5) who have an interest in digital transformation, relevant experience or knowledge of transformative technologies such as blockchain or artificial intelligence, and a focus on the life sciences industry.
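The sample-size reasoning behind a 95% confidence level with p = q = 0.5 follows the standard formula n = z²pq/e². The paper does not state a target margin of error, so the 5% value in the sketch below is an assumption for illustration; the second function instead derives the margin of error implied by the achieved sample of 315.

```python
# Sketch of the sample-size arithmetic for a 95% confidence level with p = q = 0.5.
# The 5% margin of error is an assumed illustration; the paper does not report one.
import math

Z_95 = 1.96   # z-score for a 95% confidence level
P = Q = 0.5   # maximum-variance assumption stated in the text

def required_sample_size(margin_of_error, z=Z_95):
    """Infinite-population sample size: n = z^2 * p * q / e^2."""
    return math.ceil(z ** 2 * P * Q / margin_of_error ** 2)

def implied_margin_of_error(n, z=Z_95):
    """Margin of error implied by an achieved sample of size n."""
    return z * math.sqrt(P * Q / n)

print(required_sample_size(0.05))              # 385 respondents for a 5% margin
print(round(implied_margin_of_error(315), 3))  # about 0.055 for the achieved 315
```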
4 Results and Analysis

4.1 Descriptive Analysis

The following figures summarize the descriptive information regarding the data collection and the corresponding characteristics of the study sample. The sample includes pharmaceutical professionals from all over the world, as seen in Fig. 1. In terms of the respondents' organizations, vaccines are the most prevalent therapeutic sector with 66 (21%) respondents, followed by oncology with 19% of respondents. The other therapeutic business areas are represented as percentages in Fig. 2. The seniority level of the sample is quite representative: 40% of the respondents hold the job title of vice-president or senior vice-president, and 31% hold the title of associate director, director, or another senior management position, as shown in Fig. 3. When asked about the driving factors for implementing decentralized and agile programs, the surveyed companies report that they are implementing digital
Fig. 1 Sample region or market where the affiliated organization is primarily located
Fig. 2 Therapeutic areas of the respondents' working organizations
decentralized and agile solutions to support excellent product launches (34%), accelerate the engagement of key opinion leaders (17%), and enable faster time to market (15% of all responses), as shown in Fig. 4. In terms of sales and marketing effectiveness, the following digital metrics were highlighted for their importance and business relevance: 18% relate to the tracking and success of digital initiatives against initial planning and budget, while 17% relate to digital success by segment and customer group, pricing strategy, and competitor positioning against pricing strategies, as shown in Fig. 5.
Fig. 3 Level of seniority and job title
Fig. 4 Driving forces from the organizational implementation of decentralized and agile digital programs
The final figures present a graphical analysis showing the combination of seniority and digital factors in deciding on new skills or training needs in new digital development plans. Here we can see that for VPs/SVPs the most important factors are new trends in the field, competitive intelligence, or marketing research, and then the influence of senior management or other organizational leaders, which is an interesting factor as the feedback comes from the VPs/SVPs level itself. For the middle level of organizational decision-making, we can see that the digital vision or understanding of the company’s vision and mission to achieve digital
Fig. 5 Most relevant metrics for effectiveness and performance
success is one of the most critical factors in deciding on new capabilities or training areas in new digital programs; we also see that, for the executive level, competitive intelligence and marketing research are very important (as shown in Fig. 6). In terms of the percentage of time spent on digital-related activities and the factors that determine which digital skills or training are included in digital development plans, professionals who spend more than 60% of their time on DTP and projects believe that competitive intelligence or market research, new trends in the field, executive influence, and then understanding of the company's vision and mission are the deciding factors (Fig. 7). The professionals who have spent more time on digital issues consider that the training formats most useful for learning about digital innovation in the pharmaceutical industry are intra-organizational master classes, followed by online training programs from academies or universities, and finally live training or certificate programs.
4.2 Statistical Analysis

To answer the defined research questions, this section introduces the key principles of the comparative and relational analyses employed as the basis for the statistical hypotheses, in order to better understand the relationships between variables and to analyze the most relevant correlations.
Fig. 6 Level of seniority and important factors in deciding new skills or training needs in new digital development plans
To determine whether there is a significant difference between our key variables and the controls across different treatments, we first conducted several univariate calculations. In a classical hypothesis test, only one effect from the treatment group is considered to be responsible for the observed effects. Analysis of the statistical data from the dependent variables showed that both decentralized/agile digital influence (INFL) and digital business impact on innovation (IMPCT) are connected with the independent variables. Analyzing INFL in connection with the grouping variables of organizational innovation capacity (INNO) and the alignment of digital transformation to brand strategy and product launch strategies (BRANDLAUNCH), the independent-samples Student's t-test indicates clear levels of influence and shows that DTP can influence INNO. To extract as much information as possible, Welch's t-test was also applied; it standardizes the difference in means by the square root of the summed variances of the sample means. As a final step in selecting the correct t-test, the tests were performed on all cases with valid data, and the p-values were interpreted in terms of the difference in sample means. Testing scale reliability with Cronbach's alpha, both INFL and IMPCT exhibited positive reliability coefficients (alpha = 0.721 and 0.915, respectively).
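As an illustration of the reliability and mean-difference checks described above, the sketch below computes Cronbach's alpha from its standard definition and runs Welch's t-test with SciPy. The item matrices and the two-group split are simulated stand-ins, since the raw questionnaire data are not published, so the printed coefficients will not reproduce the 0.721 and 0.915 reported here.

```python
# Sketch of the reliability and mean-difference checks, on simulated data only.
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(0.0, 1.0, size=315)                    # shared latent trait
infl_items = np.clip(np.round(3 + latent[:, None]
                              + rng.normal(0.0, 0.7, (315, 4))), 1, 5)
impct_items = np.clip(np.round(3 + latent[:, None]
                               + rng.normal(0.0, 0.5, (315, 5))), 1, 5)
print("alpha(INFL)  =", round(cronbach_alpha(infl_items), 3))
print("alpha(IMPCT) =", round(cronbach_alpha(impct_items), 3))

# Welch's t-test comparing INFL totals across two hypothetical INNO groups
infl_total = infl_items.sum(axis=1)
inno_group = rng.integers(0, 2, size=315)                  # hypothetical grouping
t_stat, p_value = stats.ttest_ind(infl_total[inno_group == 0],
                                  infl_total[inno_group == 1], equal_var=False)
print("Welch t =", round(t_stat, 3), "p =", round(p_value, 3))
```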
Fig. 7 Percentage of time in digital-related activities and most decisive factors in deciding for digital skills or training to be included in digital development plans
As evidenced by the high scores obtained for both dependent variables, the internal consistency scale evaluates them positively. When this indicator is equal to or greater than 0.80, as was the case for the IMPCT variable, it is generally considered a good measure of internal consistency, being well above 0.60, the value most commonly accepted in exploratory studies.

A two-way table (also called a contingency table) was also used to analyze categorical data, as more than one categorical variable was involved. The purpose of this type of test was to determine whether there were significant relationships between different categorical variables. The test examined whether the COVID-19 pandemic impacted the organization's digital strategy (COVID) and the digital presence within the organization, with clear results for innovation (DPRE—Digital Presence Indicator). To test the hypothesis, in line with the previously defined research hypothesis, the critical value of the chi-square statistic at the chosen level of significance was calculated and compared with the test statistic. Based on the results, it was possible to accept our hypothesis, which led us to conclude that COVID and DPRE are related (p = 0.569), meaning that COVID not only positively impacted the organizational digital strategy but also brought clear results in terms of increased business innovation.
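A minimal sketch of the contingency-table test follows, using SciPy's chi2_contingency. The cell counts are hypothetical; only the table layout (COVID impact by digital-presence level) mirrors the variables described above, so the resulting statistic will differ from the reported p = 0.569.

```python
# Sketch of the two-way (contingency) table test with hypothetical cell counts.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: COVID impacted the digital strategy (no / yes)
# Columns: digital presence indicator DPRE (low / medium / high)
observed = np.array([
    [18, 25, 22],
    [70, 95, 85],
])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
# The decision compares p_value with the chosen significance level (equivalently,
# chi2 with the critical value for dof degrees of freedom).
```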
5 Discussions

These results mean that the two subsets of variables (COVID and DPRE) provide statistically significant evidence of an association with the dependent variable (IMPCT). A further analysis examined the difference between the means of multiple groups through ANOVA, where our continuous dependent variable (IMPCT) allowed us to test our INNOC variable group and answer the second research question; the levels of the independent variables were included in the analysis, together with the descriptive statistics for each combination of levels of the independent variables. The results showed an association among the variables of interest, specifically between the variables of interest and the independent variable.

The final step of our analysis was to apply a linear mixed model, allowing us to explore the relationships between the variables of interest, including their interactions. We found that projects related to DTP have an impact on commercial success. Here, there was a clear association (df = 1, 0.26, F = 0.114, and p = 0.864) between INFL and IMPCT on commercial excellence outcomes in innovation and performance (COMEX). The analysis used sum contrast coding for categorical predictors, which allowed for better interpretability of models with interactions, and the conclusions resulted mainly from the shape of the p-value distribution. To understand the complexity of these assumptions and evaluate the overall value of digital empowerment and knowledge increase, this paper conducted assessments utilizing varied assumptions about innovation, decentralized and agile DTP initiatives, and the impact on commercial excellence.
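To make the modelling steps concrete, the sketch below runs a one-way ANOVA of IMPCT across INNOC groups and fits sum-contrast-coded models with an INFL x COMEX interaction using statsmodels. The data frame, the "region" grouping used for the mixed-model random intercept, and all values are simulated assumptions, since the paper does not publish its dataset or the exact model specification.

```python
# Sketch of the ANOVA and sum-contrast models on a simulated data frame; variable
# names follow the text (IMPCT, INFL, INNOC, COMEX), all values are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "IMPCT": rng.normal(3.5, 0.8, 315),
    "INFL": rng.normal(3.2, 0.9, 315),
    "INNOC": rng.choice(["low", "medium", "high"], 315),
    "COMEX": rng.choice(["below", "at", "above"], 315),
    "region": rng.choice(["EU", "NA", "APAC", "LATAM"], 315),  # assumed grouping
})

# One-way ANOVA: does mean IMPCT differ across the INNOC groups?
anova_table = sm.stats.anova_lm(smf.ols("IMPCT ~ C(INNOC)", data=df).fit(), typ=2)
print(anova_table)

# Fixed-effects model with sum contrast coding and an INFL x COMEX interaction
ols_fit = smf.ols("IMPCT ~ INFL * C(COMEX, Sum)", data=df).fit()
print(ols_fit.params)

# A simple mixed model with a random intercept per (assumed) region grouping
mixed_fit = smf.mixedlm("IMPCT ~ INFL * C(COMEX, Sum)", df,
                        groups=df["region"]).fit()
print(mixed_fit.summary())
```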
6 Conclusions

In light of these results and the comparable questionnaire analyses, it is important to use new decentralized digital technologies, gain a better understanding of metric information, and develop new transformative programs to help industry professionals make more effective diagnoses, perform better digital procedures, and access relevant information. The focus of this study was on the way DTP are impacting and influencing the pharmaceutical industry's commercial functions. To answer the research questions, a better understanding of digital innovations and of the factors that influence digital adoption by pharmaceutical companies was considered part of the selected methodology. Consequently, this study not only provided a better understanding of the key metrics, learning initiatives, and digital activities that are considered relevant within the pharmaceutical industry but also opened new research areas for it.
Descriptive analysis revealed that the majority of respondents were based in Europe or North America, held commercial leadership positions, and came primarily from the vaccines and oncology sectors. Conclusions were also drawn regarding the level of seniority and the most important factors in determining new skills or training requirements for new digital development plans, as well as the amount of time spent interacting with digital technology. There was a positive and clear answer to the question of whether DTP impact brand strategy, commercial execution, and new product launches. In addition to providing the necessary information, specific metrics and digital activities were demonstrated to be part of DTP. In terms of commercial strategy, the findings clearly showed that DTP influence key areas such as product launch excellence, the involvement of key opinion leaders, and quicker time to market. Furthermore, it is possible to identify the skills organizations need to create interdisciplinary teams of quantitative and technical talent to solve strategic business challenges. The statistical analysis indicates that the findings relate to the impact and innovation created by DTP on product launch strategies, as well as the overall impact on innovation generation within companies.
References 1. Alla S, Soltanisehat L, Tatar U, Keskin O (2018) Blockchain technology in electronic healthcare systems. In: IIE annual conference. Proceedings. Institute of Industrial and Systems Engineers (IISE), pp 901–906 2. Elhoseny M, Abdelaziz A, Salama AS, Riad AM, Muhammad K, Sangaiah AK (2018) A hybrid model of the internet of things and cloud computing to manage big data in health services applications. Futur Gener Comput Syst 86:1383–1394 3. Finelli LA, Narasimhan V (2020) Leading a digital transformation in the pharmaceutical industry: reimagining the way we work in global drug development. Clin Pharmacol Ther 108(4):756–761 4. Ganesh NG, Chandrika RR, Mummoorthy A (2021) Enhancement of interoperability in health care information systems with the pursuit of blockchain implementations. In: Convergence of blockchain technology and e-business. CRC Press, pp 201–226 5. Haleem A, Javaid M, Singh RP, Suman R (2022) Medical 4.0 technologies for healthcare: features, capabilities, and applications. Internet of Things and Cyber-Physical Systems 6. Hole G, Hole AS, McFalone-Shaw I (2021) Digitalization in the pharmaceutical industry: what to focus on under the digital implementation process? Int J Pharm X 3:100095 7. Massaro M (2021) Digital transformation in the healthcare sector through blockchain technology. Insights from academic research and business developments. Technovation 102386 8. McDermott O, Antony J, Sony M, Daly S (2021) Barriers and enablers for continuous improvement methodologies within the Irish pharmaceutical industry. Processes 10(1):73 9. Pesqueira A (2022) Data science and advanced analytics in commercial pharmaceutical functions: opportunities, applications, and challenges. In: Information and knowledge in the internet of things, pp 3–30 10. Pesqueira AM, Sousa MJ, Mele PM, Rocha A, Sousa M, Da Costa RL (2021) Data science projects in pharmaceutical industry. J Inf Sci Eng 37(5) 11. Pesqueira A, Sousa MJ, Rocha Á (2020) Big data skills sustainable development in healthcare and pharmaceuticals. J Med Syst 44(11):1–15
Sprinting from Waterfall: The Transformation of University Teaching of Project Management Anthony Chan, David Miller, Gopi Akella, and David Tien
Abstract Project Management is taught as a compulsory core unit in the undergraduate Information Technology degree. The subject is based on the Project Management Body of Knowledge (PMBOK) and was taught using case studies in a teamwork environment. However, many students find the content overwhelming for a 12-week session of study. The subject was transformed after a series of consultations with past students and members of the Information and Communication Technology industry. Using some of the best practices from the industry, the subject was converted into an active, participatory format built on Agile principles. By adopting the style of medical education and its casebooks, student participation and interest increased tremendously, and gains in both student satisfaction and performance were noted. This paper outlines the dynamic strategies employed, the preliminary benefits received from these changes, and how the students responded to them.

Keywords Project management · Teamwork · Authentic learning · Subject development · Agile
A. Chan (B) · D. Miller · G. Akella · D. Tien Charles Sturt University, Wagga Wagga, New South Wales, Australia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_30

1 Introduction

Project Management is a subject taught to all undergraduates of the Bachelor of Information Technology at Charles Sturt University, Australia. It is a core subject in the degree, placed at AQF Level 7 of the Australian Qualifications Framework (TESQA). The subject is accorded 8 points and is to be completed in one session. The number of points measures the size of the subject's contribution to the degree the student is undertaking. A core subject is one that the student must successfully complete to be eligible for graduation; a few core subjects must be completed over the period of enrolment. The Course Accreditation Policy of the university sets a total of 192 points to be acquired by the student. The Subject Policy states that an eight-point subject should require a student to spend up to 160 h engaged in the learning and teaching activities and in preparation for the subject's assessment (CSU). A postgraduate version of the subject is also available.
2 Enrolment and Teaching Approaches

Student enrolment before the COVID pandemic totalled about 1,000 students per calendar year. The teaching workforce can stretch to as many as 12 tutors teaching up to three sessions per year. The allocation of three contact hours per week over twelve teaching weeks puts a lot of pressure on covering the contents of the subject. The subject's curriculum is based on the Project Management Institute's Project Management Body of Knowledge (PMBOK). In earlier versions of PMBOK, the standard knowledge areas covered ten topics: Project Integration Management, Project Scope Management, Project Schedule Management, Project Cost Management, Project Quality Management, Project Resource Management, Project Communication Management, Project Risk Management, Project Procurement Management, and Project Stakeholder Management [14]. The university allocates three hours of lectures and tutorials in each of the twelve teaching weeks. Each week's teaching is focused on a lecture aided by a set of PowerPoint slides, followed by a tutorial working on a few questions. The Project Management Institute has since published the 7th edition of PMBOK, which reflects the full range of development approaches and expands a section called "models, methods and artefacts" [28].

There have been a few notable teaching approaches to Project Management recorded in the literature over the past twenty years. An Information Technology (IT)-based method was found to be more effective than written case study methods, as it employed higher cognitive skills and also triggered interest in learning about project management [15]. IT was recognized as a new academic discipline, with project management one of the five core technology areas cited by the Association for Computing Machinery (ACM) curriculum guidelines for the discipline, and an experiential approach to teaching the subject was described [1]. Following that, consideration of future pedagogy and its impact on the student experience was highlighted, focusing on two key components: students' perceptions of what is significant and virtual learning [25]. A call for a blended learning approach soon appeared, emphasizing the role of learners as contributors to the learning process rather than recipients of learning; this approach also addressed different student learning styles [16]. The importance and flexibility of software tools were highlighted to ensure that they aligned with PMBOK, along with the need for students to be instructed in their use [12]. The difficulty of teaching undergraduate students this topic has been considered globally across multiple academic disciplines, since the students have no prior knowledge [26], and a move to a flipped teaching methodology was made [2]. The move to Agile methodology meant a change in teaching approach was required, and a framework was presented to Information Systems educators [32].
3 Subject Design Rationale

The Waterfall methodology was presented to undergraduates under a previous subject convenor (also known as the subject leader) in a lecture-tutorial model. It moved from standard PowerPoint lessons to case studies and finally to two websites of fictitious companies, one acting as the employer and the other as a client. These sites were created on the advice of educational designers to give a limited sense of reality. The lecturers at some stage wore three hats, trying to deliver theory, present the case problem as an employer, and then advise on the steps to complete the task. The failure rate in the subject was high, as students with little or no project management experience grappled with theory, issues, and approaches in the Waterfall methodology.

Subject redesign began in 2019 with consultations with alumni, industry contacts, and teaching partners. The difficulty in teaching Project Management narrowed down to three areas: the amount of content that must be covered in the teaching session, the students' absence of experience in the project management field, and the differing levels of project management experience among tutors. There was no teaching model available in higher education for the Agile approach in 2019. An accidental early teaching effort of transitioning was recorded [35], and commentators have mentioned the conflict of [Waterfall] methodology for tutors trying to teach Agile; research in this area is ongoing [32]. As academic tutors were the bridge to the knowledge base and the Agile methodology, the new subject design centred on student teamwork. Teamwork among undergraduate (and postgraduate) students has always been a challenge in higher education [17, 18, 21, 27], yet active learning is more engaging than the lecture [34]. The tutor's role also had to be re-adjusted to focus on team discovery and development. With the outbreak of SARS-CoV-2, further adjustments had to be made quickly to accommodate an online delivery model [3, 20].

The author's volunteering in hospital and admiration of medical staff at work provided the impetus to understand how medical students are taught in the hospital ward. In healthcare, many preventable medical errors are the result of dysfunctional or nonexistent teamwork [19]. The increased specialization of tasks in effective patient management echoes the increased interest of students in the different streams of IT: networking, programming, management, databases, cybersecurity, and the like. The need to ensure appropriate healthcare outcomes and patient safety is seen as analogous to comprehensive coverage of skills within the IT professional practice areas. Reporting and accountability were also seen as important to promote transparency in the process [5].

Undergraduate students have another major challenge: engagement in class and teamwork. Students are often distracted by their mobile devices or laptops, with decreasing interest in the discussion or issues at hand [9, 10, 13]. A survey of workplace behaviour among workers with mobile devices inspired a team reward-and-punishment system. A comprehensive teamwork mark system would give the team the power to recognize individual as well as team effort and success
without the assessment getting in the way [6]. At the end of each meeting, the team would judge whether everyone came to the scrum on time, prepared, and not distracted. Points would be deducted from the final mark for the assessment; the deductions were reported in the minutes of the meeting produced under a system of rotating team leaders (a sketch of this record-keeping appears at the end of this section). The reward system provided explicit incentives for teamwork, explained to students by analogy with the industry practice of salary and bonus rewards for successful teamwork [23]. At the beginning of each teaching session, every student had to take a compulsory quiz to ensure they understood their responsibility.

The work of "train the trainer" began by acknowledging that this subject puts the tutor in the role of facilitator of student learning rather than teacher [24, 29, 30]. A team re-training was organized, and close contact was then maintained with first-time tutors of the subject to ensure that the "conflict of methodologies" was attended to as a matter of priority [11]. Tutors referred to theory only on a needs basis or when questions were raised by students. Most of the time, the student teams discussed among themselves in small groups.

The backdrop to good curriculum implementation is the learning material. The solution for covering the concepts listed in PMBOK was to approach them in the style of medical education. The concept is modelled after the role of the first-year medical school student, who keeps in contact with the patient as much as possible, seeks help from the clinician, and consults other experts and sources to develop a complete picture of the patient's life. This medical student works on a casebook that includes, but is not limited to, the patient's entire history. This approach allows the student to develop a deeper and more diverse understanding of what comprises the healthcare life of the person [4, 36, 38]. After university studies, IT graduates work on client projects either in house or off site. A similar understanding of the client and the project is required, with access to other experts in resourcing, financing, and other areas. This understanding, together with an Agile implementation, is therefore a perfect fit for this project management team environment. The ability to communicate well with other team members was practiced throughout the session. Each team was given the choice to pursue the path it chose rather than being constrained to a standard path, which helps greatly as teams are pushed to investigate, research, and be creative with the solutions they offer. A casebook approach also provides an affective connection to a project, as opposed to a "case study" approach [38]. As in the approach and purpose of casebooks, communication with end-users and stakeholders is prioritized. The ability to deliver a project to a non-technical, high-level audience is also a skillset to be developed, as opposed to spending classroom time delivering oral presentations [8].
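As referenced earlier in this section, the following sketch shows one way the meeting-by-meeting deduction scheme could be recorded. The one-mark penalty per infraction, the list of judged behaviours, the 20-mark assessment, and the floor of zero are illustrative assumptions; the paper does not publish the exact scale used.

```python
# Illustrative sketch only: recording the per-meeting deductions described above.
# The one-mark penalty, the behaviours listed, and the 20-mark base are assumptions.

DEDUCTIBLE = ("late", "unprepared", "distracted")  # judged by the team at each scrum
PENALTY_PER_INFRACTION = 1                         # assumed deduction, in marks

def assessment_mark(base_mark, meeting_minutes):
    """meeting_minutes: one dict per scrum, recorded by the rotating team leader."""
    deductions = sum(
        PENALTY_PER_INFRACTION
        for minutes in meeting_minutes
        for behaviour in DEDUCTIBLE
        if minutes.get(behaviour, False)
    )
    return max(0, base_mark - deductions)

# Example: a member late to one scrum and distracted in another, over three scrums
minutes_log = [
    {"late": True},
    {"distracted": True},
    {},
]
print(assessment_mark(20, minutes_log))  # -> 18 of an assumed 20-mark assessment
```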
4 Implementation and Stakeholder Responses

The first implementation of the casebook concept in Project Management was carried out in 2019 using the PMBOK Sixth Edition. It was a major change in class logistics management and a period for tutors to settle into their new roles. The postgraduate cohort was picked as the first group, as it is more resilient to change; the undergraduate group joined a year later. The passing rate of the subject before the Agile implementation (2019) was 66% for domestic students and about 72.8% for the international cohort. The table below shows the two years before implementation and the two years after, with the number of students.

Passing rate      2018              2019              2020              2021
Domestic          66.0% (n = 81)    66.0% (n = 69)    96.0% (n = 54)    93.5% (n = 49)
International     75.2% (n = 2242)  72.8% (n = 1914)  90.1% (n = 1260)  87.8% (n = 333)
The teamwork and casebook implementation produced better results, as students were able to engage with their learning and become active participants in the teaching-learning space. Qualitative comments received from students were as follows:

The experience was major learning experience in terms of our learning. We were able to make huge strides in improving our existing knowledge of the matter and it turned out to be an amazing opportunity.

I loved that [lecturer name] tried something different in this subject, something that wasn't textbook and PowerPoint. I really liked the group work elements. It was great to meet other students and talk through ideas.

This proved to be a whole new experience for us as the interaction with the professional workspaces and implementing the theoretical knowledge was something we had not worked on before. Working as a team and collaborating towards the common goals taught us the lifelong lessons of teamworking and partnering. We achieved a lot together and a lot of it was that we shared our knowledge, ideas, expertise with each other during the whole project. The major success factors were communication, sharing of ideas, responsibility, and participation. No one got distracted during meetings.
The student subject evaluation reports across the four years also showed positive development in a few areas. The percentage of students agreeing with each statement rose significantly (2019 was the year of implementation).
                                                        2018 (%)  2019 (%)  2020 (%)  2021 (%)
The subject incorporates the study of current content     56.5      59.0      94.5      84.5
The teaching in this subject motivated me to learn         66.0      57.0      83.5      93.0
Created opportunities for me to learn from my peers        66.0      68.0      83.5      88.0
Students acknowledged that Agile is the current approach used in the industry. The way the subject is taught and organized around the casebook motivated them to learn more about Agile practices and tools. The teamwork component, with its penalty points for poor attendance or non-participation, helped groups function well and is the keystone of the student team contribution. Some unexpected benefits were also realized, as these student comments indicate:

I am a mature student and the take-away skills from this subject could be implemented in my own workplace. It was incredible to see how it works at my job. Even my boss complimented me on the broad approach and utilizing skills from every staff.

My weakness is my hatred for teamwork. I like to work individually most of the time but working on this project with the team, I improved on my team-work skills.

I found out that some of the team members are hesitant in sharing their ideas. They don't share it clearly and it was very difficult to understand what they were trying to say. I realize good communication is the backbone of successful projects.
5 Future Work and Direction

The casebook concept has delivered benefits in this initial period of review. Future work would be driven by the following points:

• IT students are used to notation and brevity, and they need to understand that working in project management requires the construction of online discourse and how to construe it for positive participation [40].
• Project management requires access to multiple information sources and inter-disciplinary research, which calls for an alternative approach to information literacy and delivery [33].
• The work of many young IT professionals is rooted in vocational education and a "hands-on" approach rather than a "discussing-and-thinking" model [22, 31, 41].
6 Conclusion

Subject development in project management studies can contribute positively to new experiences and enable educators to present an introductory experience in the principles of PMBOK effectively. It is important to move away from the teacher-centric style of lectures and the assumption that students will not learn if they are not fed theory. It would be impossible to force-feed all the elements of PMBOK in a university semester anyway. In this curriculum revision, many students have been driven to look for more information on their own, and this is exhibited by the work they have delivered. None have expressed difficulty in understanding what they are reading. In many cases, this subject has also delivered a largely unpredicted outcome of bringing students together and bridging the loneliness of struggling with concepts that are alien to them.

Acknowledgements The authors acknowledge the contribution of co-author David Miller, who was a member of the Project Management Institute, for his assistance and insights into the development of this subject. David passed away on 21 August 2022 while this manuscript was in its final draft.
References 1. Abernethy K, Piegari G, Reichgelt H (2007) Teaching project management: an experiential approach, vol 22. Consortium for Computing Sciences in Colleges. https://doi.org/10.5555/ 1181849.1181888 2. Abushammala MFM (2019) The effect of using flipped teaching in project management class for undergraduate students. J Technol Sci Educ 9(1):41–50. https://doi.org/10.3926/jotse.539 3. Basilaia G, Kvavadze D (2020) Transition to online education in schools during a SARS-CoV-2 coronavirus (COVID-19) pandemic in Georgia. Pedagogical Res 5(4) 4. Beier LM (2018) Seventeenth-century English surgery: the casebook of Joseph Binns. In: Medical theory, surgical practice. Routledge, pp 48–84 5. Bell SK, White AA, Yi JC, Yi-Frazier JP, Gallagher TH (2017) Transparency when things go wrong: physician attitudes about reporting medical errors to patients, peers, and institutions. J Patient Saf 13(4). https://journals.lww.com/journalpatientsafety/Fulltext/2017/12000/Transp arency_When_Things_Go_Wrong__Physician.11.aspx 6. Bravo R, Catalán S, Pina JM (2019) Analysing teamwork in higher education: an empirical study on the antecedents and consequences of team cohesiveness. Stud High Educ (Dorchesteron-Thames) 44(7):1153–1165. https://doi.org/10.1080/03075079.2017.1420049 7. CSU. Recommended Student Time Commitment. https://www.csu.edu.au/division/learningand-teaching/subject-outline/subject-schedule-and-delivery/recommended-student-time-com mitment 8. Daniel M, Rougas S, Warrier S, Kwan B, Anselin E, Walker A, Taylor J (2015) Teaching oral presentation skills to second-year medical students. MedEdPORTAL 11. https://doi.org/ 10.15766/mep_2374-8265.10017 9. Dontre AJ (2021) The influence of technology on academic distraction: a review. Hum Behav Emerg Technol 3(3):379–390 10. Flanigan AE, Babchuk WA (2022, 2022/04/03) Digital distraction in the classroom: exploring instructor perceptions and reactions. Teach Higher Educ 27(3):352–370. https://doi.org/10. 1080/13562517.2020.1724937 11. Frydenberg M, Yates D, Kukesh J (2018) Sprint, then fly: teaching agile methodologies with paper airplanes. Inf Syst Educ J 16(5). http://isedj.org/2018-16/n5/ISEDJv16n5p22.html 12. Goncalves RQ, von Wangenheim CAG, Hauck JCR, Zanella A (2018) An instructional feedback technique for teaching project management tools aligned with PMBOK. IEEE Trans Educ 61(2):143–150. https://doi.org/10.1109/TE.2017.2774766
13. Goundar S (2014) The distraction of technology in the classroom. J Educ Hum Dev 3(1):211– 229 14. A Guide to the Project Management Book of Knowledge (2014) Project Management Institute, 5th ed. 15. Hingorani K, Sankar CS, Kramer SW (1998) Teaching project management through an information technology-based method. Proj Manag J 29(1):10–21. https://doi.org/10.1177/875697 289802900105 16. Hussein BA (2015) A blended learning approach to teaching project management: a model for active participation and involvement: insights from Norway. Educ Sci 5(2):104–125. https:// www.mdpi.com/2227-7102/5/2/104 17. Iacob C, Faily S (2019, 2019/11/01) Exploring the gap between the student expectations and the reality of teamwork in undergraduate software engineering group projects. J Syst Softw 157:110393. https://doi.org/10.1016/j.jss.2019.110393 18. Joanna W, Elizabeth AP, Seth S, Alexandra K (2016) Teamwork in engineering undergraduate classes: what problems do students experience? In: 2016 ASEE annual conference & exposition, Atlanta 19. Lerner S, Magrane D, Friedman E (2009) Teaching teamwork in medical education. Mt Sinai J Med 76(4):318–329. https://doi.org/10.1002/msj.20129 20. Lindsjørn Y, Almås S, Stray V (2021) A case study of teamwork and project success in a comprehensive capstone course. Norsk IKT-konferanse for forskning og utdanning 21. McCorkle DE, Reardon J, Alexander JF, Kling ND, Harris RC, Iyer RV (1999, 1999/08/01) Undergraduate marketing students, group projects, and teamwork: the good, the bad, and the ugly? J Mark Educ 21(2):106–117. https://doi.org/10.1177/0273475399212004 22. McKenzie S, Coldwell-Neilson J, Palmer S (2018) Understanding the career development and employability of information technology students. J Appl Res Higher Educ 10(4):456–468. https://doi.org/10.1108/JARHE-03-2018-0033 23. Mower JC, Wilemon D (1989, 1989/09/01) Rewarding technical teamwork. Res-Technol Manage 32(5):24–29.https://doi.org/10.1080/08956308.1989.11670609 24. Nuñez Enriquez O, Oliver KL (2021) ‘The collision of two worlds’: when a teacher-centered facilitator meets a student-centered pedagogy. Sport Educ Soc 26(5):459–470 25. Ojiako U, Ashleigh M, Chipulu M, Maguire S (2011) Learning and teaching challenges in project management. Int J Project Manage 29(3):268–278. https://doi.org/10.1016/j.ijproman. 2010.03.008 26. Pan CCS (2013, Oct 2013 2015-12-07) Integrating project management into project-based learning: mixing oil and water? In: IEEE conferences, pp 1–2. https://doi.org/10.1109/CICEM. 2013.6820187 27. Pfaff E, Huddleston P (2003, 2003/04/01) Does it matter if i hate teamwork? What impacts student attitudes toward teamwork. J Mark Educ 25(1):37–45. https://doi.org/10.1177/027347 5302250571 28. PMBOK Guide (2022) Project Management Institute.https://www.pmi.org/pmbok-guide-sta ndards/foundational/PMBOK 29. Putri AAF, Putri AF, Andriningrum H, Rofiah SK, Gunawan I (2019) Teacher function in class: a literature review. In: 5th international conference on education and technology (ICET 2019) 30. Reeve J (2006) Teachers as facilitators: What autonomy-supportive teachers do and why their students benefit. Elem Sch J 106(3):225–236 31. Rosenbloom JL, Ash RA, Dupont B, Coder L (2008, 2008/08/01/) Why are there so few women in information technology? Assessing the role of personality in career choices. J Econ Psychol 29(4):543–554. https://doi.org/10.1016/j.joep.2007.09.005 32. Rush DE, Connolly AJ (2020) An agile framework for teaching with scrum in the IT project management classroom. 
J Inf Syst Educ 31(3):196–207. http://jise.org/Volume31/n3/JISEv31n3p196.html 33. Scheepers MD, De Boer A-L, Bothma TJ, Du Toit PH (2011) A mental model for successful inter-disciplinary collaboration in curriculum innovation for information literacy. South Afr J Libr Inf Sci 77(1):75–84
34. Sibona C, Pourrezajourshari S (2018) The impact of teaching approaches and ordering on IT project management: active learning vs. lecturing. Inf Syst Edu J 16(5). https://isedj.org/201816/n5/ISEDJv16n5p66.html 35. Snapp MB, Dagefoerde D (2008) The Accidental agilists: one teams journey from waterfall to Agile. In: Agile 2008 conference 36. Stanton RC, Mayer LD, Oriol NE, Treadway KK, Tosteson DC (2007) The mentored clinical casebook project at Harvard Medical School. Acad Med 82(5). https://journals.lww.com/ academicmedicine/Fulltext/2007/05000/The_Mentored_Clinical_Casebook_Project_at_Harv ard.15.aspx 37. TESQA. Australian quality framework. Tertiary Education Quality and Standards Agency. https://www.teqsa.gov.au/australian-qualifications-framework 38. Thompson CE (2022) Beyond imperturbability: the nineteenth-century medical casebook as affective genre. Bull Hist Med 96(2):182–210 39. Understanding the project management knowledge areas. https://www.workfront.com/projectmanagement/knowledge-areas 40. Ware P (2005) “Missed” communication in online communication: tensions in a GermanAmerican telecollaboration. Lang Learn Technol 9(2):64–89 41. Zarrett NR, Malanchuk O (2005) Who’s computing? Gender and race differences in young adults’ decisions to pursue an information technology career. New Dir Child Adolesc Dev 2005(110):65–84. https://doi.org/10.1002/cd.150
Versioning: Representing Cultural Heritage Evidences on CIDOC-CRM via a Case Study Ariele Câmara, Ana de Almeida, and João Oliveira
Abstract Understanding the elements that allow the recognition of archaeological structures is an essential task for the identification of cultural heritage. On the other hand, recording these elements is necessary for the historical study, evolution, and recognition of these types of structures. One of the challenges presented for the digital representation of this information and knowledge relates to the fact that there are results from different surveys and records on the status of the same monument, which can be considered as separate versions of knowledge. In this paper, we describe a schema to represent versioning data about archaeological heritage dolmens using the CIDOC-CRM model as a basis. The versioning schema will work as a database model for the development of a knowledge graph to aid with automatized dolmen recognition in images. The intended model efficiently stores and retrieves event-driven data, making explicit how each update creates a new "version" through a new event. An event-driven model based on versioning data makes it possible to compare versions produced at different times or by different people and allows for the creation of complex version chains and trees. Keywords Archaeological structures · Versioning · CIDOC-CRM · Knowledge graph · Event-driven model
A. Câmara (B) · A. de Almeida · J. Oliveira Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal e-mail: [email protected] Centro de Investigaçao em Ciências da Informaçao, Tecnologias e Arquitetura, Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal A. de Almeida Centre for Informatics and Systems of the University of Coimbra (CISUC), Coimbra, Portugal J. Oliveira Instituto de Telecomunicações, Lisboa, Portugal © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_31
1 Introduction Knowledge graphs have emerged as a technology that aims to provide semantic information capable of being understood and interpreted by machines about real-world entities [14]. To represent the knowledge about entities and processes, it is necessary to differentiate real things (phenomenal) from those described through information (declarative) [9]. Mapping both the phenomenal and declarative knowledge of cultural heritage according to a common standard model such as the one provided by the International Committee for Documentation—Conceptual Reference Model (CIDOC-CRM)—is a key to support interoperability. The representation of historical, cultural, and archaeological data has traditionally been carried out by different specialists and maintained by institutions such as libraries, archives, and museums [9]. The multiple sources and different researchers’ backgrounds have led, through the years, to a disparity between data sources and formats, and different historical versions of declarative information, for example, data derived from interpretations of the same object [6]. Handling these metadata as a unique set is vital for different purposes such as information retrieval. This paper explores the development of a schema to model a graph-based data model to represent the different versions of the knowledge acquired about dolmens— using the information about the structural elements that may help their recognition in satellite images. In order to achieve this goal, we adopted the CIDOC-CRM.
2 Representing Data from Heterogeneous Sources 2.1 CIDOC-CRM CIDOC-CRM is a formal ontology for the integration, mediation, and exchange of cultural heritage information with multi-domain knowledge. Its development started in 1996 and in 2006 it became an ISO 21127:2006 standard [1]. Although it started as an ontology for museums, it is not limited by this usage and has been used for different purposes [7, 9, 13, 15]. The CIDOC ontology (version 7.2.1) consists of 81 classes taxonomically organized and 160 unique properties to model knowledge. The most general classes are represented as a E77 Persistent Item, E2 Temporal Entity, E52 Time Span, E53 Place, E54 Dimension, and E92 Spacetime Volume. As an event-centric model that supports the historical discourse, the CIDOC-CRM allows for the description of entities that are themselves processes or evolutions on time. Using the E2 Temporal Entity and its subclasses enables the description of entities that occurred at some point (or points) over time. One of these entities focuses on the description of a temporal duration and documents the historical relationships between objects that were described using subclasses of E77 Persistent Item. CIDOCCRM enables the creation of a target schema to join all the varied knowledge about
a domain, since CRM provides a definition and a formal structure to represent the concepts and relationships of cultural heritage.
2.2 Definition—Schema Versioning Schema evolution requires keeping the complete change history of the schema, so it is necessary to retain all previous definitions. Versioning mechanisms can be useful to support scholars in making research more transparent and discursive. The schema versioning idea was introduced in the context of object-oriented database system development, as such systems are implemented to deal with multiple schemas and with the evolution of information [12]. We found a few examples of versioning using the CIDOC-CRM as a way to track different versions. The authors in Velios and Pickwoad [15] use an event-centric approach, where each entry represents a different version of a bookbinding, with all temporal classes directly connected to the same entity, equivalent to each of the records on the cover of a given binding. Despite making it easier to understand from a human point of view, from a computational point of view having a unique entity that relates to several temporal instances will add cycles to the information graph. Carriero et al. [5] present ArCo and show how to develop and validate a cultural heritage knowledge graph, discussing an approach to represent dynamic concepts which may evolve or change over time: every change generates a new record version of the same persistent entity represented by the catalogue record, and its versions are related to it. Each version is associated with a time interval and has temporal validity. Another work we can mention is that of [2], which shows the different benefits that Version Control Systems bring to the field of digital humanities, proposing an implementation of versioning in collaboration with version history. However, it is not shown how to work with this model when using CRM to structure heritage data. In a database, several versions of information can coexist, even more so when we talk about temporal databases, which is the case when we deal with information about cultural heritage derived from several investigations on a monument or from the analysis data generated about it.
2.3 Data Modelling Issues: Event-Version Archaeological reasoning is supported by the multiple interpretations and theories stated, published, re-examined, and discussed over the years. Archaeology contains a rich and complicated example of argumentation used in a scientific community, showing how different fact-based theories were developed and changed over time [6]. The standard inferences, the sequence of factual observations, and the change of belief occurring over time can be represented using knowledge graphs. Despite
this, there are some limitations to this representation of this data, such as (i) correctly grouping components from different periods/versions and (ii) scalability [13]. An event represents a single episode in the data collection or recording. This single event can only consist of an investigative technique and is therefore a unique entity in time and space. Different events may have new results over the same object. Thus, the cultural property could be interpreted by various agents (e.g., researchers) at different moments in time, resulting in different interpretations. Event-based models are already well established [1, 8, 10, 15]. However, models such as CIDOC were not designed to directly support the different pieces of information representing different perspectives interpreted by different agents, or new pieces of information generated through it [5]. Using an event-centric model based on versions to structure data, we can (1) describe any number of component versions and (2) identify the components belonging to different versions [13]. When we take into account the knowledge about the various phases of the thing being analyzed, we can link what we observe with the related events.
3 Recording Cultural Heritage: A Study Case Using Archaeological Monuments The study case here presented deals with the representation of information about megalithic monuments classified as Dolmens in Pavia (Portugal) built in the Neolithic-Chalcolithic. These structures are one of the most representative and ubiquitous cultural features of prehistoric landscapes in Western Europe. The first systematic works about dolmens in Pavia were carried out by Vergílio Correia who published his research in 1921 [4]. His work is considered a benchmark for the knowledge of megalithic [11]. There are different records concerning research carried out in the area and on the same monuments. For the development of the schema, we collected and analyzed all data available through the DGPC Digital Repository1 regional archaeological map [3], the information provided by experts, and data obtained through photo interpretation. Unlike CIDOC-CRM, which is an event-based model, the data in most archaeological records on cultural heritage speaks implicitly about events. In this sense, we can use the CRM model to capture non-existent semantics and make explicit what is implicit. It helps to capture what lies between the existing semantic lines and structures into a formal ontology. At the same time, it serves as a link between heritage data, allowing to represent all this knowledge in a way that can be understood by people but also processed by machines, and thus allows the exchange, integration,
1 The DGPC is the State department responsible for managing archaeological activity in Portugal. Management of heritage is achieved through preventive archaeology and research, and records are provided via the Archaeologist's Portal: https://arqueologia.patrimoniocultural.pt/.
research, and analysis of data. By making what is implicit explicit, we are able to use existing data to ask new questions and consequently obtain new results.
3.1 Object-Based Record Dolmens are persistent physical structures built by man, with a relatively stable form, which occupy a geometrically limited space and establish a trajectory in space–time throughout their existence. Following the hierarchy of classes defined by CIDOC, the E22 Human-Made Object is the most specialized class within the hierarchy of man-made persistent items. The dolmen, as a general term, is characterized here as a CIDOC-CRM entity E22 Human-Made Object, which, according to the CIDOC-CRM class specification, "… comprises all persistent physical items of any size that are purposely created by human activity" [1].
3.2 Versioning-Based Record Since we work with data produced by different specialists at different times on the same object, the question of how to deal with such a rich and diverse number of primary sources is not simple, especially if the authorship and origin are not always clear [13]. In order to do so, we must focus on the content, with each record being seen as a unique version of the same monument. As a result, we consider abstracts, records, or metadata that represent knowledge about the same entity as documents expressing a unique version about the monument. First, a single instance is created representing where the knowledge was obtained. A new related entity is created for representing the information about the dolmen found in the document and, finally, these separate instances are connected through a new entity, as shown in Fig. 1. We assign an ID to represent several E22 Human-Made Thing instances that contain knowledge about the same human-made object. An ID is characterized as an instance of the CIDOC-CRM E42 Identifier and is used to group E22 HumanMade Thing instances, each representing data about the same item but obtained from different documents. This model will record whatever activity over the object that generated the first record, whenever it is generated and acquired, while maintaining all previous knowledge to easily access it using the same class, creating a simple and non-recursive model. A branch in this context means that N parallel versions can be developed separately at a certain point in the development model. Since the goal is the representation of relevant information for posterior analysis, interpretation, and classification of images, in this case, for recognizing dolmens, the focus is on the dolmens structure representation and all the related elements that may assist in its recognition. To represent the structural information, E22 Human-Made Thing instances can be used as output for new entities that allow the characterization of the elements it represents.
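As an illustration only (the instance names below are invented, and the exact CRM class and property identifiers should be checked against the CRM version adopted by the project), the grouping of two versioned records of the same dolmen under a shared E42 Identifier could be expressed with rdflib roughly as follows:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")   # CIDOC-CRM classes and properties
EX = Namespace("https://example.org/pavia/")              # hypothetical local namespace

g = Graph()
g.bind("crm", CRM)

# One persistent local ID (E42 Identifier) groups every versioned record of the same dolmen.
g.add((EX.dolmen_001, RDF.type, CRM["E42_Identifier"]))

# Two records of the same monument, produced by different surveys, kept as separate instances.
for record in (EX.dolmen_001_rec_1921, EX.dolmen_001_rec_2017):
    g.add((record, RDF.type, CRM["E22_Human-Made_Object"]))
    g.add((record, CRM["P1_is_identified_by"], EX.dolmen_001))

print(g.serialize(format="turtle"))
```

Each survey record remains queryable on its own, while the shared identifier node plays the role of the local ID shown in Fig. 1.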
Fig. 1 Schema representing the relationships in our versioning-based record to connect the item with their features by document
3.3 Event-Version-Based Record E2 Temporal Entity and its subclasses usage allows for the description of entities that occurred at some different point(s) over time. These entities focus on a temporal duration description and in records of the constant chronological relationships between the objects—to represent the information as an activity, beginning or end of something. However, this constant is not always respected. For example, when we talk about the beginning of the existence of a dolmen, we are talking about a phase of time described as Neolithic-Chalcolithic, semantically the existing connection properties would lead us to infer that the object description refers to its structure during this event and not its structure at the time it was analyzed and recorded, as is the case. Still, the initial structure is mostly unknown, since the structures may have been
created, modified, and reused, and there are no records about these activities. In any case, this information would not help to identify structures in images. For our use case, we need a class to understand the actions of making claims about an object property and that allows us to access the date and place where the knowledge was obtained—or at least know all the characteristics of the object at the time of data collection. The E13 Attribute Assignment class comprises the actions of making assertions about a property of an object or any unique relationship between two items or concepts, allowing to describe the people’s actions making propositions and statements during scientific procedures, for example, who or when a condition statement was made. Note that any instance of properties described in a knowledge base such as this is someone’s opinion—which in turn should not be recorded individually for all instances in favour of avoiding an endless resource whose opinion was the description of another opinion [1]. However, for the present case, as the description obtained by different entities sometimes contain contradictory data, a model that works with different views is necessary. Thus, these fragmented reports can be seen as versions that can enrich and complement our knowledge of the monuments and their relations, but they can also present conflicting information and narratives and multiple E13 instances can potentially lead to a collection of contradictory values. This redundant modelling of alternative views is preferred here because when talking about structural features, they all become relevant for a better perception of the object and how it may have been affected and affect its surrounding environment—which can help recognition. In this sense, we use the E13 Attribute Assignment entity to record the action of describing the dolmen and connect the event to the object with its descriptions. Using records as events to deal with different pieces of information of dolmens status, and unique IDs to group instances concerning the same dolmen, made it possible to overcome the issue. To associate the action of describing the dolmen to where the information was obtained, we use the E31 Document entity. This class allows for the representation of information on identifiable material items that originated propositions about the object under analysis. Thus, the relationship with the E31 Document entity is described based on the type of document that records the information. In addition to the document with the object description, we recorded the date of the information using the E52 Time Span entity. This information is relevant to prioritize the most current knowledge and enable the analysis of the chronological order of events that led to the description of the dolmen represented at that time. The schema model described is shown in Fig. 2. By using records as events and by considering each new record on the same monument as a unique version of the same, we create a model capable of dealing with the fact that different research works were, are being, or can be performed on the same monument, resulting in different outcomes since they can be made by different researchers, with diverse approaches and at different periods in time. Therefore, we manage to keep all the information that can later be relevant for the recognition of these or of similar structures.
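Again as a hedged sketch rather than the project's actual serialization (instance names are invented and the property identifiers follow the published CRM naming, which may differ slightly by CRM version), one versioned record expressed as an E13 Attribute Assignment tied to its source document and date could look like this:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("https://example.org/pavia/")

g = Graph()
g.bind("crm", CRM)

record = EX.dolmen_001_rec_1921       # the E22 instance holding this version's description
event = EX.describe_dolmen_001_1921   # the recording activity (E13)
doc = EX.survey_publication_1921      # the document the knowledge was taken from
span = EX.timespan_1921

g.add((event, RDF.type, CRM["E13_Attribute_Assignment"]))
g.add((event, CRM["P140_assigned_attribute_to"], record))   # the event describes this record
g.add((doc, RDF.type, CRM["E31_Document"]))
g.add((doc, CRM["P70_documents"], record))                  # where the description was published
g.add((span, RDF.type, CRM["E52_Time-Span"]))
g.add((event, CRM["P4_has_time-span"], span))               # when the description was produced
g.add((span, CRM["P82_at_some_time_within"], Literal("1921", datatype=XSD.gYear)))
```

The event node is what allows contradictory descriptions of the same monument to coexist: each new survey simply adds another E13 instance with its own document and time-span.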
Fig. 2 The archaeological monuments described in a record are represented as instances of Human-Made Object. When different documents report the same object, they are represented as unique entities (Human-Made Object) and related by a local ID. The local ID relates entities about the same monument, and each Human-Made Object is related to the recording activity and to the document where the knowledge represented was acquired
4 Conclusion This article proposes the implementation of versioning in a model defined by CIDOC-CRM. We defined a new schema model to represent different versions of information about the same monument, keeping all the previous and new knowledge without the need for merging, which could lead to incongruent information on the same entity due to different approaches in time and methodology. Thus, we developed an interoperable model capable of storing, analyzing, and retrieving data quickly and effortlessly, allowing the cross-referencing of information, the identification of patterns, and assisting automated methods for the classification and recognition of these or similar structures in images. The next phases of the project involve the development of the schema model to represent the physical and geographical characteristics of the structure and the surrounding landscape, in order to generate a knowledge graph capable of contextualizing all the elements that allow the identification of dolmens in the territory. Acknowledgements This work was partially supported by the Fundação para a Ciência e a Tecnologia, I.P. (FCT) through the ISTAR-Iscte projects UIDB/04466/2020 and UIDP/04466/2020, and through the scholarship UI/BD/151495/2021.
References 1. Bekiari C, Bruseker G, Doerr M, Ore CE, Stead S, Velios A (2021) Volume A: definition of the CIDOC conceptual reference model
2. Bürgermeister M (2020) Extending versioning in collaborative research. In: Versioning cultural objects digital approaches, pp 171–190. http://dnb.d-nb.de/ 3. Calado M, Rocha L, Alvim P (2012) O Tempo das Pedras. Carta Arqueológica de Mora. Câmara Municipal de Mora. https://dspace.uevora.pt/rdpc/handle/10174/7051 4. Câmara A (2017) A fotointerpretação como recurso de prospeção arqueológica. Chaves para a identificação e interpretação de monumentos megalíticos no Alentejo: aplicação nos concelhos de Mora e Arraiolos. Universidade de Évora. https://dspace.uevora.pt/rdpc/handle/ 10174/22054 5. Carriero VA, Gangemi A, Mancinelli ML, Nuzzolese AG, Presutti V, Veninata C (2021) Patternbased design applied to cultural heritage knowledge graphs. Semantic Web 12(2):313–357. https://w3id.org/arco 6. Doerr M, Kritsotaki A, Boutsika K (2011) Factual argumentation—a core model for assertions making. ACM J Comput Cult Herit 3(8). https://doi.org/10.1145/1921614.1921615 7. Faraj G, Micsik A (2021) Representing and validating cultural heritage knowledge graphs in CIDOC-CRM ontology. Future Internet 13(11):277. https://doi.org/10.3390/FI13110277 8. Guan S, Cheng X, Bai L, Zhang F, Li Z, Zeng Y, Jin X, Guo J (2022) What is event knowledge graph: a survey. IEEE Trans Knowl Data Eng 1–20. https://doi.org/10.1109/TKDE.2022.318 0362 9. Hiebel G, Doerr M, Eide Ø (2017) CRMgeo: a spatiotemporal extension of CIDOC-CRM. Int J Digit Libr 18(4):271–279. https://doi.org/10.1007/S00799-016-0192-4/FIGURES/6 10. McKeague P, Corns A, Larsson Å, Moreau A, Posluschny A, Daele K van, Evans T (2020) One archaeology: a manifesto for the systematic and effective use of mapped data from archaeological fieldwork and research. Information 11(4):222. https://doi.org/10.3390/INFO11 040222 11. Rocha L (1999) Aspectos do Megalitismo da área de Pavia, Mora (Portugal). Revista Portuguesa de Arqueologia 2(1). https://dspace.uevora.pt/rdpc/handle/10174/2248 12. Roddick JF (1995) A survey of schema versioning issues for database systems. Inf Softw Technol 37(7):383–393. https://doi.org/10.1016/0950-5849(95)91494-K 13. Roman Bleier SMW (ed) (2019) Versioning cultural objects: digital approaches 14. de Souza Alves T, de Oliveira CS, Sanin C, Szczerbicki E (2018) From knowledge based vision systems to cognitive vision systems: a review. Procedia Comput Sci 126:1855–1864. https:// doi.org/10.1016/J.PROCS.2018.08.077 15. Velios A, Pickwoad N (2016) Versioning materiality: documenting evidence of past binding structures. Versioning cultural objects digital approaches, pp 103–126. http://dnb.d-nb.de/
Toward a Route Optimization Modular System José Pinto, Manuel Filipe Santos, and Filipe Portela
Abstract Urban mobility and route planning are among the biggest problems of cities. In the context of smart cities, researchers want to help overcome this issue and help citizens decide on the best transportation method, individual or collective. This work intends to research a modular solution to optimize the route planning process, i.e., a model capable of adapting and optimizing its predictions even when given different source data. Through artificial intelligence and machine learning, it is possible to develop algorithms that help citizens choose the best route to take to complete a trip. This work helps to understand how Networkx can help transportation companies optimize their routes. This article presents an algorithm able to optimize routes using only three variables: starting point, destination, and distance traveled. This algorithm was tested using open data collected from Cascais, a Portuguese city, following the General Transit Feed Specification (GTFS), and achieved a density score of 0.00786 and 0.00217 for the two scenarios explored. Keywords Artificial intelligence · Machine learning · Route planning · Smart cities · Urban mobility · GTFS
J. Pinto · M. F. Santos · F. Portela (B) Algoritmi Research Centre, University of Minho, 4800-058 Guimarães, Portugal e-mail: [email protected] F. Portela IOTECH—Innovation on Technology, 4785-588 Trofa, Portugal © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_32
1 Introduction The pace at which cities are progressively growing in population is a reality that has caused complications of urban mobility. The gap between this growth and investment in infrastructure and solutions to meet the mobility needs of populations in urban environments causes disruption to each citizen's personal life. One of the most common problems in cities, not only in Europe but worldwide, is traffic congestion. Whether it is the choice of using one's own motorized transport to the detriment of collective
transport or the lack of viable options for collective transport, this is a problem that, in addition to the inherent traffic congestion, raises some environmental issues. More and more, cities are seeking to apply these concepts and achieve the status of Smart City in order to respond to the challenges they face nowadays, among them traffic congestion. The optimization of a city's processes depends on the development of information technologies, particularly in areas that offer the city intelligent, dynamic and, if possible, interoperable systems, such as artificial intelligence. This paper's work is framed within ioCity, a project by the startup IOTech that proposes an innovative solution to a problem that is quite difficult to solve in the urban mobility area: the recurrent traffic jams on the roadways, often caused by the difficulty in finding a parking spot. ioCity proposes the development of an intelligent Progressive Web App (PWA) that can provide the user with a transport route to a given location, taking into account several factors such as traffic and transportation (location, occupancy rate, etc.). All of this is possible through data collected and analyzed in real time. This paper aims to explain how the route planning process can be optimized within the context of urban mobility. Through artificial intelligence and machine learning, it is possible to develop algorithms that help companies by optimizing their predictions on new datasets to reduce travel time, always considering the influencing factors of urban mobility and the user's preferences. For this particular study, the data of a Portuguese city, Cascais, was used. The first section of this document provides a framework for the subject of this work, briefly describing the environment, the themes explored, and the concrete work carried out. Section 2 reviews the concepts and existing literature that served as a basis for all the practical work. Section 3 indicates the materials, methods, and methodologies used throughout the project. Section 4 presents the work carried out following the CRISP-DM methodology, which includes business understanding, data understanding, data preparation, and modeling. Section 5 describes and discusses the results obtained. Finally, Sect. 6 concludes the work carried out in this project and defines the next steps to be taken.
2 Background The field of artificial intelligence (AI) has developed from humble beginnings, in which curiosity for something new stimulated the research, to a global impact presented in projects of high relevance to society. With it, datasets can be explored and optimized through the development of algorithms and machine learning. According to Bartneck et al. [1], the definition of AI and what should and should not be included has changed over time and, to this day, its definition continues to be the subject of debate [2]. Since the emergence of the Smart Cities concept in the late 1990s [3], several definitions have emerged and been published, resulting from different analyses and approaches by various researchers in the application domain of the concept.
Hall [4] defines the term as a city that monitors and integrates all critical infrastructures (roads, bridges, tunnels, railways, subways, airports, communications, energy, security, among others), optimizes its resources, and maximizes the services provided to its citizens. Harrison et al. [5] underline the need for an Information Systems infrastructure interconnected with the city's physical, social, and business infrastructures in order to leverage the city's collective intelligence. The vision for building a Smart City has been progressively developed by researchers, engineers, and other stakeholders in the field. Samih [6] presents an architecture model of a Smart City based on six components: Smart Economy, Smart People, Smart Governance, Smart Mobility, Smart Environment, and Smart Living. Smart City mobility addresses the following:
• Urban mobility (definition): it refers to all aspects of movement in urban settings. It can include modes of transport, such as walking, cycling, and public transit, as well as the spatial arrangement of these modes in a built environment.
• Route Planning: the process of computing the effective method of transportation or transfers through several stops.
• Route Optimization: the process of determining the most cost-efficient route. It needs to include all relevant factors, such as the number and location of all the required stops on the route, as well as time windows for deliveries.
Machine Learning comprises four types of learning:
• Supervised Learning: the machine receives a set of examples labeled as training data and makes predictions for all unseen points.
• Unsupervised Learning: the machine studies data to identify patterns.
• Semi-supervised Learning: the machine receives a training sample consisting of labeled and unlabeled data and makes predictions for all unknown points.
• Reinforcement Learning: the machine is provided with a set of allowed actions, rules, and potential end states [7].
AI operations and optimization involve the application of Artificial Intelligence (AI) technologies, such as machine learning and advanced analytics. This is done to automate problem-solving and processes in network and IT operations and to enhance network design and optimization capabilities.
3 Material and Methods This chapter describes the methodologies used in the development of this paper. To guide the writing and development, the chosen methodology is the Design Science Research (DSR) methodology. The SCRUM methodology is used for managing the work, and the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology is used to guide the research and development using Machine Learning techniques. To analyze the dataset used, the Talend Open Studio for Data Quality tool was used. Statistical analyses were performed on the various columns of the dataset to provide a better understanding of its content. A simple
statistical analysis was performed on each column, whose indicators enable verification of the number of lines, null values, distinct values, unique values, duplicate values, and blank values, and a value frequency analysis, whose indicators enable verification of the most common values in a column. Python scripts were used to extract and transform the data, namely, the NumPy and Pandas libraries. To develop the model and algorithm, the Networkx library in the Python environment was used.
4 CRISP-DM The next sections are divided according to the methodology chosen to guide the work, the Cross-Industry Standard Process for Data Mining (CRISP-DM). It is also important to point out that the first phase of CRISP-DM, the business understanding, can be found in the introduction of the project.
4.1 Business and Data Understanding This work intends to explore the development of a modular solution to optimize the route planning process. If the origin dataset changes, the model can adapt and optimize its predictions on the new dataset. The first phase will explore optimization algorithms that can receive routes and optimize them. In the second phase (future work), the team will use another dataset to test the model. The team then defined which data will be needed for the next phase of the project. This data should support the Machine Learning models that will provide the best available route to travel between two locations. The data needed for the next phase focus on three crucial points that define a route: starting point, destination, and distance traveled. At this stage, the focus is on collecting an initial dataset that allows us to build a foundation for the project. The dataset idealized at the launch of the product was one composed of data related to public transportation that would allow building a network of several interconnected paths/lines on which some functionalities could be developed. To obtain the necessary dataset, the initial plan was to contact companies and municipal services in order to get a dataset whose information corresponds to a real situation of planning the operation of a public transportation network. However, due to the pandemic outbreak of COVID-19, this idea was soon discarded, and it was decided to use datasets available on Open Data platforms. After the research and analysis of the selected datasets, it is possible to conclude that datasets following the General Transit Feed Specification (GTFS) have sufficient data for the construction of a model representing the lines and intersections of a public transport network. The dataset from a Portuguese city (Cascais) was chosen as the basis for this project because it is the one with the most useful information to support the development of this project.
Table 1 Simple statistics of the attributes selected

Column                         | Distinct    | Unique      | Duplicate
stop_times.trip_id             | 3576 (3.3%) | 0           | 3576 (3.3%)
stop_times.stop_id             | 1022 (1.0%) | 0           | 1022 (1.0%)
stop_times.shape_dist_traveled | 3370 (3.1%) | 225 (0.01%) | 3145 (2.9%)
After analyzing all the columns of this dataset, it was found that only three of them have relevant information for model building. Table 1 shows the results of the statistical analysis performed on these three columns of the dataset from 110505 rows without null or blank values.
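The Talend analysis is what the paper reports; as an illustration only, the per-column counts of Table 1 could be reproduced with pandas roughly as below, assuming the standard GTFS column names in stop_times.txt and a hypothetical file path.

```python
import pandas as pd

stop_times = pd.read_csv("stop_times.txt")  # hypothetical path to the GTFS stop_times file

for col in ["trip_id", "stop_id", "shape_dist_traveled"]:
    counts = stop_times[col].value_counts(dropna=False)
    print(col,
          "rows:", len(stop_times),
          "distinct:", stop_times[col].nunique(dropna=False),
          "unique:", int((counts == 1).sum()),     # values occurring exactly once
          "duplicate:", int((counts > 1).sum()),   # distinct values occurring more than once
          "nulls:", int(stop_times[col].isna().sum()))
```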
4.2 Data Preparation In order to prepare the modeling, data were extracted from the selected dataset and, on these, a filtering of the information considered relevant for the project was performed. Python scripts (NumPy and Pandas libraries) were used in this procedure to make the necessary changes. The processing of the dataset and the respective changes made are summarized below (a sketch of the transformation follows the list):
• Discard all attributes except for the attributes listed in point 4.1;
• Select the bus line to be transformed through the attribute "stop_times.trip_id";
• Rename the attributes "stop_times.trip_id" to "Route" and "stop_id" to "Start";
• Create the attribute "Stop" by transforming the information of the attribute "Start";
• Create the attribute "Distance" by transforming the information of the attribute "stop_times.shape_dist_traveled".
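A minimal pandas sketch of this per-line transformation is shown below. It assumes the standard GTFS column names (trip_id, stop_id, stop_sequence, shape_dist_traveled) and an illustrative trip identifier and output file name; the original scripts may differ in these details.

```python
import pandas as pd

stop_times = pd.read_csv("stop_times.txt")  # hypothetical GTFS input

def build_line(df, trip_id):
    """Turn the ordered stop sequence of one bus line (trip) into Route/Start/Stop/Distance records."""
    trip = df[df["trip_id"] == trip_id].sort_values("stop_sequence")
    return pd.DataFrame({
        "Route": trip_id,
        "Start": trip["stop_id"].values[:-1],                             # origin stop of each leg
        "Stop": trip["stop_id"].values[1:],                               # next stop on the line
        "Distance": trip["shape_dist_traveled"].diff().dropna().values,   # distance between the two stops
    })

# One ".csv" document per bus line, as described above (the trip id here is purely illustrative).
build_line(stop_times, "1012_0_1").to_csv("line_1012_0_1.csv", index=False)
```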
This process is repeated for each new bus line that is to be added to the dataset. This process results in a dataset that describes a bus line where each line of the dataset indicates the bus line to which the record belongs in the attribute “Route”, the origin stop and the destination stop in the attributes “Start” and “Stop”, respectively, and the distance traveled between the two stops in the attribute “Distance”. For each bus line, a document in “.csv” format is generated containing the information generated about the respective bus line. In order to centralize the information required to build the model, a Python script was used to integrate all the information generated in the process described in the previous paragraph. At an early stage of the project, only a small number of lines were selected. Initially, only four bus lines were selected and the remaining lines were added to the model base progressively so as to increase its complexity without abruptly causing problems/errors in the model. The result of this integration is a “.csv” file with four columns. The columns “Start” and “Stop” form the vertices of an oriented graph, while the column “Distance” will be the edge connecting the vertices. The “Route” column indicates which of the bus lines the connection between two vertices belongs to. In order to centralize the
information needed to build the model, this Python script aggregates the files generated for all the bus lines selected for the construction of the model into a single base dataset.
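The integration step can be as simple as concatenating the per-line files; a sketch, with an assumed file-naming pattern:

```python
import glob
import pandas as pd

# Aggregate the per-line CSV files into the single base used to build the graph.
files = glob.glob("line_*.csv")  # assumed naming pattern for the per-line documents
model_base = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
model_base.to_csv("model_base.csv", index=False)  # columns: Route, Start, Stop, Distance
```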
4.3 Modeling In this step, the team selected the techniques and built the route optimization model; this section describes the tasks performed to achieve that objective. To begin the modeling process, it was initially necessary to define which techniques to adopt. Given the structure of the data that served as the basis for the modeling, two variables were defined: the source variable, which represents the starting point and corresponds to the origin vertex of the oriented graph, and the target variable, which represents the next stop and corresponds to the destination vertex of the oriented graph. Once these two variables were defined, it was necessary to approach the resolution of this challenge as a graph optimization (shortest-path) problem, since the goal is to obtain the shortest path between two (or more) vertices of the oriented graph that represents a transportation network. To answer this problem over an oriented graph, it was necessary to select an algorithm capable of processing data structured in the form of a graph, which limited the possible approaches to the problem. Two possible approaches emerged: Dijkstra's algorithm and the Bellman-Ford algorithm. Dijkstra's algorithm was selected, since the number of available resources using this algorithm is larger. After selecting the modeling techniques, it is important to define the scenarios upon which the model will be built. Two scenarios have been defined (a short sketch of how each subset can be selected follows this paragraph):
• Scenario 1: the first four bus lines of the dataset are inserted. It is a small dataset with only a few intersections that allows testing the intended functionality of the model.
• Scenario 2: the first eighteen bus lines of the dataset are inserted. This scenario includes more intersections than the first one and allows us to verify the model's performance on a larger dataset.
Figure 1 shows a graphical visualization of the graph built with Networkx for Scenario 2. The next step was to build the model and write the respective code. The Jupyter Notebook platform was used to write the code entirely in the Python language. To import the data to the platform, the Pandas library was used. The imported data is the result of the process described in Sect. 4.2 of this document.
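A possible way to materialize the two scenarios from the aggregated dataset, assuming the file and column names used in the sketches above (not necessarily the authors' code):

```python
import pandas as pd

model_base = pd.read_csv("model_base.csv")  # aggregated base: Route, Start, Stop, Distance

def scenario(df, n_lines):
    """Keep only the first n_lines bus lines, in the order they appear in the dataset."""
    selected = df["Route"].drop_duplicates().head(n_lines)
    return df[df["Route"].isin(selected)]

scenario1 = scenario(model_base, 4)    # Scenario 1: four bus lines
scenario2 = scenario(model_base, 18)   # Scenario 2: eighteen bus lines
```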
Fig. 1 Representation of a graph with 18 bus lines from the dataset according to the node density score
Once the data is imported, the Networkx library is used to create the oriented graph. In the imported dataset, there are four columns: Start, Stop, Distance, and Route. The function nx.from_pandas_edgelist allows to create the mentioned graph from a dataset with the structure of the imported data. Vertex pairs are created by associating the Start column to the source variable defining the source vertex, the Stop column to the target variable defining the destination vertex, and the Distance and Route variables as attributes of the edge between the two vertices. The Distance column indicates the actual distance between the two representative vertices and the Route column indicates the route to which this pair of vertices belongs. After building the oriented graph, some functions supporting the route planning model were written. In this dissertation, it was decided to develop a model that allows predicting the best route between two vertices of the graph with the possibility of adding up to three stopping points on the path. The function of the algorithm allows receiving five arguments: start, stop, stop1, stop2, and stop3. The vertices represented by the start and stop arguments are fixed since they indicate the starting point and the final destination of the route idealized by a user. The intermediate points (stop1, stop2, and stop3) can have their order changed if the computation of the best path (shortest path) so indicates. To obtain the best path, a permutation of the arguments is performed where all the travel possibilities are explored considering that the starting and ending points do not change, only the intermediate points. Then, for each permutation, the best path is computed using Dijkstra’s algorithm in the functions nx.shortest_path and nx.shortest_path_length. The algorithm explores all connections between two vertices of the graph and returns the shortest path. This process is repeated for all the connections of the permutation being returned by the function of the optimized path (shortest) and the total distance traveled on that path.
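The description above can be condensed into the following hedged sketch. The graph construction uses the NetworkX calls named in the text; the function signature mirrors the five arguments described (start, stop, and up to three intermediate stops), while the cost proxy based on the number of distinct bus lines is our own reading of how the monetary cost is derived.

```python
import itertools
import networkx as nx
import pandas as pd

data = pd.read_csv("model_base.csv")  # columns: Start, Stop, Distance, Route
G = nx.from_pandas_edgelist(data, source="Start", target="Stop",
                            edge_attr=["Distance", "Route"],
                            create_using=nx.DiGraph())

def best_route(G, start, stop, *intermediate):
    """Try every ordering of the intermediate stops and keep the shortest overall path."""
    best = None
    for order in itertools.permutations(intermediate):
        waypoints = [start, *order, stop]
        path, length = [start], 0.0
        try:
            for a, b in zip(waypoints[:-1], waypoints[1:]):
                leg = nx.shortest_path(G, a, b, weight="Distance")            # Dijkstra on Distance
                length += nx.shortest_path_length(G, a, b, weight="Distance")
                path += leg[1:]
        except nx.NetworkXNoPath:
            continue  # this ordering cannot be completed on the network
        if best is None or length < best[1]:
            best = (path, length)
    if best is None:
        return None
    path, length = best
    # Distinct bus lines traversed, captured as a simple proxy for the monetary cost of the trip.
    lines = {G.edges[a, b]["Route"] for a, b in zip(path[:-1], path[1:])}
    return path, length, len(lines)

print(best_route(G, 156257, 156112, 156066))  # stop ids taken from Table 3, purely illustrative
```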
During the execution of the algorithm, the number of lines that the route contains is captured in parallel in order to return a value indicating the cost of the route to the user.
5 Results In order to evaluate and obtain a descriptive perspective of the produced graph, some metrics were computed using the Python library used in the construction of the graph, the Networkx library. The metrics selected are presented and described in Table 2, and the results are presented in Table 3. It is possible to verify that, with the increase in the number of bus lines, the graph becomes less dense, which derives from the nature of the real context the graphs represent: as a rule, the lines of an operator's bus network originate from a set of common stops and have only a few intersections along their routes. From scenario 1 to scenario 2, there is a reduction in the density score from 0.00786 to 0.00217, justified by the increase in the number of bus lines that constitute scenario 2. The lower average centrality score in scenario 2 (from 0.01572 in scenario 1 to 0.00434) is also natural, since the number of "isolated" nodes that constitute the path of most routes is inevitably higher.

Table 2 Metrics selected for evaluation

Density: Returns the density of the graph, d = m / (n(n − 1)), where n is the number of nodes and m is the number of edges in the graph.
Connectivity: Returns the average degree of the neighborhood of each node. For directed graphs, N(i) is defined according to the parameter "source": k^w_nn,i = (1 / s_i) Σ_{j ∈ N(i)} w_ij k_j, where s_i is the weighted degree of node i, w_ij is the weight of the edge connecting i and j, and N(i) are the neighbors of node i.
Centrality: Calculates the degree centrality for nodes. The degree centrality of a node v is the fraction of nodes it is connected to.
Intermediation: Calculates the shortest-path betweenness of nodes. The betweenness of node v is the sum of the fractions of all-pairs shortest paths that pass through v: c_B(v) = Σ_{s,t ∈ V} σ(s, t | v) / σ(s, t), where V is the set of nodes, σ(s, t) is the number of shortest (s, t) paths, and σ(s, t | v) is the number of those paths passing through node v other than s, t. If s = t, then σ(s, t) = 1, and if v ∈ {s, t}, then σ(s, t | v) = 0.
In-Degree: The in-degree of a node is the number of edges that point to the node.
Out-Degree: The out-degree of a node is the number of edges that point out of the node.
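For reference, these descriptive scores map directly onto standard NetworkX functions; the aggregation into averages in the sketch below is our own illustration, and the exact parameters (e.g., weighting) used by the authors are not stated in the paper.

```python
import networkx as nx

def describe_graph(G):
    """Descriptive scores of the directed graph, mirroring the metrics of Tables 2 and 3."""
    n = G.number_of_nodes()
    centrality = nx.degree_centrality(G)
    betweenness = nx.betweenness_centrality(G)
    return {
        "density": nx.density(G),
        "average_connectivity": sum(nx.average_neighbor_degree(G).values()) / n,
        "average_centrality": sum(centrality.values()) / n,
        "greater_centrality": max(centrality.items(), key=lambda kv: kv[1]),
        "average_intermediation": sum(betweenness.values()) / n,
        "larger_intermediation": max(betweenness.items(), key=lambda kv: kv[1]),
        "in_degree_average": sum(d for _, d in G.in_degree()) / n,
        "out_degree_average": sum(d for _, d in G.out_degree()) / n,
    }
```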
Table 3 Results for each scenario

Metrics                       | Scenario 1 Nodes                         | Scenario 1 Score | Scenario 2 Nodes | Scenario 2 Score
Density                       | –                                        | 0.00786          | –                | 0.00217
Average connectivity of nodes | –                                        | 1.00017          | –                | 0.96968
Average of centrality         | –                                        | 0.01572          | –                | 0.00434
Greater centrality            | [156257]                                 | 0.04580          | [156257]         | 0.04580
Less centrality               | 127 distinct nodes                       | 0.01526          | [156112]         | 0.00186
Average of intermediation     | –                                        | 0.30723          | –                | 0.04746
Larger intermediation         | [155633, 155735, 155636]                 | 0.69125          | [156257]         | 0.48793
Minor intermediation          | [156066, 156067, 156068, 156305, 156306] | 0.03018          | 6 distinct nodes | 0.0
In-degree average             | –                                        | 1.03030          | –                | 1.16231
Average out-degree            | –                                        | 1.03030          | –                | 1.16231
The node [156257] is, in both scenarios, the node that presents the highest centrality score, indicating that this node is a central point of the whole network. Even so, its centrality score decreases from 0.04580 in scenario 1 to 0.02429 in scenario 2, since in scenario 2 this node is connected to a much smaller fraction of the total number of nodes (approximately half in relation to scenario 1). It is also possible to verify in scenario 2 that node [156112] has the lowest score (0.00186), revealing itself as the least relevant node for network connectivity. The intermediation scores allow us to ascertain which nodes are most traversed in the set of all shortest paths between all possible pairs of nodes. In scenario 2, as in the centrality score, node [156257] presents the highest score, with a value of 0.48793. This means that about half of the shortest paths between all possible pairs of nodes traverse this node, reinforcing the importance that this node has in the network. Another detail to take from this metric is that this node only became the node with the highest score in scenario 2, after expanding the network with more bus lines. In scenario 1, nodes [155633, 155735, 155636] present the highest score, with a value of 0.69125. It was verified that the model produced and presented in Sect. 4.3 of this document allows obtaining, in both test scenarios and for the levels of complexity introduced in this model (selection of intermediate points to cover in the planning of a route), the best possible route to cover on the transport system used as a basis. For each run, all the hypotheses to travel the selected points are analyzed and, at the end of the run, the best route, the distance traveled on it, and its monetary cost are returned. It can be seen in Fig. 2, which shows a run with three intermediate points, that the route returned as best to traverse the inserted points differs from the order in which the points were inserted. Once the model is run, it is possible to confirm that its basic goal is achieved, i.e., to provide the best route to go through a set of points; however, the inclusion of
Fig. 2 Execution results (part 1)
more variables that influence this decision would have resulted in a more complex and interesting analysis and respective decision-making for a real context.
6 Conclusion and Future Work In Sect. 5, the produced algorithm was validated in order to verify whether it is able to return the shortest route between a set of points to be traversed in a graph and its associated cost. The model produced allows, by inserting a start point, an end point, and up to three intermediate points, obtaining the shortest path between the inserted points and the monetary cost associated with the route. The descriptive metrics of the graph presented in the same section indicate that the network built is not very dense (density score of 0.00217 in scenario 2), with a low average centrality of nodes (average centrality score of 0.00434 in scenario 2), which limits the possible routes to follow. To overcome this, it will be necessary to increase the number of links existing at most nodes, allowing a larger number of new possible paths to travel. Once the results were analyzed and the process was reviewed, some opportunities to
improve the model were identified. The next steps to be taken in the project include the following:
• Obtain a more complete dataset capable of representing a more comprehensive transport network: since the dataset used in the development of this project only refers to buses in the municipality of Cascais, it will be interesting to analyze how the model adapts to a higher degree of complexity when contact points with other transport networks are introduced, whether buses from other operators or even other types of transport (trains, subway, among others).
• Complement the model with other variables: the introduction of more variables that impact decision-making and the result that the model returns will increase the value and usefulness of this model, especially real-time variables such as weather information or cultural events along the routes that can alter decision-making.
• Plan an implementation: in order to apply and use this model in a real context, it is necessary to develop a means of doing so. The ideal solution would be a mobile application that allows a user to interact with the model and extract useful information from it.
• Explore other algorithms, such as Graph Neural Networks (GNN), Neural Evolution techniques, Grammatical Evolution, or Reinforcement Learning.
Acknowledgements This work has also been developed under the scope of the project NORTE01-0247-FEDER-045397, supported by the Northern Portugal Regional Operational Programme (NORTE 2020), under the Portugal 2020 Partnership Agreement, through the European Regional Development Fund (FEDER).
References 1. Bartneck C, Lütge C, Wagner A, Welsh S (2021) What is AI? In: An introduction to ethics in robotics and AI. SpringerBriefs in ethics. Springer, Cham. https://doi.org/10.1007/978-3-03051110-4_2 2. Poole D, Mackworth A (2017) Artificial intelligence: foundations of computational agents. Cambridge University Press 3. Albino V, Berardi U, Dangelico RM (2015) Smart cities: definitions, dimensions, performance, and initiatives. J Urban Technol 22(1):3–21. https://doi.org/10.1080/10630732.2014.942092 4. Hall P (2000) Creative cities and economic development. Urban Stud 37(4):200 5. Harrison C et al (2010) Foundations for smarter cities. IBM J Res Dev 54(4):1–16. https://doi. org/10.1147/JRD.2010.2048257 6. Samih H (2019) Smart cities and internet of things. J Inf Technol Case Appl Res 21(1):3–12 7. Mohri M, Rostamizadeh A, Ameet (2012) Foundations of machine learning
Intellectual Capital and Information Systems (Technology): What Does Some Literature Review Say? Óscar Teixeira Ramada
Abstract This research aims to present what some of the reviewed literature says about the binomial of intellectual capital and information systems (technology). From the scarce body of existing research on this pairing, five papers were selected according to the criterion that both topics be addressed together. In terms of substance, what can be concluded is that their contribution to widening scientific knowledge is very tenuous, not to say null. The selected research is based on secondary and also primary sources, the former not being suitable for this specific purpose; even the primary ones suffer from a technicality that proves to be of little practical use. In short, it can be said that these two topics, conditioned by the selection made, did not add any contribution to the expansion of scientific knowledge. West, east, north, and south, nothing new. Keywords Intellectual capital · Intangible assets · Information systems · Technology
1 Introduction
The topic of intellectual capital, on its own, has become increasingly important as it is recognized that related topics, such as business performance, competitive advantages, innovation, and the well-being of citizens, countries, and the world in general, are getting better and better. The literature on this topic appears predominantly associated with these and other topics and can be interpreted as part of an integrated perspective. This is what happens with [1–6], which cover the most diverse years (from the most distant—[7, 8] and even the closest—[9, 10]). These, as well as other authors, represent the set of research that encompasses two, three, four, or even more topics in an interconnected way and that, in a deeper analysis, intends to expand the
Ó. T. Ramada (B) ISCE - Douro - Instituto Superior de Ciências Educativas do Douro, Porto, Portugal e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_33
knowledge being it more complete. But what is observed is that it sins for being less complete, less clear, less understandable and, above all, less applicable in practical reality. One of the most suitable scientific methods to know the value of intellectual capital is Andriessen’s method [11], in his work entitled, “Making Sense of Intellectual Capital—Designing a Method for the Valuation of Intangibles”, 2004, p. 137. He is not the only author who has published research focused on the final value of intellectual capital. Other authors such as Fedotova et al. [12], Goebel [13] and Gogan and Draghici [14] also have it. Specifically, regarding to the intellectual capital and information systems (technology), these related topics present a set of characteristics. In order to gather information that is relevant for various business purposes, information systems, that is, the way in which companies have specifically recorded their assets and liabilities (commonly known as property and the like), which tends to be shaped with the help of technology, it makes it possible for them to know their activities, how they are structured, to quantify them, to see how they vary (increase, decrease, or remain constant). An information system (technology), developed and extended, in order to allow the exercise of control over the company, preferably in real time, allows it to be more efficient and effective. Intellectual capital is the necessary tool to design a good information system (technology), adapted to a specific company and that makes it possible to manage it better. In this way, the consideration, together, of the two topics, intellectual capital and information systems (technology), are so important that, if considered individually, exhibit the obstacle of lacking something that complements them, mutually, and makes them exhibit greater business knowledge, especially the value of intangible assets, in addition to tangible ones. The databases consulted were “B-on”, “Elsevier” (Science Direct), “Taylor and Francis”, “Emerald Collections”, “Web of Science”, and “Wiley Online Library”, predominantly. In particular, “International Journal of Accounting and Financial Reporting”, “International Journal of Learning and Intellectual Capital”, “Information and Management”, “Journal of Strategic Information Systems”, “Journal of Intellectual Capital”, and “ResearchGate” (Conference Paper). The consideration of only five papers object of research was the result, scarce, of a deep selection in these databases, and not an insufficient effort by the researcher. It was the consideration of the binomial, together, intellectual capital and information systems (technology), which served as criteria for inclusion and exclusion, in the selection of papers, and in the formulation of the research question. Therefore, this consists of knowing: what does the selected literature review say with these two combined topics? Is there any specificity to show that results from it? For this reason, one of the contributions of this research is to make known what exists in the literature and, in particular, in the review carried out.
2 Literature Review Al-Dalahmeh et al. [9] are three authors who carried out research on the impact of intellectual capital on the development and efficiency of accounting information systems applied to companies in the industrial sector in the Kingdom of Jordan, according to accounting auditors. These information systems, both in their accounting and technological aspects, play a crucial role in business success, insofar as information is a valuable resource and, therefore, a source of effectiveness and efficiency. To this end, companies lack intellectual capital to direct resources and increase the aforementioned information efficiency, which involves the development of accounting systems, that cannot be achieved until the intellectual capital is developed. Thus, the research aims to underline how important this development is to increase the efficiency of accounting information applicable to companies in the industrial sector. Thus, the research goals are the presentation of a theoretical framework on the concept of intellectual capital and the development of its different dimensions, in addition to demonstrating their effect on the efficiency of accounting information systems. With regard to the research method used, it is an analytical-descriptive approach in which the researchers collected information, both from primary and secondary sources. In the first, the necessary information came from a questionnaire prepared and distributed to a group of external auditors who constituted a sample of this research, after which the answers were analyzed using SPSS to test the compliance of the same. In the second, the information consisted of books, researchers’ theses, papers in specialty journals, in order to build a theoretical framework and, thus, achieve the research goals. With regard to the study population, seventy-five companies listed on the Amman Stock Exchange and belonging to the industrial sector were selected. In the sample, the corresponding seventy-five auditors from the same companies with high qualifications and professional efficiency, responsible for the audit of the aforementioned companies, were selected, also. The implicit concept of intellectual capital, adopted by these authors, consists of four components: human capital (skills and competences), creativity capital (development of new products and/or services), operational capital (work systems and expenses), and customer capital (customer relationships and answers to customer needs). As main conclusions obtained by the authors in the research, seven stand out. First, the efficiency of accounting information systems is measured by the benefits achieved through the use of the outputs of these systems, compared with the costs incurred with their construction, design, and operation. Second, there is an urgency for companies to develop intellectual capital to improve the application of accounting information systems, which involves mechanisms that promote this, making the resource conceived and maintained in any company.
Third, industrial companies must work to determine the level of knowledge and skills in order to guarantee the quality and efficiency of accounting information systems. Fourth, these, applied in companies, increase the efficiency of workers and the skills to develop and achieve progress. Fifth, industrial companies provide the possibility of progress in the work and development of workers to guarantee industrial and information quality. Sixth, companies participate in initiatives to increase the level of industrial performance and the efficiency of accounting information systems. Finally, and seventh, industrial companies use practical means to find new ideas and the quality of accounting information systems. As recommendations, the authors suggest the need to develop the intellectual capital of industrial companies as the main focus of management due to the pioneering effect on companies in the long run, increasing investment. Directing companies, they must be managed in the sense of adopting clear and transparent policies, in order to bring together the competent members in such a way, that they raise the level and quality of the accounting information systems. The need in the management of industrial companies in the Kingdom of Jordan should be such that it promotes the development of intellectual capital because of its effects on improvement accounting information systems, stimulates an intellectual culture that increases its importance. Finally, the need to increase elements of the creative capital of workers is emphasized, via accounting information systems in industrial companies, by virtue of its pioneering role, nationally and internationally. Zeinali et al. [10] are also three authors who carried out a research about an assessment of the impact of information technology capital and intellectual capital (organizational capital, relational capital, and innovation capital) on the future returns of companies in the securities markets. With regard to the research method used, it was of the quasi-experimental type, based on the present information and the Financial Statements of the companies. It should be noted that this was a correlative and descriptive study, with regard to data collection. It was a post-hoc study. The population includes all investment, banking and telecommunications, electronic payments and insurance companies, listed in the securities markets of the Tehran Stock Exchange (Iran). The sample, therefore, consisted of fifty companies, selected in the years 2009–2013. The considered hypotheses were stipulated in an econometric regression model, with variables constituted from panel data. As a dependent variable, the future stock returns of company i in year t + 1 (Ri, t+1 ) were used. As independent variables, the technological information of company i in year t + 1 (IT Capitali, t+1 ), the organization of the capital of company i in year t + 1 (Organizational Capitali, t+1 ), the relational capital of company i were used in year t + 1 (Relational Capitali, t+1 ), the R&D Capital of company i in year t + 1 times the investment of company i in year t + 1 (R&DCapitali, t+1 × Investi, t+1 ), financial leverage of company i in year t + 1 (LEVi, t+1 ), age of company i in year t + 1 (Agei, t+1 ), size of company i in year t + 1 (SIZEi, t+1 ), and investment of company i in year t + 1 times R&D Capital of company i in year t + 1 (Investi, t+1 × R&DCapitali, t+1 ).
Regarding the most evident conclusions, the researchers concluded that, the planning of investments in the area of technological information, according to the business goals, without forgetting the dimension and structure, facilitates their activities between the different sectors, reducing time and costs. This leads to higher returns, which depend on information technologies. The intellectual capital appears in this context as a hidden value that causes benefits in the Financial Statements. It guides companies towards achieving competitive advantages and higher returns and reveals that the economic value of business resources is more the result of intellectual capital and less the production of goods and/or services. With regard to information technologies, the authors claim that, increasingly, it has assumed a greater role in all aspects, from production, distribution and sales methods, being the factor that advances the perspectives of future returns. If companies have available, correct, accurate and timely information, they can attract competitive advantages, with information being an important strategic source. Investing in information technologies is of all importance to improve the skills and competences of companies. Hsu et al. [7] are also three other authors who research on the boundaries of knowledge between information systems and business disciplines, from a perspective centered on intellectual capital. Indeed, the authors state that the development of information systems can be considered as a kind of collaboration between users and those responsible for their development. Having few skills to leverage localized knowledge, embedded in these two types of stakeholders, can serve as an obstacle to software development in order to achieve high performances. Therefore, exploring directions to efficiently bridge the frontiers of knowledge in order to facilitate access to it, is essential. From the point of view of the research method used, the authors resorted to a survey in order to carry out the empirical test. This approach has its origins in previous literature on the topics. Respondents were professionals who, in some way, dealt with the development of information systems. Thus, they performed a two-step approach to data collection. In the first, they contacted the 251 managers of the information systems departments of the “Taiwan Information Managers Association”. Via telephone, they informed the purpose of the research and verified their availability to participate. For those who accepted, they were asked to nominate project managers, group leaders, senior members within organizations, among others. For companies with two or more completed projects, each contact’s information was recorded. In all, a total of 750 projects were identified. In the second, the aforementioned survey was carried out, and a survey was delivered to the 750 managers of the mentioned groups, identified in the first step. A total of 279 answers were obtained, corresponding to a answer rate of 35.6%. As omitted answers were obtained, only a total of 267 answers from 113 companies were considered. To ensure sample representativeness, two analyzes were carried out by the researchers: first, companies that were able to participate in the study, were compared with those that were not. No differences were found between the two groups in terms
of size and business sector. It was ensured that there were no significant differences, between those that were chosen. From a socio-demographic point of view, 73% were male and 27% were female. Among males, 58% had a Bachelor’s degree, and 35% had a Master’s degree. Among them, 43% were programmers, 18% systems analysts and 19% project leaders. With regard to age, 28% were between 21 and 30 years old, 60% between 31 and 40 years old, 10% between 41 and 50 years old, and over 51 years old only 1.5%. As the main conclusions drawn by the authors, it is emphasized that the frontiers of knowledge played an important role in forecasting systems and in the quality of projects, as well as having a mediation role, between intellectual capital and the performance of information systems. The three components of intellectual capital (human capital, relational capital, and structural capital) have been shown to have a significant impact on knowledge efficiency. The magnitude of the impact of the human capital component on knowledge proved to be moderated by the relationship between users and those who develop information systems. Generally speaking, higher (lower) levels of relational capital held by the two types of stakeholders minimize (maximize) the negative impacts of insufficient understanding of effective knowledge. As main limitations, the researchers mention that, a cross-sectional sample used may have inversely affected intellectual capital. So, future research recommends the use of a temporal sample. On the other hand, in the sample used, only one side was consulted in understanding the efficiency of knowledge. Indeed, this level should be more detailed and not limited to the two types of stakeholders. Reich and Kaarst-Brown [8] are two authors whose focus, in their research, refers to the creation, of intellectual and social capital, through information technologies, in the context of career transition. Certain organizations must continuously innovate with information technologies in order to maintain their competitive advantages. The idea is to illustrate, using a case study, how “Clarica Life Insurance Company” created, from 1993 to 1996 (sample period), the channel that allowed business within the innovations in information technologies. This company is a financial institution that provided financial services to customers in Canada and the United States at that time, including life insurance, investment products, employee benefits, management services for people with reduced mobility, financial planning, mortgage loans, and pension plans. It should also be noted that, in 2002, this company was acquired by “Sun Life”. Theoretically speaking, its foundations lie in the works of [15], in which theories of the co-creation of intellectual and social capital were created. With regard to the research method, it should be noted that the authors made use of individual interviews, with those who occupied the most important positions in the aforementioned sample period. In order to overcome problems arising from the aggregation of individuals’ answers, the authors collected them through various data sources, such as surveys, interviews and published documents originating from the company and the media. They also carried out the triangulation between three
different organizational groups, which allowed insights into former workers, professionals in general and business professionals, who exercised activity in information technologies and even those who still exercise activity in the same domain. In this way, the intention is understanding the evidence demonstrated in such a way that one can know the career transitions and the results obtained. With regard to the conclusions, the authors divide them into two types of categories: enrichment of knowledge of career transitions, based on the case study, “Clarica Life Insurance Company”. Thus, from the point of view of these conclusions, they are more identified by the authors as implications for research. Another conclusion is related to the fact that the approach taken shows little ability to see, in a comprehensive way, political or structural issues. From the point of view of the implications for management, there is no doubt for the authors that the managers of “Clarica Life Insurance Company” recognize the value of intellectual and social capital in the information technologies involved over several years. First, an assessment of the initial social capital between the business and information technology areas should be initiated. Second, each company can face similar or different impediments to the beginning of this spiral or its continuity. Cunha et al. [16], finally, there are three authors, who related the intellectual capital and information technologies, carrying out a systematic review of the results they have arrived at. In fact, according to these authors, the world is experiencing such an evolution that the economy is increasingly based on knowledge, information technologies, innovation, and telecommunications. This rise of the economy based on the knowledge has increased interest in the theory of intellectual capital, which aims to manage the intangible assets of organizations. Companies that belong to these activity sectors recognize, in intellectual capital, the key based on knowledge that contributes to creating competitive advantages in them. The research seeks to answer the following question: How do intellectual capital and information technology relate to each other? by resorting to the aforementioned systematic review, based on four steps: conducting the researcher, selecting papers based on their titles and abstracts, content analysis and, finally, mapping evidence and discussions. Thus, with regard to the research method, the approach is that of a systematic review in order to make it understandable and unbiased research, distinguished from the traditional review. The process covers some stages, culminating in a thematic map with their respective syntheses. Regarding the first step, conducting the research, the process was carried out through an automatic search in the bibliographic database engines: “Elsevier” (Science Direct), “Wiley Online Library”, and “Emerald Collections”. In the second step, selection of papers based on their titles and abstracts, these were read in order to exclude those that were not related to the scope of the research. Thus, the result was forty-nine papers selected in total, of which twenty-eight from “Elsevier” (Science Direct) (25% of 113), three from “Wiley Online Library” (4% of 74 papers), and eighteen from “Emerald Collections” (17% of 103 papers). In the third step, content analysis, all papers were read and analyzed according to inclusion and exclusion criteria. According to the inclusion criteria, there are those
that are only considered related to journals, with relevance being given to the topics of intellectual capital and information technologies and, according to the exclusion criteria, the fact that the papers do not focus only on the two topics mentioned. Finally, in mapping evidence and discussions, all papers were analyzed and grouped by five themes, in order to provide an answer to the aforementioned research question: statistical analysis, information technologies, technological knowledge, intellectual capital assets and theory of intellectual capital (understanding and sharing knowledge). As main conclusions to be highlighted, it should be noted that human capital was the most studied component and relational capital was the least, which can serve as a basis for future research. Some topics can be highlighted, such as the relationship between intellectual capital and information technologies, which identify them with knowledge management, learning in organizations, human resources management, innovation and the creation of new knowledge, absorption capacities and competitive advantages. In short, the authors conclude that the adoption of intellectual capital management and information technologies confirm that the needs of the new economy have been met. Knowledge generates new knowledge that can be achieved from the creation of knowledge assets generated by stakeholders. A research of this nature helps to clarify the procedures for managing intellectual capital in information technology projects, which opens new horizons for future research themes.
3 Conclusions This research refers to the binomial intellectual capital and information systems (technology), with regard to the literature review. Its goal is to dissect, within the selected literature review, that would satisfy the two topics, what it refers to about it. Given the small number, it may suggest to the most demanding reader that it is nothing more than a mere collection of papers, the result of a selection that should deserve more development and care. It turns out, however, that this is not true. The papers obtained, in which the two criteria were present, were few and that is why the result was meager, far below the desired. Just as possible. Throughout this research, essentially, the geographical contexts were the Kingdom of Jordan, Tehran (Iran), Taiwan, and Canada. Thus, it can be concluded these geographic locations where there would be more scientific interest are not included. In addition to the scarcity of research, it is noted that, on top of that, its content is devoid of relevance, insofar as it does not contribute to the expansion of scientific knowledge in the field, making it possible to learn more. One of the possible explanations for the occurrence of this content devoid of relevance is due to the fact that the combination of intellectual capital and information systems (technology), is difficult to treat, together. Moreover, as explained in
the introduction, the junction of two or more topics, regarding, namely, intellectual capital, is less applicable in practical reality. In aggregate terms, it can be synthetically inferred that the selection of the five papers is related with the impact of intellectual capital on the development and efficiency of accounting information systems (within the scope of companies in the industrial sectors), with the evaluation of the impact of technological information and intellectual capital, on future business returns in capital markets, with the frontiers of knowledge between information systems and the disciplines associated with business focused on intellectual capital, with the creation of intellectual capital through technologies of information, inserted in career transition and, finally, with a systematic review of the results found regarding intellectual capital and information technologies. In comparative terms, some ideas can be mentioned. Indeed, for [9], the development of accounting systems cannot be achieved without the intellectual capital and, for [10], this is the basis of competitive advantages along with high returns in capital markets. Adequate investment in information technologies, for [10], is important to improve business skills and these, too, underlie high rates of return in the same markets. With regard to the relationship between information systems and business disciplines, from the point of view of intellectual capital, studied by [7], if a company has few qualifications to leverage knowledge, this constitutes an obstacle to developing software. in order to explore new directions that go beyond the frontiers of knowledge, facilitating access to it. Reich and Kaarst-Brown [8] relates the creation of intellectual capital with information technologies, in the context of career transition. In a case study, the authors concluded that, in this same context, there was enrichment in the career transition and suggest new theoretical developments. Relating information technologies with intellectual capital, in [16], it appears that there has been an increase in the knowledge-based economy, namely, which has had an impact on theories about intellectual capital and other intangible assets. Thus, the management of intellectual capital, together with information technologies, confirms the idea that knowledge generates new knowledge. It can therefore be said that there is still a long way to go in the binomial in question. In the contributions of this research, it can be confirmed that what exists in the literature on this binomial, is scarce in number, little relevant in content, and in terms of practical utility, little or nothing was obtained. However, without a joint, integrated arrangement, based on a brief literature review alluding to the binomial, intellectual capital and information systems (technology), nothing could be known, based on scientific papers. Thus, this work constitutes the missing demonstration of the content of this binomial. With regard to the implications, it is possible to underline what refers to the very large area that remains to be filled, for instance, more developed economies. Perhaps, this filling in will prove to be more fruitful and will provide better results. Regarding the limitations, they basically have to do with the fact that the primary and secondary sources of data are based on what has already passed, as opposed to being on what is yet to come.
Finally, as far as future avenues of research, they are multifaceted. For instance, considering different activity sectors, more or less technological, finding an answer to the question of whether in sectors intensive in the labor factor, the adequacy is satisfactory in the employment of the binomial or not [11]. On the other hand, other future avenues of research are the geographic core being more incident in countries with more developed economies (Europe, America, …), with the help of the intellectual capital and information systems (technologies). Regarding the answer to the research question, it can be said that the aforementioned review is very small, having an almost null content, scientifically. From a concrete point of view, regarding the specificity to be highlighted, it is limited to the absence of specific contributions that can be used scientifically. In a word: nothing.
References 1. Gómez-González J, Rincon C, Rodriguez K (2012) Does the use of foreign currency derivatives affect firm’s market value? Evidence from Colombia. Emerg Mark Financ Trade 48(4):50–66 2. Akman G, Yilmaz C (2008) Innovative capability, innovation strategy and market orientation: an empirical analysis in Turkish software industry. Int J Innov Manag 12(1):69–111 3. Arvan M, Omidvar A, Ghodsi R (2016) Intellectual capital evaluation using fuzzy cognitive maps: a scenario-based development planning. Expert Syst Appl 55:21–36 4. Boekestein B (2006) The relation between intellectual capital and intangible assets of pharmaceutical companies. J Cap Intellect 7(2):241–253 5. Chen Y, Lin M, Chang C (2009) The positive effect of relationship learning and absorptive capacity on innovation performance and competitive advantage in industrial markets. Ind Mark Manag 38(2):152–158 6. Croteau A, Bergeron F (2001) An information technology trilogy: business strategy, technological development and organizational performance. J Strat Inf Syst 10(2):77–99 7. Hsu J, Lin T, Chu T, Lo C (2014) Coping knowledge boundaries between information system and business disciplines: an intellectual capital perspective. Inf Manag 1–39 8. Reich B, Kaarst-Brown M (2003) Creating social and intellectual capital through IT career transitions. J Strat Inf Syst 12:91–109 9. Al-Dalahmeh S, Abdilmuném U, Al-Dulaimi K (2016) The impact of intellectual capital on the development of efficient accounting information systems applied in the contributing Jordanian industrial companies—viewpoint of Jordanian accountant auditors. Int J Acc Financ Rep 6(2):356–378 10. Zeinali K, Zadeh F, Hosseini S (2019) Evaluation of the impact of information technology capital and intellectual capital on future returns of companies in the capital market. Int J Learn Intellect Cap 16(3):239–253 11. Andriessen D (2004) Making sense of intellectual capital—designing a method for the valuation of intangibles. Elsevier, Butterworth-Heinemann, p 2004 12. Fedotova M, Loseva O, Fedosova R (2014) Development of a methodology for evaluation of the intellectual capital of a region. Life Sci J 11(8):739–746 13. Goebel V (2015) Estimating a measure of intellectual capital value to tests its determinants. J Intellect Cap 16(1):101–120 14. Gogan L, Draghici A (2013) A model to evaluate the intellectual capital. Procedia Technol 9:867–875
15. Nahapiet J, Ghoshal S (1998) Social capital, intellectual capital, and the organizational advantage. Acad Manag Rev 23(3):242–266 16. Cunha A, Matos F, Thomaz J (2015) The relationship between intellectual capital and information technology findings based on a systematic review. In: 7th European conference on intellectual capital ECIC, pp 1–11
REST, GraphQL, and GraphQL Wrapper APIs Evaluation. A Computational Laboratory Experiment
Antonio Quiña-Mera, Cathy Guevara-Vega, José Caiza, José Mise, and Pablo Landeta
Abstract This research studies the effects of development architectures on the quality of APIs by conducting a computational laboratory experiment comparing the performance efficiency of a GraphQL API, a REST API, and a GraphQL API that wraps a REST API. Open data from the Electronic Commerce Chamber of Ecuador, part of a national e-commerce research project, was used. To characterize quality, we used ISO/IEC 25023 metrics in different use cases of e-commerce data consumption and insertion. Finally, we statistically analyzed the experiment results, which indicate a difference in quality between the REST API, the GraphQL API, and the GraphQL API (wrapper); this being the case, the GraphQL API performs more efficiently.
Keywords REST API · GraphQL API · Wrapper · Computational laboratory experiment · ISO/IEC 25023 · e-commerce
A. Quiña-Mera · C. Guevara-Vega (B) · P. Landeta Universidad Técnica del Norte, 17 de Julio Avenue, Ibarra, Ecuador e-mail: [email protected] A. Quiña-Mera e-mail: [email protected] P. Landeta e-mail: [email protected] A. Quiña-Mera · C. Guevara-Vega eCIER Research Group, Universidad Técnica del Norte, Ibarra, Ecuador J. Caiza · J. Mise Universidad de Las Fuerzas Armadas ESPE, Quijano y Ordoñez Street, Latacunga, Ecuador e-mail: [email protected] J. Mise e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Anwar et al. (eds.), Proceedings of International Conference on Information Technology and Applications, Lecture Notes in Networks and Systems 614, https://doi.org/10.1007/978-981-19-9331-2_34
1 Introduction
GraphQL is a query language for web-based application programming interfaces (APIs) proposed by Facebook in 2015 that represents an alternative to the use of traditional REST APIs [6, 7]. The work presented by Brito et al. illustrates the simplicity of GraphQL APIs compared to REST APIs: a GraphQL query names only the required fields, whereas REST returns a very long JSON document with many fields, of which only a few are used [9, 11]. The evolution of computer systems has motivated the creation of more efficient technologies for developing a software technology architecture. In this sense, we propose the construction of a GraphQL API that consumes data from a database and a GraphQL API that wraps an existing REST API, called a GraphQL wrapper. The consumption efficiency of these APIs is evaluated using software product quality metrics based on ISO/IEC 25023 [4]. We apply this project to the existing REST API of the Electronic Commerce Chamber of Ecuador that exposes open e-commerce data, with two purposes: (i) to provide new open data consumption functionality to the technology community, and (ii) to identify which API is the most efficient (the focus of our study). Therefore, we define the following research question: What is the effect of the architecture in API development on the external quality of the software? We answer the research question with a computational laboratory experiment [3] to compare the efficiency of three APIs built with different architectures: REST, GraphQL, and wrapper (REST + GraphQL), using the open e-commerce data of the Ecuadorian Chamber of Commerce. The rest of the document is organized as follows: Sect. 2 briefly describes REST and GraphQL. Section 3 provides an experimental setup to compare the performance efficiency of the GraphQL, REST, and GraphQL (wrapper) APIs. Section 4 shows the execution of the experimental design proposed in Sect. 3. Section 5 shows the results of the experiment execution. Section 6 shows the threats to the validity, execution, and results of the experiment. Section 7 discusses the results. Finally, Sect. 8 shows the conclusions of the study and future work.
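To make the over-fetching contrast concrete, the sketch below compares the two request styles in Node.js. The endpoint URLs, route names, and field names are hypothetical placeholders, not the APIs built in this study.

```javascript
// Hypothetical endpoints and field names, for illustration only.
// In Node.js 18+ the global fetch API can be used directly.

// REST: the server fixes the payload, so the full survey documents are
// returned even if the client only needs two fields.
async function restQuestions() {
  const res = await fetch('http://localhost:3000/api/questions?year=2020');
  return res.json(); // complete records; unused fields are simply discarded
}

// GraphQL: the query itself lists the required fields, so only those travel.
async function graphqlQuestions() {
  const res = await fetch('http://localhost:4000/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: '{ questions(year: 2020) { id text } }' }),
  });
  return (await res.json()).data.questions;
}
```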
2 REST and GraphQL
REST (REpresentational State Transfer) is an architectural style developed as an abstract model for web architecture and used to guide the definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers [9]. Upon its creation in 2000, it became a solution to eliminate the complexity of web services and transform service-oriented architectures [10]. Specifically, it can be mentioned that, at its creation, it was a lighter solution than the Web Services Description Language (WSDL) and SOAP due to the use of XML-based messages, with which it was possible to create very useful data sets [8]. GraphQL is a language that uses a graph pattern as its basic operational unit [13]. A graph pattern consists of a graph structure and a preset of
graph attributes [17]. It is a framework that includes a graph-based query language, whose semantics are analyzed by Hartig and Pérez [15]. There are REST APIs whose providers wish to have the advantages of GraphQL, for which additional interfaces are required. This is where the creation of GraphQL wrappers for existing REST APIs comes in. Once it receives a GraphQL query, a wrapper passes the corresponding requests on to the target API [12].
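As a rough illustration of this wrapping idea, the sketch below defines a minimal GraphQL schema whose resolver simply forwards each query to an existing REST endpoint. The schema, URLs, and field names are assumptions made for the example, not the wrapper implemented in this paper.

```javascript
// Minimal GraphQL wrapper sketch: the resolver delegates to a REST API.
// Assumes Node.js 18+ (global fetch) and the apollo-server package.
const { ApolloServer, gql } = require('apollo-server');

const REST_BASE = 'http://localhost:3000/api'; // hypothetical target REST API

const typeDefs = gql`
  type Question {
    id: ID!
    text: String
    year: Int
  }
  type Query {
    questions(year: Int!): [Question]
  }
`;

const resolvers = {
  Query: {
    // The wrapper stores nothing itself: it translates the GraphQL query
    // into the corresponding REST call and relays the JSON response.
    questions: async (_, { year }) => {
      const res = await fetch(`${REST_BASE}/questions?year=${year}`);
      return res.json();
    },
  },
};

new ApolloServer({ typeDefs, resolvers })
  .listen({ port: 4001 })
  .then(({ url }) => console.log(`GraphQL wrapper ready at ${url}`));
```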
3 Experimental Setting
3.1 Goal Definition
The objective of the computational laboratory experiment is to compare the efficiency of the GraphQL, REST, and GraphQL (wrapper) architectures applied to the open e-commerce data of the Ecuadorian Chamber of Electronic Commerce.
3.2 Factors and Treatments
The investigated factor is the external software quality of API GraphQL, API REST, and API GraphQL (wrapper), operationalized with performance efficiency metrics. The treatments applied to this factor are:
– GraphQL architecture for developing APIs.
– REST architecture for developing APIs.
– GraphQL architecture (wrapper) for API development.
3.3 Variables
To measure the performance efficiency of the GraphQL, REST, and GraphQL (wrapper) architectures, we conducted a computational laboratory experiment. We relied on the following metrics from the ISO/IEC 25023 standard [4]:
– Average response time is the time it takes to complete a job or an asynchronous process. The measurement function is

X = \frac{\sum_{I=1}^{n} (B_I - A_I)}{n}   (1)

where A_I is the time to start job I, B_I is the time to complete job I, and n is the number of measurements.
– Average system response time is the average time taken by the system to respond to a user task or system task:

X = \frac{\sum_{i=1}^{n} A_i}{n}   (2)

where A_i is the time taken by the system to respond to a specific user task or system task at the ith measurement, and n is the number of measured responses.
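As a worked illustration of these two means, the measurements could be aggregated as shown below; the helper names and sample timestamps are ours, not part of the standard.

```javascript
// jobs: array of { start, end } timestamps in milliseconds for each measured job.
function meanResponseTime(jobs) {
  // Eq. (1): sum of (B_I - A_I) over all jobs, divided by n
  const total = jobs.reduce((sum, { start, end }) => sum + (end - start), 0);
  return total / jobs.length;
}

// responseTimes: array with the system response time A_i of each measurement.
function meanSystemResponseTime(responseTimes) {
  // Eq. (2): sum of A_i divided by n
  return responseTimes.reduce((sum, t) => sum + t, 0) / responseTimes.length;
}

// Example with three measured jobs: (120 + 140 + 75) / 3 ≈ 111.67 ms
console.log(meanResponseTime([
  { start: 0, end: 120 },
  { start: 10, end: 150 },
  { start: 20, end: 95 },
]));
```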
3.4 Hypothesis
We propose the following hypotheses for the experiment:
– H0 (Null hypothesis): There is no difference in the external quality of APIs developed with GraphQL, REST, or GraphQL (wrapper).
– H1 (Alternative hypothesis): There is a difference in the external quality of APIs developed with GraphQL, REST, or GraphQL (wrapper); the GraphQL API has a more efficient performance.
– H2 (Alternative hypothesis): There is a difference in the external quality of APIs developed with GraphQL, REST, or GraphQL (wrapper); the REST API has a more efficient performance.
– H3 (Alternative hypothesis): There is a difference in the external quality of APIs developed with GraphQL, REST, or GraphQL (wrapper); the GraphQL API (wrapper) has a more efficient performance.
3.5 Design
We based the design of the experiment on the execution of four use cases: two data query use cases with complexity of up to two levels of relationship in the data structure, and two data insertion use cases. We will execute the query use cases with the following numbers of records: 1, 10, 100, 1000, 10,000, 100,000, and 300,000, against each API (REST, GraphQL, and GraphQL (wrapper)).
3.6 Use Cases Below, we show four use cases adapted to the reporting structure from 2017 to 2020 from an annual survey conducted by the Ecuadorian Chamber of Electronic Commerce on e-commerce: Use case UC-01. Query the global data of the questions asked in the e-commerce survey conducted in 2017, 2018, 2019, and 2020. Use case UC-02. Queries data on the frequency of internet usage, reasons for purchase, and reasons for non-purchase for 2017, 2018, 2019, and 2020. Use case UC-03. Inserts questions from the 2020 e-commerce survey.
Fig. 1 Computational laboratory experiment architecture (diagram: on the Local-PC of the experimental laboratory, a client application running the experimental tasks executes the REST, GraphQL, and GraphQL (wrapper) use cases against the e-commerce REST API, the GraphQL API, and the GraphQL API (wrapper); each GraphQL service exposes a type system and resolvers that issue data requests to the e-commerce database, and responses are collected as JSON files)
Use case UC-04. Inserts the frequency of internet usage, reasons for purchase, and reasons for not purchasing for 2020. A sketch of how such an insertion could be expressed is shown below.
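The mutation name, fields, and endpoint in the following sketch are illustrative assumptions, not the project's actual schema; it only shows the general shape of an insertion use case issued against a GraphQL endpoint.

```javascript
// Hypothetical UC-03-style insertion expressed as a GraphQL mutation.
async function insertQuestion(text, year) {
  const res = await fetch('http://localhost:4000/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: `
        mutation AddQuestion($text: String!, $year: Int!) {
          addQuestion(text: $text, year: $year) { id }
        }
      `,
      variables: { text, year },
    }),
  });
  return (await res.json()).data.addQuestion;
}

insertQuestion('Did you buy online in 2020?', 2020).then(console.log);
```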
3.7 Experimental Tasks In this section, we design a computational laboratory to execute the experiment set up in Sect. 3.5, see Fig. 1. The requirements of the experimental tasks are detailed below: Experimental task 1—REST API queries and inserts. This task executes the use cases of querying and inserting data over the REST API of e-commerce data using a client application that automates this process. Experimental task 2—GraphQL API queries and inserts. This task executes the use cases of querying and inserting data over the GraphQL API of e-commerce data using a client application that automates this process. Experimental task 3—GraphQL API (wrapper) queries and inserts. This task executes the use cases of query and data insertion over the GraphQL API (wrapper) of the e-commerce data using the same client application that automates task 1.
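A hypothetical sketch of how such a client application could automate a query use case and record the response times is given below; the endpoint, query, and record counts mirror the design in Sect. 3.5 but are placeholders rather than the project's actual code.

```javascript
// Runs one query use case against a given endpoint for each record count
// and records the elapsed time of every run (Node.js 18+, global fetch).
const RECORD_COUNTS = [10, 100, 1000, 10000, 100000, 300000];

async function timeGraphqlQuery(endpoint, limit) {
  const started = Date.now();
  await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: `{ questions(limit: ${limit}) { id text } }` }),
  });
  return Date.now() - started; // elapsed milliseconds for this run
}

async function runUseCase(label, endpoint) {
  const rows = [];
  for (const records of RECORD_COUNTS) {
    rows.push({ api: label, records, ms: await timeGraphqlQuery(endpoint, records) });
  }
  console.table(rows); // results are then copied into the spreadsheet
}

runUseCase('GraphQL API', 'http://localhost:4000/graphql');
```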
3.8 Instrumentation The instruments used in the experiment are described, such as the infrastructure, technology, and libraries that compose the computational laboratory: Specification of the local computer where the APIs are implemented: – Linux Ubuntu 3.14.0 operating system.
– vCPU: 1 core.
– RAM: 2 GB.
– Hard disk: 30 GB.
Development environment, which consists of the following technologies:
Backend (REST API, GraphQL API, and GraphQL API (wrapper)):
– Visual Studio Code (code editor).
– Node.js (JavaScript runtime environment).
– npm (package management system).
– Express.js (web application framework).
FrontEnd (client application):
– Visual Studio Code (code editor).
– React.js (JavaScript library for creating user interfaces).
– Apollo Client (queries to GraphQL APIs).
The IBM SPSS [18] application was used for data collection and analysis.
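For orientation, a minimal Express.js endpoint of the kind exercised by the experiment might look like the sketch below; the routes, data, and port are assumptions made for the example, not the project's backend.

```javascript
// Minimal REST API sketch with one query route and one insertion route.
const express = require('express');
const app = express();
app.use(express.json());

// In-memory stand-in for the e-commerce survey data used in the experiment.
const questions = [{ id: 1, text: 'How often do you shop online?', year: 2020 }];

// Query (UC-01 style): return survey questions, optionally filtered by year.
app.get('/api/questions', (req, res) => {
  const year = Number(req.query.year);
  res.json(year ? questions.filter((q) => q.year === year) : questions);
});

// Insertion (UC-03 style): add a question sent in the request body.
app.post('/api/questions', (req, res) => {
  const question = { id: questions.length + 1, ...req.body };
  questions.push(question);
  res.status(201).json(question);
});

app.listen(3000, () => console.log('REST API listening on port 3000'));
```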
3.9 Data Collection and Analysis
The steps to collect the data from the use case executions established in the experiment are: (i) execute the use case in the client application; (ii) copy the result of the use case execution from the Visual Studio Code programming IDE console and paste it into a Microsoft Excel 365 file; (iii) tabulate the collected data. We statistically analyze the experiment's results using IBM SPSS Statistics: Pearson correlation matrices to observe the degree of linear relationship between the variables, and discriminant analysis to observe the significant differences between the architectures applied in the experiment, using Wilks' Lambda statistic.
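The correlation itself was computed in SPSS; purely as an illustrative cross-check of what the coefficient measures, it could also be reproduced with a few lines of code (the sample series below are made up).

```javascript
// Pearson correlation between two paired measurement series of equal length.
function pearson(x, y) {
  const n = x.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let num = 0;
  let dx2 = 0;
  let dy2 = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    num += dx * dy;
    dx2 += dx * dx;
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}

// e.g. two APIs' response times over the six record counts (made-up values)
console.log(pearson([1, 2, 3, 4, 5, 6], [2, 4, 5, 4, 5, 7])); // ≈ 0.88
```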
4 Experiment Execution The experiment was run in October 2021, following the provisions of Sect. 3.
4.1 Preparation
We start by checking that the components of the experimental lab are ready to execute the use cases in the client application that consumes the REST API, GraphQL API, and GraphQL API (wrapper). We then set up the following experiment execution steps:
– In a specific GraphQL API (wrapper) endpoint, we run the data query use cases for each amount of data (10, 100, 1000, 10,000, 100,000, 300,000).
Table 1 Use Case UC-01 results (response time in milliseconds)

| # Records | REST API | GraphQL API | GraphQL API (Wrapper) |
| 10 | 4.981,4331 | 2.012,6965 | 7.118,0000 |
| 100 | 6.937,0527 | 10.351,2771 | 16.537,8169 |
| 1000 | 8.401,2710 | 12.262,4505 | 21.675,0374 |
| 10,000 | 11.399,8534 | 8.710,2970 | 20.110,1503 |
| 100,000 | 18.483,0179 | 3.380,3218 | 21.863,3397 |
| 300,000 | 39.962,5077 | 5.966,3039 | 45.928,8116 |
| Average | 15.027,5226 | 7.113,8911 | 22.205,5259 |
– In a specific GraphQL API (wrapper) endpoint, we execute the data insertion use cases for one minute.
– Then, in different REST API endpoints, we execute the data query use cases for each amount of data (10, 100, 1000, 10,000, 100,000, 300,000).
– In different REST API endpoints, we execute the data insertion use cases for one minute.
– After each execution, we copy the results to the Microsoft Excel file.
4.2 Data Collection
In this section, Table 1 shows an example of the data collected for use case UC-01, where the response time of the UC-01 execution can be observed for different numbers of records for each architecture; the times are in milliseconds.
5 Results
Table 2 shows the average response time and average system response time results as external quality metrics of the APIs, obtained from the execution of the use cases of the experiment. We observe that the response time of the GraphQL API is up to two times faster than that of the REST API and three times faster than that of the GraphQL API (wrapper); note that the result of the wrapper is similar to the sum of the results of the REST and GraphQL APIs.
5.1 Statistical Analysis
After obtaining the results, we performed the statistical analysis by calculating the Pearson correlation for the two variables (response time and system response time), as well as the test of equality of group means, indicated by Wilks' Lambda values.

Table 2 Experiment results

| Architecture | UC-01 Response time | UC-02 Response time | UC-03 Performance | UC-04 Performance | Efficiency order |
| GraphQL API | 7113,9 | 897,8 | 1,456 | 4,28 | 1 |
| REST API | 15.027,52 | 6595,72 | 1,06 | 0,7245 | 2 |
| GraphQL API (wrapper) | 22.205,52 | 7493,58 | 0,8454 | 0,6071 | 3 |

Table 3 shows the Pearson correlation matrix between the response time and system response time variables of the REST API, GraphQL API, and GraphQL API (wrapper) architectures. We observe that the execution times of the GraphQL API and the GraphQL API (wrapper) have a positive linear correlation of 0.373. On the other hand, the performance variables of the GraphQL API and the REST API have a positive linear correlation of 0.78. Table 4 shows the result of the test of equality of group means, indicating Wilks' Lambda values (with the value closest to zero being a positive indicator). In this sense, we observe that the GraphQL API (wrapper) has the lowest significance level with 0.003, which indicates that there are differences between the groups. Therefore, we can determine that the GraphQL API has a significant advantage over the others. Likewise, we observe that the value of 0.011 of the REST API is lower than the p-value (0.05). For this reason, we accept the alternative hypothesis H1, which indicates a difference in the external quality between the REST API, the GraphQL API, and the GraphQL API (wrapper); thus, the GraphQL API performs more efficiently.
6 Threats to Validity
Threats to validity are a set of situations, factors, weaknesses, and limitations that could interfere with the validity of the results of the present empirical study, so the relevant potential threats are analyzed based on the classification proposed by Wohlin et al. [16].
Internal validity. We manipulated the GraphQL wrapper development process to expose open e-commerce data by applying the agile SCRUM methodology. We then started the experiment by performing four data query and insertion use cases with a scope of up to two levels of relationship in the data query. Next, we took the response time from the execution of the request to the receipt of the response. Finally, we evaluated the resource utilization by taking the CPU processing speed in the ISO/IEC 25023 quality metric.
Table 3 Pearson correlation matrix

| | R-TIME | R-PERF | W-TIME | W-PERF | G-TIME | G-PERF |
| R-TIME PEA | 1 | −0.906* | −0.167 | 0.320 | −0.286 | 0.703 |
| R-TIME SIG | | 0.013 | 0.752 | 0.537 | 0.583 | 0.119 |
| R-TIME N | 6 | 6 | 6 | 6 | 6 | 6 |
| R-PERF PEA | −0.906* | 1 | −0.248 | 0.082 | 0.078 | −0.793 |
| R-PERF SIG | 0.013 | | 0.635 | 0.877 | 0.883 | 0.060 |
| R-PERF N | 6 | 6 | 6 | 6 | 6 | 6 |
| W-TIME PEA | −0.167 | −0.248 | 1 | −0.980** | 0.373 | 0.386 |
| W-TIME SIG | 0.752 | 0.635 | | 0.001 | 0.466 | 0.450 |
| W-TIME N | 6 | 6 | 6 | 6 | 6 | 6 |
| W-PERF PEA | 0.320 | 0.082 | −0.980** | 1 | −0.411 | −0.276 |
| W-PERF SIG | 0.537 | 0.877 | 0.001 | | 0.419 | 0.597 |
| W-PERF N | 6 | 6 | 6 | 6 | 6 | 6 |
| G-TIME PEA | −0.286 | 0.078 | 0.373 | −0.411 | 1 | −0.419 |
| G-TIME SIG | 0.583 | 0.833 | 0.466 | 0.419 | | 0.409 |
| G-TIME N | 6 | 6 | 6 | 6 | 6 | 6 |
| G-PERF PEA | 0.703 | −0.793 | 0.386 | −0.276 | −0.419 | 1 |
| G-PERF SIG | 0.119 | 0.060 | 0.450 | 0.597 | 0.409 | |
| G-PERF N | 6 | 6 | 6 | 6 | 6 | 6 |

R: REST API; G: GraphQL API; W: GraphQL (wrapper) API; PERF: Performance; PEA: Pearson correlation; SIG: Significance (bilateral)
Table 4 Test of equality of group means

| Effect | Value | F | Hypothesis df | Error df | Sig. |
| API GraphQL—Wilks' Lambda | 0.265 | 13.860b | 1.000 | 5.000 | 0.014 |
| API REST—Wilks' Lambda | 0.244 | 15.465b | 1.000 | 5.000 | 0.011 |
| API GraphQL (wrapper)—Wilks' Lambda | 0.156 | 27.087b | 1.000 | 5.000 | 0.003 |

a. Design: Intersection
b. Exact statistic
Intra-subject design: API REST, API GraphQL, API GraphQL (wrapper)
External validity. We experimented in a computational laboratory context where we ran the use cases on the REST and GraphQL APIs (wrapper) in the same execution environment on the Local-PC. Construct validity. To minimize measurement bias, we developed the experiment execution constructs to automatically measure the response time and the number of tasks executed in a unit of time on the REST and GraphQL APIs. In addition, the constructs were defined and validated in consensus with two expert software
engineers. In addition, we used four use cases to minimize data manipulation bias in the established treatments. Conclusion validity. We mitigated threats to the conclusions by performing statistical analyses to accept one of the hypotheses raised in the experiment and thus support the study’s conclusions.
7 Discussion
To corroborate the results obtained, we analyzed other studies related to the comparison of APIs, which led to the following observations: Vogel et al. [14] present a study where they migrate part of an API of a smart home management system to GraphQL. They report the performance of two endpoints after migration and conclude that GraphQL required 46% of the time of the original REST API. Wittern et al. [12] show how to automatically generate GraphQL wrappers for existing REST APIs. They propose a tool based on the OpenAPI (OAS) specification, with which they evaluated 959 available public REST APIs and generated the wrapper for 89.5% of these APIs. In addition, Seabra et al. [2] studied three applications using REST and GraphQL architecture models. They observed that migration to GraphQL resulted in increased performance in two of the three applications with respect to the number of requests per second and the data transfer rate. In relation to the previous studies, we find that requests made with the GraphQL API have more advantages in relation to underfetching and overfetching, and that the GraphQL API handles the memory resource more efficiently than the REST API. As a limitation, studies of this type should be extended to measure the wrapper in more detail, since no major impact from it is observed in the present study.
8 Conclusions and Future Work
In this paper, we pose the research question (RQ): What is the effect of the architecture in API development on the external quality of the software? We answer the RQ by conducting a computational experiment comparing the performance efficiency (characterized with ISO/IEC 25023) of three APIs developed with the GraphQL, REST, and GraphQL-wrapping-a-REST-API architectures. We conducted a computational experimental laboratory around consuming the three APIs with a client application, which executes four common use cases (two queries and two data insertions). We characterized the external quality of the software using the metrics average response time and average system response time. After running, tabulating, and statistically analyzing the experiment results, we accept the alternative hypothesis H1, which indicates a difference in external quality between the REST API, the GraphQL API, and
the GraphQL API (wrapper); this being so, the GraphQL API performs more efficiently.
Acknowledgements Electronic Commerce Chamber of Ecuador.
References
1. Hartig O, Pérez J (2017) An initial analysis of Facebook's GraphQL language. In: CEUR workshop proceedings, Montevideo
2. Seabra M, Nazário MF, Pinto GH (2019) REST or GraphQL? A performance comparative study. In: SBCARS '19
3. Guevara-Vega C, Bernárdez B, Durán A, Quiña-Mera A, Cruz M, Ruiz-Cortés A (2021) Empirical strategies in software engineering research: a literature survey. In: Second international conference on information systems and software technologies (ICI2ST), Ecuador, pp 120–127
4. ISO/IEC 25023:2016—Systems and software engineering, ISO: The International Organization for Standardization. https://www.iso.org/standard/35747.html. Accessed 28 March 2021
5. ISO/IEC 25000 Systems and software engineering, ISO: The International Organization for Standardization. https://bit.ly/3xhut3j. Accessed 12 Feb 2021
6. Fielding RT (2000) Architectural styles and the design of network-based software architectures. PhD dissertation, University of California
7. Fielding RT, Taylor RN (2002) Principled design of the modern web architecture. ACM Trans Internet Technol 2(5):115–150
8. Sheth A, Gomadam K, Lathem J (2007) SA-REST: semantically interoperable and easier-to-use services and mashups. IEEE Internet Comput 6(11):91–94
9. Brito G, Mombach T, Valente M (2019) Migrating to GraphQL: a practical assessment. In: SANER 2019—proceedings of the 2019 IEEE 26th international conference on software analysis, evolution, and reengineering
10. Pautasso C, Zimmermann O, Leymann F (2008) RESTful web services versus "big" web services. In: WWW '08: proceedings of the 17th international conference on World Wide Web, Beijing, China
11. Brito G, Valente M (2020) REST vs GraphQL: a controlled experiment. In: Proceedings—IEEE 17th international conference on software architecture, ICSA 2020, Salvador, Brazil
12. Wittern E, Cha A, Laredo J (2018) Generating GraphQL-wrappers for REST(-like) APIs. In: 18th international conference, ICWE 2018. Springer International Publishing, Spain
13. Quiña-Mera A, Fernández-Montes P, García J, Bastidas E, Ruiz-Cortés A (2022) Quality in use evaluation of a GraphQL implementation. Lecture Notes in Networks and Systems (LNNS 405), pp 15–27
14. Vogel M, Weber S, Zirpins C (2018) Experiences on migrating RESTful web services to GraphQL. In: ASOCA, ISyCC, WESOACS, and satellite events. Springer International Publishing, Spain
15. Hartig O, Pérez J (2018) Semantics and complexity of GraphQL. In: WWW '18: proceedings of the 2018 World Wide Web conference, Switzerland
16. Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2012) Experimentation in software engineering, 1st edn. Springer, Berlin, Heidelberg
17. He H, Singh A (2008) Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of the ACM SIGMOD international conference on management of data, New York, United States, pp 405–417
18. IBM SPSS software. https://www.ibm.com/analytics/spss-statistics-software. Accessed 10 April 2021
19. Author (2022) Laboratory Package: Efficient consumption between GraphQL API Wrappers and REST API, Zenodo. https://doi.org/10.5281/zenodo.6614351
Design Science in Information Systems and Computing Joao Tiago Aparicio, Manuela Aparicio, and Carlos J. Costa
Abstract Design science is a term commonly used to refer to the field of study that focuses on the research of artifacts, constructs, and other artificial concepts. The purpose of this article is to define this domain of knowledge with respect to information systems and computing, to differentiate between what it is and what it is not, and to provide examples of both in ongoing research. To accomplish this goal, we conduct a bibliometric analysis on design science and its interaction with information systems and computation. This study aims to identify the primary aggregations of publications pertaining to design science and to analyze them chronologically. In addition, we clarify some common misconceptions about this field of study by defining what constitutes design science and what does not. We also identify the primary stages of the design science methodological approach. Keywords Design science · Bibliometric study · Research methodology
1 Introduction Research can be defined as an a