Lecture Notes in Networks and Systems
646
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Ajith Abraham · Sabri Pllana · Gabriella Casalino · Kun Ma · Anu Bajaj Editors
Intelligent Systems Design and Applications 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022) Held December 12–14, 2022 - Volume 1
Editors

Ajith Abraham
Faculty of Computing and Data Science, FLAME University, Pune, Maharashtra, India
Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, WA, USA

Sabri Pllana
Center for Smart Computing Continuum, Burgenland, Austria

Gabriella Casalino
University of Bari, Bari, Italy

Kun Ma
University of Jinan, Jinan, Shandong, China

Anu Bajaj
Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-27439-8 ISBN 978-3-031-27440-4 (eBook) https://doi.org/10.1007/978-3-031-27440-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 22nd International Conference on Intelligent Systems Design and Applications (ISDA'22), held online on the World Wide Web. ISDA'22 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA'22 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA'22 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields. ISDA'22 received submissions from 65 countries; each paper was reviewed by at least five reviewers, and based on the outcome of the review process, 223 papers were accepted for inclusion in the conference proceedings (38% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting talks:
• Kaisa Miettinen, University of Jyvaskyla, Finland
• Joanna Kolodziej, NASK - National Research Institute, Poland
• Katherine Malan, University of South Africa, South Africa
• Maki Sakamoto, The University of Electro-Communications, Japan
• Catarina Silva, University of Coimbra, Portugal
• Kaspar Riesen, University of Bern, Switzerland
• Mário Antunes, Polytechnic Institute of Leiria, Portugal
• Yifei Pu, College of Computer Science, Sichuan University, China
• Patrik Christen, FHNW, Institute for Information Systems, Olten, Switzerland
• Patricia Melin, Tijuana Institute of Technology, Mexico
We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!
ISDA 2022—Organization
General Chairs
Ajith Abraham, Machine Intelligence Research Labs, USA
Andries Engelbrecht, Stellenbosch University, South Africa

Program Chairs
Yukio Ohsawa, The University of Tokyo, Japan
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Antonio J. Tallón-Ballesteros, University of Huelva, Spain

Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China

Special Session Chair
Gabriella Casalino, University of Bari, Italy

Publicity Chairs
Pooja Manghirmalani Mishra, University of Mumbai, India
Anu Bajaj, Machine Intelligence Research Labs, USA

Publicity Team Members
Peeyush Singhal, SIT Pune, India
Aswathy SU, Jyothi Engineering College, India
Shreya Biswas, Jadavpur University, India
International Program Committee Abdelkrim Haqiq Alexey Kornaev Alfonso Guarino Alpana Srk Alzira Mota Amit Kumar Mishra Andre Santos Andrei Novikov Anitha N. Anu Bajaj Arjun R. Arun B Mathews Aswathy S U Ayalew Habtie Celia Khelfa Christian Veenhuis Devi Priya Rangasamy Dhakshayani J. Dipanwita Thakur Domenico Santoro Elena Kornaeva Elif Cesur Elizabeth Goldbarg Emiliano del Gobbo Fabio Scotti Fariba Goodarzian Gabriella Casalino Geno Peter Gianluca Zaza Giuseppe Coviello Habib Dhahri Habiba Drias Hiteshwar Kumar Azad Horst Treiblmaier Houcemeddine Turki Hudson Geovane de Medeiros
FST, Hassan 1st University, Settat, Morocco Innopolis University, Russia University of Foggia, Italy Jawaharlal Nehru University, India Polytechnic of Porto, School of Engineering, Portugal DIT University, India Institute of Engineering, Polytechnic Institute of Porto, Portugal Sobolev Institute of Mathematics, Russia Kongu Engineering College, India Thapar Institute of Engineering and Technology, India Vellore Institute of Technology, India MTHSS Pathanamthitta, India Marian Engineering College, India Addis Ababa University, Ethiopia USTHB, Algeria Technische Universität Berlin, Germany Kongu Engineering College, Tamil Nadu, India National Institute of Technology Puducherry, India Banasthali University, Rajasthan, India University of Bari, Italy Orel State University, Russia Istanbul Medeniyet University, Turkey Federal University of Rio Grande do Norte, Brazil University of Foggia, Italy Universita’ degli Studi di Milano, Italy University of Seville, Spain University of Bari, Italy University of Technology Sarawak, Malaysia University of Bari, Italy Polytechnic of Bari, Italy King Saud University, Saudi Arabia USTHB, Algeria Vellore Institute of Technology, India Modul University, Austria University of Sfax, Tunisia Federal University of Rio Grande do Norte, Brazil
Isabel S. Jesus Islame Felipe da Costa Fernandes Ivo Pereira Joêmia Leilane Gomes de Medeiros José Everardo Bessa Maia Justin Gopinath A. Kavita Gautam Kingsley Okoye Lijo V. P. Mahendra Kanojia Maheswar R. Marìa Loranca Maria Nicoletti Mariella Farella Matheus Menezes Meera Ramadas Mohan Kumar Mrutyunjaya Panda Muhammet Ra¸sit Cesur Naila Aziza Houacine Niha Kamal Basha Oscar Castillo Paulo Henrique Asconavieta da Silva Pooja Manghirmalani Mishra Pradeep Das Ramesh K. Rasi D. Reeta Devi Riya Sil Rohit Anand Rutuparna Panda S. Amutha Sabri Pllana Sachin Bhosale
Institute of Engineering of Porto, Portugal Federal University of Bahia (UFBA), Brazil University Fernando Pessoa, Portugal Universidade Federal e Rural do Semi-Árido, Brazil State University of Ceará, Brazil Vellore Institute of Technology, India University of Mumbai, India Tecnologico de Monterrey, Mexico Vellore Institute of Technology, India Sheth L.U.J. and Sir M.V. College, India KPR Institute of Engineering and Technology, India UNAM, BUAP, Mexico UNAM, BUAP, Mexico University of Palermo, Italy Universidade Federal e Rural do Semi-Árido, Brazil University College of Bahrain, Bahrain Sri Krishna College of Engineering and Technology, India Utkal University, India Istanbul Medeniyet University, Turkey USTHB-LRIA, Algeria Vellore Institute of Technology, India Tijuana Institute of Technology, México Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-grandense, Brazil Machine Intelligence Research Labs, India National Institute of Technology Rourkela, India Hindustan Institute of Technology and Science, India Sri Krishna College of Engineering and Technology, India Kurukshetra University, India Adamas University, India DSEU, G.B. Pant Okhla-1 Campus, New Delhi, India VSS University of Technology, India Vellore Institute of Technology, India Center for Smart Computing Continuum, Forschung Burgenland, Austria University of Mumbai, India
Saira Varghese Sam Goundar Sasikala R Sebastian Basterrech Senthilkumar Mohan Shweta Paliwal Sidemar Fideles Cezario Sílvia M. D. M. Maia Sindhu P. M. Sreeja M U Sreela Sreedhar Surendiran B. Suresh S. Sweeti Sah Thatiana C. N. Souza Thiago Soares Marques Thomas Hanne Thurai Pandian M. Tzung-Pei Hong Vigneshkumar Chellappa Vijaya G Wen-Yang Lin Widad Belkadi Yilun Shang Zuzana Strukova
Toc H Institute of Science & Technology, India RMIT University, Vietnam Vinayaka Mission’s Kirupananda Variyar Engineering College, India VSB-Technical University of Ostrava, Czech Republic Vellore Institute of Technology, India DIT University, India Federal University of Rio Grande do Norte, Brazil Federal University of Rio Grande do Norte, Brazil Nagindas Khandwala College, India Cochin University of Science and Technology, India APJ Abdul Kalam Technological University, India NIT Puducherry, India KPR Institute of Engineering and Technology, India National Institute of Technology Puducherry, India Federal Rural University of the Semi-Arid, Brazil Federal University of Rio Grande do Norte, Brazil University of Applied Sciences and Arts Northwestern Switzerland, Switzerland Vellore Institute of Technology, India National University of Kaohsiung, Taiwan Indian Institute of Technology Guwahati, India Sri Krishna College of Engineering and Technology, India National University of Kaohsiung, Taiwan Laboratory of Research in Artificial Intelligence, Algeria Northumbria University, UK Technical University of Košice, Slovakia
Contents
KMetaTagger: A Knowledge Centric Metadata Driven Hybrid Tag Recommendation Model Encompassing Machine Intelligence . . . 1
R. Ashvanth, Gerard Deepak, J. Sheeba Priyadarshini, and A. Santhanavijayan

KCReqRec: A Knowledge Centric Approach for Semantically Inclined Requirement Recommendation with Micro Requirement Mapping Using Hybrid Learning Models . . . 12
Vihaan Nama, Gerard Deepak, and A. Santhanavijayan

Object Classification Using ECOC Multi-class SVM and HOG Characteristics . . . 23
Khushboo Jain, Manali Gupta, Surabhi Patel, and Ajith Abraham

GA Evolved Configuration Data for Embryonic Architecture with Built-in Self-test . . . 34
Gayatri Malhotra, Punithavathi Duraiswamy, and J. K. Kishore

A Multi-layer Deep Learning Model for ECG-Based Arrhythmia Classification . . . 44
Khushboo Jain, Arun Agarwal, Ashima Jain, and Ajith Abraham

Analyzing Electoral Data Using Partitional and Hierarchical Clustering Algorithms . . . 53
Paulo Rogerio Nietto, Maria do Carmo Nicoletti, and Nilton Cesar Sacco

Medical Decision Making Based 5D Cardiac MRI Segmentation Tools . . . 65
Houneida Sakly, Mourad Said, and Moncef Tagina

India Post Service Facility Layout Design Selection and Evaluation Using MCDM Approach . . . 73
S. M. Vadivel, Sooraj Sanjai, S. Siri, and A. H. Sequeira

Weighted Pathfinding in the Paparazzi Problem with Dynamic Obstacles . . . 85
Timo Schöpflin, Pascal Zimmerli, Rolf Dornberger, and Thomas Hanne

A Rapid Review on Ensemble Algorithms for COVID-19 Classification Using Image-Based Exams . . . 96
Elaine Pinto Portela, Omar Andres Carmona Cortes, and Josenildo Costa da Silva
Indian Postal Service Quality Assessment Using Graph Theoretic Approach – A Quantitative Decision-Making Tool . . . . . . . . . . . . . . . . . . . . . . . . . 107 S. M. Vadivel, A. H. Sequeira, and Sunil Kumar Jauhar Analyzing the Critical Success Factors of Lean System Implementation in India Post Using DEMATEL Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 S. M. Vadivel, Buvanesh Chandrasekaran, Thangaraja Arumugam, K. Sivakumar, and Uduak Umoh Application of Artificial Intelligence in Mental Health . . . . . . . . . . . . . . . . . . . . . . 128 Anindya Nag, Ayontika Das, Riya Sil, Anwesha Kar, Dishari Mandal, and Biva Das Cold Rolling Mill Energy Consumption Prediction Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Danilo G. de Oliveira, José Francisco S. Filho, Fabiano Miranda, Pedro H. Serpa, and Rafael Stubs Parpinelli Virtual Reconstruction of Adaptive Spectral and Spatial Features Based on CNN for HSI Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Maissa Hamouda and MedSalim bouhlel Enhancing Rental Bike Count and Availability Prediction Using Regression Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Dhiraj Kumar, Diptirtha Chatterjee, Bibek Upadhyaya, Shailendra Nath Yadav, and Jyoti Singh Kirar Application of WASPAS Method for the Evaluation of Tamil Nadu Private Travels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 S. M. Vadivel, A. H. Sequeira, Deeksha Sanjay Shetty, and V. Chandana Time Series Forecast Applied to Electricity Consumption . . . . . . . . . . . . . . . . . . . 178 Lídio Mauro Lima de Campos A Survey on Text Processing Using Deep Learning Techniques . . . . . . . . . . . . . . 188 Akshita Tyagi, Terrance Frederick Fernandez, K. Shantha Kumari, and Amit Kumar Tyagi RePI: Research Paper Impact Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Ananya Uppal, P. Maitreyi, H. R. Mamatha, and Jamuna Human-Centred Artificial Intelligence in Sound Perception and Music Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Michele Della Ventura
Multi-objective Optimization for Sensor Networks Based Smart Parking Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 Mehdi Achour and Amine Boufaied Process Automation with Digital Robots Under Smart University Concept . . . . . 242 Sakine Akyol, Onur Dogan, and Orhan Er Performance Evaluation of Manufacturing Product Layout Design Using PROMETHEE II - MCDM Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 S. M. Vadivel, A. H. Sequeira, Vimal Kumar, and V. Chandana An Ergonomics Assessment in India Post Manual Sorting Centre Using EDAS – A MCDM Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 S. M. Vadivel, A. H. Sequeira, Uduak Umoh, and V. Chandana A Concept for QoS Management in SOA-Based SoS Architectures . . . . . . . . . . . 271 Ingolf Gehrhardt, Fouad Bahrpeyma, and Dirk Reichelt Comparative Data Oversampling Techniques with Deep Learning Algorithms for Credit Card Fraud Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286 Zainab Saad Rubaidi, Boulbaba Ben Ammar, and Mohamed Ben Aouicha An Error Sensitive Fuzzy Clustering Technique for Mammogram Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 Bhawesh K. Chaudhary, Sanjay Agrawal, P. K. Mishro, and Rutuparna Panda A Sustainable Green Supplier Selection Using CRITIC Method . . . . . . . . . . . . . . 308 S. M. Vadivel, Deeksha Sanjay Shetty, A. H. Sequeira, E. Nagaraj, and V. Sakthivel Apartment Waste Disposal Sustainable Facility Location Using ENTROPY Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 S. M. Vadivel, Suganya Palanivelu, A. H. Sequeira, and V. Chandana Prediction of Stock Price Direction Combining Volatility Indicators with Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Azdine Bilal, Abdelhadi Ifleh, and Mounime El Kabbouri Integration of Text and Graph-Based Features for Depression Detection Using Visibility Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 Nasser Ghadiri, Rasool Samani, and Fahime Shahrokh Application of VIKOR Method for Green Postal Sustainable Service Design . . . 342 S. M. Vadivel, B. Pranamya, P. Arivazhagan, A. H. Sequeira, and V. Chandana
Automotive Stamping Process Optimization Using Machine Learning and Multi-objective Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Bernard da Silva, Ana Paula Athayde Carneiro, José Osvaldo Amaral Tepedino, Jose Francisco Silva Filho, Fabiano Miranda, and Rafael Stubs Parpinelli Augmented Reality in Marketing Sector: Viewpoint of XR the Moroccan Association Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 El Mostafa Bourhim and Oumayma Labti Improving MIL Video Anomaly Detection Concatenating Deep Features of Video Clips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370 Silas S. L. Pereira and José Everardo Bessa Maia Moroccan Stock Price Prediction Using Trend Technical Indicators: A Comparison Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380 Abdelhadi Ifleh, Azdine Bilal, and Mounime El Kabbouri A Review on Cloud-Based Smart Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 Anindya Nag, Gulfishan Mobin, Anwesha Kar, Tanushree Bera, and Pretam Chandra Territorial Design and Vehicle Routing Problem Applied to the Population Census as a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404 Rogelio González-Velázquez, M. Beatriz Bernábe-Loranca, Erika Granillo-Martínez, and Guillermo De Ita Luna A Unified Framework for Knowledge Inference Based on Heterogeneous Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416 Chunfu Xie Evaluation of Convolution Primitives for Embedded Neural Networks on 32-Bit Microcontrollers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 Baptiste Nguyen, Pierre-Alain Moëllic, and Sylvain Blayac On a Structure of an Automated Differential Equation Solver Based on Machine Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 Damir Aminev, Nikita Demyanchuk, and Alexander Hvatov Stack Tag - Predicting the Stack Overflow Questions’ Tags Using Gated Recurrent Unit Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448 Varun Prakash, Sagar Raghav, Shubham Sood, Mrinal Pandey, and Mamta Arora
A Rapid Review on the Application of Unmanned Aerial Vehicles in Construction Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458 D. Yuvaraj and K. S. Anandh Retail Demand Forecasting for 1 Million Products . . . . . . . . . . . . . . . . . . . . . . . . . 467 Ioannis Pierros, Eleftherios Kouloumpris, Dimitrios Zaikis, and Ioannis Vlahavas Intelligent Mapping of Virtualized Services on Multi-domain Networks . . . . . . . 480 Vinicius Fulber-Garcia, Marcelo C. Luizelli, Carlos R. Paula dos Santos, Eduardo J. Spinosa, and Elias P. Duarte Fundus Eye Image Classification and Interpretation for Glaucoma Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Maísa Fernandes Gomes and Rafael Stubs Parpinelli Ensemble Learning Based Big Data Classification for Intrusion Detection . . . . . 501 Kamel Yasmine Kamel, Farah Jemili, and Rahma Meddeb Application of Combined SWOT and AHP Analysis to Assess the Virtual Reality and Select the Priority Factors for Education . . . . . . . . . . . . . . . . . . . . . . . . 512 El Mostafa Bourhim and Oumayma Labti Prediction of Colon Cancer Related Tweets Using Deep Learning Models . . . . . 522 Mohammed Rashad Baker, Esraa Zeki Mohammed, and Kamal H. Jihad A Combinatorial Approach: Datamining and an Efficient Deep Neural Network for Heart Disease Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533 V. K. Jyothi and Guda Ramachandra Kaladhara Sarma High-Performance Computation in Big Data Analytics . . . . . . . . . . . . . . . . . . . . . . 543 Shabnam Kumari and P. Muthulakshmi Evaluation of Semantic Parsing Frameworks for Automated Knowledge Base Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554 Martin Verrev Performing Systematic Review on Personalized Menu Scheduling Using PRISMA Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 Dorra Kallel, Ines Kanoun, and Diala Dhouib Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
KMetaTagger: A Knowledge Centric Metadata Driven Hybrid Tag Recommendation Model Encompassing Machine Intelligence R. Ashvanth1 , Gerard Deepak2(B) , J. Sheeba Priyadarshini3 , and A. Santhanavijayan1 1 Department of Computer Science and Engineering, National Institute of Technology,
Tiruchirappalli, India 2 Department of Computer Science and Engineering, Manipal Institute of Technology
Bengaluru, Manipal Academy of Higher Education, Bengaluru, India [email protected] 3 Department of Data Science, CHRIST (Deemed to Be University), Bangalore, India
Abstract. The emergence of Web 3.0 has left very few tag recommendation structures compliant with its complex structure. There is a critical need for newer novel methods with improved accuracy and reduced complexity for tag recommendation, which complies with the Web 3.0 standard. In this paper, we propose KMetaTagger, a knowledge-centric metadata-driven hybrid tag recommendation framework. We consider the CISI dataset as the input, from which we identify the most informative terms by applying the Term Frequency - Inverse Document Frequency (TF-IDF) model. Topic modeling is done by Latent Semantic Indexing (LSI). A heterogeneous information network is formalized. Apart from this, the Metadata generation quantifies the exponential aggregation of real-world knowledge and is classified using Gated recurrent units(GRU). The Color Harmony algorithm filters out the initial feasible solutions into optimal solutions. This advanced solution set is finalized into the tag space. These tags are recommended along with the document keywords. When the suggested KMetaTagger’s performance is compared to that of baseline techniques and models, it is found to be far superior. Keywords: Color Harmony · GRU · LSI · Tagging · Tag Recommendation · TF – IDF
1 Introduction
The tagging system forms a folksonomy which, unlike a taxonomy, is a non-hierarchical classification of objects using keywords by users with no restriction on vocabulary. The obvious and most important application of tag recommendation is to improve user experience. However, an effective tag recommendation system also indirectly improves other services which depend on tags like search, classification of content, and social discovery. Tags are used in Web search to measure similarity between documents and search queries and provide relevant results. Data classification can be done efficiently using tags since generated tags are descriptive and distinguishable in nature. Social
tagging also allows users to connect with people with similar interests/queries or popular events through relevant tags.

Motivation: The World Wide Web is overflowing with data, and its structure is evolving towards Web 3.0, also known as the semantic Web. There are not many techniques that are compliant with the semantic Web for tag recommendation owing to the structural complexity of Web 3.0. So there is a dire need for newer novel methods with improved accuracy and reduced complexity for tag recommendation, which is compliant with the Web 3.0 standard. Tags are potential markers for organizing content on the World Wide Web. This organization allows retrieval systems, search engines, and other expert systems to understand, retrieve, and utilize these entities. Social awareness is largely essential to overcome the competitive disparity between the social Web and the current World Wide Web framework.

Contribution: This paper proposes a semantically driven document tag recommendation framework. Since machine intelligence is being used to improve the recommendation, the proposed model's F-Measure and False Discovery Rate (FDR) are better than the baseline models. GRUs are used to classify the Metadata, and the semantic similarity of the classified instances is computed using Agents under the Color Harmony meta-heuristic optimization algorithm (MOA).

Organization: The following is a breakdown of the paper's structure. Section 2 contains the related works, while Sect. 3 envelops the proposed framework. The results and performance evaluation are found in Sect. 4. Finally, the paper draws to a close in Sect. 5.
2 Related Work
Kataria et al. [1] proposed techniques for modeling documents from both their natural language content and tags, and representing these documents, tags, and word sequences in a shared low-dimensional vector space. Hong et al. [2] proposed a topic modeling-based tag recommendation system. Topic models benefit from reduced dimensionality and document similarity. In order to find more relevant documents, this method highlights the higher topics while evaluating document similarity. When computing tag scores, the method also takes document similarity and historical tag occurrence into account. Anarfi et al. [3] demonstrated a reinforcement learning (RL) approach to automatically suggest tags for mashups. Through word vector similarities, their proposed approach performs efficient exploratory operations to automatically identify acceptable tag combinations for mashups. Zhou et al. [4] presented TagMulRec, a tool for automatically recommending tags and categorizing software objects in large-scale software information web sites that are constantly developing. TagMulRec identifies the software objects that are semantically similar to the given software object and utilizes their tags. Hassan et al. [5] introduced a content-based TR technique that uses bidirectional gated recurrent units (bi-GRUs) with attention mechanisms to encode titles and abstracts of scientific publications into semantic vectors for improving the recommendation task. Nguyen et al. [6] presented a customized deep learning strategy for image tag recommendation that takes into account both the user's preferences and visual data. In order to extract
visual information from photos in a supervised manner, they used Convolutional Neural Networks (CNNs), which currently exhibit remarkable performance for image classification and recognition. Lei et al. [7] proposed a tag recommendation model based on text categorization, in which a capsule network with dynamic routing was investigated. The intrinsic spatial link between a part and a whole is encoded by the capsule network, creating viewpoint-invariant information that naturally generalizes to other viewpoints. Vairavasundaram et al. [8] proposed using a spreading activation algorithm to investigate the effect of built-in subject ontologies in efficient tag recommendations. Wu et al. [9] established a generative model (Tag2Word) that generated words depending on the tagword distribution and the tag itself. Tang et al. [10] proposed a coherent encoder-decoder architecture. The encoder uses Recurrent Neural Networks to describe the semantics of the text information, the decoder uses a prediction path to address tag correlation. A system was put forth by Tuarob et al. [11] that automatically annotates records with bad annotation by learning from collections of records with good annotation. In [12–18] several semantically inclined knowledge-centric paradigms in support of the literature of the proposed work have been depicted.
3 Proposed Work
Fig. 1. Proposed system architecture of KMetaTagger
Figure 1 displays the architecture of a semantically driven document tag recommendation framework which is driven by the dataset and powered by Auxiliary Knowledge.
The proposed Architecture is an aggregation of two Phases. In Phase 1, the TF-IDF model is applied over the document corpus of the dataset to indicate the most informative frequent terms and the rare terms based on the frequency or the rarity of the term concerned. At the end of TF-IDF, we have the most informative frequent and rare terms across the corpus, subjected to LSI for topic modeling. Word ambivalence, their inherent lack of precision and personal style, and individual variability in word usage make word overlap measurements like TF-IDF problematic. By employing conceptual indexes that have been statistically determined rather than single words for retrieval, LSI makes an effort to overcome these problems. LSI works using the application of Singular Value Decomposition (SVD). When anticipating meaning, the SVD vectors created by LSI are more precise than looking at individual phrases. Finally, LSI can utilize relationships between words to better grasp their sense, or meaning, in a particular context.

The TF-IDF statistic examines the relevance of a word to a document in a collection of documents. For each word in a document, the TF-IDF is calculated by multiplying two metrics, the Term Frequency (TF) and the Inverse Document Frequency (IDF). The TF is a metric that determines the frequency of a term occurring in a document. Equation (1) depicts the term frequency for a term t of a document d.

$$\mathrm{tf}(t, d) = \frac{\text{count of } t \text{ in } d}{\text{number of words in } d} \qquad (1)$$

The IDF measures how common or uncommon a word is across the complete document set. An IDF score closer to 0 denotes that the word is more common. This measure is derived by dividing the total number of documents by the number of documents in which the word appears, and computing the logarithm.

$$\mathrm{idf}(t) = \log\frac{N}{(df + 1)} \qquad (2)$$

where N denotes the total number of documents in the document set, and the document frequency of t is represented by df(t); the document frequency denotes the count of documents in which the term appears. The TF-IDF equation is given by Eq. (3).

$$\mathrm{tf\text{-}idf}(t, d) = \mathrm{tf}(t, d) \times \log\frac{N}{(df + 1)} \qquad (3)$$

LSI ensures uncovering of hidden topics, auxiliary topics and associated topics. Hidden topics are uncovered, and the initial topic space becomes diverse yet relevant. The terms extracted from the dataset are sent into a series of knowledge stores individually and separately. For instance, they are sent through the CYC and DBpedia SPARQL endpoints, and relevant entities and terms are harvested from CYC and DBpedia to enrich the tag space. The Linked Open Data cloud and Wikidata are accessed via APIs by giving the extracted informative terms from the dataset as input. At the end of this phase, all this knowledge harvested in the form of entities as well as the topic modeled terms is all formalized in the form of an information network, a linked open network
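As a concrete illustration of this Phase 1 term-extraction step, a minimal sketch of TF-IDF weighting followed by LSI topic modeling via truncated SVD is given below. It assumes scikit-learn is available and that the corpus is a list of raw document strings; the number of topics, vocabulary size, and top-k cut-off are illustrative choices rather than values prescribed by the framework.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def extract_informative_terms(corpus, n_topics=50, top_k=10):
    """TF-IDF weighting followed by LSI (truncated SVD) topic modeling."""
    # TF-IDF: weight terms by their frequency in a document and rarity across the corpus
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    tfidf = vectorizer.fit_transform(corpus)           # shape: (n_docs, n_terms)

    # LSI: project the TF-IDF matrix onto a low-dimensional latent topic space
    lsi = TruncatedSVD(n_components=n_topics, random_state=0)
    doc_topics = lsi.fit_transform(tfidf)              # shape: (n_docs, n_topics)

    # Top-weighted terms per latent topic form the initial (hidden) topic space
    terms = vectorizer.get_feature_names_out()
    topic_terms = []
    for component in lsi.components_:
        top = component.argsort()[::-1][:top_k]
        topic_terms.append([terms[i] for i in top])
    return doc_topics, topic_terms
```

The topic terms returned here would then be the candidates forwarded to the CYC, DBpedia, LOD cloud, and Wikidata lookups described above.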
comprising all the terms where at least one entity is linked with another entity, using Shannon's entropy value. Equation (4) depicts the Shannon entropy of a random variable X.

$$H(X) = H(P_1, \ldots, P_n) = -\sum_{i=1}^{n} P_i \log_2 P_i \qquad (4)$$

where $P_i$ denotes the probability of $X = x_i$, with $x_i$ indicating the ith possible value of X out of n symbols.

Phase 2 generates the metadata from the real-world World Wide Web using Apache Tika. Apache Tika is a Java-based content identification and analysis platform managed by the Apache Software Foundation. Tika detects and extracts data using a variety of document parsers and document type detection algorithms. It can also be used to build a global type detection and content extraction pipeline that pulls structured text and particular metadata from a range of documents. Tika enables search engines to obtain information and metadata from websites identified during the initial crawl, and it can categorize documents based on their most crucial terms using document analysis. Since all text-based and multimedia files can be parsed through a common interface, Apache Tika is a powerful and versatile library for content analysis. The reason for harvesting metadata is to increase the number of instances in the tag space and to ensure maximum diversity with maximum relevance, growing the tag space towards a fully covered tag recommendation model.

Since the metadata is extremely extensive, it has to be classified, and a deep learning model is the best fit for this. We choose the Gated Recurrent Unit (GRU) because it performs automatic feature selection, classes are discovered automatically from the metadata, and a large volume of metadata with exponentially large variety and veracity can be classified automatically. The GRU network was created to overcome the vanishing/exploding gradient problem that is common in a plain Recurrent Neural Network. The Update Gate and Reset Gate of the GRU are employed to tackle this problem; these two vectors are used to select the data that will be output. The Update Gate governs how much prior knowledge must be carried forward, while the Reset Gate controls how much information from the past should be discarded. The Current Memory Gate is incorporated into the Reset Gate; like the Input Modulation Gate, which is a part of the Input Gate used to make the input non-linear and zero-mean, it was made a sub-component of the Reset Gate to lessen the influence of earlier data on data being passed into the future.

The classes discovered are retained. However, only 50% of the instances under each class are considered due to the large volume of data generated. This 50% of the classified instances, based on a random selection of classes using the upper concepts and random concepts, is used to compute the semantic similarity with the terms in the information network using an agent modeled in AgentSpeak under a meta-heuristic optimization algorithm (MOA). The MOA selected is the Color Harmony algorithm. The Color Harmony Algorithm (CHA) is designed to search for harmonic colors that can be combined based on how close together they are on the hue color circle in the Munsell color system and harmonic templates.
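The exact rule by which harvested entities are linked in the heterogeneous information network is not spelled out beyond the use of Shannon's entropy, so the sketch below should be read as one plausible realization rather than the framework's implementation: terms and harvested entities become nodes, and an edge is added when the entropy of their shared context distribution (Eq. 4) crosses a threshold. Both the threshold value and the notion of "shared context" are assumptions made for illustration; networkx is used only as a convenient graph container.

```python
import math
from collections import Counter
import networkx as nx  # assumed graph library; not prescribed by the framework

def shannon_entropy(observations):
    """H(X) = -sum(p_i * log2(p_i)) over the observed value distribution (Eq. 4)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_information_network(term_contexts, entropy_threshold=0.5):
    """Link two terms/entities when the entropy of their shared context
    distribution exceeds a threshold (illustrative linking rule only)."""
    graph = nx.Graph()
    terms = list(term_contexts)
    graph.add_nodes_from(terms)
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            shared = [c for c in term_contexts[a] if c in set(term_contexts[b])]
            if shared and shannon_entropy(shared) >= entropy_threshold:
                graph.add_edge(a, b, weight=shannon_entropy(shared))
    return graph

# term_contexts maps each term or harvested entity to the list of contexts
# (documents, categories, knowledge-store classes) in which it occurs.
```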
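For the Phase 2 metadata classification step, a compact GRU text classifier in the spirit described above could be sketched as follows. The framework does not prescribe a library, so TensorFlow/Keras is assumed here; the vocabulary size, sequence length, layer widths, and the assumption that metadata snippets arrive as plain-text strings with integer class labels are all illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_gru_metadata_classifier(metadata_texts, num_classes,
                                  max_tokens=20000, seq_len=200):
    """Minimal GRU classifier for harvested metadata snippets (illustrative only)."""
    vectorize = layers.TextVectorization(max_tokens=max_tokens,
                                         output_sequence_length=seq_len)
    vectorize.adapt(metadata_texts)                        # build the vocabulary first

    inputs = tf.keras.Input(shape=(1,), dtype=tf.string)   # raw metadata text
    x = vectorize(inputs)                                  # text -> integer token ids
    x = layers.Embedding(input_dim=max_tokens, output_dim=128)(x)
    x = layers.GRU(64)(x)                                  # update/reset gated recurrence
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage: metadata_texts is a list of strings, labels are integer class ids.
# model = build_gru_metadata_classifier(metadata_texts, num_classes=10)
# model.fit(tf.constant(metadata_texts), tf.constant(labels), epochs=5)
```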
When two colors are paired together, they create color harmony, which is attractive to the eye. Color relationships in the Munsell color system are founded on the three aspects of Hue, Value, and Chroma. Names for colors depend on their Hue (e.g., red, yellow, green); a color's Value determines how light or dark it is, while its Chroma determines how pure or colorful it is. The search space of the Color Harmony algorithm is initially filled with randomly generated groups of colors. Following the concentration phase, colors are first arranged on the hue circle according to their fitness value. At this stage, the diversity of the population slowly declines. The dispersion phase is started to increase diversity if population diversity falls below a set limit; otherwise, the concentration stage continues. If a purer color is discovered during either phase, the hue circle is altered, so a color-circle update follows each phase. The diversity limit is adjusted when the dispersion phase finishes. The search space should be searched thoroughly in order to find the global optimum, and early convergence of the population toward a narrow section of the search space should be prevented. The above procedure is repeated until the termination condition is met; population diversity peaks, at which point CHA is terminated.

The semantic similarity is calculated using three models. Primarily, semantic similarity is measured using SemantoSim; however, KL divergence with a step deviation of 0.25 and Heip's evenness index with a step deviation of 0.5 are also used for computing the semantic or term similarity. SemantoSim is used with a threshold of 0.75 in order to maximize the relevance of results. The SemantoSim measure is inspired by the Pointwise Mutual Information measure. If the number of terms in the query is two, the query terms are paired as (a, b).

$$\mathrm{SemantoSim}(a, b) = \frac{pmi(a, b)\,\log p(a, b)}{p(a) \cdot p(b) + \log p(b, a)} \qquad (5)$$

where pmi(a, b) refers to the Pointwise Mutual Information measure, p(a, b) is the probability of the term 'a' in its co-occurrence with 'b', and p(a) and p(b) are the probabilities of the presence of terms 'a' and 'b', respectively.

Let p(a) and q(a) be two probability distributions of a discrete random variable a. The Kullback-Leibler (KL) divergence of q(a) from p(a) is given by Eq. (6).

$$D_{KL}\big(p(a)\,\|\,q(a)\big) = \sum_{a \in A} p(a)\,\ln\frac{p(a)}{q(a)} \qquad (6)$$

Heip's evenness index ensures an even distribution of all the concepts in order to ensure diversity.

$$E = \frac{e^{H} - 1}{S - 1} \qquad (7)$$

where H is the Shannon entropy value and S is the query number.

The reason for using the Color Harmony MOA is to convert the initial solution set into a more refined and feasible solution set. This advanced solution set is finalized into the tag space. These tags are recommended along with the document keywords, which are ensured for further alteration and verification by the domain experts of the community.
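The three similarity measures can be computed directly from co-occurrence statistics. The sketch below is a plain-Python illustration only: the grouping of terms in SemantoSim follows the reconstruction in Eq. (5), and the probabilities are assumed to be estimated from term and co-occurrence counts over the corpus.

```python
import math

def pmi(p_ab, p_a, p_b):
    """Pointwise Mutual Information of terms a and b."""
    return math.log(p_ab / (p_a * p_b))

def semantosim(p_ab, p_ba, p_a, p_b):
    """SemantoSim as reconstructed in Eq. (5)."""
    return (pmi(p_ab, p_a, p_b) * math.log(p_ab)) / (p_a * p_b + math.log(p_ba))

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) of Eq. (6); p and q are aligned lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def heip_evenness(shannon_entropy, s):
    """Heip's evenness index of Eq. (7)."""
    return (math.exp(shannon_entropy) - 1) / (s - 1)

# A candidate tag would be admitted to the tag space only when, for example,
# semantosim(...) >= 0.75, mirroring the threshold used in the framework.
```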
4 Results and Performance Evaluation

The proposed KMetaTagger, which is the framework for knowledge-centric metadata-driven hybrid tag recommendation, is baselined with DRCBTR [1], TMA [2], and RLTR [3] for the performance evaluation and comparison. Also, a combination of SVM, cosine similarity, and LDA has been considered to compare the performance of the proposed KMetaTagger. The CISI document dataset was used for document-based tag retrieval, and experimentations were conducted on the CISI dataset. The University of Glasgow's Information Retrieval Group has made this text-based dataset openly accessible. The Centre for Inventions and Scientific Information (CISI) acquired the textual data for 1,460 documents and 112 associated queries. Its objective is to be used to develop information retrieval models that provide a collection of document IDs that are pertinent to a specific input. Precision, recall, accuracy, F-Measure percentages, and False Discovery Rate (FDR) are all relevant measures used for assessing the KMetaTagger's performance. Precision, recall, accuracy, and F-Measure compute and quantify the relevance of results, whereas FDR quantifies the false positives generated by the framework. These parameters were assessed using established formulations.

Table 1. Performance of the proposed KMetaTagger in comparison to other models

| Model | Average Precision % | Average Recall % | Average Accuracy % | Average F-Measure % | FDR |
| DRCBTR [1] | 88.69 | 90.11 | 89.40 | 89.39 | 0.12 |
| TMA [2] | 90.41 | 92.46 | 91.43 | 91.42 | 0.10 |
| RLTR [3] | 93.74 | 95.18 | 94.46 | 94.45 | 0.07 |
| SVM + Cosine + LDA | 85.36 | 87.42 | 86.39 | 86.38 | 0.15 |
| Proposed KMetaTagger | 96.27 | 98.07 | 97.17 | 97.16 | 0.04 |
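The reported measures follow their standard formulations; for reference, a minimal sketch, assuming the per-run counts of true positives, false positives, false negatives, and true negatives have already been tallied, is given below.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    # Harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def false_discovery_rate(tp, fp):
    # FDR is the complement of precision: FP / (TP + FP)
    return fp / (tp + fp)
```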
Table 1 shows that the suggested KMetaTagger has the highest precision, recall, accuracy, and F-Measure percentages, as well as the lowest FDR of 0.04. The KMetaTagger has the highest average precision (96.27%), the highest average recall (98.07%), the highest average accuracy (97.17%), and the highest average F-Measure (97.16%), as well as the lowest average FDR (0.04). Table 1 also clearly indicates that the DRCBTR [1] yields 88.69% of average precision, 90.11% of average recall, 89.40% of average accuracy, 89.39% of average F-Measure, and FDR of 0.12. Similarly, the TMA [2] model has an average precision of 90.41%, average recall of 92.46%, an average accuracy of 91.43%, average F-Measure of 91.42%, and FDR of 0.10. RLTR [3] yields 93.74% of average precision, 95.18% of average recall, 94.46% of average accuracy, 94.45% of average F-Measure, and FDR of 0.07. The hybridization of SVM, Cosine similarity, and LDA yields 85.36% of average precision, 87.42% of average recall, 86.39% of average accuracy, 86.38% of average F-Measure, and FDR of 0.15. Based on comparisons
with these models, the proposed KMetaTagger produces the highest precision, recall, accuracy, and F-Measure with the lowest FDR value.

The proposed KMetaTagger has the highest precision, accuracy, recall, and F-Measure with the lowest FDR because it hybridizes several models, such as TF-IDF and LSI for topic modeling. Topic modeling ensures the density of hidden topics is modeled into the framework. Besides enriching the topics in the framework, knowledge is incorporated from four distinct sources: CYC, the LOD Cloud, Wikidata, and DBpedia. This knowledge incorporation ensures the aggregation of knowledge and enrichment of entities from several real-world sources. A heterogeneous information network is formalized, which increases the diversity and the density of auxiliary knowledge. Apart from this, the metadata generation quantifies the exponential aggregation of real-world knowledge, and its classification using the deep learning model, Gated Recurrent Units (GRU), makes sure that the number of instances populated into the local framework is very high. Apart from the instances being enriched, the GRU classification simplifies the process of further recommendations. Computation of semantic similarity using SemantoSim, KL divergence, and Heip's evenness index with differential thresholds and step deviations also enhances the relevance of results, and refinement by the Color Harmony algorithm filters the initial feasible solutions into optimal solutions, ensuring that this model yields the best possible tags with significant relevance, a high amount of diversification, and enrichment of auxiliary knowledge. As a result, the proposed KMetaTagger yields best-in-class precision, accuracy, recall, and F-Measure with the lowest FDR value.

Although the DRCBTR [1] model yields above-average precision, accuracy, recall, and F-Measure percentages and an average FDR, it lags because documents and tags are incorporated only based on topic-level information. There is no magnification of knowledge from external knowledge sources into the model; it depends only on the knowledge contained within the document dataset, and its relevance computation mechanisms are also not very strong. Still, the usage of topic-level information by itself yields above-average precision, recall, accuracy, and F-Measure, even though there is room for improvement in the model. TMA [2] also does not yield very high precision, accuracy, recall, and F-Measure or a very low FDR, mainly because document similarity alone is used along with historical tag occurrence. The historical tag occurrence addresses personalization, and topic modeling is incorporated; however, topic modeling alone is insufficient for a highly enriched, knowledge-driven tag recommendation. Highly distinct and strong relevance computation methods are needed, as is entity enrichment, because relying on topic modeling alone does not enhance the diversity and density of tags. The RLTR [3] model also does not perform as expected because it uses a reinforcement learning mechanism. It learns from a document corpus, relationship extraction is based on reinforcement learning for recommending tags for mashups, and word vector similarity is used for relevance computation. Learning from a document corpus is simple, but practically covering the entire World Wide Web by learning is a cumbersome task and almost computationally impossible. Relying on subsets of documents alone does not increase and quantify the spectrum of knowledge; although this approach makes it exploratory, there is a lack of diversity.

Fig. 2. Precision vs. Number of Recommendations distribution curve for the proposed model along with other baseline models

The word vector similarity alone is not enough to make the relevance computation strong, which is why there is a lag in the RLTR [3] model. The hybridization of SVM, Cosine, and LDA also does not perform up to the mark because SVM is a conventional binary linear classifier, Cosine similarity is a naive semantic similarity computation scheme, and although LDA increases the topic density, topic instance enrichment does not occur. As a result, this model also lags, and a relevance computation mechanism relying on a single similarity measure, namely Cosine similarity, does not perform up to the mark. As the proposed hybridized KMetaTagger model rectifies all these shortcomings, it performs better than the baseline models.

From Fig. 2, it is evident that the proposed KMetaTagger occupies the highest position in the Precision vs. Number of Recommendations distribution curve. RLTR [3] occupies the second-highest position in the hierarchy, TMA [2] the third, and next in the hierarchy is DRCBTR [1]; the lowest position is occupied by the SVM, Cosine, and LDA hybridization. The KMetaTagger occupies the highest position in the curve due to the implementation of various techniques such as TF-IDF and LSI for topic modeling. Knowledge incorporation from various knowledge stores ensures the aggregation of knowledge and enrichment of entities from several real-world sources. Generation of metadata and its classification using the GRU makes sure that the number of instances populated into the local framework is very high. Calculation of semantic similarity enhances the relevance of results, and refinement by the Color Harmony algorithm filters the initial feasible solutions into optimal solutions. The DRCBTR [1] model does not occupy the highest position in the hierarchy because there is no magnification of knowledge from external knowledge sources into the model, and the relevance computation mechanisms are also not very strong. TMA [2] requires highly
distinct and strong relevance recommendation methods and entity enrichment because topic modeling alone is insufficient for highly enriched knowledge-driven tag recommendation. The RLTR [3] model also does not perform as expected because it uses reinforcement learning. Practically implementing the entire World Wide Web data by learning is a cumbersome task and almost impossible computationally. The hybridization of the SVM, Cosine, and LDA model also does not perform up to the mark because SVM is a very naive binary linear classifier. Cosine similarity is a naive semantic similarity computation method, and topic instance enrichment does not occur.
5 Conclusion
The outcomes obtained prove the adequacy of the proposed model for its intended purposes. The proposed KMetaTagger achieves the highest average accuracy of 97.17 percent and the highest average F-Measure of 97.16 percent among the models compared, implying that it is by far the most effective solution. The high accuracy and F-Measure accomplished are due to the hybridization of several models like TF-IDF and LSI for topic modeling and the usage of the Color Harmony MOA to convert the initial solution set into a more refined and feasible solution set. In order to increase the effectiveness of the model, the integration of additional metaheuristic algorithms will be looked into as part of the work's future scope.
References 1. Kataria, S., Agarwal, A.: Distributed representations for content-based and personalized tag recommendation. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1388–1395. IEEE (November 2015) 2. Hong, B., Kim, Y., Lee, S.H.: An efficient tag recommendation method using topic modeling approaches. In: Proceedings of the International Conference on Research in Adaptive and Convergent Systems, pp. 56–61 (September 2017) 3. Anarfi, R., Kwapong, B., Fletcher, K.K.: Towards a Reinforcement Learning-based Exploratory Search for Mashup Tag Recommendation. In: 2021 IEEE International Conference on Smart Data Services (SMDS), pp. 8–17. IEEE (September 2021) 4. Zhou, P., Liu, J., Yang, Z., Zhou, G.: Scalable tag recommendation for software information sites. In: 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 272–282. IEEE (February 2017) 5. Hassan, H.A.M., Sansonetti, G., Gasparetti, F., Micarelli, A.: Semantic-based tag recommendation in scientific bookmarking systems. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 465–469 (September 2018) 6. Nguyen, H.T., Wistuba, M., Grabocka, J., Drumond, L.R., Schmidt-Thieme, L.: Personalized deep learning for tag recommendation. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 186–197. Springer, Cham (May 2017) 7. Lei, K., Fu, Q., Yang, M., Liang, Y.: Tag recommendation by text classification with attentionbased capsule network. Neurocomputing 391, 65–73 (2020) 8. Vairavasundaram, S., Varadharajan, V., Vairavasundaram, I., Ravi, L.: Data mining-based tag recommendation system: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(3), 87–112 (2015)
9. Wu, Y., Yao, Y., Xu, F., Tong, H., Lu, J.: Tag2word: Using tags to generate words for content based tag recommendation. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp. 2287–2292 (October 2016) 10. Tang, S., et al.: An integral tag recommendation model for textual content. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 5109–5116 (July 2019) 11. Tuarob, S., Pouchard, L.C., Mitra, P., Giles, C.L.: A generalized topic modeling approach for automatic document annotation. Int. J. Digit. Libr. 16(2), 111–128 (2015). https://doi.org/10. 1007/s00799-015-0146-2 12. Deepak, G., Santhanavijayan, A.: OntoDynS: expediting personalization and diversification in semantic search by facilitating cognitive human interaction through ontology bagging and dynamic ontology alignment. J. Ambient Intell. Human. Comput. 1–25 (2022) 13. Naga Yethindra, D., Deepak, G., Santhanavijayan, A.: OntoQC: an ontology-infused machine learning scheme for question classification. In: Data Science and Security, pp. 265–274. Springer, Singapore (2022) 14. Palvannan, S., Deepak, G.: TriboOnto: a strategic domain ontology model for conceptualization of tribology as a principal domain. In: International Conference on Electrical and Electronics Engineering, pp. 215–223. Springer, Singapore (2022) 15. Deepak, G., Santhanavijayan, A.: QGMS: A query growth model for personalization and diversification of semantic search based on differential ontology semantics using artificial intelligence. Computational Intelligence (2022) 16. Deepak, G., Surya, D., Trivedi, I., Kumar, A., Lingampalli, A.: An artificially intelligent approach for automatic speech processing based on triune ontology and adaptive tribonacci deep neural networks. Comput. Electr. Eng. 98, 107736 (2022) 17. Arulmozhivarman, M., Deepak, G.: OWLW: ontology focused user centric architecture for web service recommendation based on LSTM and whale optimization. In: European, Asian, Middle Eastern, North African Conference on Management & Information Systems, pp. 334– 344. Springer, Cham (March2021 ) 18. Deepak, G., Santhanavijayan, A.: MKSTD: Multiple Knowledge Sourcing Approach for Topic Directories Construction and Linking Web Pages Using Fusion Semantics. In: Motahhir, S., Bossoufi, B. (eds.) ICDTA 2021. LNNS, vol. 211, pp. 565–574. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73882-2_51
KCReqRec: A Knowledge Centric Approach for Semantically Inclined Requirement Recommendation with Micro Requirement Mapping Using Hybrid Learning Models Vihaan Nama1(B) , Gerard Deepak2(B) , and A. Santhanavijayan3 1 Department of Computer Science Engineering, R V College of Engineering, Bangalore, India
[email protected]
2 Department of Computer Science Engineering, Manipal Institute of Technology Bengaluru,
Manipal Academy of Higher Education, Manipal, India [email protected] 3 Department of Computer Science Engineering, National Institute of Technology, Tiruchirappalli, India
Abstract. Software requirement recommendation is one of the most important activities when a new system is first being built. It provides a baseline of what exactly the software will do and how it is expected to perform, and it describes how the product will fulfil the stakeholders' needs. In this paper, a model to predict requirement specifications is proposed. Micro requirement mapping with a hybrid learning baseline of K-Means clustering hybridized with Support Vector Machines and cosine similarity yielded 80.17% average precision, 82.69% average recall, 81.43% average accuracy, an average F-Measure (harmonic mean of precision and recall) of 81.41%, and a 0.1983 false discovery rate, whereas the proposed KCReqRec system yielded 94.27% average precision, 96.39% average recall, 95.33% average accuracy, an average F-Measure of 95.31%, and a 0.0573 false discovery rate. Both models were run on the Kaggle Software Requirement Dataset integrated with data from the PURE (PUblic REquirements) dataset. Keywords: Hybrid Learning · Information Network · LSTM · Requirement Recommendation · Semantically Driven · TF-IDF
1 Introduction
The Software Requirement Specification (SRS) is a document that stakeholders and developers create to establish a base for the Software Development Lifecycle (SDLC) of the project or system being developed. It describes completely how the system is supposed to perform, covers the functional and non-functional requirements, and is usually produced at the end of the requirement engineering phase. This document is extremely important as it reduces the cost of development and also
reduces the amount of time and effort needed to create a product. The SRS document should handle all types of real-world scenarios and should define how the product will interact with system hardware, users, and other programs. A flawed SRS can cause major disruptions in the SDLC and in the end product, since it is the central artefact during the creation of the software. The SRS is usually produced through several meetings and discussions between the client and the stakeholders. To reduce this tedious process, a system is proposed that acts as an automated model for SRS recommendation to the stakeholders, ultimately leading to a more viable and structured paradigm for requirement elicitation. The proposed system collects the software requirements by giving hints to the user and then delivers the final SRS recommendations. Instead of asking the user for new requirements to construct the SRS, the system recommends already existing requirements of similar systems available on the market as capsules or snippets to the stakeholders. Upon receiving these snippets, the clients and stakeholders suggest modifications if they are not satisfied, and the system also generates a map for the requirements that do match. The system recommends these snippets based on the client's requirement queries and allows the clients to amend either the original query or the recommendation results upon receiving them.
Motivation: With the World Wide Web overloaded with data, a new set of semantic standards known as Web 3.0 is gaining momentum. Web 3.0 is highly cohesive and dense with information, so systems must become semantically compliant to keep up with this shift in standards. The fact that the SRS is the most important document of the SDLC, and that the field needs an approach compliant with Web 3.0 semantic standards, has been the main motivation for building a system that recommends a set of requirements to both stakeholders and clients for products under development.
Contribution: A knowledge-centric approach for semantically inclined requirement recommendation with micro requirement mapping using hybrid learning models was proposed. The techniques used to tackle this problem include semantic information networks enriched with domain-specific metadata, LSTM-based classification of the harvested metadata, TF-IDF term weighting, NPMI and Gini-Simpson semantic similarity measures, and the lightning search algorithm. The models were trained on data integrated from the PURE dataset and Kaggle's Software Requirement dataset. The precision, recall, accuracy, and F-Measure were all increased when compared to the baseline models.
Organization: The rest of the paper is organized as follows. Section 2 contains the related works, and Sect. 3 contains the proposed system architecture. The implementation, results, and performance evaluation are presented in Sect. 4, and the paper is concluded in Sect. 5.
2 Related Works Mohebzada et al. [1] have formalized a method of using systematic mapping to help in providing an overview of recommendation systems for the software requirement engineering process, the state of validation, and their characteristics. Ramos et al. [2] in their
work have used the Scrum process to extract useful data, together with both item recommendation and collaborative filtering. Mulla et al. [3] have developed a method of requirement elicitation based on stakeholder recommendation and collaborative filtering. The prescribed model tackles three main problems of requirement elicitation in large projects: biased prioritization of requirements, information overload, and inadequate stakeholder input. Sun et al. [4] have proposed a method for software library recommendation using a semantic neural model that analyzes library usage patterns. Recommendations are made from project requirement descriptions, which helps avoid cold-start problems; their Req2Lib uses sequence-to-sequence models and word2vec for the recommendation process. Zhang et al. [5] have proposed a method for recommending non-functional software requirements. Requirements are first collected from various sources, and a factor analysis technique is then applied to identify the latent factors underlying the non-functional values; cluster analysis is subsequently used to summarize the popular non-functional requirements. Xu et al. [6] use their proposed CoRM (Co-occurrence Recommend Model) to recommend software security requirements. The model extracts a product's security threats from target security documents in which the security requirements are tagged, uses the Skip-thoughts model to establish a connection between a security requirement and a threat, and calculates the semantic similarities between the different security threats. Cheema et al. [7] have put forward a web recommendation tool that extracts features from a document and calculates the reusability of data available in the form of Software Requirement Specifications of systems that have already been developed. The SRS documents are fed into the system, phrases that reflect functional features are extracted using information retrieval, tokens are then extracted and passed on, and collaborative filtering is applied to support requirement recommendation. Rajagopal et al. [8] propose a model that aids software requirement elicitation. The model identifies keywords from stakeholder interviews and uses keyword mapping to generate the system-candidate requirements; techniques such as the Capability Maturity Model and Quality Function Deployment aid the model in its recommendation process. Kumara et al. [9] have proposed REPTR (Requirement Perspective Technology Recommendation System), which recommends a technology stack for web applications; the system describes the recommended technologies and also links learning platforms to them. Hu et al. [10] use ontologies to construct a requirement knowledge framework. Their system builds a multi-ontology framework divided into generalization, task, domain, and application ontologies to support requirement elicitation, and integrates the multi-ontology framework with software requirement error patterns in a consistent form. Several further models in support of the literature of the proposed work are depicted in [11–20].
3 Proposed System Architecture
The proposed system architecture for the requirement specification document recommendation framework, depicted in Fig. 1, makes use of semantic information networks to complete its task. A semantic information network is a graphical representation of nodes containing knowledge and of the links between these nodes, which depict the hierarchical relationships between them. The process starts with interactions from the client portals, where the documents from client meetings and client verbatims are submitted as the primary input to the framework for pre-processing. During pre-processing, individual terms are extracted through several steps. First, tokenization separates the sentences into individual terms; a white space tokenizer is used, which splits tokens on the occurrence of white space in the sentence. Next, lemmatization reduces each word to its root word so that its category and meaning can be understood better; it is performed with the WordNetLemmatizer from the Natural Language Toolkit (NLTK) corpus. The next step, stop word removal, removes all words that carry no relevance or meaning for the framework, such as 'and', 'the', and 'in'; their removal is done using the regex set matching mapper and extractor. The last pre-processing step is Named Entity Recognition, in which named entities representing an organization, person, location, etc. are identified from the given input using the MaxEnt NE Chunker from the NLTK library.
Fig. 1. KCReqRec system architecture model
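A minimal Python/NLTK sketch of this pre-processing stage is given below. It follows the steps described above (white-space tokenization, WordNet lemmatization, stop-word removal, and MaxEnt NE chunking), but the paper's regex-based stop-word mapper is approximated here by NLTK's English stop-word list, so this is an illustration rather than the authors' exact implementation; NLTK resource names may vary slightly by version.

```python
import nltk
from nltk.tokenize import WhitespaceTokenizer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

# One-time downloads (names may differ across NLTK versions):
# nltk.download('wordnet'); nltk.download('stopwords')
# nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')

def preprocess(text):
    tokens = WhitespaceTokenizer().tokenize(text)               # split on white space
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(t.lower()) for t in tokens]  # reduce to root words
    stop = set(stopwords.words('english'))
    terms = [t for t in lemmas if t not in stop]                # stop-word removal
    # Named Entity Recognition with the MaxEnt NE chunker
    tree = nltk.ne_chunk(nltk.pos_tag(tokens))
    entities = [' '.join(word for word, _ in subtree.leaves())
                for subtree in tree.subtrees() if subtree.label() != 'S']
    return terms, entities
```

The returned terms feed the information network construction, while the recognized entities can be attached to the corresponding nodes.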
After the pre-processing has been completed, the terms generated are used in two different phases. In the first phase, the terms are integrated into an existing taxonomy of terms generated from the SRS documents from the centralized SVN repository. The centralized SVN repository constitutes both functional and non-functional requirements.
Out of these two categories, mostly the functional requirements serve as indicators and are thus integrated to form the initial information network. As this information network is very sparse, the pre-processed terms are subsequently sent to phase 2, which works in parallel to phase 1. The second phase deals with domain-specific metadata generation using the World Wide Web and the DSpace meta tag harvester. DSpace is open-source software (under a BSD license) that allows easy access to all kinds of online digital content, not limited to texts, images, etc. It is used to create open-access repositories which, by definition, hold research output and provide immediate, permanent, and free access to research results that anyone can download, use, and distribute. It uses a Qualified Dublin Core (QDC) based schema for metadata; a user can add a custom schema such as QDC or extend an already present base schema. The metadata is generated because, as mentioned earlier, the information network is sparse, so the domain-specific data must be increased based on the requirements; knowledge from the current World Wide Web context is thereby integrated into the localized framework and, consequently, domain data is generated. A problem with this method is that high volumes of domain metadata are generated. To tackle this issue, the deep learning model LSTM (Long Short-Term Memory) is used, chosen because of the large heterogeneity of the domain metadata. A deep learning model is also preferred because it avoids handcrafted feature selection and performs automatic discovery of classes and automatic categorization of all items and entities under these classes. Since the domain-specific metadata is exponentially large, only 50% of the automatically discovered classes are considered. The tag information is then incorporated by computing a simple semantic similarity, namely an NPMI (Normalised Pointwise Mutual Information) measure, considering positive values with a threshold of 0.5, which finally enriches the network.

The next step is to quantitatively search the SVN repository. This repository is localized to every project, sub-project, and domain; it is vertical and horizontal in nature and is linked to a single repository. To the final versions of the SRS documents, along with the domain-matching documents, the TF-IDF (Term Frequency–Inverse Document Frequency) model is applied. Through TF-IDF, the most viable documents, which are feature-rich based on the rarity and frequency of terms across the document corpus, are extracted. TF-IDF is a statistical metric that quantifies how relevant a single word is to a particular document within a collection of documents. The term frequency of a word in a document is based on the raw count of occurrences of the word in the document, modified either by the length of the document or by the raw frequency of the most frequent word in the document; it is mathematically depicted in Eq. (2). The inverse document frequency of a word defines how common or rare the word is across the entire set of documents; it is calculated by dividing the total number of documents by the number of documents that contain that word and taking the logarithm of the result, as illustrated in Eq. (3). In mathematical terms, TF-IDF is calculated as shown in Eq. (1):

$\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d)\cdot \mathrm{idf}(t,D)$   (1)

where

$\mathrm{tf}(t,d) = \log\big(1 + \mathrm{freq}(t,d)\big)$   (2)

$\mathrm{idf}(t,D) = \log\frac{N}{\left|\{d \in D : t \in d\}\right|}$   (3)

As mentioned above, the reason for applying TF-IDF is to obtain feature-rich documents from the document corpus. From the documents identified, the most informative terms are extracted based on the frequency and relativity of terms. These terms are used to compute the semantic similarity between the nodes of the information network as well as the terms of the documents. The semantic similarity is computed under two measures: the NPMI measure and the Gini-Simpson index. For the NPMI measure, positive values between 0 and 1 with a threshold of 0.5 are considered, while for the Gini-Simpson index a step deviation of 0.25 is considered. This is done to ensure a uniform distribution and diversity of documents over the terms.

The Gini-Simpson index, otherwise known as Simpson's Diversity Index, is a statistical measure of diversity. It accounts for two factors: richness, the number of different items available, and evenness, which compares how similar the counts of the different items are. It is mathematically depicted in Eq. (4):

$D = 1 - \frac{\sum n(n-1)}{N(N-1)}$   (4)

Mutual Information (MI) measures the amount of information shared between two random variables. If two random variables X and Y have marginal probabilities p(x) and p(y) and joint probabilities p(x, y), then the mutual information between them is defined as shown in Eq. (5):

$I(X;Y) = \sum_{x,y} p(x,y)\,\ln\frac{p(x,y)}{p(x)\,p(y)}$   (5)

Pointwise Mutual Information (PMI), on the other hand, measures the difference between the actual joint probability of a particular pair of events, p(x, y), and what it would be under independence, p(x)p(y). It is defined as shown in Eq. (6):

$i(x,y) = \ln\frac{p(x,y)}{p(x)\,p(y)}$   (6)

The PMI term is normalised by $-\ln p(x,y)$, which has the pleasant property of bounding not only the lower but also the upper value. The resulting Normalised Pointwise Mutual Information (NPMI) is defined as depicted in Eq. (7):

$i_{n}(x,y) = \frac{\ln\frac{p(x,y)}{p(x)\,p(y)}}{-\ln p(x,y)}$   (7)
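As a small illustration of Eqs. (4)–(7), the following Python sketch computes the NPMI score of a term pair from document-level co-occurrence probabilities and the Gini-Simpson index from raw term counts. The probability estimates and the example pair are hypothetical placeholders, not values from the paper.

```python
import math

def npmi(p_x, p_y, p_xy):
    """Normalised pointwise mutual information, Eq. (7); value lies in [-1, 1]."""
    if p_xy == 0:
        return -1.0
    return math.log(p_xy / (p_x * p_y)) / (-math.log(p_xy))

def gini_simpson(counts):
    """Gini-Simpson diversity index, Eq. (4), from raw item counts."""
    N = sum(counts)
    if N < 2:
        return 0.0
    return 1.0 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# Example: keep only term pairs whose NPMI exceeds the 0.5 threshold used in the paper.
pairs = {("login", "authentication"): (0.10, 0.08, 0.06)}   # (p_x, p_y, p_xy), illustrative
kept = {pair for pair, (px, py, pxy) in pairs.items() if npmi(px, py, pxy) >= 0.5}
```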
The NPMI measure and the Gini-Simpson index run in parallel to obtain the initial solution. This solution is not efficient, as it contains too many terms, so a metaheuristic optimization algorithm known as lightning search is used to refine it into a smaller solution set. The lightning search algorithm is a metaheuristic that is effective for real-valued numerical optimization problems. It is inspired by the mechanism of step leader propagation in the physics of lightning: fast particles known as projectiles model the step leaders, and the space projectiles demarcate the leader of the next generation. From the output of the lightning search algorithm a term set is formalized. This term set is compared with the external dataset, from which the semantic similarity is computed, and the terms are then ranked and recommended. All the requirements identified in the first step are mapped to the micro-requirements of the document by the application of this proposed architecture.
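A hedged sketch of this final ranking step is shown below: the optimised term set is treated as a query and compared against candidate requirement texts using TF-IDF cosine similarity. This is a simplification of the paper's semantic similarity computation; the full lightning search procedure is not reproduced here, and the names term_set and requirement_texts are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_requirements(term_set, requirement_texts, top_k=5):
    """Rank candidate requirement snippets by similarity to the optimised term set."""
    query = " ".join(term_set)
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(requirement_texts + [query])
    query_row = matrix[len(requirement_texts)]                 # last row is the query
    scores = cosine_similarity(query_row, matrix[:len(requirement_texts)]).ravel()
    ranked = sorted(zip(requirement_texts, scores), key=lambda pair: -pair[1])
    return ranked[:top_k]
```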
4 Implementation, Performance Evaluation and Results
The implementation of this work was carried out in Python (version 3.10.0) using Google Colaboratory as the IDE, on a machine with 16 GB RAM and an Intel i7 processor with a clock speed of 4.2 GHz. Python's NLTK libraries were used for the pre-processing tasks. The dataset used for experimentation was Kaggle's Software Requirements Dataset, which contains various functional and non-functional requirements for varying software products; it was hybridized and integrated with the PURE (PUblic REquirements) dataset, upon which the two were annotated and linked. The ontology for the domain dataset was dynamically synthesized using OntoCollab and later manually modeled using Protégé. The baseline models were implemented in the same environment, on the same dataset, and with the same performance measures as the proposed model, so that performance could be compared quantitatively. To evaluate the proposed KCReqRec model, the precision, recall, accuracy, and F-Measure percentages along with the False Discovery Rate (FDR) are used as the preferred metrics, and the values for each model are tabulated in Table 1. Precision, recall, accuracy, and F-Measure capture the relevance of the results, while the FDR quantifies the false positives produced by KCReqRec and the baseline models. To compare the results of KCReqRec, the SMRSE, RRSP, and Stakeholder + CF approaches are chosen as baseline models; in addition, a K-Means Clustering + SVM + Cosine Similarity hybridization is chosen as a hybrid scheme to compare against the metrics of KCReqRec. From Table 1 it is seen that SMRSE yields 73.21% average precision, 76.17% average recall, 74.69% average accuracy, 75.40% average F-Measure, and a 0.2679 FDR. RRSP yields 85.69% average precision, 87.38% average recall, 86.54% average accuracy, 86.53% average F-Measure, and a 0.1431 FDR. Similarly, the Stakeholder plus Collaborative Filtering approach yields 81.21% average precision, 83.87% average recall, 82.54% average accuracy, 82.52% average F-Measure, and a 0.1879 FDR.
Table 1. Comparison of performance of the proposed KCReqRec with other approaches

Model | Average Precision % | Average Recall % | Average Accuracy % | Average F-Measure % | False Discovery Rate
SMRSE [1] | 73.21 | 76.17 | 74.69 | 75.407 | 0.2679
RRSP [2] | 85.69 | 87.38 | 86.54 | 86.526 | 0.1431
Stakeholder + CF [3] | 81.21 | 83.87 | 82.54 | 82.52 | 0.1879
K-Means Clustering + SVM + Cosine Similarity | 80.17 | 82.69 | 81.43 | 81.41 | 0.1983
Proposed KCReqRec | 94.27 | 96.39 | 95.33 | 95.31 | 0.0573
The hybridization of K-Means + SVM + Cosine Similarity yields 80.17% average precision, 82.69% average recall, 81.43% average accuracy, 81.41% average F-Measure, and a 0.1983 FDR. However, from Table 1 it is clear that KCReqRec yields the best results, with 94.27% average precision, 96.39% average recall, 95.33% average accuracy, 95.31% average F-Measure, and a 0.0573 FDR. The reason KCReqRec outperforms the baseline models is that it integrates the taxonomy from the SRS documents and formalizes an information network. Moreover, it generates terms from the initial SRS document and the client portal meetings, which are pre-processed to generate the domain-specific metadata. This metadata is classified using a deep learning model (LSTM), and the top 50% of the classified instances are integrated with the taxonomy from the SRS documents before the information network is formalized. In addition, the TF-IDF model is incorporated to extract and prioritize documents based on the frequency and rarity of terms across the document corpus, and hybrid semantic similarity schemes are incorporated using the NPMI threshold criterion and the Gini-Simpson index with a step deviation criterion, which allows the proposed model to work much better than the baseline models. On top of this, the semantic similarity computation takes place under the lightning search algorithm, which identifies the most relevant entities from the list of all entities by computing the optimal solution from the set of feasible solutions and formalizing a term set. Owing to the hybridization of all these techniques, the model combines a high-relevance computational scheme with the use of global knowledge, in the form of metadata, within the localized framework; as a result, KCReqRec has the highest average precision, recall, accuracy, and F-Measure along with the lowest FDR when compared to the baseline models. The SMRSE baseline is a systematic mapping model for requirements engineering. It does not perform up to the mark because it mainly focuses on systematic mapping: a bubble plot-based mapping is proposed in which mapping characteristics are identified, criteria for exclusion and inclusion are formalized, and cross-dimensional
features are extracted to enable requirements mapping and requirements recommendation. However, it lags because all of this is quite tedious and must either be performed manually or supported by a purpose-built automation function. Criteria-based mapping also does not always perform well, and no domain knowledge is fed into the model, which is why SMRSE, although it uses a systematic mapping scheme, does not perform as expected. RRSP likewise does not perform as expected because it follows a Scrum-based strategy. It relies on several criteria such as sprint planning, Scrum methodologies with product backlog categorization, the incorporation or formulation of semi-structured user stories, the tagging of user stories and the review of these tags in the sprint review, and the involvement of the scrum master and sprint retrospective users. Although these techniques are used, the recommendation is ultimately done using collaborative filtering. Collaborative filtering requires every entity to be rated, which is not always possible, and the ratings given by users may deviate based on their understanding of the requirements. As a result this is not a very appropriate model to employ, as CF does not perform extensively well even though the Scrum characteristics ensure much more collaboration; the inclusion of both yields an 86.54% average accuracy, which is not exceptional. It depends only on human cognitive reasoning, which alone cannot be taken as the criterion for recommending requirements.
Fig. 2. Precision Vs No. of Recommendations
The Stakeholder + CF model also does not perform extensively well, as it mainly focuses on prioritizing the requirements according to stakeholder prioritization: every stakeholder is given a priority based on their importance in the requirement engineering process, and the requirements are prioritized accordingly, while K-Means nearest neighbour together with CF is used for the actual recommendations. Although the hybridization of these two is reasonable, depending only on stakeholder ratings and on prioritizing the stakeholders is not an appropriate strategy, because requirements have to be recommended based on their nature, quality, content, and meaningfulness, and not just on what they imply for a specific stakeholder. The methodology also does not incorporate global knowledge into the model; as a result, it does not yield a good output, its relevance computation model is weak, and the model lags. Thus KCReqRec, a knowledge-centric, semantically inclined approach, performs much better than the other baseline models, yielding the best results with 94.27% average precision, 96.39% average recall, 95.33% average accuracy, 95.31% average F-Measure, and a 0.0573 FDR. From Fig. 2 it is observed that the proposed KCReqRec occupies the highest curve, followed by the RRSP model, then Stakeholder + CF, then K-Means + SVM, with SMRSE lowest. As the number of recommendations increases, the precision of every model falls, yet the precision of the proposed KCReqRec model stays well above all the rest irrespective of the number of recommendations it is made to give.
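As a quick sanity check of the aggregate figures, the F-Measure reported in Table 1 is the harmonic mean of precision and recall, and the reported FDR values are consistent with 1 − precision. The few lines below reproduce this for the KCReqRec row, using the values from Table 1.

```python
# KCReqRec averages from Table 1
precision, recall = 94.27, 96.39

f_measure = 2 * precision * recall / (precision + recall)   # ~95.32 (reported as 95.318 / 95.31)
fdr = 1 - precision / 100.0                                  # 0.0573, matching the reported FDR
print(round(f_measure, 2), round(fdr, 4))
```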
5 Conclusion
Because the current Web 3.0 is a highly cohesive and dense set of semantic standards, it is important to develop a semantically inclined recommender system that also gives up-to-the-mark recommendations. A novel, semantically inclined system, KCReqRec, was proposed to provide accurate recommendations of SRS documents. The dataset used to accomplish this task was Kaggle's Software Requirements Dataset, hybridized and integrated with the PURE (PUblic REquirements) dataset, upon which the two were annotated and linked. The proposed KCReqRec system achieves much higher scores than the other baseline models, which it owes mainly to the lightning search metaheuristic optimization algorithm used to compute the semantic similarity between terms of the information network and the SVN standardized central repository. The proposed model had the lowest False Discovery Rate of 0.0573 and the highest F-Measure of 95.31%, which makes it the most semantically sound yet effective and efficient solution for requirement recommendation.
References 1. Mohebzada, J.G., Ruhe, G., Eberlein, A.: Systematic mapping of recommendation systems for requirements engineering. In: 2012 International Conference on Software and System Process (ICSSP). IEEE (2012) 2. Ramos, F.B.A., et al.: A Non-Functional Requirements Recommendation System for Scrumbased Projects. SEKE (2018) 3. Mulla, N., Girase, S.: A new approach to requirement elicitation based on stakeholder recommendation and collaborative filtering. Int. J. Softw. Eng. Appl. 3(3), 51 (2012) 4. Sun, Z., Liu, Y., Cheng, Z., Yang, C., Che, P.: Req2Lib: a semantic neural model for software library recommendation. In: 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 542–546. IEEE (2020 February)
5. Zhang, X.-L., et al.: Non-functional requirement analysis and recommendation for software services. In: 2013 IEEE 20th International Conference on Web Services. IEEE (2013) 6. Xu, Y., et al.: A co-occurrence recommendation model of software security requirement. In: 2019 International Symposium on Theoretical Aspects of Software Engineering (TASE). IEEE (2019) 7. Cheema, S.M., et al.: A recommendation system for functional features to aid requirements reuse. In: 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). IEEE (2020) 8. Rajagopal, P., et al.: A new approach for software requirements elicitation. In: Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Network. IEEE (2005) 9. Kumara, M.M.C.D.: REPTR: Requirement perspective technology recommendation system. Diss. (2021) 10. Hu, X., Liu, J.: Requirements knowledge model construction and requirements elicitation method of avionics systems software based on multi-ontology. In: 2022 24th International Conference on Advanced Communication Technology (ICACT). IEEE (2022) 11. Deepak, G., Teja, V., Santhanavijayan, A.: A novel firefly driven scheme for resume parsing and matching based on entity linking paradigm. J. Discrete Mathem. Sci. Cryptogr. 23(1), 157–165 (2020) 12. Deepak, G., Santhanavijayan, A.: OntoBestFit: a best-fit occurrence estimation strategy for RDF driven faceted semantic search. Comput. Commun. 160, 284–298 (2020) 13. Leena Giri, G., Deepak, G., Manjula, S.H., Venugopal, K.R.: OntoYield: a semantic approach for context-based ontology recommendation based on structure preservation. In: Proceedings of International Conference on Computational Intelligence and Data Engineering, pp. 265– 275. Springer, Singapore (2018) 14. Surya, D., Deepak, G., Santhanavijayan, A.: KSTAR: a knowledge based approach for socially relevant term aggregation for web page recommendation. In: International Conference on Digital Technologies and Applications, pp. 555–564. Springer, Cham (2021 January) 15. Deepak, G., Priyadarshini, J.S., Babu, M.H.: A differential semantic algorithm for query relevant web page recommendation. In: 2016 IEEE International Conference on Advances in Computer Applications (ICACA), pp. 44–49. IEEE (2016 October) 16. Roopak, N., Deepak, G.: OntoKnowNHS: ontology driven knowledge centric novel hybridised semantic scheme for image recommendation using knowledge graph. In: Iberoamerican Knowledge Graphs and Semantic Web Conference, pp. 138–152. Springer, Cham (2021 November) 17. Ojha, R., Deepak, G.: Metadata driven semantically aware medical query expansion. In: Iberoamerican Knowledge Graphs and Semantic Web Conference, pp. 223–233. Springer, Cham (2021 November) 18. Rithish, H., Deepak, G., Santhanavijayan, A.: Automated assessment of question quality on online community forums. In: International Conference on Digital Technologies and Applications, pp. 791–800. Springer, Cham (2021 January) 19. Yethindra, D.N., Deepak, G.: A semantic approach for fashion recommendation using logistic regression and ontologies. In: 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1–6. IEEE (2021 September) 20. 
Deepak, G., Gulzar, Z., Leema, A.A.: An intelligent system for modeling and evaluation of domain ontologies for Crystallography as a prospective domain with a focus on their retrieval. Comput. Electr. Eng. 96, 107604 (2021)
Object Classification Using ECOC Multi-class SVM and HOG Characteristics Khushboo Jain1(B)
, Manali Gupta2 , Surabhi Patel3 , and Ajith Abraham4,5
1 School of Computer Science, University of Petroleum and Energy Studies, Dehradun, UK,
India [email protected] 2 GITAM University, Hyderabad, Telangana, India 3 School of Computing, DIT University, Dehradun, UK, India 4 Machine Intelligence Research Labs (MIR Labs), Auburn, Washington 98071, USA [email protected] 5 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
Abstract. Classification of images into labeled multi-classes is currently one of the major research problems. In the field of artificial intelligence, the Histogram of Oriented Gradients (HOG) is employed to extract features that identify which class a particular image belongs to. HOG counts the occurrences of gradient orientations in localized sections of an image. Unlike edge detection, which captures only the edges, HOG features contain information about both the edges and their directions. Based on HOG features, we classified the images in the given dataset using an Error-Correcting Output Codes (ECOC) based multi-class Support Vector Machine (SVM) classifier. The performance of the ECOC-based multi-class SVM is compared with the Selection based on Accuracy Intuition and Diversity (SAID) algorithm to find the better-performing classifier on the given dataset. Based on the confusion matrix, it is observed that the ECOC-based multi-class SVM performs better than the SAID algorithm. Keywords: Error-Correcting Output Codes (ECOC) · Histogram of Oriented Gradients (HOG) · Multi-class Support Vector Machine (SVM) · Multi-class weather dataset (MWD) · Object Classification
Abbreviations
We enlist all the acronyms used in this paper with their connotations.

ECOC  Error Correcting Output Codes
FN    False Negative
FP    False Positive
HOG   Histogram of Oriented Gradients
HSV   Hue-Saturation and Values
K-NN  K-nearest neighbor
LBP   Local Binary Pattern
MWD   Multi-class weather dataset
OvO   One vs One
OvR   One vs Rest
PCA   Principal Component Analysis
SAID  Selection based on Accuracy Intuition and Diversity algorithm
SVM   Support Vector Machine
TN    True Negative
TP    True Positive
1 Introduction
Object detection or identification is one of the important tasks that a human being performs in day-to-day life. For example, if we have fruits of categories such as apple, orange, grapes, banana, and pineapple, we need to identify to which particular category a new fruit belongs. Enabling a computing system or device to learn to solve such problems is the process called classification. The classification process first needs to identify the features of the objects (the fruits in the previous example), based on which it can assign a particular category or class to a new object. The proper selection of features is largely responsible for classifying the objects efficiently. To classify images into various categories, various methods have been suggested in the past for feature selection and extraction. Ebied et al. [1] used an extension of Principal Component Analysis (PCA) for high-dimensional feature extraction of images during facial recognition. Oluwafemi et al. [2] used histogram-based features such as Hue-Saturation and Values (HSV), gradients, contrasts, and Local Binary Pattern (LBP) for image feature selection. Further, Walton et al. [3] used a Gaussian model for histogram-based feature selection in hyperspectral image classification, and Ghosh et al. [4] used a multi-objective genetic algorithm for histogram-based feature selection in handwritten Devanagari numeral recognition. Since previous works have made significant use of histogram-based feature selection, we have also adopted it for image feature selection. In our work, we use the Histogram of Oriented Gradients (HOG) for feature extraction in image processing. HOG is used for feature description in areas such as computer vision and image processing to detect an object; the method counts the occurrences of gradient orientations in localized sections of an image. The HOG feature of an image is better than an edge feature because the edge feature only contains information about the edge, while the HOG feature also contains information about the direction of the edge. Thus, the HOG feature enables classification of images with better accuracy and fewer misclassifications compared with edge-based features. Various classifiers and their variants have been proposed in the past for image classification, such as Naïve Bayes, Logistic Regression, K-nearest neighbor (K-NN), and Support Vector Machine (SVM) [5]. The SVM classifier is one of the well-known classifiers for assigning given objects to categories based on their features. However, the SVM classifier can classify objects into two classes only, which means it is not suitable for problems where we have to
classify into more than two classes. Extended versions of SVM such as One-vs-Rest (OvR) and One-vs-One (OvO) can classify into more than two classes but need to define a fixed number of binary classification problems. In this work, we implemented a multi-class classifier based on Error-Correcting Output Codes (ECOC) with multi-class SVM, which can classify objects into more than two classes by encoding each class as an arbitrary number of binary problems. It is a simple yet powerful classifier. When the representation is overdetermined, it performs error correction during prediction, giving better predictive performance. Here, we used the ECOC-based multi-class SVM classifier to classify the multi-class weather dataset [2] using HOG-based image features.
1.1 Background
In 2005, the application of HOG became prevalent when Navneet Dalal and Bill Triggs introduced HOG features for detecting pedestrians in images [6]. Since then, HOG-based feature extraction has been used in various application areas. Rybski et al. [7] used HOG features for the classification of moving vehicles in order to analyze potential threats. To recognize characters and digits, Ebrahimzadeh et al. [8] used HOG features for efficiently recognizing handwritten digits with SVM. Later, Deore et al. [9] used HOG features for recognizing handwritten Devanagari characters by applying various classifiers such as SVM, K-NN, and NN classifiers. To analyze image frames in videos, Kumar et al. [10] used the HOG features of each frame to correct blurred images. Surasak et al. [11] used HOG features for detecting humans in videos by analyzing each frame. Žemgulys et al. [12] used HOG feature extraction to detect referee signals during basketball games. For facial recognition, Nigam et al. [13] used HOG features by transforming the image from the spatial to the frequency domain. Later, Rahmad et al. [14] compared the Viola-Jones Haar Cascade (V-J) classifier with a HOG feature extraction-based classifier for face detection and observed that the HOG-based classifier is more accurate than V-J. In the field of medical sciences, Gornale et al. [15] focused on developing a machine that can detect knee osteoarthritis in patients using HOG feature extraction. Further, Gour et al. [16] designed an automated system to detect the eye disease glaucoma using the HOG features of the image. In the literature, various classifiers such as logistic regression, decision tree, random forest, Naïve Bayes, K-NN, and SVM have been used for the classification of objects based on their feature vectors. Among them, the SVM classifier has been used extensively for image classification based on various image properties acting as feature vectors. Chapelle et al. [17] used the SVM technique for image classification using histogram-based information. Further, Li et al. [18] used multi-label SVM to classify images having more than one label. Yussof et al. [19] used AlexNet and a multi-class SVM classifier to identify sea turtles in the ocean using image features. Later, Gajalakshmi et al. [20] used a multi-class SVM classifier along with ECOC to recognize sign languages based on their invariant feature vectors. Further, to classify skin diseases, multi-class SVM has been integrated with a deep convolutional neural network [21]. Zhou et al. [22] designed an intelligent validation model using multi-class-based ECOC with SVM. Further, using the ECOC classifier, Rukhsar
[23] classified multi-class EEG signals in phase space for epileptic seizure detection. Khan et al. [24]
used an ECOC-based multi-class SVM classifier to classify oils based on the spectral analysis of acoustic signals. In our research work, we use multi-class SVM with ECOC to classify the multi-class weather dataset [2] using HOG-based image features.
1.2 Research Objectives
The research objectives of our work are listed in the following points:
• To implement the HOG method for feature extraction of images in the given dataset, and to find a cell size of HOG suitable for efficient feature extraction.
• To apply ECOC-based multi-class SVM to classify the given dataset, and to evaluate different performance metrics of the confusion matrix to analyze the efficiency of ECOC-based multi-class SVM on the weather dataset using the HOG feature vector.
• To compare the performance of the proposed classifier with the SAID [2] stacked ensemble algorithm.
1.3 Paper Organization
The rest of our research work is organized as follows: Sect. 2 discusses the proposed scheme for object classification with a flow chart. Section 3 elaborates the whole system, which includes the dataset used, the ECOC-based multi-class SVM classifier, and the selection of an appropriate cell size for HOG features. Section 4 presents the results obtained using ECOC-based multi-class SVM on the weather image dataset using a confusion matrix. Section 5 concludes our research work and its significance.
2 Proposed Scheme for Object Classification
All the training images are stored in the training image database. Each image is transformed into its matching HOG feature vector, which counts the occurrences of gradient orientations in specific sections of the image. The HOG descriptor focuses on an object's shape or structure: it generates histograms for the image's regions based on the orientation and magnitude of the gradient. These HOG features are stored in a feature database, which is used to train the ECOC-based multi-class SVM classifier. The ECOC technique handles multi-class classification tasks using binary classification models. It reframes a multi-class classification task into several binary classification tasks, which allows native binary classification algorithms such as SVM and logistic regression to be employed directly; each class can be encoded as an arbitrary number of binary classification tasks, and the multi-class problem is divided into a set number of binary classification problems. Once training is completed, the trained classifier is ready to classify input images. To test its performance, the test images, stored in the test image database, are first converted to their corresponding HOG features and then given to the trained classifier, which classifies them and produces the test outcome. The flowchart of the proposed approach is shown in Fig. 1.
Fig. 1. Flowchart of the proposed approach
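As an illustration of this flow, the sketch below extracts HOG features and trains an ECOC multi-class SVM in Python with scikit-image and scikit-learn. The paper's experiments were run in MATLAB, so this is a hedged re-creation rather than the authors' code, and the image paths and labels are placeholders.

```python
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.multiclass import OutputCodeClassifier

def hog_vector(path, cell=(16, 16)):
    """Read an image, resize it to 300 x 168 pixels and return its HOG descriptor."""
    img = imread(path)
    if img.ndim == 3:
        img = rgb2gray(img)
    img = resize(img, (168, 300))          # rows x columns
    return hog(img, orientations=9, pixels_per_cell=cell, cells_per_block=(2, 2))

def train_ecoc_svm(train_paths, train_labels):
    """Build the HOG feature database and fit an ECOC multi-class SVM on it."""
    X = np.array([hog_vector(p) for p in train_paths])
    y = np.array(train_labels)             # 'cloudy', 'rain', 'shine', 'sunrise'
    clf = OutputCodeClassifier(SVC(kernel='linear'), code_size=2, random_state=0)
    return clf.fit(X, y)

# Usage (illustrative): model = train_ecoc_svm(paths, labels); model.predict([hog_vector(p)])
```

scikit-learn's OutputCodeClassifier implements the ECOC strategy generically; the code_size parameter controls how overdetermined the per-class codewords are.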
3 System Description
The system for multi-class classification of the given image dataset is discussed below.
3.1 Image Datasets
We have implemented the object classification scheme using HOG features and ECOC multi-class SVM on the "Multi-class weather dataset (MWD) for image classification" [2]. The MWD dataset provides a platform for outdoor weather research by extracting different features for identifying various weather scenarios. It contains 1125 images of four kinds of weather: cloudy, rainy, shine, and sunrise. As the images are of variable size, we have pre-processed and resized them to 300 pixels × 168 pixels. Sample image categories of the MWD are illustrated in Fig. 2.
Fig. 2. Sample image categories of the multi-class weather dataset (MWD): a) Cloudy, b) Rain, c) Shine, d) Sunrise
Out of the 1125 images, we use only 800: 200 images are selected per class, of which 640 images in total (80% of the selected images, i.e., 160 per class) are used for training the model and 160 images (40 per class) are used for testing.
Table 1. Statistical distribution of the total number of images, selected images, training images, and testing images in the MWD dataset

Class | #Total images | #Selected images | #Training images | #Testing images
Cloudy | 300 | 200 | 160 | 40
Rain | 215 | 200 | 160 | 40
Shine | 253 | 200 | 160 | 40
Sunrise | 357 | 200 | 160 | 40
Total | 1125 | 800 | 640 | 160
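A hedged sketch of this per-class selection and 80/20 split is shown below; paths_by_class is a hypothetical mapping from class name to image paths and is not part of the original work.

```python
from sklearn.model_selection import train_test_split

def split_per_class(paths_by_class, seed=0):
    """Select 200 images per class and split them into 160 training / 40 test images."""
    train, test = {}, {}
    for cls, paths in paths_by_class.items():
        selected = paths[:200]                          # 200 selected images per class
        tr, te = train_test_split(selected, train_size=160, test_size=40, random_state=seed)
        train[cls], test[cls] = tr, te
    return train, test
```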
3.2 ECOC Based Multi-class SVM
In machine learning, many classification algorithms such as logistic regression and SVM are limited to binary classification problems, which have only two target classes (YES or NO, 0 or 1, black or white, apple or orange). In the real world, however, classification problems generally involve multiple classes, such as classifying 5 different garments, 4 different shapes, or the 4 different weather conditions considered here. Because a plain SVM can only handle binary classification tasks, our goal of multi-class classification cannot be met by binary SVM alone. There are modified SVM schemes for multi-class problems, such as "One vs Rest" (OvR) and "One vs One" (OvO), which accomplish classification by splitting a multi-class task into a fixed number of binary classification tasks. The ECOC technique, unlike the OvR and OvO procedures, allows each class to be represented by an arbitrary number of binary classification problems. When an overdetermined representation is used, the extra models can be exploited for "error correction" during prediction, resulting in improved predictive performance [25, 26].
3.3 Appropriate Cell Size Selection for HOG Feature
Consider an input image of size 300 pixels × 168 pixels. If we compute the HOG features for the "Cloudy" image in Fig. 3a using a cell size of 8 × 8, the feature size is 20736, as shown in Fig. 3b; although the representation looks good, the feature size is very large, which makes it memory hungry. Computing the HOG features for the same image with a cell size of 16 × 16 gives a feature size of 4356, as shown in Fig. 3c, which is also a good representation with a noticeably smaller feature size. Similarly, with a cell size of 32 × 32 the feature size is only 900, as shown in Fig. 3d, but the representation is not suitable. Therefore, to perform the experiment we selected the cell size of 16 × 16, as it is promising and less memory hungry, as shown in Fig. 3c for "Cloudy", 3g for "Rain", 3k for "Shine", and 3o for "Sunrise". The HOG features of the MWD images using cell sizes of 8 × 8, 16 × 16, and 32 × 32 are illustrated in Fig. 3(a–p).
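A small sketch of this cell-size comparison is given below: it prints the HOG descriptor length for the three cell sizes using scikit-image with a 2 × 2 block, so the printed lengths are implementation-dependent and may differ from the MATLAB figures quoted above.

```python
import numpy as np
from skimage.feature import hog

image = np.random.rand(168, 300)   # stand-in for a 300 x 168 pixel weather image
for cell in [(8, 8), (16, 16), (32, 32)]:
    features = hog(image, orientations=9, pixels_per_cell=cell, cells_per_block=(2, 2))
    print(cell, features.size)     # descriptor length shrinks as the cell size grows
```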
Fig. 3. (a–p) HOG feature visualizations of the MWD classes (Cloudy, Rain, Shine, Sunrise): for each class, the input image and its HOG features computed with cell sizes 8 × 8, 16 × 16, and 32 × 32.
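Before moving to the results, the following minimal sketch makes the ECOC encoding of Sect. 3.2 concrete: each weather class gets an illustrative 6-bit codeword (an assumed, overdetermined choice, not the code matrix used in the paper), one binary SVM is trained per bit, and a prediction is decoded to the nearest codeword in Hamming distance, which is where the error correction comes from.

```python
import numpy as np
from sklearn.svm import LinearSVC

CODES = {                      # 4 weather classes, illustrative 6-bit codewords
    'cloudy':  np.array([1, 0, 1, 0, 1, 0]),
    'rain':    np.array([0, 1, 1, 0, 0, 1]),
    'shine':   np.array([1, 1, 0, 1, 0, 0]),
    'sunrise': np.array([0, 0, 0, 1, 1, 1]),
}
CLASSES = list(CODES)

def fit_ecoc(X, y):
    """Train one binary classifier per bit of the code matrix."""
    bit_targets = np.array([CODES[label] for label in y])        # shape (n_samples, 6)
    return [LinearSVC().fit(X, bit_targets[:, b]) for b in range(6)]

def predict_ecoc(models, X):
    """Predict the bit string and decode it to the closest class codeword."""
    bits = np.column_stack([m.predict(X) for m in models])       # predicted codewords
    code_matrix = np.array([CODES[c] for c in CLASSES])
    dists = np.array([[np.sum(b != cw) for cw in code_matrix] for b in bits])
    return [CLASSES[i] for i in dists.argmin(axis=1)]            # nearest-codeword decoding
```

With this code matrix every pair of codewords differs in four bits, so a single wrongly predicted bit can still be corrected during decoding.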
4 Results and Discussions
The training and testing phases of this work were carried out using MATLAB (2021a) on Windows 10 64-bit; an NVIDIA GeForce 1 GB graphics card and an Intel Core i5 2.5 GHz processor with 8 GB RAM round out the hardware specifications. The confusion matrix for the multi-class problem is formulated with the actual and predicted counts using true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), as shown in Fig. 4. Here, TP_cloudy means the predicted class and the actual class are the same, i.e., cloudy, while E_cloudy,rain denotes a misclassification where the actual class is cloudy but the predicted class is rain.
Fig. 4. Confusion matrix layout of the MWD for the ECOC-based multi-class SVM: actual classes (Cloudy, Rain, Shine, Sunrise) as rows and predicted classes as columns.
Here the corresponding TP, FP, TN, and FN values are calculated as follows:
• The testing data of any class is the sum of the corresponding row, which includes the TP and FN values for that class. For instance, the testing data for the "cloudy" class is the sum of TP_cloudy, E_cloudy,rain, E_cloudy,shine, and E_cloudy,sunrise.
• The TPs of the classes lie on the diagonal of the confusion matrix: TP_cloudy, TP_rain, TP_shine, and TP_sunrise are the TPs of the cloudy, rain, shine, and sunrise classes, respectively.
• The FPs of any class are the sum of the values in the corresponding column excluding the TP value. For instance, the FP for the "rain" class is the sum of E_cloudy,rain, E_shine,rain, and E_sunrise,rain.
• The TNs of any class are the sum of all rows and columns excluding the row and column of that class.

The performance metrics accuracy, precision, specificity, negative prediction value, and sensitivity are estimated for each class according to Eqs. (1)–(5), using the TP, TN, FP, and FN values calculated above:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$   (1)

$\mathrm{Precision} = \frac{TP}{TP + FP}$   (2)

$\mathrm{Specificity} = \frac{TN}{TN + FP}$   (3)

$\mathrm{Negative\ Prediction\ Value} = \frac{TN}{TN + FN}$   (4)

$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$   (5)
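A hedged Python sketch of this per-class computation is shown below (the paper's experiments used MATLAB): the TP, FN, FP, and TN values of each class are read off a multi-class confusion matrix with rows as actual classes and columns as predicted classes, and Eqs. (1)–(5) are then applied.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, classes):
    cm = confusion_matrix(y_true, y_pred, labels=classes)   # rows: actual, columns: predicted
    results = {}
    for i, cls in enumerate(classes):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp                             # rest of the row
        fp = cm[:, i].sum() - tp                             # rest of the column
        tn = cm.sum() - tp - fn - fp
        results[cls] = {
            'accuracy':    (tp + tn) / cm.sum(),             # Eq. (1)
            'precision':   tp / (tp + fp),                   # Eq. (2)
            'specificity': tn / (tn + fp),                   # Eq. (3)
            'npv':         tn / (tn + fn),                   # Eq. (4)
            'sensitivity': tp / (tp + fn),                   # Eq. (5)
        }
    return results
```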
The object classification model using HOG features and ECOC multi-class SVM achieves an accuracy of 98.125% for cloudy, 98.76% for rain, 98.13% for shine, and 98.75% for sunrise. The confusion matrix for the proposed work is illustrated in Fig. 5. The analysis results based on the performance metrics are presented in Table 2.
Table 2. Comparative results of the proposed work

Class | Accuracy % | Precision % | Specificity % | Negative Prediction Value % | Sensitivity %
Cloudy | 98.125 | 97.5 | 99.15 | 98.33 | 95.12
Rain | 98.76 | 97.5 | 99.17 | 99.17 | 97.5
Shine | 98.13 | 95 | 98.36 | 99.17 | 97.43
Sunrise | 98.75 | 97.5 | 99.16 | 99.16 | 97.5
The comparison with related work implemented on the same dataset is depicted in Table 3. Oluwafemi et al. [2] used the publicly available MWD for image classification in their work. The authors presented an intuitive selection approach (SAID) in which histograms of several characteristics such as contrast, local binary pattern, saturation, and value are retrieved, stacked, and sorted using several classification algorithms (Naive Bayes, KNN, Random Forest, and SVM). Their model's overall accuracy ranged from 81% to 95%. Because of the low resolution of the image data, we believe their histogram-based methodologies are inadequate.

Table 3. Comparison with other work using the same dataset

Article | Year | Model | Class | Accuracy
Oluwafemi et al. [2] | 2019 | SAID of stacked ensemble algorithms | Cloudy | 81.5%
 | | | Rain | 95.2%
 | | | Shine | 88.4%
 | | | Sunrise | 81.7%
Proposed Work | 2022 | HOG features and ECOC Multi-Class SVM | Cloudy | 98.125%
 | | | Rain | 98.76%
 | | | Shine | 98.13%
 | | | Sunrise | 98.75%
5 Conclusion
This work presented a new dimension to multi-class classification, using the HOG technique to extract features that identify which class a particular image belongs to. HOG counts the occurrences of
gradient orientations in localized sections of an image, and ECOC is a technique for applying binary classification models to multi-class classification tasks. The proposed work uses HOG features to classify the images in the given dataset with an ECOC-based multi-class SVM classifier. The performance of object classification using HOG features and ECOC multi-class SVM is compared with the SAID method on the same dataset. As a result, the proposed work improves the model's effectiveness on all performance metrics, i.e., accuracy, precision, specificity, negative predictive value, and sensitivity. In future work, the proposed model can be trained on datasets from different fields, again using the HOG method along with the ECOC technique for multi-class classification prediction tasks.
Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). The authors acknowledge the technical support and review feedback from the AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
GA Evolved Configuration Data for Embryonic Architecture with Built-in Self-test

Gayatri Malhotra1,2(B), Punithavathi Duraiswamy2, and J. K. Kishore1

1 U R Rao Satellite Centre, Bangalore, India
[email protected]
2 M S Ramaiah University of Applied Sciences, Bangalore, India
Abstract. The embryonic architecture, which draws inspiration from the biological process of ontogeny, has built-in capabilities for self-repair. The embryonic cells carry a complete genome, allowing the data to be replicated to a non-defective cell in the event of a cell failure in the embryonic fabric. A novel, specially designed genetic algorithm (GA) is used to evolve the configuration information for embryonic cells. Any failed embryonic cell must be flagged by the proposed Built-in Self-test (BIST) module of the embryonic fabric. In this study, an effective centralized BIST design for a novel embryonic fabric is proposed. If the self-test mode is activated, the proposed BIST scans every embryonic cell. To optimize the data size, the genome or configuration data of each embryonic cell is decoded using the Cartesian Genetic Programming (CGP) format. This study evaluates the GA's performance on the 1-bit adder and 2-bit comparator circuits present in the embryonic cells. Fault detection at every cell is made possible by the BIST module's design. Additionally, the CGP format can offer gate-level fault detection. The customized GA and BIST are coupled with the novel embryonic architecture. The embryonic cell can perform self-repair through data scrubbing for temporary defects.

Keywords: Bio-inspired systems · Embryonics · Embryonic fabric · BIST · Self-test · Genetic Algorithm

1 Introduction
Due to communication delays and limited hardware resources, deep space systems need a distinct method for fault tolerance. These systems ought to be able to adapt themselves to deal with unexpected challenges. Instead of tripling all the FPGA logic cells to achieve TMR (triple modular redundancy), the embryonic fabric-based cellular design offers self-repair employing extra spare cells. This method is therefore appropriate for designing smaller space systems for far space missions. The stored genome configuration data controls how the embryonic circuit
grows and functions. As a result of its structure, the embryonic cellular structure possesses inherent error tolerance. Through self-repair, it is possible to instill fault tolerance in the embryonic cellular design [1,2]. The three life axes of Phylogenesis, Ontogenesis, and Epigenesis serve as the inspiration for the electronic tissue known as POEtic design [3]. The cellular expansion that aids in self-repair is referred to as the ontogenetic axis. Systems can develop, replicate, and repair themselves by using the structural principles of living organisms, such as multicellular architecture, cellular division, and cellular differentiation [4]. In the electronic DNA (eDNA) technique described in [5], the electronic cell (eCell) interprets the eDNA to determine the function that it must perform and, in the event that one eCell fails, to transfer the function to another eCell. Each cell in the reconfigurable self-healing embryonic system described in [6] has a self-diagnostic unit; its BIST module employs an EXOR gate to compare the output of the function unit to its replica. With more outputs, there will be more EXOR gates and more duplication of the function unit, and the BIST design also needs to be adjusted for different cell functions. This paper proposes a BIST that is centralized and requires fewer resources. It is based on the comparison between the operating circuit's response signature and a golden reference signature (the no-fault state). The size of the signature register only expands for larger fabrics with more outputs. The BIST suggested in this work requires a change in the allocation of output signals to a linear feedback shift register (LFSR) that generates response signatures. The proposed BIST is intended for self-checks at regular intervals while the system is running. Detection is achievable at the level of the cell, which contains many molecules or nodes. Additionally, fault detection is possible at the molecular level, which corresponds to a node in the CGP format.

Fig. 1. Embryonic Fabric Architecture (ten embryonic cells ECELL1–ECELL10, each with a CGP decoder, memory and self-repair unit, interconnected through switch boxes SW1–SW10 and managed by the embryonic fabric controller with Built-In-Self-Test)

Some methods of self-healing in the embryonic fabric
call for the removal of an entire row or column of embryonic cells [7]. According to [8], scrubbing of the configuration memory is intended to self-repair transient faults instead of eliminating a complete row or column of embryonic cells. In most cases, temporary errors can be rectified following data scrubbing [9]. In the proposed embryonic fabric, spare cells will only be used if faulty-cell isolation fails. One-bit adder and two-bit comparator cell designs are simulated and coded using Verilog. Genetic algorithms are used to generate the genome data, which is then decoded using Cartesian genetic programming (CGP). The embryonic architecture for digital circuits with CGP data is detailed in Sect. 2. The GA design for CGP-format configuration data generation is discussed in Sect. 3. The Built-in Self-test design methodology for embryonic cells is discussed in Sect. 4, where the design's supporting components, such as the controller, random pattern generator (RPG), response analyzer, and the technique for fault detection, are addressed. The findings of fault detection for adder and comparator cells are discussed in Sect. 5. Section 6 concludes the results and discusses areas of future research.
Fig. 2. CGP Configuration of 1-bit Adder

Fig. 3. CGP Configuration of 2-bit comparator
2 Embryonic Digital Circuit Architecture Using CGP Data
Each cell in the proposed embryonic fabric has a CGP decoder, memory, and self-repair unit. Whereas the self-repair module is a unit of each cell, the hardware for the BIST controller is shared by the fabric. The digital circuit is represented in CGP format as a rectangular array of nodes. All inputs, operations, and outputs of a node are sequentially indexed using integers. A linear string of these integers makes up the configuration information (genome information) for the embryonic cell. The GA performs the design optimization more effectively than conventional methods [10,11]. The configuration genome data is generated using a GA, whereas the format of the genome data is Cartesian Genetic Programming (CGP) [12,13]. The HsClone and OIMGA algorithms are used to generate the CGP data; however, tests of other GAs are also planned [14]. Embryonic cells, switch boxes, and the fabric controller module constitute the novel embryonic fabric that is being proposed. Input-output devices, switch boxes, and the data flow between cells are controlled by the fabric controller. In Fig. 1, the novel embryonic fabric architecture is shown. It consists of ten cells: four each for the 1-bit adder and the 2-bit comparator, with two spare cells. Using switch boxes to cascade the signals, four 1-bit adder cells can create a 4-bit adder, and four 2-bit comparator cells can create an 8-bit comparator. Each embryonic cell contains a CGP decoder, memory, and a self-repair module. The genome data for the first cell is loaded externally, and during run-time, cloning takes care of replicating the genome data bits to subsequent cells. Earlier, we described the cloning method where data is in Look Up Table (LUT) format [15]. When it comes to a modular design, the CGP data format prevails over the LUT data format. The CGP format requires (45 + 4) bits, but the 4-bit adder requires 29 (1-bit adder and clone count for four cells). The CGP data for a 1-bit adder and a 2-bit comparator is contained in the 161 bits of the genome data. Each node in CGP format is represented by the expression (in1, in2, logical function). Figures 2 and 3 depict the CGP configuration representation of a 1-bit adder and a 2-bit comparator, respectively. Each logical function is represented as a node in CGP. The sequence of node triplets forms the configuration data in CGP format. The carry for adder cells and the comparison signal for lower-bit comparator cells are routed via the switch boxes.
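To make the node-triplet genome format concrete, the sketch below decodes a CGP-style list of (in1, in2, function) triplets and evaluates it on a 1-bit full-adder truth table. It is a minimal illustration, not the authors' Verilog decoder; the node indexing, the gate set, and the example genome are assumptions chosen only to show how the format works.

```python
# Minimal CGP-style decoder/evaluator (illustrative only).
# Inputs occupy indices 0..n_inputs-1; each node stores (in1, in2, gate)
# and its output is addressed by the next free index.

GATES = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def evaluate(genome, outputs, inputs):
    """genome: list of (in1, in2, gate) triplets; outputs: node indices to read."""
    values = list(inputs)                      # indices 0..n_inputs-1
    for in1, in2, gate in genome:
        values.append(GATES[gate](values[in1], values[in2]))
    return [values[o] for o in outputs]

# Hypothetical 1-bit full adder genome: inputs 0=A, 1=B, 2=Cin
adder_genome = [
    (0, 1, "XOR"),   # node 3: A xor B
    (3, 2, "XOR"),   # node 4: SUM
    (0, 1, "AND"),   # node 5: A and B
    (3, 2, "AND"),   # node 6: (A xor B) and Cin
    (5, 6, "OR"),    # node 7: CARRY
]

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, c = evaluate(adder_genome, outputs=[4, 7], inputs=[a, b, cin])
            assert s == (a ^ b ^ cin) and c == ((a & b) | ((a ^ b) & cin))
print("adder genome verified against the truth table")
```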
3 Novel Parallel GA Design for CGP Configuration Data Generation
Due to the computational complexity of the GA, it may take a long time to arrive at a converged solution. According to [16], the parallel implementation of a GA on an FPGA results in processing-time optimization. To create parallel pipelines, the fitness, crossover, mutation, and selection modules are duplicated. The HsClone (half-siblings-and-a-clone) GA [17] is based on a crossover technique that relies on fitness criteria. The crossover rate is lowered based on the improvement over the previous solution. The parallel pipeline is applied to the modified HsClone method [17] to achieve faster convergence. Parallel processing makes it possible for the algorithm to execute simultaneously on more sets of data. A fine-grained parallel GA has an
advantage over a sequential GA, and the FPGA has a parallel platform advantage [18]. OIMGA embeds two searches, a global and a local search. Starting with a global search, a more detailed exploration is done by the local search. The local search explores the global search region to find the local optimum individual (LOI). In order to produce the CGP data for the adder and comparator circuits, the OIMGA algorithm is also adapted and used.

Table 1. OIMGA Parameters
Parameter | Description
l | Length of individual
n | Size of population
m | Size of space around the local optimum individual
t-gens | Max. no. of consecutive global generations without improvement
k-gens | Max. no. of consecutive local generations without improvement
d-adjustor | Range of mutation
m-rate | Mutation probability

Fig. 4. Convergence Time for PHsClone and OIMGA (log of convergence time in ms versus configuration bits in CGP format: counter 18 bits, adder 45 bits, comparator 120 bits)
3.1 Optimum Individual Monogenetic GA-OIMGA
OIMGA does not use much memory to store the population. The OIMGA parameters are shown in Table 1. The pseudo code for the algorithm is:

(1) g = t-gens, d = d-adjustor;
(2) while g > 0, start a global search;
(3) for i = 1 to n, loiChrom = randCreateIndv(l);
(4) loiFit = fitness(loiChrom);
(5) if loiFit > bestFit, bestChrom = loiChrom, bestFit = loiFit;
(6) endif, endfor;
(7) k = k-gens;
(8) while k > 0, start a local search;
(9) update = 0 (number of updates of tempChrom and bestChrom);
(10) for i = 1 to m, tempChrom = bestChrom;
(11) for j = d to l, micro mutation in the range of d to l;
(12) if rand() < m-rate, tempChrom(j) = not(tempChrom(j));
(13) endif, endfor;
(14) tempFit = fitness(tempChrom);
(15) if tempFit > bestFit, bestChrom = tempChrom, bestFit = tempFit;
(16) update = update + 1;
(17) k = k-gens, d = d-adjustor, endif, endfor;
(18) if update = 0, d = d + 1, endif;
(19) k = k − 1, endwhile;
(20) if bestFit > topFit, topChrom = bestChrom, topFit = bestFit;
(21) g = t-gens, endif, g = g − 1, endwhile;
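The listing above translates fairly directly into code. Below is a hedged Python sketch of the OIMGA global/local search loop; the random-individual generator and the fitness function are placeholders (the toy fitness simply counts matching bits against an assumed target), whereas the paper's actual fitness evaluates the decoded CGP circuit against the adder and comparator truth tables.

```python
import random

def oimga(l, n, m, t_gens, k_gens, d_adjustor, m_rate, fitness):
    """Sketch of OIMGA following the pseudo code; returns the best individual found."""
    top_chrom, top_fit = None, float("-inf")
    g, d = t_gens, d_adjustor
    while g > 0:                                   # global search
        best_chrom, best_fit = None, float("-inf")
        for _ in range(n):
            loi = [random.randint(0, 1) for _ in range(l)]
            f = fitness(loi)
            if f > best_fit:
                best_chrom, best_fit = loi[:], f
        k = k_gens
        while k > 0:                               # local search around the LOI
            updates = 0
            for _ in range(m):
                temp = best_chrom[:]
                for j in range(d, l):              # micro mutation in range d..l
                    if random.random() < m_rate:
                        temp[j] = 1 - temp[j]
                f = fitness(temp)
                if f > best_fit:
                    best_chrom, best_fit = temp, f
                    updates += 1
                    k, d = k_gens, d_adjustor
            if updates == 0:
                d += 1                             # widen the mutation range
            k -= 1
        if best_fit > top_fit:
            top_chrom, top_fit = best_chrom, best_fit
            g = t_gens
        g -= 1
    return top_chrom, top_fit

# Toy usage: evolve a 45-bit string toward an assumed target pattern.
target = [random.randint(0, 1) for _ in range(45)]
fit = lambda c: sum(int(a == b) for a, b in zip(c, target))
best, score = oimga(l=45, n=20, m=10, t_gens=30, k_gens=30,
                    d_adjustor=0, m_rate=0.05, fitness=fit)
print(score, "of", len(target), "bits matched")
```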
3.2 Parallel HsClone GA

The 'PHsClone' GA is developed as an approach based on the 'half-siblings-and-a-clone' crossover technique. The pseudo code of the algorithm for the adder and comparator is:

(1) Generation of four parallel valid random patterns
(2) CGP constraint to validate the pattern fitness
(3) Check for output function for all node outputs
(4) If fitness = best fit from any node, a new pattern is evolved
(5) If fitness < M percent, apply crossover and mutation with a random pattern
(6) If fitness > M percent, apply crossover and mutation with the current pattern
(7) Repeat from step 2 till convergence
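A hedged sketch of one PHsClone generation is shown below. The four parallel candidate streams are modelled as a simple list, and single-point crossover with bit-flip mutation stands in for the paper's genetic operators; the fitness function, the CGP validity check, and the threshold M are parameters supplied by the caller, since the paper does not fix their implementations at this point.

```python
import random

def phsclone_step(patterns, fitness, is_valid_cgp, m_threshold, mut_rate=0.02):
    """One PHsClone generation over four parallel candidate patterns (sketch)."""
    def crossover(a, b):
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def mutate(p):
        return [bit ^ (random.random() < mut_rate) for bit in p]

    next_patterns = []
    for p in patterns:                              # four concurrent paths
        f = fitness(p)                              # fraction of outputs matched
        if f >= 1.0:
            return patterns, p                      # a pattern has fully evolved
        mate = ([random.randint(0, 1) for _ in range(len(p))]
                if f < m_threshold else p)          # M decides the crossover mate
        child = mutate(crossover(p, mate))
        next_patterns.append(child if is_valid_cgp(child) else p)
    return next_patterns, None
```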
For the adder and comparator, the value of M is tuned to 40% and 70%, respectively. The value of M is chosen based on the fastest convergence rate. The testing and population-generation operations run in four concurrent paths. For a 1-bit adder, the configuration data size is 45 bits, whereas for a 2-bit comparator it is 120 bits. HsClone's drawback is that its population must be stored in memory. Figure 4 displays the convergence time versus configuration data length for the PHsClone and OIMGA algorithms, compared for a 1-bit adder (45 bits), a 2-bit comparator (120 bits), and a 1-bit counter (18 bits). The following are the inferences from the data:
– The PHsClone outperforms OIMGA in the 1-bit adder scenario. Here, there is a significant delay in convergence with OIMGA. The reason is that both the sum and carry functions need to be developed in the same 45-bit data. This necessitates data optimization and a larger search area. Due to its four parallel processes and additional genetic operators, PHsClone is able to accomplish this.
– The OIMGA performs better for the 2-bit comparator because only one of the two functions (the "larger" output) had to be evolved. It needs five nodes (60 bits) out of the ten nodes (120 bits), so spare node data bits are available. When there are many extra nodes accessible, the OIMGA operates more efficiently since invalid nodes can be rejected.
A customized algorithm in which parallel processes are executed and additional genetic operators are available is proposed as future work. When a greater amount of spare data is available, the OIMGA operates better, whereas Parallel HsClone operates better with smaller amounts of data. The drawback of PHsClone is that it requires extra hardware due to the parallel processes and related registers.
4 Embryonic Cells with Built-in Self-Test Design
The fundamental BIST architecture necessitates the inclusion of three modules in the embryonic fabric design controller: 1. a Test Controller, 2. a Test Pattern Generator, and 3. a Response Analyzer. The test patterns for the circuit under test (CUT) are created by the LFSR-based test pattern generator (TPG). To compare the CUT response with the saved response, a response analyzer (RA) is needed; it is created using an LFSR acting as a signature analyzer. The responses are compressed into signatures. For the fault-free scenario, the reference golden signatures have already been preserved. The response signature is compared with the stored reference signature to find out whether the CUT is good or faulty. When activated, the BIST controller, which is a component of the embryonic fabric controller, starts the self-test mode. The controller passes the "fault-indication" from the RA to the defective cell for further action. The implemented TPG LFSR is 40-stage (four input bits to each cell) with the characteristic polynomial

P(X) = X^39 + X^38 + 1    (1)

In a ten-cell embryonic fabric, the responses of each cell must be examined. Sum and carry are the two outputs of an adder embryonic cell. Additionally, each embryonic comparator cell has two outputs: AsmlB and AlargB.

Fig. 5. MISR for Embryonic Adder Cells Outputs (Sum(0)…Carry(4) feeding flip-flops DFF0–DFF9)

The form of
LFSR that combines all of the cell outputs into one unit is called a Multiple-Input Signature Register (MISR). One "signature-reg-adder" is created by superimposing all adder cell responses. Similarly, all comparator cell responses are superimposed into the "signature-reg-cmprtr". Figures 5 and 6 depict the adder MISR and comparator MISR designs, respectively. Each MISR is a 10-stage LFSR, with five adder cells (four cascaded and one spare) and five comparator cells, each having two outputs. The implemented LFSR has the characteristic polynomial

P(X) = X^9 + X^8 + 1    (2)

Fig. 6. MISR for Embryonic Comparator Cells Outputs (AsmlB(0)…AlargB(4) feeding flip-flops DFF0–DFF9)
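A software model of this signature-compression scheme is sketched below. It is an illustrative bit-level simulation of a 10-stage MISR with feedback taken from the X^9 and X^8 stages, not the paper's Verilog; the exact tap wiring of the hardware is an assumption.

```python
def misr_step(state, inputs):
    """One clock of a 10-stage MISR with feedback from stages 9 and 8 (sketch).

    state:  list of 10 bits (DFF0..DFF9); inputs: 10 parallel response bits
    (e.g. Sum(0..4) and Carry(0..4) of the five adder cells).
    """
    feedback = state[9] ^ state[8]
    return [feedback ^ inputs[0]] + [state[i - 1] ^ inputs[i] for i in range(1, 10)]

def signature(responses):
    """Compress a sequence of 10-bit response vectors into a signature."""
    state = [0] * 10
    for vec in responses:
        state = misr_step(state, vec)
    return state

# Toy check: a single stuck-at-0 bit in the response stream changes the signature.
golden = signature([[1, 0, 1, 1, 0, 0, 1, 0, 1, 1]] * 20)
faulty = signature([[0, 0, 1, 1, 0, 0, 1, 0, 1, 1]] * 20)
print("fault detected:", golden != faulty)
```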
Fig. 7. Simulation Results of BIST
Fig. 8. Self Repair Process through Scrubbing
5 Embryonic Adder and Comparator Cell Fault Detection
The degree of the data polynomial (cell output response) for fault detection should be less than 2^10 − 1, where 10 is the degree of the LFSR's characteristic polynomial. After the estimated number of clock cycles, the fault-free signature analysis register (SAR) is recorded. The self-test signal is set to low, and the "signature-reg-adder" and "signature-reg-cmprtr" are saved after the same number of clock cycles. A comparison is made between the "signature" values for the adder and comparator and the stored fault-free signature values. The output of one of the adder cells is set to "stuck at 0" to simulate a fault. After the RA module detects the fault, the fabric controller receives the fault-detection signal and directs it to the defective cells. Self-repair is started by reloading the configuration data into all defective cells (scrubbing). The BIST module signals are depicted in Fig. 7, and the self-repair process is shown in Fig. 8. The complete process from fault simulation to fault detection is depicted in Fig. 9. The fault is further localized at the sum or carry function level following the cell-level fault detection.
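The detect-and-scrub decision described above can be summarised in a few lines. The sketch below reuses the signature helper from the earlier MISR model and treats "scrubbing" simply as reloading a stored genome; the function and its arguments are invented for illustration and do not mirror the paper's Verilog modules.

```python
def run_self_test(cell_responses, golden_signature, reload_configuration):
    """Compare the MISR signature with the golden one; scrub on mismatch (sketch).

    cell_responses: sequence of 10-bit output vectors captured in self-test mode.
    reload_configuration: callback that rewrites the genome into the faulty cells.
    """
    observed = signature(cell_responses)          # signature() from the MISR sketch
    if observed == golden_signature:
        return "fault-free"
    reload_configuration()                        # scrubbing: reload genome data
    return "fault detected - configuration scrubbed"
```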
Fig. 9. Fault Simulation and Fault Detection (self-test with TPG and MISR, comparison of the response signature against the golden signature, fault simulation for Sum stuck at '0', scrubbing of the adder cells, and self-repair at the adder-cell level)
6 Conclusion and Scope for Future Work
For deep space systems, a novel embryonic fabric architecture with adder and comparator genes is presented. The embryonic fabric architecture employs the BIST, and both fault detection and memory scrubbing have been performed. The simulation results confirm the self-repair-by-scrubbing implementation. Customized GAs are used to evolve the configuration information for the adder and comparator in CGP format. Sequential circuits must also be added to the fabric design. In the event of a permanent error in the cells, the BIST method must be modified. For fault reliability, the possibility of a defect occurring within the BIST module must also be taken into account.
References
1. Shanshan, Y., Youren, W.: A new self-repairing digital circuit based on embryonic cellular array. In: ICSICT-2006: 2006 8th International Conference on Solid-State and Integrated Circuit Technology, Proceedings, pp. 1997–1999 (2006). https://doi.org/10.1109/ICSICT.2006.306573
2. Mange, D., Stauffer, A., Tempesti, G.: Embryonics: a macroscopic view of the cellular architecture. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 174–184. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0057619
3. Thoma, Y., Tempesti, G., Sanchez, E.: POEtic: an electronic tissue for bio-inspired cellular applications. Biosystems 76(1–3), 191–200 (2004)
4. Stauffer, A., Mange, D., Rossier, J.: Design of self-organizing bio-inspired systems. In: Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007) (2007)
5. Boesen, M.R., Madsen, J.: eDNA: a bio-inspired reconfigurable hardware cell architecture supporting self-organisation and self-healing. In: Proceedings - 2009 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009, pp. 147–154 (2009). https://doi.org/10.1109/AHS.2009.22
6. Zhang, X., Dragffy, G., Pipe, A.G., Gunton, N., Zhu, Q.M.: A reconfigurable self-healing embryonic cell architecture. In: Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, pp. 134–140 (2003)
7. Zhang, Z., Wang, Y.: Method to self-repairing reconfiguration strategy selection of embryonic cellular array on reliability analysis. In: Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2014, pp. 225–232 (2014). https://doi.org/10.1109/AHS.2014.6880181
8. Zhai, Z., Yao, Q., Xiaoliang, Y., Rui, Y., Youren, W.: Self-healing strategy for transient fault cell reutilization of embryonic array circuit. In: 2018 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2018, pp. 225–232 (2018). https://doi.org/10.1109/AHS.2018.8541472
9. Salvador, R., Otero, A., Mora, J., de la Torre, E., Sekanina, L., Riesgo, T.: Fault tolerance analysis and self-healing strategy of autonomous, evolvable hardware systems. In: Proceedings - 2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, pp. 164–169 (2011). https://doi.org/10.1109/ReConFig.2011.37
10. Chong, K.H., Aris, I.B., Sinan, M.A., Hamiruce, B.M.: Digital circuit structure design via evolutionary algorithm method. J. Appl. Sci. 7, 380–385 (2007)
11. Benkhelifa, E., Pipe, A., Dragffy, G., Nibouche, M.: Towards evolving fault tolerant biologically inspired hardware using evolutionary algorithms. In: IEEE Congress on Evolutionary Computation 2007, pp. 1548–1554 (2007). https://doi.org/10.1109/CEC.2007.4424657
12. Miller, J.F.: Cartesian genetic programming. In: Miller, J. (ed.) Cartesian Genetic Programming. Natural Computing Series. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-17310-3
13. Malhotra, G., Lekshmi, V., Sudhakar, S., Udupa, S.: Implementation of threshold comparator using Cartesian genetic programming on embryonic fabric. Adv. Intell. Syst. Comput. 939, 93–102 (2019)
14. Stomeo, E., Kalganova, T., Lambert, C.: A novel genetic algorithm for evolvable hardware. In: 2006 IEEE Congress on Evolutionary Computation, CEC 2006, pp. 134–141 (2006). https://doi.org/10.1109/CEC.2006.1688300
15. Malhotra, G., Becker, J., Ortmanns, M.: Novel field programmable embryonic cell for adder and multiplier. In: 9th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME-2013) (2013)
16. Torquato, M.F., Fernandes, M.A.C.: High-performance parallel implementation of genetic algorithm on FPGA. Circuits Systems Signal Process. 38(9), 4014–4039 (2019). https://doi.org/10.1007/s00034-019-01037-w
17. Zhu, Z., Mulvaney, D.J., Chouliaras, V.A.: Hardware implementation of a novel genetic algorithm. Neurocomputing 71(1–3), 95–106 (2007). https://doi.org/10.1016/j.neucom.2006.11.031
18. AL-Marakeby, A.: FPGA on FPGA: implementation of fine-grained parallel genetic algorithm on field programmable gate array. Int. J. Comput. Appl. 80(6), 29–32 (2013). https://doi.org/10.5120/13867-1725
A Multi-layer Deep Learning Model for ECG-Based Arrhythmia Classification

Khushboo Jain1(B), Arun Agarwal2, Ashima Jain2, and Ajith Abraham3,4

1 School of Computer Science, University of Petroleum and Energy Studies, Dehradun, UK, India
[email protected]
2 Ramanujan College, Delhi University, Delhi, India
[email protected]
3 Machine Intelligence Research Labs (MIR Labs), Auburn, WA 98071, USA
[email protected]
4 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
Abstract. The electrocardiogram (ECG) is a widely used medical monitoring technology that records cardiac activity in order to identify cardiovascular diseases (CVDs), which are the foremost cause of death globally these days. Regrettably, having medical experts analyse large amounts of ECG signals consumes an inordinate amount of medical resources. As a result, machine learning based approaches for identifying ECG characteristics have gradually gained popularity. However, these traditional approaches have some disadvantages, such as the need for manual feature recognition, complex representations, and a prolonged training period. In this article, we present a method for identifying the five classes of heart-beat categories in the MIT-BIH Arrhythmia dataset using a Multi-layer Deep Learning Model (MLDLM) in compliance with the AAMI EC57 standard. The proposed MLDLM technique was tested using the MIT-BIH Arrhythmia Dataset, which has 109,446 ECG heart-beats sampled at 125 Hz. This initial dataset contains the classes N, S, V, F, and Q. The PTB Diagnostic ECG Dataset is the second dataset, which is divided into two categories. The results show that the suggested approach is capable of making predictions with average accuracies of 98.75% on the MIT-BIH Arrhythmia dataset and 98.87% on MI classification.

Keywords: Arrhythmia · Accuracy · CVD · ECG · Deep CNN · Heart-beats · Healthcare · Multi-classification

1 Introduction
Cardiovascular diseases (CVDs) are the world's leading cause of death today. Every year, nearly 18 million people are killed by these deadly diseases. CVDs include heart and blood vessel disorders such as rheumatic heart-disease, coronary heart-disease, cerebrovascular illness, and others. More than 80% of CVD
fatalities are caused by heart-attacks and heart-strokes, with 33% of these deaths happening before the age of 70 [1]. Willem Einthoven devised the first practical ECG in 1895 and was awarded the Nobel Prize in Physiology in 1924 for his achievement [2]. Although later technology developments resulted in better and more portable ECG equipment, Einthoven is responsible for much of the vocabulary used to describe an ECG [3]. The initials P, Q, R, S, and T, which he assigned to the various deflections, are still used today. Figure 1 shows the ECG points P, Q, R, S, and T in the heart cycle.
Fig. 1. ECG signal points P, Q, R, S, and T in the heart cycle
ECG is a non-invasive investigative method that records the physiological activity of the heart over time. These signals are used to diagnose many CVDs, including atrial fibrillation, premature atrial contractions, premature ventricular contractions, myocardial infarction, and congestive heart failure [4]. In the current era, there has been rapid development of portable ECG monitors such as the Holter monitor, as well as wearable gadgets in several health-care domains, like the Apple Watch. As a result, the amount of ECG signal data to be analysed has increased at such a rapid rate that it is difficult to record and analyse it [5]. Automatically and correctly analysing ECG data has therefore become a fascinating topic, and such analysis is also deployed for sleep staging and biometric identification. Cardiologists and medical practitioners routinely use the ECG to monitor cardiac well-being. The foremost issue with manual analysis of ECG signals is the difficulty of identifying and categorising the various waveforms and morphologies in such signals. This task takes a long time and is prone to errors when performed by a human. It should be noted that cardiovascular disease accounts for approximately one-third of all deaths worldwide [6,7]. Millions of people, for example, suffer from irregular heart-beats, which can also be deadly. As a result,
highly accurate, precise, as well as low-cost arrhythmic heart-beat diagnosis is the need of the hour. Many studies in the literature have investigated the use of machine learning approaches to reliably sense irregularities and address the concerns raised by manual ECG signal analysis. The bulk of these techniques entail a signal preprocessing phase [8,9]. Following that, handcrafted features, which are usually statistical summaries, are derived from these data and employed in the subsequent prediction and classification analysis. The inference engine employs Linear Regression, Support Vector Machines, Bayesian Classifiers, Decision Trees, Random Forests and other classic machine learning algorithms [10,11] for ECG analysis. Although these handcrafted features give an adequate representation of the signal, recent machine learning experiments show that learned feature-extraction and representation approaches are capable of making accurate and precise predictions and are also scalable to large datasets. A complete deep learning architecture enables the computer to learn the best suited features. This approach produces a more accurate and precise depiction of the ECG data, allowing the algorithm to compete with a human cardiologist. We propose a new framework for ECG analysis based on an MLDLM technique with a high capacity for learning. Because the deep learning model was trained with the objective of detecting arrhythmias, it is realistic to expect that the model will need to learn the majority of the shape-related components of the ECG signal. Section 2 describes the related work. Section 3 describes the proposed work's material and procedure. Section 4 presents the results and discussion. Section 5 concludes the work.
2 Related Work
Deep learning approaches have a massive number of parameters that must be trained using massive amounts of data. In computer vision, the ImageNet database and cutting-edge deep learning models are employed to transfer information between different image-understanding tasks [5]. Another instance is that distinct sentence classification tasks have been shown to share a large amount of sentence knowledge [6]. In health informatics, transfer learning has seen limited implementation. For patients with deteriorating diseases, the authors of [7] used the parameters of a Gaussian process expert trained on stable patients. In [12], the authors proposed an incremental broad learning (IBL) classification model for arrhythmia-type recognition based on a biased dropout technique, with morphological features extracted from the de-noised signal during ECG signal pre-processing. According to them, their study was the first to apply the IBL model to the classification of arrhythmias. Their model was trained and evaluated on the MIT-BIH dataset. The authors of [13] proposed a deep CNN model to reliably classify heart-beats. They proposed a batch-weighted loss function to correctly measure the loss in order to address the imbalance between classes.
In [14], the authors presented a 33-layer CNN model with a non-local convolutional block attention module (NCBAM). The ECG data were preprocessed before being fed into the CNN architecture to extract spatial and channel information, and a learnt matrix fuses the spatial, channel, and temporal information of the ECG; to compensate for their different contributions, the learnt matrix mines rich relationship information across these categories of information. The work in [15] proposes a preprocessing strategy that considerably enhances the accuracy of deep learning models used for ECG classification, together with a redesigned deep learning architecture that increases training stability. DeepECG [16] uses a deep CNN (DCNN) with transfer learning to classify arrhythmias based on ECG images; its authors conducted a thorough examination of several neural network topologies. In [17], the authors suggest using a long short-term memory network in conjunction with a CNN to reliably identify CAD ECG signals; the model can detect CAD ECG signals with reasonable accuracy. The work in [18] proposes a simple and lightweight approach for detecting irregular heart-beats; furthermore, the authors pre-processed the data and augmented the smaller classes with six different methods to reduce class imbalance while enhancing accuracy. A 9-layer deep CNN was proposed in [19] to classify five different kinds of heart-beats in ECG readings. The experimentation was carried out using original and noise-attenuated ECG signal databases; this collection was artificially augmented to equalise the number of occurrences of the five heart-beat classes and filtered to remove high-frequency noise. The authors of [20] proposed an inter-patient strategy and a method to identify ECG data using random forests and wavelet packet entropy: wavelet packet decomposition is used to deconstruct the ECG signals, entropy is calculated from the decomposed coefficients, and a Random Forest is then used to develop the ECG classification model. The authors of [21] proposed a novel method for automatically detecting MI using ECG signals, employing a deep CNN system to recognise normal and MI ECG rhythms automatically. The work in [22] proposes a unique ECG characteristic (PolyECG-S) based on fitting the ECG signal with a 20th-order polynomial function. The fit is assessed with the Akaike information criterion, and the method attained 94.45% accuracy when testing on the MI dataset.
3 Material and Method

3.1 Dataset
The dataset for the MLDLM is separated into two groups of heart-beats derived from the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Dataset. Both of these are very popular datasets for heart-beat classification. To evaluate the proposed model, we used a dataset with a total of 109,446 ECG heart-beats with a sampling frequency of 125 Hz. The MIT-BIH arrhythmia dataset contains the first five classes, while the PTB Diagnostic ECG Dataset contains only two classes. Both datasets include enough information to train a CNN. This dataset
was used to test CNN architectures for heart-beat categorization. These signals are preprocessed and segmented, with each segment corresponding to a heart-beat. The first dataset is divided into five classes: N, S, V, F, and Q. The second dataset, the PTB Diagnostic ECG Dataset, is separated into two classes.
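For readers who wish to reproduce the setup, the segmented heart-beats are commonly distributed as CSV files in which each row holds a fixed-length beat followed by its class label. The snippet below sketches such a loading step; the file names are assumptions about the distributed layout, not something stated in the paper.

```python
import numpy as np
import pandas as pd

def load_beats(csv_path):
    """Load segmented heart-beats: each row = beat samples followed by a label."""
    frame = pd.read_csv(csv_path, header=None)
    x = frame.iloc[:, :-1].to_numpy(dtype=np.float32)   # beat waveforms
    y = frame.iloc[:, -1].to_numpy(dtype=np.int64)      # class labels (0..4)
    return x[..., np.newaxis], y                         # channel axis for 1D CNNs

# Hypothetical file names for the segmented MIT-BIH split.
x_train, y_train = load_beats("mitbih_train.csv")
x_test, y_test = load_beats("mitbih_test.csv")
print(x_train.shape, np.bincount(y_train))
```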
Fig. 2. MIT-BIH arrhythmia dataset classes
3.2 Methodology
In this part, a thorough examination of the categories of heart-beats shown in Fig. 2 has been conducted.

Pre-processing. We present a simple and effective technique for preprocessing the ECG signals and extracting heart-beats, because ECG beats are the inputs to the proposed method. The following steps are performed. We divide the continuous ECG signal into 10 s windows and choose one of them. We then normalise the amplitude to the range 0 to 1 and determine the set of all local maximums based on zero crossings of the first derivative. The ECG R-peaks are found by applying a threshold of 0.9 to the (normalised) local maximums. We also compute the window's nominal heart-beat duration as the average of the RR time intervals (T). Then, for each R-peak, we select a signal component of length 1.2T. Finally, we pad each selected component with zeros to make it equal to a specified length. This heart-beat extraction method is straightforward yet effective. Furthermore, all of the extracted beats have the same length, which is required for them to be used as inputs to the subsequent processing stages.

Arrhythmia Classifier and MI Predictor Training. In this research, we trained the proposed MLDLM technique on the MIT-BIH dataset to classify ECG beat patterns, as seen in Fig. 3.

Fig. 3. Architecture of the Multi-layer Deep Learning Model

The trained network can be utilised not only for beat classification, but also as an informative representation of heart-beats, as demonstrated in the following section. Extracted beats are used as inputs. All convolution layers use 1D convolution across time with kernels of size 3. We employed 16 filters, followed by max-pooling to reduce the spatial dimensions of the output volume. Then, to reduce the spatial dimensions further, we employed 32 filters followed by max-pooling once more; the final convolution stage learns 64 filters. In addition, we use max-pooling of size 3 and stride 2 in all pooling layers. A convolutional layer, a ReLU nonlinearity, a residual skip connection, and a pooling layer are all included in each residual block. After training the CNN model, we use the final convolution layer's output activations to represent the input heart-beats. To perform MI classification, we employ these representations as input to a fully connected network with 32 neurons at each layer.

Implementation. We used the Python libraries NumPy, TensorFlow, Seaborn and Matplotlib to implement the CNN in this study. The model was trained using Google Colaboratory. The loss function is the cross-entropy loss on the ReLU outputs. We employed the Adam optimization method to train the networks. Every 10,000 iterations, the learning rate decays exponentially with a decay factor of 0.75. On a GeForce GTX 1080Ti processor, training all of the networks took about 125 min.
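As an illustration of the architecture described above, the sketch below builds a small residual 1D-CNN in Keras with 16/32/64 filters, kernel size 3, and max-pooling of size 3 with stride 2, followed by a fully connected head with 32 neurons per layer. The number of residual blocks, the global pooling step, and the 187-sample input length are assumptions, since the paper does not list these hyper-parameters exhaustively; this is not the authors' original implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def residual_block(x, filters):
    """Conv -> ReLU -> Conv with a residual skip connection, then pooling."""
    shortcut = layers.Conv1D(filters, 1, padding="same")(x)   # match channel count
    y = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, 3, padding="same")(y)
    y = layers.add([y, shortcut])
    y = layers.ReLU()(y)
    return layers.MaxPooling1D(pool_size=3, strides=2, padding="same")(y)

def build_mldlm(input_len=187, n_classes=5):
    inputs = layers.Input(shape=(input_len, 1))
    x = layers.Conv1D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=3, strides=2, padding="same")(x)
    for filters in (32, 64):                     # deeper residual stages
        x = residual_block(x, filters)
    x = layers.GlobalAveragePooling1D()(x)       # beat representation
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_mldlm()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```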
4 Results and Discussion
The performance metrics (accuracy, precision, specificity, negative predictive value, and sensitivity) are estimated from the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts, as shown in Eqs. (1)–(4):

Accuracy = (TP + TN) / (TP + FN + TN + FP)    (1)
Precision = TP / (TP + FP)    (2)
Specificity = TN / (TN + FP)    (3)
Sensitivity = TP / (TP + FN)    (4)
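These definitions translate directly into a few lines of code. The helper below computes the four metrics of Eqs. (1)–(4) from raw per-class (one-vs-rest) counts; it is a generic calculation, not code taken from the paper, and the example counts are invented.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, specificity and sensitivity from per-class counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fn + tn + fp),
        "precision":   tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }

# Example with made-up counts for one class of a confusion matrix.
print(classification_metrics(tp=803, fp=11, tn=3248, fn=17))
```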
Fig. 4. Confusion Matrix obtained from the Multi-layer Deep Learning Model

Table 1. Comparison of other work obtained from the MIT-BIH dataset
Article | Year | Model | Accuracy
MLDLM | 2021 | Multi-layer deep CNN | 98.75
Acharya et al. | 2017 | Deep CNN | 93.50
Li et al. | 2016 | Wavelet packet entropy and random forests | 94.63
The arrhythmia classifier was tested with 4,079 heart-beats from the test split (819 each from the N, S, V and Q classes and 803 from the F class) that were not employed in the CNN model training stage. Figure 4 depicts the confusion matrix after the testing phase of the MLDLM technique. As can be perceived from the figure, the proposed model makes correct and precise predictions and distinguishes between the distinct categories of heart-beats. Table 1 compares the proposed method's average accuracy to other approaches: Acharya et al. [18], which is based on a deep CNN, and Li et al. [19], which is based on wavelet packet entropy and random forests. For the MIT-BIH classifier, the accuracy achieved in this work is 98.75%, which is better than Acharya et al. and Li et al., whose accuracies are 93.5% and 94.63%, respectively. We used the learnt representations to train our MI predictor, and we split the dataset in an 8:2 ratio, meaning that 80% of the PTB dataset is the training set and the remaining 20% is the testing set. Table 2 compares this work for MI classification to existing studies in the literature on these performance metrics. The proposed study outperforms the work of Acharya et al. and Liu et al. in terms of accuracy, precision, specificity, and sensitivity.
Table 2. Comparison results obtained for MI classification
Work | Accuracy % | Precision % | Specificity % | Sensitivity %
MLDLM | 98.87 | 98.63 | 98.33 | 98.15
Acharya et al. | 96.36 | 96.5 | 95.43 | 96.15
Liu et al. | 94 | 94.36 | 96.17 | 95.13

5 Conclusion
In this paper, we describe a Multi-layer Deep Learning Model for ECG heart-beat classification. Specifically, we trained a multi-layer deep CNN with residual connections on the MIT-BIH Arrhythmia dataset for the classification problem. We also demonstrated that the representation learned for this problem can be employed to train and validate effective MI classifiers. According to the results, the suggested approach can handle both tasks with accuracies comparable to the other approaches. This research should be expanded to include areas such as cloud and mobile systems in the future. It is critical to advance wearable technologies that incorporate low power consumption.

Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). Authors acknowledge the technical support and review feedback from AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References
1. Kachuee, M., Fazeli, S., Sarrafzadeh, M.: ECG heartbeat classification: a deep transferable representation. In: 2018 IEEE International Conference on Healthcare Informatics (ICHI), pp. 443–444. IEEE (2018)
2. Roth, G.A., et al.: Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study. J. Am. College Cardiol. 76(25), 2982–3021 (2020)
3. Alam, S.T., Hossain, M.M., Islam, M.K., Rahman, M.D.: Towards development of a low cost and portable ECG monitoring system for rural/remote areas of Bangladesh. Int. J. Image Graphics Sign. Process. 10(5), 24–32 (2018)
4. Jain, K., Singh, A., Singh, P., Yadav, S.: An improved supervised classification algorithm in healthcare diagnostics for predicting opioid habit disorder. Int. J. Reliab. Qual. E-Healthcare (IJRQEH) 11(1), 1–16 (2022)
5. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
6. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364 (2017)
7. Alaa, A.M., Yoon, J., Hu, S., Van der Schaar, M.: Personalized risk scoring for critical care prognosis using mixtures of gaussian processes. IEEE Trans. Biomed. Eng. 65(1), 207–218 (2017) 8. Jain, K., Kumar, A.: A lightweight data transmission reduction method based on a dual prediction technique for sensor networks. Trans. Emerg. Telecommun. Technol. 32(11), e4345 (2021) 9. Agarwal, A., Jain, K., Dev, A.: Modeling and analysis of data prediction technique based on linear regression model (DP-LRM) for cluster-based sensor networks. Int. J. Ambient Comput. Intell. (IJACI) 12(4), 98–117 (2021) 10. Raghuvanshi, K.K., Agarwal, A., Jain, K., Singh, V.B.: A generalized prediction model for improving software reliability using time-series modelling. Int. J. Syst. Assur. Eng. Manage. 13(3), 1309–1320 (2022) 11. Jain, K., Kumar, A.: An energy-efficient prediction model for data aggregation in sensor network. J. Ambient. Intell. Humaniz. Comput. 11(11), 5205–5216 (2020) 12. Li, J., Zhang, Y., Gao, L., Li, X.: Arrhythmia classification using biased dropout and morphology-rhythm feature with incremental broad learning. IEEE Access 9, 66132–66140 (2021) 13. Sellami, A., Hwang, H.: A robust deep convolutional neural network with batchweighted loss for heartbeat classification. Expert Syst. Appl. 122, 75–84 (2019) 14. Jikuo Wang, X., Qiao, C.L., Wang, X., Liu, Y.Y., Yao, L., Zhang, H.: Automated ECG classification using a non-local convolutional block attention module. Comput. Methods Programs Biomed. 203, 106006 (2021) 15. Kanani, P., Padole, M.: ECG heartbeat arrhythmia classification using time-series augmented signals and deep learning approach. Procedia Comput. Sci. 171, 524– 531 (2020) 16. Li, C., et al.: Deepecg: image-based electrocardiogram interpretation with deep convolutional neural networks. Biomed. Signal Process. Control 69, 102824 (2021) 17. Tan, J.H., et al.: Application of stacked convolutional and long short-term memory network for accurate identification of cad ECG signals. Comput. Biol. Med. 94, 19–26 (2018) 18. Mahmud, T., Hossain, A.R., Fattah, S.A.: Ecgdeepnet: a deep learning approach for classifying ECG beats. In: 2019 7th International Conference on Robot Intelligence Technology and Applications (RiTA), pp. 32–37. IEEE (2019) 19. Acharya, U.R., et al.: A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 89, 389–396 (2017) 20. Li, T., Zhou, M.: ECG classification using wavelet packet entropy and random forests. Entropy 18(8), 285 (2016) 21. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 415, 190–198 (2017) 22. Liu, B., et al.: A novel electrocardiogram parameterization algorithm and its application in myocardial infarction detection. Comput. Biol. Med. 61, 178–184 (2015)
Analyzing Electoral Data Using Partitional and Hierarchical Clustering Algorithms

Paulo Rogerio Nietto1(B), Maria do Carmo Nicoletti1,2, and Nilton Cesar Sacco1,3

1 Faculdade Campo Limpo Paulista (FACCAMP), C. L. Paulista, SP, Brazil
{pnietto,carmo}@cc.faccamp.br
2 Universidade Federal de São Carlos (UFSCar), S. Carlos, SP, Brazil
3 Faculdade de Tecnologia (FATEC), Araras, SP, Brazil
Abstract. This paper describes the use of two clustering algorithms for investigating how results from the second round of the last presidential election in Brazil are organized, taking into account values of the Municipal Human Development Index (MHDI). The investigation intended to uncover possible relationships between indicators that characterize profiles of Brazilian voters and profiles of Brazilian municipalities. MHDI is a customized version of the Human Development Index (HDI) and represents the development and quality of life offered by Brazilian municipalities. The work carried out is based on data from the last presidential election in Brazil, held in 2018, focusing on municipality data described by the MHDI sub-indexes, as well as each municipality's population, for the 5,558 municipalities in the country. The analysis of the results of the second round of the last presidential election, taking MHDI values into account, was carried out based on clusterings induced by two clustering algorithms that employ different strategies: hierarchical (algorithm DIANA) and partitional (algorithm k-Means).

Keywords: Data Extraction · Data Analysis · Electoral Data · HDI · MHDI · Hierarchical Clustering · Partitional Clustering
1 Introduction

Brazil is a federation composed of 26 states, one Federal District (FD) and 5,558 municipalities. The states are grouped into five regions: Northern (N), Northeast (NE), Central-West (CO), Southeast (SE) and Southern (S). The FD is a legal and special entity of internal public law, part of the political-administrative structure of the country, and is neither a state nor a municipality, although it has the legislative powers reserved to states and municipalities, as stated in Article 32, § 1º of the Brazilian Constitution [1]. The FD belongs to the CO region. Currently the Brazilian government is a source of large volumes of data made available in public repositories; several procedures can be used by the general public to access the repositories and monitor how public money is employed. Among the data made publicly available there are also data related to the quality of life in Brazilian municipalities. The Superior Electoral Court (TSE) contributed to strengthening
transparency by making publicly available data on electoral processes at municipal, state and federal levels. The United Nations Development Programme (UNDP) has provided important data on human development in numerous countries. The UNDP has also collaborated in improving indices that characterize the quality of life, health and education of countries. For that, the HDI (Human Development Index) proposed by the United Nations (UN) is used as a metric for evaluating human development in countries worldwide. The index varies in the interval [0, 1]; underdeveloped countries have HDI values tending to 0. In 2018 the polarization of the Brazilian political scenario reached its peak during the second round of the presidential elections, with the two presidential candidates having opposite political/governmental agendas. On the one hand, the Workers' Party (PT), a left-wing party, which had remained in power for four consecutive terms, was trying to regain the presidency after its last president was deposed by a controversial impeachment process. On the other hand, the Social Liberal Party (PSL), a far-right party based on economic liberalism, militarism, anti-communism and social conservatism, was trying to fight for the presidency. The election was won by the PSL candidate with 55.13% of the valid votes [2]. The whole political scenario surrounding the 2018 second-round election turned out to be a good opportunity for conducting studies on possible factors that influenced the voters' choice in such a bipolar scenario. Knowing the winner of the election in each municipality can eventually lead to the discovery of patterns relating the result of the election to the human development index of each municipality and, perhaps, to the size of its population. In the academic literature there are many research works related to investigating election results, several of them aiming to identify factors that can influence those results. Others can be seen as attempts at experimenting with techniques for modeling and, eventually, predicting electoral results. A large number of published works are based on data extracted from Twitter messages [3–8]. The most simplistic approach is based on counting the number of times a political party or a politician's name is mentioned in Twitter messages, as in the works described in [9–12]. In some works, however, a refinement of the previous approach is implemented by adding a sentiment-analysis process aiming at improving the predictive capability of such procedures, such as the proposals in [13–15]. In [16] the approach is combined with the use of machine learning algorithms [17] to reduce the bias of the Twitter users' sample. The work described in [18], besides presenting an up-to-date view of several approaches related to elections and their results based on Twitter messages, also focuses on prediction models based on classical political science combined with social-media sentiment analysis. The research work presented in this article continues a previous work described in [27], where the main focus was on the statistical analysis of the correlation between municipality electoral results and municipality data related to Education, Income and Life Expectancy. The goal of the work reported in this article is to compare the organization of the election's second-round results with the organization of the municipalities' data described by the Municipal Human Development Index. For the comparison, the municipalities' populations are taken into consideration.
Both data organizations are induced by clustering algorithms, and the comparison between the induced clusterings is measured by the external validation index Rand [19]. The remainder of this article is organized into four sections, where Sect. 2 presents the MHDI adaptation of the HDI for
evaluating municipalities in a country. Section 3 describes the groups of data instances used in the experiments, extracted from two public data sources; it also describes how the data were pre-processed in order to standardize them for the planned experiments. Section 4 presents the methodology employed in the conducted experiments, followed by the results as well as a discussion of them. Section 5 summarizes the work done and highlights its main contributions.
2 The Municipal Human Development Index (MHDI)
Although the HDI has been the target of several criticisms (see [20, 21]), it is the most used and accepted index of its kind. The Municipal Human Development Index (MHDI) is a customized version of the HDI for Brazilian municipalities and was proposed and defined in 2012 by the UNDP Brazil and the J. Pinheiro Foundation [22]. Although the MHDI is still based on Education, Income and Longevity, its three sub-indices, namely MHDI Education, MHDI Income and MHDI Longevity, have been adapted to reflect the Brazilian reality and the available national indicators, and it can be considered a reasonable view of the development status of a municipality. The MHDI Education Index follows the same approach as the HDI Education Index, where the educational levels of adults and of young people are calculated with different rules:
(a) APEL (Adult Population Educational Level): measures the percentage of adults aged 18 years or older who have completed primary education.
(b) YPEL (Young Population Educational Level): the arithmetic mean of the four indicators below, as shown in Eq. (1):
• I5–6: % of people aged 5–6 attending school,
• I11–13: % of people aged 11–13 attending the final years of primary school,
• I15–17: % of people aged 15–17 with primary education completed and
• I18–20: % of people aged 18–20 with secondary education completed.

YPEL = (I5–6 + I11–13 + I15–17 + I18–20) / 4    (1)
The MHDI Education Index is then calculated as the geometric mean of APEL and YPEL, weighted by 1 and 2, respectively, as shown in Eq. (2).

MHDIeduc = (APEL × YPEL × YPEL)^(1/3)    (2)

The MHDI Income Index is based on the average capacity for acquiring goods and services by inhabitants of a municipality, as well as on constant factors that state the minimum and maximum values of reference for this indicator, as shown in Eq. (3).

MHDIinc = [ln(MIPC) − ln(MinR)] / [ln(MaxR) − ln(MinR)]    (3)

where:
• MIPC: municipal income per capita,
• MinR: minimum value of reference (R$ 8.00, roughly equivalent to 100 PPS, or Purchasing Power Standard, which is the amount used in calculating the global HDI) and
• MaxR: maximum value of reference (R$ 4,033.00, equivalent to the lowest income per capita of the 10% of Federative Units with the highest average income in the country).

The MHDI Longevity Index is also based on a single indicator, the life expectancy at birth, which measures the mortality rates for each age group in the municipality. This indicator takes into account diseases as well as external causes, such as accidents and general violence. The MHDI Longevity Index depends on two values associated with the minimum and maximum values of reference related to the life expectancy at birth indicator, as shown in Eq. (4).

MHDIlong = (LEAB − MinR) / (MaxR − MinR)    (4)

where:
• LEAB: life expectancy at birth indicator,
• MinR: minimum value of reference (25 years old) and
• MaxR: maximum value of reference (85 years old).

Given the previous calculations of the three sub-indices, the MHDI is calculated as the geometric mean of the education, income and longevity sub-indices, as shown in Eq. (5).

MHDI = (MHDIeduc × MHDIinc × MHDIlong)^(1/3)    (5)

The MHDI and its three sub-indices have values in the interval [0, 1]. Usually, HDI values are classified as: (a) very high if 0.800 ≤ HDI ≤ 1.000; (b) high if 0.700 ≤ HDI ≤ 0.799; (c) medium if 0.600 ≤ HDI ≤ 0.699; (d) low if 0.500 ≤ HDI ≤ 0.599 and (e) very low if 0.000 ≤ HDI ≤ 0.499.
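To make the composition of the index concrete, the short Python fragment below combines the three sub-indices exactly as in Eqs. (2)-(5) and applies the classification bands above. It is only an illustrative sketch; the function names and example values are ours, not part of the original work.

# Illustrative sketch of Eqs. (2)-(5); names and example values are hypothetical.
import math

def mhdi_education(apel, ypel):
    # Geometric mean of APEL and YPEL with weights 1 and 2 (Eq. 2).
    return (apel * ypel * ypel) ** (1.0 / 3.0)

def mhdi_income(mipc, min_r=8.00, max_r=4033.00):
    # Log-scaled income index (Eq. 3), bounded by the reference values.
    return (math.log(mipc) - math.log(min_r)) / (math.log(max_r) - math.log(min_r))

def mhdi_longevity(leab, min_r=25.0, max_r=85.0):
    # Life expectancy at birth rescaled to [0, 1] (Eq. 4).
    return (leab - min_r) / (max_r - min_r)

def mhdi(educ, inc, longev):
    # Geometric mean of the three sub-indices (Eq. 5).
    return (educ * inc * longev) ** (1.0 / 3.0)

def classify(value):
    if value >= 0.800:
        return "very high"
    if value >= 0.700:
        return "high"
    if value >= 0.600:
        return "medium"
    if value >= 0.500:
        return "low"
    return "very low"

# Example with made-up numbers for a hypothetical municipality:
index = mhdi(mhdi_education(0.55, 0.62), mhdi_income(650.0), mhdi_longevity(74.3))
print(round(index, 3), classify(index))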
3 Data Used in the Experiments
Before considering the data about municipalities, it is important to contextualize the geographic regions related to them. Table 1 shows the 26 Brazilian states grouped into five regions. The FD (Federal District) is geographically located in the CO region. Besides the MHDI, the experiments described in Sect. 4 also consider the population size of municipalities as a potential indicator that could influence election results. In the experiments, Brazilian municipalities were divided into groups based on their population size, as shown in Table 2.
Table 1. Regions, states per region, population, and number of municipalities (NM) per category (S: small, M: medium, ML: medium-large, L: large, VL: very large).
Northern (N), 7 states: Acre (AC), Amapá (AP), Amazonas (AM), Pará (PA), Rondônia (RO), Roraima (RR), Tocantins (TO) | Population: 18,906,962 (8.82%) | NM: 449 (8.08%), per category: S = 275, M = 154, ML = 18, L = 0, VL = 2
Northeast (NE), 9 states: Alagoas (AL), Bahia (BA), Ceará (CE), Maranhão (MA), Paraíba (PB), Pernambuco (PE), Piauí (PI), R. G. do Norte (RN), Sergipe (SE) | Population: 57,667,842 (27.09%) | NM: 1,789 (32.19%), per category: S = 1,190, M = 541, ML = 47, L = 7, VL = 4
Central-West (CO), 3 states: Goiás (GO), Mato Grosso (MT), M. G. do Sul (MS) | Population: 16,707,336 (7.79%) | NM: 465 (8.37%), per category: S = 359, M = 89, ML = 14, L = 2, VL = 1
Southeast (SE), 4 states: Espírito Santo (ES), Minas Gerais (MG), R. Janeiro (RJ), São Paulo (SP) | Population: 89,632,912 (42.04%) | NM: 1,667 (29.99%), per category: S = 1,145, M = 384, ML = 121, L = 12, VL = 5
Southern (S), 3 states: Paraná (PR), Santa Catarina (SC), R. G. do Sul (RS) | Population: 30,402,587 (14.26%) | NM: 1,188 (21.37%), per category: S = 940, M = 200, ML = 44, L = 2, VL = 2
Total: 26 states | Total of municipalities: 5,558, per category: S = 3,909, M = 1,368, ML = 244, L = 23, VL = 14
Table 2. Number of municipalities (NM) in Brazil grouped by population size.
Population | NM | Size
(1) ≤ 20,000 | 3,909 | Small (S)
(2) 20,001–100,000 | 1,368 | Medium (M)
(3) 100,001–500,000 | 244 | Medium-Large (ML)
(4) 500,001–1,000,000 | 23 | Large (L)
(5) ≥ 1,000,001 | 14 | Very-Large (VL)
Total of municipalities | 5,558 | –
For running the experiments, two groups of data were used: (1) Results of the second round of the 2018 presidential election, publicly available on the site of the Brazilian Superior Electoral Court (TSE − Tribunal Superior Eleitoral) [2], were downloaded for all the Brazilian municipalities. The site provides condensed results related to each electronic voting device in each one of the 5,558 Brazilian municipalities. (2) Data related to the 5,558 municipalities' population sizes and their MHDI values were extracted from the Atlas of Human Development in Brazil, available via the site [22]. The site is jointly maintained by the United Nations Development Programme (UNDP), the Institute of Applied Economic Research (IPEA − Instituto de Pesquisa Economica Aplicada), the J. Pinheiro Foundation (FJP − Fundação J. Pinheiro) and the Brazilian Federal Government. It is important to mention that some of the available data related to population and MHDI values refer to the year 2010, the last year a demographic census was conducted in the country. Before using the data of interest from both sources, a methodological process for preprocessing the data was defined. The process was repeated for both data sources and had four steps: (1) downloading the original data from both sources, (2) handling inconsistencies, (3) selecting relevant attributes and (4) importing the data. In step (2), inconsistencies and other minor problems, such as different character encoding formats and misspellings in the names of municipalities, were detected and fixed. Inconsistencies were detected with the help of the Exploratory tool during the first tests of data import and visualization. To organize the data for the clustering experiments, the methodology employed is based on the procedure organizing_data, shown in Fig. 1. The procedure creates 25 data files, each containing municipalities' data related to a region–population pair, where populations were categorized into five levels: small (S), medium (M), medium-large (ML), large (L) and very-large (VL), as in Table 2.

procedure organizing_data(Reg, Pop, Files)
input:  Reg   % Reg[1]=N, Reg[2]=NE, Reg[3]=CO, Reg[4]=SE, Reg[5]=S
        Pop   % Pop[1]=s, Pop[2]=m, Pop[3]=ml, Pop[4]=l, Pop[5]=vl
output: Files % set of files, each containing data related to each Reg-Pop pair
begin
  Files ← ∅
  for creg ← 1 to 5 do
    for cpop ← 1 to 5 do
    begin
      File_creg_cpop ← extract_data_munic(Reg[creg], Pop[cpop])
      Files ← Files ∪ File_creg_cpop
    end
end
end procedure
Fig. 1. Organizing municipalities’ data as 25 data files.
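The authors report implementing their pipeline in Python; the fragment below is a minimal pandas sketch of how the 25 region-population files of Fig. 1 could be produced. The column names ("region", "population", "mhdi", "winner") and the output file names are assumptions made for illustration and are not taken from the original code.

# Hypothetical sketch of organizing_data; column and file names are assumed.
import pandas as pd

REGIONS = ["N", "NE", "CO", "SE", "S"]
SIZES = {"s": (0, 20_000), "m": (20_001, 100_000), "ml": (100_001, 500_000),
         "l": (500_001, 1_000_000), "vl": (1_000_001, float("inf"))}

def organizing_data(municipalities, out_dir="."):
    # Split the municipalities table into 25 CSV files, one per region-size pair.
    files = []
    for reg in REGIONS:
        for size, (low, high) in SIZES.items():
            mask = (municipalities["region"] == reg) & \
                   municipalities["population"].between(low, high)
            subset = municipalities.loc[mask, ["mhdi", "winner"]]
            path = f"{out_dir}/File_{reg}_{size}.csv"
            subset.to_csv(path, index=False)
            files.append(path)
    return files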
4 Methodology and Experiments
The methodology used for conducting the experiments is based on: (1) the organization of data instances related to municipalities in the five regions of Brazil and, per region, organized based on the population size of municipalities, as described in Sect. 3 (Fig. 1); (2) the use of two clustering algorithms, k-Means (partitional) [23] and DIANA (divisive hierarchical) [24]; (3) the use of the internal Silhouette index for individually evaluating induced clusterings; and (4) the use of the Rand index [19] for comparing the two clusterings induced by each algorithm, one induced from instances described by the MHDI and the other induced using the same instances, but described by the election result. The software was programmed and executed using Python version 3.7.1 for 64-bit Windows operating systems. The development environment used was Spyder, version 3.3.1. The experiments start by constructing 25 data sets, identified as shown in Table 3, containing municipalities' data organized per region and population, using the procedure organizing_data described in Fig. 1. Initially, for each one of the 25 data sets obtained by the procedure in Fig. 1, the following two-step procedure was executed separately for each algorithm, k-Means and DIANA. (1) All 25 data sets have instances described by two attributes, i.e., the corresponding municipality's MHDI and a binary attribute referred to as winner, with a categorical value informing the winner in the corresponding municipality. Each one of the 25 data sets was split into two files, one of them containing data instances described by the MHDI attribute and the other containing instances described by the winner in the corresponding municipality.
Table 3. Files with municipalities' data from regions (N, NE, CO, SE, S), taking into account five population sizes (small, medium, medium-large, large and very large).
File_N_s | File_NE_s | File_CO_s | File_SE_s | File_S_s
File_N_m | File_NE_m | File_CO_m | File_SE_m | File_S_m
File_N_ml | File_NE_ml | File_CO_ml | File_SE_ml | File_S_ml
File_N_l | File_NE_l | File_CO_l | File_SE_l | File_S_l
File_N_vl | File_NE_vl | File_CO_vl | File_SE_vl | File_S_vl
(2) Both files were input to each algorithm, k-Means and DIANA, one file at a time, with parameter k = 2 for the number of clusters each induced clustering should have. So, each algorithm induced two clusterings, each with 2 clusters, where the first was based on instances described by the election results (winner) and the second was based on the same instances now described by MHDI attribute values. The two clusterings induced by each algorithm were then compared to each other in
relation to their organizational similarity; for that, the Rand index was used, since this index can be seen as a measure of similarity between two data clusterings. It is important to note that when using k-Means, in order to bypass a possibly misleading random initialization in its first phase, the algorithm was executed 5 times using random initialization and, each time, the corresponding induced clustering was evaluated using the Silhouette index. The Silhouette index is a measure that emphasizes both the separation between clusters of the clustering and their compactness, and it has values in the interval [−1, 1]. The index validates the induced clustering based on the pairwise difference of between- and within-cluster distances [25]. The results presented in the following tables refer to the k-Means run whose k initial centroids promoted the best (among the 5) clustering, according to the values of the Silhouette index. An interpretation of the Silhouette [26] can be stated as follows: values in 0.71–1.00 indicate a clustering having a sound structure; in 0.51–0.70, a reasonable structure; in 0.26–0.50, a weak structure; and ≤0.25, a clustering with no substantial structure. As can be seen in Table 4 and Table 5, as far as Silhouette values are concerned, the results point to the induced clusterings having a reasonable structure. In Table 4, the S column represents the value of the internal Silhouette index on the clustering induced by data represented by the MHDI attribute. The goal of this experiment was to evaluate the contribution of MHDI for inducing good clusterings. Taking into account that values of the MHDI vary between 0 and 1, the clusters formed were not well separated, although they tended to be compact. This could be a possible justification for the values of this index being, in most experiments, only slightly above the average value, which can be translated as the resulting clusterings having a reasonable structure. The Rand index is an external clustering validation index and, as such, has been employed in the experiments for comparatively measuring the similarity of the two clusterings induced by the same algorithm, using the same data set of instances but described by different attributes, i.e., by MHDI and by winner. Values related to the Rand index suggest that the two clusterings induced by the same algorithm, i.e., by k-Means or by DIANA, considering instances described by the MHDI and instances described by winner, only have partial similarity, since Rand values are around 0.55 in almost all data sets. However, the Rand value of 0.79 stands out when the 18 data instances in File_N_ml are input to either clustering algorithm, which could be considered evidence of a correlation between MHDI and winner. Both algorithms induced clusterings with close Rand values, so the choice of algorithm did not influence the final results. A second group of experiments was devised, aiming at investigating the granularity (in terms of number of clusters in a clustering) involved in the clustering process. As the performance of both algorithms in the previous group of experiments was similar, the second group of experiments only used k-Means. The experiments are the same except for the number of clusters in the clustering involving instances described by MHDI, which was set to 3, while maintaining the previous value of 2 for instances described by winner.
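A minimal sketch of this evaluation protocol, assuming scikit-learn is used, is shown below; the paper does not state which libraries were employed, and DIANA is not available in scikit-learn, so only the k-Means part of the protocol is illustrated.

# Illustrative sketch only: 5 random k-Means initializations, the best kept by Silhouette,
# then the two induced clusterings compared with the (plain) Rand index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, rand_score

def best_kmeans_labels(X, k=2, n_runs=5, seed=0):
    rng = np.random.RandomState(seed)
    best_labels, best_sil = None, -1.0
    for _ in range(n_runs):
        km = KMeans(n_clusters=k, init="random", n_init=1,
                    random_state=rng.randint(10**6))
        labels = km.fit_predict(X)
        if len(set(labels)) > 1:                  # Silhouette needs at least 2 clusters
            sil = silhouette_score(X, labels)
            if sil > best_sil:
                best_sil, best_labels = sil, labels
    return best_labels, best_sil

# X_mhdi: instances described only by MHDI; X_winner: same instances, winner encoded as 0/1.
# labels_mhdi, sil_mhdi = best_kmeans_labels(X_mhdi, k=2)
# labels_winner, _ = best_kmeans_labels(X_winner, k=2)
# print(sil_mhdi, rand_score(labels_mhdi, labels_winner))   # Rand index in [0, 1]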
Comparing the values related to k-Means in Table 4 and Table 5, it can be noticed that inducing clusterings with three clusters from instances described by MHDI had a negative impact on the value of the Rand index, except for a single
Table 4. Validity index values (Silhouette (S) and Rand (R)) associated with clusterings induced by k-Means and by DIANA, for number of clusters = 2. The minus sign (–) informs problems in the input data due to: (*) insufficient no. of instances, (**) only one class, (***) no municipality in the file.

Files | k-Means S | k-Means R | DIANA S | DIANA R
File_N_s | 0.56 | 0.52 | 0.55 | 0.51
File_N_m | 0.54 | 0.57 | 0.55 | 0.60
File_N_ml | 0.62 | 0.79 | 0.52 | 0.79
File_N_l(***) | – | – | – | –
File_N_vl(*) | – | – | – | –
File_NE_s | 0.55 | 0.50 | 0.55 | 0.50
File_NE_m | 0.55 | 0.50 | 0.54 | 0.51
File_NE_ml | 0.57 | 0.57 | 0.53 | 0.53
File_NE_l | 0.82 | 0.43 | 0.82 | 0.43
File_NE_vl(*) | – | – | – | –
File_CO_s | 0.55 | 0.59 | 0.55 | 0.59
File_CO_m | 0.55 | 0.61 | 0.55 | 0.61
File_CO_ml(**) | – | – | – | –
File_CO_l(*) | – | – | – | –
File_CO_vl(*) | – | – | – | –
File_SE_s | 0.58 | 0.69 | 0.58 | 0.68
File_SE_m | 0.57 | 0.66 | 0.57 | 0.66
File_SE_ml | 0.57 | 0.55 | 0.57 | 0.54
File_SE_l(**) | – | – | – | –
File_SE_vl(**) | – | – | – | –
File_S_s | 0.55 | 0.56 | 0.55 | 0.56
File_S_m | 0.54 | 0.55 | 0.54 | 0.55
File_S_ml | 0.58 | 0.52 | 0.58 | 0.51
File_S_l(*) | – | – | – | –
File_S_vl(*) | – | – | – | –
experiment, the one related to File_NE_l, although the difference (0.05) between the values can be neglected.
Table 5. Validity index values (Silhouette (S) and Rand (R)) associated with clusterings induced by k-Means for number of clusters = 3. The minus sign (–) informs problems with the input data due to (*) insufficient no. of instances; (**) only one class; (***) no municipality in the file.

File | S | R
File_N_s | 0.57 | 0.50
File_N_m | 0.56 | 0.53
File_N_ml | 0.58 | 0.67
File_N_l(***) | – | –
File_N_vl(*) | – | –
File_NE_s | 0.52 | 0.37
File_NE_m | 0.55 | 0.38
File_NE_ml | 0.57 | 0.44
File_NE_l | 0.84 | 0.48
File_NE_vl(*) | – | –
File_CO_s | 0.52 | 0.53
File_CO_m | 0.55 | 0.42
File_CO_ml(**) | – | –
File_CO_l(*) | – | –
File_CO_vl(*) | – | –
File_SE_s | 0.56 | 0.61
File_SE_m | 0.58 | 0.51
File_SE_ml | 0.58 | 0.42
File_SE_l(**) | – | –
File_SE_vl(**) | – | –
File_S_s | 0.56 | 0.50
File_S_m | 0.55 | 0.46
File_S_ml | 0.72 | 0.46
File_S_l(*) | – | –
File_S_vl(*) | – | –
5 Conclusions
This paper describes the use of clustering algorithms for conducting an investigation into electoral data, with the goal of learning about the relevance that a particular characteristic of municipalities (MHDI) can have on election results. The 5,558 data instances used in the experiments refer to information about municipalities, where each data instance is described by the MHDI and the winning political party. Two clustering algorithms have been used in the experiments: the divisive algorithm DIvisive ANAlysis (DIANA), proposed with the intent of minimizing the computational complexity embedded in divisive algorithms, and the well-known k-Means partitional algorithm. Both clustering algorithms were chosen because they are conceptually different. The experiments aimed at comparing both organizations of the municipalities, by election result and by MHDI values, taking into account five population groups: small, medium, medium-large, large and very large. Both algorithms reached similar Rand values, which suggests that the first group of experiments was not biased by the clustering algorithm. The Rand index values suggest that the two clusterings induced by the same algorithm, i.e., by k-Means or by DIANA, considering instances described by the MHDI and instances described by winner, only have partial similarity, since Rand values are around 0.55 in almost all data sets.
Acknowledgments. The authors thank CAPES, CNPq and UNIFACCAMP.
References
1. Constituição Brasileira. https://www25.senado.leg.br/web/atividade/legislacao/constituicaofederal. Retrieved 9 Jul 2021
2. Tribunal Superior Eleitoral. Divulgação do resultado das eleições. http://divulga.tse.jus.br/oficial/index.html. Retrieved 15 Nov 2021
3. Gayo-Avello, D., Metaxas, P.T., Mustafaraj, E.: Limits of electoral predictions using Twitter. In: Proc. of the Fifth Int. AAAI Conf. on Weblogs and Social Media, pp. 490–493 (2011)
4. Nguyen, D., Trieschnigg, D., Meder, T.: Tweetgenie: development, evaluation, and lessons learned. In: Proc. of The 25th Int. Conf. on Computational Linguistics, pp. 62–66 (2014). http://doc.utwente.nl/94056/
5. Barbera, P., Rivero, G.: Understanding the political representativeness of Twitter users. Soc. Sci. Comput. Rev. 33(6), 712–729 (2015)
6. Ahmed, S., Jaidka, K., Cho, J.: The 2014 Indian elections on Twitter: a comparison of campaign strategies of political parties. Telemat. Inform. 33(4), 1071–1087 (2016)
7. Sanders, E., de Gier, M., van den Bosch, A.: Using demographics in predicting election results with Twitter. In: Spiro, E., Ahn, Y.-Y. (eds.) SocInfo 2016. LNCS, vol. 10047, pp. 259–268. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47874-6_18
8. Korakakis, M., Spyrou, E.P., Mylonas, P.: A survey on political event analysis in Twitter. In: Proc. of The 12th Int. Work. Semant. Soc. Media Adapt. Pers. (SMAP 2017), pp. 14–19 (2017)
9. Sang, E.T.K., Bos, J.: Predicting the 2011 Dutch senate election results with Twitter. In: Proc. of the Workshop on Semantic Analysis in Social Media, pp. 53–60 (2012)
10. Boutet, A., Kim, H., Yoneki, E.: What's in your tweets? I know who you supported in the UK 2010 general election. In: Proc. of the 6th Int. AAAI Conf. on Weblogs and Social Media, vol. 6, no. 1, pp. 211–414 (2021)
11. Tumasjan, A., Sprenger, T., Sandner, P.G., Welpe, I.M.: Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: Proc. of 4th ICWSM, pp. 178–185 (2010)
12. Bessi, A., Ferrara, E.: Social bots distort the 2016 US presidential election online discussion. First Monday 21(11–7) (2016)
13. Jain, A.P., Katkar, V.D.: Sentiments analysis of Twitter data using data mining. In: 2015 Int. Conf. Inf. Process., pp. 807–810 (2015)
14. Bansala, B., Srivastavaa, S.: On predicting elections with hybrid topic based sentiment analysis of tweets. Procedia Comput. Sci. 135, 346–353 (2018)
15. Prabhu, B.P.A., Ashwini, B.P., Khan, T.A., Das, A.: Predicting election result with sentimental analysis using twitter data for candidate selection. In: Saini, H.S., Sayal, R., Govardhan, A., Buyya, R. (eds.) Innovations in Computer Science and Engineering. LNNS, vol. 74, pp. 49–55. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-7082-3_7
16. Coletto, M., Lucchese, C., Orlando, S., Perego, R.: Electoral predictions with Twitter: a machine-learning approach. In: 6th Italian Information Retrieval Workshop, pp. 1–12 (2015)
17. Alpaydin, E.: Introduction to Machine Learning, 2nd edn., p. 537. MIT Press, Cambridge (2010)
18. Liu, R., Yao, X., Guo, C., Wei, X.: Can we forecast presidential election using Twitter data? An integrative modelling approach. Annals of GIS 27(1), 43–56 (2021). https://doi.org/10.1080/19475683.2020.1829704
19. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
20. Ranis, G., Stewart, F., Samman, E.: Human Development: Beyond the HDI. Economic Growth Center, Yale University, p. 36 (2005)
21. Sagar, A.D., Najam, A.: The human development index: a critical review. Ecol. Econ. 1, 249–264 (1998). https://doi.org/10.1016/S0921-8009(97)00168-7
22. Atlas Brasil. http://atlasbrasil.org.br. Retrieved 15 Dec 2021
23. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. of the Fifth Berkeley Symposium on Math. Statistics and Probability, vol. 1, no. 14, pp. 281–297 (1967)
24. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, Hoboken (2005)
25. Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: Proc. of the 10th International IEEE Conference on Data Mining (ICDM), pp. 911–916 (2010)
26. Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1), 53–65 (1987)
27. Yero, E.J.H., Sacco, N.C., Nicoletti, M.C.: Effect of the municipal human development index on the results of the 2018 Brazilian presidential elections. Expert Syst. Appl. 168, 113305 (2021)
Medical Decision Making Based 5D Cardiac MRI Segmentation Tools Houneida Sakly1(B)
, Mourad Said2 , and Moncef Tagina1
1 COSMOS Laboratory -National School of Computer Sciences (ENSI), University of
Manouba, Manouba, Tunisia [email protected], [email protected] 2 Radiology and Medical Imaging Unit, International Center Carthage Medical, Monastir, Tunisia
Abstract. In this survey, a comparative study of various tools for medical image segmentation is proposed. The study evaluates and estimates the adequacy and the impact of blood flow analysis in the clinical platform made by GE Healthcare for cardiac MRI. A critical analysis of different image processing environments (libraries, turn-key tools, scripting, data flow, etc.) is provided. To address this issue, the fifth dimension of blood flow is introduced to handle valvular stenosis and regurgitation for medical decision-making. The contour segmentation approach led to a loss of morphological information, and an error rate was estimated when applying a contour to estimate the blood flow in the aortic valve. The interest of the 4D cardiac sequence, together with its studies of the blood flow, sheds an innovative light on the new terminology of 5D imagery. Keywords: Medical tools · blood flow · 5D imagery · segmentation · decision-making
1 Introduction
The segmentation method for flow sequences can introduce errors at the contour marker of the zone of interest. The consequence is an uncertain quantification in the simulations of blood flow. The impact of geometric conditions, limits, and blood viscosity may affect the prediction of parameters useful for the diagnosis of heart disease [1]. In fact, several medical decision-making tools (Onis 4D Viewer, Medisoft EA…) can use 4D flow cardiac imagery; however, they have not yet solved the problem of segmentation perfectly on a dynamic blood flow sequence, in view of the loss of flow information that can occur during the segmentation phase. In this case, the elasticity of the fluid was not correctly identified. Importantly, prospective clinical trials have demonstrated that the use of fractional flow reserve in clinical decision-making can identify patients who should be medically treated [1] and those who benefit from revascularization using stents for cardiac pathologies [2]. Various tools offer diversified approaches, such as segmentation, registration, and reconstruction, to extract a set of
parameters for the identification of pathologies related to prognosis. Therefore, segmentation is considered the most crucial step in medical prognosis. Extracting the area of interest in the segmentation makes it possible to identify parameters such as ejection fraction, heart rate, and maximum velocity. The estimation should be accurate for the measurements of end-diastolic (EDV) and end-systolic (ESV) volumes, together with the ejection fraction in the left ventricle (LVEF) [3]. Furthermore, epicardial volume segmentation (EpV) is considered crucial for measuring myocardial mass (MM) and other clues such as blood flow compensation and stenosis and regurgitation rates for medical prognosis [4]. The remainder of this paper is organized as follows: the second section reviews the most useful tools and libraries devoted to the segmentation step and the contour approach features involved in the medical decision phase. The third depicts the concept of 5D segmentation and the medical issue, as well as the goals and contributions. Our contribution consists of solving the issue of segmentation in flow sequences and extracting measurements for medical prognosis.
2 Methods and Materials
A description of several image-processing environments based on the segmentation approach is given in Table 1. Open-source platforms (VTK, ITK, and MITK) among the application libraries for the visualization and processing of medical images are described, together with some turn-key tools used in the clinical routine (VolView, Slicer, CAAS MR Flow…) for decision-making and medical interpretation. With different scripting environments such as MATLAB, MATITK, and WrapITK, algorithms based on segmentation techniques can be developed in order to improve existing software and to add functionalities for the experts.
Table 1. Comparison of image processing environments.
Image Libraries segmentation [5]
Designation
Supported Somewhat Not Average time supported Supported taken to process segmentation
ITK
X
VTK MITK
8 ± 6 min [6] X
5 ± 0.25 min [7]
X
SimITK [8]
8 ± 6 min [6]
X
NA
ITK-SNAP [9]
X
20.15 min [10]
Multi-Atlas [11]
X
6.4 min [12] (continued)
Table 1. (continued)
Turn-key
Designation
Supported Somewhat Not Average time supported Supported taken to process segmentation
VolView
X
±30 min [13]
Slicer
X
13 ± 0.5 min [7]
CAAS MR Flow Scripting
X
MATLAB
X
MATITK
X
WrapITK
X
Data-Flow ANSYS Fluent with openFOAM Osirix
18 min [14] 10 min [15] ±10 min [16] NA
X
X
Report-Card X GE
±12 min [17]
Layout 2 > Layout 6 > Layout 5 > Layout 3 > Layout 4.
Table 2. Overall global weightage
Criteria | Layout 1 | Layout 2 | Layout 3 | Layout 4 | Layout 5 | Layout 6
Technical Practices | 0.1095461 | 0.10230234 | 0.06519717 | 0.06542137 | 0.06723106 | 0.06806202
Work Environment | 0.0755859 | 0.05075821 | 0.03662696 | 0.03420789 | 0.03827669 | 0.03812235
Ergonomics | 0.0750345 | 0.0583028 | 0.0347791 | 0.03529596 | 0.0404109 | 0.04256751
Sum | 0.2601664 | 0.21136335 | 0.13660323 | 0.13492522 | 0.14591865 | 0.14875188
Rank | 1 | 2 | 5 | 6 | 4 | 3
6 Conclusions
Considering facility layout design for the operational performance factors, 6 different layouts for NSH Mysuru were created. Mail flow movement is the major factor helping to determine the optimal layout design. Facility layouts must be considered carefully, as they cannot be constantly redesigned. AHP was implemented to determine the best layout/alternative. Criteria and sub-criteria were set up, based on which the alternatives were compared. The results and weightages were then computed, and the layout with the highest weightage was chosen and sent for further approval and possible implementation by the concerned authorities at NSH. The postal administration is quite comfortable with the layout 1 design and is planning to implement it within six months after getting the necessary approval from the Head Post Master General (PMG) in Bangalore. A limitation of this study is that solutions could also be obtained with other MCDM or heuristic methods, such as PROMETHEE, ENTROPY, and PSO and SA algorithms. In future, additional employees can be polled to obtain better FLD findings from diverse domain experts. Other service sectors can use this process to identify the best FLD issues.
Weighted Pathfinding in the Paparazzi Problem with Dynamic Obstacles Timo Schöpflin1 , Pascal Zimmerli1 , Rolf Dornberger2
, and Thomas Hanne2(B)
1 Institute for Medical Engineering and Medical Informatics, School of Life Sciences, FHNW,
Muttenz, Switzerland 2 Institute for Information Systems, University of Applied Sciences and Arts Northwestern
Switzerland, Basel/Olten, Switzerland [email protected]
Abstract. The paper investigates the use of the A* Algorithm in a weighted and dynamic paparazzi problem. The performance of different heuristic functions are evaluated, namely the Manhattan, Euclidean and Chebyshev using four and eight neighboring nodes. Tests are conducted on different sized maps. Obstacles and terrain structures with different weights (corresponding to different speeds) are placed fixed on the map. Dynamic obstacles are represented by security guards that move randomly on the map. Overall, we observe lower costs if the paparazzo is able to move diagonally on larger maps. In addition, the Manhattan heuristic performs best in terms of total cost in a predominant number of simulations. On the contrary, the selection of the heuristic has no significant impact on the overall execution time of the simulation. Keywords: Pathfinding · Paparazzi Problem · A* · Heuristics · Weighted Maps · Dynamic Obstacles · Obstacle Avoidance
1 Introduction
Research with a focus on pathfinding has been a hot topic for many years [1]. Nowadays, pathfinding is applied in various fields, such as route planning for robots in warehouses [2] and the gaming industry for controlling the movements of autonomous characters [1]. The goal of the applied algorithm is to find the shortest path from a start point to an end point. For this purpose, different algorithms have been developed, e.g., the probabilistic roadmap method, artificial neural networks, genetic algorithms, particle swarm optimization or the widely used A* [3]. It has been shown that the well-investigated A* algorithm often finds the shortest path. However, in reality, the shortest path is not always the fastest path, as the agent would, e.g., cross open water. Therefore, the traditional A* has been adapted to take different terrains and obstacles into account [4]. Furthermore, real-world pathfinding environments are subject to continuous change. Therefore, an environment which introduces the challenge of constant change to the static paparazzi pathfinding problem provides the foundation to evaluate different approaches to dynamic pathfinding. This paper reviews the possibility to evolve the static paparazzi pathfinding
problem to a dynamic pathfinding problem by introducing randomly moving security guards. These guards are randomly placed on one of the cells and are moved after every step of the paparazzo at a pre-defined velocity. The relevance of this new simulation environment is shown by comparing the pathfinding efficiency of a modified A* algorithm in a dynamic environment to the non-dynamic environment described in [4].
2 Problem Statement The paper enhances the paparazzi problem as described in [4–6]. Baldi et al. [6] compared Dijkstra and A* in a paparazzi pathfinding problem with unweighted obstacles. More complexity was added in [4] by using weighted obstacles. Sazaki et al. [7] made use of the A* algorithm in a dynamic, but unweighted pathfinding problem. In this case, A* was used to find the fastest way around a race track in a race car game with obstacle avoidance. The aim of the paparazzo in the presented problem is to find the most efficient way into a celebrity’s villa to take a picture of the famous person which is located at a predefined position. To enter the villa, the paparazzo first has to enter the property and cross the garden. There are different obstacles with individual cost associated to them, making it a weighted pathfinding problem. There are obstacles that offer fast progress such as roads, paths or grass land and there are bushes and ponds and a pool which all slow down the movements of the paparazzo. Fast movements lead to smaller cost, whereas slow movements account for higher cost for the respective path. In addition, security cameras are placed on the estate, mainly overlooking the gate and the paths. The paparazzo should avoid these zones, as being spotted by a camera increases the cost of the chosen path. To add another level of complexity to the pathfinding problem, we introduce security guards patrolling the property. The guards are placed at a random position at the start of the scenario. They change their position randomly at the same rate and velocity as the paparazzo. Being spotted by one of the guards increases the cost of the path drastically. In Fig. 1, an example of a map of the celebrity’s estate as used in this paper is depicted, showing the different field types. In Table 1, these field types and their costs are listed. Note that with increased saturation of the colors, the associated cost is higher. For instance, the cost for crossing shallow water is lower than swimming through deeper water. The pink fields indicate the start node of the celebrity. The end node, i.e., the pre-defined location of the celebrity, is marked yellow. The map, as well as the obstacles and their costs are based on [4]. The newly introduced guards are marked orange. The paper compares the pathfinding with and without guards on map sizes of 25 × 25, 50 × 50 and 100 × 100 nodes using an extended A* algorithm, as well as different heuristics, namely Manhattan, Euclidean and Chebyshev. Each heuristic function was tested with four (only horizontal and vertical movements) and eight neighboring nodes (horizontal, vertical, and diagonal movements).
Fig. 1. Example of a 100x100 map with security guards
Table 1. Field types of the map and their properties.
3 Background Information 3.1 A* Algorithm First introduced by Peter E. Hart, Nils J. Nilsson, and Bertram Raphael in 1968, the A* search algorithm has been widely used in research for many different pathfinding problems [8]. The advantages of the A* are its efficiency and simplicity. Furthermore,
due to its modular structure, the algorithm can be easily adapted for different use cases [9]. The A* algorithm tries to find a path from a start node S to a defined end node E by visiting the fewest nodes. The next move of the algorithm is calculated by evaluating the cost of each neighboring node and then moving to the one with the lowest cost. This process is repeated until the end node is reached. The cost function of A* can be written as in (1): f (n) = g(n) + h(n)
(1)
The total cost f of a node n is given as the sum of the costs g(n) and h(n). g(n) is the actual path cost from S to the node currently being evaluated. h(n) is a heuristic term used for estimating the cost of moving from the current node to E. The A* algorithm puts all nodes of a given matrix in an open and closed set. The open set holds nodes which were visited but not expanded, meaning that the neighboring nodes were not explored yet. Initially, S is the only node in this set. The closed set is empty at the beginning of the process. During the pathfinding, nodes which were visited and explored are put in this list [4, 10]. 3.2 Extended A* Algorithm On his mission, the paparazzo has to deal with different obstacles present on the property that slow him down. To take these obstacles into account, the calculation of cost f(n) of each node n has to be extended as shown in (2). f (n) = g(n) + e(n) + h(n)
(2)
The term e(n) takes the extra cost of an obstacle into account. e(n) is added to g(n), which represents the cost from the starting node to the current node. If two neighboring nodes have the same cost, h(n) is added to decide on the next move [4, 6]. 3.3 Heuristics To measure the distance between nodes, different heuristic functions are used in combination with A*. The selection of the heuristic has therefore a direct influence on the complexity of the algorithm in a map-based pathfinding problem. 1) None. If no heuristic function is chosen, the cost h(n) in (2) is ignored. In this case, the algorithm approximates the Dijkstra pathfinding algorithm [11, 12]. 2) Manhattan Distance. This function is the summation of the absolute horizontal and vertical distance between two nodes [4]. This is the standard function on a grid map if four directions of movement (up, down, left, right) are possible. This means, that no diagonal movements are considered [12, 13]. 3) Euclidean Distance. If any direction of movement is allowed, the Euclidean distance will lead to the shortest path. The downside is that the A* may run longer compared to other non-diagonal heuristic functions which are easy to compute [4, 12, 13].
4) Chebyshev Distance. Using the Chebyshev distance, the agent can, as with the Euclidean distance function, move diagonally. The cost of a diagonal move is equal as for a vertical or horizontal move. Chebyshev can be used under the assumption that all eight neighboring nodes are being considered. The diagonal movement is allowed if a vertical and horizontal movement can be performed simultaneously [4, 12]. 3.4 Dynamic Obstacles To introduce more complexity to the classical paparazzi problem, we introduced moving security guards-which should “protect” the celebrity from the paparazzo. These security guards have no intelligence in the current implementation of the scenario. They have no knowledge of the paparazzo and are not trying to capture him. Initially the security guards are randomly placed on the map. After that, at each iteration, they move in a random direction with the same velocity as the paparazzi (one cell). To integrate the security guards in the paparazzi scenario we imposed two restrictions on them. With their black suits they can neither swim nor climb over walls. For the paparazzo, security guards have the highest cost associated to them, since they might try to capture him during the simulation. In addition to the cell they are currently occupying, security guards also have a viewing area around them to account for their perception. The cost of the sight cells decreases slightly (compared to the security guard himself) based on the distance to the security guard.
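As a concrete illustration of Sects. 3.1-3.3, the sketch below shows a compact weighted A* with the extra-cost term e(n) and the three pluggable heuristics. It is a simplified, assumption-based example (the map is given as a matrix of extra costs, with a 4- or 8-neighborhood), not the authors' actual implementation.

# Minimal weighted A* sketch; grid[r][c] holds the extra cost e(n) of entering a cell.
import heapq, math

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def a_star(grid, start, goal, heuristic=manhattan, diagonal=False):
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    g_cost, came_from = {start: 0.0}, {start: None}
    open_set, closed = [(heuristic(start, goal), start)], set()
    while open_set:
        _, node = heapq.heappop(open_set)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:                          # rebuild the path back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1], g_cost[goal]
        for dr, dc in steps:
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and (r, c) not in closed:
                new_g = g_cost[node] + 1 + grid[r][c]      # step cost plus extra cost e(n)
                if new_g < g_cost.get((r, c), float("inf")):
                    g_cost[(r, c)], came_from[(r, c)] = new_g, node
                    # expansion order follows f(n) = g(n) + e(n) + h(n)
                    heapq.heappush(open_set, (new_g + heuristic((r, c), goal), (r, c)))
    return None, float("inf")

# Example: 0 = fast terrain, larger values = slower terrain (bushes, water, cameras).
# path, cost = a_star([[0, 0, 4], [0, 9, 0], [0, 0, 0]], (0, 0), (2, 2), chebyshev, diagonal=True)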
4 Implementation and Testing
4.1 Code Structure
The simulation was implemented in Python 3.9 and is executed in a Jupyter Notebook. The simulation is parameterized so that different scenarios can be compared:
• map_root_dir: root directory of the CSV file of the weighted maps
• num_security_guards: number of security guards
• iterations: number of iterations in the simulation
• selected_map_sizes: selection of map sizes for the simulation (possible values are 25, 50, 100, 200)
• selected_heuristics: selection of heuristics for the simulation (possible values are None, Manhattan, Euclidean, Chebyshev)
• smart_path_finding: whether the path finding is smart or naïve
For each simulation iteration, the resulting meta data and a GIF visualization of the simulation are captured. The relevant meta data are:
• iteration name
• map name
• heuristic
• number of A* executions
• time elapsed (measured in seconds for full simulation time including image rendering)
• diagonal or nondiagonal movement
• number of security guards
• smart or naive path finding
The results of the different iterations are aggregated in a single Pandas data frame for evaluation. 4.2 Pathfinding There are a total of four different pathfinding combinations for the paparazzo. In all combinations, the paparazzo is moving iteratively through the map along the given path, one cell at a time. The paparazzo is either able to move in all directions (including diagonal) or only in straight lines. This is relevant to evaluate the performance of different heuristic functions. In the first naive implementation, the paparazzo executed the A* algorithm on every iteration to update the path based on the current situation. This behavior is further referred to as “naive pathfinding”. To improve the overall execution time, we also implemented a “smarter” pathfinding, where the paparazzo re-executes the A* algorithm only when a security guard is in close proximity [12]. An exemplary implementation in pseudo code is shown in Fig. 2.
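The "smart" variant described above can be summarized by the loop sketched below, which re-runs A* only when a guard comes within a chosen alert radius. It reuses the a_star and heuristic helpers sketched in Sect. 3; the guard handling, penalty values and radius are illustrative assumptions, not taken from the authors' code.

# Hypothetical sketch of the smart re-planning loop; penalties and radius are assumed.
import random

def move_guards(guards, rows, cols):
    # Each guard moves one cell in a random direction and stays inside the grid.
    moved = []
    for r, c in guards:
        dr, dc = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        moved.append((min(max(r + dr, 0), rows - 1), min(max(c + dc, 0), cols - 1)))
    return moved

def with_guard_costs(grid, guards, guard_cost=50, sight_cost=10):
    # Copy the grid and add high extra costs on guard cells and their viewing area.
    penalized = [row[:] for row in grid]
    for r, c in guards:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                    penalized[rr][cc] += guard_cost if (dr, dc) == (0, 0) else sight_cost
    return penalized

def run_smart_simulation(grid, start, goal, guards, alert_radius=3, max_steps=10_000):
    rows, cols = len(grid), len(grid[0])
    pos, total_cost = start, 0.0
    path, _ = a_star(with_guard_costs(grid, guards), pos, goal, manhattan, diagonal=True)
    for _ in range(max_steps):
        if pos == goal:
            return total_cost
        guards = move_guards(guards, rows, cols)
        if path is None or any(chebyshev(pos, g) <= alert_radius for g in guards):
            path, _ = a_star(with_guard_costs(grid, guards), pos, goal,
                             manhattan, diagonal=True)      # re-plan only when needed
        if not path or len(path) < 2:
            continue                                         # wait until a path exists again
        path = path[1:]                                      # advance one cell along the plan
        pos = path[0]
        total_cost += 1 + grid[pos[0]][pos[1]]
    return total_cost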
5 Results and Discussion To identify which heuristic performs best in different dynamic scenarios, 48 different simulations were executed ten times each. These scenarios covered any combination of the parameters shown in Table 2. Figure 3 shows a 100x100 map with the paparazzo’s chosen path marked in pink and the area covered by the security guards during the process. Table 2. Possible combinations of the parameters.
Fig. 2. Dynamic A* algorithm.
Fig. 3. Example of a 100 x 100 after the simulation showing the paths and the movements of the security guards.
The summarized results of the different iterations of these simulations are shown in Table 3. All results of the individual iterations were aggregated by the simulation parameters (map, heuristic, diagonal movement or not, smart, or naïve pathfinding,
number of security guards). The total cost of the final path, the execution time, and the number of A* executions of the simulations were selected to be aggregated, since they are the most relevant. To be able to better interpret the data, we are not just looking at the mean but also at the 25th , 50th , and 75th percentiles. For the following analysis we considered only the simulations with security guards, unless stated otherwise. Table 3. Results of the 48 different simulations.
5.1 Data Distribution For map sizes 100 × 100 and 50 × 50, we observed typically distributed data, whereas map size 25 × 25 was skewed towards higher values. This is caused by the high number of security guards on a small map. In this view, one security guard occupied 25 cells, which corresponds to 4% of the map. This means ten security guards would occupy 40% of the map. The additional simulations for map size 25 × 25 with three security guards showed that with this reduced number a typical distribution was achieved. 5.2 Pathfinding There is no scenario in which naive pathfinding results in an overall lower cost than smart pathfinding. Executing the A* algorithm only if required sometimes results in a significantly lower total cost with significantly shorter execution time. The execution time of smart pathfinding is between 30% and 75% faster across all simulations. 5.3 Diagonal Movement Overall, we observed lower costs if the paparazzo was able to move diagonally on larger maps. Only the default heuristic function (none) applied on map 100 × 100 performed worse if diagonal movement was enabled, in all other simulations the total cost was lower with diagonal movement. There was no relevant deviation in the execution time of the simulations.
5.4 Overall Heuristic Performance Manhattan performed best in the predominant number of simulations in terms of total cost. The selection of the heuristic had no significant impact on the overall execution time of the simulation.
6 Conclusions Our paper extends the static paparazzi pathfinding problem on weighted maps to a dynamic pathfinding problem on weighted maps, with dynamically changing weights based on the movement of the introduced security guards. This differs from previous research that focused on either pathfinding on weighted maps or pathfinding with moving obstacles. We confirm that the Manhattan heuristic which performs best in the static pathfinding problem [4] also produces the lowest path-cost in the dynamic pathfinding problem. A significant difference in the execution time of the simulations per heuristic is not observed, against the observations of [4]. Furthermore, we prove that re-executing the A* pathfinding only when needed outperforms the execution after every step in both path-cost and execution time. For the presented problem, one could evaluate the use of other algorithms than A*. A possible alternative is a Genetic Algorithm (GA). It would be interesting to compare the total cost and run-time of the different algorithms. In addition, there are several options to make the presented problem even more complex. One option is to implement a predatorprey scenario. The predator, i.e., the security guard would search for the paparazzo and not only move randomly on the map. This scenario could be enhanced by using a swarm of security guards to search for the unwanted intruder. Another possible extension of the presented scenario would be to make the celebrity movable.
References 1. Cui, X., Shi, H.: A*-based pathfinding in modern computer games. IJCSNS Int. J. Comput. Sci. Netw. Secur. 11(1), 125–135 (2011) 2. Yang, B., Li, W., Wang, J., Yang, J., Wang, T., Liu, X.: A novel path planning algorithm for warehouse robots based on a two-dimensional grid model. IEEE Access 8, 80347–80357 (2020) 3. Angkuldee, A., Shih, K.P., Ruengittinun, S.: Apply A* search for robot path finding. In: 2019 Twelfth International Conference on Ubi-Media Computing (Ubi-Media), pp. 183–186. IEEE (2019) 4. Schär, K., Schwank, P., Dornberger, R., Hanne, T.: Pathfinding in the paparazzi problem comparing different distance measures. In: Proceedings of International Joint Conference on Advances in Computational Intelligence, pp. 81–95. Springer, Singapore (2022) 5. Jenkin, M., Dudek, G.: The paparazzi problem. In: Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000) (Cat. No. 00CH37113), vol. 3, pp. 2042–2047. IEEE (2000) 6. Baldi, S., Maric, N., Dornberger, R., Hanne, T.: Pathfinding optimization when solving the paparazzi problem comparing A* and Dijkstra’s algorithm. In: 2018 6th International Symposium on Computational and Business Intelligence (ISCBI), pp. 16–22. IEEE (2018)
7. Sazaki, Y., Primanita, A., Syahroyni, M.: Pathfinding car racing game using dynamic pathfinding algorithm and algorithm A. In: 2017 3rd International Conference on Wireless and Telematics (ICWT), pp. 164–169. IEEE (2017) 8. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968) 9. Foead, D., Ghifari, A., Kusuma, M.B., Hanafiah, N., Gunawan, E.: A systematic literature review of A* pathfinding. Procedia Comput. Sci. 179, 507–514 (2021) 10. Zhang, H.M., Li, M.L., Yang, L.: Safe path planning of mobile robot based on improved A* algorithm in complex terrains. Algorithms 11(4), 44 (2018) 11. Lin, M., Yuan, K., Shi, C., Wang, Y.: Path planning of mobile robot based on improved A** algorithm. In: 2017 29th Chinese Control and Decision Conference (CCDC), pp. 3570–3576 (2017) 12. Patel, A.: A*’s Use of the heuristic. http://theory.stanford.edu/~amitp/GameProgramming/ Heuristics.html (2022). last accessed 06 Sept 2022 13. Suryadibrata, A., Young, J.C., Luhulima, R.: Review of various A* pathfinding implementations in game autonomous agent. IJNMT (Int. J. New Media Technol.) 6(1), 43–49 (2019)
A Rapid Review on Ensemble Algorithms for COVID-19 Classification Using Image-Based Exams Elaine Pinto Portela1 , Omar Andres Carmona Cortes2(B) , and Josenildo Costa da Silva2 1
2
Programa de Pós-Graduação em Engenharia da Computação (PECS), Universidade Estadual do Maranhão (UEMA), São Luís, MA, Brazil; Departamento de Computação (DCOMP), Instituto Federal do Maranhão (IFMA), São Luís, MA, Brazil {omar,jcsilva}@ifma.edu.br
Abstract. The world recently has faced the COVID-19 pandemic, a disease caused by the severe acute respiratory syndrome. The main features of this disease are the rapid spread and high-level mortality. The illness led to the rapid development of a vaccine that we know can fight against the virus; however, we do not know the actual vaccine’s effectiveness. Thus, the early detection of the disease is still necessary to provide a suitable course of action. To help with early detection, intelligent methods such as machine learning and computational intelligence associated with computer vision algorithms can be used in a fast and efficient classification process, especially using ensemble methods that present similar efficiency to traditional machine learning algorithms in the worst-case scenario. Therefore, this review is relevant for driving researchers interested in investigating ensemble methods to improve the classification quality of their algorithms and avoid duplicated efforts. Keywords: Ensemble
1
· COVID · Machine Learning · Image
Introduction
The coronavirus disease (COVID-19) is a highly contagious viral disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causing the so-called COVID-19 pandemic. The state of the pandemic was officially declared by the World Health Organization (WHO) on March 11th, 2020, leading to a total of 612,724,171 cases of infection around 224 countries, with 6,517,123 deaths reported by September 27th of 2022 [41]. Due to its accelerated dissemination, the early detection of an infected person is critical to discontinue the chain of transmission. One of the primary methods to confirm an infected person is the reverse transcription polymerase chain reaction (RT-PCR) test, which is performed using a clinical swab sample of the patient. The results are obtained between a few hours and two days [26].
However, the RT-PCR has a low sensitivity of 65% to 95% and encloses the problem of producing false-negative results, i.e., a negative diagnostic when the patient is infected [5]. Due to this reason and the limited availability of test kits, the hospital domain experts demand other ways of detecting an infected person [10]. The Chest X-Ray (CXR) and chest computed tomography (CT) are two wellknown radiological techniques that can be used to detect typical changes in the pulmonary parenchyma associated with COVID-19 [5]. In fact, those two imagebased exams provide further observations such as dilation and consolidation in addition to ground-class opacities in COVID-19 patients [43]. In this context, due to the easy availability of tomography machines in hospitals and because these kinds of exams are non-invasive clinical techniques, they can act as an alternative or complement to detect COVID-19 in patients. Even though those image-based exams provide a better understanding of the pulmonary condition, the diagnosis and interpretation of radiographic exams require an expert, which is human, requiring additional time for the diagnosis, and being human; he is susceptible to distress and fatigue that can lead to an error. This scenario motivates the use of Machine Learning techniques to develop automatic computer-aided (CAD) systems, which can help physicists in this demanding task. In this context, we would like to contribute to the area guiding researchers interested in helping with the use of Machine Learning to detect COVID-19 chest computed tomographies and chest x-rays by answering the following four main questions: Q1 What are the most used ensemble learning technique: bagging, boosting, stacking, cascade, or hybrid? Q2 How many classes have been used in the classification task? Q3 What are the most used machine learning algorithms and models for identifying COVID-19 in CXR and CT? Q4 Which are the main used datasets, and where do they come from?
2
Ensemble Algorithms
In machine learning, the ensemble is a strategy that combines a set of base models into a global predictive model [13]. Ensembles are an efficient way to improve model accuracy and help to address complex challenges in machine learning such as class imbalance, concept drift, and the curse of dimension [13,24,25,36]. An ensemble is potentially more robust and accurate than its individual base models. To achieve so, the base classifiers must be accurate and diverse [13]. Accuracy is necessary to avoid the base models being worse than random guessing. Diversity ensures that the base models make mistakes on different data points. In this context, we present two main ensemble approaches: homogeneous and heterogeneous.
In the homogeneous ensemble approach, all models are generated by the same algorithm. The idea is to create diversity between models by creating multiple hypotheses on different training samples. This approach can be divided into bagging and boosting. Bagging is an acronym for “bootstrap aggregating” [7] where a set of bootstrapped data sets is extracted from the original training data with replacement, as the Random Forest (RF) algorithm [8] that is a very popular example. While boosting [16] is a method that generates a series of weighted classifiers, maintaining weights over the training examples. One prominent boosting algorithm is AdaBoost [15,16]. Regarding heterogeneous algorithms, they can be divided into stacking, cascade, and hybrid. Stacking [42] is an approach that first generates base learners from the training data, generally using different algorithms. The cascade [17] is a constructive approach in which each classifier creates a vector of features based on the probability of each class. A Hybrid algorithm came from the idea that each algorithm is the best in different datasets, i.e., an algorithm cannot produce a good model in all types of datasets. Thus, different regions of the input space are approximated using different models [14]. An example of a hybrid system is an invariant decision tree using Naive Bayes classifiers in its leaves [23]
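To make the ensemble strategies discussed above tangible, the snippet below builds a bagging, a boosting and a stacking classifier with scikit-learn and compares them by cross-validation. It is a generic illustration on synthetic data, not code from any of the reviewed papers.

# Generic illustration of bagging, boosting and stacking with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "bagging (Random Forest)": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
    "stacking (tree + SVM, logistic meta-learner)": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(max_depth=5)),
                    ("svm", SVC(probability=True))],
        final_estimator=LogisticRegression()),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())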
3 Methodology
A rapid review is a technique for synthesizing evidence through a comprehensive or systematic literature search, but within a briefer time frame than standard systematic approaches [21]. It addresses a research question or a set of questions associated with a single topic. In this particular case, the subject is the use of ensemble algorithms for COVID-19 classification using x-rays and computerized tomographies as image-based exams, as presented in Sect. 1. In this work, we aim to answer the four questions previously presented. In Q1, we identify the most frequent ensemble technique. In Q2, we determine how many classes those ensemble algorithms use. In Q3, we determine the most common machine learning algorithms and models. Finally, in Q4, we identify the most common datasets. All questions were designed to avoid duplicate work in future research and to drive experts to develop new efficient ensemble models. The search string was "Ensemble and Covid-19 and x-ray or tomography and classification". Then, as suggested in [9], we used it in four well-known electronic databases: IEEE Xplore, ACM Digital, Springer, and ScienceDirect.

3.1 Inclusion Criteria
The following inclusion criteria have been applied:
– Papers from journals;
– Papers whose primary objective is to classify or detect COVID-19 using x-rays and computerized tomographies;
– Papers published from 2020 to 2022;
– Papers that classify into at least three classes - Normal, Pneumonia, and COVID-19.

The period is obviously narrowed by the pandemic, which started in 2020.

3.2 Exclusion Criteria
As previously mentioned, we have to narrow the number of papers we want to analyze. Hence, the following exclusion criteria have been considered:
– Papers from conferences;
– Papers in languages other than English;
– Short papers (less than four pages);
– Papers using only traditional ML algorithms;
– Papers whose techniques use images other than x-rays and CTs;
– Papers that presented surveys and reviews.
4 Results and Discussion
We executed the search from January to March of 2022. The search string returned the following number of papers: IEEE Xplore - 29, ScienceDirect - 11, ACM Digital - 06, and Springer - 214, totaling 260. After applying the inclusion and exclusion criteria, we selected two articles from 2020, fourteen from 2021, and one from 2022, totaling seventeen papers. Next, we answer our four research questions.

4.1 Q1 - Ensemble Technique

Table 1. Types of ensemble models.
 1 - [19] Hybrid      2 - [2]  Hybrid      3 - [37] Hybrid      4 - [10] Bagging
 5 - [20] Hybrid      6 - [32] Hybrid      7 - [39] Hybrid      8 - [22] Bagging
 9 - [3]  Stacking   10 - [6]  Hybrid     11 - [18] Hybrid     12 - [35] Stacking
13 - [26] Hybrid     14 - [30] Hybrid     15 - [34] Boosting   16 - [38] Stacking
17 - [40] Bagging
The most used technique, appearing in ten articles, was the hybrid approach, which varies from paper to paper. For example, [19] uses a three-step hybrid ensemble model comprising a feature extractor, a feature selector, and a classifier, i.e., a different model for each of the steps, as do [20,37,39] (Table 1). Further, [26] uses a combination of boosting and voting, while [2] classifies using the merge of the two best-performing models from a list of fine-tuned and pre-trained deep learning models and then adds additional layers (dense and dropout) to improve the overall performance.
4.2 Q2 - Number of Classes
Table 2 answers Q2, showing that most of the selected papers classify the images as Normal, COVID-19, or Pneumonia, with [2] and [35] further differentiating pneumonia caused by bacteria from pneumonia caused by viruses. Furthermore, [26] added a tuberculosis class.

Table 2. Number of classes in the classification task.
Paper  Number  Classes
[19]   3       COVID-19, Viral pneumonia or Normal
[2]    4       Viral and Bacterial pneumonia, COVID-19 or Normal
[37]   3       COVID-19, Pneumonia or Normal
[10]   4       Normal x Abnormal / COVID-19 x Pneumonia
[20]   3       COVID-19 x Non COVID-19 / Normal x COVID-19 x Pneumonia
[32]   3       COVID-19, Normal or Viral pneumonia
[39]   3       No-finding x Others (COVID-19 or Pneumonia)
[22]   3       COVID-19, Pneumonia or Normal
[3]    3       COVID-19, Normal or Pneumonia
[6]    3       COVID-19, Normal or Pneumonia
[18]   4       COVID-19 x Pneumonia x Normal / COVID-19 x Non COVID-19
[35]   3       COVID-19, Bacterial pneumonia or Normal
[26]   4       COVID-19, Pneumonia, Tuberculosis or Healthy
[30]   3       COVID-19, Viral pneumonia or Normal
[34]   3       Healthy, COVID-19 or Pneumonia
[38]   3       Normal (healthy), COVID-19 or Pneumonia
[40]   3       Normal, COVID-19 or CP (Pneumonia)
4.3 Q3 - Machine Learning Algorithms and Models
Table 3 summarizes the algorithms used in the development of each solution. The results are divided into models for feature extraction and models for the classification task. Moreover, to compare them, the accuracy rate reported for each proposed model is also shown, from which we can see that VGG-based ensemble models tend to present the best accuracy. To enhance the visualization of the most used models throughout the papers, Fig. 1 shows the number of times that each model appeared in the final selection. DenseNet was one of the most used models, appearing, in different versions, in [3,18,20,39] and [40]. Figure 2 shows the most used models in the classification task. The first one is the Support Vector Machine (SVM), used together with other models, as in [34] and [10], or as the only classifier, as in [38]. A generic sketch of this feature-extractor-plus-classifier pattern is given below.
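The dominant pattern in Table 3 (a pretrained CNN as feature extractor followed by a classical classifier such as an SVM) can be sketched as follows; the backbone choice, image size, and random stand-in data are illustrative assumptions and do not reproduce any specific reviewed pipeline.

```python
# Sketch of the extractor-plus-classifier pattern: pretrained CNN features + SVM.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

base = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                          pooling="avg", input_shape=(224, 224, 3))

def extract_features(images):
    # images: float array of shape (n, 224, 224, 3) in the 0-255 range
    x = tf.keras.applications.densenet.preprocess_input(images)
    return base.predict(x, verbose=0)

# Dummy stand-ins for preprocessed CXR images and their labels
# (0 = normal, 1 = pneumonia, 2 = COVID-19).
images = np.random.rand(12, 224, 224, 3).astype("float32") * 255.0
labels = np.array([0, 1, 2] * 4)

features = extract_features(images)
clf = SVC(kernel="rbf").fit(features, labels)
print(clf.predict(features[:3]))
```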
Table 3. Machine learning algorithms and models.
Paper  Extractor                              Classifier                  Accuracy (%)
[19]   AlexNet + ReliefF                      SVM                         98.642
[2]    InceptionV3, MobileNetV2               Softmax Layer               93.73
[37]   MKLBP, INCA                            SVM                         97.01
[10]   BGWO                                   SVM, DT, KNN, NB, ANN       98.06/91.33
[20]   VGG19, DenseNet121                     SVM                         98.28
[32]   ResNet-50                              ECOC classifier             98.8/96
[39]   DenseNet121, EfficientNetB0            Bi-LSTM                     92.49
[22]   CLAHE, CEED-Canny                      VGG16, InceptionV3          97.90
[3]    DenseNet, GoogLeNet                    SVM, RF, XGBoost            91
[6]    VGG16, Xception, InceptionV3           MLP with Choquet integral   93.81
[18]   VGG16, DenseNet201                     Softmax or Sigmoid Layer    99.21
[35]   Customized CNN                         Softmax Layer               99.01
[26]   EfficientNet, GoogLeNet, XceptionNet   Softmax Layer               99.32
[30]   ResNet50                               ReLU Layer                  95
[34]   CNN Feature Extraction                 VGG Net, SVM, XGBoost       95.81
[38]   AlexNet, Relief                        SVM                         99.18
[40]   VGG-19, ResNet-18, DenseNet-121        ReLU Layer                  95.62

4.4 Q4 - Datasets
Table 4 depicts the datasets used to acquire the CXR and CT images. A common problem reported by most authors, especially those from 2020, was the lack of COVID-19 CXR and CT images available for this use, which is expected because it was a new illness at the time. In this context, most authors addressed the problem with data augmentation, a technique that mitigates the scarcity of training data by transforming the original examples [2]. Some transformations used to perform the data augmentation are rotation, reflection, width shift, height shift, zooming, and shearing [20]; a minimal sketch is shown below. Data augmentation was used by [2,6,10,18,20,26,30,39], and [40]. Another method for solving the lack of images is merging datasets to increase the number of positive examples. In [6], three of the most used datasets were merged: [12,33], and [1]. Regarding availability, the most used datasets are available on the Kaggle platform and GitHub, i.e., [12,33], and [29].
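A minimal sketch of the augmentation transforms listed above (rotation, reflection, width/height shift, zoom, shear) is given below; the parameter values and the data path are assumptions for illustration, not those used in the cited studies.

```python
# Sketch of the data augmentation transforms mentioned above, with Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # random rotation
    horizontal_flip=True,     # reflection
    width_shift_range=0.1,    # width shift
    height_shift_range=0.1,   # height shift
    zoom_range=0.1,           # zooming
    shear_range=0.1)          # shearing

# Yields augmented batches from a folder of CXR images organized as one
# sub-folder per class; "data/train" is a placeholder path.
train_gen = augmenter.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="categorical")
```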
Fig. 1. Models used for Feature Extraction
Fig. 2. Models used for Classification
Table 4. Most used datasets.
Paper  Datasets
[19]   [12, 29, 33]
[2]    [28], Mendeley, SIRM, Radiopaedia
[37]   Github, Kaggle
[10]   [12], Montgomery set, [29]
[20]   [33], Mendeley, [1, 12, 27]
[32]   [31, 33]
[39]   [12, 29]
[22]   [12, 27], Mendeley
[3]    [1, 12, 29, 33]
[6]    [1, 12, 33]
[18]   [4, 11]
[35]   Pediatric CXR, [29], Twitter COVID-19 CXR, Montreal CXR
[26]   [11, 18]
[30]   [11]
[34]   [31]
[38]   [28, 33]
[40]   COVIDx-CT, CC-CCII, COVID-CT

5 Conclusions
This paper presented a rapid review of ensemble algorithms for image-based COVID-19 classification. The most commonly used ensemble technique is the hybrid approach, since it allows merging several techniques applied to the same problem. However, bagging and stacking are also used, allowing the combination of multiple models to obtain the best solution.
Solving an image-based classification problem requires multiple steps, starting with a feature extractor that extracts texture descriptor patterns from the CT and CXR images. Those patterns include patchy ground-glass opacity, pulmonary consolidation, and reticulonodular opacity. The second step usually involves feature selection, because not all extracted features are relevant to an accurate characterization of the visual information. Thereby, a model can be used during this step so that only the most relevant features are used in the next step. Regarding the datasets, to improve the classification of COVID-19, some authors used data augmentation and multiple datasets to solve the problem of lacking positive samples, consequently preventing overfitting and enhancing the generalization ability. The main difficulty of this work was that some of the papers did not provide enough information to answer all the questions, requiring us to look into some of the related and cited works to identify the proper answers. Finally, future work includes designing ensemble approaches to improve COVID-19 classification and using evolutionary algorithms to tune the ensemble models.
References 1. Agchung: Covid-19 chest x-ray dataset initiative, visited on march-01st-2022 (2022). https://github.com/agchung/Figure1-COVID-chestxray-dataset 2. Ahmad, F., Ghani Khan, M.U., Javed, K.: Deep learning model for distinguishing novel coronavirus from other chest related infections in x-ray images. Comput. Biol. Med. 134, 104401 (2021). https://doi.org/10.1016/j.compbiomed.2021. 104401, https://www.sciencedirect.com/science/article/pii/S0010482521001955 3. Arora, R., et al.: AI-based diagnosis of COVID-19 patients using x-ray scans with stochastic ensemble of CNNs. Phys. Eng. Sci. Med. 44, 1257–1271 (2021). https://doi.org/10.1007/s13246-021-01060-9, https://link.springer.com/ article/10.1007/s13246-021-01060-9 4. Asraf, A.: Covid19 with pneumonia and normal chest xray(pa) dataset, visited on may-6th-2022 (2022). https://www.kaggle.com/amanullahasraf/covid19pneumonia-normal-chest-xraypa-dataset 5. Avetisian, M., et al.: CORSAI: a system for robust interpretation of CT scans of COVID-19 patients using deep learning. ACM Trans. Manage. Inf. Syst. 12(4) (2021). https://doi.org/10.1145/3467471 6. Bhowal, P., Sen, S., Yoon, J.H., Geem, Z.W., Sarkar, R.: Choquet integral and coalition game-based ensemble of deep learning models for covid-19 screening from chest x-ray images. IEEE J. Biomed. Health Inform. 25(12), 4328–4339 (2021). https://doi.org/10.1109/JBHI.2021.3111415 7. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996) 8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 9. Brereton, P., Kitchenham, B.A., Budgen, D., Turner, M., Khalil, M.: Lessons from applying the systematic literature review process within the software engineering domain. Syst. Software 80, 571–583 (2007)
10. Chandra, T.B., Verma, K., Singh, B.K., Jain, D., Netam, S.S.: Coronavirus disease (covid-19) detection in chest x-ray images using majority voting based classifier ensemble. Expert Systems with Applications 165, 113909 (2021). https://doi. org/10.1016/j.eswa.2020.113909, https://www.sciencedirect.com/science/article/ pii/S0957417420307041 11. Chowdhury, M.E.H., et al.: Can AI help in screening viral and covid-19 pneumonia? IEEE Access 8, 132665–132676 (2020). https://doi.org/10.1109/access.2020. 3010287, https://doi.org/10.1109/ACCESS.2020.3010287 12. Cohen, J.P., Morrison, P., Dao, L.: Covid-19 image data collection. arXiv 2003.11597 (2020), https://github.com/ieee8023/covid-chestxray-dataset 13. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) Ensemble methods in machine learning. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9 1 14. Faceli, K., Lorena, A.C., Gama, J., Almeida, T.A.D., de L. F., C.A.P.: Inteligˆencia Artificial: Uma Abordagem de Aprendizado de M´ aquina. LTC, 2nd edn. (2021) 15. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997) 16. Freund, Y., Schapire, R.E., et al.: Experiments with a new boosting algorithm. In: ICML, vol. 96, pp. 148–156. Citeseer (1996) 17. Gama, J., Brazdil, P.: Cascade generalization. Mach. Learn. 41, 315–343 (2000) 18. Gianchandani, N., Jaiswal, A., Singh, D., et al.: Rapid COVID-19 diagnosis using ensemble deep transfer learning models from chest radiographic images. J. Ambient Intell. Hum. Comput. (2020). https://doi.org/10.1007/s12652-020-02669-6, https://link.springer.com/article/10.1007/s12652-020-02669-6 19. Jin, W., Dong, S., Dong, C., Ye, X.: Hybrid ensemble model for differential diagnosis between covid-19 and common viral pneumonia by chest x-ray radiograph. Computers in Biology and Medicine 131, 104252 (2021). https://doi.org/10. 1016/j.compbiomed.2021.104252, https://www.sciencedirect.com/science/article/ pii/S0010482521000469 20. Kedia, P., Anjum, Katarya, R.: Covnet-19: A deep learning model for the detection and analysis of covid-19 patients. Appl. Soft Comput. 104, 107184 (2021). https://doi.org/10.1016/j.asoc.2021.107184, https://www.sciencedirect. com/science/article/pii/S1568494621001071 21. Khangura, S., Konnyu, K., Cushman, R., Grimshaw, J., Moher, D.: Evidence summaries: the evolution of a rapid review approach. Syst. Control Found. Appl. 1(10), 1–10 (2012) 22. Kieu, S.T.H., Bade, A., Hijazi, Ahmad, M.H., Kolivand, H.: COVID-19 detection using integration of deep learning classifiers and contrast-enhanced canny edge detected x-ray images. IT Prof. 23(4), 51–56 (2021). https://doi.org/10.1109/ MITP.2021.3052205 23. Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision tree hybrid. In: 2nd International Conference on Knowledge Discovery and Data Mining, pp. 202–207 (1996) 24. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: an ensemble method for drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007) 25. Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Wo´zniak, M.: Ensemble learning for data stream analysis: a survey. Inf. Fusion 37, 132–156 (2017) 26. Kumar, N., Gupta, M., Gupta, D., et al.: Novel deep transfer learning model for COVID-19 patient detection using x-ray chest images. J. Ambient Intell. Hum. Comput. 27 (2021). 
https://doi.org/10.1007/s12652-021-03306-6, https:// link.springer.com/article/10.1007/s12652-021-03306-6
27. Larxel: X rays and CT snapshots of convid-19 patients, visited on may-6th-2022 (2022). https://www.kaggle.com/andrewmvd/convid19-x-rays?select=X+rays 28. Mooney, P.: Chest x-ray images (pneumonia), visited on may-06th-2022 (2022). https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia 29. NIH, National Institutes of Health Chest X-Ray Dataset, v.o.M.t..: Nih chest xrays (2022). https://www.kaggle.com/nih-chest-xrays/data 30. Babu P, S.A., Annavarapu, C.S.R.: Deep learning-based improved snapshot ensemble technique for covid-19 chest x-ray classification. Applied Intelligence 51 (2021). https://doi.org/10.1007/s10489-021-02199-4, https://link.springer.com/ article/10.1007/s10489-021-02199-4 31. Patel, P.: Dataset contains chest x-ray images of covid-19, pneumonia and normal patients, visited on may-06th-2022 (2022). https://www.kaggle.com/prashant268/ chest-xray-covid19-pneumonia 32. Pathan, S., Siddalingaswamy, P., Ali, T.: Automated detection of COVID-19 from chest x-ray scans using an optimized CNN architecture. Appl. Soft Comput. 104, 107238 (2021). https://doi.org/10.1016/j.asoc.2021.107238, https://www. sciencedirect.com/science/article/pii/S1568494621001617 33. Rahman, T., Chowdhury, M., Khandakar, A.: Covid-19 chest x-ray database (2022). https://www.kaggle.com/tawsifurrahman/covid19-radiography-database 34. Rajagopal, R.: Comparative analysis of COVID-19 x-ray images classification using convolutional neural network, transfer learning, and machine learning classifiers using deep features. Pattern Recogn. Image Anal. 31, 313-322 (2021). https://doi.org/10.1134/S1054661821020140, https://link.springer.com/ article/10.1134/S1054661821020140 35. Rajaraman, S., Siegelman, J., Alderson, P.O., Folio, L.S., Folio, L.R., Antani, S.K.: Iteratively pruned deep learning ensembles for COVID-19 detection in chest xrays. IEEE Access 8, 115041–115050 (2020). https://doi.org/10.1109/ACCESS. 2020.3003810 36. Sagi, O., Rokach, L.: Ensemble learning: a survey. Wiley Interdiscipl. Rev. Data Min. Knowl. Discov. 8(4), e1249 (2018) 37. Tuncer, T., Ozyurt, F., Dogan, S., Subasi, A.: A novel COVID-19 and pneumonia classification method based on f-transform. Chemometrics and Intelligent Laboratory Systems 210, 104256 (2021). https://doi.org/10.1016/j.chemolab.2021. 104256, https://www.sciencedirect.com/science/article/pii/S0169743921000241 38. Turkoglu, M.: Covidetectionet: Covid-19 diagnosis system based on x-ray images using features selected from pre-learned deep features ensemble. Appl. Intell. 51 (2021). https://doi.org/10.1007/s10489-020-01888-w, https://link.springer.com/ article/10.1007/s10489-020-01888-w ¨ 39. U¸car, E., Umit Atila, U¸car, M., Akyol, K.: Automated detection of COVID-19 disease using deep fused features from chest radiography images. Biomedical Signal Processing and Control 69, 102862 (2021). https://doi.org/10.1016/j.bspc.2021. 102862, https://www.sciencedirect.com/science/article/pii/S1746809421004596 40. Wang, Z., D.J..Z.J.: Multi-model ensemble deep learning method to diagnose COVID-19 using chest computed tomography images. J. Shanghai Jiaotong Univ. (Science) 27, 70-80 (2022). https://doi.org/10.1007/s12204-021-2392-3, https:// link.springer.com/article/10.1007/s12204-021-2392-3 41. WHO: Coronavirus disease (covid-19) (2020) events as they happen (who) (2022). https://covid19.who.int, visit in September-27th-2022 42. Wolpert, D.H.: Stacked generalization. Neural Networks 5(2), 241–259 (1992)
43. Zhao, W., Zhong, Z., Xie, X., Yu, Q., Liu, J.: Relation between chest CT findings and clinical conditions of coronavirus disease (covid-19) pneumonia: A multicenter study. Am. J. Roentgenol. 214(5), 1072–1077 (2020). https://doi.org/10.2214/ AJR.20.22976, https://doi.org/10.2214/AJR.20.22976, pMID: 32125873
Indian Postal Service Quality Assessment Using Graph Theoretic Approach – A Quantitative Decision-Making Tool

S. M. Vadivel1(B), A. H. Sequeira2, and Sunil Kumar Jauhar3

1 Business School, Vellore Institute of Technology, Chennai 600127, India
[email protected]
2 School of Management, NIT Karnataka, Surathkal 575025, India
3 Operations Management and Decision Sciences, Indian Institute of Management, Kashipur, India
[email protected]
Abstract. This research intends to examine the Service Quality (SQ) factors in mail service operations conducted at the National Sorting Hub (NSH), Mangalore, Karnataka state, Southern India. In the postal service industry, measuring SQ performance in mail service operations is a major challenge. So, this paper attempts to explore the positive effect of postal SQ factors on Customer Satisfaction (CS) with data collected from employees (n = 148) of the Indian postal service. Further, to quantify the significance of SQ factors in gaining customer satisfaction, this study has used the Graph-Theoretic (GT) approach. The results established the priority of the SQ factors as Human service delivery (Rank 1), Core service (Rank 2), and Systemization (Rank 3), indicating that the postal service industry should concentrate more on these factors to enhance customer satisfaction. Further, the study employs an empirical model that is sparsely used in the Indian domain. Furthermore, this research aids in designing and developing the SQ aspects necessary to improve CS in various service sectors.

Keywords: Customer Satisfaction · Graph Theory · Mailing Operations
1 Introduction

Since the last decade, the department of India Post has introduced a plethora of innovative services such as a tracking system, e-payment, e-post, Book Now Pay Later (BNPL), and much more to satisfy customers' needs. The major thrust was towards leveraging technological advancements and bridging the digital divide, especially between the rural and urban sectors in India. In all probability, India Post is among the very few public-service organizations that provide these affordable and accessible services to rural India. The department's innovative schemes are designed to engage the rural people in direct communication with the outside world and to bring the benefits of development to their doorsteps. Post offices are service-oriented enterprises that fulfil a variety of

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 107–117, 2023. https://doi.org/10.1007/978-3-031-27440-4_11
social responsibilities related to public services around the globe. In the service industry, providing reliable and high-quality service is critical. Before Indian independence, the post and the public telephone were the sole media of communication. After independence, India witnessed a phenomenal increase in the number of post offices, opened even in the remotest corners and far-flung areas such as hills and deserts, thereby connecting both the urban and the rural areas. Citizens once walked for miles looking for a post office and a public telephone; in recent times, however, communication is rapid, offered through mobile handsets, fax, the internet, and the like. So, the Indian postal department has felt an urgent need to face these daunting challenges, and the induction of technology in the postal service was the obvious need of the hour. This has also created a digital divide, wherein the differences in the tastes and expectations of customers who reside in cities and towns and those in rural areas have become more significant. The expectations of customers are very high, and the conventional method of postal services disappoints them. Additionally, this change in needs is correlated with the changing nature of business and work culture. India Post enhanced the quality of its services to meet the growing expectations of customers, to regulate bulk mail, and to ensure speedy and time-bound postal delivery. Therefore, the postal department decided to enhance its information technology to satisfy its customers, reduce delays and generate more revenue. Software has been developed for tracking speed-post articles introduced by the postal department as a private courier; it tracks the movement of the mails and informs the sender or the addressee about the status of the article.

1.1 Importance of the Study

• Need for a postal SQ evaluation to enhance the postal-staff quality and the postal services in comparison to the competing courier businesses. Increase in competition from the courier business, rapid changes in communication methods, consumer loyalty, globalization, and other factors are all long-term problems for the postal service to remain competitive.
• Provision of infrastructural services such as equipment and transportation modes
• Provision of a more customer-friendly ambience and a better working environment for the postal employees
• Need for courteous, polite and skilled staff to handle tough situations
• Need for faster implementation of changes in the existing procedures/rules among the counter clerks so as to avoid committing operational errors

This paper is organized as follows: Sect. 1 deals with the importance of SQ and CS in the introduction and rationale of the study. Section 2 describes the literature support and the research hypotheses formulation. Section 3 explores the results of data analysis using GT analysis. Section 4 is the discussion of the study, and Sect. 5 encompasses the conclusion, limitations and future recommendations based on this research.
2 Literature Review

According to Spreng & Mackoy (1996), customer satisfaction and service quality are unquestionably the two basic notions at the heart of marketing theory and practice.
In today’s world of fierce competition, providing high-quality service that results in delighted consumers is the key to achieve a long-term competitive advantage. A longitudinal study reveals that, SQ has been regarded as a type of attitude, and the two conceptions (SQ and Attitude) are considered as equal (Berry, Parasuraman, & Zeithaml, 1988). Meanwhile, Bitner et al. (1997) described that SQ perceptions can occur at numerous levels inside an organization, including the core service, the physical environment, interactions with service providers, and so on. The significance of SQ factors had been widely proved with various domains that directly deal with consumers. For instance, Rita, Oliveira, Heliyon, (2019) have identified factors such as internet SQ factors (i.e., customer service, website design, security, and fulfilment) that significantly predicts the complete e-SQ which leads to customer satisfaction and customer trust, likewise, internet services, physical evidence, and internet price were the perceived SQ factors of consumers to adopt internet service. A study conducted by Hussain et al. in 2015 merged the SERVQUAL framework (i.e., responsiveness, reliability, tangibility, assurance, safety and communications, and security) with other significant factors (i.e., SQ, Customer Expectations, Corporate Image, Perceived Value, and CS) to predict the brand loyalty of the airline customers. Correspondingly, the research by Miranda et al. (2018) steered with fuzzy-set Qualitative Comparative Analysis (fsQCA) established that an amalgamation of SERVQUAL elements with comfort, convenience, and connection paves the way to attain maximum CS in the railway sector. In the domain of logistics, service quality elements such as responsiveness, speed, value, and reliability have been identified as significant predictors of customer satisfaction (Yuen & Thai, 2015); also, the social responsibility, outcomes, processes, and management has also been recognized. Narteh (2018) found that the influence of banking SQ elements on CS would be moderated by factors such as the price of services offered by the bank; additionally, the ease of use, fulfilment, reliability, and security/privacy are the major dimensions of ATM SQ elements to acquire customer satisfaction. Past literature on SQ have identified a huge variety of models to study SQ. The SERVQUAL instrument (Berry, Parasuraman, & Zeithaml, 1988), a 22-item scale that assesses the service quality across five dimensions: reliability, responsiveness, assurance, empathy, and tangibles are the bedrock on which all subsequent studies were constructed. Remarkably, Buttle (1996) criticized the SERVQUAL model (i.e., 22 items only) in terms of operationalization, measurement, conceptualization, dimensionality, and applications. Besides, this model is not applicable to all service industries. As a result, we have adopted the model prescribed by Sureshchandar, Rajendran, & Anantharaman (2002); this model exhibits five service quality characteristics as 41 items crucial from the customers’ perspective in the banking sector. Among those 41 items, the current study has considered only 25 items as appropriate for the postal service industry based on a Focus Group Technique (FGT) session. The factors include: (1) core service or service product; (2) human elements of service delivery; (3) systematization of service delivery: non-human element; (4) tangibles of service (servicescapes); (5) social responsibility. 
• Core service or service product: the features offered by the postal services.
• Human elements of service delivery: empathy, reliability, assurance, truth, and ethics concerning postal service delivery.
• Systemization of service delivery (non-human element): the procedures, systems, technology, and processes offered by the postal service providers.
• Tangibles of service: machinery, equipment, signage, employee appearance, aesthetics, office infrastructure, working environment, etc. at the postal services.
• Social responsibility: the creation of responsible corporate citizens through establishing ethical behaviour, loyalty, and the brand value of the postal image.

By taking into account all of the above factors, the current article seeks to investigate the effect of the SQ dimensions on CS in the NSH postal service, as shown in Fig. 1. CS is viewed as a multi-dimensional construct in this study, but the underlying factors/items of CS are the same as those used to quantify or operationalize the SQ dimensions.

2.1 Graph Theoretical (GT) Model

Few studies in the recent past have utilised the Graph Theory model for analysing success factors or barriers in the manufacturing industries. For example, Kavilal, Prasanna Venkatesan, & Sanket (2018) applied the GT model with ISM to quantify supply chain complexity in a single numerical index in the Indian automotive industry. Narayanamurthy & Gurumurthy (2016) applied the GT model for systemic leanness assessment in the manufacturing industry. Johansson et al. (2018) argued that network formation and contextualization are two essential principles in connectivism that have been recognised and handled using GT, as well as information filtering techniques and CAD model quality assurance. Goyal, Routroy, & Shah (2018) used the GT model to assess the environmental sustainability performance of the supply chain for the Indian steel industry.
3 Research Methodology

This study adopts a mixed-method analysis which comprises a quantitative research methodology and a descriptive research approach for data analysis. A cross-sectional study has been conducted, with a survey period from 01.10.2021 to 01.03.2022 (6 months). A total of 163 questionnaires were circulated to NSH Mysuru postal customers. Among them, 148 were considered valid responses. Figure 2 shows the proposed research methodology using Graph Theory (GT).

3.1 Graph-Theoretic Model Approach

The proposed GT approach was used to investigate and assess the postal department's customer satisfaction with SQ variables in NSH, India. The data for the above attributes are gathered from the head sorting centre using a questionnaire based on Saaty's (1990) ranking weightage. Once the attributes have been defined and Ci determined, the next step is to find the comparative value of the attribute xij, as suggested by Saaty. In this way, the effect of each subfactor on the SQ limiting factors can be converted into a single numerical value using the graph theoretic and matrix method. The following are the key steps in the methodology:
Fig. 1. Cause and effect of Service Quality Dimensions (Core Service, Human effort of service delivery, Systemization, Tangibles of service, and Social responsibility, leading to Postal Customer Satisfaction)
Fig. 2. Proposed research methodology (identify postal SQ factors from the literature; group the factors and collect data from postal customers; apply the GT model with AHP pairwise comparisons on a 1–9 scale; develop the decision matrix and find the permanent function per(SQ); identify and rank the major weighted postal SQ factors)
(1) Identification of the subfactors that influence the main factors (evaluation of SQ factors), taking relative interdependencies into account.
(2) Development of the digraph taking into account the specified variables and their interdependencies.
(3) Transformation of the digraphs into matrices following Eq. (1).
(4) Transformation of the matrices into permanent functions based on Eq. (2); further, the values of the variables are substituted in consultation with experts, ranged on a Saaty scale of 1–9 (for more information, see Appendix II).
(5) Calculation of a single numerical value indicating the degree to which a single variable evaluates SQ according to the expression.

3.1.1 Behavioural Digraph

In terms of nodes and edges, a behavioural digraph is created to describe the behavioural factors that limit SQ (Fig. 3). Let the nodes represent the causes, and the edges represent the interactions between them. Accordingly, these nodes represent factors (Ci's) and edges represent dependencies between factors (xij's). A directed edge from node i to node j is expressed as xij in the digraph. The digraph allows one to see the proposed
behavioural variables as well as their interactions. The behavioural digraph identifies five variables in particular.

3.1.2 Matrix Representation

The matrix representation of the above digraph yields a one-to-one representation. This matrix is a 5 × 5 matrix, since there are five variables that limit SQ. The matrix SQ is represented as

SQ = \begin{pmatrix} C_1 & x_{12} & x_{13} & x_{14} & x_{15} \\ x_{21} & C_2 & x_{23} & x_{24} & x_{25} \\ x_{31} & x_{32} & C_3 & x_{34} & x_{35} \\ x_{41} & x_{42} & x_{43} & C_4 & x_{45} \\ x_{51} & x_{52} & x_{53} & x_{54} & C_5 \end{pmatrix} \qquad (1)
Fig. 3. Digraph for five system elements (nodes C1–C5)
where Ci is the value of the factor represented by node i and xij is the relative importance of the ith factor over the jth, represented by the edge xij:
• C1 - Core service or service product;
• C2 - Human element of service delivery;
• C3 - Systematization of service delivery: non-human element;
• C4 - Tangibles of service (servicescapes);
• C5 - Social responsibility.
per(C1) = per(CS) = per\begin{pmatrix} C_1^1 & x_{12}^1 & x_{13}^1 & x_{14}^1 & x_{15}^1 \\ x_{21}^1 & C_2^1 & x_{23}^1 & x_{24}^1 & x_{25}^1 \\ x_{31}^1 & x_{32}^1 & C_3^1 & x_{34}^1 & x_{35}^1 \\ x_{41}^1 & x_{42}^1 & x_{43}^1 & C_4^1 & x_{45}^1 \\ x_{51}^1 & x_{52}^1 & x_{53}^1 & x_{54}^1 & C_5^1 \end{pmatrix}
where C_1^1, C_2^1, C_3^1, C_4^1, and C_5^1 represent CS1, CS2, CS3, CS4 and CS5, respectively, Ci is the value of the factor represented by node i, and xij is the relative importance of the ith factor over the jth, represented by the edge xij.

3.1.3 Permanent Representation

Anderson et al. (2000) applied the mathematics of the Universal Variable Attribute Characteristic (UVAC) permanent function as a multinomial matrix function. The determinant of the matrix is a concept used in the permanent function; however, instead of terms with alternating positive and negative signs as in the determinant, all terms of the permanent function carry a positive sign. Equation (2) shows the normal setup of the UVAC permanent function:
per(SQ) = \prod_{i=1}^{5} C_i + \sum_{i,j,k,l,m} (x_{ij} x_{ji}) C_k C_l C_m + \sum_{i,j,k,l,m} (x_{ij} x_{jk} x_{ki} + x_{ik} x_{kj} x_{ji}) C_l C_m + \Big[ \sum_{i,j,k,l,m} (x_{ij} x_{ji})(x_{kl} x_{lk}) C_m + \sum_{i,j,k,l,m} (x_{ij} x_{jk} x_{kl} x_{li} + x_{il} x_{lk} x_{kj} x_{ji}) C_m \Big] + \Big[ \sum_{i,j,k,l,m} (x_{ij} x_{ji})(x_{kl} x_{lm} x_{mk} + x_{km} x_{ml} x_{lk}) + \sum_{i,j,k,l,m} (x_{ij} x_{jk} x_{kl} x_{lm} x_{mi} + x_{im} x_{ml} x_{lk} x_{kj} x_{ji}) \Big] \qquad (2)
The detailed information can be seen from Paramasivam, Senthil, & Rajam Ramasamy (2011) who mentioned the interpretation of the permanent function.
4 Analysis and Results

4.1 Graph Theory Calculation

C1 - Core Service = \begin{bmatrix} 0.3656 & 0.9 & 0.7 & 0.6 & 0.8 \\ 0.1 & 0.1408 & 0.8 & 0.7 & 0.5 \\ 0.3 & 0.2 & 0.2547 & 0.4 & 0.3 \\ 0.4 & 0.3 & 0.6 & 0.1326 & 0.9 \\ 0.2 & 0.5 & 0.7 & 0.1 & 0.1059 \end{bmatrix}, \qquad per(CS) = 1.5123
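As a minimal illustration (not the authors' code), the permanent function of Eq. (2) can be evaluated directly by summing over all row-column permutations, which is feasible for a 5 × 5 matrix. The matrix below is the C1 (core service) matrix shown above, for which the paper reports per(C1) = 1.5123.

```python
# Brute-force evaluation of the matrix permanent for a 5 x 5 factor matrix.
from itertools import permutations

def permanent(matrix):
    n = len(matrix)
    total = 0.0
    for perm in permutations(range(n)):
        prod = 1.0
        for i, j in enumerate(perm):
            prod *= matrix[i][j]
        total += prod
    return total

# C1 (core service) matrix as reported in the paper; per(C1) is reported as 1.5123.
C1 = [[0.3656, 0.9,    0.7,    0.6,    0.8],
      [0.1,    0.1408, 0.8,    0.7,    0.5],
      [0.3,    0.2,    0.2547, 0.4,    0.3],
      [0.4,    0.3,    0.6,    0.1326, 0.9],
      [0.2,    0.5,    0.7,    0.1,    0.1059]]

print(round(permanent(C1), 4))
```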
Similarly, C2, C3, C4, and C5 were computed. As shown in Table 1, the GT model assigns the highest priority (i.e., rank) to Human service delivery (Rank 1), followed by Core service (Rank 2) and Systemization (Rank 3).
Table 1. Graph theory results (postal customers' feedback, n = 148).
S.no  Postal Service Quality Elements  Graph Theory Model index  Rank
1     Core service                     1.5123                    2
2     Human service delivery           1.6414                    1
3     Systemization                    1.2359                    3
4     Tangibles of service             0.8691                    5
5     Social responsibility            1.0401                    4
The postal administration should focus on customer feedback in order to reinforce the SQ factors and reach consumer delight, for example through Kano-model quality function evaluation. As mentioned above, Sureshchandar et al. (2002) assessed the relationship between SQ and CS in the Indian banking sector; among the SQ factors, human service delivery was more important (0.831) than bank systemization (0.734). In line with the above, we also obtained the highest priority value for human service delivery (0.404), followed by core service (0.279). In the domain of healthcare, Padma et al. (2010) found that the clinical process (0.350) was prioritized over the trustworthiness of the hospital (0.320). Likewise, the postal service has to give more priority to social responsibility (−0.102) to gain consumer trust (which leads to loyalty). Besides, Wang et al. (2006) found that tangibles of service (0.253) had the third priority value for library users in Taiwan, whereas in this study tangibles of service (0.163) was found to be fourth; hence, the postal service should improve the tangibles of service by providing vital technologies and by improving the workplace environment and the visual appearance of the postal atmosphere.
5 Conclusions

The purpose of this study is to provide more insight into the differences between the constructs of SQ and CS in the postal service business. This paper studies the postal SQ factors through the graph theory model, which was employed to investigate the postal customers' feedback. The feedback revealed that tangible factors such as machinery, infrastructure and the like, as well as social responsibility, should be focused on. The dawn of a new century and the advancement of technology force an increase in the quality, variety, and novelty of products and services in the postal service industry. Today postal services have upgraded their services on par with the expectations of the technologically advanced generation with a range of upgraded services. This empirical model used to assess SQ and CS is unique to the postal service industry. This study demands and evaluates responses from postal consumers in a developing country, such as India; people's expectations in developed countries may differ from those in developing and under-developed countries, thus cultural bias can influence this study's findings. This postal service SQ study has been conducted at NSH
Mangalore, dealing with speed post articles, in Karnataka, in the southern part of India; so, this case study has derived a specific conclusion. In the future, the SERVQUAL model and QFD methods can be applied for the enhancement of a sustainable SQ model. Brand loyalty, customer complaints, word of mouth, and other factors could help to clarify the findings. Additionally, the approach can also be applied to other service sectors.
References Anderson, I., Balakrishnan, R., Ranganathan, K.: A textbook of graph theory. Math. Gaz. 84, 562 (2000) Berry, L.L., Parasuraman, A., Zeithaml, V.A.: SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality. J. Retail. 64, 12–40 (1988) Bitner, M.J., Faranda, W.T., Hubbert, A.R., Zeithaml, V.A.: Customer contributions and roles in service delivery. Int. J. Serv. Ind. Manage. 8(3), 193–205 (1997) Buttle, F.: SERVQUAL: review, critique, research agenda. Eur. J. Mark. 30, 8–32 (1996) Goyal, S., Routroy, S., Shah, H.: Measuring the environmental sustainability of supply chain for Indian steel industry: a graph theoretic approach. Bus. Process. Manag. J. 24(2), 517–536 (2018) Hussain, R., Nasser, A.A., Hussain, Y.K.: Service quality and customer satisfaction of a UAE-based airline: an empirical investigation. J. Air Transp. Manage. 42, 167–175 (2015) Johansson, J., Contero, M., Company, P., Elgh, F.: Supporting connectivism in knowledge based engineering with graph theory, filtering techniques and model quality assurance. Adv. Eng. Inform. 38, 252–263 (2018) Kavilal, E.G., Venkatesan, S.P., Sanket, J.: An integrated interpretive structural modeling and a graph-theoretic approach for measuring the supply chain complexity in the Indian automotive industry. J. Manuf. Technol. Manag. 29, 478–514 (2018) Miranda, S., Tavares, P., Queiró, R.: Perceived service quality and customer satisfaction: a fuzzy set QCA approach in the railway sector. J. Bus. Res. 89, 371–377 (2018) Narayanamurthy, G., Gurumurthy, A.: Systemic leanness: an index for facilitating continuous improvement of lean implementation. J. Manuf. Technol. Manag. 27, 1014–1053 (2016) Narteh, B.: Service quality and customer satisfaction in Ghanaian retail banks: the moderating role of price. Int. J. Bank Mark. 36, 68–88 (2018) Padma, P., Rajendran, C., Lokachari, P.S.: Service quality and its impact on customer satisfaction in Indian hospitals: perspectives of patients and their attendants. Benchmarking 17, 807–841 (2010) Paramasivam, V., Senthil, V., Rajam Ramasamy, N.: Decision making in equipment selection: an integrated approach with digraph and matrix approach, AHP and ANP. Int. J. Adv. Manuf. Technol. 54, 1233–1244 (2011) Rita, P., Oliveira, T., Farisa, A.: The impact of e-service quality and customer satisfaction on customer behavior in online shopping. Heliyon 5(10), e02690 (2019) Saaty, T.L.: How to make a decision: the analytic hierarchy process. Eur. J. Oper. Res. 48, 9–26 (1990) Spreng, R.A., Mackoy, R.D.: An empirical examination of a model of perceived service quality and satisfaction. J. Retail. 72(2), 201–214 (1996)
Sureshchandar, G.S., Rajendran, C., Anantharaman, R.N.: The relationship between service quality and customer satisfaction – a factor specific approach. J. Serv. Mark. 16, 363–379 (2002) Wang, I.-M., et al.: The relationship between service quality and customer satisfaction: the example of CJCU library. J. Inf. Optim. Sci. 27, 193–209 (2006) Yuen, K.F., Thai, V.V.: Service quality and customer satisfaction in liner shipping. Int. J. Qual. Serv. Sci. 7, 170–183 (2015)
Analyzing the Critical Success Factors of Lean System Implementation in India Post Using DEMATEL Method

S. M. Vadivel1(B), Buvanesh Chandrasekaran1, Thangaraja Arumugam1, K. Sivakumar1, and Uduak Umoh2

1 Business School, Vellore Institute of Technology, Vandalur-Kelambakkam Road, Chennai 600127, India
[email protected], {thangaraja.a,k.sivakumar}@vit.ac.in
2 Department of Computer Science, University of Uyo, Uyo, Nigeria
[email protected]
Abstract. This paper attempts to analyse the Critical Success Factors (CSF) of Lean System (LS) implementation using the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method. LS elements were grouped into four categories, namely Technical Practices (TPs), Workplace Environment Practices (WEPs), Social Practices (SPs), and Ergonomic Practices (EPs), and analyzed through a field survey and suitable literature support. With the DEMATEL method, the effectiveness of the set of LS elements has been prioritized based on implementation. This study was conducted at the National Sorting Hub (NSH) Mangalore, Karnataka State, in the southern part of India; the NSH office deals particularly with speed post articles. Since LS has been successfully implemented at NSH Mangalore, it serves as a benchmark for implementing LS in other Indian postal service facilities as well. The heterogeneity of mails and the intangible measures of production in the postal service industry make it a great challenge to keenly analyse the CSFs while implementing LS.

Keywords: Lean System · DEMATEL · Mail operations · Indian Postal Service
1 Introduction

The post office basically deals with the collection, sorting, transmission and delivery of traditional mail, besides written messages, packets, parcels, insured letters, etc. In the course of time, technology came to the post office as an aid to its counter clerks and other operative staff. The main idea of introducing a lean system is to gradually replace the old machines and provide suitable equipment to the working post employees, towards the common goal of enhancing operational performance. Recently, manual cancellation of letters was replaced, as it was time-consuming and a waste of manpower (Ramachandran, 2011). Through its enormous network, the post office now not only distributes mail but also engages in a variety of retail businesses. Its ability to manage financial transactions and understanding of the local environment allow it to deliver a number of services to the

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 118–127, 2023. https://doi.org/10.1007/978-3-031-27440-4_12
public in an efficient and cost-effective manner. India Post truly represents both tradition and modernity; the post office has become a symbol of continuity and change. No other government institution is more closely tied to human relationships than the post office, and it is widely recognized as a communication enhancer. Lean strategy is an operational excellence approach focused on best business practices (Rajak et al., 2022). So, the goal of this paper is to study the application of the lean system used in the postal service to analyse the CSFs. According to Kaufmann & Kock (2022), CSFs are a collection of fundamental aspects that assist an organization in attaining its goals and improving its performance. Furthermore, CSFs assist the business in achieving breakthrough improvement, allowing it to succeed in the eyes of its consumers and investors (Garza-Reyes et al., 2022). Bakar et al. (2015) looked at the influence of CSFs on manufacturing and services; however, they only looked at LS. There are few research publications that have looked into CSF assessment in the postal service industry. As a result, the authors regard this as a valuable chance to identify the CSFs of the LS using the DEMATEL method.

1.1 Rationale of the Study

The study aims to identify and analyse the many CSFs of the lean system from the standpoint of improving postal operational performance. The following research questions (RQs) were created to achieve this goal:
RQ1: Why does the Indian postal service need to implement lean system practices?
RQ2: How can the Indian postal service CSFs be analyzed using the DEMATEL method?
RQ3: What opportunities exist for researchers in the subject of CSFs in the Indian postal service?

The paper is organized as follows: Sect. 1 deals with the importance of analyzing the success factors of LS practices in the Indian postal service in the introduction and rationale of the study. Section 2 describes the literature support, and Sect. 3 presents the DEMATEL methodology. Section 4 is the India Post case application and describes the managerial implications of this study, and Sect. 5 concludes with a list of limitations and the scope for future research.
Fig. 1. Cause and effect of lean system components (Technical practices, Workplace environment practices, Social practices, and Ergonomics practices as critical success factors of lean service design implementation)
2 Literature Support

The DEMATEL method is used to evaluate the strength of cause-and-effect relationships; though it overlaps with the objective of Structural Equation Modelling (SEM), it has a different evaluation method/procedure compared to SEM. For example, unlike SEM, DEMATEL does not need a large sample or complex steps to measure the strength of causal relationships. This method has been widely deployed in services research. For instance, using the DEMATEL method, Shieh et al. (2010) studied the hospital service industry and found significant relationships between medical staff personality and patient satisfaction, and Cheng et al. (2012) found that SERVQUAL (service quality) factors such as reliability, responsiveness, and assurance are the most significant for gaining customer satisfaction in the hotel industry. Likewise, Horng et al. (2013) found a positive effect of centrality, affect, and importance on restaurant customers' satisfaction, and Chen (2016a, b) studied air traveler satisfaction using air service quality factors. Besides, Lin (2011) investigated the relationship between the marketing mix and the growth of the fast-food industry, and Leksono et al. (2018) examined the importance of environmental effect, community, economic and
intangibility assets on sustainable supply chain management. In the subject of sustainable health care, Leksono et al. (2019) investigated how economic, environmental, and social aspects have a favourable impact on consumer satisfaction. Also, Chen (2016a, b) confirmed the significance of the Empathy, Reliability, Responsiveness, and Assurance factors for library service using the DEMATEL visual cause-effect map. The DEMATEL technique has also been utilized to study the adoption of drones by the logistics service sector for delivering products to consumers, exploring some influential factors of drone adoption (i.e., technological advancements and government regulations).
3 Research Methodology

3.1 Steps for DEMATEL Method

Step 1: Defining the goal and the evaluation factors, considered based on suitable literature support from Vadivel et al. (2021) and Sengazhani Murugesan et al. (2019) and a postal field survey.
Step 2: The initial direct-relation matrix and the average matrix (A) are constructed as follows. The formulation of the initial relation matrix is included in this stage. Experts were asked to score each factor on a scale of 0 - 'No influence', 1 - 'Little effect', 2 - 'High influence', 3 - 'Very high influence', to determine the direct influence between any two factors.

A = \frac{1}{n} \sum_{k=1}^{n} A_{ij}^{k} \qquad (1)
Step 3: Calculating the normalized direct-relation matrix (X): The average matrix (A) is converted into a normalized direct-relation matrix through Eq. (2).

X = k \cdot A, \qquad k = \frac{1}{\max_{1 \le i \le n} \sum_{j=1}^{n} A_{ij}} \qquad (2)
Step 4: Developing the total relation matrix (T): The total relation matrix is computed from the normalized matrix as T = X(I − X)^{-1}, where I represents the identity matrix. After developing the total relation matrix T = [t_{ij}]_{n×n}, the sums of all the columns and of all the rows are determined.
Step 5: Computing the threshold value: The threshold value is computed in order to construct the causal digraph. It is calculated by taking the average of all the elements in the total relation matrix (T) (Cheng et al. 2012) (Fig. 2).
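A minimal numerical sketch of Steps 2–5 (our illustration, not the authors' code) is given below, using the averaged 4 × 4 direct-relation matrix reported later in the case study of Sect. 4. Because it applies the standard formulas directly, the values it produces will not necessarily coincide with the figures printed in the paper's tables.

```python
# Sketch of DEMATEL Steps 2-5 with numpy (dimensions: TPs, WEPs, EPs, SPs).
import numpy as np

A = np.array([[0, 4, 2, 3],
              [2, 0, 3, 2],
              [1, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)      # averaged direct-relation matrix

k = 1.0 / A.sum(axis=1).max()                  # Step 3: normalization constant
X = k * A                                      # normalized direct-relation matrix
T = X @ np.linalg.inv(np.eye(4) - X)           # Step 4: T = X (I - X)^-1

D = T.sum(axis=1)                              # row sums (influence given)
R = T.sum(axis=0)                              # column sums (influence received)
threshold = T.mean()                           # Step 5: average of the elements of T

print("D + R:", D + R)                         # prominence of each dimension
print("D - R:", D - R)                         # net cause (+) or net receiver (-)
print("threshold:", threshold)
```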
Fig. 2. Proposed Research Methodology (identification of CSFs, grouped into Technical, Social, Workplace Environment, and Ergonomics Practices, from literature support, a field survey and lean experts' inputs, accepted by lean experts, the Post Master General and managers; evaluation of the LS implementation CSFs using the DEMATEL approach: analyze the relations among the CSFs, normalize the direct-relation matrix, calculate the total relation matrix, and develop the causal diagram and relationship map to formulate the strategic decision plan)
4 Case Application: Indian Postal NSH Mangalore, India

This case study employs the four criteria shown in Fig. 1. The field survey was performed in NSH Mangalore in 2020. The quality manager circulated the self-administered questionnaire to the postal employees to complete the survey, and thirty completed questionnaires were received for this study. The DEMATEL method's computation is based on this survey. The 4 × 4 matrices are necessary because there are four dimensions for evaluating the CSFs of lean system implementation (Tables 1, 2 and 3). The average matrix A discussed in Step 1 can be constructed based on Eq. (1):

A = \begin{bmatrix} 0 & 4 & 2 & 3 \\ 2 & 0 & 3 & 2 \\ 1 & 1 & 0 & 1 \\ 1 & 2 & 1 & 0 \end{bmatrix}, \qquad A = \frac{1}{n} \sum_{k=1}^{n} A_{ij}^{k}
Step 1: Generating the direct relationship matrix.
Table 1. Comparison scale.
Number  Definition
0       No influence
1       Low influence
2       Medium influence
3       High influence
4       Very high influence
Table 2. Value of LSP elements in scale.
Elements  TPs  WEPs  EPs  SPs
TPs       0    4     2    3
WEPs      2    0     3    2
EPs       1    1     0    1
SPs       1    2     1    0
Step 2: Normalizing the direct relation matrix

X = k \cdot A, \qquad k = \frac{1}{\max_{1 \le i \le n} \sum_{j=1}^{n} A_{ij}}
Table 3. Value of LSP elements in the DEMATEL method.
Elements  TPs  WEPs  EPs  SPs
TPs       0    4/9   2/9  3/9
WEPs      2/9  0     3/9  2/9
EPs       1/9  1/9   0    1/9
SPs       1/9  2/9   1/9  0
Step 3: Calculate the total relation matrix

T = \begin{bmatrix} -0.02178 & 0.4202 & 0.6650 & 0.3547 \\ -0.00616 & 0.39402 & 0.39193 & 0.00077 \\ -0.00022 & -0.000132 & -0.12276 & 0.00253 \\ -0.0022 & 0.379016 & -0.00319 & -0.12265 \end{bmatrix}
The resulting row sums (D), column sums (R), and the derived D - R and D + R values for the four dimensions are:

        TPs        WEPs       EPs        SPs
D       0.57772    0.78056   -0.12058    0.250976
R      -0.03036    0.352704   0.93098    0.23535
D - R   0.60808    0.427856  -1.05156    0.015626
D + R   0.54736    1.133264   0.810398   0.486326
The direct and indirect effects of the four dimensions are listed in Table 4. Finally, the threshold value, computed in Step 5 as the average of the elements of matrix T, is reported as 1.133264. Figure 3 shows the digraph of these four dimensions. Table 4 reveals that workplace environment practices is the most significant dimension, with an (r + c) value of 1.133264, whereas social practices is the least important dimension, with an (r + c) value of 0.48632. The (r + c) values can be used to determine the relevance of the dimensions. Positive (r − c) values indicate that technical practices, workplace environment practices and social practices are net causes, allowing researchers to investigate the cause-effect relationships of the dimensions further. Due to its negative (r − c) value, ergonomic practices is a net receiver (see Fig. 4). Hence, the postal administration has to give more importance to eliminating ergonomics-related problems. In particular, workplace environment practices have a direct impact on the CSFs of lean system implementation. Furthermore, both ergonomics and technical practices have direct effects on the CSFs for improving lean system incorporation. These two criteria are not only net causes, but they also have an impact on the other criteria, such as social practices. Though technical practices and social practices are comparatively less prominent in terms of their smaller (r + c) values, strengthening these two criteria could have positive effects on lean system implementation in terms of operational performance.

Table 4. Direct and indirect effects of the four dimensions.
Dimensions  D - R      D + R
TPs         0.60808    0.54736
WEPs        0.42785    1.13326
EPs        -1.05156    0.81039
SPs         0.015626   0.48632
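For readers who want to redraw the cause-effect map of Figs. 3 and 4, a small plotting sketch using the D + R and D − R values of Table 4 is given below; this is an illustrative assumption about presentation, not the authors' original figure code.

```python
# Sketch of a DEMATEL cause-effect diagram (prominence D+R vs relation D-R).
import matplotlib.pyplot as plt

dims = ["TPs", "WEPs", "EPs", "SPs"]
d_plus_r = [0.54736, 1.13326, 0.81039, 0.48632]
d_minus_r = [0.60808, 0.42785, -1.05156, 0.015626]

plt.axhline(0, color="grey", linewidth=0.8)     # separates net causes from receivers
plt.scatter(d_plus_r, d_minus_r)
for name, x, y in zip(dims, d_plus_r, d_minus_r):
    plt.annotate(name, (x, y))
plt.xlabel("D + R (prominence)")
plt.ylabel("D - R (relation)")
plt.title("Cause-effect diagram of the four CSF dimensions")
plt.show()
```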
Any pair of dimensions influences, or mutually influences, the other dimensions. In summary, the most significant characteristic is workplace environment practices, while social practices is the least significant CSF for a postal manual sorting centre in the India Post
Fig. 3. Digraph of the four dimensions (D + R values for TPs, WEPs, EPs and SPs)
Fig. 4. Digraph of the four dimensions (D − R values for TPs, WEPs, EPs and SPs)
office. As a result, workplace environment practices, technical practices, and social practices are the three most important criteria to examine critically while analyzing the CSFs of lean incorporation. This case study employs the DEMATEL approach not only to assess the value of the dimensions/criteria, but also to characterize the contextual relationships among them. The technique assists decision-makers in identifying causal links among dimensions/criteria instead of requiring the assumption that the criteria are mutually independent, as is made in standard multiple-criteria decision-making procedures. The conventional viewpoint is simply to rank the dimensions and criteria by importance and to treat the top-ranked dimension or criterion as the focus of improvement. Workplace environment and ergonomic practices are the two most important factors in this study, with the two highest (r + c) values. However, because these two dimensions are controlled by the other dimensions, enhancing the workplace environment and ergonomic dimensions alone will not significantly enhance operational performance. Technical practices and social practices, on the other hand, are causal dimensions with positive effects on operational performance enhancement.
This research also suggests that the percentage of R&D spend is a crucial criterion for the technical practices dimension. The CSF variables determining a successful implementation of LS within India Post service businesses have been described in this article. The identified critical success factors have provided significant insight for improving critical decision-making processes, which are required for achieving corporate strategic goals such as lean service deployment. According to the report, a lack of adequate funding prevents many post offices from hiring their ideal management team, and as a result, they suffer from a lack of insightful leadership and planning.
5 Conclusions
In this study, efforts were made to clear up some of the confusion about what the CSFs of lean initiatives are, which is common among practitioners as well as academics. The purpose of the paper was not to provide evidence to settle this discussion. The instruments used to assess the success factors were applied only in the southern part of India: the study was conducted at NSH Mangalore and the RMS office Chennai. The study addressed the CSFs of lean systems, a topic that has received relatively little attention in the service sector, particularly postal services. Therefore, the analysis of CSFs can be repeated with a more significant sample. Furthermore, using Pareto analysis to compare the CSFs of lean service implementation assisted in determining the optimal lean system programme. Based on the influence of the CSFs, it can be deduced from the DEMATEL technique that LS deployment is the optimal plan for business practice. Researchers might concentrate their efforts on confirming these findings in the area of CSFs. The second most important success component in a lean service programme was found to be training and education. As a result, the company should create a custom training programme for implementing lean services. Communication, customer focus, and employee involvement, in addition to top management commitment, were shown to be critical for a successful lean system program. To promote staff involvement and contribution to the lean system programme, a company-wide awareness workshop should be held to achieve these factors.
Application of Artificial Intelligence in Mental Health
Anindya Nag1(B), Ayontika Das1, Riya Sil1, Anwesha Kar1, Dishari Mandal1, and Biva Das2
1 Adamas University, Kolkata 700126, India
[email protected]
2 Khulna University, Khulna 9208, Bangladesh
Abstract. The implementation of artificial intelligence has become one of the most critical tools impacting various domains of extensive societal importance, including agriculture, education, and economic development. It is a multidisciplinary field that aims to automate, within a machine, activities similar to those of human intelligence. This advancement has created a huge revolution in the medical field. Various machine learning algorithms for prediction, accuracy detection, temporal modelling, speech processing, robotics, and automated decision-making have been used in the development of mental health care. In this paper, the authors describe the various techniques that have been implemented to date, such as personal sensing, Natural Language Processing (NLP), audio analysis, electroencephalography (EEG), chatbots, and multi-agent models, for taking care of mental health over the past few years. Artificial intelligence and machine learning-based technologies provide a promising avenue for transforming mental health, along with possible drawbacks. Furthermore, the authors provide an overview of artificial intelligence and its various applications in the field of healthcare. Various artificial intelligence-based techniques are required to bridge the gap between normal clinical care and psychiatric treatments. In recent years, the world has observed a huge economic and mental breakdown of society due to the global pandemic. The severe impact of Covid-19 is reflected in the lives of students, thus affecting the education system as well. A review of numerous research studies on mental health using artificial intelligence has been carried out, covering how it can be used in place of usual clinical practices, its current restrictions, areas requiring additional research and improvement, and its proper implications.
Keywords: Mental Health · Artificial Intelligence · Machine Learning · Healthcare · Academic Performance
1 Introduction to Artificial Intelligence in Healthcare
Artificial Intelligence (AI) mimics the decision-making capabilities and simulates the performance of human professionals [1]. AI in healthcare has been a substantial breakthrough that nevertheless omits important temporal elements of clinical care [2].
The choices made in the medical field, and the remedies prescribed, can affect the future patients who are under observation [3]. In this paper, a detailed survey has been done based on several AI program traits concerning mental healthcare. One of the famous examples of an expert system in health care is MYCIN. It was developed in the 1970s at Stanford University and was intended to identify bacterial infections and suggest suitable antibiotic medication [4]. Over the years, expert systems have been incorporated into several Clinical Decision Support Systems (CDSS). Decision support refers to the supply of information to clinicians, commonly during decision-making. Some CDSS tools encompass the principles of expert systems. One recent instance in mental health care is 'The Texas Medication Algorithm Project' (TMAP), from the University of Texas Southwestern Medical School, for computer-assisted depression treatment. AI applications in consultation and decision-making have performed well in research studies even after confronting diverse implementation challenges [5]. CDSS tools are each based on expert system models. In real-world cases, each patient displays individual characteristics and signs that impact the effectiveness of the treatment [6]. As such, clinicians quickly learn to ignore recommendations that say the same or similar things for all patients. Practicing clinicians are also keenly aware that prescriptions which have been proven to be effective in clinical trial and research populations are not exactly transferable to the particulars of real-world scenarios [7]. Traditional expert systems and CDSS can be afflicted by the knowledge engineering problem: the task of building and maintaining a knowledge base of guidelines. IBM's Watson is presently being tested on specific healthcare issues such as cancer [8]. This recent sort of AI encapsulates a few of the concepts of earlier expert systems. Underlying these approaches are the persistent drawbacks of methods and systems in showing the adaptivity and specificity seen in natural intelligence. In particular, artificial systems need to be both skilful and flexible in order to make sense of the world and interact with it intelligently. As an example, an AI-based chess program is based entirely on conventional search and planning [7]; its play may be skilful, but it lacks flexibility. Section 2 discusses mental health, including artificial intelligence in healthcare, mental health, and other uses. Section 3 discusses the various research topics in the field of mental health. Section 4 provides a clear idea of the different limitations of AI and mental health studies. Section 5 provides a statistical scenario in the field of AI in mental health care research. Section 6 concludes the paper and discusses the future scope of the work.
2 Mental Health
Mental health is an integral part of physical health, i.e., good health does not exist without good mental health. It refers to how people think, how they feel, and how they behave with others. Artificial Intelligence (AI) is nowadays being used for recognising various types of mental disorders and their symptoms. AI motion sensors improve the results of anxiety symptom detection.
2.1 AI in Healthcare
AI is now being used on a large scale to facilitate early disease detection, allow better understanding of disease progression, optimize medication dosages, and discover novel treatments. The greatest strength of AI is the rapid pattern analysis of huge datasets. The most successful areas of medicine leverage pattern recognition, such as ophthalmology, cancer detection, and radiology, in which AI methods can perform better than experienced doctors by examining images, especially retinas, for deformities or subtleties that are unnoticeable to the human eye. Though intelligent machines can never completely replace clinicians, such systems are increasingly used to aid clinical decision-making. AI-based machines can quickly incorporate data from a vast number of medical data sources. Large datasets such as electronic health records, which can be analysed computationally, are ideal for enhancing the capability of AI, revealing trends in human behaviour and other characteristics that are otherwise difficult to extract.
2.2 Ethics and AI Mental Health Research
The development of AI and its increasing use in various sectors, such as mental health, has created the need to ethically scrutinize and regulate these applications. AI brings its own ethical problems, including fairness, inclusiveness, transparency, accountability, privacy, reliability, and safety.
2.3 AI Research on the Involvement of Patient and Public Mental Health
There is a way of engaging users and patients in the research and development of AI for mental health through Patient and Public Involvement (PPI). Comparable studies have been done for the testing of digital mental health interventions, which include smartphone applications and internet-based therapies. Research and development of AI for health and mental health are being carried out in many universities, with a large amount of AI research taking place in the parallel sphere of private companies and venture-capital-funded tech start-ups. Many of these companies do not have the same systematic practice of research ethics approval or culture of PPI as universities or the NHS. Predictive modeling is a kind of machine learning in AI that makes use of large and personal datasets to find patterns. In this paper, the authors share studies that might assist clinical researchers and members of the data mining community, who are increasingly joining forces to build predictive models for health monitoring, treatment selection, and treatment personalization [6].
2.4 Well-Being and Educational Performance
Mental health problems have a high prevalence among students in higher education. Currently, approximately 70% of international high school graduates attend universities. The university years are a peak period for the onset of common mental problems, especially mood, stress, and substance use disorders [7]. A part of these issues can be attributed to stress, anxiety, and educational insufficiency.
Having to study and perform under pressure on the university campus has been found to interact with anxiety. Procrastinating and underperforming at university have been found to predict depression, low self-esteem, and anxiety. Concurrently, mental health problems influence overall academic achievement [8]. There is an interrelation between academic performance and mental health problems, so one should recognize this interrelatedness and propose solutions that do not improve one at the cost of the other. The signs of mental health problems are generally framed in terms of negative affect: feelings of distress, pressure, and depletion. When students enter university, they have made the transition from late childhood to emerging adulthood. Emerging adulthood is a developmentally essential era which may be described by shifts in autonomy, relational instability, and shifts in anticipated capability [9]. This may explain why this period, and the first days of university in particular, is associated with such a high rate of dropout and academic underperformance.
2.5 Internet-Based Mental Health Care
Treating mental health problems with conventional face-to-face techniques is more resource-intensive than online therapies. Digital forms of mental health care can have the advantage of being scalable and cost-effective. Numerous recent meta-analyses indicate that online treatment can be as effective as conventional therapy in treating mental health problems. The primary aim is to treat or prevent anxiety, nervousness, depression, and other mental health issues among first-year newcomers with virtual, online interventions. This analysis and argument take place at the intersection of data science and psychology. It is argued that entering university coincides with a decisive developmental phase into emerging adulthood. An internet-based online or virtual treatment is a more efficient, useful, and low-cost method to treat these problems. A potentially specific advantage of online treatment is anonymity, which has been found to be related to greater self-disclosure. Another strand of this research aims to improve academic performance and well-being among students with goal-setting interventions. Goal setting helps students allocate their time wisely and enhance their overall academic achievement. These integrative interventions encourage university students to set different kinds of goals, whether academic, social, or mental health-related.
2.6 Mental Healthcare Chatbots
Software programs which utilize Artificial Intelligence and Natural Language Processing to understand what a human wants and help them reach their desired outcome are called chatbots. Chatbots are spreading rapidly among websites and online services including customer support, advertising and marketing, healthcare, and many more. Mental health chatbots are being used to provide round-the-clock help to patients who have depressive symptoms and anxiety. Chatbots work along the lines of cognitive behavioural therapy, sometimes applying techniques to help patients reduce their stress. Chatbots are trained using supervised learning, in which large datasets are used, whereas Markov-chain-based models and networks of decision trees are used in unsupervised learning.
In the education sector, use cases of chatbots are under research. The areas of concern are language learning, well-being feedback, and support of cognitive thinking. These chatbots promise to bring a revolution to conventional mental health platforms through their ability to identify emotion and engage the end user dynamically, so that, like humans, non-human agents can express their own empathy [9]. Woebot is a well-known chatbot therapist. Weizenbaum, with his early ELIZA program, wanted to expose how superficial the connection between a human and a machine was. However, he was amazed to discover that many people, including his secretary, would become emotionally attached to the system. They would even forget that they had been speaking to a computer, and Weizenbaum's secretary apparently asked him to leave the room so she could have a "real conversation" with the program. The most well-known script, DOCTOR, simulated a therapist using the Rogerian manner of talking. Carl Rogers was a therapist who used non-directive questions and sometimes repeated what a client said. The machine could parrot phrases back, or ask the user to elaborate. Conversational systems have come a long way through intelligent assistants such as Cortana (Microsoft), Siri (Apple), and Alexa (Amazon), social chatbots aimed at general communication, and task-focused chatbots. Early chatbots relied on deterministic responses produced by a rule-based process, which leads to chatbots that are considered not very smart.
Fig. 1. A sample conversation between a human and a chatbot.
The more commonly used machine learning methodologies now allow chatbots to produce consistent, context-aware responses (Fig. 1).
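To make the contrast between deterministic, rule-based chatbots and machine-learning-based ones concrete, the sketch below pairs a fixed keyword rule with a tiny learned intent classifier; the intents, training phrases, and canned replies are invented for illustration and do not describe Woebot or any other deployed system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Deterministic, rule-based reply: a fixed keyword lookup.
RULES = {"sad": "I'm sorry you feel sad. Would you like to talk about it?",
         "anxious": "Let's try a short breathing exercise together."}

def rule_based_reply(message: str) -> str:
    for keyword, reply in RULES.items():
        if keyword in message.lower():
            return reply
    return "Can you tell me more?"

# Learned intent classifier: generalizes beyond exact keyword matches.
train_texts = ["i feel really down today", "everything makes me cry",
               "my heart races before exams", "i cannot stop worrying"]
train_intents = ["low_mood", "low_mood", "anxiety", "anxiety"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_intents)

REPLIES = {"low_mood": "It sounds like your mood has been low lately.",
           "anxiety": "It sounds like anxiety has been weighing on you."}

def ml_reply(message: str) -> str:
    return REPLIES[clf.predict([message])[0]]

print(rule_based_reply("I am anxious about tomorrow"))   # keyword hit
print(ml_reply("I keep worrying about my results"))      # learned intent
```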
3 Literature Survey
In Table 1, the authors give a brief summary of various studies related to AI and mental health.
Table 1. Overview of research on AI and mental health
Sl. No | Paper Title | Objectives | Age Range
Research assessments
A. Novel monitoring system
01 | A neuro-fuzzy approach for the diagnosis of depression [10] | As per the representation of psychiatrists, perceive various clinical depression symptoms and therefore diagnose these states (mathematical model) | 19 to 50 years
02 | Use of a Novel Artificial Intelligence Platform on Mobile Devices to Assess Dosing Compliance in a Phase 2 Clinical Trial in Subjects With Schizophrenia [11] | Using a new AI-based platform (AI Cure), assess dosing compliance in an RCT of subjects with SZ | 45.9 ± 10.9 years
03 | Mobile Sensing and Support for People with Depression: A Pilot Trial in the Wild [12] | Classify subjects using clinical depression data collected from smartphones | 20 to 57 years
B. Social Media
04 | Exploring the Utility of Community-Generated Social Media Content for Detecting Depression: An Analytical Study on Instagram [13] | Predict depression from community-generated vs. individually generated social media content | 26.7 ± 7.29 years
05 | Novel Use of Natural Language Processing (NLP) to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid [14] | Highlight psychiatric symptoms and predict suicide ideation from survey data and other text data | 40.5 years (40.0 ± 13.8 years never suicidal; 41.6 ± 13.9 years suicidal)
06 | Analysing depression tendency of web posts using an event-driven depression tendency warning model [15] | Predict depression tendency from web-related posts | –
07 | Depression detection using emotion artificial intelligence [16] | Identify tweets that illustrate various signals of depression and emotional health | Not reported
08 | Predicting Depression Levels Using Social Media Posts [17] | Identify social network users' depression based on their regular posts | Not reported
C. Brain imaging
09 | BrainAGE score indicates accelerated brain aging in schizophrenia, but not bipolar disorder [18] | Using BrainAGE scores, detect accelerated brain aging in SZ compared to BD or HC | SZ mean 33.7 ± 10.5 (21.4–64.9); HC mean 33.8 ± 9.4 (21.7–57.8); BP mean 37.7 ± 10.7 (23.8–57.7)
10 | Towards artificial intelligence in mental health by improving schizophrenia prediction with multiple brain parcellation ensemble-learning [19] | Using fMRI data, classify SZ | Not reported
11 | Machine learning approaches for integrating clinical and imaging features in late-life depression classification and response prediction [20] | Predict late-life MDD diagnosis from multimodal imaging and network-based features | Not reported
12 | Resting-state connectivity biomarkers define neurophysiological subtypes of depression [21] | Diagnose subtypes of depression from fMRI | Training mean = 40.6 years depression; mean = 38.0 years HC
Clinical assessments
D. Mood rating scales
13 | Reevaluating the Efficacy and Predictability of Antidepressant Treatments: A Symptom Clustering Approach [22] | Determine the efficacy of antidepressant treatments on empirically defined groups of symptoms and examine the replicability of these groups | Training 41.2 ± 13.3 years; Testing 42.7 ± 12.2 years
14 | A Machine Learning Approach to Identifying Placebo Responders in Late-Life Depression Trials [23] | In antidepressant (citalopram) trials, predict who will respond better to placebo or medication | 79.6 ± 4.4 years
15 | Cross-trial prediction of treatment outcome in depression: a machine learning approach [24] | Predict whether patients with depression will achieve symptomatic remission after a 12-week course of citalopram | 18 to 75 years
E. Electronic health record (EHR) data
16 | Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing [25] | Detect suicide ideation and attempts from EHRs using NLP | Not reported
17 | Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project [26] | Identify symptoms of SMI from clinical EHR text using NLP | Not reported
18 | A Boosted Machine Learning Approach For Detection of Depression [27] | Predict depression from hospital EHR data (Euro-depression inventory) | 27 to 67 years
19 | Artificial Neural Network (ANN) Model to Predict Depression among Geriatric Population at a Slum in Kolkata, India [28] | Predict depression from sociodemographic variables and clinical data | 66.6 ± 5.6 years
20 | Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea [29] | Predict the probability of suicide death using health insurance records | 14+ years of age
4 Constraints of AI and Mental Health Care Research
Psychology is part science and part educated intuition [30]. The Covid-19 pandemic has increased the need to improve mental-health services, and adverse mental health conditions have affected teenagers, men, and women alike. Sentiment analysis used in chatbots combines sophisticated natural language processing (NLP) and machine learning techniques to determine the emotion expressed by the user [31]. For instance, Anna, a virtual avatar therapist developed at the University of Southern California, can pick up on nonverbal cues and guide the conversation accordingly, for example by showing an affirmative expression. Though Anna is not presently available to the wider public, it provides a glimpse of the future of digital therapists. Artificial intelligence systems require a simplification of the mental models incorporating feelings and emotions. AI captures the diversity of human emotional experience only to the extent embedded in the programmer's data input. Voice inflections and gestures vary from person to person, and affective computing systems are likely to struggle to capture the diversity of human emotional experience. Emotional expressions are manifested through physical changes with some overlapping parameters, with single biological measures such as heart rate carrying signs of emotional changes. There is nevertheless no consensus within the scientific community about which physiological signal combinations are most relevant to emotional changes, as emotional experiences are highly individualized. The growth of affective computing therapeutics is taking place simultaneously with the spread of monitoring devices. Over the course of the pandemic, governments pumped investment into the fast development of sensors, phone apps, and AI-based systems for quarantine enforcement, tracing, and health-oriented screening. Monitoring our feelings is proposed as the next step of the digital evolution in our lives. Rather than addressing the lack of mental-health resources, digital solutions can create new disparities in the provision of services. People are thereby encouraged to seek self-treatment, less expensive guided meditation, or conversational bot-oriented applications.
Most importantly, these technologies can function without clinician oversight or other forms of human support. For many psychologists, the most important component of effective treatment is the therapeutic alliance between the practitioner and the patient. However, these devices are not required to abide by medical safety protocols that record the occurrence of adverse events. Depression detection via workplace software monitoring or wearables might cost individuals their employment or result in higher insurance premiums. BetterHelp and Talkspace, counseling apps that connect customers to licensed therapists, have been found to disclose sensitive data to third parties about customers' mental health history, sexual orientation, and suicidal thoughts. Digital wellness tools generally tend to have high drop-out rates, as only a small fraction of users regularly comply with treatments delivered through mobile applications. As with other AI-associated problems, groups that are underserved in psychological care remain underrepresented in the data used to analyze, develop, and deploy these tools. Mental-health technologies that rely upon affective computing are jumping ahead of the science; even emotional-AI researchers disapprove of excessive claims made by businesses that are unsupported by scientific consensus [31].
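As a concrete illustration of the sentiment analysis mentioned at the beginning of this section, the short sketch below uses an off-the-shelf pretrained classifier from the Hugging Face transformers library; the default model it downloads and the example messages are incidental choices made for this sketch, not the systems or data discussed in this paper.

```python
from transformers import pipeline  # assumes the 'transformers' package is installed

# Off-the-shelf sentiment classifier (downloads a default pretrained model).
sentiment = pipeline("sentiment-analysis")

messages = [
    "I have been feeling hopeful after my last session.",
    "Nothing seems to help and I am exhausted.",
]

for msg in messages:
    result = sentiment(msg)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.99}
    print(msg, "->", result["label"], round(result["score"], 3))
```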
5 Statistics Scenario in the Area of AI in Mental Health Care Research
There are significant privacy concerns, as well as the challenge of making people feel comfortable and willing to accept varying levels of monitoring in their daily lives. As AI tools are developed, it is critical that protocols are in place to ensure their safety and effectiveness, and that they are built and trained with a diverse data set to ensure they are not biased toward a specific population (Fig. 2). New ways of delivering therapy, possibilities to engage hard-to-reach communities, greater patient response, and patients feeling freer to connect with bots rather than clinicians are all key reasons for the increase of AI applications in mental health. The role of AI-based tools in next-generation healthcare techniques is being recognized by the healthcare ecosystem. AI applications are seen as having the capability to improve different processes in healthcare operations and execution; for example, the cost reduction that AI may bring to healthcare is a vital motivator for AI growth. The rapid growth of AI technologies and tools has been enabled by a perfect mix of greater computer processing power, huge data banks for information gathering, and a vast AI skill pool, including in healthcare. The sophistication of AI methodologies, their acceptance, and their influence on the community are all expected to change dramatically as a result.
Fig. 2. Publication frequency by year on the PubMed website using the search terms 'Application of Artificial Intelligence in Mental Health'.
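A year-by-year publication count such as the one summarized in Fig. 2 can be approximated directly against NCBI's public E-utilities interface; the sketch below assumes network access, and the counts it returns will vary with the date of the query and with how the search term is phrased.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
TERM = "Application of Artificial Intelligence in Mental Health"  # search term from the caption

# Count PubMed records per publication year for the search term.
for year in range(2015, 2023):
    params = {
        "db": "pubmed",
        "term": TERM,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "retmode": "json",
    }
    response = requests.get(ESEARCH, params=params, timeout=30).json()
    print(year, response["esearchresult"]["count"])
```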
6 Conclusion
Finding the current relationships between mental illness and latent variables requires huge, high-quality datasets. Acquiring such deep phenotypes and large datasets poses a challenge for mental health research and needs to be a collaborative priority. Deep learning and transfer learning techniques will be progressively essential to handle these complicated records, and the future challenge will be in ensuring that these models are clinically interpretable. The clinical usefulness of these sources of rich data requires more careful consideration, and studies on the use of social media need to be held to more advanced methodological standards. Eventually, by means of AI, insights are derived from data that might help to facilitate diagnosis and treatment. But it is essential to consider the practicality of these insights and whether or not they can be translated and applied in the clinic. With the advancement of time, through abacuses, calculators, computers and so on, technology has become a part of mankind. The advantage of tools that permit us to offload cognitive tasks onto our environment is that it allows our minds to focus on more complicated problems.
AI is not just about building automobiles that drive themselves or 'AI doctors' that diagnose autonomously; it is about empowering humans. If we can see more and understand more, beyond the constraints with which we are born, then we can do more and better understand what lies on the horizon. Mental health care is no exception.
References 1. Miles, O., West, R., Nadarzynski, T.: Health Chatbots acceptability moderated by perceived stigma and severity: a cross-sectional survey. Dig. Health 7, 205520762110630 (2021). https:// doi.org/10.1177/20552076211063012 2. Park, G., Lee, H., Lee, M.: Artificial Intelligence-based Healthcare Interventions: a systematic review. Korean J. Adult Nurs. 33(5), 427 (2021). https://doi.org/10.7475/kjan.2021.33.5.427 3. Yadav, K., Hasija, Y.: IOT and Big Data Inter-Relation: A boom in biomedical healthcare. In: 2022 IEEE Delhi Section Conference (DELCON) (2022). https://doi.org/10.1109/delcon 54057.2022.9753239 4. Rosenfeld, A., et al.: Big data analytics and artificial intelligence in mental healthcare. In: Khanna, A., Gupta, D., Dey, N. (eds.) Applications of Big Data in Healthcare, pp. 137–171. Academic Press (2021). https://doi.org/10.1016/b978-0-12-820203-6.00001-1 5. Horn, R.L., Weisz, J.R.: Can artificial intelligence improve psychotherapy research and practice? Adm. Policy Ment. Health Mental Health Serv. Res. 47(5), 852–855 (2020). https://doi. org/10.1007/s10488-020-01056-9 6. Baptiste, M., Moinuddeen, S.S., Soliz, C.L., Ehsan, H., Kaneko, G.: Making sense of genetic information: the promising evolution of clinical stratification and precision oncology using machine learning. Genes 12(5), 722 (2021). https://doi.org/10.3390/genes12050722 7. Uusitalo, S., Tuominen, J., Arstila, V.: Mapping out the philosophical questions of AI and clinical practice in diagnosing and treating mental disorders. J. Eval. Clin. Pract. 27(3), 478– 484 (2020). https://doi.org/10.1111/jep.13485 8. Wang, T., Park, J.: Design and implementation of intelligent sports training system for college students’ mental health education. Front. Psychol. 12, 634978 (2021). https://doi.org/10.3389/ fpsyg.2021.634978 9. Rubeis, G.: IHealth: the ethics of artificial intelligence and big data in mental healthcare. Internet Interv. 28, 100518 (2022). https://doi.org/10.1016/j.invent.2022.100518 10. Chattopadhyay, S.: A neuro-fuzzy approach for the diagnosis of depression. Appl. Comput. Inform. 13(1), 10–18 (2017). https://doi.org/10.1016/j.aci.2014.01.001 11. Bain, E.E., et al.: Use of a novel artificial intelligence platform on mobile devices to assess dosing compliance in a phase 2 clinical trial in subjects with schizophrenia. JMIR mHealth uHealth 5(2), e18 (2017). https://doi.org/10.2196/mhealth.7030 12. Wahle, F., Kowatsch, T., Fleisch, E., Rufer, M., Weidt, S.: Mobile sensing and support for people with depression: a pilot trial in the wild. JMIR mHealth uHealth. 4(3), e111 (2016). https://doi.org/10.2196/mhealth.5960 13. Ricard, B.J., Marsch, L.A., Crosier, B., Hassanpour, S.: Exploring the utility of communitygenerated social media content for detecting depression: an analytical study on Instagram. J. Med Internet Res 20(12), e11817 (2018). https://doi.org/10.2196/11817 14. Cook, B.L., Progovac, A.M., Chen, P., Mullin, B., Hou, S., Baca-Garcia, E.: Comput Math Methods Med. 2016, 8708434 (2016). https://doi.org/10.1155/2016/8708434 15. Tung, C., Lu, W.: Analyzing depression tendency of web posts using an event-driven depression tendency warning model. Artif. Intell. Med. 66, 53–62 (2016). https://doi.org/10.1016/j. artmed.2015.10.003
16. Deshpande, M., Rao, V.: Depression detection using emotion artificial intelligence. Proc. Int. Conf. Intell. Sustain. Syst. ICISS 2017, 858–862 (2017). https://doi.org/10.1109/ISS1.2017. 8389299 17. Aldarwish, M.M., Ahmad, H.F.: Predicting depression levels using social media posts. In: Proceedings of the - 2017 IEEE 13th International Symposium Autonomous Decentralized System ISADS 2017, pp. 277–280 (2017). https://doi.org/10.1109/ISADS.2017.41 18. Gkotsis, G., Oellrich, A., Velupillai, S., Liakata, M., Hubbard, T.J.P., Dobson, R.J.B., et al.: Characterisation of mental health conditions in social media using informed deep learning. Sci. Rep. 7(1), 110 (2017). https://doi.org/10.1038/srep45141 19. Nenadi´c, I., Dietzek, M., Langbein, K., Sauer, H., Gaser, C.: BrainAGE score indicates accelerated brain aging in schizophrenia, but not bipolar disorder. Psychiatry Res. Neuroimaging. 266(March), 86–89 (2017). https://doi.org/10.1016/j.pscychresns.2017.05.006 20. Kalmady, S.V., Greiner, R., Agrawal, R., Shivakumar, V., Narayanaswamy, J.C., Brown, M.R.G., et al.: Towards artificial intelligence in mental health by improving schizophrenia prediction with multiple brain parcellation ensemble-learning. NPJ Schizophr. 5(1), 2 (2019). https://doi.org/10.1038/s41537-018-0070-8 21. Patel, M.J., Andreescu, C., Price, J.C., Edelman, K.L., Reynolds, C.F., Aizenstein, H.J.: Machine learning approaches for integrating clinical and imaging features in late-life depression classification and response prediction. Int. J. Geriatr. Psychiatry 30(10), 1056–1067 (2015). https://doi.org/10.1002/gps.4262 22. Drysdale, A.T., et al.: Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat. Med. 23(1), 28–38 (2016). https://doi.org/10.1038/nm.4246 23. Chekroud, A.M., Gueorguieva, R., Krumholz, H.M., Trivedi, M.H., Krystal, J.H., McCarthy, G.: Reevaluating the efficacy and predictability of antidepressant treatments: a symptom clustering approach. JAMA Psychiatry 74(4), 370–378 (2017). https://doi.org/10.1001/jam apsychiatry.2017.0025 24. Zilcha-Mano, S., Roose, S.P., Brown, P.J., Rutherford, B.R.: A machine learning approach to identifying placebo responders in late-life depression trials. The Am. J. Geriat. Psychiatry 26(6), 669–677 (2018). https://doi.org/10.1016/j.jagp.2018.01.001 25. Chekroud, A.M., Zotti, R.J., Shehzad, Z., Gueorguieva, R., Johnson, M.K., Trivedi, M.H., et al.: Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 3(3), 243–250 (2016). https://doi.org/10.1016/S2215-0366(15)00471-X 26. Fernandes, A.C., Dutta, R., Velupillai, S., Sanyal, J., Stewart, R., Chandran, D.: Identifying suicide ideation and suicidal attempts in a psychiatric clinical research database using natural language processing. Sci. Rep. 8(1), 7426 (2018). https://doi.org/10.1038/s41598-018-257 73-2 27. Jackson, R.G., Patel, R., Jayatilleke, N., Kolliakou, A., Ball, M., Gorrell, G., et al.: Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open 7(1), e012012 (2017). https://doi.org/10.1136/bmjopen-2016-012012 28. Arun, V., Prajwal, V., Krishna, M., Arunkumar, B.V., Padma, S.K., Shyam, V.: A boosted machine learning approach for detection of depression. In: Proceedings of the 2018 IEEE Symposium Series Computing Intelligence SSCI. 2018, pp. 41–7 (2018). https://doi.org/10. 1109/SSCI.2018.8628945 29. 
Sau, A., Bhakta, I.: Artificial neural network (ANN) model to predict depression among geriatric population at a slum in Kolkata, India. J. Clin. Diagn. Res. 11(5), 1–4 (2017). https://doi.org/10.7860/JCDR/2017/23656.9762
30. Choi, S.B., Lee, W., Yoon, J.H., Won, J.U., Kim, D.W.: Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J. Affect. Disord. 231(January), 8–14 (2018). https://doi.org/10.1016/j.jad.2018. 01.019 31. Nica, E., Kliestik, T., Sabie, O.M., Ioanei Gatan, M.L.: Socio-affective technologies for psychological health: Emotional artificial intelligence in empathetic robots. Am. J. Med. Res. 7(2), 9 (2020). https://doi.org/10.22381/ajmr7220201
Cold Rolling Mill Energy Consumption Prediction Using Machine Learning
Danilo G. de Oliveira1, José Francisco S. Filho1, Fabiano Miranda1, Pedro H. Serpa2, and Rafael Stubs Parpinelli2(B)
1 ArcelorMittal Vega, São Francisco do Sul, SC, Brazil
{danilo.oliveira,jose.francisco}@arcelormittal.com.br
2 Santa Catarina State University, Graduate Program in Applied Computing, Joinville, SC, Brazil
[email protected]
Abstract. Even though energy consumption has a significant impact on the operational cost of tandem cold mills (TCM) of steel strips, not enough attention has been given to this important consumable throughout the years. Machine learning techniques are becoming extremely common in the steel industry due to the high level of automation of the segment and the large databases available. This work proposes a complete system capable of handling input data, training a machine learning algorithm, predicting the energy consumption of a TCM, and evaluating results. A performance comparison of Artificial Neural Network (ANN) and Random Forest (RF) algorithms with an existing statistical model concludes that the RF outperforms the other two on a product-to-product basis and on a monthly basis. Actual model application has also been simulated, indicating that the proposed system is adequate to handle energy prediction.
Keywords: Steel Industry · Energy Prediction · Cold Rolling Mill · Univariate Regression

1 Introduction
The cold rolling of steel strips is one of the highest consumers of electric energy in the steel industry, and this expense is one of the largest items in the yearly budget of the business, playing a significant role in the operational cost. The tandem cold rolling mill (TCM) studied in this document, composed of four stands and represented in Fig. 1, has a total installed electrical power of 27.0 MW, equivalent to approximately 45,000 family houses, and is the highest consumer of electric energy in the state of Santa Catarina, Brazil. Such high power requirements rest in the fact that steel presents a high resistance to plastic deformation. For this very reason, steel is the material of choice of the automotive industry and is a crucial element in reinforcing the
automobile structure to improve the safety of the driver and other occupants in the event of a car crash [2]. Additionally, to achieve the required mechanical properties to comply with the safety regulations imposed on the automotive industry around the globe [2], the strip thickness must be reduced by up to 85% in the cold rolling process, requiring a considerable amount of energy.
Fig. 1. Representation of a tandem cold mill.
For all these reasons, predicting energy consumption according to the line and product conditions can have a significant impact on the production cost and, thus, on the profitability of the business. The TCM studied in this work uses an online rolling model for the prediction of several process set points, including electric power, which yields electric energy when integrated over time and multiplied by the efficiency factors. The online model constantly monitors process data and compares them to previous predictions. Differences (or model inaccuracies) are fed into an adaptation algorithm designed to adjust several model parameters in order to compensate for these errors, as described by Lee and Lee (2000) [6]. This approach severely limits the application of this model for long-term predictions, since it depends on many process variables that are only available a few minutes ahead in time. An alternative approach to the classical rolling models and finite element modeling is Machine Learning (ML), a field of research that allows the development of computational tools for complex problem-solving [8]. The motivation for the development of ML algorithms, according to de Castro et al. (2007), is to provide alternative solution algorithms to problems that could not be satisfactorily resolved by traditional techniques [3]. ML techniques are data-driven approaches and, for this reason, are highly dependent on data gathering. This work proposes the development of an ML model to predict the electric energy consumption of an entire month of production of a tandem cold mill of thin steel strips. As this is a univariate regression problem, the authors compare the performance of two different approaches: a feed-forward Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) and a Random Forest (RF). A factorial grid search approach was defined to select hyperparameters for both algorithms.
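The relation between predicted electric power and reported energy consumption mentioned above can be written compactly as follows; the efficiency factor and the discrete sampling notation are introduced here only for illustration and are not symbols taken from the mill's online model.

```latex
E \;=\; \eta \int_{t_0}^{t_1} P(t)\,\mathrm{d}t
  \;\approx\; \eta \sum_{i} P_i\,\Delta t_i ,
```

with the power samples P_i expressed in kW and the sampling intervals Δt_i in hours, the energy E comes out directly in kWh.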
The rest of the paper is organized as follows. Section 2 describes the problem scenario. Section 3 contains the dataset details and the proposed ML approach. Section 4 describes the results and analysis. The conclusions and future work directions are shown in Sect. 5.
2 Cold Rolling Mill and Energy Consumption
Cold rolling is an efficient process for producing thin steel sheets for subsequent stamping, where quality is a critical factor in terms of microstructure, surface texture, and uniformity of mechanical properties and thickness [4]. There are several configurations of TCMs around the world. The main differences between them are in the number of rolls in each stand, which can range from two to twenty, and in the number of stands itself, which typically ranges from one to seven [11]. Figure 1 shows the configuration of the cold strip mill studied in this work, which has four stands, each with four rolls: two work rolls (in direct contact with the steel strip) and two backup rolls. Several theories for both cold rolling and hot rolling have been proposed in the last century after the pioneering works of Sibel and von Karman in 1924 and 1925 [1], in which they developed the first equations for predicting the rolling force and torque. The most robust and complete rolling theory proposed in the 20th century is, without a doubt, the one proposed by Orowan in 1943 [9]. From the perspective of Liu et al. (1985), despite being considered the "exact" theory of rolling, the approach Orowan adopted is based on a number of assumptions, including a plane state of deformation and no elastic deformation of the steel sheet [7]. The complexity of Orowan's approach has led several researchers to develop analytical solutions based on simplifications of key complicating factors raised by the original 1943 work [1]. As an alternative to the traditional analytical solutions and the costly finite element method, modern approaches such as Machine Learning (ML) are becoming increasingly common in the steel industry [5]. ML can be described as the assignment of a specific task to a computer program, where the machine learns if there is a measurable performance criterion which improves over time as the program acquires experience in performing the assigned task [10]. The machine learning process is thus based on data, which positions the steel industry in a favorable condition for the application of such techniques because, as early as the 1940s and 1950s, the steel industry made major investments in the instrumentation of the finishing processes and in the application of early computers and data storage to the acquisition and processing of data and to advanced modeling [11]. In recent years, several works have focused on the application of Artificial Neural Networks (ANN), Random Forests (RF), and other ML algorithms, taking advantage of their versatility and great ability to generalize in complex and non-linear problems, reaching satisfactory results in many problems in steel finishing lines [12]. Hu et al. (2019) conducted a very comprehensive literature review on the use of ML applied to the steel industry, listing more than 60 references in the area [5]. The authors noted that several papers
were published using different techniques to predict rolling forces and torques, flatness and its actuators, production schedule, thickness, strip and roll temperatures, mechanical properties of the finished material, and internal stresses, among other process and quality variables; however, none of these references is energy-related [5]. As previously mentioned, both algorithms are data-driven, and data quality is of major importance for any of these techniques to succeed. The next section describes the procedures defined in this work for the adequate selection of the database.
3 Proposed Approach
For the work presented in this paper, three separate databases were initially available: Level 2, IBA (which is a commercial name defined by the developer of the system), and the Plant integration and management system (PIMS). These datasets were collected directly from the TCM databases, comprising 14 months of production, and are protected by non-disclosure agreements. The first one, the Level 2 (L2) database, is the official production database responsible for storing basic product information such as customer requirements, measured dimensions, date of production, and quality information. It is organized on a coil-to-coil basis, where each line of the dataset corresponds to one rolled coil (product), and it is used as the reference in this work to verify the other datasets. IBA is a dataset resulting from the interaction of an electronic board connected directly to the programmable logic controller (PLC) of the production line. PIMS, on the other hand, acquires data not with optical fiber but with regular Ethernet cables, and at a sampling interval of only 1.0 s (for some variables; most are at an even slower sampling rate) in order to increase storage capacity. One important positive aspect of the PIMS database is that it is connected directly to the system responsible for reporting the TCM energy consumption (called CCK), which gives PIMS a great advantage over IBA. In order to combine the advantages and reduce the weaknesses of each database, PIMS and IBA were merged into one larger dataset presenting reliable process data and actual energy readings from the entire grid. With the objective of reducing the model's dependency on short-term process variables, the authors decided to consider as inputs only the most basic product data and the process information that could be estimated from scheduling information: entry coil thickness, exit coil thickness, total coil reduction, coil width, coil weight, coil hardness, processing time and average speed (detailed in Table 1). A general system flow is indicated in Fig. 2. The flow chart starts with data gathering from the L2, PIMS and IBA databases, which is then split into training and testing datasets. The next step is data processing and hyperparameter tuning, followed by model training and validation. If the desired accuracy is not achieved, data processing and hyperparameter tuning must be restarted. If accuracy is achieved, the model is trained and ready for regular use. The model can be used with both schedule and production data, providing the energy prediction for future or past coils.
Table 1. Selected input and output variables for energy prediction.
Item | Description | Domain | Range | Unit | Usage
EntrTh | Entry coil thickness | Real | [1.8–4.8] | mm | Input
ExitTh | Exit coil thickness | Real | [0.35–2.70] | mm | Input
Red | Total coil reduction | Real | [35.0–85.0] | % | Input
Wd | Coil width | Real | [750–1878] | mm | Input
Wg | Coil weight | Real | [0.6–37.5] | t | Input
SH | Strip hardness | Real | [64.0–150.0] | kgf/mm² | Input
CRT | Coil running time | Real | [0.002–4.600] | h | Input
AvgS | Average exit line speed | Real | [85.0–910.0] | mpm | Input
Energy Consumed | Electric Energy | Real | [0–4000] | kWh | Output
The model application in both cases is further explored in the next sections, along with hyperparameter definition, model development, and testing methods. In order to adequately tackle the prediction of energy consumption of a TCM, a complex, non-linear regression problem, two different ML algorithms were selected for comparison purposes: the multi-layer perceptron ANN and the RF. They have been implemented taking advantage of the TensorFlow Python package, which allows a hassle-free application of well-developed algorithms with different hyperparameters for tuning the model to this specific problem. A factorial combination grid search method was defined to set the model architecture and hyperparameters, allowing the comparison of two well-adjusted models. Through the grid search results, the feed-forward MLP ANN architecture was defined with three hidden layers of 30 neurons each, the Adam optimizer with a learning rate of 0.01, the hyperbolic tangent (tanh) activation function, and MSE as the loss function; the RF hyperparameters were set to 100 trees, 5 minimum examples, and a maximum depth of 32. Another comparison used in this work is the current practice of budget preparation elaborated by the engineers of the operational team of the TCM. It is based on several years of historical evaluation and consists of multiplying a constant by the production weight to estimate the expected energy consumption of the process. This is then multiplied by the forecast energy price for the next fiscal year to reach the financial budget for this important cost figure in the company's financial report. This procedure has been used for several years, and the averaging procedure used to calculate the mentioned constant is frequently improved, making its performance a very good benchmark for this model.
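The hyperparameters listed above translate into model definitions along the following lines. This is a minimal sketch assuming a generic tabular feature matrix; it uses scikit-learn's RandomForestRegressor with roughly equivalent settings in place of the TensorFlow-based random forest actually used by the authors, and the random data only stands in for the confidential coil dataset.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestRegressor

N_FEATURES = 8  # entry/exit thickness, reduction, width, weight, hardness, time, speed

# Feed-forward MLP: three hidden layers of 30 neurons, tanh activation,
# Adam optimizer with learning rate 0.01 and MSE loss (as selected by grid search).
ann = tf.keras.Sequential([
    tf.keras.Input(shape=(N_FEATURES,)),
    tf.keras.layers.Dense(30, activation="tanh"),
    tf.keras.layers.Dense(30, activation="tanh"),
    tf.keras.layers.Dense(30, activation="tanh"),
    tf.keras.layers.Dense(1),  # predicted coil energy in kWh
])
ann.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss="mse")

# Random forest with 100 trees, maximum depth 32 and a minimum of 5 examples
# per leaf (approximating the reported "5 minimum examples").
rf = RandomForestRegressor(n_estimators=100, max_depth=32, min_samples_leaf=5)

# Toy usage with random data standing in for the coil dataset.
X = np.random.rand(256, N_FEATURES)
y = np.random.rand(256) * 4000.0
ann.fit(X, y, epochs=2, verbose=0)
rf.fit(X, y)
```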
Fig. 2. Flow chart of the energy prediction system.
4 Results and Analysis: Model Training and Energy Prediction
The 14 months of the dataset have been split into two datasets: the initial 6 months were reserved for model training and the remaining 8 months for model testing. This uneven distribution of the database was defined based on the fact that each month produces approximately 4000 coils. Thus, the initial 6 months of data provided approximately 20000 coils covering the entire product mix, which was sufficient for adequate model training. After model training, the remaining 8 months of data were presented to each model for comparison. The scatter plot in Fig. 3 shows that both the ANN (A) and the RF (B) regressors present similar results on the energy prediction, with a slight numerical advantage to the ANN. However, Fig. 3 also shows that the budget estimation method (C) presents much worse results than the two regressors in this coil-to-coil comparison. This conclusion is confirmed by the evaluation of the REC curve in Fig. 4, which shows that both regressors present similar results over the entire dataset, with the budget estimation method being considerably worse. When the month comparison comprises all the rolled coils, including those for which PIMS has no energy data, Fig. 5 shows that both the ANN and RF regressors have very similar results. The RF presents a numerical advantage over the ANN model, with the RF reaching an MAE of 6.7% and an RMSE of 7.9%, while the ANN achieved an MAE of 7.2% and an RMSE of 8.3% over the 8 months of the database. The benchmark has also outperformed the ANN
in this comparison, presenting an MAE of 7.0%, but the worst RMSE result with 8.6%. The RF presented another interesting advantage compared to the ANN, which was the training cost. Throughout the entire set of simulations executed to define the model architecture, benchmark tests, and experiments, it was noticeable that the required training time for the RF was 6 to 8 times shorter than that of the ANN for the same database on a regular computer with 16 GB of RAM and a 4-core, 1.8 GHz processor.
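For reference, the REC (regression error characteristic) curve used in Fig. 4 reports, for each error tolerance, the fraction of coils whose absolute prediction error stays within that tolerance. The sketch below computes such a curve on synthetic numbers, since the actual mill data are protected by non-disclosure agreements.

```python
import numpy as np

def rec_curve(y_true, y_pred, tolerances):
    """Fraction of samples whose absolute error is within each tolerance."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return [(abs_err <= tol).mean() for tol in tolerances]

# Synthetic stand-in for measured and predicted coil energies (kWh).
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 4000, size=1000)
y_pred = y_true + rng.normal(0, 150, size=1000)

tolerances = np.linspace(0, 500, 26)
coverage = rec_curve(y_true, y_pred, tolerances)
for tol, cov in zip(tolerances[::5], coverage[::5]):
    print(f"|error| <= {tol:5.0f} kWh -> {cov:.2%} of coils")
```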
Fig. 3. Comparison of energy measurements against predictions for ANN model (A), RF model (B), and budget estimation (C).
Fig. 4. REC curve of ANN model (blue), RF model (orange), and budget reference (green).
Fig. 5. Monthly energy consumption comparison of the ANN model (blue), RF model (orange), and budget reference (green) with the real energy measurement.
5 Conclusions and Future Work
The tandem cold rolling mill of steel strips is a high-energy-demanding process due to the high loads required to deform this strong material. Electric energy has a significant impact on the financial balance of any steel enterprise, and being able to accurately predict it provides a strategic advantage in cost control, especially at this moment of prices on the rise all around the world. The objective of this work was to develop a machine learning model capable of predicting the energy consumption of a full month of production of a tandem cold mill and to make such a model as independent as possible of very short-term process variables. Artificial neural networks and random forests have been implemented for comparison purposes, and a grid search procedure on the hyperparameters of both algorithms was carried out for adequate selection and optimized results. A thorough comparison of the results with test data showed that the RF performed
better than the ANN. The lower training time also contributed to the decision to select the RF algorithm to be applied in the real scenario. The RF model was also compared with a statistical model which is currently being used by the operational team for energy budget estimation. The RF outperformed this method as well. However, the added value of the RF model was only reachable with extensive data manipulation and the merging of two independent databases to improve the quality of the acquired data set. With the recent popularization of Industry 4.0 concepts, sensors and data acquisition systems are becoming less and less expensive, making it highly advisable for the company to consider upgrading its energy data acquisition and storage to improve data quality and reliability. Additional investigations on the database will also be carried out to verify whether more information can be extracted to enhance the quality of the training dataset.
Acknowledgements. This work received financial support from the ArcelorMittal Brazil R&D Group, from the Coordination for the Improvement of Higher Education Personnel - CAPES - Brazil (PROAP/AUXPE) 0093/2021, and from the FAPESC agency.
Virtual Reconstruction of Adaptive Spectral and Spatial Features Based on CNN for HSI Classification
Maissa Hamouda1,2(B) and MedSalim Bouhlel2
1 ISITCom, Sousse University, Sousse, Tunisia
maissa [email protected]
2 SETIT Laboratory, Sfax University, Sfax, Tunisia
[email protected]
Abstract. The field of image processing is becoming very important, given its great usefulness for safety, the environment, the economy, and health monitoring. Hyperspectral remote sensing (HSI) images are very useful for mapping and environmental monitoring. Indeed, HSIs are a set of gigantic spectral bands, where each spectral band requires specific processing to combat the problem of high dimensionality and limited ordinary training samples. Fortunately, with the development of Machine Learning and Deep Learning, object recognition in HSI has become possible. HSI data is very heterogeneous and sometimes shifted due to the time and environment of capture. To classify a pixel of an HSI, it is necessary to study its spectro-spatial neighbors and the possibility of offset errors. In this paper, we propose a framework for analysis, 3D virtual reconstruction, and recognition of HSI objects, based on convolutional neural networks (CNN), with edge-adaptive spatial feature extraction. This framework makes it possible to predict all the possibilities of the fundamental nature of the object. It also makes it possible to correct errors at the edges, which are due to the extraction of spatial data of static sizes. These two proposed approaches improve the precision while guaranteeing a very reasonable computation time. Tests were applied to two public HSIs. The results proved the effectiveness of the approaches, which can still be improved in the future.
1 Introduction
Remote sensing is a set of techniques used to remotely determine the properties of radiation emitted or reflected by a natural or artificial object [1]. Remote sensing technology involves the entire process of capturing and recording radiant energy emitted or reflected from an object, processing the resulting data [2], and finally analyzing the final data [3]. This process uses sensors, such as cameras, lasers, radars, and sonars, commonly carried on mobile platforms such as airplanes and satellites [4].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 152–160, 2023. https://doi.org/10.1007/978-3-031-27440-4_15
Hyperspectral Satellite Image (HSI) classification is both very important and difficult [5]. In fact, there are different issues depending on the size, quantity, and location of pixels in the HSI. The HSI problems are various, among which we cite redundancy and unmixing problems [6], correlation and non-linearity problems [7], and data dimensionality and sampling imbalance problems [8]. Convolutional Neural Networks (CNN) are very efficient in image classification because they allow each pixel to be read and processed with its neighbors [9]. Often, the CNN is applied to equally sized spatial blocks extracted from the image, and each block is processed by the CNN protocol to obtain a prediction for the active pixel (the pixel in the center of the block). However, when using fixed-size spatial data, the information at some locations can be wrong.
Previous work using CNN has proposed various HSI processing methods, for example, CNN-based methods with spectral feature reduction. In [10] the authors suggested an HSI classification (HSIc) method by CNN based on Dual Extraction. In [5] the authors suggested an HSIc-CNN method based on Smart Spectral Feature Extraction. In [2] the authors suggested a CNN-based HSIc approach in which convolution filters are learned adaptively from the data via unsupervised clustering. This approach (AKSCCk) is based on the CKMeans and CNN algorithms: one is used to generate filters without knowing the number of clusters in advance, and the other is used to classify. By using an adaptive kernel, the proposed approach achieved better classification accuracy compared to previous approaches. The experimental results show the efficiency and feasibility of the AKSCCk approach. In [11] the authors suggested an adaptive network, ADGAN, for HSIc. To solve the training data imbalance problem, the discriminator is first adjusted to a single classifier so that they are consistent with each other. Then, an adaptive drop block (AdapDrop) was proposed as a regularization technique used by generators and discriminators to mitigate the mode collapse problem. AdapDrop generates adaptively shaped drop masks instead of fixed-size boxes, reducing DropBlock restrictions when dealing with differently shaped ground objects.
Previous work using CNN has also proposed CNN-based methods with adaptive parameters. In [9] the authors suggested an HSIc-CNN method based on adaptive kernel and batch, using the subdivision layer. In [12] the authors suggested an adaptive kernel- and batch-based HSIc-CNN method, using the mean subdivision layer. In [13] the authors suggested an approach based on integrating local multiple learning (LML) into the fully connected layers of a convolutional neural network (CNN), which can learn representative features. A regularization was adopted to exploit the prediction label distribution of the ordinary training samples and solve the overfitting problem. In [14] the authors suggested an HSIc method based on the attention of M3DCNN. First, a virtual HSI sample is created using a mixing algorithm to extend the original dataset; the sample size of the extended dataset is twice that of the original dataset, which greatly reduces the overfitting caused by small HSI samples. Second, the structure of the 3DCNN is improved: a Convolutional Block Attention Module (CBAM) has been added between each 3D
Fig. 1. Second and Third Steps
convolution layer and ReLU layer, using a total of three CBAMs to highlight the spectral and spatial dimension identification capabilities of the HSI and to drop features randomly. Finally, the softmax classifier is used to reduce the spectral and spatial features and obtain the final classification result.
In this article, we propose a new approach, which has never been used before, for intelligent HSI classification. This approach is called VirSSF-CNN, with multiple layers, allowing the nature of pixels to be detected while estimating all possible errors. The method comprises several parts: (1) extraction of spectral data vectors and fusion of pixels to obtain a single spatio-spectral band; (2) application of five algorithms to create five virtual layers; (3) reconstruction of 3D images; (4) extraction of spatial data of edge-adaptive sizes; (5) convolution and processing of each block until recognition of the pixel; (6) placement of the pixel in its correct position; (7) fusion of the 5 virtual spectral bands and labeling.
2 Proposed Approach
Hyperspectral imaging is a technology for obtaining the image of a scene in a large number of spectral bands that are both narrow and contiguous [15]. HSI has been used in aeronautical imaging since 1990 and on observation satellites since the end of 2000. It is used in various chemical and industrial processes, mainly for determining the quality of products [16].

2.1 Extraction of Spectral Data Vectors and Fusion of Pixels to Obtain a Single Spatial-Spectral Band
An HSI is a set of spectral bands, where each one carries information about the image taken at a certain time [17]. The first step is to extract the spectral bands. Then we merge these spectral bands to obtain a single one. The fusion we apply is the average.
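A minimal NumPy sketch of this fusion step, assuming the HSI cube is stored as a (rows, columns, bands) array; array shapes and variable names are illustrative, not taken from the paper.

```python
# Band fusion by averaging over the spectral axis (illustrative shapes).
import numpy as np

hsi_cube = np.random.rand(83, 86, 204)   # stand-in for a SalinasA-like cube
fused_band = hsi_cube.mean(axis=2)       # average over the spectral dimension
print(fused_band.shape)                  # (83, 86): a single spatio-spectral band
```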
Fig. 2. Fourth and Fifth Steps
2.2 Apply Five Algorithms to Create Five Virtual Layers
The main contribution of this article lies at this stage. Indeed, at this level, we propose the virtual creation of 4 new layers, while the fifth is a copy of the input image. In effect, a new layer is created from the input image each time by shifting it one pixel: once left, once right, once up, and once down. This helps to solve the unmixing problem and the limited number of samples (Fig. 1).

2.3 3D Image Reconstruction
After creating the 5 layers of our image, we rebuild the new image X5. To do this, we have chosen to place the five layers superimposed. Let X be the input image from the previous step. The first created spectral band is placed at the front and the last at the end, with X in the middle. X is composed of R×C pixels P, where i=1:R and j=1:C are the coordinates of P. For each pixel P of the image X, we create a new virtual image X5, which has the same size R×C with B=5 as the third dimension:

$X5_P = X_{P_{i-1,j}} + X_{P_{i+1,j}} + X_{P_{i,j}} + X_{P_{i,j-1}} + X_{P_{i,j+1}}$
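A possible NumPy sketch of Steps 2 and 3 (virtual layers and 3D reconstruction). Note that np.roll wraps values around at the borders; this is only one of several possible ways to realise the one-pixel shifts and is an assumption, not the authors' exact implementation.

```python
# Build four one-pixel-shifted virtual layers plus the original, then stack them.
import numpy as np

X = np.random.rand(83, 86)            # fused spatio-spectral band from Sect. 2.1 (stand-in)
layers = [
    np.roll(X, -1, axis=0),           # shifted by one pixel along the row axis
    np.roll(X, 1, axis=0),            # shifted the other way along the row axis
    X,                                # the original copy, placed in the middle
    np.roll(X, -1, axis=1),           # shifted by one pixel along the column axis
    np.roll(X, 1, axis=1),            # shifted the other way along the column axis
]
X5 = np.stack(layers, axis=2)         # virtual image of shape (R, C, 5)
print(X5.shape)
```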
2.4 Edge-Adaptive Spatial Data Extraction
The pixels that we want to study must be extracted together with their neighboring pixels. That is, if the desired block size is 3×3, we need to extract one pixel to the left, one to the right, one at the top, one at the bottom, and also the diagonal pixels. Thus, the active pixel is surrounded by all its neighbors. Sometimes, if we choose a larger block to have more precision on the content by taking the context into account, the spatial data at the image limits is filled with zeros or random values. This degrades the result. In order to solve this problem, we added a second contribution to our approach: extracting size-adaptive spatial data at the boundaries (edges of the image) (Fig. 2). That is, instead of extracting spatial data of equal sizes and filling the empty pixels at the edges with 0s, which gives false results during classification, we propose to use a maximum block size in the middle of the image and then decrease it according to the location of the pixel.
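A minimal sketch of the edge-adaptive extraction idea: instead of zero-padding, the patch bounds are clipped at the image border so the block shrinks near the edges. The function and variable names below are illustrative.

```python
# Edge-adaptive patch extraction: the neighbourhood is clipped at the borders.
import numpy as np

def adaptive_patch(band, i, j, max_half=3):
    """Return the neighbourhood of pixel (i, j), clipped at the image borders."""
    r0, r1 = max(0, i - max_half), min(band.shape[0], i + max_half + 1)
    c0, c1 = max(0, j - max_half), min(band.shape[1], j + max_half + 1)
    return band[r0:r1, c0:c1]

band = np.random.rand(83, 86)
print(adaptive_patch(band, 40, 40).shape)   # full 7x7 block in the middle
print(adaptive_patch(band, 0, 0).shape)     # reduced 4x4 block at the corner
```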
2.5 Convolution and Processing of Each Block Until Recognition of the Pixel
For the convolutional processing [12], we chose the classic model shown in the figure: (1) Convolution (L1), ReLU (L2), subdivision (L3) {×2}; (2) Convolution, ReLU {×1}; (3) Fully connected (L4) {×2}. The CNN layers are described with the following notation: b: block, l: layer, P: pixel, n: neuron, N: output neurons.
L1 The convolution multiplies the values of a first matrix by the corresponding weights of a second matrix and sums the products. These two matrices are the data block and the filter; if the block size $bloc^{l-1}_{Pixel}$ of the previous layer was m × n, the resulting block $bloc^{l}_{Pixel}$ is (m − 1) × (n − 1). It is defined by: $b^{l}_{P} = f^{l}\big(\sum_{n \in X^{l}} b^{l-1}_{n} \otimes filter^{l}_{n,P} + Bias^{l}_{P}\big)$.
L2 The rectified linear unit (ReLU) refers to a nonlinear real function; it is defined by: $f(X) = \max(0, X)$.
L3 The subdivision receives multiple spatial data as input and applies the maximum operation to each of them: $b^{l}_{P} = f^{l}\big(\max(b^{l-1}_{n-sizeOfFilter}, \ldots, b^{l-1}_{n+sizeOfFilter})\big)$.
L4 The full connection performs a linear transformation of the input vector through a weight matrix; a nonlinear activation function f is then applied to the product, where N is the number of output neurons. It is defined by: $b^{l}_{P} = f^{l}\big(\sum_{n=1}^{N^{layer-1}} b^{l-1}_{n} \times filter^{l}_{n,P} + Bias^{l}_{P}\big)$.
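One possible Keras realisation of the layer sequence (1)–(3) is sketched below; the filter counts, kernel sizes, fixed patch size, class count and optimizer are assumptions for illustration, not the configuration reported by the authors.

```python
# Illustrative CNN following the Conv/ReLU/pooling/fully-connected pattern above.
from tensorflow.keras import layers, models

def build_cnn(patch_size=7, n_classes=16):
    model = models.Sequential([
        layers.Input(shape=(patch_size, patch_size, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),  # L1 + L2
        layers.MaxPooling2D(2),                                   # L3
        layers.Conv2D(64, 3, activation="relu", padding="same"),  # L1 + L2
        layers.MaxPooling2D(2),                                   # L3
        layers.Conv2D(64, 3, activation="relu", padding="same"),  # (2) Conv + ReLU
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                     # L4
        layers.Dense(n_classes, activation="softmax"),            # L4 + softmax output
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_cnn().summary()
```

A fixed input patch is assumed here for simplicity; handling the edge-adaptive (variable-size) blocks of Sect. 2.4 would require an additional resizing or per-size-batching step.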
2.6 Placing Pixels in Their Positions, Merging the Five Spectral Bands and Labeling
We repeat the CNN protocol for each pixel (block of pixels) of each of the five layers of X5. At the end of this step, we obtain a new image Y also containing 5 layers. Finally, we choose the most frequent class value and look for its label in our learning base. The softmax function is used to convert a score into a probability in multiclass classification. For class = 1, 2, ..., N: $f(bloc_{class}) = \frac{e^{bloc_{class}}}{\sum_{n=1}^{N} e^{bloc_{n}}}$.
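A small sketch of the final fusion step, in which the five per-layer class maps are combined by majority vote per pixel; the helper function and toy data below are hypothetical.

```python
# Majority vote over the five virtual-layer predictions for each pixel.
import numpy as np

def fuse_layer_predictions(pred_stack):
    """pred_stack: integer class maps of shape (R, C, 5); returns the per-pixel mode."""
    R, C, L = pred_stack.shape
    fused = np.empty((R, C), dtype=int)
    for i in range(R):
        for j in range(C):
            values, counts = np.unique(pred_stack[i, j, :], return_counts=True)
            fused[i, j] = values[np.argmax(counts)]   # most frequent class value
    return fused

demo = np.random.randint(0, 6, size=(4, 4, 5))        # toy per-layer class maps
print(fuse_layer_predictions(demo))
```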
3 Experiments and Results
To perform tests and validate this work, we chose two public HSIs [18]: SalinasA and IndianPines.

3.1 Tests
We performed four testing configurations to thoroughly evaluate our proposed approach, and then completed the section with state-of-the-art methods. The tests are as follows:
Fig. 3. Classification result of “SalinasA”
1. Tests similar to the proposed architecture (our tests), marked in the tables by (1):
   (a) SCNN: Simple CNN;
   (b) SCNN Asd: Simple CNN with adaptive spatial data;
   (c) VCNN NoAsd: VCNN without adaptive spatial data;
   (d) VCNN: Proposed VCNN.
2. State-of-the-art works and results, marked in the tables by (2):
   (a) SFECNN [5]: Intelligent feature extraction based on the Softmax function and HSIc based on CNN;
   (b) DRFLVAT [13]: In-depth representative feature learning with virtual adversarial training for semi-supervised HSIc;
   (c) DCNN [19]: Double CNN for reduction and HSIc.

3.2 Results and Discussions
The results obtained (Fig. 3, Table 1) by classification of the 'SalinasA' dataset with the four methods (1) are 77.1224%, 90.4875%, 97.6744%, and 89.0866%, successively for methods 1, 2, 3, and 4. Indeed, the classes 'Brocoli green weeds 1' (391 testing samples) and 'Lettuce romaine 4wk' (616 testing samples) were the most accurate. However, the classes 'Lettuce romaine 5wk' (1525 testing samples) and 'Corn senesced green weeds' (1343 testing samples) are the least accurate. The results obtained (Fig. 4, Table 2) by classification of the 'IndianPines' dataset with the four methods (1) are 71.7337%, 80.5993%, 94.8633%, and 89.3793%, successively for methods 1, 2, 3, and 4. Indeed, the classes 'Oats' (20 testing samples) and 'Grass pasture mowed' (28 testing samples) were the most accurate. However, the classes 'Soybean mintill' (2455 testing samples) and 'Corn notill' (1428 testing samples) were the least accurate.

Table 1. Classification result for 'SalinasA'

Land Cover Type           | SCNN (1) | VCNN NoAsd (1) | SCNN Asd (1) | SFECNN (2) | DCNN (2)   | VCNN (1)
Brocoli green weeds 1     | 0.87857  | 0.8816         | 0.88809      | 0.8823     | 0.88992    | 0.882
Corn senesced green weeds | 0.65657  | 0.63288        | 0.64757      | 0.6245     | 0.6294     | 0.6317
Lettuce romaine 4wk       | 0.82634  | 0.82784        | 0.82308      | 0.8299     | 0.82708    | 0.8302
Lettuce romaine 5wk       | 0.59085  | 0.58085        | 0.57723      | 0.5761     | 0.57367    | 0.5762
Lettuce romaine 6wk       | 0.81119  | 0.81394        | 0.8095       | 0.8142     | 0.8107     | 0.8137
Lettuce romaine 7wk       | 0.78754  | 0.7817         | 0.783        | 0.7820     | 0.7769     | 0.7826
Kappa                     | 0.7233   | 0.8678         | 0.8847       | 0.8867     | 0.9171     | 0.9718
OA                        | 77.1224  | 89.0866        | 90.4875      | 90.6556    | 93.1634    | 97.6744
AA                        | 0.7680   | 0.8832         | 0.9036       | 0.9135     | 0.9342     | 0.9766
Time                      | 89.0705  | 452.3220       | 87.0772      | 0.0816e+04 | 0.2914e+04 | 444.3389
Table 2. Classification result for 'IndianPines'

Land Cover Type              | SCNN (1) | SFECNN (2) | SCNN Asd (1) | VCNN NoAsd (1) | DRFLVAT (2) | VCNN (1)
Alfalfa                      | 0.99696  | 0.9962     | 0.99639      | 0.99581        | 84.21       | 0.99591
Corn notill                  | 0.86612  | 0.8633     | 0.86453      | 0.86816        | 87.76       | 0.86414
Corn mintill                 | 0.91446  | 0.9211     | 0.90864      | 0.92773        | 87.05       | 0.91805
Corn                         | 0.98422  | 0.9838     | 0.97327      | 0.98548        | 67.02       | 0.97656
Grass pasture                | 0.96192  | 0.9628     | 0.94638      | 0.96442        | 97.93       | 0.95204
Grass trees                  | 0.92378  | 0.9265     | 0.92761      | 0.92815        | 98.29       | 0.92918
Grass pasture mowed          | 0.99796  | 0.9975     | 0.99762      | 0.99743        | 53.33       | 0.99748
Hay windrowed                | 0.96887  | 0.9631     | 0.96503      | 0.95931        | 99.74       | 0.95725
Oats                         | 0.99715  | 0.9973     | 0.99796      | 0.99791        | 81.25       | 0.9981
Soybean notill               | 0.90987  | 0.9098     | 0.9126       | 0.90931        | 86.50       | 0.90805
Soybean mintill              | 0.74736  | 0.7596     | 0.75306      | 0.7609         | 93.59       | 0.76313
Soybean clean                | 0.94102  | 0.9408     | 0.93802      | 0.94618        | 77.68       | 0.94269
Wheat                        | 0.9772   | 0.9768     | 0.97531      | 0.97956        | 99.39       | 0.979
Woods                        | 0.90082  | 0.8967     | 0.90416      | 0.88466        | 94.57       | 0.8862
Buildings Grass Trees Drives | 0.96182  | 0.9622     | 0.96233      | 0.96273        | 79.35       | 0.96297
Stone Steel Towers           | 0.99055  | 0.9897     | 0.99078      | 0.99082        | 100.00      | 0.99125
Kappa                        | 0.6013   | 0.7298     | 0.7306       | 0.8471         | 0.89        | 0.9278
OA                           | 71.7337  | 81.0131    | 80.5993      | 89.3793        | 90.45       | 94.8633
AA                           | 0.6718   | 0.7270     | 0.7470       | 0.8926         | 87.04       | 0.9338
Time                         | 326.0276 | 0.3418e+04 | 253.8959     | 1.9271e+03     | -           | 1.7724e+03
Fig. 4. Classification result of “IndianPines”
From this study, we noticed that the most precisely classified classes are the ones that appear least in the images, so they were almost perfectly precise. On the other hand, the classes with the fewest correctly identified pixels are those that occur most often in the image. Also, for computation time, VCNN (1) is not ideal, and SCNN Asd (1) is always superior for the test images. Finally, the results obtained are very acceptable compared to other methods. In future work, we will focus more on tuning the most apparent classes and on the computation time. After comparison, it is clear that the proposed method is the best. Also, the adaptive extraction of spatial data always performs better, whether combined with the virtual layers or with the simple method. We also compared state-of-the-art approaches to our proposed approach. Experiments have shown that VirSSF-CNN is very efficient and accurate at a reasonable computational cost.
4 Conclusion
The hyperspectral satellite image (HSI) is a series of huge spectral bands. Each HSI spectral band requires specific processing to solve the problem of high dimensionality and limited ordinary training samples. In particular, the development of machine learning and deep learning has enabled object recognition in HSI. In order to properly examine the properties of an element or a pixel, it is necessary to study its (adjacent) environment and the potential offset errors. This article proposes new methods for the analysis, 3D reconstruction, and detection of HSI objects with convolutional neural networks (CNN) using edge-adaptive spatial data. Thus, we were able to foresee all the possibilities regarding the nature of the object. We were also able to fix the edge errors caused by the uniform extraction of spatial data. These two proposed approaches make it possible to improve the accuracy while guaranteeing very reasonable computation times. Tests were carried out on several public HSIs. The results proved the effectiveness of the approach.
Acknowledgment. This work was supported by the Ministry of Higher Education and Scientific Research of Tunisia.
References 1. Khader, A., Yang, J., Xiao, L.: NMF-DuNet: Nonnegative matrix factorization inspired deep unrolling networks for hyperspectral and multispectral image fusion. IEEE J. Selected Topics Appl. Earth Observations Remote Sensing, pp. 1–17 (2022) 2. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Framework for automatic selection of kernels based on convolutional neural networks and ckmeans clustering algorithm. Int. J. Image Graph. 19(04), 1950019 (2019) 3. Wang, L., Wang, L., Wang, H., Wang, X., Bruzzone, L.: “SPCNet: A subpixel convolution-based change detection network for hyperspectral images with different spatial resolutions. IEEE Trans. Geosci. Remote Sensing 60 1–1 (2022) 4. Fu, H., et al.: A novel band selection and spatial noise reduction method for hyperspectral image classification. IEEE Trans. Geosci. Remote Sensing 60, 1–13 (2022) 5. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Smart feature extraction and classification of hyperspectral images based on convolutional neural networks. IET Image Proc. 14(10), 1999–2005 (2020) 6. Hasheminejad, M.: Optimized kernel nonparametric weighted feature extraction for hyperspectral image classification. J. Inform. Syst. Telecommun. (JIST) 10(38), 111–119 (2022) 7. Zhou, L., Xu, E., Hao, S., Ye, Y., Zhao, K.: Data-wise spatial regional consistency re-enhancement for hyperspectral image classification. Remote Sensing 14(9), 2227 (2022) 8. Yang, R., Zhou, Q., Fan, B., Wang, Y.: Land cover classification from hyperspectral images via local nearest neighbor collaborative representation with tikhonov regularization. Land 11(5), 702 (2022)
9. Hamouda, M., Saheb Ettabaa, K., Bouhlel, M.S.: Adaptive batch extraction for hyperspectral image classification based on convolutional neural network. In: Mansouri, A., El Moataz, A., Nouboud, F., Mammass, D. (eds.) ICISP 2018. LNCS, vol. 10884, pp. 310–318. Springer, Cham (2018). https://doi.org/10.1007/978-3319-94211-7 34 10. Hamouda, M., Bouhlel, M.S.: Hybrid neural network for hyperspectral satellite image classification (HNN). In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 567–575. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8 53 11. Wang, J., Gao, F., Dong, J., Du, Q.: Adaptive DropBlock-enhanced generative adversarial networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sensing 59(6), 5040–5053 (2021) 12. Hamouda, M., Ettabaa, K.S., Bouhlel, M.S.: Modified convolutional neural network based on adaptive patch extraction for hyperspectral image classification. In: 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2018, pp. 1–7 (2018) 13. Chen, J., Wang, Y., Zhang, L., Liu, M., Plaza, A.: DRFL-VAT: Deep representative feature learning with virtual adversarial training for semi-supervised classification of hyperspectral image. IEEE Trans. Geosc Remote Sensing. 60 1–1 (2022) 14. Sun, K., Wang, A., Sun, X., Zhang, T.: Hyperspectral image classification method based on m-3dcnn-attention. J. Appl. Remote Sensing 16(02) (2022) 15. Ranasinghe, K.: Gauss: Guided encoder-decoder architecture for hyperspectral unmixing with spatial smoothness (2022) 16. Palop, J.J., Mucke, L., Roberson, E.D.: Quantifying biomarkers of cognitive dysfunction and neuronal network hyperexcitability in mouse models of alzheimer’s disease: depletion of calcium-dependent proteins and inhibitory hippocampal remodeling. in Alzheimer’s Disease and Frontotemporal Dementia. Springer, 2010, pp. 245–262. https://doi.org/10.1007/978-1-60761-744-0 17 17. Lin, K., et al.: Outdoor detection of the pollution degree of insulating materials based on hyperspectral model transfer. Available at SSRN 4157180 18. GIC.: Hyperspectral remote sensing scenes. Grupo de Inteligencia Computacional (2014) 19. Hamouda, M., Bouhlel, M.S.: Dual convolutional neural networks for hyperspectral satellite images classification (DCNN-HSI). In: Yang, H., et al. (eds.) ICONIP 2020. CCIS, vol. 1332, pp. 369–376. Springer, Cham (2020). https://doi.org/10. 1007/978-3-030-63820-7 42
Enhancing Rental Bike Count and Availability Prediction Using Regression Modelling
Dhiraj Kumar, Diptirtha Chatterjee, Bibek Upadhyaya, Shailendra Nath Yadav, and Jyoti Singh Kirar(B)
Banaras Hindu University, Varanasi, India
[email protected]
Abstract. Bike sharing has become a new concept nowadays, where one does not have to buy a bike to enjoy a daily ride. Currently it has been introduced in many countries for the betterment of public transportation and other activities. One can rent bikes on several bases, such as hourly, daily, or monthly. It has a significant role with respect to the rising issues related to global warming, climate change, carbon emission and many other environmental anomalies. It is very necessary to build a system or model which facilitates the availability of rental bikes to the customer at the right time to avoid any delay. In this work, we have used "Linear Regression" and "Polynomial Regression" modelling to predict the rental bike count required at each hour very efficiently. We have used a publicly available dataset of Seoul city, the capital of South Korea, containing the rental bike count and other climate-related variables. The experimental outcomes of this work show the efficacy of the proposed method. Keywords: Linear Regression · Polynomial Regression · Rental Bike Prediction · Seoul bike prediction · Supervised Machine learning
1 Introduction

The bicycle sharing system is a system which facilitates the renting of bicycles by some agency or agents to individuals for a fixed period of time. This system is especially used in tourist places or mega cities in order to provide tourists or travelers with suitable transportation. Although this system began in Europe in 1960, the concept spread worldwide only after the mid-2000s [10]. This system is used by local government bodies and private organizations in cities, and also by some university organizations on their campuses. Bike sharing depends on several environmental factors, such as cloudy weather or a rise in temperature. Various other factors, such as traffic jams and festivals, are also important factors affecting the bike sharing system. This system has several benefits, as it helps to reduce pollution and traffic jams in cities [12]. Moreover, it is very helpful for visiting tourists.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 161–169, 2023. https://doi.org/10.1007/978-3-031-27440-4_16
2 Literature Review

Several research papers on 'Rental Bike Prediction' have already been published and are available on the web [1–10]. These research works focus primarily on scatter plot analysis, model building, etc. Supervised machine learning algorithms have also been applied in most of these research works to predict the demand for rental bikes. Even though good results have been obtained in predicting the bike count per hour, further exploration of ML algorithms can improve the results. In [1], the authors have used four statistical models, namely linear regression, gradient boosting machines, k nearest neighbor, and random forest, to predict the trip duration of the rental bike. In [6], the authors have studied the factors affecting bike trip demand. Research was done by Aditya Singh Kashyap and Swastika Swastik, where they used scatter plot analysis and correlation analysis to predict the rental bike count efficiently and more accurately. Scatter plots and correlation plots are a part of data mining; they are used to see how the dependent variable changes with the changing values of the independent variables, and to assess the degree of linear relationship among the variables in the dataset.
3 Methodology

In this work we have used "Multiple Linear Regression" and "Polynomial Regression" to predict the rental bike count accurately. The flow chart given below represents the procedure of our analysis (Fig. 1):
Fig. 1. Flowchart of the proposed analysis
A dataset from the UCI Machine Learning Repository has been utilized in this research work. It has 14 attributes, of which one is the dependent variable containing the rental bike count per hour, with 8760 entries, and 13 are independent variables for our regression analysis. The dataset contains daily information on weather conditions such as humidity, wind speed, dew point, visibility,
solar radiation, temperature, rainfall, and snowfall, as well as the bike count rented per hour. The Date column does not provide relevant information for predicting the rental bike count per hour. We have developed a statistical model in this research work to predict the optimal number of bikes to be rented per hour, to balance customer demand with the availability of bikes in a particular area, and to understand the factors affecting the rented bike count on a particular day. In the first step, we identify missing values, if any, in the dataset. There were no missing values present in the dataset. Exploratory data analysis is then performed to attain more insights about the dataset. We created a heatmap to get an idea of the linear relationship (correlation) between features of interest. The heatmap is below (Fig. 2):
Fig. 2. Correlation Plot
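A minimal sketch of how such a correlation heatmap can be produced with pandas and seaborn; the CSV file name and encoding are assumptions about the UCI export, not details given in the paper.

```python
# Correlation heatmap of the numeric features (file name and encoding assumed).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("SeoulBikeData.csv", encoding="latin-1")  # hypothetical local copy
corr = df.select_dtypes("number").corr()                   # pairwise Pearson correlations
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between features")
plt.tight_layout()
plt.show()
```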
From the above plot we can say that "Temperature" has the strongest linear relationship with the "Rented Bike Count" among the features. Next, we have plotted the scatter plots of the "Rented Bike Count" against the other independent continuous variables. The scatter plots are given below (Figs. 3, 4, 5, 6, 7 and 8):
Fig. 3. Scatter plot of Rented Bike Count Vs Temperature
Fig. 4. Scatter plot of Rented Bike Count Vs Humidity
Fig. 5. Scatter plot of Rented Bike Count Vs Solar Radiation
Fig. 6. Scatter plot of Rented Bike Count Vs Visibility
Fig. 7. Scatter plot of Rented Bike Count Vs Wind Speed
Fig. 8. Scatter plot of Rented Bike Count Vs Dew Point Temperature
From the above scatter plots we can see that a moderately linear or polynomial model can be a good choice to predict the rental bike count per hour efficiently. Next, we have drawn bar plots for the categorical variables. The plots are given below (Figs. 9, 10 and 11):
Fig. 9. Season-wise Bar plot of the samples
Fig. 10. Functioning Day wise Bar plot of the samples
Next, we have checked the skewness of the features. Some of them are skewed, so we have used the Yeo-Johnson power transformation to transform them into a symmetric distribution [11]. The data preprocessing phase of our analysis is thereby completed.
Fig. 11. Holiday Day wise Bar plot of the samples
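A minimal sketch of the Yeo-Johnson step mentioned above, using scikit-learn's PowerTransformer on synthetic stand-in columns; the column names and distributions are illustrative, not the exact dataset headers.

```python
# Yeo-Johnson transformation of skewed features (illustrative synthetic data).
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Rainfall": rng.exponential(1.0, 500),          # right-skewed stand-in
    "Snowfall": rng.exponential(0.3, 500),
    "Solar Radiation": rng.exponential(0.8, 500),
})
print("Skew before:", df.skew().round(2).to_dict())

pt = PowerTransformer(method="yeo-johnson")
df[df.columns] = pt.fit_transform(df)
print("Skew after:", df.skew().round(2).to_dict())   # close to symmetric
```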
Next, we have split the dataset into training and testing sets using a stratified sampling technique based on "Season": 80% of the samples are used to train the proposed model and the rest to validate that our model does well on previously unseen data. Here we have used stratification [11] based on "Season" to avoid over- or underfitting. There may be a situation in which our model works very well for some seasons but gives bad results for others; stratification removes this issue. Now, we develop some regression models based on our training data. In this work, we have used "Linear Regression" and "Polynomial Regression" to get the desired results.

3.1 Multiple Linear Regression

Multiple linear regression uses multiple independent features for predicting the outcome of a dependent or target feature. The goal of multiple linear regression is to model the best linear relationship between the independent features and the target feature, i.e. 'Rented Bike Count' in our dataset. Multiple linear regression is an extension of simple linear regression that includes multiple independent features. It uses the OLS (ordinary least squares) estimator to estimate the regression coefficients:

$Y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \ldots + a_p x_{ip} + \epsilon_i$   (1)

where, for the i-th observation: $x_{ij}$ are the independent features, $\epsilon_i$ is the error/residual of the model, $Y_i$ is the dependent feature, $a_1, \ldots, a_p$ are the regression coefficients for each independent feature, and $a_0$ is the intercept (constant term).
The assumptions of the multiple regression model are given below:
• The target feature, i.e. 'Rented Bike Count', should be linearly related with each independent feature.
• There should not be any strong correlation between the independent features.
• The $Y_i$ observations are selected independently and randomly from the population.
• The residuals follow a Normal distribution (0, σ).
The coefficient of determination (R-square) is a statistical metric that determines how well the data fit the model. We have used the R-square metric to measure the goodness of fit of the model.
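A minimal scikit-learn sketch of the stratified 80/20 split and the multiple linear regression fit described above, using synthetic stand-in data; the column names and coefficients are illustrative.

```python
# Stratified split on "Season" followed by an OLS linear regression fit.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "Temperature": rng.normal(15, 10, n),
    "Humidity": rng.uniform(20, 90, n),
    "Season": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
})
df["Rented Bike Count"] = 200 + 25 * df["Temperature"] - 2 * df["Humidity"] + rng.normal(0, 50, n)

X = pd.get_dummies(df.drop(columns=["Rented Bike Count"]))   # one-hot encode Season
y = df["Rented Bike Count"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=df["Season"], random_state=42)

lr = LinearRegression().fit(X_tr, y_tr)
print("Test R-square:", round(r2_score(y_te, lr.predict(X_te)), 3))
```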
3.2 Polynomial Regression

The purpose of the regression analysis is to define a functional relationship between the target feature and the independent feature in order to predict the target feature. In simple linear regression, the model

$Y = a_0 + a_1 x + \epsilon$   (2)

is used, where x is an independent feature and ε is an unobserved random zero-mean error conditioned on x. The conditional expectation of Y increases by $a_1$ units for every unit increase in x. However, such a linear mapping does not hold in every situation. One example is predicting the yield of a chemical synthesis from the temperature at which the synthesis occurs, where the yield increases with each increase in temperature. In such a case we can propose a quadratic model of the form

$Y = a_0 + a_1 x + a_2 x^2 + \epsilon$   (3)
K-fold cross-validation has been used to predict the rental bike count per hour precisely. We have considered k = 5, which is sufficient to obtain a consistent R2 score.
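A minimal sketch of the degree-2 polynomial model evaluated with 5-fold cross-validation; the feature matrix and target below are synthetic stand-ins, not the paper's data.

```python
# Degree-2 polynomial regression scored with 5-fold cross-validation (R-square).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 5))                   # stand-in for the preprocessed features
y = 3 * X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, 200)   # stand-in target

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
scores = cross_val_score(poly_model, X, y, cv=5, scoring="r2")
print("Mean CV R-square:", scores.mean().round(3), "Std:", scores.std().round(3))
```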
4 Results

We have tested all the assumptions of the Multiple Linear Regression model and all the assumptions of the model are satisfied [11]. The summary of the Multiple Linear Regression model and the Polynomial Regression of degree 2 is given below (Table 1):

Table 1. Summary of Regression

Model                 | R-Square [14] | MSE [15] | CV Accuracy [16] | CV Std. [17]
Linear Regression     | 0.808         | 19.16    | 80.35            | 0.59
Polynomial Regression | 0.898         | 11.29    | 79.35            | 0.59
The model intercept of Linear Regression is 0.01069917702144423. The model intercept of Polynomial Regression is 18628161062.44071.
5 Conclusion and Future Enhancement

From the above results we can see that the R-squared value for the Multiple Linear Regression model is 0.808, which means our model is able to explain 80.8% of the variability in the dataset. The Polynomial Regression model has an adjusted R-squared value of 0.898, which means it is able to explain 89.8% of the variability in the dataset. However, the cross-validation accuracy score [11, 12] of Linear Regression is higher than that of Polynomial Regression, which means Polynomial Regression overfits [11] the data in this case. Since the Linear Regression model has the higher cross-validation accuracy score, Linear Regression is the better model for predicting the rental bike count per hour accurately in this case. We have achieved very high accuracy, but we are not done yet. We will do our best to improve the model to achieve 100% accuracy, so that it correctly predicts the number of rental bikes required per hour. We also plan to add more sophisticated algorithms to make our predictions more accurate in the future. The more data available, the more accurate the algorithm will be, so we also intend to address this problem with more data.
References 1. Sathishkumar, V.E., Park, J., Cho, Y.: Seoul bike trip duration prediction using data mining techniques. IET Intell. Trans. Sys. 14(11), 1465–1474 (2020) 2. Sathishkumar, V.E., Cho, Y.: A rule-based model for Seoul Bike sharing demand prediction using weather data. Eur. J. Remote Sens. 53(sup1), 166–183 (2020) 3. Liu, X., Pelechrinis, K.: Excess demand prediction for bike sharing systems. PLoS ONE 16(6), e0252894 (2021) 4. Sathishkumar, V.E., Cho, Y.:Season wise bike sharing demand analysis using random forest algorithm.Comput. Intell. (2020) 5. Wang, Z.: Regression model for bike-sharing service by using machine learning. Asian J. Soc. Sci. Stud. 4(4), 16 (2019) 6. Eren, E., Uz, V.E.: A review on bike-sharing: the factors affecting bike-sharing demand. Sustain. Cities Soc. 54, 101882 (2020) 7. Goh, C.Y., Yan, C., Jaillet, P.: Estimating primary demand in bike-sharing systems. SSRN Electron. J. (2019) 8. Almannaa, M.H., Elhenawy, M., Rakha, H.A.: Dynamic linear models to predict bike availability in a bike sharing system. Int. J. Sustain. Transp. 14(3), 232–242 (2019) 9. Sachdeva, P., Sarvanan, K.N.: Prediction of bike sharing demand. Orient. J. Comput. Sci. Technol. 10(1), 219–226 (2017) 10. Liu, X.N., Wang, J.J., Zhang, T.F.: A method of bike sharing demand forecasting. Appl. Mech. Mater. 587–589, 1813–1816 (2014) 11. Sugiyama, M.: Introduction to Statistical Machine Learning. Morgan Kaufmann (2015) 12. Kirar, J.S., Agrawal, R.K.: Relevant feature selection from a combination of spectral-temporal and spatial features for classification of motor imagery EEG. J. Med. Syst. 42(5), 1–15 (2018)
Application of WASPAS Method for the Evaluation of Tamil Nadu Private Travels
S. M. Vadivel1(B), A. H. Sequeira2, Deeksha Sanjay Shetty2, and V. Chandana3
1 VIT Business School, Vandalur-Kelambakkam Road, Chennai 600127, India
[email protected]
2 School of Management, National Institute of Technology Karnataka, Surathkal 575025, India
3 Department of Industrial and Production Engineering, The National Institute of Engineering, Mysuru 570008, Karnataka, India
Abstract. Providing a sustainable and satisfactory transportation service to customers is a most important topic of discussion. The Tamil Nadu government has taken many initiatives to introduce more buses and trains in order to satisfy customers while travelling. Tamil Nadu (TN) has a huge population, and people expect more facilities from the TN government. Hence, private bus travel companies have stepped in to provide customers with better service. This paper aims to assess the performance of eight private bus passenger transport companies in Chennai, Tamil Nadu. The performance data have been collected from both managers and frequently travelling passengers: the quantitative data were collected from the travels' managers, whereas the qualitative data were collected from passengers. These quantitative and qualitative data have been analyzed with WASPAS (Weighted Aggregated Sum Product Assessment), the MCDM technique proposed in this study. The overall systematic algorithm for determining the best private bus travel company is illustrated on a step-by-step basis for further enhancement. Keywords: WASPAS · Multicriteria Decision Making (MCDM) · Performance measurement · Private Bus Transportation
1 Introduction

Promoting the operational effectiveness and service quality of urban public transportation systems requires effective performance evaluation (Gomes, 1989). The decision-making (DM) process is connected to performance metrics. Decision makers typically struggle with the challenge of evaluating a large number of alternatives and choosing the best one based on specific criteria in a variety of organizations, including the transportation sector. Finding the relevant criteria and getting the best possible match between those criteria, their relevant sub-criteria, and the actual requirement is the major goal of any selection process. To manage DM, there is a requirement for straightforward, organized,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 170–177, 2023. https://doi.org/10.1007/978-3-031-27440-4_17
and logistical procedures supported by some mathematical instruments. First, the evaluation criteria are typically organized in several hierarchies. Second, the process involves subjective evaluations, which leads to the use of qualitative and imprecise data. In countries with developing economies, like India, the transportation sector, in particular public sector bus companies, is essentially a lifeline because it moves many people from one location to another, whether in urban, semi-urban, or rural areas. AHP was used by Vadivel et al. (2020) to assess Tamil Nadu private bus travels from the perspective of the passengers. Chennai serves as the starting point, and the final destinations are other places in Tamil Nadu or other southern Indian states. Additionally, this paper is structured as follows: the literature support for transportation is presented in Sect. 2; the suggested methodology is in Sect. 3; the evaluation of TN omnibus companies is covered in detail in Sect. 4; and the final section concludes with a discussion of the study's shortcomings and potential future benefits.
2 Literature Support

The WASPAS approach has been presented as a useful MCDM tool for addressing eight manufacturing decision-making issues; it is capable of precisely ranking the alternatives across all of the specified criteria [1]. The effectiveness of the WASPAS technique has also been studied in relation to the ranking of industrial robots. The goal of green supply chain management (GSCM) is to lessen adverse environmental effects throughout the whole supply chain. In an MCDM problem, information uncertainty is handled via fuzzy sets. Weighted Aggregated Sum Product Assessment (WASPAS) is a brand-new, integrated methodology put forward by [2]. Intuitionistic fuzzy set theory may be helpful in some difficult decision-making situations, and the effectiveness and usability of the recommended approach are considered on an example of website evaluation. WASPAS is a recently proposed yet often used multiple-criteria decision-making method, also studied by Chakraborty et al. [3]. The process of choosing the location for a healthcare waste disposal facility is rather complicated; in the context of FFSs, a methodology combines the scoring function, entropy measurement, and the traditional WASPAS approach, with an instructive case study demonstrating the viability and effectiveness of the developed strategy examined by [4]. The difficulty of choosing a location for Turkey's first marine current energy plant is the subject of another study. Although the feasibility of generating electricity from marine currents has been studied and discussed extensively in Turkey, little real progress has been made on the subject as of yet. The criteria of the suggested model are weighted using the SWARA approach, and the alternatives are ranked using the WASPAS method suggested by [5]. A few Multiple Criteria Decision Making (MCDM) techniques are used in this research: the well-known MOORA (Multiple Objective Optimization on the basis of Ratio Analysis) and WASPAS (Weighted Aggregated Sum Product Assessment) are used by [6]. Another study discusses a supply chain management-related hybrid multi-criteria decision-making (MCDM) model in which the WASPAS method is used to determine the weights for the criteria; with examples and illustrations, the measurement of conflict between criteria and decision-makers is studied by [7]. Using the WASPAS technique,
six flats in comparable brick homes were evaluated for their indoor environments. The findings demonstrate the applicability of MADM-opt for the evaluation of alternatives in light of the best option. Additionally, it enables the determination of the departure of the evaluated alternatives from the ideal choice [8]. The purpose of this study is to demonstrate how Multi-Criteria Decision-Making (MCDM) techniques can be used to solve 3PL selection issues. A group of decisionmakers from the pertinent department of the business came up with 12 criteria. The third 3PL was the one that, according to the study’s findings, satisfied the required standard suggested by [9]. For sustainable supply chain management, the choice of a sustainable supplier is crucial (SSCM). In order to assess and choose the ideal sustainable supplier suggested by [10], the current study suggests a unique framework based on COPRAS (Complex Proportional Assessment) and SWARA (Step-wise Weight Assessment Ratio Analysis) approaches. Fuzzy sets and utility determining procedures for Multiple Criteria Decision Making (MCDM) are thought of as fresh development approaches that have lately been presented, expanded upon, and employed by some scholars. The study’s findings can help decisionmakers manage data including stakeholder preferences, related or conflicting criteria, and uncertain situations submitted by [11]. The Indian cement industry used weighted aggregated sum product assessment (WASPAS) and step-wise weight assessment ratio analysis (SWARA) to select suppliers. Supplier management appears to be the top weighted criterion, followed by information exchange and coordinated activities, according to SWARA findings. [12] studied this. The goal of this study is to identify humanitarian supply chain management challenges and assess potential solutions. It employs a hybrid framework made up of fuzzy weighted aggregated sum product assessment and fuzzy step-wise weight assessment ratio analysis (WASPAS) recommended by [13].
3 Research Methods

See Figs. 1 and 2.

Fig. 1. Graphic abstract: the decision matrix and attribute weights feed the WASPAS method (normalized decision matrix; additive relative importance, WSM; multiplicative relative importance, WPM; joint generalized criterion Q), which produces the ranking of alternatives.
Fig. 2. Proposed research methodology: formulate the decision matrix; check the criteria (normalization for beneficial criteria, reverse normalization for non-beneficial ones); calculate the weighted sum model (Qi1) and the weighted product model (Qi2); combine them into the total weighted aggregated sum product (WASPAS) score; and determine the ranking of alternatives based on the total relative weights.
Step 1: Weighted Sum Model

$A_i^{wsm} = \sum_{j=1}^{n} w_j x_{ij}$

Step 2: Weighted Product Model

$A_i^{wpm} = \prod_{j=1}^{n} x_{ij}^{w_j}$

Step 3: Weighted Aggregated Sum and Product Model

$Q_i = \lambda Q_{i1} + (1 - \lambda) Q_{i2}$, with $Q_i = Q_{i1}$ when $\lambda = 1$ and $Q_i = Q_{i2}$ when $\lambda = 0$.
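A minimal NumPy sketch of the three steps above, following the normalisation that the tables in the next section imply (column-total normalisation for beneficial criteria and reciprocal values for the non-beneficial one); the small example matrix and weights are illustrative, not the paper's data.

```python
# WASPAS on a toy decision matrix with beneficial and non-beneficial criteria.
import numpy as np

X = np.array([[9., 9., 7.],          # rows = alternatives, columns = criteria
              [8., 8., 4.],
              [7., 7., 6.]])
w = np.array([0.5, 0.4, 0.3])        # criteria weights (illustrative)
beneficial = np.array([True, True, False])

# Column-total normalisation; reciprocal values for the non-beneficial criterion
norm = np.where(beneficial, X / X.sum(axis=0), (1 / X) / (1 / X).sum(axis=0))

Q1 = (norm * w).sum(axis=1)                  # Step 1: weighted sum model
Q2 = np.prod(norm ** w, axis=1)              # Step 2: weighted product model
lam = 0.5
Q = lam * Q1 + (1 - lam) * Q2                # Step 3: joint generalized criterion
print("Scores:", Q.round(4))
print("Ranking (best first):", np.argsort(-Q) + 1)
```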
4 Case Study – Chennai, Tamil Nadu, Southern India

The purpose of the study is to assess private bus travel in Chennai. The evaluation must provide exceptional customer service while being cost-effective for the passengers.
4.1 Application of WASPAS Method

The Koyambedu (Chennai) location in Tamil Nadu, South India, was taken for the case study research. Buses operating to three states have been chosen. The best private travels have been evaluated using the overall hierarchy of all criteria, sub-criteria, and the eight private bus companies (alternatives) mentioned in Sengazhani Murugesan et al. (2020). As per the WASPAS method, the corresponding values are presented in Tables 1, 2, 3, 4, 5 and 6.

Table 1. WASPAS Decision matrix

Weight          | 0.5        | 0.4        | 0.2        | 0.4        | 0.2             | 0.3            |
Travels Company | Safety     | Comfort    | Operation  | Service    | Social benefits | Finance        | Reverse Normalization (Finance)
KPN             | 9          | 9          | 8          | 7          | 8               | 7              | 0.142
MGM             | 8          | 8          | 6          | 6          | 4               | 4              | 0.25
MJT             | 7          | 7          | 7          | 6          | 6               | 6              | 0.167
Parveen         | 7          | 8          | 7          | 7          | 6               | 6              | 0.167
Rathi Meena     | 8          | 8          | 8          | 7          | 5               | 5              | 0.2
RPN             | 8          | 7          | 7          | 6          | 6               | 5              | 0.2
SRM             | 8          | 9          | 9          | 8          | 8               | 8              | 0.125
Yoga Lakshmi    | 7          | 7          | 6          | 6          | 6               | 5              | 0.2
Total           | 62         | 63         | 58         | 53         | 49              | 46             | 1.456
Criteria        | Beneficial | Beneficial | Beneficial | Beneficial | Beneficial      | Non-Beneficial |
Table 2. Calculation of Weighted Sum model Weight Travels Company KPN MGM MJT Parveen Rathi Meena RPN SRM Yoga Lakshmi
0.5 Safety
0.4 Comfort
0.2 Operation
0.4 Service
0.145161 0.129032 0.112903 0.112903 0.129032 0.129032 0.129032 0.112903
0.14285714 0.12698413 0.11111111 0.12698413 0.12698413 0.11111111 0.14285714 0.11111111
0.137931 0.1034483 0.1206897 0.1206897 0.137931 0.1206897 0.1551724 0.1034483
0.1320755 0.1132075 0.1132075 0.1320755 0.1320755 0.1132075 0.1509434 0.1132075
0.2 Social benefits 0.163265306 0.081632653 0.12244898 0.12244898 0.102040816 0.12244898 0.163265306 0.12244898
0.3 Finance 0.098441345 0.172272354 0.114848236 0.114848236 0.137817884 0.137817884 0.086136177 0.137817884
Table 3. Weighted Sum Model Ranking Safety
Comfort
Operation
Service
0.072581 0.064516 0.056452 0.056452 0.064516 0.064516 0.064516 0.056452
0.05714286 0.05079365 0.04444444 0.05079365 0.05079365 0.04444444 0.05714286 0.04444444
0.0275862 0.0206897 0.0241379 0.0241379 0.0275862 0.0241379 0.0310345 0.0206897
0.0528302 0.045283 0.045283 0.0528302 0.0528302 0.045283 0.0603774 0.045283
Travels Company KPN MGM MJT Parveen Rathi Meena RPN SRM Yoga Lakshmi
Social benefits 0.032653061 0.016326531 0.024489796 0.024489796 0.020408163 0.024489796 0.032653061 0.024489796
Finance
Sum
Rank
0.029532404 0.051681706 0.034454471 0.034454471 0.041345365 0.041345365 0.025840853 0.041345365
0.272325 0.249291 0.229261 0.243158 0.257480 0.244217 0.271565 0.232704
1 4 8 6 3 5 2 7
Table 4. Calculation of Weighted Product model Weight Travels Company KPN MGM MJT Parveen Rathi Meena RPN SRM Yoga Lakshmi
0.5
0.4
0.2
0.4
0.2
0.3
Safety
Comfort
Operation
Service
Social benefits
Finance
0.145161 0.129032 0.112903 0.112903 0.129032 0.129032 0.129032 0.112903
0.14285714 0.12698413 0.11111111 0.12698413 0.12698413 0.11111111 0.14285714 0.11111111
0.137931 0.103448 0.12069 0.12069 0.137931 0.12069 0.155172 0.103448
0.132075 0.113208 0.113208 0.132075 0.132075 0.113208 0.150943 0.113208
0.163265 0.081633 0.122449 0.122449 0.102041 0.122449 0.163265 0.122449
0.098441 0.172272 0.114848 0.114848 0.137818 0.137818 0.086136 0.137818
Table 5. Weighted Product Model Ranking Travels Company KPN MGM MJT Parveen Rathi Meena RPN SRM Yoga Lakshmi
Safety
Comfort
Operation
Service
Finance
Product
Rank
0.444968 0.41836 0.41836 0.444968 0.444968
Social benefits 0.695951 0.605861 0.657039 0.657039 0.633512
0.381 0.359211 0.336011 0.336011 0.359211
0.45915655 0.43802588 0.41524365 0.43802588 0.43802588
0.672872 0.63525 0.65514 0.65514 0.672872
0.498831 0.590017 0.522441 0.522441 0.551813
0.018184 0.014948 0.013127 0.014728 0.016469
1 4 8 6 3
0.359211 0.359211 0.336011
0.41524365 0.45915655 0.41524365
0.65514 0.688911 0.63525
0.41836 0.469381 0.41836
0.657039 0.695951 0.657039
0.551813 0.479243 0.551813
0.014822 0.017788 0.013444
5 2 7
Here, we have considered a lambda (λ) value of 0.5.
5 Comparative Study The findings of the comparative analysis are presented in Table 7. According to our research, KPN Travels is the company with the best selection among eight private
176
S. M. Vadivel et al. Table 6. Results of WASPAS method final ranking Weighted Sum Model Qi1
Weighted Product Model Qi2
0.2723254 0.2492907 0.2292613 0.2431577 0.2574797 0.2442167 0.2715647 0.2327039
0.0181836 0.0149479 0.0131271 0.014728 0.0164686 0.0148224 0.0177882 0.0134442
Joint Generalized Criterion of WASPAS (Qi) with λ =0.5 0.145254474 0.132119313 0.121194185 0.128942832 0.136974151 0.12951955 0.14467649 0.123074026
Final Rank
Travels Company
1 4 8 6 3 5 2 7
KPN MGM MJT Parveen Rathi Meena RPN SRM Yoga Lakshmi
bus travels, followed by SRM Bus Company Travels. The top global weights of the MCDM techniques are as follows: AHP, TOPSIS, FTOPSIS and WASPAS (0.1732, 0.8283, 0.5044, 0.1452).

Table 7. Comparative Ranking Study Results

Private Bus Companies | AHP | TOPSIS | Fuzzy TOPSIS | WASPAS
KPN                   | 2   | 2      | 1            | 1
MGM                   | 5   | 8      | 8            | 4
MJT                   | 6   | 5      | 7            | 8
Parveen               | 3   | 3      | 4            | 6
Rathi Meena           | 4   | 4      | 3            | 3
RPN                   | 7   | 6      | 6            | 5
SRM                   | 1   | 1      | 2            | 2
Yoga Lakshmi          | 8   | 7      | 5            | 7
SRM is given top importance by the AHP and TOPSIS methods, with KPN in second place. In the FTOPSIS and WASPAS methods, KPN is given priority over SRM. The lack of a sensitivity analysis in this work is regarded as a limitation.
6 Conclusion

In this article, we provide an MCDM-WASPAS method for evaluating the sustainability of the services of eight private bus transport companies. The suggested methodology took both quantitative and qualitative goals into account at once. A real-world case study showed how successful the suggested methodology is. According to the WASPAS approach, KPN places first out of the eight travels, followed by SRM. We employed six criteria and 28 sub-criteria, and we took into account the responses of 50 respondents. Private company travel is one of the possibilities evaluated and ranked using a technique that incorporates the opinions of all relevant parties. Depending on variations in passenger
comments and on the buses running on the relevant route, the findings may vary. This becomes one of the potential future research directions.
References 1. Agarwal, S., Kant, R., Shankar, R.: Evaluating solutions to overcome humanitarian supply chain management barriers: a hybrid fuzzy SWARA-Fuzzy WASPAS approach. Int. J. Disaster Risk Reduction 51, 101–838 (2020) 2. Antucheviciene, J., Saparauskas, J.: MCDM methods WASPAS and MULTIMOORA: verification of robustness of methods when assessing alternative solutions. Econ. Comput. Econ. Cybem. Stud. Res 47, 5–20 (2013) 3. Chakraborty, S., Zavadskas, E.K.: Applications of WASPAS method in manufacturing decision making. Informatica 25(1), 1–20 (2014) 4. Ghorabaee, M.K., Zavadskas, E.K., Amiri, M., Esmaeili, A.: Multi-criteria evaluation of green suppliers using an extended WASPAS method with interval type-2 fuzzy sets. J. Clean. Prod. 137, 213–229 (2016) 5. Jayant, A., Singh, S., et al.: An integrated approach with MOORA, SWARA, and WASPAS methods for selection of 3PLSP. In: Proceedings of the International Conference Industrial Engineering Operation Management, vol. 2018, pp. 2497–2509 (2018) 6. Keshavarz Ghorabaee, M., Amiri, M., et al.: Assessment of third-party logistics providers using a CRITIC-WASPAS approach with interval type-2 fuzzy sets. Transport 32(1), 56–78 (2017) 7. Mardani, A., Nilashi, M., Zakuan, N., et al.: A systematic review and meta-analysis of SWARA and WASPAS methods: theory and applications with recent fuzzy developments. Appl. Soft Comput. 57, 265–292 (2017) 8. Mishra, A.R., Rani, P.: Multi-criteria healthcare waste disposal location selection based on Fermatean fuzzy WASPAS method. Complex Intell. Syst. 7(5), 2469–2484 (2021). https:// doi.org/10.1007/s40747-021-00407-9 9. Mishra, A.R., Rani, P., Pardasani, K.R., Mardani, A.: A novel hesitantfuzzy WASPAS method for assessment of green supplier problem based on exponential information measures. J. Clean. Prod. 238, 117901 (2019) 10. Stanujki´c, D., Karabaševi´c, D.: An extension of the WASPAS method for decision-making problems with intuitionistic fuzzy numbers: a case of website evaluation. Oper. Res. Eng. Sci.: Theory Appl. 1(1), 29–39 (2018) 11. Stoji´c, G., Stevi´c, Ž, Antucheviˇcien˙e, J., Pamuˇcar, D., Vasiljevi´c, M.: A novel rough WASPAS approach for supplier selection in a company manufacturing PVC carpentry product. Information 9(5), 121 (2018) 12. Vadivel, S.M., Sequeira, A.H., Jauhar, S.K., Baskaran, R., Robert Rajkumar S.: Application of Multi-criteria Decision-Making Method for the Evaluation of Tamil Nadu Private Bus Companies. Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore (2020). https://doi.org/10.1007/978-981-154032-521 13. Yücenur, G.N., Ipekçi, A.: SWARA/WASPAS methods for a marine current energy plant location selection problem. Renew. Energy 163, 1287–1298 (2021)
Time Series Forecast Applied to Electricity Consumption
Lídio Mauro Lima de Campos1,2(B)
1 Universidade Federal do Pará, Belém, Brazil
2 Instituto de Ciências Exatas e Naturais - Faculdade de Computação, Belém, Pará, Brazil
[email protected]
https://www.ufpa.br/, https://www.computacao.ufpa.br/
Abstract. Currently, electrical energy is one of the vital sources of energy. Thus, more and more nations are concerned about using electricity more efficiently. Given this scenario, it is necessary to carry out a priori planning to scale both the generation and transmission of energy. Therefore, new demand forecasting methods are needed. The main objective of this work is to evaluate the performance of forecasting methods: regression models and recurrent neural networks, using time series of electricity consumption in the northern region of Brazil. The results showed that the best regression model was the linear one, using moving average, which obtained MAPE of 3.27% for forecasts with daily data. The best result obtained with the Recurrent Neural Network was MAPE equal to 2.4% considering data with monthly periodicity, which was better than the one obtained with the polynomial regression model, which obtained MAPE of 2.66% and better than the multilayer perceptron neural network model that obtained 3.62%.
Keywords: Deep Neural Network · Time Series Prediction · Wind speed prediction

1 Introduction
As electric energy is essential for the industrial infrastructure of nations, companies in this sector increasingly seek to monitor and control it in order to provide improvements for the management and planning of the sector. In Brazil, privatization and deregulation have changed the scenario of the electricity sector, causing it to evolve towards a market structure aimed at free competition for the purchase and sale of electric energy. As a result, the activities of generation, transmission, and distribution are now performed independently and autonomously.
As a result, a restructuring of the sector was necessary, aiming to provide electricity to the consumer with quality, safety, and an affordable cost. In this way, load prediction becomes a topic of great interest for this sector, since prediction errors end up generating very high financial costs. Accurate prediction is a critical activity for a stable and efficient power supply, where load and supply planning are combined. Currently, different methods have been applied in order to minimize costs and seek greater system efficiency, and in this direction some research has been developed [1,2]. As a result, machine learning techniques are increasingly used in the modeling and analysis of energy systems, due to the increased accessibility of data. Short-Term Load Forecasting (STLF) is critical for cost-effective power generation and system safety. These predictions are made over horizons from hours to several days ahead. Accurate load predictions minimize operating costs while also contributing to energy savings. STLF is also essential for deregulated electricity markets. Furthermore, the amount of energy that the utility must buy or sell on the market in real time at unfavorable prices depends on the prediction error. Therefore, uncertainty modeling in electricity systems is increasingly necessary. In particular, ARMA, ARIMA, ARMAX, and ARIMAX are among the classical methods most used in time series. This work presents mathematical modeling techniques used for load forecasting. For this, regression models and feedforward and recurrent networks will be used in order to achieve this purpose. This paper is organized as follows. Section 2 discusses related work, Sect. 3 presents the background of the study, and Sect. 4 presents material and methods. Lastly, simulation results and conclusions are presented in Sects. 5 and 6, respectively.
2 Related Work
This survey [3] analyzes prediction scopes, data and some of the most used pre-processing methods, the machine learning algorithms most used for prediction, as well as the metrics used for performance evaluation. The research indicates some focuses that currently deserve attention: long-term prediction of energy consumption in buildings, prediction of energy consumption in residential buildings, and prediction of energy consumption for lighting in buildings. This research [4] reviews four machine learning approaches that have been used for energy performance prediction: artificial neural networks, support vector machines, Gauss-based regressions, and clustering. However, it does not reach a definitive conclusion about which machine learning model is the best, given that the literature points out that the accuracy of these models can still be improved as long as large samples and good hyperparameter optimization are available. Another issue is finding standard methods for a fair comparison of different machine learning models. Besides modeling building energy, clustering buildings based on various input parameters remarkably facilitates and enhances energy benchmarking.
This paper [5] discusses some algorithms and a hybrid deep learning system that integrates long short-term memory networks (LSTM) and a convolutional neural network (CNN) model to analyze their performance for short-term load forecasting. Two real-world data sets, namely "hourly load consumption of Malaysia" and "daily power electric consumption of Germany", are used to test and compare the obtained models. The results show that deep neural network models are good candidates for use as short-term prediction tools. The method improved the accuracy from 83.17% to 91.18% for the German data and achieved 98.23% accuracy on the Malaysian data, which are excellent results in load forecasting.
3 Background of the Study
Load series data usually have particularities. Thus, before we start our load consumption prediction studies, these particularities must be studied and discussed as follows.

3.1 Definitions
To forecast load consumption, some time-series concepts need to be clarified. Time series have specific attributes, such as trend, seasonality, and noise, which need to be taken into account.

Stationarity. A time series is stationary when its statistical characteristics (mean, variance, autocorrelation) are constant over time. It is a series that develops randomly in time around a constant mean, reflecting some form of stable statistical equilibrium (i.e., the laws of probability that act on the process do not change over time).

Trend. The trend of a time series is defined as a pattern of growth or decrease of the variable over a certain period of time.

Residual (Noise). The noise component is made up of upward or downward movements that are highly unstable and random, not explained by cyclical variations or by the trend, and due to chance or random factors.

Seasonality. Seasonality is one of the components of a time series, in which the data experience regular and predictable changes that repeat each year. It can be characterized as a fluctuation or pattern that is predictable and repeats itself over the course of a year.
3.2 Machine Learning Models
Linear Regression. Regression analysis is used to predict the value of one variable based on the value of another. The variable to be predicted is called the dependent variable; the variable used to predict it is called the independent variable. This form of analysis estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. Linear regression fits a straight line that minimizes the discrepancies between predicted and actual output values. The simplest linear regression equation is

Yi = β0 + β1 Xi + μi   (1)

where Y is the dependent variable, β0 is the intercept, β1 is the slope, X is the independent variable, and μi is the residual of the model, which is distributed with zero mean and constant variance. The model above is the standard linear regression model; there are some variations of it, among them the potential (power) model, Eq. (2), and the exponential model, Eq. (3):

Yi = β0 Xi^β1 + εi   (2)

Yi = β0 e^(β1 Xi) + εi   (3)
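As an illustration of how such trend models can be fitted in practice, the following sketch fits linear and polynomial trends to a toy consumption series and reports R². It is only a sketch under assumed, hypothetical data values; it is not the authors' code.

```python
import numpy as np

# Toy daily consumption series (hypothetical values, for illustration only).
x = np.arange(1, 101, dtype=float)
y = 12500 + 3.5 * x + np.random.default_rng(0).normal(0, 150, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear trend: y = b1*x + b0
b1, b0 = np.polyfit(x, y, deg=1)
lin_pred = b1 * x + b0

# Second-degree polynomial trend, analogous to the paper's polynomial models
poly_coeffs = np.polyfit(x, y, deg=2)
poly_pred = np.polyval(poly_coeffs, x)

print("linear R2:", r_squared(y, lin_pred))
print("polynomial R2:", r_squared(y, poly_pred))
```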
Deep Feedforward NN. This model, characterized by a network of three or more layers of simple processing units connected by acyclic links, is commonly employed in systems biology studies, including evolutionary studies. Information flows through the ANN in discrete time (Fig. 1).
Fig. 1. Structure of Multilayer Perceptron
Recurrent Neural Network. Human beings do not think from scratch at every moment; knowledge is cumulative and progressive, that is, what is learned today can be used to learn new information in the future. A feedforward multilayer ANN does not work that way, which is a limitation. On the other hand, a Recurrent Neural Network (RNR) has the ability to deal with "memory", as it is a network with feedback loops that allow the persistence of information. A recurrent network (Fig. 2) can be thought of as multiple copies of the same network, each passing a message to the next. The recurrent neural network dynamics can be formulated by deterministic transitions from previous to current hidden states. The deterministic state transition is described in Eq. (4). In this representation, the horizontal axis represents time. The neural network at time t = 0 passes a value to the network at time t = 1, which passes another value to t = 2, and so on. This value is the neural network's memory, which represents everything it remembers from the past.
Fig. 2. Model of Recurrent Neural Network
RNN: h_t^(l-1), h_(t-1)^(l) → h_t^(l)   (4)

4 Material and Methods
This section presents the methodology used to model the time series of electricity consumption in the city of Belém, in the State of Pará. Throughout this section, the main performance indices used to quantify how satisfactory the models' predictions are will be discussed. In particular, emphasis will be given to the literature that deals specifically with long-term forecasting in the energy sector.

4.1 Dataset Description
The database provided by Equatorial Energia (EE) includes the monthly values of energy consumption in watt-hours (Wh) between February 2002 and December 2007. For each class of consumers, determined internally by EE, there is a series containing the monthly values over the mentioned time horizon. The available
Fig. 3. Total monthly consumption series for the State of Pará for the period from February 2002 to December 2007. Source: EE.
series are: residential, commercial, industrial, rural, public power, public lighting, public service, and own consumption. The series of total energy consumption is the sum of the consumption of all the others. In a later step, the data went through a filtering process in which outlier samples were removed and replaced by values obtained through polynomial interpolation. For the identification of outliers, the Grubbs test [6] was applied. The data were normalized so that the values of the series fell within the range between 0 and 1. For this we have:

yn(k) = y(k) / ymax   (5)
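A minimal sketch of this filtering and normalization step is shown below, assuming hypothetical sample values; the Grubbs statistic and its critical value are implemented directly from the standard two-sided test, since this is not the authors' original code.

```python
import numpy as np
from scipy import stats

def grubbs_outlier_index(y, alpha=0.05):
    """Return the index of the most extreme point if the two-sided
    Grubbs test flags it as an outlier, otherwise None."""
    y = np.asarray(y, dtype=float)
    n = y.size
    mean, std = y.mean(), y.std(ddof=1)
    idx = int(np.argmax(np.abs(y - mean)))
    g = abs(y[idx] - mean) / std
    # Critical value of the Grubbs statistic at significance level alpha
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx if g > g_crit else None

series = np.array([12500., 12580., 12630., 19000., 12710., 12760.])  # hypothetical values
idx = grubbs_outlier_index(series)
if idx is not None:
    # Replace the flagged sample by interpolating its neighbours
    keep = [i for i in range(len(series)) if i != idx]
    series[idx] = np.interp(idx, keep, series[keep])

normalized = series / series.max()   # yn(k) = y(k) / ymax, as in Eq. (5)
print(idx, normalized)
```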
Figure 3 shows the behavior of the total consumption series that will be used as the basis for obtaining the models, since the objective is to forecast the company's total consumption.

4.2 Performance Indices
In time series forecasting problems, an important task is to quantify the quality of the prediction obtained. This allows, for example, comparing different algorithms and different model structures using performance indices. The following are some of the performance indices used in the prediction of the time series of energy consumption: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). These quantities are calculated by:

MAE = (1/N) Σ_{i=1..N} |p_i,true − p_i,forecast|   (6)

RMSE = sqrt( (1/N) Σ_{i=1..N} (p_i,true − p_i,forecast)^2 )   (7)

MAPE = (100%/N) Σ_{i=1..N} |p_i,true − p_i,forecast| / p_i,true   (8)

where N is the number of samples, p_i,true is the actual value and p_i,forecast is the forecasted value.
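The three indices above translate directly into short NumPy functions; the sample vectors in this illustrative sketch reuse the Real/Obtained values reported later in Table 2.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

y_true = np.array([0.172443, 0.180612, 0.196994])
y_pred = np.array([0.175771, 0.180158, 0.185358])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```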
5 Simulation Results
In this section, the results of the prediction of electric energy consumption in the State of Pará are presented, together with the respective performance indices of the feedforward and recurrent neural network models and of the regression models.

5.1 Regression Models - Data with Daily Periodicity
In this section we present the results obtained by the regression models (linear, exponential, logarithmic, polynomial, and potential), without removing outliers and with 2/3 of the data for training and 1/3 for testing. The first five rows of Table 1 present the models obtained and the quality of fit measured by the coefficient of determination (R2).

Table 1. Obtained regression models
| Type | Model | R2 | Data periodicity | Technique |
|---|---|---|---|---|
| Exponential | y = 12639e^(0.0002x) | 0.5788 | daily | NOR |
| Linear | y = 3.4636x + 12538 | 0.5968 | daily | NOR |
| Logarithmic | y = 1311.5ln(x) + 6821.7 | 0.4695 | daily | NOR |
| Polynomial | y = 0.0006x^2 + 2.5109x + 12748 | 0.5985 | daily | NOR |
| Potential | y = 8523.4x^0.08932 | 0.4806 | daily | NOR |
| Exponential | y = 12642e^(0.0002x) | 0.5977 | daily | WOR |
| Linear | y = 3.4745x + 12540 | 0.6101 | daily | WOR |
| Logarithmic | y = 1316.6ln(x) + 6799.6 | 0.4807 | daily | WOR |
| Polynomial | y = 0.0006x^2 + 2.6037x + 12752 | 0.6126 | daily | WOR |
| Potential | y = 8508.6x^0.0897 | 0.4973 | daily | WOR |
| Exponential | y = 12672e^(0.007x) | 0.9257 | monthly | AOF |
| Linear | y = 106.58x + 12528 | 0.9259 | monthly | AOF |
| Logarithmic | y = 1586ln(x) + 10512 | 0.7906 | monthly | AOF |
| Polynomial | y = -0.0036x^4 + 0.4385x^3 - 17.1x^2 + 339.51x + 11790 | 0.9421 | monthly | AOF |
| Potential | y = 11010x^0.1073 | 0.8284 | monthly | AOF |
| Exponential | y = 13051e^(0.0002x) | 0.858 | daily | MA |
| Linear | y = 3.5761x + 12915 | 0.8542 | daily | MA |
| Logarithmic | y = 1349.4ln(x) + 7042.7 | 0.6673 | daily | MA |
| Polynomial | y = 0.0006x^2 + 2.6417x + 13143 | 0.8581 | daily | MA |
| Potential | y = 8810.7x^0.0892 | 0.7068 | daily | MA |
All regression models selected and described in the first five rows of Table 1 presented unreliable R2 measures, that is, the regression models were poorly suited for prediction. The best were the linear model (R2 = 0.5968) and the polynomial model (R2 = 0.5985). The linear model presented a mean absolute percentage error of 6.81% and the polynomial model of 6.38%. These experiments used data without outlier removal.
5.2 Regression Models - Data with Daily Periodicity (Outlier Removal)
The models selected after removing outliers are shown in rows 6 to 10 of Table 1. The removal of outliers improved the quality of the regression models a little, but they still presented unreliable R2 measures, that is, the regression models remained poorly suited for prediction. The best were the linear model (R2 = 0.6101) and the polynomial model (R2 = 0.6126). The linear model presented a mean absolute percentage error of 6.30% and the polynomial model of 6.35%. We tried increasing and decreasing the amount of training data; however, no improvement was achieved.

5.3 Regression Models - Data with Monthly Periodicity
The simulation results presented here used the average daily energy consumption of each month. The results in rows 11 to 15 show that, with the monthly average, the regression models gave more reliable R2 measures. The best were the linear model (R2 = 0.9259) and the polynomial model (R2 = 0.9421). The linear model presented a mean absolute percentage error of 2.84% and the polynomial model of 2.66%. The prediction values obtained are shown in Fig. 4, considering long-term prediction 23 steps ahead.

5.4 Regression Models - Data with Daily Periodicity (Outlier Removal) - Moving Average
The use of the moving average with daily data improved the quality of the regression models, which presented more reliable R2 measures, that is, the regression models improved the prediction. The best were the linear model (R2 = 0.8542) and the polynomial model (R2 = 0.8581). The linear model presented a mean absolute percentage error of 3.27%; this regression model was the one that obtained the lowest MAPE among all the other models for data with daily periodicity.

5.5 Multilayer Perceptron (MLP) - Data with Monthly Periodicity
In this section, the results obtained by simulation using multilayer perceptron networks are presented. In the simulations, the sigmoid activation function was used in the intermediate layer of the neural network and ReLU in the output layer. An acceptable prediction error of 0.00001 was admitted. The simulation results of the best neural network models are presented in Table 2. Results are highlighted with up to 24 prediction steps forward. It can be seen that all neural networks obtained good prediction results considering the acceptable error. The best neural network presented a MAPE of 3.62% and a mean squared error of around 0.000001.
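A minimal Keras sketch of an MLP of the [1, 8, 1] type described above is shown below; the normalized series, the number of epochs, and other details are assumptions for illustration and do not reproduce the authors' implementation.

```python
import numpy as np
from tensorflow import keras

# Hypothetical normalized monthly consumption values (range roughly 0.15-0.21)
series = np.random.default_rng(1).uniform(0.15, 0.21, 72)
x, y = series[:-1].reshape(-1, 1), series[1:]     # predict y(t+1) from y(t)

model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(8, activation="sigmoid"),  # intermediate layer, sigmoid as in the paper
    keras.layers.Dense(1, activation="relu"),     # output layer, ReLU as in the paper
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.7), loss="mse")
model.fit(x, y, epochs=200, verbose=0)            # the paper reports thousands of epochs
print(model.predict(x[-1:], verbose=0))
```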
Fig. 4. Comparison of actual and predicted values using monthly data

Table 2. Obtained neural network models

| Model | Epochs | LR | MAPE (%) | RMSE | Real | Obtained | Steps Forward |
|---|---|---|---|---|---|---|---|
| MLP[1,1,1] | 8000 | 0.7 | 3.84 | 5.53731e-006 | 0.172443 | 0.175771 | 1 |
| MLP[1,5,1] | 18000 | 0.7 | 3.74 | 1.03083e-007 | 0.180612 | 0.180158 | 6 |
| MLP[1,8,1] | 28000 | 0.7 | 3.62 | 6.76947e-005 | 0.196994 | 0.185358 | 12 |
| MLP[1,8,1] | 40000 | 0.9 | 3.65 | 2.69315e-005 | 0.197821 | 0.190482 | 18 |
| MLP[1,9,1] | 50000 | 0.8 | 3.63 | 9.60352e-005 | 0.208547 | 0.194688 | 24 |
| RNR[1,10,1] | 9000 | 0.6 | 3.29 | 1.96329e-005 | 0.172443 | 0.178709 | 1 |
| RNR[1,10,1] | 8000 | 0.5 | 3.25 | 7.80065e-006 | 0.180612 | 0.184562 | 6 |
| RNR[1,8,1] | 12000 | 0.4 | 3.19 | 1.45586e-005 | 0.196994 | 0.191598 | 12 |
| RNR[1,8,1] | 20000 | 0.6 | 3.10 | 5.57503e-007 | 0.197821 | 0.198877 | 18 |
| RNR[1,11,1] | 60000 | 0.2 | 2.4 | 5.70702e-006 | 0.208547 | 0.205169 | 24 |
5.6 Recurrent Neural Network (RNR) - Data with Monthly Periodicity
In this section we highlight the simulations performed with recurrent neural networks having a recurrent output layer. The RNR was based on the ARX model [7], which is essentially an MLP network whose input includes the fed-back output layer. This neural network architecture is similar to the ARX model (autoregressive with exogenous inputs), given by Eq. (9), where x(n) is the input to the system and y(n) the output, f(.) is a nonlinear, generally unknown function, x(n) and y(n) correspond to the input and output at time n, and dy > 0 is the memory order. The simulation results are shown in rows 6 to 10 of Table 2. The best recurrent neural network model presented a MAPE of 2.40306% and an average RMSE of around 0.000001. This result obtained with the RNR was better than the one obtained with the polynomial regression model, which got a MAPE of 2.66%, and better than the multilayer perceptron neural network model, which got 3.62%.

y(n) = f[a1 y(n − 1) + a2 y(n − dy) + x(n)]   (9)
6 Conclusions
Linear and non-linear computational intelligence and identification techniques were applied to obtain models to estimate electrical energy consumption. For the modeling, three representations were used: regression models, and feedforward and recurrent neural networks. The results showed that the best regression model for daily data was the linear one using the moving average, which obtained a MAPE of 3.27%. The results obtained with the recurrent neural network, a MAPE of 2.4% considering data with monthly periodicity, were better than the one obtained with the polynomial regression model, which obtained a MAPE of 2.66%, and better than the multilayer perceptron neural network model, which obtained 3.62%. It can be seen that the models are able to predict well for both the short and the medium term, with predictions up to 24 steps ahead. The lowest values of MAPE and RMSE obtained in the work of [8] were 11.88% and 361.1681, respectively. The values obtained by our methodology were a MAPE of 2.4% and an RMSE of 1.03083e-007; therefore, they were lower than those obtained in the work of [9].

Acknowledgment. We would like to thank the Federal University of Pará for having supported this study.
References
1. Zhang, L., et al.: A review of machine learning in building load prediction. Appl. Energy 285 (2021)
2. Farsi, B., Amayri, M., Bouguila, N., Eicker, U.: On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach. IEEE Access 9, 31191–31212 (2021)
3. Amasyali, K., El-Gohary, N.M.: A review of data-driven building energy consumption prediction studies. Renew. Sustain. Energy Rev. 81, 1192–1205 (2018)
4. Seyedzadeh, S., Rahimian, F., Glesk, I., et al.: Machine learning for estimation of building energy consumption and performance: a review. Vis. Eng. 6, 5 (2018)
5. Farsi, B., Amayri, M., Bouguila, N., Eicker, U.: On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach. IEEE Access 9, 31191–31212 (2021)
6. Grubbs, F.: Procedures for detecting outlying observations in samples. Technometrics 11(1), 1–21 (1969)
7. Aguirre, L.A.: Introdução à Identificação de Sistemas, 3rd edn. Editora UFMG (2007)
8. Xie, A., Yang, H., Chen, J., Sheng, L., Zhang, Q.: A short-term wind speed forecasting model based on a multi-variable long short-term memory network. Atmosphere 12, 651 (2021)
9. Lee, Y.W., Tay, K.G., Choy, Y.Y.: Forecasting electricity consumption using time series model. Int. J. Eng. Technol. 7(4.30), 218–223 (2018)
A Survey on Text Processing Using Deep Learning Techniques

Akshita Tyagi1, Terrance Frederick Fernandez2, K. Shantha Kumari3, and Amit Kumar Tyagi4(B)

1 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, Tamil Nadu, India
[email protected]
2 Institute of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Thandalam, Chennai 602105, India
[email protected]
3 Department of Data Science and Business Systems, School of Computing, SRM Institute of Science and Technology, Kattankulathur 603203, India
[email protected]
4 Department of Fashion Technology, National Institute of Fashion Technology, New Delhi, India
Abstract. We report an experiment in which we attempted to determine the emotion class at the phrase level. The method is based on a combination of machine learning and keyword analysis. A substantial annotated data set exists in which statements were manually classified into the six fundamental emotions: joy, love, rage, fear, surprise, and sadness. Using the annotated data set, an emotion vector is created for each keyword in the input sentence. An algorithm then calculates the emotion vector of a phrase from the emotion vectors of its words. The sentence is then classified into the relevant emotion class based on the emotion vector. In comparison to an individual method, the results are demonstrated and found to be satisfactory. The goal of this article is to showcase many of the most significant text document categorization approaches and methodologies, as well as to raise awareness of some of the intriguing difficulties that remain unanswered, particularly in the fields of machine learning techniques and text representation.

Keywords: Sentence Level · Emotion Detection · Emotion Vector · Machine Learning · Natural Language Processing
1 Introduction

The Internet and the World Wide Web are generating a massive quantity of data from users who contribute text relating to product reviews, thoughts, attitudes, and other services. This data is processed and analyzed by a variety of techniques. To evaluate the material and comprehend these processes, NLP and information retrieval technologies are applied. Sentiment analysis' fundamental challenge is divided into two categories: positive opinion and negative opinion. This study compares sentiment classification
approaches based on lexicons and sentiment classification methods based on machine learning. Several methods and methodologies are used; this document classifies and categorizes their characteristics. To tackle sentiment analysis difficulties, many approaches are used. Emotions are a part of human nature and play a vital role in behavioral science. Emotions are a set of thoughts, feelings, experiences, behaviors, cognitions, and conceptualizations that define a state of mind. The three basic methodologies used when creating text-based ED systems, as well as their merits and shortcomings, are described. The present state of the art is also discussed, with an emphasis on the applicable methodologies, datasets used, key contributions, and limits. Twitter sentiment analysis is a very new and difficult research subject. Because social media sites such as Twitter contain a large volume of text sentiment data in the form of tweets, it is possible to determine people's feelings or opinions about a certain event. Sentiment analysis, also known as opinion mining, is useful for film reviews, product reviews, customer service reviews, and opinions about any event. This allows us to determine whether a certain item or service is excellent, terrible, or preferable. It may also be used to find out what people think about any event or person, as well as the polarity of text, whether positive, negative, or neutral. Sentiment analysis is a sort of text classification that may categorize text into various emotions. Sentiment analysis is a way of transforming, extracting, and understanding views from a text using NLP and classifying them as positive, negative, or neutral feelings. The two primary approaches to sentiment analysis are the machine learning approach and the lexicon-based approach. The lexicon-based approach counts the negative and positive words connected with the data, whereas the machine learning technique employs algorithms to classify and extract sentiment from data. Scholars have been developing new sentiment analysis algorithms that are both accurate and useful. One of the NLP approaches is the feature extraction algorithm, which can be used to extract subject-specific features from each lexicon that contains sentiment and relate the sentiment to a certain subject. It outperformed machine learning algorithms, achieving accuracy of up to 87% for online review articles and 91.9% for general web page and news item reviews. This method concentrated on generic text and eliminated several challenging cases, such as ambiguous sentences or sentences with no feeling, in order to get better results.
2 Sentiment Analysis is Divided into Numerous Categories

Different methods of sentiment analysis are utilized in the market to analyze people's feelings. Other sorts of sentiment analysis, in addition to regular opinions (positive, negative, or neutral), aid in understanding people's underlying sentiments, genuine purpose, and feelings.

2.1 Sentiment with a Finer Granularity

One of the most fundamental and often utilized techniques of measuring client attitude is to ask them. This analysis offers a better grasp of the client comments received. The feelings are categorized using publicly available categories such as positive, neutral,
and negative. Another method to scale consumer input is to provide a rating choice ranging from 1 to 5. This method is used by the majority of e-commerce companies to determine their clients' feelings.

2.2 Sentiment Analysis for Emotion Identification

This is a more sophisticated approach to identifying emotion in a text. This type of analysis assists in detecting and comprehending people's emotions. Anger, sorrow, happiness, frustration, fear, panic, and concern are all possible emotions to include. The benefit of employing this is that a business can better understand why a consumer feels a certain way. However, analyzing people's emotions via emotion detection is challenging, since individuals use a variety of phrases with varied meanings, such as sarcasm.

2.3 Analyses Based on Aspects

This sort of sentiment analysis is primarily concentrated on the features of a certain product or service. One of the most fundamental and often utilized techniques of measuring client attitude is to ask them, as well as using mechanized procedures such as customer care tasks, allowing us to acquire valuable insights on the go. Businesses may use aspect-based sentiment analysis to discover which components of their products or services are causing dissatisfaction, and it can help them progressively resolve those issues. Problems with new software programs, such as malfunctions or serious defects, can also be handled.

2.4 Sentiment Analysis Based on Intent

The automated classification of textual material based on the customer's intent is known as intent classification. An intent classifier can examine texts and reports automatically and classify them into intents like Purchase, Downgrade, Unsubscribe, and so on. This is useful for understanding the goals behind a huge number of client inquiries, automating processes, and gaining valuable insights. When it comes to areas like customer assistance and sales, intent categorization allows firms to be more customer friendly. It enables them to respond to leads more quickly and handle large numbers of queries.
3 Approaches to Sentiment Analysis

The main strategy is rule-based and utilizes a dictionary of words labeled by sentiment to decide the sentiment of a sentence. Sentiment scores often need to be combined with additional rules to handle sentences containing negations, sarcasm, or dependent clauses [18–21].
3.1 Rule-Based Approach

The rules incorporate the following NLP methods:
• Tokenization, stemming, and part-of-speech tagging
• Lexicons

Because the sequential combination of words is not considered in rule-based systems, they are extremely basic. More advanced processing methods can be employed, together with up-to-date rules, to support newer vocabularies and modes of expression. The inclusion of new rules, however, might have an impact on previously achieved outcomes and make the entire system highly convoluted. Rule-based systems need continuous fine-tuning and maintenance, which requires funding at regular intervals.

3.2 Machine Learning Approach

Machine learning strategies rely on machine learning algorithms rather than human-designed rules. A sentiment analysis problem is typically described as a classification problem, in which the classifier receives text input and assigns it to one of three classes: positive, negative, or neutral [7].
Fig. 1. Machine Learning Method
• During the training phase, our model learns to match a certain input data set to the associated output data set, as shown in Fig. 1 (a). The textual input is transformed into a feature vector by the feature extractor. The tag and feature-vector pairs are then fed into the algorithm, which generates a model.
• In the prediction process, represented in Fig. 1 (b), the feature extractor translates unseen textual inputs into feature vectors. These vectors are then fed into the model, which generates prediction tags for the corresponding vectors.
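A minimal sketch of this train-then-predict pipeline using scikit-learn is shown below; the tiny labeled corpus and the choice of TF-IDF features with a Naive Bayes classifier are illustrative assumptions, not part of the surveyed systems.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great product, loved it", "terrible service, very slow",
               "excellent quality", "bad experience, would not recommend"]
train_tags = ["positive", "negative", "positive", "negative"]

# Feature extractor (TF-IDF) + classifier, trained on (text, tag) pairs
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_tags)

# Prediction: unseen text -> feature vector -> predicted tag
print(clf.predict(["slow delivery but great quality"]))
```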
3.3 Lexicon-Based Approach

The following approach was used to complete the sentiment classification challenge. To begin, the weights of all training text data and of the classified text are calculated. Then, the full textual data is mapped into a one-dimensional emotion field. The mean weights of the training text data are determined for each sentiment category, and the classified text is assigned to the closest category in the 1-D emotion field [17]. The accompanying graphic (Fig. 2) depicts the machine learning and lexicon-based approaches in action.
Fig. 2. Machine Learning Approach vs Lexicon Based Approach.
As summarized above, the classified text is assigned to the closest category in the 1-D emotion field; Fig. 2 contrasts this lexicon-based process with the machine learning process.
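An illustrative sketch of a very simple lexicon-based scorer is shown below. The tiny word lists and the counting rule are assumptions for demonstration only, not a published lexicon or the method referenced above.

```python
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "slow"}

def lexicon_score(text: str) -> str:
    tokens = text.lower().split()
    # Score = (# positive words) - (# negative words)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_score("the service was slow but the food was great and excellent"))
```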
4 All Approaches' Advantages and Limitations

Table 1 shows the advantages and disadvantages of the rule-based, machine learning, and lexicon-based approaches. By referring to the table, we can see the limitations and advantages of the different approaches so that we can choose the better one.
Table 1. Approaches' benefits and limitations of different algorithms

| S. No | Approach | Advantages | Limitations |
|---|---|---|---|
| 1 | Rule-Based Approach | Data isn't required for training. Exceptional precision. It's a great way to collect data, since you can set up the system with rules and then let data flow in as people use it. | The recall rate is lower. The task of listing all of the requirements is laborious and time-consuming. |
| 2 | Machine Learning Approach | It is not necessary to use a dictionary. Demonstrates a high level of categorization precision. | Many times, a classifier trained on textual input in a single field does not function with other fields. |
| 3 | Lexicon-Based Approach | It is not necessary to provide labeled knowledge or a learning technique. | Requires rich semantic assets that aren't widely available. |
5 Text-Based Emotion Detection (TBED)

This section gives a general introduction to emotion models, which describe how emotions are recognized. Some datasets are indicated for academics looking for data for studies. Reference [1] identifies several different ways to describe emotions. According to the search results, 202 of the 1810 results available on IEEE Xplore for the search keyword "ED" over the whole year range were focused on "ED from texts." Similarly, out of a total of 5481 results for "ED," the Scopus database returned 593 "ED from texts" results. Figures 3 and 4 exhibit graphs depicting the distribution over a ten-year period (i.e., from 2010 to 2020). The results demonstrate that multimodal types of ED, such as speech, body language, facial expressions, and so on, are more commonly worked on than text-based ED. The scarcity is due to the fact that, unlike multimodal approaches, texts may not depict specific emotional cues, making emotion identification from texts significantly more challenging in contrast to other methods. Furthermore, the challenges of extracting emotions from grammatically incorrect texts, brief messages, sarcasm in written documents, contextual information, and other sources might be exhausting. Inadequate understanding of appropriate text extraction methods for the field, which is still in its infancy due to a lack of study, is a key roadblock to accurately recognizing emotions from written texts. The task can be formulated as the relation

r : A × T → E
Fig. 3. In the IEEE Xplore Database, a graph depicting the discrepancy of research in emotion detection and emotion detection from texts.
Fig. 4. In the Scopus Database, a graph depicting the discrepancy of research in emotion recognition and emotion detection from texts.
Here T stands for the text from which emotions are to be drawn, A stands for the author of T, and r represents the relationship between the author and their written texts, which frequently express emotions. While the problem may appear simple at first, determining the appropriate relationship under which an author can be meaningfully associated with their written texts in order to determine their emotions can be difficult. Text-based ED faces distinct hurdles as a result of all of these concerns. Despite its challenges, the field has made great progress in improving human-computer interaction. Applications include detecting and providing timely assistance to individuals who may be suicidal, detecting insulting sentences in conversations, chatbots for psychiatric counseling, and so on, all of which are still in the early stages of development.

5.1 Datasets for Text-Based ED (Emotion Detection) Research

The next critical stage in recognizing emotions from text, after settling on the model used to represent emotions, is collecting data relevant to the task. For research purposes, there are a few structured annotated datasets for emotion detection that are freely available. This section lists the most important publicly accessible datasets and their characteristics. Table 2 lists the datasets, their attributes, and the emotion models they reflect.
Table 2. Datasets for identifying emotions in texts that are publicly available

| S.No | Dataset | Feature | Emotion Model |
|---|---|---|---|
| 1 | ISEAR | 7665 phrases annotated for fear, joy, rage, sorrow, guilt, disgust, and shame; reactions were acquired from 37 nations through cross-cultural study. | Distinct |
| 2 | SemEval-2017 Task 4 | 1250 texts were selected from Twitter, Google News, news headlines, and other notable publications. The six primary emotions identified by Ekman have been labeled. | Distinct |
| 3 | EmoBank | Articles, blogs, news headlines, travel guides, newspapers, letters and fiction are just a few examples of what you may find on the internet. | Dimensional |
| 4 | WASSA-2017 Emotion Intensities (EmoInt) | Tweets were used to create this dataset, labeled for emotions including happiness, sadness, fear, and rage. | Distinct |
| 5 | Affect data from Cecilia Ovesdotter Alm | The emotions are divided into five categories: angry, afraid, pleased, sad, disgusted, and startled. | Distinct |
| 6 | DailyDialog | There are 13118 dialogues in this collection, all of which have been annotated for happiness, sorrow, rage, contempt, fear, surprise, and other emotions. | Distinct |
| 7 | CrowdFlower | It's made up of 39,740 tweets that have been annotated for thirteen (13) emotions. | Distinct |
| 8 | Grounded emotions | 2557 total tweets were gathered and examined to determine if they were in a joyful or sad mood. | Distinct |
| 9 | Emotional Stimulation | Data for the emotion lexical unit was created using FrameNet's annotated data. There are 1594 emotion-labeled sentences in this collection. | Distinct |
| 10 | The Valence and Arousal dataset | 2895 Facebook posts were used to create this dataset. | Dimensional |
| 12 | MELD data | Friends talks and utterances were used to compile this list. | Distinct |
| 13 | Emotion Lines | Conversations from the Friends TV show and Facebook messenger chats were used to compile this list. | Distinct |
| 14 | SMILE dataset | Tweets concerning the British Museum were used to compile this list. | Distinct |
| 15 | Dens Dataset | The data consists of 9710 paragraphs categorized as pleasure, anger, sadness, anticipation, fear, surprise, disgust, love, and neutral, from Wattpad stories and Project Gutenberg books. | Distinct |
| 16 | Aman Emotion Dataset | Blog posts were used to create this dataset. | Distinct |
6 Feature Set

Python was used to implement the feature extraction. SNoW takes only operational features as input, resulting in a feature set with an average size of 30 features. A list of features is provided below. These were incorporated as Boolean values with sustainable value ranges. In order to gain higher generalization coverage, the ranges often overlapped.

• The story's first phrase
• Combinations of specific characteristics
• Direct speech in a sentence (i.e., the entire quote)
• Type of narrative with a theme (there are three main categories and fifteen sub-types)
• Notable punctuation marks (! and ?)
• Word entirely in uppercase
• Number of words in a sentence (0–1, 2–3, 4–8, 9–15, 16–25, 26–35, > 35)
• Story range progress
• Count of Vs (verbs) in a phrase, excluding participles (0–1, 0–3, 0–5, 0–7, 0–9, > 9)
• Count of words expressing balances of contrary forces (1, 2, 3, 4, 5, 6)
• Emotion or feeling terms from WordNet
• Affective words and interjections
A variety of story-progress and feature conjunctions were coupled with counts of positive and negative words. Feature groups 1, 3, 5, 6, 7, 8, 9, 10, and 14 are obtained directly from phrases in the narrative, with features 9, 10, and 14 being tagged with the SNoW POS-tagger. The number of active verbs in a sentence is represented by group 10. Verb domination, together with quotes and punctuation, aims to represent the idea that emotion is frequently accompanied by greater activity and involvement. There are three main story categories in the current collection (JOKES AND ANECDOTES, ORDINARY FOLK-TALES, AND ANIMAL TALES), in addition to 15 subclasses (e.g., a subclass of the ORDINARY FOLK-TALE is supernatural helpers). Words are plainly significant in semantic tasks. We looked at specific word lists in addition to examining 'content words.' Synonyms and hyponyms were manually extracted for nouns and any verbal homonyms that were identical.

6.1 Extraction of Features

The goal of pre-processing is to make the border of each language structure explicit and to remove as many language-dependent elements as feasible, through tokenization, stop-word removal, and stemming [10]. The first stage in pre-processing is FE, which converts text materials into a readable word format. Pre-processing procedures include eliminating stop words and stemming words [12]. Text categorization materials are represented by a large number of characteristics, the majority of which may be irrelevant or noisy [9]. The elimination of a significant number of terms, ideally based on statistics, is known as DR (dimensionality reduction) and generates a low-dimensional vector [13]. Because successful dimension reduction makes the learning task more efficient and saves storage space, DR approaches have lately received a lot of attention [14]. The steps most used for feature extraction (Fig. 5) are:

• Tokenization: the process of treating a document as a string and then partitioning it into tokens.
• Removing stop words: stop words like "the," "a," "and" and so on are commonly used, thus these unnecessary words should be omitted.
• Stemming: using a stemming method to transform diverse word forms into comparable canonical forms. This stage is the process of conflating tokens to their base form, such as connection to connect, computing to compute, and so on (see the sketch after this list).
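The sketch below runs the three pre-processing steps above with NLTK; the choice of library and the example sentence are assumptions for illustration, not a requirement of the surveyed methods.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads of the tokenizer model and the stop-word list
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

text = "Computing connections between documents requires computing word forms."

tokens = word_tokenize(text.lower())                                  # tokenization
stop_set = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_set]   # stop-word removal
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]                           # stemming: computing -> comput

print(stems)
```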
Fig. 5. Text document classification
6.2 Feature Selection

After feature extraction, the next stage in text classification pre-processing is feature selection to generate the vector space, which improves a text classifier's scalability, efficiency, and accuracy. A decent feature selection approach should take domain and algorithm properties into account [15]. The primary concept behind FS is to choose a subset of characteristics from the source papers. FS is carried out by retaining the words with the highest score based on a preset estimate of the word's value [9]. The chosen attributes preserve the physical meaning of the data and improve understanding of the learning process [11]. The large dimensionality of the feature space is a key issue in text categorization. Almost every text domain includes a large number of characteristics, the majority of which are neither relevant nor advantageous for text classification tasks, and even small noise features can degrade classification accuracy significantly [16]. As a result, FS is widely utilised in text classification to minimise feature space dimensionality and enhance classifier efficiency and accuracy.
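One common way to realize this score-based selection is a chi-squared test over term counts; the sketch below uses scikit-learn and the 20 Newsgroups corpus purely as an example, since the survey does not prescribe a specific statistic or dataset.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

data = fetch_20newsgroups(subset="train", categories=["rec.autos", "sci.med"],
                          remove=("headers", "footers", "quotes"))

vec = CountVectorizer(stop_words="english", max_features=5000)
X = vec.fit_transform(data.data)

# Keep only the 50 terms with the highest chi-squared score w.r.t. the class labels
selector = SelectKBest(chi2, k=50).fit(X, data.target)
kept_terms = [t for t, keep in zip(vec.get_feature_names_out(), selector.get_support()) if keep]
print(kept_terms[:10])
```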
7 Comparison Analysis

The lexicon-based, machine learning-based, and hybrid approaches are the three types of approaches to sentiment analysis. This research examines the differences between lexicon-based and machine learning-based techniques.

7.1 Lexicon-Based Approach

The lexicon-based strategy relies on a sentiment dictionary of sentiment terms. The emotion words are given a score that signifies positive, negative, or neutral feeling. The collection of sentiment terms, phrases, and even idioms forms the basis of lexicon-based sentiment analysis. There are two types of lexicon-based approaches: dictionary-based classification and corpus-based classification, as shown in Fig. 6.
Fig. 6. Lexicon based Classification
7.2 Dictionary-Based Classification

In this sort of categorization the data is manually gathered, and the information is searched for synonyms and antonyms in a sentiment dictionary. WordNet and SentiWordNet are the two dictionaries in question.

7.3 Corpus-Based Classification

This comes close to the goals of dictionaries within a single topic. The terms refer to Latent Semantic Analysis (LSA) and a method based on semantics, both of which are statistical and semantic methodologies.

7.4 Machine Learning Based Classification

Machine learning algorithms are the most effective approach for classifying sentiments in text into positive, negative, and neutral categories. Machine learning necessitates a dataset for training and testing: the model is trained on a learning dataset and evaluated through its validation performance. The reviews are categorized using machine learning techniques. Supervised learning algorithms and unsupervised learning algorithms are the two types of machine learning algorithms. SVM, Maximum Entropy, Naïve Bayes, and KNN are examples of supervised learning algorithms. HMM, Neural Networks, PCA, SVD, ICA, and others are examples of unsupervised machine learning methods [2].

7.5 Support Vector Machine

It is one of the most effective classification approaches among machine learning algorithms. In contrast to classical learning approaches, which are based on minimizing the empirical risk (the performance on the learning set), SVM is based on structural risk minimization, which determines the hypothesis with the lowest chance of mistakes [3]. This leads to quadratic optimization problems. Complex categorization problems require a greater number of patterns and a larger scale. The dimensionality of the feature space has no bearing on SVM.
7.6 Comparative Table for Different Classification Algorithms

Table 3 shows the accuracy percentage of the different algorithms applied. Through this accuracy we can determine which algorithm gives the best result; we can also see the attribute totals for each algorithm, such as the totals of positive, negative, and neutral words, from which the accuracy percentage of each algorithm can be calculated.

Table 3. Comparison of different algorithms
| S.No | Algorithm | Total words | Positive words | Negative words | Neutral words | Accuracy percentage (%) |
|---|---|---|---|---|---|---|
| 1 | NB [10] | 5576 | 2115 | 1199 | 103 | 96 |
| 2 | SMO [10] | 5576 | 1987 | 1254 | 127 | 96 |
| 3 | Random Forest [13] | 5576 | 1419 | 1210 | 186 | 96 |
| 4 | Random Tree [13] | 5576 | 1204 | 1213 | 34 | 100 |
| 5 | Keyword [13] | 5576 | 2250 | 720 | 806 | 97 |
| 6 | Emotion [13] | 5576 | 1456 | 530 | 1700 | 93 |
| 7 | Sentiword [13] | 5576 | 2110 | 513 | 203 | 87 |
| 8 | SVM [5] | 45000 | 23514 | 21486 | - | 76 |
| 9 | Maximum Entropy [5] | 45000 | 22606 | 22226 | - | 75 |
| 10 | CNN-KNN | 3500 | 600 | 600 | - | 91 |
7.7 Naïve Bayes

It is a straightforward and efficient categorization algorithm. It is typically used to classify documents at the document level. It estimates the probabilities of words and categories in a text document. It relies heavily on feature-based techniques. It provides quick and accurate categorization, and large datasets are not required.

7.8 K-Nearest Neighbor

For a comparable test document, it identifies the category labels of the related training documents. The KNN approach [4] classifies objects into object-based classes. It is a sort of lazy learning in which the function is only estimated locally, and all computation is deferred until classification. It is used to calculate the Euclidean or Manhattan distance [5].
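As a small illustration of the two distances mentioned for KNN, the snippet below computes them for two hypothetical term-count vectors (the values are invented for demonstration).

```python
import numpy as np

doc_a = np.array([1.0, 0.0, 2.0, 3.0])   # e.g., term counts of a test document
doc_b = np.array([0.0, 1.0, 2.0, 1.0])   # e.g., term counts of a training document

euclidean = np.sqrt(np.sum((doc_a - doc_b) ** 2))
manhattan = np.sum(np.abs(doc_a - doc_b))
print(euclidean, manhattan)
```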
7.9 Maximum Entropy

This categorization method has demonstrated its usefulness in NLP applications. It makes no independence assumption between the characteristics, so performance may be improved when the conditional independence assumption does not hold. With training data, the model matches the expected values of the feature functions [5].

7.10 Decision Tree Learning

It is a tree-based method that consists of a collection of root and child nodes and concentrates on the desired outcomes. The decision tree model is a flow-chart-like structure in which every internal node represents a test on a text attribute, each branch represents an outcome of the test, and the leaf nodes represent the class distributions. ID3, CART, and C4.5 are three well-known DT algorithms. The ID3 method is a relatively simple technique that is used to partition data into categories. The CART algorithm uses the Gini coefficient as the selection criterion for the test attribute. C4.5 extends ID3 and uses the gain ratio as the splitting criterion.

7.11 Semantic Orientation Approach

It is a classification based on unsupervised learning. A training dataset is not required. It identifies the positive and negative measures incorporated in the words.

7.12 Keyword-Based Classification

The categories are built from a "bag of words." Because the terms are domain independent, they are classed as either positive or negative. It gives precise spelling-based categorization and assigns equal weight to each word [6].

7.13 Emotion-Based Classification

This refers to classification over fundamental emotions. A fixed set is used to categorize the feelings as positive or negative, and positive and negative emotions are carefully distinguished; in the text files, positive and negative emotions are represented by a set of symbols [6]. Table 4 provides a summary of the state-of-the-art literature in the domain of text-based ED as discussed in this article. The work has been arranged according to the year of publication (in descending order, from 2019) in order to aid in the comprehension of the field, taking into account the progress of research.
Table 4. An overview of recent developments in text-based emotion identification
| S.No | Approach | Dataset(s) | Findings | Limitations |
|---|---|---|---|---|
| 1 | Machine Learning | Emo-Dis-HI data | Cross-lingual embeddings and transfer learning were used to demonstrate that information gleaned from valuable resource languages may be applied to other fields of language. An F1 score of 0.53 was obtained. | Words are used without consideration for their context. |
| 2 | Machine Learning | Tweets | With an accuracy of 72.06% versus 55.50%, the NB machine learning technique outperformed the KNN machine learning technique. | Contextual information in sentences is extracted in a limited way. |
| 3 | Rule Based | ISEAR data | In the ISEAR dataset, emotions were detected with a strong focus on phrasal verbs. | Words are used without consideration for their context. |
| 4 | Machine Learning | Tweets | A BERT and HRLCE model was given for text-based emotion recognition. For the joyful, furious, and sad emotion classes, they received an F1 score of 0.779. | There are a lot of misclassifications. |
| 5 | Machine Learning | Task 3 dataset for SemEval-2019 | An attention-based paradigm for categorizing emotions was presented. They scored 0.7582 on the F1 scale. | Doesn't perform well when it comes to identifying happiness. |
| 6 | Machine Learning | Texts in many languages on Facebook | A bi-directional transformer BERT architecture was proposed. Hindi texts had a 0.4521 F1 score, whereas English texts received a score of 0.5520. | - |
| 7 | Hybrid | Tweets | Both online and offline, text emotions were detected using SVM, NB, and a Decision Tree. A 90% accuracy rate was found. | Loose semantic feature extraction. |
| 8 | Hybrid | News Headlines | Emotions were classified into six groups by using an SVM classifier. | Performance could be improved with a more robust classification technique. |
| 9 | Machine Learning | YouTube Comments | Accuracy of emotion classification: 59.2%, 65.97%, and 54.24%, respectively, for multiclass emotion labels. | Accuracy outcomes that are satisfactory. |
| 10 | Machine Learning | Interviews, forums, and article comments were used to create a dataset | The SVM had an accuracy of more than 75% and a recall of more than 80%, while the Tree Bagger and the Multilayer Neural Network both have recall and accuracy of above 75%. | In the model or in the design, there is no semantic representation. |
| 11 | Hybrid | Tweets | To extract actionable emotion patterns from Tweets, the NRC emotion lexicon and SVM were used. | Generalization is challenging due to the small number of emotion classifications. |
| 12 | Machine Learning | Emoji Prediction shared task at SemEval-2018 | A label-wise attention LSTM method was proposed for recognizing emotions in emojis. | With regularly used emojis, the model did not operate properly. |
8 Issues in Text Sentiment Analysis

This section discusses some of the outstanding concerns that have been identified and suggests some potential research directions for emotion detection researchers who work with text. The state-of-the-art discussion in this article revealed that research in the field is primarily divided into two parts: language representation and categorization. During language representation, the extraction of contextual information is critical, since it provides the foundation for increasing categorization accuracy. The need to offer a comprehensive method for extracting this contextual information from text has been regarded as a critical concern. The use of transformer-based embeddings improved the quality of contextual data extraction significantly. However, various constraints, such as out-of-vocabulary (OOV) restrictions, an increased level of complexity, and, most crucially, overfitting in small networks, affect the usage of transformers. Because of these limitations, an ensemble of attention and neuro-fuzzy networks might help lessen the limiting effects of transformers and, as a result, improve categorization performance. Prior to classification, the attention networks should focus on extracting important features; the neuro-fuzzy networks, on the other hand, should provide clearer intelligibility and categorization of the recovered characteristics.
9 Conclusion

The results of the performed systematic literature review include research on sentiment analysis in social media. The paper makes the following three contributions. First, we go through the approaches for assessing social media sentiment. Although several
methods have been proposed by researchers, the most frequent methods used in lexicon-based approaches are SentiWordNet and TF-IDF, while Naïve Bayes and SVM are used in machine learning. The data itself determines which type of sentiment analysis is appropriate. Both strategies yielded comparable results in terms of accuracy. The structure of the text, as well as the time and amount of data, must all be considered. Combining lexical and machine learning methods is recommended to increase the quality and accuracy of the outcome. If the data structure is jumbled, there is a small amount of data, and only a short amount of time is available for analysis, the lexicon-based strategy is advised. Machine learning-based methods are better suited to larger data sets, since they require more time and data to train. Second, we identify which social media sites are most widely utilized to collect data for sentiment analysis. Twitter is the most widely used social media channel for information gathering; the bulk of the articles in this review make use of Twitter as their primary social media platform. This is because of Twitter's enormous availability, accessibility, and diversity of content. Millions of tweets are sent out every day on virtually any topic. As a result, social media is quickly becoming a vital source of information. Blogs, WordPress, YouTube, and other social media sites, on the other hand, attract less attention. Because the content of each social networking site may differ, it is worthwhile to investigate different possibilities and discoveries.
References 1. Borod, J.C.: The Neuropsychology of Emotion. Oxford University Press, Oxford, UK (2000) 2. Ajayi, A., Idowu, S.A., Anyaehie Amarachi, A.: Comparative study of selected data mining algorithms used for intrusion detection. Int. J. Soft Comput. Eng. (IJSCE) 3(3), 237–241 (2013) 3. Ashari, A., Paryudi, I., Min Tjoa, A.: Performance comparison between Naïve Bayes, decision tree and k-nearest neighbor in searching alternative design in an energy simulation tool. Int. J. Adv. Comput. Sci. Appl. 4(11), 33–39 (2013) 4. Garcia, V., Debreuve, E., Barlaud, M.: Fast k nearest neighbor search using GPU. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–6. Anchorage, AK, USA (2008). 5. Emelda, C.: A comparative study on sentiment classification and ranking on product reviews. Int. J. Innovative Res. Adv. Eng. 1(10), 7 (2014) 6. Kharde, V.A., Sonawane, S.S.: Sentiment analysis of twitter data: a survey of techniques. Int. J. Comput. Appl. 139(11), 5–15 (2016) 7. D’Andrea, A., Ferri, F., Grifoni, P., Guzzo, T.: Approaches, tools and applications for sentiment analysis implementation. Int. J. Comput. Appl. 125(3), 26–33 (2015) 8. Yoo, G., Nam, J.: A hybrid approach to sentiment analysis enhanced by sentiment lexicons and polarity shifting devices. In: The 13th Workshop on Asian Language Resources, pp. 21–28. Kiyoaki Shirai, Miyazaki, Japan (2018) 9. Montañés, E., Fernández, J., Díaz, I., Combarro, E.F., Ranilla, J.: Measures of rule quality for feature selection in text categorization. In: R. Berthold, M., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds.) IDA 2003. LNCS, vol. 2810, pp. 589–598. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45231-7_54
10. Wang, Y., Wang, X.J.: A new approach to feature selection in text classification. In: Proceedings of 4th International Conference on Machine Learning and Cybernetics, IEEE- 2005, vol. 6, pp. 3814–3819 (2005) 11. Liu, H., Motoda, H.: Feature Extraction, constraction and selection: A Data Mining Perpective. Kluwer Academic Publishers, Boston, Massachusetts (MA) (1998) 12. Lee, L.W., Chen, S.M.: New Methods for Text- Categorization Based on a New Feature Selection Method and New Similarity Measure Between Documents, IEA/AEI, France (2006) 13. Manomaisupat, P., Abmad, K.: Feature Selection for text Categorization Using Self Orgnizing Map. In: 2nd International Conference on Neural Network and Brain, 2005, vol. 3, pp.1875– 1880, IEEE Press (2005) 14. Yan, J., Liu, N., Zhang, B., Yan, S., Chen, Z., Cheng, Q., Fan, W., Ma, W.: OCFS: optimal orthogonal centroid feature selection for text categorization. In: 28 Annual International conference on Reserch and Informational reterival, ACM SIGIR, Barizal, pp.122–129 (2005) 15. Wang, Z.-Q., Sun, X., Zhang, D.-X., Li, X.:“An optimal SVM-based text classification algorithm. In: Fifth International Conference on Machine Learning and Cybernetics, Dalian, pp. 13–16 (2006) 16. Chen, J., Huang, H., Tian, S., Qua, Y.: Feature selection for text classification with Naïve Bayes. Expert Syst. Appl. 36, 5432–5435 (2009) 17. Dash, R.K., Nguyen, T.N, Cengiz, K., Sharma, A.: Fine-tuned support vector regression model for stock predictions. Neural Comput. Appl. 1–15 (2021). https://doi.org/10.1007/s00 521-021-05842-w 18. Vaswani, A., et al.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017) 19. Ranjan, G., Nguyen, T.N., Mekky, H., Zhang, Z.L: On virtual id assignment in networks for high resilience routing: a theoretical framework. In: GLOBECOM 2020–2020 IEEE Global Communications Conference, IEEE pp. 1–6 (2020) 20. Kiran, Parameshachari, B.D., Panduranga, H.T., liberata Ullo, S.: Analysis and computation of encryption technique to enhance security of medical images. IOP Conf. Ser.: Mater. Sci. Eng. 925, 012028 (2020) 21. Tyagi, A.K.: Using Multimedia Systems, Tools, and Technologies for Smart Healthcare Services. IGI Global (2022). https://doi.org/10.4018/978-1-6684-5741-2
RePI: Research Paper Impact Analysis Ananya Uppal(B) , P. Maitreyi, H. R. Mamatha, and Jamuna Department of Computer Science and Engineering, PES University, Bangalore, India [email protected], {mamathahr,jamuna}@pes.edu
Abstract. The purpose of this study is to analyze the impact a Research Paper has on other Research Papers which cite it. It can be used by students and researchers to understand a domain and the reach of their work. The method of studying the relevance by reading all citing documents is time-consuming. RePI provides a web-based tool to visualize the key aspects of a paper. It makes use of NLP to extract keywords and APIs to analyze metadata such as the number of cited papers, influential citation count, year of publication, etc. This data can be obtained by a unique identifier for the Research Paper such as a DOI number, etc. RePI also introduces the concept of Impact Metric Ratio which quantifies the Paper’s relevance. It is extrapolated using the influential citation count and makes it easy to compare the papers efficiently. Current methodologies provide techniques to measure journal impact very well, but Research Paper Impact Analysis has scope for further research.
Keywords: Research Paper Analysis · Keyword Extraction · Semantic Scholar API · Impact Metric Ratio

1 Introduction
There are several different ways to define the impact of a paper based on what the paper discusses and whom the paper directly affects. Different organizations may be concerned with different aspects of a research study. It is essential to analyze this impact in order to understand the problems and maximize the impact the research has [1]. Research is an incremental process and has been ongoing ever since the beginning of humanity. One uses earlier findings and builds upon them in order to make new discoveries. This idea forms the basis for the RePI tool. We analyze the impact a particular research paper has on other papers in similar domains by analyzing the referencing paper and the citing papers. Gauging the reach of one's work helps the author find potential collaborators who might be interested in carrying out more work in the same domain. Natural Language Processing provides a way to extract keywords from a full research paper corpus, but this method is quite time-consuming and may produce varied results. A better way to extract the key aspects of a paper is by analyzing its abstract, as it provides a brief overview of what the research paper
encompasses. RePI provides an interactive web-based tool that combines all these ideas to visualize the recurring themes in a paper and provide an overview of its essential components. The motivation behind this project is that the most commonly used way of gauging the importance and relevance of a publication is through its citation count and the impact of subsequent works that cite it, yet the exhaustive method of studying the relevance of a research paper by manually reading every other paper that references it is not feasible. Another hurdle faced while reading a paper is the increase in general scientific jargon and complicated sentence structures [2], making the exhaustive reading of a paper even harder. Hence there is a need for a simple, easy-to-use tool that does the exhaustive work and presents users with a clear and concise estimate of a paper's impact.
2 Literature Survey

2.1 Previous Works
Many researchers have explored the domain of analysis of research paper impact. Initial works used the h-index and its variant, the g-index, as indicators to quantify a researcher's scientific output [3]. The Science Citation Knowledge Extractor [4] is a tool that undertakes impact analysis of research papers in the biomedical field. It makes use of NCBI's Entrez API, which uniquely identifies a paper using its PubMed ID. There are tools such as Open Knowledge Maps which provide a visual representation of research papers from multiple domains. It uses a mixture of summarization techniques and similarity measures based on article metadata. The tool also makes use of collaborative filtering to produce recommendations and allows users to add their inputs to tool-created knowledge maps [5]. There also exist keyword-based search engines for research papers, such as Dimensions [6]. This tool aims to make publication and citation data openly available to all and has produced a database that covers the entire research process from start to finish, from the funding stage to the actual publication. Citation Gecko is another web tool that helps a user visualize papers in the form of a citation map. It creates a hierarchical structure of papers cited by its seed paper and vice versa. Semantic Scholar is another example of a web-based search engine for research papers. It also provides an API to fetch metadata of a requested paper [7]. The Semantic Scholar Open Research Corpus provides access to 81.1 million papers. Each paper has an abstract and associated metadata aggregated from multiple literature archives. CiteSeerX is a web-based tool that incorporates a scientific literature digital library and search engine [8].

2.2 Impact Metrics
Studies have shown that even though the number of downloads increases over the lifetime of a paper there is only a moderate correlation between the number
of downloads and the number of citations. The total number of works citing a given paper, along with the downloads, is essential for any metric developed. The CiteScore metric has been a recent development in this field and is calculated for a given journal [9]. This metric was introduced by Elsevier and uses a 4-year citation window to calculate the ratio A:B, where A is the count of citations of articles, blogs, etc. in a journal, and B is the total number of articles and blogs published in that journal in the time period. The drawback of this metric is that it is relevant to a journal, not to a specific paper. The Journal Impact Factor (JIF) is another common metric [10] that uses data from a two-year time span and considers only articles and blogs. It represents the number of times articles published in a journal during that time span have been cited in the JCR year. A drawback of this metric is that not all journals have a JIF score. The PIE method argues that neither the downloads nor the citations alone can provide a unified metric that quantifies the impact [11]. The Citation Velocity measure provides insights into paper diffusion patterns by finding papers cited with lag times similar to reference papers [12]. The Relative Citation Ratio attempts to isolate influential research papers by calculating the journal citation rate (JCR) for each journal that published a particular article [13]. Another metric for impact analysis is obtained by plotting a Zipf graph, which compares the number of citations against the citation rank [14].

2.3 Keyword Extraction
Text summaries are essential, as the exponential growth of available information makes reading entire texts impractical. Abstracts and keywords are the most popular forms of summarizing academic and scientific texts. A keyword is either a single word or a collection of words representing a concept. Previous works have implemented methods of extracting keywords based on their frequencies and co-occurrences. This method of extraction is known as Term Frequency-Inverse Document Frequency (TF-IDF). Domain-independent extraction and extraction without the use of a corpus give a keyword extraction model more versatility in terms of application [15, 16]. There exist several statistical models which facilitate keyword extraction, such as the N-gram model, which combines n words together and computes the probability of that phrase appearing in the text. However, this Markov model is heavily dependent on the entire corpus and is time-consuming, thus making it difficult to use in our tool [17].
3 Dataset Creation
Cornell University has compiled a dataset containing the metadata of 1.75+ million scientific documents published in arXiv which has been selected for the purpose of training and testing the tool. The chosen dataset is a mirror of the original arXiv dataset containing around 60,000 samples with the metadata of paper in a json format. Features included: PaperID, List of Authors, Title of
Paper, Journal, DOI ID, Abstract, Year of Publication. Preprocessing steps like data cleaning were applied to the original dataset, after which the features relevant to the application were extracted and used to build a more usable dataset. The arXiv dataset is updated every week, and thus the latest papers can also be included in the visualization. The scikit-learn Python module is used for preprocessing and feature extraction [18].
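As a rough illustration of this preprocessing step, the sketch below reads the arXiv metadata snapshot line by line and keeps the listed features. The file name and field names follow the publicly distributed snapshot and are assumptions, not the authors' exact code.

    # Sketch: load the arXiv metadata snapshot (JSON lines) and keep the RePI features.
    # Field names ("id", "authors", "title", "journal-ref", "doi", "abstract") are assumptions.
    import json
    import pandas as pd

    records = []
    with open("arxiv-metadata-oai-snapshot.json") as f:
        for line in f:
            paper = json.loads(line)
            records.append({
                "paper_id": paper.get("id"),
                "authors": paper.get("authors"),
                "title": paper.get("title"),
                "journal": paper.get("journal-ref"),
                "doi": paper.get("doi"),
                "abstract": paper.get("abstract"),
            })
            if len(records) >= 60000:   # subsample, roughly matching the mirrored dataset size
                break

    df = pd.DataFrame(records).dropna(subset=["abstract"]).drop_duplicates("paper_id")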
4 Methodology and Architecture
Fig. 1. Overview of Steps Involved in RePI
RePI is a web-based tool that can be used to view how a published Scientific/Research paper impacts other works in its domain. Natural language Processing, Data Analysis, and Visualization techniques are used to provide the user with concise, easy-to-interpret results. The workflow for RePI has been described in Fig. 1. There are 3 major steps involved in the implementation of RePI as shown which are API calls using an identifier like DOI, Keyword Extraction, and Visualization. When the user inputs a unique DOI number or any other unique identification number for a published paper, a GET request is made to the Semantic Scholar API. Semantic Scholar Literature Graph provides capabilities to retrieve data about a paper, the paper’s authors, and its citations. It also provides capabilities to search for all papers published by a particular author and features about the author such as AuthorId, Name, Affiliation, and Citation Count. It also produces a citation count and an influential citation count metric which is calculated using features such as the total number of citations, the number of direct citations per section, a boolean value for is Considered Helpful, 1/number of references, etc. A list of relevant information such as the authors, the year of publication, the publication journal, in citation count, out citation count, and the paper’s abstract is returned in the JSON payload which is then split and essential information is extracted. After the information retrieval step, the data analyzed and visualizations of multiple factors such as the number of in citations vs out citations, citation count per year, etc are shown. The cited papers are also grouped year by year to depict if the influence of the citing paper has increased or decreased over the years. It gives an idea of how relevant the citation is. As discussed in the Literature Survey, it can be seen that most current works focus more on assigning a metric to
quantify the importance of a journal and not a single paper itself. This presented us with the motivation to understand how the relevance of a single paper can be quantified. The Impact Metric (IM) is calculated on the basis of the Influential Citations count and overall citation counts in the paper’s lifetime. The abstract retrieved is processed using the RAKE algorithm in order to extract the keywords relevant to the paper. A word cloud is constructed from the generated keywords. Keyword extraction and word cloud generation are done for all the citing documents as well in order to find the more recurring topics and domains influenced by the given paper. The steps involved in the work can be seen in the Work Flow diagram in Fig. 2.
5 Design and API Integration
The application uses Semantic Scholar's RESTful Application Programming Interface in order to obtain information about a paper requested by the user. A GET request is made to a URL to which the necessary identifier, such as a DOI, ARXIV, PUBMED ID, or ACL identifier, is appended. After the input data types are validated, the GET request is sent to the Semantic Scholar API endpoint and a JSON payload containing all the necessary metadata is returned. This metadata is used for further processing.
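As an illustration of this request flow, the following Python sketch calls the Semantic Scholar Graph API for a paper identified by its DOI. The exact endpoint, field list, and error handling shown here are assumptions based on the public API and may differ from the authors' implementation.

    # Sketch of the metadata lookup: one GET request to the Semantic Scholar API for a DOI.
    # Endpoint and field names are illustrative assumptions, not the authors' exact code.
    import requests

    def fetch_paper(doi):
        url = f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}"
        fields = "title,year,abstract,citationCount,influentialCitationCount,references,citations"
        response = requests.get(url, params={"fields": fields}, timeout=30)
        response.raise_for_status()      # surface HTTP errors instead of failing silently
        return response.json()           # JSON payload with the requested metadata

    paper = fetch_paper("10.1103/PhysRevD.76.044016")
    print(paper.get("title"), paper.get("influentialCitationCount"))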
Fig. 2. Work Flow Diagram of RePI
The application extracts keywords from the abstract obtained for a specified paper. Extracting keywords only from the abstracts helps in reducing a request’s processing time. The following method of keyword extraction was used:
5.1 RAKE
Rapid Automatic Keyword Extraction [19] is an NLP algorithm for keyword identification and extraction. It is document-oriented and domain-independent. The algorithm works on the principle that keywords rarely contain punctuation or lexically meaningless words. It works by partitioning the document into candidate keywords on the basis of stop words and identifying the 'content-bearing' words among them. It uses three measures, namely word degree, word frequency, and their ratio.
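To make the principle concrete, the following self-contained Python sketch implements a simplified RAKE-style scorer. The stop-word list is deliberately tiny and the scoring is the common degree/frequency heuristic, so it is an illustration rather than the exact algorithm of [19] or the implementation used in RePI.

    # Simplified RAKE-style scoring: split text into candidate phrases at stop words,
    # then score each word by degree / frequency and each phrase by the sum of its word scores.
    import re
    from collections import defaultdict

    STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "for", "on", "with", "by"}

    def rake_keywords(text, top_k=5):
        words = re.findall(r"[a-zA-Z]+", text.lower())
        phrases, current = [], []
        for w in words:
            if w in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(w)
        if current:
            phrases.append(current)

        freq, degree = defaultdict(int), defaultdict(int)
        for phrase in phrases:
            for w in phrase:
                freq[w] += 1
                degree[w] += len(phrase)      # degree counts co-occurring words in the phrase
        word_score = {w: degree[w] / freq[w] for w in freq}
        phrase_scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
        return sorted(phrase_scores, key=phrase_scores.get, reverse=True)[:top_k]

    print(rake_keywords("RePI provides a web-based tool to visualize the key aspects of a paper"))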
5.2 Implementation
The described workflow has been implemented in the Python [20] language using libraries such as streamlit, seaborn, and numpy. Since Python is open-source and free, it was the best option to use. Streamlit provides a user-friendly interface to visualize graphs clearly and also provides the capability to render multi-page applications.
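A minimal sketch of how such a Streamlit page might look is given below; fetch_paper is the hypothetical helper sketched earlier in this section, and the widget choices are assumptions rather than the authors' actual layout.

    # Minimal Streamlit front-end sketch: read an identifier, fetch metadata, render a chart.
    # fetch_paper() is the hypothetical helper from the API sketch above.
    import streamlit as st
    import matplotlib.pyplot as plt

    st.title("RePI: Research Paper Impact Analysis")
    doi = st.text_input("Enter a DOI (or another unique identifier)")
    if doi:
        paper = fetch_paper(doi)
        st.write(f"Title: {paper.get('title')} ({paper.get('year')})")
        fig, ax = plt.subplots()
        ax.bar(["citations", "influential citations"],
               [paper.get("citationCount", 0), paper.get("influentialCitationCount", 0)])
        st.pyplot(fig)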
6 Impact Metric Calculation
The Impact Metric (IM) is used for the quantification of the impact of a paper on other works. It gives users a numeric value that can be used for comparison across different papers. Previous works in the field have calculated impact factors based only on the total citation count and the total number of downloads for a given paper. Our approach uses the total citations for a paper along with its total Influential Citation Count to quantify the overall impact of a requested paper. The Impact Metric is calculated as

$IM = \frac{ICC}{TCC}$

where ICC is the total Influential Citation Count and TCC is the Total Citation Count for a given paper.
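A direct translation of this ratio into code is straightforward; the short sketch below also applies the 0.05 threshold used later in Sect. 7.2. The function name and the zero-citation guard are our own additions, not part of the paper.

    # Impact Metric from the two counts returned by the API; 0.05 is the naive threshold of Sect. 7.2.
    def impact_metric(influential_citation_count, total_citation_count):
        if total_citation_count == 0:
            return 0.0                      # avoid division by zero for uncited papers
        return influential_citation_count / total_citation_count

    im = impact_metric(15, 138)             # values of the first sample DOI in Table 2
    print(f"IM = {im:.4f} ->", "good paper" if im > 0.05 else "below threshold")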
7 Results
The results obtained from our study can be summarized as follows:

7.1 Visualizations
The technique chosen to represent the results is via visualization of different parameters against one another. This was chosen because it is easier to glance over a result rather than understand it from the heavy textual content. RePI provides illustrative, easy-to-understand visualizations to help users see the impact of a requested paper. The visuals include: (i) A graph of the number of citations used by the paper vs the number of other
works citing it, as seen in Fig. 3. This graph was plotted to understand if the impact of a paper increased over the years by analyzing the number of times it has been cited over time. If the number of out citations is increasing it shows that the paper is becoming relevant in the field over time which is a good measure of how useful it is.
Fig. 3. In Citations vs Out Citations Graph
(ii) The citation count of the given paper by year (since publication to the present day) is shown in Fig. 4 in order to gauge the relevance of a paper through the years since its publication. An increase in the citation count shows the continued relevance of a paper in its pertaining fields.
Fig. 4. Citations grouped by Years
(iii) A word cloud of keywords for the paper requested by the user. This provides a bird’s eye view of the main ideas encompassed by the paper and helps the user understand the major themes without having to read through or skim the entire corpus. An example of this can be seen in Fig. 5.
Fig. 5. Word Cloud of Keywords for Requested Paper
(iv) A word cloud describing keywords for all the works citing the paper. This is a clear and visual way of understanding the overlapping keywords between the cited paper and citing papers. It provides some context as to how connected the papers are, as seen in Fig. 6.
Fig. 6. Keywords of Citing Papers
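The word clouds in Figs. 5 and 6 can be produced from the extracted keywords with the wordcloud package; the sketch below is a minimal, hypothetical version of that step and does not reproduce the exact styling of the figures.

    # Sketch of the word-cloud step: join the extracted keywords into one string and render it.
    # Assumes the third-party `wordcloud` package; the keyword list here is a toy example.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    keywords = ["impact analysis", "citation count", "semantic scholar", "keyword extraction"]
    cloud = WordCloud(width=800, height=400, background_color="white").generate(" ".join(keywords))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()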
Table 1 gives an overview of the visualisations and the reasoning they provide.

7.2 Impact Metric Ratio
The Semantic Scholar API provides many parameters describing a paper, one of which is the Influential Citation Count [21]. This is calculated using a supervised machine-learning approach for classification. The features used for this classification include the total number of direct and indirect citations, author overlap, the field of cited paper, etc. The models used for classification are SVM and random forest.
Table 1. Understanding Results

Visualization Method | Reason for choice of the method
In-Citation vs Out-Citation Graph | To visualize the ratio of the citations in a given paper to the number of other works citing it, hence determining its influence.
Yearly Citation Count Graph | To visualize the increase in the citations of a paper through the years since its publication.
Keyword Word Cloud for the requested paper | To find the topics and domains covered in the requested paper.
Keyword Word Cloud for all the works citing a requested paper | To find the fields of study influenced by the paper's work.
Fig. 7. Impact Metric Ratio
The Impact Metric is the ratio of the number of Influential Citations of a given paper to its total number of citations, as shown in Fig. 7. Based on a few naive trials, an impact metric above 0.05 represents a good paper, i.e., a paper with many Influential Out Citations. The results for sample DOIs can be seen in Table 2.

Table 2. Sample Results for Research Papers

DOI of Paper                 ICC  TCC  Impact Metric        Result
10.1103/PhysRevD.76.044016   15   138  0.10869565217391304  IM >0.05: Good Paper
10.1086/518646               31   155  0.2                  IM >0.05: Good Paper
10.1086/519078               1    20   0.05                 IM >0.05: Good Paper
10.1103/PhysRevD.75.094017   3    96   0.03125              IM <0.05
    If Cost(S) > opcost then pc = pc + 1; end if
    If Cost(S) < Cost(S*) then S* = S; end if
    If pc > ip then DisturbSolution(S); pc = 0; end if
    UpdateTabuLists();
    ic = ic + 1;
  end while
  S = S*;
  CleanTabuState();
  For i = 0 to nit2 do
    Movement(S, modeN);
    If Cost(S) < Cost(S*) then S* = S; end if
    UpdateTabuLists();
    ic = ic + 1;
  end for
  return S*

Movement Function (Neighborhood)
  Input: solution S
  Input: neighborhood size ns
  localS = S;
  index = RandomIndex();
  first = FirstAGEBFrom(index);
  Interchange(localS[index], first);
  localcost = Cost(localS);
  Sn = localS;
  For i = 1 to ns do
    rand = RandomAGEBFrom(index);
    Interchange(Sn[index], rand);
    cost_n = Cost(Sn);
    If cost_n < localcost then localS = Sn; localcost = cost_n; end if
  end for
  S = localS;
  cost = localcost;
  StartTabuState();
  UpdateMemory();
3.3 Parameters Description

• nit1: Determines the number of iterations of the first phase (stopping criterion) to find a good solution.
• nit2: Determines the number of moves or iterations that will be made over the best solution found in the first phase.
• ip: Determines the maximum number of times worse solutions will be accepted before perturbing the current solution (only during phase one).
• tat: The tabu add tenure determines for how many iterations a new centroid added to the current solution cannot be replaced.
• tdt: The tabu drop tenure determines for how many iterations a centroid dropped from the current solution cannot be added again.
• ns: Determines the number of neighbors to evaluate in the local search process.

3.4 Algorithm Description

An initial random solution is generated, considered at this point as the best solution found (S*). This happens at the beginning of the search, and the first search phase is started, which performs a number of iterations given by (nit1). Within this phase, the cost of the current solution is stored (opcost) before a move is made on this solution (S). This move is a local search that returns the best neighbor within the neighborhood defined on the current solution. If the cost of the new solution (S) is worse than that of the previous solution (opcost), the perturbation counter (pc) is increased, and when it reaches the limit (ip), the current solution is perturbed, generating a new one with which the search continues; that is, the search moves to another place in the space with the expectation of finding better solutions. On the other hand, if the cost of (S) is better than the cost of the best solution found (S*), then (S) becomes the new best solution (S*). As a last step, the tabu lists, whose elements (tabu centroids or tabu AGEBs) can be expelled if they have reached the maximum number of iterations given by (tat and tdt), are updated. Then the global iteration counter (ic) is incremented. Once this phase is finished, a new search is started on the best solution found in the previous phase (S*), which is stored in (S) to carry out new movements on this solution. Subsequently, the tabu lists are emptied to start a completely new search that lasts for a given number of iterations (nit2). In this second phase the perturbation is eliminated, since the objective is to intensify the search around the best solution found in the previous phase; therefore, only if the movement carried out on (S) finds a solution better than (S*) is it stored. Finally, the tabu lists are updated (as in the previous phase) before incrementing the global iteration counter (ic). In the movement function, according to the value of (ns), a limited number of neighbors is explored. A neighbor is obtained by replacing one of the centroids of the current solution (chosen at random) by a new one, which is chosen from among the AGEBs assigned to the centroid to be replaced. Only a few candidates are randomly selected to be evaluated in the local search; this quantity is determined by (ns). Then, the best candidate replaces the centroid and the new configuration obtained is evaluated. Finally, the active tabu is initialized, that is, the expelled centroid becomes active tabu (tabu AGEB) and likewise the new centroid (tabu centroid). Implicitly, the algorithm has a memory to
determine when the tabu states begin and end, in addition to a frequency memory that keeps track of the centroids that appear most in the solution. These structures are updated in the last step of the algorithm movement function.
4 Territorial Design and Vehicle Routing Problem Applied to the Metropolitan Area of Valle de Toluca The geographical design of censuses requires tools that support decision making to assign populations areas to agents in such a way that there is a balance in terms of surveys volume and workload. Companies that directly or indirectly design population censuses or provides that kind of services need to zone a geographic area as rural territories to know the services and commodities that the inhabitants require in order to provide them a new developments areas and added value. In general, optimization is the answer to balance the geographical design of the TD problem with respect to one or more criteria of interest, for example: the balance between workloads among survey agents, workers, economic benefits, the reduction of distances between distribution and commercial centers. When the territory is zoned or partitioned [14] with some heuristic method, an optimal sequence of visits is designed for each element of the partition (AGEBs) that applies to each and every one of the geographical units for each survey agent. Once the territory design has been partitioned, it is convenient to use the VRP model [15, 16] for each element of the partition in order to determine the optimal set of routes. Traditionally, VRP addressed a type of problem in which customers must be served via several fleets at a minimum cost of operational objectives, subject to side constraints. Therefore, a fleet starts from a point, namely a depot, delivers goods to the customers in the predefined urban network, and returns to the origin depot [15, 16]. This will guarantee the reduction of unproductive times, distances in the territorial groups formed, allowing greater efforts to be devoted to the attention of each population area or AGEB and the movement between them by the survey agent, seeking to improve the performance of the survey team, with the least possible investment or cost. Due to the fact that in normal operational practice it is common to have a large number of AGEBs per each group, it became a factor that increases the difficulty of optimization algorithms to solve this type of problem. In order to reduce the complexity of the study area, the AGEBs are grouped through partitioning techniques that will be taken into account as a single entity and the restrictions for this model will be defined based on geographic criteria. For this problem, in Sects. 2 and 3, the DT was performed with a partition algorithm that applies a dissimilarity matrix as a grouping characteristic that minimizes the distances between the AGEBs. Therefore, the challenge now is to determine an optimal set of routes for each element of the territorial partition, considering each AGEB as a basic unit, this implies that the distance between the AGEBS, as well as survey agents, has the lowest cost. In an optimal route without omitting that from any basic unit it must be reached to any other unit within the same territory. Such features will allow applying a VRP model [17–19] as an optimization method to travel between units in the same territory without leaving it. The objective of this work is to integrate the TD problem with the VRP [17–19] model for each group of the partition where the optimal route runs through each geographical
unit from the centroid and back to the starting point. In this case, the Metropolitan Area of Toluca Valley (MATV) is partitioned to obtain 24 groups [14] and randomly choose a group marked in red from the Fig. 3 of the partition whose centroid is AGEB 378. As can be seen, the territories must be as compact as possible, achieving it with a partitioning algorithm, that is, the AGEBs in each group must be close to each other as much as possible.
Fig. 3. Partitioning the territory of MATV into 24 groups [14]
For the results obtained by TS for this TD problem, a test instance with 469 AGEBs of the MATV was designed, obtaining a compactness cost for 24 groups of 9.357700013089925 in a time of 28 s. Of the 24 groups formed, a sample was taken from the MATV space that will be called “Sample Group” (SG) marked with a circle as a case study to apply TSP. The SG contains 26 AGEBs with centroid at AGEB 378 and the elements that belong to it are the following: 301, 307, 309, 77, 375, 383, 302, 299, 377, 300, 298, 376, 389, 303, 379, 304, 65, 382, 381, 405, 390, 66, 380, 384, 288. Figure 3 shows the partition of the MATV territory into 24 groups, where an optimal sequence is designed to traverse the entire SG.
5 Vehicle Routing Problem Model

In this paper we consider the case of a capacitated VRP with a homogeneous fleet and fixed demand. We define the problem as follows. Let $G(V, A)$ be a directed graph where $V$ is the set of nodes that represent the cities or geographic units, with $|V| = n$, and $A$ is the set of arcs that connect them, related by a symmetric distance matrix $D = [d_{ij}]_{n \times n}$, $\forall i, j = 1, 2, \dots, n$, where each arc has an associated distance $d_{ij}$. The binary decision variable $y_{ij}$ is 1 if the arc $i-j$ is included on the route and 0 otherwise, $Q$ is the vehicle capacity, $q_i$ is the amount required at city $i$ (given), which must be delivered by exactly one vehicle, and $u_i$ is the accumulated delivery at city $i$. The integer linear programming model of the VRP is presented below [15, 16]:

$\min \sum_{i} \sum_{j} d_{ij} y_{ij}$  (1)

subject to

$y_{kk} = 0, \quad \forall k = 2, \dots, n$  (2)

$\sum_{i=1, i \neq k}^{n} y_{ik} = 1, \quad \forall k = 2, \dots, n$  (3)

$\sum_{j=1, j \neq k}^{n} y_{kj} = 1, \quad \forall k = 2, \dots, n$  (4)

$u_k \geq u_i + q_k - Q + Q(y_{ki} + y_{ik}) - (q_k + q_i) y_{ki}, \quad \forall i, k = 2, \dots, n, \; i \neq k$  (5)

$u_k \leq Q - (Q - q_k) y_{1k}, \quad \forall k = 2, \dots, n$  (6)

$u_k \geq q_k + \sum_{i=1}^{n} q_i y_{ik}, \quad \forall k = 1, 2, \dots, n$  (7)

$\sum_{j=1}^{n} y_{1j} \geq \left\lfloor \frac{\sum_{i=1}^{n} q_i}{Q} + 0.999 \right\rfloor$  (8)

$y_{ij} \in \{0, 1\}, \quad \forall i, j = 2, \dots, n, \; i \neq j$

The objective (1) minimizes the total travel distance. Constraint (2) forbids a vehicle from travelling from a city to itself. Constraints (3) and (4) guarantee that exactly one vehicle enters and exactly one vehicle leaves each node, forming the routes. Constraints (5)-(7) are the capacity and subtour-elimination conditions on the accumulated deliveries: $u_k$, the amount delivered on the trip up to city $k$, must be at least the amount needed at $k$ and at most the vehicle capacity $Q$, and if $k$ is the first stop of a route then $u_k = q_k$. Constraint (8) forces enough vehicles to leave the depot to cover the total demand. This article implemented this model in Lingo [3]. The AGEB 378 is the centroid and the starting point for the survey agent; the other AGEBs are populations to census. In congruence with the mathematical model presented in this section and discussed by [20, 21], it includes constraints that avoid the generation of subtours in the optimal solution, producing the optimal route: 378, 65, 66, 77, 288, 298, 299, 300, 301, 302, 303, 304 and 307. Figure 4 shows the SG with 26 AGEBs obtained from a Geographic Information System (GIS) [14]; additionally, the optimal route is shown. The VRP model was implemented in the Lingo software [3] for the design of routes in the group of 23 AGEBs, which we will call the red group (GR), obtained from the MAVT map partition by means of a GIS, from which the numbers of each AGEB are: 378, 66, 77, 298, 299, 300, 301, 302, 303, 307, 309, 375, 376, 377, 379, 380, 381, 382, 383, 384, 389, 390, 405. AGEB 378 is taken as the depot, and this group has a distance matrix of size 23 by 23 (Table 1).
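Although the paper solves the model with Lingo [3], the same kind of formulation can be sketched in Python for illustration. The toy instance below uses PuLP, keeps only the degree constraints (3)-(4), a simplified MTZ-style version of the load conditions (5)-(7), and the fleet-size bound (8); all data values are invented, so this is a hedged sketch rather than the paper's implementation.

    # Toy capacitated-VRP sketch in PuLP (the paper uses Lingo). Node 0 is the depot,
    # distances and demands are invented, and the load constraints are a simplified MTZ form.
    import pulp

    n, Q = 5, 2
    d = {(i, j): abs(i - j) * 10 for i in range(n) for j in range(n) if i != j}
    q = {i: 1 for i in range(1, n)}

    prob = pulp.LpProblem("VRP", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("y", d.keys(), cat="Binary")
    u = pulp.LpVariable.dicts("u", range(1, n), lowBound=0, upBound=Q)

    prob += pulp.lpSum(d[i, j] * y[i, j] for (i, j) in d)                  # objective (1)
    for k in range(1, n):
        prob += pulp.lpSum(y[i, k] for i in range(n) if i != k) == 1       # (3) one arc in
        prob += pulp.lpSum(y[k, j] for j in range(n) if j != k) == 1       # (4) one arc out
        prob += u[k] >= q[k]                                               # load lower bound
    for k in range(1, n):
        for i in range(1, n):
            if i != k:                                                     # MTZ-style link, cf. (5)
                prob += u[k] >= u[i] + q[k] - Q * (1 - y[i, k])
    prob += pulp.lpSum(y[0, j] for j in range(1, n)) >= -(-sum(q.values()) // Q)  # (8), ceiling

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([(i, j) for (i, j) in d if y[i, j].value() > 0.5])               # arcs in the solution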
Fig. 4. GM with 26 AGEBs in GIS [14]
Table 1. Routes with the common depot in the AGEB 378

Route  AGEBS Sequence
1      378  302  298  300  301  378
2      378  303  299  384  383  378
3      378  376  375  309  307  378
4      378  377  379  378
5      378  380  405  66   382  378
6      378  381  390  77   389  378
Below is a section of the distance matrix between the red group AGEBs, of size 6 by 6:

$D = \begin{pmatrix} 0 & 191 & 178 & 173 & 137 & 159 \\ 191 & 0 & 90 & 278 & 325 & 308 \\ 178 & 90 & 0 & 315 & 314 & 327 \\ 173 & 278 & 315 & 0 & 168 & 68 \\ 137 & 325 & 314 & 168 & 0 & 106 \\ 159 & 308 & 327 & 68 & 106 & 0 \end{pmatrix}$  (9)
6 Conclusions

In this work, a methodology is proposed to carry out a population census by distributing the areas and workloads equitably among survey agents over a territory called the MATV, which has 469 AGEBs. The proposal consists of two phases: the first is the partition of the MATV territory with the TS metaheuristic, due to its high efficiency in territorial partitioning; the second phase consists of applying the VRP model to each element of the partition in order to obtain a set of optimal routes to visit all the AGEBs, minimizing
the travel time for each survey agent in each group of the TD. Taking into account that the TD problem is NP-hard, the TS metaheuristic was used to search for optimal or approximate solutions. On the other hand, although the VRP is also NP-hard, the size of the instance allows finding the optimal set of routes using the Lingo software [3]. Finally, as future work, additional homogeneity restrictions on the number of AGEBs, in a multi-objective context including geometric compactness features, will be included.
References 1. Lenstra, J.K., Rinnooy Kan, A.H.G.: Complexity of vehicle routing and scheduling problems. Networks 11(2), 221–227 (1981). https://doi.org/10.1002/net.3230110211 2. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Boston (1997) 3. Lingo Homepage. http://www.lingo.com/. last accessed 2022/06/24 4. Kalcsics, J., Nickel, S., Schröder, M.: Towards a unified territorial design approach — applications, algorithms and GIS integration. TOP 13(1), 1–56 (2005). https://doi.org/10.1007/ BF02578982 5. Zoltners, A.A., Sinha, P.: Sales territory alignment: a review and model. Manage. Sci. 29(11), 1237–1256 (1983). https://doi.org/10.1287/mnsc.29.11.1237 6. Bação, F., Lobo, F., Painho, V.: Applying genetic algorithms to zone design. Soft Comput. Fusion Found. Methodol. Appl. 9(5), 341–348 (2005). https://doi.org/10.1007/s00500-0040413-4 7. Mehrotra, A., Johnson, E.L., Nemhauser, G.L.: An optimization based heuristic for political districting. Manage. Sci. 44(8), 1100–1114 (1998). https://doi.org/10.1287/mnsc.44.8.1100 8. Openshaw, S., Rao, L.: Algorithms for reengineering 1991 census geography. Environ. Plann. A Econ. Space 27(3), 425–446 (1995). https://doi.org/10.1068/a270425 9. Altman, M.: The computational complexity of automated redistricting: is automation the answer? Rutgers Comput. Technol. Law J. 23(1), 81–142 (1997) 10. Correa, E.S., Steiner, M.T.A., Freitas, A.A., Carnieri, C.: A genetic algorithm for the Pmedian problem. In: Spector, L.E., Goodman, E., et al. (eds) Proceeding 2001 Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 182–196. San Francisco, USA (2001) 11. Bernábe, B., Osorio, L., Espinosa, R., Ramirez, R., Aceves, R.: A comparative study of simulated annealing and variable neighborhood search for the geographic clustering problem. In: Computación y Sistemas (eds) The 2009 International Conference on Data Mining 2009 (DMIN 2009), pp. 595–599. Puebla, México (2009) 12. Osman, I.H., Christofides, N.: Capacitated clustering problems by hybrid simulated annealing and tabu search. Intern. Trans. Oper. Res. 1(3), 317–336 (1994) 13. Martí, R.: Procedimientos metaheurísticos en optimización combinatoria. Tech. rep., Departamento de Estadística e Investigación Operativa, Facultad de Matematicas, Universidad de Valencia (2002) 14. Bernábe, B.: Integración de un Sistema de Información Geográfica para Algoritmos de Particionamiento. Unpublished (2013) 15. Moghdani, R., Salimifard, K., Demir, E., Benyettou, A.: The green vehicle routing problem: a systematic literature review. J. Clean. Prod. 279, 123691 (2021). https://doi.org/10.1016/j. jclepro.2020.123691 16. Pelletier, S., Jabali, O., Laporte, G.: The electric vehicle routing problem with energy consumption uncertainty. Transp. Res. Part B Methodol. 126, 225–255 (2019). https://doi.org/ 10.1016/j.trb.2019.06.006
17. Konstantakopoulos, G.D., Gayialis, S.P., Kechagias, E.P.: Vehicle routing problem and related algorithms for logistics distribution: a literature review and classification. Oper. Res. 22(3), 2033–2062 (2022). https://doi.org/10.1007/s12351-020-00600-7 18. Elshaer, R., Awad, H.: A taxonomic review of metaheuristic algorithms for solving the vehicle routing problem and its variants. Comput. Ind. Eng. 140, 106242 (2020). https://doi.org/10. 1016/j.cie.2019.106242 19. Pasha, J., Dulebenets, M.A., Kavoosi, M., Abioye, O.F., Wang, H., Guo, W.: An optimization model and solution algorithms for the vehicle routing problem with a factory-in-a-box. IEEE Access 8, 134743–134763 (2020). https://doi.org/10.1109/ACCESS.2020.3010176 20. Desrochers, M., Laporte, G.: Improvements and extensions to the Miller–Tucker–Zemlin subtour elimination constraints. Oper. Res. Lett. 10, 27–36 (1991) 21. Laporte, G.: The traveling salesman problem: an overview of exact and approximate algorithms. Eur. J. Oper. Res. 59(2), 231–247 (1992). https://doi.org/10.1016/0377-2217(92)901 38-Y
A Unified Framework for Knowledge Inference Based on Heterogeneous Graph Neural Networks Chunfu Xie(B) Yonyou Network Technology Co., LTD, Yonyou Research Institute, Beijing 100000, China [email protected]
Abstract. Heterogeneous graph neural networks show powerful message passing and aggregation in multi-relational directed graphs, which makes them effective for knowledge inference and recommendation tasks. Both of these tasks are applications of heterogeneous graph neural networks on knowledge graphs and have much in common, yet there are no articles that compare and analyze them. We therefore analyze the commonalities and critical factors of the two tasks through an extensive literature review and propose a unified framework for knowledge inference based on heterogeneous graph neural networks. We conduct experimental comparisons on the WN18RR dataset to verify the key factors that make heterogeneous graph neural networks effective in knowledge inference, and the results support the correctness of the unified framework. We draw three conclusions. Firstly, modeling relations explicitly is better than modeling them implicitly (invisibly); secondly, modeling three directions is better than modeling a single direction; lastly, considering node attention is better than not considering node attention. Keywords: Heterogeneous Graph Neural Network · Knowledge Graph · Knowledge Inference · Recommendation · Unified Framework
1 Introduction

The knowledge graph is a structured representation of facts. The concept of structured knowledge in graphs was first proposed in 1988 [1]; the concept gained great attention after being used in Google's search engine and has since been widely used, for example in Freebase [2], DBpedia [3], and YAGO [4]. However, the status quo is that many knowledge graphs are incomplete, for two reasons. Firstly, entities or relationships are missing when knowledge graphs are constructed automatically, due to the limited nature of extraction techniques. Secondly, the infinite nature of knowledge makes it difficult to include all of it when constructing ontologies. Knowledge graph inference fills in the missing associations in the knowledge graph and is often formulated as a link prediction problem [5]. Inference and prediction models based on the knowledge graph can be divided into three types. The first type is translation-based models, such as TransE [6], TransD [7], and TransAT [8]; the second type is multiplicative models, such as RESCAL [9], DisMult [10], and ANALOGY [11]; and the third one is
deep learning-based models. Deep learning based models are divided into three main categories, convolutional neural network based models such as ConvE [12], ConvKB [13], CAPSE [14], graph neural network based models such as RGCN [15], VR-GCN [16], SACN [17], CompGCN [18], KBGAT [19], HGT [20], RAGAT [21]. Graph neural network-based models’ ability to convey aggregated neighbor information has achieved high success in knowledge inference tasks. Heterogeneous graph neural networks play an essential role in knowledge inference and are also naturally applicable to recommendation systems. Compared with non-graph models, graph neural network-based recommendation methods help us to construct good associations between users and items and make recommendation results more accurate and interpretable, such as the work in KGCN [22], KGAT [23], and KGIN [24]. The above two tasks are applications of heterogeneous graph neural networks in knowledge graph representation, which are common in encoding knowledge graphs. However, there are no relevant articles for comparative analysis. This paper compares and summarizes the commonality of heterogeneous graph neural networks in knowledge inference and recommendation tasks. We propose a unified framework for knowledge graph inference through an extensive literature review. Moreover, experimental comparisons verify several critical reasons for the effectiveness of heterogeneous graph neural networks in knowledge inference and prove the rightness of the unified framework.
2 Related Work

2.1 Knowledge Inference Based on Knowledge Graph

Knowledge inference is the inference of factual elements in a missing triple based on the known facts in the knowledge graph. It has two subtasks: entity prediction and relation prediction. Entity prediction can be expressed as: given the head entity and the relation, predict the tail entity. Relation prediction can be expressed as: given the head entity and the tail entity, predict the relation.

2.2 Heterogeneous Graph

A heterogeneous graph is a graph G = {V, E, φ, ψ}, with V and E representing the set of nodes and the set of edges, respectively. Each node v ∈ V has a node type φ(v), and each edge e ∈ E has an edge type ψ(e). The number of node types and the number of edge types are each at least 1. A graph is heterogeneous when the sum of the number of node types and the number of edge types is greater than 2; it is homogeneous when this sum is equal to 2.
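A minimal illustration of this definition, assuming nothing beyond plain Python dictionaries for the type maps φ and ψ:

    # A typed graph is heterogeneous when (#node types + #edge types) > 2.
    def is_heterogeneous(node_types, edge_types):
        return len(set(node_types.values())) + len(set(edge_types.values())) > 2

    node_types = {"alice": "user", "ipad": "item"}            # phi(v)
    edge_types = {("alice", "ipad"): "purchases"}             # psi(e)
    print(is_heterogeneous(node_types, edge_types))           # True: 2 node types + 1 edge type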
3 Background

3.1 Heterogeneous Graph Neural Networks in Recommender Systems

The KGCN [22] model introduces the relation vector by explicitly modeling the relationship data; the idea inherits from the GAT model. The KGCN model characterizes
the user's preference for a specific relation of the item with the dot product of the user vector and the relation vector. The KGAT [23] model uses the pre-trained TransR model's vector to characterize the user's preference for a specific relation. The KGIN [24] model updates commodity node information with the commodity-connected knowledge graph information as neighbor nodes. The model also integrates the idea of the KGCN model, which explicitly models the relation vector, by characterizing the importance of a relation through the dot product of the neighbor node vector and the relation vector when updating the target node information.

3.2 Heterogeneous Graph Neural Networks in Knowledge Graph Inference

The RGCN [15] model considers multi-directional, multi-relational neighbor nodes and their node information and implicitly models the influence of relations through the message-passing matrices Wr. The VR-GCN [16] model considers multi-relational directed graphs. The method explicitly models the relation vector, which is significantly different from RGCN, and the innovation of the method is reflected in the relation vector hr. The SACN [17] model reflects the weight of the relation in a separate parameter αt. SACN does not consider directed graphs but only different relations, which are the forward, reverse, and self-loop relations. Compared with the homogeneous graph neural network, the innovation lies in the weight coefficients. The CompGCN [18] model considers multi-relational directed graphs; it inherits the advantages and commonalities of RGCN and VR-GCN and considers multiple directions on top of the homogeneous model. The KBGAT [19] model considers a multi-relational directed graph. The model originates from GAT, and the most significant difference from the GAT model is the way it computes the attention of neighbor nodes and target nodes. The model considers the relation, explicitly modeling the relation vector with weights. The HGT [20] model considers that different classes of nodes may have their own feature distributions. The weight matrices are parameterized according to the node and relation types. The model maps the target node i to a query vector and the source node j to a key vector and calculates the dot product of the two as attention. The RAGAT [21] model constructs message functions with relation-aware capabilities. Specifically, in addition to a matrix of weights shared between different relations, relation-specific network parameters Wr are defined to extract relation information from neighboring entities in the parameter space.

3.3 Comparative Analysis

We summarize the formulas of the current mainstream methods from a unified perspective, as shown in Table 1.
Table 1. Summary of mainstream methods (Model: core formula).

GCN: $h_i^{l+1} = \sigma\big(W^l h_i^l + \frac{1}{|N(i)|}\sum_{j\in N(i)} W^l h_j^l\big)$

GAT: $h_i^{l+1} = \sigma\big(\sum_{j\in N(i)} \mathrm{softmax}\big(\mathrm{MLP}(h_i^l \| h_j^l)\big)\, W^l h_j^l\big)$

KGCN: $h_i^{l+1} = \sigma\big(W^l h_i^l + \sum_{j\in N(i)} \mathrm{softmax}(h_u^l \cdot h_r^l)\, W^l h_j^l\big)$

KGAT: $h_i^{l+1} = \sigma\big(W^l h_i^l + \sum_{j\in N_i, r\in R_{ij}} \mathrm{softmax}\big((W_r^l h_j^l)^{\top} \tanh(W_r^l h_i^l + h_r)\big)\, W^l h_j^l\big)$

KGIN: $h_i^{l+1} = \sigma\big(W^l h_i^l + \sum_{j\in N(i), r\in R_{ij}} \mathrm{softmax}(h_j^l \odot h_r^l)\, W^l h_j^l\big)$

RGCN: $h_i^{l+1} = \sigma\big(W_0^l h_i^l + \sum_{j\in N(i), r\in R_{ij}} \frac{1}{|N(i)|} W_r^l h_j^l\big)$

SACN: $h_i^{l+1} = \sigma\big(W^l h_i^l + \sum_{j\in N_i} \alpha_t^l\, W^l h_j^l\big)$

VR-GCN: $h_i^{l+1} = \sigma\big(W^l h_i^l + W^l\big(\sum_{j\in N_{tr}, r\in R_{ij}} (h_j^l + h_r^l) + \sum_{j\in N_{hr}, r\in R_{ij}} (h_j^l - h_r^l)\big)\big)$

CompGCN: $h_i^{l+1} = f\big(\sum_{j\in N(i), r\in R_{ij}} W_{\lambda(r)}^l\, \psi(h_j, h_r)\big)$, where $\psi(h_j, h_r) \in \{h_j - h_r,\ h_j * h_r,\ h_j \star h_r\}$ and $W_{\lambda(r)}^l \in \{W_O^l\ (r \in R),\ W_I^l\ (r \in R_{inv}),\ W_S^l\ (r \in T,\ \text{self-loop})\}$

KBGAT: $h_i^{l+1} = \sigma\big(\sum_{j\in N_i, r\in R_{ij}} \mathrm{softmax}\big(\mathrm{MLP}(h_i^l \| h_j^l \| h_r^l)\big)\, W^l (h_i^l \| h_j^l \| h_r^l)\big)$

RAGAT: $h_i^{l+1} = \sigma\big(\sum_{j\in N_i, r\in R_{ij}} \mathrm{softmax}\big(\psi(h_j, h_r)\big)\, W^l\, \psi(h_j, h_r)\big)$, where $\psi(h_j, h_r) \in \{W_r h_j - h_r,\ W_r h_j * h_r,\ W_r h_j \star h_r\}$

HGT: $h_i^{l+1} = \sigma\big(\sum_{j\in N(i)} \mathrm{softmax}\big((W_{\tau(i)}^l h_i^l)(W_r^l h_r^l)(W_{\tau(j)}^l h_j^l)\big)\, W_{\tau(i)}^l h_j^l\big)$
To reflect the core of each method more effectively, the core innovation points of each method are marked in blue in Table 2; hence, although the formulas in Table 2 all share the same unified form, the position of the highlighted part differs to indicate where each method innovates.

Table 2. Comparison and analysis of mainstream methods. Each model (GCN/GAT, KGCN, KGAT, KGIN, RGCN, SACN, VR-GCN, CompGCN, KBGAT, RAGAT, HGT) is written in the unified form

$h_i^{l+1} = \sigma\big(\sum_{j\in N(i)} \alpha_{ij}\, W^{(l)} h_j^l\big)$

and the models differ only in how the attention coefficient $\alpha_{ij}$, the weight matrix $W^{(l)}$, and the neighbor message $h_j^l$ are instantiated, as detailed in Table 1.
Through the above analysis of the formulas for the two tasks based on heterogeneous graph neural networks, we found many commonalities from which the tasks can learn from each other; they are mainly discussed in three aspects: how to consider multiple relations, multiple directions, and node weights. Firstly, we summarize how the mainstream models mentioned above consider multiple relations. For heterogeneous graph neural networks, the key question in encoding the relation is whether the relation is modeled explicitly. Explicitly modeling relationship data means the relation vector hr appears in the formula; KGCN, KGAT, KGIN, VR-GCN, CompGCN, KBGAT, and RAGAT explicitly model relation vectors, and all the recommendation models based on heterogeneous graph neural networks model relations explicitly. Invisible (implicit) modeling of relations means the relation is represented implicitly, for example, in the
RGCN model in the feature transformation Wr, in the SACN model in the self-learned coefficients αt, and in the HGT model in the type of relation. Secondly, we summarize how the above mainstream models consider multiple directions. Knowledge graphs are naturally directed data, while the homogeneous GCN model encodes a single direction. For knowledge graphs with directions, multiple directions need to be considered, and the above models are divided into those that model directions and those that do not. The multiple directions of the RGCN model are reflected in the feature transformation Wr, and the multiple directions of the VR-GCN model are reflected in whether the target node is a head entity or a tail entity. The three direction matrices reflect the directions of the CompGCN model, while the GAT-based models, such as HGT, KBGAT, and RAGAT, naturally apply to multiple directions. The SACN model does not consider multiple directions. Furthermore, the three recommender system models, KGCN, KGAT, and KGIN, use undirected knowledge graphs and therefore do not need directions. Finally, we summarize how the above mainstream models consider the weights between nodes. Some of these models inherit the idea of the GAT model and consider the weights of neighbor nodes and target nodes when updating the target nodes, for example, KBGAT, RAGAT, HGT, KGCN, KGAT, and KGIN. When considering the weights, the models are divided into two groups according to whether the relation is considered. For example, the RAGAT and KGIN models consider the relation when calculating the weights between nodes, while the HGT model does not consider the relation between nodes. There are three main methods for calculating weights: for example, the KGCN model takes a dot product, HGT is based on the Transformer idea, and KBGAT inherits the idea of GAT, calculating weights with an MLP. Other models inherit the idea of the GCN model and do not weight neighbor nodes and target nodes when updating the target nodes, such as RGCN, VR-GCN, SACN, and CompGCN.
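To make the three factors concrete, the following toy NumPy sketch performs one aggregation step that combines an explicit relation vector, direction-specific weight matrices, and relation-aware attention. It illustrates the unified form discussed here and is not the implementation of any particular published model; all names, sizes, and the subtraction composition are our own choices.

    # Toy layer in the unified form h_i' = sigma(sum_j alpha_ij * W_dir * psi(h_j, h_r)):
    # explicit relation vectors, direction-specific weights, and relation-aware attention.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 4
    W_dir = {"out": rng.normal(size=(dim, dim)),    # forward, reverse, and self-loop weights
             "in": rng.normal(size=(dim, dim)),
             "self": rng.normal(size=(dim, dim))}

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def update_node(h_i, neighbors):
        """neighbors: list of (h_j, h_r, direction) triples for the target node i."""
        neighbors = neighbors + [(h_i, np.zeros(dim), "self")]          # add the self-loop
        msgs = [W_dir[d] @ (h_j - h_r) for h_j, h_r, d in neighbors]    # composition psi = h_j - h_r
        scores = np.array([np.dot(h_i, m) for m in msgs])               # relation-aware attention logits
        alpha = softmax(scores)
        return np.tanh(sum(a * m for a, m in zip(alpha, msgs)))

    h_i = rng.normal(size=dim)
    nbrs = [(rng.normal(size=dim), rng.normal(size=dim), "out"),
            (rng.normal(size=dim), rng.normal(size=dim), "in")]
    print(update_node(h_i, nbrs))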
4 Experiment Setup

4.1 Dataset and Evaluation Indicators

WN18RR: WN18RR was created from the WN18 dataset [6] by deleting relations, similarly to FB15k-237; it is a dataset with lexical relations between words. Similar to previous work [6], we report the mean reciprocal rank (MRR), the mean rank (MR), and the proportion of correct entities in the top N rankings (Hits@N) with N equal to 1, 3, and 10, respectively.

4.2 Comparison of Results
422
C. Xie
comparison experiments. CompGCN-DisMult represents the CompGCN model where the decoder takes the DisMult approach. The control experiment takes the DisMult model. The results are shown in Table 3.

Table 3. The result in the WN18RR dataset.

Model            MRR    MR    hits@1  hits@3  hits@10
CompGCN-DisMult  0.431  4589  0.393   0.444   0.514
DisMult          0.412  7868  0.391   0.420   0.452
By comparing the experimental results, we conclude that the metrics of traditional knowledge graph inference are improved when heterogeneous graph neural networks are used as encoders. Secondly, to examine whether multi-relationship modeling is better than single-relationship modeling, we did the following comparison experiments: SACN-Conv_TransE is the SACN model, and GCN-Conv_TransE uses a homogeneous graph convolutional neural network as the encoder, obtained by setting the relationship weight coefficient αt of the SACN model to 1; we compare the results on WN18RR. The results are shown in Table 4.

Table 4. The result in the WN18RR dataset.

Model             MRR    MR    hits@1  hits@3  hits@10
SACN-Conv_TransE  0.418  3774  0.383   0.429   0.493
GCN-Conv_TransE   0.405  3211  0.361   0.421   0.488
By comparing the experimental results, we conclude that multi-relationship modeling improves the inference metrics compared to single-relationship modeling. Next, we examine whether explicitly modeling relations is better than implicitly modeling relation data. Since explicit modeling and invisible (implicit) modeling are two different designs, they cannot be compared within a single model, so we compare classical representatives of each approach. We take five models for the comparison experiments, as shown in Table 5. RGCN and SACN model relations invisibly, while CompGCN, KBGAT, and RAGAT model relations explicitly. From the results, the indicators of the models with explicit relation modeling are significantly higher than those with implicit modeling, so it is concluded that explicitly modeling relation data helps to improve knowledge reasoning. Then we did the following comparison experiments on multi-directional versus single-direction encoding. CompGCN represents the original model with DisMult as the decoder, which encodes three directions: forward, reverse, and self-loop. We unify the matrices representing the multiple directions into one matrix, representing
Table 5. The result in the WN18RR dataset.

Model    MRR    MR    hits@1  hits@3  hits@10
RGCN     0.393  7091  0.350   0.411   0.475
SACN     0.418  3774  0.383   0.429   0.493
CompGCN  0.431  4589  0.393   0.444   0.514
KBGAT    0.441  1845  0.360   0.489   0.585
RAGAT    0.486  2337  0.447   0.502   0.557
only a single encoding direction, shown as CompGCN-od, and compare them on WN18RR. The results are shown in Table 6.

Table 6. The result in the WN18RR dataset.

Model           MRR    MR    hits@1  hits@3  hits@10
CompGCN-dismul  0.431  4589  0.393   0.444   0.514
CompGCN-od      0.422  6451  0.389   0.430   0.500
By comparing the experimental results, we find that, for the knowledge graph inference task, multi-directional encoding in the heterogeneous graph neural network improves the metrics compared to encoding a single direction. To test whether considering the attention between nodes improves the metrics, we perform the following comparison: RAGAT considers the attention between nodes, while RAGAT-wa removes the node attention coefficient and sets it to a constant. The comparison is performed on WN18RR, and the results are shown in Table 7.

Table 7. The result in the WN18RR dataset.
Model    | MRR   | MR   | hits@1 | hits@3 | hits@10
RAGAT    | 0.486 | 2337 | 0.447  | 0.502  | 0.557
RAGAT-wa | 0.478 | 3960 | 0.444  | 0.494  | 0.544
By comparing the experimental results, we conclude that considering node attention improves the knowledge graph inference metrics compared to ignoring it. In the above analysis, we noted that, among the models that consider node attention, RAGAT and KGIN take the relationship into account when calculating the weights between nodes. In contrast, the HGT model does not consider the relationship between nodes,
and we discuss below whether the relationship vector should be taken into account when computing the weights. KBGAT is the KBGAT model with the DisMult decoder; KBGAT-wr is identical to KBGAT except that the relationship vector hr is removed. The comparison is performed on WN18RR, as shown in Table 8.

Table 8. The result in the WN18RR dataset.
Model    | MRR   | MR   | hits@1 | hits@3 | hits@10
KBGAT    | 0.441 | 1845 | 0.360  | 0.489  | 0.585
KBGAT-wr | 0.421 | 2301 | 0.345  | 0.463  | 0.520
By comparing the experimental results, we conclude that it is better to consider the relationship vector of the nodes together with the node attention than to ignore the relationship vector.
5 Discussion and Conclusion

In knowledge graph inference, three main aspects are discussed: whether to model multiple relations, whether to consider multiple directions, and whether to consider node weights. Firstly, we discuss how to consider multiple relationships. We found that explicitly modeling relationship data yields better metrics than modeling it implicitly. We believe this advantage has two main reasons. Firstly, the types of relationships are diverse in most knowledge graphs. If relationship data is modeled implicitly, the relationship is reflected in other parameters; for example, in RGCN, the relationship is implicitly absorbed into the feature transformation Wr, leading to mutual interference and thus diluting the impact of the relationship itself. Secondly, all the above frameworks are based on encoder-decoder methods, and the decoders are traditional methods, such as DisMult, which explicitly model relationships. So if there is a relationship vector in the encoder stage, i.e., an explicit modeling of relationships, the parameters of the encoder and decoder stay consistent. Secondly, we discuss how to consider multiple directions. Knowledge graphs are naturally directed data, while homogeneous graph neural network models such as GCN encode a single direction. For directed knowledge graphs, whether to consider multiple directions depends on the downstream task. We explore whether multi-directional modeling is needed for two tasks, recommendation and inference, respectively. A recommendation system does not need to consider multi-directional modeling, because the knowledge graph in a recommendation system is undirected. The main reason is that the goal of a recommendation system is to recommend the information users are interested in; as long as the relationship between the products that users are interested in exists, there is no need to consider multiple directions. Knowledge inference, in contrast, naturally requires considering the direction. Considering the most basic TransE model idea, the sum of the head entity vector and
relationship vector equals the tail entity vector. If the direction is not considered, this classical model is not valid. As for how to consider the direction, the RGCN and CompGCN models provide two classical methods. In RGCN the direction is reflected in the feature transformation Wr, while in CompGCN it is reflected through three direction matrices, which represent the edges from the target node to its neighbors, from the neighbors to the target node, and the target node self-loop, respectively. Although the two models are parameterized differently, the core idea is the same: both model the relational data in three directions. Finally, we discuss how to consider the weights between nodes. The recommendation methods all consider the weights between nodes, while the knowledge inference methods inherit the ideas of GCN and GAT, respectively. We suggest considering the weights between nodes because doing so is equivalent to considering node-type information, as we discussed for the HGT model. The heterogeneity of graph neural networks is reflected not only in multiple relations but also in multiple node types. Since node types and multiple relationships influence each other, the two need to be modeled together. By analyzing the above mechanisms, we derive a unified framework for knowledge inference with heterogeneous graph neural networks. The above experiments validate the correctness of the proposed framework, and we briefly state our conclusions. For the knowledge inference task based on heterogeneous graph neural networks, explicit relationship modeling is better than implicit modeling. Multi-directional modeling with three directions (forward, reverse, and target node self-loop) is better than a single direction. For the attention weights among nodes, considering node attention is better than not considering it, and considering node attention together with the nodes' relationship vector is better than ignoring the relationship vector.
References 1. Stokman, F.N., Vries, P.H.: Structuring knowledge in a graph. In: Human-Computer Interaction, pp. 186–206. Springer, Berlin, Heidelberg (1988). https://doi.org/10.1007/978-3-64273402-1_12 2. Bollacker, K., Evans, C., Paritosh, P., et al.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250 (2008) 3. Lehmann, J., Isele, R., Jakob, M., et al.: Dbpedia–a large-scale, multilingual knowledge base extracted from wikipedia. Seman. Web 6(2), 167–195 (2015) 4. Mahdisoltani, F., Biega, J., Suchanek, F.: Yago3: a knowledge base from multilingual wikipedias. In: 7th Biennial Conference on Innovative Data Systems Research. CIDR Conference (2014) 5. Arora, S.: A survey on graph neural networks for knowledge graph completion (2020). arXiv preprint arXiv:2007.12374 6. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 26 (2013)
7. Ji, G., He, S., Xu, L., et al.: Knowledge graph embedding via dynamic mapping matrix. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (vol. 1: Long papers), pp. 687–696 (2015) 8. Qian, W., Fu, C., Zhu, Y., Cai, D., He, X.: Translating embeddings for knowledge graph completion with relation attention mechanism. In: IJCAI, pp. 4286–4292 (2018) 9. Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multirelational data. In: ICML (2011) 10. Yang, Y., Chang, M.W.: S-mart: novel tree-based structured learning algorithms applied to tweet entity linking (2016). arXiv preprint arXiv:1609.08075 11. Liu, H., Wu, Y., Yang, Y.: Analogical inference for multi-relational embeddings. In: International Conference on Machine Learning, pp. 2168–2178. PMLR (2017) 12. Dettmers, T., Minervini, P., Stenetorp, P., et al.: Convolutional 2d knowledge graph embeddings. Proc. AAAI Conf. Artif. Intell. 32(1) (2018) 13. Nguyen, D.Q., Nguyen, T.D, Nguyen, D.Q., Phung, D.: A novel embedding model for knowledge base completion based on convolutional neural network (2017). arXiv preprint arXiv: 1712.02121 14. Vu, T., Nguyen, T.D., Nguyen, D.Q., Phung, D.: A capsule network-based embedding model for knowledge graph completion and search personalization. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 2180–2189 (2019) 15. Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., Navigli, R., Vidal, M.E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 593–607. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_38 16. Ye, R., Li, X., Fang, Y., et al.: A vectorized relational graph convolutional network for multirelational network alignment. In: IJCAI, pp. 4135–4141 (2019) 17. Shang, C., Tang, Y., Huang, J., Bi, J., He, X., Zhou, B.: End-to-end structure-aware convolutional networks for knowledge base completion. Proc. AAAI Conf. Artif. Intell. 33(01), 3060–3067 (2019). https://doi.org/10.1609/aaai.v33i01.33013060 18. Vashishth, S., Sanyal, S., Nitin, V., et al.: Composition-based multi-relational graph convolutional networks (2019). arXiv preprint arXiv:1911.03082 19. Nathani, D., Chauhan, J., Sharma, C., et al.: Learning attention-based embeddings for relation prediction in knowledge graphs (2019). arXiv preprint arXiv:1906.01195 20. Hu, Z., Dong ,Y., Wang, K., Sun, Y.: Heterogeneous Graph Transformer. WWW 2020 (2020) 21. Liu, X., Tan, H., Chen, Q., Lin, G.: RAGAT: relation aware graph attention network for knowledge graph completion. IEEE Access 9, 20840–20849 (2021). https://doi.org/10.1109/ ACCESS.2021.3055529 22. Wang, H., Zhao, M., Xie, X., et al.: Knowledge graph convolutional networks for recommender systems. In: The World Wide Web Conference, pp. 3307–3313 (2019) 23. Wang, X., He, X., Cao, Y., et al.: Kgat: knowledge graph attention network for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958 (2019) 24. Wang, X., Huang, T., Wang, D., et al.: Learning intents behind interactions with knowledge graph for recommendation. In: Proceedings of the Web Conference 2021, pp. 878–887 (2021)
Evaluation of Convolution Primitives for Embedded Neural Networks on 32-Bit Microcontrollers

Baptiste Nguyen1,2(B), Pierre-Alain Moëllic1,2, and Sylvain Blayac3

1 CEA Tech, Centre CMP, Equipe Commune CEA Tech - Mines Saint-Etienne, 13541 Gardanne, France
[email protected]
2 Univ. Grenoble Alpes, CEA, Leti, 38000 Grenoble, France
[email protected]
3 Mines Saint-Etienne, CMP, Department of Flexible Electronics, 13541 Gardanne, France
[email protected]
Abstract. Deploying neural networks on constrained hardware platforms such as 32-bit microcontrollers is a challenging task because of the large memory, computing and energy requirements of their inference process. To tackle these issues, several convolution primitives have been proposed to make the standard convolution more computationally efficient. However, few of these primitives are really implemented for 32-bit microcontrollers. In this work, we collect different state-of-the-art convolutional primitives and propose an implementation for the ARM Cortex-M processor family with an open source deployment platform (NNoM). Then, we carry out experimental characterization tests on these implementations. Our benchmark reveals a linear relationship between theoretical MACs and energy consumption, showing the advantage of using computationally efficient primitives like shift convolution. We discuss the significant reduction in latency and energy consumption due to the use of SIMD instructions and highlight the importance of data reuse in those performance gains. For reproducibility purposes and further experiments, codes and experiments are publicly available (https://gitlab.emse.fr/b.nguyen/primitive_of_convolution).

Keywords: Deep Learning · Architecture optimization · Embedded systems · Convolutional neural network

1 Introduction
The demand for edge inference is growing and neural networks are prime candidates due to their success across a large variety of application domains. However, state-of-the-art deep neural network models, especially convolution neural
networks, require a large amount of memory and computational resources. For example, the standard ResNet-18 model [3] for image classification on ImageNet has around 11M parameters and requires approximately 1 GMAC per inference, which is prohibitive for ARM Cortex-M microcontrollers. Thus, designing efficient neural network architectures is a major topic in the embedded AI community. In the search for efficient neural network architectures, several alternatives to convolution have been proposed, but few of them are practically implemented in deployment libraries for 32-bit microcontrollers. This work focuses on the implementation and characterization of state-of-the-art convolution primitives for ARM Cortex-M MCUs. Our contributions are as follows:
– We implement three state-of-the-art convolution primitives for ARM Cortex-M MCUs and, when possible, we propose another implementation which makes use of the SIMD (Single Instruction, Multiple Data) instructions.
– We characterize the latency and energy consumption of five primitives, including the standard convolution, against different parameters such as kernel or input size.
– We provide insights on the performance of the different primitives, especially for our implementations using SIMD instructions, to help machine learning practitioners design, develop and deploy efficient models according to their requirements.
2 Background

2.1 Preliminaries and Notation
We consider the typical case of a 2D-convolution layer with padding and a square input tensor X of dimensions Hx × Hx × Cx, with Hx the spatial width and Cx the number of channels. The convolution layer produces an output tensor Y of dimensions Hy × Hy × Cy, with Hy the spatial width (equal to Hx) and Cy the number of channels. The convolution is performed thanks to convolutional kernels represented by a weight tensor W of size Hk × Hk × Cx × Cy, with Hk the spatial dimension of a kernel (assumed to be square), Cx the number of input channels and Cy the number of output channels (i.e. the number of filters) as defined previously. The output for standard convolution is as follows:

Y_{k,l,n} = Σ_{m=1..Cx} Σ_{i=1..Hk} Σ_{j=1..Hk} W_{i,j,m,n} · X_{k+i−1, l+j−1, m},  ∀k, l ∈ [1, Hy], ∀n ∈ [1, Cy]   (1)

On modern CNN architectures, convolution layers are often coupled with batch-normalization layers that normalize (recentering and rescaling) the inputs of layers to make training faster and improve stability.
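As a concrete reading of Eq. 1, the following is a minimal, unoptimized sketch of the standard convolution loop over a padded input; the array shapes and the assumption that the input is already zero-padded are illustrative, and this is not the NNoM implementation.

import numpy as np

def standard_conv2d(x, w):
    # x: input of shape (Hx, Hx, Cx), assumed already zero-padded
    # w: weights of shape (Hk, Hk, Cx, Cy)
    hk, _, cx, cy = w.shape
    hy = x.shape[0] - hk + 1
    y = np.zeros((hy, hy, cy))
    for k in range(hy):            # output row
        for l in range(hy):        # output column
            for n in range(cy):    # output channel
                # Direct transcription of Eq. 1: sum over input channels and kernel positions
                y[k, l, n] = np.sum(w[:, :, :, n] * x[k:k + hk, l:l + hk, :])
    return y

# Toy usage
x = np.random.rand(6, 6, 3)        # padded 6x6 input with 3 channels
w = np.random.rand(3, 3, 3, 4)     # 3x3 kernels, 3 input channels, 4 filters
print(standard_conv2d(x, w).shape) # (4, 4, 4)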
https://www.keil.com/pack/doc/CMSIS/Core/html/group intrinsic SIMD gr. html.
2.2 Convolution Primitives
We detail the different convolution primitives evaluated in this work. Table 1 sums up performance features compared to the standard convolution.

Table 1. Summary of the different primitives. Parameters gain is the ratio between the primitive's number of parameters and the standard convolution. The same applies for theoretical MACs with complexity gain.
Convolution type    | Parameters    | Theoretical MACs  | Parameters gain    | Complexity gain
Standard            | Hk²·Cx·Cy     | Hk²·Cx·Hy²·Cy     | –                  | –
Grouped             | Hk²·(Cx/G)·Cy | Hk²·(Cx/G)·Hy²·Cy | 1/G                | 1/G
Depthwise separable | Cx·(Hk² + Cy) | Cx·Hy²·(Hk² + Cy) | 1/Cy + 1/Hk²       | 1/Cy + 1/Hk²
Shift               | Cx·(2 + Cy)   | Cx·Cy·Hy²         | 2/(Cy·Hk²) + 1/Hk² | 1/Hk²
Add                 | Hk²·Cx·Cy     | Hk²·Cx·Hy²·Cy     | 1                  | 1
Grouped convolution was first introduced in the AlexNet paper from Krizhevsky et al. [7] for practical issues, then several works such as Ioannou et al. [4] have studied its effect on the performance of a neural network model. For the standard convolution, all input channels are used to compute an output channel. For a grouped convolution with G groups, each channel of the input and output are associated with a group Gi . Then, to compute an output channel of the group Gi , only the corresponding input channels are processed, as depicted in Fig. 1. Thus, grouped convolutions (also referred as filter groups) reduce the number of parameters and MAC operations of the layer by a factor G.
Fig. 1. From [4], standard vs. grouped convolutions: the grouped convolution with 2 groups applies half of the filters to each half of the input channels in order to compute each half of the output channels.
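To illustrate the grouping mechanism depicted in Fig. 1, here is a minimal sketch of a grouped convolution built by slicing the channel dimension and running a plain convolution per group; the function names and shapes are illustrative assumptions rather than the evaluated kernels.

import numpy as np

def conv2d(x, w):
    # x: (H, H, Cin), w: (Hk, Hk, Cin, Cout) -> y: (H-Hk+1, H-Hk+1, Cout)
    hk, _, _, cout = w.shape
    hy = x.shape[0] - hk + 1
    y = np.zeros((hy, hy, cout))
    for k in range(hy):
        for l in range(hy):
            for n in range(cout):
                y[k, l, n] = np.sum(w[:, :, :, n] * x[k:k + hk, l:l + hk, :])
    return y

def grouped_conv2d(x, w_groups):
    # w_groups: list of G weight tensors, each of shape (Hk, Hk, Cx/G, Cy/G).
    # Each group only processes its own slice of the input channels (Fig. 1),
    # and the per-group outputs are concatenated along the channel axis.
    G = len(w_groups)
    cx_g = x.shape[2] // G
    outs = [conv2d(x[:, :, g * cx_g:(g + 1) * cx_g], w) for g, w in enumerate(w_groups)]
    return np.concatenate(outs, axis=2)

x = np.random.rand(6, 6, 4)
w_groups = [np.random.rand(3, 3, 2, 3) for _ in range(2)]   # G = 2 groups
print(grouped_conv2d(x, w_groups).shape)                    # (4, 4, 6)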
Depthwise Separable Convolution. Szegedy et al. [9] introduce depthwise separable convolutions with the Inception architecture. Depthwise separable convolution replaces the standard convolution by two convolutions: depthwise and pointwise. Depthwise convolution is an extreme version of grouped convolution where G = Cx = Cy . The problem is that each filter only handles information
430
B. Nguyen et al.
passed down from one input channel. Pointwise convolution is applied to linearly combine the output channels of the depthwise convolution thanks to 1 × 1 kernels. It also acts as a reduction of the depth of the output tensor Y.

Shift Convolution. Even though pointwise convolution is more computationally expensive than depthwise convolution in theory, Jeon et al. [6] notice, with a hardware implementation, that depthwise convolution is more time-consuming than pointwise convolution. They replace depthwise convolution by a shift operation which requires extremely few parameters and less computational power to produce the intermediate feature map I:

I_{k,l,m} = X_{k+αm, l+βm, m},  ∀k, l ∈ [1, Hx], ∀m ∈ [1, Cx]   (2)

where αm and βm denote the horizontal and vertical shift assigned to the mth channel of the input feature map.

Add Convolution. The multiplication operation consumes, in most cases, more energy than the addition operation. Chen et al. [2] exploit the fact that convolutions in deep neural networks are cross-correlations measuring the similarity between input and convolution kernel. They propose to replace cross-correlation by the L1-norm as a similarity measure to perform an add convolution as in Eq. 3:

Y_{k,l,n} = − Σ_{m=1..Cx} Σ_{i=1..Hk} Σ_{j=1..Hk} |W_{i,j,m,n} − X_{k+i−1, l+j−1, m}|,  ∀k, l ∈ [1, Hy], ∀n ∈ [1, Cy]   (3)

The output of an add convolution is always negative. Thus, in order to make add convolution compatible with standard activation functions like ReLU, a batch normalization layer following the add convolution layer is needed.
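As a compact illustration of Eqs. 2 and 3, the following sketch implements the shift operation and the add-convolution accumulation in plain Python/NumPy; shapes, shift assignments and names are illustrative assumptions, not the microcontroller kernels described later.

import numpy as np

def shift_channels(x, shifts):
    # x: (H, H, Cx); shifts[m] = (alpha_m, beta_m) as in Eq. 2.
    # I[k, l, m] = X[k + alpha_m, l + beta_m, m]; out-of-range reads return 0.
    H, _, C = x.shape
    out = np.zeros_like(x)
    for m, (a, b) in enumerate(shifts):
        for k in range(H):
            for l in range(H):
                if 0 <= k + a < H and 0 <= l + b < H:
                    out[k, l, m] = x[k + a, l + b, m]
    return out

def add_conv2d(x, w):
    # Eq. 3: negative L1 distance between each input patch and each kernel.
    hk, _, _, cy = w.shape
    hy = x.shape[0] - hk + 1
    y = np.zeros((hy, hy, cy))
    for k in range(hy):
        for l in range(hy):
            for n in range(cy):
                y[k, l, n] = -np.sum(np.abs(w[:, :, :, n] - x[k:k + hk, l:l + hk, :]))
    return y

x = np.random.rand(5, 5, 3)
print(shift_channels(x, [(1, 0), (0, -1), (0, 0)]).shape)    # (5, 5, 3)
print(add_conv2d(x, np.random.rand(3, 3, 3, 2)).max() <= 0)  # True: outputs are non-positive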
2.3 Neural Network Library for Cortex-M MCU
The challenge of porting neural networks to constrained platforms such as microcontrollers has led to the creation of embedding tools (e.g. TFLM2 , N2D23 , STM32Cube MX-AI4 or NNoM5 ). Those tools support standard convolution as well as depthwise separable convolutions layers. TFLM and STM32Cube MX-AI support floating point operations, 16 and 8 bits integer operations while NNoM supports only 8 bits integer operations. Furthermore, for Cortex-M4 and CortexM7 MCUs (with Digital Signal Processing extensions), SIMD instructions can be used for the computation of different primitives by integrating the middleware CMSIS-NN [8] to those tools. For our study, the open source NNoM library was chosen due to its good performance and its ease of customization. 2 3 4 5
https://www.tensorflow.org/lite/microcontrollers. https://github.com/CEA-LIST/N2D2. https://www.st.com/en/embedded-software/x-cube-ai.html. https://github.com/majianjia/nnom.
3 Implementation
In this section, we present the implementation details of NNoM and CMSIS-NN convolution on which our implementations of the different primitives are based. Furthermore, we detail the differences of implementation between the standard convolution and the optimized primitives.

3.1 Quantization
Quantization is the process of reducing the precision of weights, biases, and activations in order to reduce the memory footprint. The NNoM library uses 8-bit quantization for the weights, biases, and activations with a uniform symmetric powers-of-two quantization scheme as in Eq. 4:

dec = ceil(log2(max(|Xf|))),  xi = round(xf · 2^dec)   (4)

where Xf is a 32-bit floating point tensor, xf a value of Xf, xi its 8-bit quantized version and 2^dec is the scale of quantization. Because this scale is a power of 2, the convolution operation only requires integer addition, multiplication and bit shifting, but no division (see Algorithm 1, left). This computation process is used for grouped and shift convolutions because of their similarity to standard convolution. We adapt it to add convolutions as presented in Algorithm 1 (right).
Algorithm 1. Inner loop of convolution (left) and add convolution (right) without bias
Input: individual weight w, power-of-2 scale of weight dec_weight, one input value i, power-of-2 scale of input dec_input, power-of-2 scale of output dec_output

Convolution:
1: output ← i · w
2: shift_output ← dec_weight + dec_input − dec_output
3: output ← output >> shift_output
4: Return output

Add convolution:
1: shift ← |dec_input − dec_weight|
2: if dec_input > dec_weight then
3:   output ← −|i − (w << shift)|
⋮
12: output ← output >> shift_output
13: Return output
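To connect Eq. 4 with Algorithm 1, here is a small sketch of the power-of-two quantization and of the integer multiply-and-shift inner loop; it mirrors the left-hand algorithm under the stated scales and is an illustration, not the CMSIS-NN/NNoM source code.

import math

def pow2_scale(values):
    # Eq. 4: power-of-two scale derived from the tensor's dynamic range.
    return math.ceil(math.log2(max(abs(v) for v in values)))

def quantize(v, dec):
    # Map a float to an 8-bit integer with scale 2^dec (clamped to the int8 range).
    return max(-128, min(127, round(v * 2 ** dec)))

def conv_inner_loop(i_q, w_q, dec_input, dec_weight, dec_output):
    # Algorithm 1 (left): integer multiply, then re-scale with a right shift only.
    output = i_q * w_q
    shift_output = dec_weight + dec_input - dec_output
    return output >> shift_output

# Toy usage: one input value times one weight, all with power-of-two scales.
dec_in, dec_w, dec_out = 5, 6, 4
x_q = quantize(0.7, dec_in)
w_q = quantize(0.3, dec_w)
acc = conv_inner_loop(x_q, w_q, dec_in, dec_w, dec_out)
print(acc, acc / 2 ** dec_out, 0.7 * 0.3)   # quantized product vs. float product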
3.2 Batch Normalization Folding
For convolutions, the NNoM library uses the batch normalization folding proposed by Jacob et al. [5]. By merging convolution layers and batch normalization layers, this method accelerates the inference without an accuracy drop. Batch normalization folding can be applied for the computation of grouped and shift convolutions but is not suitable for add convolution.
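For reference, a minimal sketch of the folding itself: the batch-normalization statistics are absorbed into the convolution weights and bias before quantization. The formula follows the standard folding described by Jacob et al. [5]; the variable names are assumptions.

import numpy as np

def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    # Folded layer computes gamma * ((conv(x, w) + b) - mean) / sqrt(var + eps) + beta,
    # which is equivalent to a single convolution with rescaled weights and bias.
    scale = gamma / np.sqrt(var + eps)          # one factor per output channel
    w_folded = w * scale.reshape(1, 1, 1, -1)   # w: (Hk, Hk, Cx, Cy)
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded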
3.3 Im2col Algorithm with SIMD Instructions
In order to accelerate convolutions, the CMSIS-NN middleware [8] uses the image-to-column (im2col) algorithm [1]. The first step is to sample patches from the input, flatten them and stack them as columns of a matrix M. Each filter of the convolution weight W is also flattened and stacked as a row of a matrix N. In the second step, the output is computed with the matrix multiplication Y = M·N. To deal with the increased memory footprint of im2col, Lai et al. [8] limit the number of patches processed at the same time to 2. The matrix multiplication is computed using 2 filters simultaneously to maximize the data reuse at the register file level on ARM Cortex-M. Furthermore, Lai et al. [8] use the parallelized multiply-accumulate instruction SMLAD to speed up the matrix multiplication. For grouped convolution, we apply the algorithm of Lai et al. [8] to each group. For shift convolution, we modify the first step of im2col to sample a patch with different shifts for each input channel. We did not implement a SIMD version of add convolution because there is no instruction similar to SMLAD adapted to add convolutions.
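The following is a compact sketch of the first im2col step for a single-channel input, just to make the patch-flattening explicit; the buffering of only two patches and the SMLAD-based multiplication used on Cortex-M are not reproduced here, and the names are illustrative.

import numpy as np

def im2col(x, hk):
    # x: (H, H) single-channel input; returns a matrix whose columns are
    # the flattened hk x hk patches, one column per output position.
    H = x.shape[0]
    hy = H - hk + 1
    cols = np.empty((hk * hk, hy * hy))
    for k in range(hy):
        for l in range(hy):
            cols[:, k * hy + l] = x[k:k + hk, l:l + hk].ravel()
    return cols

x = np.arange(16, dtype=float).reshape(4, 4)
M = im2col(x, 3)
w = np.random.rand(2, 9)           # 2 filters, flattened 3x3 kernels as rows
y = w @ M                          # convolution expressed as a matrix multiplication
print(M.shape, y.shape)            # (9, 4) (2, 4)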
4 Experimental Characterisations
The experiments are carried out on a typical 32-bit MCU platform, the Nucleo STM32F401-RE, based on a Cortex-M4 that supports SIMD instructions. Unless specified, the compiler is arm-none-eabi-gcc (version 10.3) with the optimization level set to Os, and the MCU's frequency is fixed at 84 MHz. The software STM32CubeMonitor-Power is used to measure the electric current of the MCU. We multiply it by the supply voltage (i.e. 3.3 V) and integrate it over the duration of an inference to obtain the inference's energy consumption.

4.1 Influence of the Primitive Parameters
Protocol. To evaluate the influence of a parameter (i.e. kernel size, input width, ...), we consider a layer with every other parameter fixed except the one concerned. The experiment plan is defined in Table 2. We measure the average latency and energy consumption over 50 inferences with randomized inputs. Results are presented in Fig. 2.
https://www.st.com/en/development-tools/stm32cubemonpwr.html.
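As a side note, the energy figure described above can be obtained by integrating the sampled current trace multiplied by the supply voltage over the inference window; the sketch below assumes a uniformly sampled current array and is not tied to the STM32CubeMonitor-Power export format.

import numpy as np

def inference_energy(current_a, dt_s, v_supply=3.3):
    # E = integral of v * i(t) dt over the inference, approximated by the trapezoidal rule.
    power_w = v_supply * np.asarray(current_a)
    return np.trapz(power_w, dx=dt_s)                  # energy in joules

current = np.array([0.015, 0.016, 0.0158, 0.0162])     # sampled MCU current in amperes
print(inference_energy(current, dt_s=1e-3) * 1e3)      # energy in millijoules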
Fig. 2. Influence of the 1) number of groups, 2) kernel size, 3) input width, 4) number of input channels and 5) filters on a) theoretical MACs, b) latency without SIMD instructions, c) energy consumption without SIMD instructions, d) latency with SIMD instructions and e) energy consumption with SIMD instructions and f) speedup for different primitives. The different implementations fit the theory. Using SIMD instructions enables faster and less energy consuming inferences. The speedup of the im2col algorithm varies according to the primitives and their parameters.
Results Without SIMD Instructions. We observe in Fig. 2 a-c that our implementation fits the theory (Table 1). For example, the theoretical MACs, latency and energy consumption increase quadratically with the kernel size (Fig. 2 2.a, Fig. 2 2.b and Fig. 2 2.c). More specifically, there is a linear relationship between the MACs, latency and consumption. A linear regression leads to scores of 0.995 and 0.999 respectively. Add convolutions are slightly less efficient than convolutions despite the same number of MACs. This is explained by the quantization scheme of add convolution and the additional batch normalization layer (Table 2).

Table 2. Primitive parameters for the different experiments.
Experiment | Groups | Kernel size | Input width | Input channel | Filters
1          | 1–32   | 3           | 10          | 128           | 64
2          | 2      | 1–11        | 32          | 16            | 16
3          | 2      | 3           | 8–32        | 16            | 16
4          | 2      | 3           | 32          | 4–32          | 16
5          | 2      | 3           | 32          | 16            | 4–32
Effect of SIMD Instructions. Using SIMD instructions decreases the latency (Fig. 2 d) and energy consumption (Fig. 2 e) of the different primitives. Our implementation with SIMD instructions also fits the theory. But latency is more relevant to estimate the layer’s energy consumption (regression score of 0.999) than theoretical MACS (regression score of 0.932). This loss of linearity is related to the varying speedup of the im2col algorithm with respect to the primitives and their parameters (Fig. 2 f). A possible explanation is in the data reuse exploitation by the im2col algorithm. To verify this, we measure the number of memory access in those programs. Figure 3 shows the variation of the ratio of memory access without SIMD instructions by the memory access with SIMD instructions (normalized by MAC) for different parameters and primitives. We observe in Fig. 3 the same variations as in Fig. 2 f. Thus, data reuse contributes strongly to the speed up of algorithms using SIMD instructions. However, convolutions and grouped convolutions have similar ratio in Fig. 3 but different speedup in Fig. 2 f. Other factors such as memory access continuity and padding are to be taken into account to explain the performance of these programs.
Fig. 3. Influence of the a) number of groups, b) kernel size, c) input width, d) number of input channels and e) filters on the ratio of memory access without SIMD instructions by the memory access with SIMD instructions (normalized by MACs) for different primitives.
4.2 Influence of Other Factors
For the following experiments, we fix the number of groups at 2, the kernel size at 3, the input width at 32, the input channel at 3 and the filters at 32.
Fig. 4. Influence of the MCU frequency on latency, energy consumption without SIMD instructions (a and b) and with SIMD instructions (c and d).
Influence of Frequency. We perform inferences on a frequency range from 10 to 80 MHz (see Fig. 4). Latency is inversely proportional to the frequency, as expected. Power consumption increases with frequency (see Table 3) but to a lesser degree than the decrease of latency. Thus, using the maximum frequency of ARM Cortex-M MCUs lowers the inference's energy consumption.

Table 3. Average power consumption (mW) at different frequencies.
        | 10 MHz | 20 MHz | 40 MHz | 80 MHz
No SIMD | 16.16  | 21.59  | 32.83  | 52.09
SIMD    | 17.57  | 24.66  | 37.33  | 62.75
Influence of Optimization Level. We perform a convolution inference with two different optimization levels (O0 and Os). As seen in Table 4, the compiler optimization has an important effect on the layer performance. Using the Os level accelerates the inference by a factor 1.52. This impact is emphasized with the use of SIMD instructions (factor 9.81). Without optimization, the use of SIMD instructions can even increase the layer's energy consumption, as using SIMD instructions increases the average power consumption.

Table 4. Effect of optimization level on inference performance for convolution.
        | Optimization level | Latency (s) | Consumption (mJ) | Optimization speedup | SIMD speedup
No SIMD | O0                 | 1.26        | 63.9             | –                    | –
No SIMD | Os                 | 0.83        | 45.7             | 1.52                 | –
SIMD    | O0                 | 1.08        | 82.0             | –                    | 1.17
SIMD    | Os                 | 0.11        | 7.2              | 9.81                 | 7.55
5 Conclusion
In this paper, we implement and benchmark several state-of-the-art convolution primitives for ARM Cortex-M microcontrollers. Our benchmark shows that for microcontrollers which cannot use SIMD instructions, theoretical MACs is a relevant indicator to estimate the layer energy consumption. For microcontrollers which use SIMD instructions, latency is preferred over theoretical MACS to estimate the layer energy consumption while using SIMD instructions. We explain this by the varying efficiency of the im2col algorithm, from CMSIS-NN, depending on the layers and highlight the role of data reuse in this performance gap. Furthermore, we study the influence of external parameters to the convolution algorithms such as the compiler optimization and the MCU frequency. Our experiments highlight the major impact of the compiler optimization on the layers performance while using SIMD instructions, and show that running the inference at maximum frequency decreases the layer’s energy consumption. Our work opens up new possibilities for neural architecture search algorithms. Acknowledgments. Part of this work was done with the support of ID-Fab (Prototyping platform: project funded by the European Regional Development Fund, the French state and local authorities). Author contributions. Nguyen, Mo¨ellic and Blayac conceived and planned the study. Nguyen carried out the experiments and performed the analysis. Nguyen and Mo¨ellic wrote the manuscript with input from all authors.
References 1. Chellapilla, K., Puri, S., Simard, P.: High performance convolutional neural networks for document processing. In: Tenth International Workshop on Frontiers in Handwriting Recognition, Suvisoft (2006) 2. Chen, H., et al.: Addernet: do we really need multiplications in deep learning? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1468–1477 (2020) 3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the Conference on Computer Vision and Pattern Recognition (2016) 4. Ioannou, Y., Robertson, D., Cipolla, R., Criminisi, A.: Deep roots: improving CNN efficiency with hierarchical filter groups. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1231–1240 (2017) 5. Jacob, B., et al.: Quantization and training of neural networks for efficient integerarithmetic-only inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713 (2018) 6. Jeon, Y., Kim, J.: Constructing fast network through deconstruction of convolution. arXiv preprint arXiv:1806.07370 (2018) 7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012) 8. Lai, L., Suda, N., Chandra, V.: CMSIS-NN: efficient neural network kernels for arm cortex-m cpus. arXiv preprint arXiv:1801.06601 (2018) 9. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
On a Structure of an Automated Differential Equation Solver Based on Machine Learning Methods Damir Aminev, Nikita Demyanchuk, and Alexander Hvatov(B) NSS Lab, ITMO University, Kronverksky pr. 49, Saint-Petersburg 197101, Russia alex [email protected]
Abstract. Differential equation solvers are usually well-established software. On the one hand, conventional solvers are designed in a highperformance computation paradigm. On the other hand, it is hard to make changes to the conventional solver structures. In some applications, as an example equation discovery, it is viable to move from high-performance solutions for a given class of equations to a universal machine learning tool that could handle wide classes of equations. In this paper, we describe the current state of automated differential equation solvers and propose the architecture of such software. We highlight the difference between conventional and automated solvers. We also propose the architecture of the differential equation solver based on a machine learning paradigm.
Keywords: differential equation · solver · machine learning · physics-informed neural network

1 Introduction
Differential equation solvers are usually considered auxiliary software for optimization problems, such as system dynamics optimization. Therefore, the input of such solvers is strictly fixed. There are many examples of Runge-Kutta-based ODE (ordinary differential equation) system solvers written in different programming languages, starting from Fortran [1] and pure C [2] programming languages to the modern boost library C++ module [3]. The architecture of classical solvers consists of three main parts - transition to the system of linear algebraic equations, linear systems solver and transition from system algebraic equations solutions to the resulting differential equation solution. There are many different approaches to transitioning to system algebraic equations, starting from the Runge-Kutta and Krylov methods for ODE systems [1–3] to Galerkin-like methods [4] for PDE (partial differential equations). Linear algebraic system solvers are usually separate and well-established tools.
Modern data-driven methods such as physics-informed neural networks (PINNs) [5] and DeepONet [6] require at least applying arbitrary (but fixed for every given problem) differential operators to neural networks. The arbitrary form of the operator makes us move from classical solvers with fixed architecture and input to modern ones. Another area that requires the solution of an arbitrary differential equation is data-driven equation discovery [7,8]. For some applications, the solution of a discovered equation is required in order to compare it with the input data [9]. The arbitrary type of operator requires expanding the input type and, more importantly, expanding the arsenal of tools used to obtain the solution. In this paper, we discuss what directions exist to extend the solver, what modules are required, and how those ideas shape our solver based on neural networks. The paper is organized as follows. Section 2 is dedicated to the review of the existing solutions, Sect. 4 presents our view on how a solver based on machine learning methods should be built, Sect. 3 shows what advantages the proposed solver has compared to existing ones, and Sect. 5 summarizes the paper and outlines future work directions.
2 Related Work
Classical differential equation solvers, as stated above, are tools used for a fast and precise solution of rather narrow problem classes, for example first-order ODE systems solved by Runge-Kutta methods with add-ons [1–3]. With some adjustments, this is enough to solve most ODE problems. Every PDE problem class requires a separate approach; we may name the finite difference method [10], the finite element method [4], spectral methods [11], and Fourier neural differentiation operators [12]. Every PDE application area has a set of established PDE solution methods. The differential equation solution pipeline may be represented as a decision tree. At the root, we separate the ODE from the PDE. For ordinary differential equations, we determine the set of transforms to a first-order system and use a classical solver. Partial differential equations should be transformed to the canonical form and, if possible, solved by an adequately chosen method. The closest to the described pipeline is used in Wolfram Mathematica. Since it is proprietary software and does not have an API, it cannot be used as a part of equation discovery or any other algorithm. We note that, theoretically, one may build such a decision tree. However, every new class of equations will extend or change the tree's structure. As an example of open software, we may note the package [13] written in the Julia language. If we move from classical solvers to more universal ones, we find that the number of existing solutions that can be reused is increasingly small [14,15]. Physics-informed neural networks [5] do not have established code, and the realization differs from application to application. The same can be said of DeepONet [6]. Therefore, checking if the PINN works as a universal equation solver is impossible. The DeepXDE [16] software has a repository and is the closest software that can
be used as a universal solver. All frameworks, the theoretical PINN and DeepONet as well as the openly available DeepXDE, use neural networks as the basis to solve differential equations. Apart from the Python software, there is also a Julia package [17]. The process of equation solution may be referred to as Sobolev training [18].
3 State of Neural Network Differential Equation Solvers
Before we dive into the details, we consider the best possible choice for the equation discovery. In our opinion, it is DeepXDE [16]; however, the considerations are valid for most of the software currently available. We assume that within the equation discovery, we get only the equation without boundary conditions. Moreover, it is an arbitrary equation, i.e. it presumably has no fixed order and other properties. The boundaries and initial conditions may be fixed and have a known form. Even though we may fix the form of the boundary conditions, a different operator order requires a different set of boundary conditions, meaning that for every equation order, we have to fix different conditions. To summarize the section, we collect the differences between the proposed approach and DeepXDE in Table 1.

Table 1. Comparison of the proposed approach with DeepXDE for some parameters
Module          | DeepXDE                                  | Proposed approach
Approximator    | NN                                       | Parametrized model
Differentiation | Autograd                                 | Autograd, numerical differentiation
Operator form   | Constant coefficients                    | Time- and spatial-variable coefficients
BC form         | Dirichlet, Neumann, Robin, IC, GeneralBC | Arbitrary
Differential equation solver DeepXDE is built using the classical solver guidelines. It is not an obstacle to use it in the equation discovery algorithm, but it would not be able to handle arbitrary equations without additional changes in the architecture. Also, the geometry and boundary conditions are somewhat implied by classical solvers. For example, solving the Korteweg-de Vries equation is impossible since the maximum input order without additional changes is second. Therefore, using standard solver guidelines and notions of jacobian and hessian as equation ’building blocks’ and Dirichlet, Neumann, and Robin boundary conditions do not lead us to handle the arbitrary equations. We note that DeepXDE could theoretically handle arbitrary operators and boundary conditions. It is hard to assess the changes required to the architecture to change input. However, an arbitrary operator is more than a simple input
change. Changing philosophy and overall software architecture from a “mathematical” problem description to a less formal one is required. For example, initial and boundary conditions should be considered as one class, and we should be able to define “boundary conditions” in the domain interior.
4 Proposed Architecture

This section proposes how to build the neural network-based differential equation solver. Below is a flowchart of the torch_DE_solver operation algorithm (Fig. 1).
Fig. 1. torch DE solver architecture realization
The solver is implemented so that it can be extended with new methods to solve differential equations without global changes in the architecture. When comparing the well-known software implementation, for example DeepXDE, which implements only one method to solve DE (tf.autograd), it is fair to single out torch DE solver as a more flexible choice of methods to solve boundary problems. To define a new method for solving differential equations, it is necessary to define a new method Equation and the mechanism to determine the derivative Derivative. It is unknown whether neural networks can represent the given differential equation solution. The universal approximation theorem is valid for L2 space, dense in Sobolev W2 space. However, it is required to consider the universal approximation theorem for Sobolev spaces more thoroughly. In torch DE solver, we do not stick to neural networks - the proposed approach may be extended to an arbitrary parametrized model. 1
https://github.com/ITMO-NSS-team/torch_DE_solver.
4.1 Equation Module
This section describes what is required as input for the differential equation solution. To begin with, we use the mathematical boundary value problem statement: find a solution u of the equation on a defined domain Ω ⊂ R^K, where K is the dimensionality of the space, with boundary conditions given on ∂Ω in the form of Eq. 1:

Lu = f,  bu = g   (1)
where L, b - differential and boundary operators, f, g - the arbitrary functions. Thus, the statement of the problem was determined, and we can proceed to the numerical solution of the boundary value problem. It is necessary to know the ODE or PDE, the boundary, and the initial conditions and specify the calculation domain. As for the Solver, we have to specify all these essential parameters such as: grid, equation, boundary conditions. Moreover, it is possible to choose different approaches to solve an equation. Solver supports methods based on a matrix (linear model without activation layers) and neural network optimizations, as well as a method based on the pytorch automatic differentiation algorithm (in detail, solution methods are described in Sect. 4.2). Grid. The grid parameter represents a domain in which we want to calculate a differential equation. The only significant restriction is that only a single connected domain can be considered. We do not assume that geometry has a particular shape, rectangular, circular, or any other analytical domain. To preserve generality, the domain is represented by the number of points. All domain points are used for parametrized model training. We have to separate boundary points and interior points for the finite-difference method. Boundary points are found as closest to the span of the grid points set. Subsequently, the boundary points are sorted for each axis on forward and backward. Points are marked with forward if the coordinate of the point remains within the interior of the span after a small increase in coordinates and backward otherwise for the case of coordinate decrease. Same check is performed for every grid point and points that are both forward and backward marked as central. The point type affects a finite-difference scheme choice. Such an approach, on the one hand, allows for the definition of arbitrary domain geometry and, on the other hand, allows for to use of finite difference and matrix optimization methods together with the automatic gradient. Equation. To add more flexibility, we should move from Jacobians and Hessians to separate differentials or a given order. Moreover, we add variable coefficients and separate dependent variables to solve systems and complex-valued equations. We collect all the required parameters in the equation interface. The interface includes several parameters such as: coefficient, operator, power and the optional parameter variable (it must be specified if the equation contains several variables in the case of system solution). After the equation is given, it is transformed depending on the selected differentiation and model method (details of possible choices are provided
in Sect. 4.2). All these parameters allow setting arbitrary equations, including equations whose coefficients may be functions. Variable coefficients allow one, for example, to solve canonical ODEs such as the Legendre equation (Eq. 2(a) and example_ODE_legendre.py) or Painlevé transcendents (Eq. 2(b) and example_Painleve_I.py):

(a)  (1 − t²)u''(t) − 2t u'(t) + n(n + 1)u(t) = 0
(b)  u''(t) = 6u(t)² + t   (2)
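To give a flavour of how such a variable-coefficient ODE enters a parametrized-model loss, the sketch below evaluates the Legendre residual of Eq. 2(a) for a small PyTorch model using automatic differentiation; it is an illustrative stand-alone example, not the torch_DE_solver input interface.

import torch

n = 3
model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

t = torch.linspace(-0.9, 0.9, 64).reshape(-1, 1).requires_grad_(True)
u = model(t)
# First and second derivatives of the model output w.r.t. t via autograd
du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]

# Residual of the Legendre equation (Eq. 2(a)) with variable coefficients (1 - t^2) and -2t
residual = (1 - t ** 2) * d2u - 2 * t * du + n * (n + 1) * u
loss = residual.pow(2).mean()
print(loss.item())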
Boundary and Initial Conditions. For boundary and initial conditions, flexibility requires a change of point of view. In classical solvers, we work with canonical types such as prescribed field values on the entire boundary (Dirichlet-type boundary conditions, possibly given as a function on the boundary) or prescribed normal derivative values (Neumann-type boundary conditions). Initial conditions are also prescribed values or a function at t = 0. We add an interface that allows prescribing any combination of differential operators at any subset of points. It is done mostly for the differential equation discovery process. However, it also made it possible to solve some canonical PDEs, such as the Korteweg-de Vries equation, which has third order and in general requires extended Robin conditions (example_KdV.py). Customizable parameters of the boundary conditions interface are: boundary, boundary operator, boundary values and the optional parameter boundary type (it must be specified if the operator has periodic conditions). We note that there is no check for completeness and correctness of boundary conditions. As a result, we may solve under- and over-defined problems.
4.2 Solution and Solver Module
The input Equation interface allows converting the unified input form into the proper operator that can be applied to a chosen model. Above we mention that it could be any pytorch model (in what follows we describe neural networks and linear models without activation layers, i.e. matrices and tensors) with the parameter requires_grad=True. The latter is required to use the built-in automatic differentiation in pytorch. We also support finite-difference differentiation to cover a wider class of models; for this method requires_grad=True is not required. After initializing the input data, the next step is to choose the differential equation solution method. The general methodology of the following methods is based on the discretization of continuous differential operators; therefore, it is necessary to move from an analytical formulation of the problem to a numerical one. Most numerical methods find the boundary problem solution ū (a mesh function) at the points of the discrete subset of Ω:

ū = {u(x_k^(i)), i = 1, 2, ..., n, k = 1, 2, ..., K},  ∀i: (x_k^(i)) ∈ Ω   (3)

2 Code examples are provided in https://github.com/ITMO-NSS-team/torch_DE_solver/blob/main/examples.
where K is the number of function parameters (arguments) and n is the number of points of the corresponding dimension K. The minimization problem may be formulated as Eq. 4:

min_ū ||Lū − f||_i + λ ||bū − g||_j   (4)

Usually, i and j are represented by the l1 or l2 norms. λ is an arbitrary constant that influences the convergence speed if the boundary conditions are correctly defined. Since the mesh function ū is not determined, it is essential to move from the continuous differential operators L and b to numerical analogues L̄ and b̄. If numerical analogues of the continuous operators are defined, it is sufficient to obtain the value L̄ū for a solution candidate ū at each point of X = {x_k^(i)} ⊂ Ω. Correspondingly, the minimization problem is reformulated as follows:

min_ū ||L̄ū − f||_i + λ ||B̄ū − g||_j   (5)

where the residuals are evaluated at the points of X.
The class Solution includes three different approaches (modes) to optimize the solution model. The first mode, NN, is considered primary and may be applied to arbitrary models. The method is based on representing the differential operator with finite-difference schemes, realized in the Finite_diffs class. Forward and backward schemes (of first order of precision O(h), where h is an increment that may be, but is not necessarily, equal to the grid step in the discretization of the given dimension) are implemented as shown in Eq. 6:

u_f(x) ≈ (u(x + h) − u(x)) / h,  u_b(x) ≈ (u(x) − u(x − h)) / h   (6)

The "central" scheme (of second order of accuracy O(h²)) is realized for internal points of the computational domain as shown in Eq. 7:

u_c(x) ≈ (u_f(x) + u_b(x)) / 2 ≈ (u(x + h) − u(x − h)) / (2h)   (7)

Additional finite-difference schemes, higher-order or problem-specific, may be added to Finite_diffs. For example, we realized second-order schemes; however, they do not increase the overall quality of the solution. The second mode (autograd) is implemented using automatic differentiation (autograd) in the PyTorch library. The third mode (mat) is based on matrix optimization. The model here is essentially the values of the function at the grid points. This method is intended mostly for rectangular grids since the matrices are easier and faster to manipulate. The choice of the differential equation solving method affects how the initial and boundary conditions given in the conventional Equation form are entered into the Solution class. This class refers to the Derivative class, which in turn refers to the corresponding Derivative {NN; autograd; mat} class, as shown in Fig. 2.
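A minimal numerical illustration of the schemes in Eqs. 6 and 7, applied to a function sampled on a uniform grid; this is a stand-alone sketch and not the Finite_diffs implementation.

import numpy as np

def forward_diff(u, h):
    return (u[1:] - u[:-1]) / h        # Eq. 6, first order, forward

def backward_diff(u, h):
    return (u[1:] - u[:-1]) / h        # same values, attributed to the right-hand nodes

def central_diff(u, h):
    return (u[2:] - u[:-2]) / (2 * h)  # Eq. 7, second order, interior points only

h = 0.01
x = np.arange(0.0, 1.0 + h, h)
u = np.sin(x)
print(np.max(np.abs(central_diff(u, h) - np.cos(x[1:-1]))))   # error of order h^2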
Fig. 2. Differentiation module consisting of different differentiation methods
The corresponding differential operator is applied the proper number of times and to the required variables. Every term is multiplied by the corresponding transformed coefficient (a constant or a function). Since the differential operators may be applied to the given model, we may solve the minimization problem. The method of loss evaluation (loss evaluation) in the Solution class is based on Eq. 8:

min_Θ ||L̄ū(x, t; Θ) − f||_i + λ ||B̄ū(x, t; Θ) − g||_j   (8)

where the residuals are evaluated at the points of X.
The first term corresponds to searching for the differential equation solution at each node of the computational grid. The second term ensures that the initial and boundary conditions are fulfilled. Every Derivative realization has its own specific formulation of the minimization problem. The Derivative NN class implements the minimization as a conventional neural network training process: the differential operator is applied to the model at all grid points (or at a grid subset in a mini-batch manner), after which the loss based on Eq. 8 is minimized. The Derivative autograd class differs from the Derivative NN realization above only in the differentiation method. The Derivative mat class minimizes the discrete field (matrix) norm of the operator applied to the current state represented as function values at the grid points. The minimization of the functional in Eq. 8 and, accordingly, the search for a DE solution is implemented in the Solver class. This class includes an algorithm for minimizing the functional (equation solution), which is based on various optimization methods (optimizer choice), and methods for DE solution visualization (graphs solution print).
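For intuition, the two-term loss of Eq. 8 can be written for a generic PyTorch model as below; the operator and boundary callables, the penalty λ and all names are illustrative assumptions rather than the Solution class API.

import torch

def de_loss(model, grid, operator_residual, bc_points, bc_operator, bc_values, lam=10.0):
    # First term of Eq. 8: operator residual over the whole grid
    interior = operator_residual(model, grid).pow(2).mean()
    # Second term: penalty enforcing the (possibly arbitrary) boundary operator
    boundary = (bc_operator(model, bc_points) - bc_values).pow(2).mean()
    return interior + lam * boundary

The balance between the two terms, controlled here by lam, is what the text above refers to when noting that λ influences the convergence speed once the boundary conditions are correctly defined.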
4.3 Cache Module
The positive side of machine learning models is the possibility of using pretrained models as an initial guess for the differential equation solution. The torch DE solver implements a procedure in the Model Prepare class, which performs caching solutions. After the optimization algorithm completion, the neural network parameters (weights) and the optimizer state are saved in the created folder (solution library). Cached solutions are used for initial field interpolation to achieve faster convergence. A model with identical or different architecture and minimal norm (Eq. 8) for a given DE is searched in the solution library (cache folder). For this purpose, at the start of the minimization algorithm to work, the Model Prepare class is called. The caching technique allows the approximation to be done in the shortest time interval. If the input neural network architecture is not the same as the model from the cache with the minimal norm, the input model will be retrained using the cached one. The network weights are perturbed whenever the best model is taken from the cache to avoid the local minima cases. If Derivative mat class is used, the neural network model with default architecture is trained using the resulting matrix values and grid. As a result, each obtained model from different methods Derivative {NN;autograd;mat} is saved uniformly and may be used further by each method to find the solution in the shortest time.
5 Conclusion
In this paper, we describe the architecture of the torch_DE_solver software that allows differential equations to be solved in a more machine-learning-oriented manner. As advantages of the described architecture, one may note:
– the wide class of equations that may be solved;
– non-formalized boundary conditions;
– the use of various discretizing models, independently of the built-in pytorch automatic differentiation module;
– a cache module for faster work.
So far, the proposed approach is primarily a visualization tool for an equation discovery process. We note that convergence toward the solution is not mathematically proved. However, the model equations show that the solution is obtained precisely, and the number of correctly solved equations is higher than in existing analogues.

Acknowledgement. This research is financially supported by The Russian Scientific Foundation, Agreement #21-71-00128.
References 1. Hindmarsh, A.C.: Scientific computing, pp. 55–64 (1983) 2. Hindmarsh, A.C., et al.: ACM Trans. Math. Software (TOMS) 31(3), 363 (2005) 3. Ahnert, K., Mulansky, M.: In AIP Conference Proceedings, vol. 1389, pp. 1586– 1589. American Institute of Physics (2011) 4. Sol´ın, P.: Partial Differential Equations and the Finite Element Method. Wiley (2005) 5. Raissi, M., Perdikaris, P., Karniadakis, G.E.: J. Comput. Phys. 378, 686 (2019) 6. Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E.: Nat. Mach. Intell. 3(3), 218 (2021) 7. Maslyaev, M., Hvatov, A., Kalyuzhnaya, A.V.: J. Comput. Sci. 53, 101345 (2021) 8. Fasel, U., Kutz, J.N., Brunton, B.W., Brunton, S.L.: Proc. Roy. Soc. A 478(2260), 20210904 (2022) 9. Maslyaev, M., Hvatov, A.: In: 2022 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2022) 10. Fornberg, B.: Math. Comput. 51(184), 699 (1988) 11. Burns, K.J., Vasil, G.M., Oishi, J.S., Lecoanet, D., Brown, B.P.: Phys. Rev. Res. 2(2), 023068 (2020) 12. Li, Z., et al.: arXiv preprint arXiv:2003.03485 (2020) 13. Rackauckas, C., et al.: arXiv preprint arXiv:2001.04385 (2020) 14. Blechschmidt, J., Ernst, O.G.: GAMM-Mitteilungen 44(2), e202100006 (2021) 15. Frank, S.A.: arXiv preprint arXiv:2207.04487 (2022) 16. Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: SIAM Rev. 63(1), 208 (2021) 17. Rackauckas, C., Innes, M., Ma, Y., Bettencourt, J., White, L., Dixit, V.: arXiv preprint arXiv:1902.02376 (2019) 18. Czarnecki, W.M., Osindero, S., Jaderberg, M., Swirszcz, G., Pascanu, R.: In: NIPS (2017)
Stack Tag - Predicting the Stack Overflow Questions’ Tags Using Gated Recurrent Unit Networks Varun Prakash, Sagar Raghav, Shubham Sood, Mrinal Pandey, and Mamta Arora(B) Manav Rachna University, Faridabad, Haryana, India [email protected]
Abstract. A reputed forum like Stack Overflow hosts a well-curated set of questions and answers pertaining to computer programming, and this collection is growing exponentially. Thus, it has become more relevant than ever to provide an alternative to the tagging system currently in place. In this research paper, we propose a conventional deep learning model that employs the concepts of Natural Language Processing with Gated Recurrent Neural Networks to predict the programming language of a question posted on Stack Overflow. The 10 most popular languages have been considered for the experiments. We built three classifier models with three different inputs: the first model requires only the title of the question, the second model requires only the body of the question, and the last model requires both the title and the body as input. Results show that the classifiers using only the title and only the body achieved accuracy scores of 79.21% and 84.17% respectively, whereas the classifier trained on both the title and the body reached an accuracy of 97.38%. These results demonstrate that our deep learning model generalizes more accurately when provided with both the title and the body of the question.

Keywords: Stack Overflow · Deep Learning · Natural Language Processing · Gated Recurrent Neural Networks · Deep Convolutional Neural Networks
1 Introduction

In the last decade, we have seen exponential growth in the number of developers interested in computer programming languages. With this increase come curious minds seeking answers to the questions or doubts they have while learning new programming languages or understanding various aspects of a language. In such cases, new developers usually look for a platform dedicated to helping developers, and new platforms regularly emerge to fill this need. After several similar platforms came to light, Stack Overflow managed to stand above them in terms of the quality of its questions and answers. When we first started working on this research, we pondered which algorithm or neural network would help us solve the problem.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 448–457, 2023. https://doi.org/10.1007/978-3-031-27440-4_43
After spending considerable time researching various techniques, we decided to employ Gated Recurrent Unit (GRU) networks, since it has been shown [1] that GRU networks work efficiently on problems relating to Natural Language Processing; the reason for choosing GRU over LSTM is that it requires fewer parameters and is more memory efficient. The rest of the paper is organized as follows: the technologies employed are described in Sect. 2, related work on Stack Overflow and programming language prediction is presented in Sect. 3, and the proposed methodology in Sect. 4. The experiments and results are presented in Sect. 5. Finally, conclusions and future scope are discussed in Sect. 6.
2 Technologies Employed

2.1 Deep Learning
Deep Learning [2] is a family of machine learning techniques that allows higher-level features to be extracted from raw information. By learning from models that use multiple layers, deep learning allows computational models to produce results that may surpass human or manual expectations. Deep learning has drastically improved performance in computer vision, natural language processing and various other fields.

2.2 Recurrent Neural Networks
Recurrent Neural Networks [3] are a type of artificial neural network with a memory of their own. An RNN combines the current input with the output computed for the previous input to determine the next output, using its internal memory to process the input sequence.

2.3 Long Short Term Memory
Unlike plain RNNs, which suffer from vanishing and exploding gradients, Long Short Term Memory [4] networks solve this problem by making it easier to remember previous data. LSTMs are preferred for time-series data and are trained with backpropagation.

2.4 Gated Recurrent Unit Networks
Stack Overflow currently relies on either manual tagging of questions or hard-coded scripts. We aim to reduce this manual work and shift the job of tagging to a deep learning model. Since recurrent neural networks alone do not remember long sequences and are often difficult to train due to vanishing and exploding gradients, we had to choose between Gated Recurrent Unit networks and Long Short-Term Memory networks. After much consideration and deliberation, we decided on GRU networks because they are more memory efficient, requiring fewer parameters than LSTMs.
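To make the parameter argument concrete, the short Keras sketch below (our own illustration, not code from the paper) builds a GRU layer and an LSTM layer of the same size on the same input and compares their weight counts; the 300 units match the GRU used later in Table 2, while the 20-step, 128-dimensional input is an arbitrary illustrative choice.

```python
import tensorflow as tf

# Illustrative sizes only: 20 timesteps and a 128-dimensional embedding (assumed),
# with 300 recurrent units as in the title-only model of Table 2.
seq = tf.keras.Input(shape=(20, 128))
gru_params = tf.keras.Model(seq, tf.keras.layers.GRU(300)(seq)).count_params()
lstm_params = tf.keras.Model(seq, tf.keras.layers.LSTM(300)(seq)).count_params()

# The GRU layer ends up with roughly three quarters of the LSTM's weights (3 gates vs. 4).
print(gru_params, lstm_params)
```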
2.5 Natural Language Processing
Natural Language Processing [5] is a branch of Artificial Intelligence that allows computational models to understand, decipher and interpret human language. It is primarily used to analyze huge amounts of natural language data and produce the desired output, making it possible to solve real-world problems such as text translation from one language to another, sentiment analysis, etc.
3 Related Work

3.1 SCC++: Predicting the Programming Language of Questions and Snippets of Stack Overflow
K. Alrashedy et al. [6] proposed an NLP- and ML-based XGBoost classifier for predicting the programming language of questions asked on Stack Overflow. The classifier achieved its highest accuracy of 91.1% when using features from the title coupled with the body and the programming code of the question. It reached 81.1% accuracy when only the title and body of the question were used, and 77.7% accuracy using only the code snippets of the question.

3.2 SOTagger - Towards Classifying Stack Overflow Posts Through Contextual Tagging
A. S. M. Venigalla et al. [7] considered six categories of questions for predicting the programming language tag. They experimented with various classification algorithms, such as Logistic Regression, Multinomial Naive Bayes, Support Vector Classifier and Random Forest Classifier, and found the best performing classifier to be the SVC, which classified the programming language with 78.5% accuracy.

3.3 Predicting the Programming Language: Extracting Knowledge from Stack Overflow Posts
J. F. Baquero et al. [8] applied two methods for predicting among 18 different programming languages. Their proposed Support Vector Classifier achieved an accuracy of 60.88% when applied to text-based features and 44.61% when applied to source code features. Their key finding is that models can misclassify some programming languages, such as Swift, C and C++, and MATLAB, because posts related to them share common problems.

3.4 Predicting Tags for Stack Overflow Posts
C. Stanley et al. [9] developed an ACT-R inspired Bayesian probabilistic model that predicts the programming language tag of a post on Stack Overflow. The model predicts one programming language per post with 65% accuracy. They used ACT-R because of the functionality it provides: it is capable of storing knowledge of posts at a large scale and is used for the memory retrieval process when predicting the programming language of a post.
3.5 Predicting Tags for Stack Overflow Posts
V. Lakshmi et al. [10] proposed an LSTM algorithm for predicting the tags related to a user post. Their main objective is to clean the data in the pre-processing stage so that the algorithm can accurately predict the tags corresponding to the post. The classification techniques implemented for comparison are SVM, Naive Bayes and Logistic Regression. The architectural flow of the proposed system is: cleaning and pre-processing, stemming and lemmatization, word embedding with the Porter stemmer, hamming loss, LSTM for training, and finally predicting tags for the input. They also identified that most queries in the dataset have 2–3 tags, so their model has the scope to suitably predict 1 to 5 tags per query.

3.6 Prediction of Relatedness in Stack Overflow: Deep Learning vs. SVM: A Reproducibility Study
B. Xu et al. [11] evaluated DNN- and SVM-based approaches on a broad dataset and also compared them with SimBow (a lightweight SVM method). Their results show that the SVM approach performs better than the DNN on larger data and that the DNN's runtime is better than the SVM's, but SimBow outperforms both. Although the SVM model (i.e., soft SVM) can outperform the others, it is still inadequate for a broad dataset, because the relations between knowledge units can be stochastic and there is no feature capturing such relations.
4 Proposed Methodology

4.1 Dataset
For our research, we used an open dataset [12] consisting of over 2 million posts from Stack Overflow. The dataset consists of 8 columns, namely ID, OwnerUserID, CreationDate, ClosedDate, Score, Title, Body and Tag. Table 1 shows the feature descriptions and datatypes. Figure 1 represents the detailed workflow of the research, starting from data cleaning and preprocessing, followed by data partitioning and the GRU framework for prediction.

4.2 Data Cleaning and Preprocessing
We started by cleaning our dataset, which included dropping null rows as well as irrelevant features such as OwnerUserID, CreationDate, ClosedDate and Score. Once that was done, we proceeded with cleaning HTML tags out of the Title and Body. To minimize the required computation, we selected the top 10 most common languages for our research. Once cleaning and feature selection were completed, we applied various preprocessing techniques commonly used in Natural Language Processing, such as tokenization and pad sequencing.
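A minimal sketch of the cleaning and feature-selection step described in Sect. 4.2 is given below. It is our own illustration rather than the authors' code: the file name is hypothetical (the Kaggle dump actually ships questions and tags as separate CSV files), and a simple regular expression stands in for whatever HTML-stripping routine was used.

```python
import re
import pandas as pd

# Hypothetical single-file export with the 8 columns listed in Table 1.
df = pd.read_csv("stackoverflow_posts.csv")

# Drop irrelevant features and rows with missing values.
df = df.drop(columns=["OwnerUserID", "CreationDate", "ClosedDate", "Score"]).dropna()

# Strip HTML tags from Title and Body.
strip_html = lambda text: re.sub(r"<[^>]+>", " ", str(text))
df["Title"] = df["Title"].map(strip_html)
df["Body"] = df["Body"].map(strip_html)

# Keep only the 10 most frequent language tags.
top_tags = df["Tag"].value_counts().nlargest(10).index
df = df[df["Tag"].isin(top_tags)].reset_index(drop=True)
```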
Table 1. Features along with their descriptions and data types.

Feature      | Description                                                                                | Datatype
ID           | Identification number of a thread posted on Stack Overflow                                | Integer
OwnerUserID  | Unique identification number of the user who posted a particular thread on Stack Overflow | Integer
CreationDate | Date when the thread was created by the user                                               | Datetime
ClosedDate   | Date when the thread was closed by the user or moderator                                   | Datetime
Score        | The sum of (upvotes – downvotes) of all the answers posted on the thread                   | Integer
Title        | The actual title of the thread posted                                                      | Text
Body         | The content of the thread                                                                  | Text
Tag          | Tags of each thread posted                                                                 | Text
Fig. 1. Visual representation of the workflow.
Tokenization: This is the process of splitting a sentence into individual words called tokens. Following tokenization, we converted all tokens into sequences, assigning each unique token a unique integer identifier.
Pad Sequencing: Since each title and body contains a different number of tokens, the variable-length token sequences must be converted to a common length, because deep learning models work efficiently with a consistent number of input features. Lemmatization is also applied to the words before they are converted. For our research, we chose a maximum length of 20 tokens for titles and a maximum length of 600 tokens for the body content.
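The two preprocessing steps above are typically expressed with the Keras text utilities as in the sketch below; the maximum lengths of 20 and 600 come from the paper, while the padding and truncation direction, and the cleaned dataframe df from the earlier sketch, are assumptions.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_TITLE_LEN, MAX_BODY_LEN = 20, 600  # sequence lengths chosen in the paper

def texts_to_padded(texts, maxlen):
    # Fit a tokenizer, map each word to a unique integer, then pad/truncate to a fixed length.
    tok = Tokenizer()
    tok.fit_on_texts(texts)
    seqs = tok.texts_to_sequences(texts)
    return tok, pad_sequences(seqs, maxlen=maxlen, padding="post", truncating="post")

title_tok, X_title = texts_to_padded(df["Title"], MAX_TITLE_LEN)
body_tok, X_body = texts_to_padded(df["Body"], MAX_BODY_LEN)
```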
5 Experiments and Results

Since recurrent neural networks alone do not remember long sequences and are often difficult to train due to vanishing and exploding gradient [13, 14] problems, we selected GRU for this research. The experiments were carried out on both local and cloud-based machines; for the cloud-based setup we used the Microsoft Azure platform with a STANDARD_NC6 virtual machine.

5.1 Model Summary
For our research, we built a total of three models, each trained on different input features. The summaries of the models are presented below.

5.1.1 Title Only
The first model trains entirely on the titles of Stack Overflow posts. It takes an input of 20 sequence elements per title, followed by an embedding layer. A GRU layer is applied after the embedding, coupled with dense layers with ReLU [15] activation functions, dropout and batch normalization [16]. The output layer is a dense layer with a softmax [17] activation function. The layers and parameter counts are shown in Table 2.

Table 2. Title only model.

Layer               | Output Shape     | Parameters
Embedding           | (None, 20, 2000) | 156,686,600
GRU                 | (None, 300)      | 2,071,800
Dense               | (None, 400)      | 120,400
Dropout             | (None, 400)      | 0
Batch Normalization | (None, 400)      | 1,600
Dense               | (None, 150)      | 60,150
Dense               | (None, 10)       | 1,510
5.1.2 Body Only
The second model trains entirely on the bodies of the posts and requires an input of 600 sequence elements per post body. This model follows the same layer pattern as the first model. The parameter counts are tabulated in Table 3.
Table 3. Body only model.

Layer               | Output Shape     | Parameters
Embedding           | (None, 600, 100) | 272,628,320
GRU                 | (None, 300)      | 223,200
Dense               | (None, 400)      | 80,400
Dropout             | (None, 400)      | 0
Batch Normalization | (None, 400)      | 1,600
Dense               | (None, 150)      | 60,150
Dense               | (None, 10)       | 1,510
5.1.3 Title and Body
The third model is implemented using the Functional API [18] of TensorFlow, which allows more flexibility than the Sequential API. For this model, we built two separate sub-models, one for the title and one for the body, and then concatenated [19] them to act as a single model that requires both the title and the body to produce the output. The layer structure and parameter counts are given in Table 4.

Table 4. Title & body model.

Layer               | Output Shape     | Parameters
Embedding Title     | (None, 20, 2000) | 156,716,600
Embedding Body      | (None, 600, 170) | 273,259,870
GRU Title           | (None, 300)      | 2,071,800
GRU Body            | (None, 200)      | 223,200
Concatenate         | (None, 500)      | 0
Dense               | (None, 400)      | 200,400
Dropout             | (None, 400)      | 0
Batch Normalization | (None, 400)      | 1,600
Dense               | (None, 150)      | 60,150
Dense               | (None, 10)       | 1,510
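The layer widths in Table 4 allow the third model to be reconstructed approximately; the Functional API sketch below is our reconstruction, not the authors' code, and the vocabulary sizes, dropout rate, optimizer and loss are assumptions not reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_title_body_model(title_vocab_size, body_vocab_size, n_classes=10):
    # Two inputs: a 20-token title sequence and a 600-token body sequence.
    title_in = tf.keras.Input(shape=(20,), name="title")
    body_in = tf.keras.Input(shape=(600,), name="body")

    t = layers.GRU(300)(layers.Embedding(title_vocab_size, 2000)(title_in))
    b = layers.GRU(200)(layers.Embedding(body_vocab_size, 170)(body_in))

    x = layers.Concatenate()([t, b])                # (None, 500), as in Table 4
    x = layers.Dense(400, activation="relu")(x)
    x = layers.Dropout(0.5)(x)                      # dropout rate not reported; 0.5 assumed
    x = layers.BatchNormalization()(x)
    x = layers.Dense(150, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model([title_in, body_in], out)
    model.compile(optimizer="adam",                 # optimizer and loss are assumptions
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```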
The different models were built to determine which one gives the most accurate output. All of the above models were trained on local as well as cloud-based virtual machines using the aforementioned dataset. To demonstrate the flow from dataset to output, a flowchart is presented in Fig. 2.
5.2 Evaluation Parameters
To determine the maximum possible accuracy, we used K-Fold cross-validation [20], which allows the model to be trained on different subsets of the dataset. For our research, we divided the dataset into 5 equal subsets; in each training iteration, 4 subsets were used for training while the remaining subset was used to measure the performance of the model. To evaluate the models, we report the standard accuracy and the time taken to build each model, presented in Table 6 and Table 5 respectively.

Table 5. Time taken to train each model for each GPU.

GPU                 | VRAM  | Trained On   | Time Taken
GeForce RTX 2080 Ti | 11 GB | Title Only   | 35 mins
Tesla K80           | 24 GB | Title Only   | 1 hr. & 30 mins
GeForce RTX 2080 Ti | 11 GB | Body Only    | 1 hr. & 25 mins
Tesla K80           | 24 GB | Body Only    | 3 hrs. & 40 mins
GeForce RTX 2080 Ti | 11 GB | Title & Body | Memory Overflow
Tesla K80           | 24 GB | Title & Body | 5 hrs. & 10 mins
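A minimal sketch of the 5-fold split described above is shown below; the arrays are random stand-ins for the padded sequences and encoded tags, and in the actual setup a fresh network (for example, the title-and-body model sketched earlier) would be fitted on the four training folds of each iteration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Random stand-ins for the padded title/body sequences and encoded tag labels.
X_title = np.random.randint(1, 1000, size=(5000, 20))
X_body = np.random.randint(1, 1000, size=(5000, 600))
y = np.random.randint(0, 10, size=(5000,))

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffle and seed are assumptions
for fold, (train_idx, val_idx) in enumerate(kf.split(X_title), start=1):
    # Train on 4 subsets, evaluate on the held-out subset (model fitting omitted here).
    print(f"fold {fold}: {len(train_idx)} training rows, {len(val_idx)} validation rows")
```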
In terms of predictive performance, the model trained on both title and body achieved an accuracy of 97.38% after 5 epochs, as tabulated in Table 6.

Table 6. Accuracy for each model.

Model        | Loss   | Accuracy (%)
Title & Body | 0.2996 | 97.38
Body Only    | 0.3437 | 84.17
Title Only   | 0.6436 | 79.21
Fig. 2. Visual representation of the results, showing the accuracy of the different models.
6 Conclusion and Future Scope

This research deals with a seemingly trivial yet significant problem: predicting the programming language of posts on a general programming website-cum-forum. The models proposed here show that deep learning models tend to work well with more data. Moreover, it is plausible to predict tags for programming languages that combine multiple languages at the same time, for instance Cython. Although predicting programming languages from textual information has received its fair share of attention in recent years, tremendous possibilities remain to be explored: the existing tools for tagging programming languages are somewhat intuitive, but they often depend heavily on users to either manually type the tags associated with the code or tag the associated language themselves. However, recent advances in Natural Language Processing for producing valuable insights from text-based data have made it clear that unsupervised machine learning techniques can be used to produce more astute and reliable results. In future work, we would like to extend our model to generalize to broader areas beyond Stack Overflow, such as code posted on competitive programming websites, programming blog posts, official library documentation, bug repositories, etc. Moreover, we would also like to capture the sentiment of the responses on a post, to filter extremely arrogant or vile comments that could hurt the self-esteem of the user who posted the question.
References
1. Dey, R., Salem, F.M.: Gate-variants of gated recurrent unit (GRU) neural networks. arXiv preprint (2017)
2. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
3. Lipton, Z.C., Berkowitz, J., Elkan, C.: A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015)
4. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In: Ninth International Conference on Artificial Neural Networks (ICANN 1999), Conf. Publ. No. 470, Edinburgh, UK, vol. 2, pp. 850–855 (1999). https://doi.org/10.1049/cp:19991218
5. Nadkarni, P.M., Ohno-Machado, L., Chapman, W.W.: Natural language processing: an introduction. J. Am. Med. Inform. Assoc. 18(5), 544–551 (2011)
6. Alrashedy, K., Dharmaretnam, D., German, D.M., Srinivasan, V., Gulliver, T.A.: SCC++: predicting the programming language of questions and snippets of Stack Overflow. J. Syst. Softw. 162, 110505 (2020). https://doi.org/10.1016/j.jss.2019.110505
7. Venigalla, A.S.M., Lakkundi, C.S., Chimalakonda, S.: SOTagger - towards classifying stack overflow posts through contextual tagging (2019)
8. Baquero, J.F., Camargo, J.E., Restrepo-Calle, F., Aponte, J.H., González, F.A.: Predicting the programming language: extracting knowledge from stack overflow posts. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 199–210. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_15
9. Stanley, C.: Predicting tags for StackOverflow posts (2013)
10. Lakshmi, V., Prabha, V., Udayakumar, P., Swetha, N., Chiranji, L., Chowdhary, C.: Stack overflow tag prediction using deep learning algorithm. Int. J. Sci. Res. 2455–6211 (2021)
11. Xu, B., Shirani, R., Lo, D., Alipour, M.: Prediction of relatedness in stack overflow: deep learning vs. SVM: a reproducibility study, pp. 1–10 (2018). https://doi.org/10.1145/3239235.3240503
12. https://www.kaggle.com/stackoverflow/stacksample
13. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: 30th International Conference on Machine Learning, ICML 2013 (2013)
14. Phi, M.: Illustrated guide to LSTM's and GRU's: a step by step explanation (2018)
15. Agarap, A.F.: Deep learning using rectified linear units (ReLU) (2018)
16. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
17. Memisevic, R., Zach, C., Hinton, G., Pollefeys, M.: Gated softmax classification. Neural Inf. Process. Syst. 23, 1603–1611 (2010)
18. Goldsborough, P.: A tour of TensorFlow (2016)
19. Rosebrock, A.: Keras: multiple outputs and multiple losses (2018)
20. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection (2001)
A Rapid Review on the Application of Unmanned Aerial Vehicles in Construction Safety

D. Yuvaraj and K. S. Anandh(B)

Department of Civil Engineering, SRM Institute of Science and Technology, Kattankulathur 603203, Tamil Nadu, India
[email protected]
Abstract. In the construction sector, workers are engaged in various activities that expose them to serious risks, such as accidents causing injuries or even loss of life. Managing safety at the job site is a predominant task of the safety managers employed on projects. Recent advancements in technology help safety managers monitor job-site safety practices and implement corrective action. The unmanned aerial system (UAS) is one such recent technology, in which unmanned aerial vehicles (UAVs) are used to monitor the construction site for safety issues. It helps safety professionals inspect even hard-to-reach places periodically as well as quickly. The purpose of this study was to perform a rapid review of the literature on the application of UAVs in construction safety. According to the study, very little research has been conducted worldwide on the employment of UAVs in construction safety. The review helps researchers understand the present status of research on the topic. The barriers and hazards of utilizing UAVs on construction sites are also identified from the literature survey, and the research gaps identified from the collected literature help researchers progress further in the field.

Keywords: Unmanned aerial system · Unmanned aerial vehicle · Construction safety · UAS · UAV
1 Introduction

The construction industry is widely regarded as one of the most dangerous in the world, with frequent fatalities [1]. It is a sector in which most of the work is carried out by human resources [2]. Despite advancements in construction safety equipment, technology and training, the construction sector has a high rate of fatal and non-fatal injuries and accidents among its workers [3]. Every year around 108 thousand workers die on construction sites, accounting for nearly 30% of all occupational fatalities. Moreover, the chance of fatal injuries due to accidents among construction workers is 3 to 4 times that of other workers [4]. In India, the construction sector became the second largest employer and recipient of foreign direct investment (FDI) in 2020–2021 [5], and nearly 24.20% of the 48,000 occupational fatalities are from the construction sector [6].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 458–466, 2023. https://doi.org/10.1007/978-3-031-27440-4_44
Every construction site should have safety personnel responsible for implementing safety policies and regulations. They should conduct periodic inspections of the site to detect hazardous locations and ensure that all safety precautions are being followed in those areas. The safety engineer should carefully monitor the hazardous locations for any violation of safety measures. Accidents occur when safety personnel fail to inspect and monitor the site properly; hence, to prevent accidents at the construction site, proper implementation of site safety inspection is indispensable [7]. It is not possible for safety engineers to regularly inspect inaccessible, difficult-to-reach, and dangerous site locations where accidents occur frequently. In such situations, Unmanned Aerial Vehicles (UAVs) containing various sensors can be employed by safety engineers to access remote areas of the site easily, instantaneously, and at low cost [8]. They also help safety engineers establish live contact, through communication devices, with workers working in remote areas [9]. Unmanned Aerial Vehicles (UAVs), also called drones, are aircraft systems that pilots control from the ground without boarding the aircraft [10]. The person on the ground controlling the vehicle, together with the system that connects the vehicle and the person, is commonly referred to as an Unmanned Aerial System (UAS). Initially, UAVs were employed only for military purposes; now they find application in civilian and commercial areas as well. With improved battery life, flight control, and low-cost, independent navigation features, the application of UAVs has increased significantly in recent years [10]. UAS technology has gained popularity in the construction industry due to its versatility and its advantages in terms of usability, cost-effectiveness, and safety [11]. The use of drones in the construction industry increased by 239% in 2018 [12]. UAVs carry cameras, sensors and communication devices that are used to transfer real-time data. The industry initially used UAVs to perform real-time reconnaissance surveys of project sites and to provide high-definition (HD) videos and images for documentation [13]. Now, UAVs are employed together with other digital technologies, such as building information modeling (BIM) and extended reality (ER), to aid automation and digitization towards achieving better project performance [14]. Apart from this, UAVs are used as a main tool for ensuring safety among employees in high-risk construction projects. The current study aims to conduct a rapid review of the published research articles on the application of UAVs in construction safety. The review is conducted to identify the current trends on the topic and to spot the research gaps. Articles available from 2017 to August 2022 are used for the study.
2 Research Method

A rapid review is a method in which components of the systematic literature review procedure are limited or excluded in order to provide information on a topic promptly [15]. It balances the time constraint with considerations of bias. A rapid review can be conducted within a time frame of 1 to 6 months [16]. A comprehensive search was conducted in the online databases Scopus and Web of Science. These online databases were selected as they are complete, well-organized
and robust with respect to scientific research when compared with Google Scholar, another online database [17]. In Google Scholar, the data are unsystematic and not updated; it covers a wide range but is not comprehensive [18]. Thus, only the online databases Scopus and Web of Science were considered for the study. Articles related to the application of UAVs in construction safety were retrieved from these databases by searching keywords within the 'Article title, Abstract, Keywords' field. The keywords relevant to the study, i.e. the application of unmanned aerial vehicles, were chosen and a search string was framed based on them. The search string used for the study was ("Unmanned Aerial System" or "Unmanned Aerial Vehicle" or "Unmanned Aerial Technology" or "Drone") and ("Construction Safety" or "Safety Monitoring"). The Scopus database retrieved 88 documents for this string; after excluding conference papers, review articles and articles in languages other than English, 34 documents remained. Similarly, Web of Science retrieved 33 documents, and after the same filtering, 20 articles remained. The documents recovered from Scopus and Web of Science were checked for duplication, and 18 duplicated articles were found. After removing the duplicates, the total number of articles from Scopus and Web of Science together was 36. The titles and abstracts of these articles were then studied to remove articles irrelevant to the study. The final count of articles after removing the irrelevant documents was 17.
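As an illustration of this screening workflow (not part of the original study), the pandas sketch below merges hypothetical Scopus and Web of Science exports, applies the document-type and language filters, and removes records indexed by both databases; the file and column names are assumptions based on common export formats.

```python
import pandas as pd

# Hypothetical CSV exports; column names follow common database export formats.
scopus = pd.read_csv("scopus_export.csv")
wos = pd.read_csv("wos_export.csv")

records = pd.concat([scopus, wos], ignore_index=True)

# Exclude conference papers, review articles and non-English articles.
records = records[(records["Document Type"] == "Article") & (records["Language"] == "English")]

# Remove records indexed by both databases, matching on a normalised title.
records["title_key"] = records["Title"].str.lower().str.strip()
records = records.drop_duplicates(subset="title_key")

print(len(records))  # 36 records remained before title/abstract screening in this study
```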
3 Results and Discussion

3.1 Overview of the Related Documents
Of the 36 articles reviewed for the period 2017 – August 2022, the 17 selected articles were found to be the most appropriate. Figure 1 shows the year-wise distribution of articles on the application of UAVs in construction safety. It is evident from the figure that publication on the topic started in 2017, with 1 article per year until 2018. The number of articles published showed a drastic rise in 2021, with 8 articles compared to 3 in 2020, and declined again in 2022 with 2 articles. This shows that there is a lag in the research related to UAV applications in construction safety. From the literature survey, it is observed that the case study was the research method most often used. Researchers have developed models for integrating UAVs with construction safety and validated these models by conducting case studies. Hassandokht Mashhadi et al. conducted a case study to explore the feasibility of utilizing UAVs for regulating air pollution and heat stress at construction sites [19]. Martinez et al. designed and developed the iSafeUAS technology for safety monitoring and evaluated its performance on the jobsite [20]. Kim et al. performed a case study to validate a framework for an automated safety monitoring system that combines ITCP, UAS and deep learning technology [21]. Melo and Costa used a case study strategy to analyze safety planning and control procedures using the visual data collected with the help of UAVs
Fig. 1. Number of articles published on UAV applications in construction safety.
and to develop a theoretical framework for integrating resilience engineering and UAS technology [22]. Xu and Turkan used a mixed-method approach to develop a model for mitigating the safety risks associated with UAV application [11], while Umar applied a mixed-method approach to identify the safety-related applications of drones [23]. Namian et al., Wu et al., Alizadehsalehi et al., and Gheisari and Esmaeili followed a quantitative approach to identify the effectiveness of UAV utilization in construction safety and the risk factors associated with the application of UAVs in construction safety [24–27]. Manzoor et al. conducted a qualitative study on the integration of BIM with drone technology [28], while Kim et al. conducted interviews with construction personnel to develop a conceptual safety management system integrated with UAS [29].

3.2 Advantages of UAV in Construction Safety
In many countries, the application of UAVs in construction safety is at an early stage. Most organizations employ UAVs to photograph their sites for marketing purposes, surveying and quality inspection [23]. The organizations that do use UAVs in construction safety employ them for safety monitoring and control processes [27]. UAVs help to collect visual data from the site with the help of high-definition cameras attached to the device; these data help safety engineers visualize the unsafe conditions prevailing on the work site [30]. The majority of construction-related accidents occur in hazardous zones when workers come in contact with the construction equipment or vehicles available at the
site. UAVs are used to monitor workers engaged in tasks where safety plays an important role. Some of the safety-related tasks for which UAVs are employed include monitoring boom vehicles and cranes close to overhead power lines, monitoring the movement of boom vehicles, monitoring unprotected edges and openings, and investigating fall protection systems [23–27]. Further, researchers have developed models by integrating UAS technology with other concepts such as BIM [9, 28, 31], deep learning [21], resilience engineering [22] and 4D-BIM [26] for safety monitoring at the construction site, and these models were validated by performing case studies. These models helped safety managers integrate the available safety regulations and standards with UAS technology and monitor activities for any violation of the standards [21, 26]. The case studies also assist construction personnel in improving the technical features of UAVs, such as camera mobility, autopilot, sense-and-evade facilities and real-time video transmission [27]. The model developed by Alizadehsalehi et al., integrating 4D-BIM (3D-BIM + work schedule), UAS technology and safety standards, identifies the potential risks prevailing at the site and helps to eliminate them using the safety rules [26]. The safety personnel involved in that case study considered the model advantageous, as it provides early safety warnings, an automated safety control process, real-time and quick reporting of unsafe conditions, and reduced manpower for capturing safety issues at the site. Similarly, BIM models developed from the visual data collected by UAVs help safety managers visualize the most hazardous zones at the site and can reduce the time spent on walkthrough inspection visits. This aids the safety personnel in increasing the number of visits [31] and also improves the decision-making process [28]. The model developed by Melo and Costa, integrating resilience engineering with UAS technology, improves the safety planning and control process in the organization through a more reliable workflow and reduced safety conflicts [22]. It also helps in reducing the distance between work-as-done and work-as-planned and develops awareness among workers for improving workplace conditions. Construction workers on highway road projects are exposed to the hazard of being struck by vehicles, either construction vehicles or passenger vehicles; these situations are particularly dangerous and can lead to worker fatalities. Kim et al. developed a model by integrating game engine-based internal traffic control plans (ITCP), a deep learning method for the detection of objects such as cranes, vehicles and workers, and UAS technology for capturing images on the job site for safety monitoring [21]. This model helped safety managers identify workers who violated the safety rules and separate them from the hazardous locations where heavy vehicles were operating. Foundation failure is one of the hazards that can develop during the excavation of a foundation pit and may sometimes lead to worker fatalities. Wu et al. developed a model that uses UAVs for fast monitoring and analysis of the construction foundation pit based on the local deformation method [25]. This UAV-based evaluation of the foundation pit reduced the evaluation time from hours to minutes and avoided the use of additional equipment and complicated processes.
Construction workers are exposed to extreme weather conditions for several hours, leading to heat stress that affects their health and causes heat-related illness. Air pollution is another major problem in the construction industry, where workers are exposed to dust and other pollutants during construction or demolition activities. A nebulizer-retrofitted UAV can be used for aerial spraying of water to reduce air pollution and the heat stress developed at the construction site [19]. Apart from monitoring safety at the workplace, UAVs can also be employed to periodically inspect building structures, which helps in the early detection of damage and deterioration of structural members. In a bridge, the scour that develops at the substructure due to the flow of streams or floods can cause early failure. Ozcan and Ozcan developed a finite element model (3D-FEM) of a bridge in which the depth of the scour was accurately measured with the help of ultra-high-resolution images taken by a UAV [32].

3.3 Barriers and Hazards in the Utilization of UAVs
Even though the application of UAVs is advantageous in the field of construction safety, there are certain limitations to their employment in the construction industry. The inclusion of UAVs can introduce new hazards to construction job sites and adversely affect the health and safety of workers. The application of UAVs on the construction job site may lead to hazards such as: the UAV colliding with a worker, which can even lead to a fatality; the UAV or some of its parts falling on a worker and causing injuries; the collision of the device with buildings, causing damage to the structures, or with other equipment at the site [20, 23, 24, 27, 30, 31, 33]; the sight of the UAV and the noise it creates distracting workers and leading to accidents [20, 23, 24, 29, 30, 33]; and workers feeling uncomfortable about being watched by the device, leading to stress [33]. Also, on some congested work sites where several workers work alongside heavy equipment and objects, the deployment of UAVs will lead to accidents [24]. These hazards may affect the lives of the workers, leading to both loss of life and financial loss. Besides the hazards, the studies reveal some practical limitations in the implementation of UAVs in the construction workplace. The barriers include the weather conditions [23, 27, 29–31, 34], especially the wind conditions prevailing at the site; the low battery life of the device, restricting its usage to small and medium projects [27, 31]; local government policies on flying drones [20, 23, 27, 30, 31, 34]; the lack of technical persons to operate the drones [22, 23, 27, 34]; and interference of the sensors present in the device with metallic objects at the site [20, 29, 31]. Martinez et al. designed and developed the iSafeUAS technology for safety inspection purposes, customized with a parachute recovery system that reduces the impact energy and the probability of fatality when a UAV falls on a worker [20]. This technology can also significantly reduce the risk of UAV collision with other objects, signal interference and worker distraction.
4 Future Directions

The literature survey shows that little research has been conducted on the application of UAVs in construction safety. It is also observed that there is no evidence of research conducted in India concerning safety monitoring in construction using UAVs. The research reveals that applying UAS in construction safety is one of the technological advancements in monitoring worker safety on a construction job site. The case studies performed on UAS application in construction safety helped validate the models proposed in the literature; however, most of them were conducted on small or medium-sized projects, and more work is needed on implementing safety models using UAVs in large projects. Further, the barriers faced while adopting UAVs in the field were described in every study. Extensive research must be conducted on the various difficulties that arise while deploying UAVs, and mitigating strategies to overcome them must be developed. A safety model developed in one region may not apply to other regions, so research has to be performed to develop a standard model adaptable to any region.
5 Summary and Conclusion

In the construction sector, UAS technology is used for monitoring work progress, surveying, developing 3D models, transportation, safety monitoring, inspection, and damage assessment [13, 35]. Technology adoption may vary across countries and regions based on various vital factors, starting with skilled people, demand in the market, and adaptive human resources measures [36, 37]. To prevent accidents and protect people from hazardous situations, safety engineers must perform periodic safety inspections. On large sites, it is difficult for safety engineers to conduct periodic inspections, as they consume more time, and it is difficult to physically inspect hard-to-reach places. UAVs are deployed on sites to collect visual data using the cameras and sensors attached to them. The collected data helps safety personnel make quick decisions by providing real-time video of the current situation at the job site. The devices carried by the drones can also assist safety managers in communicating with, and guiding, workers employed in risky environments. UAVs are also used to monitor compliance with safety standards, policies and regulations at the site; any violation of the standards can be easily identified. The studies also outlined some barriers and hazards of using UAVs on construction sites, such as weather conditions, battery life, magnetic interference with metallic objects, government regulations, and the collision of the vehicles with workers and buildings. In addition, UAVs can only be used to collect outdoor data; collecting indoor data is challenging due to the risk of collision with objects. The study thus reveals the current stage of research on the application of UAVs in construction safety and helps researchers proceed further on the topic based on the gaps identified. More models can be developed by integrating UAVs with other digital technologies for their effective use in construction safety. The study was limited to peer-reviewed journal articles on safety available up to August 2022; it can be extended into a systematic literature review by including conference proceedings and the latest articles published after that period.
References
1. Im, H.-J., Kwon, Y.-J., Kim, S.-G., Kim, Y.-K., Ju, Y.-S., Lee, H.-P.: The characteristics of fatal occupational injuries in Korea's construction industry, 1997–2004. Safety Sci. 47(8) (2009)
2. Anandh, K.S., Gunasekaran, K., Mannan, M.A.: Investigation on the factors affecting lifestyle of professionals in the construction industries (Kerala and Tamil Nadu). Int. J. Integrat. Eng. 12(9), 246–252 (2020)
3. Meng, X., Chan, A.H.S.: Current states and future trends in safety research of construction personnel: a quantitative analysis based on social network approach. Int. J. Environ. Res. Public Health 18, 883 (2021)
4. International Labour Organization: Construction: a hazardous work (2015). https://www.ilo.org/safework/areasofwork/hazardous-work/WCMS_356576/lang-en/index.htm. Accessed 20 Sept 2022
5. Rani, H.A., Farouk, A.M., Anandh, K.S., Almutairi, S., Rahman, R.A.: Impact of COVID-19 on construction projects: the case of India. Buildings 12, 762 (2022)
6. The Times of India: 48000 die due to occupational accidents: study (2017). https://timesofindia.indiatimes.com/business/india-business/48000-die-due-to-occupational-accidents-yearly-study/articleshow/61725283.cms. Accessed 4 Nov 2022
7. Niskanen, T.: The effects of the enforcement legislation in the Finnish occupational safety and health inspectorate. Saf. Sci. 55, 135–148 (2013)
8. Gheisari, M., Irizarry, J.: Investigating human and technological requirements for successful implementation of a BIM-based mobile augmented reality environment in facility management practices. Facilities 34(1/2), 69–84 (2016)
9. Alizadehsalehi, S., Asnafi, M., Yitmen, I., Celik, T.: UAS-BIM based real-time hazard identification and safety monitoring of construction projects. In: 9th Nordic Conference on Construction Economics and Organization, Goteborg, Sweden (2017)
10. Liu, P., et al.: A review of rotorcraft unmanned aerial vehicle (UAV) developments and applications in civil engineering. Smart Struct. Syst. 13(6) (2014)
11. Xu, Y., Turkan, Y.: The development of a safety assessment model for using unmanned aerial systems (UAS) in construction. Saf. Sci. 155, 105893 (2022)
12. DroneDeploy: The rise of drones in construction (2018). https://dronedeploy.com/blog/rise-drones-construction. Accessed 20 Sept 2022
13. Tatum, M.C., Liu, J.: Unmanned aircraft system applications in construction. Procedia Eng. 196, 167–175 (2017)
14. Rachmawati, T.S.N., Kim, S.: Unmanned aerial vehicles (UAV) integration with digital technologies toward construction 4.0: a systematic literature review. Sustainability 14(9), 5708 (2022)
15. Tricco, A.C., et al.: A scoping review of rapid review methods. BMC Med. 13(1), 224 (2015)
16. Virginia Commonwealth University: Rapid review protocol. https://guides.library.vcu.edu/rapidreview. Accessed 24 Sept 2022
17. De Castro e Silva Neto, D., Cruz, C.O., Rodrigues, F., Silva, P.: Bibliometric analysis of PPP and PFI literature: overview of 25 years of research. J. Construct. Eng. Manage. 142(10) (2016)
18. Falagas, M.E., Pitsouni, E.I., Malietzis, G.A., Pappas, G.: Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 22(2), 338–342 (2008)
19. Mashhadi, A.H., Handy, R., Farhadmanesh, M., Rashidi, A., Honda, T., Sleeth, D.K., Henry, T.: Feasibility study of using nebulizer-retrofitted UAVs at construction projects: the case study of residential jobsites in Utah. J. Construct. Eng. Manage. 148(10) (2022). https://doi.org/10.1061/(ASCE)CO.1943-7862.0002368
20. Martinez, J.G., Albeaino, G., Gheisari, M., Issa, R.R.A., Alarcón, L.F.: iSafeUAS: an unmanned aerial system for construction safety inspection. Autom. Constr. 125, 103595 (2021)
21. Kim, K., Kim, S., Shchur, D.: A UAS-based work zone safety monitoring system by integrating internal traffic control plan (ITCP) and automated object detection in game engine environment. Autom. Constr. 128, 103736 (2021)
22. Rodrigues Santos de Melo, R., Bastos Costa, D.: Integrating resilience engineering and UAS technology into construction safety planning and control. Eng. Construct. Architect. Manage. 26(11), 2705–2722 (2019)
23. Umar, T.: Applications of drones for safety inspection in the Gulf Cooperation Council construction. Eng. Constr. Archit. Manag. 28(9), 2337–2360 (2021)
24. Namian, M., Khalid, M., Wang, G., Turkan, Y.: Revealing safety risks of unmanned aerial vehicles in construction. Transport. Res. Rec. J. Transport. Res. Board 2675(11), 334–347 (2021)
25. Wu, J., et al.: Rapid safety monitoring and analysis of foundation pit construction using unmanned aerial vehicle images. Autom. Constr. 128, 103706 (2021)
26. Alizadehsalehi, S., Yitmen, I., Celik, T., Arditi, D.: The effectiveness of an integrated BIM/UAV model in managing safety on construction sites. Int. J. Occupat. Saf. Ergon. 26(4), 829–844 (2020)
27. Gheisari, M., Esmaeili, B.: Applications and requirements of unmanned aerial systems (UASs) for construction safety. Saf. Sci. 118, 230–240 (2019)
28. Manzoor, B., Othman, I., Pomares, J.C., Chong, H.-Y.: A research framework of mit. Appl. Sci. 11(18), 8359 (2021)
29. Kim, S., Irizarry, J., Costa, D.B.: Field test-based UAS operational procedures and considerations for construction safety management: a qualitative exploratory study. Int. J. Civil Eng. 18(8), 919–933 (2020). https://doi.org/10.1007/s40999-020-00512-9
30. de Melo, R.R.S., Costa, D.B., Álvares, J.S., Irizarry, J.: Applicability of unmanned aerial system (UAS) for safety inspection on construction sites. Saf. Sci. 98, 174–185 (2017)
31. Martinez, J.G., Gheisari, M., Alarcón, L.F.: UAV integration in current construction safety planning and monitoring processes: case study of a high-rise building construction project in Chile. J. Manage. Eng. 36(3) (2020)
32. Özcan, O., Özcan, O.: Multi-hazard assessment of RC bridges using unmanned aerial vehicle-based measurements. Baltic J. Road Bridge Eng. 13(3), 192–208 (2018)
33. Jeelani, I., Gheisari, M.: Safety challenges of UAV integration in construction: conceptual analysis and future research roadmap. Saf. Sci. 144, 105473 (2021)
34. Izadi Moud, H., Flood, I., Zhang, X., Abbasnejad, B., Rahgozar, P., McIntyre, M.: Quantitative assessment of proximity risks associated with unmanned aerial vehicles in construction. J. Manage. Eng. 37(1) (2021)
35. Zhou, S., Gheisari, M.: Unmanned aerial system applications in construction: a systematic review. Constr. Innov. 18(4), 453–468 (2018)
36. Al-Mohammad, M.S., et al.: Factors affecting BIM implementation: evidence from countries with different income levels. Construction Innovation, Vol. ahead-of-print (2022)
37. Fakher, R.A., Anandh, K.S.: An exploratory study to utilize construction 4.0 technologies in enhancing communication to get quality human resources. In: Loon, L.Y., Subramaniyan, M., Gunasekaran, K. (eds.) Advances in Construction Management. LNCE, vol. 191. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-5839-6_41
Retail Demand Forecasting for 1 Million Products

Ioannis Pierros(B), Eleftherios Kouloumpris, Dimitrios Zaikis, and Ioannis Vlahavas

Intelligent Systems Lab, Aristotle University of Thessaloniki, 54154 Thessaloniki, Greece
{ipierros,elefthenk,dimitriz,vlahavas}@csd.auth.gr
Abstract. Retail chains without proper demand forecasting tools are susceptible to significant financial losses, either by missing out on sales due to stocked-out products or by having to throw out expired products because they were overstocked. Extensive research has been carried out, comparing different forecasting methodologies and models, examining the influence of different factors, and highlighting the significance of intermittent forecasting. However, these approaches often struggle to scale up and crumble when dealing with larger retail chains. In this paper, we analyze the real case of a big retail chain with 300 stores, 200 product groups per store and over 1 million products in total. We propose an architecture made up of multiple Neural Network models that can generate forecasts in a timely manner, taking into account calendar features, promotions and the interactions between competing products. It produces daily predictions in under 3 h and, in a weekly 12-h process, retrains the models whose performance has deteriorated, using an AutoML component to explore deeper and larger architectures. It is a critical component of the company's Order Management System, achieving a Root Mean Squared Error of 4.48 units across the horizons defined by the company.

Keywords: Retail demand forecasting · Time series · Neural Networks · End-to-end system · AutoML

1 Introduction
Discrepancies between the available inventory and customer demand in a retail setting can result in monetary ramifications for the business. Products that are out of stock lead to immediate as well as long-term shortage costs, from missed sales and potential customer turnover, respectively. Similarly, inefficiencies in business operations and sales planning add to the cost due to excess inventory or, in some situations, spoilage of products with a short shelf-life.

This research was partially funded by MASOUTIS, a private national retail chain company, through the Special Account for Research Funds of the Aristotle University of Thessaloniki (Project No. 73026) under contract 180311/26-07-2021.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 646, pp. 467–479, 2023. https://doi.org/10.1007/978-3-031-27440-4_45
In the retail industry, this associated cost equates to the product's contribution margin, as shortage costs are typically set higher than inventory costs [13]. Consequently, retailers must establish a viable strategy for selecting the appropriate model for demand forecasting in order to optimize inventory levels. Accurate forecasting is crucial to driving revenue, growth, and profitability. The information gained from these predictions can be used to make proactive, informed, and intelligent decisions about the sales process. Demand forecasts are the theoretical roadmaps of estimated sales performance, revenue, and expenses that guide sales operations in the planning processes. They aid in predicting potential problems that could impact sales performance before they happen and highlight potential profit opportunities. Demand predictions can be created on a monthly, weekly, daily, or even hourly basis to assist various planning processes and business choices, and highly granular forecasts are always extremely helpful. When it comes to fresh food products, the advantages of a granular forecast are evident, as they have short shelf lives and spoil quickly. The heterogeneity of demand patterns, which may differ in terms of frequency, amount, variation, and demand regularity across products, is a major difficulty in Retail Demand Forecasting (RDF). Strong trends, seasonal changes, and irregular demand peaks are common in time series, making demand forecasting difficult [6]. Retail businesses store a massive amount of relevant product data that produces diverse sets of information growing at ever-increasing rates. This growth is further amplified by the number of product offerings, generating a substantial amount of historical time-series data that is difficult for single-model approaches to consistently take advantage of. Furthermore, additional factors such as special calendar days, product inter-dependencies and product promotions affect product demand. Consequently, the majority of research in RDF focuses on various modeling difficulties in lower-volume datasets, which are not representative of the actual demands of big retail chains. Time series analysis methods, such as exponential smoothing, ARIMA-type models, and state-space models, forecast demand based on past sales history, making them appropriate only if historical temporal patterns are expected to persist in the future. This is frequently not the case in retail practice, as exogenous effects such as erratic price promotions can have a significant impact on customer demand. Regression models seek to model demand using explanatory factors, which may include time and time-varying variables. The most common method for modeling directed links between explanatory factors (features) and demand is multiple linear regression, where exogenous factors, such as pricing and marketing efforts, may be included. However, in the event of complicated consumer demand patterns, linear regression predictions frequently break down. Recently, Machine Learning (ML) approaches have been increasingly adopted to address the shortcomings of statistical and regression approaches in high-volume/high-demand RDF [3]. Whereas traditional models are more suitable for detecting linear patterns, ML models are better at uncovering non-linear relationships and handling external features.
In addition, ML methods make better use of big datasets and respond with better adaptability to new data streams. Works that solve RDF with ML commonly use supervised learning to train highly accurate forecasting models, which are then evaluated in simulated scenarios. The improvements are often brought by solutions to domain-specific problems, such as forecasting special calendar days, handling intermittent sales and taking into account the effects of promotions. While these are promising findings, the proposed solutions are rarely applied to the volumes of data typically produced by large retailers. This discrepancy of scale is related not only to data volume but also to variety. Specifically, most works focus on particular product groups with a limited number of items, although a larger collection of items with a variety of demand behaviors should be dealt with for production purposes. In this paper, we reassess the above shortcomings and propose a new ML-based RDF system that features several advantages. Firstly, the system was designed for, and is able to handle, the actual load required by a large retail chain with 300 stores in Greece. The system has been in use for over a year, and it forms an essential part of procedures related to stock management and purchase order automation. Secondly, we address multiple modeling issues via a custom deep learning architecture that can uncover inter-group dynamics and extract useful information from external features. Thirdly, the same architecture is equipped with an Automated Machine Learning (AutoML) mechanism that provides continuous adaptation to a changing retail demand environment. The AutoML mechanism is critical in order to support the inclusion of new products or stores in the system. Consequently, to the best of our knowledge, this is the first paper to propose a fully-automated and complete forecasting system that has also been evaluated in the context of an actual case study. The system was developed in Python 3.6 with the open source libraries Tensorflow v2.6, optuna v2.10 and pandas v1.4 on a virtual machine with an 8-core Xeon(R) CPU E5-2620 v4 @ 2.10 GHz. The remainder of this paper is organized as follows. In Sect. 2, we provide a brief review of related works and explore the techniques used for RDF. In Sect. 3, we elaborate on the overall system architecture and describe our approach in detail. In Sect. 4, we present the experimental evaluation and results. Finally, in Sect. 5, we present our conclusions and the direction for future research.
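To give a flavour of how an optuna-driven AutoML search over deeper and larger architectures can be set up, the sketch below tunes the depth, width and learning rate of a simple feed-forward forecaster on dummy data. It is purely illustrative: it does not reproduce the architecture, features or search space of the actual system, which are described in Sect. 3.

```python
import numpy as np
import optuna
import tensorflow as tf

# Dummy stand-ins: 30 input features and a 7-day forecast horizon (both assumed).
X_train, y_train = np.random.rand(1000, 30), np.random.rand(1000, 7)
X_val, y_val = np.random.rand(200, 30), np.random.rand(200, 7)

def objective(trial):
    # Each trial proposes a candidate depth, layer widths and learning rate.
    model = tf.keras.Sequential([tf.keras.layers.InputLayer(input_shape=(X_train.shape[1],))])
    for i in range(trial.suggest_int("n_layers", 1, 4)):
        units = trial.suggest_int(f"units_{i}", 32, 512, log=True)
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(y_train.shape[1]))
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    model.fit(X_train, y_train, epochs=10, batch_size=64, verbose=0)
    return model.evaluate(X_val, y_val, verbose=0)[1]  # validation RMSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```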
2 Related Work
Our research highlighted that most works in RDF focus on different modeling issues, among them tasks such as improving predictions for special calendar days, detecting product inter-dependencies and selecting promotion features. Yet, the vast majority of such works tackle small or medium-sized datasets, which are not indicative of the computational requirements posed by large retail chains. Furthermore, while there exist previous works that address problems of scale inherent to big retail data, to the best of our knowledge these works do not include detailed forecasting performance evaluation with actual data. In the following paragraphs, we provide a brief overview of recent works in this field.
with actual data. In the following paragraphs, we provide a brief overview of recent works in this field. In [9], İşlek and Öğüdücü proposed a demand forecasting system for main distribution warehouses, using bipartite graph clustering to group together warehouses with similar sales. For each group, a hybrid model that combined a Moving Average (MA) model and a Bayesian Network (BN) was used to forecast sales. While product promotion features were not considered, location demographics were included in the model's input. Whereas the system was able to serve more than 100 main distribution warehouses, the task was to handle 70 products of a specialized company, which is low compared to the number of products in a generic supermarket chain. Ma et al. designed an RDF system based on a feature selection methodology for high-dimensional marketing data with LASSO regression [12]. The data were selected among 15 product categories of a medium-sized store, while the observations were weekly and included sales, prices and promotions. Among the examined feature groups, the work demonstrated that inter- and intra-category product promotions can significantly improve forecasting performance, with the improvement being mostly due to the latter group. Böse et al. presented a probabilistic RDF platform implemented on Apache Spark [2]. The platform was engineered for scale and is able to scale to millions of products with the use of large-scale and distributed ML approaches. It supports two operation modes, production and experimentation. However, beyond the proposed architecture, the authors did not provide detailed forecasting performance evaluation scores from real case studies. In [10], Liao et al. claimed that related works frequently ignore informative relations between products and stores. Towards this goal, they proposed the Structural Temporal Attention Network (STATnet) to capture dependencies among products. Experiments with two real-world datasets demonstrated that STATnet has the potential to outperform state-of-the-art methods. Huber and Stuckenschmidt investigated supervised learning methods to address daily demand forecasting for a bakery chain, with a special emphasis on the prediction of special calendar days [7]. Interestingly enough, rather than modeling demand forecasting as a numerical regression problem, which is the typical representation, the authors opted for a classification problem representation. The system demonstrated promising results on a real-world dataset and scaled to a large set of products and stores. Long Short-Term Memory (LSTM) was reported as the best-performing model, whereas advanced hyper-parameter tuning and feature selection methods were not considered. In [5], Falatouri et al. compared LSTM and Seasonal Autoregressive Integrated Moving Average (SARIMA) using more than three years of actual sales data. The best model for each product depended on the underlying seasonality, with LSTM outperforming SARIMA for seasonal products, and vice-versa for products with no seasonality. Furthermore, the authors also considered SARIMAX1 to integrate
1 SARIMAX is an improved version of SARIMA that is able to handle external features, such as promotion and weather features, in addition to sales data.
promotion features, which outperformed both models in most cases. Having said that, there was no attempt to feed LSTM with promotion features and the results were based on only four vegetable products. In [14], Wen et al. propose MQ-Forecaster, a modified seq2seq architecture for multi-step quantile regression that takes advantage of the direct strategy. MQ-Forecaster was applied to Amazon retail data comprising demand, promotions and catalog fields for 60.000 products. The authors claim that MQ-Forecaster is able to handle millions of time series thanks to a forking-sequences scheme that significantly reduced training time, though no experiments backed up this claim. Eisenach et al. extended the previous work by proposing MQ-Transformer [4]. Experiments showed that MQ-Transformer, which is based on recent advances in Transformer architectures, offered several improvements in forecasting accuracy by reducing excess variability. Experimenting on a subset of 2 million products and four years of Amazon data, they reported a 38% improvement over the previously reported state-of-the-art. Whereas this work provides relative performance estimates compared to a baseline, total or per-category forecasting performance estimates (e.g., MAE, MSE, RMSE) are not reported. Lim et al. proposed the Temporal Fusion Transformer (TFT) to better combine heterogeneous data sources such as historical demand, static product/store-related information, and known future events [11]. Besides making better use of heterogeneous data, TFT also has a transparent architecture that provides better interpretability. Experiments with one year of data for 135k product/store pairs showed that TFT is also able to outperform MQ-Forecaster in quantile forecasting. Finally, no details are given regarding the training time of TFT. A more detailed literature review for RDF methods is provided by Ingle et al. [8]. In our work, we aim to address modeling challenges like those mentioned by [7,9,10], with a scalable architecture that can handle as many stores and products as [2], while we also evaluate the system on actual retail data.
3
System Architecture
The system follows an end-to-end approach and comprises four individual components that are designed for separate tasks, namely data synchronization, model training, inference, and storing the predictions in the database (Fig. 1). Additionally, a logging component that runs in the background pushes regular updates to the user's database and also stores detailed execution logs locally for debugging purposes. All components can be executed concurrently for different stores and products, significantly reducing the total required execution time. The modular design targeted our main concerns, specifically, to avoid using the user's database for extended periods of time and to limit coupling between the two ML components and the interaction with outside systems. By detaching model training and inference, we facilitated faster prototyping and local verification and debugging of the core components that changed most often during development and throughout the system's lifetime. There are two distinct scenarios that must be efficiently handled: cold-start and daily use. Cold-start
Fig. 1. Architecture overview of the proposed retail forecasting system
refers to the first execution of the system's functionalities for newly integrated stores and products that were not included in the initial phase of forecasting. Stores and products with very small or non-existent volumes of sales were handled with linear models as an edge-case. Furthermore, an important requirement was to accommodate user-defined queries that specify a list of stores/products that would be forecasted, according to the order schedule. Data Synchronization Component. The Data Synchronization Component oversees the downloading of the sales data from the user's database and stores them in the local filesystem. In this manner, the data can be accessed quickly during model training and inference without network overhead and repeated usage of the user's database. All common SQL-based databases are supported. The component's functions include preprocessing (data cleaning, adding calendar features) and aggregating a product's sales and all relevant information. There is a separate table-formatted file for each product group category in each store, with each row corresponding to the sales of a specific day. Synchronizing stores or products for the first time (cold start) can take some time as it must download data for the whole sales history; in daily use, however, it only downloads the few days elapsed since the last execution. Model Training Component. The Model Training Component includes three subcomponents: promotion selection, AutoML, and evaluation. The promotion selection component chooses which promotion categories will be used in the model based on the frequency of their appearance. The evaluation component aggregates all sales and previous forecasts of the last week for each product and calculates the prediction error as the root mean squared error over the forecasted horizon.
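To illustrate this evaluation step, the minimal pandas/NumPy sketch below computes a per-product weekly error from realized sales and stored forecasts. It is only a sketch: the column names (sku, sale_qty, forecast_qty) and the frame layout are hypothetical, not the system's actual schema.

```python
import numpy as np
import pandas as pd

def weekly_rmse(history: pd.DataFrame) -> pd.Series:
    """RMSE between last week's forecasts and realized sales, per SKU.

    `history` is assumed to hold one row per SKU and day with the (hypothetical)
    columns 'sku', 'sale_qty' and 'forecast_qty'.
    """
    err = history.assign(sq_err=(history["sale_qty"] - history["forecast_qty"]) ** 2)
    return np.sqrt(err.groupby("sku")["sq_err"].mean())

# The resulting per-SKU error would then be compared against the retraining thresholds:
# rmse_per_sku = weekly_rmse(last_week_frame)
```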
The AutoML component is executed at predetermined intervals that are customizable by the user. If the error is below a minimum threshold, then the training process is skipped. On the other hand, an error that surpasses a maximum threshold triggers a neural architecture search exploring larger and deeper models from a precomputed selection of parameters. The selection is decided using a Tree-structured Parzen Estimator [1] on a few representative stores, selected by the retail chain based on their size and sales volume. If a model does not already exist (cold start), or when the error is between the two thresholds, the model is trained with a default architecture that integrates the selected promotion categories and the most recent sales data. Inference Component. The Inference Component produces predictions based on a customizable schedule for the stores and products selected by a filter query provided by the user. Sales data and relevant promotion features are dynamically fetched according to the configuration for the respective model. It supports a varying forecasting horizon, meaning the user could, for example, request a 5-day forecast on Monday and a 2-day forecast on Thursday. During inference, information regarding future promotions is retrieved from the user's database, for the respective features that are used by the model. Storing Component. After the inference task has been completed for all stores and products, the predictions are stored in the user's database by the Storing Component. Finally, a signal is sent to a predetermined end-point that notifies the Order Management System that the system has finished its daily routine and can move on to processing the orders. Logging Component. The Logging Component facilitates monitoring and debugging for all other components. Summary execution logs can be used to track the daily usage of the system as well as the exit code of each process. Each component uses a different list of exit codes, though they can be summarized as info (nominal execution), warning (erroneous states, i.e., no sales data), and error (irregular execution due to faults in the code). Furthermore, detailed debugging logs, such as configuration parameters and progress statuses throughout execution, are saved locally to later investigate issues that might come up. The Logging Component is thread-safe, meaning that it can run concurrently from multiple processes without entangling different log entries together.
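As a rough sketch of the retraining policy described for the Model Training Component above (skip when the error is low, retrain the default architecture in between, trigger the architecture search when it is high, train from scratch on cold start), the decision could be expressed as follows; the action labels and threshold handling are illustrative only, not the system's actual interface.

```python
def training_action(weekly_rmse, model_exists, min_threshold, max_threshold):
    """Map the latest evaluation error to one of the actions described in the text."""
    if not model_exists:                      # cold start: no trained model yet
        return "train_default_architecture"
    if weekly_rmse < min_threshold:           # error acceptable, skip training
        return "skip"
    if weekly_rmse > max_threshold:           # error too high, search larger/deeper models
        return "automl_architecture_search"
    return "train_default_architecture"       # in-between: retrain the default model
```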
4
Experimental Evaluation
The proposed system architecture was developed for an actual retail chain with 300 stores and about 200 product groups per store, in two phases. In the first phase, the retail chain indicated 3 representative stores based on their size and sales volume that were used for prototyping. Afterwards, the system was expanded to execute for the rest of the stores by slowly adding new ones. It is used to generate daily forecasts for the following few days, per the user’s query. The models are evaluated at the end of each week and retrained if necessary.
Over 2 years' worth of data are used for training the models, with almost 1 billion entries spanning over 1.110.000 SKUs (6337 unique codes) in 300 stores. The stores are categorized according to their size: 69 Small, 118 Medium, 108 Large, and 16 Grand size stores with a median of 6.135.000, 13.390.000, 15.560.000, and 3.330.000 items sold respectively each day per store size category. There are 8 different types of offers, such as 1+1 (i.e., two for the price of one), a percentage discount, or points for registered members that can be exchanged for coupons. Furthermore, there are 7 different types of promotions, of which the most impactful are brochures, which were included in the model's input after a suggestion from the domain expert. However, many SKUs never or rarely have any promotions; therefore, it is important to determine whether including the promotion will indeed improve the models' performance. This is done by calculating the ratio of SKUs with and without promotions and comparing it with a ratio threshold, which is a configurable parameter that is optimized when selecting the basic model architecture.
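A hedged sketch of this promotion-filtering rule is given below; it reflects one plausible reading of the ratio described above, with an invented data layout (one row per SKU/day, binary promotion columns). The default threshold of 4e-3 anticipates the value selected later in Sect. 4.1.

```python
def select_promotion_features(df, promo_cols, ratio_threshold=4e-3):
    """Keep promotion columns whose share of promoted entries reaches the threshold.

    `df` is assumed to be a pandas DataFrame with one row per SKU/day and
    binary promotion indicator columns (hypothetical layout).
    """
    selected = []
    for col in promo_cols:
        promoted_share = (df[col] > 0).mean()   # fraction of rows carrying the promotion
        if promoted_share >= ratio_threshold:
            selected.append(col)
    return selected
```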
4.1 Basic Model Architecture
As there are thousands of models that must be trained, it is important to find a default model architecture that achieves satisfactory forecasting performance for the majority of available SKUs, thus avoiding costly retraining and AutoML architecture searches. A stacked LSTM was used, in accordance with recent literature showcasing it as a strong choice for forecasting [5,7]. Other than the demand signal for the items in the specific product group, the model also receives additional signals for the promotion offers and calendar features, to predict the sales for the same items. The depth and width configuration of the model were derived following a Grid Search optimization. The number of layers ranged between [1,4] layers, while the number of units in the first layer was selected among {50, 100, 200, 400} units and decayed by a factor of 0.7 for the following layers. For each architecture configuration, the respective models were trained for the 3 representative stores and tested for the last 3 months. All models are trained using an Adam optimizer with a learning rate of 1e-4 on the Mean Squared Error (MSE) loss. The Root MSE (RMSE) was selected as the reporting metric because it is easily interpretable while also highlighting larger forecasting errors. The Mean Average Performance Error (MAE) metric was deemed unsuitable due to the existence of many SKUs with zero sales for particular days. The results for the neural architecture search for the basic model architecture, presented in Table 1, indicate that the best architecture for the majority of the models is a 3-Layer Stacked LSTM, consisting of [50, 35, 25] units respectively on each layer. A similar approach was used when determining the optimal promotion ratio cutoff threshold, taking into account different ratios in the range [3e−4, 7e−3] with increments of 0.1. Outside of these boundaries, either all models or none would use the promotions as features. Table 2 displays the RMSE for all promotion ratios for each of the 3 representative stores, with the best score displayed
in bold. Though there are some very close candidates for Store B, ratio 4e-3 is clearly distinguished as a common top candidate among all 3 stores.

Table 1. RMSE for each architecture configuration.

Layers  Units  Store A  Store B  Store C
1       50     6.877    5.645    2.733
1       100    6.760    6.029    2.788
1       200    7.346    6.005    2.913
1       400    7.604    6.148    2.834
2       50     6.841    5.604    2.757
2       100    7.176    7.793    2.733
2       200    7.070    6.571    2.765
2       400    7.514    6.359    2.890
3       50     6.600    5.505    2.664
3       100    7.126    5.592    2.797
3       200    7.337    5.583    2.778
3       400    6.826    5.736    2.914
4       50     6.887    6.034    2.720
4       100    6.840    5.625    2.699
4       200    6.751    5.801    2.753
4       400    6.928    5.673    2.773

Table 2. RMSE for each promotion ratio.

Promotion Ratio  Store A  Store B  Store C
0.0003           6.414    3.900    2.419
0.0004           5.742    3.954    2.447
0.0005           6.343    3.884    2.417
0.0006           6.143    3.977    2.433
0.0007           6.168    3.915    2.450
0.0008           5.864    3.888    2.455
0.0009           5.890    3.940    2.443
0.0010           5.766    3.917    2.422
0.0020           6.188    3.903    2.448
0.0030           6.071    3.934    2.415
0.0040           5.497    3.895    2.395
0.0050           5.780    3.925    2.420
0.0060           6.078    3.931    2.434
0.0070           6.349    3.883    2.449
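For concreteness, the selected default model (a 3-layer stacked LSTM with [50, 35, 25] units, trained with Adam at a learning rate of 1e-4 on an MSE loss, as described above) could be assembled roughly as follows. This is a hedged sketch: the input window length, feature count, and the dense output head producing one forecast per item in the group are assumptions, since the exact tensor layout is not given in the paper.

```python
import tensorflow as tf

def build_default_model(window_len, n_features, n_items):
    """Rough sketch of the default 3-layer stacked LSTM (units [50, 35, 25])."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window_len, n_features)),
        tf.keras.layers.LSTM(50, return_sequences=True),
        tf.keras.layers.LSTM(35, return_sequences=True),
        tf.keras.layers.LSTM(25),
        tf.keras.layers.Dense(n_items),   # assumed head: one demand value per item
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse")
    return model

# Example with illustrative sizes:
# model = build_default_model(window_len=28, n_features=16, n_items=120)
```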
4.2 AutoML
If the model's previous forecasting error exceeds a maximum threshold defined by the user, it triggers an automatic neural architecture search which explores larger and deeper models, using a Tree-structured Parzen Estimator (TPE) [1]. Considering $\theta_i$ a configuration parameter and $\hat{\theta}_i$ the search space for that parameter, the search space for all parameters is given by:

$\hat{\Theta} = \{\hat{\theta}_i : i = 1, 2, \ldots, N\}$

Similarly, an architecture configuration can be defined as $\theta^j$ and the list of all candidate configurations becomes:

$\Theta = \{\theta^j : j = 1, 2, \ldots, trials\}$

where the number of trials defaults to 20, though it can be customized. Taking into account the trained model $\phi(\theta)$ on the respective parameters, a wider selection of architecture configurations is first precomputed and evaluated on the 3 representative stores:
$\Theta_{repr} = TPE(\hat{\Theta}_{repr}), \quad RMSE(\phi(\theta^{i}_{repr})) \leq RMSE(\phi(\theta^{i+1}_{repr}))$

The hyper-parameters and their search spaces cover the number of layers (1–5), units per layer (50, 100, 200, ..., 500) and unit decay per layer (0.5, 0.6, ..., 1.0). The preliminary calculation of candidate architectures on the subset of representative stores is indispensable to reducing the size of the candidate search space to a manageable level. At the same time, it provides a list of sensible architectures that have already established strong performances. If activated during training, the AutoML component trains and evaluates models for all architecture configurations in $\Theta_{repr}$, choosing the best.
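Since the system reportedly uses optuna with a TPE sampler, the precomputation of candidate architectures could be sketched as below. Only the search ranges (1–5 layers, units between 50 and 500, decay between 0.5 and 1.0) and the default of 20 trials come from the text; the build_model and evaluate_rmse helpers are placeholders, the intermediate unit values are interpolated from "50, 100, 200, ..., 500", and the objective body is an assumption rather than the paper's implementation.

```python
import optuna

def build_model(layer_units):
    # Placeholder: the real system would build and train the stacked LSTM
    # with these layer widths on the representative stores.
    return layer_units

def evaluate_rmse(model):
    # Placeholder score; the real objective is the RMSE over the test period.
    return float(sum(model))

def objective(trial):
    n_layers = trial.suggest_int("layers", 1, 5)
    units = trial.suggest_categorical("units", [50, 100, 200, 300, 400, 500])
    decay = trial.suggest_categorical("decay", [0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    layer_units = [max(1, int(units * decay ** i)) for i in range(n_layers)]
    return evaluate_rmse(build_model(layer_units))

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)   # 20 trials is the stated default
```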
4.3 Case Study
The system presented in this paper has been developed for usage in a real-world case of a retail chain with 300 stores and 200 product groups per store. Integrating the system with a store's Order Management System is a time-consuming process. Each store receives daily demand predictions based on its ordering schedule, which are subsequently used to manage the stock of each SKU. Performance was evaluated over a 7-month period, from the beginning of 2022 until the end of July 2022. The system executes daily in under 4 h. Data Synchronization takes around 120 min, Inference 90 min and Storing the predictions less than 30 min. Training requires approximately 24 h and executes weekly. Requested forecasting horizons typically cover the following 5 or 6 days, as seen in Table 3, though shorter 3-day forecasts are also common.

Table 3. Frequency of each forecasting horizon per day, in percentages (%).

Horizon  Mon  Tue  Wed  Thu  Fri  Sat
1        0    0    0    0    2    6
2        0    0    0    1    8    0
3        15   16   14   20   0    0
4        6    8    14   0    13   14
5        73   65   6    10   5    7
6        6    0    62   62   72   64
7        0    2    3    7    1    9
8        0    8    1    0    0    0
Forecasting performance is calculated by adding the total predicted quantity over the full horizon and comparing it with the realized sales. As previously, RMSE is used, rounded to the second decimal point. The overall forecasting performance of the system is 4.48 RMSE, which was deemed satisfactory by
the retail chain client. Figure 2 shows the RMSE after grouping the predictions according to the day of the week when they were produced. The forecasting error on Tuesdays and Thursdays (the days when non-produce products are ordered by the specific retail chain) is generally worse, though by a very small margin. Sundays have significantly less traffic and are therefore harder to accurately predict. Nonetheless, the system's average performance is similar across days and does not indicate any overfitting issues.
Fig. 2. Forecasting RMSE per day.
Evaluating the system per store does not produce any significant outliers, as most stores have comparable forecasting accuracy, with an RMSE standard deviation lower than 1. Most stores have an RMSE < 4.50, with only a few that have slightly increased RMSE. On a per-group basis, the majority of models actually achieve lower than 3.50 RMSE, which is lower than the overall system's error, though there are a couple of outliers. These evaluations are displayed in Fig. 3. Further investigation indicates the main predicting factor to be a different type of promotion that is not considered by the system, highlighting it as a possible future improvement.
Fig. 3. Forecasting accuracy for the system. Left: RMSE per horizon. Middle: binned RMSE histogram per group. Right: binned RMSE histogram per store.
5
Conclusions
Demand forecasting is an important problem for retail chains that want to optimize stock management. The huge volume of sales data has empowered machine learning solutions against traditional forecasting approaches. While a significant number of machine learning-based solutions have been proposed, few of them provide evidence that their solutions can scale to the needs of large retail chains. Furthermore, the few works that do provide evidence of scalability either lack an evaluation of their forecasting performance and focus only on scalability, fail to integrate exogenous features, or cover only a small number of products. This paper presented a complete RDF system that uses machine learning to forecast retail demand. The proposed system is currently used to serve the business needs of one of the largest Greek retail chains, with 300 stores and 200 product groups. At the same time, our experimental evaluation demonstrates a sufficient and balanced forecasting performance across the majority of product groups and stores. The system was designed while keeping usability in mind, and thus is easily extendable, allowing for daily customized horizons and effortless addition of new stores and product groups. Moreover, we presented an automated mechanism for neural architecture search and hyper-parameter tuning that is able to reconfigure and retrain the forecasting models only when it is deemed necessary. We provide three directions towards improving the RDF system proposed in this paper. First, to consider more external features during the modeling process, such as weather conditions and other marketing data. Second, to further investigate and address outlier products for which demand forecasting can be improved. Finally, to explore whether separate models for each day of the week can provide further improvements. We plan on addressing these points in a future version of our RDF system.
References
1. Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp. 2623–2631. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3292500.3330701
2. Böse, J.H., et al.: Probabilistic demand forecasting at scale. Proc. VLDB Endowment 10(12), 1694–1705 (2017)
3. Cerqueira, V., Torgo, L., Soares, C.: Machine learning vs statistical methods for time series forecasting: size matters (2019). https://doi.org/10.48550/arXiv.1909.13316
4. Eisenach, C., Patel, Y., Madeka, D.: MQTransformer: multi-horizon forecasts with context dependent and feedback-aware attention. arXiv:2009.14799 (2020)
5. Falatouri, T., Darbanian, F., Brandtner, P., Udokwu, C.: Predictive analytics for demand forecasting - a comparison of SARIMA and LSTM in retail SCM. Procedia Comput. Sci. 200, 993–1003 (2022)
6. Fildes, R., Ma, S., Kolassa, S.: Retail forecasting: research and practice. Int. J. Forecast. (2019). https://doi.org/10.1016/j.ijforecast.2019.06.004
7. Huber, J., Stuckenschmidt, H.: Daily retail demand forecasting using machine learning with emphasis on calendric special days. Int. J. Forecast. 36(4), 1420–1438 (2020)
8. Ingle, C., Bakliwal, D., Jain, J., Singh, P., Kale, P., Chhajed, V.: Demand forecasting: literature review on various methodologies. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2021)
9. İşlek, İ., Öğüdücü, Ş.G.: A retail demand forecasting model based on data mining techniques. In: 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE), pp. 55–60. IEEE (2015)
10. Liao, S., Yin, J., Rao, W.: Towards accurate retail demand forecasting using deep neural networks. In: Nah, Y., Cui, B., Lee, S.-W., Yu, J.X., Moon, Y.-S., Whang, S.E. (eds.) DASFAA 2020. LNCS, vol. 12114, pp. 711–723. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59419-0_44
11. Lim, B., Arik, S.O., Loeff, N., Pfister, T.: Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 37(4), 1748–1764 (2021). https://doi.org/10.1016/j.ijforecast.2021.03.012
12. Ma, S., Fildes, R., Huang, T.: Demand forecasting with high dimensional data: the case of SKU retail sales forecasting with intra- and inter-category promotional information. Eur. J. Oper. Res. 249(1), 245–257 (2016)
13. Ulrich, M., Jahnke, H., Langrock, R., Pesch, R., Senge, R.: Classification-based model selection in retail demand forecasting. Int. J. Forecast. 38(1), 209–223 (2022). https://doi.org/10.1016/j.ijforecast.2021.05.010
14. Wen, R., Torkkola, K., Narayanaswamy, B., Madeka, D.: A multi-horizon quantile recurrent forecaster. In: The 31st Conference on Neural Information Processing Systems (NIPS 2017), Time Series Workshop (2017)
Intelligent Mapping of Virtualized Services on Multi-domain Networks
Vinicius Fulber-Garcia1(B), Marcelo C. Luizelli2, Carlos R. Paula dos Santos3, Eduardo J. Spinosa1, and Elias P. Duarte1
1 Federal University of Paraná, Curitiba, Brazil {vfgarcia,spinosa,elias}@inf.ufpr.br
2 Federal University of Pampa, Alegrete, Brazil [email protected]
3 Federal University of Santa Maria, Santa Maria, RS, Brazil [email protected]
Abstract. One of the challenges of the Network Functions Virtualization (NFV) paradigm is to deploy virtualized network functions and services efficiently. In particular, current solutions for multi-domain service mapping present several restrictions regarding the choice of optimization models and metrics. This lack of flexibility ultimately leads to sub-optimized mappings that do not meet the (often conflicting) requirements of all the parties involved in the deployment process (e.g., network operators, clients, providers). This work proposes GeSeMa (Genetic Service Mapping), a new intelligent mapping solution based on genetic algorithms. GeSeMa allows the specification of arbitrary optimization metrics, constraints, and different evaluation policies. We evaluate GeSeMa through a case study, comparing its results with the results of a state-of-the-art genetic-based mapping solution.
1
Introduction
Network Functions Virtualization (NFV) is driving a paradigm shift in telecommunications. NFV allows network functions that have been traditionally implemented as physical appliances in hardware to be implemented as software that runs on virtual machines [9]. Virtual Network Functions (VNF) [5] can be combined to create virtual network services called Service Function Chains (SFC) [7]. SFCs are compositions of multiple VNFs connected on a service topology. The deployment of virtual services on a network requires that it is efficiently embedded in the infrastructure [6,16]. Informally, the problem of mapping a network virtualization service consists of defining where the network functions that make up the service will be instantiated and executed. The problem becomes more challenging if the network consists of multiple administrative domains. Different domains may have restrictions on the number of services they run and the resource requirements of the respective functions. In addition, the policies the domain adopts together
with business rules adopted by each domain also have an impact on which alternatives are feasible and their costs. Moreover, there are network functions that are native to specific domains, to which they must necessarily be mapped. In general, there is a choice of where each function should be executed, which depends on the policies and resources available in the domains. Mapping also depends on the topology of the virtualized service and the multi-domain network topology to which it will be mapped. In this case, the objective is typically to reduce the amount of traffic transferred between domains as flows are forwarded through the network service. Furthermore, other criteria can be defined for each particular mapping process, such as maximizing the number of users and maximizing or minimizing the number of domains used to host the service. It should also be taken into account that the mapping objectives usually change according to the very nature of the service being mapped, the type of environment in which they operate, and also the network technologies involved, such as 5G or earlier cellular networks or even IoT or vehicular networks. Traditional solutions for mapping VNFs are based on evaluation setups that are often static in terms of the set of optimization metrics they employ, as well as objectives and weights, lacking the flexibility required to customize their execution [9,10,15]. Typically, those solutions only allow stakeholders to make simple adjustments of the weights of pre-configured optimization metrics [8]. Thus, the requirements of the multiple stakeholders (i.e., clients, providers, and network operators) are hardly met. A static strategy often leaves stakeholders having to adapt their needs to the restrictions of the mapping solutions they are using. The limitations can be critical in multi-domain environments [11,14,16, 18]. To the best of our knowledge, no current virtual service mapping solution allows arbitrary optimization metrics and objectives to be defined. In this work, we propose a new multi-domain mapping solution called Genetic Service Mapping (GeSeMa). GeSeMa allows the evaluation setup to be customized, providing high flexibility to adapt to the different needs of multiple stakeholders and considering several features. To do that, the stakeholders describe their needs and other service features on a standard request document. GeSeMa then uses a multi-objective optimization metaheuristic based on genetic algorithms to find mapping candidates in a feasible time. We evaluate GeSeMa through a case study, including a comparison with a state-of-the-art geneticbased mapping solution [11]. The rest of this work is organized as follows. Section 2 presents related work. GeSeMa is presented in Sect. 3. Evaluation results are in Sect. 4, including a case study comparing GeSeMa with a state-of-the-art mapping solution. Finally, Sect. 5 concludes the paper and presents future work.
2
Related Work
There are often multiple possible mappings of a given virtualized network service on a multi-domain environment. However, the performance of those distinct mappings varies when different policies, constraints, and optimization metrics are
employed [9]. Mapping solutions evaluate the multiple alternatives to guarantee, for instance, the QoS (Quality of Service) and QoE (Quality of Experience) of the final results. Dietrich et. al. [3] propose a solution that optimizes the multi-domain mapping by relying on four static metrics: (i) minimization of financial costs; (ii) minimization of the number of different providers and domains; (iii) minimization of resource usage; and (iv) maximization of suitability weights. In [13], a multi-domain mapping solution recovers information about financial costs, transmission delays, and resource usage to evaluate and optimize (with a minimization objective) the candidate mappings. Finally, in [16], a multi-domain mapping strategy is proposed that considers hybrid scenarios where private and public domains provide optical network resources. The objective of that solution is to minimize financial costs and the usage of frequency slots of the optical channels connecting the domains. The solution proposed in [18] consists of a multi-domain mapping technique based on a vertex-centric algorithm. The solution triggers rounds of message exchanges among providers to find candidate mappings iteratively. The mapping algorithm uses a mechanism to avoid the concentration of the entire service on a single provider. However, it does not optimize any specific metric, only returning for the user a set of candidate mappings that fulfill the allocation and instantiation constraints of the requesting service. With a method similar to [18], DistNSE [1] finds candidate mappings and employs a process based on message exchanges among providers. This solution evaluates two optimization metrics: minimization of financial costs and stabilization of inter-domain load. In [11], a multi-domain mapping technique based on a mono-objective genetic algorithm is proposed. The objective of that solution is to allocate the network functions of a network service chain on a multi-domain environment based on a single indicator (E). This indicator represents multiple domain metrics, such as link availability, bandwidth, and the number of network functions that each domain can host, among others. The solution proposed in [14] employs a mono-objective genetic algorithm to map virtualized network services on physical substrate nodes. The solution aims to optimize the consumption of computing and networking resources by the network services. In this way, the authors propose an objective function that minimizes the residual capacity of nodes to host functions and links to handle their communication, given the mapped services. Despite the fact that most of these solutions evaluate multiple optimization metrics, they do not enable stakeholders to customize the evaluation setup (i.e., it is not possible to define/select neither the metrics employed by the optimization process, nor the objectives/weights). This lack of customization makes it difficult to model and evaluate policies that are closely related to the deployment process (e.g., maximum delay, maximum geographical distance). Furthermore, solutions in [13,16] present limitations in terms of the specification of domain dependencies (i.e. they do not allow the specification of which functions should be allocated to which particular domains). Thus, for example, these solutions
are not suitable to embed hybrid services (i.e., those in which physical network functions coexist with virtualized network functions along a service topology) in multi-domain environments.
3
Genetic Service Mapping
In this section we present GeSeMa (Genetic Service Mapping), a solution that employs genetic algorithms to map virtualized network services across multiple administrative domains. GeSeMa enables stakeholders to define service and network topologies, function and domain dependencies, and the evaluation setup (optimization metrics, objectives, weights, and constraints). This custom information is specified in a request document written in the YAML Ain’t Markup Language (YAML). 3.1
GeSeMa’s Request Model
GeSeMa's request model presents three main objects that define (i) the service topology and the network functions (SERVICE); (ii) the optimization metrics and objectives (METRICS); and (iii) the domains and their characteristics (DOMAINS). A string specified according to the rules of the Service ChAin Grammar (SCAG) [6] represents the service topology in the SERVICE object. Furthermore, for each network function defined in the service topology, there is a corresponding entry in the FUNCTIONS sub-object. This entry, identified by the function ID, specifies the minimum resource requirements, including memory, virtualized processing cores, and virtualized network interfaces, all defined as integer values. The METRICS object defines metrics and objectives used by the genetic algorithms of GeSeMa to search, evaluate, and optimize candidate mappings. Metrics are of two categories: local or transition. Local metrics are used to evaluate the allocation of network functions to domains, which correspond to the vertices of a graph representing the infrastructure on which the service is to be mapped. Local metrics include, for instance, the financial cost to allocate a function and the domain load, among others. Transition metrics are related to inter-domain connections – which correspond to the edges of the infrastructure graph. Examples of transition metrics include delay, distance in hops, and geographical distance. The metrics and their categories are defined in the request model using LOCAL and TRANSITION sub-objects, respectively. Each of these sub-objects can define multiple metrics. A metric must be uniquely identified (by its ID), besides having two mandatory attributes: OBJECTIVE and CONSTRAINTS. The objective attribute shows the evaluation criteria for a particular metric, which can be either MAXIMIZATION or MINIMIZATION. The last attribute (CONSTRAINTS) consists of a list of strings, each of which refers to the constraints of an optimization metric. Constraints define acceptance thresholds for the evaluation results of optimization metrics. In order to check results with respect to thresholds, relational operators ("<", ">", "<=", ">=", "==" and "!=") are employed to compare numerical values with the corresponding thresholds.
Finally, the DOMAINS object defines the physical and virtual environments available and their transitions (connections). The domains are represented by a directed graph G = (V, E). The set of vertices V corresponds to the set of domains, and the set of edges E represents the logical connections between domains. The model keeps the information about LOCAL metrics of each domain (vertex) and TRANSITION metrics associated with the edges. A particular domain is thus defined with three sub-objects: RESOURCES, LOCAL, and TRANSITION. The RESOURCES sub-object contains information about memory (MEMORY), virtual processing cores (VCPU), and virtual network interfaces (IFACES) made available by the domain. The LOCAL and TRANSITION sub-objects, in turn, define the metrics associated with domains and their connections obtained either with benchmarking or from catalogs; this is used by the optimization process. These sub-objects are also related to the METRICS object, and there must be a correspondence between metric identifiers and benchmark identifiers for both the LOCAL and TRANSITION sub-objects. In particular, each entry of the TRANSITION sub-object determines to which domain the transition corresponds (using the domain unique identifier) and then defines the values of the optimization metrics for the transition.
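To make the request model tangible, a minimal illustrative request is sketched below. The top-level objects and field names (SERVICE, FUNCTIONS, METRICS with LOCAL/TRANSITION, OBJECTIVE, CONSTRAINTS, DOMAINS with RESOURCES/LOCAL/TRANSITION) follow the description above, but the concrete identifiers and values are invented placeholders, the key names for the per-function requirements are assumed to mirror the RESOURCES keys, and the SCAG topology string is only indicated, not reproduced.

```yaml
SERVICE:
  TOPOLOGY: "<service topology string written in the SCAG grammar [6]>"
  FUNCTIONS:
    fw01:                       # hypothetical function ID used in the topology
      MEMORY: 2048
      VCPU: 2
      IFACES: 2
METRICS:
  LOCAL:
    cost:
      OBJECTIVE: MINIMIZATION
      CONSTRAINTS: ["< 100"]
  TRANSITION:
    delay:
      OBJECTIVE: MINIMIZATION
      CONSTRAINTS: ["<= 50"]
DOMAINS:
  domainA:
    RESOURCES: {MEMORY: 16384, VCPU: 8, IFACES: 16}
    LOCAL: {cost: 40}
    TRANSITION:
      domainB: {delay: 12}
  domainB:
    RESOURCES: {MEMORY: 8192, VCPU: 4, IFACES: 8}
    LOCAL: {cost: 25}
    TRANSITION:
      domainA: {delay: 12}
```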
3.2 The Proposed Genetic Multi-domain Mapping Method
GeSeMa executes two well-known genetic algorithms: NSGAII [2] and SPEA2 [19]. Those algorithms have been successfully applied to solve networking problems, including fault diagnosis [4,12]. Note that the system can be extended to include other algorithms. The stakeholders can choose the genetic algorithm taking into account their characteristics, features of the requested service, and the domains, plus the evaluation setup provided. The genetic algorithms model the virtualized service mapping problem as follows: Individuals: An individual’s chromosome is modeled as a vector with N > 1 genes (i.e., positions), where each gene corresponds to a network function of the service topology (i.e., each function is mapped to a position in the vector). Genes contain alleles, represented by integer values in the range [0, M − 1] which correspond to the M > 0 domains available to map the network functions. Note that, in GeSeMa, a valid individual is a candidate mapping. Population: The initial population is created randomly or using a greedybased strategy. The initial population must not violate any function to domain dependencies, if there is any (i.e., for instance, if a domain must host some function, the index corresponding to the specific domain is fixed to the allele of the constrained gene). The population size P > 0 is a parameter defined by the stakeholders. Objectives and Constraints: GeSeMa evaluates objectives (with the evaluation setup) and constraints (e.g., policies, network topology, computational
resources, and dependencies) for all individuals of each generation. We use a taboo list to keep invalid individuals and avoid re-evaluations in case of new occurrences; If it happens, three actions are possible: (i) discard the individual (a standard action); (ii) replace the individual with a new random individual (in case policies or network topology constraints are violated); or (iii) reduce domain redundancy (in case of computational resources constraints are violated). Selection: The selection chooses individuals of a generation to crossover. GeSeMa uses a tournament mechanism that randomizes I individuals and returns the one that is the most fitted among them (i.e., the one on the best Pareto frontier). The tournament size I > 1 is defined by the stakeholders. Crossover: GeSeMa provides four crossover operators: Simulated Binary Crossover, Half Uniform Crossover, Partially Mapped Crossover, and Subtour Selection Crossover. The crossover operator and ratio (i.e., operator application probability) are also defined by the stakeholders. Mutation: The proposed solution employs two mutation operators: replacement and swap. Replacement chooses a random gene and replaces its allele with a new random value. Swap chooses two random genes and exchanges their alleles. Genes with domain constraints are never mutated. Similar to crossover, the stakeholders can define the mutation operator and its ratio. GeSeMa executes two main procedures: (i) validation and configuration of the genetic algorithm; and (ii) creation and evolution of the population. The first procedure uses the model specified in Subsect. 3.1 to validate the provided service request, thus mapping high-level structures to iterable elements (i.e., dictionaries, and lists). Next, the procedure checks previously defined genetic parameters (i.e., population size, tournament size, crossover operator/ratio, mutation operator/ratio, and the number of generations) and, if valid, it configures the genetic algorithm. Finally, the first procedure generates a set of software elements employed for the creation and evolution of individuals by the second procedure. Figure 1 summarizes the second procedure of GeSeMa. At first, the network service, encoded as a string according to the SCAG grammar, is converted to a format that is processed by the genetic algorithms (Fig. 1: A and B). The initial population is generated with valid individuals in terms of the network topology (network domain transitions) and domain dependencies (constrained network functions pinned to their respective domains). Next, the individuals are evaluated (Fig. 1: C) considering the availability of computational resources in the chosen domains and other constraints. In this way, each candidate is evaluated iteratively, gene by gene for all metrics. Results of all genes are aggregated to define the overall result for each metric. Finally, GeSeMa executes selections (Fig. 1: D) in addition to the crossover and mutation genetic operations (Fig. 1: E and F, respectively) to evolve the population. All the stages depicted in Fig. 1 C, D, E, and F represent the processing done to create a generation of individuals (Fig. 1: G). Finally, after each generation has been created, the genetic
Fig. 1. Summary of the GeSeMa Workflow
algorithm saves the best-fitted results (local Pareto frontier) for reuse in future generations. After a predetermined number of generations, GeSeMa returns the last Pareto frontier found as the final result (Fig. 1: H). In particular, the evaluation stage (Fig. 1: C) produces information that is relevant for the next stages. Local optimization metrics are computed with the current gene's allele. Transition optimization metrics, in turn, are processed when a domain transition occurs. The transition metrics use the current gene's allele and the alleles of previously related genes. Besides the allele, for each gene, there is a so-called relation array with indexes of previously related genes (i.e., previous network functions that have a connection with a particular network function in the requested service topology). In this way, linear chromosomes can represent branched service topologies. The set of partial evaluation results (i.e., by gene/allele) is jointly processed, and the individuals are classified in terms of Pareto frontiers.
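A schematic sketch of the individual encoding and the gene-by-gene evaluation described in this section is given below. The metric tables, their values, and the aggregation by simple summation are illustrative assumptions, not GeSeMa's exact implementation.

```python
# Chromosome: one gene per network function; the allele is the index of the
# domain (0..M-1) that hosts the function.
chromosome = [2, 2, 0, 1]                 # 4 functions mapped onto domains 0..2

# Relation array: for each gene, the indexes of previously related genes,
# i.e. functions that precede it in the (possibly branched) service topology.
relations = [[], [0], [1], [1]]

local_cost = [40, 25, 30]                                 # hypothetical LOCAL metric per domain
transition_delay = {(2, 0): 7, (2, 1): 5, (0, 1): 9, (1, 2): 12}  # hypothetical TRANSITION metric

def evaluate(chromosome, relations, local_metric, transition_metric):
    """Aggregate local and transition metric values gene by gene."""
    local_total, transition_total = 0.0, 0.0
    for gene, allele in enumerate(chromosome):
        local_total += local_metric[allele]
        for prev in relations[gene]:
            prev_allele = chromosome[prev]
            if prev_allele != allele:                     # only actual domain transitions count
                transition_total += transition_metric[(prev_allele, allele)]
    return local_total, transition_total

print(evaluate(chromosome, relations, local_cost, transition_delay))
```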
4
Experimental Evaluation
In this section, we present an empirical evaluation of GeSeMa.1 For the experiments, we employed the topology that corresponds to the Amazon AWS network, consisting of 114 domains [17]. All the experiments were executed 30 times with a confidence level of 95%. Preliminary experiments were run to determine values for the parameters of the genetic algorithms. GeSeMa is compared with GA+LCB, which is a mapping solution based on a mono-objective genetic algorithm [11]. In addition to the traditional mapping process (mapping the main network functions of a network service), GA+LCB includes a backup mapping mechanism that creates a backup schema for the
1 The implementation is available at https://github.com/ViniGarcia/NFV-FLERAS.
requested network service. However, as GeSeMa does not create backups, for comparison purposes, GA+LCB is executed to map the main functions, not the backups. The GA+LCB objective function was configured to maximize the modified domain importance ($imp_k$ from [11]), which consists of the maximization of three metrics – link availability ($da_k$), bandwidth availability ($dc_k$), and the availability factor ($A_k$) – and the minimization of a single metric – inter-domain delay ($dd_k$). The GA+LCB solution computes this evaluation setup as $E = w_1 \cdot nor(da_k) + w_2 \cdot nor(dc_k) + w_3 \cdot nor(A_k) + w_4 \cdot (1 - nor(dd_k))$, where $nor$ indicates a normalization function and $w_n$ the metric weight ($\sum_{n=1}^{4} w_n = 1$). Both GeSeMa and GA+LCB are employed to map a network service with 9 generic network functions. Two restrictions have to be guaranteed by both solutions: the result mapping of network functions should not exceed the computational resource limits of the domains, and no more than two network functions should be mapped to each domain. Furthermore, both solutions were configured to obey both maximum delay and minimum availability constraints. The values for metrics $dc_k$ and $A_k$ are defined randomly in the intervals [100, 500] and [0.95, 0.99], respectively; the value of $da_k$ is 114 for all the domains (the network topology is a complete graph); and the value of $dd_k$ is defined considering the geographical distance between pairs of domains $gd_{k,k+n}$ through the curve $gd_{k,k+n} \cdot (1 - e^{-4 \cdot nor(gd_{k,k+n})}) \cdot 0.05$. As required by GA+LCB, the initial domain and the final domain are specified in the mapping request document. The genetic parameters of GeSeMa were configured to be as similar as possible to GA+LCB. GA+LCB includes a crossover of half of the population using a personalized algorithm. Thus, we configured GeSeMa with a crossover ratio of 0.5 using the SBX algorithm (SBX has similar behavior to the GA+LCB crossover algorithm). The mutation ratio is set to 0.05; GA+LCB uses a specific, simple mutation algorithm, whereas GeSeMa uses a replacement mutation algorithm. GA+LCB executes a traditional roulette selector; GeSeMa employs a binary tournament selector. GA+LCB creates the initial population based on a k-shortest path algorithm; GeSeMa creates the initial population randomly. GA+LCB uses a self-designed mono-objective genetic algorithm with elitism features; GeSeMa adopts SPEA2. The population size of 50 was the same for both solutions, as well as the execution of 20000 generations. Finally, we removed the parameter weighting of GA+LCB and evaluated the Pareto Frontiers of the returned results for both solutions. The first experiment compares the quality of the candidates returned by GeSeMa and GA+LCB. We use the mean of the relative Pareto frontiers for the comparison (smaller numbers are better). Figure 2 shows the mean frontiers of candidates returned for two cases: "complete" (frontiers of all candidates from all executions are used to compute the mean value) and "top 10" (frontiers of top ten candidates of all executions are used to compute the mean value). The GA+LCB solution presented a better mean of the relative frontiers in the "complete" case. However, GeSeMa surpasses the GA+LCB results in the "top 10" experiment. This behavior occurs due to the number of candidates returned from GA+LCB at each execution: precisely one. Thus, GA+LCB returns a total of 30 candidates
Fig. 2. Frontiers Comparison (Genetic)
Fig. 3. Exec. Time Comparison (Genetic)
with the best E value achieved in each execution of the solution. GeSeMa, in turn, returns the entire Pareto frontier, which typically contains multiple candidates. In this experiment, GeSeMa provided approximately 49 candidates per execution, from a total of 1463 candidates evaluated in the “complete” case. Some of these candidates are not better fitted than the ones returned by the GA+LCB, but, as demonstrated by the “top 10” case, the best candidates of GeSeMa are more fitted than the best candidates of GA+LCB. The second experiment compares the mean execution times of GA+LCB and GeSeMa to map the service in the AWS network topology. Figure 3 shows the results. GeSeMa presented a better mean execution time: 104% faster than GA+LCB. These results can be explained as follows. First, GeSeMa employs a lightweight random initial population strategy, while GA+LCB uses a k-smallest path heuristic to create a possibly more fitted initial population. Thus, the GA+LCB strategy requires the execution of shortest path algorithms that take quite a lengthy amount of time to run in large network topologies. Second, the evaluation of multiple optimization metrics with a mono-objective genetic algorithm requires an aggregated index (in GA+LCB, called E). The creation of this index imposes extra time to process the normalization and weighting required by each generation. Third, GA+LCB does not have any mechanism to avoid the evaluation of candidates which have been already discarded but reappear during the execution of the genetic algorithm. GeSeMa, in turn, uses a taboo list to ignore those candidates.
5
Conclusion
The deployment of virtualized network functions and services depends on proper resource allocation while guaranteeing that restrictions are respected. In this context, multi-domain mapping allows embedding a network service across a distributed environment consisting of multiple administrative domains. Current
multi-domain mapping solutions do not enable the stakeholders to customize their evaluation setups. In this paper, we presented Genetic Service Mapping (GeSeMa), an intelligent mapping solution that uses genetic metaheuristics to execute a customizable mapping of service topologies across multi-domain environments. We evaluated the feasibility and performance of GeSeMa compared with another state-of-the-art genetic-based alternative. The results confirm that GeSeMa produced mappings of superior quality with lower execution times.
References 1. Abujoda, A., Papadimitriou, P.: DistNSE: distributed network service embedding across multiple providers. In: International Conference on Communication Systems and Networks, pp. 1–8. IEEE (2016) 2. Deb, K., et al.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002) 3. Dietrich, D., et al.: Network service embedding across multiple providers with nestor. In: Networking Conference, pp. 1–9. IEEE (2015) 4. Duarte, E.P., Jr., Pozo, A.T., Nassu, B.T.: Fault diagnosis of multiprocessor systems based on genetic and estimation of distribution algorithms: a performance evaluation. Int. J. Artif. Intell. Tools 19(01), 1–18 (2010) 5. Fulber-Garcia, V., et al.: On the design of a flexible architecture for virtualized network function platforms. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE (2019) 6. Fulber-Garcia, V., Luizelli, M.C., dos Santos, C.R.P., Duarte, E.P.: CUSCO: a customizable solution for NFV composition. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds.) AINA 2020. AISC, vol. 1151, pp. 204–216. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-44041-1 19 7. Fulber-Garcia, V., et al.: Network service topology: formalization, taxonomy and the custom specification model. Comput. Netw. 178, 107337 (2020) 8. Fulber-Garcia, V., et al.: Customizable deployment of NFV services. J. Netw. Syst. Manage. 29(3), 1–27 (2021) 9. Herrera, J.G., Botero, J.F.: Resource allocation in NFV: a comprehensive survey. Trans. Netw. Service Manage. 13(3), 518–532 (2016) 10. Huff, A., et al.: Building multi-domain service function chains based on multiple NFV orchestrators. In: Conference on Network Function Virtualization and Software Defined Networks, pp. 19–24. IEEE (2020) 11. Li, Y., et al.: Cost-and-QOS-based NFV service function chain mapping mechanism. In: Network Operations and Management Symposium, pp. 1–9. IEEE (2020) 12. Nassu, B.T., Duarte, E.P., Ramirez Pozo, A.T.: A comparison of evolutionary algorithms for system-level diagnosis. In: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, pp. 2053–2060 (2005) 13. Riera, J.F., et al.: Tenor: steps towards an orchestration platform for multi-pop NFV deployment. In: NetSoft Conference and Workshops, pp. 243–250. IEEE (2016) 14. Rodis, P., Papadimitriou, P.: Intelligent network service embedding using genetic algorithms. In: Symposium on Computers and Communications, pp. 1–7. IEEE (2021) 15. Schardong, F., et al.: NFV resource allocation: a systematic review and taxonomy of VNF forwarding graph embedding. Comput. Netw. 185, 107726 (2021)
16. Wang, Y., et al.: Cost-efficient virtual network function graph (VNFG) provisioning in multidomain elastic optical networks. J. Lightwave Technol. 35(13), 2712–2723 (2017) 17. Wikileaks: Amazon atlas 2015. https://wikileaks.org/amazon-atlas/ (2018). Accessed 01 Apr 2021 18. Zhang, Q., et al.: Vertex-centric computation of service function chains in multidomain networks. In: NetSoft Conference and Workshops, pp. 211–218. IEEE (2016) 19. Zitzler, E., et al.: SPEA2: improving the strength pareto evolutionary algorithm. TIK Report 103, 004284029 (2001)
Fundus Eye Image Classification and Interpretation for Glaucoma Diagnosis
Maísa Fernandes Gomes and Rafael Stubs Parpinelli(B)
Santa Catarina State University - UDESC, Graduate Program in Applied Computing, Joinville, SC, Brazil [email protected]
Abstract. Glaucoma is a progressive disease that causes irreversible blindness. Detecting glaucoma in its early stages is a challenging task. Deep learning models have been developed to assist in glaucoma detection from medical images. However, deep learning models are not very transparent in their interpretation. For this reason, interpretability techniques can be applied to explain what information is relevant to the model's predictions. In this work, we proposed applying interpretability techniques to a Residual Neural Network (ResNet) model to detect glaucoma from fundus images using public databases. Models trained with individual databases showed assertive results on test images. Also, the model trained with a merged dataset reached an accuracy of 99.70%. Two interpretability techniques, LRP and Grad-CAM, were applied to provide a visual explanation of the important areas for classification.
Keywords: Diagnostic Support · Model Interpretability · Glaucoma · Deep Learning · High-performance Computing
1
Introduction
Glaucoma is an eye disease characterized by increased intraocular pressure and cupping. The disease is progressive and does not cause symptoms until it is in an advanced stage, hence the importance of diagnosing glaucoma early to slow down its progression. One of the ways to diagnose glaucoma is from eye fundus images, where the ophthalmologist observes and analyzes the optic disc [13]. With advances in machine learning techniques to help in the medical field, deep learning algorithms have been developed to detect suspected cases of glaucoma based on medical images [12]. Convolutional Neural Networks (CNN) are Deep Learning (DL) models that achieve significant results in image recognition. One characteristic that differentiates CNN from other machine learning models is the convolution layers responsible for extracting particular characteristics from input images. Hence, deep layers
are added to the model architecture to extract these features, making the model complex and not very transparent about how it works and what features are relevant for predictions [12]. As the detection of diseases is a task with a high impact on people's lives, explaining how the model arrived at a prediction is essential to bring credibility and confidence to the model. Interpretability techniques have been developed to obtain better explanations for DL models. These techniques can be used in the architecture or after the model has already been trained [3,8]. This work aims to apply interpretability techniques to a CNN that detects glaucoma in fundus images and to understand, through heatmaps, which image areas are most important for classification. For this, a convolutional neural network ResNet50 [4] was trained with two public databases, ACRIMA and RIM-ONE, and with a third database, ACRIMA+RIM-ONE, created by merging the first two. For model interpretability, two techniques, LRP [2] and grad-CAM [10], were used. Both techniques seek to show which information is most important for predictions. This paper is organized as follows. Section 2 presents the theoretical background of the work, Sect. 3 describes the related works, and Sect. 4 describes the proposed model. Section 5 contains the experiments, the results obtained, and the analysis of these results. Section 6 presents the conclusion and future works.
2 Theoretical Background

2.1 Glaucoma Disease and Eye Fundus Image
Glaucoma is one of the leading causes of irreversible blindness. The disease is defined by progressive damage to the optic nerve head and the retinal nerve fiber layer. The optic disc area is where the various axons of the human eye are located. The grouping of these axons forms the nerve fiber bundle called the optic nerve, which carries incoming information to the brain [11]. The area of the disc that is not filled by nerves forms a depression called a cup. The excavation of the optic nerve is natural for any human being. However, an increase in the size of this excavation is the first sign that the person could be affected by glaucoma, because the nerve fibers die and enlarge the excavation. With this increase in excavation, the patient's visual quality is damaged [13]. Fundus images are medical images that help detect and monitor eye diseases. Figure 1 demonstrates a fundus image and its anatomy. The main area of interest for the diagnosis of glaucoma is the optic disc, indicated in Fig. 1 by the circle. The yellowish area inside the optic disc represents the cup. Analyzing the size of the cup in relation to the optic disc is one of the first ways to identify glaucoma, but it is not the only metric used to diagnose that the patient has glaucoma. For this reason, identifying the disease, especially in the early stages, is a challenging task that requires specialized ophthalmologists [13]. Glaucoma is progressive and asymptomatic, and symptoms occur when the disease reaches an advanced stage of blindness, causing a reduction in the
patient’s quality of life. Therefore, early diagnosis is of great importance for intervention to slow the progress of the disease [13]. With advances in machine learning techniques to assist in the medical field, deep learning algorithms have been developed to detect suspected cases of glaucoma based on medical images [9].
Fig. 1. Fundus image. Adapted from [13]
2.2 Convolutional Neural Networks and Model Interpretability
The goal of image classification tasks is that the model learns to classify the data correctly. For conventional models to perform these tasks with good results, they need an excellent extractor of specific features: when it comes to images, the models must be insensitive to variations such as position, illumination, and orientation, while remaining sensitive to minimal and specific variations. Achieving this feature extraction with conventional models requires considerable knowledge of the subject. At this point, deep learning models are advantageous because their architecture can do the feature extraction automatically [3]. Convolutional Neural Networks (CNN) are deep learning models responsible for the significant increase in the use of DL for image recognition. The classification of the input is given by a score that represents how strongly the received input belongs to a specific class [1,12]. Adding deep layers to the model makes it robust and able to solve tasks that were not possible before. However, it makes the architecture complex, letting the model act like a black box that does not provide detailed information about how it works and how it arrived at its predictions [8].
Interpretability techniques make the machine learning model more understandable as to how it works and explain why it predicts what it predicts. They can be divided into two categories: transparent and post-hoc methods. In transparent methods, the structure of the model is designed to explain how the model acts, working in the opposite way to black-box models. Post-hoc methods are used on already trained models, focusing on explaining how the model makes its predictions, either by visualization or by natural language explanation. One of the advantages of post-hoc methods over transparent methods is that the post-hoc method does not compromise network performance, because it is applied to an already trained model [6,10]. Among the existing post-hoc interpretability methods, the pixel attribution explanation technique aims to explain the contribution of each pixel to the classification [8]. Layer-wise relevance propagation (LRP) is a post-hoc interpretation method of explanation by pixel assignment that uses the gradient to obtain a pixel-wise decomposition. The method proposed by [2] propagates the prediction in the opposite direction through the artificial neural network until it reaches the first layer. From a classification f(x), the decomposition of this relevance is obtained for the neurons of the previous layer. Ultimately, it is possible to convert this classification into a heatmap for visualization. The LRP method has different relevance redistribution rules, and each one generates a different heatmap. The Gradient-weighted Class Activation Map (grad-CAM) is a post-hoc interpretability method that does not require the model to change its architecture or be retrained to achieve interpretability. The grad-CAM technique uses the gradient information from the last convolution layer of the CNN to calculate the importance of each neuron for classification. As the neurons of this layer retain spatial information about parts of the input image, it is possible to obtain a visual interpretation of the most important areas of the image for classification.
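To make the grad-CAM computation concrete, the sketch below derives a class-activation heatmap for a Keras CNN with TensorFlow's GradientTape. It is our illustration, not code from the paper; the layer name last_conv_name and the preprocessed image batch x are placeholders that depend on the actual model.

```python
import tensorflow as tf

def grad_cam(model, x, last_conv_name, class_index):
    # Model that maps the input to (last conv feature maps, predictions).
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])

    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(x)      # x: batch of shape (1, H, W, 3)
        score = preds[:, class_index]         # score of the class to explain

    # Gradient of the class score w.r.t. the last convolutional feature maps.
    grads = tape.gradient(score, conv_maps)
    # Neuron importance: global-average-pool the gradients over the spatial axes.
    weights = tf.reduce_mean(grads, axis=(1, 2))       # shape (1, n_channels)

    # Weighted combination of the feature maps, followed by ReLU.
    cam = tf.reduce_sum(conv_maps * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)
    cam = tf.nn.relu(cam)[0].numpy()

    # Normalize to [0, 1] so it can be upsampled and rendered over the input image.
    return cam / (cam.max() + 1e-8)
```

The resulting low-resolution map is upsampled to the input size and overlaid on the fundus image, which mirrors the heatmaps produced by grad-CAM libraries such as tf-keras-vis.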
3 Related Works
Advances in deep learning for image recognition tasks have inspired some studies to use these techniques to diagnose glaucoma in fundus images. Some works apply interpretability techniques to obtain a visual explanation, making these models more explainable. In the work of [5], some CNN models were implemented for detecting glaucoma in fundus images using a set of more than 1900 images for training and testing. The models were also validated with an external database. The experiments were performed using an NVIDIA GeForce GTX TITAN X GPU. The trained architectures were VGG 16, ResNet 152, and Inception-v4. Among the architectures, the best result was achieved with ResNet 152 with changes in the configuration, trained with the set of images cropped in the optic disc area. This model achieved an accuracy of 96%. Then, the application of grad-CAM on the models allowed the visualization of the most important areas in the input images. This work observed that the model presented low accuracy when trained with external image sets.
In the work of [7], a CNN was proposed for detecting glaucoma, focusing on interpretability, and it was later integrated into a mobile application. For this, fundus images from several datasets were merged. The result of the model's classification is analyzed together with the calculation of morphological characteristics to make the decision. For training the MobileNetV2 classification architecture, an Nvidia Tesla V100 GPU was used, and for the evaluation, other architectures were also considered, such as VGG16, VGG19, InceptionV3, and ResNet50. The application of grad-CAM to the model generated images that helped to increase the reliability of the predictions made by the CNN. These studies have shown that CNN architectures present promising results for classifying medical images, aiding in the detection of glaucoma. Applying the grad-CAM interpretability technique allowed the visualization of the most important areas for classifying images of a specific class. This application of interpretability brought more reliability to the predictions made by the CNN, as it provides a visualization of which areas the model uses to predict. In this work, we propose implementing a CNN model for the same class of problem. However, in addition to applying the grad-CAM interpretability technique, the LRP technique is also applied to the trained models. LRP provides a more localized view than grad-CAM because, in its implementation, LRP propagates back to the CNN input layers, while grad-CAM only goes to the last convolution layer.
4 Proposed Model
This work aimed to apply two interpretability techniques to a CNN developed to classify glaucoma fundus images. The scope of this work can be divided into three steps, as shown in Fig. 2. The first step is the pre-processing of input images. Then, the CNN model is developed, trained, and validated in the second step. The last step is the application of interpretability techniques to the trained models. The pre-processing performed on the images was to resize them to the CNN input size of 224 × 224 pixels. The images from each database were randomly separated into a 70% training set and a 30% test set. Balancing was done to ensure the same number of images for each class, glaucoma and normal. The architecture chosen for the CNN was the Residual Neural Network (ResNet) [4]. This CNN architecture is characterized by shortcut connections that skip groups of convolutional layers. The architecture implementation was done with the Keras library (https://keras.io). A version of ResNet pre-trained on the ImageNet database was used. Among the ResNet models, ResNet-50 was chosen, which has 50 deep layers, followed by a pooling layer and a dense layer with a softmax function. The trained models are evaluated using the accuracy metric, which measures how many correct predictions were made against all predictions made on the test set. A total of 10 runs of the proposed models was carried out to obtain the average and standard deviation of the accuracy.
Fig. 2. Proposed model flowchart.
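As an illustration of the architecture and pre-processing described above, the following is a minimal Keras sketch (our illustration, not the authors' code): the directory layout, the trainable backbone, and the random split are assumptions, and in practice the inputs should also pass through ResNet's preprocess_input.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

# ImageNet-pretrained ResNet-50 backbone without its original classification head.
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),        # pooling layer
    tf.keras.layers.Dense(2, activation="softmax"),  # glaucoma vs. normal
])

# 70/30 random split with resizing to the CNN input size (TensorFlow >= 2.9 API).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus_images/", validation_split=0.3, subset="training",
    seed=1, image_size=IMG_SIZE, batch_size=8)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus_images/", validation_split=0.3, subset="validation",
    seed=1, image_size=IMG_SIZE, batch_size=8)
```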
After the model was trained and consolidated, interpretability methods were used to explain pixel assignment. As the methods are applied to already trained models, it is important to ensure the models have good accuracy, so that the interpretability result is more accurate and understandable. The two techniques used for interpretability were LRP and grad-CAM. From an input image, an already trained model makes the classification, and the value of the prediction made by the model is used by the interpretability techniques to compute the pixel attribution for that classification. Both techniques return values that can be converted into a heatmap where it is possible to visualize the most critical parts of the prediction made by the model. The two interpretability techniques are similar in objective but differ in how the heatmap is calculated. The red points in the images generated by LRP show the most important areas, while the images generated by grad-CAM highlight broader red areas. The Keras-vis library (https://keisen.github.io/tf-keras-vis-docs/) was used for grad-CAM and the iNNvestigate library (https://github.com/albermax/innvestigate) for LRP.
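The LRP side of this step reduces to a few library calls; the snippet below is a hedged sketch based on iNNvestigate's documented usage (the analyzer rule, the softmax-stripping helper, and the exact call names may differ between library versions, and model/x are the trained Keras model and a preprocessed batch):

```python
import numpy as np
import innvestigate  # iNNvestigate 2.x-style API (an assumption)

# LRP is applied to the pre-softmax scores, so the output softmax is removed first.
model_wo_softmax = innvestigate.model_wo_softmax(model)

# "lrp.epsilon" is one of the relevance redistribution rules offered by the library.
analyzer = innvestigate.create_analyzer("lrp.epsilon", model_wo_softmax)

relevance = analyzer.analyze(x)           # x: preprocessed batch, shape (1, 224, 224, 3)
heatmap = relevance.sum(axis=-1)[0]       # aggregate channel relevance per pixel
heatmap /= np.abs(heatmap).max() + 1e-8   # normalize for visualization
```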
5 Experiments, Results, and Analysis
To perform the experiments, the Linux Ubuntu 20.04 operating system was used on hardware with an AMD Ryzen 7 2700X (3.7 GHz/4.35 GHz, 16 CPUs), 16 GB RAM, 1 TB HD, a 240 GB SSD, and a GeForce RTX 2060 Super GPU. For the implementation, the Python language was used. For the LRP application, the iNNvestigate library was employed. For the grad-CAM application, the Keras-vis library was employed. The implementation of the CNN model was done with the Keras library. Two public databases were used to perform the experiments: RIM-ONE (https://github.com/miag-ull/RIM-ONE-dl) and ACRIMA (https://figshare.com/ndownloader/files/14137700). Both contain images from eye fundus exams with the presence or absence of glaucoma. These databases were chosen because their images are cropped in the area of the optic disc. A third database, called ACRIMA+RIM-ONE (https://drive.google.com/file/d/1Z49SDwFvRoCzavtFaECHvqVUF yVGOVZ), was created by merging both previous databases. Some images from the ACRIMA database were noisy, so they were excluded from the merged database. The number of images in each database can be seen in Table 1. From the databases, three models were implemented: model A, trained with the ACRIMA database; model B, trained with the RIM-ONE database; and model C, trained with the merged ACRIMA+RIM-ONE database. For the training step, 200 epochs and a batch size of 8 were used. The Adam optimizer was employed, with the hyperparameters of each model empirically defined as shown in Table 2. The results obtained in the test set by each model are presented in Table 3. Model A was also applied to the test set of the RIM-ONE database, model B was also applied to the test set of the ACRIMA database, and model C was also applied to the test sets of the ACRIMA and RIM-ONE databases. In Table 3, highlighted results indicate the database on which the model in the column was trained. Analyzing the highlighted results, it is possible to notice that all three models achieved outstanding results, demonstrating that they can correctly classify the fundus images when tested on their respective training databases. Although all databases are in the same application context, the results obtained by models A and B when applied to a database different from the one used for training were very poor. This indicates that each database has specific features that are not perceptible when visually inspecting the fundus images. From Table 3, model C captured the features of both databases, achieving accurate results in all experiments. Concerning processing time, the use of the GPU enabled the acceleration of the training process of all models. Comparing the processing time of training the models with and without the GPU, a speedup of 6x was achieved on average.
Table 1. Image quantity in each dataset.

Name             Glaucoma  Normal
ACRIMA           396       309
RIM-ONE          172       313
ACRIMA+RIM-ONE   541       579
Table 2. Parameters for each model.

Model  Learning Rate  Epsilon  Decay
A      0.0005         1e−10    0.01
B      0.0001         1e−11    0.001
C      0.0001         1e−10    0.001
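Continuing the architecture sketch above, the training configuration of model A from Table 2 translates into the hedged sketch below (the loss function is an assumption, and the decay argument follows the Keras optimizer API of TensorFlow <= 2.10; newer releases express it through a learning-rate schedule).

```python
import tensorflow as tf

# Hyperparameters of model A (ACRIMA); models B and C only change the values.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0005, epsilon=1e-10, decay=0.01)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The batch size of 8 is set when the datasets are built; 200 epochs as in the text.
history = model.fit(train_ds, epochs=200)
test_loss, test_acc = model.evaluate(test_ds)
```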
Table 3. Results obtained in the test set for each model in each database.

Dataset          Model A  Model B  Model C
ACRIMA           99.37%   58.27%   99.24%
RIM-ONE          50.08%   96.87%   99.81%
ACRIMA+RIM-ONE   –        –        99.70%
After training the CNNs, the interpretability techniques were applied to models A, B, and C. Figure 3 compares the results obtained in applying the LRP and grad-CAM techniques for all three models. The input images used to verify interpretability were images present in the test set of each model and labeled as glaucoma. It is possible to notice that, for the three models, the images resulting from the application of the interpretability techniques are similar with respect to the region of greatest importance: the red area is concentrated in the optic nerve region. As the pixel attribution calculation is different between the two techniques, the points highlighted by LRP form more concentrated areas of interest than those of grad-CAM, which presents larger areas. This indicates that these were the most critical parts of the image for the model to classify as glaucoma. Hence, with the application of the two techniques, LRP and grad-CAM, complementary information in the visual analysis can be achieved.
Fig. 3. Interpretability Results.
6 Conclusion and Future Work
Ophthalmology deals with the structure, functions, and diseases of the eye, such as glaucoma, which can lead to total loss of vision in its advanced stages. Early glaucoma diagnosis is essential to prevent disease progression. Advances in deep learning for image recognition have enabled models to be developed to support medicine in diagnosing diseases using medical images. In this work, a CNN ResNet50 model was developed to detect glaucoma from eye fundus images. Also, two interpretability techniques, LRP and grad-CAM, were applied in order to provide a visual interpretation of which input image pixels were most critical for classifying a given class. Two publicly available labeled eye fundus image databases for glaucoma diagnosis, ACRIMA and RIM-ONE, were used in the experiments. Models trained and tested on these databases achieved outstanding accuracy results of 99.37% and 96.87%, respectively. However, when applying the CNN models to a database different from the one used for training, the results were very poor, indicating that each database has specific features that are not perceptible when visually inspecting the fundus images. Hence, a merged database was created in order to aggregate all features from both the ACRIMA and RIM-ONE databases. The model trained on the merged database achieved outstanding results in all test sets, indicating that the model is robust and able to capture the features of both databases. Concerning the interpretability of the results obtained, with the application of the two techniques, LRP and grad-CAM, complementary information in the
visual analysis can be achieved. In most cases, the most important points for the prediction were in the region of the optic nerve. Some future work directions are to release the system to a team of experts to analyze and validate the interpretability results and to apply the system to other eye fundus images for glaucoma diagnosis. Acknowledgements. This work received financial support from the Coordination for the Improvement of Higher Education Personnel - CAPES - Brazil (PROAP/AUXPE) 0093/2021 and from the FAPESC agency.
References

1. Aloysius, N., Geetha, M.: A review on deep convolutional neural networks. In: 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 0588–0592 (2017)
2. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, 1–46 (2015)
3. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2015)
5. Kim, M., et al.: Medinoid: computer-aided diagnosis and localization of glaucoma using deep learning. Appl. Sci. 9(15), 3064 (2019)
6. Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 31–57 (2018)
7. Martins, J., Cardoso, J.S., Soares, F.: Offline computer-aided diagnosis for glaucoma detection using fundus images targeted at mobile devices. Comput. Methods Programs Biomed. 192, 105341 (2020)
8. Molnar, C.: Interpretable Machine Learning. 2nd edn. (2022)
9. Ng, W.Y., et al.: Updates in deep learning research in ophthalmology. Clin. Sci. 135(20), 2357–2376 (2021)
10. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization, pp. 618–626 (2017)
11. Stein, J.D., Khawaja, A.P., Weizer, J.S.: Glaucoma in adults-screening, diagnosis, and management. JAMA 325(2), 164–174 (2021)
12. Thompson, A.C., Jammal, A.A., Medeiros, F.A.: A review of deep learning for screening, diagnosis, and detection of glaucoma progression. Transl. Vis. Sci. Technol. 9(2), 42 (2020)
13. Weinreb, R.N., Aung, T., Medeiros, F.A.: The pathophysiology and treatment of glaucoma: a review. JAMA 311(18), 1901–1911 (2014)
Ensemble Learning Based Big Data Classification for Intrusion Detection

Kamel Yasmine Kamel(B), Farah Jemili(B), and Rahma Meddeb(B)

Université de Sousse, Higher Institute of Computer Science and Communication Technologies of Hammam Sousse, 4011 Hammam Sousse, Tunisia
[email protected], jmili [email protected], [email protected]
Abstract. The growth of technology has made life much easier with its speed, but we cannot deny that it suffers from multiple security problems. Therefore, the Intrusion Detection System appears to overcome these problems; it is indeed a support layer to maintain information security and keep suspicious activity off the network. To date, Machine Learning algorithms are useful for Intrusion Detection Systems to enable them to identify security threats. Based on Machine Learning, the paper evaluates the performance of classification algorithms: Random Forest, eXtreme Gradient Boosting, Decision Tree and Naive Bayes. The comparison of the overall performance is done in terms of detection accuracy. Thus, the experimental results on the recent public CICIDS2017, N-BaIot and NSL-KDD datasets, used to ensure the detection of network intrusions, prove the high performance of Random Forest compared to the other classifiers. In this paper, Big Data techniques are employed in Intrusion Detection Systems to process and evaluate Big Data, with the aim of ensuring an efficient and accurate data analysis process.

Keywords: Intrusion Detection System · Ensemble Learning · Machine Learning Algorithm · Random Forest · eXtreme Gradient Boosting · Decision Tree · Naive Bayes
1 Introduction
Traditional security and privacy challenges continue to grow. These issues represent significant barriers to the deployment of various systems and their widespread adoption. Intrusion detection systems (IDS) provide a solution to address security and privacy challenges by detecting multiple attacks. Intrusion detection has been an important area of work for over three decades. Researchers have gained a better understanding of intrusion detection within networks, as well as of security requirements. Many researchers have discussed the research issues related to intrusion detection in the distributed environment. As a result, the security and privacy of IDS for the distributed environment is essential and a priority. Intrusion detection is an important security problem in the cyber
world. A large number of techniques based on machine learning approaches have been developed. It should also be noted that they are not very useful for ensuring the identification of all types of intrusions. Over the past decade, the application of machine learning techniques to the problem of intrusion detection has increased to maintain improved adaptability and detection rates. Thus, these techniques are often used in the hope of updating knowledge bases on attacks. Indeed, the machine learning techniques used are able to achieve high detection rates, reasonable communication and computation costs, and lower false alarm rates. ML is a subset of Artificial Intelligence (AI). It essentially allows the system to learn and advance its automatic capability using experience, without any explicit programming. In the case of an Intrusion Detection System (IDS), ML algorithms perform better in detecting attacks on a large amount of data in less time. Specifically, ML algorithms are classified into three categories: semi-supervised, unsupervised and supervised [16]. In fact, a supervised algorithm processes and evaluates data labeled by class, then it finds the relationship that connects the data and their class. This can be done by regression or by classification. Thus, classification is divided into two phases: training and testing. The training data is formulated using a response variable. Some common algorithms used for classification are: support vector machine (SVM), Naïve Bayes, logistic regression, neural network, random forest and decision tree. In this work, we have integrated algorithms such as eXtreme Gradient Boosting (XGBoost), testing each time with a dataset in order to explore their techniques. Thus, these algorithms belong to several learning approaches. Indeed, we have implemented both types of ensembles: heterogeneous and homogeneous. The first consists of several types of classifiers, and the second consists of several instances of the same type of model. The field of machine learning plays an important role nowadays; it offers a solution to overcome the challenges of Big Data. In fact, machine learning refers to a group of algorithms or modeling techniques with the ability to learn from data and make decisions without any human intervention. Machine learning techniques are useful in many situations where a large amount of data is available. To address Big Data problems, machine learning offers a modular and scalable strategy for dealing with data. Big Data consists of a large amount of semi-structured, unstructured and structured data in heterogeneous formats. Therefore, because of this amount of data, the traditional intrusion processing system does not have the capacity to overcome the problems. In addition, the use of an ML algorithm is essential to ensure an IDS for a Big Data environment. In this regard, Big Data provides an unprecedented richness of information to the various ML algorithms for extracting the underlying patterns and creating predictive models. It should also be noted that traditional Machine Learning algorithms sometimes face critical challenges, such as scalability, to unlock hidden value from Big Data. The traditional detection system is not able to detect intrusions quickly due to the large volume and high speed of data. In order to assess the intrusions, Big Data techniques are useful. Thus, Big Data is defined by 7 V's which are volume: the size of data; velocity: the speed at which data is generated; variety: multiple
types of data; value: the value intended for the data; veracity: reliability and quality of the data; variability: the permanent change in the meaning of data; and visualization: the ease of reading or accessing the data. In addition, the exponential growth rate of data makes the traditional data processing system complex, because it requires a large number of resources and more time. Naturally, Big Data is complex to analyze and requires advanced algorithms and powerful technologies [1,8,15]. In this paper, we first present the methodology used, i.e., the proposed approach based on simple and hybrid learning techniques. Then, we test it with three datasets. In the next section, we compare the methodologies and discuss the results. Finally, we perform the classification with Big Data.
2 Related Works
Several types of research have been introduced for intrusion detection systems. With the emergence of Big Data, the old techniques are no longer effective in dealing with Big Data. As a result, several researchers intend to follow the approach and techniques of Big Data to form a scalable, fast and accurate intrusion detection system. In this section, we present some researchers who have used Big Data machine learning techniques in their evaluation for intrusion detection to maintain classification with Big Data (Tables 1 and 2).

Table 1. Related Works for Intrusion detection systems

Ref  | Authors               | Research Statement                              | Techniques                                            | Main Works
[6]  | Dhakar et al.         | Hybrid Intrusion Detection Systems              | Naïve Bayes elevated tree small error pruning         | Detect unknown intrusions and minimize the false alarm rate
[26] | Marcelo et al.        | Map reduce for intrusion detection              | Map reduce                                            | Time reduction
[4]  | Chitrakar et al.      | Anomaly based Intrusion Detection               | Naïve Bayes classification and K-Medoids clustering   | Elevate the detection rate and minimize the false alarm rate
[23] | Rachana Sharma et al. | Classified big data intrusion detection systems | K-Nearest Neighbor (KNN)                              | Mapreduce method with K-Nearest Neighbor classifiers, detection rate and FPR
[10] | Jingwei Huang et al.  | Classified big data intrusion detection systems | LDA                                                   | Detection rate
Sharafaldin et al. [22] created an efficient CICIDS2017 dataset, which includes common attack flows and benign network flows. These researchers evaluated the accuracy and performance of the selected features with common machine learning algorithms, such as Random Forest, K-Nearest Neighbor, Adaboost, Naive Bayes, etc. Indeed, a comparison between the newly produced dataset and publicly available datasets from 1998 until 2016 was made, considering 11 criteria that expose the current criticisms and errors of the old datasets. As a result, the analysis proves that the new dataset addresses these criticisms and errors. Tama [19] evaluated the performance of an IDS by measuring the accuracy and false alarm rate using a Random Forest classifier, applying a 10-fold cross-validation technique. In addition, for the experiment, Tama used IDS datasets such as NSL-KDD and GPRS. Then, he compared the results with Decision Tree, Multi-Layer Perceptron (MLP), and NB-Tree classifiers. Finally, the analysis of these results proved the effectiveness and reliability of the proposed model with the Random Forest classifier using a cross-validation technique and parameter settings. Belouch et al. [2] addressed the performance of the Random Forest, SVM, Decision Tree, and Naïve Bayes classification algorithms for IDS with Apache Spark. Then, they evaluated the overall performance on the UNSW-NB15 dataset in terms of training time, accuracy and prediction time. Vimalkumar and Radhika [25] proposed a Big Data framework for detecting intrusions in smart grids by applying algorithms such as SVM, Naïve Bayes, DT and Random Forest. In this regard, they used PCA to reduce dimensionality and a correlation-based method to select features. This approach ensures the reduction of the attack prediction time and the elevation of the accuracy of the classification task. This work was carried out by training and testing with the Synchrophasor dataset. Finally, the results were compared in terms of FPR, recall, specificity and accuracy rate. Ferrag et al. [7] evaluated deep learning models on the CICIDS2018 and Bot-IoT datasets [13]. The models used include restricted Boltzmann machines, RNNs, convolutional neural networks (CNNs), deep autoencoders, deep belief networks and deep neural networks. The Bot-IoT dataset was created in 2018 at the University of New South Wales (UNSW); it consists of about 72,000,000 normal and botnet Internet of Things (IoT) instances containing 46 features. The evaluation was performed on Google Colaboratory by applying TensorFlow with GPU acceleration and Python. Moreover, they used only 5% of the instances, as in [13]. It is worth noting that the best precision for the Bot-IoT dataset is equal to 98.39%, achieved by applying a deep autoencoder, while for the CICIDS2018 dataset the best reached the value of 97.38% using an RNN. The highest recall for Bot-IoT is about 97.01%, obtained with a CNN; similarly, the highest recall for the CICIDS2018 dataset is equal to 98.18%, found using a deep autoencoder. Chun Guo et al. [18] developed a hybrid learning method named distance sum-based SVM (DSSVM) to train an efficient IDS. In this method, the sum of distances is determined based on the correlation between each data sample and the feature dimensions of the centers of the dataset. As an example, let
us consider the hybrid IDS scheme based on the SVM algorithm developed by Li et al. [14]. In this proposal, a GFR method and a hybrid SVM method were developed, which allowed a detection rate of up to 98.63% to be reached. The GFR method is initially used for the purpose of extracting 19 critical features to represent multiple network visits. Then, they applied the SVM algorithm to classify the situation for decision management, i.e., to decide whether the visit is normal or abnormal. The research results cited in the bibliography regarding intrusion detection using machine learning algorithms are intended to solve the problem of improving the performance of IDS using new machine learning approaches. In what follows, a table presents analyses of research work done on machine learning algorithms in IDS that are addressed to Big Data. Researchers are always trying to find a reliable method to detect intrusions with a low false alarm rate, high performance and high speed. The main goal of this paper is to increase the speed of intrusion detection in a Big Data environment and improve the performance. In this work, as in the work of several researchers, Apache Spark Big Data tools are employed due to their speed compared to Hadoop.
3 Existing Techniques

3.1 Classifications of an IDS for Network Analysis
Security breaches consist of internal and external intrusions. There are three main types of cyber analysis techniques: signature- or misuse-based, hybrid, and anomaly-based. The misuse techniques are used for the detection of known attacks using the signatures of these attacks. They are attractive and relevant for the detection of known attacks while avoiding excessive false alarms. Specifically, they require manual updates of the database of signatures and rules. As a result, misuse techniques are not able to detect new attacks. Another type of technique is anomaly-based detection, which models normal system and network behavior and designates anomalies as deviations from normal behavior. Indeed, these techniques are effective due to their ability to detect zero-day attacks. In addition, they are characterized by normal activity profiles that are customized for each network, model or application. This sometimes creates difficulties for attackers, such as their inability to know which activities they can conduct without being detected. Another advantage is that data from anomaly-based techniques can be used to identify signatures for misuse sensors. Among the well-known disadvantages of these techniques is the high FAR, because novel system behaviors can sometimes be flagged incorrectly [11,17]. The last type of technique is the hybrid one, which is a mixture of the two previous detection techniques, i.e., a combination of anomaly and misuse detection. In fact, hybrid techniques are designed to minimize the FP rate for unknown attacks and maximize the detection rate of known intrusions. Research and analysis did not uncover many pure anomaly detection methods, of course, because the majority of the techniques were hybrids [24]. In conclusion, signature-based techniques are
used specifically to detect known attacks, as opposed to anomaly-based detection, which is used to detect unknown behavior within the network; hybrid detection combines both detection techniques. In addition, there are four categories of cyber-attacks: remote to local (R2L), probing, denial of service (DoS) and user to root (U2R). In the case where a remote user tries to gain access as a local user, the attack is called R2L [9]. In contrast, if a legitimate user is denied access to the system because the network resources are kept occupied, the attack is called DoS. Finally, in the case where a user tries to obtain the access rights of a root/admin user, we speak of the U2R attack.

3.2 Ensemble Learning
Ensemble learning is a popular method that has spread across the fields of data mining and artificial intelligence. It combines several learning algorithms with the intention of drawing strength from weak algorithms in order to ensure the effectiveness of the classifier. In the following, we evaluated three ensemble classification techniques, bagging, boosting and stacking, using several weak classifiers. Bagging stands for bootstrap aggregating; it is a simple and effective ensemble method for improving unstable classification problems. This method is useful for high-dimensional data, for example intrusion datasets, where it is difficult to find a good classifier that works in one step because of the complexity and scale of the data [5]. Boosting is an ensemble method designed to turn weak classifiers into a strong classifier. Indeed, this technique is one of the model averaging methods designed mainly for classification, but it can also be applied to regression. In fact, boosting ensures sequential learning of the predictors. Thus, the first predictor learns through the dataset, but the next one learns through training sets based on the performance of the previous predictor [12]. Poorly classified examples are scored and their weight increases significantly, so that they have a high probability of appearing in the next predictor's learning set. As a result, we obtain several machines specialized in predicting multiple domains of the dataset. In this evaluation, we focused initially on the AdaBoost algorithm; this boosting technique is very useful for building a strong classifier [12]. Stacking, or stacked generalization, is another method of combining different classifiers. This technique is not similar to boosting or bagging since it is used to group multiple classifiers, such as NB, DT, etc. [3]. Thus, stacking has two levels: the first is the base learner (level 0) and the second is the stacking model learner (level 1). The base learner (level 0) typically uses different models to learn well from a dataset [21]. Then, the outputs of each of the models are combined to build a new dataset. The resulting new dataset consists of instances, each of which is related to the actual value to be predicted. Finally, this dataset is exploited by the model learner to give the final output [20].
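As a hedged illustration of these three schemes, the sketch below builds one bagging, one boosting and one two-level stacking classifier with scikit-learn; the base learners, the meta-learner and the feature matrix/labels are placeholders, not the exact configuration evaluated later.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Bagging: many trees trained on bootstrap samples of the training set.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: AdaBoost sequentially reweights the poorly classified examples.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)

# Stacking: level-0 base learners feed their predictions to a level-1 meta-learner.
stacking = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("rf", RandomForestClassifier()),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000))

# X_train, y_train: features and labels of a network-traffic dataset (placeholders).
# for clf in (bagging, boosting, stacking):
#     clf.fit(X_train, y_train)
```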
4 Proposed Techniques

4.1 Classification Using Big Data Techniques
In this assessment, we present an approach that relies on Apache Spark Structured Streaming to maintain intrusion classification. HDInsight handles the installation and configuration of Apache Spark, so this approach gains distributed storage and processing. This work involves the following phases (a sketch of the streaming classification step is given after Fig. 1):
- Collecting traffic data from the Fukuda Lab using a database to obtain an up-to-date dataset.
- Downloading the files after collection via the Fukuda Lab website and then transferring them to Azure Blob.
- Linking the data directly to Spark at the time the data is put into Azure Blob.
- Pre-processing and cleaning the data.
- Classifying the data using the ML model.
- Storing the resulting classifications in Azure Data Lake Store.
Thus, our system aims to achieve a cost-effective and efficient intrusion detection system that adapts to the workload, with significant processing power if there is an increase in workload. This evaluation must provide a decrease in data analysis time using Apache Spark, which is a distributed Big Data framework (Fig. 1).
Fig. 1. Intrusion classification diagram
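The hedged sketch below shows the Spark side of this pipeline; the storage URIs, the record schema and the model path are illustrative placeholders, and a Spark ML pipeline (feature assembler plus classifier) is assumed to have been fitted offline beforehand.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StructField, StructType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("IntrusionClassification").getOrCreate()

# Schema of the incoming traffic records (placeholder feature names).
schema = StructType([StructField(f"f{i}", DoubleType()) for i in range(20)])

# New CSV files landing in the blob-backed folder are picked up as a stream.
traffic = (spark.readStream.schema(schema)
           .csv("wasbs://traffic@<account>.blob.core.windows.net/incoming/"))

# Previously fitted Spark ML pipeline: feature assembly + classifier.
model = PipelineModel.load("/models/intrusion_rf")
scored = model.transform(traffic)

# Classified records are written continuously to the data lake store.
query = (scored.writeStream.format("parquet")
         .option("path", "adl://<store>.azuredatalakestore.net/classified/")
         .option("checkpointLocation", "/checkpoints/intrusion")
         .start())
query.awaitTermination()
```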
4.2 Evaluation Method
The proposed simple and hybrid models were evaluated in terms of false positive rate (false alarms) and F-score; here we also report the results in terms of accuracy. The evaluated algorithms have many hyperparameters that can be tuned for the classification tasks; we present some of the parameters used to obtain the results. On the N-BaIot dataset, which considers all 115 features, we used these hyperparameters: RF (max_depth = 8, n_estimators = 130), XGBoost (base_score = 0.3, n_estimators = 10), DT (max_depth = 6, random_state = 2, splitter = random). On the CICIDS2017 dataset, the following hyperparameters were used: RF (n_estimators = 1), XGBoost (base_score = 0.3, n_estimators = 10), DT (criterion = gini, max_depth = 5, random_state = 2, splitter = random).
Table 2. Results for single and hybrid algorithms.

Dataset      Algorithm    ACC %  FNR     FPR     Detection Rates
N-BaIot      DT           95.74  0.003   0.01    0.59
N-BaIot      XGBoost      96.23  0.005   0.015   0.59
N-BaIot      RF           96.31  0.001   0.001   0.59
N-BaIot      DT+RF        96.36  0.001   0.001   0.59
N-BaIot      XGBoost+DT   96.25  0.001   0.0007  0.59
N-BaIot      RF+XGBoost   96.39  0.0007  0.0007  0.59
NSL-KDD      DT           94.79  0.026   0.08    0.51
NSL-KDD      XGBoost      95.35  0.01    0.08    0.52
NSL-KDD      RF           97.14  0.003   0.056   0.52
NSL-KDD      NB           90.94  0.039   0.14    0.5
NSL-KDD      DT+RF        98.21  0.019   0.016   0.52
NSL-KDD      XGBoost+DT   96.30  0.019   0.056   0.52
NSL-KDD      RF+XGBoost   98.21  0.019   0.016   0.52
CICIDS2017   DT           95.86  0.006   0.05    0.59
CICIDS2017   XGBoost      96.2   0.001   0.05    0.59
CICIDS2017   RF           96.30  0.001   0.01    0.59
CICIDS2017   NB           86.81  0.02    0.06    0.57
CICIDS2017   DT+RF        96.24  0.001   0.0007  0.59
CICIDS2017   XGBoost+DT   96.08  0.001   0.0007  0.59
CICIDS2017   RF+XGBoost   96.36  0.0007  0.0007  0.59
Finally, for the NSL-KDD dataset, we also used specific hyperparameters for each classification algorithm: RF (max_depth = 40), XGBoost (base_score = 0.5, n_estimators = 5), DT (max_depth = 3, random_state = 2, splitter = random). As can be seen in the table, for the simple algorithms the best accuracy, 97.14%, is obtained by applying the Random Forest algorithm. Similarly, by performing a combination of Decision Tree and Random Forest or a combination of XGBoost and Random Forest, we obtain an accuracy of 98.21%. The change of hyperparameters is the main reason for this improvement. These experiments also prove that ensemble classifiers based on the combination of Random Forest and eXtreme Gradient Boosting can improve the performance of the classifiers. The ensemble of RF and XGBoost produces the highest sensitivity, and the ensemble of RF and Decision Tree has the highest specificity, following tests on datasets with selected features. However, the combination of DT and XGBoost is the least efficient. In this work, we have proposed a detailed architecture to model our approach, focusing on ensemble learning to ensure better detection of network intrusion data, which is complicated to detect. The testing technique performed on simple machine learning algorithms may not be the optimal solution, so the strategy of combining the tested algorithms on many datasets is used to get
better results in terms of accuracy. This strategy can improve the detection rate of intrusive attacks in an efficient way compared to older approaches based on a simple learning model. Also, the experimental results show that feature reduction can significantly increase the attack detection efficiency of the classifiers in several weak cases. Multiple algorithms and models for standard classification problems are available and well known in the machine learning literature. However, only a small number of these algorithms are suitable for classification with a large volume of data. Machine learning methods such as random forest and decision tree are very suitable for a system that deals with security issues and for a system that evaluates big data classification challenges.
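As a hedged sketch, the best-performing hybrid on NSL-KDD (RF+XGBoost, with the hyperparameters quoted above) can be combined through soft voting as follows; the exact combination scheme used in the experiments is not detailed in the text, and the training/test splits are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier

rf = RandomForestClassifier(max_depth=40)
xgb = XGBClassifier(base_score=0.5, n_estimators=5)

# Soft voting averages the predicted class probabilities of the two models.
hybrid = VotingClassifier(estimators=[("rf", rf), ("xgb", xgb)], voting="soft")

# X_train, y_train, X_test, y_test: preprocessed NSL-KDD features and labels.
# hybrid.fit(X_train, y_train)
# print("accuracy:", hybrid.score(X_test, y_test))
```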
5 Conclusion
An intrusion detection system based on ML is considered a vital element in maintaining our security protection. Former deep learning and shallow learning strategies follow the single learning model approach to ensure better intrusion detection, but this single-model approach can struggle to properly capture the difficult data distribution of intrusion patterns. In this paper, the main goal of the proposed research is to ensure better classification for IDS on big data. The hybrid model has been implemented using network traffic data to provide intrusion classification. Finally, the results of the proposed hybrid model are more satisfactory and accurate. In future work, we plan to use deep learning algorithms with more network datasets.
References

1. Bagui, S., Li, K.: Resampling imbalanced data for network intrusion detection datasets. J. Big Data 8(1), 1–41 (2021). https://doi.org/10.1186/s40537-020-00390-x
2. Belouch, M., El Hadaj, S., Idhammad, M.: Performance evaluation of intrusion detection based on machine learning using apache spark. Procedia Comput. Sci. 127, 1–6 (2018)
3. Chand, N., Mishra, P., Krishna, C.R., Pilli, E.S., Govil, M.C.: A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection. In: 2016 International Conference on Advances in Computing, Communication, & Automation (ICACCA)(Spring), pp. 1–6. IEEE (2016)
4. Chitrakar, R., Huang, C.: Anomaly based intrusion detection using hybrid learning approach of combining k-medoids clustering and Naive Bayes classification. In: 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–5. IEEE (2012)
5. Chowdhury, R., Sen, S., Roy, A., Saha, B.: An optimal feature based network intrusion detection system using bagging ensemble method for real-time traffic analysis. Multimedia Tools and Applications, pp. 1–23 (2022)
6. Dhakar, M., Tiwari, A.: A novel data mining based hybrid intrusion detection framework. J. Inf. Comput. Sci. 9(1), 037–048 (2014)
7. Ferrag, M.A., Maglaras, L., Moschoyiannis, S., Janicke, H.: Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J. Inf. Secur. Appl. 50, 102419 (2020) 8. Hafsa, M., Jemili, F.: Comparative study between big data analysis techniques in intrusion detection. Big Data Cogn. Comput. 3, 1 (2019). https://doi.org/10.3390/ bdcc3010001. https://www.mdpi.com/2504-2289/3/1/1 9. Ho, S., Al Jufout, S., Dajani, K., Mozumdar, M.: A novel intrusion detection model for detecting known and innovative cyberattacks using convolutional neural network. IEEE Open J. Comput. Soc. 2, 14–25 (2021) 10. Huang, J., Kalbarczyk, Z., Nicol, D.M.: Knowledge discovery from big data for intrusion detection using IDA. In: 2014 IEEE International Congress on Big Data, pp. 760–761. IEEE (2014) 11. Jemili, F., Zaghdoud, M., Ahmed, M.B.: Intrusion detection based on “hybrid” propagation in Bayesian networks. In: 2009 IEEE International Conference on Intelligence and Security Informatics, pp. 137–142. IEEE (2009) 12. Kilincer, I.F., Ertam, F., Sengur, A.: A comprehensive intrusion detection framework using boosting algorithms. Comput. Electr. Eng. 100, 107869 (2022) 13. Koroniotis, N., Moustafa, N., Sitnikova, E., Turnbull, B.: Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset. Futur. Gener. Comput. Syst. 100, 779–796 (2019) 14. Li, Y., Xia, J., Zhang, S., Yan, J., Ai, X., Dai, K.: An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst. Appl. 39(1), 424–430 (2012) 15. Mahdavisharif, M., Jamali, S., Fotohi, R.: Big data-aware intrusion detection system in communication networks: a deep learning approach. J. Grid Comput. 19(4), 1–28 (2021) 16. Maseer, Z.K., Yusof, R., Bahaman, N., Mostafa, S.A., Foozy, C.F.M.: Benchmarking of machine learning for anomaly based intrusion detection systems in the cicids2017 dataset. IEEE Access 9, 22351–22370 (2021) 17. Meddeb, R., Jemili, F., Triki, B., Korbaa, O.: Anomaly-based behavioral detection in mobile Ad-Hoc networks. Procedia Comput. Sci. 159, 77–86 (2019) 18. Mukherjee, S., Sharma, N.: Intrusion detection using Naive Bayes classifier with feature reduction. Procedia Technol. 4, 119–128 (2012) 19. Primartha, R., Tama, B.A.: Anomaly detection using random forest: a performance revisited. In: 2017 International Conference on Data and Software Engineering (ICoDSE), pp. 1–6. IEEE (2017) 20. Rajagopal, S., Kundapur, P.P., Hareesha, K.S.: A stacking ensemble for network intrusion detection using heterogeneous datasets. Secur. Commun. Netw. 2020, 4586875 (2020) 21. Rashid, M., Kamruzzaman, J., Imam, T., Wibowo, S., Gordon, S.: A tree-based stacking ensemble technique with feature selection for network intrusion detection. Applied Intelligence, pp. 1–14 (2022) 22. Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: ICISSP, vol. 1, pp. 108– 116 (2018) 23. Sharma, R., Sharma, P., Mishra, P., Pilli, E.S.: Towards mapreduce based classification approaches for intrusion detection. In: 2016 6th International ConferenceCloud System and Big Data Engineering (Confluence), pp. 361–367. IEEE (2016) 24. Shaukat, K., et al.: Performance comparison and current challenges of using machine learning techniques in cybersecurity. Energies 13(10), 2509 (2020)
25. Vimalkumar, K., Radhika, N.: A big data framework for intrusion detection in smart grids using apache spark. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 198–204. IEEE (2017) 26. Wang, L., Jones, R.: Big data analytics for network intrusion detection: a survey. Int. J. Netw. Commun. 7(1), 24–31 (2017)
Application of Combined SWOT and AHP Analysis to Assess the Virtual Reality and Select the Priority Factors for Education

El Mostafa Bourhim1,3(B) and Oumayma Labti2,3

1 EMISYS: Energetic, Mechanic and Industrial Systems, Engineering 3S Research Center, Industrial Engineering Department, Mohammadia School of Engineers, Mohammed V University, Rabat, Morocco
[email protected]
2 Laboratory of Research in Management, Information and Governance, Faculty of Juridical Economic and Social Sciences Ain-Sebaa, Hassan II University of Casablanca, Route des Chaux et Ciments Beausite, BP 2634 Casablanca, Morocco
[email protected]
3 Moroccan Association of Innovation and Scientific Research in Artificial Intelligence and Extended Reality, BP.154 Settat, Morocco
Abstract. Over the past decade, virtual reality (VR) technology has been increasingly used in educational settings. Its benefits and opportunities are opening up new avenues for learning. This research attempts to capture the strategic core factors involved in the evaluation of VR in education by implementing the A'WOT (AHP-SWOT) method, a combination of SWOT (Strengths, Weaknesses, Opportunities and Threats) analysis and the Analytic Hierarchy Process (AHP). The SWOT analysis presents a comprehensive summary of the important forces and challenges that are necessary for the development of VR education. However, the SWOT analysis does not have a way of measuring weights analytically and determining the importance of the components. Therefore, AHP analysis was applied to quantify and rank the factors that affect the functioning of the system. The proposed framework allows the priorities of the factors contained in the SWOT analysis to be accurately and analytically determined and measured.

Keywords: Virtual Reality · Analytic Hierarchy Process · SWOT analysis · A'WOT approach
1 Introduction

Immersive Virtual Reality (IVR), with its growing affordability and its ability to provide immersion in high-fidelity virtual world experiences, has attracted attention in both school and higher-education teaching and learning. The Horizon Report for Universities recognizes IVR as a significant technological advancement that is expected to be embraced in the coming years as a result of developing trends such as learning environments and creative cultures [1]. This technology
has been extensively tested in schools, with contemporary IVR studies concentrating on how it can impact children's learning [2], and on the ethical and organizational implications of employing IVR in schools [3]. There has also been research on students' behavioral intention to utilize IVR for studying science in a university setting, with a special focus on students' intention to use IVR for scientific learning in a higher education environment [4].
2 Related Works

2.1 Virtual Reality Technology

Milgram and Kishino [5] built the framework to identify both Augmented Reality (AR) and VR before they became widely popular. The virtuality spectrum has two ends: the real world and the virtual world. Extended Reality settings, which combine real and virtual objects, take place in the middle of the continuum. At one extreme of the spectrum, there are environments made entirely of real items. Some surroundings, on the other hand, are entirely composed of digital objects. Augmented virtuality (AV) refers to virtual worlds that integrate components from the real world. VR is mainly defined as a tool that creates a digital environment using computer inputs and helps individuals live a virtual experience in an interactive three-dimensional world filled with various sensory and emotional experiences. VR technology has now extended across a wide range of areas and sectors as a result of technical advancements [6]. Immersive VR technologies provide the most immersion. These technologies allow the user to see the virtual space from within the environment itself, developing a sensation of presence and increasing the sense of realism, as the virtual scene constantly takes feedback from head rotations or body movements [7]. On the other hand, semi-immersive VR systems give users the impression of being partially involved in the computer-generated environment. To accomplish this, some sensory inputs are enhanced, or the users' active participation with the virtual world is enhanced [7]. Finally, in non-immersive VR systems, users engage in a computer-assisted environment created and observed solely through an electronic device, such as a tablet or laptop. Desktop VR systems and 3D worlds are other names for them. Control methods are often executed at this level using a keyboard and/or a mouse [8].

2.2 Virtual Reality in Education

In recent years, there has been a significant growth in interest in educational research on the efficient use of VR [1]. Using this technology to build a virtual environment (CAVE) as a projection system for architectural engineering students has proven to be a very good technique. This had a very favorable influence on students' performance, and as a result, they were able to produce a building plan for a nuclear power plant in a relatively short period of time [2]. VR modules are successful and beneficial in the development of leaders. Because of the multiplicity of delivery mechanisms, this technique has the potential to provide promising outcomes. Furthermore, these results will
provide more proof that VR may be used to teach soft skills in addition to hard sciences [3]. VR can be a practical method for involving civil engineering students in the learning process. Civil engineering first-year master students had to develop a virtual world in around 6 weeks: first they had to create the models as CAD drawings and then transfer the files into Unity 3D, so that the students could examine the results in Oculus Rift headsets. This strategy is highly useful and intriguing since it allows users to participate in the whole process, beginning with designing models, integrating the models into the Unity game engine, and ultimately exporting the work to VR apps [4]. VR can also be integrated into musical instruction from both a commercial and an academic standpoint. According to the findings of this study, the quality of student retention and learning has increased in the classroom. A few potential applications are mentioned to assist students to feel more confident and perform better in public areas in various ways [5]. The main reasons why this technology has gained so much attention in education since its development are its interactive characteristics and immersion [6]. The authors of [7] emphasize that its application allows the learner to be put in numerous contexts with a realism that a textbook could never accomplish, while eliminating certain features that might inhibit learning; its usage in teaching allows learners to be immersed in a variety of locales and historical periods [7].
3 Methodology

Using the SWOT and AHP methodologies, the critical components of VR as an efficient tool for human teaching were identified and prioritized. The SWOT matrix was first used to identify the features of VR in education; the AHP approach was then used to weight the elements and sub-elements of the SWOT matrix. Figure 1 summarizes the workflow.

3.1 SWOT Analysis

SWOT analysis is a popular strategy for assessing the external and internal environments at the same time in order to provide a systematic approach to, and support for, a decision scenario. The analysis covers two major aspects: internal and external strategic considerations. The objective of using the SWOT method in strategy decisions is the selection, or construction and implementation, of a strategy that fits well with the external and internal elements [16]. Furthermore, the technique used must be congruent with the decision makers' present and future goals [17]. A Strength may be described as a distinct approach or capacity that enables objectives to be fulfilled [18] (e.g., VR enables self-directed experimentation and independent practice). A Weakness is a restriction, defect, or flaw in a system that prevents it from progressing toward set goals [18] (e.g., inadequate teacher ability to use the technology). An Opportunity relates to external and internal dynamics in the system's working environment, such as a trend that raises demand for what the system can supply or enables it to supply it more efficiently [18] (e.g., real-time data analysis and intelligence). A Threat is any disadvantageous situation in the entity's surroundings that impedes its plan by posing barriers or restrictions that limit strategy execution [18] (e.g., addiction to the virtual world).
A SWOT analysis is a rigorous approach to issue resolution that includes a thorough review of all factors related to a new product, technology, management, or strategy. The importance of SWOT analysis in an environment scan is depicted in Fig. 1.
Fig. 1. SWOT Framework
3.2 AHP Approach

AHP is an effective decision-making tool that breaks down difficult situations into a multilayer hierarchical structure of criteria, objectives, and options [8]. AHP compares elements two at a time to identify their relative importance at each level of the hierarchy, and then evaluates the alternatives at the lowest level to pick the best choice among them. AHP is particularly effective where subjectivity is involved, and it is especially well suited to problems whose selection criteria can be arranged hierarchically into sub-criteria [9]. In multilevel hierarchical systems, AHP assesses relative priority on absolute scales using discrete and continuous pairwise comparisons [10]. The prioritizing mechanism works by assigning a number from Saaty's [8] comparison scale to represent the relative importance of each criterion; pairwise comparison matrices of these items are then used to calculate their significance.

3.3 A'WOT Methodology

A'WOT is a hybrid method that combines the AHP framework with SWOT analysis [11]. Its principal goal is to assess the parameters in a systematic way using the AHP [12]. The SWOT approach offers a simple model for analyzing decision-making contexts, while AHP supports the implementation of the SWOT method in an analytical way; it also offers an essential tool for strategic decision-making and can serve as a communication and educational framework in decision-making processes involving several decision-makers. The hybrid A'WOT method [13] proceeds as follows:

• We start by building our SWOT matrix, listing and structuring the relevant external and internal strategic planning criteria. The elements and sub-elements of the SWOT matrix were built on the basis of the literature review.
• In the second step, the AHP technique was implemented to obtain the weights of the criteria and sub-criteria within the SWOT factors [14].
• Finally, to evaluate the final SWOT elements, a survey was administered separately: experts of XR The Moroccan Association were invited to weigh every SWOT factor, pair by pair, on a 1–9 scale. The questionnaires were conducted face to face. Pairwise comparisons were used to assess the relative weights of each SWOT factor pair; the SWOT parameters were then quantified using AHP, the weight vectors and group priorities were calculated, and the components were analyzed with the results matrix. A minimal computational sketch of this weighting step is given below.
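As an illustration of the weighting step, the sketch below derives priority weights and a consistency ratio from a pairwise comparison matrix using the principal-eigenvector method and Saaty's random index values. It is an assumption-laden outline, not the authors' exact computation; the example matrix is the SWOT group matrix reported in Table 2.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix sizes 1..10
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_priorities(A):
    """Priority vector and consistency ratio of a pairwise comparison matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                 # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalized priorities
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1)             # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0       # consistency ratio
    return w, cr

# SWOT group matrix from Table 2 (rows/columns: S, W, O, T)
groups = [[1.000, 2.000, 0.200, 3.000],
          [0.500, 1.000, 0.170, 2.000],
          [5.000, 6.000, 1.000, 7.000],
          [0.330, 0.500, 0.140, 1.000]]
w, cr = ahp_priorities(groups)
print(np.round(w, 3), round(cr, 3))
```

Run on the group matrix, this should yield priorities close to the reported 0.177/0.107/0.649/0.067 with a consistency ratio of roughly 0.03, i.e., well below the 0.10 acceptance threshold.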
4 Results and Discussion

The initial phase of the analysis consisted of a brainstorming session to assess the key drivers of each SWOT component. As a result of these assessments, seven strength sub-factors and five sub-factors each for weaknesses, opportunities, and threats were generated as the basis for the SWOT diagnosis; these SWOT components for VR education are listed in Table 1. Based on the consistency coefficients, the SWOT decision matrix was found to be highly consistent (CR: 0.028 < 0.10), with "Opportunities" ranked highest with a score of 0.649, followed by "Strengths" with 0.177, "Weaknesses" with 0.107, and "Threats" with 0.067. The pairwise comparisons within each SWOT group were also consistent, with a CR of 0.054 for opportunities, 0.08 for threats and strengths, and 0.082 for weaknesses. The detailed pairwise results for the SWOT groups and the sub-factors within each group are given in Table 2.

Table 1. SWOT analysis.

Strengths (S):
• S1: Immersion
• S2: Improved Ecological Relevance
• S3: Increasing the amount of participation
• S4: Real-Time Performance Evaluation
• S5: Self-Directed Experimentation and Independent Practice
• S6: Motivating Gaming Factors
• S7: Safe Testing and Training Environment

Weaknesses (W):
• W1: Expensive technology
• W2: Interaction Methods
• W3: Wires and Displays
• W4: Motion Sickness
• W5: Inadequate teacher ability to use the technology

Opportunities (O):
• O1: Processing Power and Graphics/Video Integration
• O2: Real-Time Data Analysis and Intelligence
• O3: Academic and Professional Acceptance
• O4: Gaming-Industry Drivers
• O5: Devices and Wires

Threats (T):
• T1: Ethical Challenges
• T2: Fear of VR may Eliminate the Need of teachers
• T3: Limited Awareness
• T4: Unrealistic Expectations
• T5: Addiction To The Virtual World
Table 2. Pairwise comparison matrices for the SWOT groups and sub-factors (the upper triangle is shown for the sub-factor matrices; entries below the diagonal are the corresponding reciprocals).

SWOT groups (CR: 0.028) — priorities: Strengths 0.177, Weaknesses 0.107, Opportunities 0.649, Threats 0.067

      S      W      O      T
S   1.000  2.000  0.200  3.000
W   0.500  1.000  0.170  2.000
O   5.000  6.000  1.000  7.000
T   0.330  0.500  0.140  1.000

Strengths (CR: 0.08)

      S1     S2     S3     S4     S5     S6     S7
S1  1.000  0.500  9.000  0.500  9.000  6.000  5.000
S2         1.000  0.500  1.000  4.000  0.200  0.200
S3                1.000  0.170  2.000  0.250  0.200
S4                       1.000  6.000  1.000  0.500
S5                              1.000  0.140  0.140
S6                                     1.000  1.000
S7                                            1.000

Weaknesses (CR: 0.082)

      W1     W2     W3     W4     W5
W1  1.000  5.000  5.000  0.500  0.200
W2         1.000  2.000  0.170  0.200
W3                1.000  0.200  0.200
W4                       1.000  0.500
W5                              1.000

Opportunities (CR: 0.054)

      O1     O2     O3     O4     O5
O1  1.000  2.000  0.110  0.170  0.500
O2         1.000  0.110  0.120  0.170
O3                1.000  3.000  8.000
O4                       1.000  2.000
O5                              1.000

Threats (CR: 0.08)

      T1     T2     T3     T4     T5
T1  1.000  2.000  4.000  5.000  0.330
T2         1.000  4.000  5.000  0.170
T3                1.000  4.000  0.170
T4                       1.000  0.110
T5                              1.000
On the basis of the scores, the most significant attribute was the opportunity of "academic and professional acceptance", with a weight of 0.565. The SWOT group weights generated through the A'WOT process and the resulting overall scores obtained across all attributes are reported in Table 3.
Table 3. Overall priority scores of SWOT factors.

Strengths (group priority 0.177):
  Immersion (S1): 0.457
  Improved Ecological Relevance (S2): 0.074
  Increasing the amount of participation (S3): 0.029
  Real-Time Performance Evaluation (S4): 0.105
  Self-Directed Experimentation and Independent Practice (S5): 0.022
  Motivating Gaming Factors (S6): 0.147
  Safe Testing and Training Environment (S7): 0.166

Weaknesses (group priority 0.107):
  Expensive technology (W1): 0.175
  Interaction Methods (W2): 0.061
  Wires and Displays (W3): 0.048
  Motion Sickness (W4): 0.269
  Inadequate teacher ability to use the technology (W5): 0.447

Opportunities (group priority 0.649):
  Processing Power and Graphics/Video Integration (O1): 0.053
  Real-Time Data Analysis and Intelligence (O2): 0.032
  Academic and Professional Acceptance (O3): 0.565
  Gaming-Industry Drivers (O4): 0.235
  Devices and Wires (O5): 0.115

Threats (group priority 0.067):
  Ethical Challenges (T1): 0.215
  Fear of VR may Eliminate the Need of teachers (T2): 0.151
  Limited Awareness (T3): 0.073
  Unrealistic Expectations (T4): 0.034
  Addiction To The Virtual World (T5): 0.527
Within the strengths, Immersion (S1) was ranked as the highest contributing driver, with a score of 0.457; this confirms that immersion plays an influential role in VR education research. Among the weaknesses, the dominant factor was "inadequate teacher ability to use the technology", with a weight of 0.447, stemming from inadequate background and a lack of instructors knowledgeable about this technology; it was followed by "motion sickness", with a weight of 0.269. The most significant opportunities of using VR in education were "academic and professional acceptance", with a weight of 0.565, and "gaming-industry drivers", with a weight of 0.235. The strongest threat appeared to be "addiction to the virtual world", with a score of 0.527, followed by "ethical challenges", with a score of 0.215.
5 Conclusion

After applying the hybrid SWOT–AHP approach to VR in education, significant advantages of this technology have been identified, and it appears that the technology will continue to expand: opportunities received the highest weight, which should lead technology companies and investors to support the development of new applications for educational purposes. There are also risks associated with using these devices, since our findings indicate that they may foster addiction, both psychological and physiological; more awareness is required so that users control the technology rather than the other way around. Hardware and software are developing at an incredible rate, with technological advancements and upgrades appearing daily. Hardware vendors should make greater efforts to improve the quality-to-price ratio so that VR in education can be used by individuals from all walks of society. The AHP technique was a sound way to weight the SWOT analysis criteria, but it ignores interdependencies and feedback among criteria at all levels. This limitation can be addressed by rating SWOT analyses with the Analytic Network Process (ANP). In future work we will strive to incorporate the ANP technique into the method and address additional challenges such as the inconsistent nature of subjective opinions; decisions can also be made using fuzzy logic [15, 16].

Acknowledgment. The authors gratefully acknowledge the financial support and technical assistance provided by the Moroccan Association of Innovation and Scientific Research in Artificial Intelligence and Extended Reality, BP.154, Settat, Morocco. Without its generous support, this publication would not have been possible.
Funding. The authors gratefully acknowledge the financial support and technical assistance provided by the Moroccan Association of Innovation and Scientific Research in Artificial Intelligence and Extended Reality, BP.154, Settat, Morocco.
References 1. Pellas, N., Mystakidis, S., Kazanidis, I.: Immersive virtual reality in K-12 and higher education: a systematic review of the last decade scientific literature. Virtual Reality 25(3), 835–861 (2021). https://doi.org/10.1007/s10055-020-00489-9 2. Messner, J.I., Yerrapathruni, S.C.M., Baratta, A.J., Whisker, V.E.: Session 1121 Using Virtual Reality to Improve Construction Engineering Education 3. Hickman, L., Akdere, M.: Exploring virtual reality for developing soft-skills in STEM education. In: 2017 7th World Engineering Education Forum (WEEF), pp. 461–465 (2017). https:// doi.org/10.1109/WEEF.2017.8467037 4. Dinis, F.M., Guimarães, A.S., Carvalho, B.R., Poças Martins, J.P.: Virtual and augmented reality game-based applications to civil engineering education. In: 2017 IEEE Global Engineering Education Conference (EDUCON), pp. 1683-1688 (2017). https://doi.org/10.1109/ EDUCON.2017.7943075 5. Serafin, S., Adjorlu, A., Nilsson, N., Thomsen, L., Nordahl, R.: Considerations on the use of virtual and augmented reality technologies in music education. In: 2017 IEEE Virtual Reality Workshop on K-12 Embodied Learning through Virtual & Augmented Reality (KELVAR), pp. 1–4 (2017). https://doi.org/10.1109/KELVAR.2017.7961562 6. Gavish, N., et al.: Evaluating virtual reality and augmented reality training for industrial maintenance and assembly tasks. Interact. Learn. Environ. 23(6), 778–798 (2015). https:// doi.org/10.1080/10494820.2013.815221 7. Blascovich, J., Loomis, J., Beall, A.C., Swinth, K.R., Hoyt, C.L., Bailenson, J.N.: TARGET ARTICLE: immersive virtual environment technology as a methodological tool for social psychology. Psychol. Inquiry 13(2), 103–124 (2002). https://doi.org/10.1207/S15327965PLI 1302_01 8. Bourhim, E.M., Cherkaoui, A.: Efficacy of virtual reality for studying people’s pre-evacuation behavior under fire. Int. J. Hum. Comput. Stud. 142, 102484 (2020). https://doi.org/10.1016/ j.ijhcs.2020.102484 9. Semih, T., Seyhan, S.: A multi-criteria factor evaluation model for gas station site selection. J. Global Manage 2(1), 12–21 (2011) 10. Saaty, T.L.: Decision Making with Dependence and Feedback: The Analytic Network Process: The Organization and Prioritization of Complexity. RWS Publications (1996) 11. Bourhim, E.M., Cherkaoui, A.: Exploring the potential of virtual reality in fire training research using A’WOT hybrid method. In: Thampi, S.M., et al. (eds.) Intelligent Systems, Technologies and Applications. AISC, vol. 1148, pp. 157–167. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3914-5_12 12. Bourhim, E.M., Cherkaoui, A.: Selection of optimal game engine by using AHP approach for virtual reality fire safety training. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds.) ISDA 2018 2018. AISC, vol. 940, pp. 955–966. Springer, Cham (2020). https://doi. org/10.1007/978-3-030-16657-1_89 13. Bourhim, E.M.: Augmented reality for fire evacuation research: an A’WOT analysis. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 277–285. Springer, Cham (2022). https://doi.org/10.1007/978-3030-96308-8_25
14. Bourhim, E.M., Cherkaoui, A.: Usability evaluation of virtual reality-based fire training simulator using a combined AHP and fuzzy comprehensive evaluation approach. In: Jeena Jacob, I., Kolandapalayam Shanmugam, S., Piramuthu, S., Falkowski-Gilski, P. (eds.) Data Intelligence and Cognitive Informatics. AIS, pp. 923–931. Springer, Singapore (2021). https://doi. org/10.1007/978-981-15-8530-2_73 15. Labti, O., Belkadi, E.: Factors affecting the online travel purchasing decision: an integration of fuzzy logic theory. In: Shakya, S., Balas, V.E., Haoxiang, W., Baig, Z. (eds.) Proceedings of International Conference on Sustainable Expert Systems. LNNS, vol. 176, pp. 77–93. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4355-9_7 16. Labti, O., Belkadi, E.-Z.: Modeling travelers behavior using FSQCA. In: Abraham, A., Gandhi, N., Hanne, T., Hong, T.-P., Nogueira Rios, T., Ding, W. (eds.) ISDA 2021. LNNS, vol. 418, pp. 657–666. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96308-8_61
Prediction of Colon Cancer Related Tweets Using Deep Learning Models Mohammed Rashad Baker1(B) , Esraa Zeki Mohammed2 , and Kamal H. Jihad3 1 Department of Software, College of Computer Science and Information Technology,
University of Kirkuk, Kirkuk, Iraq [email protected] 2 ITPC, Ministry of Communications, Kirkuk, Iraq 3 Electronic Computer Center, University Presidency, University of Kirkuk, Kirkuk, Iraq
Abstract. Social media has become an influential tool in society, and recent research has therefore been directed at monitoring, analyzing, and predicting the reactions contained in the cumulative data these platforms produce. Thanks to modern technology and the spread of social media, people can leave comments on a wide range of topics, including disease-related issues. In this research, people's reactions to colon cancer were analyzed and used to predict the future of this disease; colon patient reviews and datasets were used in conjunction with data collected from Twitter. We propose three Deep Learning (DL) models: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN). The results show that the GRU model gives more stable results than the LSTM and CNN models in terms of accuracy.

Keywords: Database · Deep Learning · Prediction · Colon Cancer · Tweets
1 Introduction

In parallel with technological progress, usage has shifted from audio-visual and print communication channels to social media, which, thanks to communication technologies and the Internet, has gained popularity at the expense of newspapers and magazines. Social media has become an essential tool for people to express their opinions on health-related issues, the economy, and products, paving the way for analyzing people's feelings and understanding their thoughts. The sentiments in the texts people share can be positive, negative, or neutral [1]. Since its inception in 2006, Twitter has spread significantly and become one of the most used social networking sites worldwide; by 2021 the platform had reached nearly 200 million active users. It is an excellent way to interact spontaneously and organically with tweets that express how the author is feeling at the moment, and this communication is particularly invaluable in the healthcare field. As patients spend more time on social networking sites and less time with caregivers, it is often difficult for healthcare workers to meet their needs or understand their feelings. Even though Twitter is a microblogging site where any user or patient can express their
feelings about their personal life or current affairs [2], the currently available tweets represent an untapped resource for engaging with patients. More specifically, the tweets of cancer patients are an untapped resource for interacting with them. The increase in patient data in the health field has raised the importance of DL in healthcare: DL plays a significant role in various applications because of its computing power and rapid data storage, and its popularity has grown in recent years [3]. Today ML is widely used in various fields related to health and medicine; it has seen increasing growth in this area and can be used to predict and analyze many health-related outcomes [4]. This is especially relevant for colon cancer, the second cause of cancer deaths in 2018 and the third most prevalent type of cancer worldwide [5]. Researchers have implemented computer programs that analyze the text and grammar of social media posts to evaluate online public opinion; Natural Language Processing (NLP) is the technique used most frequently [6]. This research explores the public's perceptions of colon cancer, as well as their feelings, using data collected from Twitter. In order to provide relevant guidelines that assist public health authorities in making intervention decisions in analogous future situations, we investigate public sentiment in social media dialogues and evaluate whether it represents the overall public sentiment regarding this particular type of cancer. The primary aims of the proposed work are:

• To study public feelings around colon cancer by applying sentiment analysis to public tweets;
• To find out whether the public sentiment is positive or negative;
• To predict the most pressing issues associated with this type of cancer in the future.

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes data collection and the proposed models. Section 4 presents the test results. Section 5 compares the proposed work with related works, and Section 6 concludes the paper and outlines future work.
2 Related Work

Several related works in recent years have used Artificial Neural Networks (ANN) and Machine Learning (ML) for successful classification and prediction, and others have analyzed different types of cancer data for health care, as summarized below. Bychkov et al. (2018) trained deep neural networks combining convolutional and recurrent architectures to predict colon cancer outcomes from images of tumors in tissue samples; the system was evaluated on 420 patients [7]. Bhuvaneswari et al. (2019) employed Bi-directional GRU (BiGRU) and LSTM models for disaster event prediction; these deep learning models classify tweets of the CrisisLexT26 dataset, and their performance was analyzed and graded [8].
Shapcott et al. (2019) used deep learning to diagnose colon cancer images stored in The Cancer Genome Atlas (TCGA) repository and then looked for connections between clinical characteristics and the dataset of predicted features [9]. Allen et al. (2020) undertook a retrospective analysis of tweets about hereditary breast and ovarian cancer (HBOC) and Lynch Syndrome (LS) during 2017; their collection included 63,770 tweets, and their findings centered on information and personal accounts of interactions with HBOC and LS [10]. Iqbal et al. (2021) retrieved characteristics from prostate images and examined the performance of the derived features using a confusion matrix together with sensitivity, specificity, positive predictive value, negative predictive value, and Area Under the Curve (AUC), achieving significant accuracy and AUC [11]. Baker (2021) utilized the BERTweet model to predict sentiment and emotion labels for cancer-related tweets during COVID-19 [2]. Lou et al. (2022) performed pathological complete response (PCR) prediction for locally advanced rectal cancer (LARC) using digitized pathological pictures and an artificial intelligence model; the system showed high prediction accuracy and can be used before operations to assist treatment decision-making [12]. Waljee et al. (2022) discussed the use of AI and ML to improve early cancer diagnosis and prognosis and to conduct population-based surveillance in sub-Saharan Africa (SSA). Alamoodi et al. (2022) investigated public tweets during the COVID-19 pandemic to understand public sentiment and themes of debate across the several lockdown waves in Malaysia; even though the Malaysian government continued to enforce lockdown measures, most Twitter responses were supportive, followed by neutral and negative emotions, and the results indicated the subjects raised in each lockdown and their main associated keywords [6].
3 The Proposed System

This section discusses the collected dataset and how the preprocessing steps were performed, and then presents the main parts of our proposed DL models. Figure 1 illustrates the main components of the proposed classification model.
Fig. 1. The structure of the proposed classification model
3.1 Data Collection and Preprocessing Stage

The data used in this work were collected from Twitter with a tweet collector we developed in Python. We collected all tweets carrying the #ColonCancer hashtag between 01-01-2018 and 19-09-2022, for a total of 45,249 related tweets. We then filtered the collected dataset and kept the tweets written in English, and applied various NLP preprocessing techniques to clean the dataset: stopword removal, lemmatization, and removal of special characters, URLs, and emojis. After preprocessing, 42,531 tweets remained.

3.2 Feature Extraction

Word2vec is a natural language processing approach that learns word relations from a large corpus of text using a neural network model. Once trained, such a model can find synonymous words or suggest alternative words for an incomplete text [13]. The collected data are transformed into feature vectors using Word2vec and then classified with the deep learning classifiers.

3.3 Labeling

In the labeling process, we pre-classify tweets as positive or negative using the VADER library, a lexicon-based sentiment analyzer for textual data. If the tweet sentiment score was negative (−), we labeled the tweet as negative; if it was positive (+), we labeled it as positive. After this step we obtained 17,564 tweets labeled as negative and 17,245 tweets labeled as positive. A minimal sketch of this preprocessing and labeling pipeline is given below.
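The following sketch illustrates the cleaning, VADER-based labeling, and Word2vec feature-learning steps described above. It is an illustrative outline only: the example tweets, the 100-dimensional vector size, and the treatment of non-negative compound scores as positive are assumptions, and the NLTK stopword and WordNet corpora are assumed to be already downloaded.

```python
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from gensim.models import Word2Vec

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))
analyzer = SentimentIntensityAnalyzer()

def clean_tweet(text):
    """Strip URLs, mentions, hashtags marks, emojis/special characters and stopwords; lemmatize."""
    text = re.sub(r"http\S+|www\.\S+|@\w+|#", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)          # drop emojis / special characters
    return [lemmatizer.lemmatize(t) for t in text.split() if t not in stop_words]

def label_tweet(tokens):
    """VADER compound score: negative (< 0) -> 0, otherwise -> 1 (assumption)."""
    score = analyzer.polarity_scores(" ".join(tokens))["compound"]
    return 0 if score < 0 else 1

tweets = ["Early screening saved my life #ColonCancer",          # illustrative examples
          "Losing a friend to colon cancer is devastating"]
corpus = [clean_tweet(t) for t in tweets]
labels = [label_tweet(t) for t in corpus]

# Learn 100-dimensional word vectors from the cleaned corpus
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=2)
print(labels, w2v.wv["cancer"].shape)
```

In practice each tweet would then be mapped to a fixed-length token sequence (or averaged word vectors) before being fed to the classifiers of Sect. 3.4.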
3.4 Deep Learning Approaches and Models

3.4.1 Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) units are the components of an artificial recurrent neural network (RNN) built from LSTM units, often referred to as an LSTM network [14]. LSTM is used in the experiments to represent the tweet text and classify the tweets [15]; the LSTM commonly weighs the most likely word at each step, depending on the class to which the tweet is to be assigned [8]. Figure 2 shows the main components and settings of the designed LSTM model. A typical LSTM cell is defined as follows:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (1)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (2)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)    (3)
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t    (4)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (5)
h_t = o_t ∗ tanh(C_t)    (6)
Here, x_t is the input vector at time t, h_t is the output vector, C_t is the state of the memory cell, i_t is the input gate vector, f_t is the forget gate vector, o_t is the output gate vector, W_i, W_f, W_o, and W_c are the weight matrices, b_i, b_f, b_o, and b_c are the bias vectors, and σ is the activation function.
Fig. 2. The structure of the LSTM model used in this work
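As a concrete illustration of how such a classifier can be assembled, the following Keras sketch builds a small LSTM model over embedded tweet tokens. The vocabulary size, embedding dimension, sequence length, and layer widths are illustrative assumptions rather than the exact settings of Fig. 2.

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumptions)
vocab_size, embed_dim, max_len = 20000, 100, 50

lstm_model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim),
    layers.LSTM(64, dropout=0.2),            # recurrent cell implementing Eqs. (1)-(6)
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),   # negative / positive tweet classes
])
lstm_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# lstm_model.fit(X_train, y_train, epochs=20, batch_size=64, validation_split=0.2)
```

Swapping the layers.LSTM layer for layers.GRU yields the GRU variant of Sect. 3.4.2, which differs only in its gating equations.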
3.4.2 Gated Recurrent Unit (GRU)

Gated Recurrent Units (GRUs), similar to LSTM, are a gating mechanism used in artificial recurrent neural networks [16]. GRUs have been shown to perform better on datasets with a low to moderate quantity of data. The GRU is comparable to an LSTM with a forget gate, but differs in that it has no output gate and has fewer parameters. The main components and settings of the designed GRU model are illustrated in Fig. 3. A typical GRU cell is defined as follows:

r_t = σ(W_r · [h_{t−1}, x_t] + b_r)    (7)
z_t = σ(W_z · [h_{t−1}, x_t] + b_z)    (8)
h̃_t = tanh(W_h · [r_t ∗ h_{t−1}, x_t] + b_h)    (9)
h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ h̃_t    (10)

Here, x_t is the input vector at time t, h_t is the output vector, r_t is the reset gate vector, z_t is the update gate vector, W_r, W_z, and W_h are the weight matrices, b_r, b_z, and b_h are the bias vectors, and σ is the activation function.
Fig. 3. The structure of the GRU model used in this work
3.4.3 Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a deep learning neural network explicitly developed to process structured input arrays [17]. CNNs are widely used in computer vision [18] and have become state of the art for various visual applications such as image classification [19]; they have also been used successfully for text categorization in natural language processing [20]. Figure 4 shows the main components and settings of the designed CNN model. The convolution at the core of a CNN can be written as

x(t) = ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ    (11)

where δ(t) is the unit impulse sequence and δ(t − τ) is its time-shifted version, equal to 1 at t = τ and 0 for any other t.
Fig. 4. The structure of CNN model used in this work
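For the text-classification setting used here, the convolution operates over the sequence of embedded tokens rather than over images. The sketch below shows one way such a 1-D convolutional classifier could be written in Keras; the filter count and kernel size are assumptions, since the exact configuration is given in Fig. 4.

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumptions)
vocab_size, embed_dim, max_len = 20000, 100, 50

cnn_model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim),
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # discrete 1-D convolution over token embeddings
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.2),
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
cnn_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```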
4 Results and Discussion

We used Kaggle to train and test our models; Kaggle Notebook is a cloud computing platform that allows reliable and interactive analysis. The preprocessed dataset is shuffled to make the evaluation more comprehensive, reduce the variance, and avoid overfitting the model. The data are then divided in an 80:20 ratio, where 80% is used to train the model and 20% to test it. Accuracy (Acc.), Precision (Pr.), Recall (Re.), and F1 score are the performance evaluation measures examined in this research. They are defined as follows:

Accuracy (Acc.) = (TP + TN) / (TP + TN + FP + FN)    (12)
Precision (Pr.) = TP / (TP + FP)    (13)
Recall (Re.) = TP / (TP + FN)    (14)
F1 Score = 2 × (Pr. × Re.) / (Pr. + Re.)    (15)

where the terms "true positive", "false positive", "true negative", and "false negative" are abbreviated as TP, FP, TN, and FN, respectively.
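These metrics can be computed directly from the test-set predictions; the short sketch below, using scikit-learn and placeholder label vectors, mirrors Eqs. (12)–(15).

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder vectors standing in for held-out labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Acc:", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+TN+FP+FN), Eq. (12)
print("Pr :", precision_score(y_true, y_pred))   # TP/(TP+FP), Eq. (13)
print("Re :", recall_score(y_true, y_pred))      # TP/(TP+FN), Eq. (14)
print("F1 :", f1_score(y_true, y_pred))          # 2*Pr*Re/(Pr+Re), Eq. (15)
```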
Fig. 5. Performance of the proposed DL models (LSTM, GRU, and CNN): (A) training and validation accuracy; (B) training and validation loss.
In our first experiment, we trained on our dataset for 20 epochs and then applied the proposed DL classification models to the collected data. Table 1 shows the performance results for the LSTM, GRU, and CNN classification models. The GRU classifier, with an accuracy of 0.91, is the best of the three; LSTM and CNN scored 0.90 and 0.87, respectively. For our second experiment, we measured the accuracy and loss on the training and validation data. Figure 5 shows the performance of the proposed DL models: the GRU model's accuracy during training and validation is better and more balanced than that of the LSTM and CNN models, whereas CNN is less stable and incurs a higher loss on both training and validation data than GRU and LSTM. Table 2 summarizes the accuracy and loss on the training and validation data.

Table 1. Results of DL models.

Model | Class | Precision | Recall | F-Score | Accuracy
LSTM | 0 | 0.89 | 0.93 | 0.91 | 0.90
LSTM | 1 | 0.92 | 0.88 | 0.90 |
LSTM | Macro avg | 0.90 | 0.90 | 0.90 |
GRU | 0 | 0.91 | 0.91 | 0.91 | 0.91
GRU | 1 | 0.91 | 0.91 | 0.91 |
GRU | Macro avg | 0.91 | 0.91 | 0.91 |
CNN | 0 | 0.84 | 0.92 | 0.88 | 0.87
CNN | 1 | 0.87 | 0.87 | 0.87 |
CNN | Macro avg | 0.87 | 0.87 | 0.87 |
Table 2. Summary of accuracy and loss on the training and validation datasets.

Model | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss
LSTM | 95.46 | 13.69 | 90.30 | 27.87
GRU | 96.47 | 11.49 | 91.10 | 25.80
CNN | 90.74 | 49.88 | 86.88 | 49.23
GRU improved its performance, as demonstrated by the accuracy and validation scores, and classified the Twitter discussions relating to colon cancer more accurately. While the other DL models are close to GRU in terms of accuracy, the higher training and validation loss of the LSTM and CNN models is evident in this experiment. The performance of these DL models, including GRU, could be enhanced by tuning the hidden layers and the number of epochs.
5 Comparison with Other Related Works

This section contains a comparison with previous similar works in terms of the methods used and the results obtained with each method, as shown in Table 3.

Table 3. Comparison with related works.

References | Used Method | Accuracy
[1] | SVM / NB / CNN / RNN / LSTM | 90% / 87% / 93% / 92% / 94%
[5] | CNN | 93.48%
[11] | LSTM | 99.84%
[8] | LSTM with embedding / Bi-directional GRU with embedding | 85% / 89%
[21] | Non-augmented combined / Augmented combined | 91% / 96%
The proposed work | LSTM / GRU / CNN | 90% / 91% / 87%
6 Conclusions

Understanding the feelings of cancer patients and their way of thinking is of great importance, because the psychological state of cancer patients plays a major role in treatment. The number of tweets in which cancer patients express their suffering, personal experiences, and feelings about the disease is increasing daily, and these tweets often go unnoticed by the healthcare teams responsible for following up and treating these patients. This research therefore highlights the importance of classifying such tweets as a way of anticipating the needs of colon cancer patients, which is reflected in the provision of timely care for them. The current study demonstrated the use of three deep learning models for classifying and predicting colon cancer tweets. According to the experimental results, the GRU model showed the best performance (0.91) among the three models (ahead of LSTM and CNN). To complement this work, we suggest focusing on the psychological aspect and the feelings of patients through customized programs used in conjunction with treatment, using other methods to predict the feelings of cancer patients, and taking advantage of other social media to understand the suffering of cancer patients, classify them according to the nature of the disease, and provide appropriate support and health care. In this work we also highlight some of the major challenges in using DL and NLP in the medical field; common challenges include ambiguity from a medical point of view and ambiguity in some of the pictures and text written by users on social media.
References 1. Ba¸sarslan, M.S., Kayaalp, F.: Sentiment analysis on social media reviews datasets with deep learning approach. Sak. Univ. J. Comput. Inf. Sci. 4 (2021). https://doi.org/10.35377/saucis. 04.01.833026 2. Baker, W.: ScholarWorks @ UARK Using Large Pre-Trained Language Models to Track Emotions of Cancer Patients on Twitter (2021) 3. Pandey, B., Kumar Pandey, D., Pratap Mishra, B., Rhmann, W.: A comprehensive survey of deep learning in the field of medical imaging and medical natural language processing: Challenges and research directions. J. King Saud Univ. Comput. Inf. Sci. 34, 5083–5099 (2022). https://doi.org/10.1016/j.jksuci.2021.01.007 4. Baker, M.R., et al.: Implementing critical machine learning (ML) approaches for generating robust discriminative neuroimaging representations using structural equation model (SEM). Comput. Math. Methods Med. 2022 (2022). https://doi.org/10.1155/2022/6501975 5. Kavitha, M.S., Gangadaran, P., Jackson, A., Venmathi Maran, B.A., Kurita, T., Ahn, B.C.: Deep neural network models for colon cancer screening. Cancers (Basel) 14 (2022). https:// doi.org/10.3390/cancers14153707 6. Alamoodi, A.H., Baker, M.R., Albahri, O.S., Zaidan, B.B., Zaidan, A.A.: Public sentiment analysis and topic modeling regarding COVID-19’s three waves of total lockdown: a case study on movement control order in Malaysia. KSII Trans. Internet Inf. Syst. 16, 2169–2190 (2022). https://doi.org/10.3837/tiis.2022.07.003 7. Bychkov, D., et al.: Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 8, 1–11 (2018). https://doi.org/10.1038/s41598-018-21758-3 8. Bhuvaneswari, A., Jones Thomas, J.T., Kesavan, P.: Embedded bi-directional GRU and LSTM learning models to predict disasterson Twitter data. In: Procedia Computer Science, pp. 511– 516. Elsevier B.V (2019). https://doi.org/10.1016/j.procs.2020.01.020 9. Shapcott, M., Hewitt, K.J., Rajpoot, N.: Deep learning with sampling in colon cancer histology. Front. Bioeng. Biotechnol. 7 (2019). https://doi.org/10.3389/fbioe.2019.00052 10. Allen, C.G., et al.: Correction to : Communication about Hereditary Cancers on Social Media: A Content Analysis of Tweets about Hereditary Breast and Ovarian Cancer and Lynch Syndrome, pp. 827–831 (2020) 11. Iqbal, S., et al.: Prostate cancer detection using deep learning and traditional techniques. IEEE Access 9, 27085–27100 (2021). https://doi.org/10.1109/ACCESS.2021.3057654 12. Lou, X., et al.: Deep learning model for predicting the pathological complete response to neoadjuvant chemoradiotherapy of locally advanced rectal cancer. Front. Oncol. 12, 1–11 (2022). https://doi.org/10.3389/fonc.2022.807264 13. Paliwal, S., Mishra, A.K., Mishra, R.K., Nawaz, N., Senthilkumar, M.: XGBRS framework integrated with Word2Vec sentiment analysis for augmented drug recommendation. Comput. Mater. Contin. 72, 5345–5362 (2022). https://doi.org/10.32604/cmc.2022.025858
14. Jiang, C., et al.: A MEMS IMU de-noising method using long short term memory recurrent neural networks (LSTM-RNN). Sensors (Switzerland) 18, 3470 (2018). https://doi.org/10. 3390/s18103470 15. Lau, R.Y.K., Li, C., Liao, S.S.Y.: Social analytics: learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decis. Support Syst. 65, 80–94 (2014). https://doi.org/10. 1016/j.dss.2014.05.005 16. Struye, J., Latré, S.: Hierarchical temporal memory and recurrent neural networks for time series prediction: an empirical validation and reduction to multilayer perceptrons. Neurocomputing 396, 291–301 (2020). https://doi.org/10.1016/j.neucom.2018.09.098 17. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 1746–1751. Association for Computational Linguistics (ACL) (2014). https:// doi.org/10.3115/v1/d14-1181 18. Luo, H., Xiong, C., Fang, W., Love, P.E.D., Zhang, B., Ouyang, X.: Convolutional neural networks: computer vision-based workforce activity assessment in construction. Autom. Constr. 94, 282–289 (2018). https://doi.org/10.1016/j.autcon.2018.06.007 19. Sultana, F., Sufian, A., Dutta, P.: Advancements in image classification using convolutional neural network. In: Proceedings - 2018 4th IEEE International Conference on Research in Computational Intelligence and Communication Networks, ICRCICN 2018, pp. 122–129. Institute of Electrical and Electronics Engineers Inc. (2018). https://doi.org/10.1109/ICR CICN.2018.8718718 20. Rios, A., Kavuluru, R.: Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. In: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 258–267 (2015). https://doi. org/10.1145/2808719.2808746 21. Mulenga, M., et al.: feature extension of gut microbiome data for deep neural network-based colorectal cancer classification. IEEE Access 9, 23565–23578 (2021). https://doi.org/10.1109/ ACCESS.2021.3050838
A Combinatorial Approach: Datamining and an Efficient Deep Neural Network for Heart Disease Prediction V. K. Jyothi(B) and Guda Ramachandra Kaladhara Sarma HCL Technologies, Bangalore, India [email protected], [email protected]
Abstract. This paper explores the role of data mining and Artificial Intelligence in medical research. Prevention of heart disease is one of the vital areas of medical research. The objective of this study is to design a diagnostic prediction system that can detect and predict heart disease at an early stage by mining relevant information from a clinical dataset using data mining, statistics, and deep learning techniques. Data preprocessing is performed in multiple phases, such as removal of missing data, numeric transformation, and data normalization, in order to mine efficient data. Our main contribution is the design of an efficient deep neural network model for the early prevention of heart disease. To this end, we have designed a heart disease prediction system consisting of two deep learning neural network architectures: (i) a Deep neural network for Recognition of heart disease (DeepR) and (ii) an efficient Deep neural network for Recognition of heart disease (eDeepR). DeepR achieves 97.64% accuracy and eDeepR achieves 99.53% accuracy for recognition of heart disease, which after recognition can be applied for prevention. To evaluate the performance of the proposed networks, we conducted experiments on the Cleveland heart disease dataset from the UCI repository. The results demonstrate that the performance of the proposed systems is superior to previously reported prediction techniques.
Keywords: Heart Disease Prediction · Datamining · Deep Learning · Machine Learning

1 Introduction
The heart plays a vital role in the human body, and protecting it is essential to prevent unpredicted accidental death [1,2]. Around 25% of people who die suddenly do so because of a heart attack or heart disease [3,4]. Doctors and physicians struggle to detect heart disease correctly. Artificial Intelligence (AI) is a popular and widely applicable branch of computer science that can be used to build smart systems capable of performing tasks that usually transcend human ability.
Machine Learning (ML) is a subset of AI in which an algorithm is trained on a specific dataset to learn how to make decisions. There are two types of ML, supervised and unsupervised: supervised learning trains a model with labelled data, whereas unsupervised learning uses unlabeled data [5]. ML algorithms apply prediction models to datasets in a wide range of applications, such as computer vision (object recognition and detection), drug discovery, biomedicine and bioinformatics [6–8], natural language processing, speech recognition, and disease prediction [5]. Among these, health care [9] and medical science-related applications [10] are growing rapidly, and ML algorithms have already been designed to simplify diagnosis. Deep learning is the most popular branch of machine learning [8] and is attracting great attention in biological and medical research [11]. Recently developed AI-based algorithms have been shown to be more accurate than expert cardiologists in diagnosing acquired heart disease, using different data modalities as input [5]. Here, we significantly improve ML results by designing and fine-tuning Deep Neural Network (DNN) architectures for recognizing heart disease from clinical data, yielding more than 99% accuracy. The best results are obtained with an efficient three-layer deep recognition model that employs regularization, dropout, and optimization and deals with outliers automatically. The proposed system, with its DeepR and eDeepR architectures, significantly outperforms currently published ML research. A system that correctly predicts heart disease can prevent life threats [12]; this is the main motivation for designing and developing an intelligent heart disease prediction system that saves people at an initial stage and at low cost. The proposed system can reduce these risks via early recognition and prevention. The main contributions of this research work toward an efficient diagnostic system are summarized as follows:

1. An intelligent, fast diagnostic architecture named DeepR has been developed to improve the time efficiency of heart disease prediction.
2. An efficient intelligent diagnostic architecture named eDeepR has been developed to improve the classification accuracy of heart disease prediction.
3. This paper investigates the impact on performance of increasing the number of neurons in the hidden layers while reducing the number of layers.
4. This paper compares the proposed deep neural network architectures with machine learning models for heart disease prediction.
2 Related Work
Many researchers have investigated computational intelligence techniques for the prediction of heart disease [13]. The literature describes heart disease prevention systems built with ML algorithms such as Support Vector Machine (SVM), Decision Tree (DT), linear regression, k-Nearest Neighbor (KNN), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Random Forest (RF), Naive Bayes (NB), Artificial Neural Network (ANN), and Logistic
Regression [1,14–20]. Other ensemble and hybrid systems have also been proposed: a hybrid model named χ2-DNN that combines chi-square statistical feature selection, to eliminate irrelevant features, with a DNN for prediction [21]; a hybrid classification system based on the RFRS method [22]; and a hybrid random forest with a linear model [23]. Fuzzy rule-based systems combining DNN and ML algorithms have likewise been proposed [24], including an artificial immune recognition system with a fuzzy resource allocation mechanism and a KNN-based model [25], and a multiple kernel learning method with an adaptive neuro-fuzzy inference system based on DL [26]. A combination of ML and DL [12] has also been proposed for heart disease prediction, and deep neural networks have been used for better prediction, for example a five-layered DNN [27] and a neural network ensemble method [28]. All these models were tested on the Cleveland heart disease dataset of the UCI repository, and high classification accuracies have been reported over the last decades. Following the advances in artificial intelligence [5], and specifically in DL architectures, models have been developed to recognize various diseases automatically [29]. A detailed study of the literature shows that neural-network-based methods have been adopted in medical diagnosis for better prediction due to their capability of handling complex linear and non-linear problems [30]. This motivated us to propose a heart disease prediction system based on deep neural networks. In this paper, to significantly improve on ML and DL models, we design and fine-tune DNN architectures for recognizing heart disease using the Cleveland heart disease dataset from the UCI repository [13].
3 Proposed Method
The proposed heart disease prediction models comprise three stages, namely data preprocessing, normalization and data splitting, and neural network design, described in the following sections.

3.1 Preprocess the Data
Given a dataset D consisting of the attributes {f_1, f_2, f_3, ..., f_14} of the Cleveland dataset (described in Sect. 4.2), we define D = {f_1, f_2, f_3, ..., f_14}. The dataset is partitioned into features D_F and class labels D_CL, defined as D_F = {f_1, f_2, f_3, ..., f_13} and D_CL = {f_14}. In the data preprocessing stage, we eliminated missing data indicated by a "?" in D_F, removed attribute values consisting of 'NaN', and then transformed D_F to numeric form for easier analysis. D_CL is taken as the target attribute and holds the class labels of D.

3.2 Normalization and Split Data
After data preprocessing, the data need to be normalized: because the attribute values lie in different ranges, feeding them into a neural network directly would be problematic. Even though the network might still be able to learn, normalization makes learning easier, and feature-wise normalization is the standard and most widespread practice. In the proposed work feature-wise normalization is adopted: for each value of an attribute, we subtract the mean of the feature and divide by its standard deviation, so that each feature is centred around 0 with unit standard deviation. The data are normalized as

μ_j = Σ(f_j) / N,    σ_j = sqrt(Σ(f_j − μ_j)² / N),    f_j ← (f_j − μ_j) / σ_j,

where j = 1, 2, ..., 13, N is the total number of samples, μ_j is the mean of the values of feature j, and σ_j is its standard deviation. After normalization, the data are split into training and testing sets along with the class labels. The feature set D_F is partitioned into training D_FT = {f_1, f_2, f_3, ..., f_Ti} and testing D_Ft = {f_1, f_2, f_3, ..., f_ti}, and the class labels D_CL are partitioned into training D_CLT = {f_Ti} and testing D_CLt = {f_ti}, where f_Ti and f_ti take the target values t0 and t1 (0 = no disease, 1 = disease). A minimal sketch of this step follows.
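The sketch below shows one way this feature-wise standardization and train/test split could be implemented; the CSV file name, the "target" column name, and the 70/30 split used later in Sect. 4 are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# File name and the "target" column name are assumptions for illustration.
df = pd.read_csv("cleveland.csv").replace("?", np.nan).dropna()
X = df.drop(columns=["target"]).astype(float).values   # 13 input features
y = df["target"].astype(int).values                    # 0 = no disease, 1 = disease

mu, sigma = X.mean(axis=0), X.std(axis=0)   # feature-wise mean and standard deviation
X = (X - mu) / sigma                        # centred at 0 with unit standard deviation

# 70/30 train-test split, as used in Sect. 4
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```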
3.3 Designing Neural Networks
The design of the neural networks consists of framework design, defining the network, compiling the network, and parameter tuning.

Design Framework. The framework design and computational steps of the proposed neural network system are illustrated in Fig. 1. The architecture is organized into n fully connected layers L_i, i = 1, 2, ..., n, with n_i artificial neurons (nodes) per layer. The connections from layer L_{i−1} to L_i are established by a weight matrix W_i of size n_i × n_{i−1} and a bias vector b_i of length n_i, so the output of layer L_i is a vector of size n_i. If the inputs to layer L_i, given by the values at the n_{i−1} nodes of layer L_{i−1}, are represented as a vector a_{i−1} of size n_{i−1}, the output of layer L_i is the size-n_i vector W_i a_{i−1} + b_i. For a batch of bs training vectors, the inputs a_{i−1} are stored in a matrix A_{L_{i−1}} of size n_{i−1} × bs, and the outputs are

Z_{L_i} = W_i A_{L_{i−1}} + b_i    (1)

The deep neural networks in the proposed model are trained on the normalized training data D_FT, together with the corresponding class labels D_CLT, of N samples; this input is denoted A_0 (or D). The first layer L_1 takes the training data A_0 as input, together with weights W_1 and bias b_1; it consists of n_1 neurons and of parameters such as kernel regularizers, activation functions, and dropout. Its output is therefore

Z_{L_1} = W_1 A_0 + b_1    (2)
The output of the first layer is passed as the input to the next hidden layer, and so on. Based on the parameters, the number of neurons, and the number of layers, we designed two different deep neural networks to train the model; a small numerical sketch of the per-layer computation of Eq. (1) is given below.
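The following NumPy fragment illustrates the shapes involved in Eq. (1) for a single dense layer followed by a ReLU activation; the layer width and batch size are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_i, bs = 13, 128, 10                  # n_{i-1} inputs, n_i neurons, batch size

A_prev = rng.normal(size=(n_prev, bs))         # A_{L_{i-1}}: one training vector per column
W_i = 0.01 * rng.normal(size=(n_i, n_prev))    # W_i has shape n_i x n_{i-1}
b_i = np.zeros((n_i, 1))                       # bias vector of length n_i

Z_i = W_i @ A_prev + b_i                       # Eq. (1): Z_{L_i} = W_i A_{L_{i-1}} + b_i
A_i = np.maximum(0.0, Z_i)                     # ReLU activation used in the hidden layers
print(Z_i.shape)                               # (128, 10): n_i outputs for each batch column
```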
Fig. 1. DNN architecture and computational steps for training. Training data ‘D’ consists of 13 features and ‘N’ training samples, weights W, bias b are trained to make predictions t0 or t1.
Define Neural Network. The proposed neural network design is shown in Fig. 1. We define a sequential model of densely connected layers, with dropout and weight regularization added to each hidden layer. Dropout is an effective neural network regularization technique [6,21]: by randomly pruning neurons and their connections from the network with a given dropout rate during training, it prevents DNNs from overfitting [22–24]. A 'relu' activation function is used for each hidden layer, and a 'softmax' activation function for the output layer. We define two deep neural network architectures by varying the layers and neurons, namely DeepR and eDeepR; DeepR consists of 4 hidden layers and an output layer.

Compile the Network. Once defined, the neural network needs to be compiled; in the proposed work the model is compiled with the 'adam' optimizer and the categorical cross-entropy loss function.

Tuning Parameters. Hyperparameter optimization is critical for building an accurate and efficient neural network, and tuning the depth, the number of layers, and the number of neurons per layer is challenging. The hyperparameters of the proposed framework are

DnnParameters = [L_i, n_i, α, epochs, bs]    (3)

where L_i is the i-th layer and α is the regularization function. Using these hyperparameters, the proposed deep neural network architectures are trained on the training dataset D_FT with the class labels D_CLT. A minimal sketch of the two architectures follows.
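The following Keras sketch assembles the two architectures as described: DeepR with hidden layers of 128, 64, 16, and 8 neurons and eDeepR with hidden layers of 1024, 512, and 512 neurons (see Sects. 4.3 and 5), each with ReLU activations, dropout, weight regularization, a softmax output, the adam optimizer, and categorical cross-entropy. The L2 strength and dropout rate are illustrative assumptions.

```python
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.utils import to_categorical

def build_model(variant="eDeepR", input_dim=13, l2=1e-3, drop=0.2):
    """DeepR: 4 hidden layers (128-64-16-8); eDeepR: 3 hidden layers (1024-512-512).
    The L2 strength and dropout rate are illustrative assumptions."""
    widths = [128, 64, 16, 8] if variant == "DeepR" else [1024, 512, 512]
    model = models.Sequential([layers.Input(shape=(input_dim,))])
    for w in widths:
        model.add(layers.Dense(w, activation="relu",
                               kernel_regularizer=regularizers.l2(l2)))
        model.add(layers.Dropout(drop))
    model.add(layers.Dense(2, activation="softmax"))        # targets t0 / t1
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

edeepr = build_model("eDeepR")
# edeepr.fit(X_train, to_categorical(y_train), epochs=50, batch_size=10,
#            validation_split=0.1)
```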
4 Experiments and Results
In this paper, we conducted experiments based on a train–test split scheme. In existing models, results are reported on a 70% train / 30% test split of the dataset [24]. The proposed model, with its two architectures DeepR and eDeepR, is trained for 50 epochs with a batch size of 10 over the entire training dataset.

4.1 Metrics

To evaluate the performance of the proposed heart disease prediction model, the evaluation metrics accuracy, sensitivity, and specificity [22,32] are used. Accuracy is the proportion of correctly classified samples in the test dataset, sensitivity is the percentage of patients correctly classified, and specificity is the percentage of correctly classified healthy samples.

4.2 Dataset

The publicly available Cleveland heart disease dataset from the UCI repository is used in this research because it is publicly available and therefore improves the reproducibility of the results [13]. The dataset contains seventy-six attributes, but researchers commonly use a subset of fourteen attributes because of their availability and their ability to indicate heart disease at different phases of development. After preprocessing, 14 important features and 303 samples were selected and used in the experiments. The thirteen input features are age, sex, cp (chest pain type), trtbps (resting blood pressure), chol (cholesterol), fbs (fasting blood sugar), restecg, thalachh (maximum heart rate), exng (exercise-induced angina), oldpeak (ST depression induced by exercise relative to rest), slp (slope of the peak), caa, and thall; the fourteenth feature is the target, '0' for no disease and '1' for disease.

4.3 Results
The proposed heart disease prediction system consists of the two deep neural networks DeepR and eDeepR described in Sect. 3. Results were computed for both networks and are shown in Table 1. DeepR consists of 4 hidden layers and an output layer, with 128 neurons in the first hidden layer and 64, 16, and 8 neurons in the second, third, and fourth layers, respectively. We further investigated the impact of increasing the number of neurons in the hidden layers while reducing the number of layers; from this perspective, the efficient intelligent diagnostic architecture eDeepR was developed to improve the performance of the heart disease prediction system. eDeepR consists of 3 hidden layers and an output layer, with 1024 neurons in the first hidden layer and 512 neurons in each of the second and third layers. In both architectures dropout regularization is used to prevent overfitting, a 'relu' activation function is added to each hidden layer, and a 'softmax' activation function is added to the output layer for efficient classification. Results are reported using the classification accuracy, specificity, and sensitivity metrics of Sect. 4.1 and are shown in Table 1.

Table 1. Results of proposed system with DeepR and eDeepR architectures.

Models | Accuracy | Specificity | Sensitivity | Time (in ms for 1 epoch)
DeepR | 97.64 | 98.11 | 98.11 | 5
eDeepR | 99.53 | 100 | 100 | 9

4.4 Comparative Study
The proposed DeepR and eDeepR networks are compared with other existing methods in the literature that used the Cleveland dataset for the prediction of heart disease, as shown in Table 2.

Table 2. Comparison of proposed model architectures and existing models in the literature.

Year and Author | Methodology/Algorithms | Accuracy (%)
ToolDiag, RA [31] | IB1-4 | 50.0
WEKA, RA [31] | InductH | 58.5
ToolDiag, RA [31] | RBF | 60.0
WEKA, RA [31] | T2 | 68.1
WEKA, RA [31] | IB1C | 74
Singh, A & Kumar, R., 2020 [1] | Decision tree | 79.0
Bharti, R et al., 2021 [12] | Random forest | 80.3
Bharti, R et al., 2021 [12] | Decision tree | 82.3
Singh, A & Kumar, R., 2020 [1] | Support Vector Machine | 83.0
Bharti, R et al., 2021 [12] | SVM | 83.2
Bharti, R et al., 2021 [12] | Logistic Regression | 83.3
Bharti, R et al., 2021 [12] | K-Neighbors | 84.8
Polat, K et al., 2007 [25] | AIRS with fuzzy and KNN | 87.0
Singh, A & Kumar, R., 2020 [1] | k-nearest neighbor | 87.0
Polat, K et al., 2007 [25] | Fuzzy-AIRS–k-nn based system | 87.0
Mohan, S et al., 2019 [23] | Hybrid random forest with linear model | 88.7
Das, R et al., 2009 [28] | Neural networks ensemble method | 89.01
Liu, X et al., 2017 [22] | FRS classification system | 92.59
Ali, L et al., 2019 [21] | χ2-DNN | 93.33
Ramprakash, P et al., 2020 [32] | DNN and χ2-statistical model | 94.0
Bharti, R et al., 2021 [12] | DL | 94.2
Paul, A et al., 2017 [24] | Weighted fuzzy system ensemble | 95.56
Proposed DeepR | 4-layered DNN with Dropout | 97.64
Proposed eDeepR | 3-layered DNN with Dropout | 99.53
5 Result Analysis and Discussion
We make the following observations from the proposed architectures. The proposed DeepR network is designed with 4 hidden layers having a comparatively small number of neurons. It takes 5 ms per epoch and achieves a good result of 97.64% accuracy compared with the other existing models shown in Table 2. The proposed eDeepR network is designed with 3 hidden layers having more neurons than DeepR. As a result, the model is more effective and reaches 99.53% accuracy, as shown in Table 2. Increasing the number of neurons is not always a good way to improve the efficiency of a model. With this in mind, we designed two deep learning networks, DeepR with a smaller number of neurons and eDeepR with an increased number of neurons, and achieved better performance with both. As the number of neurons increases, one complete epoch takes 9 milliseconds (ms); increasing the number of neurons increases the complexity of the architecture and consumes more time. The results of DeepR and eDeepR are analyzed by comparing them with the well-known ML models and some DL models shown in Table 2. From the results, we can conclude that the proposed DeepR and eDeepR networks achieve better results and are efficient.
6 Conclusion
In this paper, we investigated state-of-the-art systems for the prediction of heart disease and designed two deep neural networks, namely DeepR and eDeepR. The intelligent fast diagnostic system DeepR is developed to improve the time efficiency of heart disease prediction, while the efficient intelligent diagnostic system eDeepR is developed to improve the results. Through the eDeepR model, we investigated the impact of increasing the number of neurons in the hidden layers on performance. The results are analyzed and compared with well-known ML models and some existing DL models for heart disease prediction. The comparative study and result analysis show that the proposed systems are efficient.
7 Future Direction
The current heart disease prediction systems are research prototypes with excellent proof-of-concept results; further research is necessary to turn them into robust diagnostic tools. An efficient system that automatically searches for the best features is needed, especially when clinical data are scarce, to enable more accurate and detailed prediction. In the future, it will also be necessary to develop a software package or tool with a user-friendly interface so that doctors or patients can use it directly.
References

1. Singh, A., Kumar, R.: Heart disease prediction using machine learning algorithms. In: International Conference on Electrical and Electronics Engineering (ICE3 2020), IEEE, 978-1-7281-5846-4/20 (2020)
2. Ayon, S.I., Islam, M.M., Hossain, M.R.: Coronary artery heart disease prediction: a comparative study of computational intelligence techniques. IETE J. Res. (2020). https://doi.org/10.1080/03772063.2020.1713916
3. Braunwald, E., Bonow, R.O.: Braunwald's Heart Disease: A Textbook of Cardiovascular Medicine, 9th edn. (2012)
4. Libby, Zipes, D.: Braunwald's Heart Disease: A Textbook of Cardiovascular Medicine, 2-Volume Set, 11th edn. (2018). ISBN: 9780323555937
5. Bleijendaal, H., et al.: Clinical applicability of artificial intelligence for patients with an inherited heart disease: a scoping review. Trends in Cardiovascular Medicine, Elsevier (2022)
6. Cho, Y.-R., Hu, X.: Network-based approaches in bioinformatics and biomedicine. Methods 198, 1–2 (2022)
7. Blassel, L., Zhukova, A., Villabona-Arenas, C.J., Atkins, K.E., Hue, S., Gascuel, O.: Drug resistance mutations in HIV: new bioinformatics approaches and challenges. Curr. Opin. Virol. 51, 56–64 (2021)
8. Cao, C., et al.: Deep learning and its applications in biomedicine. Genomics Proteomics Bioinform. 16, 17–32 (2018)
9. AlSaad, R., Malluhi, Q., Janahi, I., Boughorbel, S.: Predicting emergency department utilization among children with asthma using deep learning models. Healthcare Analyt. 2, 100050 (2022)
10. Bolhasani, H., Mohseni, M., Rahmani, A.M.: Deep learning applications for IoT in health care: a systematic review. Inform. Med. Unlocked 23, 100550 (2021)
11. Schmidt, B., Hildebrandt, A.: Deep learning in next-generation sequencing. Drug Discovery Today 26 (2021)
12. Bharti, R., Khamparia, A., Shabaz, M., Dhiman, G., Pande, S., Singh, P.: Prediction of heart disease using a combination of machine learning and deep learning. Computational Intelligence and Neuroscience, vol. 2021, Article ID 8387680 (2021)
13. UCI Repository of Machine Learning Databases. http://archive.ics.uci.edu/ml/datasets
14. Aggrawal, R., Pal, S.: Multi-machine learning binary classification, feature selection and comparison technique for predicting death events related to heart disease. Int. J. Pharmaceutical Res. Schol. ISSN 0975-2366 (2020)
15. Salhi, D.E., Tari, A., Kechadi, M.T.: Using machine learning for heart disease prediction. In: Advances in Computing Systems and Applications, pp. 70–78 (2021)
16. Rindhe, B.U., Ahire, N., Patil, R., Gagare, S., Darade, M.: Heart disease prediction using machine learning. Int. J. Adv. Res. Sci. Commun. Technol. 5(1) (2021)
17. Rajdhan, A., Sai, M., Agarwal, A., Ravi, D., Ghuli, P.: Heart disease prediction using machine learning. Int. J. Eng. Res. Technol. 9(04) (2020). ISSN: 2278-0181, IJERTV9IS040614
18. Srivastava, K., Choubey, D.K.: Heart disease prediction using machine learning and data mining. Int. J. Recent Technol. Eng. 9(1) (2020). ISSN: 2277-3878
19. Patel, J., Upadhyay, T., Patel, S.: Heart disease prediction using machine learning and data mining technique. IJCSC 7, 129–137 (2016)
20. Palaniappan, S., Awang, R.: Intelligent heart disease prediction system using data mining techniques. Int. J. Comput. Sci. Network Secur. 8(8), 343–350 (2008)
21. Ali, L., Rahman, A., Khan, A., Zhou, M., Javeed, A., Khan, J.A.: An automated diagnostic system for heart disease prediction based on χ2 statistical model and optimally configured deep neural network. IEEE Access 7 (2019). https://doi.org/10.1109/ACCESS.2019.2904800
22. Liu, X., et al.: A hybrid classification system for heart disease diagnosis based on the RFRS method. Computational and Mathematical Methods in Medicine, vol. 2017, Article ID 8272091 (2017)
23. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access (2019). https://doi.org/10.1109/ACCESS.2019.2923707
24. Paul, A.K., Shill, P.C., Rabin, M.R.I., Murase, K.: Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Appl. Intell. 48(7), 1739–1756 (2017)
25. Polat, K., Sahan, S., Gunes, S.: Automatic detection of heart disease using an artificial immune recognition system (AIRS) with fuzzy resource allocation mechanism and k-nn based weighting preprocessing. Expert Systems with Applications 32, 625–631 (2007)
26. Manogaran, G., Varatharajan, R., Priyan, M.K.: Hybrid recommendation system for heart disease diagnosis based on multiple kernel learning with adaptive neuro-fuzzy inference system. Multimedia Tools Appl. 77, 4379–4399 (2017)
27. Tomov, N.-S., Tomov, S.: On deep neural networks for detecting heart disease (2018). https://arxiv.org/abs/1808.07168
28. Das, R., Turkoglu, I., Sengur, A.: Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36, 7675–7680 (2009)
29. Lopes, R.R., et al.: Improving electrocardiogram-based detection of rare genetic heart disease using transfer learning: an application to phospholamban p.Arg14del mutation carriers. Computers in Biology and Medicine, vol. 131 (2021)
30. Samuel, O.W., Asogbon, G.M., Sangaiah, A.K.: An integrated decision support system based on ANN and Fuzzy-AHP for heart failure risk prediction. Expert Syst. Appl. 68, 163–172 (2016)
31. Benchmark datasets used for classification: comparison of results (umk.pl) (2007)
32. Ramprakash, P., Sarumathi, R., Mowriya, R., Nithyavishnupriya, S.: Heart disease prediction using deep neural network. In: Proceedings of the Fifth International Conference on Inventive Computation Technologies, IEEE Xplore (2020). Part Number: CFP20F70-ART; ISBN: 978-1-7281-4685-0
High-Performance Computation in Big Data Analytics

Shabnam Kumari(B) and P. Muthulakshmi

Department of Computer Science, CS&H, SRM Institute of Science and Technology, Kattankulathur, Chennai 603203, Tamilnadu, India
{sk2581,muthulap}@srmist.edu.in
Abstract. For many years, big data analytics has relied on High-Performance Computing (HPC) for efficient analysis. Data is now growing at an accelerated pace, so new types of high-performance computing will be required to access historically unprecedented volumes of data. To identify patterns and new insights, high-performance data analytics combines high-performance computing (HPC) with data analytics: the technique of quickly evaluating exceptionally large data sets to identify insights, accomplished by utilising HPC's parallel processing to execute powerful analytic tools. For government and commercial organisations that need to integrate high-performance computing with data-intensive analysis, high-performance data analytics infrastructure is a new and rapidly growing sector. Natural hazards, for example unexpected rainfall, are unpredictable and affect our day-to-day activities, so climate prediction has become a necessity; it requires very large databases for storing, maintaining, and processing the datasets used to perform predictions well. Big data computing and HPC have progressed separately over time. The two techniques are becoming increasingly dependent on one another for data management and algorithms due to the growth of data and the requirement for machine learning algorithms. For instance, public clouds like Microsoft Azure are enabling artificial intelligence algorithms on big datasets by deploying large-scale Graphical Processing Unit (GPU) deployments in HPC clusters and adding high-performance computing instances with InfiniBand. Understanding the evolution of HPC systems and big data helps to define the important differences, as well as the goals and architectures that support them. Big data systems have benefitted data management, data querying, and streaming applications.

Keywords: Big Data Analytics · Artificial Intelligence · High Performance Computing · Cloud Computing
1 Introduction

According to studies, HPC computers [1] have substantially aided machine learning, deep learning, and graph algorithms for analysing large amounts of data. In general, fast
and efficient data delivery is difficult to achieve in data analytics computations as system complexity and parallelism grow while fault tolerance and scalability must still be ensured. For reasons of energy efficiency, we are currently concentrating more on minimising the transfer of data at all stages of the memory hierarchy, which calls for a rethinking of algorithms and of the entire HPC software stack. In spite of malfunctions or other faults, these systems must keep operating. One of the key issues with Big Data is the paradox of choice: an overload of choices may lead to inaction. High-performance computing infrastructures have long been used in big data analytics [2, 3]. Data centre aggregation and extensive sharing of computing resources have contributed to considerable reductions in computation time since the advent of cloud computing, and cloud computing is gaining traction as a means of fully using HPC's capabilities. Analysing data faster has always been the aim of data analysts, and for huge data sets the analysis needs to be both fast and error-free. Depending on the sort of data being analysed, system requirements can vary significantly: SQL queries against a structured database have very different system characteristics than real-time analytics or streaming data, for example. A data warehouse, a NoSQL database, graph analysis capabilities, or Hadoop/MapReduce processing may be included in a solution, depending on the data and the analytics processes. Data movement must be considered in addition to designing a system to suit the demands of the database and the analytic processing algorithms. To make efficient use of the more expensive cores, it should be considered how to ingest data from its source, stage data in preparation for analysis, and keep processes busy. Traditionally, Big Data computing stacks have been designed to efficiently acquire, store, and analyse huge volumes and types of data at high speed [4, 5], with infrastructure and software design often tuned for cost-effectiveness. Because Big Data computing stacks are almost always run in a cloud environment, this research looks at them from that perspective. Analytics are mainly classified as descriptive, predictive, and prescriptive: descriptive analysis is mainly based on visualization, predictive analysis uses statistical models, and prescriptive analysis uses sophisticated machine learning techniques and simulations. Any Big Data application follows the generic phases shown in Fig. 1.
Fig. 1. Phases of Big Data
2 Characteristics of Big Data

Today big data is being produced at a rapid rate by smart devices. The characteristics (the V's) of big data are [8]:
• Volume – the massive amounts of data that must be communicated from one source to another.
• Velocity – the pace at which information is created, transmitted, acquired, and analysed. Because of the rising rate at which data is generated, it must be transferred and accessed at a constant rate to provide real-time access to the several applications that depend on it.
• Variety – data that is produced in various formats, both unstructured and structured. Structured data, for example names, mobile numbers, addresses, or financials, can be organised within the columns of a database; such data is simple to enter, preserve, search for, and analyse. Around 80% of current data is unstructured, which makes it more difficult to sort through and extract information from. Chats, music, blogs, images, social media updates, short videos, log files, and machine and sensor data are some examples of unstructured data.
• Variability – the high amount of irregularity in data flow, as well as the variations present at peak periods. The fluctuation is explained by a large number of input attributes derived from a diverse set of data types and sources. Variability also refers to the irregularity with which enormous datasets are loaded into data storage.
• Value – the hidden value that can be discovered from data and used to take action. Big data can help you gain a better understanding of the customer, target them more effectively, streamline procedures, and improve machine or company performance.
• Veracity – the trustworthiness and accuracy of the collected data; noisy, incomplete, or inconsistent sources reduce the confidence that can be placed in the analytics results.
• Validity – the precision with which data has been obtained for its target purpose. Data governance rules should be followed to maintain data quality, standardized definitions, and metadata.
• Vulnerability – the security of the data when it is collected and stored.
• Volatility – how long the data remains valid and how long it must be archived until it is no longer relevant to the present investigation.
• Visualization – the procedure for making data understandable to non-technical decision-makers and stakeholders. Visualisation is the process of converting data to information, information to insight, insight to knowledge, and knowledge to a decision-making advantage.

Through this research, we will explore the different grounds and places where HPC can be implemented in Big Data Analytics [6, 7], including the emerging high-performance architectures for data-intensive applications and efficient analytical strategies to boost data processing.
3 Big Data and Cloud Computing

Big data entails handling petabytes of data, and the cloud's scalable environment allows data-intensive applications to be launched to support corporate analytics. The cloud also
promotes internal connectivity and collaboration, giving more workers access to crucial analytics and accelerating data exchange. Cloud computing and big data are inextricably interwoven. Big data is more about extracting value, whereas cloud computing is focused on scalable, flexible, on-demand, subscription-based self-service models. Cloud computing delivers on-demand, integrated computing resources together with the required storage and processing abilities, while big data analysis needs enormous on-demand computational resources and dispersed storage. To fulfil the requirements of exponential data growth, cloud computing offers distributed processing for scaling and virtual machine expansion. It has aided in the creation of analytical platforms that deliver contextually processed information from all stored data to meet the demands of users, especially those in big data-driven organisations. As a consequence, corporations such as Amazon, Microsoft, and Google have benefited: they have begun to offer big data platforms that are both viable and capable of gathering data and analysing it to provide proactive and contextual experiences.

The cloud makes scaling easier and faster. Increased processing power, storage, and other resources are needed to handle huge quantities of both structured and unstructured data ("Cloud plus data: 5 key benefits of the powerful combination"). The cloud not only offers readily obtainable infrastructure but also the ability to quickly scale that infrastructure to handle significant increases in traffic or usage. Furthermore, cloud-based big data mining has decreased the cost of analytics: as well as reducing on-premises equipment, organisations save on system maintenance and updates, energy consumption, facilities management, and more.

The security and privacy of personal data is one of the most significant aspects of big data. Given that data is stored and processed via third-party infrastructure and services, maintaining privacy and security within the cloud is a significant challenge, and the risk increases as big data becomes larger, more varied, and more accurate. Mobile health, for instance, has changed how medical services are delivered in a number of ways: through a range of healthcare mobile applications, consumers are capable of managing their lifestyles, health and wellness, medication reference and analysis. Confidential information may be compromised because everything is transmitted via the mobile internet, which makes it possible for hackers or other outside parties to access the network. To engender customer confidence and ensure that their data is not compromised, a number of new laws, privacy regulations, guidelines, protections, industry regulations, and contractual arrangements need to be developed between providers and customers.

With the cloud, you may concentrate on producing insights instead of worrying about the technical components of big data processing. Even better, the cloud's pay-as-you-go model is more economical and wastes fewer resources. To comprehend why these technologies are frequently grouped together, you must first comprehend what Big Data and cloud computing are. The most basic definition of Big Data is a very large amount of data - terabytes, petabytes, or even more - of two types, structured and unstructured [9, 10]; this data can be so vast that it is impossible to process using typical database and software methodologies. In its simplest form, cloud computing is the act of storing and accessing data, files, and software over the Internet as opposed to on a local hard disc; the cloud is, in essence, a metaphor for the Internet.
4 Big Data and Deep Learning

Deep learning algorithms utilise a hierarchical, multi-level strategy to process data: data at a higher level is handled using similar techniques applied to lower-level data representations. Four of the most important traits of Big Data are Volume, Variety, Veracity, and Velocity, and deep learning plays a significant role in addressing the problems associated with volume and variety. Deep learning easily extracts information from huge volumes of data that shallow learning fails to handle. Because deep learning deals with abstraction and representation, it helps in the analysis of raw data of different types and eliminates the requirement for humans to engineer extra attributes for each new data type. Once hierarchies of data abstractions are learned from unsupervised data using deep learning, more conventional discriminative models can be trained using proportionally fewer supervised/labelled data points, where the labelled data is collected through human or expert input. When comparing shallow and deep learning architectures, the latter has proven to be more effective at capturing non-local, global correlations and trends in data. Some notable aspects of the abstract representations learned by Deep Learning are:

• With relatively simple linear models, the information gained from intricate and abstract data representations works efficiently.
• As the extraction of data representations from unsupervised data becomes increasingly automated, its application broadens to data types such as text, images, and sound.
• Elevated levels of abstraction and representation of raw data can yield relational and semantic knowledge.

Although there are many facets of data representation that depend on deep learning, the ones addressed here are intended for Big Data Analytics. The data models of big data are getting more complex by the day, and applying machine learning algorithms to these complex models is becoming challenging. Convolutional neural networks and deep belief networks are two examples of deep learning methods for big data; due to the sophisticated nature of big data and the numerous layers in such networks, high-performance computation is needed to train and apply them.

4.1 Semantic Indexing

Information retrieval and efficient data storage are two of the most vital responsibilities of big data. Data is gathered and made available across a variety of disciplines, and the sheer amount of data has always been a problem. To circumvent this, data must be stored using semantic indexes rather than as raw bit strings; this facilitates a more efficient presentation of data and allows massive volumes of data to be processed more quickly. Rather than consuming raw input for data indexing, deep learning is widely utilised to build high-level abstracted representations of the data that may be used for semantic indexing. Such representations can expose complex relationships and characteristics (particularly when the raw data set is Big Data), resulting in semantic understanding and knowledge. Data representations are important for data indexing because they make it possible to
keep data instances or points with similar representations adjacent to one another in memory, which makes information extraction more efficient [11]. The elevated abstract data representations must, however, be relevant and display relational and semantic connectivity to really give a decent semantic grasp and knowledge of the input. Deep Learning facilitates the understanding of the data's semantics and relationships, and vector representation of data instances speeds up searching and retrieval [12]. Whenever a data instance is conveyed by a vector representation, it can be used directly for semantic indexing, because vector-based comparison is more effective than comparing instances derived from raw input. This is because, as opposed to raw big data, the learnt complex information representations carry semantic and relational information. Data instances having similar vector representations are likely to have similar semantic meanings. As a result, semantic indexing is possible when complicated high-level data abstractions are represented as vectors.

4.2 Statistical Modelling

The efficiency of a computational technique is dependent on the use of an appropriate statistical model, and statistical modelling is the initial stage in constructing any analytical solution. Given the variability of data, the complexity of the mixture population model is considered a challenge, and calculating relevant statistical indicators inside and across mixture model subpopulations presents a concern. Because of its multiple dimensions, big data analytics has a high-dimensional and sparse nature, which adds other challenges such as noise accumulation. Other undesired situations, like misleading correlations and inadvertent endogeneity, are also caused by high dimensionality and have direct detrimental consequences on analytics outcomes. Statistical models are the source of the majority of predictive analytics algorithms, and the aforementioned statistical big data modelling concerns pose challenges in terms of analytics accuracy.
5 Challenges of Deep Learning in Big Data

5.1 Utilization of Incremental Learning for Non-stationary Data

Managing streaming and fast-moving continuous input data has been one of Big Data Analytics' most difficult problems. Analysing these kinds of data is important in real-time monitoring tasks, and deep learning must adapt to handle such flowing data (a minimal incremental-learning sketch is given after Sect. 5.2).

5.2 High-Dimensional Data

When learning from high-dimensional data such as images, some deep learning algorithms are not efficient because of the slow learning process associated with a multi-level hierarchy of data. High-dimensional data is not only complex but also contributes largely to the volume, so this complexity makes the data hard to extract and process quickly, and it is necessary for deep learning to adapt to such data.
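As an illustration of the incremental learning mentioned in Sect. 5.1 (not taken from the paper), scikit-learn's partial_fit allows a model to be updated batch by batch on a stream; the synthetic stream below stands in for real fast-moving input data:

```python
# Minimal sketch of incremental (out-of-core) learning on a data stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])                    # must be declared up front for streaming

def stream_of_batches(n_batches=100, batch_size=64, n_features=20):
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + 0.1 * rng.normal(size=batch_size) > 0).astype(int)
        yield X, y

for X_batch, y_batch in stream_of_batches():  # each batch is seen once, then discarded
    clf.partial_fit(X_batch, y_batch, classes=classes)
```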
5.3 Large Scale Models

The value of large networks has been empirically shown, with a focus on models with a large number of training examples that can extract more delicate features and representations.

5.4 Future of Deep Learning in Big Data

Deep learning algorithms [12] usually learn the pattern of data by examining a portion of it and then utilise the learnt patterns for extracting data abstractions and representations. A common question is what volume of data is enough to recognize the patterns and generalise to fresh data from the same Big Data application domain. Variety also has to be considered: when shifting between the input data source and the target source in Big Data Analytics, the issue for deep learning becomes one of domain adaptation. It is vital to have a strong data representation that is efficient for data tagging and semantic indexing, to enable effective data extraction.

5.5 Application of Machine Learning Algorithms

5.5.1 PageRank

PageRank is an algorithm created by Larry Page for the Google search engine to rank web pages. The algorithm can be used for ranking large amounts of data, as it predicts how likely a piece of data is to be useful on a random walk. PageRank is represented as
PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}    (1)
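For illustration, a direct implementation of Eq. (1) is sketched below, where B_u denotes the set of nodes linking to u and L(v) the number of outbound links of v; the toy graph is hypothetical and no damping factor is used, matching the formula as stated:

```python
# Iterative PageRank following Eq. (1); assumes every node appears as a key in `links`.
def pagerank(links, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    pr = {n: 1.0 / len(nodes) for n in nodes}
    out_degree = {n: max(len(links[n]), 1) for n in nodes}
    for _ in range(iterations):
        new_pr = {n: 0.0 for n in nodes}
        for v, targets in links.items():
            share = pr[v] / out_degree[v]     # PR(v) / L(v)
            for u in targets:
                new_pr[u] += share            # summed over all v in B_u
        pr = new_pr
    return pr

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```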
5.5.2 Single Source Shortest Path (SSSP)

The single source shortest path problem, commonly solved with the Bellman-Ford algorithm, seeks to minimize the distance from a source vertex to every other vertex. Such algorithms are mainly used for finding the shortest driving directions in maps and in various other applications. The distance update is represented as

Distance(v) = \min\big(Distance(v), \min_{u \mid (u,v) \in E} \{Distance(u) + w(u,v)\}\big)    (2)
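A minimal Bellman-Ford sketch implementing the relaxation rule of Eq. (2) is shown below; the small example graph is made up for illustration:

```python
# Bellman-Ford: repeatedly relax every edge (u, v) in E so that
# Distance(v) = min(Distance(v), Distance(u) + w(u, v)).
import math

def bellman_ford(edges, vertices, source):
    """edges: list of (u, v, w) triples; returns shortest distances from source."""
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    for _ in range(len(vertices) - 1):        # at most |V|-1 relaxation rounds
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

print(bellman_ford([("s", "a", 4), ("s", "b", 1), ("b", "a", 2), ("a", "t", 3)],
                   {"s", "a", "b", "t"}, "s"))
```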
6 Parallel Computation in Big Data

By implementing parallelism in the computation or analysis process, we can save time and cost when making a decision. This section discusses how parallel computation can take place in big data processing.
Fig. 2. Types of Parallel Computing
6.1 What is Parallel Computation?

In parallel computation, multiple processors work together to perform many smaller calculations that are broken down from larger, more complex tasks. Parallel computation is the act of breaking complex problems down into smaller, related pieces of work, which are then executed in parallel by several processors communicating over shared memory. The four basic types of parallel computing are shown in Fig. 2. By boosting the available computing capacity, parallel computing expedites the processing of applications and the completion of tasks; most supercomputers are built using parallel computing concepts, and parallel processing is frequently used in operational contexts where a large amount of processing capability or calculation is required (a minimal data-parallel sketch is given after Sect. 6.2).

6.2 Parallel Computation Models of Data Analysis

The parallel computing paradigm can be regarded as the foundation of all big data analytics, as it aims to maximise resource utilisation while yielding significant time savings. MapReduce (a Hadoop framework used to build applications that can manage massive amounts of data on huge clusters) is the usual option for processing large amounts of data and has dominated the big data world, and researchers are looking for ways to improve this most common parallel programming approach. For big data analytics, massively parallel programming frameworks can help overcome storage and communication limitations; because of the data's heterogeneity and unstructured nature, parallel computing models are required. Whilst efforts are being made to use parallelism when processing unstructured data such as text, cost- and time-effective parallel computing for large data analytics is still a long way off. Deep learning for big data is a novel concept that attempts to obtain important insights from massive amounts of data using machine learning techniques, and these programmes are designed to run in parallel. The sheer volume of data, as previously stated, presents numerous obstacles to implementing deep big data analytics. One of the key goals of analytics is to assist visualisation, which can only be accomplished effectively with the help of appropriate parallel processing. Designing successful massively parallel systems for data with great volume, variety, and velocity is exceedingly difficult, and it remains a challenge in big data analytics.
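As a toy illustration of the data-parallel idea in Sect. 6.1 (not from the paper), a large summation can be split into chunks that a pool of worker processes evaluates simultaneously:

```python
# Minimal data-parallel sketch: split one large task into chunks for several workers.
from multiprocessing import Pool

def partial_sum(chunk):
    return sum(x * x for x in chunk)          # the "smaller calculation" done per worker

if __name__ == "__main__":
    chunks = [range(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```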
6.3 Hadoop MapReduce

Hadoop's MapReduce architecture is used to build applications that can process huge amounts of records on large clusters; it is often described as a programming model for analysing huge datasets over several computer clusters. With this model, data can be kept in a distributed format, which simplifies working with vast amounts of data and vast computation. In MapReduce there are two main tasks, map and reduce, and the map phase must finish before the reduce phase starts. In the map operation, the input dataset is separated into sections, and the map tasks process these chunks in parallel. The outputs of the map tasks are then used as inputs to the reduce tasks: reducers consolidate the intermediate data from the maps into smaller sets of tuples, producing the framework's final output. The MapReduce framework helps with task management and scheduling and re-runs failed tasks, so even programmers who are not experienced with distributed computing find the framework simple to use. MapReduce programs can be written in a variety of programming languages, including Java, Hive, Pig, Scala, and Python.
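The paper does not give a MapReduce listing; the following minimal word-count sketch, written with the mrjob Python library, shows the map and reduce tasks described above (counting words stands in for the actual analytics job):

```python
# word_count.py - minimal MapReduce word count using the mrjob library.
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # Map: emit (word, 1) for every word in an input line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Reduce: consolidate intermediate (word, 1) pairs into a single count.
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()    # run locally: python word_count.py input.txt
```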
7 Issues of Parallelism

Implementing parallelism in big data analysis has its own costs, such as heavy processing and initial setup cost. Several other issues also arise with it, discussed below.

7.1 Fault-Tolerance

Fault tolerance is the property that allows a system to carry on functioning even if a component fails. Fault tolerance in Hadoop refers to the cluster environment's automatic load balancing and the backup of data processing to replica nodes in the case of a failure during runtime; as a result, no data is lost and processing does not stop in the middle. Developers using Hadoop do not have to worry about fault tolerance because the framework takes care of it. Furthermore, when data is stored on a data node, it is replicated according to a replication factor, which is 3 by default, meaning that each block is duplicated three times. In the case that a data node fails, the name node serves the data from other nodes, ensuring fault tolerance.

7.2 Memory Model

Several approaches have been utilised to logically partition the global address space and map it to system resources, including array distribution and replication; distribution techniques such as block, round-robin, and user-defined distributions are all common. With the likely increase in heterogeneity and specialisation of compute cores, data mapping on future platforms will need to reflect the projected spectrum of placements of the tasks that use the data.
7.3 Storage and I/O

Because they demand vast quantities of input data, numerous scientific data analytics applications, such as seismic algorithms, are nowadays increasingly I/O bound on modern systems. The Kirchhoff migration algorithm, for instance, which often uses depth migration computation to develop three-dimensional visualizations of the subsurface of the earth, can involve over 500 million traces, resulting in many terabytes of data.
8 Conclusion

This paper focused on the difficulties and requirements of big data analytics and on how to use HPC to improve the speed and efficiency of data processing. The paper's major purpose is to describe how and why HPC and Big Data models are integrated. A discussion of deep learning also covered topics such as semantic indexing. We discussed cloud computing, since the future of cloud computing makes maximum use of HPC, and how cloud computing can decrease security risks. A study of parallel computation was also presented, concluding that it can increase the efficiency of data analysis. We also discussed the machine learning aspects of Big Data and some applications such as PageRank and SSSP. The discussion of parallelism opens a lot of ground for further work: these concepts will be explored and implemented to aid in the development of a fault-tolerant, data-intensive network of applications.
References

1. Tulasi, B., Wagh, R.S., Balaji, S.: High-performance computing and big data analytics - paradigms and challenges. Int. J. Comput. Appl. 116(2) (2015)
2. Asaadi, H., Khaldi, D., Chapman, B.: A comparative survey of the HPC and big data paradigms: analysis and experiments. In: IEEE International Conference on Cluster Computing (CLUSTER), pp. 423–432 (2016). https://doi.org/10.1109/CLUSTER.2016.21
3. Big Data Meets High-Performance Computing. Intel Enterprise Edition for Lustre software and Hadoop combine to bring big data analytics to high-performance computing configurations. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-meets-high-performance-computing-white-paper.pdf
4. The convergence of HPC and Big Data: What does it mean for HPC sysadmins? https://insidehpc.com/2019/02/the-convergence-of-hpc-and-bigdata-what-does-it-mean-for-hpc-sysadmins/
5. Anderson, M., et al.: Bridging the gap between HPC and big data frameworks. In: Proceedings of the VLDB Endowment, vol. 10, no. 8, pp. 901–912 (2017). https://doi.org/10.14778/3090163.3090168
6. Muniswamaiah, M., Agerwala, T., Tappert, C.: Big data in cloud computing review and opportunities. Int. J. Comput. Sci. Inform. Technol. 11(4) (2019)
7. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015). https://doi.org/10.1186/s40537-014-0007-7
8. Kumari, S., Muthulakshmi, P.: Transformative effects of big data on advanced data analytics: open issues and critical challenges. J. Comput. Sci. 18(6), 463–479 (2022). https://doi.org/10.3844/jcssp.2022.463.479
9. Kumari, S., Vani, V., Malik, S., Tyagi, A.K., Reddy, S.: Analysis of text mining tools in disease prediction. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, T.-P. (eds.) HIS 2020. AISC, vol. 1375, pp. 546–564. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73050-5_55
10. Mishra, S., Tyagi, A.K.: The role of machine learning techniques in internet of things-based cloud applications. In: Pal, S., De, D., Buyya, R. (eds.) Artificial Intelligence-based Internet of Things Systems, Internet of Things (Technology, Communications and Computing). Springer, Cham (2022). https://doi.org/10.1007/978-3-030-87059-1_4
11. Varsha, R., Nair, S.M., Tyagi, A.K., Aswathy, S.U., RadhaKrishnan, R.: The future with advanced analytics: a sequential analysis of the disruptive technology's scope. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, T.-P. (eds.) HIS 2020. AISC, vol. 1375, pp. 565–579. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73050-5_56
12. Tyagi, A.K., Rekha, G.: Challenges of applying deep learning in real-world applications. In: Challenges and Applications for Implementing Machine Learning in Computer Vision, pp. 92–118. IGI Global (2020). https://doi.org/10.4018/978-1-7998-0182-5.ch004
Evaluation of Semantic Parsing Frameworks for Automated Knowledge Base Construction

Martin Verrev(B)

Tallinn University of Technology, Tallinn, Estonia
[email protected]
Abstract. Semantic parsing is a subfield of natural language understanding that translates natural language utterances into detailed representations of their meaning. Though numerous meaning representation frameworks exist, none has been universally accepted. The current paper provides a comparative study of the most promising semantic representations of text - taking into account parsers, performance, extensibility, available corpora, and other aspects - used for the automated construction of a knowledge base for commonsense reasoning. In parallel, a corpus was constructed, and experiments were conducted to capture the linguistic attributes essential for automated reasoning. Based on the findings, the author suggests using said parsers in an ensemble.

Keywords: knowledge extraction · natural language understanding · semantic parsing · meaning representations

1 Introduction
We are surrounded by intelligent and sophisticated technologies that excel in performing domain-specific tasks but are often found lacking where general knowledge is required. On the other hand, humans are capable of solving tasks that need reasoning - the capacity to make sense of things, applying logic, adapting to or justifying practices and beliefs based on often incomplete information. It is possible due to having prior commonsense knowledge - facts about the everyday world everyone is expected to know, and natural language is the medium that captures such knowledge. The current paper focuses on semantic parsing, a sub-field of natural language understanding concerned with mapping natural-language utterances to detailed representations of their meaning in formalized representation languages - having an ontology of types, properties, and relations. Representations thus generated go beyond shallow identification of roles and objects in a sentence and reflect the meaning of a sentence as understood by the native speakers of the said language, being both machine and human-readable.
While numerous meaning representation notations exist, none of them has been universally accepted. Several technologies exist that all seem to capture various aspects of the meaning but fail in different, often unexpected contexts. Thus, it is crucial to investigate and conduct experiments on existing technologies and provide a systematic approach to evaluating them. The contribution of the paper is a review of the existing frameworks and a comparative study in the field of semantic parsing - being valuable for both the researcher and the practitioner who needs to start applying the parsers in his or her knowledge extraction and formalization tasks. The paper targets an accurate, efficient, and extensible method for meaning extraction for automated graph-based commonsense knowledge base construction. For this, the following research questions were stated: What is the current state of the art in the field of semantic parsing? What are the criteria for choosing a parser for a semantic parsing task? What is the optimal parser or set of parsers for knowledge base construction for a commonsense reasoner?
2 Related Works
Davis [1] provides a systemic overview of logic-based formalisms for commonsense reasoning. Kamath [2] provides a survey of approaches and formalisms for parse generation. The CoNLL 2020 shared task on Cross-Framework Meaning Representation Parsing [3] covered the translation of natural language utterances into different graph representation flavors. Several papers have been published comparing aspects of different frameworks - e.g., UCCA vs. AMR [4], or UCCA vs. DRS [5].
3 Methodology
During the preliminary phase, a review of existing semantic parsing frameworks was conducted to identify the key features and attributes of each framework. The following characteristics were identified for each representation: (a) goals: primary features and driving motivation behind the framework; (b) research: anchor and other notable publications; (c) theories: linguistic and logical theories behind the framework; (d) data: underlying datasets, availability of annotated corpora and semantic representation; (e) tooling: availability of parsing and visualization tools. After selecting the frameworks for further analysis, the parsing tools were reviewed and chosen for each notation based on the following attributes: (a) publicly available source code to re-create the results; (b) availability of journal articles or conference papers regarding said tool; and (c) available corpora and models to re-create the experiments described in said articles. For conducting the experiments, a representative sample of said tools was chosen; for frameworks having more than two parsers, the following criteria were applied: development activity - how active and up-to-date the development of said parser is; accuracy - the parsing accuracy of said tool based on the literature; and
lightness - given a choice between otherwise matching and equally performant parsers, the more lightweight one was chosen. In addition, a minimal corpus was constructed to capture the essential linguistic features during the experiments. For the evaluation of parse results, a scale was constructed to evaluate the performance of said parsers. Due to the different notations and aspects captured, two distinct scales were defined: a qualitative hand-evaluation for correctness - how accurately the parser captures the meaning - and a quantitative evaluation of expressiveness - how much information it captures. The frameworks were assessed on said scales. After evaluating the performance, a qualitative analysis was conducted, taking into account the features of each framework, regarding its suitability for use as a preliminary step in constructing a commonsense knowledge base.
4 Semantic Representation Frameworks
The following frameworks were chosen for further analysis during the preliminary phase, as described in the previous section.
4.1 Abstract Meaning Representations
Abstract Meaning Representation (AMR) is a semantic formalism based on propositional logic and the neo-Davidsonian event representations, where each representation is a single-rooted, directed graph. AMR is strongly biased towards English, though it does support multilingual meanings. Its concepts are either English verbs, PropBank framesets, or specific keywords. AMR also supports NER, question detection, within-sentence co-reference, modality, and question identification. Limitations of AMR are the lack of a universal quantifier and missing inflections for tense and number. AMR 3.0 has two corpora publicly available with licensing terms not specified, with additional corpora available for LDC members: The Little Prince Corpus consists of 1562 and the Bio AMR Corpus of 6952 sentences. AMR parsers have numerous implementations - on GitHub alone, there are 45 projects tagged as such. The JAMR parser (https://github.com/jflanigan/jamr) was initially constructed for SemEval 2014; it is built on Scala 2.0 and is supported by 200 hand-aligned sentences of the AMR corpus. Transition AMR Parser (https://github.com/IBM/transition-amr-parser) is a transition-based parser for AMR built on top of PyTorch. It consists of a state machine and oracle transforming the sequence-to-graph task into a sequence-to-sequence problem, and a sequence-to-sequence model that encodes the stack and buffer state of the parser into its attention heads; the project is under active development. amrlib (https://github.com/bjascob/amrlib) is a Python 3 library built on PyTorch for AMR parsing, generation, and visualization that can also be used as a spaCy extension, making it useful for constructing an extraction pipeline. The project is under active development.
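As a small illustration (assuming a sentence-to-graph model such as parse_t5 has already been downloaded and set as the default for amrlib, which is not shown in the paper), a sentence can be parsed to Penman-notation AMR as follows:

```python
# Minimal amrlib usage sketch: sentence-to-graph parsing.
import amrlib

stog = amrlib.load_stog_model()               # sentence-to-graph model (pre-downloaded)
graphs = stog.parse_sents(["Glass does not conduct electricity."])
print(graphs[0])                              # AMR graph in Penman notation
```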
4.2 Universal Conceptual Cognitive Annotation
Universal Conceptual Cognitive Annotation (UCCA) is a language-agnostic annotation scheme based on Basic Linguistic Theory. Natural language utterances are converted into a graph containing purely semantic categories and structure, whereas syntactic categories and structure are discarded. The meaning representations are not tied to any specific domain or language but provide a coarse-grained interpretation that allows for open-ended and partially automated extensions using cognitively motivated categories. The foundational layer can be extended by adding extra domain- or language-specific layers. The focus of UCCA has been the ease of annotation. The base element of the foundational layer is a scene, describing a movement, action, or event [6]. UCCA has annotated corpora in English, French, and German available under the Creative Commons 3.0 license. The following parsers were identified for UCCA: UCCA Parser (https://github.com/SUDA-LA/ucca-parser) was constructed for SemEval 2019 Task 1: Cross-lingual Semantic Parsing with UCCA and runs on Python 3 and PyTorch. TUPA (https://github.com/danielhers/tupa) is a transition-based UCCA parser based on a bidirectional LSTM.
4.3 Universal Dependencies
Universal Dependencies (UD, https://universaldependencies.org/) is a project that combines Stanford Dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets for encoding additional morphological information in the syntax [7]. UD provides a universal annotation scheme and a set of categories for multilingual corpora consisting of over 200 treebanks in over 100 languages [8]. UDepLambda (https://github.com/sivareddyg/UDepLambda) [9] is a parser that performs semantic parsing on UD corpora and transforms the text into lambda calculus. uuparser (https://github.com/UppsalaNLP/uuparser) [10] is notable for its ability to parse input texts with multi-treebank models. Stanza (https://stanfordnlp.github.io/stanza/) is a Python NLP package that ships with a dependency parsing module; the project is under active development.
4.4 Elementary Dependency Structures
Elementary Dependency Structures (EDS, http://moin.delph-in.net/wiki/EdsTop) present an approach to Minimal Recursion Semantics (MRS) banking. MRS is an approach where each input item in a corpus is paired with elementary predicates - each being a single relation with its associated arguments - followed by manual disambiguation of quantifiers [11].
The semantic form is based on the notion of semantic discriminants - local dependencies extracted from the full-fledged semantic representation [12]. PyDelphin (https://github.com/delph-in/pydelphin) is a toolkit that supports the EDS, MRS, and DMRS formalisms. HRG Parser (https://github.com/draplater/hrg-parser) is a string-to-graph parser for EDS graphs where parsing is done in two steps: syntactic parsing using an SHRG grammar and semantic interpretation of the syntax.
4.5 Prague Tectogrammatical Graphs
Prague Tectogrammatical Graphs (PTG) provides annotations in the English and Czech languages. The English sentences are from the complete English Web Text Treebank (https://catalog.ldc.upenn.edu/LDC2015T13), with a parallel Czech corpus morphologically annotated and parsed into surface-syntax dependency trees in the Prague Dependency Treebank (PDT) 2.0 annotation style based on the same sentences. Noteworthy is that the annotations have multiple layers: an analytical (surface-syntax) layer consisting of dependency structures, semantic labels, argument structure, and ellipsis resolution, and a manually constructed deep-syntax tectogrammatical layer on top of that [13]. PERIN (https://github.com/ufal/perin) is a universal cross-framework semantic parser built for the CoNLL 2020 shared task Cross-Framework Meaning Representation Parsing (MRP 2020, http://mrp.nlpl.eu/2020/) that supports the AMR, DRG, EDS, PTG, and UCCA formalisms [14].
4.6 Discourse Representation Structures
Discourse Representation Structures (DRS) is a semantic formalism based on Discourse Representation Theory. In contrast to ordinary treebanks, the units of annotation in the corpus are texts rather than isolated sentences [15]. Basic DRSs consist of discourse referents like x representing entities and discourse conditions like man(x) representing information about discourse referents [16]. The corpus is based on the Groningen Meaning Bank, which annotates English texts with formal meaning representations rooted in Combinatory Categorial Grammar [17]. TreeDRSparsing (https://github.com/LeonCrashCode/TreeDRSparsing/tree/bs_sattn_drssup) is an English to DRTS labeled tree parser using a multi-head attention model that also includes pre-trained embeddings. EncDecDRSparsing (https://github.com/EdinburghNLP/EncDecDRSparsing) is an open-domain neural semantic parser performing prediction in three stages: structure prediction, predicate and relation prediction, and variable prediction [18].
4.7 Universal Decompositional Semantics
The Universal Decompositional Semantics (UDS, http://decomp.io/) framework is different from other formalisms because it decodes meaning in a feature-based scheme, using continuous scales rather than categorical labels. The meaning is captured as node- and edge-level attributes in a single semantic graph whose structure is deterministically extracted from Universal Dependencies. UDS treats parsing as a sequence-to-graph problem: the graph nodes created are based on the input sequence, and edges are dynamically added during generation [19]. PredPatt (https://github.com/hltcoe/PredPatt) is a parser for UDS written in Python 3 that uses either Stanford or Berkeley dependencies. PredPatt can be used for layering semantic annotations atop UD treebanks or considered a component for universal information extraction. It is part of the Decomp Toolkit, a toolkit for working with the UDS dataset, the dataset having 70 657 annotated nodes in total [20]. MISO (https://github.com/esteng/miso_uds) is a transformer-based, multi-formalism deep learning network, built heavily on top of AllenNLP, that transforms an utterance into a UDS graph, amongst others.
5 Results and Evaluation
The input and output data for the experiments can be found at https://github.com/martinve/idsa2022.

5.1 Test Corpus
A test corpus was constructed consisting of 351 sentences (3858 tokens), based on initial experiments conducted to evaluate the robustness of the parsers. The sources for the sentences were: CommonsenseQA (https://huggingface.co/datasets/commonsense_qa, 312 sentences), Geoquery Data (https://www.cs.utexas.edu/users/ml/nldata/geoquery.html, 5 sentences), and synthetic examples capturing the essential linguistic features for translating text into logical form (32 sentences). These features are: handling simple facts; extraction of predicates from traditional set theory; extraction of universal and existential quantifiers; handling of negation; handling of the logical connectives conjunction, disjunction, and implication; handling of equality; handling of multiple variables; and identification and extraction of questions. After the initial experiments, the baseline test corpus was pruned, and as a result a minimal corpus consisting of 58 sentences (594 tokens) remained.
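A hedged sketch of how the CommonsenseQA portion of such a corpus could be gathered with the Hugging Face datasets library is shown below; the actual selection of the 312 sentences was a manual step not described in code:

```python
# Illustrative only: pulling candidate sentences from the public CommonsenseQA dataset.
from datasets import load_dataset

cqa = load_dataset("commonsense_qa", split="train")
candidate_questions = [row["question"] for row in cqa.select(range(312))]
print(candidate_questions[:3])
```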
5.2 Frameworks, Tools, Models
To conduct further experiments, the frameworks were summarized based on their unit of information and parse depth. We follow the classification of Koller [21] for generated dependency graphs, based on the relation of graph elements to surface tokens. For bi-lexical dependency graphs (type 0), the graph nodes correspond to surface lexical units. Anchored semantic graphs (type 1) are characterized by relaxing the correspondence relation between nodes and tokens while still explicitly annotating the correspondence between nodes and parts of the sentence. For unanchored dependency graphs (type 2), the correspondence between nodes and tokens is not explicitly annotated. A set of parsing frameworks and parsers was chosen for conducting the experiments. AMR was chosen as an unanchored representation, providing the highest level of abstraction from surface tokens. From the anchored frameworks UCCA, EDS, PTG, DRS, and UDS - which provide a level of abstraction from the surface form but still retain portions of it - UCCA was chosen for supporting the single sentence as a unit of information and DRS for supporting a passage consisting of multiple sentences. In addition, UDS was added to the test battery due to it deterministically including UD bi-lexical annotations in the resulting parse graph.
5.3 Parse Output
Meaning representations vary greatly depending on the framework used. Here we explore the parse graphs for the sentence "Glass does not conduct electricity" (https://github.com/martinve/isda2022/blob/main/example.md). The AMR parse graph is intuitive to interpret and does not need post-processing for evaluation. The graph is represented in Penman notation, which is straightforward to process further if required. Negative polarity is explicitly denoted by the polarity attribute, and in the case of question detection it is represented by the amr-unknown keyword. The UCCA parse graph is generated via an intermediate step: the output of the parser was in XML format, one item for each sentence, and the intermediate XML results were additionally processed using ucca-tool (https://github.com/sriram-c/ucca-tool) to visualize the constructions. In addition to the textual representation, graph representations were created for each result using the said tool. The parse result is a combination of UCCA foundation-layer categories, e.g., glass as the primary participant in said sentence, denoted by category A. UDS parse results are intuitive to process and interpret. Due to the UD parsing mechanism - annotations are added as labels and not as an acyclic graph - the parser performs extremely well on complex sentences, and no breakage was encountered during the initial experiments. Verbose UDS parses also add dependency relations (https://universaldependencies.org/en/dep/) to the parse graph, while simplified parse graphs omit such features.
Evaluation of Semantic Parsing Frameworks
561
relations25 to the parse graph, while simplified parse graphs omitted such features. DRS parse is a graph consisting of constants and variables. It is noteworthy that thematic roles and temporal quantifiers are added to each parse. The graph has a recursive structure for discourse representation that sometimes results in infinite loops - e.g., for the sentence “A revolution is when something revolves around something else.” It is not known to the author whether it is the issue of specific parser implementation or the model being used. 5.4
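As an illustration of the Penman output discussed above, the sketch below shows a hand-written AMR graph for the example sentence and how it can be inspected with the open-source penman Python library. The graph is an approximation for illustration and not necessarily the exact output of the AMRLib parser.

# Illustrative only: a hand-written Penman-notation AMR graph for the example
# sentence, inspected with the open-source `penman` library.
import penman

amr = """
(c / conduct-01
   :polarity -
   :ARG0 (g / glass)
   :ARG1 (e / electricity))
"""

graph = penman.decode(amr)   # parse the Penman string into a Graph object
print(graph.triples)         # e.g. ('c', ':instance', 'conduct-01'), ('c', ':polarity', '-'), ...

# Negation is visible as an explicit :polarity attribute on the predicate node.
negated = any(role == ':polarity' and target == '-'
              for _, role, target in graph.triples)
print('negation detected:', negated)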
5.4 Results and Evaluation
Typically, the generated representations are compared to gold-standard annotations to evaluate the quality of parse results. Numerous evaluation metrics exist that evaluate different types of information obtained at different levels of granularity: typically an F1 score for graph properties, or a framework-dependent metric, e.g., the SMATCH score for AMR. Due to the variety of annotation schemes and the lack of ‘correct’ gold-label annotations, two evaluation measures were defined: granularity and robustness. Granularity was defined as the ratio of tokens in the input sentence to the number of semantic attributes captured, averaged over the whole corpus. To evaluate robustness, human evaluation was conducted by the author. In addition to textual representations, graphical representations were generated for UCCA to aid in evaluation. Each result was manually graded on a scale of 0..1. If the information captured was deemed complete and accurate, it was graded 1. If a portion of the information was missing, it was graded 0.5. If arbitrary or non-relevant information was added - as such errors are hard to detect in the knowledge base - the score was lowered by 0.3 points. If essential information was not present or the parse failed, the grade was 0. For each framework, the grades were averaged over the whole corpus (Table 1).
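The paper does not state closed-form definitions, but one plausible formalization of the two measures over the test corpus C, assuming the token-to-attribute ratio is computed per sentence and then averaged, is:

G = \frac{1}{|C|} \sum_{s \in C} \frac{|\mathrm{tokens}(s)|}{|\mathrm{attributes}(s)|},
\qquad
R = \frac{1}{|C|} \sum_{s \in C} \mathrm{grade}(s), \qquad \mathrm{grade}(s) \in [0, 1].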
Table 1. Comparative summary of robustness and granularity values for semantic parsing frameworks

Framework   Parser          Model               Robustness   Granularity
AMR         AMRLib          Parse T5 v0.1.0     0.94         0.14
UCCA        TUPA            ucca-bilstm-1.3.1   0.96         0.19
UD          PredPatt        UDS 1.0             0.92         0.08
DRS         TreeDrsParser   built-in            0.92         0.02
The robustness metric suggests that all of the chosen frameworks performed similarly well - though none achieved flawless results - providing incomplete parsing results. The granularity metric indicates the additional information generated by the parsing process - a lower value indicates that additional information is being added: temporal variables in the case of UDS and named entities for AMR. The usefulness of such information depends on the context in which the results are used.
6 Discussion
Due to the ambiguous nature of natural language, no single parser is ideally suited for the task. At the same time, all of the chosen parsers performed well on different aspects of capturing the meaning. The robustness of UDS makes it suitable for preprocessing the input - simplifying the structure of the sentence and splitting it into key components - although the splitting boundary for some input sentences seemed arbitrary. AMR, in turn, is suitable for explicit negation extraction and question detection. Additionally, the Penman output format is suitable for further post-processing due to its rigid yet flexible structure. On the other hand, adding additional hand annotations is not a viable option, and for the current task we must rely on publicly available annotations. UCCA performed best on the correctness scale but did not explicitly represent negation and entity recognition. At the same time, due to the annotation tooling, it is possible to implement the required layers if deemed necessary.
7 Conclusions
Research by the author is currently focused on constructing a hybrid system for representing knowledge in first-order logic using the said parsers in an ensemble. Based on the findings, and considering the strengths and limitations of each individual framework, we recommend UDS for preprocessing the input to simplify the sentence structure of the input passage; AMR for unanchored representations to generalize the syntactic layer; and UD to constrain and specify the generalized graphs.
References

1. Davis, E.: Logical formalizations of commonsense reasoning: a survey. J. Artif. Intell. Res. 59, 651–723 (2017)
2. Kamath, A., Das, R.: A survey on semantic parsing. In: Automated Knowledge Base Construction (AKBC) (2018)
3. Abzianidze, L., et al.: MRP 2020: the second shared task on cross-framework and cross-lingual meaning representation parsing, pp. 1–22. Association for Computational Linguistics (ACL) (2020)
4. Pavlova, S., Amblard, M., Guillaume, B.: How much of UCCA can be predicted from AMR? In: Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022, pp. 110–117 (2022)
5. van Noord, R., Abzianidze, L., Haagsma, H., Bos, J.: Evaluating scoped meaning representations. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
6. Abend, O., Rappoport, A.: UCCA: a semantics-based grammatical annotation scheme. In: IWCS, vol. 13, pp. 1–12 (2013)
7. Nivre, J., et al.: Universal dependencies v2: an evergrowing multilingual treebank collection. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 4034–4043 (2020)
8. Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., Ginter, F.: Building the essential resources for Finnish: the Turku dependency treebank. Lang. Resour. Eval. 48(3), 493–531 (2014)
9. Reddy, S., Täckström, O., Petrov, S., Steedman, M., Lapata, M.: Universal semantic parsing. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 89–101 (2017)
10. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. Trans. Assoc. Comput. Linguist. 4, 313–327 (2016)
11. Copestake, A., Flickinger, D., Pollard, C., Sag, I.A.: Minimal recursion semantics: an introduction. Res. Lang. Comput. 3(2), 281–332 (2005)
12. Oepen, S., Lønning, J.T.: Discriminant-based MRS banking. In: LREC, pp. 1250–1255 (2006)
13. Hajic, J., et al.: Announcing Prague Czech-English Dependency Treebank 2.0. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pp. 3153–3160 (2012)
14. Samuel, D., Straka, M.: UFAL at MRP 2020: permutation-invariant semantic parsing in PERIN. CoNLL 2020, 53 (2020)
15. Basile, V., Bos, J., Evang, K., Venhuizen, N.: Developing a large semantically annotated corpus. In: LREC 2012, Eighth International Conference on Language Resources and Evaluation (2012)
16. Liu, Y., Che, W., Zheng, B., Qin, B., Liu, T.: An AMR aligner tuned by transition-based parser. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2422–2430 (2018)
17. Bos, J., Basile, V., Evang, K., Venhuizen, N.J., Bjerva, J.: The Groningen meaning bank. In: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation, pp. 463–496. Springer, Dordrecht (2017). https://doi.org/10.1007/978-94-024-0881-2_18
18. Liu, J., Cohen, S., Lapata, M.: Discourse representation structure parsing. In: 56th Annual Meeting of the Association for Computational Linguistics, pp. 429–439. Association for Computational Linguistics (ACL) (2018)
19. White, A.S., et al.: Universal decompositional semantics on universal dependencies. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1713–1723 (2016)
20. White, A.S., et al.: The universal decompositional semantics dataset and decomp toolkit. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 5698–5707 (2020)
21. Kollar, T., et al.: The Alexa meaning representation language. In: NAACL-HLT (3), pp. 177–184 (2018)
Performing Systematic Review on Personalized Menu Scheduling Using PRISMA Guidelines

Dorra Kallel1,2(B), Ines Kanoun1,3, and Diala Dhouib1,2

1 OLID Lab: Optimisation, Logistique et Informatique Décisionnelle, University of Sfax, Sfax, Tunisia
  dorra [email protected]
2 Higher Institute of Industrial Management of Sfax, Technopole Sfax, Cite el Ons, 3021 Sfax, Tunisia
3 Higher Institute of Commercial Studies of Sfax, Sidi Mansour, 3061 Sfax, Tunisia
Abstract. Interest in the Menu Planning Problem (MPP) has continued to grow over the last decades. It basically concerns finding the best variety of menu items that fits specific nutrient requirements. Studies in Operations Research (OR) have centered around optimizing and meeting the required nutrient intake. Over the last decade, this problem has been the central focus of multiple research works. The basic contribution of this work resides in elaborating a Systematic Literature Review (SLR) of the most outstanding studies on the MPP and classifying them according to particular criteria. This SLR was conducted in accordance with the PICO search tool to define specific key terms and in line with the PRISMA guidelines to establish the search equation. Additionally, after applying the SLR guidelines, a detailed meta-analysis is provided. Overall, this paper presents a theoretical basis for future researchers interested in the field of MPP.
Keywords: Menu scheduling problem · Operations research · Systematic review · Classification

1 Introduction
Malnutrition corresponds to a nutritional condition in which a lack of calories, protein, or micronutrients has observable negative consequences on body form and function [1]. This concept has attracted a spate of interest in the literature. In particular, scientists in the field of operations research have focused on maximizing nutritional intake in order to fight diseases. Several studies have been undertaken to address the issue of menu planning for various types of patients with dietary restrictions.
The literature review is a significant part of academic investigation. We may gain a better understanding of the breadth and depth of the current work by evaluating relevant state-of-the-art works and by identifying gaps to be investigated. To conduct a complete and thorough study of current research on a particular issue, a systematic approach needs to be followed, and existing approaches have to be examined in such a way that the rest of the research is built on a strong foundation [2]. This paper therefore applies a systematic literature review methodology in the operations research domain. It identifies research works that tackle the menu planning problem. The remainder of the paper is structured as follows: the first section clarifies the review methodology; the second section introduces the review, describes our SLR, and specifies the basic research equation; the third section presents a fruitful discussion in addition to the classification of the handled papers; finally, the last section concludes the paper with some pertinent remarks.
2 Carrying Out the SLR on MPP

Kitchenham in [3] set forward a guideline for systematic reviews seeking to assist researchers in the course of their scientific investigation. In this regard, three key phases of a successful review are described:
– Phase 1: Planning the review
  • Specifying the search questions
  • Developing a review protocol
– Phase 2: Conducting the review
  • Identification of research
  • Study selection
  • Data extraction strategy
  • Data synthesis
– Phase 3: Reporting the review
  • Dissemination of results (meta-analysis)
3 Planning the Review (Phase 1)

The main goal of an SLR is to search for a solution to a specific and well-defined research equation. Initially, we need to start by identifying the review's research key terms, which help develop the inclusion and exclusion criteria, as well as the search strategy, data collection, and presentation of the findings.
3.1 Research Key Terms: PICO Search Tool
PICO is a prominent tool for designing a review methodology and defining clear and targeted key terms. It is a powerful means for asking specific questions in several areas. The PICO tool focuses on the research's Population, Intervention, Comparison group, and Outcome. It is widely used to identify systematic reviews. It includes a checklist of the essential principles required for the search strategy.

PICO Search Grid - Synonyms: The Boolean “OR” operator was used to associate the important terms inside each group, while the Boolean “AND” operator was used to relate the groups together. At this stage, we used the PICO search grid with Boolean operators to enumerate each term's synonyms and to connect the main terms, as displayed in Table 1.

Table 1. PICO search grid with Boolean operators.

PICO terms               Key Term               Alternate Term             Alternate Term
Population               “targeted patients”    OR “particular persons”    OR “critical patients”
Interest (AND)           “menu planning”        OR “menu generating”       OR “menu personalizing”
Comparison group (AND)   “methods”              OR “techniques”            OR “approaches”
Outcome (AND)            “healthy menu”         OR “dietary menu”          OR “nutritious menu”

3.2 Research Equation
As a result, the final search equation for the databases was: (“targeted patients” OR “particular persons” OR “critical patients”) AND (“menu planning” OR “menu generating” OR “menu personalizing”) AND (“methods” OR “techniques” OR “approaches”) AND (“healthy menu” OR “dietary menu” OR “nutritious menu”).
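As an illustration (not a tool used by the authors), the search equation can be assembled mechanically from the PICO synonym groups of Table 1; the minimal Python sketch below assumes the groups are given as plain lists of terms.

# Illustrative sketch: assembling the search equation from the PICO groups of Table 1.
pico_groups = {
    "Population": ["targeted patients", "particular persons", "critical patients"],
    "Interest": ["menu planning", "menu generating", "menu personalizing"],
    "Comparison group": ["methods", "techniques", "approaches"],
    "Outcome": ["healthy menu", "dietary menu", "nutritious menu"],
}

def build_search_equation(groups):
    # Synonyms inside a group are joined with OR; the groups are joined with AND.
    clauses = ["(" + " OR ".join('"{}"'.format(term) for term in terms) + ")"
               for terms in groups.values()]
    return " AND ".join(clauses)

print(build_search_equation(pico_groups))
# ("targeted patients" OR "particular persons" OR "critical patients") AND ...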
3.3 Electronic Sources and Search Grid
We used the following publishers and bibliographical databases: Springer Link, Google Scholar, Science Direct and Scopus.
4 Conducting the Review (Phase 2)
We found an initial set of 454 publications after using the research equation. The categorization of these selected research works according to particular inclusion and exclusion criteria corresponds to the next essential step.
Table 2. Inclusion and exclusion criteria.

Inclusion criteria:
– Published from 2015 onwards
– Written in English
– Belonging to one of these types (journal, conference, symposium or book chapter)
– Their number of pages is more than 5

Exclusion criteria:
– Papers that do not consider the Menu Planning Problem (MPP)
– The clarity of users' instructions
– Studies that do not meet our search objectives
4.1 Study Selection

In order to determine their real relevance, all discovered studies are subjected to primary study selection using inclusion and exclusion criteria. The process of identifying research frequently results in a large number of papers that do not address the research equation. Inclusion and Exclusion Criteria: The inclusion criteria determine which of these research papers have to be included in the list of relevant studies. The articles were selected after an initial review of the titles and abstracts, which was followed by a review of the full texts. The exclusion criteria are then applied to the studies that have already been selected in order to find those that do not satisfy the additional restrictions.
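For illustration only, the screening logic of Table 2 can be expressed as a simple predicate over candidate records; the record fields below are hypothetical, since the actual screening was performed manually on titles, abstracts, and full texts.

# Illustrative sketch of the screening predicate implied by Table 2.
from dataclasses import dataclass

ALLOWED_TYPES = {"journal", "conference", "symposium", "book chapter"}

@dataclass
class Record:
    year: int
    language: str
    pub_type: str
    pages: int
    addresses_mpp: bool   # False triggers the main exclusion criterion

def meets_criteria(record):
    return (record.year >= 2015
            and record.language.lower() == "english"
            and record.pub_type.lower() in ALLOWED_TYPES
            and record.pages > 5
            and record.addresses_mpp)

print(meets_criteria(Record(2018, "English", "Journal", 12, True)))   # True
print(meets_criteria(Record(2014, "English", "Journal", 12, True)))   # False: too old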
Fig. 1. PRISMA flow diagram of study selection according to [5]. Identification: 454 studies were identified through database searching, of which 396 remained after duplicates were removed. Analysis: 284 studies were excluded after analysing titles, leaving 112; a further 57 were excluded after analysing abstracts, leaving 55. Selection: 47 full-text studies met the inclusion criteria, of which 2 were excluded with reasons. Inclusion: 45 studies were included in the literature review (meta-analysis).
4.2 Data Extraction Strategy: PRISMA Flowchart
The extracted data were collected using the Cochrane PRISMA guidelines [4]. For each publication, the study's quality as well as the degree of evidence were evaluated (Table 2). Number of Studies Included: The bibliographic search revealed 454 results, of which 128 were found in the Springer Link database, 105 in the Google Scholar database, 97 in the Science Direct database, and 124 in the Scopus database. The final number of selected papers was reduced to 45, as plotted in the PRISMA flowchart in Fig. 1.
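As a quick arithmetic check (a sketch, not part of the original study), the per-database counts and the PRISMA stage counts reported above and in Fig. 1 are mutually consistent:

# Arithmetic consistency check of the counts reported in the text and Fig. 1.
per_database = {"Springer Link": 128, "Google Scholar": 105,
                "Science Direct": 97, "Scopus": 124}
identified = sum(per_database.values())            # 454 records in total
after_duplicates = 396
after_title_screen = after_duplicates - 284        # 112
after_abstract_screen = after_title_screen - 57    # 55
full_text_included = 47
final_included = full_text_included - 2            # 45 studies in the meta-analysis

assert identified == 454 and final_included == 45
print(identified, after_title_screen, after_abstract_screen, final_included)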
5 Reporting the Review (Phase 3)

After presenting the most outstanding studies addressing nutritional meal planning, this part centers around their classification.
Table 3. The classification of treated papers. For each study, the table records the type of patients addressed (sick patients; children and students; different groups; adults and elderly; athletes) and the constraints considered (nutrient requirements; budget cost; categorization of recipes; variety of menu), grouped by optimization method; a dash marks a criterion that is not defined/considered. The groups are:
– Exact methods: [6–37]
– Approximate methods: [38–46]
– Exact and approximate methods: [47–50]
5.1 Synthesis of Results (Meta-analysis)
To the best of our knowledge, this is the pioneering survey to provide a comprehensive classification of MPP research papers using the SLR. Based on the characteristics of the MPP, the following taxonomy is set up to handle the papers: type of patients, constraints, and optimization methods. Type of Patients: Personalized menu planning is applied to several categories of patients. In this systematic review classification, we report five groups of patients: sick patients; children and students; different groups; adults and elderly; athletes. Constraints: As the studies correspond to a problem that plans healthy and balanced menus, the main constraint is to respect the necessary nutrients for each individual, as plotted in Table 3. The cost constraint also stands for an important part of the MPP constraints, and many studies have addressed this constraint to minimize the budgetary cost of the served meals. Several authors have chosen to include “categorization of recipes” and “variety” in the composition of the proposed menus to appeal to patients. Optimization Methods: Decision-making needs to be more reasonable and optimal as the world becomes increasingly complicated [51]. There is a variety of approaches for addressing optimization problems, including exact methods and approximate ones, that are invested in a variety of applications. As we have opted for the inclusion criteria, we were able to elaborate a classification of the trends in optimization methods. Heuristics are no longer the most used approximate methods; during this period, exact methods are the most widely used methods. Metaheuristics associated with exact models are the least applied methods, followed by approximate methods, as illustrated in Fig. 2.

5.2 Discussion
By examining 45 articles according to the systematic literature review, this paper offers significant contributions to scientific researchers, providing a comprehensive classification of MPP research papers over the period 2015–2021. Concerning the type of patients, sick people are the most commonly addressed type of patients, but much interest has also been oriented towards different groups of patients. Children and students are addressed in 10 papers, adults are found in 5 publications, and only one work between 2015 and 2021 addresses athletes. As can be inferred from Fig. 2, nutritional needs are the most frequently identified constraint, appearing in all articles, followed by budget cost (appearing in 22 publications), categorization of recipes (covering 7 publications), and then the menu variety constraint (5 articles).
Fig. 2. Dissemination of the papers according to the type of patients, constraints and optimization methods on MPP (bar chart of the number of publications per category in each of the three groups).
We have summarized all the optimization methods proposed to solve the different extensions of the MPP. According to the summary in Table 3 and Fig. 2, we notice that 71% of the works (32 articles) addressed the MPP using exact methods. The reason the authors applied these operational research methods resides in the fact that they provide very good results while satisfying the constraints related to the recommended nutritional values according to the case of the patients, in a reasonable time. They have the capacity to generate efficient tools for solving optimization problems so as to achieve a total or approximately optimal value. In contrast, the use of approximate methods is less frequent in the field of MPP. Authors generally adopt approximate methods to solve large instances. Heuristics and metaheuristics can be used alone, without exact methods; these methods correspond to 20% of the works. It is noteworthy that only a few papers have developed a matheuristic to deal with the MPP: four studies have opted to use both exact and approximate methods, which makes up 9% of all the collected works. As can be inferred from Fig. 2, which summarizes the optimization methods, the authors most frequently chose to use exact methods.
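The reported shares follow directly from the group sizes in Table 3; a small sketch of the arithmetic:

# Shares of the 45 reviewed papers per optimization-method group (Table 3, Fig. 2).
groups = {"exact": 32, "approximate": 9, "exact and approximate": 4}
total = sum(groups.values())                                  # 45 papers
for name, count in groups.items():
    print(f"{name}: {count}/{total} = {count / total:.0%}")   # 71%, 20%, 9%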
6 Conclusion
This survey provides a systematic review of the literature on recent research works on the menu planning problem. The basic objective of this review is to highlight recent research works addressing this problem. We suggested specific keywords using the PICO tool and used Boolean operators to connect them with alternative keywords. As a result, we built a search string that was injected into four bibliographic databases that concern research in Operations Research. All obtained publications went through the inclusion and exclusion steps using the PRISMA flowchart guideline. The proposed SLR addresses three main classifications over 45 publications covering the period 2015–2021. These classifications correspond to: type of patients, constraints, and optimization methods. Our ultimate objective lies in providing a theoretical basis for future researchers interested in this field (MPP). Our SLR serves as an enlightening guideline through its useful information about the constraints, types of patients, and optimization methods used in this type of menu personalizing problem.
References

1. Stratton, R.J., Green, C.J., Elia, M.: Disease-related malnutrition: an evidence-based approach to treatment. CABI (2003)
2. Abid, A., Kallel, I., Ayed, M.B.: Teamwork construction in e-learning system: a systematic literature review. In: 2016 15th International Conference on Information Technology Based Higher Education and Training (ITHET). IEEE, pp. 1–7 (2016)
3. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)
4. Stovold, E., Beecher, D., Foxlee, R., Noel-Storr, A.: Study flow diagrams in Cochrane systematic review updates: an adapted PRISMA flow diagram. Syst. Rev. 3(1), 1–5 (2014)
5. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., PRISMA Group: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6(7), e1000097 (2009)
6. Baharom, N., Isa, N.S.A.M.: Mathematical modelling of diet planning problem for hypertension patients. J. Comput. Res. Innov. 6(4), 1–9 (2021)
7. Benvenuti, L., De Santis, A., Cacchione, P.: Multi-indicator design and assessment of sustainable diet plans. J. Clean. Prod. 313, 127699 (2021)
8. Lestari, D., Abadi, A.M., Dhorurri, A., Tarigan, A.I., Herlinawati, E.: Optimization of nutritional-menu planning for toddlers by goal programming model. In: 7th International Conference on Research, Implementation, and Education of Mathematics and Sciences (ICRIEMS 2020). Atlantis Press, pp. 264–273 (2021)
9. Malvar, R.J., et al.: Cost optimization of food diet for adult Filipino patients with stage 1 or stage 2 chronic kidney diseases. Turk. J. Comput. Math. Educ. (TURCOMAT) 12(3), 5453–5459 (2021)
10. Paidipati, K.K., Komaragiri, H., Chesneau, C.: Pre-emptive and non-pre-emptive goal programming problems for optimal menu planning in diet management of Indian diabetes mellitus patients. Int. J. Environ. Res. Public Health 18(15), 7842 (2021)
11. Sufahani, S.F., Jamaludin, M.A.: Nutrient planning for heart problem (stroke) patient by using optimization technique. Int. J. Adv. Comput. Syst. Softw. Eng. 2(1), 8–17 (2021)
12. Sufahani, S.F., Osman, B.I.: Optimization approach on nutritious menu planning for sinusitis patient among Malaysian. Int. J. Adv. Comput. Syst. Softw. Eng. 1(4), 12–22 (2020)
13. Pichugina, O.: Diet-menu problem modelling and applications. In: 2020 IEEE 2nd International Conference on System Analysis & Intelligent Computing (SAIC). IEEE, pp. 1–5 (2020)
14. Ahmad, N., Sani, N.S.A., Zaidi, N.M.: Optimal diet selection for university students using integer linear programming. In: AIP Conference Proceedings, vol. 2138, no. 1, p. 040002. AIP Publishing LLC (2019)
15. Hernández, M., Gómez, T., Delgado-Antequera, L., Caballero, R.: Using multiobjective optimization models to establish healthy diets in Spain following Mediterranean standards. Oper. Res. 1–35 (2019)
16. Ping, K.Y., Sufahani, S.F.: Special healthcare on nutrient boundary and diet for myopia patients through mathematical modelling. J. Des. Sustain. Environ. 1(1) (2019)
17. Sapri, N.S.M., Bedi, M.R.B.A.P.D.S., Abdul-Rahman, S., Benjamin, A.M.: A diet recommendation for diabetic patients using integer programming. In: AIP Conference Proceedings, vol. 2138, no. 1, p. 040022. AIP Publishing LLC (2019)
18. Hui, L.S., Sufahani, S.: Healthy menu scheduling for high blood pressure patient with optimization method through integer programming. Adv. Comput. Intell. Syst. 1(1) (2019)
19. Benjamin-Neelon, S.E., Vaughn, A.E., Tovar, A., Østbye, T., Mazzucca, S., Ward, D.S.: The family child care home environment and children's diet quality. Appetite 126, 108–113 (2018)
20. Hadzhikolev, E., Hadzhikoleva, S.: Application of the simplex method to create a weekly menu planner. Acta Universitatis Cibiniensis, Ser. E: Food Technol. 22(2) (2018)
21. Jridi, I., Jerbi, B., Kamoun, H.: Menu planning with a dynamic goal programming approach. Multiple Crit. Decis. Mak. 13, 74–87 (2018)
22. Sheng, L.Z., Sufahani, S.: Optimal diet planning for eczema patient using integer programming. J. Phys.: Conf. Ser. 995(1), 012049 (2018)
23. Sufahani, S., et al.: A mathematical study on "additive technique" versus "branch and bound technique" for solving binary programming problem. J. Phys.: Conf. Ser. 995(1), 012001 (2018)
24. Sufahani, S., et al.: Applied mathematical optimization technique on menu scheduling for boarding school student using delete-reshuffle-reoptimize algorithm. J. Phys.: Conf. Ser. 995(1), 012002 (2018)
25. Sudin, A.M., Sufahani, S.: Mathematical approach for serving nutritious menu for secondary school student using "delete-reshuffle-reoptimize algorithm". J. Phys.: Conf. Ser. 995(1), 012048 (2018)
26. Arnaut-Berilo, A., Delalic, A., Huseinbasic, A.: A nutritional analysis of the food basket in BIH: a linear programming approach. South East Eur. J. Econ. Bus. 12(1), 104–113 (2017)
27. Dhoruri, A., Lestari, D., Ratnasari, E.: Menu variations for diabetes mellitus patients using goal programming model. In: AIP Conference Proceedings, vol. 1867, no. 1, p. 020015. AIP Publishing LLC (2017)
28. Eghbali-Zarch, M., Tavakkoli-Moghaddam, R., Esfahanian, F., Azaron, A., Sepehri, M.M.: A new multi-objective optimization model for diet planning of diabetes patients under uncertainty. Health Educ. Health Prom. 5(3), 37–55 (2017)
29. Schaynová, L.: A nutrition adviser's menu planning for a client using a linear optimization model. Acta Polytechnica Hungarica 14(5), 121–137 (2017)
30. Urrutia, J.D., Mercado, J., Tampis, R.L.: Minimization of food cost on 2000-calorie diabetic diet. J. Phys.: Conf. Ser. 820(1), 012002 (2017)
31. Ali, M., Sufahani, S., Ismail, Z.: A new diet scheduling model for Malaysian school children using zero-one optimization approach. Glob. J. Pure Appl. Math. 12(1), 413–419 (2016)
32. Iwuji, A.C., Nnanna, M., Ndulue, N.I.C., et al.: An optimal DASH diet model for people with hypertension using linear programming approach. Open J. Optim. 5(01), 14 (2016)
33. De Carvalho, I.S.T., Granfeldt, Y., Dejmek, P., Håkansson, A.: From diets to foods: using linear programming to formulate a nutritious, minimum-cost porridge mix for children aged 1 to 2 years. Food Nutr. Bull. 36(1), 75–85 (2015)
34. Fahmida, U., et al.: Effectiveness in improving knowledge, practices, and intakes of "key problem nutrients" of a complementary feeding intervention developed by using linear programming: experience in Lombok, Indonesia. Am. J. Clin. Nutr. 101(3), 455–461 (2015)
35. Gerdessen, J., De Vries, J.: Diet models with linear goal programming: impact of achievement functions. Eur. J. Clin. Nutr. 69(11), 1272–1278 (2015)
36. Levesque, S., Delisle, H., Agueh, V.: Contribution to the development of a food guide in Benin: linear programming for the optimization of local diets. Public Health Nutr. 18(4), 622–631 (2015)
37. Sufahani, S.F., Ismail, Z.: Planning a nutritious and healthy menu for Malaysian school children aged 13–18 using "delete-reshuffle algorithm" in binary integer programming. J. Appl. Sci. 15(10), 1239 (2015)
38. Chifu, V.R., Pop, C.B., Birladeanu, A., Dragoi, N., Salomie, I.: Choice function-based constructive hyper-heuristic for generating personalized healthy menu recommendations. In: 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, pp. 111–118 (2018)
39. Hernandez-Ocana, B., Chavez-Bosquez, O., Hernandez-Torruco, J., Canul-Reich, J., Pozos-Parra, P.: Bacterial foraging optimization algorithm for menu planning. IEEE Access 6, 8619–8629 (2018)
40. Chifu, V., Bonta, R., Chifu, E.S., Salomie, I., Moldovan, D.: Particle swarm optimization based method for personalized menu recommendations. In: Vlad, S., Roman, N. (eds.) International Conference on Advancements of Medicine and Health Care through Technology. IFMBE Proceedings, vol. 59, pp. 232–237. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52875-5_50
41. Fister, D., Fister, I., Rauter, S.: Generating eating plans for athletes using the particle swarm optimization. In: 2016 IEEE 17th International Symposium on Computational Intelligence and Informatics (CINTI). IEEE, pp. 193–198 (2016)
42. Pop, C.B., Chifu, V.R., Salomie, I., Racz, D.S., Bonta, R.M.: Hybridization of the flower pollination algorithm - a case study in the problem of generating healthy nutritional meals for older adults. In: Patnaik, S., Yang, X.S., Nakamatsu, K. (eds.) Nature-Inspired Computing and Optimization. Modeling and Optimization in Science and Technologies, vol. 10, pp. 151–183. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-50920-4_7
43. Silva, J.G.R., et al.: Solving a multiobjective caloric-restricted diet problem using differential evolution. In: IEEE Congress on Evolutionary Computation (CEC). IEEE 2017, pp. 2062–2069 (2017)
44. Chifu, V.R., Salomie, I., Petrisor, L., Chifu, E.S., Moldovan, D.: Hybrid immune based method for generating healthy meals for older adults. In: 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE 2016, pp. 248–255 (2016)
45. Moldovan, D., et al.: Diet generator for elders using cat swarm optimization and wolf search. In: Vlad, S., Roman, N. (eds.) International Conference on Advancements of Medicine and Health Care through Technology, 12th–15th October 2016, Cluj-Napoca, Romania, vol. 59, pp. 238–243. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52875-5_51
46. Segismundo, M.I.V., Comendador, B.E.V.: Prenatal nutrition diet generator utilizing modified genetic algorithm for smartphone. J. Autom. Control Eng. 3(1) (2015)
47. El Moutaouakil, K., Cheggour, M., Chellak, S., Baizri, H.: Metaheuristics optimization algorithm to an optimal Moroccan diet. In: 2021 7th Annual International Conference on Network and Information Systems for Computers (ICNISC). IEEE, pp. 364–368 (2021)
48. Bello, P., Gallardo, P., Pradenas, L., Ferland, J.A., Parada, V.: Best compromise nutritional menus for childhood obesity. PLoS ONE 15(1), e0216516 (2020)
49. Porras, E., Fajardo, A., Medina, R.: An adequate dietary planning model using particle swarm optimization. In: Kaenampornpan, M., Malaka, R., Nguyen, D., Schwind, N. (eds.) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2018. Lecture Notes in Computer Science, vol. 11248, pp. 247–254. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03014-8_23
50. Moreira, R.P.C., Wanner, E.F., Martins, F.V.C., Sarubbi, J.F.M.: The menu planning problem: a multiobjective approach for Brazilian schools context. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 113–114 (2017)
51. Talbi, E.-G.: Metaheuristics: From Design to Implementation, vol. 74. Wiley (2009)
Author Index
A Abraham, Ajith 23, 44 Achour, Mehdi 230 Agarwal, Arun 44 Agrawal, Sanjay 297 Akyol, Sakine 242 Aminev, Damir 438 Anandh, K. S. 458 Arivazhagan, P. 342 Arora, Mamta 448 Arumugam, Thangaraja 118 Ashvanth, R. 1 B Bahrpeyma, Fouad 271 Baker, Mohammed Rashad 522 Ben Ammar, Boulbaba 286 Ben Aouicha, Mohamed 286 Bera, Tanushree 387 Bernábe-Loranca, M. Beatriz 404 Bilal, Azdine 323, 380 Blayac, Sylvain 427 Boufaied, Amine 230 bouhlel, MedSalim 152 Bourhim, El Mostafa 361, 512 C Carneiro, Ana Paula Athayde 351 Chandana, V. 170, 252, 262, 316, 342 Chandra, Pretam 387 Chandrasekaran, Buvanesh 118 Chatterjee, Diptirtha 161 Chaudhary, Bhawesh K. 297 Cortes, Omar Andres Carmona 96 D da Silva, Bernard 351 da Silva, Josenildo Costa 96 Das, Ayontika 128 Das, Biva 128
de Campos, Lídio Mauro Lima 178 De Ita Luna, Guillermo 404 de Oliveira, Danilo G. 142 Deepak, Gerard 1, 12 Della Ventura, Michele 217 Demyanchuk, Nikita 438 Dhouib, Diala 564 do Carmo Nicoletti, Maria 53 Dogan, Onur 242 Dornberger, Rolf 85 dos Santos, Carlos R. Paula 480 Duarte, Elias P. 480 Duraiswamy, Punithavathi 34 E El Kabbouri, Mounime 323 Er, Orhan 242 F Fernandez, Terrance Frederick 188 Filho, José Francisco S. 142 Filho, Jose Francisco Silva 351 Fulber-Garcia, Vinicius 480 G Gehrhardt, Ingolf 271 Ghadiri, Nasser 332 Gomes, Maísa Fernandes 491 González-Velázquez, Rogelio 404 Granillo-Martínez, Erika 404 Gupta, Manali 23 H Hamouda, Maissa 152 Hanne, Thomas 85 Hvatov, Alexander 438 I Ifleh, Abdelhadi
323, 380
J Jain, Ashima 44 Jain, Khushboo 23, 44 Jamuna, 206 Jauhar, Sunil Kumar 107 Jemili, Farah 501 Jihad, Kamal H. 522 Jyothi, V. K. 533 K Kabbouri, Mounime El 380 Kallel, Dorra 564 Kamel, Kamel Yasmine 501 Kanoun, Ines 564 Kar, Anwesha 128, 387 Kirar, Jyoti Singh 161 Kishore, J. K. 34 Kouloumpris, Eleftherios 467 Kumar, Dhiraj 161 Kumar, Vimal 252 Kumari, K. Shantha 188 Kumari, Shabnam 543 L Labti, Oumayma 361, 512 Luizelli, Marcelo C. 480 M Maia, José Everardo Bessa 370 Maitreyi, P. 206 Malhotra, Gayatri 34 Mamatha, H. R. 206 Mandal, Dishari 128 Meddeb, Rahma 501 Miranda, Fabiano 142, 351 Mishro, P. K. 297 Mobin, Gulfishan 387 Moëllic, Pierre-Alain 427 Mohammed, Esraa Zeki 522 Muthulakshmi, P. 543 N Nag, Anindya 128, 387 Nagaraj, E. 308 Nama, Vihaan 12 Nguyen, Baptiste 427 Nietto, Paulo Rogerio 53
P Palanivelu, Suganya 316 Panda, Rutuparna 297 Pandey, Mrinal 448 Parpinelli, Rafael Stubs 142, 351, 491 Patel, Surabhi 23 Pereira, Silas S. L. 370 Pierros, Ioannis 467 Portela, Elaine Pinto 96 Prakash, Varun 448 Pranamya, B. 342 R Raghav, Sagar 448 Reichelt, Dirk 271 S Saad Rubaidi, Zainab 286 Sacco, Nilton Cesar 53 Said, Mourad 65 Sakly, Houneida 65 Sakthivel, V. 308 Samani, Rasool 332 Sanjai, Sooraj 73 Santhanavijayan, A. 1, 12 Sarma, Guda Ramachandra Kaladhara 533 Schöpflin, Timo 85 Sequeira, A. H. 73, 107, 170, 252, 262, 308, 316, 342 Serpa, Pedro H. 142 Shahrokh, Fahime 332 Sheeba Priyadarshini, J. 1 Shetty, Deeksha Sanjay 170, 308 Sil, Riya 128 Siri, S. 73 Sivakumar, K. 118 Sood, Shubham 448 Spinosa, Eduardo J. 480 T Tagina, Moncef 65 Tepedino, José Osvaldo Amaral 351 Tyagi, Akshita 188 Tyagi, Amit Kumar 188 U Umoh, Uduak 118, 262 Upadhyaya, Bibek 161 Uppal, Ananya 206
V Vadivel, S. M. 73, 107, 118, 170, 252, 262, 308, 316, 342 Verrev, Martin 554 Vlahavas, Ioannis 467 X Xie, Chunfu
416
Y Yadav, Shailendra Nath 161 Yuvaraj, D. 458
Z Zaikis, Dimitrios 467 Zimmerli, Pascal 85