Lecture Notes in Networks and Systems 717
Ajith Abraham · Sabri Pllana · Gabriella Casalino · Kun Ma · Anu Bajaj Editors
Intelligent Systems Design and Applications 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022) Held December 12–14, 2022 - Volume 4
Lecture Notes in Networks and Systems
717
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Ajith Abraham Faculty of Computing and Data Science Flame University Pune, Maharashtra, India Machine Intelligence Research Labs Scientific Network for Innovation and Research Excellence Auburn, WA, USA
Sabri Pllana Center for Smart Computing Continuum Burgenland, Austria Kun Ma University of Jinan Jinan, Shandong, China
Gabriella Casalino University of Bari Bari, Italy Anu Bajaj Department of Computer Science and Engineering Thapar Institute of Engineering and Technology Patiala, Punjab, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-35509-7 ISBN 978-3-031-35510-3 (eBook) https://doi.org/10.1007/978-3-031-35510-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 22nd International Conference on Intelligent Systems Design and Applications (ISDA’22), held on the World Wide Web. ISDA’22 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA’22 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA’22 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields. ISDA’22 received submissions from 65 countries; each paper was reviewed by at least five reviewers, and based on the outcome of the review process, 223 papers were accepted for inclusion in the conference proceedings (38% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting talks:
• Kaisa Miettinen, University of Jyvaskyla, Finland
• Joanna Kolodziej, NASK - National Research Institute, Poland
• Katherine Malan, University of South Africa, South Africa
• Maki Sakamoto, The University of Electro-Communications, Japan
• Catarina Silva, University of Coimbra, Portugal
• Kaspar Riesen, University of Bern, Switzerland
• Mário Antunes, Polytechnic Institute of Leiria, Portugal
• Yifei Pu, College of Computer Science, Sichuan University, China
• Patrik Christen, FHNW, Institute for Information Systems, Olten, Switzerland
• Patricia Melin, Tijuana Institute of Technology, Mexico
We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!
ISDA 2022—Organization
General Chairs
Ajith Abraham, Machine Intelligence Research Labs, USA
Andries Engelbrecht, Stellenbosch University, South Africa
Program Chairs
Yukio Ohsawa, The University of Tokyo, Japan
Sabri Pllana, Center for Smart Computing Continuum, Forschung Burgenland, Austria
Antonio J. Tallón-Ballesteros, University of Huelva, Spain
Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China
Special Session Chair
Gabriella Casalino, University of Bari, Italy
Publicity Chairs
Pooja Manghirmalani Mishra, University of Mumbai, India
Anu Bajaj, Machine Intelligence Research Labs, USA
Publicity Team Members
Peeyush Singhal, SIT Pune, India
Aswathy SU, Jyothi Engineering College, India
Shreya Biswas, Jadavpur University, India
International Program Committee Abdelkrim Haqiq Alexey Kornaev Alfonso Guarino Alpana Srk Alzira Mota Amit Kumar Mishra Andre Santos Andrei Novikov Anitha N. Anu Bajaj Arjun R. Arun B Mathews Aswathy S U Ayalew Habtie Celia Khelfa Christian Veenhuis Devi Priya Rangasamy Dhakshayani J. Dipanwita Thakur Domenico Santoro Elena Kornaeva Elif Cesur Elizabeth Goldbarg Emiliano del Gobbo Fabio Scotti Fariba Goodarzian Gabriella Casalino Geno Peter Gianluca Zaza Giuseppe Coviello Habib Dhahri Habiba Drias Hiteshwar Kumar Azad Horst Treiblmaier Houcemeddine Turki Hudson Geovane de Medeiros
FST, Hassan 1st University, Settat, Morocco Innopolis University, Russia University of Foggia, Italy Jawaharlal Nehru University, India Polytechnic of Porto, School of Engineering, Portugal DIT University, India Institute of Engineering, Polytechnic Institute of Porto, Portugal Sobolev Institute of Mathematics, Russia Kongu Engineering College, India Thapar Institute of Engineering and Technology, India Vellore Institute of Technology, India MTHSS Pathanamthitta, India Marian Engineering College, India Addis Ababa University, Ethiopia USTHB, Algeria Technische Universität Berlin, Germany Kongu Engineering College, Tamil Nadu, India National Institute of Technology Puducherry, India Banasthali University, Rajasthan, India University of Bari, Italy Orel State University, Russia Istanbul Medeniyet University, Turkey Federal University of Rio Grande do Norte, Brazil University of Foggia, Italy Universita’ degli Studi di Milano, Italy University of Seville, Spain University of Bari, Italy University of Technology Sarawak, Malaysia University of Bari, Italy Polytechnic of Bari, Italy King Saud University, Saudi Arabia USTHB, Algeria Vellore Institute of Technology, India Modul University, Austria University of Sfax, Tunisia Federal University of Rio Grande do Norte, Brazil
Isabel S. Jesus Islame Felipe da Costa Fernandes Ivo Pereira Joêmia Leilane Gomes de Medeiros José Everardo Bessa Maia Justin Gopinath A. Kavita Gautam Kingsley Okoye Lijo V. P. Mahendra Kanojia Maheswar R. Marìa Loranca Maria Nicoletti Mariella Farella Matheus Menezes Meera Ramadas Mohan Kumar Mrutyunjaya Panda Muhammet Ra¸sit Cesur Naila Aziza Houacine Niha Kamal Basha Oscar Castillo Paulo Henrique Asconavieta da Silva Pooja Manghirmalani Mishra Pradeep Das Ramesh K. Rasi D. Reeta Devi Riya Sil Rohit Anand Rutuparna Panda S. Amutha Sabri Pllana Sachin Bhosale
Institute of Engineering of Porto, Portugal Federal University of Bahia (UFBA), Brazil University Fernando Pessoa, Portugal Universidade Federal e Rural do Semi-Árido, Brazil State University of Ceará, Brazil Vellore Institute of Technology, India University of Mumbai, India Tecnologico de Monterrey, Mexico Vellore Institute of Technology, India Sheth L.U.J. and Sir M.V. College, India KPR Institute of Engineering and Technology, India UNAM, BUAP, Mexico UNAM, BUAP, Mexico University of Palermo, Italy Universidade Federal e Rural do Semi-Árido, Brazil University College of Bahrain, Bahrain Sri Krishna College of Engineering and Technology, India Utkal University, India Istanbul Medeniyet University, Turkey USTHB-LRIA, Algeria Vellore Institute of Technology, India Tijuana Institute of Technology, México Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-grandense, Brazil Machine Intelligence Research Labs, India National Institute of Technology Rourkela, India Hindustan Institute of Technology and Science, India Sri Krishna College of Engineering and Technology, India Kurukshetra University, India Adamas University, India DSEU, G.B. Pant Okhla-1 Campus, New Delhi, India VSS University of Technology, India Vellore Institute of Technology, India Center for Smart Computing Continuum, Forschung Burgenland, Austria University of Mumbai, India
Saira Varghese Sam Goundar Sasikala R Sebastian Basterrech Senthilkumar Mohan Shweta Paliwal Sidemar Fideles Cezario Sílvia M. D. M. Maia Sindhu P. M. Sreeja M U Sreela Sreedhar Surendiran B. Suresh S. Sweeti Sah Thatiana C. N. Souza Thiago Soares Marques Thomas Hanne Thurai Pandian M. Tzung-Pei Hong Vigneshkumar Chellappa Vijaya G. Wen-Yang Lin Widad Belkadi Yilun Shang Zuzana Strukova
Toc H Institute of Science & Technology, India RMIT University, Vietnam Vinayaka Mission’s Kirupananda Variyar Engineering College, India VSB-Technical University of Ostrava, Czech Republic Vellore Institute of Technology, India DIT University, India Federal University of Rio Grande do Norte, Brazil Federal University of Rio Grande do Norte, Brazil Nagindas Khandwala College, India Cochin University of Science and Technology, India APJ Abdul Kalam Technological University, India NIT Puducherry, India KPR Institute of Engineering and Technology, India National Institute of Technology Puducherry, India Federal Rural University of the Semi-Arid, Brazil Federal University of Rio Grande do Norte, Brazil University of Applied Sciences and Arts Northwestern Switzerland, Switzerland Vellore Institute of Technology, India National University of Kaohsiung, Taiwan Indian Institute of Technology Guwahati, India Sri Krishna College of Engineering and Technology, India National University of Kaohsiung, Taiwan Laboratory of Research in Artificial Intelligence, Algeria Northumbria University, UK Technical University of Košice, Slovakia
Contents
Machine Learning Approach for Detection of Mental Health . . . 1
Rani Pacharane, Mahendra Kanojia, and Keshav Mishra
U-Net as a Tool for Adjusting the Velocity Distributions of Rheomagnetic Fluids . . . 8
Elena Kornaeva, Alexey Kornaev, Alexander Fetisov, Ivan Stebakov, and Leonid Savin
Detection of Similarity Between Business Process Models with the Integration of Semantics in Similarity Measures . . . 17
Wiem Kbaier and Sonia Ayachi Ghannouchi
Efficient Twitter Sentiment Analysis System Using Deep Learning Algorithm . . . 30
R. Devi Priya, Boggala Thulasi Reddy, M. Sarvanan, P. Hariharan, S. Albert Alexander, and Geno Peter
An Efficient Deep Learning-Based Breast Cancer Detection Scheme with Small Datasets . . . 39
Adyasha Sahu, Pradeep Kumar Das, Sukadev Meher, Rutuparna Panda, and Ajith Abraham
Comparative Analysis of Machine Learning Models for Customer Segmentation . . . 49
Parmeshwara Joga, B. Harshini, and Rashmi Sahay
An Intelligent Approach to Identify the Eggs of the Insect Bemisia Tabaci . . . 62
Siwar Mahmoudi, Wiem Nhidi, Chaker Bennour, Ali Ben Belgacem, and Ridha Ejbali
Overview of Blockchain-Based Seafood Supply Chain Management . . . 71
Nesrine Ouled Abdallah, Fairouz Fakhfakh, and Faten Fakhfakh
Synthesis of a DQN-Based Controller for Improving Performance of Rotor System with Tribotronic Magnetorheological Bearing . . . 81
Alexander Fetisov, Yuri Kazakov, Leonid Savin, and Denis Shutin
Card-Not-Present Fraud Detection: Merchant Category Code Prediction of the Next Purchase . . . 92
Marouane Ait Said and Abdelmajid Hajami
Fast Stroke Lesions Segmentation Based on Parzen Estimation and Non-uniform Bit Allocation in Skull CT Images . . . 99
Aldísio Gonçalves Medeiros, Lucas de Oliveira Santos, and Pedro Pedrosa Rebouças Filho
Methods for Improving the Fault Diagnosis Accuracy of Rotating Machines . . . 110 Yuri Kazakov, Ivan Stebakov, Alexander Fetisov, Alexey Kornaev, and Roman Polyakov Heuristics Assisted by Machine Learning for the Integrated Production Planning and Distribution Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Matheus de Freitas Araujo, José Elias Claudio Arroyo, and Thiago Henrique Nogueira LSTM-Based Congestion Detection in Named Data Networks . . . . . . . . . . . . . . . 132 Salwa Abdelwahed and Haifa Touati Detection of COVID-19 in Computed Tomography Images Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Júlio Vitor Monteiro Marques, Clésio de Araújo Gonçalves, José Fernando de Carvalho Ferreira, Rodrigo de Melo Souza Veras, Ricardo de Andrade Lira Rabelo, and Romuere Rodrigues Veloso e Silva Abnormal Event Detection Method Based on Spatiotemporal CNN Hashing Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Mariem Gnouma, Ridha Ejbali, and Mourad Zaied A Multi-objective Iterated Local Search Heuristic for Energy-Efficient No-Wait Permutation Flowshop Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . 166 Gabriel de Paula Félix, José Elias C. Arroyo, and Matheus de Freitas Araujo An Elastic Model for Virtual Computing Labs Using Timed Petri Nets . . . . . . . . 177 Walid Louhichi, Sana Ben Hamida, and Mouhebeddine Berrima A Decision Support System Based Vehicle Ontology for Solving VRPs . . . . . . . 194 Syrine Belguith, Soulef Khalfallah, and Ouajdi Korbaa Web API Service to RDF Mapping Method for Querying Distributed Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 Artem Volkov, Nikolay Teslya, and Sergey Savosin Risk Management in the Clinical Pathology Laboratory: A Bayesian Network Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 José Crispim, Andreia Martins, and Nazaré Rego
Leveraging Sequence Mining for Robot Process Automation . . . . . . . . . . . . . . . . . 224 Pietro Dell’Oglio, Alessandro Bondielli, Alessio Bechini, and Francesco Marcelloni Intelligent Agents System for Intention Mining Using HMM-LSTM Model . . . . 234 Hajer Bouricha, Lobna Hsairi, and Khaled Ghedira Unsupervised Manipulation Detection Scheme for Insider Trading . . . . . . . . . . . . 244 Baqar Rizvi, David Attew, and Mohsen Farid A Comparative Study for Modeling IoT Security Systems . . . . . . . . . . . . . . . . . . . 258 Meziane Hind, Ouerdi Noura, Mazouz Sanae, and Ajith Abraham Improving the Routing Process in SDN Using a Combination of the Evidence Theory and ML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 Ali El Kamel, Hamdi Eltaief, and Habib Youssef GANASUNet: An Efficient Convolutional Neural Architecture for Segmenting Iron Ore Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 Ada Cristina França da Silva and Omar Andres Carmona Cortes Classifying 2D ECG Image Database Using Convolution Neural Network and Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Tran Ngoc Tuan, Duong Trong Luong, Pham Viet Hoang, Tran Quoc Khanh, Hoang Thi Lan Huong, Tran Xuan Thang, and Tran Thuy Hanh Conceptual Model of a Data Visualization Instrument for Educational Video Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 Yavor Dankov Mobile and Cooperative Agent Based Approach for Intelligent Integration of Complex Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310 Karima Gouasmia, Wafa Mefteh, and Faiez Gargouri Euler Transformation Axis Method for Online Virtual Trail Room Using Fusion of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 B. Surendiran, Dileep Kumar, S. Amutha, and N. Arulmurugaselvi A Novel Approach for Classification of Real Time Data Stream to Reduce Query Processing Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328 Virendra Dani, Priyanka Kokate, and Jyotsana Goyal A Review on Machine Learning and Blockchain Technology in E-Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 Deepika Tenepalli and Navamani Thandava Meganathan
Machine Learning Models for Toxicity Prediction in Chemotherapy . . . . . . . . . . 350 Imen Boudali and Ines Belhadj Messaoud Underwater Acoustic Sensor Networks: Concepts, Applications and Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Kamal Kumar Gola, Brij Mohan Singh, Mridula, Rohit Kanauzia, and Shikha Arya A Step-To-Step Guide to Write a Quality Research Article . . . . . . . . . . . . . . . . . . 374 Amit Kumar Tyagi, Rohit Bansal, Anshu, and Sathian Dananjayan A Survey on 3D Hand Detection and Tracking Algorithms for Human Computer Interfacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384 Anu Bajaj, Jimmy Rajpal, and Ajith Abraham Multi-level Image Segmentation of Breast Tumors Using Kapur Entropy Based Nature-Inspired Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396 Shreya Biswas, Anu Bajaj, and Ajith Abraham Interference Detection Among Secondary Users Deployed in Television Whitespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408 Joachim Notcker, Emmanuel Adetiba, Abdultaofeek Abayomi, Oluwadamilola Oshin, Kenedy Aliila Greyson, Ayodele Hephzibah Ifijeh, and Alao Babatunde Sampling Imbalanced Data for Multilingual Machine Translation: An Overview of Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418 Albina Khusainova Digital Twin-Based Fuel Consumption Model of Locomotive Diesel Engine . . . 428 Muhammet Ra¸sit Cesur, Elif Cesur, and Ajith Abraham Centrality of AI Quality in MLOPs Lifecycle and Its Impact on the Adoption of AI/ML Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436 Arunkumar Akkineni, Somayeh Koohborfardhaghighi, and Shailesh Singh A Survey on Smart Home Application: The State-of-the-Art and Future Research Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Riya Sil, Shabana Parveen, and Rhytam Garai A Survey on Currency Recognition Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460 Riti Mukherjee, Nirban Pal, and Riya Sil Cryptocurrencies: An Epitome of Technological Populism . . . . . . . . . . . . . . . . . . . 477 Senthil Kumar Arumugam, Chavan Rajkumar Dhaku, and Biju Toms
Forecasting Bitcoin Price During Covid-19 Pandemic Using Prophet and ARIMA: An Empirical Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 Chavan Rajkumar Dhaku and Senthil Kumar Arumugam Performance Evaluation of Signature Based and Anomaly Based Techniques for Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 Vivek Kumar Agrawal and Bhawana Rudra Comparative Study on Black Hole Attack in Mobile Ad-Hoc Networks . . . . . . . 506 Gajendra Kumar Ahirwar, Ratish Agarwal, and Anjana Pandey Machine Learning-Based Approach to Analyze Students’ Behaviour in Digital Learning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Jaly Dasmahapatra, Riya Sil, and Mili Dasmahapatra Fake Review Prediction Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 535 C Sasikala, Rajasekaran Thangaraj, Devipriya R, S RajeshKumar, Ramachandramoorthy K. B, S Ramya, and K Umapathi Context-Aware QoS Prediction for Web Services Using Deep Learning . . . . . . . 547 AS Tasneem, AP Haripriya, and KS Vijayanand An Efficient Resource Allocation Technique in a Fog Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 Ayoub Hammal, Mehdi Lerari, Khaled Zeraoulia, and Youcef Hammal Comparative Study of Various Pattern Recognition Techniques for Identifying Seismo-Tectonically Susceptible Areas . . . . . . . . . . . . . . . . . . . . . . 567 Mridula and Kamal Kumar Gola Intelligent Diagnostic System for the Sliding Bearing Unit . . . . . . . . . . . . . . . . . . 577 Alexey Rodichev, Andrey Gorin, Kirill Nastepanin, and Roman Polyakov A Systematic Review on Security Mechanism of Electric Vehicles . . . . . . . . . . . . 587 Vaishali Mishra and Sonali Kadam Experimental Investigation of CT Scan Imaging Based COVID-19 Detection with Deep Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599 Aditya Shinde, Anu Bajaj, and Ajith Abraham Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
Machine Learning Approach for Detection of Mental Health
Rani Pacharane, Mahendra Kanojia, and Keshav Mishra
Sheth L.U.J and Sir M.V. College, Mumbai, Maharashtra, India
[email protected]
Abstract. Mental health issues are most prevalent between the ages of 15 and 60, yet many people are unaware of this health issue. Numerous machine-learning approaches have been used for the identification of mental health problems. Machine learning, a branch of artificial intelligence (AI), enables computers to learn automatically from experience and improve over time without explicit programming. In this work, ANN, linear regression, random forest, and decision tree models were utilized to identify mental health problems precisely, because machine-learning technology is an effective method for evaluating whether a person has a mental health issue. Machine learning is therefore crucial for the detection of mental disorders; here, ANN and linear regression are used for better results. There are key indicators as to whether an episode is imminent: such crises could be predictable if we could detect a pattern of stress, isolation, or exposure to triggers. The data used is a survey conducted in a company that gives information about how many people have gone through mental health problems. Our proposed ANN model gives an accuracy of around 64%; as we increase the number of epochs, the model performs better and accuracy increases. Keywords: Mental Health · Machine Learning Model · Decision tree · Artificial Neural Network [ANN]
1 Introduction
Our mental health refers to our mental well-being [1, 3]. Our physical, emotional, behavioral, and social status are all factors. Our mental health influences how we feel about our lives, how we manage everyday situations, and how we interact with the people around us. Physical and mental health is the result of a complex interplay of many individual and environmental factors, including medical and genetic family history, health behaviors and lifestyle (e.g., smoking, exercise, substance use), the amount of pollutants one is exposed to, personal and occupational stress, traumatic exposures, the situations and background of a person's life, and the available support (e.g., timely health care, social support). Mental health is the foundation of emotions, thinking, communication and other cognitive functions [3]. Mental health is also key to relationships, personal and emotional well-being, and contribution to the community or society [4]. Many people with mental illness do not speak about it. But mental illness is nothing to be embarrassed about! It is a medical condition, similar to heart disease or diabetes.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 1–7, 2023. https://doi.org/10.1007/978-3-031-35510-3_1
Mental health conditions are treatable. We are continuously increasing our understanding of how the human brain works, and treatments are available to help people effectively manage mental health conditions. Mental illness does not discriminate; it can affect anyone regardless of age, gender, geographic area, income, social status, race/ethnicity, faith/spirituality, sexual orientation, origin, or other aspects of cultural identity. While mental illness can appear at any age, three-quarters of all mental illnesses start by age 24. Mental illness takes many forms. Some are mild and interfere with everyday life in only a limited way, such as specific phobias. Other mental health conditions are so severe that a person may need hospital care [6]. The reason for selecting this topic is that machine learning is a field that gives computers the ability to learn without being explicitly programmed, and it is in active use in many more places than one might expect. The main goal is to enable computers to learn automatically without human intervention and adjust actions accordingly. Machine learning models such as SVM (Support Vector Machine) [11], unsupervised learning and reinforcement learning have been used for this purpose. Since this study is fully based on feature extraction and high-precision prediction, the better choice was to pick supervised learning algorithms, because such algorithms include a target variable to be predicted from a given set of variables. We generate a function that maps the inputs to the desired outputs, and this training process is maintained until the model reaches the preferred level of accuracy on the training dataset. A few examples are SVM, Random Forests, Decision Trees, and so on. Machine learning aims to create systems that can learn from experience using sophisticated statistical and probabilistic techniques; it is believed to be a valuable tool for predicting mental health.
2 Literature Review
SVM and decision trees have been the models most often applied to mental health detection in the past. This section gives a brief look at recent work. Two works [1, 2] proposed in 2018 used the SVM model for mental health detection; a Coarse Gaussian SVM was reported to give better results than a Medium Gaussian SVM [2]. The authors of [2] used LIWC (Linguistic Inquiry and Word Count) software for detecting depression, although more than 54 attributes can be applied; it achieved accuracy between 60 and 80% [2]. On the other hand, with an average accuracy of 76% and 25 out of 52 studies reaching 80% accuracy using cross-validation, the clinical use of these models may be near [2]. Accuracies for predicting depression ranged from the low 60s [2]. The accuracy scores obtained for the classifiers built with SVM, KNN, ensemble, and tree ensemble (random forest) give an equivalent accuracy score of 0.9 [3]. The accuracy of the CatBoost algorithm on the training and test datasets is reported, and logistic regression provides a predictive accuracy of 87.5% and a precision of 84.0% on the test dataset. The authors of [8] used a support vector machine for classification purposes and achieved more than 95% classification accuracy. The authors of [5] used the Random Forest algorithm to provide the best result, with a predictive accuracy of 90%.
Kessler et al. tested a machine learning algorithm to predict the persistence and severity of major depressive disorder. The authors of [11] stated that random forest and support vector machines (SVM) had the highest AUCs (0.754; 95% CI 0.698–0.804 and 95% CI 0.701–0.802, respectively). The authors of [7], working with random forest, reported an accuracy of 83.33%, with NB at 71.42%, SVM at 85.71%, and KNN at 55.55%. SGD achieved the best overall F1 score of 89.42%, closely followed by SVM, LR, and MNB with 89.39%, 89.37%, and 89.07% respectively [8]. The authors of [9] used a Support Vector Machine (SVM) followed by a decision tree and a neural network; these three models have high accuracy, above 70%. Five machine learning techniques were implemented in [10]: Logistic Regression, K-NN Classifier, Decision Tree Classifier, Random Forest, and Stacking. The outcomes of these models were compared and the Stacking technique reported the highest prediction accuracy of 81.75%.
3 Dataset Description
The dataset [11] consists of a survey that measures attitudes toward mental health and the prevalence of mental health issues in the tech industry. It contains fields such as age, used for age classification, and country/state, which indicate where the employee lives. A critical part of the dataset is the family history of mental health and whether the employee sought treatment for a mental health condition. The work-interference field [11] indicates whether a mental health condition interferes with the employee's work. There is information on how many employees work in the organization, the percentage of employees working remotely at least 50% of the time, and whether the organization is a tech company. Some questions asked whether the employees had any mental health benefits, whether mental health issues were discussed in the organization, and how easy it is for an employee to take medical leave for a mental health condition. Other fields capture how comfortable the employee is discussing mental health consequences with coworkers and supervisors, and whether the respondent would bring up a mental health issue in an interview. There is also data on whether the employer treats mental and physical health equally. Finally, respondents could add additional notes or comments.
4 Proposed Model
An ANN is used in the model to predict whether people in a company have gone through a mental health check-up. ANN is one of the most precise models used in the detection of mental health. We have used a survey dataset adapted from the open-source Kaggle repository [11].
Fig. 1. TensorFlow ANN Survey Model Algorithm
● Import dataset.
● Apply data wrangling on the dataset.
● For the feed-forward ANN, set the model parameters as follows:
  ● Hidden layer activation function: ReLU.
  ● Output layer activation function: Sigmoid; optimizer: ADAM.
  ● Binary cross-entropy loss function.
● Train the feed-forward ANN with 70% of the data for 50 epochs.
● After training, test the model with the remaining 30% of the data.
We have used the tech survey dataset; its features are discussed in the dataset description section. As the data was not suitable for directly training the ANN-based model, data wrangling was required. The proposed ANN model works on real-number data, whereas the dataset in hand included categorical, ordinal, and numerical types of information. We used Python's pandas library to import the dataset into a pandas data frame. After analyzing the dataset, we found many null values in various features and filled them with the mode or average value. As the data was collected using Google Forms, the timestamp was a combination of year, month, day, hour, minute, and second; we separated all the time components from the Timestamp feature and created them as separate features. The values entered by the respondents were not consistent: for example, for gender there were 49 unique values where there should be at most three. To handle such inconsistency, we converted the features and their values into the required unique values; for example, gender values were converted to 'f' for females and 'm' for males. The values in the dataset were then converted into binary, ordinal, and nominal features according to their categories, so that all values became numbers and the dataset was transformed from hybrid to purely numerical values. The numerical dataset was normalized using the Standard Scaler method. The data wrangling process converted the dataset into the desired format for ANN-based modeling; the resulting dataset has 1259 instances and 27 features.
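A minimal sketch of the wrangling steps described above is given below (pandas and scikit-learn assumed). The column names (Timestamp, Gender, treatment) follow the open Kaggle survey [11]; the file name and exact cleaning rules are illustrative assumptions rather than the authors' code.

```python
# Hypothetical reconstruction of the data wrangling pipeline described in the text.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("survey.csv")

# Split the timestamp into separate year/month/day/hour/minute/second features.
ts = pd.to_datetime(df["Timestamp"])
df["year"], df["month"], df["day"] = ts.dt.year, ts.dt.month, ts.dt.day
df["hour"], df["minute"], df["second"] = ts.dt.hour, ts.dt.minute, ts.dt.second
df = df.drop(columns=["Timestamp"])

# Collapse the 49 free-text gender values into 'm'/'f' (plus 'other').
def clean_gender(g):
    g = str(g).strip().lower()
    if g in {"male", "m", "man", "cis male"}:
        return "m"
    if g in {"female", "f", "woman", "cis female"}:
        return "f"
    return "other"
df["Gender"] = df["Gender"].map(clean_gender)

# Fill missing values: mode for categorical columns, mean for numeric ones.
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = df[col].fillna(df[col].mode()[0])
    else:
        df[col] = df[col].fillna(df[col].mean())

# Encode categorical/ordinal columns as integers, then standardize everything.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes

y = df["treatment"].astype(int)                      # binary target: sought treatment
X = StandardScaler().fit_transform(df.drop(columns=["treatment"]))
```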
The numerical dataset is given as input to the proposed ANN model. Our proposed model gave the best results with ReLU as the hidden layer activation function and Sigmoid as the output layer activation function. We implemented the adaptive optimizer Adaptive Moment Estimation (ADAM), and the loss function used is binary cross-entropy. The dataset was split into a 70% training and 30% testing set, and we achieved an accuracy of 64% at the 50th epoch with a batch size of 64. Binary features were used where possible, with positive and negative answers encoded as binary values, and the remaining data were treated as nominal; the encoded values feed the input layer. The k-fold validation technique was also used, with a fold size of 20. The ANN model is trained using a batch size of 64 for 50 epochs. The root-mean-square optimizer with the binary cross-entropy loss function gives the best possible accuracy; the model gives an accuracy of 60 to 64%. As can be seen in Fig. 1, the hidden layer uses the ReLU activation function and the Sigmoid function is used by the output layer to produce the results.
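The feed-forward ANN itself can be expressed compactly in TensorFlow/Keras, as sketched below. The hidden-layer width (32 units) is an illustrative assumption not stated in the paper; the activation functions, optimizer, loss, split and training schedule follow the description above, and X, y come from the wrangling sketch.

```python
# Sketch of the feed-forward ANN (TensorFlow/Keras assumed; layer size is illustrative).
import tensorflow as tf
from sklearn.model_selection import train_test_split

def build_model(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),    # hidden layer, ReLU
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer, Sigmoid
    ])
    model.compile(optimizer="adam",                      # ADAM optimizer
                  loss="binary_crossentropy",            # binary cross-entropy loss
                  metrics=["accuracy"])
    return model

# 70% training / 30% testing split, 50 epochs, batch size 64.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = build_model(X_train.shape[1])
history = model.fit(X_train, y_train, epochs=50, batch_size=64,
                    validation_data=(X_test, y_test))
print(model.evaluate(X_test, y_test))                   # [test loss, test accuracy]
```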
5 Results and Discussion
The proposed model was applied to the tech survey and the results are discussed in this section. From the output of the proposed model we can see that the model successfully predicted mental health check-ups, and the accuracy was observed to be 64%. Figure 2 represents the loss vs. epoch graph; we can infer that at epoch 50 the model achieved a loss of about 0.53. As the number of epochs increases, the validation loss increases.
Fig. 2. Loss Vs Epoch graph
Figure 3 represents the accuracy vs. epoch graph; we can infer that at epoch 50 the model achieved 64% accuracy. As the number of epochs increases, the accuracy of the model increases (Fig. 3).
Fig. 3. Accuracy Vs Epoch graph
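Curves such as those in Figs. 2 and 3 can be produced from the History object returned by Keras model.fit; the following matplotlib sketch is illustrative and assumes validation data was passed to fit.

```python
# Illustrative plotting of the training curves from the Keras History object.
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["loss"], label="train loss")
    ax1.plot(history.history["val_loss"], label="validation loss")
    ax1.set_xlabel("epoch"); ax1.set_ylabel("binary cross-entropy"); ax1.legend()
    ax2.plot(history.history["accuracy"], label="train accuracy")
    ax2.plot(history.history["val_accuracy"], label="validation accuracy")
    ax2.set_xlabel("epoch"); ax2.set_ylabel("accuracy"); ax2.legend()
    plt.tight_layout()
    plt.show()
```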
We have compared our work with three other works in a similar domain. Authors of [3, 4] have worked only on ANN and Random Forest and [2] on Decision Tree and SVM. Author [6] has used Linear Regression. We have achieved an accuracy of 64% using ANN (Table 1).

Table 1. Result Comparison.
Ref. No          Model                    Result in accuracy
[3]              Decision tree and SVM    70%
[4]              KNN and Random Forest    90%
[6]              Linear regression        71%
Proposed model   ANN                      64%
6 Conclusion and Future Scope
In this research, a data wrangling technique and an ANN-based model were applied to tech survey data containing both numerical and categorical values. The result indicates whether the tech company employees have gone through a mental health check-up in the company or not. As future scope, machine learning algorithms may help identify important behavioral biomarkers in order to assist mental health specialists in determining if a patient is at risk of acquiring a specific mental health issue. The algorithms may also help in monitoring a treatment plan's efficacy.
References 1. Islam, M.R., Kabir, M.A., Ahmed, A. Kamal, A.R., Wang, H., Ulhaq, A.: Depression detection from social network data using machine learning techniques. Health Inf. Sci. Syst. 6(1), 8 (2018).https://doi.org/10.1007/s13755-018-0046-0] 2. Janssen, R.J., Mourão-Miranda, J., Schnack, H.G.: Making individual prognosis in psychiatry using neuroimaging and Machine Learning. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 3(9), 798–808 (2018). https://doi.org/10.1016/j.bpsc.2018.04.004 3. Shatte A.B.R., Hutchinson D.M., Teague S.J.: Machine learning in mental health: a scoping review of methods and applications. Psychol. Med. 49(9), 1426–1448. (2019). https://doi.org/ 10.1017/S0033291719000151. Epub 2019 Feb 12. PMID: 30744717. 4. Srividya, M., Mohanavalli, S., Bhalaji, N.: Behavioral modeling for mental health using machine learning algorithms. J. Med. Syst. 42(5), 1–12 (2018). https://doi.org/10.1007/s10 916-018-0934-5 5. Graham, S., et al.: Artificial intelligence for mental health and mental illnesses: an overview. Curr. Psychiatry Reports 21(11), 116 (2019). https://doi.org/10.1007/s11920-019-1094-0] 6. Tate, A. E., McCabe, R. C., Larsson, H., Lundström, S., Lichtenstein, P., Kuja-Halkola, R.: Predicting mental health problems in adolescence using machine learning techniques. PLOS ONE 15(4). e0230389 (2020). https://doi.org/10.1371/journal.pone.0230389 7. MohdShafiee, N.S., Mutalib, S.: Prediction of mental health problems among higher education students using machine learning. Int. J. Educ. Manag. Eng. 10(6), 1–9 (2020). https://doi.org/ 10.5815/ijeme.2020.06.01[14] 8. Oyebode, O., Alqahtani, F., Orji, R.: Using machine learning and thematic analysis methods to evaluate mental health apps based on user reviews. IEEE Access 8, 111141–111158 (2020). https://doi.org/10.1109/access.2020.3002176 9. Tao, X., Shaik, T.B., Higgins, N., Gururajan, R., Zhou, X.: Remote patient monitoring using radio frequency identification (RFID) technology and machine learning for early detection of suicidal behavior in mental health facilities. Sensors 21(3), 776 (2021). https://doi.org/10. 3390/s21030776 10. Vaishnavi, K., Nikhitha Kamath, U., Ashwath Rao, B., Subba Reddy, N.V.: Predicting mental health illness using machine learning algorithms. J. Phys. Conf. Ser. 2161(1), 012021 (2022). https://doi.org/10.1088/1742-6596/2161/1/012021 11. https://www.kaggle.com/code/gcdatkin/mental-health-treatment-prediction/data
U-Net as a Tool for Adjusting the Velocity Distributions of Rheomagnetic Fluids
Elena Kornaeva, Alexey Kornaev, Alexander Fetisov, Ivan Stebakov, and Leonid Savin
Orel State University, Komsomolskaya, 95, 302026 Orel, Russia
[email protected], [email protected]
Innopolis University, Universitetskaya 1, 420500 Innopolis, Russia
https://innopolis.university/
Abstract. Hydrodynamics of viscous fluids deals with Navier-Stokes equation - a partial differential equation with unknown distributions for velocity and pressure in a flow domain. It is difficult to find its analytical solution, especially in cases of unsteady flows, flows of non-Newtonian or rheomagnetic fluids. It is usually solved numerically using finite difference, finite element, or control volume methods. The goal of this research is application of proposed physics-based loss to rheomagnetic fluids flows modeling. The basic network architecture is U-Net. The network receives an image of the flow domain and calculates the fluid velocity distribution in a form of an image of the stream function distribution. The network was tested for the asymptotic case, the results were compared with numerical solution and known analytical solution. Proposed tool allows modeling 2D flows of rheomagnetic fluids. The proposed method is general and allows modeling 3D flows. Keywords: Physics-based machine learning · Convolutional neural networks · Navier-Stokes equation · Calculus of variations · Hydrodynamics · Rheomagnetic fluids · Body forces
1 Introduction
The idea of using artificial neural networks as a tool for the solution of boundary value and variational problems has been known for a long time and is still relevant [7,11,19,21], first of all due to modern network architectures and deep learning. A physics-based loss is usually a conservation law that depends on a set of variable functions. Those functions can be parameterized and approximated with artificial neural networks. The approach of physics-based loss minimisation can be applied to mechanics of rigid [17] and deformable [19] bodies, and to hydromechanics [12]. The role of convolutional neural networks in solving problems in many practical areas, as well as the accuracy of obtained solutions, has increased in the past years [5,10,13,20]. In this work we applied the U-Net architecture [18] as a tool for minimization of the proposed physics-based loss and solution of the rheomagnetic fluid flow problem in hydrodynamics.
This paper was supported by the Russian Science Foundation under the grant No 22-19-00789, https://rscf.ru/en/project/22-19-00789. The authors gratefully acknowledge this support.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 8–16, 2023. https://doi.org/10.1007/978-3-031-35510-3_2
2 Theoretical Basics
A boundary value problem with partial differential equations may be equivalent to minimization of a target functional [8]. Since the minimization of a target functional is the main operation in machine learning, we may use an artificial neural network as a tool in the solution of a variational problem.
2.1 Physics-Based Loss
Kornaeva et al. [12] proposed the following physics-based loss function given in a flow domain Ω characterized with surface S and unit normal vector n:
$$J_L^{*}[\psi_m(x_i)] = \int_{\Omega} \left( \Pi_v + \rho\, \mathbf{F} \cdot (\nabla \times \boldsymbol{\Psi}) \right) d\Omega \rightarrow \min, \qquad (1)$$
where Ψ = [[ψ_m(x_i)]] is the unknown stream function, $\Pi_v = \int \mu H \, dH$ is the viscoelastic potential, μ is viscosity (a constant value for Newtonian fluids), H is the shear strain rate intensity, ∇ is the Hamiltonian with components [[∂/∂x_i]], the cross product denotes the rotor (curl) operation, ρ is density, F is a body force (e.g. gravity or magnetic force), and V = ∇ × Ψ is the velocity vector with components [[v_i]]. The right-side part of Eq. 1 depends on velocity since the shear strain rate intensity has the following form:
$$H = \sqrt{2\,\xi_{ij}\xi_{ij}}, \qquad (2)$$
Rheomagnetic Fluids
Rheomagnetic fluids under the action of applied magnetic field are able to change their yield strength, which opens up wide possibilities for controlling their rheological characteristics [16]. The mechanism of the mutual influence of the rheomagnetic fluid and the electromagnetic field lies in the resulting body force of electromagnetic nature [4]. Magnetizable metal particles, which are part of rheomagnetic fluids, increase the flow resistance when the electromagnetic field
occurs [14]. The continuum approach is best known for the mathematical description of the flows of rheomagnetic fluids [9]. In general, tribological devices using rheomagnetic fluids as a working fluid operate in three main modes: valve mode, shear mode and compression mode, as well as their combinations [2]. The shear mode describes the flow of a medium in a channel between parallel fixed planes under the action of a transversely applied electromagnetic field. The solution of such problems is of particular interest from the point of view of testing new approaches to modeling flows in media with complex rheology.
3 Simulation Modeling
The main idea of the proposed simulation model is a domain-based approach to approximation of the unknown function. The model is implemented with U-Net [18]. It needs just one image of the flow domain as input, so the model is data-set free. The simulation result is the image of the adjusted Ψ function distribution in the flow domain with the mask of the flow domain. The intensity of each pixel in the image corresponds to the value of the Ψ function at the position of the pixel. The algorithm of the model, shown in Fig. 1, has the following steps.
Fig. 1. Simulation model intuition. The U-Net receives initial distribution of the unknown Ψ function with the mask of the flow domain that contains the information on the flow rate. The network outputs numerical solution for the Ψ function with the mask of the flow domain. The physics-based loss is minimized during training.
The U-Net [18] receives initial values of the unknown Ψ function (e.g. linear distribution) in the form of an image N × N × 1 with the flow domain mask and the additional mask for one of the walls with constant value Q equal to the flow rate (see Fig. 2). The network outputs the N × N × 1 image of the adjusted Ψ function distribution (see Fig. 1). Then the masks described above are applied to the image. Since the velocity distribution is the rotor of the Ψ function (see Eq. 1) and the shear rate intensity Eq. 2 depends on the gradient of the velocity distribution, the numerical differentiation of the Ψ function and the components of velocity allow calculation of the integrand of the loss Eq. 1. Numerical differentiation of the y function at a given point (pixel) i with step h in the direction of the x axis can be implemented using the second order approximation [15]:
$$\left(\partial y / \partial x\right)^{(i)} \approx \frac{y^{(i+1)} - y^{(i-1)}}{2h}. \qquad (3)$$
Fig. 2. Initialisation of the unknown Ψ function for the case of fluid flow between two parallel plates: linear distribution (a), linear distribution with mask (b), slice of the Ψ function (c).
As a result, the main mechanical distributions (velocity, strain rates, shear rate intensity, etc.) can be calculated numerically (see Fig. 3). Then the loss can be calculated using a formula for numerical integration [15]. Training of the network consists in the minimisation of the loss Eq. 1. The minimized loss provides the adjusted solution for the Ψ function and other distributions, first of all the velocity distribution.
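To make the loss evaluation concrete, the sketch below shows a minimal 2D implementation of the steps just described, assuming a PyTorch pipeline (the authors' actual code may differ). The velocity is obtained as the curl of the Ψ image by central differences (Eq. 3), the shear rate intensity follows Eq. 2, and for a Newtonian fluid the viscoelastic potential reduces to Πv = μH²/2; the grid step h and the body-force field are placeholders.

```python
# Minimal 2D sketch of the physics-based loss of Eq. (1) evaluated on a stream-function image.
import torch

def physics_loss(psi, fx, fy, mu=1e-3, rho=1e3, h=1.0):
    """psi: (N, N) stream-function image; fx, fy: (N, N) body-force components (per unit mass)."""
    # Velocity as the 2D curl of the stream function: v1 = dpsi/dx2, v2 = -dpsi/dx1 (Eq. 3).
    dpsi_dx1 = (psi[2:, 1:-1] - psi[:-2, 1:-1]) / (2 * h)
    dpsi_dx2 = (psi[1:-1, 2:] - psi[1:-1, :-2]) / (2 * h)
    v1, v2 = dpsi_dx2, -dpsi_dx1

    # Strain-rate components from the velocity field, again by central differences.
    dv1_dx1 = (v1[2:, 1:-1] - v1[:-2, 1:-1]) / (2 * h)
    dv1_dx2 = (v1[1:-1, 2:] - v1[1:-1, :-2]) / (2 * h)
    dv2_dx1 = (v2[2:, 1:-1] - v2[:-2, 1:-1]) / (2 * h)
    dv2_dx2 = (v2[1:-1, 2:] - v2[1:-1, :-2]) / (2 * h)
    xi12 = 0.5 * (dv1_dx2 + dv2_dx1)
    H2 = 2 * (dv1_dx1**2 + dv2_dx2**2 + 2 * xi12**2)   # H^2 = 2*xi_ij*xi_ij (Eq. 2)

    # Integrand of Eq. (1): viscous potential (mu*H^2/2) plus the body-force term rho*F.V.
    body = rho * (fx[2:-2, 2:-2] * v1[1:-1, 1:-1] + fy[2:-2, 2:-2] * v2[1:-1, 1:-1])
    integrand = 0.5 * mu * H2 + body
    return (integrand * h * h).sum()                   # rectangle-rule numerical integration
```

During training, this scalar would be returned as the loss for the masked Ψ image predicted by the U-Net, so that standard backpropagation adjusts the stream function without any labelled data set.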
4 Results and Discussion
The section deals with the task of a rheomagnetic fluid flow between two parallel plates; the task has an analytical solution. The problem considered a flow channel of 0.1 mm width and 0.8 mm length. The pressure drop between the ends of the channel is Δp = 200 Pa. A rheomagnetic fluid with a density of 1000 kg/m³, a dynamic viscosity of 0.001 Pa·s, and an electrical conductivity of 1·10⁷ S/m was chosen. The following boundary conditions were applied: the fluid velocity is equal to zero on the walls and the flow rate (or, alternatively, the pressure drop along the horizontal direction) is given. The main numerical solution was obtained with the U-Net of the given architecture [3] with the proposed loss. The network received an image of size 512 × 512 × 1. The training process took about 40000 epochs. The alternative numerical solution was obtained with Comsol software [1,6]. The solution process was carried out on the basis of a convergence criterion chosen as root-mean-square residuals (RMS) equal to 10⁻⁶. The analytical solution for the test task has the following form [15]:
$$v_1 = \frac{p_0 - p_1}{2\mu L_1}\left(L_2^2 - x_2^2\right) - \frac{1}{2\mu}\,\rho f_1 \left(L_2^2 - x_2^2\right), \qquad (4)$$
Fig. 3. Simulation results for the case of fluid flow between two parallel plates: the Ψ function distribution (a) and the velocity distribution (b).
The computational experiment consisted in determining the velocity distributions for various values of the transversely applied electromagnetic field. The variable in this case was the strength and induction of the magnetic field, and the measured value was the maximum velocity of the fluid flow. The experiment was described using the Hartmann number, which is a dimensionless quantity characterizing the ratio of the magnetic force to the viscous one. Based on the Hartmann number, the induction and intensity of the applied electromagnetic field were calculated. An example of the main numerical solution obtained with U-Net is presented in Fig. 3. The results of the alternative numerical solution obtained using Comsol Multiphysics are presented in Fig. 4. The flow domain had the grid with 872 triangular and quadrangular elements. The maximum element size is 2·10−5 m, the minimum element size is 1·10−5 m. Along the walls of the simulated channel, 5 additional boundary layers are built, which makes it possible to increase the calculation accuracy in the near-wall region. Figure 5 shows the comparative results for the numerical and analytical solution in the form of the velocity profile of the rheomagnetic fluid.
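For reference, the field values in Table 1 can be recovered from the Hartmann number, as sketched below; the characteristic length is assumed to be the full channel width (10⁻⁴ m) and the field strength is taken as H = B/μ₀. Both are our assumptions for illustration rather than statements from the paper.

```python
# Recovering B and H from the Hartmann number Ha = B * L * sqrt(sigma / mu).
import math

mu = 1e-3                    # dynamic viscosity, Pa*s
sigma = 1e7                  # electrical conductivity, S/m
L = 1e-4                     # assumed characteristic length: channel width, m
mu_0 = 4 * math.pi * 1e-7    # vacuum permeability, H/m

for Ha in (0.0, 2.5, 5.0):
    B = Ha / (L * math.sqrt(sigma / mu))   # magnetic field induction, T
    H = B / mu_0                           # magnetic field strength, A/m
    print(f"Ha = {Ha}: B = {B:.2f} T, H = {H:.0f} A/m")
# -> B = 0.25 T, H ~ 198900 A/m for Ha = 2.5 and B = 0.5 T, H ~ 397900 A/m for Ha = 5,
#    consistent with the values listed in Table 1.
```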
Fig. 4. Rheomagnetic fluid flow between two parallel plates: Comsol meshing (upper symmetrical part of the flow domain) and simulation results (lower symmetrical part of the flow domain), Ha = 5.
Fig. 5. Rheomagnetic fluid flow velocity profile between parallel plates at different Hartmann numbers: the effect of decreasing the flow rate in the presence of an electromagnetic field.
Table 1 shows the main data of the computational experiment. With an increase in the magnetic field strength, there is an increase in the volumetric Lorentz force, which characterizes the mutual influence of the fluid flow and electromagnetic forces. The Lorentz force causes flow deceleration and decreases the maximum value of the velocity.
Table 1. The computational experiment comparative data.

Hartmann number | Magnetic field strength, A/m | Magnetic field induction, T | Lorentz force F1, N/m³ | V1max, Comsol, m/s | V1max, analytical, m/s | V1max, proposed, m/s
Ha = 0   | 0      | 0    | 0         | 0.313  | 0.306  | 0.312
Ha = 2.5 | 198000 | 0.25 | –1.18·10⁵ | 0.189  | 0.181  | 0.188
Ha = 5   | 397000 | 0.5  | –2.1·10⁵  | 0.0836 | 0.0829 | 0.0840
It can be seen from Table 1 that the proposed model is quite accurate in comparison with the alternative Comsol Multiphysics solution and the analytical solution. The proposed approach has the advantage of ease of implementation, since the input of the network is simply an image of the flow domain and the value of the flow rate. However, the proposed approach is suitable for modeling only stationary or quasi-stationary flows.
5 Conclusions
The proposed physics-based loss allows training the network without a data set and solving complex problems in hydrodynamics. The loss is general and can be applied to the solution of 3D flow problems. Meanwhile, the proposed simulation model deals with 2D flow problems; it can be generalized in future research. In this paper we studied the problem of a rheomagnetic fluid flow between two parallel plates and compared the obtained solution with the analytical solution and with an alternative numerical solution. The error of the proposed model is up to 2.2% in comparison with the analytical solutions. We suppose that the error is mostly caused by the numerical differentiation and integration, and it can be decreased in future research.
References 1. Comsol multiphysics. software - understand, predict, and optimize. https://www. comsol.com/comsol-multiphysics 2. Ahmadian, M., Poynor, J.: An evaluation of magneto rheological dampers for controlling gun recoil dynamics. Shock Vibr. 8, 674830 (2001). https://doi.org/10. 1155/2001/674830 3. Buda, M., Saha, A., Mazurowski, M.A.: Association of genomic subtypes of lowergrade gliomas with shape features automatically extracted by a deep learning algorithm. Comput. Biol. Med. 109, 218–225 (2019). https://doi.org/10.1016/j. compbiomed.2019.05.002
4. Burgers, J.M.: Magnetohydrodynamics, by T. G. COWLING, New york : Interscience publishers, inc., 1957. 115 pp. dolar3.50. J. Fluid Mech. 3(5), 550-552 (1958). https://doi.org/10.1017/S0022112058220181 5. Das, P.K., Diya, V.A., Meher, S., Panda, R., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access 10, 81741– 81763 (2022). https://doi.org/10.1109/ACCESS.2022.3196037. https://ieeexplore. ieee.org/document/9848788/ R finite element 6. Dickinson, E.J., Ekstr¨ om, H., Fontes, E.: Comsol multiphysics: software for electrochemical analysis, a mini-review. Electrochemis. Commun. 40, 71–74 (2014). https://doi.org/10.1016/J.ELECOM.2013.12.020 7. Dissanayake, M.W., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numerical Methods Eng. 10, 195– 201 (1994). https://doi.org/10.1002/CNM.1640100303. https://onlinelibrary.wiley. com/doi/full/10.1002/cnm.1640100303 8. Gelfand, I.M., Fomin, S.V.: Calculus of variations. Courier Corporation (2000). https://books.google.com/books/about/Calculus of Variations.html?hl=ru& id=YkFLGQeGRw4C 9. Ghaffari, A., Hashemabadi, H., Ashtiani, M.: A review on the simulation and modeling of magnetorheological fluids. J. Intell. Mater. Syst. Struct. 26, 881–904 (2014). https://doi.org/10.1177/1045389X14546650 10. Hazarika, R.A., Abraham, A., Kandar, D., Maji, A.K.: An improved LeNet-deep neural network model for Alzheimer’s disease classification using brain magnetic resonance images. IEEE Access 9, 161194–161207 (2021). https://doi.org/10.1109/ ACCESS.2021.3131741 11. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989). https://doi.org/10.1016/ 0893-6080(89)90020-8 12. Kornaeva, E., Kornaev, A., Fetisov, A., Stebakov, I., Ibragimov, B.: Physics-based loss and machine learning approach in application to non-newtonian fluids flow modeling. 2022 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8 (2022). https://doi.org/10.1109/CEC55065.2022.9870411. https://ieeexplore.ieee. org/document/9870411/ 13. Li, Z., Yang, W., Peng, S., Liu, F.: A survey of convolutional neural networks: analysis, applications, and prospects. CoRR abs/2004.02806 (2020). https:// arxiv.org/abs/2004.02806 14. Omidbeygi, F., Hashemabadi, S.: Exact solution and CFD simulation of magnetorheological fluid purely tangential flow within an eccentric annulus. Int. J. Mech. Sci. 75(Complete), 26–33 (2013). https://doi.org/10.1016/j.ijmecsci.2013.04.009 15. Patankar, S.: Numerical heat transfer and fluid flow. Electro Skills Series, Hemisphere Publishing Corporation (1980). https://books.google.ru/books? id=N2MVAQAAIAAJ 16. Rabinow, J.: The magnetic fluid clutch. Trans. Am. Inst. Electr. Eng. 67(2), 1308– 1315 (1948). https://doi.org/10.1109/T-AIEE.1948.5059821 17. Raymond, S.J., Camarillo, D.B.: Applying physics-based loss functions to neural networks for improved generalizability in mechanics problems (2021). https://doi. org/10.48550/ARXIV.2105.00075. https://arxiv.org/abs/2105.00075 18. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597 (2015). http://arxiv.org/abs/ 1505.04597
16
E. Kornaeva et al.
19. Samaniego, E., et al.: An energy approach to the solution of partial differential equations in computational mechanics via machine learning: concepts, implementation and applications. Comput. Methods Appl. Mech. Engineering 362, 112790 (2019). https://doi.org/10.1016/j.cma.2019.112790. http://arxiv.org/abs/ 1908.10407 20. Swain, M., Tripathy, T.T., Panda, R., Agrawal, S., Abraham, A.: Differential exponential entropy-based multilevel threshold selection methodology for colour satellite images using equilibrium-cuckoo search optimizer. Eng. Appl. Artif. Intell. 109, 104599 (2022). https://doi.org/10.1016/J.ENGAPPAI.2021.104599 21. Thuerey, N., Holl, P., Mueller, M., Schnell, P., Trost, F., Um, K.: Physics-based deep learning. In: WWW (2021). http://physicsbaseddeeplearning.org
Detection of Similarity Between Business Process Models with the Integration of Semantics in Similarity Measures Wiem Kbaier1(B) and Sonia Ayachi Ghannouchi2(B) 1 Higher Institute of Computer Science and Communication Techniques Hammam Sousse
(ISITCOM), University of Sousse, Sousse, Tunisia [email protected] 2 RIADI Laboratory, University of Manouba, Manouba, Tunisia [email protected]
Abstract. Business process models play an important role in today's organizations and are stored in model repositories. Organizations need to handle hundreds or even thousands of process models within their repositories, which serve as a knowledge base for business process management. Similarity measures can detect similarities between business process models and consequently play an important role in the management of business processes. Existing research mostly relies on syntactic similarity of activity labels and handles only mappings of type 1:1. Semantic similarities remain difficult to detect, and this problem is accentuated when dealing with mappings of type n:m and with large models. In this paper, we present a solution for detecting similarities between business process models that takes the semantics into account. We use a genetic algorithm, a well-known metaheuristic, to find a good enough mapping between two process models. Keywords: Business Process models · Similarity measures · Semantics · Genetic Algorithm · Matching
1 Introduction In organizations, thousands of business processes (BP) are modeled and stored due to the diversity of needs and operations associated with these processes. In fact, process documentation generates a large number of process models in a repository. A repository helps to improve business model development and resolves flaws in process modeling [2]. With the rapidly changing environment, organizations must be able to quickly and flexibly adjust their business processes to meet new demands. However, it is extremely complicated to create business processes from the beginning. Hence the need to manage repositories of business process models so that organizations can continually improve their operations [2]. For this, the detection of similarity within the repository
between business process models is mandatory, but the problem is how to recover models that are similar or that contain similar fragments.
Similarity measures are frequently used in text similarity analysis and clustering. Any measure of similarity or distance usually quantifies the degree of proximity between two entities, which can be in any textual format, such as documents, sentences or even terms. These measures can be useful in identifying similar entities and in clearly distinguishing different entities from one another. Similarity measures are very effective, and choosing the right measure can make a big difference in decision making. In this context, similarity measures are used to detect the similarity between business process models. Calculating the similarity between BPs is therefore a task performed in a wide variety of business process management applications. Similarity measures [5] can be useful in many cases, such as merging BPs and managing repositories to check whether similar models are already stored. Similarity measures also facilitate the reuse of BP models because they reduce time and cost; it is therefore important to find existing BP models and reuse them. Within a business, customer requirements can change, so it is necessary to have a similarity measure that simplifies changes by determining the processes that meet these needs. This simplifies management and facilitates the reuse of these processes. It is also necessary to measure the degree of conformity between a reference model and a given model using similarity measures. Moreover, during execution, services are called, and these services can fail, for example due to a computer failure; identical or similar services must then be found to automate execution [5].
A basic technique required by many approaches to process model similarity is matching. More precisely, PMM (Process Model Matching) comprises techniques that allow the automatic identification of corresponding activities between two business process models. Several correspondence techniques have been developed.
The remainder of the paper is organized as follows. Section 2 presents the similarity measures, and their problems together with the matching problem are explained in Sect. 3. Section 4 discusses the related work. Section 5 presents our approach. Section 6 concludes the paper.
2 Similarity Measures Mainly, there are four types of measures most used by different authors: syntactic, semantic, structural and behavioral measures. In addition, according to the work on similarity search carried out by Dumas et al. [9], similarity measures are classified according to three criteria: the labels, the graphical structure of the model, and the execution semantics. In fact, the grouping of similarity measures varies slightly according to the different proposals of the authors, as we will show in the related work section. 2.1 Syntactic Measures Syntactic measures rely on simple comparisons of strings and do not take into account the meaning or context of words. The most used syntactic measures are the Levenshtein
distance, which counts the number of edit operations (add/remove/substitute), and the Jaro-Winkler distance, which works similarly but produces a value between 0 and 1. The Jaccard and Dice coefficient measures both calculate the similarity between two activity labels as a function of the number of shared and unshared words [13]. The authors also consider cosine similarity, the Jensen-Shannon distance, and the substring measure, which takes into account substring relationships between activities. The limitation of these measures is not only their inability to recognize synonymous terms, but also their tendency to view unrelated words as similar. Take, for example, the unrelated words "contract" and "contact". The Levenshtein distance between these words is only 1, indicating a strong similarity between the terms [13]. 2.2 Semantic Measures Semantic measures aim to take into account the meaning of words. A very common strategy for doing this is identifying synonyms using the lexical database WordNet [17]. The most prominent semantic measure is the Lin similarity, a method for calculating the semantic relatedness of words based on their information content according to the WordNet taxonomy [13]. To use the Lin measure for measuring the similarity between two activities (which mostly contain more than a single word), we have to combine Lin's similarity with the bag-of-words model. The bag-of-words model turns an activity into a set of words, ignoring grammar and word order. The Lin similarity can then be obtained by identifying the word pairs from the two bags with the highest Lin score and calculating their average. Other measures based on the WordNet dictionary include Wu & Palmer and Lesk [13]. The former calculates the similarity between two words by considering the path length between these words in the WordNet taxonomy; the latter compares the WordNet dictionary definitions of the two words. Some approaches also directly verify hypernym relationships (a hypernym is a more general word). For example, some authors consider "car" and "vehicle" to be identical words since "vehicle" is a hypernym of "car" [13]. The semantic measures used are very basic and rely on the WordNet dictionary. This is a significant problem because any measure based on WordNet returns a similarity score of zero if a term is not in the WordNet dictionary. Although the WordNet dictionary is quite comprehensive, it does not cover the complex compound words that we often find in process models. For example, "problem report" or "budget plan" activities are measured by determining synonyms for each word without considering the compound words [13]. 2.3 Structural Measures The structural similarity mainly reflects the similarity of the process model topology, which expresses the logical relationship between the activities. It depends on the relationship between the relevant data and the control flow. Therefore, the structure is one of the important static attributes of the process model. The relevant aspects of this category come from graph theory, and the general graph-structure-based similarity between models can be quantified by the graph edit distance [22]. Dijkman et al. [7] define the graph
edit distance between different process models and design the basic edit operations and the similarity formula. The graph edit distance between two graphs is the minimum number of graph edit operations required to switch from one graph to another. Different graph edit operations can be taken into account such as: deleting or inserting nodes, substituting nodes, and deleting or inserting edges. 2.4 Behavioral Measures Behavioral similarity emphasizes the execution semantics of business process models. This is usually expressed by a set of allowed execution traces of the process model. These traces can be generated by simulation or during the actual execution of a process and are usually stored in a process log. Currently, most of the behavioral similarities of processes are obtained by measuring the similarity of simulated traces of process models [22].
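To make the syntactic measures above concrete, the following is a small, self-contained sketch (ours, not taken from any of the cited tools) of the Levenshtein distance and the Jaccard coefficient over word sets; it reproduces the "contract"/"contact" pitfall mentioned in Sect. 2.1. The example labels are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insert/delete/substitute operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def jaccard(label1: str, label2: str) -> float:
    """Share of common words between two activity labels."""
    w1, w2 = set(label1.lower().split()), set(label2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 1.0

# The pitfall from Sect. 2.1: unrelated words can look almost identical.
print(levenshtein("contract", "contact"))                       # 1
print(jaccard("check formal requirements", "check documents"))  # 0.25
```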
3 Problem Illustration In this section, the limitations of similarity measures and the choice of dimensions are discussed. Then, we illustrate the cardinality problem and present the genetic algorithm. 3.1 Similarity Measures Becker and Laue [5] provide a detailed survey of the exact calculations used by process model similarity measures. Their results show that different similarity measures rank the similarity between BPs very differently. For example, if we take two business process models and calculate their similarity with two different measures, the first measure might indicate that they are similar, while the second indicates that they are not. Therefore, Becker and Laue [5] conclude that there is no single "perfect" similarity measure. At the same time, it is unclear which measures can be meaningfully applied in a specific context. In addition, Szmeja et al. [18] proposed a classification of semantics-related similarity measures into different dimensions, each dimension representing a different type of similarity. For these reasons, we take as a starting point the similarity measures classified by dimension of semantics, because they allow extra information to be added to the similarity score and highlight differences and similarities between results. 3.2 Dimensions of Semantic Similarity Understanding the meaning of a dimension is absolutely necessary in order to decide which dimensions to utilize. For this, the semantics of each dimension is explained in detail. • Lexical: Entities are lexically similar when the words used to label them are similar according to a dictionary.
• Co-occurrence: Objects are co-occurrently similar when they often appear together.
• Taxonomic: Objects are taxonomically similar when they are of similar classes, kinds or types.
• Descriptive: Objects are descriptively similar when they have similar properties, attributes or characteristics.
• Physical: Objects are physically similar when their physical characteristics and appearance are similar.
• Compositional: Objects are compositionally similar when they have a similar set of parts or ingredients.
• Membership: Objects are membership-similar when they have similar sets of representatives, instances or members.
We focus precisely on the lexical dimension because our objective is to detect similarity by introducing semantics. 3.3 Cardinality Problem Few works have addressed the problem of cardinality when detecting similarities between business process models. It is possible to determine the activity of the first model which corresponds to another activity of the second model. This corresponds to the case of simple cardinalities [1:1]. On the other hand, we may have trouble detecting the corresponding activities because one activity can be similar to a set of activities of the other model. We are talking in that case about complex cardinalities, which are either [1:N] or [M:N]. Recent works in the area of business process model mapping show an interest in complex mappings. A considerable number of complex matches cannot be successfully identified because two models can differ in the terms they use (synonyms) as well as in their level of detail. Given these differences, the correct recognition of correspondences between two process models can become a complex and difficult task. To solve this problem, we can detect the mapping with the Genetic Algorithm (GA). The GA is a population-based approach, which means that it starts from an initial set of many candidate solutions. In our case, for detecting the optimal mapping, it starts from one activity of the first model and browses the activities of the second model until it finds the best correspondence. We consider a set of solutions for the problem and select the best ones out of them.
Since we are interested in a solution to our problem based on metaheuristic algorithms, it is important to note that these algorithms are used to solve complex real-life problems arising in different fields. Metaheuristic algorithms are broadly classified into two categories, namely single-solution and population-based metaheuristic algorithms. We have chosen to continue with a population-based metaheuristic algorithm because algorithms of the first category, such as simulated annealing, tabu search (TS), microcanonical annealing (MA), and guided local search (GLS), use a single candidate solution and improve it by local search. However, the solution obtained from single-solution based metaheuristics may get stuck in local optima. Population-based metaheuristics use multiple candidate solutions during the search process; they maintain diversity in the population and avoid solutions getting stuck in local optima. The best-known population-based metaheuristic algorithm is the genetic algorithm [14]. For instantiating a genetic algorithm to solve the problem at hand, several phases have to be defined: the encoding of a solution (i.e., the representation of the initial population), the fitness function used to evaluate the quality of a candidate solution, and the selection, crossover and mutation operators.
Genetic Algorithm
1. Randomly initialize population(t)
2. Evaluate fitness of population(t)
3. Repeat
   Select parents from population(t)
   Perform crossover on parents creating population(t+1)
   Perform mutation on population(t+1)
   Evaluate fitness of population(t+1)
4. Until best individual is good enough
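A minimal Python sketch of how this scheme could be instantiated for the activity-mapping problem is given below; the similarity function, population size, generation count and rates are illustrative assumptions rather than the authors' actual settings.

```python
import random

def evolve_mapping(acts_a, acts_b, similarity, pop_size=50, generations=200,
                   crossover_rate=0.8, mutation_rate=0.1):
    """Search for a good activity mapping between two process models.

    A chromosome assigns to each activity of model A an index into the
    activities of model B, or None when the activity stays unmapped.
    `similarity(a, b)` is any label similarity in [0, 1] (e.g. Jiang-based).
    """
    n, m = len(acts_a), len(acts_b)
    genes = list(range(m)) + [None]          # None = unmapped activity

    def random_chromosome():
        return [random.choice(genes) for _ in range(n)]

    def fitness(chrom):
        # Average similarity of the mapped pairs (0 if nothing is mapped).
        scores = [similarity(acts_a[i], acts_b[j])
                  for i, j in enumerate(chrom) if j is not None]
        return sum(scores) / len(scores) if scores else 0.0

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            if random.random() < crossover_rate and n > 1:  # one-point crossover
                cut = random.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for k in range(n):                              # mutation
                if random.random() < mutation_rate:
                    child[k] = random.choice(genes)
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return [(acts_a[i], acts_b[j]) for i, j in enumerate(best) if j is not None]
```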
4 Related Work According to the literature, several papers [1, 7–11, 16, 19, 20] provided an overview of similarity measures such as the definition and calculation of certain measures. Dijkman et al. [7] were interested in determining the similarity between business process models that are documented in an organization’s repository. Indeed, Dijkman et al. [7] presented three similarity measures to solve this problem which are evaluated experimentally in terms of “precision” and “recall”. Thus, according to this evaluation, the authors deduced that the results of the measurements are comparable but the structural measure is more efficient than the others. The measures found in [7] are: • Node matching similarity which compares the labels linked to the elements of models. • Structural similarity which compares element labels according to the model topology. • Behavioral similarity which compares the labels of the elements as well as the causal relationships captured in the process model. The detection of similarities between business process models is a very interesting subject and it is adopted by several authors because it allows solving some problems
such as measuring the conformity between reference models and current models and finding related models in a repository. Becker and Laue [5] gave a detailed overview of the exact calculations used by similarity measures of process models. According to this paper, the comparison of two business process models should be established in two steps. In the first step, the activities of one model that match the activities of the other model are identified. This step is known as matching: if the models have been created in different organizations, or if they describe a business process at different levels of detail, the corresponding activities must be identified either by using one of the existing algorithms or on the basis of expert judgment. In the second step, the similarity measures between models are applied. Moreover, Zhou et al. [22] have shown that measuring process similarity plays an important role in business process management. Additionally, to establish efficient use of processes, the paper [22] focuses on detecting similar patterns by exploiting both process models, which are predefined descriptions of business processes, and process logs, which can be viewed as objective observations of the actual execution behavior of the process. In other words, they focus on structural measures and behavioral measures. Another paper [4] proposed an automated approach to query a repository of business process models for structurally and semantically relevant models. Specifically, a user formulates a query in "BPMN-Q" and receives a list of models ordered by relevance with respect to the query. The objective of this paper is to find models through queries and then apply similarity measures only to models with the same scope. This idea is further confirmed by Jabeen et al. [13]. Therefore, the search for process models favors the detection of correspondences between model activities. In addition, some authors [4, 11, 12, 15] work with Semantic Business Process Models (SBPM). A business process model is considered a semantic model if it contains descriptions based on an ontology of process models, which helps to resolve ambiguity issues caused by the use of synonyms and homonyms. The authors of [10] also aim to solve the ambiguity problem of process models by using an ontology-based description of process models. Indeed, automatically detecting labels of similar elements helps ensure the interconnectivity and interoperability of business processes. For this reason, an OWL-DL description is applied to models in the Petri net notation. Several works have dealt with the matching between business process models, hence the development of different techniques for process model matching (PMM). Furthermore, the evaluations described by Cayoglu et al. [6] and Antunes et al. [3] can be regarded as related works in which process model matching techniques are summarized and compared to each other. However, these techniques are heuristic and, therefore, their results are uncertain and should be evaluated on a common basis. Due to the lack of datasets and frameworks for the evaluation of techniques, the Process Model Matching Contest 2013 (PMMC'13) was created by Cayoglu et al. [6], which aimed to address the need for
effective evaluation by defining mapping problems between process models over published data sets. Domain experts developed two sets of benchmark data, called PMMC'13, to allow researchers to validate their work. These databases are validated by expert opinion and also by three performance measures (precision, recall, F-score). Likewise, Antunes et al. [3] launched a second competition in 2015 with the same concept as the first competition in 2013, but with an improved database and more techniques to evaluate. In addition, [13] aims to improve the performance of process model matching techniques, first by analyzing the literature on all existing process model matching techniques, and second by performing an analysis of PMMC 2015 to determine the most suitable technique. PMM techniques consist of automatically identifying corresponding activities between two models that represent similar or identical behavior. The paper shows that the techniques presented in PMMC'13 and PMMC'15 are mainly based on syntactic measures and on semantic measures relying on WordNet. According to [21], business process models can be compared, for example, to determine their consistency. The authors of [21] report that any comparison between process models relies on a match that identifies which activity in one model matches which activity in another. This paper introduces matching tools, which are called matchers. Weidlich et al. [21] present the ICoP framework (Identification of Correspondences between Process Models), which can be used to develop such matchers. The framework allows the creation of matchers from reusable or newly developed components. It focuses on detecting complex matches between activity groups.
5 Our Approach Our approach makes it possible to detect the similarity between business process models and, more precisely, to detect similar fragments in large models, while better taking the semantics into consideration at the level of the different similarity measures and not only for the semantic similarity measure. Indeed, according to the literature, certain measures apply semantics by accessing the WordNet dictionary, but the problem is that these metrics only apply at the activity label level and do not take into account other elements of the models, such as gateways and events. For this reason, we propose to begin with the first step of process model matching, the first-line matcher. In this step, we apply the genetic algorithm, which helps to identify the best mapping, because we apply the Jiang similarity measure and resolve the problem of complex cardinality. After obtaining the best mapping, we apply the structural measures to this mapping. So, we deal with both semantic and structural measures. 5.1 Steps of Genetic Algorithm To create the representation of a candidate solution illustrated in Fig. 1, we propose to use a string representation.
Fig. 1. Two business processes and their correspondences [13]
Each chromosome (solution) contains the labels of the activities of each model. Thus, each activity in the two BPs has the chance to be matched. The size of the chromosome is equal to max(|BP1|, |BP2|). For example, a candidate solution (a possible mapping) between the two processes in Fig. 1 is represented by an array of strings. The size of the array presented in Table 1 is 6.

Table 1. An example of a candidate solution

Check formal requirements | Examine employment references | Check grades | Inform applicant about decisions | Invite for eligibility assessment | ∅
In this example, the model of company B is the source BP. The first element indicates that the activity "Receive documents" in the BP of company B is mapped to the activity "Check formal requirements" in the BP of company A. The second activity "Evaluate" in the BP of company B is mapped to "Examine employment references"; the activity "Ask to attend aptitude test" is mapped to "Check grades" in the BP of company A; the activity "Assess capabilities of applicant" is mapped to "Inform applicant about decisions"; the activity "Send letter of acceptance" is mapped to "Invite for eligibility assessment"; and the activity "Send letter of rejection" is not mapped to any activity. Thus, the candidate solution represented by this chromosome is the matching M = {(Receive documents, Check formal requirements), (Evaluate, Examine employment references), (Ask to attend aptitude test, Check grades), (Assess capabilities of applicant, Inform applicant about decisions), (Send letter of acceptance, Invite for eligibility assessment), (Send letter of rejection, ∅)}. Then, we apply crossover and mutation to obtain the optimal mapping. The process established for recovering models which are similar or which contain a similar fragment consists of three stages, which are represented in Fig. 2.
Fig. 2. Overview of the Proposed Approach.
The first step is to identify the activities of one model that match the activities of the other model. The genetic algorithm performs this matching step, i.e., it describes which activity in the first model corresponds to which activity in the second model: the algorithm works on activity pairs to generate all possible mappings, and then, with the fitness function, we can choose the best mapping with complex cardinality. The second step is the application of the similarity measures. These measures are stored in a database and classified by dimensions. To calculate the semantic similarity, we need an activity from BP1 and an activity from BP2. First, the label of each of these activities is split into a set of tokens. Then, each token is lemmatized to be able to find the similarity score between it and its most similar token in the second label using the Jiang similarity measure. This method uses a corpus in addition to a hierarchical ontology (taxonomic links). The distance between two concepts C1 and C2 formulated in this work is the difference between the sum of the information content of the two concepts and twice the information content of their most informative subsumer, and the similarity is its inverse:

SimJnc(C1, C2) = 1 / (IC(C1) + IC(C2) − 2 · IC(LCS))
This measure is sensitive to the shortest path length between C1 and C2 and to the density of concepts along this path. Here, IC stands for information content, and LCS is the lowest common subsumer of C1 and C2, defined as their common parent with minimum node distance. The last step is the recommendation: after knowing whether the two models are similar, it is necessary to act and choose an action to perform, as illustrated in Fig. 3. In this context, when the similarity calculation is applied in the case of merging companies, the recommendation is to merge the two models and therefore adopt a newly established model. If we are managing the company's repository, the possible recommendation is to reuse the reference model by making the necessary changes. If the two models are not similar, then the recommendation is to add a new model to the repository.
Fig. 3. Proposed recommendation
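Returning to the similarity-computation step, the following sketch shows how the Jiang (Jiang-Conrath) score between two activity labels can be obtained with NLTK's WordNet interface under a bag-of-words treatment. The lemmatizer, the Brown information-content file, the restriction to noun senses and the capping of scores at 1 are our assumptions, not necessarily the authors' exact configuration.

```python
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.stem import WordNetLemmatizer

# Requires: nltk.download('wordnet'), nltk.download('wordnet_ic'), nltk.download('omw-1.4')
brown_ic = wordnet_ic.ic('ic-brown.dat')
lemmatizer = WordNetLemmatizer()

def token_similarity(w1: str, w2: str) -> float:
    """Best Jiang-Conrath score over the noun senses of two tokens (capped at 1)."""
    best = 0.0
    for s1 in wn.synsets(lemmatizer.lemmatize(w1), pos=wn.NOUN):
        for s2 in wn.synsets(lemmatizer.lemmatize(w2), pos=wn.NOUN):
            try:
                best = max(best, min(1.0, s1.jcn_similarity(s2, brown_ic)))
            except Exception:
                pass  # tokens without a comparable sense contribute nothing
    return best

def label_similarity(label1: str, label2: str) -> float:
    """Bag-of-words label similarity: average of each token's best match."""
    t1, t2 = label1.lower().split(), label2.lower().split()
    if not t1 or not t2:
        return 0.0
    scores = [max(token_similarity(a, b) for b in t2) for a in t1]
    return sum(scores) / len(scores)

print(label_similarity("examine employment references", "evaluate applicant"))
```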
To justify our proposal, we will evaluate the results obtained by the genetic algorithm on the databases presented in the Process Model Matching Contest 2015. For the evaluation of the proposed approach, we have selected three benchmark datasets, which are specific to three different domains. The three datasets were developed for PMMC'15. We have chosen these datasets because PMMC'15 content has been used in the evaluation of many process model matching techniques and is available online. The involvement of leading domain experts and the frequent use of the datasets for the evaluation of matching techniques are additional reasons for our choice. The evaluation is done by applying our approach to the PMMC'15 models and then calculating the three performance measures (precision, recall, F-score) on these datasets. Once we obtain the performance values of our prototype, we compare them with the performance measures established by the other techniques on the PMMC'15 models.
6 Conclusion This paper studies three classes of similarity measures (syntactic, semantic and structural) designed to answer process model similarity queries. Our objective is to detect the similarity between business process models and, more precisely, to detect similar fragments while considering the semantics within the genetic algorithm. Our approach applies semantic similarity in the genetic algorithm to obtain the best mapping with complex cardinality. Then, we apply the structural similarity to the mapping obtained to discover whether the two business processes are similar or not and to act with recommendations. For future research, we will first evaluate our approach by experimental validation. We will then develop a prototype to apply the genetic algorithm and evaluate the results obtained on the databases published in PMMC 2015. We will also study the benefits of our approach in various case studies. Furthermore, we will investigate the detection of similarity in the case of large models.
References 1. Aiolli, F., Burattin, A., Sperduti, A.: A Metric for Clustering Business Processes Based on Alpha Algorithm Relations. Department of Pure and Applied Mathematics, University of Padua, Italy, pp. 1–17 (2011)
2. Ali, M. Shahzad, K.: Enhanced benchmark datasets for a comprehensive evaluation of process model matching techniques. In: Pergl, R., Babkin, E., Lock, R., Malyzhenkov, P., Merunka, V. (eds.) EOMAS 2018. LNBIP, vol. 332, pp. 107–122. Springer, Cham (2018). https://doi. org/10.1007/978-3-030-00787-4_8 3. Antunes, G., et al.: The process model matching contest 2015, vol. 248, pp. 127–155. Geellschaft für Informatik (2015) 4. Awad, A., Polyvyanyy, A., Weske, M.: Semantic querying of business process models. In : 2008 12th International IEEE Enterprise Distributed Object Computing Conference, pp. 85– 94. IEEE (2008) 5. Becker, M., Laue, R.: A comparative survey of business process similarity measures. Comput. Ind. 63(2), 148–167 (2012) 6. Cayoglu, U., et al.: Report: the process model matching contest 2013. In: Lohmann, N., Song, M., Wohed, P. (eds.) BPM 2013. LNBIP, vol. 171, pp. 442–463. Springer, Cham (2014). https:// doi.org/10.1007/978-3-319-06257-0_35 7. Dijkman, R., Dumas, M., Van Dongen, B., Käärik, R., Mendling, J.: Similarity of business process models: metrics and evaluation. Inf. Syst. 36(2), 498–516 (2011) 8. Dijkman, R.M., et al.: A short survey on process model similarity. In: Bubenko, J., Krogstie, J., Pastor, O., Pernici, B., Rolland, C., Sølvberg, A. (eds.) Seminal Contributions to Information Systems Engineering, pp. 421–427. Springer, Heidelberg. https://doi.org/10.1007/978-3-64236926-1_34 9. Dumas, M., García-Bañuelos, L., Dijkman, R.M.: Similarity search of business process models. IEEE Data Eng. Bull. 32(3), 23–28 (2009) 10. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring similarity between semantic business process models. In: Proceedings of the fourth Asia-Pacific Conference on Conceptual Modelling, vol. 67, pp. 71–80 (2007) 11. Gerth, C., Luckey, M., Küster, J.M., Engels, G.: Detection of semantically equivalent fragments for business process model change management. In: 2010 IEEE International Conference on Services Computing, pp. 57–64. IEEE (2010) 12. Humm, B.G., Fengel, J.: Semantics-based business process model similarity. In: Abramowicz, W., Kriksciuniene, D., Sakalauskas, V. (eds.) BIS 2012. LNBIP, vol. 117, pp. 36–47. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30359-3_4 13. Jabeen, F., Leopold, H., Reijers, H.A.: How to make process model matching work better? an analysis of current similarity measures. In: Abramowicz, W. (ed.) BIS 2017. LNBIP, vol. 288, pp. 181–193. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59336-4_13 14. Katoch, S., Chauhan, S.S., Kumar, V.: A review on genetic algorithm: past, present, and future. Multimedia Tools Appl. 80(5), 8091–8126 (2020). https://doi.org/10.1007/s11042-020-101 39-6 15. Koschmider, A., Oberweis, A.: How to detect semantic business process model variants? In: Proceedings of the 2007 ACM Symposium on Applied computing, pp. 1263–1264 (2007) 16. Schoknecht, A., Thaler, T., Fettke, P., Oberweis, A., Laue, R.: Similarity of business process models—a state-of-the-art analysis. ACM Comput. Surv. (CSUR) 50(4), 1–33 (2017) 17. Shahzad, K., Pervaz, I., Nawab, A.: WordNet based semantic similarity measures for process model matching. In: BIR Workshops, pp. 33–44 (2018)) 18. Szmeja, P., Ganzha, M., Paprzycki, M., Pawłowski, W.: Dimensions of semantic similarity. In: Gaw˛eda, A.E., Kacprzyk, J., Rutkowski, L., Yen, G.G. (eds.) Advances in Data Analysis with Computational Intelligence Methods. SCI, vol. 738, pp. 87–125. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67946-4_3 19. 
Thaler, T., Schoknecht, A., Fettke, P., Oberweis, A., Laue, R.: A comparative analysis of business process model similarity measures. In: Dumas, M., Fantinato, M. (eds.) BPM 2016. LNBIP, vol. 281, pp. 310–322. Springer, Cham (2017). https://doi.org/10.1007/978-3-31958457-7_23
20. van Dongen, B., Dijkman, R., Mendling, J.: Measuring similarity between business process models. In: Bubenko, J., Krogstie, J., Pastor, O., Pernici, B., Rolland, C., Sølvberg, A. (eds.) Seminal Contributions to Information Systems Engineering: 25 Years of CAiSE, pp. 405–419. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36926-1_33 21. Weidlich, M., Dijkman, R., Mendling, J.: The ICoP framework: identification of correspondences between process models. In: Pernici, B. (ed.) CAiSE 2010. LNCS, vol. 6051, pp. 483–498. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13094-6_37 22. Zhou, C., Liu, C., Zeng, Q., Lin, Z., Duan, H.: A comprehensive process similarity measure based on models and logs. IEEE Access 7, 69257–76927 (2019)
Efficient Twitter Sentiment Analysis System Using Deep Learning Algorithm R. Devi Priya1(B) , Boggala Thulasi Reddy2 , M. Sarvanan2 , P. Hariharan2 , S. Albert Alexander3 , and Geno Peter4 1 Computer Science and Business Systems, KPR Institute of Engineering and Technology,
Coimbatore, India [email protected] 2 Department of Computer Science and Engineering, KPR Institute of Engineering and Technology, Coimbatore, India 3 Vellore Institute of Technology, Vellore, India 4 School of Engineering and Technology, University of Technology Sarawak, Sibu, Malaysia
Abstract. Emotion recognition from textual content is an essential task in natural language processing that provides huge benefits to several areas, including artificial intelligence and human-computer interaction. Emotions are created and reflected as a form of human response to the events around them. Analyzing these feelings without considering the face and voice modulation is often essential and demands an efficient technique for accurate interpretation. Humans express their feelings in textual form through social media applications such as Facebook, Instagram, Twitter, etc. This paper provides a comprehensive analysis of the sentiment classification of a multitude of tweets. Strategies to categorize the sentiment of an expression into positive or negative feelings are discussed. The positive feelings include eagerness, happiness, love, neutral, relief, fun and surprise, and the negative feelings include anger, boredom, emptiness, hatred, sadness and worry. The paper conducts experiments and evaluates the performance of Recurrent Neural Networks (RNN) and the proposed Improved Long Short-Term Memory (ILSTM) on three different datasets to show how better performance can be achieved. ILSTM adds efficient preprocessing techniques to the traditional LSTM procedure. The simulation results prove the superiority of the proposed method over other existing methods in terms of both accuracy and execution time. Keywords: Twitter analysis · Emotion Classification · Recurrent Neural Networks · Long Short Term Memory (LSTM) · Sentiment analysis
1 Introduction Twitter serves as a common social networking platform wherein users post messages in the form of tweets [1]. It serves as a forum where people can post their thoughts or sentiments on varied domains, concepts, themes or events. It hosts a collection of consumer opinions and supports sentiment processing across diverse subjects such as popular
internet articles and blogs. The amount of relevant data is much higher in Twitter datasets when compared with other social media and blog platforms, and the response rate on Twitter is much higher and faster than on other blog pages. Sentiment analysis is widely used by different stakeholders, such as consumers or marketers, to gain insights into products or to recognize market trends. As machine learning algorithms have drastically advanced in the recent past, an efficient method is needed to improve the accuracy of sentiment analysis recommendations. Machine learning systems rely strongly on statistical analysis, with a focus on producing predictions with the help of new technological inventions, and Artificial Intelligence (AI) is being considered for application across numerous businesses and studies. Various machine learning algorithms have been used for preprocessing input data of various formats and also to perform data analysis [2, 3]. Sentiment analysis on Twitter makes use of numerous approaches, among which deep learning has achieved excellent results in recognizing emotions. This paper concentrates on classifying people's emotions in a Twitter dataset by implementing deep learning methods, namely LSTM (Long Short-Term Memory) and Recurrent Neural Networks. The proposed LSTM algorithm is used to learn the emotion labels of the tweets and classify those tweets into negative or positive sentiments. The results of the experiments conducted show that the accuracy of the proposed ILSTM is superior to that of the traditional methods already available.
2 Literature Review In [4], the emoticons are taken into consideration as noisy inputs. The algorithms used for processing emoticons are Naive Bayes (NB), Support Vector Machine (SVM) and Maximum Entropy. The combination of TreeTagger for POS (Part Of Speech) tagging with Naive Bayes and N-grams handles the varied distribution of words across different types of emotions. Even then, supporting emotional analysis of multiple languages is not yet sufficiently explored. In [5], in contrast to the conventional machine learning classification algorithms, LSTM was proposed to obtain high-quality performance in emotion analysis. When the RNN method is used, sudden increases and decreases in gradients arise during backpropagation. This problem was rectified with the use of a more sophisticated RNN, the LSTM, whose more complex inner structure incorporates memory cells that allow the framework to retain the information stored inside them. In [6], it is emphasized that Twitter has become a vital framework for microblogging. It is used to analyze in depth the general feelings of people and to evaluate numerous fields related to their emotions. Recent research works were performed with tweet messages representing the opinions of common people about government elections in India. In [7], the authors focus their analysis on determining the significance of semantic features such as hashtags, which are commonly used in natural language processing applications. They performed extensive feature selection and analysis
that may help to identify that the analysis of linguistic features enhances the classifier accuracy to a great extent. In [8], a method was proposed to analyze emotions using a corpus of emotional messages from web blogs, where emotional posts serve as indicators of the moods of the users. Sentiment classification of each sentence goes through an evaluation stage with the help of SVM and Conditional Random Fields (CRF). Also, a few experiments were performed with different training methods for generating a report about the sentiment of messages. Sentiment classification of tweets has also been done by predicting polarity, using evaluations from different websites as noisy labels. These kinds of labels are utilized to develop a training model. 1000 tweets were grouped for training and another 1000 tweets were chosen for testing [9]. However, no explanation is given about the data collection used for testing. The authors prioritize the usage of features specific to tweets, which follow certain patterns, such as retweets (replies given to tweets), hashtags, replies, punctuation, spaces and exclamations [10]. Additional features such as the prior polarity of words and Part of Speech (POS) tags have also been used. The SemEval editions for Twitter sentiment analysis have already been used to test these capabilities with different methodologies [11]. In order to construct an accurate classification mechanism for sentiment analysis and emotion recognition, the functions of CNN and RNN are combined in an efficient manner at the message level and word level. In [12], various deep learning algorithms have been used to detect fake tweets, since they have to be identified in order to achieve accurate performance. LDA, LSA, CNN and RNN algorithms were all implemented, and it was observed that the LSA and LDA models show almost equal performance. In [13], social sensor cloud services for the Twitter dataset were implemented in a novel fashion, where LSTM was used to detect at-risk social sensors using event definitions at an early stage. While processing inputs from the internet and transferring them across networks, security is also important [14]. Table 1 shows the list of papers which have implemented machine learning methods for Twitter sentiment analysis and their detailed performance.

Table 1. Review of existing methods with detailed performance measures
S.No  Literature                              Algorithm                       Accuracy
1     Yadav et al. (2020) [1]                 Naïve Bayes                     0.80610
                                              Logistic regression             0.82472
                                              Support vector machine (SVM)    0.83716
2     Ramachandran and Parvathi (2019) [4]    Naïve Bayes                     0.92518
3     Agarwal et al. (2019) [5]               POS                             79.02
                                              Chunks                          80.72
                                              N-gram                          82.32
4     Go et al. (2020) [10]                   Naive Bayes                     82.7
                                              MaxEnt                          83.0
                                              SVM                             81.6
3 Proposed Method The proposed approach, called Improved LSTM (ILSTM), is used to categorize feelings as positive, negative or neutral, or into finer classes. In our model, the positive and negative classes each group several finer feelings: initially, the emotions are classified as positive or negative, and then they are further rated into the corresponding feelings. The text comments are given to the model after being pre-processed. Preprocessing prepares the dataset so that it contains no inconsistencies and transforms it into the required form. The classification is performed by implementing Long Short-Term Memory, an advanced version of Recurrent Neural Networks (RNNs). It takes the tweets as input and classifies them into good (positive) and bad (negative) emotions. Positive emotions include enthusiasm, happiness, neutral feeling, relief, fun, love and surprise. Negative emotions include anger, worry, boredom, hatred, emptiness and sadness. The basic flow of processing Twitter data is given in Fig. 1, and the steps are explained in detail below. 3.1 Pre-processing Twitter messages are often casual, and the nature of the tweets varies primarily depending on the nationality, behavior, regional customs, age, gender and environment of the user. Hence, tweets written on Twitter pages usually result in a noisy dataset of irrelevant texts, emoticons, symbols, etc. Retweets, in which the same tweets are sent again, are very common in Twitter datasets, and careful handling of these tweets is essential to perform accurate analysis. So, in order to create new datasets, raw Twitter data must be cleaned in such a way that the classifier algorithms can categorize emotions with better accuracy. The different pre-processing techniques used in the proposed method are given below:
i. Lowercasing
ii. Removal of Twitter handles
iii. Removal of URLs
iv. Removal of stopwords
v. Removal of punctuation and spaces
vi. Removal of handles
vii. Stemming
All these processing steps can be done easily using the Natural Language Toolkit (NLTK), an open-source package for NLP operations. 3.2 User-Mention The handle is another special construct used in tweets to point out other persons. Tweeters typically tag another person as @name_user. As the intention is to get rid of all undesirable symbols contained in the data sets, all user taggings such as @name_user are replaced with the token USER-MENTION. The regular expression used to match user tagging is @[\S]+, and the process is given in Fig. 2.
Fig. 1. System flow of the proposed pre-processing strategies
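A minimal sketch of the pre-processing steps of Sect. 3.1 is given below, assuming NLTK's English stopword list and Porter stemmer; the regular expressions and the placeholder token for user mentions are illustrative choices, not the authors' exact implementation.

```python
import re
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Requires: nltk.download('stopwords'), nltk.download('punkt')
STOPWORDS = set(stopwords.words('english'))
STEMMER = PorterStemmer()

def preprocess_tweet(tweet: str) -> list:
    text = tweet.lower()                                   # i.   lowercasing
    text = re.sub(r'@\S+', 'usermention', text)            # ii.  user mentions (USER-MENTION token)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)      # iii. URLs
    text = text.translate(str.maketrans('', '', string.punctuation))  # v. punctuation
    tokens = word_tokenize(text)                           # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]     # iv.  stopwords
    return [STEMMER.stem(t) for t in tokens]               # vii. stemming

print(preprocess_tweet("Loving the new phone!! Thanks @shop_support https://t.co/xyz"))
```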
3.3 EMOJ Positive and Negative It may be very not unusual place to apply extraordinary sorts of pix in the tweets. They consists of smileys, gestures of hands, etc. It is known that all social media websites have a huge variety of pix, it is extremely difficult to discover equal representations for every input. The pix will never be excluded as they are very critical in emotion transfer and hence sorts of substitutions for good and bad feelings are set as EMOJ-POSITIVE and EMOJ-NEGATIVE respectively (Fig. 3).
Fig. 2. User mentions
Fig. 3. Different types of emotions
Efficient Twitter Sentiment Analysis System Using Deep Learning Algorithm
35
3.4 Feature Selection The Term Frequency-Inverse Document Frequency (TF-IDF) measure is very commonly used in information retrieval for extracting the important or uncommon words in text data. Term frequency (TF) is used to convert the words from string format to numerical data so that the machine learning models can process the stored information. TF counts the number of occurrences of the words in each class. In the datasets, the features are the words, and the frequency of every word within the dataset is calculated using the term frequency, as given in Fig. 4.
Fig. 4. Feature selection
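A small sketch of this feature-extraction step using scikit-learn's TfidfVectorizer is shown below; the tooling choice and the sample tweets are assumptions for illustration, not the authors' exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "usermention loving the new update so much fun",
    "worst service ever so angry and sad",
    "feeling happy and relieved today",
]

# Term frequency weighted by inverse document frequency; rare, discriminative
# words receive larger weights than words appearing in almost every tweet.
vectorizer = TfidfVectorizer(lowercase=True, stop_words='english')
X = vectorizer.fit_transform(tweets)        # sparse matrix: tweets x vocabulary

print(vectorizer.get_feature_names_out())   # learned vocabulary
print(X.toarray().round(2))                 # TF-IDF feature matrix fed to the classifier
```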
3.5 Classification LSTM is an extended version of RNNs, and the Long Short-Term Memory network possesses high memory capabilities. The detailed flow diagram of the LSTM is given in Fig. 5. The repeating cells are connected in a very specific way to escape from vanishing and exploding gradients. LSTM consists of recurrently connected blocks, which are the memory units. It removes information from, or adds information to, the cell state. These operations are regulated by structures referred to as gates, which decide whether or not to pass the information through. Hence, LSTM facilitates the classification of tweets by additionally exploiting long-range dependencies. It is flexible enough to remove information from or add information to the cell state, which holds a large amount of information managed through the gates. The sigmoid layer generates outputs between 0 and 1 for the information being processed. A value of zero indicates that nothing is transmitted via the gate, whereas a value of one indicates that everything can pass through; in this way the features are
picked out to be provided as input to the cell state. The forget layer present in the LSTM structure performs this selection and decides whether to forget the features or not. The value of h(t-1) determines the output values, meaning that the output can take the value of either zero or one for every cell state. A value of one depicts the acceptance of the feature, whereas a value of zero indicates that the information is not retained. Here, the feature words obtained from the datasets need to be accepted or rejected in order to categorize the emotions.
Fig. 5. LSTM Architecture
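A compact sketch of such an LSTM classifier in Keras is given below; the vocabulary size, layer widths and dropout rate are illustrative assumptions, and the proposed ILSTM additionally relies on the pre-processing and feature-selection steps described above.

```python
import tensorflow as tf

VOCAB_SIZE = 20000      # assumed vocabulary size after pre-processing

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),           # word embeddings
    tf.keras.layers.LSTM(64),                              # gated memory cells
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid'),        # positive vs. negative
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# x_train: integer-encoded, padded tweets; y_train: 1 = positive, 0 = negative
# model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)
```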
4 Experimental Results and Discussion A Twitter dataset available in an internet repository has been chosen for experimentation and used to build separate data sets of positive and negative emotions to evaluate the algorithms. The tweets are passed through a normalization procedure to make the data suitable for pre-processing. These steps are obligatory to get rid of irrelevant symbols and generate normalized tweets. The experimental results given in Table 2 clearly indicate that ILSTM produces higher accuracy when compared to the Naïve Bayes, Logistic Regression and RNN classifiers. The ILSTM model is thus well suited for the given Twitter dataset. Positive instances that are correctly identified as positive are termed True Positives (TP), whereas negative instances that are incorrectly identified as positive are known as False Positives (FP). Table 3 compares the execution time of the methods, measured in milliseconds. It is noted that the execution time of ILSTM is lower than that of the Naïve Bayes and Logistic Regression classifiers, since the preprocessing steps normalize the inputs so that the classification process is completed quickly. The proposed approach incorporates both the semantic word vectors and the emotional word vectors, and hence the overall performance of the system is enhanced. ILSTM relies on Doc2Vec for feature extraction from the word sequence; because of that, its overall efficiency is higher than that of RNN, which only models the sequential nature of the word sequence.
Table 2. Performance comparison of methods

Method                 Precision    Accuracy %
RNN                    0.73         71.2
Naïve Bayes            0.75         74.3
Logistic Regression    0.82         84.8
ILSTM                  0.91         97.3

Table 3. Comparison of execution time

Method                 Time (ms)
RNN                    39
Naïve Bayes            48
Logistic Regression    45
ILSTM                  42
When RNN is used, the experimental results show that its performance is lower due to the vanishing gradient problem. These problems are solved efficiently by utilizing the Improved LSTM, an advanced model of LSTM with strong computing ability and memory capacity. In addition, ILSTM is capable of classifying emotions contained in long sentences more efficiently than the traditional RNN.
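For completeness, the following tiny sketch shows how the reported precision and accuracy can be computed from predictions with scikit-learn; the label vectors are dummy placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # 1 = positive emotion, 0 = negative (dummy data)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total
```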
5 Conclusion This paper proposes an LSTM-based framework for evaluating the sentiments of Twitter inputs. The tweets that are analyzed are a combination of various words and emoticons. The classifier algorithms used for the experiments are the deep learning strategies RNN and LSTM. To obtain higher accuracy in the proposed ILSTM, distinct feature extraction strategies such as TF-IDF and Doc2Vec are integrated. The analysis process creates a vector which is given as input to the classification model. The proposed model shows better performance in classifying the emotional content of Twitter messages. As future work, other deep learning frameworks can be investigated for evaluating the tweets so that more accurate emotion analysis can be performed.
References 1. N. Yadav, O. Kudale, S. Gupta, A. Rao, and A. Shitole, “Twitter sentiment analysis using machine learning for product evaluation,” in IEEE International Conference on Inventive Computation Technologies (ICICT), vol. 2, 2020 2. R Devi Priya, R Sivaraj, N Anitha and V Devisurya. “Forward Feature Extraction from imbalanced microarray datasets using Wrapper based Incremental Genetic Algorithm”, International Journal of Bio-Inspired Computation, vol.16, No.3, 171–180, 2020
3. R Devi Priya and R Sivaraj “Preprocessing of Microarray gene expression data for SSmissing values”, International Journal of Data Mining and Bioinformatics, Vol. 16, No 3, pp183–204, 2016 4. D. Ramachandran and R. Parvathi, “Analysis of twitter specific preprocessing technique for tweets,” Procedia Computer Science, vol. 165, no. 2, 2019 5. A. Agarwal, F. Biadsy, and K. Mckeown, “Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams,” in Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), vol/. 2, 2019 6. B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” arXiv preprint cs/0409058, 2019 7. S. A. El Rahman, F. A. AlOtaibi, and W. A. AlShehri, “Sentiment analysis of twitter data,” in 2019 IEEE International Conference on Computer and Information Sciences (ICCIS), vol.2, 2020 8. M. Hu and B. Liu, “Mining and summarizing customer reviews”, Proceedings of the 10th ACM GIGKDD international conference on knowledge discovery and data mining,” 2020 9. S.-M. Kim and E. Hovy, “Determining the sentiment of opinions,” in COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, vol. 2, 2020 10. Go, A., Bhayani, R., Huang, L.: “Twitter sentiment classification using distant supervision”, CS224N project report. Stanford 1(12), 2 (2020) 11. A. Bermingham and A. F. Smeaton, “Classifying sentiment in microblogs: is brevity an advantage?,” in Proceedings of the 19th ACM international conference on Information and knowledge management, p. 3, 2020 12. H.Kirn, M. Anwar, A. Sadiq, Z, H.M. Zeeshan, I. Mehmood, R.A. Butt, Deepfake Tweets Detection Using Deep Learning Algorithms: Engineering Proceedings, 20, 2, 2022 13. Hinduja, S., Afrin, M., Mistry, S., Krishna, A.: Machine learning-based proactive social-sensor service for mental health monitoring using twitter data. International Journal of Information Management Data Insights 2(2), 100113 (2022) 14. A. John Blesswin, G. Selva Marry, S. Manoj Kumar, “Secured Communication Method using Visual Secret Sharing Scheme for Color Images”, International Journal of Internet Technology, Vol. 22, No. 4, 803–810, 2021
An Efficient Deep Learning-Based Breast Cancer Detection Scheme with Small Datasets

Adyasha Sahu1(B), Pradeep Kumar Das5, Sukadev Meher1, Rutuparna Panda2, and Ajith Abraham3,4

1 National Institute of Technology Rourkela, Rourkela, India
[email protected], [email protected]
2 Veer Surendra Sai University of Technology, Burla, India
3 Machine Intelligence Research Labs, Auburn, WA 98071, USA
[email protected]
4 Center for Artificial Intelligence, Innopolis University, 420500 Innopolis, Russia
5 School of Electronics Engineering (SENSE), Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
[email protected], [email protected]

Abstract. Breast cancer is the second leading cause of cancer death among women. Automatic and accurate detection of cancer at an early stage allows proper treatment of patients and drastically reduces the death rate. Usually, the performance of conventional deep learning networks degrades on small databases due to a lack of data for proper training. In this work, a deep learning-based breast cancer detection framework is suggested in which EfficientNet is employed to deliver excellent performance even on small databases. The uniform and adaptive scaling of depth, width, and resolution results in an efficient detection framework by maintaining a proper trade-off between classification performance and computational cost. Moreover, a Laplacian of Gaussian-based modified high boosting (LoGMHB) is applied in addition to data augmentation as a preprocessing step prior to the deep learning model to boost performance further. Performance analysis is conducted in both mammogram and ultrasound modalities to demonstrate the proposed method's superiority. The experimental results with five-fold cross-validation show that the proposed breast cancer scheme outperforms the other compared methods on all the performance measures.
Keywords: Breast cancer · mammogram · ultrasound · deep learning · classification

1 Introduction
Breast cancer is a common type of cancer widely seen among women all over the world [21,24]. It is the irregular and unusual growth of breast cells, shaping a tumor or lump. Since it is one of the leading causes of cancer death among women,
disease detection at an initial stage is very essential [17]. Mammography and ultrasound are the two popularly recommended imaging modalities for breast cancer screening. Mammography images the breast using low-dose X-rays, whereas ultrasound is a non-radiation procedure that uses sound waves to image the breast. Early detection is essential for providing a proper treatment plan to the patient before the disease severely invades other parts of the body [1,3–12,20,21,24]. Tumors are of two types: benign and malignant. Benign tumors are less dangerous, grow slowly, and are less likely to infect neighboring cells, whereas malignant tumors infect their neighbors at a faster pace and can affect other organs of the body as well. Manual detection of the disease is quite error-prone. To reduce false detections, radiologists or specialists need to go through multiple readings, which is expensive and time-consuming [13]. Therefore, there is a strong need for computer-aided diagnosis (CAD)-based automatic detection of the disease.

During the last few years, several researchers have developed efficient CAD systems for breast cancer detection [1,13,17]. The application of deep learning and transfer learning in CAD gives more precise and accurate results and aids the experts. Generally, deep learning models require a large database to train the network properly; on the other hand, the availability of large standard databases is quite challenging in the medical field. To get over this issue, transfer learning can be applied: networks are pretrained in the source domain, i.e., the large ImageNet dataset, then fine-tuned and applied in the target or application domain. They perform well on small databases as well. A number of researchers have implemented transfer learning networks to detect and classify breast cancer accurately and efficiently [1,3,13,17,19–21,24]. A whole-mammogram classification scheme based on ResNet50 was proposed by Bagchi et al. [3] in 2020. In that article, the performance of the proposed scheme is compared with ResNet101 and ResNet18, and ResNet50 is found to perform better than the other ResNet versions. Hence, it is not always true that performance improves as the depth of the network increases. Again in 2020, a modified ResNet50 structure was presented by Rahman et al. [20] to detect and classify breast masses using the DDSM database; the proposed method's performance is compared with InceptionV3, with better performance obtained for ResNet [14]. A transfer learning-based efficient classification technique is proposed by Zhang et al. in [24]. In [13], Falconí et al. suggested a breast cancer classification system according to the BI-RADS categories; VGG16 and VGG19 are used for the classification of mammogram images. An augmentation technique is suggested in [1] by Adedigba et al. to improve the variance and size of the dataset, and the experimental analysis shows that DenseNet outperforms the other compared networks (SqueezeNet, VGG, AlexNet, and ResNet) with higher training and testing accuracy. In 2019, Khan et al. [17] suggested a framework in which three transfer learning networks, ResNet, VGGNet, and GoogLeNet, were employed to extract features prior to average pooling, and average pooling is used for benign and malignant classification. The contributions of this work are:
1.1 Contributions
– In this work, a deep learning-based breast cancer detection framework with EfficientNet [23] is employed to deliver excellent performance even on small databases. Importantly, its uniform and adaptive scaling of depth, width, and resolution leads to more efficient detection by maintaining a proper trade-off between classification performance and computational cost.
– Moreover, a Laplacian of Gaussian-based modified high boosting (LoGMHB) is applied in addition to data augmentation as a preprocessing step prior to the deep learning model to boost performance further.
– The proposed scheme is validated using two different imaging modalities, mammogram and ultrasound, to show its robustness.
– For a fair execution and performance comparison, all results are obtained with 5-fold cross-validation rather than a single observation. In addition, six different performance measures are used for quantitative performance analysis.

The remainder of this work is structured as follows: Sect. 2 depicts the methodology; in this section, we discuss the features and architecture of the proposed breast cancer detection scheme. Section 3 presents a brief description of the datasets used for the experimentation. In Sect. 4, we discuss the experimental results obtained. Finally, the work is concluded with a brief summary in Sect. 5.
2 Proposed Method
Transfer learning techniques have been used extensively for automatic disease detection over the last few years as they perform well even with a small amount of data. Transfer learning solves the issues associated with conventional deep learning networks (performance degradation on small databases due to a lack of data for proper training). This motivates us to present an efficient transfer learning-based breast cancer detection network, as shown in Fig. 1, to deliver excellent performance even on small databases. Here, the mobile inverted bottleneck block [22], which is the prime building block of this framework, helps to achieve squeeze-and-excitation optimization effectively; it combines the merits of the bottleneck structure and the inverted residual. Before the images are fed to the network, their quality is improved by applying LoGMHB [8]. In addition, data augmentation is employed as a preprocessing step, in which horizontal rotation, vertical rotation, shifting, and flipping are performed to properly train the network.

2.1 Preprocessing
Generally, the quality of medical images is deteriorated by undesirable noise and blurring effects, which can result in false detections during diagnosis. Therefore, preprocessing of the images is vital to improve image quality. Here,
Fig. 1. Proposed Breast Cancer Detection Scheme
a LoG-based modified high-boosting operation is applied to the images prior to the classification stage as pre-processing to obtain a denoised, deblurred image [8]. The Laplacian operator enhances the edges and deblurs the image; on the other hand, it is very sensitive to noise. Therefore, a 2-D Gaussian filter can be applied before the Laplacian operator to reduce the effect of noise, the two together functioning as a LoG filter. The modified high-boosting operation is written as follows:

P(i, j) = I(i, j) + k L_G(i, j)    (1)

Here, in Eq. 1, I(i, j) and P(i, j) refer to the input image and the preprocessed image, respectively. The weight factor is denoted as k, with k > 1. The value of k must be selected so that it efficiently boosts the quality of the image; in this work, k = 1.5. The LoG function applied to the input image is given by:

L_G(i, j) = -\frac{1}{\pi \sigma^4} \left[ 1 - \frac{i^2 + j^2}{2\sigma^2} \right] \exp\left( -\frac{i^2 + j^2}{2\sigma^2} \right)    (2)

Here, σ denotes the standard deviation, whose value must be chosen correctly so that the image quality is improved. In this work, the size of the LoG filter is 7 × 7 and σ = 1.
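As a rough illustration of Eqs. 1–2, the following sketch builds a 7 × 7 LoG kernel with σ = 1 and applies the modified high boosting with k = 1.5; the zero-mean normalisation of the kernel and the clipping to the 8-bit range are assumptions, since the authors' exact implementation is not given.

```python
# Hedged sketch of LoG-based modified high boosting (Eqs. 1-2); k = 1.5 and sigma = 1
# follow the text, but kernel normalisation and clipping are assumptions.
import numpy as np
from scipy import ndimage

def log_kernel(size=7, sigma=1.0):
    """Build a size x size Laplacian-of-Gaussian kernel (Eq. 2)."""
    half = size // 2
    i, j = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = i ** 2 + j ** 2
    lg = -1.0 / (np.pi * sigma ** 4) * (1 - r2 / (2 * sigma ** 2)) * np.exp(-r2 / (2 * sigma ** 2))
    return lg - lg.mean()            # zero-mean so flat regions stay unchanged

def log_mhb(image, k=1.5, size=7, sigma=1.0):
    """Modified high boosting: P = I + k * (LoG-filtered image), Eq. 1."""
    image = image.astype(np.float64)
    lg_response = ndimage.convolve(image, log_kernel(size, sigma), mode="reflect")
    boosted = image + k * lg_response
    return np.clip(boosted, 0, 255).astype(np.uint8)

# Example on a random grayscale patch standing in for a mammogram region
patch = np.random.randint(0, 256, (128, 128)).astype(np.uint8)
print(log_mhb(patch).shape)
```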
2.2 CNN Architecture
In modern deep-learning networks, scaling of depth, width, or resolution is extensively used for performance enhancement. For example, different ResNet architectures are developed by scaling up the depth of the network. Tuning of the width is presented in MobileNet and MobileNetV2, whereas resolution variation is suggested in [15,23]. Usually, scaling up the depth of a CNN results in the extraction of more vital and complex deep features. In contrast, it increases the difficulty of training the
network effectively after a certain depth due to vanishing gradient issues. Though batch normalization [16,23] and skip connections [14] can help to overcome this limitation to some extent, they degrade computational efficiency. An increment in the width of a small-size network helps to extract more vital fine-grained features and makes training easier. In contrast, a width increment also increases the difficulty of training a hugely wide, shallow model. Similarly, a resolution increment enhances the clarity of the input images and helps to extract more vital fine-grained features; however, it increases the computational cost. The majority of existing works focus on scaling up one of these three parameters, although it is possible to scale them arbitrarily. However, arbitrary scaling of all these parameters becomes challenging as it needs manual tuning. Hence, in the EfficientNetB0 [23] architecture, an efficient compound scaling approach is suggested for uniformly scaling up CNNs. Here, uniform scaling is achieved as presented in Eq. 3:

d_n = a^{\alpha}, \quad w_n = b^{\alpha}, \quad r_n = c^{\alpha}, \quad \text{where } a \cdot b^2 \cdot c^2 \approx 2,\; a \geq 1,\; b \geq 1,\; c \geq 1.    (3)
Here, the compound coefficient (α) uniformly scales the network depth (d_n), width (w_n), and resolution (r_n). This coefficient controls the amount of additional resources available for scaling. The constant terms a, b, and c are responsible for effectively using these extra resources for scaling up the depth, width, and resolution [23]; they are estimated by a small grid search. In this work, α is selected as 1, which indicates the availability of twice the resources. The values of a, b, and c are selected as 1.2, 1.1, and 1.15, respectively, as suggested in EfficientNetB0 [23]. The uniform and adaptive scaling of depth, width, and resolution results in an efficient detection framework by maintaining a proper trade-off between classification performance and computational cost. In addition, the LoGMHB-based pre-processing boosts the image quality and the detection performance as well.
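A minimal sketch of the compound scaling rule in Eq. 3 is given below, using the constants reported above (a = 1.2, b = 1.1, c = 1.15, α = 1); the base depth, width, and resolution values are placeholders rather than the actual EfficientNetB0 configuration.

```python
# Minimal sketch of the compound scaling of Eq. 3; base values are illustrative assumptions.
def compound_scale(base_depth, base_width, base_resolution, alpha=1,
                   a=1.2, b=1.1, c=1.15):
    d_n = a ** alpha          # depth multiplier
    w_n = b ** alpha          # width multiplier
    r_n = c ** alpha          # resolution multiplier
    assert a * b ** 2 * c ** 2 <= 2.1, "constraint a * b^2 * c^2 ~ 2 from Eq. 3"
    return round(base_depth * d_n), round(base_width * w_n), round(base_resolution * r_n)

print(compound_scale(base_depth=16, base_width=64, base_resolution=224))
```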
3 Datasets
In this work, we have used two different imaging modalities of breast cancer to show the robustness of the proposed scheme. All the experiments are executed using two standard, publicly available databases: the Mini-DDSM [18] database for mammogram images and BUSI (Breast Ultrasound Images) [2] for ultrasound images. A set of 400 images (200 from each of the benign and malignant classes) and 260 images (130 from each class) are selected randomly for training and validation for mammogram and ultrasound, respectively, so that the efficiency and superiority of the proposed scheme can be demonstrated on small datasets as well.
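The 5-fold protocol used throughout the experiments could be set up as in the sketch below; the placeholder file names and labels stand in for the randomly selected mini-DDSM images.

```python
# Hedged sketch of the 5-fold cross-validation split over the 400 selected mammograms;
# file names and labels are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold

files = np.array([f"mammo_{i}.png" for i in range(400)])   # placeholder file names
labels = np.array([0] * 200 + [1] * 200)                   # 0 = benign, 1 = malignant

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(files, labels)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation images")
```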
4 Results and Discussion
In this section, a comparative performance analysis of the proposed breast cancer detection scheme is presented on both the mammogram and ultrasound datasets.
The performance of the proposed scheme is compared with some well-known transfer learning networks: AlexNet, VGG16, ResNet50, and Xception. For an unbiased analysis, the same pre-processing scheme is employed before all these transfer learning networks. The quantitative analysis of the performance is carried out by means of six standard performance measures: AUC, accuracy, precision, specificity, sensitivity, and F1 score. For an unbiased analysis of the proposed scheme, all results are taken over 5-fold cross-validation instead of a single iteration. A fixed experimental platform is maintained to perform all the experiments. The specifications of the system are: Intel(R) Core(TM) i7-11700 processor, 16 GB RAM, 2.50 GHz clock speed, and NVIDIA T400 8 GB GPU. The simulation is done in MATLAB R2020b. The parameters are fine-tuned to the values: batch size = 64, initial learning rate = 0.0001, and number of epochs = 12.

Table 1 presents the comprehensive performance comparison of the proposed scheme with the other methods for breast cancer detection on the mammogram dataset (mini-DDSM). Here, the mean result of the five-fold cross-validation is presented. The table shows that the suggested framework delivers better performance than the others, with the highest values for all the performance measures: 0.9675 AUC, 96.75% accuracy, 96.52% precision, 96.50% specificity, 97.00% sensitivity, and 0.9676 F1 score. The detailed result comparison and analysis for ultrasound images on the BUSI dataset is presented in Table 2; the mean of the five-fold cross-validation is shown. From the table, it is found that the proposed scheme outperforms the compared methods with the best values of 0.9154 AUC, 91.54% accuracy, 91.54% precision, 91.54% specificity, 91.54% sensitivity, and 0.9154 F1 score (Fig. 2).

Table 1. Breast Cancer Detection Performance in Mammogram Images using the Mammogram Dataset (mini-DDSM)

Method     AUC     Accuracy  Precision  F1 Score  Sensitivity  Specificity
Xception   0.8700  87.00     87.37      0.8693    86.50        87.50
AlexNet    0.9500  95.00     94.55      0.9502    95.50        94.50
VGG16      0.9450  94.50     94.06      0.9453    95.00        94.00
ResNet50   0.9550  95.50     95.50      0.9550    95.50        95.50
Proposed   0.9675  96.75     96.52      0.9676    97.00        96.50
For better analysis, the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are presented in a tabular format for both the mini-DDSM and the ultrasound (BUSI) datasets in Table 3. The table shows that the proposed scheme has fewer false detections (FP and FN) and hence exhibits better performance than the others. The ROC curves
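For reference, the sketch below computes the performance measures of Tables 1 and 2 directly from the TP/TN/FP/FN counts of Table 3; AUC is omitted because it requires per-image scores rather than counts alone.

```python
# Sketch of the performance measures used in Tables 1-3, computed from the TP/TN/FP/FN
# counts of the proposed scheme on mini-DDSM (Table 3).
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # recall / true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

print(classification_metrics(tp=194, tn=193, fp=7, fn=6))
# -> approximately (0.9675, 0.9652, 0.9700, 0.9650, 0.9676), matching the Table 1 row "Proposed"
```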
Table 2. Breast Cancer Detection Performance in Ultrasound Images using the Ultrasound Dataset (BUSI)

Method     AUC     Accuracy  Precision  F1 Score  Sensitivity  Specificity
Xception   0.8385  83.85     83.33      0.8397    84.62        83.08
AlexNet    0.9000  90.00     89.39      0.9007    90.77        89.23
VGG16      0.8808  88.08     86.67      0.8830    90.00        86.15
ResNet50   0.9038  90.38     90.08      0.9042    90.77        90.00
Proposed   0.9154  91.54     91.54      0.9154    91.54        91.54
Table 3. TP, TN, FP, FN Performances

           mini-DDSM                 BUSI
Method     TN   FP   FN   TP         TN   FP   FN   TP
Xception   175  25   27   173        108  22   20   110
AlexNet    189  11   9    191        116  14   12   118
VGG16      188  12   10   190        112  18   13   117
ResNet50   191  9    9    191        117  13   12   118
Proposed   193  7    6    194        119  11   11   119
Fig. 2. ROC Performances with mini-DDSM Dataset
for the proposed and the compared methods on the mammogram (mini-DDSM) and ultrasound (BUSI) datasets are presented in Fig. 2 and Fig. 3, respectively. They show that the proposed method outperforms the others. In the future, this work can be extended by using data collected from a specialized cancer hospital to examine the efficacy of the proposed method in real-time medical applications.
Fig. 3. ROC Performances with BUSI Dataset
5 Conclusion
In this paper, we have presented an efficient breast cancer detection technique to detect malignancy in breast images effectively. The implementation of EfficientNetB0 in this framework successfully mitigates the issue of the unavailability of large datasets in the medical field. The uniform and adaptive scaling of depth, width, and resolution in EfficientNetB0 yields an efficient detection framework by maintaining a proper trade-off between classification performance and computational cost. Thus, the proposed breast cancer detection framework combines the merits of both EfficientNetB0 and LoGMHB to deliver excellent performance even on small databases. From the experimental analysis, it can be concluded that the proposed breast cancer detection framework performs better on both the mammogram and ultrasound datasets.

Acknowledgement. This research has been partly financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70-2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). The authors acknowledge the technical support and review feedback from the AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References
1. Adedigba, A.P., Adeshinat, S.A., Aibinu, A.M.: Deep learning-based mammogram classification using small dataset. In: 2019 15th International Conference on Electronics, Computer and Computation (ICECCO), pp. 1–6. IEEE (2019)
2. Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data in Brief 28(104), 863 (2020)
3. Bagchi, S., Mohd, M.N.H., Debnath, S.K., Nafea, M., Suriani, N.S., Nizam, Y.: Performance comparison of pre-trained residual networks for classification of the whole mammograms with smaller dataset. In: 2020 IEEE Student Conference on Research and Development (SCOReD), pp. 368–373. IEEE (2020)
4. Das, P.K., Meher, S.: An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia. Expert Syst. Appl. 115311 (2021)
5. Das, P.K., Meher, S.: Transfer learning-based automatic detection of acute lymphocytic leukemia. In: 2021 National Conference on Communications (NCC), pp. 1–6. IEEE (2021)
6. Das, P.K., Meher, S., Panda, R., Abraham, A.: A review of automated methods for the detection of sickle cell disease. IEEE Rev. Biomed. Eng. 13, 309–324 (2019)
7. Das, P.K., Jadoun, P., Meher, S.: Detection and classification of acute lymphocytic leukemia. In: 2020 IEEE-HYDCON, pp. 1–5. IEEE (2020)
8. Das, P.K., Meher, S., Panda, R., Abraham, A.: An efficient blood-cell segmentation for the detection of hematological disorders. IEEE Trans. Cybern. (2021)
9. Das, P.K., Pradhan, A., Meher, S.: Detection of acute lymphoblastic leukemia using machine learning techniques. In: Gopi, E.S. (ed.) Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication. LNEE, vol. 749, pp. 425–437. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0289-4_32
10. Das, P.K., Diya, V., Meher, S., Panda, R., Abraham, A.: A systematic review on recent advancements in deep and machine learning based detection and classification of acute lymphoblastic leukemia. IEEE Access (2022)
11. Das, P.K., Nayak, B., Meher, S.: A lightweight deep learning system for automatic detection of blood cancer. Measurement 191(110), 762 (2022)
12. Das, P.K., Sahoo, B., Meher, S.: An efficient detection and classification of acute leukemia using transfer learning and orthogonal softmax layer-based model. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics (2022)
13. Falconí, L., Pérez, M., Aguilar, W., Conci, A.: Transfer learning and fine tuning in mammogram BI-RADS classification. In: IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 475–480. IEEE (2020)
14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
15. Huang, Y., et al.: GPipe: efficient training of giant neural networks using pipeline parallelism. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
16. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, PMLR, pp. 448–456 (2015)
17. Khan, S., Islam, N., Jan, Z., Din, I.U., Rodrigues, J.J.C.: A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recogn. Lett. 125, 1–6 (2019)
18. Lekamlage, C.D., Afzal, F., Westerberg, E., Cheddad, A.: Mini-DDSM: mammography-based automatic age estimation. arXiv preprint arXiv:2010.00494 (2020)
19. Ragab, D.A., Sharkas, M., Marshall, S., Ren, J.: Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ 7, e6201 (2019)
20. Rahman, A.S.A., Belhaouari, S.B., Bouzerdoum, A., Baali, H., Alam, T., Eldaraa, A.M.: Breast mass tumor classification using deep learning. In: 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), pp. 271–276. IEEE (2020)
21. Sahu, A., Das, P.K., Meher, S.: High accuracy hybrid CNN classifiers for breast cancer detection using mammogram and ultrasound datasets. Biomed. Signal Process. Control 80(104), 292 (2023)
22. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
23. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, PMLR, pp. 6105–6114 (2019)
24. Zhang, X., et al.: Classification of whole mammogram and tomosynthesis images using deep convolutional neural networks. IEEE Trans. Nanobiosci. 17(3), 237–242 (2018)
Comparative Analysis of Machine Learning Models for Customer Segmentation

Parmeshwara Joga, B. Harshini, and Rashmi Sahay(B)

Department of Computer Science and Engineering, Faculty of Science and Technology (IcfaiTech), The ICFAI Foundation for Higher Education, Hyderabad, India
[email protected]
Abstract. In the present competitive era, entrepreneurs struggle to increase and retain their customer base. Behavioral customer segmentation assists in the identification of potential customers, their buying habits, and their shared interests. This helps to build an efficient strategy to increase the customer base and product sales. In this paper, we compare the efficacy of four machine learning algorithms, namely KMeans, DBSCAN, Agglomerative Clustering, and PCA with KMeans, in performing behavioral customer segmentation. The machine learning algorithms divide customers into an optimal number of customer segments based on the parameters annual income, age, and spending score. Knowledge of customer segments based on the selected parameters will assist in exploring novel ways to build marketing personas. Using the four algorithms, we group customers with common interests by extracting and analyzing patterns in the available customer data. Our comparison shows that Agglomerative Clustering achieves the highest silhouette score of 0.6865 in performing behavioral customer segmentation in comparison to the other models. The comparative analysis of feature-based customer segmentation explored in this paper can be used by organizations for exploring efficient mechanisms for customer segmentation.
1 Introduction
The increase in online marketing in recent years has resulted in an explosion of customer data. Due to increased competition, organizations are striving towards maximising sales, profits, customer satisfaction, and market satisfaction while minimising costs. However, they fail due to a lack of emphasis on understanding customer behavior and buying patterns. Organizations can understand and comprehend the market and their consumers more effectively if they perform an in-depth analysis of the massive customer data available. An efficient mechanism to perform customer data analysis is to apply segmentation. In this method, customers are clustered into 'n' groups based on parameters like buying traits, age, and gender. Clustering is the process of grouping the data based on similarities in the data set. Customers belonging to a particular group
have some common traits. Customers are grouped in such a way that a customer belonging to a particular group shares common interests with the other customers of the same group [1]. This enables organizations to effectively design and target marketing campaigns at various groups and enhances the likelihood of a customer purchasing the items. For instance, a business may use social media to market its brand among teenagers. By analysing the emotion behind reviews and customer data, this paper leverages real-time data to create client categories. Based on the categories created, advertisement campaigns and strategies may be designed. Organizations can thus build stronger consumer relationships and improve their overall performance. In this paper, we apply different machine learning algorithms for the segmentation of customer data and compare their performance using the Silhouette and Davies-Bouldin scores. The rest of the paper is divided into the following sections. Section 2 presents the problem statement. Section 3 presents the related research in customer segmentation. Section 4 explains the methodology and the four algorithms applied for customer segmentation in this paper. Section 5 presents the results and compares the performance of the four algorithms, and Sect. 6 concludes the paper.
2 Problem Statement
Increasing the customer base and client retention has become a major concern for organizations trying to gain a competitive edge in the market. Customer segmentation, i.e., the practice of discovering common attributes among consumers and categorizing them, has become a part of building marketing strategies. However, the challenge arises in selecting an appropriate and optimal algorithm that best suits a given customer dataset: for different datasets, different algorithms may perform an efficient segmentation. In this paper, we achieve the following objectives.

1. The paper analyses the efficiency of four machine learning algorithms, namely KMeans, DBSCAN, Agglomerative Clustering, and PCA with KMeans, in solving the customer segmentation problem.
2. The paper compares their performance in segmenting the mall customer dataset.
3 Literature Review
Customer segmentation is an important management tool in the Customer Relationship Management literature. In practice, the goal of customer segmentation is to maximize customer satisfaction and hence significantly improve a company's profit. It is also an active research area, especially in the industrial management literature [3,4]. Segmentation started to become more widely accepted and used in the middle of the 20th century. Smith W.R. described in detail the strategies that can be adopted by using segmentation. The author further emphasized that segmentation is essentially a merchandising strategy, and merchandising represents the adjustment of market offerings to consumer or user requirements [5].
Demographic segmentation is the process of splitting customer groups based on traits such as age, gender, ethnicity, income, level of education, religion, and profession [6,7]. Lifestyles change continuously, so the data in attributes such as age and income gets updated very often. It is also a subjective approach, as it does not provide any insight into the needs and values of customers. This type of segmentation is not appropriate for fields like music or movie recommendation, online shopping, etc.

Psychographic segmentation allows incredibly effective marketing by grouping customers at a more personal level, defined by their hobbies, personality traits, values, life goals, lifestyles, and beliefs [3,8]. It uncovers hidden motivations and attitudes of customers; however, customer actions, loyalty, and other factors are not taken into consideration. Collecting this data is difficult, requires a complex setup process to obtain accurate data, and may sometimes rely on assumptions.

Geographic segmentation allows many different kinds of considerations when advertising to consumers by grouping them based on their geographic location, such as their country, region, city, and even postal code [3,9]. However, the buying behavior, needs, or wants of customers cannot be interpreted from it, as the needs of people living in the same region need not be the same. Changing populations and weather also make this segmentation less effective.

Behavioral segmentation is perhaps the most useful of all for e-commerce businesses, as most of this data, such as customers' spending habits, browsing habits, purchasing habits, loyalty to a brand, interactions with the branch, and previous product ratings, can be gathered via the website itself [3,10]. It enables us to understand customer needs and behaviors, which helps us to prioritize groups of customers that share a common trait, to build brand loyalty, and also to avoid wasting time on low-spending customers.
4 Algorithms for Customer Segmentation
In order to implement the algorithms for customer segmentation of the mall data, we use the following libraries: Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, and SciPy. The mall dataset used in the present work consists of five attributes, namely Customer ID, Gender, Age, Annual Income, and Spending Score. The distribution of the dataset is shown in Fig. 1. Most of the annual incomes fall between 50K and 85K. Most of the customers are between 25 and 40 years old, with an average age of 38.5 and a median age of 36 years. Both the bar chart and the counts show that there are more female than male customers. Table 1 shows the correlations between the features in the dataset. It is clear that the correlations between the features are low. Therefore, to create the clustering models, we will use all of the features. In the following subsections, we present the four clustering algorithms that are applied to the mall customer dataset.
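A minimal sketch of this data loading and correlation step is shown below; the CSV file name and column labels follow the common Kaggle mall-customer file and are assumptions.

```python
# Sketch of the data loading and correlation step; file name and columns are assumptions.
import pandas as pd

data = pd.read_csv("Mall_Customers.csv")            # CustomerID, Gender, Age, Annual Income, Spending Score
numeric = data.select_dtypes(include="number")      # drop Gender for the correlation matrix
print(numeric.describe())                           # distributions summarised in Fig. 1
print(numeric.corr())                               # correlation matrix reported in Table 1
```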
Fig. 1. Distribution of Features

Table 1. Correlational Matrix of the Features

                CustomerID   Age         Annual Income  Spending Score
CustomerID      1            –0.02676    0.977548       0.013835
Age             –0.026763    1           –0.01239       –0.32722
Annual Income   0.977548     –0.01239    1              0.009903
Spending Score  0.013835     –0.32722    0.009903       1

4.1 Customer Segmentation Using K-Means
Algorithm 1 describes customer segmentation using K-Means clustering. The process of the algorithm may be explained by the following steps:
Step 1: Determine the number of clusters k.
Step 2: Pick k random points from the data to serve as centroids.
Step 3: Assign each point to the nearest cluster centroid.
Step 4: Recalculate the centroids of the freshly generated clusters.
Step 5: Repeat steps 3 and 4.
In order to determine the value of k, the elbow curve method is used, as depicted in Line No. 5 of Algorithm 1. The elbow method is a popular strategy that involves running K-Means clustering for a range of cluster counts k (say, 1 to 10) and, for each value, computing the sum of squared distances from each point to its assigned center (the distortion). When the distortions are plotted and the plot resembles an arm, the "elbow" (the point of inflection on the curve) is the optimal value of k. The elbow graph is shown in Fig. 2. The x-axis in Fig. 2 represents the number of clusters. The number of clusters measured at the elbow bend is assigned to the variable k, as at this point the value of the within-cluster sum of squares (WCSS) abruptly stops decreasing. The decline in the graph is minor after the value 5 on
Algorithm 1: Customer Segmentation using K-Means
Input: Dataset DATA – customer mall data from Kaggle
Require: int k – number of clusters
Output: k clusters
1:  function Elbow(DATA)
2:    for k in range do plot()        // the bend of the curve gives the optimal k
3:    end for
4:  end function
5:  k = Elbow(DATA); randomly initialize centroids c = c1, c2, ..., ck
6:  while the centroid positions change do
7:    assignment step:
8:    for each data point di do
9:      find the closest center ck ∈ c to data point di
10:     assign di → ck
11:   end for
12:   update the centroid values
13: end while
Fig. 2. Elbow Curve
the x-axis; thus, the variable k is assigned the value 5. Therefore, the customer data is divided into 5 clusters.
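The elbow computation and the final K-Means fit with k = 5 might look as follows in scikit-learn; the selected feature columns and their exact names are assumptions consistent with the text.

```python
# Hedged sketch of the elbow computation and final K-Means fit (k = 5); column names assumed.
import pandas as pd
from sklearn.cluster import KMeans

data = pd.read_csv("Mall_Customers.csv")                                      # file name assumed
X = data[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values      # columns assumed

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)            # within-cluster sum of squares for the elbow curve

# The elbow of the WCSS curve suggests k = 5 (Fig. 2); fit the final model with it.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```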
4.2 Customer Segmentation Using DBSCAN
Algorithm 2 describes customer segmentation using DBSCAN. The process of the algorithm may be explained by the following steps:
1. First, an arbitrary point is selected from the dataset (until all points have been visited).
2. If there are at least 'minPoint' points within a radius ε of the point, we consider all of these points to be part of the same cluster.
3. The clusters are then extended by repeating the neighborhood computation for each surrounding point recursively.
4. Sort the data points into three categories: core points, boundary points, and noise points.
5. Discard the noise points.
6. Assign a cluster to each core point.
7. Color all the density-connected points of a core point.
8. Color boundary points according to the nearest core point.

The terms core point, boundary point, and noise point mentioned in the above steps are explained as follows:
• Core points: points that have a sufficient number of neighbors within the radius.
• Boundary points: points that are within the radius of a core point but do not have sufficient neighbors.
• Noise points: points other than core and boundary points.
Algorithm 2: Customer Segmentation using the DBSCAN Model
Input: Dataset DATA – customer mall data from Kaggle
Input: float ε – radius threshold
Input: int minpts – minimum number of points required in a cluster
Output: k clusters

ε ← average distance between di and its nearest neighbors
x-axis ← average distances; y-axis ← DATA (elbow of the k-distance graph)
minpts ← dimensionality of the dataset
function DBSCAN(DATA, ε, minpts)
  for each unvisited point di ∈ DATA do
    mark di as visited
    X ← GetNeighbours(di, ε)
    if |X| < minpts then
      mark di as a noise point
    else
      P ← di
    end if
    for each data point d' ∈ X do
      X ← X \ {d'}
      if d' is not visited then
        mark d' as visited
        X' ← GetNeighbours(d', ε)
        if |X'| ≥ minpts then
          X ← X ∪ X'
        end if
      end if
      if d' is not in any cluster then
        P ← P ∪ {d'}
      end if
    end for
  end for
end function
The performance of the DBSCAN algorithm in terms of silhouette score with an increasing minimum number of points per cluster is shown in Fig. 3.
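A sketch of the DBSCAN step of Algorithm 2 is given below: ε is read off a k-distance curve and min_samples is tied to the data dimensionality; the concrete ε value is an illustrative assumption, not the value used in the paper.

```python
# Hedged sketch of the DBSCAN step in Algorithm 2; eps value and column names are assumptions.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

data = pd.read_csv("Mall_Customers.csv")                                      # file name assumed
X = data[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values

# k-distance curve: sort the distance to the k-th neighbour and look for its elbow
nn = NearestNeighbors(n_neighbors=4).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

db = DBSCAN(eps=12.0, min_samples=4).fit(X)        # eps read off the k-distance elbow (assumed)
labels = db.labels_                                # -1 marks noise points
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0))
```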
4.3 Agglomerative Clustering (Using PCA)
The agglomerative model is a hierarchical unsupervised machine learning algorithm that uses a bottom-up approach to clustering. Algorithm 3 depicts agglomerative clustering using Principal Component Analysis (PCA) for the segmentation
of the customer mall data, and the following steps describe the process followed in this model:
1. Scaling the data: scaling is used to normalize the data into the range 0–1 in order to reduce variance. We used Min-Max scaling for our data, where
Fig. 3. Min Sample Curve for DBSCAN Model
X_{sc} = \frac{X - X_{min}}{X_{max} - X_{min}}    (1)
2. Dimensionality reduction using PCA: Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction. It converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. Here, we fit the data points into 2 principal components (PCA1, PCA2), as shown in Fig. 4.
Fig. 4. Two Principal Components of the Dataset
3. Make one cluster for each data point: initially there are 200 clusters, one cluster for each data point.
4. Combine the clusters that contain the closest pair of elements: we apply a greedy approach to combine two clusters that are close to each other into one, using the single linkage method.
5. Repeat step 4 until there is only one cluster that contains all the data points.
6. Visualize the grouping by creating a dendrogram and find the optimal number of clusters.
Algorithm 3: Agglomerative Clustering with PCA Model
Input: D – data points
Output: data points with cluster assignments

C, C' ← n;  Di ← {xi}, i = 1, 2, ..., n
X_sc = (X − X_min) / (X_max − X_min)
pca ← PCA(2)
data ← pca.fit_transform(D)
while C' ≠ C do
  C' ← C' − 1
  Dj ← FindNearestCluster(Di)
  merge Di and Dj
  for each data point di do
    find the closest cluster center ck ∈ c to data point di
    assign di → ck
  end for
  update the centroid value
end while
return C clusters
dendrogram(PCA_model, method = ward)
scatterplot(clustering_model)
visualize the optimal number of clusters
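A scikit-learn sketch of Algorithm 3 follows: Min-Max scaling (Eq. 1), a two-component PCA, a Ward-linkage dendrogram, and the final agglomerative model; the choice of 5 clusters and the column names are assumptions.

```python
# Hedged sketch of Algorithm 3; number of clusters and column names are assumptions.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

data = pd.read_csv("Mall_Customers.csv")                                      # file name assumed
X = data[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values

X_sc = MinMaxScaler().fit_transform(X)                 # Eq. 1
X_pca = PCA(n_components=2).fit_transform(X_sc)        # two principal components (Fig. 4)

Z = linkage(X_pca, method="ward")                      # linkage matrix behind the dendrogram of Fig. 8a
# dendrogram(Z)  # plot with matplotlib if desired

agg = AgglomerativeClustering(n_clusters=5, linkage="ward")
labels = agg.fit_predict(X_pca)
```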
4.4 K-Means Using PCA
Algorithm 4 describes customer segmentation using K-Means clustering combined with PCA. The process of the algorithm may be explained by the following steps:
1. Employ PCA to project the data into a lower-dimensional space: the four features in the dataset are reduced to 2 components to reduce noise.
2. Determine the number of clusters in K-Means. In order to estimate the number of clusters:
   a. We run the algorithm for different numbers of clusters.
   b. Calculate the WCSS (Within-Cluster Sum of Squares) for each number of clusters.
Fig. 5. Within Cluster Sum of Squares
   c. Plot the WCSS against the number of clusters and, using the elbow method, find the optimal number of clusters, as shown in Fig. 5.
   d. Possible numbers of clusters: 3, 4, 5, and 6. Create the best-scoring model by employing the regular K-Means model with each of the features. The silhouette scores are calculated for numbers of clusters ranging from 3 to 7 with 2 principal components, and the optimal values are selected based on this analysis.
3. The parameters with the highest silhouette score are selected.
4. Visualize the optimal clusters using a scatter plot.
Algorithm 4: K-Means with PCA Model
Input: D – data points
Output: K clusters of data points

pca ← PCA(2)
data ← pca.fit_transform(D)
estimate the WCSS
while the centroid positions change do
  fit the model with the principal component scores
  for each data point di do
    find the closest center ck ∈ c to data point di
    assign di → ck
  end for
  update the centroid value
end while
scatterplot(PCA_scores, fitted_model)
visualize the optimal number of clusters
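Algorithm 4 might be realised as in the sketch below, where the silhouette score is evaluated for 3–7 clusters on the two principal component scores; file and column names are assumptions.

```python
# Hedged sketch of Algorithm 4: PCA to two components, WCSS, and silhouette scores for k = 3..7.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.read_csv("Mall_Customers.csv")                                      # file name assumed
X = MinMaxScaler().fit_transform(
    data[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values)
scores = PCA(n_components=2).fit_transform(X)          # principal component scores

for k in range(3, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores)
    print(k, km.inertia_, silhouette_score(scores, km.labels_))

# The configuration reported in Sect. 5 uses 5 clusters with the 2 components.
best_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(scores)
```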
In the following section we present the results and discussions.
5 Results and Discussion

5.1 K-Means Model
Fig. 6. Analyzing the relation between Age, Annual Income, Gender, and Spending Score: (a) Age vs. Spending Score, (b) Spending Score vs. Annual Income, (c) Spending Score vs. Gender
Age is clearly the most important element in determining Spending Score, as shown in Fig. 6a. Younger folks, regardless of their annual income, spend more.
Figure 6b clearly shows that 5 separate clusters have been produced. Customers in the red cluster have the lowest income and the lowest spending score, while customers in the blue cluster have the highest income and the highest spending score. From the plots, it is evident that younger people tend to spend more, so focusing on their interests may be beneficial.

• Cluster 0 has a low spending score with low annual income.
• Cluster 1 has a high spending score with higher annual income.
• Cluster 2 has an average spending score with average annual income.
• Cluster 3 has a low spending score with an annual income just greater than average.
• Cluster 4 has a high spending score and high income, with age groups younger than those in Cluster 1.

The silhouette score of the K-Means clustering model is 0.452054753807565 and the Davies-Bouldin score is 0.82225964178141.

5.2 DBSCAN
DBSCAN also shows that younger people spend more, irrespective of their annual income, as depicted in Fig. 7. The points that are marked -1 represent the noise points, which do not belong to any cluster. Cluster 0 represents a high spending score with low annual income. Cluster 1 represents an average spending score with an annual income less than average. Cluster 2 depicts customers with a high spending score and an annual income greater than average. Cluster 3 depicts customers with a low spending score and average income. Thus, DBSCAN also reveals that age is the most important element to consider, with younger people spending more regardless of their annual income. The silhouette score is 0.20473300 and the Davies-Bouldin score is 2.23593606.
Fig. 7. Spending Score vs Annual Income
5.3 Agglomerative Clustering with PCA
In this approach, we fit the updated data to an agglomerative model and construct a dendrogram, as shown in Fig. 8a. The silhouette score is 0.68651389 and the Davies-Bouldin score is 0.40911039.
Fig. 8. Agglomerative Clustering with PCA: (a) Dendrogram, (b) Clusters with Agglomerative Clustering
5.4 K-Means with PCA
Fig. 9. Clusters with Kmeans and PCA
Table 2. Comparison of the clustering models

S. No  Clustering Model                                      Silhouette score  Davies-Bouldin score
1.     KMeans clustering model                               0.44406692        0.82225964
2.     DBSCAN clustering model                               0.20473300        2.23593606
3.     Agglomerative Clustering (Dendrograms & PCA) model    0.68651389        0.40911039
4.     PCA with KMeans clustering model                      0.552626          0.584301
Figure 9 shows the clusters obtained after applying K-Means with PCA. In the figure:
• Cluster 1 depicts customers with medium annual income and medium annual spend.
• Cluster 2 depicts customers with high annual income and high annual spend.
• Cluster 3 depicts customers with low annual income and low annual spend.
• Cluster 4 depicts customers with high annual income but low annual spend.
• Cluster 5 depicts customers with low annual income but high annual spend.

It is clear from Fig. 9 that younger people tend to spend more. Table 2 compares the performance of the four clustering models. From the table, it is clear that taking 5 clusters with 2 principal components gives the optimal clusters for this model, considering its high silhouette score of 0.552626.
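The comparison of Table 2 can be reproduced along the lines of the sketch below, where each label assignment is scored with the silhouette and Davies-Bouldin indices; the DBSCAN ε and the evaluation on the raw feature space are assumptions, so the exact numbers will differ from those reported.

```python
# Hedged sketch of the Table 2 comparison; hyperparameters and scoring space are assumptions.
import pandas as pd
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, davies_bouldin_score

data = pd.read_csv("Mall_Customers.csv")                                      # file name assumed
X = data[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].values
X_pca = PCA(n_components=2).fit_transform(X)

candidates = {
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X),
    "DBSCAN": DBSCAN(eps=12.0, min_samples=4).fit_predict(X),                 # eps assumed
    "Agglomerative (PCA)": AgglomerativeClustering(n_clusters=5).fit_predict(X_pca),
    "PCA + KMeans": KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_pca),
}
for name, labels in candidates.items():
    if len(set(labels)) > 1:                                                  # scores need >= 2 clusters
        print(name, silhouette_score(X, labels), davies_bouldin_score(X, labels))
```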
6 Conclusion
Discovering the various attributes that contribute to a more meaningful customer base allows organizations to offer their customers exactly what they want, thereby increasing customer participation and profits. According to the findings, younger individuals are more likely to use the product(s)/service(s) of the mall than older persons; the company should target ads to this demographic to increase the conversion rate. The analysis also shows that females buy more expensive products than males, and they tend to spend more even if their yearly income is less than $50,000. This is an area where the company should collect additional data for future study.
References
1. Vaidisha Mehta, R.M.S.V.: A survey on customer segmentation using machine learning algorithms to find prospective clients. In: 2021 9th International Conference on Reliability, Infocom Technologies and Optimization, vol. 1, p. 4 (2021)
2. Camilleri, M.A.: Market segmentation, targeting and positioning. In: Travel Marketing, Tourism Economics and the Airline Product. THEM, pp. 69–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-49849-2_4
3. Windler, K.: Identifying the right solution customers: a managerial methodology. Indus. Market. Manage. 60, 173–186 (2017)
4. Thakur, R., Workman, L.: Customer portfolio management (CPM) for improved customer relationship management (CRM): are your customers platinum, gold, silver, or bronze? J. Bus. Res. 69(10), 4095–4102 (2016)
5. Smith, W.: Product differentiation and market segmentation as alternative marketing strategies. J. Marketing 21(1), 3–8 (1956)
6. Kansal, T.: Customer segmentation using K-means clustering, no. 1, p. 4. IEEE (2018)
7. Meghana, N.M.: Demographic strategy of market segmentation. Indian J. Appl. Res. 6(5), 6 (2016)
8. Liu, H.: Personality or value: a comparative study of psychographic segmentation based on an online review enhanced recommender system. In: MDPI (2019)
9. Goyat, S.: The basis of market segmentation: a critical review of literature. Europ. J. Business Manage. 3, 45–54 (2011)
10. Susilo, W.H.: An impact of behavioral segmentation to increase consumer loyalty: empirical study in higher education of postgraduate institutions at Jakarta. In: 5th International Conference on Leadership, Technology, Innovation and Business Management (2016)
An Intelligent Approach to Identify the Eggs of the Insect Bemisia Tabaci

Siwar Mahmoudi1(B), Wiem Nhidi2, Chaker Bennour3,4, Ali Ben Belgacem4, and Ridha Ejbali2

1 Faculty of Sciences of Gabes, Gabes University, Gabes, Tunisia
[email protected]
2 Research Team in Intelligent Machines, National School of Engineers of Gabes, University of Gabes, B.P. W 6072 Gabes, Tunisia
3 Department of Biology, Faculty of Sciences, Gabes University, Gabes, Tunisia
4 Dryland and Oases Cropping Laboratory, Arid Regions Institute, Medenine, Tunisia
Abstract. Bacterial scab and viral plant diseases caused by the whitefly represent a problem that attracts the attention of biologists. Among the many species of whiteflies, B. tabaci is an insect able to attack multiple crops, weeds, and ornamental hosts. Its small size, its ability to reproduce quickly, and its skill at moving over relatively short distances put several potential hosts at risk of infestation. Plant protection against these insects is critical for increasing crop quantity and quality. Up to now, farmers have used conventional and manual means of protection against this insect. An effective protection strategy must therefore start with the early detection and identification of this type of insect, in order to know whether an egg is a female egg. In recent years, deep learning in general, and auto-encoders in particular, have given excellent results in many image classification tasks. This gives us the opportunity to improve classification accuracy in the field of agriculture and the identification of insect eggs on plants.

Keywords: Artificial Intelligence (AI) · Deep Learning (DL) · Auto-encoders (AE) · Bemisia Tabaci · Identification of Bemisia Tabaci eggs
1 Introduction

Agriculture all over the world, and especially in Tunisia, has been a major source of food supply for humanity for several centuries, with all countries depending mainly on agriculture for their food. Lately, however, crops have been affected by diseases, which are often considered one of the main factors that decrease yield [1]. For example, downy mildew is one of the most widespread diseases in the world and can lead to a significant drop in crop yields and numerous lesions on plants and fruits [2]. In Tunisia, among the economically important diseases that affect plants and crops are bacterial scabs, which are caused by species of Streptomyces scabies, and viral diseases, which are caused by viruses in the plant organism. These viruses can be transmitted by insects, mites, nematodes, fungi, infected pollen or vegetative propagating material, by contact between plants, and by infected or contaminated seeds [3].
Farmers are facing major threats due to the emergence of various pests and diseases in their crops. Fungi, bacteria, and viruses are some of the common causes of infection, and many of them are transmitted by certain types of insects such as whiteflies, especially Bemisia tabaci. Viruses are obligate parasites, meaning they need a living host to grow and multiply. Once inside an injured cell, the virus particle loses its protein coat, and the nucleic acid then directs the production of multiple copies of itself and related proteins, resulting in the development of new virus particles. In this context, we focus on the pests that affect agriculture, including insects and especially whiteflies. Protecting plants against these insects is crucial to improving the quantity and quality of crops, and an effective protection strategy must begin with the early detection and identification of this type of insect.

Generally, the main problem for most farmers is that they do not have a specific method to detect and identify these insects. Indeed, it is often only during the daily watering of the field that farmers notice the infection in the plants, and even after this observation they do not necessarily know what the problem is, so they do not use the appropriate control methods. Good identification requires recourse to experts in entomology, which most farmers avoid because of the high cost of the intervention. There are only classical and manual methods of control against insects in general and B. tabaci in particular. We can mention cultural control (prevention, daily monitoring, installation of insect-proof nets, weeding, and the use of yellow sticky strips as traps) and biological control (the use of auxiliaries such as Encarsia formosa, Eretmocerus eremicus, and Eretmocerus mundus). These are parasitoid insects that parasitize the whitefly larvae, in which they lay their eggs; they can lay up to 300 eggs, and the hatched parasitoid continues to feed on the larva until it dies. There is also chemical control (the use of pesticides), which means that the farmer must focus on the active ingredient (the material that acts to obtain the desired effect, called the "pesticide") in the chemical product and not on the trade name of the product. The farmer should also not use the same product for every treatment. The right strategy is integrated pest management, which minimizes the use of chemicals to protect the environment, but reliable computer-based methods are still sought to improve and, at the same time, simplify this task.

As part of this work, we focus on the early detection and identification of these types of insects in order to limit their spread and, indirectly, to avoid the drop in yields they cause. So, to solve the problem mentioned earlier and to help biologists, we propose to combine image processing and machine learning methods in order to detect and identify this insect and neutralize it, starting from images of insect eggs on tomato and eggplant leaves.
2 Overview of the Proposed Approach

Our objective here is to determine the type of egg of the harmful insect Bemisia tabaci from the eggshell image alone, which differs between a fertilized egg and an unfertilized egg. The image classification model is induced by a deep learning (DL) algorithm, using images of 120 eggs.
2.1 Deep Learning Algorithm

DL is a machine learning method that has already proven to be very effective in various domains, including image recognition, language processing, speech recognition, biology, and robotics [7]. DL algorithms have developed rapidly: as databases grow and GPU performance improves, DL models improve automatically and lead to good results [7]. In deep learning we find the auto-encoder, a type of unsupervised learning algorithm consisting of a deep neural network that allows us to build a new representation of the data.

2.2 Auto-encoder

An auto-encoder is a neural network trained to predict its own inputs (x = x'). It aims to minimize the reconstruction error of the input data, as shown in Fig. 1:
Fig. 1. Auto-Encoder architecture
We want to build our own network that takes as input the pixels of an unlabelled image and outputs the features. We used the auto-encoder principle to create our network and started by training this auto-encoder without using labels, so that the training is unsupervised. The architecture of an auto-encoder consists of two sets of neural layers. The first set forms what is called the encoder, which processes the input data to build new representations (the code). The second set, called the decoder, attempts to reconstruct the data from this code.

• The Encoder. The encoder serves to compress the input data into a smaller representation. To do this, it extracts the most important features from the initial data. This results in a compact form called the bottleneck, also known as the latent space.
• The Decoder. Unlike the encoder, the decoder decompresses the bottleneck to reconstruct the data. Its challenge is to use the features contained in the packed vector to reconstruct the dataset as faithfully as possible.
• The Latent Space. The latent space corresponds to the compressed data, i.e., the space between the encoder and the decoder. The purpose of creating this latent space is to limit the flow of information between the two components of the auto-encoder. This limitation translates into the suppression of noise, letting only the important information pass.

2.3 Stacked Auto-encoder for Egg Classification

It is possible that a single autoencoder will be unable to reduce the dimensionality of the input features sufficiently. We therefore employ stacked autoencoders for certain application scenarios. As the name implies, stacked autoencoders consist of several encoders layered on top of one another; the figure displays a stacked autoencoder with three encoders placed on top of one another.
Fig. 2. Overview of proposed approach
2.3.1 Feature Extraction Using the Fast Beta Wavelet Transform (FBWT)

We have raw images of Bemisia tabaci eggs and exploit the FBWT for descriptor extraction. In our beta wavelet network architecture (BWNA), the hidden layer is composed only of the wavelet functions (horizontal (ψHi), diagonal (ψDi), and vertical (ψVi) wavelets) and scaling functions (ψi) that contribute most to the reconstruction of the image, knowing that at a given iteration one selects a scaling function (ψi),
a horizontal wavelet function (ψHi), a diagonal wavelet function (ψDi), and a vertical wavelet function (ψVi), so as to obtain the same number of each (see Fig. 2); [4,5] provide additional information. This technique greatly reduces and compresses the size of the network. As a result, after applying the Fast Beta Wavelet Transform (FBWT) defined in [4,5] to the original image, we compute the shape descriptor using the coefficients of the wavelets with the best contributions. As shown in Fig. 3, the texture descriptor is extracted using the coefficients of the best scaling functions, and the color descriptor is computed using the approximated image [6].
Fig. 3. Feature extraction using FBWT
2.3.2 Extracting the Descriptors

This is the result of applying the FBWT, as shown in Fig. 3. We obtain 3 descriptors for each image (shape, color, and texture): the shape is represented by 6 features, the color by 6 features, and the texture by 64 features.

2.3.3 Fusion of the Descriptors

We merged the three descriptors into a single vector containing 76 features and added zero padding to obtain a square matrix of size 9 × 9.

2.3.4 Stacked Auto-encoder

We have used two auto-encoders (Table 1):

Table 1. The parameters of the two auto-encoders

Parameters  Layer size  L2 Weight Regularization  Sparsity Regularization  Sparsity Proportion
Autoenc1    76          0.004                     4                        0.15
Autoenc2    50          0.002                     4                        0.1
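A small sketch of the descriptor fusion of Sect. 2.3.3 is given below: the 6 shape, 6 color, and 64 texture features are concatenated into 76 values and zero-padded to a 9 × 9 matrix; the feature values themselves are placeholders.

```python
# Sketch of the descriptor fusion (Sect. 2.3.3); feature values are placeholders.
import numpy as np

shape_desc = np.random.rand(6)       # placeholder FBWT shape descriptor
color_desc = np.random.rand(6)       # placeholder color descriptor
texture_desc = np.random.rand(64)    # placeholder texture descriptor

fused = np.concatenate([shape_desc, color_desc, texture_desc])   # 76 features
padded = np.zeros(81)                                            # 9 * 9 = 81
padded[:fused.size] = fused
square = padded.reshape(9, 9)
print(square.shape)
```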
The steps for creating our network are as follows:
• Step 1: Create a wavelet network with a single hidden layer whose transfer function is based on a wavelet family.
• Step 2: Create another wavelet network by removing the last layer, to generate the features obtained in the first hidden layer.
• Step 3: Train the second autoencoder using the features generated by the first autoencoder (Step 2).
• Step 4: Remove the final layer to generate the features obtained in the second hidden layer.
• Step 5: Stack the encoders of the autoencoders to form a deep wavelet network.
• Step 6: Repeat Steps 3, 4, and 5 depending on the number of hidden layers desired.

After the learning phase, we organize the obtained vectors into a matrix, with each column representing an image, in order to apply the classification phase. After retraining the model, we can test it and make predictions by giving it an original image (an egg of the insect B. tabaci) as input to obtain the classification result (whether it is a fertilized or an unfertilized egg). The stacked autoencoder phase in Fig. 2 illustrates the steps involved in creating our network.
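The greedy, layer-wise training of the two stacked auto-encoders (hidden sizes 76 and 50, Table 1) could be sketched as follows. The original work appears to use MATLAB-style sparse auto-encoders; here the sparsity regularization is only approximated with an L1 activity penalty, the optimizer and epoch count are assumptions, and the input is the flattened 9 × 9 fused-descriptor matrix.

```python
# Hedged Keras sketch of greedy layer-wise training of the two stacked auto-encoders.
import numpy as np
from tensorflow.keras import layers, models, regularizers

def train_autoencoder(x, hidden_units, l2_weight):
    inputs = layers.Input(shape=(x.shape[1],))
    encoded = layers.Dense(hidden_units, activation="sigmoid",
                           kernel_regularizer=regularizers.l2(l2_weight),
                           activity_regularizer=regularizers.l1(1e-4))(inputs)   # rough sparsity penalty
    decoded = layers.Dense(x.shape[1], activation="sigmoid")(encoded)
    ae = models.Model(inputs, decoded)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x, x, epochs=50, batch_size=8, verbose=0)        # unsupervised: target = input
    return models.Model(inputs, encoded)                    # keep the encoder only

x = np.random.rand(120, 81)                                 # 120 fused descriptors (placeholders)
enc1 = train_autoencoder(x, hidden_units=76, l2_weight=0.004)
enc2 = train_autoencoder(enc1.predict(x), hidden_units=50, l2_weight=0.002)

# Stack the two encoders and add a softmax layer for fertilized / unfertilized eggs
stacked = models.Sequential([enc1, enc2, layers.Dense(2, activation="softmax")])
stacked.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```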
3 Results and Discussion Our goal is to use the stacked auto-encoder architecture on egg datasets and improve image classification by identifying Bemisia Tabaci egg types (fertilized/unfertilized). 3.1 Databases Used To evaluate our algorithm, we used a set of images of eggs of the insect B. tabaci on tomato and eggplant leaves, provided by M. Chaker Bennour, to train our classification model. The images were acquired from the same geographic area over a long period of time. This dataset consists of 120 egg images divided into two classes, 60 images of fertilized eggs and 60 images of unfertilized eggs, from 2 different types of plants, tomatoes and eggplant. Each input image size is 50 * 50 * 3. Figure 4 shows a sample of images from the dataset:
Fig. 4. Example of a Bemisia Tabaci egg image
3.2 Results 3.2.1 Accuracy This metric sums all true positives and true negatives and divides them by the total number of instances. It answers the following question: of all the positive and negative instances, how many were classified correctly? High values of this metric are desirable. It is calculated with the following formula:

Acc = (true positives + true negatives) / total
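As a quick illustration, accuracy can be computed directly from confusion matrix counts; the values below are arbitrary and are not the results of this study.

```python
def accuracy(tp, tn, fp, fn):
    """Acc = (true positives + true negatives) / total number of instances."""
    return (tp + tn) / (tp + tn + fp + fn)

# Example with arbitrary counts (for illustration only):
print(accuracy(tp=17, tn=16, fp=4, fn=3))  # 0.825
```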
3.3 Test Phase Finally, we moved on to the test phase: we built a test database (1/3 of the database) and evaluated the model on it. We then repeated the training phase to improve the feature weights and tested again. The classification results of the trained model with the egg images are shown in Fig. 5.
Fig. 5. Confusion Matrix for Egg classification
According to this figure, we can notice that the model has a good prediction percentage:
• 83.3% correct classification (41.7% correct classification of c1 + 41.7% correct classification of c2)
• 16.7% incorrect classification (8.3% incorrect classification of c1 + 8.3% incorrect classification of c2)
4 Conclusion Plants are victims of several diseases caused by insects. From bacterial diseases to viral diseases, each of these pathologies has a negative impact on crop yield. In this paper we have implemented an automated detection system based on auto-encoders to identify the types of insect eggs in order to reduce the severity of these diseases on tomato and eggplant crops. Extensive experiments show that our proposed algorithm gives good results and high performance, which demonstrates the robustness of our new method. We aim to improve our system by exploiting other classifiers and by using algorithms other than autoencoders for training a deep neural network, and we expect better results if we apply our method to a larger data set. Acknowledgment. The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.
References
1. Hanssen, M., Lapidot, M.: Major tomato viruses in the Mediterranean basin. Adv. Virus Res. 84 (2012)
2. Blancard, D.: Tomato Diseases. Academic Press, The Netherlands (2012)
3. Viral diseases. Available at: https://ausveg.com.au/biosecurity-agrichemical/crop protection/overview-pests-diseases-disorders/viral diseases/. Accessed 27 May, around 15:30
4. Jemai, O., Zaied, M., Ben Amar, C.: Fast learning algorithm of wavelet network based on fast wavelet transform. Int. J. Pattern Recogn. Artif. Intell. 25(8), 1297–1319 (2011)
5. Zaied, M., Said, S., Jemai, O., Ben Amar, C.: A novel approach for face recognition based on fast learning algorithm and wavelet network theory. Int. J. Wavelets Multiresolution Inf. Process. 9(6), 923–945 (2011)
6. ElAdel, A., Ejbali, R., Zaied, M., Ben Amar, C.: A new system for image retrieval using beta wavelet network for descriptors extraction and fuzzy decision support. SoCPaR 232–236 (2014)
7. Nhidi, W., Ejbali, R., Dahman, H.: An intelligent approach to identify parasitic eggs from a slender-billed's nest. In: Proceedings SPIE 11433, Twelfth International Conference on Machine Vision (ICMV 2019) (2019)
8. ElAdel, A., Ejbali, R., Zaied, M., Ben Amar, C.: Deep learning with shallow architecture for image classification. In: International Conference on High Performance Computing & Simulation (HPCS) (2015)
9. Hassairi, S., Ejbali, R., Zaied, M.: A deep convolutional neural wavelet network to supervised Arabic letter image classification. In: 15th International Conference on Intelligent Systems Design and Applications (2015)
10. Wiem, N., Ali, C.M., Ridha, E.: Wavelet feature with CNN for identifying parasitic egg from a slender-billed's nest. In: International Conference on Hybrid Intelligent Systems (HIS) (2020)
11. ElAdel, A., Ejbali, R., Zaied, M., Ben Amar, C.: A hybrid approach for content-based image retrieval based on fast beta wavelet network and fuzzy decision support system. Mach. Vis. Appl. 27(6), 781–799
Overview of Blockchain-Based Seafood Supply Chain Management Nesrine Ouled Abdallah1,2(B) , Fairouz Fakhfakh2,3 , and Faten Fakhfakh2 1
German University of Technology in Oman, Halban, Oman [email protected], [email protected] 2 ReDCAD Laboratory, ENIS, University of Sfax, Sfax, Tunisia [email protected] 3 ISMAIK, University of Kairaoun, Kairaoun, Tunisia
Abstract. Seafood products are among the most commonly traded food products. Unfortunately, this industry is suffering from many malpractices and unsustainable resource management. It is also suffering from operational inefficiencies and mistrust among supply chain stakeholders. At the same time, the demand for transparency is increasing not only to boost operating efficiency in the supply chain but also to ensure that accurate product information is accessible to the end consumer. In order to address this issue, Blockchain is being regarded as a promising tool to efficiently manage a supply chain. This paper focuses on Blockchain-based seafood supply chain management. It provides an exhaustive survey of such works and presents a comparative analysis of the studied approaches. Finally, it outlines some critical challenges for further research. Keywords: Blockchain · Supply Chain · Seafood · Survey · Challenges
1
Introduction
The seafood industry is one of the oldest and largest segments of the economy worldwide. It is a significant source of national food supplies, employment, and improvement of many countries [1]. However, this industry suffers from malpractices and poor management of fisheries: illegal, unreported and unregulated fishing, over-fishing, and by-catch. This drains fish populations and impacts the marine ecosystem balance. To tackle these malpractices, sustainability is of crucial importance. The demand for sustainable seafood is increasing around the world. According to the World Commission on Environment and Development, "Sustainability is meeting the needs of the present generation without compromising the ability of the future generations in meeting their own needs" [2]. Sustainability then becomes one of the main challenges in a seafood supply chain. A seafood supply chain can involve many middlemen (intermediaries) between the fisherman and the consumer. It is usually a complex one that links fishermen, transporters, traders, processors, exporters, retailers and consumers. To efficiently manage such a complex chain, administrative challenges are imposed.
Keeping track of the different processes and the ownership at different stages is also hard. Any lack of planning, coordination and monitoring between all members can make the seafood supply chain more vulnerable to the risk of corruption, poor transparency, fragmentation and lack of real-time information. In addition, seafood products are highly perishable. Once caught, they have to go through different stages of the supply chain before reaching the end consumer. Thus, if poorly controlled and inadequately handled, their safety can be impaired and a high risk of contamination can affect the supply chain. Consequently, the health and safety of the consumer will also be compromised [3]. According to AlRizeiqi et al. [4], the lack of integrated traceability in the seafood supply chain can contribute to product recalls, waste and exclusion from lucrative export markets. Therefore, traceability, tracking and transparency are required and essential in the seafood supply chain to prevent seafood contamination, protect public health, boost seafood companies, improve trade and maintain consumer trust. Some traceability systems, such as Trace Register [5] and the NGO Ecotrust's ThisFish [6], were suggested for the conventional seafood supply chain. However, they were not efficient enough. In fact, they are centralized. This means that the owner of the data is just one partner of the supply chain and the integrity of the data depends on an implicit trust of this partner [7]. Having a centralized database does not allow transparency in the supply chain and makes data vulnerable to tampering. Different other technologies such as radio-frequency identification (RFID), barcodes, the Global Positioning System (GPS), etc. were also introduced in the conventional supply chain to gather data in the different processes and stages of the supply chain [8]. Nevertheless, the data collected is not resilient to intentional or accidental modifications. [9–11] are among several works that introduced the paradigm of blockchain to overcome many of these challenges. Blockchain has lately come to light as a promising technology for sharing and safeguarding vulnerable data in distributed and decentralized networks [12]. This technology is a form of distributed ledger technology. In fact, a blockchain is a growing list of records, called blocks, that are linked using cryptography [13]. Blockchain technologies have many key characteristics such as decentralization, persistency, anonymity, and auditability (which improves the traceability and the transparency of the data stored). Consequently, they were adopted in many application fields including public and social services (Education, Land Registration, ...), financial services and risk management (Enterprise transformation, P2P financial market, ...) and the Internet of Things. According to [13, 14], Blockchain technologies also have a great potential to transform each step of a supply chain. Therefore, they are being used to improve supply chain management in many fields including the pharmaceutical industry [15], shipping logistics [16], food production (e.g. coffee [17]) and especially the seafood industry [18]. In the literature, there are a number of useful surveys which have been conducted to emphasize the adoption of Blockchain technology in food supply chains management [19–23]. However, these works study the management of food
supply chains in general. To the best of our knowledge, our paper is the first review that focuses on the incorporation of Blockchain in the specific context of seafood supply chain management. The rest of this paper is organized as follows. In Sect. 2, we provide an overview of the Blockchain technology and how it could be used to manage supply chains. Section 3 reviews the existing approaches of using Blockchain in seafood supply chains management. In Sect. 4, we establish a comparison of these approaches based on some criteria and we identify a set of challenges and possible future research directions. Finally, the last section summarizes the paper.
2
Blockchain Technology: Overview and Adoption in Supply Chains Management
The Blockchain technology was introduced by Nakamoto in 2008 [24]. A Blockchain can be compared to a public ledger in which a chain of blocks contains the records of all committed transactions made by different actors. This chain continues to expand when new blocks are appended to it. The blocks in the chain are linked to each other using cryptography. The Blockchain technology has some key characteristics among which we cite the following:
• Decentralisation: Traditional centralized transaction systems require that every transaction be verified and approved by a single trusted third party. This unavoidably causes cost and performance problems at the central servers. In contrast, a Blockchain network transaction can be made between any two peers (P2P) without the need for centralized authentication. As a result, performance bottlenecks at the central server can be reduced. Server costs, such as those associated with development and operation, can also be greatly decreased.
• Persistency: It is almost impossible to tamper with the blockchain network's transactions. In fact, each transaction must be verified and recorded in blocks that are distributed across the entire network. Each broadcasted block would also undergo transaction verification and validation by other nodes. Therefore, any falsification might be immediately detected.
• Anonymity: The Blockchain network allows each user to communicate with the others using a created address. In order to protect his identity, a user could create many addresses. Hence, the private information of users is no longer maintained by a single entity. The transactions stored in the Blockchain are kept somewhat private thanks to this approach.
• Auditability: Users can easily check and track the history of records by gaining access to any distributed network node because every transaction on the Blockchain is validated and recorded with a timestamp.
Thanks to these advantages, Blockchain is becoming a perfect candidate to support supply chain management offering its different actors a suitable and secure environment for collaborative decision making and allowing a great cost
saving and efficiency improvement. Blockchain can hence be used in different types of supply chain such as healthcare supply chain [25], manufacturing supply chain [26], food supply chain [27] and in particular seafood supply chain.
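The persistency and auditability properties described above follow from the way blocks are linked through cryptographic hashes. The following minimal Python sketch (purely illustrative, not tied to any particular Blockchain platform or to any of the reviewed systems) records supply chain transactions in hash-linked blocks and shows that tampering with an earlier record invalidates the chain.

```python
import hashlib, json, time

def block_hash(body):
    """Hash the block contents, including the previous block's hash."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "timestamp": time.time(),
             "transactions": transactions, "prev_hash": prev}
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain):
    """Audit the ledger: recompute every hash and check the links."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["hash"] != block_hash(body):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, [{"catch_id": "F-001", "actor": "fisherman", "kg": 120}])
add_block(chain, [{"catch_id": "F-001", "actor": "processor", "kg": 95}])
print(is_valid(chain))                      # True
chain[0]["transactions"][0]["kg"] = 500     # tamper with a recorded transaction
print(is_valid(chain))                      # False: the falsification is detected
```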
3
An Overview of Blockchain Based Seafood Supply Chain Management Systems
A seafood supply chain is composed of different actors. Nasurudeen et al. divided it into two groups, the provider part and the customer part [28]. The provider part takes the raw material (seafood) from the fishermen and transfers it to the warehouse to be ready for the distribution to the retailer/customer. The main problems in this supply chain can be the delay during the distribution process, the lack of transparency, the lack of real-time information sharing, and the difficulty to trace the circulation of flow. To solve these problems, many researchers investigated the possibility of incorporating Blockchain technology in seafood supply chain management. In [29], the authors investigate the feasibility of deploying Blockchain in manufacturing and perishable goods/seafood supply chains for the aim of provenance tracking, handling storage, transportation, product history, and tamper proof checks. In fact, to improve the traceability in the supply chain, Fishcoin was proposed [18]. It is a Blockchain-based data ecosystem dedicated to the seafood supply chain. It uses tokens to facilitate data collection and distribution across fish supply chains. To handle the problem of the delay of delivered products between the different parties, Nasurudeen et al. designed smart tags for seafood supply chain using Blockchain [28]. These tags are included from the provider to reach the final customer. The benefits of these special tags is to get the details about the product (such as: expiry date, etc.) that is trustworthy of product manufacturing. In [30], Tsolakis et al. focused on the fishery ecosystem in Thailand. They proposed an architecture of Blockchain-centric food supply chains that supports sustainable development goals. As the data and organizational integrity are major issues in supply chains, it is important to have international collaboration and integration. Therefore, a framework is implemented to tackle these issues. In the fish industry, Blockchain will build the real-time supply network technologies needed (e.g. visibility and data-enabled reporting of product quality) to ensure network performance and effectiveness. In another proposal [31], Mathisen investigated the strategic conformity of Blockchain application to Norwegian aquaculture. The author explored also the potential benefits of this application by considering the expected efficiency in terms of sustainability, dependability, quality, and cost. In [32], the authors introduced a traceability solution using Ethereum smart contracts [33] to trace and track fish products in the fishery supply chain. This solution avoids several types of malpractices and fish fraud by guaranteeing transparent interactions between all parties. It consists in developing a set of
smart contracts to track events in the fishery supply chain process and test its functionalities. In the same context, Low et al. [34] introduced a seafood traceability tool which provides a Web interface for suppliers for inputting seafood data such as product types, storage temperature, location, etc. In order to validate the proposed tool, the authors presented a tuna fish case study to ensure the credibility and reliability of the Blockchain based supply chain. This tool can take immediate action if any contamination occurs. Consequently, further losses can be avoided and the risk of offering contaminated seafood to the costumer is eliminated. Moreover, Hang et al. proposed a Blockchain-based fish farm platform to guarantee agriculture data integrity [35]. This platform aims to provide fish farmers with secure storage to guarantee large amounts of data that cannot be altered. In addition, a smart contract is used to automate the data processing in the fish farm. Furthermore, Hang et al. implemented a proof of concept that integrates a legacy fish farm system with the Hyperledger Fabric Blockchain on top of the proposed architecture. Wang et al. suggested also to adopt Blockchain within fish supply chains [36]. In fact, they developed a multi-layer Blockchain based system which is integrated into seafood supply chains. The proposed system provides trusted fish origin, real-time supply chain condition tracking, and automated quality assessment. Fishers can receive fish quality feedback to encourage the best fishing practice. Sydney Fish Market and industry regulatory bodies can conserve, manage, share, and examine the captured data in the Blockchain based system. In [37], the authors presented an enterprise-level IoT platform for seafood farmers that satisfies the end-to-end traceability needs of consumers. This Blockchain-based platform aims to serve seafood for the global population and it can be integrated with customer-oriented applications and value chain operators. Consumers can obtain information about suppliers and products while enterprises can use it to decrease costs and have increased market opportunities. In another work, Larissa et al. [38] proposed a new model of the supply chain with Blockchain technology for the Indonesian fishing industry. Data recording is performed creating a QR code on each player, and tracking history will be checked by scanning the QR code. So, clients can scan the QR code and consult all fish product history. This can increase the traceability and trust in the client while buying products.
4
Discussion and Research Challenges
In order to compare the different existing approaches of adopting the Blockchain technology within the seafood supply chain management systems, we present the following table (see Table 1).
Table 1. Comparison of the reviewed approaches
Year | Reference | Case study | Used technologies | Validation strategy
2018 | Mathisen et al. [31] | Norwegian fish industry | QR code | Not mentioned
2019 | Mondragon et al. [29] | Composite materials for aerospace applications and perishable goods/live seafood | Not mentioned | Simulation
2020 | Demestichas et al. [39] | Asian seafood | IOT | Not mentioned
2020 | Ahamed et al. [28] | Not mentioned | QR code | Not mentioned
2020 | Hang et al. [35] | A legacy fish farm system | IOT | Simulation
2020 | Jæger et al. [37] | An enterprise-level IOT platform for seafood farmers | IOT | Simulation
2021 | Tsolakis et al. [30] | Thai fish industry (Canned tuna manufacturing, local fishing operations, commercial fishing operations and trade) | RFID | Simulation
2021 | Low et al. [34] | Tuna fish and shrimp case studies in Malaysia | Not mentioned | Simulation
2021 | Larissa et al. [38] | Fishing industry in Indonesia | QR code | Not mentioned
2021 | Wang et al. [36] | Sydney fish market | IOT | Simulation
2022 | Patro et al. [32] | Wild-caught fish and farmed fish | Ethereum smart contracts | Simulation
In this comparison study, we are considering the following metrics: • Case study: It represents the case study used to evaluate the proposed approach. • Used technologies: It indicates the technology introduced in the seafood supply chains to track and gather data in the different processes and stages of the supply chain (such as Radio Frequency Identification (RFID), Quick Response (QR) code, etc.). • Validation strategy: It indicates how a proposed protocol is validated in terms of correctness and performance criteria. This consists in ensuring whether a protocol is operating as expected. As the application of Blockchain technology in seafood supply chain management presents many benefits, it has become the subject of several research works. However, there are still many open issues that are not yet well addressed in the literature. In what follows, we present some of them to pave the way for future studies as new challenges: • Information flow has to be enhanced and a unique Blockchain model must be designed in order to integrate all the different stakeholders together regardless of standards/social diversity.
• Most of the presented works are at the phase of a proof of concept and only provide an initial assessment of the proposed approaches. Additionally, there is no testing of the systems in real-world conditions and on a large scale. This certainly prevents the correct assessment of possible issues or limitations. • The area of traceability must control the design of data collection and sharing. The main purpose of data recording is to prevent illegal, unregulated and unreported fishing operations. Besides, some crucial operations are overlooked in terms of data monitoring, including: – Monitoring the volume and the condition of the ice used for fishing activities is a major problem. – Unloading fish on the port is another problem that has to be considered. In fact, cleaning processes of the adopted surfaces, if any exist, are not documented. – Sorting the fish carried out on the ship or on the port is subject to several errors such as species and quality. Thus, data interoperability is a challenge able task as several supply chain operations are often being neglected. Nevertheless, the effect of such operations can impact the sustainability of fish supply network processes. • The specification and the proof of the correctness of the mechanisms adopted for seafood supply chain management is one of the major challenges. In fact, formal verification [40] is extremely important for preventing design errors and ensuring the properties defined by the designer. For example, timing constraints have to be considered to ensure the supply chain on time-delivery performance.
5
Conclusion
Blockchain is an emergent technology that has some key characteristics such as decentralisation, persistency and auditability. These key characteristics are making it a suitable solution to be integrated in the management systems of supply chains, especially seafood supply chains. In this paper, we have addressed this specific context. In fact, we presented an overview of different existing works and approaches. We then analyzed and compared these works while considering some criteria such as the use case, the technology used, etc. Finally, we have emphasized a set of open issues in this field that represent research challenges and that once resolved could significantly improve the existing solutions. As a future work, we aim to design some common benchmarks that will allow a better evaluation and comparison of the existing approaches and algorithms for the management of seafood supply chains based on Blockchain technology. We also aim to suggest solutions to solve some of the mentioned open issues such as formal verification methods to avoid design errors. Acknowledgements. This research work was funded by the TRC project “Seafood Supply Chain Management Based on an IoT Integrated Blockchain”.
References 1. Hardt, M.J., Flett, K., Howell, C.J.: Current barriers to large-scale interoperability of traceability technology in the seafood sector. J. Food Sci. 82(S1), A3–A12 (2017) 2. Thomsen, C.: Sustainability (world commission on environment and development definition). Encyclopedia of Corporate Social Responsibility, pp. 2358–2363. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-28036-8 531 3. Aich, S., Chakraborty, S., Sain, M., Lee, H.I., Kim, H.C.: A review on benefits of IoT integrated blockchain based supply chain management implementations across different sectors with case study. In: 21st International Conference on Advanced Communication Technology (ICACT), pp. 138–141. IEEE (2019) 4. AlRizeiqi, M.H., Hargaden, V., Walsh, P.P.: Quantifying the benefits of supply chain traceability in Oman’s seafood industry (2016) 5. Eisl, J.: Trace register allocation. In: Proceedings of the ACM International Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), pp. 21–23 (2015) 6. Szostek, O.: Global resource systems (2014) 7. Cook, B., Zealand, W.: Blockchain: transforming the seafood supply chain. World Wide Fund for Nature (2018) 8. Vella, D.: Using technology to improve supply chain management. SourceToday, Nov (2012) 9. Galvez, J.F., Mejuto, J.C., Simal-Gandara, J.: Future challenges on the use of blockchain for food traceability analysis. TrAC, Trends Anal. Chem. 107, 222–232 (2018) 10. Rejeb, A.: Blockchain potential in tilapia supply chain in ghana. Acta Technica Jaurinensis 11(2), 104–118 (2018) 11. Cruz, E.F., da Cruz, A.M.R.: Using blockchain to implement traceability on fishery value chain. In: the International Conference on Software Technologies (ICSOFT), pp. 501–508 (2020) 12. Back, A., et al.: Enabling blockchain innovations with pegged sidechains. 72, 201– 224 (2014) 13. Babich, V., Hilary, G.: Om forum-distributed ledgers and operations: what operations management researchers should know about blockchain technology. Manufact. Serv. Oper. Manage. 22(2), 223–240 (2020) 14. Goyat, R., Kumar, G., Rai, M.K., Saha, R.: Implications of blockchain technology in supply chain management. J. Syst. Manage. Sci. 9(3), 92–103 (2019) 15. Al-Saqaf, W., Seidler, N.: Blockchain technology for social impact: opportunities and challenges ahead. J. Cyber Policy 2(3), 338–354 (2017) 16. Laaper, S., Fitzgerald, J., Quasney, E., Yeh, W., Basir, M.: Using blockchain to drive supply chain innovation. In: Digitalization in Supply Chain Management and Logistics Proceedings of the Hamburg International Conference of Logistics 2013, Vol. 1 (2017) 17. Thiruchelvam, V., Mughisha, A.S., Shahpasand, M., Bamiah, M.: Blockchain-based technology in the coffee supply chain trade: case of burundi coffee. J. Telecommun. Electron. Comput. Eng. 10(3–2), 121–125 (2018) 18. Howson, P.: Building trust and equity in marine conservation and fisheries supply chain management with blockchain. Mar. Policy 115, 103873 (2020)
19. Tsoukas, V., Gkogkidis, A., Kampa, A., Spathoulas, G., Kakarountas, A.: Blockchain technology in food supply chain: a state of the art. In: 6th SouthEast Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), pp. 1–8. IEEE (2021) 20. Duan, J., Zhang, C., Gong, Y., Brown, S., Li, Z.: A content-analysis based literature review in blockchain adoption within food supply chain. Int. J. Environ. Res. Public Health 17(5), 1784 (2020) 21. Dom´ınguez, J.P., Roseiro, P.: Blockchain: a brief review of agri-food supply chain solutions and opportunities. ADCAIJ: Adv. Distrib. Comput. Artif. Intell. J. 9(4), 95 (2020) 22. Juma, H., Shaalan, K., Kamel, I.: A survey on using blockchain in trade supply chain solutions. IEEE Access 7, 184115–184132 (2019) 23. Etemadi, N., Borbon, Y., Strozzi, F.: Blockchain technology for cybersecurity applications in the food supply chain: a systematic literature review. In: Proceedings of the XXIV Summer School “Francesco Turco”-Industrial Systems Engineering, pp. 9–11. Bergamo, Italy (2020) 24. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Decentralized Business Review (2008) 21260 25. Jadhav, J.S., Deshmukh, J.: A review study of the blockchain-based healthcare supply chain. Soc. Sci. Hum. Open 6(1), 100328 (2022) 26. Raja Santhi, A., Muthuswamy, P.: Influence of blockchain technology in manufacturing supply chain and logistics. Logistics 6(1) (2022) 27. Pandey, V., Pant, M., Snasel, V.: Blockchain technology in food supply chains: review and bibliometric analysis. Technol. Soc. 69, 101954 (2022) 28. Ahamed, N.N., Karthikeyan, P., Anandaraj, S., Vignesh, R.: Sea food supply chain management using blockchain. In: 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 473–476. IEEE (2020) 29. Mondragon, A.E.C., Coronado, C.E., Coronado, E.S.: Investigating the applicability of distributed ledger/blockchain technology in manufacturing and perishable goods supply chains. In: IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA), pp. 728–732. IEEE (2019) 30. Tsolakis, N., Niedenzu, D., Simonetto, M., Dora, M., Kumar, M.: Supply network design to address united nations sustainable development goals: a case study of blockchain implementation in Thai fish industry. J. Bus. Res. 131, 495–519 (2021) 31. Mathisen, M.: The application of blockchain technology in Norwegian fish supply chains-a case study. Master’s thesis, NTNU (2018) 32. Patro, P.K., Jayaraman, R., Salah, K., Yaqoob, I.: Blockchain-based traceability for the fishery supply chain. IEEE Access (2022) 33. Wang, Z., Jin, H., Dai, W., Choo, K.K.R., Zou, D.: Ethereum smart contract security research: survey and future research opportunities. Front. Comput. Sci. 15(2), 1–18 (2021) 34. Low, X.Y., Yunus, N.A., Muhamad, I.I.: Development of traceability system for seafood supply chains in Malaysia. Chem. Eng. Trans. 89, 427–432 (2021) 35. Hang, L., Ullah, I., Kim, D.H.: A secure fish farm platform based on blockchain for agriculture data integrity. Comput. Electron. Agric. 170, 105251 (2020) 36. Wang, X., et al.: Blockchain-enabled fish provenance and quality tracking system. IEEE Internet Things J. 9(11), 8130–8142 (2021) 37. Jæger, B., Mishra, A.: IoT platform for seafood farmers and consumers. Sensors 20(15), 4230 (2020)
38. Larissa, S., Parung, J.: Designing supply chain models with blockchain technology in the fishing industry in Indonesia. In: IOP Conference Series: Materials Science and Engineering, vol. 1072, 012020. IOP Publishing (2021) 39. Demestichas, K., Peppes, N., Alexakis, T., Adamopoulou, E.: Blockchain in agriculture traceability systems: a review. Appl. Sci. 10(12) (2020) 40. Holzmann, G.J., Peled, D.: An improvement in formal verification. In: Formal Description Techniques VII. IAICT, pp. 197–211. Springer, Boston, MA (1995). https://doi.org/10.1007/978-0-387-34878-0 13
Synthesis of a DQN-Based Controller for Improving Performance of Rotor System with Tribotronic Magnetorheological Bearing Alexander Fetisov(B) , Yuri Kazakov, Leonid Savin, and Denis Shutin Orel State University n.a. I.S. Turgenev, Komsomolskaya Street 95, 302026 Orel, Russia [email protected]
Abstract. Journal bearings with magnetorheological fluids as tribotronic devices allow to adapt the parameters of the rotor-support system to changing operating modes. Multiple nonlinearities make them relatively complex objects for control. It is also necessary to take into account a number of restrictions at once, e.g., on the control action, friction, vibration amplitude. Such cases usually require applying advanced control techniques, such as based on machine learning methods. A controller based on deep Q-learning networks (DQN) was designed and tested for solving the task of reducing vibration and friction in the considered magnetorheological journal bearing. A nonlinear dynamic model of the rotor on such bearings was developed and validated by experimental study to implement the DQN controller. The bearing model is based on the magnetohydrodynamics equations and allows determining its dynamic parameters using the Multi-Objective Genetic Algorithm. The linearized dynamic bearing model takes into account the variability of the rotational speed and the control electromagnetic field. The learning process of the DQN agent was carried out with the establishment of a threshold on the vibration amplitude, which corresponds to the passing the resonant frequency of the rotor-bearing system. Testing the trained controller on the simulation model showed a decrease in the maximum amplitude by 15% in absolute values of vibration displacements, and by 32% in the time-averaged values comparing to a passive system. Also, the control current was not applied to the bearing outside the resonance zone, without increasing the coefficient of friction. Keywords: active fluid film bearing · intellectual controllers · DQN · magnetorheological fluid · vibration reduction
1 Introduction In recent years, the topic of controlled bearings of rotary machines has been actively developed. There are several trends in the area of active fluid film bearings. A large number of works are devoted to electromagnetic bearings [1–9]. Some recent studies focus on bearings with controllable rheological characteristics of lubricant and their influence on rotor dynamics [1–3]. Two main techniques of applying magnetorheological fluids in active bearings are considered, namely using such fluids as a lubricant [3], and
magnetorheological texturing [4]. In thrust bearings a significant increase in the stiffness and damping can be achieved [5]. Design of journal magnetorheological bearings varies a lot, so different results are obtained by researchers. The current source can be a simple wire [6], a radially installed coil [7], a coil forming a pole source of a magnetic field [8]. Study [9] describe a journal bearing with radially directed magnetic field theoretically and show a significant increase in the load capacity while increasing magnetic field strength. Magnetorheological fluid film bearings have strongly nonlinear properties that makes them complex objects for control. So, only simple control techniques, such as relay control, are usually implemented by researchers. A number of constraints, e.g., on the control action, friction, vibration amplitude, should be met to provide the stable and reliable operation of rotor-bearing systems. Such complexity makes it reasonable to consider applying advanced control techniques, such as based on machine learning methods. However, machine learning and artificial intelligence methods in modern tribology and rotating machinery are mostly used for failures detection in equipment. A large overview of modern methods for diagnosing defects in rolling bearings is presented in [10]. In [11] the authors use convolutional neural networks to classify the vibration signal. In [12, 13] machine learning methods are applied for detection of bearing defects. Machine learning based intelligent control systems based on machine learning methods are almost never applied to active bearings, while this area is currently being developed towards the synthesis of the best bearing concept. The control systems that are considered in the works in this area are often simple feedback controllers (PI, PD, PID). The present study contributes the application of advanced control techniques, such as based on reinforcement learning, to designing high performance control system for nonlinear dynamical industrial applications, such as tribotronic bearings with magnetorheological application. The ability of such controllers to follow the control purposes and meet the applied constraints is demonstrated by improving the dynamic behavior of a rotor system with such bearings.
2 System Description and Modeling The active journal fluid film bearing with magnetorheological lubricant under consideration in the present study contains axially directed coils. The design of the bearing is shown in Fig. 1. The journal bearing bushing 2 is mounted in housing 1 coaxially with electromagnetic actuator 5. Shaft 6 with journal 7 fixed with coupling nut 8 is placed inside the assembly. The unit is closed with two covers 3 and 4 with labyrinth seals. The purpose of the control to be implemented for such bearings is mainly minimizing rotor vibration amplitudes at critical speeds. A complex simulation model of a rotor system with such a bearing was developed for designing and testing the bearing controller. The model includes sub-models of the adjustable bearing and a rigid rotor. The simulation model also corresponds to the structure and parameters of the experimental test rig used for its validation.
Fig. 1. Structure of an active magnetorheological bearing.
2.1 Rotor Model The rotor model is described by the equations of motion of a rigid body [14]. The schematic of the rotor rigidly fixed by the clutch is presented in Fig. 2.
Fig. 2. Schematic of a rigid rotor.
The rotor moves along the x, y axes and also rotates around them by ϕ and θ. It also rotates around the z axis at an angular speed ω. Taking into account the corresponding assumptions and constraints, also including consideration of gyroscopic forces, the system of equations of the rotor motion is as follows [14, 15]:

m\ddot{x} = R_x^{b1} + R_x^{b2} + F_{ubx},
m\ddot{y} = R_y^{b1} + R_y^{b2} + F_{uby} - mg,
J_d \ddot{\varphi} + J_p \dot{\theta} = -b R_y^{b1} + a R_y^{b2},     (1)
J_d \ddot{\theta} - J_p \dot{\varphi} = b R_x^{b1} - a R_x^{b2},

where F_x is the projection of all forces on the x axis, F_y is the projection of all forces on the y axis, M_y is the sum of moments about the y axis, M_x is the sum of moments about the x axis, m is the mass of the rotor, J_d and J_p are the diametrical and polar moments of inertia, R_x^{b1}, R_y^{b1}, R_x^{b2}, R_y^{b2} are the lubricant film reactions, and F_{ubx} and F_{uby} are the periodic unbalance forces.
2.2 Bearing Model The model of the flow of a magnetorheological fluid in the gap of a sliding bearing is based on the equations of magnetohydrodynamics [16]. The system of equations includes the momentum conservation equation for the flow of an incompressible viscous fluid, which has an additional term in the form of a body force from the action of an applied magnetic field. The flow of a magnetorheological fluid in the gap of the journal bearing was calculated using the control volume method in Ansys CFX. As the considered rotor motion in the bearing does not significantly exceed a relative eccentricity ratio of 0.5…0.6, it is possible to linearize the dynamic model of the bearing near the equilibrium state. In this case the equivalent damping and stiffness coefficients can be found by the small perturbation method [17]. The linearized coefficients K and B in these equations are the partial derivatives of the bearing reactions. The equilibrium rotor position was determined using the response surface optimization method. Figure 3 shows the response surface for the X and Y components of the bearing reaction at the rotor speed of 420 Hz and a magnetic field of 0 T. The equilibrium position of the journal in the bearing was determined using the MOGA method (Multi-Objective Genetic Algorithm) [18]. The equation for the lubricant film reaction for the rotor oscillating around an equilibrium point is:

\begin{bmatrix} R_x \\ R_y \end{bmatrix} = \begin{bmatrix} F_x^E \\ F_y^E \end{bmatrix} - \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix} \cdot \begin{bmatrix} x - x^E \\ y - y^E \end{bmatrix} - \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \cdot \begin{bmatrix} V_x \\ V_y \end{bmatrix}, \quad (2)

where F_x^E and F_y^E are the initial forces in the bearing, V_x, V_y are the velocities of the rotor center in the bearing, and x^E and y^E are the coordinates of the equilibrium point.
Fig. 3. Response surfaces for the horizontal and vertical bearing reactions under a certain value of the control magnetic field.
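For illustration, the linearized reaction of Eq. (2) can be evaluated numerically as in the sketch below; the equilibrium forces and the stiffness and damping matrices are placeholder values, not the coefficients actually identified in this study.

```python
import numpy as np

def bearing_reaction(q, v, q_eq, F_eq, K, B):
    """Linearized lubricant film reaction, Eq. (2):
    R = F_E - K (q - q_E) - B v, with q = [x, y] and v = [Vx, Vy]."""
    return F_eq - K @ (q - q_eq) - B @ v

# Placeholder equilibrium forces and dynamic coefficients (illustrative only).
F_eq = np.array([0.0, 54.0])        # N
q_eq = np.array([0.0, -40e-6])      # m, equilibrium journal position
K = np.array([[2.0e6, 0.5e6],
              [-0.5e6, 3.0e6]])     # N/m, stiffness matrix
B = np.array([[1.5e3, 0.2e3],
              [0.2e3, 2.0e3]])      # N*s/m, damping matrix

q = np.array([5e-6, -35e-6])        # current journal position, m
v = np.array([0.001, -0.002])       # journal velocity, m/s
print(bearing_reaction(q, v, q_eq, F_eq, K, B))  # film reaction [Rx, Ry] in N
```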
Calculation of the bearing stiffness and damping coefficients was implemented for the magnetic field range of 0..1 T and rotation speed of 1000…4000 rpm according to the parameters of the physical system used for the experimental research, see the description below. The obtained relationships for the direct stiffness and damping coefficients are
shown in Fig. 4. With an increase in the induction and intensity of the electromagnetic field, the load capacity of the bearing increases, which causes an increase in the rigidity and damping of the bearing. Based on the calculations, 16 sets of stiffness and damping coefficients of the lubricating layer were obtained. The remaining values were obtained by interpolation. The full set of dynamic coefficients also includes the values along the second axis and the cross-coupled coefficients; however, not all coefficients are given here, and more detailed information is presented in [19, 20].
Fig. 4. Stiffness (left) and damping (right) coefficients of the magnetorheological bearing.
The second bearing was modeled as a rigid rolling bearing. Such a model contains only one constant radial stiffness coefficient. An equilibrium point was considered equal to the equilibrium point of the fluid film bearing. A simulation model of the rotor system including the models of rotor, bearings and coupling, was created in Simulink.
3 Model Verification The developed model was verified by comparing it to the results of the experimental study carried out with the experimental rig shown in Fig. 5. The rig contains a variable
Fig. 5. Test rig with the active magnetorheological bearing (on the right end).
speed electric motor connected to the shaft with two disks fixed on it. The rolling bearing is placed at the motor end of the shaft, and the active bearing with magnetorheological lubricant and electromagnetic actuator is at the free end. The magnetorheological support under consideration includes a hydrodynamic bearing with length of 36 mm and inner diameter of 40 mm. The bearing diametric clearance is of 150 microns. The rotor with a length of 450 mm has a mass of 5.5 kg. The first natural frequency at the initial dynamic coefficients is of 65 Hz. The left support includes a rolling bearing. A more detailed description of the experimental setup is given in [19, 20]. An experimental study to test the model included measuring the amplitude of oscillations at different speeds and different values of the applied control magnetic field in the range of 0..1 T, created by the control current of the actuator in the range of 0..1 A at a voltage of 0…10 V. The experiment included 3 levels of the applied magnetic field created by an electromagnetic actuator with a current of 0 A, 0.5 A and 1 A. The rotor was accelerated by an electric motor through a critical frequency, disconnected from the motor using a split coupling and ran out to a stop. Each case under study was reproduced 7 times. The total number of experiments was 21. Figure 6 shows the comparison of the results of theoretical and experimental study at the same range of modes. Both results show a shift of the critical frequency of the rotor system by 6 Hz (about 10% of the bench operation frequency). An increase in damping eliminates the effect of increasing amplitudes of rotor vibration. With an increase in the applied electromagnetic field, the experimentally measured vibration amplitudes decreased by 11%, while the theoretical study shows the reduction by 8%. Qualitatively similar results for this type of bearings are presented in the works of other authors [8, 9].
Fig. 6. Comparison of theoretically (left) and experimentally (right) obtained amplitudefrequency characteristics the rotor system with the magnetorheological bearing.
The results of theoretical and experimental studies agree qualitatively and quantitatively, and the model can be considered as adequate and applicable for the further study.
4 Designing a DQN Controller A DQN agent is a reinforcement learning algorithm. At each time step t the controller (agent) receives feedback from the system (environment) in the form of a state signal S_t, then takes an action A_t and receives a reward r_t in response. It is supposed that the current state completely characterizes the state of the system. The agent trains a critic q(S, A) to estimate the return of the future reward [21]:

g_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots,     (3)

where γ is the discount factor. During the training process it is necessary to minimize the error between the trained function q(S, A) and the optimal function q*(S, A), which can be estimated with the Bellman equation [21]. The critic is normally an artificial neural network that minimizes the loss function while training:

L(\theta^{(k)}) = \frac{1}{m} \sum_{i=1}^{m} \left( y_t - q(S_t, A_t \mid \theta^{(k)}) \right)^2,     (4)

where \theta^{(k)} are the weights of the network, m is the number of training samples in a mini-batch, and y_t = r_t + \gamma \max_{A} q_{t+1}(S_{t+1}, A_{t+1}) is the estimate of the future reward.
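To make Eqs. (3) and (4) concrete, the sketch below computes the target y_t and the mini-batch loss for a critic network. It is an illustrative outline only: PyTorch is assumed, the network sizes and the replay data are placeholders, and the multi-output discrete-action form of the critic is used here for brevity (the two-input critic actually used in this work is sketched later).

```python
import torch
import torch.nn as nn

n_state, n_actions, gamma = 4, 16, 0.85
critic = nn.Sequential(nn.Linear(n_state, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_critic = nn.Sequential(nn.Linear(n_state, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_critic.load_state_dict(critic.state_dict())
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# One mini-batch sampled from a replay buffer (placeholder random data).
m = 250
s  = torch.rand(m, n_state)
a  = torch.randint(0, n_actions, (m, 1))
r  = torch.rand(m, 1)
s2 = torch.rand(m, n_state)

with torch.no_grad():                          # y_t = r_t + gamma * max_A q(S_{t+1}, A)
    y = r + gamma * target_critic(s2).max(dim=1, keepdim=True).values

q_sa = critic(s).gather(1, a)                  # q(S_t, A_t | theta_k)
loss = nn.functional.mse_loss(q_sa, y)         # loss of Eq. (4)
opt.zero_grad()
loss.backward()
opt.step()
```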
The task of controlling the dynamic behavior of the magnetorheological fluid friction bearing was to minimize the vibration displacement amplitudes at critical speeds and to minimize the friction torque in other operating modes. According to the theoretical calculation, the lower the control magnetic field, the lower the friction torque, as shown in Fig. 7.
Fig. 7. The dependence of the coefficient of friction in the bearing on the operating parameters.
An amplitude limit of 40 µm was set as the threshold for the active control system based on the DQN agent in order to minimize the rise in friction torque when the system is far from its critical speeds. Crossing this threshold would trigger a learning interrupt and receive a reward of −30 points. If the control system managed to keep the rotor within the specified limits of vibration, then the agent received a reward of +0.02 point.
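A sketch of this reward logic is given below; the threshold and the reward values follow the text, while the way they are wired into the simulation loop (returning a termination flag per step) is an assumption.

```python
AMPLITUDE_LIMIT = 40e-6   # 40 micrometer vibration displacement threshold

def reward_and_done(amplitude):
    """-30 and terminate the episode if the limit is crossed,
    +0.02 per step while the rotor stays inside the allowed band."""
    if amplitude > AMPLITUDE_LIMIT:
        return -30.0, True
    return 0.02, False

print(reward_and_done(25e-6))   # (0.02, False)
print(reward_and_done(45e-6))   # (-30.0, True)
```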
The rotor was kept in a given vibration displacement band by switching between sets of dynamic coefficients. The simulation time was 200 s. The frequency of the control action of the controller was 10 signals per second. The Adam algorithm was used as the optimization method with a learning rate of 0.001. Figure 8 shows the structure of the DQN agent for the considered bearing. The DQN agent training procedure is shown in Fig. 9. The training process included 5000 episodes. The remaining network parameters are as follows: the gradient threshold was 1, the discount factor was 0.85, and the mini-batch size was 250.
Fig. 8. The DQN-agent for the magnetorheological fluid-film bearing system.
For the critic, a fully connected neural network with two inputs was used. For the first input, two hidden layers with 14 and 18 neurons, respectively, were used. For the second input, one hidden layer with 18 neurons was used. The average reward received during the learning process over 40 episodes was 25. When the average reward reached the maximum reward value (25 points), the training was interrupted, and the resulting controller model was considered trained.
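A possible rendering of the described two-input critic is sketched below (PyTorch is assumed; the input dimensions, the activation functions and the way the two paths are merged are not stated in the text and are therefore assumptions).

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q(s, a) critic with separate observation and action paths."""
    def __init__(self, n_obs=2, n_act=1):
        super().__init__()
        self.obs_path = nn.Sequential(nn.Linear(n_obs, 14), nn.ReLU(),
                                      nn.Linear(14, 18))     # hidden layers of 14 and 18 neurons
        self.act_path = nn.Linear(n_act, 18)                 # one hidden layer of 18 neurons
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(18, 1))

    def forward(self, obs, act):
        merged = self.obs_path(obs) + self.act_path(act)     # merge the two paths
        return self.head(merged)                             # scalar Q-value

critic = Critic()
q = critic(torch.rand(5, 2), torch.rand(5, 1))               # batch of 5 (s, a) pairs
print(q.shape)                                               # torch.Size([5, 1])
```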
Fig. 9. The DQN-agent training process.
5 Results and Discussion The trained DQN controller was finally tested using the simulation model of the rotor system. The simulation scenario included the acceleration of the rotor over 200 s from 0 to 800 rad/s. This range covers the critical speed of the rotor system at approximately 350 rad/s. The simulation results are shown in Fig. 10. The results show that applying the developed DQN controller reduces the vibration amplitude by 15% compared to the passive system. The RMS value of vibration displacements, averaged over time, decreased by 32%. The control magnetic field was applied to the system only while passing the critical speed; for the rest of the simulation time the vibration amplitude remained within the 40 µm threshold, and the control current was not applied. Unlike conventional controllers, such as PID, a DQN controller does not provide the control law in explicit form. Nevertheless, the results show that the obtained control algorithm meets the constraints, minimizing the viscous friction in the bearing outside the resonance zone.
Fig. 10. Testing the designed DQN controller of a magnetorheological fluid-film bearing at the scenario of the rotor acceleration and passing the critical frequency.
It should be noted that rotor systems usually have large non-linearity in parameters. Also, control tasks in such supports are often multicriteria, which gives prerequisites for the use of intelligent control systems. Implementing classical controllers, e.g., PID, in such conditions can be complicated and even not able to provide the stable performance. Setting up adaptive properties to such systems is able to improve their performance, but takes a lot of time and leaves the human factor. Intellectual and self-tuning controllers utilizing the system models, like DQN, look preferable in such cases. The presented control algorithm developed using the described method made it possible to solve the tasks meeting all the purposes and limitations. The significant advantage of the described approach is simplicity of considering the necessary constraints in the system. Even multiple constraints can be easily taken into account by the rewards system during the training process. At the same time, the training time and the complexity of selecting hyperparameters are the drawbacks of such an approach. Also, the disadvantage is the limited use of
this regulator for solving problems other than the one presented. However, the range of control problems solved by controllers of this type can be expanded by carrying out a large number of computational experiments on a simulation model and depends only on the available computational facilities. Also, the required computational time can be reduced by simplifying the used models, including their linearization and/or using approximation techniques, as was shown in the present study.
6 Conclusion The developed methodology for synthesis intellectual controllers for tribotronic bearings includes several main stages, namely, analysis of the requirements to the desired operation modes; specifying the corresponding agent training parameters; developing and validation of the process model; training and testing the controlling agent. The described case of synthesizing the DQN agent for the rotor system with active magnetorheological bearing demonstrates improvement in its dynamic behavior, including reducing vibrations at the critical frequency and minimizing the friction. Also, several particular conclusions can be drawn. 1. Mapping the key dynamic parameters of a tribotronic bearing within the range of available control actions allows obtaining models with sufficient accuracy to train controlling DQN agents. 2. If it is possible to linearize the system under the given operating conditions, the order of the approximation dependences can be reduced and the process simulation speed can be improved. 3. A set of parameters and constraints to be considered by the penalties and rewards system during training the agent allows not only to directly limit the values of particular parameters in the resulting process, but also adjust the physical modes of the object operation based on the a priori information about the physical characteristics and significant relationships describing it. In the considered case, the control was implemented only during passing the critical frequency, while in other modes the controller does not affect the operation of the rotor system. The developed methodology in general is suitable for the synthesis of controllers based on deep reinforcement learning with an actor-critic approach for wider range of tribotronic systems, including active bearings, seals, dampers. Acknowledgment. The study was supported by the Russian Science Foundation grant No. 22– 19-00789, https://rscf.ru/en/project/22-19-00789/.
References 1. Asadi Varnusfaderani, M., Irannejad Parizi, M., Hemmatian, M., Ohadi, A.: Experimental parameters identification of a flexible rotor system equipped with smart magnetorheological bearing. Mechatronics 87, 102880 (2022)
2. Quinci, F., Litwin, W., Wodtke, M., Nieuwendijk, R.: A comparative performance assessment of a hydrodynamic journal bearing lubricated with oil and magnetorheological fluid. Tribol. Int. 162, 107143 (2021) 3. Bompos, D., Nikolakopoulos, P.: Journal bearing stiffness and damping coefficients using nanomagnetorheological fluids and stability analysis. J. Tribol. 136, 041704 (2014) 4. Lampaert, S.G., Quinci, F., van Ostayen, R.A.: Rheological texture in a journal bearing with magnetorheological fluids. J. Magn. Magn. Mater. 4(499), 166218 (2020) 5. Horak, W., Szczech, M.: The analysis of the working conditions of a thrust squeeze bearing with a magnetorheological fluid operating in the oscillatory compression mode. Tribologia 285, 45–50 (2019) 6. Osman, T., Nada, G., Safar, Z.: Effect of using current-carrying-wire models in the design of hydrodynamic journal bearings lubricated with ferrofluid. Tribol. Lett. 11, 61–70 (2001) 7. Hesselbach, J., Abel-Keilhack, C.: Active hydrostatic bearing with magnetorheological fluid. J. Appl. Phys. 93, 8441–8443 (2003) 8. Urreta, H., Aguirre, G., Kuzhir, P., de Lacalle, L.N.L.: Actively lubricated hybrid journal bearings based on magnetic fluids for high-precision spindles of machine tools. J. Intell. Mater. Syst. Struct. 30, 2257–2271 (2019) 9. Zapomel, J., Ferfecki, P.: A new concept of a hydrodynamic bearing lubricated by composite magnetic fluid for controlling the bearing load capacity. Mech. Syst. Signal Process. 168, 108678 (2022) 10. Hamadache, M., Jung, J.H., Park, J., Youn, B.D.: A comprehensive review of artificial intelligence-based approaches for rolling element bearing phm: shallow and deep learning. JMST Adv. 1, 1–27 (2019) 11. Chen, H.Y., Lee, C.H.: Vibration signals analysis by explainable artificial intelligence (xai) approach: application on bearing faults diagnosis. IEEE Access 8, 134246–134256 (2020) 12. Pandarakone, S.E., Mizuno, Y., Nakamura, H.: A comparative study between machine learning algorithm and artificial intelligence neural network in detecting minor bearing fault of induction motors. Energies 12, 2105 (2019) 13. Ashraf, W.M., et al.: Artificial intelligence based operational strategy development and implementation for vibration reduction of a supercritical steam turbine shaft bearing. Alexandria Eng. J. 3(61), 1864–1880 (2022) 14. Chasalevris, A., Dohnal, F.: Modal interaction and vibration suppression in industrial turbines using adjustable journal bearings. J. Phys.: Conf. Ser. 744, 012156 (2016) 15. Di, L., Lin, Z.: Control of a flexible rotor active magnetic bearing test rig: a characteristic model based all-coefficient adaptive control approach. Control Theory Technol. 12(1), 1–12 (2014). https://doi.org/10.1007/s11768-014-0184-0 16. Davidson, P.A., Thess, A. (eds.): ICMS, vol. 418. Springer, Vienna (2002). https://doi.org/ 10.1007/978-3-7091-2546-5 17. Naife, A.: Introduction to Perturbation Methods, 2nd edn. Mir (1984) 18. Cherny, S.G., Chirkov, D.V., Lapin, V.N., et al.: Numerical Modeling of Flows in Turbomachines. Nauka, Novosibirsk (2006) 19. Fetisov, A.S., Kazakov, Y., Tokmakov, N.V.: Rotor trajectories on magnetorheological fluidfilm bearings. Fund. Appl. Prob. Eng. Technol. 6(350), 76–82 (2021) 20. Fetisov, A., Kornaev, A.: Journal bearing with variable dynamic characteristics: simulation results and verification. Fund. Appl. Prob. Eng. Technol. 2(346), 140–145 (2021) 21. Mnih, V., et al.: Playing atari with deep reinforcement learning, pp. 1–9 (2013)
Card-Not-Present Fraud Detection: Merchant Category Code Prediction of the Next Purchase Marouane Ait Said(B) and Abdelmajid Hajami LAVETE Lab, Faculty of Science and Technics, Hassan First University, Settat, Morocco [email protected], [email protected]
Abstract. Card-not-present (CNP) transaction fraud still causes damage to banks today, even after advanced fraud detection systems have been deployed: on average it accounts for 75% of the value of all card fraud [2]. About 8 million cases were reported in 2018, and the number increases every year [2]. Billions of dollars are lost to CNP fraud, and AI systems are still unable to spot these transactions in real time, relying instead on customers reporting the fraud after the damage has already been done. In this article, we identify the harm caused by CNP transactions and offer a potential artificial-intelligence-based solution that forecasts the next purchase from the consumer's Merchant Category Code (MCC) activity and the fraudster's MCC behavior, and we show how MCC usage may help avoid a fraud case. First, we showcase the difficulty of this type of fraud; then we answer the question of how a merchant category code can be used to prevent this kind of fraud; finally, we present the design of our MCC clustering solution and how it can impact the payment system actors. Keywords: Card-not-present · Clean Fraud detection · Artificial intelligence · Machine Learning · MCC
1 Introduction
Day after day, the number of transactions per customer is increasing due to the diversity of payment methods (mobile payment, NFC, etc.). Banks and customers find it hard to track fraud, and small-amount fraud on CNP (card-not-present) operations can no longer be detected by the bank or by the cardholder himself. AI-system datasets remain unbalanced and insufficiently accurate for training, and the confidentiality of transaction messages hinders the development of such systems, which explains why, to this day, cardholders notify banks of fraud cases only after the fact. New dimensions therefore need to be introduced to the game of real-time payment fraud detection. In this paper, we introduce a system based on merchant category code prediction and provide a prototype of how this new dimension can help with AI-system training.
2 State of the Art
2.1 What is a Card-Not-Present Transaction?
When neither the cardholder nor his debit/credit card is physically present at the merchant point of sale at the moment the transaction is triggered, the transaction is called a CNP (card-not-present) transaction; this is most common for e-commerce and for orders placed by phone, fax, or mail. The transaction is a "card-present" transaction if the payment information is physically captured, which happens when an EMV chip is processed or when the card is physically in contact with the reader. In this article, we target all types of CNP transactions, such as:
– Online purchase: used in e-commerce transactions via the internet.
– Phone/mail purchase: the cardholder shares his card details with a business by phone or mail (technically the triggering of a phone/mail purchase is not made by the cardholder, since he does not see the amount used during the transaction).
– Recurring payments: automatic transactions for billing purposes.
2.2 Card not Present Fraud Scenario
Instant Payments are electronic payments that are handled in seconds and credited directly from one bank to another. Both the payer and the payee receive immediate confirmation, the entire procedure takes less than 10 s, and payments can be made at any time of the day. There is a transfer limit of 100,000 EUR. Instant Payments are always one-time transactions, and each transfer is final.
Fig. 1. Card not present authorization request process
1) The cardholder triggers a purchase (for card-not-present transactions only the card number, expiry date, and CVC are required, along with 3DS if enabled).
2) The fraudster recorded the card number, expiry date, and CVC of the cardholder for fraudulent use (even 3DS can be cracked with advanced hacking tools).
3) At this level, the cardholder data is the same for normal and fraudulent transactions.
4) The merchant is not able to distinguish between the transaction coming from the legitimate cardholder and the one coming from the fraudster.
5) The acquirer transfers the authorization to the payment network (Visa, MasterCard, ...).
6) The payment network forwards the authorization request to the issuer bank.
7) The issuer bank has no input to identify whether the source of the transaction is the cardholder himself or a fraudster, so its security checks lead to both transactions being treated the same way.
The above Fig. 1 shows that, from the issuer's perspective, both transactions appear the same. In this paper a new dimension is introduced to help the issuer in decision making: the prediction of the MCC of the cardholder's next purchase. But what is the MCC, and how can it be used to serve a fraud detection system?
2.3 What is the Merchant Category Code?
Credit card issuers utilize merchant category codes (MCCs), which are four-digit numbers, to categorize different types of businesses. A business's MCC indicates the sorts of services or products being sold to clients. If a company offers both products and services, the MCC will normally represent the sort of business that generates the majority of its revenue. A company may occasionally be able to ask for an extra MCC for a different area of the business; a superstore that contains a pharmacy and a grocery store, for instance, may have several MCCs in the same structure. MCC codes are used to distinguish between various business and industry kinds and are assigned to merchant accounts during the setup process. Every sector has unique transactional patterns and varying risk thresholds (potential for fraud). The issuing bank uses the MCC to decide whether or not to accept a transaction: for instance, only Nevada, New Jersey, and Delaware allow internet gambling, so code 7995 can be used to stop transactions arriving from states that forbid online gaming. MCC codes may also impact merchants' processing rates, since an incorrect categorization can force a merchant to pay higher processing costs than necessary. Visa has almost 500 MCCs; codes may differ depending on the card processor, although there are strong similarities, and CitiBank publishes a list of popular merchant categories that gives a sense of the codes. Merchant category codes are carried in ISO 8583 Field 18, which means that every transaction in the payment network should carry a 4-digit number indicating its merchant category code.
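As an illustration, the minimal Python sketch below shows how an issuer-side check could read the MCC carried in ISO 8583 field 18 and apply a category rule such as the gambling example above; the table entries and the blocking rule are hypothetical examples, not an actual issuer configuration.

```python
# Minimal sketch of an MCC lookup on an authorization message (illustrative values only).
MCC_TABLE = {
    "5411": "Grocery stores, supermarkets",
    "5812": "Eating places, restaurants",
    "7995": "Betting / casino gambling",
}

BLOCKED_MCCS = {"7995"}  # e.g. gambling blocked for cardholders in restricted states


def screen_authorization(iso8583_field_18: str) -> str:
    """Return a coarse decision hint from the 4-digit MCC carried in ISO 8583 field 18."""
    mcc = iso8583_field_18.strip()
    category = MCC_TABLE.get(mcc, "Unknown category")
    if mcc in BLOCKED_MCCS:
        return f"decline ({category})"
    return f"continue risk checks ({category})"


print(screen_authorization("7995"))  # decline (Betting / casino gambling)
```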
2.4 Prediction of Merchant Category Code for the Next Buy
Fig. 2. Overview of the proposed solution
In this paper, the focus is on the concept behind our proposed solution; the technical and detailed solution will be drafted in a full paper later on. The input data of our solution are the cardholder's browsing cookies and social media activity, and the output is a forecast model available for the issuer to consume, containing the prediction of the merchant category code of the cardholder's next purchase. To achieve our goal, we used a combination of a clustering algorithm to approach the MCC cluster (Fig. 3) and a classification algorithm to forecast the exact MCC within that range. The primary objective of a classification algorithm is to determine the category of a given sample, and such algorithms are mostly employed to forecast results for categorical data; in our case, the categories are the MCCs shown in Fig. 2. The number of clusters is defined in advance (Fig. 2): we used K-means clustering with k = 14, and the classification problem is a multi-class one, since there are up to 9999 possible MCC outputs. We used a public dataset covering one year of browsing activity from different sources (cookies and browsing history, Facebook activity, Google activity, ...). We implemented K-means with Java libraries and obtained promising results that will be shared in detail in the full article.
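As an illustration of this two-stage idea (clustering to approach the MCC cluster, then classification of the exact MCC), a minimal Python sketch is given below; the feature matrix and labels are synthetic placeholders, and scikit-learn is used here for brevity even though the authors implemented K-means with Java libraries.

```python
# Minimal sketch of the two-stage approach: K-means clustering (k = 14 as in the text)
# followed by a classifier over the MCC classes. All data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))           # placeholder browsing/social-activity features
y_mcc = rng.integers(0, 10, size=500)    # placeholder next-purchase MCC class labels

# Stage 1: approach the MCC cluster.
kmeans = KMeans(n_clusters=14, n_init=10, random_state=0).fit(X)
cluster_id = kmeans.predict(X)

# Stage 2: classify the exact MCC inside the range, using the cluster id as an extra feature.
X_aug = np.column_stack([X, cluster_id])
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_mcc)
print("predicted next-purchase MCC class:", clf.predict(X_aug[:1])[0])
```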
3 Related Works
3.1 Online Payment Fraud Detection AI Proposals
In recent years, especially after the banks' shift to fast payments, extensive studies have been conducted on many aspects of online payment fraud detection. Still, almost all of the proposals have a low detection ratio, or, when they report a high detection ratio, the implementation of the suggested solution is missing. For example:
Fig. 3. Clustering diagram
• A 2016 study [7] used "RUS based on linear mapping, non-linear mapping, and probability" with the dataset of a Taiwanese bank and reached a 79.73% payment fraud detection ratio.
• A 2018 study [1] used "Random Forest Algorithm and Decision Tree" with the dataset of a Chinese e-commerce company and reached a 98.67% payment fraud detection ratio.
• A 2018 study [3] used "KNN" with a Kaggle dataset and reached a 96.9% payment fraud detection ratio; the same paper reported a ratio of 98.2% using Logistic Regression.
• A 2018 study [8] used "Back Propagation Neural Network with Whale Optimization Algorithm" with a Kaggle dataset and reached a 96.4% payment fraud detection ratio.
• A 2018 study [9] used a "Deep Autoencoder" on a Kaggle dataset, reaching a 90% payment fraud detection ratio.
• A 2018 study [2] used "Artificial Neural Networks" on a dataset from an unknown source and reached an 88.9% payment fraud detection ratio.
• A 2019 study [6] used "Local Outlier Factor and Isolation Forest algorithm" with a German bank dataset and reached a 99.6% payment fraud detection ratio.
• A 2019 study [4] used "Auto-encoders Deep Learning" with an unknown dataset and reached a 91% payment fraud detection ratio.
Although the accuracy of the above studies and of other works [5, 10–16] looks quite high, when we go through each one of them we find that the reported ratios are not conclusive, because the type of fraud targeted is not stated, and any fraud besides clean fraud can easily be detected if the EMV standards are implemented. For example, the paper by Maniraj, Aditya, and Shadab [4] with 99.6% accuracy and the paper by Xuan, Liu, Li, and Zheng [1] with 98.67% accuracy did not state precisely whether EMV checks were part of the detection ratio during their
work, or which fraud type was targeted. This leads to the conclusion that these approaches cannot be implemented in the real world, where all basic payment fraud is already detected during the security checks of the EMV fields (application cryptogram and PIN, CVC3, etc.).
4 Conclusion
In this article, we presented the scope of our approach, which is card-not-present transactions (75% of the value of all card fraud). We also introduced a new use of the merchant category code, a required data element that must be provided in all payment authorizations. We then showed the normal fraud scenario, which cannot be distinguished from a legitimate one because the inputs are practically the same. After that, we explained the use of merchant category codes in today's payment lifecycle and how we want to use them in our fraud detection model. The ability to predict the next MCC that the cardholder will use in his next card-not-present purchase gives the payment actors a new dimension that has not been explored yet. This opens the gate to future approaches that can enhance how much the issuer bank knows about the cardholder and the fraudster, because the same analysis can also provide an algorithm that predicts the next MCC that a fraudster will use; the intersection between the two outputs will make it much easier for the issuer and the payment actors to identify the fraudulent transaction in real time.
References 1. Ait Said, M., Hajami, A.: AI methods used for real-time clean fraud detection in instant payment. In: Abraham, A., et al. (eds.) SoCPaR 2021. LNNS, vol. 417, pp. 249–257. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-96302-6_23 2. https://www.thalesgroup.com/en/markets/digital-identity-and-security/banking-payment/ digital-banking/dcv#:~:text=What%20is%20card%2Dnot%2Dpresent,merchant%20for% 20a%20visual%20check 3. Charleonnan, A.: Credit card fraud detection using RUS and MRN algorithms. In: Management and Innovation Technology International Conference (MITicon), 2016, pp. MIT-73. IEEE (2016) 4. Carneiro, N., Figueira, G., Costa, M.: A data mining based system for credit-card fraud detection in e-tail. Decis. Support Syst. 95, 91–101 (2017) 5. Maniraj, S.P., Aditya, S., Shadab, A., Deep Sarkar, S.: Credit card fraud detection using machine learning and data science. Int. J. Eng. Res. Technol. (IJERT) 08(09) (2019) 6. Renstrom, M., Holmsten, T.: Fraud Detection on Unlabeled Data with Unsupervised Machine Learning. The Royal Institute of Technology (2018) 7. Pumsirirat, A., Yan, L.: Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int. J. Adv. Comput. Sci. Appl. 9(1), 18–25 (2018) 8. Al-Shabi, M.: Credit card fraud detection using autoencoder model in unbalanced datasets. J. Adv. Math. Comput. Sci. 33(5), 1–16 (2019). https://doi.org/10.9734/jamcs/2019/v33i53 0192 9. Marie-Sainte, S.L., Alamir, M.B., Alsaleh, D., Albakri, G., Zouhair, J.: Enhancing credit card fraud detection using deep neural network (2020)
10. Cheng, T., Wen, P., Li, Y.: Research status of artificial, neural network and its application assumption in aviation. In: 2016 12th International Conference on Computational Intelligence and Security (CIS) (2016) 11. Pattidar, R., Sharma, L.: Credit card fraud detection using neural network. International Journal of Soft Computing and Engineering (IJSCE) 1(NCAI2011) (2011) 12. Maes, S., Chenwinkel, B.V., Maderick, B.: Credit card fraud detection using Bayesian Networks and Neural Networks. Department of computer science and computational modeling lab (COMO), Brussels, Belgium (2009) 13. Sethi, N., Gera, A.: A revised survey of various credit card fraud detection techniques. Int. J. Comput. Sci. Mobile Comput. 3, 780–791 (2014) 14. Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor (1992) 15. Özçelik, M.H., Işik, M., Duman, E., Çevik, T.: Improving a credit card fraud detection system using a genetic algorithm. In: 2010 International Conference on Networking and Information Technology. IEEE (2010) 16. Ojgo, A.A., Ebook, A.O., Yoro, R.E., Yerkun, M.O., Efozia, F.N.: Framework design for statistical fraud detection (2015)
Fast Stroke Lesions Segmentation Based on Parzen Estimation and Non-uniform Bit Allocation in Skull CT Images Aldísio Gonçalves Medeiros(B), Lucas de Oliveira Santos, and Pedro Pedrosa Rebouças Filho Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Fortaleza, Brazil {aldisio.medeiros,lucas.santos}@lapisco.ifce.edu.br, [email protected]
Abstract. Stroke is the second leading cause of death worldwide. Those who survive usually experience loss of vision or speech, paralysis, or confusion. An agile diagnosis proves to be decisive for patient survival. This paper presents an approach for fast stroke lesion segmentation in skull CT images, taking 0.5 s per sample on average. The segmentation stage uses Parzen window density estimation to classify potentially injured regions. We present a methodology for applying a non-uniform distribution of bits in the image based on an evaluation of two quantization laws: the μ-law and A-law algorithms. The quantization algorithms modify the distribution of pixels, and the result is an enhancement of the region where the lesion is suspected. The results show that the proposed method has the highest mean accuracy, reaching 99.85%, and a specificity of 99.94%, surpassing classical approaches by 16%. The algorithm also presented similarity indexes of 93.39% for the Matthews correlation coefficient and 93.35% for DICE. The proposed methodology was also compared with four approaches that use deep learning; it proved to be equivalent in accuracy, DICE, and specificity, with sensitivity superior by up to 8% to one of the approaches based on the recent Detectron2 neural network. The results indicate that the proposed method is competitive with the approaches already presented in the literature. Keywords: Stroke region segmentation · Parzen window · level set · μ-law · A-law · aid to medical diagnosis
1 Introduction
Stroke is the leading cause of disability in the world. It is estimated that 70% of patients who have suffered a stroke do not return to work because of sequelae or because they need specific care. The most significant consequences are difficulties in speech, motor coordination, and locomotion, and cognitive delays. The degree of disability is directly related to the speed of diagnosis and appropriate treatment: the longer the time to diagnosis, the greater the chances of aftereffects of
the disease that cause disability. Therefore, detecting the lesion contributes to a fast diagnosis that improves patient survival [4]. Computed tomography (CT) exams of the skull are among the most used techniques for rapid diagnosis, because the technique has lower cost, is quicker, and is a non-invasive procedure. In this scenario, image analysis stands out in the clinical evaluation, helping the specialist identify the lesion region, assess the size of the affected brain area, and thus establish the diagnosis and preliminary treatment of the pathology [7]. In recent years, several studies have directed efforts toward developing algorithms that automatically detect lesions in CT exams. This stage is critical for classifying the type of stroke and defining the medical procedure in the preliminary treatment. Different recent approaches apply techniques based on deep learning; these present promising results, but such solutions commonly face limitations in applications with low computational resources. This work proposes a new probabilistic approach based on Parzen density estimation for classifying regions in order to detect hemorrhagic stroke in CT images of the skull. The fully automatic segmentation combines non-uniform bit allocation via the A-law and μ-law algorithms, to highlight the lesion candidate region, with a subsequent probabilistic classification of the pixels. Besides, this approach evaluates the radiological densities of the skull to establish a safe area for the study of the lesion. Our method is called Level Set Based on Radiological Densities inspired by the μ-Law (LSBRD-μ), an evolution of the traditional Level Set algorithm initially proposed by Osher and Sethian [8]. The results of the proposed method are compared with recent approaches in the literature, LSCPM [9], Fuzzy C-means (FCM), Ada-MGAC [7], and the recent method known as FPLS [14]. In addition, the proposed evaluation compares the results against classic techniques such as Watershed [6], as well as different deep-learning-based methods. Specifically, this article offers the following contributions: 1. a fully automatic approach to hemorrhagic stroke segmentation; 2. classification based on probability density as a low computational cost technique; 3. image enhancement based on non-uniform bit allocation with the μ-law and A-law; 4. low detection time. This work is organized as follows: Sect. 2 presents the segmentation techniques evaluated in this study, and Sect. 3 the main techniques used as the theoretical basis of this study. Section 4 presents the proposed methodology, and in Sect. 5 the results are discussed based on the evaluation metrics. Finally, Sect. 6 presents contributions, conclusions, and proposals for future studies.
2 Related Works: Classical and Deep Learning Approaches
The Watershed algorithm has been widely evaluated in the literature; its recent applications include stroke detection in the work of Körbes and Lotufo [6], and recent works have also applied it to identify brain lesions in magnetic resonance imaging. As a classic technique, the Watershed algorithm depends directly on the intensity variations in the image, so it is sensitive to the contrast oscillations present in computed tomography images. The Fuzzy C-means (FCM) method, proposed by Dunn [2], is based on the similarity between pixels in a region: pixels are grouped according to their characteristics through iterative minimization of a cost function. FCM shows promising results in low-contrast imaging, such as CT of the brain. However, this algorithm uses the Euclidean distance as the similarity metric between each evaluated pixel and the formed clusters, and this procedure is computationally expensive for time-constrained applications. Rebouças et al. [9] proposed a Level Set based on the analysis of radiological densities, inspired by the segmentation algorithm known as Level Set initially proposed by [13]. An evaluation of the stroke region is presented based on the density of the tissue represented in the image in Hounsfield Units (HU), adopting 80 HU for the window width and 40 HU for the center level. Figure 1 presents an example of tissue analysis from densities.
Fig. 1. From left to right, three examples of the analysis of cerebral radiological densities. The first image shows the CT scan; the right side shows the different densities of intracranial tissue. This is an analysis of the HU densities of the skull (spinal fluid in magenta, white matter in green, gray matter in blue, blood in red). The lesion is highlighted in red inside the skull.
Medeiros et al. [7] presented a method called Ada-MGAC, inspired by techniques from mathematical morphology. In line with the Level Set approach proposed by [13], Ada-MGAC performs the detection of stroke with a geodesic active contour model that does not depend on the parameterization of the curve: the curve that delimits the detected region corresponds to a geodesic of a Riemannian space. A recent Level-Set-based algorithm for the detection of stroke in tomography images, called Flog Parzen Level Set (FPLS), was proposed by Rebouças et al. [14]. In this work, the authors suggest an approach combining estimates
through the Parzen window, to classify pixels in the region that indicates the presence of a lesion, with a transformation of the pixel intensities obtained by applying the logarithm function. The method of Rebouças et al. [14] needs few iterations to achieve convergence, which is its main advantage. However, the different characteristics of the pixels in skull CT images lead to different contrasts, and these differences directly affect the choice of the logarithm base used to identify the candidate regions. Recent works use Convolutional Neural Networks (CNN) as feature extractors, known as deep feature extractors; in this approach, the outputs of the neurons of the last convolutional layer are used as the sample feature vector. In the work of Han et al. [4], the identification of stroke is proposed in two stages. This work combines the output of the deep feature extractors with the Detectron2 deep neural network. The Detectron2 network is combined with the Parzen window (Detectron2-fλ), with K-means clustering (Detectron2-fδ), and with Region Growth (Detectron2-fμ).
3 Materials and Methods
The following describes the theoretical foundation of the Level Set segmentation algorithm and the probabilistic classification based on the Parzen window. Finally, we present the evaluation metrics used in this work.
3.1 Level Set
Osher and Sethian [8] proposed a numerical solution based on implicit contours called Level Set. Consider a two-dimensional space with two regions and a boundary between them, and let C(x, y) be the function that defines the boundary. The idea of the implicit contour is to represent C by a function φ that can deform the boundary, thus defining new positions for the points. For this, we add a third component to the space that represents time and call the new boundary function φ[(x, y), t]. Making φ[(x, y), t = 0] = C(x, y), we have an initial value for the function φ; therefore, φ[(x, y), t = 0] is the zero Level Set. The curve C(x, y) becomes a level curve embedded in the Level Set φ. This can be done, for example, using a signed distance function, as described by Eq. 1:

\phi[(x, y), t] = \begin{cases} -\operatorname{sign}(x, y, C), & \text{if } (x, y) \text{ is outside the curve } C \\ \phantom{-}\operatorname{sign}(x, y, C), & \text{otherwise} \end{cases} \qquad (1)

where sign denotes the distance from the point (x, y) to the closest point on C. One of the advantages of this representation is that we do not consider the parameterization of the curve itself, but the region circumscribed by the curve: manipulating φ defines a new boundary for the curve C. This modeling also offers adaptation to topological changes, as illustrated in Fig. 2.
Fig. 2. Level set function representation ψ(x, t) and the curve φ(s, t). Adapted from [13].
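As an illustration, the minimal NumPy sketch below builds a signed-distance initialization of the kind described by Eq. 1 for a circular seed region (positive inside the curve, negative outside); the image size, center, and radius are hypothetical and unrelated to the experiments reported later.

```python
# Minimal sketch of a signed-distance level-set initialization for a circular seed region.
import numpy as np


def signed_distance_circle(shape, center, radius):
    """phi > 0 inside the curve C, phi < 0 outside, phi = 0 on the curve itself."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    dist_to_center = np.sqrt((xx - center[1]) ** 2 + (yy - center[0]) ** 2)
    return radius - dist_to_center


phi0 = signed_distance_circle((128, 128), center=(64, 64), radius=20)
print(phi0[64, 64], phi0[0, 0])  # positive at the center, negative far outside the curve
```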
3.2 Parzen Window
The Parzen window [1] is a non-parametric probability density estimation method: it does not assume any probability distribution for the data. The approach calculates the probability that a point z belongs to a non-Euclidean region defined by a hypercube R = \{z_i\}_{i=1}^{k} with dimensionality d, as in Eq. 2:

p(z) = \frac{1}{k} \sum_{i=1}^{k} \varphi\!\left(\frac{z_i - z}{h}\right) \qquad (2)

where φ is the kernel function used to limit the neighborhood, h is the hypercube edge size, and k is the number of pixels of R. The Gaussian kernel is the function most used in the literature to evaluate the probability from the neighborhood of points because of its relevant properties: the Gaussian function has a smooth surface, which means that the density estimate p(z) also varies smoothly.
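As an illustration, a minimal NumPy sketch of Eq. 2 with a Gaussian kernel is given below; the intensity samples and the window size h are hypothetical, and the 1/h normalization needed for a true 1-D density is noted but left out to match the equation as written.

```python
# Minimal sketch of the Parzen-window estimate of Eq. 2 with a Gaussian kernel phi.
import numpy as np


def parzen_estimate(z, samples, h):
    """(1/k) * sum_i phi((z_i - z)/h); a proper 1-D density would also divide by h."""
    u = (samples - z) / h
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi.mean()


lesion_pixels = np.array([62.0, 64.0, 66.0, 70.0, 72.0])  # intensities from a suspected region
print(parzen_estimate(65.0, lesion_pixels, h=4.0))        # point close to the region: high value
print(parzen_estimate(20.0, lesion_pixels, h=4.0))        # point far from the region: near zero
```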
3.3 Non-uniform Bit Allocation: μ-law and A-law Algorithms
According to Ribeiro [12], the μ-law algorithm is a compression and expansion (companding) algorithm used mainly in 8-bit Pulse Code Modulation telecommunication systems [3] in North America and Japan. Companding algorithms aim to reduce channel noise and distortion effects under a limited dynamic range. Given an input signal f(x, y), the μ-law compression can be described in analog form according to Eq. 3:

M(x, y) = f(x, y) \cdot \frac{\ln\left[1 + \mu \cdot f(x, y)\right]}{\ln(1 + \mu)} \qquad (3)

where M(x, y) is the compressed value of the input pixel f(x, y) and μ = 255. The A-law compression algorithm is widely used in European countries. In the context of digital image signals, the A-law compression can be expressed according to Eq. 4:

A(x, y) = f(x, y) \cdot \frac{1 + \ln\left[A \cdot f(x, y)\right]}{1 + \ln(A)} \qquad (4)

where A(x, y) is the compressed value of the input pixel f(x, y) and A = 87.6. The main difference between the two compression algorithms is the magnitude of the compression that the output signal undergoes: while the μ-law algorithm limits the values to 13 bits, the A-law algorithm compresses the samples to 12 bits. The μ-law magnifies the contrast between pixels, which can be seen in the better distribution of pixels in the histogram; the A-law, on the other hand, only enhances the region of greatest intensity.
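As an illustration, a minimal NumPy sketch of this companding step, implementing Eqs. 3 and 4 exactly as written for an intensity image normalized to [0, 1], is given below; the input values are synthetic, and the clipping used to avoid ln(0) in the A-law is an implementation detail, not part of the original formulation.

```python
# Minimal sketch of the mu-law and A-law companding of Eqs. 3 and 4 on normalized intensities.
import numpy as np


def mu_law(f, mu=255.0):
    # Eq. 3: M = f * ln(1 + mu*f) / ln(1 + mu)
    return f * np.log1p(mu * f) / np.log1p(mu)


def a_law(f, A=87.6):
    # Eq. 4: A = f * (1 + ln(A*f)) / (1 + ln(A)); clip avoids ln(0) at f = 0
    return f * (1.0 + np.log(np.clip(A * f, 1e-12, None))) / (1.0 + np.log(A))


f = np.linspace(0.0, 1.0, 5)            # stand-in for normalized CT intensities
print(np.round(mu_law(f), 3))
print(np.round(a_law(f), 3))
```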
3.4 Datasets and Evaluation Metrics
CT images were obtained with the support of the Trajano Almeida clinic (Diagnostic Imaging) and evaluated in previous studies (Rebouças et al. [9,11]). All sensitive patient data were omitted in all analyses of this study. Seven evaluation metrics were used to verify the classifiers' performance: Accuracy (ACC), Sensitivity (TPR), Specificity (SPC), Precision (PPV), Matthews correlation coefficient (MCC), Dice (DSC), and Jaccard (JSC). They are computed from the confusion matrix generated by the classifiers [5].
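As an illustration, the following minimal Python sketch computes these metrics from the confusion-matrix counts of a binary lesion mask against the specialist ground truth; the counts used in the example call are hypothetical.

```python
# Minimal sketch of the seven evaluation metrics computed from confusion-matrix counts.
import math


def segmentation_metrics(tp, fp, tn, fn):
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)                   # sensitivity
    spc = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                   # precision
    dsc = 2 * tp / (2 * tp + fp + fn)      # Dice
    jsc = tp / (tp + fp + fn)              # Jaccard
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(ACC=acc, TPR=tpr, SPC=spc, PPV=ppv, MCC=mcc, DSC=dsc, JSC=jsc)


print(segmentation_metrics(tp=950, fp=60, tn=64000, fn=70))
```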
4 LSBRD: An Approach Based on Parzen Estimation and Non-uniform Bit Allocation via μ-law and A-law
In this section, we detail the methodology proposed in this work. In Step 1 (skull segmentation), a threshold in the specific range for bone, 200 HU, is applied to segment the skull region: all pixels above this threshold characterize cranial bone or other noise from the CT scanner. The result of this process can be seen in items a), b), and c) of Step 1 in Fig. 4. After removing the patient's skull, via a mathematical intersection operation between a) and b), a blurring filter was applied to smooth the image contours, item d) of Step 1 (Fig. 4). In Step 2 (preprocessing with μ-law and A-law and detection of the region of interest), the pixel range that contains the bleeding regions lies between 56 and 76 HU (Rebouças et al. [10]). However, using only a thresholding function on the grayscale image to separate the area of interest produces a lot of noise around the lesion, because different brain regions can have the same pixel intensity; a thresholding function alone is therefore not effective enough to locate the bleeding area. If this pixel range is reduced, some information about both the noise and the lesion is also lost. Thus, the threshold used in this work is more restrictive, with a smaller range than the standard one. Although a small part of the lesion is lost, the stroke is still pre-segmented.
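As an illustration, a minimal NumPy sketch of these two thresholding steps (bone above 200 HU, bleeding candidates between roughly 56 and 76 HU) is given below; the input array of Hounsfield units is synthetic and the helper name is hypothetical.

```python
# Minimal sketch of Step 1 (skull removal) and Step 2 (bleeding-range pre-segmentation).
import numpy as np


def presegment_stroke(ct_hu, bone_hu=200, lesion_lo=56, lesion_hi=76):
    skull_mask = ct_hu >= bone_hu                       # bone and scanner noise (Step 1)
    brain = np.where(skull_mask, ct_hu.min(), ct_hu)    # remove the skull from the exam
    lesion_mask = (brain >= lesion_lo) & (brain <= lesion_hi)  # HS candidate pixels (Step 2)
    return brain, lesion_mask


ct_hu = np.random.default_rng(0).uniform(-100, 300, size=(64, 64))  # synthetic stand-in slice
brain, lesion = presegment_stroke(ct_hu)
print("candidate lesion pixels:", int(lesion.sum()))
```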
Because of its characteristics, the μ-law, when applied to a region, makes the pixel intensities in that location tend to get closer, which causes a lightening of the image. The μ-law was therefore applied to the range of pixels that do not match the average, that is, the reduced range described earlier; as a result, the region undergoes a lightening of its pixels, as can be seen in items a) and b) of Fig. 3. Thus, there is a considerable difference between pixels that belong to the lesion and those that do not.
Fig. 3. Comparison of the histograms of the original exam and of the exam after A-law and μ-law application, respectively: a) original and histogram; b) A-law result and histogram; c) μ-law result and histogram.
Although there is a considerable gain in reducing noisy image elements, a post-processing step is necessary so that only one region serves as input for the classification. Thus, a morphological erosion operation was applied to the image before applying a threshold. This was necessary to circumvent the effects resulting from the limitation of the pixel intensity range that characterizes hemorrhagic stroke (HS), described in Step 2, which caused a slight loss of lesion information, as can be seen in the μ-law result in Step 2 of Fig. 4. In Step 3 (Level Set initialization), the lesion region located in Step 2 is used as the initialization of the Level Set for the Parzen window, and the Parzen windowing classifies the pixels that are or are not part of the lesion. Figure 4, Step 3, shows the visual results of the evolution of the Level Set curvature. Tables 1 and 2 show the average results of the qualitative and quantitative metrics applied in the validation process. The active contour iterates over each pixel in the image, so the larger the image, the longer the processing time. For this reason, before the Parzen window is applied, the image is reduced to 50% of its original size, and at the end of the method the result is restored to the original size. This process proved to be effective both in the evaluation metrics and in the average processing time per image.
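As an illustration of this downscale–classify–upscale speed-up, a minimal Python sketch is given below; OpenCV's resize is used only for convenience, and classify_lesion is a hypothetical stand-in for the Parzen-window classification step, not the authors' implementation.

```python
# Minimal sketch of the resize-before-classification flow described above.
import cv2
import numpy as np


def segment_fast(slice_hu, classify_lesion):
    small = cv2.resize(slice_hu, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    mask_small = classify_lesion(small)              # Parzen-window classification (assumed)
    mask = cv2.resize(mask_small.astype(np.uint8), slice_hu.shape[::-1],
                      interpolation=cv2.INTER_NEAREST)
    return mask.astype(bool)


dummy = np.zeros((128, 128), dtype=np.float32)
print(segment_fast(dummy, lambda img: img > 0.5).shape)  # (128, 128): mask back at full size
```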
5 Results and Discussions
In this section, the results obtained by the LSBRD-μ are presented. To validate them, the database described in Sect. 3.4 was used. Table 1 shows the results of the MCC, DSC, and JSC metrics together with their respective standard deviations. As can be seen, for the MCC metric the LSBRD-μ is superior by 4.39% and by 2.04% to the methods of Rebouças et al. [14] and
Medeiros et al. [7], respectively, the latter being the competitor with the best score (91.35%). Furthermore, the method of Körbes et al. [6] presents the worst index for this metric, 86.87%, being surpassed by the LSBRD-μ by 6.52%. Considering the DSC metric, the proposed method obtained the best values, being more effective than the method of Medeiros et al. [7] by 2.07%; since the method of Medeiros et al. [7] is the most effective among the competitors, with a 91.28% hit rate, it is understood that the LSBRD-μ is more accurate than the others. For the JSC metric the same comparative logic applies, as the method of Medeiros et al. [7] is the best among the competitors; the proposed method is the most effective, outperforming the method of Medeiros et al. [7] by 3.45% and the method of Körbes et al. [6], which obtained the lowest performance, by 10.49%.
Fig. 4. Flowchart of the method applied.
These MCC, DSC, and JSC metrics express the level of proximity between the segmented image and the one marked by the specialist; thus, the LSBRD-μ is closer to the gold standard of segmentation than the others. Furthermore, the standard deviation values obtained are the smallest, being 1.18% lower than those of the Medeiros et al. [7] and Rebouças et al. [14] methods for the MCC metric. Considering the DSC metric, the LSBRD-μ is 1.25% lower than the Medeiros et al. [7] method and 6.72% lower than the Körbes et al. [6] method, which is the worst among all. The same logic extends to the JSC metric, where the proposed method is 1.85% lower than the Medeiros et al. [7] method, the best among the others. These data allow us to state that the dataset processed by the LSBRD-μ presents a low fluctuation of values, which gives the method a smaller data dispersion.
Table 1. Results from the Matthews Correlation Coefficient (MCC), DICE Similarity Coefficient (DSC) and Jaccard Similarity Coefficient (JSC) with their respective standard deviations.

Algorithm            | Year | Time (s)  | MCC           | DSC           | JSC           | PPV           | TPR           | SPC
LSBRD-μ              | 2022 | 0.56±0.07 | 93.39 ± 02.68 | 93.35 ± 02.80 | 87.66 ± 04.77 | 93.78 ± 05.75 | 93.35 ± 04.23 | 99.94 ± 00.05
LSBRD-A              | 2022 | 0.55±0.08 | 93.37 ± 02.68 | 93.34 ± 02.79 | 87.64 ± 04.77 | 93.67 ± 05.60 | 93.43 ± 04.41 | 99.94 ± 00.05
Rebouças et al. [14] | 2021 | 1.50±0.02 | 89.00 ± 03.86 | 88.42 ± 06.68 | 80.45 ± 09.71 | 99.11 ± 00.09 | 83.34 ± 08.72 | 99.94 ± 00.09
Medeiros et al. [7]  | 2020 | 2.20±0.2  | 91.35 ± 03.86 | 91.28 ± 04.05 | 84.21 ± 06.62 | 99.88 ± 00.09 | 93.46 ± 04.32 | 99.93 ± 00.05
Rebouças et al. [9]  | 2017 | 1.76±0.29 | 89.27 ± 05.78 | 88.85 ± 06.51 | 80.50 ± 09.79 | 99.75 ± 00.25 | 99.96 ± 00.05 | 83.03 ± 11.13
Körbes et al. [6]    | 2010 | 4.80±0.62 | 86.87 ± 07.41 | 86.63 ± 07.84 | 77.17 ± 11.20 | 99.84 ± 00.19 | 99.85 ± 00.11 | 87.32 ± 13.38
Dunn [2]             | 1973 | 8.69±3.16 | 88.00 ± 16.80 | 87.41 ± 18.50 | 80.77 ± 19.20 | 99.32 ± 02.99 | 99.86 ± 00.10 | 90.53 ± 21.31
Table 1 also shows the results for PPV, TPR, and SPC with their respective standard deviations. As can be seen, the proposed method presents the highest indexes in the SPC metric, being 16.91% higher than the method of Rebouças et al. [9], which is the worst among all. Furthermore, even though the value equals that obtained by Rebouças et al. [14], the proposed approach is more consistent when analyzed from the standard deviation point of view. In the context of this work, the SPC metric indicates how effective the method is in asserting that a given region is not an HS, and the proposed algorithm is the most effective from this point of view. Although the technique does not have the best indexes in the PPV and TPR metrics, it proves competitive with the others, and it is even superior in the TPR metric to the method of Rebouças et al. [14], the most recent of the compared methods, surpassing it by 10.01%. The methods compared above do not use algorithms based on deep learning networks; Table 2, however, shows the results that the Detectron 2 algorithm obtained for the same database used in this work. In this sense, the proposed method presents results competitive with those of the Detectron 2 network, being even higher by 0.56% in the sensitivity metric, which shows that the LSBRD-μ can predict whether the detected region contains a stroke. For the Dice metric, the method is superior to Detectron-fλ by 1.98%; this logic extends to accuracy, with the proposed method superior to this same network by 0.02%.
5.1 Algorithm Performance Analysis
Another relevant analysis is the method's performance in terms of the time required to segment the lesion in the images. The importance of this analysis is related to the care given by physicians to patients diagnosed with stroke: the faster the method, the quicker and more accurate the diagnosis and treatment of the disease. In this sense, looking at Tables 1 and 2, it can be noted that the proposed method has an average segmentation time of only 1.14 s. This value is smaller than the one obtained by the method of Rebouças et al., which has an average time of 1.50 s.
Regarding segmentation time, Table 2 shows that the proposed method is the second-fastest in lesion segmentation. It is worth mentioning that the segmentation time of the methods based on deep learning does not take into account the time spent in the training stage needed before the network can perform the stroke detection process on these images.

Table 2. Results generated by the proposed method and by the algorithm based on the Detectron 2 deep learning network together with fine-tuning techniques.

Algorithm         | Year | Time (s)  | ACC          | DSC          | TPR          | SPC
LSBRD-μ           | 2022 | 0.56±0.07 | 99.85 ± 0.07 | 93.35 ± 2.80 | 93.35 ± 4.23 | 99.94 ± 0.05
LSBRD-A           | 2022 | 0.55±0.08 | 99.86 ± 0.07 | 93.34 ± 2.79 | 93.43 ± 4.41 | 99.94 ± 0.05
Detectron 2 [4]   | 2020 | 0.09±0.06 | 99.89 ± 0.05 | 94.81 ± 2.11 | 92.79 ± 3.87 | 99.97 ± 0.03
Detectron-fλ [4]  | 2020 | 3.08±1.89 | 99.83 ± 0.06 | 91.37 ± 3.70 | 85.21 ± 6.29 | 99.99 ± 0.02
Detectron-fδ [4]  | 2020 | 3.41±2.00 | 99.88 ± 0.05 | 94.09 ± 2.40 | 90.89 ± 4.57 | 99.97 ± 0.03
Detectron-fμ [4]  | 2020 | 4.53±1.0  | 99.88 ± 0.05 | 94.04 ± 2.42 | 90.61 ± 4.54 | 99.98 ± 0.03
6 Conclusion and Future Works
In this work, a new method for hemorrhagic stroke (HS) segmentation in CT images of the skull was proposed, using an approach based on the combination of the μ-law with the non-parametric Parzen window estimation method, called LSBRD-μ. The LSBRD-μ is effective in determining whether the localized region is damaged, which is confirmed by the high value of ACC (99.85%): the method can predict with almost 100% accuracy that the pixels obtained are, in fact, from the region of interest. In addition, the value obtained in the SPC metric (99.94%), the highest among all, indicates that the method is the best in detecting non-stroke regions. Regarding the segmentation time presented in Tables 1 and 2, the LSBRD-μ showed an average time of approximately 1 s per image with a standard deviation of 0.04 s. Thus, the proposed method is the fastest among the traditional algorithms and the second-fastest compared with the deep learning algorithms. Therefore, it is concluded that the proposed algorithm is efficient in correctly segmenting hemorrhagic stroke in brain CT images. As future work, this algorithm will be evaluated on different types of brain exams, such as MRI and ultrasonography. Acknowledgment. The authors would like to thank the Ceará State Foundation for the Support of Scientific and Technological Development (FUNCAP) for the financial support (grant #6945087/2019).
References 1. Classifiers based on Bayes decision theory. In: Theodoridis, S., Koutroumbas, K. (eds.) Pattern Recognition, pp. 13–89. Academic Press, Boston, 4th edn. (2009) 2. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters (1973) 3. Faruque, S.: Pulse code modulation (PCM). In: Radio Frequency Source Coding Made Easy. SECE, pp. 65–90. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15609-5_4 4. Han, T., et al.: Internet of medical things-based on deep learning techniques for segmentation of lung and stroke regions in CT scans. IEEE Access 8, 71117–71135 (2020) 5. Hossin, M., Sulaiman, M.: A review on evaluation metrics for data classification evaluations. Int. J. Data Mining Knowl. Manage. Process 5(2), 1 (2015) 6. Körbes, A., Lotufo, R.: Analise de algoritmos da transformada watershed. In: 17th International Conference on Systems, Signals and Image Processing (2010) 7. Medeiros, A.G., Santos, L.O., Sarmento, R.M., Rebouças, E.S., Filho, P.P.R.: New adaptive morphological geodesic active contour method for segmentation of hemorrhagic stroke in computed tomography image. In: Cerri, R., Prati, R.C. (eds.) BRACIS 2020. LNCS (LNAI), vol. 12320, pp. 604–618. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61380-8_41 8. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988) 9. Rebouças, E.d.S., Braga, A.M., Sarmento, R.M., Marques, R.C., Rebouças Filho, P.P.: Level set based on brain radiological densities for stroke segmentation in CT images. In: IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 391–396 (2017) 10. Rebouças, E.S., Marques, R.C.P., Braga, A.M., Oliveira, S.A.F., de Albuquerque, V.H.C., Rebouças Filho, P.P.: New level set approach based on Parzen estimation for stroke segmentation in skull CT images. Soft. Comput. 23(19), 9265–9286 (2018). https://doi.org/10.1007/s00500-018-3491-4 11. Rebouças, E., Braga, A., Sarmento, R., Marques, R., Filho, P.P.: Level set based on brain radiological densities for stroke segmentation in CT images (2017) 12. Ribeiro, M.A.B.: Estudo de um modelo de conversor A/D com níveis de quantização configuráveis (2017) 13. Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, vol. 3. Cambridge University Press (1999) 14. de Souza Rebouças, E., et al.: Level set approach based on Parzen window and floor of log for edge computing object segmentation in digital images. Appl. Soft Comput. 105, 107273 (2021)
Methods for Improving the Fault Diagnosis Accuracy of Rotating Machines Yuri Kazakov1(B), Ivan Stebakov1, Alexander Fetisov1, Alexey Kornaev1,2, and Roman Polyakov1
1 Department of Mechatronics, Mechanics, and Robotics, Orel State University, Komsomolskaya Street, 95, Orel 302026, Russian Federation [email protected]
2 Lab of Artificial Intelligence, Innopolis University, Universitetskaya Street, 1, Innopolis 420500, Russian Federation
The work has been carried out at the Oryol State University named after I.S. Turgenev with the financial support of the Ministry of Science and Higher Education of the Russian Federation within the project "Creation of a digital system for monitoring, diagnosing and predicting the state of technical equipment using artificial intelligence technology based on domestic hardware and software", Agreement No. 075-11-2021-043 from 25.06.2021.
Abstract. The paper considers the use of fully connected networks for classifying the states of a rotary machine based on a vibration signal. An experimental test rig is proposed, and we worked with three different states of the experimental setup. The new approach is to use generative adversarial networks to create artificial data, together with various architectures of fully connected neural networks. We also tested different combinations of training and validation datasets. As a result, the use of all these methods makes it possible to improve the accuracy of the network by about 6.5%. Keywords: Rotary machines · Malfunctions of rotary machines · Intelligent methods for fault diagnosis · Generative Adversarial Networks
1 Introduction
Today, intelligent diagnostic systems are used to detect defects. In rotary machines, researchers often deal with rolling bearings. It should be noted that intelligent diagnostic systems are universal, and they are used to solve a variety of problems. Convolution networks and recurrent networks are widely used in the field of fault diagnostics [1,2,4,10,13,17,19,20]. Convolution networks are often used for image processing or multichannel signals. Recurrent networks are used to process sequences. However, these networks are quite complex in hyperparameter tuning. Also, one of the ways to improve the accuracy of diagnostics is transfer learning. In [12], the authors use convolution networks and transfer
learning for fault detection in induction motor using time domain and spectral imaging. As a result, the authors show that the use of transfer learning can improve the accuracy of defect detection. Today, methods that combine several approaches at the same time have begun to gain popularity. In [3], the authors proposed a methodology that consists of three parts: feature extraction, fault detection, and fault diagnosis. The results of the work show that this approach allows detecting faults even without labeled data. An analogue of such networks can be fully connected networks. A small dataset is a common problem when training artificial neural networks. Today, the use of artificial data has become a common way to increase the training sample. Generative adversarial networks (GANs) are one way to get artificial data. One of the first articles on this topic was published by Ian J. Goodfellow and his colleagues [5,6]. Generative adversarial networks can be based on various types of networks and methods, such as fully connected networks, convolution networks [14], autoencoder [11], conditional generative adversarial networks, etc. Often GANs are used to create new images. This direction is being actively developed [18]. Also in recent years, a number of works on signal generation have been carried out. In [7], Kay Gregor Hartmann and colleagues used GANs to generate time series. They generated electroencephalographic brain signals. Bin Tan and colleagues [16] used the GAN to generate a cognitive radio signal. They then used the generated and real signals together to automatically classify the modulation. The results showed that classification accuracy increased by 6%. In [8], Debapriya Khazra and Yung-Cheol Byun presented a new GAN model for generating synthetic biomedical signals. The proposed model showed better results in the formation of biomedical signals than the known models. Yi Shi and colleagues in [15] presented a way to protect against a spoofing attack. This method is based on GAN. To improve protection against spoofing attacks, GANs generate a signal that can fool the classifier. This effect allows the classifier to be trained on a more flexible dataset. In this paper, we considered fully connected neural networks for diagnosing defects in a rotary machine. We also tested the possibility of using a GAN to generate a synthetic rotor vibration signal. We then tested the effect of synthetic data on the accuracy of fault classification.
2 Intellectual Diagnostic Methods
2.1 Fully Connected Neural Networks
One of the basic methods of data classification is fully connected networks. This type of network belongs to the supervised learning methods. The main idea of such networks is that all neurons of each layer are connected to the neurons of the neighboring layers, taking into account the weights [9]:

u_l = W_l Z_{l-1} + b \qquad (1)

where u_l is the current layer data matrix, Z_{l-1} is the previous layer data matrix, W_l is the weight matrix, b is the bias vector, and l is the layer number index. Inside the layer, an
activation function is applied to the data. In its simplest form, a sigmoid is used as the activation function [9]:

Z_l = \left(1 + \exp(-u_l)\right)^{-1} \qquad (2)

where Z_l is the current layer output data matrix. If there are more than two classes, the activation function of the output layer is a softmax [9]:

Y = \exp(u_l) \Big/ \sum_{i=1}^{N_{out}} \exp(u_{l,i}) \qquad (3)
where N_out is the number of output layer neurons (i.e., the number of classes) and Y is the output probability matrix.
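As an illustration, a minimal NumPy sketch of the forward pass defined by Eqs. 1–3 is given below; the layer sizes, the three class labels, and the random weights are assumptions made for the example, not the authors' trained network.

```python
# Minimal sketch of a fully connected forward pass: sigmoid hidden layer and softmax output.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 200, 50, 3            # vibration window length and 3 machine states

W1, b1 = rng.normal(0, 0.1, (n_hidden, n_in)), np.zeros((n_hidden, 1))
W2, b2 = rng.normal(0, 0.1, (n_out, n_hidden)), np.zeros((n_out, 1))


def forward(z0):
    u1 = W1 @ z0 + b1                          # Eq. 1
    z1 = 1.0 / (1.0 + np.exp(-u1))             # Eq. 2 (sigmoid)
    u2 = W2 @ z1 + b2
    e = np.exp(u2 - u2.max(axis=0))            # Eq. 3 (softmax, numerically stabilized)
    return e / e.sum(axis=0)


x = rng.normal(size=(n_in, 1))                 # one normalized vibration chunk
print(forward(x).ravel())                      # probabilities: no defect / misalignment / imbalance
```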
2.2 Generative Adversarial Network
These networks are used for creating synthetic data. Usually they are used for generating new images; here, however, we used a GAN for creating a vibration signal. A GAN is based on two artificial neural networks that must be trained together. The first network is the generator: it receives an array of random numbers as input and produces data that follows the examples of the training sample. The second network is the discriminator: based on the training data and the generated data, it tries to classify samples as "real" or "generated" (see Fig. 1).
Fig. 1. Scheme of GANs.
Both neural networks should work at maximum performance: the generator must generate data that "fools" the discriminator, while the discriminator must distinguish between real and synthetic data. As a result, the main task of the generator is to generate data equivalent to the real data, and the task of the discriminator is to learn to clearly distinguish between real and synthetic data. This leads to the following loss functions for the generator and the discriminator [6]:

L_G = -\operatorname{mean}\left(\log(Y_{Gen})\right) \qquad (4)

L_D = -\operatorname{mean}\left(\log(Y_{Real})\right) - \operatorname{mean}\left(\log(1 - Y_{Gen})\right) \qquad (5)

where Y_{Gen} is the output probability matrix of the discriminator for synthetic data and Y_{Real} is the output probability matrix of the discriminator for real data. To evaluate the generator and the discriminator, the notion of a score is used. The score of the generator can be found by the formula [6]:

S_G = \operatorname{mean}(Y_{Gen}) \qquad (6)

and the discriminator score can be found by the formula [6]:

S_D = 0.5\,\operatorname{mean}(Y_{Real}) + 0.5\,\operatorname{mean}(1 - Y_{Gen}) \qquad (7)
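As an illustration, a minimal NumPy sketch of Eqs. 4–7 is given below; the discriminator outputs used in the example are placeholder values, not results from the trained networks.

```python
# Minimal sketch of the adversarial losses and scores of Eqs. 4-7,
# given the discriminator outputs on a batch of real and generated vibration windows.
import numpy as np


def gan_losses(y_real, y_gen, eps=1e-8):
    L_G = -np.mean(np.log(y_gen + eps))                                       # Eq. 4
    L_D = -np.mean(np.log(y_real + eps)) - np.mean(np.log(1 - y_gen + eps))   # Eq. 5
    S_G = np.mean(y_gen)                                                      # Eq. 6
    S_D = 0.5 * np.mean(y_real) + 0.5 * np.mean(1 - y_gen)                    # Eq. 7
    return L_G, L_D, S_G, S_D


y_real = np.array([0.9, 0.8, 0.95])  # discriminator probabilities on real signals
y_gen = np.array([0.2, 0.4, 0.1])    # discriminator probabilities on generated signals
print(gan_losses(y_real, y_gen))
```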
3 Results and Discussion
3.1 Data Collection
The test rig conception included two sliding bearings, temperature sensors, displacement sensors, a coupling, and an electric motor (see Fig. 2). The displacement sensors measured the vibration of the rotor; the temperature sensors measured the temperature of the lubricant layer and were located in the bodies of the bearings.
Fig. 2. Test rig conception.
The motor control and data collection were carried out using a prototype layout of the switching unit, developed as part of the 2nd stage of the project "Creation of a digital system for monitoring, diagnosing and predicting the state of technical equipment using artificial intelligence technology based on domestic hardware and software". The displacement sensors were SIC IMA 12-06BE 1ZC0S analog sensors with voltage output; their sensing distance is 0...6 mm, their reproducibility is 0.3 mm, and their working temperature range is −25 °C...+70 °C. In the experiment, the frequency of data acquisition for these sensors was 1000 sig/sec. The temperature sensors were HEL705-U-1-12-C2 variable-resistance sensors. Their measuring temperature
range is −200 °C...+260 °C, and their accuracy is 0.2%. In the experiment, the frequency of temperature data acquisition was 1 sig/sec. The test rig was used to obtain an experimental dataset covering three different states of the rotor machine: no defects, misalignment in the coupling, and imbalance. The second state was based on a misalignment between the electric motor and the rotor: a 0.5 mm plate was placed under the right side of the electric motor base. Load disks were used to create the third state of the test rig; the unbalance was about 197 g·mm. We ran ten parallel experiments for these states. Each experiment lasted 10 min, and for each experiment we rebuilt the test rig and changed the lubricant oil, which gave us the opportunity to reset the state of the test rig. The frequency of acquisition of the rotor vibration data was 1000 sig/sec, the frequency of temperature data acquisition was 1 sig/sec, and the rotor speed was 1800 rpm. Both the raw and the smoothed rotor vibration signals were recorded; the smoothed signal was obtained with digital low-pass filters.
3.2 Fully Connected Neural Networks for Rotor Defect Diagnostics
According to the results of the experiments, for each experiment we received 600 thousand signals for each displacement sensor and 600 signals for each temperature sensor. We randomly selected 3 experiments from each class and used them only to test our algorithms; the seven remaining experiments were used for training and tuning the algorithms. Important aspects of fault diagnosis are the speed and accuracy of fault detection, so we must strive for high accuracy with a small amount of data. In this regard, we first conducted a series of computational experiments training fully connected neural networks on different amounts of data with different numbers of neurons. It is worth noting that for all computational experiments in this section we used a fully connected neural network with one hidden layer and an output layer with a softmax activation function. For the experiments, we used the smoothed rotor vibration data for the right bearing. The data were split into chunks of 100 signals with a step of 1000; the data along the X and Y axes were then concatenated into one array and normalized. The experiments differed in the time frames used. To train the networks, the data were divided into training, validation, and test sets in the ratio 80:15:5. The results of the experiments can be found in Table 1. Based on the data presented in Table 1, we can conclude that an increase in the number of neurons can improve the accuracy of defect detection; however, this trend is not constant, and, over all the options, the biggest improvement was about 3%. At short time intervals there is a tendency for the error to first decrease and then increase, and with decreasing time the accuracy of the networks drops significantly. Both observations are most likely related to overfitting caused by the decrease in the amount of training data: for the first option the training sample consisted of 12600 examples, but for the last option only 1260. Nevertheless, the best option turned out to be the 5-minute one.
Table 1. The error of fully connected neural networks depending on the number of neurons and the time of the experiment.

Diagnostic time | 10 neurons | 20 neurons | 50 neurons | 100 neurons | 200 neurons
10 min          | 32.9%      | 25.8%      | 24.5%      | 28.9%       | 27.3%
5 min           | 33.4%      | 29.5%      | 24.1%      | 24.3%       | 23.3%
2 min           | 37.2%      | 32.2%      | 27.1%      | 26.4%       | 27%
1 min           | 38.6%      | 35.4%      | 29.6%      | 31.3%       | 30.7%
This may indicate that, while maintaining a balance between the number of neurons and a sufficient number of training examples, it is possible to achieve good accuracy with a short defect detection time. Based on these observations, we decided to change the data generation parameters: we reduced the shift step from 1000 to 200, which allowed us to collect more data, and then trained the network to diagnose defects within one minute. Comparative results for training a network with a larger amount of data, together with the best variant from Table 1, are presented in Table 2.

Table 2. Comparative results for training a network with a large amount of data.

Diagnostic time       | 10 neurons | 20 neurons | 50 neurons | 100 neurons | 200 neurons
5 min                 | 33.4%      | 29.5%      | 24.1%      | 24.3%       | 23.3%
Previous data, 1 min  | 38.6%      | 35.4%      | 29.6%      | 31.3%       | 30.7%
New data, 1 min       | 29.3%      | 25.7%      | 25.1%      | 25.3%       | 31.2%
Based on the new data, we can conclude that for this sample the dependence of the network accuracy on the number of neurons is preserved. The increase in training data also made it possible to obtain better results even with a diagnosis time of 1 min. In reality, developers often face the problem of a small dataset when developing neural networks, and for rotary machines and power equipment this problem is very acute. Using data of several types can help improve network accuracy, so we also decided to take into account the temperature sensor data for the right bearing. However, this did not give a positive effect, which most likely indicates that the temperature changes are not associated with this kind of defect.
3.3 Generative Adversarial Networks for Increasing the Volume and Variety of Training Data
The generative adversarial networks described in Sect. 2.2 are capable of reproducing the data they were trained on. This makes it possible to generate new,
often slightly different data from the training set. Therefore, we decided to use this type of network to increase the size and variability of the training sample, which in theory should have a positive effect. Since our data is represented as a sequence of values, we used fully connected networks for both the generator and the discriminator. The generator had three hidden layers with 256, 512 and 1024 neurons, respectively, with the LeakyReLU activation function in the hidden layers and tanh in the output layer. The discriminator had three hidden layers with 1024, 512 and 256 neurons, respectively, also with LeakyReLU activations, dropout with a rate of 0.3, and a sigmoid output layer (Sect. 2.1). The input layer of the generator took a vector of 300 random values, and its output was a vector of the same size as a training example. The discriminator received either an example from the training sample or a vector produced by the generator, and returned a single evaluation value. In this type of network the learning rate plays an important role; the generator and discriminator learning rates were both 0.00002. The dataset presented earlier was used to train the GANs, with 2000 signal examples per class; a series of computational experiments showed that at least 1000 examples per class are necessary for successful training. After training the GANs, new data was generated for each class, and computational experiments were then carried out to train fully connected classification networks. We tested the influence of the number of generated examples, as well as the number of neurons, on the accuracy of defect classification for the rotary machine. The experimental results can be found in Table 3.
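A minimal sketch of the generator/discriminator pair described above is given next. The layer sizes, dropout rate 0.3, latent size 300 and learning rate 0.00002 are taken from the text; the sample length, batch size and training schedule are assumptions, not the authors' implementation.

```python
# Hedged Keras sketch of the fully connected GAN described in the text.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

LATENT, SAMPLE_LEN = 300, 200          # SAMPLE_LEN: length of one training vector (assumed)

def build_generator():
    m = models.Sequential([layers.Input(shape=(LATENT,))])
    for units in (256, 512, 1024):
        m.add(layers.Dense(units))
        m.add(layers.LeakyReLU())
    m.add(layers.Dense(SAMPLE_LEN, activation="tanh"))
    return m

def build_discriminator():
    m = models.Sequential([layers.Input(shape=(SAMPLE_LEN,))])
    for units in (1024, 512, 256):
        m.add(layers.Dense(units))
        m.add(layers.LeakyReLU())
        m.add(layers.Dropout(0.3))
    m.add(layers.Dense(1, activation="sigmoid"))
    m.compile(optimizer=optimizers.Adam(2e-5), loss="binary_crossentropy")
    return m

def build_gan(gen, disc):
    # Freeze the discriminator inside the combined model; it was compiled above
    # while trainable, so its own train_on_batch calls still update it.
    disc.trainable = False
    gan = models.Sequential([gen, disc])
    gan.compile(optimizer=optimizers.Adam(2e-5), loss="binary_crossentropy")
    return gan

def train(real_data, epochs=2000, batch=64):
    gen, disc = build_generator(), build_discriminator()
    gan = build_gan(gen, disc)
    for _ in range(epochs):
        idx = np.random.randint(0, len(real_data), batch)
        noise = np.random.normal(0, 1, (batch, LATENT))
        fake = gen.predict(noise, verbose=0)
        disc.train_on_batch(real_data[idx], np.ones((batch, 1)))   # real -> 1
        disc.train_on_batch(fake, np.zeros((batch, 1)))            # fake -> 0
        gan.train_on_batch(np.random.normal(0, 1, (batch, LATENT)),
                           np.ones((batch, 1)))                    # generator step
    return gen
```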
Table 3. Dependence of the classification error on the number of neurons and the amount of synthetic data.

| Amount of generated data | 20 neurons | 50 neurons | 100 neurons | 200 neurons |
|---|---|---|---|---|
| 25% | 25.4% | 24.7% | 24% | 23.1% |
| 50% | 25.6% | 22.5% | 22.1% | 22.5% |
| 100% | 25.5% | 23.2% | 21.9% | 22.2% |
| 150% | 26.7% | 22.6% | 21.8% | 23.5% |
Based on the data in Table 3, we can conclude that the use of artificial data makes it possible to increase the accuracy of the network. However, we observe an overfitting-like effect when the amount of generated data becomes equal to, or begins to exceed, the amount of real data. This
phenomenon occurs because the networks begin to pick up features specific to the generated data and pay less attention to the real data. Nevertheless, the improvement in accuracy of 3–4% suggests that the generated data makes the training sample larger and more diverse. There are techniques in which the training and validation samples are formed in a non-random way; often, artificial data is used only for training and not for validation. Following this idea, we placed the generated data in the training set and the real data in the validation set, which gave a sample of 15 thousand examples for training and 6 thousand for validation. However, this split resulted in a network error of about 48%, because the experimental data did not participate in the training process at all. We then conducted a series of computational experiments with different combinations of data. The best option was obtained when the training sample contained both real and generated data, and the validation set contained real data with a small amount of generated data: a training set of 21k examples (15k generated and 6k real) and a validation set of 8k examples (6k real and 2k generated). This option gave an error of about 20%. We then tested this sample on several network architectures. An "hourglass" network with three hidden layers of 100, 5 and 100 neurons, respectively, gave an error of about 19.7%. Among a number of networks with simple layer connections, the best option was a network with three hidden layers of 200, 100 and 50 neurons, respectively, with an error of 18.4%. However, increasing the number of layers, whether with decreasing or increasing numbers of neurons, did not give a positive result.
4
Conclusion
As a result, the concept of a test stand for recreating defects of rotary machines was proposed. Vibration and temperature data were obtained for different states of the bench. These data were then used in computational experiments on classifying the states of rotary machines using intelligent methods. We used fully connected neural networks for classification and tested them on datasets of different sizes and with different numbers of neurons, as well as with different network architectures. The best option was a network with three hidden layers of 200, 100 and 50 neurons, respectively, which achieved an error of 18.4%. GANs were then applied to generate new data, which made it possible to increase the accuracy of the network by about 3–4%. As a result, using all methods together, the accuracy of fully connected networks can be increased by about 6.5%. Acknowledgment. The work has been carried out at the Oryol State University named after I.S. Turgenev with the financial support of the Ministry of Science and Higher Education of the Russian Federation within the project "Creation of a digital system for monitoring, diagnosing and predicting the state of technical equipment using artificial intelligence technology based on domestic hardware and software", Agreement No. 075-11-2021-043 from 25.06.2021.
Author contributions. Yu. Kazakov and A. Fetisov developed the test rig and collected data for training neural networks. A. Kornaev proposed the idea of using GANs to generate artificial data. Kazakov recreated and trained the GANs. Stebakov and Kazakov conducted computational experiments on training fully connected neural networks with different parameters. R. Polyakov was in charge of supervising this work.
Heuristics Assisted by Machine Learning for the Integrated Production Planning and Distribution Problem
Matheus de Freitas Araujo1(B), José Elias Claudio Arroyo1, and Thiago Henrique Nogueira2
1 Departamento de Informática, Universidade Federal de Viçosa, Av. Peter Henry Rolfs, s/n, Viçosa, Minas Gerais 36570-900, Brazil {matheus.f.freitas,jarroyo}@ufv.br
2 Universidade Federal de Viçosa, Engenharia de Produção, Rodovia MG-230 - Km 7, Rio Paranaíba, Minas Gerais 38810-000, Brazil [email protected]
Abstract. This work addresses a problem that integrates the unrelated parallel machine scheduling and capacitated vehicle routing problems. In this integrated problem, a set of jobs must be processed on machines and then distributed using a fleet of vehicles to customers. The integrated problem’s objective is to determine the machines’ production scheduling and the vehicle routes that minimize the total weighted tardiness of the jobs. As the problem is NP-Hard, we propose four neighborhood search heuristics and a framework that uses machine learning to solve it. The framework aims to define the best neighborhood search heuristics to solve a given instance based on the problem characteristics. The proposed methods are evaluated and compared by computational experiments on a set of proposed instances. Results show that using a machine learning framework to solve the problem instances yields better performance than neighborhood search heuristics.
1
Introduction
Integrating cyber-physical systems (CPS), the Internet of Things (IoT), and the Internet of Services (IoS) into manufacturing industries will enable the creation of a "smart factory". The idea of a decentralized production system is the basis for the smart factory: such a system enables people, machines, and resources to communicate with one another. Smart factories are a fundamental characteristic of Industry 4.0 [4]. Within this context, factories will be able to be fully automated, with autonomous vehicles carrying out ordering, production, and distribution. Automation will make smart factories extraordinarily efficient but also highly flexible and individualized. New technologies and the advancement of Industry 4.0 have significantly changed production and distribution processes. However, companies should also consider other factors, such as customer experience, when making decisions. The approaches
mentioned in this paper are geared toward improving customer service levels rather than just minimizing costs. These approaches aim to meet deadlines with minimal delay in an integrated production planning and distribution problem. The problem consists of producing a set of products on unrelated parallel machines and then distributing them to their customers using a fleet of capacitated vehicles. The objective is to minimize the total weighted tardiness. Production scheduling and distribution problems are two widely studied combinatorial optimization problems. These problems are usually addressed independently; however, several works address them in an integrated way. Chen [1,2] presents an extensive review of works addressing these integrated problems. Tamannaei et al. [18] and Felix & Arroyo [3] study an integrated problem that uses a single machine and a heterogeneous fleet of vehicles to deliver items, with the objective of minimizing the total weighted tardiness of jobs and transport-related costs. Liu et al. [8], Ngueveu et al. [13] and Ribeiro and Laporte [15] studied a similar problem with the same production system, but with a homogeneous fleet of vehicles and the objective of minimizing the sum of the jobs' delivery times. Zou et al. [22] studied an integrated problem that uses a single machine and a fleet of vehicles with limited capacity, where the objective is to minimize the maximum delivery time. Ta et al. [17] studied an integrated problem that combines flow-shop scheduling with a single vehicle of infinite capacity to carry out deliveries, minimizing the total tardiness. Nagano et al. [11] studied an integrated production and distribution problem combining flow-shop scheduling and the capacitated vehicle routing problem. The objective is to find a sequence of orders that minimizes the integrated problem's makespan, that is, the delivery date of the last job to the last customer. The authors propose a mixed integer programming model and an Iterated Greedy algorithm to solve the problem. Martins et al. [10] studied a similar integrated problem in which the production system is modeled as a hybrid flow shop and a vehicle that can carry multiple loads delivers the items. The authors propose a mixed integer linear programming model and a biased-randomization variable neighborhood descent (BR-VND) metaheuristic as a solution. Hou et al. [5] developed an enhanced brainstorm optimization (EBSO) algorithm to solve the problem that integrates the distributed flow shop scheduling problem with the multi-depot vehicle routing problem, with the objective of minimizing total weighted earliness and tardiness. The algorithm's results are compared to the mathematical model and to other works available in the literature [7,9,14,19,21]. This work proposes a constructive heuristic and four neighborhood search heuristics to solve the production scheduling and distribution problem. Furthermore, we propose a framework that uses machine learning to predict the best heuristic to solve a given instance based on its characteristics. It is essential to look at the characteristics of each instance to determine the most appropriate algorithm, since the integrated problem is multi-component, i.e., some
instances will have characteristics that will make one of the sub-problems harder to solve than the others. The paper is organized as follows: Sect. 2 presents the problem. Section 3 explains the proposed heuristics and the framework. Section 4 presents the computational experiments, and Sect. 5 presents this work’s conclusions.
2
Problem Definition
There is a set of jobs I = {1, 2, ..., n} that must be processed, without preemption, on a set M = {1, 2, ..., m} of m unrelated parallel machines. Each job j has a processing time pij on each machine i, a due date dj, a tardiness penalty wj, and a size (space occupied in a vehicle) hj. In addition, there is a machine-dependent setup time sijk for processing job k immediately after job j; in general, sijk ≠ sikj. After the jobs are processed on the machines, they must be grouped into batches and delivered to their respective clients using a set L = {1, 2, ..., l} of l vehicles. For the routing problem, a graph G(V, A) is defined, where V = {0, 1, ..., n} is the set of nodes and A = {(j, k) : 0 ≤ j, k ≤ n, j ≠ k} is the set of arcs. Node 0 represents the depot, and each job is associated with a customer; the set of customers is I = {1, ..., n}. A travel time tjk is associated with each arc (j, k). Each vehicle v ∈ L has a capacity qv that must be respected. It is assumed that each vehicle can be used only once and that each customer is visited exactly once. The objective of the problem is to determine the processing order of the jobs on the machines and the delivery routes of the used vehicles so as to minimize the total weighted tardiness (TWT) of the jobs. The TWT is defined as TWT = Σ_{j∈I} wj Tj, where Tj = max{0, Dj − dj} is the tardiness of job j and Dj is its delivery time. To calculate Dj, first the processing completion time Cj of each job j and the departure (start) time Sv of each vehicle v are determined, with Sv ≥ max{Cj} over all jobs j belonging to the batch carried by vehicle v. To illustrate the problem, Fig. 1 presents a solution for an instance with three machines (m = 3), six jobs (n = 6), and four vehicles (v = 4). Figure 1 shows the scheduling of jobs on the machines and the distribution routes. The schedules of jobs on machines M1, M2 and M3 are [I5, I6], [I4, I1, I2] and [I3], respectively. The delivery routes for vehicles V1, V2, V3 and V4 are, respectively, {I0 → I4 → I5 → I0}, {I0 → I1 → I6 → I0}, {I0 → I2 → I0} and {I0 → I3 → I0}, where I0 represents the depot. The start times of the trips of vehicles V1, V2, V3, and V4 are 14, 25, 45, and 39, respectively. Figure 1 also shows the completion times and delivery times; for example, the completion and delivery times of job I6 are 23 and 61, respectively.
Fig. 1. The scheduling of jobs and delivery routes.
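To make the objective computation above concrete, the following is a minimal sketch of how the TWT of a fixed solution can be evaluated. The data structures (dictionaries of completion times, routes as job lists) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative evaluation of the total weighted tardiness for a given solution.
def total_weighted_tardiness(completion, routes, travel, due, weight, depot=0):
    """completion[j]: completion time C_j of job j on its machine.
    routes: list of job sequences, one per used vehicle.
    travel[(a, b)]: travel time t_ab between nodes (0 = depot)."""
    twt = 0.0
    for route in routes:
        start = max(completion[j] for j in route)   # S_v >= max C_j over the batch
        t, prev = start, depot
        for j in route:
            t += travel[(prev, j)]                  # arrival time = delivery time D_j
            twt += weight[j] * max(0, t - due[j])   # w_j * T_j
            prev = j
    return twt
```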
3
Proposed Algorithms
We present two solution-decoding algorithms and a constructive algorithm. Then, we propose four neighborhood search heuristics. Furthermore, we present a framework that uses machine learning to select the best neighborhood search heuristic to solve a problem instance. The algorithms are shown in detail in the following section. 3.1
Decoding Algorithms
Even though our problem has multiple components, we can divide the solution into two parts (sol = {s, r}), with the first part representing machine scheduling (s) and the second part representing vehicle routing (r). The scheduling s is built from the list of jobs L1, where L1 represents the order in which the jobs are inserted into the machines by the NEH decoding algorithm. Vehicle routing r is obtained from the list of jobs L2, and L2 represents the order of job insertion into routing by the PIFH decoding algorithm. The NEH decoding algorithm is based on the NEH heuristic proposed by [12]. This heuristic is frequently employed in scheduling problems. NEH inserts jobs from the L1 list sequentially on the machines. These jobs are assigned to the scheduling position, which minimizes machine completion time. The scheduling s is finally defined. PIFH is the second decoding algorithm. This algorithm is a modification of Solomon’s PIFH heuristic ([16]). The PIFH heuristic is an efficient route construction algorithm. This heuristic employs a greedy criterion to insert jobs into the routes sequentially. The PIFH decoding algorithm inserts jobs into the routes sequentially using the L2 list. Each algorithm iteration inserts a job into the routing to minimize the weighted tardiness (wt). The vehicle routes r are finally defined.
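The sketch below illustrates the NEH-style insertion logic described above: each job of L1 is tried at every position of every machine and placed where the machine completion time is smallest. The data structures and the handling of setup times are assumptions, not the authors' code.

```python
# Simplified sketch of the NEH-based scheduling decoder.
def machine_completion(i, seq, p, s):
    """Completion time of machine i for job sequence seq, with
    sequence-dependent setups s[i][prev][next] (no setup before the first job)."""
    t, prev = 0, None
    for j in seq:
        if prev is not None:
            t += s[i][prev][j]
        t += p[i][j]
        prev = j
    return t

def neh_decode(L1, p, s, n_machines):
    """Insert each job of L1 at the machine/position giving the smallest
    completion time of that machine."""
    schedule = [[] for _ in range(n_machines)]
    for job in L1:
        best = None
        for i in range(n_machines):
            for pos in range(len(schedule[i]) + 1):
                trial = schedule[i][:pos] + [job] + schedule[i][pos:]
                c = machine_completion(i, trial, p, s)
                if best is None or c < best[0]:
                    best = (c, i, trial)
        _, i, trial = best
        schedule[i] = trial
    return schedule
```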
3.2
Initial Solution
Using the decoders NEH and PIFH, an initial solution sol = {s, r} is obtained. The job sequencing on the machines is built from a list of jobs L1. A list of jobs in L2 is used to generate delivery routes. The list L1 is built by sorting the jobs in descending order of the sum of the median processing and setup times. The list L2 is constructed by sorting the jobs in increasing order of the difference between the due date and the job completion time (dj − Cj). Neighborhood search heuristics also use this rule to generate the list L2. 3.3
Neighborhood Search Heuristics
To improve the quality of the solutions, we developed four neighborhood search heuristics. The neighborhood search methods use two types of moves, exchange and insertion, applied to the job lists L1 and L2 used by the decoders. The exchange move swaps two elements of the list, creating a new list; the insertion move removes an element from the list and re-inserts it in a different position. Let R1(L) be the set of all neighbors of a list L obtained by exchange moves, and R2(L) the set of all neighbors of L obtained by insertion moves. The first heuristic performs the exchange and insertion movements on the job list L2. The Neighborhood Search Heuristic 1 (NSH 1) takes the lists L1 and L2 as input and performs the following steps (a sketch of NSH 1 is given after the step descriptions):
Step 1: From L1 and L2, the decoders determine the solution sol = {s, r}.
Step 2: For each list l' in Rk(L2), ∀k ∈ {1, 2}, the PIFH decoder builds the routing r', giving a new solution sol' = {s, r'}.
Step 3: Check if the new solution is better than the current solution (F(sol') < F(sol)).
Step 3.1: If the new solution is better, update the solution sol and the list L2 with l'.
Step 4: Return the best solution (sol).
The second Neighborhood Search Heuristic (NSH 2) works similarly to the previous algorithm; however, the exchange and insertion movements are performed on the list L1 instead of L2. The algorithm starts from the lists L1 and L2 and follows the steps below:
Step 1: From L1 and L2, the decoders determine the solution sol = {s, r}.
Step 2: For each list l' in Rk(L1), ∀k ∈ {1, 2}, the NEH decoder builds the scheduling s'.
Step 3: The list L2 is created from the scheduling s'.
Step 4: The PIFH decoder builds the routing r', giving a new solution sol' = {s', r'}.
Step 5: Check if the new solution is better than the current solution (F(sol') < F(sol)).
Step 5.1: If the new solution is better, update the solution sol and the list L1 with l'.
Step 6: Return the best solution (sol).
The Neighborhood Search Heuristic 3 (NSH 3) also performs exchange and insertion movements on the list L1. The only difference between NSH 3 and NSH 2 is Step 4, which should be replaced by:
Step 4: The Neighborhood Search Heuristic 1 is executed, passing the lists l' and L2 as parameters. NSH 1 returns a new solution sol'.
The Neighborhood Search Heuristic 4 (NSH 4) works like NSH 3. The only difference is that, in addition to executing Step 5.1, whenever a better solution is found the following steps are also performed:
Step 5.2: Update the lists Rk(L1), ∀k ∈ {1, 2}.
Step 5.3: Restart the search, returning to Step 2 from the beginning.
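The sketch below is one way to express NSH 1 in code. The decoder functions `neh_decode`, `pifh_decode` and the objective `evaluate` are placeholders standing in for the decoders and the TWT evaluation; their signatures are assumptions, not the authors' implementation.

```python
# Compact sketch of NSH 1: explore all swap (R1) and insertion (R2) neighbors of L2.
def neighbors(lst):
    n = len(lst)
    for a in range(n):                     # exchange moves: set R1(L2)
        for b in range(a + 1, n):
            nb = lst[:]
            nb[a], nb[b] = nb[b], nb[a]
            yield nb
    for a in range(n):                     # insertion moves: set R2(L2)
        for b in range(n):
            if a == b:
                continue
            nb = lst[:]
            job = nb.pop(a)
            nb.insert(b, job)
            yield nb

def nsh1(L1, L2, neh_decode, pifh_decode, evaluate):
    s = neh_decode(L1)                     # Step 1: machine schedule stays fixed
    r = pifh_decode(L2, s)
    best_val = evaluate(s, r)
    for l_new in neighbors(L2):            # Step 2: decode each neighbor of L2
        r_new = pifh_decode(l_new, s)
        val = evaluate(s, r_new)
        if val < best_val:                 # Steps 3 and 3.1: keep improvements
            best_val, r, L2 = val, r_new, l_new
    return (s, r), best_val                # Step 4: best solution found
```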
3.4
Framework
To solve combinatorial optimization problems, the algorithm configuration that produces the best average result for a given set of instances is usually chosen. In general, this algorithm yields good results in some cases and bad results in others. Based on this, we propose a framework that uses machine learning (ML) to determine the best heuristic for solving a specific instance of the problem. For this purpose, the instances of the problem were initially classified based on their similarities. We define 19 classes of instances. Following categorization, a random instance from each class was chosen to constitute the training set. The remaining instances were used as a test set. The proposed heuristics ran for a maximum of 600 s for each instance of the training set, and we chose the heuristic that determined the best solution. After running all the heuristics for the train set, a model based on ML is built to predict the best heuristic to solve an instance. The ML model has the following instance characteristics as input: number of jobs (n), number of machines (m), average processing time (p), average setup time (s), average travel time (t), average demand (h) and average vehicle capacity (q). The ML model output is the best heuristic to solve each specific instance based on its characteristics. The intention of training the ML model is to build a function h(x) : x → Y , so that h(x) is predicted to the corresponding value of Y . After the training process, we have a predictor h(x) which, given the characteristics of an instance, returns the best heuristic algorithm structure. The predictor is a Multilayer Perceptron (MLP) Neural Network with sequential linear layers. Each layer computes its output by using a linear combination of the inputs. The MLP was trained using the Adam algorithm described in [6]. This algorithm is a first-order gradient-based optimization algorithm for stochastic objective functions based on adaptive estimation of lower-order moments.
The Mean Squared Error (MSE) of the differences between the target output and the network output is used to evaluate network performance. The neural network training took about 3 h. However, once the model is trained, the time spent obtaining the recommended algorithm for any instance is close to 0.1 s. As shown in the following section, using this ML framework, it was possible to determine the best solutions for the integrated problem.
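One possible realization of such a selector is sketched below with scikit-learn. The seven instance features match the ones listed in the text; the layer sizes, label encoding and the toy training rows are assumptions, not the authors' exact configuration.

```python
# Hedged sketch of an MLP that maps instance features to the recommended heuristic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Features per instance: n, m, mean p, mean s, mean t, mean h, mean q (toy values)
X_train = np.array([[10, 2,  5.1,  4.8, 20.3, 7.2, 90.0],
                    [50, 8, 49.7, 50.2, 99.1, 7.0, 60.0]])
y_train = np.array(["NSH4", "NSH3"])        # best heuristic observed for each instance

selector = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam",
                  max_iter=2000, random_state=0),
)
selector.fit(X_train, y_train)

new_instance = np.array([[30, 4, 25.0, 9.5, 20.0, 7.1, 75.0]])
print(selector.predict(new_instance))       # e.g. ['NSH3']
```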
4
Computational Experiments
In this section, we present the results of experiments conducted to evaluate the performance of the proposed methods: the neighborhood search heuristics and the ML framework. All methods were written in Python 3 and ran on an Intel Core i5-3570 (3.4 GHz) CPU with 16 GB of RAM. The set of instances and all the algorithms presented in this paper are available at https://github.com/matheusinhofreitas/Machine Learning Framework. We measure solution quality by the relative percentage deviation (GAP) from the best-known solution obtained among all the methods. The GAP is defined by GAP = 100 × (Fmethod − Fbest)/Fmethod, where Fbest is the minimum TWT value obtained among all the compared methods and Fmethod is the TWT obtained with a given method. To carry out the computational experiments, a set of 720 random instances was generated based on some works from the literature. An instance of the problem is characterized by the following parameters: the number of machines m ∈ {2, 4, 8}, the number of jobs n ∈ {10, 15, 30, 50}, the setup times (generated in the ranges [1, 9] or [1, 99]), the average travel time between customers and the depot (20 or 100), and the total vehicle capacity (which is 20%, 50% or 90% higher than the total customer demand). Five instances were generated for each parameter combination, totaling 720 instances. 4.1
Computational Results
Initially, the ML Framework (MLF) is compared to the neighborhood search heuristics (NSH 1, NSH 2, NSH 3, and NSH 4). The average GAPs are classified based on the number of jobs and machines. Table 1 values represent the average GAPs and runtime across 60 instances. The smallest GAPs are highlighted in bold. Table 1 shows that NSH 4 outperforms the other methods for small instances (n = 10, 15 jobs). NSH 4 performs worst in the other instance groups (n = 30, 50). The ML framework obtains the best average GAP for two groups out of a total of 12 groups of instances. The heuristics NSH 1, NSH 2, NSH 3, and NSH 4 produced the best GAP for 0, 2, 2, and 6 groups of instances, respectively. The average GAP for the methods ML Framework, NSH 1, NSH 2, NSH 3, and NSH 4 is 9.9%, 47.8%, 28.1%, 16.4%, and 17%, respectively. The ML Framework is the best method when considering the overall average.
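The GAP values discussed above follow directly from the definition given earlier; a direct transcription is shown below for clarity.

```python
# Relative percentage deviation of a method's TWT from the best-known TWT.
def gap(f_method: float, f_best: float) -> float:
    return 100.0 * (f_method - f_best) / f_method
```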
Table 1. Average GAPs (%) and runtime (seconds) for the compared methods.

| Instances | ML Framework GAP | Time | NSH 1 GAP | Time | NSH 2 GAP | Time | NSH 3 GAP | Time | NSH 4 GAP | Time |
|---|---|---|---|---|---|---|---|---|---|---|
| n10 m2 | 13.7 | 7.75 | 62.6 | 0.07 | 38.1 | 0.16 | 09.8 | 8.94 | 04.6 | 17.94 |
| n10 m4 | 05.1 | 9.67 | 55.3 | 0.07 | 40.4 | 0.21 | 05.1 | 9.12 | 03.2 | 17.98 |
| n10 m8 | 03.3 | 9.66 | 34.2 | 0.07 | 33.0 | 0.25 | 03.3 | 9.04 | 02.2 | 14.95 |
| n15 m2 | 18.9 | 117.23 | 78.2 | 0.37 | 53.3 | 0.98 | 17.3 | 114.22 | 09.1 | 177.22 |
| n15 m4 | 11.6 | 124.18 | 57.2 | 0.38 | 50.6 | 1.06 | 12.1 | 116.12 | 03.3 | 179.36 |
| n15 m8 | 4.8 | 119.65 | 42.8 | 0.38 | 40.4 | 3.63 | 4.8 | 115.13 | 01.3 | 176.46 |
| n30 m2 | 19.4 | 106.86 | 59.3 | 6.67 | 07.3 | 22.47 | 37.2 | 180.33 | 42.4 | 180.12 |
| n30 m4 | 12.3 | 167.39 | 45.9 | 6.90 | 23.1 | 24.51 | 13.4 | 180.24 | 25.2 | 180.37 |
| n30 m8 | 06.1 | 180.35 | 28.9 | 7.09 | 25.3 | 30.85 | 03.8 | 180.57 | 14.6 | 180.32 |
| n50 m2 | 07.6 | 180.53 | 64.3 | 57.96 | 02.9 | 180.25 | 56.0 | 180.42 | 57.7 | 180.68 |
| n50 m4 | 07.0 | 180.54 | 33.0 | 61.32 | 07.9 | 180.06 | 24.7 | 180.20 | 31.4 | 180.37 |
| n50 m8 | 09.2 | 180.56 | 12.2 | 151.03 | 15.1 | 180.08 | 08.6 | 180.12 | 09.5 | 180.21 |
| Average | 09.9 | 115.4 | 47.8 | 24.4 | 28.1 | 52.0 | 16.4 | 121.0 | 17.0 | 138.7 |
The ML Framework obtained the best GAP for 396 instances (55%) out of 720. The heuristic NSH 4 found the best solution in 346 cases (48.1%), NSH 3 in 345 instances (47.9%), and NSH 2 and NSH 1 in 201 (27.9%) and 22 (3.1%) instances, respectively. Table 1 also shows that the execution time of the MLF is similar to that of the NSH 3 and NSH 4 algorithms. The only disadvantage of the MLF compared to the other algorithms is the need to train the machine learning model beforehand. However, this training only needs to be performed once: the MLF captures the profile of several instance classes during training, and retraining is only necessary if the instances to be solved are utterly different from the classes defined during model training. An Analysis of Variance (ANOVA) [20] was performed to determine whether the observed differences are statistically significant. Using the P-value of the ANOVA, which was 0.0 for the compared methods, and a critical value of 0.05 (5%), it is possible to conclude that there are statistically significant differences between the experiments. Figure 2 depicts the percentage of times the framework selected each neighborhood search heuristic to solve an instance (out of 720 instances): the framework used NSH 1 only once (0%), NSH 2 148 times (21%), NSH 3 530 times (74%), and NSH 4 in 41 cases (6%). Figure 3 shows how often (in percentage terms) the framework chose the best heuristic for an instance: the ML Framework chose the best heuristic in 57% of the cases and the second best in 34% of the cases, so in 91% of the cases the ML Framework selected an appropriate heuristic.
Fig. 2. Framework choices
Fig. 3. Framework assertiveness
Figures 4, 5, 6 and 7 show the average GAP values relative to the best-known solution for the main characteristics of the instances. The values were divided into intervals, and the average GAP of each resolution method within each interval is reported. Furthermore, the overall average of the instance results over all methods (AVG) is displayed; for a given instance group, the AVG reflects the dispersion of the methods. For instances of low complexity, all methods are expected to find solutions close to the optimal one, i.e., the AVG will be close to zero and the dispersion of the methods will be low. As complexity increases, only a few methods perform well with respect to the GAP, resulting in higher dispersion and a higher AVG. Analyzing Figs. 4, 6 and 7, it is observed that the AVG increases with the growth of p, h and q, while the increase in s (Fig. 5) only slightly affects the AVG. The increase in complexity with q occurs because the larger the average capacity, the fewer vehicles are available to the instance. The setup times s were generated independently of the other instance parameters; even so, a dispersion of the methods can be noticed as s grows. Figures 4 and 5 present the results of the algorithms with respect to the average processing time (p) and average setup time (s) of the instances, respectively. In these cases, the MLF proved competitive compared to the other heuristics, achieving good results for different ranges of values of the different characteristics. Furthermore, in some intervals NSH 4 proved to be more efficient than the other heuristics; this happens because the values found by NSH 4 in these cases determine the Fbest used to define the GAP. Figures 6 and 7 present the results of the algorithms with respect to the average demand (h) and average vehicle capacity (q) of the instances. In these cases as well, the MLF was able to find good results for the different ranges of values of the different characteristics. From these graphs, it can be observed that the MLF adapts to changes in the characteristics of the instances and is able to recognize and select the best heuristic for each instance.
Fig. 4. Average Processing Time (p)
Fig. 5. Average Setup Time (s)
Fig. 6. Average Demand (h)
Fig. 7. Average Vehicle Capacity (q)
5
Conclusions
This paper proposed a MILP model and four neighborhood search heuristics for the integrated production scheduling and distribution problem. We presented a machine learning framework to choose the best heuristic to solve each specific instance of the problem. Computational experiments revealed that the results obtained through the framework were more effective than the resolution of the MILP model and heuristic. Furthermore, changes in the characteristics of the instances had a minimal effect on the framework’s results, making this strategy suitable for determining the best algorithm to use to solve each problem instance. In future studies, we suggest implementing Local Search Algorithms and Iterated Local Search (ILS) to solve this problem. Furthermore, we recommended adapting the Machine Learning Framework to estimate the best local search algorithms to be used by the ILS to solve a given instance. In addition, the Machine Learning Framework should be used in other combinatorial optimization problems. The ML Framework can be used, in these cases, to estimate the best meta-heuristic to solve a given problem instance. Acknowledgments. This work was supported by CAPES and CNPq.
References 1. Chen, Z.L.: Integrated production and distribution operations. Handbook of quantitative supply chain analysis. 74, 711–745 (2004) https://doi.org/10.1007/978-14020-7953-5 17 2. Chen, Z.L.: Integrated production and outbound distribution scheduling: review and extensions. Oper. Res. 58(1), 130–148 (2010) 3. Felix, G.P., Arroyo, J.E.C.: Heur´ısticas para o sequenciamento da produ¸ca ˜o e roteamento de ve´ıculos com frota heterogˆenea. LII Simp´ osio Brasileiro de Pesquisa Operacional 52 4. Hofmann, E., R¨ usch, M.: Industry 4.0 and the current status as well as future prospects on logistics. Comput. Ind. 89, pp. 23-34 (2017) 5. Hou, Y., Fu, Y., Gao, K., Zhang, H., Sadollah, A.: modelling and optimization of integrated distributed flow shop scheduling and distribution problems with time windows. Expert Syst. Appl. 187, 115827 (2021) 6. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). CoRR abs/1412.6980 7. Kurdi, M.: A memetic algorithm with novel semi-constructive evolution operators for permutation flow shop scheduling problem. Appl. Soft Comput. 94, 106458 (2020) 8. Liu, L., Li, W., Li, K., Zou, X.: A coordinated production and transportation scheduling problem with minimum sum of order delivery times. J. Heuristics 26(1), 33–58 (2020) 9. Mao, J.Y., Pan, Q.K., Miao, Z.H., Gao, L.: An effective multi-start iterated greedy algorithm to minimize make span for the distributed permutation flow shop scheduling problem with preventive maintenance. Expert Syst. Appl. 169, 114495 (2021) 10. Martins, L.D.C., Gonzalez-Neira, E.M., Hatami, S., Juan, A.A., Montoya-Torres, J.R.: Combining production and distribution in supply chains: the hybrid flow-shop vehicle routing problem. Comput. Ind. Eng. 159, 107486 (2021) 11. Nagano, M., Tomazella, C., Tavares-Neto, R., Abreu, L.: Solution methods for the integrated permutation flow shop and vehicle routing problem. J. Proj. Manage. 7(3), 155–166 (2022) 12. Nawaz, M., Enscore, E.E., Jr., Ham, I.: A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 11(1), 91–95 (1983) 13. Ngueveu, S.U., Prins, C., Calvo, R.W.: An effective memetic algorithm for the cumulative capacitated vehicle routing problem. Comput. Oper. Res. 37(11), 1877– 1885 (2010) 14. Pan, Q.K., Gao, L., Wang, L., Liang, J., Li, X.Y.: Effective heuristics and metaheuristics to minimize total flowtime for the distributed permutation flow shop problem. Expert Syst. Appl. 124 (2019) 15. Ribeiro, G.M., Laporte, G.: An adaptive large neighborhood search heuristic for the cumulative capacitated vehicle routing problem. Comput. Oper. Res. 39(3), 728–735 (2012) 16. Solomon, M.M.: Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 35(2), 254–265 (1987) 17. Ta, Q.C., Billaut, J.C., Bouquard, J.L.: Heuristic algorithms to minimize the total tardiness in a flow shop production and outbound distribution scheduling problem. In: 2015 International conference on industrial engineering and systems management (IESM), pp. 128–134 (2015)
18. Tamannaei, M., Rasti-Barzoki, M.: Mathematical programming and solution approaches for minimizing tardiness and transportation costs in the supply chain scheduling problem. Comput. Ind. Eng. 127, 643–656 (2019) 19. Wang, S., Wu, R., Chu, F., Yu, J.: Variable neighborhood search-based methods for integrated hybrid flow shop scheduling with distribution. Soft Comput. 24(12), 8917–8936 (2020) 20. Zar, J.H.: Biostatistical analysis. 4th. New Jersey, USA, p. 929 (1999) 21. Zhang, X., Li, X.T., Yin, M.H.: An enhanced genetic algorithm for the distributed assembly permutation flow shop scheduling problem. Int. J. Bio-Inspired Comput. 15(2), 113–124 (2020) 22. Zou, X., Liu, L., Li, K., Li, W.: A coordinated algorithm for integrated production scheduling and vehicle routing problem. Int. J. Prod. Res. 56(15), 5005–5024 (2018)
LSTM-Based Congestion Detection in Named Data Networks
Salwa Abdelwahed1,2(B) and Haifa Touati1,2
1 Hatem Bettahar IResCoMath Lab, University of Gabes, Gabes, Tunisia [email protected], [email protected]
2 Faculty of Sciences of Gabes, Gabes, Tunisia
Abstract. Nowadays, Named Data Networking (NDN) is the most prominent paradigm based on Information Centric Networking (ICN). Seen as the Internet of the future, it retrieves data by content name rather than by location. Congestion control is still a research focus in NDN: because of in-network caching, and since Interests can be satisfied at different nodes, end-to-end TCP congestion control does not work, and congestion control protocols must be adapted to the specific features of NDN. Congestion detection is an essential step of any congestion control procedure, yet some non-intelligent techniques only detect congestion when it occurs and then flag it. In this paper, we propose an intelligent consumer-based detection method that improves congestion control efficiency by giving the consumer the ability of perception and by speeding up detection, so that network overload can be known in advance. To detect congestion, the proposed scheme integrates a perception and prediction strategy based on a type of Recurrent Neural Network, namely Long Short Term Memory (LSTM).
Keywords: Named Data Network · Congestion detection · Long Short Term Memory (LSTM)

1
Introduction
Given the evolution of network traffic towards higher speed, greater diversity and larger volume, research has focused on new ways of managing, processing and storing data in the network. Several efforts have attempted to move from host-centric to content-centric architectures, which led to the design of Information Centric Networks (ICN). Among these designs, the Named Data Networking (NDN) architecture is the most prominent. As stated in RFC 7927 [1], NDN re-designs the Internet communication paradigm from an IP-address-based model to a content-name-based model. A content can be retrieved from any intermediate router or from the original content source, according to the cache policies. In NDN, content retrieval is managed by the exchange of two types of packets: an Interest packet sent by the node that requests the content, i.e. the consumer, and a Data packet sent either by the content provider, i.e. the producer, or by any intermediate router that
caches a copy of the content. To manage data dissemination, each NDN node implements three data structures: the Content Store (CS), the Pending Interest Table (PIT) and the Forwarding Information Base (FIB). The CS stores Data replicas in order to serve upcoming demands for the same content. The PIT tracks the outstanding Interests and the interfaces from which they arrived, in order to deliver the corresponding Data packet back along the Interest's reverse path. Finally, the FIB is used for Interest routing. Upon receiving an Interest packet, the NDN node first searches the CS: if a matching Data is available, it is returned on the incoming interface. Otherwise, the node searches the PIT; if a matching entry is found, the incoming interface is appended to that entry, and if not, a new PIT entry is created and the Interest is forwarded through the interface(s) specified by the FIB [2]. When a Data packet is received, the NDN node checks the PIT: if a matching entry exists, it forwards the Data packet to all interfaces listed in the entry, removes the corresponding PIT entry and caches the Data in the CS; otherwise, it rejects the packet. Congestion control is among the most important research topics in NDN. Congestion events mainly result from network overload and full queues: if the consumer emission window is too large, packets queue up in routers, RTT values increase and packets may be lost. The consumer experiences congestion when a packet drop takes place and then decreases its Interest sending rate. RTT-based congestion control methods do not work effectively in NDN, since Data packets can be brought from any intermediate node in the network [3]. Congestion can cause throughput degradation and ineffective use of network resources. Hence, we need to speed up congestion detection so that network overload can be known in advance and network resources can be operated and controlled effectively. In the literature, NDN congestion detection methods include detecting an excessive increase in the PIT size, comparing the Interest sending rate with the Data arrival rate [4], and monitoring the outgoing queue sizes of intermediate nodes and returning explicit notifications [5–7]. Recently, machine learning techniques have achieved good results in several fields, including networking, which encourages us to explore their potential for congestion prediction. In this paper, we propose a method for predicting the level of congestion in NDN using an LSTM model based on Recurrent Neural Networks. The remainder of the paper is organized as follows: Sect. 2 presents the background and reviews recent research related to NDN congestion control. Section 3 describes the proposed LSTM-based congestion detection method, and Sect. 4 evaluates its performance through different simulations. Finally, Sect. 5 concludes the paper and presents some future directions.
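A schematic sketch of the Interest/Data pipeline just described is shown below. It is a simplification for illustration only (exact-match lookups, no name prefixes, lifetimes or multiple FIB faces), not ndnSIM code.

```python
# Simplified NDN node: Content Store lookup, PIT aggregation, FIB forwarding.
class NdnNode:
    def __init__(self, fib):
        self.cs = {}          # name -> Data
        self.pit = {}         # name -> set of incoming faces
        self.fib = fib        # name -> outgoing face (real NDN uses longest-prefix match)

    def on_interest(self, name, in_face, send):
        if name in self.cs:                        # 1. CS hit: answer locally
            send(in_face, self.cs[name])
        elif name in self.pit:                     # 2. PIT hit: aggregate the request
            self.pit[name].add(in_face)
        else:                                      # 3. create PIT entry, forward via FIB
            self.pit[name] = {in_face}
            send(self.fib[name], ("interest", name))

    def on_data(self, name, data, send):
        if name not in self.pit:                   # unsolicited Data is rejected
            return
        for face in self.pit.pop(name):            # satisfy every pending face
            send(face, data)
        self.cs[name] = data                       # cache the replica for future Interests
```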
2
Background and Related Works
2.1
Long Short Term Memory Background
Recurrent Neural Networks (RNNs) are able to generate multiple outputs from a single input. An RNN is a type of deep network with a repetition between two passes, forward and backward. Plain RNNs remember only the most recent part of the information
and suffer from the vanishing gradient problem, a phenomenon that appears when the number of layers increases and information is lost: weight values become almost equal and the gradient becomes smaller and smaller. Moreover, RNNs built from sigmoid or tanh cells are not able to learn the relevant information of the input data and the long-term dependencies of a time series when the input gap is large [13]. A solution for learning long-term dependencies and for the vanishing gradient problem is the LSTM cell, a type of RNN that combines a short-term and a long-term memory and is frequently used for time series prediction thanks to its powerful learning capacity [14]. It allows the network to store or forget information from the past, depending on whether or not that information is decisive for predicting the future. An explanatory diagram of the LSTM cell is shown in Fig. 1.
Fig. 1. LSTM prediction model [19]
where ht−1 is the value of the previous hidden state, Xt is the new input value of the sequence, ht is the hidden state at time t, Ct is the cell state at time t, and ft and ot are the forget and output gates. The hidden layers of an LSTM keep their state as well as the weight values so that sequential information is not lost, which improves learning. LSTM networks have been used for modeling dynamic systems in multiple domains such as image processing, energy consumption, autonomous systems, speech recognition and communication [15]. Time series prediction is a process that extracts useful historical information and determines future values; this mechanism is very useful in network transmissions, since it gives the notion of perception and lets us know the state of the network before sending a request or deciding whether to increase or decrease the transmission rate by controlling the sending of Interests.
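For reference, the standard update equations of the LSTM cell shown in Fig. 1 are given below (the input gate i_t, which the figure groups with the cell update, completes the usual formulation; W and b denote learned weight matrices and bias vectors).

```latex
% Standard LSTM cell updates, with \sigma the sigmoid and \odot the elementwise product.
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\!\left(C_t\right)
\end{aligned}
```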
2.2
Related Works
Regarding the particular features of NDN, a large body of work has been proposed in the literature to address the transport strategy challenges and to design congestion control schemes specific to NDN. A summary of these works is presented in Table 1. In [9], the authors proposed a Rate Based Congestion Control (RB-CC) method for NDN. To detect congestion, RB-CC compares the consumer's Interest sending rate with the Data arrival rate: if the Data arrival rate is lower than the Interest sending rate, congestion is deduced and the Interest sending rate is decreased multiplicatively; if the two rates are equal, the Interest sending rate is increased following the CUBIC TCP curve [16]. The authors noticed that the accuracy of the measurement is not tied to the RTT but depends on the network traffic. In another work [10], Yi et al. proposed a consumer-based control scheme called BBR-Inspired Congestion control for Data fetching over NDN (BBR-CD). This scheme estimates the Bandwidth Delay Product (BDP) of the bottleneck in order to compute and control the Interest pacing rate and the NDN congestion window. BBR-CD integrates four modules: a loss detector, a BDP estimator, an inflight controller and an Interest scheduler. In [11], the authors proposed an Intelligent Forwarding Strategy for congestion control using Q-Learning and LSTM (IFSQLSTM). An LSTM model is used to predict the new PIT entry rate in the next time gap and the amount of Data packets based on the PIT entry rate, in order to know whether congestion has taken place; the Data is then sent over a non-congested path selected with Q-learning according to the LSTM output. The authors of [12] proposed an Adaptive Congestion Control Protocol (ACCP) which uses a Deep Belief Network (DBN) with Restricted Boltzmann Machines (RBM) for traffic prediction. The DBN is trained on the time series to learn a low-dimensional feature representation, which is used to train a Gaussian Restricted Boltzmann Machine (GCRBM) that predicts the later time series data. The congestion level is then estimated and fed back to the receiver in a NACK packet; based on these notifications, the receiver adjusts the Interest sending rate according to the Exponential Increase Addition Increase Multiplication Decline algorithm. The authors of [8] propose an Edge computing Aided Congestion Control (EACC) scheme using Neuro-Dynamic Programming (NDP) in NDN. EACC detects congestion during transmission along the path by monitoring buffer occupancy in intermediate nodes; a new congestion indicator field added to the Data packet is used to notify the congestion event, and congestion is then avoided by controlling Interest emissions. Interest transmission control is performed by deploying a computing unit in each edge router, and the forwarding process is formulated as a Markov Decision Process (MDP) solved using NDP. Finally, the authors of [17] propose an Intelligent Edge-Aided Congestion Control scheme for NDN (IEACC). A proactive congestion detection scheme is implemented in intermediate routers, which exchange congestion information using Data packets. IEACC divides the Data packets according to their degree of congestion
using a clustering algorithm, and provides specific inputs to the Deep Reinforcement Learning (DRL) algorithm implemented in the edge routers in order to form a trusted neural model for maintaining fairness and avoiding congestion.

Table 1. NDN congestion control mechanisms

| Protocol | Year | Consumer or router based | ML-based | Congestion detection method | Congestion control method |
|---|---|---|---|---|---|
| RB-CC [9] | 2021 | Consumer-based | − | Compare interest sending rate and data arrival rate | Multiplicative decrease/CUBIC |
| BBR-CD [10] | 2021 | Consumer-based | − | BDP estimation | BBR-CD, RTT filtering and interest scheduling |
| IFSQLSTM [11] | 2021 | Router-based | + | LSTM | Q-learning |
| ACCP [12] | 2018 | Router-based | + | Deep learning | Adjust Interest sending rate |
| EACC [8] | 2020 | Edge router-based | + | Record congestion status in data packet | NDP |
| IEACC [17] | 2022 | Router-based | + | Proactive congestion detector | Clustering and DRL |

3
LSTM-Based Congestion Detection
The main problem in congestion control is that the sender does not know the network capacity until after it detects a packet drop. Some detection strategies monitor the outgoing queues of each router and detect congestion only once it has occurred in the network. Given the wide adoption of machine learning techniques in various fields, they have also been integrated into several strategies in the field of transmission networks. To accelerate the congestion detection step, we propose to make predictions. Inspired by the work presented in [11], we design a consumer-based approach that uses an LSTM neural network to predict the packet data rate, in order to give the consumer the ability of perception. To predict the packet data rate we propose the use of a Seq2Seq LSTM, also named encoder-decoder LSTM. Seq2Seq LSTMs are used for predicting an output sequence based on an input sequence of time series values [15]. Time series forecasting requires analyzing the data to extract its characteristics and is suited to non-stationary data, which is the case of network transmissions, where the rate can vary suddenly. The structure reads the inputs and predicts an output for each input at each time step, forming a vector representation of the predicted packet data rate sequence of each consumer with the same dimension as the recorded input packet data rate sequence. The LSTM can predict a long sequence and extract the long- and short-
term dependencies. Congestion can be detected by comparing the input and output stream throughput: congestion is signaled when the predicted data rate value at time t + 1 is less than the corresponding Interest packet rate. In this way, our scheme allows network congestion to be controlled effectively. The chosen LSTM parameters are shown in Table 2. Our scheme is presented in Fig. 2: first, we simulate the network with the presented topology and with different parameters over long simulation periods, to extract a sufficient number of Indata rate samples and build a sequence that undergoes LSTM prediction. The model is based on a CNN, 3 LSTM layers and 1 dense layer. We accumulate the sample sequence and use one part for training and the other for testing, predicting an equivalent output for each input. We observed that the more data we use, the better the LSTM predicts. A sequence-input, sequence-output model is therefore well suited to our application.

Table 2. LSTM settings

| Parameters and hyperparameters | Setting values |
|---|---|
| Activation function | Relu/Sigmoid |
| Loss function | MSE |
| Optimizer | Adam |
| Num of LSTM layers | 3 |
| Num of dense layers | 1 |
| Epochs | 50 |
| Verbose | 1 |
| Model | sequential |
Relu/Sigmoid MSE Adam 3 1 50 1 sequential
4
Performance Evaluation
We simulated our proposed scheme using the ndnSIM simulator [18]. As shown in Fig. 3, we used a topology widely used for congestion-related scenarios, with 2 consumers, 2 producers and a bottleneck link between the 2 routers. Simulation parameters are summarized in Table 3. To show the reliability of LSTM prediction, we simulated three scenarios using the NDN default PCON algorithm [4]: variable and non variable link with Consumer CBR application, and a non variable link with Consumer ZipfMandelbrot application. In the non variable link scenario, the 2 consumers are present from the beginning until the end of transmission, while in the variable link scenario, consumer 2 shares the transmission link with consumer 1 only in defined periods, during the other periods, consumer 1 is the only consumer in the network. A Seq2Seq LSTM, was implemented using python 3.7. The model used is a CNN-LSTM, with 3 layers, the activation functions are Relu and Sigmoid, the number of units is 32.
Fig. 3. NDN simulation topology

Table 3. Simulation parameters

| Scenario | Interest rate | Routers CS size | Consumer application | Caching strategy |
|---|---|---|---|---|
| Consumer CBR | 100 I/s | 1000 | CBR | LRU |
| Consumer ZipfMandelbrot | 40 I/s | 50 | ZipfMandelbrot | LRU |
The prediction results of consumer 1 Indata rate are shown in Figs. 4, 5 and 6. LSTM shows a good prediction of Indata rate sequence in different scenarios, especially in the case of using a large number of samples. The prediction obtained has a level of variance close to the original series. Figure 5 shows that the model predicts the peaks of increase and decrease of Indata rate caused by
the variable link scenario. In Fig. 6 using the ConsumerZipfMandelbrot application and when the rate becomes more variable than the scenarios using the ConsumerCbr application, LSTM also follows the same level of variance. The model accuracy presented in Figs. 7, 8 and 9 show a good fit to the data. The loss function, defines the prediction error. In our case, it decreases until reaching 0.0388, 0.0051 and 0.1570, respectively for the three scenarios non variable ConsumerCbr, variable ConsumerCbr, and non variable ConsumerZipfMandelbrot. A value close to zero of loss function shows the very small difference between the Actual and the Predicted Indata values.
Fig. 4. LSTM Indata rate prediction for Consumer1 (Non variable Link)
Fig. 5. LSTM Indata rate prediction for Consumer1 (Variable link)
Fig. 6. LSTM Indata rate prediction for Consumer1 (ConsumerZipfMandelbrot)
Fig. 7. Model accuracy (non variable link)
Fig. 8. Model accuracy (variable Link)
Fig. 9. Model accuracy (ConsumerZipfMandelbrot)
5 Conclusion
Predicting traffic information during transmission is needed to detect congestion and estimate its level, which is effective for congestion prevention. In this paper, we have proposed an LSTM-based congestion detection method that predicts the future Indata rate based on its current value
calculated at the consumer side in order to give it the ability of perception and prediction. The LSTM structure has shown its capability of time series forecasting and we have achieved good results. Our future work will be devoted to the design of a congestion control solution complementary to our detection method.
References
1. Kutscher, D., Eum, S., Pentikousis, K., Psaras, I., Corujo, D., Saucez, D., Schmidt, T., Waehlisch, M.: Information-Centric Networking (ICN) research challenges. RFC 7927, 1–38 (2016)
2. Touati, H., Mejri, S., Malouch, N., et al.: Fair hop-by-hop interest rate control to mitigate congestion in named data networks. Cluster Comput., 2213–2230 (2021)
3. Mejri, S., Touati, H., Kamoun, F.: Are NDN congestion control solutions compatible with big data traffic. In: International Conference on High Performance Computing & Simulation (HPCS), pp. 978–984 (2018)
4. Klaus, S., Cheng, Y., Beichuan, Z., Lixia, Z.: A practical congestion control scheme for named data networking. In: 3rd ACM Conference on Information-Centric Networking, pp. 21–30 (2016)
5. Mejri, S., Touati, H., Kamoun, F.: Preventing unnecessary interests retransmission in named data networking. In: IEEE International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6 (2016)
6. Mejri, S., Touati, H., Kamoun, F.: Hop-by-hop interest rate notification and adjustment in named data networks. In: IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6 (2018)
7. Mejri, S., Touati, H., Malouch, N., Kamoun, F.: Hop-by-hop congestion control for named data networks. In: 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), pp. 114–119 (2017)
8. Qin, J., Xing, Y., Wei, W., Xue, K.: Edge computing aided congestion control using neuro-dynamic programming in NDN. In: IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2020)
9. Sichen, S., Lixia, Z.: Exploring rate-based congestion control in NDN. In: 8th ACM Conference on Information-Centric Networking, pp. 141–143 (2021)
10. Yi, H., Constantin, S., Lan, W., Alex, A., Lixia, Z.: BBR-inspired congestion control for data fetching over NDN. In: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), pp. 426–431 (2021)
11. Sanguk, R., Inwhee, J., WonTae, K.: Intelligent forwarding strategy for congestion control using Q-Learning and LSTM in named data networking. Mobile Information Systems, pp. 1–10 (2021)
12. Tingting, L., Mingchuan, Z., Junlong, Z., Ruijuan, Z., Ruoshui, L., Qingtao, W.: ACCP: adaptive congestion control protocol in named data networking based on deep learning. Neural Comput. Appl. 31, 4675–4683 (2019)
13. Yong, Y., Xiaosheng, S., Changhua, H., Jianxun, Z.: A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation, pp. 1235–1270 (2019)
14. Vincent, F., Peter, H., Riashat, I., Marc, G., Bellemare, Joelle, P.: An Introduction to Deep Reinforcement Learning. Foundations and Trends in Machine Learning (2018)
15. Benjamin, L., Timo, M., Hannes, V., Nasser, J., Michael, W.: A survey on long short-term memory networks for time series prediction. Procedia CIRP, 650–655 (2021)
16. Sangtae, H., Injong, R., Lisong, X.: CUBIC: a new TCP-friendly high-speed TCP variant. Operating Systems Review, pp. 64–74 (2008)
17. Jiayu, Y., et al.: IEACC: an intelligent edge-aided congestion control scheme for named data networking with deep reinforcement learning. IEEE Trans. Network Serv. Manage. (2022)
18. Mastorakis, S., Afanasyev, A., Moiseenko, I., Zhang, L.: ndnSIM 2.0: a new version of the NDN simulator for NS-3. NDN Technical Report NDN-0028 (2015)
19. Wei, X., Chuan, X., Hongling, L., Xiaobo, L.: A hybrid LSTM-based ensemble learning approach for China coastal bulk coal freight index prediction. J. Adv. Transp., 1–23 (2021)
Detection of COVID-19 in Computed Tomography Images Using Deep Learning
Júlio Vitor Monteiro Marques1, Clésio de Araújo Gonçalves2, José Fernando de Carvalho Ferreira3, Rodrigo de Melo Souza Veras1, Ricardo de Andrade Lira Rabelo1, and Romuere Rodrigues Veloso e Silva1,3(B)
1 Computer Science - PPGCC/UFPI, Teresina, Piauí, Brazil
[email protected]
2 Electrical Engineering - PPGEE/UFPI, Picos, Piauí, Brazil
3 Information Systems - CSHNB/UFPI, Picos, Piauí, Brazil
Abstract. COVID-19 is an infectious disease caused by the novel coronavirus (SARS-COV-2). The global total number of cases is 618 million, leading to 6.5 million deaths by October 2022. As this disease is highly contagious, diagnosis and the necessary measures to prevent its spread, including quarantine, must be carried out early. To help with diagnosis and screening, we propose a deep learning solution that leverages CT images. This solution is based on a new methodology for image pre-processing that focuses on improving image characteristics, combined with data augmentation, transfer learning and fine-tuning of the model. Of the three convolutional neural networks used - ResNet101, VGG19 and InceptionV3 - the ResNet101 model had the best performance, reaching 98% Accuracy, 96.84% Kappa, 97.87% Precision, 97.55% Sensitivity and 97.70% F1-Score. The promising results demonstrate that the proposed method can help specialists to detect this disease.
Keywords: Computed Tomography · COVID-19 · Deep Learning · Fine-Tuning · Pre-Processing · Transfer Learning
1 Introduction
COVID-19 is an infectious disease caused by the novel coronavirus (SARS-COV-2) [1], which has recorded more than 618 million confirmed cases and 6.5 million deaths worldwide as of October 2022 [2]. Symptoms can range from mild clinical cases with rapid recovery to severe health cases caused by pneumonia, respiratory failure, septic shock, and multiple organ failure [3]. The standard for diagnosing COVID-19 is real-time reverse transcription polymerase chain reaction examination with pharyngeal swabs (RT-PCR) [4]. As this disease is highly contagious, the diagnosis must be made early to take the necessary measures, including isolating the patient, thus preventing the proliferation of the disease.
Nonetheless, other methods for evaluation and diagnosis, such as radiography and computed tomography (CT), have been widely studied. These tests may contribute evidence of suspected COVID-19 [5]. Radiography is usually the most available and the most indicated for evaluating regions such as the chest, and it is the first-line test for detecting pneumonia. However, for COVID-19, radiography has limited performance compared to chest CT [6]. Chest CT may reveal lesions that cannot be detected on a radiograph. However, the analysis of chest CT images requires great manual effort, which makes the work exhausting, in addition to requiring a team familiar with the image findings suggestive of viral pneumonia compatible with COVID-19 [7]. The present work aims to automatically identify chest CT images that have findings compatible with pulmonary diseases, such as COVID-19 and pneumonia. For this purpose, we used deep learning techniques combined with image pre-processing and data augmentation techniques. This work analyzed the following convolutional neural network (CNN) architectures: ResNet101, VGG19, and InceptionV3, using transfer learning and fine-tuning on a dataset of chest CT images with three classes: COVID-19, pneumonia, and healthy. In addition, we investigated the impact of image pre-processing and its advantages to ensure the reliability and effectiveness of the method used. With this, we intend to reduce the manual workload and overcome the lack of teams specialized in such findings. As contributions, we can mention: 1) the evaluation of fine-tuning after transfer learning; 2) a novel pre-processing method for CT images; and 3) a more thorough evaluation of the pre-processing method using the activation regions of the networks used.
2 Related Work
The rapid spread of COVID-19, due to its high transmission rate, has motivated several studies on the detection of pulmonary diseases using deep learning techniques on chest CT images. During the literature survey, it was possible to observe works with promising performance with respect to the metrics used. This section presents recent studies on detecting viral pneumonia caused by COVID-19 on chest CT images. In the study by Seum et al. [8], the authors evaluated the performance of 12 pre-trained deep learning models using image pre-processing and segmentation, where ResNet18 had the best performance, reaching 89.92% accuracy, 80.40% sensitivity, 99.59% specificity, 99.50% precision, and 88.94% F1-Score. In the work [9], the authors proposed a new deep learning model with self-adaptive auxiliary loss called DSN-SAAL. Using image pre-processing techniques, the proposed model obtained 99.43% accuracy, 99.36% sensitivity, 99.52% specificity, 99.44% F1-Score and 99.95% AUC. In [10], the authors compared the performance of three ResNet architectures [11] (ResNet-18, ResNet-34 and ResNet-50), using the datasets of [12] and [13]. Using pre-processing techniques on CT images, ResNet-18 obtained the best results, with an accuracy of 94.30%, a sensitivity of 91.40%, a specificity of 97.30%, a precision of 97.10%, an F1-Score of 94.20% and an AUC of 98.50%. In the work [14], the authors
proposed a new diagnostic model based on a seven-layer neural network, using image pre-processing and data augmentation techniques. The proposed method obtained 94.03% accuracy, 94.44% sensitivity, 93.63% specificity, and 94.06% F1-Score. Finally, in the work [15], the authors proposed a methodology for selecting images and a new pyramid-shaped network based on ResNet50V2. The proposed method presented better results in general, obtaining 98.49% accuracy, 94.96% sensitivity, 94.70% specificity, and 81.26% precision. During the literature survey, it was possible to observe that the authors did not fully explore the use of pre-processing techniques in their research, limiting themselves to resizing and cropping images, which can lead to a loss of image quality. To investigate this issue, we propose a method that uses pre-processing steps to improve the input data of CNN architectures. Additionally, we applied the fine-tuning technique to improve the performance of the evaluated models and investigated the use of three CNNs: ResNet101, VGG19, and InceptionV3. Thus, we can also investigate the impact of the pre-processing methodology on the results.
3 Materials and Methods
3.1 Image Acquisition
One of the main problems in developing deep learning research in the context of the new coronavirus in CT images is the scarcity of data. To work around this issue, we use the COVIDx CT-2A [16] image base. The COVIDx CT-2A dataset was generated from several open datasets and comprises 194,922 CT slices from 3,745 patients, publicly available on Kaggle1. This set is composed of three classes: Healthy (60,083 images), Pneumonia (40,291 images), and COVID-19 (94,548 images), divided into training, validation, and testing sets with unbalanced data. Unbalanced data can lead to problems in building the models and in the final generalization of the network. To avoid these problems, we balance the data using a partial data subsampling technique. Finally, we have a balanced dataset with 76,200 samples for training and 18,720 samples for validation, keeping the original 25,658 samples for the test set.
3.2 Pre-processing
The pre-processing step for CT images can improve image quality, highlight the regions of interest, and improve the performance of CNN models. Therefore, we present the methodology for pre-processing chest CT images in this section. The dataset images come from many different data sources: there are several types of CT with different characteristics, resolutions, dimensions, and noise. The methodology's objective is to remove this noise and highlight the region of interest without losing details, improving the classification of these images. Figure 1 demonstrates all the steps of the methodology used. 1
https://www.kaggle.com/hgunraj/covidxct.
Fig. 1. Pre-processing methodology flow. We present the outputs after each preprocessing step that we used on the images during the proposed methodology.
The input images received transformations to highlight the region of interest. First, we applied the Otsu threshold to binarize the images; then we inverted the background of the images and applied an erosion with a structuring disc element of size 15. Values between 2 and 20 were tested for the size of the structuring element. The erosion at this stage was responsible for eliminating the small regions outside the region of interest, which consists of the rib cage. After that, we applied the convex hull to highlight the region of interest, using the set of pixels included in the smallest convex polygon that surrounds all the white pixels; it was then possible to obtain the segmentation of the region of interest in the original images by multiplying the original image with the convex hull. In the next step, a crop is applied to the images, keeping only the rib cage; then the zero-padding technique is applied to complete the images. Finally, we resize all the images to 300 × 300 pixels, leaving them with square dimensions to serve as input for the CNNs.
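The following sketch illustrates one possible implementation of this chain with scikit-image. It is not the authors' code; details such as the exact inversion step, the cropping rule and the padding layout are assumptions.

# Hypothetical sketch of the described pre-processing chain using scikit-image.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_erosion, disk, convex_hull_image
from skimage.transform import resize

def preprocess_ct_slice(img):
    """img: 2-D grayscale CT slice as a NumPy array."""
    # 1) Otsu binarization, then invert so the body region becomes foreground.
    mask = img > threshold_otsu(img)
    mask = ~mask
    # 2) Erosion with a disc of size 15 removes small regions outside the rib cage.
    mask = binary_erosion(mask, disk(15))
    # 3) Convex hull highlights the whole region of interest.
    hull = convex_hull_image(mask)
    # 4) Segment the ROI by multiplying the hull with the original image.
    roi = img * hull
    # 5) Crop to the bounding box of the hull (keep only the rib cage).
    ys, xs = np.nonzero(hull)
    roi = roi[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # 6) Zero-pad to a square, then resize to 300 x 300 for the CNNs.
    h, w = roi.shape
    s = max(h, w)
    roi = np.pad(roi, ((0, s - h), (0, s - w)), mode="constant")
    return resize(roi, (300, 300), preserve_range=True)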
3.3 Data Augmentation
Although we have a large image dataset compared to others in the literature, we used data augmentation after analyzing many state-of-the-art works showing that this technique brings greater generalization ability to CNNs. The parameters used to augment the data were: horizontal flip, vertical flip, zoom range of 0.05, rotation range of 360°, width shift range of 0.05, height shift range of 0.05, and shear range of 0.05. These parameters were chosen based on the work of [15].
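A possible Keras configuration reproducing these augmentation parameters is sketched below; the paper later states that Keras with the TensorFlow backend is used, but this exact generator is an assumption rather than the authors' code.

# Sketch: augmentation settings matching the parameters listed above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.05,
    rotation_range=360,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
)
# train_flow = train_augmenter.flow(x_train, y_train, batch_size=32)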
3.4 Evaluated Architectures
Deep Learning is a subfield of artificial intelligence that trains algorithms to perform complex tasks like humans. One of these tasks is image identification.
Convolutional neural networks (CNNs) are used for this imaging task; this type of network has a deep and hierarchical architecture. CNNs use convolution layers to extract features from images and represent them as information. This information passes through pooling layers to reduce its dimensionality; at the end of the network, the fully connected layers are responsible for classifying this information. After analyzing several works, the ResNet101, VGG19, and InceptionV3 networks were chosen. The first architecture is VGG-19, a version of the VGGNet network proposed by [17]. This network has 19 trainable layers, divided into five convolutional blocks and three fully connected layers, with a total of 143 million parameters. Second, we have the Residual Neural Network (ResNet), proposed by [11] to solve the vanishing gradient problem that occurs when many layers are added to a sequential model; it is formed by residual blocks with skip connections between the block's input and output. ResNet has five architectures of different depths: 18, 34, 50, 101, and 152 trainable layers. In this work, ResNet101, which has a topology of 101 layers, is evaluated. Finally, we have InceptionV3, proposed by [18] as a successor to the GoogLeNet and InceptionV2 architectures. With 159 layers and 24 million parameters, InceptionV3 has symmetric and asymmetric blocks, convolutional layers, MaxPooling, feature concatenation, dropout, and dense layers.
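For illustration, the three backbones can be loaded with ImageNet weights through keras.applications as sketched below. The 300 × 300 × 3 input shape follows the pre-processing step; include_top=False and the freezing loop are assumptions about the implementation.

# Sketch: the three evaluated backbones with ImageNet weights and no original head.
from tensorflow.keras.applications import VGG19, ResNet101, InceptionV3

backbones = {
    "VGG19": VGG19(weights="imagenet", include_top=False, input_shape=(300, 300, 3)),
    "ResNet101": ResNet101(weights="imagenet", include_top=False, input_shape=(300, 300, 3)),
    "InceptionV3": InceptionV3(weights="imagenet", include_top=False, input_shape=(300, 300, 3)),
}
for net in backbones.values():
    net.trainable = False   # transfer learning: convolutional weights stay frozen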
3.5 Proposed Method
Figure 2 shows the workflow of the proposed method. In the first step, the input images go through pre-processing, generating square images where the region of interest is highlighted for classification. After that, the images are resized to 300 × 300, and we apply the data augmentation technique to the training images. The final step is feature extraction and classification using CNN models with transfer learning and fine-tuning. The models are initialized with ImageNet pre-trained weights; since these layers are frozen, only the weights of the convolutional layers are reused. At the end of the evaluated CNNs, we added two dense layers of 1024 neurons with a dropout of 0.2 that use Relu as the activation function. The last layer, also known as the classification layer, is a dense layer with a softmax function that classifies images into three classes: COVID-19, Healthy, and Pneumonia. To compile the CNNs, we use the Adam optimizer with a learning rate of 0.001 and categorical cross-entropy as the loss function. We trained each pre-trained model for 20 epochs and used accuracy as the evaluation metric. The model with the best performance was fine-tuned. For this, we unfroze half of the layers and retrained the model for ten epochs to adjust the weights. The choice of parameters was made through empirical tests, where several values were tried, and the ones mentioned above showed the best performance. In Table 1, we can see all the hyperparameters used to configure the networks.
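A minimal sketch of this head and compilation step, built on one of the frozen backbones from the previous snippet, is shown below. The pooling layer (here global average pooling) and the helper names are assumptions, not the authors' implementation.

# Sketch of the classification head described above, on top of a frozen backbone.
from tensorflow.keras import Model, layers, optimizers

def build_classifier(backbone):
    x = layers.GlobalAveragePooling2D()(backbone.output)   # pooling step (assumed GAP)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    out = layers.Dense(3, activation="softmax")(x)          # COVID-19 / Healthy / Pneumonia
    model = Model(backbone.input, out)
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_classifier(backbones["ResNet101"])
# model.fit(train_flow, validation_data=val_flow, epochs=20)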
Fig. 2. Flowchart of the proposed methodology and its steps.
Table 1. Hyperparameters used during transfer learning (TL) and fine tuning (FT).
Models | E | Op | LR | BS | DL | NN | Dp
Models with TL | 20 | Adam | 0.001 | 32 | 3 | 1024,1024,3 | 0.2
Models with FT | 10 | Adam | 0.00001 | 32 | 3 | 1024,1024,3 | 0.2
4 Experimental Results
In this section, we report the results in three subsections. In the first, we report the classification results of the networks trained through transfer learning with and without image pre-processing; in the second, we demonstrate the importance and the difference made by the image pre-processing method. Finally, in the third subsection, we report the best results of the model after transfer learning, where the model was fine-tuned to adjust the weights. All tests were performed on the test dataset, of which the networks have no prior knowledge. In the experiments, we used five evaluation metrics: accuracy (Acc), Kappa index (Kappa), precision, sensitivity (Sens), and F1-Score (F1) [19]. We implemented all algorithms and CNNs on an 11 GB RTX 2080 Ti GPU. We use the Keras [20] library with the Tensorflow backend to develop and run the algorithms.
4.1 Transfer Learning Results
In Table 2, we present the results obtained by the models trained during 20 epochs using transfer learning. Also, we show the results with and without the pre-processing methodology. According to Table 2, there is a small improvement in the metrics when we use the model with the pre-processed images. However, this performance gain
Table 2. Results obtained with the models with and without image pre-processing and their parameter numbers in millions (M).
Models | Acc | Kappa | Precision | Sens | F1 | Parameters
No Pre-processing:
VGG19 | 93.74 | 90.08 | 93.36 | 92.66 | 92.99 | 143.7M
ResNet101 | 95.33 | 92.68 | 94.57 | 95.18 | 94.86 | 44.7M
InceptionV3 | 91.11 | 85.91 | 90.32 | 89.11 | 89.55 | 23.9M
Pre-processing:
VGG19 | 93.88 | 90.30 | 93.63 | 92.58 | 93.03 | 143.7M
ResNet101 | 95.68 | 93.16 | 95.56 | 94.78 | 95.11 | 44.7M
InceptionV3 | 92.17 | 87.61 | 91.45 | 90.74 | 91.03 | 23.9M
is not substantial, as the metrics alone may not reveal the real value of the pre-processing method. We will make a more detailed evaluation in the next subsection. The method that uses pre-processing maintained the quality of the images, highlighting the region of interest. Analysing the metrics, although there seems to be only a small difference between the raw dataset and the pre-processed dataset, there is a clear visual difference when using Grad-CAM [21] to compute the activation regions of the models. We can observe that the model is identifying and learning the lesions, thus illustrating the true result obtained. In Fig. 3, we can see this difference. In the experiment trained on the original images, the model learns the dataset's characteristics and may not represent the problem. On the other hand, the experiment that uses pre-processing focuses on the essential characteristics to classify these images, highlighting the regions of interest and maintaining quality, which brings us greater reliability in the results. Furthermore, the proposed method is more robust considering the Grad-CAM views, making it possible to classify images from different acquisition methods and keeping the focus within the region of interest.
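For reference, a hedged TensorFlow 2 sketch of the Grad-CAM computation used above [21] is given below; the layer-name argument and the helper structure are assumptions, not the authors' implementation.

# Sketch: Grad-CAM for a tf.keras classifier; last_conv_name is the model's
# last convolutional layer.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index=None):
    conv_layer = model.get_layer(last_conv_name)
    grad_model = tf.keras.Model(model.input, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)              # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # pool the gradients per channel
    cam = tf.reduce_sum(weights[:, tf.newaxis, tf.newaxis, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalized heat-map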
4.2 Fine-Tuning Results
After analyzing the metrics of the models with the transfer-learning technique, we selected ResNet101 as the best model to perform the fine tuning. We unfreeze its convolutional layers and then freeze half of them to adjust the weights. To recompile the model, we used the same transfer-learning method, with a single difference in the learning rate of the Adam optimizer, where the rate used for fine-tuning was 0.00001. In this experiment, we used the pre-processed dataset, as it has the regions of interest highlighted and gives us greater reliability in the results. Table 3 presents ResNet101 results using transfer-learning (TL) compared to the fine-tuning (FT) approach. It was possible to notice an increase in the metrics after adjusting the weights.
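The fine-tuning step can be sketched as follows, assuming the Keras model from the earlier snippets; the exact layer-count split and helper names are assumptions.

# Sketch: unfreeze the model, re-freeze its lower half, and recompile with the
# smaller learning rate from Table 1.
from tensorflow.keras import optimizers

def fine_tune(model, epochs=10):
    model.trainable = True
    half = len(model.layers) // 2
    for layer in model.layers[:half]:      # keep the lower half frozen
        layer.trainable = False
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_flow, validation_data=val_flow, epochs=epochs)
    return model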
Fig. 3. Visual representation using Grad-CAM showing the activation regions for each model with and without pre-processing.
Table 3. Comparison between ResNet101 with transfer learning (TL) and fine tuning (FT) and their parameter numbers in millions (M).
Models | Acc | Kappa | Precision | Sens | F1 | Parameter
(TL) | 95.68 | 93.16 | 95.56 | 94.78 | 95.11 | 44.7M
(FT) | 98.00 | 96.84 | 97.87 | 97.55 | 97.70 | 44.7M
5 Discussion
In Table 4, we compare the proposed method with the state-of-the-art works addressed in this paper. It is possible to verify that the results obtained are encouraging and very close to the literature. We can also see that fine-tuning affects the model's performance, and that pre-processing gives us greater reliability in the predictions. We surpass some works, but we still need to optimize the proposed model and the pre-processing method to achieve better results. The proposed method presents an evaluation with the second largest number of images, behind only the work of [15], while presenting promising metrics when compared to the literature.
Table 4. Comparison of the proposed work with the state-of-the-art.
Methods | Acc | Kappa | Precision | Sensitivity | F1
Seum et al. 2020 | 89.92 | - | 99.50 | 80.40 | 88.94
Kai hu et al. 2021 | 99.43 | - | - | 99.36 | 99.44
Xuan Cai et al. 2020 | 94.30 | - | 97.10 | 91.40 | 94.20
Zhang et al. 2022 | 94.03 | - | - | 94.44 | 94.06
Rahimzadeh et al. 2021 | 98.49 | - | 81.26 | 94.96 | -
Proposed Method | 98.00 | 96.84 | 97.87 | 97.55 | 97.70
6 Conclusion
The present work presents a method for classifying images with pulmonary diseases using CNNs combined with image pre-processing, data augmentation, transfer learning, and fine-tuning techniques. From the results obtained, we can conclude that the use of CNNs can be of paramount importance to classify CT images and detect the presence of COVID-19 in real environments, in addition to being possible to classify between normal pneumonia and that caused by COVID-19, that have similar characteristics and can be easily confused. With the proposed methodology, it was also possible to observe that the ResNet101 architecture obtained greater robustness for the problem in question compared to the tested models. It is also possible to highlight the importance of image pre-processing and its advantages. In future works, we intend to improve the proposed pre-processing method, using other techniques to increase the potential of the evaluated model. We also intend to optimize the hyperparameters and select attributes using genetic algorithms so that we can reduce the execution time of the predictions; In addition to using networks for the task of segmentation, highlighting the regions of lesions caused by COVID-19, thus providing additional assistance to the specialist in the monitoring and diagnosis of this disease. Acknowledgments. This work was carried out with the full support of the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES) - Financial Code 001.
References
1. World Health Organization: Coronavirus disease (COVID-19). Accessed Oct. 18, 2022
2. World Health Organization: Weekly epidemiological update on COVID-19 - 12 October 2022. Accessed Oct. 12, 2022
3. Kazimierczuk, M., Jozwik, J.: Analysis and design of class E zero-current-switching rectifier. IEEE Trans. Circuits Syst. 37(8) (1990)
4. Wang, W., Yanli, X., Gao, R., Roujian, L., Han, K., Guizhen, W., Tan, W.: Detection of SARS-CoV-2 in different types of clinical specimens. JAMA 323(18), 1843–1844 (2020)
5. ACR: American College of Radiology. ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. Accessed Oct. 10, 2022
6. Godet, C., Elsendoorn, A., Roblot, F.: Benefit of CT scanning for assessing pulmonary disease in the immunodepressed patient. Diagn. Interv. Imaging 93(6), 425–430 (2012)
7. Rosa, M.E.E., et al.: COVID-19 findings identified in chest computed tomography: a pictorial essay. Einstein (Sao Paulo, Brazil) 18, eRW5741 (2020)
8. Seum, A., Raj, A., Sakib, S., Hossain, T.: A comparative study of CNN transfer learning classification algorithms with segmentation for COVID-19 detection from CT scan images. In: International Conference on Electrical and Computer Engineering, pp. 234–237 (2020)
9. Kai, H., Huang, Y., Huang, W., Tan, H., Chen, Z., Zhong, Z., Li, X., Zhang, Y., Gao, X.: Deep supervised learning using self-adaptive auxiliary loss for covid-19 diagnosis from imbalanced ct images. Neurocomputing 458, 232–245 (2021) 10. Cai, X., Wang, Y., Sun, X., Liu, W., Tang, Y., Li, W.: Comparing the performance of resnets on covid-19 diagnosis using ct scans. In: 2020 International Conference on Computer, Information and Telecommunication Systems (CITS), pp. 1–4 (2020) 11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) 12. Zhao, J., Zhang, Y., He, X., Xie, P.: . Covid-ct-dataset: A CT scan dataset about COVID-19. CoRR, abs/2003.13865 (2020) 13. Soares, E., Angelov, P., Biaso, S., Froes, M.H., Abe, D.K.: Sars-cov-2 ct-scan dataset: A large dataset of real patients ct scans for sars-cov-2 identification. medRxiv (2020) 14. Zhang, Y., Satapathy, S.C., Zhu, L.-Y., G´ orriz, J.M., Wang, S.: A seven-layer convolutional neural network for chest ct-based covid-19 diagnosis using stochastic pooling. IEEE Sensors J. 22(18), 17573–17582 (2022) 15. Rahimzadeh, M., Attar, A., Sakhaei, S.M.: A fully automated deep learning-based network for detecting covid-19 from a new and large lung ct scan dataset. Biomed. Signal Process. Control 68, 102588 (2021) 16. Gunraj, H., Wang, L., Wong, A.: Covidnet-ct: a tailored deep convolutional neural network design for detection of covid-19 cases from chest ct images. Front. Med. 7, 608525–608525 (2020) 17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv e-prints, arXiv:1409.1556 (2014) 18. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016) 19. Vieira, P., Sousa, O., Magalh˜ aes, D., Rabˆelo, R., Silva, R.: Detecting pulmonary diseases using deep features in x-ray images. Pattern Recogn. 119, 108081–108081 (2021) 20. Chollet, F., et al.: Keras (2015) 21. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Gradcam: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017)
Abnormal Event Detection Method Based on Spatiotemporal CNN Hashing Model
Mariem Gnouma1,2(B), Ridha Ejbali1,2, and Mourad Zaied1
1 Research Team on Intelligent Machines, National Engineering School of Gabes, University of Gabes, Street Omar Ibn El Khattab, Zrig Eddakhlania, 6029 Gabes, Tunisia
[email protected], {ridha_ejbali,mourad.zaied}@ieee.org
2 Department of Computer Sciences, Faculty of Sciences of Gabes, Erriadh City, 6072 Zrig, Gabes, Tunisia
Abstract. With the development of public security awareness, anomaly detection has become a crucial demand in surveillance videos. To improve the accuracy of abnormal event detection, this paper proposes a novel spatio-temporal architecture called the spatio-temporal CNN hashing model. We propose a deep CNN learning framework that exploits differential binary motion image data to learn informative hash codes and accurately classify abnormal events. Specifically, we exploit the feature learning capabilities of CNN architectures to learn representative, compact binary hash codes and address the domain adaptation problem. To attain efficient computation of feature distances, we include hashing learning in the deep network: a hashing layer is inserted after the last fully connected layer to map the high-dimensional, real-valued features into low-dimensional binary features. Extensive experiments on a publicly available dataset demonstrate the effectiveness of our framework, which achieves state-of-the-art performance.
Keywords: Convolutional Neural Network · Feature extraction · Hashing code · Abnormal detection
1 Introduction
Abnormal event detection aims to identify aberrant events that do not conform to expectation, where only normal examples are available during training [1]. It plays a key role in intelligent surveillance. However, it is a very challenging task for the following reasons. First, irregular events are defined by their circumstances; the same activity may be normal or anomalous in different scenes. Second, classic binary-classification approaches are inapplicable to abnormal event detection, due to the absence of irregular events in the training set [2, 3]. Third, some normal events happen frequently while others happen rarely; therefore, the training data exhibit large intra-class variations and an unbalanced distribution. Nevertheless, we can define an anomaly as an activity in a given set of data that does not conform to an established standard. In recent years, the continuous progress of computer vision has promoted the analysis and detection of abnormal events, and great progress has been made in abnormality detection.
Recently, as a powerful big data learning tool, deep neural networks have provided many excellent models. Among them the most used is the convolutional neural network (CNN) which is trained by backpropagation of layers of convolution filters to achieve a better performance especially for the tasks of recognition of anomalous activities in public places [4]. CNN is designed to adaptively and automatically learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. The key of the method in this paper is to track the features of CNN changes over time. In order to capture the temporal variation of CNN features, we cluster them into a set of binary codes. However, this is the first work that suggests the use of existing Deep CNN hashing models combined with spatio-temporal information for the representation of movement in crowd investigation. In this paper, we demonstrate that spatiotemporal aspects for abnormal behavior with discriminative features would be useful in reaching an accurate deep network. This article introduces an effective detection process that captures local variations in the appearance of video blocks. Although we show that we can obtain better performance by employing deeper models for feature extraction, we choose to use the VGG-f model [5] which permits us to process the video streams in real-time at 20 frames per second on a normal CPU. In this paper, we propose a novel deep hashing CNN (DHC), which integrates the hashing learning into a CNN to extract the deep feature for abnormal events classification. Primary, we used a well-pretrained VGG network to simultaneously extract deep features of the DBMI frames. Then, a hashing layer is inserted after the last fully connected layer (fc8) to transform the high-dimension real value features into the lowdimension compact binary features, which can considerably speed up the computation for feature distance. Next, a loss function is used to minimize the feature distance of similar pairs. Lastly, the deep features obtained through the trained DCH is fed into an Softmax classifier for abnormal events detection and localization. Different from the previous approaches that only study the semantic data of the individual pixel, the proposed technique aims at developing the correlation between them. Additionally, the hashing technique is presented to learn the compact binary codes. To the best of our knowledge, this is the first time to use the hashing learning into deep CNN networks for abnormal events detection and localization. This paper offers an integral and automated vision-based detection system in crowded scenes. It aims at presenting the following contribution: Our method comprises four steps: 1) The reconstruction of a Differential Binary Motion Image (DBMI) 2) The extraction of Deep CNN based binary features from a sequence of input DBMI images 3) The reconstruction of a novel Deep CNN Hashing model (DCH) 4) The compute of the spatio-temporal Deep CNN Pattern measure using the extracted deep binary hash codes.
2 Related Work With a fast development of video acquisition devices, surveillance system is broadly used for privacy protection, criminal tracking, monitoring process, etc. Most of the surveillance acts captured for a long time are absurd and usual. Traditional surveillance systems rely on a human operator to monitor scenes and discover rare or unequal events by observing monitor screens. However, observing surveillance video is a labor-intensive task. Therefore, to address the problem, automatic abnormal event detection has increasingly involved attention in computer vision fields. Yet, they have multiple definitions in reality. In this paper, we treat low-possibility actions as abnormal events, which convert ambiguous concepts into operational ones. In the previous years, several advances have been made in the fields of abnormal event detection. For an overview of this topic, see [6]. Abnormal detection technologies can be divided into two categories: Methods based on handcrafted representation, methods based on deep learning. By comparing the approaches of computer vision, deep learning approaches show advantages. Yet, in handcrafted representation, the model describes the motion or the appearance of the whole human body. It creates the global structure of an object, and the region of interest is encoded as an entire part. The global illustration of human action is realized by optical flow [7] or silhouette [8]. Meanwhile, in [8], the authors considered features as spatio-temporal properties obtained from joint information of human body color and depth. In fact, they compute intensity variation feature from the human depth silhouettes and joint displacement along with specific-motion features from the human body joints data. In [9], the authors used silhouette to introduce five classes of abnormal activities. In this method, moving targets are extracted by image differencing technique. For each image, spatial features are derived using the silhouette data such as the center of gravity and size. Gnouma et al. [10] utilized a Block Matching motion estimation algorithm based on acceleration and changes of the human body silhouette area to detect anomalous human events. They thought to use Block Matching algorithm to interpret the movement of the optical flow and its light intensity to detect fall. Nevertheless, those methods does not depend on human detection or segmentation, so it can cause many problems for anomaly detection. Compared with those methods, our proposed DBMI is able to track spatio-temporal interest points to provide a good estimation of modeling group interaction. Moreover, it takes full advantage of the knowledge of normal data. It has a stronger ability of flexibility to openly handle the large intra-class variations in normal facts. Furthermore, we concentrated on typical image of actions in each video to simplify the interpretation of actions. Though traditional handcrafted models still achieve poorly on complex highdimensional datasets particularly when only normal type samples are specified, the question of how to learn their characteristic properties is still a challenging research problem. To improve the performance of this classification and solve the problem of inadequate training of data in network traffic anomaly detection, an interesting approach is to complete the one-class classification task through the use of a deep learning method. 
Deep learning approaches have reduced the computational complexity and improved the learning of complex relationships; an overview of this topic is introduced in [6]. Recent studies in the field have used both handcrafted features
and deep learning features. For example, Ilyas et al. [11] proposed a hybrid deep network method that combines handcrafted with deep learning features to discover irregular events. Mohtavipour et al. [12] developed a model that identifies abnormal objects using a specific features resulted from handcrafted approaches. These features are related to speed, appearance, and typical frame and fed to a CNN as temporal, spatial, and spatiotemporal streams [13]. The use of CNNs with motion features was explored by several researchers to recognize anomalous events. Gnouma et al. [14] proposed a model that primary identifies the region of interest using a binary quantization map and then uses a stacked auto encoder to identify anomalies. Our studies used both handcrafted features and deep learning CNN features. Our proposed solution integrated semantic information (inherited from the existing CNN model) with spatio-temporal data with minimal additional training cost. As well, it is distinguished from other studies in the field that identify and locate abnormal events in public and private scenes in that our suggested solution overcame the difficulty of binary classifications and achieved high results compared to other classification.
3 Proposed Approach
Fig. 1. Proposed DCH framework
3.1 Spatiotemporal Stream To accrue spatial data, we used an evaluated version of our approach proposed in [15] for the input of the stream of the DCH network. Its overall system structure is illustrated in Fig. 1. DBMI is a binary image that characterizes moving objects in consecutive video frames. To build a DBMI, Moving substances are extracted by the image differencing process. The output of the method is a binary image Img x, y).
To create Img(x, y), it is essential to measure the difference between consecutive frames t and t-1. The accumulation of Img(x, y) over successive frames gives the DBMI frame, defined as follows:
DBMI_t(x, y) = \bigcup_{i=1,\, i=i+2}^{n} f(t)\,\big[ Img_{t-i}(x, y) - Img_{t-(i+1)}(x, y) \big]   (1)
In this equation, DBMI_t(x, y) is the foreground binary image, Img_{t-i}(x, y) denotes the binary frame sequence containing the ROI, and f(t) is the weight function, which gives higher importance to more recent frames. Next, normalization processes are performed to obtain the final image input to our DCH network.
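A rough NumPy illustration of this accumulation is given below; the difference threshold and the linear weighting used for f(t) are assumptions, not values from the paper.

# Illustrative sketch of Eq. (1): accumulate weighted inter-frame differences
# into a single binary motion image.
import numpy as np

def build_dbmi(frames, thresh=25, keep=0.4):
    """frames: list of consecutive grayscale frames (2-D arrays), newest last."""
    n = len(frames) - 1
    weights = np.linspace(0.5, 1.0, n)        # f(t): favour the most recent differences
    acc = np.zeros(frames[0].shape, dtype=float)
    for i in range(n):
        diff = np.abs(frames[i + 1].astype(int) - frames[i].astype(int))
        acc = np.maximum(acc, weights[i] * (diff > thresh))   # union of weighted binary maps
    return (acc > keep).astype(np.uint8)       # final binary DBMI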
Fig. 2. DBMI extraction in consecutive images
Figure 2 displays examples of DBMI extraction in consecutive images. As can be appreciated in normalized DBMI, many pixels have been identified and it can crop a discriminative feature for irregular activities. In addition, in DBMI feature extraction stage, unimportant areas with low speed have been removed. In unusual activities datasets, videos concenter on human presences, but it is very common that humans do not all lead the framework, so it can be a big problem for the classification phase. For this kind of analysis, a new suggestion for image normalization is offered to extract only the region of interest in the image and to eliminate the black background. We advise that we eliminate all the insignificant information in the input frame. The output of the first binary image includes white regions of moving objects and black regions of stationary objects and background. As can be seen in the first extraction, the white area contains all moving pixels and it cannot define the action comportment well, but in DBMI (last frame), only very fast pixels have been perceived and it can yield a discriminative feature for irregular events recognition. Furthermore, in DBMI, unimportant regions with low speed have been detached from original frame. 3.2 Network Architecture Due to the complex condition of abnormal events detection, irregular events classification takes many challenges. As, different objects may distribute similar spectral or
Fig. 3. The network structure used in our method.
curves properties of the same objects. Most existing deep learning-based methods use label data as supervised data to train deep models, which cannot successfully solve the above problem of aberrant detection especially in crowded scene. In this paper, we propose a novel feature extraction approach to learn the deep features for abnormal events detection. Figure 3 demonstrate the framework, which can be generally divided into Deep Features learning (DFL) and Hashing learning (HL). 3.2.1 Deep Feature Learning High-level features learned using convolutional neural networks (CNNs) are most effective in many computer vision tasks [16]. To construct our appearance features, we consider the pre-trained CNN architecture that can process frames as fast as possible, namely VGG-f [5]. Since we want abnormal action detection to work in real time on a standard desktop computer without an expensive GPU, the VGG-f is the best choice because it can process around 20 frames per second. We note here that better anomaly detection performance can be obtained by using deeper CNN architectures such as VGG-verydeep [17], or GoogLeNet [18]. The VGG-f model is trained on the Avenue datatsets [19]. We use a pre-trained CNN model to extract deep features as shown below. Given an input video, we adjust the image to 250 × 250 pixels. Then we subtract the average frame from each frame and use it as input to the VGG-f model. We implement the neural network as a deep CNN, which consists of five-convolution layers conv1 – conv5 and 3 fully connected layers fc6 - fc8 followed by a loss layer. In our model, we introduce a hashing layer hash-fc8 in place of the standard fc8 layer to learn a binary code. We remove the fully connected layers (identified as fc6,fc7 and fc8, and softmax) and treat the activation maps of the last convolutional layer (conv5) as appearance features. Though, the fully connected layers are suitable for abnormal events detection, the last convolutional layer (fc5) contains valuable appearance and pose information, which is more useful for our anomaly detection task. Interestingly, Smeureanuet al [20] have also found that the conv5 features are more suitable for action recognition in video.
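The appearance-feature step can be sketched as follows; since VGG-f is not bundled with Keras, VGG16 is used here purely as a stand-in backbone, and the layer name, input handling and helper names are assumptions.

# Sketch: take the last convolutional layer's activation maps, flatten and
# L2-normalize them (VGG16 stands in for VGG-f).
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(250, 250, 3))
conv5 = tf.keras.Model(base.input, base.get_layer("block5_conv3").output)

def appearance_features(frame_batch):
    """frame_batch: (N, 250, 250, 3) frames with the mean frame already subtracted."""
    maps = conv5.predict(frame_batch)               # (N, h, w, channels)
    feats = maps.reshape(len(frame_batch), -1)      # concatenate all filter maps
    norms = np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8
    return feats / norms                            # L2-normalized feature vectors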
Lastly, we reshape each activation map and concatenate the vectors corresponding to the 256 filters of the conv5 layer into a single feature vector of 43264 (13 × 13 × 256) components. The obtained feature vectors are normalized using the L2 norm. However, the focus of this paper is to prove the effectiveness of combining hashing learning and deep learning, rather than to explore different deep networks. Let X = [x_1, x_2, x_3, ..., x_n | x ∈ R^m] represent the set of training frames, where m refers to the number of bands. In this formulation, the hashing method aims to seek hash functions that map each example in X into a dense binary vector. Let LP_ij be the label of a pair of frame patches (x_i, x_j), i, j = 1, ..., N, with LP_ij = 1 if x_i and x_j come from the same class and 0 otherwise. Through the above-mentioned deep CNN network, the deep features (df_i, df_j) can be obtained by:
df_t = \Phi(W, b \mid x_t), \quad t = i, j   (2)
where \Phi(\cdot) is the network function described by the network weights W and biases b. This forward propagation performs a series of linear and nonlinear transformations, consisting of convolution, pooling, and nonlinear mappings.
3.2.2 Hashing Learning
In order to obtain DCH features, the deep features extracted in the above step are used to preserve the similarity of the original space. More precisely, the feature distance of similar pairs should be as small as possible, while different pairs should be far from each other in feature space. Figure 4 illustrates how the proposed hashing model is integrated into the deep CNN to obtain discriminative and compact representations for irregular event detection.
Fig. 4. Hash learning is integrated into the deep CNN to obtain discriminative and compact representations for abnormal events detection
The most direct way to achieve the above objective is to use the Euclidean distance as the metric to calculate the similarity degree between deep features [21]:
Dist = \lVert df_i - df_j \rVert   (3)
When the feature dimension is high, the Euclidean distance is not a practical choice. A better solution for efficient computation of feature distances is to integrate hash learning into the deep network. Specifically, a hash layer is inserted after the last fully connected layer to transfer the high-dimensional, real-valued features into low-dimensional binary features:
Bnc = sgn(df_t)   (4)
where sgn(·) operates element by element on a matrix or vector and returns 1 if the element v > 0 and -1 otherwise. Once the binary codes Bnc = {b_t}_{t=1}^{N} are obtained for all the samples, the probability of the pairwise labels PL = {PL_ij} can be defined as
p(PL_ij | Bnc) = \sigma(W_{ij}) if PL_ij = 1, and 1 - \sigma(W_{ij}) if PL_ij = 0   (5)
where \sigma(W_{ij}) is the logistic function and W_{ij} = (1/2) b_i^T b_j. The loss function is then given by the negative log-likelihood of the observed pairwise labels in PL. By minimizing this loss function, the Hamming distance between two similar samples is made as small as possible, while the Hamming distance between two different samples becomes as large as possible. Motivated by [21], the loss function can be formulated in a discrete way as
H = - \sum_{LP_{ij}} \big( LP_{ij}\,\psi_{ij} - \log(1 + e^{\psi_{ij}}) \big) + \beta \sum_{i=1}^{N} \lVert bnc_i - df_i \rVert_2^2   (6)
where \psi_{ij} = (1/2) df_i^T df_j, i, j = 1, ..., N, and \beta is a regularization parameter which makes df_i approach bnc_i.
3.2.3 Classification
The deep CNN hashing framework generates the binary hash codes for anomaly detection, where a hash layer is designed to produce the compact binary output. In the training phase, the loss function is used to learn the optimal parameters of the deep CNN model. Once the loss function is designed, the DCH can be trained in an end-to-end manner with the stochastic gradient descent procedure. Once our network is sufficiently trained, we can simply obtain the deep features by propagating all samples through the trained DCH. In particular, for an unknown sample x_k, the feature df_k can be calculated by the network function
df_k = \Phi(W, b \mid x_k), \quad k = 1, 2, 3, ..., N   (7)
Finally, in order to evaluate the efficiency of the learned features, these features are fed into a Softmax classifier for the subsequent classification.
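For clarity, a small NumPy sketch of the pairwise loss in Eq. (6) is given below; the value of β and the matrix-based formulation are assumptions, and this is not the authors' training code.

# Illustrative NumPy version of Eq. (6); df is the matrix of deep features,
# LP the matrix of pairwise labels.
import numpy as np

def hashing_loss(df, LP, beta=0.1):
    """df: (N, d) real-valued features; LP: (N, N) matrix with entries in {0, 1}."""
    b = np.sign(df)                                   # Eq. (4): binary codes
    psi = 0.5 * df @ df.T                             # psi_ij = 1/2 df_i^T df_j
    pairwise = LP * psi - np.logaddexp(0.0, psi)      # LP_ij*psi_ij - log(1 + e^psi_ij)
    quant = np.sum((b - df) ** 2)                     # sum_i ||bnc_i - df_i||^2
    return -np.sum(pairwise) + beta * quant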
4 Experiments Results 4.1 Network Architecture In this section, we evaluate the proposed deep CNN hashing network with state-of-theart works in the field of abnormal detection. For experimental framework, we extract deep appearance features from the training and the test video sequences. We consider the pre-trained VGG-f [5] models to create CNN architecture and extract deep features. To generalize the evaluation, we tested our proposed approach on both private and public scenes. To perform localization, we replaced the class score by bounding box location coordinates. A bounding box location is represented by the 4D vector (coordinates(x, y), height, width). The VGG-f has been trained on the Avenue dataset, it consists of five convolution layers (conv1 - conv5), and three fully connected layers (fc6, fc7, fc8). We insert the hashing layer hash-fc8 that outputs vectors in the place of fc8. 4.2 Datasets We conduct experiments on a public dataset to evaluate the performances of DCH model using the DBMI descriptor. The public dataset is Avenue datasets [19]. All of the experiments are performed with MATLAB 2019a on a Windows OS with Intel Core i5 processor and 32 GB RAM. Avenue datasets is used for local anomaly events detection, where abnormal events occur in a relatively small local region. This dataset contains 16 train videos with 15.328 images and 21 test videos with 15.297 frames. The Avenue dataset is captured by a camera with a horizontal view. Therefore, we not only have to detect abnormal events but also we needed to detect abnormal body movements. Using this dataset, both of qualitative and quantitative analyses of the results are cited with comparisons to the already existing methods. 4.3 Evaluation We compare our abnormal behavior detection system based on deep CNN features and DBMI frames extraction with five state-of-the-art approaches [2, 13, 20, 22, 23]. The pixel and frame-level AUC metrics computed on the Avenue CUHK data set are presented in Table 1. Compared with Chu et al. [13], our framework yields an improvement of 15.2%, in terms of frame-level AUC, and an improvement of 2.1%, in terms of pixel-level AUC. We further obtain better results than Rodrigues et al. [2], as our approach gains 14.45% in terms of frame-level AUC and 7.47% in terms of pixel-level AUC. Generally, our system is able to surpass the performance of state-of-the-art methods. Figure 5 demonstrates the comparison of regularity score curves in different testing videos from the Avenue datasets with Li, Tong et al. [23] (Fig. 5(b) red regions). The regularity score decreases when anomaly events occur. The blue area denotes the framelevel of our anomaly events approach. Further, Fig. 6 show the frame-level and pixel-level ROC curves. It illustrates the frame-level anomaly scores in six different scenarios in
Table 1. Summary of the results in terms of frame-level and pixel-level AUC on the Avenue data set with the state-of-the-art approaches
Method | Frame AUC | Pixel AUC | AUC
Deep Appearance [20] | 84.6% | 93.5% | 89.05%
Multi-timescale Trajectory [2] | 82.85% | 88.33% | 87.12%
TwoStream + deep [23] | 96.3% | - | 96.3%
Sparse-coding + CNN [13] | 82.1% | 93.7% | 87.9%
Discriminative learning [22] | 78.3% | 91.0% | 84.65%
Our approach | 97.3% | 95.8% | 96.55%
Fig. 5. Regularity score curves in two testing videos from the Avenue dataset.
the Avenue CUHK dataset, produced by our method based on VGG-f features and a Softmax classifier.
Fig. 6. Examples of detection results on the Avenue dataset
Table 2. Summary of abnormal detection results in terms of frame and pixel-level AUC on the Avenue data set
Method | Frame AUC | Pixel AUC | Time (FPS)
NF + VGG-fc6 + Softmax | 84.6% | 84.5% | 1.95
NF + conv5 + Softmax | 87.2% | 88.7% | 1.95
DBMI + VGG-fc7 + Softmax | 93.27% | 92.8% | 20.46
DBMI + VGG-fc6 + Softmax | 94.6% | 93.7% | 20.61
DBMI + conv5 + Softmax | 97.28% | 95.75% | 20.7
In Table 2, we present preliminary results on the Avenue CUHK data set to offer empirical evidence in favor of the hash-CNN features that we selected for the subsequent experiments. The results indicate that better performance can be obtained with the conv5 features than with the fc6 and fc7 features. Moreover, we demonstrate that our reconstruction of a Differential Binary Motion Image (DBMI) is more powerful than normal images (NF) without pre-processing, and it achieves promising results. For the speed evaluation, we measured the time required to extract features and to estimate the anomaly scores, and we report the number of frames per second (FPS) in Table 2. We are able to report these results with the VGG-f [5] architecture: using a single core, our final model is able to process the test videos in real time at nearly 21 FPS.
Fig. 7. The performance (separability and accuracy) on the Avenue dataset
To further demonstrate the effectiveness of our method, we compute the average score of the normal and unusual events on the Avenue dataset as presented in Fig. 7 (left). The corresponding gap, which represents the difference of normal and abnormal events, is calculated by subtracting the average abnormal score from the average normal score. We show results with different CNN architectures and features from different layers. Evidently the integration gap of the DBMI with the VGG conv5 indices is the largest, which demonstrates that the preprocessing applied for the initial framework is useful to obtain more details and the regions of interests which can then help the process
to detect the abnormalities precisely. Figure 7 (right) presents the ROC curves on the Avenue dataset. As we can see, the combination of DBMI with the VGG conv5 features is superior to the other CNN architecture cues.
5 Conclusion
In this paper, a solution is developed to detect abnormal behaviors in the Avenue CUHK dataset. This problem is considered essential and needs extensive study because it concerns safety in public places, where many abnormal events appear that need to be mitigated. In this work, we have proposed a novel framework for abnormal event detection in video that is based on extracting deep features from pre-trained CNN models. These features are related to the speed of movement and appearance and are fed to a new Deep CNN Hash model architecture as spatiotemporal streams. Thus, this research is the first of its kind to integrate hash code learning with a CNN architecture for surveillance video abnormal event detection. The Softmax model achieved an average of 97.28% AUC. We also analysed the accuracy performance for different types of anomalies with different models. The results show that our proposed fusion framework has the best AUC performance compared with the state-of-the-art on the Avenue dataset. Our proposed framework is suitable for video anomaly detection involving both abnormal behaviours and abnormal objects.
A Multi-objective Iterated Local Search Heuristic for Energy-Efficient No-Wait Permutation Flowshop Scheduling Problem Gabriel de Paula Félix, José Elias C. Arroyo, and Matheus de Freitas Araujo(B) Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil {gabriel.felix,jarroyo,matheus.f.freitas}@ufv.br
Abstract. This paper addresses a multi-objective no-wait permutation flowshop scheduling problem. In this problem, a job must be processed, from start to finish, without any interruption between machines. The jobs have processing times that are influenced by a speed factor inversely proportional to the energy consumption of the machines. The objective of the problem is to determine the job sequencing and the speed levels at which to execute the jobs, in such a way that two objectives are simultaneously minimized: the total energy consumption of the machines and the total tardiness with respect to the due dates of the jobs. Motivated by the computational complexity of the problem, a multi-objective heuristic based on the Iterated Local Search (ILS) metaheuristic is proposed. To test the efficiency of the ILS heuristic, a set of 110 instances from the literature is used. The performance of the proposed heuristic is evaluated by comparing it with results from other heuristics available in the literature.
1 Introduction
Manufacturing activities account for the majority of industrial energy consumption. Energy is used in the industrial sector for a wide range of purposes, such as manufacturing and assembly processes, heating and cooling processes, lighting, and air conditioning for buildings. In this work, we study the energy-efficient no-wait Permutation Flowshop Scheduling (nwPFS) problem. This problem is a special variant of the Permutation Flowshop Scheduling (PFS) problem. In nwPFS, the operations of each job must be processed without interruption between two consecutive machines. That is, jobs cannot wait between machines. Once they start processing on the first machine, they must complete their processing through the last machine without any interruption between machines [10]. This problem originates from specific requirements of production processes or from the lack of adequate storage space between job operations. Thus, the nwPFS problem aims to find the job sequencing for all machines where each job follows the same processing order on the machines and where there is no waiting
time between consecutive machines for a job. The nwPFS problem is one of the well-studied job scheduling problems and has important practical applications in industry [7].

In energy-efficient production scheduling, two approaches can be considered to reduce the energy consumption of machines. One approach is to shut down machines during idle times [2]. While the machine shutdown approach saves energy substantially, it cannot be used in some production environments because it can shorten the life of some machines. Another approach is speed scaling, in which machines can process jobs at different speed levels (high, normal and slow), which correspond to different levels of energy consumption. In this case, the processing speed of the machines is a variable that can be adjusted to decrease the power consumption [11]. Machines operating at high speed levels reduce job processing times but consume more energy. Machines at low speed levels increase processing times but consume less energy. As in [6,12] and [11], in this paper the speed scaling strategy is adopted and only the energy consumed by the machines to process the jobs is considered. The trade-off between processing time and energy consumption is a fact in industry and leads to two conflicting objectives. In this study, the simultaneous minimization of the total energy consumption and the total tardiness is considered.

The nwPFS problem with energy efficiency has been addressed in [9,12] and [11]. The authors of [9] considered the minimization of the total flow time and the total energy consumption, and proposed algorithms based on the Artificial Bee Colony (ABC) metaheuristic and GA. In [12], a variable block insertion heuristic and an Iterated Greedy algorithm were proposed to minimize the makespan and the total energy consumption. In [11], to minimize the makespan and the total energy consumption, three metaheuristic algorithms were proposed to find approximations of the Pareto-optimal frontiers: a multi-objective algorithm based on ABC (MO-DABC), a conventional multi-objective GA (MO-GA) and a multi-objective GA with local search (MO-GALS). As far as is known, the nwPFS problem with simultaneous minimization of total tardiness and total energy consumption has only been studied in [11].

In this paper we propose a Multi-Objective ILS (MO-ILS) heuristic for the simultaneous minimization of the total tardiness and the total energy consumption in the nwPFS problem. We also propose an algorithm to generate initial solutions of the problem and a multi-objective local search algorithm that uses four neighborhood structures.
2 Problem Description
The nwPFS problem is defined as follows. A set $J = \{J_1, \ldots, J_n\}$ of n jobs must be processed on m machines $(M_1, \ldots, M_m)$, so that the order of processing is the same on all machines. All jobs and machines are available for processing at time zero. Each job $J_i$ ($i = 1, \ldots, n$) has a processing time $P_{J_i,k}$ on each machine
k ($k = 1, \ldots, m$), and a due date $D_{J_i}$. To satisfy the no-wait constraint, no job queues are allowed on machines $M_2, \ldots, M_m$. Once a job starts processing on the first machine, it must be processed continuously on all machines until its completion. The start of a job on the first machine can be delayed to ensure that it does not have to wait for processing on subsequent machines. For the single-objective case, the nwPFS problem consists of finding the processing order of the n jobs such that the total tardiness is minimized. Figure 1 shows an example of sequencing n = 5 jobs on m = 3 machines. To calculate the completion times of the jobs, the minimum distance between the start times, on the first machine, of two consecutive jobs $J_{i-1}$ and $J_i$ must be determined [5]. These distances (d(4, 5), d(5, 2), d(2, 1) and d(1, 3)) are shown in Fig. 1. The completion time of the first job in the sequence ($J_1 = 4$) is the sum of its processing times on all machines. The completion time of job $J_i$ ($i = 2, \ldots, n$) is $C_{J_i} = \sum_{j=2}^{i} d(J_{j-1}, J_j) + \sum_{k=1}^{m} P_{J_i,k}$.
Fig. 1. Gantt chart: sequencing of n = 5 jobs on m = 3 machines.
The minimum distance between the start times of two consecutive jobs, $J_{i-1}$ and $J_i$, is calculated as follows:

$d(J_{i-1}, J_i) = P_{J_{i-1},1} + \max\{0, \max_{2 \le q \le m}\{\sum_{h=2}^{q} P_{J_{i-1},h} - \sum_{h=1}^{q-1} P_{J_i,h}\}\}$.

In this study, the simultaneous minimization of total tardiness (TT) and total energy consumption (TEC) is considered. As in [11], the nwPFS problem considers the speed scaling strategy for the processing of jobs. Thus, the processing time of a job can vary according to the assigned speed level. Therefore, high speed levels reduce processing times but consume more energy, and low speed levels increase processing times but consume less energy. Due to the speed variation of machines, it is assumed that each job is processed at the same speed level on all machines. In this paper, a solution to the problem is represented by two vectors S and V, where S is the sequence or permutation of the n jobs and V stores the speed levels assigned for processing the jobs. As proposed by [1], three speed levels are considered, $L = \{1, 2, 3\}$. The values 1, 2 and 3 represent the fast, normal and slow speed levels, respectively. The reason behind this idea is that the fast speed level (1) decreases the completion time of a job but consumes more energy, while the slow speed level (3) increases the completion time with less energy consumption. Finally, the normal speed level (2) does not change the processing time of the job, i.e. job $J_i$ is processed with the same time $P_{J_i,k}$ on each machine k ($k = 1, \ldots, m$).
For example, a solution with n = 5 jobs can be represented by the vectors $S = (J_1, \ldots, J_n) = (4, 5, 2, 1, 3)$ and $V = (v_1, \ldots, v_n) = (3, 1, 3, 2, 1)$. The permutation S defines the order in which jobs should be processed, and V defines the speed levels for processing the jobs. In this example, $J_1 = 4$ is the first job to be processed, with speed level $v_1 = 3$ (slow), $J_2 = 5$ is the second job to be processed, with speed level $v_2 = 1$ (fast), and so on.

For speed levels 1, 2 and 3, the respective speed factors $f[1] = 1.2$, $f[2] = 1.0$ and $f[3] = 0.8$ are defined [1]. If a job $J_i$ is processed with speed level 1, its processing time on machine k is changed to $\frac{P_{J_i,k}}{f[1]} = \frac{P_{J_i,k}}{1.2}$, that is, the processing time decreases. If the speed level is 3, the processing time is changed to $\frac{P_{J_i,k}}{0.8}$, i.e. the time increases. If the speed level is 2, the processing time is not changed ($\frac{P_{J_i,k}}{1.0} = P_{J_i,k}$).

As in [11] and [1], the values of the objectives TT and TEC, for a solution (S, V), are calculated as follows. Consider two jobs $J_{i-1}$ and $J_i$ processed consecutively with speed levels $v_{i-1}$ and $v_i$, respectively. The minimum distance between the start times of jobs $J_{i-1}$ and $J_i$ ($i = 2, \ldots, n$), on the first machine, is calculated as:

$d(J_{i-1}, J_i) = \frac{P_{J_{i-1},1}}{f[v_{i-1}]} + \max\left\{0, \max_{2 \le q \le m}\left\{\sum_{h=2}^{q} \frac{P_{J_{i-1},h}}{f[v_{i-1}]} - \sum_{h=1}^{q-1} \frac{P_{J_i,h}}{f[v_i]}\right\}\right\}$.

Completion time of the first job $J_1$ (on machine m): $C_{J_1,m} = \sum_{k=1}^{m} \frac{P_{J_1,k}}{f[v_1]}$.

Completion time of jobs $J_i$ ($i = 2, \ldots, n$): $C_{J_i,m} = \sum_{j=2}^{i} d(J_{j-1}, J_j) + \sum_{k=1}^{m} \frac{P_{J_i,k}}{f[v_i]}$.

Makespan (maximum completion time): $C_{max} = C_{J_n,m} = \sum_{j=2}^{n} d(J_{j-1}, J_j) + \sum_{k=1}^{m} \frac{P_{J_n,k}}{f[v_n]}$.

Total Tardiness: $TT = \sum_{i=1}^{n} \max\{0, C_{J_i,m} - D_{J_i}\}$.

Total Energy Consumption: $TEC = \sum_{i=1}^{n} \sum_{k=1}^{m} \frac{P_{J_i,k} \times \tau_k \times \lambda[v_i]}{60 \times f[v_i]} + \sum_{k=1}^{m} \frac{\vartheta_k \times \tau_k \times \theta_k}{60}$.

Idle time of machine k ($k = 1, \ldots, m$): $\theta_k = C_{max} - \sum_{i=1}^{n} \frac{P_{J_i,k}}{f[v_i]}$.
The parameters used in the TEC calculation are as follows: $\lambda[v_i]$ is the conversion factor for speed level $v_i$, $\vartheta_k$ is the conversion factor for the idle time of machine k, and $\tau_k$ is the power of machine k (kW). For speed levels 1, 2 and 3, the respective conversion factors $\lambda[1] = 1.5$, $\lambda[2] = 1.0$ and $\lambda[3] = 0.6$ are used. The conversion factor for idle time and the machine power are defined as $\vartheta_k = 0.05$ and $\tau_k = 60$ kW, $\forall k = 1, \ldots, m$ [1,11].
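As a worked illustration of the formulas above, the sketch below evaluates a solution (S, V): it scales the processing times by the speed factors, computes the no-wait start-time distances and completion times, and returns TT and TEC. This is a minimal sketch in our own notation, not the authors' implementation; the toy instance (P, D) is hypothetical.

```python
# Minimal sketch of evaluating a solution (S, V) of the energy-efficient nwPFS
# problem, using the speed factors f and conversion factors given above.
f = {1: 1.2, 2: 1.0, 3: 0.8}      # speed factors per speed level
lam = {1: 1.5, 2: 1.0, 3: 0.6}    # energy conversion factors per speed level
VARTHETA, TAU = 0.05, 60.0        # idle-time conversion factor, machine power (kW)

def evaluate(S, V, P, D):
    """S: job sequence; V: speed level of each sequenced job; P[j][k]: processing
    time of job j on machine k; D[j]: due date of job j. Returns (TT, TEC)."""
    n, m = len(S), len(P[S[0]])
    p = [[P[S[i]][k] / f[V[i]] for k in range(m)] for i in range(n)]  # scaled times

    def dist(i):
        # Minimum start-time distance between the jobs in positions i-1 and i.
        return p[i - 1][0] + max(0.0, max(
            sum(p[i - 1][1:q]) - sum(p[i][:q - 1]) for q in range(2, m + 1)))

    C, acc = [sum(p[0])], 0.0          # completion times on the last machine
    for i in range(1, n):
        acc += dist(i)
        C.append(acc + sum(p[i]))
    cmax = C[-1]

    TT = sum(max(0.0, C[i] - D[S[i]]) for i in range(n))
    busy = sum(P[S[i]][k] * TAU * lam[V[i]] / (60.0 * f[V[i]])
               for i in range(n) for k in range(m))
    idle = sum(VARTHETA * TAU * (cmax - sum(p[i][k] for i in range(n))) / 60.0
               for k in range(m))
    return TT, busy + idle

# Toy usage: 3 jobs (indexed 0..2) on 2 machines
P = [[4, 3], [2, 5], [6, 2]]
D = [10, 12, 15]
print(evaluate(S=[2, 0, 1], V=[1, 2, 3], P=P, D=D))
```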
3 Multi-objective Iterated Local Search Heuristic
This section presents the steps of the proposed Multi-Objective Iterated Local Search (MO-ILS) heuristic to solve the nwPFS problem. Initially, a constructive
heuristic is proposed to generate a set of non-dominated solutions. Next, the neighborhood structures used to generate new non-dominated solutions, the local search procedure and the solution perturbation method are presented. Algorithm 1 presents the pseudocode of the MO-ILS proposed to determine an approximation of the set of Pareto-optimal solutions of the problem under study. The MO-ILS algorithm receives as input the set of solutions D generated by the constructive heuristic, the number of iterations of the local search (numIter), the number of solutions to be explored at each iteration (numSol) and the number of perturbations (numPert) to be performed on the non-dominated solutions. The algorithm returns the set of non-dominated solutions determined during all iterations.
Algorithm 1: MO-ILS(D, numSol, numIter, numPert)
1  Set1 ← MO-Local Search(D, numSol, numIter);
2  D ← NonDominatedSet(D ∪ Set1);
3  while Stop Condition do
4      Set1 ← RandomlySelect(numSol, D);
5      Set2 ← Perturb(Set1, numPert);
6      Set3 ← MO-Local Search(Set2, numSol, numIter);
7      D ← NonDominatedSet(D ∪ Set3);
return: D
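A schematic Python rendering of Algorithm 1 is given below. It is our reading of the pseudocode, not the authors' code: objectives(sol) is assumed to return the (TT, TEC) pair of a solution, and local_search and perturb stand for Algorithm 3 and the perturbation of Sect. 3.1, which are assumed to be supplied by the caller.

```python
# Schematic skeleton of Algorithm 1 (illustration only).
import random
import time

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions, objectives):
    """Keep only the non-dominated solutions of a list."""
    objs = [objectives(s) for s in solutions]
    return [s for s, o in zip(solutions, objs)
            if not any(dominates(o2, o) for o2 in objs)]

def mo_ils(D, objectives, local_search, perturb,
           num_sol=5, num_iter=30, num_pert=3, time_limit_s=1.0):
    # Initial improvement and filtering of the constructive solutions.
    D = non_dominated(D + local_search(D, num_sol, num_iter), objectives)
    start = time.time()
    while time.time() - start < time_limit_s:      # stop condition (e.g. 50*n*m ms)
        set1 = random.sample(D, min(num_sol, len(D)))
        set2 = [perturb(s, num_pert) for s in set1]
        set3 = local_search(set2, num_sol, num_iter)
        D = non_dominated(D + set3, objectives)
    return D
```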
The generation of a solution consists of constructing a sequence S of the n jobs and the respective vector V of speed levels. Algorithm 2 presents the steps of the proposed heuristic to build a solution. As in the NEH algorithm [3], the jobs are initially sorted according to a dispatch rule (SortingCriteria), obtaining a list Lt of candidate jobs (step 1). In step 2, S and V are initialized empty. Jobs in the Lt list are sequentially inserted into the best position of the sequence S (step 4). For each job, the processing speed level is defined (step 5). Each time a job is inserted into the sequence S, the insertion neighborhood of the partial sequence is generated and the best neighbor solution is chosen (step 6). Once the job sequence S and the vector of speed levels V are constructed, the solution is evaluated (i.e., TT and TEC are calculated).

An example, for n = 5 jobs, is presented below. Suppose the list of ordered jobs is Lt = {2, 4, 1, 5, 3}, and the first two jobs in the list have already been inserted, obtaining the partial sequence S = (4, 2). The next job in the list Lt (job 1) must be inserted in the best position of S. Suppose the sequence obtained after inserting job 1 is S = (4, 1, 2). The insertion neighborhood is obtained by inserting each job j of S (j ≠ 1) in all possible positions of S, obtaining the following four neighbor sequences: (1, 4, 2), (1, 2, 4), (2, 4, 1), (4, 2, 1). If (1, 2, 4) is the best neighbor, then the next job in the list Lt (job 5) must be inserted in the best position of S = (1, 2, 4). These steps are repeated until all the jobs of Lt are inserted in the sequence S.
In the constructive heuristic (Algorithm 2) the following parameters are used: SortingCriteria, SpeedType and evaluation Function. Two sorting criteria are used. $O_1$: jobs are sorted in decreasing order of total processing times $\sum_{k=1}^{m} P_{J_i,k}$. $O_2$: jobs are sorted in increasing order of due dates $D_{J_i}$. Speed levels (SpeedType) for processing jobs are assigned in four ways. The first three consist of assigning speed level 1, 2, or 3 to all jobs; that is, all jobs are processed at the same speed. The fourth way is to assign the speed levels randomly. As the problem under study is bi-objective, the construction of a solution is guided by only one objective (Function). The objectives considered for the evaluation of partial solutions are the total tardiness of the jobs or the makespan. Combining the parameters SortingCriteria, SpeedType and evaluation Function, it is possible to construct 2 × 4 × 2 = 16 different solutions. From these 16 solutions, the set D formed by the non-dominated solutions is determined (an approximation of the Pareto-optimal set). The MO-ILS heuristic starts from the set D.

Algorithm 2: Const Heuristic(SortingCriteria, SpeedType, Function)
1  Lt ← SortJobs(SortingCriteria);
2  S ← ∅; V ← ∅;
3  for i ← 1 until n do
4      S ← Insert Jobs Lt[i] In BestPosition Of S(Lt[i], Function);
5      vi ← Choose Speed Level ForJob Lt[i](SpeedType);
6      S ← Best Solution Insertion Neighborhood(S, Function);
7  Calculate Objectives TT TEC(S, V);
return: (S, V)
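The construction can be sketched as follows. This is an illustration of Algorithm 2 under our own naming: evaluate_partial is an assumed single-objective evaluator (total tardiness or makespan) of a partial sequence, and speed_of encodes the chosen SpeedType.

```python
# Illustrative NEH-style construction (sketch of Algorithm 2, not the authors' code).
def construct(sorted_jobs, speed_of, evaluate_partial):
    """sorted_jobs: job ids ordered by the dispatch rule (O1 or O2);
    speed_of(job): speed level chosen for a job (SpeedType);
    evaluate_partial(seq, speed): single-objective value of a partial sequence;
    returns the sequence S and the job -> speed-level map."""
    S, speed = [], {}

    def best(sequences):
        return min(sequences, key=lambda seq: evaluate_partial(seq, speed))

    for job in sorted_jobs:
        speed[job] = speed_of(job)
        # Step 4: insert the new job in the best position of the partial sequence.
        S = best([S[:p] + [job] + S[p:] for p in range(len(S) + 1)])
        # Step 6: insertion neighborhood of the partial sequence; keep the best of
        # the current sequence and its neighbors.
        neighbors = [S]
        for j in S:
            if j == job:
                continue
            rest = [x for x in S if x != j]
            neighbors += [rest[:p] + [j] + rest[p:] for p in range(len(rest) + 1)]
        S = best(neighbors)
    return S, speed
```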
Algorithm 3: MO Local Search(D, numSol, numIter)
1   Set1 ← D
2   for iter ← 1 until numIter do
3       if iter > 1 then
4           Set1 ← RandomlySelect(numSol, D);
5       Set2 ← ∅;
6       i ← SelectNeighborhood(1, 2);
7       if i == 1 then
8           for each solution (S, V) in Set1 do
9               N1 ← Insertion Neighborhood(S);
10              N2 ← Swap-Job-Pair Neighborhood(S);
11              Set2 ← NonDominatedSet(Set2 ∪ N1 ∪ N2);
12      else
13          for each solution (S, V) in Set1 do
14              N1 ← Single-Speed-Level Neighborhood(V);
15              N2 ← Dual-Random-Speed-Level Neighborhood(V);
16              Set2 ← NonDominatedSet(Set2 ∪ N1 ∪ N2);
17      D ← NonDominatedSet(D ∪ Set2);
return: D
3.1 Multi-objective Local Search and Perturbation
The goal of the multi-objective Local Search (MO-LS) is to improve a set of non-dominated solutions. Four neighborhood structures are used, two for the job permutation and two for the speed-level vector.

Neighborhoods for the job permutation:
1. Insertion(Ji, S) - inserts job Ji (i = 1, ..., n) in all positions of the permutation S.
2. Swap-Job-Pair(Ji, Jj, S) - swaps jobs Ji and Jj (i, j = 1, ..., n; i ≠ j).

Neighborhoods for the speed-level vector:
3. Single-Speed-Level(Ji, vi) - changes the speed level vi of job Ji (i = 1, ..., n) to another speed level vj (vj = 1, 2, 3; vi ≠ vj).
4. Dual-Random-Speed-Level(Ji, Jj) - for each pair of jobs Ji and Jj (i, j = 1, ..., n; i ≠ j), new speed levels vi and vj are randomly generated.

Algorithm 3 presents the pseudocode of MO-LS. The algorithm takes as input a set D of solutions, the number of solutions to be explored at each iteration (numSol) and the number of iterations of the algorithm, denoted by numIter. At each iteration of the algorithm, numSol different solutions are randomly selected from the set D. For each selected solution, two neighborhoods are generated, considering the permutation S and the speed levels in V. Among all the neighboring solutions generated, after being evaluated, the set of non-dominated solutions is determined. This set is used to update the set D. The MO-LS algorithm returns the "best" set of non-dominated solutions obtained during all iterations.

The perturbation of a solution consists of performing numPert swaps in the vector of speed levels. A swap consists of randomly selecting two jobs Ji and Jj, and swapping their speed levels vi and vj. That is, after the swap, the jobs Ji and Jj will be processed with speeds vj and vi, respectively.
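The four neighborhood moves and the perturbation can be sketched on the (S, V) representation of Sect. 2, where V[i] is the speed level of the job in position i. The code below is our illustration, not the authors' code; in particular, whether the swap move also exchanges the two speed levels is left implicit in the text, and we adopt that reading here.

```python
# Sketch of the four neighborhoods and the perturbation (illustration only).
import random

def insertion_neighbors(S, V):
    """Move 1: remove the job at position i and reinsert it at position j."""
    out = []
    for i in range(len(S)):
        for j in range(len(S)):
            if i == j:
                continue
            s, v = list(S), list(V)
            job, spd = s.pop(i), v.pop(i)
            s.insert(j, job); v.insert(j, spd)
            out.append((s, v))
    return out

def swap_job_pair_neighbors(S, V):
    """Move 2: swap the jobs in positions i and j (speed levels swapped with them)."""
    out = []
    for i in range(len(S)):
        for j in range(i + 1, len(S)):
            s, v = list(S), list(V)
            s[i], s[j] = s[j], s[i]
            v[i], v[j] = v[j], v[i]
            out.append((s, v))
    return out

def single_speed_level_neighbors(S, V):
    """Move 3: change the speed level of one job to a different level."""
    return [(list(S), V[:i] + [lvl] + V[i + 1:])
            for i in range(len(V)) for lvl in (1, 2, 3) if lvl != V[i]]

def dual_random_speed_level_neighbors(S, V):
    """Move 4: for each pair of jobs, redraw both speed levels at random."""
    out = []
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            v = list(V)
            v[i], v[j] = random.choice((1, 2, 3)), random.choice((1, 2, 3))
            out.append((list(S), v))
    return out

def perturb(S, V, num_pert=3):
    """Perturbation: num_pert random swaps of speed levels between two jobs."""
    v = list(V)
    for _ in range(num_pert):
        i, j = random.sample(range(len(v)), 2)
        v[i], v[j] = v[j], v[i]
    return list(S), v
```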
4 Computational Experiments
This section describes the computational experiments performed to study the performance of the MO-ILS heuristic. The solutions generated by the MO-ILS heuristic are compared with the solutions determined by two algorithms from the literature: the Multi-objective Discrete Artificial Bee Colony (MO-DABC) proposed by [11], and the Multi-objective Iterated Greedy (IGALL) proposed by [4]. The study in [11] showed that the MO-DABC and IGALL algorithms present the best results for the minimization of the objectives TT and TEC in the nwPFS problem. All computational results obtained by the algorithms MO-DABC and IGALL are available at https://data.mendeley.com/datasets/7v6k5chjrg/1. To evaluate the proposed MO-ILS algorithm, we use the same instances used by [11]. These instances were generated by [8] and consist of 11 combinations of the number of jobs and the number of machines (n × m): 20 × 5, 20 × 10, 20 × 20, 50 × 5, 50 × 10, 50 × 20, 100 × 5, 100 × 10, 100 × 20, 200 × 10 and 200 × 20. For each combination of n and m there are 10 instances. Therefore, a total of 110 instances were used.
The MO-ILS algorithm was coded in C++ and executed on a computer with an Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz × 8 and 32 GB RAM, running Ubuntu 18.1 64-bit. The MO-ILS algorithm ran for $50 \times n \times m$ milliseconds, the same stopping condition used in the algorithms MO-DABC and IGALL [11]. For the parameters of MO-ILS, different values were tested. The best results were obtained for numSol = 5, numIter = 30 and numPert = 3. The MO-ILS algorithm was run five times for each instance. The non-dominated solution sets obtained in all runs are joined to determine a single set of non-dominated solutions.

To compare the sets of non-dominated solutions obtained by the algorithms MO-ILS, MO-DABC and IGALL on each problem instance, three evaluation metrics are used. Suppose $X_1$, $X_2$ and $X_3$ are the sets of non-dominated solutions obtained by the MO-ILS, MO-DABC and IGALL algorithms, respectively. As the Pareto-optimal set is not known, a reference set, denoted by Ref, is constructed using all non-dominated solutions found by the three compared algorithms. That is, Ref is the set of non-dominated solutions of $(X_1 \cup X_2 \cup X_3)$. Then, each set $X_i$ is compared with the set Ref.

The first metric is the epsilon+ indicator ($I_{\epsilon+}$), which measures the distance between a set $X_i$ and the reference set Ref. This metric is defined as follows:

$I_{\epsilon+}(X_i, \mathrm{Ref}) = \max_{p \in \mathrm{Ref}}\{\min_{q \in X_i}\{\max_{1 \le j \le 2}\{q_j - p_j\}\}\}$,

where $p = (p_1, p_2)$ and $q = (q_1, q_2)$ are points in the objective space. The metric $I_{\epsilon+}$ determines the greatest distance between a reference point and the closest point in the set $X_i$. That is, the smaller the value of $I_{\epsilon+}$, the better the approximation of the set $X_i$ to Ref.

The second metric is a distance measure that determines the approximation of a set $X_i$ with respect to the set Ref:

$dist(X_i, \mathrm{Ref}) = \frac{1}{|\mathrm{Ref}|} \sum_{p \in \mathrm{Ref}} \min_{q \in X_i}\{d(p, q)\}$, where $d(p, q) = \max\{\frac{1}{\Delta_1}(q_1 - p_1), \frac{1}{\Delta_2}(q_2 - p_2)\}$,

with $\Delta_i$ being the variation between the largest and smallest value of the objective function $f_i$ in the reference set ($\Delta_i = \max\{f_i\} - \min\{f_i\}$, $i = 1, 2$). Note that $dist(X_i, \mathrm{Ref})$ is the average of the distances from each point in Ref to the closest point in $X_i$.

The third metric is the percentage of reference solutions obtained in the set $X_i$: $P(X_i, \mathrm{Ref}) = 100 \times \frac{|X_i \cap \mathrm{Ref}|}{|\mathrm{Ref}|}$.
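The three indicators can be computed directly from the (TT, TEC) points of the fronts, as in the sketch below (our own illustration; the toy fronts are hypothetical).

```python
# Sketch of the three quality indicators (epsilon+, average distance, percentage).
def epsilon_plus(X, ref):
    """Additive epsilon indicator of front X with respect to the reference front."""
    return max(min(max(q[j] - p[j] for j in range(2)) for q in X) for p in ref)

def avg_distance(X, ref):
    """Mean normalized distance from each reference point to its closest point in X."""
    d1 = (max(p[0] for p in ref) - min(p[0] for p in ref)) or 1.0
    d2 = (max(p[1] for p in ref) - min(p[1] for p in ref)) or 1.0
    def d(p, q):
        return max((q[0] - p[0]) / d1, (q[1] - p[1]) / d2)
    return sum(min(d(p, q) for q in X) for p in ref) / len(ref)

def pct_reference(X, ref):
    """Percentage of reference solutions that also appear in X."""
    return 100.0 * len(set(X) & set(ref)) / len(ref)

# Toy usage with (TT, TEC) points
ref = [(10.0, 50.0), (12.0, 45.0), (20.0, 40.0)]
X = [(10.0, 50.0), (13.0, 46.0)]
print(epsilon_plus(X, ref), avg_distance(X, ref), pct_reference(X, ref))
```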
4.1 Obtained Results
Table 1 presents the comparison between the algorithms MO-ILS, MO-DABC and IGALL, considering the three metrics. The values reported in the table are the averages obtained by each algorithm for a group of 10 instances (grouped by the number of jobs n and the number of machines m). For metric $I_{\epsilon+}$, it can be seen that in all groups of instances, except for the group 50 × 5, the proposed MO-ILS algorithm presents the lowest averages of $I_{\epsilon+}$. For the 50 × 5 group, MO-DABC had the lowest average, followed by IGALL.
The results of MO-DABC and IGALL were very close. For all instances with n = 100, IGALL is better than MO-DABC, and for instances with n = 200, MO-DABC is better than IGALL. Considering the overall average, MO-ILS is much better than IGALL, which in turn is slightly better than MO-DABC. The overall average presented by MO-ILS is 5.4 times lower than the overall average of IGALL.

For the distance metric dist(Xi, Ref), we can see that, for all groups of instances, the proposed MO-ILS algorithm presents the lowest averages of dist. Considering this metric, the results of MO-DABC and IGALL were also very close. For all instances with n = 100, IGALL is better than MO-DABC, and for instances with n = 200, MO-DABC is better than IGALL. By the overall average, it can be seen that MO-ILS is much better than IGALL, which in turn is slightly better than MO-DABC. The overall average presented by MO-ILS is approximately 90 times lower than the overall average of IGALL. This shows that the solutions generated by MO-ILS are very close to the reference solutions, or are part of the reference set Ref.

The third metric determines the percentage of reference solutions obtained by the algorithms MO-ILS, MO-DABC and IGALL. These results show that the proposed MO-ILS heuristic determined most of the reference solutions. That is, the solutions generated by the MO-ILS heuristic dominate almost all the solutions found by the MO-DABC and IGALL algorithms. By the overall average, MO-ILS determined 92% of the reference solutions, while MO-DABC and IGALL generated only 4.8% and 5.0% of the solutions, respectively. The results of MO-DABC and IGALL were slightly different from their results for the other two metrics. Considering this metric, IGALL was considerably better than MO-DABC for smaller instances (with n = 20), and MO-DABC was better than IGALL for instances with n = 50, 100 and 200. In the evaluation of multi-objective algorithms, this type of situation can happen: an algorithm may be better for one metric and worse for another, or vice versa.

Table 1. Average results considering the three metrics.
| n | m | Iε+ MO-ILS | Iε+ MO-DABC | Iε+ IGALL | dist MO-ILS | dist MO-DABC | dist IGALL | % MO-ILS | % MO-DABC | % IGALL |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 5 | 47.77 | 241.90 | 495.97 | 0.043 | 3.093 | 3.498 | 87.63 | 2.72 | 12.31 |
| 20 | 10 | 51.97 | 436.55 | 625.83 | 0.081 | 3.751 | 3.578 | 83.17 | 3.45 | 18.52 |
| 20 | 20 | 59.57 | 679.86 | 572.23 | 0.080 | 4.531 | 2.387 | 86.14 | 7.17 | 15.25 |
| 50 | 5 | 1585.09 | 1385.23 | 1549.19 | 0.081 | 6.909 | 7.019 | 95.37 | 4.36 | 0.61 |
| 50 | 10 | 1731.19 | 2653.21 | 2474.36 | 0.129 | 7.159 | 6.783 | 91.96 | 7.59 | 0.97 |
| 50 | 20 | 1442.86 | 6479.28 | 5556.54 | 0.100 | 7.460 | 6.994 | 92.59 | 6.15 | 1.35 |
| 100 | 5 | 2716.24 | 5283.81 | 4457.46 | 0.022 | 10.638 | 9.845 | 97.47 | 2.33 | 0.29 |
| 100 | 10 | 4892.73 | 11703.29 | 9862.27 | 0.081 | 10.896 | 9.621 | 95.16 | 4.37 | 0.60 |
| 100 | 20 | 6944.33 | 21042.00 | 20679.41 | 0.107 | 10.403 | 10.128 | 93.39 | 5.37 | 1.42 |
| 200 | 10 | 1690.61 | 42483.00 | 43827.66 | 0.020 | 7.258 | 7.886 | 94.44 | 4.44 | 1.15 |
| 200 | 20 | 2809.75 | 39720.98 | 39789.47 | 0.030 | 4.691 | 5.066 | 91.95 | 5.54 | 2.63 |
| Average | | 2179.28 | 12009.92 | 11808.22 | 0.074 | 7.210 | 6.774 | 91.75 | 4.86 | 5.01 |
5 Conclusions
In this paper, the no-wait permutation flowshop scheduling problem was addressed, minimizing two conflicting objectives: the total energy consumption of the machines and the total tardiness of the jobs. Due to the computational complexity of the problem, and in an attempt to find Pareto-optimal solutions, a multi-objective Iterated Local Search (MO-ILS) heuristic was proposed. It uses the concept of Pareto dominance to select the best solutions. To generate a set of initial solutions of the problem, a constructive algorithm (that uses some dispatch rules) was proposed. Also, a multi-objective local search procedure that uses four neighborhood structures was proposed. The MO-ILS heuristic was tested on known medium and large instances and was compared with two metaheuristics from the literature (MO-DABC and IGALL), using three evaluation metrics. From the computational experiments performed, it can be concluded that the proposed heuristic presented an excellent performance in solving the energy-efficient scheduling problem. MO-ILS generated more non-dominated solutions (reference solutions) than MO-DABC and IGALL.

Acknowledgments. This work was supported by CAPES and CNPq.
References
1. Mansouri, S.A., Aktas, E., Besikci, U.: Green scheduling of a two-machine flowshop: trade-off between makespan and energy consumption. Eur. J. Oper. Res. 248(3), 772–788 (2016)
2. Mouzon, G., Yildirim, M.B.: A framework to minimise total energy consumption and total tardiness on a single machine. Int. J. Sustain. Eng. 1(2), 105–116 (2008)
3. Nawaz, M., Enscore, E.E., Jr., Ham, I.: A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 11(1), 91–95 (1983)
4. Öztop, H., Fatih Tasgetiren, M., Türsel Eliiyi, D., Pan, Q.-K.: Green permutation flowshop scheduling: a trade-off between energy consumption and total flow time. In: Huang, D.-S., Gromiha, M.M., Han, K., Hussain, A. (eds.) ICIC 2018. LNCS (LNAI), vol. 10956, pp. 753–759. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95957-3_79
5. Pan, Q.K., Tasgetiren, M.F., Liang, Y.C.: A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem. Comput. Oper. Res. 35(9), 2807–2839 (2008)
6. Ramezanian, R., Vali-Siar, M.M., Jalalian, M.: Green permutation flowshop scheduling problem with sequence-dependent setup times: a case study. Int. J. Prod. Res. 57(10), 3311–3333 (2019)
7. Sapkal, S.U., Laha, D.: A heuristic for no-wait flow shop scheduling. Int. J. Adv. Manuf. Technol. 68(5), 1327–1338 (2013)
8. Taillard, E.: Benchmarks for basic scheduling problems. Eur. J. Oper. Res. 64(2), 278–285 (1993)
9. Tasgetiren, M.F., Yüksel, D., Gao, L., Pan, Q.K., Li, P.: A discrete artificial bee colony algorithm for the energy-efficient no-wait flowshop scheduling problem. Procedia Manuf. 39, 1223–1231 (2019)
10. Tseng, L.Y., Lin, Y.T.: A hybrid genetic algorithm for no-wait flowshop scheduling problem. Int. J. Prod. Econ. 128(1), 144–152 (2010)
11. Yüksel, D., Taşgetiren, M.F., Kandiller, L., Gao, L.: An energy-efficient bi-objective no-wait permutation flowshop scheduling problem to minimize total tardiness and total energy consumption. Comput. Ind. Eng. 145, 106431 (2020)
12. Yüksel, D., Taşgetiren, M.F., Kandiller, L., Pan, Q.K.: Metaheuristics for energy-efficient no-wait flowshops: a trade-off between makespan and total energy consumption. In: 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2020)
An Elastic Model for Virtual Computing Labs Using Timed Petri Nets Walid Louhichi1,2(B) , Sana Ben Hamida3 , and Mouhebeddine Berrima1 1 Parallel Algorithms and Optimization Research Unit, Faculty of Sciences Tunis, Tunis, Tunisia
[email protected] 2 02 rue Suez, 6000 Tunis, Tunisia 3 Research Team on Intelligent Machines, National Engineering School of Gabes, Gabes University, Gabes, Tunisia
Abstract. With the onset of the corona pandemic and during periods of confinement, teaching has become totally online. E-Learning is successful for theoretical subjects, but this is not the case for practical workshops. The laboratories require a rather enormous hardware and software infrastructure. These resources consist of all the software, applications, computer components, servers, networks and any element necessary for the implementation of the educational system, and especially the practical work rooms (labs). The solution is to unify cloud computing with virtual labs and e-Learning. Cloud computing represents a new digital technology for education that supports virtualization and offers the majority of computing resources as a pay-as-you-go service. The services offered by the cloud provider are specified in detail in a contract signed with the user. The SLA (Service Level Agreement) contains a set of Service Level Objectives, such as minimum downtime, and penalties. The adoption of cloud computing for education is a solution to the problems of maintenance, security, insufficient hardware infrastructure and the unavailability of licensed and up-to-date software editions. This work is divided into two parts: the first shows the usefulness of adopting cloud computing for teaching, and especially of virtual laboratories for practical sessions; the second gives an elastic model for the management of virtual laboratories, with the goal of minimizing cost by adapting the provisioned resources to the resources actually used while maintaining the quality of service. The established model is edited, analyzed, and verified with the CPN Tools software. Keywords: Cloud computing · Virtual labs in cloud computing · Virtual Laboratories as a service · E-Learning · Elasticity
1 Introduction

Nowadays, learning is based on the use of online technologies. E-Learning is an internet-based system; this technology facilitates access to and configuration of the available resources. Thanks to distance learning platforms, the student can find his educational resources
anywhere and at any time. Generally, the distance learning platforms have no problem supporting the smooth running of sessions and the sharing of resources, except for problems related to internet connections. The real problem encountered in teaching is the hardware infrastructure available in the computer labs, which does not support the installation and implementation of some software, in addition to software licenses. The hardware configuration, in terms of computation and storage capacities, is an essential element for successful E-Learning.

Virtualization, remote resource management, the internet and videoconferencing are new information technologies that have given rise to a new paradigm called Cloud Computing. This new concept is adopted to meet the technological demands of organizations; it allows them to access IT resources and applications at any time and from anywhere and to pay per use. With the emergence of cloud computing, many universities adopt cloud computing technology to benefit from the reduced cost of personnel, of the purchase and updating of software, of hardware (servers, PCs, switches, etc.), and of the hardware configuration required for certain applications or software.

In this paper we make an analytical study of the importance of the use of cloud computing in education, and especially of virtual labs. The rest of this paper is organized as follows. Section 2 contains definitions of cloud computing, cloud deployment models and cloud service models, whereas Sect. 3 presents some works on the application of cloud computing in education. Section 4 is about the definition of timed and colored Petri Nets and elasticity mechanisms in Cloud Computing. In Sect. 5 we present our contribution, and finally Sect. 6 gives our conclusion and perspectives.
2 Cloud Computing

Cloud computing is the access to computer services (servers, storage, software, networks) over the Internet. So instead of storing documents directly on your computer or on local physical servers, you can store them on the Internet. In fact, cloud computing is a new way of delivering information technology solutions as a service [1]. According to the official NIST definition [2], cloud computing is a model for providing universal, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or interaction from service providers. We can also characterize cloud computing as a way to exploit virtualization and aggregate computing resources; cloud computing can provide economies of scale that would otherwise be inaccessible. Almajalid [3] defines cloud computing as a collection of computer resources supplied through the internet. A cloud is formed when a diagram depicting the relationships among the various elements is drawn. All apps, networks, and servers are taken into consideration. As a result, the end user may utilize the internet to access all of these pieces. Cloud computing allows customers to pay for just the services they need, such as processing time, memory, and bandwidth, making it more affordable. End users do not need to buy new equipment or maintain hardware, upgrade or update old software, get software licenses, or synchronize data, because the cloud service encompasses all of these things. Because cloud computing is very scalable, the number of
resources consumed is determined by the application's requirements, and payment is based on real resource usage. It is also distinguished by its mobility and platform independence, which means that it may be accessed from anywhere, at any time, and on any device. Cloud computing, according to NIST [2], has three service models, four deployment types, and five primary features, which are listed below.

2.1 Cloud Service Models

To comprehend cloud computing as a new approach to IT, it is necessary to know the different service models offered by cloud computing. Figure 1 exposes the cloud computing model, which will be detailed below.

1) Software as a Service (SaaS)

Clients can use the various providers' apps that operate on the cloud infrastructure thanks to SaaS [4], but they don't have control over the hardware, network infrastructure, or operating system. It enables access to programs that are functional in nature, such as CRM, web conferencing, ERP, and email, among other things. Security, quick scalability, software compatibility, worldwide accessibility, and reliability are just a few of the advantages of SaaS. It also supports a variety of company operations, such as human resource management, content management, accounting, and computerized billing, to name a few. SaaS is a model for easily accessing cloud-based online applications. The SaaS vendor is responsible for managing the computer system, which may be accessed via a web browser. NetSuite, Google, Citrix, and Salesforce.com are just a few of the companies that offer SaaS.

2) Platform as a Service (PaaS)

PaaS [5] allows customers to hire virtual servers as well as other services needed to run existing applications. The customer remains responsible for the design, development, testing, deployment, and hosting of applications. Clients can deploy and operate applications, including hosting environment configurations, but they do not have control over the hardware, operating system, or network infrastructure. Freedom from managing software updates, decreased risk, and easier implementation are only a few of its features. PaaS is a cloud-based approach for designing, testing, and managing diverse business applications. The use of this approach streamlines the process of developing corporate software. PaaS provides a virtual runtime environment in which we may build and test a variety of apps.
Google App Engine, Salesforce.com, Microsoft Azure, and Rackspace Cloud Sites are all PaaS providers.

3) Infrastructure as a Service (IaaS)

IaaS [6] is in charge of a variety of tasks, including running the applications and operating systems, as well as housing, maintaining, and managing the client's varied equipment. The consumer, on the other hand, does not control the underlying cloud infrastructure. Consumers are charged on a utility computing basis. Dynamic scalability, internet connectivity, automated administrative activities, platform virtualization, and a lower total cost of ownership leading to lower capital expenditure are some of the qualities associated with IaaS. IaaS is a virtualized arrangement of computing resources that can be accessed through the internet. An IaaS cloud provider may supply the company with a complete computing infrastructure, including storage servers and networking equipment. This infrastructure is also maintained and supported by the vendor. Rackspace Cloud Servers, Google, Amazon EC2, and IBM are among the IaaS services supplied by vendors.

We have presented the most important cloud service models; however, Patel and Kansara [7] divided them into four layers based on their functioning. In fact, they add a Function as a Service (FaaS) layer. With FaaS, customers may run code on demand, in response to events, instead of allocating processing resources ahead of time.

2.2 Cloud Deployment Models

In the literature, we have found different classifications of cloud deployment models. The main ones are the private cloud, public cloud, hybrid cloud and community cloud [3, 8] and [9]. Patel and Kansara [7] present six types of deployment models; in addition to the previous deployment models, they expose the virtual private cloud and the Inter-Cloud, which is a deployment paradigm that consists of two types of clouds: Federated Clouds and Multi-clouds. In this paper, we will settle for presenting the four main deployment models, as shown in Fig. 1.

1) Private cloud

Internal or corporate cloud is another name for the private cloud deployment architecture. A private cloud is the property of a single company. The system is controlled and managed centrally by that entity. A private cloud server can be hosted by a third party (for example, a service provider). The majority of businesses keep their hardware in their own data centers, from where everything is overseen and managed by an internal staff. It is a type of cloud computing in which service access is restricted or the client has some control/ownership over how the service is implemented [10].

2) Public cloud

A public cloud is one of the most used cloud computing models. Customers and organizations may access resources such as applications, infrastructure, and storage over
the internet, thanks to the service provider. Microsoft, Google, and other service providers have their own infrastructure in their data centers. The only means of access is via the internet; there is no direct connectivity in the public cloud design. The services may be supplied for free (e.g., Gmail) or as pay-per-use services to companies. The client has minimal visibility or control over the location of the computing infrastructure in this scenario. Although public cloud services are simple to manage and inexpensive, they are not as secure as private clouds [11].

3) Hybrid cloud

Hybrid cloud services are a mix of public and private clouds provided by various vendors. One of the drawbacks of these services is that we must coordinate the management of many security platforms. Hybrid clouds are made to work together seamlessly, with data and apps flowing freely across the two platforms. It is the ideal choice for a company or organization that needs a little bit of both, which varies by industry and size [7]. A hybrid cloud, in essence, begins as a private cloud and then extends the integration to include one or more public cloud services. This deployment strategy makes sense when enterprises have sensitive data that cannot be kept in the public cloud, or legal obligations that require data security, storage, and more.

4) Community cloud

The Community Cloud intends to merge Grid Computing's distributed resource provision, Digital Ecosystems' distributed control, and Green Computing's sustainability with Cloud Computing's use cases, while leveraging Autonomic Computing's self-management improvements. It replaces vendor clouds by forming a Community Cloud out of the underutilized resources of user PCs, with nodes capable of fulfilling all functions, including consumer, producer, and, most critically, coordinator [12].
Fig. 1. Cloud computing architecture for university (Service and deployment Models)
2.3 Benefits and Challenges of Cloud Computing

The cloud, like any other new technical innovation, comes with its own set of benefits but also presents some issues and concerns. These obstacles have the potential to stifle adoption, necessitating attention and solutions to overcome them.

1) Benefits of cloud computing

Cloud computing allows customers to access technologies (infrastructure, platforms, software) that normally require regular, sophisticated and relatively expensive IT support. For example, few small businesses have the knowledge and expertise required to perform day-to-day management of dedicated servers and storage servers to run commercially available ERP or CRM packages. By leveraging cloud-based IaaS and SaaS solutions, businesses can stop worrying about the details of installation, infrastructure operation, and technical software. Furthermore, by using cloud computing, SMEs can access service levels that are much higher, particularly in terms of reliability and performance, than could be achieved with a standard on-site installation. Gkikas [8] classifies the benefits of cloud computing into immediate and long-term benefits. To begin with, one of the most notable advantages of the cloud computing utility model is the decrease of IT expenditures and the simplification of IT operations. As an indirect benefit of cloud computing, Gkikas [8] mentions the ability of companies to concentrate on their core operations and on what matters most to them. Another advantage is that cloud users do not need to be concerned about over-provisioning, because computing resources may be dynamically supplied based on the needs and demands of the users. Almajalid [3], however, mentions the five main advantages, which are: convenience and improved accessibility, cost saving, reduced expenditure on technology infrastructure, minimal training of the personnel, and super-computing power.

2) Challenges of cloud computing

Despite the benefits of cloud computing, there are a number of challenges that should be addressed. Vendor lock-in, latency, dependability, security, control, performance, and privacy are among the most pressing problems. Organizations that are hesitant to hand over the management of their IT resources to other parties risk having the existing technology modified without the agreement of their consumers. As a result, consumers rely on the provider to operate and update their software because they have no control over the servers [13].

Security and privacy: Although data in the cloud is often shared, the data owner should have complete control over who gets access to the data and what they are authorized to do with it once they have it. A standards-based, heterogeneous, data-centric security strategy that moves data protection away from systems and apps is a crucial aspect of providing this data management in the cloud. Documents must be self-describing and self-defending in this approach, independent of their contexts. Data lock-in is also a significant issue: if the provider ends its business or service activities, cloud users may be forced to migrate data or services to another provider or back to an in-house IT environment.
Vulnerability to attack: Every component in cloud computing is online, which exposes it to potential risks. Even the greatest teams are subjected to serious attacks and security breaches on occasion. It is easy to run before you learn to walk, since cloud computing is designed as a public service: after all, no one at a cloud vendor checks your administrative abilities before giving you an account; all you usually need is a valid payment card to get started.

Limited control and flexibility: Because the service provider owns, manages, and monitors the cloud infrastructure, the consumer has very little influence over it.
3 Related Works

In this section, we review research on the use of cloud computing in education. Chandra and Malaya [14] highlight the use of cloud computing in education through an analytical study. This investigation covers Virtual Cloud Labs (VCL), the Apache VCL project, an IBM case study, on-premises versus cloud deployment, and the Indian cloud computing scenario. The survey validated the effectiveness of the cloud as a provider of educational services. The adoption of cloud solutions for education reduces software licensing costs and the number of IT personnel, optimizes the use of existing IT resources to better respond to data processing needs, provides a sufficient platform for educational evolution and minimizes the global cost.

Wang and Xing [15] dealt with the application of cloud computing in the computerization of education. This work presents the experience of some universities in the application of cloud computing. The authors affirmed that the cloud allows a good adaptation to the evolution of information technologies in education, and provides a unified and open platform for information on education, an equal distribution of resources and an improvement in the informatization of education.

Radulescu [16] offers a new approach that combines distance education with cloud computing. This contribution consists of extending the cloud service model with virtual laboratories. The new service is called Virtual Laboratories as a Service (VLaaS). The VLCS (Virtual Laboratories Cloud System) is a software layer that connects the cloud management system with the E-Learning management system. The use of a virtual laboratory makes it possible to allocate, configure and release resources and laboratories per use.

In [17], Vaidya et al. present the impact of cloud computing on education. The heavy daily use of smartphones, in addition to their efficiency for communication and their connectivity to cloud services, is demanded in modern education given the enormous amount of educational resources stored and processed. The authors are in favor of the adoption of mobile cloud computing given its reduced cost, its portability and its ease and speed of sharing and deployment. The authors presented some cloud computing systems for education such as MobihybridEduCloud (MHEC) and Mobile Hybrid Cloud (MHC).

Xia et al. [18] proposed a new scheme describing the ease that cloud computing has provided for educational institutions. Outsourcing large amounts of data in the form of images to cloud servers reduces the computational load and high storage requirements, but raises privacy concerns. In the above-mentioned paper, an efficient privacy-preserving computing protocol for highly encrypted images, demonstrating local bit pattern methods, is discussed. The proposed system uses image segmentation, blocks, and pixel permutations to protect privacy.
Ayush Gaur and Manoj Manuja [19] talked about the "Education Cloud", a framework that combines the flexibility offered by the cloud with a single point of contact. Online classes are created using a Learning Management System (LMS). To provide a single point of contact, a cloud management portal is used, which acts as a bridge between the LMS and any cloud service provider. The user can request services from any cloud service provider that supports the required implementation, which makes the model simpler and more scalable.

Mehmet Fatih Erkoç and Serhat Bahadir Kert [20] discussed the factors that make cloud computing interesting: first, the reduction of hardware and computing costs; second, the increase in required storage capacity due to the exponential growth in data size over time; and third, the widespread availability and adoption of compute services. The goal of their prototype was to efficiently handle technological needs such as data storage and university computing.

Mircea and Andreescu [21] presented an adoption plan and a starting point for colleges to employ cloud computing. The authors presented a strategy which is divided into five stages, with the first focusing on evaluating data and processes/functions/applications from several major universities based on a set of key criteria, and the second on creating a link between these aspects and the models/services/applications available on the cloud market. However, Naved et al. [22] introduced cloud computing as a way to save IT expenses while increasing flexibility and scalability; nevertheless, the long-term viability of many cloud computing services remains to be proven. Measures to address security, privacy, and legal problems, as well as cloud technology standards, are currently being developed. The relevance of cloud computing technologies in educational institution management is discussed by Naved et al. [22]. This raises a slew of issues for educational institutions when it comes to cloud computing usage.

Lin Hu [23] examined the possibilities of the integration of cloud computing with mobile education and proposed a new mobile education mode based on cloud computing. The author announced that the use of cloud computing technology in the field of mobile education will significantly lower educational institutions' operational costs and free up funds for low-carbon, creative education. The essential framework of the future education mode is the mobile education cloud, which delivers stable learning services to the majority of learners.

Di Yu [24] announced that most Chinese universities use cloud computing for teaching management, breaking away from the traditional mode of university resource management and fully integrating with modern advanced concepts and technologies to keep up with the times. The author adds that cloud computing gives access to a lot of computing power and a wealth of learning and scientific research resources, making work more convenient and efficient. At the same time, it offers a new path for the informationization of educational management. In the same context, Ghazel Riahi [25] exposed that cloud computing, by introducing an efficient scaling mechanism, can allow suppliers to build e-learning systems, resulting in a new type of e-learning system.

The VCL (Virtual Computing Lab) [26] is a cloud-based solution for educational services that provides students, teachers, and administrators with hardware, software, and service resources without the need for a powerful hardware setup at home or in the office.
Users can access the VCL account through a personal computer or workstation at
any time and from any location. The benefits of VCL are not available with traditional practical work rooms.
4 Background

In this section, Timed and Colored Petri Nets are presented, followed by a definition of elasticity and its mechanisms.

4.1 Timed and Colored Petri Nets

A Petri Net (PN), also called a Place/Transition Network or P/T Net, is a graphical and mathematical model used to represent and verify the dynamic behavior of various systems (IT, industrial, ...) [27]. A generic Petri Net (PN) is a bipartite graph composed of non-empty sets of places P and transitions T. Depending on the model, places and transitions are linked by oriented arcs, and each arc can have a weight (valuation). A place can contain tokens, either from the initial marking or as a result of firing transitions; tokens usually indicate that resources are available. In a Colored Petri Net (CPN), each place has a data type called a colour set, which enumerates the set of authorized token colors at this place. Unlike in a PN, a token in a CPN has attached data (a colour). A Timed and Colored Petri Net (TCPN) is a CPN to which the notion of time is added. A timestamp is attached to each token to indicate its arrival time or the time at which the token is ready to fire a transition. A formal definition of a CPN is stated as a nine-tuple CPN = (P, T, A, Σ, V, C, G, E, I), where [27]:
• P is a finite set of places (P = {p1, p2, …, pn}).
• T is a finite set of transitions (T = {t1, t2, …, tm}), such that P ∩ T = ∅.
• A ⊆ P × T ∪ T × P is a finite set of directed arcs.
• Σ is a finite set of non-empty colour sets.
• V is a finite set of typed variables such that Type[v] ∈ Σ for all variables v ∈ V.
• C: P → Σ is a colour set function that assigns a colour set to each place.
• G: T → EXPR_V is a guard function that assigns a guard to each transition t such that Type[G(t)] = Bool.
• E: A → EXPR_V is an arc expression function that assigns an arc expression to each arc a such that Type[E(a)] = C(p)_MS, where p is the place connected to the arc a.
• I: P → EXPR_∅ is an initialization function that assigns an initialization expression to each place p such that Type[I(p)] = C(p)_MS.
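For illustration only, the nine-tuple can be mirrored by a small data structure; the sketch below is not CPN Tools syntax, and the toy net (a place holding VM tokens and a scale_up transition) is a hypothetical example in the spirit of the model developed later.

```python
# Minimal data-structure sketch of the CPN nine-tuple (illustration only).
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, Tuple

Arc = Tuple[str, str]   # (source, target): place -> transition or transition -> place

@dataclass
class ColouredPetriNet:
    places: FrozenSet[str]
    transitions: FrozenSet[str]
    arcs: FrozenSet[Arc]
    colour_sets: Dict[str, type]                   # Sigma: named colour sets
    variables: Dict[str, type]                     # V: typed variables
    colour: Dict[str, str]                         # C: place -> colour set name
    guard: Dict[str, Callable[..., bool]]          # G: transition -> boolean guard
    arc_expr: Dict[Arc, Callable[..., Any]]        # E: arc -> expression
    init_marking: Dict[str, list] = field(default_factory=dict)  # I: initial multisets

# Toy example: a place holding VM counts and a guarded scale_up transition
net = ColouredPetriNet(
    places=frozenset({"VMs"}),
    transitions=frozenset({"scale_up"}),
    arcs=frozenset({("VMs", "scale_up"), ("scale_up", "VMs")}),
    colour_sets={"VM": int},
    variables={"x": int},
    colour={"VMs": "VM"},
    guard={"scale_up": lambda x: x < 8},
    arc_expr={("VMs", "scale_up"): lambda x: x, ("scale_up", "VMs"): lambda x: x + 1},
    init_marking={"VMs": [2]},
)
```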
4.2 Cloud Elasticity
Elasticity is the adaptation of resources to the workload. In order to adjust the supply of resources in proportion to demand, an elasticity controller must be used. It makes it possible to set up a set of rules allowing cloud resources to be resized so as to meet demand with minimum cost and energy consumption. Scaling is a strategy adopted by considering several variables such as the actual workload, the resources allocated and available, and even the prediction of the future state of the cloud system [1]. Resizing is beneficial for the cloud provider as well as for the user, since it reduces cost as needed. In the case of virtual laboratories, the administrator can adjust the requested resources to the software used, which reduces the cost. Elasticity can be horizontal or vertical, as shown in Fig. 2. Horizontal scaling is ensured by two mechanisms: the first, scaling out, is the addition of resources such as VMs when additional resources are needed (in case of under-supply); the second, scaling in, is the removal of a resource in the event of over-supply. Another method of ensuring the elasticity of cloud systems is migration, which moves the hosting of a virtual machine from one physical machine to another. Vertical scaling is ensured by two mechanisms: the first, scaling up, reconfigures the resource by increasing the amount of CPU, RAM or bandwidth (in case of under-supply); the second, scaling down, reconfigures the resource by reducing the amount of CPU, RAM or bandwidth (in case of over-supply).
Fig. 2. Scaling mechanisms
5 Proposed Approach
In this section, we present the challenges of moving to VCL and we describe our proposed contribution in detail. Our proposed model is based on Timed and Colored Petri Nets (TCPN) to size and resize the Virtual Cloud Labs by using vertical elasticity.
5.1 The Challenge of Moving to VCL
Based on what exists in computer labs, we notice several limitations in these traditional labs. Table 1 summarizes the various disadvantages of the traditional labs.
The use of virtual laboratories in cloud computing first allows a significant reduction in hardware and software costs through virtualization and the pay-per-use service. It also avoids the problems related to installations, given the insufficiency of infrastructure and of IT staff skills. The adoption of VCL in education is beneficial for the student, the teacher, and the establishments.

Table 1. Disadvantages of traditional labs

| Element | Remarks |
| --- | --- |
| Computers | Hardware configuration does not support the software used |
| IT staff | The number of technicians is not enough to manage the laboratories |
| Students | More than one student per computer, so less practice per student |
| Teacher | Cannot ensure that the information is transmitted to the student; the evaluation of students is done in pairs |
| Software | High cost of licenses, which leads to the use of unlicensed software or trial versions |
| Access time | Reduced access time to the teaching session |
| Storage space | Very limited and non-private storage space; destruction of data by other users is possible |
| Scalability | In case of software that requires a stronger hardware configuration, there are few solutions to scale the infrastructure |
| Infrastructure and computational power | No capacity for on-demand infrastructure and computational power |
| Collaboration | Weak collaboration between users |
| Security | Low level of security |
| Virtualization | Infrastructure based on high use of physical resources |
5.2 Model Description
In this section, we present our proposed model, which is based on Timed Colored Petri Nets, for managing virtual cloud lab elasticity. Figure 4 shows the proposed model for managing a VCL: first, a lab can be created by a user (teacher, student, other) in any Cloud Computing deployment model (public, private, hybrid, community) for a well-determined course and with an initial capacity. The essential metric of capacity is the MIPS (million instructions per second) of the central processing unit (CPU). Then, the
creator or the administrator of the lab allocates the resources necessary for the smooth running of the course. In our model we have chosen to provide three types of resources that correspond to three different Virtual Machine (VM) configurations, as shown in Table 2. As already mentioned, we use vertical elasticity mechanisms to adjust the reserved resources to the consumed resources after system sizing. Vertical elasticity can be a scaling up or a scaling down. We have chosen the following configurations for the Virtual Machines [28].

Table 2. Virtual Machines specification.

| Machine Name | CPU (MIPS) | RAM (GB) | Storage | Bandwidth |
| --- | --- | --- | --- | --- |
| M4.4Xlarge | 15,000 | 64 | 1 GB–16 TB | 100 Mbps |
| M4.10Xlarge | 97,000 | 160 | 1 GB–16 TB | 10 Gbps |
| R4.16Xlarge | 350 | 488 | 1 GB–16 TB | 100 Gbps |
The machines M4.4Xlarge, M4.10Xlarge and R4.16Xlarge match respectively the VMA, VMB and VMC tokens of the colour set Resource.
5.3 Vertical Elasticity Algorithm
The state of the VCL can be normal, oversupplied, or undersupplied. To determine the state of a VCL we calculate the sum of all resources allocated to the lab and compare it with the capacity of the lab. A laboratory is undersupplied when the resources used exceed 90% of the laboratory capacity. A laboratory is oversupplied when the resources used are less than 30% of the laboratory capacity.
State(L) =
  Over_provisioning  if Σ capacity(Ri) < 0.3 × capacity(L)
  Under_provisioning if Σ capacity(Ri) > 0.9 × capacity(L)
  Normal             otherwise
Begin
  For each L ∈ Lab {
    S = 0;
    For each R ∈ L {
      S = S + CPU(R)
    }
    If (S > 0.9 * Cap(L)) Then Scaling_Up;
    Else If (S < 0.3 * Cap(L)) Then Scaling_Down;
    Else NULL;
    End If;
    End If;
  }
End;
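A runnable sketch of this check is given below; the 90%/30% thresholds follow the text above, while the function name, the MIPS-only view of capacity and the example values are assumptions made for illustration.

```python
# Hedged sketch of the vertical elasticity check described above.
# Capacities are expressed in CPU MIPS only, as in the paper's essential metric.
def check_lab_state(lab_capacity_mips, vm_capacities_mips, upper=0.9, lower=0.3):
    """Return 'scale_up', 'scale_down' or 'normal' for one lab."""
    used = sum(vm_capacities_mips)          # S = sum of CPU(R) over resources R in L
    if used > upper * lab_capacity_mips:    # under-provisioned lab
        return "scale_up"
    if used < lower * lab_capacity_mips:    # over-provisioned lab
        return "scale_down"
    return "normal"

# Example with made-up lab capacities and VM allocations:
print(check_lab_state(300_000, [97_000, 97_000]))   # -> 'normal'
print(check_lab_state(300_000, [15_000]))           # -> 'scale_down'
```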
5.4 Proposed Solution
For the development of our model, we took as a basis the strategies and architectures adopted by cloud computing oriented to higher education and e-learning. Figure 1 shows the different deployment models as well as the different services offered by a Cloud architecture for higher education. This figure contains an additional level for mobile resources concerning e-learning, virtual labs, research environments and others. A virtual computing lab is a service provided by cloud computing that allows students and teachers (users) to book and access virtual machines, labs and applications online. The applications can be standard or customized. Every user can use this virtual lab anywhere and anytime from their computer or workspace. A virtual laboratory is a set of hardware and software provided by a cloud provider and deployed in a cloud environment. It can be a service model using virtual machines configured as needed; in such a case the configuration can be hardware (CPU, cores, memory, …) or software (applications, programs). The user can manage his workspace according to the needs of the course, and the management of the workspace can be linked with a learning management system (Moodle, Blackboard, etc.). Figure 3 shows the lifecycle of a virtual lab.
Fig. 3. Virtual laboratories life cycle
In this solution we tried to provide flexibility in the creation of virtual laboratories and ease of resizing the system, so that the provided resources adapt well to the workload. Adapting the resources used to the workload allows us to reduce the cost of the service without affecting its quality (without downtime).
The elasticity of this model is beneficial for the cloud provider as well as for the user of the laboratories. Figure 4 shows our Petri Nets elastic model. The formal definition of the proposed TCPN model is TCPN = (P, T, A, Σ, V, C, G, E, I), where
• P = {Model, User, Subject, Resource, LABS, LabRes}
• T = {Create_VCL, Allocate_Resource, Scaling_UP, Scaling_Down}
• A = {(Model → Create_VCL), (User → Create_VCL), (Course → Create_VCL), (Create_VCL → LABS), (LABS → Allocate_Resource), (Resource → Allocate_Resource), (Allocate_Resource → LABS_RES), (LABS_RES → Scaling_UP), (LABS_RES → Scaling_Down), (Scaling_UP → LABS_RES), (Scaling_Down → LABS_RES)}
• Σ = {Model, User, Resource, Labs, LabRes}
• V = {M, U, C, R, L}
C(p) =
  Model    if p ∈ {Model, LABS}
  User     if p ∈ {User, LABS}
  Labs     if p ∈ {LABS}
  Resource if p ∈ {Resource}
  LabsRes  if p ∈ {LABS_RES}
G(t) = true for all t ∈ T
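For readability, the proposed TCPN can also be transcribed into plain Python literals; the snippet below is only such a transcription (guards are simplified to constants and the colour functions are omitted), not executable CPN Tools code.

```python
# Transcription of the proposed TCPN into Python literals (illustrative only).
# Note: "Course" appears in A while "Subject" appears in P, as in the definition above.
tcpn = {
    "P": {"Model", "User", "Subject", "Resource", "LABS", "LabRes"},
    "T": {"Create_VCL", "Allocate_Resource", "Scaling_UP", "Scaling_Down"},
    "A": {
        ("Model", "Create_VCL"), ("User", "Create_VCL"), ("Course", "Create_VCL"),
        ("Create_VCL", "LABS"), ("LABS", "Allocate_Resource"),
        ("Resource", "Allocate_Resource"), ("Allocate_Resource", "LABS_RES"),
        ("LABS_RES", "Scaling_UP"), ("LABS_RES", "Scaling_Down"),
        ("Scaling_UP", "LABS_RES"), ("Scaling_Down", "LABS_RES"),
    },
    "Sigma": {"Model", "User", "Resource", "Labs", "LabRes"},
    "V": {"M", "U", "C", "R", "L"},
    # G(t) = true for every transition in this model.
    "G": {t: (lambda binding: True)
          for t in ("Create_VCL", "Allocate_Resource", "Scaling_UP", "Scaling_Down")},
}
```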
5.5 Support Tools
Our model is edited with CPN Tools. CPN Tools was originally developed by the CPN Group at Aarhus University from 2000 to 2010 [22]. CPN Tools comprises two main components: a graphical editor and a backend simulator. It is a tool for editing, simulating, and analyzing basic PNs, timed PNs and colored PNs [1].
Fig. 4. An elastic Virtual Computing Lab in Higher education.
6 Conclusion
The proposed model extends the functionality of cloud computing for education by providing a virtual lab as a service and scaling the lab resources according to their utilization. This model uses an algorithm for vertical elasticity: the program controls the VCL by comparing the resources allocated with the resources used, and after this comparison the scaling is adopted depending on the state of the virtual cloud lab. The model enriches the e-learning of practical work with another method and transcends the limits of the hardware configurations encountered in physical laboratories. It reduces the cost of the lab service by scaling resources according to the applications used and the time of use. In fact, the adoption of our elastic model guarantees an optimization of the use of resources, which is adjustable according to the demands of the users. In our future work, we will first try to model virtual machines or laboratories as a package of applications collaborating towards the same objective of a teaching unit and with a necessary and sufficient hardware configuration. We then plan to add a module for horizontal elasticity and even migration, and possibly to offer well-configured laboratories that contain the software necessary for performance studies on research subjects. These laboratories can be classified by objective or theme.
References
1. Louhichi, W., Berrima, M., Robbana, N.B.R.: A timed and colored Petri nets for modeling and verifying cloud system elasticity. Int. J. Electrical Comput. Eng. 16(3), 24–33 (2022)
2. Swenson: Final Version of NIST Cloud Computing Definition Published. NIST, 25 October 2011. https://www.nist.gov/news-events/news/2011/10/final-version-nist-cloud-computing-definition-published (Accessed 15 April 2022)
3. Almajalid, R.: A Survey on the Adoption of Cloud Computing in Education Sector (June 2017)
4. Sitaram, D., Manjunath, G.: Chapter 4 - Software as a service. In: Sitaram, D., Manjunath, G. (eds.) Moving to the Cloud, pp. 153–204. Syngress, Boston (2012)
5. Sitaram, D., Manjunath, G.: Chapter 3 - Platform as a service. In: Sitaram, D., Manjunath, G. (eds.) Moving to the Cloud, pp. 73–152. Syngress, Boston (2012)
6. Sitaram, D., Manjunath, G.: Chapter 2 - Infrastructure as a service. In: Sitaram, D., Manjunath, G. (eds.) Moving to the Cloud, pp. 23–71. Syngress, Boston (2012)
7. Patel Hiral, B.: Cloud Computing Deployment Models: A Comparative Study (March 2021)
8. Gkikas, D.: The Impact of Cloud Computing on Entrepreneurship and Start-ups: Case of Greece (2014). (Accessed 16 April 2022)
9. Bamiah, M., Brohi, S.: Exploring the cloud deployment and service delivery models. Int. J. Res. Rev. Inf. Sci. 3, 2046-6439 (2011)
10. Kulkarni, G., Patil, N., Patil, P.: Private cloud secure computing. Int. J. Soft Comput. Eng. (IJSCE), 2231-2307 (2012)
11. Balasubramanian, R., Aramudhan, M.: Security issues: public vs private vs hybrid cloud computing. Int. J. Comput. Appl. (0975-8887), 13 (2012)
12. Marinos, A., Briscoe, G.: Community Cloud Computing, pp. 472–484 (January 2009)
13. Fujita, H., Tuba, M., Sasaki, J. (eds.): Study on advantages and disadvantages of Cloud Computing – the advantages of Telemetry Applications in the Cloud. WSEAS International Conference on Digital Services, Internet and Applications (2013)
14. Chandra, D.G., Borah Malaya, D.: Role of cloud computing in education. In: 2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET), Nagercoil, Tamil Nadu, India, pp. 832–836 (March 2012)
15. Wang, B., Xing, H.: The application of cloud computing in education informatization. In: 2011 International Conference on Computer Science and Service System (CSSS), pp. 2673–2676 (June 2011)
16. Rădulescu, Ş.A.: A perspective on e-learning and cloud computing. Proc. Soc. Behav. Sci. 141, 1084–1088 (2014)
17. Vaidya, S., Shah, N., Virani, K., Devadkar, K.: A survey: mobile cloud computing in education. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 655–659 (June 2020)
18. Xia, Z., Ma, X., Shen, Z., Sun, X., Xiong, N.N., Jeon, B.: Secure image LBP feature extraction in cloud-based smart campus. IEEE Access 6, 30392–30401 (2018)
19. Gaur, A., Manuja, M.: Implementation framework for cloud-based education-as-a-service. In: 2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE), pp. 56–61 (December 2014)
20. Erkoç, M.F.: Cloud Computing for Distributed University Campus: A Prototype Suggestion. pixel-online.net (Accessed 22 April 2022)
21. Mircea, M., Andreescu, A.: Using cloud computing in higher education: a strategy to improve agility in the current financial crisis. Commun. IBIMA 2011 (2010)
22. Naved, M., Sanchez, D.T., Dela Cruz, A.P., Peconcillo, L.B., Peteros, E.D., Tenerife, J.J.L.: Identifying the role of cloud computing technology in management of educational institutions. Mater. Today: Proc. 51, 2309–2312 (2022)
23. Hu, L.: The construction of mobile education in cloud computing. Proc. Comput. Sci. 183, 14–17 (2021)
24. Yu, D.: Application of cloud computing in education management informatization construction. J. Phys. Conf. Ser. 1744, 032062 (2021)
25. Riahi, G.: E-learning systems based on cloud computing: a review. Proc. Comput. Sci. 62, 352–359 (2015)
26. Rindos, A., Vouk, M.: The Transformation of Education through State Education Clouds, p. 12
27. Volpe, G., Mangini, A.M., Fanti, M.P.: Job shop sequencing in manufacturing plants by timed coloured Petri nets and particle swarm optimization. IFAC-PapersOnLine 55(28), 350–435 (2022)
28. Ali, S., Mostafa, G.-S., Leila, E.: An elastic controller using Colored Petri Nets in cloud computing environment. J. Netw. Softw. Tools Appl. 22(3) (2019)
A Decision Support System Based Vehicle Ontology for Solving VRPs

Syrine Belguith1(B), Soulef Khalfallah1,2, and Ouajdi Korbaa1

1 MARS Research Laboratory LR17ES05, University of Sousse, ISITCom Hammam Sousse, Sousse, Tunisia
[email protected]
2 High Institute on Management of Sousse (ISGS), Sousse, Tunisia
Abstract. The Vehicle Routing Problem (VRP) is an optimization problem. In practice, vehicle characteristics must be explicitly considered in the resolution. This concept is defined in the literature for specific purposes (intelligent transportation, automotive, transportation planning, etc.). For a better understanding of the domain of VRP classification, a VRP-Vehicle ontology is presented as a means of knowledge representation of this field. A system based on the proposed VRP-Vehicle ontology aims at generating a CSV file that represents the classification of the VRP. Furthermore, it helps transport companies to specify the characteristics of the customers and the vehicles that will be used for the resolution of the VRP.
Keywords: Vehicle ontology · VRP · Decision Support System

1 Introduction
The vehicle concept is one of the main resources for transport companies and any organization interested in routing problems, such as schools that use buses to transport students, the post office that distributes postal parcels, etc. Data and other vehicle-related concepts vary from one vehicle to another. Indeed, the same means of transport may comprise a compartment for objects and a compartment for passengers. These vehicle specifics must be explicitly considered in modeling and optimization to obtain accurate costs and feasible solutions. In the VRP literature, the specification of vehicle characteristics for optimization is based on the knowledge of experts in the domain, whereas in industry these properties are incompressible for transport companies. Currently, ontology is the most successful language for a formal representation of knowledge. It aims to represent a domain by formally defining the appropriate concepts, the properties in the field, and the relationships between these concepts. Ontology provides a common vocabulary to distinguish the different meanings of terms between the different domains.
Various studies apply vehicle ontologies to diverse fields. To our knowledge, no work has considered a vehicle ontology for classification purposes. The main purpose of this paper is to offer transport companies a decision support system that would help them identify the right vehicles used in their field for the resolution of the VRP. Furthermore, it is based on mapping APIs to generate parameters for the classification of the VRP. For the development of this system, there is a need for an ontology that can capture all vehicle characteristics based on the standardized classification of vehicles. This paper is organized as follows: Sect. 2 reviews the literature on classifications of VRPs as well as ontologies for the vehicle domain. Section 3 provides an overview description of the Decision Support System. Then, Sect. 4 describes the proposed VRP-Vehicle ontology. Finally, Sect. 5 summarizes the conclusions and future works.
2 Literature Review

2.1 Classification of Vehicle Routing Problems
Traditionally a VRP is identified by a succession of letters, such as VRP, CVRP where the C designates capacity, VRPTW where TW designates time windows, etc. But given the complexity of VRP problems, this classification could not cover all the characteristics of a problem, and many suggestions have been proposed in the literature. The classification of a VRP is the identification of parameters, and these parameters are presented in different manners in the literature. In [1], the authors present a large number of parameters for the classification of VRPs. These attributes are identified from an interesting taxonomy based on the description of “problem physical characteristics” and “scenario characteristics”. The problem’s physical characteristics describe the vehicle, travel time, depot, restrictions, and related objectives. Scenario characteristics cover parameters such as time window structure, waiting time, customer services, etc. In [2], the authors add other parameters to this taxonomy for the classification of VRPs, such as the Open VRP (OVRP), Dynamic VRP (DVRP) and Time-Dependent VRP (TDVRP). In the diagram proposed by [3], the parameters are based on the description of resources, constraints, and objectives to be optimized. The resource part is divided into three subfields: mobile resources, which specify all information about vehicles and workers; fixed resources, which define the attributes related to the transportation network and depot; and finally demands, which can be associated with customer characteristics. In [4], the authors define explicit characteristics of problems related to vehicles, the transportation network, and customers. They present specific characteristics of concepts related to vehicles (drivers) that can have an important influence on solution quality. For example, they take into consideration the regulation of the working hours of drivers and integrate it into the optimization. They conclude that a simple simulation of driver behavior
can be more reliable than a full optimization algorithm taking into account all regulatory factors and exceptions. To conclude, the VRP classifications in the literature mainly describe vehicle features and related concepts as well as transportation network features.

2.2 Ontologies for the Vehicle Domain
Different ontologies of vehicles have been proposed in the literature. They can be classified according to their scope. In this context, we have identified five main areas, namely: intelligent transport systems, transportation planning, automotive, the sale of cars and, finally, the classification of accidents (Table 1).
Intelligent transportation systems: Intelligent transportation systems (ITS) are the information and control systems that use integrated communications and information processing technologies in transport. A vehicle ontology is used to define the characteristics of three transport modes (public, private and priority) for an intelligent and sustainable transportation system [5]. In this ontology, vehicle data are identified by the price, the GPS location, CO2 emissions per passenger, product volume/package, and the number of seats and parcels. These specific characteristics make it possible to maximize the economic compensation of the users and to minimize the damage to the environment. Another classification of vehicles is divided into public vehicles (taxi), private vehicles (car), priority vehicles, and utility vehicles [6]. The vehicle route varies depending on the traffic encountered or on unexpected events. In addition, vehicle types are presented as maintenance, private, public, commercial, and emergency vehicles [7]. Each vehicle gets the information of the road network and the absolute coordinates from the ontology model, and can consequently share information with other vehicles.
Transportation planning: Transportation planning is the process of defining future policies, the development of infrastructure, and other activities based on a plan to meet future transportation needs. iCity-Vehicle is a representation of the different categories of vehicles that allow transit. It describes two concepts, Vehicle (public and private) and Vehicle Type [8]. It is also based on standard ontologies or formal representations, such as Schema.org and the OM ontology, to describe vehicle data properties. It then defines certain vehicle types as instances, while each type is characterized by its data properties. GCI-Vehicle reuses iCity-Vehicle and redefines it into three classes: Public Transport Vehicle, Personnel Vehicle and Aircraft [9]; airplanes are also public vehicles that can be used by all audiences.
Table 1. Scope, application, concepts of identified ontologies.

| Ref | Scope | Application | Concepts | Data / object properties |
| --- | --- | --- | --- | --- |
| [5] | Intelligent transportation | Shared vehicle system | Transport, Mode, Vehicle type, Driver | Number of seats, price per parcel, fuel capacity, CO2 emissions per number of seats, GPS location, parcels capacity |
| [6] | Intelligent transportation | A system to support driver safety | Vehicle, Driver, Warning, VehicleType | vehicle type, can drive, has driver, has action, has warning, has location |
| [7] | Intelligent transportation | Decision support system | Vehicle, infrastructure | vein:isAtASpeedOf, vein:previousJunction, vein:comingJunctions |
| [8] | Transportation planning | No service | Vehicle, VehicleType, capacity, OM:length, schema.org Thing | SchemaOrgProperty (cargoVolume, drive wheel configuration, fuel consumption), SchemaOrgDataProperty (number of axles, number of doors, vehicle seating capacity) |
| [9] | Transportation planning | Answer to competency questions | Public transport vehicle, Personnel vehicle, Aircraft | vehicle:hasMode, hasMake, hasModel, hasInsuranceProvider, vehicle:hasVehicleType |
| [10] | Automotive | Answer to competency questions | Vsso:Vehicle, Vsso:VehicleComponent, Vsso:DynamicVehicleProperty | hasStaticVehicleProperty, hasDynamicVehicleProperty |
| [11] | Automotive | No service | Branch (component), signal and attribute | hasVIN, hasFuelType |
| [12] | Accidents | Network simulation tests, real crash tests | Vehicle, occupant, environment, accident | Trailer, model, make, chassis |
Automotive: VSSO [10] is a new formal representation of the signals and attributes of cars. The basic concepts of the ontology are based on the GENIVI and W3C standard data models. They are divided into three main concepts: branch or components, attributes, which describe the static information of a car, and signals, which describe the dynamic information obtained from sensors. The applications are fleet monitoring, the contextual representation of a car, and interaction with web services. At web scale, the W3C Automotive Ontology Working Group, whose work represents a schema.org extension, has developed web ontologies for the interoperability and reuse of vehicle data within the same range of automobiles. VSS [11] redefines the concepts of VSSO into three other concepts and improves it into a standard semantic model based on use cases. These are grouped into analysis, as a way to query data, and services, as a means of manipulating the data.
Vehicle sales: In the field of vehicle sales, an ontology for describing vehicles and user profiles has been proposed for vehicle recommendation. A system based on these ontologies helps the user to generate personalized vehicle recommendations [12]. The purpose of VAO (Vehicle Advertisements Ontology) is to help vehicle buyers find the most relevant vehicles that match their profiles [13].
Classification of accidents: The objective of Draft [14] (Road Accident Ontology) is to create an ontology of road accidents and accompanying resources involving people, vehicles, and animals, having causes and effects, etc. The goal of Caova (Car Accident Ontology for VANETs) is to provide an interoperable data structure for automotive safety applications; this ontology helps to classify accidents and estimate their severity [15].
To conclude, there are several vehicle ontologies, each designed for a well-determined goal (representation of the domain, accidents, transport planning). To our knowledge, no previous work considered a vehicle ontology for classification purposes. On the other hand, these existing ontologies model several categories of vehicles in different domains, but they do not take into consideration a standardized classification of vehicles.
3 Decision Support System
The objective of the decision support system is to enable the user or decision-maker to choose the right vehicle and to optimize the route or routes of his vehicle fleet. The user describes the characteristics of the fleet and the position and characteristics of the customers. The vehicle ontology, through its knowledge base, helps him choose the right fleet. The internal management of the system generates distance matrices and CSV files that represent the classification of the VRP, then it chooses the most appropriate algorithm to propose a solution to the decision-maker. The overall architecture of the decision support system is shown in Fig. 1.
Fig. 1. Decision support system based ontology for the VRP
Our proposed DSS architecture consists of three major components.
The user interface. The user interface is dedicated to the interaction between the system and the user to identify the data related to the vehicle fleet (capacity, weight, speed, number of axles, number of compartments, etc.) according to each category of vehicle. This user interface presents the vehicle properties in a standard English vocabulary. It also displays the images that exist in the knowledge base, which allow the user to better identify vehicles. Secondly, it focuses on identifying customers’ characteristics such as demand types (delivery, pick-up, pick-up and delivery), depot, and client time windows. We use the Google MyMaps application to depict the location of each customer.
The Knowledge Base. The knowledge base is built on our VRP-Vehicle ontology, which represents the vehicle domain for the VRP; it describes the concepts and properties related to the categories of vehicles, Vehicle, Compartment, and the relationships between these concepts. Indeed, our knowledge base includes pictures that help the decision-maker choose the right vehicle according to his vehicle fleet.
The Model Management System. This component is the main component of our system. The model management system generates the parameters related to the fleet of vehicles (Cost_ijk, Speed_k, Cap_k) and to customers from the data of the transport company. These parameters are stored in a CSV file which defines the classification of the VRP (Fig. 2). Furthermore, this CSV file of parameters helps us in the formulation of the constraints and objectives of the VRP and, as a result, to optimize the routes. The model management system can also generate distance matrices based on the Bing Maps API. This API is extended to compute a distance matrix or time durations for any problem size.
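As a hedged illustration of the classification output, the sketch below writes a small CSV of VRP parameters; the column names Cost_ijk, Speed_k and Cap_k follow the text, while the file name, the units and the sample values are assumptions made for this example.

```python
import csv

# Hedged sketch: write a CSV of VRP classification parameters as the model management
# system might produce. Column names follow the paper; values and units are invented.
fleet = [
    {"vehicle": "k1", "speed_k": 60, "cap_k": 1200},   # assumed km/h and kg
    {"vehicle": "k2", "speed_k": 80, "cap_k": 3500},
]
costs = [  # cost of travelling arc (i, j) with vehicle k, illustrative values only
    {"i": "depot", "j": "c1", "k": "k1", "cost_ijk": 12.5},
    {"i": "c1", "j": "c2", "k": "k1", "cost_ijk": 7.3},
]

with open("vrp_classification.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["i", "j", "k", "cost_ijk", "speed_k", "cap_k"])
    speed = {v["vehicle"]: v["speed_k"] for v in fleet}
    cap = {v["vehicle"]: v["cap_k"] for v in fleet}
    for row in costs:
        writer.writerow([row["i"], row["j"], row["k"], row["cost_ijk"],
                         speed[row["k"]], cap[row["k"]]])
```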
Fig. 2. Classification OUTPUT
4 Proposed VRP-Vehicle Ontology
The VRP-Vehicle ontology is used to classify VRPs and to create reasoning rules in order to deduce the category of each vehicle. The VRP-Vehicle
ontology describes all the characteristics of the vehicle. From these characteristics, we deduce new parameters. These parameters lead, on the one hand, to the classification of the VRPs and, on the other hand, to the formulation of the constraints and objectives. Several concepts, different attributes and/or relationships between vehicles and other concepts are defined in the VRP-Vehicle ontology. The attributes include characteristics of vehicles such as speed, number of wheels, type of fuel, weight, and number of compartments. The concepts, which are Category of Vehicle, Personal Vehicle Type, Public Transportation Vehicle, and Compartment, are shown in Fig. 3. The object and data properties are based on a standard data model such as Schema.org properties for the interoperability of vehicle data (Fig. 4). The VRP-Vehicle ontology reuses some concepts or classes of iCity-Vehicle and GCI-Vehicle. Vehicles in this ontology are classified into public or personal vehicles.
Fig. 3. Taxonomy of VRP-Vehicle concept
Fig. 4. Object properties of VRP-vehicle
– Public Transport Vehicle, which is characterized by a public mode such as train, public bus, ferryCrossing, aircraft, shared taxi, etc.
– Personnel Vehicle, which describes personal vehicles such as cars, bicycles, trucks, motorcycles, etc.
Public transport vehicles are characterized by special routes, while the optimization is based on Personnel Vehicles. As mentioned in Sect. 2, iCity describes some instances of vehicle types, while in our ontology the CategoryVehicle concept is proposed. The latter describes the category of each vehicle, which is based on a standardized classification of vehicles. The proposed ontology is based on the most commonly used classification system, developed by the Federal Highway Administration (FHWA) in the area of traffic surveillance. Vehicle types are classified according to the gross vehicle weight rating (GVWR) and the number of axles of the vehicle. The reasoning rules we create in order to deduce the category of each vehicle are shown in Fig. 5.
Fig. 5. SWRL rules for vehicles classification
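To illustrate how rules of this kind operate, the sketch below translates a GVWR/axle classification into simple conditional logic in Python; the category names and thresholds are assumptions and do not reproduce the exact SWRL rules of Fig. 5.

```python
# Illustrative-only translation of GVWR/axle classification rules into Python.
# Category names and thresholds are assumptions, not the paper's exact SWRL rules.
def vehicle_category(gvwr_kg: float, axles: int) -> str:
    if axles == 2 and gvwr_kg <= 3500:
        return "LightVehicle"
    if axles == 2:
        return "SingleUnitTruck"
    if axles >= 3 and gvwr_kg > 12000:
        return "HeavyTruck"
    return "MediumTruck"

print(vehicle_category(2000, 2))    # -> LightVehicle
print(vehicle_category(18000, 4))   # -> HeavyTruck
```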
A compartment defines a loading area for products that imposes incompatibility or separation constraints. Most ontologies describe the compartment (Fig. 4) as a data property or object property (iCity: hascargoCapacityLoad, iCity: hasSeatingCapacity), while this concept is a basic concept for the VRP. It is characterized by several specificities: a vehicle has one or more compartments, each having a capacity expressed in a particular unit of measurement; the capacity could be measured in number of items, in pounds, in kilograms, in liters, etc.
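A minimal sketch of a multi-compartment vehicle with unit-specific capacities is shown below; the class and attribute names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

# Illustrative data model: a vehicle owns one or more compartments, each with a
# capacity expressed in its own unit of measurement.
@dataclass
class Compartment:
    capacity: float
    unit: str            # e.g. "kg", "liter", "pallet", "seat"

@dataclass
class Vehicle:
    name: str
    compartments: list   # list of Compartment, possibly with different units

tanker = Vehicle("fuel_truck", [Compartment(8000, "liter"), Compartment(6000, "liter")])
minibus = Vehicle("minibus", [Compartment(16, "seat"), Compartment(300, "kg")])
```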
5 Conclusion
The vehicle ontology we propose is part of a decision support system (DSS) that allows classifying the vehicle routing problem, identifying the appropriate algorithm to solve it, and finally proposing an optimized routing solution for the decision-maker. For future work, the ontology can be extended with more types of vehicles, such as drones. Furthermore, it is necessary to implement a VRP model to evaluate our system in a real domain. We will generate all parameters related to vehicles and customers.
References 1. Eksioglu, B., Vural, A.V., Reisman, A.: The vehicle routing problem: a taxonomic review. Comput. Ind. Eng. 57(4), 1472–1483 (2009) 2. Braekers, K., Ramaekers, K., Van Nieuwenhuyse, I.: The vehicle routing problem: state of the art classification and review. Comput. Ind. Eng. 99, 300–313 (2016) 3. Cherif-Khettaf, W.R., Rachid, M.H., Bloch, C., Chatonnay, P.: New notation and classification scheme for vehicle routing problems. RAIRO-Oper. Res. 49(1), 161– 194 (2015) 4. Vidal, T., Laporte, G., Matl, P.: A concise guide to existing and emerging vehicle routing problem variants. Eur. J. Oper. Res. 286(2), 401–416 (2020) 5. Giret, A., Julian, V., Carrascosa, C., Rebollo, M.: An ontology for sustainable intelligent transportation systems. In: Bajo, J., et al. (eds.) PAAMS 2018. CCIS, vol. 887, pp. 381–391. Springer, Cham (2018). https://doi.org/10.1007/978-3-31994779-2 33 6. Fernandez, S., Hadfi, R., Ito, T., Marsa-Maestre, I., Velasco, J.R.: Ontology-based architecture for intelligent transportation systems using a traffic sensor network. Sensors 16(8), 1287 (2016) 7. Choi, S.K.: An ontological model to support communications of situation-aware vehicles. Transp. Res. Part C Emerg. Technol. 53, 112–133 (2015) 8. Katsumi, M., Fox, M.: iCity Transportation Planning Suite of Ontologies. University of Toronto (2020) 9. Yousif, W., Fox, M.A.: Transportation Ontology for Global City Indicators (ISO 37120). Enterprise Integration Laboratory Working Paper (2018) 10. Klotz, B., Troncy, R., Wilms, D., Bonnet, C.: VSSo: the vehicle signal and attribute ontology. In: SSN@ ISWC, pp. 56–63, October 2018
11. Wilms, D., Alvarez-Coello, D., Bekan, A.: An evolving ontology for vehicle signals. In: 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), pp. 1–5. IEEE, April 2021
12. Le Ngoc, L., Abel, M.H., Gouspillou, P.: Towards an ontology-based recommender system for the vehicle domain. In: 3rd International Conference on Deep Learning, Artificial Intelligence and Robotics, ICDLAIR, December 2021
13. de Paiva, F.A.P., Costa, J.A.F., Silva, C.R.M.: An ontology-based recommender system architecture for semantic searches in vehicles sales portals. In: Polycarpou, M., de Carvalho, A.C.P.L.F., Pan, J.-S., Woźniak, M., Quintian, H., Corchado, E. (eds.) HAIS 2014. LNCS (LNAI), vol. 8480, pp. 537–548. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07617-1_47
14. Dardailler, D.: DRAFT Road Accident Ontology. https://www.w3.org/2012/06/rao.htmlowl. Accessed 19.01.17
15. Barrachina, J., et al.: Caova: a car accident ontology for vanets. In: 2012 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1864–1869. IEEE, April 2012
Web API Service to RDF Mapping Method for Querying Distributed Data Sources

Artem Volkov1(B), Nikolay Teslya2, and Sergey Savosin2

1 ITMO University, Saint-Petersburg, Russia
[email protected]
2 SPC RAS, Saint-Petersburg, Russia
{teslya,SVsavosin}@iias.spb.su
Abstract. The paper describes a method for querying spatio-temporal data across several data sources presented in the form of an ontology. Data from a single source often cannot be used for complex analysis tasks. On the other hand, it is difficult to combine several data sources and use them together in a single system. If all data are stored in relational databases, it is possible to use distributed SQL. If the data are in several sources and the task is to obtain new knowledge from existing sources and flexibly connect new sources, it is better to use knowledge graphs and tools such as RDF for data representation and SPARQL for data querying. In the case of full access to each knowledge source, the only suitable solution is federated SPARQL. The paper describes a method of working with Web services so that, for users, it works like federated SPARQL. The article proposes an algorithm for the method and provides an example of its use in a system for the analysis of accident cards.
1 Introduction
One of the most important tasks in smart city development is data obtaining and processing. The obtained data can be classified into two categories: historical data and real-time data. Early smart city systems were mostly focused on historical data due to the ease of data obtaining and software implementation. Real-time data must be processed in real time as well; on the other hand, it makes it possible to act immediately or shortly after the data enter the system. Real-time data analysis responds to queries within seconds, so the processing of large amounts of data occurs with high speed and low response time. For example, real-time big data analysis uses data from financial databases to make trading decisions. Conventional information systems use relational databases to store and manage data. Relational databases require a stable data model. For knowledge bases this is not always applicable, because the knowledge of the system is dynamic and can "evolve". Therefore, in such cases, it is advantageous to use graph databases [1].
The results of data analysis are directly related to the quality of the data sources: there may be omissions or incorrect values in the data. When data are aggregated, most often there is only one task in whose context the data will be used in the future; because of that, information storage and access are optimized to improve performance and reduce the cost of the solution. Such an aim becomes a problem when solving interdisciplinary, complex tasks, because multiple data sources presented in different formats have to be combined. In such applications, various solutions are used to merge the data: Data Warehouse, Data Lake, and Data Fabric. A Data Warehouse contains data in a common format coming from a variety of sources; when new data arrive, they are automatically converted to the common format, which leads to a monolithic solution that is hard to scale [2]. A Data Lake is an unstructured data repository that, in contrast to a Data Warehouse, contains raw data; data consumers request and format the data according to their tasks [3]. The last approach is Data Fabric, where data analysis occurs in real time, resulting in lower management costs for the Big Data infrastructure; the approach favors stream processing and implies that an external system provides a data stream that becomes part of the source solution API [4]. These solutions are suitable if there is control over all the data sources, but there is no comprehensive solution. When the task is interdisciplinary, several data sources need to be combined. Some of the data sources are external, and their data schema can change significantly; sometimes data sources describe the same data using different terms. Understanding the common semantics of the data can simplify data aggregation. Data from a single source are not enough for the task of complex analysis; on the other hand, it is difficult to use various data sources due to the complex aggregation process. This paper proposes a method called W2RMM (Web API to RDF Mapping Method) that extends the ability to merge data by connecting remote Web API services with proprietary data sources. The paper also provides an example of the system usage for the analysis of road accident cards. The data obtained from the source of road accident cards contain many quality problems and have to be verified with data from other sources.
2 Related Work

2.1 Smart City Platforms
Smart city platforms make it possible to obtain new information through the analysis of open data sources. For instance, the Traffic Accident Map project presents visualizations of open sources and infographics. It uses the combination of a geo-referenced point and an address to determine the location of an accident; this does not improve data quality when the geo-referenced location is incorrect [5]. Another example is HomeHub, an open source project to assess the quality of housing in the city [6]. The project uses open data to calculate housing quality indicators and visualizes them on a map. The consumer of smart
city services is most often assumed to be city residents, and the supplier is public institutions [7]. Clustering road accident data allows identifying priority areas for improving urban transport infrastructure in the task of maximizing the benefit-to-investment ratio [8]. When clustering geopoints, it is important to consider that the algorithms used may affect the output result, so it is worth paying attention to the quality of the input data [9]. Analyzing only static geopoints without considering time can lead to incorrect conclusions, because time and space are closely related [10]. In the basic understanding, position may vary over time slightly more than other parameters, so it is important to analyze comprehensively [11]. Using various data on the use of urban infrastructure, it is possible to understand who the average user is and which parts of the system are the busiest, which allows planning the further development of the system [12]. It becomes a problem to include all the data sources in a single knowledge base, as all the data need to be copied from one repository to another; in this case, frequent data changes lead to repeating the copying process. Such data need to be converted before they can be used in the knowledge graph, which leads to duplication of information and unnecessary data conversion operations. If the data are stored in their original form and a mediator is used to access them, the work is significantly reduced. Most of the papers describing the approach to data fusion using an ontology do not consider the issue of scaling the system [13]. If the volume of data is small and the system operates only with historical data, then the issue of scaling is irrelevant. However, at the moment the growth rate of data volume is increasing every year, which means that any developed system must take the issue of scalability into account. It is especially relevant for systems where real-time operation is required [14].

2.2 Relational Databases
The R2RML (RDB to RDF Mapping Language) standard was developed to retrieve data from a relational database using an ontology [17]. Mapping is the determination of data correspondence between the potentially different semantics of different objects (systems). The standard describes how to write the mapping file between the ontology and the database. Access to such data is organized with SQL: it is not possible to work directly with such data as an ontology, and SPARQL queries have to be transformed into SQL. To be able to do this, a mediator is needed to translate queries from SPARQL to SQL; the data obtained via SQL must also be converted into a knowledge graph. Consider the operation of an R2RML mediator, which provides a SPARQL query point for users. A user sends a SPARQL query to the mediator. The mediator, using the R2RML mapping, translates the SPARQL query into an SQL query and sends it to the relational database. The database returns the response to the mediator in a tabular stream data format. The mediator, using the mapping, transforms the tabular data into a knowledge graph and sends it to the user [18].
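To give a concrete feel for the query flow, the hedged sketch below sends a SPARQL query over HTTP to an endpoint such as the one an R2RML mediator exposes; the endpoint URL, the vocabulary terms and the query itself are placeholders, not the system's actual configuration.

```python
import requests

# Hedged sketch: query a SPARQL endpoint (for example, one exposed by an R2RML
# mediator such as Ontop) over the standard SPARQL HTTP protocol.
ENDPOINT = "http://localhost:8080/sparql"   # assumed local mediator endpoint
QUERY = """
SELECT ?card ?lat ?lon WHERE {
  ?card a <http://example.org/AccidentCard> ;
        <http://example.org/latitude>  ?lat ;
        <http://example.org/longitude> ?lon .
} LIMIT 10
"""

resp = requests.post(
    ENDPOINT,
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["card"]["value"], row["lat"]["value"], row["lon"]["value"])
```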
3 Accident Card Analysis System
3.1 R2RML Mapping
At the moment, Ontop [19] is the most widely used tool for working with R2RML mappings. Ontop is well designed in terms of optimizing SQL queries to improve the performance of queries over the ontology [18]. Currently this tool can only be used as a standalone application, a console utility. The tool allows generating mappings in OBDA and R2RML formats, and it can use the generated mapping to create a SPARQL endpoint as well. The complexity of using the R2RML standard is that the data mapping must be defined manually; the ontology is also developed manually by experts in the problem domain. In the case of a relational database, information about the semantics of the data can be obtained from the data schema. Therefore, it is possible to take advantage of the schema information and generate a mapping, which will contain both a description of the data in the form of an ontology and SQL queries for obtaining the data [20]. Consequently, not all ontology tools will work, since only a few of them can use external data sources. One of the solutions to the problem is to get all the data in RDF format using the mapping and load it into a specialized storage, but this is not always possible, especially when the data change frequently and are large. Some tools allow automatically generating the mapping and ontology using a database connection. However, it is not always possible to use the existing reviews of R2RML tools to select the most suitable one, since it is not always possible to get an objective picture [18]. Then, to access data via SPARQL, an R2RML mediator needs to be used [17]. There are various tools for working with R2RML mappings; some of them, such as Ontop, allow generating a mapping from a data schema [18]. If there is no access to all the data, data can only be retrieved in parts and there is a limit on the number of queries, so the SPARQL mediator cannot be used. It is necessary to design a method of working with web services so that it would be as convenient for users as working with federated SPARQL. Federated SPARQL queries can access several ontologies at once [21]. This study proposes an approach to bypass the problem of data heterogeneity through the use of an ontology. Using the ontology, data consistency rules can be compiled, which allows data sources to be unified in a single ontological representation [22]. In order to simplify the operation with geo-sources, this study combines data that are in relational databases with data from LinkedGeoData [23]. Further, the data are accessed via a SPARQL endpoint using Virtual Knowledge Graph (VKG) technology [24].

3.2 Data Quality
An important aspect of working with open data is checking the quality of the data. In the case of geodata, the most important aspect is the accuracy of geo-point positions. The more accurate the position data, the better the quality of the
output analysis, so it is necessary to assess data quality [25]. All data on accidents were downloaded from the Leningrad region traffic police portal for the period from 01.01.2015 to 30.04.2021 [26]. Initially, 56,716 accident cards were uploaded. These data were then analyzed for credibility. All accident cards that are not located in the Leningrad region (for example, where the coordinates were mixed up) were deleted. After this check, 5,985 accident cards had been deleted, which is about 10.55% of all data. This was verified by placing all the accident cards on the map by their coordinates; the resulting distribution is shown in Fig. 1. The first data quality problem is insufficient accuracy. In the case of coordinates, rounding leads to significant errors relative to the real position. In the original data set there are cards whose longitude and latitude are integers, which is most likely due to coordinate rounding. An example would be the coordinate "30, 60". Such points are not on roads, but may indicate swamps or a bay. The inaccurate entries were deleted because they cannot be corrected. After this deletion, the data volume decreased by another 2,573 records, which is about 4.54% of all downloaded data. The following inaccuracies were also found in the data. For example, in 60 cards it was noticed that the "Lighting" column stores the value "Daylight time" when the time of the accident is night or evening, and vice versa: the "Lighting" column stores the value "Dark time of day, lights on", "Dark time of day, no lights on" or "Twilight" in the morning or afternoon. In total, such incorrect values were found in 5,019 cards, which is about 8.85% of all downloaded accident cards. These data cannot simply be deleted, but some values in the "Lighting" column can be corrected. It is necessary to explain how the accident analysis system works, what data it uses and how the conceptual architecture is arranged. There is no information about weather conditions in the accident cards, only a verbal description of lighting conditions, basically a limited set of possible values. This information can either be missing or contain incorrect values. In some cases, the weather information is inconsistent, such as indications of heavy fog and full visibility at the same time. The system implements one of the options for a distributed approach to data storage and processing. For easy scaling, each data source is allocated to a separate module together with a handler. At the center of the system is a module aimed at interaction with the user; this module can be defined as the core of the system. Further, additional modules with private data can be connected, which can be extended using open data [27]. Each module acts as a standalone application. The work of a module does not directly affect the work of other modules. The modules work only with their own data source, without forwarding requests to another module. The core of the system performs the task of a search engine and a single entry point for users. The kernel does not analyze the data, because this is done in the modules. From this we can conclude that the kernel is a special module, which has the exclusive right to interact with other modules.
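The cleaning steps described above can be sketched as follows; the column names, the bounding box of the region and the hour-based daylight rule are assumptions made for this example.

```python
import pandas as pd

# Hedged sketch of the accident-card cleaning described above. Column names
# ("lat", "lon", "lighting", "hour") and the approximate Leningrad-region bounding
# box are assumptions for this example, not the project's actual schema.
cards = pd.read_csv("accident_cards.csv")

# 1. Drop cards whose coordinates fall outside an approximate regional bounding box.
in_region = cards["lat"].between(58.4, 61.3) & cards["lon"].between(27.7, 35.7)
cards = cards[in_region]

# 2. Drop cards with rounded (integer) coordinates, which cannot be corrected.
rounded = (cards["lat"] == cards["lat"].round(0)) & (cards["lon"] == cards["lon"].round(0))
cards = cards[~rounded]

# 3. Flag cards whose lighting value contradicts the hour of the accident.
daylight_at_night = (cards["lighting"] == "Daylight") & (~cards["hour"].between(7, 20))
cards.loc[daylight_at_night, "lighting"] = "Unknown"
```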
Fig. 1. The raw road accidents geopoints distribution
4 Web API Service to RDF Mapping Method Description

4.1 Weather Data Sources Specifics
There are numerous open sources of weather data. The problem is that only a few of them provide freely accessible data for the whole world; more often, local data for one country are published. In some countries these data can be used in research, for example in Spain [28], where an article uses open historical data to conduct a study on traffic accidents in the country. For Russia, the main owner of meteorological data is Roshydromet, whose website contains a section with open data [29]. These data are presented as CSV files and were last updated in 2015; therefore, they cannot be used to obtain historical weather data for a particular geopoint. Some services only have historical weather data for the United States, while other services have no data for Russia, for instance [30]. The most popular global source of weather data is OpenWeather. It can be used for free for querying the current weather in the case of a small number of requests, while the service provides paid access to historical data. The Yandex.Weather service only offers 10 years of historical weather data in its Commercial Extended tariff, and even the minimum tariff costs a considerable amount of money [31]. The Wunderground service has data for St. Petersburg, but it can be used for free only through the graphical interface; access to the Wunderground API is paid and starts at large amounts of money [32]. The VisualCrossing service has the data and it is possible to use its API, but there is no possibility to specify a particular geo-point, only the city of St. Petersburg [33], and there is a limit of 1000 records per day on the free plan. This work therefore uses the VisualCrossing service.
4.2 W2RMM Scheme
The scheme of the method is shown in Fig. 2; the sequence of actions is indicated in parentheses. Below is a description of each of the blocks. The user interacts with the system through a graphical interface, entering SPARQL queries. The kernel has information about the existing data services, sends SPARQL queries to the services, merges the data and returns the data as an RDF graph to the user. The data service of the road traffic accident cards SPARQL endpoint is implemented with the Ontop tool, to which the R2RML mapping of the road traffic accident cards is connected [19]. The database of traffic accident cards is a PostgreSQL relational database with the PostGIS extension enabled to store geo-coordinates. The weather service provides a mediator that translates SPARQL requests into requests to the caching database and into HTTP requests to the web weather service; it is also responsible for merging the traffic accident card data with the weather data. The weather service caching database is used to reduce the number of requests to the external service. The web service is an external service that provides weather data. The date and, in the case of Russia, the city are the input to the web service, as the web service used does not return data for geopoint input. Consider the algorithm of the method (a code sketch of the weather-service steps is given after the list).
1. The platform user writes a SPARQL query. The query specifies the data from the SPARQL endpoint and sets the data extension flag.
2. The query is duplicated in the R2RML card service, which works as a SPARQL endpoint.
3. The accident card data service forms an SQL query on the card database using the R2RML mapping.
4. The service gets a response from the database in its native format.
5. The service converts the query result to RDF.
6. The kernel sends the original SPARQL query and the data obtained in step 5 to the weather service.
7. The weather service polls the caching database for data that can extend the RDF.
8. The caching database returns data that satisfy the request.
9. In the absence of the required data, the weather service requests the data from the web service over HTTP.
10. The web service sends the results to the weather service in JSON format.
11. The weather service generates the extended data and sends them in RDF to the kernel.
12. The kernel returns the data to the user in RDF format.
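A hedged sketch of the weather-service mediator logic (steps 7–10) is given below; the function name, the cache interface and the web-service URL and parameters are assumptions, not the actual implementation.

```python
import requests

# Hedged sketch of steps 7-10: the weather service first checks its caching database
# and only queries the external web service on a cache miss. The cache interface and
# the web-service URL/parameters are placeholders invented for this example.
def get_weather(cache, city: str, date: str) -> dict:
    cached = cache.get((city, date))                 # steps 7-8: cache lookup
    if cached is not None:
        return cached
    resp = requests.get(                             # step 9: HTTP request to web service
        "https://weather.example.org/history",       # placeholder URL, not a real endpoint
        params={"city": city, "date": date},
        timeout=30,
    )
    resp.raise_for_status()
    weather = resp.json()                            # step 10: JSON answer
    cache[(city, date)] = weather                    # populate the cache for next time
    return weather

# Usage (requires a reachable endpoint); a plain dict stands in for the caching database:
# weather = get_weather({}, "Saint Petersburg", "2019-01-30")
```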
Fig. 2. Scheme of the method W2RMM (Web API to RDF Mapping Method)
5 Conclusion
Numerous data sources are required for the effective operation of a smart city platform in various tasks. Open data solves the problem of data collection and storage, but adds the problem of integrating heterogeneous data. The use of an ontology eliminates this obstacle caused by the different semantics of the data. The proposed method is as follows: each data source provides an open SPARQL access point, and the integration of data sources is accomplished through federated SPARQL queries that first take information from sources with a full data set and then turn to sets of partially available data. In the traffic analysis system under consideration, this approach allows the use of weather data in clustering, which gives the opportunity to improve road infrastructure for different seasons of the year, for example to determine the patterns of accident clustering in winter or the weather conditions under which there are many accident participants. Future work is concentrated on implementing the platform for cross-verification of data obtained from the various data sources. A method for spatio-temporal queries to the resulting ontology over linked open data sources will also be developed.
Acknowledgements. The study was carried out with the support of the Russian Foundation for Basic Research within the framework of scientific projects No. 20-07-00904 and 20-07-00560. The architecture is partially due to the State Research, project number FFZF-2022-0005.
References
1. Husáková, M., Bureš, V.: Formal ontologies in information systems development: a systematic review. Inf. 11(2), 66 (2020)
2. Tiwari, P., Kumar, S., Mishra, A., Kumar, V., Terfa, B.: Improved performance of data warehouse. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), March 2017
3. Khine, P., Wang, Z.: Data lake: a new ideology in big data era. In: ITM Web of Conferences, vol. 17, p. 03025 (2018)
4. Gartner: Data fabric architecture is key to modernizing data management and integration. https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration
5. Traffic Accident Map. https://dtp-stat.ru/
6. HomeHub, Data and calculations. https://homehub.su/calculations
7. Bawany, N., Shamsi, J.: Smart city architecture: vision and challenges. Int. J. Adv. Comput. Sci. Appl. 6 (2015)
8. Sun, Y., Wang, Y., Yuan, K., Chan, T., Huang, Y.: Discovering spatio-temporal clusters of road collisions using the method of fast Bayesian model-based cluster detection. Sustainability 12(20), 8681 (2020)
9. Puspitasari, D., Wahyudi, M., Rizaldi, M., Nurhadi, A., Ramanda, K., Sumanto: K-means algorithm for clustering the location of accident-prone on the highway. J. Phys.: Conf. Ser. 1641, 012086 (2020)
10. Beek, W., Zijdeman, R.: nlGis: a use case in linked historic geodata. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 11155, pp. 437–447. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98192-5_58
11. Galatoulas, N., Genikomsakis, K., Ioakimidis, C.: Spatio-temporal trends of e-bike sharing system deployment: a review in Europe, North America and Asia. Sustainability 12(11), 4611 (2020)
12. Chen, Z., van Lierop, D., Ettema, D.: Exploring dockless bikeshare usage: a case study of Beijing, China. Sustainability 12(3), 1238 (2020)
13. Garijo, D., Poveda-Villalón, M.: Best practices for implementing FAIR vocabularies and ontologies on the web. arXiv 2003.13084 (2020)
14. Roldan-Molina, G.R., Mendez, J.R., Yevseyeva, I., Basto-Fernandes, V.: Ontology fixing by using software engineering technology. Appl. Sci. 10(18), 6328 (2020)
15. Asfand-e-yar, M., Ali, R.: Semantic integration of heterogeneous databases of same domain using ontology. IEEE Access (2020)
16. Trino, Distributed SQL query engine for big data. https://trino.io/
17. The World Wide Web Consortium (W3C), Relational Databases Are Not Designed For Heterogeneous Data. https://www.w3.org/TR/r2rml
18. Calvanese, D., et al.: Ontop: answering SPARQL queries over relational databases. Semantic Web 8 (2016)
19. Ontop, A Virtual Knowledge Graph System. https://ontop-vkg.org/
20. Priyatna, F., Corcho, O., Sequeda, J.: Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph. In: WWW 2014: Proceedings of the 23rd International Conference on World Wide Web, pp. 479–490 (2014)
21. Páez, O., Vilches-Blázquez, L.: Bringing federated semantic queries to the GIS-based scenario. ISPRS Int. J. Geo-Inf. 11(2), 86 (2022)
22. Ding, L., Xiao, G., Calvanese, D., Meng, L.: Consistency assessment for open geodata integration: an ontology-based approach. GeoInformatica 25, 733–758 (2021)
Web API Service to RDF Querying Distributed Data Sources
213
23. Ding, L., Xiao, G., Pano, A., Stadler, C., Calvanese, D.: Towards the next generation of the LinkedGeoData project using virtual knowledge graphs. J. Web Semantics 71, 100662 (2021) 24. Calvanese, D., Lanti, D., Mendes de Farias, T., Mosca, A., Xiao, G.: Accessing scientific data through knowledge graphs with Ontop. Patterns 2, 100346 (2021) 25. Wu, H., Zhong, B., Medjdoub, B., Xing, X., Jiao, L.: An ontological metro accident case retrieval using CBR and NLP. Appl. Sci. 10(15), 5298 (2020) 26. State automobile inspectorate, road safety indicators. http://stat.gibdd.ru/ 27. Rodriguez, J.A., Fernandez, F.J., Arboleya, P.: Study of the architecture of a smart city. In: Proceedings of The 2nd International Research Conference on Sustainable Energy, Engineering, Materials and Environment, vol.2, no. 23, p. 1485 (2018) 28. Towards data science, predicting traffic accident hotspots with spatial data science. https://towardsdatascience.com/predicting-traffic-accident-hotspots-withspatial-data-science-cfe5956b2fd6 29. Rosgidromet, Open Data. http://www.meteorf.ru/opendata/ 30. Meteostat Developers, Daily Data. https://dev.meteostat.net/api/point/daily. html 31. Yandex technologies, API access rates. https://yandex.ru/dev/weather/doc/dg/ concepts/pricing.html 32. Weather underground, aint Petersburg Russia weather history. https://www. wunderground.com/history/daily/ru/saint-petersburg/ULLI/date/2019-1-30 33. Weather Data & API — Visual crossing, weather query builder. https://www. visualcrossing.com/weather/weather-data-services
Risk Management in the Clinical Pathology Laboratory: A Bayesian Network Approach

José Crispim1(B), Andreia Martins2, and Nazaré Rego1

1 NIPE, Escola de Economia e Gestão, Universidade do Minho, Campus de Gualtar, 4710-057 Braga, Portugal
[email protected]
2 Hospital Senhora da Oliveira, E.P.E., Creixomil, 4835-044 Guimarães, Portugal
Abstract. The tests performed at clinical pathology laboratories play an important role in medical decision-making. This paper describes the application of a risk assessment framework to the clinical pathology laboratory of a hospital in the north of Portugal. The study involved a scoping literature review, focus groups and Bayesian Networks. The Noisy-OR canonical model was used to determine the conditional probabilities of the Bayesian Networks. The approach can easily be adapted for other clinical laboratories. The study presents a new, simple and easy-to-understand alternative to the traditional Failure Mode Effects Analysis (FMEA) for risk management that: 1) facilitates the global visualization of the interdependencies between risk events; 2) obtains the likelihoods of the risks; and 3) allows simulations of risk mitigation strategies. The framework and its outcomes were well accepted by the clinical pathology laboratory professionals, since they considered it suitable to contextualize the risk network structure to the specific reality of the laboratory, and the resulting model was useful to raise awareness about the latent risks at the laboratory. Moreover, some risks not referred to in the literature were identified.

Keywords: Risk management · Clinical Laboratory · Bayesian Networks
1 Introduction

Clinical laboratory tests play an important role in medical decision-making. These laboratories are health care facilities that provide a wide range of laboratory procedures (by examining and analyzing components in blood, urine and body fluids) which aid physicians in carrying out the diagnosis, treatment, and management of patients [1]. Unfortunately, no laboratory tests or devices are foolproof and errors can occur at the pre-analytical, analytical and post-analytical phases of testing [2]. Risk management can be used to evaluate conditions that can possibly lead to errors and to outline the necessary steps to detect and prevent those errors before they occur, causing numerous troubles with the potential to, in extreme situations, harm patients. An error at the laboratory can affect a patient, a technologist, the laboratory director, a physician, another hospital department or even the whole hospital. Risk is essentially the probability that an error
occurring in the laboratory would lead to harm. According to the International Organization for Standardization – ISO14971 [3], risk management is the systematic application of management policies, procedures and practices to the tasks of analyzing, evaluating, controlling, and monitoring risk.

Risk management in clinical laboratories has been seldom studied. Njoroge and Nichols [2] outline how to develop and maintain a quality control plan for medical laboratory testing based on industrial risk management principles, and suggest Failure Mode and Effects Analysis (FMEA) to identify potential sources of failure and a failure reporting and corrective action system (FRACAS) for risk analysis. Chang et al. [4] put this model into practice through the development and application of a computerized risk registry and management tool in a clinical laboratory of a university hospital. Xia et al. [5] also applied this model in practice and extended it by incorporating Sigma metrics in the quality indicators used for risk assessment. Typically, these studies use quality indicators from the literature: Plebani et al. [6] identified 20 for the whole testing process. Through a national survey in China, Duan et al. [7] contextualized these indicators for the complete testing process in the clinical laboratories of China, which resulted in 15 indicators. Zeng et al. [8] construct a risk quality control chart that intuitively shows specific risk levels and can warn lab staff if something is out of control, without the need for quality control rules to make judgments. Janssens [9] also uses the classical FMEA approach for carrying out a prospective risk analysis in a real clinical laboratory case study. The literature review shows that quality indicators have been used for error identification and FMEA to rank errors [e.g., 10]. Also, the literature shows that quality indicators should be contextualized to the laboratory's regional setting [11].

This study applies the framework proposed by Crispim et al. [12] to the clinical pathology laboratory of a 500-bed general public hospital in the north of Portugal that serves a population of around 450 thousand inhabitants. In our laboratory case, confidentiality between the participants was not regarded as important; on the contrary, it was considered that group discussions could be insightful. Therefore, the data needed for contextualization were collected through focus groups. This information provided a deeper understanding of the fundamentals, processes and contexts that shape the potential occurrence of errors in the laboratory testing process [13]. Bayesian networks, an alternative to FMEA approaches, are a method for risk representation that is closer to the complex reality and the dynamics of risks [14], suitable for knowledge representation and reasoning and for dealing with uncertain information in complex environments [15]. The testing process at a laboratory can be viewed as a network of interdependent risks: a risk in an "upstream" activity can trigger several risks "downstream", and a risk in a "downstream" activity may emerge from the combined occurrence of several risks located "upstream". Several reasons for errors in laboratories being neglected have been pointed out [16]: 1) the difficulties in discovering and identifying all types of errors; 2) the need for a well-designed model aiming to evaluate all phases in the total testing process; and 3) the poor perception by professionals and other stakeholders of the harmfulness of errors in the laboratory.
By developing a risk assessment model based on Bayesian Networks for the whole testing process of a hospital, from a managerial point of view, this study aims to reduce error occurrence in the analyzed service and to contribute to further developments
in models for risk management in clinical pathology laboratories and other health care services. The developed model offers the strong capability to visualize the specific reality of the laboratory analyzed in order to solve the identified gaps.
2 Research Design

The study followed a sequential research design that included three stages: 1) a scoping literature review to identify hazards, failures and errors concerning all the phases of the testing process; 2) focus groups to select a final list of risks and construct the network of risk events; and 3) risk assessment using the GeNIe software. Table 1 presents an overview of the activities performed.

Table 1. Stages of the risk model development.

Stage 1 - Identification of hazards, non-conformities and errors concerning the testing process
Inputs: Literature review protocol
Activities: Review the literature
Outputs: List of selected possible risks and their causes

Stage 2 - Construction of the network of risk events
Inputs: List of selected possible risks and their causes
Activities: a) Discuss the hazard/risk concept and the initial risk list - Focus Group 1; b) Discuss the risk causes and interconnections - Focus Group 2; c) Discuss the final risk events to include in the risk network - Focus Group 3; d) Discuss the probabilities of occurrence of risks - Focus Group 4
Outputs: Risk network contextualized to the hospital clinical pathology laboratory

Stage 3 - Risk assessment
Inputs: Risk causes, their interconnections and probabilities of occurrence
Activities: a) Risk Bayesian Network construction using the GeNIe software; b) Risk evaluation; c) Risk ranking
Outputs: Risk model based on Bayesian Network and current risk ranking
A scoping review of the literature published from 2006 to 2022, following PRISMA [17] and searching (("risk management" OR "risk analysis" OR "risk assessment") AND ("clinical laboratory") AND ("quality indicators" OR "errors" OR "risks")) in the PubMed and Web of Science databases, identified hazards, non-conformities and errors related with the testing process. Focus group participants were selected through purposeful sampling, based on their function in the laboratory, so that at least one representative of each function was included: 2 medical technologists/clinical laboratory scientists, 3 clinical laboratory
technicians, and 1 laboratory assistant. The focus groups' qualitative content analysis comprised the steps of preparation, organization, and reporting [18]. Preparation involved defining the samples; deciding the type of content analysis – directed, for the data collected from the four focus groups [19]; preparing the discussion guides; and deciding how to collect data and the unit of analysis (Focus Group 1 – risk identification; Focus Group 2 and Focus Group 3 – risk interconnections; Focus Group 4 – risk probabilities of occurrence). The four focus groups involved the same set of six participants.

Bayesian Networks are directed acyclic graphs with nodes that represent random variables and edges that represent their conditional dependencies. Each node has a finite set of mutually exclusive states and is associated with a conditional probability distribution that gives the probability of each state for each combination of values of its parents. The parent node directly affects the child node. The joint probability distribution of a Bayesian Network over its set of variables X_i ∈ X = {X_1, ..., X_n} is given by the product of all the conditional probability distributions [20]:

P(X) = ∏_{i=1}^{n} P(X_i | pa(X_i)),    (1)
where pa(X_i) are the parent nodes of X_i in the network. In this work, we considered that each node of the network has two states: "true" and "false". A Conditional Probability Table (obtained from Focus Group 4) is associated with each node to denote the causal influence between variables represented by the edges. Thus, the Bayesian Network models the belief of the participants, the clinical laboratory professionals. The high number of causes for specific nodes makes elicitation from experts an impossible task, since, for every event, the number of possible combinations grows exponentially [21]. Thus, to build probability distributions from a small number of parameters, we used the Noisy-OR canonical model [22]:

Noisy-OR: P(Y | X_i, X_j) = 1 − (1 − P(Y | X_i)) · (1 − P(Y | X_j))    (2)

To build the Bayesian Network, we used the academic version of the GeNIe Modeler – BayesFusion software (https://www.bayesfusion.com/genie/).
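As an illustration of how Eq. (2) generalizes to nodes with several parents, the following sketch (not part of the original study; the link probabilities are made up rather than elicited from the focus groups) builds a full Noisy-OR conditional probability table in Python. With this parameterization, the experts only need to provide one probability per parent instead of one probability per combination of parent states.

```python
from itertools import product

def noisy_or_cpt(link_probs):
    """Build P(Y=true | parent configuration) for a Noisy-OR node.

    link_probs[i] is P(Y=true | only parent i is true), i.e. the
    single-cause probability elicited from the experts (zero leak assumed).
    """
    n = len(link_probs)
    cpt = {}
    for config in product([False, True], repeat=n):
        p_false = 1.0
        for active, p in zip(config, link_probs):
            if active:
                p_false *= (1.0 - p)
        cpt[config] = 1.0 - p_false
    return cpt

# Illustrative example with two potential causes (cf. Eq. (2))
cpt = noisy_or_cpt([0.3, 0.4])
print(cpt[(True, True)])   # 1 - (1-0.3)*(1-0.4) = 0.58
print(cpt[(True, False)])  # 0.3
```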
3 Results

3.1 Literature Review

The purpose of the scoping review was merely descriptive and the analysis consisted of registering, for each article in the dataset, the hazards, non-conformities and errors concerning the testing process. Twenty-one articles were identified after the initial search based on the titles, abstracts, and keywords. Through the reading of the abstracts and a verification that the inclusion criteria (articles containing a list of quality indicators, errors, risks or non-conformities) were met, seven articles were considered relevant. From the information extracted from these papers, an initial list of risk events was prepared and used in Focus Group 1. This list was classified into 4 groups: 2 related with the pre-analytical phase, 1 for the intra-analytical phase, and 1 for the post-analytical phase. Tables 2, 3, and 4 list the risk events.
Table 2. Hazards, non-conformities or errors in the Pre-analytical Phase.

Study/Focus Group | Code | Risk related with the reception/transport/store of the samples
[6] | R1 | Samples not received / samples damaged during transportation / packaging of defective samples
[6] | R2 | Unlabeled samples or incorrect identification of external samples
[16] | R3 | Contaminated samples
[16] | R4 | Mislabeled samples
[23] | R5 | Check-in not performed - laboratory information system malfunctions (network/computers/label machine)
FG1 | R6 | Vacuum system failure
[6] | R7 | Requests with erroneous data entry in the computer system
FG1 | R8 | Delay in integrating the sample into the computer system
FG1 | R9 | Sample not integrated into the computer system
[6] | R10 | Unintelligible requests / misidentified requests / requests without clinical question / inappropriate requests with respect to clinical question
[6] | R11 | Sample loss (vacuum/by hand) / sample hemolyzed / sample clotted / inappropriate time in sample collection
FG1 | R12 | High workload
FG1 | R13 | Staff absenteeism
FG1 | R14 | Lack of ergonomic conditions
FG1 | R15 | Lack of administrative material at reception
FG1 | R16 | Poor conditions in the waiting room (noise/temperature/space)
[6] | R17 | Delay in sample delivery for processing / samples with excessive transportation time

Study/Focus Group | Code | Risk related with the collection of the samples
FG1 | R18 | Stress for the patient in the waiting room
FG1 | R19 | Stress for the professional/patient due to lack of conditions/privacy in the collection room
FG1 | R20 | Stress for the professional caused by a high number of collections
FG1 | R21 | Professional hazards (e.g., needle stick injury)
[23] | R22 | Incorrect order of tube collection
[6] | R23 | Insufficient sample volume
[16] | R24 | Insufficient sample homogenization
[16] | R25 | Errors of equipment or reagents
[10] | R26 | Sample recollection (difficult puncture access, identification errors, deficient labeling, coagulated, hemolysis, low volume, spillage of biological products)
[10] | R27 | Contamination of the patient/professional
Table 3. Hazards, non-conformities or errors in the Intra-analytical Phase.

Study/Focus Group | Code | Risk related with the Intra-analytical Phase
FG1 | R28 | Energy failure
FG1 | R29 | Incorrect functioning of the computer system (connection failures)
FG1 | R30 | Water "falls" from the ceiling on top of the equipment
[24] | R31 | Environment temperature and humidity
[16] | R32 | Equipment breakdown
FG1 | R33 | Delay in repair of the equipment
FG1 | R34 | Sample loss due to equipment failure
[16] | R35 | Sample conditions (coagulated, hemolyzed, lipemic)
[24] | R36 | Incorrect operator procedure (e.g., biological spill)
[24] | R37 | Exposure to aerosols
FG1 | R38 | Rupture stock (calibrators/controls/reagents)
FG1 | R39 | Lack of material replacement in the sector (gloves/pipettes)
FG1 | R40 | Ergonomic conditions for professionals
FG1 | R41 | Lack of use of personal protective equipment (PPE) by the operator (gloves)
FG1 | R42 | High noise
Table 4. Hazards, non-conformities or errors in the Post-analytical Phase.

Study/Focus Group | Code | Risk related with the Post-analytical Phase
[10] | R43 | Inaccurate results
[25] | R44 | Delay in test result
FG1 | R45 | Analysis repetition (for confirmation of results)
[10] | R46 | Result sent to a different patient (failure of the information system or error in manual entry)
FG1 | R47 | Lack of communication between professionals
3.2 Risk Model

The Bayesian Network of risks developed was fed by the data collected through focus groups and incorporates network design and quantification with the suggested probabilities of occurrence (see Fig. 1). The green color signals the risks related with the reception/transport/store of the samples in the Pre-analytical Phase; yellow signals the risks related with the collection of the samples in the Pre-analytical Phase; blue signals the risks related with the Intra-analytical Phase; and purple the risks related with
the Post-analytical Phase (note: since the case study is in Portugal, the networks were created using Portuguese: the word “verdadeiro" means true and “falso” means false).
Fig. 1. Bayesian Network of risks for hospital clinical pathology laboratory.
The use of the Bayesian Network to outline how hazards are connected helps laboratory professionals to have a greater awareness of the effect and propagation of a potential hazard or error. Furthermore, it allows what-if analyses, i.e., analyzing the direct impacts of the occurrence of a given event on a specific risk – for example, the impact of R28 - Energy failure on R5 - Check-in not performed (see Fig. 2). Clinical directors need to identify preventive actions to reduce risk probability before the risk occurs, or protective actions to lower the impact of the occurrence of a risk. Mitigation actions should be directed at the primary causes associated with the risks with the greatest sensitivity. Another type of analysis that can be done is illustrated in Fig. 3: the causes that most affect R26 - Sample recollection are signaled in red. The implementation of the model allowed the group of participants to rethink some risky situations. For example, for R33 - Delay in the repair of the equipment, the suggested mitigation actions were: creation of a manual with easy event resolution procedures, training of professionals on the use of equipment, and negotiation of contractual conditions with the supplier. The group subjectively estimated that the probability of occurrence would drop by 20%. A direct impact was found on R44 - Delay in test result, but with only a 1% reduction. Another example that illustrates the importance of this model is the analysis performed on R1 - Samples not received / samples damaged during transportation / packaging of defective samples. From the focus group discussion, training on sample collection, storage and transport procedures was suggested as a mitigation action, which was thought to reduce the risk by 30%. However, after performing the Bayesian Network calculations, it was verified that the risk would decrease by only 3% due
Fig. 2. Simulation of the impact of R28 - Energy failure on R5 - Check-in not performed
Fig. 3. Simulation of the risk events that most affect R26 - Sample recollection.
to the interdependencies between risks in the network. This type of analysis improves the knowledge about risks and potential risk mitigation actions. Other results, in addition to the risk management model, that we want to emphasize were: the synthesis of a comprehensive list of risks inherent to clinical pathology, the
creation of relationships between risks, causes (factors) and effects that allow evolving from the linear view expressed in previous FMEA works towards a network view; by demonstrating the inherent complexity of clinical pathology, this has created greater awareness and sensitization among health professionals.
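The what-if analyses described above were performed in GeNIe; as a rough illustration of the same kind of evidence propagation (the impact of R28 - Energy failure on R5 - Check-in not performed), the following sketch uses the open-source pgmpy library with purely illustrative probabilities, not the values elicited in the focus groups:

```python
# Illustrative what-if query in the spirit of Fig. 2: effect of R28 (energy
# failure) on R5 (check-in not performed). Probabilities are placeholders;
# the published model was built in GeNIe from Focus Group 4 elicitations.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("R28", "R5")])
cpd_r28 = TabularCPD("R28", 2, [[0.95], [0.05]])            # P(false), P(true)
cpd_r5 = TabularCPD("R5", 2,
                    [[0.98, 0.30],    # P(R5=false | R28=false), P(R5=false | R28=true)
                     [0.02, 0.70]],   # P(R5=true  | R28=false), P(R5=true  | R28=true)
                    evidence=["R28"], evidence_card=[2])
model.add_cpds(cpd_r28, cpd_r5)

infer = VariableElimination(model)
print(infer.query(["R5"]))                       # prior belief
print(infer.query(["R5"], evidence={"R28": 1}))  # belief given an energy failure
```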
4 Conclusions

Risk management can minimize the possibility of error occurrence. This study presents a new alternative to the traditional FMEA approach. The risk model developed helped the laboratory to adopt a proactive role in minimizing potential errors by identifying the weaknesses of each testing phase. The model intends to be simple so that it can be easily implemented. Another method for gathering information could have been used (e.g., the Delphi method). This stage is very important because it allows the model to be contextualized to the specific reality of the laboratory. Once implemented, the efficiency of the model should be continually monitored and revised in order to maintain the residual risk at a clinically accepted level.
References 1. Bayot, M.L., Brannan, G.D., Naidoo, P.: Clinical laboratory. In: StatPearls. StatPearls Publishing (2022) 2. Njoroge, S.W., Nichols, J.H.: Risk management in the clinical laboratory. Ann. Lab. Med. 34(4), 274–278 (2014) 3. ISO/IEC, Safety aspects - Guidelines for their inclusion in standards. ISO/IEC Guide 51, International Organization for Standardization Geneva (1999) 4. Chang, J., Yoo, S.J., Kim, S.: Development and application of computerized risk registry and management tool based on FMEA and FRACAS for total testing process. Medicina (Kaunas) 57(5), 477 (2021) 5. Xia, Y., et al.: Risk assessment of the total testing process based on quality indicators with the Sigma metrics. Clin. Chem. Lab. Med. 58(8), 1223–1231 (2020) 6. Plebani, M., Sciacovelli, L., Aita, A.: Quality indicators for the total testing process. Clin. Lab. Med. 37(1), 187–205 (2017) 7. Duan, M., et al.: National surveys on 15 quality indicators for the total testing process in clinical laboratories of China from 2015 to 2017. Clin. Chem. Lab. Med. 57(2), 195–203 (2019) 8. Zeng, Y., et al.: Establishment of risk quality control charts based on risk management strategies. Ann. Clin. Biochem. 59(4), 288–295 (2022) 9. Janssens, P.M., van der Horst, A.: Improved prospective risk analysis for clinical laboratories compensated for the throughput in processes. Clin. Chem. Lab. Med. (CCLM) 56(11), 1878– 1885 (2018) 10. Plebani, M., et al.: Harmonization of quality indicators in laboratory medicine: a preliminary consensus. Clin. Chem. Lab. Med. 52(7), 951–958 (2014) 11. Keckler, M.S., et al.: Development and implementation of evidence-based laboratory safety management tools for a public health laboratory. Saf. Sci. 117, 205–216 (2019) 12. Crispim, J., Fernandes, J., Rego, N.: Customized risk assessment in military shipbuilding. Reliabil. Eng. Syst. Saf. 197, 106809 (2020)
13. Lehoux, P., Poland, B., Daudelin, G.: Focus group research and “the patient’s view.” Soc. Sci. Med. 63(8), 2091–2104 (2006) 14. Alaeddini, A., Dogan, I.: Using Bayesian networks for root cause analysis in statistical process control. Expert Syst. Appl. 38(9), 11230–11243 (2011) 15. Zhang, L., et al.: Towards a fuzzy bayesian network based approach for safety risk analysis of tunnel-induced pipeline damage. Risk Anal. 36(2), 278–301 (2016) 16. Plebani, M.: Errors in clinical laboratories or errors in laboratory medicine? Clin. Chem. Lab. Med. (CCLM) 44(6), 750–759 (2006) 17. Tricco, A.C., et al.: PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann. Intern. Med. 169(7), 467–473 (2018) 18. Assarroudi, A., et al.: Directed qualitative content analysis: the description and elaboration of its underpinning methods and data analysis process. J. Res. Nurs. 23(1), 42–55 (2018) 19. Hsieh, H.-F., Shannon, S.E.: Three approaches to qualitative content analysis. Qual. Health Res. 15(9), 1277–1288 (2005) 20. Jensen, F.V., Nielsen, T.D.: Bayesian Networks and Decision Graphs, vol. 2. Springer, Heidelberg (2007). https://doi.org/10.1007/978-0-387-68282-2 21. Cárdenas, I.C., et al.: Using prior risk-related knowledge to support risk management decisions: lessons learnt from a tunneling project. Risk Anal. 34(10), 1923–1943 (2014) 22. van Gerven, M.A.J., Lucas, P.J.F., van der Weide, T.P.: A generic qualitative characterization of independence of causal influence. Int. J. Approx. Reason. 48(1), 214–236 (2008) 23. Carraro, P., Plebani, M.: Errors in a stat laboratory: types and frequencies 10 years later. Clin. Chem. 53(7), 1338–1342 (2007) 24. James, H.N.: Risk management for point-of-care testing. Ejifcc 25(2), 154–161 (2014) 25. Jairaman, J., Sakiman, Z., Li, L.S.: Sunway medical laboratory quality control plans based on six sigma, risk management and uncertainty. Clin. Lab. Med. 37(1), 163–176 (2017)
Leveraging Sequence Mining for Robot Process Automation

Pietro Dell'Oglio1(B), Alessandro Bondielli2, Alessio Bechini3, and Francesco Marcelloni3

1 Department of Information Engineering, University of Florence, Florence, Italy
[email protected]
2 Department of Computer Science, University of Pisa, Pisa, Italy
[email protected]
3 Department of Information Engineering, University of Pisa, Pisa, Italy
{alessio.bechini,francesco.marcelloni}@unipi.it
Abstract. The automation of sequences of repetitive actions performed by human operators in interacting with software applications is crucial to prevent work from being perceived as alienating and boring. Robot applications can automatise these sequences once they have been identified. In this paper, we propose a two-step approach to mine sequences of actions that could be automated from log data produced by the interactions of a human operator with specific software applications. Since the number of possible sequences may be very high and not all the sequences are interesting to be automatised, we focus our mining process on sequences that meet precise patterns. First, Frequent Episode Mining algorithms are applied for extracting all the sequences of actions that occur with at least a minimum frequency. Then, we exploit fuzzy string matching based on the Levenshtein distance for filtering out the sequences that do not match established patterns. We evaluate the effectiveness of the approach using a benchmark dataset and present a case study on a real-world dataset of activity logs generated in the context of the AUTOMIA project.
1 Introduction
Nowadays, specific software systems have been developed to automate the repetitive interactions of human operators with software applications. Internet bots (or more simply bots) represent a very popular example: they are software systems running automated tasks over the Internet with the aim of imitating human activity. It is crucial to determine and model the target processes, which correspond to sequences of actions; this can be achieved by analysing recordings of the interactions of operators with their computers. Such data can be obtained by action tracking software tools, which produce detailed logs. However, not all the activities are to be considered repetitive and relevant to replace with a robot. We are interested only in frequent sequences that adhere to specific patterns and, if automated, can produce a high added value. These sequences
are difficult to identify manually, especially when the human operator is working on multiple tasks at the same time. In this paper, we propose a two-step approach for automatically determining these sequences. First, we extract candidate sequences from action logs by exploiting Frequent Episode Mining (FEM) techniques [9]. Then, we adopt fuzzy string matching for identifying, among the candidate sequences, those that are compliant with specific patterns of interest (named as archetype patterns in this paper). Thus, we ensure that the decision of automating the archetype pattern (or one very similar to it) is supported by the actual data. We evaluated the approach in two ways. First, we exploited a public benchmark dataset, arranging it to better fit our use case scenario, in order to obtain a quantitative evaluation of the system. Second, we carried out a case study on a real-world dataset of activity logs obtained in the context of the project AUTOMIA1 . The paper is organized as follows. In Sect. 2 we present an overview of related works, focusing on Episode Mining. Section 3 describes the proposed approach. In Sects. 4 and 5 we present the evaluation on benchmark data and the case study, respectively. Finally, Sect. 6 discusses limitations and future directions, and draws some conclusions.
2 Related Works
Given a sequence of events, each marked with a specific timestamp, along with the maximum length for temporal windows, Frequent Episode Mining (FEM) aims to identify sub-sequences of events in bounded temporal windows (a.k.a. "episodes") that occur frequently in the sequence. The problem is particularly relevant to system log analysis [13]. FEM was initially proposed in 1995 by Toivonen and Mannila to analyze sequences of alarms (see [9]). Usually, the general problem is formally defined as follows. An event E is a subset of a given finite set of items I = {i_1, i_2, ..., i_m}. A complex sequence S of events is a temporally ordered list of tuples of the form (E_{t_i}, t_i), where E_{t_i} is the event that occurs at time t_i. An episode is a non-empty sub-sequence of S of the form α = ⟨E_1, E_2, ..., E_p⟩, spanning a temporal interval. The support of an episode α is defined as the number of its occurrences in S. The objective of FEM is, given a temporal window of length winlen and a minimum support threshold minsup, identifying all the episodes fitting in winlen and occurring in S at least minsup times. The WINEPI and MINEPI algorithms [7] have been proposed to solve the problem of Frequent Episode Mining. They follow the same procedural scheme, but differ in the calculation of frequency and support. In particular, MINEPI adopts countermeasures to avoid possible issues in episode counting that occur in WINEPI [6]. EMMA and MINEPI+ [5] represent an improvement over WINEPI and MINEPI. Occurrences of an episode, and thus its support, are counted based
1 https://www.itpartneritalia.com/automia-la-rpa-che-migliora-il-lavoro/
on the frequency of the head of the episode (i.e., its first element). Algorithms for high-utility episode mining [2,11], proposed more recently, aim to identify high-utility episodes in complex event sequences such as transactional databases. Sequence and Sequential Pattern Mining techniques have been used, for example, for understanding incorrect behavioral patterns on interactive tasks [14], or for learning procedures by augmenting sequential pattern mining with planning knowledge [4]. An interesting application of sequence mining techniques is the one that integrates tools from click-stream analyses and graph-modelled data clustering with psychometrics for identifying common response processes [15].
3 The Proposed Approach
The proposed approach is based on a two-step process, as shown in the flowchart in Fig. 1. In the first step, the Frequent Episode Miner (FEM-M) performs Frequent Episode Mining to discover statistically significant episodes in the action log, which are candidate episodes to characterise processes deserving to be automatized. As their number may be very large, and as many of them might be irrelevant for our goals, a downstream selection must be executed. In the second step, the Target Episode Finder (TEF-M) spots out the most significant episodes taking as reference some patterns (archetype patterns) that are considered interesting for the specific application. Such a task exploits fuzzy string matching. In the case study on the automation of actions based on activity logs, possible patterns are, for instance: “opening a web browser, performing a Google search, and opening the first result of the search” and “opening a file, modifying it, saving and closing”. It is worth mentioning that both FEM-M and TEF-M perform actions that can be defined and implemented independently from each other; they provide results that could be used also in analyses of completely different types. This characteristic gave us the opportunity to implement them according to a RESTful architecture, with clear advantages in terms of flexibility of the overall pipeline.
Fig. 1. Flowchart of the proposed approach
3.1 FEM-M: Frequent Episode Miner
The system receives as input a dataset containing sequential data, such as activity logs (see Fig. 1). Note that each event in the sequential dataset must be associated with a timestamp. Among the required data pre-processing actions, it is important to set timestamps in a standard format: we chose to adopt the Unix timestamp convention. Next, the pre-processed data are passed as input to the algorithm used in FEM-M. In our experimentation we chose MINEPI+ [3], but our implementation of FEM-M can also be configured to use EMMA [6]. In particular, MINEPI+ and EMMA have a feature that makes them particularly suited to generalization to many problems: with respect to other FEM algorithms, they exploit the concept of "head frequency" for computing the support (see Sect. 2), and thus they can be natively applied to non-timestamped data. We employed the versions of the algorithms included in the SPMF library2, which leverage all the optimizations described in [6]. The algorithm requires setting only two parameters, namely the minimum support threshold and the maximum window size for an episode. Note that, for originally non-timestamped data (i.e., sequences without a time associated with them), the window size can simply be defined by considering the number of events in episodes. Thus, episodes that occur less frequently than the minimum support threshold and those not fitting in the maximum temporal window are not considered. The output of FEM-M is the set of frequent episodes in the dataset. For each frequent episode, all the composing events are reported, along with the episode support value relative to the dataset. The output can be exploited from two perspectives. On the one hand, it serves as input for the TEF-M module described below. On the other hand, episode lengths and supports can also be leveraged, e.g., by sorting the obtained episodes by frequency and providing a visual representation of them, to identify the patterns whose generating processes can be automated. For example, sufficiently long sequences with high support values may be further investigated to discover new patterns worthy of possible automatization.
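The head-frequency notion of support exploited by MINEPI+ and EMMA can be illustrated with a small standalone sketch (a simplified single-event-per-timestamp version written for this description, not the SPMF implementation used in FEM-M):

```python
def head_frequency(sequence, episode, winlen):
    """Count occurrences of `episode` in `sequence` using head frequency:
    every position where the episode's first event occurs is checked for a
    completion of the remaining events, in order, within `winlen` events."""
    count = 0
    for start, event in enumerate(sequence):
        if event != episode[0]:
            continue
        needed = list(episode[1:])
        for e in sequence[start + 1:start + winlen]:
            if needed and e == needed[0]:
                needed.pop(0)
        if not needed:
            count += 1
    return count

# Toy activity log: the episode ("open", "search", "click") has head frequency 2
log = ["open", "search", "click", "open", "search", "open", "click"]
print(head_frequency(log, ["open", "search", "click"], winlen=4))  # 2
```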
3.2 TEF-M: Target Episode Finder
The second step of the system deals with the identification of relevant episodes, with respect to specific application goals. The input for the TEF-M module is represented by: i) a set of frequent episodes and ii) an "archetype pattern", which corresponds to a sequence of events that describes an activity of interest for the application domain. Its purpose is the identification of episodes that approximate the provided archetype pattern as closely as possible. The statistically significant presence of such episodes ensures that the archetype pattern is indeed frequently exploited and thus a viable candidate for automation.
2 https://www.philippe-fournier-viger.com/spmf/MINEPI_PLUS_EPISODE.php
To evaluate the similarity of episodes to the archetype pattern, we resort to the use of a purposely-defined distance function. Specifically, the metric chosen for filtering the frequent episode set (keeping only the episodes most similar to the archetype pattern) is the normalized Levenshtein distance, which in our setting quantifies the fuzzy matching between two episodes by assuming values in the interval [0, 1]. The implementation of the adopted distance is based on the FuzzyWuzzy Python library3. The choice of this specific distance is motivated by the fact that it operates directly on strings, thus not requiring an additional layer of representation of episodes. This provides a good proof of concept for our general framework. Each episode is compared with the archetype pattern to evaluate the degree of similarity. An episode is considered similar only if the value of the Levenshtein similarity (i.e., 1 − Levenshtein distance) is sufficiently high. In the implemented module, we used a tunable threshold parameter with values in the range [0, 1]; if set to 1, only episodes identical to the archetype pattern are selected. Even if TEF-M is directly aimed at validating the presence of a predetermined pattern among those identified, as a by-product we also obtain other patterns that are similar to the archetypal one and that could be taken into consideration for automation as well.
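A minimal sketch of this filtering step is shown below; it uses the FuzzyWuzzy fuzz.ratio score (0-100) divided by 100 as the normalized similarity, while the episode encoding, the helper function name, and the threshold value are illustrative choices rather than the module's actual code:

```python
# Sketch of the TEF-M filtering step: keep only the frequent episodes whose
# fuzzy similarity to the archetype pattern exceeds a threshold.
from fuzzywuzzy import fuzz

def filter_episodes(frequent_episodes, archetype, threshold=0.75):
    archetype_str = " ".join(archetype)
    kept = []
    for episode in frequent_episodes:
        # fuzz.ratio returns 0-100; dividing by 100 gives 1 - normalized distance
        similarity = fuzz.ratio(" ".join(episode), archetype_str) / 100.0
        if similarity >= threshold:
            kept.append((episode, similarity))
    # most similar episodes first
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Illustrative call with toy episodes (events encoded as short action labels)
episodes = [["open", "search", "click"], ["open", "click"], ["copy", "paste"]]
print(filter_episodes(episodes, ["open", "search", "click", "close"], 0.6))
```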
4 Experimental Results
We first evaluated the effectiveness of the choice of the fuzzy string matching metric as the main core of the TEF-M stage of our pipeline. To this aim, we considered a public benchmark dataset that includes public domain books converted into sequences of elements [10]. Such a choice represents a good fit for the target domain of process automation. In a text, the parts of speech are organized within each sentence according to specific rules of usage, as happens also for actions in instances of processes recorded in a log. Moreover, the extensive but limited number of Parts-of-Speech lends itself well to simulating a finite set of interactions with a computer. The dataset is part of the SPMF library4 and includes novels of the XIX century in English. Each novel is split into sequences of words and Parts-of-Speech (PoS), i.e., nouns, verbs, adjectives, etc. For our experiments, we chose one of the available books in the repository, namely "A Romance of the Republic" by author Lydia Maria Child. We decided to experiment on the PoS-based variant, i.e., considering as event types the possible distinct PoSs ("tags") associated with each word in the dataset. Such a choice has been motivated by the observation that the number of different PoSs is smaller than the vocabulary of a text, and it represents a better fit for modeling a typical application scenario for the proposed approach. Specifically, the used tags are the standard ones
3 https://pypi.org/project/fuzzywuzzy/
4 https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php
defined in the Penn Treebank Project5, which thus correspond to the elements of the set of items I for the dataset. Thus, for our purposes we considered each PoS as an event, and we associated each event with an integer timestamp corresponding to its position in the text. For example, the timestamp for the first PoS in the novel is 1, the second is 2, and so on. The chosen dataset consists of a sequence of 125395 events (i.e., PoSs). We then applied the Frequent Episode Miner module to obtain all the frequent episodes of parts-of-speech in the novel. Regarding the chosen parameterization, we set the minimum support threshold to 500, and the maximum window size to 10. The FEM-M stage identified 4121 frequent episodes. An example of an episode is Ep = ⟨RB, DT, JJ⟩. The episode is composed of an adverb ("RB") followed by a determiner ("DT") and then an adjective ("JJ"). To evaluate the approach with respect to actual data, we generated a test set of archetype episodes. Specifically, we randomly selected 100 episodes from the output of FEM-M, and we slightly modified them by randomly adding an event (actually, a PoS) to the episode. Referring to the previous example, the resulting modified archetype episode is Epmod = ⟨RB, WDT, DT, JJ⟩. In this case, "WDT" is an event corresponding to a "Wh-determiner". In the archetype pattern list, each episode of the first list is modified by adding a Wh-determiner. The goal of the experiment is to apply TEF-M using as input both the list of frequent episodes produced by FEM-M and each episode in the archetype pattern list. We then rank the lists of obtained episodes to analyse how frequently we retrieve the corresponding target episode in the upper part of the ranking. Thus, we can evaluate the ability of the approach to correctly identify the similarity between an archetype pattern and a target episode. Figure 2 presents the results obtained for the 100 episodes: in the chart, both the specific values and the overall trend are reported. The metric used is the Average Recall at n (AR@n), where n is the ranking position that bounds the upper part. If n = 1, AR@1 represents the fraction of target episodes correctly retrieved in the first position. For n = 10, AR@10 represents the fraction of target episodes that are retrieved in the top ten positions, and so on for increasing values of n. Figure 2 shows that, for our test set of 100 episodes, AR@1 is quite high, at 0.56. AR increases from 0.56 to 0.61 for n = 10. Obviously, the higher the value of n, the more likely a target episode can be found in the top n positions of the ranking. Given an output consisting of several episodes, we can search among those to retrieve the target pattern. The results show that we are reasonably able to identify the target pattern among the first positions of the ranking.
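AR@n can be computed directly from the per-query rank of each target episode; the following short sketch (with hypothetical ranks, not the experimental data) shows the computation:

```python
def average_recall_at_n(target_ranks, n):
    """AR@n: fraction of queries whose target episode appears within the
    first n positions of the ranking (ranks are 1-based; None = not retrieved)."""
    hits = sum(1 for r in target_ranks if r is not None and r <= n)
    return hits / len(target_ranks)

# Hypothetical ranks of the target episode for five archetype queries
ranks = [1, 1, 3, None, 12]
print(average_recall_at_n(ranks, 1))   # 0.4
print(average_recall_at_n(ranks, 10))  # 0.6
```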
5 https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Fig. 2. Average recall versus number of positions considered in the ranking.
5 A Case Study
The proposed approach was also tested in a specific use case in a real-world scenario. Specifically, we considered a real-world dataset of activity logs and actions of interest performed by human operators who interact with a computer. The tests have been conducted in the context of the project AUTOMIA. The goal of the project was to identify automatable processes in users' activity, i.e., to identify repetitive processes. The data represent the interaction of a user with a computer. The user performs tasks that are automatically tracked at different granularity levels for the various operations. The tracking system was part of the project and includes proprietary algorithms. Each dataset record relative to an event accounts for the action performed by a user, the active OS GUI window, metadata, and finally the relative timestamp. In the case study, we assessed our proposed method on a small sample of archetype episodes. First, we leveraged FEM-M to obtain frequent episodes from the dataset. Due to the relatively small dimension of the available dataset, and the fact that the expected outputs are relatively short episodes, we chose to set the minimum support threshold to 4 and the max window width to 18. Second, we exploited TEF-M to compute the similarity between each of the archetype episodes and all the frequent episodes identified with FEM-M. In this case, as the goal was to obtain only episodes that are very similar to the archetypal ones, we set the Levenshtein similarity threshold to 0.75. We recall that we refer to similarity as 1 − Levenshtein distance. Due to space constraints, we only present a short example in Fig. 3. The figure shows an archetypal episode and one of the retrieved episodes with the highest similarity. In this case, the archetypal episode refers to the action of searching and downloading résumés from a résumé aggregator website, namely IProgrammatori6. Note that the two episodes are nearly identical.
6 https://www.iprogrammatori.it/
Fig. 3. An archetypal episode and one of the retrieved episodes with the highest similarity
The obtained results have been manually evaluated by partners of the project. We have been able to assess the reliability of the proposed method both in the identification of frequent episodes that may be of interest in the context of automation, and in the effective retrieval of pre-determined patterns as part of the identified episodes. Thus, this early evaluation has shown to be strongly indicative of the effectiveness of the proposed method.
6 Conclusions
In this paper, we have proposed an approach which exploits episode mining algorithms for determining automatable sequences in activity logs. The approach is implemented as a pipeline. First, Frequent Episode Mining algorithms are
applied to extract frequent episodes from the activity logs. Then, fuzzy matching is used for determining whether the frequent episodes follow pre-determined patterns. We evaluated the proposed approach using a purposely adapted benchmark dataset, and in a real-world scenario with actual data from activity logs. The proposed approach has both strengths and weaknesses. Having developed the approach on tested algorithms and methods enables us to obtain a strong baseline for its effectiveness regardless of the context. Moreover, the choice of algorithms lends itself well to a wide array of scenarios. In this context, we can point out how recent Episode Mining algorithms move on to other lines of research, such as the High-Utility Frequent Episode Mining [16], address specific problems such as the mining of partially-ordered episode rules [1], or the mining of the top-k high utility episodes [12]. Such algorithms could be easily and effectively implemented in future versions of our system. Conversely, we must point out how the techniques used are relatively simple and, while able to provide a strong baseline and an effective proof of concept for the proposed approach, could be further refined and improved. In this context, in the future we expect to move forward from fuzzy matching and incorporate metrics and techniques that can better model the data semantics. For example, we could either enrich the description of events with additional information and metadata that could lead to a more effective representation for comparison with similarity metrics, or leverage learned representations of events by means of techniques typically used for natural language representations, such as word embeddings [8]. Finally, a third module for visualization will be implemented so that results for both modules could be more easily interpreted by the users of our system. We believe that our proposed pipeline could be helpful in the context of robot process automation, especially in its broader aspects, with the final goal of helping humans focus only on the more interesting, challenging, and rewarding tasks in their daily activities.
References 1. Fournier-Viger, P., Chen, Y., Nouioua, F., Lin, J.C.-W.: Mining partially-ordered episode rules in an event sequence. In: Asian Conference on Intelligent Information and Database Systems, pp. 3–15. Springer (2021) 2. Fournier-Viger, P., et al.: The SPMF open-source data mining library version 2. In: Berendt, B., et al. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9853, pp. 36–40. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46131-1 8 3. Fournier-Viger, P., Yang, P., Lin, J.C.-W., Yun, U.: HUE-Span: fast high utility episode mining. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds.) ADMA 2019. LNCS (LNAI), vol. 11888, pp. 169–184. Springer, Cham (2019). https://doi.org/ 10.1007/978-3-030-35231-8 12 4. Gervasio, M., Myers, K.: Learning procedures by augmenting sequential pattern mining with planning knowledge. In: Proceedings of the 8th Annual Conference on Advances in Cognitive Systems (2020) 5. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Disc. 15(1), 55–86 (2007)
6. Huang, K.-Y., Chang, C.-H.: Efficient mining of frequent episodes from complex sequences. Inf. Syst. 33(1), 96–114 (2008) 7. Le, B., Duong, H., Truong, T., Fournier-Viger, P.: Fclosm, fgensm: two efficient algorithms for mining frequent closed and generator sequences using the local pruning strategy. Knowl. Inf. Syst. 53(1), 71–107 (2017) 8. Lenci, A.: Distributional semantics in linguistic and cognitive research. Italian J. Linguist. 20(1), 1–31 (2008) 9. Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc. 1(3), 259–289 (1997) 10. Pokou, Y.J.M., Fournier-Viger, P., Moghrabi, C.: Authorship attribution using small sets of frequent part-of-speech skip-grams. In: The Twenty-Ninth International Flairs Conference (2016) 11. Raissi, C., Poncelet, P., Teisseire, M.: Speed: mining maximal sequential patterns over data streams. In: 2006 3rd International IEEE Conference Intelligent Systems, pp. 546–552 (2006) 12. Rathore, S., Dawar, S., Goyal, V., Patel, D.: Top-k high utility episode mining from a complex event sequence. In: Proceedings of the 21st International Conference on Management of Data, Computer Society of India (2016) 13. Tsai, C.-F., Lin, W.-C., Ke, S.-W.: Big data mining with parallel computing: a comparison of distributed and mapreduce methodologies. J. Syst. Softw. 122, 83–92 (2016) 14. Ulitzsch, E., He, Q., Pohl, S.: Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. J. Educ. Behav. Stat. 47(1), 3–35 (2022) 15. Ulitzsch, E., et al.: Combining clickstream analyses and graph-modeled data clustering for identifying common response processes. Psychometrika 86(1), 190–214 (2021) 16. Wu, C.-W., Lin, Y.-F., Yu, P.S., Tseng, V.S.: Mining high utility episodes in complex event sequences. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 536–544 (2013)
Intelligent Agents System for Intention Mining Using HMM-LSTM Model

Hajer Bouricha1(B), Lobna Hsairi2, and Khaled Ghedira3

1 National School of Computer Science, University of Manouba, Manouba, Tunisia
[email protected]
2 IT Department, University of Jeddah, Jeddah, Saudi Arabia
3 The Higher Institute of Management of Tunis, University of Tunis, Tunis, Tunisia
[email protected]
Abstract. With the significant increase in information in system log files, much research is based on log files to improve processes. System logs contain a lot of information, at different times, about the behavior of the system and the user. To enhance a process based on log files, the users' behavior must be learned. With process discovery, an intentional process model and prediction strategies can be learned from log data. However, system logs do not fulfill the requirements that process discovery and prediction algorithms place on log files. To solve this problem, a new ensemble-based multi-intelligent agent system is introduced for discovering intentional process models and predicting users' strategies in intention mining. This research, therefore, proposes a four-layer architecture of intelligent agents to generate an intention mining process based on the communication and coordination of intelligent agents, and we propose an HMM-LSTM-based hybrid solution to model and predict the strategies of students using the case study of Educational Process Mining (EPM): A Learning Analytics Data Set.

Keywords: Intention Mining · Intelligent Agent · Hidden Markov Model · Long Short-Term Memory · Log Files

1 Introduction and Motivation
Nowadays, with the increasing popularity of technology and development, information systems can capture all stakeholders' events, actions, and activities at various levels of granularity, from low to high. These systems collect various types of temporal data known as "log files" or "trace activities." These files are registered in a variety of domains, including business processes, daily activities, IoT devices, medical systems, and education, among others. It also provides numerous opportunities to gain knowledge and comprehend what is happening. Understanding human strategies precisely based on intention has been a significant challenge in recent years. Despite technological advancements in the field of information systems engineering, intention mining [1] is still used regularly in many applications and services.
In this research, we address the intention mining of sequential data (log files). The solution presented in this article is a set of intelligent agents that is based on, and enhanced by, the combination of a hidden Markov model (HMM) [2] approach and a long short-term memory (LSTM) [3] neural network. The methodology consists of combining the HMM-based model with a deep learning approach (an LSTM neural network) to obtain an intentional process model and predictions of the activities that the user could choose, in order to anticipate possible strange behavior. In summary, the problem we want to address concerns how to read and interpret user activity to optimize and redirect the process. This research is driven by two main questions: RQ1: How could a multi-intelligent agent approach reduce the complexity of a prediction system? and RQ2: How could a hybrid solution combining HMM and LSTM discover an intentional process model and predict users' strategies?

Many data mining models are implemented as multi-intelligent agent systems in various sectors such as health, management, and engineering [4]. Their characteristics, as well as their distributed nature, make them one of the most common data mining systems [5]. A multi-intelligent agent system can be used to reduce the complexity of various data mining methods by developing modular components in which each component is in charge of its own sub-tasks and the entire system works toward a common goal [6,7]. Each agent employs a suitable method to complete its task and employs a tuning process to select the appropriate parameters [8]. In other words, because of their flexibility and unique features, multi-intelligent agent systems are a popular approach for implementing decision support systems [9,10]. Furthermore, the decentralized control of multi-intelligent agent systems implies that each agent in this architecture operates autonomously and, in some ways, self-deterministically [11]. This feature also makes the multi-intelligent agent system more robust because the system will continue to operate even if an individual agent crashes or does not respond within a reasonable amount of time. All of these specifications lead us to the conclusion that this model would be useful for "intention mining," which is a specific domain of data mining.

The main sequential modeling for intention mining is carried out by the HMM. HMMs have limitations: in the Markov model, the transition to the next state depends only on the current state (Markov hypothesis). In contrast, the hidden state of a recurrent neural network at any time step can contain information from an almost arbitrarily long history. Moreover, the use of HMMs in scaling up such a system has long been impractical, even with dynamic programming algorithms like Viterbi [2], due to the quadratic complexity, in the number of hidden states, of the inference problem and of the transition probability matrix. This problem can be addressed by using neural networks, especially long short-term memory networks (LSTMs) [12]. LSTMs use specially designed memory cells with self-connections to maintain the temporal state of the network. LSTMs prevent backpropagated errors from vanishing or exploding; instead, errors can flow backward through an unlimited number of virtual layers [13]. Hybrid HMM-LSTM approaches combine the interpretability of HMMs with the predictive power of LSTMs.
In this paper, we investigate the effectiveness of combining the Hidden Markov Model (HMM) with the Long Short-Term Memory (LSTM) model using a hybridization process that we introduce.
This procedure entails combining the HMM's hidden state probabilities with those of the LSTM. This paper is interested in students' learning activities. Educational Intention Mining (EIM) is an emerging field within Educational Process Mining (EPM). EPM makes unexpressed knowledge explicit and facilitates a better understanding of the educational process by using log data gathered specifically from educational environments [14,15]. EIM also uses these data to better understand the users' ways of thinking and working, which is another way to improve the process model. It is worth noting that we have found no research on the use of Intention Mining in the educational field. The rest of this paper is structured as follows: Sect. 2 provides an overview of the current state of the art in Intention Mining, including the algorithms, statistics, and mathematical models applied to discover the intentional process model. The proposed architecture and the hybrid model (HMM-LSTM) are explained in Sect. 3. Finally, Sects. 4 and 5 conclude with the validation of this implementation and the future research directions arising from this study.
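To make the hybridization concrete, the following sketch shows one possible way of combining HMM hidden-state posteriors with an LSTM next-activity predictor. It assumes the hmmlearn and Keras libraries, uses random placeholder data instead of the EPM log, and the architectural choices (window length, layer sizes, feature encoding) are illustrative, not taken from this paper:

```python
# Sketch of an HMM-LSTM hybridization (not the paper's implementation):
# posterior probabilities of the HMM's hidden states (read as candidate
# strategies/intentions) are concatenated with a one-hot encoding of the
# observed activities and fed to an LSTM that predicts the next activity.
import numpy as np
from hmmlearn import hmm                      # CategoricalHMM; MultinomialHMM in older releases
from tensorflow.keras import layers, models

n_activities, n_strategies, win = 20, 4, 10

# activity log encoded as integer ids (placeholder instead of the EPM event log)
log = np.random.randint(0, n_activities, size=500)

# 1) HMM over the activity sequence; hidden states play the role of strategies
hmm_model = hmm.CategoricalHMM(n_components=n_strategies, n_iter=50)
hmm_model.fit(log.reshape(-1, 1))
posteriors = hmm_model.predict_proba(log.reshape(-1, 1))       # (T, n_strategies)

# 2) Build supervised windows: past `win` steps -> next activity
one_hot = np.eye(n_activities)[log]                            # (T, n_activities)
features = np.concatenate([one_hot, posteriors], axis=1)       # (T, n_activities + n_strategies)
X = np.stack([features[i:i + win] for i in range(len(log) - win)])
y = log[win:]

# 3) LSTM predicting the next activity from the hybrid features
net = models.Sequential([
    layers.Input(shape=(win, n_activities + n_strategies)),
    layers.LSTM(64),
    layers.Dense(n_activities, activation="softmax"),
])
net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
net.fit(X, y, epochs=2, batch_size=32, verbose=0)
```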
2 Related Works
Various approaches have been explored for Intention Mining; the majority of them rely on HMM models and their variants. The first “Intention Mining” approach, which takes information from event logs as input and produces an intentional process model describing the user’s behavior and the best strategies followed in the event logs, was introduced by [1,21]. They present the Map Miner Method (MMM), a novel approach for automating the construction of an intentional process model from traces. This method uses the Hidden Markov Model, in supervised and unsupervised learning, to model the relationship between users’ activities and their strategies (i.e., the various ways of meeting an intention). The method also includes specific algorithms, Viterbi and Baum-Welch, developed to infer users’ behavior and to construct the intentional process model (Map), respectively.

Table 1. An overview study of Intention Mining approaches.

Approaches  Algorithms                                    Models            Study cases
[16]        Viterbi algorithm                             HMM               The entity/relationship diagram application
[17]        Baum-Welch algorithm                          HMM               Data collector of the user Eclipse
[18]        Probabilistic algorithm                       Design Science    Childcare system with a Netherlands software company
[19]        Baum-Welch algorithm                          C-HMM             Financial statements of banks
[20]        State-splitting algorithm and maximum a       H-HMM             Trauma resuscitation process
            posteriori probability (MAP) scoring
[23]        NLP                                           Knowledge expert  NewsIR'16 repository
MMM models the intention as an associated oriented graph (with different levels of granularity) to gain a better understanding of the human way of thinking. Within the framework of unsupervised learning, and to validate their approach, [17] use the case study of the data collector of the Eclipse user, while the entity/relationship diagram application is the case study for the supervised learning approach [16]. [20] present a novel approach to Intention Mining using a hierarchical hidden Markov model (H-HMM) to represent intention at multiple levels of granularity. They also introduce a state-splitting approach that avoids random guessing of the model topology, and they use maximum a posteriori probability (MAP) scoring to guide the state-splitting and control model complexity. Their case study is based on a real-world process: they applied their approach to the trauma resuscitation process and extracted a multi-level intention model. [19] propose a novel approach using another variant of HMM, the Coupled Hidden Markov Model (CHMM), and use the Baum-Welch algorithm to train it. To validate their approach, they collect financial statements from banks to describe patterns of fraud detection by the Financial Services Authority. Applying design science, [18] created another Intention-Mining-oriented approach named FlexPAISSeer, with two principal component artifacts: the first is IntentMiner, which aims to discover the intentional model in an unsupervised manner; the second is IntentRecommender, which aims to generate recommendations based on intention and confidence factors. These two modules were tested in a case study with a Dutch software company using a childcare system that allows for flexible data-driven process enactment of various types. Moreover, [23] use the NewsIR'16 repository as a knowledge base for any business event log with specific business rules, and apply expert knowledge to discover the user's intentions for the business information system (Table 1).
3 Architecture of the Multi-Intelligent Agent System Approach
The multi-agent system approach is constructed using a four-layer architecture to discover an intentional process model and predict users’ strategies. The proposed architecture is depicted in Fig. 1, and each layer is discussed in turn.

3.1 Description of Agents
The first layer is data pre-processing. The cleaning, filtering, reorganization, and normalization of the data are the tasks of the agent in this layer. It is necessary to prepare the data set before applying our approach: the number of activities is large and includes both recurring and non-recurring (i.e., less frequent) activities. Because the less frequent activities have not been repeated enough to form a behavioral pattern, they are not representative of the users’ behavioral characteristics. For this reason, as well as for readability, we have limited this study to the most common activities performed by users.
Fig. 1. Multi-Agent system Architecture for Intention Mining
An appropriate data set is the output of this layer and is used as input to the next layer; in [22] we detailed this layer of the global architecture. The second layer consists of the following: after data pre-processing, the agent of this layer selects the most important parameters for discovering the intentional process model and predicting users’ strategies, which are then used as input to the third layer. The agent uses the Hidden Markov Model to determine the emission and transition matrices, applying the Baum-Welch algorithm to discover the parameters of the intentional process model. The HMM models the observed activities in terms of hidden strategies through these transition and emission matrices. The agent of the third layer has strong communication and cooperation with the agents of the second and fourth layers. It uses the output of the second layer (the emission matrices) to classify the users’ strategies on the one hand and to predict the strategies on the other hand by applying an RNN-LSTM. This layer contains two agents: the first was created to discover and group strategies and sub-strategies into high-level strategies to build the intentional process model; it accomplishes this by grouping the sub-strategies into strategies using an RNN-LSTM classification. The second agent uses the emission matrices of the HMM model as input to predict the users’ intentions using the LSTM model. As previously stated, this multi-level topology reflects the deep architecture with which humans organize their ideas and concepts hierarchically. The third layer’s output, a prediction model and a classification model, is trained and submitted to layer 4. This layer contains two distinct sub-tasks: “prediction” and “presentation of the intentional process model.” The intentional process model presents the discovered intentions in order to find the gap between the described process and the real process. There is also a user interface for interacting with an expert, obtaining new and unknown sequences, viewing prediction results, and reporting significant variables through sensitivity analyses.
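To make the second layer more concrete, the sketch below shows one way the discrete HMM could be fitted to encoded activity sequences with the Baum-Welch algorithm and how the transition and emission matrices consumed by the third layer could be extracted. This is a minimal sketch, not the authors' implementation: the hmmlearn library, the number of hidden strategies, and the toy sequences are all assumptions.

```python
# Minimal sketch (assumed, not the authors' code): Baum-Welch fitting of a discrete HMM
# and extraction of the matrices that the third-layer agents would consume.
import numpy as np
from hmmlearn import hmm   # older hmmlearn versions name the discrete model MultinomialHMM

# Hypothetical encoded logs: each activity a1..a15 is mapped to an integer 0..14.
activity_sequences = [[0, 1, 4, 1, 9], [0, 9, 1, 1, 14]]      # toy data only
lengths = [len(seq) for seq in activity_sequences]
X = np.concatenate(activity_sequences).reshape(-1, 1)

n_strategies = 5                                              # assumed number of hidden strategies
model = hmm.CategoricalHMM(n_components=n_strategies, n_iter=100, random_state=0)
model.fit(X, lengths)                                         # Baum-Welch (EM) estimation

transition_matrix = model.transmat_       # P(strategy_t+1 | strategy_t)
emission_matrix = model.emissionprob_     # P(activity | strategy), passed to the LSTM layer
posteriors = model.predict_proba(X, lengths)   # per-step strategy probabilities
```

In this reading, the second-layer agent would hand `transition_matrix` and `emission_matrix` to the third-layer agents described above.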
3.2 Hybrid Model
As shown in Fig. 2, our main hybrid model combines the HMM and the LSTM sequentially. We begin by running the discrete HMM on the data and extracting the hidden state distributions produced by the HMM’s Baum-Welch algorithm (layer 2), and we then feed this information into the LSTM architecture (layer 3). The LSTM component of this architecture can classify strategies and predict intentions because it only needs to fill in the gaps in the HMM’s predictions. We begin by investigating a method that does not require the RNN to be modified in order to be understandable, as the interpretation happens after the fact. In this stage, by extracting hidden states and approximating them with an emission hidden Markov model (HMM), we can model the big picture of the LSTM state changes. The emission matrices are then fed into the LSTM (see Fig. 2). The LSTM model can then use the HMM’s information to fill in the gaps when the HMM is underperforming, resulting in an LSTM with fewer hidden state dimensions.
Fig. 2. HMM-LSTM hybrid model
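A hedged sketch of the hybrid step follows. One possible reading of "feeding the emission matrices into the LSTM" is to represent each observed activity in a window by the corresponding column of the HMM emission matrix and concatenate it with a one-hot encoding of the activity before the LSTM. The layer sizes, window length, and placeholder tensors below are assumptions, not the authors' exact configuration.

```python
# Sketch of layer 3 (assumptions noted above): HMM emission columns are appended to the
# one-hot activities and an LSTM classifies the intention of each window.
import numpy as np
import tensorflow as tf

n_activities, n_strategies, window = 15, 5, 10

def build_hybrid_lstm(n_intentions=3):
    inputs = tf.keras.Input(shape=(window, n_activities + n_strategies))
    x = tf.keras.layers.LSTM(64)(inputs)                       # sequence encoder
    outputs = tf.keras.layers.Dense(n_intentions, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Placeholder data: activity windows, the fitted emission matrix, and intention labels.
emission_matrix = np.random.rand(n_strategies, n_activities)        # stands in for model.emissionprob_
activity_windows = np.random.randint(0, n_activities, size=(32, window))
labels = np.random.randint(0, 3, size=32)                           # e.g. Study / Search / Apply

one_hot = np.eye(n_activities)[activity_windows]                    # (32, window, 15)
hmm_features = emission_matrix.T[activity_windows]                  # (32, window, 5) emission columns
hybrid_input = np.concatenate([one_hot, hmm_features], axis=-1)

hybrid = build_hybrid_lstm()
hybrid.fit(hybrid_input, labels, epochs=2, verbose=0)
```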
4 Experimentation and Validation

4.1 Dataset
As noted above, this paper focuses on students’ learning activities, since we have found no prior research on the use of Intention Mining in the educational field. The experiments in this dataset were carried out with a group of 115 first-year engineering undergraduates from the University of Genoa. This research was conducted using Deeds (Digital Electronics Education and Design Suite), a simulation environment used for e-learning in digital electronics. The environment provides students with learning materials via specialized browsers and challenges them to solve a variety of problems of varying difficulty.
This dataset contains a time series of student activities from the digital electronics course’s six practical sessions. Table 2 lists the activities of the EIM dataset; each activity appears with an index such as a1, which represents “StudyEs (n of session, n of exercise)”.

Table 2. A set of activities of the EIM dataset.

Index  Related activities
a1     StudyEs (n of session, n of exercise)
a2     DeedsEs (n of session, n of exercise)
a3     DeedsEs
a4     Deeds
a5     TextEditorEs (n of session, n of exercise)
a6     TextEditorEs
a7     TextEditor
a8     Diagram
a9     Properties
a10    StudyMaterials
a11    FSMEs (n of session, n of exercise)
a12    FSMRelated
a13    Aulaweb
a14    Blank
a15    Other
4.2 Evaluation Metrics
To evaluate the results obtained by applying the hybrid HMM-LSTM to real-world data, we chose recall, precision, and the F-score (a combination of recall and precision). Before presenting these measures, we define some terms that are pertinent to the context of this research. The accuracy of the HMM-LSTM prediction was determined by checking whether the prediction corresponded to the actual intentions. The proportion of intentions correctly identified by the HMM-LSTM to the total number of intentions in the dataset is referred to as “recall.” It should be noted that recall does not take into account the number of intentions that the algorithm incorrectly identified. Recall is defined by the following expression:

Recall = TP / (TP + FN)    (1)
Precision, on the other hand, is the ratio of intentions correctly identified by the HMM-LSTM to all intentions identified by the algorithm. It is defined as:

Precision = TP / (TP + FP)    (2)
In general, higher recall can lead to lower precision and vice versa. The F-score combines precision and recall:

F-Score = (2 × Precision × Recall) / (Precision + Recall)    (3)
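These metrics can be computed directly from the confusion-matrix counts. The short helper below is a minimal sketch of Eqs. (1)–(3); the counts passed to it are purely illustrative and are not the paper's confusion matrix.

```python
# Direct computation of Eqs. (1)-(3) from true/false positive and false negative counts.
def evaluation_metrics(tp: int, fp: int, fn: int):
    recall = tp / (tp + fn)                                    # Eq. (1)
    precision = tp / (tp + fp)                                 # Eq. (2)
    f_score = 2 * precision * recall / (precision + recall)   # Eq. (3)
    return precision, recall, f_score

# Illustrative counts only.
print(evaluation_metrics(tp=80, fp=10, fn=20))
```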
4.3 Result
In this section, we present the results obtained in the experimentation. Start, Study, Apply, Search, and End are the inferred names of the intentions; the prescribed map is thus made up of five intentions: Study, Apply, Search, Start, and End. It also includes 89 student log files and 15 activities. Table 3 reports the mean values of recall, precision, and F-score across all strategies, which express the overall performance of trace estimation: an intention mining accuracy of 86%, a precision of 0.89, a recall of 0.83, and an F1-score of 0.87. Moreover, Fig. 3 depicts these metrics averaged over 30,000 test sequences for the three intentions Study, Search, and Apply, all of which indicate the feasibility of intelligent agents and the hybrid model (HMM-LSTM) for intention mining.

Table 3. Results of precision, recall and F-score.

Precision  Recall  F-Score
0.89       0.83    0.87
Fig. 3. Recall, precision and F-score for the three intentions: Study, Search, Apply
5 Conclusion
In this paper, a multi-intelligent agent system was created to classify and predict users’ intentions as an effective factor in Educational Process Mining. The multi-intelligent agent system has the following features:
1. Our approach uses the multi-agent architecture as an appropriate way to reduce the complexity of the prediction and classification problem and to increase prediction and classification accuracy, by generating autonomous intelligent agents that model independent sub-tasks of an intention mining problem.
2. It uses a hybrid model of HMM and LSTM, adds a dedicated intelligent agent to robustly determine the HMM parameters, and decreases the complexity of developing the model.

The proposed approach was examined in a case study of Educational Process Mining (EPM): A Learning Analytics Data Set. The results of testing the model indicate that the classification and prediction accuracy of the proposed approach is significantly higher, so it can be considered a promising alternative for classification and prediction. Future research efforts will be devoted to:
– Comparing our approach with existing approaches on the same case study.
– Applying our approach to another case study.
– Using another type of prediction and classification model within the multi-intelligent agent system.
References

1. Khodabandelou, G., Hug, C., Deneckère, R., Salinesi, C.: Process mining versus intention mining. In: Nurcan, S., et al. (eds.) BPMDS/EMMSAD 2013. LNBIP, vol. 147, pp. 466–480. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38484-4_33
2. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theor. 13(2), 260–269 (1967)
3. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006)
4. Zarandi, M.F., Hadavandi, E., Turksen, I.B.: A hybrid fuzzy intelligent agent-based system for stock price prediction. Int. J. Intell. Syst. 27(11), 947–969 (2012)
5. Da Silva, J.C., Giannella, C., Bhargava, R., Kargupta, H., Klusch, M.: Distributed data mining and agents. Eng. Appl. Artif. Intell. 18(7), 791–807 (2005)
6. Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)
7. Hafezi, R., Shahrabi, J., Hadavandi, E.: A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price. Appl. Soft Comput. 29, 196–210 (2015)
8. Sokolova, M.V., Fernández-Caballero, A.: Modeling and implementing an agent-based environmental health impact decision support system. Expert Syst. Appl. 36(2), 2603–2614 (2009)
9. Leondes, C.T.: Fuzzy Logic and Expert Systems Applications. Elsevier (1998)
10. Lussier, Y.A., et al.: Partitioning knowledge bases between advanced notification and clinical decision support systems. Decis. Support Syst. 43(4), 1274–1286 (2007)
11. Albashiri, K.A., Coenen, F., Leng, P.: EMADS: an extendible multi-agent data miner. In: International Conference on Innovative Techniques and Applications of Artificial Intelligence, pp. 263–275. Springer, London (2008). https://doi.org/10.1007/978-1-84882-171-2_19
12. Graves, A.: Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45 (2021)
13. Deshmukh, A.M.: Comparison of hidden Markov model and recurrent neural network in automatic speech recognition. Eur. J. Eng. Technol. Res. 5(8), 958–965 (2020)
14. Bogarín, A., Cerezo, R., Romero, C.: A survey on educational process mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8(1), e1230 (2018)
15. Trcka, N., Pechenizkiy, M., Van der Aalst, W.: Process mining from educational data. In: Handbook of Educational Data Mining, pp. 123–142 (2010)
16. Khodabandelou, G., Hug, C., Deneckere, R., Salinesi, C.: Supervised intentional process models discovery using hidden Markov models. In: IEEE 7th International Conference on Research Challenges in Information Science (RCIS), pp. 1–11. IEEE (2013)
17. Khodabandelou, G., Hug, C., Deneckère, R., Salinesi, C.: Unsupervised discovery of intentional process models from event logs. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp. 282–291 (2014)
18. Epure, E.V., Hug, C., Deneckère, R., Brinkkemper, S.: What shall I do next? In: Jarke, M., et al. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 473–487. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07881-6_32
19. Sungkono, K.R., Sarno, R.: CHMM for discovering intentional process model from event logs by considering sequence of activities. In: 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), pp. 1–6. IEEE (2017)
20. Yang, S., Ni, W., Dong, X., Chen, S., Farneth, R.A., Sarcevic, A., Burd, R.S.: Intention mining in medical process: a case study in trauma resuscitation. In: 2018 IEEE International Conference on Healthcare Informatics (ICHI), pp. 36–43. IEEE (2018)
21. Khodabandelou, G., Hug, C., Salinesi, C.: A novel approach to process mining: intentional process models discovery. In: 2014 IEEE 8th International Conference on Research Challenges in Information Science (RCIS), pp. 1–12. IEEE (2014)
22. Hajer, B., Arwa, B., Lobna, H., Khaled, G.: Intention mining data preprocessing based on multi-agents system. Procedia Comput. Sci. 176, 888–897 (2020)
23. Diaz, O., Pérez, M.: Strategy mining for inferring business information system user intentions. Appl. Sci. 12(12), 5949 (2022)
Unsupervised Manipulation Detection Scheme for Insider Trading

Baqar Rizvi1,2(B), David Attew2, and Mohsen Farid1,3

1 Data Science Research Centre, University of Derby, Derby, UK
[email protected]
2 Chief Regulatory Officer, Aquis Exchange PLC, London, UK
{brizvi,dattew}@aquis.eu
3 School of Computing and Engineering, University of Derby, Derby, UK
Abstract. Stock price manipulation in capital markets is the use of illegitimate means to influence the price of traded stocks in an attempt to reap illicit profit. Most of the existing attempts to detect such manipulation have either relied upon annotated trading data, using supervised methods, or have been restricted to detecting a specific manipulation scheme. Several research efforts in the past investigated the issue of insider trade detection, mainly focusing on annotated data and on the few components involved in insider trades. This paper proposes a fully unsupervised model based on learning the relationships among stock prices in higher dimensions using a non-linear transformation, i.e., Kernel-based Principal Component Analysis (KPCA). The proposed model is trained on input features appended with reference price data extracted from the trades executed at the Primary Market: the market of listing. This is intended to efficiently capture the cause/effect of price movements around which insider trading was potentially committed. A proposed kernel density estimate-based clustering method is further implemented to cluster normal and potentially manipulative trades based on the representation of principal components. The novelty of the proposed approach lies in the automated selection of model parameters while avoiding labelling information. This approach is validated on stock trade data from Aquis Exchange PLC (AQX) and the Primary Market. The results show significant improvements in detection performance over existing price manipulation detection techniques. Keywords: Market Abuse · Insider Dealing · Stock Price Manipulation · Kernel Principal Component Analysis · Kernel Density Estimate Clustering
1 Introduction

Market manipulation is one of the key issues influencing market sentiment and undermining investor faith both in a given security and in the exchange.

B. Rizvi—This research work is sponsored by Aquis Exchange PLC, London, and UK Research and Innovation (UKRI).
One of the most common market manipulation schemes is insider dealing. Insider dealing refers to the trading of a security whilst in possession of price-affecting non-public information. The UK’s Market Abuse Regulation (UK MAR) makes insider dealing, unlawful disclosure, market manipulation and attempted manipulation civil offences, within the Financial Conduct Authority’s powers and responsibilities for preventing and detecting market abuse [1]. Criminal insider dealing is an offence under Part V of the Criminal Justice Act 1993, and criminal market manipulation is an offence under sections 89–91 of the Financial Services Act 2012.

The methods used until now to detect stock market manipulation, involving machine learning and some bio-inspired techniques, have provided acceptable results in an unsupervised environment. However, computational complexity and the size of the dataset are still important challenges that need to be addressed. These lead to unnecessary computations, such as the forming of new clusters and the calculation of new parameter values, which are always susceptible to errors leading to false positives in detection. A rather conventional way of creating a model for insider dealing detection is to apply rule-based conditions with certain thresholds, with any trading instance exceeding the thresholds being treated as potentially anomalous. However, the selected parameters and thresholds are fixed over time and independent of the underlying data. A better approach is to select data-driven parameters and thresholds within an unsupervised model and validate it on various datasets.

The proposed method builds upon the KPCA multi-dimensional KDE clustering of [2], with the addition of reference prices from Primary Market trading records. The input dataset is first transformed non-linearly onto higher dimensions using kernel principal component analysis. Multi-dimensional kernel density estimation (MKDE) is subsequently applied to selected features from KPCA. The rationale behind using such a technique is that it is data-driven, non-parametric, and self-dependent compared to other traditional machine learning techniques that can also be applied in an unsupervised capacity. Some recent developments in unsupervised deep learning techniques, such as autoencoders for detecting anomalies, have also attracted different application domains including computer vision, bio-medical sciences, and industrial design [3–5]. However, given the limited size of the input data, a deep learning model should be avoided here as it would be computationally complex. The critical desire to reduce the amount of computation without sacrificing the accuracy of the results can be achieved by training the model using traditional machine-learning methods in an unsupervised environment. This also saves computation by simply calculating distances from the decision boundary rather than optimizing it using labelled data.

Besides the well-documented examples of insider trading [6], Fig. 1 shows an anonymised example of potential insider trading based on the trading behaviour of a client X on security A, identified by Aquis Exchange as a suspicious incident. It shows that the client, who has a record of trading on the given security, makes an unusually large buy prior to a favourable news release about its corporate merger that subsequently raised the market price of that security.
Following the announcement, client X had the opportunity to liquidate the position and sell at the elevated price. It can be observed that, besides the transactions flagged as suspicious that cause a spike in the time series, there also exist several similar instances of transactions classified as normal trades. Such an overlap in definition can noticeably lead to false positives. The given problem can therefore be summarised in terms of both anomalous trades and a substantial change in the reference price due to the public news release, whilst clearly distinguishing normal from abnormal trades. The proposed research aims to detect insider trades while addressing the above-mentioned problems by first transforming the input data, namely trades on both the buy and sell sides (price and volume information) along with reference prices from external platforms, onto higher dimensions using KPCA. The proposed multi-dimensional KDE clustering algorithm is then applied in the transformed domain to detect anomalous trades. The idea is also to capture the uniformity of normal trading behaviour and to detect the instant when it starts to deviate towards abnormality.
Fig. 1. A potential instance of insider trading identified as a suspicious incident on Aquis Exchange PLC (normalised price)
2 Literature Review
Several researchers in the past have proposed improvements to insider trading detection with promising results; however, most of them focused on the impact of the information release on the market price, failing to address the historical trading patterns of investors.
In addition, most of them did not have the added advantage of using order and trade instances that can recognise the individual trading behaviour of an investor over a given security. Seth and Choudhary [7] proposed to detect insider trading events by proactively predicting insider trades prior to their real occurrence. The authors used multiple datasets from US equity markets based on SEC litigation releases related to insider dealing. The approach forecasts insider trading events with an LSTM model trained on historical data. The results are further filtered based on earnings release information only, i.e., only events close to earnings release dates are selected. This makes the approach significantly biased, as it ignores other publicly available information, such as corporate news, that also influences the price of a given security. The approach proposed by Goldberg et al. [8] envisaged the trading behaviour of investors on Primary Markets using logistic regression combined with an analysis of news releases using NLP. Whilst the model claimed to reduce false positives and improve detection accuracy, it is a supervised learning model, which makes it biased towards the training data. Such models are more prone to errors and false positives as market trends and trading patterns evolve over time. Li et al. [9] proposed to detect market manipulation using supervised learning techniques on market-close and tick trading data appended with labels. The results, in terms of accuracy and AUC scores, showed significant performance improvement when using market-close (daily) data but poor performance when using tick trading data. The approach was implemented on price data from 64 stocks acquired from the China Securities Regulatory Commission (CSRC). However, the use of annotated data for manipulation detection makes any model biased towards the dataset under consideration. In addition, the fact that the labels are assigned manually to trading instances (given that no accurate information about timestamps is available) makes the detection algorithm prone to errors. More recently, Close and Kashef [10] combined bio-inspired and clustering approaches to detect market manipulation, comparing them with multiple combinations of bio-inspired and other existing unsupervised techniques. This approach built upon the authors' own work [11] on using the dendritic cell algorithm (DCA) as a transformation approach and clustering the transformed components to detect anomalous trading behaviour using KDE. The findings revealed that the original work performed better. Islam et al. [12] proposed an approach similar to [7], using LSTM networks to forecast a potential insider trading event prior to the announcement of an official news release. The authors trained an LSTM network on traded stock volume in an attempt to predict volume information for the test data. In order to train the model on relevant insider trading cases, an NLP model combined with a decision tree algorithm is used. The approach could be improved by appending traded price information to the LSTM input data. Training a model only on volume data ignores the effect on security prices caused by significant trading, a potential feature that could also indicate the impact of a public information release.
As is evident, most of the aforementioned research either used supervised learning techniques or focused on a few specific components that can contribute towards insider trading detection. The approach proposed in this work is completely independent of data annotations while analysing the normal trading behaviour of investors. The input data in the proposed research also includes the impact of public news releases on the price of the given security and the features that indicate deviation from normal trading. The rest of the paper is organised as follows: Sect. 3 explains the implemented methodology, followed by the experimental analysis in the results and discussion section, which includes the dataset description and a discussion of the results. The research is finally concluded in Sect. 5, where the research findings are explained.
3 Proposed Methodology
Fig. 2. Architecture of the work flow
The workflow shown above in Fig. 2 can be explained as follows: the input dataset, comprising the features described below and extracted from the stock trades, is first windowed into windows of size N. The input features are then subjected to KPCA, i.e., a non-linear transformation, before applying the MKDE clustering explained in Algorithm 1.

3.1 Feature Characterisation
The amount of useful information extracted from the data contributes significantly to the performance of the model, besides helping to reduce redundancy and sparsity. The input to the proposed model is the traded stock information for several different stocks from a range of diverse trading participants on the Aquis Exchange. The set of input features used in this approach is described as follows,
Fig. 3. Probability distribution using kernel density estimate for the top two principal components for 30 trading days in stock A by client X, along with its contour. The red circle marks the anomalous data point identified by the MKDE clustering algorithm. (Color figure online)
– Total value of the trade (buy and sell), x(t)
– Reference prices (from trades on the LSE), r(t)
– Volume, v(t)
– A new feature vector amplifying the difference between two consecutive trading instances when it exceeds a certain threshold:

$q(t) = |x(t) - x(t-n)|$  (1)

$w(t) = \begin{cases} 3\,q(t), & q(t) > \mathrm{threshold} \\ q(t), & q(t) \le \mathrm{threshold} \end{cases}$  (2)

The value of the threshold should be equal or proportional to the abnormal percentage of change for a given client trading a given stock, which can be picked based on their historical behaviour. Here, n is a number of instances in days, n ∈ [1, 10].
– The indicator

$c(t) = \begin{cases} 1, & e_x(x) > 0.95 \ \text{and}\ p_x(x) < 0.95 \cdot \max(p_x(x)) \\ 0, & \text{otherwise} \end{cases}$  (3)

– The gradient of the new variable, dw(t)/dt
– A feature vector comprising only high-frequency components, x̂(t), given that high-frequency stock trades are more prone to anomalies. It is proposed to compute the discrete wavelet transform (DWT) to filter out the low-frequency components by decomposing the input signal to a single level into approximation and detail coefficients. Approximation coefficients represent the low-frequency components and detail coefficients represent the high-frequency components,

$X'_{a,b} = \begin{cases} X_{a,b}, & X_{a,b} \le \lambda \\ 0, & X_{a,b} > \lambda \end{cases}$  (4)

where $X_{a,b}$ represents the detail coefficients, a and b are the shifting and scaling parameters of a given coefficient, and λ is the threshold. A hard thresholding algorithm is implemented such that the detail coefficients outside the threshold, calculated using the universal threshold estimation method, are set to zero. These filtered components are then reconstructed using the inverse DWT (an illustrative sketch of this step follows the list).

It can be summarised that the input feature set $F_0^7$ is a combination of seven different attributes and can be represented as $F_0^7 = [x(t), r(t), v(t), c(t), w(t), \mathrm{d}w(t)/\mathrm{d}t, \hat{x}(t)]$.
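The sketch below illustrates one possible implementation of the high-frequency feature x̂(t) with PyWavelets: a single-level DWT, the universal threshold, hard thresholding of the detail coefficients per Eq. (4), and inverse reconstruction. The wavelet family ('db4') and the zeroing of the approximation band to discard low frequencies are assumptions, since the text does not name them.

```python
# Sketch of the high-frequency feature x_hat(t); wavelet family and the treatment of the
# approximation band are assumptions, not specified by the paper.
import numpy as np
import pywt

def high_frequency_feature(x: np.ndarray, wavelet: str = "db4") -> np.ndarray:
    cA, cD = pywt.dwt(x, wavelet)                     # approximation / detail coefficients
    sigma = np.median(np.abs(cD)) / 0.6745            # robust noise estimate
    lam = sigma * np.sqrt(2.0 * np.log(len(x)))       # universal threshold
    cD_kept = np.where(np.abs(cD) <= lam, cD, 0.0)    # Eq. (4): zero detail coefficients above lambda
    cA_zero = np.zeros_like(cA)                       # drop low-frequency content (assumption)
    return pywt.idwt(cA_zero, cD_kept, wavelet)[: len(x)]

x_hat = high_frequency_feature(np.random.randn(256))  # placeholder trade-value series
```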
3.2 Kernel Principal Component Analysis (KPCA)
Principal component analysis can be considered as the linear projection of input data onto an orthogonal space where variance is maximised [13], in an attempt to transform the data to lower dimensions. It is well evidenced that components generated through standard PCA carry added noise and indicate no meaningful pattern in the transformed domain [14–16]. In this approach, kernel principal component analysis (KPCA) is proposed to discover localised micro-structure patterns in stock trades over time. KPCA can be understood as the non-linear transformation of input data features of dimension d onto higher dimensions m, for d < m.

The trustworthiness use case is connected to the security use cases named “Giving the Availability and the Functionality of hardware and equipment”, “Keeping the Confidentiality and Integrity of Data”, and “Protection against attacks, vulnerabilities and intrusions”. These three use cases must be performed by the security actor. This connection in Fig. 3 points from the trustworthiness use case to the security use cases. Moreover, the «include» and «extend» connections are also utilized. Using the use case diagram, we model the security of the physical and network layers of IoT systems as shown in the figure below.
Fig. 3. Use case diagram for security modeling the physical and the network layer of IoT systems with UML.
4.3 Modeling IoT Security Systems Using the SysML Language: Requirement Diagram

This section aims at specifying the security requirements of the two first layers of IoT. Security is an important characteristic of an IoT system, yet security requirements cannot be proposed for the whole IoT system at once, for several reasons. To design the IoT security system, we need to model the IoT security system requirements, among which functions such as Availability, Confidentiality, Integrity, Authentication, Authorization, Access Control, Key Management, Lightweight Cryptography, Encryption, Privacy, protection of IoT communication technologies and protocols, Trustworthiness, Reliability, and Intrusion Detection Systems (IDS) can be used. The main aim is to ensure the security of the IoT at two levels. The objective of this subsection is to model the constraints and security requirements related to the physical and the network layer. The next two subsections split the needs to be delivered on the physical and network layers, because each of them has its own security issues and problems. The requirement diagram is the diagram chosen among the SysML diagrams; it targets the requirements of the physical and the network layers, as presented below. The first step is to break down a requirement into several unit requirements. To design a physical layer, several requirements are needed. In other words, in order to model the security of the physical layer we need to consider the following concerns:

• Resource constraints: for resource-constrained/limited devices, Lightweight Cryptography is needed as a solution;
• Availability of sensors: the equipment (such as sensors) of the physical layer must be available and functional at all times;
• Integrity of massive data generation: the data must not be falsified or modified by a third party, because this layer exchanges and generates enormous amounts of sensitive data that will be used by the users; data integrity is therefore essential to deal with attacks;
• CIA security: the CIA pillars (Confidentiality, Integrity, Availability) should be guaranteed in every IoT device/system; cryptography is the best-known solution for addressing them;
• Accuracy: accuracy must be improved to enhance the integrity of this sensitive data; for best accuracy, we need to ensure data integrity for objects;
• Reliability: reliability against attacks, threats and vulnerabilities must be ensured; several techniques and methods, such as AI (Artificial Intelligence), are used to secure these systems;
• Usability: many connected objects are deployed in IoT systems; this requirement must be improved and needs to be considered;
• Resistance: for physical security, the sensors must be discreet, for example in forest fire detection, otherwise the sensor may be stolen; on the other hand, these sensors must not be physically exposed (for example to animals), so a cartographic study of the forest is necessary;
• Performance: given energy limitations, the battery must be charged/rechargeable for a specific period of time (6 months, for example) and must save as much energy as possible; it is very important for IoT devices to keep the battery for a longer duration in applications where recharging is not easy, for example a sensor in a disaster area or in a river [12], or measuring rainfall or wind;
• Trustworthiness: to prove the identity of an IoT system and confirm trust in third parties [13], trustworthiness must be taken into consideration.

Network layer security concerns include: (1) heterogeneity and compatibility, (2) detection of anomalies and intrusions, (3) encryption, and (4) reliability. Among the security requirements needed for modeling the security of the network layer are the following:

• Compatibility: the heterogeneity of IoT systems can be defined as a challenge related to different communication technologies. The heterogeneity of the network layer is due to different communication technologies and different components with different characteristics. Compatibility [11] is a major challenge, because in an IoT system we need to make certain that the different communication technologies can work with each other;
• Network security: the system must detect anomalies and intrusions at the time of occurrence. In other words, an IDS on the gateway layer is the best option for detecting intrusions as they occur;
• Reliability: the network layer is threatened by various attacks due to its communication technologies. The IoT communication technologies must therefore be secured, protected against vulnerabilities, and reliable.
5 Analysis and Discussion

To secure IoT systems, there is no specific IoT architecture that has been accepted as a standard. The aim of modeling with the UML use case diagram is to compare this modelization with the SysML requirement diagram, to discover which is the most suitable and efficient one for modeling IoT security systems (see Table 2). Firstly, security design is weak. Secondly, there is a lack of studies on modeling IoT systems. Thirdly, there is a lack of modeling languages for IoT systems as well as for the security context. The physical and network layers have been modeled to ensure the security of IoT systems. Table 2 shows a comparison of the different modeling languages, including UML, SysML and ThingML, based on four points [2], while Table 3 below shows the advantages and disadvantages of the UML and SysML languages. Tables 2 and 3 show the limitations of UML and the effectiveness of the SysML language.

Table 2. Comparison of the different modeling languages.

                                   UML   SysML   ThingML
UML extension                      ✖     ✔       ✖
Extension specific for IoT         ✖     ✖       ✔
System security concerns model     ✖     ✖       ✖
Security requirements modeling     ✖     ✖       ✖
Table 3. Advantages and drawbacks of the UML and SysML languages for modeling IoT security systems.

UML
  Advantages: the great number of resources available [5]; documenting and visualizing software systems design.
  Limitations: not suitable for modeling complex systems; limited for modeling hardware, architecture, and physical constraints such as battery life and energy consumption; there is no UML extension for modeling IoT security concerns in a visual representation [2]; does not model system requirements.

SysML
  Advantages: suitable for designing complex systems; can be used for modeling physical constraints; simpler for expressing requirements/needs; possibility to model security needs and hardware architecture; possibility to model the link between hardware and software; possibility to model the security requirements for the IoT domain.
  Limitations: the requirements are presented as text.
For IoT systems, there is no standard or suitably representative language. From Table 2, we can say that the SysML language is more suitable for modeling IoT security systems than the UML language. From Table 3, to meet our goals in terms of design, we considered the SysML profile. IoT challenges include resource constraints, compatibility and heterogeneity. For this reason, SysML has been chosen due to its efficiency in modeling complex systems such as IoT with high security constraints. Based on the figure in the fourth section, we can say that the UML language models neither IoT resource constraints nor IoT security system requirements. In other words, constraints and performance are not modeled by the use case diagram; OCL (Object Constraint Language) is another language for expressing constraints. Based on Tables 2 and 3, we can say that SysML is more suitable for modeling IoT security systems than the UML language. The physical and the network layers are the layers that need the most security in the IoT environment [11]. In this research, we try to model these two layers using two languages (UML and SysML) and two diagrams (the use case and requirement diagrams). As a result, SysML is the best way to visualize the design of the IoT system; it also enables trustworthiness, performance, security, reliability and accuracy for the proposed system, as well as the possibility to model the security requirements for the IoT domain. It aims at describing interface requirements. The main aim is to enhance security on these two layers as well as to justify the choice of the SysML language, and more precisely the requirement diagram, by making a detailed and comprehensive comparison and analysis between them.
6 Conclusion

Nowadays, IoT has become widely used in almost all domains, ranging from personal use to professional applications. “Design for security” consists of considering security from the design phase to the implementation phase. To model security needs, SysML is the best and most suitable choice. The objective of this work is to obtain the best modeling of IoT security systems by identifying the best language and the most efficient diagram for the security case. Therefore, this article provides a best effort to investigate a detailed and comprehensive modelization of IoT security systems. We have compared the two modeling languages. This study focused on security modeling in these layers. The results show the effectiveness of SysML for modeling IoT security: the SysML language is the most suitable for modeling IoT security systems. For this reason, we chose SysML as our modeling language. To conclude, the idea of this paper is a comparative study between UML and SysML for the IoT domain. SysML is chosen as the best language and, within SysML, the requirements diagram is the best in terms of security. The layers to model are the physical and network layers.
References

1. https://uml.developpez.com/cours/Modelisation-SysML/. Accessed 2022/04/05
2. Robles-Ramirez, D.A., Escamilla-Ambrosio, P.J., Tryfonas, T.: IoTsec: UML extension for Internet of Things systems security modeling. In: 2017 International Conference on Mechatronics, Electronics and Automotive Engineering (ICMEAE), pp. 151–156. IEEE (2017)
3. Friedenthal, S., Moore, A., Steiner, R.: A Practical Guide to SysML: The Systems Modeling Language. Morgan Kaufmann (2014)
4. Belloir, N., Bruel, J.M., Hoang, N., Pham, C.: Utilisation de SysML pour la modélisation des réseaux de capteurs. In: LMO, pp. 169–184 (2008)
5. Geller, M., Meneses, A.A.: Modeling IoT systems with UML: a case study for monitoring and predicting power consumption (2021)
6. Thramboulidis, K., Christoulakis, F.: UML4IoT—A UML-based approach to exploit IoT in cyber-physical manufacturing systems. Comput. Ind. 82, 259–272 (2016)
7. Reggio, G.: A UML-based proposal for IoT system requirements specification. In: Proceedings of the 10th International Workshop on Modeling in Software Engineering, pp. 9–16 (2018)
8. Costa, B., Pires, P.F., Delicato, F.C., Li, W., Zomaya, A.Y.: Design and analysis of IoT applications: a model-driven approach. In: IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, pp. 392–399. Auckland (2016)
9. Hind, M., Noura, O., Amine, K.M., Sanae, M.: Internet of Things: classification of attacks using CTM method. In: Proceedings of the 3rd International Conference on Networking, Information Systems & Security, pp. 1–5 (2020)
10. Meziane, H., Ouerdi, N., Kasmi, M.A., Mazouz, S.: Classifying security attacks in IoT using CTM method. In: Ben Ahmed, M., Mellouli, S., Braganca, L., Anouar Abdelhakim, B., Bernadetta, K.A. (eds.) Emerging Trends in ICT for Sustainable Development. ASTI, pp. 307–315. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-53440-0_32
11. Meziane, H., Ouerdi, N.: A study of modelling IoT security systems with Unified Modelling Language (UML). Int. J. Adv. Comput. Sci. Appl. (IJACSA) 13(11) (2022)
12. Jha, D.N., et al.: IoTSim-Edge: a simulation framework for modeling the behavior of Internet of Things and edge computing environments. Softw. Pract. Exp. 50(6), 844–867 (2020)
13. Akram, H., Konstantas, D., Mahyoub, M.: A comprehensive IoT attacks survey based on a building-blocked reference model. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 9(3) (2018)
Improving the Routing Process in SDN Using a Combination of the Evidence Theory and ML

Ali El Kamel(B), Hamdi Eltaief, and Habib Youssef

University of Sousse, Prince Research Lab, ISITCOM, Hammam Sousse, Tunisia
[email protected]
Abstract. Today, Denial of Service (DoS) attacks are recognized as a major jeopardy for network performance and generally lead to distrusted paths. Henceforth, trustworthiness has gained more attention from researchers, who focus on protecting communications by introducing new secure routing approaches. Particularly in SDN, trust-driven secure routing is recognized as a promising solution for providing security efficiently in future networks. This paper proposes a trust-based routing approach as an alternative to Shortest Path First (SPF). It aims to forward packets through trusted routes in SDN. It is based on the classification of each switch as either a benign, malicious or uncertain node according to a trustworthiness score computed by aggregating multi-side recommendations using the Dempster-Shafer (D-S) combination rule. The classification is achieved by a Multi-class Support Vector Machine (MC-SVM) model, which is used to build a trust graph from which the most trusted path joining a source to a destination is found using a trust-driven cost. Simulation results and analysis show that the proposed scheme performs better than the SPF-based baseline routing process in terms of end-to-end throughput and end-to-end delay.
Keywords: SDN · Security · Dempster-Shafer (D-S) · Multi-class SVM

1 Introduction
Today, more attention needs to be paid to the security issue in multiple network fields, particularly in routing. Obviously, the common security strategies based on end-point authentication and data encryption have proved their efficiency in defending conventional networks. However, they seem to be unsuitable for application in SDN because of 1) the emergence of real-time applications with hard resource and time constraints, 2) the large attack vector mainly caused by the rifeness of SDN, and 3) the large processing latency and the high overhead introduced by the common security techniques.
Therefore, an alternative approach consists of moving from security that aims at protecting data to security that focuses on protecting nodes. One of the most promising solutions to protect nodes is to trust them. Trustworthiness consists of computing a trust degree which reflects the behavior, reliability and competency of an individual node. The trust degree can be used to identify malicious nodes so that routing decisions can be promptly taken to protect the network efficiently. This can be achieved through an intelligent classification of nodes based on their trust degrees. Many classification techniques have been developed in recent years, among them the Support Vector Machine (SVM).

This paper puts forward a trust-based routing approach in SDN that aims to establish secure routes by dynamically identifying malicious switches and then avoiding them during the routing process. Malicious switches are identified using the D-S evidence theory and the SVM algorithm. Indeed, a Multi-class SVM (MC-SVM) model is used to classify a switch as benign, malicious or uncertain according to a per-switch global trust vector. The global trust vector consists of a set of trust degrees computed through an integration of direct and indirect trust values. Based on the classes of the switches, the controller establishes a trust graph in which a new trust-driven cost is assigned to each link and used to find the most trusted path joining the end-points, by application of a new algorithm denoted Trust Path First (TPF).

The rest of this paper is organized as follows. Section 2 presents related work on trust-based security. Section 3 details the trust-based routing scheme and all related techniques and processes. Section 4 outlines the computation of the global trust vector. Section 5 describes the multi-class SVM-based classification model and how it is used to classify switches. Section 6 presents the establishment of the trust graph and Sect. 7 presents the Trust Path First algorithm. Finally, Sect. 8 shows simulation results before presenting concluding remarks and future work.
2 Related Work
Authors in [1] assume that the main cause of DDoS attacks in SDN is compromised hosts. They also reveal that compromised hosts exploit the vulnerabilities of OpenFlow to propagate their attacks. To deal with this issue, they propose a time- and space-efficient solution for the identification of these compromised hosts. They prove through simulation that the solution consumes fewer computational resources and less space and does not require any special equipment. Detecting compromised hosts can be achieved through a Trust Management Scheme (TMS). Trust management can be data-based or node-based. To the best of our knowledge, most work has focused on data-based TMS, while little research has addressed mitigating DDoS attacks through the use of node-based TMS. For example, authors in [2] suggest an efficient TMS for SDN applications. It helps the controller to authenticate network applications and
to set authorisation permissions that inhibit manipulation of network resources. Authors in [3] propose TRUFL, a distributed mechanism for establishing and verifying host trust in SDN. The proposed framework achieves faster transfer rates and reduced latency compared to centralized trust management. Authors in [4] propose a trust establishment framework for SDN. The main idea is to establish direct trust between the OpenFlow SDN controller and the applications. Finally, authors in [5] propose a novel TMS for SDN applications. The proposed solution evaluates applications’ trustworthiness based on their impact on several network performance parameters. Trust values are then used to take decisions on managing and selecting authorized applications in SDN. Li et al. [6] propose an extension to the AODV (ad hoc on-demand distance vector) protocol, widely used for routing in WSNs. They define a trust-based recommendation model of trusted routing nodes which helps to improve the security of the routing environment. In [7], Lu et al. propose a secure routing scheme based on a quantification of the behaviors of the routing nodes. In [8], authors propose a QoS-aware routing algorithm which can evaluate the trustworthiness of the forwarding nodes. The algorithm integrates a direct QoS trust value obtained from adjacent routing nodes and the indirect trust value of the 2-hop neighbor routing nodes, computed based on the multi-hop transitive rule. In [9], researchers use the “reputation system” model to schedule routing paths in WSNs. This system evaluates the time-series reputation of other nodes to make routing selections. Mainly, it requires building a reputation table per node to maintain the historical behaviors. Unfortunately, this cannot guarantee the real-time security of WSNs.
3 Overview of the Trust-Based Routing Scheme
The proposed scheme aims at establishing a trusted path joining a source to a destination. A trusted path is a path which does not include malicious switches. A switch is said to be malicious if it is classified as a misbehaving node through an evaluation of its trustworthiness. Trustworthiness is computed with regard to many-side beliefs which are combined using the Dempster-Shafer (D-S) evidence theory. Indeed, the D-S theory is used to compute a per-switch Global Trust (GT) vector through the combination of direct and indirect trust values. The direct trust value refers to the belief of the controller about the switch, while the indirect trust value is acquired from an integration of many-side recommendations solicited from all the switch’s neighbors. Both the direct and indirect trust values depend on several trust factors collected from the switch as well as from its neighbors. The per-switch GT vector is used to label switches as either Benign ({B}), Malicious ({M}) or Uncertain ({U}). Classification is based on the One-Against-All (OAA) Multi-Class Support Vector Machine (MC-SVM) and solved through a Binary Decision Tree (BDT) model.
Beyond the classification of switches, the controller establishes a direct trust graph. Then, a trust-driven cost is computed and assigned to each link. Finally, a Trust Path First (TPF) algorithm is applied to find the most trusted path joining a source to a destination.
4 Global Trust (GT) Vector Computation
The per-switch Global Trust (GT) vector is a 3-D vector of real values fed to a multi-class SVM algorithm in order to label a switch as belonging to the B, M or U class. Each value in the GT vector is based on two independent degrees: the Direct Trust (DT) degree and the Indirect Trust (IT) degree. The DT describes the belief of the controller, while the IT degree is obtained through an aggregation of many-side recommendations collected from the switch’s neighbors. Aggregation is based on the Dempster-Shafer (D-S) evidence theory. The computation process consists of the following steps: 1) trust factors extraction and 2) Global Trust vector computation.

4.1 Trust Factors Extraction
A network attack results in significant changes in the node behavior. In SDN, various factors play a significant role in reporting abnormal behaviors when a flooding attack against the controller or a memory exhaustion attack against a switch is taking place (an illustrative computation of such factors is sketched after this list).

1. Memory Size Factor (MSF): the factor MSF_i rates the changes of the flow-table size in the switch s_i.
2. Incoming Requests Factor (IRF): the factor IRF_i rates the changes of the number of incoming new flows in the switch s_i.
3. PacketIn Messages Factor (PMF): the factor PMF_i rates the changes of the number of PacketIn messages transmitted by the switch s_i to its assigned controller.
4. Data Forwarding Factor (DFF): the factor DFF_{i,j} rates the changes of the number of bytes successfully transmitted by the switch s_i to another switch s_j.
5. Successful Packet Forwarding Factor (SPF): the factor SPF_{i,j} rates the ratio of packets successfully transmitted by the switch s_i to the switch s_j.
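The paper describes these factors only qualitatively, so the sketch below is purely illustrative: it assumes a simple normalized rate of change between two controller polling intervals, which is not a formula given by the authors.

```python
# Illustrative only: an assumed normalized rate-of-change used to "rate the changes"
# of each monitored quantity; the paper does not specify the exact formula.
def change_rate(current: float, previous: float, eps: float = 1e-9) -> float:
    return abs(current - previous) / (abs(previous) + eps)

# Hypothetical controller statistics for a switch s_j over two polling intervals.
msf = change_rate(current=910, previous=700)    # flow-table size
irf = change_rate(current=450, previous=120)    # number of incoming new flows
pmf = change_rate(current=800, previous=150)    # PacketIn messages sent to the controller
# DFF_{i,j} and SPF_{i,j} would be computed analogously per neighbouring switch pair.
```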
4.2 Computation of the GT Vector
Let $GT_{i,j}$ be the global trust vector associated to switch $s_j$ according to a controller $c_i$ and regarding the various trust hypotheses. This vector is defined as follows:

$$GT_{i,j} = \bigl(m_{i,j}(B),\, m_{i,j}(M),\, m_{i,j}(U)\bigr) \qquad (1)$$

where $m_{i,j}(A)$ (for $A \in \{B, M, U\}$) defines the global belief of the controller $c_i$ that the switch $s_j$ should be classified as $A$.
sum of both a direct trust degree DTi,j(A) and an aggregated indirect trust degree ITi,j(A) as follows:

$m_{i,j}(A) = \alpha \, DT_{i,j}(A) + (1 - \alpha) \, IT_{i,j}(A)$ (2)
α ∈ [0, 1] is a confidence factor which describes how confident the switch sj is regarding its controller.

Direct Trust Vector. The direct trust degree on a switch sj, DTi,j(A), is computed and updated periodically by the controller ci through a weighted sum of a set of trust factors: MSFj, IRFj and PMFj (Eq. 3).

$DT_{i,j}(A) = f(MSF_j, IRF_j, PMF_j)$ (3)

The function f is a weighted sum function chosen in advance according to the specific assignments of the network.

Indirect Trust Vector. The indirect trust vector of the switch sj is computed based on multi-side recommendations obtained from sj's neighbors. In order to avoid trust recycle recursion and decrease the network communication payload, the recommendation values are confined to direct one-hop neighbors.

Computation of the One-hop Indirect Trust Vector. Let V(sj) be the set of neighbors of the switch sj. If su ∈ V(sj), the indirect trust vector of the controller on switch sj obtained through a recommendation of the switch su is defined as follows (Eq. 4):

$IT^{u}_{i,j}(A) = DT_{i,u}(A) \times DT_{u,j}(A)$ (4)

DTu,j(A) defines the direct trust vector that su has on sj and is based on a weighted sum of the factors DFFu,j and SPFu,j.

Aggregation of Multiple Indirect Trust Vectors. Given a set V(sj) of neighbors of the switch sj, all indirect trust vectors are combined through the D-S rule in order to avoid false recommendations from any neighbor switch which may be compromised or may itself be a malicious switch. Firstly, the consistent intensity Iu,v is computed between all switches su and sv ∈ V(sj) as follows (Eq. 5):

$I_{u,v} = 1 - \frac{1}{2}\left(IT^{u}_{i,j} - IT^{v}_{i,j}\right)^{2}$ (5)

Next, the total consistent intensity of the trust recommendation of switch su on switch sj is computed as follows:

$I_{u} = \frac{\sum_{v \in V(s_j),\, v \neq u} I_{u,v}}{\max_{w \in V(s_j)} \sum_{v \in V(s_j),\, v \neq w} I_{w,v}}$ (6)
Iu is used to amend the BPA (the mass function) of each trust hypothesis as follows:

$IT^{u}_{i,j}(A) = I_{u} \times IT^{u}_{i,j}(A)$ (7)

Finally, the indirect trust value associated to each trust hypothesis A ∈ {B, M, U} is expressed as follows:

$IT_{i,j}(A) = \frac{\sum_{u \in V(s_j)} IT^{u}_{i,j}(A)}{\sum_{u \in V(s_j)} I_{u}}$ (8)
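To make the aggregation in Eqs. 4–8 concrete, the following is a minimal NumPy sketch, assuming the trust vectors are 3-element arrays over the hypotheses (B, M, U) and that the squared difference in Eq. 5 is summed over the hypotheses; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def indirect_trust(DT_i, DT_to_j):
    """Aggregate one-hop recommendations into IT_{i,j} (Eqs. 4-8).

    DT_i: dict u -> DT_{i,u}, the controller's direct-trust vector for neighbor u.
    DT_to_j: dict u -> DT_{u,j}, each neighbor's direct-trust vector for s_j.
    Every vector is a NumPy array over the hypotheses (B, M, U).
    """
    neighbors = list(DT_to_j)
    # Eq. 4: one-hop indirect trust (element-wise product assumed)
    IT = {u: DT_i[u] * DT_to_j[u] for u in neighbors}

    # Eq. 5: pairwise consistent intensity between two recommendations
    def intensity(u, v):
        return 1.0 - 0.5 * float(np.sum((IT[u] - IT[v]) ** 2))

    # Eq. 6: total consistent intensity, normalised by the best neighbor
    raw = {u: sum(intensity(u, v) for v in neighbors if v != u) for u in neighbors}
    denom = max(raw.values()) or 1.0
    I = {u: raw[u] / denom for u in neighbors}

    # Eq. 7: amend each recommendation by its intensity; Eq. 8: normalise
    amended = sum(I[u] * IT[u] for u in neighbors)
    return amended / max(sum(I.values()), 1e-12)

def global_trust(DT_ij, IT_ij, alpha=0.5):
    """Eq. 2: m_{i,j} = alpha * DT_{i,j} + (1 - alpha) * IT_{i,j}."""
    return alpha * DT_ij + (1 - alpha) * IT_ij
```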
5 Multi-class SVM Classification Model
The problem in this paper is defined as a multi-class classification problem where a training set of N samples is used. Each sample is represented as a tuple (xi, yi) (i = 1, ..., N), where xi = (xi1, xi2, xi3) ∈ R3 corresponds to the trust vector of the i-th sample and yi ∈ {B, U, M} is its expected class. The 3-class classification problem can be solved using the Binary Decision Tree (BDT) [11]. BDT offers an impressive improvement in categorization speed even in problems where a large number of classes is addressed. BDT consists of arranging multiple SVMs in a binary tree structure having M − 1 nodes, where M is the number of classes. Each SVM in the tree is a One-Against-All (OAA) binary classification problem and is trained using two of the classes. In our proposed approach, only two SVM nodes need to be trained in order to classify switches.
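The sketch below shows one possible arrangement of the two OAA SVM nodes in a binary tree, using scikit-learn. The gating order (first separating M from {B, U}, then B from U) and the hyper-parameters are illustrative assumptions, not prescribed by the paper.

```python
import numpy as np
from sklearn.svm import SVC

class BDTClassifier:
    """Two SVM nodes in a binary decision tree for the classes B, M, U."""

    def __init__(self, kernel="rbf", C=100, gamma=1):
        self.node1 = SVC(kernel=kernel, C=C, gamma=gamma)  # M vs {B, U}
        self.node2 = SVC(kernel=kernel, C=C, gamma=gamma)  # B vs U

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.node1.fit(X, (y == "M").astype(int))
        rest = y != "M"
        self.node2.fit(X[rest], (y[rest] == "B").astype(int))
        return self

    def predict(self, X):
        X = np.asarray(X)
        out = np.empty(len(X), dtype=object)
        is_m = self.node1.predict(X).astype(bool)
        out[is_m] = "M"
        if (~is_m).any():
            is_b = self.node2.predict(X[~is_m]).astype(bool)
            out[~is_m] = np.where(is_b, "B", "U")
        return out
```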
6 Trust Graph Establishment
Beyond each classification process, the controller establishes and maintains a dynamic "trust graph". A trust graph consists of a set of nodes where each node is marked with B, M or U according to the class of the corresponding switch. Then, the controller assigns to each edge eij connecting the switch si to the switch sj a cost cij, which is defined as follows:
– cij = 1, if sj is a B-class switch (whatever the class of si is).
– cij = 0, if sj is an M-class switch (whatever the class of si is).
– cij = 1/αi, if sj is a U-class switch, where αi is the number of all uncertain switches belonging to the set of neighbors of si.
Given a route R = (s1, ..., sn) of n switches that joins a source to a destination, we define the Trust Score (T-score) of the route R as follows:

$T_{R} = \sum_{(s_i, s_j) \in R^2} c_{ij}$ (9)
The main objective is to find the most trustworthy path that joins a source to a destination through a maximization of TR. We formulate the objective function as follows:

$R^{*} = \max_{R_k} T_{R_k} = \max_{R_k} \sum_{(s_i, s_j) \in R_k^2} c_{ij}$ (10)

7 TPF: Trusted Path First Algorithm
The proposed algorithm is denoted TPF, for Trusted Path First. It aims to find a path which maximizes the Trust Score defined in Eq. 9. It takes a trust graph as input and returns the most trusted route. The steps of the proposed algorithm are listed below; a sketch of the procedure follows the list.
– Step 1, Initialization: s0 is the ingress switch and V is a 1-D array holding all the visited switches.
– Step 2: For every switch si except the neighbors of s0, set c0i = 0.
– Step 3: Find the switch sk which is not visited and has the maximum cost.
– Step 4: Add switch sk to V since it has now been visited.
– Step 5: Update the costs for all the neighbors of sk which are not yet in V, using the following equation: ckj = max(c0j, c0k × ckj).
– Step 6: Repeat Steps 3 to 5 until all the switches are in V (all the switches have been visited).
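The following is a minimal Python sketch of the procedure. It assumes the relaxation in Step 5 updates the source-to-j cost (a max-product, Dijkstra-like rule) and that trust-driven edge costs are supplied for both directions of each link; all names are illustrative.

```python
def tpf(neighbors, edge_cost, s0):
    """Trusted Path First: greedy max-product relaxation from the ingress s0.

    neighbors: dict node -> iterable of adjacent nodes.
    edge_cost: dict (i, j) -> trust-driven cost c_ij in [0, 1].
    Returns the best source-to-node trust cost and a predecessor map,
    from which the most trusted path to any destination can be rebuilt.
    """
    # Steps 1-2: initialisation; non-neighbors of s0 start with cost 0.
    cost = {n: 0.0 for n in neighbors}
    prev = {n: None for n in neighbors}
    for j in neighbors[s0]:
        cost[j] = edge_cost[(s0, j)]
        prev[j] = s0
    visited = {s0}

    # Steps 3-6: repeatedly pick the unvisited switch with maximum cost
    # and relax its neighbors with c_0j = max(c_0j, c_0k * c_kj).
    while len(visited) < len(neighbors):
        k = max((n for n in neighbors if n not in visited), key=lambda n: cost[n])
        visited.add(k)
        for j in neighbors[k]:
            if j not in visited and cost[k] * edge_cost[(k, j)] > cost[j]:
                cost[j] = cost[k] * edge_cost[(k, j)]
                prev[j] = k
    return cost, prev
```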
8 Simulation Results

8.1 Evaluation of the Classification Model
We consider a training dataset of 5000 samples obtained from the Kaggle repository [13]. In this dataset, 75% of the samples are used for training while 25% (i.e., 1250 samples) are used to validate the model. We evaluate the classification through the following metrics: Precision, Recall and F1-score. The hyper-parameters of the MC-SVM model are shown in Table 1. Table 2-(a) presents the performance metrics of the classification model. After being trained, the model offers a precision of 0.93 when classifying malicious nodes, 0.90 for uncertain nodes and 0.84 for benign nodes. The confusion matrix of the classification model is presented in Table 2-(b). The model delivers acceptable TP rates of 67.6%, 96.75% and 89.02% for the M-class, U-class and B-class, respectively. Finally, Fig. 1 illustrates the performance of the classification using various kernel functions. The classification is optimal for C = 100, with the RBF kernel and γ = 1. Hereby, the F1-score reaches 0.90.
Fig. 1. Evaluation of the SVM-based learning model

Table 1. Hyperparameters of the MC-SVM
Parameter | Description | Value
C | Soft-margin C-value | 100
γ | The gamma value | 1
Kernel | The kernel function | Linear, Poly, RBF
Table 2. (a): Performance of the model with kernel function RBF, γ = 1 and C = 100; (b): Confusion matrix for RBF, γ = 1 and C = 100

(a)
Class | Precision | Recall | F1-score
M | 0.93 | 0.68 | 0.78
U | 0.90 | 0.97 | 0.93
B | 0.84 | 0.89 | 0.87

(b)
True \ Predicted | M | U | B
M (266) | 67.6% (180) | 32.4% (86) | 0%
U (893) | 1.58% (14) | 96.75% (864) | 1.67% (15)
B (91) | 10.98% (10) | 0% | 89.02% (81)

8.2 Evaluation of the TPF Algorithm
We consider the topology described in Fig. 2-(a). All algorithms are implemented in Python and integrated in the Ryu controller [14]. In this simulation, one controller, 10 switches and 6 hosts are considered. The controller is able to handle PacketIn messages at a rate of 80 kfps. Hosts h3 and h4 are expected to run TCP-SYN flooding attacks on s4 and s8, respectively. Each host is initially allowed to generate 15 kfps using the Multi-Generator (MGEN) Network Test Tool [18] toward the target server h5 at a rate of 150 kbps. Packets are forwarded through the shortest path using classical routing. Attacks are carried out using the hping3 network tool [15] for a duration of 30 s. The data plane is simulated using Mininet [16] and all switches use OpenVSwitch [17]. Each simulation lasts 100 s and is repeated 30 times.
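For readers who want to reproduce a comparable setup, the fragment below is a minimal Mininet sketch of a data plane attached to a remote (e.g., Ryu) controller. The topology, the controller address 127.0.0.1:6633 and all names are illustrative assumptions, not the exact configuration of Fig. 2-(a).

```python
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.topo import Topo


class SmallTopo(Topo):
    """Illustrative topology: a few OVS switches in a chain with hosts attached."""
    def build(self):
        switches = [self.addSwitch(f"s{i}") for i in range(1, 5)]
        hosts = [self.addHost(f"h{i}") for i in range(6)]
        for a, b in zip(switches, switches[1:]):
            self.addLink(a, b)
        for i, h in enumerate(hosts):
            self.addLink(h, switches[i % len(switches)])


if __name__ == "__main__":
    net = Mininet(topo=SmallTopo(), switch=OVSSwitch,
                  controller=lambda name: RemoteController(name, ip="127.0.0.1",
                                                           port=6633))
    net.start()
    net.pingAll()   # basic reachability check before launching MGEN/hping3 traffic
    net.stop()
```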
The first simulation analyzes the efficiency of the TPF algorithm regarding the throughput offered to h0 during its connection with h5. TPF is compared to the Shortest Path First (SPF) algorithm. We consider the case of a single attack point (denoted 1-AP) and the case of multiple attack points (2-AP). In the latter case, h3 starts attacking s4 at t = 30 s; then, at t = 60 s, h4 starts attacking the switch s8 in the same way as h3. The rate of new-flow generation at h3 and h4 goes from 15 kfps to 40 kfps. Results are shown in Fig. 3-(a). When the attack starts, the throughput of h0 drops by more than 10 kbps when SPF is used, whilst the throughput is only slightly affected by the attack traffic when TPF is used. This is mainly due to the fact that SPF relies on link states but does not take the current states of nodes into account. Hence, a switch may become overloaded under a flooding attack, but the controller will not be aware of it since no changes on links are triggered. TPF involves both link and node states, so it is able to perform better than SPF. As the number of attack points increases, the throughput offered by SPF-based routing is more affected compared to TPF-based routing. As shown in Fig. 3-(b), the throughput offered by TPF remains close to 140 kbps while it goes under 120 kbps for SPF when multiple attacks happen (2-AP).
Fig. 2. (a) Simulation topology; (b) End-to-end delay for different scenarios (1-AP, 2-AP)
Fig. 3. (a) Throughput offered to h0 when only s4 is attacked (b) Throughput offered to h0 when both s4 and s8 are attacked
The second simulation analyzes the capacity of TPF to ensure efficient packet forwarding for delay-sensitive applications. Results are shown in Fig. 2-(b). TPF ensures a reduced end-to-end delay for packets even if multiple attacks happen simultaneously. At 100 s, the disparity between the delays of SPF and TPF exceeds 10 ms for the 2-AP scenario and reaches 7.4 ms for the single attack point scenario.
9 Conclusion and Future Work
This paper aims to ensure secure routing within SDN-capable networks. Mainly, it proposes a node-driven security mechanism which consists of computing a set of trust degrees and using them to classify each switch as Benign, Malicious or Uncertain. Classification is based on a multi-class SVM model and solved through a One-Against-All (OAA) algorithm. The classification of switches leads to a trust topology of the network where link weights are updated according to the classes of the corresponding switches. Hereby, the TPF algorithm is proposed. As shown in the results, the proposed approach outperforms SPF-based routing when single-point or multi-point attacks happen. More scenarios are currently being implemented to assert the efficiency of the proposed approach even in Very Large-Scale Networks (VLSN). As future work, the proposed approach will be extended to multi-controller SDN, and a TPF-like algorithm will be defined to support secure inter-domain routing.
References
1. Ali, S., Alvi, M.K., Faizullah, S., Khan, M.A., Alshanqiti, A., Khan, I.: Detecting DDoS attack on SDN due to vulnerabilities in OpenFlow. In: International Conference on Advances in the Emerging Computing Technologies (AECT) 2020, pp. 1–6 (2019)
2. Lawal, A., et al.: A trust management framework for network applications within an SDN environment. In: 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 93–98 (2017)
3. Chowdhary, A., et al.: TRUFL: distributed trust management framework in SDN. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–6 (2019)
4. Burikova, S., et al.: A trust management framework for Software Defined Networks-based Internet of Things. In: 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0325–0331 (2019)
5. Yao, Z., Yan, Z.: A trust management framework for software-defined network applications. Concurrency Computat. Pract. Exper. (2020)
6. Li, X., Lyu, M.R., Liu, J.: A trust model based routing protocol for secure ad hoc networks. In: Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, MT, USA, 6–13 March 2004, vol. 2, pp. 1286–1295 (2004)
7. Lu, Z., Sagduyu, Y.E., Li, J.H.: Securing the backpressure algorithm for wireless networks. IEEE Trans. Mob. Comput. 16, 1136–1148 (2017)
8. Sirisala, N., Bindu, C.S.: Recommendations based QoS trust aggregation and routing in mobile adhoc networks. Int. J. Commun. Netw. Inf. Secur. 8, 215 (2016)
9. Venkataraman, R., Moeller, S., Krishnamachari, B., Rao, T.R.: Trust-based backpressure routing in wireless sensor networks. Int. J. Sens. Netw. 17, 27–39 (2015)
10. Gordon, J., Shortliffe, E.H.: The Dempster-Shafer theory of evidence (1990)
11. Meshram, A., Gupta, R., Sharma, S.: Advanced probabilistic binary decision tree using SVM for large class problem
12. Dijkstra, E.W.: A Short Introduction to the Art of Programming (1971)
13. Ahuja, N., Singal, G., Mukhopadhyay, D.: DDOS attack SDN Dataset, Version 1, published 27-09-2020
14. Ryu controller: https://ryu.readthedocs.io/en/latest/getting_started.html. Accessed 1 Oct 2021
15. Hping3: http://linux.die.net/man/8/hping3/. Accessed 3 Oct 2021
16. Mininet Overview: http://mininet.org/overview/. Accessed 2 Oct 2021
17. Openvswitch 2.5 documentation: https://www.openvswitch.org/support/dist-docs-2.5/. Accessed 2 Oct 2021
18. Multi-Generator (MGEN) Network Test Tool, U.S. Naval Research Laboratory. https://www.nrl.navy.mil/Our-Work/Areas-of-Research/Information-Technology/NCS/MGEN/
19. PCL: IBM Packet Capture Library. IBM/AIX Documentation
GANASUNet: An Efficient Convolutional Neural Architecture for Segmenting Iron Ore Images

Ada Cristina França da Silva1 and Omar Andres Carmona Cortes2(B)

1 Programa de Pós-Graduação em Engenharia da Computação e Sistemas (PECS), Universidade Estadual do Maranhão (UEMA), São Luis, MA, Brazil
2 Departamento de Computação (DComp), Instituto Federal do Maranhão (IFMA), São Luis, MA, Brazil
[email protected]
Abstract. Iron ore segmentation faces the challenge of segmenting different types of ore in the same area; the detection and segmentation of iron ore are used to analyze the material quality and optimize the plant processing. This paper presents a UNet-based Convolutional Neural Network (CNN) optimized by a technique called Neural Architecture Search (NAS) to segment fine iron ore regions. The images were collected from an iron ore plant, from which it was possible to obtain a dataset composed of 688 images and their segmentation labels. The results show that the optimized UNet-based architecture achieved 80% of Intersect Over Union (IoU), against 75% for UNet without optimization and 78% for DeepLabV3+, respectively.

Keywords: UNet-based · Segmentation · Iron ore · Computer Vision · CNN

1 Introduction
Iron ore segmentation is key to measuring the granulometry distribution in a production plant. The monitoring process is fundamental to analyzing the material quality in real time, and the granularity variable is valuable for making decisions that optimize the plant's results. Some processes in iron ore plants are still based on visual estimation; however, machine learning algorithms can improve this identification process in speed and certainty, especially because segmenting this material when a variety of ore types appear in the same image, with different textures and formats, is a significant challenge. In this context, classic image processing techniques have been deeply explored to segment ores (e.g., Watershed segmentation or threshold-based methods). On the one hand, these approaches present significant problems with over-segmentation. On the other hand, deep learning segmentation-based techniques are less sensitive to noise and faster, presenting results even in real-time scenarios. Taking deep learning into account, some architectures such as UNet,
DeepLabV3, and MaskRCNN (so-called Convolutional Neural Networks - CNNs) have provided satisfactory results for different purposes using images. Deciding on and building a deep learning model to segment images requires intensive experimentation and in-depth knowledge to design specific architectures for each problem. Based on these difficulties, some strategies in AutoML can help seek a suitable solution. In this context, Neural Architecture Search (NAS) is a subfield that involves seeking the parameters and the neural architecture structure with search strategies. There are different possibilities for searching the space to find a better architecture, starting with a useful backbone with which the model can handle generic shapes, allowing fewer parameters during the search process. In this work, we propose a search mechanism to find an efficient neural architecture capable of segmenting fine iron ore in different images in real time. A comparison of our architecture with some state-of-the-art deep learning models is shown using a metric named Intersect Over Union (IoU) [10], which is the proper metric to deal with different segmentation scales. Additionally, we used the UNet as a backbone; then, we performed the search for a complementary CNN using Genetic Algorithms, leading us to the GANASUNet model.
2 Related Work
Regarding the mining field, image processing algorithms are essential tools. One of the most famous ones is the watershed algorithm [12], which has been used frequently to separate iron ore particles; however, it presents a critical issue: over-segmentation. Deep learning models have been investigated in this context, especially those related to convolutional neural networks (CNNs). Next, we present some deep-learning approaches in the mining field, covering works on iron ore segmentation and neural architecture search.

The distribution of iron ore, the so-called granulometry analysis, is an essential indicator for decision-making in real time. This analysis can be done at a few steps of the process, such as on conveyor belts. In the article by Liu et al. [9], traditional feature extraction methods were used for pre-processing the images, and the UNet and ResNet segmentation networks were used only to identify the contour of the ore grains. The results were compared with traditional methods using three metrics: Segmentation Precision (SA), Over-Segmentation (OS), and Under-Segmentation (US). As a result, it is possible to identify that the proposed technique had a mean SA of 0.94 against 0.64 for Watershed's algorithm. Additionally, the US values of the proposed algorithm achieved a value of 0.10, while Watershed had a result of 0.34.

Chen et al. [3] proposed a method of segmenting and tracking rocks in videos using two convolutional networks: Mask R-CNN and Deep Sort. The Mask R-CNN model has been improved by employing an ArcFace-based loss function to improve feature recognition. With a database of 180 images, data augmentation strategies were applied to increase the base to 900 images. The experiments showed a result of 84.27% of F1-Score for the improved model and 79.99% for the standard model, an increase of 4.28%.
To identify and calculate pellet iron ore granulometry, Duan et al. [6] improved a UNet to develop a lighter neural network called lightweight UNet, with fewer parameters and the addition of normalization layers to reduce computational time. The experimental results showed that the model needs 1.5 to 3 s to process a frame with 256 × 256 pixels, an acceptable computational time to use the model in real time. Furthermore, the proposal was compared against the canonical UNet using the DICE metric, reaching a value of 0.8597 for the proposal and 0.6035 for the canonical model.

In Svensson's work [15], a database with 180 microscopic images of fine iron ore was used in an experiment for segmentation using the following convolutional neural networks: PSPNet, FC-DenseNet, DeepLabV3+, and GCN. The experiments were applied to identify the best architecture, the ideal size of the training database, and the impact of data augmentation strategies [8]. With the techniques for increasing the database, the average accuracy of the models increased by 3.14%, and the average IoU improved by 6.84%.

Concerning the search for the best architecture, as previously stated, NAS seeks, at the same time, parameters and the neural architecture using search strategies. Weng et al. [16] proposed a search architecture based on a UNet backbone, searching for the best composition of cells that form graphs representing primitive operations like convolutions and pooling operations. As a result, they presented the NAS-UNet model that obtained better mIoU results compared to the traditional UNet and the FC-DenseNet model.

The investigation executed by Domingos et al. [5] used a genetic algorithm to evolve convolutional neural networks. In that paper, they optimized the DeepEmotive neural network in a search space of 5 parameters: convolution layers, filters, filter size, subsampling type, and dense layer neurons. The experiments were carried out on two different databases for emotion classification: FER-2013, in which the optimized solution had a 1.68% increase in accuracy, and Cohn-Kanade AU, with a 0.68% increase.

None of the proposals presented analyzes the material in real time at a specific screening step inside the plant, where image conditions are impaired by vibration, changes in lighting, and the environment inherent to iron ore particles. This work differs from the works presented, as the database was acquired at a different stage, the sieving stage. Furthermore, this experiment aims to segment iron ore regions with a smaller granulometry, called fine iron ore.
3 Our Proposal: GANASUNet
To identify an optimized architecture for segmenting fine iron ore in images, we used the canonical UNet segmentation network as a backbone. Afterward, we searched for a new CNN for performing the classification task. Thus, based on the work in [16], a search process was carried out to optimize the architecture modularly, in which a cell was optimized and had different operations identified by the NAS algorithm. The proposed method comprises the following steps: (1) acquisition of
iron ore images and manual labeling; (2) Genetic Algorithm Neural Architecture Search (GANASUNet) architecture using Genetic Algorithms; (3) model training and evaluation.

3.1 Data Collection and Acquisition
A region of fines can be defined as an area with a more significant proportion of fine iron ore, which may contain some iron ore grains with larger grain sizes. For each image in the database, a manual segmentation process was applied to find fine iron ore regions, following the guidelines of a specialist in the mining area. Figure 1 shows some examples of this process.
Fig. 1. An example of fine iron ore data labeled by a specialist
Only two classes are mapped into the segmentation mask used by the neural network to identify the fine ore region. A 256 × 256 matrix is filled with values of 0 (zero), representing the absence of a fine ore region; in the case of a region with fine ore, the matrix is filled with 1 (one).

3.2 Model Architecture Using Genetic Algorithms
The process of generating the NASUNet architecture using a GA started with deciding the kind of search space, which is discrete. The discretization of the search space also impacts the configuration of chromosomes because it simplifies their representation. Thus, each chromosome has 15 genes to represent architectural configurations. The first four genes indicate the type of convolutions used in each neuron of the searched cell. The options are one convolution, two convolutions in a row, or a convolution with dilation. Genes 5 through 10 indicate the connections of neurons to each other. To optimize the search runtime, each neuron can only connect with the following neurons, and if a neuron has no connection, it is discarded from the cell. The kernel size can range from 2 × 2 to 5 × 5, represented by gene 11. Genes 12 to 15 represent each cell's dropout amount, varying in the set {0, 0.2, 0.5}. The complete configuration of a chromosome can be seen in Table 1; a decoding sketch is given after the table.
Table 1. Search space of the chromosome
Gene | Options | Settings
1 | {Conv2D, 2 Conv2D, Conv2D with dilation} | First Neuron Convolution
2 | {Conv2D, 2 Conv2D, Conv2D with dilation} | Second Neuron Convolution
3 | {Conv2D, 2 Conv2D, Conv2D with dilation} | Third Neuron Convolution
4 | {Conv2D, 2 Conv2D, Conv2D with dilation} | Fourth Neuron Convolution
5 | {Yes, No} | Neuron Connection {first, second}
6 | {Yes, No} | Neuron Connection {first, third}
7 | {Yes, No} | Neuron Connection {first, fourth}
8 | {Yes, No} | Neuron Connection {second, third}
9 | {Yes, No} | Neuron Connection {second, fourth}
10 | {Yes, No} | Neuron Connection {third, fourth}
11 | {(2,2), (3,3), (4,4), (5,5)} | Kernel Size
12 | {0, 0.2, 0.5} | First Cell Dropout
13 | {0, 0.2, 0.5} | Second Cell Dropout
14 | {0, 0.2, 0.5} | Third Cell Dropout
15 | {0, 0.2, 0.5} | Fourth Cell Dropout
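As referenced above, the sketch below illustrates how such a 15-gene chromosome could be decoded into a cell configuration following the gene order of Table 1. The integer-to-option mapping is an assumption (Table 4 suggests the kernel gene may follow a slightly different index convention), and all names are illustrative.

```python
# Hypothetical decoder from a 15-gene chromosome to a cell configuration.
CONV_OPTIONS = ["Conv2D", "2 Conv2D layers", "Conv2D with dilation"]
KERNEL_OPTIONS = [(2, 2), (3, 3), (4, 4), (5, 5)]
DROPOUT_OPTIONS = [0.0, 0.2, 0.5]
CONNECTION_PAIRS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def decode(chromosome):
    """Map the 15 genes to convolutions, connections, kernel size and dropouts."""
    assert len(chromosome) == 15
    return {
        "convolutions": [CONV_OPTIONS[g] for g in chromosome[0:4]],
        "connections": {p: bool(g) for p, g in zip(CONNECTION_PAIRS, chromosome[4:10])},
        "kernel_size": KERNEL_OPTIONS[chromosome[10] % len(KERNEL_OPTIONS)],
        "dropouts": [DROPOUT_OPTIONS[g] for g in chromosome[11:15]],
    }
```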
4 Experimental Results

4.1 Settings
All networks used in this experiment were trained using the parameters shown in Table 2. The batch value adopted was chosen based on the memory limitation for training the networks. The Adam optimizer was used because of its fast convergence properties. Furthermore, the learning rate parameter changes after five iterations without improving the validation values. The default stop criterion is 100 iterations. Additionally, the training process finishes if the loss value of the validation does not decrease for ten consecutive iterations, a constraint that was added in order to avoid overfitting [17].

Table 2. Training parameters of the neural networks
Parameters | Value
Batch | 5
Optimizer | Adam
Learning Rate | [1.0e-04; 1.0e-05]
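A possible realization of this training setup in Keras is sketched below; the model, the loss function and the data arrays are placeholders, and the callback choices are assumptions consistent with Table 2 and the text above.

```python
import tensorflow as tf

def train(model, x_train, y_train, x_val, y_val, loss_fn):
    """Compile/fit setup mirroring Table 2 (batch 5, Adam, LR from 1e-4 toward 1e-5)."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss=loss_fn)
    callbacks = [
        # Reduce the learning rate after 5 epochs without validation improvement.
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=5,
                                             factor=0.1, min_lr=1e-5),
        # Stop after 10 epochs without validation improvement (overfitting guard).
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                         restore_best_weights=True),
    ]
    return model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=5, epochs=100, callbacks=callbacks)
```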
The configuration used by the traditional UNet network consisted of a VGG19 [14] for the downsampling layers, with a dropout of 0.5 after each cell of the upsampling operations. Additionally, transfer learning was applied from VGG19 with pre-trained weights from ImageNet [4]. Regarding the DeepLabV3 neural network [18], the main feature is the use of several convolution layers with dilation [2]. The default network configuration used a ResNet50 [7] as the initial structure and transfer learning of weights from ImageNet training. Concerning the Genetic Algorithm, the objective function was defined as maximizing the Intersect Over Union result on the test data. The configurations adopted for the search engine developed in this work are in Table 3.

Table 3. Genetic Algorithm settings
Parameters | Values
Crossover Rate | 80%
Evaluation Function | $\sum_{i=1}^{size(test)} IoU_i$
Population Size | 5 individuals
Selection Type | Tournament selection
Mutation Rate | 6.6%
Chromosomes codification | Real
4.2 Results
As previously stated, the modeling of a chromosome was presented in Table 1. Thus, the GA searches for the best combination of parameters within the problem's search space. The neural search based on the UNet backbone identified that, for the data presented, the best set of characteristics, which represents the best individual, was:

[0 2 1 1 1 0 1 0 1 1 3 2 0 1 0]
Table 4 presents the configuration according to each individual gene found. It is possible to identify that fewer dropout operations and a larger kernel size represented a better result. We called this architecture GANASUNet. The graphical representation of the cell structure can be seen in Fig. 2.
Table 4. GANASUNet - results found based on the configuration of each gene
Value | Settings | Results
0 | First Neuron Convolution | Conv2D
2 | Second Neuron Convolution | Conv2D with dilation
1 | Third Neuron Convolution | 2 Conv2D layers
1 | Fourth Neuron Convolution | 2 Conv2D layers
1 | Connection of the first neuron with the second | Yes
0 | Connection of the first neuron with the third | No
1 | Connection of the first neuron with the fourth | Yes
0 | Connection of the second neuron with the third | No
1 | Connection of the second neuron with the fourth | Yes
1 | Connection of the third neuron with the fourth | No
3 | Kernel Size | 4x4
2 | First Dropout Cell | 0.5
0 | Second Dropout Cell | 0
1 | Third Dropout Cell | 0.25
0 | Fourth Dropout Cell | 0
Fig. 2. Graphic representation of the optimized cell
The training process was evaluated using a similarity metric called Intersect Over Union (IoU). The IoU is the ratio between the intersection and the union of two regions X and Y, according to Eq. 1 [1]:

$IoU = \frac{X \cap Y}{X \cup Y}$ (1)
An essential property of IoU is that it relies on the area of the regions, allowing the metric to be invariant regarding scales and, therefore, widely used in object segmentation and detection [1]. For the training process, the loss function selected was based on the IoU calculation. The objective is to minimize the negative value of this metric according to Eq. 2 [11]:
$L_{IoU} = -IoU$ (2)
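For reference, the NumPy sketch below computes the IoU of two binary masks and the corresponding negative-IoU loss of Eqs. 1-2. In practice a differentiable (soft) variant is used inside the training loop; this sketch only illustrates the metric's definition and is not code from the paper.

```python
import numpy as np

def iou(pred, target, eps=1e-7):
    """Intersect Over Union between two binary masks (Eq. 1)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / (union + eps)

def iou_loss(pred, target):
    """Negative IoU used as the training objective (Eq. 2)."""
    return -iou(pred, target)
```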
To evaluate the efficiency of the neural architecture found by the GA-based optimization process, its results were compared against two other architectures that represent the state of the art regarding convolutional neural networks applied to segmentation tasks: the canonical UNet [13] and DeepLabV3 [18]. In the training stage of the segmentation models, the image dataset comprises 688 images and was divided according to the following configuration: 70% training, 20% testing, and 10% validation.

Figure 3 (a) presents information related to traditional UNet training. In the figure, the blue series represents the training data error, while the series represented in orange shows the variation of the model error against the validation dataset. The training stage lasted 40 epochs, being interrupted due to the stop criterion that ends the process after ten consecutive epochs without reducing the validation error. Figure 3 (b) presents details regarding the DeepLabV3 training stage. As in the previous figure, the blue series depicts the error on the training data, and the series represented in orange shows the variation of the model error against the validation dataset. The training stage lasted only 25 epochs, following the already presented criterion of interrupting the training process after ten epochs without a decrease in the validation error. There is a difference in the number of epochs required for the neural network models to converge in the training process: while DeepLabV3 needed fewer than 5 epochs to reach a loss value of −0.7, UNet required 28 epochs.
Fig. 3. The loss value to the training epochs
The results showed a difference of approximately 2% between the IoU obtained with the neural search architecture compared to the result obtained by the DeepLabV3 architecture. Compared to traditional UNet architecture, the increase is approximately 5%. Table 5 presents the results of the average of 100 test runs for each of the three networks.
Table 5. Mean IoU and standard deviation results over 100 test runs
Model | IoU | Standard Deviation
UNet | 0.7542 | 0.005
DeepLabV3 | 0.7804 | 0.008
GANASUNet | 0.8034 | 0.016
Figure 4 presents cases in which the segmentation correctly identified the area of interest, differentiating areas of granulated ore from fine regions. The segmentation represented by images in the top row has 79% of IoU, and the segmentation presented in the bottom images has 72% of IoU.
Fig. 4. Real images (Left-column); Segmentation by GANASUNet (Middle Column); Segmentation by Specialist (Right-Column)
5 Conclusion
In this work, a search engine was developed to find an efficient convolutional neural network for fine iron ore segmentation. The UNet architecture was used as a basis for the construction of a structure optimized for the problem, called GANASUNet. The architecture found was tested on a database collected during different periods of the day and in different seasons. From the videos collected, 688 images were segmented to build the dataset used in the training and testing stages of the models. The GANASUNet model was trained using the constructed dataset and obtained a value of 80% of intersection over union. The results obtained are promising, considering that the images were collected in a real production environment
with constant equipment vibration, different variations in ore format, and a similar color appearance throughout the material. In this context, the following contributions were accomplished: the implementation of an image dataset of iron ore from an industrial environment, the creation of a pre-processed database with manual segmentation performed by a specialist in the mining field, the usage of a GA-based optimization mechanism to perform the architecture search optimized for ore segmentation, the modeling of the set of parameters used by the search engine to find an optimized model, and the evaluation of the effectiveness of the segmentation method for fine iron ore. Even with the results considered satisfactory for the contribution of the work and the methodology created, some improvements are still planned for future work: expanding the database with iron ore samples from other mining areas and, since the proposal of this work was the segmentation of fine ore, carrying out more specific work to identify and also segment granulated iron ore in order to obtain a better understanding of the mining material.
References
1. van Beers, F.: Capsule networks with intersection over union loss for binary image segmentation, February 2021. https://doi.org/10.5220/0010301300710078
2. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017)
3. Chen, M., Li, M., Li, Y., Yi, W.: Rock particle motion information detection based on video instance segmentation. Sensors (Basel, Switzerland) 21 (2021)
4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
5. Domingos, D., Carmona Cortes, O., Lobato, F.: Evoluindo redes neurais convolucionais na detecção de emoções usando micro AGs (05 2022)
6. Duan, J., Liu, X., Wu, X., Chuangang, M.: Detection and segmentation of iron ore green pellets in images using lightweight U-Net deep learning network. Neural Comput. Appl. 32 (2020). https://doi.org/10.1007/s00521-019-04045-8
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015)
8. Hernandez-Garcia, A.: Data augmentation and image understanding. arXiv preprint arXiv:2012.14185 (2020)
9. Liu, X., Yuwei, Z., Jing, H., Wang, L., Sheng, Z.: Ore image segmentation method using U-Net and Res UNet convolutional networks. RSC Advances 10, 9396–9406 (2020)
10. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression (2019)
11. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 658–666 (2019). https://doi.org/10.1109/CVPR.2019.00075
12. Roerdink, J., Meijster, A.: The watershed transform: definitions, algorithms and parallelization strategies. Fundam. Inf. 41 (2003). https://doi.org/10.3233/FI-2000-411207
13. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597 (2015). http://arxiv.org/abs/1505.04597
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
15. Svensson, T.: Semantic Segmentation of Iron Ore Pellets with Neural Networks (Dissertation). Ph.D. thesis, Luleå University of Technology (2019). http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-74352
16. Weng, Y., Zhou, T., Li, Y., Qiu, X.: NAS-Unet: neural architecture search for medical image segmentation. IEEE Access 7, 44247–44257 (2019). https://doi.org/10.1109/ACCESS.2019.2908991
17. Ying, X.: An overview of overfitting and its solutions. J. Phys. Conf. Ser. 1168, 022022 (2019). https://doi.org/10.1088/1742-6596/1168/2/022022
18. Yurtkulu, S.C., Şahin, Y.H., Unal, G.: Semantic segmentation with extended DeepLabV3 architecture. In: 2019 27th Signal Processing and Communications Applications Conference (SIU), pp. 1–4. IEEE (2019)
Classifying 2D ECG Image Database Using Convolution Neural Network and Support Vector Machine

Tran Ngoc Tuan, Duong Trong Luong(B), Pham Viet Hoang, Tran Quoc Khanh, Hoang Thi Lan Huong, Tran Xuan Thang, and Tran Thuy Hanh

School of Electrical and Electronics, Hanoi University of Science and Technology, Hanoi, Vietnam
[email protected]
Abstract. An electrocardiogram (ECG) is a technique for capturing the electrical activity of the heart and offers a way to diagnose conditions that are related to the heart. Any irregular heartbeat that results in an anomaly in cardiac rhythm is known as an arrhythmia. Early identification of arrhythmia is crucial to preventing many diseases. It is not possible to swiftly identify arrhythmias that could result in unexpected deaths by manually analyzing ECG readings. In order to build computer-aided diagnostic (CAD) systems that can automatically recognize arrhythmias, numerous studies have been published. In this paper, we offer a unique method for classifying a database of 2D ECG images using convolutional neural networks (CNN) and support vector machines (SVM) to aid in the diagnosis of arrhythmias. The MIT-BIH database served as the source for the experimental data. Results from the suggested method have an accuracy of 98.92%.

Keywords: Classify · Convolution Neural Network · Support Vector Machine · Accuracy · Arrhythmia
1 Introduction

The most obvious indication of cardiac function is the heartbeat, which is a fundamental physiological process of the human body. The heartbeat may exhibit a number of aberrant states, including tachycardia, bundle branch or atrioventricular obstruction, and premature atrial or ventricular contraction, depending on one's age and lifestyle choices. Therefore, arrhythmia affects millions of people; moreover, some emergency arrhythmias are life-threatening and can lead to cardiac arrest [1]. Atrial fibrillation affects 2% to 3% of people in Europe and North America as of 2014 [2]. In 2013, atrial fibrillation and atrial flutter caused 112,000 deaths, up from 29,000 in 1990 [3]. Due to the capacity of the COVID-19 virus to produce myocardial harm, cardiac arrhythmias have frequently been established and associated with substantial morbidity and death among patients hospitalized with the infection in the most recent episodes of the SARS-CoV-2 pandemic [4]. Roughly 15% of all deaths worldwide and about 50% of CVD deaths are caused by sudden cardiac death [5]. Ventricular arrhythmias account
for about 80% of sudden cardiac mortality. Arrhythmias can happen to anyone, but older persons are more likely to have them. CVD can be avoided with early detection and efficient therapies such as vagal exercises and medicines. Electrocardiogram (ECG) signals are typically analyzed to determine the arrhythmia in a clinical context. An ECG signal represents the electrical activity of the heart over time and is made up of waves that look like heartbeats and repeat at regular intervals. The doctor monitors these heartbeats to determine whether an arrhythmia is present, a labor- and time-intensive process. In recent years, many advances in diagnosis and treatment, especially the achievements of AI in medical research, have brought patients a higher quality of life and longer life expectancy. Because of its outstanding advantages in medical research, researchers have mainly used machine learning or deep learning to build their models to identify ECG signals in previous studies [6, 7]. The build process consists of two main steps: feature extraction and classification [8–13]. Different techniques, such as artificial neural networks (ANNs), multi-view-based learning, and linear discriminants (LDs), have been employed for classification. Despite the positive results these methods have produced, there are notable differences between the ECG waves and their morphological properties for different individuals, as well as between the ECG waves taken at various times for the same patient. The fixed properties of these approaches are insufficient to reliably distinguish between the arrhythmias of various patients. Deep learning-based methodologies have recently drawn increasing amounts of attention as a result of the rapid development of deep neural networks. In our paper, an automatic ECG classification method is proposed based on inter-beat intervals, the Continuous Wavelet Transform (CWT), a Convolutional Neural Network (CNN), and a Support Vector Machine (SVM) for ECG classification.
2 Method

The method consists of pre-processing, transforming the signal into 2D spectral images using the CWT, creating RR-interval data, and classification.

2.1 Database

In the experiments, we use the MIT-BIH arrhythmia database. We follow the recommendations of the Association for the Advancement of Medical Instrumentation (AAMI) for class labeling. The AAMI standard defines five classes of interest: normal (N), ventricular (V), supraventricular (S), fusion of normal and ventricular (F), and unknown beats (Q) [14]. Moreover, as the Q class is basically nonexistent, we ignore it. For training, 800 ECG beats (200 from each class) were used from the 22 records of the MIT-BIH database termed DS1 = {101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 203, 205, 207, 208, 209, 215, 220, 223, 230}. The remaining database records are arranged into DS2 = {100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234}, which is used to test the performance of the method [15].
2.2 Pre-processing

Data denoising: In this paper, we use a 50 Hz notch filter and the discrete wavelet transform (DWT) with the bior3.9 basis function to eliminate the influence of electrode artifact noise, white noise and baseline wander (BW) noise on the ECG signal (Fig. 1).
Fig. 1. The ECG signal: (a) raw ECG and (b) filtered ECG
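A possible realization of this denoising step is sketched below, assuming the MIT-BIH sampling rate of 360 Hz. The exact wavelet thresholding scheme is not specified in the text, so soft-thresholding of the detail coefficients and zeroing the coarsest approximation (against baseline wander) are assumptions.

```python
import numpy as np
import pywt
from scipy.signal import iirnotch, filtfilt

FS = 360  # MIT-BIH sampling rate (Hz)

def denoise(ecg):
    # 50 Hz notch filter against power-line interference.
    b, a = iirnotch(w0=50.0, Q=30.0, fs=FS)
    x = filtfilt(b, a, ecg)

    # DWT with bior3.9; soft-threshold detail coefficients (assumed scheme)
    # and zero the coarsest approximation to suppress baseline wander.
    coeffs = pywt.wavedec(x, "bior3.9", level=8)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    coeffs[0] = np.zeros_like(coeffs[0])
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, "bior3.9")[: len(x)]
```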
Heartbeat segment: To get a fixed-size ECG signal of 200 samples for each heartbeat, we take 90 samples before and 110 samples after the R-peak. These sample points have effectively captured the most significant heartbeat waves. Figure 2 provides an illustration of the segmentation.
Fig. 2. (a) An illustration of the segmentation, (b) ECG Beat
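The segmentation described above can be implemented in a few lines; the sketch below assumes the R-peak locations are already available (for example from the MIT-BIH annotations), and the names are illustrative.

```python
import numpy as np

def segment_beats(ecg, r_peaks, before=90, after=110):
    """Cut one fixed-size 200-sample window per R-peak (90 before, 110 after)."""
    beats = []
    for r in r_peaks:
        if r - before >= 0 and r + after <= len(ecg):
            beats.append(ecg[r - before:r + after])
    return np.asarray(beats)
```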
Arrhythmia typically affects not only the shape of the heartbeat but also the surrounding RR intervals (also known as R-peak intervals), shown in Fig. 3. Consequently, we add RR-interval data to our CNN for ECG categorization. Previous-RR, post-RR, ratio-RR, and local-RR are four often utilized RR-interval properties that are extracted [16]. The previous-RR is the amount of time between the previous and current heartbeats. The RR interval between the current and following heartbeats is known as the post-RR. The ratio of the previous to the post RR is known as the RR ratio. The local-RR is the average of the last ten RR intervals of the present heartbeat. The previous-RR, post-RR, and local-RR have all had the average RR interval subtracted from them to remove inter-patient variation. The fusion features are sent into two fully connected layers for classification.
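The sketch below computes the four RR-interval features from a list of R-peak positions; the mean-centering of the previous-, post- and local-RR follows the description above, while the exact normalisation used in the paper is assumed.

```python
import numpy as np

def rr_features(r_peaks, fs=360):
    """Previous-, post-, ratio- and local-RR per beat, mean-centred as described."""
    r = np.asarray(r_peaks) / fs
    rr = np.diff(r)                            # RR intervals in seconds
    feats = []
    for i in range(1, len(rr)):                # beats with both a previous and a post RR
        prev_rr, post_rr = rr[i - 1], rr[i]
        ratio_rr = prev_rr / post_rr
        local_rr = rr[max(0, i - 10):i].mean() # mean of up to ten preceding intervals
        feats.append([prev_rr, post_rr, ratio_rr, local_rr])
    feats = np.asarray(feats)
    feats[:, [0, 1, 3]] -= rr.mean()           # subtract the average RR interval
    return feats
```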
Fig. 3. RR interval
2.3 ECG Signal Transformation Using CWT

The CWT decomposes a signal in the time-frequency domain, and wavelet functions are typically used for this purpose. By modifying the scale and translation settings, the CWT can offer high time resolution and low frequency resolution at high frequencies, and high frequency resolution and low time resolution at low frequencies. The wavelet transform fulfills these two requirements. It maps the one-dimensional continuous signal x(t) into a 2D space defined by formula (1):

$S(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \varphi\!\left(\frac{t - b}{a}\right) dt$ (1)

where a is a scale factor and b is a translation factor applied to the continuous mother wavelet. In this step, the CWT is applied to transform an ECG signal into a 2D spectrum. When analyzing signals such as ECG signals, we use the "Mexican hat" (mexh) and gaus8 wavelets, which are close in shape to the signal (Figs. 4, 5 and 6).
Fig. 4. Mexh wavelet
Fig. 5. Gaus8 wavelet
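The scalogram generation can be done with PyWavelets, which supports both the mexh and gaus8 continuous wavelets; the scale range and the normalisation below are illustrative assumptions.

```python
import numpy as np
import pywt

def beat_to_scalogram(beat, wavelet="mexh", n_scales=64):
    """Turn a 1-D heartbeat into a 2-D CWT scalogram ('gaus8' can be used instead)."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(beat, scales, wavelet)
    scalogram = np.abs(coeffs)
    # Normalise to [0, 1] so the scalogram can be saved or fed to the CNN as an image.
    return (scalogram - scalogram.min()) / (scalogram.max() - scalogram.min() + 1e-7)
```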
Fig. 6. ECG signal and CWT scalogram: (a) Gaus8 wavelet scalogram, (b) Mexh wavelet scalogram

2.4 CNN-SVM Model Architecture

The final output layer of the CNN model is swapped out for an SVM classifier to create the architecture of a CNN-SVM model. Ten values between 0 and 1 calculated by the SoftMax activation function are the outputs of the final dense layer in the CNN. The linear combination of the outputs from the preceding hidden layer with trainable weights plus a bias term makes up the input of the activation function. The hidden layer's output values can be used as input features for additional classifiers in addition to making sense to the CNN model [17]. The hidden layer's outputs are used by the SVM as a new feature vector during training. Once its training is complete, the SVM classifier is used to perform the classification [18] (Fig. 7).
Fig. 7. The architecture of a combining CNN-SVM model, where the final layer of the CNN model is replaced by a SVM classifier
Based on the characteristics of the CNN network and of the SVM algorithm, as well as previously published research, we propose classifying the 2D ECG image database using the CNN-SVM model. The CNN model is used to extract the features of the image. These features are the input to the SVM classification algorithm, and the SVM classifies the SVEB and VEB classes from the database of 2D electrocardiogram images, achieving a good result.
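One way to wire the two components together is sketched below: the trained CNN, minus its SoftMax output layer, acts as a fixed feature extractor and an SVM is trained on the extracted vectors. The choice of the penultimate layer and of the RBF kernel are assumptions, not details from the paper.

```python
import tensorflow as tf
from sklearn.svm import SVC

def build_cnn_svm(cnn, X_train, y_train):
    """Replace the CNN's output layer with an SVM trained on hidden-layer features."""
    feature_extractor = tf.keras.Model(inputs=cnn.input,
                                       outputs=cnn.layers[-2].output)
    features = feature_extractor.predict(X_train)
    svm = SVC(kernel="rbf")
    svm.fit(features, y_train)
    return feature_extractor, svm

def predict_cnn_svm(feature_extractor, svm, X):
    return svm.predict(feature_extractor.predict(X))
```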
3 Results and Discussion

In the results, we used three parameters to evaluate the effectiveness of our proposed model: positive predictive value (PPV), sensitivity (SE) and accuracy (ACC). These are defined by the following equations:

$PPV_i = \frac{TP_i}{TP_i + FP_i}$

$SE_i = \frac{TP_i}{TP_i + FN_i}$

$ACC_i = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$
where:
• TPi (true positive) and FNi (false negative) refer to the number of samples of the i-th class correctly predicted and the number of samples of the i-th class classified into other classes, respectively.
• TNi (true negative) and FPi (false positive) are the number of samples of other classes not classified as the i-th class and the number of samples of other classes predicted as the i-th class, respectively.
Because the SVEB and VEB classes are more important than the other classes in ECG classification, we examine these two classes in depth, as shown in Table 3.

3.1 Comparison Between CNN-SVM and Normal CNN

The following two tables are the confusion matrices of the CNN with cross-entropy and of the CNN with the SVM classifier, respectively. The number of samples in each class is imbalanced, with many more instances in the N class; however, this is natural in real life. An observation that can be made is that the two matrices are quite similar. The number of true positives in the N class of the Vanilla CNN is higher than with the SVM classifier, while this value for VEB and SVEB is lower than with the SVM (Tables 1 and 2).
Table 1. The confusion matrix of the Vanilla CNN (rows: true label; columns: predicted label)
True \ Predicted | N | S | V | F | Total
N | 43986 | 134 | 54 | 44 | 44218
S | 228 | 1495 | 113 | 0 | 1836
V | 201 | 11 | 3003 | 4 | 3219
F | 357 | 2 | 28 | 1 | 388
Total | 44772 | 1642 | 3198 | 49 | 49661
Table 2. The confusion matrix of CNN-SVM (rows: true label; columns: predicted label)
True \ Predicted | N | S | V | F | Total
N | 43915 | 147 | 72 | 84 | 44218
S | 124 | 1547 | 164 | 1 | 1836
V | 54 | 44 | 3118 | 3 | 3219
F | 302 | 3 | 55 | 28 | 388
Total | 44395 | 1741 | 3409 | 116 | 49661
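The per-class metrics defined in the previous section can be computed directly from such a confusion matrix. The helper below is a generic NumPy sketch, not code from the paper; the example data are the CNN-SVM counts of Table 2.

```python
import numpy as np

def per_class_metrics(cm, labels):
    """PPV, SE and ACC per class from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    out = {}
    for i, lab in enumerate(labels):
        tp = cm[i, i]
        fp = cm[:, i].sum() - tp
        fn = cm[i, :].sum() - tp
        tn = total - tp - fp - fn
        out[lab] = {"PPV": tp / (tp + fp),
                    "SE": tp / (tp + fn),
                    "ACC": (tp + tn) / total}
    return out

# Example with the CNN-SVM confusion matrix of Table 2.
cm_svm = [[43915, 147, 72, 84],
          [124, 1547, 164, 1],
          [54, 44, 3118, 3],
          [302, 3, 55, 28]]
print(per_class_metrics(cm_svm, ["N", "S", "V", "F"]))
```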
Table 3. The classification performance of Vanilla CNN and CNN-SVM in SVEB and VEB classes
Class | Metric | CNN | CNN-SVM
SVEB | PPV | 91.05% | 88.86%
SVEB | Sensitivity | 81.43% | 84.26%
SVEB | Accuracy | 99.02% | 98.92%
SVEB | F1-Score | 85.97% | 86.50%
VEB | PPV | 93.29% | 91.46%
VEB | Sensitivity | 93.90% | 96.86%
VEB | Accuracy | 99.12% | 99.21%
VEB | F1-Score | 93.59% | 94.08%
The metrics used to evaluate the models are positive predictive value (PPV), sensitivity (SE) and accuracy (ACC), which were described formally in the previous section. The table above is the evaluation of these metrics over the SVEB and VEB classes. We can see that the overall performances of the Vanilla CNN and of the CNN with the SVM classifier are almost the same: the accuracy of the Vanilla CNN is slightly higher than that of the SVM classifier in the SVEB class, whereas the SVM is a little better than the Vanilla CNN for VEB. In more detail, the Vanilla CNN is better than the SVM classifier in the PPV metric in both classes, being about 2% higher, which means that the Vanilla CNN is more precise. In contrast, the SVM classifier has a higher score on the SE metric compared to the Vanilla CNN; its performance on SE is about 3% better. This shows that there is a slight trade-off between the Vanilla CNN and the CNN-SVM classifier.

3.2 Performance Comparison with Other Published Methods

The performance of our method is competitive with other state-of-the-art models, with the highest accuracy on SVEB and second place on VEB. Specifically, our model achieves the best performance on the SE score, with a slightly lower PPV metric, in both classes.
Table 4. The classification performance of existing works and our CNN-SVM method in the SVEB and VEB classes
Method | VEB PPV | VEB SE | VEB F1 | VEB Acc | SVEB PPV | SVEB SE | SVEB F1 | SVEB Acc
SVM [16] | 92.75% | 85.48% | 88.96% | 98.63% | 35.98% | 79.06% | 49.46% | 93.33%
CNN and CWT [19] | 93.25% | 95.65% | 94.43% | 99.27% | 89.54% | 74.56% | 81.37% | 98.74%
Deep CNN [20] | 76.51% | 90.20% | 82.79% | 97.45% | 38.87% | 33.12% | 36.18% | 95.49%
Dynamic features [21] | 85.25% | 70.85% | 77.38% | 97.32% | 38.40% | 29.5% | 33.36% | 95.34%
Our method | 91.46% | 96.86% | 94.08% | 99.21% | 88.86% | 84.26% | 85.97% | 98.92%
3.3 Discussion

To make a fair comparison, we only compare our approach to methods that use the same dataset as this study. Table 4 is the comparison of our method with four other related works. Among the existing methods, the best is the one proposed by Wang et al. [19], which outperforms the others in most of the metrics. This method achieved an accuracy of around 99% in the SVEB and VEB classes. The overall performance of our proposal is competitive with this work. However, looking into more detail, our model is much better in the SE metric of the SVEB class, approximately 10% higher, and the PPV score is nearly the same, which consequently leads to a 5% higher F1 score. For VEB, the performance of the two methods can be considered the same, with the PPV and SE scores of the Wang et al. method being a little higher than our proposal, by about 2% and 1%, respectively.
4 Conclusion

In recent years, machine learning and deep learning modeling have made many advances in different fields, especially in medicine. The traditional method consumes a lot of human and material resources and also wastes a lot of treatment time of doctors and patients, which is why it is especially important to create algorithms to detect and diagnose arrhythmias automatically. In our method, we use a CNN combined with an SVM to classify the database of 2D electrocardiogram images to support arrhythmia diagnosis with high results, with an accuracy of up to 98.92%. In the future, we plan to improve the method to optimize the accuracy and to design the classification software for application. Furthermore, we will try to classify more types of ECG signals in different data sets.
References
1. Sannino, G., De Pietro, G.: A deep learning approach for ECG-based heartbeat classification for arrhythmia detection. Future Gener. Comput. Syst. (2018)
2. Zoni-Berisso, M., Lercari, F., Carazza, T., Domenicucci, S.: Epidemiology of atrial fibrillation: European perspective. Clin. Epidemiol. 6, 213–220 (2014). https://doi.org/10.2147/CLEP.S47385. PMC 4064952. PMID 24966695
3. GBD 2013 Mortality and Causes of Death Collaborators: Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 385(9963), 117–171 (2015)
4. Kuck, K.H.: Arrhythmias and sudden cardiac death in the COVID-19 pandemic. Herz 45(4), 325–326 (2020)
5. Mehra, R.: Global public health problem of sudden cardiac death. J. Electrocardiol. 40(6 Suppl), S118–S122 (2007)
6. Priya, K.D., Rao, G.S., Rao, P.S.V.S.: Comparative analysis of wavelet thresholding techniques with wavelet-Wiener filter on ECG signal. Procedia Comput. Sci. 87, 178–183 (2016)
7. Yochum, M., Renaud, C., Jacquir, S.: Automatic detection of P, QRS and T patterns in 12 leads ECG signal based on CWT. Biomed. Signal Process. Control 25, 46–52 (2016)
8. Gao, Z., Wang, L., Zhou, L., Zhang, J.: HEp-2 cell image classification with deep convolutional neural networks. IEEE J. Biomed. Health Inform. 21(2), 416–428 (2017)
9. Li, W., Wu, G., Zhang, F., Du, Q.: Hyperspectral image classification using deep pixel-pair features. IEEE Trans. Geosci. Remote Sens. 55(2), 844–853 (2016)
10. Huang, Y., Wu, R., Sun, Y., Wang, W., Ding, X.: Vehicle logo recognition system based on convolutional neural networks with a pretraining strategy. IEEE Trans. Intell. Transp. Syst. 16(4), 1951–1960 (2015)
11. Hariharan, B., Arbelaez, P., Girshick, R., Malik, J.: Object instance segmentation and fine-grained localization using hypercolumns. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 627–639 (2017)
12. Wu, X., Du, M., Chen, W., Li, Z.: Exploiting deep convolutional network and patch-level CRFs for indoor semantic segmentation. In: 2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA), pp. 150–155 (2016)
13. Liu, Y., Chen, X., Peng, H., Wang, Z.: Multi-focus image fusion with a deep convolutional neural network. Inf. Fus. 36, 191–207 (2017)
14. Al Rahhal, M.M., Bazi, Y., Al Zuair, M., Othman, E., BenJdira, B.: Convolutional neural networks for electrocardiogram classification. J. Med. Biol. Eng. 38(6), 1014–1025 (2018). https://doi.org/10.1007/s40846-018-0389-7
15. De Chazal, P., O'Dwyer, M., Reilly, R.B.: Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. (2004)
16. Zhang, Z., Dong, J., Luo, X., Choi, K.S., Wu, X.: Heartbeat classification using disease-specific feature selection. Comput. Biol. Med. (2014)
17. Basly, H., et al.: CNN-SVM learning approach based human activity recognition. In: International Conference on Image and Signal Processing (2020)
18. Luong, D.T., Kien, H.T., Duc, N.M.: Automatic white blood cell classification using the combination of convolution neural network and support vector machine (2020)
19. Wang, T., Lu, C., Sun, Y., Yang, M., Liu, C., Ou, C.: Automatic ECG classification using continuous wavelet transform and convolutional neural network (2021)
20. Jian, L., Shuang, S., Guozhong, S., Yu, F.: Classification of ECG arrhythmia using CNN, SVM and LDA. In: Proceedings of the Artificial Intelligence and Security, 5th International Conference (2019)
21. Chen, S., Hua, W., Li, Z., Li, J., Gao, X.: Heartbeat classification using projected and dynamic features of ECG signal. Biomed. Signal Process. Control (2017)
Conceptual Model of a Data Visualization Instrument for Educational Video Games

Yavor Dankov(B)

Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
[email protected]
Abstract. The design and creation of specialized instruments that are focused on visualizing gaming and learning results from educational video games will help both learners and players perceive their achieved results, as well as designers and creators of educational video games in the design and creation process of new, improved educational video games within the software platforms for educational video games. In this regard, creators and users of educational video games must have such instruments to facilitate the perception of generated information from the video games played by learners/players. On the other hand, the visualization of the data on the gaming and learning results using various data visualization methods and techniques will provide an easy and understandable way to perceive this valuable information. Therefore, the paper focuses on designing a specialized instrument for visualizing gaming and learning results from educational video games. The paper presents the conceptual model of the instrument and the main functionalities this tool can provide to users of educational video games created and generated within the framework of an educational platform for maze video games for education, in particular, the APOGEE software platform. The paper also presents a detailed conceptual model of the data visualization instrument for managing the account and visualizing data functionalities.

Keywords: Educational Video Games · Data Visualization · Serious Games · Game-based learning · Instruments
1 Introduction
In modern society, video games are increasingly becoming integral to everyday life. Undoubtedly, video games have a significant influence and impact on users and are a current topic for researchers, practitioners, manufacturers and video game users, as well as for the video game industry for entertainment, game distribution, etc. [1]. Video games are increasingly used in modern learning in various fields because of their proven “power” and the possibility of integrating essential messages and values in them - these are the so-called serious games [2–4]. In this paper, the focus is on video games in the field of education as part of serious video games. Educational video games enable the implementation of modern interactive learning strategies (including game-based learning) [5] and provide an attractive and interactive
way of presenting educational and game content to learners and players in the online environment [6, 7]. Creating and designing educational video games and the means to improve and facilitate this process is a popular and current topic that develops daily [8–10]. Adapting and adjusting the game design to users’ needs, requirements, and preferences also contributes to developing the field of study. [11, 12]. Numerous studies and surveys enrich the scientific field and strive for improvement and development [13, 14]. In several studies, educational video games are perceived as a complete instrument for game-based learning applications [15–17], applied in an architectural, colourful and user-engaging environment [18, 19], especially for the learning process that predisposes to the interactive perception of the learning content and increasing the user experience in the games. On the other hand, enriching the opportunities of educational platforms for the creation of video games for education with specialized instruments to support the design, creation and use of these games will significantly improve and develop these platforms. The design and creation of specialized instruments that are focused on visualizing gaming and learning results from educational video games will help both learners and players perceive their achieved results, as well as designers and creators of educational video games in the design and creation process of new, improved educational video games within the software platforms for educational video games. In this regard, the creators and users of educational video games need such instruments to facilitate the perception of generated information from the video games played by learners/players. On the other hand, the visualization of the data on the gaming and learning results using various data visualization methods and techniques will provide an easy and understandable way to perceive this valuable information and extract knowledge from data [20]. Therefore, the paper focuses on designing a specialized instrument for visualizing gaming and learning results from educational video games. The paper presents the conceptual model of the instrument and the main functionalities this tool can provide to users of educational video games created and generated within the framework of an educational platform for creating maze video games for education, in particular, the APOGEE (https://apogee.online/) software platform. Designing and creating educational video games is a laborious process, most commonly implemented within the framework of a specially developed software platform for creating and generating educational video games. Therefore, for the instrument’s design, development and future improvement, it is planned to realize experimental research through the APOGEE software platform for the automated creation and generation of maze video games for education. The platform is described in [21, 22]. This instrument’s design and future development will provide opportunities for data visualization through various methods and techniques for visualizing data. Therefore, this will help facilitate the overall perception and visual analysis of game session-generated data. Stakeholders will have the opportunity to visually perceive and analyze educational video gaming and learning results in a modern and current way. 
The design and creation of an instrument for visualizing gaming and learning results from educational video games will provide a timely and feasible solution to the challenges of perceiving and analyzing gaming and learning results faced by different stakeholders, including users with diverse knowledge in the field.
The paper continues as follows: Sect. 2 consists of two subsections, presenting the proposed conceptual model of the instrument for visualization of gaming and learning results from maze video games for education within the APOGEE software platform. Subsection 2.1 presents the initially designed user functionalities of the instrument and the conceptual model of its main functionalities. Subsection 2.2 presents the detailed conceptual model of the data visualization instrument for the Manage Account and Visualize Data functionalities. The paper ends with a conclusion and future work.
2 The Proposed Conceptual Model
2.1 User Functionalities Model of the Instrument
The design and development of the instrument are based on its application in the educational platform for the automatic creation and generation of maze video games for education. Integrating such an instrument into the platform will significantly improve its capabilities and provide an opportunity for visual processing and perception of the results of the game sessions played by the platform users. On the other hand, the visualization of the data for all the results of the game sessions will allow designers of educational video games to visually analyze the data on the results achieved by players and learners. Therefore, this will provide additional opportunities for game designers to visually perceive and discover trends and models in the visualized data to obtain valuable information about the overall success of users within a maze video game for education on the APOGEE platform. The paper presents the Conceptual Model of the Data Visualization Instrument Functionalities (Fig. 1). The model presents the basic functionalities of the instrument for visualizing gaming and learning results from maze video games for education in the APOGEE software platform. The main designed user functionalities that the instrument provides to users are as follows:
1. Visualize Data;
2. Manage Account;
3. System Configure;
4. Storing Profile Game Data;
5. Gathering Gameplay Data.
Each of these functionalities contains other additional functionalities, which will be presented and further developed in the subsequent studies and publications of the author of this paper. The UML use case diagram, illustrated in Fig. 1, shows the five main designed functionalities the instrument provides to the users and also presents the basic categories of users in the platform that can use the tool functionalities. They are divided into three types of users, namely:
• Game Creator/Game Designer;
• Game Player/Game Learner;
• Administrator.
Fig. 1. Conceptual Model of the Data Visualization Instrument Functionalities
Each of these users can only use certain functionalities of the instrument. In Fig. 1, these functionalities are presented as ovals, and users who can use the instrument are illustrated as actors on the left and right of the proposed instrument. Therefore, the user Game Creator/Game Designer can use the functionalities Visualize Data, Manage Account, and System Configure. In contrast, Game Player/Game Learner users can use the Visualize Data and Manage Account functionalities. Platform Administrators have access to all tool features (including the Storing Profile Game Data and Gathering Gameplay Data functionalities) to maintain the instrument. All users must first have system registration (accessing the functionality Manage Account) to use the instrument for visualizing gaming and learning results from maze video games for education and its functionalities. Integrating the instrument into the APOGEE platform will allow users to have individual profiles to use when playing maze video games for education in the APOGEE platform. Thus, these users can utilize the tool's functionality to visualize the results of the game sessions by accessing the functionality Visualize Data. The instrument will, in turn, be responsible for collecting the necessary data through the functionality Gathering Gameplay Data and storing this data in the relevant database of game and learning results (Storing Profile Game Data functionality), as well as providing the ability to adjust the tool through the System Configure functionality.
2.2 Conceptual Model of the Data Visualization Instrument for the System Login and Visualize Data Functionalities
After presenting the Conceptual Model of the Data Visualization Instrument Functionalities and the primary users of the instrument, this paper also presents the Conceptual
Model of the Data Visualization Instrument for the Manage Account and Visualize Data Functionalities, illustrated in the UML diagram in Fig. 2. This model describes in more detail all functionalities related to the use of two of the basic user functionalities of the instrument designed and described in the previous section, namely the Visualize Data and Manage Account functionalities.
Fig. 2. Conceptual Model of the Data Visualization Instrument for the Manage Account and Visualize Data Functionalities.
The focus of the diagram is precisely on the use of these two main functionalities by Game Creator/Game Designer and Game Player/Game Learner users. To use the instrument’s capabilities to visualise gaming and learning results from maze video games for education, users must start the tool using the Manage Account functionality to enter their profile. After that, the system will recognize them as the corresponding user and
provide the delegated rights to use the tool. Accordingly, the instrument enables the user to enter his data through the Login functionality, after which it checks whether these data are valid, whether this user already has an account, and whether the entered data is correct (Password Validation functionality). If the user does not have a profile in the system, the tool enables the creation of a new account through the Registration New User functionality. Registered users can edit their profiles using the Configure Profile functionality. After the user has entered valid user data and logged in to the system, the user has access to the corresponding user profile with a defined user role. After that, this user can use the instrument's functionality to visualize the results of the game sessions through the Visualize Data functionality. Therefore, when accessing the Visualize Data functionality, the instrument checks the validity of the user's profile through the Verify Profile functionality. In this way, the instrument determines what functionalities the respective user can benefit from, accessing the main functionality of the Visualize Data tool. Therefore, the tool provides the following capabilities depending on whether the user is a Game Creator/Game Designer or Game Player/Game Learner:
• The user is a Game Creator/Game Designer. The available functionalities for this user (accessing the Visualize Data functionality) are the following: Game Designer Dashboard, Visualize Player Dashboard, and Visualize Learner Dashboard;
• The user is a Game Player/Game Learner. The available functionalities for this user (accessing the Visualize Data functionality) are Visualize Player Dashboard and Visualize Learner Dashboard.
Each user can use the instrument's functionality on the initial condition that this user is a registered user in the system and has the appropriate profile with the necessary user rights and a user role in the APOGEE platform. The Game Designer Dashboard functionality is only available to Game Creator/Game Designer users. This functionality enables these users to use the following functionalities:
• Viewing Learner Profile Dashboards;
• Viewing Player Profile Dashboards;
• Viewing the Number of Created and Played Games;
• Viewing the Playtime of Each Game;
• Configure the Game Designer Dashboard.
Therefore, the Game Creator/Game Designer users can use these functionalities to view the general information that is stored in the tool’s database, which includes viewing information about the games played by users, the game and learning results achieved by players and learners, as well as to visualize data and statistics about the games designed by the designers, and to be able to monitor the individual learner and player dashboards, and so on. The Visualize Player Dashboard functionality provides the following additional functionalities:
• Viewing Player Personal Statistics;
• Viewing Overall Player Statistics;
• Configure Player Dashboard.
Users use these functionalities to visualize the personal player results achieved during game sessions. The tool provides the functionality to visualize aggregated data on the ranking of the accomplished game results of the players who have played the same maze video game for education on the APOGEE platform. In this way, users will have the opportunity to monitor their achieved results and compare themselves with other players and their outcomes. The Visualize Learner Dashboard functionality provides the following additional functionalities:
• Viewing Learner Personal Statistics;
• Viewing Overall Learner Statistics;
• Configure Player Dashboard.
Therefore, users can have information visualized to them through various data visualization methods. Thus, learners can see their learning outcomes and the extent to which they have grasped and perceived the didactic content in the maze video game for education they have played on the APOGEE platform. The instrument will also provide the functionality to visualize the aggregate learner data and ranking according to the success of each learner who played the same video game on the platform.
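To make the role-based access to these dashboards more concrete, the following minimal Python sketch illustrates the mapping between user roles and the functionalities they may open. It is only an illustration of the conceptual model; all names (ROLE_FUNCTIONALITIES, verify_profile, visualize_data) are hypothetical and are not part of the APOGEE platform.

ROLE_FUNCTIONALITIES = {
    "game_designer": {"Game Designer Dashboard", "Visualize Player Dashboard", "Visualize Learner Dashboard"},
    "game_player_learner": {"Visualize Player Dashboard", "Visualize Learner Dashboard"},
    "administrator": {"Game Designer Dashboard", "Visualize Player Dashboard", "Visualize Learner Dashboard",
                      "System Configure", "Storing Profile Game Data", "Gathering Gameplay Data"},
}

def verify_profile(user):
    # Mimics the Verify Profile step: a registered profile with a role is required.
    if not user.get("registered"):
        raise PermissionError("The user must first register through Manage Account")
    return ROLE_FUNCTIONALITIES.get(user["role"], set())

def visualize_data(user, dashboard):
    # Mimics the Visualize Data entry point with its role check.
    if dashboard not in verify_profile(user):
        raise PermissionError(f"Role '{user['role']}' cannot open '{dashboard}'")
    print(f"Rendering {dashboard} for {user['name']}")

designer = {"name": "designer01", "role": "game_designer", "registered": True}
visualize_data(designer, "Game Designer Dashboard")   # allowed for this role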
3 Conclusion and Future Work
Thanks to the instrument for visualization of gaming and learning results from maze video games for education, the data visualized to stakeholders will serve to extract valuable information and knowledge. This is useful for the creators of educational video games in the creation and design of educational maze video games, for players and learners with respect to their results, success, effectiveness and efficiency, and for all other stakeholders. The instrument will support the overall process of making informed and strategic decisions based on the visualized gaming and learning results of educational video games through the various data visualization methods and user dashboards. Further development of the presented conceptual models and their extension with additional functionalities is planned as future work. All this will contribute to the future design and development of the instrument, as well as its integration into the APOGEE platform and conducting experimental studies on the instrument's usability.
Acknowledgements. This research is supported by the Bulgarian Ministry of Education and Science under the National Program “Young Scientists and Postdoctoral Students – 2”.
References 1. Adams, E.: Fundamentals of Game Design, Third Edition. New Riders, Pearson Education (2014). ISBN: 978-0-321-92967-9 2. Abt, C.: Serious Games. University Press of America (1987) 3. Kara, N.: A systematic review of the use of serious games in science education. Contemporary Educ. Technol. 13(2), ep295 (2021) 4. Laamarti, F., Eid, M., Saddik, A.: An overview of serious games. Int. J. Comput. Games Technol. 2014, Article ID 358152 (2014) 5. Aleksieva-Petrova, A., Dorothee, A., Petrov, M.: A survey for policies and strategies for ICT implementation in the learning process. In: Proceedings of the 12th International Technology, Education and Development Conference, Valencia, Spain, 5–7 March 2018, pp. 192–197, (2018) 6. Bakan, U., Bakan, U.: Game-based learning studies in education journals: a systematic review of recent trends. J. Actualidades Pedagógicas, (72), 119–145 (2018), ISSN: 0120–1700 7. Velaora, C., Dimos, I., Tsagiopoulou, S., Kakarountas, A.: A Game-based learning approach in digital design course to enhance students’ competency. Information 13, 177 (2022) 8. Darwesh, A.: Concepts of serious game in education. Int. J. Eng. Comput. Sci. 4(12) (2015). ISSN:2319-7242 9. Boyle, E., et al.: An update to the systematic literature review of empirical evidence of the impacts and outcomes of computer games and serious games. J. Comput, Educ. 94, 178-192 (2015) 10. Papp, D., Gy˝ori, K., Kovács, K.E., Csukonyi, C.: The effects of video gaming on academic effectiveness of higher education students during emergency remote teaching. Hungarian Educ. Res. J. 12(2), 202–212 (2022) 11. Bontchev, B., Vassileva, D., Aleksieva-Petrova, A., Petrov, M.: Playing styles based on experiential learning theory. Comput. Hum. Behav. 85, 319–328 (2018) 12. Terzieva, V.; Bontchev, B.; Dankov, Y.; Paunova-Hubenova, E.: How to Tailor Educational Maze Games: The Student’s Preferences. Sustainability 2022, 14, 6794, (2022) 13. Rüth, M., Birke, A., Kaspar, K.: Teaching with digital games: How intentions to adopt digital game-based learning are related to personal characteristics of pre-service teachers. Br. J. Edu. Technol. 53, 1412–1429 (2022) 14. Amanatidis, N.: Augmented reality in education and educational games-implementation and evaluation: a focused literature review. Computers Children, 1(1), em002 (2022) 15. Zirawaga, V.S., Olusanya, A.I., Maduku, T.: Gaming in education: using games as a support tool to teach history. J. Educ. Pract. 8, 55–64 (2017) 16. Utoyo, A.W.: Video games as tools for education. J. Game, Game Art Gamif. 03(02) (2018) 17. Bontchev, B.: Rich educational video mazes as a visual environment for game-based learning. In: CBU International Conference on Proceedings., Prague, Czech Republic, vol. 7, pp. 380– 386 (2019) 18. Andreeva, A.: Colorful and general-artistic aspects of architecture and design. viewpoints. in aesthetic achievements of the exhibition activities of Technical University—Sofia 2009–2019; Technical University: Sofia, Bulgaria, vol. 1, pp. 78–96 (2019) 19. Andreeva, A.: Conceptual requirements for non-traditional exhibition spaces. architectural, design and art aspects. Bulgarian J. Eng. Des. (40), 53–58 (2019). Mechanical Engineering Faculty, TU-Sofia, ISSN:1313-7530 20. Dankov, Y., Birov, D.: General architectural framework for business visual analytics. In: Shishkov, B. (ed.) BMSD 2018. LNBIP, vol. 319, pp. 280–288. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94214-8_19
21. Bontchev, B., Vassileva, D., Dankov, Y.: The APOGEE software platform for construction of rich maze video games for education. In: Proceedings of the 14th International Conference on Software Technologies (ICSOFT 2019), Prague, Czech Republic, 26–28 July 2019, pp. 491– 498 (2019) 22. Bontchev, B., Terzieva, V., Paunova-Hubenova, E.: Personalization of serious games for learning. Interactive Technology and Smart Education, Emerald 18(1), 50–68 (2020). ISSN: 1741-5659
Mobile and Cooperative Agent Based Approach for Intelligent Integration of Complex Data
Karima Gouasmia1(B), Wafa Mefteh2, and Faiez Gargouri1
1 MIRACL Laboratory, University of Sfax, Sfax, Tunisia
[email protected], [email protected]
2 RISC-ENIT Laboratory, University of Tunis-Manar, Tunis, Tunisia
[email protected]
Abstract. For many years, data integration has been a delicate task in the data-warehousing process. Indeed, the collected data (from various applications and existing in different forms) must be homogenized to meet several needs such as analytical activities. Today, organizations collect a huge mass of data which becomes more and more complex. Collected data have different types (text, video, image…) and are located in heterogeneous and dispersed sources. The complexity and the dispersion of this data make their integration a difficult task that necessitates the use of efficient techniques and powerful tools in order to provide a unified data source. Our objective is to take advantage of agent software technology, in particular cooperative agents and mobile agents, to perform the integration phase of complex data. This paper gives an overview of related work and presents a new approach for intelligent integration of complex data based on cooperative and mobile agents. Keywords: Cooperative agent · Mobile agent · Intelligent data integration
1 Introduction
Currently, we are in the era of the digital revolution. Every day, human society generates billions of gigabytes of data from various sources of information such as sensors, social networks, online shopping, etc. For this reason, our data production is growing dramatically [3]. Due to the explosion of data, the need to process and especially to analyze these data increases sharply. Nowadays, data analysis is becoming very important with technological advances, as it is able to produce very interesting information satisfying the requirements of the users [3, 15]. On the other hand, data sources are diverse and heterogeneous in format and structure [18]: semi-structured sources (XML documents, email), structured sources (relational databases) and unstructured formats (Word documents, videos), in addition to the distribution of sources and the huge amount of collected data. All these factors make the data more complex and their analysis certainly very difficult [14]. Thus, many researchers work on improving the analysis of this data,
through a computer process called data integration in order to homogenize the data and give them an understandable meaning. They aim to process data from various sources and in different formats, adapt it and load it to a desired destination [5]. In short, data integration is a delicate and valuable task to efficiently analyze and exploit a large set of data. Also, it is very essential to know that the integration of data is not a standardized solution. It can vary according to specific needs [13]. For this reason, it is necessary for data integration to use powerful tools and advanced technologies to integrate complex data. Several works have been proposed to improve this process using a variety of technologies but we focus on multi-agent system [9], especially mobile and cooperative agents. They are not widely used despite their efficiency that can benefit the process of data integration. A multi agent system is an efficient solution for solving the problem of integrating complex data by subdividing it into smaller tasks. These tasks are assigned to agents that move from one machine to another on the network and cooperate with each other to perform a specific task. Using mobile and cooperative agents, the data integration process can be an intelligent, dynamic and parallel process that we need for reliable data processing. In the next section, we give an overview about related works and a synthesis about the main used technologies. Then, we present our new approach for an intelligent integration of complex data.
2 Overview About Related Works Several approaches were proposed to improve the data integration process by using the ETL (Extract Transform Load) tool and various techniques such as ontology, Hadoop and agent technology [4, 5, 7, 18, 19]. A.G. Akinyemi et al. [19] proposed a data integration framework based on Semantic Web techniques and ISO 15926 standard. This framework is used to integrate and query relevant data for decommissioning oil and gas assets. The working process of the proposed framework consists of four steps: pattern definition, data mapping, data storage, and information extraction. The authors expressed that the proposed solution helps to improve work efficiency. Liu Xin et al. [18] propose an ontology-based data integration system (OPSDS). OPSDS provides a semantically richer global ontology and query based data access. The bottom layer of the OPSDS architecture consists of a set of data sources, such as Oracle, SQL Server, etc. The middle layer consists of local ontologies extracted from the data sources below. The global ontology is then formed as a result of the combination of local ontologies. Ontology is used as a technique capable of solving the problem of semantic heterogeneity with a minimum of energy and a reduced cost. Mahfoud Bala et al. [4] developed a platform called P-ETL (Parallel-ETL). This platform is intended for the integration of large data according to the MapReduce paradigm. A P-ETL is presented in five phases: Extracting, Partitioning, Transforming, Reducing and Loading. ETL and MapReduce allowing to perform a data integration process with better scalability and low processing rate. G. Jayashree et al. [7] used the ETL process and Extensible Markup Language (XML) to overcome data integration (DI) challenges. The XML language capable of organizing data based on graphical and hierarchical representations. For this reason, the authors
considered that XML is the best solution to solve the problem of organizing data in a common syntax. F. Clerc et al. [5] proposed a new approach to complex data integration, called SMAIDoC. This approach is based on a multi-agent system. SMAIDoC consists of a set of agents offering the different services needed for the integration of complex data, each service generating a product. In spite of the originality of this approach, it presents several defects. Indeed, the user has to do everything manually. The architecture does not help to look for objects or to extract the parameters of these objects automatically or semi-automatically. The user looks manually for the objects to be extracted and gives the parameters of the objects looked for. This is an enormous defect, because the architecture does not perform the main task for which it was created, which is the data extraction. The DataAgent, which is responsible for the extraction, in fact only allows the capture of data given by the user; this agent represents just a simple interface. In addition, the architecture does not validate the generated XML files (Table 1).
Table 1. Related works about data integration: Synthesis
Technique | Approach | Critics
Ontology | [18] | +Save energy; +Lower the costs; −More difficult to use
Semantic Web, ISO 15926 | [19] | +Lower the costs; +Benefit the environment
Simple ETL, MapReduce | [4] | +Better scalability; +Decrease processing time; −Parallelism only at the process level
ETL, XML | [7] | +Provides a flexible, robust, and easy common data model
Multi Agent System | [5] | −Manually search for objects to extract; −Manually extract data; +Facilitates the data integration process
In our work we use the ETL tool to ensure better data integration. It consists of three operations [1]:
– Extraction: this is the operation in which the data is extracted from various sources in different formats. This operation is also called the reading process.
– Transformation: this is the most important operation, in which a set of transformations is applied to the data: filtering, joining, dividing, sorting, etc. The purpose of this phase is to ensure the homogeneity of the data.
– Loading: after the transformation, the data is loaded into the desired location, usually in the data warehouse (DW).
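As a simple illustration of these three operations, the sketch below shows a minimal, self-contained ETL pipeline in Python. It is only a toy example, assuming a small CSV source file (sales.csv) with "customer" and "amount" columns and an SQLite table standing in for the data warehouse; it is not the agent-based platform proposed later in this paper.

import csv
import sqlite3

def extract(path):
    # Extraction: read the raw records from the source file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transformation: filter incomplete rows and homogenize the formats
    clean = []
    for row in rows:
        if row.get("amount"):
            clean.append({"customer": row["customer"].strip().title(),
                          "amount": float(row["amount"])})
    return clean

def load(rows, db="warehouse.db"):
    # Loading: write the homogenized rows into the target table (the DW)
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    con.commit()
    con.close()

load(transform(extract("sales.csv")))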
In recent years, many works have focused on the ETL process. Mondal et al. [12] exploited machine learning in the ETL process to solve problems encountered in traditional data warehouses such as data availability and quality. Machine learning is used precisely in the data pre-processing stage for the purpose of processing and cleaning the data before loading it into the data warehouse. Ramazan et al. [17] proposed an agent-based ETL platform whose goal is to provide a reliable mechanism for extracting, transforming and loading heterogeneous data from various sources into the data warehouse. In this platform each agent performs a specific task and cooperates with the other agents to ensure the overall goal by respecting the format and semantics of the data. The use of a multi-agent system makes the ETL process more reliable, flexible and fast. Matija et al. [16] proposed a new architecture for an ETL workflow generator. The main idea of this solution is to automatically integrate information about ontology-based mapping and transformation into traditional ETL tools. The proposed architecture indicates that the ETL Workflow Builder communicates with four parts: Sources (databases, files, etc.), Destination (the data warehouse), Extract Semantic Knowledge (the mappings and the transformations to be performed based on the source, destination and domain ontology), and Current ETL tools (Talend Open Studio, Pentaho). The proposed architecture is message based and capable of processing several messages at the same time. In order to improve the traditional ETL process, Rahul Bagave [2] implemented a new P-ECTL (Parallel-Extract Clean Transform Load) framework based on Apache Spark. This framework makes it possible to extract data on a multi-node cluster, in order to solve the problem of massive data processing, and thus improves the ETL extraction task (Table 2).
Table 2. Related works about the ETL process: Synthesis
Technique | Approach | Critics
Ontology | [16] | +Parallel processing; +Flexibility; −Manual setup
Machine learning | [12] | +Real-time accessibility; +Improve data quality
Multi Agent System | [17] | +Flexibility and reliability; −Too limited to data types
Apache Spark | [2] | +Minimize the time consumption; +Accuracy
3 A Mobile Cooperative Agent Based Approach for Intelligent Integration of Complex Data
Given the need to develop a robust data integration approach that can provide reliability, a minimum risk of error and minimum processing time, we propose a new complex data integration system based on mobile and cooperative agents, with additional potential such as mobility, scalability and efficiency [6]. Our approach is a multi-agent architecture (Fig. 1) which consists of four static agents (InteractAgent, MonitorAgent, LoadAgent and ProcessAgent) and two mobile agents (ExtractAgent and ResearchAgent) that interact and cooperate together to realize the ETL data integration process. To develop our approach and build our system, we followed the steps proposed by the ADELFE 3.0 [11] methodology for the construction of adaptive multi-agent systems with emerging functionality based on simulation [8] (Fig. 2). We have simulated the different tasks of the ETL process as services given by specific agents.
Fig. 1. A multi agent architecture for complex data integration
The user sends his demand to InteractAgent, who plays the role of a mediator. ResearchAgent is a mobile agent capable of migrating to the sources in order to search for the requested objects. If the user requests data from an external source, ResearchAgent travels over the network to retrieve sources containing data that suit the user's request. In the other case, where the user requests data from an internal source, ResearchAgent moves to the system files and looks for data consistent with the user's request. After performing its search, ResearchAgent sends the result to InteractAgent. ExtractAgent is responsible for the meta-data extraction. If a single location is involved, it moves there; if there is more than one location, it clones itself, moves towards one location and sends its clones to the other locations. The extracted parameters are transformed into an XML document which is then loaded into the data warehouse. MonitorAgent is responsible for handling the problems encountered during the execution of the various activities.
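The listing below gives a very simplified, framework-free Python sketch of this cooperation. It is illustrative only: the agent names follow the architecture above, but a plain thread-and-queue pipeline is used instead of a real mobile-agent platform, the source catalogue is invented, and no actual migration or cloning across machines takes place.

import queue
import threading

CATALOG = {"src_A": "computer science", "src_B": "biology"}   # invented data sources
search_q, extract_q, load_q = queue.Queue(), queue.Queue(), queue.Queue()

def research_agent():
    # ResearchAgent: look for sources that match the user's request
    request = search_q.get()
    sources = [s for s, domain in CATALOG.items() if domain == request["domain"]]
    extract_q.put({"request": request, "sources": sources})

def extract_agent():
    # ExtractAgent: one "clone" per source would extract the meta-data
    job = extract_q.get()
    metadata = [{"source": s, "meta": f"meta({s})"} for s in job["sources"]]
    load_q.put(metadata)

def load_agent():
    # ProcessAgent/LoadAgent: transform the meta-data into XML and load it
    metadata = load_q.get()
    xml = "<results>" + "".join(f"<item src='{m['source']}'/>" for m in metadata) + "</results>"
    print("Loaded into the warehouse:", xml)

# InteractAgent: receives the user demand and triggers the pipeline
search_q.put({"domain": "computer science", "types": ["video", "image", "document"]})
for worker in (research_agent, extract_agent, load_agent):
    t = threading.Thread(target=worker)
    t.start()
    t.join()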
Fig. 2. ADELFE3.0 [11]
The Simulation Based Design (SBD) activities offered by ADELFE 3.0 (using the SeSAm simulation tool) (Fig. 3) helped us to better see the result offered by our architecture and to adjust the behaviour of the agents after the execution of each simulation, before a real implementation. In SeSAm, agent behaviour is implemented in the form of an activity diagram, made up of a set of activities and transitions between these activities. The analysis of the results given by SeSAm (Fig. 4) indicates the number of requests processed over time. The horizontal axis shows the step number and the vertical axis shows the number of demands.
In the first simulation (left-hand plot), two demands have been processed at step 45. In the final simulation (right-hand plot), eleven demands have been processed at step 45, which indicates that the agents are able to process a request without spending too much time. Based on the simulation results, we can validate the designed architecture, because the agents are not blocked and they are able to solve the data integration problem reliably.
Fig. 3. Data integration system model
Fig. 4. Number of demands processed over time
4 Discussion
Our main objective is to minimize the processing time and the risk of error, thus ensuring reliable data integration. Figures 5 and 6 show the implementation of a real query on our data integration system. The query: Domain: Computer science, Keyword: Computer science, Type: videos, images, documents, Source: Internal or External (via Internet).
According to the implementation, we note that our solution minimizes the risk of error because each agent performs a specific task to solve the general task of the ETL process cooperatively; it is therefore less difficult to identify errors at the different stages of data integration. Also, our solution offers the service "validation of data sources", which allows the user to select the valid sources according to his request. This service minimizes the risk of error and increases reliability and accuracy, and so makes our approach more reliable than other solutions. The tasks of extraction, research and transformation are done automatically by ExtractAgent, ResearchAgent and ProcessAgent, respectively, unlike the SMAIDoC approach [5], in which the user has to do everything manually.
Fig. 5. Complex data integration system
The multi-agent system is capable of solving the problem of integrating complex data in a reliable manner by subdividing the global activity into smaller and independent tasks. These tasks are allocated to intelligent agents. Each deals with a specific task and cooperates with the other agents. This minimizes the processing time of a data integration request and reduces the error rate. For this reason, we found that our solution is capable of providing more reliable and less difficult complex data integration compared to other techniques [18].
Fig. 6. The extracted meta-data.
5 Conclusion and Future Scope
In this article, we proposed a new approach to integrating complex data based on mobile and cooperative agents. This approach is a multi-agent architecture in which agents cooperatively perform the tasks of the ETL data integration process. According to the obtained results, we deduce that our system is able to solve the data integration problem in a more intelligent and less difficult way. It is able to process a request for the integration of complex data in minimum time and also with a minimum error rate. In future work, we plan to enhance the agent behaviour by adopting a self-design and learning cooperative agent model (S-DLCAM) [10] in order to give our agents the ability to learn from their experiences and so improve their behaviour and performance.
References 1. Amil, A., Ilham, A., Usman, S.: Performance analysis of extract, tranform, load (etl) in apache hadoop atop nas storage using iscsi. In: International Conference on Computer Applications and Information Processing Technology (2017) 2. Bagave, R.: Enhancing extraction in etl flow by modifying as p-ectl based on spark model. National College of Ireland (2020) 3. Bala, M., Alimazighi, Z.: Etl process modeling in a mapreduce model. Maghreb Conference on Advances in Decision-Making Systems, 2013
4. Bala, M., Mokeddem, O., Boussaid, O., Alimazighi, Z.: Parallel and distributed etl platform for massive data integration. In: International Conference on Extraction and Knowledge Management (2015) 5. Clerc, F., Duffoux, A., Rose, C., Bentayeb, F., Boussaid, O., Smaidoc: a multi-agent system for the integration of complex data. In: International Conference on Industrial Applications of Holonic and Multi-Agent Systems HoloMAS: Holonic and Multi-Agent Systems for Manufacturing (2003) 6. Dorri, A., Kanhere, S.S., Jurdak, R.: Multi-agent systems: a survey. In: IEEE. Translations and Content Mining are Permitted for Academic Research Only (2018) 7. Jayashree, G., Priya, C.: Data integration with xml etl processing. In: International Conference on Computing, Engineering and Applications (2020) 8. Mefteh, W.: Simulation-based design: Overview about related works. Mathematics and Computers in Simulation (2018) 9. Mefteh, W., Mejri, M.-A.: Complex systems modeling overview about techniques and models and the evolution of artificial intelligence. In: World Conference on Information Systems and Technologies (2020) 10. Mefteh, W., Migeon, F., Gleizes, M.-P., Gargouri, F.: S-dlcam: a self-design and learning cooperative agent model for adaptive multi-agent systems. In: Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (2013) 11. Mefteh, W., Migeon, F., Gleizes, M.-P., Gargouri, F.: Adelfe 3.0 design, building adaptive multi agent systems based on simulation a case study. In: Computational Collective Intelligence (2015) 12. Mondal, K.C., Biswas, N., Saha, S.: Role of machine learning in etl automation. In: International Conference on Distributed Computing and Networks (2020) 13. Ostrowski, D., Kim, M.: A semantic based framework for the purpose of big data integration. In: International Conference on Semantic Computing (2017) 14. Riani, M.: Problems and challenges in the analysis of complex data: static and dynamic approaches. In: Part of the Studies in Theoretical and Applied Statistics book series (STAS) (2012) 15. Shelake, V.M., Shekokar, N.: A survey of privacy preserving data integration. In: International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (2017) 16. Novak, M., Kermek, D., Magdaleni, I.: Proposed architecture for ETL workflow generator. In: Central European Conference on Information and Intelligent Systems (2019) 17. Talib, R., Hanify, M.K., Fatimaz, F., Ayesha, S.: A multi-agent framework for data extraction, transformation and loading in data warehouse. In: International Journal of Advanced Computer Science and Applications (2016) 18. Liu, X., Hu, C., Huang, J., Liu, F.: Opsds: a semantic data integration and service system based on domain ontology. In: IEEE First International Conference on Data Science in Cyberspace (2016) 19. Akinyemia, A.G., Suna, M., Gray, A.J.G.: Data integration for offshore decommissioning waste management. Autom. Constr. (2020)
Euler Transformation Axis Method for Online Virtual Trail Room Using Fusion of Images
B. Surendiran1(B), Dileep Kumar1, S. Amutha2, and N. Arulmurugaselvi3
1 Department of CSE, National Institute of Technology Puducherry, Karaikal, India
[email protected]
2 Vellore Institute of Technology, Chennai, India
3 GPTG, Gandharvakottai, India
Abstract. In recent days, most people prefer online shopping rather than going to the shop and buying goods. While buying clothes on E-commerce sites, customers don’t have an overview of how the clothing looks on them, but trail rooms are available in shops. A new software application was developed using OpenCV named Online Virtual Trail Room (OVTR) to explain how the dress looks on the customer who purchases online. This paper aims to create an OVTR proposing a new Euler Transformation axis algorithm that doesn’t require a web camera or front camera. The proposed methodology uses the Viola and Jones method for face detection and eyes detected in the ROI of the detected face. The proposed virtual trail room fusion method can be implemented in any web based platform using ordinary mobile camera and avoids the need of complex systems like 3D rendering or Augmented reality. Keywords: Online virtual trial room · OpenCV · Virtual fitting room · Viola and Jones method · Euler Transformation Axis
1 Introduction
Buying wearables online is always a gamble, since you never know how something would appear on you. Furthermore, buying clothing or decorations from businesses that do not sell online takes a long time, since you have to find the store first, enter the trial room and try on each outfit. All the virtual trail rooms that have been developed until now work based on video tracking. They require a web camera or a front camera to capture live pictures of the customer. The main problem lies in detecting the customer's face. This can be done with the help of either OpenCV or PHP-Face detect. PHP-Face detect cannot detect the eyes of the customer, but OpenCV can; hence all Online Virtual Trail Rooms (OVTR) use OpenCV. Here we aim to develop a virtual trail room using OpenCV that doesn't require a web camera or front camera. All the existing trial rooms use a video tracking methodology for the detection of the customer's face; video tracking takes more time and creates a load on the server. Some use augmented reality or 3D rendering, which are highly complex and costly to implement for all retailers. So, some OVTRs restrict video tracking once the customer's face
is detected, which is not user-friendly, as the customer has to adjust themselves to fit the clothing. And this works only when a web camera or a front camera is present. This paper proposes a new algorithm that requires only a customer photo and doesn't depend upon video tracking, a web camera, or a front camera. The paper is organized as follows: Sect. 2 covers the literature review. The proposed methodology and implementation details are discussed in Sects. 3 and 4, respectively. Section 5 discusses the conclusion.
2 Literature Review
The Augmented Fitting Room Using Kinect and Cloth Simulation method is used in the Intelligent Mirror for Fitting Room [1]. This work aims to create an intelligent mirror that can enhance the appearance of a client standing in front of it by displaying several outfits tailored to their physique. In particular, the client may freely position themselves in front of the mirror, such as turning around to look at his or her back and side views, and the fitting garments will remain aligned with the body postures in real time. A few research methods that use Kinect sensor systems are A Real-Time Virtual Dressing Room App using the Kinect [2], Virtual trial room [3], Virtual Try-on using Kinect system & HD camera [4], Virtual Fitting Room using Webcam [5], and Kinect-Based Virtual Try-on System [6]. Ari Kusumaningsi et al. [7] developed a virtual dressing room using augmented reality with a 3D model using the OpenNI library and the Kinect SDK. The virtual fitting room in [8] employs augmented reality with 3D geometrical modeling, spatial analytic geometry, and depth acquisition algorithms among the techniques used. A method for adding geometry pictures to cloth simulation claims to generate cloth motion for surface meshes of any genus while preserving the simplicity of a mass-spring concept. To increase speed, that work uses an implicit/explicit integration technique that exploits the regular structure of geometry pictures. The cloth can also drape over other objects, which are likewise represented as geometric pictures. The method is fast enough to simulate relatively thick fabric meshes in real-time. The use of 3D technologies to analyze garment fit to determine the usefulness of pattern customization technologies is also the subject of this study. Virtual dressing rooms [9] are changing the shape of clothes shopping: a motion-sensing gadget or a vertical wand containing tiny antennae is used, and augmented reality is used to identify a person's unique shape. One of the main disadvantages of online purchasing is returns, which can be eliminated with this technology; more than 20% of online clothing orders are returned. Consumers may soon be able to scan themselves in their living rooms before clicking "buy" on their display, thanks to sizing software developed for home motion-sensing devices like the famous Microsoft Kinect. Using criteria supplied by garment manufacturers, the software can determine if a dress will fit like a tent or a tourniquet before a customer ever takes it off the rack. In the Application of Virtual Ornaments and Fabric Try-on Reality [10], the user's body is detected and sized, and reference points are detected using facial detection and augmented reality markers. Clothing is superimposed over the user's image. Augmented reality, virtual reality, human-friendly interfaces, and facial identification are some of the methods employed. This project aims to create Virtual Fitting Room (VFR) software that
can operate on an embedded device with network connection and camera setup. This VFR can improve the way clients buy offline by selecting the appropriate style and size of clothing/ornament. It may be utilized with a Personal Computer or a Beagle Bone, allowing consumers to shop for new clothing offline more efficiently. A Depth data based Virtual Dressing Room [11] can be classified as augmented reality (AR), in which a realtime image of reality is expanded and layered with extra information. Image processing in Virtual Dressing Room is the method utilized, and it provides a solution for the issues mentioned above. The program is built around a mirror with display that displays the camera’s picture. The image captured by the camera is then virtually overlaid with the selected dress. In general, this approach falls under the category of augmented reality (AR), in which a real-time view of reality is enhanced and overlaid with extra data. Although a home setup is feasible, this study primarily focuses on applications in textile stores.
3 Proposed Methodology
A new software application named Online Virtual Trail Room (OVTR) is developed using OpenCV to show how a dress looks on the customer who buys online. This work aims to create an OVTR by proposing a new algorithm that doesn't require a web camera or front camera. The proposed methodology uses the Viola and Jones method for face detection, and the eyes are detected in the ROI of the detected face. The proposed architecture is shown below in Fig. 1.
Fig. 1. Proposed methodology - OVTR (input customer image → convert to greyscale image → detect face & eyes → calculate face angle & eye distance → scale & rotate image → overlay the customer on the model).
3.1 Algorithm
Detect the face & eyes of the user image using the Viola-Jones method [12].
Step 1. Find the top-left coordinates of the face, height & width (fx, fy, fh, fw).
Step 2. Find the center of the eyes as (exl, eyl) and (exr, eyr).
Step 3. Find the distance between the eyes (d1) and the angle at which the face is inclined to the vertical axis as α.
Step 4. Find the angle of the model face as β and the distance between its eyes as d2.
Step 5. The angle θ = α − β is calculated.
Step 6. Scale the image of the user based on the scaling factor and rotate it by the angle θ.
Step 7. Overlay the extracted face on the model's face such that the left eye coordinates of the model face and of the user's face coincide.
3.2 Methodology
Face & Eye Recognition. The main challenging task in the OVTR system is to detect the customer's face and eyes, from which the further overlaying tasks can be done. Using OpenCV has the benefit of including ready-made classifiers for the face, eyes, and smile, among other things. Here, the input customer image is first converted to a grayscale image, and the face and eyes are detected with the help of the properties of the face and eyes, as shown in Fig. 2.
Fig. 2. Face and Eye detection (the rectangle in blue color denote the detected frontal face and two green rectangles denotes the two eyes in the image).
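A minimal OpenCV sketch of this detection step is shown below; it assumes an input file named customer.jpg, uses the Haar cascade files bundled with the opencv-python package, and employs illustrative parameter values (it also assumes at least one face and two eyes are found).

import cv2

img = cv2.imread("customer.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
fx, fy, fw, fh = faces[0]                          # top-left corner, width and height of the face
face_roi = gray[fy:fy + fh, fx:fx + fw]            # eyes are searched only inside the face ROI
eyes = eye_cascade.detectMultiScale(face_roi)

# centres of the first two detected eyes, in full-image coordinates
eye_centres = [(fx + ex + ew // 2, fy + ey + eh // 2) for (ex, ey, ew, eh) in eyes[:2]]
print("face:", (fx, fy, fw, fh), "eye centres:", eye_centres)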
The face angles of the customer (say α) and the model image (say β) and the distance between the centers of the eyes can be calculated as:
a = eye_x_right − eye_x_left (1)
b = eye_y_right − eye_y_left (2)
angle = tan⁻¹(b/a) (3)
eyes_distance = √(a² + b²) (4)
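Equations 1–4 translate directly into a few lines of Python. The small helper below is an illustrative sketch that takes the two eye centres obtained from the detection step; math.atan2 is used instead of a plain tan⁻¹(b/a) only to avoid division by zero when the eyes are vertically aligned.

import math

def face_angle_and_eye_distance(left_eye, right_eye):
    # Eqs. 1-4: horizontal and vertical offsets, inclination angle and eye distance
    a = right_eye[0] - left_eye[0]
    b = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(b, a))
    distance = math.hypot(a, b)
    return angle, distance

alpha, d1 = face_angle_and_eye_distance((120, 160), (180, 150))   # customer image (example values)
beta, d2 = face_angle_and_eye_distance((300, 210), (360, 210))    # model image (example values)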
Scaling
Image scaling is a computer graphics technique for enlarging or contracting the size of a digital image. It is the technique for resizing an image to suit the circumstance. Because the number of pixels necessary to fill the bigger area is more than the number of pixels in the original image, increasing the size of an image might be more difficult. Here the scaling factor k is identified with the following formula (Eq. 5):
scaling factor (k) = customer_eyes_distance / model_eyes_distance (5)
Rotation
In order to make both the customer and model images be at the same angle, the customer image has to be rotated by the angle θ = (α − β). The image is rotated with respect to the center of the image as the origin, and the center-of-eye coordinates and the top-left coordinates of the face are transformed by the angle θ; an affine transformation is used to rotate the image. After rotation & scaling, the coordinates of the image change; using transformation rules [13], we can find the respective coordinate values of the face & eyes. The rotation of the face and eye coordinates is based on the Euler transformation axis, as shown in Eqs. 6 & 7, where (x, y) is the original point and (rotated_x, rotated_y) the rotated one:
rotated_x = x ∗ cos(alpha_radians) − y ∗ sin(alpha_radians) (6)
rotated_y = x ∗ sin(alpha_radians) + y ∗ cos(alpha_radians) (7)
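A possible OpenCV realization of the scaling and rotation steps is sketched below. It is only an illustration under assumptions: the customer image is resized so that its eye distance matches the model's (i.e. by the inverse of the factor k in Eq. 5), and cv2.getRotationMatrix2D with cv2.warpAffine applies the rotation of Eqs. 6 & 7 about the image centre; all variable names are illustrative.

import cv2

def scale_and_rotate(customer_img, alpha, beta, customer_eye_dist, model_eye_dist):
    # Eq. 5: scaling factor k; resizing by 1/k makes the customer's eye distance
    # equal to the model's eye distance.
    k = customer_eye_dist / model_eye_dist
    h, w = customer_img.shape[:2]
    resized = cv2.resize(customer_img, (int(w / k), int(h / k)))

    # Rotate by theta = alpha - beta (in degrees) about the image centre; the
    # affine matrix applies the rotation of Eqs. 6 & 7 to every pixel coordinate.
    theta = alpha - beta
    centre = (resized.shape[1] // 2, resized.shape[0] // 2)
    matrix = cv2.getRotationMatrix2D(centre, theta, 1.0)
    return cv2.warpAffine(resized, matrix, (resized.shape[1], resized.shape[0]))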
4 Implementation Details
The implementation has been carried out successfully using the Python language. All the built-in essentials are installed, and the platform used is OpenCV, an open-source computer vision and machine learning software library. Several existing virtual trial room methods, such as a 3D virtual trial room with augmented reality [14], a virtual dressing environment [15], a virtual try-on [16], and a Virtual Dressing Room with Web Deployment [17–20], were developed with OpenCV as an essential platform. The Django framework is used. It is an MVC (Model View Controller) framework that promotes the component reusability and pluggability concept. MySQL is the backend for the web interface built. The results of the proposed OVTR are shown in Fig. 3. It represents the customer's image, the model image, and the final overlaid image. With the proposed model and the Viola-Jones method, the output virtual trail image has been generated effectively. The proposed OVTR application interface (Fig. 4) has been designed in a user-friendly manner where all the steps are clearly mentioned. This interface allows the customer to create an account, upload the customer image along with the selected product, and try the product virtually.
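The final fusion step (Step 7 of the algorithm) can be approximated with plain array slicing, pasting the scaled and rotated face region onto the model image so that the two left-eye coordinates coincide. The helper below is a simplified sketch only: it assumes the face patch fits entirely inside the model image and omits any blending or feathering of the border.

def overlay_face(model_img, face_img, model_left_eye, face_left_eye):
    # Top-left corner of the pasted face patch so that both left eyes coincide
    x0 = model_left_eye[0] - face_left_eye[0]
    y0 = model_left_eye[1] - face_left_eye[1]
    h, w = face_img.shape[:2]
    fused = model_img.copy()
    fused[y0:y0 + h, x0:x0 + w] = face_img   # both images are NumPy arrays as returned by cv2.imread
    return fused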
Fig. 3. Customer image, Product image and Overlaid image.
Fig. 4. OVTR- Application interface.
5 Conclusion
In this work, an online virtual fitting room using face overlay was implemented successfully. Here, the main objective is to create an OVTR by proposing a new algorithm that doesn't require a web camera or front camera. Users have to upload their image and click try-on to see their image overlaid on the model of their choice. The customer image is converted to greyscale mode, which makes it easy to detect the face and eyes. Compared to existing approaches such as the augmented fitting room and the virtual trail room using depth data, which
needs a front camera, it’s an optimal solution for an online virtual trial room. In addition, the proposed Euler transformation algorithm can track and scale the clothing according to the user’s face and eye distance. Overall, the presented Online Virtual Trail Room found to be effective solution for a quick, simple and easy virtual trail for clothing.
References 1. Kit, N.K.: Intelligent Mirror for Augmented Fitting Room Using Kinect Cloth Simulation. HCI International (2013) 2. Isıkdogan, F., Kara, G.: A real time virtual dressing room application using Kinect. Comput. Vis. Course Proj., no. January, pp. 1–4 (2015) 3. Paul, V., Sanju Abel, J., Sudharsan, S.: Praveen M"VIRTUAL TRIAL ROOM. South Asian Journal of Engineering and Technolog, vol.3, no.5, pp. 87–96 (2017) 4. Giovanni, S., Choi, Y., Huang, J., Khoo, E., Yin, K.: Virtual try-on using kinect and HD camera. Motion Games SE - 6, 55–65 (2012) 5. Barde, C., Nadkarni, S., Joshi, N., Joshi, S.: Virtual Fitting Room using Webcam, no. 3, pp. 60–64 (2015) 6. Yousef, K.M.A., Mohd, B.J., AL-Omari, M.: Kinect-based virtual try-on system: a case study. In: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), pp. 91–96 (2019). https://doi.org/10.1109/JEEIT.2019.871 7498 7. Kusumaningsih, A., Kurniawati, A., Angkoso,C.V., Yuniarno, E.M., Hariadi, M.: User experience measurement on virtual dressing room of Madura batik clothes. In: Proceedings 2017 International Conference Sustain. Inf. Eng. Technol. SIET 2017, vol. 2018–January, pp. 203–208 2018 8. Pachoulakis, I., Kapetanakis, K.: Augmented reality platforms for virtual fitting rooms. Int. J. Multimed. Appl. (IJMA), vol. 4, no.4, August 2012 9. Chang, A.: Virtual dressing rooms changing the shape of clothes shopping. Los Angeles Times, 23 July 2012 10. Rajaram, G., Anandavenkatesan, B.: Virtual Ornaments and Fabric Try-on Reality Application,” International Journal of Engineering Development and Research, 2014 11. Presle, P.: A virtual dressing room based on depth data. Int. J. Comput. Sci. Trends Technol. (IJCST) 2(2), Mar-Apr 2014 12. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2001) 13. OpenNI - http://openni.org/. Accessed 2012–23–02. http://en.wikipedia.org/wiki/OpenNI 14. Ramesh, A.: 3D Virtual Trial Room, vol. 6, no. 13, pp. 1–4 (2018) 15. Shaikh, F.: Virtual trial room. Int. J. Res. Appl. Sci. Eng. Technol. V(XI), 2101–2104 (2017) 16. Kanse, Y., Konde, D., Halder, S., Dhake, T.: Virtual Trial Room. SSRN Electron. J., pp. 1–4 (2021) 17. Rajan, S.P., Hariprasad, V., Purusothaman, N., Tamilmaran, T.: Virtual dressing room with web deployment. Turkish J. Comput. Math. Educ. 12(7), 2660–2666 (2021) 18. Daniel Shadrach, F., Santhosh, M., Vignesh, S., Sneha, S., Sivakumar, T.: Smart virtual trial room for apparel industry. In: 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), pp. 1–5 (2022). https://doi.org/10.1109/ ICDCECE53908.2022.9793030
19. Davis, D., Gupta, D., Vazacholil, X., Kayande, D., Jadhav, D.: R-CTOS: real-time clothes try-on system using OpenCV. In: 2022 2nd Asian Conference on Innovation in Technology (ASIANCON), pp. 1–4 (2022). https://doi.org/10.1109/ASIANCON55314.2022.9909352 20. Sunaina M, S.S., Manjusha P, L.S.N.J., Kishore, O.: Virtual ornament room using haar cascade algorithm during pandemic. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds.) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore (2023). https://doi.org/10.1007/978981-19-0095-2_24
A Novel Approach for Classification of Real Time Data Stream to Reduce Query Processing Time

Virendra Dani1(B), Priyanka Kokate2, and Jyotsana Goyal3

1 Computer Science and Engineering Department, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India
[email protected]
2 Computer Science and Engineering Department, Shri Govindram Seksaria Institute of Technology and Science, Indore, India
3 Computer Science and Engineering Department, SAGE University, Indore, India
Abstract. Data analysis and decision-making become more difficult as the volume of data grows; in other words, processing large amounts of data requires many resources to analyze it and produce a result. Big data refers to an environment used to process and analyze enormous amounts of data. However, if the traffic is sluggish and the data block size is large, the query answer is generated with a significant delay. To optimize this delayed response, considerable effort must be made to improve the performance of big data systems. This paper proposes a method based on streaming data mining for overcoming this delayed data response. The proposed technique contributes to live Twitter stream collection, data pre-processing and translation of unstructured data into structured data features, and data stream classification utilizing the group (ensemble) learning concept for streamed text data. Even when a single pattern appears for query processing, this method optimizes query processing speed and generates responses in less time.

Keywords: Real Time Streamed Data · Twitter API · MapReduce · HDFS · Hadoop · C4.5 · Decision Tree
1 Introduction

Typically, non-stationary data generating processes result in ordered, potentially endless sequences of data points known as data streams. Data mining techniques that are often used with data streams include clustering, classification, and frequent pattern mining. New algorithms for various sorts of data are suggested on a regular basis, and it is critical to examine them properly under controlled settings.

1.1 Data Streaming

Hundreds of data sources create streaming data continuously, and they generally deliver records in small pieces (on the order of kilobytes). In-game player activity, information from
social networks, financial trading floors, or geospatial services, log files produced by users of mobile or web applications, digital commerce purchases, and telemetry from connected devices or data centre instrumentation are all examples of streaming data [1]. This data must be utilized for a range of analytics, including correlations, aggregations, filtering, and sampling. Using data from this kind of analysis, businesses can gain visibility into many facets of their operations and customer behaviour, including service usage, server activity, website clicks, and the geolocation of devices, people, and physical objects, enabling them to react swiftly to changing circumstances. Businesses may, for example, monitor social media streams daily to spot shifts in public perception of their brands and products and respond quickly when the requirement occurs [3].

1.2 Streamed Data Model

In the data stream concept, part or all of the input data to be processed is sent as one or more continuous data streams rather than accessed randomly from disc or memory [2]. Using the data stream paradigm does not preclude the inclusion of certain data in traditional stored relations. Joins between data streams and stored relational data are frequently performed using data stream queries. As a result, we eliminate any possible transaction-processing difficulties that may develop because of concurrent modifications to stored relations and data stream processing [4, 5].
2 Literature Review

This section lists the many research attempts that have been made to improve query processing time for streaming data, as well as the various approaches that have been used in notable literature. By adapting Hadoop MapReduce-like techniques to graphics processing units (GPUs), M. Mazhar Rathore et al. [6] established a real-time Big Data stream processing approach. The authors designed a MapReduce-like technique for GPUs for statistical parameter computing by dividing overall Big Data files into fixed-size parts. The scaling problem is approached by Michael Borkowski et al. [7] by avoiding superfluous scaling operations and over-compensating reactions to short-term fluctuations in workload. This provides flexibility while also lowering the overhead costs associated with scaling operations. The authors conduct a real-world testbed study to confirm the effects and present a break-even cost analysis to demonstrate the approach's economic viability. Chen Luo et al. [8] focused on data ingestion and query processing for LSM-based storage systems that are utilized for a variety of purposes. The authors first suggest and test a set of improvements for efficient batched point lookups, dramatically extending the applicability of LSM-based secondary indexes. Mahmud Hasan et al. [9] introduced TwitterNews+, an event detection system that uses customised inverted indices and an incremental clustering technique to discover large and small noteworthy events in real time from the Twitter data stream at a minimal computational cost.
The implementations of the de-facto standard Hadoop MapReduce as well as the Apache Spark framework were examined by Khadija Aziz et al. [10]. The authors then use Spark and Hadoop to run experimental simulations to assess a real-time data stream. To bolster their argument, the authors present a comparison of the two implementations in terms of design and performance, followed by a discussion of the simulation findings. For streaming data, Abdulaziz Almaslukh et al. [11] present a comprehensive experimental evaluation of several spatial-keyword index designs. To successfully support streaming data applications, the authors enhance current snapshot spatial-keyword searches with the temporal dimension.
3 Proposed System

Figure 1 illustrates the high-level conceptual model of the proposed approach. A four-stage procedure is used to define the full data model and its processes:
Fig. 1. Layers View of Methods
3.1 Gather Data Stream

Several types of data formats exist, and data mining constantly works with a variety of data types and forms. Based on the present scenario, we divide the data into two main groups: static data and dynamic data. Data that is provided in a set volume and is not increased in a timely way is considered static. Dynamic data, on the other hand, constantly grows with time. Such data is sometimes referred to in the literature as time series data or streaming data. The data from live feeds are collected using an additional method. Twitter is a live stream data source, and Fig. 2 shows the strategy for gathering data from this source. The depicted graphic assumes that Twitter is the source of the stream data. The query API that Twitter offers may be used to obtain data directly from the social media platform. After the user's account access is authenticated, the local Hadoop storage is configured into the Apache Storm architecture.
Fig. 2. Collection of Data Stream
This gives users the option to query the Twitter server for tweets using the Twitter account. A file in the HDFS directory serves as a temporary storage location for the server's answer to the query, which is further employed for categorization purposes.

3.2 Classification of Data

Data categorization is a method for grouping similar patterns of data by identifying the pattern in the data. As a result, the acquired data must be processed to extract certain attributes that will allow the real group of the data to be recognized. A classifier may operate directly on static data to produce a more accurate analysis; however, the categorization process must be modified in accordance with the data source interface. The classifier model must first be discussed before any changes to the classification technique are discussed. The C4.5 decision tree method is used in this experiment for categorization. Because the method only works with structured data, the unstructured data must first be transformed. Figure 3 shows the steps involved in converting data from an unstructured to a structured format. The system receives the collected data as input. The unstructured nature of such data makes it potentially noisy for an algorithm to process. As a result, the text's stop words are first eliminated. Stop words such as "is," "am," "are," "this," "the," and so on regularly appear in sentences without having much influence on the identification of any subject or domain. The procedure described below is performed to remove them (Table 1). Once the stop words have been removed, the data must be tagged using the NLP parser. The NLP parser takes the input and extracts information about the parts of speech. In the following stage, features are extracted from this part-of-speech data: the frequency of each part of speech is calculated and provided in a data table. Table 2 provides an example of the extracted features. For determining the best set of data, the calculated characteristics are used with the decision tree to create a tree data structure. We thus refer to the earlier assertion that we used a binary classification process, leading to the introduction of a new instance of the classifier between two distinct classes. Table 3 describes the stream data categorization procedure. Following the completion of the data categorization, a list of the various groups is created and used in subsequent steps.
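As a rough illustration of the collection step above, the sketch below stages a batch of tweets as a line-delimited JSON file that a downstream Hadoop/Storm job could pick up; fetch_tweets is a hypothetical placeholder for the authenticated Twitter query API call, and the staging directory only stands in for the HDFS path used in the paper.

```python
# Illustrative sketch only: collect one batch of tweets and stage it as a file
# for later parsing and classification. `fetch_tweets` is hypothetical.
import json
import pathlib
import time

def fetch_tweets(query: str, limit: int = 100):
    """Hypothetical helper: return a list of {'id': ..., 'text': ...} dicts."""
    raise NotImplementedError("replace with an authenticated Twitter API call")

def stage_batch(query: str, staging_dir: str = "/tmp/hdfs_staging") -> pathlib.Path:
    batch = fetch_tweets(query)
    out_dir = pathlib.Path(staging_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"tweets_{int(time.time())}.jsonl"
    with out_file.open("w", encoding="utf-8") as fh:
        for tweet in batch:
            fh.write(json.dumps(tweet) + "\n")   # one tweet per line
    return out_file
```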
Fig. 3. Data Parsing
Table 1. Stop Word Removal

Input: collected data D, stop word list Sw
Output: refined data R
Process:
1: for each word w in D
2:    if w is not found in Sw, add w to R
3: end for
4: return R
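A minimal Python counterpart to the procedure in Table 1 might look as follows; the stop-word list here is a small hand-made example, not the list used by the authors.

```python
# Toy stop-word removal: keep only tokens that are not in the stop-word list.
STOP_WORDS = {"is", "am", "are", "this", "the", "a", "an", "and", "so"}

def remove_stop_words(text: str) -> list:
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(remove_stop_words("this is a sample tweet about the new phone"))
# -> ['sample', 'tweet', 'about', 'new', 'phone']
```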
Table 2. Example Features

Noun | Pronoun | Verb | Adverb
20   | 5.8     | 6.1  | 6
50   | 6.2     | 9.7  | 7.7
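The part-of-speech frequencies in Table 2 could be produced along the following lines; the paper does not name its NLP parser, so NLTK is only an assumption here.

```python
# Count nouns, pronouns, verbs and adverbs in a piece of cleaned text.
from collections import Counter
import nltk
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")  # first run only

def pos_features(text: str) -> dict:
    tags = nltk.pos_tag(nltk.word_tokenize(text))
    counts = Counter(tag[:2] for _, tag in tags)      # NN*, PR*, VB*, RB* prefixes
    return {"noun": counts["NN"], "pronoun": counts["PR"],
            "verb": counts["VB"], "adverb": counts["RB"]}

print(pos_features("The patient reported severe nausea yesterday"))
```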
3.3 Storage of Grouped Data

Following data categorization, the various data groupings are kept together in separate big data Hive structures. As a result, each of the many Hive structures contains a comparable set of data. These Hive data are utilized to create queries to locate the necessary data.
Table 3. Classification of Sample

Input: classifier C, pattern data D
Output: grouped data G
Process:
1: while D != null
2:    fetch pattern Di
3:    if a new class appears
      a. if count == 0
         i. initialize a classifier instance
         ii. classify Di using the classifier instance
         iii. update count = count + 1
      b. else if count == 1
         i. classify Di using the classifier instance
         ii. update count = count + 1
      c. end if
4:    G = G ∪ {classified pattern Di}
5: end if
6: i = i + 1
7: end while
8: return G
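To make the classification step concrete, the sketch below trains a decision tree on feature rows shaped like Table 2; scikit-learn's DecisionTreeClassifier with the entropy criterion is used here only as a stand-in for C4.5 (scikit-learn actually implements CART), and the group labels are hypothetical.

```python
# Decision-tree classification of part-of-speech feature vectors.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Rows follow Table 2: [noun, pronoun, verb, adverb] frequencies per tweet batch.
X_train = np.array([[20, 5.8, 6.1, 6.0],
                    [50, 6.2, 9.7, 7.7]])
y_train = ["group_A", "group_B"]           # hypothetical group labels

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print(clf.predict([[45, 6.0, 9.0, 7.5]]))  # likely ['group_B'] on this toy data
```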
3.4 Query Processing

The system receives the query keywords through the query interface and then uses a big data query to locate the necessary entries in the database. Based on the response, the system's performance is calculated and recorded in a database for further analysis.
4 Result Analysis

4.1 Accuracy

The accuracy of a classification approach is calculated by dividing the number of correctly categorized patterns by the total number of patterns created for the classification result. As a result, it may be used as a metric for how well the categorization method was trained. Figure 4 depicts the accuracy of the implemented Live Data Stream Classification method. The figure's X axis depicts the number of times the algorithm was performed during classification and tagging, while the Y axis depicts the results in terms of accuracy (%). According to the obtained findings, the suggested model gives more accurate outcomes. Furthermore, the accuracy of the data categorization model is stable even when the number of runs is varied.
Fig. 4. Accuracy (X axis: number of runs; Y axis: accuracy in %; series: Proposed Method vs. Traditional Decision Tree)
4.2 Error Rate

The error rate of a system is defined as the number of samples misclassified during algorithm classification. The given graph depicts the comparative error rate of the methods: the X axis represents the number of runs, while the Y axis represents the error rate in %.
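Both metrics can be computed directly from the predicted and true labels, for example as in this small sketch (the label vectors are made up for illustration):

```python
# Accuracy (% correctly classified) and error rate (% misclassified).
from sklearn.metrics import accuracy_score

y_true = ["A", "B", "A", "A", "B", "A"]
y_pred = ["A", "B", "B", "A", "B", "A"]

accuracy = accuracy_score(y_true, y_pred) * 100
error_rate = 100.0 - accuracy
print(f"accuracy = {accuracy:.1f}%, error rate = {error_rate:.1f}%")
# accuracy = 83.3%, error rate = 16.7%
```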
Fig. 5. Error Rate (X axis: number of runs; Y axis: error rate in %; series: Proposed Method vs. Traditional Decision Tree)
Figure 5 depicts the applied classification method’s comparative error rate. Throughout multiple executions, the proposed classification performs well and efficiently, and as the volume of data rises, performance decreases. As a result, the suggested classifier performs more accurately and efficiently than the default C 4.5 data categorization approach.
4.3 Memory Utilization

The amount of system memory used is sometimes referred to as the system's space complexity when describing algorithm performance (Fig. 6).
Fig. 6. Memory Utilization (X axis: number of runs; Y axis: memory usage in KB; series: Proposed Method vs. Traditional Decision Tree)
The quantity of memory used is determined by the amount of data in main memory, which has an impact on the computational cost of running an algorithm. The X axis depicts the number of algorithm tests, while the Y axis depicts relative memory usage during execution in kilobytes (KB). As the figure shows, the memory footprint of both implementations reflects the system's complexity, with the standard C4.5 requiring less space than the suggested data stream classification method.

4.4 Time Consumption

The amount of time required to categorize all the live stream data is known as the time consumption. The time consumption of the proposed algorithm is given in Fig. 7. According to the comparative results analysis, the suggested approach minimizes time consumption. As a result, our suggested method of stream data categorization of Twitter data, employing an augmentation of the C4.5 decision tree, is better and more efficient.
Fig. 7. Time Consumption (X axis: number of runs; Y axis: time in milliseconds; series: Proposed Method vs. Traditional Decision Tree)
5 Conclusion

Normal computing approaches cannot effectively and efficiently process enormous amounts of data. As a result, the need for fresh approaches to improving system reaction time and computation efficiency has emerged. Big data infrastructure is used to process data more effectively. The main problem with this technique is that it does not respond until the entire data block has been processed, so the model delays the data response even when there is little data. To decrease the time it takes for user queries to be answered, it is necessary to develop a new method that operates on data streams and provides frequent responses to user queries. The "few data may respond quicker" hypothesis is applied to find a solution for the problem being addressed. The Twitter live stream is chosen since the suggested approach is based on streaming text data. The Storm infrastructure and the Twitter API are used to collect such data. Using NLP tagging and the TF-IDF approach, feature extraction is first carried out on the obtained data. The data are then divided into groups using the ensemble-based classification approach. These categorized or grouped data are immediately stored in the HIVE structured database. The conserved data may be utilized with any rapid-response application or the query interface to provide the end user answer.
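As a hedged illustration of the TF-IDF step mentioned above, assuming already cleaned tweet text, scikit-learn's TfidfVectorizer could be used along these lines:

```python
# TF-IDF feature extraction over a few toy tweets.
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["new phone launch today",
          "phone battery drains fast",
          "great match last night"]
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(tweets)   # sparse (n_tweets x n_terms)
print(vectorizer.get_feature_names_out())
print(tfidf_matrix.shape)
```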
References

1. Wares, S., Isaacs, J., Elyan, E.: Data stream mining: methods and challenges for handling concept drift. SN Appl. Sci. 1(11), 1–19 (2019). https://doi.org/10.1007/s42452-019-1433-0
2. What is Streaming Data? https://aws.amazon.com/streaming-data/
3. Hahsler, M., Bolanos, M., Forrest, J.: Introduction to stream: an extensible framework for data stream clustering research with R. J. Stat. Softw. 76, 1–50 (2015)
4. Reddy, P.B., Kumar, C.H.S.: A simplified data processing in MapReduce. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 7(3), 1400–1402 (2016)
5. Rammer, D., Pallickara, S.L., Pallickara, S.: Atlas: a distributed file system for spatiotemporal data. In: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pp. 11–20 (2019)
6. Rathore, M.M., Son, H., Ahmad, A., Paul, A., Jeon, G.: Real-time big data stream processing using GPU with spark over hadoop ecosystem. Int. J. Parallel Prog. 46(3), 630–646 (2018)
7. Borkowski, M., Hochreiner, C., Schulte, S.: Minimizing cost by reducing scaling operations in distributed stream processing. Proc. VLDB Endow. 12(7), 724–737 (2019)
8. Luo, C., Carey, M.J.: Efficient data ingestion and query processing for LSM-based storage systems (2018). arXiv preprint arXiv:1808.08896
9. Hasan, M., Orgun, M.A., Schwitter, R.: Real-time event detection from the Twitter data stream using the TwitterNews+ framework. Inf. Process. Manag. 56(3), 1146–1165 (2019)
10. Aziz, K., Zaidouni, D., Bellafkih, M.: Real-time data analysis using Spark and Hadoop. In: 2018 4th International Conference on Optimization and Applications (ICOA), pp. 1–6. IEEE (2018)
11. Almaslukh, A., Magdy, A.: Evaluating spatial-keyword queries on streaming data. In: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 209–218 (2018)
A Review on Machine Learning and Blockchain Technology in E-Healthcare

Deepika Tenepalli and Navamani Thandava Meganathan(B)

SCOPE, VIT, Vellore 632014, Tamil Nadu, India
[email protected], [email protected]
Abstract. Healthcare always plays an important role in human life. All stakeholders, including physicians, nurses, patients, life insurance agents, etc., can easily access patients' medical information; credit goes to cloud computing, which is a key factor in this transformation. Cloud services provide flexible, affordable, and wide-ranging mobile access to patients' Electronic Health Records (EHR). Despite the immense advantages of the cloud, the security and privacy of patients' EHRs, along with real-time data access, remain key concerns. EHR data can be useful in finding and diagnosing chronic diseases like cancer, heart attack, diabetes, etc. Due to the severity of these diseases, much of the population is dying because of the lack of prediction techniques in the early stages of disease. Hence, it becomes one of the most promising research problems to analyze and find solutions to overcome this loss. This work includes a review of e-healthcare-related research studies that can aid researchers in understanding the drawbacks and benefits of current healthcare systems that use machine learning, blockchain technology, and other components to ensure privacy and security.

Keywords: E-healthcare · Electronic Health Records (EHR) · Machine Learning · Blockchain Technology
1 Introduction

As the old saying goes, "Health is Wealth", and everyone gives importance to health. Healthcare ranges from home remedies to well-established hospitals. In the 19th century, the practice of medicine evolved into a natural science-based profession involving training and expertise. In the conventional healthcare system, patients did not participate in decisions about how to manage their health and illnesses; they depended on the healthcare providers, so all the burden was on healthcare practitioners. The fundamental driving force for patient autonomy, including the utilization of potentially disruptive developments that were becoming available, was the dread of being left in the hands of decisions over which they had no control [1]. We know that everybody's life revolves around healthcare, and to improve it, e-healthcare was designed using the most recent methods and technologically advanced equipment. Healthcare organizers or providers keep an Electronic Health Record (EHR) of patient information, which further aids in the diagnosis and prognosis
of diseases to deliver better health provisions (prediction, diagnosis, treatment, etc.) [2]. An Electronic Health Record (EHR) is a collection of different medical records which are created during every clinical interaction with the patient or for every visit to the hospital. Due to the development of self-care and smart homecare systems, valuable healthcare data are now created continuously and have long-term therapeutic importance. Existing e-healthcare systems, however, lack the level of secrecy, integrity, privacy, and user confidence necessary for broad usage since they have not yet fully developed and matured; this remains a most promising challenge for researchers [2]. Involving e-healthcare in human life not only provides online services but also reduces the burden on patients to travel from their homes to hospitals and saves time and energy. It can also avoid the maintenance of hard copies of medical reports and provide anytime consultancy. Although it has immense advantages, there are mainly two key challenges: security and storage cost. Due to the abundance of web users in the era of the Internet and the massive volume of cloud users, providing data security is one of the most challenging tasks in today's fast-paced, open world. Given that conventional encryption techniques do not significantly increase security, providing better security and privacy for EHR data is a promising challenge for research [3]. Generally, EHR data is stored and secured in a cloud environment to maintain the reliability and accessibility of the data. Cloud services are paid for by usage and hence increase the storage cost of EHR data. Hospitals have control over patients' health data, and this data is shared among health service providers for various treatments. While sharing this data, there is a chance of privacy and security problems and delays in the exchange of information. To overcome these issues in a better way, researchers in the healthcare domain nowadays rely on Blockchain Technology [4]. Since EHR data is increasing day by day, it is difficult to process manually, and producing accurate results is also difficult. So, healthcare providers depend on the latest technologies which can reduce their burden. Machine Learning is a subset of Artificial Intelligence (AI) and is a widely used technique in healthcare prediction and diagnosis of medical data. In recent years, machine learning has emerged as the most significant technology for solving numerous real-world problems on a global scale. Therefore, it is becoming a popular problem-solving technology among researchers in the healthcare domain. The remainder of this work is structured as follows: Sect. 2 describes how machine learning is utilized in e-healthcare. In Sect. 3, we discuss the importance of Blockchain Technology in e-healthcare. The process of data collection for healthcare is discussed in detail in Sect. 4. Some of the research findings and issues observed from the existing works are presented in Sect. 5. Finally, the conclusion and future work are described in the last section.
2 Machine Learning for E-Healthcare

In the current period of healthcare data analysis, manually evaluating healthcare data is a difficult task for healthcare practitioners [5]. Machine Learning may be an important tool for predicting the future or making a judgment based on a given
huge dataset. Machine Learning is a technique in which a machine is trained automatically instead of being explicitly programmed. Machine learning's primary goal is to create a computer program that can access data and utilize it as training data. Predictions are more accurate when there is more relevant data. Many industries, including banking, retail, health care, and social data, can benefit from machine learning [5]. In recent years, intelligent health monitoring platforms and healthcare facilities have made substantial use of Deep Learning (DL) and Machine Learning (ML) techniques. The health data produced by various Internet of Medical Things (IoMT) devices and sensors are analyzed using DL and ML algorithms.

2.1 Machine Learning (ML) Models in E-Healthcare

Machine Learning may be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. When we know the labels of the outcomes, we can use supervised learning. There are regression and classification algorithms based on variable categorization. Regression algorithms include linear regression, support vector machines, decision trees, and so on. Classification algorithms include support vector machines, Naive Bayes classifiers, K-nearest neighbors, logistic regression classifiers, decision tree classifiers, and so on. Unsupervised learning is used when the data is not labeled and the specific output label is not known. Unsupervised learning algorithms are divided into clustering and association algorithms [6]. Machine learning-based algorithms are used to extract clinical information from electronic health data to simplify the diagnosis process. Machine learning algorithms can also be useful in the self-diagnosis of people. These technologies play a significant role in lowering costs in healthcare industries as well as providing clues for even more accurate illness forecasting. These days, every nation is concerned about healthcare, and every year more money is spent to enhance healthcare systems using technologies like big data and machine learning [7]. To help researchers choose the better algorithm for predictions, Ferdous et al. [8] researched several machine learning algorithms for disease diagnosis and determined the accuracy of the algorithms; however, there is a necessity to analyze larger databases. Similarly, Hossain et al. [9] have proposed a framework for early-stage disease detection using machine learning techniques; it predicts diseases in less time but needs to be improved to work for the entire system, i.e., it only tends to work for pathogenic infections. Mohan et al. [10] came up with a way to predict heart disease by using a mix of machine-learning techniques. The authors also proposed a structured attribute set to choose data features for machine learning classifier training and testing.
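As a purely illustrative sketch of the supervised setting described above, the following trains two of the commonly used classifiers on a tabular EHR-style dataset; the CSV file name and the "target" column are placeholders, not a dataset from the surveyed papers.

```python
# Train and compare an SVM and a decision tree on a tabular disease dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("heart_disease.csv")                 # hypothetical file
X, y = df.drop(columns=["target"]), df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [("SVM", SVC(kernel="rbf")),
                    ("Decision Tree", DecisionTreeClassifier(max_depth=5))]:
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, model.predict(X_te)))
```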
Table 1. Comparative Analysis of ML Models in Healthcare (columns: reference, dataset, training model, disease type, accuracy (%), remarks)

[3]. Dataset: UCI Machine Learning Repository (Diabetic data set, Heart Disease data set). Model: fuzzy rules and CNN. Disease type: diabetes and heart-beat rates. Accuracy: 99. Remarks: prediction of deadly diseases is implemented; proposes a new Elliptic-Curve Cryptography (ECC)-based cloud storage algorithm.
[9]. Dataset: diabetic hospital dataset. Models: Decision Tree, Naïve Bayes, Logistic Regression, Random Forest. Disease type: diabetes. Accuracy: 96.5, 83.4, 92.4, 97.4 respectively. Remarks: reduced processing time and thus a lower error rate; requires deployment of a whole system.
[11]. Dataset: MIMIC III (Medical Information Mart for Intensive Care) dataset. Models: Random Forest (RF), Convolutional Neural Network (CNN). Disease type: cardiac arrest. Accuracy: 81, 71. Remarks: works well for small data sets only.
[14]. Dataset: Cleveland Heart Disease (HD) dataset. Model: SVM-FCMIM (Support Vector Machine with Fast Conditional Mutual Information) feature selection algorithm. Disease type: heart disease. Accuracy: 92.37. Remarks: a feature selection method is proposed to choose suitable features that increase classification accuracy and shorten the diagnosis system's processing time; prediction accuracy is improved.
[25]. Dataset: Cleveland, Hungary, and Switzerland datasets. Model: machine learning/deep learning. Disease type: chronic diseases. Accuracy: 92. Remarks: better performance compared to existing models; vulnerable to man-in-the-middle attacks, data leakage, jamming, spoofing attacks, etc.
[26]. Dataset: Physio-Bank (ECG database). Model: Improved Support Vector Machine (I-SVM). Disease type: Heart Failure Disease (HFD). Accuracy: 94.97. Remarks: low-cost implementation and improved treatment accuracy; input is taken only from ECG signals and the system must be improved for biological signals of chronic diseases.
[27]. Dataset: Statlog (Heart) dataset. Model: ReliefF-Rough set-based approach. Disease type: heart disease. Accuracy: 92.59. Remarks: high accuracy is achieved by choosing the right features for the model's training and testing; computation time is high.
[28]. Dataset: PPG-BP (Blood Pressure) and PPG-DaLiA data sets. Models: Decision Tree (DT), Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM). Disease type: cardiovascular disease. Accuracy: 99.5, 97.5. Remarks: helpful for rural residents; concentrated solely on photoplethysmography (PPG) data collections.
[29]. Dataset: UCI Repository. Models: K-Nearest Neighbor (KNN), Support Vector Machine (SVM). Disease type: heart disease. Accuracy: 88, 88. Remarks: need to improve accuracy.
[30]. Dataset: diabetes dataset from the Kaggle Machine Learning Repository. Model: Decision Tree Iterative Dichotomiser 3 (DT ID3). Disease type: diabetes. Accuracy: 99. Remarks: works well with feature-selection methods; needs to be developed for deadly diseases.
[31]. Dataset: UCI Repository. Model: Memory-based Meta-heuristic Attribute Selection (MMAS). Disease type: chronic diseases such as breast cancer, diabetes, heart disease, etc. Accuracy: 94.5. Remarks: works well for small data sets only; needs to be extended with real data sets which will aid in treatments.
[32]. Dataset: UCI Machine Learning Repository. Models: Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR). Disease type: chronic kidney disease. Accuracy: 95.9, 94.8, 93.3. Remarks: early prediction of disease is possible; need to improve the performance.
In Table 1, we have made a comparative analysis of machine learning models for different disease predictions. Research works related to predicting diseases such as heart disease, diabetes, chronic diseases, etc., are analyzed by collecting datasets from the University of California Irvine (UCI) Machine Learning Repository and Kaggle repositories. The researchers made predictions by applying different ML algorithms. This analysis shows that the accuracy of these systems varies from a small value to the highest value (71 to 99) with smaller data sets. Even though the researchers have come up with better prediction results using different types of ML models, there still exist
some important issues yet to be addressed. Issues like security, privacy, lack of precise disease prediction, fault diagnosis, etc., need to be focused on. Generally, decision trees provide good accuracy in processing healthcare data but struggle with larger datasets. Recent research studies state that, even though we have numerous Machine Learning (ML) and Deep Learning (DL) algorithms, models like Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Convolutional Neural Networks (CNN) are the most implemented algorithms for healthcare problems [11]. SVM's ability to manage non-linear classification is a bonus; the challenge of selecting the best kernel for non-linear classification is a potential drawback of SVM. Moreover, if the dataset is big, the training time for SVM models could be long. However, SVM excels at solving linear problems [11]. In this section, we have discussed the need for machine learning in e-healthcare, the different types of machine learning algorithms used in e-healthcare, and EHR security. We have also discussed a few of the existing research works in e-healthcare concerning the methodologies applied and the limitations of those works.
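The kernel-selection point above can be illustrated with a small cross-validation sketch on synthetic data (not a medical dataset); in practice the kernel and its hyperparameters would be tuned on the EHR data itself.

```python
# Compare a linear and an RBF kernel by 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, "mean CV accuracy:", round(scores.mean(), 3))
```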
3 Blockchain Technology in E-Healthcare

Blockchain is a part of distributed ledger technology. Distributed ledger technology is a way to record data that is shared among many computers and is hard to change or hack. Blockchain technology with cryptographic security was first presented in 1991; adding timestamps to documents prevents data from being altered. In 2008, Satoshi Nakamoto updated the concept to make it more relevant and introduced Bitcoin as a digital ledger [15]. Blockchain 1.0, released in 2008, was a chain of blocks holding numerous units of accessible information and transactions using a digital ledger in an electronic peer-to-peer system. A public blockchain called Ethereum was created as part of Blockchain 2.0 in 2013 and allows users to store assets as smart contracts; it serves as a base for the creation of several decentralized applications [15]. Later, blockchain networks were implemented as federated (consortium), private, and public blockchain networks in all real-time applications, improving operational efficiency. Blockchain is a computer program that allows users to conduct transactions without the need for an impartial third party. In a public blockchain, any network user has access to the blocks, can complete transactions, and may participate in the consensus-building process [15, 16]. An intermediary registration or a reliable third party is not present in this situation. Any participant can access the blocks at any moment on public blockchains since they are transparent and accessible. Satoshi Nakamoto created Bitcoin as a cryptocurrency and peer-to-peer payment system utilizing a public blockchain. The participants' transactions are organized into blocks, and verification of a block's legitimacy is done by the network's nodes, known as miners. Public blockchains do not rely on the trustworthiness of these miners. Because a private blockchain is built on access control, only authorized users are allowed to participate in the network [15]. Blockchain was first developed to power Bitcoin, but it has now developed into a technology that forms the basis for many other decentralized applications. Blockchain is being utilized to secure private information in industries including banking, insurance, and healthcare. Blockchain has been suggested as a curative for some of the most
important problems facing the healthcare industry, including the safe exchange of medical records and adherence to data protection rules [17]. Better healthcare assistance for medical staff and other stakeholders in healthcare facilities has been made possible by Blockchain technology’s wide-ranging and promising possibilities [18].
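The chaining and tamper-evidence idea behind this can be illustrated with a minimal, self-contained sketch (not a production blockchain and not any particular platform): each block stores a hash of its predecessor, so altering an earlier record breaks every later link.

```python
# Minimal hash-chained ledger for illustration only.
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    body = {k: block[k] for k in ("record", "timestamp", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(record: dict, prev_hash: str) -> dict:
    block = {"record": record, "timestamp": time.time(), "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain: list) -> bool:
    # Recompute each hash and check every block points at its true predecessor.
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev_hash"] != prev["hash"] or block_hash(cur) != cur["hash"]:
            return False
    return bool(chain) and block_hash(chain[0]) == chain[0]["hash"]

genesis = make_block({"event": "genesis"}, prev_hash="0" * 64)
b1 = make_block({"patient": "P001", "event": "EHR created"}, genesis["hash"])
b2 = make_block({"patient": "P001", "event": "lab report added"}, b1["hash"])
print(chain_is_valid([genesis, b1, b2]))   # True until any stored record is altered
```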
Fig. 1. Workflow model of Blockchain Technology [28] (stages: beginning of a transaction, creation of data blocks, distribution of blocks, consensus and approval from the network, adding the block to the chain, distribution of the updated block)
Figure 1 explains the workflow model of Blockchain Technology. The process starts with a transaction or exchange, and then the data is formatted as a block. After the block has been generated, it is sent to the members, and it awaits approval from the network and consensus before continuing. This new block is then added to the chain, which is then eventually updated and distributed.

3.1 EHR Security

Securing patients' health records is very important in any healthcare system. Due to the prevalence of EHRs, hospitals emphasize data security. Because of dispersed and concurrent processing, cloud security cannot be guaranteed. Mustafa et al. [4] made a research study on Blockchain Technology in healthcare to raise awareness of blockchain technology utilization in healthcare among users. ML learns from the data that we supply and, as a result, can identify fraudulent activity within the system by analyzing suspicious access. Guo et al. [12] have developed a fog-based health monitoring system to collect patient data and track the condition of patients in remote locations using Symmetric Homomorphic Encryption (SHE). It can reduce the computation cost and satisfies the security properties, but it is difficult to exchange data between arbitrary devices. Singh et al. [13] have developed a federated learning-based secure architecture for data privacy preservation, which allows the data to be secured on local bodies while also providing secure data collaboration for IoT devices. It still needs to optimize latency and storage requirements. Parallel processing and distribution undermine the reliability of cloud security. To address this issue, Blockchain technology has been implemented in the cloud to secure medical data that is susceptible to fraud, tampering, and breaches of confidentiality [4]. Blockchain technology is not only used for security; it also helps to maintain medical data in a distributed ledger, data sharing is possible without any tampering, and
it is easy to detect fraudulent activities. Thus, Blockchain technology plays a vital role in providing better security and privacy for healthcare data.

3.2 Applications of Blockchain Technology in E-Healthcare

There is a wide range of applications of Blockchain Technology; some of the blockchain applications in healthcare are shown in Fig. 2. To store and retrieve patients' Electronic Health Records securely, blockchain technology can be utilized because a blockchain acts as a tamper-proof digital ledger. To share clinical patient information among clinical researchers, blockchain technology can also be used because it acts as a secure and distributed system. In neuroscience research, patients' brain activities and states are stored and shared among researchers more safely and securely using this technology. Medical fraud detection is also one of the important applications of blockchain technology: since blocks are immutable, medical data stored in the blocks cannot be modified and only authenticated access is possible; hence, any outlier fraudulent activity can be easily detected. Taloba et al. [19] have proposed a blockchain-based platform for managing secure healthcare data. It works well with the centralized EHR system, but there is still a need for decentralization. One of the most well-known applications of blockchain technology is in the field of healthcare. Blockchain technology may solve healthcare data safety, anonymity, access, and storage challenges [20]. Healthcare needs ubiquity: two parties, whether human or machine, can share facts or information accurately, quickly, and consistently. Sanober et al. [21] looked at blockchain applications in different fields, with e-healthcare management as the focus. They found that blockchain technology not only plays a big role in cryptocurrencies but also has a lot of scope in healthcare, thanks to features like decentralized storage, modifiability, safety, and confidentiality.
Fig. 2. Applications of Blockchain Technology in E-Healthcare [33] (Electronic Health Records (EHR), Clinical Research (CR), Neuroscience Research (NR), Medical Fraud Detection (MFD), Pharmaceutical Industry and Research (PIR))
Accessible and secure medical data, statistics, knowledge, and insight on the network are necessary to assure the legitimacy, authenticity, and trustworthiness of the e-healthcare system [22]. Blockchain provides privacy and security for data exchange
across decentralized nodes [22]. Thus, it is observed that blockchain technology plays an important role in providing better security and privacy protection for healthcare records. Some of the problems faced in existing systems are as follows:
• Data sharing between arbitrary devices is a challenging task
• Usage of an improper protection strategy exposes information to intruders
• Optimization of latency in accessing the records and provision of optimized storage space are yet to be addressed
• There is a need for a trust and reward mechanism for managing e-healthcare data
• Depending on the size of the EHR, the time required to access and search for the data varies, and thus this issue needs to be addressed
4 Data Collection for E-Healthcare

Data collection is the process of acquiring information from various sources. Typically, healthcare data can be collected in two ways: online and offline. In offline mode, hospitals in the public or private sector collect the data and a request for authorization to utilize patient records for investigations is submitted. Additionally, we can acquire offline patient data sets from test centers. Online datasets can be collected from publicly available repositories such as UCI Machine Learning, Kaggle, Google datasets, Physio-Bank, hospital datasets, the Statlog (Heart) data set, etc. It is a fact that research relies on data collection, which is also quite important in figuring out how effective and expensive the work will be. There are a variety of methods, from handwritten documents to digital files. There is currently no single definitive method of data collection; rather, every approach has its pros and cons. Research dataset collection is a difficult task for every study [23]. Most e-healthcare research works have used input data from sources like UCI, Kaggle, etc., and applied machine learning or deep learning and blockchain technologies to help diagnose diseases, make predictions, and protect people's privacy. Generally, data can be divided into two types: primary and secondary data. Primary data is the original data generated by the researchers to start the research work. Primary data can be generated by methods such as surveys, questionnaire distribution and processing, etc. Secondary data is already existing data that can be used for further processing; it is collected by researchers from various sources when they start their research on some problem. Collecting data from the available repositories yields the secondary data type, so it has already been preprocessed on another system. Hence, researchers who are starting their medical research can collect secondary data and perform filtering to get accurate results. Here, filtering is a technique used to clean the data so that it suits our system. One of the major problems in this data collection phase is selecting valid data sets. As we know, machine learning training solely depends on the data that we feed into the network; hence, a large amount of data is always required to process and get the best outcomes from the system. Considering all this, it is better to collect primary data and train our system to get better prediction accuracy, and later we can test it with secondary data sets to verify the performance of the developed system.
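A hedged sketch of the filtering step described above, applied to a secondary data set downloaded from a public repository, could look as follows; the file name, column names, and thresholds are placeholders.

```python
# Basic cleaning of a downloaded tabular healthcare dataset with pandas.
import pandas as pd

df = pd.read_csv("uci_heart_disease.csv")                  # hypothetical local copy
df = df.drop_duplicates()
df = df.dropna(subset=["age", "chol", "target"])           # keep rows with key fields
df = df[df["age"].between(1, 110) & (df["chol"] > 0)]      # drop implausible values
df.to_csv("heart_disease_clean.csv", index=False)
```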
To develop these machine learning models, there are machine learning frameworks such as TensorFlow, Amazon Machine Learning, scikit-learn, Apache Mahout, and Microsoft's Cognitive Toolkit [34]. Based on the requirements, users can choose any of these frameworks, but the most commonly used tool is Amazon Machine Learning.
5 Research Findings and Issues

Despite the tremendous benefits that e-health services provide, some challenges still need to be overcome. These include developing a system that can find diseases in their early stages, lowering the cost of health care, managing patients' health information well, protecting patients' privacy, and being able to share information between different health facilities [24]. The following research findings have been made through this study:
• Despite the world moving towards digitalization, a few hospitals are still not using EHR systems efficiently; hence, creating and maintaining such systems must be improved further.
• Even though there has been a lot of research on e-healthcare, security and privacy issues have not yet been addressed effectively.
• There are many benefits to using the cloud for maintaining e-healthcare records, but there is still room for optimization.
• There has been research on both centralized and decentralized EHR systems, but more research on decentralization is still needed.
• Though machine learning and artificial intelligence play a vital role in e-healthcare, improving the accuracy of diagnosis is always a challenge for researchers.
• It is important to look for different ways of utilizing blockchain technology in e-healthcare.
• It is hard to share data between any two devices in a blockchain network.
• In a blockchain network, improper usage of security strategies leaves information open to thieves.
6 Conclusion

In this work, machine learning and blockchain technologies that are utilized in e-healthcare for the diagnosis and prognosis of diseases are discussed through an in-depth analysis of existing systems. Table 1 shows a comparative analysis of recent research works on machine learning algorithms to predict various chronic diseases. It is observed that the decision tree algorithm works well with small data sets in healthcare and the support vector machine algorithm works better with linear problems by providing high accuracy. Even though the existing works are doing well on their part, many issues have not yet been addressed fully. We have also discussed the need for security and privacy in keeping patients' data secure and respecting their privacy. In addition, methodologies and frameworks were discussed. Even though several security and privacy methods exist, there is a need for building
trustworthy frameworks for e-healthcare systems, and more robust security measures need to be proposed. In future work, it is planned to build a secure trust-based architecture by utilizing the most up-to-date security and diagnosis methods, which will allow improving prediction performance and providing better security features. Also, developing a framework for an EHR system that can suggest medicines for patients based on the predicted disease is an interesting research direction to be focused on.
References 1. Meskó, B., Drobni, Z., Bényei, É., Gergely, B., Gy˝orffy, Z.: Digital health is a cultural transformation of traditional healthcare. Mhealth 3 (2017) 2. Sahi, M.A., et al.: Privacy preservation in e-healthcare environments: state of the art and future directions. IEEE Access. 30(6), 464–478 (2017) 3. Munirathinam, T., Ganapathy, S., Kannan, A.: Cloud and IoT based privacy preserved eHealthcare system using secured storage algorithm and deep learning. J. Intell. Fuzzy Syst. 39(3), 3011–3023 (2020) 4. Mustafa, M., Alshare, M., Bhargava, D., Neware, R., Singh, B., Ngulube, P.: Perceived security risk based on moderating factors for blockchain technology applications in cloud storage to achieve secure healthcare systems. Comput. Math. Methods Med. 19, 2022 (2022) 5. Dhillon, A., Singh, A.: Machine learning in healthcare data analysis: a survey. J. Biol. Today’s World. 8(6), 1 (2019) 6. Alanazi, A.: Using machine learning for healthcare challenges and opportunities. Inf. Med. Unlocked. 21, 100924 (2022) 7. Tumpa, E.S., Dey, K.: A review on applications of machine learning in healthcare. In: 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI), 28 April 2022, pp. 1388–1392. IEEE (2022) 8. Ferdous, M., Debnath, J., Chakraborty, N.R.: Machine learning algorithms in healthcare: a literature survey. In: 2020 11th International conference on computing, communication and networking technologies (ICCCNT), 1 July 2020, pp. 1–6. IEEE (2020) 9. Hossain, M.A., Ferdousi, R., Alhamid, M.F.: Knowledge-driven machine learning-based framework for early-stage disease risk prediction in edge environment. J. Para. Distrib. Comput. 1(146), 25–34 (2020) 10. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 19(7), 81542–81554 (2019) 11. Soudan, B., Dandachi, F.F., Nassif, A.B.: Attempting cardiac arrest prediction using artificial intelligence on vital signs from Electronic Health Records. Smart Health. 23, 100294 (2022) 12. Guo, C., Tian, P., Choo, K.K.: Enabling privacy-assured fog-based data aggregation in Ehealthcare systems. IEEE Trans. Ind. Inf. 17(3), 1948–1957 (2020) 13. Singh, S., Rathore, S., Alfarraj, O., Tolba, A., Yoon, B.: A framework for privacy-preservation of IoT healthcare data using Federated Learning and blockchain technology. Futur. Gener. Comput. Syst. 1(129), 380–388 (2022) 14. Li, J.P., Haq, A.U., Din, S.U., Khan, J., Khan, A., Saboor, A.: Heart disease identification method using machine learning classification in e-healthcare. IEEE Access. 9(8), 107562– 107582 (2020) 15. Balusamy, B., Chilamkurti, N., Beena, L.A., Poongodi, T.: Blockchain and machine learning for e-healthcare systems. In: Blockchain and Machine Learning for e-Healthcare Systems, pp. 1–481 (2021)
16. Amanat, A., Rizwan, M., Maple, C., Zikria, Y.B., Almadhor, A.S., Kim, S.W.: Blockchain and cloud computing-based secure electronic healthcare records storage and sharing. Front. Public Health 19, 2309 (2022) 17. Tandon, A., Dhir, A., Islam, A.N., Mäntymäki, M.: Blockchain in healthcare: a systematic literature review, synthesizing framework and future research agenda. Comput. Ind. 1(122), 103290 (2020) 18. Javed, W., Aabid, F., Danish, M., Tahir, H., Zainab, R.: Role of blockchain technology in healthcare: a systematic review. In: 2021 International Conference on Innovative Computing (ICIC), 9 Nov 2021, pp. 1–8. IEEE (2021) 19. Taloba, A.I., Rayan, A., Elhadad, A., Abozeid, A., Shahin, O.R., Abd El-Aziz, R.M.: A framework for secure healthcare data management using blockchain technology. Int. J. Adv. Comput. Sci. Appl. 12(12) (2021) 20. Khezr, S., Moniruzzaman, M., Yassine, A., Benlamri, R.: Blockchain technology in healthcare: a comprehensive review and directions for future research. Appl. Sci. 9(9), 1736 (2019) 21. Sanober, A., Anwar, S.: Blockchain for content protection in E-healthcare: a case study for COVID-19. In: 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), 5 Mar 2022, vol. 1, pp. 661–666. IEEE (2022) 22. Shaikh, Z.A., Khan, A.A., Teng, L., Wagan, A.A., Laghari, A.A.: BIoMT modular infrastructure: the recent challenges, issues, and limitations in blockchain hyperledger-enabled e-healthcare application. Wirel. Commun. Mobile Comput. (2022) 23. Wilcox, A.B., Gallagher, K.D., Boden-Albala, B., Bakken, S.R.: Research data collection methods: from paper to tablet computers. Med. Care 1, S68-73 (2012) 24. Qureshi, M.M., Farooq, A., Qureshi, M.M.: Current eHealth Challenges and recent trends in eHealth applications. arXiv preprint arXiv:2103.01756 (2021) 25. Bordoloi, D., Singh, V., Sanober, S., Buhari, S.M., Ujjan, J.A., Boddu, R.: Deep learning in healthcare system for quality of service. J. Healthcare Eng. 8, 2022 (2022) 26. Geweid, G.G., Abdallah, M.A.: A new automatic identification method of heart failure using improved support vector machine based on duality optimization technique. IEEE Access. 4(7), 149595–149611 (2019) 27. Liu, X., et al.: A hybrid classification system for heart disease diagnosis based on the RFRS method. Comput. Math. Methods Med. 3, 2017 (2017) 28. Sadad, T., Bukhari, S.A., Munir, A., Ghani, A., El-Sherbeeny, A.M., Rauf, H.T.: Detection of cardiovascular disease based on PPG signals using machine learning with cloud computing. Comput. Intell. Neurosci. 4, 2022 (2022) 29. Kumari, V., Reddy, P.B., Sudhakar, C.: Performance interpretation of machine learning based classifiers for e-healthcare system in fog computing network. In: 2022 IEEE Students Conference on Engineering and Systems (SCES), 1 July 2022, pp. 01–05. IEEE (2022) 30. Haq, A.U., et al.: Intelligent machine learning approach for effective recognition of diabetes in E-healthcare using clinical data. Sensors 20(9), 2649 (2020) 31. Mishra, S., Thakkar, H.K., Singh, P., Sharma, G.: A decisive metaheuristic attribute selector enabled combined unsupervised-supervised model for chronic disease risk assessment. Comput. Intell. Neurosci. 8, 2022 (2022) 32. Pal, S.: Chronic kidney disease prediction using machine learning techniques. Biomed. Mater. Dev. 31, 1–7 (2022) 33. Ramzan, S., Aqdus, A., Ravi, V., Koundal, D., Amin, R., Al Ghamdi, M.A.: Healthcare applications using blockchain technology: motivations and challenges. IEEE Trans. Eng. Manag. (2022) 34. 
Singh, K.K., Elhoseny, M., Singh, A., Elngar, A.A. (eds.): Machine Learning and the Internet of Medical Things in Healthcare. Academic Press, Cambridge (2021)
Machine Learning Models for Toxicity Prediction in Chemotherapy

Imen Boudali1,2(B) and Ines Belhadj Messaoud2

1 Sercom Laboratory, University of Carthage, 1054 Carthage, Tunisia
[email protected]
2 National Engineering School of Tunis - ENIT, University of Tunis EL Manar, Tunis, Tunisia
[email protected]
Abstract. While undergoing chemotherapy treatments, patients may face complications affecting their health. This can become apparent in the different forms of symptoms that a patient may experience. Side effects may happen with any kind of treatment and mainly depend on the type of drug, the combination of drugs, the combination of treatments, the dose, and the overall health of the patient. Severe complications may occur due to an imbalance of the average chemical balance in the patient's body. Treating physicians need information and support about the evolution of the treatment in order to provide the necessary medical help in time and to prevent any further complications. In this paper, we propose to support the medical staff by predicting the risk of toxicity for each patient after each chemotherapy session. This prediction is reported to treating physicians in order to decide which adjustment of the drug therapy is needed. Thus, we propose an intelligent approach based on machine learning models for predicting and classifying chemotherapy-induced complications according to predefined toxicity levels. Patient symptoms are analyzed and side effects are detected in order to predict toxicity outcomes as well as any complication risk. The proposed prediction models are trained and assessed on real medical data that was collected during the treatment phase of cancer patients in Tunisia. Simulation results and a comparative study of the proposed models are provided by considering accuracy metrics and performance coefficients.

Keywords: Cancer chemotherapy · toxicity level · machine learning models · risk prediction · classification · data analysis
1 Introduction

During the last decades, cancer has become a leading cause of mortality worldwide and the most important obstacle to increasing life expectancy in every country during the last century [1]. According to a recent study on cancer mortality in Tunisia, the rate is steadily increasing and tumors are the second largest leading cause of death after cardiovascular diseases [2]. For these reasons, a continuous evolution of cancer research has taken place over the past decades. In the early stage, scientists applied different methods to identify types of cancer before they cause symptoms. Furthermore, other research works
are concerned with developing new strategies for the early prediction of cancer treatment outcomes. Then, with the emergence of new technologies in the medical field, great amounts of cancer data have been collected and provided to the medical research community [3]. Despite the large number of published papers in cancer research and the inherent potential of ongoing efforts, some issues related to therapeutic complications are still challenging. In fact, oncologists should inquire about the progress of their treatments and the side effects of their prescriptions during the chemotherapy phase. This follow-up would allow rapid and optimal treatment in order to avoid any kind of complication [1]. Scientific studies have shown that the side effects of chemotherapy mainly depend on the type of drugs used by the medical staff, the prescribed dose, the administration mode, and the overall state of health. Notice that side effects appear despite additional preventive treatments [4]. In this context, our purpose is to study and analyze the most common side effects for 10 types of frequent cancers: breast cancer, prostate cancer, lung cancer, skin cancer, bladder cancer, neck cancer, non-Hodgkin's lymphoma, uterine cancer, myeloid leukemia, and colorectal cancer. We focus on the most common side effects that should be stated 7 days after a chemotherapy session. According to many national health institutes, there are five severity levels for side effects [5]: low, medium, severe, highly severe, and death related to adverse events. In our work, and according to the collected data set, we only consider three severity levels by preserving the first two and combining the remaining sublevels into only one level. Hence, we consider level 1: low, level 2: medium, and level 3: severe. The third level requires immediate medical intervention, adjusting drugs or their combination, new medicine prescriptions, or a change of dose. In this work, our objective is to support medical decision processes for cancer patients who are undergoing chemotherapy. Hence, we propose a support approach that provides a web application for cancer patients in order to introduce information about side effects 7 days after the last chemotherapy session. Then, on the basis of predictive models, the severity level of the patient is detected for a possible emergency. The prediction models act on collected data regarding patient health status. The proposed approach provides support for the medical staff by analyzing the collected data and predicting the severity level of health in order to decide on a fast medical intervention. The detected severity level of a cancer patient may be low, medium, or high. The proposed tool will contribute to preventing and decreasing complication risks during the chemotherapy process. The proposed machine learning models are based on stored medical data that will be preprocessed and whose dimensions are reduced in order to reveal models and create a robust analysis. The optimal model will be selected according to precision and accuracy metrics. The remainder of this paper is organized as follows. Section 2 is a literature review of clinical prediction models, mainly those related to cancer research. Afterwards, we present in Sect. 3 the topic of chemotherapy and the related side effects. In Sect. 4, we introduce the theoretical aspects of the proposed prediction models for detecting the toxicity
level of chemotherapy and any eventual complications for patients. Next, we present in Sect. 4 the metrics that we use to assess the performance of our models. Then, we describe the characteristics of our dataset in Sect. 5. The preprocessing step is detailed in Sect. 6 before presenting the prediction step in Sect. 7. A comparative study of the proposed models is provided in Sect. 8 in order to select the most appropriate one.
2 Related Works
We present in this section a review of prediction models for healthcare systems in general and cancer research more specifically. In order to deliver effective healthcare, it is crucial to identify risks by anticipating and preventing adverse outcomes and events. Given the huge volumes of routinely collected data, predictive models of future risks may be built on the basis of machine learning methods [6, 7]. In this scope, predictive models have been widely used in healthcare practice during the last decades. Prediction models can be used to identify patients at risk and ease the care management process by considering patient severity. Moreover, they can help in understanding the success or failure of a treatment pathway. The literature about clinical prediction models and practical cases is abundant. In [8], the author provided an early in-depth review of the applications of prediction models in the medical field (statistical concepts, regression methods and validation strategies). In [9], the authors proposed big data analytics to identify and manage high-risk and high-cost patients. Moreover, the work in [10] proposed a clinical prediction model to assess the risk of chemotherapy-related hospitalization in the case of advanced cancer. In [11], the authors proposed a reporting guideline for clinical prediction models, which consists in the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis. The authors in [12] provided in-depth mining of clinical data, especially the construction of clinical prediction models. As reported in this review, commonly used methods include the multiple linear regression model, the logistic regression model and the Cox regression model [13]. The assessment and verification of the prediction models' efficiency are key to data analysis, ranging from statistical analysis and data modeling to project design. As we notice, the presented works about clinical prediction models are based on different types of regression models and data analytics. This requires intensive expert effort for collecting data and designing a limited set of features [14]. In recent years, with the massive quantities of healthcare data and the rising interest in health record systems [15], machine learning for healthcare systems has become an important emerging field, especially for solving various problems of clinical outcome prediction [16]. In this context, cancer research has been widely addressed by using artificial intelligence based learning approaches. Early applications concern the development of predictive models for cancer. Most of the first applications employed various methods such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Bayesian Networks (BN) and Decision Trees (DT). These techniques have been used for identifying, classifying, detecting and distinguishing tumors, as discussed in the review of [4]. Then,
an increasing interest was focused on cancer prediction and prognosis [4, 17, 18]. In recent research [19], the authors applied machine learning models to assess the predictability of four major cancer surgical outcomes. Moreover, in [20] the authors present a review of machine learning based models for predicting and classifying radiotherapy complications. As stated in the different published reviews and research papers, machine learning methods have resulted in accurate and effective decision-making in healthcare systems [21]. Nevertheless, some specific issues in cancer research are still challenging, such as chemotherapy response and complications. In fact, few papers have tackled the prediction of cancer therapy outcomes. In [22], the authors present an overview of recent advances in therapeutic response prediction using machine learning. In [23], the authors focus on supervised machine learning for drug repurposing in cancer. However, the aforementioned studies did not focus strictly on complication risk and toxicity prediction in chemotherapy. Hence, our interest in this paper is focused on predicting the toxicity level during cancer chemotherapy by using machine learning techniques as a highly promising approach in the medical field.
3 Machine Learning Methods for Toxicity Prediction
Given our interest in medical data analysis and prediction through machine learning techniques [6, 7], we present in this section the theoretical aspects of the proposed prediction methods. As shown in Fig. 1, the machine learning process is mainly defined by four steps [7].
Fig. 1. Machine Learning Process: (1) collecting data from various sources; (2) data preprocessing: cleaning and feature engineering; (3) model building and selection of the ML algorithm; (4) model evaluation, with iterative improvement of the trained model and/or of the selected ML algorithm.
3.1 Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is a commonly used tool for dimension reduction, classification and data visualization. Besides its simplicity, this tool produces decent, robust and interpretable classification results. When dealing with real-world classification problems, Linear Discriminant Analysis is the most common benchmarking method
before other more complex and flexible ones [24]. The LDA model estimates the mean and the variance of the input data for each class [24].
3.2 Naïve Bayes Model
This supervised learning algorithm is based on Bayes' theorem and is commonly used for classification problems. This probabilistic classifier makes quick predictions on the basis of the probability of objects. It is one of the most effective classification algorithms since it helps in building fast machine learning models that make quick predictions [24]. It can be employed for binary as well as multi-class classification and, compared to other algorithms, it performs well in multi-class prediction.
3.3 Decision Trees Model
The Decision Tree is a supervised learning technique which is commonly used for solving various classification and regression problems. However, this method is mostly preferred for classification problems [6]. In this tree-structured classifier, the internal nodes represent the dataset features, while branches correspond to decision rules and each leaf node is an outcome. This graphical representation is used in order to generate all the possible solutions based on some given conditions.
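To make the modelling step concrete, the following minimal Python sketch shows how the three classifiers described above could be trained with scikit-learn. It is only an illustration under stated assumptions: the file name and column names are taken from the dataset description in Sect. 5 and are not the authors' actual code or data layout.

```python
# Illustrative sketch only; file and column names are assumptions based on Sect. 5.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("chemo_side_effects.csv")            # hypothetical file name
X = df.drop(columns=["Patient.ID", "Toxicitylevel"])  # the 12 predictor variables
y = df["Toxicitylevel"]                               # target: Low / Medium / High

# Hold out roughly 20% of the 1000 patients for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```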
4 Evaluation of Methods
One of the most common evaluation tools in machine learning is the confusion matrix [24]. It can be applied in the case of multi-class classification problems to generate a group of prediction scores. Once the confusion matrix is defined, various metrics may be derived from it, such as classification accuracy, sensitivity and specificity. In order to assess the classifiers, we consider in this work the classification accuracy, the sensitivity, the specificity and the Kappa coefficient. We denote by TP: true positive; TN: true negative; FP: false positive; FN: false negative.
– Classification Accuracy: it estimates how often the model gives true predictions [24]:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
– Sensitivity: it is defined as the true positive rate and corresponds to the proportion of positive data points that are correctly classified as positive, with respect to all positive data points:
Sensitivity = TP / (TP + FN)
– Specificity: it is defined as the true negative rate and corresponds to the proportion of negative data points that are correctly classified as negative, with respect to all negative data points:
Specificity = TN / (TN + FP)
– Kappa coefficient: it is an evaluation metric which compares the observed accuracy with an expected accuracy (or random chance). The aim is to assess the performance of a classifier in relation to a "random classifier" [25]. Formally, it is defined as follows:
k = (Accuracy − randomAccuracy) / (1 − randomAccuracy)
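As an illustration (not the authors' code), the following sketch computes these metrics from predictions with scikit-learn; in the three-class setting, sensitivity and specificity are obtained per class by treating that class as the positive one. The variables y_test and y_pred are assumed to come from a previously fitted model.

```python
# Illustrative sketch: accuracy, per-class sensitivity/specificity and Cohen's Kappa.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

cm = confusion_matrix(y_test, y_pred)       # rows: true classes, columns: predicted classes
accuracy = accuracy_score(y_test, y_pred)   # (TP + TN) / total, generalized to three classes
kappa = cohen_kappa_score(y_test, y_pred)   # (accuracy - random accuracy) / (1 - random accuracy)

for i, label in enumerate(np.unique(y_test)):   # one-vs-rest view of each class
    TP = cm[i, i]
    FN = cm[i, :].sum() - TP
    FP = cm[:, i].sum() - TP
    TN = cm.sum() - TP - FN - FP
    print(label, "sensitivity:", TP / (TP + FN), "specificity:", TN / (TN + FP))

print("accuracy:", accuracy, "kappa:", kappa)
```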
5 Characteristics of the Dataset
The considered dataset concerns 1000 cancer patients who declared side effects during chemotherapy. This dataset is related to 10 types of frequent cancers: breast cancer, prostate cancer, lung cancer, skin cancer, bladder cancer, neck cancer, non-Hodgkin's lymphoma, uterine cancer, myeloid leukemia and colorectal cancer. We focus on the most common side effects that should be reported 7 days after a chemotherapy session. In Table 1, we present a part of this data, where 14 different variables represent the input and output/target data. These variables are described as follows:
1. Patient.ID: the patient identifier, which is a unique value for each patient;
2. Toxicitylevel: it represents the severity grade and the patient's general state that has to be predicted as the target data. The three category values are "High", "Medium" and "Low";
3. Age: the patient's age;
4. Gender: Male or Female;
5. Chronic Disease grade: describes the patient's cancer stage;
6. Chest pain: indicates the level of chest pain;
7. Urinate troubles: relative to urinary tract infections;
8. Fatigue;
9. Constipat: relative to constipation signs;
10. Difbreath: indicates the level of breathing difficulties;
11. Vomiting: indicates the level of vomiting signs;
12. Diarrhea: relative to the level of diarrhea signs;
13. Skintroubles: represents the level of skin troubles;
14. Fever: indicates the level of detected fever.
The value of each symptom variable (chest pain, urinate troubles, fatigue, constipation, breathing difficulties, vomiting, diarrhea, skin troubles and fever) is defined in the input data as follows. Besides the intensity level of adverse events, we also consider another important parameter, which is the occurrence. For level 1, the values 1, 2 and 3 are assigned to a symptom when the adverse event of level 1 occurs once, twice or more times within a week, respectively. For level 2, the values 4, 5 and 6 are assigned when an adverse event of level 2 occurs once, twice or more times within a week. In the same way, the values 7, 8 and 9 are assigned according to the frequency of events of level 3 in a week. Hence, we have 9 possible values for each symptom during the treatment process.
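This coding rule can be written as a simple mapping. The helper below is only a hypothetical illustration of the rule stated above, not part of the authors' pipeline.

```python
def symptom_code(level: int, occurrences_per_week: int) -> int:
    """Map an adverse-event severity level (1-3) and its weekly frequency to the
    1-9 symptom value: 1-3 for level 1, 4-6 for level 2, 7-9 for level 3
    (once, twice, or more times within a week)."""
    frequency_slot = min(occurrences_per_week, 3)  # once, twice, or "more times"
    return 3 * (level - 1) + frequency_slot

# Example: a level-2 adverse event occurring twice in a week maps to the value 5.
print(symptom_code(level=2, occurrences_per_week=2))
```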
Table 1. Six first lines of the dataset

    Level   Age  Gender  Chronic_Disease_grade  Chest_Pain  Urinate_troubles  Fatigue  Constip  Dif_Breath  Vomiting  Diarrhea  Skin_Troubles  Fever
1   Low     33   1       2                      2           4                 3        4        2           2         3         2              4
2   Medium  17   1       2                      2           3                 1        3        7           8         6         1              2
3   High    35   1       3                      4           8                 8        7        9           2         1         6              2
4   High    37   1       3                      7           8                 4        2        3           1         4         6              5
5   High    46   1       3                      7           9                 3        2        4           1         4         4              3
6   High    35   1       3                      4           8                 8        7        9           2         1         6              2
6 Data Preprocessing
After inspection of the raw dataset, we confirm that no missing values are detected. Then, we check the correlation between variables, since the machine learning methods used here assume the independence of the predictive variables. In Fig. 2, we show the generated correlation matrix.
Fig. 2. Correlation Matrix between variables in the dataset
According to this matrix, there are some correlations between the different variables (shown in blue). Therefore, we need to remove the detected correlations in order to retain all the essential information in our data. One of the most common techniques for transforming a set of variables into a smaller one is Principal Component Analysis (PCA) [26]. This technique is a dimensionality-reduction method which is commonly used for large data sets. It aims to reduce the dimensionality of the data set by transforming a large set of variables into a smaller one while preserving most of the
information in the large set. Hence, the correlated variables are transformed into new ones that are known as principal components (PC). This transformation consists in projecting each data point onto only the principal components to obtain lower-dimensional data while maintaining the data's variation [27]. Thus, the method is used to summarize the main characteristics of the data in order to build predictive models. Notice that smaller data sets are easier to analyze and explore with machine learning techniques, without processing extraneous variables. However, the challenge of dimensionality reduction is the trade-off between accuracy and simplicity. There are different ways to determine the parameter p, the number of principal components to keep among a set of variables. One way is to retain the components with a proportion of variance greater than 0.1; when this proportion is less than 0.1, the corresponding principal component rarely has interpretive value. Another way to determine the number of principal components to keep is a visual inspection of a scree plot. This diagram displays the eigenvalues in a downward curve, ordering them from largest to smallest. In the scree test, the elbow of the graph where the eigenvalues seem to level off is determined, and the components to the left of this point are retained as significant factors [26]. This strategy is illustrated with statistical data generated from a multivariate distribution with correlations equal to 0.0.
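The correlation check and the PCA-based reduction described in this section could be reproduced along the following lines. This is a generic scikit-learn sketch, assuming the predictors are held in a DataFrame X as in the earlier sketch; it is not the authors' actual code.

```python
# Illustrative sketch: correlation inspection and PCA with a variance-based
# choice of the number of components (X is the predictor DataFrame).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

print(X.corr().round(2))                    # pairwise correlations between predictors

X_std = StandardScaler().fit_transform(X)   # PCA is sensitive to variable scales
pca = PCA().fit(X_std)

explained = pca.explained_variance_ratio_
print("proportion of variance:", np.round(explained, 4))
print("cumulative proportion: ", np.round(np.cumsum(explained), 4))

# Retain the components whose proportion of variance exceeds 0.1,
# or inspect a scree plot of pca.explained_variance_ for the elbow.
n_components = int((explained > 0.1).sum())
X_reduced = PCA(n_components=n_components).fit_transform(X_std)
```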
Table 2. The importance of the different components before data reduction

       Standard Deviation   Proportion of Variance   Cumulative Proportion
PC1    1.8972               0.3272                   0.3272
PC2    1.2915               0.1516                   0.4789
PC3    1.2337               0.1384                   0.6172
PC4    1.0694               0.1040                   0.7212
PC5    0.9755               0.0865                   0.8077
PC6    0.81958              0.06106                  0.86877
PC7    0.74520              0.05048                  0.91925
PC8    0.57555              0.03011                  0.94937
PC9    0.49467              0.02225                  0.97161
PC10   0.44722              0.01818                  0.98980
PC11   0.3350               0.0102                   1.0000
The results of the principal component analysis are shown in Table 2. The output of this analysis is a list of characteristics for each principal component: standard deviation, proportion of variance and cumulative proportion. This table illustrates the importance of the different components before data reduction. We notice that the first two components explain 0.4789 of the information quantity (or variance). In fact, we need 8 principal components to explain more than 0.94 of the variance and 10 components to explain more than 0.98. Hence, after the dimensionality reduction process using PCA,
we obtain the results in Table 3, which illustrates the importance of the principal components after data reduction. The obtained values show that about 97% of the variance is explained by 7 principal components in the transformed dataset.
Table 3. The importance of the different components after data reduction

       Standard Deviation   Proportion of Variance   Cumulative Variance
PC1    1.495                0.2794                   0.2794
PC2    1.2666               0.2005                   0.4799
PC3    1.0435               0.1885                   0.6685
PC4    1.0435               0.1361                   0.8046
PC5    0.80726              0.08146                  0.88608
PC6    0.67711              0.05731                  0.94339
PC7    0.50526              0.03191                  0.9753
PC8    0.4445               0.0247                   1.0000
7 Prediction Process
We provide in this section the outcomes of each machine learning model.
7.1 Linear Discriminant Analysis
The results obtained using linear discriminant analysis are illustrated in Fig. 3. We can deduce the following information:
– Correct and incorrect predictions for each class: the first line of the confusion matrix shows that, for cancer patients with a real severe state "High" 7 days after the last chemotherapy session, the LDA model leads to correct predictions for 35% of the total number of cases, while it generates incorrect predictions towards the other two classes for fewer than 10% of the cases. Remember that the total number of cases in the test phase is the sum of all the rows' (or columns') sums (199 cases).
– Frequency of each true class in the test dataset: we notice the following proportions for each class: 39.69% "High", 29.14% "Medium" and 31.15% "Low". These proportions are calculated from the sum of each row.
– Frequency of each predicted class in the test dataset: we obtain the following predicted proportions: 36.68% "High", 30.15% "Medium" and 33.16% "Low".
Fig. 3. Results of Linear Discriminant Analysis
7.2 Naïve Bayes Model
With the Naïve Bayes method, we obtained the results of Fig. 4. We deduce that:
– Correct and incorrect predictions for each class: the distribution of correct predictions is the same as for LDA. In fact, the Naïve Bayes model leads to correct predictions for 35% of the total number of cases.
– Frequency of each true class in the test dataset: we notice the following proportions: 35.17% for class "High", 35.67% for class "Medium" and 29.14% for class "Low".
– Frequency of each predicted class in the test dataset: we obtain the following predictions: 36.68% "High", 30.15% "Medium" and 33.16% "Low".
7.3 Decision Tree Model
With the Decision Tree, we obtained the results shown in Fig. 5. We deduce that:
– Correct and incorrect predictions for each class are the same as those obtained with the Naïve Bayes model. In fact, we obtained 35% of correct predictions out of the total number of cases.
– Frequency of each true class in the test dataset: we noticed 39.69% for "High", 38.69% for "Medium" and 29.14% for "Low".
– Frequency of each predicted class in the test dataset: we obtained 36.68% for "High", 30.15% for "Medium" and 33.16% for "Low".
Fig. 4. Results of Naïve Bayes Model
8 Comparative Study
In order to assess the performance of the proposed classification models, we considered the accuracy metric and the Kappa coefficient. In Fig. 6, we present the performance of the three models described above according to these two metrics.
Fig. 5. Results of Decision Tree Model
We notice that the Naïve Bayes model leads to the best performance, with an accuracy of 0.9447, compared to the LDA model (accuracy = 0.8995) and the Decision Tree model (accuracy = 0.8593). Moreover, we compared the Kappa coefficient, which estimates the quality of the results. This coefficient takes into account the errors in the rows and columns of the confusion matrix and typically varies between 0 and 1. In other words, Kappa describes the proportional reduction of the error of a classification method compared to the error of a completely random classification. For instance, a Kappa of 0.9 is very unlikely to be the result of chance. According to the obtained results, the Naïve Bayes model holds the highest accuracy value and the highest Kappa (0.91). It is clearly the best model for our classification problem.
Fig. 6. Evaluation of the proposed models according to Classification Accuracy and Kappa coefficient
9 Conclusions
Advanced health tracking systems for cancer patients are crucial for early and better care. In this context, we proposed to support medical decision processes for cancer patients during chemotherapy. The proposed approach is based on supervised machine learning models that predict the toxicity level of a cancer patient (high, medium, low). On the basis of the collected medical data, a first step of data cleaning and feature engineering is performed. Then, the prediction process is launched by using three prediction models separately: Linear Discriminant Analysis, Naïve Bayes and Decision Trees. Simulation results on clinical data showed the accuracy of the proposed prediction models. The best results were achieved by the Naïve Bayes model, with 94% accuracy. The highest values of the specificity and sensitivity metrics were also obtained with the Naïve Bayes model. Moreover, we used the Kappa coefficient, which describes the proportional reduction of the error of a classification method compared to the error of a completely random classification. The Kappa coefficient confirms that the Naïve Bayes model outperforms the others, since it achieved the highest Kappa coefficient value (0.95).
References 1. Adam, G., Rampasek, L., Safikhani, Z., et al.: Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis. Oncol. 4, 19 (2020) 2. Statistiques Nationales sur les causes de décès en Tunisie. Ministère de santé, Institut National de santé (2021). http://www.santetunisie.rns.tn 3. Kourou, K., Exarchos, T.P., Exarchos, K.P., et al.: Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015). https://doi.org/10. 1016/j.csbj.2014.11.005 4. Carr, C., Ng, J., Wigmore, T.: The side effects of chemotherapeutic agents. Current Anaesth. Crit. Care 19(2), 70–79 (2008). https://doi.org/10.1016/j.cacc.2008.01.004 5. Common Terminology Criteria for Adverse Events (CTCAE). U.S. Department of Health and Human Services, Report Version 5.0., November 27 (2017) 6. Hutter, F., Kotthoff, L., Vanschoren, J. (eds.): Automated Machine Learning-Methods, Systems, Challenges. Springer, Cham (2019).https://doi.org/10.1007/978-3-030-05318-5 7. Zhang, X.D.: Machine learning. In: A Matrix Algebra Approach to Artificial Intelligence, pp. 223–440. Springer, Singapore (2020) 8. Steyerberg, E.W.: Clinical prediction models: a practical approach to development, validation and updating. Springer, New York (2009). https://doi.org/10.1007/978-0-387-77244-8 9. Bates, D.W., Saria, S., Ohno-Machado, L., et al.: Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. 33(7), 1123–1131 (2014). https://doi.org/10.1377/hlthaff.2014.0041 10. Brooks, G.A., Kansagra, A.J., Rao, S.R., et al.: A clinical prediction model to assess risk for chemotherapy-related hospitalization in patients initiating palliative chemotherapy. JAMA Oncol. 1(4), 441–447 (2015). https://doi.org/10.1001/jamaoncol.2015.0828 11. Collins, G.S., Reitsma, J.B., Altman, D.G., et al.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J. Eur. Urol. 67(6), 1142–1151 (2015). https://doi.org/10.1016/j.eururo.2014.11.025 12. Zhou, Z.R., Wang, W.W., Li, Y., et al.: In-depth mining of clinical data: the construction of clinical prediction model with R. Ann. Transl. Med. 7(23), 796 (2019). https://doi.org/10. 21037/atm.2019.08.63 13. Harrell, F.E.: Ordinal logistic regression. In: Regression Modeling Strategies. SSS, pp. 311– 325. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19425-7_13 14. Weng, W.-H.: Machine learning for clinical predictive analytics. In: Celi, L.A., Majumder, M.S., Ordóñez, P., Osorio, J.S., Paik, K.E., Somai, M. (eds.) Leveraging Data Science for Global Health, pp. 199–217. Springer, Cham (2020). https://doi.org/10.1007/978-3-03047994-7_12 15. Henry, J., et al.: Adoption of electronic health record systems among US non-federal acute care hospitals. ONC Data Brief 35, 2008–2015 (2016) 16. Ghassemi, M., Naumann, T., Schulam, P., et al.: Opportunities in machine learning for healthcare. arXiv preprint arXiv:1806.00388 (2018) 17. Ding, D., Lang, T., Zou, D., et al.: Machine learning-based prediction of survival prognosis in cervical cancer. BMC Bioinform. 22(1), 1–17 (2021). https://doi.org/10.1186/s12859-02104261-x 18. Kumar, Y., Gupta, S., Singla, R., Hu, Y.-C.: A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch. Comput. Methods Eng. 29, 2043–2070 (2021). https://doi.org/10.1007/s11831-021-09648-w 19. 
Goncalves, D.M., Henriques, R., Santos, L., Costa, R.S.: On the predictability of postoperative complications for cancer patients: a Portuguese cohort study. BMC Medical Inform. Decis. Mak. 21, 200 (2021). https://doi.org/10.1186/s12911-021-01562-2
20. Isaksson, L.J., Pepa, M., Zaffaroni, M., et al.: Machine learning-based models for prediction of toxicity outcomes in radiotherapy. Front. Oncol. 10, 790 (2020). https://doi.org/10.3389/ fonc.2020.00790 21. Kumar, Y., Singla, R.: Federated learning systems for healthcare: perspective and recent progress. In: Rehman, M.H.U., Gaber, M.M. (eds.) Federated Learning Systems: Towards Next-Generation AI. SCI, vol. 965, pp. 141–156. Springer, Cham (2021). https://doi.org/10. 1007/978-3-030-70604-3_6 22. Rafique, R., Riazul Islam, S.M., Kazi, J.U.: Machine learning in the prediction of cancer therapy. Comput. Struct. Biotechnol. J. 19, 4003–4017 (2021). https://doi.org/10.1016/j.csbj. 2021.07.003 23. Tanoli, Z., Vaha-Koskela, M., Aittokallio, T.: Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin. Drug Discov. 16(9), 977–989 (2021) 24. Shobha, G., Rangaswamy, S.: Machine learning. In: Handbook of Statistics, vol. 38, pp. 197– 228, Elsevier (2018). https://doi.org/10.1016/bs.host.2018.07.004 25. Uddin, S., Khan, A., Hossain, M., et al.: Comparing different supervised machine learning algorithms for disease prediction. BMC Medical Inform. Decis. Mak. 19, 281 (2019). https:// doi.org/10.1186/s12911-019-1004-8 26. Jolliffe, I.: Principal component analysis. In: Lovric, M. (eds.) International Encyclopedia of Statistical Science. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-048982_455 27. Bro, R., Smilde, A.K.: Principal component analysis. Anal. Methods 6(9), 2812–2831 (2014). https://doi.org/10.1039/C3AY41907J
Underwater Acoustic Sensor Networks: Concepts, Applications and Research Challenges Kamal Kumar Gola1(B) , Brij Mohan Singh1 , Mridula1 , Rohit Kanauzia1 , and Shikha Arya2 1 COER University, Roorkee, Uttarakhand, India
[email protected], [email protected], [email protected], [email protected] 2 Indian Institute of Technology, Roorkee, Uttarakhand, India [email protected]
Abstract. During the last years, Underwater Acoustic Sensor Networks (UASNs) have emerged as an interesting area of research. The harsh characteristics of the underwater environment cause many challenges for the lifetime of UASNs, and the major challenge that UASNs face is energy efficiency, which arises from the limited battery power of the sensor nodes. Hence, in order to cope with underwater characteristics such as limited bandwidth and high attenuation, and to optimize the limited energy consumption, UASN protocols need to be designed with utmost care. A lot of research has been done in this field, but several issues and challenges remain, such as the hotspot problem, real implementation of routing protocols, propagation delay, power consumption and energy-efficient paths. The main focus of this work is to present the existing issues and challenges that provide future directions for research in the field of underwater acoustic sensor networks.
Keywords: Void node avoidance · Propagation delay · Power consumption
1 Introduction
In the coming years, the ocean will play a significant part in catering to the needs of people and industry: renewable energy will be harvested from the sea, the gas and oil industries will move into much deeper waters, and many valuable materials will be mined from the seafloor. In view of this, there is a need to construct and maintain new offshore and port infrastructure, which is a herculean task owing to the mammoth size of the sea, its unexplored environment and people's inability to work for long periods under water because of the high pressure. As a result, researchers are striving hard to use wireless sensor networks as a substitute for conventional ocean exploration and monitoring methods. To communicate with each other, underwater sensor nodes prefer acoustic waves to radio frequency (RF) signals, which are strongly attenuated in the underwater environment; such a wireless sensor network is known as an underwater acoustic sensor
network, in which routing protocol design is one of the most active areas of research, since the routing protocol must ensure reliable and efficient data transmission from the source node to the destination node. UASN routing protocol design is more complicated than in terrestrial WSNs for many reasons, notably the unreliability of underwater routes due to the continuous movement of nodes with water currents, the inefficiency caused by high propagation delay, and the limited applicability of WSN technologies given the particular traits of acoustic waves and underwater channels [1]. The communication range over which each pair of sensor nodes can communicate affects the node density, the node placement and the cost of the network for the targeted monitoring area, and is therefore a major requirement for UASNs. For UASNs, acoustic and optical communication are the two main means of communication, of which underwater acoustic wireless communication is the most widely used because of its accessibility and its ability to cover great distances; at the same time, because of low propagation speed, limited bandwidth, high attenuation and its hostile effect on underwater creatures, it also has many limitations, such as large delay and scattering. Keeping these shortcomings in mind, optical waves are another approach that can be used since, according to the research that has been carried out, they offer a higher data rate, low latency and energy efficiency, but at the cost of a limited communication range. Before establishing a networking platform that integrates the physical aspects, the node deployment firmware and the network formation, the sensor node specifications and complexity aspects have to be taken into account, as this helps in determining routes dynamically without any additional or prior information about other nodes. Additionally, the complexity of the node algorithms, which has an impact on the nodes' energy optimization, also needs to be taken into consideration. The performance of node localization procedures is also hampered by the complexity of the underwater acoustic channel, such as Doppler shift, multipath and high delay. The dynamic traits of the underwater environment also expose the network to numerous threats and malicious attacks. In order to enable communication for exchanging information, it is mandatory for the network to establish trust before nodes securely connect to it.
2 Basics of Acoustic Communication
The acoustic signal is the only practical means that has been proven to work satisfactorily in the underwater environment. Although there are several other alternatives available in the form of underwater optical and electromagnetic waves, the requirements of underwater sensors rule them out. The high-frequency electromagnetic wave has a quite limited communication range due to the absorption effect and high attenuation, being about 1 m in fresh water. Although propagation at low frequencies can be acceptable, it comes with several problems, such as the requirement of a very long antenna and a very high transmission power cost. Even though the technical details are not yet mature, the electromagnetic modems used for underwater communication have developed at a fast pace during recent years. The absorption of acoustic signals at the frequencies utilised under sea water is lower by three
orders of magnitude. Although the optical link is considered to be excellent for underwater communication, it is effective only in very clean water; it is not effective if the water is turbid, and its range is less than 5 m. Over and above this, accurate positioning is essential for narrow-beam optical transmitters. Thus, it cannot be considered an apt way for long-distance underwater communication, especially when the water is not clear, as in shallow areas. On the contrary, the acoustic signal is a suitable and reliable medium for densely deployed, economical and temporary underwater sensor networks. It also facilitates multi-directional transmission along with access to a shared channel with reasonable signal attenuation. However, in spite of all its attractions relative to optical and electromagnetic waves, the underwater acoustic signal poses critical limitations, i.e. a very high error rate, path loss, large propagation delay and limited bandwidth. The path loss is mainly due to the transmission distance and the frequency of the signal. The small bandwidth limits the data rate, which further depends on both the frequency and the communication range. Long-range systems that operate over long distances have a bandwidth limited to only a few kHz. On the contrary, a short-range system that operates over a limited range can communicate with a bandwidth of approximately 100 kHz.
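For illustration, the frequency dependence of this absorption is often summarized with Thorp's empirical formula. The sketch below is a textbook approximation given here for context only; it is not a model taken from this paper, and the spreading factor is an assumed parameter.

```python
import math

def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient (dB/km) for an acoustic
    signal of frequency f_khz (in kHz) in sea water."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def path_loss_db(distance_m: float, f_khz: float, spreading: float = 1.5) -> float:
    """Simplified path loss: spreading loss plus frequency-dependent absorption."""
    spreading_loss = 10 * spreading * math.log10(distance_m)
    absorption_loss = thorp_absorption_db_per_km(f_khz) * (distance_m / 1000.0)
    return spreading_loss + absorption_loss

# Higher frequencies are attenuated far more strongly over the same distance.
print(path_loss_db(1000, 10))    # 1 km at 10 kHz
print(path_loss_db(1000, 100))   # 1 km at 100 kHz
```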
3 UASNs Architecture
The general architectures of UASNs [1] are classified into the following categories based on deployment.
3.1 1-D Architecture
In this kind of design, sensor nodes are deployed independently and every sensor node acts as an independent network. These sensor nodes are capable of sensing, processing and sending the information to the remote station. Here, a sensor node may be a floating buoy able to sense the underwater environment. For example, an autonomous underwater vehicle (AUV) sensor node can be deployed in the underwater environment for some time to sense the data and then float back to the surface to communicate the data. Acoustic, radio or optical communication is utilized to send the data. The topology of this design is a star, where the data is sent from the sensor node to the remote station via one hop.
3.2 2-D Architecture
In this architecture, a group of sensor nodes is deployed in the form of a cluster. For each cluster, there is one cluster head, also called an anchor node. The anchor node is responsible for transmitting the sensed information to the surface buoyant nodes. Here, the communication takes place in two dimensions, horizontal and vertical. The horizontal communication link is used by each member to communicate with the respective anchor node, while the vertical communication link is used by the anchor node to communicate with the surface buoyant node. Acoustic, radio frequency or optical communication is used to transmit the information. The choice of communication medium depends on the application and the nature of the underwater environment.
3.3 3-D Architecture
Like in 2-D, a group of sensor nodes is placed at different depths in the form of clusters. These sensor nodes are anchored at different depths. Three types of communication scenarios take place: intercluster communication, intracluster communication and anchor-buoyant node communication. At different depths, intercluster communication takes place between sensor nodes, intracluster communication takes place between a sensor node and its anchor node, while anchor-buoyant communication takes place between the anchor node and the buoyant node. Acoustic, radio frequency or optical communication is used for all types of communication.
3.4 4-D Architecture
This type of architecture is the combination of 3-D and mobile UASNs. Here, remotely operated underwater vehicles (ROVs) are used to collect the information from the anchor nodes and then relay that information to the remote station. These ROVs can also be autonomous robots, ships or submarines. Each sensor node can transmit its data to an ROV directly, depending on how close it is to the ROV. Acoustic, radio frequency or optical communication can take place between an underwater sensor node and an ROV. Sensor nodes use the radio communication link if they are close to the ROV and have a large amount of data, whereas the acoustic communication link is used if the sensor node is far from the ROV and has a small amount of data.
4 Related Works
As far as the survival of humans and other living organisms on earth is concerned, the ocean, which contains approximately 96.5% of all earthly water, is undoubtedly an imperative component. Despite this fact, almost 95% of the ocean area still remains unexplored due to the lack of adequate acoustic communication technologies. By associating several pervasive sensor devices that enable Underwater Acoustic Sensor Networks (UASNs) to collect efficient and trusted data, UASNs have added a new technique by which the ocean can be explored. They have become prevalent owing to the numerous scenarios in which they can be applied, such as monitoring of underwater environments and pollution, search for rare minerals and monitoring of coastal areas. A UASN design is made up of a wide range of battery-constrained sensors. Within a specific environment, autonomous vehicles aimed at data collection are deployed. The topology of the underwater environment is highly dynamic, and sensor nodes move with the flow of water. Underwater sensor networks, in contrast to terrestrial sensor networks, face a few challenges that are unique in nature, notably the movement of nodes with water currents, the absence of GPS and three-dimensional node deployment. In addition, the impracticality of using radio waves, which fade easily inside water on account of absorption, as a mode of underwater communication brings forth further challenges. Hence, acoustic waves, which enable long-distance communication inside water, have been put to use as the means of communication. As a matter
of fact, the underwater acoustic channel is also not free of obstacles, for example the Doppler effect, a higher noise level, higher propagation delays and a higher bit error rate, and all of these must be taken into consideration while designing an effective protocol for underwater sensor networks. The routing protocols must be able to produce reliable and active communication links without the assistance of pre-arranged devices, since it is unfeasible to arrange such devices beforehand owing to the deployment of underwater acoustic sensor networks in areas that do not permit it [2]. Here, Table 1 shows a summary in terms of key points, research gaps, advantages, disadvantages and simulation platform. These research gaps provide new directions that researchers can explore.
Table 1. Comparative Analysis of Routing Protocols

[3] (Wang et al. 2017). Key points/application: route maintenance, improves network lifetime and use of beacon message. Research gap: need real implementation of the work. Advantages: improves network lifetime. Disadvantages: presence of void node due to improper selection of next hop. Simulation tool: NS-3.
[4] (Jin et al. 2017). Key points/application: selection of best next hop and use of action utility function. Research gap: void area coverage. Advantages: reduces average latency, increases network lifetime. Disadvantages: network lifetime decreases due to random distribution of residual energy. Simulation tool: NA.
[5] (Kim 2018). Key points/application: use of reinforcement learning and game theory, for harsh conditions. Research gap: void area coverage. Advantages: high throughput and energy efficiency. Disadvantages: facing problems of packet delivery ratio and loss due to retransmission. Simulation tool: MATLAB.
[6] (Shah et al. 2018). Key points/application: depth adjustments, transmission range. Research gap: high propagation delay and limited bandwidth. Advantages: delivery ratio is high. Disadvantages: high transmission loss. Simulation tool: MATLAB.
[7] (Wang et al. 2018). Key points/application: use of Opportunistic Directional Forwarding Strategy (ODFS). Research gap: void area coverage. Advantages: energy efficiency, delivery ratio is high. Simulation tool: NS-3.
[8] (Khan et al. 2019). Key points/application: reduce the packet loss and channel congestion. Research gap: reduction in node number reduces the performance of protocols. Advantages: high packet delivery ratio. Disadvantages: backward transmission path is large. Simulation tool: MATLAB.
[9] (Lu et al. 2020). Key points/application: depth information, residual energy and void detection factor, Q-value based holding time. Research gap: void detection. Advantages: delivery ratio is good. Disadvantages: high transmission loss. Simulation tool: Aqua-Sim platform (NS-2).
[10] (Khan et al. 2020). Key points/application: reduced base angle, intelligent approach to select the best forwarder node, holding time. Research gap: selection of best forwarder node. Advantages: delivery ratio is good, increases network lifetime. Disadvantages: transmission loss is high, network and communication overhead. Simulation tool: MATLAB.
[11] (Rathore et al. 2020). Key points/application: whale and wolf optimization algorithm, fitness function. Research gap: reduction in number of nodes, network lifetime. Advantages: delivery ratio is high. Disadvantages: backward transmission path is large. Simulation tool: NS-2.
[12] (Gola & Gupta 2021). Key points/application: use of optimization function, link quality, residual energy and depth difference. Research gap: end-to-end delay. Advantages: reduces energy consumption. Disadvantages: high delay. Simulation tool: MATLAB.
[13] (Chaaf et al. 2021). Key points/application: use of level-based clustering, stable cluster head selection, dynamic sleep scheduling mechanism, use of Autonomous Underwater Vehicles (AUVs) to identify the void holes. Research gap: void hole problem. Advantages: less energy consumption, less delay and high packet delivery ratio. Disadvantages: early death of cluster head. Simulation tool: NS-3.
[14] (Mhemed, Comeau, Phillips & Aslam 2021). Key points/application: use of hop count discovery procedure, data transmission is done based on rank. Research gap: void node avoidance. Advantages: high packet delivery ratio, low energy consumption. Disadvantages: not suitable for large networks. Simulation tool: MATLAB.
5 Open Issues and Research Challenges
In the previous section, it was seen that many researchers have devoted a lot of effort to handling the issues of UASNs associated with MAC layer design, target detection, coverage and connectivity strategies, acoustic channel modeling, etc. However, some challenges are still open and need attention. This section presents the open issues and research challenges in the field of UASNs.
5.1 Void Node Problem
As we know, successful data packet delivery is one of the most challenging tasks in any type of routing. One major issue that always affects the overall performance of routing is the presence of void nodes in the network. A node is known as a void node if it does not have any forwarder node in the network; sometimes it is also known as a dead end. The presence of such nodes can affect the performance, especially in terms of packet loss, delay and energy consumption. It has been seen that the performance of cooperative routing techniques is affected to a significant extent in sparse networks. Cluster-based routing performs well in handling the void region, but the early death of the cluster head (CH) cannot be avoided. This area still needs to be explored. Therefore, we need to establish an appropriate routing algorithm to overcome the issue of void node avoidance and also balance the load of the cluster head.
5.2 Secure Routing
As we know, data security is one of the most important challenges for any kind of network, and underwater acoustic sensor networks are no exception. It is often not possible to replace sensor nodes in the underwater environment on a regular basis, and sensor nodes also have limitations such as limited energy, limited storage capacity and limited communication capability. In the underwater environment, data is transmitted from one node to another through a specific route, and each sensor node may become a forwarder node, which can create opportunities for attackers. The entire network may be destroyed due to a lack of sufficient security, and hence all efforts would be in vain. Therefore, the security aspects of data transmission need more attention in order to counter several attacks and enhance the robustness of underwater acoustic sensor networks. However, additional energy costs may be imposed when providing security in routing. Therefore, we need to design suitable security mechanisms that also maintain a trade-off between security and energy efficiency.
5.3 Optimal Energy Efficient Route
Routing plays an important role in UASNs, and finding the shortest route in the network is a major challenge in any routing process. The main issue is whether we should follow the shortest route or the energy-efficient route to forward the data packets from source to destination. There are two scenarios in this situation. The first is that the energy-efficient route may not be the shortest and may require more time to forward the data packets to the destination. The second is that the shortest route may not be energy efficient. Therefore, we need to establish an optimal energy-efficient routing that maintains a trade-off between the shortest route and energy efficiency.
5.4 Hotspot Problem
Mostly, in the underwater environment, sensor nodes are deployed in a hierarchical fashion: some sensor nodes are at depth, some are in the middle and some are nearest to the sink node. The data is transmitted from the bottom of the seabed to the water surface. It is understood that the sensor nodes nearest to the water surface act as the best relay nodes due to their close proximity to the sink node. This results in almost unavoidable fast energy drainage and consequently the early death of such nodes. This type of problem is known as the hotspot problem. The resulting connectivity holes and/or partitions hamper the reliability of the network. Very little research has been done to address this problem, and existing work cannot completely avoid the early death of the sensor nodes nearest to the sink. Therefore, this problem also needs attention.
5.5 Link Stability
Due to water currents, the sensor nodes are in motion most of the time. Hence, the topology of underwater acoustic sensor networks is highly dynamic in nature and the routing links are highly unstable. These unreliable links degrade the performance of the network, especially in terms of low throughput and packet drops. Therefore, stable links need to be designed for reliable routing.
5.6 Network Partitioning
Due to the mobility of sensor nodes, network partitioning cannot be avoided. The process by which the network splits into two or more unconnected parts due to the free mobility of sensor nodes is known as network partitioning. This creates a scenario where the sink node is not accessible to other nodes, and even the predefined estimated path no longer exists. This also aggravates the connectivity void problem. Therefore, this problem also needs attention.
5.7 Sensor Node Movement Model
With the flow of water, sensor nodes drift in the underwater environment, which changes their respective locations. This dynamic topology degrades the overall performance of the network. Many mobility models are already available for terrestrial sensor nodes, but the dynamic structure and movement of water make UASNs totally different from terrestrial sensor networks. Therefore, an appropriate sensor node movement model is needed for UASNs.
5.8 Network Coverage and Connectivity
A major issue in UASNs is the placement of sensor nodes for coverage and connectivity in the network. This issue is related to data packet transmission, energy consumption and node deployment in UASNs. In UASNs, each node must be connected to at least one other node in the network and must also be deployed in the target area. The purpose is to provide better coverage and connectivity in the network. There are a few existing optimization algorithms that perform the node deployment strategy in UASNs. However, these algorithms struggle with a large number of iterations and high computational time, and achieve a good coverage rate only over a long period of time. Therefore, an energy-efficient optimization scheme is needed to achieve better network coverage and connectivity.
6 Conclusion
In previous years, a lot of improvement has been made in the field of UASNs. Still, there exist some gaps for improvement when we talk about large networks. The exceptional structure and characteristics of the underwater environment make these networks very complicated and reveal the gaps between technologies and their applications. This work provides the state of the art on the open issues and research challenges related to underwater acoustic sensor networks. This investigation helps researchers to find solutions to the existing issues for further improvement and also opens the door for long-term success in the field of UASNs.
References 1. Gola, K., Gupta, B.: Underwater sensor networks: ‘comparative analysis on applications, deployment and routing techniques.’ IET Commun. 14(17), 2859–2870 (2020). https://doi. org/10.1049/iet-com.2019.1171 2. Gupta, B., Gola, K.K., Dhingra, M.: HEPSO: an efficient sensor node redeployment strategy based on hybrid optimization algorithm in UWASN. Wireless Netw. 27(4), 2365–2381 (2021). https://doi.org/10.1007/s11276-021-02584-4 3. Wang, H., Wang, S., Bu, R., Zhang, E.: A novel cross-layer routing protocol based on network coding for underwater sensor networks. Sensors 17(8), 1821 (2017). https://doi.org/10.3390/ s17081821 4. Jin, Z., Ma, Y., Su, Y., Li, S., Fu, X.: A Q-learning-based delay-aware routing algorithm to extend the lifetime of underwater sensor networks. Sensors 17(7), 1660 (2017). https://doi. org/10.3390/s17071660 5. Kim, S.: A better-performing Q-learning game-theoretic distributed routing for underwater wireless sensor networks. Int. J. Distrib. Sens. Netw. 14(1), 1550147718754728 (2018). https://doi.org/10.1177/1550147718754728 6. Shah, M., Wadud, Z., Sher, A., Ashraf, M., Khan, Z.A., Javaid, N.: Position adjustment–based location error–resilient geo-opportunistic routing for void hole avoidance in underwater sensor networks. Concurr. Comput. Pract. Exp. 30(21), e4772 (2018). https://doi.org/10.1002/cpe. 4772 7. Wang, Z., Han, G., Qin, H., Zhang, S., Sui, Y.: An energy-aware and void-avoidable routing protocol for underwater sensor networks. IEEE Access 6, 7792–7801 (2018). https://doi.org/ 10.1109/access.2018.2805804 8. Khan, A., Aurangzeb, K., Qazi, E.U.H., Ur Rahman, A.: Energy-aware scalable reliable and void-hole mitigation routing for sparsely deployed underwater acoustic networks. Appl. Sci. 10(1), 177 (2019). https://doi.org/10.3390/app10010177 9. Lu, Y., He, R., Chen, X., Lin, B., Yu, C.: Energy-efficient depth-based opportunistic routing with q-learning for underwater wireless sensor networks. Sensors 20(4), 1025 (2020). https:// doi.org/10.3390/s20041025 10. Khan, I., et al.: Adaptive hop-by-hop cone vector-based forwarding protocol for underwater wireless sensor networks. Int. J. Distrib. Sens. Netw. 16(9), 155014772095830 (2020). https:// doi.org/10.1177/1550147720958305 11. Rathore, R.S., et al.: W-GUN: whale optimization for energy and delay-centric green underwater networks. Sensors 20(5), 1377 (2020). https://doi.org/10.3390/s20051377 12. Gola, K.K., Gupta, B.: Underwater Acoustic Sensor Networks: An Energy Efficient and Void Avoidance Routing Based on Grey Wolf Optimization Algorithm. Arab. J. Sci. Eng. 46(4), 3939–3954 (2021). https://doi.org/10.1007/s13369-020-05323-7 13. Chaaf, A., et al.: Energy-efficient relay-based void hole prevention and repair in clustered multi-AUV underwater wireless sensor network. Secur. Commun. Netw. 2021, 1–20 (2021). https://doi.org/10.1155/2021/9969605 14. Mhemed, R., Comeau, F., Phillips, W., Aslam, N.: Void avoidance opportunistic routing protocol for underwater wireless sensor networks. Sensors 21(6), 1942 (2021). https://doi. org/10.3390/s21061942
A Step-To-Step Guide to Write a Quality Research Article Amit Kumar Tyagi1(B)
, Rohit Bansal2 , Anshu3 , and Sathian Dananjayan4
1 Department of Fashion Technology, National Institute of Fashion Technology, New Delhi,
India [email protected] 2 Department of Management Studies, Vaish College of Engineering, Rohtak, India 3 Faculty of Management and Commerce (FOMC), Baba Mastnath University, Asthal Bohar, Rohtak, India 4 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamilnadu 600127, India
Abstract. Today, publishing articles is a trend around the world in almost every university. Millions of research articles are published in thousands of journals annually across many streams/sectors such as medicine, engineering and science. But few researchers follow the proper and fundamental criteria to write a quality research article. Many articles published over the web amount to irrelevant or duplicate information, which is a waste of the available resources. This is because many authors/researchers do not know or do not follow the correct approach for writing a valid/influential paper. So, keeping such issues of new and existing researchers in many sectors in mind, we feel motivated to write an article that presents a systematic approach that can help researchers produce a quality research article. The authors can then publish their work in international conferences like CVPR, ICML, NeurIPS, etc., or in international journals with high impact factors, or as a white paper. Publishing good articles improves the profile of researchers around the world, and future researchers can cite their work as references to advance the respective research. Hence, this article provides sufficient information for researchers to write a simple, effective/impressive and qualitative research article on their area of interest.
Keywords: Quality Research · Research Paper · Qualitative Research · Quantitative Research · Problem Statement
1 Introduction
When we talk about the word 'Research' among new researchers/students, they often feel blank and afraid of what it is. Research means searching and refining old content in a new way. In simple words, for a literature review, readers/researchers do not need to read many papers; they can refer to a single article on the respective topic. For example, for the class imbalance problem, refer to [wide scale].
1.1 For Science
For streams like Economics, History, etc., results are not mandatory; in such streams, the hypothesis matters. But subjects like Physics, Chemistry and mathematics-related fields require proper proof and verification of statements.
1.2 For Engineering
For engineering, results are a must. For example, we can show real-time or simulator-based results for the transportation sector. Similarly, for healthcare, we can either try our proposed model on primary data or collect data (secondary data). Also, if you want to present a literature review, you can include a comparison of existing works like [13, 14]. A literature review may also contain some simulation-based results which show on which benchmarks other approaches fit, or why these existing approaches fit or fail. So, an efficient solution is required to solve the identified problem (Fig. 1).
Fig. 1. Flow Diagram for Selecting Articles
2 Collecting Quality Research Articles
Many duplicate articles are available over the web, which we need to filter out while downloading our articles. Before moving further, we first need to select an interesting topic that we like or that is in our area of interest. If we do not know what to search for, we may refer to the web for the leading technologies in the current era or the top problems in computer science, and we will get many results. For that, we can refer to:
376
• • • • • •
A. K. Tyagi et al.
www.googlescholar.com/www.google.com www.xmol.com www.researchgate.com www.elsevier.com www.springer.com and many more scholarly related databases
2.1 Previously Published Articles
First, we need to select our topics and download previously published articles on them. In this task, we can categorise works into those with and without results. Works without results may be easier to read, while results give a specific and clear picture of how a problem has been solved via a framework/method/algorithm.
Articles as pre-prints or in the arXiv database: Many researchers publish their work as pre-prints or in the arXiv database to avoid paying a fee to a journal or conference and to ensure that their research article reaches its targeted audience before it is published in a reputable journal. These publications are of great value and focus on the results of their research. Such research papers will be of more use to you in the process of preparing a high-quality research paper. On the other hand, the majority of reputable journals advise against citing more than two publications from pre-print or arXiv databases in a single paper. We are only permitted to refer to articles that have been published by reputable publishers such as MDPI, Hindawi, PLOS ONE, Springer, Elsevier, IEEE, ACM, and other similar organizations. You could limit or mislead yourself in numerous ways if you download papers from fraudulent publications, which is why you should avoid websites that falsely claim to be journals.
2.2 Segregating Unused/Duplicate Articles
3 Reading Articles Reading a research article is different from reading a scientific blog. Initially it is time-consuming, but the time required reduces with experience. Abstract, introduction, literature survey, methods, simulation and findings, and conclusion are the basic sections included in the layout of the majority of research articles, which follow a conventional pattern. Download from the internet at least twenty research publications based on the topic that most interests you in your field of research. You should always start with the introduction section and not the abstract; that should provide you with enough information to understand why the research is being done. First, for each article, determine the problem statement and then the solution that has been produced. The second step is to determine whether both the problem and the solution are clear or ambiguous. In the third step, it is important to comprehend the proposed framework or model. Finally, compare its performance and results to those of other frameworks and models presented in previously published research articles.
4 Summarizing all Works for Literature Review A literature review provides the reader with a full understanding of the developments in the field. The presentation of insight regarding conceptual and theoretical frameworks as well as procedures will be useful for the research communities. Discuss some of the most pressing concerns and topics currently being discussed in the field. With the help of newer works, you can explain how your study fits in with the field's overall trend and highlight its significance. It is not advised to provide scores based on the quality of the articles; instead, clearly mention the pros and cons of every research article. Review the relevant literature on your subject and emphasize the novel and important aspects of your research. The existing frameworks/models should be assessed in such a way that readers can identify the reasons for selecting a research article for a particular problem. Elaborate on the key details of the contribution of each article, including the framework, models, codes, etc. Provide a summary of relevant studies that illustrates how yours adds to, contradicts, or fills in gaps in the existing literature. To show what does and does not work, as well as what is lacking in the field, you can utilize your literature review as evidence. Provide evidence of the relevance of your research to a real-world problem or issue. It is important to cite other studies in your field to demonstrate that you are building upon work that has been recognized as relevant by the academic community at large. A common practice is to provide a summary of the literature review in tabular form; it makes it easy for readers to understand what to expect from your survey.
5 Finding a Feasible Problem It’s essential to strike a balance between questions that have previously been meticulously discussed and those that are unanswerable when choosing a topic for your research prospectus. Avoid picking something that has been discussed at length already, but also avoid picking something for which you have no good arguments. You should choose a subject that is not only interesting to you and leaves room for further investigation but also one that is feasible. Before beginning a research project, it is important for scientists to think through a few logistical concerns relating to its feasibility. The nature and scope of the problem: This is more about figuring out what factors influence your topic of study so you can formulate research questions. This is also a good moment to step back and do some background research to make sure you can find evidence from recent studies that support the existence of this problem as a gap in the scientific literature. Secondly, you will need a theory to back up your research, whether it’s qualitative or quantitative in nature. Thirdly, the methodology to solve the problem. This also concerns hardware and software availability. By answering the above points and with the help of the literature survey, a feasible problem statement can be chosen.
6 Solving the Identified Problem Simulation Tools: A computer simulation is a program that, when executed, allows one to investigate the approximate behaviour of a mathematical model through a series
of steps. The amount of computation required for a simulation is usually enormous; therefore, computers are required to perform these computations. However, the sheer number of calculations is not the only challenge posed by simulation. There is a vast range of methods and resources available for simulating systems, all of which are used to help understand complex systems and make decisions. It is difficult to develop a simulation tool from scratch for a particular problem domain. There are some standard simulation tools available, such as Matlab, Simulink, ns-3, Vortex, etc. Various open-source and proprietary simulation tools are available for different application domains. For example, ns-3 is an open-source tool for simulating network communication, CircuitLogix is a tool for designing and simulating electronic circuits, and Flood Modeller is a hydraulic simulation tool for modelling and visualising floods. Based on the application domain, library support and the researcher's experience, the right tool can be chosen for the implementation. Languages to be known: The selection of a language for the simulation is a challenging task. There are many different languages that compete with one another, and each one has its own set of benefits and drawbacks. Every simulation tool supports its own set of languages. Languages such as C, Python, Ruby, and Java are commonly needed for simulating computer science results. The use of simulation tools would be limited to professional programmers if there were no dedicated libraries for these languages that relieve the user of some of the effort. The ways in which users make use of the standard programming features provided by simulation languages will vary, and languages typically offer a degree of flexibility in describing various modelling circumstances.
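As an illustration of how a language with good standard-library support reduces the effort of writing a small simulation, the following sketch implements a toy single-server queue simulation in plain Python; the scenario, function names and parameter values are illustrative assumptions, not taken from any specific tool mentioned above.

```python
# Minimal illustrative sketch: average waiting time in a single-server
# queue, using only Python's standard library. All parameter values are
# arbitrary examples.
import random

def simulate_queue(arrival_rate=0.8, service_rate=1.0,
                   num_customers=10_000, seed=42):
    random.seed(seed)
    clock = 0.0            # arrival time of the current customer
    server_free_at = 0.0   # time at which the server becomes idle
    total_wait = 0.0
    for _ in range(num_customers):
        clock += random.expovariate(arrival_rate)    # next arrival
        start_service = max(clock, server_free_at)   # wait if server is busy
        total_wait += start_service - clock
        server_free_at = start_service + random.expovariate(service_rate)
    return total_wait / num_customers

if __name__ == "__main__":
    print(f"Average waiting time: {simulate_queue():.3f} time units")
```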
7 Comparing Your Results with Existing Results Reviewers will consider this section while deciding the practical applicability of your research findings; whether your findings confirm or refute your hypothesis, you should briefly discuss them here. The extent to which your research covers gaps in the existing knowledge base, how beneficial your methodology is, and how well you interpret the results of your study all stand out as strengths in your writing. These can be demonstrated by interpreting the research findings from prior works and comparing them with your own findings. In addition, it demonstrates the depth of your knowledge in the research field. This also helps the reader learn how to select traits, and which attributes are relevant, for the needs of a certain situation (e.g., for prediction). Up to this point, all of the sections (Sects. 1 to 7) have been used to write a quality article. However, once we have finished writing it and want to publish it, we need to seek the appropriate platform, which may be an international conference or a journal. The next section provides additional detail on this.
8 Publishing Your Research Work As a researcher, choosing which journal to send your work to in order to get it published is one of the most essential decisions you will have to make. Regrettably, there is no one straightforward tool that may guarantee that you will choose the very best possible site
to publish your research. Instead, take into consideration the following aspects so that you can make an informed decision. The journal/conference to which you choose to submit your work can have a substantial effect on the reach and significance of your study. One needs to dedicate a considerable amount of effort to compiling a list of appropriate journals, taking into account the research they cover, the publication process they use, and the turnaround time. 8.1 Journals The main categories of journals are:
• UGC-CARE
• Scopus
• Web of Science
• Emerging Sources Citation Index (ESCI)
• Science Citation Index Expanded (SCIE)
• Science Citation Index (SCI)
Keep in mind that SCI- and SCIE-indexed journals are excellent venues for high-quality research. Researchers can be more effective in providing insights into many fields, applications, places, and industries by referring to the indexed papers that are available. A few SCI journals are:
• IEEE Transactions on Networking
• IEEE Access
• MDPI Sensors
• MDPI Applied Sciences
• FGCS, Elsevier
• JKS, Elsevier
• Wireless communication network
• Machine Learning, Springer
• Journal of Ambient.., Springer
• and many more journals published by Elsevier, Springer, IET, MDPI, IOS Press, Wiley, etc.
The authors should check the scope of the journals before submitting their work to the respective journals/conferences. Other publishing options to consider are conferences, book chapters, etc. 8.2 Conferences Conference proceedings may be indexed in Scopus or Web of Science. Note that some reputed conferences conducted every year around the world are CVPR, ICML, NeurIPS, HPC, CoCoNet, ISDA, IAS, etc.
8.3 Chapters Book chapters may likewise be indexed in Scopus or Web of Science. Note that calls for chapters for respective books can be found on the publishers' websites or via Google. Read each mentioned detail carefully, submit your work accordingly, and follow the same process in the future if any correction is required. 8.4 Other We can transform novel works that involve a product or process (including an inventive step and industrial use) into patents or projects. However, such inventions should not have been filed before by anyone around the world; before submitting a patent application, the inventor is required to check this as part of the patentability search. We can file a patent nationally or internationally, for example:
• National – India
• International – USA, Germany, Japan, China, etc.
Many sample patent documents for national and international filings can be found on the respective patent office websites of those countries. We suggest all future researchers share their research data, such as code, datasets, supplements, etc., with the research communities. This will enable research communities to verify the results, reuse your data and work on it for the betterment of the results. The research data can be uploaded to repositories such as GitHub, Mendeley Data or Kaggle, and the URL can be shared in the research paper. Sharing data gives you many benefits, such as exposure for your work, citations, etc. It also boosts the faith in and authenticity of your research.
9 Challenges Faced During Conducting/Implementing Research There are several challenges faced by newcomers/researchers, which can be listed as:
• Poor internet availability
• Limited access to research-related databases and quality research articles
• Copyright/permission from third parties
• Collection of data sets, i.e., primary or secondary
• Validity/genuineness of the data set
• Validating the simulation results for a particular data set
• Existence of many models for simulation
• Shortage of skilled people
• Privacy of work communicated for review to conferences/journals
• Unavailability of high-performance systems for processing
• Weather conditions
Finally, we suggest all authors verify the following points before submitting their articles to any journal/conference.
• The abstract should be 250–300 words, including a summary of the problem definition, background, motivation, proposed work and results.
• Proper keywords (minimum 4, maximum 6).
• Avoid writing very short lines.
• All the references and figures need to be highlighted in black colour in the manuscript.
• Proper citations throughout the work (minimum 20 references).
• At the end of the introduction, the organisation of the work and a description of all sections need to be depicted/explained.
• Introduce all authors in the references; do not write et al. in references (in the literature survey it can be used).
• One heading may start on one page and end on the next page.
• Each and every figure needs to be made by the authors themselves.
• Plagiarism needs to be below 10 per cent with a zero/one-word similarity index (3 per cent from a single source).
• All references need to be in a consistent format, e.g., author's name, paper title, journal name, page no., year/APA style/according to the style of the journal or conference where you are submitting your work.
• Headings in 12-point Times New Roman; margins of 1.30" for top, bottom, left and right; references in 10-point Times New Roman; the title of the work in 18 points; all other content (including author names and affiliations) in 11-point Times New Roman.
As samples, we have suggested articles to refer to: journal papers [1–4], conference papers [5–12], and chapters [13–22]. Note that the above points may vary from journal to journal or conference to conference, so apply them in your research work according to the journal/conference guidelines.
10 Conclusion and Future Scope As discussed above, conducting research is essential in order to discover new ideas, inventions, and thoughts. However, the vast majority of students, academics, and researchers continue to struggle with a number of obstacles while discovering new things. This is due to a lack of adequate understanding regarding what to do, how to do it, and other relevant topics. As a result, in this work, we have provided a summary of all necessary elements, ranging from the most basic to the most advanced, for the publication of research articles (in reputable journals or conferences). If we check any research article that has been published in a high-impact journal or that has been presented at a reputable international conference, we will discover that all research articles employ the same methodology. It is important that the content of our respective works does not conflict, that we retain our ethical standards (for example, by adhering to the COPE rules), and that we do not favour or cite the work of others only for the sake of mutual understanding. Extending this guidance to chapters, reports, and other types of work for highly influential publishers will be considered as future work.
Acknowledgement. We want to thank the anonymous reviewers and our colleagues who helped us to complete this work.
Authors Contributions. Amit Kumar Tyagi & Sathian Dananjayan have drafted and approved this manuscript for final publication.
Conflict of Interest. The authors declare that no conflict exists regarding the publication of this paper. Scope of the Work. As the authors belong to the computer science stream, they have tried to make this article applicable to all streams, but most of the examples used (situations, languages, datasets, etc.) are from computer science-related disciplines only. This work can be used as a reference for writing good-quality papers for international conferences and journals. Disclaimer. Links and papers provided in this work are only given as examples; the omission of any citation or link is not intentional.
References 1. Nair, M.M., Tyagi, A.K., Sreenath, N.: The future with industry 4.0 at the core of society 5.0: open issues, future opportunities and challenges. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–7 (2021). https://doi.org/10.1109/ ICCCI50826.2021.9402498 2. Tyagi, A.K., Fernandez, T.F., Mishra, S., Kumari, S.: Intelligent Automation Systems at the Core of Industry 4.0. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds.) ISDA 2020. AISC, vol. 1351, pp. 1–18. Springer, Cham (2021). https:// doi.org/10.1007/978-3-030-71187-0_1 3. Goyal, D., Tyagi, A.: A Look at Top 35 Problems in the Computer Science Field for the Next Decade. CRC Press, Boca Raton (2020) https://doi.org/10.1201/9781003052098-40 4. Tyagi, A.K., Meenu, G., Aswathy, S.U., Chetanya, V.: Healthcare Solutions for Smart Era: An Useful Explanation from User’s Perspective. In the Book “Recent Trends in Blockchain for Information Systems Security and Privacy”. CRC Press, Boca Raton (2021) 5. Varsha, R., Nair, S.M., Tyagi, A.K., Aswathy, S.U., RadhaKrishnan, R.: The future with advanced analytics: a sequential analysis of the disruptive technology’s scope. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, T.-P. (eds.) HIS 2020. AISC, vol. 1375, pp. 565–579. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-730505_56 6. Tyagi, A.K., Nair, M.M., Niladhuri, S., Abraham, A.: Security, privacy research issues in various computing platforms: a survey and the road ahead. J. Inf. Assur. Secur. 15(1), 1–16 (2020) 7. Madhav, A.V.S., Tyagi, A.K.: The world with future technologies (Post-COVID-19): open issues, challenges, and the road ahead. In: Tyagi, A.K., Abraham, A., Kaklauskas, A. (eds.) Intelligent Interactive Multimedia Systems for e-Healthcare Applications, pp. 411–452. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-6542-4_22 8. Mishra, S., Tyagi, A.K.: The role of machine learning techniques in the Internet of Thingsbased cloud applications. In: Pal, S., De, D., Buyya, R. (eds.) Artificial IntelligenceBased Internet of Things Systems. Internet of Things (Technology, Communications and Computing). Springer, Cham. https://doi.org/10.1007/978-3-030-87059-1_4
9. Pramod, A., Naicker, H.S., Tyagi, A.K.: Machine Learning and Deep Learning: Open Issues and Future Research Directions for Next Ten Years. Computational Analysis and Understanding of Deep Learning for Medical Care: Principles, Methods, and Applications. Wiley Scrivener (2020) 10. Kumari, S., Tyagi, A.K., Aswathy, S.U.: The Future of Edge Computing with Blockchain Technology: Possibility of Threats, Opportunities and Challenges. In the Book Recent Trends in Blockchain for Information Systems Security and Privacy. CRC Press, Boca Raton (2021) 11. Dananjayan, S., Tang, Y., Zhuang, J., Hou, C., Luo, S.: Assessment of state-of-the-art deep learning based citrus disease detection techniques using annotated optical leaf images. Comput. Electron. Agric. 193(7), 106658 (2022). https://doi.org/10.1016/j.compag.2021. 106658 12. Nair, M.M., Tyagi, A.K.: Privacy: History, Statistics, Policy, Laws, Preservation and Threat analysis. J. Inf. Assur. Secur. 16(1), 24–34 (2021) 13. Tyagi, A.K., Sreenath, N.: A comparative study on privacy preserving techniques for location based services. Br. J. Math. Comput. Sci. 10(4), 1–25 (2015). ISSN: 2231–0851 14. Rekha, G., Tyagi, A.K., Krishna Reddy, V.: A wide scale classification of class imbalance problem and its solutions: a systematic literature review. J. Comput. Sci. 15(7), 886–929 (2019). ISSN Print: 1549–3636 15. Kanuru, L., Tyagi, A.K., A, S.U., Fernandez, T.F., Sreenath, N., Mishra, S.: Prediction of pesticides and fertilisers using machine learning and Internet of Things. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–6 (2021). https:// doi.org/10.1109/ICCCI50826.2021.9402536 16. Ambildhuke, G.M., Rekha, G., Tyagi, A.K.: Performance analysis of undersampling approaches for solving customer churn prediction. In: Goyal, D., Gupta, A.K., Piuri, V., Ganzha, M., Paprzycki, M. (eds.) Proceedings of the Second International Conference on Information Management and Machine Intelligence. LNNS, vol. 166, pp. 341–347. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-9689-6_37 17. Sathian, D.: ABC algorithm-based trustworthy energy-efficient MIMO routing protocol. Int. J. Commun. Syst. 32, e4166 (2019). https://doi.org/10.1002/dac.4166 18. Varsha, R., et al.: Deep learning based blockchain solution for preserving privacy in future vehicles. Int. J. Hybrid Intell. Syst. 16(4), 223–236 (2020) 19. Tyagi, A.K., Aswathy, S U.: Autonomous Intelligent Vehicles (AIV): research statements, open issues, challenges and road for future. Int. J. Intell. Netw. 2, 83–102 (2021). ISSN 2666–6030. https://doi.org/10.1016/j.ijin.2021.07.002 20. Tyagi, A.K., Sreenath, N.: Cyber physical systems: analyses, challenges and possible solutions. Internet Things Cyber-Phys. Syst. 1, 22–33 (2021). ISSN 2667–3452, https://doi.org/ 10.1016/j.iotcps.2021.12.002 21. Tyagi, A.K., Aghila, G.: A wide scale survey on botnet. Int. J. Comput. Appl. 34(9), 9–22 (2011). (ISSN: 0975–8887) 22. Tyagi, A.K., Fernandez, T.F., Aswathy, S.U.: Blockchain and aadhaar based electronic voting system. In: 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, pp. 498–504 (2020). https://doi.org/10.1109/ICECA4 9313.2020.9297655 23. Kumari, S., Muthulakshmi, P.: Transformative effects of big data on advanced data analytics: open issues and critical challenges. J. Comput. Sci. 18(6), 463–479 (2022). https://doi.org/ 10.3844/jcssp.2022.463.479
A Survey on 3D Hand Detection and Tracking Algorithms for Human Computer Interfacing
Anu Bajaj1,2(B), Jimmy Rajpal4, and Ajith Abraham2,3
1 Thapar Institute of Engineering and Technology, Patiala, India
[email protected]
2 Machine Intelligence Research Labs (MIR Labs), Auburn, USA
3 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
4 Guru Jambheshwar University of Science and Technology, Hisar, Haryana, India
Abstract. 3D hand detection and tracking algorithms have attracted increasing research interest in computer vision, pattern recognition, and human-computer interfacing, greatly inspired by emerging technologies like RGBD cameras, depth sensors and processing architectures. Therefore, this paper presents a survey of recent works on 3D hand detection and tracking and their applications as a natural user interface to control the computer with hand movements and gestures. It examines the literature in terms of (1) the 3D hand capturing techniques used, such as RGBD cameras and depth sensors, (2) processing with different image processing and computer vision algorithms and their hardware implementation, and (3) applications in human computer interfacing for realization of the system. While the emphasis is on the 3D mouse and keyboard, the related findings and future challenges are also discussed for practitioners. Keywords: Computer vision · image processing · hand gesture recognition · hand detection · hand tracking · artificial intelligence · human computer interaction · human computer interfacing

1 Introduction
In this digital era, as everything is becoming automated, the perspective towards computing systems is changing at a fast pace. They have enormous applications, from the military to medicine. Moreover, we are slowly heading towards virtual environments. This transition has sparked the evolution of human computer interfaces from bare-minimum components to full-fledged spatial computing. Some basic interfacing devices which have been used for the past several years are keyboards, mice, and stylus pens or touch screen surfaces. It started with wired devices that require a surface and placement near the processing system, making the computer system less portable and bulky. This led to the development of wireless devices that use a short-range wireless © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 384–395, 2023. https://doi.org/10.1007/978-3-031-35510-3_37
technology to communicate with the computer system. They are advantageous over wired devices in terms of the portability of the peripheral devices; however, they still require a surface for placement and a battery system. Both kinds of interfacing devices (wired or wireless) have their pros and cons based on their usage and place of application. This prompted a drift in research towards making them more compact and user-friendly. Hence, touchscreens came into the picture, which reduced the dependency on keyboard/mouse and drastically improved the user experience with the computer system. However, they are difficult to handle for large-screen devices [1]. One viable solution is to provide input to the computer without any handheld equipment, which leads to the emerging area of virtual or augmented reality. It needs specialized equipment like glasses for creating a virtual environment [2]. Apart from that, all these devices cause health problems for users: bad posture due to prolonged use may lead to wrist pain, stiffness, cervical problems, and swelling from shoulder to palm and fingers, and close proximity is not good for eye health [3]. It is now possible to solve such a complex problem with the advent of emergent technologies like computer vision, RGBD cameras, image processing, machine learning and depth sensing. These technologies not only resolve the above problems but also improve the user experience in terms of usability and accessibility while reducing the hardware requirements. Hence, it motivates us to study the existing literature on the use of advanced technologies involved in the development of an efficient and effective human computer interface like a mid-air mouse and typing keyboard [4], which can be realized with the help of processing architectures and 3D hand detection and tracking algorithms. The organization of the paper is as follows: Sect. 2 provides the methodology adopted for the selection of the papers. Section 3 presents existing work in 3D hand detection and tracking algorithms, followed by their hardware implementation in Sect. 4. Section 5 provides their applications in human computer interfacing. The major findings and future challenges are discussed in Sect. 6, which is followed by the conclusion in Sect. 7.
2 Methodology and Paper Selection
The literature review mainly focuses on original papers investigating 3D hand detection and tracking. Only journal articles, conference publications and patents are considered. The published works were identified by conducting a systematic literature search in IEEE Xplore, Springer, ScienceDirect, ACM Digital Library and Google Patents. These resources were used due to their technological and scientific orientation. The literature search consisted of three key concepts: (i) 3D hand detection and tracking algorithms, (ii) capturing technologies and (iii) hardware implementation techniques, e.g., Field-Programmable Gate Array (FPGA). The literature search was conducted using the following keywords: 'HCI', 'human computer interaction', 'human computer interfacing', '3D hand recognition', '3D hand tracking', '3D hand detection', 'deep learning', 'machine learning', 'capturing device', 'sensors', 'depth sensing', 'rgb', 'camera', 'stereovision',
‘active stereo’, ‘passive stereo’, ‘fpga’, ‘gpu’, ‘processor’, ‘development board’. The papers were then shortlisted based on the novel approaches and duplication was avoided. This review focuses on papers published from 2002 to 2022 inclusive. The selected papers were studied carefully to assure that the eligibility criteria were satisfied. Through this process, 38 articles were considered as relevant, hence included in this review. Moreover, the papers are categorized into two major categories: software and hardware implementation (see Fig. 1). The software implementation included the 3D hand detection and tracking algorithms (17) like CNN, R-CNN, VGG16, YOLO, matching and estimation algorithms. The hardware implementation covered the capturing devices (31), and FPGA based implementation (12). The capturing devices are further sub-divided into non-vision based devices (6) and vision based (25). The vision-based devices consisted RGB (7) depth sensing (6) and combination of both, i.e., RGBD devices (12). The numbers in round brackets tell the number of papers discussing a particular method. However, there are overlaps also like Sawant et al. [24] proposed the 3D matching and estimation algorithms using RGBD camera and implemented on FPGA.
Fig. 1. Categorization of papers based on the technology used
3 3D Hand Detection and Tracking Algorithms
3D image processing, which requires depth information, is attracting the attention of researchers. Depth images can either be extracted from stereo-video cameras or directly sensed with depth cameras like the Microsoft Kinect. The image frame acquired from the camera sensor is transferred to the image sequencer for storing the current and past images. These images are passed to the hand localization
unit for image segmentation to get the hand-related information. It includes various processing tasks like depth thresholding, body detection for hand location, cascade classifiers on HAAR features and skin-color maps [5]. After hand segmentation, hand tracking is done to capture temporal and spatial information for gesture recognition. The Kalman filter, CAMSHIFT and mean shift have been used for hand tracking. The shape and position of the hand are then processed by classification algorithms to classify the gestures according to the problem at hand. Some such gesture classification algorithms are Hidden Markov Models (HMMs) with Gaussian Process Regression, k-Nearest Neighbors (k-NN), Average Neighborhood Margin Maximization (ANMM), Neural Networks (NN), Support Vector Machines (SVM), and temporal matching methods; for more details refer to [6]. Various researchers have proposed solutions for 3D hand detection and tracking. Joo et al. [7] detected and tracked hands from depth images only, by designing a classifier with a boosting and cascading structure. The classifier checked the difference in depth at the learning and detection phases to predict the hand region with the help of a depth-adaptive mean shift algorithm. Ma and Peng [8] presented an improved threshold segmentation method for stable gesture recognition irrespective of distance. It used depth and color information and extracted the hand using the local neighbor method in complex scenarios. The position of the fingertips was identified with a convex hull detection algorithm. The results showed accurate and quick recognition within a distance interval of 0.5 m to 2.0 m. Das [9] collected a large dataset of depth images and developed a new 3D convolutional neural network (CNN) to estimate pointing direction from gesture depth images. Tran et al. [12] developed a tool using a 3D convolutional network with a 3D RGB-depth camera for spotting and recognizing hand gestures in real time. As the RGB-D camera, a Microsoft Kinect V2 was used for data collection and 3D image capturing in real time. A program was developed on the computer system to extract fingertip locations and recognize gestures. A similar approach and methodology were also presented by Park et al. [13] to extract the hand regions, but using the Generalized Hough transform instead of convolutional methods. Liu et al. [14] also suggested a Microsoft Kinect-based two-stage hand recognition system, i.e., data training and gesture recognition. The hand dynamics knowledge was stored in a radial basis function neural network during the training stage. The knowledge was extracted through estimators in the recognition stage to attain correct gestures in real time.
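To make the generic pipeline described at the beginning of this section concrete, the sketch below combines skin-colour segmentation with CAMSHIFT tracking using OpenCV. It is an illustrative single-RGB-camera example rather than a re-implementation of any surveyed method; the HSV skin-colour bounds and the initial search window are assumed values that would need tuning.

```python
# Illustrative sketch: skin-colour based hand segmentation followed by
# CAMSHIFT tracking on a webcam stream (OpenCV). Thresholds and the
# initial search window are rough assumptions, not tuned values.
import cv2
import numpy as np

LOWER_SKIN = np.array([0, 48, 80], dtype=np.uint8)     # assumed HSV bounds
UPPER_SKIN = np.array([20, 255, 255], dtype=np.uint8)

cap = cv2.VideoCapture(0)
track_window = (200, 150, 120, 120)                    # assumed initial (x, y, w, h)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_SKIN, UPPER_SKIN)     # hand segmentation
    # CAMSHIFT shifts/resizes the window towards the densest skin region
    rot_rect, track_window = cv2.CamShift(mask, track_window, term_crit)
    pts = cv2.boxPoints(rot_rect).astype(np.int32)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)   # draw tracked hand
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```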
4 Hardware Implementation of 3D Hand Detection and Tracking Algorithms
Development in 3D image processing has increased substantially in the past few years, overcoming bottleneck issues in image capturing and processing technology. There are now newer capturing techniques, such as the use of two cameras and dot projectors. However, a lot of research is being
conducted on making processing devices and cameras more efficient and better performing, as further discussed below. Swaminathan [10] coupled a vertical cavity surface emitting laser (VCSEL) array with a micro-lens array in such a manner that light passed from the VCSEL array projected a sequence of patterns onto an object. One camera was used to capture the images of the same object illuminated by the VCSEL array. The captured images were used by a processing unit to reconstruct the depth information. Spektor et al. [11] proposed a device and designed a processor architecture with two input ports connected to two CMOS color sensors. The processor was configured to receive all captured images from the CMOS sensors and to create depth and color images from them. This collected information was transmitted to output ports. The device also contains other modules like an ADC, power module, etc., for communications, processing, power management and interfacing different peripheral devices. [38] proposed the usage of VCSEL technology for imaging that leverages millimeter-level accuracy at a good capturing speed, which would help in hand gesture recognition and tracking. Krips et al. [15] combined the advantages of artificial intelligence and VLSI circuits and designed a real-time detection and tracking system for video images. It used single pixel-based classification on RGB values to detect hands, and the artificial neural network was realized on a Field-Programmable Gate Array (FPGA). Further, Oniga et al. [16] proposed an FPGA-based gesture recognition system. The authors used artificial neural networks to process and classify static hand gestures. Hikawa and Kaida [17] employed a novel video processing architecture and used a self-organizing map and Hebbian network on FPGA. The proposed posture recognition system used 24 American sign-language hand signs, and results showed a recognition speed of 60 frames/s. On the other hand, Wang [18] used an improved hand segmentation algorithm combined with the YCbCr color space for hand tracking. The movement of the hands was detected by the three-frame-difference motion detection method. Finally, gestures were recognized by developing a hand model based on the detected finger state. The complete system was implemented on a Cyclone II FPGA. Singh et al. [19] proposed an FPGA-based smart camera system for applications like focused region extraction, video history generation, object detection, object tracking, filtering of frames of interest and motion detection. The authors used a Xilinx ML510 (Virtex-5 FX130T) FPGA board and developed a customized board for camera interfacing using VHDL coding on Xilinx ISE and the ModelSim simulator. One can refer to Singh et al. [20] for details on the development of camera-based FPGA solutions. Moreover, the choice of the right FPGA board for application development and a comparison of some FPGA development boards with resource availability were also discussed [21, 22]. Temburu et al. [23] developed an ARM-FPGA System-on-Chip implementation as an alternative to a GPU implementation for 3D stereo mapping. A pipeline architecture was implemented to handle the tasks on the processing units using plural ARM cores. The tasks consist of capturing, rectifying, processing them to 3D point
clouds and finally transmitting them as a Robot Operating System (ROS) message. The Intel RealSense D435i was also used as the hardware module for the stereovision system. Prior to this, Sawant et al. [24] applied Semi-Global Matching (SGM), More Global Matching (MGM) and their modified-version-based stereovision disparity estimation algorithms. They were implemented with the help of an Intel RealSense D435i camera and a Zedboard FPGA development board using the OpenCV library. It can be observed that the whole process requires high-performance hardware like a 128-core GPU, which increases the overall cost of the device. Table 1 summarizes the 3D hand recognition software algorithms and hardware used by the researchers. Observations from Table 1 show that RGBD cameras and depth sensors are used by researchers to detect and track hands in 3D. Most of the works used neural networks for 3D hand detection and tracking and FPGA boards for hardware implementation. On the other hand, a few researchers have detected and tracked the fingers for better recognition. Several image processing and computer vision techniques applied are classification, segmentation, pointing direction estimation and hand posture/sign recognition, which can aid in the development of effective and efficient human computer interfacing.
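As a small illustration of the stereo disparity estimation step discussed above, the following OpenCV sketch computes a dense disparity map from a rectified left/right pair with the library's semi-global block matching implementation; the file names and matcher parameters are placeholders, and this is not the FPGA design of [23, 24].

```python
# Illustrative sketch: dense disparity from a rectified stereo pair using
# OpenCV's semi-global block matching (SGBM). File names and parameter
# values are placeholders for demonstration only.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # assumed rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

block = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,            # must be a multiple of 16
    blockSize=block,
    P1=8 * block * block,         # smoothness penalties (usual heuristic)
    P2=32 * block * block,
)

# compute() returns fixed-point disparities scaled by 16
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0
disparity[disparity < 0] = 0      # invalid matches are negative

# Normalise for display; real depth would need the camera baseline/focal length
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("disparity.png", vis)
```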
5 3D Hand Detection and Tracking Algorithms for Human Computer Interfacing
Islam et al. [25] recognized hand gestures with computer vision methods for controlling real-time applications. These gestures are later converted into mouse or keyboard actions. Mukherjee et al. [26] showed a method of writing in a computer system by tracking the fingertips in a real-time video stream, referred to as

Table 1. Summary of hardware techniques and software algorithms used for 3D hand detection and tracking

Authors | Recognition Type | Capturing Technique | Processing Technique | Algorithms Used | Hardware
Joo et al. [7] | Hand Detection and Tracking | Depth sensing | Classification | Depth adaptive mean shift algorithm | -
Ma and Peng [8] | Finger Tracking | RGB and depth sensing | Segmentation | Convex hull detection algorithm | -
Das [9] | Finger Tracking | Depth sensing | Estimate pointing direction | 3D CNN | -
Tran et al. [12] | Hand Detection and Tracking | RGBD camera | Hand gesture recognition | 3D CNN | -
Park et al. [13] | Hand Detection and Tracking | RGBD camera | Hand region extraction | Generalized Hough Transform | -
Liu et al. [14] | Hand Detection and Tracking | RGBD camera | Hand gesture recognition | 3D CNN | -
Krips et al. [15] | Hand Detection and Tracking | RGB sensing | Pixel based classification | ANN | FPGA
Oniga et al. [16] | Hand Detection and Tracking | RGB sensing | Gesture recognition | ANN | FPGA
Hikawa and Kaida [17] | Hand Detection and Tracking | RGBD camera | Posture recognition | Self-organizing map and Hebbian network | FPGA
Wang [18] | Finger Tracking | RGBD camera | Segmentation with YCbCr color space | Three-frame-difference motion detection method | FPGA
air-writing. The hand recognition was done through Faster R-CNN framework by detecting hand in image frame, segmenting it and counting the raised fingers. A new fingertip detection and tracking method called distance weighted curvature entropy was also introduced. Moreover, a fingertip velocity-based criterion was employed to determine that writing was stopped. Ghosh et al. [27] proposed an image processing based virtual keyboard which could resolve the problem of keyboard-based hardware limitations and also be used for augmented reality and air keyboard purposes. Chhibber et al. [28] proposed hand posture styles for triggering commands on a laptop. The researchers conducted deep learning training on 30-participant posture preferences and generated nearly 350K images dataset with different lighting conditions and backgrounds. The model is validated with 20 participants under real-time usage. Results found low error rates and fast formation time. Raees et al. [29] proposed a navigation in virtual environment with index finger movement by placing green piece of paper on fingertip. The dynamic mapping of fingertip was done with the help of OpenCV and OpenGL. Enkhbat et al. [30] proposed a virtual keyboard operation by recognizing the typing hands using a single RGB camera. CNN was used for detecting the click actions from finger position, movement and movement speed. Initially background subtraction technique is used to remove pixel data other than hand pixels. VGG16 architecture was used to train data and a score was generated to decide whether clicks are true or false. The best score obtained was 92% in experienced group and 67% in unexperienced group. Similarly, Chua et al. [31] also developed a hand gesture control system using a single RGB camera. An algorithm was developed which integrates gestures with the controls of keyboard and mouse. Various python libraries like pynput were used with YOLOv3 model to attain good results. Du et al. [32] proposed a 3D hand model fitted to a virtual keyboard to accurately determine the position of finger. A structured light was used to determine the range of hand motion. The hand motion and hand model were matched to estimate the pose of hand. Furthermore, optimization algorithms were used to improve the speed. Robertson et al. [33] proposed algorithms and architecture for a vision-based virtual mouse interface that tracked and recognized the user’s hand positions and signs to control an intelligent kiosk. On the other hand, non-vision based methods have also been proposed like Hu et al. [34] developed a virtual keyboard that works on any flat surface using 60 GHz Wi-Fi radio as radar. It used a signal processing pipeline for detecting, segmenting, separating, and recognizing keystrokes. It enabled concurrent keystrokes and required minimal one-time effort to calibrate the keyboard at initial setup. [37] also introduced a method of attaining the three dimensional information in vicinity of hand. It showed a smartwatch having multiple sensors connected to different locations so as to attain the hand and finger gestures leading to a good user response. It is observed from the above methods that even without using color and depth sensing cameras; the three dimensional data
of hand movements can be generated but would impose even a bigger distance constraint as well as requires additional sensors to be embedded in hardware. Miwa et al. [35] analyzed the defocus information of a finger touch on the virtual keyboard without using the 3D position. The finger touch was detected by comparing the DCT (Discrete Cosine Transform) coefficient of the two images obtained from the 2 cameras and a half mirror. The 3D point of the finger like feature point, edge, or small region should be identified, which is a flaw in the previous virtual keyboard. This time-consuming task is resolved by using the optical system of the virtual keyboard. The authors formulated theoretical minimum distance and verified the same with experiment. Ambrus et al. [36] disclosed a touch interface based on 3D camera. The user places his hands on the table top surface which behaves as touch interface for typing on the computer system. The camera detects the relative position of the hands and transmits them to the computer system. Later, the computer system processes the images and generates the meaningful results from images of hand gestures through which actions on the screen are performed. A summary of the above works in terms of their application in human computer interfacing is presented in Table 2. Table 2. Application areas of 3D hand detection and tracking algorithms in human computer interfacing Authors
Hand Detection and Tracking
Islam et al. [25] Mukherjee et al. [26] Ghosh et al. [27] Chhibber et al. [28] Raees et al. [29] Enkhbat et al. [30] Chua et al. [31] Du et al. [32] Robertson et al. [33] Hu et al. [34] Miwa et al. [35]
Y Y Y
Finger Tracking
Mouse Keyboard Writing
Y Y Y
Y Y Y
Y Y Y Y Y Y Y Y
Y Y Y
Y
Y Y Y
Y Y
Y Y
It is observed from the table that most of the work has been done for either mouse or keyboard operations. Only one work discussed writing operations. Several researchers enhanced the resolution to fine-tune the results for finger tracking.
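To illustrate how a tracked fingertip coordinate can be turned into desktop mouse control, in the spirit of the virtual-mouse systems surveyed above (e.g., the pynput-based control in [31]), the sketch below maps a normalised fingertip position to the screen cursor; get_fingertip() is a hypothetical placeholder for any detection/tracking back-end, and the screen resolution is an assumed value.

```python
# Illustrative sketch: driving the system cursor from a (normalised)
# fingertip position. `get_fingertip()` is a hypothetical placeholder for
# any of the surveyed detection/tracking back-ends.
from pynput.mouse import Button, Controller

SCREEN_W, SCREEN_H = 1920, 1080        # assumed screen resolution
mouse = Controller()

def get_fingertip():
    """Placeholder: return (x, y, clicked) with x, y normalised to [0, 1]."""
    return 0.5, 0.5, False

def update_cursor():
    x_norm, y_norm, clicked = get_fingertip()
    mouse.position = (int(x_norm * SCREEN_W), int(y_norm * SCREEN_H))
    if clicked:
        mouse.click(Button.left, 1)    # single left click

if __name__ == "__main__":
    update_cursor()
```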
6 Major Findings and Future Challenges
Observations suggest that although some helping aids and augmented reality-based input interfaces exist, they still do not provide a full-fledged solution. It is also revealed from the literature that the main focus has been on a single topic, like the hand recognition system, the hand tracking system, or their simple hardware implementation. Researchers have also proposed 3D camera based human computer interfaces like the virtual mouse or keyboard; however, they were intended to perform the specific task of either mouse or keyboard. Moreover, it is also observed that the existing solutions require an additional GPU/CPU to process a large amount of 3D data operations. In addition to the high processing power, there is a high latency between the system and the hand gesture, which increases the overall cost of the device and the requirement for a high-performance GPU. Apart from this, the functioning of the devices in high/low light regions was not analyzed. The inaccuracies and misfires also need to be addressed. Therefore, we need a device which is efficient and effective in solving these issues, i.e., accessing and controlling the computing systems; in other words, the development of an efficient and effective human computer interface with accurate, precise and fast hovering mouse, keyboard and writing operations. Accordingly, our future work is to propose a human computer input interfacing system with the following benefits:
– Elimination of extra hardware like mouse, keyboard and stylus
– Removal of surface requirements
– Capability to work from a distance without holding any device
– Better and faster writing and typing speed
– Performing all tasks like mouse, keyboard and stylus in a single device

7 Conclusions
The paper reviews the existing work on 3D hand detection and tracking algorithms and their applications in human computer interfacing. The study began with the use of RGBD cameras, which have more powerful processors and different types of algorithms than conventional RGB cameras. The emerging depth sensors combined with color sensing cameras have enhanced various hand detection and tracking approaches and applications. However, a solution that could prevail in the working environment has not yet been developed; therefore, there is a need not only for better capturing and processing equipment but also for better algorithmic solutions and approaches to attain an optimum solution. The future work is to develop an efficient and effective human computer interface with accurate, precise and fast hovering mouse and keyboard operations.
Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70–2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). Authors acknowledge the technical support and review feedback from AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References 1. Kaminani, S.: Human computer interaction issues with touch screen interfaces in the flight deck. In: 2011 IEEE/AIAA 30th Digital Avionics Systems Conference, pp. 6B4-1. IEEE (2011) 2. Hutama, W., Harashima, H., Ishikawa, H., Manabe, H.: HMK: head-MountedKeyboard for Text Input in Virtual or Augmented Reality. In: The Adjunct Publication of the 34th Annual ACM Symposium on User Interface Software and Technology, pp. 115–117. ACM (2021) 3. Yadegaripour, M., Hadadnezhad, M., Abbasi, A., Eftekhari, F., Samani, A.: The effect of adjusting screen height and keyboard placement on neck and back discomfort, posture, and muscle activities during laptop work. Int. J. Human-Comput. Interact. 37(5), 459–469 (2021) 4. Yi, X., Liang, C., Chen, H., Song, J., Yu, C., Shil, Y.: From 2D to 3D: facilitating Single-Finger Mid-Air Typing on Virtual Keyboards with Probabilistic Touch Modeling. In: 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 694-695. IEEE (2022) 5. Toni, B., Darko, J.: A robust hand detection and tracking algorithm with application to natural user interface. In: 2012 Proceedings of the 35th International Convention MIPRO, pp. 1768-1774. IEEE (2012) 6. Suarez, J., Murphy, R.R.: Hand gesture recognition with depth images: a review. In: 2012 IEEE RO-MAN: the 21st IEEE International Symposium on Robot and Human Interactive Communication, pp. 411–417. IEEE (2012) 7. Joo, S.I., Weon, S.H., Choi, H.I.: Real-time depth-based hand detection and tracking. Sci. World J. 2014, 17 (2014) 8. Ma, X., Peng, J.: Kinect sensor-based long-distance hand gesture recognition and fingertip detection with depth information. J. Sens. 2018, 1–9 (2018) 9. Das, S.S.: Techniques for estimating the direction of pointing gestures using depth images in the presence of orientation and distance variations from the depth sensor Doctoral dissertation. (2022) 10. Swaminathan, K., Grunnet-Jepsen, A., Keselman, L.: Intel Corp: Compact, low cost VC SEL projector for high performance stereodepth camera. U.S. Patent 10, 924,638. (2021) 11. Spektor, E., Mor, Z., Rais, D.: PrimeSense Ltd: integrated processor for 3D mapping. U.S. Patent 8,456,517 (2013) 12. Tran, D.S., Ho, N.H., Yang, H.J., Baek, E.T., Kim, S.H., Lee, G.: Real-time hand gesture spotting and recognition using RGB-D camera and 3D convolutional neural network. Appl. Sci. 10(2), 722 (2020) 13. Park, M., Hasan, M.M., Kim, J., Chae, O.: Hand detection and tracking using depth and color information. In: Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV’12), 2, pp. 779–785 (2012)
14. Liu, F., Du, B., Wang, Q., Wang, Y., Zeng, W.: Hand gesture recognition using kinect via deterministic learning. In: 2017 29th Chinese Control and Decision Conference (CCDC), pp. 2127–2132. IEEE (2017) 15. Krips, M., Lammert, T., Kummert, A.: FPGA implementation of a neural network for a real-time hand tracking system. In: Proceedings 1st IEEE International Workshop on Electronic Design, Test and Applications, pp. 313–317 IEEE (2002) 16. Oniga, S., Tisan, A., Mic, D., Buchman, A., Vida-Ratiu, A.: Hand postures recognition system using artificial neural networks implemented in FPGA. In: 2007 30th International Spring Seminar on Electronics Technology (ISSE), pp. 507–512. IEEE (2007) 17. Hikawa, H., Kaida, K.: Novel FPGA implementation of hand sign recognition system with SOM-Hebb classifier. IEEE Trans. Circuits Syst. Video Technol. 25(1), 153–166 (2014) 18. Wang, Z.: Hardware implementation for a hand recognition system on FPGA. In: 2015 IEEE 5th International Conference on Electronics Information and Emergency Communication, pp. 34–38. IEEE (2015) 19. Singh, S., Saurav, S., Saini, R., Mandal, A.S., Chaudhury, S.: FPGA-Based Smart Camera System for Real-Time Automated Video Surveillance. In: Kaushik, B.K., Dasgupta, S., Singh, V. (eds.) VDAT 2017. CCIS, vol. 711, pp. 533–544. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7470-7_52 20. Singh, S., Saurav, S., Shekhar, C., Vohra, A.: Prototyping an automated video surveillance system using FPGAs. Int. J. Image Graph. Signal Process. 8(8), 37 (2016) 21. Singh, S., Shekhar, C., Vohra, A.: FPGA-based real-time motion detection for automated video surveillance systems. Electronics 5(1), 10 (2016) 22. Singh, S., Mandal, A.S., Shekhar, C., Vohra, A.: Real-time implementation of change detection for automated video surveillance system. Int. Sch. Res. Not. 2013, 5 (2013) 23. Temburu, Y., Datar, M., Singh, S., Malviya, V., Patkar, S.: Real time System Implementation for Stereo 3D Mapping and Visual Odometry. In: 2020 IEEE 4th International Conference on Image Processing, Applications and Systems (IPAS) pp. 7–13. IEEE (2020) 24. Sawant, P., Temburu, Y., Datar, M., Ahmed, I., Shriniwas, V., Patkar, Sachin: Single Storage Semi-Global Matching for Real Time Depth Processing. In: Babu, R.V., Prasanna, M., Namboodiri, Vinay P.. (eds.) NCVPRIPG 2019. CCIS, vol. 1249, pp. 14–31. Springer, Singapore (2020). https://doi.org/10.1007/978-981-158697-2_2 25. Islam, S., Matin, A., Kibria, H.B.: Hand Gesture Recognition Based Human Computer Interaction to Control Multiple Applications. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2021. LNNS, vol. 371, pp. 397–406. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93247-3_39 26. Mukherjee, S., Ahmed, S.A., Dogra, D.P., Kar, S., Roy, P.P.: Fingertip detection and tracking for recognition of air-writing in videos. Expert Syst. Appl. 136, 217– 229 (2019) 27. Ghosh, P., Singhee, R., Karmakar, R., Maitra, S., Rai, S., Pal, S.B.: Virtual Keyboard Using Image Processing and Computer Vision. In: Tavares, J.R.S., Dutta, P., Dutta, S., Samanta, Debabrata (eds.) Cyber Intelligence and Information Retrieval. LNNS, vol. 291, pp. 71–79. Springer, Singapore (2022). https://doi.org/ 10.1007/978-981-16-4284-5_7
28. Chhibber, N., Surale, H.B., Matulic, F, Vogel, D.: Typealike: near-Keyboard Hand Postures for Expanded Laptop Interaction. In: Proceedings of the ACM on HumanComputer Interaction, 5(ISS), pp.1–20. ACM (2021) 29. Raees, M., Ullah, S., Rahman, S.U.: VEN-3DVE: vision based egocentric navigation for 3D virtual environments. Int. J. Interact. Des. Manufact. (IJIDEM) 13(1), 35–45 (2019) 30. Enkhbat, A., Shih, T.K., Thaipisutikul, T., Hakim, N.L., Aditya, W.: HandKey: an Efficient Hand Typing Recognition using CNN for Virtual Keyboard. In: 2020-5th International Conference on Information Technology (INCIT), pp. 315–319. IEEE (2020) 31. Chua, S.N., Chin, K.Y., Lim, S.F., Jain, P.: Hand Gesture Control for HumanComputer Interaction with Deep Learning. J. Electr. Eng. Technol. 17(3), pp. 1961–1970 (2022) 32. Du, H., Charbon, E.: 3D hand model fitting for virtual keyboard system. In: 2007 IEEE Workshop on Applications of Computer Vision (WACV’07), pp. 31–31. IEEE (2007) 33. Robertson, P., Laddaga, R., Van Kleek, M.: Virtual mouse vision based interface. In: Proceedings of the 9th international conference on Intelligent user interfaces, pp. 177–183 (2004) 34. Hu, Y., Wang, B., Wu, C., Liu, K.R.: Universal Virtual Keyboard using 60 GHz mmWave Radar. In: 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), pp. 385–390. IEEE (2021) 35. Miwa, M., Honda, K., Sato, M.: Image Defocus Analysis for Finger Detection on A Virtual Keyboard. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 24–30. IEEE (2021) 36. Ambrus, A.J., Mohamed, A.N., Wilson, A.D., MOUNT, B.J. Andersen, J.D.: Microsoft Technology Licensing LLC: Touch sensitive user interface (2017) 37. Devrio, N., Harrison, C.: Disco Band: multiview Depth-Sensing Smartwatch Strap for Hand, Body and Environment Tracking. In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pp. 1-13. (2022) 38. Han, Y., Li, Z., Wu, L., Mai, S., Xing, X., Fu, H.Y.: High-Speed Two-Dimensional Spectral-Scanning Coherent LiDAR System Based on Tunable VCSEL. J. Lightwave Technol. 25 Oct 2022
Multi-level Image Segmentation of Breast Tumors Using Kapur Entropy Based Nature-Inspired Algorithms
Shreya Biswas1, Anu Bajaj2,3(B), and Ajith Abraham3,4
1 Jadavpur University, Jadavpur, India
2 Thapar Institute of Engineering and Technology, Patiala, India
[email protected]
3 Machine Intelligence Research Labs, Auburn, USA 4 Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
Abstract. Medical image segmentation entails the extraction of essential Regions of Interest (RoIs) for further analysis of the image. Segmentation using the thresholding technique partitions an image into multiple objects and one background using multiple threshold values. This paper focuses on the application of the dragonfly algorithm and the crow search algorithm to optimize between-class variance using Kapur's entropy as the fitness function. The proposed methods have been assessed on benchmark images for threshold values ranging from 2 to 14, and performance is compared with traditional methods like Kapur's and Otsu's multilevel threshold techniques. Experimental results have been evaluated using well-established metrics like the peak signal to noise ratio, visual information fidelity, structural similarity index measure, and feature based similarity index measure. Computational time is also compared. Experimental results show that the proposed dragonfly algorithm with Kapur's entropy performed better compared to the crow search algorithm and traditional methods. Keywords: Dragonfly Algorithm · Image Processing · Segmentation · Kapur Entropy · Multi-level thresholding · Biomedical Images · Breast Tumor detection · Crow Search Algorithm
1 Introduction Breast cancer is the cancer that most affects women all around the world [1]. Disease screening procedures are continually being developed in order to identify diseases in their early stages. One tool for aiding the identification of breast illnesses is dynamic thermography image analysis. The main goal of segmenting the images is to identify problematic areas of the anatomy so that further analysis can be done. However, this is a difficult and time-consuming task, because MRI, CT scans and IR images can be tricky, making their interpretation cumbersome. Trained technicians are required in order to understand these specialized images. Automatic cancer detection and segmentation of © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 396–407, 2023. https://doi.org/10.1007/978-3-031-35510-3_38
the cancerous cells is of paramount importance, since it is sometimes difficult to detect the presence of these cancerous cells manually. The use of dynamic thermography image analysis for breast cancer identification has been shown in works like [2, 3]. Thermal infrared images (thermograms) can be used to reveal lesions in dense breasts. In these images, the temperature of the regions that contain tumors is warmer than that of normal tissue. To detect this difference in temperature, thermal infrared cameras are used to generate infrared images at fixed time steps, obtaining a sequence of infrared images. Thermograms are reliable, cost-effective and accurate. They are non-invasive, do not emit any radiation, and are distinctive in that they can be used to detect physiological changes in addition to anatomical changes [4]. The heterogeneity of the images for breast cancer has an anatomopathological explanation [5]. In fact, the cellular structure of a neoplasm is different from that of normal tissue. Blood flow and histology of neoplasms are significantly different from those of the normal tissue around them. If an image contains multiple objects of different brightness or reflectivity, then multiple thresholds are needed to segment the image [6]. Histogram analysis is required for this, since by looking at the histogram of an image we can get an idea of the approximate thresholds depending on the regions of the crests and valleys. Regions with uniform intensity give rise to strong peaks in the histogram [7]. It is clear from the histograms in Table 1 that it is difficult to find a single global threshold to clearly segment the objects.
Table 1. Histogram analysis of some images from the DMR-IR database (original images and their histograms, for sick and healthy breasts)
To find the optimal thresholds, the entropy criterion has been widely used as information theory progressed [8]. Entropy quantifies the information content of the image. In this work, we use Kapur entropy as the fitness value for the meta-heuristic algorithms. An exhaustive search for multiple optimal thresholds is time consuming and resource intensive [9]. To reduce the complexity, optimization processes are incorporated so as to reduce the initial possible combinations [10]. To solve this problem, we employ nature-inspired meta-heuristic algorithms to first optimize the initial set of candidate combinations from which the optimal thresholds are chosen. Hence, this paper proposes a Kapur's entropy-based Dragonfly Algorithm (DA) [11] and Crow Search Algorithm (CSA) [12] for multilevel thresholding of breast tumor segmentation. The organization of the paper is as follows: Sect. 2 briefs the related work in breast cancer segmentation using nature-inspired algorithms. The working of the proposed algorithms is discussed in Sect. 3. Section 4 presents the experimental setup, the results are discussed in Sect. 5, and the conclusion follows in Sect. 6.
2 Related Works
Several researchers have proposed multilevel thresholding for image segmentation in the medical domain. The authors in [13] used an improved golden jackal optimization algorithm for skin cancer image segmentation. Improved ant colony optimization has been proposed for COVID-19 X-ray image segmentation [14]. Multilevel thresholding and data fusion techniques have been combined in a color image segmentation method applied to breast cancer cell images [15]. Modified versions of the chimp optimization algorithm have been proposed for multi-threshold breast tumor segmentation [16, 17]. In addition, Zhao et al. [18] proposed a salp swarm algorithm for segmenting breast cancer microscopy images using multi-thresholding. These works motivate us to apply recent algorithms like DA and CSA, as they are advantageous over others in terms of fewer parameter settings and avoidance of premature convergence [19].
3 Proposed Work
In the multi-level thresholding technique, the distinct segments S_i [S_0, S_1, …, S_n] of a given image for thresholds T_1, T_2, …, T_n are found using
S_i = \{ f(i, j) : T_i \le f(i, j) < T_{i+1} \}   (1)
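As an illustration of Eq. (1) (a sketch added for clarity, not part of the original paper; the image array and threshold values are placeholders), the segment labels can be assigned with NumPy as follows:

```python
import numpy as np

def apply_thresholds(image, thresholds):
    """Label each pixel with the index i of the segment [T_i, T_{i+1}) it falls into, as in Eq. (1)."""
    # np.digitize counts how many thresholds lie at or below each pixel value,
    # which is exactly the segment index S_i.
    return np.digitize(image, bins=sorted(thresholds))

# Toy example: a random 8-bit grayscale image and four thresholds
image = np.random.randint(0, 256, size=(128, 128))
labels = apply_thresholds(image, thresholds=[60, 110, 160, 210])
print(labels.min(), labels.max())  # segment indices run from 0 to 4
```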
This section describes the working of the proposed DA and CSA algorithms fused with Kapur entropy as the fitness function to find multiple optimal thresholds of an image. Initially, a random solution is generated and its fitness is calculated. The proposed algorithms then iterate, generating new candidate solutions based on the fitness values.
3.1 Kapur Entropy
For image segmentation, Kapur entropy has often been used and has been shown to provide satisfactory results [7]. Assuming that [T_1, T_2, T_3, …, T_n] represents the threshold combination which divides the image into different segments, the fitness function based on Kapur's entropy is:
f(T_1, T_2, T_3, \ldots, T_n) = \arg\max \{ H(T_1, T_2, T_3, \ldots, T_n) \}   (2)
where the H_i are the distinct segment entropies and the \omega_i denote the probability of each class, defined as
H_i = -\sum_{j=0}^{T_i - 1} \frac{p_j}{\omega_i} \ln \frac{p_j}{\omega_i}, \qquad \omega_i = \sum_{j=0}^{T_i - 1} p_j   (3)
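For concreteness, the following NumPy sketch (added here; not the authors' code) evaluates the Kapur entropy of Eqs. (2)–(3) for a candidate threshold vector, given the normalised grey-level histogram p_j; this is the fitness value that the meta-heuristics maximise:

```python
import numpy as np

def kapur_entropy(hist, thresholds, eps=1e-12):
    """Kapur's entropy of Eq. (3) summed over the classes induced by the thresholds.

    hist: normalised histogram p_j (sums to 1), e.g. 256 bins for an 8-bit image.
    thresholds: threshold values T_1 < ... < T_n.
    """
    edges = [0] + sorted(int(t) for t in thresholds) + [len(hist)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = hist[lo:hi]
        omega = p.sum() + eps                  # class probability omega_i
        q = p / omega
        total += -np.sum(q * np.log(q + eps))  # class entropy H_i
    return total

# Example: fitness of the threshold pair (80, 160) on a synthetic image histogram
img = np.random.randint(0, 256, size=(64, 64))
hist = np.bincount(img.ravel(), minlength=256) / img.size
print(kapur_entropy(hist, thresholds=[80, 160]))
```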
3.2 Dragonfly Algorithm
The best position of the food sources after a sufficient number of iterations provides us with the optimal thresholds for a particular image. The position-updating formula for the dragonflies is as follows:
\Delta X_{t+1} = (s S_i + a A_i + c C_i + f F_i + e E_i) + w \Delta X_t   (4)
where s, a, c, f, e, and w are the separation, alignment, cohesion, food, enemy and inertia factors, S_i, A_i, C_i, F_i, and E_i denote the separation, alignment, cohesion, food source, and enemy position of the i-th individual, and t is the iteration number. The equations for S, A, C, F, and E are as follows:
S_i = -\sum_{j=1}^{M} (X - X_j)   (5)
A_i = \frac{\sum_{j=1}^{M} X_j}{M}   (6)
C_i = A_i - X   (7)
F_i = X^{+} - X   (8)
E_i = X^{-} - X   (9)
where X, X^{+}, X^{-} and X_j are the current dragonfly, food source, enemy and j-th neighbor positions, and M is the number of neighboring dragonflies. The dragonflies' positions are then updated with the step vector \Delta X_{t+1} and a Lévy flight as:
X_{t+1} = X_t + \mathrm{Levy} \times \Delta X_{t+1}   (10)
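A compact sketch of one DA position update following Eqs. (4)–(10) is shown below. It is an illustration written for this text (NumPy), not the authors' implementation; the default weighting factors follow the values reported later in Sect. 4, and the Lévy-flight term of Eq. (10) is omitted for brevity.

```python
import numpy as np

def dragonfly_step(X, neighbours, food, enemy, dX_prev,
                   s=0.1, a=0.1, c=0.7, f=1.0, e=1.0, w=0.5):
    """One update of a dragonfly position X given its neighbourhood (Eqs. 4-10)."""
    S = -np.sum(X - neighbours, axis=0)    # separation, Eq. (5)
    A = neighbours.mean(axis=0)            # alignment,  Eq. (6)
    C = A - X                              # cohesion,   Eq. (7)
    F = food - X                           # attraction to food source, Eq. (8)
    E = enemy - X                          # distraction from enemy,    Eq. (9)
    dX = s * S + a * A + c * C + f * F + e * E + w * dX_prev   # step vector, Eq. (4)
    return X + dX, dX                      # Eq. (10) without the Levy term

# Toy example: one dragonfly with 5 neighbours in a 4-dimensional threshold space
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, 4)
neighbours = rng.uniform(0, 255, (5, 4))
X_new, dX = dragonfly_step(X, neighbours, food=rng.uniform(0, 255, 4),
                           enemy=rng.uniform(0, 255, 4), dX_prev=np.zeros(4))
```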
3.3 Crow Search Algorithm
CSA simulates the behavior of crows and how they manage and store their food sources. At a particular time t, the position update of a crow X_i (with fl as the flight length) depends on the position of a random crow X_j and a generated random number r_i. M is the memory matrix, where m_{i,t} ∈ M is the memory of the i-th crow at time t. If r_i is greater than the Awareness Probability (AP), then
X_{i,t+1} = X_{i,t} + r_i \times fl_t \times (m_{j,t} - X_{i,t})   (11)
else,
X_{i,t+1} = \text{a random position}   (12)
If the fitness of the new solution X_{i,t+1} is better than that of m_{i,t}, then
m_{i,t+1} = X_{i,t+1}   (13)
else,
m_{i,t+1} = m_{i,t}   (14)
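The sketch below (an illustration added for this text, not the authors' code) performs one CSA iteration over the flock following Eqs. (11)–(14), using the AP = 0.1 and fl = 2 settings reported later in Sect. 4 and assuming a fitness function to be maximised (e.g. Kapur's entropy):

```python
import numpy as np

rng = np.random.default_rng(1)

def crow_search_step(X, memory, fitness, bounds, AP=0.1, fl=2.0):
    """One CSA iteration over all crows (Eqs. 11-14)."""
    n, dim = X.shape
    lo, hi = bounds
    for i in range(n):
        j = rng.integers(n)                       # crow i follows a random crow j
        if rng.random() >= AP:                    # follow crow j's memorised position
            X[i] = X[i] + rng.random() * fl * (memory[j] - X[i])   # Eq. (11)
        else:                                     # relocate randomly
            X[i] = rng.uniform(lo, hi, dim)       # Eq. (12)
        X[i] = np.clip(X[i], lo, hi)
        if fitness(X[i]) > fitness(memory[i]):    # keep the better position in memory
            memory[i] = X[i].copy()               # Eq. (13); otherwise Eq. (14) keeps m_{i,t}
    return X, memory

# Toy usage: 20 crows searching for 4 thresholds in [0, 255]
X = rng.uniform(0, 255, (20, 4))
memory = X.copy()
X, memory = crow_search_step(X, memory, fitness=lambda x: -np.var(x), bounds=(0, 255))
```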
4 Experimental Setup
The proposed method has been evaluated on a publicly available standard dataset of breast tumor images, the Database for Mastology Research with Infrared Images (DMR-IR dataset) [20]. The database mainly consists of the regions of interest (ROIs) of the images in grayscale. Some of the images in this database are shown in Fig. 1. The values of the DA parameters s, c, f, e, a and w were 0.1, 0.7, 1, 1, 0.1, and 0.5, respectively, and for CSA, AP and fl were taken as 0.1 and 2, respectively. For both algorithms, we observed that more than 100 iterations did not improve the results significantly, and a population size of 20 was enough for convergence.
Fig. 1. Sample breast images from the DMR-IR database (healthy left/right and sick left/right)
We have used some standard metrics to compare the results: peak signal-to-noise ratio (PSNR, in dB), structural similarity index measure (SSIM) [21], visual information fidelity (VIF) [22], and feature-based similarity index measure (FSIM) [23]. Each algorithm is run 10 times because of the stochastic nature of the algorithms.
5 Results
This section gives the experimental results and their analysis. The search capability of the proposed algorithms (DA_K and CSA_K) was checked for up to 14 threshold values. We have compared the results with the multilevel Kapur's entropy-based thresholding technique (MKE), the multilevel Otsu's thresholding technique (MO), and the dragonfly algorithm with Otsu as its fitness function (DA_O). We applied these algorithms to 64 images in the database. The segmentation results are shown in Table 2 for number of thresholds (N) = 2, 4, 6, 8, 10, and 14.
Table 2. Optimal threshold values of different images for all the algorithms
Image
Healthy Right
Healthy Left
N
DA_K
MKE
DA_O
MO
CSA_K
2
132 171
112 173
108 144
99 156
115 159
4
120 181 196 218
119 151 176 218
95 131 167 201
82 120 151 183
67 91 148 186
6
82 107 144 161 192 215
70 117 149 177 199 212
60 97 119 148 184 238
68 112 130 142 155 187
73 97 121 158 180 196
8
33 84 108 126 144 164 196 218
43 87 98 136 151 49 74 99 116 131 50 81 110 130 166 183 200 158 182 202 149 167 184 203
50 108 126 146 156 190 206 214
10
44 65 86 100 127 50 71 90 106 145 42 68 80 103 129 52 74 87 99 137 139 165 196 211 146 183 202 227 146 168 193 205 157 162 186 210 225 236 223 229
44 65 86 100 127 139 165 196 211 225
14
31 42 56 69 81 93 38 48 60 74 81 95 39 48 55 67 78 92 31 41 55 62 78 89 29 36 47 59 72 90 104 119 131 144 118 121 133 142 101 114 138 145 97 102 128 147 118 128 140 168 158 187 211 223 167 176 189 220 163 170 199 219 159 177 193 210 179 193 203 219
2
87 181
112 189
101 144
99 156
95 145
4
77 181 193 208
79 175 195 218
85 192 194 206
81 152 200 215
91 161 214 213
6
73 95 117 165 188 204
67 83 114 149 176 67 81 89 173 189 89 113 100 139 199 209 197 210
72 95 133 168 193 211
8
54 99 104 125 189 206 213 219
48 96 107 112 170 55 106 115 143 194 206 228 192 194 197 212
51 87 118 131 174 197 209 225
10
51 66 84 98 117 131 152 171 196 207
49 68 86 92 101 110 124 162 186 200
46 71 89 102 110 47 58 79 100 102 41 68 89 99 107 133 151 176 207 131 152 162 190 118 135 167 185 214 211 201
14
43 55 65 79 92 101 112 120 131 149 158 176 188 211
41 55 60 71 80 101 119 130 145 159 167 186 197 202
38 52 59 80 93 104 111 123 141 149 165 180 190 208
45 55 65 82 94 101 111 122 133 140 150 168 198 212
43 55 64 74 98 108 117 121 139 152 169 178 188 217
2
122 199
118 201
129 215
112 195
122 200
4
92 132 175 205
97 128 171 208
105 152 175 198
82 124 165 224
97 119 175 199
6
127 144 161 179
117 147 160 180
111 136 153 179
119 124 150 184
130 141 159 186
51 74 102 133 162 185 203 211
(continued)
Table 2. (continued)
7 202 228
194 209
221 230
196 211
208 223
8
104 111 137 159 188 209 212 235
96 123 142 153 169 193 205 233
102 122 141 157 177 205 219 234
92 111 134 157 186 193 216 227
96 113 126 140 152 175 202 221
10
98 111 130 142 151 169 182 194 200 224
94 108 125 138 160 171 185 196 208 214
91 117 139 149 153 162 179 180 210 224
97 104 113 139 150 175 183 199 206 221
91 103 129 144 152 168 179 191 202 220
14
82 94 100 116 121 137 147 164 177 182 202 215 221 236
78 98 103 121 127 129 155 171 185 186 203 211 222 239
80 89 108 113 121 132 143 159 174 185 195 209 213 228
86 95 100 115 124 138 155 162 179 180 192 200 220
72 84 110 123 127 143 149 165 172 188 202 211 215 229
2
130 220
110 200
117 202
109 191
104 209
4
81 122 174 229
74 101 181 221
74 93 168 232
82 103 176 223
84 108 169 220
6
79 109 132 171 185 219
83 105 125 159 181 224
85 125 142 155 184 228
79 113 141 157 194 213
85 121 139 153 185 213
8
60 86 121 132 169 194 210 227
74 88 125 132 160 72 101 135 176 203 214 225 188 203 218 226
86 100 123 128 161 203 209 228
86 98 122 138 165 192 210 225
10
55 89 105 113 122 139 158 177 197 218
72 82 105 113 122 73 94 106 109 140 164 172 189 117 132 168 174 211 200 222
71 96 103 114 114 140 157 184 202 216
78 82 114 119 126 134 154 173 190 217
14
41 69 72 89 99 106 121 132 140 159 175 189 202 227
43 54 76 80 90 115 118 123 141 163 166 182 200 223
56 66 73 83 89 100 129 142 149 161 177 183 210 223
46 58 64 91 94 101 111 123 136 150 182 184 194 217
Sick Right
Sick Left
41 59 76 86 98 115 130 139 145 162 178 181 212 227
It is observed from Table 2 that the DA_K technique yields efficient results. Referring to Table 1, we notice that the histograms of the corresponding images have dips almost at the places of the thresholds found by DA_K. Focusing on the image Sick Left in Table 2, Fig. 2 shows the visual representation of all the thresholds obtained by DA_K on it. We have also compared the computational cost of the algorithms, and their performance is verified using the evaluation metrics. Due to space constraints, we present the computation times for two images (Sick Left and Sick Right) and the evaluation metric values of one image (Sick Left), as shown in Tables 3 and 4. The metrics in Table 4 are defined below:
PSNR = 20 \log_{10} \frac{255}{\sqrt{MSE}}, \quad \text{where MSE is the mean squared error}   (15)
VIF = \frac{\sum_{j \in \text{subbands}} I(C^{N_j}; F^{N_j} \mid S^{N_j} = s^{N_j})}{\sum_{j \in \text{subbands}} I(C^{N_j}; E^{N_j} \mid S^{N_j} = s^{N_j})}   (16)
where N_j is the number of blocks in subband j of the wavelet decomposition,
FSIM = \frac{\sum_{x \in \Omega} S_L(x) \cdot PC_m(x)}{\sum_{x \in \Omega} PC_m(x)}   (17)
Fig. 2. Visual representation of the thresholds obtained by DA_K (N = 2, 4, 6, 8, 10, and 14)
SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (18)
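As a small numerical companion to Eqs. (15) and (18) (added here, not part of the paper), the sketch below computes PSNR from the mean squared error and a single global SSIM value for two 8-bit greyscale images; the constants C1 and C2 follow the usual choice from [21].

```python
import numpy as np

def psnr(original, segmented):
    """Eq. (15): peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((original.astype(float) - segmented.astype(float)) ** 2)
    return 20 * np.log10(255.0 / np.sqrt(mse))

def global_ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Eq. (18) evaluated once over the whole image (no sliding window), for illustration."""
    x, y = x.astype(float), y.astype(float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```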
Table 3. Time required to calculate the optimal threshold values (in seconds)

Image | N | DA_K | MKE | DA_O | MO | CSA_K
Sick Left | 2 | 9.783883 | 29.9993 | 9.77293 | 0.01785 | 9.59373
Sick Left | 4 | 11.39278 | 201.3409 | 11.57342 | 0.39809 | 11.39924
Sick Left | 6 | 14.29379 | 4992.971 | 13.92749 | 0.87397 | 12.48284
Sick Left | 8 | 20.38279 | 13983.38 | 21.49759 | 4.98278 | 23.29483
Sick Left | 10 | 25.99278 | 50773.89 | 26.87564 | 170.3488 | 27.38283
Sick Left | 14 | 30.02982 | 119732.2 | 31.76759 | 1312.798 | 33.58479
Sick Right | 2 | 8.203359 | 30.3829 | 9.0568 | 0.02822 | 9.0283
Sick Right | 4 | 10.39482 | 218.933 | 11.7989 | 0.49283 | 10.3893
Sick Right | 6 | 13.39482 | 4827.33 | 13.9765 | 0.92839 | 11.3732
Sick Right | 8 | 19.34924 | 12937.98 | 21.8657 | 5.02983 | 22.3983
Sick Right | 10 | 25.29439 | 52558.87 | 27.9757 | 189.994 | 29.3786
Sick Right | 14 | 29.49379 | 107236.2 | 34.9757 | 1327.86 | 33.7838
It is observed from Table 3 that, in terms of computation cost, DA_K performed better than the other algorithms when the number of thresholds is in the higher range (10, 14), whereas MO is better for the lower range (2–8) of thresholds. We can observe from Table 4 that as the value of N increases, the similarity between the segmented image and the original image increases, and hence the RMSE of the two concerned images decreases. Therefore, a higher value of PSNR indicates that the image is of higher quality. Similarly, SSIM, VIF and FSIM should also increase as the number of thresholds increases. Table 4 shows that DA_K performed well on all the above-mentioned factors. The visual comparison of DA_K is presented in Table 5 for different threshold values. Overall, we can say that DA_K is superior to the other algorithms, followed by CSA_K, DA_O, MO and MKE, in that order.
Table 4. Performance metric values for an image

Metric | N | DA_K | MKE | DA_O | MO | CSA_K
PSNR | 2 | 13.762373 | 12.69581 | 13.93772 | 14.382623 | 13.298369
PSNR | 4 | 17.616812 | 16.29379 | 17.29363 | 18.392639 | 18.003682
PSNR | 6 | 22.872673 | 20.28369 | 21.39327 | 22.492679 | 21.983628
PSNR | 8 | 27.273882 | 22.39827 | 26.30372 | 27.071792 | 27.563826
PSNR | 10 | 28.927678 | 26.30283 | 28.39262 | 28.002739 | 28.937329
PSNR | 14 | 29.983787 | 28.28937 | 29.18362 | 28.937389 | 29.018723
VIF | 2 | 0.0517679 | 0.0517278 | 0.053163 | 0.0528392 | 0.0491682
VIF | 4 | 0.0595674 | 0.0582683 | 0.059728 | 0.0587192 | 0.0557299
VIF | 6 | 0.0687973 | 0.0619373 | 0.066819 | 0.0686182 | 0.0638279
VIF | 8 | 0.0772783 | 0.0737492 | 0.072937 | 0.0752463 | 0.0729372
VIF | 10 | 0.0816737 | 0.0798728 | 0.079927 | 0.0791633 | 0.0802792
VIF | 14 | 0.0917839 | 0.0852788 | 0.088932 | 0.0891783 | 0.0892678
FSIM | 2 | 0.6816836 | 0.6625722 | 0.692628 | 0.6782793 | 0.6829728
FSIM | 4 | 0.7183698 | 0.7027383 | 0.717192 | 0.7128492 | 0.7100383
FSIM | 6 | 0.7825637 | 0.7582693 | 0.778290 | 0.7692739 | 0.7792733
FSIM | 8 | 0.8886128 | 0.8792793 | 0.839274 | 0.8918393 | 0.8722789
FSIM | 10 | 0.9038782 | 0.9037892 | 0.891739 | 0.9019393 | 0.9103984
FSIM | 14 | 0.9913893 | 0.9682693 | 0.919328 | 0.9892830 | 0.9792794
SSIM | 2 | 0.402698 | 0.500683 | 0.482638 | 0.391798 | 0.4829367
SSIM | 4 | 0.497293 | 0.519027 | 0.496383 | 0.448269 | 0.5183682
SSIM | 6 | 0.529826 | 0.577825 | 0.519273 | 0.528269 | 0.5381263
SSIM | 8 | 0.692739 | 0.601782 | 0.598369 | 0.619326 | 0.6816832
SSIM | 10 | 0.826839 | 0.638269 | 0.716282 | 0.699273 | 0.8192379
SSIM | 14 | 0.898612 | 0.699379 | 0.791692 | 0.793682 | 0.8582929
Table 5. Results of DA_K on images from the DMR-IR database: original healthy and cancerous breast thermal images and their segmented versions for N = 2, 4, 6, 8, 10, and 14
6 Conclusion This study introduces a multi-threshold image segmentation method applied to Breast Tumor images. The proposed method is a fusion of nature-inspired algorithms like Dragonfly Algorithm and Crow Search Algorithm with Kapur entropy. The performance of the proposed methods is verified through a classic dataset of Breast thermal images, and the results are compared with other classical algorithms. DA_K works best for a higher number of thresholds. Its computation time is also significantly less for those thresholds when compared to other methods, thus proving its superior performance. The experimental results show that DA_K performed better than the other algorithms for PSNR, VIF, FSIM and SSIM. Acknowledgement. This research has been financially supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70–2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). Authors acknowledge the technical support and review feedback from
AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References 1. About Breast Cancer Homepage. https://www.cancer.org/cancer/breast-cancer/about/howcommon-is-breast-cancer.html 2. Mohamed, A.N., Moreno, A., Puig, D.: Breast cancer detection in thermal infrared images using representation learning and texture analysis methods. Electronics 8, 100 (2019) 3. Mambou, S.J., Maresova, P., Krejcar, O., Selamat, A., Kuca, K.: Breast cancer detection using infrared thermal imaging and a deep learning model. Sensors 18(9), 2799 (2018) 4. Szentkuti, A., Kavanagh, H.S., Grazio, S.: Infrared thermography and image analysis for biomedical use. Periodicum Biologorum 113(4), 385–392 (2011) 5. Polyak, K.: Heterogeneity in breast cancer. J. Clin. Invest. 121(10), 3786–3788 (2011) 6. Arora, S., Acharya, J., Verma, A., Panigrahi, P.K.: Multilevel thresholding for image segmentation through a fast statistical recursive algorithm, Pattern Recogn. Lett. 29(2), 119–125 (2008) 7. Kapur, J.N., Sahoo, P.K., Wong, A.K.C.: A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Processing 29(3), 273–285 (1985) 8. Ruiz, F.E., Pérez, P.S., Bonev, B.I.: Information theory in computer vision and pattern recognition. Springer Science & Business Media (2009) https://doi.org/10.1007/978-1-84882297-9 9. Sezgin, M., Bulent, S.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13(1), 146–168 (2004) 10. Michalak, H., Okarma, K.: Improvement of image binarization methods using image preprocessing with local entropy filtering for alphanumerical character recognition purposes. Entropy 21(6), 562 (2019) 11. Mirjalili, S.: Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput. Appl. 27(4), 1053– 1073 (2015). https://doi.org/10.1007/s00521-015-1920-1 12. Askarzadeh, A.: A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 169, 1–12 (2016) 13. Houssein, E. H., Abdelkareem, D.A., Emam, M.M., Hameed, M.A., Younan, M.: An efficient image segmentation method for skin cancer imaging using improved golden jackal optimization algorithm. Comput. Biol. Med. 149, 106075 (2022) 14. Qi, A., et al.: Directional mutation and crossover boosted ant colony optimization with application to COVID-19 X-ray image segmentation. Comput. Biol. Med. 148, 105810 (2022) 15. Harrabi, R., Braiek, E.B.: Color image segmentation using multi-level thresholding approach and data fusion techniques: application in the breast cancer cells images. J Image Video Proc 2012, 11 (2012) 16. Houssein, E.H., Emam, M.M., Ali, A.A.: An efficient multilevel thresholding segmentation method for thermography breast cancer imaging based on improved chimp optimization algorithm. Expert Syst. Appl. 185, 115651 (2021) 17. Si, T., Patra, D.K., Mondal, S., Mukherjee, P.: Breast DCE-MRI segmentation for lesion detection using Chimp Optimization Algorithm. Expert Syst. Appl. 204, 117481 (2022) 18. Zhao, S., Wang, P., Heidari, A.A., Chen, H., He, W., Xu, S.: Performance optimization of salp swarm algorithm for multi-threshold image segmentation: comprehensive study of breast cancer microscopy. Comput. Biol. Med. 139, 105015 (2021)
19. Bajaj, A., Abraham, A.: Prioritizing and minimizing test cases using dragonfly algorithms. Int. J. Comput. Inf. Syst. Ind. Manage. Appl. 13, 062–071 (2021) 20. Database for Research Mastology with Infrared Image Homepage. http://visual.ic.uff.br/dmi 21. Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 22. Sheikh, H.R., Bovik, A.C., de Veciana, G.: An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 14(12), 2117–2128 (2005) 23. Zhang, L., Zhang, L., Mou, X., Zhang, D.: FSIM: a feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386 (2011)
Interference Detection Among Secondary Users Deployed in Television Whitespace Joachim Notcker1 , Emmanuel Adetiba1,3(B) , Abdultaofeek Abayomi4 , Oluwadamilola Oshin1 , Kenedy Aliila Greyson2 , Ayodele Hephzibah Ifijeh1 , and Alao Babatunde1 1 Department of Electrical and Information Engineering and Covenant Applied Informatics and
Communication African Center of Excellence (CApIC ACE), Covenant University, Ota, Nigeria [email protected] 2 Department of Electronics and Telecommunications Engineering, Dar Es Salaam Institute of Technology, Dar Es Salaam, Tanzania 3 Institute for Systems Science, HRA, Durban University of Technology, P.O. Box 1334, Durban, South Africa 4 Department of Information and Communication Technology, Mangosuthu University of Technology, Durban, South Africa
Abstract. Interference is one of the significant issues in television white space (TVWS) that limits the scalability of secondary user networks, lowers the quality of service, and causes harmful destruction to primary users. Interference among secondary users is one of the severe problems in TVWS because there is no legal rule that governs the coexistence of secondary nodes in the available white space channels. Many studies have been conducted to recognize the presence of primary signals in order to identify spectrum gaps and avoid interference between primary and secondary users, but the majority of them failed to detect interference among secondary users. Furthermore, the few works that mitigate interference among secondary users, rather than detecting it, assume interference. Therefore, in this paper, we develop an interference detection algorithm using an energy detector. To enhance the energy detector’s functionality, we consider dynamic thresholds rather than static ones. We also modify the binary hypothesis to account for interference between two non-cooperative users coexisting in TVWS. We simulate the energy detector technique in MATLAB R2020a environment and utilised various signalto-noise ratios (SNR) values. With an SNR of −8 dB, the proposed algorithm attains a maximum performance of 95.35% as the probability of detection and meets the standard set by IEEE 802.22 which requires the probability of detection to surpass or equal to 90%. Keywords: Television White Space · Interference Detection · Secondary Users · SNR
1 Introduction The need for efficient bandwidth utilization in wireless communication has surged in recent years. According to Cisco, global internet protocol (IP) traffic would surpass 2 © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 408–417, 2023. https://doi.org/10.1007/978-3-031-35510-3_39
zettabytes by 2019 and will continue to exceed 2 zettabytes per year [1]. A study in 2015 showed that the Internet was accessible by 5 billion devices, and the figure was expected to rise to 25 billion by the end of 2020 [2, 3]. It was also reported that the number of people connected to the Internet was over 3.2 billion [3–5]. As a result, the radio spectrum has become scarce. Recent observations, however, revealed that due to static spectrum allocation, some of the licensed spectrum is underutilized [6, 7]. TVWS is one of the technologies being explored in wireless communication to alleviate the spectrum scarcity problem [8, 9]. Effective utilization of this free space is one of the fundamental concerns that has resulted in the creation of coexistence techniques that can accommodate primary and secondary users [10, 11]. However, this has led to interference problems among secondary users themselves and between secondary and primary users [10, 11]. Interference is one of the major difficulties that hinder the effective utilization of unused portions of TV bands [10, 11]. Furthermore, interference limits the scalability of secondary user networks deployed in TVWS, lowers the Quality of Service (QoS), and causes harmful disruption to primary users [10, 11]. The majority of works in the literature identify primary signal presence to detect spectrum gaps and prevent interference between primary and secondary users, but do not identify secondary user interference [11–17]. Besides this, even the few studies that mitigate interference between secondary users assume interference rather than detecting it [10, 18, 19]. Hence, there is a need for further research to detect interference among secondary users deployed in TVWS in order to improve the quality of service among them. Therefore, in this work, we develop an interference detection algorithm using an energy detector and simulate it in MATLAB. The remainder of this paper is organized as follows: Sect. 2 presents the interference problem mathematically and proposes the model and performance metrics. Section 3 discusses the simulation results obtained, while Sect. 4 concludes the study.
2 Methodology
2.1 Detection Problem
Let two non-cooperative secondary users be randomly deployed in the same region and access the same channel m out of the M available white space channels. Let CR0 be the first node to access channel m at time tn. Then the transmission of CR1 accessing the same channel will cause interference at the receiver of CR0. The interference detection problem at the CR0 receiver can be derived from the binary hypothesis model:
H_0 : r[n] = z[n]
(1)
H1 : r[n] = i[n] + z[n]
(2)
where n = 0, 1, …, N−1 indexes the observed samples, H_0 denotes no interfering signal, whereas H_1 denotes an interfering signal. Here, z[n] denotes additive white Gaussian noise (AWGN) with zero mean and variance \sigma_n^2, while i[n] is the interfering signal from CR1 and r[n] is the received signal at the CR0 receiver [14, 15].
2.2 Proposed System Model
2.2.1 TVWS Energy Detection Algorithm
To validate the hypotheses in Eqs. (1) and (2), we use the energy detection method because it is efficient to use, does not require a thorough understanding of the signal to be detected, and has low computational complexity [14, 15, 20]. It computes the signal's received energy T(x), sometimes referred to as the test statistic, as [14, 15]:
T(x) = \sum_{n=0}^{N-1} |x[n]|^2   (3)
The test statistic is then compared with the threshold to decide whether an interfering signal is present or not. The flowchart below indicates the detection algorithm realized in the MATLAB environment (Fig. 1).
Fig. 1. Flowchart for TVWS signal detection implementation (initialize the random number generator; assign parameter values; generate the signal for n = 0, 1, …, N−1; start the Monte Carlo performance evaluation; generate H0 and H1; compute the test statistics for H0 and H1; input the number of thresholds desired; once all realizations are complete, compute the receiver operating characteristic (ROC) curve).
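The detector in this paper was realised in MATLAB; the NumPy sketch below is an illustration added for this text (not the authors' code) that follows the flow of Fig. 1: it draws received samples under H0 and H1, computes the test statistic of Eq. (3), and estimates the false-alarm and detection rates for a given threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_statistic(r):
    """Eq. (3): received energy accumulated over N samples."""
    return np.sum(np.abs(r) ** 2)

def monte_carlo(snr_db, threshold, N=1000, trials=10000, noise_var=1.0):
    """Estimate P_fa and P_d by simulating H0 (noise only) and H1 (interference plus noise)."""
    sig_var = noise_var * 10 ** (snr_db / 10)   # interfering-signal variance implied by the SNR
    false_alarms = detections = 0
    for _ in range(trials):
        z = rng.normal(0.0, np.sqrt(noise_var), N)        # H0: r[n] = z[n]
        r1 = z + rng.normal(0.0, np.sqrt(sig_var), N)     # H1: r[n] = i[n] + z[n]
        false_alarms += test_statistic(z) > threshold
        detections += test_statistic(r1) > threshold
    return false_alarms / trials, detections / trials

# Example: one empirical ROC point at SNR = -8 dB for a threshold slightly above N*noise_var
print(monte_carlo(snr_db=-8, threshold=1100.0))
```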
2.2.2 Detection Performance Metrics
We use the receiver operating characteristic (ROC) curve to evaluate the performance of the developed interference detection algorithm. The ROC is simply a graph of the probability of false alarm (P_fa) versus the probability of detection (P_d), parameterized by the threshold [14, 15]. To generate ROC graphs, we use the Monte Carlo simulation method, which is appropriate because computer-based simulated data can be used as input for the algorithm [21, 22]. The mathematical expressions for P_fa, P_d, and the
threshold (λ) [12, 14, 16, 23] are given as:
P_{fa} = Q\!\left(\frac{\lambda - N\sigma_w^2}{\sigma_w^2\sqrt{2N}}\right)   (4)
P_d = Q\!\left(\frac{\lambda - N(\sigma_w^2 + \sigma_i^2)}{\sqrt{2N}\,(\sigma_w^2 + \sigma_i^2)}\right)   (5)
\lambda = \sigma_w^2\left(Q^{-1}(P_{fa})\sqrt{2N} + N\right)   (6)
where N is the total number of samples, \sigma_i^2 is the variance of the interfering signal to be detected, \sigma_w^2 is the noise variance, and Q(\cdot) and Q^{-1}(\cdot) are the Q-function and its inverse, respectively. The Q-function is the tail distribution function of the standard normal distribution [14, 15], defined as:
Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\!\left(-\frac{t^2}{2}\right) dt   (7)
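A short numerical check of Eqs. (4)–(7), added for this text, is given below; it uses SciPy's standard-normal survival function as Q(·) and its inverse as Q⁻¹(·) to fix the threshold for a target P_fa and to evaluate the resulting theoretical P_d.

```python
import numpy as np
from scipy.stats import norm

Q, Q_inv = norm.sf, norm.isf      # tail probability of N(0, 1) and its inverse

def threshold(p_fa, N, noise_var=1.0):
    """Eq. (6): decision threshold lambda for a target false-alarm probability."""
    return noise_var * (Q_inv(p_fa) * np.sqrt(2 * N) + N)

def prob_detection(lam, N, snr_db, noise_var=1.0):
    """Eq. (5): detection probability for an interfering signal at the given SNR."""
    sig_var = noise_var * 10 ** (snr_db / 10)
    return Q((lam - N * (noise_var + sig_var)) / (np.sqrt(2 * N) * (noise_var + sig_var)))

lam = threshold(p_fa=0.01, N=1000)
print(prob_detection(lam, N=1000, snr_db=-8))   # theoretical P_d at SNR = -8 dB
```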
2.2.3 Experimental Parameters
The details of the experimental simulation implemented in the MATLAB R2020a environment are presented in Table 1. The number of samples for random detection was set to 1000, 10,000 Monte Carlo trials were used, and the SNR values were set to −8 dB, −10 dB, −12 dB and −16 dB, respectively, with a noise variance of 1 dB.
Table 1. Simulation details

No | Parameter | Value(s)
1 | Probability of false alarm | 0.01
2 | Noise variance | 1 dB
3 | Signal to noise ratio | −8, −12, −10, −16 (dB)
4 | Number of Monte Carlo trials | 10000
5 | Random detection samples | 1000
3 Results and Discussion
We used MATLAB R2020a to evaluate the effectiveness of the suggested algorithm, with 10,000 trials for the Monte Carlo simulations. The probability of false alarm was set to 0.01 and the noise variance to 1, while using 1000 detection samples. From Fig. 2, a probability of detection of 0.9535 was obtained with an SNR of −8 dB. This result meets the IEEE 802.22 standard for proper detection, which requires the probability of detection to be greater than or equal to 90% [14, 15].
Fig. 2. Pfa vs Pd with SNR = −8dB.
We also experimented with different SNR values. In Fig. 3, we set SNR = −12dB and obtained 0.4088 as the probability of detection as shown below.
Fig. 3. Pfa vs Pd with SNR = −12dB.
In Fig. 4, with an SNR of −10 dB, a probability of detection of 0.6964 was attained, while in Fig. 5 a probability of detection of 0.1434 was obtained when the SNR = −16 dB. Therefore, the simulated results show that the probability of detection increases as the SNR increases, and the maximum probability of detection of 0.9535 was obtained in this experiment when the SNR value was −8 dB.
Fig. 4. Pfa vs Pd with SNR = −10 dB.
Fig. 5. Pfa vs Pd with SNR = −16dB.
4 Conclusion
In this study, we developed an interference detection algorithm for identifying interference between two non-cooperative secondary users deployed in TVWS using the energy detector technique. We simulated the energy detector using MATLAB R2020a with various signal-to-noise ratio (SNR) values. With the SNR set to −8 dB, the developed algorithm attains a maximum performance of 95.35% probability of detection and meets the standard set by IEEE 802.22, which requires the probability of detection to be greater than or equal to 90%. Moreover, as the value of SNR decreases, the performance of the algorithm degrades. Therefore, for future work, real-time measurement of the noise variance is required to enhance the efficiency of the energy detector at low SNR values. We also recommend implementing the energy detector using a Universal Software Radio Peripheral (USRP) and GNU Radio software for real-time experiments.
Acknowledgement. The Covenant University Centre for Research, Innovation, and Discovery (CUCRID) supported this investigation. This publication would not have been possible without the financial backing that was provided to the authors.
References 1. Cisco. Cisco visual networking index (VNI) global mobile data traffic forecast update, 2017– 2022 white paper. Comput. Fraud Secur. pp. 3–5 (2019). http://www.gsma.com/spectrum/ wp-content/uploads/2013/03/Cisco_VNI-global-mobile-data-traffic-forecast-update.pdf
2. Zhou, X., Sun, M., Li, G.Y., Fred Juang, B.H.: Intelligent wireless communications enabled by cognitive radio and machine learning. China Commun. 15(12), 16–48 (2018) 3. Adetiba, E., Matthews, V.O., John, S.N., Popoola, S.I., Abayomi, A., Chen, K.: NomadicBTS : Evolving cellular communication networks with software-defined radio architecture and open-source technologies. Cogent Eng. 5(1), 1–15 (2018). https://doi.org/10.1080/23311916. 2018.1507465 4. ITU (International Telecommunications Union). Measuring digital development. Facts and figures 2019. ITU Publ. pp. 1–15 (2019). https://www.itu.int/myitu/-/media/Publications/ 2020-Publications/Measuring-digital-development-2019.pdf 5. Okokpujie, K., Reuben, A., Ofoche, J.C., Biobelemoye, B.J., Okokpujie, I.P.: A comparative analysis performance of data augmentation on age-invariant face recognition using pretrained residual neural network. J. Theor. Appl. Inf. Technol. 99(6), 1309–1319 (2021) 6. Ahmed, H., Asaduzzaman.: Channel assignment augmentation algorithm to mitigate interference for heterogeneous ‘tV White Space’ users. In: 2018 Joint 7th International Conference Informatics, Electronics and Vision 2nd International Conference Imaging, Vision Pattern Recognition, ICIEV-IVPR 2018, no. June, pp. 200–205 (2019). https://doi.org/10. 1109/ICIEV.2018.8641003 7. Yun, D.W., Lee, W.C.: Intelligent dynamic spectrum resource management based on sensing data in space-time and frequency domain. Sensors 21(16), 1–21 (2021). https://doi.org/10. 3390/s21165261 8. Zhang, W., Yang, J., Guanglin, Z., Yang, L., Yeo, C.K.: TV white space and its applications in future wireless networks and communications: a survey. IET Commun. 12(20), 2521–2532 (2018). https://doi.org/10.1049/iet-com.2018.5009 9. Oluwafemi, I.B., Bamisaye, A.P., Faluru, M.A.: Quantitative estimation of TV white space in Southwest Nigeria. Telkomnika (Telecommun. Comput. Electron. Control 19(1), 36–43 (2021). https://doi.org/10.12928/TELKOMNIKA.V19I1.17881 10. Adekar, R.H., Kureshi, A.K.: Interference Mitigation of Heterogeneous Cognitive Radio Network using Spatial Diversity. 2, 3595–3601 (2019). https://doi.org/10.35940/ijeat.B4039. 129219 11. Ranjan, R., Agrawal, N., Joshi, S.: Interference mitigation and capacity enhancement of cognitive radio networks using modified greedy algorithm/channel assignment and power allocation techniques (2020). https://doi.org/10.1049/iet-com.2018.5950 12. Wan, R., Ding, L., Xiong, N., Shu, W., Yang, L.: Dynamic dual threshold cooperative spectrum sensing for cognitive radio under noise power uncertainty. HCIS 9(1), 1–21 (2019). https:// doi.org/10.1186/s13673-019-0181-x 13. Luo, J., Zhang, G., Yan, C.: An energy detection-based spectrum-sensing method for cognitive radio. Wirel. Commun. Mob. Comput. 2022, (2022). https://doi.org/10.1155/2022/3933336 14. Lorincz, J., Ramljak, I.: Algorithm for Evaluating Energy Detection Spectrum Sensing Performance of Cognitive Radio MIMO-OFDM Systems. pp. 1–22 (2021) 15. Ramírez, G.A., Saavedra, M.A., Araque, J.L.: Analysis of an energy detection algorithm for spectrum sensing. In: Proceedings of 2018 8th IEEE-APS Topical Conference Antennas and Propagation in Wireless Communication APWC 2018, no. September, pp. 924–927 (2018). https://doi.org/10.1109/APWC.2018.8503754 16. Arjoune, Y., El Mrabet, Z., El Ghazi, H., Tamtaoui, A.: Spectrum sensing: Enhanced energy detection technique based on noise measurement. In: 2018 IEEE 8th Annual Computing and Communication Workshop and Conference CCWC 2018, vol. 2018-Janua, pp. 
828–834 (2018). https://doi.org/10.1109/CCWC.2018.8301619 17. Carrick, M.: Cyclostationary Methods for Communication and Signal Detection Under Interference Interference (2018)
18. Hendre, V., Murugan, M., Deshmukh, M., Ingle, S.: Transmit Antenna Selection with Optimum Combining for Aggregate Interference in Cognitive Underlay Radio Network. Wireless Pers. Commun. 92(3), 1071–1088 (2016). https://doi.org/10.1007/s11277-016-3593-1 19. Deshmukh, M.M., Zafaruddin, S.M., Mihovska, A., Prasad, R.: Stochastic-geometry based characterization of aggregate interference in TVWS cognitive radio networks. IEEE Syst. J. 13(3), 2728–2731 (2019). https://doi.org/10.1109/JSYST.2019.2904584 20. Fajemilehin, T., Yahya, A., Langat, K., Opadiji, J.: Optimizing cognitive radio deployment in cooperative sensing for interference mitigation. BIUST Research and Innovation Symposium 2019 (RDAIS 2019), vol. 2019, no. June, pp. 76–81 (2019). https://drive.google.com/open? id=168whyUBm9_N5lXw0gwGr-yDcYA2sMvys 21. Al Zubaer, A., Ferdous, S., Amrin, R., Romzan Ali, M., Alamgir Hossain, M.: Detection and false alarm probabilities over non-fading and fading environment. Am. J. Electr. Comput. Eng. 4(2), 49 (2020). https://doi.org/10.11648/j.ajece.20200402.13 22. Dannana, S., Chapa, B.P., Rao, G.S.: Spectrum sensing for OFDM cognitive radio using matched filter detection. Int. J. Recent Technol. Eng. 8(2), 1443–1448 (2019). https://doi.org/ 10.35940/ijrte.B2124.078219 23. kockaya, K., Develi, I.: Spectrum sensing in cognitive radio networks: threshold optimization and analysis. EURASIP J. Wirel. Commun. Netw. 2020(1), 1–19 (2020). https://doi.org/10. 1186/s13638-020-01870-7
Sampling Imbalanced Data for Multilingual Machine Translation: An Overview of Techniques Albina Khusainova(B) Innopolis University, Innopolis, Russia [email protected]
Abstract. This work presents an overview of methods which alleviate data imbalance problem in Multilingual Neural Machine Translation (MNMT). The idea of MNMT is to build a single model for many translation directions instead of building separate models for each direction. In this work, the methods are divided into two groups (static and dynamic methods), compared with each other, and analyzed in terms of performance. Static methods are defined as those which have a fixed sampling strategy, whereas dynamic methods adjust the sampling strategy based on the current state of model training. Analysis shows that both static and dynamic methods are able to improve translation quality, especially for low-resource directions. Static methods are simpler but sensitive to the hyperparameter choice. As the overview demonstrates, dynamic methods in general outperform static methods. Keywords: Neural machine translation · Multilingual machine translation · Imbalanced data · Low-resource languages · High-resource languages · Oversampling · Overfitting · Underfitting
1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 418–427, 2023. https://doi.org/10.1007/978-3-031-35510-3_40
Multilingual neural machine translation (NMT) systems have gained more and more attention in recent years due to their ability to combine many translation directions in a single model and the potential for knowledge transfer between different languages [2], which leads to improved translation quality. Ever larger multilingual machine translation models are developed every year, both in terms of model capacity and the number of languages involved. There are multilingual models with dozens [14], hundreds [1,2,7], and even thousands of languages [3]. Training multilingual NMT models, however, has well-known challenges such as heterogeneity and imbalance of training data. MNMT models are trained on the combination of parallel corpora for different language pairs. Heterogeneity is the problem of linguistic dissimilarity between these training corpora. It complicates the training process since finding common representations for dissimilar data is challenging. Imbalance is the problem of skewed data distributions, when
there may be hundreds of millions of training samples for one language pair and just thousands for another, lower-resourced pair. Such skewed distributions lead to a model that converges inconsistently over different training datasets. The focus of this work is the latter problem of data imbalance. The question is how to train a model when training data is imbalanced. If data is sampled from language pairs according to their distribution in the training dataset, the model is likely to be skewed toward high-resource pairs. Alternatively, if low-resource data is oversampled, there is a high risk of overfitting toward low-resource directions [10,16]. Vice versa, undersampling high-resource directions may lead to underfitting in these directions. Therefore, the choice of effective data sampling strategies for training a multilingual model is crucial to its performance. The majority of recent works on developing multilingual NMT models [3,7,15,20] use the temperature-based sampling approach introduced by Arivazhagan et al. [2]. In this approach, the hyperparameter T controls the extent to which low-resource pairs are oversampled and high-resource directions are undersampled. Meanwhile, a number of more sophisticated and efficient methods have lately emerged which dynamically sample imbalanced data for MNMT [8,18,19,21]. To the best of our knowledge, there are no existing works compiling these approaches together and comparing their effectiveness. At the same time, such a summary may be a valuable resource for researchers and developers of multilingual MT models to compare existing techniques, choose from them, and develop their own techniques based on the current state of the art. This work attempts to close this gap by providing an overview of different sampling strategies used to alleviate the data imbalance problem for multilingual neural machine translation. The overview describes these strategies and compares their effectiveness measured in BLEU scores, separately for low- and high-resource directions, whenever available.
1.1 Organization of the Overview
This overview brings together, describes, and compares different sampling and sample weighting techniques for mitigating the data imbalance problem in MNMT. First such known methods were introduced in 2019, hence the overview covers the range of years from 2019 to 2022. Only those methods which were tested specifically on MNMT were considered. Google Scholar was used as the primary search tool for these works with such search queries as “data imbalance in multilingual neural machine translation”, “low-resource machine translation overfitting”, “regularization in multilingual neural machine translation”. Further, all works citing the two most popular papers on the topic [2,18] were examined to find the relevant approaches. The resulting set of approaches can be broadly divided into two major groups: Static methods and Dynamic methods. Static methods define the sampling strategy upfront and it does not change throughout the training process, whereas in dynamic methods sampling policy depends on the current state of training a
model. Below, the methods in each group are described and compared, starting from the simpler static methods group.
2 Static Methods
This section describes several static approaches designed to mitigate the data imbalance problem when training multilingual NMT models. They are static in the sense that they do not depend on the course of training, instead specifying the sampling strategy before the training starts. Most of them are variants of the temperature-based sampling [2] approach. Despite their simplicity, they sometimes perform better than more complex dynamic methods, and can certainly be used as a default or starting point when training a multilingual machine translation model.
2.1 Proportional Sampling
This is the naive approach, where data is sampled from language pairs proportionally to their share in the training set: p(l) = D_l / \sum_k D_k, where D_l is the dataset size for language pair l. This is equivalent to mixing the data for all language pairs together and sampling randomly during training. The problem here is that high-resource data dominates all other data, leading to low scores for low-resource directions [13]. Hence, if the multilingual model is built mainly to improve low-resource translation, this would probably be an inappropriate choice. The experiments in [21] show that this approach outperforms baseline bilingual models in the many-to-one setting (many source languages, one target language), but lags behind in the one-to-many case (one source language, many target languages). The more detailed analysis of proportional sampling in [2] (Table 1) shows that, compared to bilingual baselines, performance degraded in both cases, but for the one-to-many case a clear trend is observed: the lower-resourced a language pair is, the worse the results, which is in line with the above reasoning. One possible improvement to this approach is to set an artificial limit on the number of samples in any language pair [15]. The optimal value for this limit, however, depends on the language pair in question and so should be tuned, which can be expensive.
2.2 Uniform Sampling
The idea here is to sample uniformly from all language pairs, p(l) = 1 / N_{pairs}, which is equivalent to oversampling low-resource data. One possible concern is that the model may memorize oversampled data and thus overfit the low-resource directions. Raffel et al. [15] hypothesize that the overall low scores obtained using this technique (compared to bilingual baselines and also proportional sampling) may be caused both by overfitting low-resource pairs and underfitting high-resource ones. The results in [2] (Table 1) show that uniform sampling is actually able to improve translation quality in low-resource directions, especially in the many-to-one case, but at the expense of high-resource pairs' performance. The risk of overfitting probably depends on such factors as the degree of imbalance in the data.
2.3 Temperature-Based Sampling
This is the most popular technique at the time of writing, introduced simultaneously by Arivazhagan et al. [2] and Conneau and Lample [5]. The idea of this technique is to sample according to the original distribution, regulated by a temperature term T:
p(l) = \frac{p_l^{1/T}}{\sum_{k \in \text{pairs}} p_k^{1/T}}, \qquad p_l = \frac{D_l}{\sum_k D_k}   (1)
where D_l is the dataset size for language pair l. Setting T = 1 is the same as proportional sampling, and T = ∞ makes it uniform sampling. Choosing values in between controls the degree of oversampling. The optimal value for the hyperparameter T will be different for every new training setting, and its tuning is very expensive, which is the drawback of this technique. Arivazhagan et al. [2] evaluate this approach for T = 1, T = 5, and T = 100, dividing language pairs into three groups: low-resource, medium-resource, and high-resource, see Table 1. Proportional sampling (T = 1) shows the worst overall results, not being able to surpass bilingual models in any group, and especially failing in the low-resource group. However, compared to the results for T = 5 and T = 100, which are close to each other, proportional sampling performs slightly better in the high-resource group. Balanced sampling (T = 5) and uniform sampling (T = 100), by contrast, outperform bilingual models in the low-resource group. Note that the smaller T is, the smaller the degradation of translation quality for high-resource directions. So, temperature-based sampling is indeed able to shift the model's focus to the low-resource language pairs, at the cost of the performance of high-resource ones.
Table 1. Average BLEU scores for different static methods for one-to-many (O2M) and many-to-one (M2O) models. Results are provided for languages grouped by dataset size (high-, medium-, low-resourced). Adapted from [2].
Method | O2M_high | O2M_med | O2M_low | M2O_high | M2O_med | M2O_low
Bilingual | 29.34 | 17.50 | 11.72 | 37.61 | 31.41 | 21.63
Proportional, T = 1 | 28.63 | 15.11 | 6.24 | 34.60 | 27.46 | 18.14
Uniform, T = 100 | 27.20 | 16.84 | 12.87 | 33.25 | 30.13 | 27.32
T = 5 | 28.03 | 16.91 | 12.75 | 33.85 | 30.25 | 26.96
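To make the effect of T concrete, the snippet below (an illustration added for this overview; the language pairs and corpus sizes are toy values) computes the sampling distribution of Eq. (1); T = 1 reproduces proportional sampling and a large T approaches uniform sampling.

```python
import numpy as np

def temperature_sampling(dataset_sizes, T=5.0):
    """Eq. (1): temperature-scaled sampling probabilities over language pairs."""
    p = np.asarray(dataset_sizes, dtype=float)
    p = p / p.sum()              # p_l = D_l / sum_k D_k
    q = p ** (1.0 / T)
    return q / q.sum()           # p(l) proportional to p_l^(1/T)

sizes = {"en-fr": 40_000_000, "en-de": 4_000_000, "en-gu": 150_000}
for T in (1, 5, 100):
    probs = temperature_sampling(list(sizes.values()), T)
    print(T, dict(zip(sizes, np.round(probs, 3))))
```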
Temperature Heating. A modification to the above method was suggested by Dua et al. [6]; its idea is to gradually increase the temperature every epoch according to the equation:
t_{e+1} = \left(1 + k\,\frac{e}{\sqrt{C}}\right) t_s^2   (2)
where t_s is the starting temperature, e is the epoch number, C is the total number of epochs, and k is a coefficient controlling the rate of the temperature increase. Thereby, high-resource data is better represented in the initial stages of training, and then the sampling distribution between translation pairs becomes more uniform. The authors show that temperature heating improves the results of the baseline model; however, it is not clear whether the baseline was trained using basic temperature-based sampling or not. They also report that increasing k leads to faster convergence.
Sinkhorn Temperature Sampling. Another modification, by Fan et al. [7], adapts temperature-based sampling to the many-to-many case, such that there is a balanced distribution between languages, not between language pairs. The authors experimentally show a stable improvement of 0.5 BLEU over basic temperature-based sampling.
2.4 Oversampling and Downweighting
This approach differs from the above ones in that it manipulates sample weights in the loss function, not exactly sampling probabilities. Khusainova et al. [10] identified the persistent problem of overfitting in low-resource directions when training their hierarchical MNMT model. The problem was caused by oversampling low-resource data and the solution which solved the problem was to keep oversampling but to downweight low-resource samples. Specifically, the weight of low-resource samples in the loss function was decreased proportionally to data disbalance (the ratio between the size of the low-resource dataset and the size of the closest high-resource dataset). Basically, if data was oversampled 5 times to match the size of the high-resource dataset, then its samples’ weight would be multiplied by 1/5. The reasoning behind this approach is to regularize low-resource directions and at the same time give more influence to high-resource samples. As a result, this approach indeed solved the problem of overfitting and improved the scores in both high-resource and low-resource directions.
3 Dynamic Methods
Static methods are criticized for being too restrictive and not being able to adapt to changes in the training process. They do not factor in such aspects as language similarity and are sensitive to the hyperparameter choice, e.g. the temperature. This group contains a more interesting type of methods which dynamically sample data such that the current sampling distribution is (supposedly) most beneficial for model training.
3.1 MultiDDS. Multilingual Differentiable Data Selection
Introduced by Wang et al. [18], this is the first well-known method that dynamically samples data for multilingual NMT. A number of subsequent works
compare their methods’ performance with MultiDDS based on the benchmarks also introduced in this work [18]. The core idea of this method is to sample data so that to maximize the model’s performance on the aggregated development set. This is achieved by preferring such datasets, whose gradients are similar to the gradient of the development set, which includes all language pairs. In contrast to this, the earlier work [17] applies the same DDS technique to improve on a particular language pair. The base element of the method is the scorer which defines the sampling distribution over training datasets for different language pairs. It is trained alternately with the MNMT model using reinforcement learning. The reward approximates the effect of the training data on the model’s performance on the development set and has the form of cosine similarity between two vectors of gradients. After the scorer’s update, its scores are used to sample data for the next runs of the model. This approach is compared to temperature-based sampling with T = 1/5/∞. On average, MultiDDS outperforms all these settings, but not by a large margin. Another observation is that it achieves better results on the dataset with related languages, which may indicate that it is able to account for language similarities, unlike temperature-based sampling method. 3.2
MultiUAT. Uncertainty-Aware Training
This work by Wu et al. [19] is built upon the MultiDDS [18] with the difference in the scorer objective. Here, instead of cosine similarity, the uncertainty metric is used. If the model is uncertain about the data, then its sampling probability will be increased. The authors offer multiple ways of calculating uncertainty, for example, it can be the average entropy of the target sentence. The authors argue that this approach is superior to MultiDDS and show it on the example of self-correlated corpus which is unreasonably oversampled in MultiDDS. The experiments demonstrate that both MultiDDS and MultiUAT perform better than static temperature-based sampling with T = 1/5/∞. MultiUAT, however, tested with different uncertainty measures, on average outperforms MultiDDS in all trained models and seems to give the best improvements for high-resource language pairs. 3.3
Adaptive Scheduling for Multi-task Learning
The work by Jean et al. [9] presents the approach where every dataset is assigned a dynamically changing weight. This weight is based on the ratio of the trained model’s validation BLEU score and the baseline model’s BLEU score. The baseline model can be a bilingual model for a particular language pair. So, if the validation score is small compared to the baseline score, the data will be oversampled, and vice versa. In the beginning, the weights are uniform, and there is also some lower limit for the weight to avoid catastrophic forgetting. Once calculated, these weights are used to scale the learning rate.
The method outperforms bilingual models, however, is not able to beat the multilingual model trained with a constant sampling ratio. It should also be noted that it was only tested on a simple model with two source and one target language. 3.4
CCL-M. Competence-Based Curriculum Learning
Zhang et al. [21] introduce a new competence-based method, where competence is an estimate of how well the model has learned a language pair. Resembling [9], it is defined as a ratio between the likelihood score of the model for the language pair and the likelihood score of the baseline bilingual model. First, only highresource language pairs are involved in model training. As their competence reaches some threshold, similar (in terms of vocabulary overlap) low-resource language pairs are also added to the training. While training the model, the data from different language pairs is sampled dynamically according to the inverse of their competencies. Compared with temperature-based sampling and MultiDDS, CCL-M shows a good improvement in scores. Although for high-resource language pairs the results are lower than those of the corresponding bilingual models, the gap is smaller compared to MultiDDS. 3.5
IBR. Iterated Best Response
Zhou et al. [22] introduce the approach based on the distributionally robust optimization. They suggest resampling training data every epoch such that it corresponds to the worst weighting for the current model. Thereby, the model is supposed to focus on problematic pairs. This is implemented as a min-max game, where the objectives of the model and the sampler are opposite. In their experiments, this method is compared to MultiDDS and temperature-based sampling with T = 1/5/100. Iterated Best Response performs on average better than all baselines, improving both on low-resource and high-resource directions. As for the performance on individual language pairs, the highest scores can be achieved by both baseline methods. 3.6
3.6 CATS. Curvature Aware Task Scaling
Li et al. [13] present the original idea of utilizing geometric properties of the loss landscape during model training. Specifically, they learn weights for each language pair such that their combined gradients direct the optimization trajectory towards low-curvature regions, where there is less interference between languages. It should be noted that these weights are used to rescale the loss for each language pair, not to change the sampling probability. The approach is compared with temperature-based sampling with T = 1/5/∞, MultiDDS, and GradNorm [4]. GradNorm is an approach for multi-task learning that initially was not tested for MNMT. CATS and GradNorm
are similar, but GradNorm's objective is to rescale each task's gradients to bring them closer to the average gradient norm. It is shown that CATS performs better than the baselines on average, improves on low-resource pairs, and is able to better balance performance on low-resource vs high-resource pairs, in contrast to temperature-based sampling methods.
3.7 Multi-arm Bandits
This group of methods [11, 12] utilizes multi-arm bandits to select data for training MNMT models. The bandits are trained to maximize their reward, which can depend on the change in loss, likelihood, etc. In most cases, [11] cannot beat the proportional sampling approach. The method of [12] also does not show good results, being unable to outperform the baseline.
3.8 LSSD. Language-Specific Self-distillation
The most recently introduced method, by Huang et al. [8], concludes the current overview. The main idea of the method is to save the best checkpoint for each language pair as the training goes on. Then, if the saved checkpoint outperforms the current model on a given language pair, the model is optimized using a distillation loss in addition to the standard loss for the corresponding samples. The distillation loss is computed as a cross-entropy between the model's and the checkpoint's predictions (a minimal sketch of this loss follows Table 2). The suggested method does not explicitly change sampling probabilities, instead manipulating sample weights in the loss function. The approach is compared to many of the methods described above: MultiDDS, MultiUAT, CCL-M, and Iterated Best Response. The results (Table 2) show a superior average performance of LSSD compared to these methods. Iterated Best Response is the closest in performance to LSSD, and in one of the four cases its scores are higher.

Table 2. BLEU scores for different dynamic methods for two datasets with diverse and related languages, and two models: one-to-many (O2M) and many-to-one (M2O). Adapted from [8].
| Method | M2O related | M2O diverse | O2M related | O2M diverse |
|---|---|---|---|---|
| MultiDDS-S [18] | 25.52 | 27.00 | 17.32 | 18.24 |
| MultiUAT [19] | 26.39 | 27.83 | 18.64 | 19.76 |
| CCL-M [21] | 26.73 | 28.34 | 18.89 | 19.53 |
| IBR [22] | 28.71 | 29.74 | 22.21 | 23.44 |
| LSSD | 29.15 | 30.57 | 22.20 | 23.55 |
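The sketch referred to above combines a standard cross-entropy against the references with a distillation cross-entropy against the best saved checkpoint ("teacher") for the language pair. The weighting coefficient and tensor shapes are assumptions for illustration, not the exact formulation of [8].

```python
import numpy as np

def self_distillation_loss(student_probs, teacher_probs, target_ids, alpha=0.5):
    """Standard cross-entropy against the reference tokens plus a distillation
    cross-entropy against the checkpoint's distribution.
    Shapes: student_probs, teacher_probs: (length, vocab); target_ids: (length,)."""
    eps = 1e-12
    ce = -np.log(student_probs[np.arange(len(target_ids)), target_ids] + eps).mean()
    distill = -(teacher_probs * np.log(student_probs + eps)).sum(axis=1).mean()
    return ce + alpha * distill

# Toy usage over a 5-token sentence and a 10-word vocabulary.
rng = np.random.default_rng(0)
student = rng.dirichlet(np.ones(10), size=5)
teacher = rng.dirichlet(np.ones(10), size=5)
print(self_distillation_loss(student, teacher, target_ids=np.array([1, 3, 2, 0, 4])))
```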
4 Conclusion
This overview aimed to compile the existing techniques for mitigating the data imbalance problem in multilingual neural machine translation. The identified techniques were classified into two groups, static and dynamic, described, and compared with each other. Most static methods revolve around the temperature-based sampling approach, and most dynamic methods adjust sample weights in inverse proportion to model performance: the worse the model performs on a language pair, the more often its data will be sampled. As for performance, dynamic methods generally outperform static ones, and Iterated Best Response and Language-Specific Self-Distillation achieve the greatest improvements among dynamic methods. All these approaches are intended to improve performance in all translation directions simultaneously, while in some scenarios it may be preferable to selectively focus on certain directions, such as a particular language pair or a group of low-resource language pairs. Only the DDS technique has been adapted to such a case [17]. The prospects for future work may therefore include extending data imbalance mitigation methods to this biased MNMT scenario.
References

1. Aharoni, R., Johnson, M., Firat, O.: Massively multilingual neural machine translation. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 3874–3884. Association for Computational Linguistics, Minneapolis, Minnesota (2019)
2. Arivazhagan, N., et al.: Massively multilingual neural machine translation in the wild: findings and challenges. CoRR abs/1907.05019 (2019)
3. Bapna, A., et al.: Building machine translation systems for the next thousand languages. Tech. Rep., Google Research (2022)
4. Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning, pp. 794–803. PMLR (2018)
5. Conneau, A., Lample, G.: Cross-lingual language model pretraining. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)
6. Dua, D., Bhosale, S., Goswami, V., Cross, J., Lewis, M., Fan, A.: Tricks for training sparse translation models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3340–3345. Association for Computational Linguistics, Seattle, United States (2022)
7. Fan, A., et al.: Beyond English-centric multilingual machine translation. J. Mach. Learn. Res. 22(107), 1–48 (2021)
8. Huang, Y., Feng, X., Geng, X., Qin, B.: OmniKnight: multilingual neural machine translation with language-specific self-distillation. arXiv preprint arXiv:2205.01620 (2022)
9. Jean, S., Firat, O., Johnson, M.: Adaptive scheduling for multi-task learning. arXiv preprint arXiv:1909.06434 (2019)
10. Khusainova, A., Khan, A., Rivera, A.R., Romanov, V.: Hierarchical transformer for multilingual machine translation. In: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 12–20. Association for Computational Linguistics, Kiyv, Ukraine (2021)
11. Kreutzer, J., Vilar, D., Sokolov, A.: Bandits don't follow rules: balancing multi-facet machine translation with multi-armed bandits. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3190–3204. Association for Computational Linguistics, Punta Cana, Dominican Republic (2021)
12. Kumar, G., Koehn, P., Khudanpur, S.: Learning policies for multilingual training of neural machine translation systems. arXiv preprint arXiv:2103.06964 (2021)
13. Li, X., Gong, H.: Robust optimization for multilingual translation with imbalanced data. In: NeurIPS (2021)
14. Neubig, G., Hu, J.: Rapid adaptation of neural machine translation to new languages. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 875–880. Association for Computational Linguistics, Brussels, Belgium (2018)
15. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(140), 1–67 (2020)
16. Tran, C., Bhosale, S., Cross, J., Koehn, P., Edunov, S., Fan, A.: Facebook AI's WMT21 news translation task submission. In: Proceedings of the Sixth Conference on Machine Translation, pp. 205–215. Association for Computational Linguistics (2021)
17. Wang, X., Pham, H., Michel, P., Anastasopoulos, A., Carbonell, J., Neubig, G.: Optimizing data usage via differentiable rewards. In: International Conference on Machine Learning, pp. 9983–9995. PMLR (2020)
18. Wang, X., Tsvetkov, Y., Neubig, G.: Balancing training for multilingual neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8526–8537. Association for Computational Linguistics (2020)
19. Wu, M., Li, Y., Zhang, M., Li, L., Haffari, G., Liu, Q.: Uncertainty-aware balancing for multilingual and multi-domain neural machine translation training. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7291–7305. Association for Computational Linguistics, Punta Cana, Dominican Republic (2021)
20. Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 483–498. Association for Computational Linguistics (2021)
21. Zhang, M., Meng, F., Tong, Y., Zhou, J.: Competence-based curriculum learning for multilingual machine translation. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 2481–2493. Association for Computational Linguistics, Punta Cana, Dominican Republic (2021)
22. Zhou, C., Levy, D., Li, X., Ghazvininejad, M., Neubig, G.: Distributionally robust multilingual machine translation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 5664–5674. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic (2021)
Digital Twin-Based Fuel Consumption Model of Locomotive Diesel Engine

Muhammet Raşit Cesur1(B), Elif Cesur1, and Ajith Abraham2

1 Faculty of Engineering and Natural Sciences, Industrial Engineering, Istanbul Medeniyet University, İstanbul, Turkey
[email protected]
2 Machine Intelligence Research Labs (MIR Labs), Auburn, USA
Abstract. In this study, we developed a digital twin (DT) model of a diesel engine at TÜLOMSAŞ. We estimated the fuel consumption of the engine using the designed DT model. For this purpose, we first created the physical model of fuel consumption. We measured the parameters of the physical model that can be measured directly, or other parameters related to them, through sensors attached to the engine. We demonstrated that all the parameters of the physical model are essentially interrelated by examining the correlations between the observed data and fuel consumption. Using the measured data for fuel consumption, air consumption, rpm, and combustion temperature, we created two Artificial Neural Networks (ANN), one with a single hidden layer and one with a double hidden layer. By analyzing the results of the models we created, we showed that the ANN with a single hidden layer gave more accurate results in predicting fuel consumption. This model has an error rate of 2.3% and estimates fuel consumption with an average error of 7.34 L. The created DT is a model that can help in many aspects of planning, such as trip scheduling and preventive maintenance. Using this model, the ideal driving speed between stations can be calculated and train services can be scheduled to minimize fuel consumption. The remaining useful life can be calculated by studying the fuel consumption behavior, and fault detection can be performed in accordance with the fuel consumption pattern.

Keywords: Digital Twin · Artificial Neural Network · Fuel Consumption
1 Introduction

In the modern world, where the value of energy efficiency is constantly increasing, it has become essential to maintain industrial operations effectively. Autonomous control and energy efficiency are two important thresholds that evolving and changing industrial systems must meet. If these two barriers are overcome, economies can expand and become more sustainable. Looking at the goals of industrial change as they relate to transportation systems, they are 1) autonomous driving, 2) the development of new engine and fuel technologies, and 3) more effective environmental and energy management of current systems. The DT is one of the fundamental technologies that can be used to implement planning and control procedures for both autonomous driving
and energy efficiency [1–3]. Because DT technology can simulate how a vehicle will behave under different environmental conditions, it can also predict how its subsystems will behave while driving. Thus, both the pre-trip planning function and the on-the-fly decision-making function receive information from the DT. In this study, the fuel consumption of a diesel engine used in freight locomotives was modeled by building a DT of the engine. The created model can be used to 1) plan train trips with the lowest fuel consumption, 2) perform preventive maintenance by detecting anomalies in fuel consumption, 3) guide the driver while driving, or 4) support the decision-making function of the autopilot during autonomous driving. The scientific discussion about the DT has attracted a lot of attention recently. Liu et al. predict significant growth in the field of DT based on data from more than 90,200 Google searches and more than 3000 academic search results from Google Scholar, WOS, and Scopus [4]. The complexity of the concept of the DT has led to several definitions in the literature [5–8]. Grieves developed the basic concept of the DT as part of his work on product lifecycle management, describing it as a virtual and a physical space, each comprising a model, a link to data, and a separate flow of information [9, 10]. This concept was previously referred to as the Mirrored Spaces Model before being referred to as the DT [11, 12]. The latest research on engine performance, combustion characteristics modeling, and fundamental variables was also examined. The combustion characteristics of a four-stroke single-cylinder naturally aspirated diesel engine were predicted using an ANN [13]. The biodiesel fuel ratio, engine load, air consumption, and fuel flow rate were considered as inputs. The model was then built using the recorded in-cylinder pressure data and heat release as output parameters. [14] performed an additional combustion prediction study using Bayesian optimized Gaussian process regression, in which engine load, fuel injection timing and pre-injection duration, peak cylinder pressure, and CO emissions were obtained as additional control parameters. In terms of engine performance and emissions, [15] created a new model, the Firefly neural model, which was then contrasted with the original neural model. [16] attempted to simulate the heat and mass transfer of the cylinder. All features, including the heat added to the cycle, the mass inside the cylinder, air volume, fuel, the cooling system, and the heat emitted from the exhaust, are provided as inputs to the model for this purpose. The parameters of the ignition model in the study [17] are the mean temperature of the burned zone, the oxygen mass rate, and the reaction time; unlike Song, the model also takes into account the ambient temperature. Though the number of scientists and professionals who value the DT has been growing exponentially over time, and several papers have provided DT case studies of different kinds of engines, there are still not enough studies to fully comprehend how the DT might be used to increase system efficiency. At this point, this study presents a DT case study relevant to the Pielstick 16PA4 V185 type locomotive diesel engine.
2 Methodology

2.1 Physical Model

Since the engine can only produce a certain amount of power, fuel consumption is proportional to the efficiency of the engine. Since the study considers the diesel engine, the fuel consumption model was created using the cycle calculations of the diesel engine. According to Yardım [18], the mechanical efficiency (n_m) and thermal efficiency (n_t) are multiplied to obtain the total efficiency (n_T) of the engine in Eq. 1.

n_T = n_m × n_t    (1)
Yardım [18] defined thermal efficiency as the ratio of net heat (Q_net) to the total amount of heat (Q_in) supplied to the cycle. Sabancı and Işık [19], on the other hand, expressed thermal efficiency as a function of the internal volume change of the piston and the pressure inside the piston. Equation 2 illustrates the thermal efficiency using the atmospheric pressure (P_1) generated by the intake stroke in the piston, the pressure generated by the compression of the clean air in the piston (P_2), the volume of compressed air (V_2), and the volume after injection (V_3).

n_t = Q_net / Q_in = 1 − (P_1/P_2)^((k−1)/k) × ((V_3/V_2)^k − 1) / (k × (V_3/V_2 − 1))    (2)
As Yardım [18] noted, the calculation of the relationship between overall efficiency and fuel consumption depends on the specific heat of the fuel (H_u). If the unit of specific fuel consumption (b_e) in Eq. 3 is kg/Hph, the coefficient (c) should be 632; if it is kg/kWh, the coefficient should be 860.

n_T = c / (H_u × b_e)    (3)
As can be seen from the equations, the parameters that determine fuel consumption are the specific fuel consumption, the heat added to the engine cycle, the heat lost in the engine cycle, the pressure in the piston, and the air volume in the piston. While the heat parameters have a linear effect on fuel consumption, there is a non-linear relationship between the pressure and temperature parameters and fuel consumption. The specific heat of the fuel, on the other hand, is a factor independent of the engine and was not considered in this study because it is unlikely to vary with operating conditions. According to Yardım [18], the value of this parameter is often expressed as the sum of its values in all studies. As a result, it was found that fuel consumption is affected by the temperature, the pressure in the piston, and the gas volume in the piston, considering how efficiently the engine runs. Apart from these parameters, it is obvious that the crankcase pressure also has an impact on understanding the change in efficiency, as the gas leaks that increase with the aging of the engine affect the crankcase.
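Combining Eqs. 1 and 3, the specific fuel consumption can be estimated from the two efficiencies and the heating value of the fuel, as in the minimal sketch below; the numeric values are placeholders for illustration only.

```python
def total_efficiency(n_m, n_t):
    # Eq. 1: total efficiency is the product of mechanical and thermal efficiency.
    return n_m * n_t

def specific_fuel_consumption(n_T, Hu, c=860.0):
    # Eq. 3 rearranged: be = c / (Hu * nT); c = 860 for kg/kWh (632 for kg/Hph).
    return c / (Hu * n_T)

n_T = total_efficiency(n_m=0.85, n_t=0.45)          # placeholder efficiencies
print(specific_fuel_consumption(n_T, Hu=10200.0))    # placeholder specific heat of fuel
```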
2.2 Digital Twin Model

The DT model generates the results of the physical model in the real world based on the data collected from the engine. It is not possible to measure all of the above parameters directly on a running engine. The values of these factors can be understood indirectly by using a method that can simulate the linear and nonlinear relationships of fuel consumption with variables such as torque, the amount of air drawn in by the engine, temperature variations in the cooling system, fuel temperature, and pressure. Considering the given parameters, the relationship between the parameters and fuel consumption is confirmed by a correlation matrix, which is shown in Fig. 1. The correlation matrix was created using data from the engine test bench. The data of the specified parameters have the highest correlation with fuel consumption out of 78 columns of measured data. The explanation of the columns shown in Fig. 1 can be found in Table 1.

Table 1. Selected parameters
| # | Name of Parameter | Definition | Unit |
|---|---|---|---|
| 1 | C_FUEL | Fuel Consumption | Kg/H |
| 2 | AIR_FLOW | Air Flow | Meter/Sec |
| 3 | LTSu | Low Water Temperature | Centigrade |
| 4 | HTsu | High Water Temperature | Centigrade |
| 5 | LOAD | Load | Kg |
| 6 | TORK | Torque | N.m |
| 7 | P_C | Crankcase Pressure | Bar |
| 8 | T_FUEL | Fuel Temperature | Centigrade |
| 9 | P_FUEL | Fuel Pressure | Bar |
| 10 | HUM | Humidity | G/m3 |
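A hedged sketch of how the most fuel-consumption-correlated columns could be selected from the 78-column test-bench export is shown below. The file name is hypothetical; only the parameter label C_FUEL is taken from Table 1, and the actual selection procedure may differ.

```python
import pandas as pd

# Hypothetical export of the engine test-bench measurements (78 columns).
df = pd.read_csv("engine_test_bench.csv")

# Absolute Pearson correlation of every column with fuel consumption.
corr_with_fuel = df.corr(numeric_only=True)["C_FUEL"].abs().sort_values(ascending=False)

# Keep the ten most correlated parameters (Table 1) and build their correlation matrix.
selected = corr_with_fuel.head(10).index
corr_matrix = df[selected].corr(numeric_only=True)
print(corr_matrix.round(2))
```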
The method used to simulate the linear and nonlinear relationships of fuel consumption is an ANN, which is a successful method for deriving correlations between inputs and outputs. For the fuel consumption model, two models, one with a single hidden layer and one with two hidden layers, were selected and their topologies were compared. The following parameters were chosen for the neural network topology. The number of hidden neurons in a single layer was set to 30. The stopping criterion, called a threshold and defined as the limit of the error function, is 0.0001. The starting values for the weights are set to 0.15 at the beginning of the model instead of random values. The activation function, i.e., the function used to smooth the result, is logistic. Backpropagation is the type of algorithm used to compute the neural network. Ten repetitions are used for training. The learning rate threshold, which sets the lower and upper bounds for the learning rate, is between −0.1 and 0.1. The learning rate used in conventional backpropagation is 0.02. The squared sum of the errors forms the error function (sse).
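The configuration above (threshold stopping criterion, fixed start weights, repetitions) suggests a particular ANN toolkit that is not named here; the Python sketch below only approximates the described single-hidden-layer topology, and the data arrays are placeholders, not the test-bench measurements.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 9 input parameters (Table 1, rows 2-10) and the fuel consumption target.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 9)), rng.normal(size=500)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(
        hidden_layer_sizes=(30,),   # single hidden layer with 30 neurons
        activation="logistic",      # logistic activation, as described above
        solver="sgd",               # backpropagation with stochastic gradient descent
        learning_rate_init=0.02,    # conventional backpropagation learning rate
        tol=1e-4,                   # rough analogue of the 0.0001 error threshold
        max_iter=2000,
        random_state=0,
    ),
)
model.fit(X, y)
print(model.predict(X[:3]))
```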
Fig. 1. Correlation matrix
3 Results and Discussion

First, a neural network model with one hidden layer and then a neural network model with two hidden layers were developed for fuel consumption and compared in terms of performance criteria such as mean absolute deviation (MAD), mean absolute percentage error (MAPE), and mean squared error (MSE). The network topologies and the resulting performance figures are summarized in Table 2.

Table 2. Network topologies and performances
| Attributes | Single Hidden | Double Hidden |
|---|---|---|
| Number of Nodes in Hidden Layer | 30 | 30, 5 |
| Threshold | 0.0001 | 0.0001 |
| Start Weights | 0.15 | 0.15 |
| Activation Function | logistic | logistic |
| MAD | 7.339 | 7.351 |
| MSE | 100.445 | 99.522 |
| MAPE | 2.397% | 2.413% |
Among the models created, the model with a single hidden layer had a mean absolute error of 7.339 L in estimating fuel consumption; its mean absolute percentage error was 2.397%, and its mean squared error was 100.445. The model with two hidden layers has a mean absolute error of 7.351 L in estimating fuel consumption; its mean squared error was 99.522 and its mean absolute percentage error was 2.413%. According to the data in Table 2, the model with one hidden layer has a lower mean absolute error and a lower mean absolute percentage error, while the model with two hidden layers has a lower mean squared error. The decrease in mean squared error indicates that the prediction variance is decreasing at a constant rate. However, the mean absolute percentage error increased for the model with two hidden layers. The predictions of the two ANN models are shown in Fig. 2. The results of the estimates were similar; however, it was found that the variance of the estimates decreased as the number of layers increased. Physical changes in the real environment lead to variations in the amount of fuel consumed at a constant speed; therefore, the amount of error increased due to the decrease in variance. Increasing the number of layers in the fuel consumption model created with the parameters specified in the study thus has a negative effect.
Fig. 2. a) ANN with a single hidden layer b) ANN with double hidden layer.
The model that is closest to the truth is the ANN model with a single hidden layer.
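For reference, the three error measures reported in Table 2 can be computed as in the sketch below; the arrays are placeholders, not the measured fuel consumption values.

```python
import numpy as np

def mad(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))            # mean absolute deviation

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)             # mean squared error

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))  # mean absolute percentage error

y_true = np.array([310.0, 295.0, 330.0])               # placeholder fuel consumption values
y_pred = np.array([303.0, 301.0, 324.0])
print(mad(y_true, y_pred), mse(y_true, y_pred), mape(y_true, y_pred))
```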
4 Conclusion

The objective of this study is to obtain the digital twin of a diesel engine made at TÜLOMSAŞ in order to develop a fuel consumption model. We attempted to estimate the fuel consumption of the engine from variables such as combustion temperature, air consumption, and power output using ANN models. In considering the physical model of fuel consumption, the presence of linear and nonlinear relationships between fuel consumption and the above parameters is presented. In addition, a correlation matrix is used to confirm that the specified parameters are highly correlated with fuel consumption. As a result of the physical model and data analysis, a DT model based on two ANNs is proposed: one is a single hidden layer ANN, and the other is a double hidden layer ANN. When comparing the models, the single hidden layer ANN is judged to be better, even though the double hidden layer ANN has a lower MSE value. Due to the variability of fuel consumption under constant conditions in the real world, the MSE optimization of the double hidden layer ANN leads to an increase in the overall model error (MAPE).

Acknowledgment. Ajith Abraham is supported by The Analytical Center for the Government of the Russian Federation (Agreement No. 70–2021-00143 dd. 01.11.2021, IGK 000000D730321P5Q0002). The authors acknowledge the technical support and review feedback from the AILSIA symposium held in conjunction with the 22nd International Conference on Intelligent Systems Design and Applications (ISDA 2022).
References

1. Macián, V., Tormos, B., Bermúdez, V., Ramírez, L.: Assessment of the effect of low viscosity oils usage on a light duty diesel engine fuel consumption in stationary and transient conditions. Tribol. Int. 79, 132–139 (2014). https://doi.org/10.1016/J.TRIBOINT.2014.06.003
2. Huang, Y., et al.: Impact of potential engine malfunctions on fuel consumption and gaseous emissions of a Euro VI diesel truck. Energy Convers. Manage. 184, 521–529 (2019). https://doi.org/10.1016/J.ENCONMAN.2019.01.076
3. Tran, T.A.: Effect of ship loading on marine diesel engine fuel consumption for bulk carriers based on the fuzzy clustering method. Ocean Eng. 207, 107383 (2020). https://doi.org/10.1016/J.OCEANENG.2020.107383
4. Liu, M., Fang, S., Dong, H., Xu, C.: Review of digital twin about concepts, technologies, and industrial applications. J. Manuf. Syst. 58, 346–361 (2021). https://doi.org/10.1016/j.jmsy.2020.06.017
5. Shafto, M., Conroy, M., Doyle, R., Glaessgen, E.: DRAFT Modeling, Simulation, Information Technology & Processing Roadmap. Technology Area (2010)
6. Madni, A.M., Madni, C.C., Lucero, S.D.: Leveraging digital twin technology in model-based systems engineering. Systems 7(1), 7 (2019). https://doi.org/10.3390/systems7010007
7. Liu, Q., Liu, B., Wang, G., Zhang, C.: A comparative study on digital twin models. In: AIP Conference Proceedings, vol. 2073 (2019). https://doi.org/10.1063/1.5090745
8. Leng, J., Wang, D., Shen, W., Li, X., Liu, Q., Chen, X.: Digital twins-based smart manufacturing system design in industry 4.0: a review. J. Manuf. Syst. 60, 119–137 (2021). https://doi.org/10.1016/j.jmsy.2021.05.011
9. Grieves, M.: Digital Twin: Manufacturing Excellence through Virtual Factory Replication. White Paper (2014)
10. Grieves, M.W.: Product lifecycle management: the new paradigm for enterprises. Int. J. Prod. Dev. 2, 1–2 (2005). https://doi.org/10.1504/ijpd.2005.006669
11. Grieves, M., Vickers, J.: Origins of the digital twin concept. Transdisciplinary Perspectives on Complex Systems: New Findings and Approaches (2016)
12. Grieves, M., Vickers, J.: Digital twin: mitigating unpredictable, undesirable emergent behavior in complex systems. In: Kahlen, F.-J., Flumerfelt, S., Alves, A. (eds.) Transdisciplinary Perspectives on Complex Systems, pp. 85–113. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-38756-7_4
13. Can, Ö., Baklacioglu, T., Özturk, E., Turan, O.: Artificial neural networks modeling of combustion parameters for a diesel engine fueled with biodiesel fuel. Energy 247(5), 123473 (2022). https://doi.org/10.1016/j.energy.2022.123473
14. Said, Z., et al.: Modeling-optimization of performance and emission characteristics of dual-fuel engine powered with pilot diesel and agricultural-food waste-derived biogas. Int. J. Hydrogen Energy (2022). https://doi.org/10.1016/j.ijhydene.2022.07.150
15. Riess, S., Rezaei, J., Weiss, L., Peter, A., Wensing, M.: Phase change in fuel sprays at diesel engine ambient conditions: modeling and experimental validation. J. Supercrit. Fluids 173, 105224 (2021). https://doi.org/10.1016/j.supflu.2021.105224
16. Song, E., Shi, X., Yao, C., Li, Y.: Research on real-time simulation modelling of a diesel engine based on fuel inter-zone transfer and an array calculation method. Energy Convers. Manage. 178, 1–12 (2018). https://doi.org/10.1016/j.enconman.2018.10.014
17. Kumar, M., Tsujimura, T., Suzuki, Y.: NOx model development and validation with diesel and hydrogen/diesel dual-fuel system on diesel engine. Energy 145, 496–506 (2018). https://doi.org/10.1016/j.energy.2017.12.148
18. Yardım, M.H.: Motor Teknolojisi. Nobel Yayıncılık (2019)
19. Sabancı, A., Işık, A.: İ.Y. Motorlar. Nobel Yayıncılık, Ankara (2012)
Centrality of AI Quality in MLOPs Lifecycle and Its Impact on the Adoption of AI/ML Solutions Arunkumar Akkineni1(B) , Somayeh Koohborfardhaghighi1 , and Shailesh Singh2 1 Deloitte, Austin, USA
{aakkineni,skoohborfardhaghi}@deloitte.com 2 Deloitte, Arlington, USA [email protected]
Abstract. Despite the challenges around incorporating Artificial Intelligence into business processes, AI is revolutionizing the way companies are doing business. The biggest business and social challenge in the adoption of AI solutions is achieving the end users’ trust in the models and scaling prototypes to production ready models in an enterprise environment. Scaling trustworthy AI in a more transparent, responsible, and governed manner could facilitate widespread adoption of AI solutions in an enterprise environment. After conducting an extensive literature review on different aspects of AI quality, we have developed an integrated AI Quality-MLOps framework which enables the development and deployment of AI solutions in an enterprise environment. AI Quality is the center of the proposed framework, and it guides businesses towards putting a complete set of quality metrics, tests, approaches, and algorithms together to ensure conformance with business objectives. This approach improves the delivery efficiency of the solution both during the design and production phase while conforming to the regulatory guidelines adopted by an organization. Keywords: MLOps · AI Quality · AI Ethics · Model Security · Data Privacy · AI Risk Assessment · Explainability · Causality · Reliability
1 Introduction

One of the biggest business and social challenges in the adoption of AI solutions is achieving the end users' trust in the models and translating prototypes to production-ready models. Due to the ever-changing needs of our complex business environments as well as technological social responsibility, the issue of trust has expanded its spectrum across other desired quality criteria as well (referred to as AI quality criteria throughout this paper). Trustworthy AI is highly intertwined with the security, privacy, ethical issues, reliability, explainability, and performance of the solutions. For example, a lack of capacity within an organization to properly explain the reasoning behind AI models, or even to reliably maintain such solutions, slows down the adoption process. Also, with governments introducing regulation of Artificial Intelligence applications and
Data Act, stakeholders need to outline, prioritize, and communicate risk-related information for these solutions. In order to fulfill such a need, organizations need to set up an AI risk assessment framework which captures the risk profile associated with proposed AI/ML solutions. Also, a systematic framework which guides the development of high-quality AI/ML solutions with proper visualization, monitoring, and evaluation techniques should be in place. We highlight throughout this paper that AI quality should be central to such a systematic framework. According to [1, 2], AI quality can be defined as the crucial characteristics of societal and business value and risk, or even those characteristics that guide both humans and AI to work together for the best of business and society. Businesses are currently not actively incorporating AI quality into their MLOps framework, leading to costly operations for the business and negatively impacting end users' expectations. Since a machine learning life cycle (MLOps) is built on a continuous and interdependent set of processes, critical areas of AI quality should be properly mapped and addressed during different stages of MLOps. Improper attention to such frequently raised issues at one stage of MLOps propagates to other stages as well. For example, not having effective bias mitigation at the very beginning of data collection processes not just impacts the performance of the model [30] but also raises liability issues for the organization. We argue in this paper that companies should rethink their operations to ensure the integration of AI quality criteria with their MLOps framework. Therefore, we have developed an integrated AI Quality-MLOps framework, presented in Fig. 1, which ensures the seamless development and deployment of AI solutions in an enterprise environment. AI quality is the center of our proposed framework, and it guides businesses towards putting together a complete set of quality metrics, tests, approaches, and algorithms to ensure conformance with business objectives, and to mitigate risk from the AI system to both the organization and broader society. We also capture the interdependency between the various AI quality criteria and discuss how this approach improves the delivery efficiency of the solution both during design and production while conforming to the regulatory guidelines adopted by an organization. The rest of this paper is organized as follows: In Sect. 2 we deliver a general overview of Machine Learning Operations, with a subsequent discussion on the necessity of establishing an AI risk assessment framework for capturing the risk profile associated with proposed AI/ML solutions. We present the proposed integrated framework and its components in Sect. 4. The summary of the work and discussion on the topic are presented in Sect. 5.
2 MLOps

MLOps workflows usually undergo a high degree of refinement due to emerging issues and changing circumstances; therefore, their efficiency depends on many factors. For example, a simple change in the distribution of input data points (which might be due to data drift or security issues such as data poisoning) can affect the entire workflow and generate a different outcome than the one desired. Therefore, the relevant stakeholders require a certain visibility, depending on where the challenging event pops up across the workflow, so that they can apply the right control mechanism, guided by the risk assessment framework.
Fig. 1. Proposed Integrated AI Quality-MLOps Framework.
2.1 Visualization in MLOps

Visualization is usually seen as the pictorial representation of data which guides the stakeholders towards a better fact-based decision-making process. Visualization in the machine learning lifecycle is critically important, and it can be categorized based
on the presented functional features. Visualization consists of four main components across the MLOps life cycle. Exploratory Data Analysis (EDA), for example, helps data scientists identify interesting patterns in the data. Visualization of ML Model Architecture Tuning assists in choosing the correct algorithm and considering the right architecture for it. Model Performance Visualizations guide us towards depicting the right performance metrics and finding the optimal combination of hyperparameters for the algorithms we choose. Finally, Model Inference Visualization helps the business to observe the dynamics associated with the data and model performance in production, model security, and explainability.

2.2 MLOps Monitoring

Monitoring has become a necessity for assessing the maturity of AI-based systems, and it can be performed on both functional and operational levels. Functional-level monitoring focuses on the MLOps inference engine; therefore, it continuously tracks the ML model's performance (i.e., monitoring drifts). This ensures that new real-time data or data integrity issues do not degrade the predictive power of deployed models. Operational-level monitoring focuses on operational and machine learning-related issues (e.g., model usage, security, integrity, data protection, access control) and the model's ethical compliance with desired quality criteria (e.g., preserving privacy, addressing bias, fairness, and explainability).
3 AI Risk Assessment

We analyzed different risk assessment frameworks in the industry (e.g., COSO ERM, ICO, PDPC, COBIT, IIA) which help to evaluate risks associated with the development and deployment of AI/ML models. Each of these frameworks consists of different components, with the main goal of preparing the organization for the risks associated with the integration of new technology-related solutions into its business ecosystem. However, after careful evaluation of the proposed frameworks and government agency policies [31], a consensus was reached on establishing an ideal AI risk assessment framework with four main components: (1) Governance, (2) Risk Identification and Assessment, (3) Control Framework, and (4) Risk Monitoring, Reporting and Communication. The Governance piece is responsible for clearly defining organizational policies, standards, and procedures as well as the roles and responsibilities of the various stakeholders. This piece also incorporates risk management roles, reporting and communication protocols across business lines, and the embedding of ethical principles into organizational policies and standards. The Risk Identification and Assessment part of the framework is responsible for the identification of risks associated with data/model lineage, data/model security, model performance, regulatory and compliance risks, as well as data/model governance and privacy laws. The third core component is the control framework, which defines a set of practices that ensure that the existing policies, standards, and procedures sufficiently address
AI/ML-related concepts and risks. The control framework includes the adequacy of the enterprise-wide data architecture, the strength of data lineage practices, the sufficiency of data security measures, and the dynamic retraining and updating of models. Frequent tracking and monitoring of existing models' performance, the degree and frequency of data monitoring, and the adequacy of documentation requirements for different stages of MLOps are also part of the control framework. Risk Monitoring, Reporting and Communication is responsible for establishing appropriate key risk indicators, key performance indicators, and adequate reporting towards the appropriate stakeholders across the enterprise. According to the perceived level of risk, this component determines the appropriate level of human involvement during different stages of MLOps.
4 Proposed Integrated Framework

4.1 AI Quality Criteria

The list of AI quality criteria [1, 2] which are the focus of this research is presented in Fig. 1. As one can see, we have ten quality criteria which are divided into two groups. The regulatory group consists of four non-measurable criteria which point to the necessity of autonomy and control (i.e., human in the loop), social requirements, legal frameworks, and ethical and normative guidelines surrounding data and models. We have six measurable AI quality criteria (e.g., Performance Metrics and Fairness) which can be evaluated using technical tests, algorithms, reference standards, and measures.

4.1.1 Ethical and Normative Guidelines

The Ethical Application of AI Index (EAAI) [3] is an advisory framework by the American Council for Technology-Industry Advisory Council (ACT-IAC) which incorporates five core parameters underpinning the impact of AI on organizations: Bias, Fairness, Transparency, Responsibility, and Interpretability. According to this index, humans are ultimately responsible for the ethical application of AI solutions. The index helps relevant stakeholders to identify and discuss the dimensions of the triggers, to define relevant indicators and measures, and finally to assess an AI application's level of credibility throughout the whole MLOps lifecycle. The output of this step will be used by the AI risk assessment framework for considering and putting the right protocols and safety nets in place with respect to the solution. In addition, the output of the index can guide the AI risk assessment framework towards establishing proper visualization and adequate monitoring of critical aspects of measuring the credibility of AI solutions.

4.1.2 Social Requirements

This component defines action plans for Technological Social Responsibility within a business that aims to adopt and operate AI solutions. The action plans include empowering the employees, certifying the employees to adopt AI ethically, AI for social good, and filling the digital gap with respect to artificial intelligence technology.
4.1.3 Legal Framework

Although the concept of risk is easy to understand from a simple classification of its intensity levels (i.e., Unacceptable Risk, High Risk, Limited Risk, Minimal to no risk), we need to define a risk spectrum for the end users of a system considering its specific use. Defining this spectrum involves understanding the significance of the information provided by the solution (i.e., the outcome of the model) as well as the significance of the task for which we are implementing the solution. This process involves a high degree of conceptual thinking as well as stakeholder involvement (i.e., domain experts and risk assessors). It creates an output that represents a risk matrix and can be used for categorizing the types of risk associated with the solution in place [9]. The situation will become even more critical down the road considering the type of algorithm used during the development of the solution. Algorithms are capable of learning and adapting themselves to new environments (e.g., in the presence of new data) and can change their behavior accordingly. Therefore, before jumping into the development tasks, stakeholders need to step back and perform the right level of risk assessment in order to meet obligations and fulfill the right requirements for the liability of an AI/ML solution. This will be reflected in the degree of accountability and the degree of MLOps monitoring as well.

4.1.4 Autonomy and Control

Despite the fear around replacing humans with artificial intelligence, human skills matter a lot for the successful AI transformation journey of enterprises as well as for the development of AI/ML solutions. All the risk assessment and legal frameworks around AI highlight the importance of the human in the loop. The Autonomy and Control criterion receives the human-related control signals from the AI risk assessment framework (i.e., the control part) and redirects them to the right stakeholders of the system (i.e., data scientists, compliance team, etc.). The presence of a human in the loop is necessary for certain situations in which human judgement is inevitable. Monitoring thresholds within the MLOps framework, for example, determine whether the automated training/testing of models conforms to the reliability criteria or human intervention is required to analyze the model and perform another experiment.

4.1.5 Fairness

AI/ML solutions create bias when they are assisting stakeholders with group decision-making questions (e.g., who should be eligible for a certain job posting, same-day delivery, receiving a loan, etc.). Therefore, organizations need to make sure that the end users of the solution are fairly treated before finalizing their decisions. Traditionally unprivileged groups (i.e., with respect to race, gender, age, etc.) should not be treated differently by AI solutions, so that organizations are mindful and respectful of their rights. This issue is identified as a risk, guided by the Ethical and Normative Guidelines, and ultimately redirected to the established risk assessment framework within the organization for proper handling. Therefore, organizations need to establish fairness metrics as well as algorithms to identify and tackle bias properly, in order to avoid certain risks and liability in business; a minimal sketch of one such metric follows.
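One common group fairness check is the demographic parity difference between the positive-prediction rates of two groups, sketched below; the group labels, decisions, and threshold choices are illustrative, and toolkits such as [4] provide many more metrics and mitigation algorithms.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between the unprivileged (0)
    and privileged (1) group; values near zero indicate parity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_unpriv = y_pred[group == 0].mean()
    rate_priv = y_pred[group == 1].mean()
    return rate_unpriv - rate_priv

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # model decisions (e.g., loan approved)
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected attribute, illustrative
print(demographic_parity_difference(y_pred, group))
```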
The metrics and algorithms [4] can be used in different stages of the MLOps framework, from data collection and engineering to model development and the deployment/inference stage. In the inference stage, metrics and algorithms help the Human in the Loop to set appropriate fairness thresholds and to trigger the alarm for re-training or selecting models, so that the business meets the right metrics/fairness requirements as the data distributions change in the production environment. Since fairness is a socio-technical challenge, stakeholders need to be aware and cautious that fairness metrics can sometimes be misleading, especially when they address different aspects of fairness (i.e., individual [5] versus group fairness).

4.1.6 Explainability and Causality

Explainability of a learning model refers to understanding how the model makes its predictions. Explainability only reveals the correlations among its training data as defined by the model structure. However, this does not imply that the observed result is the consequence of cause-effect relationships among the variables in the data. Therefore, when it comes to interpreting a model's predictions, businesses should prioritize the competing interests of their stakeholders. Certain stakeholders are only interested in seeing how the model came to certain decisions. Local and global explanation algorithms provide model-specific and model-agnostic insights to explain the behavior of AI/ML models and help stakeholders achieve better transparency on how the model operates (i.e., ranking the most important features, average marginal contribution of features [6], etc.). Causal reasoning provides human-like explanations of the models and has a hierarchical structure [7]. It provides causal information in terms of the kind of questions that are of interest to a business stakeholder. The first level of the Causal Hierarchy is called association and involves correlation, which requires no causal information. In the second level of the Causal Hierarchy, we have Intervention, wherein we might be interested in measuring causal effects from our observational data to quantify the impact of an intervention policy (i.e., specific actions in marketing, user interface design, a government policy, etc.). In order to be able to prove a causal relationship between a feature and the outcome of a model, we need to apply interventions. As depicted in Fig. 1, businesses need to follow different approaches to measure the effectiveness of an intervention policy. Causal reasoning is also able to present counterfactual explanations. Counterfactual explanations have the potential to compare the real world to an alternative world in which the end user's feature vector could have had different values and, due to that, the model's prediction could be different. This feature helps the stakeholders to reason better while they evaluate the decision made by the AI model for certain individuals (e.g., what could have been done so that a loan could be offered to an individual). As depicted in Fig. 1, such insights give the human in the loop a better understanding of the behavior of the models during the inference stage. Proper integration of explainability and causality with MLOps' visualization and monitoring pieces could inform the human in the loop that re-training and testing of a model is needed. Based on the qualitative guidelines provided by AI quality criteria such as the Legal Framework and the Ethical and Normative Guidelines, the Risk Assessment Framework guides the human in the loop to decide whether the explainable insights are trustworthy.
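As a minimal illustration of a model-agnostic global explanation of the kind mentioned above, the sketch below ranks features by permutation importance, i.e., by how much shuffling each feature degrades the model's score; the dataset and model are synthetic placeholders, and this is only one of many possible explanation techniques.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Global, model-agnostic ranking: shuffle each feature and measure the score drop.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}")
```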
4.1.7 Data Protection

The continuous flow of data within the MLOps framework exposes organizations to data security and privacy risks. While data security risks require establishing data access control, privacy risks require controlling what can be inferred from a data release. Data security and privacy impact assessments are both preventative measures which assist organizations in this regard. Organizations can use data/privacy impact assessment frameworks (i.e., PIAs, DPIAs) to mitigate organizational privacy risks in the early stages of their MLOps life cycles. While the focus of PIAs is on general data privacy risks associated with new projects, stakeholders should consider and perform a DPIA for projects in which there is a high risk of damage to the rights and freedoms of the end users of the AI/ML solutions. Also, due to privacy attacks (i.e., linkage of third-party information to sensitive data records), stakeholders should consider data anonymization techniques such as Differential Privacy [8] while analyzing sensitive data. Differential Privacy is a critical criterion for MLOps, which leads to a huge improvement in preserving the privacy of an AI/ML solution's end users. Theoretically, Differential Privacy delivers an ε-differentially private mechanism where epsilon is its privacy cost. This technique adds some level of noise (depending on the epsilon value) to the data so that it produces approximate statistics out of the data analysis. This way, a person or group can no longer be identified.

4.1.8 Reliability

As depicted in Fig. 1, the Reliability of an AI/ML solution can be applied to different aspects of its decision-making process, such as "Uncertainty" in prediction, "Robust Generalization" of the solution with respect to out-of-distribution (OOD) data, and finally the "Adaptation" ability of the model over the course of its learning process. In the following, we discuss the importance of each of these topics in detail. The first topic is the presence of "Uncertainty" in AI/ML solutions. For instance, although it is common practice to develop a machine learning model to always make a prediction, such behavior should be avoided in certain applications, especially those with a high level of risk (e.g., AI-enabled medical devices for diagnosing certain diseases [9]). The ML model architect should consider a reject option in the model design if the risk of making a misprediction is high and could lead to costly decision making [10]. Since this reject option helps to keep the coverage as high as possible while reducing the error rate, the method not only positively impacts the model's performance but also equips the model with a graceful failure mechanism. It also draws stakeholders' attention to such data samples for further investigation, since the issue might be due to a lack of training samples or the model being exposed to an adversarial example (i.e., a poisoned feature store). In another situation, due to insufficient domain information on the full set of classes in the training dataset, we might need to follow techniques (i.e., Open Set Recognition [11], discriminative or generative models) which help us design model architectures that classify the known and recognize the unknown classes.
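A minimal sketch of a confidence-based reject option is given below: the classifier abstains whenever its top-class probability falls below a threshold, trading coverage for a lower error rate on the predictions it does make. The threshold, model, and data are illustrative and not tied to any specific rejection framework.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)
confidence = proba.max(axis=1)
accept = confidence >= 0.8                       # illustrative rejection threshold

coverage = accept.mean()                         # fraction of samples the model answers
accepted_error = (clf.predict(X_te)[accept] != y_te[accept]).mean()
print(f"coverage={coverage:.2f}, error on accepted={accepted_error:.3f}")
```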
The second topic is the robust generalization of AI/ML solutions. In the traditional machine learning paradigm, the model is built using training data, and test data is used to estimate the model quality. However, since the inference data distribution might be different from the baseline (i.e., training) data distribution, or the training data may even come from multiple domains, we should expect the model performance to be prone to degradation. Fortunately, new learning paradigms such as Invariant Causal Prediction [12] and Invariant Risk Minimization [13] help to build predictive models which are capable of better generalization beyond the training data and optimize the accuracy of a model. Invariant Risk Minimization, for example, considers knowledge about the data selection process across different environments and fits predictors that are approximately invariant across environments. Invariant Causal Prediction also assumes that some aspects of the data distribution vary across the training set because of the noise imposed by the environment, but that the underlying causal mechanism remains constant. The third issue is the Adaptation of AI/ML solutions, especially in situations where supervised data is scarce. In fact, lack of appropriate data is one of the reasons that companies do not progress towards the adoption of AI/ML solutions. The data labeling process has its own cost (i.e., it is time-consuming); therefore, companies need to use algorithms which can address the challenge of learning with little or no labelled data (e.g., Active Learning [14], Zero-shot Learning [15], One-shot Learning [16], Few-shot Learning [17]). Such algorithms prioritize the data which should be labelled beforehand in order to have the highest impact on training a supervised model. Uncertainty, Robust Generalization and Adaptation have a significant impact on the reduction of MLOps development and maintenance costs. Since such concepts are also context-specific, ethical and normative guidelines will have a significant role in communicating the frequently raised issues to stakeholders so that the risk assessment framework properly captures and considers certain control measures for them. Since all three concepts directly impact the performance of the model and its underlying architecture, proper visualization and monitoring are needed for effective human-in-the-loop engagement.

4.1.9 Security

Security concerns across the MLOps lifecycle are not just limited to data protection and IT infrastructure. One major security risk which businesses should expect is the presence of AI attacks and their potential to severely damage the privacy, integrity, and performance of AI/ML solutions [18]. As depicted in Fig. 1, there are different categories of attacks on ML models based on the actual goal of the attacker and the stage of the MLOps life cycle. The attacker might target the model's feature store with a Poisoning Attack [19] or target the inference engine with a Model Evasion Attack [19] to disrupt the model as it makes predictions. The adversary may also aim to reconstruct the input data from the confidence scores predicted by the deployed model (i.e., Model Inversion Attacks [20]), to duplicate the parameters of the model (i.e., Model Extraction Attacks [21]), or even to predict whether a particular data point was contained in the model's training dataset (i.e., Membership Inference Attacks [22]). As one can see, some of these attacks lead to a violation of user privacy and some seriously damage the model's predictive performance. Therefore, businesses should not just familiarize themselves with the major causes of attacks but also consider defense mechanisms for faster intervention to avoid model
degradation. We have mapped a variety of attacks and their relevant defense mechanisms across the MLOps lifecycle in Fig. 1. The figure outlines a framework to protect the models in training and inference across three dimensions, which are mainly confidentiality (i.e., privacy violation), integrity (i.e., through data poisoning or model evasion attacks), and availability (i.e., denial of service) [18]. As depicted in Fig. 1, Data Poisoning attacks happen during the training phase, and they target the model's feature store. The goal of a Data Poisoning attack is to inject and/or manipulate the training data to negatively influence the outcome of the model or corrupt its logic. Defensive strategies to protect against Poisoning Attacks include the use of Differential Privacy and improving the robustness of the models [23]. Oracle attacks such as Model Inversion, Model Extraction, and Membership Inference attacks are used during the inference stage to extract the data, its importance, and even its membership in the feature store without attacking the feature store itself. Defensive strategies against oracle attacks [24] during the inference stage include, but are not limited to, adversarial training, defensive distillation, feature squeezing, etc. Model Evasion attacks use marginally perturbed inference data to cause the model to misclassify observations during the testing phase. Defensive strategies for such attacks include model robustification techniques such as adversarial training, training a separate classification layer in a deep neural net architecture, and creating a separate null class for misclassified samples [25]. A third class of attacks includes Denial of Service and Distributed Denial of Service attacks, which require robust IT operations and threat detection [25].

4.1.10 Performance

The last measurable component of AI quality is Model Performance. In the literature we can find a variety of methods and metrics to measure the performance of a model during its development and production. However, this is not the focus of this section; instead, we focus here on performance degradation prediction. Although the underlying cause can differ from case to case (i.e., different kinds of drift, security issues, etc.), early detection of performance degradation cuts costs across the MLOps lifecycle (i.e., for redesigning the model). In the training phase, for example, a change in the distribution of training data (known as Data Drift) can lead to a model's performance degradation over time. Therefore, as shown in Fig. 1, stakeholders need to frequently track and capture the changes in the distribution of training data (i.e., batches of data distributions) through different statistical tests and techniques. Since performance degradation can be visualized and monitored at the inference stage as well, stakeholders should utilize real-time metrics and algorithms (e.g., ADWIN [26], DDM [27], EDDM [28], PHT [29]) to detect the performance degradation of a deployed model as early as possible. This process helps the Human in the Loop to set appropriate data drift thresholds and to trigger the alarm for updating the feature store, re-training, or even selecting another model to update the model registry.
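As a self-contained illustration of one such detector, the sketch below implements a minimal Page-Hinkley test [29] on a stream of per-batch errors; the delta and threshold parameters, and the simulated error stream, are illustrative values that the human in the loop would tune in practice.

```python
import random

class PageHinkley:
    """Minimal Page-Hinkley test: flags drift when the cumulative deviation of
    the monitored statistic (e.g., per-batch error) exceeds a threshold."""
    def __init__(self, delta=0.005, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.mean, self.n, self.cum, self.min_cum = 0.0, 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean of the stream
        self.cum += x - self.mean - self.delta       # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold  # True -> drift detected

random.seed(0)
detector = PageHinkley()
stream = [random.gauss(0.10, 0.02) for _ in range(200)] + \
         [random.gauss(0.35, 0.02) for _ in range(50)]    # error jump simulates drift
for t, err in enumerate(stream):
    if detector.update(err):
        print("drift detected at step", t)
        break
```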
5 Conclusion and Future Work

Trust issues in the adoption of AI/ML solutions have their roots in the absence of clear ethical guidance, a lack of regulatory transparency and accountability, and faulty operations during the design, development, and deployment of AI/ML solutions. Despite a huge amount of effort by public and private organizations around trustworthy AI, the concepts of AI quality used by these organizations show only surface-level differences. We believe that, in order to put trustworthy AI into practice, organizations need to familiarize themselves with recent approaches and advances in AI/ML so that this conceptual foundation can serve as the basis for the development and deployment of AI solutions. In this paper, we have developed an integrated AI Quality-MLOps framework with differentiating features such as Causal Reasoning, Model Security (attacks and defense strategies), Model Reliability (dealing with uncertainty, robust generalization and adaptation), Data/Model Protection (through Differential Privacy), and Model Performance (drift metrics). In addition, we have captured the interdependency between the presented AI Quality criteria across the MLOps lifecycle. We believe that the adoption of AI Quality metrics and methodologies throughout the entire MLOps lifecycle safeguards the business from operational risks and cost escalations. As an extension of the current study, we aim to identify the cost associated with improper adoption of AI quality criteria and the lack of a robust AI risk assessment framework.

Acknowledgements. As used in this document, "Deloitte" means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of our legal structure. Certain services may not be available to attest clients under the rules and regulations of public accounting. This publication contains general information only and Deloitte is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte shall not be responsible for any loss sustained by any person who relies on this publication. Copyright © 2022 Deloitte Development LLC. All rights reserved.
References 1. Schmitz, A., Akila, M., Hecker, D., Poretschkin, M., Wrobel, S.: The why and how of trustworthy AI. at-Automatisierungstechnik 70(9), 793–804 (2022) 2. DIN (German Institute for Standardization) Homepage. https://www.din.de/resource/blob/ 772610/e96c34dd6b12900ea75b460538805349/normungsroadmap-en-data.pdf. Accessed 16 Oct 2022 3. American Council for Technology-Industry Advisory Council’s Homepage (ACT-IAC). https://www.actiac.org/system/files/Ethical%20Application%20of%20AI%20Framework_ 0.pdf. Accessed 16 Oct 2022 4. Bellamy, R.K., et al.: AI Fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev. 63(4/5), 4:1–4:15 (2019)
5. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226 (2012) 6. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems. vol. 30 (2017) 7. Pearl, J.: The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62(3), 54–60 (2019) 8. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14 9. FDA Homepage. https://www.fda.gov/media/122535/download. Accessed 16 Oct 2022 10. Cortes, C., DeSalvo, G., Mohri, M.: Learning with rejection. In: Ortner, R., Simon, H.U., Zilles, S. (eds.) ALT 2016. LNCS (LNAI), vol. 9925, pp. 67–82. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46379-7_5 11. Boult, T.E., Cruz, S., Dhamija, A.R., Gunther, M., Henrydoss, J. Scheirer, W.J.: Learning and the unknown: surveying steps toward open world recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33, No. 01, pp. 9801–9807 (2019) 12. Peters, J., Buhlmann, P., Meinshausen, N.: Causal inference using invariant prediction: identification and confidence intervals. arXiv. Methodology (2015) 13. Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D.: Invariant risk minimization. arXiv preprint arXiv:1907.02893. (2019) 14. Settles, B.: Active Learning Literature Survey. (2009) 15. Romera-Paredes, B., Torr, P.: An embarrassingly simple approach to zero-shot learning. In: International conference on machine learning, pp. 2152–2161 (2015) 16. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. pattern Anal. Mach. Intell. 28(4), pp. 594–611 (2006) 17. Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. (2016) 18. Frazão, I., Abreu, P.H., Cruz, T., Araújo, H., Simões, P.: Denial of service attacks: Detecting the frailties of machine learning algorithms in the classification process. In: Luiijf, E., Žutautait˙e, I., Hämmerli, B.M. (eds.) CRITIS 2018. LNCS, vol. 11260, pp. 230–235. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05849-4_19 19. Newaz, A.I., Haque, N.I., Sikder, A.K., Rahman, M.A., Uluagac, A.S.: Adversarial attacks to machine learning-based smart healthcare systems. In: IEEE Global Communications Conference, pp. 1–6 IEEE (2020) 20. Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333 (2015) 21. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: 25th USENIX Security Symposium, pp. 601–618 (2016) 22. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: IEEE Symposium on Security and Privacy, pp. 3–18 IEEE (2017) 23. Wang, C., Chen, J., Yang, Y., Ma, X., Liu, J.: Poisoning attacks and countermeasures in intelligent networks: status quo and prospects. Digital Commun. Netw. 8(2) (2021) 24. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., Mukhopadhyay, D.: Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069 (2018) 25. 
Tabassi, E., Burns, K.J., Hadjimichael, M., Molina-Markham, A.D., Sexton, J.T.: A taxonomy and terminology of adversarial machine learning. NIST IR, 1–29. (2019) 26. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448. Society for Industrial and Applied Mathematics (2007)
27. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28645-5_29 28. Baena-Garcıa, M., del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavalda, R., Morales-Bueno, R.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams. Vol. 6, pp. 77–86 (2006) 29. Page, E.S.: Continuous inspection schemes. Biometrika 41(1/2), pp.100–115 (1954) 30. Cotter, A., et al.: Training fairness-constrained classifiers to generalize. In: ICML Workshop: Fairness, Accountability, and Transparency in Machine Learning (2018) 31. Federal Housing finance Agency’s HomePage. https://www.fhfa.gov/SupervisionRegulat ion/AdvisoryBulletins/Pages/Artificial-Intelligence-Machine-Learning-Risk-Management. aspx. Accessed 08 Nov 2022
A Survey on Smart Home Application: The State-of-the-Art and Future Research Trends
Riya Sil(B), Shabana Parveen, and Rhytam Garai
Adamas University, Kolkata 700126, India
[email protected]
Abstract. In the recent technological era, the enormous workload placed on our society has left insufficient time for daily household chores, errands and other home service requirements such as the need for plumbers, electricians, carpenters, chefs, etc. [1]. Online services, including home service booking and product purchase, have gained a lot of popularity in recent years. At the same time, online services pose a considerable threat to daily-wage labourers and other workers who are still not familiar with technology and the internet. Nowadays, people prefer to stay in the comfort of their home and find it more convenient to book services online. Considering all these problems, in this paper the authors give a detailed overview of various smart home applications that help customers connect directly with workers who deliver services at their doorsteps [2]. The main purpose of this paper is to present various smart home applications that provide standardized workers who are specialized in specific fields and can serve customers accordingly. Furthermore, numerous research challenges and future trends are presented to apprehend the smart home application concept.
Keywords: Home application · Customer service · Online service · Event Organization · Android application
1 Introduction

A smart home application, also referred to as a home service application or a smart home service application, is used to efficiently control non-computerized devices connected at home with the help of a smartphone [20]. Home service apps can be used for a single purpose, for example controlling household lights to automatically manage the lighting of the house [5]. They can also automatically control many different things, including heaters, air conditioners, television systems, doors, windows and coverings, security alert systems, garden water sprinkler systems and household appliances [4]. Nowadays, with the help of appropriate sensing technologies, i.e., the sensors available in the market, smart home automation apps monitor the activity happening in the surroundings and raise an alert if there is any kind of problem or something suspicious that does not happen regularly [6].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 449–459, 2023. https://doi.org/10.1007/978-3-031-35510-3_43
Efficient security systems and efficiency gains in other appliances have driven the growth of the smart home
application [15]. Smart home service applications offer efficient solutions to all kinds of needs of elderly people and their families, help to monitor physiological problems and different kinds of functional issues, and also provide proper aid when an emergency situation arises [19]. Smart home services come in both wireless and wired systems. Wireless systems can be installed easily [9]; a wireless automated home system that provides features such as house lighting, room temperature control and a security system is both user-friendly and cost-efficient [11]. Wired systems, on the other hand, are considered more reliable because they are not easy to hack [21], but a major drawback is that they are not cost-effective. In India, a growing nation, people still physically search for a tradesperson to fix a problem; for this specific reason, the authors have proposed a smart home application [9]. The idea of a home service application will always be helpful for people who value their time as well as for the unemployed, and it can benefit numerous people across the country. Nowadays, almost every person uses a smartphone, so it is easier to use a home service application than to go outside and find someone to get the job done without any prior knowledge of the tradesperson's ability and efficiency [23]. Such applications are especially helpful for working people who need to cope with their busy schedules and handle household chores at the same time [15]. A home service application is a platform people can trust to provide qualified workers with proper skills in their specific fields, so that everything is fixed easily and efficiently [7]. Some features that can be added to a smart home service application are [17]:
i) Customer Demand: the customer requests a worker with more work experience rather than accepting the default worker provided by the application.
ii) Customer Urgency: the customer requests a worker for an urgent purpose.
iii) Customer Requirement: the customer requests workers specialized in different fields.
There are already many home service applications available, such as URBANCLAP and HOUSEJOY [19]. The proposed application would assist users and also help unemployed people find work. In this paper, the authors have focused on bringing a change in society by making citizens aware of the availability of these kinds of services and thereby also helping to reduce unemployment.
2 Smart Home Technology

Smart home technology is a combination of technology and services for a better and easier way of living [18]. Different technologies are used to provide close monitoring and easy, balanced control and interaction [1]. One can enable single-button and voice control of various home systems at the same time. Daily housework and activities can be done with one click, or commands can be given by voice control without any interference, which is easier, safer and more efficient, with less expense. It improves home comfort and security. It can also serve aged people and those with disabilities by providing a safe, secure and easy environment.
2.1 Smart Devices

Smart devices are used in various aspects, which include:
(i) Security – With smart security, fire, water leaks and gas leaks can be sensed, and cameras can track the outside of the home even when it is dark [6].
(ii) Environment – Lights, fans, air conditioning, heating and energy usage can be controlled remotely [11]. When a person is not at home and forgets to turn off the lights and fans, they can be turned off with one click from the smartphone [20].
(iii) Welfare – One can monitor health, consult a personal trainer, and obtain a diagnosis through smart devices [16].
(iv) Entertainment – Television, smart home theatre, multi-room audio, video and games are part of entertainment [17]. Audio can be connected all over the house as required [6].
(v) Communication – Various communications such as video calls, a home calendar, reminders, and communication inside and outside the house [22].
(vi) Green – Helps to reduce electricity consumption with the use of sensors, which is much needed nowadays.
2.2 Smart Home Network

Smart home network technologies are divided into two main types:
(i) Wired systems – Wires may be run inside the walls, and many home automation devices are connected via a wiring system, for example new wiring (optical fiber), Powerline, etc.
(ii) Wireless systems – A wireless system has two main components, a sender and a receiver [24]. Some new appliances use wireless technology to stay in touch with other devices [7]. Examples of wireless communication systems are microwave, radio frequency (RF), infrared (IR), Wi-Fi, Bluetooth and many more [9].
Some smart home networks can work with both wired and wireless systems [11]. Z-Wave is an example of a wireless communication system for the smart home that is a good and affordable option [6].

2.3 Smart Home Controller

Smart home controlling devices are used to manage the systems by forwarding a signal or data to control a switch [10]. A remote control is not the only kind of controller; tablets and web browsers act as controllers as well. One can control devices with the controller even when far from home [8].

2.4 Smart Kitchen

Among all the applications of smart home technology, the smart kitchen is the most fascinating. Some smart appliances are refrigerators, coffee makers, microwaves, and dishwashers
[3]. The smart refrigerator, also known as the Internet Refrigerator, can communicate over the internet and make things much easier [5]. The refrigerator automatically keeps a record of the items inside it and can alert users to what is there [23]. Microwaves can communicate with smart refrigerators and suggest recipes based on the foods available in the refrigerator [13].
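As a toy illustration of this kind of behaviour (not drawn from any specific product), the sketch below keeps the refrigerator inventory as a set and suggests only those recipes whose ingredients are all available; the item and recipe names are made up.

```python
# Hypothetical inventory reported by the refrigerator's sensors.
fridge_inventory = {"eggs", "milk", "cheese", "tomato", "butter"}

# Hypothetical recipe book known to the microwave/companion app.
recipes = {
    "omelette": {"eggs", "butter", "cheese"},
    "pancakes": {"eggs", "milk", "flour", "butter"},
    "tomato soup": {"tomato", "butter"},
}

def suggest_recipes(inventory: set[str], recipe_book: dict[str, set[str]]) -> list[str]:
    """Return recipes whose ingredient sets are fully covered by the inventory."""
    return [name for name, needed in recipe_book.items() if needed <= inventory]

print(suggest_recipes(fridge_inventory, recipes))  # ['omelette', 'tomato soup']
```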
3 Smart Home Energy Management System (SHEMS)

In this era of the 4th Industrial Revolution, which is a fusion of advances in Artificial Intelligence, Robotics and the Internet of Things [12], there is an urgent need for energy management, since we use energy in every aspect of our day-to-day life, whether as fuel or as electricity, from cooking to preserving food in our refrigerators [23]. As the raw materials used to produce energy are mostly non-renewable, we need a proper Energy Management System to use them sustainably; hence the key functions of a productive EMS are to monitor, optimize, control and check the usage of energy in our industries and, most importantly, our homes [14].

3.1 Generation and Storage of Renewable Energy Sources

As there are a limited number of resources for the production of energy, we turn to renewable energy sources, mainly i) solar energy, ii) wind energy and iii) water energy. The Sun is the ultimate and primary source of energy for every living being on Earth; plants trap photons from sunlight to make their food. Taking inspiration from plants, humans can also use solar energy to generate electrical energy with the help of solar panels [24]. We can store the generated energy in a rechargeable battery and use it, which is renewable, free of cost and eco-friendly [2]. Governments can help to reduce the price of solar panels and installation charges through tax cuts so that it becomes easier for the general public to install them in their houses [4]. Installing solar panels is a one-time investment with occasional low-cost servicing, which is minimal in the long run compared to the traditional electricity services we receive from power plants [6]. This will also help to reduce the different types of pollution that occur while generating electricity in power plants using Uranium-235, which is a non-renewable resource. In places where wind currents are strong, windmills can be installed to generate energy and store it in batteries. Generating electric energy from flowing water is done mainly in dams with the help of spinning turbines [3].

3.2 Category of Energy Usage in Residences

There are various ways in which energy is used in our houses, mainly:
i) Electrical energy (usage: home appliances, EVs, cooking on induction stoves) [13]
ii) Fuel energy (usage: cooking using LPG, petrol and diesel in vehicles) [16]
3.3 Implementation of SHEMS

Electricity is the main energy resource people use in their houses, so it is very important to manage it properly. In this era where human-computer interaction is advancing every day, we can use it to automate our homes and save energy. With the help of AI/ML (Artificial Intelligence and Machine Learning) based applications, we can train our home appliances to regulate energy usage and reduce energy wastage by analysing and learning from our day-to-day lifestyle and habits. For example, an air conditioner consumes a lot of electricity, but this can be cut down with the help of automated sensors. Whenever the room reaches a certain temperature, the AC goes into sleep mode and consumes very little energy; when the temperature rises past a set limit, it turns itself on again using the sensors attached to it, with the temperature limits set by the user beforehand. This can reduce the energy wasted by ACs even while we are sleeping peacefully.
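A minimal sketch of this on/off rule is shown below. It is purely illustrative: the temperature limits are example values, and read_room_temperature and set_ac_mode are hypothetical stand-ins for whatever sensor and appliance interfaces a real installation would expose.

```python
import time

LOWER_LIMIT_C = 23.0   # AC goes to sleep at or below this temperature (example value)
UPPER_LIMIT_C = 26.0   # AC resumes cooling at or above this temperature (example value)

def control_loop(read_room_temperature, set_ac_mode, poll_seconds: int = 60) -> None:
    """Simple hysteresis controller: sleep the AC when the room is cool enough,
    wake it again only when the temperature drifts past the upper limit."""
    cooling = True
    while True:
        temp = read_room_temperature()          # hypothetical sensor callback
        if cooling and temp <= LOWER_LIMIT_C:
            set_ac_mode("sleep")                # target reached: save energy
            cooling = False
        elif not cooling and temp >= UPPER_LIMIT_C:
            set_ac_mode("cool")                 # room warmed up again: resume cooling
            cooling = True
        time.sleep(poll_seconds)
```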
4 Survey Results

Our aim is to make an application through which small-scale retailers and labourers can simply connect with customers and cater to their home service requirements at their doorsteps [10], as a small contribution towards reducing unemployment [17]. It is an initiative to help those who are far removed from technology to remain in the ever-growing competition. Some apps provide only a few services, based on the availability of their professional employees [9]. At Doorstep would not have any such problem, because it would connect the customer directly to the shop or labourer on the basis of their requirement [8]. Customers can even book workers based on the ratings that a specific labourer or shop has. Table 1 presents a detailed comparison of the various tools and technologies used.

Table 1. Comparison among various smart home applications in India
HouseJoy [25]
- Tools/technologies used: Application: jQuery, React, NGINX, Slick, Socket.IO; Utilities: Google Tag Manager, Google Maps; Business tools: Gmail, Google Forms, Zoho Mail
- Description: HouseJoy is a home service application that has been gaining a lot of popularity lately. It offers day-to-day services such as house washing, electricians, AC maintenance, salon at home, plumbing, and many more.
- Limitations: lack of manpower; lack of professionalism by servicemen; delays in refunds to customers.

Zimmber [26]
- Tools/technologies used: Application: React, Slick, Socket.IO; Utilities: Google Maps, Google Tag Manager; Business tools: Gmail, Google Forms
- Description: The Zimmber online marketplace provides electrical services, including electrical installation and repairs, plumbing services, air conditioner services and house painting services, enabling users to review providers and schedule appointments for a range of home-repair projects.
- Limitations: poor quality of service; high cost.

Urban Clap [27]
- Tools/technologies used: Application: Node.js, NGINX, React, AngularJS, Redis; Utilities: Google Maps, Google Tag Manager; Business tools: UrbanClap app
- Description: UrbanClap is an online marketplace for local services such as repair & maintenance, home cleaning, homecare & design, pest control, packers & movers, business services, event management, weddings & party management, health & wellness, salon, etc.
- Limitations: high cost; lack of professionalism with respect to customer service; no customer service available.

Helpr [28]
- Tools/technologies used: Application: HTML5, Google Analytics, jQuery; Utilities: Google Maps, Google Tag Manager; Business tools: Helpr app
- Description: Helpr is an online marketplace for local services such as home cleaning, homecare, pest control, packers & movers, health & wellness, salon, etc.
- Limitations: extremely poor service; poor customer service.

TaskBob [29]
- Tools/technologies used: Application: Google, Slick; Utilities: Google Maps, Google Tag Manager; Business tools: TaskBob app, Gmail
- Description: TaskBob provides home and beauty services. Users can hire a driver, plumber, electrician, cleaning professionals and beauticians using the app.
- Limitations: limited scalability of the business; lack of profit margin; customer dissatisfaction; lack of business strategy; lack of customer frequency.

Timesaverz [30]
- Tools/technologies used: Application: MySQL, NGINX; Utilities: Google Tag Manager, Google Maps; Business tools: Gmail, Timesaverz app
- Description: Timesaverz helps to get curated home service providers for various services ranging from home cleaning, pest control and appliance repairs to house interiors and handyman jobs.
- Limitations: customer dissatisfaction; bad service; lack of professionalism by servicemen and by customer service; high cost.

Doormint [31]
- Tools/technologies used: Application: NGINX, MySQL, Ruby, Rails; Utilities: Google Analytics, Amazon CloudFront, Elasticsearch; Business tools: GSuite, AdRoll, SnapEngage, MadMimi
- Description: DoorMint is a mobile platform to book on-demand laundry services. An agent picks up the clothes for washing, ironing and dry cleaning and delivers them back at the pre-scheduled time.
- Limitations: customers unhappy with the application; poor service; extremely poor customer service; products returned not even in usable condition.

Sulekha [32]
- Tools/technologies used: Application: MongoDB, NoSQL; Utilities: Google Maps, Google Play Store; Business tools: Sulekha app, Gmail
- Description: Sulekha is an online marketplace for local services such as cleaning, laundry services, repairing tasks, homecare & design, pest control, professional services, business services, events and functions, health and wellness.
- Limitations: extremely poor service quality; service provided to customers is not at all good; highly dissatisfied customers.

Thumbtack [33]
- Tools/technologies used: Application: React, Slick, Socket.IO; Utilities: Google Maps, Google Tag Manager; Business tools: Gmail, Google Forms, Thumbtack app
- Description: Thumbtack is a home service app that helps people find ratings of local service providers and hire them without any cost. Users can find people specialized in various departments, including house cleaning, taking care of pets, etc.
- Limitations: customers not provided proper services; no assistance for customers with regard to payments; emails sent even if the customer is not willing to use the service.

Mr Right [34]
- Tools/technologies used: Application: Slick, NodeJS, MySQL; Utilities: Google Maps, Google Tag Manager; Business tools: Gmail, Google Forms, Amazon, Mr Right app
- Description: Mr. Right's website gives a user-friendly platform that offers many easy home services such as plumbing, carpentry, electrical work, cleaning, pest control, etc. The app provides home services from verified professionals, with low pricing and every possible payment option.
- Limitations: accessibility is a major area of concern; customer service not up to the mark and bad experiences; poor reviews from customers.
5 Conclusion

This paper presents the efficient and smart home and household services required in our daily lives. Smart home services help to ease various hard and time-consuming jobs, getting them done in less time. The proposed application uses a database such as MySQL, uses Google Maps for location purposes, and also aims for better customer support. Technical support will be checked thoroughly for security purposes. The environment and interface will greatly help new or inexperienced users. The application will not only provide jobs for many unemployed people but also increase the reach of those already employed. It also increases the profit margin of small-scale shopkeepers and labourers who are devoid of, or far from, the internet, since it widens their coverage area. In this great boom of web and app applications it will open doors for much employment, thereby not only increasing the rate of employment but also solving many problems we face today [24] - hitting two birds with a single stone. The results of this application have been evaluated and found to be efficient. Several surveys have been taken, both from customers and from personal viewpoints. A lot of research has been done on the advantages and disadvantages of applications working in the same domain so that this application has few or no disadvantages and can give customers a smoother, error-free experience. It solves a big problem for a person who is completely new to a place where they have to live for work or other purposes. The paper also discusses some challenges such as cost, fake profiles, security breaches and implementation issues. The future scope of this application mainly involves upgrading various faults of the existing application interface and adding new services for outdoor occasions such as birthdays, weddings and funerals, which require more manpower and offer more employability. A future upgrade of the app will connect different stores in one interface (for example, for birthday parties the required shops would be a cake shop, restaurants with food delivery, catering, decor shops, a ground-booking agency, etc.; these required shops, agencies or services must be connected and presented to the customers) [15]. The required stores and agencies would be bundled together and presented as a package to the customers at three price levels, from cheapest to most expensive, so that the service can reach a larger population as the money margin increases. Future upgrades of the app will add features like a personal in-app wallet, discounts and increasing commissions over time. Exciting and attractive deals will also be provided to attract more customers, such as "combo offers" and "one free service with one paid service". Thus, the paper has reviewed the definition and explanation of smart home services, their advantages, how the application will be helpful both for customers and for the connected clients, and how it can help in managing special occasions like birthdays, weddings, funerals, etc. [20].
References 1. Sessa, M.: Home smart home. Smart Soc. 68–84 (2019). https://doi.org/10.4324/978042920 1271-5 2. Cui, X.: Smart home: smart devices and the everyday experiences of the home. Materializing Digital Futures (2022). https://doi.org/10.5040/9781501361289.ch-013
3. Costa, L., Barros, J., Tavares, M.: Vulnerabilities in IOT devices for smart home environment. In: Proceedings of the 5th International Conference on Information Systems Security and Privacy (2019). https://doi.org/10.5220/0007583306150622 4. Govada, S.S., Rodgers, T., Cheng, L., Chung, H.: Smart environment for smart and sustainable Hong Kong. smart environment for smart cities, 57–90. https://doi.org/10.1007/978-98113-6822-6_2Application of embedded system in voice recognition control of Smart Home. (2017). Int. J. Recent Trends Eng. Res. 3(3), 112–119 (2019). https://doi.org/10.23883/ijrter. 2017.3053.alfmy 5. IOT based Smart Home Automation System. Int. J. Res. Eng. Appl. Manage. 6(3), 34–41 (2020). https://doi.org/10.35291/2454-9150.2020.0436 6. Guebli, W., Belkhir, A.: TV home-box based IOT for smart home. In: Proceedings of the Mediterranean Symposium on Smart City Application - SCAMS 2017 (2017). https://doi. org/10.1145/3175628.3175634 7. Harper, R.: Inside the smart home: ideas, possibilities and methods. In: Harper, R. (ed.) Inside the Smart Home, pp. 1–13. Springer-Verlag, London (2003). https://doi.org/10.1007/1-85233854-7_1 8. Marikyan, D., Papagiannidis, S., Alamanos, E.: Smart home sweet smart home. Int. J. E-Bus. Res. 17(2), 1–23 (2021). https://doi.org/10.4018/ijebr.2021040101 9. Leppënen, S., Jokinen, M.: Daily routines and means of communication in a smart home. In: Harper, R. (ed.) Inside the Smart Home, pp. 207–225. Springer-Verlag, London (2003). https://doi.org/10.1007/1-85233-854-7_11 10. Karimi, K., Krit, S.: Internet of Thing for smart home system using web services and android application. In: Elhoseny, M., Singh, A.K. (eds.) Smart Network Inspired Paradigm and Approaches in IoT Applications, pp. 191–202. Springer, Singapore (2019). https://doi.org/ 10.1007/978-981-13-8614-5_12 11. Front matter. Smart Home Technologies and Services for Geriatric Rehabilitation, iii (2022). https://doi.org/10.1016/b978-0-323-85173-2.00014-x 12. Willetts, M., Atkins, A.S., Stanier, C.: Big Data, Big Data Analytics application to Smart Home Technologies and services for Geriatric Rehabilitation. In: Smart Home Technologies and Services for Geriatric Rehabilitation, Academic Press, Cambridge, 205–230 (2022). https:// doi.org/10.1016/b978-0-323-85173-2.00001-1 13. Yan, H., Liu, J.: An empirical study on the behavioral intention to use mobile group purchase apps: taking meituan app as an example. In: 2019 16th International Conference on Service Systems and Service Management (ICSSSM) (2019). https://doi.org/10.1109/icsssm.2019. 8887607 14. Mrinal, M., Priyanka, L., Saniya, M., Poonam, K., Gavali, A.B.: Smart home — automation and security system based on sensing mechanism. In: 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT) (2017). https://doi.org/ 10.1109/icecct.2017.8117986 15. Jia, H.-H., Ren, H.-P., Bai, C., Li, J.: Hyper-chaos encryption application in intelligent home system. In: 2017 International Conference on Smart Technologies for Smart Nation (SmartTechCon) (2017). https://doi.org/10.1109/smarttechcon.2017.8358522 16. Luo, R.C., Lin, H.-C., Hsu, Y.-T.: CNN based reliable classification of household chores objects for service robotics applications. In: 2019 IEEE 17th International Conference on Industrial Informatics (INDIN) (2019). https://doi.org/10.1109/indin41052.2019.8972242 17. Widespread home testing can keep people safe and get them back to work — here’s how, Forefront Group (2020). 
https://doi.org/10.1377/forefront.20200406.55720 18. Jyani, N., Bansal, H.: UrbanClap: india’s largest home service provider. Asian J. Manage. Cases 097282012110189 (2021). https://doi.org/10.1177/09728201211018978
19. Al-Atwan, N.S., Nitulescu, M.: Monitoring and controlling home appliances by different Network Technologies. In: 2020 21th International Carpathian Control Conference (ICCC) (2020). https://doi.org/10.1109/iccc49264.2020.9257242 20. Electromagnetic compatibility. product family standard for audio, video, audio-visual and Entertainment Lighting Control Apparatus for professional use. (n.d.). https://doi.org/10. 3403/bsen55103 21. Szoniecky, S., Toumia, A.: Knowledge design in the Internet of Things: blockchain and connected refrigerator. In: Proceedings of the 4th International Conference on Internet of Things, Big Data and Security (2019). https://doi.org/10.5220/0007751703990407 22. Issa Ahmed, R., (n.d.): Wireless network system based multi-non-invasive sensors for Smart Home. https://doi.org/10.22215/etd/2013-07201 23. Specification for safety of household and similar electrical appliances. particular requirements. (n.d.). https://doi.org/10.3403/00018443u 24. Adler-Karlsson, G.: How to get rid of unemployment and transform work into play. Work – Quo Vadis? 101–117 (2019). https://doi.org/10.4324/9780429427985-6 25. Housejoy. https://www.housejoy.in/. Accessed 29 Oct 2022 26. Zimmber. https://www.crunchbase.com/organization/zimmber. Accessed 23 Oct 2022 27. Urban clap. https://www.urbancompany.com/. Accessed 23 Aug 2022 28. Helpr. https://www.helpr.in/terms. Accessed 23 Oct 2022 29. TaskBob. https://www.crunchbase.com/organization/taskbob. Accessed 3 Sep 2022 30. Timesaverz. https://www.crunchbase.com/organization/timesaverz-com. Accessed 2 Sep 2022 31. Doormint. https://doormint.in/. Accessed 23 Aug 2022 32. Sulekha. https://www.sulekha.com/. Accessed 23 Aug 2022 33. Thumbtack. https://www.thumbtack.com/more-services. Accessed 02 Oct 2022 34. Mr Right. https://www.mrright.in/. Accessed 26 Sep 2022
A Survey on Currency Recognition Method
Riti Mukherjee, Nirban Pal, and Riya Sil(B)
Department of Computer Science & Engineering, Adamas University, Kolkata 700126, India
[email protected]
Abstract. Currency is an important part of trading in our everyday life. Although humans can easily identify and recognize the currency used in everyday life, the problem of currency recognition arises when automated machines have to identify currencies for different tasks. The need for an efficient currency recognition system arises from the mainstream use of vending machines and currency counters and from several banking applications. With new technology, machines make tedious and monotonous jobs easy for us, and smart trading is impossible without an accurate and efficient currency recognition method. Currency recognition is also widely used in archaeology to identify and study ancient coins from different eras and places and to learn more about the periods they come from, and it offers excellent scope for an intelligent future. Over the years, a variety of currency recognition techniques have been developed. Most of them (mechanical and electromagnetic methods) depend on the physical parameters of the currency, while image processing methods depend on features such as shape, colour and edges. The latter includes several steps such as image acquisition, pre-processing, feature extraction and classification of the given currency. In this paper, the authors briefly discuss the various methods and techniques used for both coin and paper currency recognition. These methods use a variety of techniques and tools to increase the efficiency and accuracy of recognition. The method chosen for currency recognition should provide maximum accuracy for a variety of coins, but it should also be efficient in terms of cost, time, space, and more. By summarizing all the work on currency recognition to date, the authors compare the methods to look for a suitable currency recognition approach and identify the areas that need to be worked on to produce an even better currency recognition system.
Keywords: Coin Currency · Paper Currency · Neural Network · Deep Learning · Feature Extraction · Image Processing
1 Introduction

Due to the increasing need for currency recognition in vending machines, banks, supermarkets, etc., researchers have developed several currency recognition systems. There are three approaches to currency recognition: mechanical methods, electromagnetic methods, and image processing methods. Mechanical systems rely on physical parameters such as radius and thickness for recognition, whereas electromagnetic systems use material properties for coin recognition by passing the coins through an oscillating coil at a definite frequency.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 460–476, 2023. https://doi.org/10.1007/978-3-031-35510-3_44
But both the
above-mentioned systems have major drawbacks. The mechanical systems fail to differentiate two coins with similar physical properties and hence also cannot differentiate fake coins from real ones; for instance, they cannot successfully distinguish two different coins of similar radius and weight such as the 2-Euro and 1-Turkish-Lira coins [1]. The electromagnetic systems proved to be an improvement over the mechanical ones: the coins are passed across an oscillating coil at a definite frequency, which helps to distinguish the different materials the coins are made of, but they still have some problems with fake coins. The third approach, image processing-based systems, eliminates these problems. Image processing-based systems involve three major steps. First is image acquisition, where we capture an image of the currency using a camera or a scanner. Then comes image pre-processing, where we process the image using techniques such as segmentation and edge detection, which help in extracting various features. Pre-processing separates the actual coin from its background, removing clutter and giving a clear image with definite features, which makes classification of the coin easier. The last step is recognition, which is done using the features extracted in the previous step: the extracted features are matched against those from previously collected images of various coins in a database. Various techniques exist that use image processing-based classification, but each has its own advantages and drawbacks. The coin recognition system proposed by Fukumi et al. [2] used a rotation-invariant neural network that can identify coins rotated by any degree, but the process is quite time-consuming [3]. Other techniques based on edge detection [4] and surface color [5] were fairly accurate in identifying coins but posed problems such as noise sensitivity and increased computational cost. The situation was similar for paper currency recognition, but over time researchers produced improved methods and techniques that solved the drawbacks of earlier currency recognition systems. The main objective of this paper is to summarize the work on currency recognition to date. The paper is divided into the following sections: Sect. 2 discusses methods for coin currency recognition, Sect. 3 discusses methods for paper currency recognition, and Sect. 4 covers other works related to coin and paper currency recognition methods. The conclusion and future scope are explored in Sect. 5.
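As a rough illustration of this three-stage route (and not of any specific system surveyed below), the sketch assumes the coin photograph is already loaded as an RGB NumPy array; the crude segmentation rule and the reference feature table are assumptions made only for the example.

```python
import numpy as np

def extract_features(rgb: np.ndarray) -> np.ndarray:
    """Pre-process a coin image and extract simple shape/colour features."""
    gray = rgb.mean(axis=2)                  # acquisition result -> grayscale
    mask = gray < gray.mean()                # crude segmentation: assumes a dark coin on a light background
    area = mask.sum()                        # shape feature: coin area in pixels
    mean_rgb = rgb[mask].mean(axis=0)        # colour feature: average R, G, B over the coin
    # In practice the features would be normalised so that area does not dominate.
    return np.concatenate(([area], mean_rgb))

def classify(features: np.ndarray, references: dict[str, np.ndarray]) -> str:
    """Match the feature vector against stored reference vectors (nearest neighbour)."""
    return min(references, key=lambda label: np.linalg.norm(features - references[label]))
```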
2 Method for Coin Currency Recognition

Yasue Mitsukura et al. [6], in the year 2000, produced a coin recognition technique in which a small neural network is developed using simulated annealing and a genetic algorithm. A three-layer neural network is trained on the input signals, and the inputs are selected using the genetic algorithm combined with simulated annealing. The simulation results showed that this method is effective for finding a small set of input signals for the coin recognition system. Overall, this coin recognition system is low-cost, and the accuracy achieved using this method is 99.68%. L.J.P. van der Maaten et al. [7], in the year 2006, proposed a reliable coin classification system called "COIN-O-MATIC" for heterogeneous coin collections. This system uses photographs of the coins together with sensor information. The authors reported that the proposed system showed promising results on a test set
available for the MUSCLE CIS benchmark. The system also achieved a good classification ratio with computational efficiency; its accuracy is approximately 72%. The steps followed for classifying coins with this system are as follows:
(i) Segmentation – In this stage, the coin to be recognized is separated from its background. This is a two-stage approach. Fast segmentation is the first procedure, which involves three steps: thresholding, edge detection and the application of morphological operations. The bounding box of the coin is then checked; if it is not (approximately) square and large, a failed segmentation is detected. In case of failed segmentation, the second procedure is applied, which is mostly the same as the first except for thresholding: convolution with a box filter is used instead, which removes the conveyor-belt structure from the coin photographs and allows successful edge detection of dark coins. Overall, however, this segmentation procedure is computationally quite expensive.
(ii) Feature extraction – In this stage, efficient and specific features are extracted from the coin pictures. The system uses edge angle-distance distributions, which combine distance and angular information to obtain a good overall characterization of the distribution of a coin's edge pixels. The features obtained are used to train a classifier.
(iii) Pre-selection – Coin classes are pre-selected based on measurements of thickness and area, acquired respectively by a thickness sensor and by counting the number of pixels in the segmented coin.
(iv) Classification – A 3-nearest-neighbor classifier is used. Its advantage is that it usually performs well on problems with a high number of classes. The images of both sides of the coin (heads and tails) are evaluated and classified separately; if both images are assigned to the same class, the coin is classified accordingly.
(v) Verification – This stage is necessary because the coins in the test set differ from those in the training set. To get accurate results, a visual comparison checks whether both coin results carry the same labels.
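The classification-plus-agreement rule in steps (iv) and (v) can be sketched as below. This is only an illustration: a plain 1-nearest-neighbour rule stands in for the 3-NN classifier used by COIN-O-MATIC, and the edge angle-distance features are assumed to be precomputed vectors.

```python
import numpy as np

def nearest_label(x: np.ndarray, train_X: np.ndarray, train_y: list[str]) -> str:
    """Label of the training vector closest to x (1-NN, Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]

def classify_coin(heads_feat, tails_feat, train_X, train_y):
    """Classify heads and tails separately; accept only when both sides agree."""
    head_pred = nearest_label(heads_feat, train_X, train_y)
    tail_pred = nearest_label(tails_feat, train_X, train_y)
    if head_pred == tail_pred:
        return head_pred          # both sides agree -> accept the class
    return None                   # disagreement -> reject / send to verification
```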
Linlin Shen et al. [8], in the year 2009, developed an image-based coin classification system using Gabor wavelets. The Gabor wavelets are used to extract features representing local texture. The coin image is divided into several small sections using a concentric ring structure to achieve rotation invariance, and for whole-image representation the statistics of the Gabor coefficients in each section are concatenated into a feature vector. A nearest-neighbor classifier with Euclidean distance is used to match two coin images. The proposed method was compared with the Edge Angle Histogram Distribution (EAHD) and Edge Distance Histogram Distribution (EDHD) to test its performance: a study showed that EAHD achieved 24.73% and EDHD 53.09% accuracy, while the combination of both distributions achieved 30.68%. The use of Gabor wavelets increased the system's discrimination power, resulting in an increased accuracy of 74.27%.
Hussein R. Al-Zoubi et al. [9], in the year 2010, introduced a coin recognition system for Jordanian coins using a statistical approach. This classification method depends on the color and area of the coin to be recognized, and recognition rates of more than 97% can be achieved with it. The steps in this coin recognition approach are:
(i) Load the image of the coin under test.
(ii) Obtain a gray-level image from the colored image.
(iii) Draw a gray-level histogram from the gray-level image and calculate the threshold value, which is the value between the two peaks of the histogram.
(iv) Generate a black and white image using the threshold value: a pixel is given the value 0 (black) or 255 (white) depending on the value of the pixel in the gray image; if it is less than the threshold value, the pixel is set to 0, otherwise to 255.
(v) Clean the obtained image by opening and closing through erosion and dilation.
(vi) Count the number of white pixels to find the area of the coin.
(vii) Sum the individual color values for each pixel of the original colored image and divide by the total image area to calculate the average values of the RGB (red, green, blue) colours.
(viii) Compare the average RGB values and the area of the coin with the standard values of the coin categories; the coin is assigned to the category that produces the minimum error. The new values obtained from the image of the coin under test are also used to update the standard values.
Shatrughan Modi et al. [10], in the year 2011, produced an Automated Coin Recognition System using an Artificial Neural Network (ANN) to recognize Indian coins of denominations 1, 2, 5 and 10. The system considers images of both sides of the coin and hence can recognize a coin from either side. Hough Transformation, Pattern Averaging and several other techniques are used to extract features from the images, which are then fed as input to a trained neural network that classifies the given coin. The accuracy of this system is 97.74%. The stages of the entire process are as follows:
(i) Getting an RGB coin image – In this step, both sides of Indian coins were scanned using a color scanner. The scanned images were at 300 dots per inch (dpi).
(ii) Converting the RGB image to gray scale – The 24-bit RGB image obtained in the first step is transformed into an 8-bit gray-scale image.
(iii) Removing the shadow of the coin – Sobel edge detection is used to detect the edge of the coin, and then the Hough transform for circle detection is used to remove the shadow of the coin in the gray-scale image. The coin is then extracted from the background with the help of the center coordinates and radius.
(iv) Cropping and trimming the image – This is done to make the dimensions of the image equal to 100 × 100.
(v) Generating a pattern-averaged image – The 100 × 100 images are segmented and reduced to 20 × 20 by using segments of 5 × 5 pixels and then considering the
average of the pixel values in each segment. This is done to reduce the complexity and computation in the neural network.
(vi) Generating a feature vector and passing it to the trained neural network – A 400 × 1 feature vector is generated from the 20 × 20 image by putting all the pixel values into a single column vector. The trained neural network is then fed this feature vector of 400 features.
(vii) Classification of the coin – In this last step the neural network classifies the coin image into one of the classes, and based on the result the denomination of the coin is determined.
Rahele Allahverdi et al. [11], in the year 2012, introduced a coin classification system for Sassanian coins using the Discrete Cosine Transform (DCT). This method shows an accuracy of 86.2%. The steps followed for classifying coins with this system are as follows:
Pre-processing – In this step, the area of the coins is extracted from cluttered background images. A binary gradient mask is created using a Sobel operator and then it is dilated, holes are filled and small undesired objects are removed. Finally, the region of the coin is extracted by applying the binary mask to the image. (ii) Feature extraction – Discrete Cosine Transform (DCT) is used for feature extraction. DCT shows better accuracy over wavelet transform and Feature transform. (iii) Coin Classification – This is done by voting strategy where each binary classifier can vote for a class and the class with the highest vote number is chosen. Here, the classification is done by Support Vector Machine which is a supervised learning procedure. It gives favorable solutions to problems of multiclass. Adnan Khashman et al. [12] in the year 2006, developed a coin identification system for 2-Euro and 1-Turkish Lira Coins (which are physically similar) known as the Intelligent Coin Identification System. There are two phases in this system: the image processing phase and the training a back propagation neural network phase. This system successfully identifies rotated coins. The first phase reduces the data in the images which in turn reduces the time and computational costs. The second phase learns the patterns of the coins at rotation intervals of 90°. This accuracy of this system is 96.3%. The phases of the system are discussed as follows(i) Image Processing – The images of the coin go through mode conversion, cropping, thresholding, compression, trimming, and pattern averaging. In mode conversion, the RGB image of the coin of 352 × 288 pixels is converted into grayscale. Then the gray image is then cropped to an image of 250 × 250 pixels. Then it undergoes thresholding using a threshold value of 135 which converts it to black and white image. Then it is compressed to 125 × 125 pixels and trimmed to 100 × 100 pixels image. Finally, in pattern averaging the 100x100 pixel image is reduced to a 20 × 20 bitmap using 5 × 5 segments. This provides a faster identification system and makes it a rotation invariant system by solving the problem of varying pixel values within the segments which occurs because of rotation.
A Survey on Currency Recognition Method
465
(ii) Neural Network – This system uses a 3-layer back propagation neural network with 400 input neurons, 25 hidden neurons and 2 output neurons. The coins are classified by output neurons which use binary coding which are [0 1] for 1-TL coins and [1 0] for 2-Euro coins. The neurons in the output and hidden layers are activated using the sigmoid activation function. In this phase the neural network is trained using initial random weights between –0.6 and 0.6 and then tested.
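Pattern averaging, which both of the last two systems use to shrink a 100 × 100 image to a 20 × 20 bitmap, amounts to replacing each non-overlapping 5 × 5 block with its mean value. A minimal NumPy sketch is shown below; the random input is just a placeholder for a real preprocessed coin image.

```python
import numpy as np

def pattern_average(img: np.ndarray, block: int = 5) -> np.ndarray:
    """Reduce an image by averaging each non-overlapping block x block segment."""
    h, w = img.shape                       # expects e.g. a 100 x 100 grayscale array
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

# A 100 x 100 image becomes a 20 x 20 bitmap, i.e. a 400-element feature vector.
features = pattern_average(np.random.rand(100, 100)).reshape(-1)
```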
Fig. 1. Comparison of accuracy of different Coin Recognition Techniques.
3 Methods for Paper Currency Recognition

Masato Aoba et al. [13], in the year 2003, produced a paper currency recognition system for the Euro. The system uses a three-layer perceptron for classification and a Radial Basis Function (RBF) network for validation. The three-layer perceptron is applied for pattern recognition and is an extremely productive tool for classifying paper currency, while the RBF network can effectively estimate the probability distribution of the sample data, which is why it is able to reject invalid data. The accuracy of the proposed system is 100%. Ali Ahmadi et al. [14], in the year 2003, suggested a method for eliminating non-linear dependencies among variables and bringing out the main characteristics of the data. Using a self-organizing map (SOM), the data space is split into several parts, and PCA is then executed in every part. A learning vector quantization (LVQ) network works as the main classifier of the system. Using a simple linear model, the complexity of the data and the correlations among the variables are modelled. The accuracy of this method is up to 100%.
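The rejection idea behind the RBF validation step can be illustrated roughly as below. This is not Aoba et al.'s actual network, just a sketch in which each banknote class is summarised by a prototype vector and inputs whose best Gaussian response is too weak are rejected as invalid; sigma and the threshold are illustrative values.

```python
import numpy as np

def rbf_accept(x: np.ndarray, prototypes: dict[str, np.ndarray],
               sigma: float = 1.0, threshold: float = 0.5):
    """Return (label, accepted) for feature vector x against class prototypes."""
    best_label, best_resp = None, -1.0
    for label, proto in prototypes.items():
        # Gaussian radial basis response around this class prototype.
        resp = np.exp(-np.sum((x - proto) ** 2) / (2 * sigma ** 2))
        if resp > best_resp:
            best_label, best_resp = label, resp
    return best_label, best_resp >= threshold   # weak response -> likely invalid input
```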
D. A. K. S. Gunaratna et al. [15], in the year 2008, introduced a system, "SLCRec", with a special linear transformation function. It is used to cancel noise patterns in the background of the paper currency without affecting the characteristic images of the note, and it repairs the images of interest. The transformation maps the original gray-scale range into a smaller range of 0 to 125; then, by using edge detection, improved robustness to noise and proper characterization of the boundary can be achieved for both new and old, damaged notes. The system contains two components: an image processing component and a neural network component. First, the scanned currency notes are transformed into gray scale, meaning the file-formatted image is converted to pixel values. The original gray-scale pixel values then generate a new set of values as a linear combination of the prior values. Edge detection is then carried out to bring out the identity of the image, after which the information from the detected edges is extracted and arranged in the layout required by the neural network. The neural network component consists of four classes, and the network is trained with notes representing different operational conditions for these four classes. When notes with little to no difference are presented for classification, the neural network is expected to give the correct results. The accuracy of this system is 100%. Kalyan Kumar Debnath et al. [16], in the year 2010, proposed a currency recognition system for Bangladeshi currency (Taka) using an ensemble neural network (ENN). The ENN is a classifier trained using Negative Correlation Learning (NCL). The images of the notes are transformed into gray scale and compressed accordingly, and each pixel of the compressed image is fed to the network as an input. This ensemble network decreases the chance of wrong classification compared to an ensemble network with independent training and to a single network, and it is efficient at recognizing highly noisy or old images of paper currency. The accuracy of this method is 100%. Junfeng Guo et al. [17], in the year 2010, extract features from paper notes using the block-LBP algorithm, an improved variant of the traditional local binary pattern (LBP) method. The suggested recognition system runs in two phases. The first is the model-creating phase, which covers how to prepare models for the paper currencies and extract features using the block-LBP algorithm; the feature vectors of the model images are stored as the output of this phase and used in the second phase. The second is the verification phase, where the similarity between the sample image and the model images is calculated to match the models. The block-LBP algorithm achieves a high recognition speed with excellent classification accuracy compared with the traditional LBP algorithm. The accuracy of this method is 100%. Chetan B. V. et al. [18], in the year 2012, produced a method of side-invariant paper currency identification. It has two phases: the first is to find the database notes of the same dimensions as the input, and the second is template matching, which is performed by correlating the edges of the input note with those of the matching-dimension database note images. The accuracy of the proposed system is 99.5%.
The different steps in the entire process are as follows (a brief illustrative sketch is given after the list):
(i) Image Acquisition and Segmentation – A digital camera is used for capturing the image. Image segmentation then proceeds in three steps: first, the edges of the image are detected using the Sobel operator; second, noisy edges are filtered out; finally, the boundary coordinates are noted, and the actual currency note is segmented from the input image using these boundary coordinates.
(ii) Dimension Matching – The numbers of pixels row-wise and column-wise give the dimensions of the currency note, which are matched against the dimensions of every database note; the matching database notes are recorded.
(iii) Template Matching – After dimension matching, the database notes and the input note are correlated. Template matching consists of edge detection, template matching by rearrangement of the database note, and comparison against a threshold.
(iv) Decision Making – The database note that yields the maximum matching score is taken as the final match, and the input note is thus recognized.
Ebtesam Althafiri et al. [19] in 2012 proposed an image-based paper currency recognition system for Bahraini paper notes. The system is based on two classifiers: a neural network and the weighted Euclidean distance with suitable weights. The paper currency is first scanned to produce a colour image of approximately 600 dpi. The image is then pre-processed to obtain four different images: the binary image, the gray-scale image using the Canny mask, the gray-scale image using the Prewitt mask and the gray-scale image using the Sobel mask. The sum of the pixels and the Euler number of each of these images are calculated to extract the features. After converting the input image to gray scale, the correlation coefficient of the original image is computed.
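The segmentation and template-matching steps listed above can be sketched compactly. The following is a hedged illustration only, using OpenCV; the file handling, thresholds and tolerances are assumptions and not the implementation of the surveyed papers.

```python
# Sketch of the acquisition/segmentation/dimension/template-matching pipeline
# described above, using OpenCV. Thresholds and tolerances are illustrative
# assumptions, not values from the surveyed papers.
import cv2
import numpy as np

def segment_note(image_bgr):
    """Detect edges with the Sobel operator and crop the note by its boundary box."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(sx, sy))
    edges = cv2.medianBlur(edges, 5)                       # filter out noisy edges
    _, binary = cv2.threshold(edges, 40, 255, cv2.THRESH_BINARY)
    ys, xs = np.nonzero(binary)                            # boundary coordinates
    return gray[ys.min():ys.max(), xs.min():xs.max()]

def match_note(candidate, database):
    """Dimension matching followed by normalized cross-correlation template matching."""
    best_label, best_score = None, -1.0
    for label, template in database.items():
        if abs(template.shape[0] - candidate.shape[0]) > 20:   # crude dimension check
            continue
        resized = cv2.resize(candidate, template.shape[::-1])
        score = cv2.matchTemplate(resized, template, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_label, best_score = label, score
    # decision making: keep the maximum matching score above a threshold
    return (best_label, best_score) if best_score > 0.5 else (None, best_score)
```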
Fig. 2. Comparison of accuracy of different Paper Currency Recognition Techniques
Table 1. Overview of methods of Coin and Paper Currency Recognition

Sl. no. | Paper name | Author | Technique used | Features | Accuracy | Type of currency
1 | Design and Evaluation of Neural Networks for Coin Recognition by Using GA and SA | Yasue Mitsukura et al. [6] | Neural network (NN) using simulated annealing (SA) and genetic algorithm (GA) | Low-cost; the neural network developed using GA and SA is small-sized | 99.68% | Coin currency
2 | Coin-O-Matic: A fast system for reliable coin classification | L. J. P. van der Maaten et al. [7] | COIN-O-MATIC | For feature extraction, the edge angle-distance distribution is used; for classification, the nearest-neighbor approach is used; used to classify heterogeneous coin collections | 72% | Coin currency
3 | Statistics of Gabor features for coin recognition | Linlin Shen et al. [8] | Gabor wavelet | Better refinement power; robust to illumination variance and noise; Gabor features are used for classification | 74.27% | Coin currency
4 | Efficient coin recognition using a statistical approach | Hussein Al-Zoubi et al. [9] | Statistical approach | Area and color of the coin are the key features used in this system | 97% | Coin currency
5 | Automated Coin Recognition System using ANN | Shatrughan Modi et al. [10] | Artificial neural network | Can recognize coins from both sides | 97.74% | Coin currency
6 | Sasanian Coins Classification Using Discrete Cosine Transform | Rahele Allahverdi et al. [11] | Discrete cosine transform | DCT is used for feature extraction, which gives better accuracy than the wavelet and Fourier transforms; a support vector machine is used for classification | 86.2% | Coin currency
7 | Intelligent Coin Identification System | Adnan Khashman et al. [12] | 3-layer back-propagation neural network | Can identify rotated coins; can distinguish between coins of similar physical features | 96.3% | Coin currency
8 | Euro banknote recognition system using a three-layer perceptron and RBF networks | Masato Aoba et al. [13] | 3-layer perceptron and radial basis function (RBF) networks | A three-layer perceptron is applied for pattern recognition; the RBF network can effectively estimate a probability distribution from sample data | 100% | Paper currency
9 | A Reliable Method for Recognition of Paper Currency by Approach to Local PCA | Ali Ahmadi et al. [14] | Local principal component analysis (PCA) | Eliminates non-linear dependencies within variables and brings out the main leading characteristics of the data; a learning vector quantization (LVQ) network works as the main classifier | 100% | Paper currency
10 | ANN Based Currency Recognition System using Compressed Gray Scale and Application for Sri Lankan Currency Notes – SLCRec | D. A. K. S. Gunaratna et al. [15] | Edge detection, artificial neural network | Cancels noise patterns on the background of the paper currency without affecting characteristic images; repairs images of interest | 100% | Paper currency
11 | A Paper Currency Recognition System Using Negatively Correlated Neural Network Ensemble | Kalyan Kumar Debnath et al. [16] | Negatively correlated neural network ensemble | Decreases the chances of wrong classification compared to an ensemble with independent training and a single network; recognizes highly noisy or old images of paper currency | 100% | Paper currency
12 | A reliable method for paper currency recognition based on LBP | Junfeng Guo et al. [17] | Block-LBP algorithm | Extracts features from paper notes using the block-LBP algorithm | 100% | Paper currency
13 | A Robust Side Invariant Technique of Indian Paper Currency Recognition | Chetan B. V. et al. [18] | Side invariance technique | Recognizes database notes of the same dimension and matches models by correlating the edges of the input and matching-dimension database note images | 99.5% | Paper currency
14 | Bahraini Paper Currency Recognition | Ebtesam Althafiri et al. [19] | Weighted Euclidean distance; neural network using feed-forward back propagation | Based on two classifiers: the neural network and the weighted Euclidean distance using suitable weights; the final classification is done using the feed-forward back-propagation neural network | 96.4% (WED), 85.1% (NN) | Paper currency
Finally, the classification is done using both the feed-forward back-propagation neural network and the weighted Euclidean distance (WED) method; the former gives 85.1% accuracy in the best case, while the latter shows 96.4% accuracy.
4 Literature Survey A lot of researchers have worked on various currency recognition techniques. However, as their techniques vary, so does the efficiency of the proposed systems/techniques and the accuracy of the currency recognition. Apart from the prominent techniques discussed above there are a few more works that have contributed or opened various pathways to improve the existing currency recognition techniques. They are briefly discussed below – (i)
Coin Recognition using Image Abstraction and Spiral Decomposition – Abdolah Chalechale [20], in the year 2007, presented a coin recognition system that uses image abstraction and spiral decomposition. At first, an abstract image is obtained from the original coin image by considering the strong edges of the coin, which is then used for the feature extraction process. The spiral decomposition method is used for feature extraction, and its key concept revolves around the spiral distribution of pixels in the abstract image. This allows the system to find the similarity between full-colour, multi-component coin images. The major advantage of this system is that image segmentation, which is quite cost-intensive, is not required. The author also compared the results of this system with the Polar Fourier Descriptor (PFD), QVE and Edge Histogram Distribution (EHD) and concluded that the proposed system performs better than the others.
(ii) Discovering Characteristic Landmarks on Ancient Coins Using Convolutional Networks – Jongpil Kim and Vladimir Pavlovic [21], in the year 2016, used deep convolutional neural networks (CNNs) combined with domain hierarchies of master designs to find characteristic landmarks for recognizing ancient Roman Imperial coins. At first, the Roman coin is recognized by exploiting the hierarchical knowledge structure embedded in the coin domain, which is then combined with CNN-based classifiers. Then, an optimization problem is formulated to find the salient coin regions of a specific class. Analysis confirms that these salient regions are consistent with human expert annotations. In practice, the system can successfully identify ancient Roman coins and their landmarks in a general fine-grained classification problem.
(iii) Coin Detection and Recognition Using Neural Networks – S. Mohamed Mansoor Roomi et al. [22], in the year 2015, proposed a coin recognition system for Indian coins. The main goal of this system was to recognize whether a given object was a coin or not and, if it was a coin, to find its denomination. A camera fitted inside the coin drop box was used to obtain an image of the head of the coin at constant illumination and fixed distance, with rotation invariance. A reference database was maintained with images of coins of different denominations. The image then goes through segmentation, edge detection and the Hough transform
to separate the coin from its background, detect its boundaries and obtain the circular structure of the coin. Whether the object is a coin or not is decided using the polar transform. The features are then extracted using Fourier coefficients, and the feature vectors are fed to the neural network, which, with the help of the reference database, determines the denomination of the coin. The accuracy of this method is 82%.
(iv) Feature Extraction of Currency Notes: An Approach Based on Wavelet Transform – Amir Rajaei et al. [23], in the year 2012, produced a method for paper currency recognition which extracts texture features from the note images. To extract those features, the Discrete Wavelet Transform (DWT), particularly Daubechies 1 (DB1), is used, and the approximation coefficient matrix of the transformed image is noted. From this approximation coefficient matrix, a set of coefficient statistical moments is obtained. A feature vector stores all the extracted features, which can then be used to recognize, classify and retrieve the currency notes (a small sketch of this step follows below).
(v) A Paper Currency Number Recognition Based on Fast Adaboost Training Algorithm – Hai-dong Wang et al. [24], in the year 2011, introduced a fast Adaboost weak-classifier training algorithm as a method for number recognition on paper currency. The algorithm sorts the Eigen values into an array in ascending order and then traverses the sorted array to search for the best threshold and bias, so the speed of training can be increased. The accuracy of this paper currency recognition system is 97%.
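The wavelet feature extraction outlined in item (iv) can be written very compactly. The sketch below is a hedged illustration with PyWavelets; the function name and the particular statistical moments are assumptions, not the authors' exact feature set.

```python
# Rough sketch of the DWT (Daubechies-1) texture features described in item (iv):
# keep the approximation coefficients and summarize them with a few moments.
import numpy as np
import pywt

def dwt_texture_features(gray_image):
    # Single-level 2-D DWT with the db1 wavelet; detail bands are discarded here.
    cA, _details = pywt.dwt2(gray_image.astype(float), "db1")
    cA = cA.ravel()
    centered = cA - cA.mean()
    # Statistical moments of the approximation coefficients (choice of moments is an assumption)
    return np.array([cA.mean(), cA.std(), (centered ** 3).mean(), (centered ** 4).mean()])
```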
5 Conclusion and Future Scope In this paper we have discussed the methodologies and techniques of coin and paper currency recognition systems based on neural networks, image processing and various other techniques. All the methods of recognizing paper and coin currency are described briefly so that they can be understood easily. We summarize the techniques and features of these methods in tabular form, which helps the reader grasp the whole body of work at a glance. The table provides the paper name, the authors, the technique used and the features of every method together with its accuracy, which helps in drawing a comparison to determine the best currency recognition method in terms of accuracy and optimality. Overall, this paper summarizes the works in a way that is easy to understand and gives a brief overview of currency recognition. Although a lot of research has already been done on both coin and paper currency recognition, there are still drawbacks related to accuracy and efficiency in recognizing coin currency. On the other hand, the research done so far on paper currency recognition systems reports accuracies of up to 100 percent, but efficiency remains an issue. It is an arduous task for researchers to obtain maximum efficiency with 100 percent accuracy for mixed currencies, and this becomes even more challenging when the physical condition of a currency note is poor. Therefore, a lot of work remains to be done, and some untouched areas need to be explored to obtain a perfect currency recognition system.
References 1. Online Resource: https://www.dw.com/en/currency-confusion-helps-smokers/a-1477652 accessed on 11.11.22 2. Fukumi, M., Omatu, S., Takeda, F., Kosaka, T.: Rotation-invariant neural pattern recognition system with application to coin recognition. IEEE Trans. Neural Netw. 3(2), 272–279 (1992) 3. Fukumi, M., Omatu, S., Nishikawa, Y.: Rotation-invariant neural pattern recognition system estimating a rotation angle. IEEE Trans. Neural Netw. 8(3), 568–581 (1997) 4. Nölle, M., Penz, H., Rubik, M., Mayer, K., Holländer, I., Granec, R.: Dagobert-a new coin recognition and sorting system. In: Proceedings of the 7th Internation Conference on Digital Image Computing-Techniques and Applications (DICTA’03), Syndney, Australia (2003) 5. Adameck, M., Hossfeld, M., Eich, M.: Three color selective stereo gradient method for fast topography recognition of metallic surfaces. In: Machine Vision Applications in Industrial Inspection XI (Vol. 5011, pp. 128–139). International Society for Optics and Photonics (2003) 6. Mitsukura, Y., Fukumi, M., Akamatsu, N.: Design and evaluation of Neural Networks for coin recognition by using Ga and sa. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium (2000) 7. Laurens, J.P., van der M., Paul J. Boon.: Coin-omatic: A fast system for reliable coin classification Proceedings of the MUSCLE CIS Coin Computation Workshop, Sep. 1, 2006, Germany, pp. 7–17 8. Shen, L., Jia, S., Ji, Z., Chen, W.-S.: Statictics of Gabor features for Coin Recognition. In: 2009 IEEE International Workshop on Imaging Systems and Techniques (2009) 9. Al-Zoubi, H.R.: Efficient coin recognition using a statistical approach. In: 2010 IEEE International Conference on Electro/Information Technology (2010) 10. Modi, S., Bawa, S.: Automated coin recognition system using ann. Int. J. Comput. Appl. 26(4), 13–18 (2011) 11. Allahverdi, R., Bastanfard, A., Akbarzadeh, D.: Sasanian coins classification using discrete cosine transform. In: The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) (2012) 12. Khashman, A., Sekeroglu, B., Dimililer, K.: Intelligent Coin Identification System. In: IEEE International Symposium on Intelligent Control (2006) 13. Aoba, M., Kikuchi, T., Takefuji, Y.: Euro banknote recognition system using a three-layered perceptron and RBF networks. IPSJ Trans. Math. Model. Appl 44, 99–109 (2003) 14. Ahmadi, A., Omatu, S., Kosaka, T.: A reliable method for recognition of paper currency by approach to local PCA. In: Proceedings of the International Joint Conference on Neural Networks (2003) 15. Gunaratna, D.A.K.S., Kodikara, N.D., Premaratne, H.L.: ANN based currency recognition system using compressed gray scale and application for Sri Lankan currency notes-SLCRec. Proc. World Acad. Eng. Technol. 35, 235–240 (2008) 16. Debnath, K.K., Ahmed, S.U., Shahjahan, M., Murase, K.: A paper currency recognition system using negatively correlated neural network ensemble. J. Multimed., 5(6) (2010) 17. Guo, J., Zhao, Y., Cai, A.: A reliable method for paper currency recognition based on LBP. In: 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content (2010) 18. Chetan, B.V., Vijaya, P.A.: A robust side invariant technique of Indian paper currency recognition. Int. J. Eng. Res. Technol. 1(3), 1–7 (2012) 19. Althafiri, E., Sarfraz, M., Alfarras, M.: Bahraini paper currency recognition. J. Adv. Comput. Sci. 
Technol. Res. 2(2), 104–115 (2012)
20. Chalechale, A.: Coin recognition using image abstraction and spiral decomposition. In: 2007 9th International Symposium on Signal Processing and Its Applications (2007) 21. Kim, J., Pavlovic, V.: Discovering characteristic landmarks on ancient coins using convolutional networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR) (2016) 22. Roomi, S.M., Rajee, R.B.: Coin Detection and recognition using neural networks. In: 2015 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2015] (2015) 23. Rajaei, A., Dallalzadeh, E., Imran, M.: Feature extraction of currency notes: An approach based on wavelet transform. In: 2012 Second International Conference on Advanced Computing & Communication Technologies (2012) 24. Wang, H.-dong, Gu, L., Du, L.: A paper currency number recognition based on fast Adaboost training algorithm. In: s2011 International Conference on Multimedia Technology (2011)
Cryptocurrencies: An Epitome of Technological Populism

Senthil Kumar Arumugam1, Chavan Rajkumar Dhaku2(B), and Biju Toms2
1 Department of Professional Studies, CHRIST (Deemed to be University), Bangalore,
Karnataka, India [email protected] 2 School of Commerce, Finance and Accountancy, CHRIST (Deemed to be University), Bangalore, Karnataka, India [email protected], [email protected]
Abstract. From a global perspective, this study discusses the volatility and spillover effect between the whales' cryptocurrencies, i.e. the currencies in which whales hold significant positions. Volatility in cryptocurrency markets has always been a time-varying concept that changes over time, and compared with the stock market, both historically and recently, the cryptocurrency market is much more volatile. The markets have evidenced many fluctuations in the prices of cryptos. As a result, countries are transforming their policies to suit financial technologies in their economic practices. Blockchain technology allows people to obtain more benefits in a financial transaction and breaks hurdles in the financial system. The study has found no ARCH effect in BinanceCoin, BT Cash, Bitcoin, Vechain, and Zcash, while an ARCH effect is discovered in the case of Ethereum, Tether, Tezos, and XRP; the whale cryptocurrencies therefore show an ARCH effect. Daily closing prices of ten cryptocurrencies, including bitcoin, from January 1, 2019, to December 31, 2020, are used to determine the price volatility of the currencies in which the bitcoin whales hold significant cryptocurrency values. The volatility results are significant, and we also found that Bitcoin, the largest cryptocurrency in the sample taken for the study, has less volatility than the other currencies. Keywords: Bitcoin · Blockchain · Cryptocurrency · Volatility · Populist
1 Introduction Many transactions are not transparent in the present financial system since they do business through a centralized process. It isn’t easy to see the transactions from origin to end when many persons are involved. Moreover, financial institutions do not share some information among themselves and with their stakeholders. This institution enjoys monopoly power, leading to people’s anxieties, disappointments, and unhappiness, which exploit suspicion in the so-called establishment [1, 2]. The disruptive thought branded as “populism” provides a conceptual approach and works against elites involved in corruption. Since 2009, the blockchain technology (BCT) centered digital © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 477–486, 2023. https://doi.org/10.1007/978-3-031-35510-3_45
asset called "Cryptocurrencies" has disrupted the existing financial structure worldwide [3]. As a result, countries are transforming their policies to suit financial technologies in their economic practices. This study compares the link between the practicalities of economic populism (Inferno, 2018) and those of BCT-based cryptocurrencies and puts forward the model of technological populism [2, 4, 5]. BCT allows people to obtain more benefits in a financial transaction and breaks hurdles in the financial system [6]. BCT assures decentralization and transparency through distributed ledger processes, tracing and tracking of transactions, trustworthiness, a consensus protocol among the parties involved, smart contracts between parties, and security in the trade. 'Technological populism' refers to innovation through technology that promises social and environmental benefits, resolves complex socio-economic problems, gives power to the defenseless, and prevents misuse of the system by elites. An individual who enjoys these benefits using technology and creates wealth is known as a 'technological populist' [7]. In the cryptocurrency market, the persons who hold a large amount of the total supply of bitcoin or other cryptocurrencies are called 'whales.' These persons manipulate the prices of cryptocurrencies, which again imitates the nature of the 'elite' [8]. Hence, the researchers questioned the price fluctuation between one currency and another, considering where the bitcoin whales keep their money: do technological populists benefit from the effect of cryptocurrency whales? The study helps to understand the foundation of populism in technological innovation in blockchain and its impact on cryptocurrencies.
2 Related Work This article compares political populism to blockchain and proposes a theory of technological populism [1]. Technological populism, as reflected by blockchain platforms, exploits the rhetoric of empowering the disenfranchised through a decentralized decision-making process, enabling anonymity of transactions, dehumanizing trust (promoting trust in computation rather than humans and institutions), and breaking the financial system and money supply monopoly [1]. Few studies focus on populism and technology and how it influences economic growth. Similar studies studied the [1] causal link between populism, which defines and measures populist rule throughout the executive and legislative branches, and TFP. Populism strongly harms TFP, according to the study. [2] examines how much one or more cryptocurrencies would need to be adopted and establish a network effect before deploying such a funding scheme.
3 Objectives of the Study • To compare the link between the practicalities of financial populism and those of BCT-based cryptocurrencies and put forward the model of technological populism. • Determining the price volatility of particular cryptocurrencies, whales have made considerable investments and examined the implications of currency spillover.
4 Research Methodology

The study examined the daily closing prices of ten cryptocurrencies, including bitcoin, from January 01, 2019, to December 31, 2020. The total duration of the study is two years, and the historical data of the cryptocurrencies are extracted and used for analysis. These ten cryptocurrencies represent the whales' most significant investments in the cryptocurrency exchange industry. Our dataset comprises the daily returns of the crypto whales for Zcash, XRP, Vechain, Tezos, Tether, Ethereum, Bitcoin, Bitcoin Cash, Binance Coin and Aragon over this period, so the sample contains 751 observations for each time series. Two years of study have been taken because Covid-19 disrupted the capital and cryptocurrency markets, which helps us to understand the real impact on the other cryptocurrencies in the pre- and post-Covid period. All prices are listed in US dollars, and the data can be sourced online at www.investing.com. The daily closing price returns of the cryptocurrencies are defined as

Y_{i,t} = ln(P_{i,t}) − ln(P_{i,t−1}),

where P_{i,t} is the price of cryptocurrency i on day t. The study first reports the descriptive statistics of the cryptocurrency whales. We then probe the short-run and long-run volatility captured through the conditional covariance matrix among the whales' cryptocurrencies. Accordingly, we use the Dynamic Conditional Correlation (DCC) model, a simple class of multivariate GARCH model. The DCC-GARCH model is applied to capture the degree of volatility correlation change, or spillover, between two or more variables; market integration is depicted by the conditional correlation in movement, which is time-varying. DCC was propounded by Engle (2002) to capture the dynamic correlation of returns. Further, we use volatility as the dependent variable and the other cryptocurrencies as independent variables. The objective is to know whether the whales' cryptocurrencies carry any impact on the other cryptocurrencies. Cryptocurrency volatility is measured using ARCH(1,0,1) and GARCH(1,0,1) statistics (Table 1).

4.1 ARCH Model

Engle (1982) proposed the ARCH (Auto-Regressive Conditional Heteroscedastic) model:

e_t^2 = a_0 + a_1 ε_{t−1}^2 + … + a_q ε_{t−q}^2 = a_0 + Σ_{i=1}^{q} a_i ε_{t−i}^2
The ARCH model is used to check the long-term effect parameter: the ARCH coefficient shows the long-term effect, and the GARCH coefficient shows the short-term impact.

4.2 GARCH Model

The following GARCH model is applied to check the longevity and intensity of the volatility effect:

σ_t^2 = ω + b_1 σ_{t−1}^2 + e_t    (1)
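As a hedged illustration of the return construction and a univariate GARCH(1,1) fit of this kind: the paper reports E-Views and R output, so the Python `arch` package, the file name and the column name below are only stand-in assumptions, not the authors' setup.

```python
# Illustrative sketch: log returns and an AR(1)-GARCH(1,1) fit, approximating the
# ARMA(1,1)+sGARCH(1,1) specification reported later in the paper.
import numpy as np
import pandas as pd
from arch import arch_model

prices = pd.read_csv("ethereum_daily.csv", parse_dates=["Date"], index_col="Date")
returns = np.log(prices["Close"]).diff().dropna()      # Y_t = ln(P_t) - ln(P_{t-1})

# Returns are scaled by 100 to help the optimizer; the arch package has no MA term,
# so an AR(1) mean is used here instead of the full ARMA(1,1) mean.
model = arch_model(100 * returns, mean="AR", lags=1, vol="GARCH", p=1, q=1)
result = model.fit(disp="off")
print(result.summary())                                 # omega, alpha[1], beta[1]
```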
5 Analysis and Findings

5.1 Correlation Matrix

Table 1. Shows the hypothesis statement

Ho | There is no ARCH effect among the cryptocurrencies
H1 | There is an ARCH effect among the cryptocurrencies
Ho | Whales' cryptocurrencies do not impact the cryptocurrency market
H1 | Whales' cryptocurrencies impact the cryptocurrency market
Source: Author compilation

Table 2 (a). Summary Statistics for Cryptocurrencies

Statistic | ZCASH | XRP | VECHAIN | TEZOS | TETHER
Mean | 0.0001 | −0.0006 | 0.0020 | 0.0020 | −7.91E-06
Median | −0.0007 | 9.61E-05 | 0.0003 | −0.0012 | 0
Maximum | 0.2208 | 0.3397 | 0.2744 | 0.2638 | 0.0197
Minimum | −0.4456 | −0.5410 | −0.6685 | −0.6144 | −0.0151
Std. Dev. | 0.0523 | 0.0519 | 0.0623 | 0.0598 | 0.0022
Skewness | −0.9532 | −1.2928 | −1.4619 | −1.1100 | 0.7348
Kurtosis | 12.5550 | 30.2186 | 22.0194 | 19.2646 | 21.9114
Jarque–Bera | 2891.51 | 22768.81 | 11278.36 | 8207.492 | 10959.02
P-Value | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Source: E-Views output
Table 2 (b). Summary Statistics for Cryptocurrencies

Statistic | BINANCE COIN | ETHEREUM | BITCOIN | BITCOIN CASH | ARAGON
Mean | 0.0025 | 0.0022 | 0.0027 | 0.0010 | 0.0028
Median | 0.0021 | 0.0017 | 0.0018 | −8.77E-05 | 0
Maximum | 0.1792 | 0.2165 | 0.1589 | 0.3650 | 0.6531
Minimum | −0.5811 | −0.5896 | −0.4972 | −0.5977 | −0.8463
Std. Dev. | 0.0488 | 0.0489 | 0.0390 | 0.0556 | 0.0812
Skewness | −2.3962 | −2.4462 | −2.6518 | −1.1907 | −0.7578
Kurtosis | 31.7421 | 33.2493 | 40.9399 | 25.5457 | 30.9011
Jarque–Bera | 25861.46 | 28599.1 | 44699.66 | 15655.07 | 23781.06
P-Value | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Source: E-Views output
Table 3 (a). Correlation Matrix

 | Zcash | XRP | Vechain | Tezos | Tether
Zcash | 1.000000 | 0.687140 | 0.623126 | 0.585688 | −0.053202
XRP | 0.687140 | 1.000000 | 0.599460 | 0.608291 | −0.044255
Vechain | 0.623126 | 0.599460 | 1.000000 | 0.608553 | −0.074585
Tezos | 0.585688 | 0.608291 | 0.608553 | 1.000000 | −0.016528
Tether | −0.053202 | −0.044255 | −0.074585 | −0.016528 | 1.000000
Ethereum | 0.766581 | 0.725211 | 0.715926 | 0.686801 | −0.063697
Bitcoin | 0.697560 | 0.622200 | 0.624164 | 0.618130 | −0.089667
Bitcoin cash | 0.734414 | 0.705798 | 0.648158 | 0.596287 | −0.070744
Binance coin | 0.680773 | 0.581284 | 0.633957 | 0.607184 | −0.079766
Aragon | 0.354068 | 0.330532 | 0.373005 | 0.336371 | −0.089472
Source: E-Views output

Table 3 (b). Correlation Matrix

 | Ethereum | Bitcoin | Bitcoin cash | Binance coin | Aragon
Zcash | 0.766581 | 0.697560 | 0.734414 | 0.680773 | 0.354068
XRP | 0.725211 | 0.622200 | 0.705798 | 0.581284 | 0.330532
Vechain | 0.715926 | 0.624164 | 0.648158 | 0.633957 | 0.373005
Tezos | 0.686801 | 0.618130 | 0.596287 | 0.607184 | 0.336371
Tether | −0.063697 | −0.089667 | −0.070744 | −0.079766 | −0.089472
Ethereum | 1.000000 | 0.850213 | 0.834045 | 0.769364 | 0.411805
Bitcoin | 0.850213 | 1.000000 | 0.813741 | 0.711878 | 0.387121
Bitcoin cash | 0.834045 | 0.813741 | 1.000000 | 0.689941 | 0.343987
Binance coin | 0.769364 | 0.711878 | 0.689941 | 1.000000 | 0.397986
Aragon | 0.411805 | 0.387121 | 0.343987 | 0.397986 | 1.000000
Source: E-Views output
One of the fundamental analyses of the cryptocurrency whales' coin returns is to test the correlation among the variables. We perform the Pearson correlation to check the relationship between the variables. Tables 3(a) and 3(b) show that all the pairs of cryptocurrencies have a strong correlation at a 5% significance level, which indicates that the coin returns have a strong linear dependence. We found that Tether has a negative correlation with the other coins' returns. A positive correlation suggests that a positive change in one coin's return may lead to a positive change in the other coin's return. Volatility clustering is present in the log returns of the Zcash, Vechain, Tezos, Ethereum, Bitcoin, Bitcoin Cash and Binance Coin currencies. To check for serial correlation in the squared returns, we examine whether the partial autocorrelation dies down quickly: the serial autocorrelation of the squared return is examined for each currency at the level, and the first difference is analysed if serial autocorrelation is not present at the level. Based on these results, the ARCH and GARCH models are discussed next.

Table 4. Showing the ARCH effect

Variable | Chi-Squared value | P-Value | Hypothesis Result
rbinancecoin | 15.045 | 0.2390 | There is no ARCH effect
rbtcash | 14.897 | 0.2471 | There is no ARCH effect
rbitcoin | 10.794 | 0.5467 | There is no ARCH effect
rethereum | 21.155 | 0.0481** | There is an ARCH effect
rtether | 301.71 | 0.0000** | There is an ARCH effect
rtezos | 27.638 | 0.0062** | There is an ARCH effect
rvechain | 15.929 | 0.1945 | There is no ARCH effect
rzcash | 17.111 | 0.1455 | There is no ARCH effect
rxrp | 40.914 | 0.0000** | There is an ARCH effect
The symbols *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. Source: E-Views output

Table 5. Showing the ARCH and GARCH effect
rethereum (AIC = −3.3005, HQ = −3.2859)
Parameter | Estimate | Std. Error | t-value | P-value
mu | 0.0035 | 0.0015 | 2.3633 | 0.0181
ar(1) | −0.6214 | 0.1824 | −3.4070 | 0.0006
ma(1) | 0.5186 | 0.2004 | 2.5874 | 0.0096
omega | 0.0002 | 0.0000 | 2.6963 | 0.0070
alpha(1) | 0.1374 | 0.0357 | 3.8389 | 0.0001
beta(1) | 0.7901 | 0.0534 | 14.7864 | 0.0000

rtether (AIC = −10.301, HQ = −10.287)
Parameter | Estimate | Std. Error | t-value | P-value
mu | 0.0000 | 0.0000 | 0.1274 | 0.8985
ar(1) | 0.3643 | 0.1285 | 2.8347 | 0.0045
ma(1) | −0.7009 | 0.1149 | −6.0996 | 0.0000
omega | 0.0000 | 0.0000 | 0.0150 | 0.9879
alpha(1) | 0.0626 | 0.0094 | 6.6278 | 0.0000
beta(1) | 0.9317 | 0.0087 | 106.5342 | 0.0000

rtezos (AIC = −2.8859, HQ = −2.8714)
Parameter | Estimate | Std. Error | t-value | P-value
mu | 0.0013 | 0.0018 | 0.7413 | 0.4584
ar(1) | −0.6333 | 0.1609 | −3.9354 | 0.0000
ma(1) | 0.5505 | 0.1724 | 3.1915 | 0.0014
omega | 0.0001 | 0.0000 | 1.8409 | 0.0656
alpha(1) | 0.0992 | 0.0340 | 2.9141 | 0.0035
beta(1) | 0.8651 | 0.0495 | 17.4709 | 0.0000

rxrp (AIC = −3.4456, HQ = −3.4311)
Parameter | Estimate | Std. Error | t-value | P-value
mu | −0.0001 | 0.0013 | −0.0989 | 0.9211
ar(1) | −0.4662 | 0.3133 | −1.4880 | 0.1367
ma(1) | 0.3923 | 0.3287 | 1.1934 | 0.2326
omega | 0.0002 | 0.0000 | 5.0500 | 0.0000
alpha(1) | 0.3511 | 0.0572 | 6.1365 | 0.0000
beta(1) | 0.6189 | 0.0468 | 13.2071 | 0.0000

Source: R Studio output
According to Table 5, Ethereum, Tether, Tezos and XRP have significant volatility coefficients. The sum of the Residual(−1)^2 (ARCH) and GARCH(−1) coefficients is close to 1, indicating volatility clustering and persistence. The volatility model is sGARCH(1,1) with an ARMA(1,1) mean, which showed substantial alpha and beta values in all cases. Ethereum, Tether, Tezos and XRP have coefficient values near 1, indicating extreme volatility and slow decay, and Beta 2 (b2) exceeds Beta 1 (b1). The null hypothesis is therefore rejected, and the alternative hypothesis is accepted. The symmetry and ACF curves illustrate the cryptocurrency volatility (Fig. 1).

5.2 DCC-GARCH Result

To study the spillover effect on each cryptocurrency, Ethereum is considered the base currency, since it fulfills the conditions of the DCC-GARCH model and has the second-highest supply in the market. Ethereum is used to measure the short-run and long-run spillover effects on the other cryptocurrencies, and Table 6 reports these effects. The study checks the short-run and long-run spillover effects using dcca1 (the alpha value) and dccb1 (the beta value). Ethereum's ARCH effect made it the primary currency for the spillover analysis; it has the second-largest supply behind Bitcoin, and Bitcoin cannot meet the spillover-effect requirements.
Fig.1. Shows the flow of Technological Populism carried out for research
Table 6. Short and Long Run Spillover effect

Variable | Estimate | Std. Error | t-value | P-Value
[rethereum].mu | 0.003665 | 0.001786 | 2.052490 | 0.040122
[rethereum].omega | 0.000201 | 0.000197 | 1.024105 | 0.305785
[rethereum].alpha1 | 0.136411 | 0.102151 | 1.335380 | 0.181752
[rethereum].beta1 | 0.796391 | 0.123518 | 6.447561 | 0.000000
[rbinancecoin].mu | 0.002156 | 0.001519 | 1.419657 | 0.155708
[rbinancecoin].omega | 0.000170 | 0.000096 | 1.763517 | 0.077813
[rbinancecoin].alpha1 | 0.179263 | 0.097537 | 1.837906 | 0.066076
[rbinancecoin].beta1 | 0.778477 | 0.047566 | 16.366153 | 0.000000
[rbtcash].mu | 0.001523 | 0.001813 | 0.839718 | 0.401066
[rbtcash].omega | 0.000437 | 0.000199 | 2.193980 | 0.028237
[rbtcash].alpha1 | 0.105519 | 0.087159 | 1.210658 | 0.226026
[rbtcash].beta1 | 0.766074 | 0.045016 | 17.017817 | 0.000000
[rbitcoin].mu | 0.003861 | 0.001527 | 2.528704 | 0.011448
[rbitcoin].omega | 0.000170 | 0.000076 | 2.247251 | 0.024624
[rbitcoin].alpha1 | 0.171672 | 0.145327 | 1.181283 | 0.237490
[rbitcoin].beta1 | 0.755587 | 0.031374 | 24.082990 | 0.000000
[rtether].mu | 0.000007 | 0.000035 | 0.196509 | 0.844211
[rtether].omega | 0.000000 | 0.000000 | 0.033877 | 0.972975
[rtether].alpha1 | 0.106012 | 0.029953 | 3.539260 | 0.000401
[rtether].beta1 | 0.892273 | 0.034895 | 25.570593 | 0.000000
[rtezos].mu | 0.001493 | 0.001792 | 0.833355 | 0.404644
[rtezos].omega | 0.000169 | 0.000211 | 0.803189 | 0.421866
[rtezos].alpha1 | 0.100410 | 0.061177 | 1.641308 | 0.100734
[rtezos].beta1 | 0.863584 | 0.106470 | 8.111089 | 0.000000
[rvechain].mu | 0.003244 | 0.002106 | 1.540676 | 0.123396
[rvechain].omega | 0.000578 | 0.000277 | 2.082003 | 0.037342
[rvechain].alpha1 | 0.239747 | 0.109134 | 2.196816 | 0.028034
[rvechain].beta1 | 0.655082 | 0.067241 | 9.742280 | 0.000000
[rzcash].mu | 0.000892 | 0.001658 | 0.538063 | 0.590533
[rzcash].omega | 0.000439 | 0.000252 | 1.742213 | 0.081471
[rzcash].alpha1 | 0.185138 | 0.077356 | 2.393313 | 0.016697
[rzcash].beta1 | 0.676628 | 0.117240 | 5.771290 | 0.000000
[rxrp].mu | −0.000048 | 0.001336 | −0.035801 | 0.971441
[rxrp].omega | 0.000271 | 0.000130 | 2.091100 | 0.036519
[rxrp].alpha1 | 0.350135 | 0.136828 | 2.558940 | 0.010499
[rxrp].beta1 | 0.621224 | 0.120256 | 5.165823 | 0.000000
[Joint]dcca1 | 0.024495 | 0.002511 | 9.753475 | 0.000000
[Joint]dccb1 | 0.941977 | 0.010786 | 87.330431 | 0.000000
Source: R Studio output
Four of the ten cryptocurrencies have the ARCH effect (p-value less than 0.05). For the spillover analysis, we therefore go from Ethereum to all the other coins. If the p-value of dcca1 is less than 0.05, Ethereum has a short-term spillover effect on the other cryptocurrencies, and if the p-value of dccb1 is less than 0.05, Ethereum has a long-term spillover effect on the other currencies.
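For intuition about what the joint dcca1/dccb1 terms summarize, the sketch below standardizes each return series with a univariate GARCH(1,1) fit and tracks a rolling correlation of the standardized residuals. This is only a rough, hedged proxy for inspecting time-varying correlation; it is not the Engle (2002) DCC estimator whose estimates appear in Table 6, and the column names are assumptions.

```python
# Rolling correlation of GARCH-standardized residuals as a crude stand-in for
# the dynamic conditional correlation between two return series.
import pandas as pd
from arch import arch_model

def rolling_dynamic_correlation(returns, col_a="rethereum", col_b="rbitcoin", window=60):
    std_resid = {}
    for col in (col_a, col_b):
        fit = arch_model(100 * returns[col], mean="Constant",
                         vol="GARCH", p=1, q=1).fit(disp="off")
        std_resid[col] = fit.resid / fit.conditional_volatility  # standardized residuals
    z = pd.DataFrame(std_resid)
    return z[col_a].rolling(window).corr(z[col_b])
```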
6 Conclusion In this study, whales are shown to affect the bitcoin market, and ARCH and GARCH effects are present in the cryptocurrency market. The DCC-GARCH model reveals that the whales significantly impact short-term and long-term currency volatility. Due to its market capitalization, Bitcoin attracts the most investment, yet Bitcoin does not drive the ARCH effect. Ethereum is the second-biggest investment behind bitcoin. The ARCH and GARCH analyses showed that the bitcoin market exhibits high volatility.
Cryptocurrencies demonstrate significant conditional correlation in the market. Tezos, Tether and Aragon have shown persistent volatility, as indicated by the total coefficient value of Residual(−1) and GARCH(−1). The market was affected by the whales' currency trading and by their investment in the currencies with the largest market capitalization, and the whales' portfolio management in the selected currencies has caused short-term and long-term volatility spillover on the cryptocurrencies. A limitation of the study is that the data and the set of cryptocurrencies are limited for evaluating the short-term and lasting spillover effect and cryptocurrency volatility; future work could measure the stock market and macroeconomic effects of the most traded and volatile cryptocurrencies.
References 1. Gikay, A.A., Stanescu, C.G.: Technological Populism and Its Archetypes: Blockchain and Cryptocurrencies. SSRN Electron. J. (2019) 2. A.A, “Steve Bannon: Crypto to Become Part of ‘Global Populist Revolt’ (2019) 3. A. J, “Why Blockchain Will Trump Populism” (2017) 4. Baron, “Crypto Clash: Political Risks to Cryptocurrency. Available online (2021) 5. B. D, Surge proves ‘populist’ Dogecoin is no joke - Cryptocurrency skyrockets 400 percent in one week (2021) 6. B. J, “Forget far-right populism – crypto-anarchists are the new masters.” The Guardian, available online, (Accessed February 21, 2021) (2017) 7. Lehner, E.: A call for Second-Generation Cryptocurrency Valuation Metrics. How does access to this work benefit you ? Let us know ! (2018) 8. Gasset, S.: Bitcoin: signal of a traditional monetary system in distress? The Institute for Politics and Society, Paper , p. nil (2021) 9. Mäntymäki, M., Wirén, M., Najmul Islam, A.K.M.: Exploring the disruptiveness of cryptocurrencies: a causal layered analysis-based approach. In: Hattingh, M., Matthee, M., Smuts, H., Pappas, I., Dwivedi, Y., Mäntymäki, M. (eds.) Responsible Design, Implementation and Use of Information and Communication Technology. I3E 2020. LNCS, vol 12066. Springer, Cham (2020). Doi:https://doi.org/10.1007/978-3-030-44999-5_3 10. Stanescu, C.G., Gikay, A.A.: Technological populism and its archetypes: blockchain and cryptocurrencies. Nordic J. Com. Law, 66–109 (2019)
Forecasting Bitcoin Price During Covid-19 Pandemic Using Prophet and ARIMA: An Empirical Research

Chavan Rajkumar Dhaku1(B) and Senthil Kumar Arumugam2
1 Department of Commerce, CHRIST (Deemed to Be University), Central Campus, Bangalore,
India [email protected] 2 Department of Professional Studies, CHRIST (Deemed to be University), Central Campus, Bangalore, India [email protected]
Abstract. Bitcoin and other cryptocurrencies are alternative, speculative digital financial assets in today's growing fintech economy. Blockchain, a decentralized technology, is essential for ensuring ownership of bitcoin. These coins display high volatility and bubble-like behavior. The widespread acceptance of cryptocurrencies poses new challenges to the corporate community and the general public. Currency market traders and fintech researchers have classified cryptocurrencies as speculative bubbles. This study identifies the bitcoin bubble and its breaks during the COVID-19 pandemic. From 1st April 2018 to 31st March 2021, we used high-frequency data on the daily closing price of bitcoin. Both the Prophet model and ARIMA forecasting methods were applied. We also examined the explosive bubble and found structural breaks in bitcoin using the ADF, RADF, and SADF tests; five multiple breaks were detected in bitcoin prices from 2018 to 2021. ARIMA(1,1,0) fitted the best model for price prediction. The ARIMA and Facebook Prophet models were applied in the forecasting, and the Prophet model was found to be the best at forecasting prices. Keywords: ADF · Bitcoin · Bubble · Blockchain · Cryptocurrency · Covid-19 · Fintech
1 Introduction Due to its unique characteristics, ease of use, openness, and rising popularity, Bitcoin has drawn much attention. Since Nakamoto initially described it in a paper in 2008, it first appeared online in 2009. As rightly said, observing the bubble-like behavior in the bitcoin data does not require any deep understanding and insights, but not everyone will agree on the correct interpretation. The term “bubble” has always been used in the financial world to describe situations when asset prices dramatically change from their mathematical valuation (reference) point over a brief period and then collapse in an equal or shorter time. Herding conduct in the market frequently results in this explosive movement in asset values. Financially speaking, these are referred to as “destabilizers” © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 487–495, 2023. https://doi.org/10.1007/978-3-031-35510-3_46
since they cause a structural break in the time series but also cause investors’ opportunity costs to rise by double as time is lost. Many researchers have studied bitcoin’s volatility and its spillover into other cryptocurrencies. The definition of currency says that it should have some store value and be exchangeable for some commodity. This study has extensively used the Prophet and ARIMA models to forecast the Bitcoin Prices for the next 365 days. The study aims to predict and identify the structural breaks in Bitcoin and check the bubbles during the covid-19 pandemic. The study also aims to predict the price of bitcoin after the pandemic. A successful currency often serves as an accounting unit, value store, and trade medium—all three issues bitcoin faces. Since bitcoin has no inherent value, its value ultimately depends on its usefulness as a form of payment in the consumer market. The development of Bitcoin as a viable unit of account confronts numerous challenges. Its tremendous volatility is one issue covered in more depth below. Retailers who accept bitcoin must frequently recalculate prices because the value of a bitcoin relative to other currencies fluctuates considerably daily. It is both expensive for the store and confusing for the customer. This problem would theoretically disappear in a world where bitcoin served as the primary source of money, but such a place does not exist at the moment (Fig. 1).
Fig. 1. Flow chart of Bitcoin price forecasting application models.
2 Related Work Many researchers have found the speculative bubbles in bitcoin, though [1] provided empirical evidence to address the bubbles in Bitcoin markets and determine the fundamental value of Bitcoin. Some researcher has classified speculative bubbles into two categories rational and irrational [2]. Speculative bubbles are rational when investors know that prices have moved away from fundamental values in such cases, considering a higher price. Speaking about the irrational bubble is mainly driven by the psychological factor of herd instincts [3]. [11] has adopted the ARIMA and Prophet methods for forecasting the bitcoin prices, finding that Prophet outperforms ARIMA by 0.94 to 0.68 in R2 value. Possible causes for the development of bubbles include self-fulfilling expectations (rational bubble), mispricing of fundamentals (intrinsic rational bubble), and the endowment of irrelevant external variables with asset pricing value (extrinsic rational bubble). Investors’ expectations that they can sell an inflated asset for a profit at a higher price
result in rational bubbles. In contrast, irrational bubbles develop when investors are motivated by psychological variables that have nothing to do with the asset’s intrinsic value. Investors may fall victim to this when they use basic heuristics influenced by market sentiments or unrealistically optimistic expectations, trends, and fads [4]. Another paper examined the relationship between the explosive behavior of cryptocurrencies through a unit root approach [5]. According to additional research, the FIGARCH model and SB variables offer a significantly higher predicting accuracy performance [6]. It has been observed in [12] that price volatility is a primary issue with decentralized cryptocurrencies; also, bitcoin prices display non-stationary behavior, meaning that their statistical distribution fluctuates over time. Few authors have used the unit root testing approach to study the connection between the volatile behavior of cryptocurrencies [5]. Using sub-sample dickey-fuller statistics, the authors of one study attempted to examine the effect of non-stationarity volatility on the test’s effectiveness for explosive financial bubbles [7]. Since post-2020, we can see the multiple bubbles in the bitcoin [8] found a recursive testing procedure and dating practical algorithm detecting various bubble events. Due to the numerous events in the bubble of bitcoin, there is also frequent jump, these jumps positively impact market activity as proxied by volume, and traders harm the liquidity [9]. In another study [14], the researcher has used the comparison of the state-of-art strategies in predicting the movement for bitcoin, including Random Guessing and a Momentum based strategy, and also applied ARIMA, Prophet, Random Forest, Random Forest Lagged-Auto-Regression, and Multi-Layer Perceptron (MLP), MLP has achieved the highest accuracy of 54% compared to other time series prediction. In [14], studied the forecasting prices of Bitcoin and Google Stock, it has been discovered that ARIMA is better at predicting the prices of time series data. A similar study was carried out in [15] and discovered that the Facebook prophet model precisely predicts the costs of various cryptocurrencies.
3 Research Methodology

The equation used in this study is

y(t) = μ + δ·y(t − 1) + Σ_{i=1}^{p} ϕ_i·y(t − i) + ε(t),

where y(t) is the daily closing price of Bitcoin, μ is the intercept, p is the maximum number of lags, ϕ_i is the differenced-lag coefficient for lag i, and ε(t) is the error term [10]. The data are collected from investing.com for the period 1 April 2018 to 31 March 2021, in order to check the rationality of the bitcoin bubble pre- and post-Covid. Three augmented Dickey–Fuller-type tests (ADF, RADF, SADF) were used to identify the bitcoin bubble; the supremum augmented Dickey–Fuller (SADF) test is the generalized version proposed by Phillips, Shi and Yu (2012). From April 2018 to March 2021, 1096 high-frequency daily closing prices of bitcoin were collected, and the stationarity of the bitcoin series was assessed with unit root tests such as the Augmented Dickey–Fuller (ADF), RADF and SADF tests. A structural break analysis identified breaks in the time-series pattern, and the Bai–Perron test was used to find multiple breaks and confirm the break dates of the structural break analysis. We used the Auto-Regressive Integrated Moving Average (ARIMA) model to forecast bitcoin prices for one year ahead and compared it with the Facebook Prophet model.
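A minimal sketch of this workflow (ADF test, ARIMA(1,1,0) fit, and a 365-day Prophet forecast) is given below using statsmodels and the prophet package; the file name, column names and default settings are assumptions and do not reproduce the authors' exact configuration.

```python
# Stationarity check, ARIMA(1,1,0) fit, and a one-year Prophet forecast.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet

btc = pd.read_csv("btc_daily_close.csv", parse_dates=["Date"])   # columns: Date, Close

# Augmented Dickey-Fuller test on the closing-price series
adf_stat, p_value, *_ = adfuller(btc["Close"])
print(f"ADF statistic={adf_stat:.3f}, p-value={p_value:.4f}")

# ARIMA(1,1,0), the order reported as the best-fitting model in the abstract
arima_fit = ARIMA(btc["Close"], order=(1, 1, 0)).fit()
arima_forecast = arima_fit.forecast(steps=365)

# Prophet expects columns named ds (date) and y (value)
prophet_df = btc.rename(columns={"Date": "ds", "Close": "y"})
m = Prophet()
m.fit(prophet_df)
future = m.make_future_dataframe(periods=365)
prophet_forecast = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```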
4 Result and Discussion

The descriptive statistics show a total of 1096 observations, with a mean value of 11536.09, a standard error of 333.5784, and a standard deviation of 11043.41. The kurtosis value is 7.711405, which is leptokurtic, meaning that fat tails are observed in the Bitcoin price. The skewness value is 2.859883, so the Bitcoin data are highly skewed. The ADF statistics show the stationarity result for the Bitcoin price time series, where the t-statistic is −33.801 and the p-value of 0.000 is significant. AIC is 15.877, SIC is 15.890 and HQ is 15.882, and the Durbin–Watson statistic, which detects autocorrelation in the residuals of a statistical model, is 1.998, a value close to 2 indicating little autocorrelation in the residuals. Further, we applied the unit root with break test to the Bitcoin price, which shows a t-statistic value of −10.444 with a significant p-value.

Earthquakes having magnitude Mw > 5.5 are placed in class A, and four hundred ninety-four (494) earthquakes having magnitude between 3.5 < Mw < 5.5 are placed in class B. These earthquakes are henceforth called central earthquakes. The distribution of these earthquakes in the study area and in classes A and B is shown in Fig. 2. Four features (F1–F4) are extracted from the area enclosed within 25 km around each central earthquake, as listed below (a brief illustrative sketch of this neighbourhood feature extraction follows Fig. 2):

1. F1: Magnitude of index earthquake, abbreviated as MAG
2. F2: Number of epicentres other than the index earthquake, abbreviated as NE
3. F3: Number of tectonic features, abbreviated as NT
4. F4: Number of intersections, abbreviated as INT
Fig. 2. Distribution of earthquakes in the study area. Circles of blue color represents Class A & and squares of red color represents Class B earthquakes.
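As an illustration of how such neighbourhood features can be computed, the sketch below counts catalogue epicentres falling within a 25 km radius of a central earthquake using the haversine distance. The catalogue structure is an assumption; counting tectonic features and their intersections would additionally require the GIS lineament layers used by the authors.

```python
# Counting epicentres within 25 km of a central earthquake (feature F2 / NE).
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def feature_NE(central, catalogue, radius_km=25.0):
    """F2: number of epicentres other than the central earthquake within the circle."""
    return sum(
        1
        for ev in catalogue
        if ev is not central
        and haversine_km(central["lat"], central["lon"], ev["lat"], ev["lon"]) <= radius_km
    )
```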
An example of the identification of these features for the 1986 Dharamshala earthquake is shown below in Fig. 3. The blue circle represents the area of influence of 25 km, within which the features are identified and extracted.

Feature | F1 | F2 | F3 | F4
Value | 5.8 | 6 | 5 | 2

F1: Magnitude of central earthquake = 5.8; F2: Number of epicentres other than the central earthquake within the circle: 06; F3: Number of tectonic features: 05; F4: Number of intersections of tectonic features: 02.

Fig. 3. Identification and extraction of features [F1–F4] by considering a circle of radius 25 km from the central earthquake; example of the Dharamshala earthquake of 26th April, 1986, Mw: 5.8; Lat: 32.15° N; Long: 76.40° E.

The means and standard deviations of all four features were calculated for both classes A and B (Table 1), and curves were plotted to observe the degree of separation between the features of class A and class B. Of all the extracted features, F4, the number of intersections, shows the maximum difference between the class means. The smaller the degree of separation between the classes, the more difficult the classification.

4.1 Steps

The PR techniques applied in this paper consist of the following steps (a minimal sketch of these steps is given after Table 1):
Step 1: Preparation of the training data set.
Step 2: Use of various pattern recognition techniques for classifying the data into two classes, as trained through the training set.
Step 3: Four classification techniques, namely LDA, SVM, KNN and ANN; Matlab functions for discriminant analysis, SVM, ANN and KNN are used to train and classify the earthquakes.
Step 4: Calculating the accuracy of the algorithm.

Table 1. Values of mean and standard deviation of the extracted features

Feature | Mean (Class A) | Mean (Class B) | Std. Dev. (Class A) | Std. Dev. (Class B)
F1 | 1.217 | 0.866 | 2.606 | 1.359
F2 | 0.717 | 0.408 | 1.939 | 1.087
F3 | 1.826 | 1.623 | 3.028 | 1.6486
F4 | 13.956 | 9.238 | 16.449 | 8.4893

(The original table also included, for each feature, a curve illustrating the separation between the two classes.)
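A minimal sketch of Steps 1–4 using scikit-learn stand-ins for the MATLAB functions mentioned above is shown below; the feature matrix X (columns [MAG, NE, NT, INT]), the class labels y, the 210/330 split sizes from Table 2, and all hyper-parameters are assumptions rather than the authors' settings.

```python
# Train LDA, SVM, k-NN and a small neural network on the four seismicity features
# and report test accuracy for each classifier.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate_classifiers(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=210, test_size=330, stratify=y, random_state=0
    )
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "ANN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    }
    return {
        name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        for name, model in models.items()
    }
```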
5 Results and Discussion

Linear Discriminant Analysis: The decision tree used for classification as per LDA is shown in Fig. 4. The error in classification was found to be 24.74%, which indicates that, out of 540 epicentres, 145 earthquakes are misclassified, i.e. placed in a different class than the original one. Figure 4 illustrates the epicentres (marked with a green cross) which were placed in classes other than as designated. The decision tree is based on the following logic:

NODE | DECISION
1 | if NT < 0.5 then node 2, elseif NT >= 0.5 then node 3, else B
2 | if INT < 12.5 then node 4, elseif INT >= 12.5 then node 5, else A
3 | if INT < 3.5 then node 6, elseif INT >= 3.5 then node 7, else B
4 | class = A
5 | class = B
6 | if NT < 1.5 then node 8, elseif NT >= 1.5 then node 9, else B
7 | if INT < 5.5 then node 10, elseif INT >= 5.5 then node 11, else B
8 | class = A
9 | if NT < 2.5 then node 12, elseif NT >= 2.5 then node 13, else B
10 | if MAG < 0.5 then node 14, elseif MAG >= 0.5 then node 15, else B
11 | class = B
12 | if MAG < 2.5 then node 16, elseif MAG >= 2.5 then node 17, else B
13 | class = B
14 | class = B
15 | if NT < 1.5 then node 18, elseif NT >= 1.5 then node 19, else B
16 | if NE < 2 then node 20, elseif NE >= 2 then node 21, else B
17 | class = A
18 | class = A
19 | class = B
20 | class = B
21 | class = A
Fig. 4. Decision Tree for classification
The accuracy of classification of the support vector machine (SVM) is 93.55%, i.e. it has an error of 6.45%. Similarly, the accuracy for KNN is 92.22%, and ANN showed the best performance with an accuracy of 97.1%. Figure 5 shows the confusion matrix for the ANN. A comparative result of all four classification techniques is shown in Table 2. ANN showed the best results in terms of accuracy because it has the ability to work with insufficient data. However, its complexity is increased due to the involvement of
several parameters (training samples, size of network, learning parameters etc.). In this case the complexity of the seismic data was well handled by the ANN.
Fig. 5. Confusion matrices for training class, validation, test samples and combination of all
Table 2. Experimental results for classification of earthquakes based on different methods

Parameter | LDA | SVM | KNN | ANN
Number of observations | 540 | 540 | 540 | 540
Number of training samples | 210 | 210 | 210 | 210
Number of testing samples | 330 | 330 | 330 | 330
Number of classes | 2 | 2 | 2 | 2
Number of features | 4 | 4 | 4 | 4
Error (%) | 24.74 | 6.45 | 7.78 | 2.9
Accuracy of classification (%) | 75.26 | 93.55 | 92.22 | 97.1
6 Conclusion This study presented a comparative study of different classification algorithms for identifying seismically susceptible areas. Four features, namely the magnitude of the central earthquake, the number of seismic events, the number of tectonic features and the number of intersections, were extracted from a radius of 25 km around the index earthquake. Of the four classification methodologies, viz. LDA, SVM, ANN and K-NN, ANN showed the best performance by reducing the error in misclassifying the epicentres from either class. The errors in classification were 24.74%, 6.45%, 7.78% and 2.9% for LDA, SVM, KNN
and ANN, respectively. LDA is easy to implement, but its classification error is larger because of the complexity of the seismic data. The main advantages of k-NN are its simplicity and ease of implementation: there is no need to build a model, adjust numerous parameters or make supplementary assumptions; however, it becomes slower as the number of predictors, examples or independent variables increases. ANN is more complex in nature due to the involvement of several parameters, but it showed the best results in terms of accuracy because it has the ability to work with insufficient data.
References 1. Middlemiss, C.S.: The Kangra earthquake of 4th April, 1905. Geological survey of India (1910) 2. Mridula, Sinvhal, A., Wason, H.R.: A review on pattern recognition techniques for seismic hazard analysis. In: Proceedings of International Conference on Emerging Trends in Engineering and Technology, pp. 854–858 (2013) 3. King, G., Yielding, G.: The evolution of a thrust fault system: processes of rupture initiation, propagation and termination in the 1980 El Asnam (Algeria) earthquake. Geophys. J. Int. 77(3), 915–933 (1984) 4. Ikeda, M., Toda, S., Kobayashi, S., Ohno, Y., Nishizaka, N., Ohno, I.: Tectonic model and fault segmentation of the Median Tectonic Line active fault system on Shikoku, Japan 28(5) Tectonics, (2009) 5. Mark, R.K.: Application of linear statistical models of earthquake magnitude versus fault length in estimating maximum expectable earthquakes. Geology 5(8), 464–466 (1977) 6. Gelfand, I.M., Guberman, S.l., Izvekova, M.L., Kelis-Borok, V.I., Ranz Man, E.J.A.: Criteria of high seismicity, determined by pattern recognition. Tectonophysics 13, 415–422 (1972) 7. Gelfand, I.M., Guberman, Sh.A., Izvekova, M.L., Keilis-Borok, V.I., Ranzman, E.: Recognition of places where strong earthquake may occur, I. Pamir and Tien Shan, 6, (In Russian) Computational Seismology (1973) 8. Oday, V., Gaur, V.K., Wason, H.R.: Spatial prediction of earthquakes in the Kumaon Himalaya by pattern recognition, 30(2 & 3), pp. 253–264, Mausam (1979) 9. Sinvhal, A., Sinvhal, H., Joshi, G., Singh, V.N.: A valid pattern of microzonation. In: Proceedings of 4th International Conference on Seismic Zonation, vol. 3, pp. 641–648 (1991) 10. Sinvhal, A.: Seismic modelling and pattern recognition in oil exploration. Springer Science & Business Media; 2012 Dec 6 11. Bhatia, S.C., Chetty, T.R.K., Filimonov, M.B., Gorshkov, A.I., Rantsman, E.Y., Rao, M.N.: Identification of potential areas for the occurrence of strong earthquakes in Himalayan arc region. Proc. Indian Acad. Sci.-earth Planetary Sci. 101(4), 369–385 (1992) 12. Peresan, A., Zuccolo, E., Vaccari, F., Gorshkov, A., Panza, G.F.: Neo-deterministic seismic hazard and pattern recognition techniques: time-dependent scenarios for North-Eastern Italy. Pure Appl. Geophys. 168(3), 583–607 (2011) 13. Mridula, Sinvhal, A., Wason, H.R.: Identification of seismically susceptible areas in western Himalaya using pattern recognition. J. Earth Syst. Sci. 125(4), 855–871 (2016) 14. Sinvhal, A., Khattri, K.: Application of seismic reflection data to discriminate subsurface lithostratigraphy. Geophysics 48(11), 1498–1513 (1983) 15. Sinvhal, A., Khattri, K.N.: Sinvhal, H., Awasthi, A.K.: Seismic indicators of stratigraphy. Geophysics 49(8), 1196–1212 (1984)
Intelligent Diagnostic System for the Sliding Bearing Unit
Alexey Rodichev(B), Andrey Gorin, Kirill Nastepanin, and Roman Polyakov
Orel State University n.a. I.S. Turgenev, Komsomolskaya Street 95, 302026 Orel, Russia
[email protected]
Abstract. The article deals with the problem of classifying the operating states of a rotor system. In a continuous experiment, four different states of a sliding bearing unit were distinguished on the basis of thermogram images. A ResNet convolutional neural network was trained for automated classification of the thermogram images obtained with a thermal imager. The data were partitioned into training and testing samples in the ratio of 70% for training and 30% for testing, using two partitioning schemes: random and sequential. With sequential partitioning, the training sample was taken from the beginning of each individual experiment's recording and the test sample from its end. The best results were obtained when training ResNet18: the test accuracy was 67.5% for the random partitioning method and 54.6% for the sequential method. Keywords: sliding bearing · diagnostics · defects · artificial neural network (ANN) · convolutional neural network (CNN)
1 Introduction Sliding bearing units are widely used in all fields of science and technology. One of the causes of failure of a bearing unit is wear of the sliding bearing surfaces. Early detection of such a defect makes it possible to avoid an emergency and, as a consequence, the irretrievable loss of an expensive unit caused by the failure of critical components [1–6]. At present it is important not only to detect a defect that occurs during the operation of a unit or aggregate, but also to learn to predict it in time, which would significantly reduce the accident rate and, consequently, the labor intensity of repairing complex technological equipment. The most widespread method for diagnosing the current state of bearing units and aggregates is monitoring based on the analysis of vibration measurement data. Such studies are well known [7, 8], since vibration signals reflect the dynamic behavior of the rotor system well. Infrared photogrammetry is another promising method for monitoring the state of a bearing unit and, in particular, of a sliding bearing. The method consists in monitoring the state of the sliding bearing unit by means of thermogram images, since a rise or fall of the temperature from its set value is directly related to the occurrence of a defect or malfunction [9].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 577–586, 2023. https://doi.org/10.1007/978-3-031-35510-3_54
The design of the experimental test rig, the essence of the
method, and the approach to its implementation remain virtually unchanged from study to study. The equipment used differs only insignificantly and probably depends on the researchers' material and technical resources. The main differences and innovations proposed by the authors of each study relate to the algorithms for extracting features from the obtained thermogram images [10, 11]. In particular, several varieties of convolutional neural networks have been proposed as the most effective means of data processing. All researchers note that the most popular and most thoroughly described method in the scientific literature for diagnosing faults of machines and technological equipment is the vibration method, which has considerably lower accuracy than the proposed diagnostic methods based on infrared photogrammetry. The use of the vibration method also involves some difficulties: it requires the installation of dedicated sensors and post-processing of the received signal (noise removal), etc. The same can be said about the frequency method of diagnostics. The accuracy of the proposed methods using convolutional neural networks is close to 100%, in contrast to the vibration method (~80%, depending on conditions) [12–14].
2 Experimental Research To study the possibility of predictive diagnostics of sliding bearing units using the measuring system "thermal imager - thermogram images - artificial neural network" and the timely elimination of malfunctions occurring during operation, an experiment consisting of a series of consecutive runs was conducted. The task of the experiment was to reveal the changes occurring in the bearing unit over time on the basis of the readings of the sensors integrated into the unit and of infrared photogrammetry of the state of the sliding bearing unit, in our case performed with a thermal imager. For this purpose, an experimental test rig was developed (see Fig. 1). The test rig uses sliding bearing units with an installed monitoring and measuring system and provides hydraulic paths to all supports, which makes it possible to feed lubricant directly into the sliding unit during operation. The test rig has a frame on which the motor is mounted; the rotor rests on the sliding bearings and is connected to the motor shaft by a flexible cam coupling with elastic elements. The sliding bearings are mounted in bearing units (supports) which are lubricated through flexible hydraulic paths from reservoirs fixed on the stand. The hydraulic ducts together with the lubricant reservoirs form the lubrication system of the unit. Two disks are attached to the rotor to increase its inertial mass. In the course of the experiment, two independent monitoring systems were connected to the unit to diagnose the operation of the bearing unit. The first monitoring system consisted of temperature and wear sensors for the bearing surface, which were integrated into the bearing construction (see Fig. 2). A speed sensor was added to the measuring system to determine the rotational speed. A Raspberry Pi 3 Model B+ microcomputer was used to receive readings from all sensors built into the sliding bearing unit, and a real-time sliding bearing condition monitoring program was developed to process the sensor readings.
Fig. 1. Structure-functional diagram of the experimental test rig.
Fig. 2. Schematic diagram of the control and measuring system: 1 - body; 2 - shaft; 3 - sliding bearing; 4 - sliding bearing surface temperature sensors; 6 - sliding bearing working surface temperature sensors; 8 - lubricant temperature sensors; 5, 7 - wear control indicator; 9 - speed sensor; 10 - cover; 11 - controller; 12 - personal computer
The second monitoring system included a UNI-T UTi260B thermal imaging camera with a personal computer connected to it. Both monitoring systems worked in parallel, neither of them excluded the other. The electric motor was controlled with the help of a model of switching unit developed within the framework of the 2nd stage of the project “Creation of digital system
for monitoring, diagnostics and predicting the technical equipment state using artificial intelligence technology on the basis of national hardware and software". The experiment lasted 30 h, divided into two-hour intervals (individual runs). During this time, data were obtained from all sensors of the first monitoring system, while the second monitoring system "thermal imager - personal computer" took 108,000 thermogram images. The state of the sliding bearing unit and, in particular, of the sliding bearing working surface was monitored by means of the monitoring systems as well as visually, by disassembling and inspecting the sliding bearing unit. In the course of the experiment, several defects occurred, namely wear of the sliding bearing working surface and damage to the seal of the bearing unit. The damage to the seal was immediately detected visually (leakage of lubricant from the sliding bearing unit); in parallel, the readings of the monitoring system sensors were recorded and thermograms of the bearing unit were obtained. The wear of the working surface of the sliding bearing was determined using the sensors of the monitoring system (a rise in temperature of the lubricant and the sliding layer). The temperature change was also recorded with the thermal imaging camera (see Fig. 3).
Fig. 3. Images of the thermogram of the sliding bearing unit.
In the course of the experiment, a malfunction of the surface layer temperature sensor was detected from the data obtained by the monitoring system. The state of the sliding bearing unit with this malfunction was also recorded by the thermal imager, using images of thermograms.
3 Processing of the Obtained Results To process the data obtained from the thermal imager, 28,800 images were extracted as thermogram images with the help of a neural network. The images were divided into four groups (see Fig. 4) of 7,200 images each:
1. damage to the working surface of the sliding bearing;
2. damage to the seal in the bearing unit;
3. damage to the temperature sensor;
4. no damage.
Fig. 4. Images of the thermogram of the sliding bearing unit.
Solving the neural network training problem begins with the creation of databases of thermogram images, which are divided into two blocks: a training one and a test one. Each block contains thermogram images of the four states of the system. To solve the defect classification problem, two ways of training the neural network are considered, conditionally called the "random partitioning" method and the "sequential partitioning" method. Classification with the "random partitioning" method distributes the thermogram images among the above-mentioned classes by training the network and evaluating it on a test sample; the thermogram images obtained in the course of the experiment are randomly divided into two samples, a training one and a test one, in the ratio of 70% and 30%, respectively, for each of the four states of the system (see Fig. 5). Classification with the "sequential partitioning" method distributes the thermogram images so that 70%, taken from the initial stage of the experiment, form the training sample and the remaining 30%, taken from the final stage, form the test sample (see Fig. 6), for each of the four states of the system.
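The following minimal sketch (an assumption about the file layout, not the authors' code) shows one way to build the two 70/30 splits just described: "random" shuffles the frames of each class, while "sequential" keeps the recording order so that training uses the beginning of each run and testing its end.

```python
# Building the random and sequential 70/30 train/test splits from a folder of
# thermogram images; directory names and file format are illustrative.
import random
from pathlib import Path

def split_class(image_dir: str, mode: str = "random", train_ratio: float = 0.7):
    files = sorted(Path(image_dir).glob("*.png"))  # frames in recording order
    if mode == "random":
        files = files[:]
        random.shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]   # training sample, test sample

# Usage for the four states (folder names are hypothetical):
for state in ["no_damage", "surface_wear", "seal_damage", "sensor_fault"]:
    train, test = split_class(f"thermograms/{state}", mode="sequential")
```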
Fig. 5. The classification of the neural network by “random partitioning”.
Fig. 6. The classification of the neural network by “sequential partitioning” method.
Two problems were solved using supervised learning, which trains the model by determining the relationship between the input data set X = (x^{(1)}, ..., x^{(N)}) and the target class set Y = (y^{(1)}, ..., y^{(N)}) by finding the weight coefficients \theta, where N is the number of examples. Each example was fed to the input of the ANN, then a forward pass of the ANN was performed to obtain the prediction, which was compared to the target value by calculating the loss function. The cross-entropy loss function was used for the classification task:

L(\theta) = -\sum_{c=1}^{C} y_c \ln(h_c) \rightarrow \min, \quad (1)

where C is the number of classes and h denotes the predictions of the ANN. The ANN predictions are a vector of probabilities for each class, the sum of which is equal to one. The predicted class was determined by the maximum value of the probability predictions. The SoftMax function was used to obtain the output values of the ANN:

h(x_i) = \frac{e^{x_i}}{\sum_{c=1}^{C} e^{x_c}} \quad (2)
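A short numerical illustration of Eqs. (1) and (2) (not taken from the paper): softmax turns raw network outputs into class probabilities, and cross-entropy measures how far those probabilities are from the one-hot target.

```python
# Softmax + cross-entropy for a single example with four classes.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())          # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0, 0.1])   # raw outputs for the four bearing states
y_true = np.array([1, 0, 0, 0])            # one-hot target, e.g. the "no damage" class

h = softmax(logits)
loss = -np.sum(y_true * np.log(h))         # Eq. (1) for this example
print(h, loss)
```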
Next, the backward pass of the ANN was performed to minimize the loss function by gradient descent methods, changing the weight coefficients. During one epoch, the
training sample was randomly partitioned into batches of a certain size, which were sequentially fed to the input of the ANN. Next, the loss function was calculated and one gradient descent step was performed. The training continued until the error over the entire training set reached an acceptably low level; if the number of epochs was too large, overfitting occurred. In this work, convolutional neural networks (CNN) were used. They compress the pixel intensity data and extract important features. The convolution operation is the scalar product of input image fragments and the corresponding kernel. The kernel components are part of the set of network parameters, and their values are determined in the course of training. The convolutional layers use padding to control the size of the output and stride to reduce the dimensionality of the data. The number of kernels in a convolutional layer determines the number of output channels. After the convolution operation, batch normalization layers and activation functions are also applied. The most popular activation function is ReLU:

h(x) = \begin{cases} x, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases} \quad (3)

One of the well-known CNN architectures is ResNet. Its distinctive feature is the use of shortcut (skip) connections, which stabilize training as the network depth increases. These connections bypass several layers of the CNN and sum the outputs. For an additional comparison, the obtained thermogram images were used to train two neural networks, ResNet18 and ResNet34, in the two ways described above. The training resulted in trained neural network models. After the training phase, each trained model was tested on the test sample images, i.e., on images that the network had not seen during training. Based on the results of this training, the confusion matrices of the ResNet18 and ResNet34 neural networks were obtained (see Fig. 7).
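The following minimal PyTorch sketch illustrates this training setup (it is an interpretation of the description above, not the authors' code; the folder layout, optimizer choice and hyperparameters are assumptions).

```python
# Fine-tuning a ResNet18 on the four thermogram classes with cross-entropy loss.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("thermograms/train", transform=tf)  # 4 subfolders = 4 states
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18()                          # randomly initialized ResNet18
model.fc = nn.Linear(model.fc.in_features, 4)      # four output classes
criterion = nn.CrossEntropyLoss()                  # Eq. (1) with softmax built in
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)    # forward pass and loss
        loss.backward()                            # backward pass
        optimizer.step()                           # one gradient descent step
```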
Fig. 7. Confusion matrices, "random partitioning" method.
To evaluate the accuracy of the artificial neural networks, we used the "accuracy" metric, which represents the proportion of correct predictions out of the total number of predictions. For ResNet18 the accuracy was 67.5%, and for ResNet34 it was 59.2%. The confusion matrices of the ResNet18 and ResNet34 neural networks for the "sequential partitioning" classification problem are presented in Fig. 8.
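For reference, the accuracy values quoted here can be read directly off a confusion matrix; the short illustration below uses made-up counts, not the matrices shown in Fig. 7 or Fig. 8.

```python
# Accuracy = trace of the confusion matrix / total number of test images.
import numpy as np

cm = np.array([[50, 5, 3, 2],    # rows: true class, columns: predicted class
               [6, 45, 4, 5],
               [4, 6, 42, 8],
               [3, 5, 7, 45]])
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.1%}")
```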
Fig. 8. Confusion matrices “sequential partitioning”.
The accuracy of the neural networks when solving the classification problem with the "sequential partitioning" method was 54.6% for ResNet18 and 52.9% for ResNet34.
4 Conclusion Based on the results of a long-term experiment divided into a series of runs, a data set on the thermal state of the sliding bearing unit was prepared, in the course of which four different states of the bearing unit were obtained:
1. optimal operation of the sliding bearing unit (no damage);
2. damage to the measuring system during operation (damage to the temperature sensor);
3. damage to the seal in the sliding bearing unit;
4. damage to the working surface of the sliding bearing.
Based on machine learning methods, neural network models were prepared which showed relatively high accuracy in determining the defective state of the system under consideration. The best results were obtained when training the ResNet18 CNN: the test accuracy with the random method of data partitioning was 67.5%, and with the sequential method it was 54.6%.
On the basis of the conducted research, it can be concluded that diagnostics of the state of sliding bearing units using the monitoring system "thermal imager - thermogram images - artificial neural network" is a promising approach. Acknowledgment. The work was done within the framework of the project "Creation of digital system for monitoring, diagnostics and predicting the state of technical equipment using artificial intelligence technology on the basis of national hardware and software", under the contract for research, development and technological work No. 4869-2081 dated April 19, 2021 with LLC "ELSEL".
References 1. Feng, Z., Liang, M., Chu, F.: Recent advances in time-frequency analysis methods for machinery fault diagnosis: a review with application examples. Mech. Syst. Signal Process 38(1), 165–205 (2013) 2. Li, Y., Wang, X., Si, S., Huang, S.: Entropy based fault classification using the case western reserve university data: a benchmark study. IEEE Trans.Reliab. (2019-03-07) [2019-03-28] (2019). Doi: https://doi.org/10.1109/TR.2019.2896240 3. Wang, Z., Du, W., Wang, J., Zhou, J., Han, X., Zhang, Z., et al.: Research and application of improved adaptive momeda fault diagnosis method. Measurement 140, 63–75 (2019) 4. Wang, Z., He, W., Du, W., Zhou, J., Han, X., Wang, J., et al.: Application of parameter optimized variational mode decomposition method in fault diagnosis of gearbox. IEEE Access 7, 44871–44882 (2019) 5. Li, Y., Wang, X., Liu, Z., Liang, X., Si, S.: The entropy algorithm and its variants in the fault diagnosis of rotating machinery: a review. IEEE Access 6, 66723–66741 (2018) 6. Zhang, C., Harne, R.L., Li, B., Wang, K.: Statistical quantification of dc power generated by bistable piezoelectric energy harvesters when driven by random excitations. J. Sound Vib. 442, 770–786 (2019) 7. Zhang, C., et al.: Multi-faults diagnosis of rolling bearings via adaptive customization of flexible analytical wavelet bases. Chinese J. Aeronautics 2019. (2019-03-25) [2019-03-28]. Doi.https://doi.org/10.1016/j.cja.2019. 03.014 8. Li, Y., Li, G., Yang, Y., Liang, X., Xu, M.A.: fault diagnosis scheme for planetary gearboxes using adaptive multi-scale morphology filter and modified hierarchical permutation entropy. Mech. Syst. Signal Process. 105, 319–337 (2018) 9. Zhao, M., Lin, J.: Health assessment of rotating machinery using a rotary encoder. IEEE Trans. Ind. Electron. 65(3), 2548–2556 (2017) 10. Li, Y., Du, X., Wan, F., Wang, X., Yu, H.: Rotating machinery fault diagnosis based on convolutional neural network and infrared thermal imaging. Chin. J. Aeronaut. 33(2), 427–438 (2020) 11. Choudhary, A., Mian, T., Fatima, S.: Convolu-tional neural network based bearing fault diagnosis of rotating machine using thermal images. Measurement 176, 109196 (2021) 12. Shao, H., Xia, M., Han, G., Zhang, Y., Wan, J.: Intelligent fault diagnosis of rotor-bearing system under varying working conditions with modified transfer convolu-tional neural network and thermal images. IEEE Trans. Industr. Inf. 17(5), 3488–3496 (2020)
13. Shao, H., et al.: Fault diagnosis of a rotor-bearing system under variable rotating speeds using two-stage parameter transfer and infrared thermal images. IEEE Trans. Instrum. Meas. 70, 1–11 (2021) 14. Jia, Z., Liu, Z., Vong, C.M., Pecht, M.: A rotating machinery fault diagnosis method based on feature learning of thermal images. IEEE Access 7, 12348–12359 (2019)
A Systematic Review on Security Mechanism of Electric Vehicles
Vaishali Mishra1(B) and Sonali Kadam2
1 VIIT, Pune, India [email protected]
2 BVCOEW, Pune, India [email protected]
Abstract. A classic protocol for in-vehicle network communication in electric vehicles is the Controller Area Network (CAN) bus. The key characteristics of the CAN bus are its simplicity, reliability, and suitability for real-time applications. Unfortunately, the lack of a message authentication mechanism in the CAN bus protocol leaves it open to numerous cyberattacks, making it easier for attackers to access the network. This paper surveys existing anomaly detection models and proposes a model based on a one-class SVM to enhance security control in EVs. Additionally, we demonstrate that the suggested method can be used with existing datasets. The suggested method's independence from the meaning of each message ID and data field, which allows the model to be applied to various CAN datasets, is demonstrated by benchmarking with additional CAN bus traffic datasets. Keywords: Security · CAN · Electrical vehicle · Attack
1 Introduction The currently used interfaces have increased the number of remote attack surfaces that can be used by attackers to send a malicious message and gain access to a CMU. Attack detection is the process of identifying patterns in a dataset that do not conform to the established intended behaviour; it is regarded as a significant subject that has been investigated in a number of study areas. In the CAN bus protocol, attack detection is the process of watching communication traffic between ECUs and spotting any unusual behaviour. With the evolution of vehicle manufacturing towards full automation, the probability of attack intrusion is also increasing. As the data in CAN communication is not encrypted, the possibility of data corruption is high. The simplicity of the CAN protocol has made it the most suitable interface; however, its vulnerability gives attackers a higher possibility of intrusion. The rapid increase in advanced vehicle operations such as pedestrian detection, path planning, auto-parking and collision avoidance increases the monitoring and control interfaces, in addition to existing controls such as speed monitoring, power charging and steering control.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 587–598, 2023. https://doi.org/10.1007/978-3-031-35510-3_55
The evolution of automation in the electric vehicle interface raises the following problems that must be addressed to make the system more secure:
• CAN is more vulnerable to attackers due to its open packet-type communication.
• An attacker can easily control the operation by reverse-reading the packets in the CAN protocol.
• Supporting the vehicle interface under attack detection needs simplification.
• Attacks lead to vehicle isolation, and the self-control mechanism is not robust to the variable factors in the vehicle when making decisions.
2 Literature Survey The rapid development of new learning techniques has greatly advanced detection and decision systems. Machine learning (ML) is used in recent vehicle control applications for cyber security, one very popular use being intrusion detection systems (IDS) [1–3]. An increasing number of academics have focused on attacks, which indicates that this subject is viewed as one of the most important by academia, business, and government. Various studies demonstrate the strengths and limitations of the CAN bus's security features [4]. A security interface called VeCure is presented in [5] as a solution for security provisioning on the CAN bus interface. Traffic monitoring of the CAN bus interface for in-vehicle network communication is presented in [6]: the variation of traffic conditions is used to monitor the in-vehicle network, and this information is then used to define the condition of vehicle communication. A deep neural network based approach for intrusion detection is proposed in [7]; a feature-based method is used to derive information from the CAN packets for analysing intrusions and deciding between normal and attack data. An ensemble classifier model is presented in [8] for the detection of attack types under varying conditions; the presented approach is developed as an independent system with no prior information. For state-based security operation in vehicle monitoring, [9] presented a hidden Markov model (HMM) using stochastic modeling; the HMM processes assumptions about the vehicle position to derive the security measure. [10, 11] outlined a framework for intrusion detection that monitors clock skew to expose the attack; the observed clock variation is used as the monitoring parameter for controlling vehicle operations, and this method detects clock-spoofing attacks. In [12] a neural network is used to interface CAN bus data for attack prediction; the intrusion detection was developed based on packet traffic values, the NN is used to predict upcoming packets, and the packet-variation error is considered for attack detection. Other learning approaches such as regression models are outlined in [13–15], where a correlative approach among different sensors is used to derive malfunctions for in-vehicle control monitoring.
For attack detection, [16] proposes a classifier model for CAN measurements; the approach develops a greedy model for deriving the attack detection, reading the message fields and correlating them with a set range to track the attack condition. In [17] the CAN ID parameter is used to derive the attack condition. A signature-based model is presented in [18] for detecting attack conditions; however, such methods rely on specific packets to reach a decision. The variation in CAN broadcast timing is evaluated in [19–21], where the variation in the time parameter of CAN broadcasts is used as the measurement of the attack condition. A similar time-monitoring approach is outlined in [22], where a one-class support vector machine (SVM) is used to derive attack conditions. To give the best security and accuracy, a modified one-class support vector machine (OCSVM) was used in the construction of a recent machine learning approach to cyber security detection [23], together with a new, upgraded version of the bat-algorithm-based one-class SVM. The proposed bat algorithm introduces a successful two-stage modification strategy that enhances population diversity to avoid premature convergence, maximising the effectiveness and performance of the attack detection model against cyber threats. In [24] the monitoring of electric vehicle charging under different attacks and switching is presented; the communication model develops server-vehicle communication for selecting an authenticated switch when charging the EV. [25] presented a digital twin approach for multiple data inputs in the control of EV operation. The existing approaches, however, fall short in validating an attack when making decisions under variable conditions. The complexity and switching of control under attack conditions result in node isolation, which has not been addressed in EV operations, and the built-in self-test needs robust monitoring for self-decision under varying input parameters, with a more integrated observation for making a self-decision. The authors of [35] discussed the working of the CAN bus, possible vulnerabilities of the CAN bus, different kinds of attacks, and ways to mitigate them.
3 Existing Method An illustration of the current monitoring and controlling of an electric vehicle (EV) under cyber attack is given in Fig. 1. In the monitoring and controlling operation, values from sensors and actuators are sent to the CMU using the in-vehicle network protocol. Different communication protocols are used, among them CAN, CAN with Flexible Data-Rate (CAN FD), Local Interconnect Network (LIN), FlexRay, and Media Oriented Systems Transport (MOST). Among these protocols, the CAN bus is used in most interfaces and has become the standard for in-vehicle communication. These protocols have also been used in various real-time applications such as military, agriculture, aerospace and medical applications. In the controlling of the vehicle interface, the system has been an open-loop system where signals from an external interface are provided for monitoring and controlling, and the means of communication is basically wireless. The wireless medium is susceptible to attacks, where an attacker can insert malicious data into the CAN data to control the vehicle operation. Attack monitoring and operation control for a wide range of vehicle interfaces is therefore a primary need in the evolving electric vehicle operation (Tables 1, 2 and 3).
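As a point of reference for the tables that follow, the minimal sketch below (illustrative only, not a real CAN driver) shows the fields carried by a classic CAN data frame; note the absence of any sender-identity or authentication field, which is the gap the surveyed IDS approaches try to close.

```python
# A classic CAN data frame as seen by an intrusion detection system.
from dataclasses import dataclass

@dataclass
class CanFrame:
    arbitration_id: int      # 11-bit (or 29-bit extended) message ID, also sets priority
    dlc: int                 # data length code, 0-8 bytes for classic CAN
    data: bytes              # payload, broadcast in plain text on the bus
    timestamp: float = 0.0   # receive time, used by frequency/timing-based IDS

# Example frame an IDS would inspect (values are arbitrary):
frame = CanFrame(arbitration_id=0x1A0, dlc=8, data=bytes(8), timestamp=12.345)
print(hex(frame.arbitration_id), frame.dlc, frame.data.hex())
```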
Fig. 1. Existing framework of monitoring and control operation of EVs under cyber attack

Table 1. Existing types of attacks covered

Authors | Types of attacks covered | Conclusion
[2019][26] | All types of attacks | All messages sent by any foreign node are correctly identified and deleted to prevent their impact on the ABS
[2019][27] | All types of attacks | Researchers will find some inspiration for improving security solutions in this area from the concerns covered in the paper. Security researchers must therefore conduct thorough investigations into a wide range of attacker strategies and tactics in order to create realistic data for training and testing
[2022][28] | Mainly message shot, posing node attack | An actual in-vehicle CAN bus message dataset was used for the experiments, and the findings show that the AMAEID model performs better than certain conventional machine learning algorithms on three classification assessment metrics
[2022][29] | All types of attacks | Intrusion Detection System (IDS)
[2022][30] | All types of attacks | A unique context-aware ensemble IDS for CAN bus security is called CAN-CID. Our tests revealed that the ensemble model outperformed two baselines and a proposed model variant while also improving attack detection performance overall
[2020][36] | All types of attacks, mainly DoS | For a deployable in-vehicle IDS, a low detection latency like the CID model is required
[2020][31] | DoS, Fuzzing, and Spoofing | IDS is a more promising countermeasure than others. It cannot protect against every attack, but it prevents most of them from causing severe problems
[2021][32] | Frame spoofing, unauthorized access | The attack-free traffic from a genuine car should be used to create the CAN bus attack dataset. By adding three different types of attacks (DoS, Fuzzing, and Spoofing) to the attack-free environment, attack datasets can be created
[2021][33] | All types of attacks | The layer's and the optimizer's false-negative rates are used
[2021][34] | All types of attacks | "Researched and constructed an intrusion detection system (IDS) that analyses message identifier sequences to detect anomalies in vehicular CAN bus traffic and evaluated its efficacy using CAN bus data obtained from a heavy-duty vehicle."
Table 2. Studies on optimization of detection/prevention techniques implemented for electric vehicles

Authors | Detection/Prevention techniques implemented | Outcome
[2019][26] | Message Authentication System | The model incorporates additional security characteristic information specific to each transmitted CAN frame in order to use the message authentication security feature to detect any third-party intrusion
[2019][27] | CAN-CID Model and other | An overview of the several IDS products currently available, with categorization depending on the approach
[2022][28] | Attention Mechanism and Auto Encoder for Intrusion Detection (AMAEID) | According to the experimental data, it is demonstrated that integrating the attention mechanism and the autoencoder increases the AMAEID model's performance, which also partially demonstrates how well the new methodology in this work works
[2022][30] | CAN-CID Model | The centre word prediction accuracy determines how accurately the CAN-CID GRU model detects words. To identify weak anomalies, we anticipate being able to distinguish between accurate forecasts for benign frames and faulty predictions for attack frames. The CAN-CID model necessitates more variation in the benign data in order to minimise the undetected CAN ID sequences and time intervals, which is one of the suggested approach's shortcomings
[2020][36] | IDS | Compromising network segmentation, encryption, authentication, IDS
[2020][31] | LSTM | Datasets for CAN bus attacks are generated from genuine automobile data. Three distinct attack scenario datasets for DoS, Fuzzing, and Spoofing are constructed from the raw data extracted from the real automobile. The CAN bus systems can be protected with a reliable IDS. Because DoS and spoofing attacks employ particular CAN IDs and have a similar pattern of recurrent attacks, the suggested LSTM model has greater detection accuracy in these situations
[2021][32] | HOTP | A lightweight authentication protocol for the CAN bus is proposed which utilizes the HMAC-Based One-Time Password (HOTP), a mechanism introduced in RFC4226; the message authentication code is added to each frame to avoid frame spoofing
[2021][35] | IDS | Vehicle CAN bus anomaly detection using message identifier sequences
[2021][33] | Situation Prediction & Risk Assessment System | A security situation awareness model is proposed based on the stacked denoising auto-encoder (SDAE) and bidirectional long short-term memory (Bi-LSTM)
[2021][34] | IDS | Evaluation of the comparative performance of machine-learning-based intrusion detection in the in-vehicle Controller Area Network bus
Table 3. The CAN bus system literature in the automotive field is summarized below.

References | Detection strategy | Method | Placement strategy
[37] | Anomaly-based | Frequency-based | CAN
[38] | Specification-based | CANopen 2.0 and 3.01 specification | ECU
[37] | Anomaly-based | Frequency-based | CAN
[39] | Signature-based | Sensor-based | ECU
[40] | Anomaly-based | Statistical-based (entropy-based) | CAN
[41] | Anomaly-based | Frequency-based | CAN
[42] | Anomaly-based | Frequency-based | CAN
[43] | Anomaly-based | Frequency-based | CAN
[44] | Anomaly-based | ML-based (ANN) | Central gateway
[45] | Anomaly-based | ML-based (deep neural network) | CAN
[46] | Anomaly-based | Statistical-based (hidden Markov) | CAN
[47] | Anomaly-based | Statistical-based (RLS and CUSUM) | CAN
[48] | Anomaly-based | Machine learning-based (OCSVM) | CAN
[49] | Anomaly-based | Frequency-based | CAN
[50] | Anomaly-based | Frequency-based | CAN
4 Proposed Model
• Proposed Anomaly Detection Method Based on the CAN Bus
The suggested bat algorithm introduces a successful two-stage modification strategy, adjusting the attack detection model parameters optimally to maximize the efficiency and performance of the model against cyber threats by increasing population diversity to avoid premature convergence. In [24] the monitoring of electric vehicle charging under different attacks and switching is presented; the communication model develops server-vehicle communication for selecting an authenticated switch when charging the EV. [25] presented a digital twin approach for multiple data inputs in the control of EV operation. The existing approach, however, falls short in validating an attack when making decisions under variable conditions. The complexity and switching of control under attack conditions result in node isolation, which has not been addressed in EV operations, and the built-in self-test needs robust monitoring for self-decision under varying input parameters, together with a more integrated observation for making a self-decision. The proposed approach is illustrated in Fig. 2 below.
• One-Class Support Vector Machine (OCSVM)
OCSVM is a classification system that determines the smallest subsets of an input space that include a specified percentage of the data. Consider the so-called "normal dataset," a particular set of data. OCSVM solves (1) to produce the best classification, as shown below:

w = \sum_{i} x_i y_i z_i \quad (1)

where w is a vector of weights, x_i is a Lagrange multiplier, y_i signifies either +1 or −1, i.e., the sample's class, and z_i denotes the samples drawn from the data.
• BAT Algorithm
The proposed bat algorithm introduces a successful two-stage modification approach for the bat algorithm by increasing population diversity to avoid premature convergence, and it optimizes the attack detection model parameters to maximize the effectiveness and performance of the model against cyber-attacks.
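The following minimal sketch (not the authors' implementation) shows the OCSVM idea on CAN-derived features: the model is fitted only on attack-free traffic and flags deviating frames as anomalies. The feature choice (ID, payload length, inter-arrival time) and parameter values are illustrative assumptions; in the proposed model such parameters would be tuned by the bat algorithm.

```python
# One-class SVM trained on benign CAN traffic features.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Each row: [CAN ID, payload length, inter-arrival time in ms] from benign traffic
X_normal = np.column_stack([
    rng.choice([0x1A0, 0x2B4, 0x3C0], size=500),
    rng.integers(1, 9, size=500),
    rng.normal(10.0, 0.5, size=500),
])

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(X_normal)                      # learn the boundary of "normal" traffic

X_test = np.array([[0x1A0, 8, 10.1],     # plausible benign frame
                   [0x7FF, 8, 0.2]])     # high-rate frame with an unseen ID
print(model.predict(X_test))             # +1 = normal, -1 = anomaly/attack
```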
Fig. 2. Proposed framework
5 Conclusion The key originality of the paper is its description of the difficulties and obstacles associated with electric vehicles in the Indian setting. When we examined the prior research in this field, we discovered that most of the studies either looked at the behavior of exchanged frames or used the data contained in the frame only very briefly, without giving the data itself much thought. The current strategy, however, falls short in validating an attack for decision-making under uncertain circumstances. The integrated self-test requires robust monitoring for self-decision under varied input parameters, and the complexity and switching of control under attack conditions lead to node isolation, which is not handled in EV operations; a self-decision that incorporates observation in a more comprehensive way is therefore needed.
Furthermore, no conventional classification methods are employed in these investigations. For these reasons, we have suggested in this work that OCSVM, RF, SVM, or MLP classifiers be used to differentiate between legitimate and malicious transmissions, with parameters tuned by the bat algorithm.
References 1. Kieu, T., Yang, B., Jensen, C. S.: Outlier detection for multidimensional time series using deep neural networks. In: 19th IEEE International Conference on Mobile Data Management (MDM), pp. 125–134 (2018) 2. Sommer, C., Hoeer, R., Samwer, M., Gerlich, D.W.: A deep learning and novelty detection framework for rapid phenotyping in high-content screening. Mol. Biol. Cell. 28(23), 3428– 3436 (2017) 3. Sanjay Sharma, C., Krishna, R., Sahay, S.K.: Detection of advanced malware by machine learning techniques. In: Ray, K., Sharma, T.K., Sanyog Rawat, R.K., Saini, A.B. (eds.) Soft Computing: Theories and Applications. AISC, vol. 742, pp. 333–342. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0589-4_31 4. Avate_pour, O., Malik, H.: State-of-the-art survey on in-vehicle network communication (CAN-Bus) security and vulnerabilities (2018) arXiv:1802.01725 5. Wang, Q., Sawhney, S.: VeCure: a practical security framework to protect the CAN bus of vehicles. In: 2014 International Conference on the Internet of Things (IOT). Cambridge, MA, USA, pp. 13–18 (2014) 6. Marchetti, M., Stabili, D., Guido, A., Colajanni, M.: Evaluation of anomaly detection for invehicle networks through information-theoretic algorithms. In: 2016 IEEE 2nd International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), pp.1–6 (2016) 7. Kang, M.-J., Kang, J.-W.: Intrusion detection system using deep neural network for in-vehicle network security, PLoS ONE 11(6), e0155781 (2016) 8. Theissler, A.: Detecting known and unknown faults in automotive systems using ensemblebased anomaly detection. Knowl.-Based Syst. 123, 163–173 (2017) 9. Narayanan, S.N., Mittal, S., Joshi, A.: OBD_securealert: an anomaly detection system for vehicles. In: 2016 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 1–6 (2016) 10. Cho, K.-T., Shin, K. G.: Fingerprinting electronic control units for vehicle intrusion detection. In: Proceedings 25th USENIX Security Symposium (USENIX Secur.). Berkeley, CA, USA: USENIX Association (2016) 11. Tayyab, M ., Hafeez, A, Malik, H.: Spoo_ng attack on clock based intrusion detection system in controller area networks. In: Proceedings NDIA Ground Vehicle Systems Engineering and Technology Symposium, pp. 1–13 (2018) 12. Taylor, A., Leblanc, S., Japkowicz, V.: Anomaly detection in automobile control network data with long short-term memory networks. In: Proceedings 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 130–139 (2016) 13. Li, H., Zhao, L., Juliato, M., Ahmed, S., Sastry, M.R., Yang, L.L.: POSTER: intrusion detection system for in-vehicle networks using sensor correlation and integration. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 2531–2533 (2017) 14. Ganesan, A., Rao, J., Shin, K.G.: Exploiting consistency among heterogeneous sensors for vehicle anomaly detection. SAE Tech. Paper 2017-01-1654 (2017)
15. Pajic, M., et al.: Robustness of attack-resilient state estimators. In: 2014 ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS). IEEE, pp. 163–174 (2014) 16. Markovitz, M., Wool, M.: Field classification, modeling and anomaly detection in unknown CAN bus networks. Veh. Commun. 9, 43–52 (2017) 17. Marchetti, M., Stabili, D.: Anomaly detection of CAN bus messages through analysis of ID sequences. In: 2017 IEEE Intelligent Vehicles Symposium. (IV), Los Angeles, CA, USA, pp. 1577–1583 (2017) 18. Studnia, I., Alata, E., Nicomette, V., Kaâniche, M., Laarouchi, Y.: A language-based intrusion detection approach for automotive embedded networks. Int. J. Embedded Syst. 10(1), 1–11 (2018) 19. Tomlinson, A., Bryans, J., Shaikh, S.A., Kalutarage, H.K.: Detection of automotive CAN cyber-attacks by identifying packet timing anomalies in time windows. In: 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Luxembourg City, Luxembourg, pp. 231–238 (2018) 20. Martinelli, F., Mercaldo, F., Nardone, V., Santone, A.: Car hacking identification through fuzzy logic algorithms. In: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7 (2017) 21. Lee, H., Jeong, S. H., Kim, H. K.: OTIDS: a novel intrusion detection system for in-vehicle network by using remote frame. In: Proceedings 2017 15th Annual Conference on Privacy, Security and Trust (PST), pp. 57–60 (2017) 22. Taylor, A., Japkowicz, N., Leblanc, S.: Frequency-based anomaly detection for the automotive CAN bus. In: Proceedings 2015 World Congress on Industrial Control Systems Security (WCICSS), pp. 45–49 (2015) 23. Avatefipour, O., et al.: An intelligent secured framework for cyberattack detection in electric vehicles’ CAN bus using machine learning. IEEE Access 7, 127580–127592 (2019) 24. AYDIN, Ö.: Authentication and Billing Scheme for The Electric Vehicles: EVABS. Uluslararası Yönetim Bili¸sim Sistemleri ve Bilgisayar Bilimleri Dergisi 6(1), 29–42 (2022) 25. Ghanishtha, B., Mohan, H., Singh, R.R.: Towards the future of smart electric vehicles: digital twin technology. Renew. Sustain. Energy Rev. 141, 110801 (2021) 26. Ishak, M.K., Khan, F.K.: Unique Message Authentication Security Approach based Controller Area Network (CAN) for Anti-lock Braking System (ABS) in Vehicle Network “(EUSPN 2019), Coimbra, Portugal 160 4–7 (2019) 27. Lokman, S.-F., Othman, A.T., Muhammad-Husaini, A.-B.: Intrusion detection system for automotive Controller Area Network (CAN) bus system: a review. EURASIP J. Wireless Commun. Netw. New York 2019(1), 1–17 (2019) 28. Wei, P., Wang, B., Dai, X., Li, L., He, F.: A novel intrusion detection model for the CAN bus packet of in-vehicle network based on attention mechanism and autoencoder. Digital Commun. Netw. 9, 14–21 (2022) 29. Tahsin C., Dönmez, M.: Anomaly detection in vehicular CAN Bus using message Identifier Sequences. IEEE Explore 9, 136243–136252 (2021) 30. ShiLiang, D., Huo, K., Wu, T.: A CAN bus security testbed framework for automotive cyberphysical systems. Hindawi August 2022 Wireless Communications and Mobile Comput. 1–11 (2022) 31. Rajapaksha, S., Kalutarage, H., Al-Kadri, M. O., Madzudzo, G., Petrovski, A.V.: Keep the moving vehicle secure: context-aware intrusion detection system for in-vehicle CAN bus security. In: 2022 14th International Conference on Cyber Conflict: Keep Moving! (CyCon), pp. 309–330 (2022) 32. 
Hossain, M.D., Inoue, H., Ochiai, H., Fall, D., Kadobayashi, Y.: Long short-term memory-based intrusion detection system for in-vehicle controller area network bus. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 10–17 (2020)
33. Luo, J.N., Wu, C.M., Yang, M.H.: A CAN-Bus lightweight authentication scheme. Sensors (Basel). 21(21), 7069 (2021) 34. Lei, C., et al.: SDAE+Bi-LSTM-Based situation awareness algorithm for the CAN bus of intelligent connected vehicles. Electronics 11(1), 110 (2022) 35. Moulahi, T., Zidi, S., Alabdulatif, A., Atiquzzaman, M.: Comparative performance evaluation of intrusion detection based on machine learning in in-vehicle controller area network bus. IEEE Access 9, 99595–99605 (2021) 36. Biradar, N., Mohite, Y.: Security Challenges in Controller Area Network (CAN) in Smart Vehicles: Grenze International Journal of Engineering and Technology, June Issue (2022) 37. Bozdal, M., Samie, M, Aslam, Jennions, I.: Evaluation of CAN bus security challenges. Sensors 20, 2364 (2020) 38. Hoppe, T., Kiltz, S., Dittmann, J.: Applying intrusion detection to automotive it-early insights and remaining challenges. J. Inform. Assur. Secur. 4(6), 226–235 (2009) 39. Larson, U.E., Nilsson, D.K., Jonsson, E.: An Approach to Specification-Based Attack Detection for In-Vehicle Networks. In: Intelligent Vehicles Symposium. In: 2008 IEEE, pp. 220–225 (2008) 40. Müter, M., Groll, A., Freiling, F.C.: A structured approach to anomaly detection for in-vehicle networks. In: Information Assurance and Security (IAS). (Atlanta, 2010), pp. 92–98 (2010) 41. Müter, M., Asaj, N.: Entropy-based anomaly detection for in-vehicle networks. In: 2011 IEEE Intelligent Vehicles Symposium (IV). (Baden-Baden, 2011), pp. 1110–1115 42. Ling, C., Feng, D.: An algorithm for detection of malicious messages on CAN buses. In: 2012 National Conference on Information Technology and Computer Science. (Atlantis Press, Paris) (2012) 43. Miller, C., Valasek, C.: Adventures in automotive networks and control units. Def. Con. 21, 260–264 (2013) 44. Miller, C., Valasek, C.: A Survey of Remote Automotive Attack Surfaces. In: Black Hat USA, 2014, p. 94 (2014) 45. Wasicek, A., Weimerskirch, A.: In: SAE Technical Paper. Recognizing Manipulated Electronic Control Units (No. 2015-01-0202) (2015) 46. Taylor, A., Leblanc, S., Japkowicz, N.: Anomaly detection in Automobile Control Network Data with Long Short-Term Memory Networks. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (2016) 47. Narayanan, S.N., Mittal, S., Joshi, A.: OBD_SecureAlert: An Anomaly Detection System for Vehicles. In: 2016 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 1–6 (St. Louis, 2016) 48. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends®. Signal Process. 7(3–4), 197–387 (2014) 49. Cho, K.T., Shin, K.G.: Fingerprinting Electronic Control Units for Vehicle Intrusion Detection. In: 25th {USENIX} Security Symposium ({USENIX} Security 16), pp. 911–927 (Austin, 2016) 50. Taylor, A., Japkowicz, N., Leblanc, S.: Frequency-Based Anomaly Detection for the Automotive CAN Bus. In: 2015 World Congress on Industrial Control Systems Security (WCICSS), pp. 45–49 (London, 2015) 51. Gmiden, M., Gmiden, M.H., Trabelsi, H.: An Intrusion Detection Method for Securing InVehicle CAN Bus. In: 2016 17th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), pp. 176–180 (Sousse, 2016)
Experimental Investigation of CT Scan Imaging Based COVID-19 Detection with Deep Learning Techniques
Aditya Shinde1, Anu Bajaj2,3(B), and Ajith Abraham3
1 IIIT, Pune, India
2 Thapar Institute of Engineering and Technology, Patiala, India
[email protected]
3 Machine Intelligence Research Lab, Washington, USA
Abstract. One of the deadliest pandemics in recorded history, COVID-19 has resulted in more than 657 million infections and 6.67 million fatalities confirmed as of December 27, 2022. An alternative screening method to RT-PCR is radiographic testing, which includes chest X-ray and CT scan imaging. CT scan imaging in particular offers a clear image of the illness and severity level. The proposed methodology focuses on developing a COVID-19-compatible deep learning-based medical imaging system using widely accessible CT scan images. The proposed system receives CT scan pictures and categorizes them into the specified COVID and non-COVID categories. An experimental investigation with different hyper-parameters is done to check the effectiveness of deep learning for COVID-19 detection. The system also performs augmentation of CT scan images and uses a dropout strategy to improve the model's generalization ability. Experiments are performed on two benchmark datasets, resulting in a maximum accuracy of 95.53% on training data and 90.62% on test data. The proposed model outperformed existing machine learning algorithms in terms of accuracy, precision, recall and F1 score for both datasets. Keywords: CT Scan Images · COVID-19 · Convolution Neural network · Augmentation · Dropouts
1 Introduction Since 2019, all countries have been struggling to come out of the negative impact caused by the COVID-19 virus. COVID-19 incidents have been reported in more than 190 countries ever since. The virus spreads rapidly and was still having a severe impact in the year 2022; many countries, including China, have experienced major outbreaks resulting in many deaths [1]. The number of incidents increased because coronavirus infection develops within the body in just two weeks. Although many vaccines and screening methods have been discovered, many countries face a shortfall in healthcare resources due to its wide spread [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Abraham et al. (Eds.): ISDA 2022, LNNS 717, pp. 599–613, 2023. https://doi.org/10.1007/978-3-031-35510-3_56
The Reverse Transcription Polymerase Chain Reaction (RT-PCR) test is considered the standard procedure for COVID-19 screening. However, it cannot provide fully reliable results since it looks only for the existence of pathogens specific to the virus. RT-PCR also has the limitations of being labour-intensive and of producing highly variable test outcomes. Furthermore, this test only confirms whether a patient is corona positive or negative; it does not state the level of corona infection or the patient's severity level. Screening methods like X-ray and CT scan imaging are useful for determining the severity level of the corona infection, and they give a quick and precise diagnosis of COVID-19. CT scan imaging performs better than X-ray as it offers better quality and contrast, which helps extract the most relevant information from the images [1]. Deep learning image recognition systems are being utilized by researchers to identify corona signs [2–5]. Therefore, we propose a convolutional neural network (CNN) model for COVID-19 detection using CT scan images. The main contributions are:
• To propose a CNN model for detecting COVID-19 using CT scan imaging techniques
• To apply augmentation techniques on the images for generating a more robust model
• To validate the model on two benchmark datasets
• To compare the performance with existing machine learning (ML) models
The organization of the paper is as follows: Sect. 2 presents the related work, Sect. 3 puts forth the proposed methodology, Sect. 4 explains the detailed experimental setup, and Sect. 5 analyzes the results and discussion. Section 6 concludes with future work.
2 Related Work This section describes past developments in deep learning based COVID-19 identification from chest CT scan images. Zhu et al. [1] used the P-V-Net algorithm to segment the infected lung regions, followed by feature extraction to classify CT scan images. The performance of different pre-trained models was evaluated in [2], and the authors concluded that models trained on out-of-field datasets boost the performance of COVID-19 diagnosis. The underlying distributions of the biomarkers and differences in model performances were studied by [3]. In [4, 5], the authors trained numerous deep convolutional networks on unbalanced datasets to categorise X-ray images into three categories: normal, pneumonia, and COVID-19. The COVID MTNet architecture, the NABLA-N (∇N-Net) technique for segmenting both CXR and CT scan images, and multi-task deep learning employing transfer learning to detect COVID-19 were all suggested in [6]. The authors also introduced a new method for quantitatively analyzing the diseased area in CXR and CT scan images. One of the open-source networks for COVID-19 identification is the deep convolutional network "COVID-Net" recommended in [7] and made accessible to the general public online; it is tailored exclusively for COVID-19 diagnosis based on a dataset of 13,975 CXR images. This model delved extensively into crucial aspects of COVID cases using a novel explainable technique. The method "DeepCOVIDExplainer" was proposed for automatically identifying pneumonia, COVID-19, and normal patients from COVID-19 symptoms based on CXR
images. An ensemble of DenseNets, ResNets, and VGGNets pre-processed, enhanced, and classified 16,995 CXR images [8]. The Deep Convolutional Neural Network (DCNN) used for binary classification of CXR images of pneumonia was compared to finely tailored pre-trained models. These improved models were evaluated on 5,856 CXR and CT scan pictures, of which 4,273 show pneumonia and 1,583 are normal [9]. ResNet50 was used to extract deep features from CXR images, which were then fed to support vector machines (SVM) for COVID-19 detection; deep networks were not used for classification because of limited data availability, so an SVM was used instead. The COVID-19, pneumonia, and normal classes were taken into account, and the outcomes were contrasted with those obtained using SVM in combination with conventional image processing techniques [9]. The detection of COVID-19 using machine learning techniques was systematically reviewed and analyzed by Ding et al. [10]. In [11], an explainable deep neural network (xDNN) is proposed which can run on very low power computers without GPUs and whose architecture is itself explainable. In this paper, we propose a CNN architecture for COVID-19 classification as follows.
3 Proposed Work The proposed methodology is given in Fig. 1. It takes CT scan images as input, applies pre-processing and image augmentation, and feeds them to a CNN for classification, as described below.
3.1 Image Pre-processing Each image is converted into the uniform size of 150*150 and normalized, wherein each pixel intensity value is converted into the range of 0 to 1. The pre-processing includes resizing the images to the necessary dimensions with a batch size of 32 and an input picture of 224x224.
3.2 Augmentation Data augmentation is a powerful technique that applies a number of operations to the images and generates more variation in the training data so that overfitting can be avoided and a more robust model can be created [12]. This technique is mainly useful when the data size is small. Because of data augmentation, the generalizability of the model is improved [13]. The proposed model uses the following operations as part of augmentation: horizontal flip, height shift, rotation, zooming and width shift. The proposed method has explored two kinds of augmentation variations, as given in Table 1; a sketch of this augmentation pipeline follows the table. Augmentation-1 uses a rotation of 45 degrees and zooming of 0.50; it changed the images considerably, so there is a drop in the performance of the model. Augmentation-2 is done with a rotation of 30 degrees and zooming of 0.20, and it improved the model performance with good generalization.
Fig. 1. Proposed Methodology

Table 1. Image Augmentation Details

Parameter of Image Augmentation | Augmentation-1 | Augmentation-2
Rotation | 45 degrees | 30 degrees
Width Shift | 0.15 | 0.15
Height Shift | 0.15 | 0.15
Zooming | 0.50 | 0.20
Horizontal Flip | True | True
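The sketch below shows how the Augmentation-2 settings from Table 1 can be expressed with Keras' ImageDataGenerator, consistent with the Keras-based setup used in this work; it is not the authors' exact script, and the directory name is an illustrative assumption.

```python
# Augmentation-2 pipeline from Table 1 plus [0, 1] intensity normalization.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel intensities to [0, 1]
    rotation_range=30,        # Augmentation-2 rotation
    width_shift_range=0.15,
    height_shift_range=0.15,
    zoom_range=0.20,
    horizontal_flip=True,
)

train_flow = train_gen.flow_from_directory(
    "ct_scans/train",         # two subfolders: COVID / non-COVID
    target_size=(150, 150),
    batch_size=128,
    class_mode="binary",
)
```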
3.3 Convolution Neural Network Model Convolutional neural networks (CNNs) are mainly used for image classification. They have proved to be efficient and are widely preferred because a single architecture performs both feature extraction and classification. CNNs have a standard structure with two main parts: the first part consists of convolution and pooling layers and performs feature extraction, while the second part consists of fully connected layers that classify the images. The number of convolution, pooling and fully connected layers depends on the application and the available computing power. Each of these layers has various hyperparameters such as the number of layers, filters, filter size and activation functions, and training has hyperparameters such as the optimizer, epochs, batch size, loss function and performance metric [14]. The proposed model uses the hyperparameter values given in Table 2. Table 3 shows the layer-wise output shape and trainable parameter count obtained by customizing the CNN with the hyperparameters defined in Table 2.
Table 2. Hyperparameters for CNN Model

Convolution Layers | Con1: 16 filters (3×3), Con2: 32 filters (3×3), Con3: 64 filters (3×3), Con4: 64 filters (3×3); AF¹: ReLU
Input Shape | 150×150
Optimizer | Adam
Dense 1 Layer | 512 nodes, AF: Sigmoid
Dense 2 Layer | 1 node, AF: Sigmoid
Loss Function | Binary cross-entropy
Performance Measure | Accuracy
Epochs | 20
Batch Size | 128
Image Size | 150×150

¹ AF: Activation Function
Table 3. Layer-wise output size and parameter count

Layer | Output Shape | #Parameters
Con2D | 150×150×16 | 448
MaxPooling | 75×75×16 | 0
Con2D | 75×75×32 | 4640
MaxPooling | 37×37×32 | 0
Con2D | 37×37×64 | 18496
MaxPooling | 18×18×64 | 0
Con2D | 18×18×64 | 36928
MaxPooling | 9×9×64 | 0
Flatten | 5184 | 0
Dense | 512 | 2654720
Dense | 1 | 513
Total Parameters | | 2,715,745
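A minimal Keras sketch of the architecture in Tables 2 and 3 is given below. The 'same' padding in the convolution layers and the three-channel 150×150 input are assumptions (not stated explicitly in the paper), chosen because they reproduce the output shapes and the 2,715,745-parameter total reported in Table 3.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Feature extraction: four convolution + max-pooling blocks with ReLU (Table 2)
    layers.Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    # Classification: flatten followed by two dense layers with sigmoid activation (Table 2)
    layers.Flatten(),
    layers.Dense(512, activation='sigmoid'),
    layers.Dense(1, activation='sigmoid'),
])
model.summary()  # prints 2,715,745 trainable parameters, matching Table 3
```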
The model uses a convolution network with four blocks of convolution and max-pooling layers. The convolution layers use 16, 32, 64, and 64 filters (each of size 3×3), respectively. Two dense layers with 512 and 1 nodes, respectively, are placed after these blocks. ReLU is used in the feature-extraction part, consisting of the convolution and pooling
layers, while sigmoid activation functions are used in the classification part, consisting of the dense layers.

3.4 Dropout

Dropout is a simple method for avoiding overfitting in neural networks. Some neurons are randomly selected and eliminated during training: their activations are ignored in the forward pass and their weights are not updated in the backward pass of error propagation. This prevents the development of fragile models that are highly specialized to the training data [15]. When a layer's dropout rate is set to a fraction such as 0.1, 0.2, or 0.3, it means that 10%, 20%, or 30% of the nodes in that layer are dropped at random. Here, the first and last max-pooling layers are subject to a dropout of 30%, while the fully connected layer with 512 nodes is subject to a dropout of 10%. Combining the augmentation and dropout procedures improves performance and effectively removes the overfitting problem. The details are given in Table 4, and a code sketch follows the table.

Table 4. Layer-wise output size and parameter count of the CNN with dropouts and augmentation
Layer | Output Size | #Parameters
Con2D | 150×150×16 | 448
MaxPooling | 75×75×16 | 0
Dropout | 75×75×16 | 0
Con2D | 75×75×32 | 4640
MaxPooling | 37×37×32 | 0
Con2D | 37×37×64 | 18496
MaxPooling | 18×18×64 | 0
Dropout1 | 18×18×64 | 0
Con2D | 18×18×64 | 36928
MaxPooling | 9×9×64 | 0
Dropout2 | 9×9×64 | 0
Flatten | 5184 | 0
Dense | 512 | 2654720
Dropout | 512 | 0
Dense | 1 | 513
Total Parameters | | 2,715,745
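The sketch below adds dropout to the backbone above, following the layer order shown in Table 4 (dropout after the first, third, and fourth pooling blocks and after the 512-node dense layer). The 0.3 and 0.1 rates follow the description in Sect. 3.4; where the prose and Table 4 differ on which pooling layers receive dropout, the table's layout is taken as the assumption here.

```python
from tensorflow.keras import layers, models

model_dropout = models.Sequential([
    layers.Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Dropout(0.3),   # dropout after the first pooling block
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Dropout(0.3),   # Dropout1 in Table 4
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Dropout(0.3),   # Dropout2 in Table 4
    layers.Flatten(),
    layers.Dense(512, activation='sigmoid'),
    layers.Dropout(0.1),   # 10% dropout on the fully connected layer
    layers.Dense(1, activation='sigmoid'),
])
# Dropout layers add no weights, so the total stays at 2,715,745 parameters as in Table 4.
```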
3.5 Model Compilation and Training

Accuracy, precision, recall, and F-measure are used as performance metrics. The compiled model is trained on the training data for 20 epochs, and its performance is evaluated on both the training and validation data; a training and evaluation sketch is given below.
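The following is a hedged sketch of compilation, training, and evaluation. It reuses the generators and model from the earlier sketches and computes precision, recall, and F1 with scikit-learn; this is one plausible way to obtain the metrics reported in Tables 6-9, since the paper does not show its exact evaluation code.

```python
from sklearn.metrics import precision_recall_fscore_support

# Adam optimizer, binary cross-entropy loss, and accuracy metric, as specified in Table 2
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_generator, epochs=20, validation_data=val_generator)

# Threshold the sigmoid outputs at 0.5 and compute precision, recall, and F1 on validation data.
# val_generator was created with shuffle=False, so predictions align with val_generator.classes.
y_prob = model.predict(val_generator)
y_pred = (y_prob.ravel() > 0.5).astype(int)
precision, recall, f1, _ = precision_recall_fscore_support(
    val_generator.classes, y_pred, average='binary')
print(f'Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}')
```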
4 Experimental Setup

The proposed model is trained and tested on Google Colaboratory, an online, freely accessible cloud-based service. Google Colaboratory provides Python 3 with supporting libraries, including TensorFlow, Keras, and OpenCV, and a GPU hardware accelerator, which can be verified as shown below.
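As a quick sanity check of this environment, the library versions and the GPU runtime can be confirmed before training; this snippet is illustrative rather than taken from the paper.

```python
import tensorflow as tf
import cv2  # OpenCV, part of the software stack listed above

print('TensorFlow:', tf.__version__, '| OpenCV:', cv2.__version__)
gpus = tf.config.list_physical_devices('GPU')
print('GPU accelerator available:', bool(gpus), gpus)
```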
5 Results and Discussion

Experiments are performed on two different benchmark datasets of CT scan images, because models often become accustomed to the dataset they were trained on, and their performance degrades as soon as data from a different source is used. Dataset-I was obtained from www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset [16]. Dataset-II was taken from https://gas.graviti.com/dataset/graviti/COVID_CT [17]. The images are in .png and .jpg format, and Table 5 presents the details of these datasets.

Table 5. Dataset Used
Sr. No | Dataset | Class | #Images | Total Images
Dataset-I | SARS-COV-2 CT-Scan | Positive | 1252 | 2482
Dataset-I | SARS-COV-2 CT-Scan | Negative | 1230 | 2482
Dataset-II | COVID-CT | Positive | 349 | 746
Dataset-II | COVID-CT | Negative | 397 | 746
Sample input images from each dataset are shown in Figs. 2 and 3. Two variants of experiments are performed on these datasets: without augmentation and with augmentation. The experiments with augmentation use rotation = 30 and zoom = 0.20.
Fig. 2. Dataset-I sample images
Fig. 3. Dataset-II sample images
5.1 Experimental Results on Dataset-I

Tables 6 and 7 present the training and validation accuracies and losses for the experiments without and with augmentation; performance is recorded over 20 epochs. From the results, it is clear that accuracy without augmentation is higher. This is because augmentation with rotation = 30 and zoom = 0.20, as in Fig. 4, results in a more challenging dataset.
Fig. 4. Sample Images Dataset-I after Augmentation (rotation = 30, zoom = 0.20)
Although the accuracy with augmentation is slightly lower, it produces a robust model that can handle images with varied appearances. It also reduces overfitting: the difference between training and validation accuracy is smaller than in the first case without augmentation. Figure 5 shows the accuracy and loss curves.

Table 6. Training and Validation Performance for Dataset-I without Augmentation (Without Augmentation and Dropouts)
Epoch# | Training Accuracy | Testing Accuracy | Loss | Precision | Recall | F1-Score
1 | 50.54 | 50.26 | 69.15 | 0 | 0 | 0
5 | 73.33 | 76.82 | 50.79 | 0.86 | 0.63 | 0.72
10 | 84.32 | 85.94 | 34.09 | 0.87 | 0.86 | 0.86
15 | 91.47 | 86.56 | 27.17 | 0.88 | 0.90 | 0.89
20 | 95.53 | 90.62 | 23.12 | 0.90 | 0.91 | 0.90
Table 7. Training and Validation Performance for Dataset-I with Augmentation (With Augmentation and Dropouts, Rotation = 30, Zoom = 0.20)

Epoch# | Training Accuracy | Testing Accuracy | Loss | Precision | Recall | F1-Score
1 | 77.26 | 74.74 | 51.25 | 0.68 | 0.94 | 0.79
5 | 86.58 | 85.94 | 31.04 | 0.84 | 0.65 | 0.77
10 | 87.55 | 85.68 | 33.39 | 0.90 | 0.79 | 0.84
15 | 87.34 | 89.50 | 27.50 | 0.91 | 0.85 | 0.86
20 | 88.58 | 82.55 | 39.37 | 0.91 | 0.70 | 0.80
These results are visualized in the plots in Fig. 5.

5.2 Experimental Results on Dataset-II

Tables 8 and 9 present the training and validation accuracies and losses for the experiments on Dataset-II. Performance for this dataset is also recorded over 20 epochs so that the results can be compared with Dataset-I. Again, accuracy without augmentation is higher, as the original images are simpler and not tilted (see Table 9). Performance with augmentation is lower because augmentation feeds the network tougher, tilted images (Fig. 6) than those in Fig. 3. Comparing the same model on the two datasets, Dataset-I achieves higher accuracy than Dataset-II; this is because Dataset-I comprises 2482 images whereas Dataset-II has only 746. We may therefore conclude that the model has learnt effectively on Dataset-I and is capable of generalization.
Fig. 5. Dataset-I: Plots of Accuracy and Loss. (a) Without Augmentation, (b) With Augmentation
Fig. 6. Sample Images Dataset-II after Augmentation (rotation = 30, zoom = 0.20)
Table 8. Training and Validation Performance for Dataset-II without Augmentation (Without Augmentation and Dropouts)

Epoch# | Training Accuracy | Testing Accuracy | Loss | Precision | Recall | F1-Score
1 | 54.70 | 45.31 | 76.987 | 0 | 0 | 0
5 | 61.13 | 68.75 | 61.66 | 0.64 | 0.88 | 0.74
10 | 75.43 | 74.28 | 50.13 | 0.72 | 0.87 | 0.79
15 | 81.62 | 79.69 | 44.65 | 0.76 | 0.89 | 0.82
20 | 82.81 | 78.12 | 50.08 | 0.74 | 0.91 | 0.82
18 | 85.04 | 83.59 | 43.78 | 0.79 | 0.92 | 0.85
Table 9. Training and Validation Performance for Dataset-II with Augmentation (With Augmentation and Dropouts, Rotation = 30, Zoom = 0.20)

Epoch# | Training Accuracy | Testing Accuracy | Loss | Precision | Recall | F1-Score
1 | 48.72 | 45.88 | 80.72 | 0 | 0 | 0
5 | 47.22 | 45.31 | 69.84 | 0 | 0 | 0
10 | 55.34 | 50 | 68.84 | 0.5 | 1 | 0.66
15 | 59.40 | 55.47 | 66.43 | 55.47 | 1 | 0.71
20 | 59.83 | 55.47 | 66.42 | 0.55 | 1 | 0.71
5.3 Evaluation with Classification by SVM Trained on Handcrafted Features

The Gray-Level Co-occurrence Matrix (GLCM) is a texture-analysis technique from digital image processing that represents the relationship between pairs of neighbouring pixels with different grey intensities. GLCM is generally used to extract texture features from images, including homogeneity, contrast, dissimilarity, correlation, angular second moment (ASM), and energy. In this paper, GLCM features are extracted and given to different ML models for classification (see Fig. 8 and the sketch below). The experimental results of the machine learning algorithms trained on GLCM features are given in Tables 10 and 11.
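A minimal sketch of this GLCM pipeline is given below, using scikit-image for feature extraction and scikit-learn for classification. The pixel distance and angle settings of the co-occurrence matrix are assumptions, since the paper does not state them; Random Forest is shown because it is the strongest GLCM-based model in Table 10, and gray_images and labels are hypothetical variable names.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19
from sklearn.ensemble import RandomForestClassifier

def glcm_features(gray_img):
    """Extract the six GLCM texture features named in Sect. 5.3 from an 8-bit grayscale image.
    The single pixel distance and the four angles are assumed settings."""
    glcm = graycomatrix(gray_img, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ('homogeneity', 'contrast', 'dissimilarity', 'correlation', 'ASM', 'energy')
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical usage: gray_images is a list of uint8 grayscale CT slices, labels the class vector.
# X = np.array([glcm_features(img) for img in gray_images])
# clf = RandomForestClassifier(random_state=0).fit(X, labels)
```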
Fig. 7. Dataset-II: Plots of Accuracy and Loss. (a) Without Augmentation, (b) With Augmentation
Fig. 8. COVID-19 Classification using ML Models Trained on GLCM Features
Table 10. Experimental Results on Dataset-I using ML Models

Classifier/Model | Accuracy | Precision | Recall | F1-Score
K-Nearest Neighbor | 62.7 | 0.625 | 0.63 | 0.63
Decision Tree | 72.2 | 0.725 | 0.72 | 0.72
Naive Bayes | 60.9 | 0.615 | 0.61 | 0.61
Support Vector Machine | 61.5 | 0.63 | 0.615 | 0.605
Logistic Regression | 63.5 | 0.635 | 0.635 | 0.635
Random Forest | 79.6 | 0.8 | 0.8 | 0.795
Gradient Boosting | 75.8 | 0.755 | 0.76 | 0.76
Ada Boosting | 73.8 | 0.74 | 0.735 | 0.735
Artificial Neural Network | 50.9 | 0.56 | 0.505 | 0.37
Proposed CNN Model | 90.62 | 0.90 | 0.91 | 0.90
Table 11. Experimental Results on Dataset-II using ML Models

Classifier/Model | Accuracy | Precision | Recall | F1-Score
K-Nearest Neighbor | 59.3 | 0.6 | 0.59 | 0.58
Decision Tree | 64.6 | 0.65 | 0.645 | 0.645
Naive Bayes | 61.3 | 0.655 | 0.605 | 0.575
Support Vector Machine | 56.0 | 0.585 | 0.555 | 0.51
Logistic Regression | 63.3 | 0.645 | 0.63 | 0.62
Random Forest | 68.6 | 0.69 | 0.685 | 0.68
Gradient Boosting | 63.3 | 0.64 | 0.63 | 0.625
Ada Boosting | 64.6 | 0.65 | 0.645 | 0.645
Artificial Neural Network | 54.6 | 0.695 | 0.535 | 0.42
Proposed CNN Model | 83.59 | 0.79 | 0.92 | 0.85
6 Conclusion and Future Work

This study showed the efficacy of a CNN, a type of deep neural network, in diagnosing COVID-19 from patients' CT scan images. It describes the experimental details of the CNN obtained by tuning its hyperparameters. The use of augmentation and dropout makes the network robust. The highest accuracy attained is 90.62%, which can be further improved by increasing the dataset size and training for more epochs on GPU-powered machines. Segmenting the region of interest will also help make the system more adaptable to real-world scenarios.
References

1. Zhu, F., et al.: Severity detection of COVID-19 infection with machine learning of clinical records and CT images. Technology and Health Care Preprint, 1–16 (2022)
2. Goncalves, J., Li, Y., et al.: Nature Machine Intelligence 3(1), 28–32 (2021)
3. Rahimzadeh, M., Attar, A.: A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2. Inform. Med. Unlocked 19, 100360 (2020)
4. Alom, M.Z., Rahman, M.M., Nasrin, M.S., Taha, T.M., Asari, V.K.: COVID_MTNet: COVID-19 detection with multi-task deep learning approaches. arXiv preprint arXiv:2004.03747 (2020)
5. Khan, E., et al.: Chest X-ray classification for the detection of COVID-19 using deep learning techniques. Sensors 22(3), 1211 (2022)
6. Karim, M.R., Döhmen, T., Cochez, M., Beyan, O., Rebholz-Schuhmann, D., Decker, S.: Deepcovidexplainer: explainable COVID-19 diagnosis from chest X-ray images. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1034–1037. IEEE (2020)
7. El Asnaoui, K., Chawki, Y., Idri, A.: Automated methods for detection and classification of pneumonia based on X-ray images using deep learning. In: Artificial Intelligence and Blockchain for Future Cybersecurity Applications, pp. 257–284. Springer International Publishing, Cham (2021)
8. Walvekar, S., Shinde: Detection of COVID-19 from CT images using ResNet50. In: 2nd International Conference on Communication & Information Processing (ICCIP) (2020)
9. Walvekar, S., Shinde, S.: Efficient medical image segmentation of COVID-19 chest CT images based on deep learning techniques. In: 2021 International Conference on Emerging Smart Computing and Informatics (ESCI). IEEE (2021)
10. Ding, W., Nayak, J., Swapnarekha, H., Abraham, A., Naik, B., Pelusi, D.: Fusion of intelligent learning for COVID-19: a state-of-the-art review and analysis on real medical data. Neurocomputing 457, 40–66 (2021)
11. Angelov, P., Soares, E.: Towards explainable deep neural networks (xDNN). Neural Netw. 130, 185–194 (2020)
12. Bloice, M.D., Stocker, C., Holzinger, A.: Augmentor: an image augmentation library for machine learning. arXiv preprint arXiv:1708.04680 (2017)
13. Bloice, M.D., Roth, P.M., Holzinger, A.: Biomedical image augmentation using Augmentor. Bioinformatics 35(21), 4522–4524 (2019)
14. Aslan, M.F., Sabanci, K., Durdu, A., Unlersen, M.F.: COVID-19 diagnosis using state-of-the-art CNN architecture features and Bayesian Optimization. Comp. Biol. Med., 105244 (2022)
15. Abdar, M., et al.: UncertaintyFuseNet: robust uncertainty-aware hierarchical feature fusion model with ensemble Monte Carlo dropout for COVID-19 detection. Inform. Fusion 90, 364–381 (2023)
16. Soares, E., Angelov, P., Biaso, S., Froes, M.H., Abe, D.K.: SARS-CoV-2 CT-scan dataset: a large dataset of real patients CT scans for SARS-CoV-2 identification. MedRxiv (2020)
17. Zhao, J., Yichen, Z., Xuehai, H., Pengtao, X.: COVID-CT-Dataset: a CT scan dataset about COVID-19 (2020)